[GH-ISSUE #1654] Bug: SingleFile extractor doesn't use cookies from CHROME_USER_DATA_DIR #989

Open
opened 2026-03-01 14:47:45 +03:00 by kerem · 0 comments

Originally created by @Intralexical on GitHub (Feb 8, 2025).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/1654

Originally assigned to: @pirate on GitHub.

Provide a screenshot and describe the bug

For pages where the PDF and screenshot extractors are able to use cookies in both backend and frontend, the SingleFile extractor renders the page without any cookies at all.

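Screenshot

Image: https://github.com/user-attachments/assets/fc16bb17-9b84-452e-9a81-f4078e81f65b

Image: https://github.com/user-attachments/assets/c9aba908-7d9c-41ce-ad85-f06275ea2dee

SingleFile

Image: https://github.com/user-attachments/assets/c8f73f92-1fa8-4578-bb3d-7d3e0433a9c0

Image: https://github.com/user-attachments/assets/18810473-04ad-4b39-a55a-3776690649b4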

Presumably this applies to logins too.

Steps to reproduce

Inside an existing collection:


# Make profile directory:
mkdir ChromeProfile
# Initialize manually:
chromium --user-data-dir=ChromeProfile
# (Visit setcookie.net and set a cookie for DATA=ABC.)

# Tell ArchiveBox to use it:
echo 'CHROME_USER_DATA_DIR=/data/ChromeProfile' >> ArchiveBox.conf

# Now add cookie test pages, repeat a couple times:
docker run -v $PWD:/data -it archivebox/archivebox add \
    https://setcookie.net/#$(date -Iseconds) \
    https://www.whatarecookies.com/cookietest.asp#$(date -Iseconds)

docker run -v $PWD:/data -it archivebox/archivebox add \
    https://setcookie.net/#$(date -Iseconds) \
    https://www.whatarecookies.com/cookietest.asp#$(date -Iseconds)

# Check saved outputs:
docker run -v $PWD:/data -p 8000:8000 -it archivebox/archivebox server
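
To compare the saved outputs without clicking through the web UI, one option is to search each snapshot's SingleFile HTML for the cookie value. This is a minimal sketch, not part of the original report. It assumes snapshots live under `archive/<timestamp>/` with the SingleFile output named `singlefile.html`, and that the test page echoes the cookie into the page HTML. `check_singlefile_cookie` is a hypothetical helper name.

```shell
# Hypothetical helper: report which snapshots' SingleFile output contains a marker.
# Usage: check_singlefile_cookie ARCHIVE_DIR MARKER
# Assumes the layout ARCHIVE_DIR/<snapshot>/singlefile.html.
check_singlefile_cookie() {
    archive_dir=$1
    marker=$2
    for snapshot in "$archive_dir"/*/; do
        html="${snapshot}singlefile.html"
        # Skip snapshots where the SingleFile extractor produced no output.
        [ -f "$html" ] || continue
        # -F: treat the marker as a fixed string, not a regular expression.
        if grep -qF -- "$marker" "$html"; then
            echo "FOUND   $html"
        else
            echo "MISSING $html"
        fi
    done
}
```

For example, `check_singlefile_cookie ./archive 'DATA=ABC'` run from the collection directory. A short value such as `ABC` alone can match unrelated markup, so a more distinctive cookie value makes the check more reliable.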

Logs or errors


ArchiveBox Version

0.7.3
ArchiveBox v0.7.3 COMMIT_HASH=069aabc BUILD_TIME=2024-12-15 09:54:03 1734256443
IN_DOCKER=True IN_QEMU=False ARCH=x86_64 OS=Linux PLATFORM=Linux-6.6.19-1-MANJARO-x86_64-with-glibc2.36 PYTHON=Cpython
FS_ATOMIC=True FS_REMOTE=True FS_USER=911:911 FS_PERMS=644
DEBUG=False IS_TTY=False TZ=UTC SEARCH_BACKEND=ripgrep LDAP=False

[i] Dependency versions:
 √  PYTHON_BINARY         v3.11.11        valid     /usr/local/bin/python3.11                                                   
 √  SQLITE_BINARY         v2.6.0          valid     /usr/local/lib/python3.11/sqlite3/dbapi2.py                                 
 √  DJANGO_BINARY         v3.1.14         valid     /usr/local/lib/python3.11/site-packages/django/__init__.py                  
 √  ARCHIVEBOX_BINARY     v0.7.3          valid     /usr/local/bin/archivebox                                                   

 √  CURL_BINARY           v8.10.1         valid     /usr/bin/curl                                                               
 √  WGET_BINARY           v1.21.3         valid     /usr/bin/wget                                                               
 √  NODE_BINARY           v20.18.1        valid     /usr/bin/node                                                               
 √  SINGLEFILE_BINARY     v1.1.54         valid     /app/node_modules/single-file-cli/single-file                               
 √  READABILITY_BINARY    v0.0.11         valid     /app/node_modules/readability-extractor/readability-extractor               
 √  MERCURY_BINARY        v1.0.0          valid     /app/node_modules/@postlight/parser/cli.js                                  
 √  GIT_BINARY            v2.39.5         valid     /usr/bin/git                                                                
 √  YOUTUBEDL_BINARY      v2024.12.13     valid     /usr/local/bin/yt-dlp                                                       
 √  CHROME_BINARY         v131.0.6778.33  valid     /usr/bin/chromium-browser                                                   
 √  RIPGREP_BINARY        v13.0.0         valid     /usr/bin/rg                                                                 

[i] Source-code locations:
 √  PACKAGE_DIR           23 files        valid     /app/archivebox                                                             
 √  TEMPLATES_DIR         3 files         valid     /app/archivebox/templates                                                   
 -  CUSTOM_TEMPLATES_DIR  -               disabled  None                                                                        

[i] Secrets locations:
 √  CHROME_USER_DATA_DIR  45 files        valid     ./CR                                                                        
 -  COOKIES_FILE          -               disabled  None                                                                        

[i] Data locations:
 √  OUTPUT_DIR            6 files @       valid     /data                                                                       
 √  SOURCES_DIR           14 files        valid     ./sources                                                                   
 √  LOGS_DIR              1 files         valid     ./logs                                                                      
 √  ARCHIVE_DIR           21 files        valid     ./archive                                                                   
 √  CONFIG_FILE           113.0 Bytes     valid     ./ArchiveBox.conf                                                           
 √  SQL_INDEX             424.0 KB        valid     ./index.sqlite3

How did you install the version of ArchiveBox you are using?

Docker (or Podman/LXC/K8s/TrueNAS/Proxmox/etc)

What operating system are you running on?

Linux (Ubuntu/Debian/Arch/Alpine/etc.)

What type of drive are you using to store your ArchiveBox data?

  • some of data/ is on a local SSD or NVMe drive
  • some of data/ is on a spinning hard drive or external USB drive
  • some of data/ is on a network mount (e.g. NFS/SMB/Ceph/GlusterFS/etc.)
  • some of data/ is on a FUSE mount (e.g. SSHFS/RClone/S3/B2/Google Drive/Dropbox/etc.)

Docker Compose Configuration


ArchiveBox Configuration

[SERVER_CONFIG]
CHROME_USER_DATA_DIR = /data/ChromeProfile
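
When reproducing this, it may help to first confirm that the configured profile directory actually contains a cookie database, so that an empty profile is ruled out. This is a minimal sketch under stated assumptions, not part of the original report. `profile_has_cookies` is a hypothetical helper. It assumes the profile uses the `Default` sub-profile and that Chromium keeps cookies in `Default/Network/Cookies` (newer versions) or `Default/Cookies` (older versions).

```shell
# Hypothetical helper: check a Chromium user data dir for a non-empty cookie database.
# Usage: profile_has_cookies PROFILE_DIR
# Assumes the "Default" sub-profile; other profile names are not checked.
profile_has_cookies() {
    profile_dir=$1
    for candidate in \
        "$profile_dir/Default/Network/Cookies" \
        "$profile_dir/Default/Cookies"
    do
        # -s: the file exists and is not empty.
        if [ -s "$candidate" ]; then
            echo "cookie database: $candidate"
            return 0
        fi
    done
    echo "no cookie database under $profile_dir" >&2
    return 1
}
```

Run it on the host against the directory that is mounted into the container as `/data/ChromeProfile`, for example `profile_has_cookies ./ChromeProfile`.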