[GH-ISSUE #1645] Bug: Chromium profile and cookies.txt not working #985

Open
opened 2026-03-01 14:47:44 +03:00 by kerem · 0 comments
Owner

Originally created by @TooManyStacks on GitHub (Jan 31, 2025).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/1645

Originally assigned to: @pirate on GitHub.

Provide a screenshot and describe the bug

Possibly related to #1637. I mounted my Chromium profile, plus a cookies.txt exported from that profile, into ArchiveBox, but the snapshots still show cookie walls.
ArchiveBox Chromium version: 131.0.6778.33
My Chromium version: 131.0.6778.264
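
One quick check when cookie walls persist is whether the exported cookies.txt contains any entries for the site being archived. The following is a minimal sketch, assuming the file is in the Netscape cookies.txt format that Python's `http.cookiejar.MozillaCookieJar` reads. The function name `cookies_for_domain` is hypothetical and not part of ArchiveBox.

```python
import http.cookiejar


def cookies_for_domain(path: str, domain: str) -> list[str]:
    """Return the names of cookies in a Netscape-format file that belong to `domain`.

    Hypothetical helper for inspecting an exported cookies.txt; it only reads
    the file and does not talk to ArchiveBox or Chromium.
    """
    jar = http.cookiejar.MozillaCookieJar(path)
    # Keep session cookies and expired cookies so the listing shows everything
    # that is physically present in the file.
    jar.load(ignore_discard=True, ignore_expires=True)

    names = []
    for cookie in jar:
        cookie_domain = cookie.domain.lstrip(".")
        # Match the domain itself or any of its subdomains.
        if cookie_domain == domain or cookie_domain.endswith("." + domain):
            names.append(cookie.name)
    return names
```

For example, `cookies_for_domain("/data_disk/docker/archivebox/cookies.txt", "ad.nl")` would list what the file holds for the article's domain. The server log below shows the consent script being served from `myprivacy-static.dpgmedia.net`, so it may be worth checking `dpgmedia.net` as well.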

Steps to reproduce

Add link and try to snapshot it:

https://www.ad.nl/buitenland/duitse-afd-is-de-schaamte-voorbij-partij-wil-buitenlanders-deporteren-zoals-in-verleden~a6f30e49/

Logs or errors

> Skipping full snapshot directory check (quick mode)
----------------------------------------------------------------------
[√] Done. Verified and updated the existing ArchiveBox collection.
[+] Starting ArchiveBox webserver...
    > Logging errors to ./logs/errors.log
Performing system checks...
System check identified no issues (0 silenced).
January 31, 2025 - 06:11:42
Django version 3.1.14, using settings 'core.settings'
Starting development server at http://0.0.0.0:8000/
Quit the server with CONTROL-C.
"GET /admin/core/snapshot/ HTTP/1.1" 200 134575
"GET /admin/jsi18n/ HTTP/1.1" 200 3191
"GET /archive/1738303270.275219/index.html HTTP/1.1" 200 242657
"GET /archive/1738303270.275219/singlefile.html HTTP/1.1" 200 207646
"GET /archive/1738303270.275219/screenshot.png HTTP/1.1" 200 228247
"GET /archive/1738303270.275219/readability/content.html HTTP/1.1" 200 1005
"GET /archive/1738303270.275219/mercury/content.html HTTP/1.1" 200 32
"GET /archive/1738303270.275219/www.ad.nl/buitenland/duitse-afd-is-de-schaamte-voorbij-partij-wil-buitenlanders-deporteren-zoals-in-verleden~a6f30e49/index.html HTTP/1.1" 200 303481
"GET /archive/1738303270.275219/output.pdf HTTP/1.1" 304 0
"GET /archive/1738303270.275219/media/ HTTP/1.1" 200 401
"GET /archive/1738303270.275219/headers.json HTTP/1.1" 200 341
"GET /archive/1738303270.275219/output.html HTTP/1.1" 304 0
Not Found: /archive/1738303270.275219/git/
"GET /archive/1738303270.275219/git/ HTTP/1.1" 404 1232
"GET /archive/1738303270.275219/statics.ad.nl/fonts/FlamaSemicond-Black-6cf7d54458.woff2 HTTP/1.1" 200 35816
"GET /archive/1738303270.275219/statics.ad.nl/css/main-91099771a2.css HTTP/1.1" 200 477306
"GET /archive/1738303270.275219/statics.ad.nl/fonts/FlamaSemicond-Bold-8199bf9ba9.woff2 HTTP/1.1" 200 35656
"GET /archive/1738303270.275219/statics.ad.nl/js/head-8fb7a9a3bb.js HTTP/1.1" 200 16623
"GET /archive/1738303270.275219/myprivacy-static.dpgmedia.net/consent.js HTTP/1.1" 200 263963
"GET /archive/1738303270.275219/statics.ad.nl/js/advertising-dbc43085ca.js HTTP/1.1" 200 2096
"GET /archive/1738303270.275219/login-static.dpgmedia.net/ssosession/main.js HTTP/1.1" 200 266860
"GET /archive/1738303270.275219/advertising-cdn.dpgmedia.cloud/web-advertising/16/26/0/advert-xandr.js HTTP/1.1" 200 137674
"GET /archive/1738303270.275219/advertising-cdn.dpgmedia.cloud/native-renderer/main.js HTTP/1.1" 200 80292
"GET /archive/1738303270.275219/statics.ad.nl/js/main-571a8262ad.js HTTP/1.1" 200 113444
"GET /archive/1738303270.275219/temptation.ad.nl/temptation.js@v=20220407 HTTP/1.1" 200 9510
"GET /archive/1738303270.275219/advertising-cdn.dpgmedia.cloud/native-templates/prod/algemeendagblad/templates.js@version=662 HTTP/1.1" 200 260917
"GET /archive/1738303270.275219/embed.mychannels.video/sdk@brand=ad HTTP/1.1" 200 306442
"GET /archive/1738303270.275219/advertising-cdn.dpgmedia.cloud/header-bidding/prod/algemeendagblad/cfb53ea7d1e97565bb2a82d094c0e34551a602f4.js HTTP/1.1" 200 28503
"GET /archive/1738303270.275219/statics.ad.nl/fonts/Flama-Basic-38a942a4fb.woff2 HTTP/1.1" 200 33528
"GET /archive/1738303270.275219/advertising-cdn.dpgmedia.cloud/web-advertising/prebid-8-32-0.js HTTP/1.1" 200 399777
"GET /archive/1738303270.275219/statics.ad.nl/img/brand-logo-57502e4ec6.svg HTTP/1.1" 200 479
"GET /archive/1738303270.275219/statics.ad.nl/img/icons/timeline/timeline-dot-b0e1b87a7f.png HTTP/1.1" 200 755
"GET /index.html HTTP/1.1" 302 0
"GET /archive/1738303270.275219/statics.ad.nl/fonts/FlamaPro-Basic-91c9c285f4.woff HTTP/1.1" 200 36612
Not Found: /article/remaining-content/~a6f30e49
"GET /article/remaining-content/~a6f30e49?articleUrl=http%3A%2F%2F192.168.0.100%3A85%2Farchive%2F1738303270.275219%2Fwww.ad.nl%2Fbuitenland%2Fduitse-afd-is-de-schaamte-voorbij-partij-wil-buitenlanders-deporteren-zoals-in-verleden~a6f30e49%2Findex.html&referrer=http%3A%2F%2F192.168.0.100%3A85%2Farchive%2F1738303270.275219%2Findex.html HTTP/1.1" 404 179
"GET / HTTP/1.1" 302 0
"GET /admin/core/snapshot/ HTTP/1.1" 200 134575
"GET /admin/jsi18n/ HTTP/1.1" 200 3191
[+] [2025-01-31 06:12:33] Adding 1 links to index (crawl depth=0)...
    > Saved verbatim input to sources/1738303953-import.txt
    > Parsed 1 URLs from input (Generic TXT)
    > Found 1 new URLs not already in index
[*] [2025-01-31 06:12:33] Writing 1 links to main index...

    √ ./index.sqlite3
[*] [2025-01-31 06:12:33] Archiving 1/40 URLs from added set...
[▶] [2025-01-31 06:12:33] Starting archiving of 1 snapshots in index...
[+] [2025-01-31 06:12:34] "www.ad.nl/buitenland/duitse-afd-is-de-schaamte-voorbij-partij-wil-buitenlanders-deporteren-zoals-in-verleden~a6f30e49/#2025-01-31T06:12:33+00:00"
    https://www.ad.nl/buitenland/duitse-afd-is-de-schaamte-voorbij-partij-wil-buitenlanders-deporteren-zoals-in-verleden~a6f30e49/#2025-01-31T06:12:33+00:00
    > ./archive/1738303953.931291
      > favicon
      > headers
      > singlefile
      > pdf
      > screenshot
      > dom
      > wget
      > title
      > readability
      > mercury
      > htmltotext
      > media
      > archive_org
        Extractor failed:
             Failed to find "content-location" URL header in Archive.org response.
        Run to see full output:
          docker run -it -v $PWD/data:/data archivebox/archivebox /bin/bash
            cd /data/archive/1738303953.931291;
            curl --silent --location --compressed --head --max-time 60 --user-agent "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36 ArchiveBox/0.7.3 (+https://github.com/ArchiveBox/ArchiveBox/) curl/curl 8.10.1 (x86_64-pc-linux-gnu)" "https://web.archive.org/save/https://www.ad.nl/buitenland/duitse-afd-is-de-schaamte-voorbij-partij-wil-buitenlanders-deporteren-zoals-in-verleden~a6f30e49/#2025-01-31T06:12:33+00:00"
        152 files (11.1 MB) in 0:01:31s 
[√] [2025-01-31 06:14:05] Update of 1 pages complete (1.52 min)
    - 0 links skipped
    - 1 links updated
    - 1 links had errors
    Hint: To manage your archive in a Web UI, run:
        archivebox server 0.0.0.0:8000
"POST /admin/core/snapshot/ HTTP/1.1" 302 0
"GET /admin/core/snapshot/ HTTP/1.1" 200 137418
"GET /admin/jsi18n/ HTTP/1.1" 200 3191


And here is the log file

Exception in archive_methods.save_htmltotext(Link(url=https://www.ad.nl/buitenland/duitse-afd-is-de-schaamte-voorbij-partij-wil-buitenlanders-deporteren-zoals-in-verleden~a6f30e49/#2025-01-31T06:12:33+00:00)) command=/usr/local/bin/archivebox server --quick-init 0.0.0.0:8000; ts=2025-01-31__06:13:55
cannot access local variable 'cmd' where it is not associated with a value
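
The message above is the wording Python 3.11 and later use for `UnboundLocalError`: a local variable is read on a code path where it was never assigned. The sketch below reproduces that class of error. It is not ArchiveBox's code, and the function name and its body are hypothetical.

```python
def run_extractor_sketch(fail_early: bool) -> str:
    """Hypothetical stand-in for an extractor; not ArchiveBox's implementation."""
    try:
        if fail_early:
            # Simulate a failure that happens before `cmd` is assigned.
            raise RuntimeError("simulated early failure")
        cmd = ["some-tool", "--flag"]
        return "ran " + " ".join(cmd)
    except RuntimeError:
        # `cmd` is unbound here when the failure came first, so this line
        # raises UnboundLocalError instead of reporting the real problem.
        return "failed while running " + " ".join(cmd)


try:
    run_extractor_sketch(fail_early=True)
except UnboundLocalError as err:
    print(type(err).__name__, "-", err)
```

If the htmltotext extractor follows a similar pattern, the logged exception would be a secondary error that hides whatever failed first.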

ArchiveBox Version

archivebox version
0.7.3
ArchiveBox v0.7.3 COMMIT_HASH=069aabc BUILD_TIME=2024-12-15 09:54:03 1734256443
IN_DOCKER=True IN_QEMU=False ARCH=x86_64 OS=Linux PLATFORM=Linux-5.10.0-16-amd64-x86_64-with-glibc2.36 PYTHON=Cpython
FS_ATOMIC=True FS_REMOTE=True FS_USER=911:911 FS_PERMS=644
DEBUG=False IS_TTY=True TZ=UTC SEARCH_BACKEND=sonic LDAP=False

[i] Dependency versions:
 √  PYTHON_BINARY         v3.11.11        valid     /usr/local/bin/python3.11                                                   
 √  SQLITE_BINARY         v2.6.0          valid     /usr/local/lib/python3.11/sqlite3/dbapi2.py                                 
 √  DJANGO_BINARY         v3.1.14         valid     /usr/local/lib/python3.11/site-packages/django/__init__.py                  
 √  ARCHIVEBOX_BINARY     v0.7.3          valid     /usr/local/bin/archivebox                                                   

 √  CURL_BINARY           v8.10.1         valid     /usr/bin/curl                                                               
 √  WGET_BINARY           v1.21.3         valid     /usr/bin/wget                                                               
 √  NODE_BINARY           v20.18.1        valid     /usr/bin/node                                                               
 √  SINGLEFILE_BINARY     v1.1.54         valid     /app/node_modules/single-file-cli/single-file                               
 √  READABILITY_BINARY    v0.0.11         valid     /app/node_modules/readability-extractor/readability-extractor               
 √  MERCURY_BINARY        v1.0.0          valid     /app/node_modules/@postlight/parser/cli.js                                  
 √  GIT_BINARY            v2.39.5         valid     /usr/bin/git                                                                
 √  YOUTUBEDL_BINARY      v2024.12.13     valid     /usr/local/bin/yt-dlp                                                       
 √  CHROME_BINARY         v131.0.6778.33  valid     /usr/bin/chromium-browser                                                   
 √  RIPGREP_BINARY        v13.0.0         valid     /usr/bin/rg                                                                 

[i] Source-code locations:
 √  PACKAGE_DIR           23 files        valid     /app/archivebox                                                             
 √  TEMPLATES_DIR         3 files         valid     /app/archivebox/templates                                                   
 -  CUSTOM_TEMPLATES_DIR  -               disabled  None                                                                        

[i] Secrets locations:
 √  CHROME_USER_DATA_DIR  54 files        valid     /chromium-profile                                                           
 √  COOKIES_FILE          332.4 KB        valid     /cookies.txt                                                                

[i] Data locations:
 √  OUTPUT_DIR            5 files @       valid     /data                                                                       
 √  SOURCES_DIR           56 files        valid     ./sources                                                                   
 √  LOGS_DIR              2 files         valid     ./logs                                                                      
 √  ARCHIVE_DIR           40 files        valid     ./archive                                                                   
 √  CONFIG_FILE           81.0 Bytes      valid     ./ArchiveBox.conf                                                           
 √  SQL_INDEX             748.0 KB        valid     ./index.sqlite3

How did you install the version of ArchiveBox you are using?

Docker (or Podman/LXC/K8s/TrueNAS/Proxmox/etc)

What operating system are you running on?

Linux (Ubuntu/Debian/Arch/Alpine/etc.)

What type of drive are you using to store your ArchiveBox data?

  • [ ] some of data/ is on a local SSD or NVMe drive
  • [ ] some of data/ is on a spinning hard drive or external USB drive
  • [x] some of data/ is on a network mount (e.g. NFS/SMB/Ceph/GlusterFS/etc.)
  • [ ] some of data/ is on a FUSE mount (e.g. SSHFS/RClone/S3/B2/Google Drive/Dropbox/etc.)

Docker Compose Configuration

services:
    archivebox:
        image: archivebox/archivebox:latest
        ports:
            - 85:8000
        volumes:
            - /data_disk/docker/archivebox/data:/data
            - /data_disk/docker/archivebox/cookies.txt:/cookies.txt  # Mount the cookies file
            - /data_disk/docker/archivebox/chromium-profile:/chromium-profile # Mount the chrome file
        environment:
            - ALLOWED_HOSTS=*
            - CSRF_TRUSTED_ORIGINS=http://localhost:8000
            - PUBLIC_INDEX=True
            - PUBLIC_SNAPSHOTS=True
            - PUBLIC_ADD_VIEW=False
            - SEARCH_BACKEND_ENGINE=sonic
            - SEARCH_BACKEND_HOST_NAME=sonic
            - SEARCH_BACKEND_PASSWORD={my_super_safe_place_holder}
            - COOKIES_FILE=/cookies.txt  # Set the cookies file path
            - CHROME_USER_DATA_DIR=/chromium-profile # Tell it to use the chromium profile
            #- PUID=1000
            #- PGID=1000
            #- GID=1000
            #- GUID=1000
        labels:
            - com.centurylinklabs.watchtower.enable=true

    archivebox_scheduler:
        image: archivebox/archivebox:latest
        command: schedule --foreground --update --every=day
        environment:
            - TIMEOUT=120
            - SEARCH_BACKEND_ENGINE=sonic
            - SEARCH_BACKEND_HOST_NAME=sonic
            - SEARCH_BACKEND_PASSWORD={my_super_safe_place_holder}
        volumes:
            - /data_disk/docker/archivebox/data:/data
        labels:
            - com.centurylinklabs.watchtower.enable=true

    sonic:
        image: archivebox/sonic:latest
        expose:
            - 1491
        environment:
            - SEARCH_BACKEND_PASSWORD={my_super_safe_place_holder}
        volumes:
            - /data_disk/docker/archivebox/sonic:/var/lib/sonic/store
        labels:
            - com.centurylinklabs.watchtower.enable=true

    novnc:
        image: theasp/novnc:latest
        environment:
            - DISPLAY_WIDTH=1920
            - DISPLAY_HEIGHT=1080
            - RUN_XTERM=no
        ports:
            - 127.0.0.1:8089:8080
        labels:
            - com.centurylinklabs.watchtower.enable=true

networks:
    dns:
        ipam:
            driver: default
            config:
                - subnet: 172.20.0.0/24

ArchiveBox Configuration

[SERVER_CONFIG]
SECRET_KEY = redacted