[GH-ISSUE #1637] Bug: archivebox doesn't use cookie file and vnc doesn't show anything #2490

Closed
opened 2026-03-01 17:59:24 +03:00 by kerem · 11 comments
Owner

Originally created by @orthodoxe on GitHub (Jan 19, 2025).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/1637

Originally assigned to: @pirate on GitHub.

Provide a screenshot and describe the bug

Description

I gave ArchiveBox a cookie file (Netscape format), but when I archive pages like Reddit or YouTube that show cookie popups (or other types of popups), the popups still appear in the archive.
If I try to use the noVNC browser to accept the cookies, I just see a Debian wallpaper with an (almost) empty taskbar (see screenshot).

Screenshots

VNC empty desktop

Image

Expected result (Reddit)

Image

Result (Reddit)

onefile
Image

dom
Image

Expected result (YouTube)

Image

Result (YouTube)

onefile
Image

dom
Image

wget
Image

P.S.

I don't need to be logged into the accounts; I just don't want the cookie popup.
I also can't add a Chrome profile because it causes errors (the container can't start if I uncomment the Chrome profile lines in the Docker Compose file):

[i] [2025-01-19 13:02:23] ArchiveBox v0.7.3: archivebox server --quick-init 0.0.0.0:8000
    > /data
[X] Could not find profile "Default" in CHROME_USER_DATA_DIR.
    /data/personas/Default/chrome_profile
    Make sure you set it to a Chrome user data directory containing a Default profile folder.
    For more info see:
        https://github.com/ArchiveBox/ArchiveBox/wiki/Configuration#CHROME_USER_DATA_DIR
    Try removing /Default from the end e.g.:
Traceback (most recent call last):
  File "/usr/local/bin/archivebox", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/app/archivebox/cli/__init__.py", line 140, in main
    run_subcommand(
  File "/app/archivebox/cli/__init__.py", line 74, in run_subcommand
    setup_django(in_memory_db=subcommand in fake_db, check_db=cmd_requires_db and not init_pending)
  File "/app/archivebox/config.py", line 1344, in setup_django
    check_system_config()
  File "/app/archivebox/config.py", line 1254, in check_system_config
    stderr('        CHROME_USER_DATA_DIR="{}"'.format(config['CHROME_USER_DATA_DIR'].split('/Default')[0]))
                                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'PosixPath' object has no attribute 'split'
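Note that the crash happens inside the help text itself: `check_system_config` calls `.split('/Default')` on `config['CHROME_USER_DATA_DIR']`, which in v0.7.3 is a `pathlib.Path` (per the `AttributeError`), and `Path` has no `split` method, so the suggested value after "Try removing /Default from the end e.g.:" is never printed. A minimal sketch of a Path-safe version of that suggestion, assuming the config value may be either a `str` or a `Path`; `suggest_user_data_dir` is a hypothetical helper, not ArchiveBox's actual fix:

```python
from pathlib import Path
from typing import Union


def suggest_user_data_dir(configured: Union[str, Path]) -> str:
    """Hypothetical helper: suggest a CHROME_USER_DATA_DIR with a trailing /Default removed.

    Accepts either a str or a pathlib.Path, so it cannot hit the
    "'PosixPath' object has no attribute 'split'" crash from the log above.
    """
    path = Path(configured)
    # Only strip the *last* component if it is Chrome's "Default" profile folder.
    # str.split('/Default')[0] cuts at the *first* match instead, which would turn
    # /data/personas/Default/chrome_profile into /data/personas.
    if path.name == "Default":
        return str(path.parent)
    return str(path)


# Reproduce the original failure mode: Path objects have no .split()
try:
    Path("/data/personas/Default/chrome_profile").split("/Default")
except AttributeError as err:
    print(f"AttributeError: {err}")

print(suggest_user_data_dir("/data/personas/Default/chrome_profile/Default"))
print(suggest_user_data_dir(Path("/data/personas/Default/chrome_profile")))
```

Comparing `path.name` rather than splitting a string matters here because the persona directory in this setup is itself called `Default`.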

Steps to reproduce

1. Started ArchiveBox on my server
2. Archived https://www.reddit.com/r/ProgrammerHumor/comments/1i4a48j/pushrejectedbydragon/ with the ArchiveBox Chrome extension
3. Archived https://www.youtube.com/watch?v=jNQXAC9IVRw with the ArchiveBox Chrome extension
4. Viewed the archived pages (onefile, pdf, screenshot, dom, wget, etc.)

Logs or errors

---------------------- Container logs ----------------------
[+] Adding URL: https://www.reddit.com/r/ProgrammerHumor/comments/1i4a48j/pushrejectedbydragon/
[*] [2025-01-19 12:29:08] Archiving 1/45 URLs from added set...
"GET /admin/core/snapshot/ HTTP/1.1" 200 107990
"GET /admin/jsi18n/ HTTP/1.1" 200 3191
"GET /admin/core/snapshot/ HTTP/1.1" 200 107988
"GET /admin/jsi18n/ HTTP/1.1" 200 3191
"GET /admin/core/snapshot/ HTTP/1.1" 200 108013
"GET /admin/jsi18n/ HTTP/1.1" 200 3191
"POST /add/ HTTP/1.1" 200 7037

---------------- data/logs/errors.log ------------------
Exception in archive_methods.save_media(Link(url=https://blog.stackademic.com/java-in-practice-one-line-code-for-performance-tracking-aa6a431faba0)) command=/usr/local/bin/archivebox server --quick-init 0.0.0.0:8000; ts=2025-01-19__12:14:55
FOREIGN KEY constraint failed
Internal Server Error: /add/
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/django/core/handlers/exception.py", line 47, in inner
    response = get_response(request)
               ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/core/handlers/base.py", line 181, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/views/generic/base.py", line 70, in view
    return self.dispatch(request, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/utils/decorators.py", line 43, in _wrapper
    return bound_method(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/views/decorators/csrf.py", line 54, in wrapped_view
    return view_func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/contrib/auth/mixins.py", line 109, in dispatch
    return super().dispatch(request, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/views/generic/base.py", line 98, in dispatch
    return handler(request, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/views/generic/edit.py", line 142, in post
    return self.form_valid(form)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/app/archivebox/core/views.py", line 290, in form_valid
    add(**input_kwargs)
  File "/app/archivebox/util.py", line 116, in typechecked_function
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/app/archivebox/main.py", line 693, in add
    archive_links(new_links, overwrite=False, **archive_kwargs)
  File "/app/archivebox/util.py", line 116, in typechecked_function
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/app/archivebox/extractors/__init__.py", line 236, in archive_links
    archive_link(to_archive, overwrite=overwrite, methods=methods, out_dir=Path(link.link_dir))
  File "/app/archivebox/util.py", line 116, in typechecked_function
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/app/archivebox/extractors/__init__.py", line 199, in archive_link
    write_link_details(link, out_dir=out_dir, skip_sql_index=False)
  File "/app/archivebox/util.py", line 116, in typechecked_function
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/app/archivebox/index/__init__.py", line 335, in write_link_details
    write_json_link_details(link, out_dir=out_dir)
  File "/app/archivebox/util.py", line 116, in typechecked_function
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/app/archivebox/index/json.py", line 99, in write_json_link_details
    atomic_write(str(path), link._asdict(extended=True))
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/archivebox/index/schema.py", line 193, in _asdict
    'snapshot_id': self.snapshot_id,
                   ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/utils/functional.py", line 48, in __get__
    res = instance.__dict__[self.name] = self.func(instance)
                                         ^^^^^^^^^^^^^^^^^^^
  File "/app/archivebox/index/schema.py", line 265, in snapshot_id
    return str(Snapshot.objects.only('id').get(url=self.url).id)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/db/models/query.py", line 429, in get
    raise self.model.DoesNotExist(
core.models.Snapshot.DoesNotExist: Snapshot matching query does not exist.

Exception in archive_methods.save_htmltotext(Link(url=https://www.reddit.com/r/ProgrammerHumor/comments/1i4a48j/pushrejectedbydragon/)) command=/usr/local/bin/archivebox server --quick-init 0.0.0.0:8000; ts=2025-01-19__12:29:38
cannot access local variable 'cmd' where it is not associated with a valu
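The last `errors.log` entry (truncated in the log) is the wording Python 3.11 uses for an `UnboundLocalError`. One common way it arises is when a name is assigned inside a `try` block that raises before the assignment runs, and an error handler then references that name. A generic illustration of the pattern, not ArchiveBox's actual `save_htmltotext` code; `build_cmd` and both `run_extractor_*` functions are hypothetical:

```python
def build_cmd(url: str) -> list:
    # Hypothetical step that fails before `cmd` is ever bound
    raise ValueError(f"cannot build command for {url}")


def run_extractor_buggy(url: str) -> str:
    try:
        cmd = build_cmd(url)
        return " ".join(cmd)
    except ValueError:
        # `cmd` was never assigned, so referencing it here raises UnboundLocalError
        return f"failed: {cmd}"


def run_extractor_fixed(url: str) -> str:
    cmd = None  # bind the name up front so error handlers can always reference it
    try:
        cmd = build_cmd(url)
        return " ".join(cmd)
    except ValueError as err:
        return f"failed: cmd={cmd!r} error={err}"


try:
    run_extractor_buggy("https://example.com")
except UnboundLocalError as err:
    print(type(err).__name__, err)

print(run_extractor_fixed("https://example.com"))
```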

ArchiveBox Version

0.7.3
ArchiveBox v0.7.3 COMMIT_HASH=069aabc BUILD_TIME=2024-12-15 09:54:03 1734256443
IN_DOCKER=True IN_QEMU=False ARCH=x86_64 OS=Linux PLATFORM=Linux-6.6.44-production+truenas-x86_64-with-glibc2.36 PYTHON=Cpython
FS_ATOMIC=True FS_REMOTE=True FS_USER=568:568 FS_PERMS=644
DEBUG=False IS_TTY=True TZ=UTC SEARCH_BACKEND=sonic LDAP=False

[i] Dependency versions:
 √  PYTHON_BINARY         v3.11.11        valid     /usr/local/bin/python3.11                                                   
 √  SQLITE_BINARY         v2.6.0          valid     /usr/local/lib/python3.11/sqlite3/dbapi2.py                                 
 √  DJANGO_BINARY         v3.1.14         valid     /usr/local/lib/python3.11/site-packages/django/__init__.py                  
 √  ARCHIVEBOX_BINARY     v0.7.3          valid     /usr/local/bin/archivebox                                                   

 √  CURL_BINARY           v8.10.1         valid     /usr/bin/curl                                                               
 √  WGET_BINARY           v1.21.3         valid     /usr/bin/wget                                                               
 √  NODE_BINARY           v20.18.1        valid     /usr/bin/node                                                               
 √  SINGLEFILE_BINARY     v1.1.54         valid     /app/node_modules/single-file-cli/single-file                               
 √  READABILITY_BINARY    v0.0.11         valid     /app/node_modules/readability-extractor/readability-extractor               
 √  MERCURY_BINARY        v1.0.0          valid     /app/node_modules/@postlight/parser/cli.js                                  
 √  GIT_BINARY            v2.39.5         valid     /usr/bin/git                                                                
 √  YOUTUBEDL_BINARY      v2024.12.13     valid     /usr/local/bin/yt-dlp                                                       
 √  CHROME_BINARY         v131.0.6778.33  valid     /usr/bin/chromium-browser                                                   
 √  RIPGREP_BINARY        v13.0.0         valid     /usr/bin/rg                                                                 

[i] Source-code locations:
 √  PACKAGE_DIR           24 files        valid     /app/archivebox                                                             
 √  TEMPLATES_DIR         3 files         valid     /app/archivebox/templates                                                   
 -  CUSTOM_TEMPLATES_DIR  -               disabled  None                                                                        

[i] Secrets locations:
 -  CHROME_USER_DATA_DIR  -               disabled  None                                                                        
 √  COOKIES_FILE          6.7 KB          valid     ./chrome-cookies/cookies.txt                                                

[i] Data locations:
 √  OUTPUT_DIR            7 files @       valid     /data                                                                       
 √  SOURCES_DIR           76 files        valid     ./sources                                                                   
 √  LOGS_DIR              1 files         valid     ./logs                                                                      
 √  ARCHIVE_DIR           45 files        valid     ./archive                                                                   
 √  CONFIG_FILE           81.0 Bytes      valid     ./ArchiveBox.conf                                                           
 √  SQL_INDEX             624.0 KB        valid     ./index.sqlite3

How did you install the version of ArchiveBox you are using?

Docker (or Podman/LXC/K8s/TrueNAS/Proxmox/etc)

What operating system are you running on?

Linux (Ubuntu/Debian/Arch/Alpine/etc.)

What type of drive are you using to store your ArchiveBox data?

  • [ ] some of data/ is on a local SSD or NVMe drive
  • [x] some of data/ is on a spinning hard drive or external USB drive
  • [ ] some of data/ is on a network mount (e.g. NFS/SMB/Ceph/GlusterFS/etc.)
  • [ ] some of data/ is on a FUSE mount (e.g. SSHFS/RClone/S3/B2/Google Drive/Dropbox/etc.)

Docker Compose Configuration

services:
    archivebox:
        image: archivebox/archivebox:latest
        # handled by traefik
        # ports:
            # - 8000:8000
        volumes:
            - "path/to/archivebox/data/on/host:/data"
            - "path/to/cookies/on/host:/data/chrome-cookies" 
            # - "path/to/chrome/profile/on/host:/data/personas/Default/chrome_profile"
        env_file:
          - stack.env
        networks:
            - archivebox-default
            - traefik-network
        environment:
            - ADMIN_USERNAME=adminuser  
            - ADMIN_PASSWORD=${DEFAULT_ADMIN_PASSWORD}
            - CSRF_TRUSTED_ORIGINS=https://archivebox.mydomain.xyz  
            - ALLOWED_HOSTS=* 
            - PUBLIC_INDEX=False            
            - PUBLIC_SNAPSHOTS=False            
            - PUBLIC_ADD_VIEW=False        
            - SEARCH_BACKEND_ENGINE=sonic     
            - SEARCH_BACKEND_HOST_NAME=sonic
            - SEARCH_BACKEND_PASSWORD=${SEARCH_BACKEND_PASSWORD}
            - PUID=apps-user-puid                       
            - PGID=apps-user-pgid                     
            - COOKIES_FILE=/data/chrome-cookies/cookies.txt
            # - CHROME_USER_DATA_DIR=/data/personas/Default/chrome_profile

            # - MEDIA_MAX_SIZE=5000m        
            # - TIMEOUT=60                   
            # - CHECK_SSL_VALIDITY=True  
            - SAVE_ARCHIVE_DOT_ORG=False     
            # - USER_AGENT="..."    
        dns:
            - ip.to.pihole
        labels:
           # traefik labels
    sonic:
            image: archivebox/sonic:latest
            expose:
                - 1491
            env_file:
                - stack.env
            networks:
                - archivebox-default
            environment:
                - SEARCH_BACKEND_PASSWORD=${SEARCH_BACKEND_PASSWORD}
            volumes:
                - "/path/to/sonic/data/on/host:/var/lib/sonic/store"
            cpus: 2.0
            mem_limit: 2048m

    novnc:
        image: theasp/novnc:latest
        environment:
            - DISPLAY_WIDTH=1920
            - DISPLAY_HEIGHT=1080
            - RUN_XTERM=no
        networks:
            - archivebox-default
        ports:
            - 8080:8080

networks:
    traefik-network:
        external: true
    archivebox-default:

ArchiveBox Configuration

[SERVER_CONFIG]
SECRET_KEY = <redacted>
kerem closed this issue 2026-03-01 17:59:24 +03:00
Author
Owner

@TooManyStacks commented on GitHub (Jan 24, 2025):

I can confirm that with the latest tag, neither the cookies.txt nor the Chromium profile is working.

@pirate commented on GitHub (Jan 26, 2025):

OP's post indicates no CHROME_USER_DATA_DIR is set up.

The cookies.txt file only applies to a few of the methods (wget, curl, yt-dlp); for the rest, a CHROME_USER_DATA_DIR must be set up.
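To make that split concrete: a Netscape-format cookies.txt is a tab-separated text file that command-line tools such as wget, curl and yt-dlp load and replay on their own HTTP requests, so it does nothing for the Chrome-based methods, which need the cookies to be in a Chrome profile instead. A minimal sketch that uses Python's standard-library `http.cookiejar.MozillaCookieJar` (which reads this format) to sanity-check a cookies file before pointing `COOKIES_FILE` at it; the sample cookie and the `cookie_domains` helper are made up for illustration:

```python
import http.cookiejar
import tempfile
from pathlib import Path

# Made-up example content in Netscape cookies.txt format (tab-separated:
# domain, include-subdomains flag, path, secure flag, expiry epoch, name, value).
SAMPLE_COOKIES = (
    "# Netscape HTTP Cookie File\n"
    ".example.com\tTRUE\t/\tFALSE\t4102444800\tconsent\tyes\n"
)


def cookie_domains(cookies_file: Path) -> set:
    """Parse a Netscape-format cookies.txt and return the domains it contains.

    MozillaCookieJar.load() raises http.cookiejar.LoadError if the file is not
    in Netscape format, so this doubles as a basic validity check.
    """
    jar = http.cookiejar.MozillaCookieJar(str(cookies_file))
    # ignore_discard keeps session cookies; ignore_expires keeps expired ones
    jar.load(ignore_discard=True, ignore_expires=True)
    return {cookie.domain for cookie in jar}


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "cookies.txt"
    path.write_text(SAMPLE_COOKIES)
    print(cookie_domains(path))
```

A file that loads cleanly is only well-formed; whether a site treats those cookies as an accepted consent banner is up to the site.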

@orthodoxe commented on GitHub (Jan 28, 2025):

> OP's post indicates no CHROME_USER_DATA_DIR is set up.
>
> The cookies.txt file only applies to a few of the methods (wget, curl, yt-dlp), for the rest a CHROME_USER_DATA_DIR must be set up.

Thank you for your response.
I tried uncommenting the Chrome profile lines, but when I restart the Docker Compose stack I get this error from the main ArchiveBox container:

AttributeError: 'PosixPath' object has no attribute 'split'
[i] [2025-01-28 14:19:48] ArchiveBox v0.7.3: archivebox server --quick-init 0.0.0.0:8000
    > /data
[X] Could not find profile "Default" in CHROME_USER_DATA_DIR.
    /data/personas/Default/chrome_profile
    Make sure you set it to a Chrome user data directory containing a Default profile folder.
    For more info see:
        https://github.com/ArchiveBox/ArchiveBox/wiki/Configuration#CHROME_USER_DATA_DIR
    Try removing /Default from the end e.g.:
Traceback (most recent call last):
  File "/usr/local/bin/archivebox", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/app/archivebox/cli/__init__.py", line 140, in main
    run_subcommand(
  File "/app/archivebox/cli/__init__.py", line 74, in run_subcommand
    setup_django(in_memory_db=subcommand in fake_db, check_db=cmd_requires_db and not init_pending)
  File "/app/archivebox/config.py", line 1344, in setup_django
    check_system_config()
  File "/app/archivebox/config.py", line 1254, in check_system_config
    stderr('        CHROME_USER_DATA_DIR="{}"'.format(config['CHROME_USER_DATA_DIR'].split('/Default')[0]))
                                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'PosixPath' object has no attribute 'split'

Inside the Chrome profile folder I have the profile's data directly, not another folder named "Default" or anything else.

@pirate commented on GitHub (Jan 28, 2025):

If there's no "Default" folder inside, then you're using the wrong folder; pass it the parent of the folder you're using now.

@orthodoxe commented on GitHub (Jan 28, 2025):

I put the Chrome profile's data inside a folder named "Default", so now I have:

chrome-profile
└ Default
   └ profile's data (it's a placeholder, not an actual directory)

I no longer get the error, but websites archived with onefile (and the other archiving methods that use the Chrome profile) still show the cookie popup.

@pirate commented on GitHub (Jan 28, 2025):

It should look like this:

```bash
CHROME_USER_DATA_DIR=/path/to/chrome_profile
```

```
/path/to/chrome_profile
└ AutofillStates
└ CertificateRevocation
└ Crowd Deny
└ Default
  └ AutofillStrikeDatabase
  └ blob_storage
  └ BudgetDatabase
  └ Cache
  └ ...
  └ Cookies
  └ History
  └ Preferences
└ FileTypePolicies
└ ...
└ SingletonCookie
└ SingletonLock
└ SingletonSocket
```
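Before pointing ArchiveBox at a folder, it can help to check that it has this shape, i.e. that the folder you pass contains a `Default` subfolder with Chrome's `Preferences` file inside it. The function below is a hypothetical pre-flight check sketched from the layout above, not ArchiveBox's own validation, and it only looks at `Default/` and `Default/Preferences`.

```bash
# Hypothetical pre-flight check, sketched from the layout above (not ArchiveBox's
# own validation): does this folder look like a Chrome user data dir?
check_profile_dir() {
    local dir="$1"
    if [ ! -d "$dir" ]; then
        echo "error: not a directory: $dir" >&2
        return 1
    fi
    if [ ! -d "$dir/Default" ]; then
        if [ -f "$dir/Preferences" ]; then
            # Preferences normally lives in Default/, so this is probably the
            # profile folder itself rather than the user data dir around it
            echo "error: $dir looks like a single profile folder, not the user data dir containing Default/" >&2
        else
            echo "error: no Default/ subfolder in $dir" >&2
        fi
        return 1
    fi
    if [ ! -f "$dir/Default/Preferences" ]; then
        # not fatal: Chrome may simply not have been launched with this dir yet
        echo "warning: $dir/Default/Preferences not found" >&2
    fi
    echo "ok: $dir"
}
```

For the layout above, `check_profile_dir /path/to/chrome_profile` prints `ok: /path/to/chrome_profile`; passing `.../chrome_profile/Default`, or a folder that has `Preferences` directly inside it, returns non-zero with a hint on stderr.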

ArchiveBox also has the concept of a "Persona", which is a folder that contains all the state needed to impersonate a human (which can include a Chrome profile dir). Don't be confused by `Default` appearing twice in the path if you're putting the Chrome profile in the `/data/personas/Default` dir: `personas/Default` is created by ArchiveBox (but it can be any path, you don't have to put it in the personas dir), while the `chrome_profile/Default` subdir is created by Chrome and cannot be relocated.

- ✅ `CHROME_USER_DATA_DIR=/any/path/to/chrome_profile` <- this is ok (this dir should already contain a `Default` dir inside that's created by Chrome)
- ✅ `CHROME_USER_DATA_DIR=/data/personas/Default/chrome_profile` <- this is ok (this dir should already contain a `Default` dir inside that's created by Chrome)
- ❌ `CHROME_USER_DATA_DIR=/data/personas/Default/chrome_profile/Default` <- this is incorrect, take the `/Default` off the end (the error you saw was in the help text explaining this)
- ❌ `CHROME_USER_DATA_DIR=/any/path/to/chrome_profile/Default` <- this is incorrect (same, take `/Default` off the end)
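If you're not sure whether a path you have ends in Chrome's own `Default` subfolder, a small shell function can strip it the way the error's help text suggests ("Try removing /Default from the end"). This is a minimal sketch: `normalize_profile_dir` is a hypothetical helper, not something ArchiveBox ships, and it only rewrites the path string without touching the filesystem.

```bash
# Hypothetical helper (not part of ArchiveBox): turn a path that points at the
# chrome_profile/Default subfolder into the user data dir that contains it.
normalize_profile_dir() {
    local dir="$1"
    # drop trailing slashes, e.g. ".../chrome_profile/Default/" -> ".../chrome_profile/Default"
    while [ "${dir%/}" != "$dir" ]; do
        dir="${dir%/}"
    done
    # drop one trailing "/Default" component (the subfolder Chrome creates itself);
    # a "Default" earlier in the path, like personas/Default, is left alone
    dir="${dir%/Default}"
    printf '%s\n' "$dir"
}
```

For example, `archivebox config --set CHROME_USER_DATA_DIR="$(normalize_profile_dir /data/personas/Default/chrome_profile/Default)"` would store `/data/personas/Default/chrome_profile`, matching the second ✅ case.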

For example, to create and use a new Chrome profile stored in `~/Desktop/test_profile`, you'd run:

```bash
# `chrome` stands for your Chrome/Chromium binary (its name varies by OS, e.g. chromium or google-chrome)

# 1.a. if you want to open the UI so you can log into things to seed the new profile with cookies
chrome --user-data-dir="$HOME/Desktop/test_profile"

# 1.b. or if you just want to create a new blank chrome profile with no cookies in it
chrome --user-data-dir="$HOME/Desktop/test_profile" --headless=new --screenshot 'https://example.com'

# 2. after chrome exits, ~/Desktop/test_profile will contain the dir you should pass to archivebox
archivebox config --set CHROME_USER_DATA_DIR="$HOME/Desktop/test_profile"

# 3. check that it's valid and usable by archivebox
archivebox version | grep CHROME_USER_DATA_DIR
```
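If the profile is created on the host this way and then copied into a folder that's mounted into the ArchiveBox container, Chrome's `SingletonLock`, `SingletonSocket` and `SingletonCookie` files (visible in the listing above) can come along with it, and a leftover lock from another machine may make Chromium treat the profile as in use by another process. A minimal sketch of a copy step that drops those files, assuming the source browser has already exited; `copy_profile` and both paths are hypothetical.

```bash
# Hypothetical helper: copy a host-created Chrome profile into a destination
# (e.g. the folder mounted into the ArchiveBox container) and remove Chrome's
# per-process lock files so the copy isn't seen as "in use".
# Only run this after the browser that used the source profile has exited.
copy_profile() {
    local src="$1" dest="$2"
    mkdir -p "$dest"
    # "$src/." copies the folder's contents (including hidden files) into $dest
    cp -a "$src/." "$dest/"
    # on Linux these are symlinks; rm -f removes them without following them
    rm -f "$dest/SingletonLock" "$dest/SingletonSocket" "$dest/SingletonCookie"
}
```

For example, `copy_profile "$HOME/Desktop/test_profile" ./data/personas/Default/chrome_profile`, assuming `./data` is the host folder your docker compose file mounts at `/data`; you would then set `CHROME_USER_DATA_DIR=/data/personas/Default/chrome_profile`.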

@orthodoxe commented on GitHub (Jan 29, 2025):

> For example to create and use a new chrome profile stored in ~/Desktop/test_profile you'd run:
> ```bash
> # 1.a. if you want to open the UI so you can log into things to seed the new profile with cookies
> chrome --user-data-dir=$HOME/Desktop/test_profile
> # 1.b. or if you just want to create a new blank chrome profile with no cookies in it
> chrome --user-data-dir=$HOME/Desktop/test_profile --headless=new --screenshot 'https://example.com'
> ```

Should I create the profile with Google Chrome or Chromium? From my reading of the docs I should use Chromium, but does ArchiveBox accept a Google Chrome profile too, or do I have to find a way to install Chromium?


@pirate commented on GitHub (Jan 31, 2025):

You can use either one (controlled by `archivebox config --set CHROME_BINARY=chromium`), but whatever you use needs to match: the browser that creates the profile needs to be on the same OS and CPU architecture, and ideally should be the exact same Chromium/Chrome binary.

Depending on your OS, you can use any of these:

https://github.com/ArchiveBox/ArchiveBox/blob/12f109b1be9577f5d7adde0a93021496e2d60624/archivebox/pkgs/abx-plugin-chrome/abx_plugin_chrome/binaries.py#L27C1-L51C76

and probably other Chromium-based browsers too, like `brave`.
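To check whether a profile and the binary ArchiveBox will run actually match, you can compare version numbers. This is a rough sketch with hypothetical helper names, assuming the profile folder has a `Last Version` file at its top level (recent desktop Chrome/Chromium builds generally write one; if yours doesn't, the function says so and you'll need to compare by hand). It only compares versions, not the OS or CPU architecture.

```bash
# Hypothetical helpers (not part of ArchiveBox). Assumes the profile folder has a
# "Last Version" file at its top level; only the version number is compared.
version_of() {
    # first dotted version on stdin, e.g. "Chromium 120.0.6099.109 built on ..." -> "120.0.6099.109"
    grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1
}

check_profile_matches_binary() {
    local dir="$1" binary="$2" profile_version binary_version
    if [ ! -f "$dir/Last Version" ]; then
        echo "unknown: no 'Last Version' file in $dir, compare versions by hand" >&2
        return 2
    fi
    profile_version="$(version_of < "$dir/Last Version")"
    binary_version="$("$binary" --version | version_of)"
    if [ "$profile_version" = "$binary_version" ]; then
        echo "match: $binary $binary_version"
    else
        echo "mismatch: profile last used by $profile_version, $binary is $binary_version" >&2
        return 1
    fi
}
```

For example, `check_profile_matches_binary "$HOME/Desktop/test_profile" chromium` returns 0 on a match, 1 on a mismatch and 2 if there is no `Last Version` file; pass whatever binary you set as `CHROME_BINARY`.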


@orthodoxe commented on GitHub (Feb 10, 2025):

Hello, sorry to respond so late, but life's been busy.
I tried making a Chromium profile and it loads without errors, but it still isn't being used.

This is the `Default` folder inside `chrome-profile`. The mounted folder is `chrome-profile`.

[Screenshots]

I still get the cookie popup in outputs like SingleFile:

[Screenshot]


@pirate commented on GitHub (Feb 11, 2025):

SingleFile not respecting the Chrome profile is a known issue in v0.7.3. Is it working for screenshot, PDF, and DOM though? Those are the only methods it applies to.


@orthodoxe commented on GitHub (Feb 11, 2025):

Yes, it's working for PDF, screenshot and DOM: I don't see the cookie popup there.
Then I'll wait for some bug fixes, and maybe help with the project myself.
Anyway, thanks for all the help.
