[GH-ISSUE #1622] Bug: New/re snapshots failing #2481

Open
opened 2026-03-01 17:59:20 +03:00 by kerem · 7 comments

Originally created by @parkerlreed on GitHub (Dec 20, 2024).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/1622

Originally assigned to: @pirate on GitHub.

Provide a screenshot and describe the bug

Creating new snapshots and re-snapshotting existing links both fail.

Screenshot: https://github.com/user-attachments/assets/346feabc-5178-4783-a949-9c8f09353705

Steps to reproduce

1. Updated to the latest version in Podman
2. Tried to re-snapshot a link
3. While debugging, deleted the existing snapshot and tried again
4. Got the same error

Logs or errors

Not Found: /archive/1734681023.702893/favicon.ico
Internal Server Error: /admin/core/snapshot/
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/django/core/handlers/exception.py", line 47, in inner
    response = get_response(request)
               ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/core/handlers/base.py", line 181, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/contrib/admin/options.py", line 614, in wrapper
    return self.admin_site.admin_view(view)(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/utils/decorators.py", line 130, in _wrapped_view
    response = view_func(request, *args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/views/decorators/cache.py", line 44, in _wrapped_view_func
    response = view_func(request, *args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/contrib/admin/sites.py", line 233, in inner
    return view(request, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/utils/decorators.py", line 43, in _wrapper
    return bound_method(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/utils/decorators.py", line 130, in _wrapped_view
    response = view_func(request, *args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/contrib/admin/options.py", line 1719, in changelist_view
    response = self.response_action(request, queryset=cl.get_queryset(request))
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/contrib/admin/options.py", line 1402, in response_action
    response = func(self, request, queryset)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/archivebox/core/admin.py", line 263, in resnapshot_snapshot
    add(new_url, tag=snapshot.tags_str())
  File "/app/archivebox/util.py", line 116, in typechecked_function
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/app/archivebox/main.py", line 693, in add
    archive_links(new_links, overwrite=False, **archive_kwargs)
  File "/app/archivebox/util.py", line 116, in typechecked_function
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/app/archivebox/extractors/__init__.py", line 236, in archive_links
    archive_link(to_archive, overwrite=overwrite, methods=methods, out_dir=Path(link.link_dir))
  File "/app/archivebox/util.py", line 116, in typechecked_function
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/app/archivebox/extractors/__init__.py", line 199, in archive_link
    write_link_details(link, out_dir=out_dir, skip_sql_index=False)
  File "/app/archivebox/util.py", line 116, in typechecked_function
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/app/archivebox/index/__init__.py", line 335, in write_link_details
    write_json_link_details(link, out_dir=out_dir)
  File "/app/archivebox/util.py", line 116, in typechecked_function
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/app/archivebox/index/json.py", line 99, in write_json_link_details
    atomic_write(str(path), link._asdict(extended=True))
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/archivebox/index/schema.py", line 193, in _asdict
    'snapshot_id': self.snapshot_id,
                   ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/utils/functional.py", line 48, in __get__
    res = instance.__dict__[self.name] = self.func(instance)
                                         ^^^^^^^^^^^^^^^^^^^
  File "/app/archivebox/index/schema.py", line 265, in snapshot_id
    return str(Snapshot.objects.only('id').get(url=self.url).id)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/db/models/query.py", line 429, in get
    raise self.model.DoesNotExist(
core.models.Snapshot.DoesNotExist: Snapshot matching query does not exist.
"POST /admin/core/snapshot/ HTTP/1.1" 500 145
"GET /admin/core/snapshot/ HTTP/1.1" 200 54248
"GET /admin/jsi18n/ HTTP/1.1" 200 3191
Not Found: /archive/1734681023.702893/favicon.ico
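The last frames of the traceback show `Link._asdict()` reading the lazily cached `snapshot_id` property, which runs `Snapshot.objects.only('id').get(url=self.url)`. Django's `.get()` raises `DoesNotExist` when no row has exactly that URL, and the exception aborts `write_json_link_details`. The following is a minimal pure-Python model of that pattern, not ArchiveBox code: `SNAPSHOT_ROWS`, `SnapshotDoesNotExist`, `get_snapshot_id`, `find_snapshot_id` and this simplified `Link` class are hypothetical stand-ins for the Snapshot table, its queries and `index/schema.py`. It does not explain why the row was missing in this report; it only shows that the JSON write fails whenever the lookup runs before a row with an identical URL exists, and how a `.filter(...).first()`-style lookup would return `None` instead of raising.

```python
from functools import cached_property
from typing import Optional


class SnapshotDoesNotExist(LookupError):
    """Hypothetical stand-in for Django's Snapshot.DoesNotExist."""


# Hypothetical stand-in for the Snapshot SQL table: {url: snapshot_id}
SNAPSHOT_ROWS: dict[str, str] = {}


def get_snapshot_id(url: str) -> str:
    """Models Snapshot.objects.only('id').get(url=url).id: raises if no row matches."""
    try:
        return SNAPSHOT_ROWS[url]
    except KeyError:
        raise SnapshotDoesNotExist("Snapshot matching query does not exist.") from None


def find_snapshot_id(url: str) -> Optional[str]:
    """Models a defensive .filter(url=url).first() lookup: None if no row matches."""
    return SNAPSHOT_ROWS.get(url)


class Link:
    """Hypothetical, heavily simplified stand-in for ArchiveBox's Link schema."""

    def __init__(self, url: str) -> None:
        self.url = url

    @cached_property
    def snapshot_id(self) -> str:
        # Evaluated lazily on first read; if the lookup raises, nothing is
        # cached, so the next read runs the lookup again.
        return get_snapshot_id(self.url)

    def asdict(self) -> dict:
        # Building the JSON details forces the snapshot_id lookup.
        return {"url": self.url, "snapshot_id": self.snapshot_id}


if __name__ == "__main__":
    link = Link("https://example.com/")
    try:
        link.asdict()
    except SnapshotDoesNotExist as err:
        print("JSON details write would fail:", err)
    print("defensive lookup returns:", find_snapshot_id(link.url))

    # Once a row with the identical URL exists, the same Link serializes fine.
    SNAPSHOT_ROWS[link.url] = "snapshot-1"
    print(link.asdict())
```

Run as a script, it prints the `DoesNotExist`-style message, then `None`, then the serialized dict once the row has been added.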

ArchiveBox Version

0.7.3
ArchiveBox v0.7.3 COMMIT_HASH=069aabc BUILD_TIME=2024-12-15 09:54:03 1734256443
IN_DOCKER=True IN_QEMU=False ARCH=x86_64 OS=Linux PLATFORM=Linux-6.12.4-arch1-1-x86_64-with-glibc2.36 PYTHON=Cpython
FS_ATOMIC=True FS_REMOTE=True FS_USER=911:911 FS_PERMS=644
DEBUG=False IS_TTY=True TZ=UTC SEARCH_BACKEND=sonic LDAP=False

[i] Dependency versions:
 √  PYTHON_BINARY         v3.11.11        valid     /usr/local/bin/python3.11                                                   
 √  SQLITE_BINARY         v2.6.0          valid     /usr/local/lib/python3.11/sqlite3/dbapi2.py                                 
 √  DJANGO_BINARY         v3.1.14         valid     /usr/local/lib/python3.11/site-packages/django/__init__.py                  
 √  ARCHIVEBOX_BINARY     v0.7.3          valid     /usr/local/bin/archivebox                                                   

 √  CURL_BINARY           v8.10.1         valid     /usr/bin/curl                                                               
 √  WGET_BINARY           v1.21.3         valid     /usr/bin/wget                                                               
 √  NODE_BINARY           v20.18.1        valid     /usr/bin/node                                                               
 √  SINGLEFILE_BINARY     v1.1.54         valid     /app/node_modules/single-file-cli/single-file                               
 √  READABILITY_BINARY    v0.0.11         valid     /app/node_modules/readability-extractor/readability-extractor               
 √  MERCURY_BINARY        v1.0.0          valid     /app/node_modules/@postlight/parser/cli.js                                  
 √  GIT_BINARY            v2.39.5         valid     /usr/bin/git                                                                
 √  YOUTUBEDL_BINARY      v2024.12.13     valid     /usr/local/bin/yt-dlp                                                       
 √  CHROME_BINARY         v131.0.6778.33  valid     /usr/bin/chromium-browser                                                   
 √  RIPGREP_BINARY        v13.0.0         valid     /usr/bin/rg                                                                 

[i] Source-code locations:
 √  PACKAGE_DIR           23 files        valid     /app/archivebox                                                             
 √  TEMPLATES_DIR         3 files         valid     /app/archivebox/templates                                                   
 -  CUSTOM_TEMPLATES_DIR  -               disabled  None                                                                        

[i] Secrets locations:
 -  CHROME_USER_DATA_DIR  -               disabled  None                                                                        
 -  COOKIES_FILE          -               disabled  None                                                                        

[i] Data locations:
 √  OUTPUT_DIR            10 files @      valid     /data                                                                       
 √  SOURCES_DIR           15 files        valid     ./sources                                                                   
 √  LOGS_DIR              2 files         valid     ./logs                                                                      
 √  ARCHIVE_DIR           16 files        valid     ./archive                                                                   
 √  CONFIG_FILE           81.0 Bytes      valid     ./ArchiveBox.conf                                                           
 √  SQL_INDEX             436.0 KB        valid     ./index.sqlite3
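The report contains two Unix timestamps that can be cross-checked. `BUILD_TIME` is printed both as `2024-12-15 09:54:03` and as `1734256443`, and the 404 log lines reference `/archive/1734681023.702893/`. Assuming that folder name is the snapshot's epoch timestamp (the naming convention ArchiveBox uses for directories under `archive/`), the short conversion below places the snapshot referenced in those log lines on 2024-12-20 UTC. That is the day the issue was filed and several days after the v0.7.3 image was built. `epoch_to_utc` is a hypothetical helper written only for this check.

```python
from datetime import datetime, timezone


def epoch_to_utc(value: str) -> datetime:
    """Convert an epoch string such as '1734681023.702893' to a UTC datetime.

    Hypothetical helper: only the whole-second part is used, so float
    rounding of the fractional microseconds cannot affect the result.
    """
    seconds = int(value.split(".", 1)[0])
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


build_time = epoch_to_utc("1734256443")            # BUILD_TIME from `archivebox version`
snapshot_time = epoch_to_utc("1734681023.702893")  # folder name from the 404 log lines

print(build_time.isoformat())      # 2024-12-15T09:54:03+00:00
print(snapshot_time.isoformat())   # 2024-12-20T07:50:23+00:00
print(snapshot_time - build_time)  # 4 days, 21:56:20
```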

How did you install the version of ArchiveBox you are using?

Docker (or other container system like podman/LXC/Kubernetes or TrueNAS/Cloudron/YunoHost/etc.)

What operating system are you running on?

Linux: Arch Linux

What type of drive are you using to store your ArchiveBox data?

  • [x] data/ is on a local SSD or NVMe drive
  • [ ] data/ is on a spinning hard drive or external USB drive
  • [ ] data/ is on a network mount (e.g. NFS/SMB/CIFS/etc.)
  • [ ] data/ is on a FUSE mount (e.g. SSHFS/RClone/S3/B2/OneDrive, etc.)

Docker Compose Configuration

# Usage:
#     curl -fsSL 'https://docker-compose.archivebox.io' > docker-compose.yml
#     docker compose up
#     docker compose run archivebox version
#     docker compose run -T archivebox add < urls_to_archive.txt
#     docker compose run archivebox add --depth=1 'https://news.ycombinator.com'
#     docker compose run archivebox config --set SAVE_ARCHIVE_DOT_ORG=False
#     docker compose run archivebox help
# Documentation:
#     https://github.com/ArchiveBox/ArchiveBox/wiki/Docker#docker-compose

services:
    archivebox:
        image: archivebox/archivebox:latest
        ports:
            - 8000:8000
        volumes:
            - ./data:/data
            # ./data/personas/Default/chrome_profile/Default:/data/personas/Default/chrome_profile/Default
        environment:
            # - ADMIN_USERNAME=admin            # creates an admin user on first run with the given user/pass combo
            # - ADMIN_PASSWORD=SomeSecretPassword
            - CSRF_TRUSTED_ORIGINS=https://archivebox.example.com  # REQUIRED for auth, REST API, etc. to work
            - ALLOWED_HOSTS=*                   # set this to the hostname(s) from your CSRF_TRUSTED_ORIGINS
            - PUBLIC_INDEX=True                 # set to False to prevent anonymous users from viewing snapshot list
            - PUBLIC_SNAPSHOTS=True             # set to False to prevent anonymous users from viewing snapshot content
            - PUBLIC_ADD_VIEW=False             # set to True to allow anonymous users to submit new URLs to archive
            - SEARCH_BACKEND_ENGINE=sonic       # tells ArchiveBox to use sonic container below for fast full-text search
            - SEARCH_BACKEND_HOST_NAME=sonic
            - SEARCH_BACKEND_PASSWORD=SomeSecretPassword
            # - PUID=911                        # set to your host user's UID & GID if you encounter permissions issues
            # - PGID=911                        # UID/GIDs <500 may clash with existing users and are not recommended
            # For options below, it's better to set using `docker compose run archivebox config --set SOME_KEY=someval` instead of setting here:
            # - MEDIA_MAX_SIZE=750m             # increase this filesize limit to allow archiving larger audio/video files
            # - TIMEOUT=60                      # increase this number to 120+ seconds if you see many slow downloads timing out
            # - CHECK_SSL_VALIDITY=True         # set to False to disable strict SSL checking (allows saving URLs w/ broken certs)
            # - SAVE_ARCHIVE_DOT_ORG=True       # set to False to disable submitting all URLs to Archive.org when archiving
            # - USER_AGENT="..."                # set a custom USER_AGENT to avoid being blocked as a bot
            # ...
            # For more info, see: https://github.com/ArchiveBox/ArchiveBox/wiki/Docker#configuration
            
        # For ad-blocking during archiving, uncomment this section and the pihole service below
        # networks:
        #   - dns
        # dns:
        #   - 172.20.0.53


    ######## Optional Addons: tweak examples below as needed for your specific use case ########

    ### This optional container runs scheduled jobs in the background (and retries failed ones). To add a new job:
    #   $ docker compose run archivebox schedule --add --every=day --depth=1 'https://example.com/some/rss/feed.xml'
    # then restart the scheduler container to apply any changes to the scheduled task list:
    #   $ docker compose restart archivebox_scheduler
    # https://github.com/ArchiveBox/ArchiveBox/wiki/Scheduled-Archiving

    archivebox_scheduler:
        
        image: archivebox/archivebox:latest
        command: schedule --foreground --update --every=day
        environment:
            # - PUID=911                        # set to your host user's UID & GID if you encounter permissions issues
            # - PGID=911
            - TIMEOUT=120                       # use a higher timeout than the main container to give slow tasks more time when retrying
            - SEARCH_BACKEND_ENGINE=sonic       # tells ArchiveBox to use sonic container below for fast full-text search
            - SEARCH_BACKEND_HOST_NAME=sonic
            - SEARCH_BACKEND_PASSWORD=SomeSecretPassword
            # For other config it's better to set using `docker compose run archivebox config --set SOME_KEY=someval` instead of setting here
            # ...
            # For more info, see: https://github.com/ArchiveBox/ArchiveBox/wiki/Docker#configuration
        volumes:
            - ./data:/data
        # cpus: 2                               # uncomment / edit these values to limit scheduler container resource consumption
        # mem_limit: 2048m
        # restart: always


    ### This runs the optional Sonic full-text search backend (much faster than default rg backend).
    # If Sonic is ever started after not running for a while, update its full-text index by running:
    #   $ docker-compose run archivebox update --index-only
    # https://github.com/ArchiveBox/ArchiveBox/wiki/Setting-up-Search

    sonic:
        image: archivebox/sonic:latest
        expose:
            - 1491
        environment:
            - SEARCH_BACKEND_PASSWORD=SomeSecretPassword
        volumes:
            #- ./sonic.cfg:/etc/sonic.cfg:ro    # mount to customize: https://raw.githubusercontent.com/ArchiveBox/ArchiveBox/stable/etc/sonic.cfg
            - ./data/sonic:/var/lib/sonic/store


    ### This optional container runs xvfb+noVNC so you can watch the ArchiveBox browser as it archives things,
    # or remote control it to set up a chrome profile w/ login credentials for sites you want to archive.
    # https://github.com/ArchiveBox/ArchiveBox/wiki/Chromium-Install#setting-up-a-chromium-user-profile
    # https://github.com/ArchiveBox/ArchiveBox/wiki/Chromium-Install#docker-vnc-setup

    novnc:
        image: theasp/novnc:latest
        environment:
            - DISPLAY_WIDTH=1920
            - DISPLAY_HEIGHT=1080
            - RUN_XTERM=no
        ports:
            # to view/control ArchiveBox's browser, visit: http://127.0.0.1:8080/vnc.html
            # restricted to access from localhost by default because it has no authentication
            - 127.0.0.1:8080:8080


    ### Example: Put Nginx in front of the ArchiveBox server for SSL termination and static file serving.
    # You can also use any other ingress provider for SSL like Apache, Caddy, Traefik, Cloudflare Tunnels, etc.

    # nginx:
    #     image: nginx:alpine
    #     ports:
    #         - 443:443
    #         - 80:80
    #     volumes:
    #         - ./etc/nginx.conf:/etc/nginx/nginx.conf
    #         - ./data:/var/www


    ### Example: To run pihole in order to block ad/tracker requests during archiving,
    # uncomment this optional block and set up pihole using its admin interface

    # pihole:
    #   image: pihole/pihole:latest
    #   ports:
    #     # access the admin HTTP interface on http://localhost:8090
    #     - 127.0.0.1:8090:80
    #   environment:
    #     - WEBPASSWORD=SET_THIS_TO_SOME_SECRET_PASSWORD_FOR_ADMIN_DASHBOARD
    #     - DNSMASQ_LISTENING=all
    #   dns:
    #     - 127.0.0.1
    #     - 1.1.1.1
    #   networks:
    #     dns:
    #       ipv4_address: 172.20.0.53
    #   volumes:
    #     - ./etc/pihole:/etc/pihole
    #     - ./etc/dnsmasq:/etc/dnsmasq.d


    ### Example: run all your ArchiveBox traffic through a WireGuard VPN tunnel to avoid IP blocks.
    # You can also use any other VPN that works at the docker/IP level, e.g. Tailscale, OpenVPN, etc.

    # wireguard:
    #   image: linuxserver/wireguard:latest
    #   network_mode: 'service:archivebox'
    #   cap_add:
    #     - NET_ADMIN
    #     - SYS_MODULE
    #   sysctls:
    #     - net.ipv4.conf.all.rp_filter=2
    #     - net.ipv4.conf.all.src_valid_mark=1
    #   volumes:
    #     - /lib/modules:/lib/modules
    #     - ./wireguard.conf:/config/wg0.conf:ro

    ### Example: Run ChangeDetection.io to watch for changes to websites, then trigger ArchiveBox to archive them
    # Documentation: https://github.com/dgtlmoon/changedetection.io
    # More info: https://github.com/dgtlmoon/changedetection.io/blob/master/docker-compose.yml

    # changedetection:
    #     image: ghcr.io/dgtlmoon/changedetection.io
    #     volumes:
    #         - ./data-changedetection:/datastore


    ### Example: Run PYWB in parallel and auto-import WARCs from ArchiveBox

    # pywb:
    #     image: webrecorder/pywb:latest
    #     entrypoint: /bin/sh -c '(wb-manager init default || test $$? -eq 2) && wb-manager add default /archivebox/archive/*/warc/*.warc.gz; wayback;'
    #     environment:
    #         - INIT_COLLECTION=archivebox
    #     ports:
    #         - 8686:8080
    #     volumes:
    #         - ./data:/archivebox
    #         - ./data/wayback:/webarchive


networks:
    # network just used for pihole container to offer :53 dns resolving on fixed ip for archivebox container
    dns:
        ipam:
            driver: default
            config:
                - subnet: 172.20.0.0/24


# HOW TO: Set up cloud storage for your ./data/archive (e.g. Amazon S3, Backblaze B2, Google Drive, OneDrive, SFTP, etc.)
#   https://github.com/ArchiveBox/ArchiveBox/wiki/Setting-Up-Storage
#
#   Follow the steps here to set up the Docker RClone Plugin https://rclone.org/docker/
#     $ docker plugin install rclone/docker-volume-rclone:amd64 --grant-all-permissions --alias rclone
#     $ nano /var/lib/docker-plugins/rclone/config/rclone.conf
#     [examplegdrive]
#     type = drive
#     scope = drive
#     drive_id = 1234567...
#     root_folder_id = 0Abcd...
#     token = {"access_token":...}

# volumes:
#     archive:
#         driver: rclone
#         driver_opts:
#             remote: 'examplegdrive:archivebox'
#             allow_other: 'true'
#             vfs_cache_mode: full
#             poll_interval: 0

ArchiveBox Configuration

[SERVER_CONFIG]
SECRET_KEY = redacted

@rcarmo commented on GitHub (Jan 8, 2025):

I've noticed the same ever since I updated my Docker image to 0.7.3. Perhaps something is broken in the new dependencies.

@pirate commented on GitHub (Jan 9, 2025):

Out of curiosity, what browser are you using to access the UI?

Safari has had issues in the past submitting checked snapshot IDs correctly when Admin UI buttons are clicked.

@parkerlreed commented on GitHub (Jan 9, 2025):

Firefox on Linux

@rcarmo commented on GitHub (Jan 9, 2025):

I did a little more digging on this and (since I'm using the `stable` container image) set up a `novnc` container so I could see whether Chrome was taking too long to load, etc.

I tried snapshotting a public gist while I was watching the VNC screen, and Chrome launched instantly, then just sat there for 60s until the timeout, then was launched again (I have PDFs and screenshots turned on), etc.

I then looked at the container logs: the progress bar instantly shot to 97.5% and then just sat there for 59 seconds.

The web log showed:

```
Command '['/usr/bin/chromium-browser', '--no-sandbox', '--no-zygote', '--disable-dev-shm-usage', '--disable-software-rasterizer', '--run-all-compositor-stages-before-draw', '--hide-scrollbars', '--window-size=1440,2000', '--autoplay-policy=no-user-gesture-required', '--no-first-run', '--use-fake-ui-for-media-stream', '--use-fake-device-for-media-stream', '--disable-sync', '--user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36 ArchiveBox/0.7.3 (+https://github.com/ArchiveBox/ArchiveBox/)', '--window-size=1440,2000', '--print-to-pdf', 'https://gist.github.com/nikp123/c658b31c45288b55141c9d19ede78e6c']' timed out after 60 seconds
```

I reverted to the 0.7.1 container image (in headless mode) and everything worked, so I tried again with 0.7.2 (also in headless mode) and it also worked.

Something in the code is either not orchestrating the browser correctly or not capturing its output, and judging from my history, the problem started with the 0.7.3 dependency update.
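For context, the `timed out after 60 seconds` line in the log has the same shape as the message Python's `subprocess.TimeoutExpired` produces (`Command '<cmd>' timed out after <n> seconds`), which suggests the Chromium call was run through `subprocess` with a 60-second limit. A minimal sketch that reproduces that message, assuming a sleeping Python child as a stand-in for a browser that never exits (`slow_cmd` and `run_with_timeout` are hypothetical names, not ArchiveBox code):

```python
import subprocess
import sys
from typing import List

# Hypothetical stand-in for a browser that never exits: a child Python process that sleeps.
slow_cmd = [sys.executable, "-c", "import time; time.sleep(30)"]


def run_with_timeout(cmd: List[str], timeout: float) -> str:
    """Run cmd; return the TimeoutExpired message if it overruns, else "finished"."""
    try:
        subprocess.run(cmd, capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired as err:
        # subprocess.run has already killed the direct child before re-raising.
        return str(err)
    return "finished"


if __name__ == "__main__":
    print(run_with_timeout(slow_cmd, timeout=1))
```

Note that `subprocess.run` only kills the direct child when the timeout fires; any processes that child spawned are left alone.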

@pirate commented on GitHub (Jan 9, 2025):

That sounds like you're running into one of the forms of this issue:

* [`singlefile` extractor leaves behind zombie orphan chromium processes when it times out #746](https://github.com/ArchiveBox/ArchiveBox/issues/746)

Despite the title implying it's only an issue with hanging on exit, I've also seen this issue manifest as hanging on launch and not doing anything when in headful mode.

@rcarmo, do you have a `CHROME_USER_DATA_DIR` set? And what CPU architecture are you on?

Also, can y'all try setting `CHROME_HEADLESS = True` manually in the config and running again?
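Issue #746 is about Chromium processes outliving the timeout that was supposed to stop them. On POSIX systems, one common way to make a timed-out command take its children down with it is to start it in its own session and signal the whole process group. A sketch under that assumption (`run_in_own_group` is a hypothetical helper, and nothing in this thread confirms ArchiveBox launches Chromium this way):

```python
import os
import signal
import subprocess
import sys
from typing import List, Optional


def run_in_own_group(cmd: List[str], timeout: float) -> Optional[int]:
    """Run cmd in a new session; on timeout, SIGKILL its whole process group.

    Returns the exit code, or None if the command timed out. POSIX only.
    """
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # start_new_session makes the child a group leader (pgid == pid), so this
        # signal also reaches grandchildren that stayed in the same group.
        try:
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the group already exited on its own
        proc.wait()  # reap the direct child so it does not linger as a zombie
        return None


if __name__ == "__main__":
    # The child spawns a sleeping grandchild, then sleeps itself; both are killed.
    script = (
        "import subprocess, sys, time;"
        "subprocess.Popen([sys.executable, '-c', 'import time; time.sleep(30)']);"
        "time.sleep(30)"
    )
    print(run_in_own_group([sys.executable, "-c", script], timeout=1))
```

A child that calls `setsid()` itself leaves the group and would still escape, so this narrows the orphan problem rather than guaranteeing cleanup.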

@rcarmo commented on GitHub (Jan 10, 2025):

@pirate No to `CHROME_USER_DATA_DIR`; the container is on an Intel box.

@parkerlreed commented on GitHub (Jan 10, 2025):

> That sounds like you're running into one of the forms of this issue:
>
> * [`singlefile` extractor leaves behind zombie orphan chromium processes when it times out #746](https://github.com/ArchiveBox/ArchiveBox/issues/746)
>
> Despite the title implying it's only an issue with hanging on exit, I've also seen this issue manifest as hanging on launch and not doing anything when in headful mode.
>
> @rcarmo do you have any `CHROME_USER_DATA_DIR` set? and what CPU architecture are you on?
>
> Also can y'all try setting `CHROME_HEADLESS = True` manually in the config and running again?

Adding the headless flag does seem to work. I was able to run a new task.
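One detail in rcarmo's log fits this result: the Chromium command has no `--headless` switch, yet it ends with `--print-to-pdf`, which Chrome's documentation lists among its headless-mode flags. A headful browser would plausibly just open the page and wait, matching the 60-second hang seen over noVNC. A hypothetical sketch of gating the switch on a `CHROME_HEADLESS`-style setting (`build_chrome_args` and its parameters are illustrative, not ArchiveBox's real function; the flags are a subset of those in the log):

```python
from typing import List, Optional


def build_chrome_args(binary: str, url: str, headless: bool,
                      extra: Optional[List[str]] = None) -> List[str]:
    """Assemble a Chromium command line that prints `url` to PDF."""
    args = [binary]
    if headless:
        # Assumption based on Chrome's headless docs: without this switch Chromium
        # opens a visible window and --print-to-pdf is not expected to take effect.
        args.append("--headless")
    # A subset of the flags seen in the timed-out command from the log.
    args += [
        "--no-sandbox",
        "--disable-dev-shm-usage",
        "--window-size=1440,2000",
    ]
    if extra:
        args += extra
    args += ["--print-to-pdf", url]
    return args


if __name__ == "__main__":
    print(build_chrome_args("/usr/bin/chromium-browser", "https://example.com", headless=True))
```

On the user side, the thread's fix is simply setting `CHROME_HEADLESS = True`; the `config --set SOME_KEY=someval` command recommended in the compose file comments is presumably the way to do that in Docker.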
