[GH-ISSUE #1699] Bug: SQLite db disk image requires repair when macOS interrupts container without clean shutdown #2526

Open
opened 2026-03-01 17:59:37 +03:00 by kerem · 1 comment

Originally created by @Scribbd on GitHub (Oct 12, 2025).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/1699

Originally assigned to: @pirate on GitHub.

Provide a screenshot and describe the bug

I am aware of #955 and that the documentation describes a working fix. This ticket adds more information about the bug, since I can reproduce it with some consistency. That it does not happen every time is what leads to my hypothesis: I think it is an unfortunate timing issue. It has happened to me three times, and each time I can trace it back to leaving my device unattended for a while. Either I forgot an archiving job was running and closed the lid, or the screen was locked with the power plugged in (possibly relevant: I use a battery management app, see steps to reproduce).
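
For context, a generic SQLite recovery technique is to dump whatever is still readable and replay it into a fresh file. The sketch below does that with Python's standard `sqlite3` module (the traceback shows ArchiveBox runs on Python 3.11). It is an illustration under that assumption, not necessarily the fix the documentation gives for #955, and `rebuild_from_dump` is a hypothetical helper name.

```python
import sqlite3


def rebuild_from_dump(src: sqlite3.Connection, dst: sqlite3.Connection) -> int:
    """Replay everything SQLite can still read from src into the empty database dst.

    Generic dump-and-reload sketch, not necessarily the fix documented for #955.
    Returns the number of SQL statements that were replayed.
    """
    # iterdump() yields the schema and rows as SQL text, wrapped in
    # BEGIN TRANSACTION; ... COMMIT; (similar to the sqlite3 shell's .dump).
    # Reading everything first means a damaged page fails before dst is touched.
    statements = list(src.iterdump())
    # executescript() runs the whole dump as one script.
    dst.executescript("\n".join(statements))
    return len(statements)
```

Run it against a copy of `data/index.sqlite3` with all containers stopped, writing into a new, empty file, e.g. `rebuild_from_dump(sqlite3.connect("index.sqlite3.copy"), sqlite3.connect("index.rebuilt.sqlite3"))`. On a badly damaged file `iterdump()` may itself raise `sqlite3.DatabaseError`; the `sqlite3` command-line shell's `.recover` command is the usual next step in that case.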

My hypothesis is that some external power-management process interrupts or suspends one of the ArchiveBox processes and leaves it, and the database, in a state from which it cannot recover. This could be either macOS power management or the Docker Desktop sleep feature.
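
To pin down when the database goes bad relative to sleep and wake, it helps to check the index directly instead of waiting for the next `archivebox add` to fail. A minimal sketch using SQLite's `PRAGMA integrity_check` and `PRAGMA quick_check` through Python's `sqlite3` module; `integrity_problems` is a hypothetical helper, and the caller is assumed to open `index.sqlite3` on the host side of the `/data` bind mount.

```python
import sqlite3


def integrity_problems(con: sqlite3.Connection, quick: bool = False) -> list[str]:
    """Return SQLite's integrity findings for con; an empty list means "ok".

    quick=True uses PRAGMA quick_check, which skips some index checks and is faster.
    """
    pragma = "quick_check" if quick else "integrity_check"
    try:
        rows = [row[0] for row in con.execute(f"PRAGMA {pragma}")]
    except sqlite3.DatabaseError as exc:
        # A badly damaged file can make the PRAGMA itself fail. Note that any
        # DatabaseError (including "database is locked") is reported, not raised.
        return [str(exc)]
    return [] if rows == ["ok"] else rows
```

For example, call `integrity_problems(sqlite3.connect("archivebox/data/index.sqlite3"), quick=True)` before locking the screen and again after waking, preferably while no archiving job is writing. Note that `sqlite3.connect` silently creates an empty database if the path is wrong, and an empty database checks as ok.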

Some details on my setup: I run a modified docker-compose stack with some added health checks and dependencies. The first few times the database became malformed, I deleted the archivebox data folder and started the stack to rebuild it from scratch, only to be met with an error akin to `cannot init folder that isn't empty`, even though I had `rm -rfd`-ed the folder. Adding the health checks and dependencies seemed to resolve that error, so I suspect there might be a race condition between the scheduler and the main container.
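
SQLite serializes writers with file locks, so two containers writing to the same `index.sqlite3` should normally make one of them wait or fail with `database is locked` rather than interleave writes. That assumes the filesystem honours those locks, which is worth keeping in mind given that the `version` output below reports `FS_REMOTE=True`. A minimal sketch of that contention with two connections in one process; `second_writer_blocked` and the `snapshot` table are hypothetical and only for illustration.

```python
import os
import sqlite3
import tempfile


def second_writer_blocked(db_path: str) -> bool:
    """Return True if SQLite rejects a second writer while a first writer holds the lock.

    Stands in for two processes (e.g. web server and scheduler) sharing one index file.
    """
    # isolation_level=None puts both connections in autocommit mode,
    # so BEGIN/ROLLBACK below are issued explicitly.
    first = sqlite3.connect(db_path, isolation_level=None)
    # timeout=0 makes the second writer fail at once instead of retrying (default 5 s).
    second = sqlite3.connect(db_path, isolation_level=None, timeout=0)
    try:
        first.execute("CREATE TABLE IF NOT EXISTS snapshot (url TEXT)")
        first.execute("BEGIN IMMEDIATE")  # take the write (RESERVED) lock now
        first.execute("INSERT INTO snapshot VALUES ('https://example.com')")
        try:
            second.execute("INSERT INTO snapshot VALUES ('https://example.org')")
        except sqlite3.OperationalError as exc:
            return "database is locked" in str(exc)
        return False  # the second write went through: no lock contention observed
    finally:
        if first.in_transaction:
            first.execute("ROLLBACK")
        first.close()
        second.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(second_writer_blocked(os.path.join(tmp, "index.sqlite3")))  # True
```

Both connections live in one process here, so this only shows SQLite's normal locking behaviour; it does not show whether locks are honoured between the `archivebox` and `archivebox_scheduler` containers across the Docker Desktop bind mount.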

Steps to reproduce

1. Have an M1 MacBook Pro (the chip may be relevant for power management) running macOS Sequoia 15.6.1.
2. Install AlDente Pro (battery manager) v1.35.1 with charging disabled while sleeping, and run `caffeinate -du &`.
3. Start the docker-compose stack with the scheduler.
4. Run multiple long-running archive jobs.
5. Lock the screen, close the lid, or let the Mac go to sleep in general.
6. ??? Wait a long time for macOS power management magic ???
7. Wake device
8. Find that the SQLite database disk image is malformed.
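
The next process that opens the database rolls back a leftover rollback journal, so the on-disk state right after step 7 is easy to lose once the containers resume. A sketch, assuming the host-side path `archivebox/data/index.sqlite3` from the compose file below, that records which SQLite sidecar files exist; `sqlite_sidecars` is a hypothetical helper.

```python
import os
import tempfile

# Files SQLite may keep next to the main database file.
SIDECAR_SUFFIXES = ("-journal", "-wal", "-shm")


def sqlite_sidecars(db_path: str) -> dict[str, int]:
    """Return {file name: size in bytes} for each SQLite sidecar file next to db_path.

    In the default rollback-journal mode, a -journal that survives while no process
    has the database open is typically a "hot" journal from an interrupted write.
    In WAL mode, a non-empty -wal holds changes not yet checkpointed into the main file.
    """
    found: dict[str, int] = {}
    for suffix in SIDECAR_SUFFIXES:
        path = db_path + suffix
        if os.path.exists(path):
            found[os.path.basename(path)] = os.path.getsize(path)
    return found


if __name__ == "__main__":
    # Self-demo on a scratch directory; on the host, pass the bind-mounted index path.
    with tempfile.TemporaryDirectory() as tmp:
        db = os.path.join(tmp, "index.sqlite3")
        with open(db + "-journal", "wb") as fh:
            fh.write(b"\0" * 512)
        print(sqlite_sidecars(db))  # {'index.sqlite3-journal': 512}
```

Run it on wake before touching the stack. While containers are running, a `-wal`/`-shm` pair (WAL mode) or a `-journal` during an active write is normal; these files are only suspicious when nothing has the database open.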

Logs or errors

docker compose run --rm --env-from-file ./.env archivebox add --tag AWS < ~/Tools/aws-doc-url-collector/out.txt
[+] Creating 2/2
 ✔ Container archivebox-pihole-1  Running                                                                                                        0.0s
 ✔ Container archivebox-sonic-1   Running                                                                                                        0.0s
[i] [2025-10-12 12:09:09] ArchiveBox v0.7.3: archivebox add --tag AWS
    > /data

[+] [2025-10-12 12:09:11] Adding 11142 links to index (crawl depth=0)...
    > Saved verbatim input to sources/1760270951-import.txt
    > Parsed 11141 URLs from input (Generic TXT)
    > Found 0 new URLs not already in index

[*] [2025-10-12 12:09:17] Writing 0 links to main index...
    √ ./index.sqlite3
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/db/backends/sqlite3/base.py", line 413, in execute
    return Database.Cursor.execute(self, query, params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
sqlite3.DatabaseError: database disk image is malformed

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/bin/archivebox", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/app/archivebox/cli/__init__.py", line 140, in main
    run_subcommand(
  File "/app/archivebox/cli/__init__.py", line 80, in run_subcommand
    module.main(args=subcommand_args, stdin=stdin, pwd=pwd)    # type: ignore
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/archivebox/cli/archivebox_add.py", line 109, in main
    add(
  File "/app/archivebox/util.py", line 116, in typechecked_function
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/app/archivebox/main.py", line 661, in add
    snapshot.save()
  File "/usr/local/lib/python3.11/site-packages/django/db/models/base.py", line 753, in save
    self.save_base(using=using, force_insert=force_insert,
  File "/usr/local/lib/python3.11/site-packages/django/db/models/base.py", line 790, in save_base
    updated = self._save_table(
              ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/db/models/base.py", line 872, in _save_table
    updated = self._do_update(base_qs, using, pk_val, values, update_fields,
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/db/models/base.py", line 926, in _do_update
    return filtered._update(values) > 0
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/db/models/query.py", line 803, in _update
    return query.get_compiler(self.db).execute_sql(CURSOR)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/db/models/sql/compiler.py", line 1522, in execute_sql
    cursor = super().execute_sql(result_type)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/db/models/sql/compiler.py", line 1156, in execute_sql
    cursor.execute(sql, params)
  File "/usr/local/lib/python3.11/site-packages/django/db/backends/utils.py", line 66, in execute
    return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/db/backends/utils.py", line 75, in _execute_with_wrappers
    return executor(sql, params, many, context)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/db/backends/utils.py", line 79, in _execute
    with self.db.wrap_database_errors:
  File "/usr/local/lib/python3.11/site-packages/django/db/utils.py", line 90, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/usr/local/lib/python3.11/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/db/backends/sqlite3/base.py", line 413, in execute
    return Database.Cursor.execute(self, query, params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
django.db.utils.DatabaseError: database disk image is malformed

ArchiveBox Version

docker compose run --rm --env-from-file ./.env archivebox version
[+] Running 2/2
 ✔ Container archivebox-sonic-1   Started                                                                                                        0.2s
 ✔ Container archivebox-pihole-1  Started                                                                                                        0.2s
0.7.3
ArchiveBox v0.7.3 COMMIT_HASH=069aabc BUILD_TIME=2024-12-15 09:54:01 1734256441
IN_DOCKER=True IN_QEMU=False ARCH=aarch64 OS=Linux PLATFORM=Linux-6.10.14-linuxkit-aarch64-with-glibc2.36 PYTHON=Cpython
FS_ATOMIC=True FS_REMOTE=True FS_USER=911:0 FS_PERMS=644
DEBUG=False IS_TTY=True TZ=UTC SEARCH_BACKEND=sonic LDAP=False

[i] Dependency versions:
 √  PYTHON_BINARY         v3.11.11        valid     /usr/local/bin/python3.11
 √  SQLITE_BINARY         v2.6.0          valid     /usr/local/lib/python3.11/sqlite3/dbapi2.py
 √  DJANGO_BINARY         v3.1.14         valid     /usr/local/lib/python3.11/site-packages/django/__init__.py
 √  ARCHIVEBOX_BINARY     v0.7.3          valid     /usr/local/bin/archivebox

 √  CURL_BINARY           v8.10.1         valid     /usr/bin/curl
 √  WGET_BINARY           v1.21.3         valid     /usr/bin/wget
 √  NODE_BINARY           v20.18.1        valid     /usr/bin/node
 √  SINGLEFILE_BINARY     v1.1.54         valid     /app/node_modules/single-file-cli/single-file
 √  READABILITY_BINARY    v0.0.11         valid     /app/node_modules/readability-extractor/readability-extractor
 √  MERCURY_BINARY        v1.0.0          valid     /app/node_modules/@postlight/parser/cli.js
 √  GIT_BINARY            v2.39.5         valid     /usr/bin/git
 √  YOUTUBEDL_BINARY      v2024.12.13     valid     /usr/local/bin/yt-dlp
 √  CHROME_BINARY         v131.0.6778.33  valid     /usr/bin/chromium-browser
 √  RIPGREP_BINARY        v13.0.0         valid     /usr/bin/rg

[i] Source-code locations:
 √  PACKAGE_DIR           23 files        valid     /app/archivebox
 √  TEMPLATES_DIR         3 files         valid     /app/archivebox/templates
 -  CUSTOM_TEMPLATES_DIR  -               disabled  None

How did you install the version of ArchiveBox you are using?

Docker (or Podman/LXC/K8s/TrueNAS/Proxmox/etc)

What operating system are you running on?

macOS (including Docker on macOS)

What type of drive are you using to store your ArchiveBox data?

  • [x] some of data/ is on a local SSD or NVMe drive
  • [ ] some of data/ is on a spinning hard drive or external USB drive
  • [ ] some of data/ is on a network mount (e.g. NFS/SMB/Ceph/GlusterFS/etc.)
  • [ ] some of data/ is on a FUSE mount (e.g. SSHFS/RClone/S3/B2/Google Drive/Dropbox/etc.)

Docker Compose Configuration

# Usage:
#     mkdir -p ~/archivebox/data && cd ~/archivebox
#     curl -fsSL 'https://docker-compose.archivebox.io' > docker-compose.yml
#     docker compose run archivebox version
#     docker compose run archivebox config --set SAVE_ARCHIVE_DOT_ORG=False
#     docker compose run archivebox add --depth=1 'https://news.ycombinator.com'
#     docker compose run -T archivebox add < bookmarks.txt
#     docker compose up -d && open 'https://localhost:8000'
#     docker compose run archivebox help
# Documentation:
#     https://github.com/ArchiveBox/ArchiveBox/wiki/Docker#docker-compose

services:
  archivebox:
    image: archivebox/archivebox:latest
    ports:
      - 40080:8000
    volumes:
      - ${PERSISTENT_ROOT_DIR:-.}/archivebox/data:/data
      # - ${PERSISTENT_ROOT_DIR:-.}/archivebox/data/personas/Default/chrome_profile/Default:/data/personas/Default/chrome_profile/Default
    environment:
      - ADMIN_USERNAME=admin # creates an admin user on first run with the given user/pass combo
      - ADMIN_PASSWORD=${ADMIN_PASSWORD:-SomeSecretPassword}
      - ALLOWED_HOSTS=* # set this to the hostname(s) you're going to serve the site from!
      - CSRF_TRUSTED_ORIGINS=http://localhost:8000 # you MUST set this to the server's URL for admin login and the REST API to work
      - PUBLIC_INDEX=True # set to False to prevent anonymous users from viewing snapshot list
      - PUBLIC_SNAPSHOTS=True # set to False to prevent anonymous users from viewing snapshot content
      - PUBLIC_ADD_VIEW=False # set to True to allow anonymous users to submit new URLs to archive
      - SEARCH_BACKEND_ENGINE=sonic # tells ArchiveBox to use sonic container below for fast full-text search
      - SEARCH_BACKEND_HOST_NAME=sonic
      - SEARCH_BACKEND_PASSWORD=${BACKEND_PASSWORD:-SomeSecretPassword}
      # - PUID=911                        # set to your host user's UID & GID if you encounter permissions issues
      # - PGID=911                        # UID/GIDs lower than 500 may clash with system uids and are not recommended
      # For options below, it's better to set in data/ArchiveBox.conf or use `docker compose run archivebox config --set SOME_KEY=someval` instead of setting here:
      # - MEDIA_MAX_SIZE=750m             # increase this filesize limit to allow archiving larger audio/video files
      # - TIMEOUT=60                      # increase this number to 120+ seconds if you see many slow downloads timing out
      # - CHECK_SSL_VALIDITY=True         # set to False to disable strict SSL checking (allows saving URLs w/ broken certs)
      - SAVE_ARCHIVE_DOT_ORG=True # set to False to disable submitting all URLs to Archive.org when archiving
      # - USER_AGENT="..."                # set a custom USER_AGENT to avoid being blocked as a bot
      # ...
      # For more info, see: https://github.com/ArchiveBox/ArchiveBox/wiki/Docker#configuration

      # For ad-blocking during archiving, uncomment this section and the pihole service below
    networks:
      - dns
    dns:
      - ${PIHOLE_IP:-172.20.0.53}
    depends_on:
      pihole:
        condition: service_healthy
      sonic:
        condition: service_started
    healthcheck:
      test:
        - CMD-SHELL
        - curl --silent 'http://localhost:8000/health/' | grep -q 'OK'
      interval: 30s
      timeout: 20s
      retries: 15
      ######## Optional Addons: tweak examples below as needed for your specific use case ########

      ### This optional container runs scheduled jobs in the background (and retries failed ones). To add a new job:
      #   $ docker compose run archivebox schedule --add --every=day --depth=1 'https://example.com/some/rss/feed.xml'
      # then restart the scheduler container to apply any changes to the scheduled task list:
      #   $ docker compose restart archivebox_scheduler
      # https://github.com/ArchiveBox/ArchiveBox/wiki/Scheduled-Archiving
    restart: unless-stopped
  archivebox_scheduler:
    image: archivebox/archivebox:latest
    command: schedule --foreground --update --every=day
    depends_on:
      archivebox:
        condition: service_healthy
    environment:
      # - PUID=911                        # set to your host user's UID & GID if you encounter permissions issues
      # - PGID=911
      - TIMEOUT=120 # use a higher timeout than the main container to give slow tasks more time when retrying
      - SEARCH_BACKEND_ENGINE=sonic # tells ArchiveBox to use sonic container below for fast full-text search
      - SEARCH_BACKEND_HOST_NAME=sonic
      - SEARCH_BACKEND_PASSWORD=${BACKEND_PASSWORD:-SomeSecretPassword}
      # For other config it's better to set using `docker compose run archivebox config --set SOME_KEY=someval` instead of setting here
      # ...
      # For more info, see: https://github.com/ArchiveBox/ArchiveBox/wiki/Docker#configuration
    volumes:
      - ${PERSISTENT_ROOT_DIR:-.}/archivebox/data:/data
    # cpus: 2                               # uncomment / edit these values to limit scheduler container resource consumption
    # mem_limit: 2048m
    # restart: always

    ### This runs the optional Sonic full-text search backend (much faster than default rg backend).
    # If Sonic is ever started after not running for a while, update its full-text index by running:
    #   $ docker-compose run archivebox update --index-only
    # https://github.com/ArchiveBox/ArchiveBox/wiki/Setting-up-Search

  sonic:
    image: archivebox/sonic:latest
    expose:
      - 1491
    environment:
      - SEARCH_BACKEND_PASSWORD=${BACKEND_PASSWORD:-SomeSecretPassword}
    volumes:
      - ${SONIC_CONFIG_PATH:-./sonic.cfg}:/etc/sonic.cfg:ro # mount to customize: https://raw.githubusercontent.com/ArchiveBox/ArchiveBox/stable/etc/sonic.cfg
      - ${PERSISTENT_ROOT_DIR:-.}/archivebox/sonic:/var/lib/sonic/store
  ### This optional container runs xvfb+noVNC so you can watch the ArchiveBox browser as it archives things,
  # or remote control it to set up a chrome profile w/ login credentials for sites you want to archive.
  # https://github.com/ArchiveBox/ArchiveBox/wiki/Chromium-Install#setting-up-a-chromium-user-profile
  # https://github.com/ArchiveBox/ArchiveBox/wiki/Chromium-Install#docker-vnc-setup

  # novnc:
  #   image: theasp/novnc:latest
  #   environment:
  #     - DISPLAY_WIDTH=1920
  #     - DISPLAY_HEIGHT=1080
  #     - RUN_XTERM=no
  #   ports:
  #     # to view/control ArchiveBox's browser, visit: http://127.0.0.1:8080/vnc.html
  #     # restricted to access from localhost by default because it has no authentication
  #     - 127.0.0.1:40081:8080
  #   restart: unless-stopped
  ### Example: Put Nginx in front of the ArchiveBox server for SSL termination and static file serving.
  # You can also use any other ingress provider for SSL like Apache, Caddy, Traefik, Cloudflare Tunnels, etc.

  # nginx:
  #     image: nginx:alpine
  #     ports:
  #         - 443:443
  #         - 80:80
  #     volumes:
  #         - ./etc/nginx.conf:/etc/nginx/nginx.conf
  #         - ./data:/var/www

  ### Example: To run pihole in order to block ad/tracker requests during archiving,
  # uncomment this optional block and set up pihole using its admin interface

  pihole:
    image: pihole/pihole:latest
    ports:
      # access the admin HTTP interface on http://localhost:8090
      - 127.0.0.1:40082:80
    environment:
      - WEBPASSWORD=${ADMIN_PASSWORD:-SomeSecretPassword}
      - DNSMASQ_LISTENING=all
    dns:
      - 127.0.0.1
      - 1.1.1.1
    networks:
      dns:
        ipv4_address: ${PIHOLE_IP:-172.20.0.53}
    volumes:
      - ${PERSISTENT_ROOT_DIR:-.}/archivebox/etc/pihole:/etc/pihole
      - ${PERSISTENT_ROOT_DIR:-.}/archivebox/etc/dnsmasq:/etc/dnsmasq.d
    restart: unless-stopped
  ### Example: run all your ArchiveBox traffic through a WireGuard VPN tunnel to avoid IP blocks.
  # You can also use any other VPN that works at the docker/IP level, e.g. Tailscale, OpenVPN, etc.

  # wireguard:
  #   image: linuxserver/wireguard:latest
  #   network_mode: 'service:archivebox'
  #   cap_add:
  #     - NET_ADMIN
  #     - SYS_MODULE
  #   sysctls:
  #     - net.ipv4.conf.all.rp_filter=2
  #     - net.ipv4.conf.all.src_valid_mark=1
  #   volumes:
  #     - /lib/modules:/lib/modules
  #     - ./wireguard.conf:/config/wg0.conf:ro

  ### Example: Run ChangeDetection.io to watch for changes to websites, then trigger ArchiveBox to archive them
  # Documentation: https://github.com/dgtlmoon/changedetection.io
  # More info: https://github.com/dgtlmoon/changedetection.io/blob/master/docker-compose.yml

#  changedetection:
#    image: ghcr.io/dgtlmoon/changedetection.io
#    volumes:
#      - ${PERSISTENT_ROOT_DIR:-.}/archivebox/data-changedetection:/datastore
#    restart: unless-stopped
#    ports:
#      - 127.0.0.1:40084:5000
#    depends_on:
#      archivebox:
#        condition: service_healthy
  ### Example: Run PYWB in parallel and auto-import WARCs from ArchiveBox

  pywb:
    image: webrecorder/pywb:latest
    entrypoint: /bin/sh -c '(wb-manager init default || test $$? -eq 2) &&
      wb-manager add default /archivebox/archive/*/warc/*.warc.gz; wayback;'
    environment:
      - INIT_COLLECTION=archivebox
    ports:
      - 40083:8080
    volumes:
      - ${PERSISTENT_ROOT_DIR:-.}/archivebox/data:/archivebox
      - ${PERSISTENT_ROOT_DIR:-.}/archivebox/data/wayback:/webarchive
    depends_on:
      archivebox:
        condition: service_healthy
networks:
  # network just used for pihole container to offer :53 dns resolving on fixed ip for archivebox container
  dns:
    ipam:
      driver: default
      config:
        - subnet: ${NETWORK_CONFIG:-172.20.0.0/24}

ArchiveBox Configuration

[SERVER_CONFIG]
SECRET_KEY = *******
Originally created by @Scribbd on GitHub (Oct 12, 2025). Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/1699 Originally assigned to: @pirate on GitHub. ### Provide a screenshot and describe the bug I am aware of #955, and that the documentation has a working fix. This ticket is to add more information about this bug, as I can reproduce it with some consistency. The uncertainty is what gives rise to my hypothesis, as I think it is an unfortunate timing issue. It has happened to me three times, and I can always link it back to when I leave my device unattended for a while. Either I forget I have an archiving job running and close the lid. Or having the screen locked with the power plugged in(?: see steps to reproduce; I have a battery management app). My hypothesis is that some external power management process interrupts/sleeps one of the ArchiveBox processes and puts it and the DB in a state from which it cannot recover. This can either be macOS power management or the Docker Desktop sleep feature. Some details on my setup: I run a modified docker-compose stack that has some health checks and dependencies added. The first few times the database got malformed, I deleted the archivebox data folder and started the stack to rebuild the folder from scratch. Only to be met with an error akin to `cannot init folder that isn't empty`, while having `rm -rfd`-ed the folder. This addition seemed to resolve the issue. I suspect that there might be a race condition between the scheduler and the main container. ### Steps to reproduce ```markdown 1. Have M1 (if relevant for powermanagement) MacBook Pro with Sequoia 15.6.1 2. Install Aldente Pro (battery manager) v1.35.1 with disable charging while sleeping and use `caffeinate -du &`. 3. Start the docker-compose stack with the scheduler. 4. Run multiple long-running archive jobs. 5. Lock the screen, close the lid, or let the Mac go to sleep in general. 6. ??? 
Wait a long time for macOS power management magic ??? 7. Wake device 8. Have a malformed SQLite disk image. ``` ### Logs or errors ```shell docker compose run --rm --env-from-file ./.env archivebox add --tag AWS < ~/Tools/aws-doc-url-collector/out.txt [+] Creating 2/2 ✔ Container archivebox-pihole-1 Running 0.0s ✔ Container archivebox-sonic-1 Running 0.0s [i] [2025-10-12 12:09:09] ArchiveBox v0.7.3: archivebox add --tag AWS > /data [+] [2025-10-12 12:09:11] Adding 11142 links to index (crawl depth=0)... > Saved verbatim input to sources/1760270951-import.txt > Parsed 11141 URLs from input (Generic TXT) > Found 0 new URLs not already in index [*] [2025-10-12 12:09:17] Writing 0 links to main index... √ ./index.sqlite3 Traceback (most recent call last): File "/usr/local/lib/python3.11/site-packages/django/db/backends/utils.py", line 84, in _execute return self.cursor.execute(sql, params) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/django/db/backends/sqlite3/base.py", line 413, in execute return Database.Cursor.execute(self, query, params) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ sqlite3.DatabaseError: database disk image is malformed The above exception was the direct cause of the following exception: Traceback (most recent call last): File "/usr/local/bin/archivebox", line 8, in <module> sys.exit(main()) ^^^^^^ File "/app/archivebox/cli/__init__.py", line 140, in main run_subcommand( File "/app/archivebox/cli/__init__.py", line 80, in run_subcommand module.main(args=subcommand_args, stdin=stdin, pwd=pwd) # type: ignore ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/app/archivebox/cli/archivebox_add.py", line 109, in main add( File "/app/archivebox/util.py", line 116, in typechecked_function return func(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^ File "/app/archivebox/main.py", line 661, in add snapshot.save() File "/usr/local/lib/python3.11/site-packages/django/db/models/base.py", line 753, in save 
self.save_base(using=using, force_insert=force_insert, File "/usr/local/lib/python3.11/site-packages/django/db/models/base.py", line 790, in save_base updated = self._save_table( ^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/django/db/models/base.py", line 872, in _save_table updated = self._do_update(base_qs, using, pk_val, values, update_fields, ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/django/db/models/base.py", line 926, in _do_update return filtered._update(values) > 0 ^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/django/db/models/query.py", line 803, in _update return query.get_compiler(self.db).execute_sql(CURSOR) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/django/db/models/sql/compiler.py", line 1522, in execute_sql cursor = super().execute_sql(result_type) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/django/db/models/sql/compiler.py", line 1156, in execute_sql cursor.execute(sql, params) File "/usr/local/lib/python3.11/site-packages/django/db/backends/utils.py", line 66, in execute return self._execute_with_wrappers(sql, params, many=False, executor=self._execute) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/django/db/backends/utils.py", line 75, in _execute_with_wrappers return executor(sql, params, many, context) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/django/db/backends/utils.py", line 79, in _execute with self.db.wrap_database_errors: File "/usr/local/lib/python3.11/site-packages/django/db/utils.py", line 90, in __exit__ raise dj_exc_value.with_traceback(traceback) from exc_value File "/usr/local/lib/python3.11/site-packages/django/db/backends/utils.py", line 84, in _execute return self.cursor.execute(sql, params) 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/django/db/backends/sqlite3/base.py", line 413, in execute return Database.Cursor.execute(self, query, params) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ django.db.utils.DatabaseError: database disk image is malformed ``` ### ArchiveBox Version ```shell docker compose run --rm --env-from-file ./.env archivebox version 󱎫 20m6s [+] Running 2/2 ✔ Container archivebox-sonic-1 Started 0.2s ✔ Container archivebox-pihole-1 Started 0.2s 0.7.3 ArchiveBox v0.7.3 COMMIT_HASH=069aabc BUILD_TIME=2024-12-15 09:54:01 1734256441 IN_DOCKER=True IN_QEMU=False ARCH=aarch64 OS=Linux PLATFORM=Linux-6.10.14-linuxkit-aarch64-with-glibc2.36 PYTHON=Cpython FS_ATOMIC=True FS_REMOTE=True FS_USER=911:0 FS_PERMS=644 DEBUG=False IS_TTY=True TZ=UTC SEARCH_BACKEND=sonic LDAP=False [i] Dependency versions: √ PYTHON_BINARY v3.11.11 valid /usr/local/bin/python3.11 √ SQLITE_BINARY v2.6.0 valid /usr/local/lib/python3.11/sqlite3/dbapi2.py √ DJANGO_BINARY v3.1.14 valid /usr/local/lib/python3.11/site-packages/django/__init__.py √ ARCHIVEBOX_BINARY v0.7.3 valid /usr/local/bin/archivebox √ CURL_BINARY v8.10.1 valid /usr/bin/curl √ WGET_BINARY v1.21.3 valid /usr/bin/wget √ NODE_BINARY v20.18.1 valid /usr/bin/node √ SINGLEFILE_BINARY v1.1.54 valid /app/node_modules/single-file-cli/single-file √ READABILITY_BINARY v0.0.11 valid /app/node_modules/readability-extractor/readability-extractor √ MERCURY_BINARY v1.0.0 valid /app/node_modules/@postlight/parser/cli.js √ GIT_BINARY v2.39.5 valid /usr/bin/git √ YOUTUBEDL_BINARY v2024.12.13 valid /usr/local/bin/yt-dlp √ CHROME_BINARY v131.0.6778.33 valid /usr/bin/chromium-browser √ RIPGREP_BINARY v13.0.0 valid /usr/bin/rg [i] Source-code locations: √ PACKAGE_DIR 23 files valid /app/archivebox √ TEMPLATES_DIR 3 files valid /app/archivebox/templates - CUSTOM_TEMPLATES_DIR - disabled None ``` ### How did you install the version of ArchiveBox you are using? 
Docker (or Podman/LXC/K8s/TrueNAS/Proxmox/etc) ### What operating system are you running on? macOS (including Docker on macOS) ### What type of drive are you using to store your ArchiveBox data? - [x] some of `data/` is on a local SSD or NVMe drive - [ ] some of `data/` is on a spinning hard drive or external USB drive - [ ] some of `data/` is on a network mount (e.g. NFS/SMB/Ceph/GlusterFS/etc.) - [ ] some of `data/` is on a FUSE mount (e.g. SSHFS/RClone/S3/B2/Google Drive/Dropbox/etc.) ### Docker Compose Configuration ```shell # Usage: # mkdir -p ~/archivebox/data && cd ~/archivebox # curl -fsSL 'https://docker-compose.archivebox.io' > docker-compose.yml # docker compose run archivebox version # docker compose run archivebox config --set SAVE_ARCHIVE_DOT_ORG=False # docker compose run archivebox add --depth=1 'https://news.ycombinator.com' # docker compose run -T archivebox add < bookmarks.txt # docker compose up -d && open 'https://localhost:8000' # docker compose run archivebox help # Documentation: # https://github.com/ArchiveBox/ArchiveBox/wiki/Docker#docker-compose services: archivebox: image: archivebox/archivebox:latest ports: - 40080:8000 volumes: - ${PERSISTENT_ROOT_DIR:-.}/archivebox/data:/data # - ${PERSISTENT_ROOT_DIR:-.}/archivebox/data/personas/Default/chrome_profile/Default:/data/personas/Default/chrome_profile/Default environment: - ADMIN_USERNAME=admin # creates an admin user on first run with the given user/pass combo - ADMIN_PASSWORD=${ADMIN_PASSWORD:-SomeSecretPassword} - ALLOWED_HOSTS=* # set this to the hostname(s) you're going to serve the site from! 
- CSRF_TRUSTED_ORIGINS=http://localhost:8000 # you MUST set this to the server's URL for admin login and the REST API to work - PUBLIC_INDEX=True # set to False to prevent anonymous users from viewing snapshot list - PUBLIC_SNAPSHOTS=True # set to False to prevent anonymous users from viewing snapshot content - PUBLIC_ADD_VIEW=False # set to True to allow anonymous users to submit new URLs to archive - SEARCH_BACKEND_ENGINE=sonic # tells ArchiveBox to use sonic container below for fast full-text search - SEARCH_BACKEND_HOST_NAME=sonic - SEARCH_BACKEND_PASSWORD=${BACKEND_PASSWORD:-SomeSecretPassword} # - PUID=911 # set to your host user's UID & GID if you encounter permissions issues # - PGID=911 # UID/GIDs lower than 500 may clash with system uids and are not recommended # For options below, it's better to set in data/ArchiveBox.conf or use `docker compose run archivebox config --set SOME_KEY=someval` instead of setting here: # - MEDIA_MAX_SIZE=750m # increase this filesize limit to allow archiving larger audio/video files # - TIMEOUT=60 # increase this number to 120+ seconds if you see many slow downloads timing out # - CHECK_SSL_VALIDITY=True # set to False to disable strict SSL checking (allows saving URLs w/ broken certs) - SAVE_ARCHIVE_DOT_ORG=True # set to False to disable submitting all URLs to Archive.org when archiving # - USER_AGENT="..." # set a custom USER_AGENT to avoid being blocked as a bot # ... 
# For more info, see: https://github.com/ArchiveBox/ArchiveBox/wiki/Docker#configuration # For ad-blocking during archiving, uncomment this section and the pihole service below networks: - dns dns: - ${PIHOLE_IP:-172.20.0.53} depends_on: pihole: condition: service_healthy sonic: condition: service_started healthcheck: test: - CMD-SHELL - curl --silent 'http://localhost:8000/health/' | grep -q 'OK' interval: 30s timeout: 20s retries: 15 ######## Optional Addons: tweak examples below as needed for your specific use case ######## ### This optional container runs scheduled jobs in the background (and retries failed ones). To add a new job: # $ docker compose run archivebox schedule --add --every=day --depth=1 'https://example.com/some/rss/feed.xml' # then restart the scheduler container to apply any changes to the scheduled task list: # $ docker compose restart archivebox_scheduler # https://github.com/ArchiveBox/ArchiveBox/wiki/Scheduled-Archiving restart: unless-stopped archivebox_scheduler: image: archivebox/archivebox:latest command: schedule --foreground --update --every=day depends_on: archivebox: condition: service_healthy environment: # - PUID=911 # set to your host user's UID & GID if you encounter permissions issues # - PGID=911 - TIMEOUT=120 # use a higher timeout than the main container to give slow tasks more time when retrying - SEARCH_BACKEND_ENGINE=sonic # tells ArchiveBox to use sonic container below for fast full-text search - SEARCH_BACKEND_HOST_NAME=sonic - SEARCH_BACKEND_PASSWORD=${BACKEND_PASSWORD:-SomeSecretPassword} # For other config it's better to set using `docker compose run archivebox config --set SOME_KEY=someval` instead of setting here # ... 
# For more info, see: https://github.com/ArchiveBox/ArchiveBox/wiki/Docker#configuration volumes: - ${PERSISTENT_ROOT_DIR:-.}/archivebox/data:/data # cpus: 2 # uncomment / edit these values to limit scheduler container resource consumption # mem_limit: 2048m # restart: always ### This runs the optional Sonic full-text search backend (much faster than default rg backend). # If Sonic is ever started after not running for a while, update its full-text index by running: # $ docker-compose run archivebox update --index-only # https://github.com/ArchiveBox/ArchiveBox/wiki/Setting-up-Search sonic: image: archivebox/sonic:latest expose: - 1491 environment: - SEARCH_BACKEND_PASSWORD=${BACKEND_PASSWORD:-SomeSecretPassword} volumes: - ${SONIC_CONFIG_PATH:-./sonic.cfg}:/etc/sonic.cfg:ro # mount to customize: https://raw.githubusercontent.com/ArchiveBox/ArchiveBox/stable/etc/sonic.cfg - ${PERSISTENT_ROOT_DIR:-.}/archivebox/sonic:/var/lib/sonic/store ### This optional container runs xvfb+noVNC so you can watch the ArchiveBox browser as it archives things, # or remote control it to set up a chrome profile w/ login credentials for sites you want to archive. # https://github.com/ArchiveBox/ArchiveBox/wiki/Chromium-Install#setting-up-a-chromium-user-profile # https://github.com/ArchiveBox/ArchiveBox/wiki/Chromium-Install#docker-vnc-setup # novnc: # image: theasp/novnc:latest # environment: # - DISPLAY_WIDTH=1920 # - DISPLAY_HEIGHT=1080 # - RUN_XTERM=no # ports: # # to view/control ArchiveBox's browser, visit: http://127.0.0.1:8080/vnc.html # # restricted to access from localhost by default because it has no authentication # - 127.0.0.1:40081:8080 # restart: unless-stopped ### Example: Put Nginx in front of the ArchiveBox server for SSL termination and static file serving. # You can also any other ingress provider for SSL like Apache, Caddy, Traefik, Cloudflare Tunnels, etc. 
    # nginx:
    #     image: nginx:alpine
    #     ports:
    #         - 443:443
    #         - 80:80
    #     volumes:
    #         - ./etc/nginx.conf:/etc/nginx/nginx.conf
    #         - ./data:/var/www

    ### Example: To run pihole in order to block ad/tracker requests during archiving,
    # uncomment this optional block and set up pihole using its admin interface
    pihole:
        image: pihole/pihole:latest
        ports:
            # access the admin HTTP interface on http://localhost:8090
            - 127.0.0.1:40082:80
        environment:
            - WEBPASSWORD=${ADMIN_PASSWORD:-SomeSecretPassword}
            - DNSMASQ_LISTENING=all
        dns:
            - 127.0.0.1
            - 1.1.1.1
        networks:
            dns:
                ipv4_address: ${PIHOLE_IP:-172.20.0.53}
        volumes:
            - ${PERSISTENT_ROOT_DIR:-.}/archivebox/etc/pihole:/etc/pihole
            - ${PERSISTENT_ROOT_DIR:-.}/archivebox/etc/dnsmasq:/etc/dnsmasq.d
        restart: unless-stopped

    ### Example: run all your ArchiveBox traffic through a WireGuard VPN tunnel to avoid IP blocks.
    # You can also use any other VPN that works at the docker/IP level, e.g. Tailscale, OpenVPN, etc.
    # wireguard:
    #     image: linuxserver/wireguard:latest
    #     network_mode: 'service:archivebox'
    #     cap_add:
    #         - NET_ADMIN
    #         - SYS_MODULE
    #     sysctls:
    #         - net.ipv4.conf.all.rp_filter=2
    #         - net.ipv4.conf.all.src_valid_mark=1
    #     volumes:
    #         - /lib/modules:/lib/modules
    #         - ./wireguard.conf:/config/wg0.conf:ro

    ### Example: Run ChangeDetection.io to watch for changes to websites, then trigger ArchiveBox to archive them
    # Documentation: https://github.com/dgtlmoon/changedetection.io
    # More info: https://github.com/dgtlmoon/changedetection.io/blob/master/docker-compose.yml
    # changedetection:
    #     image: ghcr.io/dgtlmoon/changedetection.io
    #     volumes:
    #         - ${PERSISTENT_ROOT_DIR:-.}/archivebox/data-changedetection:/datastore
    #     restart: unless-stopped
    #     ports:
    #         - 127.0.0.1:40084:5000
    #     depends_on:
    #         archivebox:
    #             condition: service_healthy

    ### Example: Run PYWB in parallel and auto-import WARCs from ArchiveBox
    pywb:
        image: webrecorder/pywb:latest
        entrypoint: /bin/sh -c '(wb-manager init default || test $$? -eq 2) && wb-manager add default /archivebox/archive/*/warc/*.warc.gz; wayback;'
        environment:
            - INIT_COLLECTION=archivebox
        ports:
            - 40083:8080
        volumes:
            - ${PERSISTENT_ROOT_DIR:-.}/archivebox/data:/archivebox
            - ${PERSISTENT_ROOT_DIR:-.}/archivebox/data/wayback:/webarchive
        depends_on:
            archivebox:
                condition: service_healthy

networks:
    # network just used for pihole container to offer :53 dns resolving on fixed ip for archivebox container
    dns:
        ipam:
            driver: default
            config:
                - subnet: ${NETWORK_CONFIG:-172.20.0.0/24}
```

### ArchiveBox Configuration

```shell
[SERVER_CONFIG]
SECRET_KEY = *******
```
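The startup ordering the reporter added uses two parts of the compose file above. The first is the `healthcheck` on the main `archivebox` service. The second is `depends_on: ... condition: service_healthy` on `archivebox_scheduler` and `pywb`. Docker implements this gate itself. The Python sketch below is only a rough model of it, and the helpers `probe` and `wait_until_healthy` are hypothetical names, not part of Docker or ArchiveBox. The sketch waits `interval` before each check and runs an approximation of the `CMD-SHELL` curl/grep test against `/health/`. It gives up after `retries` consecutive failures.

```python
import http.client
import time
import urllib.error
import urllib.request
from typing import Callable


def probe(url: str, timeout: float) -> bool:
    """Approximation of `curl --silent URL | grep -q 'OK'` (hypothetical helper).

    Returns True only if the endpoint answers and its body contains b"OK".
    Unlike plain `curl --silent`, urlopen raises on HTTP 4xx/5xx, which this
    sketch treats as a failed probe.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return b"OK" in resp.read()
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, socket timeouts and HTTP errors all count as "not healthy yet".
        return False


def wait_until_healthy(
    url: str,
    interval: float = 30.0,  # mirrors `interval: 30s`
    timeout: float = 20.0,   # mirrors `timeout: 20s` (per socket operation here, not a hard total)
    retries: int = 15,       # mirrors `retries: 15`
    probe_fn: Callable[[str, float], bool] = probe,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True on the first successful probe, False after `retries` failures in a row."""
    for _ in range(retries):
        sleep(interval)  # each check runs `interval` after the previous one finished
        if probe_fn(url, timeout):
            return True  # dependents gated on service_healthy may now start
    return False  # the container would be reported unhealthy
```

In this model, the worst case before a dependent container starts or Compose reports the dependency as unhealthy is about 15 × (30 s + 20 s) = 750 s, or 12.5 minutes. That matters if the scheduler seems to hang on startup. `probe_fn` and `sleep` are injectable so the logic can be exercised without a live server.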

@pirate commented on GitHub (Oct 16, 2025):

Interesting, thanks for the additional info. I have still never been able to replicate this personally. I use an M1 MacBook Pro as my daily driver and often sleep/wake it without pausing containers. I don't use Aldente Pro or `caffeinate`, but I do occasionally use `Amphetamine.app`, which is probably similar.

For anyone who stumbles across this later, the docs in question that describe the fix are here:

https://github.com/ArchiveBox/ArchiveBox/wiki/Troubleshooting#repairing-a-corrupted-sqlite3-database-file
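Before you follow the repair steps in that wiki page, it can help to confirm that the file really is damaged. Here is a minimal Python sketch. It assumes the containers are stopped so nothing writes to the database during the check. It also assumes the index is at `archivebox/data/index.sqlite3` on the host. That path is hypothetical, derived from the `${PERSISTENT_ROOT_DIR:-.}/archivebox/data:/data` volume, and `index.sqlite3` is ArchiveBox's usual index filename. The sketch runs SQLite's built-in `PRAGMA integrity_check` over a read-only connection. The helper names `integrity_report` and `is_healthy` are illustrative, not part of ArchiveBox.

```python
from __future__ import annotations

import sqlite3
import sys
from pathlib import Path


def integrity_report(db_path: str) -> list[str]:
    """Run PRAGMA integrity_check and return its rows; ['ok'] means no damage was found."""
    # Read-only URI connection, so the diagnostic itself never writes to a suspect file.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    try:
        conn = sqlite3.connect(uri, uri=True)
    except sqlite3.DatabaseError as exc:
        return [f"cannot open: {exc}"]
    try:
        return [row[0] for row in conn.execute("PRAGMA integrity_check")]
    except sqlite3.DatabaseError as exc:
        # Badly damaged files (e.g. a mangled header) fail before the check can even run.
        return [f"error: {exc}"]
    finally:
        conn.close()


def is_healthy(db_path: str) -> bool:
    return integrity_report(db_path) == ["ok"]


if __name__ == "__main__":
    # Hypothetical default path; pass the real host-side path as the first argument.
    path = sys.argv[1] if len(sys.argv) > 1 else "archivebox/data/index.sqlite3"
    for line in integrity_report(path):
        print(line)
```

Example: `python check_db.py archivebox/data/index.sqlite3` (the script name is hypothetical). A single `ok` line means SQLite found no structural damage. Other output, or an `error:` line such as "database disk image is malformed", suggests the wiki's repair procedure is needed.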
