[GH-ISSUE #1625] archivebox update does not pick up status properly #3987

Open
opened 2026-03-15 01:12:48 +03:00 by kerem (Owner) · 0 comments

Originally created by @groby on GitHub (Dec 23, 2024).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/1625

Originally assigned to: @pirate on GitHub.

Provide a screenshot and describe the bug

When archivebox add fails, it gives you commands you can run by hand ("Run to see full output:").

That's super-helpful for debugging, and occasionally those manual runs succeed and leave the correct artifacts behind: technically yay, I've archived what I wanted. But, alas, index.json is not updated, and archivebox update -t <timestamp> doesn't pick up on that change either. That means the status displayed by the web server isn't necessarily correct, e.g. screenshots are shown greyed out even though they exist.

Not a major issue (I can always manually fix up the index.json :), but it'd be nice if there were a way to apply those fixes wholesale to the archive.
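A wholesale fix could start with a read-only report of snapshots whose artifacts exist on disk but whose index.json does not record a success. The sketch below is a minimal example under several assumptions, none confirmed by this issue: that each archive/<timestamp>/index.json has a "history" object keyed by extractor name, whose entries carry "start_ts" and "status" fields with "succeeded" marking success, and that the filenames in EXPECTED_OUTPUTS match what ArchiveBox 0.7.x writes. EXPECTED_OUTPUTS, latest_status and find_stale_entries are hypothetical names.

```python
from __future__ import annotations

import json
from pathlib import Path

# ASSUMPTION: artifact names per extractor in ArchiveBox 0.7.x snapshot folders.
# Illustrative only; verify against your own archive/<timestamp>/ folders.
EXPECTED_OUTPUTS = {
    "screenshot": "screenshot.png",
    "pdf": "output.pdf",
    "singlefile": "singlefile.html",
    "warc": "warc",  # a directory, not a file
}


def latest_status(history: dict, method: str) -> str | None:
    """Return the status of the most recent recorded result for `method`, or None."""
    results = history.get(method) or []
    if not results:
        return None
    # ASSUMPTION: entries are dicts with ISO-formatted "start_ts" and a "status" key.
    latest = max(results, key=lambda r: r.get("start_ts") or "")
    return latest.get("status")


def find_stale_entries(archive_dir: Path) -> list[tuple[str, str, str | None]]:
    """List (snapshot, method, recorded_status) where the artifact exists on disk
    but index.json does not record a success. Never modifies anything."""
    stale = []
    for snapshot_dir in sorted(p for p in archive_dir.iterdir() if p.is_dir()):
        index_file = snapshot_dir / "index.json"
        if not index_file.exists():
            continue
        history = json.loads(index_file.read_text()).get("history") or {}
        for method, name in EXPECTED_OUTPUTS.items():
            artifact = snapshot_dir / name
            if not artifact.exists():
                continue
            # Skip zero-byte files, which a failed run may leave behind.
            if artifact.is_file() and artifact.stat().st_size == 0:
                continue
            status = latest_status(history, method)
            if status != "succeeded":
                stale.append((snapshot_dir.name, method, status))
    return stale
```

Calling find_stale_entries(Path("archive")) from the data directory returns the mismatches without editing index.json or index.sqlite3; since it is not established here which of the two the web UI reads status from, any actual fix-up is best done after a backup.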

Steps to reproduce

1. Try to archive a URL and fail in one or more steps
2. Run the suggested manual command to investigate what the problem is
3. Have the manual command succeed
4. View status on server - you don't see the success
5. Run archivebox update for that snapshot 
6. View status on server - still no update.

Logs or errors


ArchiveBox Version

0.7.2
ArchiveBox v0.7.2 BUILD_TIME=2024-12-22 19:26:23 1734924383
IN_DOCKER=False IN_QEMU=False ARCH=x86_64 OS=Darwin PLATFORM=macOS-15.2-x86_64-i386-64bit PYTHON=Cpython
FS_ATOMIC=True FS_REMOTE=False FS_USER=502:20 FS_PERMS=644
DEBUG=False IS_TTY=True TZ=UTC SEARCH_BACKEND=ripgrep LDAP=False

[i] Dependency versions:
 √  PYTHON_BINARY         v3.11.5         valid     /usr/local/Cellar/python@3.11/3.11.5/Frameworks/Python.framework/Versions/3.11/bin/python3.11
 √  SQLITE_BINARY         v2.6.0          valid     /usr/local/Cellar/python@3.11/3.11.5/Frameworks/Python.framework/Versions/3.11/lib/python3.11/sqlite3/dbapi2.py
 √  DJANGO_BINARY         v3.1.14         valid     /Users/groby/.cache/uv/archive-v0/phGKS0M5lK0pwRy3g-KFx/lib/python3.11/site-packages/django/__init__.py
 √  ARCHIVEBOX_BINARY     v0.7.2          valid     /Users/groby/.cache/uv/archive-v0/phGKS0M5lK0pwRy3g-KFx/bin/archivebox

 √  CURL_BINARY           v8.7.1          valid     /usr/bin/curl
 -  WGET_BINARY           -               disabled  /usr/local/bin/wget
 √  NODE_BINARY           v16.4.2         valid     /usr/local/bin/node
 √  SINGLEFILE_BINARY     v1.0.31         valid     ./node_modules/single-file/cli/single-file
 -  READABILITY_BINARY    -               disabled  ./node_modules/readability-extractor/readability-extractor
 -  MERCURY_BINARY        -               disabled  postlight-parser
 -  GIT_BINARY            -               disabled  /usr/local/bin/git
 -  YOUTUBEDL_BINARY      -               disabled  /Users/groby/.cache/uv/archive-v0/phGKS0M5lK0pwRy3g-KFx/bin/yt-dlp
 √  CHROME_BINARY         v131.0.6778.205  valid     "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
 √  RIPGREP_BINARY        v13.0.0         valid     /usr/local/bin/rg

[i] Source-code locations:
 √  PACKAGE_DIR           23 files        valid     /Users/groby/.cache/uv/archive-v0/phGKS0M5lK0pwRy3g-KFx/lib/python3.11/site-packages/archivebox
 √  TEMPLATES_DIR         3 files         valid     /Users/groby/.cache/uv/archive-v0/phGKS0M5lK0pwRy3g-KFx/lib/python3.11/site-packages/archivebox/templates
 -  CUSTOM_TEMPLATES_DIR  -               disabled  None

[i] Secrets locations:
 -  CHROME_USER_DATA_DIR  -               disabled  None
 -  COOKIES_FILE          -               disabled  None

[i] Data locations:
 √  OUTPUT_DIR            11 files        valid     /Users/groby/web-archive.2
 √  SOURCES_DIR           7 files         valid     ./sources
 √  LOGS_DIR              1 files         valid     ./logs
 √  ARCHIVE_DIR           1774 files      valid     ./archive
 √  CONFIG_FILE           453.0 Bytes     valid     ./ArchiveBox.conf
 √  SQL_INDEX             18.1 MB         valid     ./index.sqlite3

How did you install the version of ArchiveBox you are using?

Other

What operating system are you running on?

macOS (including Docker on macOS)

What type of drive are you using to store your ArchiveBox data?

  • [x] data/ is on a local SSD or NVMe drive
  • [ ] data/ is on a spinning hard drive or external USB drive
  • [ ] data/ is on a network mount (e.g. NFS/SMB/CIFS/etc.)
  • [ ] data/ is on a FUSE mount (e.g. SSHFS/RClone/S3/B2/OneDrive, etc.)

Docker Compose Configuration


ArchiveBox Configuration

[SERVER_CONFIG]
<redacted>

[ARCHIVE_METHOD_TOGGLES]
SAVE_TITLE = True
SAVE_FAVICON = False
SAVE_GIT = False
SAVE_SINGLEFILE = True
SAVE_READABILITY = False
SAVE_MERCURY = False
SAVE_PDF = True
SAVE_DOM = False
SAVE_SCREENSHOT = True
SAVE_WGET = True
SAVE_WARC = True
SAVE_ARCHIVEBOX = False
SAVE_ARCHIVE_DOT_ORG = True
SAVE_MEDIA = False

[DEPENDENCY_CONFIG]
USE_SINGLEFILE = True
USE_WGET = False
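To cross-check which artifacts a snapshot folder should contain, the toggles above can be read with Python's standard configparser, assuming ArchiveBox.conf keeps the INI layout shown here. TOGGLE_TO_ARTIFACT and expected_artifacts are hypothetical names, and the toggle-to-filename mapping is an assumption to verify against your own archive/<timestamp>/ folders.

```python
import configparser

# ASSUMPTION: artifact each toggle produces in a snapshot folder (unverified 0.7.x naming).
TOGGLE_TO_ARTIFACT = {
    "SAVE_SCREENSHOT": "screenshot.png",
    "SAVE_PDF": "output.pdf",
    "SAVE_SINGLEFILE": "singlefile.html",
    "SAVE_DOM": "output.html",
    "SAVE_WARC": "warc/",
}


def expected_artifacts(conf_text: str) -> list:
    """Return the artifacts whose SAVE_* toggle is enabled in ArchiveBox.conf text."""
    # allow_no_value tolerates bare lines such as a redacted placeholder.
    parser = configparser.ConfigParser(allow_no_value=True)
    parser.optionxform = str  # keep option names upper-case as written
    parser.read_string(conf_text)
    if not parser.has_section("ARCHIVE_METHOD_TOGGLES"):
        return []
    toggles = parser["ARCHIVE_METHOD_TOGGLES"]
    return [
        artifact
        for toggle, artifact in TOGGLE_TO_ARTIFACT.items()
        if toggles.getboolean(toggle, fallback=False)
    ]
```

With the configuration in this report, the enabled toggles would map to screenshot.png, output.pdf, singlefile.html and warc/.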