[GH-ISSUE #1177] Bug: When SAVE_WARC is set to false, the web UI still shows the WARC icon activated even though it leads to nothing #2240

Closed
opened 2026-03-01 17:57:36 +03:00 by kerem · 1 comment
Owner

Originally created by @melyux on GitHub (Jul 12, 2023).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/1177

Describe the bug

When the environment variable SAVE_WARC is set to false, the web UI still shows the WARC icon activated even though it leads to nothing.

Steps to reproduce

  1. Set SAVE_WARC to false.
  2. Snapshot a URL.
  3. Look at the URL on the snapshots page.

Screenshots or log output

In this picture, SAVE_PDF, SAVE_GIT, and SAVE_WARC are all set to false, but only the WARC icon shows up as activated, while the other two are properly grayed out.
[image: https://github.com/ArchiveBox/ArchiveBox/assets/10296053/781f8152-ca8a-4a7b-87c3-4f9fd44b9a6f]

ArchiveBox version

find: '/.config/chromium/Crash Reports/pending/': No such file or directory
0.6.3
ArchiveBox v0.6.3 Cpython Linux Linux-6.1.0-10-amd64-x86_64-with-glibc2.31 x86_64
DEBUG=False IN_DOCKER=True IS_TTY=True TZ=UTC FS_ATOMIC=True FS_REMOTE=True FS_PERMS=644 999:999 SEARCH_BACKEND=ripgrep

[i] Dependency versions:
 √  PYTHON_BINARY         v3.11.3         valid     /usr/local/bin/python3.11                                                   
 √  SQLITE_BINARY         v2.6.0          valid     /usr/local/lib/python3.11/sqlite3/dbapi2.py                                 
 √  DJANGO_BINARY         v3.1.14         valid     /usr/local/lib/python3.11/site-packages/django/__init__.py                  
 √  ARCHIVEBOX_BINARY     v0.6.3          valid     /usr/local/bin/archivebox                                                   

 √  CURL_BINARY           v7.74.0         valid     /usr/bin/curl                                                               
 √  WGET_BINARY           v1.21           valid     /usr/bin/wget                                                               
 √  NODE_BINARY           v18.16.0        valid     /usr/bin/node                                                               
 √  SINGLEFILE_BINARY     v0.3.16         valid     /node/node_modules/single-file/cli/single-file                              
 √  READABILITY_BINARY    v0.0.2          valid     /node/node_modules/readability-extractor/readability-extractor              
 √  MERCURY_BINARY        v1.0.0          valid     /node/node_modules/@postlight/mercury-parser/cli.js                         
 √  GIT_BINARY            v2.30.2         valid     /usr/bin/git                                                                
 √  YOUTUBEDL_BINARY      v2023.03.04     valid     /usr/local/bin/yt-dlp                                                       
 √  CHROME_BINARY         v112.0.5615.138  valid     /usr/bin/chromium                                                           
 √  RIPGREP_BINARY        v12.1.1         valid     /usr/bin/rg                                                                 

[i] Source-code locations:
 √  PACKAGE_DIR           22 files        valid     /app/archivebox                                                             
 √  TEMPLATES_DIR         3 files         valid     /app/archivebox/templates                                                   
 -  CUSTOM_TEMPLATES_DIR  -               disabled                                                                              

[i] Secrets locations:
 -  CHROME_USER_DATA_DIR  -               disabled                                                                              
 -  COOKIES_FILE          -               disabled                                                                              

[i] Data locations:
 √  OUTPUT_DIR            7 files @       valid     /data                                                                       
 √  SOURCES_DIR           172 files       valid     ./sources                                                                   
 √  LOGS_DIR              2 files         valid     ./logs                                                                      
 √  ARCHIVE_DIR           54 files        valid     ./archive                                                                   
 √  CONFIG_FILE           81.0 Bytes      valid     ./ArchiveBox.conf                                                           
 √  SQL_INDEX             776.0 KB        valid     ./index.sqlite3  

@pirate commented on GitHub (Jan 5, 2024):

The WARC method is sort of a lie as it's not really its own extractor, it's just a parameter added when the wget extractor runs.

For speed reasons (to avoid 100s of filesystem calls to tell if the warc files are present) when viewing the Snapshot list, we assume that if WGET ran then there is a WARC available.

This may change in the future if we switch to a different WARC generation method (e.g. ArchiveWeb.page / browsertrix / pywb), but for now it's expected behavior and I'm unlikely to add a whole caching system or complex workaround just for this edge case.
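
To make the behavior described above concrete, here is a minimal sketch of how a list view could derive icon states without touching the filesystem. It is not ArchiveBox's actual code: `SnapshotRow`, `warc_icon_active`, and `icon_states` are hypothetical names invented for illustration, and the only thing taken from the comment is the rule itself (the WARC icon follows the wget result, while other extractors such as PDF and git have results of their own).

```python
from dataclasses import dataclass, field


@dataclass
class SnapshotRow:
    # Hypothetical stand-in for the data a list view already has loaded;
    # this is not ArchiveBox's real Snapshot model.
    succeeded: set = field(default_factory=set)


def warc_icon_active(row: SnapshotRow) -> bool:
    # Rule described in the comment: no filesystem lookup is made, the WARC
    # icon simply mirrors whether the wget extractor ran successfully.
    return "wget" in row.succeeded


def icon_states(row: SnapshotRow) -> dict:
    # wget, pdf and git are separate extractors, so each has its own result.
    states = {name: name in row.succeeded for name in ("wget", "pdf", "git")}
    # WARC has no result of its own, so it is inferred from wget.
    states["warc"] = warc_icon_active(row)
    return states


# A snapshot where only wget succeeded (PDF and git disabled):
row = SnapshotRow(succeeded={"wget"})
print(icon_states(row))
# {'wget': True, 'pdf': False, 'git': False, 'warc': True}
```

Under this assumption the WARC icon stays active whenever wget succeeded, regardless of SAVE_WARC, which matches the report: PDF and git are grayed out but WARC is not.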
