[GH-ISSUE #1278] Bug: Singlefile and other Chrome-based extractors not working in 0.7.1 on x86_64 #2295

Closed
opened 2026-03-01 17:57:59 +03:00 by kerem · 12 comments
Owner

Originally created by @onemenzel on GitHub (Dec 2, 2023).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/1278

Describe the bug

For about a month now, SingleFile and other Chrome-based extractors have stopped working.

Steps to reproduce

Enter a URL into the add interface or try to pull any snapshot. My config:

[SERVER_CONFIG]
SECRET_KEY = REDACTED
PUBLIC_INDEX = False
PUBLIC_SNAPSHOTS = False
PUBLIC_ADD_VIEW = False
SAVE_FAVICON = False
SAVE_ARCHIVE_DOT_ORG = False
SAVE_WGET = False
SAVE_WARC = False
SAVE_PDF = False
SAVE_SCREENSHOT = False
SAVE_MEDIA = False
SAVE_DOM = False
SAVE_GIT = False
CHROME_USER_DATA_DIR = /data/chromium-profile
CHROME_SANDBOX = True
CHROME_HEADLESS = True

I also tried enabling user namespace cloning on my host system, as recommended in the Puppeteer docs linked in the logs pasted below. I also tried the second method from there (Setup setuid sandbox) within the container, but I could not get that to work either.
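For anyone trying to confirm whether user namespace cloning is actually enabled, here is a minimal sketch in Python. It assumes the kernel exposes `/proc/sys/kernel/unprivileged_userns_clone` (a Debian/Ubuntu-specific knob that may be absent elsewhere) and `/proc/sys/user/max_user_namespaces`. The function names are made up for illustration and are not part of ArchiveBox. Passing this check is necessary but may not be sufficient, since the container runtime can impose its own restrictions.

```python
from pathlib import Path


def read_sysctl_int(path):
    """Return the integer stored in a sysctl file, or None if it is missing or unreadable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def userns_allowed(clone_flag, max_namespaces):
    """Decide whether unprivileged user namespaces look enabled.

    clone_flag:      value of kernel.unprivileged_userns_clone (None if the knob is absent)
    max_namespaces:  value of user.max_user_namespaces (None if the knob is absent)
    """
    if clone_flag == 0:       # Debian/Ubuntu knob explicitly disabled
        return False
    if max_namespaces == 0:   # limit of zero namespaces disables them
        return False
    return True               # nothing visible forbids it


if __name__ == "__main__":
    clone_flag = read_sysctl_int("/proc/sys/kernel/unprivileged_userns_clone")
    max_namespaces = read_sysctl_int("/proc/sys/user/max_user_namespaces")
    print("unprivileged_userns_clone =", clone_flag)
    print("max_user_namespaces       =", max_namespaces)
    print("user namespaces look enabled:", userns_allowed(clone_flag, max_namespaces))
```

Running it both on the host and inside the container would show whether the two environments see the same values.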

Screenshots or log output

Log Output
archivebox_1  |         Extractor failed:
archivebox_1  |              SingleFile was not able to archive the page
archivebox_1  |         Run to see full output:
archivebox_1  |             cd /data/archive/REDACTED;
archivebox_1  |             /app/node_modules/single-file-cli/single-file --browser-executable-path=chromium-browser "--browser-args=[\"--headless=new\", \"--user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36 ArchiveBox/{VERSION} (+https://github.com/ArchiveBox/ArchiveBox/)\", \"--window-size=1440,2000\", \"--user-data-dir=/data/chromium-profile\"]" REDACTED singlefile.html

When I execute the above via docker-compose exec archivebox bash:

root@dfbcbfb5e14f:/data# gosu $PUID /app/node_modules/single-file-cli/single-file --browser-executable-path=chromium-browser "--browser-args=[\"--headless=new\", \"--user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36 ArchiveBox/{VERSION} (+https://github.com/ArchiveBox/ArchiveBox/)\", \"--window-size=1440,2000\", \"--user-data-dir=/data/chromium-profile\"]" REDACTED singlefile.html
Failed to launch the browser process!
[194:194:1202/191703.178938:FATAL:zygote_host_impl_linux.cc(127)] No usable sandbox! Update your kernel or see https://chromium.googlesource.com/chromium/src/+/main/docs/linux/suid_sandbox_development.md for more information on developing with the SUID sandbox. If you want to live dangerously and need an immediate workaround, you can try using --no-sandbox.
#0 0x5580a3179882 base::debug::CollectStackTrace()
#1 0x5580a3166083 base::debug::StackTrace::StackTrace()
#2 0x5580a30c0723 logging::LogMessage::~LogMessage()
#3 0x5580a1751cea content::ZygoteHostImpl::Init()
#4 0x5580a24e83cc content::ContentMainRunnerImpl::Initialize()
#5 0x5580a24e5eb0 content::RunContentProcess()
#6 0x5580a24e627d content::ContentMain()
#7 0x55809e96331b ChromeMain
#8 0x7f980c8f51ca (/usr/lib/x86_64-linux-gnu/libc.so.6+0x271c9)
#9 0x7f980c8f5285 __libc_start_main
#10 0x55809e96302a _start
Crash keys:
  "switch-42" = "about:blank"
  "switch-41" = "--use-angle=swiftshader-webgl"
  "switch-40" = "--ozone-override-screen-size=800,600"
  "switch-39" = "--ozone-platform=headless"
  "switch-38" = "--noerrdialogs"
  "switch-37" = "--remote-debugging-port=0"
  "switch-36" = "--window-size=1280,720"
  "switch-35" = "--no-pings"
  "switch-34" = "--disable-web-security"
  "switch-33" = "--user-data-dir=/data/chromium-profile"
  "switch-32" = "--window-size=1440,2000"
  "switch-31" = "--user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) App"
  "switch-30" = "--headless=new"
  "switch-29" = "--mute-audio"
  "switch-28" = "--hide-scrollbars"
  "switch-27" = "--headless"
  "switch-26" = "--use-mock-keychain"
  "switch-25" = "--password-store=basic"
  "switch-24" = "--no-first-run"
  "switch-23" = "--metrics-recording-only"
  "switch-22" = "--force-color-profile=srgb"
  "switch-21" = "--export-tagged-pdf"
  "switch-20" = "--enable-blink-features=IdleDetection"
  "switch-19" = "--enable-automation"
  "switch-18" = "--disable-sync"
  "switch-17" = "--disable-search-engine-choice-screen"
  "switch-16" = "--disable-renderer-backgrounding"
  "switch-15" = "--disable-prompt-on-repost"
  "switch-14" = "--disable-popup-blocking"
  "switch-13" = "--disable-ipc-flooding-protection"
  "switch-12" = "--disable-hang-monitor"
  "switch-11" = "--disable-extensions"
  "switch-10" = "--disable-dev-shm-usage"
  "switch-9" = "--disable-default-apps"
  "switch-8" = "--disable-component-update"
  "switch-7" = "--disable-component-extensions-with-background-pages"
  "switch-6" = "--disable-client-side-phishing-detection"
  "switch-5" = "--disable-breakpad"
  "switch-4" = "--disable-backgrounding-occluded-windows"
  "switch-3" = "--disable-background-timer-throttling"
  "switch-2" = "--disable-background-networking"
  "switch-1" = "--allow-pre-commit-input"
  "num-switches" = "44"
  "commandline-disabled-feature-6" = "BackForwardCache"
  "commandline-disabled-feature-5" = "Prerender2"
  "commandline-disabled-feature-4" = "OptimizationHints"
  "commandline-disabled-feature-3" = "MediaRouter"
  "commandline-disabled-feature-2" = "AcceptCHFrame"
  "commandline-disabled-feature-1" = "Translate"
  "commandline-enabled-feature-1" = "NetworkServiceInProcess2"
  "osarch" = "x86_64"
  "pid" = "194"
  "ptype" = "browser"

[1202/191703.431034:ERROR:file_io_posix.cc(144)] open /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq: No such file or directory (2)
[1202/191703.431174:ERROR:file_io_posix.cc(144)] open /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq: No such file or directory (2)
Received signal 6
#0 0x5580a3179882 base::debug::CollectStackTrace()
#1 0x5580a3166083 base::debug::StackTrace::StackTrace()
#2 0x5580a31792a1 base::debug::(anonymous namespace)::StackDumpSignalHandler()
#3 0x7f980c909fd0 (/usr/lib/x86_64-linux-gnu/libc.so.6+0x3bfcf)
#4 0x7f980c958d3c (/usr/lib/x86_64-linux-gnu/libc.so.6+0x8ad3b)
#5 0x7f980c909f32 gsignal
#6 0x7f980c8f4472 abort
#7 0x5580a315a0c5 base::debug::BreakDebuggerAsyncSafe()
#8 0x5580a30c11c8 base::RepeatingCallback<>::Run()
#9 0x5580a30c0ee4 logging::LogMessage::~LogMessage()
#10 0x5580a1751cea content::ZygoteHostImpl::Init()
#11 0x5580a24e83cc content::ContentMainRunnerImpl::Initialize()
#12 0x5580a24e5eb0 content::RunContentProcess()
#13 0x5580a24e627d content::ContentMain()
#14 0x55809e96331b ChromeMain
#15 0x7f980c8f51ca (/usr/lib/x86_64-linux-gnu/libc.so.6+0x271c9)
#16 0x7f980c8f5285 __libc_start_main
#17 0x55809e96302a _start
  r8: 0000323000338167  r9: 0000000000000b27 r10: 0000000000000008 r11: 0000000000000246
 r12: 0000000000000006 r13: 000000000000007f r14: 000055809c33609f r15: 00007fffb1ca2950
  di: 00000000000000c2  si: 00000000000000c2  bp: 00007f980b58b380  bx: 00000000000000c2
  dx: 0000000000000006  ax: 0000000000000000  cx: 00007f980c958d3c  sp: 00007fffb1ca27b0
  ip: 00007f980c958d3c efl: 0000000000000246 cgf: 002b000000000033 erf: 0000000000000000
 trp: 0000000000000000 msk: 0000000000000000 cr2: 0000000000000000
[end of stack trace]


TROUBLESHOOTING: https://pptr.dev/troubleshooting
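The relevant line in the output above is the single FATAL entry; everything after it is the resulting abort. Below is a minimal sketch of picking that line out of captured Chromium output when comparing reports. The helper names are hypothetical, and the regular expression only assumes the `[...:FATAL:file(line)] message` layout visible in this log.

```python
import re

# Matches lines such as:
# [194:194:1202/191703.178938:FATAL:zygote_host_impl_linux.cc(127)] No usable sandbox! ...
FATAL_RE = re.compile(
    r"^\[(?P<stamp>[^\]]*):FATAL:(?P<source>[^\]]+)\]\s*(?P<message>.*)$",
    re.MULTILINE,
)


def find_fatal_lines(output):
    """Return (source, message) pairs for every FATAL line in the output."""
    return [(m.group("source"), m.group("message")) for m in FATAL_RE.finditer(output)]


def is_sandbox_failure(output):
    """True if any FATAL line reports that Chromium found no usable sandbox."""
    return any("No usable sandbox!" in message for _, message in find_fatal_lines(output))
```

With the log above as input, `find_fatal_lines` would report `zygote_host_impl_linux.cc(127)` as the source, while the two `ERROR` lines about `cpufreq` are ignored.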

ArchiveBox version

docker: archivebox/archivebox:latest as of today

I'm using docker-compose with ubuntu 20.04 as the host system.

0.7.1+editable
ArchiveBox v0.7.1+editable Cpython Linux Linux-5.4.0-167-generic-x86_64-with-glibc2.36 x86_64
DEBUG=False IN_DOCKER=True IN_QEMU=False IS_TTY=True TZ=UTC FS_ATOMIC=True FS_REMOTE=True FS_USER=911:911 FS_PERMS=644 SEARCH_BACKEND=sonic

[i] Dependency versions:
 √  PYTHON_BINARY         v3.11.6         valid     /usr/local/bin/python3.11
 √  SQLITE_BINARY         v2.6.0          valid     /usr/local/lib/python3.11/sqlite3/dbapi2.py
 √  DJANGO_BINARY         v3.1.14         valid     /usr/local/lib/python3.11/site-packages/django/__init__.py
 √  ARCHIVEBOX_BINARY     v0.7.1          valid     /usr/local/bin/archivebox

 √  CURL_BINARY           v8.4.0          valid     /usr/bin/curl
 -  WGET_BINARY           -               disabled  /usr/bin/wget
 √  NODE_BINARY           v21.3.0         valid     /usr/bin/node
 √  SINGLEFILE_BINARY     v1.1.18         valid     /app/node_modules/single-file-cli/single-file
 √  READABILITY_BINARY    v0.0.9          valid     /app/node_modules/readability-extractor/readability-extractor
 √  MERCURY_BINARY        v1.0.0          valid     /app/node_modules/@postlight/parser/cli.js
 -  GIT_BINARY            -               disabled  /usr/bin/git
 -  YOUTUBEDL_BINARY      -               disabled  /usr/local/bin/yt-dlp
 √  CHROME_BINARY         v119.0.6045.9   valid     /usr/bin/chromium-browser
 √  RIPGREP_BINARY        v13.0.0         valid     /usr/bin/rg

[i] Source-code locations:
 √  PACKAGE_DIR           22 files        valid     /app/archivebox
 √  TEMPLATES_DIR         3 files         valid     /app/archivebox/templates
 -  CUSTOM_TEMPLATES_DIR  -               disabled  None

[i] Secrets locations:
 √  CHROME_USER_DATA_DIR  3 files         valid     ./chromium-profile
 -  COOKIES_FILE          -               disabled  None

[i] Data locations:
 √  OUTPUT_DIR            9 files @       valid     /data
 √  SOURCES_DIR           979 files       valid     ./sources
 √  LOGS_DIR              1 files         valid     ./logs
 √  ARCHIVE_DIR           879 files       valid     ./archive
 √  CONFIG_FILE           421.0 Bytes     valid     ./ArchiveBox.conf
 √  SQL_INDEX             6.5 MB          valid     ./index.sqlite3
kerem closed this issue 2026-03-01 17:57:59 +03:00

@pirate commented on GitHub (Dec 18, 2023):

Can you try again with the latest build? I bumped the Chrome version and pushed a few other minor fixes.


@darwinshameran commented on GitHub (Dec 20, 2023):

Same issue here. Unfortunately this is preventing us from using ArchiveBox in production internally, and we don't exactly want to run a dev build.


@pirate commented on GitHub (Dec 20, 2023):

I'm almost ready to roll a minor patch release from dev -> main: https://github.com/ArchiveBox/ArchiveBox/pull/1297. If you're able to verify it works on your machine, I can get the release out by New Year's.

Chrome archiving works on my test machines and I can't reproduce the reported issue on the 0.7.2 candidate, so it would be super helpful to get bug reports from anyone who's experiencing failures, so that I can make sure 0.7.2 works for everyone.

Namespace cloning or suid sandboxing should not be necessary within Docker. It should "just work" on the first try with our new playwright-based chrome distribution 😕


@MyNameIsOka commented on GitHub (Jan 1, 2024):

I would have liked to test it locally, but I am not able to build the Docker container in my local environment following these steps: https://github.com/ArchiveBox/ArchiveBox#setup-the-dev-environment

It fails at docker build. First it failed because a comma was missing in parse_version_string, which was easy to fix.
Then it failed because it couldn't find VERSIONS_AVAILABLE, probably because no release exists (?). How can I get around that error?

log:

# DOCKER_BUILDKIT=1 docker build . -t archivebox
[+] Building 79.6s (26/27)                                                                                                                                                                                                                  
 => [internal] load build definition from Dockerfile                                                                                                                                                                                   6.2s
 => => transferring dockerfile: 37B                                                                                                                                                                                                    0.0s
 => [internal] load .dockerignore                                                                                                                                                                                                      4.6s
 => => transferring context: 35B                                                                                                                                                                                                       0.1s
 => [internal] load metadata for docker.io/library/python:3.11-slim-bookworm                                                                                                                                                           0.0s
 => [stage-0  1/23] FROM docker.io/library/python:3.11-slim-bookworm                                                                                                                                                                   0.0s
 => [internal] load build context                                                                                                                                                                                                      2.6s
 => => transferring context: 95.13kB                                                                                                                                                                                                   0.2s
 => CACHED [stage-0  2/23] COPY --chown=root:root --chmod=755 package.json /app/                                                                                                                                                       0.0s
 => CACHED [stage-0  3/23] RUN grep '"version": ' "/app/package.json" | awk -F'"' '{print $4}' > /VERSION.txt                                                                                                                          0.0s
 => CACHED [stage-0  4/23] RUN echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' > /etc/apt/apt.conf.d/keep-cache     && rm -f /etc/apt/apt.conf.d/docker-clean                                                                0.0s
 => CACHED [stage-0  5/23] RUN (echo "[i] Docker build for ArchiveBox $(cat /VERSION.txt) starting..."     && echo "PLATFORM=linux/amd64 ARCH=$(uname -m) ($(uname -s) amd64 )"     && echo "BUILD_START_TIME=$(date +"%Y-%m-%d %H:%M  0.0s
 => CACHED [stage-0  6/23] RUN echo "[*] Setting up archivebox user uid=911..."     && groupadd --system archivebox     && useradd --system --create-home --gid archivebox --groups audio,video archivebox     && usermod -u "911" "a  0.0s
 => CACHED [stage-0  7/23] RUN --mount=type=cache,target=/var/cache/apt,sharing=locked,id=apt-amd64     echo "[+] Installing APT base system dependencies for linux/amd64..."     && echo 'deb https://deb.debian.org/debian bookworm  0.0s
 => CACHED [stage-0  8/23] RUN --mount=type=cache,target=/var/cache/apt,sharing=locked,id=apt-amd64 --mount=type=cache,target=/root/.npm,sharing=locked,id=npm-amd64     echo "[+] Installing Node 21 environment in /app/node_module  0.0s
 => CACHED [stage-0  9/23] RUN --mount=type=cache,target=/var/cache/apt,sharing=locked,id=apt-amd64 --mount=type=cache,target=/root/.cache/pip,sharing=locked,id=pip-amd64     echo "[+] Setting up Python 3.11 runtime..."     && (   0.0s
 => CACHED [stage-0 10/23] RUN --mount=type=cache,target=/var/cache/apt,sharing=locked,id=apt-amd64 --mount=type=cache,target=/root/.cache/pip,sharing=locked,id=pip-amd64     echo "[+] Installing APT extractor dependencies global  0.0s
 => CACHED [stage-0 11/23] RUN --mount=type=cache,target=/var/cache/apt,sharing=locked,id=apt-amd64 --mount=type=cache,target=/root/.cache/pip,sharing=locked,id=pip-amd64 --mount=type=cache,target=/root/.cache/ms-playwright,shari  0.0s
 => CACHED [stage-0 12/23] WORKDIR /app                                                                                                                                                                                                0.0s
 => CACHED [stage-0 13/23] COPY --chown=root:root --chmod=755 package.json package-lock.json /app/                                                                                                                                     0.0s
 => CACHED [stage-0 14/23] RUN --mount=type=cache,target=/root/.npm,sharing=locked,id=npm-amd64     echo "[+] Installing NPM extractor dependencies from package.json into /app/node_modules..."     && npm ci --prefer-offline --no-  0.0s
 => CACHED [stage-0 15/23] WORKDIR /app                                                                                                                                                                                                0.0s
 => CACHED [stage-0 16/23] COPY --chown=root:root --chmod=755 ./pyproject.toml requirements.txt /app/                                                                                                                                  0.0s
 => CACHED [stage-0 17/23] RUN --mount=type=cache,target=/var/cache/apt,sharing=locked,id=apt-amd64 --mount=type=cache,target=/root/.cache/pip,sharing=locked,id=pip-amd64     echo "[+] Installing PIP ArchiveBox dependencies from   0.0s
 => [stage-0 18/23] COPY --chown=root:root --chmod=755 . /app/                                                                                                                                                                         6.1s
 => [stage-0 19/23] RUN --mount=type=cache,target=/var/cache/apt,sharing=locked,id=apt-amd64 --mount=type=cache,target=/root/.cache/pip,sharing=locked,id=pip-amd64     echo "[*] Installing PIP ArchiveBox package from /app..."     27.3s
 => [stage-0 20/23] WORKDIR /data                                                                                                                                                                                                      7.5s 
 => [stage-0 21/23] RUN (echo -e "\n\n[√] Finished Docker build succesfully. Saving build summary in: /VERSION.txt"     && echo -e "PLATFORM=linux/amd64 ARCH=$(uname -m) ($(uname -s) amd64 )\n"     && echo -e "BUILD_END_TIME=$    10.3s 
 => ERROR [stage-0 22/23] RUN "/app"/bin/docker_entrypoint.sh version 2>&1 | tee -a /VERSION.txt                                                                                                                                      16.1s 
------                                                                                                                                                                                                                                      
 > [stage-0 22/23] RUN "/app"/bin/docker_entrypoint.sh version 2>&1 | tee -a /VERSION.txt:                                                                                                                                                  
#26 13.64                                                                                                                                                                                                                                   
#26 13.64 [X] Error while loading configuration value: VERSIONS_AVAILABLE                                                                                                                                                                   
#26 13.64     IndexError: list index out of range
#26 13.64 
#26 13.64     Check your config for mistakes and try again (your archive data is unaffected).
#26 13.64 
#26 13.64     For config documentation and examples see:
#26 13.64         https://github.com/ArchiveBox/ArchiveBox/wiki/Configuration
#26 13.64 
------
executor failed running [/bin/bash -o pipefail -o errexit -o errtrace -o nounset -c "$CODE_DIR"/bin/docker_entrypoint.sh version 2>&1 | tee -a /VERSION.txt]: exit code: 2
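The `IndexError: list index out of range` in the log is the error Python raises when code indexes into a list that has fewer entries than expected, for example taking the first element of an empty list. The sketch below only illustrates that class of failure and one defensive alternative. It is not ArchiveBox's actual `VERSIONS_AVAILABLE` code, and the function names and the idea of a list of release tags are assumptions made for the example.

```python
def first_release_unchecked(releases):
    """Hypothetical lookup that assumes at least one release is present."""
    return releases[0]  # raises IndexError when the list is empty


def first_release_or_none(releases):
    """Hypothetical defensive variant: report 'no release known' instead of crashing."""
    return releases[0] if releases else None


if __name__ == "__main__":
    print(first_release_or_none([]))  # no exception, the caller can handle None
    try:
        first_release_unchecked([])
    except IndexError as err:
        print("IndexError:", err)
```

If the real lookup behaves like the unchecked variant, an empty result (for instance, no release information being available during the build) would produce exactly this kind of error, which would fit the guess above that no release exists.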

@pirate commented on GitHub (Jan 3, 2024):

Sorry, whoops, there was a broken commit on dev when you tested; I just fixed it. I just pushed the latest working build, mind trying again?

(No need to build locally, just run `docker pull archivebox/archivebox:dev` to get the dev image pre-built from Docker Hub.)


@MyNameIsOka commented on GitHub (Jan 3, 2024):

Thank you for fixing the image.
I pulled it, but it still doesn't seem to work. This is the output in `errors.log` when I add a website with just `SingleFile` selected:

Exception in archive_methods.save_singlefile(Link(url=https://docs.archivebox.io/en/v0.6.2/Contents.html#)) command=/usr/local/bin/archivebox server --quick-init 0.0.0.0:8000; ts=2024-01-03__03:36:59
expected string or bytes-like object, got 'NoneType'

Also, the formatting around the tags, pull, snapshot, etc. buttons looks broken:
![Screenshot 2024-01-03 at 12 39 13](https://github.com/ArchiveBox/ArchiveBox/assets/18796117/b4c2f127-f317-41d6-b57f-6915a02413ea)


@pirate commented on GitHub (Jan 4, 2024):

Argh so sorry for the hassle, try again now, I just pushed another fix. I've confirmed it looks like this on our demo server now:

![image](https://github.com/ArchiveBox/ArchiveBox/assets/511499/c4266da5-152e-4be1-92e6-0e48170cf504)

@MyNameIsOka commented on GitHub (Jan 4, 2024):

Thanks, I can confirm the tags and buttons are shown correctly again. SingleFile saving is still not working.


@pirate commented on GitHub (Jan 9, 2024):

I've upgraded singlefile in 0.7.2 and fixed a few small bugs. @MyNameIsOka, can you try the latest version with `archivebox config --set DEBUG=True` and let me know if SingleFile is still failing?

You should get more output if you save a specific link you know is broken with only singlefile like so:

docker compose run archivebox add --extract=singlefile 'https://docs.archivebox.io/en/v0.6.2/Contents.html#'

@MyNameIsOka commented on GitHub (Jan 10, 2024):

Thank you for the update. I installed the newest version by re-pulling `archivebox/archivebox` (not `dev`). Here is an excerpt from the logs when I did a re-snapshot of a website:

[!] Warning: Missing 1 recommended dependencies
    ! CHROME_BINARY: chromium (unable to detect version)
      Hint: To install all packages automatically run: archivebox setup
            or to disable it and silence this warning: archivebox config --set SAVE_CHROME=False

            
[+] [2024-01-10 01:14:58] Adding 1 links to index (crawl depth=0)...
    > Saved verbatim input to sources/1704849298-import.txt
    > Parsed 1 URLs from input (Generic TXT)
    > Found 1 new URLs not already in index
[*] [2024-01-10 01:14:59] Writing 1 links to main index...

    √ ./index.sqlite3
[*] [2024-01-10 01:15:00] Archiving 1/96 URLs from added set...
[▶] [2024-01-10 01:15:00] Starting archiving of 1 snapshots in index...
[+] [2024-01-10 01:15:02] "wiki.archlinux.org/#2024-01-10T01:14:58+00:00"
    https://wiki.archlinux.org/#2024-01-10T01:14:58+00:00
    > ./archive/1704849299.542502
      > headers
"GET / HTTP/1.1" 302 0
      > singlefile
      > pdf
      > screenshot
      > dom

      > wget
      > title
      > readability
"GET / HTTP/1.1" 302 0
      > mercury
"GET / HTTP/1.1" 302 0
      > htmltotext
      > media
"GET / HTTP/1.1" 302 0

After that, I ran `archivebox setup`, but the logs were the same afterwards.
It also seems that Chromium could not be installed correctly. Here is the output from `archivebox setup`:

archivebox@archivebox:/data$ archivebox setup
[i] [2024-01-10 01:17:02] ArchiveBox v0.7.2: archivebox setup
    > /data


[+] Installing enabled ArchiveBox dependencies automatically...

    Installing YOUTUBEDL_BINARY automatically using pip...
2023.12.30 is already installed yt-dlp

    Installing CHROME_BINARY automatically using playwright...
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: playwright in /usr/local/lib/python3.11/site-packages (1.40.0)
Requirement already satisfied: greenlet==3.0.1 in /usr/local/lib/python3.11/site-packages (from playwright) (3.0.1)
Requirement already satisfied: pyee==11.0.1 in /usr/local/lib/python3.11/site-packages (from playwright) (11.0.1)
Requirement already satisfied: typing-extensions in /usr/local/lib/python3.11/site-packages (from pyee==11.0.1->playwright) (4.9.0)
Failed to install browsers
Error: EACCES: permission denied, open '/browsers/.links/b6323bfd8590d0d4decc8209e37155ebe8c6517f'
CHROME_BINARY=chromium

    Installing SINGLEFILE_BINARY, READABILITY_BINARY, MERCURY_BINARY automatically using npm...
SINGLEFILE_BINARY, READABILITY_BINARY, and MERCURURY_BINARY are already installed

[√] Set up ArchiveBox and its dependencies successfully.
0.7.2
ArchiveBox v0.7.2+editable COMMIT_HASH=e888869 BUILD_TIME=2024-01-05 03:55:52 1704426952
IN_DOCKER=True IN_QEMU=False ARCH=x86_64 OS=Linux PLATFORM=Linux-4.4.302+-x86_64-with-glibc2.36 PYTHON=Cpython
FS_ATOMIC=True FS_REMOTE=True FS_USER=1029:100 FS_PERMS=644
DEBUG=True IS_TTY=True TZ=UTC SEARCH_BACKEND=ripgrep LDAP=False

[i] Dependency versions:
 √  PYTHON_BINARY         v3.11.7         valid     /usr/local/bin/python3.11                                                   
 √  SQLITE_BINARY         v2.6.0          valid     /usr/local/lib/python3.11/sqlite3/dbapi2.py                                 
 √  DJANGO_BINARY         v3.1.14         valid     /usr/local/lib/python3.11/site-packages/django/__init__.py                  
 √  ARCHIVEBOX_BINARY     v0.7.2          valid     /usr/local/bin/archivebox                                                   

 √  CURL_BINARY           v8.5.0          valid     /usr/bin/curl                                                               
 √  WGET_BINARY           v1.21.3         valid     /usr/bin/wget                                                               
 √  NODE_BINARY           v20.10.0        valid     /usr/bin/node                                                               
 √  SINGLEFILE_BINARY     v1.1.46         valid     /app/node_modules/single-file-cli/single-file                               
 √  READABILITY_BINARY    v0.0.11         valid     /app/node_modules/readability-extractor/readability-extractor               
 √  MERCURY_BINARY        v1.0.0          valid     /app/node_modules/@postlight/parser/cli.js                                  
 √  GIT_BINARY            v2.39.2         valid     /usr/bin/git                                                                
 √  YOUTUBEDL_BINARY      v2023.12.30     valid     /usr/local/bin/yt-dlp                                                       
 X  CHROME_BINARY         ?               invalid   chromium                                                                    
 √  RIPGREP_BINARY        v13.0.0         valid     /usr/bin/rg                                                                 

[i] Source-code locations:
 √  PACKAGE_DIR           23 files        valid     /app/archivebox                                                             
 √  TEMPLATES_DIR         3 files         valid     /app/archivebox/templates                                                   
 -  CUSTOM_TEMPLATES_DIR  -               disabled  None                                                                        

[i] Secrets locations:
 -  CHROME_USER_DATA_DIR  -               disabled  None                                                                        
 -  COOKIES_FILE          -               disabled  None                                                                        

[i] Data locations:
 √  OUTPUT_DIR            10 files @      valid     /data                                                                       
 √  SOURCES_DIR           83 files        valid     ./sources                                                                   
 √  LOGS_DIR              2 files         valid     ./logs                                                                      
 √  ARCHIVE_DIR           96 files        valid     ./archive                                                                   
 √  CONFIG_FILE           162.0 Bytes     valid     ./ArchiveBox.conf                                                           
 √  SQL_INDEX             752.0 KB        valid     ./index.sqlite3                                                             

[!] Warning: Missing 1 recommended dependencies
    ! CHROME_BINARY: chromium (unable to detect version)
      Hint: To install all packages automatically run: archivebox setup
            or to disable it and silence this warning: archivebox config --set SAVE_CHROME=False

As a side note, when I don't set `CHROME_BINARY=chromium`, SingleFile works.
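
One way to see whether a given `CHROME_BINARY` value resolves to anything inside the container is a small lookup helper. This is only a sketch: `resolve_binary` is a hypothetical function name, not an ArchiveBox command, and it assumes that a bare name such as `chromium` is looked up on `PATH`, which this thread does not confirm.

```bash
# Hypothetical helper (not part of ArchiveBox): print where a binary name
# or path resolves, or return non-zero if it does not resolve at all.
resolve_binary() {
    local name="$1"
    if [ -x "$name" ] && [ ! -d "$name" ]; then
        # The value is already a path to an executable file.
        printf '%s\n' "$name"
    elif command -v "$name" >/dev/null 2>&1; then
        # The value is a bare name that the shell can find (e.g. via PATH).
        command -v "$name"
    else
        printf 'not found: %s\n' "$name" >&2
        return 1
    fi
}
```

Running `resolve_binary chromium` and `resolve_binary /usr/bin/chromium-browser` in the container shell would show which of the two values actually points at an executable.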


@pirate commented on GitHub (Jan 10, 2024):

In Docker you don't need to run `archivebox setup`, as the image already comes with everything pre-installed.

I think it should be `CHROME_BINARY=/usr/bin/chromium-browser`, not `chromium`. SingleFile depends on a compatible Chromium to work, so it will break if you're seeing this:

[!] Warning: Missing 1 recommended dependencies
    ! CHROME_BINARY: chromium (unable to detect version)

Can you run `archivebox config --set CHROME_BINARY=/usr/bin/chromium-browser` and try again?

Check to make sure it shows a valid version number for `CHROME_BINARY` in the `archivebox version` output.
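
That check can also be scripted. A minimal sketch, assuming the dependency rows keep the column layout shown in the `archivebox version` output earlier in this thread (status mark, name, version, `valid`/`invalid`, path); `chrome_binary_ok` is a hypothetical helper name, not an ArchiveBox command:

```bash
# Hypothetical helper: reads `archivebox version` output on stdin and
# succeeds only if the CHROME_BINARY dependency row is marked "valid".
chrome_binary_ok() {
    # Dependency rows are whitespace-separated:
    #   <mark> CHROME_BINARY <version> <valid|invalid> <path>
    # The warning line ("! CHROME_BINARY: chromium ...") does not match,
    # because its second field carries a trailing colon.
    awk '$2 == "CHROME_BINARY" { found = 1; ok = ($4 == "valid") }
         END { exit !(found && ok) }'
}
```

Inside the container this could be used as `archivebox version | chrome_binary_ok && echo "Chrome OK"`. If the output layout differs in other versions, the field positions would need adjusting.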


@MyNameIsOka commented on GitHub (Jan 10, 2024):

Oh nice, it worked after specifying the path as you described! Thanks a lot.
