[GH-ISSUE #1157] ArchiveBox v0.6.2 on bare metal is unable to use newest Chromium version v114 (fails to archive PDF, Screenshot, or DOM) #2230

Closed
opened 2026-03-01 17:57:29 +03:00 by kerem · 17 comments

Originally created by @HaveANiceDay21 on GitHub (Jun 13, 2023).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/1157

I have installed ArchiveBox and everything has worked fine except Chromium, even though CHROME_BINARY is valid. I've also set CHROME_USER_DATA_DIR to /tmp/chrome-profile, but it doesn't make any difference.

ArchiveBox v0.6.2
Cpython Linux Linux-5.10.0-23-amd64-x86_64-with-glibc2.31 x86_64
IN_DOCKER=False DEBUG=False IS_TTY=True TZ=UTC SEARCH_BACKEND_ENGINE=ripgrep

[i] Dependency versions:
 √  ARCHIVEBOX_BINARY     v0.6.2          valid     /usr/local/bin/archivebox
 √  PYTHON_BINARY         v3.9.2          valid     /usr/bin/python3.9
 √  DJANGO_BINARY         v3.1.14         valid     /home/debian/.local/lib/python3.9/site-packages/django/bin/django-admin.py
 √  CURL_BINARY           v7.74.0         valid     /usr/bin/curl
 √  WGET_BINARY           v1.21           valid     /usr/bin/wget
 √  NODE_BINARY           v18.16.0        valid     /home/debian/.nvm/versions/node/v18.16.0/bin/node
 √  SINGLEFILE_BINARY     v1.0.33         valid     ./node_modules/single-file/cli/single-file
 √  READABILITY_BINARY    v0.0.6          valid     ./node_modules/readability-extractor/readability-extractor
 √  MERCURY_BINARY        v1.0.0          valid     ./node_modules/@postlight/mercury-parser/cli.js
 √  GIT_BINARY            v2.30.2         valid     /usr/bin/git
 √  YOUTUBEDL_BINARY      v2021.12.17     valid     /usr/local/bin/youtube-dl
 √  CHROME_BINARY         v114.0.5735.106  valid     /usr/bin/chromium
 √  RIPGREP_BINARY        v12.1.1         valid     /usr/bin/rg

[i] Source-code locations:
 √  PACKAGE_DIR           23 files        valid     /usr/local/lib/python3.9/dist-packages/archivebox
 √  TEMPLATES_DIR         3 files         valid     /usr/local/lib/python3.9/dist-packages/archivebox/templates
 -  CUSTOM_TEMPLATES_DIR  -               disabled

[i] Secrets locations:
 √  CHROME_USER_DATA_DIR  27 files        valid     /tmp/chrome-profile
 √  COOKIES_FILE          951.0 Bytes     valid     /home/debian/Documents/cookies.txt

[i] Data locations:
 √  OUTPUT_DIR            8 files         valid     /home/debian/archivebox
 √  SOURCES_DIR           13 files        valid     ./sources
 √  LOGS_DIR              1 files         valid     ./logs
 √  ARCHIVE_DIR           1 files         valid     ./archive
 √  CONFIG_FILE           223.0 Bytes     valid     ./ArchiveBox.conf
 √  SQL_INDEX             216.0 KB        valid     ./index.sqlite3

@pirate commented on GitHub (Jun 13, 2023):

To use chrome/chromium >v110 you need to use the latest alpha dev branch. Otherwise please install chromium <=v110 and try that if you want to stay on the more stable v0.6.2 ArchiveBox version.

Unfortunately, Chromium made a bunch of breaking changes to its CLI args, so we are still scrambling to fix it. You can find our discussion about it here: https://github.com/ArchiveBox/ArchiveBox/issues/1125

The instructions to install the dev build can be found here: https://github.com/ArchiveBox/ArchiveBox#install-and-run-a-specific-github-branch

Comment back if you're still having issues with chromium in that newer build. Sorry for the trouble.
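
To check which side of the v110 cutoff a given binary falls on, one can parse the output of `chromium --version`. Below is a minimal Python sketch; the function names are hypothetical, and it assumes the binary prints a dotted four-part version such as `Chromium 114.0.5735.106` (the surrounding wording varies between distro builds).

```python
import re
import subprocess

# Cutoff from the thread: ArchiveBox v0.6.2 needs Chromium <= v110.
MAX_MAJOR_FOR_V062 = 110


def parse_major_version(version_output: str) -> int:
    """Extract the major version from text like 'Chromium 114.0.5735.106 ...'."""
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_output)
    if match is None:
        raise ValueError(f"no version number found in {version_output!r}")
    return int(match.group(1))


def is_compatible_with_v062(version_output: str) -> bool:
    """Return True if the reported major version is at or below the v110 cutoff."""
    return parse_major_version(version_output) <= MAX_MAJOR_FOR_V062


def chrome_version_output(binary: str) -> str:
    """Run '<binary> --version' and return its trimmed stdout."""
    result = subprocess.run([binary, "--version"], capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

For example, `is_compatible_with_v062(chrome_version_output("/usr/bin/chromium"))` should return `False` for the v114.0.5735.106 build shown in the version output at the top of this issue.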

@HaveANiceDay21 commented on GitHub (Jun 13, 2023):

What version of Chromium do you recommend? I use Debian and I haven't been able to find a way to downgrade.

@HaveANiceDay21 commented on GitHub (Jun 13, 2023):

I've downgraded to version 108 and it works fine now. However, although I've set CHROME_USER_DATA_DIR=/tmp/chrome-profile and logged into all the sites I want, the snapshots still only show the logged-out (guest-only) view. I've also set CHROME_HEADLESS=False, but when I add a snapshot it doesn't launch the Chromium UI.

@pirate commented on GitHub (Jun 14, 2023):

Double check that the chrome binary used to create the profile is exactly the same one ArchiveBox is using.

Can you try running chromium --user-data-dir=/tmp/chrome-profile manually in a shell to confirm that the browser opens with the profile correctly, and that the Chromium binary used to create the profile exactly matches the one shown in the archivebox version output?

Chrome profiles are strictly tied to the binary that created them; they are not cross-compatible between different build types, different versions, or different OSes/architectures. This means you cannot create a profile with one browser binary and load it with a different one in ArchiveBox. Note also that you cannot have two open Chrome instances sharing one profile at the same time (this silently corrupts the profile, and you must then recreate it).

More info here: https://github.com/ArchiveBox/ArchiveBox/wiki/Chromium-Install#setting-up-a-chromium-user-profile
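
One quick sanity check for the "same binary" requirement is to compare the version a profile was last opened with against the binary's `--version` output. The minimal Python sketch below assumes the profile directory contains a `Last Version` file holding a dotted version string (recent Chromium builds write one at the root of the user data dir; older builds may not, in which case the sketch returns `None`). The function names are hypothetical.

```python
import re
from pathlib import Path
from typing import Optional


def profile_last_version(user_data_dir: str) -> Optional[str]:
    """Return the version recorded in '<user_data_dir>/Last Version', if that file exists.

    Assumption: recent Chromium builds write this file; older ones may not.
    """
    marker = Path(user_data_dir) / "Last Version"
    if not marker.is_file():
        return None
    return marker.read_text(encoding="utf-8").strip()


def binary_version(version_output: str) -> str:
    """Pull the 'A.B.C.D' version out of '<binary> --version' output."""
    match = re.search(r"\d+\.\d+\.\d+\.\d+", version_output)
    if match is None:
        raise ValueError(f"no version number found in {version_output!r}")
    return match.group(0)


def profile_matches_binary(user_data_dir: str, version_output: str) -> Optional[bool]:
    """True/False if the profile's recorded version equals the binary's; None if unknown."""
    recorded = profile_last_version(user_data_dir)
    if recorded is None:
        return None
    return recorded == binary_version(version_output)
```

A matching version number is necessary but not sufficient: as noted above, the build type and OS/architecture must match too, and this sketch does not check those.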

@HaveANiceDay21 commented on GitHub (Jun 15, 2023):

Sorry for the late response. I'll use vichan as an example. I log into the vichan dashboard (https://files.catbox.moe/sytoby.PNG) and click close. Then I go to ArchiveBox, click Add, and it gives me this: https://files.catbox.moe/mhg20n.PNG
The DOM, SCREENSHOT, and PDF outputs show the same thing.

@HaveANiceDay21 commented on GitHub (Jun 15, 2023):

I set CHROME_HEADLESS=False to check whether I'm even logged in, then ran archivebox add https://example.com/mod.php
It turns out I am logged in, but the output still shows the login page.

EDIT: It turns out that when I run it from the command line it works, but from the web GUI it doesn't.


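Screenshot: https://github.com/ArchiveBox/ArchiveBox/assets/117618052/38d32982-b558-40d5-ab8b-e5c80f58ae2d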

@pirate commented on GitHub (Jun 15, 2023):

Weird! Just to confirm, it's working with your cookies in archivebox now with headless=False?

@HaveANiceDay21 commented on GitHub (Jun 15, 2023):

It works with both headless=True and headless=False from the command line on my VNC server, but not from the web GUI.

@mstyp commented on GitHub (Jun 21, 2023):

> please install chromium <=v110 and try that if you want to stay on the more stable v0.6.2 ArchiveBox version.

Please provide explicit instructions on how to do this. Running apt-cache policy chromium, I can see that no version older than 112.0.5615.138-1~deb11u1 is available, and after 10+ minutes of searching I can't figure out how to install a different version.

@pirate commented on GitHub (Jun 23, 2023):

Lots of instructions here, including links to old chromium versions: https://github.com/ArchiveBox/ArchiveBox/wiki/Chromium-Install#setting-up-a-chromium-user-profile

@mstyp commented on GitHub (Jul 4, 2023):

You're right, it contains a lot of instructions, but no instructions on how to actually install a specific version of Chromium. It literally just says "Install desired Chromium version in new directory", but I don't know how to do that. That's my question: how do I install the desired version of Chromium?

Are the instructions on https://chromium.cypress.io? I've only been able to get that site to load once, and when it did, it crashed my computer (very slow internet + very shitty computer = lots of crashing).
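
One way to read "Install desired Chromium version in new directory": obtain a prebuilt Linux archive for the wanted version (the download source is not given in this thread and is left open here), unpack it into its own directory, and point `CHROME_BINARY` at the unpacked `chrome` executable, matching the `chrome-linux/chrome` layout that appears in a later config in this thread. The minimal Python sketch below assumes such an archive named `chrome-linux.zip` is already on disk with a top-level `chrome-linux/chrome` entry; the function name is hypothetical. Python's `zipfile` does not restore Unix permission bits on extraction, so the sketch re-applies them from the archive metadata.

```python
import os
import stat
import zipfile
from pathlib import Path


def extract_chromium_zip(zip_path: str, dest_dir: str) -> Path:
    """Unpack a Chromium build archive into dest_dir and return the path to its 'chrome' binary.

    Assumes the archive contains a top-level 'chrome-linux/chrome' executable.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            extracted = Path(archive.extract(info, dest))
            # zipfile drops Unix permission bits; restore them if the archive recorded any.
            mode = (info.external_attr >> 16) & 0o777
            if mode and not info.is_dir():
                os.chmod(extracted, mode)
    binary = dest / "chrome-linux" / "chrome"
    if not binary.is_file():
        raise FileNotFoundError(f"expected Chromium binary at {binary}")
    # Ensure the main binary is executable even if the archive lacked mode bits.
    binary.chmod(binary.stat().st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return binary
```

For example, `extract_chromium_zip("chrome-linux.zip", "/abdata")` returns `/abdata/chrome-linux/chrome`, which could then be set as `CHROME_BINARY` in `ArchiveBox.conf`. Any profile used with it should be created with that same binary.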

@msalmasi commented on GitHub (Jul 19, 2023):

I'm having the same issue as OP with Chrome v114, and I am using the latest dev branch. It only happens when I also have CHROME_USER_DATA_DIR set, even though the profile was created with the same v114 version. If I unset the user data dir, Chrome works fine in default mode.

@msalmasi commented on GitHub (Jul 19, 2023):

I've nailed down a temporary "fix" for this issue, which may also point to the root cause. Please note I am running ArchiveBox in Docker.

I noticed that when I first connect ArchiveBox to my Chrome Profile, it works initially but eventually fails. Looking at my logs I see this error:

[3989:4018:0719/151417.424549:ERROR:bus.cc(399)] Failed to connect to the bus: Failed to connect to socket /run/dbus/system_bus_socket: No such file or directory

[3989:4015:0719/151417.434587:ERROR:bus.cc(399)] Failed to connect to the bus: Could not parse server address: Unknown address type (examples of valid types are "tcp" and on UNIX "UNIX")

So first, I check whether dbus is running in the container.

sudo docker exec -u root -it archivebox service dbus status

For some reason dbus has stopped. So I restart it.

sudo docker exec -u root -it archivebox service dbus start

I try running a Snapshot again. This time, the dbus error is gone but I get a new error message:

[4274:4299:0719/151619.852053:ERROR:bus.cc(399)] Failed to connect to the bus: Could not parse server address: Unknown address type (examples of valid types are "tcp" and on UNIX "UNIX")

[4274:4274:0719/151619.857995:ERROR:process_singleton_posix.cc(334)] Failed to create /chromium/config/.config/chromium/SingletonLock: File exists (17)

So I removed the SingletonLock file.

Now ArchiveBox grabs things correctly again.

So there are two issues whose root causes need to be resolved. One is dbus crashing (possibly related to instabilities in the dev build). The other is that at some point the profile gets locked by the Chromium instance ArchiveBox is using, and the lock file is not deleted as it should be (I was not running any other instance of Chromium, and no other apps have access to this profile data).

UPDATE: Manually starting dbus isn't required; only deleting the SingletonLock file is.
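
The manual fix above (deleting `SingletonLock` when no Chromium is actually using the profile) can be automated before each run. The minimal Python sketch below assumes the Linux behaviour in which Chromium creates `SingletonLock` as a symlink whose target is `<hostname>-<pid>`; the helper names are hypothetical. It only deletes the lock when the lock names this host and a PID that is no longer running, so it leaves alone a profile that a live Chromium still holds.

```python
import os
import socket
from pathlib import Path
from typing import Optional, Tuple


def lock_owner(lock_path: Path) -> Optional[Tuple[str, int]]:
    """Parse the '<hostname>-<pid>' target of a Chromium SingletonLock symlink.

    Returns (hostname, pid), or None if the lock is missing or not in that format.
    """
    if not lock_path.is_symlink():
        return None
    target = os.readlink(lock_path)
    hostname, sep, pid_text = target.rpartition("-")
    if not sep or not pid_text.isdigit():
        return None
    return hostname, int(pid_text)


def pid_is_running(pid: int) -> bool:
    """Check for a live process with signal 0 (delivers nothing, only checks existence)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def remove_stale_singleton_lock(user_data_dir: str) -> bool:
    """Delete SingletonLock if it points at a dead process on this host; return True if removed."""
    lock_path = Path(user_data_dir) / "SingletonLock"
    owner = lock_owner(lock_path)
    if owner is None:
        return False
    hostname, pid = owner
    if hostname != socket.gethostname() or pid_is_running(pid):
        return False  # held by another host/container, or by a live Chromium
    lock_path.unlink()
    return True
```

A recreated Docker container can get a new hostname, so a lock left behind by a previous container would be skipped by this check; in that case deleting it by hand, as described above, is still needed.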

@msalmasi commented on GitHub (Jul 19, 2023):

I think this is related to another issue:

Orphan chromium processes continue running after ArchiveBox snapshot jobs complete (https://github.com/ArchiveBox/ArchiveBox/issues/746)

By running docker top archivebox, I can see that the Chromium processes persist after archiving finishes. When this happens, the SingletonLock file also persists and is never removed.
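
To check whether orphaned Chromium processes are still holding a profile (what `docker top archivebox` revealed here), one can scan `/proc` on Linux for processes whose command line contains `--user-data-dir=<profile>`. The minimal Python sketch below is Linux-only, uses a hypothetical function name, and assumes the profile was passed as a single `--user-data-dir=PATH` argument, which should at least match the main browser process (ArchiveBox's exact Chromium arguments are not shown in this thread).

```python
from pathlib import Path
from typing import List


def pids_using_profile(user_data_dir: str, proc_root: str = "/proc") -> List[int]:
    """Return PIDs whose command line includes '--user-data-dir=<user_data_dir>'."""
    wanted = f"--user-data-dir={user_data_dir}"
    matches = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # skip non-process entries such as 'self' or 'meminfo'
        try:
            raw = (entry / "cmdline").read_bytes()
        except OSError:
            continue  # the process exited or its cmdline is unreadable
        # /proc/<pid>/cmdline holds argv as NUL-separated strings.
        args = raw.decode("utf-8", errors="replace").split("\0")
        if wanted in args:
            matches.append(int(entry.name))
    return sorted(matches)
```

If `pids_using_profile("/tmp/chrome-profile")` still returns PIDs after all snapshot jobs have finished, those are likely the orphans described in issue #746; terminating them (for example with `os.kill(pid, signal.SIGTERM)`) should let Chromium release `SingletonLock` on its own.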

@bogorad commented on GitHub (Jul 23, 2023):

Running in an LXC container. I've installed Chromium Version 105.0.5177.0 (Developer Build) (64-bit), put it into a directory, then set up a profile. Here's ArchiveBox's config:

[SERVER_CONFIG]
SECRET_KEY = ***

[ARCHIVE_METHOD_TOGGLES]
SAVE_READABILITY = False

[ARCHIVE_METHOD_OPTIONS]
CURL_USER_AGENT = Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36
WGET_USER_AGENT = Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36
CHROME_USER_AGENT = Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36
CHROME_USER_DATA_DIR = /abdata/chrome
CHROME_HEADLESS = false
CHROME_SANDBOX = true

[DEPENDENCY_CONFIG]
CHROME_BINARY = /abdata/chrome-linux/chrome

When I manually run

/abdata/chrome-linux/chrome --user-data-dir=/abdata/chrome --print-to-pdf "http://ifconfig.me"

I get a Chromium window with all extensions properly started, but no PDF, and the window stays open. The same happens with --screenshot. What am I missing? Does --print-to-pdf only work in headless mode? If so, what's the point of disabling headless mode, since extensions won't work in headless?
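
On the question above: among Chromium's documented command-line switches, `--print-to-pdf` and `--screenshot` are headless-mode switches, so with `CHROME_HEADLESS = false` a visible window opens and nothing is written. The minimal Python sketch below reads the Chrome-related keys from an `ArchiveBox.conf`-style INI text like the config above and builds a one-off manual PDF command that always includes `--headless`. The helper names are hypothetical, and this is not how ArchiveBox itself assembles its Chromium arguments.

```python
import configparser
from typing import List


def load_chrome_settings(conf_text: str) -> dict:
    """Read the Chrome-related keys from an ArchiveBox.conf-style INI string."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep upper-case keys such as CHROME_BINARY as written
    parser.read_string(conf_text)
    return {
        "binary": parser.get("DEPENDENCY_CONFIG", "CHROME_BINARY", fallback="chromium"),
        "user_data_dir": parser.get("ARCHIVE_METHOD_OPTIONS", "CHROME_USER_DATA_DIR", fallback=None),
        "headless": parser.getboolean("ARCHIVE_METHOD_OPTIONS", "CHROME_HEADLESS", fallback=True),
    }


def pdf_command(settings: dict, url: str, output_path: str = "output.pdf") -> List[str]:
    """Build a manual print-to-PDF command; --print-to-pdf only takes effect with --headless."""
    # --headless is added even when settings["headless"] is False, since a visible
    # window ignores --print-to-pdf (the behaviour described in the comment above).
    cmd = [settings["binary"], "--headless"]
    if settings["user_data_dir"]:
        cmd.append(f"--user-data-dir={settings['user_data_dir']}")
    cmd += [f"--print-to-pdf={output_path}", url]
    return cmd
```

Passing the result to `subprocess.run(..., check=True)` should write `output.pdf` to the current directory. Because two Chromium instances cannot share one profile, the visible window using the same profile has to be closed first.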

@bogorad commented on GitHub (Jul 24, 2023):

Have you considered using browserless-chrome?

https://github.com/browserless/chrome/

@pirate commented on GitHub (Dec 17, 2023):

I'm going to close this as stale for now, as many changes and improvements have been made to Chrome in the ArchiveBox Docker image since the release OP is referring to (e.g. we now install Chrome via Playwright instead of via apt).

If anyone is still experiencing issues running Chrome on >=0.7.1, please open a new issue with a screenshot of the error and the full output of docker compose run archivebox version.
