mirror of
https://github.com/ArchiveBox/ArchiveBox.git
synced 2026-04-25 17:16:00 +03:00
[GH-ISSUE #1157] ArchiveBox v0.6.2 on bare metal is unable to use newest Chromium version v114 (fails to archive PDF, Screenshot, or DOM) #719
Originally created by @HaveANiceDay21 on GitHub (Jun 13, 2023).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/1157
I have installed ArchiveBox and everything works fine except Chromium, even though CHROME_BINARY is valid. I've also set CHROME_USER_DATA_DIR to /tmp/chrome-profile, but it doesn't make any difference.
@pirate commented on GitHub (Jun 13, 2023):
To use chrome/chromium >v110 you need to use the latest alpha `dev` branch. Otherwise please install chromium <=v110 and try that if you want to stay on the more stable v0.6.2 ArchiveBox version.
Unfortunately they made a bunch of breaking changes to the chromium CLI args so we are scrambling to fix it still. You can find our discussion about it here: https://github.com/ArchiveBox/ArchiveBox/issues/1125
The instructions to install the `dev` build can be found here: https://github.com/ArchiveBox/ArchiveBox#install-and-run-a-specific-github-branch
Comment back if you're still having issues with chromium in that newer build. Sorry for the trouble.
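The version rule in the comment above (Chromium newer than v110 needs the dev branch, v110 or older works with v0.6.2) can be checked with a small shell sketch. The helper names `chrome_major` and `needs_dev_branch` are hypothetical, and the sketch assumes the version string looks like the usual `chromium --version` output: a product name followed by a dotted version number.

```shell
# Extract the major version from a string such as
# "Chromium 114.0.5735.106 built on Debian 11.7".
# Assumption: the first run of digits followed by a dot is the major version.
chrome_major() {
    printf '%s\n' "$1" | sed -E 's/^[^0-9]*([0-9]+)\..*$/\1/'
}

# Succeeds (exit status 0) when the major version is above 110,
# i.e. when the thread says the dev branch is needed instead of v0.6.2.
needs_dev_branch() {
    [ "$(chrome_major "$1")" -gt 110 ]
}
```

For example, `needs_dev_branch "$(chromium --version)" && echo "use the dev branch or downgrade Chromium"` would print the hint only for a Chromium newer than v110.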
@HaveANiceDay21 commented on GitHub (Jun 13, 2023):
What version of chromium do you recommend? I use debian and I haven't been able to find a way to downgrade.
@HaveANiceDay21 commented on GitHub (Jun 13, 2023):
I've downgraded to version 108 and it works fine now. But I've set CHROME_USER_DATA_DIR=/tmp/chrome-profile and logged into all the sites I want, and the archived pages still show the guest-only view. I've also set CHROME_HEADLESS=False, but when I add a snapshot it doesn't launch the Chromium UI.
@pirate commented on GitHub (Jun 14, 2023):
Double check that the chrome binary used to create the profile is exactly the same one ArchiveBox is using.
Can you try running `chromium --user-data-dir=/tmp/chrome-profile` manually in shell to confirm that the browser opens with the profile correctly and that the chromium binary being used to create the profile exactly matches the one shown in the `archivebox version` output?
Chrome profiles are strictly tied to the binary that created them, they are not cross-compatible between different build types, different versions, or different OS's/architectures. This means you cannot create a profile with one browser binary and load it with a different one in ArchiveBox. Note you also cannot have two open chrome instances sharing one profile at the same time (it will corrupt the profile silently, so you must recreate it).
More info here: https://github.com/ArchiveBox/ArchiveBox/wiki/Chromium-Install#setting-up-a-chromium-user-profile
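One way to sanity-check the "same binary" requirement is to compare the version a profile was last used with against the version the configured binary reports. The sketch below rests on an assumption that is not stated in this thread: that the user data directory contains a plain-text `Last Version` file written by the browser. The function name `profile_matches_binary` is hypothetical, and a matching version number does not prove the build type or architecture match.

```shell
# Compare the version recorded in a Chromium profile with the output of
# "<chrome binary> --version".
# Assumption: <profile_dir>/Last Version holds a version like 108.0.5359.124.
# Returns 0 on match, 1 on mismatch, 2 if the file is missing.
profile_matches_binary() {
    profile_dir=$1
    version_output=$2
    last_version_file="$profile_dir/Last Version"

    [ -f "$last_version_file" ] || return 2

    # Strip whitespace and newlines around the recorded version.
    recorded=$(tr -d '[:space:]' < "$last_version_file")

    # Match the recorded version as a whole space-delimited word.
    case " $version_output " in
        *" $recorded "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

A possible invocation, using the paths from this thread, is `profile_matches_binary /tmp/chrome-profile "$(chromium --version)"`.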
@HaveANiceDay21 commented on GitHub (Jun 15, 2023):
Late response, but I'm going to use VICHAN as an example. I log into the vichan dashboard. I click close. Then I go to archivebox click add and it gives me this: https://files.catbox.moe/mhg20n.PNG
DOM, SCREENSHOT, and PDF give the same thing too.
@HaveANiceDay21 commented on GitHub (Jun 15, 2023):
I set `CHROME_HEADLESS=False` to see if I'm even logged in, so I ran `archivebox add https://example.com/mod.php` and it turns out I am, but it still gives the login page in the output.
EDIT: Turns out, when I do it in the command line, it works, but on the web GUI it doesn't work.
@pirate commented on GitHub (Jun 15, 2023):
Weird! Just to confirm, it's working with your cookies in archivebox now with headless=False?
@HaveANiceDay21 commented on GitHub (Jun 15, 2023):
It works with headless=True or headless=false with the command line on my VNC server, but not on the web GUI
@mstyp commented on GitHub (Jun 21, 2023):
Please provide explicit instructions on how to do this. Running `apt-cache policy chromium` I can see that no version before `112.0.5615.138-1~deb11u1` is available, and after 10+ minutes of searching I can't figure out how to install a different version.
@pirate commented on GitHub (Jun 23, 2023):
Lots of instructions here, including links to old chromium versions: https://github.com/ArchiveBox/ArchiveBox/wiki/Chromium-Install#setting-up-a-chromium-user-profile
@mstyp commented on GitHub (Jul 4, 2023):
You're right, it contains a lot of instructions, but no instructions on how to actually install a specific version of chromium. It literally just says "Install desired Chromium version in new directory" but I don't know how to do that. That's what my question is. How do I install the desired version of chromium?
Are the instructions on https://chromium.cypress.io ? I've only been able to get that site to load once and when it did it crashed my computer (very slow internet + very shitty computer = lots of crashing)
@msalmasi commented on GitHub (Jul 19, 2023):
I'm having the same issue as OP with Chrome v114, and I am using the latest dev branch. This is only an issue when I also have CHROME_USER_DATA_DIR set, even though this was created with the same version of v114. If I unset the user data, chrome works fine in default mode.
@msalmasi commented on GitHub (Jul 19, 2023):
I've nailed down a temporary "fix" for this issue, which may nail down the root cause. Please note I am running ArchiveBox in docker.
I noticed that when I first connect ArchiveBox to my Chrome Profile, it works initially but eventually fails. Looking at my logs I see this error:
[3989:4018:0719/151417.424549:ERROR:bus.cc(399)] Failed to connect to the bus: Failed to connect to socket /run/dbus/system_bus_socket: No such file or directory
[3989:4015:0719/151417.434587:ERROR:bus.cc(399)] Failed to connect to the bus: Could not parse server address: Unknown address type (examples of valid types are "tcp" and on UNIX "UNIX")
So first, I check to see if dbus is running in the container.
sudo docker exec -u root -it archivebox service dbus status
For some reason dbus has stopped. So I restart it.
sudo docker exec -u root -it archivebox service dbus start
I try running a Snapshot again. This time, the dbus error is gone but I get a new error message:
[4274:4299:0719/151619.852053:ERROR:bus.cc(399)] Failed to connect to the bus: Could not parse server address: Unknown address type (examples of valid types are "tcp" and on UNIX "UNIX")
[4274:4274:0719/151619.857995:ERROR:process_singleton_posix.cc(334)] Failed to create /chromium/config/.config/chromium/SingletonLock: File exists (17)
So I removed the SingletonLock file.
Now ArchiveBox grabs things correctly again.
So there are two issues that need their root causes resolved. One is dbus crashing (may be related to instabilities wrt using the dev build). Another is that at some point the profile is locked by the chromium instance that ArchiveBox is using, and the lock file is not deleted as it should be (I was not running any other instance of chromium and no other apps have access to this profile data).
UPDATE: Manually starting dbus isn't required. Only deleting the SingletonLock file is required.
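The workaround above (delete the leftover SingletonLock) can be sketched as a guarded shell function that removes the lock only when its owner is no longer running. This assumes, as in current Chromium on Linux, that SingletonLock is a symlink whose target has the form hostname-pid. The function name `remove_stale_singleton_lock` is hypothetical, and the check only makes sense when run in the same container (same PID namespace) and as the same user as Chromium.

```shell
# Remove <profile_dir>/SingletonLock only if the process that owns it is gone.
# Assumption: the lock is a symlink whose target looks like "hostname-pid".
# Returns 0 if there is no lock or it was removed, 1 if the owner is alive,
# 2 if the lock target is not in the expected form.
remove_stale_singleton_lock() {
    lock="$1/SingletonLock"

    # No lock present: nothing to clean up.
    [ -L "$lock" ] || return 0

    target=$(readlink "$lock")
    pid=${target##*-}    # text after the last hyphen

    case "$pid" in
        ''|*[!0-9]*)
            echo "unrecognised lock target: $target" >&2
            return 2
            ;;
    esac

    # kill -0 sends no signal; it only tests whether we can signal the pid.
    # Caveat: it also fails for a live process owned by another user.
    if kill -0 "$pid" 2>/dev/null; then
        echo "lock is held by running pid $pid, leaving it in place" >&2
        return 1
    fi

    rm -f "$lock"
}
```

With the profile path from the log above, this would be called inside the container as `remove_stale_singleton_lock /chromium/config/.config/chromium`.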
@msalmasi commented on GitHub (Jul 19, 2023):
I think this issue is related to this issue:
Orphan chromium processes continue running after ArchiveBox snapshot jobs complete
By running `docker top archivebox` I can see that the chromium processes persist after archiving finishes. When this happens, the SingletonLock file persists and is not removed.
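The check described above can be turned into a small filter that counts Chromium processes in a process listing. This is only a sketch: the function name `count_chromium_procs` is hypothetical, and it assumes the command column of the listing contains "chrome" or "chromium" for browser processes.

```shell
# Count lines of a process listing (read from stdin) that mention
# chrome or chromium, case-insensitively.
# grep -c exits non-zero when the count is 0, so "|| true" keeps the
# function's exit status at 0 while the count is still printed.
count_chromium_procs() {
    grep -ciE 'chrom(e|ium)' || true
}
```

Used as `docker top archivebox | count_chromium_procs` once all snapshot jobs have finished, a non-zero count would point to the orphaned processes discussed in the linked issue.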
@bogorad commented on GitHub (Jul 23, 2023):
Running in an LXC container. I've installed Chromium `Version 105.0.5177.0 (Developer Build) (64-bit)`, put it into a directory, then set up a profile. Here's Archivebox's config:
When I manually do `/abdata/chrome-linux/chrome --user-data-dir=/abdata/chrome --print-to-pdf "http://ifconfig.me"` I get a chromium window, with all extensions properly started, but no pdf - and it stays open. Same for `--screenshot`. What am I missing? Does --print-to-pdf only work in headless mode? Then what's the point in disabling it - extensions won't work in headless.
@bogorad commented on GitHub (Jul 24, 2023):
Have you considered using browserless-chrome?
https://github.com/browserless/chrome/
@pirate commented on GitHub (Dec 17, 2023):
I'm going to close this as stale for now, as there are many changes and improvements made to Chrome in Docker ArchiveBox since the release OP is referring to (e.g. we now use a different Chrome install method managed by Playwright instead of by Apt).
If anyone is still experiencing issues running Chrome on >=0.7.1 please open a new issue with a screenshot of the error and the full output of `docker compose run archivebox version`.