mirror of
https://github.com/ArchiveBox/ArchiveBox.git
synced 2026-04-25 17:16:00 +03:00
[GH-ISSUE #1278] Bug: Singlefile and other Chrome-based extractors not working in 0.7.1 on x86_64 #3805
Originally created by @onemenzel on GitHub (Dec 2, 2023).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/1278
Describe the bug
For about a month now, SingleFile and other Chrome-based extractors have not been working.
Steps to reproduce
Enter a URL into the add interface or try to pull any snapshot. My config:
I also tried enabling user namespace cloning on my host system, as recommended in the puppeteer docs that are linked in the logs I pasted below. I also tried the second method from there (setting up the setuid sandbox) within the container, but I could not get that to work either.
Screenshots or log output
Log Output
When I execute the above via `docker-compose exec archivebox bash`:
ArchiveBox version
docker: archivebox/archivebox:latest as of today
I'm using docker-compose with Ubuntu 20.04 as the host system.
@pirate commented on GitHub (Dec 18, 2023):
Can you try again with the latest build? I bumped the Chrome version and pushed a few other minor fixes.
@darwinshameran commented on GitHub (Dec 20, 2023):
Same issue here. Unfortunately this is preventing us from using this in production internally, and we don't exactly want to run a dev build.
@pirate commented on GitHub (Dec 20, 2023):
I'm almost ready to roll a minor patch release from `dev` -> `main`: https://github.com/ArchiveBox/ArchiveBox/pull/1297. If you're able to verify it works on your machine, I can get the release out by New Year's.
Chrome archiving works on my test machines and I can't reproduce this reported issue on the 0.7.2 candidate, so it would be super helpful to get bug reports from anyone who's experiencing failures so I can make sure 0.7.2 works for everyone.
Namespace cloning or suid sandboxing should not be necessary within Docker, it should "just work" on the first try with our new playwright-based chrome distribution 😕
@MyNameIsOka commented on GitHub (Jan 1, 2024):
I would have liked to test it locally, but I am not able to build the Docker container in my local environment following these steps: https://github.com/ArchiveBox/ArchiveBox#setup-the-dev-environment
It fails at `docker build`. First it failed because there was a `,` missing in `parse_version_string`, which was easily fixable.
Then it failed because it couldn't find `VERSIONS_AVAILABLE`, probably because no release exists (?). How can I circumvent that error?
log:
@pirate commented on GitHub (Jan 3, 2024):
Sorry, whoops, there was a broken commit on dev when you tested, just fixed it. I just pushed the latest working build, mind trying again?
(No need to build locally, just run `docker pull archivebox/archivebox:dev` to get the dev image pre-built from Docker Hub)
@MyNameIsOka commented on GitHub (Jan 3, 2024):
Thank you for fixing the image.
I pulled it but it seems that it still doesn't work. This is what is output in the errors.log when I add a website with just `SingleFile` selected:
Also, it looks like the formatting where the tags, pull, snapshot, etc. buttons are is broken:

@pirate commented on GitHub (Jan 4, 2024):
Argh so sorry for the hassle, try again now, I just pushed another fix. I've confirmed it looks like this on our demo server now:
@MyNameIsOka commented on GitHub (Jan 4, 2024):
Thanks, I can confirm the tags and buttons are shown correctly again. SingleFile saving is still not working.
@pirate commented on GitHub (Jan 9, 2024):
I've upgraded singlefile in 0.7.2 and fixed a few small bugs. Can you try on the latest version with `archivebox config --set DEBUG=True` @MyNameIsOka and let me know if SingleFile is still failing?
You should get more output if you save a specific link you know is broken with only singlefile like so:
@MyNameIsOka commented on GitHub (Jan 10, 2024):
Thank you for the update. I installed the newest version by re-pulling `archivebox/archivebox` (not `dev`). Here is an excerpt from the logs when I did a re-snapshot of a website:
After that, I ran `archivebox setup` but the logs were the same afterwards.
However, it seems that chromium could not be installed correctly. Here is the output from `archivebox setup`:
As a side note, when I don't use `CHROME_BINARY=chromium`, SingleFile is working.
@pirate commented on GitHub (Jan 10, 2024):
In Docker you don't need to run `archivebox setup`, as it already comes with everything pre-installed.
I think it should be `CHROME_BINARY=/usr/bin/chromium-browser`, not `chromium`. Singlefile depends on a compatible chromium to work, so it will break if you're seeing this:
Can you run `archivebox config --set CHROME_BINARY=/usr/bin/chromium-browser` and try again?
Check to make sure it shows a valid version number for `CHROME_BINARY` in the `archivebox version` output.
@MyNameIsOka commented on GitHub (Jan 10, 2024):
oh nice, it worked by specifying the path as you described! Thanks a lot.
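For anyone hitting the same thing, here is a minimal sketch for checking which Chromium name or path actually resolves to an executable inside the container before setting `CHROME_BINARY`. This is not how ArchiveBox itself resolves the setting; `resolve_binary`, `first_working`, and the `CANDIDATES` list are hypothetical names for illustration, and only `/usr/bin/chromium-browser` and `chromium` come from this thread.

```python
import os
import shutil
from typing import Iterable, Optional

# Assumed candidates for illustration; only /usr/bin/chromium-browser
# and chromium are mentioned in the thread above.
CANDIDATES = ("/usr/bin/chromium-browser", "chromium-browser", "chromium")


def resolve_binary(candidate: str) -> Optional[str]:
    """Return an absolute path to an executable for `candidate`, or None."""
    if os.path.isabs(candidate):
        # An absolute path must exist and be executable exactly as given.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
        return None
    # A bare name is looked up on PATH, the way a shell would do it.
    found = shutil.which(candidate)
    return os.path.abspath(found) if found else None


def first_working(candidates: Iterable[str]) -> Optional[str]:
    """Return the first candidate that resolves to an executable, or None."""
    for candidate in candidates:
        resolved = resolve_binary(candidate)
        if resolved:
            return resolved
    return None


if __name__ == "__main__":
    for name in CANDIDATES:
        print(f"{name!r:32} -> {resolve_binary(name)}")
    print("first usable:", first_working(CANDIDATES))
```

If a candidate prints `None`, that value probably won't work as `CHROME_BINARY` in that environment. The path that does resolve is the one to pass to `archivebox config --set CHROME_BINARY=...`, and `archivebox version` should then show a version number for it.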