mirror of
https://github.com/ArchiveBox/ArchiveBox.git
synced 2026-04-25 17:16:00 +03:00
[GH-ISSUE #463] Bugfix: UnhandledPromiseRejectionWarning with singlefile attempt #308
Originally created by @drpfenderson on GitHub (Sep 2, 2020).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/463
Describe the bug
Decided to move forward with a clean archive, and keep the old one as a historical snapshot. Added a new list of links to archive, but after 6 or 7, almost all of the tasks start erroring out in cascade fashion, starting with the `singlefile` step. Error is `TimeoutExpired Command` for chromium-browser, but it gives me a command to run, which also errors out with `(node:162485) UnhandledPromiseRejectionWarning: SyntaxError: Unexpected number in JSON at position 2`
Steps to reproduce
`pip install archivebox`
`archivebox add ./list.txt`.
Screenshots or log output
Original error
When I run
Output:
Software versions
@cdvv7788 commented on GitHub (Sep 2, 2020):
@drpfenderson there is a PR that should fix the issue with the index. You can give it a try (it should be merged this week tho): https://github.com/pirate/ArchiveBox/pull/452
About this issue, I will give it a check.
@cdvv7788 commented on GitHub (Sep 3, 2020):
@drpfenderson It works for me. Can you try running it with docker or installing the npm dependencies? (the quickstart has been updated with the instructions)
@drpfenderson commented on GitHub (Sep 3, 2020):
@cdvv7788 I get the same timeout using docker/docker-compose, though I am really not sure how to run the `node_modules` commands from inside the docker to see the exact error. If I cd to the directory listed, don't I lose the docker-compose.yml and the path for `node_modules`? I tried researching the exec/run commands a bit to figure out how to execute the `single-file` command from within a container, but I can't really grok it.
EDIT: To be clear, the npm dependencies were for sure installed, as far as I can tell. No errors on install using `npm install --prefix . 'git+https://github.com/pirate/ArchiveBox.git'`. I actually installed both using the prefix for the master branch, as well as your sql_index branch.
@gildas-lormeau commented on GitHub (Sep 4, 2020):
Author of SingleFile here. This is a bug in SingleFile due to the presence of "," and spaces in the `--browser-args` switch (e.g. `1440,2000` and `Mozilla/5.0 (Windows NT 10.0; Win64; x64)`). I'm trying to see how it could be fixed or circumvented. I guess proposing to pass a JSON string was not a good idea.
@gildas-lormeau commented on GitHub (Sep 4, 2020):
Finally, I was able to fix the issue by formatting the `--browser-args` switch like this (surrounding quotes included) in your example:
`"--browser-args=""[""--headless"", ""--no-sandbox"", ""--disable-gpu"", ""--disable-dev-shm-usage"", ""--disable-software-rasterizer"", ""--window-size=1440,2000"", ""--user-agent"="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.75 Safari/537.36""]"""`
instead of:
`"--browser-args="["--headless", "--no-sandbox", "--disable-gpu", "--disable-dev-shm-usage", "--disable-software-rasterizer", "--user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.75 Safari/537.36", "--window-size=1440,2000"]""`
@cdvv7788 commented on GitHub (Sep 4, 2020):
Thanks. We definitely need to improve the command so it can be run directly in case of error. I will add that to the pending tasks.
@drpfenderson What is your `TIMEOUT` setting? Have you modified any setting or changed any environment variable? I cannot reproduce it, not even with docker, so I suspect you have something unusual in your local setup.
@drpfenderson commented on GitHub (Sep 4, 2020):
@cdvv7788 My timeout is set to 180, which I set in the docker-compose.yml file. I've tried with much longer times, like 3600 as I've seen mentioned a couple times, but it didn't change anything. The only other change is `SHOW_PROGRESS`, but I've also tested it with the default docker-compose file.
I'm wondering if there is some config or library file somewhere on my system from the much older versions of archivebox that is floating around and interfering. Before upgrading to v0.21 or docker, I made sure and scrubbed the .conf files and .local stuff related to it, in my user folder as well as the archive folder itself, but there still must be something somewhere that I'm missing since the problem happens with the brand new archive as well. I'll do some deeper searching and let you know.
@cdvv7788 commented on GitHub (Sep 4, 2020):
@drpfenderson I just created a PR that should correctly escape the argument. With that you should be able to, at least, run the command properly, and come back with the error that `singlefile` is returning.
@drpfenderson commented on GitHub (Sep 4, 2020):
Well, I essentially did a find/replace/delete for every archivebox/node/python file on my system that could be related. There were a number of weird places with files due to the various installation methods I've used over the years for this program. Realizing the amount of work to disentangle everything, I spun up a new server, attached the archive to it, wiped all the local config/conf files, ran the `docker-compose run archivebox add` and it all worked!
Thank you all for your infinite patience with me.
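For anyone landing here with the same JSON error: the quoting problem discussed in this thread can be illustrated with a minimal Python sketch. It is not the code from the PR mentioned above. It only shows, under assumptions, why a command printed by joining arguments with spaces cannot be pasted back into a shell, and how `shlex.join` produces a re-runnable form. The shortened flag list, the `https://example.com` URL, and the helper `recovered_browser_args` are hypothetical and used for illustration only.

```python
import json
import shlex

# Chrome flags shortened from the command quoted in this thread.
browser_args = [
    "--headless",
    "--no-sandbox",
    "--window-size=1440,2000",
    "--user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
]

# The flags are serialized as a JSON array, since the value of
# --browser-args is parsed as JSON on the SingleFile side.
browser_args_json = json.dumps(browser_args)

# Argument vector suitable for subprocess.run(cmd): no shell is involved,
# so the JSON string reaches the child process unchanged.
# The URL is a placeholder.
cmd = ["single-file", f"--browser-args={browser_args_json}", "https://example.com"]

# Naive display form: joining with spaces loses the argument boundaries.
naive = " ".join(cmd)

# Shell-safe display form: shlex.join quotes every argument that needs it.
printable = shlex.join(cmd)


def recovered_browser_args(command_line: str) -> str:
    """Return the --browser-args value a POSIX shell would pass along.

    Hypothetical helper: it takes the first matching token after
    shell-style splitting, which is what the receiving program would see.
    """
    for arg in shlex.split(command_line):
        if arg.startswith("--browser-args="):
            return arg[len("--browser-args="):]
    raise ValueError("no --browser-args argument found")


print(printable)
```

With the naive form, shell-style splitting strips the inner double quotes and cuts the value at the first space, so what arrives is no longer valid JSON. The `shlex.join` form round-trips to the original argument list on POSIX shells; Windows shells use different quoting rules and are not covered by this sketch.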