Mirror of https://github.com/ArchiveBox/ArchiveBox.git (synced 2026-04-25 17:16:00 +03:00)
[GH-ISSUE #167] Archive Method: Chrome timing out for many sites when running in Docker #1626
Originally created by @tgrosinger on GitHub (Mar 10, 2019).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/167
Describe the bug
When running ArchiveBox in the docker container, frequent errors are displayed such as the one below:
Steps to reproduce
Steps to reproduce the behavior:
Screenshots or log output
Software versions
4a7f1d5
More Info
I wanted to see more about the errors I was encountering in Chrome, so I started a terminal in the docker container and tried it out. I reduced the timeout because 60000 seemed high. This command still blocks for a very long time though.
Looks like maybe the lack of GPU in the container is causing an issue. Let's disable that.
Not sure how to get past this issue though. Any suggestions?
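The exact commands run inside the container are not preserved in this mirror. As a hypothetical sketch only, here is one way a headless Chromium "print to PDF" invocation with `--disable-gpu` might be assembled in Python. The binary name `chromium`, the output filename, the helper names, and the `/.dockerenv` check for detecting Docker are all assumptions for illustration, not ArchiveBox's actual code.

```python
from pathlib import Path
from typing import List, Optional


def running_in_docker() -> bool:
    # Assumption: Docker usually creates /.dockerenv inside containers.
    # This is a common heuristic, not an ArchiveBox API.
    return Path("/.dockerenv").exists()


def chrome_pdf_args(
    url: str,
    binary: str = "chromium",    # hypothetical binary name; adjust to your install
    timeout_ms: int = 60000,     # Chromium's --timeout value is in milliseconds
    output: str = "output.pdf",  # hypothetical output path
    disable_gpu: Optional[bool] = None,
) -> List[str]:
    """Build (but do not run) a headless Chromium command that prints a page to PDF."""
    if disable_gpu is None:
        # Containers typically have no GPU, so default to disabling it there.
        disable_gpu = running_in_docker()
    args = [binary, "--headless", f"--timeout={timeout_ms}", f"--print-to-pdf={output}"]
    if disable_gpu:
        args.append("--disable-gpu")
    args.append(url)  # the URL goes last, after all flags
    return args
```

The list could then be passed to `subprocess.run(args, timeout=...)`; note that `subprocess.run` takes its own timeout in seconds, separate from Chromium's millisecond `--timeout`.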
@pirate commented on GitHub (Mar 11, 2019):
The timeout is in milliseconds for Chromium, so 60000 is 60 seconds. If you set it to anything less than about 5s (5000), it'll hang indefinitely, which is what I think you encountered. Can you try running the command in Docker with a timeout like 30000 to see if there's any hint as to why it's dying?
@tgrosinger commented on GitHub (Mar 12, 2019):
@pirate, thanks for the information. I bumped the timeout back up to the 60000 used by the original command example and it was successful.
When I go back to the actual URL that was failing in the logs, it too succeeds, but with a lot more error messages. The output PDF looks pretty bad, but most of the information is there.
So if this succeeded, I am not sure why the original command was failing in the logs. I am running it again and it actually seems to be having more success, though it takes about 70 seconds per URL.
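Since Chromium's `--timeout` is in milliseconds, the unit mix-up and the "about 70 seconds per URL" observation can be made concrete with a small sketch. The helper names below are hypothetical and not part of ArchiveBox; the 5000 ms floor simply encodes the rough threshold reported in this thread, not a documented Chromium limit.

```python
# Hypothetical helpers, not part of ArchiveBox.
MIN_TIMEOUT_MS = 5000  # reported in this thread: below roughly 5 s the command hung


def chrome_timeout_ms(timeout_s: float) -> int:
    """Convert a timeout in seconds to the millisecond value Chromium's --timeout expects."""
    timeout_ms = int(round(timeout_s * 1000))
    if timeout_ms < MIN_TIMEOUT_MS:
        raise ValueError(
            f"{timeout_ms} ms is below the ~{MIN_TIMEOUT_MS} ms floor reported in this thread"
        )
    return timeout_ms


def fits_budget(page_load_s: float, timeout_s: float) -> bool:
    # A page that needs longer than the timeout will be cut off before it finishes.
    return page_load_s <= timeout_s


print(chrome_timeout_ms(80))  # 80000
print(fits_budget(70, 60))    # False: a ~70 s page does not fit a 60 s timeout
```

With a 60 s (60000 ms) timeout, a page that takes about 70 s would be cut off, while 80 s leaves some headroom.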
@pirate commented on GitHub (Mar 12, 2019):
That output is fairly normal, not all sites archive well to PDF, which is why we also store screenshots and DOM dumps from Chrome headless as redundant backups.
Try increasing your ArchiveBox TIMEOUT to 70 or 80 and running it again to capture the sites that take longer than 60s.
@tgrosinger commented on GitHub (Mar 12, 2019):
Will do. I'll close this issue and reopen if I continue having trouble.
Thanks!
@pirate commented on GitHub (Mar 12, 2019):
Sounds good. FYI, I also just added --disable-gpu when running inside Docker: github.com/pirate/ArchiveBox@10bb970d66
@ghost commented on GitHub (Jul 1, 2019):
I'm running into this myself. Even on basic pages with Docker, they all seem to fail to make a screenshot and PDF. If I use the native setup on Debian 9, it works fine.
This page triggered it: https://www.cnn.com/2019/06/30/politics/beto-orourke-mexico-asylum-seekers/index.html
This page did not: https://sporestack.com
@pirate commented on GitHub (Jul 5, 2019):
Interesting. I'll try to take a look, but I can't promise I'll get around to it in the next few months, as there's a bunch of security work and the v0.4.0 release taking top priority. One thing I'll do is bump all the Docker/Chrome versions when I release v0.4.0, and hopefully that'll clear up some issues.
@pirate commented on GitHub (Jul 24, 2020):
Now that we're a handful of major versions ahead with Chrome, please give this a shot on the latest django branch. If you still see any issues with timing out, comment back here and I'll reopen the ticket.