[GH-ISSUE #167] Archive Method: Chrome timing out for many sites when running in Docker #1626

Closed
opened 2026-03-01 17:52:17 +03:00 by kerem · 8 comments

Originally created by @tgrosinger on GitHub (Mar 10, 2019).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/167

Describe the bug

When running ArchiveBox in the docker container, frequent errors are displayed such as the one below:

Steps to reproduce

Steps to reproduce the behavior:

  1. Follow the wiki instructions to run the docker container
  2. Wait for errors

Screenshots or log output

Failed: TimeoutExpired Command 'google-chrome-unstable' timed out after 60 seconds
        Run to see full output:
            cd /data/archive/1552194240.610;
            google-chrome-unstable --headless --no-sandbox --user-data-dir=/chrome --dump-dom --timeout=60000 https://my-url-here.com

Software versions

(please complete the following information)

  • OS: Host OS is Ubuntu 18.04
  • ArchiveBox version: 4a7f1d5
  • Docker version: 18.09.2
  • Chrome version: Google Chrome 74.0.3724.8 dev

More Info

I wanted to see more about the errors I was encountering in Chrome, so I started a terminal in the docker container and tried it out. I reduced the timeout because 60000 seemed high. This command still blocks for a very long time though.

pptruser@ffc8e33b6840:/data/archive/1552194240.936$ google-chrome-unstable --headless --no-sandbox --user-data-dir=/chrome --print-to-pdf --hide-scrollbars --timeout=60 https://grosinger.net
Fontconfig warning: "/etc/fonts/fonts.conf", line 100: unknown element "blank"
[0310/055813.147638:ERROR:command_buffer_proxy_impl.cc(125)] ContextResult::kTransientFailure: Failed to send GpuChannelMsg_CreateCommandBuffer.
[0310/055813.173440:INFO:headless_shell.cc(308)] Timeout.

Looks like maybe the lack of GPU in the container is causing an issue. Let's disable that.

pptruser@ffc8e33b6840:/data/archive/1552194240.936$ google-chrome-unstable --headless --no-sandbox --user-data-dir=/chrome --print-to-pdf --hide-scrollbars --disable-gpu --timeout=60 https://grosinger.net
[0310/055837.659065:WARNING:discardable_shared_memory_manager.cc(188)] Less than 64MB of free space in temporary directory for shared memory files: 63
Fontconfig warning: "/etc/fonts/fonts.conf", line 100: unknown element "blank"
[0310/055837.767788:INFO:headless_shell.cc(308)] Timeout.
[0310/055839.691556:ERROR:service_worker_storage.cc(2196)] Failed to delete the database: Database IO error

Not sure how to get past this issue though. Any suggestions?
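
The `discardable_shared_memory_manager` warning in the log above suggests Chrome found less than 64 MB free in its shared-memory directory, which on Linux is typically `/dev/shm`. A minimal sketch for checking this from inside the container follows; `free_mb` is a hypothetical helper, not an ArchiveBox or Chrome command.

```shell
# Hypothetical diagnostic helper: print the free space (in MB) of a directory.
# Defaults to /dev/shm, assumed here to be where Chrome keeps shared memory files.
free_mb() {
    dir="${1:-/dev/shm}"
    # -P forces single-line POSIX output; -k reports sizes in 1024-byte blocks.
    # Field 4 of the data row is the available space.
    df -Pk "$dir" | awk 'NR == 2 { print int($4 / 1024) }'
}

# Only report when /dev/shm actually exists on this system.
if [ -d /dev/shm ]; then
    echo "/dev/shm free: $(free_mb /dev/shm) MB"
fi
```

If the number is low, two commonly used mitigations are Docker's `--shm-size` option for `docker run` and Chrome's `--disable-dev-shm-usage` flag. Neither was tried in this thread, so treat them as things to experiment with rather than a confirmed fix.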


@pirate commented on GitHub (Mar 11, 2019):

The timeout is in milliseconds for Chromium (see kTimeout in headless/app/headless_shell_switches.cc), so 60000 is 60 seconds. If you set it to anything less than about 5s (5000), it'll hang indefinitely, which is what I think you encountered. Can you try running the command in Docker with a timeout like 30000 to see if there's any hint as to why it's dying?
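
Since the `--timeout` flag takes milliseconds, a small conversion step avoids passing a value in seconds by mistake. The sketch below is illustrative only: `chrome_timeout_ms` is a hypothetical helper, and the command is printed rather than executed.

```shell
# Hypothetical helper: convert whole seconds to the millisecond value
# that headless Chrome's --timeout flag expects.
chrome_timeout_ms() {
    echo $(( $1 * 1000 ))
}

# Build (but do not run) the command from this thread with a 30-second timeout.
timeout_ms="$(chrome_timeout_ms 30)"
echo "google-chrome-unstable --headless --no-sandbox --user-data-dir=/chrome --dump-dom --timeout=${timeout_ms} https://example.com"
```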


@tgrosinger commented on GitHub (Mar 12, 2019):

@pirate, thanks for the information. I bumped the timeout back up to the 60000 used by the original command example and it was successful.

pptruser@e84ba990552d:/data$ google-chrome-unstable --headless --no-sandbox --user-data-dir=/chrome --print-to-pdf --hide-scrollbars --timeout=60000 https://grosinger.net
Fontconfig warning: "/etc/fonts/fonts.conf", line 100: unknown element "blank"
[0312/143310.983757:ERROR:command_buffer_proxy_impl.cc(125)] ContextResult::kTransientFailure: Failed to send GpuChannelMsg_CreateCommandBuffer.
[0312/143316.644646:INFO:headless_shell.cc(534)] Written to file output.pdf.
[0312/143316.744849:ERROR:browser_process_sub_thread.cc(210)] Waited 93 ms for network service

pptruser@e84ba990552d:/data$ google-chrome-unstable --headless --no-sandbox --user-data-dir=/chrome --print-to-pdf --hide-scrollbars --timeout=60000 --disable-gpu https://grosinger.net
[0312/143352.914359:WARNING:discardable_shared_memory_manager.cc(188)] Less than 64MB of free space in temporary directory for shared memory files: 45
Fontconfig warning: "/etc/fonts/fonts.conf", line 100: unknown element "blank"
[0312/143359.305380:INFO:headless_shell.cc(534)] Written to file output.pdf.
[0312/143359.371275:ERROR:browser_process_sub_thread.cc(210)] Waited 60 ms for network service

When I go back to the actual URL that was failing in the logs, it too succeeds, but with a lot more error messages. The output PDF looks pretty bad, but it does have most of the information there.

$ google-chrome-unstable --headless --no-sandbox --user-data-dir=/chrome --print-to-pdf --hide-scrollbars  --timeout=60000 https://www.wunderground.com/weather/us/wa/eastsound/KWAEASTS33
[0312/143556.407027:WARNING:discardable_shared_memory_manager.cc(188)] Less than 64MB of free space in temporary directory for shared memory files: 59
Fontconfig warning: "/etc/fonts/fonts.conf", line 100: unknown element "blank"
[0312/143556.491513:ERROR:command_buffer_proxy_impl.cc(125)] ContextResult::kTransientFailure: Failed to send GpuChannelMsg_CreateCommandBuffer.
[0312/143558.441713:ERROR:service_worker_storage.cc(2196)] Failed to delete the database: Database IO error
[0312/143609.103879:ERROR:session_storage_context_mojo.cc(1014)] Got error when opening: 1
[0312/143611.125671:ERROR:session_storage_context_mojo.cc(1014)] Got error when opening: 1
[0312/143615.736152:ERROR:gles2_cmd_decoder.cc(3559)] ContextResult::kFatalFailure: fail_if_major_perf_caveat + swiftshader
[0312/143615.741015:ERROR:gles2_cmd_decoder.cc(3559)] ContextResult::kFatalFailure: fail_if_major_perf_caveat + swiftshader
[0312/143620.936695:INFO:CONSOLE(0)] "Access to XMLHttpRequest at 'https://api.weather.com/v1/geocode/48.69/-122.91/airquality/summary.json?apiKey=6532d6454b8aa370768e63d6ba5a832e&language=en-US&startDate=20190305&endDate=20190312' from origin 'https://www.wunderground.com' has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present on the requested resource.", source: https://www.wunderground.com/weather/us/wa/eastsound/KWAEASTS33 (0)
[0312/143625.662688:ERROR:bus.cc(393)] Failed to connect to the bus: Failed to connect to socket /var/run/dbus/system_bus_socket: No such file or directory
[0312/143625.662792:ERROR:bus.cc(393)] Failed to connect to the bus: Failed to connect to socket /var/run/dbus/system_bus_socket: No such file or directory
[0312/143625.662891:ERROR:bus.cc(393)] Failed to connect to the bus: Failed to connect to socket /var/run/dbus/system_bus_socket: No such file or directory
[0312/143625.662912:WARNING:property.cc(149)] DaemonVersion: GetAndBlock: failed.
[0312/143625.662970:ERROR:bus.cc(393)] Failed to connect to the bus: Failed to connect to socket /var/run/dbus/system_bus_socket: No such file or directory
[0312/143625.663026:ERROR:bus.cc(393)] Failed to connect to the bus: Failed to connect to socket /var/run/dbus/system_bus_socket: No such file or directory
[0312/143636.379509:INFO:CONSOLE(3)] "digiTrustUser not defined", source: https://js-sec.indexww.com/ht/p/182970-203800257961387.js (3)
[0312/143636.381769:INFO:CONSOLE(3)] "digiTrustUser not defined", source: https://js-sec.indexww.com/ht/p/182970-203800257961387.js (3)
[0312/143636.383276:INFO:CONSOLE(3)] "digiTrustUser not defined", source: https://js-sec.indexww.com/ht/p/182970-203800257961387.js (3)
[0312/143636.384679:INFO:CONSOLE(3)] "digiTrustUser not defined", source: https://js-sec.indexww.com/ht/p/182970-203800257961387.js (3)
[0312/143636.385988:INFO:CONSOLE(3)] "digiTrustUser not defined", source: https://js-sec.indexww.com/ht/p/182970-203800257961387.js (3)
[0312/143636.387328:INFO:CONSOLE(3)] "digiTrustUser not defined", source: https://js-sec.indexww.com/ht/p/182970-203800257961387.js (3)
[0312/143636.388702:INFO:CONSOLE(3)] "digiTrustUser not defined", source: https://js-sec.indexww.com/ht/p/182970-203800257961387.js (3)
[0312/143636.390025:INFO:CONSOLE(3)] "digiTrustUser not defined", source: https://js-sec.indexww.com/ht/p/182970-203800257961387.js (3)
[0312/143656.453532:INFO:headless_shell.cc(308)] Timeout.
[0312/143657.274724:ERROR:bus.cc(393)] Failed to connect to the bus: Failed to connect to socket /var/run/dbus/system_bus_socket: No such file or directory
[0312/143657.274848:ERROR:bus.cc(393)] Failed to connect to the bus: Failed to connect to socket /var/run/dbus/system_bus_socket: No such file or directory
[0312/143657.275003:ERROR:bus.cc(393)] Failed to connect to the bus: Failed to connect to socket /var/run/dbus/system_bus_socket: No such file or directory
[0312/143657.275030:WARNING:property.cc(149)] DaemonVersion: GetAndBlock: failed.
[0312/143657.275101:ERROR:bus.cc(393)] Failed to connect to the bus: Failed to connect to socket /var/run/dbus/system_bus_socket: No such file or directory
[0312/143657.275205:ERROR:bus.cc(393)] Failed to connect to the bus: Failed to connect to socket /var/run/dbus/system_bus_socket: No such file or directory
[0312/143658.185730:INFO:headless_shell.cc(534)] Written to file output.pdf.
[0312/143658.266984:ERROR:browser_process_sub_thread.cc(210)] Waited 60 ms for network service

So if this succeeded, I am not sure why the original command was failing in the logs. I am running it again and it actually seems to be having more success, though it takes about 70 seconds per URL.
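
One plausible reading of the log above: the first line is stamped 14:35:56, Chrome's own `Timeout.` fires at 14:36:56, and the PDF is written at 14:36:58, roughly 62 seconds in. An outer limit of 60 seconds, like the one in the original `TimeoutExpired` error, would expire before that write. The sketch below reproduces the race with GNU coreutils `timeout` and `sleep` standing in for ArchiveBox and Chrome. `run_with_limit` is a hypothetical helper, and the use of `timeout` is an assumption for illustration (the `TimeoutExpired` name suggests ArchiveBox uses Python's subprocess timeout instead).

```shell
# Hypothetical helper: run a job that needs work_s seconds under an outer
# limit of limit_s seconds, and report which one won the race.
run_with_limit() {
    limit_s="$1"
    work_s="$2"
    status=0
    # GNU coreutils `timeout` exits with status 124 when the limit is reached.
    timeout "$limit_s" sleep "$work_s" || status=$?
    if [ "$status" -eq 0 ]; then
        echo "finished"
    elif [ "$status" -eq 124 ]; then
        echo "killed by outer limit"
    else
        echo "failed with status $status"
    fi
}

run_with_limit 1 2   # outer limit shorter than the work
run_with_limit 3 2   # outer limit longer than the work
```

The same logic is why an outer limit equal to Chrome's inner `--timeout` leaves no margin for Chrome to finish writing its output.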


@pirate commented on GitHub (Mar 12, 2019):

That output is fairly normal. Not all sites archive well to PDF, which is why we also store screenshots and DOM dumps from Chrome headless as redundant backups.

Try increasing your ArchiveBox TIMEOUT (https://github.com/pirate/ArchiveBox/wiki/Configuration#TIMEOUT) to 70 or 80 and running it again to capture those sites that take longer than 60s.
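
A rough sketch of passing a larger `TIMEOUT` into the container follows. It assumes ArchiveBox reads `TIMEOUT` from the environment, as its Configuration wiki page describes. The image name `archivebox`, the `/data` mount, and the `add` subcommand are borrowed from the commands at the end of this thread and may not match the version discussed here. `archivebox_docker_cmd` is a hypothetical helper that only prints the command.

```shell
# Hypothetical helper: print (not run) a docker command that sets TIMEOUT.
# Assumptions: image name "archivebox", /data volume, and the "add" subcommand.
archivebox_docker_cmd() {
    timeout_s="$1"
    url="$2"
    # The literal "$PWD" is printed so the shell that runs the command expands it.
    printf 'docker run -e TIMEOUT=%s -v "$PWD/output:/data" archivebox add %s\n' \
        "$timeout_s" "'$url'"
}

archivebox_docker_cmd 80 'https://example.com'
```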


@tgrosinger commented on GitHub (Mar 12, 2019):

Will do. I'll close this issue and reopen if I continue having trouble.
Thanks!


@pirate commented on GitHub (Mar 12, 2019):

Sounds good. FYI, I also just added --disable-gpu when running inside Docker: https://github.com/pirate/ArchiveBox/commit/10bb970d6647269625b8d9ee1534b784a059b464
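
For readers scripting Chrome themselves, the idea of adding `--disable-gpu` only inside Docker can be sketched as below. This is not the code from the commit above: `chrome_flags` is a hypothetical helper, and checking for `/.dockerenv` is a common convention for detecting Docker that ArchiveBox may or may not use.

```shell
# Hypothetical helper: print Chrome flags, adding --disable-gpu inside Docker.
# Assumption: the presence of a marker file (default /.dockerenv) means we are
# running in a Docker container.
chrome_flags() {
    marker="${1:-/.dockerenv}"
    flags="--headless --no-sandbox"
    if [ -e "$marker" ]; then
        flags="$flags --disable-gpu"
    fi
    echo "$flags"
}

chrome_flags
```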


@ghost commented on GitHub (Jul 1, 2019):

I'm running into this, myself. Even on basic pages with Docker, they all seem to fail to make a screenshot and PDF. If I use the native setup on Debian 9, it works fine.

This page triggered it: https://www.cnn.com/2019/06/30/politics/beto-orourke-mexico-asylum-seekers/index.html

This page did not: https://sporestack.com


@pirate commented on GitHub (Jul 5, 2019):

Interesting, I'll try to take a look, but I can't promise I'll get around to it in the next few months, as there's a bunch of security work and the v0.4.0 release that are taking top priority. One thing I'll do is bump all the Docker/Chrome versions when I release v0.4.0, and hopefully that'll clear up some issues.


@pirate commented on GitHub (Jul 24, 2020):

Now that we're a handful of major versions ahead with Chrome, please give this a shot on the latest django branch. If you still see any issues with timing out, comment back here and I'll reopen the ticket.

git checkout django
git pull
docker build . -t archivebox
docker run -v $PWD/output:/data archivebox init
docker run -v $PWD/output:/data archivebox add 'https://example.com'