[GH-ISSUE #1045] How to resolve "extractor timed out after 60s" error, if possible? #3673

Closed
opened 2026-03-14 23:58:58 +03:00 by kerem · 9 comments
Owner

Originally created by @j-ar7 on GitHub (Nov 17, 2022).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/1045

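OS is macOS, ArchiveBox installed using pip, and `archivebox setup` has been run on this particular ArchiveBox instance.
Getting "Extractor timed out after 60s." on some URLs' pdf, single-file, screenshot or readability extraction.

![example](https://user-images.githubusercontent.com/112698622/202532011-1b9fc32b-f1dc-4c3a-bd6a-027aff47995b.png)

![info](https://user-images.githubusercontent.com/112698622/202532257-6effe2d4-c386-4b6a-9002-0aa516df3b2d.png)

So how do I resolve this extractor error, if it's something that's not dependent on the site itself?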

kerem closed this issue 2026-03-14 23:59:04 +03:00

@pirate commented on GitHub (May 26, 2023):

Sorry for the late reply, was just reminded by your twitter DM.

Can you try with the latest dev branch:

```bash
# docker:
docker build -t archivebox:dev https://github.com/ArchiveBox/ArchiveBox.git#dev
docker run -it -v $PWD:/data archivebox:dev init --setup

# bare metal:
pip install 'git+https://github.com/pirate/ArchiveBox@dev'
npm install 'git+https://github.com/ArchiveBox/ArchiveBox.git#dev'
archivebox init --setup
```

https://github.com/ArchiveBox/ArchiveBox#install-and-run-a-specific-github-branch

Extractor timeouts can happen for a variety of reasons and are hard to diagnose, but the first step is to run the command shown under `Run to see full output:` and read the error messages.

@pirate commented on GitHub (Jan 19, 2024):

Please try on the latest v0.7.2 build, as there are many improvements to extractors and core code that should resolve this. Closing as stale for now, comment back here for help and I can reopen it if you're still having issues.

@THeK3nger commented on GitHub (Feb 27, 2024):

Hello! I stumbled into the same issue on macOS. Running the command for the full log, I get this:

```
[28463:34307:0227/140145.400081:ERROR:trust_store_mac.cc(752)] Error parsing certificate:
ERROR: Failed parsing extensions

[28463:259:0227/140145.422227:ERROR:policy_logger.cc(156)] :components/enterprise/browser/controller/chrome_browser_cloud_management_controller.cc(161) Cloud management controller initialization aborted as CBCM is not enabled. Please use the `--enable-chrome-browser-cloud-management` command line flag to enable it if you are not using the official Google Chrome build.
247207 bytes written to file output.pdf
```

The two errors (`ERROR:trust_store_mac.cc(752)` and `ERROR:policy_logger.cc(156)`) don't seem to be related to the problem (I get them with every Chromium browser I have installed, and I don't think they cause any problems).

As you can see, the PDF is saved correctly. But headless Chromium never returns, so ArchiveBox reports the extraction as an error even though everything is fine.

@pirate commented on GitHub (Feb 27, 2024):

This is unfortunately an upstream bug with Chrome :'(

You can confirm it by running the command manually, and you'll see that it still hangs. Occasionally it's fixed by rebooting, or by clearing the Chrome data dir (destructive). Uninstalling and re-installing Chrome also often fixes it.

All are annoying solutions, I know, but I don't want to implement brittle workaround code just yet because I'm hoping they'll fix it soon on their side.

See my comment with more detail on this issue here: https://github.com/cypress-io/cypress/issues/27264#issuecomment-1972167140
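
A minimal bash sketch for telling a genuine failure apart from the hang described above. `run_with_deadline` is a hypothetical helper, not part of ArchiveBox or Chrome. It runs any command with a time limit and reports whether the expected output file was written even though the process never exited.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of ArchiveBox): run a command with a time
# limit and report whether the expected output file was written anyway.
# Usage: run_with_deadline SECONDS OUTPUT_FILE COMMAND [ARGS...]
run_with_deadline() {
    local seconds="$1" outfile="$2"
    shift 2

    "$@" &                          # start the command in the background
    local pid=$!
    local waited=0

    # Poll once per second until the process exits or the deadline passes.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$seconds" ]; then
            kill "$pid" 2>/dev/null # deadline reached: stop the process
            wait "$pid" 2>/dev/null
            if [ -s "$outfile" ]; then
                echo "hung-but-output-written"
            else
                echo "hung-no-output"
            fi
            return 124
        fi
        sleep 1
        waited=$((waited + 1))
    done

    # The process exited on its own: collect its exit status.
    wait "$pid"
    local status=$?
    if [ "$status" -eq 0 ] && [ -s "$outfile" ]; then
        echo "ok"
    else
        echo "failed"
    fi
    return "$status"
}
```

To use it, pass the time limit, the file the extractor should produce (for example `output.pdf`), and then the exact command ArchiveBox prints for the failing extractor. A result of `hung-but-output-written` matches the behavior reported in this thread: the output is fine, but the browser process never returns.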

@THeK3nger commented on GitHub (Feb 27, 2024):

That's what I thought as well (it also happens with the other Chromium browsers I have installed). Don't worry. :) I just wanted to let other people with the same problem know.

(and let's hope it will be fixed upstream!)

@pirate commented on GitHub (Mar 1, 2024):

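I confirmed that this is indeed an upstream issue and opened a bug report on the Chromium bug tracker: https://issues.chromium.org/issues/327583144

![Screenshot 2024-02-29 at 5 30 24 PM](https://github.com/ArchiveBox/ArchiveBox/assets/511499/6c33c2df-c44c-493b-a15c-c8e38b2599b4)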

@jamwil commented on GitHub (Mar 18, 2024):

I don't think this issue should be closed. It is core functionality that has been broken since 2022. I appreciate it's an upstream bug, but if there is little reasonable chance of an upstream resolution in the near term, then this issue should be open and easy to find so that people don't need to go digging for it.

@pirate commented on GitHub (Mar 18, 2024):

This is the existing long-running issue for this Chromium bug: https://github.com/ArchiveBox/ArchiveBox/issues/746

Subscribe to that one instead ^

@philippludwig commented on GitHub (Mar 21, 2025):

I see archiving jobs being aborted at 99.8% because of this. Can't the timeout be increased?

This is the first and only search result for this error message.
