mirror of
https://github.com/ArchiveBox/ArchiveBox.git
synced 2026-04-25 17:16:00 +03:00
[GH-ISSUE #1045] How to resolve "extractor timed out after 60s" error, if possible? #654
Originally created by @j-ar7 on GitHub (Nov 17, 2022).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/1045
OS is macOS, archivebox installed using pip, archivebox setup has been done on this particular archivebox instance.
Getting "Extractor timed out after 60s." on some URLs' pdf, single-file, screenshot or readability extraction.
So how do I resolve this extractor error, if it's something that's not dependent on the site itself?
@pirate commented on GitHub (May 26, 2023):
Sorry for the late reply, was just reminded by your twitter DM.
Can you try with the latest dev branch:
https://github.com/ArchiveBox/ArchiveBox#install-and-run-a-specific-github-branch
Extractor timeouts can happen for a variety of reasons and are hard to diagnose, but the first step is to run the command it says under "Run to see full output:" and read the error messages.
@pirate commented on GitHub (Jan 19, 2024):
Please try on the latest v0.7.2 build, as there are many improvements to extractors and core code that should resolve this. Closing as stale for now, comment back here for help and I can reopen it if you're still having issues.
@THeK3nger commented on GitHub (Feb 27, 2024):
Hello! I stumbled into the same issue on macOS. Running the command for the full log, I get this:
The two errors (ERROR:trust_store_mac.cc(752) and ERROR:policy_logger.cc(156)) don't seem to be related to the problem (I have them for every Chromium browser I have installed, and they are not causing problems, I think).
As you can see, the PDF is saved correctly. But the headless Chromium never returns, so ArchiveBox reports an error even though everything is fine.
@pirate commented on GitHub (Feb 27, 2024):
this is unfortunately an upstream bug with Chrome :'(
You can confirm it by running the command manually, and you'll see that it still hangs. Occasionally it's fixed by rebooting, or clearing the chrome data dir (destructive). Uninstalling and re-installing chrome also often fixes it.
All are annoying solutions, I know, but I don't want to implement brittle workaround code just yet because I'm hoping they'll fix it soon on their side.
See my comment with more detail on this issue here: https://github.com/cypress-io/cypress/issues/27264#issuecomment-1972167140
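The manual check described above (run the command yourself and see whether it hangs even though the output was written) can be sketched as a small script. This is only an illustration, not ArchiveBox's own code: the helper name run_with_timeout and the sleeping stand-in command are hypothetical, and you would substitute the exact command ArchiveBox prints under "Run to see full output:".

```python
import subprocess
import sys
from pathlib import Path


def run_with_timeout(cmd, timeout_s, expected_output=None):
    """Run cmd and report whether it exited in time and whether its output file exists."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
        hung, returncode = False, proc.returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising TimeoutExpired
        hung, returncode = True, None
    # A hang combined with an existing output file matches the symptom in this thread
    output_exists = expected_output is not None and Path(expected_output).exists()
    return {"hung": hung, "returncode": returncode, "output_exists": output_exists}


if __name__ == "__main__":
    # Stand-in command that sleeps past the timeout; replace it with the real extractor command
    demo_cmd = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_with_timeout(demo_cmd, timeout_s=1))
```

If the result shows the command hung while the expected output file exists, the extractor did its work and only the browser process failed to exit, which is the behaviour described in this thread.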
@THeK3nger commented on GitHub (Feb 27, 2024):
That's what I thought as well (it happens also with other Chromium browsers I have installed). Don't worry. :) I just wanted to notify other people with the same problem.
(and let's hope it will be fixed upstream!)
@pirate commented on GitHub (Mar 1, 2024):
I confirmed that it is indeed an upstream issue and opened a bug report on the Chromium bug tracker: issues.chromium.org/issues/327583144
@jamwil commented on GitHub (Mar 18, 2024):
I don't think this issue should be closed. It is core functionality that has been broken since 2022. I appreciate it's an upstream bug, but if there is little reasonable chance of an upstream resolution in the near term, then this issue should be open and easy to find so that people don't need to go digging for it.
@pirate commented on GitHub (Mar 18, 2024):
This is the existing long-running issue for this Chromium bug: https://github.com/ArchiveBox/ArchiveBox/issues/746
Subscribe to that one instead ^
@philippludwig commented on GitHub (Mar 21, 2025):
I see archiving jobs being aborted at 99.8% because of this. Can't the timeout be increased?
This is the first and only search result for this error message.