mirror of
https://github.com/ArchiveBox/ArchiveBox.git
synced 2026-04-25 17:16:00 +03:00
[GH-ISSUE #1298] Bug: readability extractor fails in 0.7.1 docker with ERROR: illegal operation on a directory #3821
Originally created by @bramnet on GitHub (Dec 19, 2023).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/1298
Describe the bug
When running the Docker version of ArchiveBox, the readability extractor consistently fails. Looking at the logs, the failure appears to come from it attempting to use a deprecated module.
Steps to reproduce
Screenshots or log output
ArchiveBox version
@pirate commented on GitHub (Dec 19, 2023):
The punycode thing is just a warning, not an error. If it's failing there's likely some other error later on causing it.
Can you try running the readability command it shows and post the full output?
I also tried upgrading readability in the `:dev` docker image, you can pull the latest version and give that a try as well. https://github.com/ArchiveBox/ArchiveBox#install-and-run-a-specific-github-branch
@bramnet commented on GitHub (Dec 19, 2023):
Running the readability command directly returns the following:
I don't have the ability to build and run the dev image at this very moment (busy IRL), I'll try this later and report my findings.
@pirate commented on GitHub (Dec 19, 2023):
Looks like it tried to run on a directory? It's probably being called wrong, or there is an unexpected dir taking the place of the expected html files in the snapshot folder. Can you post a screenshot of the `./archive/<timestamp>` folder's contents for the failing snapshot?
No need to build the image, just pull the published `archivebox/archivebox:dev` and run it.
@bramnet commented on GitHub (Dec 19, 2023):
I haven't had the chance to pull the dev and try running it yet, but the directory contents are as below
@bramnet commented on GitHub (Dec 20, 2023):
I ran dev, readability extracted successfully without any issues as far as I can tell.
@pirate commented on GitHub (Dec 20, 2023):
Great, going to close this for now as fixed in `0.7.2` (dev) https://github.com/ArchiveBox/ArchiveBox/pull/1297 then. Let me know if you have any further issues.
Note: readability seems to have gotten slower in their latest release, so you may need to increase the default timeouts a bit until they speed it up: `archivebox config --set TIMEOUT=120` or higher (up from 60sec by default).
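The diagnosis suggested earlier in the thread (a directory sitting where an HTML file is expected in the snapshot folder) can be checked with a small script. The following is a minimal Python sketch and not part of ArchiveBox: the helper name `find_unexpected_dirs` and the file names passed to it are hypothetical, and the real names depend on which extractors ran for the snapshot.

```python
import os
import tempfile


def find_unexpected_dirs(snapshot_dir, expected_files):
    """Return the names from expected_files that exist as directories.

    expected_files is a caller-supplied list of names that should be
    regular files inside snapshot_dir (an assumption, not an ArchiveBox API).
    """
    return [
        name
        for name in expected_files
        if os.path.isdir(os.path.join(snapshot_dir, name))
    ]


if __name__ == "__main__":
    # Build a throwaway folder that mimics the suspected problem:
    # "page.html" is a directory, "other.html" is a normal file.
    with tempfile.TemporaryDirectory() as snapshot:
        os.mkdir(os.path.join(snapshot, "page.html"))
        with open(os.path.join(snapshot, "other.html"), "w") as f:
            f.write("<html></html>")

        # Hypothetical file names, used only for illustration.
        print(find_unexpected_dirs(snapshot, ["page.html", "other.html"]))
```

To use it on a real snapshot, point `snapshot_dir` at the `./archive/<timestamp>` folder and pass whichever output file names you expect to be regular files. Any name it returns is a candidate for the "illegal operation on a directory" failure.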