mirror of
https://github.com/ArchiveBox/ArchiveBox.git
synced 2026-04-25 09:06:02 +03:00
[GH-ISSUE #1140] Bug: Exception TypeError: a bytes-like object is required, not 'str' in readability log_archive_method_finished #3734
Originally created by @jfinkhaeuser on GitHub (Apr 19, 2023).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/1140
The readability extractor got an exception, but looking at the exception, it's probably best addressed in the logging utility.
Describe the bug
I'm running the docker-compose setup. I've added a few URLs with --index-only, and have a script that runs an update to fill in the archiving. On one of the URLs, the error below was raised.
It's clear that hints is expected to be a different type than it is. While the calling function may need to be fixed, the log utility likely should never error out like this and should convert hints as necessary; at least that's how I'd approach this.
Steps to reproduce
Simply adding a URL with the readability extractor enabled.
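As a rough illustration of the defensive approach suggested above (having the log utility coerce hints rather than fail), here is a minimal sketch. The actual ArchiveBox logging code is not shown in this issue, so `normalize_hints` and its parameters are hypothetical names, and the assumption that hints may arrive as `str`, `bytes`, or a sequence of either is inferred only from the error message in the title.

```python
from typing import Iterable, List, Union

# Assumed shapes that `hints` might take; not taken from the ArchiveBox source.
HintLike = Union[str, bytes, None, Iterable[Union[str, bytes]]]


def normalize_hints(hints: HintLike, encoding: str = "utf-8") -> List[str]:
    """Coerce hints into a flat list of text lines (hypothetical helper)."""
    if hints is None:
        return []
    # A single str or bytes value is treated as a one-element sequence.
    if isinstance(hints, (str, bytes)):
        hints = [hints]
    lines: List[str] = []
    for hint in hints:
        if isinstance(hint, bytes):
            # Undecodable bytes are replaced rather than raising, so logging cannot fail here.
            hint = hint.decode(encoding, errors="replace")
        lines.extend(str(hint).splitlines())
    return lines


if __name__ == "__main__":
    print(normalize_hints(b"first line\nsecond line"))
    print(normalize_hints(["text hint", b"bytes hint"]))
```

A logging function could call such a helper once at its top and then work only with `str` values, which would keep a type mismatch in one extractor from aborting the whole run.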
Screenshots or log output
ArchiveBox version
@Mrgove10 commented on GitHub (Jun 24, 2023):
@pirate commented on GitHub (Jun 28, 2023):
Sorry @Mrgove10, I haven't gotten around to fixing this yet. It should be safe to ignore if you don't mind restarting the crawl after this point using
archivebox update --resume ...
You can also disable readability temporarily for the first pass with
archivebox config --set SAVE_READABILITY=False
and enable it on a second pass so that it doesn't block other archiving work.
@Mrgove10 commented on GitHub (Jul 1, 2023):
Thanks for the resolution! I have temporarily disabled readability until the fix :)
@pirate commented on GitHub (Jan 19, 2024):
Should be fixed in v0.7.2. Comment back here if you're still having issues!