mirror of
https://github.com/ArchiveBox/ArchiveBox.git
synced 2026-04-25 09:06:02 +03:00
[GH-ISSUE #999] Bug: UnicodeDecodeError when archiving site #624
Originally created by @InnovativeInventor on GitHub (Jul 16, 2022).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/999
Describe the bug
Archiving a certain website (https://www.thedrive.com/the-war-zone/new-radars-are-giving-old-air-force-f-16s-capabilities-like-never-before) results in a UnicodeDecodeError, which halts the update process. Ideally, there should be a way to skip errored archives.
Steps to reproduce
Screenshots or log output
ArchiveBox version
@pirate commented on GitHub (Jul 20, 2022):
Duplicate of https://github.com/ArchiveBox/ArchiveBox/issues/991
@turian commented on GitHub (Sep 12, 2022):
I believe I fixed this in https://github.com/ArchiveBox/ArchiveBox/pull/1026
TL;DR, until that's merged:
Add this to ArchiveBox.conf:
If that doesn't work and you still get crap UnicodeDecodeErrors, you can use my Docker image turian/archivebox:kludge-984-UTF8-bug instead of archivebox/archivebox for now. Or use my branch and pip install or whatever from there.
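For anyone wondering what this class of failure looks like in general, here is a minimal Python sketch. It is not ArchiveBox code and not necessarily what the PR above does. The helper names `decode_output` and `process_all` are made up for illustration. It shows two generic ways around a strict UTF-8 decode blowing up: decode with a replacement character, or catch the error per item so one bad page does not halt the whole run.

```python
from typing import Callable, Iterable, List, Tuple


def decode_output(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode bytes, substituting U+FFFD for undecodable bytes instead of raising.

    Hypothetical helper for illustration only.
    """
    return raw.decode(encoding, errors="replace")


def process_all(
    items: Iterable[bytes],
    handler: Callable[[bytes], str],
) -> Tuple[List[str], List[Tuple[bytes, UnicodeDecodeError]]]:
    """Run handler on every item, collecting decode failures instead of halting.

    Hypothetical helper for illustration only.
    """
    results: List[str] = []
    failures: List[Tuple[bytes, UnicodeDecodeError]] = []
    for item in items:
        try:
            results.append(handler(item))
        except UnicodeDecodeError as err:
            # Record the failure and move on to the next item.
            failures.append((item, err))
    return results, failures


if __name__ == "__main__":
    bad = b"caf\xe9"  # Latin-1 encoded bytes, not valid UTF-8

    # Tolerant decoding never raises; the bad byte becomes U+FFFD.
    print(repr(decode_output(bad)))

    # Strict decoding raises, but the loop keeps going and reports the failure.
    ok, failed = process_all([b"ok", bad], lambda b: b.decode("utf-8"))
    print(ok, [item for item, _ in failed])
```

The first approach trades fidelity for robustness (some characters become U+FFFD), while the second keeps strict decoding but skips the errored items, which is closer to what the original report asks for.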