Mirror of https://github.com/ArchiveBox/ArchiveBox.git (synced 2026-04-26 01:26:00 +03:00)
[PR #378] [MERGED] fix: Use w3lib to improve the encoding extraction #2646
Labels
No labels
No milestone
No project
No assignees
1 participant
Due date
No due date set.
Dependencies
No dependencies set.
Reference
starred/ArchiveBox#2646
📋 Pull Request Information
Original PR: https://github.com/ArchiveBox/ArchiveBox/pull/378
Author: @cdvv7788
Created: 7/22/2020
Status: ✅ Merged
Merged: 7/22/2020
Merged by: @pirate
Base: django ← Head: hotfix/#257
📝 Commits (2)
949f78a fix: Use w3lib to improve the encoding extraction
aa45f9c fix version tag
📊 Changes
5 files changed (+787 additions, -11 deletions)
📝 archivebox/util.py (+8 -9)
📝 setup.py (+1 -0)
📝 tests/mock_server/server.py (+3 -1)
➕ tests/mock_server/templates/shift_jis.html (+769 -0)
📝 tests/test_util.py (+6 -1)
📄 Description
Summary
Detecting the right encoding is a problem not only for RSS feeds (which already had a fix) but for pages in general. In this PR I generalize that fix, so the flow is:
Problematic links now all get the right title.
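The description does not spell out the individual steps of the flow. As a rough illustration only, here is a stdlib-only Python sketch of a common header-first order, assuming it goes: charset from the HTTP `Content-Type` header, then a charset declared in a `<meta>` tag, then a UTF-8 fallback. The helper names (`header_encoding`, `body_declared_encoding`, `decode_page`) are hypothetical and the regexes are deliberately simplistic. w3lib itself provides `w3lib.encoding.http_content_type_encoding` and `w3lib.encoding.html_body_declared_encoding`, which handle many more edge cases (for example, encoding-label aliases).

```python
import re
from typing import Optional

# Hypothetical, simplified stand-ins for the w3lib.encoding helpers;
# they only illustrate the header -> <meta> -> fallback order.
_HEADER_CHARSET = re.compile(r"charset\s*=\s*[\"']?([\w.:-]+)", re.I)
_META_CHARSET = re.compile(rb"<meta[^>]+charset\s*=\s*[\"']?([\w.:-]+)", re.I)


def header_encoding(content_type: str) -> Optional[str]:
    """Return the charset named in a Content-Type header value, if any."""
    match = _HEADER_CHARSET.search(content_type or "")
    return match.group(1).lower() if match else None


def body_declared_encoding(body: bytes) -> Optional[str]:
    """Return the charset declared in a <meta> tag near the top of the page."""
    match = _META_CHARSET.search(body[:4096])  # declarations live in <head>
    return match.group(1).decode("ascii").lower() if match else None


def decode_page(body: bytes, content_type: str = "") -> str:
    """Decode raw page bytes: header charset first, then <meta>, then UTF-8."""
    for candidate in (header_encoding(content_type), body_declared_encoding(body)):
        if candidate:
            try:
                return body.decode(candidate)
            except (LookupError, UnicodeDecodeError):
                continue  # unknown or wrong label: try the next source
    return body.decode("utf-8", errors="replace")
```

Falling through on `LookupError` (a label Python does not know) or `UnicodeDecodeError` (a label that does not match the bytes) keeps one bad declaration from breaking the whole decode.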
**Related issues:** #257
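The diff adds a Shift_JIS fixture (`tests/mock_server/templates/shift_jis.html`) to the mock server. The hypothetical sketch below shows, without ArchiveBox or a server, why the decoding step decides whether a title comes out right: the same Shift_JIS bytes yield the correct `<title>` only when decoded with the matching codec. `extract_title` is an illustrative regex helper, not ArchiveBox's own title parser.

```python
import re
from html import unescape
from typing import Optional

# Hypothetical title extractor used only for this illustration.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.I | re.S)


def extract_title(html_text: str) -> Optional[str]:
    """Return the unescaped, stripped <title> text, or None if absent."""
    match = TITLE_RE.search(html_text)
    return unescape(match.group(1)).strip() if match else None


# A page whose bytes are Shift_JIS, like the fixture added in this PR.
raw = '<html><head><meta charset="shift_jis"><title>ひらがな</title></head></html>'.encode("shift_jis")

wrong = extract_title(raw.decode("utf-8", errors="replace"))  # mojibake title
right = extract_title(raw.decode("shift_jis"))                # correct title
```

Only the decode call differs between `wrong` and `right`, which is the point: once the encoding is detected correctly, title extraction needs no changes.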
Changes these areas
🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.