mirror of
https://github.com/ArchiveBox/ArchiveBox.git
synced 2026-04-26 01:26:00 +03:00
[GH-ISSUE #257] Error if the character code is Shift_JIS #1692
Originally created by @matoken on GitHub (Aug 16, 2019).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/257
Error if the character code is Shift_JIS
An error occurs on some Japanese pages.
Steps to reproduce
An error occurs on these pages:
https://www.mbc.co.jp/news/
http://167.86.112.42/hello_sjis.html
Screenshots or log output
The character encoding of the pages where the error occurs seems to be Shift_JIS (an older Japanese character encoding).
The error also occurred when I created a tiny Shift_JIS page and tried to archive it.
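The log output for this report is not preserved, so the exact error is unknown. As a rough illustration of the general class of failure, here is a minimal Python sketch that assumes the page bytes were decoded as strict UTF-8. The sample HTML and the helper names `decode_strict_utf8` and `decode_with_fallback` are hypothetical and are not taken from ArchiveBox.

```python
# Hypothetical tiny page, similar in spirit to the hello_sjis.html test page.
SAMPLE_HTML = (
    '<html><head><meta charset="Shift_JIS">'
    "<title>こんにちは</title></head><body>こんにちは</body></html>"
)
RAW_BYTES = SAMPLE_HTML.encode("shift_jis")  # bytes as a server would send them


def decode_strict_utf8(data: bytes) -> str:
    """Decode assuming UTF-8; raises UnicodeDecodeError on Shift_JIS input."""
    return data.decode("utf-8")


def decode_with_fallback(data: bytes, encodings=("utf-8", "shift_jis")) -> str:
    """Try each candidate encoding in order (assumed list, not exhaustive)."""
    for encoding in encodings:
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: keep going, replacing undecodable bytes with U+FFFD.
    return data.decode("utf-8", errors="replace")


if __name__ == "__main__":
    try:
        decode_strict_utf8(RAW_BYTES)
    except UnicodeDecodeError as exc:
        print("strict UTF-8 failed:", exc.reason)
    print(decode_with_fallback(RAW_BYTES))
```

Trying encodings in a fixed order is a simplification. A real fix would more likely read the charset from the HTTP headers or the page's `<meta>` tag first.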
Software versions
e2b054ae7
@pirate commented on GitHub (Aug 17, 2019):
Ahhh encoding problems, they're never-ending and really hard to solve 100% correctly for all cases. Unfortunately I can't promise I'll get around to this anytime soon, there are too many other important issues in the queue, sorry for the trouble. May I suggest finding a site that lets you proxy and convert the character encoding by passing a URL, then archiving that site instead? As a last resort, Google Translate might do the trick?
@cdvv7788 commented on GitHub (Jul 20, 2020):
@matoken Can you please check if the content can be archived now using the django branch? I still see some issue with the title in the index, but the content seems to be archived correctly.
@matoken commented on GitHub (Jul 21, 2020):
I installed the django branch in a new environment and tried it out.
I tried some sites that use Shift_JIS and they seem to work.
The title in the index is sometimes a URL and sometimes garbled.
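A garbled title usually means the page bytes were decoded with the wrong encoding before the `<title>` was extracted. The following Python sketch shows one way to avoid that by reading the declared charset first. It assumes the page declares its charset in the first 2048 bytes, and the helpers `sniff_charset` and `extract_title` are hypothetical, not ArchiveBox's actual title parser.

```python
import codecs
import re

# Matches charset=Shift_JIS in either <meta charset="..."> or the older
# <meta http-equiv="Content-Type" content="text/html; charset=..."> form.
CHARSET_RE = re.compile(rb'charset=["\']?([A-Za-z0-9_\-]+)', re.IGNORECASE)
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def sniff_charset(raw: bytes, default: str = "utf-8") -> str:
    """Return the charset declared near the top of the page, else the default."""
    match = CHARSET_RE.search(raw[:2048])
    if not match:
        return default
    name = match.group(1).decode("ascii")
    try:
        codecs.lookup(name)  # reject names Python has no codec for
    except LookupError:
        return default
    return name


def extract_title(raw: bytes):
    """Decode with the sniffed charset, then pull out the <title> text."""
    text = raw.decode(sniff_charset(raw), errors="replace")
    match = TITLE_RE.search(text)
    return match.group(1).strip() if match else None


if __name__ == "__main__":
    page = '<html><head><meta charset="Shift_JIS"><title>ニュース</title></head></html>'
    raw = page.encode("shift_jis")
    print(extract_title(raw))  # decoded using the declared Shift_JIS charset
    print(TITLE_RE.search(raw.decode("latin-1")).group(1))  # wrong codec: garbled
```

When no title can be found, `extract_title` returns `None`, and a caller could then fall back to showing the URL, which would match the "sometimes a URL" behaviour reported above.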
@pirate commented on GitHub (Jul 22, 2020):
Try the django branch now, it should be fixed in #378. If you still see any problems, comment back here and I'll reopen the issue.