[GH-ISSUE #674] Failed to archive link: UnicodeEncodeError: 'gbk' codec can't encode character '\u25be' in position 9443: illegal multibyte sequence #3444

Closed
opened 2026-03-14 22:56:19 +03:00 by kerem · 2 comments

Originally created by @littlegolden on GitHub (Mar 26, 2021).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/674

Similar to #32, but it happens when archiving the site:

![image](https://user-images.githubusercontent.com/35162802/112638479-42a20200-8e7a-11eb-96c8-ddef583435a4.png)

@pirate commented on GitHub (Mar 26, 2021):

I think this is a different issue. Can you share the URL that you tried to archive?

Also, please post the full output of `archivebox --version`.


@pirate commented on GitHub (Mar 27, 2021):

I think it's due to Windows not defaulting to UTF-8 for file writes.
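To illustrate the suspected cause, here is a minimal sketch (not ArchiveBox's actual code) that reproduces the encoding failure from the issue title and shows a write that does not depend on the platform default. The helper name `write_text_utf8`, the sample string, and the file name `page.html` are hypothetical and only there for illustration.

```python
import os
import tempfile

# U+25BE is the character named in the error message.
# The surrounding text is made up for this example.
text = "menu \u25be"

# Encoding with the GBK codec fails for this character, which is what
# happens implicitly when a file is opened with a GBK default encoding.
try:
    text.encode("gbk")
    gbk_failed = False
except UnicodeEncodeError:
    gbk_failed = True


def write_text_utf8(path, content):
    """Hypothetical helper: write text with an explicit UTF-8 encoding
    so the locale's default encoding is never consulted."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(content)


with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "page.html")
    write_text_utf8(out_path, text)
    # Read the file back with the same explicit encoding.
    with open(out_path, encoding="utf-8") as f:
        round_tripped = f.read()

print("gbk encode failed:", gbk_failed)
print("utf-8 round trip ok:", round_tripped == text)
```

Passing `encoding="utf-8"` explicitly at every `open()` call is one way to make file writes behave the same on Windows as elsewhere, regardless of the system code page.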

There's a PEP to fix it, but it's not proposed to land until Python 3.10: https://discuss.python.org/t/pep-597-enable-utf-8-mode-by-default-on-windows/3122

In the meantime, can you try setting the `PYTHONLEGACYWINDOWSSTDIO=utf-8` environment variable and running it again?
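When experimenting with environment variables, it may help to check which encodings the interpreter actually ends up using. The sketch below is a generic diagnostic and not part of ArchiveBox; the function name `encoding_report` is hypothetical. For comparison, Python's documented UTF-8 Mode (Python 3.7+) is switched on with `PYTHONUTF8=1`, and whether that is the relevant setting for this report is an assumption.

```python
import locale
import sys


def encoding_report():
    """Hypothetical helper: collect the encodings Python is currently using."""
    return {
        # True when UTF-8 Mode is active (for example via PYTHONUTF8=1).
        "utf8_mode": bool(sys.flags.utf8_mode),
        # Default used by open() when no encoding= argument is given.
        "preferred_encoding": locale.getpreferredencoding(False),
        # Encoding used for text printed to the console, if available.
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
        # Encoding used to convert file names.
        "filesystem_encoding": sys.getfilesystemencoding(),
    }


report = encoding_report()
for name, value in report.items():
    print(f"{name}: {value}")
```

Running this before and after setting a variable shows whether the change took effect. On a system where `preferred_encoding` reports a GBK code page, any `open()` call without an explicit `encoding=` would be expected to use that code page.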

Related issue: https://github.com/ArchiveBox/ArchiveBox/issues/678

I've also added a patch to v0.6 that should fix this issue: 71e632a

You can try it out like so:

```bash
pip install "git+https://github.com/ArchiveBox/ArchiveBox.git@debug-toolbar"
```

Post back if you're still encountering the problem and I'll reopen the ticket.

I've also added some fixes to v0.6 that should improve the situation.
