mirror of
https://github.com/ArchiveBox/ArchiveBox.git
synced 2026-04-25 17:16:00 +03:00
[GH-ISSUE #32] Cant run archive.py due to UTF-8 encoding issues #3042
Originally created by @movanet on GitHub (Jul 4, 2017).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/32
Any ideas why?
@pirate commented on GitHub (Jul 4, 2017):
Try pulling and running it again. I've been refactoring over the last couple hours, so you probably pulled a broken version, sorry!
@movanet commented on GitHub (Jul 4, 2017):
Thanks. It still fails:
@pirate commented on GitHub (Jul 4, 2017):
Ok sweet, at least it's failing in a different place though, which makes me think it's due to lacking hardcoded encodings.
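As a rough illustration of what "hardcoded encodings" means here: a minimal sketch, assuming Python 3, in which every open() call names its encoding instead of inheriting one from the process locale. The helper names and the sample string are hypothetical and are not taken from archive.py.

```python
import os
import tempfile


def write_text(path, text):
    # Passing encoding='utf-8' explicitly means the result does not
    # depend on the locale the script happens to run under.
    with open(path, 'w', encoding='utf-8') as f:
        f.write(text)


def read_text(path):
    # Read back with the same explicit encoding.
    with open(path, 'r', encoding='utf-8') as f:
        return f.read()


def demo_roundtrip():
    # Hypothetical sample containing non-ASCII characters
    # (an en dash and a full block).
    sample = 'Example title \u2013 progress \u2588'
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'sample.txt')
        write_text(path, sample)
        return read_text(path) == sample


if __name__ == '__main__':
    print(demo_roundtrip())
```

Without the encoding argument, open() uses the locale's preferred encoding, which under an ASCII locale cannot represent characters like these.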
I just updated all the open() calls to manually specify encoding='utf-8'. Please pull and try again, lemme know how it goes. What system are you running this on by the way?
@movanet commented on GitHub (Jul 4, 2017):
It started to work, but it got stuck at the Chrome version check. Should I update Chromium?
@movanet commented on GitHub (Jul 4, 2017):
btw this is my chromium version:
Chromium 58.0.3029.110 Built on Ubuntu , running on Ubuntu 16.04
@pirate commented on GitHub (Jul 4, 2017):
Yes, you cannot run chrome headless unless you have a newer version of chromium or google-chrome. Simply run apt upgrade chromium-browser to upgrade.
@movanet commented on GitHub (Jul 4, 2017):
Strange. Perhaps it's not yet available for my Ubuntu?
apt upgrade chromium-browser
Reading package lists... Done
Building dependency tree
Reading state information... Done
chromium-browser is already the newest version (58.0.3029.110-0ubuntu0.16.04.1281).
@movanet commented on GitHub (Jul 4, 2017):
Tried downloading chromium from https://github.com/scheib/chromium-latest-linux/blob/master/ and modifying the env, but it doesn't work...
env CHROME_BINARY=/root/bookmark-archiver/chromium-latest-linux/484087/chrome-linux/chrome ./archive.py ril_export.html
[+] [2017-07-04 08:03:42] Starting archive from ril_export.html export file.
[] [2017-07-04 08:03:44] Created archive index with 1699 links.
[] Checking Dependencies:
/root/bookmark-archiver/chromium-latest-linux/484087/chrome-linux/chrome
/root/bookmark-archiver/chromium-latest-linux/484087/chrome-linux/chrome: error while loading shared libraries: libgtk-3.so.0: cannot open shared object file: No such file or directory
Traceback (most recent call last):
File "./archive.py", line 88, in <module>
create_archive(export_file, service=export_type, resume=resume_from)
File "./archive.py", line 64, in create_archive
check_dependencies()
File "/root/bookmark-archiver/config.py", line 47, in check_dependencies
if int(version) < 59:
ValueError: invalid literal for int() with base 10: ''
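The ValueError above appears to come from calling int() on an empty string: the downloaded binary failed to load (missing libgtk-3.so.0), so it printed no version at all. A minimal sketch of a more forgiving check, assuming the version text looks like the `Chromium 58.0.3029.110 ...` line quoted earlier in this thread. The function names are hypothetical and this is not the actual code in config.py; only the threshold of 59 is taken from the traceback.

```python
import re

# Threshold taken from the `int(version) < 59` line in the traceback.
MIN_CHROME_MAJOR = 59


def parse_major_version(version_output):
    """Return the major version as an int, or None if none is found."""
    # Match the first dotted number, e.g. the "58.0" in "Chromium 58.0.3029.110".
    match = re.search(r'(\d+)\.\d+', version_output or '')
    return int(match.group(1)) if match else None


def chrome_is_new_enough(version_output, minimum=MIN_CHROME_MAJOR):
    major = parse_major_version(version_output)
    if major is None:
        # Empty or unparseable output usually means the binary did not run.
        raise RuntimeError(
            'Could not determine the Chrome version; does the binary run at all?'
        )
    return major >= minimum


if __name__ == '__main__':
    line = 'Chromium 58.0.3029.110 Built on Ubuntu , running on Ubuntu 16.04'
    print(parse_major_version(line), chrome_is_new_enough(line))
```

With this shape, an empty version string produces a readable error about the binary instead of a bare ValueError from int().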
@pirate commented on GitHub (Jul 4, 2017):
What is the output of /root/bookmark-archiver/chromium-latest-linux/484087/chrome-linux/chrome --version?
@movanet commented on GitHub (Jul 6, 2017):
chromium-browser --version
Chromium 58.0.3029.110 Built on Ubuntu , running on Ubuntu 16.04
@movanet commented on GitHub (Jul 6, 2017):
Installed Google Chrome and it seems to be working, so I guess the problem was with chromium-browser. Chrome seems to be working all right. I am closing this.
/bookmark-archiver# env CHROME_BINARY=/usr/bin/google-chrome ./archive.py ril_export.html
[+] [2017-07-06 10:11:49] Starting archive from ril_export.html export file.
[] [2017-07-06 10:11:53] Created archive index with 1699 links.
[] Checking Dependencies:
/usr/bin/google-chrome
/usr/bin/wget
/usr/bin/curl
[+] [1497864202 (2017-06-19 05:23)]
@movanet commented on GitHub (Jul 6, 2017):
sorry, another unicode error:
~/bookmark-archiver# ./archive.py ril_export.html
[] [2017-07-06 11:03:01] Starting archive from ril_export.html export file.
[+] [2017-07-06 11:03:07] Created archive index with 1699 links.
[] Checking Dependencies:
/usr/bin/chromium-browser
/usr/bin/wget
/usr/bin/curl
[+] [1497864202 (2017-06-19 05:23)] "Helios4 - Your own private cloud": kobol.io/helios4/
- Downloading full site
0.9% (1/60sec)Process Process-1:
Traceback (most recent call last):
File "/usr/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
self.run()
File "/usr/lib/python3.5/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/root/bookmark-archiver/config.py", line 145, in progress_bar
seconds,
UnicodeEncodeError: 'ascii' codec can't encode character '\u2588' in position 15: ordinal not in range(128)
wget output:
Converting links in kobol.io/helios4/css/custom.css... nothing to do.
Converting links in kobol.io/helios4/css/owl.carousel.css... 0-1
Converting links in kobol.io/helios4/css/socicon.css... 1-0
Converting links in kobol.io/helios4/css/iconsmind.css... 3-0
Converting links in kobol.io/helios4/css/bootstrap.css... 0-5
Converting links in kobol.io/helios4/css/interface-icons.css... 6-0
Converting links in kobol.io/helios4/css/theme.css... 1-0
Converting links in kobol.io/helios4/css/font-mulilato.css... nothing to do.
Converted links in 9 files in 0.03 seconds.
Run to see full output: cd pocket/archive/1497864202; wget --timestamping --adjust-extension --no-parent --page-requisites --convert-links http://kobol.io/helios4/
Failed: Exception Failed to wget download
- Printing PDF
0.9% (1/60sec)Process Process-2:
Traceback (most recent call last):
File "/usr/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
self.run()
File "/usr/lib/python3.5/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/root/bookmark-archiver/config.py", line 145, in progress_bar
seconds,
UnicodeEncodeError: 'ascii' codec can't encode character '\u2588' in position 15: ordinal not in range(128)
- Snapping Screenshot
0.9% (1/60sec)Process Process-3:
Traceback (most recent call last):
File "/usr/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
self.run()
File "/usr/lib/python3.5/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/root/bookmark-archiver/config.py", line 145, in progress_bar
seconds,
UnicodeEncodeError: 'ascii' codec can't encode character '\u2588' in position 15: ordinal not in range(128)
- Submitting to archive.org
0.9% (1/60sec)Process Process-4:
Traceback (most recent call last):
File "/usr/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
self.run()
File "/usr/lib/python3.5/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/root/bookmark-archiver/config.py", line 145, in progress_bar
seconds,
UnicodeEncodeError: 'ascii' codec can't encode character '\u2588' in position 15: ordinal not in range(128)
- Fetching Favicon
0.9% (1/60sec)Process Process-5:
Traceback (most recent call last):
File "/usr/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
self.run()
File "/usr/lib/python3.5/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/root/bookmark-archiver/config.py", line 145, in progress_bar
seconds,
UnicodeEncodeError: 'ascii' codec can't encode character '\u2588' in position 15: ordinal not in range(128)
- Creating link info file
[X] Archive creation stopped.
Continue where you left off by running:
./archive.py ril_export.html pocket 1497840833
Traceback (most recent call last):
File "./archive.py", line 91, in <module>
create_archive(export_file, service=export_type, resume=resume_from)
File "./archive.py", line 69, in create_archive
raise e
File "./archive.py", line 59, in create_archive
dump_website(link, service)
File "/root/bookmark-archiver/fetch.py", line 260, in dump_website
print('[{green}+{reset}] [{timestamp} ({time})] "{title}": {blue}{base_url}{reset}'.format(**link, **ANSI))
UnicodeEncodeError: 'ascii' codec can't encode character '\u2013' in position 120: ordinal not in range(128)
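Both errors above have the same shape: print() is writing a non-ASCII character ('\u2588' from the progress bar, '\u2013' from a link title) to a stdout whose encoding is ASCII. A minimal sketch, assuming Python 3.5 as shown in the traceback paths, that reproduces the encoding failure and shows one possible workaround: wrapping the underlying binary stream in a UTF-8 text writer. The helper names are hypothetical, and this is not necessarily the fix the project adopted.

```python
import io


def can_encode(text, encoding):
    # True if `text` can be represented in `encoding`.
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def utf8_writer(binary_stream):
    # Wrap a binary stream so that text written to it is always encoded
    # as UTF-8, regardless of the process locale.
    return io.TextIOWrapper(binary_stream, encoding='utf-8', newline='\n')


def demo():
    buf = io.BytesIO()
    out = utf8_writer(buf)
    out.write('\u2588 \u2013')
    out.flush()
    data = buf.getvalue()
    out.detach()  # leave `buf` open instead of closing it with the wrapper
    return data


if __name__ == '__main__':
    print(can_encode('\u2588', 'ascii'), can_encode('\u2588', 'utf-8'))
    print(demo())
```

In a script, the same wrapper could be applied as `sys.stdout = utf8_writer(sys.stdout.buffer)` before anything is printed. Setting the environment variable PYTHONIOENCODING=utf-8 when launching Python is another option that needs no code change.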
@pirate commented on GitHub (Jul 6, 2017):
Try running the script like this:
Also post back with the output of these:
@pirate commented on GitHub (Jul 6, 2017):
You can also just try pulling and running it again, I added instructions to fix this problem. It's fairly rare for this to still be happening in 2017, most distros default to the UTF-8 locale by now. I'm surprised that you're seeing this issue on Ubuntu 16.04.
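To check whether a non-UTF-8 locale is the cause, it can help to look at the encodings Python itself reports. A minimal diagnostic sketch follows; these are not the commands from the comment above (those were lost in this mirror), and the function names are hypothetical.

```python
import locale
import sys


def encoding_report():
    # Collect the encodings Python will use for console output and files.
    return {
        'stdout_encoding': getattr(sys.stdout, 'encoding', None),
        'preferred_encoding': locale.getpreferredencoding(False),
        'filesystem_encoding': sys.getfilesystemencoding(),
    }


def looks_utf8(name):
    # Accept common spellings such as 'UTF-8', 'utf8' and 'utf_8'.
    if name is None:
        return False
    return name.lower().replace('-', '').replace('_', '') == 'utf8'


if __name__ == '__main__':
    for key, value in encoding_report().items():
        status = 'ok' if looks_utf8(value) else 'not UTF-8'
        print('{}: {} ({})'.format(key, value, status))
```

On a Linux system running under the C/POSIX locale, Python typically reports ANSI_X3.4-1968 (ASCII) here, which matches the 'ascii' codec named in the errors.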
@pirate commented on GitHub (Jul 25, 2017):
If you're still having trouble feel free to comment back and I'll re-open this. For now I'm closing this issue due to inactivity.