[GH-ISSUE #678] Error on Windows 10 when adding URL: UnicodeEncodeError: 'charmap' codec can't encode: character maps to <undefined> #429

Closed
opened 2026-03-01 14:43:29 +03:00 by kerem · 41 comments
Owner

Originally created by @Leontking on GitHub (Mar 27, 2021).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/678

[i] [2021-03-27 04:40:48] ArchiveBox v0.5.4: archivebox add https://youtube.com/
    > E:\ArchiveBox

[!] Warning: Missing 6 recommended dependencies
    ! WGET_BINARY: wget (unable to detect version)
    ! SINGLEFILE_BINARY: single-file (unable to detect version)
      Hint: npm install --prefix . "git+https://github.com/ArchiveBox/ArchiveBox.git"
            or archivebox config --set SAVE_SINGLEFILE=False to silence this warning

    ! READABILITY_BINARY: readability-extractor (unable to detect version)
      Hint: npm install --prefix . "git+https://github.com/ArchiveBox/ArchiveBox.git"
            or archivebox config --set SAVE_READABILITY=False to silence this warning

    ! MERCURY_BINARY: mercury-parser (unable to detect version)
      Hint: npm install --prefix . "git+https://github.com/ArchiveBox/ArchiveBox.git"
            or archivebox config --set SAVE_MERCURY=False to silence this warning

    ! CHROME_BINARY: unable to find binary (unable to detect version)
    ! RIPGREP_BINARY: rg (unable to detect version)

[+] [2021-03-27 04:40:52] Adding 1 links to index (crawl depth=0)...
    > Saved verbatim input to sources/E:\ArchiveBox\sources\1616820052-import.txt
    > Parsed 1 URLs from input (Plain Text)
    > Found 1 new URLs not already in index

[*] [2021-03-27 04:40:52] Writing 1 links to main index...
    √ E:\ArchiveBox\index.sqlite3

[▶] [2021-03-27 04:40:52] Starting archiving of 1 snapshots in index...
    ! Failed to archive link: UnicodeEncodeError: 'charmap' codec can't encode character '\u25be' in position 9443: character maps to <undefined>

Traceback (most recent call last):
  File "d:\python\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "d:\python\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "D:\Python\Scripts\archivebox.exe\__main__.py", line 7, in <module>
    from .cli import main
  File "d:\python\lib\site-packages\archivebox\cli\__init__.py", line 129, in main
    run_subcommand(
  File "d:\python\lib\site-packages\archivebox\cli\__init__.py", line 69, in run_subcommand
    module.main(args=subcommand_args, stdin=stdin, pwd=pwd)    # type: ignore
  File "d:\python\lib\site-packages\archivebox\cli\archivebox_add.py", line 85, in main
    add(
  File "d:\python\lib\site-packages\archivebox\util.py", line 112, in typechecked_function
    return func(*args, **kwargs)
  File "d:\python\lib\site-packages\archivebox\main.py", line 592, in add
    archive_links(new_links, overwrite=False, **archive_kwargs)
  File "d:\python\lib\site-packages\archivebox\util.py", line 112, in typechecked_function
    return func(*args, **kwargs)
  File "d:\python\lib\site-packages\archivebox\extractors\__init__.py", line 173, in archive_links
    archive_link(to_archive, overwrite=overwrite, methods=methods, out_dir=Path(link.link_dir))
  File "d:\python\lib\site-packages\archivebox\util.py", line 112, in typechecked_function
    return func(*args, **kwargs)
  File "d:\python\lib\site-packages\archivebox\extractors\__init__.py", line 95, in archive_link
    write_link_details(link, out_dir=out_dir, skip_sql_index=False)
  File "d:\python\lib\site-packages\archivebox\util.py", line 112, in typechecked_function
    return func(*args, **kwargs)
  File "d:\python\lib\site-packages\archivebox\index\__init__.py", line 333, in write_link_details
    write_html_link_details(link, out_dir=out_dir)
  File "d:\python\lib\site-packages\archivebox\util.py", line 112, in typechecked_function
    return func(*args, **kwargs)
  File "d:\python\lib\site-packages\archivebox\index\html.py", line 79, in write_html_link_details
    atomic_write(str(Path(out_dir) / HTML_INDEX_FILENAME), rendered_html)
  File "d:\python\lib\site-packages\archivebox\util.py", line 112, in typechecked_function
    return func(*args, **kwargs)
  File "d:\python\lib\site-packages\archivebox\system.py", line 47, in atomic_write
    f.write(contents)
  File "d:\python\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u25be' in position 9443: character maps to <undefined>
kerem 2026-03-01 14:43:29 +03:00
Author
Owner

@Leontking commented on GitHub (Mar 27, 2021):

what the heck


@Leontking commented on GitHub (Mar 27, 2021):

Does it want me to create the system environment variables or something?


@pirate commented on GitHub (Mar 27, 2021):

You need to change your system write encoding to use UTF-8. Are you on a really old system or something?

Can you try setting the `PYTHONLEGACYWINDOWSSTDIO=utf-8` environment variable?
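
For context on the error itself: the traceback in the report ends in `f.write(contents)` going through `encodings\cp1252.py`, so the file was opened with the Windows locale default rather than UTF-8. The sketch below is an illustration, not ArchiveBox code. It shows that cp1252 has no mapping for `\u25be` and that passing `encoding="utf-8"` to `open()` avoids depending on the system default. The file name and HTML snippet are made up for the example.

```python
import tempfile
from pathlib import Path

ARROW = "\u25be"  # the character named in the traceback

# cp1252 (the codec shown in the traceback) cannot represent this character.
try:
    ARROW.encode("cp1252")
    cp1252_ok = True
except UnicodeEncodeError:
    cp1252_ok = False

# An explicit UTF-8 encoding makes the write independent of the system default.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "index.html"  # hypothetical file name
    with open(path, "w", encoding="utf-8") as f:
        f.write(f"<span>{ARROW}</span>")
    round_trip = path.read_text(encoding="utf-8")

# ascii() keeps the output printable even on a non-UTF-8 console.
print(cp1252_ok, ascii(round_trip))
```

Separately from the variable suggested above, Python 3.7+ also has a UTF-8 mode (`PYTHONUTF8=1` or `-X utf8`) that changes the default encoding used by `open()`. Whether that would have helped here was not tested in this thread.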


@Leontking commented on GitHub (Mar 27, 2021):

> You need to change your system write encoding to use UTF-8, are you on a really old system or something?
>
> Can you try setting the `PYTHONLEGACYWINDOWSSTDIO=utf-8` environment variable.

How do you do that?

I am on the latest Windows 10.


@pirate commented on GitHub (Mar 27, 2021):

Try this:

pip install "git+https://github.com/ArchiveBox/ArchiveBox.git@debug-toolbar"

Also, there is no need to @-mention me; I already get notified about all issues. This project is free, so don't expect instant support, especially if you open multiple issues in quick succession and break the rules by not filling out [the issue template](https://github.com/ArchiveBox/ArchiveBox/issues/new?assignees=&labels=bug&template=bug_report.md&title=Bug%3A+...) and providing your version information. It makes it a lot harder for me to provide support without that info, and I'm going to be a lot grumpier about it if I have to ask for it because people ignore the issue template.

Please try to format your issues better, with relevant titles and logs wrapped in triple backticks, and include your full version information from `archivebox --version`.


@Leontking commented on GitHub (Mar 27, 2021):

> Try this:
>
> pip install "git+https://github.com/ArchiveBox/ArchiveBox.git@debug-toolbar"
>
> Also there is no need to @ mention me, I already get notified about all issues. This project is free, don't expect instant support.

Weird, how is it related to the error?


@pirate commented on GitHub (Mar 27, 2021):

It's a pre-release version with some fixes for Windows. See: 0f33ceb01ad941cf37c2583cdabe9e1d26cac4eb


@Leontking commented on GitHub (Mar 27, 2021):

Here

E:\ArchiveBox>archivebox add 'https://youtube.com'
SYSTEM_WGETRC = c:/progra~1/wget/etc/wgetrc
syswgetrc = C:\Program Files (x86)\GnuWin32/etc/wgetrc
Traceback (most recent call last):
  File "d:\python\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "d:\python\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "D:\Python\Scripts\archivebox.exe\__main__.py", line 4, in <module>
  File "C:\Users\abhia\AppData\Roaming\Python\Python38\site-packages\archivebox\cli\__init__.py", line 76, in <module>
    SUBCOMMANDS = list_subcommands()
  File "C:\Users\abhia\AppData\Roaming\Python\Python38\site-packages\archivebox\cli\__init__.py", line 44, in list_subcommands
    module = import_module('.archivebox_{}'.format(subcommand), __package__)
  File "d:\python\lib\importlib\__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "C:\Users\abhia\AppData\Roaming\Python\Python38\site-packages\archivebox\cli\archivebox_add.py", line 11, in <module>
    from ..main import add
  File "C:\Users\abhia\AppData\Roaming\Python\Python38\site-packages\archivebox\main.py", line 22, in <module>
    from .parsers import (
  File "C:\Users\abhia\AppData\Roaming\Python\Python38\site-packages\archivebox\parsers\__init__.py", line 17, in <module>
    from ..system import atomic_write
  File "C:\Users\abhia\AppData\Roaming\Python\Python38\site-packages\archivebox\system.py", line 40
    encoding = None if if isinstance(contents, bytes) else 'utf-8'
                       ^
SyntaxError: invalid syntax
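
The `SyntaxError` above points at a doubled `if` inside a conditional expression. A minimal sketch of what that line appears to intend, assuming it should pick no text encoding for `bytes` and UTF-8 for `str`. The helpers `pick_encoding` and `write_contents` are hypothetical names for this example, not ArchiveBox functions.

```python
import tempfile
from pathlib import Path
from typing import Optional, Union


def pick_encoding(contents: Union[str, bytes]) -> Optional[str]:
    # Same expression as in the traceback, with the duplicated `if` removed.
    return None if isinstance(contents, bytes) else "utf-8"


def write_contents(path: Path, contents: Union[str, bytes]) -> None:
    # Bytes go through binary mode, where only encoding=None is accepted.
    mode = "wb" if isinstance(contents, bytes) else "w"
    with open(path, mode, encoding=pick_encoding(contents)) as f:
        f.write(contents)


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "out.html"  # hypothetical file name
    write_contents(target, "\u25be")
    written = target.read_bytes()
```

With this shape, text is always encoded as UTF-8 regardless of the Windows code page, while raw bytes are written untouched.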

@pirate commented on GitHub (Mar 27, 2021):

Run it again. Also use ``` to format your logs otherwise it's really hard to read here.


@Leontking commented on GitHub (Mar 27, 2021):

Ok


@Leontking commented on GitHub (Mar 27, 2021):

E:\ArchiveBox>archivebox add 'https://youtube.com'
...
  File "C:\Users\abhia\AppData\Roaming\Python\Python38\site-packages\archivebox\system.py", line 40
    encoding = None if if isinstance(contents, bytes) else 'utf-8'
                       ^
SyntaxError: invalid syntax

@pirate commented on GitHub (Mar 27, 2021):

Give it a min before reinstalling, it's being cached.


@Leontking commented on GitHub (Mar 27, 2021):

Btw, how do I set this environment variable, `PYTHONLEGACYWINDOWSSTDIO=utf-8`? I only know how to set environment variables for binaries.


@Leontking commented on GitHub (Mar 27, 2021):

Ok, I'll wait.


@Leontking commented on GitHub (Mar 27, 2021):

I think ArchiveBox crashed.


@Leontking commented on GitHub (Mar 27, 2021):

Because here's what I get from `archivebox version`:

E:\ArchiveBox>archivebox version
SYSTEM_WGETRC = c:/progra~1/wget/etc/wgetrc
syswgetrc = C:\Program Files (x86)\GnuWin32/etc/wgetrc
Traceback (most recent call last):
  File "d:\python\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
...
  File "C:\Users\abhia\AppData\Roaming\Python\Python38\site-packages\archivebox\system.py", line 40
    encoding = None if if isinstance(contents, bytes) else 'utf-8'
                       ^
SyntaxError: invalid syntax

@Leontking commented on GitHub (Mar 27, 2021):

I have 32-bit Python on 64-bit Windows.


@pirate commented on GitHub (Mar 27, 2021):

It's the same error as before, nothing changed which means you didn't reinstall it correctly. Did you run the install command I gave?

pip uninstall archivebox
pip install "git+https://github.com/ArchiveBox/ArchiveBox.git@debug-toolbar"
archivebox add ...

Also first result on Google for set Windows environment variable: https://superuser.com/questions/212150/how-to-set-env-variable-in-windows-cmd-line/212153
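
Besides the `cmd` approach in the linked answer, a variable can also be set for a single child process from Python. A minimal sketch, assuming only that the variable name and value are the ones suggested earlier in this thread. The child here just echoes the variable back to show it was received; it does not demonstrate any effect on encoding.

```python
import os
import subprocess
import sys

# Copy the current environment and add the variable for the child only.
child_env = dict(os.environ, PYTHONLEGACYWINDOWSSTDIO="utf-8")

result = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['PYTHONLEGACYWINDOWSSTDIO'])"],
    env=child_env,
    capture_output=True,
    text=True,
    check=True,
)
seen_by_child = result.stdout.strip()
print(seen_by_child)
```

The parent process and the rest of the system are unaffected, which makes this a low-risk way to try a setting before making it permanent.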


@Leontking commented on GitHub (Mar 27, 2021):

It's a problem with my Python.


@Leontking commented on GitHub (Mar 27, 2021):

> It's the same error as before, nothing changed which means you didn't reinstall it correctly. Did you run the install command I gave?
>
> pip uninstall archivebox
> pip install "git+https://github.com/ArchiveBox/ArchiveBox.git@debug-toolbar"
> archivebox add ...
>
> Also first result on Google for `set Windows environment variable`: https://superuser.com/questions/212150/how-to-set-env-variable-in-windows-cmd-line/212153

I did.


@pirate commented on GitHub (Mar 27, 2021):

Really? Please paste the full verbatim output of running it:

pip uninstall archivebox
pip install "git+https://github.com/ArchiveBox/ArchiveBox.git@debug-toolbar"
archivebox --version
archivebox add https://youtube.com

@Leontking commented on GitHub (Mar 27, 2021):

Nope, I'm going to reinstall Python.


@Leontking commented on GitHub (Mar 27, 2021):

I had both the 64-bit and 32-bit versions installed.


@Leontking commented on GitHub (Mar 27, 2021):

Ok, same error still:

C:\Windows\System32>archivebox add https://youtube.com
SYSTEM_WGETRC = c:/progra~1/wget/etc/wgetrc
syswgetrc = C:\Program Files (x86)\GnuWin32/etc/wgetrc
[i] [2021-03-27 06:00:43] ArchiveBox v0.5.6: archivebox add https://youtube.com
    > C:\Windows\System32

Traceback (most recent call last):
  File "d:\python\lib\logging\config.py", line 564, in configure
    handler = self.configure_handler(handlers[name])
  File "d:\python\lib\logging\config.py", line 745, in configure_handler
    result = factory(**kwargs)
  File "d:\python\lib\logging\handlers.py", line 153, in __init__
    BaseRotatingHandler.__init__(self, filename, mode, encoding=encoding,
  File "d:\python\lib\logging\handlers.py", line 58, in __init__
    logging.FileHandler.__init__(self, filename, mode=mode,
  File "d:\python\lib\logging\__init__.py", line 1142, in __init__
    StreamHandler.__init__(self, self._open())
  File "d:\python\lib\logging\__init__.py", line 1171, in _open
    return open(self.baseFilename, self.mode, encoding=self.encoding,
PermissionError: [Errno 13] Permission denied: 'C:\\Windows\\System32\\logs\\errors.log'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "D:\Python\Scripts\archivebox-script.py", line 33, in <module>
    sys.exit(load_entry_point('archivebox==0.5.6', 'console_scripts', 'archivebox')())
  File "d:\python\lib\site-packages\archivebox\cli\__init__.py", line 133, in main
    run_subcommand(
  File "d:\python\lib\site-packages\archivebox\cli\__init__.py", line 70, in run_subcommand
    setup_django(in_memory_db=subcommand in fake_db, check_db=cmd_requires_db and not init_pending)
  File "d:\python\lib\site-packages\archivebox\config.py", line 1096, in setup_django
    django.setup()
  File "d:\python\lib\site-packages\django\__init__.py", line 19, in setup
    configure_logging(settings.LOGGING_CONFIG, settings.LOGGING)
  File "d:\python\lib\site-packages\django\utils\log.py", line 75, in configure_logging
    logging_config_func(logging_settings)
  File "d:\python\lib\logging\config.py", line 809, in dictConfig
    dictConfigClass(config).configure()
  File "d:\python\lib\logging\config.py", line 571, in configure
    raise ValueError('Unable to configure handler '
ValueError: Unable to configure handler 'logfile'

@Leontking commented on GitHub (Mar 27, 2021):

Oh wait, I had to change the directory.
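
The `PermissionError` in the previous comment names `C:\Windows\System32\logs\errors.log`, so the command was run from a directory where the log file could not be created. A minimal sketch of a pre-flight check one could run before invoking the command. `is_writable_dir` is a hypothetical helper, not part of ArchiveBox.

```python
import tempfile
from pathlib import Path


def is_writable_dir(path: Path) -> bool:
    """Return True if `path` is an existing directory where a file can be created."""
    if not path.is_dir():
        return False
    try:
        # Creating (and immediately discarding) a temp file is a direct test
        # of whether this process may write there.
        with tempfile.TemporaryFile(dir=path):
            pass
        return True
    except OSError:
        return False


if __name__ == "__main__":
    cwd = Path.cwd()
    print(f"{cwd} writable: {is_writable_dir(cwd)}")
```

Actually attempting the write is used here instead of inspecting permission bits, since Windows ACLs are not always reflected by simpler checks.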


@Leontking commented on GitHub (Mar 27, 2021):

E:\ArchiveBox> archivebox add https://youtube.com
SYSTEM_WGETRC = c:/progra~1/wget/etc/wgetrc
syswgetrc = C:\Program Files (x86)\GnuWin32/etc/wgetrc
[i] [2021-03-27 06:06:32] ArchiveBox v0.5.6: archivebox add https://youtube.com
    > E:\ArchiveBox

SYSTEM_WGETRC = c:/progra~1/wget/etc/wgetrc
syswgetrc = C:\Program Files (x86)\GnuWin32/etc/wgetrc
[!] Warning: Missing 4 recommended dependencies
    ! SINGLEFILE_BINARY: single-file (unable to detect version)
      Hint: npm install --prefix . "git+https://github.com/ArchiveBox/ArchiveBox.git"
            or archivebox config --set SAVE_SINGLEFILE=False to silence this warning

    ! READABILITY_BINARY: readability-extractor (unable to detect version)
      Hint: npm install --prefix . "git+https://github.com/ArchiveBox/ArchiveBox.git"
            or archivebox config --set SAVE_READABILITY=False to silence this warning

    ! MERCURY_BINARY: mercury-parser (unable to detect version)
      Hint: npm install --prefix . "git+https://github.com/ArchiveBox/ArchiveBox.git"
            or archivebox config --set SAVE_MERCURY=False to silence this warning

    ! CHROME_BINARY: unable to find binary (unable to detect version)

[+] [2021-03-27 06:06:34] Adding 1 links to index (crawl depth=0)...
    > Saved verbatim input to sources/E:\ArchiveBox\sources\1616825194-import.txt
    > Parsed 1 URLs from input (Plain Text)
    > Found 0 new URLs not already in index

[*] [2021-03-27 06:06:35] Writing 0 links to main index...
    √ ./index.sqlite3

@pirate commented on GitHub (Mar 27, 2021):

It worked, there's no error in that output ^.


@Leontking commented on GitHub (Mar 27, 2021):

I can't see anything at all.


@Leontking commented on GitHub (Mar 27, 2021):

Only an index.js file.


@pirate commented on GitHub (Mar 27, 2021):

Not enough info. I need screenshots, log output, etc. What do you mean you can't see anything? Where are you looking? What command are you running to access the server? What URLs are you visiting?


@Leontking commented on GitHub (Mar 27, 2021):


[▶] [2021-03-27 06:48:53] Starting archiving of 1 snapshots in index...
[√] [2021-03-27 06:48:53] "youtube.com"
    https://youtube.com
    √ E:\ArchiveBox\archive\1616827623.800644
      > title
SYSTEM_WGETRC = c:/progra~1/wget/etc/wgetrc
syswgetrc = C:\Program Files (x86)\GnuWin32/etc/wgetrc
      > favicon
        Extractor failed:
            Exception Failed to chmod: favicon.ico does not exist (did the previous step fail?)
        Run to see full output:
            cd E:\ArchiveBox\archive\1616827623.800644;
            curl --silent --location --compressed --max-time 60 --output favicon.ico --user-agent "ArchiveBox/0.5.6 (+https://github.com/ArchiveBox/ArchiveBox/) curl/curl 7.55.1 (Windows)" https://www.google.com/s2/favicons?domain=youtube.com

      > singlefile
        Extractor failed:
            FileNotFoundError [WinError 2] The system cannot find the file specified
        Run to see full output:
            cd E:\ArchiveBox\archive\1616827623.800644;
            single-file --browser-executable-path=None "--browser-args=[\"--headless\", \"--user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.75 Safari/537.36\", \"--window-size=1440,2000\"]" https://youtube.com singlefile.html

      > pdf
    ! Failed to archive link: Exception: Exception in archive_methods.save_pdf(Link(url=https://youtube.com))

Internal Server Error: /admin/core/snapshot/grid/
Traceback (most recent call last):
  File "D:\Python\Lib\site-packages\archivebox\extractors\__init__.py", line 108, in archive_link
    result = method_function(link=link, out_dir=out_dir)
  File "d:\python\lib\site-packages\archivebox\util.py", line 112, in typechecked_function
    return func(*args, **kwargs)
  File "d:\python\lib\site-packages\archivebox\extractors\pdf.py", line 61, in save_pdf
    return ArchiveResult(
  File "<string>", line 12, in __init__
  File "d:\python\lib\site-packages\archivebox\index\schema.py", line 47, in __post_init__
    self.typecheck()
  File "d:\python\lib\site-packages\archivebox\index\schema.py", line 58, in typecheck
    assert all(isinstance(arg, str) and arg for arg in self.cmd)
AssertionError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "d:\python\lib\site-packages\django\core\handlers\exception.py", line 47, in inner
    response = get_response(request)
  File "d:\python\lib\site-packages\django\core\handlers\base.py", line 179, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "d:\python\lib\site-packages\django\utils\decorators.py", line 130, in _wrapped_view
    response = view_func(request, *args, **kwargs)
  File "d:\python\lib\site-packages\django\views\decorators\cache.py", line 44, in _wrapped_view_func
    response = view_func(request, *args, **kwargs)
  File "d:\python\lib\site-packages\django\contrib\admin\sites.py", line 233, in inner
    return view(request, *args, **kwargs)
  File "D:\Python\Lib\site-packages\archivebox\core\admin.py", line 173, in grid_view
    rendered_response = self.changelist_view(request)
  File "d:\python\lib\site-packages\django\utils\decorators.py", line 43, in _wrapper
    return bound_method(*args, **kwargs)
  File "d:\python\lib\site-packages\django\utils\decorators.py", line 130, in _wrapped_view
    response = view_func(request, *args, **kwargs)
  File "d:\python\lib\site-packages\django\contrib\admin\options.py", line 1735, in changelist_view
    response = self.response_action(request, queryset=cl.get_queryset(request))
  File "d:\python\lib\site-packages\django\contrib\admin\options.py", line 1402, in response_action
    response = func(self, request, queryset)
  File "D:\Python\Lib\site-packages\archivebox\core\admin.py", line 184, in update_snapshots
    archive_links([
  File "d:\python\lib\site-packages\archivebox\util.py", line 112, in typechecked_function
    return func(*args, **kwargs)
  File "D:\Python\Lib\site-packages\archivebox\extractors\__init__.py", line 180, in archive_links
    archive_link(to_archive, overwrite=overwrite, methods=methods, out_dir=Path(link.link_dir))
  File "d:\python\lib\site-packages\archivebox\util.py", line 112, in typechecked_function
    return func(*args, **kwargs)
  File "D:\Python\Lib\site-packages\archivebox\extractors\__init__.py", line 129, in archive_link
    raise Exception('Exception in archive_methods.save_{}(Link(url={}))'.format(
Exception: Exception in archive_methods.save_pdf(Link(url=https://youtube.com))
"POST /admin/core/snapshot/grid/ HTTP/1.1" 500 145
Exception ignored in: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='cp1252'>
OSError: [Errno 22] Invalid argument
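
Note for readers of this log: the `AssertionError` comes from the `typecheck` line quoted in the traceback, which requires every element of `cmd` to be a non-empty string. The log also shows `CHROME_BINARY: unable to find binary` and `--browser-executable-path=None`, so one plausible cause is that a missing Chrome path ended up in the command list as `None`. The sketch below reproduces only that check; the command lists and the `cmd_is_valid` helper are hypothetical and are not ArchiveBox code.

```python
from typing import List, Optional


def cmd_is_valid(cmd: List[Optional[str]]) -> bool:
    """Mirror the check quoted in the traceback: every argument must be a non-empty str."""
    return all(isinstance(arg, str) and arg for arg in cmd)


# Hypothetical command lists, for illustration only.
chrome_binary = None  # assumption: no Chrome/Chromium binary was detected
broken_cmd = [chrome_binary, "--headless", "--print-to-pdf", "https://youtube.com"]
working_cmd = ["chrome", "--headless", "--print-to-pdf", "https://youtube.com"]

print(cmd_is_valid(broken_cmd))   # False: the first element is None
print(cmd_is_valid(working_cmd))  # True: all elements are non-empty strings
```

If this is indeed the cause, either installing Chrome/Chromium or disabling the Chrome-based extractors should avoid the assertion.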

@Leontking commented on GitHub (Mar 27, 2021):

![Screenshot 2021-03-27 122007](https://user-images.githubusercontent.com/66869853/112712636-d9140900-8ef6-11eb-8c50-09df8492341e.jpg)


@pirate commented on GitHub (Mar 27, 2021):

Can you run `set PYTHONLEGACYWINDOWSFSENCODING=1`? The output is still showing that you're not using `UTF-8` encoding.

Honestly I recommend just using Docker. You have so many dependencies missing aside from this python encoding issue that it'll just be faster overall to do it through Docker.
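
For anyone debugging the same symptom, here is a minimal sketch for checking which encoding Python is using for stdout and why `cp1252` fails on the character from the original error (`'\u25be'`). The `can_encode` helper is hypothetical and written only for this illustration.

```python
import sys

TRIANGLE = "\u25be"  # the character named in the original UnicodeEncodeError


def can_encode(text: str, encoding: str) -> bool:
    """Return True if `text` can be represented in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# The traceback above reported encoding='cp1252' for stdout on this machine.
print("stdout encoding:", sys.stdout.encoding)
print("cp1252 can encode it:", can_encode(TRIANGLE, "cp1252"))  # False
print("utf-8 can encode it:", can_encode(TRIANGLE, "utf-8"))    # True
```

If the first line prints `cp1252`, CPython also documents the `PYTHONIOENCODING` and `PYTHONUTF8` environment variables for changing the stdout encoding. Whether they resolve this particular issue was not tested in this thread.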


@Leontking commented on GitHub (Mar 27, 2021):

I can't install Docker.


@Leontking commented on GitHub (Mar 27, 2021):

It's showing a bunch of errors.


@Leontking commented on GitHub (Mar 27, 2021):

WSL stuff. HOW DO WE DO THAT?


@Leontking commented on GitHub (Mar 27, 2021):

I even opened an issue in the official Docker GitHub months ago about that.


@Leontking commented on GitHub (Mar 27, 2021):

Wait.


@Leontking commented on GitHub (Mar 27, 2021):

Can we use docker-toolbox?


@Leontking commented on GitHub (Mar 27, 2021):

Hope you reply soon @pirate


@pirate commented on GitHub (Apr 6, 2021):

I think I fixed the underlying issues in the latest release. Try running this:

```
pip uninstall archivebox
pip install "git+https://github.com/ArchiveBox/ArchiveBox.git@dev"
archivebox init --setup
archivebox add https://youtube.com
```

If you still get any `AssertionError` or other encoding errors, comment back here and I'll reopen the issue.