[GH-ISSUE #706] Bug: Exception in archive_methods.save_readability due to bytes string being passed to hint #1956

Closed
opened 2026-03-01 17:55:19 +03:00 by kerem · 4 comments

Originally created by @Valporaena on GitHub (Apr 15, 2021).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/706

I'm encountering the same problem that user @jrruethe described some time ago (https://github.com/ArchiveBox/ArchiveBox/issues/458#issue-686704499). It seems to have been solved, but for some reason it reoccurred on my setup after I installed the latest update and ran the archivebox setup command.

  1. Ran archivebox update (several times; the error reproduces each time)
  2. On a specific link, it crashes with the following output:
[√] [2021-04-15 10:56:49] "The Long War on Objectivity       | The New Republic"
    https://newrepublic.com/article/158497/long-war-objectivity
    √ ./archive/1617309812.979884
      > readability
    ! Failed to archive link: Exception: Exception in archive_methods.save_readability(Link(url=https://newrepublic.com/article/158497/long-war-objectivity))

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/archivebox/extractors/__init__.py", line 114, in archive_link
    log_archive_method_finished(result)
  File "/usr/lib/python3/dist-packages/archivebox/logging_util.py", line 435, in log_archive_method_finished
    hints = hints if isinstance(hints, (list, tuple)) else hints.split('\n')
TypeError: a bytes-like object is required, not 'str'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/bin/archivebox", line 11, in <module>
    load_entry_point('archivebox==0.6.2', 'console_scripts', 'archivebox')()
  File "/usr/lib/python3/dist-packages/archivebox/cli/__init__.py", line 140, in main
    run_subcommand(
  File "/usr/lib/python3/dist-packages/archivebox/cli/__init__.py", line 80, in run_subcommand
    module.main(args=subcommand_args, stdin=stdin, pwd=pwd)    # type: ignore
  File "/usr/lib/python3/dist-packages/archivebox/cli/archivebox_update.py", line 119, in main
    update(
  File "/usr/lib/python3/dist-packages/archivebox/util.py", line 114, in typechecked_function
    return func(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/archivebox/main.py", line 783, in update
    archive_links(to_archive, overwrite=overwrite, **archive_kwargs)
  File "/usr/lib/python3/dist-packages/archivebox/util.py", line 114, in typechecked_function
    return func(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/archivebox/extractors/__init__.py", line 181, in archive_links
    archive_link(to_archive, overwrite=overwrite, methods=methods, out_dir=Path(link.link_dir))
  File "/usr/lib/python3/dist-packages/archivebox/util.py", line 114, in typechecked_function
    return func(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/archivebox/extractors/__init__.py", line 130, in archive_link
    raise Exception('Exception in archive_methods.save_{}(Link(url={}))'.format(
Exception: Exception in archive_methods.save_readability(Link(url=https://newrepublic.com/article/158497/long-war-objectivity))
ArchiveBox v0.6.2
Cpython Linux Linux-5.4.0-71-generic-x86_64-with-glibc2.29 x86_64
IN_DOCKER=False DEBUG=False IS_TTY=True TZ=UTC SEARCH_BACKEND_ENGINE=ripgrep

[i] Dependency versions:
 √  ARCHIVEBOX_BINARY     v0.6.2          valid     /usr/bin/archivebox                                                         
 √  PYTHON_BINARY         v3.8.5          valid     /usr/bin/python3.8                                                          
 √  DJANGO_BINARY         v2.2.12         valid     /usr/lib/python3/dist-packages/django/bin/django-admin.py                   
 √  CURL_BINARY           v7.68.0         valid     /usr/bin/curl                                                               
 √  WGET_BINARY           v1.20.3         valid     /usr/bin/wget                                                               
 √  NODE_BINARY           v10.19.0        valid     /usr/bin/node                                                               
 √  SINGLEFILE_BINARY     v0.3.16         valid     ./node_modules/single-file/cli/single-file                                  
 √  READABILITY_BINARY    v0.0.2          valid     ./node_modules/readability-extractor/readability-extractor                  
 √  MERCURY_BINARY        v1.0.0          valid     ./node_modules/@postlight/mercury-parser/cli.js                             
 √  GIT_BINARY            v2.25.1         valid     /usr/bin/git                                                                
 -  YOUTUBEDL_BINARY      -               disabled  /usr/bin/youtube-dl                                                         
 √  CHROME_BINARY         v89.0.4389.114  valid     /usr/bin/chromium-browser                                                   
 √  RIPGREP_BINARY        v11.0.2         valid     /usr/bin/rg                                                                 

[i] Source-code locations:
 √  PACKAGE_DIR           23 files        valid     /usr/lib/python3/dist-packages/archivebox                                   
 √  TEMPLATES_DIR         3 files         valid     /usr/lib/python3/dist-packages/archivebox/templates                         
 -  CUSTOM_TEMPLATES_DIR  -               disabled                                                                              

[i] Secrets locations:
 -  CHROME_USER_DATA_DIR  -               disabled                                                                              
 -  COOKIES_FILE          -               disabled                                                                              

[i] Data locations:
 √  OUTPUT_DIR            14 files        valid     /home/.../archivebox                                                     
 √  SOURCES_DIR           27 files        valid     ./sources                                                                   
 √  LOGS_DIR              1 files         valid     ./logs                                                                      
 √  ARCHIVE_DIR           9024 files      valid     ./archive                                                                   
 √  CONFIG_FILE           291.0 Bytes     valid     ./ArchiveBox.conf                                                           
 √  SQL_INDEX             105.7 MB        valid     ./index.sqlite3             
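For anyone hitting the same message: the TypeError means hints was a bytes object, and bytes.split() only accepts a bytes separator, so splitting on the str '\n' fails. The short Python sketch below reproduces that outside ArchiveBox. The sample value of hints is made up, and the idea that it comes from captured subprocess output of the readability extractor is an assumption, not something the traceback confirms.

```python
# Minimal reproduction of the crash in log_archive_method_finished().
# Assumption: the readability extractor handed back raw bytes as `hints`
# (e.g. captured subprocess output) instead of a str or a list of str.
hints = b"first hint line\nsecond hint line"

error_message = None
try:
    # Same expression as logging_util.py line 435 in the traceback above
    hints = hints if isinstance(hints, (list, tuple)) else hints.split('\n')
except TypeError as err:
    error_message = str(err)

print(error_message)  # a bytes-like object is required, not 'str'

# bytes.split() needs a bytes separator; decoding first yields str lines
print(b"a\nb".split(b"\n"))          # [b'a', b'b']
print(b"a\nb".decode().split("\n"))  # ['a', 'b']
```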
kerem 2026-03-01 17:55:19 +03:00

@pirate commented on GitHub (Apr 15, 2021):

It's a different, unrelated error, but thanks for reporting it. I'll fix it.


@Valporaena commented on GitHub (Apr 15, 2021):

Oh, my bad. It looked very similar, but I'm completely untrained in these things; I shouldn't have presumed it was related.


@pirate commented on GitHub (May 10, 2022):

I think I fixed it in d581a50. If you still see this issue in the next release, comment back so I can reopen it.
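The contents of d581a50 are not shown in this thread, so the following is not that commit. As an illustration only, one defensive approach is to normalize hints into a list of str before logging, decoding any bytes first. The helper name normalize_hints and its accepted input types are hypothetical, not part of ArchiveBox's API.

```python
from typing import List, Optional, Sequence, Union

# Hypothetical input shapes: str, raw bytes, a list/tuple of either, or None
HintsType = Optional[Union[str, bytes, Sequence[Union[str, bytes]]]]


def normalize_hints(hints: HintsType) -> List[str]:
    """Hypothetical helper: coerce extractor hints into a list of str lines."""
    if hints is None:
        return []
    if isinstance(hints, bytes):
        # Decode raw output; replace undecodable bytes instead of crashing
        hints = hints.decode("utf-8", errors="replace")
    if isinstance(hints, str):
        return hints.split("\n")
    # Already a list/tuple: make sure every element is a str as well
    return [
        h.decode("utf-8", errors="replace") if isinstance(h, bytes) else str(h)
        for h in hints
    ]


print(normalize_hints(b"line one\nline two"))  # ['line one', 'line two']
print(normalize_hints(("a", b"b")))            # ['a', 'b']
```

Decoding with errors="replace" is a design choice for a logging path: a hint line with invalid UTF-8 degrades to replacement characters rather than aborting the whole archive run.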


@mike-greenmmd commented on GitHub (Feb 15, 2023):

Is there a plan to merge this into main/master?
I'm getting this exact issue when running ArchiveBox with the Docker Compose method:
https://raw.githubusercontent.com/ArchiveBox/ArchiveBox/master/docker-compose.yml
which pulls the master branch.

Reference
starred/ArchiveBox#1956