[GH-ISSUE #458] Exception in archive_methods.save_readability #3324

Closed
opened 2026-03-14 22:08:24 +03:00 by kerem · 4 comments

Originally created by @jrruethe on GitHub (Aug 26, 2020).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/458

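When attempting to archive this link (https://forums.nasioc.com/forums/showthread.php?t=2803387&highlight=big+brother&page=3) with Readability enabled, the following exception occurs.

From what I can tell so far, the exception occurs somewhere around https://github.com/pirate/ArchiveBox/blob/master/archivebox/extractors/readability.py#L69, before cmd is assigned at https://github.com/pirate/ArchiveBox/blob/master/archivebox/extractors/readability.py#L75. As a result, cmd does not exist once the exception is caught and control reaches https://github.com/pirate/ArchiveBox/blob/master/archivebox/extractors/readability.py#L109, where it is referenced.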
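
The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not the actual ArchiveBox code: the function names, the command list, and the returned dict are all hypothetical stand-ins. It shows how a name first assigned inside a try block is left unbound when an earlier statement in that block raises, and one common way to avoid that (binding the name before the try). This is not necessarily the change made in the eventual fix.

```python
def run_extractor(fail_early: bool) -> dict:
    """Hypothetical stand-in for an extractor such as save_readability."""
    try:
        if fail_early:
            # Simulates a failure that happens before cmd is assigned.
            raise RuntimeError("failed before cmd was assigned")
        cmd = ["some-extractor", "page.html"]  # hypothetical command
        status = "succeeded"
    except Exception:
        status = "failed"
    # If the try block raised before `cmd = ...` ran, reading cmd here
    # raises UnboundLocalError.
    return {"cmd": cmd, "status": status}


def run_extractor_fixed(fail_early: bool) -> dict:
    """Same hypothetical flow, with cmd bound before the try block."""
    cmd: list = []  # always bound, even if the try block fails early
    status = "succeeded"
    try:
        if fail_early:
            raise RuntimeError("failed before cmd was assigned")
        cmd = ["some-extractor", "page.html"]  # hypothetical command
    except Exception:
        status = "failed"
    return {"cmd": cmd, "status": status}


if __name__ == "__main__":
    try:
        run_extractor(fail_early=True)
    except UnboundLocalError as err:
        print("reproduced:", type(err).__name__)
    print(run_extractor_fixed(fail_early=True))
```

With the name bound up front, the failure path can still report a result instead of masking the original error with an UnboundLocalError.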

[√] [2020-08-26 21:10:43] "Big Brother is Watching - '17 WRX Limited and all '17 STI - Page 3 - NASIOC"
    https://forums.nasioc.com/forums/showthread.php?t=2803387&highlight=big+brother&page=3
    √ ./archive/1576899114
      > readability
    ! Failed to archive link: Exception: Exception in archive_methods.save_readability(Link(url=https://forums.nasioc.com/forums/showthread.php?t=2803387&highlight=big+brother&page=3))

Traceback (most recent call last):
  File "/app/archivebox/extractors/__init__.py", line 91, in archive_link
    result = method_function(link=link, out_dir=out_dir)
  File "/app/archivebox/util.py", line 111, in typechecked_function
    return func(*args, **kwargs)
  File "/app/archivebox/extractors/readability.py", line 109, in save_readability
    cmd=cmd,
UnboundLocalError: local variable 'cmd' referenced before assignment

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/bin/archivebox", line 33, in <module>
    sys.exit(load_entry_point('archivebox', 'console_scripts', 'archivebox')())
  File "/app/archivebox/cli/__init__.py", line 122, in main
    run_subcommand(
  File "/app/archivebox/cli/__init__.py", line 62, in run_subcommand
    module.main(args=subcommand_args, stdin=stdin, pwd=pwd)    # type: ignore
  File "/app/archivebox/cli/archivebox_add.py", line 78, in main
    add(
  File "/app/archivebox/util.py", line 111, in typechecked_function
    return func(*args, **kwargs)
  File "/app/archivebox/main.py", line 572, in add
    archive_links(all_links, overwrite=overwrite, out_dir=out_dir)
  File "/app/archivebox/util.py", line 111, in typechecked_function
    return func(*args, **kwargs)
  File "/app/archivebox/extractors/__init__.py", line 150, in archive_links
    archive_link(link, overwrite=overwrite, methods=methods, out_dir=link.link_dir)
  File "/app/archivebox/util.py", line 111, in typechecked_function
    return func(*args, **kwargs)
  File "/app/archivebox/extractors/__init__.py", line 101, in archive_link
    raise Exception('Exception in archive_methods.save_{}(Link(url={}))'.format(
Exception: Exception in archive_methods.save_readability(Link(url=https://forums.nasioc.com/forums/showthread.php?t=2803387&highlight=big+brother&page=3))

This is with Docker image nikisweeting/archivebox:0.4.21


@cdvv7788 commented on GitHub (Aug 26, 2020):

Thanks for the report. I will check it.


@cdvv7788 commented on GitHub (Aug 27, 2020):

@jrruethe I sent a PR. A new version with the fix should come out next week. You can try the fix by using that branch directly for now.


@jrruethe commented on GitHub (Aug 27, 2020):

Thank you very much!


@pirate commented on GitHub (Sep 1, 2020):

This should be fixed now on master. If you still encounter any issues, comment back here and I'll reopen the ticket.

To use master immediately without waiting for the next release, you can build it directly from GitHub:

docker build 'https://github.com/pirate/ArchiveBox.git#master' -t archivebox
docker run -v $PWD:/data archivebox update