[GH-ISSUE #626] Bugfix: --overwrite flag ignores disabled outputs #3408

Closed
opened 2026-03-14 22:44:16 +03:00 by kerem · 3 comments
Owner

Originally created by @thedanbob on GitHub (Jan 20, 2021).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/626

Describe the bug

archivebox add --overwrite <url> saves to every output method, including ones the user has disabled.

Steps to reproduce

  1. Set any SAVE_<output> option to False
  2. archivebox add http://example.com skips that output
  3. archivebox add --overwrite http://example.com doesn't skip that output

I believe the problem is here: https://github.com/ArchiveBox/ArchiveBox/blob/befac97f524e461f43f372cbb745c07f6f2c1f0f/archivebox/extractors/__init__.py#L105

It should be something like

if should_run(link, out_dir) or overwrite and SAVE_<output>:
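
For readers checking the proposed condition: in Python, `and` binds tighter than `or`, so it reads as "run if `should_run` says so, or if overwriting an output that is enabled". The sketch below only illustrates that precedence. `should_extract` and its three boolean parameters are hypothetical stand-ins for the real `should_run(...)` result, the `--overwrite` flag, and a `SAVE_<output>` setting.

```python
from itertools import product


def should_extract(should_run_result: bool, overwrite: bool, save_enabled: bool) -> bool:
    """Hypothetical stand-in for the condition proposed above."""
    # Parsed as: should_run_result or (overwrite and save_enabled)
    return should_run_result or overwrite and save_enabled


# Print the full truth table. In practice should_run_result is not expected
# to be True while the output is disabled; those rows are listed only for
# completeness.
for should_run_result, overwrite, save_enabled in product([False, True], repeat=3):
    decision = should_extract(should_run_result, overwrite, save_enabled)
    print(f"should_run={should_run_result!s:5} overwrite={overwrite!s:5} "
          f"enabled={save_enabled!s:5} -> run={decision}")
```

The row that matters for this bug is `should_run=False, overwrite=True, enabled=False`, which stays `False`, so a disabled output is not forced by `--overwrite`.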

@pirate commented on GitHub (Jan 20, 2021):

Unfortunately the solution isn't trivial because the should_run functions only return True/False and don't distinguish between "skipping extractor because it's disabled" vs "skipping extractor because existing output is present".

We'll probably have to re-architect the should_run functions into two separate functions is_enabled() and output_exists().

We also have to make sure that env SAVE_MEDIA=True archivebox add --overwrite ... and archivebox add --overwrite --extract=media both work as expected.
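
A minimal sketch of what that split might look like, assuming a single file-based output. Only the names `is_enabled()` and `output_exists()` come from the comment above. The signatures, the `save_flag` and `filename` parameters, and the way `should_run` combines the two checks are assumptions for illustration, not ArchiveBox's actual code.

```python
import tempfile
from pathlib import Path


def is_enabled(save_flag: bool) -> bool:
    """Answers only: is this extractor turned on in the config?"""
    return save_flag


def output_exists(out_dir: Path, filename: str) -> bool:
    """Answers only: is there already output from a previous run?"""
    return (out_dir / filename).exists()


def should_run(out_dir: Path, filename: str, save_flag: bool, overwrite: bool = False) -> bool:
    # A disabled extractor never runs, with or without --overwrite.
    if not is_enabled(save_flag):
        return False
    # An enabled extractor runs when overwriting or when nothing exists yet.
    return overwrite or not output_exists(out_dir, filename)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out_dir = Path(tmp)
        (out_dir / "favicon.ico").touch()  # simulate existing output
        print(should_run(out_dir, "favicon.ico", save_flag=True))
        print(should_run(out_dir, "favicon.ico", save_flag=True, overwrite=True))
        print(should_run(out_dir, "favicon.ico", save_flag=False, overwrite=True))
```

Keeping the two questions in separate functions lets the caller tell "skipped because disabled" apart from "skipped because output is present", which is the distinction a single boolean `should_run` cannot express.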


@thedanbob commented on GitHub (Jan 21, 2021):

What about passing overwrite into the should_save_ functions:

# archivebox/extractors/__init__.py
if should_run(link, out_dir, overwrite):
    # etc.

# e.g. archivebox/extractors/favicon.py
def should_save_favicon(link: Link, out_dir: Optional[str]=None, overwrite: Optional[bool]=False) -> bool:
    out_dir = out_dir or link.link_dir
    if not overwrite and (Path(out_dir) / 'favicon.ico').exists():
        return False

    return SAVE_FAVICON

https://github.com/thedanbob/ArchiveBox/commit/5420903102981a49b97c90e61a2f6959fd49614b
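
To check this approach against the reproduction steps from the issue, here is a simplified, runnable stand-in. It assumes the function can take the output directory and the `SAVE_FAVICON` value directly as arguments instead of a `Link` object and a global config value. `check_repro_steps` and its labels are hypothetical helpers, not part of ArchiveBox.

```python
import tempfile
from pathlib import Path


def should_save_favicon(out_dir: Path, save_favicon: bool, overwrite: bool = False) -> bool:
    # Same shape as the snippet above: skip on existing output unless
    # overwriting, then defer to the config flag.
    if not overwrite and (out_dir / "favicon.ico").exists():
        return False
    return save_favicon


def check_repro_steps() -> dict:
    """Evaluate the decision for each case with an existing favicon.ico."""
    with tempfile.TemporaryDirectory() as tmp:
        out_dir = Path(tmp)
        (out_dir / "favicon.ico").touch()  # output from an earlier run
        return {
            "disabled, no overwrite": should_save_favicon(out_dir, save_favicon=False),
            "disabled, overwrite": should_save_favicon(out_dir, save_favicon=False, overwrite=True),
            "enabled, no overwrite": should_save_favicon(out_dir, save_favicon=True),
            "enabled, overwrite": should_save_favicon(out_dir, save_favicon=True, overwrite=True),
        }


if __name__ == "__main__":
    for case, decision in check_repro_steps().items():
        print(f"{case}: {decision}")
```

Because the function returns the config flag on every path that is not skipped, `--overwrite` can only bypass the "output already exists" check and never the "disabled" one.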


@pirate commented on GitHub (Jan 22, 2021):

Merged and fixed, thanks @thedanbob!
