[PR #1360] Add _EXTRA_ARGS for various extractors #2894

Closed
opened 2026-03-01 18:01:04 +03:00 by kerem · 0 comments

Original Pull Request: https://github.com/ArchiveBox/ArchiveBox/pull/1360

State: closed
Merged: Yes


Summary

This PR adds a way to configure wget, curl, singlefile, youtube-dl, and chrome without overriding the default options.

The default options (_ARGS), the extra options (_EXTRA_ARGS), and more specific options (like WGET_USER_AGENT) are merged and deduplicated. It's assumed that options set with more specificity should take precedence. For example, wget's --user-agent argument comes from WGET_USER_AGENT rather than from _ARGS or _EXTRA_ARGS, and options set in _EXTRA_ARGS take precedence over _ARGS.
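The precedence rules above can be sketched as a small merge step. This is a minimal illustration, not the PR's actual code. The helper name dedupe_args and the sample argument values are hypothetical, and the sketch assumes every option is a single token (a bare --flag or --flag=value) keyed by the text before the first "=". Lists are passed from lowest to highest precedence (_ARGS, then _EXTRA_ARGS, then options derived from settings like WGET_USER_AGENT), so a later option replaces an earlier one with the same key.

```python
from typing import Dict, List


def dedupe_args(*arg_lists: List[str]) -> List[str]:
    """Merge option lists given from lowest to highest precedence.

    Hypothetical helper for illustration only. Each option is assumed to be
    a single token: a bare flag ("--mirror") or "--flag=value".
    """
    merged: Dict[str, str] = {}
    for args in arg_lists:
        for arg in args:
            key = arg.split("=", 1)[0]  # "--user-agent=Foo" -> "--user-agent"
            # Re-assigning an existing key keeps its original position in the
            # dict but replaces the value, so the later option wins.
            merged[key] = arg
    return list(merged.values())


if __name__ == "__main__":
    # Hypothetical values; only the precedence order mirrors the PR description.
    WGET_ARGS = ["--no-verbose", "--adjust-extension",
                 "--restrict-file-names=windows", "--user-agent=DefaultUA"]
    WGET_EXTRA_ARGS = ["--restrict-file-names=unix", "--no-check-certificate"]
    specific = ["--user-agent=MyBot/1.0"]  # e.g. built from WGET_USER_AGENT

    cmd = ["wget", *dedupe_args(WGET_ARGS, WGET_EXTRA_ARGS, specific),
           "https://example.com"]
    print(cmd)
    # ['wget', '--no-verbose', '--adjust-extension', '--restrict-file-names=unix',
    #  '--user-agent=MyBot/1.0', '--no-check-certificate', 'https://example.com']
```

One consequence of keying on the flag name: options meant to repeat, such as several wget --header values, would collapse to the last one under this sketch, and two-token forms like "--user-agent Foo" are not handled.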

This PR might need some more testing with more complex configurations. Hopefully it's simple enough that it won't break anything while still being useful, but I'm not a wizard with curl or wget, so there might be option combinations I don't know about.

Related issues

#1025

Changes these areas

  • [ ] Bugfixes
  • [ ] Feature behavior
  • [ ] Command line interface
  • [x] Configuration options
  • [ ] Internal architecture
  • [ ] Snapshot data layout on disk
Reference
starred/ArchiveBox#2894