[GH-ISSUE #1025] Enhancement: Add WGET_EXTRA_ARGS, CURL_EXTRA_ARGS, SINGLEFILE_EXTRA_ARGS to extend default args without overriding defaults #2150

Open
opened 2026-03-01 17:56:53 +03:00 by kerem · 1 comment

Originally created by @ntevenhere on GitHub (Sep 12, 2022).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/1025

These WGET_ARGS, CURL_ARGS, etc. options let the user shoot themselves in the foot, silently. I think the documentation or the variables themselves should be changed to be more ergonomic.

Did you need to add a header to wget?

```py
WGET_ARGS=['--header=Accept-Language: en-US,en']
```

Look at this configuration: it looks inoffensive. Checking [the documentation](https://github.com/ArchiveBox/ArchiveBox/wiki/Configuration#wget_args), nothing tips you off that you're using it wrong. **But no.**
After this, your 🆆 button no longer takes you to the main HTML page you archived, and wget silently archives things slightly differently.

Why? Setting WGET_ARGS overwrote [vital settings](https://github.com/ArchiveBox/ArchiveBox/blob/03eb7e58758d8dcb85ed781e713fc083f8292264/archivebox/config.py#L159), because those defaults are stored in WGET_ARGS itself. The documentation doesn't mention this. It happened to me: I happily overwrote the variable, when I should have written something like this:

```py
WGET_ARGS=['--header=Accept-Language: en-US,en;q=0.5', '--no-verbose', '--adjust-extension', '--convert-links', '--force-directories', '--backup-converted', '--span-hosts', '--no-parent', '-e', 'robots=off']
```
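To illustrate the failure mode, here is a minimal sketch. It assumes, as described above, that a user-set `WGET_ARGS` replaces the default list wholesale; the `resolve` helper is hypothetical and is not ArchiveBox's actual config loader, and the default list is copied from the corrected configuration above (minus the header).

```py
# Default wget flags, copied from the corrected WGET_ARGS above (minus the header).
DEFAULT_WGET_ARGS = [
    '--no-verbose', '--adjust-extension', '--convert-links', '--force-directories',
    '--backup-converted', '--span-hosts', '--no-parent', '-e', 'robots=off',
]


def resolve(user_value, default):
    """Hypothetical loader: a user-set value replaces the default list outright."""
    return list(default) if user_value is None else list(user_value)


# Setting only the header silently discards every default flag.
wget_args = resolve(['--header=Accept-Language: en-US,en'], DEFAULT_WGET_ARGS)
print(wget_args)                          # ['--header=Accept-Language: en-US,en']
print('--adjust-extension' in wget_args)  # False
```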

Proposal:

  1. The documentation for `*_ARGS` should carry a clear warning or display the default value, so that users realize they may be overwriting something.
  2. Or, create `EXTRA_WGET_ARGS`, `EXTRA_CURL_ARGS`, and so on. These would not overwrite the now low-level `*_ARGS`. `EXTRA_*_ARGS` would be the more user-facing option and the one promoted in the documentation.
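Proposal 2 could look roughly like the minimal sketch below. The `build_wget_args` helper is hypothetical, and so is the exact variable name (the issue title spells it `WGET_EXTRA_ARGS`, the proposal `EXTRA_WGET_ARGS`). It assumes the final command line is simply the base args followed by the extra ones, with `WGET_ARGS` still fully replacing the defaults when set.

```py
from typing import List, Optional, Sequence

# Default wget flags as listed in this issue (mirrored here for illustration).
DEFAULT_WGET_ARGS = [
    '--no-verbose', '--adjust-extension', '--convert-links', '--force-directories',
    '--backup-converted', '--span-hosts', '--no-parent', '-e', 'robots=off',
]


def build_wget_args(
    wget_args: Optional[Sequence[str]] = None,
    extra_wget_args: Optional[Sequence[str]] = None,
) -> List[str]:
    """Hypothetical merge: WGET_ARGS (low-level) replaces the defaults when set;
    EXTRA_WGET_ARGS (user-facing) is appended to whichever base applies."""
    base = list(DEFAULT_WGET_ARGS) if wget_args is None else list(wget_args)
    return base + list(extra_wget_args or [])


# Adding a header no longer drops the defaults.
args = build_wget_args(extra_wget_args=['--header=Accept-Language: en-US,en'])
print(args[-1])                                 # --header=Accept-Language: en-US,en
print(len(args) == len(DEFAULT_WGET_ARGS) + 1)  # True
```

Keeping `WGET_ARGS` as a full override means advanced users can still drop or replace default flags, while the common case of adding a single header goes through the extra-args variable.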

@pirate commented on GitHub (Sep 28, 2022):

I'm ok with the EXTRA_ options as I want users to be able to override the defaults.
