[PR #1195] Add method-specific URL allow/deny lists #2851

Closed
opened 2026-03-01 18:00:56 +03:00 by kerem · 0 comments
Owner

Original Pull Request: https://github.com/ArchiveBox/ArchiveBox/pull/1195

State: closed
Merged: Yes


Summary

This adds the ability to toggle extractors (aka methods, aka outputs) on a URL-specific basis. This is useful for sites on which singlepage, for example, does not provide a usable snapshot, or for cases in which you want to download only the media for a URL and nothing else.

This PR also includes a commit to rename URL_(WHITE|BLACK)LIST to URL_(ALLOW|DENY)LIST, as proposed in the documentation (https://github.com/ArchiveBox/ArchiveBox/wiki/Configuration#user-content-url_blacklist). The old names are preserved as aliases. I included the rename in this PR so that the new configuration parameters would not have to be named with the deprecated terms.
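
To make the alias behavior concrete, here is a minimal sketch of how a renamed option could fall back to its deprecated name. It is not the PR's actual implementation: `LEGACY_ALIASES` and `resolve_option` are hypothetical names, and the rule that the new name wins when both are set is an assumption.

```python
from typing import Mapping, Optional

# Hypothetical table: new option name -> deprecated name kept as an alias.
LEGACY_ALIASES = {
    "URL_ALLOWLIST": "URL_WHITELIST",
    "URL_DENYLIST": "URL_BLACKLIST",
}


def resolve_option(name: str, config: Mapping[str, str]) -> Optional[str]:
    """Return the value for `name`, falling back to its deprecated alias.

    Assumption: when both names are set, the new name takes precedence.
    """
    if name in config:
        return config[name]
    legacy = LEGACY_ALIASES.get(name)
    if legacy is not None and legacy in config:
        return config[legacy]
    return None


# An old config that still uses the deprecated name keeps working.
old_style = {"URL_BLACKLIST": r"\.(css|js)$"}
denylist_pattern = resolve_option("URL_DENYLIST", old_style)
```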

Config Example

# Only save media from TikTok, with a favicon and title
SAVE_ALLOWLIST = {"tiktok\\.com/": ["favicon", "title", "media"]}
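
The following sketch shows one way a per-URL allow list like the one above could be evaluated. It rests on assumptions rather than the merged code: `methods_for_url` and `ALL_METHODS` are hypothetical, the deny-list argument mirrors the PR title but its exact name and semantics are not shown in this description, and a URL that matches no allow-list pattern is assumed to keep every method.

```python
import re
from typing import Dict, List, Optional

# Hypothetical set of extractor names, used only for illustration.
ALL_METHODS = ["favicon", "title", "singlefile", "media", "wget"]

# Same pattern as the config example; the doubled backslash there is
# config-file escaping, so the regex itself is tiktok\.com/
SAVE_ALLOWLIST = {r"tiktok\.com/": ["favicon", "title", "media"]}


def methods_for_url(
    url: str,
    allowlist: Dict[str, List[str]],
    denylist: Optional[Dict[str, List[str]]] = None,
    all_methods: List[str] = ALL_METHODS,
) -> List[str]:
    """Pick the methods to run for `url` (assumed semantics, see lead-in)."""
    allowed = None
    for pattern, methods in allowlist.items():
        if re.search(pattern, url):
            # Union the methods of every matching allow-list pattern.
            allowed = set(methods) if allowed is None else allowed | set(methods)

    # No allow-list match: assume every method stays enabled.
    if allowed is None:
        selected = list(all_methods)
    else:
        selected = [m for m in all_methods if m in allowed]

    # Deny-list entries remove methods from whatever was selected.
    for pattern, methods in (denylist or {}).items():
        if re.search(pattern, url):
            selected = [m for m in selected if m not in methods]
    return selected


tiktok_methods = methods_for_url("https://www.tiktok.com/@user/video/1", SAVE_ALLOWLIST)
other_methods = methods_for_url("https://example.com/article", SAVE_ALLOWLIST)
```

Under these assumptions, the TikTok URL is limited to favicon, title, and media, while the unrelated URL keeps the full method list.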

Documentation

Glad to share some Wiki commits if you'd like to move forward with this PR. You can't PR a wiki, right?

Related issues

None found

Changes these areas

  • [ ] Bugfixes
  • [x] Feature behavior
  • [ ] Command line interface
  • [x] Configuration options
  • [ ] Internal architecture
  • [ ] Snapshot data layout on disk