mirror of
https://github.com/ArchiveBox/ArchiveBox.git
synced 2026-04-26 01:26:00 +03:00
[GH-ISSUE #1250] Feature Request: Simple lists as well as regexes for allow/denylists #3787
Originally created by @admiral-Guck on GitHub (Oct 21, 2023).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/1250
Regexes quickly become unwieldy when dealing with many domains. Two new files like data/allowlist and data/denylist with a proto://domain.tld\n / proto://*.domain.tld\n format would make quickly adding domains much easier than modifying or writing regexes. I suppose this represents an additional pass over queued URLs with a regex compiled from these two lists.
I recognise the flexibility a regex provides and don't want to impinge on that functionality, merely add a more comfortable method for the simple cases (https://github.com/ArchiveBox/ArchiveBox/issues/1251).
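To make the "regex compiled from these two lists" idea concrete, here is a rough sketch of how such a list could be turned into a single pattern. This is not ArchiveBox code: the function name compile_domain_list, the handling of blank lines and # comments, and the choice that *.domain.tld matches subdomains only (not the bare domain) are all assumptions made for illustration.

```python
import re
from typing import Iterable, Optional, Pattern


def compile_domain_list(lines: Iterable[str]) -> Optional[Pattern[str]]:
    """Compile proto://domain.tld entries into one regex (hypothetical helper)."""
    alternatives = []
    for raw in lines:
        entry = raw.strip()
        # Assumption: blank lines and '#' comments are ignored.
        if not entry or entry.startswith("#"):
            continue
        scheme, sep, host = entry.partition("://")
        if not sep or not host:
            raise ValueError(f"expected proto://domain.tld, got {entry!r}")
        if host.startswith("*."):
            # Assumption: a wildcard matches any subdomain, not the bare domain.
            host_re = r"(?:[^/:@?#]+\.)" + re.escape(host[2:])
        else:
            host_re = re.escape(host)
        # Allow an optional port, then require the host to end here so that
        # domain.tld does not also match domain.tld.evil.example.
        alternatives.append(
            re.escape(scheme) + "://" + host_re + r"(?::\d+)?(?:[/?#]|$)"
        )
    if not alternatives:
        return None
    return re.compile("^(?:" + "|".join(alternatives) + ")", re.IGNORECASE)


# Example: the lines could equally come from a file such as data/denylist.
denylist = compile_domain_list([
    "https://example.com\n",
    "https://*.example.org\n",
])
blocked = bool(denylist and denylist.match("https://sub.example.org/page"))
```

The resulting pattern could then be checked alongside whatever regex is already configured, so the existing regex behaviour would stay untouched and the lists would only add a simpler front end for the common case.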
@pirate commented on GitHub (Oct 22, 2023):
I totally agree, a better long-term solution is needed here.
I've been toying with the idea of "rules" and "rule sets" lately. Not just as a way to block or allow certain URLs but also as a way to trigger extractors to run in the first place / as a basic architectural building block for a lot of archivebox behavior in a more event-driven reimagining of the current structure.