[PR #200] [MERGED] Exclude blacklisted URLs #2602

Closed
opened 2026-03-01 18:00:05 +03:00 by kerem · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ArchiveBox/ArchiveBox/pull/200
Author: @mlazana
Created: 3/29/2019
Status: Merged
Merged: 3/30/2019
Merged by: @pirate

Base: master ← Head: master


📝 Commits (7)

  • 417ee9e add env variable URL_BLACKLIST
  • 4d10568 exclude links that are in blacklist
  • 81d8464 fix comments in links.py
  • a3705e3 Merge remote-tracking branch 'upstream/master'
  • 8502fa5 config.py: update function exclude_blacklisted(links)
  • 066b36b make URL_BLACKLIST empty by default
  • 529a0f8 fix broken function name
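The commits above add a URL_BLACKLIST environment variable that is empty by default. Below is a minimal sketch of how such a setting might be read and compiled. It assumes that an unset or empty value disables filtering. The helper name load_url_blacklist is hypothetical and is not taken from the PR's config.py.

```python
from __future__ import annotations

import os
import re
from typing import Mapping, Optional


def load_url_blacklist(environ: Mapping[str, str] = os.environ) -> Optional[re.Pattern[str]]:
    """Read URL_BLACKLIST and compile it (hypothetical helper, not the PR's code).

    An unset or empty value returns None, meaning "exclude nothing",
    which mirrors the empty-by-default behavior described in commit 066b36b.
    """
    raw = environ.get('URL_BLACKLIST', '')
    if not raw:
        return None
    # Compile once so every link check reuses the same pattern object.
    return re.compile(raw)
```

Compiling once up front means an invalid regex fails at startup with re.error, not partway through an import.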

📊 Changes

2 files changed (+18 additions, -9 deletions)


📝 archivebox/config.py (+4 -0)
📝 archivebox/links.py (+14 -9)

📄 Description

Summary

Exclude URLs that match the regex pattern set in URL_BLACKLIST.

Related to #38
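The filtering step can be sketched as follows. This is an illustration under stated assumptions, not the PR's code. Links are assumed to be dicts with a 'url' key, and the pattern is applied with re.search, so it can match anywhere in the URL. The compiled pattern is also passed in as an explicit argument, whereas the PR's exclude_blacklisted(links) takes only the links.

```python
from __future__ import annotations

import re
from typing import Iterable, Iterator, Optional


def exclude_blacklisted(links: Iterable[dict], pattern: Optional[re.Pattern[str]]) -> Iterator[dict]:
    """Yield only links whose 'url' does not match the blacklist pattern.

    Sketch only: the dict-with-'url' link shape and the explicit
    pattern argument are assumptions for illustration.
    """
    for link in links:
        # No pattern configured: keep every link.
        if pattern is None or not pattern.search(link['url']):
            yield link


# Example pattern: drop PDF/ZIP downloads and anything on an ads.* host.
blacklist = re.compile(r'\.(pdf|zip)$|^https?://ads\.')
links = [
    {'url': 'https://example.com/article'},
    {'url': 'https://example.com/report.pdf'},
    {'url': 'http://ads.example.net/banner'},
]
kept = [link['url'] for link in exclude_blacklisted(links, blacklist)]
print(kept)  # ['https://example.com/article']
```

Making it a generator lets the filter sit lazily in a pipeline of link-processing steps without building an intermediate list.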

Changes these areas

  • [ ] Bugfixes
  • [X] Feature behavior
  • [ ] Command line interface
  • [X] Configuration options
  • [ ] Internal architecture
  • [ ] Archived data layout on disk

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.
