[GH-ISSUE #1319] Feature Request: Automatically rewrite URLs to use alternative frontends for difficult-to-archive sites (e.g. using benbusby/farside) #2319

Open
opened 2026-03-01 17:58:09 +03:00 by kerem · 0 comments

Originally created by @pirate on GitHub (Jan 12, 2024).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/1319

Type

  • [ ] General question or discussion
  • [x] Propose a brand new feature
  • [ ] Request modification of existing behavior or design

What is the problem that your feature request solves

Sites like Facebook, Instagram, Twitter, TikTok, etc. are difficult to archive: they frequently block bot traffic or require a logged-in session just to view content.

Describe the ideal specific solution you'd want, and whether it fits into any broader scope of changes

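Many alternative frontends exist that display social media content with less clutter and in a more easily archivable way, e.g.:

  • twitter.com/ArchiveBoxApp -> nitter.net/ArchiveBoxApp (https://nitter.net/ArchiveBoxApp)
  • Reddit -> teddit
  • and many more... https://github.com/mendel5/alternative-front-ends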

ArchiveBox should be configurable to rewrite URLs for sites the user chooses so that they point to alternative frontends.
Ideally it should be a general solution to URL rewriting and cleanup that can take over from URL_ALLOWLIST/DENYLIST and also handle merging duplicate URLs.
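
As a rough illustration of the idea (not a proposed implementation), the rewriting and duplicate merging could be sketched as below. The names REWRITE_RULES, rewrite_url, and rewrite_and_dedupe are hypothetical and do not exist in ArchiveBox; the only rule shown is the twitter.com -> nitter.net example from above. Unlike a plain text substitution, this sketch matches on the parsed hostname, so a URL that merely contains "twitter.com" in its path is left alone.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical rule table: source hostname -> alternative frontend hostname.
# This is NOT an existing ArchiveBox setting; it only illustrates the request.
REWRITE_RULES = {
    "twitter.com": "nitter.net",
}


def rewrite_url(url, rules=REWRITE_RULES):
    """Swap the hostname for its configured frontend; return other URLs unchanged."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    # Treat "www.twitter.com" the same as "twitter.com".
    bare_host = host[4:] if host.startswith("www.") else host
    target = rules.get(bare_host)
    if target is None:
        # No rule for this host (or no parseable host, e.g. a scheme-less URL).
        return url
    # Replacing the whole netloc drops any port or userinfo; that is assumed
    # acceptable here because the frontend is a different server anyway.
    return urlunsplit((parts.scheme, target, parts.path, parts.query, parts.fragment))


def rewrite_and_dedupe(urls, rules=REWRITE_RULES):
    """Rewrite every URL, then drop duplicates while keeping first-seen order."""
    seen = set()
    result = []
    for url in urls:
        rewritten = rewrite_url(url.strip(), rules)
        if rewritten and rewritten not in seen:
            seen.add(rewritten)
            result.append(rewritten)
    return result


if __name__ == "__main__":
    for line in rewrite_and_dedupe([
        "https://twitter.com/ArchiveBoxApp",
        "https://www.twitter.com/ArchiveBoxApp",
    ]):
        print(line)
```

Because rewriting happens before deduplication, URLs that differ only by frontend (or by a leading "www.") collapse into a single entry, which is the "merging duplicate URLs" behavior described above.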

What hacks or alternative solutions have you tried to solve the problem?

Manually replacing URL fragments before piping them into archivebox:

cat urls.txt | perl -pe 's/twitter\.com/nitter.net/gm' | archivebox add

How badly do you want this new feature?

  • [ ] It's an urgent deal-breaker, I can't live without it
  • [ ] It's important to add it in the near-mid term future
  • [x] It would be nice to have eventually