[GH-ISSUE #1475] Feature Request: Automatic rate limiting for multiple entries of the same domain #3892

Open
opened 2026-03-15 00:53:06 +03:00 by kerem · 1 comment
Owner

Originally created by @maxiride on GitHub (Jul 31, 2024).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/1475

Type

  • [ ] General question or discussion
  • [x] Propose a brand new feature
  • [ ] Request modification of existing behavior or design

What is the problem that your feature request solves

Submitting multiple pages from the same website (domain) in quick succession may trigger a captcha or some other rate-limiting measure.

Describe the ideal specific solution you'd want, and whether it fits into any broader scope of changes

The tool should self-limit how quickly it processes entries from the same domain. Optionally, the delay should be configurable.
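
To make the request concrete, here is a minimal sketch of per-domain self-limiting with a configurable delay. It is not ArchiveBox code: the `DomainThrottle` class, its `min_delay` parameter, and the `wait()` method are hypothetical names invented for illustration, and "domain" is assumed to mean the URL's hostname (so `www.example.com` and `example.com` would be throttled separately).

```python
import time
from urllib.parse import urlsplit


class DomainThrottle:
    """Hypothetical helper: enforce a minimum delay between hits to one domain."""

    def __init__(self, min_delay=5.0, clock=time.monotonic, sleep=time.sleep):
        self.min_delay = min_delay   # configurable delay, in seconds
        self._clock = clock          # injectable so the logic can be tested without waiting
        self._sleep = sleep
        self._next_allowed = {}      # hostname -> earliest time the next request may start

    def wait(self, url):
        """Block until the URL's domain may be hit again; return the seconds slept."""
        domain = urlsplit(url).hostname or ""
        now = self._clock()
        ready_at = self._next_allowed.get(domain, now)
        delay = max(0.0, ready_at - now)
        if delay > 0:
            self._sleep(delay)
        # The request starts at max(now, ready_at); the next one must wait min_delay after that.
        self._next_allowed[domain] = max(now, ready_at) + self.min_delay
        return delay
```

Calling `wait(url)` immediately before each fetch would space out entries from the same domain while leaving entries from other domains unaffected.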

What hacks or alternative solutions have you tried to solve the problem?

Nothing. However, I discovered too late that many pages weren't archived, and now they are no longer available on any platform.

How badly do you want this new feature?

  • [ ] It's an urgent deal-breaker, I can't live without it
  • [ ] It's important to add it in the near-mid term future
  • [x] It would be nice to have eventually

  • [x] I'm willing to contribute dev time (https://github.com/ArchiveBox/ArchiveBox#archivebox-development) / money (https://github.com/sponsors/pirate) to fix this issue maybe 5 bucks. Ain't much I know.
  • [x] I like ArchiveBox so far / would recommend it to a friend
  • [ ] I've had a lot of difficulty getting ArchiveBox set up

@pirate commented on GitHub (Aug 12, 2024):

We discussed this as a sub-requirement of recursive archiving (https://github.com/ArchiveBox/ArchiveBox/issues/191) but I agree it deserves a separate issue.

The (currently closed-source) tooling I provide on top of ArchiveBox for paying clients (https://docs.sweeting.me/s/archivebox-consulting-services) does implement fairly advanced rate-limiting on a per-local-host, per-remote-IP, per-domain, and per-chrome-profile basis, but it's not yet public.
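
Since that tooling is not public, nothing below reflects how it actually works. Purely as an illustration of what limiting along several dimensions at once could look like, this sketch waits until every supplied key is ready. `MultiKeyThrottle`, the dimension names, and the delay values are all hypothetical.

```python
import time


class MultiKeyThrottle:
    """Hypothetical helper: a request must satisfy the delay of every key it uses."""

    def __init__(self, delays, clock=time.monotonic, sleep=time.sleep):
        self.delays = dict(delays)   # dimension name -> minimum seconds between uses
        self._clock = clock          # injectable so the logic can be tested without waiting
        self._sleep = sleep
        self._next_allowed = {}      # (dimension, value) -> earliest next start time

    def wait(self, **keys):
        """Block until all given (dimension=value) keys are ready; return seconds slept."""
        now = self._clock()
        # The request may start only once the slowest of its keys is ready.
        ready_at = max([now] + [self._next_allowed.get(item, now) for item in keys.items()])
        delay = ready_at - now
        if delay > 0:
            self._sleep(delay)
        for dimension, value in keys.items():
            self._next_allowed[(dimension, value)] = ready_at + self.delays[dimension]
        return delay


# Example configuration (values are made up for illustration).
throttle = MultiKeyThrottle({"domain": 5.0, "remote_ip": 1.0, "chrome_profile": 2.0})
```

A caller would then invoke something like `throttle.wait(domain="example.com", remote_ip="203.0.113.7")` before each fetch, so two different domains served from the same IP would still be spaced out by the per-IP delay.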
