[PR #2115] [MERGED] feat: add crawler domain rate limiting #1988

Closed
opened 2026-03-02 12:00:04 +03:00 by kerem · 0 comments

📋 Pull Request Information

Original PR: https://github.com/karakeep-app/karakeep/pull/2115
Author: @MohamedBassem
Created: 11/9/2025
Status: Merged
Merged: 11/9/2025
Merged by: @MohamedBassem

Base: main ← Head: codex/add-rate-limiting-per-domain


📝 Commits (7)

  • aafbbfa feat: add crawler domain rate limiting
  • 75dc8dc fix: use runner run results in crawler
  • 48e8218 some fixes
  • 98009dc Merge branch 'main' into codex/add-rate-limiting-per-domain
  • b8cd28e add missing table del
  • 6467197 extract into function
  • c73da92 minor fix

📊 Changes

5 files changed (+121 additions, -32 deletions)

📝 apps/workers/workers/crawlerWorker.ts (+80 -4)
📝 docs/docs/03-configuration.md (+25 -23)
📝 packages/plugins/ratelimit-memory/src/index.test.ts (+3 -1)
📝 packages/shared/config.ts (+10 -0)
📝 packages/shared/ratelimiting.ts (+3 -4)

📄 Description

Summary

  • add configuration support for per-domain crawler rate limiting
  • use the rate limiting plugin in the crawler worker to defer jobs when domains exceed limits
  • document the new crawler domain rate limiting environment settings
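
The deferral idea in the summary above can be sketched as follows. This is a minimal illustration, not the code from this PR: it assumes a fixed-window counter keyed by hostname, and every name in it (`DomainRateLimiter`, `DomainRateLimitConfig`, `check`, `retryAfterMs`) is hypothetical. The actual change goes through the project's rate limiting plugin in `apps/workers/workers/crawlerWorker.ts`, whose interface is not shown here. The sketch also assumes `maxRequests` is at least 1.

```typescript
// Hypothetical sketch: none of these names are taken from the PR.
interface DomainRateLimitConfig {
  windowMs: number; // length of the fixed window in milliseconds
  maxRequests: number; // crawls allowed per domain per window (assumed >= 1)
}

type Decision =
  | { allowed: true }
  | { allowed: false; retryAfterMs: number };

class DomainRateLimiter {
  // One counter per hostname, reset when its window expires.
  private windows = new Map<string, { start: number; count: number }>();
  private config: DomainRateLimitConfig;

  constructor(config: DomainRateLimitConfig) {
    this.config = config;
  }

  // Decide whether a crawl of `url` may run now or should be deferred.
  check(url: string, now: number = Date.now()): Decision {
    const domain = new URL(url).hostname;
    const current = this.windows.get(domain);

    // No window yet, or the previous one has expired: start a fresh window.
    if (!current || now - current.start >= this.config.windowMs) {
      this.windows.set(domain, { start: now, count: 1 });
      return { allowed: true };
    }

    // Still under the limit inside the current window.
    if (current.count < this.config.maxRequests) {
      current.count += 1;
      return { allowed: true };
    }

    // Over the limit: report how long the caller should wait before retrying.
    const retryAfterMs = current.start + this.config.windowMs - now;
    return { allowed: false, retryAfterMs };
  }
}

// Usage: two crawls per minute per domain; the third is deferred.
const limiter = new DomainRateLimiter({ windowMs: 60_000, maxRequests: 2 });
const t0 = 1_000_000;
const first = limiter.check("https://example.com/a", t0);
const second = limiter.check("https://example.com/b", t0 + 1_000);
const third = limiter.check("https://example.com/c", t0 + 2_000);
const other = limiter.check("https://example.org/", t0 + 2_000);
```

In a worker, a denied decision would typically translate into re-enqueueing the job with a delay of roughly `retryAfterMs` rather than failing it, so that other domains keep making progress while the limited one waits.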

Testing

  • pnpm --filter @karakeep/shared typecheck
  • pnpm --filter @karakeep/workers typecheck

Codex Task: https://chatgpt.com/codex/tasks/task_e_6910d2fc74dc832cb173533ce660ba91


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

Reference
starred/karakeep#1988