[PR #2091] [CLOSED] feat: archive git repositories #1971

Closed
opened 2026-03-02 11:59:59 +03:00 by kerem · 0 comments

📋 Pull Request Information

Original PR: https://github.com/karakeep-app/karakeep/pull/2091
Author: @maya-doshi
Created: 11/5/2025
Status: Closed

Base: main ← Head: feature/git-archive


📝 Commits (7)

  • d013eb8 kinda bad first pass, it works tho
  • b0cf09f fix: issues with deleting leftover assets
  • 158e058 combine zvideoRequestSchema and zgitRequestSchema
  • 1190c3a rename config variables
  • f3a3ac1 fix timeout configuration issues
  • b15b6b4 add note about fetching all branches
  • dfde643 docs: add config documentation for git archiving

📊 Changes

18 files changed (+331 additions, -5 deletions)

📝 apps/web/components/dashboard/preview/LinkContentSection.tsx (+14 -0)
📝 apps/web/lib/attachments.tsx (+2 -0)
📝 apps/workers/index.ts (+2 -0)
📝 apps/workers/workerUtils.ts (+3 -0)
📝 apps/workers/workers/crawlerWorker.ts (+12 -0)
➕ apps/workers/workers/gitWorker.ts (+239 -0)
📝 apps/workers/workers/videoWorker.ts (+2 -2)
📝 docs/docs/03-configuration.md (+5 -0)
📝 packages/db/schema.ts (+2 -0)
📝 packages/open-api/karakeep-openapi-spec.json (+5 -1)
📝 packages/sdk/src/karakeep-api.d.ts (+1 -0)
📝 packages/shared-server/src/queues.ts (+14 -2)
📝 packages/shared/assetdb.ts (+3 -0)
📝 packages/shared/config.ts (+14 -0)
📝 packages/shared/types/bookmarks.ts (+2 -0)
📝 packages/trpc/lib/attachments.ts (+5 -0)
📝 packages/trpc/models/bookmarks.ts (+3 -0)
📝 packages/trpc/routers/bookmarks.ts (+3 -0)

📄 Description

issue where this was mentioned: https://github.com/karakeep-app/karakeep/issues/1186

adds dependencies for system git and tar.

new config options:

  • CRAWLER_GIT_DOWNLOAD: enable archiving (false default)
  • CRAWLER_GIT_TIMEOUT: timeout for job (60s default)
  • CRAWLER_GIT_MIRROR: use --mirror flag (false default)
  • CRAWLER_GIT_CLONE_DEPTH: depth of repo clone (-1 default, meaning full depth)
  • CRAWLER_GIT_CLONE_ALL_BRANCHES: get all branches (false default), currently not implemented (will finish it if approved)
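
A minimal sketch of how the options above could map onto `git clone` arguments. This is not the code from `gitWorker.ts`: the `GitArchiveConfig` type and the `readGitArchiveConfig` / `buildCloneArgs` helpers are hypothetical names made up for illustration. Only the environment variable names and defaults come from the list above, and `--mirror` / `--depth` are standard `git clone` flags.

```typescript
// Hypothetical config shape; field names are illustrative, not from the PR.
interface GitArchiveConfig {
  download: boolean;
  timeoutSec: number;
  mirror: boolean;
  cloneDepth: number; // -1 means full history
  cloneAllBranches: boolean;
}

type Env = Record<string, string | undefined>;

// Treat only the literal "true" (any case) as true; unset falls back.
function parseBool(value: string | undefined, fallback: boolean): boolean {
  if (value === undefined || value === "") return fallback;
  return value.toLowerCase() === "true";
}

// Accept integers only; anything else falls back to the default.
function parseIntOr(value: string | undefined, fallback: number): number {
  if (value === undefined || value.trim() === "") return fallback;
  const n = Number(value);
  return Number.isInteger(n) ? n : fallback;
}

// Defaults mirror the ones listed in the PR description.
function readGitArchiveConfig(env: Env): GitArchiveConfig {
  return {
    download: parseBool(env.CRAWLER_GIT_DOWNLOAD, false),
    timeoutSec: parseIntOr(env.CRAWLER_GIT_TIMEOUT, 60),
    mirror: parseBool(env.CRAWLER_GIT_MIRROR, false),
    cloneDepth: parseIntOr(env.CRAWLER_GIT_CLONE_DEPTH, -1),
    cloneAllBranches: parseBool(env.CRAWLER_GIT_CLONE_ALL_BRANCHES, false),
  };
}

// Build the argument list for `git clone`. cloneAllBranches is read but
// deliberately unused here, since the PR describes it as not implemented.
function buildCloneArgs(
  cfg: GitArchiveConfig,
  repoUrl: string,
  dest: string,
): string[] {
  const args = ["clone"];
  if (cfg.mirror) args.push("--mirror");
  // A depth of -1 (or any non-positive value) means a full clone.
  if (cfg.cloneDepth > 0) args.push("--depth", String(cfg.cloneDepth));
  // "--" stops option parsing so a URL starting with "-" is not read as a flag.
  args.push("--", repoUrl, dest);
  return args;
}
```

In a worker, something like `readGitArchiveConfig(process.env)` would supply the config, and the resulting array could be passed to a child-process call with the timeout converted to milliseconds; both of those steps are assumptions about how the pieces would be wired up.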

this is my first time contributing to karakeep so lmk what i should change, and how i can add documentation

also the filename for the download only ends in '.gz' (no '.tar'), but the same bug is present with video downloads, which all come through as mp4s


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

Reference
starred/karakeep#1971