[GH-ISSUE #1972] Allow automatic deletion of original precrawled files from SingleFile extension #1227

Open
opened 2026-03-02 11:55:54 +03:00 by kerem · 2 comments
Owner

Originally created by @qixing-jk on GitHub (Sep 22, 2025).
Original GitHub issue: https://github.com/karakeep-app/karakeep/issues/1972

Describe the feature you'd like

Background
When importing content via the SingleFile extension, Karakeep stores large "precrawled" files. After the system finishes extracting the required content (such as text, metadata, screenshots, etc.), the original precrawled files still occupy significant disk space.

Proposal
Add an option (global setting, environment variable, or extension parameter) to automatically delete the original precrawled file after the extraction and processing are complete.
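As a rough illustration of the proposal, the sketch below deletes the precrawled file only after extraction has succeeded and only when an opt-in flag is set. Everything here is hypothetical: the `DELETE_PRECRAWLED_AFTER_PROCESSING` variable, the `PrecrawlOptions` type, and the `processPrecrawled` helper are made-up names for illustration and are not taken from Karakeep's codebase.

```typescript
import * as fs from "node:fs";

// Hypothetical option; Karakeep does not necessarily expose anything with this name.
interface PrecrawlOptions {
  deletePrecrawledAfterProcessing: boolean;
}

// Hypothetical helper: derives the option from an environment-like map.
function readOptions(env: Record<string, string | undefined>): PrecrawlOptions {
  return {
    deletePrecrawledAfterProcessing:
      env["DELETE_PRECRAWLED_AFTER_PROCESSING"] === "true",
  };
}

// Runs `extract` on the precrawled file, then removes the file only if
// extraction returned normally and the option is enabled.
function processPrecrawled<T>(
  filePath: string,
  extract: (html: string) => T,
  options: PrecrawlOptions,
): T {
  const html = fs.readFileSync(filePath, "utf8");
  const result = extract(html); // if this throws, the original file is kept
  if (options.deletePrecrawledAfterProcessing) {
    fs.rmSync(filePath, { force: true });
  }
  return result;
}
```

Deleting only after `extract` returns means a failed extraction leaves the original in place, so it can be retried.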

Describe the benefits this would bring to existing Karakeep users

  • Saves disk space, especially for large or batch imports.
  • No manual cleanup required.
  • Keeps the system simple by using existing extension parameters.

Can the goal of this request already be achieved via other means?

No

Have you searched for an existing open/closed issue?

  • I have searched for existing issues and none cover my fundamental request

Additional context

No response


@MohamedBassem commented on GitHub (Sep 28, 2025):

I think that makes sense, probably can be an extra flag in the singlefile endpoint


@katchy3132 commented on GitHub (Sep 30, 2025):

What about compressing the archive after everything is done? Keep it in cold storage unless needed again. Could help with the disk space issue.
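A minimal sketch of the compression idea, assuming the archive is a single file on local disk and that gzip via Node's built-in `zlib` module is acceptable. The helper names `compressArchive` and `readArchive` are hypothetical and not part of Karakeep.

```typescript
import * as fs from "node:fs";
import * as zlib from "node:zlib";

// Hypothetical helper: gzips the archive next to the original, then removes
// the uncompressed copy. Returns the path of the compressed file.
function compressArchive(filePath: string): string {
  const gzPath = `${filePath}.gz`;
  const raw = fs.readFileSync(filePath);
  fs.writeFileSync(gzPath, zlib.gzipSync(raw));
  // Remove the original only after the compressed copy has been written.
  fs.rmSync(filePath);
  return gzPath;
}

// Hypothetical helper: restores the archive contents when they are needed again.
function readArchive(gzPath: string): string {
  return zlib.gunzipSync(fs.readFileSync(gzPath)).toString("utf8");
}
```

This keeps the original content recoverable, at the cost of a decompression step whenever the archive is read.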

Reference
starred/karakeep#1227