[GH-ISSUE #1974] /tmp folder is filling with monolith and playwright files #1226

Open
opened 2026-03-02 11:55:54 +03:00 by kerem · 10 comments

Originally created by @maelp on GitHub (Sep 22, 2025).
Original GitHub issue: https://github.com/karakeep-app/karakeep/issues/1974

Describe the Bug

After a few months of use, I saw that my docker overlay2 layer containing Karakeep was 20G(!!). Upon investigation, this was mostly from the /tmp folder inside the container, which had accumulated old files, most notably monolith and playwright temp files.

Steps to Reproduce

Run a shell in the container
Go to /tmp
ls -lah
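
For anyone reproducing this, here is a small sketch for seeing which entries in /tmp take up the space. `largest_entries` is a hypothetical helper, not part of Karakeep, and the container name in the usage comment is an assumption.

```shell
# Hypothetical helper (not part of Karakeep): list the largest entries
# directly under a directory, biggest first. Sizes are in KiB.
largest_entries() {
  local dir="${1:-/tmp}"
  local count="${2:-20}"
  # du -sk prints one "<KiB><TAB><path>" line per entry.
  # Dotfiles are skipped by the glob.
  du -sk -- "$dir"/* 2>/dev/null | sort -rn | head -n "$count"
}

# Usage inside the container (the container name "karakeep" is an assumption):
#   docker exec -it karakeep sh
#   then paste the function and run: largest_entries /tmp 10
```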

Expected Behaviour

Should clean up temp files regularly

Screenshots or Additional Context

No response

Device Details

No response

Exact Karakeep Version

latest

Have you checked the troubleshooting guide?

  • I have checked the troubleshooting guide and I haven't found a solution to my problem

@maelp commented on GitHub (Sep 22, 2025):

![Image](https://github.com/user-attachments/assets/a8ccda41-39d7-4408-bd8a-36f226dc5271)

![Image](https://github.com/user-attachments/assets/35311623-bb1a-4bbf-9ade-47264ec3a167)

@thiswillbeyourgithub commented on GitHub (Sep 23, 2025):

du -skh on /tmp shows only 6 MB. I wonder what I'm doing "right".


@maelp commented on GitHub (Sep 23, 2025):

It might depend on usage, for example whether you're downloading videos, or whether you're downloading a full archive by default (which I do). Not sure. Do you also see a lot of accumulated files, or only a few?


@maelp commented on GitHub (Sep 23, 2025):

Might also have been an old version that accumulated stuff and I had just not noticed until now, not sure...

EDIT: the timestamps on the monolith files look recent, and I updated to the latest version a few days/weeks ago, so the problem seems to be present in the recent version too.


@thiswillbeyourgithub commented on GitHub (Sep 23, 2025):

I have an extensive library and the files are fairly recent: 13 and 17 September. They are all playwright files though, with no monolith files. I do often use the SingleFile extension.


@MohamedBassem commented on GitHub (Sep 28, 2025):

I noticed that I wasn't cleaning up monolith's files on timeout (https://github.com/karakeep-app/karakeep/commit/8dd84ef58b8da920f3e7718cfb5129a44437e53d), but they're not named as such. I don't know where those monolith files are coming from :D


@zorghere commented on GitHub (Sep 29, 2025):

Confirming I'm also seeing this. Lots of big monolith-* files, all recent, even though I'm not doing any major archiving or video crawling and there has been very little recent activity.


@MohamedBassem commented on GitHub (Oct 12, 2025):

Ok, I looked into this a bit more. It seems this is monolith's db cache, which it uses to store large assets (https://github.com/Y2Z/monolith/blob/master/src/main.rs#L213-L232). My guess is that this cache file doesn't get cleaned up properly when the monolith binary is killed ungracefully (e.g. during timeouts). It doesn't seem like monolith allows us to set this file's location, so cleaning it up will be tricky. Maybe we can file an issue with monolith? Another thing we can consider is overriding monolith's TMPDIR env variable to point at a dir we control and clean afterwards.
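
The TMPDIR idea could look roughly like the sketch below. It assumes monolith picks its temp location from `TMPDIR`, as suggested above, which is not verified here. `run_with_private_tmp` and the `monolith-scratch` prefix are made-up names, and Karakeep would presumably do the equivalent from its own worker code rather than through a shell wrapper.

```shell
# Hypothetical wrapper: run a command with its own scratch TMPDIR and remove
# that directory afterwards, whatever the command's exit status was.
run_with_private_tmp() {
  local scratch status
  # Fresh scratch dir per invocation, so leftovers can be removed wholesale.
  scratch="$(mktemp -d "${TMPDIR:-/tmp}/monolith-scratch.XXXXXX")" || return 1
  # Run the given command with TMPDIR pointing at the scratch dir.
  env TMPDIR="$scratch" "$@"
  status=$?
  # Cleanup happens in the wrapper, so it still runs if the child was killed.
  rm -rf -- "$scratch"
  return "$status"
}

# Example: hard-kill after 60 seconds; the scratch dir is removed either way.
#   run_with_private_tmp timeout -s KILL 60 monolith https://example.com -o page.html
```

Because the cleanup lives in the parent, it does not depend on the child exiting gracefully. It would not help if the wrapper process itself is killed, so a periodic sweep of stale scratch dirs may still be worth having.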


@maelp commented on GitHub (Oct 12, 2025):

I guess that could indeed be a solution: have a dedicated monolith dir and clean it up regularly.


@zorghere commented on GitHub (Oct 19, 2025):

My hacky temporary way of resolving this has been to map /tmp in the docker container to a local folder on the host and have a cron job clean it up every day.
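
A sketch of what such a cleanup job could look like. The host path, the schedule, and the one-day age threshold are assumptions for illustration, not the exact setup described above, and `prune_old_files` is a hypothetical helper.

```shell
# Hypothetical helper: delete regular files under a directory that were last
# modified more than the given number of days ago.
prune_old_files() {
  local dir="$1"
  local days="${2:-1}"
  # -mtime +N matches files whose age in whole 24-hour periods is greater than N.
  # Only regular files are removed; directories are left in place.
  find "$dir" -mindepth 1 -type f -mtime +"$days" -delete
}

# Possible wiring (paths are assumptions):
#   docker run ... -v /srv/karakeep-tmp:/tmp ...
#   crontab entry, daily at 03:00:
#   0 3 * * * find /srv/karakeep-tmp -mindepth 1 -type f -mtime +1 -delete
```

The age threshold is there so that temp files belonging to a crawl that is still running are not removed from under it.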
