[GH-ISSUE #2519] If the parser OOMs, orphaned assets / invalid data can be left in the database #1515

Open
opened 2026-03-02 11:57:47 +03:00 by kerem · 1 comment

Originally created by @BryanWall on GitHub (Feb 25, 2026).
Original GitHub issue: https://github.com/karakeep-app/karakeep/issues/2519

Describe the Bug

I tried adding an asset through SinglePage (an article on X.com). It failed because the parser ran out of memory (OOM). I deleted the failed bookmark and increased CRAWLER_PARSER_MEM_LIMIT_MB to 768, which fixed the issue.

However, it left an orphaned banner image for the deleted bookmark in the database (see the attached screenshots). I think the image file was deleted from storage, but its database entry was left behind with a null bookmark ID. I fixed this by manually deleting that entry from the database.

Is there, or should there be, a process that removes invalid data like this from the database? Alternatively, the crawl process may need error handling so that it fails gracefully when the parser runs out of memory.
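A minimal sketch of the cleanup process asked about here, assuming a simplified asset table in which each row carries a nullable bookmark ID: a periodic job could flag rows whose bookmark reference is null or points at a bookmark that no longer exists. `AssetRow`, `findOrphanedAssets`, and the `"bannerImage"` type string are hypothetical names for illustration, not Karakeep's actual schema or API.

```typescript
// Hypothetical row shape for illustration; Karakeep's real schema may differ.
interface AssetRow {
  id: string;
  assetType: string;
  bookmarkId: string | null;
}

/**
 * Returns asset rows that no longer belong to a live bookmark:
 * either the bookmark reference is null, or it points at a bookmark
 * ID that is not in the set of existing bookmarks.
 */
function findOrphanedAssets(
  assets: readonly AssetRow[],
  existingBookmarkIds: ReadonlySet<string>,
): AssetRow[] {
  return assets.filter(
    (a) => a.bookmarkId === null || !existingBookmarkIds.has(a.bookmarkId),
  );
}

// Example: one healthy banner image and one left behind with a null bookmark ID.
const assets: AssetRow[] = [
  { id: "a1", assetType: "bannerImage", bookmarkId: "b1" },
  { id: "a2", assetType: "bannerImage", bookmarkId: null },
];
const orphans = findOrphanedAssets(assets, new Set(["b1"]));
console.log(`orphans: ${orphans.map((a) => a.id).join(", ")}`); // prints "orphans: a2"
```

Keeping detection separate from deletion makes the check side-effect free; an actual job would then remove both the flagged rows and any matching files still present in asset storage.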

Steps to Reproduce

1. Add an asset that causes the parser to run out of memory (OOM).
2. Delete the failed bookmark.

Expected Behaviour

A failed crawl should not leave invalid data in the database.
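One way to get this behaviour is a compensating cleanup: record every asset saved during a crawl attempt and remove them all if the attempt throws. The sketch below assumes a hypothetical `AssetStore` interface and a `crawlWithCleanup` wrapper; neither is Karakeep's actual API, and the in-memory store exists only for the demo.

```typescript
// Hypothetical interfaces for illustration only; not Karakeep's real API.
interface AssetStore {
  save(bookmarkId: string, data: string): Promise<string>; // returns the new asset id
  remove(assetId: string): Promise<void>;
}

/**
 * Runs a crawl that may save several assets. If the crawl throws
 * (for example, because the parser failed with an OOM error), every
 * asset saved during this attempt is removed before the error is
 * re-thrown, so no partial rows are left behind.
 */
async function crawlWithCleanup(
  bookmarkId: string,
  store: AssetStore,
  crawl: (saveAsset: (data: string) => Promise<string>) => Promise<void>,
): Promise<void> {
  const savedIds: string[] = [];
  const saveAsset = async (data: string): Promise<string> => {
    const id = await store.save(bookmarkId, data);
    savedIds.push(id); // track for possible rollback
    return id;
  };
  try {
    await crawl(saveAsset);
  } catch (err) {
    // Best-effort compensation: remove what this attempt created.
    for (const id of savedIds) {
      try {
        await store.remove(id);
      } catch {
        // Ignore cleanup failures so the original error is preserved.
      }
    }
    throw err;
  }
}

// In-memory store used only to demonstrate the wrapper.
class InMemoryStore implements AssetStore {
  readonly rows = new Map<string, string>();
  private next = 0;
  async save(bookmarkId: string, data: string): Promise<string> {
    const id = `asset-${this.next++}`;
    this.rows.set(id, `${bookmarkId}:${data}`);
    return id;
  }
  async remove(assetId: string): Promise<void> {
    this.rows.delete(assetId);
  }
}

// Usage: a crawl that saves a banner image and then fails mid-parse.
async function demo(): Promise<void> {
  const store = new InMemoryStore();
  try {
    await crawlWithCleanup("bookmark-1", store, async (saveAsset) => {
      await saveAsset("banner-image-bytes");
      throw new Error("simulated parser OOM");
    });
  } catch (err) {
    console.log(`crawl failed: ${(err as Error).message}`);
  }
  console.log(`assets left behind: ${store.rows.size}`); // prints "assets left behind: 0"
}

void demo();
```

An explicit compensation list is used instead of relying only on a database transaction because a transaction would not cover files already written to asset storage. It also only helps when the failure surfaces as an exception: if the process running this code is itself killed, no `catch` block runs, so a periodic sweep for orphaned rows would still be useful.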

Screenshots or Additional Context

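Two screenshots attached:
https://github.com/user-attachments/assets/b56855b7-f459-44b6-b220-501b7230de7d
https://github.com/user-attachments/assets/0c4ffc37-e8d8-4f03-8027-b29122aca802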

Device Details

No response

Exact Karakeep Version

v0.31.0

Environment Details

No response

Debug Logs

No response

Have you checked the troubleshooting guide?

  • I have checked the troubleshooting guide and I haven't found a solution to my problem

@MohamedBassem commented on GitHub (Feb 28, 2026):

Thanks for the report. Yeah, I'm aware of that bug.
