[GH-ISSUE #2095] Capture Plugins #1306

Open
opened 2026-03-02 11:56:24 +03:00 by kerem · 2 comments

Originally created by @FFCoder on GitHub (Nov 7, 2025).
Original GitHub issue: https://github.com/karakeep-app/karakeep/issues/2095

Describe the feature you'd like

So I was reviewing #739 in light of my own frustrations with capturing Reddit posts. What I think would be great is a capture plugin system. A plugin could declare a regex of the URLs it matches, and Karakeep could expose hooks that the plugin uses to fetch the page and return the data, doing a better job for those sites than Karakeep's existing mechanisms.
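To make the proposal concrete, here is a minimal sketch of how URL-matched capture plugins could be registered and dispatched. Every name in it (`CapturePlugin`, `CaptureResult`, `CapturePluginRegistry`, `findFor`) is hypothetical and not part of Karakeep; it only assumes that a plugin supplies a regex plus a `capture` hook, and that the host falls back to its built-in crawler when nothing matches.

```typescript
// Hypothetical capture-plugin contract; none of these names exist in Karakeep.

interface CaptureResult {
  url: string;
  html?: string;  // HTML produced by the plugin, if any
  title?: string;
  text?: string;  // plain-text content for indexing
}

interface CapturePlugin {
  name: string;
  // URLs this plugin handles. Avoid the `g` flag: RegExp.test() on a global
  // regex is stateful (lastIndex) and would give alternating results.
  urlPattern: RegExp;
  // Hook the host calls instead of its default crawler.
  capture(url: string): Promise<CaptureResult>;
}

class CapturePluginRegistry {
  private plugins: CapturePlugin[] = [];

  register(plugin: CapturePlugin): void {
    this.plugins.push(plugin);
  }

  // The first registered plugin whose pattern matches wins;
  // undefined means "use the built-in crawler".
  findFor(url: string): CapturePlugin | undefined {
    return this.plugins.find((p) => p.urlPattern.test(url));
  }

  async capture(
    url: string,
    fallback: (url: string) => Promise<CaptureResult>,
  ): Promise<CaptureResult> {
    const plugin = this.findFor(url);
    return plugin ? plugin.capture(url) : fallback(url);
  }
}
```

"First match wins" is just one possible conflict policy; a real system might prefer explicit priorities or user-chosen overrides when several plugins claim the same URL.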

Describe the benefits this would bring to existing Karakeep users

  • Better capture of sites such as Reddit: a plugin could use Reddit's JSON API or other mechanisms to fetch the data.
  • Better resource management: a plugin could have a setting that disables screenshots, saving resources because no browser has to be loaded to fetch the contents.
  • Better performance and accuracy for Karakeep, while shifting development work away from the core Karakeep developers.
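The first two benefits can be illustrated together with a sketch of a hypothetical Reddit plugin that asks for Reddit's `.json` view of a post, so a plain HTTP request replaces a full browser render and no screenshot is produced. The function names are illustrative, and the response shape (`[postListing, commentsListing]`, with the post fields at `[0].data.children[0].data`) is an assumption based on what Reddit currently returns, not a documented, stable contract.

```typescript
// Hypothetical Reddit capture plugin: fetches the post's `.json` view
// instead of rendering the page, so no browser and no screenshot.

interface RedditPost {
  title: string;
  author: string;
  selftext: string;  // body of a text post; empty for link posts
  linkedUrl: string; // target of a link post, or the post's own URL
}

// https://www.reddit.com/r/sub/comments/id/slug/?utm=x
//   -> https://www.reddit.com/r/sub/comments/id/slug.json
function toRedditJsonUrl(postUrl: string): string {
  const u = new URL(postUrl);
  u.search = "";
  u.hash = "";
  u.pathname = u.pathname.replace(/\/+$/, "") + ".json";
  return u.toString();
}

function parseRedditPost(body: unknown): RedditPost {
  // Assumed shape: body[0].data.children[0].data holds the post fields.
  const post = (body as any)?.[0]?.data?.children?.[0]?.data;
  if (!post || typeof post.title !== "string") {
    throw new Error("unexpected Reddit JSON shape");
  }
  return {
    title: post.title,
    author: String(post.author ?? ""),
    selftext: String(post.selftext ?? ""),
    linkedUrl: String(post.url ?? ""),
  };
}

async function captureRedditPost(postUrl: string): Promise<RedditPost> {
  // Global fetch (Node 18+). Reddit may throttle or reject requests that
  // lack a descriptive User-Agent; the value below is a placeholder.
  const res = await fetch(toRedditJsonUrl(postUrl), {
    headers: { "User-Agent": "karakeep-capture-plugin-sketch/0.1" },
  });
  if (!res.ok) throw new Error(`Reddit returned HTTP ${res.status}`);
  return parseRedditPost(await res.json());
}
```

Keeping URL rewriting and parsing as pure functions means they can be tested without network access; only `captureRedditPost` touches the network.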

Can the goal of this request already be achieved via other means?

Yes, technically it could be done through the Karakeep API.

Have you searched for an existing open/closed issue?

  • I have searched for existing issues and none cover my fundamental request

Additional context

No response


@xuatz commented on GitHub (Nov 7, 2025):

Very interesting idea, especially the potential to support a crowdsourced plugin ecosystem.


@MohamedBassem commented on GitHub (Nov 8, 2025):

I've recently been moving more and more of Karakeep's features behind plugins, and the next one on my plate was indeed the crawler plugin. We already have two implementations in place: one that uses `fetch` directly, and another that uses `playwright` if a browser is configured. I was planning to add a third one that delegates crawling entirely to a third-party plugin, which takes a request and responds with the HTML and the screenshot (and potentially even with parsed metadata). And as @xuatz mentioned, I do want to make the plugin ecosystem a first-class citizen in Karakeep, so that I can keep the core slim and delegate more and more features externally.
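The delegating crawler described above could look roughly like this sketch: a request goes out, and HTML, an optional screenshot, and optional pre-parsed metadata come back. All type and function names, the JSON-over-HTTP transport, and the selection order (remote plugin first, then a browser-backed crawler, then plain `fetch`) are assumptions for illustration, not Karakeep's actual plugin interface.

```typescript
// Hypothetical crawler-provider contract; not Karakeep's real API.

interface CrawlRequest {
  url: string;
  wantScreenshot: boolean;
}

interface CrawlResponse {
  html: string;
  screenshot?: { mimeType: string; base64: string };
  metadata?: { title?: string; description?: string; imageUrl?: string };
}

interface CrawlerProvider {
  name: string;
  crawl(req: CrawlRequest): Promise<CrawlResponse>;
}

// Analogous to the plain-fetch implementation: no browser, so no screenshot.
const fetchCrawler: CrawlerProvider = {
  name: "fetch",
  async crawl(req) {
    const res = await fetch(req.url);
    if (!res.ok) throw new Error(`fetch returned HTTP ${res.status}`);
    return { html: await res.text() };
  },
};

// Delegating provider: POSTs the request to an external plugin service and
// expects a CrawlResponse-shaped JSON body back. The endpoint is hypothetical.
function remoteCrawler(endpoint: string): CrawlerProvider {
  return {
    name: `remote:${endpoint}`,
    async crawl(req) {
      const res = await fetch(endpoint, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify(req),
      });
      if (!res.ok) throw new Error(`crawler plugin returned HTTP ${res.status}`);
      return (await res.json()) as CrawlResponse;
    },
  };
}

// Assumed host-side policy: a configured remote plugin wins, then a
// browser-backed crawler if one is configured, else plain fetch.
function pickCrawler(opts: {
  remoteEndpoint?: string;
  browserCrawler?: CrawlerProvider;
}): CrawlerProvider {
  if (opts.remoteEndpoint) return remoteCrawler(opts.remoteEndpoint);
  return opts.browserCrawler ?? fetchCrawler;
}
```

Because all three providers share one interface, the rest of the pipeline (metadata extraction, asset storage) would not need to know which one produced the result; a production version would also validate the remote JSON instead of trusting the cast.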
