[GH-ISSUE #1416] FR: Add functionality to use old school Natural Language Processing instead of AI / LLMs #899

Closed
opened 2026-03-02 11:53:36 +03:00 by kerem · 4 comments

Originally created by @drfraser on GitHub (May 16, 2025).
Original GitHub issue: https://github.com/karakeep-app/karakeep/issues/1416

Describe the feature you'd like

I would like functionality that implements old school NLP to analyze documents to create tags, summarize, etc. I realize this is hardly the latest cool thing, to not use AI / LLMs, but I had an idea like karakeep years ago and it is nice to see someone has made something more comprehensive than I dreamt up.

Describe the benefits this would bring to existing Karakeep users

My primary intent is to build this for myself as part of my research into NLP, document processing, etc. If someone else finds this feature useful, then great (e.g. it might help with the bugs about too many tags, reusing tags, etc.). Also, the user would not need to sign up for any LLM services.

Can the goal of this request already be achieved via other means?

No, I don't see how.

Have you searched for an existing open/closed issue?

  • I have searched for existing issues and none cover my fundamental request

Additional context

Like I've said, I plan to add (or at least try, need an excuse to learn React) the functionality myself. If anyone has some suggestions or reactions, please let me know.


@thiswillbeyourgithub commented on GitHub (May 16, 2025):

Hi,

What do you mean by NLP? It's a very broad field. What do you concretely have in mind? What kind of feature?

I too have quite a few ideas (see #1230) and made a Python client (#1360) to make it easy to connect Python to karakeep.

There's a pretty high probability I'll connect BERTopic to karakeep to semantically organize my reading queue (not unlike my very first Python project, AnnA: https://github.com/thiswillbeyourgithub/AnnA_Anki_neuronal_Appendix). And if I click on an item inside BERTopic, it would open in karakeep.

@drfraser commented on GitHub (May 17, 2025):

I have to find my notes about my ideas, but in general: TF-IDF, topic modeling (LSA, LDA, etc.), clustering, and so on, to automate the process of categorizing bookmarks/web pages. It was nearly a decade ago that I got my MSc in AI, so I don't know what the state of the relevant research on document categorization is these days. Now I am wondering if there have been any papers comparing the results of LLMs against the older techniques on this kind of task...

#1230 looks interesting, so I will read up on it. And #1360 as well: I'm going to need karakeep to call out to Python-based components, so I will look into what you've done.
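
For readers unfamiliar with the techniques named above, here is a minimal sketch of TF-IDF keyword extraction used as a tag suggester. It is an illustration only, not code from this thread or from karakeep. The function names (`tokenize`, `tfidf_tags`), the tiny stopword list, and the smoothed IDF formula are all assumptions chosen for the example; a real setup would more likely use a library such as scikit-learn.

```python
import math
import re
from collections import Counter

# Deliberately tiny stopword list, for illustration only.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "is", "on", "with"}


def tokenize(text):
    """Lowercase, keep alphabetic runs, drop stopwords and tokens under 3 letters."""
    return [
        t for t in re.findall(r"[a-z]+", text.lower())
        if t not in STOPWORDS and len(t) > 2
    ]


def tfidf_tags(docs, top_k=3):
    """Return the top_k highest-scoring TF-IDF terms for each document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    all_tags = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        scores = {
            # Term frequency times a smoothed inverse document frequency.
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        }
        # Sort by score descending, then alphabetically so ties are deterministic.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        all_tags.append([term for term, _ in ranked[:top_k]])
    return all_tags
```

Terms that appear in many bookmarks score lower than terms specific to one bookmark, which is one way such an approach could address the "too many tags" complaint: the number of tags per item is fixed by `top_k`.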


@thiswillbeyourgithub commented on GitHub (May 17, 2025):

Yeah, I think you'll like BERTopic a lot then. If you give it a try, I'm interested in what you're up to!


@MohamedBassem commented on GitHub (May 17, 2025):

@drfraser To be honest, it's not clear to me what the benefits of this would be over using LLMs. I think this can be done as an integration/sidecar for karakeep. We already support webhooks, which can trigger some processing on your side and then update the bookmark with the result of that processing. I'd be more than happy to add support for whatever hooks you'd need to integrate with karakeep the way you want. But building NLP into karakeep natively is unlikely.
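
The sidecar flow described above (webhook fires, external process computes something, result is written back to the bookmark) could be sketched roughly as follows. This is a sketch under stated assumptions, not karakeep's actual webhook contract: the `bookmarkId` field name is hypothetical, and `fetch_text` and `attach_tags` are hypothetical callables you would have to implement yourself against the karakeep API. The tagger is a placeholder word counter standing in for whatever NLP you prefer.

```python
import json
from collections import Counter
from http.server import BaseHTTPRequestHandler, HTTPServer


def suggest_tags(text, top_k=3):
    """Placeholder tagger: most frequent alphabetic words longer than 4 letters."""
    words = [w for w in text.lower().split() if w.isalpha() and len(w) > 4]
    ranked = sorted(Counter(words).items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_k]]


def process_event(event, fetch_text, attach_tags):
    """Handle one webhook event: fetch the text, compute tags, write them back.

    fetch_text(bookmark_id) -> str and attach_tags(bookmark_id, tags) are
    hypothetical callables supplied by the caller.
    """
    bookmark_id = event["bookmarkId"]  # hypothetical field name
    tags = suggest_tags(fetch_text(bookmark_id))
    attach_tags(bookmark_id, tags)
    return tags


def make_handler(fetch_text, attach_tags):
    """Build a request handler class bound to the two callables."""

    class Handler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            try:
                event = json.loads(self.rfile.read(length) or b"{}")
                process_event(event, fetch_text, attach_tags)
                self.send_response(204)  # processed, nothing to return
            except (KeyError, ValueError):
                self.send_response(400)  # malformed or unexpected payload
            self.end_headers()

    return Handler


def make_server(port, fetch_text, attach_tags):
    """Create (but do not start) a local HTTP server for incoming webhooks."""
    return HTTPServer(("127.0.0.1", port), make_handler(fetch_text, attach_tags))
```

To run it, one would call something like `make_server(8080, fetch_text, attach_tags).serve_forever()` and point the webhook at that address. Keeping `process_event` separate from the HTTP wiring makes the tagging logic testable without a running server.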
