[GH-ISSUE #441] [Feature request] Selfhosted semantic search #281

Open
opened 2026-03-02 11:48:26 +03:00 by kerem · 6 comments
Owner

Originally created by @JojiiOfficial on GitHub (Sep 29, 2024).
Original GitHub issue: https://github.com/karakeep-app/karakeep/issues/441

I'd love to see Sentence Transformers added to Hoarder for semantic search. It could make finding bookmarks much more efficient and user-friendly.

For reference, [this example](https://sbert.net/examples/applications/semantic-search/README.html) illustrates how they could be applied in Hoarder.

Their lightweight nature also aligns well with self-hosting and privacy goals. Additionally, a vector database like [Qdrant](https://github.com/qdrant/qdrant) could help store and retrieve the generated embeddings efficiently, providing fast performance and easy self-hosting.

As I'm very familiar with this area, I'd consider contributing if this gets accepted and we agree on an implementation.
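
For readers unfamiliar with the idea, semantic search over embeddings comes down to ranking stored vectors by their similarity to a query vector. The sketch below is only an illustration in plain Python: the bookmark IDs and the 3-dimensional vectors are made up, and the names `cosine_similarity` and `search` are hypothetical. In practice the vectors would come from an embedding model and the ranking would be delegated to a vector store.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query_vec, bookmark_vecs, top_k=3):
    """Return the top_k (bookmark_id, score) pairs, most similar first."""
    scored = [
        (cosine_similarity(query_vec, vec), bookmark_id)
        for bookmark_id, vec in bookmark_vecs.items()
    ]
    scored.sort(reverse=True)
    return [(bookmark_id, score) for score, bookmark_id in scored[:top_k]]


# Toy vectors invented for illustration; real embeddings have hundreds of dimensions.
BOOKMARKS = {
    "rust-async-guide": [0.9, 0.1, 0.0],
    "sourdough-recipe": [0.0, 0.2, 0.9],
    "tokio-tutorial": [0.8, 0.3, 0.1],
}

results = search([1.0, 0.0, 0.0], BOOKMARKS, top_k=2)
```

A brute-force scan like this is fine for a few thousand bookmarks; a dedicated vector index mainly matters once the collection grows.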

@MohamedBassem commented on GitHub (Sep 29, 2024):

@JojiiOfficial We already have @medo working on adding RAG over the stored bookmarks. The first PR, which generates embeddings for the data stored in Hoarder, is [here](https://github.com/hoarder-app/hoarder/pull/403/files) (currently pending review). For the vector database, we're considering either [sqlite-vec](https://github.com/asg017/sqlite-vec) or [orama](https://github.com/askorama/orama). Orama is cool because we can also use it for FTS (as a replacement for Meilisearch). If you're interested in contributing to this effort, please join us in the #development channel on Discord.

@JojiiOfficial commented on GitHub (Sep 29, 2024):

Thanks for the quick response!

I'd love to be able to fully self-host Hoarder. I personally don't want all my bookmarks sent to OpenAI's servers and would prefer to keep everything local. I think a lot of people, especially those self-hosting their apps, feel the same way.
For this, either a local LLM or a specialized model such as Sentence Transformers seems to be the best choice.

What are your thoughts on this?


@MohamedBassem commented on GitHub (Sep 29, 2024):

@JojiiOfficial Hoarder already supports Ollama for local inference. This feature is going to be no different (it will work with either Ollama or OpenAI).
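
To make the local option concrete, here is a rough sketch of requesting embeddings from an Ollama server on the same machine, so no bookmark text leaves the host. It is not Hoarder's implementation. It assumes Ollama's `/api/embed` endpoint on the default port 11434, and the model name `nomic-embed-text` is only an example of an embedding model that would need to be pulled first. The helper names `build_embed_request` and `embed` are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumption: Ollama running locally on its default port
EMBED_MODEL = "nomic-embed-text"       # assumption: an embedding model already pulled into Ollama


def build_embed_request(texts, base_url=OLLAMA_URL, model=EMBED_MODEL):
    """Build (but do not send) a POST request for Ollama's /api/embed endpoint."""
    payload = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/embed",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(texts, **kwargs):
    """Send the request and return one embedding vector per input text."""
    with urllib.request.urlopen(build_embed_request(texts, **kwargs)) as resp:
        return json.load(resp)["embeddings"]
```

Calling `embed(["some bookmark text"])` requires a running Ollama instance; `build_embed_request` can be inspected without one.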


@JojiiOfficial commented on GitHub (Sep 29, 2024):

I didn't notice the configuration option for Ollama. Thanks for the clarification!


@austinmccalley commented on GitHub (Jan 27, 2025):

See bookmark embeddings PR https://github.com/hoarder-app/hoarder/pull/834 for a rough draft.


@thiswillbeyourgithub commented on GitHub (May 18, 2025):

> [@JojiiOfficial](https://github.com/JojiiOfficial) We already have [@medo](https://github.com/medo) who's working on adding RAG on the stored bookmarks. The first PR is here (https://github.com/hoarder-app/hoarder/pull/403/files) (currently pending review) which generates embeddings for the data stored in hoarder. For vector database, we're considering either [sqlite-vec](https://github.com/asg017/sqlite-vec) or orama (https://github.com/askorama/orama). Orama is cool because we can also use it for FTS (as a replacement for meilisearch). If you're interested in contributing to this effort, please join us in the #development channel on discord.

Why do you want to get rid of Meilisearch? It supports full-text search, embedding search, and binary vectors, and it integrates with React. They also apparently support image and video search. Orama does only a fraction of those and has no binary vectors.
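
For context on the binary vectors mentioned above: the general idea is to keep only one bit per embedding dimension and compare vectors by Hamming distance, trading some accuracy for much smaller storage and faster comparison. The sketch below shows a simple sign-based variant in plain Python. It is an assumption-laden illustration, not a description of how Meilisearch or Orama implement it, and the function names are hypothetical.

```python
def binarize(vec):
    """Pack the sign of each component into an int: bit i is 1 when vec[i] > 0."""
    bits = 0
    for i, x in enumerate(vec):
        if x > 0:
            bits |= 1 << i
    return bits


def hamming_distance(a, b):
    """Number of bit positions in which two packed vectors differ."""
    return bin(a ^ b).count("1")


# Toy 4-dimensional vectors invented for illustration.
doc = binarize([0.5, -0.2, 0.1, -0.9])
query = binarize([0.4, 0.3, 0.2, -0.1])
distance = hamming_distance(doc, query)
```

A smaller Hamming distance means the two original vectors agreed in sign on more dimensions, which serves as a coarse stand-in for cosine similarity.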