[GH-ISSUE #1955] Incomplete search results when using partial keywords in bookmark search (Version 0.27.1) #1214

Open
opened 2026-03-02 11:55:49 +03:00 by kerem · 4 comments

Originally created by @LinFei83 on GitHub (Sep 16, 2025).
Original GitHub issue: https://github.com/karakeep-app/karakeep/issues/1955

Describe the Bug

I am experiencing an issue with bookmark search functionality in version 0.27.1 where partial keyword searches do not return expected results, while full keyword searches work correctly.

Steps to Reproduce

  1. Save a bookmark with the title or description containing the term "MPU6050".
  2. Perform a search using the full keyword "MPU6050" – the bookmark is successfully retrieved.
  3. Perform a search using a partial keyword such as "6050" – the bookmark does not appear in the results.

Expected Behaviour

Searching with partial keywords (e.g., "6050") should return bookmarks that contain the substring, similar to how the full keyword works.

Actual Behavior:
Only exact or full keyword matches return results, while partial matches are ignored.

Screenshots or Additional Context

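![Image](https://github.com/user-attachments/assets/61c87979-7416-44ea-ad53-1e96622e1402)
![Image](https://github.com/user-attachments/assets/475bf29c-a13f-495b-905d-7192d94ea89b)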

Device Details

ubuntu20.04 docker

Exact Karakeep Version

0.27.1

Have you checked the troubleshooting guide?

  • I have checked the troubleshooting guide and I haven't found a solution to my problem

@thiswillbeyourgithub commented on GitHub (Sep 16, 2025):

I believe it's because of Meilisearch's [tokenization step](https://www.meilisearch.com/docs/learn/indexing/tokenization), which is handled by [charabia](https://github.com/meilisearch/charabia).

Looking at the [doc](https://www.meilisearch.com/docs/learn/relevancy/ranking_rules), I'm wondering if the `exactness` rule is used by Karakeep. Actually, I never really understood where the configuration for Meilisearch was in the Karakeep code.

edit: the config file looks like it's in `packages/plugins-search-meilisearch/src/index.ts`

edit2:

After opening a shell inside the Meilisearch container and running `curl -X GET http://localhost:7700/indexes/bookmarks/settings -H "Authorization: Bearer [the meilisearch key]"`, I see the following JSON:

```json
{
  "displayedAttributes": [
    "*"
  ],
  "searchableAttributes": [
    "*"
  ],
  "filterableAttributes": [
    "id",
    "userId"
  ],
  "sortableAttributes": [
    "createdAt"
  ],
  "rankingRules": [
    "words",
    "typo",
    "proximity",
    "attribute",
    "sort",
    "exactness"
  ],
  "stopWords": [],
  "nonSeparatorTokens": [],
  "separatorTokens": [],
  "dictionary": [],
  "synonyms": {},
  "distinctAttribute": null,
  "proximityPrecision": "byWord",
  "typoTolerance": {
    "enabled": true,
    "minWordSizeForTypos": {
      "oneTypo": 5,
      "twoTypos": 9
    },
    "disableOnWords": [],
    "disableOnAttributes": []
  },
  "faceting": {
    "maxValuesPerFacet": 100,
    "sortFacetValuesBy": {
      "*": "alpha"
    }
  },
  "pagination": {
    "maxTotalHits": 1000
  },
  "embedders": {},
  "searchCutoffMs": null,
  "localizedAttributes": null,
  "facetSearch": true,
  "prefixSearch": "indexingTime"
}
```

So exactness is used.
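
For anyone who wants to repeat this check without reading the JSON by eye, here is a minimal Python sketch of the same request. The host, port and index name are copied from the curl command above; the API key is a placeholder, and the helper names `fetch_settings` and `uses_ranking_rule` are made up for illustration and are not part of Karakeep or Meilisearch.

```python
import json
import urllib.request

# Host, port and index name are taken from the curl command in this comment.
SETTINGS_URL = "http://localhost:7700/indexes/bookmarks/settings"


def fetch_settings(api_key: str, url: str = SETTINGS_URL) -> dict:
    """GET the index settings, mirroring the curl call (hypothetical helper)."""
    request = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {api_key}"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def uses_ranking_rule(settings: dict, rule: str) -> bool:
    """Return True if `rule` appears in the index's rankingRules list."""
    return rule in settings.get("rankingRules", [])
```

Usage would look like `uses_ranking_rule(fetch_settings("<your key>"), "exactness")`, run from somewhere that can reach the Meilisearch container.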


@thiswillbeyourgithub commented on GitHub (Sep 16, 2025):

I see that [by default](https://www.meilisearch.com/docs/learn/relevancy/custom_ranking_rules) the search cutoff is set at 1500ms. It might be useful to increase it and see if that solves your issue. Or maybe try changing the settings related to typo tolerance? Also, it turns out that you're doing [prefix search](https://www.meilisearch.com/docs/learn/engine/prefix), which makes use of data created at indexing time. That doesn't help directly, but it's worth knowing.
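
If someone wants to try those settings, a rough sketch of how the change could be sent is below. It assumes the same host and index name as the curl command earlier in the thread and uses the settings endpoint with `PATCH`; the helper name `build_settings_patch`, the key placeholder and the specific values are illustrative assumptions, and there is no guarantee that any of them affects substring matching.

```python
import json
import urllib.request


def build_settings_patch(base_url: str, index: str, api_key: str,
                         changes: dict) -> urllib.request.Request:
    """Build (but do not send) a partial settings update (hypothetical helper)."""
    return urllib.request.Request(
        f"{base_url}/indexes/{index}/settings",
        data=json.dumps(changes).encode("utf-8"),
        method="PATCH",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


# Example values only: a longer cutoff and slightly looser typo thresholds.
request = build_settings_patch(
    "http://localhost:7700",
    "bookmarks",
    "MEILI_KEY_PLACEHOLDER",
    {
        "searchCutoffMs": 3000,
        "typoTolerance": {"minWordSizeForTypos": {"oneTypo": 4, "twoTypos": 8}},
    },
)
# urllib.request.urlopen(request)  # uncomment to actually send the update
```

Building the request separately from sending it makes it easy to inspect the payload before touching a live index.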


@LinFei83 commented on GitHub (Sep 17, 2025):

@thiswillbeyourgithub Wow, thank you for the expert analysis! This is much more in-depth than I could have hoped for.
Your point about the Meilisearch tokenizer and the configuration settings sounds exactly like the core of the issue. I may not have the time to test a configuration change immediately, but your findings provide a critical clue for solving this.
Hopefully, the maintainers or other developers will see your analysis, as it will surely help them fix this much faster. Thank you so much!


@thiswillbeyourgithub commented on GitHub (Sep 17, 2025):

Thanks for the kind words!

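Rereading this: it appears that you search "6050" to look for "MPU6050", so this is not prefix search but typo search with 3 typos (the missing "M", "P" and "U"), at least if they count missing letters as typos: [source](https://www.meilisearch.com/docs/learn/relevancy/typo_tolerance_calculations).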
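
The gap between the two strings can be checked with a plain Levenshtein distance. This is only a sketch: it assumes each missing letter counts as one edit, which may not match Meilisearch's own typo counting exactly, and `edit_distance` is a name chosen here for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance: insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


# Number of single-character edits separating the query from the indexed term.
print(edit_distance("6050", "MPU6050"))
```

For comparison, the settings dump earlier in the thread shows `minWordSizeForTypos.oneTypo` at 5, so a four-character query word like "6050" would presumably get no typo allowance at all.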

So maybe this is just not possible for Meilisearch, as it would be similar to an exhaustive plaintext search, and they appear not to do this. Oh wait, maybe if you search for `"6050"` (with the quotation marks included) it will do that search?
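
To make the difference concrete, here is a toy model of prefix matching on tokens versus an exhaustive substring scan. It is not Meilisearch's implementation: the whitespace tokenizer, the function names and the sample title are all assumptions for illustration, and it assumes "MPU6050" is indexed as a single token, which matches the behaviour reported in this issue.

```python
def tokenize(text: str) -> list[str]:
    """Very rough stand-in for a tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def prefix_match(query: str, text: str) -> bool:
    """True if any token starts with the query (prefix search on tokens)."""
    needle = query.lower()
    return any(token.startswith(needle) for token in tokenize(text))


def substring_match(query: str, text: str) -> bool:
    """True if the query occurs anywhere in the text (exhaustive scan)."""
    return query.lower() in text.lower()


title = "MPU6050 accelerometer wiring guide"  # hypothetical bookmark title

print(prefix_match("MPU", title))      # matches: "mpu6050" starts with "mpu"
print(prefix_match("6050", title))     # no match: no token starts with "6050"
print(substring_match("6050", title))  # matches, but requires scanning the text
```

Under this model, "MPU" finds the bookmark while "6050" does not, which is the same pattern as in the original report.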
