mirror of
https://github.com/karakeep-app/karakeep.git
synced 2026-04-25 07:56:05 +03:00
[GH-ISSUE #111] [Feature request] Force AI to use existing tags (instead of creating them) #91
Open
opened 2026-03-02 11:46:30 +03:00 by kerem
·
25 comments
Originally created by @mowsat on GitHub (Apr 19, 2024).
Original GitHub issue: https://github.com/karakeep-app/karakeep/issues/111
An option in the settings for forcing the AI to use pre-existing tags would allow for more fine-tuned organization.
@MikeKMiller commented on GitHub (Apr 20, 2024):
Possibly existing tags could be passed with the content, and the AI API could return any that could apply, plus new ones only if none apply. This way it does not always come up with a new tag, even when the same thing already exists. For example, mine has these two tags that are exactly the same thing:
AI
Artificial Intelligence
If we passed the existing 'Artificial Intelligence' tag, it would have chosen it, and not created 'AI'
@MohamedBassem commented on GitHub (Apr 22, 2024):
This seems to be a popular request, so I'll probably have to implement it at some point. The main problem, though, is that the naive implementation will be expensive if you have a lot of tags. Basically, the naive implementation is that you pass all of the user's tags to OpenAI/Ollama on every request and ask it to select only from those tags. While this is easy to implement, every word you add to the AI request costs more money. So if you have 1,000 tags, for example, and every article you add is around 1,000 words, you'll end up paying twice as much per inference request. I'm happy to add this as a feature with a big warning about this limitation, but I'm not sure I like it.
The more advanced approach, which I'm planning to implement, is much more complex but will achieve the best result. At a high level, we'll have a mechanism to find the potentially relevant tags among all the existing tags and pass only those to OpenAI, making the request much cheaper. This will take a bit more time to implement, but it's on my radar.
Does that make sense?
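The cheaper approach described above (retrieve only the potentially relevant tags, then pass just those to the model) is usually done with embeddings. A minimal sketch follows; this is not Karakeep's actual implementation, and the toy 2-d vectors stand in for a real embedding model such as an OpenAI or sentence-transformers embedder:

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k_tags(content_vec, tag_vecs, k=20):
    """Return the k existing tags whose embeddings are closest to the
    content embedding; only these candidates go into the LLM prompt."""
    scored = sorted(
        tag_vecs.items(),
        key=lambda kv: cosine(content_vec, kv[1]),
        reverse=True,
    )
    return [tag for tag, _ in scored[:k]]

# Toy 2-d "embeddings" purely for illustration.
tags = {
    "machine-learning": [0.9, 0.1],
    "cooking": [0.1, 0.9],
    "politics": [0.5, 0.5],
}
article = [0.85, 0.2]  # an article about ML
print(top_k_tags(article, tags, k=2))
```

With, say, 20 candidate tags instead of 1,000, the per-request token overhead becomes negligible, which is the whole point of the retrieval step.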
@1d618 commented on GitHub (Jul 25, 2024):
With gpt-4o-mini, 3-5 thousand input tokens are extremely cheap, and it seems to me the price will only go down in the future. And that is not to mention the use of local open-source models.
By the way, can I ask you a question? Are there plans to add a function to summarise the content of the added page and use this summarisation in search?
@ant1fr commented on GitHub (Aug 10, 2024):
To address these near-duplicate tags, I suggest a few potential solutions:
Additionally, a complementary approach could involve periodic tag review and standardization: running a specific prompt that provides ChatGPT with all the AI-generated tags and asks it to suggest merges, clean-ups, and standardizations.
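A periodic clean-up pass like this doesn't necessarily need an LLM for the obvious cases; a string-similarity heuristic can already group near-duplicates such as singular/plural pairs before asking a model about the harder ones. A minimal sketch (the threshold value is illustrative):

```python
from difflib import SequenceMatcher

def near_duplicates(tags, threshold=0.85):
    """Return pairs of tags whose normalized names are very similar
    (e.g. 'meeting' vs 'meetings') as merge candidates."""
    pairs = []
    norm = {t: t.lower().replace("-", " ") for t in tags}
    for i, a in enumerate(tags):
        for b in tags[i + 1:]:
            ratio = SequenceMatcher(None, norm[a], norm[b]).ratio()
            if ratio >= threshold:
                pairs.append((a, b))
    return pairs

print(near_duplicates(["meeting", "meetings", "cooking", "recipes"]))
```

Anything this heuristic catches can be merged mechanically; only the remaining semantic duplicates ("AI" vs "Artificial Intelligence") would need a model's opinion.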
@devbydaniel commented on GitHub (Oct 13, 2024):
Another way to achieve a cleaner tag collection could revolve around using function calling to get the tags (see also my suggestion in #529 before I saw this thread 😅). The function to pass to the LLM would be something like
The upside of this approach is that you get more flexibility and accuracy from a strict data model including enums; the downside is that it only works with models capable of function calling.
The same mechanism can be applied if the AI should only select from existing tags and should not use new tags at all (which I would prefer tbh).
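The function-calling idea above can be sketched as an OpenAI-style tool definition whose enum is the user's existing tags. The tool and field names here are illustrative, not Karakeep's actual schema, and as noted this only works with models that support tool/function calling:

```python
import json

def build_tagging_tool(existing_tags):
    """Build an OpenAI-style tool definition that constrains the
    model to choose tags from an enum of existing tag names."""
    return {
        "type": "function",
        "function": {
            "name": "attach_tags",
            "description": "Attach tags to the bookmark.",
            "parameters": {
                "type": "object",
                "properties": {
                    "tags": {
                        "type": "array",
                        # The enum is what forces reuse of existing tags.
                        "items": {"type": "string", "enum": sorted(existing_tags)},
                    },
                },
                "required": ["tags"],
            },
        },
    }

tool = build_tagging_tool(["Artificial Intelligence", "cooking", "politics"])
print(json.dumps(tool, indent=2))
```

Dropping the `enum` (or adding a second free-text array parameter) would give the hybrid behaviour discussed earlier: prefer existing tags, allow new ones only when none apply.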
Regarding the price point: as of now, 10k characters are roughly 2.5k-3k tokens, which cost about $0.00045 with GPT-4o-mini. So in my opinion, adding one or two thousand more characters to a prompt would not make much of a difference.
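That estimate checks out as a rough sanity calculation, using GPT-4o-mini's published $0.15 per million input tokens and the usual ~4-characters-per-token heuristic:

```python
PRICE_PER_INPUT_TOKEN = 0.15 / 1_000_000  # GPT-4o-mini input pricing, USD

def prompt_cost(chars, chars_per_token=4):
    # Rough input-cost estimate from character count.
    return (chars / chars_per_token) * PRICE_PER_INPUT_TOKEN

base = prompt_cost(10_000)       # a 10k-char article, ~2.5k tokens
with_tags = prompt_cost(12_000)  # plus ~2k chars of existing tags
print(f"${base:.6f} -> ${with_tags:.6f}")
```

Even a couple of thousand extra characters of tag names adds well under a tenth of a cent per request at these rates, though the earlier caveat stands: with thousands of tags and a more expensive model, the overhead is no longer negligible.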
@MohamedBassem commented on GitHub (Nov 24, 2024):
I've just merged github.com/hoarder-app/hoarder@fdf28ae19a from @kamtschatka, which allows you to include existing tags in the custom prompts. This can allow you to instruct the tagging to only pick one of the existing tags. You need to be VERY careful when using this, because it can make the cost of your prompts explode. I'll not advertise this too much, because I believe the embeddings-based approach is still better, but this can be a stop-gap for now.
@MikeKMiller commented on GitHub (Nov 24, 2024):
Thank you. I will probably delete all the tags and re-infer them, as my install now has 2000 tags, of which 99% relate to only one link.
@MohamedBassem commented on GitHub (Nov 24, 2024):
@MikeKMiller Instead of deleting all of them, you can just use $userTags and re-run the tagging, which should attempt to include only the tags that you've manually tagged at least once.
@stanstrup commented on GitHub (Dec 5, 2024):
This is great! Is there any way to see the final prompt? The preview seems to show the placeholder, unless I am doing something wrong.
@kamtschatka commented on GitHub (Dec 5, 2024):
No, you can't see the final prompt. That would have required some additional requests to get it into the UI, and I figured it would not add any benefit, since it would clearly just take all the tags you can see on the tags page.
@stanstrup commented on GitHub (Dec 5, 2024):
It would just give some peace of mind that it is doing what you think it is doing. Could you write the queries and responses to a log file?
@thiswillbeyourgithub commented on GitHub (May 8, 2025):
Hi,
I have around 2000 bookmarks, but the LLMs generated about 10k tags.
Hence a couple of remarks:
For example, the model could return [{"law": 1, "education": 0, "politics": 1}], with the input specifying those tags.
I think this issue should be assigned a fairly medium/high priority: I've not been using Karakeep for very long (although I imported stuff), yet I was surprised to see it hanging because of the 10k tags, which will inevitably keep growing.
All in all, there are some quick fixes that would go a long way, and among the solutions I outlined, some seem pretty straightforward and not overly complex to pull off.
I might code a fix using my Karakeep Python API to load the tags into Python and then apply some heuristics like the ones I mentioned.
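The per-tag relevance output suggested above is straightforward to consume. A sketch, assuming the model is prompted to return a JSON list containing one object of 0/1 scores per candidate tag:

```python
import json

def relevant_tags(llm_response: str):
    """Parse a response like '[{"law": 1, "education": 0}]' and keep
    only the tags the model scored as relevant (1)."""
    scores = json.loads(llm_response)[0]
    return [tag for tag, score in scores.items() if score == 1]

print(relevant_tags('[{"law": 1, "education": 0, "politics": 1}]'))  # -> ['law', 'politics']
```

Because the candidate tags are supplied in the prompt and merely scored in the response, the model cannot invent new tag names at all, which sidesteps the duplicate-tag problem entirely for the constrained mode.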
@thiswillbeyourgithub commented on GitHub (May 14, 2025):
Update on point 8 above: there is actually sort of a way to know whether an Ollama model is a thinking model:
curl -s http://localhost:11434/api/show -d '{"model": "qwen3:8b", "verbose": true}' | jq '.["model_info"]["tokenizer.ggml.tokens"]'
This shows the model's vocabulary. If you look towards the end, you will (usually) see entries like:
"<|endoftext|>", "<|im_start|>", "<|im_end|>", "<tool_call>", "</tool_call>", "<tool_response>", "</tool_response>", "<think>", "</think>", "[PAD151669]", ... "[PAD151935]"
which contains <think> and </think>. But regardless of whether a model was actually trained to use those tokens, prompting it to use them helps. This is especially true for the structured output format: using a "think" key as the first key always seems to help, in my experience.
@thiswillbeyourgithub commented on GitHub (May 18, 2025):
I have some ideas and I'm interested in opinions. It needs embeddings to be set up, though.
Prerequisites:
What should happen when the LLM tags a bookmark:
Not sure if those N tags should include only AI-generated tags or also human-generated ones; I think the former, because the latter could leak personal information.
I am not the least experienced in prompting and have already used this kind of setup in wdoc, my RAG library, with very good results, but I know that "LLM as a judge" is a field in itself, so please share if you have experiences.
Thoughts? Criticism welcome.
In my opinion, although it's a bit long to explain in detail, it's actually straightforward and not overly complex. I don't think we can have "good tags" when hoarding stuff without this kind of setup anyway.
@richardgaywood commented on GitHub (Jun 10, 2025):
Noodling on this....
Most of the time, I want a new thing to be associated with tags I already have. Less often, I want to add a new tag to cover some new area, and that should be as convenient as possible. So how about something along the lines of:
So the core set of tags does not grow unbounded with dupes and overlaps, but the extended tags are still there for when I realise the last 10 things I added needed a tag in common that I didn't have. The system is always keeping an eye out for new tags I could be using, but I am in control of which ones are in use.
EDIT TO ADD: and now, too late, I notice that github.com/karakeep-app/karakeep@fdf28ae19a already distinguishes between user and AI tags...! loooool I am such an idiot.
@peroksid5 commented on GitHub (Jun 16, 2025):
Hi. I've been playing with adding rules to prompts to use only my predefined tag list (or at least make those tags preferable) by using the $userTags placeholder. It doesn't really work (or at least I can't make it work), as the AI tagger still prefers its own tags (and with my test suite of 20 articles I always get almost 100 new tags). The idea was to just delete the AI tags periodically and keep the user defined ones.
The problem is that newly generated tags don't get added to $aiTags but to $userTags, which is visible in /dashboard/tags: of the last batch of 71 AI-created tags, 69 got added to "Your Tags" and only 2 to "AI Tags".
Is this a bug or expected behaviour?
@richardgaywood commented on GitHub (Jun 16, 2025):
It seems to work OK for me, @peroksid5. How do you have your prompt set up? I added two rules to text tagging prompts under the "AI Settings" page:
Check if any of the previously used tags are relevant for this content. Reuse them when appropriate to maintain consistency.
Here are your previously used tags that you should prioritize when appropriate: $userTags
When I look at any given article, the correctly reused tags do show up under the "AI Added" kind.
But on my tags page, they are correctly listed and tallied under "Your Tags" (and they are correctly reused for future tag generation).
I plan to occasionally review the AI generated ones and add a new curated tag if I notice one is needed, then drag-n-drop multiple "AI" tags into the curated one to merge them. I don't mind the "AI" going nuts and adding tags at random as long as they're segmented away from my curated ones.
@peroksid5 commented on GitHub (Jun 16, 2025):
I've used your prompts and it works OK (with the tags correctly used and segmented between User and AI). It seems that by making the prompt too complex I managed to break my own tagging. Thank you. :)
@sprior commented on GitHub (Jun 21, 2025):
Not the entire solution, but tags with only one or two bookmarks aren't useful yet, though they could become useful if more bookmarks later use them. I propose that by default the tags screen should not display any AI tags with fewer than a threshold number of bookmarks (potentially user-configurable; 3 might be a good default). That would clean up a lot of clutter as an easy fix.
@pdc1 commented on GitHub (Jul 24, 2025):
I've been working on prompts to avoid too many duplicate tags as well. It is VERY challenging to get the LLM to comply; it really has its own rules for summarizing.
That said, here are the custom prompts I use that help the situation. I still get a lot of one-offs, but in general it does a good job of picking some of my preferred tags as well. I found that if I constrained the LLM too much, the results were less useful. To handle the one-offs, I wrote a small script using the API to remove stragglers (tags with fewer than 3 bookmarks works for me).
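A straggler-cleanup script like the one described is mostly a filter over the tag list. The selection logic can be kept separate from the HTTP calls; the field names below are illustrative, not Karakeep's actual API shape, and the actual deletion would go through the REST API or a client library:

```python
def straggler_tags(tags, min_bookmarks=3, protect_user_tags=True):
    """Select AI tags attached to fewer than `min_bookmarks` bookmarks.
    `tags` is a list of dicts such as a tag-listing call might return;
    the dict keys here are hypothetical."""
    doomed = []
    for tag in tags:
        # Never delete tags a human attached, unless explicitly allowed.
        if protect_user_tags and tag.get("attached_by") == "human":
            continue
        if tag["bookmark_count"] < min_bookmarks:
            doomed.append(tag["name"])
    return doomed

tags = [
    {"name": "AI", "bookmark_count": 1, "attached_by": "ai"},
    {"name": "cooking", "bookmark_count": 12, "attached_by": "ai"},
    {"name": "taxes", "bookmark_count": 1, "attached_by": "human"},
]
print(straggler_tags(tags))  # -> ['AI']
```

Running this periodically (and reviewing the list before deleting) keeps the one-off noise down without constraining the model at inference time.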
I actually use a fixed list instead of $userTags, but it should work well with $userTags, I just haven't bothered "promoting" all my preferred tags to user tags. You can see I also had some problems where the LLM tags everything involving food as also cooking and a recipe 😄.
This is using ollama llama3.1:8b:
@sprior commented on GitHub (Jul 24, 2025):
@pdc1 A minor addition I'd recommend is telling it your preferred singular or plural form for tags.
With the default prompts, I end up getting tag-merge suggestions like "meeting" to be merged with "meetings".
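Singular/plural merges like "meeting" vs "meetings" are the easiest kind to catch automatically. A naive sketch that folds a plural into an existing singular (it deliberately ignores irregular plurals, and is an illustration rather than anything Karakeep does today):

```python
def fold_plurals(tags):
    """Map each tag to its singular form when the naive singular
    ('meetings' -> 'meeting') already exists in the tag set."""
    tag_set = set(tags)
    mapping = {}
    for tag in tags:
        if tag.endswith("s") and tag[:-1] in tag_set:
            mapping[tag] = tag[:-1]
        else:
            mapping[tag] = tag  # keep as-is ('news' stays 'news')
    return mapping

print(fold_plurals(["meeting", "meetings", "news"]))
```

Requiring that the singular already exists keeps false positives like "news" → "new" out, at the cost of missing plurals whose singular was never used.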
@Gaibhne commented on GitHub (Sep 1, 2025):
As this issue appears not to be 'fixed', I am confused by the discussions above. Are you using a fork, or is there something one can do with the current version to make what is being discussed above work? At the moment, AI tagging seems pretty unusable; I only added a dozen bookmarks and have a ton of single-use tags, leaving me with nothing usable. Even my six added recipes didn't end up with a shared tag (three 'cooking', two 'recipes' and one 'thermomix'), so I'm not sure how this feature is supposed to be used at the moment.
@kamtschatka commented on GitHub (Sep 1, 2025):
You can get something working by using the placeholders described in the docs ($tags, $aiTags, $userTags) and adding some custom instructions to the prompt: https://docs.karakeep.app/configuration
The problem is that the pre-existing prompt has quite a few things most people with custom prompts do not want.
Unfortunately, the pre-existing prompt cannot be modified, so the results for me are very suboptimal and I have to resort to a lot of manual tagging.
@josias-r commented on GitHub (Nov 10, 2025):
I'm not sure if this was brought up already, but wouldn't allowing tag aliases indirectly solve this?
So if you alias "AI" with "Artificial Intelligence", the LLM's choice doesn't really matter anymore.
You already have tag merging; why not just keep the alias around in the DB, so future recurrences won't need to be merged again?
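The alias idea above amounts to a persistent alias→canonical map consulted whenever the model proposes a tag, so a previously merged name resolves automatically. A minimal sketch (the names are illustrative, not an existing Karakeep feature):

```python
def resolve_tag(name, aliases):
    """Follow alias chains to the canonical tag name."""
    seen = set()
    while name in aliases and name not in seen:
        seen.add(name)  # guard against accidental alias cycles
        name = aliases[name]
    return name

# Alias table as it might be persisted after two merges.
aliases = {"AI": "Artificial Intelligence", "ML": "Machine Learning"}
print(resolve_tag("AI", aliases))       # -> 'Artificial Intelligence'
print(resolve_tag("cooking", aliases))  # -> 'cooking'
```

As the next comment points out, though, this only helps when the model re-emits a name that has been aliased before; it does nothing for the stream of never-seen-before variants.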
@Yasand123 commented on GitHub (Dec 23, 2025):
Not really. The AI can come up with a lot of potential tags that are ever so slightly different; it's too "creative". Having "AI" and "Artificial Intelligence" as the example is not realistic; it's too ideal. In reality it's much messier than this, to the point where aliases are not feasible, because it keeps creating slightly different tags. Are you gonna just keep adding an infinite number of aliases?
At this point a proper solution would be much less time consuming than this workaround.