[GH-ISSUE #111] [Feature request] Force AI to use existing tags (instead of creating them) #91

Open
opened 2026-03-02 11:46:30 +03:00 by kerem · 25 comments
Owner

Originally created by @mowsat on GitHub (Apr 19, 2024).
Original GitHub issue: https://github.com/karakeep-app/karakeep/issues/111

An option in the settings for forcing AI to use pre-existing tags would allow for more fine-tuned organization

@MikeKMiller commented on GitHub (Apr 20, 2024):

Possibly having existing tags be passed with the content, and have the AI api return any that 'could' apply, and new ones if 'none' apply. This way it does not just always come up with a new one, even if the same thing already exists. For example, mine has these two tags, that are exactly the same thing:
AI
Artificial Intelligence

If we passed the existing 'Artificial Intelligence' tag, it would have chosen it, and not created 'AI'

@MohamedBassem commented on GitHub (Apr 22, 2024):

This seems to be a popular request, so i'll probably have to implement it at some point. The main problem though is that the naive implementation will be expensive if you have a lot of tags. Basically, the naive implementation is that you pass all the tags of the user to openai/ollama on every request and ask it to only select from those tags. While this is easy to implement, every word you add to the AI request basically costs more money. So if you have 1000 tags for example, and every article you add is around 1000 words, you'll end up paying twice as much per inference request. I'm happy to add this as a feature with a big warning about this limitation but I'm not sure I like it.

The more advanced approach which I'm planning to implement is much more complex but will achieve the best result. The way it works from a high level is that we'll have a mechanism to find the potentially relevant tags from all the existing tags and pass only those to OpenAI making the request much cheaper. This will take a bit more time to implement though, but it's on my radar.

Does that make sense?
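To make the prefiltering idea concrete, here is a rough Python sketch: rank the user's existing tags by embedding similarity to the article and pass only the top few to the model. All names (`cosine`, `top_k_tags`) and the plain-list vectors are invented for illustration; this is not Karakeep code.

```python
import math

def cosine(a, b):
    # Plain cosine similarity between two float vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k_tags(article_vec, tag_vecs, k=30):
    """Rank existing tags by similarity to the article and keep only the
    top k, so the inference prompt stays small regardless of tag count."""
    ranked = sorted(tag_vecs.items(),
                    key=lambda kv: cosine(article_vec, kv[1]),
                    reverse=True)
    return [tag for tag, _ in ranked[:k]]
```

Only the shortlisted tags would then be included in the tagging prompt, keeping the per-request token cost roughly constant even with thousands of tags.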

@1d618 commented on GitHub (Jul 25, 2024):

> While this is easy to implement, every word you add to the AI request basically costs more money. So if you have 1000 tags for example, and every article you add is around 1000 words, you'll end up paying twice as much per inference request. I'm happy to add this as a feature with a big warning about this limitation but I'm not sure I like it.

With GPT-4o mini, 3-5 thousand input tokens are extremely cheap, and it seems to me the price will only go down in the future. And that's not to mention the use of local open-source models.

By the way, can I ask you a question? Are there plans to add a function to summarise the content of the added page and use that summary in search?

@ant1fr commented on GitHub (Aug 10, 2024):

To address these near-duplicate tags, I suggest a few potential solutions:

  • Prompt Adjustment: Modify the prompt to include existing tags and instruct ChatGPT to select from them when appropriate. Pros: This approach keeps tagging fully automated and more consistent. Cons: It could increase token usage over time.
  • User-Validated Tags: Enhance the tag management system to allow users to "validate" AI-generated tags by dragging them from "AI Tags" to "My Tags." These user-approved tags could then be used in future prompts. Pros: Less token-intensive than option 1. Cons: Requires user involvement, which may be time-consuming.

Additionally, a complementary approach could involve periodic tag review and standardization. This would entail running a specific prompt that provides ChatGPT with all the AI-generated tags, asking it to suggest merges, clean up, and standardize the tags.

@devbydaniel commented on GitHub (Oct 13, 2024):

Another way to achieve a cleaner tag collection could revolve around using [function calling](https://platform.openai.com/docs/guides/function-calling) as the method to get the tags (see also my suggestion in #529 before I saw this thread 😅 ). The function to pass to the LLM would be something like

```
{
    "name": "assign_tags",
    "description": "Assign tags to the given bookmark according to the content. You can assign existing tags which are already used by the user and new tags which the user does not use yet. Use the 'existing_tags' property whenever possible and only introduce new tags if no existing tag fits. If no tags fit the bookmark, return an empty array for both 'new_tags' and 'existing_tags'",
    "parameters": {
        "type": "object",
        "properties": {
            "existing_tags": {
                "type": "array",
                "description": "Through the enum you see all tags which the user already defined. Use these tags preferably.",
                "items": {
                    "type": "string",
                    "enum": ["tag1", "tag2", "tag3"] // dynamically load user defined tags here
                }
            },
            "new_tags": {
                "type": "array",
                "description": "The array of new tags which should be assigned to the bookmark. Only use these if no existing tag fits. Return an empty array if you don't need to use these.",
                "items": {
                    "type": "string"
                }
            }
        },
        "required": ["existing_tags", "new_tags"],
        "additionalProperties": false
    }
}
```

The upside of this approach is that you get more flexibility and accuracy by having a strict data model including enums, the downside is that this only works with models capable of function calling.

The same mechanism can be applied if the AI should only select from existing tags and should not use new tags at all (which I would prefer tbh).

Regarding the price point: As of now, 10k characters are roughly 2.5k - 3k tokens which as of now cost $0.00045 with ChatGPT 4o-mini. So in my opinion, adding one or two thousand characters more to a prompt would not make much of a difference.
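As a sketch of how the enum in the schema above could be filled in at request time, here is a small Python helper that builds the function definition from the user's current tag list. The helper name is invented for illustration; it only constructs the dict and does not call any API.

```python
def build_assign_tags_schema(user_tags):
    """Build an assign_tags function-calling schema, injecting the user's
    existing tags into the enum so the model can only pick from them."""
    return {
        "name": "assign_tags",
        "parameters": {
            "type": "object",
            "properties": {
                "existing_tags": {
                    "type": "array",
                    # The enum is regenerated per request from the user's tags.
                    "items": {"type": "string", "enum": list(user_tags)},
                },
                "new_tags": {
                    "type": "array",
                    "items": {"type": "string"},
                },
            },
            "required": ["existing_tags", "new_tags"],
            "additionalProperties": False,
        },
    }
```

To force the model to use only existing tags, `new_tags` could simply be dropped from the schema.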

@MohamedBassem commented on GitHub (Nov 24, 2024):

I've just merged https://github.com/hoarder-app/hoarder/commit/fdf28ae19ac8d7314bfa6c5d24fdcbabba0aee32 from @kamtschatka which allows you to include existing tags in the custom prompts. This can allow you to instruct the tagging to only pick one of the existing tags. You need to be VERY careful when using this because it can make the cost of your prompts explode. I'll not advertise this too much, because I believe that the embeddings based approach is still better, but this can be a stopgap for now.

@MikeKMiller commented on GitHub (Nov 24, 2024):

Thank you. I will probably delete all the tags, and re-infer them as my install now has 2000 tags, of which 99% relate to one link.

@MohamedBassem commented on GitHub (Nov 24, 2024):

@MikeKMiller Instead of deleting all of them, you can just use `$userTags` and re-run the tagging, which should attempt to include only the tags that you've manually tagged at least once.

@stanstrup commented on GitHub (Dec 5, 2024):

This is great! Is there any way to see the final prompt? The preview seems to show the placeholder unless I am doing something wrong.

@kamtschatka commented on GitHub (Dec 5, 2024):

No, you can't see the final prompt. That would have required some additional requests to get it into the UI, and I figured it would not add any benefit, as it would be clear that it just takes all the tags you can see on the tags page.

@stanstrup commented on GitHub (Dec 5, 2024):

It would just give some peace of mind that it is doing what you think it is doing. Could the queries and responses be written to a log file?

@thiswillbeyourgithub commented on GitHub (May 8, 2025):

Hi,

I have around 2000 bookmarks, but the LLMs generated about 10k tags.

Hence a couple of remarks:

  1. When going to the webpage that displays the tags: the UI practically hangs because it tries to display them all instead of lazy loading like the bookmarks page does. This will need to be fixed at some point because karakeep aims specifically at hoarders who can end up with way too many pages. Edit: created an issue to track this in #1404
  2. I'd say 99% of the tags apply to only one bookmark because they are too specific, which makes them useless. I think the prompt could be improved to ask for only broad categories, like "law" instead of "american inheritance law". Edit: I had not seen there was actually a cleanup page to merge the tags already!
  3. I don't think tags like "law" and "laws" are merged, but I think they should be. More generally, if two tags share a stem we should probably merge them. This should be easy to implement and buy us some time. Edit: had not noticed there was a cleanup page where this could be addressed.
  4. More generally: when a tag contains a previous tag, I think they should be merged. For example, if "law" exists and some AI suggests "inheritance law", the easy fix would be to just use the tag "law" instead. This could be done via heuristics instead of using an LLM as a judge. The harder, more ambitious solution would be to allow a nested tagging hierarchy: "law > inheritance law".
  5. I believe it should be pretty easy to gather the 25 closest (embedding-wise) tags for a given new bookmark, then use structured output to let the LLM answer like so: {"law": 1, "education": 0, "politics": 1}. Of course with the input specifying those tags.
  6. It would be great to set the granularity of AI tags via one simple knob in the settings: the number of words. For example, 1: "law", 2: "inheritance law", 3: "american inheritance law". The default would be 1; I'd set it to 2 personally. Backend-wise it would just appear in the prompt as "make tags only {n} words long".
  7. You were concerned about the cost ballooning. In that case, if summaries are enabled by the user, we should use the summary instead of the full text when generating tags.
  8. In general: when using structured output with thinking LLMs, it helps to have them provide a "thinking" first key in their JSON output which can then be discarded. There does not appear to be a way for the user to specify whether the LLM supports thinking; I think there should be, because that has an impact on prompting. Edit: implemented in PR #1474

I think this issue should be assigned a fairly medium/high priority: I've not been using karakeep for very long (although I imported stuff), yet I was surprised to see it hanging because of the 10k tags that will inevitably keep growing.

All in all, there are some quick fixes that would go a long way, and among the solutions I outlined, some seem pretty straightforward and not overly complex to pull off.

I might code a fix using my karakeep python api to load the tags into Python and then apply some heuristics like the ones I mentioned.
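The stem/substring merge heuristics from points 3-4 could look something like the following Python sketch. The function name and the exact matching rules are invented for illustration (a real implementation would want proper stemming and case handling); this is not Karakeep's cleanup logic.

```python
def merge_heuristic(tags):
    """Map each tag to a canonical tag, collapsing near-duplicates:
    trivial plural variants ("law"/"laws") and tags that contain an
    existing shorter tag as a word ("inheritance law" -> "law")."""
    canonical = []
    mapping = {}
    for tag in sorted(tags, key=len):  # shortest first, so broad tags win
        t = tag.lower().strip()
        hit = next((c for c in canonical
                    if t == c                 # exact duplicate
                    or t == c + "s"           # plural of existing tag
                    or c == t + "s"           # existing tag is the plural
                    or c in t.split()), None) # contains an existing tag as a word
        if hit:
            mapping[tag] = hit
        else:
            canonical.append(t)
            mapping[tag] = t
    return mapping
```

The returned mapping could then drive a batch rename/merge pass over existing bookmarks.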

@thiswillbeyourgithub commented on GitHub (May 14, 2025):

Update:

  1. I had somehow missed the whole "tag cleanup" page; I think it should also be linked at the top of the "tags" page. I edited the comment above. Are the merge rules remembered and automatically re-applied or not? It seems that the "ignore" button is not remembered.
  2. Contrary to what I said in point 8 above: there is actually sort of a way to know if an ollama model is a thinking model: `curl -s http://localhost:11434/api/show -d '{"model": "qwen3:8b", "verbose": true}' | jq '.["model_info"]["tokenizer.ggml.tokens"]'` shows its vocabulary. If you look towards the end you (usually) see the special tokens, e.g. `"<|endoftext|>", "<|im_start|>", "<|im_end|>", ..., "<tool_call>", "</tool_call>", ..., "<tool_response>", "</tool_response>", "<think>", "</think>", "[PAD151669]", ...`, which contains `<think>` and `</think>`. But regardless of whether a model was actually trained to use those tokens, prompting it to use `<think>` helps. This is especially true regarding the structured output format: using a `think` first key always seems to help in my experience.
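The vocabulary check above could be automated with a small helper like this Python sketch. The function name is invented; in practice the token list would come from the `model_info` field of Ollama's `/api/show` response, as in the `curl` command.

```python
def supports_thinking(vocab_tokens):
    """Heuristic: a model whose tokenizer vocabulary contains the
    <think>/</think> pair was likely trained as a 'thinking' model."""
    return "<think>" in vocab_tokens and "</think>" in vocab_tokens
```

A tagging backend could use this to decide whether to ask for a leading `think` key in the structured output.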
@thiswillbeyourgithub commented on GitHub (May 18, 2025):

I have some ideas, I'm interested in opinions. It needs embeddings to be setup though.

Pre requisites:

  1. get embeddings working (#834 )
  2. add a "recreate tag embeddings" in the background jobs settings page

What should happen when LLM tagging a bookmark:

  1. create the embedding of the bookmark's content
  2. create the summary of the content (this makes the next steps way faster)
  3. find the N most similar tags using embedding search (tags <-> summary), say N=30 (note: with a hardcoded lower threshold to avoid matching inappropriately when we bookmark something really new. This bound should probably be automatically calibrated per embedding model: for example, using the distance between the tags "history" and "computer science" and the header of the napoleon wikipedia page, as models can differ wildly in how they measure similarity.)
  4. Group those 30 tags into batches of 5, use structured output (each tag as one key) to ask the LLM on a scale of 1 to 10 if it thinks that the tag is a good fit for that article.
  5. Out of the 30, keep the 5 highest scores (with a user defined lower bound, default to 8).
  6. If we are left with fewer than 5 tags, we ask the LLM to produce the remaining ones, mentioning the already-selected tags in the prompt.
  7. If we have 5 already, we still ask the LLM to produce a new one, then ask it again to rate the appropriateness of the 6 tags and keep the 5 best. This would help avoid the tags being too influenced by your previous bookmarks. It also allows tracing the effectiveness of that setup in the logs.

Not sure if those N tags should include only AI generated or also human generated ones, I think the former because the latter could leak personal information.

I am not inexperienced in prompting and have already used this kind of setup in wdoc, my RAG library, with very good results. But I know that LLM-as-a-judge is a field in itself, so please share if anyone has experience with it.

Thoughts? Criticism welcome.

In my opinion, although it's a bit long to explain in detail, it's actually straightforward and not overly complex. I don't think we can have "good tags" when hoarding stuff without this kind of setup anyway.
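Steps 4-5 of the proposal above (batch scoring with an LLM judge, then thresholding) can be sketched in Python like this. The judge is stubbed as an injectable callable standing in for a structured-output LLM call; all names and defaults mirror the proposal but are illustrative only.

```python
def pick_tags(candidates, judge, batch_size=5, keep=5, min_score=8):
    """Score candidate tags in small batches with an LLM judge (here a
    callable returning {tag: score 1..10} per batch, as structured output
    would), then keep the highest-scoring tags above a threshold."""
    scores = {}
    for i in range(0, len(candidates), batch_size):
        batch = candidates[i:i + batch_size]
        scores.update(judge(batch))  # one structured-output request per batch
    good = [t for t, s in scores.items() if s >= min_score]
    return sorted(good, key=lambda t: scores[t], reverse=True)[:keep]
```

Batching keeps each judging prompt small, and the threshold (`min_score`) is the user-defined lower bound from step 5.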

@richardgaywood commented on GitHub (Jun 10, 2025):

> Basically, the naive implementation is that you pass all the tags of the user to openai/ollama on every request and ask it to only select from those tags. While this is easy to implement, every word you add to the AI request basically costs more money. So if you have 1000 tags for example, and every article you add is around 1000 words, you'll end up paying twice as much per inference request.

Noodling on this....

Most of the time, I want a new thing to be associated with tags I already have. Less often, I want to add a new tag to cover some new area, and that should be as convenient as possible. So how about something along the lines of:

  • There are two kinds of tags in the system: ones the user has made, and ones that were added by an LLM. The former, we can probably assume, will be a manageable number. Call these the "core" and "extended" tags.
  • When the LLM is prompted, only the core tags are passed in. The prompt tells the LLM to use those tags wherever possible, but also to try and guess some other tags that might apply. These are applied to the new bookmark as core and extended tags.
  • A user can choose to "promote" one of the extended tags into the core set. This means it will be used for later tagging on new bookmarks.
  • (Optional extension) The system displays the extended tags in their own part of the Tags page so users can review them for addition to the core set.
  • (Optional extension) When core tags change, maybe rescan old content to see if they fit the new tags somehow.

So the core set of tags does not grow unbounded with dupes and overlaps, but the extended tags are still there for when I realise the last 10 things I added needed a tag in common but I didn't have one. The system is always keeping an eye out for new tags I could be using, but I am in control of which ones are in use.
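The core/extended split described above could be modelled with something as simple as a flag per tag, as in this Python sketch. Field and function names are invented for illustration; this is not Karakeep's data model.

```python
def build_prompt_tags(tags):
    """Only 'core' tags (user-made or promoted) are sent to the LLM,
    keeping the prompt size bounded."""
    return [t["name"] for t in tags if t["core"]]

def promote(tags, name):
    """Promote an extended (AI-created) tag into the core set, so it
    is used for tagging future bookmarks."""
    for t in tags:
        if t["name"] == name:
            t["core"] = True
    return tags
```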

EDIT TO ADD: and now, too late, I notice that github.com/karakeep-app/karakeep@fdf28ae19a already distinguishes between user and AI tags...! loooool I am such an idiot.

@peroksid5 commented on GitHub (Jun 16, 2025):

Hi. I've been playing with adding rules to prompts to use only my predefined tag list (or at least make those tags preferable) by using the $userTags placeholder. It doesn't really work (or at least I can't make it work), as the AI tagger still prefers its own tags (and with my test suite of 20 articles I always get almost 100 new tags). The idea was to just delete the AI tags periodically and keep the user defined ones.

The problem is that newly generated tags don't get added to $aiTags but to $userTags, which is visible in /dashboard/tags: of the last batch of 71 AI-created tags, 69 got added to "Your Tags" and only 2 to "AI Tags".

Is this a bug or expected behaviour?


@richardgaywood commented on GitHub (Jun 16, 2025):

It seems to work OK for me, @peroksid5. How do you have your prompt set up? I added two rules to text tagging prompts under the "AI Settings" page:

  1. Check if any of the previously used tags are relevant for this content. Reuse them when appropriate to maintain consistency.
  2. Here are your previously used tags that you should prioritize when appropriate: $userTags

When I look at any given article, the correctly reused tags do show up under the "AI Added" kind:

![Image](https://github.com/user-attachments/assets/c4309e49-25b0-457b-8df4-3d1fe6677066)

But on my tags page, they are correctly listed and tallied under "your tags" (and they are correctly reused for future tag generation)

![Image](https://github.com/user-attachments/assets/e93faf79-e92b-4416-9f05-de84352ddf43)

I plan to occasionally review the AI generated ones and add a new curated tag if I notice one is needed, then drag-n-drop multiple "AI" tags into the curated one to merge them. I don't mind the "AI" going nuts and adding tags at random as long as they're segmented away from my curated ones.


@peroksid5 commented on GitHub (Jun 16, 2025):

I've used your prompts and it works OK (with the tags correctly used and segmented between User and AI). It seems that by making the prompt too complex I'd managed to break my own tagging. Thank you. :)


@sprior commented on GitHub (Jun 21, 2025):

Not the entire solution, but tags with only one or two bookmarks aren't useful yet, though they could become useful if more bookmarks later use them. I propose that, by default, the tags screen should not display AI tags with fewer than a threshold number of bookmarks (potentially user-configurable; 3 seems a good default). That would clean up a lot of clutter as an easy fix.


@pdc1 commented on GitHub (Jul 24, 2025):

I've been working on prompts to avoid too many duplicate tags as well. It is VERY challenging to get the LLM to comply; it really has its own rules for summarizing.

That said, here are the custom prompts I use that help the situation. I still get a lot of one-offs, but in general it does a good job at picking some of my preferred tags as well. I found that if I constrained the LLM too much, the results were less useful. To handle the one-offs, I wrote a small script using the API to remove stragglers (tags with fewer than 3 bookmarks works for me).

I actually use a fixed list instead of $userTags, but it should work well with $userTags; I just haven't bothered "promoting" all my preferred tags to user tags. You can see I also had some problems where the LLM tags everything involving food as also cooking and a recipe 😄.

This is using `ollama` with `llama3.1:8b`:

  • The total number of tags must not exceed 5.
  • Use tags from this preferred list when possible: $userTags.
  • If at least 3 preferred tags apply, do not add a new tag.
  • If fewer than 3 preferred tags apply, you may include **one new tag**, but only if it adds a clearly distinct concept.
  • Use only one tag for closely related ideas (e.g., for "recipe", "cooking", and "food", choose the most relevant).
  • All tags must be lowercase, use spaces instead of punctuation or camelCase, and be formatted as JSON: {"tags": ["tag1", "tag2", ...]}.
  • Do not use generic tags like "technology" or "computing". Use specific categories instead.
  • Do not include explanations or commentary.
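The straggler-cleanup script pdc1 mentions isn't shown in the thread, so here is a minimal sketch of that idea in Python. The base URL, token, endpoint paths (`/tags`, `/tags/{id}`), and the `numBookmarks` response field are all assumptions, not confirmed Karakeep API details; check your instance's API documentation before running anything like this.

```python
"""Sketch: delete AI 'straggler' tags attached to very few bookmarks.

Endpoint paths and field names below are assumptions, not confirmed
Karakeep API details.
"""
import json
import urllib.request

BASE = "http://localhost:3000/api/v1"  # hypothetical instance URL
TOKEN = "your-api-key"                 # placeholder


def stragglers(tags, min_bookmarks=3):
    """Return tags attached to fewer than `min_bookmarks` bookmarks."""
    return [t for t in tags if t.get("numBookmarks", 0) < min_bookmarks]


def delete_stragglers():
    # Fetch all tags, then delete the ones below the threshold.
    req = urllib.request.Request(
        f"{BASE}/tags", headers={"Authorization": f"Bearer {TOKEN}"})
    with urllib.request.urlopen(req) as resp:
        tags = json.load(resp)["tags"]  # assumed response shape
    for tag in stragglers(tags):
        del_req = urllib.request.Request(
            f"{BASE}/tags/{tag['id']}", method="DELETE",
            headers={"Authorization": f"Bearer {TOKEN}"})
        urllib.request.urlopen(del_req)
```

The threshold is deliberately a parameter so it can match whatever "useful tag" cutoff you settle on (pdc1 and sprior both landed on 3).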

@sprior commented on GitHub (Jul 24, 2025):

@pdc1 A minor addition I'd recommend is telling the model to use tags in your preferred singular or plural form.
With the default prompts I end up getting tag merge suggestions like merging "meeting" into "meetings".


@Gaibhne commented on GitHub (Sep 1, 2025):

As this issue appears not to be 'fixed', I am confused by the discussions above. Are you guys using a fork, or is there something one can do with the current version to make what is being discussed above work? At the moment, the AI tagging seems pretty unusable; I only added a dozen bookmarks and have a ton of single-use tags, leaving me with nothing usable. Even my six added recipes didn't end up with a shared tag (three 'cooking', two 'recipes' and one 'thermomix'), so I'm not sure how this feature is supposed to be used at the moment.


@kamtschatka commented on GitHub (Sep 1, 2025):

You can get something working by using the placeholders described in the docs at https://docs.karakeep.app/configuration ($tags, $aiTags, $userTags) and adding some custom instructions to the prompt.
The problem is that the preexisting prompt has quite a few things most people with custom prompts do not want.

Unfortunately the preexisting prompt cannot be modified, so the results for me are very suboptimal and I have to resort to a lot of manual tagging.


@josias-r commented on GitHub (Nov 10, 2025):

I'm not sure if this was brought up already, but wouldn't allowing tag aliases indirectly solve this?

So if you alias "AI" with "Artificial Intelligence", the LLM's choice doesn't really matter anymore.

You already have tag merging; why not just keep the alias around in the db, so future recurrences won't need to be merged again?
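This is not a Karakeep feature today; just a toy Python sketch of what "keep the alias around in the db" could look like at merge time (all names hypothetical):

```python
class TagStore:
    """Toy model of tag storage with persistent aliases (names hypothetical)."""

    def __init__(self):
        # alias (lowercased) -> canonical tag name
        self.aliases: dict[str, str] = {}

    def merge(self, alias: str, canonical: str) -> None:
        # Recording the merge means future occurrences of `alias`
        # resolve automatically instead of needing another manual merge.
        self.aliases[alias.lower()] = canonical

    def resolve(self, tag: str) -> str:
        """Map an incoming (e.g. LLM-generated) tag onto its canonical form."""
        return self.aliases.get(tag.lower(), tag)
```

So after merging "AI" into "Artificial Intelligence" once, `resolve("ai")` would return the canonical tag on every later bookmark, which is exactly the "don't merge twice" behaviour proposed above.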


@Yasand123 commented on GitHub (Dec 23, 2025):

> I'm not sure if this was brought up already, but wouldn't allowing tag aliases indirectly solve this?
>
> So if you alias "AI" with "Artificial Intelligence", the LLM's choice doesn't really matter anymore.
>
> You already have tag merging; why not just keep the alias around in the db, so future recurrences won't need to be merged again?

Not really. AI can come up with a lot of potential tags that are ever so slightly different. It's too "creative". Having "AI" and "Artificial Intelligence" as the example is not realistic; it's too ideal. In reality it's much messier than this, to the point where aliases are not feasible, because the model keeps creating slightly different tags. Are you gonna just keep adding an infinite number of aliases?

At this point a proper solution would be much less time consuming than this workaround.
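For what it's worth, some of that near-duplicate creativity can be absorbed without an explicit alias list by fuzzy-matching generated tags against existing ones. A sketch using Python's stdlib `difflib` (the 0.8 cutoff is a guess to tune, not a recommendation from this thread):

```python
import difflib


def canonicalize(tag: str, existing: list[str], cutoff: float = 0.8) -> str:
    """Map a generated tag onto a close existing tag, if any.

    Comparison is done on a lowercased form so "Cooking" vs "cooking"
    always match; difflib.get_close_matches then catches near-duplicates
    like "recipe" vs "recipes". Truly new tags pass through unchanged.
    """
    lowered = {t.lower(): t for t in existing}
    match = difflib.get_close_matches(
        tag.lower(), lowered.keys(), n=1, cutoff=cutoff)
    return lowered[match[0]] if match else tag
```

This would not catch semantic duplicates like "AI" vs "Artificial Intelligence" (those still need a merge or alias), but it folds the spelling-level variants that make up a lot of the noise.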
