[GH-ISSUE #1593] Model prompt still being truncated #994

Open
opened 2026-03-02 11:54:14 +03:00 by kerem · 7 comments
Owner

Originally created by @ubff389 on GitHub (Jun 11, 2025).
Original GitHub issue: https://github.com/karakeep-app/karakeep/issues/1593

Describe the Bug

The patch notes for v0.25.0 mention that a proper tokenizer is now used, which should prevent the system prompt from being truncated. Even after updating, I have observed that the prompt is still being truncated by Ollama:

```
Jun 10 21:22:18 ollama-host ollama[159]: time=2025-06-10T21:22:18.314Z level=WARN source=runner.go:128 msg="truncating input prompt" limit=512 prompt=617 keep=4 new=512
Jun 10 22:46:49 ollama-host ollama[159]: [GIN] 2025/06/10 - 22:46:49 | 200 |      1h24m39s |    10.99.99.105 | POST     "/api/chat"
Jun 11 08:07:59 ollama-host ollama[159]: time=2025-06-11T08:07:59.286Z level=WARN source=runner.go:128 msg="truncating input prompt" limit=512 prompt=617 keep=4 new=512
Jun 11 08:10:38 ollama-host ollama[159]: [GIN] 2025/06/11 - 08:10:38 | 200 |         2m39s |    10.99.99.105 | POST     "/api/chat"
Jun 11 08:13:42 ollama-host ollama[159]: time=2025-06-11T08:13:42.858Z level=WARN source=runner.go:128 msg="truncating input prompt" limit=512 prompt=529 keep=4 new=512
Jun 11 08:18:51 ollama-host ollama[159]: [GIN] 2025/06/11 - 08:18:51 | 200 |          5m8s |    10.99.99.105 | POST     "/api/chat"
Jun 11 08:20:08 ollama-host ollama[159]: time=2025-06-11T08:20:08.539Z level=WARN source=runner.go:128 msg="truncating input prompt" limit=512 prompt=587 keep=4 new=512
```

Steps to Reproduce

  1. Set a token limit that is smaller than the length of the page contents
  2. Start an AI tagging job
  3. Observe Ollama logs

Expected Behaviour

The entire prompt should be crafted on Karakeep's side to not exceed the set amount of tokens.

Screenshots or Additional Context

No response

Device Details

No response

Exact Karakeep Version

v0.25.0

Have you checked the troubleshooting guide?

  • I have checked the troubleshooting guide and I haven't found a solution to my problem

@MohamedBassem commented on GitHub (Jun 11, 2025):

The main gap I'm currently aware of regarding prompt truncation is that we use a tokenizer for a particular family of models (OpenAI's), which is not necessarily the same tokenizer used by other open-source models, and that mismatch might cause this difference. Because people using Ollama can use any sort of model, some of which might not even be supported by the tokenizer library we're using, I'm honestly not entirely sure how to handle this generically, besides maybe giving you a configurable factor to multiply against the calculated token count.
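The configurable factor mentioned above could be sketched like this; this is an illustration of the idea, not Karakeep's actual code, and `effectiveTokenCount` is a hypothetical name:

```typescript
// Sketch: scale the count reported by the (OpenAI-oriented) tokenizer by a
// user-configurable safety factor before truncating, so models whose own
// tokenizers produce more tokens for the same text still fit in the window.
function effectiveTokenCount(estimatedTokens: number, factor: number = 1.0): number {
  // A factor > 1 treats the prompt as larger than the tokenizer reports,
  // forcing earlier truncation on Karakeep's side.
  return Math.ceil(estimatedTokens * factor);
}

// e.g. the tokenizer says 512 tokens, but the local model's tokenizer tends
// to produce ~20% more tokens for the same text:
const adjusted = effectiveTokenCount(512, 1.2);
```

With a factor of 1.2, a prompt the library counts at 512 tokens would be budgeted as 615, leaving headroom for the 617-token prompts seen in the logs above.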


@ubff389 commented on GitHub (Jun 11, 2025):

What if we just reformat the prompt in a way that there are no instructions at the end that can be truncated and get lost? For example, now it looks like this:

```
- Aim for 3-5 tags.
- If there are no good tags, leave the array empty.

CONTENT START HERE

<CONTENT_HERE>

CONTENT END HERE
You must respond in JSON with the key "tags" and the value is an array of string tags.
```

Instead, make it like this:

```
- Aim for 3-5 tags.
- If there are no good tags, leave the array empty.
- You must respond in JSON with the key "tags" and the value is an array of string tags.

CONTENT STARTS HERE

<CONTENT_HERE>
```

Or what I would consider to be even better, let users craft their own prompts completely, put it behind a "DANGER" confirmation or whatever, but this would definitely be useful for people running local Ollama setups, especially with small-ish or unorthodox models.


@pdc1 commented on GitHub (Jun 12, 2025):

> Or what I would consider to be even better, let users craft their own prompts completely, put it behind a "DANGER" confirmation or whatever, but this would definitely be useful for people running local Ollama setups, especially with small-ish or unorthodox models.

I second this suggestion! 👍 I want to completely replace the prompt with my own.

I have fine-tuned a nice prompt for llama 3.1 8b, which has a 128K context. My prompt includes existing tags, which might make it quite large, but even by itself it is ~1K. I am disappointed I can't use it!

Also, the dialog to enter your own prompt is only ONE LINE! Please make this a multi-line text box so we can use normal formatting like the default prompts.

For anyone's interest, here is my ollama tagging prompt. It works quite nicely with llama 3.1, mistral, and gemma. I'm using llama for the largest context window. You can see I had to encourage valid JSON, llama really wants to explain how clever it is 😄

```
You are a tagging assistant for a read-it-later app. Your task is to return **only valid JSON**
with 4-5 concise, meaningful tags that best represent the main ideas and themes of the content.

Guidelines:
- Ignore cookie consent or privacy boilerplate.
- Avoid generic words like "article", "life", or "content".
- If a tag closely matches an item from the **Preferred Tags**, use it exactly.
- Else if it matches one from **Existing Tags**, use that version.
- If no match is found, create a new tag in lowercase with **space** between words.

Respond with only valid JSON in the format: {"tags": ["tag1", "tag2", "tag3", ...]}

Preferred Tags:

$userTags

Existing Tags:

$aiTags

CONTENT START HERE

<CONTENT_HERE>

CONTENT END HERE

Respond with nothing except valid JSON. Do **not** include any commentary, explanation, or notes.
```

I also have a summary prompt I'm working on. So far it is <500 chars so should work. It works best with llama, but is reasonable with gemma. Mistral wants to start the summary with "The article suggests" despite being told not to.

```
Summarize the following content in 3 to 4 full sentences using no more than 75 words.

- Focus only on the core insights and ideas.
- Avoid phrases like "The article discusses" or "This piece says."
- Write in clear, natural English — concise, but expressive.
- If helpful, use metaphor or vivid imagery to convey meaning or emotion, but avoid over-explaining.
- Do not use bullet points or lists.
- Maintain a neutral, reflective tone.

Respond ONLY with the summary.
```

@pdc1 commented on GitHub (Jun 15, 2025):

I would like to amend my previous post 😊

I did not realize each entry in the AI configuration was for a **single line** in the prompt! I added single lines like `If a tag closely matches an item from $userTags, use it exactly.` and `Else if it matches one from $aiTags, use that version.`, and that seems to have helped quite a bit.

That said, I would really prefer NOT to have site names as tags, but I can't remove that from the predefined prompt.


@Shiv-Patil commented on GitHub (Jun 23, 2025):

A configurable buffer size which would be added to `num_ctx` could work.
Also yes, a completely customizable prompt would be nice. The hardcoded prompt rules could be moved to custom prompts that are enabled by default (maybe add an on/off flag for each of those default prompts and remove the option to delete them).
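The buffer idea above amounts to sizing the requested context window from the prompt rather than a fixed limit; a minimal sketch, where `numCtxWithBuffer` and the buffer setting are hypothetical names, not existing Karakeep configuration:

```typescript
// Sketch: ask Ollama for a context window that covers the estimated prompt
// plus a configurable safety buffer, instead of a fixed num_ctx that the
// prompt can overflow.
function numCtxWithBuffer(estimatedPromptTokens: number, bufferTokens: number): number {
  // The buffer absorbs tokenizer mismatch plus the model's generated output.
  return estimatedPromptTokens + bufferTokens;
}

// For the 617-token prompts in the logs above, a 256-token buffer would
// request num_ctx = 873 rather than truncating at 512.
const numCtx = numCtxWithBuffer(617, 256);
```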


@biscanli commented on GitHub (Aug 15, 2025):

Aside from what the others have mentioned about editing the prompt and configuring a buffer, another way to fix this could be to not send a token limit at all when either:

  • Someone uses an Ollama endpoint, where truncating is not that useful since there is no per-token cost
  • Or a boolean flag lets anyone disable it

Note that I am *not* suggesting that we skip truncating the content ourselves before sending, just that we not add to the request the token count it should be truncated to. Especially with a self-hosted Ollama instance I don't think there is a necessity for it, but I could be wrong.
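The suggestion above can be sketched as conditionally leaving `num_ctx` out of the Ollama request options (`num_ctx` is a real Ollama option; the flag and function names here are hypothetical):

```typescript
// Sketch: Karakeep still truncates the content itself, but a flag controls
// whether num_ctx is included in the Ollama request at all. Without it,
// the server-side "truncating input prompt" path in the logs above is
// governed only by the model's own default context handling.
interface OllamaOptions {
  num_ctx?: number;
}

function buildOllamaOptions(contextLength: number, sendNumCtx: boolean): OllamaOptions {
  const options: OllamaOptions = {};
  if (sendNumCtx) {
    options.num_ctx = contextLength;
  }
  return options;
}
```

With the flag off, the request body simply omits the field rather than sending a limit the prompt can overflow.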


@eriktews commented on GitHub (Jan 1, 2026):

Even though preventing the issue might be hard, it should be possible to detect it and display a warning to the user.

The JSON returned by Ollama contains a key "prompt_eval_count". When this value is equal to the maximum configured context size (2048 by default), the prompt was almost certainly truncated.
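This check could be sketched as follows; `looksTruncated` is an illustrative helper, not an existing Karakeep function, and the 2048 default mirrors the Ollama default mentioned above:

```typescript
// Sketch: Ollama's /api/chat and /api/generate responses report
// prompt_eval_count. Ollama clamps an overflowing prompt to the context
// limit, so hitting the limit exactly implies truncation; >= is used
// defensively in case of off-by-one differences across versions.
function looksTruncated(promptEvalCount: number, numCtx: number = 2048): boolean {
  return promptEvalCount >= numCtx;
}

// e.g. for the limit=512 runs in the logs above, a response reporting
// prompt_eval_count of 512 would trigger a UI warning.
const shouldWarn = looksTruncated(512, 512);
```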
