[GH-ISSUE #1806] Add support for Ollama options #1130

Open
opened 2026-03-02 11:55:13 +03:00 by kerem · 7 comments

Originally created by @templehasfallen on GitHub (Aug 1, 2025).
Original GitHub issue: https://github.com/karakeep-app/karakeep/issues/1806

Describe the feature you'd like

Please add a way to specify options to be passed to the Ollama model for inference/tagging.

Suggested var: `INFERENCE_TEXT_MODEL_OPTIONS`, accepting a list of options.

An example of a request with many of the options that can be passed to Ollama:

```
curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?",
  "stream": false,
  "options": {
    "num_keep": 5,
    "seed": 42,
    "num_predict": 100,
    "top_k": 20,
    "top_p": 0.9,
    "tfs_z": 0.5,
    "typical_p": 0.7,
    "repeat_last_n": 33,
    "temperature": 0.8,
    "repeat_penalty": 1.2,
    "presence_penalty": 1.5,
    "frequency_penalty": 1.0,
    "mirostat": 1,
    "mirostat_tau": 0.8,
    "mirostat_eta": 0.6,
    "penalize_newline": true,
    "stop": ["\n", "user:"],
    "numa": false,
    "num_ctx": 1024,
    "num_batch": 2,
    "num_gqa": 1,
    "main_gpu": 1,
    "low_vram": false,
    "f16_kv": true,
    "vocab_only": false,
    "use_mmap": true,
    "use_mlock": false,
    "embedding_only": false,
    "rope_frequency_base": 1.1,
    "rope_frequency_scale": 0.8,
    "num_thread": 8
  }
}'
```
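
For illustration, the variable could take a JSON object that gets merged into the request's `options` field. This is a hypothetical format, not an existing Karakeep setting:

```
# Hypothetical syntax: a JSON object of Ollama options in one env var.
# Neither the variable nor its merging behavior exists in Karakeep yet.
INFERENCE_TEXT_MODEL_OPTIONS='{"num_ctx": 8192, "temperature": 0.2, "top_k": 20}'
```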

Describe the benefits this would bring to existing Karakeep users

This would be useful in various cases: it lets the user fine-tune the model's sampling (`top_k`, `top_p`, etc.), force CPU-only inference, or adjust GPU offloading (`num_gpu`).

Can the goal of this request already be achieved via other means?

Currently it cannot.

Have you searched for an existing open/closed issue?

  • I have searched for existing issues and none cover my fundamental request

Additional context

No response

@MohamedBassem commented on GitHub (Aug 3, 2025):

It's unlikely that this will get implemented for Ollama in particular, because I'm planning to get rid of the custom Ollama client and instead access Ollama through the OpenAI-compatible endpoint. I don't know how feasible it'll be to pass such options through that endpoint.
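
For context, Ollama does expose an OpenAI-compatible endpoint at `/v1/chat/completions`. A sketch of an equivalent request against it: standard sampling parameters such as `temperature` and `top_p` carry over, but Ollama-specific options such as `num_ctx` have no counterpart in the OpenAI request schema, so they would still need some other mechanism:

```
curl http://localhost:11434/v1/chat/completions -d '{
  "model": "llama2",
  "messages": [{"role": "user", "content": "Why is the sky blue?"}],
  "temperature": 0.8,
  "top_p": 0.9,
  "max_tokens": 100
}'
```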

@Gekko23 commented on GitHub (Aug 8, 2025):

A very, very important feature when using Ollama would be to get rid of the 500 character limitation. I don't know why this even exists, but in a local environment using complex tagging rules, this limit seems oddly artificial. Please get rid of it. I use Karakeep on an Unraid system; I don't know if this 500 character limit is an Unraid-specific thing, but it's blocking the usefulness of running my own LLM.

@MohamedBassem commented on GitHub (Aug 8, 2025):

@Gekko23 I have no idea what 500 char limit you're talking about.

@blackfeather9 commented on GitHub (Aug 9, 2025):

> A very, very important feature when using Ollama would be to get rid of the 500 character limitation. I don't know why this even exists, but in a local environment using complex tagging rules, this limit seems oddly artificial. Please get rid of it. I use Karakeep on an Unraid system; I don't know if this 500 character limit is an Unraid-specific thing, but it's blocking the usefulness of running my own LLM.

@Gekko23 Are you referring to the context window on the model you're using?

I've been messing around with Ollama + open-webui alongside Karakeep and other apps recently. Inference in a different app failed for me because the default Ollama setting (using `ollama:latest` in Docker) is 4096 tokens. This is easily exhausted if you have a large prompt and a few pages of content to analyze.

When the model hits the token limit, it fails. Even if your model supports a larger window (for example, gemma3:4b can handle 128k tokens), Ollama sets an artificial limit. In your post, the options you shared set the context window to 1024 tokens: `"num_ctx": 1024`.

The workaround for me, since I can't pass options directly in Karakeep or in the other apps I'm using, is to create a custom Modelfile in Ollama that uses my preferred model (in this case, gemma3:4b) but sets a custom context window. You can set any of the options you want within the Modelfile directly and bypass what the app can or can't do:

```
FROM gemma3:4b
PARAMETER num_ctx 65792
```

Then import the model with `ollama create` and reference it in Karakeep via `INFERENCE_TEXT_MODEL` to get what you want.
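
A minimal sketch of that workflow; the model name `gemma3-karakeep` is just an example:

```
# Build a custom model from the Modelfile above (example name).
ollama create gemma3-karakeep -f ./Modelfile

# Then point Karakeep at it, e.g. in its container environment:
# INFERENCE_TEXT_MODEL=gemma3-karakeep
```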

@Gekko23 commented on GitHub (Aug 9, 2025):

https://postimg.cc/GTjPr1zD

@MohamedBassem commented on GitHub (Aug 9, 2025):

@Gekko23 this is not meant to replace the entire prompt. This is meant to add more “custom rules”. So the 500 char limit is per rule, and you can add as many rules as you want.

@Gekko23 commented on GitHub (Aug 14, 2025):

@MohamedBassem:
I could not solve a complex tagging prompt using the "custom rules". It's not that I'm not able to; it just doesn't work that way, because the prompt is complex. Could you PLEASE just remove the 500 character limit for the tagging prompt rules? Everything else is just unnecessarily complicating a very, very simple task.

Again: I'm just talking about the AI TAGGING RULES. No 500 character limits for prompts. That would be great. Thanks.
