[PR #505] [CLOSED] feature: Improve the LLM prompts for tag generation #1640

Closed
opened 2026-03-02 11:58:31 +03:00 by kerem · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/karakeep-app/karakeep/pull/505
Author: @Papierkorb
Created: 10/7/2024
Status: Closed

Base: mainHead: feature/custom-prompt


📝 Commits (1)

  • 567e63d feature: Improve the LLM prompts for tag generation

📊 Changes

1 file changed (+16 additions, -17 deletions)

View changed files

📝 packages/shared/prompts.ts (+16 -17)

📄 Description

Hello!

I just found this great piece of software! I was thrilled to see that I can use my own OpenAI endpoint and point it towards my Llama 3.1 8B endpoint.

Status Quo

I think the standard prompt as set in packages/shared/prompts.ts could require some tweaking.
The default prompt doesn't work at all with my model (Which I think is a pretty commonly used one). This one does work great:

You're an expert document tagger. Given an input you generate an JSON object of the following format: { "tags": ["First tag", "Another tag", ...] }

Only respond in this format! Write only the JSON data and nothing else. Do not write an explanation. If you can't find tags you're satisfied with respond with an empty tags array. Write at most five tags. Write your tags in ${lang}.

${content}

I've only tested this prompt manually with my model. With this, my 0% success rate turns into a 100% success rate; Which surprises me, usually you need to handle the model giving an introduction like "Your requested JSON document: ..." by looking for the first { and last }.

Thus

Long story short: This PR adds the "TEXT_TAG_PROMPT" and "IMAGE_TAG_PROMPT" environment variables. If not set, then the already existing prompt is used. If it is set, it's used instead. The variables are then interpolated Like {{this}}, akin to Jinja2 templates (Which the project may adopt in the future?)

My template in this syntax

You're an expert document tagger. Given an input you generate an JSON object of the following format: { "tags": ["First tag", "Another tag", ...] }

Only respond in this format! Write only the JSON data and nothing else. Do not write an explanation. If you can't find tags you're satisfied with respond with an empty tags array. Write at most five tags. Write your tags in {{lang}}.

{{content}}

Not Great: As is, this change breaks AISettings.tsx, as it would now require the server configuration to work. For my tests I simply removed the calls to the buildPrompt functions - But this of course isn't acceptable for merging. I'd need a little help here on how we should approach this.

Now it works with Llama 3.1 8B, yay

grafik


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

## 📋 Pull Request Information **Original PR:** https://github.com/karakeep-app/karakeep/pull/505 **Author:** [@Papierkorb](https://github.com/Papierkorb) **Created:** 10/7/2024 **Status:** ❌ Closed **Base:** `main` ← **Head:** `feature/custom-prompt` --- ### 📝 Commits (1) - [`567e63d`](https://github.com/karakeep-app/karakeep/commit/567e63dcf0f4783838365bd205828c3275083660) feature: Improve the LLM prompts for tag generation ### 📊 Changes **1 file changed** (+16 additions, -17 deletions) <details> <summary>View changed files</summary> 📝 `packages/shared/prompts.ts` (+16 -17) </details> ### 📄 Description Hello! I just found this great piece of software! I was thrilled to see that I can use my own OpenAI endpoint and point it towards my Llama 3.1 8B endpoint. # Status Quo I think the standard prompt as set in `packages/shared/prompts.ts` could require some tweaking. The default prompt doesn't work at all with my model (Which I think is a pretty commonly used one). This one does work great: ``` You're an expert document tagger. Given an input you generate an JSON object of the following format: { "tags": ["First tag", "Another tag", ...] } Only respond in this format! Write only the JSON data and nothing else. Do not write an explanation. If you can't find tags you're satisfied with respond with an empty tags array. Write at most five tags. Write your tags in ${lang}. ${content} ``` I've only tested this prompt manually with my model. With this, my 0% success rate turns into a 100% success rate; Which surprises me, usually you need to handle the model giving an introduction like "Your requested JSON document: ..." by looking for the first `{` and last `}`. # Thus Long story short: This PR adds the "TEXT_TAG_PROMPT" and "IMAGE_TAG_PROMPT" environment variables. If not set, then the already existing prompt is used. If it is set, it's used instead. The variables are then interpolated `Like {{this}}`, akin to Jinja2 templates (Which the project may adopt in the future?) <details><summary>My template in this syntax</summary> <p> ``` You're an expert document tagger. Given an input you generate an JSON object of the following format: { "tags": ["First tag", "Another tag", ...] } Only respond in this format! Write only the JSON data and nothing else. Do not write an explanation. If you can't find tags you're satisfied with respond with an empty tags array. Write at most five tags. Write your tags in {{lang}}. {{content}} ``` </p> </details> **Not Great**: As is, this change breaks `AISettings.tsx`, as it would now require the server configuration to work. For my tests I simply removed the calls to the buildPrompt functions - But this of course isn't acceptable for merging. I'd need a little help here on how we should approach this. <details><summary>Now it works with Llama 3.1 8B, yay</summary> <p> ![grafik](https://github.com/user-attachments/assets/3e5ed180-67d5-499f-a119-fb36f4f9b5c6) </p> </details> --- <sub>🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.</sub>
kerem 2026-03-02 11:58:31 +03:00
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
starred/karakeep#1640
No description provided.