[GH-ISSUE #347] Using Inference against compatible OpenAI Endpoint #228

opened 2026-03-02 11:47:47 +03:00 by kerem · 16 comments

Originally created by @StackShard on GitHub (Aug 8, 2024).
Original GitHub issue: https://github.com/karakeep-app/karakeep/issues/347

I'm running LM Studio with its OpenAI-compatible API. The connection seems to go through, but the Hoarder worker doesn't like the response. Below is the log from Hoarder, followed by the LM Studio console output. This happens on every auto-tagging attempt. Note that it works perfectly fine against the "real" OpenAI API. There's something I need to tweak, but I need some direction.

HOARDER LOG: ------------------------------------------

workers-1 | 2024-08-08T15:36:06.816Z error: [inference][17] inference job failed: Error: [inference][17] The model ignored our prompt and didn't respond with the expected JSON: {}. Here's a sneak peak from the response: ```json
workers-1 | {
workers-1 | "tags":
workers-1 | 2024-08-08T15:36:07.870Z info: [inference][17] Starting an inference job for bookmark with id "ajd0kzbdznwb18rixelmjy7g"
workers-1 | 2024-08-08T15:36:11.540Z error: [inference][17] inference job failed: Error: [inference][17] The model ignored our prompt and didn't respond with the expected JSON: {}. Here's a sneak peak from the response: ```json
workers-1 | {
workers-1 | "tags":
workers-1 | 2024-08-08T15:36:14.690Z info: [search][43] Attempting to index bookmark with id ajd0kzbdznwb18rixelmjy7g ...

LLM RESPONSE: -------------------------------------------

[2024-08-08 11:36:06.939] [INFO] Received POST request to /v1/chat/completions with body: {
  "messages": [
    {
      "role": "system",
      "content": "\n\nI'm building a read-it-later app and I need your help with automatic tagging.\nPlease analyze the text between the sentences \"CONTENT START HERE\" and \"CONTENT END HERE\" and suggest relevant tags that describe its key themes, topics, and main ideas.\nAim for a variety of tags, including broad categories, specific keywords, and potential sub-genres. The tags language must be english. If it's a famous website\nyou may also include a tag for the website. If the tag is not generic enough, don't include it.\nThe content can include text for cookie consent and privacy policy, ignore those while tagging.\nCONTENT START HERE\n\nURL: https://blackforestindustries.com/collections/vw-manual-shift-knobs/products/gs2-alcantara-vw-audi-manual?variant=14452908032044\nTitle: GS2 Alcantara (VW/Audi Manual)\nDescription: Not to toot our own horn, but we’re pretty sure we’ve just raised the bar when it comes to one of the most frequently touched parts of your car’s interior. This heavyweight shift knob feels like butter in palm of your hand, thanks due to the black alcantara that it’s wrapped in. Weighing in at approximately 205 grams t\nContent: \n \n \n GS2 Alcantara (VW/Audi Manual)\n \n\n \n \n \n \n\n \n \n \n\n\n \n \n \n SKU: GS2SUF\n \n\n \n \n \n \n\n \n \n \n \n \n \n \n \n \n\n \n \n \n\n\n \n \n \n \n Please note that our shift knobs are made-to-order. Due to an increase in shift knob orders, there is currently a 2-3 week lead time depending on the specific vehicle application. Please contact sales@blackforestindustries.com for more information, thank you!\n \n \n\n \n \n \n \nNot to toot our own horn, but we're pretty sure we've just raised the bar when it comes to one of the most frequently touched parts of your car's interior. This heavyweight shift knob feels like butter in palm of your hand, thanks due to the black alcantara that it's wrapped in. Weighing in at approximately 205 grams the added inertial mass makes shifting effort substantially less while speeding up the process at the same time. While we've borrowed inspiration from shift knobs past, we think we've really perfected the proportions this time around- with the 3/4 alcantara to metal ratio. Visually this thing leaves nothing to be desired. The machining is top notch, and the detailed crest coin gives that classy yet sporting look which elevates any interior's level by at least 50 cool points. The included adapter is used to work with just about any VW/Audi shift lever. The knob's adapter is secured using three set screws for all models. You might think that this is a lot of hype for such a little part, but try telling that to us once you've had the chance to hold one of these guys in your hand.\nINCLUDES:\n\nOne heavy weight GS2 shift knob\nOne BFI crest logo coin\nOne adapter for VW / Audi manual selector shaft\nThree set screws\nAllen key for set screws\n\nOPTIONAL ACCESSORIES:\n5-speed pattern coin top\n6-speed pattern coin top\nLOCTITE for set screws\n\n\n \n \n\n \n\nCONTENT END HERE\nYou must respond in JSON with the key \"tags\" and the value is an array of string tags. \nAim for 3-5 tags. If there are no good tags, leave the array empty.\n"
    }
  ],
  "model": "Publisher/Repository/WizardLM-2-7B-abliterated.Q8_0.gguf",
  "response_format": {
    "type": "json_object"
  }
}
[2024-08-08 11:36:10.600] [INFO] [LM STUDIO SERVER] [Publisher/Repository/WizardLM-2-7B-abliterated.Q8_0.gguf] Generated prediction: {
  "id": "chatcmpl-u44u7ro6rv9w7kq5jwk2j",
  "object": "chat.completion",
  "created": 1723131366,
  "model": "Publisher/Repository/WizardLM-2-7B-abliterated.Q8_0.gguf",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "```json\n{\n \"tags\": [\n \"Automotive Accessories\",\n \"VW Manual Shift Knob\",\n \"Audi Manual Shift Knob\",\n \"Alcantara Interior\",\n \"Car Parts\"\n ]\n}\n```"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 923,
    "completion_tokens": 68,
    "total_tokens": 991
  }
}

kerem closed this issue 2026-03-02 11:47:47 +03:00

@StackShard commented on GitHub (Aug 8, 2024):

I don't know what the worker is expecting or where to tune the prompt. For example, do I perhaps need to add something like "respond in RFC 8259-compliant JSON"?


@MohamedBassem commented on GitHub (Aug 8, 2024):

The problem is that, for some weird reason, the response starts with "json\n". Hoarder expects the response to be pure JSON, and this prefix kinda ruins that.
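As the maintainer confirms later in the thread, the model is wrapping its JSON in a markdown code block tagged `json`, so a strict `JSON.parse` fails on the very first character. Besides prompting the model not to do that, a client could strip the fence before parsing. The following is a minimal TypeScript sketch of that idea, not Hoarder's actual parsing code: `stripMarkdownFence` and `parseTags` are hypothetical names, and the helper only handles a single fence that wraps the entire response.

```typescript
// Hypothetical helper, NOT taken from Hoarder's openaiWorker.ts.
// If the whole response is wrapped in a markdown code fence (optionally tagged,
// e.g. "json"), return only the text inside it; otherwise return the trimmed input.
function stripMarkdownFence(raw: string): string {
  const trimmed = raw.trim();
  // ^```lang?  newline  <body>  optional newline  ```$
  const match = trimmed.match(/^```[a-zA-Z0-9_-]*\s*\n([\s\S]*?)\n?```$/);
  return match ? match[1].trim() : trimmed;
}

// Parse the model output as JSON after removing a surrounding fence, if any.
function parseTags(raw: string): unknown {
  return JSON.parse(stripMarkdownFence(raw));
}

// Shortened version of the content string from the LM Studio response.
const modelOutput =
  '```json\n{\n "tags": [\n "Automotive Accessories",\n "Car Parts"\n ]\n}\n```';

console.log(parseTags(modelOutput));
```

Un-fenced JSON passes through unchanged, so the helper is harmless for models that already follow the prompt. It does not help if the model adds prose before or after the fence.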


@StackShard commented on GitHub (Aug 8, 2024):

I'm happy to tune the prompt to get the output it needs, if you can point me in the right direction... and to the expected output format.

@kamtschatka commented on GitHub (Aug 9, 2024):

https://github.com/hoarder-app/hoarder/blob/main/apps/workers/openaiWorker.ts#L108

@StackShard commented on GitHub (Aug 9, 2024):

> https://github.com/hoarder-app/hoarder/blob/main/apps/workers/openaiWorker.ts#L108

Thanks! Now I need to find out how to get in there, but that's on me.

EDIT: docker exec -u 0 -it 40de0bae69c9 /bin/sh
For my container ID anyway. Moving on.


@prabhjotsbhatia-ca commented on GitHub (Sep 27, 2024):

You could perhaps tweak the prompt to add "Only give me the JSON. Do not write anything else."


@DrFrankensteinUK commented on GitHub (Sep 28, 2024):

Hey, I'm also trying to mirror what you're doing with LM Studio, but I'm missing a trick: I'm getting "Unexpected endpoint or method. (POST /v1/api/chat). Returning 200 anyway". Out of interest, what settings did you use? I can test along as well.


@robvanvolt commented on GitHub (Dec 15, 2024):

I have also tried to get it to work with LM Studio. Maybe we could add an OpenAI-compatible chat endpoint with /v1/chat/completions?

Ollama

```
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [
    {
      "role": "user",
      "content": "why is the sky blue?"
    }
  ]
}'
```

LM Studio

```
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "granite-3.0-2b-instruct",
    "messages": [
      { "role": "system", "content": "Always answer in rhymes." },
      { "role": "user", "content": "Introduce yourself." }
    ],
    "temperature": 0.7,
    "max_tokens": -1,
    "stream": false
  }'
```

So the only difference here is the URL. If we could customize the path (e.g., /v1/chat/completions vs. /api/chat) in the settings, I think the problem would be solved.
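The two curl calls differ in the path appended to the server's base URL. One plausible reading of the "POST /v1/api/chat" error reported earlier in the thread, assuming the client joins a configured base URL with a fixed path: an Ollama-style client appends `/api/chat`, so pointing it at LM Studio's `/v1` base yields a path LM Studio does not serve, while an OpenAI-style client appends `/chat/completions`. The sketch below only illustrates that string joining; `chatUrl` and `ApiStyle` are hypothetical and not taken from Hoarder, Ollama, or LM Studio.

```typescript
// Hypothetical illustration of how each API style builds its chat URL.
type ApiStyle = "ollama" | "openai";

function chatUrl(baseUrl: string, style: ApiStyle): string {
  const base = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  // Ollama's native API serves chat at /api/chat; OpenAI-compatible servers
  // such as LM Studio serve it at /chat/completions under their /v1 base.
  return style === "ollama" ? `${base}/api/chat` : `${base}/chat/completions`;
}

console.log(chatUrl("http://localhost:1234/v1", "ollama")); // http://localhost:1234/v1/api/chat
console.log(chatUrl("http://localhost:1234/v1", "openai")); // http://localhost:1234/v1/chat/completions
console.log(chatUrl("http://localhost:11434", "ollama")); // http://localhost:11434/api/chat
```

Under this assumption, the fix for an OpenAI-compatible server is to configure it as an OpenAI-style base URL rather than an Ollama one, since the path suffix is decided by the client, not the user.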


@robvanvolt commented on GitHub (Jan 8, 2025):

Any news on this?


@joshoram80 commented on GitHub (Jan 15, 2025):

2025-01-15T20:42:16.167Z error: [inference][1135] inference job failed: Error: [inference][1135] The model ignored our prompt and didn't respond with the expected JSON: {"issues":[{"code":"invalid_type","expected":"string","received":"number","path":["tags",3],"message":"Expected string, received number"}],"name":"ZodError"}. Here's a sneak peak from the response: { "tags": [ "P
Error: [inference][1135] The model ignored our prompt and didn't respond with the expected JSON: {"issues":[{"code":"invalid_type","expected":"string","received":"number","path":["tags",3],"message":"Expected string, received number"}],"name":"ZodError"}. Here's a sneak peak from the response: { "tags": [ "P
    at inferTags (/app/apps/workers/openaiWorker.ts:6:4164)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
    at async Object.runOpenAI [as run] (/app/apps/workers/openaiWorker.ts:6:6686)
    at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/liteque@0.3.0_better-sqlite3@11.3.0/node_modules/liteque/dist/runner.js:2:2578)

Is this related? Sometimes I get tags, other times I get this error.
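This ZodError is a different failure from the markdown fence: the response parsed as JSON, but the schema expected a string at path `tags[3]` and received a number, i.e. the model emitted an unquoted number as its fourth tag. A lenient alternative to rejecting the whole response is to coerce numeric tags to strings and drop anything else. The sketch below is hypothetical (`normalizeTags` is not a Hoarder function, and the sample input is made up); it is not the fix that shipped in the nightly build.

```typescript
// Hypothetical normalizer: keeps non-empty strings, converts finite numbers
// to strings, and silently drops every other kind of value.
function normalizeTags(value: unknown): string[] {
  if (typeof value !== "object" || value === null) return [];
  const tags = (value as { tags?: unknown }).tags;
  if (!Array.isArray(tags)) return [];
  const out: string[] = [];
  for (const tag of tags) {
    if (typeof tag === "string" && tag.trim() !== "") {
      out.push(tag.trim());
    } else if (typeof tag === "number" && Number.isFinite(tag)) {
      out.push(String(tag)); // e.g. a bare number the model forgot to quote
    }
  }
  return out;
}

// Made-up model output where index 3 is a number, the case a strict
// "array of strings" schema rejects.
const parsed = JSON.parse('{"tags": ["Cars", "Parts", "Audi", 2024]}');
console.log(normalizeTags(parsed));
```

Coercing trades strictness for fewer failed jobs; a stricter worker may prefer to fail and retry so that malformed output stays visible in the logs.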


@kurokuma-lab commented on GitHub (Feb 2, 2025):

I am having the same issue


@porochickenrye commented on GitHub (Mar 2, 2025):

Same error as @joshoram80. I'm using an OpenAI compatible endpoint, which works fine with my OpenWebUI setup.

2025-03-02T14:14:38.760Z error: [inference][69] inference job failed: Error: [inference][69] The model ignored our prompt and didn't respond with the expected JSON: {}. Here's a sneak peak from the response: ```json
{
  "tags": 
Error: [inference][69] The model ignored our prompt and didn't respond with the expected JSON: {}. Here's a sneak peak from the response: ```json
{
  "tags": 
    at inferTags (/app/apps/workers/openaiWorker.ts:6:4292)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
    at async Object.runOpenAI [as run] (/app/apps/workers/openaiWorker.ts:6:6814)
    at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/liteque@0.3.2_better-sqlite3@11.3.0/node_modules/liteque/dist/runner.js:2:2656)

@MohamedBassem commented on GitHub (Mar 2, 2025):

@porochickenrye the model that you're using is wrapping the output in a markdown block. Try adding a custom instruction in the AI setting instructing the model to output only the json without a markdown code block.
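A custom instruction is simply extra text the model sees alongside the built-in tagging prompt. As a hypothetical illustration, appending an instruction that forbids the markdown wrapper might look like the sketch below. `buildTaggingPrompt` and its abbreviated base prompt are made up for this sketch (only the final base line is quoted from the prompt in the LM Studio log); the real template is the one linked in `openaiWorker.ts` earlier in the thread, and Hoarder may merge custom instructions differently.

```typescript
// Hypothetical prompt assembly; not Hoarder's actual template or merge logic.
function buildTaggingPrompt(content: string, customInstructions: string[]): string {
  const base = [
    "Suggest 3-5 tags for the text between CONTENT START HERE and CONTENT END HERE.",
    "CONTENT START HERE",
    content,
    "CONTENT END HERE",
    'You must respond in JSON with the key "tags" and the value is an array of string tags.',
  ];
  // Placing custom instructions last is one choice: they become the final
  // thing the model reads before it starts generating.
  return [...base, ...customInstructions].join("\n");
}

const prompt = buildTaggingPrompt("GS2 Alcantara shift knob for VW/Audi", [
  "Respond with the raw JSON object only. Do not wrap it in a markdown code block.",
]);
console.log(prompt);
```

Small local models do not always obey such instructions, so a prompt fix like this is best-effort rather than a guarantee.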


@MohamedBassem commented on GitHub (Mar 2, 2025):

For the people getting a response format error, this has been fixed in the nightly build. If you're still facing problems, please open a new issue.


@porochickenrye commented on GitHub (Mar 2, 2025):

> @porochickenrye the model that you're using is wrapping the output in a markdown block. Try adding a custom instruction in the AI setting instructing the model to output only the json without a markdown code block.

Thanks. Will try that.

Also, I don't see a nightly tag on GHCR. Would it be possible to add one at some point?


@MohamedBassem commented on GitHub (Mar 2, 2025):

@porochickenrye nightly is just called "latest".
