[GH-ISSUE #646] TypeError parsing Ollama response when prompt is truncated #412

Closed
opened 2026-03-02 11:49:38 +03:00 by kerem · 1 comment

Originally created by @sbarbett on GitHub (Nov 11, 2024).
Original GitHub issue: https://github.com/karakeep-app/karakeep/issues/646

Describe the Bug

I've noticed that whenever an input prompt gets truncated, I get a TypeError from the Ollama response.

web-1          | 2024-11-11T20:44:43.789Z info: [inference][718] Starting an inference job for bookmark with id "fwkl0xsdqya1pk6ad9gsb4mn"
web-1          | 2024-11-11T20:44:44.471Z info: [search][812] Attempting to index bookmark with id f5zaz5x63gwtmi7akx0qc012 ...
web-1          | 2024-11-11T20:44:45.058Z info: [search][812] Completed successfully
web-1          | 2024-11-11T20:49:52.445Z info: [inference][718] Inferring tag for bookmark "fwkl0xsdqya1pk6ad9gsb4mn" used 1095 tokens and inferred: <redacted>
web-1          | 2024-11-11T20:49:52.478Z info: [inference][718] Completed successfully
web-1          | 2024-11-11T20:49:52.499Z info: [inference][712] Starting an inference job for bookmark with id "ex2ue3p24p4726elmm5h0qrt"
web-1          | 2024-11-11T20:49:53.286Z info: [search][813] Attempting to index bookmark with id fwkl0xsdqya1pk6ad9gsb4mn ...
web-1          | 2024-11-11T20:49:53.868Z info: [search][813] Completed successfully
web-1          | 2024-11-11T20:54:52.159Z error: [inference][712] inference job failed: TypeError: fetch failed
web-1          | TypeError: fetch failed
web-1          |     at node:internal/deps/undici/undici:13392:13
web-1          |     at async post (/app/apps/workers/node_modules/.pnpm/ollama@0.5.9/node_modules/ollama/dist/shared/ollama.9c897541.cjs:114:20)
web-1          |     at async Ollama.processStreamableRequest (/app/apps/workers/node_modules/.pnpm/ollama@0.5.9/node_modules/ollama/dist/shared/ollama.9c897541.cjs:232:25)
web-1          |     at async OllamaInferenceClient.runModel (/app/apps/workers/node_modules/.pnpm/@hoarder+shared@file+packages+shared_better-sqlite3@11.3.0/node_modules/@hoarder/shared/inference.ts:2:3206)
web-1          |     at async OllamaInferenceClient.inferFromText (/app/apps/workers/node_modules/.pnpm/@hoarder+shared@file+packages+shared_better-sqlite3@11.3.0/node_modules/@hoarder/shared/inference.ts:2:3956)
web-1          |     at async inferTagsFromText (/app/apps/workers/openaiWorker.ts:6:3135)
web-1          |     at async inferTags (/app/apps/workers/openaiWorker.ts:6:3370)
web-1          |     at async Object.runOpenAI [as run] (/app/apps/workers/openaiWorker.ts:6:6792)
web-1          |     at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/liteque@0.3.0_better-sqlite3@11.3.0/node_modules/liteque/dist/runner.js:2:2578)
web-1          | 2024-11-11T20:54:52.189Z info: [inference][721] Starting an inference job for bookmark with id "hkdkt9800t7fdbr4p03waeae"
web-1          | 2024-11-11T20:59:52.273Z error: [inference][721] inference job failed: TypeError: fetch failed
web-1          | TypeError: fetch failed
web-1          |     at node:internal/deps/undici/undici:13392:13
web-1          |     at async post (/app/apps/workers/node_modules/.pnpm/ollama@0.5.9/node_modules/ollama/dist/shared/ollama.9c897541.cjs:114:20)
web-1          |     at async Ollama.processStreamableRequest (/app/apps/workers/node_modules/.pnpm/ollama@0.5.9/node_modules/ollama/dist/shared/ollama.9c897541.cjs:232:25)
web-1          |     at async OllamaInferenceClient.runModel (/app/apps/workers/node_modules/.pnpm/@hoarder+shared@file+packages+shared_better-sqlite3@11.3.0/node_modules/@hoarder/shared/inference.ts:2:3206)
web-1          |     at async OllamaInferenceClient.inferFromText (/app/apps/workers/node_modules/.pnpm/@hoarder+shared@file+packages+shared_better-sqlite3@11.3.0/node_modules/@hoarder/shared/inference.ts:2:3956)
web-1          |     at async inferTagsFromText (/app/apps/workers/openaiWorker.ts:6:3135)
web-1          |     at async inferTags (/app/apps/workers/openaiWorker.ts:6:3370)
web-1          |     at async Object.runOpenAI [as run] (/app/apps/workers/openaiWorker.ts:6:6792)
web-1          |     at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/liteque@0.3.0_better-sqlite3@11.3.0/node_modules/liteque/dist/runner.js:2:2578)
web-1          | 2024-11-11T20:59:52.289Z info: [inference][721] Starting an inference job for bookmark with id "hkdkt9800t7fdbr4p03waeae"
web-1          | 2024-11-11T21:00:00.756Z info: [feed] Scheduling feed refreshing jobs ...

This is the log from Ollama. It's responding successfully (HTTP 200, after roughly five minutes).

Nov 11 20:45:06 llama ollama[3317]: time=2024-11-11T20:45:06.001Z level=INFO source=server.go:601 msg="llama runner started in 2.26 seconds"
Nov 11 20:49:52 llama ollama[3317]: [GIN] 2024/11/11 - 20:49:52 | 200 |          5m8s |        10.0.0.4 | POST     "/api/chat"
Nov 11 20:49:52 llama ollama[3317]: time=2024-11-11T20:49:52.521Z level=WARN source=runner.go:126 msg="truncating input prompt" limit=4096 prompt=4664 numKeep=5
Nov 11 20:54:52 llama ollama[3317]: [GIN] 2024/11/11 - 20:54:52 | 200 |         4m59s |        10.0.0.4 | POST     "/api/chat"
Nov 11 20:54:52 llama ollama[3317]: time=2024-11-11T20:54:52.202Z level=WARN source=runner.go:126 msg="truncating input prompt" limit=4096 prompt=8510 numKeep=5
Nov 11 20:59:52 llama ollama[3317]: [GIN] 2024/11/11 - 20:59:52 | 200 |          5m0s |        10.0.0.4 | POST     "/api/chat"
Nov 11 20:59:52 llama ollama[3317]: time=2024-11-11T20:59:52.302Z level=WARN source=runner.go:126 msg="truncating input prompt" limit=4096 prompt=8510 numKeep=5

I have the timeout set all the way up to 10 minutes. This is my environment.

HOARDER_VERSION=release
NEXTAUTH_SECRET=<redacted>
MEILI_MASTER_KEY=<redacted>
NEXTAUTH_URL=https://<redacted>
DISABLE_SIGNUPS=true
MEILI_ADDR=http://meilisearch:7700
# Ollama is in a separate LXC on the same Proxmox device
OLLAMA_BASE_URL=http://10.0.0.7:11434
INFERENCE_TEXT_MODEL=mistral
INFERENCE_IMAGE_MODEL=llava
INFERENCE_JOB_TIMEOUT_SEC=600
INFERENCE_CONTEXT_LENGTH=4096

Anything I can do to debug this further?

Steps to Reproduce

  1. Set up a Hoarder instance
  2. Configure environment to point to an Ollama instance
  3. Try tagging bookmarks and observe in the Docker logs that the ones whose prompts exceed the context length fail
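For step 3, a rough token estimate can predict which bookmarks will trip the truncation warning before running a job. The helper below is purely illustrative (it is not part of Hoarder), and the ~4-characters-per-token heuristic is only a ballpark — Ollama's real token count depends on the model's tokenizer:

```javascript
// Hypothetical helper (not Hoarder's code): estimate whether a bookmark's
// prompt will exceed INFERENCE_CONTEXT_LENGTH, using the common rough
// heuristic of ~4 characters per token. Real counts are model-specific.
const CONTEXT_LIMIT = 4096; // mirrors INFERENCE_CONTEXT_LENGTH above

function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

function willLikelyTruncate(promptText, limit = CONTEXT_LIMIT) {
  return estimateTokens(promptText) > limit;
}

console.log(willLikelyTruncate("a short prompt"));  // false
console.log(willLikelyTruncate("x".repeat(40000))); // true (~10000 estimated tokens)
```

Bookmarks flagged by a check like this would correspond to the `truncating input prompt` warnings in the Ollama log (e.g. `prompt=8510` against `limit=4096`).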

Expected Behaviour

Although the prompt is truncated, I would expect some manner of response to be parsed rather than a TypeError.

Screenshots or Additional Context

No response

Device Details

Hoarder is running in Docker in a Debian 12 LXC with 2 vCPUs and 8 GB of RAM (alongside multiple other Docker services).

Ollama is running "baremetal" on a Debian 12 LXC with 4 vCPUs and 10 GB of RAM (sufficient for the 7B parameter models).

Exact Hoarder Version

v0.19.0

kerem closed this issue 2026-03-02 11:49:39 +03:00

@MohamedBassem commented on GitHub (Nov 11, 2024):

This seems to be a duplicate of #628. Timeouts above 5 minutes on Ollama don't seem to work. Let's track it there.
