[GH-ISSUE #1611] [FR] Preflight checks on inference / browser service health before executing the background jobs #1006

Open
opened 2026-03-02 11:54:20 +03:00 by kerem · 4 comments
Owner

Originally created by @pdc1 on GitHub (Jun 15, 2025).
Original GitHub issue: https://github.com/karakeep-app/karakeep/issues/1611

Describe the feature you'd like

I would like inference tasks that time out to be retried. My Ollama machine is my desktop, which is often on standby during the day and off at night. The logs show `TypeError: fetch failed`, so the condition is detected, but the task appears to simply be marked as failed.

Alternatively, it would also work to have a specific retry category for requests that timed out.
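A minimal sketch of what distinguishing retryable failures could look like. The names and thresholds here are hypothetical illustrations, not Karakeep's actual code; the only source-backed detail is that Node's `fetch` throws `TypeError: fetch failed` when the host is unreachable.

```typescript
// Hypothetical sketch: classify inference failures so "fetch failed"
// (server unreachable) is treated as retryable, unlike permanent errors.
type FailureKind = "network" | "timeout" | "permanent";

function classifyFailure(err: unknown): FailureKind {
  const msg = err instanceof Error ? err.message : String(err);
  // Node's fetch throws TypeError("fetch failed") when the host is down.
  if (err instanceof TypeError && msg.includes("fetch failed")) return "network";
  if (msg.toLowerCase().includes("timeout")) return "timeout";
  return "permanent";
}

// Exponential backoff with a cap, so an offline desktop is re-probed
// every hour at most rather than hammered or abandoned.
function backoffMs(attempt: number, baseMs = 60_000, capMs = 3_600_000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}
```

With something like this, a `"network"` failure could be re-queued with `backoffMs(attempt)` instead of being terminally marked as failed.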

Describe the benefits this would bring to existing Karakeep users

More reliability on AI tagging, and more flexibility on hosting Ollama.

Can the goal of this request already be achieved via other means?

As far as I can tell, I would have to retry all failed AI requests (and for whatever reason I have a lot of those), or refresh individual links. But refreshing re-does everything (re-crawls, etc.), and it is not obvious which links failed for which reasons.

Have you searched for an existing open/closed issue?

  • I have searched for existing issues and none cover my fundamental request

Additional context

No response


@pdc1 commented on GitHub (Jun 15, 2025):

Alternatively, it would be nice to have an admin view of the API so I could write my own tool to find and retry failed inference tasks. I would love to get more insight into which tasks failed, and why (without poking around in the database 😉).

Update: it looks like only bookmarkLinks has more details on success/failure, so it might be nice to capture the nature of the inference failures to allow more nuanced reprocessing.


@grimwiz commented on GitHub (Jun 19, 2025):

Some of my timeouts happen because, for some unknown reason, the ollama process does not return an answer within 4 minutes. There's a difference between the AI process not listening (where retrying instantly is pointless; you may as well wait an hour) and it simply not finishing (where you don't want to retry with the same model, because if it failed last time it won't work next time). Maybe have a backup model to use on retries for requests that process for a while and fail to return an answer?
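The fallback idea above could be sketched roughly as follows. Everything here is hypothetical (the `run` callback and model names are illustrative placeholders, not part of Karakeep): if the primary model fails while the server itself is reachable, retry once with a lighter backup model instead of repeating a call that already failed.

```typescript
// Hypothetical sketch: one retry with a backup model when the
// primary model fails to produce an answer.
async function inferWithFallback(
  run: (model: string) => Promise<string>, // performs one inference call
  primary: string,
  backup: string,
): Promise<string> {
  try {
    return await run(primary);
  } catch {
    // The primary model ran but never answered; a lighter model may
    // succeed where retrying the same one would likely fail again.
    return await run(backup);
  }
}
```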


@pdc1 commented on GitHub (Jun 22, 2025):

I took a closer look at the logs, and what I described as a timeout is actually a type error. I am assuming that is because the request to the ollama server is failing with an HTTP status or an error message instead of JSON. I updated the original post to reflect this.

For this category of errors specifically, it would be great if Karakeep could check that the inference server and/or ollama are available before sending the request. This would allow marking the task/bookmark as being due to "networking issues" and make it available to retry on a regular basis.

Ideally it would be nice if the error type could be saved on failures, and to have some admin tools (web UI and/or API) to retry specific types of errors.
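A minimal sketch of such a preflight probe, run before dequeuing inference jobs. This is an assumption-laden illustration, not Karakeep's implementation: the `fetchFn` parameter is a hypothetical injection point for testing, and the probe just checks that the configured base URL answers at all (Ollama's root endpoint responds when the server is up).

```typescript
// Hypothetical preflight check: is the inference server reachable?
// Any thrown error (DNS failure, connection refused, timeout) means "down",
// so the job can be deferred instead of being marked permanently failed.
async function isInferenceServerUp(
  baseUrl: string,
  fetchFn: typeof fetch = fetch, // injectable for testing
  timeoutMs = 5_000,
): Promise<boolean> {
  try {
    const res = await fetchFn(baseUrl, { signal: AbortSignal.timeout(timeoutMs) });
    return res.ok;
  } catch {
    return false;
  }
}
```

A worker could call this once per batch and, on `false`, re-queue the batch with a delay and tag the bookmarks as failing due to "networking issues", as suggested above.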


@MohamedBassem commented on GitHub (Sep 7, 2025):

I'm planning to add support for preflight checks on inference server and browser health. Will use this issue for that.
