[GH-ISSUE #595] Some tags not being added after inference with Open AI endpoint #381

Closed
opened 2026-03-02 11:49:21 +03:00 by kerem · 1 comment

Originally created by @MakeSomeGood on GitHub (Oct 29, 2024).
Original GitHub issue: https://github.com/karakeep-app/karakeep/issues/595

Describe the Bug

I've noticed that some tags are not added after inference when using the OpenAI endpoint. For example, the tag "3D Printing" does not seem to get added.

While looking into the code, I found a potential starting point for investigation in the normalizeTag function in apps/workers/openaiWorker.ts. The current implementation uses a regex whose character class [ -_] is a range from space to underscore rather than three literal characters, so it also strips all digits and most ASCII punctuation:

return tag.toLowerCase().replace(/[ -_]/g, "");
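
To see the effect concretely, the following sketch runs the quoted expression in isolation. The function name `normalizeTagBuggy` is made up for this illustration; only the regex is taken from the line above, and nothing here shows how the rest of the worker uses the result.

```typescript
// Hypothetical stand-alone copy of the quoted expression, for illustration only.
function normalizeTagBuggy(tag: string): string {
  // Inside a character class, " -_" is a range from U+0020 (space) to
  // U+005F (underscore). That range covers the digits 0-9 and most ASCII
  // punctuation, not just space, dash and underscore.
  return tag.toLowerCase().replace(/[ -_]/g, "");
}

console.log(normalizeTagBuggy("3D Printing")); // "dprinting"
console.log(normalizeTagBuggy("DIY Projects")); // "diyprojects"
```

The leading "3" is dropped from "3D Printing", while a tag without digits such as "DIY Projects" only loses its space.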

Thanks for all your awesome work!

Steps to Reproduce

These steps used OpenAI-compatible endpoints with Mistral Large:

  1. Add "https://www.reddit.com/r/3Dprinting/"
  2. Check the logs to see whether "3D Printing" is one of the inferred tags (my logs showed: 3D Printing,Technology,DIY Projects,Reddit,Engineering)
  3. Check whether "3D Printing" was added to the tag list.

Expected Behaviour

The tag "3D Printing" should be one of the tags added.

Screenshots or Additional Context

Inference Log:

hoarder-web-1          | 2024-10-29T02:25:53.570Z info: [inference][113] Inferring tag for bookmark "keikas82ivrdaekqf7q33xys" used 1246 tokens and inferred: 3D Printing,Technology,DIY Projects,Reddit,Engineering
hoarder-web-1          | 2024-10-29T02:25:53.611Z info: [inference][113] Completed successfully

The entry after it was added:
[Screenshot: Screenshot_20241028_193408] https://github.com/user-attachments/assets/e873e9ac-a59b-46a3-990f-8be4f2a5e40b

Device Details

No response

Exact Hoarder Version

v0.18.0

kerem 2026-03-02 11:49:21 +03:00
  • closed this issue
  • added the bug label

@MohamedBassem commented on GitHub (Oct 29, 2024):

Yeah, that's a bug. I intended to remove the dashes but forgot to escape the dash in the regex. Thanks for the detailed report and for taking the time to go through the code and pinpoint the bug. I really appreciate it.
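
A minimal sketch of the fix described in this comment, assuming the intent is to strip only spaces, dashes and underscores. The function name `normalizeTagFixed` is hypothetical, and this is not necessarily the exact change that landed in the repository.

```typescript
// Hypothetical corrected variant, for illustration only.
function normalizeTagFixed(tag: string): string {
  // Escaping the dash makes it a literal character instead of a range
  // operator, so the class matches exactly space, "-" and "_".
  return tag.toLowerCase().replace(/[ \-_]/g, "");
}

console.log(normalizeTagFixed("3D Printing")); // "3dprinting"
console.log(normalizeTagFixed("3d-printing")); // "3dprinting"
console.log(normalizeTagFixed("3D_Printing")); // "3dprinting"
```

Placing the dash first or last in the class, for example `[ _-]`, would also make it literal without an escape.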
