[GH-ISSUE #864] Deduplication doesn't take into account the trailing "/" #564

Open · opened 2026-03-02 11:50:54 +03:00 by kerem

Originally created by @deandmx on GitHub (Jan 12, 2025).
Original GitHub issue: https://github.com/karakeep-app/karakeep/issues/864

Describe the Bug

I am not a programmer, so I may be well off base here, but this is what I've found.

I have imported all my bookmarks from the various services I've used over the years, totalling 12844 bookmarks. While scanning the list in Hoarder, I came across some duplicates where the only difference between the URLs was a trailing "/". I've seen that you deduplicate on import, but URLs that differ only in protocol (http vs https) are treated as unique.

Long story short, I knocked up a Python script with ChatGPT that takes my exported bookmarks, ignores the protocol and normalises the URLs (ignoring the trailing "/"), and the output was 2257!
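The reporter's script isn't included, so the following is only a minimal sketch of the kind of normalisation described, not Hoarder/Karakeep's actual deduplication logic. It assumes a comparison key that drops the scheme, lowercases the host and strips trailing "/" from the path while keeping the query string. The names `normalise_url`, `count_duplicates` and `deduplicate` are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit


def normalise_url(url: str) -> str:
    """Hypothetical comparison key: ignore the scheme and any trailing "/" in the path."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()      # host names are case-insensitive
    path = parts.path.rstrip("/")    # "/docs/" and "/docs" compare equal; "/" becomes ""
    key = host + path
    if parts.query:
        key += "?" + parts.query     # keep the query: it often selects different content
    return key                       # the #fragment is dropped entirely


def count_duplicates(urls: list[str]) -> int:
    """How many bookmarks would be dropped if only one copy per key were kept."""
    counts = Counter(normalise_url(u) for u in urls)
    return sum(n - 1 for n in counts.values())


def deduplicate(urls: list[str]) -> list[str]:
    """Keep the first URL seen for each normalised key, preserving order."""
    seen: set[str] = set()
    kept: list[str] = []
    for u in urls:
        key = normalise_url(u)
        if key not in seen:
            seen.add(key)
            kept.append(u)
    return kept
```

For example, with `["https://example.com/docs/", "http://example.com/docs", "https://example.com/about"]`, `count_duplicates` reports 1 and `deduplicate` keeps the first and third entries. Treating the scheme and trailing slash as insignificant is a heuristic: most sites serve the same page either way, but some servers do treat them differently, so an importer might prefer to flag such pairs rather than silently merge them.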

Steps to Reproduce

Import bookmarks that contain duplicate URLs, where one copy has a trailing slash added.
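A minimal reproduction file can be generated with a short script like the one below. This assumes the instance's HTML bookmark import accepts the standard Netscape bookmark file format; `example.com`, the titles and the helper name `build_bookmarks_html` are placeholders, not taken from the report.

```python
import html

# Two placeholder bookmarks that differ only by a trailing "/".
PAIRS = [
    ("https://example.com/docs", "Example docs"),
    ("https://example.com/docs/", "Example docs (trailing slash)"),
]


def build_bookmarks_html(entries: list[tuple[str, str]]) -> str:
    """Hypothetical helper: render (url, title) pairs as a minimal Netscape-format bookmarks file."""
    lines = [
        "<!DOCTYPE NETSCAPE-Bookmark-file-1>",
        '<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">',
        "<TITLE>Bookmarks</TITLE>",
        "<H1>Bookmarks</H1>",
        "<DL><p>",
    ]
    for url, title in entries:
        # Escape quotes and angle brackets so the attribute and link text stay well-formed.
        lines.append(f'    <DT><A HREF="{html.escape(url)}">{html.escape(title)}</A>')
    lines.append("</DL><p>")
    return "\n".join(lines) + "\n"
```

Writing the result with `pathlib.Path("duplicates.html").write_text(build_bookmarks_html(PAIRS), encoding="utf-8")` and importing that file should, per the expected behaviour, produce a single bookmark; the report indicates both copies are kept in v0.21.0.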

Expected Behaviour

Only one copy of each URL.

Screenshots or Additional Context

No response

Device Details

No response

Exact Hoarder Version

v0.21.0

Have you checked the troubleshooting guide?

  • I have checked the troubleshooting guide and I haven't found a solution to my problem