[GH-ISSUE #693] Importing over 12,000 bookmarks from html file #449

Closed
opened 2026-03-02 11:49:58 +03:00 by kerem · 12 comments

Originally created by @nkj8732 on GitHub (Nov 24, 2024).
Original GitHub issue: https://github.com/karakeep-app/karakeep/issues/693

Describe the Bug

Ok, I am not a techie, just someone wanting to use this great software. I am trying to import an HTML file exported from Brave with over 12,000 bookmarks, with local Ollama AI tagging. Hoarder seems to show it is importing all bookmarks (via the progress line), but when I go to Lists - Imported Bookmarks and scroll all the way down, only about 1,000 bookmarks seem to have been imported, and not all of those appear to be tagged or even fetched from the website. How do I fix this? Thank you. (If there is a better place to ask this question than here on GitHub, let me know.)

Steps to Reproduce

  1. import an html file exported from Brave with over 12,000 bookmarks
  2. go to Lists - Imported Bookmarks, and scroll all the way down

Expected Behaviour

I expect to see all 12,000+ bookmarks, fetched and AI-tagged.

Screenshots or Additional Context

No response

Device Details

No response

Exact Hoarder Version

v0.19.0

kerem closed this issue 2026-03-02 11:49:58 +03:00

@MohamedBassem commented on GitHub (Nov 24, 2024):

@nkj87lm9532 Can you go to the admin dashboard and see how many bookmarks got assigned to your user? And can you also screenshot the background jobs status?


@nkj8732 commented on GitHub (Nov 24, 2024):

Hi. Thank you for your kind response. 4246 Total Bookmarks.


@nkj8732 commented on GitHub (Nov 24, 2024):

![background jobs](https://github.com/user-attachments/assets/430503c3-a84a-4210-9c47-9631b29f6183)


@MohamedBassem commented on GitHub (Nov 24, 2024):

Ok, there are a lot of failures in your background workers. Can you share the logs of the web container to find out why?


@nkj8732 commented on GitHub (Nov 24, 2024):

How do I stop and reset all background jobs? I stopped and restarted the container in Docker, but all the background jobs are still there.


@MohamedBassem commented on GitHub (Nov 24, 2024):

you can stop hoarder and delete the 'queue.db' file in the data directory

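For anyone following along, that reset amounts to something like the following. Note that the compose service names and the data path here are assumptions, not Hoarder's documented layout; adjust them to your own deployment:

```shell
# Stop the Hoarder containers (service/container names vary per setup)
docker compose stop

# Remove the background task queue. The path below is only an
# example -- use wherever your Hoarder data volume is mounted.
rm /path/to/hoarder-data/queue.db

# Start everything back up; the queue file is recreated empty
docker compose start
```

`queue.db` holds only the background job queue; it is a different file from `db.db` (mentioned later in this thread), which appears to hold the bookmarks themselves, so only delete `db.db` if you intend to wipe all of your data.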

@nkj8732 commented on GitHub (Nov 25, 2024):

Ok, for any noobs like me who might be helped by this thread: I guess part of the problem was that 2/3 of the 12,000+ bookmarks were duplicates (apparently that is a problem with Brave Sync). I cleaned up the bookmarks using a web browser extension called Bookmark Dupes, which reduced them to 4101. I reset Hoarder by stopping the hoarder-web-1 container in Docker, then going to `/var/lib/docker/volumes/hoarder_data/_data` and deleting the db.db file. I then imported the 4101 bookmarks into Hoarder successfully. I left it overnight...


@nkj8732 commented on GitHub (Nov 25, 2024):

Now the question is: there are still many Queued and Pending jobs in Background Jobs. I would like to understand what is happening in Background Jobs, but I did not find an explanation in the Hoarder docs.
What is the difference between Queued and Pending?
Should I just let these jobs run? The numbers do seem to be diminishing, except for Inference Jobs, which increases under Queued but remains at the same number (4044) under Pending.


@nkj8732 commented on GitHub (Nov 25, 2024):

Well, it took a couple of days and nights for Hoarder to process the thousands of background jobs, but the good news is that it did. 144 and 255 jobs failed, but all bookmarks were imported and most were AI-tagged.


@nkj8732 commented on GitHub (Nov 25, 2024):

@MohamedBassem, I know you asked for a log file for the failures, so here is part of one. I did not realize my local Ollama was not working at the time, but I have it up now. Shall I send a txt file of the hoarder-web-1 log so you can see the crawling failures?


@nonsleepr commented on GitHub (Jan 2, 2025):

> What is the difference between Queued and Pending?

I've been puzzled by those terms too. Looking at the way it works under the hood, "pending" is the total number of links that haven't been processed (crawled/indexed) yet. "Queued" is the number of links in the task queue (`queue.db`).

> you can stop hoarder and delete the 'queue.db' file in the data directory

In my case, migrating 7000+ bookmarks from Omnivore, processing got stuck several times (no changes for days, with a few tasks in the "running" state) and restarts did nothing. I haven't tried deleting the whole db yet, but deleting records from it leaves those tasks entirely forgotten (you can't re-crawl pending, only re-crawl all).

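Since `queue.db` is a plain SQLite file, it can at least be inspected with `sqlite3` before deleting anything. A small sketch below, run against a throwaway database because the real queue schema isn't documented here; the `tasks` table and `status` column are invented stand-ins (run `.tables` against your actual `queue.db` to learn the real names):

```shell
# Sketch: poke at a queue-style SQLite db with sqlite3.
# The schema below is hypothetical; check the real queue.db
# with `.tables` first to see its actual table/column names.
DB=$(mktemp)
sqlite3 "$DB" "CREATE TABLE tasks (id INTEGER PRIMARY KEY, status TEXT);"
sqlite3 "$DB" "INSERT INTO tasks (status) VALUES ('queued'), ('queued'), ('running');"

# Count tasks per state, e.g. to spot jobs stuck in 'running'
sqlite3 "$DB" "SELECT status || '=' || COUNT(*) FROM tasks GROUP BY status ORDER BY status;"
# prints: queued=2
#         running=1

rm -f "$DB"
```

Counting rows per state this way is a read-only check, so it is safer than deleting records when a job looks stuck.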

@nonsleepr commented on GitHub (Jan 2, 2025):

> you can stop hoarder and delete the 'queue.db' file in the data directory

Yeah, that doesn't work either. Now I'm just getting 0 queued and ~2,000 pending.
