[GH-ISSUE #938] Ignore certain websites crawling/archiving after X tries #619

Closed
opened 2026-03-02 11:51:22 +03:00 by kerem · 4 comments
Owner

Originally created by @s1lverkin on GitHub (Jan 26, 2025).
Original GitHub issue: https://github.com/karakeep-app/karakeep/issues/938

Describe the feature you'd like

I am a link hoarder with thousands of URLs in Hoarder, so some of the links no longer exist, and crawling those is just a waste of time and resources.

Let's have a configuration option for how many retries we want. Of course, this would need some kind of "release" mechanism to remove sites from the blacklist.

Describe the benefits this would bring to existing Hoarder users

It would speed up the process of recrawling all failed URLs.

Can the goal of this request already be achieved via other means?

No

Have you searched for an existing open/closed issue?

  • I have searched for existing issues and none cover my fundamental request

Additional context

No response

kerem closed this issue 2026-03-02 11:51:22 +03:00
Author
Owner

@MohamedBassem commented on GitHub (Jan 26, 2025):

Hoarder is configured to retry 5 times and then give up on the link. I honestly don't see a lot of value in making this configurable. Once the initial crawling is done, you can find all the links that failed in the Broken Links settings page and delete them if you no longer want to keep them.
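The fixed retry cap described above can be sketched as a simple guard. This is a hypothetical illustration, not Karakeep's actual crawler code; the names (`recordFailure`, `MAX_CRAWL_ATTEMPTS`, the `Link` shape) are assumptions:

```typescript
// Hypothetical sketch of a capped-retry policy like the one described:
// a link is retried up to a fixed number of times, then marked broken.
const MAX_CRAWL_ATTEMPTS = 5; // the fixed cap mentioned in the comment

type LinkState = "pending" | "crawled" | "broken";

interface Link {
  url: string;
  attempts: number;
  state: LinkState;
}

// Update a link's state after one failed crawl attempt.
function recordFailure(link: Link): Link {
  const attempts = link.attempts + 1;
  return {
    ...link,
    attempts,
    // Give up once the cap is reached; the link would then show up
    // in a "Broken Links" style view instead of being retried.
    state: attempts >= MAX_CRAWL_ATTEMPTS ? "broken" : "pending",
  };
}

// Simulate a link that always fails to crawl.
let link: Link = { url: "https://example.com/gone", attempts: 0, state: "pending" };
while (link.state === "pending") {
  link = recordFailure(link);
}
console.log(link.attempts, link.state); // 5 "broken"
```

A per-user configurable cap, as requested in the issue, would amount to replacing the constant with a setting read from configuration.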

Author
Owner

@s1lverkin commented on GitHub (Jan 26, 2025):

Hi @MohamedBassem, oh, I didn't know about that. So it just retries 5 times, then the link goes to the Broken Links directory, but after hitting Regenerate Failed Links / AI Tags for Failed Bookmarks it gets reset again, right?

Author
Owner

@MohamedBassem commented on GitHub (Jan 26, 2025):

Yes, if you explicitly ask Hoarder to retry all failed links, it'll retry those links 5 more times. If you no longer want them, just find and remove them from the broken links directory.

Author
Owner

@s1lverkin commented on GitHub (Jan 26, 2025):

Thank you for the explanation. You were right, there's no need for this.
