[GH-ISSUE #339] [Feature Request] Allow 3rd party crawling #218

Closed
opened 2026-03-02 11:47:43 +03:00 by kerem · 3 comments

Originally created by @aaroneden on GitHub (Jul 31, 2024).
Original GitHub issue: https://github.com/karakeep-app/karakeep/issues/339

Allow connections to Zapier or to systems like FireCrawl for more robust crawling.


@kamtschatka commented on GitHub (Aug 3, 2024):

From what I can see, they only provide Markdown, whereas link scraping in Hoarder works on HTML, so the result would be more like a text bookmark than a link bookmark.

Judging by previous responses to issues like this, the intention is to keep Hoarder clean rather than add integrations for all kinds of (paid) services.
Can't you use the CLI to push the Markdown you scraped with those services to Hoarder?
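
For illustration only, here is a rough sketch of what "pushing scraped Markdown as a text bookmark" could look like over HTTP instead of the CLI. The endpoint path `/api/v1/bookmarks`, the `type`/`text` field names, and the bearer-token header are assumptions and should be checked against the API documentation of your own instance; `build_text_bookmark_request` is a hypothetical helper, not part of Hoarder.

```python
import json
import urllib.request


def build_text_bookmark_request(
    base_url: str, api_key: str, markdown: str
) -> urllib.request.Request:
    """Build (but do not send) a request that stores Markdown as a text bookmark."""
    # Assumption: the path and JSON field names below are illustrative only.
    payload = {"type": "text", "text": markdown}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v1/bookmarks",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder server address and key; replace with your own values.
    req = build_text_bookmark_request(
        "http://localhost:3000", "YOUR_API_KEY", "# Title\n\nScraped body"
    )
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```

The request is only built here, not sent, so the Markdown produced by an external crawler can be inspected before anything is stored.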


@MohamedBassem commented on GitHub (Aug 26, 2024):

Hoarder currently supports browserless (via `BROWSER_WEBSOCKET_URL`), since that's the container used on Unraid for Chrome, and supporting it still keeps Hoarder agnostic of 3rd party providers.

We don't currently plan to support more 3rd party crawling unless there's strong demand from the community.


@MohamedBassem commented on GitHub (Sep 15, 2024):

Closing this as it's unlikely it'll get implemented.
