[GH-ISSUE #28] feature request: pdf support #25

Closed
opened 2026-03-02 11:45:49 +03:00 by kerem · 2 comments
Owner

Originally created by @asg0451 on GitHub (Mar 27, 2024).
Original GitHub issue: https://github.com/karakeep-app/karakeep/issues/28

I'd like to be able to capture a PDF URL, e.g. https://gavinadair.files.wordpress.com/2017/03/baker-changes-of-mind.pdf

Currently the link is captured, but no tags are added and no text is extracted.
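![image](https://github.com/MohamedBassem/hoarder-app/assets/4358545/3f31cbf2-4b34-4f1f-9698-3833e5685a91)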

logs:

hoarder-workers 2024-03-27T16:53:27.624Z info: [Crawler][9] Will crawl "https://gavinadair.files.wordpress.com/2017/03/baker-changes-of-mind.pdf" for link with id "h03n4dihn2gp0kn8giwiyir7"
hoarder-workers 2024-03-27T16:53:27.813Z info: [search][30] Completed successfully
hoarder-workers 2024-03-27T16:53:27.822Z error: [Crawler][9] Crawling job failed: {}

@MohamedBassem commented on GitHub (Mar 27, 2024):

Yeah, only the HTML `Content-Type` currently works. PDF support is a reasonable feature request though. Will add it to the backlog. Thanks!
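
For illustration only, here is a minimal sketch of what branching on the `Content-Type` header could look like. It is not the project's crawler code: `classifyContentType` and `ContentKind` are hypothetical names, and treating `text/html` as the only supported HTML type is an assumption based on the comment above.

```typescript
// Hypothetical helper, not the project's actual crawler code.
type ContentKind = "html" | "pdf" | "unsupported";

function classifyContentType(header: string | null | undefined): ContentKind {
  if (!header) {
    return "unsupported";
  }
  // Drop parameters such as "; charset=utf-8" and normalize case,
  // since media types are case-insensitive.
  const [first = ""] = header.split(";");
  const mediaType = first.trim().toLowerCase();

  if (mediaType === "text/html") {
    return "html"; // the case that works today, per the comment above
  }
  if (mediaType === "application/pdf") {
    return "pdf"; // the case this issue asks for
  }
  return "unsupported";
}
```

A crawler could call a helper like this on the response header and send the `"pdf"` case to a separate text-extraction step rather than failing the whole job.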


@MarkLuk commented on GitHub (Jun 8, 2024):

Exactly my use-case! I research & bookmark a lot of PDF files. Would like to have support to view their content in the preview.
