[GH-ISSUE #593] can not fetch the picture: #378

Closed
opened 2026-03-02 11:49:20 +03:00 by kerem · 1 comment
Originally created by @leftchest on GitHub (Oct 28, 2024). Original GitHub issue: https://github.com/karakeep-app/karakeep/issues/593

### Describe the Bug

Cannot fetch the picture from the following pages:

1. https://linux.do/t/topic/101732
2. https://mp.weixin.qq.com/s?__biz=Mzk0MzYyMzExMQ%3D%3D&ascene=3&chksm=c25936b85404103d5d572eb3d3cd5ffb181f18cc418ef3cc6bb18ccc96514e0978e1e5ee94a3&clicktime=1730084477&countrycode=EE&devicetype=android-31&enterid=1730084477&exportkey=n_ChQIAhIQLKTcD7%2FreQqJrj0dvVLXFBLxAQIE97dBBAEAAAAAANNBK5KxDGAAAAAOpnltbLcz9gKNyK89dVj0wVLFxs5eWGDi4sJnwm5c0RdqnLfJDCU16QjL%2BYmonML99lRqdFMx%2FmqUSoDqUc4tGipPURo7XZNTk%2Bo%2FHooSvPHW%2FMB0UmLxYgZMe0vgvTK8tC6d7%2FJ3QxgpX7vb6%2BVYHjKE4RFt85Jv%2Bd1Ki%2FSYRxuFVoNCUs8mIkZlhUh9cxO5XZtTpHDvz67MvyzF4kzs0fDswXdWa0EHhvX7wYDzJU%2BaTa1QpPpPg8fs%2B9c5e2Jj8hxn%2FRGTX18g8BQ7I%2BLfGdOxcn5e%2BxNwO7U%3D&fasttmpl_flag=0&fasttmpl_fullversion=7442750-zh_CN-zip&fasttmpl_type=0&idx=1&lang=zh_CN&mid=2247484663&nettype=WIFI&pass_ticket=o6j9dJe02FB7QCLqGVRJdUEOkI0iFNcqpKAEyicLOGUAGzp2QKhBcneiCaP4gjzO&realreporttime=1730084477950&scene=126&session_us=gh_449a85299e74&sessionid=1730082275&sn=20947a8c81cac1fce388b562bd79a8b0&subscene=10000&version=28003339&wx_header=3

Thanks in advance.

### Steps to Reproduce

Cannot fetch the picture.

### Expected Behaviour

Cannot fetch the picture.

### Screenshots or Additional Context

_No response_

### Device Details

Linux

### Exact Hoarder Version

0.18
kerem closed this issue and added the question label (2026-03-02 11:49:20 +03:00).

@kamtschatka commented on GitHub (Oct 28, 2024):

https://linux.do/t/topic/101732 is behind Cloudflare protection, which is specifically designed to prevent crawlers from accessing the content. We will not be able to crawl that image without adding special handling and starting a back-and-forth with Cloudflare to circumvent the protection, and we are not going to do that. There are projects out there that try to do this, but they are not very successful either.

Your second link works fine for me:
![image](https://github.com/user-attachments/assets/e7eaa81a-4e6f-408a-a66d-e35d8e62560f)

Please provide the logs from when you crawl this page so we can see what is happening on your side.
