[GH-ISSUE #976] The crawling job failed despite the website working fine #644

Open
opened 2026-03-02 11:51:35 +03:00 by kerem · 4 comments

Originally created by @s1lverkin on GitHub (Feb 3, 2025).
Original GitHub issue: https://github.com/karakeep-app/karakeep/issues/976

Describe the Bug

I'm having issues with some websites, e.g. https://daisyui.com/, which works fine on various devices, but I don't think browserless is able to crawl it.

Steps to Reproduce

Add https://daisyui.com/ to bookmarks

Expected Behaviour

Should be crawled

Screenshots or Additional Context

2025-02-03T14:20:00.326Z info: [Crawler][10925] Will crawl "https://daisyui.com/" for link with id "o1ib5zpo6cvikz70hkd5r7me"
2025-02-03T14:20:00.326Z info: [Crawler][10925] Attempting to determine the content-type for the url https://daisyui.com/
2025-02-03T14:20:00.462Z info: [Crawler][10925] Content-type for the url https://daisyui.com/ is "text/html; charset=utf-8"
2025-02-03T14:20:00.462Z info: [Crawler] Connecting to existing browser websocket address: ws://192.168.1.65:3010/?stealth=1&--disable-web-security=true
2025-02-03T14:20:00.464Z error: [Crawler][10925] Crawling job failed: [object Object]
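
A note on the `[object Object]` in the line above: that is the string JavaScript produces when a plain object (rather than an `Error`) is interpolated into a message, so the real cause of this first failure is hidden. The TypeScript sketch below reproduces the effect and shows one way a logger could avoid it. `describeError` and the sample `thrown` value are hypothetical and are not taken from the Hoarder codebase; the shape and content of the actual error are unknown.

```typescript
// Hypothetical helper, NOT from the Hoarder codebase: turn any thrown
// value into a readable string for logging.
function describeError(e: unknown): string {
  if (e instanceof Error) {
    // Prefer the stack trace, fall back to "Name: message".
    return e.stack ?? `${e.name}: ${e.message}`;
  }
  if (typeof e === "object" && e !== null) {
    try {
      return JSON.stringify(e);
    } catch {
      // Circular structures cannot be serialised; fall back to the type tag.
      return Object.prototype.toString.call(e);
    }
  }
  return String(e);
}

// Made-up plain object standing in for whatever was actually thrown.
const thrown = { type: "error", message: "example failure" };

// Interpolating a plain object calls its default toString().
const naive = `Crawling job failed: ${thrown}`;
const detailed = `Crawling job failed: ${describeError(thrown)}`;

console.log(naive);    // Crawling job failed: [object Object]
console.log(detailed); // Crawling job failed: {"type":"error","message":"example failure"}
```

With a formatter like this the first log entry would have shown the underlying error instead of `[object Object]`.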

2025-02-03T14:26:14.841Z info: [Crawler][11000] Will crawl "https://daisyui.com/" for link with id "o1ib5zpo6cvikz70hkd5r7me"
2025-02-03T14:26:14.841Z info: [Crawler][11000] Attempting to determine the content-type for the url https://daisyui.com/
2025-02-03T14:26:14.985Z info: [Crawler][11000] Content-type for the url https://daisyui.com/ is "text/html; charset=utf-8"
2025-02-03T14:26:14.986Z info: [Crawler] Connecting to existing browser websocket address: ws://192.168.1.65:3010/?stealth=1&--disable-web-security=true
2025-02-03T14:26:15.602Z info: [Crawler][11000] Successfully navigated to "https://daisyui.com/". Waiting for the page to load ...
2025-02-03T14:26:20.602Z info: [Crawler][11000] Finished waiting for the page to load.
2025-02-03T14:27:14.829Z error: [Crawler][11000] Crawling job failed: Error: Timed-out after 60 secs
Error: Timed-out after 60 secs
at Timeout._onTimeout (/app/apps/workers/utils.ts:2:1025)
at listOnTimeout (node:internal/timers:594:17)
at process.processTimers (node:internal/timers:529:7)
2025-02-03T14:27:15.268Z info: [Crawler][11000] Will crawl "https://daisyui.com/" for link with id "o1ib5zpo6cvikz70hkd5r7me"
2025-02-03T14:27:15.268Z info: [Crawler][11000] Attempting to determine the content-type for the url https://daisyui.com/
2025-02-03T14:27:15.355Z info: [Crawler][11000] Content-type for the url https://daisyui.com/ is "text/html; charset=utf-8"
2025-02-03T14:27:15.355Z info: [Crawler] Connecting to existing browser websocket address: ws://192.168.1.65:3010/?stealth=1&--disable-web-security=true
2025-02-03T14:27:16.012Z info: [Crawler][11000] Successfully navigated to "https://daisyui.com/". Waiting for the page to load ...
2025-02-03T14:27:21.012Z info: [Crawler][11000] Finished waiting for the page to load.
2025-02-03T14:28:15.268Z error: [Crawler][11000] Crawling job failed: Error: Timed-out after 60 secs
Error: Timed-out after 60 secs
at Timeout._onTimeout (/app/apps/workers/utils.ts:2:1025)
at listOnTimeout (node:internal/timers:594:17)
at process.processTimers (node:internal/timers:529:7)
2025-02-03T14:28:15.822Z info: [Crawler][11000] Will crawl "https://daisyui.com/" for link with id "o1ib5zpo6cvikz70hkd5r7me"
2025-02-03T14:28:15.822Z info: [Crawler][11000] Attempting to determine the content-type for the url https://daisyui.com/
2025-02-03T14:28:15.916Z info: [Crawler][11000] Content-type for the url https://daisyui.com/ is "text/html; charset=utf-8"
2025-02-03T14:28:15.916Z info: [Crawler] Connecting to existing browser websocket address: ws://192.168.1.65:3010/?stealth=1&--disable-web-security=true
2025-02-03T14:28:16.508Z info: [Crawler][11000] Successfully navigated to "https://daisyui.com/". Waiting for the page to load ...
2025-02-03T14:28:21.509Z info: [Crawler][11000] Finished waiting for the page to load.

I'm receiving "Failed to fetch link content".
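
As a quick check on the timings, the TypeScript sketch below computes how long each attempt ran, using timestamps copied from the log above ("Will crawl" to "Crawling job failed"). The `Attempt` type and `elapsedMs` function are names introduced here for illustration only. The result suggests the first job failed almost immediately, right after connecting to the browser, while the later attempts ran into the 60-second limit.

```typescript
// Timestamps copied verbatim from the log lines above.
interface Attempt {
  label: string;
  start: string; // "Will crawl ..." line
  end: string;   // "Crawling job failed ..." line
}

const attempts: Attempt[] = [
  { label: "job 10925", start: "2025-02-03T14:20:00.326Z", end: "2025-02-03T14:20:00.464Z" },
  { label: "job 11000, attempt 1", start: "2025-02-03T14:26:14.841Z", end: "2025-02-03T14:27:14.829Z" },
  { label: "job 11000, attempt 2", start: "2025-02-03T14:27:15.268Z", end: "2025-02-03T14:28:15.268Z" },
];

// Elapsed wall-clock time of one attempt, in milliseconds.
function elapsedMs(a: Attempt): number {
  return Date.parse(a.end) - Date.parse(a.start);
}

const results = attempts.map(elapsedMs);

attempts.forEach((a, i) => {
  console.log(`${a.label}: ${results[i]} ms`);
});
// job 10925: 138 ms
// job 11000, attempt 1: 59988 ms
// job 11000, attempt 2: 60000 ms
```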

Device Details

No response

Exact Hoarder Version

0.22.0

Have you checked the troubleshooting guide?

  • I have checked the troubleshooting guide and I haven't found a solution to my problem

@hametovbr commented on GitHub (Feb 8, 2025):

Same here. I cannot crawl any new link, e.g. this GitHub repo: https://github.com/mealie-recipes/mealie

@MohamedBassem commented on GitHub (Feb 9, 2025):

Hmm, are you by any chance running Hoarder on low-power hardware, or not giving it enough resources? My guess is that the Chrome container is running out of memory or something similar when it attempts to fetch the content of the page. Maybe try the headless mode (by commenting out BROWSER_WEBSOCKET_URL) and see if it helps?
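
To illustrate the suggestion, here is a minimal TypeScript sketch of how a worker might pick its mode from the environment. Only the variable name `BROWSER_WEBSOCKET_URL` comes from this thread; the `BrowserMode` type, the `resolveBrowserMode` function, and the rule that an unset or blank value means headless mode are assumptions for illustration, not Hoarder's actual logic.

```typescript
// Hypothetical sketch: type and function names are illustrative only.
type BrowserMode =
  | { kind: "remote"; wsUrl: string } // connect to an external Chrome/browserless instance
  | { kind: "headless" };             // the fallback the thread calls "headless mode"

function resolveBrowserMode(env: Record<string, string | undefined>): BrowserMode {
  // Assumption: a commented-out variable is simply absent from the environment.
  const wsUrl = env["BROWSER_WEBSOCKET_URL"]?.trim();
  return wsUrl ? { kind: "remote", wsUrl } : { kind: "headless" };
}

// With the variable set (value taken from the log in this issue):
const withBrowser = resolveBrowserMode({
  BROWSER_WEBSOCKET_URL: "ws://192.168.1.65:3010/?stealth=1&--disable-web-security=true",
});

// With the variable commented out of the env file:
const withoutBrowser = resolveBrowserMode({});

console.log(withBrowser.kind);    // remote
console.log(withoutBrowser.kind); // headless
```

If crawling succeeds in the second mode but not the first, that points at the external browser container rather than at the site being crawled.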


@s1lverkin commented on GitHub (Feb 9, 2025):

@MohamedBassem Partial success: it archived the content, but without a thumbnail or a screenshot.

I think I am giving it enough resources, as I run this in Docker on Unraid and have a 6c/12t system.


@MohamedBassem commented on GitHub (Feb 9, 2025):

Screenshots don't work in headless mode; that's expected. As for the thumbnail, daisyui's thumbnail didn't work for me either. Maybe try other websites?
