starred/karakeep

mirror of https://github.com/karakeep-app/karakeep.git synced 2026-04-24 23:46:06 +03:00

[GH-ISSUE #976] The crawling job failed despite the website working fine #644

Open

opened 2026-03-02 11:51:35 +03:00 by kerem · 4 comments

kerem commented

2026-03-02 11:51:35 +03:00

Owner

Originally created by @s1lverkin on GitHub (Feb 3, 2025).
Original GitHub issue: https://github.com/karakeep-app/karakeep/issues/976

Describe the Bug

I'm having issues with some websites, e.g. https://daisyui.com/, which loads fine in browsers on various devices, but browserless does not seem to be able to crawl it.

Steps to Reproduce

Add https://daisyui.com/ to bookmarks

Expected Behaviour

Should be crawled

Screenshots or Additional Context

2025-02-03T14:20:00.326Z info: [Crawler][10925] Will crawl "https://daisyui.com/" for link with id "o1ib5zpo6cvikz70hkd5r7me"
2025-02-03T14:20:00.326Z info: [Crawler][10925] Attempting to determine the content-type for the url https://daisyui.com/
2025-02-03T14:20:00.462Z info: [Crawler][10925] Content-type for the url https://daisyui.com/ is "text/html; charset=utf-8"
2025-02-03T14:20:00.462Z info: [Crawler] Connecting to existing browser websocket address: ws://192.168.1.65:3010/?stealth=1&--disable-web-security=true
2025-02-03T14:20:00.464Z error: [Crawler][10925] Crawling job failed: [object Object]

2025-02-03T14:26:14.841Z info: [Crawler][11000] Will crawl "https://daisyui.com/" for link with id "o1ib5zpo6cvikz70hkd5r7me"
2025-02-03T14:26:14.841Z info: [Crawler][11000] Attempting to determine the content-type for the url https://daisyui.com/
2025-02-03T14:26:14.985Z info: [Crawler][11000] Content-type for the url https://daisyui.com/ is "text/html; charset=utf-8"
2025-02-03T14:26:14.986Z info: [Crawler] Connecting to existing browser websocket address: ws://192.168.1.65:3010/?stealth=1&--disable-web-security=true
2025-02-03T14:26:15.602Z info: [Crawler][11000] Successfully navigated to "https://daisyui.com/". Waiting for the page to load ...
2025-02-03T14:26:20.602Z info: [Crawler][11000] Finished waiting for the page to load.
2025-02-03T14:27:14.829Z error: [Crawler][11000] Crawling job failed: Error: Timed-out after 60 secs
Error: Timed-out after 60 secs
at Timeout._onTimeout (/app/apps/workers/utils.ts:2:1025)
at listOnTimeout (node:internal/timers:594:17)
at process.processTimers (node:internal/timers:529:7)
2025-02-03T14:27:15.268Z info: [Crawler][11000] Will crawl "https://daisyui.com/" for link with id "o1ib5zpo6cvikz70hkd5r7me"
2025-02-03T14:27:15.268Z info: [Crawler][11000] Attempting to determine the content-type for the url https://daisyui.com/
2025-02-03T14:27:15.355Z info: [Crawler][11000] Content-type for the url https://daisyui.com/ is "text/html; charset=utf-8"
2025-02-03T14:27:15.355Z info: [Crawler] Connecting to existing browser websocket address: ws://192.168.1.65:3010/?stealth=1&--disable-web-security=true
2025-02-03T14:27:16.012Z info: [Crawler][11000] Successfully navigated to "https://daisyui.com/". Waiting for the page to load ...
2025-02-03T14:27:21.012Z info: [Crawler][11000] Finished waiting for the page to load.
2025-02-03T14:28:15.268Z error: [Crawler][11000] Crawling job failed: Error: Timed-out after 60 secs
Error: Timed-out after 60 secs
at Timeout._onTimeout (/app/apps/workers/utils.ts:2:1025)
at listOnTimeout (node:internal/timers:594:17)
at process.processTimers (node:internal/timers:529:7)
2025-02-03T14:28:15.822Z info: [Crawler][11000] Will crawl "https://daisyui.com/" for link with id "o1ib5zpo6cvikz70hkd5r7me"
2025-02-03T14:28:15.822Z info: [Crawler][11000] Attempting to determine the content-type for the url https://daisyui.com/
2025-02-03T14:28:15.916Z info: [Crawler][11000] Content-type for the url https://daisyui.com/ is "text/html; charset=utf-8"
2025-02-03T14:28:15.916Z info: [Crawler] Connecting to existing browser websocket address: ws://192.168.1.65:3010/?stealth=1&--disable-web-security=true
2025-02-03T14:28:16.508Z info: [Crawler][11000] Successfully navigated to "https://daisyui.com/". Waiting for the page to load ...
2025-02-03T14:28:21.509Z info: [Crawler][11000] Finished waiting for the page to load.
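
The first attempt in the log above ends with "Crawling job failed: [object Object]". That text is what JavaScript typically produces when a plain object, rather than an Error, is converted to a string, so the actual failure reason is not visible in the log. The following is a minimal sketch of how such a value could be rendered more usefully. `describeError` is a hypothetical helper and the `rejection` value is made up for illustration; neither is taken from the Karakeep source, and the value that was actually rejected here is unknown.

```typescript
// Hypothetical helper, not taken from the Karakeep source.
function describeError(err: unknown): string {
  if (err instanceof Error) {
    // The stack (when present) already contains the name and message.
    return err.stack ?? `${err.name}: ${err.message}`;
  }
  if (typeof err === "object" && err !== null) {
    try {
      return JSON.stringify(err);
    } catch {
      // Circular structures cannot be serialized; fall back to the type tag.
      return Object.prototype.toString.call(err);
    }
  }
  return String(err);
}

// Made-up rejection value, for illustration only.
const rejection = { type: "error", message: "example failure" };

// Interpolating a plain object into a string loses its contents.
console.log(`Crawling job failed: ${rejection}`);
// Serializing it first keeps the fields readable.
console.log(`Crawling job failed: ${describeError(rejection)}`);
```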

Receiving "Failed to fetch link content"
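
The later attempts fail with "Timed-out after 60 secs", and the stack trace points at a timer callback (Timeout._onTimeout) in apps/workers/utils.ts. That pattern is consistent with a job being raced against a timer. Below is a minimal sketch of such a wrapper, assuming that design; `withTimeout` and its parameters are hypothetical names, and the real helper may differ.

```typescript
// Hypothetical sketch; the real helper in apps/workers/utils.ts may differ.
function withTimeout<T>(work: Promise<T>, timeoutSecs: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    // Reject once the deadline passes, whatever state the work is in.
    const timer = setTimeout(
      () => reject(new Error(`Timed-out after ${timeoutSecs} secs`)),
      timeoutSecs * 1000,
    );
    // If the work settles first, cancel the timer and forward the outcome.
    work.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}

// Usage: a job that never settles is rejected once the deadline passes.
const neverSettles = new Promise<string>(() => {});
withTimeout(neverSettles, 0.05).catch((err: Error) => console.log(err.message));
```

A wrapper like this only stops waiting; it does not cancel the underlying work. In this sketch, a page that hangs in the browser would keep consuming browser resources after the job is reported as failed and retried.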

Device Details

No response

Exact Hoarder Version

0.22.0

Have you checked the troubleshooting guide?

I have checked the troubleshooting guide and I haven't found a solution to my problem


kerem added the question label 2026-03-02 11:51:35 +03:00

kerem commented

2026-03-02 11:51:36 +03:00

Author

Owner

@hametovbr commented on GitHub (Feb 8, 2025):

Same here, I cannot crawl any new link, e.g. the GitHub repo https://github.com/mealie-recipes/mealie

kerem commented

2026-03-02 11:51:36 +03:00

Author

Owner

@MohamedBassem commented on GitHub (Feb 9, 2025):

Hmm, are you by any chance running hoarder on low-power hardware? Or not giving it enough resources? My guess is that the chrome container is running out of memory or something when it attempts to fetch the content of the page. Maybe try the headless mode (by commenting out BROWSER_WEBSOCKET_URL) and see if it helps?

kerem commented

2026-03-02 11:51:36 +03:00

Author

Owner

@s1lverkin commented on GitHub (Feb 9, 2025):

@MohamedBassem Partial success, it archived the content but without a thumbnail or a screenshot.

I think I am giving it enough resources, as I run this in Docker on Unraid on a 6c/12t system.

kerem commented

2026-03-02 11:51:37 +03:00

Author

Owner

@MohamedBassem commented on GitHub (Feb 9, 2025):

Screenshots don't work in headless mode, that's expected. As for the thumbnail, daisyui's thumbnail didn't work for me either. Maybe try other websites?

kerem referenced this issue

2026-03-02 11:52:05 +03:00

[GH-ISSUE #1074] UI tile and text sizing customization #708
