[GH-ISSUE #2220] Browserless browser shows as disconnected (fetch failed) although crawling is working fine #1357

Open
opened 2026-03-02 11:56:45 +03:00 by kerem · 1 comment
Owner

Originally created by @alexanderccc on GitHub (Dec 4, 2025).
Original GitHub issue: https://github.com/karakeep-app/karakeep/issues/2220

Describe the Bug

Hello,

Let me start off by saying that this project is awesome and it helped me a lot with keeping track and organizing my links/bookmarks! Really appreciate all the effort that went into it.

I've just upgraded to 0.29.1 from 0.27.1 and this release introduces the Service Connections section with the health of each service. I've observed some issues here with the Browser connection.

I'm using browserless (https://github.com/browserless/browserless) as the crawler in my Karakeep deployment, and although the crawler works just fine, it is reported as Disconnected with a "fetch failed" error.

I added a bookmark for the GitHub issues page of this repository; here are the logs of the crawl:

2025-12-04T23:38:56.600Z info: [Crawler][437:0] Will crawl "https://github.com/karakeep-app/karakeep/issues?q=is%3Aissue%20state%3Aopen%20browserless" for link with id "pkr384ureptihk2gnv2yly6b"
2025-12-04T23:38:56.600Z info: [Crawler][437:0] Attempting to determine the content-type for the url https://github.com/karakeep-app/karakeep/issues?q=is%3Aissue%20state%3Aopen%20browserless
2025-12-04T23:38:56.618Z info: [webhook][440] Starting a webhook job for bookmark with id "pkr384ureptihk2gnv2yly6b for operation "created"
2025-12-04T23:38:56.618Z info: [webhook][440] Completed successfully
2025-12-04T23:38:56.626Z info: [ruleEngine][438] Completed successfully
2025-12-04T23:38:57.127Z info: [Crawler][437:0] Content-type for the url https://github.com/karakeep-app/karakeep/issues?q=is%3Aissue%20state%3Aopen%20browserless is "text/html"
2025-12-04T23:38:57.354Z info: [Crawler][437:0] Navigating to "https://github.com/karakeep-app/karakeep/issues?q=is%3Aissue%20state%3Aopen%20browserless"
2025-12-04T23:38:57.460Z info: [search][439] Completed successfully
2025-12-04T23:38:58.720Z info: [Crawler][437:0] Successfully navigated to "https://github.com/karakeep-app/karakeep/issues?q=is%3Aissue%20state%3Aopen%20browserless". Waiting for the page to load ...
2025-12-04T23:38:59.682Z info: [Crawler][437:0] Finished waiting for the page to load.
2025-12-04T23:38:59.699Z info: [Crawler][437:0] Successfully fetched the page content.
2025-12-04T23:38:59.814Z info: [Crawler][437:0] Finished capturing page content and a screenshot. FullPageScreenshot: false
2025-12-04T23:38:59.842Z info: [Crawler][437:0] Will attempt to extract metadata from page ...
2025-12-04T23:39:00.187Z info: [Crawler][437:0] Done extracting metadata from the page.
2025-12-04T23:39:00.187Z info: [Crawler][437:0] Will attempt to extract readable content ...
2025-12-04T23:39:01.117Z info: [Crawler][437:0] Done extracting readable content.
2025-12-04T23:39:01.127Z info: [Crawler][437:0] Stored the screenshot as assetId: f7f12c10-83c5-43dc-bfef-b987034049c7 (73370 bytes)
2025-12-04T23:39:01.128Z info: [Crawler][437:0] Stored large HTML content (26232 bytes) as asset: adcd72d7-9d7f-461f-ac0b-0c51b3ccf3cf
2025-12-04T23:39:01.129Z info: [Crawler][437:0] Downloading image from "https://opengraph.githubassets.com/c710728ef7547071ec0baae5b97a8d5dd204709c8946a725dfc21c1ba3454be7/..."
2025-12-04T23:39:01.297Z info: [Crawler][437:0] Downloaded image as assetId: 6b27833c-ee32-478b-9e2b-84505ad5668c (55076 bytes)
2025-12-04T23:39:01.303Z info: [Crawler][437] Completed successfully

The connection to browserless is done over a websocket, not sure if this impacts anything. It's configured as follows:

- name: BROWSER_WEB_URL
  value: "ws://browserless:3000"

I'm assuming that browserless closes the connection when it is not in active use, and that this causes the service to be reported as unhealthy, since I see the following in the logs periodically:

2025-12-04T23:52:01.589Z info: [Crawler] The Playwright browser got disconnected. Will attempt to launch it again.
2025-12-04T23:52:01.589Z info: [Crawler] Connecting to existing browser instance: ws://browserless:3000
2025-12-04T23:52:01.590Z info: [Crawler] Successfully resolved IP address, new address: ws://10.43.203.45:3000/

IMHO the above should be fine though, no need to keep a connection established when there's nothing to crawl 🤷

Steps to Reproduce

1. Use a browserless image as the crawler for Karakeep
2. Set the connection over websocket
3. Health of browser will show as Disconnected
4. Crawls will work just fine

Expected Behaviour

Service to show as healthy

Screenshots or Additional Context

Image (screenshot): https://github.com/user-attachments/assets/bcf22bac-c894-4088-887d-afe60cdf2345

Device Details

No response

Exact Karakeep Version

0.29.1

Have you checked the troubleshooting guide?

  • I have checked the troubleshooting guide and I haven't found a solution to my problem
Author
Owner

@MohamedBassem commented on GitHub (Dec 7, 2025):

hmmm, yeah, that's a bug. Didn't know that you can pass a websocket to BROWSER_WEB_URL. The code doesn't currently handle this right.
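
For anyone following along, here is a minimal sketch of one way a health check could cope with a websocket value in BROWSER_WEB_URL. It assumes, based only on the "fetch failed" error reported above, that the check issues an HTTP request against the configured URL, which cannot work with a ws:// scheme. The helper name `toHttpProbeUrl` is hypothetical and not taken from the Karakeep codebase; it only rewrites the scheme so that an HTTP probe has something it can reach.

```typescript
// Hypothetical helper, not part of Karakeep: maps a websocket browser URL
// to the equivalent HTTP(S) URL so that an HTTP-based health probe can use it.
// URLs that already use http:// or https:// are returned unchanged.
function toHttpProbeUrl(browserUrl: string): string {
  // "ws://host:port"  -> "http://host:port"
  // "wss://host:port" -> "https://host:port"
  return browserUrl.replace(/^ws(s?):\/\//i, "http$1://");
}

// Example with the value from this issue:
const probeUrl: string = toHttpProbeUrl("ws://browserless:3000");
console.log(probeUrl); // http://browserless:3000
```

Whether the browserless instance actually answers HTTP requests on that address, and on which path, is a separate assumption that would need to be checked against the deployment.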
