[GH-ISSUE #248] [Crawler] Failed to connect to the browser instance, will retry in 5 secs #173

Closed
opened 2026-03-02 11:47:18 +03:00 by kerem · 6 comments

Originally created by @francisafu on GitHub (Jun 21, 2024).
Original GitHub issue: https://github.com/karakeep-app/karakeep/issues/248

The workers continue to output error information, and the crawler doesn't work.

1. Workers' log:

2024-06-21T17:13:05.149Z info: Workers version: 0.14.0
2024-06-21T17:13:05.164Z info: [Crawler] Connecting to existing browser instance: http://chrome:9222
2024-06-21T17:13:05.183Z info: [Crawler] Successfully resolved IP address, new address: http://172.20.0.3:9222/
2024-06-21T17:13:06.905Z error: [Crawler] Failed to connect to the browser instance, will retry in 5 secs
2024-06-21T17:13:06.906Z info: Starting crawler worker ...
2024-06-21T17:13:06.908Z info: Starting inference worker ...
2024-06-21T17:13:06.908Z info: Starting search indexing worker ...
2024-06-21T17:13:11.907Z info: [Crawler] Connecting to existing browser instance: http://chrome:9222
2024-06-21T17:13:11.908Z info: [Crawler] Successfully resolved IP address, new address: http://172.20.0.3:9222/
2024-06-21T17:13:13.510Z error: [Crawler] Failed to connect to the browser instance, will retry in 5 secs
2024-06-21T17:13:18.512Z info: [Crawler] Connecting to existing browser instance: http://chrome:9222
2024-06-21T17:13:18.513Z info: [Crawler] Successfully resolved IP address, new address: http://172.20.0.3:9222/
2024-06-21T17:13:20.065Z error: [Crawler] Failed to connect to the browser instance, will retry in 5 secs
2024-06-21T17:13:25.067Z info: [Crawler] Connecting to existing browser instance: http://chrome:9222
2024-06-21T17:13:25.067Z info: [Crawler] Successfully resolved IP address, new address: http://172.20.0.3:9222/
2024-06-21T17:13:26.662Z error: [Crawler] Failed to connect to the browser instance, will retry in 5 secs
2024-06-21T17:13:31.663Z info: [Crawler] Connecting to existing browser instance: http://chrome:9222
2024-06-21T17:13:31.664Z info: [Crawler] Successfully resolved IP address, new address: http://172.20.0.3:9222/
2024-06-21T17:13:33.265Z error: [Crawler] Failed to connect to the browser instance, will retry in 5 secs
2024-06-21T17:13:38.265Z info: [Crawler] Connecting to existing browser instance: http://chrome:9222
2024-06-21T17:13:38.267Z info: [Crawler] Successfully resolved IP address, new address: http://172.20.0.3:9222/
2024-06-21T17:13:39.878Z error: [Crawler] Failed to connect to the browser instance, will retry in 5 secs
2024-06-21T17:13:44.878Z info: [Crawler] Connecting to existing browser instance: http://chrome:9222
2024-06-21T17:13:44.879Z info: [Crawler] Successfully resolved IP address, new address: http://172.20.0.3:9222/
2024-06-21T17:13:46.436Z error: [Crawler] Failed to connect to the browser instance, will retry in 5 secs
2024-06-21T17:13:51.439Z info: [Crawler] Connecting to existing browser instance: http://chrome:9222
2024-06-21T17:13:51.439Z info: [Crawler] Successfully resolved IP address, new address: http://172.20.0.3:9222/
2024-06-21T17:13:53.037Z error: [Crawler] Failed to connect to the browser instance, will retry in 5 secs
2024-06-21T17:13:58.039Z info: [Crawler] Connecting to existing browser instance: http://chrome:9222
2024-06-21T17:13:58.040Z info: [Crawler] Successfully resolved IP address, new address: http://172.20.0.3:9222/
2024-06-21T17:13:59.594Z error: [Crawler] Failed to connect to the browser instance, will retry in 5 secs
2024-06-21T17:14:04.595Z info: [Crawler] Connecting to existing browser instance: http://chrome:9222
2024-06-21T17:14:04.596Z info: [Crawler] Successfully resolved IP address, new address: http://172.20.0.3:9222/
2024-06-21T17:14:06.190Z error: [Crawler] Failed to connect to the browser instance, will retry in 5 secs
2024-06-21T17:14:11.191Z info: [Crawler] Connecting to existing browser instance: http://chrome:9222
2024-06-21T17:14:11.192Z info: [Crawler] Successfully resolved IP address, new address: http://172.20.0.3:9222/
2024-06-21T17:14:12.734Z error: [Crawler] Failed to connect to the browser instance, will retry in 5 secs
... (and so on)

2. Chrome's log:

[0621/171258.016381:ERROR:bus.cc(407)] Failed to connect to the bus: Failed to connect to socket /var/run/dbus/system_bus_socket: No such file or directory
[0621/171258.017399:ERROR:bus.cc(407)] Failed to connect to the bus: Failed to connect to socket /var/run/dbus/system_bus_socket: No such file or directory
[0621/171258.021242:ERROR:bus.cc(407)] Failed to connect to the bus: Failed to connect to socket /var/run/dbus/system_bus_socket: No such file or directory
[0621/171258.024240:WARNING:dns_config_service_linux.cc(427)] Failed to read DnsConfig.
[0621/171258.104721:INFO:policy_logger.cc(145)] :components/policy/core/common/config_dir_policy_loader.cc(118) Skipping mandatory platform policies because no policy file was found at: /etc/chromium/policies/managed
[0621/171258.104812:INFO:policy_logger.cc(145)] :components/policy/core/common/config_dir_policy_loader.cc(118) Skipping recommended platform policies because no policy file was found at: /etc/chromium/policies/recommended

DevTools listening on ws://0.0.0.0:9222/devtools/browser/17f397b8-a977-4ace-80bc-6fca6b9b4b4a
[0621/171258.112522:WARNING:bluez_dbus_manager.cc(248)] Floss manager not present, cannot set Floss enable/disable.
[0621/171258.168899:WARNING:sandbox_linux.cc(418)] InitializeSandbox() called with multiple threads in process gpu-process.
[0621/171258.266562:WARNING:dns_config_service_linux.cc(427)] Failed to read DnsConfig.

3. Docker Compose file (identical to the default, except for the port mapping):

version: "3.8"
services:
  web:
    image: ghcr.io/hoarder-app/hoarder-web:${HOARDER_VERSION:-release}
    restart: unless-stopped
    volumes:
      - data:/data
    ports:
      - 6600:3000
    env_file:
      - .env
    environment:
      REDIS_HOST: redis
      MEILI_ADDR: http://meilisearch:7700
      DATA_DIR: /data
  redis:
    image: redis:7.2-alpine
    restart: unless-stopped
    volumes:
      - redis:/data
  chrome:
    image: gcr.io/zenika-hub/alpine-chrome:123
    restart: unless-stopped
    command:
      - --no-sandbox
      - --disable-gpu
      - --disable-dev-shm-usage
      - --remote-debugging-address=0.0.0.0
      - --remote-debugging-port=9222
      - --hide-scrollbars
  meilisearch:
    image: getmeili/meilisearch:v1.6
    restart: unless-stopped
    env_file:
      - .env
    environment:
      MEILI_NO_ANALYTICS: "true"
    volumes:
      - meilisearch:/meili_data
  workers:
    image: ghcr.io/hoarder-app/hoarder-workers:${HOARDER_VERSION:-release}
    restart: unless-stopped
    volumes:
      - data:/data
    env_file:
      - .env
    environment:
      REDIS_HOST: redis
      MEILI_ADDR: http://meilisearch:7700
      BROWSER_WEB_URL: http://chrome:9222
      DATA_DIR: /data
      # OPENAI_API_KEY: ...
    depends_on:
      web:
        condition: service_started

volumes:
  redis:
  meilisearch:
  data:

4. Environment:

  • Unraid 6.12.10
  • Compose.Manager 2024.05.10
kerem closed this issue and added the question label (2026-03-02 11:47:18 +03:00).

@MohamedBassem commented on GitHub (Jun 22, 2024):

That's interesting, because your configuration looks good to me.
Let's start with the obvious suggestions, have you attempted to turn down the stack and turn it up again? :D


@francisafu commented on GitHub (Jun 23, 2024):

> That's interesting, because your configuration looks good to me. Let's start with the obvious suggestions, have you attempted to turn down the stack and turn it up again? :D

Yeah, of course. I've restarted it and brought the compose stack down and up multiple times; it doesn't work, and the same problem still occurs.


@francisafu commented on GitHub (Jun 23, 2024):

Forgot to post the ENV file. I'll put it here:

HOARDER_VERSION=release
NEXTAUTH_SECRET=some_random_keys
MEILI_MASTER_KEY=some_other_random_keys
NEXTAUTH_URL=http://192.168.124.2:6600
MAX_ASSET_SIZE_MB=20480
OPENAI_API_KEY=fk**************
OPENAI_BASE_URL=https://*****.net
INFERENCE_LANG=chinese

@dodying commented on GitHub (Jul 8, 2024):

After inserting `console.log(e);` before [crawlerWorker.ts line 116](https://github.com/hoarder-app/hoarder/blob/main/apps/workers/crawlerWorker.ts#L116), I found that the worker downloads the adblocker's EasyList rules from GitHub, and for some reason setting a proxy via environment variables had no effect.
Pinning GitHub's IP address in the hosts file made it work normally.
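That hosts-file workaround can also be expressed in Compose itself via `extra_hosts` (a sketch, not from the thread; the address shown is the one the ping output later in this thread resolved, and raw.githubusercontent.com's IPs can change, so verify before pinning):

```yaml
  workers:
    # ...existing workers service config from the compose file above...
    extra_hosts:
      # Hypothetical pin for the host serving the EasyList download;
      # confirm this IP still serves raw.githubusercontent.com first.
      - "raw.githubusercontent.com:185.199.111.133"
```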


@francisafu commented on GitHub (Jul 18, 2024):

> After inserting `console.log(e);` before [crawlerWorker.ts line 116](https://github.com/hoarder-app/hoarder/blob/main/apps/workers/crawlerWorker.ts#L116), I found that the worker downloads the adblocker's EasyList rules from GitHub, and for some reason setting a proxy via environment variables had no effect. Pinning GitHub's IP address in the hosts file made it work normally.

Well, it doesn't seem like a network problem; I tried the fix you described, but the problem still exists.

Here is the network connectivity check:

/app/apps/workers # ping github.com
PING github.com (140.82.112.4): 56 data bytes
64 bytes from 140.82.112.4: seq=0 ttl=47 time=252.627 ms
64 bytes from 140.82.112.4: seq=1 ttl=47 time=252.859 ms
64 bytes from 140.82.112.4: seq=2 ttl=47 time=252.153 ms
64 bytes from 140.82.112.4: seq=3 ttl=47 time=252.746 ms
64 bytes from 140.82.112.4: seq=4 ttl=47 time=252.870 ms
^C
--- github.com ping statistics ---
6 packets transmitted, 5 packets received, 16% packet loss
round-trip min/avg/max = 252.153/252.651/252.870 ms
/app/apps/workers # ping raw.githubusercontent.com
PING raw.githubusercontent.com (185.199.111.133): 56 data bytes
64 bytes from 185.199.111.133: seq=0 ttl=54 time=111.197 ms
64 bytes from 185.199.111.133: seq=1 ttl=54 time=110.841 ms
64 bytes from 185.199.111.133: seq=4 ttl=54 time=112.224 ms
64 bytes from 185.199.111.133: seq=5 ttl=54 time=113.838 ms
64 bytes from 185.199.111.133: seq=6 ttl=54 time=111.442 ms
^C
--- raw.githubusercontent.com ping statistics ---
7 packets transmitted, 5 packets received, 28% packet loss
round-trip min/avg/max = 110.841/111.908/113.838 ms

And afterwards I tried to fetch a link; here's the log:
2024-07-18T09:41:07.591Z info: [Crawler][17] Will crawl "https://www.baidu.com/" for link with id "atrqsg02v8ugw7fwehlygwh6"
2024-07-18T09:41:07.591Z info: [Crawler][17] Attempting to determine the content-type for the url https://www.baidu.com/
2024-07-18T09:41:07.683Z info: [Crawler][17] Content-type for the url https://www.baidu.com/ is "text/html"
2024-07-18T09:41:07.684Z error: [Crawler][17] Crawling job failed: AssertionError [ERR_ASSERTION]: undefined == true
2024-07-18T09:41:08.736Z info: [Crawler][17] Will crawl "https://www.baidu.com/" for link with id "atrqsg02v8ugw7fwehlygwh6"
2024-07-18T09:41:08.736Z info: [Crawler][17] Attempting to determine the content-type for the url https://www.baidu.com/
2024-07-18T09:41:09.860Z info: [Crawler][17] Content-type for the url https://www.baidu.com/ is "text/html"
2024-07-18T09:41:09.861Z error: [Crawler][17] Crawling job failed: AssertionError [ERR_ASSERTION]: undefined == true
2024-07-18T09:41:11.945Z info: [Crawler][17] Will crawl "https://www.baidu.com/" for link with id "atrqsg02v8ugw7fwehlygwh6"
2024-07-18T09:41:11.945Z info: [Crawler][17] Attempting to determine the content-type for the url https://www.baidu.com/
2024-07-18T09:41:12.025Z info: [Crawler][17] Content-type for the url https://www.baidu.com/ is "text/html"
2024-07-18T09:41:12.027Z error: [Crawler][17] Crawling job failed: AssertionError [ERR_ASSERTION]: undefined == true
2024-07-18T09:41:16.058Z info: [Crawler][17] Will crawl "https://www.baidu.com/" for link with id "atrqsg02v8ugw7fwehlygwh6"
2024-07-18T09:41:16.058Z info: [Crawler][17] Attempting to determine the content-type for the url https://www.baidu.com/
2024-07-18T09:41:16.149Z info: [Crawler][17] Content-type for the url https://www.baidu.com/ is "text/html"
2024-07-18T09:41:16.151Z error: [Crawler][17] Crawling job failed: AssertionError [ERR_ASSERTION]: undefined == true
2024-07-18T09:41:24.181Z info: [Crawler][17] Will crawl "https://www.baidu.com/" for link with id "atrqsg02v8ugw7fwehlygwh6"
2024-07-18T09:41:24.181Z info: [Crawler][17] Attempting to determine the content-type for the url https://www.baidu.com/
2024-07-18T09:41:24.271Z info: [Crawler][17] Content-type for the url https://www.baidu.com/ is "text/html"
2024-07-18T09:41:24.272Z error: [Crawler][17] Crawling job failed: AssertionError [ERR_ASSERTION]: undefined == true

@francisafu commented on GitHub (Aug 4, 2024):

Similar to #331; closing this issue.
