[GH-ISSUE #674] Failed to connect to the browser instance, will retry in 5 secs #437

Open
opened 2026-03-02 11:49:51 +03:00 by kerem · 24 comments
Owner

Originally created by @snowdream on GitHub (Nov 19, 2024).
Original GitHub issue: https://github.com/karakeep-app/karakeep/issues/674

Describe the Bug

https://docs.hoarder.app/Installation/docker

I tried to run Hoarder with Docker Compose, but it failed.

(Four screenshots of the error are attached to the original GitHub issue.)

Steps to Reproduce

  1. Create .env:
HOARDER_VERSION=release
NEXTAUTH_SECRET=super_random_string
MEILI_MASTER_KEY=another_random_string
NEXTAUTH_URL=http://localhost:3000
  2. Create docker-compose.yml:
version: "3.8"
services:
  web:
    image: ghcr.io/hoarder-app/hoarder:${HOARDER_VERSION:-release}
    restart: unless-stopped
    volumes:
      - data:/data
    ports:
      - 3000:3000
    env_file:
      - .env
    environment:
      MEILI_ADDR: http://meilisearch:7700
      BROWSER_WEB_URL: http://chrome:9222
      # OPENAI_API_KEY: ...
      DATA_DIR: /data
  chrome:
    image: gcr.io/zenika-hub/alpine-chrome:123
    restart: unless-stopped
    command:
      - --no-sandbox
      - --disable-gpu
      - --disable-dev-shm-usage
      - --remote-debugging-address=0.0.0.0
      - --remote-debugging-port=9222
      - --hide-scrollbars
  meilisearch:
    image: getmeili/meilisearch:v1.11.1
    restart: unless-stopped
    env_file:
      - .env
    environment:
      MEILI_NO_ANALYTICS: "true"
    volumes:
      - meilisearch:/meili_data

volumes:
  meilisearch:
  data:
  3. Run docker compose up -d
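
Not part of the original report, but a useful guard for this failure mode: a Compose healthcheck on the chrome service that probes the same DevTools endpoint BROWSER_WEB_URL points at, so the container is flagged when the debugging port stops answering. This is a sketch; the wget flags assume the BusyBox wget shipped in the Alpine-based image.

```yaml
# docker-compose.override.yml (hypothetical override, not from the issue)
services:
  chrome:
    healthcheck:
      # /json/version is the Chrome DevTools HTTP endpoint the crawler connects to
      test: ["CMD", "wget", "-q", "--spider", "http://127.0.0.1:9222/json/version"]
      interval: 30s
      timeout: 5s
      retries: 3
```

With this in place, docker compose ps shows the chrome service as (healthy) or (unhealthy) instead of silently running.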

Expected Behaviour

http://localhost:3000/ loads successfully.

Screenshots or Additional Context

(Screenshot attached to the original GitHub issue.)

Device Details

Microsoft Edge Version 131.0.2903.48 (Official build) (x86_64) on macOS

Exact Hoarder Version

release


@Guillaume-Bignon commented on GitHub (Nov 19, 2024):

I have a similar error with the latest Hoarder version. The app can be used, but when I add a bookmark, it can't retrieve any image or description.

@miracloon commented on GitHub (Nov 21, 2024):

same question

@kamtschatka commented on GitHub (Nov 21, 2024):

Is everyone here using Docker Desktop? We have seen before that networking works differently on e.g. Windows and Linux.

@miracloon commented on GitHub (Nov 21, 2024):

My deployment system is Linux, and this is my config file:

version: "3.8"
networks:
  traefiknet:
    external: true
services:
  web:
    image: ghcr.io/hoarder-app/hoarder:release
    restart: unless-stopped
    container_name: hoarder
    volumes:
      - /opt/mydocker/hoarder/data:/data
    ports:
      - 54110:3000
    env_file:
      - .env
    networks:
      - traefiknet
    labels:
      - traefik.docker.network=traefiknet
      - traefik.enable=true
      - traefik.http.routers.hoarder.rule=Host(`hoarder.my.domain`)
      - traefik.http.routers.hoarder.entrypoints=http,https
      - traefik.http.routers.hoarder.priority=10
      - traefik.http.routers.hoarder.tls=true
      - traefik.http.services.hoarder.loadbalancer.server.port=3000
      - traefik.http.routers.hoarder.tls.certresolver=mycloudflare

  chrome:
    image: gcr.io/zenika-hub/alpine-chrome:123
    restart: unless-stopped
    container_name: chrome
    command:
      - --no-sandbox
      - --disable-gpu
      - --disable-dev-shm-usage
      - --remote-debugging-address=0.0.0.0
      - --remote-debugging-port=9222
      - --hide-scrollbars
    networks:
      - traefiknet
  meilisearch:
    image: getmeili/meilisearch:v1.11.1
    restart: unless-stopped
    container_name: meilisearch
    env_file:
      - .env
    environment:
      MEILI_NO_ANALYTICS: "true"
    volumes:
      - /opt/mydocker/hoarder/meilisearch:/meili_data
    networks:
      - traefiknet

@MohamedBassem commented on GitHub (Nov 21, 2024):

@Crush-RY can you share the logs from the web container?

@MohamedBassem commented on GitHub (Nov 21, 2024):

Hmmm, it seems like there are multiple people hitting this now. So I'll label this as a bug until we figure out what's going on.

@MohamedBassem commented on GitHub (Nov 21, 2024):

Was anyone running Hoarder before and hitting this problem after an upgrade, or are these all new installations?

@MohamedBassem commented on GitHub (Nov 21, 2024):

I've just pushed github.com/hoarder-app/hoarder@393d097c96 to log more details on the connection failure reason. It'll take 15 minutes for the container to be built. Once it's built, can someone switch to the nightly build and capture the error for me?
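
For reference, switching to the nightly build as requested is a one-line change to the .env file from the reproduction steps (sketch):

```
HOARDER_VERSION=nightly
```

followed by docker compose pull and docker compose up -d so the new image is actually pulled and started.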

@Guillaume-Bignon commented on GitHub (Nov 21, 2024):

Sure, here it is:

s6-rc: info: service s6rc-oneshot-runner: starting
s6-rc: info: service s6rc-oneshot-runner successfully started
s6-rc: info: service fix-attrs: starting
s6-rc: info: service init-db-migration: starting
Running db migration script
s6-rc: info: service fix-attrs successfully started
s6-rc: info: service legacy-cont-init: starting
s6-rc: info: service legacy-cont-init successfully started
s6-rc: info: service init-db-migration successfully started
s6-rc: info: service svc-workers: starting
s6-rc: info: service svc-web: starting
s6-rc: info: service svc-workers successfully started
s6-rc: info: service svc-web successfully started
s6-rc: info: service legacy-services: starting
s6-rc: info: service legacy-services successfully started
  ▲ Next.js 14.2.13
  - Local:        http://localhost:3000
  - Network:      http://0.0.0.0:3000

 ✓ Starting...
 ✓ Ready in 411ms

> @hoarder/workers@0.1.0 start:prod /app/apps/workers
> tsx index.ts

2024-11-21T22:48:49.735Z info: Workers version: nightly
2024-11-21T22:48:49.748Z info: [Crawler] Connecting to existing browser instance: http://chrome:9222
(node:121) [DEP0040] DeprecationWarning: The `punycode` module is deprecated. Please use a userland alternative instead.
(Use `node --trace-deprecation ...` to show where the warning was created)
2024-11-21T22:48:49.763Z info: [Crawler] Successfully resolved IP address, new address: http://172.21.0.3:9222/
(node:69) [DEP0040] DeprecationWarning: The `punycode` module is deprecated. Please use a userland alternative instead.
(Use `node --trace-deprecation ...` to show where the warning was created)

(process:69): VIPS-WARNING **: 22:49:40.996: threads clipped to 1024
2024-11-21T22:51:10.022Z error: [Crawler] Failed to connect to the browser instance, will retry in 5 secs: FetchError: request to https://raw.githubusercontent.com/cliqz-oss/adblocker/master/packages/adblocker/assets/easylist/easylist.txt failed, reason: getaddrinfo EAI_AGAIN raw.githubusercontent.com
    at ClientRequest.<anonymous> (/app/apps/workers/node_modules/.pnpm/node-fetch@2.7.0/node_modules/node-fetch/lib/index.js:1501:11)
    at ClientRequest.emit (node:events:518:28)
    at ClientRequest.emit (node:domain:489:12)
    at emitErrorEvent (node:_http_client:103:11)
    at TLSSocket.socketErrorListener (node:_http_client:506:5)
    at TLSSocket.emit (node:events:518:28)
    at TLSSocket.emit (node:domain:489:12)
    at emitErrorNT (node:internal/streams/destroy:170:8)
    at emitErrorCloseNT (node:internal/streams/destroy:129:3)
    at process.processTicksAndRejections (node:internal/process/task_queues:90:21)
2024-11-21T22:51:10.023Z info: Starting crawler worker ...
2024-11-21T22:51:10.025Z info: Starting inference worker ...
2024-11-21T22:51:10.026Z info: Starting search indexing worker ...
2024-11-21T22:51:10.027Z info: Starting tidy assets worker ...
2024-11-21T22:51:10.028Z info: Starting video worker ...
2024-11-21T22:51:10.029Z info: Starting feed worker ...
2024-11-21T22:51:10.171Z info: [Crawler][19] Will crawl "https://www.wikipedia.org/" for link with id "nhn7o9njs3vz8khu5evsl1l8"
2024-11-21T22:51:10.171Z info: [Crawler][19] Attempting to determine the content-type for the url https://www.wikipedia.org/
2024-11-21T22:51:15.023Z info: [Crawler] Connecting to existing browser instance: http://chrome:9222
2024-11-21T22:51:15.174Z error: [Crawler][19] Failed to determine the content-type for the url https://www.wikipedia.org/: TimeoutError: The operation was aborted due to timeout
2024-11-21T22:51:15.224Z error: [Crawler][19] Crawling job failed: AssertionError [ERR_ASSERTION]: undefined == true
AssertionError [ERR_ASSERTION]: undefined == true
    at crawlPage (/app/apps/workers/crawlerWorker.ts:3:1083)
    at crawlAndParseUrl (/app/apps/workers/crawlerWorker.ts:3:7502)
    at runCrawler (/app/apps/workers/crawlerWorker.ts:3:10635)
    at async Object.run (/app/apps/workers/utils.ts:2:1459)
    at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/liteque@0.3.0_better-sqlite3@11.3.0/node_modules/liteque/dist/runner.js:2:2578)
2024-11-21T22:51:15.249Z info: [Crawler][19] Will crawl "https://www.wikipedia.org/" for link with id "nhn7o9njs3vz8khu5evsl1l8"
2024-11-21T22:51:15.249Z info: [Crawler][19] Attempting to determine the content-type for the url https://www.wikipedia.org/
2024-11-21T22:51:20.173Z error: [search][20] search job failed: MeiliSearchCommunicationError: fetch failed
MeiliSearchCommunicationError: fetch failed
    at node:internal/deps/undici/undici:13392:13
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
2024-11-21T22:51:20.251Z error: [Crawler][19] Failed to determine the content-type for the url https://www.wikipedia.org/: TimeoutError: The operation was aborted due to timeout
2024-11-21T22:51:20.251Z error: [Crawler][19] Crawling job failed: AssertionError [ERR_ASSERTION]: undefined == true
AssertionError [ERR_ASSERTION]: undefined == true
    at crawlPage (/app/apps/workers/crawlerWorker.ts:3:1083)
    at crawlAndParseUrl (/app/apps/workers/crawlerWorker.ts:3:7502)
    at runCrawler (/app/apps/workers/crawlerWorker.ts:3:10635)
    at async Object.run (/app/apps/workers/utils.ts:2:1459)
    at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/liteque@0.3.0_better-sqlite3@11.3.0/node_modules/liteque/dist/runner.js:2:2578)
2024-11-21T22:51:20.273Z info: [Crawler][19] Will crawl "https://www.wikipedia.org/" for link with id "nhn7o9njs3vz8khu5evsl1l8"
2024-11-21T22:51:20.273Z info: [Crawler][19] Attempting to determine the content-type for the url https://www.wikipedia.org/
2024-11-21T22:51:25.274Z error: [Crawler][19] Failed to determine the content-type for the url https://www.wikipedia.org/: TimeoutError: The operation was aborted due to timeout
2024-11-21T22:51:25.275Z error: [Crawler][19] Crawling job failed: AssertionError [ERR_ASSERTION]: undefined == true
AssertionError [ERR_ASSERTION]: undefined == true
    at crawlPage (/app/apps/workers/crawlerWorker.ts:3:1083)
    at crawlAndParseUrl (/app/apps/workers/crawlerWorker.ts:3:7502)
    at runCrawler (/app/apps/workers/crawlerWorker.ts:3:10635)
    at async Object.run (/app/apps/workers/utils.ts:2:1459)
    at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/liteque@0.3.0_better-sqlite3@11.3.0/node_modules/liteque/dist/runner.js:2:2578)
2024-11-21T22:51:25.303Z info: [Crawler][19] Will crawl "https://www.wikipedia.org/" for link with id "nhn7o9njs3vz8khu5evsl1l8"
2024-11-21T22:51:25.303Z info: [Crawler][19] Attempting to determine the content-type for the url https://www.wikipedia.org/
2024-11-21T22:51:30.214Z error: [search][20] search job failed: MeiliSearchCommunicationError: fetch failed
MeiliSearchCommunicationError: fetch failed
    at node:internal/deps/undici/undici:13392:13
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
2024-11-21T22:51:30.304Z error: [Crawler][19] Failed to determine the content-type for the url https://www.wikipedia.org/: TimeoutError: The operation was aborted due to timeout
2024-11-21T22:51:30.304Z error: [Crawler][19] Crawling job failed: AssertionError [ERR_ASSERTION]: undefined == true
AssertionError [ERR_ASSERTION]: undefined == true
    at crawlPage (/app/apps/workers/crawlerWorker.ts:3:1083)
    at crawlAndParseUrl (/app/apps/workers/crawlerWorker.ts:3:7502)
    at runCrawler (/app/apps/workers/crawlerWorker.ts:3:10635)
    at async Object.run (/app/apps/workers/utils.ts:2:1459)
    at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/liteque@0.3.0_better-sqlite3@11.3.0/node_modules/liteque/dist/runner.js:2:2578)
2024-11-21T22:51:30.325Z info: [Crawler][19] Will crawl "https://www.wikipedia.org/" for link with id "nhn7o9njs3vz8khu5evsl1l8"
2024-11-21T22:51:30.326Z info: [Crawler][19] Attempting to determine the content-type for the url https://www.wikipedia.org/
2024-11-21T22:51:35.326Z error: [Crawler][19] Failed to determine the content-type for the url https://www.wikipedia.org/: TimeoutError: The operation was aborted due to timeout
2024-11-21T22:51:35.327Z error: [Crawler][19] Crawling job failed: AssertionError [ERR_ASSERTION]: undefined == true
AssertionError [ERR_ASSERTION]: undefined == true
    at crawlPage (/app/apps/workers/crawlerWorker.ts:3:1083)
    at crawlAndParseUrl (/app/apps/workers/crawlerWorker.ts:3:7502)
    at runCrawler (/app/apps/workers/crawlerWorker.ts:3:10635)
    at async Object.run (/app/apps/workers/utils.ts:2:1459)
    at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/liteque@0.3.0_better-sqlite3@11.3.0/node_modules/liteque/dist/runner.js:2:2578)
2024-11-21T22:51:35.373Z info: [Crawler][19] Will crawl "https://www.wikipedia.org/" for link with id "nhn7o9njs3vz8khu5evsl1l8"
2024-11-21T22:51:35.374Z info: [Crawler][19] Attempting to determine the content-type for the url https://www.wikipedia.org/
2024-11-21T22:51:40.243Z error: [search][20] search job failed: MeiliSearchCommunicationError: fetch failed
MeiliSearchCommunicationError: fetch failed
    at node:internal/deps/undici/undici:13392:13
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
2024-11-21T22:51:40.374Z error: [Crawler][19] Failed to determine the content-type for the url https://www.wikipedia.org/: TimeoutError: The operation was aborted due to timeout
2024-11-21T22:51:40.374Z error: [Crawler][19] Crawling job failed: AssertionError [ERR_ASSERTION]: undefined == true
AssertionError [ERR_ASSERTION]: undefined == true
    at crawlPage (/app/apps/workers/crawlerWorker.ts:3:1083)
    at crawlAndParseUrl (/app/apps/workers/crawlerWorker.ts:3:7502)
    at runCrawler (/app/apps/workers/crawlerWorker.ts:3:10635)
    at async Object.run (/app/apps/workers/utils.ts:2:1459)
    at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/liteque@0.3.0_better-sqlite3@11.3.0/node_modules/liteque/dist/runner.js:2:2578) 

Hope that helps.

And to answer your first question: I have been running Hoarder for a few days, and I've had this error from the beginning.

@MohamedBassem commented on GitHub (Nov 21, 2024):

Yeah, this is actually very helpful. I think I know how I can fix that!

@MohamedBassem commented on GitHub (Nov 21, 2024):

So basically what's happening here is that, for one reason or another (your network policies, GitHub being blocked, etc.), Hoarder is failing to download the adblock list used by the crawler. I've sent github.com/hoarder-app/hoarder@378ad9bc15 to ensure this no longer blocks worker startup. In your case, you may also want to set CRAWLER_ENABLE_ADBLOCKER=false so the worker isn't delayed on every start, given that the download always fails. Can you give it a try once the container is built?
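
Applied to the .env file from the reproduction steps, the suggested workaround looks like this (the variable name is taken from the comment above; the explanatory comment line is mine):

```
# Don't fetch the EasyList adblock rules at worker startup
CRAWLER_ENABLE_ADBLOCKER=false
```

Recreating the web container afterwards (docker compose up -d) picks up the change.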

@Guillaume-Bignon commented on GitHub (Nov 22, 2024):

Thanks again for your very quick answer. I tried the fix you pushed and also added the line you suggested to the .env file, but unfortunately it does not work.

Here are the logs:

s6-rc: info: service s6rc-oneshot-runner: starting
s6-rc: info: service s6rc-oneshot-runner successfully started
s6-rc: info: service fix-attrs: starting
s6-rc: info: service init-db-migration: starting
Running db migration script
s6-rc: info: service fix-attrs successfully started
s6-rc: info: service legacy-cont-init: starting
s6-rc: info: service legacy-cont-init successfully started
s6-rc: info: service init-db-migration successfully started
s6-rc: info: service svc-workers: starting
s6-rc: info: service svc-web: starting
s6-rc: info: service svc-workers successfully started
s6-rc: info: service svc-web successfully started
s6-rc: info: service legacy-services: starting
s6-rc: info: service legacy-services successfully started
  ▲ Next.js 14.2.13
  - Local:        http://localhost:3000
  - Network:      http://0.0.0.0:3000

 ✓ Starting...
 ✓ Ready in 358ms

> @hoarder/workers@0.1.0 start:prod /app/apps/workers
> tsx index.ts

(node:69) [DEP0040] DeprecationWarning: The `punycode` module is deprecated. Please use a userland alternative instead.
(Use `node --trace-deprecation ...` to show where the warning was created)
2024-11-22T00:50:20.170Z info: Workers version: nightly
2024-11-22T00:50:20.182Z info: [Crawler] Connecting to existing browser instance: http://chrome:9222
(node:121) [DEP0040] DeprecationWarning: The `punycode` module is deprecated. Please use a userland alternative instead.
(Use `node --trace-deprecation ...` to show where the warning was created)
2024-11-22T00:50:20.199Z info: [Crawler] Successfully resolved IP address, new address: http://172.21.0.3:9222/
2024-11-22T00:50:20.324Z info: Starting crawler worker ...
2024-11-22T00:50:20.325Z info: Starting inference worker ...
2024-11-22T00:50:20.325Z info: Starting search indexing worker ...
2024-11-22T00:50:20.326Z info: Starting tidy assets worker ...
2024-11-22T00:50:20.326Z info: Starting video worker ...
2024-11-22T00:50:20.326Z info: Starting feed worker ...
2024-11-22T00:50:20.365Z info: [Crawler][22] Will crawl "https://www.wikipedia.org/" for link with id "m2wi6yovvkafmnjegdic7b6c"
2024-11-22T00:50:20.365Z info: [Crawler][22] Attempting to determine the content-type for the url https://www.wikipedia.org/
2024-11-22T00:50:20.462Z info: [search][23] Attempting to index bookmark with id m2wi6yovvkafmnjegdic7b6c ...
2024-11-22T00:50:20.594Z info: [search][23] Completed successfully
2024-11-22T00:50:25.370Z error: [Crawler][22] Failed to determine the content-type for the url https://www.wikipedia.org/: AbortError: The operation was aborted.
s [TRPCError]: Bookmark not found
    at /app/apps/web/.next/server/chunks/6815.js:1:16914
    at async a (/app/apps/web/.next/server/chunks/440.js:4:32960)
    at async t (/app/apps/web/.next/server/chunks/440.js:4:32333)
    at async a (/app/apps/web/.next/server/chunks/440.js:4:32960)
    at async a (/app/apps/web/.next/server/chunks/440.js:4:32960)
    at async a (/app/apps/web/.next/server/chunks/440.js:4:32960)
    at async a (/app/apps/web/.next/server/chunks/440.js:4:32960)
    at async t (/app/apps/web/.next/server/chunks/440.js:4:33299)
    at async /app/apps/web/.next/server/app/api/trpc/[trpc]/route.js:1:4379
    at async Promise.all (index 1) {
  code: 'NOT_FOUND',
  [cause]: undefined
}
2024-11-22T00:50:34.670Z info: [search][24] Attempting to index bookmark with id m2wi6yovvkafmnjegdic7b6c ...
2024-11-22T00:50:34.758Z info: [search][24] Completed successfully
2024-11-22T00:50:35.751Z error: [Crawler][22] Crawling job failed: Error: net::ERR_NAME_NOT_RESOLVED at https://www.wikipedia.org/
Error: net::ERR_NAME_NOT_RESOLVED at https://www.wikipedia.org/
    at navigate (/app/apps/workers/node_modules/.pnpm/puppeteer-core@22.3.0/node_modules/puppeteer-core/lib/cjs/puppeteer/cdp/Frame.js:171:27)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
    at async Deferred.race (/app/apps/workers/node_modules/.pnpm/puppeteer-core@22.3.0/node_modules/puppeteer-core/lib/cjs/puppeteer/util/Deferred.js:36:20)
    at async CdpFrame.goto (/app/apps/workers/node_modules/.pnpm/puppeteer-core@22.3.0/node_modules/puppeteer-core/lib/cjs/puppeteer/cdp/Frame.js:137:25)
    at async CdpPage.goto (/app/apps/workers/node_modules/.pnpm/puppeteer-core@22.3.0/node_modules/puppeteer-core/lib/cjs/puppeteer/api/Page.js:590:20)
    at async crawlPage (/app/apps/workers/crawlerWorker.ts:3:1456)
    at async crawlAndParseUrl (/app/apps/workers/crawlerWorker.ts:3:7607)
    at async runCrawler (/app/apps/workers/crawlerWorker.ts:3:10740)
    at async Object.run (/app/apps/workers/utils.ts:2:1459)
    at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/liteque@0.3.0_better-sqlite3@11.3.0/node_modules/liteque/dist/runner.js:2:2578)
2024-11-22T00:50:35.772Z error: [Crawler][22] Crawling job failed: Error: The bookmark either doesn't exist or is not a link
Error: The bookmark either doesn't exist or is not a link
    at getBookmarkDetails (/app/apps/workers/workerUtils.ts:2:1575)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
    at async runCrawler (/app/apps/workers/crawlerWorker.ts:3:10078)
    at async Object.run (/app/apps/workers/utils.ts:2:1459)
    at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/liteque@0.3.0_better-sqlite3@11.3.0/node_modules/liteque/dist/runner.js:2:2578)
2024-11-22T00:50:35.790Z error: [Crawler][22] Crawling job failed: Error: The bookmark either doesn't exist or is not a link
Error: The bookmark either doesn't exist or is not a link
    at getBookmarkDetails (/app/apps/workers/workerUtils.ts:2:1575)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
    at async runCrawler (/app/apps/workers/crawlerWorker.ts:3:10078)
    at async Object.run (/app/apps/workers/utils.ts:2:1459)
    at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/liteque@0.3.0_better-sqlite3@11.3.0/node_modules/liteque/dist/runner.js:2:2578)
2024-11-22T00:50:35.807Z error: [Crawler][22] Crawling job failed: Error: The bookmark either doesn't exist or is not a link
Error: The bookmark either doesn't exist or is not a link
    at getBookmarkDetails (/app/apps/workers/workerUtils.ts:2:1575)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
    at async runCrawler (/app/apps/workers/crawlerWorker.ts:3:10078)
    at async Object.run (/app/apps/workers/utils.ts:2:1459)
    at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/liteque@0.3.0_better-sqlite3@11.3.0/node_modules/liteque/dist/runner.js:2:2578)
2024-11-22T00:50:35.826Z error: [Crawler][22] Crawling job failed: Error: The bookmark either doesn't exist or is not a link
Error: The bookmark either doesn't exist or is not a link
    at getBookmarkDetails (/app/apps/workers/workerUtils.ts:2:1575)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
    at async runCrawler (/app/apps/workers/crawlerWorker.ts:3:10078)
    at async Object.run (/app/apps/workers/utils.ts:2:1459)
    at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/liteque@0.3.0_better-sqlite3@11.3.0/node_modules/liteque/dist/runner.js:2:2578)
2024-11-22T00:50:35.847Z error: [Crawler][22] Crawling job failed: Error: The bookmark either doesn't exist or is not a link
Error: The bookmark either doesn't exist or is not a link
    at getBookmarkDetails (/app/apps/workers/workerUtils.ts:2:1575)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
    at async runCrawler (/app/apps/workers/crawlerWorker.ts:3:10078)
    at async Object.run (/app/apps/workers/utils.ts:2:1459)
    at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/liteque@0.3.0_better-sqlite3@11.3.0/node_modules/liteque/dist/runner.js:2:2578)

(process:69): VIPS-WARNING **: 00:50:42.563: threads clipped to 1024
2024-11-22T00:50:42.890Z info: [Crawler][25] Will crawl "https://www.wikipedia.org/" for link with id "c77a1dclbtoswxfg1dehix2z"
2024-11-22T00:50:42.891Z info: [Crawler][25] Attempting to determine the content-type for the url https://www.wikipedia.org/
2024-11-22T00:50:42.909Z info: [search][26] Attempting to index bookmark with id c77a1dclbtoswxfg1dehix2z ...
2024-11-22T00:50:42.989Z info: [search][26] Completed successfully
2024-11-22T00:50:47.893Z error: [Crawler][25] Failed to determine the content-type for the url https://www.wikipedia.org/: AbortError: The operation was aborted.
2024-11-22T00:50:58.038Z error: [Crawler][25] Crawling job failed: Error: net::ERR_NAME_NOT_RESOLVED at https://www.wikipedia.org/
Error: net::ERR_NAME_NOT_RESOLVED at https://www.wikipedia.org/
    at navigate (/app/apps/workers/node_modules/.pnpm/puppeteer-core@22.3.0/node_modules/puppeteer-core/lib/cjs/puppeteer/cdp/Frame.js:171:27)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
    at async Deferred.race (/app/apps/workers/node_modules/.pnpm/puppeteer-core@22.3.0/node_modules/puppeteer-core/lib/cjs/puppeteer/util/Deferred.js:36:20)
    at async CdpFrame.goto (/app/apps/workers/node_modules/.pnpm/puppeteer-core@22.3.0/node_modules/puppeteer-core/lib/cjs/puppeteer/cdp/Frame.js:137:25)
    at async CdpPage.goto (/app/apps/workers/node_modules/.pnpm/puppeteer-core@22.3.0/node_modules/puppeteer-core/lib/cjs/puppeteer/api/Page.js:590:20)
    at async crawlPage (/app/apps/workers/crawlerWorker.ts:3:1456)
    at async crawlAndParseUrl (/app/apps/workers/crawlerWorker.ts:3:7607)
    at async runCrawler (/app/apps/workers/crawlerWorker.ts:3:10740)
    at async Object.run (/app/apps/workers/utils.ts:2:1459)
    at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/liteque@0.3.0_better-sqlite3@11.3.0/node_modules/liteque/dist/runner.js:2:2578)
2024-11-22T00:50:58.060Z info: [Crawler][25] Will crawl "https://www.wikipedia.org/" for link with id "c77a1dclbtoswxfg1dehix2z"
2024-11-22T00:50:58.060Z info: [Crawler][25] Attempting to determine the content-type for the url https://www.wikipedia.org/
2024-11-22T00:51:03.061Z error: [Crawler][25] Failed to determine the content-type for the url https://www.wikipedia.org/: AbortError: The operation was aborted.
2024-11-22T00:51:13.201Z error: [Crawler][25] Crawling job failed: Error: net::ERR_NAME_NOT_RESOLVED at https://www.wikipedia.org/
Error: net::ERR_NAME_NOT_RESOLVED at https://www.wikipedia.org/
    at navigate (/app/apps/workers/node_modules/.pnpm/puppeteer-core@22.3.0/node_modules/puppeteer-core/lib/cjs/puppeteer/cdp/Frame.js:171:27)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
    at async Deferred.race (/app/apps/workers/node_modules/.pnpm/puppeteer-core@22.3.0/node_modules/puppeteer-core/lib/cjs/puppeteer/util/Deferred.js:36:20)
    at async CdpFrame.goto (/app/apps/workers/node_modules/.pnpm/puppeteer-core@22.3.0/node_modules/puppeteer-core/lib/cjs/puppeteer/cdp/Frame.js:137:25)
    at async CdpPage.goto (/app/apps/workers/node_modules/.pnpm/puppeteer-core@22.3.0/node_modules/puppeteer-core/lib/cjs/puppeteer/api/Page.js:590:20)
    at async crawlPage (/app/apps/workers/crawlerWorker.ts:3:1456)
    at async crawlAndParseUrl (/app/apps/workers/crawlerWorker.ts:3:7607)
    at async runCrawler (/app/apps/workers/crawlerWorker.ts:3:10740)
    at async Object.run (/app/apps/workers/utils.ts:2:1459)
    at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/liteque@0.3.0_better-sqlite3@11.3.0/node_modules/liteque/dist/runner.js:2:2578)
2024-11-22T00:51:13.224Z info: [Crawler][25] Will crawl "https://www.wikipedia.org/" for link with id "c77a1dclbtoswxfg1dehix2z"
2024-11-22T00:51:13.224Z info: [Crawler][25] Attempting to determine the content-type for the url https://www.wikipedia.org/
2024-11-22T00:51:18.225Z error: [Crawler][25] Failed to determine the content-type for the url https://www.wikipedia.org/: AbortError: The operation was aborted.
<!-- gh-comment-id:2492651427 -->

@MohamedBassem commented on GitHub (Nov 22, 2024):

ok, now it's clear that you have some DNS/internet problems in the container :) Basically, your container can't resolve DNS, and this is required for the crawler to work. This is not a Hoarder problem at this point.
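One quick way to confirm this diagnosis from inside the containers (a sketch; service names are the ones from the compose file in the original report, and the tools may or may not ship in a given image):

```
# Does the web container resolve public hostnames?
docker compose exec web getent hosts www.wikipedia.org

# Does the chrome container? (alpine images usually ship busybox nslookup)
docker compose exec chrome nslookup www.wikipedia.org
```

If these fail while the same lookups work on the host, the problem is Docker's DNS setup rather than Hoarder.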

<!-- gh-comment-id:2492655171 -->

@snowdream commented on GitHub (Nov 22, 2024):

As you know, I am in China.

Does Hoarder access any API that I cannot access?

<!-- gh-comment-id:2492703325 -->

@Guillaume-Bignon commented on GitHub (Nov 22, 2024):

> ok, now it's clear that you have some dns/internet problems in the container :) Basically your container can't resolve dns and this is required for the crawler to work. This is not a hoarder problem at this point.

Alright, I will investigate on my side, thank you for your help!

<!-- gh-comment-id:2493453847 -->

@miracloon commented on GitHub (Nov 22, 2024):

I'm also in China, and as you mentioned, it turns out to be a network issue. I tried deploying hoarder on a VPS without network restrictions, and it worked perfectly.

Thanks a lot for your help!

<!-- gh-comment-id:2494169805 -->

@Guillaume-Bignon commented on GitHub (Nov 23, 2024):

I finally managed to fix the error, and it was indeed coming from a bad default Docker configuration. For people facing the same one, here are the steps I followed: https://stackoverflow.com/questions/39400886/docker-cannot-resolve-dns-on-private-network and then I restarted my server.
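For reference, the fix from that Stack Overflow thread amounts to pinning explicit upstream DNS servers in Docker's daemon config and restarting the daemon (a sketch; the resolver IPs are examples, use whatever your network actually provides):

```
# /etc/docker/daemon.json
{
  "dns": ["8.8.8.8", "1.1.1.1"]
}
```

followed by something like `sudo systemctl restart docker` and restarting the compose stack.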

<!-- gh-comment-id:2495504677 -->

@AlotOfBlahaj commented on GitHub (Nov 25, 2024):

In my situation, the web container was not included in the network where chrome and meilisearch are.

docker network inspect hoarder-app-eeoke0_default
[
    {
        "Name": "hoarder-app-eeoke0_default",
        "Id": "b0c282cc6d82c4c22e8124fb046eefff913b2e0f019693f6fe6cd6b86f685047",
        "Created": "2024-11-11T07:01:38.95469184+01:00",
        "Scope": "local",
        "Driver": "bridge",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "172.20.0.0/16",
                    "Gateway": "172.20.0.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {
            "23200b27a616749aa75a266277134e15300de124e77bc009fac94bc055170b1d": {
                "Name": "hoarder-app-eeoke0-chrome-1",
                "EndpointID": "12a25700f84cafefaa61e55ec9f8c9abafc638a68d6b928c3b6de9c71461028d",
                "MacAddress": "02:42:ac:14:00:02",
                "IPv4Address": "172.20.0.2/16",
                "IPv6Address": ""
            },
            "76f3d8333267c3d0e1c4120d1520499466e44ab46ff728f345a6fa815ad3af50": {
                "Name": "hoarder-app-eeoke0-meilisearch-1",
                "EndpointID": "5362897cb6d2a74414ab7bcbee65223f40d895671d7b1a3f3ed67df2e5f3dc8e",
                "MacAddress": "02:42:ac:14:00:03",
                "IPv4Address": "172.20.0.3/16",
                "IPv6Address": ""
            }
        },
        "Options": {},
        "Labels": {
            "com.docker.compose.network": "default",
            "com.docker.compose.project": "hoarder-app-eeoke0",
            "com.docker.compose.version": "2.29.7"
        }
    }
]

I manually added web to the network, and it works.
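For anyone in the same spot, the manual step looks roughly like this (a sketch; the web container name is a guess based on the project naming visible in the inspect output above):

```
docker network connect hoarder-app-eeoke0_default hoarder-app-eeoke0-web-1
```

Declaring all three services in the same compose file (or on an explicit shared networks: entry) avoids having to do this by hand.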

<!-- gh-comment-id:2496967616 -->

@FlorentLM commented on GitHub (Jan 2, 2025):

Had the same DNS issue with alpine-chrome.

Adding the launch parameter --headless=new fixed it. However, it created another error:

Failed to fetch browser webSocket URL from http://172.28.0.4:9222/json/version: fetch failed

But that seems to be inconsequential (for now), so the workaround is fine
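In compose terms, that workaround means adding one flag to the chrome service from the original report (a sketch based on that compose file):

```
  chrome:
    image: gcr.io/zenika-hub/alpine-chrome:123
    restart: unless-stopped
    command:
      - --headless=new
      - --no-sandbox
      - --disable-gpu
      - --disable-dev-shm-usage
      - --remote-debugging-address=0.0.0.0
      - --remote-debugging-port=9222
      - --hide-scrollbars
```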

<!-- gh-comment-id:2568531627 -->

@imp1sh commented on GitHub (Jan 31, 2025):

I'm also hitting this problem. I'm not using docker-compose but Ansible and the podman collection (also using --headless=new) to spawn my containers, so I'm using the container's hostname that it automatically gets assigned in DNS. As my container is named chrome0, it tries to connect to it:

2025-01-31T23:20:22.321Z info: [Crawler] Connecting to existing browser instance: http://chrome0:9222
2025-01-31T23:20:22.322Z info: [Crawler] Successfully resolved IP address, new address: http://chrome0:9222/
2025-01-31T23:20:22.323Z error: [Crawler] Failed to connect to the browser instance, will retry in 5 secs: TypeError: Failed to fetch browser webSocket URL from http://chrome0:9222/json/version: fetch failed
    at node:internal/deps/undici/undici:13392:13
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
    at async getWSEndpoint (/app/apps/workers/node_modules/.pnpm/puppeteer-core@22.3.0/node_modules/puppeteer-core/lib/cjs/puppeteer/common/BrowserConnector.js:94:24)
    at async getConnectionTransport (/app/apps/workers/node_modules/.pnpm/puppeteer-core@22.3.0/node_modules/puppeteer-core/lib/cjs/puppeteer/common/BrowserConnector.js:81:31)
    at async _connectToBrowser (/app/apps/workers/node_modules/.pnpm/puppeteer-core@22.3.0/node_modules/puppeteer-core/lib/cjs/puppeteer/common/BrowserConnector.js:50:50)
    at async PuppeteerExtra.connect (/app/apps/workers/node_modules/.pnpm/puppeteer-extra@3.3.6_puppeteer@22.3.0_typescript@5.3.3_/node_modules/puppeteer-extra/dist/index.cjs.js:151:25)
    at async /app/apps/workers/crawlerWorker.ts:2:4664

When I manually try to fetch from this URL in a netshoot container:

# podman run -it --net podmannetGUA docker.io/nicolaka/netshoot bash
9af73f2a13b8:~# curl http://chrome0:9222/json/version
curl: (7) Failed to connect to chrome0 port 9222 after 1 ms: Couldn't connect to server

It doesn't seem to be a DNS problem for me, though:

# ping -4 chrome0
PING chrome0.dns.podman (10.121.13.189) 56(84) bytes of data.
64 bytes from 54129a2391df (10.121.13.189): icmp_seq=1 ttl=64 time=0.012 ms
# ping -6 chrome0
PING chrome0 (2001:4dd0:28d4:3001::9e1) 56 data bytes
64 bytes from 54129a2391df (2001:4dd0:28d4:3001::9e1): icmp_seq=1 ttl=64 time=0.224 ms
64 bytes from 54129a2391df (2001:4dd0:28d4:3001::9e1): icmp_seq=2 ttl=64 time=0.045 ms

My podman network is dual-stack, FYI; dunno if that might be a problem.
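Since the name clearly resolves, one way to narrow it down is to hit the debugger by IP from the same netshoot container (a sketch; the IPs are taken from the ping output above):

```
# If these also fail, the debugger probably isn't listening on the
# container's network interface — double-check --remote-debugging-address=0.0.0.0
curl http://10.121.13.189:9222/json/version
curl -g 'http://[2001:4dd0:28d4:3001::9e1]:9222/json/version'
```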

<!-- gh-comment-id:2628557762 -->

@Unambiguous commented on GitHub (Mar 1, 2025):

My instance is also suffering from this problem. Has anybody had any luck finding what is causing the issue?

My piece of the log:

```
2025-03-01T19:44:28.699Z info: [Crawler] Connecting to existing browser instance: hoarder-chrome:9222
2025-03-01T19:44:28.699Z info: [Crawler] Successfully resolved IP address, new address: hoarder-chrome:9222
2025-03-01T19:44:28.700Z error: [Crawler] Failed to connect to the browser instance, will retry in 5 secs: TypeError: Invalid URL
    at new URL (node:internal/url:818:25)
    at getWSEndpoint (/app/apps/workers/node_modules/.pnpm/puppeteer-core@22.3.0/node_modules/puppeteer-core/lib/cjs/puppeteer/common/BrowserConnector.js:92:25)
    at getConnectionTransport (/app/apps/workers/node_modules/.pnpm/puppeteer-core@22.3.0/node_modules/puppeteer-core/lib/cjs/puppeteer/common/BrowserConnector.js:81:37)
    at _connectToBrowser (/app/apps/workers/node_modules/.pnpm/puppeteer-core@22.3.0/node_modules/puppeteer-core/lib/cjs/puppeteer/common/BrowserConnector.js:50:56)
    at PuppeteerNode.connect (/app/apps/workers/node_modules/.pnpm/puppeteer-core@22.3.0/node_modules/puppeteer-core/lib/cjs/puppeteer/common/Puppeteer.js:96:60)
    at PuppeteerNode.connect (/app/apps/workers/node_modules/.pnpm/puppeteer-core@22.3.0/node_modules/puppeteer-core/lib/cjs/puppeteer/node/PuppeteerNode.js:91:22)
    at PuppeteerExtra.connect (/app/apps/workers/node_modules/.pnpm/puppeteer-extra@3.3.6_puppeteer@22.3.0_typescript@5.7.3_/node_modules/puppeteer-extra/dist/index.cjs.js:151:41)
    at async /app/apps/workers/crawlerWorker.ts:2:4664
```

with this config

```
BROWSER_WEB_URL=hoarder-chrome:9222
```

@AlotOfBlahaj commented on GitHub (Mar 1, 2025):

> My instance is also suffering from this problem. Has anybody had any luck finding what is causing the issue?
>
> My piece of the log:
>
> ```
> 2025-03-01T19:44:28.699Z info: [Crawler] Connecting to existing browser instance: hoarder-chrome:9222
> 2025-03-01T19:44:28.699Z info: [Crawler] Successfully resolved IP address, new address: hoarder-chrome:9222
> 2025-03-01T19:44:28.700Z error: [Crawler] Failed to connect to the browser instance, will retry in 5 secs: TypeError: Invalid URL
>     at new URL (node:internal/url:818:25)
>     at getWSEndpoint (/app/apps/workers/node_modules/.pnpm/puppeteer-core@22.3.0/node_modules/puppeteer-core/lib/cjs/puppeteer/common/BrowserConnector.js:92:25)
>     at getConnectionTransport (/app/apps/workers/node_modules/.pnpm/puppeteer-core@22.3.0/node_modules/puppeteer-core/lib/cjs/puppeteer/common/BrowserConnector.js:81:37)
>     at _connectToBrowser (/app/apps/workers/node_modules/.pnpm/puppeteer-core@22.3.0/node_modules/puppeteer-core/lib/cjs/puppeteer/common/BrowserConnector.js:50:56)
>     at PuppeteerNode.connect (/app/apps/workers/node_modules/.pnpm/puppeteer-core@22.3.0/node_modules/puppeteer-core/lib/cjs/puppeteer/common/Puppeteer.js:96:60)
>     at PuppeteerNode.connect (/app/apps/workers/node_modules/.pnpm/puppeteer-core@22.3.0/node_modules/puppeteer-core/lib/cjs/puppeteer/node/PuppeteerNode.js:91:22)
>     at PuppeteerExtra.connect (/app/apps/workers/node_modules/.pnpm/puppeteer-extra@3.3.6_puppeteer@22.3.0_typescript@5.7.3_/node_modules/puppeteer-extra/dist/index.cjs.js:151:41)
>     at async /app/apps/workers/crawlerWorker.ts:2:4664
> ```
>
> with this config
>
> ```
> BROWSER_WEB_URL=hoarder-chrome:9222
> ```

You should add `http://` before the address.


@Unambiguous commented on GitHub (Mar 2, 2025):

I tried both `http://` and `ws://` before the address, but the issue still remains.

The log changes a little:

```
2025-03-02T07:06:34.111Z info: [Crawler] Connecting to existing browser instance: http://hoarder-chrome:9222
2025-03-02T07:06:34.113Z info: [Crawler] Successfully resolved IP address, new address: http://172.18.0.2:9222/
2025-03-02T07:06:34.115Z error: [Crawler] Failed to connect to the browser instance, will retry in 5 secs: TypeError: Failed to fetch browser webSocket URL from http://172.18.0.2:9222/json/version: fetch failed
```

When I run `curl` inside the hoarder app container, I get the following output:

```
/app # curl http://hoarder-chrome:9222/json/version
curl: (7) Failed to connect to hoarder-chrome port 9222 after 0 ms: Could not connect to server
/app # curl http://172.18.0.2:9222/json/version
curl: (7) Failed to connect to 172.18.0.2 port 9222 after 0 ms: Could not connect to server
```

The chrome container seems to be listening on the port:

```
/usr/src/app $ netstat -tlnp | grep 922
tcp        0      0 127.0.0.1:9222          0.0.0.0:*               LISTEN      1/chromium
```

All containers are running on their own Docker network.
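Note what the `netstat` output in the comment above actually shows: chromium is bound to `127.0.0.1:9222`, not `0.0.0.0:9222`, so connections from *other* containers are refused even though the port is open locally. That pattern usually means the `--remote-debugging-address=0.0.0.0` flag is not reaching the browser process. For reference, the chrome service from the compose file documented at the top of this issue passes it explicitly (whether the commenter's setup matches this is an assumption):

```yaml
chrome:
  image: gcr.io/zenika-hub/alpine-chrome:123
  restart: unless-stopped
  command:
    - --no-sandbox
    - --disable-gpu
    - --disable-dev-shm-usage
    # Without this flag, chromium binds the DevTools port to 127.0.0.1
    # only, and other containers cannot reach it.
    - --remote-debugging-address=0.0.0.0
    - --remote-debugging-port=9222
    - --hide-scrollbars
```

With the flag in effect, `netstat` inside the chrome container should show the listener on `0.0.0.0:9222`.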


@MohamedBassem commented on GitHub (Mar 2, 2025):

@Unambiguous please open a new discussion in the Q&A discussions and make sure to attach your docker compose so that we can help you.
