[GH-ISSUE #832] Queue in stuck state? #541

Closed
opened 2026-03-02 11:50:42 +03:00 by kerem · 13 comments

Originally created by @icco on GitHub (Jan 5, 2025).
Original GitHub issue: https://github.com/karakeep-app/karakeep/issues/832

Describe the Bug

I uploaded around 100k bookmarks to hoarder, and it seems like the queue hasn't progressed in the last few days. Is there a process for resetting or debugging the queue?

Steps to Reproduce

N/A

Expected Behaviour

Some sort of logging from the container about the queue, showing how it's progressing or whether it's running into errors.

Screenshots or Additional Context

Screenshot 2025-01-05 at 12 39 55

Device Details

docker

Exact Hoarder Version

v0.21.0

Have you checked the troubleshooting guide?

  • I have checked the troubleshooting guide and I haven't found a solution to my problem
kerem closed this issue and added the question label on 2026-03-02 11:50:42 +03:00.

@MohamedBassem commented on GitHub (Jan 5, 2025):

We actually log extensively: every time a worker picks up or completes a queued job. Are you not seeing any logs from the container at all? Can you try restarting the container and sharing the logs that you see on startup?


@icco commented on GitHub (Jan 5, 2025):

Screenshot 2025-01-05 at 12 45 20

@icco commented on GitHub (Jan 5, 2025):

Docker Compose config:

  hoarder:
    image: ghcr.io/hoarder-app/hoarder:release
    restart: unless-stopped
    environment:
      BROWSER_WEB_URL: http://chrome:9222
      CRAWLER_FULL_PAGE_ARCHIVE: true
      CRAWLER_NUM_WORKERS: 2
      DATA_DIR: /data
      DISABLE_SIGNUPS: true
      LOG_LEVEL: debug
      MEILI_ADDR: http://meilisearch:7700
      MEILI_MASTER_KEY: HIDDEN
      NEXTAUTH_SECRET: HIDDEN
      NEXTAUTH_URL: https://hoard.natwelch.com
      OPENAI_API_KEY: HIDDEN
    volumes:
      - /docker/appdata/hoarder:/data
    labels:
      prometheus.io/scrape: false
      caddy: hoard.natwelch.com
      caddy.reverse_proxy: '{{upstreams 3000}}'
      caddy.log:
    networks:
      - caddy
  chrome:
    image: gcr.io/zenika-hub/alpine-chrome:latest
    restart: unless-stopped
    command:
      - --no-sandbox
      - --disable-gpu
      - --disable-dev-shm-usage
      - --remote-debugging-address=0.0.0.0
      - --remote-debugging-port=9222
      - --hide-scrollbars
    labels:
      prometheus.io/scrape: false
    networks:
      - caddy
  meilisearch:
    image: getmeili/meilisearch:latest
    restart: unless-stopped
    environment:
      MEILI_MASTER_KEY: HIDDEN
      MEILI_NO_ANALYTICS: "true"
    volumes:
      - /docker/appdata/meilisearch:/meili_data
    labels:
      prometheus.io/scrape: false
    networks:
      - caddy

@MohamedBassem commented on GitHub (Jan 5, 2025):

Do the logs stop there? This is what the full startup sequence for the workers should look like:

2025-01-05T16:51:07.719Z info: Workers version: 0.21.0
2025-01-05T16:51:07.723Z info: [crawler] Loading adblocker ...
(node:121) [DEP0040] DeprecationWarning: The `punycode` module is deprecated. Please use a userland alternative instead.
(Use `node --trace-deprecation ...` to show where the warning was created)
2025-01-05T16:51:08.363Z info: [Crawler] Connecting to existing browser instance: http://chrome:9222
2025-01-05T16:51:08.364Z info: [Crawler] Successfully resolved IP address, new address: http://172.27.80.2:9222/
2025-01-05T16:51:08.399Z info: Starting crawler worker ...
2025-01-05T16:51:08.400Z info: Starting inference worker ...
2025-01-05T16:51:08.400Z info: Starting search indexing worker ...
2025-01-05T16:51:08.400Z info: Starting tidy assets worker ...
2025-01-05T16:51:08.400Z info: Starting video worker ...
2025-01-05T16:51:08.400Z info: Starting feed worker ...
2025-01-05T16:51:08.400Z info: Starting asset preprocessing worker ...

@icco commented on GitHub (Jan 5, 2025):

yeah logs stop there


@MohamedBassem commented on GitHub (Jan 5, 2025):

Interesting ... The workers are stuck in startup and it's not 100% clear to me where exactly they are stuck. Can you comment out the BROWSER_WEB_URL line, run docker compose up, and see if this helps the workers get unstuck?
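
As a rough illustration of that suggestion, the relevant part of the compose file posted earlier might look like the fragment below while testing. Only the commented-out line changes; the other entries are copied from the config above, the rest of the file is omitted, and the explanatory comment is an editorial addition.

```yaml
  hoarder:
    image: ghcr.io/hoarder-app/hoarder:release
    restart: unless-stopped
    environment:
      # Temporarily disabled to check whether worker startup hangs on the
      # connection to the chrome container. Restore it once diagnosed.
      # BROWSER_WEB_URL: http://chrome:9222
      CRAWLER_FULL_PAGE_ARCHIVE: true
      CRAWLER_NUM_WORKERS: 2
      DATA_DIR: /data
```

After editing, running docker compose up recreates the hoarder container with the new environment.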


@icco commented on GitHub (Jan 5, 2025):

oh wow yeah that unstuck them


@icco commented on GitHub (Jan 5, 2025):

Screenshot 2025-01-05 at 12 56 34

@MohamedBassem commented on GitHub (Jan 5, 2025):

OK, now we know it's stuck talking to the chrome container. You probably want to add that line back, as you're now crawling without JavaScript support. Now, is the chrome container healthy?


@icco commented on GitHub (Jan 5, 2025):

Looks like it wasn't. I changed the image to ghcr.io/zenika/alpine-chrome:latest, which recreated the container, and it now seems to be working. Any suggestions for healthchecking the chrome container so that it auto-restarts?


@MohamedBassem commented on GitHub (Jan 5, 2025):

Glad it's working now. The chrome container is healthy if it responds with 200 to http://localhost:9222/json/version, so you can probably plug that in as a healthcheck in the docker compose file.
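
A minimal sketch of such a healthcheck, under several assumptions that do not come from the thread: that the chrome image ships a wget binary (BusyBox wget is typical on Alpine), that 127.0.0.1 is an acceptable stand-in for localhost (used here to avoid any IPv6 resolution surprises), and that the interval, timeout, retry, and start-period values suit the deployment. They are placeholders, not recommendations from the maintainers.

```yaml
  chrome:
    image: ghcr.io/zenika/alpine-chrome:latest
    restart: unless-stopped
    # Keep the existing command, labels, and networks entries here.
    healthcheck:
      # Assumption: wget exists in the image. It exits non-zero when the
      # DevTools endpoint is unreachable or returns an HTTP error status.
      test: ["CMD", "wget", "-q", "-O", "/dev/null", "http://127.0.0.1:9222/json/version"]
      interval: 30s      # placeholder timings, tune as needed
      timeout: 5s
      retries: 3
      start_period: 10s
```

One caveat: with plain Docker Compose, a failing healthcheck only marks the container as unhealthy. The restart policy reacts to the container process exiting, not to health status, so an automatic restart would need something extra, such as an orchestrator or a separate watcher container.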


@MohamedBassem commented on GitHub (Jan 5, 2025):

Closing this as done now.


@icco commented on GitHub (Jan 5, 2025):

Thanks for your help!
