[GH-ISSUE #1783] Networking issue in web container #1114

Closed
opened 2026-03-02 11:55:06 +03:00 by kerem · 0 comments
Owner

Originally created by @pdc1 on GitHub (Jul 25, 2025).
Original GitHub issue: https://github.com/karakeep-app/karakeep/issues/1783

Describe the Bug

I have been debugging a crawler timeout, and ended up finding there is a networking issue in the web container.

This command on the host works:

wget \
   --header="User-Agent: Mozilla/5.0" \
   -O wp-test.jpg \
   "https://www.washingtonpost.com/wp-apps/imrs.php?src=https://arc-anglerfish-washpost-prod-washpost.s3.amazonaws.com/public/G3GBOXRAFNDNTPLCMRJ2FHYYP4.jpg&w=1440"

When run in the web container using docker exec -it karakeep-app-web-1 sh it just hangs.

This failure shows up in the logs like:

2025-07-25T17:53:14.629Z info: [Crawler][32416] Downloading image from "https://www.washingtonpost.com/wp-apps/imrs.php?src=https://arc-anglerfish-washpost-prod-washpost.s3.amazonaws.com/public/G3GBOXRAFNDNTPLCMRJ2FHYYP4.jpg&w=1440"
<-- GET /api/health
--> GET /api/health 200 2ms
2025-07-25T17:53:44.099Z error: [Crawler][32416] Crawling job failed: Error: Timed-out after 60 secs
Error: Timed-out after 60 secs
    at Timeout._onTimeout (file:///app/apps/workers/dist/index.mjs:55105:117)
    at listOnTimeout (node:internal/timers:588:17)
    at process.processTimers (node:internal/timers:523:7)

I am a little surprised that some requests are run from the web host, when the chrome container has been set up specifically for scraping.

It's not every site, fortunately, so something tricky is going on with Washington Post, and perhaps others...

Steps to Reproduce

  1. docker exec -it karakeep-app-web-1 sh

wget
--header="User-Agent: Mozilla/5.0"
-O wp-test.jpg
"https://www.washingtonpost.com/wp-apps/imrs.php?src=https://arc-anglerfish-washpost-prod-washpost.s3.amazonaws.com/public/G3GBOXRAFNDNTPLCMRJ2FHYYP4.jpg&w=1440"


### Expected Behaviour

Image is downloaded quickly.

### Screenshots or Additional Context

_No response_

### Device Details

Docker on Debian host

### Exact Karakeep Version

0.26.0

### Have you checked the troubleshooting guide?

- [x] I have checked the troubleshooting guide and I haven't found a solution to my problem
Originally created by @pdc1 on GitHub (Jul 25, 2025). Original GitHub issue: https://github.com/karakeep-app/karakeep/issues/1783 ### Describe the Bug I have been debugging a crawler timeout, and ended up finding there is a networking issue in the web container. This command on the host works: ```sh wget \ --header="User-Agent: Mozilla/5.0" \ -O wp-test.jpg \ "https://www.washingtonpost.com/wp-apps/imrs.php?src=https://arc-anglerfish-washpost-prod-washpost.s3.amazonaws.com/public/G3GBOXRAFNDNTPLCMRJ2FHYYP4.jpg&w=1440" ``` When run in the web container using `docker exec -it karakeep-app-web-1 sh` it just hangs. This failure shows up in the logs like: ```log 2025-07-25T17:53:14.629Z info: [Crawler][32416] Downloading image from "https://www.washingtonpost.com/wp-apps/imrs.php?src=https://arc-anglerfish-washpost-prod-washpost.s3.amazonaws.com/public/G3GBOXRAFNDNTPLCMRJ2FHYYP4.jpg&w=1440" <-- GET /api/health --> GET /api/health 200 2ms 2025-07-25T17:53:44.099Z error: [Crawler][32416] Crawling job failed: Error: Timed-out after 60 secs Error: Timed-out after 60 secs at Timeout._onTimeout (file:///app/apps/workers/dist/index.mjs:55105:117) at listOnTimeout (node:internal/timers:588:17) at process.processTimers (node:internal/timers:523:7) ``` I am a little surprised that some requests are run from the web host, when the chrome container has been set up specifically for scraping. It's not every site, fortunately, so something tricky is going on with Washington Post, and perhaps others... ### Steps to Reproduce 1. `docker exec -it karakeep-app-web-1 sh` 2. ```sh wget \ --header="User-Agent: Mozilla/5.0" \ -O wp-test.jpg \ "https://www.washingtonpost.com/wp-apps/imrs.php?src=https://arc-anglerfish-washpost-prod-washpost.s3.amazonaws.com/public/G3GBOXRAFNDNTPLCMRJ2FHYYP4.jpg&w=1440" ``` ### Expected Behaviour Image is downloaded quickly. ### Screenshots or Additional Context _No response_ ### Device Details Docker on Debian host ### Exact Karakeep Version 0.26.0 ### Have you checked the troubleshooting guide? - [x] I have checked the troubleshooting guide and I haven't found a solution to my problem
kerem 2026-03-02 11:55:06 +03:00
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
starred/karakeep#1114
No description provided.