[GH-ISSUE #745] Add Proxy Settings for Crawler #486

Closed
opened 2026-03-02 11:50:16 +03:00 by kerem · 10 comments

Originally created by @Sinterdial on GitHub (Dec 21, 2024).
Original GitHub issue: https://github.com/karakeep-app/karakeep/issues/745

Describe the feature you'd like

In certain regions of the world, internet censorship is very strict, so proxies must be used to access some very popular websites.

However, when I tried to use the environment variables `HTTP_PROXY` and `HTTPS_PROXY` to route container traffic through a proxy, it had no effect. I hope configuration options can be added to the `.env` file to enable this functionality.
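For illustration, the kind of configuration being requested might look like the following compose sketch (the image tag and proxy address are placeholders; as reported above, these standard variables currently have no effect on the crawler):

```yaml
# Hypothetical sketch: standard proxy env vars on the app container.
# The point of this issue is that the crawler does not honor them yet.
services:
  hoarder:
    image: ghcr.io/hoarder-app/hoarder:latest
    environment:
      - HTTP_PROXY=http://proxy.example.com:8080
      - HTTPS_PROXY=http://proxy.example.com:8080
      - NO_PROXY=localhost,127.0.0.1
```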

Describe the benefits this would bring to existing Hoarder users

Users around the world

Can the goal of this request already be achieved via other means?

Yes, this could perhaps be achieved by setting the proxy in the Docker daemon's configuration file, but I only want to enable the proxy for specific containers.

Have you searched for an existing open/closed issue?

  • I have searched for existing issues and none cover my fundamental request

Additional context

No response

kerem closed this issue 2026-03-02 11:50:16 +03:00

@MohamedBassem commented on GitHub (Dec 22, 2024):

You should be able to set up a proxy server on the Chrome container (where most of the fetching happens) using Chrome flags. Check out https://github.com/hoarder-app/hoarder/issues/420.
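A minimal sketch of that approach, assuming the Chrome container from the standard compose setup and a hypothetical proxy at `proxy.example.com:8080` (the image tag and debugging flags follow the usual Hoarder example; adjust to your own file):

```yaml
# Sketch: pass the proxy to Chrome itself via a launch flag.
services:
  chrome:
    image: gcr.io/zenika-hub/alpine-chrome:123
    command:
      - --no-sandbox
      - --disable-gpu
      - --remote-debugging-address=0.0.0.0
      - --remote-debugging-port=9222
      - --proxy-server=http://proxy.example.com:8080
```

Note that `--proxy-server` affects page fetches done through Chrome; traffic originating from the worker container itself (see the adblock-download report later in this thread) is not covered by this flag.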


@KortanZ commented on GitHub (Dec 22, 2024):

Is there any way to set up a proxy server when using browserless? It would be great if I could reuse my browserless service :D


@MohamedBassem commented on GitHub (Dec 22, 2024):

@KortanZ It seems you can, by appending it to the browserless URL you give to Hoarder (https://docs.browserless.io/recipes/proxies#specifying-the-proxy).
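A hedged sketch of what that URL could look like, assuming a self-hosted browserless instance (the hostname, token, proxy address, and the `BROWSER_WEBSOCKET_URL` variable name are illustrative; check the browserless and Hoarder docs for your setup):

```yaml
# Sketch: browserless accepts Chrome launch flags as query parameters,
# so the proxy can ride along on the websocket connection URL.
services:
  hoarder:
    environment:
      - BROWSER_WEBSOCKET_URL=wss://browserless.example.com?token=YOUR_TOKEN&--proxy-server=http://proxy.example.com:8080
```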


@primejava commented on GitHub (Dec 24, 2024):

I have added the `--proxy-server` parameter in both the `command` and `environment` sections of the Chrome container. After the container starts successfully, I verified that the proxy is working by running `curl -I www.youtube.com` inside the container. However, when I paste a URL into Hoarder, I still encounter the error `Crawling job failed: TimeoutError: Navigation timeout of 30000 ms exceeded`.
![image](https://github.com/user-attachments/assets/17d1dee8-7637-4ab0-83a6-07484fec08ec)


@Sinterdial commented on GitHub (Dec 24, 2024):

> I have added the `--proxy-server` parameter in both the `command` and `environment` sections of the Chrome container. After the container starts successfully, I verified that the proxy is working by running `curl -I www.youtube.com` inside the container. However, when I paste a URL into Hoarder, I still encounter the error `Crawling job failed: TimeoutError: Navigation timeout of 30000 ms exceeded`.

same


@waynexia commented on GitHub (Dec 26, 2024):

In the Docker environment, networking is a bit different. You need the special IP `172.17.0.1` to reach the host network ([my handbook](https://wiki.waynest.com/#/page/docker%20proxy)).
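A sketch of that idea, assuming the proxy listens on the host at port 8080: `172.17.0.1` is the default bridge gateway, and the `host-gateway` alias is a more portable way to express the same thing without hard-coding the IP:

```yaml
# Sketch: let the Chrome container reach a proxy running on the Docker host.
services:
  chrome:
    extra_hosts:
      - "host.docker.internal:host-gateway"  # resolves to the host's gateway IP
    command:
      - --proxy-server=http://host.docker.internal:8080
```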


@seancheung commented on GitHub (Mar 27, 2025):

> I have added the `--proxy-server` parameter in both the `command` and `environment` sections of the Chrome container. After the container starts successfully, I verified that the proxy is working by running `curl -I www.youtube.com` inside the container. However, when I paste a URL into Hoarder, I still encounter the error `Crawling job failed: TimeoutError: Navigation timeout of 30000 ms exceeded`.

Like @waynexia said, networks in Docker containers are different. You could use `extra_hosts` if the proxy server is not in the same Docker network:

```yaml
extra_hosts:
  - 'proxy_server:192.168.0.232' # give the ip a hostname
command:
  - --proxy-server='http=proxy_server:20171;https=proxy_server:20171' # use hostname instead of ip
```

@maidou-00 commented on GitHub (May 16, 2025):

> Like @waynexia said, networks in Docker containers are different. You could use `extra_hosts` if the proxy server is not in the same Docker network:
>
> ```yaml
> extra_hosts:
>   - 'proxy_server:192.168.0.232' # give the ip a hostname
> command:
>   - --proxy-server='http=proxy_server:20171;https=proxy_server:20171' # use hostname instead of ip
> ```

I've tried this but it didn't work... not sure what's wrong:

```yaml
extra_hosts:
  - "shadowsocks:172.83.0.1"
command:
  - --proxy-server="https=shadowsocks:1080" # use hostname instead of ip
```

@dannongruver commented on GitHub (Jun 12, 2025):

Any update on this? I experience the same issue: proxy settings via env vars (like any other Docker container), but crawling and other operations fail (e.g. downloading adblock lists). The proxy itself works, because `curl` works from within the container.


@MohamedBassem commented on GitHub (Jul 13, 2025):

This is coming in the next release: 360ef9d
