mirror of
https://github.com/ArchiveBox/ArchiveBox.git
synced 2026-04-25 09:06:02 +03:00
[GH-ISSUE #249] Architecture: Support running all archive methods through a SOCKS/HTTP proxy #1683
Originally created by @ghost on GitHub (Jul 1, 2019).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/249
Would be nice to have a `PROXY=socks5h://1.2.3.4:1080` style option to route all traffic through a proxy while archiving.
@pirate commented on GitHub (Jul 5, 2019):
I agree this would be nice eventually, but it's tricky to implement consistently across all the archive methods. For now I recommend running it inside docker and doing the proxying for the entire container.
These docs might help, although I can't confirm this type of setup works as I haven't tried it myself:
Expect 1yr+ before I get around to implementing this natively, for now I recommend the docker approach if this is an absolute requirement.
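Editorial sketch (not ArchiveBox code): the snippet below illustrates why a single `PROXY=socks5h://…` option is hard to apply consistently across archive methods. The helper name `proxy_settings` and the shape of its return value are hypothetical. It assumes, to the best of my knowledge, that curl reads `ALL_PROXY` and accepts `socks5h://` URLs, that GNU wget only honors HTTP(S) proxies via `http_proxy`/`https_proxy`, and that Chrome takes a `--proxy-server=` flag with a `socks5://` scheme and no embedded credentials.

```python
from urllib.parse import urlsplit


def proxy_settings(proxy_url: str) -> dict:
    """Translate one proxy URL into per-tool settings (hypothetical helper)."""
    parts = urlsplit(proxy_url)
    if parts.scheme not in {"http", "https", "socks5", "socks5h"}:
        raise ValueError(f"unsupported proxy scheme: {parts.scheme!r}")
    if not parts.hostname or parts.port is None:
        raise ValueError("proxy URL needs an explicit host and port")

    is_socks = parts.scheme.startswith("socks")

    # Assumption: curl picks up ALL_PROXY for every protocol it speaks.
    env = {"ALL_PROXY": proxy_url}
    if not is_socks:
        # Assumption: wget only understands HTTP(S) proxies, set via these vars.
        env["http_proxy"] = proxy_url
        env["https_proxy"] = proxy_url

    # Assumption: Chrome wants socks5:// (not socks5h://) and host:port only,
    # so any username/password in the URL is deliberately dropped here.
    chrome_scheme = "socks5" if is_socks else parts.scheme
    chrome_flag = f"--proxy-server={chrome_scheme}://{parts.hostname}:{parts.port}"

    return {
        "env": env,
        "chrome_args": [chrome_flag],
        "wget_supported": not is_socks,
    }


settings = proxy_settings("socks5h://1.2.3.4:1080")
print(settings["env"])
print(settings["chrome_args"])
print(settings["wget_supported"])
```

Under these assumptions, the same URL ends up as an environment variable for one tool, a command-line flag for another, and unsupported for a third, which is the inconsistency that makes container-level proxying the simpler route.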
@ghost commented on GitHub (Jul 5, 2019):
Ok, thank you! Appreciate you taking a look.
I will probably look into this if I give ArchiveBox another go.
@issenn commented on GitHub (Apr 22, 2020):
doesn't work with environment inside docker
@pirate commented on GitHub (Apr 29, 2020):
Can you post the docker-compose.yml file you used to test this @issenn? I have gotten docker working through wireguard tunnels with no issues in the past, so I'm sure there is a way to do this.
@pirate commented on GitHub (Jul 28, 2020):
FYI all you can now run ArchiveBox through a VPN (wireguard) without too much difficulty: https://github.com/pirate/ArchiveBox/blob/master/docker-compose.yml
I'm still planning on adding HTTP proxy support with an `HTTP_PROXY` config var so that we can pipe archiving through `pywb`'s `wayback --proxy` proxy WARC recorder, but that won't be released until a future version.
@kai11 commented on GitHub (Oct 25, 2020):
I had this problem in my setup and wrote notes on SOCKS5 proxy.
https://gist.github.com/kai11/e91c6fad990c6490b2a4fe8c4defebfe
@marcohald commented on GitHub (May 25, 2022):
@pirate Are you still planning the HTTP_PROXY implementation?
It would be very useful in an enterprise environment with an HTTP proxy that requires authentication.
@allen7u commented on GitHub (Oct 10, 2023):
Any news at this moment?
@pirate commented on GitHub (Oct 11, 2023):
No changes planned to add SOCKS support into ArchiveBox anytime soon because the Docker solutions work well enough for now.
My favorite way to do this is using tailscale, where you can route all Docker traffic through any desired exit node with one line: https://tailscale.com/kb/1103/exit-nodes/#step-4-use-the-exit-node
You can use a docker-compose sidecar container that shares the networking stack with the ArchiveBox container similar to this wireguard example:
https://github.com/pirate/wireguard-docs?tab=readme-ov-file#example-client-container-setup
For more info see: https://tailscale.com/kb/1282/docker/
@huyz commented on GitHub (Jul 30, 2024):
@pirate How did you get this working with Tailscale?
I'm running into this Redditor's issue which is that if you assign the `network_mode` to the tailscale container, then all traffic goes through Tailscale. Consequently, my reverse proxy (Caddy) can no longer access the `archivebox` container and I can't bring up the ArchiveBox web UI.
@pirate commented on GitHub (Jul 30, 2024):
If you share your docker-compose with caddy I can show you, you just add network_mode onto the caddy container too so they all share one network stack. Incoming connections can still be handled by caddy even when all outbound traffic goes through Tailscale.
You can also do it with iptables manually if you are running caddy outside docker.
@huyz commented on GitHub (Aug 2, 2024):
@pirate I'm not running Caddy outside of Docker. But I am running Caddy as part of a separate docker-compose project because it needs to reverse-proxy many services besides ArchiveBox. So it sounds like I need to use iptables.
Do you happen to have sample iptables rules I can use?
Thanks so much for your help.
@pirate commented on GitHub (Aug 2, 2024):
In that case I recommend a named bridge network that they both attach to, it will be simpler than iptables. Run `docker network create archivebox` to create a named network, then attach both the archivebox container and the caddy container to that network. ChatGPT should be able to help generate the yaml for that to attach the containers, if that doesn't work I can maybe help write it for you.
@huyz commented on GitHub (Aug 2, 2024):
That's actually what I have: a named bridge network.
But it seems that tailscale's iptables rules just mess everything up and I'm not too familiar with how to allow traffic for that bridged network:
@huyz commented on GitHub (Aug 4, 2024):
@pirate I figured it out. Given that I was using Tailscale in order to pass traffic through a Tailscale exit node, what I was missing was `--exit-node-allow-lan-access`.
For everyone, here is my full service definition (where I use Jinja2 as templater; hence the `{{ … }}`):
For `TS_AUTH_KEY_ARCHIVEBOX`, generate a reusable, ephemeral Auth Key (ephemeral, so the machine is automatically cleaned up after logout or inactivity; reusable, so that we don't have to get a new key every time the container comes up).
And of course, don't forget `sudo tailscale set --advertise-exit-node` on your exit node.
@ShockedCoder commented on GitHub (May 27, 2025):
As an expansion to this, it would be great if you could have a per-site proxy.
Such as a regex or similar, which would match `.org` and use the designated proxy.
Otherwise, if a proxy doesn't allow connections beyond its defined network, you couldn't use the same ArchiveBox instance to archive normal webpages simultaneously.
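Editorial sketch (not an existing ArchiveBox feature): one way the per-site proxy idea could look is a list of hostname patterns checked in order, with the first match selecting the proxy and unmatched URLs going direct. The names `PROXY_RULES` and `proxy_for`, the `.internal.example` rule, and both proxy addresses are hypothetical placeholders; only the `.org` pattern and the `socks5h://1.2.3.4:1080` form come from the thread.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical rule table: (hostname regex, proxy URL). First match wins.
PROXY_RULES = [
    (re.compile(r"\.internal\.example$"), "http://10.0.0.1:3128"),
    (re.compile(r"\.org$"), "socks5h://1.2.3.4:1080"),
]


def proxy_for(url: str, rules=PROXY_RULES) -> Optional[str]:
    """Return the proxy URL for the first rule matching the URL's hostname,
    or None when no rule matches (meaning: connect directly)."""
    host = urlsplit(url).hostname or ""
    for pattern, proxy in rules:
        if pattern.search(host):
            return proxy
    return None


for target in (
    "https://example.org/page",
    "https://wiki.internal.example/x",
    "https://example.com/",
):
    print(target, "->", proxy_for(target))
```

Matching on the hostname rather than the full URL keeps a pattern such as `\.org$` from accidentally matching paths or query strings; whether rules should match hosts or full URLs is a design choice the thread leaves open.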