[GH-ISSUE #50] Feature Request: Add support for sharing Snapshots with other ArchiveBox instances (to enable distributed / federated archiving) #33

Open
opened 2026-03-01 14:40:00 +03:00 by kerem · 9 comments
Owner

Originally created by @pirate on GitHub (Nov 2, 2017).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/50

The next big thing I'm thinking about for BA is turning it into a distributed Wayback Machine! Everyone's personal archives can still be kept separate, but as part of the archive process you're asked whether you want to share the pages you've archived with a federated public archive. Each archived URL then gets a deterministic "federated id" which other people can use to find all archived versions of that URL.
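One way a deterministic "federated id" could be derived is by hashing a normalized form of the URL, so that every instance computes the same id without coordinating. The sketch below is only an illustration under that assumption: the function names `normalize_url` and `federated_id`, the normalization rules, and the choice of SHA-256 are all hypothetical and not taken from the project.

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit

# Hypothetical normalization rules: lowercase scheme and host, drop default
# ports, drop userinfo and fragments, use "/" for an empty path.
DEFAULT_PORTS = {"http": 80, "https": 443}

def normalize_url(url: str) -> str:
    """Canonicalize a URL so trivially different spellings map to one string."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    # Keep the port only when it is not the default for the scheme.
    if port in (None, DEFAULT_PORTS.get(scheme)):
        netloc = host
    else:
        netloc = f"{host}:{port}"
    path = parts.path or "/"
    # The fragment is never sent to the server, so it is dropped.
    return urlunsplit((scheme, netloc, path, parts.query, ""))

def federated_id(url: str) -> str:
    """Return a hex id that any instance can recompute from the URL alone."""
    return hashlib.sha256(normalize_url(url).encode("utf-8")).hexdigest()

if __name__ == "__main__":
    print(federated_id("http://example.com/blog/123.html"))
```

Because the id depends only on the URL, two people archiving the same page years apart would land on the same key, which is what makes a shared lookup possible.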

So when I visit my personal archive and see "example.com/blog/123.html" I can click a "show all versions" button which shows an archive in 2010 by alice, one in 2013 by bob, and one in 2017 by frank. I can click on links to view each of their versions in case mine is bad or corrupted somehow.

On the search page you'll be able to search for any URL (like the Wayback Machine), and if it's not in your personal archive it'll show results from other people's archives.
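The "show all versions" button and the search fallback described above could be sketched roughly like this. Everything here is an assumption for illustration: `SnapshotRef`, `search`, and `all_versions` are hypothetical names, and peers are modeled as plain in-memory lists rather than remote instances.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SnapshotRef:
    url: str
    owner: str  # who archived it
    year: int   # when it was archived

def search(url, local, peers):
    """Return local snapshots of `url`; consult peers only when there are none."""
    mine = [s for s in local if s.url == url]
    if mine:
        return mine
    found = [s for peer in peers for s in peer if s.url == url]
    return sorted(found, key=lambda s: s.year)

def all_versions(url, local, peers):
    """Every known snapshot of `url`, local and remote, oldest first."""
    everything = list(local) + [s for peer in peers for s in peer]
    return sorted((s for s in everything if s.url == url), key=lambda s: s.year)

if __name__ == "__main__":
    url = "example.com/blog/123.html"
    local = [SnapshotRef(url, "me", 2017)]
    peers = [[SnapshotRef(url, "alice", 2010)], [SnapshotRef(url, "bob", 2013)]]
    for snap in all_versions(url, local, peers):
        print(snap.year, snap.owner)
```

A real implementation would query other instances over the network; the local-first ordering is the only part this sketch tries to capture.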


@pirate commented on GitHub (Nov 2, 2017):

In theory, with enough people running BA we could archive a significant portion of SoundCloud before they go out of business. 😁


@pirate commented on GitHub (Nov 3, 2017):

Step 1: a Merkle tree for identifying and querying archive blobs across a distributed system: https://gist.github.com/pirate/0a3545254615b985727b49bc5c3d99cf
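For readers unfamiliar with the idea, here is a minimal Merkle root computation over a list of blobs. It is a generic sketch, not the code from the linked gist: the function name `merkle_root`, the SHA-256 hash, the leaf/node prefixes, and the rule of duplicating the last node on odd levels are all assumptions chosen for illustration.

```python
import hashlib

def _sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(blobs: list[bytes]) -> str:
    """Fold a list of blobs into a single hex digest that commits to all of them."""
    if not blobs:
        raise ValueError("need at least one blob")
    # Distinct prefixes keep a leaf hash from being mistaken for an inner node.
    level = [_sha256(b"\x00" + blob) for blob in blobs]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [
            _sha256(b"\x01" + level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
    return level[0].hex()

if __name__ == "__main__":
    print(merkle_root([b"index.html", b"screenshot.png", b"page.pdf"]))
```

Two instances holding the same blobs in the same order get the same root, so comparing one short digest is enough to tell whether their copies match. Note that duplicating the last node means a list ending in a repeated blob can collide with the shorter list, so a production scheme would need a different padding rule.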


@pirate commented on GitHub (Mar 5, 2019):

I like what ZeroNet is doing in this space: https://github.com/HelloZeroNet/ZeroNet


@pirate commented on GitHub (Jan 17, 2023):

Blocked by: https://github.com/ArchiveBox/ArchiveBox/issues/74

Once we have a good unique UUID/ULID ID scheme for Snapshots we can begin thinking about how to broadcast that with some metadata to other ArchiveBox instances / endpoints.

Planned baby steps towards this goal in the far-far-future:

  1. Finalize the ArchiveBox `Add` REST API endpoint to allow other services to POST new URLs and snapshots to ArchiveBox
  2. Add functionality for ArchiveBox to announce new snapshots to the world via RSS, webhooks, or a realtime endpoint of some kind:
    • REST webhook support, i.e. add the ability to configure ArchiveBox to ping outside endpoints whenever a new Snapshot/ArchiveResult is created
    • RSS feed support, i.e. publish an RSS feed on the ArchiveBox server of all recent snapshots (like Pocket does for your Pocket bookmarks)
  3. Add native ArchiveBox UI support for searching some of these global federation mechanisms on your own instance, so that you can browse snapshots from other instances and providers without leaving your one unified UI
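As a rough illustration of the RSS half of step 2, the sketch below renders a list of recent snapshots as an RSS 2.0 feed using only the standard library. The function name `build_feed`, the snapshot dict fields (`id`, `url`, `added`), and the `/archive/<id>/` link layout are assumptions made for the example, not ArchiveBox's actual schema or routes.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime

def build_feed(base_url: str, snapshots: list[dict]) -> str:
    """Render recent snapshots as an RSS 2.0 document (returned as a string)."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "Recent snapshots"
    ET.SubElement(channel, "link").text = base_url
    ET.SubElement(channel, "description").text = "Newly archived URLs on this instance"
    for snap in snapshots:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = snap["url"]
        # Hypothetical link layout; a real instance would use its own routes.
        ET.SubElement(item, "link").text = f"{base_url}/archive/{snap['id']}/"
        # The snapshot id is not a URL, so mark the guid as a non-permalink.
        ET.SubElement(item, "guid", isPermaLink="false").text = snap["id"]
        ET.SubElement(item, "pubDate").text = format_datetime(snap["added"])
    return ET.tostring(rss, encoding="unicode")

if __name__ == "__main__":
    print(build_feed("https://archive.example", [{
        "id": "abc123",
        "url": "https://example.com/blog/123.html",
        "added": datetime(2023, 1, 17, tzinfo=timezone.utc),
    }]))
```

A webhook variant would send the same per-snapshot fields as a JSON body to each configured endpoint instead of waiting for consumers to poll.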

External tools could then be developed that ingest this feed to publish ArchiveBox content on other platforms, e.g.:

  • ArchiveBox RSS -> proof-of-history blockchain, e.g. Solana
  • ArchiveBox RSS -> BitTorrent's magnet DHT and tracker sites
  • ArchiveBox RSS -> IFTTT/Zapier/Slack/Zulip/etc. webhooks
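The ingest side of such an external tool could look something like the sketch below: parse the feed, skip items already seen, and hand the new ones to whatever publisher sits downstream. The name `new_items`, the sample feed contents, and the in-memory `seen` set are hypothetical; a real tool would fetch the feed over HTTP and persist its state.

```python
import xml.etree.ElementTree as ET

# Made-up sample feed for the example; not output from a real instance.
SAMPLE_FEED = """
<rss version="2.0"><channel>
  <title>Recent snapshots</title>
  <item><title>https://example.com/a</title><link>https://archive.example/archive/snap-1/</link><guid>snap-1</guid></item>
  <item><title>https://example.com/b</title><link>https://archive.example/archive/snap-2/</link><guid>snap-2</guid></item>
</channel></rss>
"""

def new_items(feed_xml: str, seen: set) -> list[dict]:
    """Return feed items whose guid has not been seen before, recording them in `seen`."""
    root = ET.fromstring(feed_xml.strip())
    fresh = []
    for item in root.iter("item"):
        # Fall back to the link when a feed omits <guid>.
        guid = item.findtext("guid") or item.findtext("link")
        if guid is None or guid in seen:
            continue
        seen.add(guid)
        fresh.append({
            "guid": guid,
            "title": item.findtext("title"),
            "link": item.findtext("link"),
        })
    return fresh

if __name__ == "__main__":
    seen = set()
    for entry in new_items(SAMPLE_FEED, seen):
        print("would publish:", entry["link"])
```

Each of the targets listed above would then be a separate publisher consuming the returned entries, which keeps the feed reader independent of any one platform.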

Then later we can add functionality for ArchiveBox to publish snapshots/metadata to global lookup systems like proof-of-history blockchains (e.g. Solana), DHTs like the one BitTorrent's magnet system uses, distributed filesystems like IPFS, etc.


@ghobs91 commented on GitHub (Oct 13, 2024):

With Internet Archive/Wayback's recent ongoing outage, this functionality would be very valuable as an alternative!


@pirate commented on GitHub (Oct 13, 2024):

Already working on it! The latest 0.8.5 version is adding a sort of content-addressable store system with ABIDs that can be shared between instances.

I'm also starting to Merkle-hash all the archive results, with the intent of building a sharing layer on top of BitTorrent or IPFS in a future release.

It's a big design, architecture, and UI challenge, so it will take several releases to get to a final solution, but I'm starting to chip away at it.
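To make the content-addressable store idea mentioned above concrete, here is a generic sketch in which each blob is stored under the hash of its own bytes. It does not reproduce the ABID format or ArchiveBox's storage layout; the `ContentStore` class, the SHA-256 addressing, and the two-character directory sharding are assumptions for illustration only.

```python
import hashlib
import tempfile
from pathlib import Path

class ContentStore:
    """Store blobs on disk under the SHA-256 hex digest of their contents."""

    def __init__(self, root):
        self.root = Path(root)

    def _path(self, digest: str) -> Path:
        # Shard by the first two hex characters to keep directories small.
        return self.root / digest[:2] / digest

    def put(self, data: bytes) -> str:
        """Write `data` if it is not already present and return its address."""
        digest = hashlib.sha256(data).hexdigest()
        path = self._path(digest)
        if not path.exists():
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_bytes(data)
        return digest

    def get(self, digest: str) -> bytes:
        """Read a blob back and verify that it still matches its address."""
        data = self._path(digest).read_bytes()
        if hashlib.sha256(data).hexdigest() != digest:
            raise ValueError("stored blob does not match its address")
        return data

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = ContentStore(tmp)
        address = store.put(b"<html>archived page</html>")
        print(address, len(store.get(address)))
```

Since the address is derived from the content, identical results archived on different instances get the same address, and a receiver can verify a blob fetched from an untrusted peer.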


@chaos-baum commented on GitHub (Oct 21, 2024):

When I stumbled upon this project, I was immediately thinking about how IPFS could be of use. It would be nice to see IPFS in action in such a project. Keep up the good work!


@sij-ai commented on GitHub (Jun 23, 2025):

Any thought given to optionally federating ArchiveBox, e.g., via ActivityPub?

AP + IPFS could be the glue to make this happen?


@pirate commented on GitHub (Jun 23, 2025):

I went even simpler and designed it to work as a standard torrent/WebTorrent tracker, with RSS to announce new URLs. Nodes already expose a public search interface and can use any S3-compatible storage, so I figure that's a good enough discovery mechanism for now.

But I got a new job recently, so I don't have a ton of time to work on it at the moment: https://github.com/ArchiveBox/ArchiveBox/issues/191#issuecomment-2848370416
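For the torrent-based design described above, an announcement could carry a magnet link that points peers at a snapshot's torrent. The sketch below only shows how such a link is assembled from an info hash and tracker URLs; the helper name `magnet_link`, the example tracker URL, and the idea of one torrent per snapshot are assumptions, not details confirmed in this thread.

```python
from urllib.parse import quote

HEX_DIGITS = set("0123456789abcdefABCDEF")

def magnet_link(info_hash_hex: str, name: str, trackers: list[str]) -> str:
    """Build a magnet URI from a 40-character hex (SHA-1) BitTorrent info hash."""
    if len(info_hash_hex) != 40 or not set(info_hash_hex) <= HEX_DIGITS:
        raise ValueError("expected a 40-character hex info hash")
    parts = [
        f"xt=urn:btih:{info_hash_hex.lower()}",  # exact topic: the info hash
        f"dn={quote(name, safe='')}",            # display name
    ]
    # Each tracker URL is percent-encoded and passed as its own tr= parameter.
    parts += [f"tr={quote(tracker, safe='')}" for tracker in trackers]
    return "magnet:?" + "&".join(parts)

if __name__ == "__main__":
    print(magnet_link("a" * 40, "example.com snapshot",
                      ["https://tracker.example/announce"]))
```

An RSS item could then include this link alongside the archived URL, so that subscribers learn both what was archived and where to fetch it.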
