[GH-ISSUE #786] Feature Request: Allow locally run ArchiveBox CLI commands to control a separate remote ArchiveBox backend #501

Open
opened 2026-03-01 14:44:08 +03:00 by kerem · 9 comments

Originally created by @huajianmao on GitHub (Jul 8, 2021).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/786

Type

  • [x] General question or discussion
  • [ ] Propose a brand new feature
  • [ ] Request modification of existing behavior or design

What is the problem that your feature request solves

I may use ArchiveBox on multiple machines and want to store all the archives centrally on one server.

Describe the ideal specific solution you'd want, and whether it fits into any broader scope of changes

What hacks or alternative solutions have you tried to solve the problem?

How badly do you want this new feature?

  • [x] It's an urgent deal-breaker, I can't live without it
  • [ ] It's important to add it in the near-mid term future
  • [ ] It would be nice to have eventually

  • [x] I'm willing to contribute dev time (https://github.com/ArchiveBox/ArchiveBox#archivebox-development) / money (https://github.com/sponsors/pirate) to fix this issue
  • [ ] I like ArchiveBox so far / would recommend it to a friend
  • [ ] I've had a lot of difficulty getting ArchiveBox set up

@pirate commented on GitHub (Jul 8, 2021):

We might get this ability for free with the upcoming message passing architecture refactor https://github.com/ArchiveBox/ArchiveBox/issues/91#issuecomment-871343428, but it may take a while as that refactor is a big one.

For now I suggest using SSH/rsync to push your URL files to a directory on the central server, e.g.
`rsync /some/local/urls.txt central-server:/path/to/urls.txt`,
and then on the central server run:
`archivebox schedule --every=day --depth=1 /path/to/urls.txt`.
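
A minimal Python sketch of this workaround, assuming key-based SSH access to the central server and that `archivebox` is run from inside the collection's data directory (where ArchiveBox normally expects to be invoked). The host name and file paths are the placeholders from the comment above; `REMOTE_DATA_DIR` and the helper names are hypothetical. By default it only prints the commands it would run.

```python
import shlex
import subprocess

# Placeholders from the comment above; REMOTE_DATA_DIR is a hypothetical
# path to the ArchiveBox collection on the central server.
LOCAL_URLS = "/some/local/urls.txt"
REMOTE_HOST = "central-server"
REMOTE_URLS = "/path/to/urls.txt"
REMOTE_DATA_DIR = "/path/to/archivebox/data"


def push_cmd(local_path: str, host: str, remote_path: str) -> list[str]:
    """rsync the local URL list to the central server (repeated on each client)."""
    return ["rsync", local_path, f"{host}:{remote_path}"]


def schedule_cmd(host: str, data_dir: str, remote_path: str) -> list[str]:
    """One-time setup: have the server re-import the URL file every day.

    The archivebox command is run from inside the data directory (assumption:
    this is where the collection lives on the server).
    """
    remote = shlex.join(
        ["archivebox", "schedule", "--every=day", "--depth=1", remote_path]
    )
    return ["ssh", host, f"cd {shlex.quote(data_dir)} && {remote}"]


def run(argv: list[str], dry_run: bool = True) -> None:
    """Print the command; execute it only when dry_run is False."""
    print(shlex.join(argv))
    if not dry_run:
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    run(schedule_cmd(REMOTE_HOST, REMOTE_DATA_DIR, REMOTE_URLS))
    run(push_cmd(LOCAL_URLS, REMOTE_HOST, REMOTE_URLS))
```

The schedule step only needs to happen once on the server; the rsync push is what each client repeats (for example from its own cron). If several machines push to the same remote file they will overwrite one another, so giving each machine its own file and schedule is one way to avoid that.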

Another way to submit URLs to a remote ArchiveBox instance is to make a POST request to the `/core/snapshot/add/` endpoint with your URLs. See here for more info: https://github.com/ArchiveBox/ArchiveBox/issues/577#issuecomment-872961973, and here for an example of how to make a request to the endpoint: https://github.com/tjhorner/archivebox-exporter/blob/adef67d6b9a81e788ec91107b871045c4640132b/src/common/services/archiver.ts#L78.
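
A hedged Python sketch of such a request using only the standard library. The form field names `url` and `depth` are assumptions (check the add form on your own instance), `BASE_URL` is a placeholder, and the sketch does not handle login or Django's CSRF token, which the server may require depending on its configuration.

```python
from urllib.parse import urlencode, urljoin
from urllib.request import Request, urlopen

# Placeholder: base URL of the central ArchiveBox instance (hypothetical).
BASE_URL = "http://central-server:8000"
# Endpoint path as given in the comment above.
ADD_PATH = "/core/snapshot/add/"


def build_add_request(base_url: str, urls: list[str], depth: int = 0) -> Request:
    """Build a form-encoded POST that submits the URLs, one per line.

    ASSUMPTION: the form fields are named "url" and "depth"; verify them
    against the add form on your own instance before relying on this.
    """
    body = urlencode({"url": "\n".join(urls), "depth": str(depth)}).encode()
    return Request(
        urljoin(base_url, ADD_PATH),
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def submit(urls: list[str], base_url: str = BASE_URL, depth: int = 0) -> int:
    """Send the request and return the HTTP status on success.

    urlopen raises HTTPError for error responses. Login and CSRF handling
    are deliberately left out of this sketch.
    """
    with urlopen(build_add_request(base_url, urls, depth)) as resp:
        return resp.status


if __name__ == "__main__":
    # Only builds and prints the request; nothing is sent.
    req = build_add_request(BASE_URL, ["https://example.com"], depth=1)
    print(req.get_method(), req.full_url)
```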

@huajianmao commented on GitHub (Jul 8, 2021):

I see.

ArchiveBox is awesome, and thanks for your suggestion! @pirate

@huajianmao commented on GitHub (Jul 8, 2021):

About the architecture refactor, how about refactoring ArchiveBox into a client/server model?
Or keep the current architecture, and split the URL management operations out into a thin ArchiveBox client that can be configured to use a remote ArchiveBox server to save the archives? @pirate

@huajianmao commented on GitHub (Jul 8, 2021):

Besides, it would be really great if a browser plugin could be provided.🤤

@pirate commented on GitHub (Jul 13, 2021):

> About the architecture refactor, how about refactoring ArchiveBox into a client/server model?

This is already how the message-passing refactor works, it will be like client-server on steroids. User actions will cause tasks to be emitted to one of several task queues, which can then be processed by workers on the same machine or remotely. Any subcomponent of ArchiveBox will be configurable to run locally or remotely, so you can split up the CLI, the web backend, the parser workers, and the archiver workers onto any machines you want.
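
To make the task-queue idea concrete, here is a toy, single-process Python sketch: a front end only enqueues tasks, and separate workers consume them. It is not ArchiveBox's code; the task kinds, queue names, and functions are hypothetical, and the in-process `queue.Queue` objects stand in for whatever networked broker would let workers live on other machines.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class Task:
    kind: str      # hypothetical task types: "parse" or "archive"
    payload: str


# One queue per kind of work. These are in-process queues; running workers
# on other machines would need a networked broker instead.
QUEUES: dict[str, queue.Queue] = {"parse": queue.Queue(), "archive": queue.Queue()}
STOP = None  # sentinel telling a worker to exit


def emit(task: Task) -> None:
    """What a front end (CLI, web UI, browser extension) does: enqueue and return."""
    QUEUES[task.kind].put(task)


def parse_worker() -> None:
    """Turn raw submitted text into one archive task per non-empty line."""
    while (task := QUEUES["parse"].get()) is not STOP:
        for line in task.payload.splitlines():
            if line.strip():
                emit(Task("archive", line.strip()))
    QUEUES["archive"].put(STOP)  # no more archive work will follow


def archive_worker(done: list[str]) -> None:
    """Stand-in for the real archiving step: it just records each URL."""
    while (task := QUEUES["archive"].get()) is not STOP:
        done.append(task.payload)


def run_demo(text: str) -> list[str]:
    done: list[str] = []
    workers = [
        threading.Thread(target=parse_worker),
        threading.Thread(target=archive_worker, args=(done,)),
    ]
    for w in workers:
        w.start()
    emit(Task("parse", text))
    QUEUES["parse"].put(STOP)
    for w in workers:
        w.join()
    return done
```

`run_demo("https://a.example\nhttps://b.example")` returns `["https://a.example", "https://b.example"]`. With a single worker per queue the submission order is preserved; with several (possibly remote) archive workers it would not be.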


> it would be really great if a browser plugin could be provided

A user-contributed browser plugin is already available 😉, see here: https://github.com/ArchiveBox/ArchiveBox/issues/577#issuecomment-872915877

You'll have to run the pre-release version of ArchiveBox from the `dev` branch to use it; read the full thread for details ^.

@huajianmao commented on GitHub (Jul 14, 2021):

Great and thanks!

@jfinkhaeuser commented on GitHub (Apr 17, 2023):

What's the status here? I don't see much progress, but this is pretty much the number one thing I'm missing.

@pirate commented on GitHub (Apr 19, 2023):

I would say don't expect this soon: the temporary SSH workaround works in almost all cases, and this feature would add a lot of complexity for me to maintain.

Higher priority for me right now is the REST API: https://github.com/ArchiveBox/ArchiveBox/issues/496

@jfinkhaeuser commented on GitHub (Apr 19, 2023):

Fair enough, thank you!

Yes, the workaround with SSH works. It's a tad more complex than I would like, but the important part is that there is something that can be done!
