[GH-ISSUE #531] Feature Request: One-Click Deploy to hosting providers #339

Closed
opened 2026-03-01 14:42:36 +03:00 by kerem · 21 comments

Originally created by @mAAdhaTTah on GitHub (Nov 11, 2020).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/531

DigitalOcean is launching a one-click deploy for its AppPlatform (https://www.digitalocean.com/docs/app-platform/how-to/add-deploy-do-button/). This won't work for us yet because we would need to attach a Volume, which AppPlatform doesn't support, but the linked documentation suggests it will soon/eventually. Alternatively, we could look into configuring it for Heroku.

I'm happy to take the lead on this as well, but wanted to open an issue for visibility/discussion.

Type

  • [ ] General question or discussion
  • [x] Propose a brand new feature
  • [ ] Request modification of existing behavior or design

What is the problem that your feature request solves

I think it would be helpful for new users to be able to spin up an ArchiveBox instance in the cloud w/ minimal work. Running it on Docker in the first place is really helpful, but it would be nice to simplify it even further.

Describe the ideal specific solution you'd want, and whether it fits into any broader scope of changes

It should be feasible for a new user

What hacks or alternative solutions have you tried to solve the problem?

I'm still considering how I'm going to host my archive. I initially spun it up on a home server, which works but doesn't help if I want to expose the in-progress REST API to my website. I then put it on a DO droplet, which I'm still fiddling with. I've also considered writing Ansible roles for this, although that's a bit more involved for less technical users.

The main issue with something like AppPlatform & Heroku is that you don't get CLI access, so everything needs to function via the UI. Downloading sites can take several minutes, which may time out if deployed on AppPlatform (I haven't tested it in that context but it's definitely been happening on my droplet). Maybe worth looking at/considering how we can configure this as background tasks or something? Or maybe deploy to AppPlatform as a worker?
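
A minimal sketch of the background-task idea, in Python since ArchiveBox is a Python project: the web handler only queues the URL and returns right away, and a worker drains the queue. `archive_url`, `handle_add_request`, and the in-process thread are hypothetical stand-ins; a real setup would more likely run the worker as a separate process (e.g. an AppPlatform worker component), but the request/worker split is the same.

```python
import queue
import threading
import time
from typing import List, Optional

jobs: "queue.Queue[Optional[str]]" = queue.Queue()
results: List[str] = []


def archive_url(url: str) -> str:
    """Hypothetical stand-in for the slow part (several minutes per site in practice)."""
    time.sleep(0.1)
    return f"archived {url}"


def worker() -> None:
    """Drain the queue until a None sentinel arrives."""
    while True:
        url = jobs.get()
        try:
            if url is None:
                return
            results.append(archive_url(url))
        finally:
            jobs.task_done()


def handle_add_request(url: str) -> dict:
    """What the web view would do: enqueue and answer immediately."""
    jobs.put(url)
    return {"status": "queued", "url": url}


if __name__ == "__main__":
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    print(handle_add_request("https://example.com"))
    jobs.put(None)  # stop the worker once the queue is drained
    thread.join()
    print(results)
```

Because `handle_add_request` never waits on `archive_url`, a slow site can't make the HTTP request itself time out; the trade-off is that the UI has to poll for results.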

How badly do you want this new feature?

  • [ ] It's an urgent deal-breaker, I can't live without it
  • [ ] It's important to add it in the near-mid term future
  • [x] It would be nice to have eventually

  • [x] I'm willing to contribute dev time / money to fix this issue
  • [x] I like ArchiveBox so far / would recommend it to a friend
  • [ ] I've had a lot of difficulty getting ArchiveBox set up

@pirate commented on GitHub (Apr 6, 2021):

Some managed hosting options have popped up in the last few months; they might be worth checking out if you're willing to pay for hosting:

https://github.com/ArchiveBox/ArchiveBox/wiki/Web-Archiving-Community#managed-archivebox-hosting

@olimart commented on GitHub (Apr 19, 2021):

Heroku button support would be awesome indeed.
https://www.heroku.com/elements/buttons

@mAAdhaTTah commented on GitHub (Apr 19, 2021):

@olimart The biggest issue with doing this is the filesystem. Heroku & DO's App Platform both provide ephemeral filesystems per deploy, so they're wiped on restart/redeploy. We'd need to either configure those platforms for block storage (something DO's AP doesn't support yet; not sure about Heroku) or provide a swappable implementation for the filesystem to save things to S3 or some other object storage (DO's Spaces, which is S3 compatible). I haven't dug into this much but it's definitely not a trivial effort.
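
One way to read the "swappable implementation" idea is an interface the archiver writes through instead of calling `open()` directly. The sketch below is hypothetical (`Storage`, `LocalStorage`, `InMemoryStorage`, and `save_snapshot` are illustrative names, not ArchiveBox code), and the in-memory class only stands in for an S3/Spaces client. It also shows a limitation of the approach: anything that bypasses the interface, such as an external binary writing straight to disk, isn't covered.

```python
import tempfile
from pathlib import Path
from typing import Dict, Protocol


class Storage(Protocol):
    """Hypothetical interface the archiver would write through instead of open()."""

    def write(self, key: str, data: bytes) -> None: ...

    def read(self, key: str) -> bytes: ...


class LocalStorage:
    """Plain files under a data directory, like a self-hosted install."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def write(self, key: str, data: bytes) -> None:
        path = self.root / key
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def read(self, key: str) -> bytes:
        return (self.root / key).read_bytes()


class InMemoryStorage:
    """Stand-in for an S3/Spaces client: keys map to whole objects."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def write(self, key: str, data: bytes) -> None:
        self.objects[key] = data  # an object-store write replaces the whole object

    def read(self, key: str) -> bytes:
        return self.objects[key]


def save_snapshot(storage: Storage, timestamp: str, html: bytes) -> str:
    """Archiver code only talks to `storage`, so the backend can be swapped."""
    key = f"archive/{timestamp}/index.html"
    storage.write(key, html)
    return key


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for backend in (LocalStorage(Path(tmp)), InMemoryStorage()):
            key = save_snapshot(backend, "1714976395.796772", b"<html></html>")
            print(type(backend).__name__, key, backend.read(key))
```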

@olimart commented on GitHub (Apr 19, 2021):

Thanks @mAAdhaTTah
Yep, we would need to provide the ability to configure external storage (S3...).
I quickly saw a reference to SQLite, which is not supported by Heroku either.
Web app on Heroku, storage on Dropbox 😄

@pirate commented on GitHub (Apr 23, 2021):

Here's a WIP DigitalOcean "one-click" deploy template, but as @mAAdhaTTah mentioned it's broken because disk storage is not supported by DO apps yet: https://github.com/ArchiveBox/ArchiveBox/blob/digitalocean/.do/deploy.template.yaml

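Screenshot: https://user-images.githubusercontent.com/511499/115926450-dbfc0e00-a450-11eb-9b00-eebc24dc0815.png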

@mAAdhaTTah commented on GitHub (Apr 25, 2021):

@pirate Yeah, and swapping out for S3 would be tough/impossible with the SQLite db (plus if the tools we use write their own files, that makes it even more difficult).

@pirate commented on GitHub (Apr 25, 2021):

I think it's still feasible, though: we can write to local disk / RAM disk and then sync it to S3 or other storage backends every few seconds. It'll have a second or two of lag, but I think that's an acceptable trade-off.

@mAAdhaTTah commented on GitHub (Apr 25, 2021):

@pirate How would you handle the db in that instance? Sync it down on boot?

@pirate commented on GitHub (Apr 25, 2021):

Nah, just rsync it every few seconds like all the other files. I think S3 supports byte-range requests, so you can just sync the diffs instead of the whole thing each time.
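
A rough sketch of the "sync only what changed" loop, assuming a hypothetical `upload(key, data)` callback wraps whatever S3/Spaces client is used. One hedge on the byte-range point: S3 supports byte-range reads, but a PUT replaces the whole object, so this sketch detects changes per file (by size and modification time) and re-uploads whole files. Deletions aren't handled.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable, Dict, List, Tuple

# relative path -> (size in bytes, mtime in ns)
Snapshot = Dict[str, Tuple[int, int]]


def scan(root: Path) -> Snapshot:
    """Record size and modification time for every file under root."""
    snap: Snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            st = path.stat()
            snap[path.relative_to(root).as_posix()] = (st.st_size, st.st_mtime_ns)
    return snap


def changed_files(old: Snapshot, new: Snapshot) -> List[str]:
    """Files that are new, or whose size or mtime differs since the last pass."""
    return sorted(p for p, meta in new.items() if old.get(p) != meta)


def sync_once(root: Path, last: Snapshot, upload: Callable[[str, bytes], None]) -> Snapshot:
    """Upload changed files (whole-file granularity) and return the new snapshot."""
    current = scan(root)
    for rel in changed_files(last, current):
        upload(rel, (root / rel).read_bytes())
    return current


if __name__ == "__main__":
    uploaded: Dict[str, bytes] = {}
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "index.sqlite3").write_bytes(b"v1")
        last = sync_once(root, {}, uploaded.__setitem__)
        # Nothing changed, so the second pass uploads nothing.
        last = sync_once(root, last, uploaded.__setitem__)
        print(sorted(uploaded))
        # In a deployment this would run in a loop, e.g. every few seconds:
        #   while True: last = sync_once(root, last, upload); time.sleep(5)
```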

@turian commented on GitHub (Aug 12, 2022):

I would also want this feature

@turian commented on GitHub (Sep 11, 2022):

> @pirate How would you handle the db in that instance? Sync it down on boot?

Alternatively, use the DigitalOcean Postgres server. (Or is ArchiveBox SQLite-only?)
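
ArchiveBox is a Django app, so if it could run against an external database, the change would mostly live in Django's `DATABASES` setting. This is a hypothetical sketch: the `DATABASE_URL` variable name and the `database_settings` helper are assumptions, and it does not settle whether ArchiveBox actually supports Postgres. It falls back to an `index.sqlite3` file in the data directory and otherwise parses a `postgresql://` connection URL, keeping an `sslmode` query parameter if one is present.

```python
import os
from urllib.parse import parse_qs, unquote, urlparse


def database_settings(env: dict, data_dir: str) -> dict:
    """Build a Django DATABASES dict from a hypothetical DATABASE_URL variable."""
    url = env.get("DATABASE_URL")
    if not url:
        # Default: a local SQLite file (lost on restart on an ephemeral filesystem).
        return {"default": {
            "ENGINE": "django.db.backends.sqlite3",
            "NAME": os.path.join(data_dir, "index.sqlite3"),
        }}
    parsed = urlparse(url)
    if parsed.scheme not in ("postgres", "postgresql"):
        raise ValueError(f"unsupported DATABASE_URL scheme: {parsed.scheme!r}")
    default = {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": parsed.path.lstrip("/"),
        "USER": unquote(parsed.username or ""),      # urlparse leaves %-escapes in place
        "PASSWORD": unquote(parsed.password or ""),
        "HOST": parsed.hostname or "",
        "PORT": str(parsed.port or 5432),
    }
    query = parse_qs(parsed.query)
    if "sslmode" in query:
        default["OPTIONS"] = {"sslmode": query["sslmode"][0]}
    return {"default": default}


if __name__ == "__main__":
    # e.g. in a settings module: DATABASES = database_settings(dict(os.environ), DATA_DIR)
    example = "postgresql://user:secret@db.example.com:25060/archivebox?sslmode=require"
    print(database_settings({"DATABASE_URL": example}, "/data"))
```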

@turian commented on GitHub (Sep 12, 2022):

Additionally, it might be possible to use s3fuse to treat DO Spaces as a local filesystem (https://cloud.netapp.com/blog/amazon-s3-as-a-file-system).

This might be kinda gross, since you have to overwrite the file each time; you can't modify or append to it. That could cause issues.

@mAAdhaTTah commented on GitHub (Sep 12, 2022):

@turian The big issue, as I understand it, is the external binaries write files directly to disk.

@turian commented on GitHub (Sep 12, 2022):

> @turian The big issue, as I understand it, is the external binaries write files directly to disk.

Yeah, but @pirate's suggestion is just to rsync very frequently to S3.

On startup, you rsync back from S3. (I guess this can get expensive if you are not in AWS, since S3 downloads are costly.)

(BTW, DigitalOcean Spaces are S3-compatible.)

The only real issue I can think of is durability, e.g. if the process breaks for some reason and you end up with a corrupted copy. Then you have to roll back the S3 copy, which could be a pain.
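
On the durability worry: copying `index.sqlite3` while a write is in flight can capture a half-written file, which is one way to end up with a corrupted copy in S3. A minimal sketch of a safer step, using Python's standard `sqlite3` backup API to take a consistent snapshot and `PRAGMA integrity_check` to verify it before anything is uploaded. The file names are assumptions, and the upload itself is left out.

```python
import sqlite3
import tempfile
from pathlib import Path


def snapshot_sqlite(db_path: Path, dest_path: Path) -> None:
    """Copy a live SQLite database into dest_path as a consistent snapshot."""
    src = sqlite3.connect(db_path)
    dst = sqlite3.connect(dest_path)
    try:
        with dst:
            src.backup(dst)  # page-by-page copy that respects SQLite's locking
    finally:
        dst.close()
        src.close()


def is_intact(db_path: Path) -> bool:
    """True if SQLite's own integrity check passes on the file."""
    con = sqlite3.connect(db_path)
    try:
        return con.execute("PRAGMA integrity_check").fetchone()[0] == "ok"
    finally:
        con.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        live = Path(tmp) / "index.sqlite3"
        con = sqlite3.connect(live)
        con.execute("CREATE TABLE snapshot (url TEXT)")
        con.execute("INSERT INTO snapshot VALUES ('https://example.com')")
        con.commit()
        snap = Path(tmp) / "index.snapshot.sqlite3"
        snapshot_sqlite(live, snap)  # taken while `con` is still open
        con.close()
        print("intact:", is_intact(snap))  # only upload `snap` if this is True
```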

@mAAdhaTTah commented on GitHub (Sep 12, 2022):

rsync'ing back & forth seems rough for an archive of any serious size. I believe my archive is several GB at this point, and if I had to resync it down on startup and rsync up after archiving, that would be pretty slow.
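
Back-of-envelope numbers for a full pull on startup. The 5 GB size and the link speeds are assumptions standing in for "several GB", and the estimate ignores protocol overhead and per-file request latency, which for an archive made of many small files can dominate.

```python
def transfer_seconds(size_gb: float, bandwidth_mbit_s: float) -> float:
    """Ideal transfer time: decimal gigabytes over a link in megabits per second."""
    bits = size_gb * 1e9 * 8
    return bits / (bandwidth_mbit_s * 1e6)


if __name__ == "__main__":
    for mbit in (100, 1000):
        secs = transfer_seconds(5, mbit)
        print(f"5 GB at {mbit} Mbit/s: {secs:.0f} s (~{secs / 60:.1f} min)")
```

At 100 Mbit/s that is 400 s (about 6.7 minutes) before the app has its data back; at 1 Gbit/s, 40 s. Either way it is paid on every restart, which is why the replies below focus on not pulling the whole archive.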

@turian commented on GitHub (Sep 12, 2022):

@mAAdhaTTah So, I don't know the internals of ArchiveBox, but:

  • rsync'ing it up should be relatively fast, since it only uploads the diff, i.e. whatever is new in the past 10 seconds or so.
  • I'm not sure you have to rsync down the entire archive. Probably just the SQLite database and a few other small files that indicate what's left in the queue to be archived. I could be wrong though; I'm just guessing.
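
A hedged sketch of that idea: pull only the index and a few small files on boot, and fetch individual snapshot files lazily the first time they're requested. The data-directory layout in `BOOT_PATTERNS` (`index.sqlite3`, `ArchiveBox.conf`, `sources/`) is an assumption about what ArchiveBox needs at startup, and `download` is a hypothetical callback around the object-store client.

```python
import tempfile
from fnmatch import fnmatchcase
from pathlib import Path
from typing import Callable, Iterable, List

# Assumed: what ArchiveBox needs locally before it can start serving.
BOOT_PATTERNS = ("index.sqlite3", "ArchiveBox.conf", "sources/*")


def keys_to_pull_on_boot(remote_keys: Iterable[str], patterns=BOOT_PATTERNS) -> List[str]:
    """Keep only the small index/config files; skip the bulky archive/ tree."""
    return [k for k in remote_keys if any(fnmatchcase(k, p) for p in patterns)]


def read_archived_file(key: str, local_root: Path, download: Callable[[str], bytes]) -> bytes:
    """Fetch a snapshot file from object storage the first time it's requested."""
    path = local_root / key
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(download(key))
    return path.read_bytes()


if __name__ == "__main__":
    remote = ["index.sqlite3", "ArchiveBox.conf", "sources/import.txt",
              "archive/1714976395.796772/index.html"]
    print(keys_to_pull_on_boot(remote))
    with tempfile.TemporaryDirectory() as tmp:
        page = read_archived_file("archive/1714976395.796772/index.html", Path(tmp),
                                  lambda key: b"<html>fetched lazily</html>")
        print(page)
```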

@pirate commented on GitHub (Sep 15, 2022):

I believe rsyncing bidirectionally on startup can be made reasonably fast/efficient even for large archives as there are advanced rsync options that let you store a sync cache file for faster diffing.
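
The thread doesn't name the specific rsync options, so this is not rsync's mechanism, just an illustration in Python of the general idea: keep a small cache file recording each file's size and modification time from the last sync, so that only files that differ from the cache need to be compared or uploaded. `load_manifest`, `save_manifest`, and `pending_uploads` are hypothetical names, and where the cache lives (local disk, or pulled down alongside the index) is left open.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Dict, List, Tuple

# relative path -> [size in bytes, mtime in ns]; lists, because that's what JSON round-trips to
Manifest = Dict[str, List[int]]


def scan(root: Path) -> Manifest:
    manifest: Manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            st = path.stat()
            manifest[path.relative_to(root).as_posix()] = [st.st_size, st.st_mtime_ns]
    return manifest


def load_manifest(cache_file: Path) -> Manifest:
    if not cache_file.exists():
        return {}  # first run: everything counts as changed
    return json.loads(cache_file.read_text())


def save_manifest(cache_file: Path, manifest: Manifest) -> None:
    tmp = cache_file.with_name(cache_file.name + ".tmp")
    tmp.write_text(json.dumps(manifest, sort_keys=True))
    os.replace(tmp, cache_file)  # atomic rename: a crash never leaves a half-written cache


def pending_uploads(root: Path, cache_file: Path) -> Tuple[List[str], Manifest]:
    old = load_manifest(cache_file)
    new = scan(root)
    changed = sorted(p for p, meta in new.items() if old.get(p) != meta)
    return changed, new


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as data, tempfile.TemporaryDirectory() as state:
        root, cache = Path(data), Path(state) / "sync-cache.json"
        (root / "index.sqlite3").write_bytes(b"v1")
        changed, manifest = pending_uploads(root, cache)
        print("to upload:", changed)
        # ...upload `changed` here, then record what was synced:
        save_manifest(cache, manifest)
        print("after save:", pending_uploads(root, cache)[0])
```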

@turian commented on GitHub (Sep 15, 2022):

@mAAdhaTTah Also, if you want a one-click deploy of ArchiveBox, you can get one on PikaPods (https://www.pikapods.com/apps). It costs a few bucks a month.

I think they are running 0.6.2. Unfortunately, this means you will still get crashes from the UTF-8 bug and youtube-dl bugs, and archiving will stop; there are PRs for these (https://github.com/ArchiveBox/ArchiveBox/pull/1026), but they are not merged yet.

PikaPods builds all their one-click app stuff in-house (not open source), I think, so there's no way to customize it.

Another option is YunoHost (https://yunohost.org/en/apps?q=%2Fapps). Their apps are all open source (https://github.com/YunoHost-Apps/archivebox_ynh), so in principle there could be a bleeding-edge ArchiveBox app in there too.

@pirate commented on GitHub (Jun 13, 2023):

I'm going to close this for now because realistically the only two options I foresee for the future are:

  • I continue maintaining ArchiveBox as a non-profit side-project, in which case I have no personal capacity to support bespoke one-click solutions that deploy to paid hosting platforms beyond linking them in the README (https://github.com/ArchiveBox/ArchiveBox/blob/dev/README.md#-other-options)
  • I turn ArchiveBox into a for-profit enterprise and offer paid ArchiveBox hosting (in which case I have no interest in supporting competing paid deployment solutions for free)

@boehs commented on GitHub (May 6, 2024):

For what it's worth, I did a Railway deploy; here is a link to it (https://railway.app/template/2Vvhmy?referralCode=evan). I think new users get $5 in credit, and once that is used up, you get $5 of credit for a $5 subscription. ArchiveBox uses about $1 of credit per month.

Edit: here it is deployed: https://box.boehs.org/archive/1714976395.796772/index.html

@turian commented on GitHub (Oct 24, 2024):

@pirate I just spent the better part of two days trying to write an Ansible playbook to set up ArchiveBox on Hetzner with Caddy and decent security, and it still doesn't work. So I would love it if you launched a managed hosting option. I would pay at least double what your server / PaaS rental costs, just so you have a sense of possible pricing.

Indeed, I would venture to say that MANY MANY more people are interested in USING ArchiveBox than in maintaining it. See how popular pinboard.in is? This could be the next one, particularly considering that the pinboard.in developer goes dark for extended periods of time.

"I turn ArchiveBox into a for-profit enterprise and offer paid ArchiveBox hosting (in which case I have no interest in supporting competing paid deployment solutions for free)" YES PLEASE. I think that is probably the most sustainable path to recurring revenue.

Feel free to email me at lastname at gmail's email service if you want feedback

Reference
starred/ArchiveBox#339