[GH-ISSUE #47] Production setup #23

Closed
opened 2026-02-25 23:40:49 +03:00 by kerem · 6 comments
Owner

Originally created by @mlanner on GitHub (Feb 20, 2016).
Original GitHub issue: https://github.com/healthchecks/healthchecks/issues/47

I've been following the "Setting Up for Development" guide to try to come up with a setup that's better adapted to actually running a production server. I think I've got most things right, but I'm stumbling somewhere. My plan is to run this behind Caddy (https://caddyserver.com/). Are there any instructions anywhere for how to set up a "production" server to run healthchecks? If not, I'm happy to contribute one, but might need some pointers.
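Since Caddy is mentioned: a minimal reverse-proxy setup for it is very short. Here is a hypothetical sketch in Caddy v2 syntax (which postdates this thread; Caddy v1 used a `proxy` directive instead), with `hc.example.org` and port 8000 as placeholder values:

```
# Hypothetical Caddyfile: TLS is automatic, requests are proxied to a
# local healthchecks instance. Hostname and port are placeholders.
hc.example.org {
    reverse_proxy 127.0.0.1:8000
}
```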

kerem closed this issue 2026-02-25 23:40:49 +03:00

@diwu1989 commented on GitHub (Feb 22, 2016):

check my fork to see how I ran a production deployment on Heroku
https://github.com/BetterWorks/healthchecks
I'm using nginx as the reverse proxy on mine


@cuu508 commented on GitHub (Apr 5, 2016):

Hello @mlanner,

there isn't a good "setting up for production" documentation yet. If you want to start one, I would appreciate it.

Many aspects of a production setup are not healthchecks-specific: how to set up a web server, how to set up a database, how to deploy a WSGI application, how to do backups, high availability, monitoring, and how to push code changes. And, as you know, everyone does these differently, using different tools.

A reasonable way to get going is to use Heroku as @diwu1989 does. You get managed infrastructure and don't need to worry about backups or load balancers or handling SSL certificates.

To give you a few pointers, the setup for https://healthchecks.io is really straightforward for the time being. A single VPS running both the web and database server. The web server is nginx + gunicorn, the database is PostgreSQL. Nightly backups to S3 triggered by cron. Code is deployed using a Fabric script:
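For reference, the nginx + gunicorn pairing mentioned above commonly looks something like the following. This is a sketch, not the actual healthchecks.io config; the hostname, port, and `hc.wsgi` module path are assumptions based on a typical Django layout:

```nginx
# Hypothetical nginx server block proxying to gunicorn on a local port
server {
    listen 80;
    server_name hc.example.org;

    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```

On the application side, gunicorn would be started with something like `gunicorn hc.wsgi --bind 127.0.0.1:8000 --workers 2` (again, the module path is an assumption).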

https://medium.com/@healthchecks/deploying-a-django-app-with-no-downtime-f4e02738ab06

I've been looking into moving the service to AWS, and have been doing some experimenting. The reason to move to AWS is that they have load balancers and explicit availability zones, so there are more tools for setting up a robust service. For the hairy database management stuff I can then use either Amazon RDS or a service like https://compose.io. In my experiments, Elastic Beanstalk has worked well for deployments and provisioning of AWS resources. If/when I make the switch, I will do a writeup on that. There's a list of things to solve before then, like issue #39, so no ETA on that.


@stevenmcastano commented on GitHub (Jun 20, 2016):

I have a pretty solid production setup running on AWS now using a few tools:

  1. On my front-end server I'm using nginx to reverse proxy and to force all connections to SSL.
  2. nginx then forwards requests back to HAProxy via a `proxy_pass` statement.
  3. HAProxy does round-robin load balancing across two Ubuntu nodes on port 8100.
  4. Supervisor is installed on both nodes to run `manage.py runserver 0.0.0.0:8100`.
  5. The nodes are both part of a GlusterFS cluster mounted at `/opt/apps`, where healthchecks is installed, so both servers always see the exact same files.
  6. healthchecks is set to use a MySQL backend on `localhost` port `3307` on both servers.
  7. Both servers run MariaDB on port `3306`, with MaxScale in front of it to provide database load balancing and failover.
  8. The healthchecks `sendalerts` command runs on a single node only, also managed by Supervisor.
  9. Backups are handled by a bash script that dumps and compresses the database, makes a tarball of the entire `/opt/apps/healthchecks` directory, and moves it to an S3 filesystem mounted at `/mnt/backups` via `s3fs`, on a cron job at 1am.
  10. Once a week, a cron job runs at 1:30am to run `prunepings` to keep my DB size small. The same job uses the `find` command to locate files older than 30 days and delete any that weren't created on the 1st, 7th, 14th, 21st or 28th of the month, leaving me with daily backups for one month and weekly backups for everything after that.
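The keep-or-delete rule in step 10 boils down to a single day-of-month check. A hypothetical sketch of that rule as a shell function (the `keep_backup` name is invented for illustration; the real script would feed it the creation day extracted via `find`/`date`):

```shell
# Hypothetical sketch of the weekly prune rule: a backup older than
# 30 days is kept only if it was created on one of the monthly
# "anchor" days (1st, 7th, 14th, 21st, 28th).
keep_backup() {
  # $1 = zero-padded day-of-month the backup was created, e.g. "07"
  case "$1" in
    01|07|14|21|28) return 0 ;;  # keep
    *)              return 1 ;;  # eligible for deletion
  esac
}

keep_backup 07 && echo "keep" || echo "delete"   # prints "keep"
keep_backup 05 && echo "keep" || echo "delete"   # prints "delete"
```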

It's a bit of a kludge of cobbled-together utils and software, but it's all open source, all free, and it works like a charm. I've got fully redundant, load-balanced "active/active" healthchecks servers, with the exception of `sendalerts`, which only runs on a single node... but maybe in a future version there could be a way to mark alerts as pending for send, then mark them as sent, so that `sendalerts` could start with a random delay of 10-40 seconds and two instances of it could run at the same time. They could even use a table in the database to check in once a minute and balance each other out, so each one runs every other minute or something like that.
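The pending/sent idea above maps naturally onto an atomic conditional UPDATE: whichever worker flips the row's status first "wins" the alert, and the loser skips it. A hypothetical Python illustration using an in-memory SQLite table (the `alerts` schema and `claim_alert` helper are invented for illustration, not healthchecks code):

```python
import sqlite3

def claim_alert(conn, alert_id, worker):
    # The UPDATE only matches while status is still 'pending', so at
    # most one worker's statement modifies the row.
    cur = conn.execute(
        "UPDATE alerts SET status = 'sending', owner = ? "
        "WHERE id = ? AND status = 'pending'",
        (worker, alert_id),
    )
    conn.commit()
    return cur.rowcount == 1  # True only for the worker that won the race

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE alerts (id INTEGER PRIMARY KEY, status TEXT, owner TEXT)")
conn.execute("INSERT INTO alerts VALUES (1, 'pending', NULL)")

print(claim_alert(conn, 1, "node-a"))  # True: this worker sends the alert
print(claim_alert(conn, 1, "node-b"))  # False: already claimed, skip it
```

With this pattern, any number of `sendalerts` instances can run concurrently without double-sending, and no random start delays are needed.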


@cuu508 commented on GitHub (Jun 20, 2016):

@stevenmcastano very interesting to see your setup. A couple notes:

`manage.py runserver` is a single-threaded server meant for development only. For production, have a look at uwsgi or gunicorn.
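As a sketch of what that swap might look like under the Supervisor setup described earlier (all paths and the `hc.wsgi` module path are placeholders, not the actual deployment's values):

```ini
; Hypothetical Supervisor program block running gunicorn instead of
; "manage.py runserver" on the same port HAProxy already targets.
[program:healthchecks]
command=/opt/apps/healthchecks/venv/bin/gunicorn hc.wsgi --bind 0.0.0.0:8100 --workers 2
directory=/opt/apps/healthchecks
autostart=true
autorestart=true
```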

Good point about sendalerts being a single point of failure. There's an issue about this: #39

Agreed, a full production setup gets pretty complex. With a platform like Heroku you get a cleaner setup, with no worries about HA, backups, SSL, etc. But you pay for it.


@stevenmcastano commented on GitHub (Jun 20, 2016):

@cuu508 Good point... but the good news is, the single threaded app works so well I've been running it this way for weeks and it's stable as hell! You're right though, from what I've read uwsgi/gunicorn is going to be a much better way to run it. I'll have to do some experimenting this week and learn a little bit. I've actually never run either.

Also, I'm reading up on the other issue regarding the HA setup. I'd surely be interested in experimenting with and testing that as well!


@cuu508 commented on GitHub (Sep 4, 2019):

I've updated README with a "Running in Production" section. It's pretty minimal and some parts likely need to be expanded on. But I also think it will be out of scope to go into full detail about configuring the web server, the task runner, the database, external monitoring etc. etc.
