starred/healthchecks


mirror of https://github.com/healthchecks/healthchecks.git synced 2026-04-25 06:55:53 +03:00

[GH-ISSUE #58] "sendalerts" dies on MySQL interruption #34


Closed

opened 2026-02-25 23:40:53 +03:00 by kerem · 5 comments

kerem commented

2026-02-25 23:40:53 +03:00

Owner

Originally created by @stevenmcastano on GitHub (May 13, 2016).
Original GitHub issue: https://github.com/healthchecks/healthchecks/issues/58

First of all, freaking amazing job with the app so far... I love it and use it for all kinds of stuff now!! The webhook variables are awesome and tie into my notification service perfectly!! Thanks!!

Since then I've expanded my setup on my cloud servers to include a 2-node MariaDB Galera cluster with MaxScale as the front end to help balance the load between DB servers... there's a lot of stuff running on there now.

Sometimes it takes MaxScale a few seconds to see that a node is dead, mark it as such, and reroute the connections to another working node. What I noticed is that with the "manage.py runserver" function, this works great... if a ping happens to come in while it still hasn't rerouted, it shows some error info, but attempts to reconnect to the database, which it does without a problem, and keeps on truckin'.

HOWEVER, the "sendalerts" function does not. When it notices the DB connection has gone away, even for just a brief second, it throws a very similar error message to the "runserver" function, but does NOT attempt to reconnect. The thread just dies and exits.

Is there any way to make the sendalerts function attempt to reconnect just like the runserver function does?

(I forgot to grab the debug error output from my testing before posting this, but I'm happy to force the failure in my dev environment and post it if you need it)


kerem closed this issue

2026-02-25 23:40:53 +03:00

kerem commented

2026-02-25 23:40:54 +03:00

Author

Owner

@cuu508 commented on GitHub (May 14, 2016):

Thanks for the kind words.

How are you running the `sendalerts` command?

I'm running it using supervisor. If the command exits, supervisor restarts it. It should be possible to set it up similarly with systemd, upstart etc. too.
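To illustrate the restart-on-exit behaviour described above, here is a minimal Python sketch of a respawn loop. It is not how supervisor is implemented and is no substitute for it; the `respawn` function, its parameters, and the stand-in crashing command are all hypothetical names chosen for this example.

```python
import subprocess
import sys
import time


def respawn(cmd, max_restarts=3, delay=0.0):
    """Run cmd, restarting it each time it exits with a non-zero status.

    Returns the list of exit codes observed, one per run.
    """
    exit_codes = []
    for _ in range(max_restarts + 1):
        code = subprocess.call(cmd)
        exit_codes.append(code)
        if code == 0:
            break  # clean exit, nothing to restart
        time.sleep(delay)  # brief pause so a crash loop does not spin
    return exit_codes


# Stand-in for a worker that dies immediately (assumption for the demo);
# a real setup would pass something like
# ["python", "manage.py", "sendalerts"] instead.
crashing_worker = [sys.executable, "-c", "import sys; sys.exit(1)"]
codes = respawn(crashing_worker, max_restarts=2)
```

A dedicated process manager adds logging, backoff, and start-on-boot on top of this basic loop, which is why it is usually the better choice in practice.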


kerem commented

2026-02-25 23:40:54 +03:00

Author

Owner

@stevenmcastano commented on GitHub (May 16, 2016):

Never tried supervisor before... I'll honestly have to look it up and see what it's all about.

Right now I have a script set in rc.local to launch a tmux window, which then starts another bash script that opens and names new windows, then in each window runs the python, bash or other script/app that I'd like. That way I can connect into my boxes with `tmux a` and see everything that's running with a decent bit of history.

Do you have any tips/tutorials you can point me to for supervisor?


kerem commented

2026-02-25 23:40:54 +03:00

Author

Owner

@diwu1989 commented on GitHub (May 16, 2016):

http://supervisord.org/running.html


kerem commented

2026-02-25 23:40:54 +03:00

Author

Owner

@stevenmcastano commented on GitHub (May 16, 2016):

Yup.... did some googling last night, I can't believe I've never seen this before. Getting anything newer than 3.0 running well on Ubuntu 14.04 is still a bit of a challenge, but the default repo version of 3.0 runs pretty well except the "tail -f" option seems to be giving me an error. I will however be putting this together on my DEV server ASAP.... it's AWESOME!


kerem commented

2026-02-25 23:40:54 +03:00

Author

Owner

@cuu508 commented on GitHub (May 24, 2016):

In summary, I would recommend running the sendalerts command under a daemon which will respawn it when it dies. Supervisor is one option. Another interesting option is uwsgi. If you use the uwsgi web server to serve the webapp, you can configure it to also run the sendalerts command. With uwsgi you can also run regular cron-style maintenance tasks, and do a lot of other things.

We can guard against and work around specific runtime errors like "mysql has gone away", but in general, with the different integrations, there can be many kinds of exceptions, and I don't think it would be a good idea to just catch and ignore all of them.
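The distinction drawn above, between guarding one specific error and swallowing every exception, can be sketched roughly as follows. This is an illustration only, not code from the project: `ConnectionLost`, `run_with_reconnect`, and the simulated step are hypothetical names made up for the example.

```python
import time


class ConnectionLost(Exception):
    """Hypothetical stand-in for a driver error such as 'MySQL server has gone away'."""


def run_with_reconnect(step, reconnect, max_attempts=3, delay=0.0):
    """Call step(); on ConnectionLost, reconnect and try again.

    Any other exception propagates, so unexpected failures still stop the
    process and can be handled by whatever supervises it.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except ConnectionLost:
            if attempt == max_attempts:
                raise  # give up after the last attempt
            reconnect()
            time.sleep(delay)


# Simulated flaky step (assumption for the demo): fails twice, then succeeds.
calls = {"step": 0, "reconnect": 0}


def flaky_step():
    calls["step"] += 1
    if calls["step"] < 3:
        raise ConnectionLost("connection dropped")
    return "alerts sent"


def fake_reconnect():
    calls["reconnect"] += 1


result = run_with_reconnect(flaky_step, fake_reconnect)
```

In a Django project the guarded exception would presumably be something like `django.db.OperationalError`, with the reconnect step closing the stale connection so the next query opens a fresh one; that mapping is an assumption here, not something stated in the thread.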


kerem referenced this issue

2026-02-25 23:43:59 +03:00

[PR #34] [MERGED] remove channel should always redirect even if removal fails #873