[GH-ISSUE #664] Lots of alerts after Healthcheck downtime #475

Closed
opened 2026-02-25 23:42:36 +03:00 by kerem · 5 comments

Originally created by @horschi on GitHub (Jun 8, 2022).
Original GitHub issue: https://github.com/healthchecks/healthchecks/issues/664

When the healthchecks service is down for some time, starting it up again generates lots of alerts. It would be nice to have a clean way of shutting healthchecks down (e.g. for updates/maintenance).

Ideally healthchecks would handle this itself.

Alternatively, is there some way to shut down healthchecks cleanly, so that it does not generate alerts on startup?
For example:

  • Shut down sendalerts
  • Shut down runserver
  • Have a long downtime (long enough for checks to go into the error state)
  • Start runserver
  • Wait for all services to call and make all statuses go green (e.g. wait one minute)
  • TODO: Run some command to reset alerts for downtime?
  • Start sendalerts (currently a lot of alerts are sent at this point)
kerem closed this issue 2026-02-25 23:42:36 +03:00

@phaer commented on GitHub (Jun 8, 2022):

> TODO: Run some command to reset alerts for downtime?

I think you can use manage.py prunenotifications (https://github.com/healthchecks/healthchecks/blob/master/hc/api/management/commands/prunenotifications.py) to delete the old notification records after a successful ping.

From https://healthchecks.io/docs/self_hosted/:

> Remove old records of sent notifications. For each check, remove notifications that are older than the oldest stored ping for the corresponding check.


@cuu508 commented on GitHub (Jun 10, 2022):

The first and best option would be to avoid long downtimes :-)

The second option is to pause all checks before starting Healthchecks after an extended downtime. One way to do that would be with a SQL query:

UPDATE api_check SET status='paused' WHERE status='up'

(Checks currently in the up state are the ones that could potentially go down when sendalerts resumes, so the above query pauses only those and does not touch checks that are new, down, or already paused.)
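To see which rows the query touches, here is a minimal sketch that runs it against a stand-in table. The in-memory SQLite database, the sample check names, and the `pause_up_checks` helper are all assumptions made for illustration; a real Healthchecks database has many more columns and may run on a different database engine.

```python
import sqlite3

# Stand-in for the Healthchecks database (assumption): only the columns needed here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE api_check (id INTEGER PRIMARY KEY, name TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO api_check (name, status) VALUES (?, ?)",
    [("backup", "up"), ("cron", "down"), ("new-job", "new"), ("legacy", "paused")],
)


def pause_up_checks(conn):
    """Hypothetical helper: run the query from the thread, return the number of rows changed."""
    cur = conn.execute("UPDATE api_check SET status='paused' WHERE status='up'")
    conn.commit()
    return cur.rowcount


paused = pause_up_checks(conn)
statuses = dict(conn.execute("SELECT name, status FROM api_check"))
print(paused, statuses)
```

Only the check that was up changes state; the new, down, and previously paused checks are left alone.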


@horschi commented on GitHub (Jun 10, 2022):

@cuu508 : And after the startup and some grace time (to allow every host to ping) it would be best to reset the paused state. Otherwise if a service went down during the healthchecks maintenance, I would not get notified.

So afterwards something like this?
UPDATE api_check SET status='up' WHERE status='paused'
Or set them to the down state instead?


@cuu508 commented on GitHub (Jun 10, 2022):

Ah, I understand what you're after a little better now. You originally suggested:

  • Start runserver
  • Wait for all services to call and make all statuses go green (e.g. wait one minute)
  • TODO: Run some command to reset alerts for downtime?
  • Start sendalerts (currently a lot of alerts are sent at this point)

I haven't tested this myself, but, at a glance, it should work. If all checks are green when you start sendalerts then sendalerts should be happy and not trigger any alerts.


@horschi commented on GitHub (Jun 17, 2022):

@cuu508 : I tried it, but used the following update when bringing healthchecks back up:

UPDATE api_check SET status='down' WHERE status='paused'

This worked fine for all services that made a ping in time. A service that did not make a ping during the "Wait for all services to call and make all statuses go green" phase was set to the down state, but did not trigger any mails.

I guess setting paused states to up afterwards is better.
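One thing to watch with a blanket `WHERE status='paused'` update is that it also flips checks that were already paused before the maintenance window. A possible way around that, sketched here against a stand-in SQLite table, is to record which checks were paused for maintenance and restore only those. The table contents and both helper functions are hypothetical, and this has not been tested against a real Healthchecks installation.

```python
import sqlite3

# Stand-in for the Healthchecks database (assumption): only the columns needed here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE api_check (id INTEGER PRIMARY KEY, name TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO api_check (name, status) VALUES (?, ?)",
    [("backup", "up"), ("mailer", "up"), ("legacy", "paused")],
)


def pause_for_maintenance(conn):
    """Hypothetical helper: pause all 'up' checks and return their ids."""
    ids = [row[0] for row in conn.execute(
        "SELECT id FROM api_check WHERE status='up' ORDER BY id")]
    conn.executemany("UPDATE api_check SET status='paused' WHERE id=?",
                     [(i,) for i in ids])
    conn.commit()
    return ids


def resume_after_maintenance(conn, ids):
    """Hypothetical helper: set only the recorded checks back to 'up' if still paused."""
    conn.executemany(
        "UPDATE api_check SET status='up' WHERE id=? AND status='paused'",
        [(i,) for i in ids],
    )
    conn.commit()


ids = pause_for_maintenance(conn)
resume_after_maintenance(conn, ids)
final = dict(conn.execute("SELECT name, status FROM api_check"))
print(ids, final)
```

In a real maintenance run the recorded ids would have to be kept somewhere between the two steps (a file, for instance), since the pause and the resume happen in separate sessions.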
