mirror of
https://github.com/healthchecks/healthchecks.git
synced 2026-04-26 07:25:51 +03:00
[GH-ISSUE #691] Feature Request: Allow to notify on each failed check #498
Originally created by @DerDanilo on GitHub (Aug 6, 2022).
Original GitHub issue: https://github.com/healthchecks/healthchecks/issues/691
There are cases where one wants to be notified on every failed trigger of a check. This would be a per-job/task setting, separate from the existing global setting to re-notify daily or weekly.
Story:
Workaround:
Send a "success" trigger every minute to switch the status back, so that a notification goes out once the check is down again, whether because of a failure keyword or simply another failed check.
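One way to script this workaround is to send a success ping immediately before each failure ping, so the failure always arrives as an "up" to "down" transition. This is a minimal sketch, not something from the original post: it assumes the hosted ping endpoint `https://hc-ping.com/<uuid>` with the `/fail` suffix for failures, and the helper names `ping_url` and `report_failure_with_reset` are hypothetical.

```python
import urllib.request

# Hosted service base URL; a self-hosted instance would use its own ping URL.
PING_BASE = "https://hc-ping.com"


def ping_url(check_uuid: str, failed: bool) -> str:
    """Build the ping URL: the bare URL signals success, the /fail suffix a failure."""
    url = f"{PING_BASE}/{check_uuid}"
    return f"{url}/fail" if failed else url


def report_failure_with_reset(check_uuid: str, send=None) -> list:
    """Send a success ping to reset the check, then the failure ping.

    `send` is a hypothetical hook (a callable taking a URL) so the sequence
    can be exercised without network access; by default it does an HTTP GET.
    """
    if send is None:
        def send(url):
            urllib.request.urlopen(url, timeout=10).close()

    urls = [ping_url(check_uuid, failed=False), ping_url(check_uuid, failed=True)]
    for url in urls:
        send(url)
    return urls
```

Calling `report_failure_with_reset("your-check-uuid")` sends the two requests in order. The reset ping may itself trigger an "up" notification, so this can double the number of messages received.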
Idea:
One wants to be notified every time a check fails. A check might throw the same error over and over again. In that case the status never changes, so notifications give no indication of how serious the problem is, and there may be no other reason to log into the dashboard at that time.
For example, email keyword filtering doesn't really work if one wants to be notified every time a keyword is found.
Possible solution:
Add a simple option to re-notify every time a failure is logged/triggered.
It may also make sense to add a limiter on how often it may trigger in a certain period of time.
Another solution may be to extend the email filtering to send a simple notification whenever a keyword is found: not "down" or "up" but "warning" or "match found".
This should be per task/job and not globally (not sure if this would serve any purpose globally).
Thanks in advance!
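The limiter suggested above could take the form of a sliding-window counter. This is only a sketch under stated assumptions: the class name `NotificationLimiter` and its parameters are hypothetical and do not correspond to any existing Healthchecks setting.

```python
from collections import deque


class NotificationLimiter:
    """Allow at most `max_notifications` within any `window_seconds` span."""

    def __init__(self, max_notifications: int, window_seconds: float):
        self.max_notifications = max_notifications
        self.window_seconds = window_seconds
        self._sent = deque()  # timestamps of notifications that were allowed

    def allow(self, now: float) -> bool:
        """Return True if a notification at time `now` (in seconds) may be sent."""
        # Forget notifications that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()
        if len(self._sent) < self.max_notifications:
            self._sent.append(now)
            return True
        return False
```

With `NotificationLimiter(2, 60)`, the first two failures in a minute would notify and further failures would be suppressed until the oldest one leaves the window.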
@DerDanilo commented on GitHub (Aug 7, 2022):
There appear to be similar feature requests:
https://github.com/healthchecks/healthchecks/issues/510
@cuu508 commented on GitHub (Aug 10, 2022):
Thanks for the suggestion.
Can you expand a little more on this part? When/why would one want this?
Hypothesis: to gauge the severity of the issue (one notification – not serious, many notifications – bad).
Hypothesis: to make sure the issue isn't ignored or forgotten about – draw continuous attention to it.
@DerDanilo commented on GitHub (Aug 11, 2022):
Both of your hypotheses are correct. But there is also a third case:
One wants to be notified whenever an error is reported, or whenever a state ("fail", "warning", "success") is reported again, even though the state has not changed. In other words, notify on every state "update".
This is important, for example, when filtering mail and one wants to know every time an email with a certain keyword is caught.
For mail filtering it would also be awesome if one could select the state that the system should report. "down" is not always useful, with mail filtering or with normal checks. Sometimes it's just "warning" or "hit me, I got something you should look at", without being super important or requiring immediate action.
This is quite difficult for me to explain. Please ask if I should try to explain it again or differently.
Thanks!
While on the subject of notification settings, I'd love to have a setting that specifies after how many occurrences of a failure or success detection to actually notify about it, e.g. "failed once = nothing to do", "failed twice = alert, something seems wrong and didn't recover".
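The occurrence threshold described above can be sketched as a counter of consecutive failures that alerts once when the streak reaches a configured length. The class name `FailureThreshold` is hypothetical, and this is an illustration of the requested behaviour rather than an existing Healthchecks feature.

```python
class FailureThreshold:
    """Alert only after `threshold` consecutive failures."""

    def __init__(self, threshold: int):
        self.threshold = threshold
        self.consecutive_failures = 0

    def record(self, failed: bool) -> bool:
        """Record one check result and return True if it should raise an alert."""
        if not failed:
            # Any success resets the streak.
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        # Alert exactly once, when the streak first reaches the threshold.
        return self.consecutive_failures == self.threshold
```

With a threshold of 2, a single failure followed by a success stays silent, while two failures in a row produce one alert.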
@cuu508 commented on GitHub (Jul 14, 2023):
Thanks for explaining, I understand the use case, but Healthchecks is designed for a different use case – to alert you when an expected event does not happen on time. Healthchecks alerts you on state changes ("up" -> "down" and "down" -> "up"). If a check goes down, yes, you should log into the system, and investigate why it went down. While investigating you will also see the other failures.
I'd like to stay focused on the dead man's switch use case, and not keep gradually extending functionality in all directions to the point where it can do almost everything but is mediocre at almost everything. I hope you understand.
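To illustrate the state-change behaviour described above, the sketch below lists the transitions that would produce an alert for a sequence of statuses. It is a simplified model, not Healthchecks' actual implementation, and the function name `alerts_on_state_change` is hypothetical.

```python
def alerts_on_state_change(statuses):
    """Return (old, new) pairs for each change in a sequence of "up"/"down" statuses."""
    transitions = []
    previous = None
    for status in statuses:
        # Repeated identical statuses produce no alert; only a change does.
        if previous is not None and status != previous:
            transitions.append((previous, status))
        previous = status
    return transitions
```

For the sequence "up", "down", "down", "down", "up", only two alerts result: one for "up" to "down" and one for "down" to "up". The repeated "down" results in between are the failures this issue asks to be notified about.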
A couple ideas for your task at hand: