mirror of
https://github.com/healthchecks/healthchecks.git
synced 2026-04-25 23:15:49 +03:00
[GH-ISSUE #809] Exit code whitelisting #569
Originally created by @quentinus95 on GitHub (Mar 24, 2023).
Original GitHub issue: https://github.com/healthchecks/healthchecks/issues/809
Hello, would it be possible to have a feature that allows some non-0 exit codes to be whitelisted and considered as a success (or a warning for instance)?
I have some scripts that can end with a non-0 exit code that is not critical. It would be nice to be able to allow them and still consider execution as successful.
@cuu508 commented on GitHub (Jul 14, 2023):
Thanks for the suggestion. This is technically possible, of course, but I'm not sure how widely applicable it would be. Is it common to have scripts that return non-zero exit codes in success scenarios, with no way to influence this by passing parameters, by editing the scripts, or by using wrapper scripts with additional conditional logic?
@quentinus95 commented on GitHub (Jul 20, 2023):
Hello @cuu508, here is one example I have in mind: when rsync performs a copy of a folder (e.g., a backup) and some files are deleted before they are copied (rsync performs a scan of the files, then runs the backup), it may return a non-zero exit code. In some (most?) scenarios, it is fine to ignore that specific error code because it can be related to some logs that were rotated, or some lock files that were removed (which is fine when performing a snapshot).
In such situations, it would be nice to have a warning state, rather than a failure. In the previous example, it would make it possible to say “maybe you're backing up some folders or files that should be ignored”. Those situations are fine and can be investigated later (very different from a backup that failed to execute and might require immediate action).
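To make the rsync case concrete: rsync documents exit code 24 as "Partial transfer due to vanished source files", which matches the scenario described above. The following is a minimal sketch of the requested three-way classification. The function name, the `tolerated` parameter, and the "warning" label are hypothetical and are not part of Healthchecks.

```python
# rsync documents exit code 24 as
# "Partial transfer due to vanished source files".
RSYNC_VANISHED_FILES = 24


def classify_rsync_exit(code, tolerated=frozenset({RSYNC_VANISHED_FILES})):
    """Map an rsync exit code to a coarse status label (hypothetical helper)."""
    if code == 0:
        return "success"
    if code in tolerated:
        # Non-zero, but explicitly allow-listed as non-critical.
        return "warning"
    return "failure"
```

Which codes belong in `tolerated` depends on the job; 24 is only the example from this thread.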
@davidtorosyan commented on GitHub (Aug 27, 2023):
I have a similar use case, also backup related.
I expect my backup script to run successfully once a day. However, if it runs more frequently (say due to manual triggers), it'll bail out without actually doing anything.
I don't want to count this as success, but I don't want to alert on the failure either. So right now the only thing I can think to do is omit the "start" ping.
If I want to retain "start", then I'd need a way to signal that a run is canceled. Using an allow-listed non-zero status code could work for that.
@cuu508 commented on GitHub (Aug 28, 2023):
@davidtorosyan a couple of questions, so I understand your use case:
Why does it bail out on manual triggers? Do manual and automatic triggers launch the job differently? Or does the backup job somehow recognize that "it's not the right time for me to run"?
If the job does what it is supposed to do (which may be "nothing" in some cases), why not count it as success?
At the time when you send the "start" signal, you do not yet know if the job will be cancelled / bail out, correct? Like, the script starts up, then recognizes that some condition is not met, and bails out? What is that condition?
If you could detect the bail out condition near the start of the script, perhaps you could send the "start" signal only after it is clear the script will [attempt to] run fully?
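The last suggestion above, sending "start" only once it is clear the job will actually run, could be sketched roughly as follows. All names here are hypothetical: `should_run` stands for whatever bail-out check the script performs, `do_backup` for the real work, and `ping` for a callable that sends a request to the check's ping URL with the given suffix appended.

```python
def run_job(should_run, do_backup, ping):
    """Signal "start" only after the bail-out check has passed.

    should_run, do_backup and ping are injected callables (assumptions
    made for this sketch, not a Healthchecks API).
    """
    if not should_run():
        # Bail out before any ping is sent, so no "start" is left dangling.
        return "skipped"

    ping("/start")
    try:
        do_backup()
    except Exception:
        ping("/fail")
        return "failed"

    ping("")  # empty suffix: plain success ping
    return "succeeded"
```

With this ordering, a manually triggered run that bails out sends nothing at all, so it is recorded neither as a success nor as a failure.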
@davidtorosyan commented on GitHub (Aug 28, 2023):
@cuu508 good questions! Let me try and answer with pseudocode:
I see an additional solution I didn't before - solving this with two health checks. One for the backup script, and one for successful backup itself. That way I'd have a signal for the backup script running (and succeeding even in the bail out case) and for an actual backup being done with a daily frequency.
@davidtorosyan commented on GitHub (Aug 28, 2023):
After thinking about it more, I think I might be doing too much with healthchecks.
From what I can tell, healthchecks is best at making sure that a job is running on a given schedule (e.g. the backup job runs daily), not validating arbitrary conditions (e.g. the data that's backed up is the data I want).
That said I still do have a need for the latter, so maybe what I'll do is something like this:
@quentinus95 commented on GitHub (Jul 29, 2024):
@cuu508 which alternative would you suggest?
@cuu508 commented on GitHub (Jul 29, 2024):
@quentinus95 if you want to treat some non-zero exit codes as success, use a wrapper script which inspects the exit code and decides whether to report success or failure to Healthchecks.
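A wrapper of this kind might look like the sketch below. It assumes the standard Healthchecks ping endpoints (the bare ping URL for success, the same URL with `/fail` appended for failure). The UUID in `PING_URL` is a placeholder, and `ALLOWED_EXIT_CODES`, `ping_url_for`, and `main` are hypothetical names chosen for illustration.

```python
import subprocess
import sys
import urllib.request

# Placeholder: replace with the ping URL of your own check.
PING_URL = "https://hc-ping.com/your-uuid-here"

# Exit codes to report as success (assumption: 24 is rsync's
# "vanished source files" code, used here only as an example).
ALLOWED_EXIT_CODES = {0, 24}


def ping_url_for(exit_code, base_url=PING_URL, allowed=ALLOWED_EXIT_CODES):
    """Return the success URL for allow-listed codes, else the /fail URL."""
    return base_url if exit_code in allowed else base_url + "/fail"


def main(argv):
    # Run the wrapped command and inspect its exit code.
    result = subprocess.run(argv)
    try:
        urllib.request.urlopen(ping_url_for(result.returncode), timeout=10)
    except OSError:
        # A failed ping should not change the outcome of the job itself.
        pass
    return result.returncode


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

Saved as, say, `hc_wrap.py`, it would be invoked as `python hc_wrap.py rsync -a /src /dest`. The wrapper passes the original exit code through, so cron or other callers still see what the command returned.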
If you want an additional warning state, use monitoring software that supports metric collection and configurable alerting rules based on the collected metric values.
If you want to use something sort-of similar to Healthchecks, look into sensorpad.