mirror of
https://github.com/healthchecks/healthchecks.git
synced 2026-04-25 15:05:49 +03:00
[GH-ISSUE #218] Suggest a Grace Time based on average job execution time #157
Originally created by @danemacmillan on GitHub (Feb 8, 2019).
Original GitHub issue: https://github.com/healthchecks/healthchecks/issues/218
I've begun measuring the job execution time of all of the health checks that were deployed before the feature was available. One thing that occurred to me is that, given a large enough dataset of measured execution times, the Grace Time feature could put the collected data to work and help users set more informed values.
I don't think I'd want the Grace Time to be modified automatically based on this data, on the off chance that some jobs are severely skewed. However, providing the average execution time for a job somewhere in the interface, and offering it as a suggested value based on n measured runs, would be really insightful. I would even say this data should only appear in the UI once a job has accumulated a certain threshold of data, say 100 successfully measured runs. Other time values could be provided too, such as a job's longest and fastest runs; execution times could even be broken down by date range: average of all time, average over the last week, average over the last month, and so on.

The reason I bring this up is that I'm currently setting Grace Times manually, based on anecdotal knowledge of average run times; this is usually sufficient. Most of us using this service are likely intimately aware of our system operations. However, some jobs take longer over time simply because the job has stayed the same while the amount of data it processes has grown. I'm not obsessing over how long these particular jobs run and tweaking their grace times accordingly; typically, the moment I decide to increase a Grace Time is when, after say a four-month period, I notice I'm getting too many alerts for a particular job simply because it needs more time to process more data. At that point I check how long the job now takes and adjust the Grace Time accordingly.
To summarize, then, I think displaying the average execution time for each job somewhere in the UI would help users in setting more informed Grace Time values.
@skorokithakis commented on GitHub (May 19, 2019):
That's a pretty cool feature; would you accept a PR if I had time to work on it?
@cuu508 commented on GitHub (May 20, 2019):
Yes, I think this would be a neat and useful feature. I'm open to reviewing and accepting PR(s).
@cuu508 commented on GitHub (Nov 19, 2025):
I am not planning to work on this. The idea is neat and easy to understand, but quite tricky to implement. We do not store execution times in the database; we calculate them at display time. It is possible to iterate through all pings a check has received, calculate all execution times, and then calculate the average or median. But it's a DB-heavy operation and messy Python code. The cost-benefit ratio isn't there for me, and so I'm not planning to add this.
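The display-time calculation described above can be sketched roughly as follows, assuming each ping is recorded as a (timestamp, kind) pair where a run's duration is the gap between a "start" signal and the next "success" ping. The function name and ping model are illustrative and do not reflect Healthchecks' actual schema.

```python
# Sketch: derive run durations from a check's ping log at display time.
# Pings are modeled as (timestamp, kind) tuples; kind is one of
# "start", "success", or "fail". This model is an assumption for
# illustration, not the project's real data layout.
from datetime import datetime, timedelta


def run_durations(pings: list[tuple[datetime, str]]) -> list[timedelta]:
    durations = []
    started_at = None
    for ts, kind in sorted(pings):
        if kind == "start":
            started_at = ts
        elif kind == "success" and started_at is not None:
            durations.append(ts - started_at)
            started_at = None  # each start pairs with at most one success
    return durations
```

Doing this for every ping a check has ever received is exactly the DB-heavy pass the comment describes: each duration exists only implicitly, so the full ping history must be walked to produce an average or median.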
What I've been doing instead is –