[GH-ISSUE #23] Monitoring execution time of script

kerem commented

2026-02-25 23:40:46 +03:00

Owner

Originally created by @marticardus on GitHub (Dec 22, 2015).
Original GitHub issue: https://github.com/healthchecks/healthchecks/issues/23

It would be interesting to monitor the execution time of a script, for example:

curl https://hchk.io/UUID/start when the script starts
curl https://hchk.io/UUID/end when the script ends

And an option to alert on a user-defined maximum execution time, with the same ranges as in checks.

Kind regards


kerem closed this issue

2026-02-25 23:40:46 +03:00

kerem commented

2026-02-25 23:40:47 +03:00

Author

Owner

@diwu1989 commented on GitHub (Dec 31, 2015):

Why not just time the execution of the command yourself?


kerem commented

2026-02-25 23:40:47 +03:00

Author

Owner

@marticardus commented on GitHub (Dec 31, 2015):

I think it's easier to send two requests (start and end).

I want to get an alert if execution time is longer than X
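
For illustration, timing the command yourself and enforcing a limit could look like the sketch below. This is only one possible approach, not something Healthchecks provides: `run_within` is a hypothetical helper name, and it checks the elapsed time after the command has finished rather than interrupting it.

```shell
# Hypothetical helper (not part of Healthchecks): run a command and succeed
# only if it exits 0 AND finishes within max_seconds.
# Timing is wall-clock with whole-second resolution; the command is never killed.
# usage: run_within MAX_SECONDS COMMAND [ARGS...]
run_within() {
    local max_seconds=$1; shift
    local start elapsed status
    start=$(date +%s)
    "$@"
    status=$?
    elapsed=$(( $(date +%s) - start ))
    # Report the measured duration on stderr so it can be logged.
    echo "elapsed_seconds=${elapsed}" >&2
    if [ "$status" -ne 0 ]; then
        return "$status"
    fi
    [ "$elapsed" -le "$max_seconds" ]
}
```

Chained as `run_within 60 ./backup.sh && curl -fsS --retry 3 https://hchk.io/UUID` (where `./backup.sh` is a placeholder), the ping is skipped when the job fails or overruns, so the check eventually reports as down.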


kerem commented

2026-02-25 23:40:47 +03:00

Author

Owner

@danemacmillan commented on GitHub (Apr 28, 2017):

For reference, cronitor.io provides this functionality, and it is very useful. For example, not all scripts or services return proper exit codes. A service or script that fails should return non-zero, so that commands chained with && do not execute, but not all do. In that case the healthchecks.io request runs even though the job failed, which is misleading. In instances like this, knowing the average time a service or script runs would allow sending alerts when a process finishes very fast without returning a non-zero exit code to signal that it failed.


kerem commented

2026-02-25 23:40:47 +03:00

Author

Owner

@audiolion commented on GitHub (Nov 6, 2017):

I would like this feature as well, and am willing to create a PR for it (likely multiple as this touches database, services, frontend, additional rulesets/validations) if the maintainers would accept it.

This will require a refactor of the service, as we need to identify which routes we are monitoring. As stated earlier, the simplest way to mark the cron routes we want start/end times for is to use the same hchk.io address hash, with /start or /end appended to the URL to signify that we want to track the timing.

After that refactor we can update the frontend templates to include the elapsed time between each check's start and end, and provide an average for that endpoint.

Finally, we could implement validation rules for alerts when a check's start-to-end time is above or below a certain duration or number of standard deviations. E.g. send a notification if a run takes 3 standard deviations longer than usual, if it runs 3 standard deviations faster, if it takes more than 60 seconds, or if it takes less than 0.1 seconds.
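
The standard-deviation rule described above could be sketched as follows. This is an illustrative assumption about how such a rule might work, not an implemented Healthchecks feature: `duration_outlier` is a hypothetical name, the threshold of 3 comes from the example in the comment, and the population standard deviation is used for simplicity.

```shell
# Hypothetical helper: classify the newest run duration against past durations.
# usage: duration_outlier NEW_DURATION PAST_DURATION...
# Prints "slow", "fast", "ok", or "insufficient-data" (fewer than 2 past values).
duration_outlier() {
    local latest=$1; shift
    printf '%s\n' "$@" | awk -v latest="$latest" -v k=3 '
        { sum += $1; sumsq += $1 * $1; n++ }
        END {
            if (n < 2) { print "insufficient-data"; exit }
            mean = sum / n
            var = sumsq / n - mean * mean   # population variance
            if (var < 0) var = 0            # guard against rounding error
            sd = sqrt(var)
            if (latest > mean + k * sd)      print "slow"
            else if (latest < mean - k * sd) print "fast"
            else                             print "ok"
        }'
}
```

For example, `duration_outlier 60 10 11 9 10 10` compares a 60-second run against five earlier runs of about 10 seconds each. Fixed limits such as "more than 60 seconds" need only a plain numeric comparison.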


kerem commented

2026-02-25 23:40:47 +03:00

Author

Owner

@cuu508 commented on GitHub (Nov 7, 2017):

Hello @audiolion, I'm happy to review and merge PR(s) implementing this feature.

I like your suggestion about doing this in stages: 1) add API support 2) document the API support 3) expose the measured times in UI 4) add configurable alerting based on measured times.


kerem commented

2026-02-25 23:40:47 +03:00

Author

Owner

@pladen commented on GitHub (Jun 1, 2018):

cronalarm.com has this feature too.
Between "start" and "end" pings the check is displayed in "running" state in the UI.
This is a great way to monitor what is running at the moment.


kerem commented

2026-02-25 23:40:47 +03:00

Author

Owner

@colinfrei commented on GitHub (Nov 16, 2018):

Interested in this as well.


kerem commented

2026-02-25 23:40:47 +03:00

Author

Owner

@JeremyWeir commented on GitHub (Nov 21, 2018):

This feature would make choosing this project over Cronitor, Cronhub, Dead Man's Snitch, etc. a no-brainer. Not having it is the reason I didn't just sign up at https://healthchecks.io


kerem commented

2026-02-25 23:40:47 +03:00

Author

Owner

@cuu508 commented on GitHub (Nov 21, 2018):

This would be a useful feature – agreed!
I'm planning on implementing it. It touches core parts of the system, so will take time to do properly.

For anybody reading this: are you mainly interested in just logging the execution time, or getting alerted when a job takes longer than expected to execute (even though it manages to finish before going "down")?


kerem commented

2026-02-25 23:40:47 +03:00

Author

Owner

@pladen commented on GitHub (Nov 21, 2018):

Logging the execution time is a must-have. Displaying a job as running in the UI (i.e. between the start and end events) is also a great (and simple) feature.

However, triggering an alert when the run time exceeds a threshold seems redundant to me. Periodic check + grace time is sufficient. If my cron job is not finished in time (grace time exhausted), I just want ONE alert that tells me "didn't run" or "didn't finish in time". I don't need another setting for running time, which would mean TWO redundant alerts.


kerem commented

2026-02-25 23:40:47 +03:00

Author

Owner

@JeremyWeir commented on GitHub (Nov 22, 2018):

I'm interested in being alerted when the job takes longer than expected. With the interval/grace only, you might never notice a job growing slowly in execution time over time.


kerem commented

2026-02-25 23:40:47 +03:00

Author

Owner

@cuu508 commented on GitHub (Dec 21, 2018):

I've now added a /start endpoint. For now I'm treating it as an experimental feature and so I haven't updated the documentation yet.

Here's how it works for now. Let's say your ping URL is

https://hc-ping.com/bddfa063-d3ca-4fc6-8220-8ff6c712d4f7

Append "/start" at the end and that's the /start endpoint:

https://hc-ping.com/bddfa063-d3ca-4fc6-8220-8ff6c712d4f7/start

When the /start endpoint receives a request, a few things happen:

* Healthchecks logs a [Started] event. You will see it in the log, along with [OK] and [Failure] events
* In the web UI, the check's status changes to "Started" ("▶" icon)
* It kicks off a timer: the check now must receive a ping within the grace time, or it will go down

In other words, if a job signals a "start", we expect it to then signal "success" within its grace time. If it does not, we send an alert.
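
A job wrapper using the behaviour described above might look like the sketch below. The ping URL is the example from this comment. `hc_ping` and `run_monitored` are hypothetical helper names, and the curl options (10-second timeout, 3 retries) are assumptions rather than recommendations from the project.

```shell
# Example ping URL from the comment above; replace it with your own check's URL.
HC_URL="https://hc-ping.com/bddfa063-d3ca-4fc6-8220-8ff6c712d4f7"

# Send a ping. A failed ping is swallowed so that a monitoring hiccup
# never breaks the job itself.
hc_ping() {
    curl -fsS -m 10 --retry 3 -o /dev/null "$1" || true
}

# Signal "start", run the job, then send the ordinary success ping only if
# the job exits 0. If the job fails or hangs, no success ping arrives and the
# check goes down once the grace time runs out.
# usage: run_monitored COMMAND [ARGS...]
run_monitored() {
    local status
    hc_ping "${HC_URL}/start"
    "$@"
    status=$?
    if [ "$status" -eq 0 ]; then
        hc_ping "${HC_URL}"
    fi
    return "$status"
}
```

It would be invoked as `run_monitored ./backup.sh`, where `./backup.sh` stands in for the actual job.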

There are edge cases that I'm still thinking about how best to handle. I also have UI ideas on how to present the new functionality better. But this seems like a logical point to deploy the current implementation and give a status update.

If you're interested in this functionality and get a chance, you're welcome to try it out. If you notice any bugs, inconsistencies, or have other ideas on how to improve it, I'm all ears.


kerem commented

2026-02-25 23:40:47 +03:00

Author

Owner

@cuu508 commented on GitHub (Dec 25, 2018):

Released in v1.4.0


kerem commented

2026-02-25 23:40:47 +03:00

Author

Owner

@pladen commented on GitHub (Jan 7, 2019):

In testing for one week now, and it is working great.
Thank you.


kerem commented

2026-02-25 23:40:47 +03:00

Author

Owner

@papatistos commented on GitHub (Apr 30, 2019):

Just started testing this and have two quick suggestions:

1. Regarding UI: don't use green (or at least not the same green as for the all-OK check mark):

   ![image](https://user-images.githubusercontent.com/3662750/56963606-a110b980-6b59-11e9-91ba-f0ec30de616e.png)

   Perhaps orange would be appropriate for a started timer?
2. Regarding syntax (or whatever category this falls under): I understand that the intended usage is that you start the timer with an appended /start and stop it with an ordinary ping (without any endpoint). It would be nice, though, if it would also accept a ping with a /stop endpoint for stopping the timer.

Why this redundancy? Not a big deal, but in my mind it just makes sense that a start command is followed by a stop (I intuitively appended a /stop before properly reading the instructions). But intuition and aesthetics aside, I think it also increases the readability of scripts if I can see that this is where the timer starts and this is where it stops.

Notably, though, while stopping a running timer should work both with and without /stop, a /stop should not be accepted as an ordinary ping (i.e. it will do nothing if it wasn't preceded by a /start).
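
The semantics proposed above can be modelled as a small state transition. This is purely a sketch of the suggestion, not Healthchecks code: the /stop endpoint is only a proposal in this thread, and `next_state`, the state names, and the action names are hypothetical.

```shell
# Hypothetical model of the proposed /stop semantics; not Healthchecks code.
# usage: next_state CURRENT_STATE EVENT
#   CURRENT_STATE: idle | started
#   EVENT:         start | ping | stop
# Prints "<new_state> <action>", where action is "recorded" or "ignored".
next_state() {
    local state=$1 event=$2
    case "$event" in
        start) echo "started recorded" ;;   # /start begins the timer
        ping)  echo "idle recorded" ;;      # an ordinary ping always counts
        stop)
            if [ "$state" = "started" ]; then
                echo "idle recorded"        # /stop ends a running timer
            else
                echo "$state ignored"       # /stop without a prior /start does nothing
            fi
            ;;
        *) echo "$state ignored" ;;         # unknown events change nothing
    esac
}
```

Under this model, an ordinary ping and a /stop are interchangeable while a timer is running, but only the ordinary ping registers on its own.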


kerem referenced this issue

2026-02-25 23:43:57 +03:00

[PR #14] [MERGED] redirect already logged in user #861
