[GH-ISSUE #919] Support Systemd OnCalendar timers #646

Closed
opened 2026-02-25 23:43:09 +03:00 by kerem · 8 comments

Originally created by @ravench on GitHub (Nov 29, 2023).
Original GitHub issue: https://github.com/healthchecks/healthchecks/issues/919

The ability to time checks using cron syntax is great. However, we use systemd OnCalendar timers for all our timed jobs. It would be great if healthchecks could parse those as well.

See [this post](https://silentlad.com/systemd-timers-oncalendar-(cron)-format-explained) for an explanation of the OnCalendar syntax.

The parser for this in systemd seems to be [here](https://github.com/systemd/systemd/blob/5fae1561032dca13237ff8cc517a86ae8edcdcdd/src/shared/calendarspec.c#L871) (in C).

There is also the [systemd-analyze](https://www.freedesktop.org/software/systemd/man/latest/systemd-analyze.html#systemd-analyze%20calendar%20EXPRESSION...) tool, which can parse calendar settings.

kerem closed this issue 2026-02-25 23:43:09 +03:00

@cuu508 commented on GitHub (Dec 1, 2023):

Thanks for the suggestion. I've been thinking about this for a long time as well! The main barrier is the absence of a Python library for parsing and evaluating OnCalendar schedules. Forking out to `systemd-analyze calendar ...` would work, but would not be ideal in terms of security, performance, and portability.

No promises yet, but I've started work on a python library for this: https://github.com/cuu508/oncalendar/
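
For illustration, the "fork out" approach mentioned above could look roughly like the sketch below. It assumes `systemd-analyze` is installed and on `PATH`, and that its output contains a `Next elapse:` line. The helper names `parse_next_elapse` and `next_elapse` are hypothetical; they are not part of Healthchecks or the oncalendar library.

```python
import subprocess
from typing import Optional


def parse_next_elapse(output: str) -> Optional[str]:
    """Return the value of the "Next elapse:" line, or None if it is absent."""
    for line in output.splitlines():
        # Split on the first colon only, so the timestamp's colons stay intact.
        label, sep, value = line.strip().partition(":")
        if sep and label == "Next elapse":
            return value.strip()
    return None


def next_elapse(expression: str) -> Optional[str]:
    """Ask systemd-analyze when `expression` will next elapse (hypothetical helper)."""
    # An argument list (no shell) avoids shell injection, but every call still
    # spawns a process and only works on hosts that ship systemd.
    result = subprocess.run(
        ["systemd-analyze", "calendar", expression],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_next_elapse(result.stdout)
```

The per-call process spawn and the dependency on a systemd host are the performance and portability costs the comment refers to, which is the motivation for a pure-Python parser.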


@cuu508 commented on GitHub (Dec 11, 2023):

The initial implementation is ready, and deployed to https://healthchecks.io. All testing welcome :-)

Here's how the "Update Schedule" dialog looks:

![image](https://github.com/healthchecks/healthchecks/assets/661859/b34caeba-a2a3-49ec-8600-85d7031c7130)

@ravench commented on GitHub (Dec 12, 2023):

Awesome, thanks. I've deployed 3.1-dev to our server and adjusted our scripts. It seems to work well so far with a few dozen checks and various timer formats.


@ravench commented on GitHub (Dec 12, 2023):

A little issue I ran into involves grace time and `RandomizedDelaySec`:

We use fairly large random delays (up to 3600s) in our jobs, since many of them trigger at the same time and use the same resources. This means that, when using schedules instead of timeouts in Healthchecks, we constantly have jobs that are in their grace period, waiting for the random delay of the systemd timer to pass. We trigger the `/start` API call with `ExecStartPre=` in the service, so there is not really any way of triggering that call earlier.
It's sub-optimal, since it turns our project into a bit of a Christmas tree, but it doesn't cause any false alerts; we just set `hc_gracetime = randomizedelaysec + timeoutsec`.

I wonder whether it would make sense to add a 'green grace time', but that raises even more questions for me:
What is grace time actually intended for in a complete setup using `/start`, `/<exit-code>` and possibly log? I see three relevant durations in this context:

Schedule offset: The expected time between the scheduled and the effective start time of the job. (RandomizedDelaySec and AccuracySec in systemd)
Duration: The maximum time between start and end of the job. (TimeoutSec in systemd)
Grace time: Time to delay warnings to catch unexpected delays.

I imagine handling these delays separately would be fairly complicated, and it isn't a priority, since it's only really relevant for systemd and the grace time can just be set accordingly.
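
As a worked example of the `hc_gracetime = randomizedelaysec + timeoutsec` rule above, here is a minimal sketch. The function name `suggested_grace`, the optional `margin_sec` parameter, and the 900-second timeout are assumptions made for illustration; only the 3600-second random delay comes from the comment.

```python
from datetime import timedelta


def suggested_grace(
    randomized_delay_sec: int, timeout_sec: int, margin_sec: int = 0
) -> timedelta:
    """Grace time covering the worst-case start delay plus the worst-case run time."""
    return timedelta(seconds=randomized_delay_sec + timeout_sec + margin_sec)


# Random delay of up to 3600 s (from the comment), hypothetical 900 s job timeout.
grace = suggested_grace(randomized_delay_sec=3600, timeout_sec=900)
print(grace)  # 1:15:00
```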


@cuu508 commented on GitHub (Dec 13, 2023):

> I wonder whether it would make some sense to add a 'green grace time'

I've thought about making icons gradually shift from green to orange as they progress through the grace window. But I'm not sure this would be an improvement: you would still see non-pure-green statuses, and it may in the end look even busier with many different shades of green/orange.

Grace time was originally (A) the time to delay alerts when a success ping does not arrive on time.

When I added support for the `/start` signal, I made the grace time serve double duty and also (B) constrain the maximum time gap between the start and success signals. This means users cannot tune A and B separately, but this does not seem to be a big issue in practice. And it avoids having another slider in the UI.

Grace time can also be used to account for random startup delay, and for the client system's clock being slightly off.
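
Duty (B) described above can be sketched as a single comparison. This is a simplified model under stated assumptions, not Healthchecks' actual code; the function name `run_exceeded_grace` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional


def run_exceeded_grace(
    started_at: datetime,
    now: datetime,
    grace: timedelta,
    finished_at: Optional[datetime] = None,
) -> bool:
    """True if no success signal arrived within `grace` of the start signal."""
    # While the job is still running, measure the gap up to the current time.
    end = finished_at if finished_at is not None else now
    return end - started_at > grace
```

Because the same `grace` value also delays alerts for late pings (A), raising it to allow longer runs also postpones alerts for jobs that never start.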


@ravench commented on GitHub (Dec 14, 2023):

How about adding a configurable percentage threshold for the icon changing to yellow?
I agree that a gradual color shift would probably be more confusing than helpful. But just being able to configure "only turn yellow if 40% of the grace time has elapsed" would probably cover most use cases and edge cases.

Question regarding grace time: If I have a job scheduled for 00:00, a grace time of 1h, and pings at 00:30 and 01:29, what would the behavior be?


@cuu508 commented on GitHub (Dec 14, 2023):

> But just being able to configure "only turn yellow if 40% of the grace time has elapsed" would probably cover most use cases and edge cases.

But it would add a configuration setting that would need to be tucked into the UI somewhere and explained in the docs. My suggestion would be to think of the orange status icons not as an error condition, but as a sign that a particular check will run soon. Same as with traffic lights, where orange means "the light will change soon".

> Question regarding grace time: If I have a job scheduled for 00:00, a grace time of 1h, and pings at 00:30 and 01:29, what would the behavior be?

Assuming the check is initially up,

  • at 00:00 the check's grace period will start, and the icon will change to orange
  • at 00:30, after receiving a ping, the icon will change back to green
  • after that, the next expected ping is at the next midnight. Any early pings (e.g. at 01:29) do not affect the status; it will stay green.
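
The timeline above can be reproduced with a small simulation. This is a simplified model that assumes a fixed daily-at-midnight schedule and a one-hour grace time; it is not Healthchecks' actual implementation, and `next_midnight` and `status` are hypothetical names.

```python
from datetime import datetime, timedelta

GRACE = timedelta(hours=1)


def next_midnight(after: datetime) -> datetime:
    """First midnight strictly after `after` (the assumed daily schedule)."""
    day_start = after.replace(hour=0, minute=0, second=0, microsecond=0)
    return day_start + timedelta(days=1)


def status(now: datetime, last_ping: datetime, grace: timedelta = GRACE) -> str:
    """Return "up", "grace" or "down" for a check whose last ping was `last_ping`."""
    # The next ping is expected at the first scheduled time after the last ping.
    expected = next_midnight(last_ping)
    if now < expected:
        return "up"
    if now < expected + grace:
        return "grace"
    return "down"


midnight = datetime(2023, 12, 14)
previous_ping = datetime(2023, 12, 13, 0, 10)

print(status(midnight, previous_ping))  # grace
print(status(midnight + timedelta(minutes=31), midnight + timedelta(minutes=30)))  # up
print(status(midnight + timedelta(hours=2), midnight + timedelta(minutes=89)))  # up
```

In this model the early ping at 01:29 only moves `last_ping` forward within the same day, so the next expected ping stays at the following midnight.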

@ravench commented on GitHub (Dec 14, 2023):

Ok thanks
