[GH-ISSUE #704] Unknown reason for notifications being sent? #505

Closed
opened 2026-02-25 23:42:42 +03:00 by kerem · 11 comments
Owner

Originally created by @thestraycat on GitHub (Sep 14, 2022).
Original GitHub issue: https://github.com/healthchecks/healthchecks/issues/704

I have a self-hosted healthchecks instance.

One of my checks is to monitor a shell script that takes around 43 minutes to complete. The script sends a /start request and an end request to healthchecks. It's scheduled to run at 4am every Wednesday with a 2hour grace time.

image

Today i notiiced that it completed in the expected grace period. But still sent out a notification to pushbullet & gotify regardless at 7am. See below:

image

The log from my script shows the curl command completing, the timezones are both the same on healthchecks container and the host the script runs on... What's happening here? And why 7am?

For what it might matter, I use a local version of the UUID so that it stays within my network as opposed to the externally facing UUID, i've tested both and both are recieved fine. (details obfuscated!)

eg.
External UUID: https://healthchecks.mydomain.co.uk/ping/5ffff230-ghkl-3219-1111-c03d804d123h
Internal UUID: http://192.168.1.125:8888/ping/5ffff230-ghkl-3219-1111-c03d804d123h

Originally created by @thestraycat on GitHub (Sep 14, 2022). Original GitHub issue: https://github.com/healthchecks/healthchecks/issues/704 I have a self-hosted healthchecks instance. One of my checks is to monitor a shell script that takes around 43 minutes to complete. The script sends a /start request and an end request to healthchecks. It's scheduled to run at 4am every Wednesday with a 2hour grace time. ![image](https://user-images.githubusercontent.com/9296295/190150563-88f884b3-7ceb-4450-9a7c-cd4577f819d5.png) Today i notiiced that it completed in the expected grace period. But still sent out a notification to pushbullet & gotify regardless at 7am. See below: ![image](https://user-images.githubusercontent.com/9296295/190150916-228e0743-5f4a-4775-97c8-5750939777d4.png) The log from my script shows the curl command completing, the timezones are both the same on healthchecks container and the host the script runs on... What's happening here? And why 7am? For what it might matter, I use a local version of the UUID so that it stays within my network as opposed to the externally facing UUID, i've tested both and both are recieved fine. (details obfuscated!) eg. External UUID: https://healthchecks.mydomain.co.uk/ping/5ffff230-ghkl-3219-1111-c03d804d123h Internal UUID: http://192.168.1.125:8888/ping/5ffff230-ghkl-3219-1111-c03d804d123h
kerem closed this issue 2026-02-25 23:42:42 +03:00
Author
Owner

@cuu508 commented on GitHub (Sep 16, 2022):

Thanks for the report. This looks puzzling to me too. A few questions:

  • Did it work as expected until now, and September 14 was the first time you got an unexpected notification? IOW, is this an old check that used to work, and broke "out of the blue"?
  • Have you made changes to the cron expression, the timezone or the grace time recently? (I'm assuming not, just making sure)
  • What version of Healthchecks are you running?
  • What database & database version are you using?
  • Please do the following experiment:
  1. ping the check manually (click "Ping" in the details page)
  2. then post a screenshot of the events log showing the ping that was just received
  3. then run the following query on the database and post its result:
select now(), alert_after from api_check where code='5ffff230-ghkl-3219-1111-c03d804d123h';
<!-- gh-comment-id:1249117709 --> @cuu508 commented on GitHub (Sep 16, 2022): Thanks for the report. This looks puzzling to me too. A few questions: * Did it work as expected until now, and September 14 was the first time you got an unexpected notification? IOW, is this an old check that used to work, and broke "out of the blue"? * Have you made changes to the cron expression, the timezone or the grace time recently? (I'm assuming not, just making sure) * What version of Healthchecks are you running? * What database & database version are you using? * Please do the following experiment: 1. ping the check manually (click "Ping" in the details page) 2. then post a screenshot of the events log showing the ping that was just received 3. then run the following query on the database and post its result: ``` select now(), alert_after from api_check where code='5ffff230-ghkl-3219-1111-c03d804d123h'; ```
Author
Owner

@thestraycat commented on GitHub (Sep 18, 2022):

Sorry for the late reply. I'll answer the questions and do the steps next, but for now just wanted to share what I've just noticed, I think i can see whats happening but dont quite know why...

So firstly, I'm based in the UK/London and if i run 'date' on my linux node (that runs healthchecks) it shows the correct London time here in London right now: Sun, 18 Sep 2022 20:08:46 +0100 (this obviously means nothing in the context of this reply but the +0100 shows my time zone is set correctly and the time displays as it should)

My cron check is set up for Europe/London time in helathchecks as well:

image

The job then starts and healthchecks report log's it in entry #88 as a 4am start: (i have it set to europe/london view tab)

image

But if i click on the entry #88 in the log it's actually reported that the ping is recieved as a 3am start: (i have it set to europe london view tab)

image

This is strange as i selected europe/london at the time of creating the job and also selected europe/london when i'm viewing the log entries.

So my hunch is that in reality healthchecks recieves my first ping at 3am, then recieves the end ping at 3.43am but my healthchecks job is still waiting until it's percieved 4am start time and an end ping before 6am! so the job fails as both the start and end ping both happen within the 1hour tim discrepency, so the job has started and ended before healtchecks starts looking for it at 4am?

What's really obscure though is that when the job inevitably fails, the healthcheck log shows the error at 7am! when i'd have expected it to show as 6am! (as the job is specificed between 4-6am)

Is healthchecks simply ignoring that the expected job time should be adjusted from UTC to UTC+1 (europe/london time) ?
image

I'll send the other info you requested now but thought i'd post the above findings.

<!-- gh-comment-id:1250372182 --> @thestraycat commented on GitHub (Sep 18, 2022): Sorry for the late reply. I'll answer the questions and do the steps next, but for now just wanted to share what I've just noticed, I think i can see whats happening but dont quite know why... So firstly, I'm based in the UK/London and if i run 'date' on my linux node (that runs healthchecks) it shows the correct London time here in London right now: Sun, 18 Sep 2022 20:08:46 +0100 (this obviously means nothing in the context of this reply but the +0100 shows my time zone is set correctly and the time displays as it should) My cron check is set up for Europe/London time in helathchecks as well: ![image](https://user-images.githubusercontent.com/9296295/190924476-077a5d11-1264-4352-9115-7a5b833d4274.png) The job then starts and healthchecks report log's it in entry #88 as a 4am start: (i have it set to europe/london view tab) ![image](https://user-images.githubusercontent.com/9296295/190925239-91a09586-3231-4e6c-9687-72c713e6b128.png) But if i click on the entry #88 in the log it's actually reported that the ping is recieved as a 3am start: (i have it set to europe london view tab) ![image](https://user-images.githubusercontent.com/9296295/190925173-57f54b87-e138-40ab-9781-f6b489fd82a9.png) This is strange as i selected europe/london at the time of creating the job and also selected europe/london when i'm viewing the log entries. So my hunch is that in reality healthchecks recieves my first ping at 3am, then recieves the end ping at 3.43am but my healthchecks job is still waiting until it's percieved 4am start time and an end ping before 6am! so the job fails as both the start and end ping both happen within the 1hour tim discrepency, so the job has started and ended before healtchecks starts looking for it at 4am? What's really obscure though is that when the job inevitably fails, the healthcheck log shows the error at 7am! when i'd have expected it to show as 6am! (as the job is specificed between 4-6am) Is healthchecks simply ignoring that the expected job time should be adjusted from UTC to UTC+1 (europe/london time) ? ![image](https://user-images.githubusercontent.com/9296295/190925280-774e27e4-1d37-4263-8bad-ceda8ed26273.png) I'll send the other info you requested now but thought i'd post the above findings.
Author
Owner

@thestraycat commented on GitHub (Sep 18, 2022):

Info as requested.

Thanks for the report. This looks puzzling to me too. A few questions:

Did it work as expected until now, and September 14 was the first time you got an unexpected notification? IOW, is this an old check that used to work, and broke "out of the blue"?

The whole of the long entry of this job is me simply running my script as various time to test it's functioning
to entries are being made sporadically and all seem to be working /start time is always being recieved and end time.
the September 14th date is the first time the script was triggered as a cron job to start at 4am.

Have you made changes to the cron expression, the timezone or the grace time recently? (I'm assuming not, just making sure)

Nope. Sept 14th is the first run of this script being triggered by a cron job and not run by me manually.

What version of Healthchecks are you running?

Healthchecks v2.3

What database & database version are you using?

sqlite (linuxserver.io https://github.com/linuxserver/docker-healthchecks) Couldn't find any info on it.

Please do the following experiment:
ping the check manually (click "Ping" in the details page)
then post a screenshot of the events log showing the ping that was just received
then run the following query on the database and post its result:
select now(), alert_after from api_check where code='5ffff230-ghkl-3219-1111-c03d804d123h';

This is a little tough as i dont believe linuxserver.io has thrown in the CLI tools to query the sqlite database
into their container, but i'm no expert, i'm sure there's an alternative way, can you advise?

<!-- gh-comment-id:1250413355 --> @thestraycat commented on GitHub (Sep 18, 2022): Info as requested. Thanks for the report. This looks puzzling to me too. A few questions: Did it work as expected until now, and September 14 was the first time you got an unexpected notification? IOW, is this an old check that used to work, and broke "out of the blue"? >> The whole of the long entry of this job is me simply running my script as various time to test it's functioning to entries are being made sporadically and all seem to be working /start time is always being recieved and end time. the September 14th date is the first time the script was triggered as a cron job to start at 4am. Have you made changes to the cron expression, the timezone or the grace time recently? (I'm assuming not, just making sure) >> Nope. Sept 14th is the first run of this script being triggered by a cron job and not run by me manually. What version of Healthchecks are you running? >> Healthchecks v2.3 What database & database version are you using? >>sqlite (linuxserver.io https://github.com/linuxserver/docker-healthchecks) Couldn't find any info on it. Please do the following experiment: ping the check manually (click "Ping" in the details page) then post a screenshot of the events log showing the ping that was just received then run the following query on the database and post its result: select now(), alert_after from api_check where code='5ffff230-ghkl-3219-1111-c03d804d123h'; >> This is a little tough as i dont believe linuxserver.io has thrown in the CLI tools to query the sqlite database into their container, but i'm no expert, i'm sure there's an alternative way, can you advise?
Author
Owner

@cuu508 commented on GitHub (Sep 19, 2022):

I think there's a timezone confusion somewhere – an UTC datetime is interpreted as Europe/London datetime or vice-versa.

As a sanity check, I want to test if the app stores a timezone-aware non-UTC datetime correctly to the database. Could you run the following:

  • launch python shell inside the container
docker ps  
# look up the CONTAINER ID and use it in the command below:
docker exec -it <CONTAINER-ID-HERE> python3 /app/healthchecks/manage.py shell
  • Inside the shell, paste the following:
from datetime import datetime
from zoneinfo import ZoneInfo
from hc.api.models import Check

c = Check(project_id=1)
c.alert_after = datetime(2022, 9, 19, 4, tzinfo=ZoneInfo("UTC"))
c.save()
c.refresh_from_db()
print(c.alert_after.isoformat())

c.alert_after = datetime(2022, 9, 19, 4, tzinfo=ZoneInfo("Europe/London"))
c.save()
c.refresh_from_db()
print(c.alert_after.isoformat())

The output I'd expect to see is:

>>> print(c.alert_after.isoformat())
2022-09-19T04:00:00+00:00
>>> print(c.alert_after.isoformat())
2022-09-19T03:00:00+00:00
<!-- gh-comment-id:1250666690 --> @cuu508 commented on GitHub (Sep 19, 2022): I think there's a timezone confusion somewhere – an UTC datetime is interpreted as Europe/London datetime or vice-versa. As a sanity check, I want to test if the app stores a timezone-aware non-UTC datetime correctly to the database. Could you run the following: * launch python shell inside the container ``` docker ps # look up the CONTAINER ID and use it in the command below: docker exec -it <CONTAINER-ID-HERE> python3 /app/healthchecks/manage.py shell ``` * Inside the shell, paste the following: ``` from datetime import datetime from zoneinfo import ZoneInfo from hc.api.models import Check c = Check(project_id=1) c.alert_after = datetime(2022, 9, 19, 4, tzinfo=ZoneInfo("UTC")) c.save() c.refresh_from_db() print(c.alert_after.isoformat()) c.alert_after = datetime(2022, 9, 19, 4, tzinfo=ZoneInfo("Europe/London")) c.save() c.refresh_from_db() print(c.alert_after.isoformat()) ``` The output I'd expect to see is: ``` >>> print(c.alert_after.isoformat()) 2022-09-19T04:00:00+00:00 >>> print(c.alert_after.isoformat()) 2022-09-19T03:00:00+00:00 ```
Author
Owner

@thestraycat commented on GitHub (Sep 20, 2022):

OK, so here's the output from that code snippet:

image

<!-- gh-comment-id:1252825544 --> @thestraycat commented on GitHub (Sep 20, 2022): OK, so here's the output from that code snippet: ![image](https://user-images.githubusercontent.com/9296295/191350022-4f06b369-015a-4c51-afbe-f6db0ac2ea77.png)
Author
Owner

@thestraycat commented on GitHub (Sep 21, 2022):

Just to add, the job is scheduled to run every 7 days. And it ran last night successfully which i dont understand as the time stamps are the same as the failed job!

image

Start Ping (21st of September) :

image

End Ping (21st of September) :

image

<!-- gh-comment-id:1253755951 --> @thestraycat commented on GitHub (Sep 21, 2022): Just to add, the job is scheduled to run every 7 days. And it ran last night successfully which i dont understand as the time stamps are the same as the failed job! ![image](https://user-images.githubusercontent.com/9296295/191524324-49c7ebf3-d952-43ce-a950-8e6cb2ed2fd2.png) Start Ping (21st of September) : ![image](https://user-images.githubusercontent.com/9296295/191524476-f17aa982-0531-46dc-928e-d43b87836aca.png) End Ping (21st of September) : ![image](https://user-images.githubusercontent.com/9296295/191524773-90f7fbf6-4140-455a-95ee-26b338928b43.png)
Author
Owner

@thestraycat commented on GitHub (Sep 26, 2022):

@cuu508 Do you need anything else from me at all?

<!-- gh-comment-id:1258292148 --> @thestraycat commented on GitHub (Sep 26, 2022): @cuu508 Do you need anything else from me at all?
Author
Owner

@cuu508 commented on GitHub (Oct 7, 2022):

the timezones are both the same on healthchecks container and the host the script runs on...

  • What is the OS, OS version and timezone on the host system that runs the Healthchecks container?
  • What is the output of timedatectl on the host OS?
  • What is cat /etc/timezone output on the host OS?
  • What is date output on the host OS?
  • What is the timezone inside the container?
  • What is the date output inside the container?
  • In /admin/api/flip/ locate the Flip object from September 14. What are the "Created" and "Processed" values?
  • In /admin/api/ping/ locate Ping objects from September 14. What are the "Created" values?
  • In /admin/api/notification/ locate Notification objects from September 14. What are the "Created" values?
<!-- gh-comment-id:1271429362 --> @cuu508 commented on GitHub (Oct 7, 2022): > the timezones are both the same on healthchecks container and the host the script runs on... * What is the OS, OS version and timezone on the host system that runs the Healthchecks container? * What is the output of `timedatectl` on the host OS? * What is `cat /etc/timezone` output on the host OS? * What is `date` output on the host OS? * What is the timezone inside the container? * What is the `date` output inside the container? * In `/admin/api/flip/` locate the Flip object from September 14. What are the "Created" and "Processed" values? * In `/admin/api/ping/` locate Ping objects from September 14. What are the "Created" values? * In `/admin/api/notification/` locate Notification objects from September 14. What are the "Created" values?
Author
Owner

@thestraycat commented on GitHub (Oct 12, 2022):

@cuu508 - Here you go!

What is the OS, OS version and timezone on the host system that runs the Healthchecks container?

Unraid, Version 6.10.3,
image

What is the output of timedatectl on the host OS?

Unraid's terminal dosn't have this command.
image

What is cat /etc/timezone output on the host OS?

Unraid's terminal dosnt have this command.
image

What is date output on the host OS?

Wed Oct 12 15:16:53 BST 2022

What is the timezone inside the container? (none of the timezone commands above work in the container either, below info is from the docker run file)

-e TZ="Europe/London"

What is the date output inside the container?

Wed Oct 12 15:18:50 BST 2022

In /admin/api/flip/ locate the Flip object from September 14. What are the "Created" and "Processed" values?

image

In /admin/api/ping/ locate Ping objects from September 14. What are the "Created" values?

image

In /admin/api/notification/ locate Notification objects from September 14. What are the "Created" values?
image

<!-- gh-comment-id:1276359794 --> @thestraycat commented on GitHub (Oct 12, 2022): @cuu508 - Here you go! What is the OS, OS version and timezone on the host system that runs the Healthchecks container? >> Unraid, Version 6.10.3, ![image](https://user-images.githubusercontent.com/9296295/195366256-62614515-511a-4e22-bed5-1d6266bf78dd.png) What is the output of timedatectl on the host OS? >> Unraid's terminal dosn't have this command. ![image](https://user-images.githubusercontent.com/9296295/195367252-21c62979-6a7a-49fe-9451-4eed65fb4e2e.png) What is cat /etc/timezone output on the host OS? >> Unraid's terminal dosnt have this command. ![image](https://user-images.githubusercontent.com/9296295/195367413-5020f85c-f30a-4966-9f98-a47eeddd5026.png) What is date output on the host OS? >> Wed Oct 12 15:16:53 BST 2022 What is the timezone inside the container? (none of the timezone commands above work in the container either, below info is from the docker run file) >> -e TZ="Europe/London" What is the date output inside the container? >> Wed Oct 12 15:18:50 BST 2022 In /admin/api/flip/ locate the Flip object from September 14. What are the "Created" and "Processed" values? >> ![image](https://user-images.githubusercontent.com/9296295/195382185-b6e24c36-bedc-4337-86ce-447a57aac09a.png) In /admin/api/ping/ locate Ping objects from September 14. What are the "Created" values? >> ![image](https://user-images.githubusercontent.com/9296295/195382645-02fbdc1f-36a4-4b5f-9d2d-ff55a50e3035.png) In /admin/api/notification/ locate Notification objects from September 14. What are the "Created" values? ![image](https://user-images.githubusercontent.com/9296295/195383331-3e740882-b1e8-46b9-920d-81687754f7de.png)
Author
Owner

@thestraycat commented on GitHub (Oct 12, 2022):

In other news, there has recently been an update to the linuxserver container.
The weekly scheduled running of my script that we have been troubleshooting happened last night, and it seems to be showing as having run as expected at 4am. I've made no changes my end whatsoever.

image

Could the timezone config of the healthchecks linuxserver container have been at fault? Perhaps it's been recently ammended in a new container update?

I'm currently running: Linuxserver.io version:- v2.3-ls138 Build-date:- 2022-10-11T15:50:40+02:00

<!-- gh-comment-id:1276368744 --> @thestraycat commented on GitHub (Oct 12, 2022): In other news, there has recently been an update to the linuxserver container. The weekly scheduled running of my script that we have been troubleshooting happened last night, and it seems to be showing as having run as expected at 4am. I've made no changes my end whatsoever. ![image](https://user-images.githubusercontent.com/9296295/195384720-4906bb8a-227a-4d7c-aa0c-53ffa24cc5be.png) Could the timezone config of the healthchecks linuxserver container have been at fault? Perhaps it's been recently ammended in a new container update? I'm currently running: Linuxserver.io version:- v2.3-ls138 Build-date:- 2022-10-11T15:50:40+02:00
Author
Owner

@cuu508 commented on GitHub (Oct 12, 2022):

Thanks for the answers, appreciate it.

Your configuration is quite far from what I'm familiar with:

  • you're using a 3rd party docker image
  • non-Debian container OS (Alpine, nothing wrong with that, I just don't have any significant experience with it)
  • non-Debian host OS (unraid, nothing wrong with that, I just don't have any experience with it)
  • you're setting a TZ env var. I've not tested that configuration, and don't know how various components (python, Django, libsqlite) would handle it

I still think there is or was a timezone confusion somewhere, but not sure where. Perhaps you ran the container without "TZ" env var at some point? If you find a way to reliably reproduce the problem, I'm happy to look into it further. But, right now, I don't think I'll be able to solve this :-/

<!-- gh-comment-id:1276412109 --> @cuu508 commented on GitHub (Oct 12, 2022): Thanks for the answers, appreciate it. Your configuration is quite far from what I'm familiar with: * you're using a 3rd party docker image * non-Debian container OS (Alpine, nothing wrong with that, I just don't have any significant experience with it) * non-Debian host OS (unraid, nothing wrong with that, I just don't have any experience with it) * you're setting a TZ env var. I've not tested that configuration, and don't know how various components (python, Django, libsqlite) would handle it I still think there is or was a timezone confusion somewhere, but not sure where. Perhaps you ran the container without "TZ" env var at some point? If you find a way to reliably reproduce the problem, I'm happy to look into it further. But, right now, I don't think I'll be able to solve this :-/
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
starred/healthchecks#505
No description provided.