mirror of
https://github.com/healthchecks/healthchecks.git
synced 2026-04-24 22:45:56 +03:00
[GH-ISSUE #1201] uWSGI sometimes does not restart crashed sendreports/sendalerts processes #814
Originally created by @heikoh81 on GitHub (Aug 18, 2025).
Original GitHub issue: https://github.com/healthchecks/healthchecks/issues/1201
Hi,
I have dockerized healthchecks 3.10 with mariadb-backend (healthchecks/healthchecks:v3.10) running locally for my homelab.
I have set up Email-notification with weekly reports.
For several months now, I have only been getting 2 weekly notifications after a restart of the container, and then no more weekly reports. This has happened several times now. As far as I know it also happened with older versions of healthchecks (3.7); this problem was the reason for my update to 3.10.
I just sent a test mail, which works.
Then I checked Account Settings -> Email reports (Weekly on Mondays Remind me daily), and at the very bottom I see this:
"Next weekly report date is August 4, 2025."
-> Today's date is August 18, 2025 !!!
For some reason, the "next report date" does not get updated.
And I think this is the reason why I don't get weekly notifications!
For testing, I manually restarted the container --> I immediately got my weekly report, and "Next weekly report date is August 25, 2025."
I could create a cronjob that restarts the container every Tuesday. However, if this is a bug, I think it should be fixed within healthchecks, as I don't like manual fixes that get forgotten over time... :-)
But maybe this is a problem only on my setup? I have a minimal standard Debian VM running on Proxmox that is used only for Docker.
Any help is appreciated.
Thanks,
Heiko
@cuu508 commented on GitHub (Aug 20, 2025):
Thanks for the report.
Reports are sent by the manage.py sendreports management command. This command is run on container startup automatically by uwsgi.
If sendreports throws an exception and exits, uwsgi restarts it automatically. From your description it sounds like the sendreports command somehow gets stuck (not sure how or why), but does not crash.
What container image are you using?
Do you have access to historic docker logs? Can you check for logs around the time the report was supposed to be sent (sometime August 4, 2025)?
@heikoh81 commented on GitHub (Aug 20, 2025):
Hi,
thanks for your reply.
I'm using the standard container provided on docker hub:
docker log
I found 2 log entries where something with manage.py seems to go wrong.
This is for 2025-07-22. I don't know what happened here, but the container was probably restarted by me.
Then on 2025-07-28 something goes wrong again, and after that, there is no more entry found for manage.py until my manual restart of the container when I started posting this issue.
One hint regarding the very last log entry "[Errno 111] Connection refused".
I had some problems with the email account settings in my Proxmox Backup Server. That triggered my own fail2ban instance, which blocked my own IP again and again. This is the same IP that healthchecks uses for sending mails.
Proxmox Backup Server also showed that Error 111.
I fixed this in Proxmox Backup Server around the end of July and hadn't restarted healthchecks since then (until the restart of healthchecks on 2025-08-18).
It could be that healthchecks tried to send the weekly report while the IP was banned by fail2ban and could not connect to the mailserver.
Nonetheless the date for next report was updated to 2025-08-04, but stuck there with days advancing.
So maybe the weekly reports function in healthchecks gets confused if it fails to send a mail once, and does not try again in the future unless it is restarted.
However, the weekly report should not stop working if it is not possible to send the mail in one week. It should still try again next week. Even better, it should retry the weekly report after 1 or 2 hours if the first attempt fails.
There is always the possibility that a mailserver is down for whatever reason.
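The retry behaviour suggested here could look roughly like the sketch below. It is not how healthchecks sends reports; `send_with_retries`, `send_report` and the delay values are hypothetical names and numbers chosen only for illustration.

```python
import time
from typing import Callable


def send_with_retries(
    send_report: Callable[[], None],
    retry_delays: tuple[float, ...] = (3600.0, 7200.0),
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call send_report(); on a connection error, wait and try again.

    Returns True if an attempt succeeded, False if every attempt failed.
    All names here are hypothetical; this is not healthchecks code.
    """
    # One immediate attempt, then one more attempt after each configured delay.
    for delay in (0.0, *retry_delays):
        if delay:
            sleep(delay)
        try:
            send_report()
            return True
        except OSError:
            # ConnectionRefusedError ([Errno 111]) is a subclass of OSError.
            continue
    return False
```

With the default delays this would retry after 1 hour and then again after a further 2 hours. The caller still has to decide what to do when the function returns False, for example leaving the report due so that it is picked up on the next run.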
As you can see in the log above, a report was sent successfully when I had a nightly downtime on my WAN remote backup site on 2025-08-13.
So this function was not interrupted.
Regards,
Heiko
@cuu508 commented on GitHub (Aug 21, 2025):
As an experiment, I started a healthchecks container like so:
The email port is set to 1234 but there's nothing listening on this port, so sendreports should crash the first time it attempts to send a report.
I then logged into the admin interface and changed my user's next report date to a date in the past to trigger a report. Sure enough, about a minute later, sendreports crashed:
In my experiment, after sendreports crashed, uwsgi restarted it right away.
In your case though, uwsgi logged:
It did not respawn sendreports and so you stopped receiving weekly reports.
So what's different in these two cases? I looked around uwsgi's source code and it looks like uwsgi prints the "daemon .... annihilated" message and does not attempt to respawn if uwsgi master process is either reloading or shutting down (source).
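To make the difference between the two cases concrete, here is a toy model of the behaviour as described above. It is not uwsgi's code (uwsgi is written in C), and the class, attribute and method names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class ToyMaster:
    """Toy stand-in for the uwsgi master process (hypothetical names)."""

    reloading: bool = False
    shutting_down: bool = False

    def on_daemon_exit(self, name: str) -> str:
        """Decide what happens when an attached daemon exits."""
        if self.reloading or self.shutting_down:
            # As described above: the daemon is reported as annihilated
            # and no respawn is attempted.
            return f"{name}: annihilated, not respawned"
        return f"{name}: respawned"
```

In this model, `ToyMaster().on_daemon_exit("sendreports")` corresponds to the experiment (crash followed by a respawn), while `ToyMaster(shutting_down=True).on_daemon_exit("sendreports")` corresponds to the reported case.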
If we look back at your logs there's indeed a line that would suggest a shutdown in process:
This line is printed before the main app has finished initializing. So I'm guessing what is happening is:
sendreports crashes, uwsgi does not respawn it because it is still shutting down
I managed to reproduce this locally too: I started uwsgi from command line and pressed Ctrl+C almost immediately. This produced the log message:
Afterwards I manipulated
sendreports to crash and it was indeed not restarted:
I'm not sure what's the correct solution here. Perhaps uwsgi should be patched to handle cases where it receives SIGINT/SIGTERM while the app is still loading. But I don't have the skills to contribute that patch.
@heikoh81 commented on GitHub (Aug 21, 2025):
Thanks for checking everything.
I don't know what happened on 2025-07-22, maybe I restarted the VM or even the Proxmox Host.
Maybe I just restarted the healthcheck container.
However, I just use portainer to do that, I don't enter the container.
So I did not kill uwsgi manually.
What I remember for sure is that it is not the first time that weekly reports stopped. It has been going on for several months now.
That's why I upgraded healthcheck from 3.7 own build without heartbeat (https://github.com/healthchecks/healthchecks/issues/1071) to the standard 3.10 container.
I have restarted the container, does that mean uwsgi is in a normal state again, or is it still in "shutting down state"?
@cuu508 commented on GitHub (Aug 21, 2025):
If you don't see "SIGINT/SIGTERM received...killing workers..." in logs since the start then it is in normal state.
@heikoh81 commented on GitHub (Aug 21, 2025):
Checked the log again.
No SIGINT/SIGTERM since 2025-07-22.
I will leave this issue open for two more weeks,
and see if I get a weekly report on the next 2 mondays.
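The log check described above (looking for the SIGINT/SIGTERM line since the last start) could be scripted with a small helper like the following. This is only a sketch: the marker string is the message as quoted in this thread, and `shutdown_seen` is a hypothetical name.

```python
# Message text as quoted earlier in this thread.
SHUTDOWN_MARKER = "SIGINT/SIGTERM received...killing workers..."


def shutdown_seen(log_lines):
    """Return True if any log line contains the shutdown marker."""
    return any(SHUTDOWN_MARKER in line for line in log_lines)
```

The lines could come from saved container output, for example an open file that holds the result of `docker logs` for the healthchecks container. A True result would suggest that uwsgi is in the "shutting down" state and the container should be restarted.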
@cuu508 commented on GitHub (Sep 25, 2025):
There's another clue in your log output, before this line:
There is this line:
So I think the more complete story is:
manage.py migrate hook failed because it could not connect to the database (perhaps the database was starting up simultaneously and was not yet ready?)
sendalerts and sendreports processes still started up
sendreports crashed for unrelated reason, but was not restarted because uwsgi was technically in a "shutting down" state
I filed a bug for uWSGI here: https://github.com/unbit/uwsgi/issues/2741
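Until that is resolved, one possible workaround for the first step of the story (the database not being ready yet) would be to wait for the database port before starting uwsgi. The sketch below is an assumption, not something the healthchecks image does; `wait_for_port` is a hypothetical helper, and the host name and port in the usage note are examples.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Open and immediately close a connection; success means something is listening.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For a MariaDB container reachable under the (hypothetical) host name `db`, this could be called as `wait_for_port("db", 3306)` before uwsgi is launched. Note that an open port only shows that something is listening; the database may still need a moment before it accepts queries.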