mirror of
https://github.com/healthchecks/healthchecks.git
synced 2026-04-25 15:05:49 +03:00
[GH-ISSUE #39] Concurrent sendalerts #19
Originally created by @cuu508 on GitHub (Jan 31, 2016).
Original GitHub issue: https://github.com/healthchecks/healthchecks/issues/39
I've been thinking about and experimenting with HA quite a bit recently. One of the pieces of the puzzle is being able to run two "sendalerts" worker processes concurrently, so that if one goes away, the other picks up the slack. Obviously they must not send notifications twice, so there needs to be some kind of locking mechanism.
Assume the database (PostgreSQL) is beefy, can take the load, has automatic failover, etc. Assume there's no separate message broker. Here's what I'm considering:
Add two columns to the `api_check` table: `alert_worker` (varchar) and `alert_date` (date). A worker queries the `api_check` table and looks for alerts that need to be sent – same as now. But when it finds one, it:

- updates the `alert_worker` and `alert_date` columns with the worker's name and the current date, and commits
- sends the notifications
- blanks out the `alert_worker` and `alert_date` fields

My thinking is, if the database supports at least the "read committed" isolation level, this should prevent multiple workers from sending duplicate alerts. The `alert_worker` field is effectively an application-level lock, and `alert_date` can be used to calculate its expiry (in case a worker dies before it blanks out the two fields).

Alternatively, with PostgreSQL, advisory locks could be used and would likely perform better. But MySQL would still need a separate mechanism like the above.
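The claim step above can be sketched roughly as follows, using SQLite in place of PostgreSQL/MySQL for brevity. The table and column names (`api_check`, `alert_worker`, `alert_date`) come from the issue; the `claim` helper and everything else is hypothetical:

```python
import sqlite3
from datetime import date

# In-memory stand-in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE api_check (
    id INTEGER PRIMARY KEY,
    status TEXT,
    alert_worker TEXT,
    alert_date TEXT)""")
conn.execute("INSERT INTO api_check (id, status) VALUES (1, 'down')")
conn.commit()

def claim(conn, check_id, worker_name):
    # Atomically take the application-level lock: the UPDATE only
    # matches if no other worker currently holds the row, and
    # rowcount tells us whether we won the claim.
    cur = conn.execute(
        "UPDATE api_check SET alert_worker = ?, alert_date = ? "
        "WHERE id = ? AND alert_worker IS NULL",
        (worker_name, date.today().isoformat(), check_id))
    conn.commit()
    return cur.rowcount == 1

print(claim(conn, 1, "worker-a"))  # → True, first worker wins
print(claim(conn, 1, "worker-b"))  # → False, row is already claimed
```

After sending, the winner would blank out `alert_worker`/`alert_date` in a final UPDATE; a stale `alert_date` lets other workers expire a dead worker's claim.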
@diwu1989 commented on GitHub (Feb 19, 2016):
this is too complicated, please just use Celery with RabbitMQ, spin up multiple workers, and configure RabbitMQ for persistent HA of the message queue
the bottleneck is the interaction with the external services, not the actual DB alert-checking query, so have one scheduled task in Celery that checks and then queues up N alert tasks
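The fan-out pattern suggested here can be sketched with the standard library in place of Celery/RabbitMQ: one producer runs the cheap DB query and enqueues work, while N workers handle the slow external notifications concurrently. All names below are hypothetical:

```python
import queue
import threading

alert_queue = queue.Queue()
sent = []

def find_alerts_to_send():
    # Stands in for the single scheduled "check" task (a cheap DB query).
    return [1, 2, 3]

def worker():
    # Each worker drains the queue; the slow part (external HTTP calls)
    # is simulated by appending to `sent`.
    while True:
        check_id = alert_queue.get()
        if check_id is None:  # sentinel: shut down
            break
        sent.append(check_id)
        alert_queue.task_done()

workers = [threading.Thread(target=worker) for _ in range(4)]
for t in workers:
    t.start()

for check_id in find_alerts_to_send():
    alert_queue.put(check_id)
alert_queue.join()  # wait until every queued alert is handled

for t in workers:
    alert_queue.put(None)
for t in workers:
    t.join()

print(sorted(sent))  # → [1, 2, 3]
```

With Celery the queue and workers would be external processes and the broker (RabbitMQ) would provide the persistence and HA, but the shape of the solution is the same.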
@cuu508 commented on GitHub (Sep 15, 2016):
Now it should be safe to run multiple `manage.py sendalerts` processes at the same time. The important part is this:
In a race, when two processes try to update a check's status to the same value, only one of them will get back "number of rows changed = 1". The race winner then sends out the notifications.
Notifications are sent on a thread, so long-running HTTP requests don't stall the main loop.
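A minimal sketch of this mechanism, again using SQLite for brevity: the conditional UPDATE mirrors Django's `QuerySet.update()`, which returns the number of rows changed, and the winner hands the slow notification work to a background thread. Table and helper names are hypothetical:

```python
import sqlite3
import threading

conn = sqlite3.connect(":memory:", check_same_thread=False)
conn.execute("CREATE TABLE api_check (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO api_check (id, status) VALUES (1, 'up')")
conn.commit()

def update_status(check_id, old, new):
    # Both racing processes run this; the WHERE clause guarantees the
    # row changes at most once, so only one caller sees rowcount == 1.
    cur = conn.execute(
        "UPDATE api_check SET status = ? WHERE id = ? AND status = ?",
        (new, check_id, old))
    conn.commit()
    return cur.rowcount == 1

def send_notifications(check_id):
    print("notifying for check", check_id)  # stands in for slow HTTP calls

# Two "processes" race to flip the same check from up to down.
winner_a = update_status(1, "up", "down")
winner_b = update_status(1, "up", "down")
assert winner_a and not winner_b

if winner_a:
    # The race winner sends notifications on a thread, so a slow
    # webhook cannot stall the polling loop.
    t = threading.Thread(target=send_notifications, args=(1,), daemon=True)
    t.start()
    t.join()
```

The loser simply moves on to the next check, so no lock table or broker is needed for correctness.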