mirror of
https://github.com/healthchecks/healthchecks.git
synced 2026-04-25 15:05:49 +03:00
[GH-ISSUE #553] Rate limit in the ping API? #400
Originally created by @ben-z on GitHub (Aug 19, 2021).
Original GitHub issue: https://github.com/healthchecks/healthchecks/issues/553
Is there a rate limit for the ping API? I can't seem to find any documentation on this.
My use case is monitoring the health status of a distributed file system (Ceph). I have a cluster of 10 machines, and `ceph health` would return the same up/down status for each machine. To be able to know the health status of the distributed file system independent of whether individual machines are up, I placed a cron job on each machine in the cluster that pings the same healthchecks.io check every minute. I expected to see 10 health pings every minute. However, I'm only seeing 3 pings every minute coming from different machines, which led me to believe that there is a rate limit on pings for each health check.

Alternatively, is there a better way to monitor a distributed service that is fault tolerant to individual machines going down?
Thanks!
@cuu508 commented on GitHub (Aug 19, 2021):
Yes, the hosted service at https://healthchecks.io has rate limiting. You can see this in action if you copy a ping URL and open it in the browser. The response body will say "OK", but if you refresh the page a few times, you will start seeing "OK (rate-limited)".
Currently a single ping URL will accept up to 10 requests per minute. Above that, you may or may not get rate limited.
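A client can check for this condition itself by inspecting the response body. The following is only a minimal sketch, assuming the body contains the text "rate-limited" exactly as described above; the function names and the marker constant are made up for illustration and are not part of any Healthchecks client library.

```python
import urllib.request

# Assumption: a rate-limited ping is signalled only by this text in the body,
# as in the "OK (rate-limited)" response described above.
RATE_LIMITED_MARKER = "rate-limited"


def is_rate_limited(body: str) -> bool:
    """Return True if the response body indicates the ping was rate limited."""
    return RATE_LIMITED_MARKER in body.lower()


def ping(ping_url: str, timeout: float = 10.0) -> bool:
    """Send one ping; return True if it was accepted without rate limiting."""
    with urllib.request.urlopen(ping_url, timeout=timeout) as resp:
        body = resp.read().decode("utf-8", errors="replace")
    return not is_rate_limited(body)
```

A cron wrapper could log a warning whenever `ping()` returns False, which would make dropped pings visible on the sending side.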
In what cases would you want to receive alerts? Let's say, one machine reports as unhealthy, would you want an alert for that? Or maybe if 50% or more machines are unhealthy?
I would probably want alerts about individual machines even if the cluster as a whole is still OK. I would create a separate check for each machine – this can be automated.
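One way the automation could look is sketched below. It assumes the Management API endpoint `https://healthchecks.io/api/v1/checks/`, an `X-Api-Key` header, and the `name`, `tags`, `timeout`, `grace` and `unique` fields; please verify these against the current API documentation before relying on them. The `ceph-` naming scheme, the tag, the timing values and the helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: Management API endpoint for creating checks on the hosted service.
API_URL = "https://healthchecks.io/api/v1/checks/"


def check_payload(hostname: str) -> dict:
    """Build the request body for one per-machine check (hypothetical naming)."""
    return {
        "name": f"ceph-{hostname}",
        "tags": "ceph",
        "timeout": 60,       # expected period between pings, in seconds
        "grace": 120,        # extra time allowed before alerting, in seconds
        "unique": ["name"],  # assumption: reuse an existing check with this name
    }


def create_check(hostname: str, api_key: str) -> str:
    """Create (or look up) the check for a machine and return its ping URL."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(check_payload(hostname)).encode("utf-8"),
        headers={"X-Api-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["ping_url"]
```

Each machine would then ping the URL returned for its own hostname, so no single ping URL receives more than one request per minute.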
@ben-z commented on GitHub (Aug 19, 2021):
Thanks for the answer! It looks like in addition to the limit of 10 requests per minute, there is a limit of one or two requests per second (which is completely understandable).
The system is set up so that all of the machines report the same up/down status, so we will never run into a situation where a subset of the machines reports healthy and another reports unhealthy. In that case I think (with my limited experience with monitoring) it is okay to use the same check for this service on all of the machines (to reduce the number of checks). Also, we already have a separate check for each individual machine, so I think it would be redundant to have multiple checks for this distributed service.
Thanks for your insights!