[GH-ISSUE #553] Rate limit in the ping API? #400

Closed

opened 2026-02-25 23:42:19 +03:00 by kerem · 2 comments

kerem commented

2026-02-25 23:42:19 +03:00

Owner

Originally created by @ben-z on GitHub (Aug 19, 2021).
Original GitHub issue: https://github.com/healthchecks/healthchecks/issues/553

Is there a rate limit for the ping API? I can't seem to find any documentation on this.

My use case is monitoring the health status of a distributed file system (Ceph). I have a cluster of 10 machines, and `ceph health` returns the same up/down status on every machine. So that I know the health of the distributed file system regardless of whether individual machines are up, I placed a cron job on each machine in the cluster that pings the same healthchecks.io check every minute. I expected to see 10 pings every minute. However, I'm only seeing 3 pings per minute, coming from different machines, which led me to believe that there is a per-check rate limit on pings.
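A minimal sketch of what each machine's cron job might run, assuming the standard Healthchecks convention that appending `/fail` to a ping URL signals a failure. `PING_URL` and `ping_url_for_status` are hypothetical names, and treating `HEALTH_WARN` the same as `HEALTH_ERR` is an assumption you may want to change.

```python
import subprocess

# Hypothetical ping URL: substitute the URL of your own check.
PING_URL = "https://hc-ping.com/your-uuid-here"


def ping_url_for_status(ceph_output: str, base_url: str = PING_URL) -> str:
    """Choose the success URL or the /fail URL from `ceph health` output.

    Assumption: anything other than HEALTH_OK (HEALTH_WARN, HEALTH_ERR,
    or empty output from a failed command) is reported as a failure.
    """
    words = ceph_output.split(maxsplit=1)
    status = words[0] if words else ""
    if status == "HEALTH_OK":
        return base_url
    return base_url.rstrip("/") + "/fail"


def read_ceph_health() -> str:
    # Run the same command the cron job runs; stdout is empty on failure.
    result = subprocess.run(
        ["ceph", "health"], capture_output=True, text=True, timeout=30
    )
    return result.stdout
```

On a cluster node you would then request `ping_url_for_status(read_ceph_health())`, for example with `urllib.request.urlopen(..., timeout=10)`.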

Alternatively, is there a better way to monitor a distributed service that is fault tolerant to individual machines going down?

Thanks!

kerem closed this issue

2026-02-25 23:42:19 +03:00

kerem commented

2026-02-25 23:42:20 +03:00

Author

Owner

@cuu508 commented on GitHub (Aug 19, 2021):

Yes, the hosted service at https://healthchecks.io has rate limiting. You can see this in action if you copy a ping URL and open it in the browser. The response body will say "OK", but if you refresh the page a few times, you will start seeing "OK (rate-limited)".
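Because the response body distinguishes "OK" from "OK (rate-limited)", a script can detect throttling itself instead of relying on the browser. A small sketch, assuming the body text is exactly as described above; `classify_ping_body` and `send_ping` are illustrative names.

```python
import urllib.request


def classify_ping_body(body: str) -> str:
    """Map the ping endpoint's response body to a short status string."""
    text = body.strip()
    if text == "OK (rate-limited)":
        return "rate-limited"
    if text == "OK":
        return "accepted"
    return "unexpected"


def send_ping(url: str, timeout: float = 10.0) -> str:
    # Plain HTTP GET against the ping URL; the body is the same text
    # the browser shows when the URL is opened manually.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return classify_ping_body(resp.read().decode("utf-8", errors="replace"))
```

Logging the result of `send_ping` on each machine would show which machines' pings were throttled.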

Currently a single ping URL will accept up to 10 requests per minute. Above that, you may or may not get rate limited.
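To illustrate what "up to 10 requests per minute per ping URL" means, here is a generic sliding-window limiter. This is not the healthchecks.io implementation (which isn't described here), just a sketch of the behaviour, with timestamps passed in explicitly so it is easy to reason about.

```python
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per key within any `window` seconds.

    Illustrative only; the hosted service's actual algorithm is unknown.
    """

    def __init__(self, limit: int = 10, window: float = 60.0) -> None:
        self.limit = limit
        self.window = window
        self._hits: dict = {}  # key -> deque of accepted timestamps

    def allow(self, key: str, now: float) -> bool:
        hits = self._hits.setdefault(key, deque())
        # Forget timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) < self.limit:
            hits.append(now)
            return True
        return False
```

With 10 machines each pinging once a minute, a single check sits exactly at this limit, so any extra request (a retry, or two cron runs landing in the same window due to clock skew) can push it over.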

> Alternatively, is there a better way to monitor a distributed service that is fault tolerant to individual machines going down?

In what cases would you want to receive alerts? Let's say, one machine reports as unhealthy, would you want an alert for that? Or maybe if 50% or more machines are unhealthy?

I would probably want alerts about individual machines even if the cluster as a whole is still OK. I would create a separate check for each machine – this can be automated.
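One way to automate the per-machine checks is the Healthchecks Management API. The sketch below only builds the request and does not send it. The endpoint version, the `X-Api-Key` header, and the `name`/`tags`/`timeout`/`grace`/`unique` fields reflect my understanding of that API and should be verified against its documentation; the `ceph-<hostname>` naming scheme is hypothetical.

```python
import json
import urllib.request

# Assumed Management API endpoint; confirm the version your instance supports.
API_URL = "https://healthchecks.io/api/v3/checks/"


def build_create_check_request(api_key: str, hostname: str) -> urllib.request.Request:
    """Build (but do not send) a request that creates one check per machine."""
    payload = {
        "name": f"ceph-{hostname}",  # hypothetical naming scheme
        "tags": "ceph",
        "timeout": 60,   # expect one ping per minute
        "grace": 120,    # tolerate a couple of minutes of delay
        # Reuse an existing check with the same name instead of duplicating it.
        "unique": ["name"],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-Api-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

Sending it with `urllib.request.urlopen(req)` from each machine (for example during provisioning) should return the check's JSON, including the ping URL that machine's cron job should use.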

kerem commented

2026-02-25 23:42:20 +03:00

Author

Owner

@ben-z commented on GitHub (Aug 19, 2021):

Thanks for the answer! It looks like in addition to the 10-requests-per-minute limit, there is also a limit of one or two requests per second (which is completely understandable).
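If there really is a short per-second limit (observed in this thread, not documented), cron jobs that all fire at the top of the minute will collide. A common workaround is to delay each machine's ping by a stable, host-specific offset. A sketch, where `ping_delay_seconds` is a hypothetical helper and hashing the hostname is my choice over a random sleep:

```python
import hashlib


def ping_delay_seconds(hostname: str, spread: int = 50) -> int:
    """Return a deterministic per-host delay in the range [0, spread).

    Hashing the hostname gives each machine a stable slot within the
    minute, so the cron jobs stop hitting the ping URL in the same second.
    """
    digest = hashlib.sha256(hostname.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % spread
```

Each cron job would sleep for `ping_delay_seconds(socket.gethostname())` before pinging. Two hosts can still hash to the same slot, and staggering does not raise the per-minute cap: 10 machines pinging once a minute are still right at 10 requests per minute.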

> In what cases would you want to receive alerts? Let's say, one machine reports as unhealthy, would you want an alert for that? Or maybe if 50% or more machines are unhealthy?

The system is set up so that all of the machines report the same up/down status, so we will never run into a situation where some machines report healthy and others report unhealthy. In that case I think (with my limited experience with monitoring) it is okay to use a single check for this service across all of the machines, which keeps the number of checks down. We also have a separate check for each individual machine, so I think multiple checks for this distributed service would be redundant.

Thanks for your insights!
