[GH-ISSUE #626] Implement check auto-provisioning when pinging with a slug that does not exist #456

Closed
opened 2026-02-25 23:42:31 +03:00 by kerem · 12 comments
Owner

Originally created by @mike503 on GitHub (Mar 29, 2022).
Original GitHub issue: https://github.com/healthchecks/healthchecks/issues/626

I am able to confirm that pre-existing slugs work. However, non-existent ones do not.

Following your example at https://blog.healthchecks.io/2021/09/monitoring-postgresql-with-pgmetrics-and-pgdash/ and using runitor (or even plain curl, without runitor), I get a 404 when supplying a ping key and a slug that does not exist. The ping key/slug URL only works if the check already exists. I have a ping key set in the project. I've tried every combination and it doesn't work.

# curl https://hc-ping.com/my-ping-key/doesnt-exist
not found
# curl https://hc-ping.com/my-ping-key/already-exists
OK

According to your blog post (and runitor) it seems like this should auto-provision the slug "doesnt-exist" for me, which is desired. It doesn't though.

kerem 2026-02-25 23:42:31 +03:00
  • closed this issue
  • added the
    feature
    label
Author
Owner

@cuu508 commented on GitHub (Apr 4, 2022):

> According to your blog post (and runitor) it seems like this should auto-provision the slug "doesnt-exist" for me, which is desired. It doesn't though.

Sorry for the confusion, I should have made it explicit in the blog post that there is no auto-provisioning. You have to create checks either via the web UI or using the management API before pinging a slug URL.

I think auto-provisioning would be a neat feature to have.


@mike503 commented on GitHub (Apr 4, 2022):

Yes it’s a huge desire. It’s actually one I require in a service, and don’t want to have 80+ instances with a wrapper script to issue management API calls before a quick cron hit (one of them fires every minute) - that’s a lot of extra chatter. Cronitor supports this and is really the only reason I’m having to use them over you at this point.


@cuu508 commented on GitHub (Apr 4, 2022):

Thanks, I'm noting the interest.


@cuu508 commented on GitHub (Aug 8, 2022):

I'm looking into this and would like to discuss a few implementation choices.

Security. If we implement auto-provisioning, the Ping Key gains the power to create new checks. Let's say Alice has shared a slug-based ping URL with Bob. Currently Bob can only ping that one check of Alice's (and maybe her other checks too, if he can guess their slugs). After implementing auto-provisioning, Bob can also create new checks in Alice's account, which Alice may not expect. Is this a valid concern? Should auto-provisioning perhaps need to be explicitly turned on in the "Project Settings" page?

My current preference: keep it simple, auto-provisioning is enabled for all projects that have a Ping Key, no explicit toggle in "Project Settings".

What to do when a user tries to use a slug with invalid syntax? A few examples of invalid slugs:

  • foo--bar: the slug should not contain consecutive dashes or underscores
  • -foo: the slug should not have leading or trailing dashes or underscores
  • Foo: the slug should not use uppercase letters

Now, let's say the user tries to auto-provision a check with the slug foo--bar. What are our choices?

  • Accept the slug and use it as is. This will cause problems later when the user edits the check (adds tags, changes schedule) from web UI. Upon saving the check, the slug will get normalized from foo--bar to foo-bar.
  • Validate slug syntax and reject ping requests that use invalid slugs. This is not ideal in terms of developer ergonomics: as a developer, I don't want to care about the specific slug syntax rules. I would expect foo--bar to be usable as-is. I would also be annoyed if slugs with uppercase letters cannot be used. From my point of view, the service is being nitpicky for no good reason.

My current preference: reject invalid slugs with HTTP 400.
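
To illustrate the "reject" option, here is a minimal sketch of a validator for the three example rules above. The function name and the regular expression are assumptions made for this sketch, not Healthchecks code; in particular, it assumes dashes and underscores may only appear singly, between runs of lowercase letters and digits.

```python
import re

# Assumption: a valid slug is one or more runs of lowercase letters/digits,
# joined by a single "-" or "_". This is one reading of the example rules.
PROPOSED_SLUG_RE = re.compile(r"[a-z0-9]+(?:[-_][a-z0-9]+)*")


def is_valid_proposed_slug(slug: str) -> bool:
    """Return True if the slug passes the proposed (strict) syntax rules."""
    return PROPOSED_SLUG_RE.fullmatch(slug) is not None


for candidate in ["foo-bar", "foo--bar", "-foo", "Foo"]:
    verdict = "accept" if is_valid_proposed_slug(candidate) else "reject with HTTP 400"
    print(f"{candidate!r}: {verdict}")
```

Under this reading, only foo-bar is accepted; the other three candidates would get the HTTP 400 response.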

How to handle account limits when provisioning new checks. Let's say my account is already at the check limit, and I try to auto-provision another check. What should happen?

  • Should the HTTP request return a non-200 status code?
  • Should there also be an alert in the web dashboard, or a warning email to the account owner saying "Auto-provisioning for slug such-and-such failed because your account is maxed out"?

My current preference: return HTTP 403, and in the web UI show a warning message that can be dismissed.

Footguns:

  • If I make a mistake in slug generation in my monitoring script, with auto-provisioning active, I can quickly fill my project with garbage checks.
  • If auto-provisioning fails for any reason (invalid slug, account is at check limit), the user may not notice that the new check was not created, and monitoring for their new thing is not active.

My current preference: at some point in the future, implement batch actions so garbage checks can be cleaned up quickly. When auto-provisioning fails due to account limits, show a dismissible warning in the web UI.


@mike503 commented on GitHub (Aug 8, 2022):

Very thorough thoughts, but I think most of that should be the customer's responsibility. If they have ugly slugs, that's on them. If they auto-provision too many, same deal. Maybe just have some sort of threshold for sending email (either a courtesy notice that the number of checks is growing fast, or an account-level threshold with a reminder when they're getting close to it), things like that. Reject it if they are at the maximum, and maybe send an email that says "we just got a request for a new monitor <slug name>, but you're at your limit" with a link to bump it up.

I'd rather have the responsibility to manage that stuff myself than be limited by the platform.


@cuu508 commented on GitHub (Aug 10, 2022):

> If they have ugly slugs, that's on them.

The issue with allowing ugly slugs (-foo, foo--bar, Foo) is unintuitive behaviour when the user later edits the check:

  • the user pings a slug -foo, a check with that slug gets created
  • in web UI, the user edits the check's description and tags. When saving changes, the slug gets normalized to foo
  • the user pings the slug -foo again, another check gets created. The user now has two checks: foo and -foo.
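
The three steps above can be replayed with a toy model. This is only a sketch: the Project class and normalize_slug function are hypothetical stand-ins, and the normalization shown (lowercase, collapse repeated dashes or underscores, trim them from both ends) is an assumption inferred from the examples in this thread, not the actual web UI code.

```python
import re


def normalize_slug(slug: str) -> str:
    """Hypothetical stand-in for the normalization applied when saving in the web UI."""
    collapsed = re.sub(r"([-_])\1+", r"\1", slug.lower())
    return collapsed.strip("-_")


class Project:
    """Toy model of a project: just a list of check slugs."""

    def __init__(self) -> None:
        self.slugs = []

    def ping(self, slug: str) -> None:
        # Auto-provision: accept the slug as-is if no check has it yet.
        if slug not in self.slugs:
            self.slugs.append(slug)

    def save_from_web_ui(self, slug: str) -> None:
        # Saving an edit normalizes the slug of the edited check.
        index = self.slugs.index(slug)
        self.slugs[index] = normalize_slug(slug)


project = Project()
project.ping("-foo")              # creates a check with slug "-foo"
project.save_from_web_ui("-foo")  # the slug becomes "foo"
project.ping("-foo")              # no check has "-foo" any more, so a second one appears
print(project.slugs)              # ['foo', '-foo']
```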

I'm leaning towards rejecting invalid slugs (return HTTP 400 when pinging them). Document the slug syntax rules, and if the user pings an invalid slug and gets an error response, that's on them. Some other options would be:

  • the scenario described above, where we accept invalid slugs but later update them when the user edits the check's tags or description. Silent slug updates may cause user confusion – "where did this other check come from?"
  • accept invalid slugs, and once the slug is set, never allow it to change. This would be inconsistent with the current, documented behavior of the system: slugs are generated from the check's name, so changing the name changes the slug.

@mike503 commented on GitHub (Aug 12, 2022):

that'd be fine, ultimately. simply getting the ability to have slugs created on demand is the biggest thing.


@cuu508 commented on GitHub (Oct 14, 2022):

After doing some more thinking and prototyping, I've decided to pass on this feature, at least for now. It is tempting to have, but also a ton of work and a ton of extra complexity to implement properly. And I'm not willing to bear with a half-assed implementation :-)

A workaround is to use a wrapper script. In a nutshell:

  1. send a ping request
  2. if the ping request returns 404, use the management API to create the check, then send the ping request again

A quick PoC in Python:

import requests

API_KEY = "..."
PING_KEY = "..."
SLUG = "..."

r = requests.get(f"https://hc-ping.com/{PING_KEY}/{SLUG}")
if r.status_code == 404:
    payload = {"api_key": API_KEY, "name": SLUG, "unique": ["name"]}
    requests.post("https://healthchecks.io/api/v1/checks/", json=payload)
    r = requests.get(f"https://hc-ping.com/{PING_KEY}/{SLUG}")

if r.status_code == 200:
    print("ping successful")
else:
    print(f"ping failed, status={r.status_code}")

and as a shell script:

API_KEY=...
PING_KEY=...
SLUG=...
PING_URL="https://hc-ping.com/$PING_KEY/$SLUG"

# Send the ping and capture only the HTTP status code
status=$(curl -s -o /dev/null -w "%{http_code}" "$PING_URL")

if [ "$status" -eq 404 ]; then
    # The check does not exist yet: create it via the management API, then ping again
    PAYLOAD='{"api_key": "'$API_KEY'", "name": "'$SLUG'", "unique": ["name"]}'
    curl -s -o /dev/null -d "$PAYLOAD" https://healthchecks.io/api/v1/checks/
    status=$(curl -s -o /dev/null -w "%{http_code}" "$PING_URL")
fi

if [ "$status" -eq 200 ]; then
    echo "ping successful"
else
    echo "ping failed, status=$status"
fi

@frutik commented on GitHub (Nov 16, 2022):

Thank you for your great work!

Regarding the access management issues preventing auto-provisioning implementation: a big part of the usage is probably a simple single-tenant setup, at least for self-hosted installations. In this setup it would be nice to have not just auto-provisioning on ping, but even ping requests that can modify the check's settings, for example to change the timeout or schedule. That way there is a single place to define and control the check's basic settings.

I am using Django management commands to implement cronjob actions, and this way (with auto-provisioning with ping and settings management), such a command becomes a single point of control for the monitoring.

Something like (huge simplification)

from django.core.management.base import BaseCommand


class CronJobCommand(BaseCommand):
    COMMAND_NAME = None
    HEALTHCHECKS_IO = None  # schedule the check should have

    def handle(self, *args, **options):
        # healthcheck_ping() and log() are helper methods not shown in this simplification
        self.healthcheck_ping(status=None)
        self.log('Starting')
        self.action(*args, **options)
        self.log('Finished')
        self.healthcheck_ping(status=0)


class Command(CronJobCommand):
    COMMAND_NAME = 'shops_import'
    HEALTHCHECKS_IO = "@hourly"

    def action(self, *args, **options):
        # action
        pass

Of course, with your proposed approach, I can handle provisioning by checking for a 404. But I cannot easily change my checks from @hourly to @daily in my code.

I understand that this description only covers my specific scenario, but maybe it can bring some new ideas and improve the chances of this feature being implemented.


@cuu508 commented on GitHub (Jun 22, 2023):

I've now deployed an initial version of check auto-provisioning functionality.

What works:

  • you can set a check's slug, via the web UI and API calls
  • you can ping a nonexistent slug and Healthchecks will create the check

What does not work yet:

  • when pinging, you cannot yet pass schedule, timeout and grace parameters

Slug validation rules: the slug must contain only lowercase letters, digits, hyphens and underscores. But you can use them in any combination, for example "foo--bar" and "--foo" are valid slugs.
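
A minimal sketch of these validation rules as a regular expression. The function name is made up for this example, and rejecting the empty string is an assumption, since the rules above do not mention it.

```python
import re

# Lowercase letters, digits, hyphens and underscores, in any combination.
# Assumption: at least one character is required.
SLUG_RE = re.compile(r"[a-z0-9_-]+")


def is_valid_slug(slug: str) -> bool:
    """Return True if the slug uses only the allowed characters."""
    return SLUG_RE.fullmatch(slug) is not None


for candidate in ["foo--bar", "--foo", "Foo", "foo bar"]:
    print(candidate, is_valid_slug(candidate))
```

With this pattern, "foo--bar" and "--foo" pass, while "Foo" (uppercase letter) and "foo bar" (space) do not.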

Limits:

  • When auto-creating checks, you are allowed to go over your account's check limit. On the hosted service (https://healthchecks.io), to prevent abuse, you are not allowed to go above 2x the limit. For example, on a free account the limit is 20 checks, so you can auto-create up to 40 checks. The 41st ping to a nonexistent slug will return an HTTP 403 response.
  • If your account is over its limits (can happen through check auto-provisioning, or when upgrading to paid, creating a bunch of checks, then downgrading to free), the web UI will show a warning about it. The warning will say "Please upgrade, or reduce the number of checks in your account. Accounts that remain over the limit for more than 30 days are scheduled for deletion." The warning goes away as soon as the account is at or below its check limit.
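
The arithmetic behind these limits, as a small sketch. The function names and the cap_factor parameter are hypothetical; only the 2x factor, the free-account limit of 20 and the "at or below the limit" wording come from the description above.

```python
def can_auto_create(existing_checks: int, check_limit: int, cap_factor: int = 2) -> bool:
    """True if one more check may be auto-created under the hosted-service cap."""
    return existing_checks < cap_factor * check_limit


def is_over_limit(existing_checks: int, check_limit: int) -> bool:
    """True if the account is above its check limit, i.e. the warning applies."""
    return existing_checks > check_limit


FREE_LIMIT = 20  # free-account limit quoted above

print(can_auto_create(39, FREE_LIMIT))  # True: the 40th check is still allowed
print(can_auto_create(40, FREE_LIMIT))  # False: the 41st would get HTTP 403
print(is_over_limit(20, FREE_LIMIT))    # False: at the limit, no warning
print(is_over_limit(21, FREE_LIMIT))    # True: over the limit, warning shown
```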

@bdd commented on GitHub (Jun 26, 2023):

Following Pēteris's contribution (https://github.com/bdd/runitor/pull/118), I made a prerelease (https://github.com/bdd/runitor/releases/tag/v1.3.0-beta.1) of runitor that supports the 201 response code, so it doesn't think the ping failed. If you use runitor, I'd appreciate testing reports for v1.3.0-beta.1. If no issues are found, I intend to cut v1.3.0 on Wednesday (PDT morning, UTC early evening).
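
For other ping clients, the same adjustment might look like the sketch below. It is not runitor's code: the names are invented here, and the meaning of each status (200 for a ping to an existing check, 201 for a ping that also created the check) is an assumption based on the comment above.

```python
# Assumption: 200 = ping recorded for an existing check,
# 201 = ping recorded and the check was auto-created.
SUCCESS_STATUSES = frozenset({200, 201})


def ping_succeeded(status_code: int) -> bool:
    """Treat both 200 and 201 as a successful ping."""
    return status_code in SUCCESS_STATUSES


for code in (200, 201, 403, 404):
    print(code, "ok" if ping_succeeded(code) else "failed")
```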


@cuu508 commented on GitHub (Jul 6, 2023):

Check auto-provisioning announced in blog here: https://blog.healthchecks.io/2023/07/new-feature-check-auto-provisioning/
