starred/healthchecks

mirror of https://github.com/healthchecks/healthchecks.git synced 2026-04-26 07:25:51 +03:00

[GH-ISSUE #549] log entries: keep last n failure entries #398

Closed

opened 2026-02-25 23:42:19 +03:00 by kerem · 6 comments

kerem commented

2026-02-25 23:42:19 +03:00

Owner

Originally created by @lukastribus on GitHub (Aug 6, 2021).
Original GitHub issue: https://github.com/healthchecks/healthchecks/issues/549

Hello,

when the log entries hit the maximum, old messages are removed.

Especially with higher frequency intervals, keeping a few of those "failure" events (which may contain important debug information in the body) would be useful, as opposed to removing log entries solely based on the timestamp. Positive log entries are often only useful for their timestamp.

It so happens that I could have 100 positive log entries but lack the last 2-3 negative log entries with debug information in the body, and I'm really interested in the failures.

I'm not sure how this could be structured clearly without over-complicating the UI. Maybe always keep the last 3 negative entries in the log?

kerem closed this issue and added the feature label on 2026-02-25 23:42:19 +03:00

@cuu508 commented on GitHub (Aug 6, 2021):

Thanks for the suggestion!

I remember Mandrill (transactional email service, now part of Mailchimp) doing something like that. They were keeping a log of the last 100 successful API calls, and a separate log of the last 100 failed API calls. If there are lots of API calls, the successful log may only cover a short time period, but the last 100 failures were still available in the other log. I may have the details wrong, but that was the general idea.

It's possible to do something similar in Healthchecks, but it would complicate bookkeeping and would be a nontrivial change. It could also as much as double the database size. For operational simplicity, I want to keep the database size as low as possible.

If you use Healthchecks.io, you can upgrade to a paid plan for a 1000 log entry limit. If you run a self-hosted instance, you can set any log entry limit.

@lukastribus commented on GitHub (Aug 9, 2021):

Hello,

the goal would certainly be to keep the total database size the same, for example keeping 95 log entries regardless of whether the type was OK or fail, and another 5 entries that failed.

But yeah, I agree that this could make things more complicated.
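
To make the "95 of any kind plus 5 failures" idea concrete, here is a minimal sketch of such a cleanup rule applied to a plain text log. It is only an illustration and not how Healthchecks stores or prunes pings: the `prune_log` function, the one-entry-per-line format, and the `ok`/`fail` prefixes are all assumptions made for this example.

```shell
#!/bin/sh
# prune_log LIMIT RESERVED  (hypothetical helper, not part of Healthchecks)
# Reads log lines on stdin, oldest first, each starting with "ok" or "fail".
# Keeps the newest LIMIT-RESERVED lines of any kind, plus up to RESERVED of
# the newest "fail" lines among the older ones. Prints survivors oldest first.
prune_log() {
    limit=$1
    reserved=$2
    general=$(( limit - reserved ))
    tmp=$(mktemp) || return 1
    cat >"$tmp"
    total=$(wc -l <"$tmp")
    older=$(( total - general ))
    if [ "$older" -gt 0 ]; then
        # Older entries: only failures survive, and only the newest few.
        head -n "$older" "$tmp" | grep '^fail' | tail -n "$reserved"
        # Recent entries: everything survives.
        tail -n "$general" "$tmp"
    else
        # Still under the limit: nothing to prune.
        cat "$tmp"
    fi
    rm -f "$tmp"
}

# Demo with a limit of 4 entries, 1 of them reserved for failures.
printf '%s\n' \
    'fail 09:00 disk full' 'ok 09:01' 'ok 09:02' 'fail 09:03 timeout' \
    'ok 09:04' 'ok 09:05' 'ok 09:06' | prune_log 4 1
# prints:
# fail 09:03 timeout
# ok 09:04
# ok 09:05
# ok 09:06
```

With `prune_log 100 5` the total never exceeds 100 entries, so the storage cost stays the same as a plain "newest 100" rule. The trade-off is that the time span covered by successful pings shrinks slightly.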

@Wouter0100 commented on GitHub (Feb 4, 2022):

I was looking to open a feature request for this as well. We have a cron job that runs every minute, so it is very likely that by the time we start looking into errors, 100 minutes have already passed. Splitting up OK/fail entries like that (90 OK, 10 fail) would work.

Wouldn't it be possible to store it in the same table, but have different cleanup rules?

@lukastribus commented on GitHub (Feb 16, 2022):

Currently, my cron job runs every 10 minutes. I implemented an additional check so that the OK ping to Healthchecks is sent only once an hour, not on every cron job run.

Now I have the problem that when the job fails (sends fail to healthchecks), I don't get an OK on the next successful run, because it only sends an OK every hour.

The cron script would have to keep track of previous failures to be able to handle this correctly. To handle this the right way on the script side, lots of complexity is needed.

@cuu508 commented on GitHub (Feb 16, 2022):

@lukastribus are you using healthchecks.io or self-hosting?

If self-hosting, you can raise the limit of how many log entries are kept (see "Ping log limit" in Django admin → Accounts → Profiles).

On healthchecks.io paid plans the limit is 1000 log entries. If the job runs every 10 minutes, that covers almost a full week (or half that if you also send `/start` events). If #609 works out, I will look into raising the 1000-entry limit for paid plans.
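
The coverage figures above are simple arithmetic: the number of entries kept, divided by the events logged per run, times the interval between runs. A small sketch of that calculation follows; `coverage` is a hypothetical helper written for this example and is not part of Healthchecks.

```shell
#!/bin/sh
# coverage ENTRIES INTERVAL_MIN [EVENTS_PER_RUN]
# Prints how much history a ping log of ENTRIES rows spans when a job
# runs every INTERVAL_MIN minutes and logs EVENTS_PER_RUN events per run.
# Hypothetical helper for illustration; not part of Healthchecks.
coverage() {
    entries=$1
    interval_min=$2
    events_per_run=${3:-1}
    total_min=$(( entries / events_per_run * interval_min ))
    days=$(( total_min / 1440 ))
    hours=$(( total_min % 1440 / 60 ))
    echo "${days}d ${hours}h"
}

coverage 1000 10      # success pings only            -> 6d 22h
coverage 1000 10 2    # /start plus success per run   -> 3d 11h
coverage 100 1        # 100 entries, one run a minute -> 0d 1h
```

The last call corresponds to the every-minute cron job mentioned earlier in the thread, where 100 entries cover well under two hours.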

@lukastribus commented on GitHub (Feb 16, 2022):

I use healthchecks.io for now. I ended up maintaining state locally in case of errors; this adds complexity, but it works.

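# CURLCALL and HEALTHCHECKURL are assumed to be set earlier in the script.
LASTFAILFILE=lastfail

# Report a failure, remember it locally, and stop.
fail () {
 eval "$CURLCALL --data-raw \"$1\" \"$HEALTHCHECKURL/fail\""
 echo >"$LASTFAILFILE"
 exit 1
}

# "test" is a placeholder for the actual job command.
test || fail "error"

# Submit OK right after a recorded failure, otherwise only once an hour
# (the first cron run of the hour, given a 10-minute interval).
if [ -f "$LASTFAILFILE" ]; then
 eval "$CURLCALL \"$HEALTHCHECKURL\""
 rm "$LASTFAILFILE"
elif [ "$(date +%M)" -lt 10 ]; then
 eval "$CURLCALL \"$HEALTHCHECKURL\""
fi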

kerem referenced this issue

2026-02-25 23:44:15 +03:00

[PR #398] [MERGED] Integration for Spike.sh #951