mirror of
https://github.com/healthchecks/healthchecks.git
synced 2026-04-25 15:05:49 +03:00
[GH-ISSUE #1023] Ping logs performance issue #710
Originally created by @Athorcis on GitHub (Jul 5, 2024).
Original GitHub issue: https://github.com/healthchecks/healthchecks/issues/1023
Hi, I'm using the self-hosted version of healthchecks (and this is great work).
I increased the ping log limit to 40,000. Now, all the checks with 40,000 pings are getting slow or won't load (I'm getting harakiris in the container logs). The problematic requests are last_ping and the event lists.
Is it supposed to happen? Is there a way to fix it without decreasing the ping log limit?
@cuu508 commented on GitHub (Jul 5, 2024):
Thanks for the report.
Can you post specific URLs that take long to load, and perhaps a screenshot from browser's developer tools with timings?
What database are you using, and what hardware are you running on?
@Athorcis commented on GitHub (Jul 10, 2024):
I use MySQL, and the server hardware is Intel i7-7700K - 4c/8t - 4.2 GHz/4.5 GHz, 32 GB RAM, 450 GB SSD.
I succeeded in reducing the duration of some requests by adding indexes to some tables:
- I added a compound index on api_ping.owner_id and api_ping.created; it improved the performance of the /checks/{check_id}/last_ping/ route
- I added a compound index on api_ping.kind, api_ping.n, api_ping.created; it improved the performance of /checks/{check_id}/status/ and /checks/{check_id}/log_events/
But I still see issues specifically on /checks/{check_id}/log_events/?fail=on
The request gets canceled because it takes too long. If I resend it, I get a 502 (because of harakiris).
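The effect of the first compound index can be sketched with a small, self-contained script. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 as a stand-in for MySQL, a simplified api_ping table, a made-up index name (ping_owner_created), and a guessed "latest ping for one check" query, not the query Healthchecks actually issues.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the MySQL server in this thread.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE api_ping ("
    " id INTEGER PRIMARY KEY, owner_id INTEGER, n INTEGER,"
    " kind TEXT, created TEXT, body BLOB)"
)
rows = [(i % 10, i, "2024-07-10T00:00:%02d" % (i % 60)) for i in range(1000)]
conn.executemany(
    "INSERT INTO api_ping (owner_id, n, created) VALUES (?, ?, ?)", rows
)

# Hypothetical "last ping of one check" query (an assumption, not the real one).
QUERY = (
    "SELECT id, created FROM api_ping "
    "WHERE owner_id = ? ORDER BY created DESC LIMIT 1"
)


def plan(sql, params):
    """Return SQLite's query plan for sql as a single string."""
    cursor = conn.execute("EXPLAIN QUERY PLAN " + sql, params)
    return " ".join(row[3] for row in cursor)


plan_before = plan(QUERY, (3,))
# Compound index on (owner_id, created), mirroring the first index above.
conn.execute("CREATE INDEX ping_owner_created ON api_ping (owner_id, created)")
plan_after = plan(QUERY, (3,))

print("before:", plan_before)
print("after: ", plan_after)
```

On MySQL, the equivalent check would be to run EXPLAIN on the slow query before and after creating the index.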

@cuu508 commented on GitHub (Jul 10, 2024):
Thanks for the details.
I haven't managed to reproduce the issue so far. My setup:
- ./manage.py runserver to run the webserver
- /checks/fa4d810e-940f-414a-8b4d-3ee62254a056/log_events/?fail=on&start=on&log=on&ign=on&flip=on takes ~600ms
Would you be able to install django-debug-toolbar and see in which queries the time is spent?
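For reference, a minimal sketch of the settings that django-debug-toolbar usually needs. The two placeholder lists stand in for the project's existing settings, and where these lines belong in a Healthchecks deployment (for example a local settings override) is an assumption.

```python
# Placeholders standing in for the project's existing Django settings.
INSTALLED_APPS = ["django.contrib.staticfiles"]
MIDDLEWARE = []

# The toolbar is only shown when DEBUG is on and the client IP is listed
# in INTERNAL_IPS (this is the toolbar's default behavior).
DEBUG = True
INTERNAL_IPS = ["127.0.0.1"]

# Register the app and put its middleware early in the middleware stack.
INSTALLED_APPS.append("debug_toolbar")
MIDDLEWARE.insert(0, "debug_toolbar.middleware.DebugToolbarMiddleware")

# The URL configuration also needs the toolbar's routes, typically:
#   path("__debug__/", include("debug_toolbar.urls"))
```

When the app runs inside a container, the client address seen by Django may not be 127.0.0.1, so INTERNAL_IPS may need adjusting.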
@cuu508 commented on GitHub (Jul 10, 2024):
For pinging, do you use HTTP POST with request body? If yes,
@cuu508 commented on GitHub (Jul 10, 2024):
An unrelated thing I noticed though, when spamming lots of ping requests simultaneously, I fairly regularly see requests failing with:
@cuu508 commented on GitHub (Jul 10, 2024):
Update –
- api_ping table by this point
- log_events kept slowly creeping up – 700ms, 900ms, above one second
- innodb_buffer_pool_size = 8G)
@Athorcis commented on GitHub (Jul 10, 2024):
It can vary from few 1ko to 50 Mo
100 Mo
No
@Athorcis commented on GitHub (Jul 10, 2024):
Since I use MySQL I did not have lock issues (but before with SQLite it happened)
@Athorcis commented on GitHub (Jul 10, 2024):
My api_ping table currently contains about 5 million rows
@Athorcis commented on GitHub (Jul 10, 2024):
I'll try as soon as I have some time
@cuu508 commented on GitHub (Jul 10, 2024):
50 Mo as in 50 megabytes?
@Athorcis commented on GitHub (Jul 11, 2024):
yes
@cuu508 commented on GitHub (Jul 11, 2024):
That explains it then.
Yes – if you push the limits far enough, you will eventually run into performance problems.
Consider lowering PING_BODY_LIMIT to, say, 1MB.
And additionally consider offloading ping bodies to object storage, see https://healthchecks.io/docs/self_hosted_configuration/#S3_ACCESS_KEY
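A rough back-of-the-envelope calculation shows why the body size matters. It is an upper bound only: it assumes every stored ping of a single check carries a body of the maximum size, which is not the case here since the bodies vary in size.

```python
PING_LOG_LIMIT = 40_000  # pings kept per check, as configured in this thread
MB = 1_000_000           # bytes


def worst_case_bytes(body_limit_bytes):
    """Upper bound on stored body data for one check with a full ping log."""
    return PING_LOG_LIMIT * body_limit_bytes


at_50_mb = worst_case_bytes(50 * MB)
at_1_mb = worst_case_bytes(1 * MB)

print("50 MB bodies: %.1f TB per check" % (at_50_mb / 1e12))
print(" 1 MB bodies: %.1f GB per check" % (at_1_mb / 1e9))
```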
@cuu508 commented on GitHub (Jul 11, 2024):
I implemented an experimental performance optimization: when querying pings for display in the "Log" page, instead of loading entire stored ping bodies, we now load only the initial 150 bytes of each body (we're displaying only ~100 or so characters in the UI, so nothing should change visually for the user).
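The idea can be sketched as follows. This is not the actual commit: it is a self-contained illustration that uses sqlite3 instead of the Django ORM and MySQL, with a simplified table and a hypothetical PREVIEW_BYTES constant.

```python
import sqlite3

PREVIEW_BYTES = 150  # hypothetical name for the preview length

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE api_ping (id INTEGER PRIMARY KEY, owner_id INTEGER, body BLOB)"
)
big_body = b"x" * 1_000_000  # a 1 MB ping body
conn.execute("INSERT INTO api_ping (owner_id, body) VALUES (?, ?)", (1, big_body))

# Ask the database for only the first PREVIEW_BYTES bytes of each body,
# so the full body is never transferred to the application.
row = conn.execute(
    "SELECT id, substr(body, 1, ?) FROM api_ping WHERE owner_id = ?",
    (PREVIEW_BYTES, 1),
).fetchone()
preview = row[1]

print(len(big_body), "bytes stored,", len(preview), "bytes fetched")
```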
@Athorcis if you get a chance, please give this a try, and let me know if the log_events performance is better.
@Athorcis commented on GitHub (Jul 13, 2024):
@cuu508 I applied your commit as a patch, but I didn't see any performance improvement on the request generating the harakiris. On the other hand, I installed django-debug-toolbar (with some difficulties), identified which query was taking the time, and managed to decrease the query time with another new index (owner_id, n DESC, created) so it no longer fails with a 502. Even so, the request still takes too long (6 seconds) and gets aborted by JS.
Do you know if it would be possible to increase the timeout of AJAX requests (or at least make it configurable)?
@cuu508 commented on GitHub (Jul 14, 2024):
The request gets aborted when a new request is about to be run:
github.com/healthchecks/healthchecks@1877a8324f/static/js/log.js (L40)
The refresh runs every 3 seconds, and the interval is specified here:
github.com/healthchecks/healthchecks@1877a8324f/static/js/adaptive-setinterval.js (L13)
You could increase the refresh interval there, but the root problem is still the request taking excessively long.
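The interaction between the refresh interval and a slow request can be illustrated with a small simulation. This is a Python asyncio sketch of the "abort the previous request when the next refresh fires" pattern, not the project's JavaScript. The poll function is hypothetical, and the timings are scaled down from 3 and 6 seconds to 0.05 and 0.1 seconds.

```python
import asyncio


async def poll(request_duration, interval, ticks):
    """Count requests that complete when each refresh aborts the previous one."""
    completed = 0
    tasks = []

    async def fetch():
        nonlocal completed
        await asyncio.sleep(request_duration)  # simulated server response time
        completed += 1

    for _ in range(ticks):
        # A new refresh is about to run: abort the request still in flight.
        if tasks and not tasks[-1].done():
            tasks[-1].cancel()
        tasks.append(asyncio.create_task(fetch()))
        await asyncio.sleep(interval)

    # Clean up whatever is still pending after the last refresh.
    if tasks and not tasks[-1].done():
        tasks[-1].cancel()
    await asyncio.gather(*tasks, return_exceptions=True)
    return completed


# Request slower than the interval (like 6 s against a 3 s refresh).
slow_completed = asyncio.run(poll(request_duration=0.1, interval=0.05, ticks=4))
# Request faster than the interval.
fast_completed = asyncio.run(poll(request_duration=0.01, interval=0.05, ticks=4))

print("slow request completions:", slow_completed)
print("fast request completions:", fast_completed)
```

In this model, a request that takes longer than the refresh interval never completes, which matches the behavior described above.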