[GH-ISSUE #348] Notification emails: include more details about the check #266
Originally created by @cuu508 on GitHub (Mar 25, 2020).
Original GitHub issue: https://github.com/healthchecks/healthchecks/issues/348
Consider including:
Consider changing the summary table to a table of totals:
This would make the notification emails more search-friendly, and there would be fewer items competing for the recipient's attention.
@cuu508 commented on GitHub (Dec 23, 2020):
I'm now mocking this up, and thinking about moving away from the heavily styled HTML emails.
Current:
Mockup:
I like the minimal styling better because it puts function over form, and avoids various rendering issues in different email clients. It would use the same font face and size as plain text emails. In most cases the information density would be higher.
What do you think? Any and all feedback welcome!
@r4lv commented on GitHub (Feb 9, 2021):
Dear cuu508,
I know I am a bit late for the discussion, but I really preferred the previous alert emails 😋
While this is definitely a matter of taste, let me explain my thoughts. I agree with the need for more information, but I strongly disagree with putting "function over form": ideally, we would have both, which is already the case for all the rest of healthchecks! On the healthchecks website, I really appreciate that I quickly find everything I need, while also being pleasant to the eye. When I first saw the new alert (a few minutes ago), I associated nothing of it with the healthchecks I know. It looked broken to me, so I went through the source and found out it was on purpose 😋
Some ideas:
terse (the current, new template), long (the old template) and short (the old template without the summary, header, but with a bigger UP/DOWN button).
Let me know what you think, and if you would like some help.
And thank you for your amazing work!
@cuu508 commented on GitHub (Mar 8, 2021):
Hello @r4lv, thanks for the feedback – I appreciate it!
I agree that the function and form should be the goal. If/when they are in conflict, there's probably a happy balance somewhere in there.
For some context, here's how the current template came to be: I wanted to add more information in the emails. HTML emails are a PITA – there are relatively few features and techniques that work and look consistent across email clients. Inlining, tables, it's like going back to IE6. To test it in various email clients, I sometimes used Litmus. Making any changes is also a minefield of Gmail potentially deciding that our emails now look too much like some spam pattern and should be marked as spam or suspicious.
I was planning to make extensive changes to the alert template but couldn't afford to spend days if not weeks in the designing - testing - compromising loop. So I went with the practical approach of using only the very basic formatting options.
Two things I like about using email client's default style:
I do want to do more work on this. At the very least, experiment some more with the template to see if there are any "easy wins" to make it look subjectively nicer, while remaining as simple as it is currently. Having multiple, selectable templates is also an interesting idea. One downside with that is the extra maintenance going forward, keeping multiple templates up to date and tested.
@lukastribus commented on GitHub (Mar 8, 2021):
Another minor detail:
Please avoid relative time in emails ("two hours ago", "3 days and 40 minutes ago"). In troubleshooting this is almost always harder to interpret and in emails especially, since the email could already be 3 hours old by the time someone takes a look at it.
The reader needs to focus on troubleshooting the real issue, not engage in arithmetic exercises just to understand what time the events actually occurred.
@cuu508 commented on GitHub (Mar 8, 2021):
The relative times work OK if you read the email soon after receiving it. Let's say the period is 1 day, and it says the last ping was 1 day, 1 hour ago. So you go and think – "OK, the 1 day period passed, the 1 hour grace time passed, and this is why I'm now getting the notification".
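To make the arithmetic in that example concrete, here is a minimal sketch. It assumes a check is considered down once the period plus the grace time have elapsed since the last ping; the function name `alert_time` is hypothetical and not taken from the Healthchecks codebase.

```python
from datetime import datetime, timedelta, timezone


def alert_time(last_ping, period, grace):
    """Return the moment the check would be flagged: last ping + period + grace."""
    return last_ping + period + grace


# Illustrative values only: a daily check with a one-hour grace time.
last_ping = datetime(2021, 3, 7, 9, 0, tzinfo=timezone.utc)
down_at = alert_time(last_ping, timedelta(days=1), timedelta(hours=1))

print(down_at.isoformat())   # 2021-03-08T10:00:00+00:00
print(down_at - last_ping)   # 1 day, 1:00:00
```

A reader who opens the email at `down_at` sees "1 day, 1 hour ago" and the numbers line up; a reader who opens it later has to add the delay back in.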
In the web interface, in the ping log, you can switch between UTC, browser's timezone and server's timezone (if known).
In email messages it is not obvious what timezone to use. Whenever I receive monitoring alerts from systems, I first have to figure out what timezone the sender has probably assumed, and then do the mental math anyway...
There's #365 about letting users specify their preferred timezone – that would help but is not implemented yet.
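One way to sidestep the guessing, sketched here purely as an illustration, is to print absolute timestamps with an explicit UTC offset so the reader never has to infer the sender's timezone. The helper `format_absolute` and the fixed `+02:00` offset are assumptions made for this example, not part of Healthchecks.

```python
from datetime import datetime, timedelta, timezone


def format_absolute(moment, tz=timezone.utc):
    """Render an aware datetime in the given zone with an explicit UTC offset."""
    return moment.astimezone(tz).strftime("%Y-%m-%d %H:%M:%S %z")


last_ping = datetime(2021, 3, 7, 9, 0, tzinfo=timezone.utc)

# Default: UTC, unambiguous for every recipient.
print(format_absolute(last_ping))  # 2021-03-07 09:00:00 +0000

# Hypothetical user preference: a fixed +02:00 offset.
print(format_absolute(last_ping, timezone(timedelta(hours=2))))  # 2021-03-07 11:00:00 +0200
```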
@lukastribus commented on GitHub (Mar 8, 2021):
I was just thinking about how the timestamp in a down message I received the other day could be off by 4 - 5 hours (hourly cronjob), until I realized that the email itself was delayed for 4 - 5 hours, and Gmail doesn't show the Date header of the email in its interface (you'd have to go in "show original" to find the big disparity between when the email was sent and when it was actually received).
Now in this case the root cause was an outage at my SMTP provider, so the email was delayed because of that. However, there is also greylisting, which could delay the email for some time.
Gmail tries very hard to hide the actual Date header (not sure why, not sure about other MUAs). I'd argue that relative time formatting requires the email to arrive at the destination in quasi-realtime to be able to make sense of it, and that is not universally true for SMTP.
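For illustration, the disparity can be measured from the raw message with Python's standard `email` package. This is only a sketch: the sample message, the `delivery_delay` helper and the receive time are made up for the example, and in practice the receive time would have to come from somewhere else, such as the topmost `Received` header.

```python
from datetime import datetime, timezone
from email import message_from_string
from email.utils import parsedate_to_datetime

# Made-up raw message; only the Date header matters here.
RAW = "Date: Mon, 08 Mar 2021 10:00:00 +0000\nSubject: DOWN | backup\n\nbody\n"


def delivery_delay(raw_message, received_at):
    """Return how long after its Date header the message was received."""
    msg = message_from_string(raw_message)
    sent_at = parsedate_to_datetime(msg["Date"])
    return received_at - sent_at


# Assumed receive time, 4.5 hours after the message was sent.
received = datetime(2021, 3, 8, 14, 30, tzinfo=timezone.utc)
delay = delivery_delay(RAW, received)
print(delay)  # 4:30:00
```

With a delay like this, "last ping 1 hour ago" in the body would describe an event that is by then five and a half hours old.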
The assumption being that the end-user is not sure a) what interval/grace period is actually configured or b) whether healthchecks actually works correctly or not.
I think it's more likely that the user is concerned about the actual production service that is not running at this point, and when it actually did run the last time. Relative time formatting makes this harder, in my opinion.
I agree the longer the interval, the less critical this gets. But for jobs with intervals of 60 minutes or less, with just a few minutes of grace period, I think it's more important.
Thanks, I subscribed and commented there.