mirror of
https://github.com/healthchecks/healthchecks.git
synced 2026-04-25 23:15:49 +03:00
[GH-ISSUE #574] UnicodeDecodeError on ping endpoints with UTF-16LE-encoded payloads #418
Originally created by @marinbernard-pep06 on GitHub (Oct 19, 2021).
Original GitHub issue: https://github.com/healthchecks/healthchecks/issues/574
Hi,
Our Windows hosts use PowerShell's `Invoke-RestMethod` cmdlet to talk to a local HealthChecks instance. We use POST methods to include short text strings as payloads, and it works as long as the string does not include any special (non-latin1) character. When it does, the request fails with HTTP/500 and an exception is raised.
Running:
Returns:
And raises:
It seems HealthChecks fails to parse the payload, because it includes UTF-16LE-encoded characters. Since this is the default encoding on Windows systems, would you agree to make HealthChecks support it? It would allow everyone to rely on native PowerShell commands regardless of payload encoding, instead of explicitly casting every string to UTF-8 to prevent the issue from happening.
Thank you,
@cuu508 commented on GitHub (Oct 19, 2021):
Thanks for the report!
I definitely want Healthchecks to handle this better – it should not return HTTP 500, even if it cannot decode the payload.
I'm not a Windows user, so am not familiar with the encoding gotchas on Windows. I did experiment a bit, some findings:
- … `Content-Type` request header, so we cannot look up the charset there
- … `powershell.exe -File script.ps1` it appears to use semi-broken-UTF8 encoding. I say semi-broken, because I found one character which it does not encode correctly: "ņ".
I think I'll fix Healthchecks to handle UTF-8 decoding better. When it hits a non-UTF-8 sequence, it would replace it with U+FFFD. If we use `ABCDée` as an example, it would produce the following from cmd.exe:
And the following from PowerShell console:
How does that sound?
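A minimal Python sketch of the replacement behaviour proposed above, for illustration only. `decode_payload` is a hypothetical helper name and not necessarily how Healthchecks implements it; the sketch only relies on the standard `errors="replace"` handler of `bytes.decode`.

```python
def decode_payload(body: bytes) -> str:
    """Decode a request body as UTF-8, replacing undecodable bytes with U+FFFD."""
    return body.decode("utf-8", errors="replace")


# The same text as it would arrive under three different sender encodings.
utf8_body = "ABCDée".encode("utf-8")
cp1252_body = "ABCDée".encode("cp1252")  # b"ABCD\xe9e"
cp850_body = "ABCDée".encode("cp850")    # b"ABCD\x82e"

for label, body in [("utf-8", utf8_body), ("cp1252", cp1252_body), ("cp850", cp850_body)]:
    # ascii() makes the replacement character visible as \ufffd.
    print(label, ascii(decode_payload(body)))
```

With this approach the valid UTF-8 body round-trips unchanged, while the single non-UTF-8 byte in the other two bodies becomes U+FFFD and the rest of the text is preserved.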
@marinbernard-pep06 commented on GitHub (Oct 19, 2021):
Yes, that's the legacy console encoding in US versions of Microsoft Windows. On versions distributed in Western Europe, it's `cp850`.
There may exist encoding differences involving Unicode normalization, especially with accented characters. For instance, Python might encode this character as a single code point (i.e. a precomposed accented character), while Windows may rely on character composition and encode it as a pair of code points (1 letter + 1 combining diacritic)... or the other way round.
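The normalization difference can be sketched in Python with the standard `unicodedata` module. This only illustrates the composed and decomposed forms of "é"; which form a given Windows component actually emits is not established in this thread.

```python
import unicodedata

# Precomposed form: one code point, U+00E9.
composed = unicodedata.normalize("NFC", "e\u0301")
# Decomposed form: base letter "e" followed by U+0301 COMBINING ACUTE ACCENT.
decomposed = unicodedata.normalize("NFD", "\u00e9")

composed_bytes = composed.encode("utf-8")
decomposed_bytes = decomposed.encode("utf-8")

print(len(composed), composed_bytes)      # one code point, two UTF-8 bytes
print(len(decomposed), decomposed_bytes)  # two code points, three UTF-8 bytes
```

Both byte sequences are valid UTF-8 and render the same way, but they do not compare equal unless one side is normalized first.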
Well, my bad. The issue title was misleading as UTF-16 was never involved directly. Windows uses UTF-16 internally (and incorrectly calls it Unicode, but that's another story), and so does PowerShell. All strings are represented internally as UTF-16 LE by default. In fact, the real problem is that Django does not support any encoding other than UTF-8 without BOM.
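To make the point concrete, here is a small Python sketch (not Django's or Healthchecks' actual code) that checks which encodings of the same string survive a strict UTF-8 decode. `is_valid_utf8` is a hypothetical helper introduced only for this illustration.

```python
def is_valid_utf8(body: bytes) -> bool:
    """Return True if the bytes decode as UTF-8 without errors."""
    try:
        body.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


text = "ABCDé"
results = {
    enc: is_valid_utf8(text.encode(enc))
    for enc in ("utf-8", "cp1252", "cp850", "utf-16-le")
}
print(results)

# Pure ASCII text encoded as UTF-16 LE still passes a strict UTF-8 decode
# (it just gains NUL characters), so only non-ASCII characters trigger the error.
print(is_valid_utf8("ABCD".encode("utf-16-le")))
```

This matches the observation in the report: pings succeed until the payload contains a character outside ASCII.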
When the `Invoke-RestMethod` cmdlet is invoked manually within a console window, the string passed to the `-Body` parameter is supplied by the console host, which applies the `cp1252` (or `cp850`) encoding to all inputs. As a consequence, the payload is sent encoded with the code page of the console, which drives Django crazy.
Now if you call PowerShell with the `-File` argument, PowerShell will parse the script file by itself, and stick to the encoding of the file. If your script file is encoded with UTF8 (which is the default encoding of the notepad), the PowerShell command will indeed succeed, since the string will be both parsed and sent as UTF-8 in the HTTP payload. But if you encode the same script file in any other format (ANSI, UTF-16, or even UTF-8 with BOM), the string will still be parsed and sent as-is, and the ping will fail with a Django exception.
The only way to prevent this is to explicitly convert the string to UTF-8 before invoking `Invoke-RestMethod`. Then, the following script does work, even if saved with UTF-16 LE encoding:
But using UTF-16 will fail, as Django probably does not support it:
That would be great! We don't really care about text integrity, as long as the ping succeeds!
Thanks, and sorry for the length.
@cuu508 commented on GitHub (Oct 19, 2021):
Thanks for the details, appreciate it!
It would be nice to be able to decode `utf8`, `cp1252` and `cp850` correctly but it could be tricky to detect which encoding to use. A piece of binary data can be decoded as both `cp1252` and `cp850` – and who knows which one was intended:
Just pushed a commit which replaces non-UTF8 characters with U+FFFD.
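The detection problem mentioned above can be sketched in Python: the same bytes decode without error under both code pages but yield different text. `candidate_decodings` is a hypothetical helper used only for this illustration.

```python
def candidate_decodings(body: bytes, encodings=("cp1252", "cp850")) -> dict:
    """Decode the same bytes under each candidate encoding."""
    # errors="replace" covers the few byte values that cp1252 leaves undefined.
    return {enc: body.decode(enc, errors="replace") for enc in encodings}


# 0xE9 is "é" in cp1252 but a different letter in cp850.
from_cp1252_sender = candidate_decodings(b"ABCD\xe9")
# 0x82 is "é" in cp850 but a punctuation mark in cp1252.
from_cp850_sender = candidate_decodings(b"ABCD\x82")

print(from_cp1252_sender)
print(from_cp850_sender)
```

Since both decodings succeed, nothing in the bytes themselves says which code page the sender used, which is why falling back to U+FFFD replacement is the simpler option.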
@marinbernard-pep06 commented on GitHub (Oct 20, 2021):
Thanks a lot!