[GH-ISSUE #574] UnicodeDecodeError on ping endpoints with UTF-16LE-encoded payloads #418

Closed
opened 2026-02-25 23:42:23 +03:00 by kerem · 4 comments

Originally created by @marinbernard-pep06 on GitHub (Oct 19, 2021).
Original GitHub issue: https://github.com/healthchecks/healthchecks/issues/574

Hi,

Our Windows hosts use PowerShell's Invoke-RestMethod cmdlet to talk to a local Healthchecks instance. We send POST requests with short text strings as payloads. This works as long as the string contains only ASCII characters. When it contains a non-ASCII character, the request fails with HTTP 500 and an exception is raised on the server.

Running:

```powershell
Invoke-RestMethod -Uri <uri> -Method Post -Body "ABCDée"
```

Returns:

```
Invoke-RestMethod :
  Server Error (500)
  Server Error (500)
Au caractère Ligne:1 : 1
+ Invoke-RestMethod -Uri <uri> ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation : (System.Net.HttpWebRequest:HttpWebRequest) [Invoke-RestMethod], WebException
    + FullyQualifiedErrorId : WebCmdletWebResponseException,Microsoft.PowerShell.Commands.InvokeRestMethodCommand
```

And raises:

```
Internal Server Error: /ping/<uuid>

UnicodeDecodeError at /ping/<uuid>
'utf-8' codec can't decode byte 0x82 in position 237: invalid start byte

Request Method: POST
Request URL: https://<host>/ping/<uuid>
Django Version: 3.2.8
Python Executable: /opt/healthchecks/venv/bin/python3
Python Version: 3.8.10
Python Path: ['/opt/healthchecks/healthchecks', '/opt/healthchecks/venv/bin', '/usr/lib/python38.zip', '/usr/lib/python3.8', '/usr/lib/python3.8/lib-dynload', '/opt/healthchecks/venv/lib/python3.8/site-packages']
Server time: Mon, 18 Oct 2021 12:27:31 +0000
Installed Applications:
('hc.accounts',
 'django.contrib.admin',
 'django.contrib.auth',
 'django.contrib.contenttypes',
 'django.contrib.humanize',
 'django.contrib.sessions',
 'django.contrib.messages',
 'django.contrib.staticfiles',
 'compressor',
 'hc.api',
 'hc.front',
 'hc.payments')
Installed Middleware:
('django.middleware.security.SecurityMiddleware',
 'whitenoise.middleware.WhiteNoiseMiddleware',
 'django.contrib.sessions.middleware.SessionMiddleware',
 'django.middleware.common.CommonMiddleware',
 'django.middleware.csrf.CsrfViewMiddleware',
 'django.contrib.auth.middleware.AuthenticationMiddleware',
 'hc.accounts.middleware.CustomHeaderMiddleware',
 'django.contrib.messages.middleware.MessageMiddleware',
 'django.middleware.clickjacking.XFrameOptionsMiddleware',
 'django.middleware.locale.LocaleMiddleware',
 'hc.accounts.middleware.TeamAccessMiddleware')


Traceback (most recent call last):
  File "/opt/healthchecks/venv/lib/python3.8/site-packages/django/core/handlers/exception.py", line 47, in inner
    response = get_response(request)
  File "/opt/healthchecks/venv/lib/python3.8/site-packages/django/core/handlers/base.py", line 181, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/opt/healthchecks/venv/lib/python3.8/site-packages/django/views/decorators/csrf.py", line 54, in wrapped_view
    return view_func(*args, **kwargs)
  File "/opt/healthchecks/venv/lib/python3.8/site-packages/django/views/decorators/cache.py", line 44, in _wrapped_view_func
    response = view_func(request, *args, **kwargs)
  File "/opt/healthchecks/healthchecks/hc/api/views.py", line 50, in ping
    body = request.body.decode()

Exception Type: UnicodeDecodeError at /ping/<uuid>
Exception Value: 'utf-8' codec can't decode byte 0x82 in position 237: invalid start byte
Request information:
USER: AnonymousUser

GET: No GET data

POST: No POST data

FILES: No FILES data

COOKIES: No cookie data

META:
CONTENT_LENGTH = '318'
CONTENT_TYPE = 'text/plain'
HTTP_ACCEPT_ENCODING = 'gzip'
HTTP_CONNECTION = 'close'
HTTP_HOST = '<uuid>'
HTTP_USER_AGENT = 'Mozilla/5.0 (Windows NT; Windows NT 10.0; fr-FR) WindowsPowerShell/5.1.19041.1237'
HTTP_X_FORWARDED_FOR = '10.6.185.217, 10.6.104.217'
HTTP_X_FORWARDED_PROTO = 'https'
PATH_INFO = '/ping/<uuid>'
QUERY_STRING = ''
RAW_URI = '/ping/<uuid>'
REMOTE_ADDR = '127.0.0.1'
REMOTE_PORT = '53224'
REQUEST_METHOD = 'POST'
SCRIPT_NAME = ''
SERVER_NAME = '127.0.0.1'
SERVER_PORT = '8000'
SERVER_PROTOCOL = 'HTTP/1.0'
SERVER_SOFTWARE = 'gunicorn/20.1.0'
gunicorn.socket = <socket.socket fd=10, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.1', 8000), raddr=('127.0.0.1', 53224)>
wsgi.errors = <gunicorn.http.wsgi.WSGIErrorsWrapper object at 0x7f0632eb00d0>
wsgi.file_wrapper = <class 'gunicorn.http.wsgi.FileWrapper'>
wsgi.input = <gunicorn.http.body.Body object at 0x7f0632ea2760>
wsgi.input_terminated = True
wsgi.multiprocess = False
wsgi.multithread = False
wsgi.run_once = False
wsgi.url_scheme = 'https'
wsgi.version = '(1, 0)'

Settings:
<snip>
```

It seems Healthchecks fails to parse the payload because it includes UTF-16LE-encoded characters. Since this is the default encoding on Windows systems, would you agree to make Healthchecks support it? Everyone could then rely on native PowerShell commands regardless of payload encoding, instead of explicitly converting every string to UTF-8 to avoid the issue.

Thank you,

kerem closed this issue 2026-02-25 23:42:23 +03:00

@cuu508 commented on GitHub (Oct 19, 2021):

Thanks for the report!

I definitely want Healthchecks to handle this better – it should not return HTTP 500, even if it cannot decode the payload.

I'm not a Windows user, so I'm not familiar with the encoding gotchas on Windows. I did experiment a bit. Here are some findings:

  • PowerShell does not appear to set the Content-Type request header, so we cannot look up the charset there
  • If I open PowerShell console (blue background) and run Invoke-RestMethod there, it appears to use "cp1252" encoding, at least on the VM I'm testing with (Win 10, Version 2004)
  • If I open cmd.exe and run powershell.exe -File script.ps1 it appears to use semi-broken-UTF8 encoding. I say semi-broken, because I found one character which it does not encode correctly: "ņ".
  • I didn't see UTF-16 output anywhere but perhaps it also depends on system's locale, environment variables or what not...
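The first finding matters because the `charset` parameter of the `Content-Type` header is where a server would normally learn the body's encoding. Below is a minimal sketch of that lookup using only the standard library: `email.message.Message` parses the header and `codecs.lookup` rejects unknown charset names. The helper `charset_from_content_type` is hypothetical and is not Healthchecks code.

```python
import codecs
from email.message import Message
from typing import Optional


def charset_from_content_type(content_type: Optional[str], default: str = "utf-8") -> str:
    """Return the charset parameter of a Content-Type header, or `default`.

    Hypothetical helper for illustration; not part of Healthchecks.
    """
    if not content_type:
        # No header at all, as observed with PowerShell in this thread
        return default
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lowercases the value, or returns `default` if absent
    charset = msg.get_content_charset(default)
    try:
        codecs.lookup(charset)  # ignore charsets Python has no codec for
    except LookupError:
        return default
    return charset


print(charset_from_content_type(None))                         # utf-8
print(charset_from_content_type("text/plain"))                 # utf-8
print(charset_from_content_type("text/plain; charset=CP850"))  # cp850
```

The request in the traceback above has `CONTENT_TYPE = 'text/plain'` with no charset parameter. A lookup like this would fall back to UTF-8 for it anyway.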

I think I'll fix Healthchecks to handle UTF-8 decoding better. When it hits a non-UTF-8 sequence, it would replace it with U+FFFD, the Unicode replacement character. Using ABCDée as an example, it would produce the following from cmd.exe:

[Screenshots: the resulting ping body when the request is sent from cmd.exe]


And the following from the PowerShell console:

[Screenshots: the resulting ping body when the request is sent from the PowerShell console]


How does that sound?


@marinbernard-pep06 commented on GitHub (Oct 19, 2021):

> If I open PowerShell console (blue background) and run Invoke-RestMethod there, it appears to use "cp1252" encoding, at least on the VM I'm testing with (Win 10, Version 2004)

Yes, that's the legacy console encoding in US versions of Microsoft Windows. On versions distributed in Western Europe, it's cp850.

> If I open cmd.exe and run powershell.exe -File script.ps1 it appears to use semi-broken-UTF8 encoding. I say semi-broken, because I found one character which it does not encode correctly: "ņ".

There may be encoding differences involving UTF-8 normalization, especially with accented characters. For instance, Python might represent this character as a single precomposed code point (one accented character), while Windows may rely on character composition and encode it as two code points (a base letter plus a combining diacritic)... or the other way round.
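To make the normalization point concrete, the standard library's `unicodedata.normalize` shows both forms of "ņ". Neither form is a single byte in UTF-8, and both are valid UTF-8. A normalization mismatch alone would therefore change the displayed text rather than raise a `UnicodeDecodeError`.

```python
import unicodedata

ch = "\u0146"  # "ņ", LATIN SMALL LETTER N WITH CEDILLA

nfc = unicodedata.normalize("NFC", ch)  # composed: one code point
nfd = unicodedata.normalize("NFD", ch)  # decomposed: "n" + U+0327 COMBINING CEDILLA

for form, s in (("NFC", nfc), ("NFD", nfd)):
    print(form, [f"U+{ord(c):04X}" for c in s], s.encode("utf-8"))
# NFC ['U+0146'] b'\xc5\x86'
# NFD ['U+006E', 'U+0327'] b'n\xcc\xa7'
```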

> I didn't see UTF-16 output anywhere but perhaps it also depends on system's locale, environment variables or what not...

Well, my bad. The issue title was misleading as UTF-16 was never involved directly. Windows uses UTF-16 internally (and incorrectly calls it Unicode, but that's another story), and so does PowerShell. All strings are represented internally as UTF-16 LE by default. In fact, the real problem is that Django does not support any encoding other than UTF-8 without BOM.

When the Invoke-RestMethod cmdlet is invoked manually within a console window, the string passed to the -Body parameter is supplied by the console host, which applies the cp1252 (or cp850) encoding to all inputs. As a consequence, the payload is sent encoded with the code page of the console, which drives Django crazy.
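This explanation can be checked against the traceback. The sketch below encodes the issue's sample string with each encoding mentioned in the thread, then decodes the bytes the way line 50 of `hc/api/views.py` does (`request.body.decode()`, which defaults to UTF-8). The `utf-8-sig` row (UTF-8 with a BOM) is an extra case added for comparison. The real payload was longer (the error is at position 237), so byte positions differ from the report.

```python
PAYLOAD = "ABCDée"


def utf8_decode_result(data: bytes) -> str:
    """Mimic the failing line in the traceback: bytes.decode() defaults to UTF-8."""
    try:
        data.decode()
        return "ok"
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first byte the UTF-8 decoder rejected
        return f"byte {data[exc.start]:#04x}: {exc.reason}"


# Encodings discussed in this thread, plus UTF-8 with a BOM ("utf-8-sig")
for codec in ("utf-8", "utf-8-sig", "cp1252", "cp850", "utf-16-le"):
    raw = PAYLOAD.encode(codec)
    print(f"{codec:<10} {raw!r:<40} {utf8_decode_result(raw)}")
```

Only the cp850 row reproduces the exact `byte 0x82 ... invalid start byte` message from the report. That fits the console-code-page explanation better than UTF-16 LE, which fails on byte 0xe9 with "invalid continuation byte". A UTF-8 BOM alone does not make `decode()` fail: it survives as a leading U+FEFF character.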

Now if you call PowerShell with the -File argument, PowerShell parses the script file by itself and sticks to the encoding of the file. If your script file is encoded as UTF-8 (Notepad's default encoding), the PowerShell command will indeed succeed, since the string is both parsed and sent as UTF-8 in the HTTP payload. But if you save the same script file in any other format (ANSI, UTF-16, or even UTF-8 with BOM), the string will still be parsed and sent as-is, and the ping will fail with a Django exception.

The only way to prevent this is to explicitly convert the string to UTF-8 before invoking Invoke-RestMethod. Then, the following script does work, even if saved with UTF-16 LE encoding:

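[Screenshot: a PowerShell script that explicitly converts the string to UTF-8 before calling Invoke-RestMethod]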

But using UTF-16 will fail, as Django probably does not support it:

[Screenshot: the same request sent with a UTF-16 payload, which fails]
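The working PowerShell script is only available as a screenshot, so it is not reproduced here. As a rough Python analogue of the same idea (encode the text to UTF-8 yourself before sending it), a sketch with the standard library's `urllib.request` might look like this. The URL is a placeholder, `build_ping` is a hypothetical name, and the `charset=utf-8` header is an optional extra not mentioned in the thread.

```python
import urllib.request

# Placeholder URL: substitute your own instance and check UUID.
PING_URL = "https://healthchecks.example.com/ping/your-check-uuid"


def build_ping(message: str) -> urllib.request.Request:
    """Build a POST ping whose body is explicitly UTF-8 encoded."""
    body = message.encode("utf-8")  # bytes, so no client-side encoding guesswork
    return urllib.request.Request(
        PING_URL,
        data=body,
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )


req = build_ping("ABCDée")
# urllib.request.urlopen(req, timeout=10) would actually send it.
```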

> I think I'll fix Healthchecks to handle UTF-8 decoding better. When it hits non-UTF-8 sequence, it would replace it with U+FFFD. If we use ABCDée as example, it would produce the following from cmd.exe:

That would be great! We don't really care about text integrity, as long as the ping succeeds!

Thanks, and sorry for the length.


@cuu508 commented on GitHub (Oct 19, 2021):

Thanks for the details, appreciate it!

It would be nice to be able to decode utf8, cp1252 and cp850 correctly but it could be tricky to detect which encoding to use. A piece of binary data can be decoded as both cp1252 and cp850 – and who knows which one was intended:

```python
>>> "ABCDée".encode("cp1252").decode("cp850")
'ABCDÚe'
```
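Trying encodings in turn does not solve this either. The hypothetical fallback below tries UTF-8 first, then cp1252. It never raises on the cp850 bytes from this issue, but it silently returns the wrong character, because 0x82 is a valid cp1252 byte (U+201A, a low quotation mark).

```python
def naive_decode(data: bytes) -> str:
    """Hypothetical fallback chain: try UTF-8, then assume cp1252."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # cp1252 leaves a few bytes undefined, so still guard with errors="replace"
        return data.decode("cp1252", errors="replace")


sent_from_cp850_console = "ABCDée".encode("cp850")  # b'ABCD\x82e'
print(naive_decode(sent_from_cp850_console))  # ABCD‚e  (U+201A instead of é)
```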

Just pushed a commit which replaces non-UTF8 characters with U+FFFD.


@marinbernard-pep06 commented on GitHub (Oct 20, 2021):

Thanks a lot!
