[GH-ISSUE #2808] urlDecode() silently produces NUL bytes on malformed input #1304

New issue

Open

opened 2026-03-04 01:52:58 +03:00 by kerem · 2 comments

kerem commented

2026-03-04 01:52:58 +03:00

Owner

Originally created by @CarstenGrohmann on GitHub (Feb 23, 2026).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/2808

urlDecode() converts invalid %XX sequences (e.g. %GG) into NUL bytes instead of rejecting or preserving them. A truncated % at end of string is silently dropped. Malformed input can originate from buggy S3 clients, tampered object metadata, or an untrusted S3 backend.

Root Cause:
The hex-decoding ternary in src/string_util.cpp:206,211 falls through to 0x00 for non-hex characters. A truncated % at end of string causes a silent break, dropping the incomplete sequence.

github.com/s3fs-fuse/s3fs-fuse@58b3dd6b16/src/string_util.cpp (L195-L216)

Reproduction:

urlDecode("%GG");  // returns "\x00", expected: error or literal "%GG"
urlDecode("a%2");  // returns "a", silently drops "%2"
urlDecode("a%");   // returns "a", silently drops "%"

Proposed Fix:

Two options — feedback welcome:

Replace with curl_easy_unescape() — libcurl is already a required dependency and passes invalid sequences through literally. Needs a CURL* handle from the existing.
Fix urlDecode() in place — add proper error signaling (bool return) and pass invalid %XX through literally. No handle needed, but keeps custom code to maintain.

Which approach would you prefer?

Originally created by @CarstenGrohmann on GitHub (Feb 23, 2026). Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/2808 `urlDecode()` converts invalid `%XX` sequences (e.g. `%GG`) into NUL bytes instead of rejecting or preserving them. A truncated `%` at end of string is silently dropped. Malformed input can originate from buggy S3 clients, tampered object metadata, or an untrusted S3 backend. **Root Cause:** The hex-decoding ternary in `src/string_util.cpp:206,211` falls through to `0x00` for non-hex characters. A truncated `%` at end of string causes a silent `break`, dropping the incomplete sequence. https://github.com/s3fs-fuse/s3fs-fuse/blob/58b3dd6b16731303e18cad29dd36138f68e44867/src/string_util.cpp#L195-L216 **Reproduction:** ``` urlDecode("%GG"); // returns "\x00", expected: error or literal "%GG" urlDecode("a%2"); // returns "a", silently drops "%2" urlDecode("a%"); // returns "a", silently drops "%" ``` **Proposed Fix:** Two options — feedback welcome: 1. Replace with `curl_easy_unescape()` — libcurl is already a required dependency and passes invalid sequences through literally. Needs a CURL* handle from the existing. 2. Fix `urlDecode()` in place — add proper error signaling (bool return) and pass invalid `%XX` through literally. No handle needed, but keeps custom code to maintain. Which approach would you prefer?

kerem commented

2026-03-04 01:52:59 +03:00

Author

Owner

@gaul commented on GitHub (Feb 25, 2026):

This is a general issue of early s3fs code not having an error path. Signalling an error via a flag is probably the best we can do given the C++14 dependency.

I asked this is another PR but is this issue LLM-generated? I am asking due to the style and low value of the issue. While it is not wrong, these nits are not a top priority given the limited reviewer capacity. So if you are machine-generating these please use your best human judgement.

@gaul commented on GitHub (Feb 25, 2026): This is a general issue of early s3fs code not having an error path. Signalling an error via a flag is probably the best we can do given the C++14 dependency. I asked this is another PR but is this issue LLM-generated? I am asking due to the style and low value of the issue. While it is not wrong, these nits are not a top priority given the limited reviewer capacity. So if you are machine-generating these please use your best human judgement.

kerem commented

2026-03-04 01:52:59 +03:00

Author

Owner

@CarstenGrohmann commented on GitHub (Feb 25, 2026):

Sorry for flooding you with nits. This issue and the remaining open ones all came from an AI-assisted audit with a broad prompt. They are in the same category — no harm in a trusted environment, but worth fixing for less trusted setups like third-party S3 backends.

How about I space out my issues and PRs, improve their focus, and keep it to roughly once a month?

@CarstenGrohmann commented on GitHub (Feb 25, 2026): Sorry for flooding you with nits. This issue and the remaining open ones all came from an AI-assisted audit with a broad prompt. They are in the same category — no harm in a trusted environment, but worth fixing for less trusted setups like third-party S3 backends. How about I space out my issues and PRs, improve their focus, and keep it to roughly once a month?