[GH-ISSUE #1443] Object names and common prefixes in listings are not URL-decoded #754

Open
opened 2026-03-04 01:48:30 +03:00 by kerem · 2 comments
Owner

Originally created by @init-js on GitHub (Oct 7, 2020).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/1443

Additional Information

The following information is very important in order to help us to help you. Omission of the following details may delay your support request or receive no attention at all.
Keep in mind that the commands we provide to retrieve information are oriented to GNU/Linux distributions, so you may need to use others if you use s3fs on macOS or BSD.

Version of s3fs being used (s3fs --version)

V1.87 (commit: 194262c)

Version of fuse being used (pkg-config --modversion fuse, rpm -qi fuse, dpkg -s fuse)

2.9.9

Kernel information (uname -r)

4.14.177-139.254.amzn2.x86_64

GNU/Linux Distribution, if applicable (cat /etc/os-release)

NAME="Ubuntu"
VERSION="20.04.1 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.1 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

s3fs command line used, if applicable

s3fs i-0d4cc1b0c5e8cfa7a /mnt -d -d -f -o use_cache=/tmp/s3fs-cache -o enable_noobj_cache -o endpoint=us-west-2 -o nomultipart -o use_path_request_style -o iam_role=auto -o url=https://my-entry-point.example.org -o del_cache

/etc/fstab entry, if applicable

s3fs syslog messages (grep s3fs /var/log/syslog, journalctl | grep s3fs, or s3fs outputs)

If you run s3fs with the dbglevel and curldbg options, you can get detailed debug messages.

[INF]       curl.cpp:prepare_url(4831): URL is https://my-entry-point.example.org/i-0d4cc1b0c5e8cfa7a?delimiter=/&max-keys=1000&prefix=linux_src/linux-4.18.20/
[INF]       curl.cpp:prepare_url(4864): URL changed is https://my-entry-point.example.org/i-0d4cc1b0c5e8cfa7a/?delimiter=/&max-keys=1000&prefix=linux_src/linux-4.18.20/
[INF]       curl.cpp:insertV4Headers(2863): computing signature [GET] [/] [delimiter=/&max-keys=1000&prefix=linux_src/linux-4.18.20/] []
[INF]       curl.cpp:url_to_host(99): url is https://my-entry-point.example.org
[INF]       curl.cpp:RequestPerform(2520): HTTP response code 200

The listings have percent-encoded key names, and those don't get decoded:

[INF] s3fs.cpp:s3fs_readdir(2515): [path=/linux_src/linux-4.18.20]
[INF]   s3fs.cpp:list_bucket(2558): [path=/linux_src/linux-4.18.20]
[INF]       curl.cpp:ListBucketRequest(3560): [tpath=/linux_src/linux-4.18.20]
[INF]       curl.cpp:prepare_url(4831): URL is https://my-entry-point.example.org/i-0d4cc1b0c5e8cfa7a?delimiter=/&max-keys=1000&prefix=linux_src/linux-4.18.20/
[INF]       curl.cpp:prepare_url(4864): URL changed is https://my-entry-point.example.org/i-0d4cc1b0c5e8cfa7a/?delimiter=/&max-keys=1000&prefix=linux_src/linux-4.18.20/
[INF]       curl.cpp:insertV4Headers(2863): computing signature [GET] [/] [delimiter=/&max-keys=1000&prefix=linux_src/linux-4.18.20/] []
[INF]       curl.cpp:url_to_host(99): url is https://my-entry-point.example.org
[INF]       curl.cpp:RequestPerform(2520): HTTP response code 200
[INF]   s3fs.cpp:readdir_multi_head(2431): [path=/linux_src/linux-4.18.20/][list=0]
[INF]       curl.cpp:PreHeadRequest(3119): [tpath=/linux_src/linux-4.18.20/linux%5Fsrc%2Flinux%2D4%2E18%2E20%2F][bpath=linux%5Fsrc%2Flinux%2D4%2E18%2E20%2F][save=/linux_src/linux-4.18.20/linux%5Fsrc%2Flinux%2D4%2E18%2E20%2F][sseckeypos=-1]

Details about issue

The object listings are allowed to contain percent-encoded sequences, and those are not decoded by the function: github.com/s3fs-fuse/s3fs-fuse@c7132b7f56/src/s3fs_xml.cpp (L120)

If you have a key called foo>bar in your bucket, and you list it with aws s3api list-objects --bucket my-bucket --prefix foo, you receive <Key>foo%3Ebar</Key> on the wire. That doesn't get decoded by the function above.
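To make the missing step concrete, here is a minimal sketch of a percent-decoder that could be applied to each Key and common-prefix string taken from the listing XML. The names `percent_decode` and `hex_value` are hypothetical and are not existing s3fs functions; where such a call would belong in s3fs_xml.cpp is an assumption on my part.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of s3fs): value of one hex digit, or -1.
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper (not part of s3fs): decode %XX sequences in a key
// name. Malformed sequences are copied through unchanged, and '+' is left
// alone because this is not form decoding.
std::string percent_decode(const std::string& in)
{
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && i + 2 < in.size()) {
            int hi = hex_value(in[i + 1]);
            int lo = hex_value(in[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out.push_back(static_cast<char>((hi << 4) | lo));
                i += 2;  // skip the two hex digits just consumed
                continue;
            }
        }
        out.push_back(in[i]);
    }
    return out;
}
```

With this sketch, `percent_decode("foo%3Ebar")` gives `foo>bar`, and the bpath from the log above, `linux%5Fsrc%2Flinux%2D4%2E18%2E20%2F`, decodes to `linux_src/linux-4.18.20/`. Decoding should only happen when the server actually encoded the keys; otherwise a key that contains a literal `%3E` would be corrupted.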

url is the only encoding type available:
https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObjects.html#API_ListObjects_RequestSyntax

I notice that s3fs doesn't set the encodingType query parameter. Is the default then that S3 uses entity-encoding to encode that stuff?
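For reference, a sketch of what asking for URL encoding explicitly could look like. In the REST request syntax linked above the query parameter is spelled `encoding-type`. The function `build_list_query` is hypothetical, not s3fs code; it only mirrors the query string shown in the debug log, and it assumes the prefix has already been escaped by the caller.

```cpp
#include <string>

// Hypothetical helper (not part of s3fs): build the listing query string
// seen in the log, optionally adding encoding-type=url. Parameters are
// kept in alphabetical order, which is the order the SigV4 canonical
// query string expects.
std::string build_list_query(const std::string& prefix, bool request_url_encoding)
{
    std::string query = "delimiter=/";
    if (request_url_encoding) {
        query += "&encoding-type=url";
    }
    query += "&max-keys=1000&prefix=" + prefix;
    return query;
}
```

If the parameter is sent, the client has to decode the key names and prefixes in the response; if it is not sent, whether keys arrive percent-encoded may depend on the endpoint in use.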

Author
Owner

@gaul commented on GitHub (Oct 8, 2020):

I agree that s3fs should decode these responses. Could you submit a pull request to do so, or try adding a failing test case to test/integration-test-main.sh?

I don't think that encodingType is relevant here. I believe this is only used for strange characters like ASCII 1-10:

https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/services/s3/model/EncodingType.html

Author
Owner

@gaul commented on GitHub (Feb 12, 2022):

@init-js Could you share a specific test case? I experimented with non-ASCII file and directory names and s3fs appears to do the right thing.
