[GH-ISSUE #1732] Use a single request for updating metadata and renaming objects #891

Open
opened 2026-03-04 01:49:42 +03:00 by kerem · 4 comments
Owner

Originally created by @CarstenGrohmann on GitHub (Jul 30, 2021).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/1732

Currently, s3fs uses multipart requests with a single part to update metadata or rename medium-sized objects (< 5 GB).

By switching from multipart requests to a single PUT request, s3fs can speed up these operations.

One possible change would be to modify put_headers() so that s3fscurl.PutHeadRequest() is executed most of the time.

Current code:

https://github.com/s3fs-fuse/s3fs-fuse/blob/e1f3b9d8c17b191bce0db462fd5aaedad987e04a/src/s3fs.cpp#L760-L768

Suggested change (untested): use a single PUT request for all objects < 5 GB

diff --git a/src/s3fs.cpp b/src/s3fs.cpp
index 27419e8..83d6f47 100644
--- a/src/s3fs.cpp
+++ b/src/s3fs.cpp
@@ -757,7 +757,7 @@ int put_headers(const char* path, headers_t& meta, bool is_copy, bool use_st_siz
         size = get_size(meta);
     }
 
-    if(!nocopyapi && !nomultipart && size >= multipart_threshold){
+    if(!nocopyapi && !nomultipart && size >= FIVE_GB){
         if(0 != (result = s3fscurl.MultipartHeadRequest(path, size, meta, is_copy))){
             return result;
         }

What do you think about such change?


@gaul commented on GitHub (Jul 30, 2021):

Multipart copy should improve performance of large objects compared to single part copy due to parallelism for the server-side copy. Do you have benchmarks which show otherwise? My previous results showed a 5x speedup which is why I added this optimization in #1562.


@CarstenGrohmann commented on GitHub (Jul 31, 2021):

Your explanation sounds logical. At the moment I don't have any benchmarks, but I'll create some over the next few days.

Below is an excerpt from my log. It shows that the 360 MB file is modified with a single-part multipart upload. As a result, there can be no server-side parallelization.

Since the 360 MB file is larger than multipart_size, this may still be a bug: the metadata update should consist of two parts.

# ./s3fs -o url=http://mys3service:8080,use_path_request_style,notsup_compat_dir,enable_noobj_cache,readwrite_timeout=210,multipart_size=250,parallel_count=20,curldbg,dbglevel=debug -d -d -f mybucket /mys3
# ll rhel-server-7.0-x86_64-boot.iso
-rw-r--r-- 1 root root 360710144 Jul 24  2014 rhel-server-7.0-x86_64-boot.iso

# egrep ' (utimens|rename) |(CURL DBG).+>.+(POST|PUT|copy-source)' s3fs.log | grep -v Authorization 
 22498 utimens /.rhel-server-7.0-x86_64-boot.iso.IyYIH8 1627536493.489913004 1406196627.000000000
 22562 2021-07-29T19:28:13.492Z [CURL DBG] > POST /mybucket/.rhel-server-7.0-x86_64-boot.iso.IyYIH8?uploads HTTP/1.1
 22603 2021-07-29T19:28:13.539Z [CURL DBG] > PUT /mybucket/.rhel-server-7.0-x86_64-boot.iso.IyYIH8?partNumber=1&uploadId=in19N-PHeulINv6PNQ1ZRkd8hMSp710I0-ls8n5QFlGk8euQ_kIBtb8-Hg HTTP/1.1
 22610 2021-07-29T19:28:13.540Z [CURL DBG] > x-amz-copy-source: /mybucket/.rhel-server-7.0-x86_64-boot.iso.IyYIH8
 22611 2021-07-29T19:28:13.540Z [CURL DBG] > x-amz-copy-source-range: bytes=0-360710143
 22639 2021-07-29T19:28:16.476Z [CURL DBG] > POST /mybucket/.rhel-server-7.0-x86_64-boot.iso.IyYIH8?uploadId=in19N-PHeulINv6PNQ1ZRkd8hMSp710I0-ls8n5QFlGk8euQ_kIBtb8-Hg HTTP/1.1
 22761 2021-07-29T19:28:16.555Z [CURL DBG] > POST /mybucket/.rhel-server-7.0-x86_64-boot.iso.IyYIH8?uploads HTTP/1.1
 22802 2021-07-29T19:28:16.558Z [CURL DBG] > PUT /mybucket/.rhel-server-7.0-x86_64-boot.iso.IyYIH8?partNumber=1&uploadId=mT4tR7bdnZoRazdolAFw_QwwqK8dfxvkECUOnvlANWjFJx7qTNS6P6ArKA HTTP/1.1
 22809 2021-07-29T19:28:16.558Z [CURL DBG] > x-amz-copy-source: /mybucket/.rhel-server-7.0-x86_64-boot.iso.IyYIH8
 22810 2021-07-29T19:28:16.558Z [CURL DBG] > x-amz-copy-source-range: bytes=0-360710143
 22839 2021-07-29T19:28:18.906Z [CURL DBG] > POST /mybucket/.rhel-server-7.0-x86_64-boot.iso.IyYIH8?uploadId=mT4tR7bdnZoRazdolAFw_QwwqK8dfxvkECUOnvlANWjFJx7qTNS6P6ArKA HTTP/1.1
 22919 rename /.rhel-server-7.0-x86_64-boot.iso.IyYIH8 /rhel-server-7.0-x86_64-boot.iso
 23031 2021-07-29T19:28:19.237Z [CURL DBG] > POST /mybucket/rhel-server-7.0-x86_64-boot.iso?uploads HTTP/1.1
 23072 2021-07-29T19:28:19.240Z [CURL DBG] > PUT /mybucket/rhel-server-7.0-x86_64-boot.iso?partNumber=1&uploadId=088tz5RTbSWWrRV1RJwNXqmYo5wTsP4aulyANGy4VsO1nZ8fcFB-x_iLxw HTTP/1.1
 23079 2021-07-29T19:28:19.240Z [CURL DBG] > x-amz-copy-source: /mybucket/.rhel-server-7.0-x86_64-boot.iso.IyYIH8
 23080 2021-07-29T19:28:19.240Z [CURL DBG] > x-amz-copy-source-range: bytes=0-360710143
 23108 2021-07-29T19:28:21.848Z [CURL DBG] > POST /mybucket/rhel-server-7.0-x86_64-boot.iso?uploadId=088tz5RTbSWWrRV1RJwNXqmYo5wTsP4aulyANGy4VsO1nZ8fcFB-x_iLxw HTTP/1.1

@gaul commented on GitHub (Jul 31, 2021):

Duplicate of #1556?


@CarstenGrohmann commented on GitHub (Aug 1, 2021):

Maybe, maybe not; I see only a relationship to #1556.

In
https://github.com/s3fs-fuse/s3fs-fuse/blob/e1f3b9d8c17b191bce0db462fd5aaedad987e04a/src/s3fs.cpp#L741

s3fscurl.MultipartHeadRequest() is called if the size is larger than multipart_threshold (default: 25 MB):

https://github.com/s3fs-fuse/s3fs-fuse/blob/e1f3b9d8c17b191bce0db462fd5aaedad987e04a/src/s3fs.cpp#L760-L763

But in MultipartHeadRequest() the parts are created with GetMultipartCopySize() / multipart_copy_size (default 512MB):

https://github.com/s3fs-fuse/s3fs-fuse/blob/e1f3b9d8c17b191bce0db462fd5aaedad987e04a/src/curl.cpp#L3952-L3953

This causes put_headers() to use multipart requests with only one part for files between 25 MB (multipart_threshold) and 512 MB (multipart_copy_size).

Therefore I suggest replacing multipart_threshold with a better-aligned value (e.g. multipart_copy_size?) in:

https://github.com/s3fs-fuse/s3fs-fuse/blob/e1f3b9d8c17b191bce0db462fd5aaedad987e04a/src/s3fs.cpp#L760-L763

This can be done after further testing in #1556.
