mirror of
https://github.com/s3fs-fuse/s3fs-fuse.git
synced 2026-04-25 13:26:00 +03:00
[GH-ISSUE #1732] Use a single request for updating metadata and renaming objects #891
Originally created by @CarstenGrohmann on GitHub (Jul 30, 2021).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/1732
Currently, s3fs uses multipart requests with a single part to update metadata or rename medium-sized objects (< 5GB).
By switching from multipart requests to a single PUT request, s3fs could speed up such operations.
One possible change would be to modify `put_headers()` so that `s3fscurl.PutHeadRequest()` is executed most of the time.

Current code: github.com/s3fs-fuse/s3fs-fuse@e1f3b9d8c1/src/s3fs.cpp (L760-L768)

Suggested change (untested): use a single PUT request for all objects < 5GB.
What do you think about such a change?
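The proposed dispatch could be sketched as follows. This is a minimal, untested illustration, not the actual s3fs code: the function name `choose_copy_method` and the enum are hypothetical; only the 5GB limit (the maximum object size S3 accepts for a single server-side copy) comes from the issue text.

```cpp
#include <cassert>
#include <cstdint>

// S3 accepts a single server-side copy (x-amz-copy-source) up to 5 GB.
constexpr int64_t SINGLE_COPY_LIMIT = 5LL * 1024 * 1024 * 1024;

enum class CopyMethod { SinglePut, MultipartCopy };

// Hypothetical sketch of the suggestion: fall back to a multipart copy
// only above the 5 GB single-copy limit, instead of above
// multipart_threshold as the current code does.
CopyMethod choose_copy_method(int64_t object_size) {
    return (object_size < SINGLE_COPY_LIMIT) ? CopyMethod::SinglePut
                                             : CopyMethod::MultipartCopy;
}
```

Under this rule, a 360MB metadata update would be a single PUT copy rather than a one-part multipart copy.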
@gaul commented on GitHub (Jul 30, 2021):
Multipart copy should improve performance of large objects compared to single part copy due to parallelism for the server-side copy. Do you have benchmarks which show otherwise? My previous results showed a 5x speedup which is why I added this optimization in #1562.
@CarstenGrohmann commented on GitHub (Jul 31, 2021):
Your explanation sounds logical. At the moment, I don't have any benchmark but I'll create one during the next few days.
Below is an excerpt from my log. It shows that a 360MB file is modified as a one-part multipart upload. As a result, there can be no server-side parallelization.
Since the file size of 360MB is larger than `multipart_size`, this may still be a bug, since the metadata update should consist of two parts.
@gaul commented on GitHub (Jul 31, 2021):
Duplicate of #1556?
@CarstenGrohmann commented on GitHub (Aug 1, 2021):
Maybe, maybe not - I see only a relationship with #1556.
In github.com/s3fs-fuse/s3fs-fuse@e1f3b9d8c1/src/s3fs.cpp (L741), `s3fscurl.MultipartHeadRequest()` is called if the size is larger than `multipart_threshold` (default: 25MB):

github.com/s3fs-fuse/s3fs-fuse@e1f3b9d8c1/src/s3fs.cpp (L760-L763)

But in `MultipartHeadRequest()`, the parts are created with `GetMultipartCopySize()` / `multipart_copy_size` (default: 512MB):

github.com/s3fs-fuse/s3fs-fuse@e1f3b9d8c1/src/curl.cpp (L3952-L3953)

As a result, `put_headers()` uses multipart requests with only one part for files between 25MB (`multipart_threshold`) and 512MB (`multipart_copy_size`).

Therefore, I suggest replacing `multipart_threshold` with a better-aligned value (e.g. `multipart_copy_size`?) in:

github.com/s3fs-fuse/s3fs-fuse@e1f3b9d8c1/src/s3fs.cpp (L760-L763)

This can be done after further testing in #1556.
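The mismatch described above can be shown with a small sketch. This is not the real s3fs code; the helper names are hypothetical, and only the two default values (25MB and 512MB) are taken from the comment.

```cpp
#include <cassert>
#include <cstdint>

// Defaults as stated in the comment above (illustrative, not real s3fs API).
constexpr int64_t MULTIPART_THRESHOLD = 25LL * 1024 * 1024;   // 25 MB
constexpr int64_t MULTIPART_COPY_SIZE = 512LL * 1024 * 1024;  // 512 MB

// put_headers() enters the multipart path above multipart_threshold...
bool uses_multipart_path(int64_t size) {
    return size > MULTIPART_THRESHOLD;
}

// ...but MultipartHeadRequest() splits the copy by multipart_copy_size,
// so the resulting number of parts is (rounding up):
int64_t copy_part_count(int64_t size) {
    return (size + MULTIPART_COPY_SIZE - 1) / MULTIPART_COPY_SIZE;
}
```

For a 360MB file, `uses_multipart_path` returns true while `copy_part_count` returns 1: a one-part multipart copy with no parallelism, which is exactly the window between 25MB and 512MB where the single-PUT path would be preferable.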