mirror of
https://github.com/s3fs-fuse/s3fs-fuse.git
synced 2026-04-25 13:26:00 +03:00
[GH-ISSUE #1941] multipart upload switches to 5MB part size #981
Originally created by @sbrudenell on GitHub (May 1, 2022).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/1941
Version of s3fs being used (`s3fs --version`)
Version of fuse being used (`pkg-config --modversion fuse`, `rpm -qi fuse`, `dpkg -s fuse`)
Kernel information (`uname -r`)
GNU/Linux Distribution, if applicable (`cat /etc/os-release`)
Using an `alpine:edge` docker container.

s3fs command line used, if applicable

My `/tmp/` is a 1gb tmpfs:

Command I used to write a large file:
Logs illustrating issue
Details about issue
When writing a large file, s3fs will initially use parts of size `multipart_size`, but then will start uploading parts of size 5mb. (See the `size=` arguments in the logs above.)

It looks like the uploaded data is correct, but I quickly hit the 10000 part limit due to this issue. I am unable to write any files larger than ~55gb.
I have complete logs. I can post them somewhere secure. I don't want to post them in the issue because it's hard to scan them to make sure I'm not publishing my own secrets.
Tests I've done:

- `-o nomixupload`. If I do not use `-o nomixupload`, I see this issue combined with #1936 (large uploads will start with large parts, then use 5mb parts, then "restart" with a new `upload_id` once the upload reaches 5gb, and again at 10gb, 15gb, etc).
- Whether `-o multipart_size` is an even divisor of the free cache space, or not. (This is why I used `-o multipart_size=201` above.)

It seems like something is forcing the use of `MIN_MULTIPART_SIZE`, which only occurs in a few places in the code. I speculate that this is due to:

- github.com/s3fs-fuse/s3fs-fuse@a30beded1c/src/fdcache_entity.cpp (L1958)
- github.com/s3fs-fuse/s3fs-fuse@a30beded1c/src/fdcache_entity.cpp (L2037)
- github.com/s3fs-fuse/s3fs-fuse@a30beded1c/src/fdcache_fdinfo.h (L69)

`MIN_MULTIPART_SIZE` is being used as a default value here. I'm not sure how s3fs's cache architecture works (if there's good documentation, I've missed it), but this is the best candidate I could find for where this value could even be used.