[GH-ISSUE #1311] s3fs error 400 due to part size exceeding 5GB #702

Closed
opened 2026-03-04 01:48:01 +03:00 by kerem · 10 comments

Originally created by @xrefft on GitHub (Jun 19, 2020).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/1311

To reproduce this problem, you just need:

```shell
touch tmp
truncate -s5368709120 ./tmp
```

or

Just copy a 5G file from local file system to s3fs.

Note: the maximum part size allowed by our s3 backend is 5GB.

kerem 2026-03-04 01:48:01 +03:00
  • closed this issue
  • added the
    bug
    label

@gaul commented on GitHub (Jun 19, 2020):

s3fs supports objects larger than 5 GB. How do you mount s3fs? Also, which S3 server do you use?
Finally, can you run with `s3fs -d -f` and share the debug logs?


@xrefft commented on GitHub (Jun 23, 2020):

s3fs does support 5 GB files. When the file grows gradually through `s3fs_write`, everything is fine, but when it grows all at once through `s3fs_truncate`, the behavior is different.

The error occurred when the `s3fs_truncate` function was called.

```log
[INF] s3fs.cpp:s3fs_getattr(971): [path=/big_file]
[INF] s3fs.cpp:s3fs_truncate(2220): [path=/big_file][size=5368709120]
```

`s3fs_truncate` calls `PageList::Resize`, and `Resize` simply appends one single large page to the page list.

```c++
bool PageList::Resize(off_t size, bool is_loaded, bool is_modified) {
    // ...
    } else if (total < size) {
        // add new area
        fdpage page(total, (size - total), is_loaded, is_modified);
        pages.push_back(page);
    // ...
```

Next, when `RowFlush` is called, the error naturally follows: the multipart upload request contains only one part.

```log
[INF]       fdcache.cpp:RowFlush(1860): [tpath=][path=/big_file][fd=9]
[INF]       curl.cpp:ParallelMixMultipartUploadRequest(1472): [tpath=/big_file][fd=9]
[INF]       curl.cpp:PreMultipartPostRequest(3442): [tpath=/big_file]
[INF]       curl.cpp:prepare_url(4624): URL is http://s3v2-qos.storage.wanyol.com/the-test-bkt/big_file?uploads
[INF]       curl.cpp:prepare_url(4657): URL changed is http://s3v2-qos.storage.wanyol.com/the-test-bkt/big_file?uploads
[INF]       curl.cpp:insertV4Headers(2694): computing signature [POST] [/big_file] [uploads] []
[INF]       curl.cpp:url_to_host(97): url is http://s3v2-qos.storage.wanyol.com
[INF]       curl.cpp:RequestPerform(2345): HTTP response code 200
[INF]       curl.cpp:ParallelMixMultipartUploadRequest(1551): Copy Part [tpath=/big_file][start=0][size=5368709120][part=1]
[INF]       curl.cpp:CopyMultipartPostSetup(3787): [from=/big_file][to=/big_file][part=1]
[INF]       curl.cpp:prepare_url(4624): URL is http://s3v2-qos.storage.wanyol.com/the-test-bkt/big_file?partNumber=1&uploadId=5ef1ab471645468ec2af0645
[INF]       curl.cpp:prepare_url(4657): URL changed is http://s3v2-qos.storage.wanyol.com/the-test-bkt/big_file?partNumber=1&uploadId=5ef1ab471645468ec2af0645
[INF]       curl.cpp:CopyMultipartPostSetup(3828): copying... [from=/big_file][to=/big_file][part=1]
[INF]       curl.cpp:Request(4360): [count=1]
[INF]       curl.cpp:insertV4Headers(2694): computing signature [PUT] [/big_file] [partNumber=1&uploadId=5ef1ab471645468ec2af0645] []
[INF]       curl.cpp:url_to_host(97): url is http://s3v2-qos.storage.wanyol.com
[ERR] curl.cpp:RequestPerform(2363): HTTP response code 400, returning EIO. Body Text: <Error><Code>InvalidArgument</Code><Message>Invalid Argument</Message><Resource>/the-test-bkt/big_file</Resource><RequestId>9UwGAEMwgyzFGhsW</RequestId></Error>
[WAN] curl.cpp:MultiPerform(4262): thread failed - rc(-5)
[WAN] curl.cpp:MultiRead(4291): failed a request(400: http://s3v2-qos.storage.wanyol.com/the-test-bkt/big_file?partNumber=1&uploadId=5ef1ab471645468ec2af0645)
[INF]       curl.cpp:CopyMultipartPostSetup(3787): [from=/big_file][to=/big_file][part=1]
[INF]       curl.cpp:prepare_url(4624): URL is http://s3v2-qos.storage.wanyol.com/the-test-bkt/big_file?partNumber=1&uploadId=5ef1ab471645468ec2af0645
[INF]       curl.cpp:prepare_url(4657): URL changed is http://s3v2-qos.storage.wanyol.com/the-test-bkt/big_file?partNumber=1&uploadId=5ef1ab471645468ec2af0645
[INF]       curl.cpp:CopyMultipartPostSetup(3828): copying... [from=/big_file][to=/big_file][part=1]
[INF]       curl.cpp:insertV4Headers(2694): computing signature [PUT] [/big_file] [partNumber=1&uploadId=5ef1ab471645468ec2af0645] []
[INF]       curl.cpp:url_to_host(97): url is http://s3v2-qos.storage.wanyol.com
[ERR] curl.cpp:RequestPerform(2363): HTTP response code 400, returning EIO. Body Text: <Error><Code>InvalidArgument</Code><Message>Invalid Argument</Message><Resource>/the-test-bkt/big_file</Resource><RequestId>9UwGAA_AMS_FGhsW</RequestId></Error>
```

Our backend returns only `InvalidArgument`, and we confirmed that the error is caused by a too-large multipart part size.


@gaul commented on GitHub (Jun 23, 2020):

Thanks for the additional details! I can reproduce these symptoms locally. You can work around this with `-o nomixupload` until we create a fix for this.


@xrefft commented on GitHub (Jun 24, 2020):

Thanks for your support!
Since we need to use s3fs, I tried to fix this bug in the code a few days ago by:

1. appending multiple pages (rather than one) in the `Init` and `Resize` functions of `PageList`;
2. setting `is_modified=true` appropriately when calling the `Resize` function.

Obviously this increases memory usage unnecessarily. Looking forward to the official bug fix!


@gaul commented on GitHub (Jun 24, 2020):

@ggtakec Any insight on this? I proposed a test based on @xrefft's suggestion.


@ggtakec commented on GitHub (Jun 24, 2020):

@xrefft Thank you for reporting the defect, and @gaul thanks for the test code too.

I'm running some tests on this subject.
There is certainly a flaw in s3fs, but the problem is a bit more complicated than I expected.
I have confirmed some cases where the behavior of s3fs differs when uploading a sparse file (a file containing holes) created with truncate.
Even with a sparse file, I got different results depending on whether the holes cover the whole file or only part of it.
I'll investigate the symptoms and prepare the patch code soon.

Thanks in advance for your help.


@xrefft commented on GitHub (Jun 24, 2020):

@ggtakec Since the `parse_partsize_fdpage_list` function was added, our problem has been solved!
Thank you very much!


@ggtakec commented on GitHub (Jun 24, 2020):

@xrefft I'm glad to hear that, but I've found another problem (with sparse files), so I'll write a patch.
Thanks for your kindness.


@gaul commented on GitHub (Jun 24, 2020):

The referenced test still fails -- let's leave this open until we address that issue.


@ggtakec commented on GitHub (Jun 27, 2020):

@xrefft The fix code was merged into master.
I think this code solves the problem.
This issue will be closed, but if you still have problems, please reopen it.
Thanks.
