mirror of
https://github.com/s3fs-fuse/s3fs-fuse.git
synced 2026-04-25 13:26:00 +03:00
[GH-ISSUE #2095] Multipart upload fails with Cloudflare R2 #1064
Originally created by @gaul on GitHub (Jan 15, 2023).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/2095
Cloudflare supports multipart upload but copying a 25 MB file fails:
A 5 MB file (single part request) succeeds.
Original bug report: https://twitter.com/menkatsukiroku/status/1602606940491759616
@gaul commented on GitHub (Jan 15, 2023):
This seems to be caused by uploadId values that contain a wider set of characters than AWS usually uses:
The characters /, +, and = need to be URL-encoded.
@ggtakec commented on GitHub (Jan 22, 2023):
@gaul
If uploadId is passed already encoded, it seems to work, but I'm a little worried.
When using an encoded uploadId, I don't know whether the encoded string is used as the base string when creating the Signature for the header. The fix is easy if it is acceptable to use the encoded uploadId string when generating the Signature. However, if the original uploadId string is used, the modification becomes a little more cumbersome.
@ggtakec commented on GitHub (Jan 22, 2023):
@gaul I have checked the current source.
Although it is not a query string, the signature already uses the encoded string from the URL path. Considering this, it seems to be no problem to pass the encoded query string (including uploadId) to the signature calculation. I will try to modify the source code according to this policy.
@ggtakec commented on GitHub (Jan 22, 2023):
@gaul
I posted PR #2095.
I don't have Cloudflare set up on hand; could you try it with this PR?
@ggtakec commented on GitHub (Jan 24, 2023):
@gaul
The following error reported in #2095 should no longer occur with the fix in this PR.
However, we still got the error below, which both you and I detected.
The reason for this error was that s3fs was working with MixUpload (the default).
In MixUpload mode, each part size is not constant (see #1822).
In this mode, each range of data that needs to be uploaded is sent using the multipart size where possible, but the part size is not fixed and varies case by case.
Cloudflare appears to reject this because each part size is not fixed.
To resolve this, specify either the nomixupload option or the streamupload option. With these options (no-MixUpload or StreamUpload), each part size is fixed.
Also, for Cloudflare, do not specify the enable_content_md5 option. The upload fails because the ETag (= md5) computed at PUT of each part and the ETag of the response (not an md5 in the case of Cloudflare) do not match.
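The check that enable_content_md5 implies can be sketched as follows (illustrative, not s3fs's actual code): compare the md5 of the part just sent with the ETag in the response. On AWS S3 the per-part ETag is the md5 of the part, so the comparison passes; Cloudflare R2 returns a different ETag, so the same comparison fails even when the upload succeeded.

```python
import hashlib

def etag_matches_md5(part_data: bytes, response_etag: str) -> bool:
    # The client re-computes the md5 of what it sent and compares it with
    # the ETag returned by the server (stripped of its surrounding quotes).
    local_md5 = hashlib.md5(part_data).hexdigest()
    return response_etag.strip('"') == local_md5

part = b"x" * 1024
aws_style_etag = '"' + hashlib.md5(part).hexdigest() + '"'
print(etag_matches_md5(part, aws_style_etag))      # True: AWS-style md5 ETag
print(etag_matches_md5(part, '"r2-opaque-etag"'))  # False: non-md5 ETag
```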
@barabo commented on GitHub (Jan 25, 2023):
Is the problem that MixUpload mode does not work with enable_content_md5 when using R2? I'm wondering if MixUpload mode will work when enable_content_md5 is disabled.
@ggtakec commented on GitHub (Jan 25, 2023):
@barabo
MixUpload mode and enable_content_md5 are not linked. There is no difference regarding enable_content_md5 between regular MultipartUpload, MixUpload, and StreamUpload. enable_content_md5 is a function that checks the consistency between the data before sending and the data received by the server when the upload completes. As for processing, if enable_content_md5 is specified, s3fs calculates the md5 of the uploaded data and compares it with the ETag value returned in the upload response. For AWS S3, the ETag in the response is an md5 value, so specifying enable_content_md5 increases the accuracy of content integrity checking. However, in Cloudflare R2 the ETag in the response is NOT the md5 value, so specifying enable_content_md5 causes an error. In other words, Cloudflare R2 cannot check the integrity of the transmitted data using the md5 value.
The following is a summary of the issues raised in this issue:
1. uploadId was not URL-encoded.
2. MixUpload cannot be used with Cloudflare R2 (because each part size must be fixed).
3. With Cloudflare R2, enable_content_md5 cannot be used either.
@barabo commented on GitHub (Jan 25, 2023):
@ggtakec - Thank you for the detailed response!
I was looking at the R2 API docs and it seems like the UploadPart operation should support providing a content MD5 for the uploaded part. Is the problem in the response afterward? You mentioned that the response ETag is not the md5, but I'm wondering if the md5 is in another header, or if there's some other signal to indicate that the part upload matched the checksum provided when the upload began. FWIW, if you would like access to an R2 bucket to experiment with, I have a personal account I can provide you with keys for.
Anyway, this is not an urgent issue for me - I was just curious! Thanks, again!
@ggtakec commented on GitHub (Jan 26, 2023):
@barabo
Below is the transmission/reception log of one part when performing a multipart upload to Cloudflare R2.
As shown above, there is no md5 or similar value in the response.
It is possible that I'm missing some unknown options/parameters.
But I don't think s3fs (the client side) can validate the data, because the response contains no information about the received content (such as an MD5 value).
@gaul commented on GitHub (Jan 29, 2023):
This still fails:
I don't have time to look at this now, but I believe that s3fs should maintain query parameters in a list<pair<string, string>>. There are different encoding rules for the AWS signature and for HTTP encoding.
@ggtakec commented on GitHub (Jan 29, 2023):
@gaul
Does it occur even with the nomixupload (or streamupload) option specified? In the case I confirmed, InvalidPart occurred when the part size was not fixed by using mixupload. InvalidPart does not occur when the part size is fixed (excluding the final part). Please specify nomixupload and investigate a little more to see if the same phenomenon can be reproduced.
@ggtakec commented on GitHub (Feb 18, 2023):
#2097 has been merged.
When accessing Cloudflare R2 with the code of the master branch, you should specify the nomultipart or streamupload options. I think that will solve this problem.
@ggtakec commented on GitHub (Mar 19, 2023):
This will be closed. If you still have problems, please reopen or post a new issue.
@abelbeck commented on GitHub (Jul 4, 2024):
Did you mean to say "you should specify the nomixupload or streamupload options", as you stated before? I have been testing Cloudflare R2 with s3fs 1.94, and large uploads failed until I set -o nomixupload. I think the wiki page https://github.com/s3fs-fuse/s3fs-fuse/wiki/Non-Amazon-S3#cloudflare-r2 for "Cloudflare R2" should replace nomultipart with nomixupload.