mirror of
https://github.com/s3fs-fuse/s3fs-fuse.git
synced 2026-04-25 13:26:00 +03:00
[GH-ISSUE #808] Random write behavior #466
Originally created by @kunallillaney on GitHub (Aug 8, 2018).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/808
I had a few questions about how s3fs handles random writes, since I am seeing some odd network statistics (given the disclaimer in the README and the subsequent post #607).
Background:
Question:
Thanks.
@kunallillaney commented on GitHub (Aug 8, 2018):
A follow-on question: if s3fs does indeed use multipart copy, how does it get around the 5 GB target file size limit that S3 places on range reads?
@gaul commented on GitHub (Sep 11, 2018):
Which options do you mount s3fs with? Have you enabled the data cache? This could explain the read behavior you experience.
For writes, s3fs does issue multipart uploads and copies data for the unchanged regions. Hence you should expect a 1-byte random write to a 100 MB file to copy nine 10 MB chunks and upload one 10 MB chunk. If the cache does not contain this range, it will have to download it before uploading the whole part.
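The copy/upload split described above can be sketched as a small planning function. This is illustrative only, not the actual s3fs implementation; it assumes the fixed 10 MB part size from the example:

```python
# Illustrative sketch (not s3fs source): which multipart parts a small
# random write would copy server-side versus re-upload, assuming a
# fixed 10 MB part size as in the example above.

MB = 1024 * 1024
PART_SIZE = 10 * MB

def plan_parts(file_size, write_offset, write_len, part_size=PART_SIZE):
    """Return (part_number, action) pairs: 'upload' for parts that
    overlap the dirty byte range, 'copy' for untouched parts."""
    plan = []
    part_number = 1
    for start in range(0, file_size, part_size):
        end = min(start + part_size, file_size)  # exclusive
        dirty = start < write_offset + write_len and write_offset < end
        plan.append((part_number, "upload" if dirty else "copy"))
        part_number += 1
    return plan

# A 1-byte write at offset 42 MB into a 100 MB file dirties only part 5
# (bytes 40-50 MB); the other nine parts are copied server-side.
plan = plan_parts(100 * MB, write_offset=42 * MB, write_len=1)
uploads = [p for p, a in plan if a == "upload"]
copies = [p for p, a in plan if a == "copy"]
```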
@kunallillaney commented on GitHub (Sep 11, 2018):
The data cache is enabled. I tried this by opening the file in "r+" mode to update it. In this case, the file ends up with correct data in the updated region and zeros everywhere else (which sounds like a bug to me). I don't think s3fs copies data for the other 9 chunks. If it does, can you please explain how it gets around the 5 GB file size limit I mention above? s3fs would need to do range reads on the remaining 90 MB of data to do so, but S3 does not allow that since the target file is only 100 MB.
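For concreteness, the access pattern under discussion (a one-byte in-place update via "r+" mode) looks like this. The temporary file here merely stands in for a file on an s3fs mount, which is where the reported zeroing would occur; on any correct filesystem the untouched regions must survive the write:

```python
# Sketch of the "r+" update pattern. On a real reproduction, `path`
# would point at a file inside an s3fs mount (hypothetical here); a
# local temporary file stands in for it so the pattern itself runs.
import os
import tempfile

def one_byte_update(path, offset, byte=b"X"):
    with open(path, "r+b") as f:   # update mode: no truncation
        f.seek(offset)
        f.write(byte)

# Local demonstration with a 1 KB file of 'A' bytes:
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"A" * 1024)
    path = tmp.name

one_byte_update(path, 512)
data = open(path, "rb").read()
assert data[512:513] == b"X"
assert data[:512] == b"A" * 512    # untouched prefix preserved
assert data[513:] == b"A" * 511    # untouched suffix preserved
os.unlink(path)
```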
I believe this was verified by @orozery as well.
@gaul commented on GitHub (Sep 11, 2018):
If you observe data corruption, with unexpected zeros, this is a serious issue and I will take a look at it. Can you minimize a test case with exact steps to reproduce this behavior?
s3fs should copy ranges which are not updated on the server. S3 parts, either uploaded or copied, are limited to >= 5 MB and <= 5 GB but the total MPU object size can be up to 5 TB.
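Those limits can be restated as a small validator. The figures follow the S3 multipart upload limits cited above (5 MB–5 GB per part, last part may be smaller, up to 5 TB total); the helper itself is purely illustrative:

```python
# Illustrative check of the S3 multipart limits cited above: every part
# (uploaded or copied) must be 5 MiB-5 GiB, except the final part which
# may be smaller, and the assembled object may total up to 5 TiB across
# at most 10,000 parts.

MIB, GIB, TIB = 1024**2, 1024**3, 1024**4

def valid_multipart(part_sizes):
    if not 1 <= len(part_sizes) <= 10_000:
        return False
    if sum(part_sizes) > 5 * TIB:
        return False
    *body, last = part_sizes
    if any(not 5 * MIB <= s <= 5 * GIB for s in body):
        return False
    return last <= 5 * GIB   # last part alone may be under 5 MiB

# A 100 MB object as ten 10 MB parts is fine; a 4 MB non-final part is not.
assert valid_multipart([10 * MIB] * 10)
assert not valid_multipart([4 * MIB, 10 * MIB])
```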
@kunallillaney commented on GitHub (Sep 11, 2018):
@gaul You are correct that S3 parts are limited to >= 5 MB and <= 5 GB, but this applies only to uploaded parts. AWS makes explicit mention of this in the boto3 documentation under the CopySourceRange parameter, where it states "You can copy a range only if the source object is greater than 5 GB". I don't think this is limited to boto3, since I tried it over the HTTP API as well and it threw an error.
I will post steps or a small script to recreate the issue tomorrow morning.
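As an aside, the CopySourceRange parameter discussed above takes an inclusive byte range of the form "bytes=first-last". The helper below just formats those strings for a part-by-part server-side copy; the bucket/key/upload-id plumbing of a real UploadPartCopy call is omitted:

```python
# Formatting the inclusive byte ranges that UploadPartCopy's
# CopySourceRange parameter expects ("bytes=first-last"). The actual
# boto3 call (client.upload_part_copy) is not shown; this only builds
# the range strings for a 100 MB object split into 10 MB parts.

MB = 1024 * 1024

def copy_source_range(start, length):
    """Inclusive byte range covering `length` bytes at offset `start`."""
    return f"bytes={start}-{start + length - 1}"

ranges = [copy_source_range(i * 10 * MB, 10 * MB) for i in range(10)]
# First part of the object: "bytes=0-10485759"
```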
@gaul commented on GitHub (Sep 17, 2018):
The boto3 documentation for part size is incorrect. The phrase seems to originate from some AWS code generation whose source I cannot find, so I opened aws/aws-cli#3577, which has the same issue, in the hope that they can point me to the proper location.
Did you reproduce your data loss issue? Please provide instructions so I can investigate; otherwise, can you close the issue?
@kunallillaney commented on GitHub (Sep 17, 2018):
@gaul Sorry I have been busy with a paper and have been unable to look at this. When I had looked at it last, I was able to reproduce it multiple times and the issue was also confirmed by another contributor to the repository (mentioned above). I am positive that this issue exists and I will post the steps sometime next week.
@kunallillaney commented on GitHub (Oct 31, 2018):
@gaul Sorry I was busy with a paper and then traveling for a conference. Here are the steps to reproduce this bug
Please let me know if this is unclear or you have further questions.
@gaul commented on GitHub (Jan 23, 2019):
@kunallillaney Could you test again with master? We fixed a few zero data issues and #918 is the most similar to your symptom.
@gaul commented on GitHub (Jun 25, 2019):
Closing due to inactivity.