mirror of
https://github.com/s3fs-fuse/s3fs-fuse.git
synced 2026-04-25 21:35:58 +03:00
[GH-ISSUE #850] Incorrect etag value after large file upload #496
Originally created by @pawelmarkowski on GitHub (Nov 2, 2018).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/850
Additional Information
Incorrect ETag value in object storage after a large file upload. If you upload a 1 GB file to Ceph Object Storage through s3fs, the stored ETag does not match the file's MD5.
file_hash_large = '7917e22de415e3943220abef484c8526'
file_size_large = 1040000000 [B]
Small files look fine, however. If I download the large file that s3fs uploaded earlier and calculate its MD5, it is correct, so the file itself is not broken, but s3fs appears to do something wrong with the metadata.
Version of s3fs being used (s3fs --version)
Amazon Simple Storage Service File System V1.84(commit:f36ac3d) with OpenSSL
Version of fuse being used (pkg-config --modversion fuse, rpm -qi fuse, dpkg -s fuse)
2.9.7
Platform | Linux-4.15.0-38-generic-x86_64-with-LinuxMint-19-tara
Plugins | {'xdist': '1.24.0', 'repeat': '0.7.0', 'metadata': '1.7.0', 'html': '1.19.0', 'forked': '0.2'}
Python
s3fs command line used, if applicable
Details about issue
AssertionError: assert
  - MD5 value:  7917e22de415e3943220abef484c8526
  + ETag value: 1a5409445aa4a897571415c264201158-100
@gaul commented on GitHub (Nov 4, 2018):
Could you provide steps to reproduce this along with the source of the error, e.g., s3fs or ceph? I successfully ran
dd if=/dev/zero of=gaulbackup/tmp bs=1M count=1000 status=progress with -o enable_content_md5 against AWS, so I wonder if Ceph does something different.

@pawelmarkowski commented on GitHub (Nov 6, 2018):
We have mount:
/usr/local/bin/s3fs products /mnt/buck -o passwd_file=~/.passwd-s3fs -o url=https://endpointurlcom.com:8080 -o use_path_request_style -o umask=0222 -o allow_other -o enable_content_md5 -o dbglevel=debug -f -o uid=1000 -o gid=1000
dd if=/dev/zero of=/mnt/buck/zero bs=1M count=1000 status=progress
Run test:
self = <test_read.TestReading object at 0x7f7b73c5fc88>, key = 'zero'
@pytest.mark.slow
def test_checksum(self, key):
file_hash = hashlib.md5(file_as_bytes(
open(os.path.join(c['MOUNTPOINT'], key), 'rb'))).hexdigest()
object_hash = self.s3.head_object(Bucket=c['BUCKET'], Key=key)[
'ETag'].strip('"')
        assert file_hash == object_hash
E AssertionError: assert 'e5c834fbdaa6...5eb9404eefdd4' == '210d5322e146a...dc36e067f-100'
E - e5c834fbdaa6bfd8eac5eb9404eefdd4
E + 210d5322e146ac65333e9a8dc36e067f-100
tests/test_read.py:84: AssertionError
@pawelmarkowski commented on GitHub (Nov 6, 2018):
Logs look fine. I will send you an email, @gaul.
@sqlbot commented on GitHub (Nov 6, 2018):
I believe the problem is with your expectations.
ETag == MD5 is an assumption that does not always hold.
Multipart uploads always result in multipart ETags in the form shown (the -100 means the upload was sent using 100 chunks, and the hex portion is the hex-encoded MD5 of the result of concatenating the bytes of the binary MD5s of the 100 individual chunks, in order). The ETag, in any event, is created by the storage service, not s3fs.
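The multipart ETag construction @sqlbot describes can be sketched in a few lines. This is a minimal illustration, not part of s3fs; `multipart_etag` is a hypothetical helper name:

```python
import hashlib

def multipart_etag(data: bytes, part_size: int) -> str:
    """Compute an S3-style multipart ETag for `data` split into
    `part_size`-byte chunks: the hex MD5 of the concatenated binary
    MD5 digests of the parts, followed by '-' and the part count."""
    part_digests = [
        hashlib.md5(data[i:i + part_size]).digest()
        for i in range(0, len(data), part_size)
    ]
    combined = hashlib.md5(b"".join(part_digests)).hexdigest()
    return f"{combined}-{len(part_digests)}"

# A 25 MB object uploaded in 10 MB parts produces three parts
# (10 MB + 10 MB + 5 MB), so the ETag ends in "-3".
mb = 1024 * 1024
etag = multipart_etag(b"\0" * (25 * mb), 10 * mb)
```

Note that even a single-part multipart upload gets a `-1` suffix and an ETag of MD5(MD5(part)), so it still differs from the plain MD5 the test expects.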
Using -o nomultipart disables multipart uploads, and should result in the storage service assigning the ETag you are expecting. It also limits your largest possible upload to 5 GB and will probably result in inferior performance, since you lose the parallel upload behavior that multipart allows, as well as any possibility of partial retry.

@pawelmarkowski commented on GitHub (Nov 15, 2018):
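Instead of disabling multipart uploads, the checksum test could be made to tolerate multipart ETags by recomputing them from the local file. A sketch, assuming the uploader's part size is known (`etag_matches` is a hypothetical helper; the 10 MB default below is an assumption inferred from the -100 suffix on the 1000 MB upload, and should be matched to s3fs's configured multipart part size):

```python
import hashlib

def etag_matches(path, etag, part_size=10 * 1024 * 1024):
    """Return True if the local file at `path` matches the S3 ETag.

    A plain ETag is the object's MD5 hex digest; a multipart ETag
    looks like '<hex>-<parts>' and is recomputed from the file using
    the given part size (an assumption, not read from s3fs).
    """
    etag = etag.strip('"')
    if "-" not in etag:  # single-part upload: ETag is just the MD5
        with open(path, "rb") as f:
            return hashlib.md5(f.read()).hexdigest() == etag
    digests = []
    with open(path, "rb") as f:
        while chunk := f.read(part_size):
            digests.append(hashlib.md5(chunk).digest())
    combined = hashlib.md5(b"".join(digests)).hexdigest()
    return f"{combined}-{len(digests)}" == etag
```

With this, the test's assertion could become `assert etag_matches(local_path, object_etag)` rather than comparing the ETag to a whole-file MD5.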
Thanks, @sqlbot, for the explanation.