[GH-ISSUE #516] enable_content_md5 set metadata? #288
Originally created by @tspicer on GitHub (Dec 14, 2016).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/516
Additional Information
The following information is very important in order to help us to help you. Omitting the following details may delay your support request or mean it receives no attention at all.
Version of s3fs being used (s3fs --version)
latest from source
Version of fuse being used (pkg-config --modversion fuse)
2.9.6
Distro (cat /etc/issue)
Alpine Linux:latest
s3fs command line used (if applicable)
Details about issue
When setting enable_content_md5 to true, I expected the metadata to contain the actual checksum, e.g. "md5chksum": "WZOTosUmxoARnYQVXZDx5Q=="
However, the metadata for transferred files does not contain the MD5 value. The docs do not make clear whether the value should be present in the metadata.
@sqlbot commented on GitHub (Dec 14, 2016):
This value doesn't appear in the object metadata. Content-MD5 is used by
S3 when the upload arrives, to guard against defective uploads. If the
payload doesn't match the checksum, S3 rejects the upload with an error,
with the assumption that it is corrupt. If the payload does match, S3 saves
the object and the upload succeeds. S3 does not store the value in the
object metadata.
http://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectPUT.html
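To make the mechanism concrete: the Content-MD5 header carries the base64 encoding of the raw 16-byte MD5 digest of the body (the same form as the "md5chksum" example above, not the 32-character hex string). The sketch below computes that value and mimics, locally, the comparison S3 makes when the upload arrives. The function names are hypothetical, and `payload_matches` is only an illustration of the check, not S3's code.

```python
import base64
import hashlib


def content_md5(payload: bytes) -> str:
    """Content-MD5 header value: base64 of the raw 16-byte MD5 digest."""
    # Note: base64 of digest(), not of the hex string from hexdigest().
    return base64.b64encode(hashlib.md5(payload).digest()).decode("ascii")


def payload_matches(payload: bytes, header_value: str) -> bool:
    """Illustrative stand-in for the server-side check: recompute the digest
    of the bytes actually received and compare it to the declared header."""
    return content_md5(payload) == header_value


if __name__ == "__main__":
    body = b"hello, s3\n"
    header = content_md5(body)
    print("Content-MD5:", header)
    print(payload_matches(body, header))         # True
    print(payload_matches(body + b"x", header))  # False: S3 would reject this PUT
```

Because the server only compares and then discards the header, nothing from this exchange ends up in the object's metadata, which is why the value is not visible afterwards.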
@tspicer commented on GitHub (Dec 14, 2016):
From what I read a few minutes ago (and a quick test), the MD5 value is set in the ETag. However, it is not clear whether that holds for multipart uploads.
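One wrinkle when comparing the two: a single-part ETag is the MD5 in lowercase hex (wrapped in double quotes), while Content-MD5 and the "md5chksum" convention use base64. The following is a minimal sketch, assuming a plain single-part PUT; AWS documents that objects encrypted with SSE-C or SSE-KMS have ETags that are not an MD5 of the data, and multipart ETags carry a "-<parts>" suffix, so neither case can match. The helper names are hypothetical.

```python
import base64


def b64_md5_to_hex(b64_digest: str) -> str:
    """Convert a base64 MD5 (Content-MD5 / md5chksum style) to lowercase hex."""
    raw = base64.b64decode(b64_digest, validate=True)
    if len(raw) != 16:
        raise ValueError(f"expected a 16-byte MD5 digest, got {len(raw)} bytes")
    return raw.hex()


def etag_matches_md5(etag: str, b64_digest: str) -> bool:
    """Compare a single-part ETag with a base64 MD5 value.

    ETags usually arrive wrapped in double quotes. A multipart ETag ends in
    "-<number of parts>" and is not a plain MD5, so it never matches here.
    """
    tag = etag.strip('"').lower()
    if "-" in tag:
        return False
    return tag == b64_md5_to_hex(b64_digest)


if __name__ == "__main__":
    # Hex form of the base64 value quoted in the issue description.
    print(b64_md5_to_hex("WZOTosUmxoARnYQVXZDx5Q=="))
```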
In this AWS FAQ, they describe setting "md5chksum" in the metadata:
https://aws.amazon.com/premiumsupport/knowledge-center/data-integrity-s3/
I'm trying to make sure that the delivered files can be verified by an external audit.
@tspicer commented on GitHub (Dec 14, 2016):
BTW, I see discussion of setting a "multipart_size" option for S3FS but no examples.
@sqlbot commented on GitHub (Dec 16, 2016):
Your concern for file integrity is well-founded. My own opinion is that
not only should enable_content_md5 be on by default, it should not be easy
to disable it. It's a foolhardy tradeoff for a minor performance
consideration.
The ETag of a multipart upload is the MD5 of the binary MD5 digests (not hex,
not base64, just the raw octets) of each part, concatenated together in
order, followed by "-" and the number of parts. This is not officially
documented but is well established. If all the parts are of a known, fixed
size, it can be calculated from the downloaded file. Multipart upload
integrity is assured if Content-MD5 is used there as well, but I don't know
how s3fs handles this case. (Note that I'm not one of the contributors;
I'm just a follower trying to be useful, since I have worked extensively
with the raw S3 API.)
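As a sketch of the recipe just described (undocumented by AWS, so treat it as an observed convention), the code below recomputes a multipart ETag from a downloaded file. It assumes every part except the last was exactly `part_size` bytes. For s3fs that would be whatever the `multipart_size` option was set to; check the s3fs man page for that option's unit and default before converting it to bytes. Function names are hypothetical.

```python
import hashlib
import io
from typing import BinaryIO


def multipart_etag(stream: BinaryIO, part_size: int) -> str:
    """Recompute a multipart-upload ETag, assuming fixed-size parts.

    `stream` should be a buffered binary stream (e.g. open(path, "rb")), so
    that read(part_size) returns a full part except at end of file.
    """
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    part_digests = []
    while True:
        part = stream.read(part_size)
        if not part:
            break
        # Raw 16-byte digest of this part (not hex, not base64).
        part_digests.append(hashlib.md5(part).digest())
    if not part_digests:
        raise ValueError("empty input: a multipart upload has at least one part")
    # MD5 over the concatenated raw digests, in part order, then "-<count>".
    combined = hashlib.md5(b"".join(part_digests)).hexdigest()
    return f"{combined}-{len(part_digests)}"


def multipart_etag_of_bytes(data: bytes, part_size: int) -> str:
    """Convenience wrapper for data already held in memory."""
    return multipart_etag(io.BytesIO(data), part_size)


if __name__ == "__main__":
    import sys

    # Usage: python multipart_etag.py FILE PART_SIZE_IN_BYTES
    with open(sys.argv[1], "rb") as f:
        print(multipart_etag(f, int(sys.argv[2])))
```

Note that even a one-part multipart upload gets an ETag of the form "md5(md5(data))-1", which differs from the plain MD5 a single PUT would produce. If the computed value does not match, a different part size is the first thing to suspect.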
The md5chksum mentioned in the docs is an arbitrary user metadata key with
no intrinsic meaning to S3.
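In practice that means md5chksum is just a label you write yourself (sent as an `x-amz-meta-md5chksum` request header) and later compare against yourself; S3 never checks it against the body. A minimal sketch of both halves follows. It assumes the base64 MD5 form used in the example above, and that user metadata is read back as a mapping with the `x-amz-meta-` prefix stripped (as, for example, boto3's `head_object` reports it under `Metadata`). The function names are hypothetical.

```python
import base64
import hashlib
from typing import Dict, Mapping


def md5chksum_header(payload: bytes) -> Dict[str, str]:
    """User-metadata header carrying a base64 MD5 of the payload.

    S3 keeps this as opaque user metadata; it does not verify it.
    """
    digest = base64.b64encode(hashlib.md5(payload).digest()).decode("ascii")
    return {"x-amz-meta-md5chksum": digest}


def audit_against_md5chksum(downloaded: bytes, user_metadata: Mapping[str, str]) -> bool:
    """External audit: recompute the MD5 of the downloaded bytes and compare it
    with the stored md5chksum value (key given without the x-amz-meta- prefix)."""
    stored = user_metadata.get("md5chksum")
    if stored is None:
        raise KeyError("object has no md5chksum user metadata to audit against")
    actual = base64.b64encode(hashlib.md5(downloaded).digest()).decode("ascii")
    return actual == stored
```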
@gaul commented on GitHub (Jan 2, 2017):
I agree that s3fs should provide integrity checking by default, with an option to disable it. However, when configured for v4 signatures (the default), Amazon requires a SHA-256 hash of the payload, which s3fs provides, and that hash guarantees file integrity. Thus s3fs data is protected except when the S3 implementation only supports v2 signatures.
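For comparison with Content-MD5: under SigV4 the payload hash goes in the `x-amz-content-sha256` header as lowercase hex (not base64), and because that header is part of the signed request, S3 rejects the upload if the body it receives hashes to something else. A minimal sketch of computing that value, in memory or streamed from a file; the function names and the 1 MiB chunk size are arbitrary choices for illustration.

```python
import hashlib


def payload_sha256_hex(payload: bytes) -> str:
    """Lowercase hex SHA-256, the form SigV4 uses for x-amz-content-sha256."""
    return hashlib.sha256(payload).hexdigest()


def file_sha256_hex(path: str, chunk_size: int = 1 << 20) -> str:
    """Same value for a file, hashed in chunks so large files are not loaded whole."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```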
@gaul commented on GitHub (Jul 11, 2019):
These metadata are user metadata. Content-MD5 is stored separately from them, along with Content-Length, Content-Type, etc. Please look for these with your external tool.