[GH-ISSUE #516] enable_content_md5 set metadata? #288

Closed
opened 2026-03-04 01:44:04 +03:00 by kerem · 6 comments
Owner

Originally created by @tspicer on GitHub (Dec 14, 2016).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/516

Additional Information

The following information is very important in order to help us to help you. Omission of the following details may delay your support request or receive no attention at all.

  • Version of s3fs being used (s3fs --version)
    latest from source

  • Version of fuse being used (pkg-config --modversion fuse)
    2.9.6

  • Distro (cat /etc/issue)
    Alpine Linux:latest

  • s3fs command line used (if applicable)

ARGS="-o url=https://s3-external-1.amazonaws.com -o enable_noobj_cache -o stat_cache_expire=300 -o multireq_max=100 -o parallel_count=10 -o max_stat_cache_size=50000 -o use_sse -o enable_content_md5 -o use_cache=${CACHE_DIR} -o nonempty -o allow_other,mp_umask=022 -o uid=2001 -o gid=2001 -o nonempty -o nodev -o default_acl=bucket-owner-full-control ${S3BUCKET} ${MOUNT_POINT}"

Details about issue

When setting enable_content_md5 to be true, I expected the meta-data to contain the actual checksum like "md5chksum": "WZOTosUmxoARnYQVXZDx5Q=="

However, the meta-data for the transferred files does not contain the md5 value. The docs do not make clear whether the value should be present in the meta-data.

 "Metadata": {
        "gid": "0",
        "mtime": "1481684479",
        "uid": "0",
        "mode": "32961"
kerem closed this issue 2026-03-04 01:44:04 +03:00

@sqlbot commented on GitHub (Dec 14, 2016):

This value doesn't appear in the object metadata. Content-MD5 is used by
S3 when the upload arrives, to guard against defective uploads. If the
payload doesn't match the checksum, S3 rejects the upload with an error,
with the assumption that it is corrupt. If the payload does match, S3 saves
the object and the upload succeeds. S3 does not store the value in the
object metadata.

http://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectPUT.html
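
To make the description above concrete, here is a minimal sketch of how a Content-MD5 value is formed: the base64 encoding of the raw 16-byte MD5 digest of the payload. The function names are hypothetical and not part of s3fs. The second helper converts such a value to hex, which can be handy when comparing against a hex-formatted ETag, assuming the object was written with a single plain PUT.

```python
import base64
import hashlib


def content_md5(payload: bytes) -> str:
    """Return the base64 form of the raw MD5 digest, as carried by Content-MD5."""
    return base64.b64encode(hashlib.md5(payload).digest()).decode("ascii")


def content_md5_to_hex(value: str) -> str:
    """Convert a base64 Content-MD5 value to the hex form of the same digest."""
    return base64.b64decode(value).hex()


if __name__ == "__main__":
    body = b"example payload"
    header_value = content_md5(body)
    print("Content-MD5:", header_value)
    print("hex digest: ", content_md5_to_hex(header_value))
```

The value "WZOTosUmxoARnYQVXZDx5Q==" quoted in the issue has this same shape: 24 base64 characters encoding 16 bytes.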



@tspicer commented on GitHub (Dec 14, 2016):

From what I read a few minutes ago (and a quick test), the MD5 value is set in the ETag. However, it is not clear whether that is true for multipart uploads.

In this AWS FAQ they describe setting "md5chksum" in the metadata
https://aws.amazon.com/premiumsupport/knowledge-center/data-integrity-s3/

I'm trying to make sure that an external audit can verify the files delivered.


@tspicer commented on GitHub (Dec 14, 2016):

BTW, I see discussion of setting a "multipart_size" option for S3FS but no examples.


@sqlbot commented on GitHub (Dec 16, 2016):

Your concern for file integrity is well-founded. My own opinion is that
not only should enable_content_md5 be on by default, it should not be easy
to disable it. It's a foolhardy tradeoff for a minor performance
consideration.

The ETag of a multipart upload is the md5 of the binary md5 digests of the
parts (the raw octets, not hex and not base64), concatenated together
in order, followed by "-" and the number of parts. This is not officially
documented but is well-established. If all the parts are of a known, fixed
size, it can be calculated from the downloaded file. Multipart upload
integrity is assured if Content-MD5 is used there as well, but I don't know
how s3fs handles this case. (Note that I'm not one of the contributors;
I'm just a follower trying to be useful since I have worked extensively
with the raw S3 API.)
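
The multipart ETag rule described above can be sketched as follows. This is an illustration under stated assumptions, not s3fs code: `multipart_etag` is a hypothetical helper, and `part_size` must equal the part size actually used for the upload (every part except the last). It also assumes the ETag is MD5-derived, which may not hold for every encryption mode.

```python
import hashlib
import io
from typing import BinaryIO


def multipart_etag(stream: BinaryIO, part_size: int) -> str:
    """Compute the expected multipart ETag for data read in fixed-size parts."""
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    part_digests = []
    while True:
        part = stream.read(part_size)
        if not part:
            break
        # Keep the raw 16-byte digest of each part, not its hex form.
        part_digests.append(hashlib.md5(part).digest())
    # MD5 over the concatenated raw digests, then "-" and the part count.
    combined = hashlib.md5(b"".join(part_digests)).hexdigest()
    return f"{combined}-{len(part_digests)}"


if __name__ == "__main__":
    sample = io.BytesIO(b"a" * 10)
    print(multipart_etag(sample, part_size=4))  # three parts: 4 + 4 + 2 bytes
```

For a downloaded file, open it in binary mode and pass the file object; comparing the result with the ETag reported by S3 only makes sense when the part size matches the one used at upload time.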

The md5chksum mentioned in the docs is an arbitrary user metadata key with
no intrinsic meaning to S3.



@gaul commented on GitHub (Jan 2, 2017):

I agree that s3fs should provide integrity by default and provide an option to disable it. However, when configured for v4 signatures (the default), Amazon requires the use of SHA-256, which s3fs provides, to guarantee file integrity. Thus s3fs data is protected except when the S3 implementation only supports v2 signatures.
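
As a rough illustration of the SHA-256 check mentioned above: with v4 signatures, a signed payload is identified by the lowercase hex SHA-256 of the request body, sent in the `x-amz-content-sha256` header. The helper name below is hypothetical, and the sketch shows only the hash itself, not the full signing process or how s3fs implements it.

```python
import hashlib


def payload_sha256_hex(payload: bytes) -> str:
    """Return the lowercase hex SHA-256 of a request body."""
    return hashlib.sha256(payload).hexdigest()


if __name__ == "__main__":
    print("x-amz-content-sha256:", payload_sha256_hex(b"example payload"))
```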


@gaul commented on GitHub (Jul 11, 2019):

These metadata are the user metadata. Content-MD5 is stored externally to this, along with the Content-Length, Content-Type, etc. Please look for these with your external tool.
