[GH-ISSUE #226] 125MB tar.gz file error #126

Closed
opened 2026-03-04 01:42:21 +03:00 by kerem · 11 comments

Originally created by @ogg1e on GitHub (Aug 11, 2015).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/226

If I use smaller files in s3 (text, jar, etc.) up to 20MB in size, they work fine in s3fs-fuse. But if I upload a tar.gz file that is 125MB in size, I get the error

```
gzip: stdin: invalid compressed data--format violated
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
```

when I try to extract it (`tar -xvf`). If I get the file out of s3 using s3cmd, there's no problem with it and it works fine. So I know the file isn't corrupt.

Here's my /etc/fstab entry using a proxy:

```
s3fs#int-bucket /opt/buckets/int fuse umask=0022,_netdev,allow_other,use_path_request_style,url=http://10.6.71.202:8080,use_cache=/tmp,nonempty,nomultipart,parallel_count=1 0 0
```
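
For clarity, a sketch of the reproduction described above (the filename is a placeholder; the bucket and mount point are from the fstab entry):

```
# Upload outside s3fs, then read back through the mount
s3cmd put big.tar.gz s3://int-bucket/
tar -xvf /opt/buckets/int/big.tar.gz
#   -> gzip: stdin: invalid compressed data--format violated

# Fetching the same object directly works, so the stored object is intact
s3cmd get s3://int-bucket/big.tar.gz /tmp/
tar -tzf /tmp/big.tar.gz
```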
kerem 2026-03-04 01:42:21 +03:00
  • closed this issue
  • added the dataloss label

@sqlbot commented on GitHub (Aug 11, 2015):

This issue seems to leave some questions unanswered.

Among the small files you report having no trouble with, you don't mention ".gz". Is s3fs working correctly for smaller .gz files and not large ones, or does it fail for all .gz files regardless of size? What about large files in other formats?

> gzip: stdin: invalid compressed data--format violated
> when I try to extract it (tar -xvf)

You shouldn't get that message with `tar -xvf`. You would need `-z` in the options to get that message, wouldn't you?

What kind of proxy are you using? If your ".gz" files are stored in S3 with `Content-Encoding: gzip`... well, that is incorrect, and it's a confusingly common mistake I see people make. `Content-Encoding` is for _transparent_ encoding that the user agent is supposed to remove, and that's of course not the case with a ".gz" file, which you want to remain gzipped end-to-end. In that case, particularly, the proxy could be stripping the gzip wrapper from the file, leading to the corruption. It might be useful to mention which proxy you are using, and the `Content-Type` and `Content-Encoding` of these problematic S3 objects, as seen in the console. I am guessing you did not originally store them with s3fs, but that might be useful information as well, since you seem to be able to download them with s3cmd.
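
For reference, a sketch of how those headers could be inspected (the object key is a placeholder; the bucket and proxy URL are taken from the fstab entry above):

```
# Show the object's metadata, including MIME type and any x-amz-meta-* headers
s3cmd info s3://int-bucket/path/to/file.tar.gz

# Or send a HEAD request through the proxy and read the headers directly
curl -sI "http://10.6.71.202:8080/int-bucket/path/to/file.tar.gz" \
  | grep -i -e '^content-type' -e '^content-encoding'
```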


@ogg1e commented on GitHub (Aug 12, 2015):

It does work with smaller tar.gz files.

That error is when I do `tar -xvf`.

I uploaded the files with s3cmd.


@ogg1e commented on GitHub (Aug 12, 2015):

Just did some more testing. If I upload the file with s3fs, I can use it from s3fs. But if I upload it with s3cmd, I cannot use it with s3fs. So what's the difference between the two? How do I configure both so they work together? The plan was to upload with s3cmd and read with s3fs.


@ggtakec commented on GitHub (Aug 12, 2015):

s3fs attaches attributes to each file (object) as x-amz-* HTTP headers, and uses those attributes as the file's permissions. s3cmd does not set these attributes. Because of this difference, s3fs cannot access a file (object) that was uploaded by s3cmd. This behavior matches a normal file system: the permissions of the directory containing the file are checked as well. You can see this with the ls command.

Regards,


@ogg1e commented on GitHub (Aug 12, 2015):

Can I manually add these headers with s3cmd?


@ggtakec commented on GitHub (Aug 12, 2015):

You can set the following HTTP headers on the file with s3cmd:

x-amz-meta-gid
x-amz-meta-uid
x-amz-meta-mode
x-amz-meta-mtime
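
For illustration, a sketch of setting those headers with s3cmd's --add-header option (all values and the object key are examples, not from this thread; if I recall correctly, s3fs stores mode/uid/gid as decimal strings, with mode including the file-type bits, e.g. 33188 for a regular file with 0644 permissions):

```
# Example values only: uid/gid 1000, mode 33188 (regular file, 0644),
# mtime in Unix epoch seconds
s3cmd modify \
  --add-header=x-amz-meta-uid:1000 \
  --add-header=x-amz-meta-gid:1000 \
  --add-header=x-amz-meta-mode:33188 \
  --add-header=x-amz-meta-mtime:1439337600 \
  s3://int-bucket/path/to/file.tar.gz
```

Note that s3cmd modify rewrites the object via a server-side copy, so existing metadata may be replaced rather than merged.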

Or, if your objects do not have any of these headers, you can first mount s3fs with the uid/gid options set to match your account, then "touch" the files or perform any metadata action on them (chmod/chown/chgrp/etc.); s3fs will then add those HTTP headers. After that, remount without the uid/gid options and you will see normal permissions on the files, as in the sketch below.
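
A sketch of that workaround, using the bucket and mount point from the fstab entry above (the uid/gid values are examples):

```
# 1) Mount with explicit uid/gid so the untagged objects become accessible
s3fs int-bucket /opt/buckets/int -o uid=1000,gid=1000,use_path_request_style,url=http://10.6.71.202:8080

# 2) Touch the files (or chmod/chown them) so s3fs writes the
#    x-amz-meta-* headers back to the objects
find /opt/buckets/int -type f -exec touch {} +

# 3) Remount without uid/gid; the files should now show normal permissions
fusermount -u /opt/buckets/int
s3fs int-bucket /opt/buckets/int -o umask=0022,use_path_request_style,url=http://10.6.71.202:8080
```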

Regards,


@gaul commented on GitHub (Aug 21, 2015):

@ogg1e You could also try passing `-o umask=0022` to s3fs.


@ogg1e commented on GitHub (Aug 21, 2015):

I just did some more testing, and even when I upload it with s3fs, I can't use it on another server using s3fs mounted to the same bucket.

  • scp the 130MB tar.gz file to server 1.
  • On server 1, copy the file into the s3fs-mounted folder as a non-root user. It shows root:root ownership and 755 permissions.
  • On server 1, I can still use the file that is in the s3fs folder.
  • On server 2, the file shows up in the s3fs-mounted folder, is the same size, and is also owned by root:root with 755 permissions.
  • On server 2, if I try to extract the files from it, it throws the same error:

```
gzip: stdin: invalid compressed data--format violated
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
```

@ggtakec commented on GitHub (Sep 13, 2015):

If you can, please copy the problem file to another directory from server 2, and compare the original and the copy as binary.

If the files are not the same, something is failing in sending/receiving through s3fs. If both files are the same, the fault is in gzip decompression when reading through s3fs. I think we should identify the source of the problem first.
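
A sketch of that check (paths and the object key are placeholders):

```
# On server 2: copy the suspect file out of the s3fs mount, then
# binary-compare it with a known-good copy fetched directly
cp /opt/buckets/int/file.tar.gz /tmp/from-s3fs.tar.gz
s3cmd get s3://int-bucket/file.tar.gz /tmp/from-s3cmd.tar.gz

cmp /tmp/from-s3cmd.tar.gz /tmp/from-s3fs.tar.gz && echo identical || echo differs
md5sum /tmp/from-s3cmd.tar.gz /tmp/from-s3fs.tar.gz
```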

Regards,


@gaul commented on GitHub (Jan 24, 2019):

@ogg1e Could you retest against master? It includes a number of fixes on the write and error paths which might address this symptom.


@gaul commented on GitHub (Apr 9, 2019):

Closing due to inactivity. Please reopen if symptoms persist.
