[GH-ISSUE #281] Data corruption when copying from s3fs #143

Closed
opened 2026-03-04 01:42:35 +03:00 by kerem · 9 comments

Originally created by @bruceredmon on GitHub (Oct 19, 2015).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/281

Every 3 to 4 days of use (approximately 5k files/day), we experience corruption in the download data stream. Our files are gzip'd, so decompression fails. It appears that an internal error message from AWS is being embedded in the data stream at a 1 MB boundary (true of every corrupt file we've inspected thus far):

```
06400000  3c 3f 78 6d 6c 20 76 65  72 73 69 6f 6e 3d 22 31  |<?xml version="1|
06400010  2e 30 22 20 65 6e 63 6f  64 69 6e 67 3d 22 55 54  |.0" encoding="UT|
06400020  46 2d 38 22 3f 3e 0a 3c  45 72 72 6f 72 3e 3c 43  |F-8"?>.<Error><C|
06400030  6f 64 65 3e 49 6e 74 65  72 6e 61 6c 45 72 72 6f  |ode>InternalErro|
06400040  72 3c 2f 43 6f 64 65 3e  3c 4d 65 73 73 61 67 65  |r</Code><Message|
06400050  3e 57 65 20 65 6e 63 6f  75 6e 74 65 72 65 64 20  |>We encountered |
06400060  61 6e 20 69 6e 74 65 72  6e 61 6c 20 65 72 72 6f  |an internal erro|
06400070  72 2e 20 50 6c 65 61 73  65 20 74 72 79 20 61 67  |r. Please try ag|
06400080  61 69 6e 2e 3c 2f 4d 65  73 73 61 67 65 3e 3c 52  |ain.</Message><R|
06400090  65 71 75 65 73 74 49 64  3e 31 45 38 44 32 46 30  |equestId>1E8D2F0|
064000a0  42 33 43 39 30 36 33 32  43 3c 2f 52 65 71 75 65  |B3C90632C</Reque|
064000b0  73 74 49 64 3e 3c 48 6f  73 74 49 64 3e 79 7a 4e  |stId><HostId>yzN|
064000c0  2b 70 51 49 76 64 45 52  4f 70 74 7a 31 66 5a 33  |+pQIvdEROptz1fZ3|
064000d0  33 47 58 71 73 68 77 32  55 79 31 50 36 50 49 68  |3GXqshw2Uy1P6PIh|
064000e0  67 74 58 31 67 2f 43 52  7a 41 50 4b 78 5a 66 58  |gtX1g/CRzAPKxZfX|
064000f0  7a 5a 4e 30 33 45 4e 6c  38 43 6f 63 64 3c 2f 48  |zZN03ENl8Cocd</H|
06400100  6f 73 74 49 64 3e 3c 2f  45 72 72 6f 72 3e c1 d3  |ostId></Error>..|
```
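
A quick way to triage for this failure signature is to scan downloaded files for the S3 `<Error>` XML document at 1 MB boundaries. The sketch below is illustrative only (not s3fs code); the signature bytes and the 1 MiB stride are taken from the dump above:

```python
#!/usr/bin/env python3
"""Scan files for an embedded S3 error document at 1 MiB boundaries."""
import sys

# Signature of the embedded AWS error body, per the hex dump above.
SIGNATURE = b'<?xml version="1.0" encoding="UTF-8"?>\n<Error>'
CHUNK = 1024 * 1024  # 1 MiB, the boundary reported in this issue


def scan(path):
    """Return the offsets of every 1 MiB boundary that starts with SIGNATURE."""
    hits = []
    with open(path, "rb") as f:
        offset = 0
        while True:
            head = f.read(len(SIGNATURE))
            if len(head) < len(SIGNATURE):
                break  # past end of file
            if head == SIGNATURE:
                hits.append(offset)
            offset += CHUNK
            f.seek(offset)  # jump to the next 1 MiB boundary
    return hits


if __name__ == "__main__":
    for path in sys.argv[1:]:
        for off in scan(path):
            print(f"{path}: embedded S3 <Error> document at offset {off:#x}")
```

Note that the dump above shows the signature at offset 0x06400000, which is exactly 100 MiB, so checking only at 1 MiB multiples is enough to catch this pattern.
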
kerem (2026-03-04) closed this issue and added the dataloss label.

@ggtakec commented on GitHub (Oct 19, 2015):

Please run s3fs with the "-d" and "-f" options (and "curldbg"), or with "-o dbglevel=xxx" instead of "-d" (and "-o f2"). (The "dbglevel" option was added in the master branch.)
Also, if you can, we would like to know how this issue can be reproduced.
I hope this helps us solve the issue.

Thanks in advance for your help.


@bruceredmon commented on GitHub (Oct 26, 2015):

Mounting s3fs volumes with -o retries=4,noatime,dbglevel=info,curldbg. Please let me know if this works or if a higher dbglevel is needed.


@ggtakec commented on GitHub (Nov 1, 2015):

@bruceredmon
There is an AWS document about "We encountered an internal error. Please try again.":
https://forums.aws.amazon.com/message.jspa?messageID=215866

s3fs retries a request when it receives a response code of 500 or higher, up to the configured retry count.

Please try setting a larger value for the "retries" option.
I expect that will resolve this issue.

Thanks in advance for your help.
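
For reference, the retry policy described here (retry while the response code is 500 or higher, up to the retry count) can be sketched in a few lines. This is an illustrative stand-in, not s3fs's actual code; the `requests` dependency, function name, and backoff schedule are assumptions:

```python
import time

import requests  # assumed HTTP client; any client exposing status codes works


def get_with_retries(url, retries=12, backoff=0.5):
    """Retry on HTTP 5xx up to `retries` extra attempts, mirroring the
    behavior described for s3fs's -o retries option (a sketch, not s3fs code)."""
    for attempt in range(retries + 1):
        resp = requests.get(url, timeout=30)
        if resp.status_code < 500:
            return resp  # success, or a 4xx for the caller to handle
        time.sleep(backoff * (2 ** attempt))  # exponential backoff between tries
    # Every attempt returned 5xx: raise instead of returning the XML error body.
    raise IOError(f"S3 kept returning {resp.status_code} after {retries} retries")
```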


@gaul commented on GitHub (Nov 1, 2015):

@ggtakec After s3fs exhausts its retries, it should return EIO to the caller instead of bogus data. I cannot reproduce these symptoms, but we need a fix similar to a1ca8b712401dd242273c138ca1f65d69cfcc605.
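
To illustrate the behavior being asked for here: a FUSE read handler that raises EIO once retries are exhausted, instead of handing the caller the error body. A sketch using the fusepy bindings (s3fs itself is C++, and `_fetch_range` below is a hypothetical helper):

```python
import errno

from fuse import FuseOSError, Operations  # fusepy


class S3Backed(Operations):
    """Hypothetical skeleton; only the error path is relevant here."""

    def read(self, path, size, offset, fh):
        # _fetch_range is a stand-in for the retrying S3 range-GET.
        status, body = self._fetch_range(path, offset, size)
        if status >= 500:
            # Retries exhausted: fail loudly with EIO rather than
            # splicing the XML error document into the data stream.
            raise FuseOSError(errno.EIO)
        return body
```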


@bruceredmon commented on GitHub (Nov 9, 2015):

Increasing retries from 4 to 12 seems to have restored stability for now, but I agree with @andrewgaul that it would be much better to return an I/O error than corrupt data.


@ghost commented on GitHub (Jun 16, 2016):

I found weird data corruption when copying files down. It turned out to be `use_cache`: the local file cache was corrupting the data, so I disabled it and that fixed it.
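
One way to check whether a cached copy has diverged from the object in S3 is to compare the local MD5 against the object's ETag. A sketch assuming boto3; it is only valid for non-multipart uploads, where the ETag is the plain MD5 of the object:

```python
import hashlib

import boto3  # assumed; bucket and key below are placeholders


def cache_matches_s3(local_path, bucket, key):
    """True if the local file's MD5 equals the object's ETag."""
    s3 = boto3.client("s3")
    etag = s3.head_object(Bucket=bucket, Key=key)["ETag"].strip('"')
    if "-" in etag:  # multipart ETags are not plain MD5s
        raise ValueError("multipart upload; ETag comparison not valid")
    md5 = hashlib.md5()
    with open(local_path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            md5.update(chunk)
    return md5.hexdigest() == etag
```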


@arpad9 commented on GitHub (Aug 25, 2017):

I'm having the same problem with the local cache and corrupted data, though that seems like a separate issue. I'll put together some debugging on it.


@gaul commented on GitHub (Jan 24, 2019):

@bruceredmon master includes several fixes for data corruption; could you test again and share your results?


@gaul commented on GitHub (Apr 9, 2019):

Closing due to inactivity. Please reopen if symptoms persist.
