[GH-ISSUE #1353] [Ceph v15.2.2] s3fs random data corruption at read #724

Closed
opened 2026-03-04 01:48:14 +03:00 by kerem · 6 comments

Originally created by @pkoutsov on GitHub (Aug 6, 2020).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/1353

### Additional Information

#### Version of s3fs being used (s3fs --version)

s3fs/1.86 (commit hash e0a38ad; OpenSSL)

#### Version of fuse being used (pkg-config --modversion fuse, rpm -qi fuse, dpkg -s fuse)

FUSE library version: 2.9.7

#### Kernel information (uname -r)

4.4.0-116-generic

#### GNU/Linux Distribution, if applicable (cat /etc/os-release)

Ubuntu 16.04.6 LTS

#### s3fs command line used, if applicable

```
s3fs $S3_BUCKET /mnt/dataset1 -o passwd_file=/s3fspass -o retries=30 -o url="$S3_ENOINT" -o use_path_request_style -o dbglevel="debug" -d -f
```

#### s3fs syslog messages (grep s3fs /var/log/syslog, journalctl | grep s3fs, or s3fs outputs)

[s3fs_debug.zip](https://github.com/s3fs-fuse/s3fs-fuse/files/5034461/s3fs_debug.zip)

### Details about issue

First, I would like to thank you for the effort you have put into s3fs-fuse. Now to the issue: I am experiencing random data corruption when reading the training tfrecords of ResNet-50 (uploaded to an S3 mountpoint backed by Ceph v15.2.2). To investigate further, I wrote [this](https://github.com/s3fs-fuse/s3fs-fuse/files/5034503/resnet_integrity_tester.zip) Python script, which reads all the training tfrecords in parallel (32 reader workers) through the s3fs mountpoint and compares the MD5 of each resulting file to the ETag reported by Ceph (this comparison is valid because I uploaded each tfrecord as a single PUT object, so the ETag is the MD5 of the contents). I have captured and attached s3fs debug output from a run in which my script reported tfrecord train-00639-of-01024 as corrupted. Interestingly, the length of the corrupted tfrecord as read through s3fs matches the one on the S3 endpoint. Also, if I rerun my script this file won't be corrupted, but I will see corruption in some other tfrecords.
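(The MD5-vs-ETag check described above could be sketched roughly as below. This is not the attached script; the directory layout, `train-*` glob, and function names are illustrative assumptions. It relies on the fact that for single-part PUTs the ETag equals the MD5 hex digest of the object's contents.)

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def file_md5(path, chunk_size=1 << 20):
    """Stream the file in 1 MiB chunks so ~140 MB tfrecords are not
    loaded into memory whole."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def validate(mount_dir, expected_etags, workers=32):
    """Read every tfrecord through the mountpoint with `workers` parallel
    readers and return the names of files whose MD5 does not match the
    ETag recorded at upload time (valid for single-part PUTs only)."""
    paths = sorted(Path(mount_dir).glob("train-*"))
    corrupted = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for path, digest in zip(paths, pool.map(file_md5, paths)):
            if digest != expected_etags[path.name].strip('"'):
                corrupted.append(path.name)
    return corrupted
```

On a healthy mount `validate()` returns an empty list; any returned name indicates a read whose bytes differ from what was uploaded.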

PS: I have repeated this validation with other S3 mounters, such as goofys, and I don't face this issue there. I would prefer to use s3fs though.

Thanks


@gaul commented on GitHub (Aug 16, 2020):

@pkoutsov can you test with the latest master? This has a concurrency fix that may address your symptoms. If it doesn't, it would be great if you can minimize this test case in some way that I can reproduce it on my system. We take data corruption seriously and want to fix this as soon as possible.


@pkoutsov commented on GitHub (Aug 16, 2020):

@gaul thanks for looking into this. I repeated my test case and I still get data corruption at read, although the random effect is less frequent with the current master (#1363). While I look for a way to minimize my test case so you can reproduce it, my hint is this: when s3fs reads many objects concurrently (~35+ concurrent object readers), each ~140 MB, from an endpoint that serves all of them blazingly fast (a Ceph cluster; maybe a MinIO instance would do 🤔), you should experience the same data corruption.
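(A reproduction along the lines of this hint, many concurrent readers hammering large objects through the mountpoint, could be sketched as below. The harness is an assumption, not the project's test: it reads each file repeatedly and flags any file whose digest changes between rounds, which on a read-only dataset can only mean a corrupted read.)

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

def read_digest(path):
    """MD5 of one full sequential read of the file."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def stress(paths, readers=35, rounds=3):
    """Read every file `rounds` times with `readers` concurrent workers.
    Returns the set of paths whose digest differed between rounds."""
    baseline = {}
    mismatches = set()
    with ThreadPoolExecutor(max_workers=readers) as pool:
        for _ in range(rounds):
            for path, digest in zip(paths, pool.map(read_digest, paths)):
                # setdefault records the first-seen digest as the baseline
                if baseline.setdefault(path, digest) != digest:
                    mismatches.add(path)
    return mismatches
```

Pointed at ~35+ files of ~140 MB on an s3fs mount backed by a fast endpoint, a non-empty result would reproduce the symptom described above.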


@gaul commented on GitHub (Oct 10, 2020):

@pkoutsov Could you test again with the latest master? It includes a race condition fix, 0e895f60a0. I also added a test for concurrent readers, 3bc565b986. If you can reproduce these symptoms in a test, that would help us find a solution.


@pkoutsov commented on GitHub (Oct 12, 2020):

@gaul I tested again with the latest upstream and unfortunately I am still experiencing data corruption. I started wondering whether another component in my stack (Ceph → s3fs → TensorFlow) causes it, but with other S3 mounters the corruption is *not* present. OK, I will try to adapt the concurrent readers test to reproduce my corruption.


@gaul commented on GitHub (Nov 15, 2020):

@pkoutsov any update? We would really like to track down any possible corruptions.


@gaul commented on GitHub (Feb 8, 2021):

Please reopen if symptoms persist.
