[GH-ISSUE #1047] With cache enabled, data corrupted after update to an object #574

Closed
opened 2026-03-04 01:46:52 +03:00 by kerem · 3 comments
Originally created by @amolthorat1 on GitHub (Jun 21, 2019).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/1047

Additional Information

The following information is very important in order to help us to help you. Omitting these details may delay your support request or cause it to receive no attention at all.
Keep in mind that the commands we suggest for retrieving this information are oriented to GNU/Linux distributions, so you may need to use other commands if you run s3fs on macOS or BSD.

Version of s3fs being used (s3fs --version)

1.85

Version of fuse being used (pkg-config --modversion fuse, rpm -qi fuse, dpkg -s fuse)

2.9.2

Kernel information (uname -r)

3.10.0-123.el7.x86_64

GNU/Linux Distribution, if applicable (cat /etc/os-release)

NAME="Red Hat Enterprise Linux Server"
VERSION="7.0 (Maipo)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="7.0"
PRETTY_NAME="Red Hat Enterprise Linux Server 7.0 (Maipo)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:7.0:GA:server"
HOME_URL="https://www.redhat.com/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"

REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 7"
REDHAT_BUGZILLA_PRODUCT_VERSION=7.0
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION=7.0

s3fs command line used, if applicable

/usr/bin/s3fs amol-test-s3fs /awsmount -o use_cache=/s3fscache -o dbglevel=debug -o curldbg

/etc/fstab entry, if applicable

NA

s3fs syslog messages (grep s3fs /var/log/syslog, journalctl | grep s3fs, or s3fs outputs)

Attached.
[mylog.txt](https://github.com/s3fs-fuse/s3fs-fuse/files/3313987/mylog.txt)

Details about issue

I am observing that if the cache is enabled, and an object is first accessed through s3fs, then updated by means external to s3fs, and later re-accessed through s3fs, s3fs detects that the object has changed but does not retrieve the entire object. Instead it retrieves only the additional bytes in the new object, retaining the earlier bytes from the old object. This corrupts the retrieved object by mixing content from the earlier and latest versions.
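For illustration, here is a minimal Python sketch of the flawed behavior described above. The function name and logic are illustrative only, not actual s3fs internals; it simulates a cache that fetches just the byte range beyond the cached length when the object grows:

```python
# Hypothetical simulation of the stale-cache behavior reported in this
# issue (illustrative names; this is NOT the real s3fs code path).
def read_with_stale_range_fetch(cached: bytes, remote: bytes) -> bytes:
    """Fetch only the bytes beyond the cached length, keeping the
    cached prefix -- the buggy behavior this issue describes."""
    if len(remote) <= len(cached):
        # Size did not grow: serve the cached copy unchanged.
        return cached
    # Request only the byte range [len(cached), len(remote)) from the
    # server, wrongly assuming the existing prefix is still valid.
    tail = remote[len(cached):]
    return cached + tail

old = b"old data"        # content cached by the first read
new = b"new data added"  # object replaced externally

print(read_with_stale_range_fetch(old, new))  # b'old data added'
```

With an 8-byte cached copy and a 14-byte remote object, only bytes 8-13 are fetched, which reproduces exactly the mixed output seen below.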

Here are the steps to reproduce:

First remove the cache directory and re-create it.

[root@<machine1>~]# rm -rf /s3fscache/
[root@<machine1>~]# mkdir /s3fscache

Now mount with the just-created empty cache directory

[root@<machine1>~]# /usr/bin/s3fs amol-test-s3fs /awsmount -o use_cache=/s3fscache -o dbglevel=debug -o curldbg

Now print the content of the object "tempaws.txt"

[root@<machine1>~]#  cat /awsmount/tempaws.txt
old data

Now change the data in tempaws.txt externally to "new data added". I did this by updating my local file and re-uploading it through the AWS Management Console.
Now access the file again

[root@<machine1>~]#  cat /awsmount/tempaws.txt
old data added

Note that the data should have been **"new data added"**; instead it is **"old data added"**.

The result is a combination of old data ("old data") and new data (" added"). If you look at the logs, you will see that s3fs detected that the file had changed but requested only bytes 8-13, assuming that the first 8 bytes had remained the same. It then combined bytes 0-7 from the old data with bytes 8-13 from the new data.
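For comparison, a hedged sketch of one way a cache could avoid mixing stale and fresh bytes: discard the entire cached copy whenever the object's identity (for example, its ETag) no longer matches. This is only an assumption about the general shape of a fix, not a description of what the referenced pull request actually does:

```python
# Illustrative sketch (not s3fs code): invalidate the whole cached
# object when its ETag changes, instead of range-fetching the tail.
def read_with_full_invalidation(cached: bytes, cached_etag: str,
                                remote: bytes, remote_etag: str) -> bytes:
    if cached_etag == remote_etag:
        return cached   # object unchanged: the cache is still valid
    return remote       # object changed: refetch it in full

print(read_with_full_invalidation(b"old data", "etag-v1",
                                  b"new data added", "etag-v2"))
# b'new data added'
```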

This behavior is only seen with caching enabled.

Is this expected behavior? If so, is there a workaround without turning the cache off altogether?

kerem closed this issue 2026-03-04 01:46:52 +03:00

@gaul commented on GitHub (Jun 22, 2019):

Thanks for the detailed issue! This is not expected behavior; please test the referenced pull request to see if it addresses your symptoms.


@amolthorat1 commented on GitHub (Jun 24, 2019):

Thank you very much for the quick fix. I verified that the fix works.

Considering this can cause data loss for users, can an s3fs release happen soon with this fix? We can then recommend that users upgrade to that version; otherwise we will have to recommend turning the cache off.


@gaul commented on GitHub (Jun 24, 2019):

I opened #1050 to track the next release. We have a few other issues to resolve, so please test against master in the meantime.
