[GH-ISSUE #1047] With cache enabled, data corrupted after update to an object #574
Originally created by @amolthorat1 on GitHub (Jun 21, 2019).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/1047
Additional Information
The following information is very important in order to help us to help you. Omitting these details may delay your support request or cause it to receive no attention at all.
Keep in mind that the commands we provide to retrieve information are oriented toward GNU/Linux distributions, so you may need to use other commands if you run s3fs on macOS or BSD.
Version of s3fs being used (s3fs --version)
1.85
Version of fuse being used (pkg-config --modversion fuse, rpm -qi fuse, dpkg -s fuse)
2.9.2
Kernel information (uname -r)
3.10.0-123.el7.x86_64
GNU/Linux Distribution, if applicable (cat /etc/os-release)
NAME="Red Hat Enterprise Linux Server"
VERSION="7.0 (Maipo)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="7.0"
PRETTY_NAME="Red Hat Enterprise Linux Server 7.0 (Maipo)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:7.0:GA:server"
HOME_URL="https://www.redhat.com/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 7"
REDHAT_BUGZILLA_PRODUCT_VERSION=7.0
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION=7.0
s3fs command line used, if applicable
/usr/bin/s3fs amol-test-s3fs /awsmount -o use_cache=/s3fscache -o dbglevel=debug -o curldbg
/etc/fstab entry, if applicable
NA
s3fs syslog messages (grep s3fs /var/log/syslog, journalctl | grep s3fs, or s3fs outputs)
Attached.
mylog.txt
Details about issue
I am observing that, with the cache enabled, when an object is accessed through s3fs, then updated by means external to s3fs, and later re-accessed through s3fs, s3fs detects that the object has changed but does not retrieve the entire object. It instead retrieves only the additional bytes present in the new object, retaining the earlier bytes from the old object. This corrupts the retrieved object by mixing content from the earlier and latest versions of the object.
Here are the steps to reproduce (a minimal shell sketch follows the list):
1. Remove the cache directory and re-create it.
2. Mount with the just-created empty cache directory.
3. Print the content of the object "tempaws.txt".
4. Change the data in tempaws.txt externally (using the AWS Management Console) to "new data added". I did this by updating my local file and re-uploading it through the AWS Management Console.
5. Access the file again through s3fs.
6. Note that the data should have been "new data added"; instead it is "old data added".
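Below is a minimal shell sketch of the reproduction, assuming the bucket, mountpoint, and cache directory from this report; the cat commands and the example file contents are illustrative and not taken from the original logs.

# Start from an empty cache directory and mount with caching enabled
rm -rf /s3fscache && mkdir /s3fscache
/usr/bin/s3fs amol-test-s3fs /awsmount -o use_cache=/s3fscache -o dbglevel=debug -o curldbg

# First read through s3fs populates the local cache
cat /awsmount/tempaws.txt    # prints the original content, e.g. "old data"

# ... update tempaws.txt out-of-band (e.g. via the AWS Management Console)
#     so the object now contains "new data added" ...

# Second read through s3fs
cat /awsmount/tempaws.txt    # expected "new data added", observed "old data added"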
You will notice that the data is a combination of old data ("old data ") and new data ("added"). If you look at the logs, you will see that s3fs detected that the file had changed, but only requested bytes 8-13, assuming that the first 8 bytes had remained the same. It then combined bytes 0-7 from the old data with bytes 8-13 from the new data.
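As an illustration of what such a ranged request returns, the following AWS CLI call (assuming credentials and region are configured for the same bucket) fetches only bytes 8-13 of the updated object; these are the bytes that s3fs appended to the cached bytes 0-7 instead of re-downloading the whole object.

# Hypothetical verification with the AWS CLI; get-object with --range issues a partial GET
aws s3api get-object --bucket amol-test-s3fs --key tempaws.txt --range bytes=8-13 /dev/stdout
# prints " added" -- only the tail of the new object, never its first 8 bytes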
This behavior is only seen with caching enabled.
Is this expected behavior? If so, is there a workaround without turning the cache off altogether?
@gaul commented on GitHub (Jun 22, 2019):
Thanks for the detailed issue! This is not expected behavior; please test the referenced pull request to see if it addresses your symptoms.
@amolthorat1 commented on GitHub (Jun 24, 2019):
Thank you very much for the quick fix. I verified that the fix works.
Considering that this can cause data loss for users, can an s3fs release happen soon with this fix? We can then recommend that users use that version; otherwise we will have to recommend turning the cache off.
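As an interim measure, a sketch of the no-cache mount (assuming that simply omitting the use_cache option disables the on-disk file cache; the other options are the ones from this report):

/usr/bin/s3fs amol-test-s3fs /awsmount -o dbglevel=debug -o curldbg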
@gaul commented on GitHub (Jun 24, 2019):
I opened #1050 to track the next release. We have a few other issues to resolve, so please test against master in the meantime.
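For reference, a sketch of testing against master using the project's usual autotools build (this assumes the build dependencies such as the fuse, libcurl, libxml2, and openssl development packages are already installed):

git clone https://github.com/s3fs-fuse/s3fs-fuse.git
cd s3fs-fuse
./autogen.sh
./configure
make
sudo make install
s3fs --version    # confirm the newly built binary is the one in use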