mirror of
https://github.com/s3fs-fuse/s3fs-fuse.git
synced 2026-04-25 21:35:58 +03:00
[GH-ISSUE #715] Cache file corruption #405
Originally created by @spectre683 on GitHub (Feb 6, 2018).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/715
Cache file corruption
We are experiencing an issue in which files become corrupt on the local machine. When this occurs, both the file at the S3FS mount point and the file in the S3FS cache become corrupt; the file on S3 is not affected. File corruption in this case means that the file size appears to be correct, but the file is effectively empty: reading from it returns only 0x00 bytes. We periodically clear our cache with a cron job.
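The exact cron command was not captured in this report; a minimal sketch of such a cache-clearing job, where the cache path and the age threshold are illustrative assumptions, might look like this:

```shell
# Hypothetical cache-clearing job; CACHE_DIR and the age threshold are
# illustrative assumptions, not taken from the original report.
CACHE_DIR=$(mktemp -d)                            # stand-in for the real s3fs cache dir
touch -a -d '3 days ago' "$CACHE_DIR/stale.dat"   # simulate an old cache entry
touch "$CACHE_DIR/fresh.dat"                      # simulate a recently used entry
# Remove cache files whose last access is older than two days:
find "$CACHE_DIR" -type f -atime +1 -delete
ls "$CACHE_DIR"                                   # only fresh.dat remains
```

A job like this only touches the data cache directory, which is exactly the condition the report describes: the cache file disappears while the corresponding stat entry survives.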
Corruption will occur if a cached file is removed from the cache directory but not from the stat directory, and the file is then accessed from the mount point. For example, if we have a file called a_file.dat stored in a bucket called a_bucket, the steps to reproduce this problem would be like this:
1. Access (read) the file from the mount point.
2. Confirm the file now exists in both the cache directory and the stat directory.
3. Remove the file from the cache directory, but not from the stat directory.
4. Access (read) the file from the mount point again.
Check on the file:
Note that the blocks are now 0. In this condition reading from the file will only produce 0x00 values. Deleting the file from the stat directory and re-reading it from the mount point will cause S3FS to reacquire it from S3 and thus replace the file in the cache directory with a valid file.
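The symptom can be simulated locally without s3fs. A sparse file truncated back to its original size reports the right size and zero blocks, and reads back only 0x00 bytes; the file names below are stand-ins:

```shell
# Simulate the corrupt-cache symptom locally (no s3fs needed): the size
# looks right, but the data is a hole and every byte reads back as 0x00.
WORK=$(mktemp -d)
printf 'real object data' > "$WORK/a_file.dat"
SIZE=$(stat -c %s "$WORK/a_file.dat")
rm "$WORK/a_file.dat"
truncate -s "$SIZE" "$WORK/a_file.dat"           # size restored, content is a hole
stat -c 'size=%s blocks=%b' "$WORK/a_file.dat"   # blocks=0: fully sparse
od -An -tx1 "$WORK/a_file.dat"                   # prints only 00 bytes
```

This mirrors what the report observes through the mount point: correct size, zero blocks, all-zero content.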
Additional Information
The following information is very important in order to help us to help you. Omission of the following details may delay your support request or cause it to receive no attention at all.
Version of s3fs being used (s3fs --version)
Version of fuse being used (pkg-config --modversion fuse)
System information (uname -r)
Distro (cat /etc/issue)
s3fs command line used (if applicable)
/etc/fstab entry (if applicable):
s3fs syslog messages (grep s3fs /var/log/syslog, or s3fs outputs)
If you execute s3fs with the dbglevel and curldbg options, you can get detailed debug messages
Details about issue
N/A
@MrMoose commented on GitHub (Aug 6, 2018):
I have the same problem. Gentoo, ~x64, Kernel 4.17.11
It has happened several times in the few days I have been using this, causing substantial file loss. Is there any workaround, or something I can do?
@CMCDragonkai commented on GitHub (Oct 4, 2018):
I'm not sure if this is related, but I had a python script that works locally by outputting CSV files using the native csv module and with line buffering. Using it with the current release of s3fs 1.83 results in a bunch of null bytes written at the beginning of the CSV file. The beginning of the hexdump is:
It's a crazy number of null bytes. It also happens again later.
I have no idea what's causing this, but it literally corrupts the file.
I do want to mention that, in order to check what was going on, I went into the cache directory while the file was being written and used less to inspect the cached file being written; that shouldn't cause these kinds of bugs, though.
@gaul commented on GitHub (Oct 4, 2018):
If you provide a way to reproduce these symptoms I can look into this.
@gaul commented on GitHub (Jan 26, 2019):
master includes a fix for #918 which may address your symptoms.
@ggtakec commented on GitHub (Mar 29, 2019):
@spectre683 @MrMoose @CMCDragonkai
The s3fs cache file is a sparse file.
In other words, s3fs uses ftruncate to adjust the size of the cache file to the size of the S3 object.
Cache files are created as sparse files and are segmented; s3fs manages those segments in fixed-size units.
As needed, s3fs reads fixed-size ranges of the object from S3 and fills the corresponding segments of the cache file.
If the target segment is already cached, s3fs reads that range from the cache file instead.
Because of this behavior, the block count of the cache file may indicate less than the size of the object.
This condition is not abnormal; it is the normal state of the cache.
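The explanation above can be observed with ordinary tools. A small sketch (the file size and segment content below are arbitrary, not s3fs's actual segment size): the file is truncated to the full object size up front, only one segment's worth of data is written, and the block count stays far below what the apparent size would need:

```shell
# Sketch of the cache layout described above: truncate reserves the full
# object size as a sparse file, and only written segments occupy blocks.
CACHE_FILE=$(mktemp)
truncate -s 1M "$CACHE_FILE"                # reserve full object size (sparse)
printf 'segment 0 data' | dd of="$CACHE_FILE" conv=notrunc status=none
stat -c 'size=%s blocks=%b' "$CACHE_FILE"   # size=1048576, blocks far below 2048
```

A fully populated 1 MiB file would need 2048 512-byte blocks; the partially filled cache file reports far fewer, which is exactly the "blocks less than size" state described as normal.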
@spectre683 commented on GitHub (Mar 31, 2019):
@ggtakec I have not had time to study the internals of s3fs - thank you for the explanation regarding the s3fs caching mechanism. However, I do not think your explanation and the original bug report are inconsistent. In other words, I think it is possible that the s3fs caching mechanism is expected to work as you describe and there is also a bug that results in the behavior described in the original bug report.
All the original bug report says is that, by following the steps described in the report, I was able to consistently create a situation in which unexpected data was read from a file. In short, my expectation was that s3fs would return the actual file content, not a string of 0x00 bytes.
I have not had the chance to re-evaluate this with #918 so maybe it has been fixed. However, we are only using s3fs to read files and since the description for #918 describes data loss when writing to the middle of a file I'm not sure that this is relevant to the issue I described. When I get a chance I will try to re-evaluate the issue I describe and comment as to whether it has been resolved or not.
@gaul commented on GitHub (Apr 3, 2019):
@spectre683 We are very concerned about data loss issues; please reopen if you can reproduce this with the latest build. 1.84 and 1.85 include several data loss fixes, including #918, so retesting would really help us, especially if you can provide a reproducible test case. Please let us know the results either way!
@gaul commented on GitHub (Apr 9, 2019):
Closing due to inactivity. Please reopen if symptoms persist.
@spectre683 commented on GitHub (Nov 2, 2020):
@gaul With sincere apologies for how long it took to get to this, I can confirm that the issue no longer seems to exist. I looked at 1.80 (to confirm I could still reproduce the issue) and then at 1.84, 1.85, and 1.87 to confirm it no longer occurs. I no longer have the environment in which the issue originally occurred, but I had a VM lying around that was reasonably close and used that: specifically fuse = 2.9.3-15+deb8u3 and uname -r = 3.16.0-6-amd64. The distro information is the same as in the original report. I created a bucket with a file to test with, and used a combination of md5sum and od'ing the first 16 bytes of the file for the access/read step (step 4). Nothing super exacting, but I'm reasonably confident that I recreated the issue I was experiencing and showed that it does not reproduce in 1.84, 1.85, or 1.87. Apologies again for the delay in responding, and thank you for your assistance.
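The check described in that retest can be sketched as follows; the file content here is a stand-in for the real test object:

```shell
# Sketch of the verification described above: checksum the file and dump
# its first 16 bytes -- an all-zero dump would indicate the corrupt read.
F=$(mktemp)
printf 'expected object content' > "$F"
md5sum "$F"               # compare against the known-good checksum
od -An -tx1 -N16 "$F"     # first 16 bytes; non-zero values mean real data
```

For a corrupt read of the kind reported, the od output would be all 00 and the md5sum would differ from the checksum of the object on S3.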
@gaul commented on GitHub (Nov 2, 2020):
Thanks for testing this!