mirror of
https://github.com/s3fs-fuse/s3fs-fuse.git
synced 2026-04-25 05:16:00 +03:00
[GH-ISSUE #2413] Content corruption when file is open during overwrite on bucket #1188
Originally created by @hbs on GitHub (Feb 14, 2024).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/2413
Additional Information
- Version of s3fs being used (`s3fs --version`): V1.93
- Version of fuse being used (`pkg-config --modversion fuse`, `rpm -qi fuse` or `dpkg -s fuse`): 2.9.9-3
- Kernel information (`uname -r`): Linux sl911168 5.4.0-171-generic #189-Ubuntu SMP Fri Jan 5 14:23:02 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
- GNU/Linux Distribution, if applicable (`cat /etc/os-release`): Ubuntu
- How to run s3fs, if applicable
- s3fs syslog messages (`grep s3fs /var/log/syslog`, `journalctl | grep s3fs`, or s3fs outputs)

Details about issue
The following sequence allowed us to reproduce the issue most of the time:
1. Create object `FOO` on the bucket
2. Open `FOO` using `vi` on the machine where the bucket is mounted
3. Overwrite `FOO` on the bucket with new content
4. `cat FOO` on the machine where the bucket is mounted
5. Quit `vi`
6. `cat FOO` again

Most of the time the content of object `FOO` is not updated.

Doing the same without launching `vi`, i.e. without having the file open at the time its content is overwritten, leads to the correct result, with the latter `cat FOO` returning the new content.

@gaul commented on GitHub (Feb 15, 2024):
Could you share more about the expected behavior? Without file locking, I expect uncoordinated writers to create arbitrary changes to the file which might appear to be corruption. s3fs `-o use_cache` will only add to this confusion.

That said, s3fs could reduce but not eliminate the appearance of corruption by using `GetObject` with the `If-Match` parameter and `UploadPart` with the `x-amz-copy-source-if-match` parameter to ensure that it operates only if the ETag matches. This would allow s3fs to do the right thing when `vi` does a rename replacement of files so that `cat` could return some error instead of showing part of the first object and part of the second.

@hbs commented on GitHub (Feb 15, 2024):
I've probably done a poor job at my explanation.
In the sequence above, the step "Overwrite the object
FOOon the bucket with new content." is not performed using vi but usings3cmdto push new content onto the bucket,viis simply open and closed, no file modification is performed via the mount point.@ggtakec commented on GitHub (Feb 19, 2024):
When you open an object mounted with s3fs using `vi`, the following behavior occurs: first, s3fs downloads the contents of the object and saves them to a local file, and the contents of that file are then read by the vi process.

When s3fs is started with the `use_cache` option, if another process reads the same path, it will read the contents of the cache. In other words, you cannot read the content uploaded by other S3 tools.

However, if the `use_cache` option is not specified, the updated file contents can be read, because they are downloaded from the server when other processes read them.

Note that if the file is small, the updated content will be read instead of the cache, regardless of the `use_cache` option.

@hbs commented on GitHub (Feb 19, 2024):
I raised the issue because the behavior differs depending on whether or not the file is open in `vi` when new content is pushed to the bucket, hence I think there is indeed an issue somewhere.
@ggtakec commented on GitHub (Feb 19, 2024):
@hbs Thanks for your quick reply.
I have not yet been able to reproduce this problem.
Even when `use_cache` is used, updated files can be read while vi is running.

- Are you using the `kernel_cache` option? If so, can you check again with this option removed?
- Are you using the `enable_content_md5` option?

I'm interested in these results. (By the way, I haven't had the same problem regardless of these options.)
@hbs commented on GitHub (Feb 29, 2024):
One current example of the issue has the following elements:
The `.stats` file has the following content:

The file size is indeed 26689926 bytes, the sparse file in the cache contains only `0x00`s, which is not the actual content of the file, and reading the file from the mount point only shows those `0x00`s followed by some content not in the cache file, which means the original content is not fetched, even though the `.stats` file seems to indicate the content was not loaded (if I interpret the `:0:` correctly on the second line).

@hbs commented on GitHub (Mar 1, 2024):
The issue encountered might be due to caching at the FUSE level. How would s3fs behave in terms of access to the cache if the `direct_io` option is passed to FUSE?

@hbs commented on GitHub (Mar 5, 2024):
Another weirdness: when the corruption happens, the stat file (under `.bucket.stat`) for a corrupted file has a single range covering the complete file with flags `:0:1`, even though the filesystem is mounted `ro`.

How can it be that the stat file thinks the file was modified when the fs is read only?
@hbs commented on GitHub (Mar 8, 2024):
With the `direct_io` option the cache corruption issue still arises, with files showing the zeroed-out content of the sparse file in the cache.

This seems somehow similar to #715.
@ggtakec commented on GitHub (Mar 10, 2024):
@hbs
(I'd like to let you know up front that I haven't been able to reproduce this problem yet, and that I don't fully understand what's at stake.)
Several issues similar to this issue have been reported, but they are difficult to reproduce and it takes time to identify the cause.
I've been asked several questions, so I'll provide a series of answers below:
First, if you specify the `direct_io` option at startup, it is used by FUSE (i.e. an option that tells FUSE not to cache file content). This option is handled by FUSE and does not affect the cache files (files on the local disk) handled by s3fs.

s3fs does not open its own cache files (the file content and the cached-content state information) with DIRECT_IO.
Next, regarding the cache information file under `.<bucketname>.stat` of s3fs: its content is loaded internally when the target file is opened and is not updated until the file is closed. There may be a misunderstanding on this point.

Also, the file content cache created under `<bucketname>` is a sparse file that holds the downloaded ranges of the target file content. The areas that have not been downloaded are in the HOLE state.
Then, when a file is opened and read, a portion (or all) of the file content is downloaded from the S3 server and stored in a cache file.
If the file is written(modified), it will be written to the cache file.
If the file was updated, it will be uploaded to the S3 server when the file is closed, flushed, or synced.
The cache file is used in this way, so even when the bucket is mounted in RO mode, the cache file itself is updated whenever content is downloaded.
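The hole-filling behaviour described above can be illustrated with an ordinary sparse file. This is a standalone Python sketch of the concept only, not s3fs code (s3fs itself is written in C++), with an illustrative temp file standing in for a cache file:

```python
import os
import tempfile

# A sparse file stands in for an s3fs cache file: the written ranges play
# the role of downloaded content, and the unwritten gap is a HOLE.
with tempfile.NamedTemporaryFile(delete=False) as f:
    path = f.name
    f.write(b"A" * 4096)            # first "downloaded" range
    f.seek(4096 + 1024 * 1024)      # skip 1 MiB, leaving a hole
    f.write(b"B" * 4096)            # second "downloaded" range

with open(path, "r+b") as f:
    f.seek(4096)
    hole_read = f.read(16)          # reading inside the hole yields NUL bytes
    f.seek(4096)
    f.write(b"C" * 16)              # a "download" fills the hole with real data
    f.seek(4096)
    filled_read = f.read(16)

os.remove(path)
assert hole_read == b"\x00" * 16
assert filled_read == b"C" * 16
```

If a reader were handed the hole's bytes without the download step in between, it would see exactly the zero-filled content reported in this issue.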
If possible, please provide detailed steps to reproduce your problem or identify the cause.
Also, it would be helpful for analysis if you could start s3fs with `dbglevel=info` or `curldbg` and provide the part of the log that seems to show the problem.

Thanks in advance for your assistance.
@hbs commented on GitHub (Mar 11, 2024):
Hi, thanks for your comment. I'll try to detail further what is occurring so you can maybe identify the code to look for.
The set up is a bucket mounted in RO mode on a server. The bucket contains tens of thousands of files. The application accessing those files may keep them open for a very long period of time.
The issue which arises is that sometimes the application is provided with content which includes ranges in HOLE state. This is confirmed by simply looking at the problematic file via `hexdump -C`. Reading the s3fs cache file shows the same content as the one retrieved via the mount point.

The application may be closed from time to time, either cleanly, i.e. with files being closed before shutdown, or violently, with no explicit file closing.
The s3fs cache is not cleaned upon startup as it contains several terabytes of data which would take quite some time to redownload with a significant impact on the application's performance while it is populated.
So in our setup, no files are ever modified (files could be modified on the bucket side when I initially filed the issue, but this possibility has since been removed, and we still experience the issue).
If I understand correctly what you wrote regarding the range files under `.<bucketname>.stat`, their content should only be considered correct once s3fs has been shut down cleanly and the in-memory range information has been flushed to disk.

Regarding the logs, given the amount of file access performed by the production application where the issue arises, I don't think I will be able to provide them, unless there is a way to rotate those logs once they reach a certain size so we can limit the total amount of space they use.
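As a diagnostic for the symptom described here, one could scan suspect cache files for long runs of NUL bytes. This is a hypothetical helper, not part of s3fs, and a hit is only meaningful for ranges that the stat file claims were loaded, since genuine HOLEs legitimately read back as zeros:

```python
def zero_runs(path, min_len=4096, chunk=1 << 20):
    """Return a list of (offset, length) pairs for runs of NUL bytes
    that are at least min_len bytes long."""
    runs = []
    run_start = None
    pos = 0
    with open(path, "rb") as f:
        while True:
            data = f.read(chunk)
            if not data:
                break
            for i, b in enumerate(data):
                if b == 0:
                    if run_start is None:
                        run_start = pos + i
                elif run_start is not None:
                    if pos + i - run_start >= min_len:
                        runs.append((run_start, pos + i - run_start))
                    run_start = None
            pos += len(data)
    # A trailing run that reaches the end of the file.
    if run_start is not None and pos - run_start >= min_len:
        runs.append((run_start, pos - run_start))
    return runs
```

For example, `zero_runs("/mnt/data-ssd/s3-cache/bucket/FOO")` would list the zeroed ranges of a cached object (the path here is illustrative).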
@ggtakec commented on GitHub (Mar 17, 2024):
@hbs Thank you for the detailed explanation.
I understand that collecting and checking logs may be difficult.
The cache file and its stat file are implemented based on the following assumptions:
There is a HOLE in the cache file created by s3fs, but when a range in the HOLE area is read (accessed), that area is newly downloaded from the S3 server and written to the HOLE area, and the HOLE is filled.
Also, the stat file of the cache file under `.<bucketname>.stat` will be updated accordingly when the file is closed.
Even if a file is read from multiple processes, the read is performed via this cache file, and the same cache is shared and updated.
Even if one process leaves a file open and another process (or the same process) reads the file, its contents will be read through the same cache file.
When reading an uncached range (HOLE), the read range is downloaded from the S3 server, written to the cache file, and the HOLE is filled.
When s3fs is terminated (not forcibly), any open files are closed.
The cache file's stat information is also serialized when the file is closed.
This should correctly reflect the state of the cache file(information such as HOLE) in the stat file.
Therefore, the cache file and its stat file left behind when s3fs is terminated remain as a matched pair.
After (re)starting s3fs, these cache files and cache stat files will be loaded and used again when you open the file.
This allows s3fs to know the HOLE area of the cache file even after restarting, and the cached portion can continue to be read from the cache file.
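The persistence scheme described above can be sketched with a toy format. The real s3fs stat file layout is different; this only illustrates saving and reloading which ranges of a sparse cache file are filled, so that HOLEs can still be told apart from data after a restart:

```python
import json

def save_stat(stat_path, ranges):
    """Persist range information as a list of (offset, length, loaded)
    entries, where loaded=True means downloaded data and False a HOLE.
    Toy JSON format for illustration only, not s3fs's actual layout."""
    with open(stat_path, "w") as f:
        json.dump([list(r) for r in ranges], f)

def load_stat(stat_path):
    """Reload the range information, e.g. after s3fs has been restarted."""
    with open(stat_path) as f:
        return [tuple(r) for r in json.load(f)]
```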
When opening a file, the stat of the file on the S3 server is compared with that of the cache file (mtime and file size) to determine whether the cache file is stale.
If the results of this comparison do not match, the cache file will be discarded.
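The comparison described above can be written as a small predicate. The types here are illustrative, not s3fs's internal structures:

```python
from dataclasses import dataclass

@dataclass
class ObjectStat:
    mtime: float  # modification time, seconds since epoch
    size: int     # size in bytes

def cache_is_stale(server: ObjectStat, cached: ObjectStat) -> bool:
    # If either mtime or size on the server differs from what the cache
    # was built from, the cache file must be discarded and re-downloaded.
    return server.mtime != cached.mtime or server.size != cached.size
```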
The s3fs cache is designed and implemented like this.
If s3fs does not download the HOLE range from the S3 server but reads it from the cache file, it may be a problem with s3fs.
Unfortunately, I have not yet been able to reproduce the same phenomenon as this issue, so I am not able to understand the cause.
@beatstream69 commented on GitHub (Apr 3, 2024):
Seems like I experience the same issue. The S3 bucket is mounted via fstab in read-only mode. The files in S3 are not modified.
fstab config
dataset /mnt/dataset fuse.s3fs _netdev,allow_other,use_cache=/mnt/data-ssd/s3-cache,passwd_file=/root/.passwd-s3fs,use_path_request_style,url=https://s3.example.com,uid=1000,gid=1000 0 0

System and s3fs versions

Debian 12.5
Linux jupyter2 6.1.0-18-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.76-1 (2024-02-01) x86_64 GNU/Linux
Amazon Simple Storage Service File System V1.90 (commit:unknown) with GnuTLS(gcrypt)
fuse (2.9.9-6)
The corrupted file is filled with zeros; content of the `.stat` file: