mirror of
https://github.com/s3fs-fuse/s3fs-fuse.git
synced 2026-04-25 05:16:00 +03:00
[GH-ISSUE #2413] Content corruption when file is open during overwrite on bucket #1188
Originally created by @hbs on GitHub (Feb 14, 2024).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/2413
Additional Information
- Version of s3fs being used (`s3fs --version`): V1.93
- Version of fuse being used (`pkg-config --modversion fuse`, `rpm -qi fuse` or `dpkg -s fuse`): 2.9.9-3
- Kernel information (`uname -r`): Linux sl911168 5.4.0-171-generic #189-Ubuntu SMP Fri Jan 5 14:23:02 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
- GNU/Linux Distribution, if applicable (`cat /etc/os-release`): Ubuntu
- How to run s3fs, if applicable
- s3fs syslog messages (`grep s3fs /var/log/syslog`, `journalctl | grep s3fs`, or s3fs outputs)

Details about issue
The following sequence allowed us to reproduce the issue most of the time:
1. Create object `FOO` on the bucket
2. Open `FOO` using `vi` on the machine where the bucket is mounted
3. Overwrite `FOO` on the bucket with new content
4. `cat FOO` on the machine where the bucket is mounted
5. Quit `vi`
6. `cat FOO` again

Most of the time the content of object `FOO` is not updated.

Doing the same without launching `vi`, i.e. without having the file open at the time its content is overwritten, leads to the correct result, with the latter `cat FOO` returning the new content.

@gaul commented on GitHub (Feb 15, 2024):
Could you share more about the expected behavior? Without file locking, I expect uncoordinated writers to create arbitrary changes to the file which might appear to be corruption. s3fs `-o use_cache` will only add to this confusion.

That said, s3fs could reduce but not eliminate the appearance of corruption by using `GetObject` with the `If-Match` parameter and `UploadPart` with the `x-amz-copy-source-if-match` parameter to ensure that it operates only if the ETag matches. This would allow s3fs to do the right thing when `vi` does a rename replacement of files so that `cat` could return some error instead of showing part of the first object and part of the second.

@hbs commented on GitHub (Feb 15, 2024):
I've probably done a poor job at my explanation.
In the sequence above, the step "Overwrite the object
FOOon the bucket with new content." is not performed using vi but usings3cmdto push new content onto the bucket,viis simply open and closed, no file modification is performed via the mount point.@ggtakec commented on GitHub (Feb 19, 2024):
When you open an object mounted with s3fs using `vi`, the following behavior occurs: first, s3fs downloads the contents of the object and saves them to a local file, and the contents of that file are then read by the vi process.

When s3fs is started with the `use_cache` option, if another process reads the same path, it will read the contents of the cache. In other words, you cannot read the content uploaded by other S3 tools.

However, if the `use_cache` option is not specified, the updated file contents can be read, because they are downloaded from the server when other processes read them.

Note that if the file is small, the updated content will be read instead of the cache, regardless of the `use_cache` option.

@hbs commented on GitHub (Feb 19, 2024):
I raised the issue because the behavior differs depending on whether or not the file is open in `vi` when new content is pushed to the bucket, hence I think there is indeed an issue somewhere.
@ggtakec commented on GitHub (Feb 19, 2024):
@hbs Thanks for your quick reply.
I have not yet been able to reproduce this problem.
Even when `use_cache` is used, updated files can be read while vi is running.

- Are you using the `kernel_cache` option? If so, can you check again with this option removed?
- Are you using the `enable_content_md5` option?

I'm interested in these results. (By the way, I haven't had the same problem regardless of these options.)
@hbs commented on GitHub (Feb 29, 2024):
One current example of the issue has the following elements:
The `.stats` file has the following content:

The file size is indeed 26689926 bytes, the sparse file in the cache contains only `0x00`s, which is not the actual content of the file, and reading the file from the mount point only shows those `0x00`s followed by some content not in the cache file, which means the original content is not fetched, even though the `.stats` file seems to indicate the content was not loaded (if I interpret the `:0:` correctly on the second line).

@hbs commented on GitHub (Mar 1, 2024):
The issue encountered might be due to caching at the FUSE level. How would s3fs behave in terms of access to the cache if the `direct_io` option is passed to FUSE?

@hbs commented on GitHub (Mar 5, 2024):
Another weirdness: when the corruption happens, the stat file (under `.bucket.stat`) for a corrupted file has a single range covering the complete file with flags `:0:1`, even though the filesystem is mounted `ro`.

How can it be that the stat file thinks the file was modified when the fs is read only?
@hbs commented on GitHub (Mar 8, 2024):
With the `direct_io` option the cache corruption issue still arises, with files showing the zeroed-out content of the sparse file in the cache.

This seems somehow similar to #715.
@ggtakec commented on GitHub (Mar 10, 2024):
@hbs
(I'd like to let you know up front that I haven't been able to reproduce this problem yet, and that I don't fully understand what's at stake.)
Several issues similar to this issue have been reported, but they are difficult to reproduce and it takes time to identify the cause.
I've been asked several questions, so I'll provide a series of answers below:
First, if you specify the `direct_io` option at startup, it is used by FUSE (i.e. an option that tells FUSE not to cache file content). This option is handled by FUSE and does not affect the cache files (files on the local disk) handled by s3fs.

s3fs does not open its own cache files (the file content and the cached-content state information) with DIRECT_IO.
Next, regarding the cache information file under `.<bucketname>.stat` of s3fs: its content is loaded internally when the target file is opened and is not updated until the file is closed. There may be a misunderstanding on this point.

Also, the file content cache created under `<bucketname>` is a sparse file that holds the downloaded ranges of the target file content. The areas that have not been downloaded are in the HOLE state.
Then, when a file is opened and read, a portion (or all) of the file content is downloaded from the S3 server and stored in a cache file.
If the file is written(modified), it will be written to the cache file.
If the file was updated, it will be uploaded to the S3 server when the file is closed, flushed, or synced.
The cache file is used in this way, so even when the bucket is mounted in RO mode, the cache file itself is updated whenever content is downloaded.
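The hole-filling behaviour described above can be illustrated with an ordinary sparse file. This is a standalone Python sketch of the concept only, not s3fs code (s3fs itself is written in C++), with an illustrative temp file standing in for a cache file:

```python
import os
import tempfile

# A sparse file stands in for an s3fs cache file: the written ranges play
# the role of downloaded content, and the unwritten gap is a HOLE.
with tempfile.NamedTemporaryFile(delete=False) as f:
    path = f.name
    f.write(b"A" * 4096)            # first "downloaded" range
    f.seek(4096 + 1024 * 1024)      # skip 1 MiB, leaving a hole
    f.write(b"B" * 4096)            # second "downloaded" range

with open(path, "r+b") as f:
    f.seek(4096)
    hole_read = f.read(16)          # reading inside the hole yields NUL bytes
    f.seek(4096)
    f.write(b"C" * 16)              # a "download" fills the hole with real data
    f.seek(4096)
    filled_read = f.read(16)

os.remove(path)
assert hole_read == b"\x00" * 16
assert filled_read == b"C" * 16
```

If a reader were handed the hole's bytes without the download step in between, it would see exactly the zero-filled content reported in this issue.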
If possible, please provide detailed steps to reproduce your problem or identify the cause.
Also, it would be helpful for analysis if you could start s3fs with `dbglevel=info` or `curldbg` and provide the part of the log that seems to show the problem.

Thanks in advance for your assistance.
@hbs commented on GitHub (Mar 11, 2024):
Hi, thanks for your comment. I'll try to detail further what is occurring so you can maybe identify the code to look for.
The set up is a bucket mounted in RO mode on a server. The bucket contains tens of thousands of files. The application accessing those files may keep them open for a very long period of time.
The issue which arises is that sometimes the application is provided with content which includes ranges in HOLE state. This is confirmed by simply looking at the problematic file via `hexdump -C`. Reading the s3fs cache file shows the same content as the one retrieved via the mount point.

The application may be closed from time to time, either cleanly, i.e. with files being closed before shutdown, or violently, with no explicit file closing.
The s3fs cache is not cleaned upon startup as it contains several terabytes of data which would take quite some time to redownload with a significant impact on the application's performance while it is populated.
So in our setup, no files are ever modified (files could be modified on the bucket side when I initially filed the issue, but this possibility has since been removed, and we still experience the issue).
If I understand correctly what you wrote regarding the range files under `.<bucketname>.stat`, their content should only be considered correct once s3fs has been shut down cleanly and the in-memory range information has been flushed to disk.

Regarding the logs, given the amount of file access performed by the production application where the issue arises, I don't think I will be able to provide them, unless there is a way to rotate those logs once they reach a certain size so we can limit the total amount of space they use.
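As a diagnostic for the symptom described here, one could scan suspect cache files for long runs of NUL bytes. This is a hypothetical helper, not part of s3fs, and a hit is only meaningful for ranges that the stat file claims were loaded, since genuine HOLEs legitimately read back as zeros:

```python
def zero_runs(path, min_len=4096, chunk=1 << 20):
    """Return a list of (offset, length) pairs for runs of NUL bytes
    that are at least min_len bytes long."""
    runs = []
    run_start = None
    pos = 0
    with open(path, "rb") as f:
        while True:
            data = f.read(chunk)
            if not data:
                break
            for i, b in enumerate(data):
                if b == 0:
                    if run_start is None:
                        run_start = pos + i
                elif run_start is not None:
                    if pos + i - run_start >= min_len:
                        runs.append((run_start, pos + i - run_start))
                    run_start = None
            pos += len(data)
    # A trailing run that reaches the end of the file.
    if run_start is not None and pos - run_start >= min_len:
        runs.append((run_start, pos - run_start))
    return runs
```

For example, `zero_runs("/mnt/data-ssd/s3-cache/bucket/FOO")` would list the zeroed ranges of a cached object (the path here is illustrative).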
@ggtakec commented on GitHub (Mar 17, 2024):
@hbs Thank you for the detailed explanation.
I understand that collecting and checking logs may be difficult.
The cache file and its stat file are implemented based on the following assumptions:
There is a HOLE in the cache file created by s3fs, but when a range in the HOLE area is read (accessed), that area is newly downloaded from the S3 server and written to the HOLE area, and the HOLE is filled.
Also, the stat file of the cache file under `.<bucketname>.stat` will be updated accordingly when the file is closed.
Even if a file is read from multiple processes, the read is performed via this cache file, and the same cache is shared and updated.
Even if one process leaves a file open and another process (or the same process) reads the file, its contents will be read through the same cache file.
When reading an uncached range (HOLE), the read range is downloaded from the S3 server, written to the cache file, and the HOLE is filled.
When s3fs is terminated (not forcibly), any open files are closed.
The cache file's stat information is also serialized when the file is closed.
This should correctly reflect the state of the cache file(information such as HOLE) in the stat file.
Therefore, the cache file and its stat file left behind when s3fs is terminated remain as a matched pair.
After (re)starting s3fs, these cache files and cache stat files will be loaded and used again when you open the file.
This allows s3fs to know the HOLE area of the cache file even after restarting, and the cached portion can continue to be read from the cache file.
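The persistence scheme described above can be sketched with a toy format. The real s3fs stat file layout is different; this only illustrates saving and reloading which ranges of a sparse cache file are filled, so that HOLEs can still be told apart from data after a restart:

```python
import json

def save_stat(stat_path, ranges):
    """Persist range information as a list of (offset, length, loaded)
    entries, where loaded=True means downloaded data and False a HOLE.
    Toy JSON format for illustration only, not s3fs's actual layout."""
    with open(stat_path, "w") as f:
        json.dump([list(r) for r in ranges], f)

def load_stat(stat_path):
    """Reload the range information, e.g. after s3fs has been restarted."""
    with open(stat_path) as f:
        return [tuple(r) for r in json.load(f)]
```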
When opening a file, the stat of the file on the S3 server is compared with that of the cache file (mtime and file size) to determine whether the cache file is stale.
If the results of this comparison do not match, the cache file will be discarded.
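The comparison described above can be written as a small predicate. The types here are illustrative, not s3fs's internal structures:

```python
from dataclasses import dataclass

@dataclass
class ObjectStat:
    mtime: float  # modification time, seconds since epoch
    size: int     # size in bytes

def cache_is_stale(server: ObjectStat, cached: ObjectStat) -> bool:
    # If either mtime or size on the server differs from what the cache
    # was built from, the cache file must be discarded and re-downloaded.
    return server.mtime != cached.mtime or server.size != cached.size
```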
The s3fs cache is designed and implemented like this.
If s3fs does not download the HOLE range from the S3 server but reads it from the cache file, it may be a problem with s3fs.
Unfortunately, I have not yet been able to reproduce the same phenomenon as this issue, so I am not able to understand the cause.
@beatstream69 commented on GitHub (Apr 3, 2024):
Seems like I experience the same issue. The S3 bucket is mounted via fstab in read-only mode. The files in S3 are not modified.
fstab config
dataset /mnt/dataset fuse.s3fs _netdev,allow_other,use_cache=/mnt/data-ssd/s3-cache,passwd_file=/root/.passwd-s3fs,use_path_request_style,url=https://s3.example.com,uid=1000,gid=1000 0 0

System and s3fs versions

Debian 12.5
Linux jupyter2 6.1.0-18-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.76-1 (2024-02-01) x86_64 GNU/Linux
Amazon Simple Storage Service File System V1.90 (commit:unknown) with GnuTLS(gcrypt)
fuse (2.9.9-6)
The corrupted file is filled with zeros; content of the `.stat` file: