[GH-ISSUE #1269] fsync guarantees for S3FS not documented #683

Closed
opened 2026-03-04 01:47:51 +03:00 by kerem · 6 comments
Owner

Originally created by @igaztanaga on GitHub (Apr 13, 2020).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/1269

According to POSIX, `fsync` (which s3fs implements as `s3fs_fsync`):

"transfers ("flushes") all modified in-core data of (i.e., modified buffer cache pages for) the file referred to by the file descriptor fd to the disk device (or other permanent storage device) so that all changed information can be retrieved even if the system crashes or is rebooted. This includes writing through or flushing a disk cache if present. The call blocks until the device reports that the transfer has completed."

I can't find what this means for s3fs. We were expecting that all data would be uploaded to S3 before `fsync` returns. According to the code (at least in release 1.86), `ent->Flush(false);` is called, which seems to request an asynchronous upload. Shouldn't `ent->Flush(true)` be called to guarantee the data is in S3? AFAIK FUSE does not guarantee that `s3fs_release` is synchronous, so `fsync` looks like the only place to handle this.

Our use case: when running s3fs in a Docker container, we need to know that all data has been transferred to S3 before we kill the container (s3fs may still be transferring in the background even after our process has exited). The obvious and clean approach was to call `fsync` on each written file before closing it, but I don't know whether this gives enough guarantees, and I could not find any hint in the documentation.

kerem closed this issue 2026-03-04 01:47:51 +03:00

@gaul commented on GitHub (Apr 16, 2020):

Agreed that `s3fs_fsync` should call `FdEntity::flush(true)`. Could you send a pull request to do this?

@ggtakec as far as I can tell, this behavior has existed forever, since 26453c4874191185f4a43aa7bcac9ce6d5a58118.


@gaul commented on GitHub (Apr 16, 2020):

It seems like the logic will upload the dirty pages even without `force_sync=true`. What is the use case for `force_sync=false`?


@igaztanaga commented on GitHub (Apr 16, 2020):

Just to make sure: if `force_sync=false` (the current behaviour) uploads dirty pages, does this guarantee that when `fsync(fd)` returns in userspace, all data from file `fd` is in S3?


@gaul commented on GitHub (Apr 18, 2020):

@igaztanaga Yes, it appears that dirty data is synced regardless of the value of `force_sync`. The flag seems to exist to sync the metadata, not the data. Thus the current code path does the correct thing for your use case. Have you observed different behavior?


@ggtakec commented on GitHub (Apr 18, 2020):

@gaul Thanks for your help.
@igaztanaga
In the handling of `s3fs_fsync`, if the target file (object) is open and has been modified, it is uploaded, and the upload is executed as a synchronous process.
Do you mean that you want it uploaded even if it has not changed?


@igaztanaga commented on GitHub (Apr 20, 2020):

@ggtakec
`s3fs_fsync` is exactly what we need. The problem is that if we exit the Docker container just after the process that writes and closes the file ends, then, since `s3fs_release` is asynchronous, we can (in theory) lose data.

I also see that `s3fs_flush` calls `FdEntity::flush(true)`, so in theory a simple `close` guarantees the S3 data is updated, as `s3fs_flush` is guaranteed to be called synchronously for each `close`.

In any case, answering my own question: `fsync` on s3fs guarantees the data will be written to S3, which is great.

I've updated the FAQ with the question "Q: When are the contents of a file guaranteed to be stored on S3?" to try to explain these guarantees. Thanks for all your replies.
