mirror of
https://github.com/s3fs-fuse/s3fs-fuse.git
synced 2026-04-25 05:16:00 +03:00
[GH-ISSUE #1269] fsync guarantees for S3FS not documented #683
Originally created by @igaztanaga on GitHub (Apr 13, 2020).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/1269
According to POSIX "fsync", which in turn translates to s3fs_fsync:
"transfers ("flushes") all modified in-core data of (i.e., modified buffer cache pages for) the file referred to by the file descriptor fd to the disk device (or other permanent storage device) so that all changed information can be retrieved even if the system crashes or is rebooted. This includes writing through or flushing a disk cache if present. The call blocks until the device reports that the transfer has completed."
I can't find what this means for s3fs; we were expecting that all data would be uploaded to S3 before "fsync" returns. According to the code (at least in release 1.86), `ent->Flush(false);` is called, which seems to request an asynchronous upload. Shouldn't `ent->Flush(true)` be called to guarantee data is in S3? AFAIK FUSE does not guarantee that s3fs_release is synchronous, so "fsync" looks like the only place to handle this.
Our use case: when executing s3fuse in a Docker container, we need to know whether all data has been transferred to S3 before we kill the container (s3fs may still be transferring in the background, even if my process has exited). The obvious and clean idea was to call "fsync" on each written file before closing it, but I don't know if this gives enough guarantees, and I could not find any hint in the documentation.
@gaul commented on GitHub (Apr 16, 2020):
Agreed that `s3fs_fsync` should call `FdEntity::flush(true)`. Could you send a pull request to do this?
@ggtakec as far as I can tell, this behavior has existed forever, since 26453c4874.
@gaul commented on GitHub (Apr 16, 2020):
It seems like the logic will upload the dirty pages even without `force_sync=true`. What is the use case for `force_sync=false`?
@igaztanaga commented on GitHub (Apr 16, 2020):
Just to make sure: if using `force_sync=false` (current behaviour) uploads dirty pages, does this guarantee that when `fsync(fd)` returns in userspace, all data from file `fd` will be in S3?
@gaul commented on GitHub (Apr 18, 2020):
@igaztanaga Yes, it appears that dirty data is synced regardless of the value of `force_sync`. It appears that this flag exists to sync the metadata, not the data. Thus the current code path does the correct thing for your use case. Have you observed different behavior?
@ggtakec commented on GitHub (Apr 18, 2020):
@gaul Thanks for your help.
@igaztanaga
When s3fs_fsync is handled, the target file (object) is uploaded if it is open and has been changed.
The upload is performed synchronously.
Do you mean that you want it uploaded even if it has not changed?
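The flush decision discussed in this thread can be pictured with a toy model. This is only a sketch of the behavior as the comments describe it, not s3fs source: `toy_entity` and `toy_flush` are hypothetical names, and the counter merely stands in for a blocking upload.

```c
#include <stdbool.h>

/* Toy stand-in for an open file's cache entry; NOT the s3fs FdEntity class. */
struct toy_entity {
    bool dirty;    /* true if local data changed since the last upload */
    int  uploads;  /* number of uploads performed so far */
};

/* Model of the flush decision: upload when the data is dirty or when forced.
 * The "upload" completes before the function returns (synchronous). */
static int toy_flush(struct toy_entity *e, bool force_sync)
{
    if (!e->dirty && !force_sync)
        return 0;        /* nothing changed and not forced: no upload */
    e->uploads++;        /* stands in for the blocking upload to S3 */
    e->dirty = false;
    return 0;
}
```

In this model, dirty data is uploaded whatever the value of `force_sync`; the flag only matters when nothing has changed.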
@igaztanaga commented on GitHub (Apr 20, 2020):
@ggtakec
s3fs_fsync is exactly what we need. The problem is that if we exit a Docker container just after the process writing and closing the file ends, as `s3fs_release` is asynchronous, we can (in theory) lose data.
I also see that s3fs_flush also calls `FdEntity::flush(true)`, so in theory a simple close guarantees S3 data is updated, as `s3fs_flush` is guaranteed to be called synchronously for each `close`.
In any case, replying to my own question: `fsync` on s3fs guarantees data will be written to S3, which is great.
I've updated the FAQ with the question "Q: When are the contents of a file guaranteed to be stored on S3?" trying to explain these guarantees. Thanks for all your replies.
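For readers with the same use case, the write, `fsync`, close pattern discussed here could look roughly like the sketch below. It uses only standard POSIX calls and contains nothing s3fs-specific; the helper name `write_and_sync` is hypothetical. It assumes, as concluded in this thread, that a successful `fsync` on an s3fs mount means the data has been uploaded.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Write len bytes to path, then fsync and close, reporting any failure.
 * Returns 0 on success, -1 on error (errno is set). */
static int write_and_sync(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted before writing: retry */
            int saved = errno;
            close(fd);
            errno = saved;
            return -1;
        }
        p += n;                    /* handle short writes */
        len -= (size_t)n;
    }

    /* Block until the filesystem reports the data as stored. */
    if (fsync(fd) != 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }

    /* close() can also report an error, so its result is returned. */
    return close(fd);
}
```

A caller would treat any non-zero return as "data may not be stored" and avoid stopping the container until the call has succeeded.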