[GH-ISSUE #1875] Do we have to flush on s3fs_truncate? #955

Closed
opened 2026-03-04 01:50:12 +03:00 by kerem · 13 comments

Originally created by @orozery on GitHub (Jan 27, 2022).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/1875

We have an application that writes large files (~4GB) in 32MB chunks,
where before each 32MB write, the application calls `ftruncate` to increase the file size by 32MB.
This maps to `s3fs_truncate`, which flushes the file.
Writing to the S3 backend becomes very inefficient, as the entire file is re-uploaded for every 32MB written.

I'm wondering if the flush in `s3fs_truncate` can be avoided.
The `ftruncate` man page does not say that a flush is guaranteed.

@gaul, I would be happy to get your thoughts.
Thanks!
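
The access pattern described above can be reproduced against a local file with a small C sketch (small sizes stand in for the 32MB chunks; the function name is just for illustration):

```c
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Reproduce the reported access pattern: grow the file with ftruncate,
 * then write into the newly extended region.  Returns the final file
 * size, or -1 on error. */
static off_t write_in_chunks(const char *path, size_t chunk, int nchunks)
{
    int fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0600);
    if (fd < 0)
        return -1;

    char *buf = malloc(chunk);
    if (buf == NULL) {
        close(fd);
        return -1;
    }
    memset(buf, 'x', chunk);

    off_t size = 0;
    for (int i = 0; i < nchunks; i++) {
        size += (off_t)chunk;
        /* On s3fs, this ftruncate maps to s3fs_truncate, which flushed
         * the whole file on every call -- hence the reported slowdown. */
        if (ftruncate(fd, size) != 0 ||
            pwrite(fd, buf, chunk, size - (off_t)chunk) != (ssize_t)chunk) {
            free(buf);
            close(fd);
            return -1;
        }
    }
    free(buf);
    close(fd);

    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return st.st_size;
}
```

Run against an s3fs mount, each loop iteration would trigger a full flush; on a local filesystem the same pattern is cheap.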

@gaul commented on GitHub (Jan 27, 2022):

I believe that s3fs has always done an `fsync` on `ftruncate`, as far back as 4a30df1ff2. I do not believe that POSIX actually requires this; the slower implementation was only a reflection of s3fs' limited dirty-data tracking at that time. Since `ftruncate` is an uncommon operation, we have not optimized it yet. Could you submit a PR for this?


@gaul commented on GitHub (Jan 28, 2022):

One thing to investigate is whether s3fs calls `ftruncate` on its temporary file so that it can return `ENOSPC` to the application if local storage is too small.
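
For context on this point, a minimal sketch of the two ways of sizing a local temporary file (function name is illustrative, not s3fs code): `ftruncate` typically just extends the logical size, producing a sparse file, so a full disk may only surface at write time, while `posix_fallocate` reserves the blocks and can report `ENOSPC` up front.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Size a local temporary file either by extending its logical length
 * (ftruncate, usually sparse) or by actually reserving the blocks
 * (posix_fallocate).  Returns the resulting logical size, or -1 on
 * error. */
static off_t size_temp_file(const char *path, off_t len, int reserve_blocks)
{
    int fd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0600);
    if (fd < 0)
        return -1;

    if (reserve_blocks) {
        /* posix_fallocate returns an errno value (e.g. ENOSPC)
         * directly instead of setting errno. */
        if (posix_fallocate(fd, 0, len) != 0) {
            close(fd);
            return -1;
        }
    } else if (ftruncate(fd, len) != 0) {
        close(fd);
        return -1;
    }

    struct stat st;
    int rc = fstat(fd, &st);
    close(fd);
    return rc == 0 ? st.st_size : -1;
}
```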


@ggtakec commented on GitHub (Jan 29, 2022):

I think we need to re-check the code that uses the file size after a truncate operation.
The file size is used in various places by operations that occur after the truncate.

For example, when a cache file is used (i.e., when cache directory capacity is available), s3fs tries to obtain the file size from the local cache file.
However, if the cache is not used, the current s3fs will have a problem if the size is not reflected in the object on the S3 server after the truncate operation.

That said, we may be able to skip the fsync while the file is open.
(But that does not look like a simple fix.)


@gaul commented on GitHub (Jan 29, 2022):

If we added the performance counters suggested in #1571, we could modify the integration tests to check the expected number of RPCs. This would ensure that we do not regress performance on operations like truncate.


@ggtakec commented on GitHub (Jan 29, 2022):

As you say, performance evaluation for the cache is necessary.
(Let's define what should be measured in #1571.)

The first comment by @orozery contains the following sentence:

> where before each 32MB write, the application is calling `ftruncate` to increase the file size by 32MB.
> This maps to `s3fs_truncate`, which flushes the file.

In other words, the application repeatedly calls truncate/write pairs.
s3fs (and FUSE) does not know that a write will follow the truncate.
That is why the current implementation flushes on every truncate.

To avoid this, we could change s3fs so that it uploads all dirty parts at once only when close (flush/release) is called.
But I think this workaround cannot be used when the local cache is not used (or cannot be created).
Conversely, when the local cache is available, it may be possible to defer flushing until close.


@orozery commented on GitHub (Jan 30, 2022):

I'm trying to re-think this whole idea.
If the user calls `ftruncate`, it makes sense not to flush and to wait for `flush` or `close`.
However, if the user calls `truncate`, which does not take a file descriptor, then I think they should expect the truncation to be reflected immediately on the S3 server.

With the current `s3fs_truncate` we cannot tell whether truncate was called on a file handle (and which one).

Now I see that in FUSE3 they changed `int (*truncate) (const char *, off_t)` to `int (*truncate) (const char *, off_t, struct fuse_file_info *fi)`.
Specifically here:
https://github.com/libfuse/libfuse/issues/58

I guess s3fs is built against FUSE2, which does not have this new API?
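
A compilable sketch of how the FUSE3 signature could be used to make the distinction described above. The helper functions and the flush counter are hypothetical stand-ins for s3fs internals, and `fuse_file_info` is left opaque so the fragment builds without libfuse headers:

```c
#include <stddef.h>
#include <sys/types.h>

struct fuse_file_info;  /* opaque here; defined by libfuse */

/* Hypothetical helpers: this sketch only records which path was taken. */
static int flush_count;

static int resize_open_file(struct fuse_file_info *fi, off_t size)
{
    (void)fi; (void)size;
    return 0;  /* defer: the dirty size is flushed at close */
}

static int truncate_and_flush(const char *path, off_t size)
{
    (void)path; (void)size;
    flush_count++;  /* path-only truncate(): push the new size to S3 now */
    return 0;
}

/* FUSE3-style truncate hook.  Unlike FUSE2's
 *   int (*truncate)(const char *, off_t);
 * the FUSE3 hook receives a fuse_file_info pointer, which is non-NULL
 * when the kernel issued ftruncate() on an open handle, so the
 * immediate flush could be skipped in that case. */
static int s3fs_truncate3(const char *path, off_t size,
                          struct fuse_file_info *fi)
{
    if (fi != NULL)
        return resize_open_file(fi, size);
    return truncate_and_flush(path, size);
}
```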


@ggtakec commented on GitHub (Jan 31, 2022):

*(For the details, you would have to check the source code.)*

s3fs was changed to use an internal pseudo file descriptor (a number recognized only inside s3fs, not an fd issued by the system) when a file is opened.
So I think this pseudo file descriptor can be used to determine whether the file is open.


@gaul commented on GitHub (Feb 6, 2022):

While s3fs should optimize `truncate`, should your application call `posix_fallocate` instead?


@orozery commented on GitHub (Feb 6, 2022):

> While s3fs should optimize `truncate`, should your application call `posix_fallocate` instead?

Well, my application is the closed-source (Symantec) Ghost, so I cannot control its POSIX calls.
But even so, s3fs currently does not implement `fuse_operations.fallocate`.


@ggtakec commented on GitHub (Feb 6, 2022):

I think it is possible to implement `fuse_operations.fallocate` in s3fs.
(Depending on the mode flags, the implementation could be difficult.)

Do you intend to solve this problem by setting (keeping) the file size with `fallocate` and then writing the additional data?
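
To illustrate why the mode flags are the hard part, here is a hedged, compilable sketch of a `fuse_operations.fallocate` handler that supports only the simple cases. The size-tracking helper is hypothetical, the flag value is copied from `<linux/falloc.h>` so the fragment is self-contained, and `fuse_file_info` is left opaque:

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Value from <linux/falloc.h>, repeated here for self-containment. */
#define FALLOC_FL_KEEP_SIZE 0x01

struct fuse_file_info;  /* opaque here; defined by libfuse */

/* Hypothetical stand-in for s3fs's internal size tracking. */
static off_t recorded_size;

static int grow_tracked_size(off_t new_size)
{
    if (new_size > recorded_size)
        recorded_size = new_size;
    return 0;
}

/* Sketch of a fallocate hook: plain allocation (mode 0) extends the
 * tracked size, KEEP_SIZE is a no-op for an object store, and anything
 * else (punch hole, collapse range, ...) is refused -- which is why
 * full mode-flag support would be difficult.  FUSE handlers return
 * negated errno values on failure. */
static int s3fs_fallocate(const char *path, int mode,
                          off_t offset, off_t length,
                          struct fuse_file_info *fi)
{
    (void)path; (void)fi;
    if (mode == 0)
        return grow_tracked_size(offset + length);
    if (mode == FALLOC_FL_KEEP_SIZE)
        return 0;  /* nothing to reserve on S3; size is unchanged */
    return -EOPNOTSUPP;
}
```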


@orozery commented on GitHub (Feb 7, 2022):

> I think it is possible to implement `fuse_operations.fallocate` in s3fs. (Depending on the mode flags, the implementation could be difficult.)
>
> Do you intend to solve this problem by setting (keeping) the file size with `fallocate` and then writing the additional data?

As I said, I cannot control the application, which uses `ftruncate`.
The only option I see is to switch s3fs from `libfuse-dev` to `libfuse3-dev`. Has this been considered?


@ggtakec commented on GitHub (Feb 13, 2022):

For this matter, I created PR #1887.
@orozery, please try it if you can.

I thought that s3fs may not need to flush when a file is resized; #1887 implements that.

The logic of the `s3fs_truncate` function had not been reviewed for a while, so it performed unnecessary downloads and flushes even when only the size of a file being modified changed.
I reviewed these code paths and made them simpler.

The behavior under FUSE2 is as follows (if I am not misunderstanding it):
- If the user calls `truncate` on an unopened file, FUSE opens the file and then calls `s3fs_truncate`.
- If the file is already open, `s3fs_truncate` is called as is.
- (Inside `s3fs_truncate`, these two cases cannot be distinguished.)
- After `s3fs_truncate` returns, the file is flushed when it is closed. (If the file was opened before the truncate, the flush happens at close.)

So we can remove the direct call to Flush in `s3fs_truncate`, as #1887 does.

In FUSE3 (which I am not yet completely familiar with), a `fuse_file_info*` argument is added to the `truncate` hook, but I think it behaves the same as FUSE2 as far as s3fs is concerned.
Internally, s3fs manages the open file descriptor of the target file itself, which carries the same information as this FUSE3 structure.
Therefore, I do not think the `s3fs_truncate` implementation needs to differ between FUSE2 and FUSE3 (and if it does, it is a small change).


@gaul commented on GitHub (Feb 15, 2022):

@orozery Please test your application with the latest master and report if this addresses your symptoms.
