[GH-ISSUE #1013] S3FS generating double S3 ObjectCreated event notifications #553

Closed
opened 2026-03-04 01:46:41 +03:00 by kerem · 11 comments

Originally created by @evh69 on GitHub (Apr 15, 2019).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/1013

@ggtakec in relation to https://github.com/s3fs-fuse/s3fs-fuse/issues/427#issuecomment-478217372 ...

We have an application architecture that needs to support zero-byte files, and we discovered this issue during testing. I am sure this has been considered, but I will ask anyway: could there be a configuration switch that makes s3fs create the initial file with some filename pattern? We could use that pattern in the S3 Event Notification to ignore the initial file creation, similar to the event filter proposed earlier for handling zero-byte files.

For example:

  1. Create the initial file as [FILE_NAME] with an 'S3SF_TEMP' prefix, suffix, or file extension.
  2. Implementors could create an S3 event filter to ignore the 'S3SF_TEMP' prefix, suffix, or file extension.
  3. Open the file and write to it.
  4. Rename the file to its final name?
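S3's native notification filter rules only select keys by prefix and/or suffix and have no exclusion rule, so under this proposal the skipping would most likely happen in the consumer of the events. A minimal Python sketch, assuming the hypothetical `S3SF_TEMP` marker from the list above (s3fs does not create such names) and a consumer that receives the standard S3 event notification JSON; the function name `keys_to_process` is made up for illustration:

```python
from urllib.parse import unquote_plus

# Hypothetical marker taken from the proposal above; s3fs does not use it.
TEMP_MARKER = "S3SF_TEMP"


def keys_to_process(event):
    """Return decoded object keys from ObjectCreated records, skipping temp names."""
    keys = []
    for record in event.get("Records", []):
        # Only react to ObjectCreated:* events (Put, Copy, CompleteMultipartUpload, ...).
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        # Object keys in S3 event notifications are URL-encoded (a space arrives as '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        filename = key.rsplit("/", 1)[-1]
        # A substring check covers the prefix, suffix, and file-extension variants.
        if TEMP_MARKER in filename:
            continue
        keys.append(key)
    return keys
```

A substring match is deliberately loose; a real deployment would pick one exact convention (prefix, suffix, or extension) so that ordinary filenames can never collide with the marker.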

Originally posted by @ggtakec in https://github.com/s3fs-fuse/s3fs-fuse/issues/427#issuecomment-478217372

@ggtakec commented on GitHub (Apr 16, 2019):

@evh69 Thanks for posting a new issue.
Currently, s3fs behaves as follows when creating a file:

  1. s3fs is instructed through FUSE to create a 0-byte file.
    It follows that instruction by creating a 0-byte object in S3.
  2. The file contents are then written.
    After all writes are completed, s3fs re-uploads the object, overwriting the 0-byte one.

Because of this behavior, a 0-byte object is always created first.
This is true whether the file is created by a command or directly through a system call.
As I understand it, for a file that does not exist yet, this step is what makes it possible to obtain a file descriptor for the new file.
For this reason, the 0-byte object is created in the same way whether you create the file under its final name or create it and then rename it.

For s3fs to support WORM (write once, read many), it would have to skip that initial 0-byte object creation.
Since s3fs backs each file descriptor with a cache file on the local file system, we may be able to change s3fs so that it does not upload anything at creation time.
However, we would also need to investigate the impact on other operations (e.g., checking file permissions).
So I do not think this is an easy change.
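To see why this sequence produces two ObjectCreated:Put events, here is a toy Python model contrasting the eager create described above with a deferred create that uploads only on close. It is not s3fs code: the class and function names are made up, and S3 is replaced by an in-memory list where each `put()` stands for one PutObject request.

```python
class RecordingBucket:
    """Toy bucket: every put() stands for one PutObject, i.e. one ObjectCreated:Put event."""

    def __init__(self):
        self.puts = []  # (key, size) in upload order

    def put(self, key, data):
        self.puts.append((key, len(data)))


class ToyFile:
    """Toy model of a newly created file on a FUSE mount (not s3fs source code)."""

    def __init__(self, bucket, key, eager_create):
        self.bucket = bucket
        self.key = key
        self.cache = bytearray()  # stands in for the local cache file
        if eager_create:
            # Step 1 above: the create call uploads a 0-byte object right away.
            self.bucket.put(key, b"")

    def write(self, data):
        # Writes only touch the local cache; nothing is uploaded yet.
        self.cache += data

    def close(self):
        # Step 2 above: on flush, the whole file is uploaded, overwriting the object.
        self.bucket.put(self.key, bytes(self.cache))


def upload(data, eager_create):
    bucket = RecordingBucket()
    f = ToyFile(bucket, "file.zip", eager_create)
    f.write(data)
    f.close()
    return bucket.puts


print(upload(b"x" * 7943, eager_create=True))   # [('file.zip', 0), ('file.zip', 7943)]
print(upload(b"x" * 7943, eager_create=False))  # [('file.zip', 7943)]
print(upload(b"", eager_create=False))          # [('file.zip', 0)]
```

With deferred creation, a genuinely empty file still produces exactly one PUT, which is what a zero-byte-file workflow needs; the hard part, as noted above, is keeping other operations working while the object does not exist yet.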

@dkolli commented on GitHub (Apr 17, 2019):

Is the metadata on S3 any different (other than the object size, of course) between the 0-byte object created when the file descriptor is first opened and the object written when writing completes with 0 bytes? That would help us filter out the first PUT in the zero-byte file scenario and let the second PUT go through.

@gaul commented on GitHub (Apr 17, 2019):

> However, this will also need to investigate the impact on other operations (ex. checking file permissions).

I briefly looked into this, modifying `s3fs_create` to avoid creating the object and instead populating the stat cache with an empty file. One common error comes from `s3fs_utimens`, which tries to copy the metadata from the non-existent object to the new object. I am certain that this can be fixed, but it will require more effort...
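A toy Python model of that idea, assuming a deferred create that records the new file only in a stat cache. Updating metadata by copying an existing object (which S3 reports as an ObjectCreated:Copy event) fails while the object has not been uploaded yet, so this sketch updates the cached metadata instead and lets the first flush upload it. All names (`ToyBucket`, `DeferredMount`, `copy_in_place`, the `pending` set) are hypothetical; this is not how s3fs is actually structured.

```python
class ToyBucket:
    """In-memory stand-in for S3 that logs ObjectCreated event types."""

    def __init__(self):
        self.objects = {}  # key -> (data, metadata)
        self.events = []   # (event name, key)

    def put(self, key, data, meta):
        self.objects[key] = (bytes(data), dict(meta))
        self.events.append(("ObjectCreated:Put", key))

    def copy_in_place(self, key, meta):
        # Replacing metadata by copying an object onto itself requires the object to exist.
        if key not in self.objects:
            raise FileNotFoundError(key)
        data, _ = self.objects[key]
        self.objects[key] = (data, dict(meta))
        self.events.append(("ObjectCreated:Copy", key))


class DeferredMount:
    """Hypothetical deferred-create mount: new files live in the stat cache until flush."""

    def __init__(self, bucket):
        self.bucket = bucket
        self.stat_cache = {}  # key -> metadata dict
        self.pending = set()  # created locally, not yet uploaded

    def create(self, key, mode):
        self.stat_cache[key] = {"mode": mode, "mtime": 0}
        self.pending.add(key)  # no 0-byte PUT here

    def utimens(self, key, mtime):
        self.stat_cache[key]["mtime"] = mtime
        if key not in self.pending:
            # Existing object: push the new metadata to S3.
            self.bucket.copy_in_place(key, self.stat_cache[key])
        # Pending object: the cached metadata goes out with the first flush.

    def flush(self, key, data):
        self.bucket.put(key, data, self.stat_cache[key])
        self.pending.discard(key)


bucket = ToyBucket()
mnt = DeferredMount(bucket)
mnt.create("file.zip", 0o644)
mnt.utimens("file.zip", 1620000000)  # no request: the object is not uploaded yet
mnt.flush("file.zip", b"data")
print(bucket.events)  # [('ObjectCreated:Put', 'file.zip')]
```

Calling `copy_in_place` on a key that was never uploaded raises `FileNotFoundError`, which mirrors the `s3fs_utimens` failure described above.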

@alperen66 commented on GitHub (Feb 18, 2021):

> > However, this will also need to investigate the impact on other operations (e.g. checking file permissions).
>
> I briefly looked into this, modifying `s3fs_create` to avoid creating the object and instead populating the stat cache with an empty file. One common error is `s3fs_utimens`, which tries to copy the metadata from the non-existent object to the new object. I am sure this can be fixed, but it will require more effort...

It's been 2 years; do we have a solution now?

@gaul commented on GitHub (Apr 30, 2021):

@evh69 @dbbyleo could you test whether master resolves your symptoms?

@gaul commented on GitHub (May 1, 2021):

I benchmarked this change with `time for i in $(seq 100); do touch mnt/$i; done` from Japan using a bucket in us-east.

Before:

```
real    3m48.275s
user    0m0.098s
sys     0m0.333s
```

After:

```
real    2m51.464s
user    0m0.119s
sys     0m0.358s
```
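Averaged over the 100 `touch` calls, the wall-clock numbers above work out as follows. This is only a small Python check of the arithmetic on the two `real` values; the helper name `to_seconds` is made up for illustration.

```python
import re


def to_seconds(bash_time):
    """Convert bash `time` output such as '3m48.275s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", bash_time)
    if match is None:
        raise ValueError(f"unexpected format: {bash_time!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


FILES = 100  # the benchmark loop touches 100 files
before = to_seconds("3m48.275s")
after = to_seconds("2m51.464s")

print(f"before:    {before / FILES:.2f} s/file")            # before:    2.28 s/file
print(f"after:     {after / FILES:.2f} s/file")             # after:     1.71 s/file
print(f"saved:     {(before - after) / FILES:.2f} s/file")  # saved:     0.57 s/file
print(f"reduction: {(before - after) / before:.1%}")        # reduction: 24.9%
```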

@CESteinmetz commented on GitHub (May 26, 2021):

Thanks for addressing this. I updated to the tip of master, but I am still seeing two ObjectCreated:Put events in the S3 access logs: the first PUT is a zero-byte file, the second is the full file. Am I misunderstanding this fix?

```
26/May/2021:17:17:55 +0000 | REST.PUT.OBJECT | prefix/filename.zip | 7943 | "PUT /prefix/filename.zip HTTP/1.1"
26/May/2021:17:17:55 +0000 | REST.PUT.OBJECT | prefix/filename.zip | 0 | "PUT /prefix/filename.zip HTTP/1.1"
```

@gaul commented on GitHub (May 27, 2021):

You can watch s3fs make HTTP requests by setting the `-f -o curldbg` flags. It seems like you are not using the s3fs version that you expect.

@CESteinmetz commented on GitHub (May 27, 2021):

I appreciate the prompt response. I enabled debug mode and checked my version:

```
[xxxxxxx@ip-xxxxxxx bin]# s3fs --version
Amazon Simple Storage Service File System V1.89 (commit:4b69d4b) with OpenSSL
Copyright (C) 2010 Randy Rizun <rrizun@gmail.com>
License GPL2: GNU GPL version 2 <https://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
```

The commit hash seems to match the tip of master.

In the debug logs I still see the double call:

```
May 27 07:25:02 ip-xxxxx s3fs[11665]:       uploading... [path=/file.zip][fd=6][size=0]
May 27 07:25:02 ip-xxxxx s3fs[11665]:       computing signature [PUT] [/prefix/file.zip] [] [abcdefghijklmnopqrstuvwxyz]
May 27 07:25:02 ip-10-0-20-14 s3fs[11665]:       url is https://s3-us-east-2.amazonaws.com
May 27 07:25:02 ip-10-0-20-14 s3fs[11665]:       HTTP response code 200
May 27 07:25:02 ip-10-0-20-14 s3fs[11665]: [path=/file.zip][fd=6]
May 27 07:25:02 ip-10-0-20-14 s3fs[11665]:       [tpath=][path=/file.zip][fd=6]
May 27 07:25:02 ip-10-0-20-14 s3fs[11665]:       [tpath=/file.zip]
May 27 07:25:02 ip-10-0-20-14 s3fs[11665]:       URL is https://s3-us-east-2.amazonaws.com/bucket-name/prefix/file.zip
May 27 07:25:02 ip-10-0-20-14 s3fs[11665]:       URL changed is https://bucket-name.s3-us-east-2.amazonaws.com/prefix/file.zip
May 27 07:25:02 ip-10-0-20-14 s3fs[11665]:       uploading... [path=/file.zip][fd=6][size=7943]
May 27 07:25:02 ip-10-0-20-14 s3fs[11665]:       computing signature [PUT] [/prefix/file.zip] [] [abcdefghijklmnopqrstuvwxyz]
```
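When the debug output gets long, one way to spot the double call is to scan it for paths that are uploaded more than once. A minimal Python sketch, assuming the log keeps the `uploading... [path=...][fd=...][size=...]` line format shown above (only that format is taken from this thread; the function names are made up):

```python
import re
from collections import defaultdict

# Matches the "uploading..." lines in the s3fs debug output quoted above.
UPLOAD_RE = re.compile(r"uploading\.\.\. \[path=([^\]]*)\]\[fd=\d+\]\[size=(\d+)\]")


def uploads_per_path(log_lines):
    """Map each path to the sizes it was uploaded with, in log order."""
    uploads = defaultdict(list)
    for line in log_lines:
        match = UPLOAD_RE.search(line)
        if match:
            uploads[match.group(1)].append(int(match.group(2)))
    return dict(uploads)


def double_uploads(log_lines):
    """Paths uploaded more than once, e.g. an empty PUT followed by the real one."""
    return {path: sizes for path, sizes in uploads_per_path(log_lines).items() if len(sizes) > 1}
```

Passing an open log file works too, since the functions only iterate over lines; for the excerpt above the result would be `{'/file.zip': [0, 7943]}`.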

@gaul commented on GitHub (May 27, 2021):

Please `strace` your application to see what calls it is making: if it calls `close` or `fsync`, then s3fs will flush the file. If it does not, try to correlate the strace output with the `s3fs -f -o curldbg` logs and share them here.

@CESteinmetz commented on GitHub (May 27, 2021):

Thanks. I'm using sftp here, and yes, I see the close:

```
May 27 15:34:22 ip-xx-xx-xx-xx internal-sftp[25296]: session opened for local user <user> from [xx.xx.xx.xx]
May 27 15:34:22 ip-xx-xx-xx-xx internal-sftp[25296]: received client version 3
May 27 15:34:22 ip-xx-xx-xx-xx internal-sftp[25296]: realpath "."
May 27 15:34:25 ip-xx-xx-xx-xx internal-sftp[25296]: realpath "/uploads"
May 27 15:34:25 ip-xx-xx-xx-xx internal-sftp[25296]: stat name "/uploads"
May 27 15:34:31 ip-xx-xx-xx-xx internal-sftp[25296]: open "/uploads/file.zip" flags WRITE,CREATE,TRUNCATE mode 0644
May 27 15:34:32 ip-xx-xx-xx-xx internal-sftp[25296]: close "/uploads/file.zip" bytes read 0 written 7943
```
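To take sftp out of the picture, the same open/write/close sequence can be replayed directly against the mount and traced with `strace` alongside `s3fs -f -o curldbg`. A minimal Python sketch; the function name is made up, and the path you pass (for example something under your s3fs mount point) is an assumption about your setup:

```python
import os


def sftp_like_upload(path, payload):
    """Replay the open/write/close sequence from the sftp log above."""
    # sftp logged: flags WRITE,CREATE,TRUNCATE mode 0644 (final mode is still subject to umask).
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        written = 0
        # os.write may write fewer bytes than requested, so loop until done.
        while written < len(payload):
            written += os.write(fd, payload[written:])
    finally:
        # close() is the point at which s3fs flushes the file to S3.
        os.close(fd)
    return written
```

Calling it with a 7943-byte payload on a file inside the mount should reproduce the sftp case; passing `b""` exercises the zero-byte case the issue is about.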