[GH-ISSUE #427] Getting Double PUTs when writing files #228

Closed
opened 2026-03-04 01:43:25 +03:00 by kerem · 15 comments

Originally created by @barryhunter on GitHub (Jun 2, 2016).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/427

Since using s3fs, we have noticed a high cost for:

> $0.0125 per GB-Month prorated for small objects deleted or overwritten before 30 days in Standard-Infrequent Access

However, we don't delete anything; we are only writing at this time!

We enabled the S3 logs and noticed an apparent double PUT.

Captured log, for writing one single file!
http://data.geograph.org.uk/amazon-log-example.txt
Note the logs only have one-second granularity, so the log entries are almost certainly out of order; I had to sort the list to combine entries from multiple distinct Amazon-provided log files.

It seems there is a 0-byte PUT and then a final 106490-byte PUT. (I'm guessing that was the order anyway, because there is a 106490-byte file in S3.)

The file is copied with the Python shutil.copyfile function. We used copy2 originally, but that resulted in many PUTs (s3fs would write the file, then the chown/touch etc. would result in two COPY operations). I just can't figure out how to eliminate this last double write.
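
A minimal sketch of the difference described above, with hypothetical paths (copy2 is copyfile plus copystat(), and the metadata updates are what s3fs turns into extra COPY requests):

```python
# Sketch: shutil.copyfile writes only the file contents; shutil.copy2
# also copies metadata (mode, mtime), which s3fs translates into extra
# COPY requests against S3. Both paths below are placeholders.
import shutil

src = "/local/data/example.jpg"    # hypothetical local source
dst = "/mnt/s3bucket/example.jpg"  # hypothetical s3fs mount point

shutil.copyfile(src, dst)          # contents only: no chmod/utime follow-up
# shutil.copy2(src, dst)           # would also run copystat() -> extra COPYs
```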

(As for the large number of 404 requests before writing, I'm hoping enable_noobj_cache will help with those. I would also like to turn off the checks for all the 'compatibility' folder types; we don't need them, and they won't exist. But that's a separate issue!)

kerem closed this issue 2026-03-04 01:43:25 +03:00

@ggtakec commented on GitHub (Jun 12, 2016):

@barryhunter

You are correct: s3fs PUTs an object twice when it creates a file on S3.
This is tied to how FUSE works.
The copy command first creates the file, then writes the data using the file handle.
When s3fs creates the file, it PUTs a 0-byte object to S3.
A future version may be able to change this behavior, but that is still undecided.

Regards,
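
As an editorial illustration of the sequence described above, a minimal Python sketch (the mount path is hypothetical): opening a new file under the mount maps to the FUSE create() callback, where the 0-byte PUT happens, and the full-size PUT follows when the file is flushed/released on close.

```python
# Sketch of the POSIX create-then-write sequence that yields two PUTs
# through s3fs. The mount point below is a placeholder.
with open("/mnt/s3bucket/newfile.bin", "wb") as f:  # FUSE create() -> 0-byte PUT
    f.write(b"payload bytes")                       # data buffered locally
# closing the file flushes/releases it -> second, full-size PUT
```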


@patroy commented on GitHub (Sep 5, 2016):

This is also a problem when you use S3 event notifications. It creates two SQS, Lambda, or SNS requests.


@gabpaladino commented on GitHub (Sep 30, 2016):

Hi, I'm using this with Lambda and have to pay twice :(
Is there a workaround to filter events with object "size": 0 so they don't trigger the Lambda?
Regards.


@patroy commented on GitHub (Sep 30, 2016):

I do it in the Lambda code:

if (event.Records[0].s3.object.size > 0) ...

But you will still get charged for those. I just do this to prevent the rest of my code from running twice.
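
In Python, the same guard could look like this sketch (standard S3 notification record fields; the processing body is a placeholder):

```python
# Sketch of a Lambda handler that skips the 0-byte "create" PUT from s3fs
# and only processes the final full-size object.
import urllib.parse

def lambda_handler(event, context):
    for record in event["Records"]:
        size = record["s3"]["object"]["size"]
        if size == 0:
            continue  # ignore the initial 0-byte PUT
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        print(f"processing s3://{bucket}/{key} ({size} bytes)")
        # ... real work goes here ...
```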


@gabpaladino commented on GitHub (Sep 30, 2016):

Yes @patroy, I'm coding it too. It would be nice if such a filter were possible, like the prefix and suffix filters.
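
For what it's worth, S3 event notifications do support key name prefix/suffix filters (though not object-size filters, so the 0-byte PUT still can't be excluded this way). A boto3 sketch, with placeholder bucket and function names:

```python
# Sketch (boto3): attach a Lambda notification with prefix/suffix filters.
# Bucket name and Lambda ARN below are placeholders.
import boto3

s3 = boto3.client("s3")
s3.put_bucket_notification_configuration(
    Bucket="example-bucket",
    NotificationConfiguration={
        "LambdaFunctionConfigurations": [{
            "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:process-upload",
            "Events": ["s3:ObjectCreated:Put"],
            "Filter": {"Key": {"FilterRules": [
                {"Name": "prefix", "Value": "uploads/"},
                {"Name": "suffix", "Value": ".jpg"},
            ]}},
        }]
    },
)
```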


@derekmurawsky commented on GitHub (Mar 24, 2017):

+1, this is really bad/unexpected behavior. I, too, run Lambdas based on files created in S3, and this can cause a lot of issues. I will implement the workaround in my Lambda as suggested by @patroy, but this really shouldn't be the behavior. At the very least, can we get an option to disable it?


@kahing commented on GitHub (Mar 29, 2017):

Part of the reason for the double PUT is the strong POSIX compatibility offered by s3fs. If you don't need that much POSIX compliance, there are other tools that will use fewer S3 requests.


@MrMitch17 commented on GitHub (Apr 17, 2017):

I am running into this issue also, where a single file upload causes my Lambda code to trigger twice. I can code around it, but I figured I'd make a note that I am experiencing it too.


@ggtakec commented on GitHub (Jan 14, 2018):

The current s3fs does not have an option to avoid this.
When updating a file, POSIX-compliant behavior means the file is first created and then updated.
If possible, you can avoid the problem by checking in Lambda whether the object size is 0 bytes and skipping processing when it is.
If we were to change s3fs, we would PUT on the release call rather than on flush of the file descriptor.
However, this change would affect many operations; it would be a difficult piece of remodeling.
If possible, we hope you will check for 0 bytes in your Lambda processing.
Or you can use goofys (@kahing) instead of s3fs.

Thanks in advance for your help.


@kollyma commented on GitHub (Dec 13, 2018):

@ggtakec: Thanks for the details.
We ran into this issue with WORM buckets (Write Once, Read Many).
Any plans to integrate an option for "less POSIX compliance but WORM bucket support"?


@ggtakec commented on GitHub (Mar 30, 2019):

We have kept this issue open for a long time.
We will keep thinking about "less POSIX compliance but WORM bucket support".
I will close this, but if the problem persists, please reopen it or post a new issue.


@mitexleo commented on GitHub (Dec 18, 2024):

Is there any workaround to avoid this issue?


@gaul commented on GitHub (Dec 19, 2024):

This issue was fixed years ago. Please test with the latest version and open a new issue if you can reproduce the symptoms.


@mitexleo commented on GitHub (Dec 20, 2024):

I'm using the latest version available in the Ubuntu APT repository, and a file is being uploaded twice.



@gaul commented on GitHub (Dec 20, 2024):

I'm locking this thread. Please open a new bug with debug logging.
