mirror of
https://github.com/s3fs-fuse/s3fs-fuse.git
synced 2026-04-25 05:16:00 +03:00
[GH-ISSUE #427] Getting Double PUTs when writing files #228
Originally created by @barryhunter on GitHub (Jun 2, 2016).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/427
Since using s3fs, we noticed we are getting a high request cost.
However, we don't delete anything; we are only writing at this time!
I enabled the S3 logs and noticed an apparent double PUT.
Here is a captured log for writing one single file:
http://data.geograph.org.uk/amazon-log-example.txt
Note the logs only have one-second granularity, so the log entries are almost certainly out of order; I had to sort the list to combine entries from multiple distinct Amazon-provided log files.
It seems there is a 0-byte PUT, and then a final 106490-byte PUT. (I'm guessing that was the order anyway, because there is a 106490-byte file in S3.)
The file is copied with the Python shutil.copyfile function. I originally used copy2, but that resulted in even more PUTs (s3fs would write the file, then the chown/touch etc. would result in two COPY operations). I just can't figure out how to eliminate this last double write.
(As for the large number of 404 requests before writing, I'm hoping enable_noobj_cache will help. I would also like to turn off checking for all the 'compatibility' folder types; we don't need them and they won't exist. But that's a separate issue!)
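The copyfile-versus-copy2 difference the reporter describes can be sketched in Python (a minimal, hypothetical helper; `upload_to_s3fs_mount` is an illustrative name, not part of s3fs):

```python
import shutil

def upload_to_s3fs_mount(src, dst):
    """Copy a file onto an s3fs mount with fewer S3 requests.

    shutil.copyfile copies only the file data. shutil.copy2 additionally
    calls copystat() to replicate metadata (mtime, mode), and on a FUSE
    mount those extra chown/utime calls show up as additional COPY
    operations in the S3 access log.
    """
    shutil.copyfile(src, dst)
```

This avoids the metadata-driven COPY requests, though (as the thread explains) the initial 0-byte PUT on file creation still happens inside s3fs itself.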
@ggtakec commented on GitHub (Jun 12, 2016):
@barryhunter
You are correct: s3fs PUTs an object twice when it creates a file on S3.
This is a consequence of how FUSE works.
For a copy operation, FUSE first creates the file, then writes the data through the file handle.
When s3fs creates the file, it PUTs a 0-byte object to S3.
A future version may be able to change this behavior, but that is still undecided.
Regards,
@patroy commented on GitHub (Sep 5, 2016):
This is also a problem when you use S3 event notifications: it creates two SNS, SQS, or Lambda requests.
@gabpaladino commented on GitHub (Sep 30, 2016):
Hi, I'm using this with Lambda and have to pay twice :(
Is there some workaround to filter out events with object "size": 0 so they don't trigger the Lambda? Regards.
@patroy commented on GitHub (Sep 30, 2016):
I do it in the Lambda code:
if (event.Records[0].s3.object.size > 0) ...
But you will still get charged for those; I just do this to prevent the rest of my code from running twice.
@gabpaladino commented on GitHub (Sep 30, 2016):
Yes @patroy, I'm coding it too. It would be nice if a filter by prefix and suffix were possible.
@derekmurawsky commented on GitHub (Mar 24, 2017):
+1, this is really bad/unexpected behavior. I, too, run Lambdas based on files created in S3, and this can cause a lot of issues. I will implement the workaround in my Lambda as suggested by @patroy, but this really shouldn't be the behavior. At the very least, can we get an option to disable it?
@kahing commented on GitHub (Mar 29, 2017):
Part of the reason for the double PUT is the strong POSIX compatibility offered by s3fs. If you don't need that much POSIX compliance, there are other tools that will use fewer S3 requests.
@MrMitch17 commented on GitHub (Apr 17, 2017):
I am running into this issue also: a single file upload is causing my Lambda code to trigger twice. I can code around it, but I figured I'd make a note that I am experiencing it too.
@ggtakec commented on GitHub (Jan 14, 2018):
The current s3fs does not have an option to avoid this.
When a file is updated, POSIX-compliant behavior requires that the file first be created and then updated.
If possible, you can avoid the problem by checking in Lambda whether the file size is 0 bytes and skipping processing when it is.
If we changed s3fs, we would PUT the object at the release call rather than on each flush of the file descriptor.
However, this change would affect many operations; it is a reworking with a high degree of difficulty.
If possible, we hope you will check for 0 bytes in your Lambda processing.
Or you can use goofys (@kahing) instead of s3fs.
Thanks in advance for your help.
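The 0-byte check suggested above can be sketched as a Python Lambda handler (an illustrative sketch, not from the thread; `is_real_upload` and `handler` are hypothetical names, and the record shape follows the standard S3 event notification format):

```python
def is_real_upload(record):
    # s3fs first PUTs an empty object when the file is created, then
    # PUTs the full content on flush; only the second event carries a
    # non-zero object size.
    return record["s3"]["object"]["size"] > 0

def handler(event, context):
    """Process only non-empty S3 objects, skipping s3fs's create PUT."""
    processed = []
    for record in event.get("Records", []):
        if not is_real_upload(record):
            continue  # skip the 0-byte create event
        processed.append(record["s3"]["object"]["key"])
    return {"processed": processed}
```

Note that, as mentioned earlier in the thread, both PUT requests are still billed; this only prevents the downstream code from running twice.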
@kollyma commented on GitHub (Dec 13, 2018):
@ggtakec: Thanks for the details
We run into this issue with WORM Buckets (Write-Once-Read-Many).
Any plans to integrate an option for "less POSIX compliance but WORM Bucket support"?
@ggtakec commented on GitHub (Mar 30, 2019):
We have kept this issue open for a long time.
We will keep thinking about "less POSIX compliance but WORM bucket support".
I will close this, but if the problem persists, please reopen it or post a new issue.
@mitexleo commented on GitHub (Dec 18, 2024):
Any workaround to avoid this issue?
@gaul commented on GitHub (Dec 19, 2024):
This issue was fixed years ago. Please test with the latest version and open a new issue if you can reproduce the symptoms.
@mitexleo commented on GitHub (Dec 20, 2024):
I'm using the latest version available in the Ubuntu APT repository, and a file is being uploaded twice.
@gaul commented on GitHub (Dec 20, 2024):
I'm locking this thread. Please open a new bug with debug logging.