[GH-ISSUE #16] empty file is written to s3 #11
Originally created by @timurb on GitHub (Feb 24, 2014).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/16
Not sure this is related to #11 but I've decided to create a separate issue.
While writing to s3fs we quite often see that a file of 0 bytes is actually written.
The file size was 75,661,483,206 bytes the last time we saw that, and about the same size in the previous cases. I think we see this only for that one big file, while smaller files (like 20Gb) are written ok.
We use the following command to run s3fs:
S3fs version is 1.74
The logs for the case are the following:
Do you have any idea what could be the reason for that behaviour and how we could fix it?
Thanks.
@ggtakec commented on GitHub (Mar 17, 2014):
Hi,
I saw your log, and I have some questions about your process.
I want to know what the /usr/local/bin/cleanup_s3fs_cache script(?) is doing; it seems to read a file (xxx_0198).
The log shows that s3fs put a zero-byte file, but I could not find anything wrong...
I want to know what you are doing, your expected result, etc. Please let me know about those.
I'm sorry for replying late, and thanks in advance for your help.
@timurb commented on GitHub (Mar 17, 2014):
I think the script has nothing to do with this issue and the reason lies in very large files like 75Gb.
After I started splitting that big file into 2 smaller files (50Gb+25Gb), I no longer see the issue.
If s3fs can't handle big files, an error message would probably be enough.
Just for reference, here is the script I'm using to clean up the cache now: it checks whether a file in the cache is opened by any process and deletes it if no process is accessing the cache file.
This is not the same case as #10, since in that case I simply erased the file with no additional checks.
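The script itself is not reproduced in this thread; a minimal sketch of that kind of cleanup (the cache path and the use of fuser are assumptions, not the original script) might look like:

```sh
#!/bin/sh
# Sketch of a cache cleanup: delete files from the s3fs local cache
# only when no process currently has them open.
CACHE_DIR=/var/cache/s3fs   # assumed; should match the -o use_cache directory

find "$CACHE_DIR" -type f | while read -r f; do
    # fuser -s exits non-zero when no process is accessing the file
    if ! fuser -s "$f" 2>/dev/null; then
        rm -f "$f"
    fi
done
```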
@ggtakec commented on GitHub (Mar 18, 2014):
I think the following happens with your script.
At first s3fs makes a zero-byte object before uploading the full size of the file.
s3fs does not keep a file descriptor open for the temporary file and stat file during object uploads.
It means that s3fs opens and closes these temporary files during uploading.
So I think your script removes these files even though s3fs still needs to read/write them.
If that is the reason for this problem, we can check it.
Please do not run your script; we want to know whether the same error still occurs.
If you can, please do it.
Thanks in advance for your help.
@timurb commented on GitHub (Mar 18, 2014):
Ok, I'll check that over the weekend. Thanks for the quick reply!
@timurb commented on GitHub (Mar 23, 2014):
I've just disabled the cleanup script and I still see the issue.
Here is the log. There are some additional lines here caused by my browsing the s3fs folder.
I strongly suspect the reason for this is that I uploaded a 75Gb file, and I've seen somewhere that s3fs has a limit of 64Gb.
For reference, here are the sizes of the files (the first one is on the s3fs mount, the second one is in the cache).
@ggtakec commented on GitHub (Mar 29, 2014):
Yes, s3fs limits the size of uploaded objects to 64GB.
Please see, https://github.com/s3fs-fuse/s3fs-fuse/blob/master/src/fdcache.cpp#L890
I think that if you can change the code for a test, you can upload a file over 64GB.
If you change MAX_OBJECT_SIZE and FDPAGE_SIZE in fdcache.cpp, and MULTIPART_SIZE in curl.cpp, you will be able to upload beyond the default limit (64GB).
https://github.com/s3fs-fuse/s3fs-fuse/blob/master/src/fdcache.cpp#L54
https://github.com/s3fs-fuse/s3fs-fuse/blob/master/src/curl.cpp#L138
But if it does not work, you probably need to change more.
@timurb commented on GitHub (Mar 29, 2014):
Having a limit on the max file size is ok, but you'd better know about it at the moment of writing, not some months later when you need that file and it is gone.
Could you please fix s3fs so that it produces some kind of error if you try to write more than the max allowed size?
Thanks in advance!
@ggtakec commented on GitHub (Mar 29, 2014):
This limit has carried over from an old version.
I think this upper limit can be expanded, and maybe I can change it; please wait for the change and testing.
But if you need it soon, you can change the symbols yourself.
for example:
MAX_OBJECT_SIZE 137438953470LL
FDPAGE_SIZE 100 * 1024 * 1024
MULTIPART_SIZE 20971520
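As a rough sketch of how to apply those example values (the exact form of the definitions should be checked at the lines linked above, and the build steps assume the standard autotools build from the project README):

```sh
# Edit the constants mentioned above, e.g.:
#   src/fdcache.cpp : MAX_OBJECT_SIZE -> 137438953470LL
#                     FDPAGE_SIZE     -> 100 * 1024 * 1024
#   src/curl.cpp    : MULTIPART_SIZE  -> 20971520
# then rebuild and reinstall s3fs:
./autogen.sh
./configure
make
sudo make install
```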
@timurb commented on GitHub (Mar 29, 2014):
The current limit is ok for me.
My point is: the limit should be very hard or impossible to reach (probably some terabytes at least), or we should receive an explicit error message when we write past the limit.
@ggtakec commented on GitHub (Mar 30, 2014):
s3fs returns the error code ENOTSUP in most cases when going over 64GB.
I created a new branch "upperlimit#16" which adds a new option, "multipart_size".
This option value is the size of one part (in MB) for multipart uploading (default 10MB).
Please check it and try it.
Thanks,
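A hypothetical way to try the option from that branch (bucket name, mountpoint, and cache path are placeholders; multipart_size is the per-part size in MB):

```sh
# Mount with a 20MB multipart part size instead of the 10MB default.
s3fs mybucket /mnt/s3 -o multipart_size=20 -o use_cache=/var/cache/s3fs
```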
@ggtakec commented on GitHub (Apr 4, 2014):
Merged the code to the master branch.
Please try the master branch, and if you find bugs, please post a new issue.
Regards,