Mirror of https://github.com/s3fs-fuse/s3fs-fuse.git (synced 2026-04-25 05:16:00 +03:00)
[GH-ISSUE #535] File in s3 ends up truncated to 0 bytes #306
Originally created by @alexeits on GitHub (Feb 16, 2017).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/535
A 75K file ended up in S3 truncated to 0 bytes. We analyzed the logs and couldn't find anything wrong.
It appears that s3fs first creates a file named `.in.filename` and then renames it to `filename`. According to the S3 object copy documentation (the emphasis is mine):
It doesn't appear that s3fs handles the case of an error in a 200 response, at least based on this code fragment.
Could you please confirm (or deny)?
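For context, the failure mode described above can be sketched as follows. This is an illustrative Python snippet (not s3fs code, which is C++) showing why checking only the HTTP status is insufficient: S3's CopyObject can return HTTP 200 with an `<Error>` document in the body, so a correct client must also inspect the body.

```python
import xml.etree.ElementTree as ET

def copy_response_ok(status_code, body):
    """Return True only if the copy really succeeded.

    S3 CopyObject may send HTTP 200 and then report a failure
    inside the response body, so a 200 status alone proves nothing.
    """
    if status_code != 200:
        return False
    root = ET.fromstring(body)
    # A failed copy embeds an <Error> element instead of <CopyObjectResult>.
    return not root.tag.endswith("Error")

# A genuine success: the body carries <CopyObjectResult>.
ok_body = "<CopyObjectResult><ETag>\"abc\"</ETag></CopyObjectResult>"
# A failure hidden behind HTTP 200: the body carries <Error>.
err_body = "<Error><Code>InternalError</Code><Message>We encountered an internal error.</Message></Error>"

print(copy_response_ok(200, ok_body))   # True
print(copy_response_ok(200, err_body))  # False
```

The `endswith` check also tolerates a namespaced root tag, since ElementTree renders it as `{namespace}Error`.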
Additional Information
The following information is very important in order for us to help you. Omitting these details may delay your support request or cause it to receive no attention at all.
Version of s3fs being used (s3fs --version)
Amazon Simple Storage Service File System V1.80 (commit:unknown) with OpenSSL
Version of fuse being used (pkg-config --modversion fuse)
2.9.2-4ubuntu4.15.04.1
System information (uname -a)
4.7.3-coreos-r2 #1 SMP Thu Feb 2 02:26:10 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Distro (cat /etc/issue)
Ubuntu 15.04
s3fs command line used (if applicable)
The original logs no longer exist, but here are some snippets from the log collector
Details about issue
@ggtakec commented on GitHub (Apr 2, 2017):
@alexeits I'm sorry for my late reply.
s3fs does not create objects named ".in.*" as part of its internal processing.
If you simply create a file under the s3fs mount directory, s3fs creates the object "filename" directly.
Please let us know which s3fs options you were using when this problem occurred.
Also, is there any application you are using on top of the s3fs file system?
(It seems that the application is creating an ".in.GLADLYDTA.CSV." file and renaming it to "GLADLYDTA.CSV".)
Lastly, I think the explanation of the PUT API is correct.
The part you pointed out is the response handling for a single s3fs request, and an error inside a 200/300 response is not handled as an error there.
Thanks in advance for your assistance.
@alexeits commented on GitHub (Apr 3, 2017):
@ggtakec thanks for replying
Here are answers to your questions.
Yes, it's the `proftpd` FTP server that comes with ubuntu:15.04. That's what is creating those `.in.*` files.
@ggtakec commented on GitHub (Apr 9, 2017):
@alexeits I understand that you are using proftpd with s3fs.
First of all, no error is found in the log attached to this issue.
In the log, s3fs only appears up to the rename of ".in.GLADLYDTA.CSV." to "GLADLYDTA.CSV".
The reason "GLADLYDTA.CSV" is 0 bytes is that either uploading the ".in.GLADLYDTA.CSV." file failed and left a 0 byte ".in.GLADLYDTA.CSV.", or the rename processing failed.
Is there any log output from after the point when the error occurred?
If the s3fs rename failed, it may be followed by an error message.
Otherwise, we need to check the elided ("...") part of the log from L788 to L833.
Thanks in advance for your help.
@alexeits commented on GitHub (Apr 13, 2017):
@ggtakec let me try to answer your questions one by one
I don't believe that either of these operations failed in a way that was detected by s3fs. Lines 787-788 of the log show that 77896 bytes of the ".in.GLADLYDTA.CSV." file were uploaded successfully:
As for the rename, I'm certain that NO errors were reported. As I stated in the issue, we did not find any errors in the log at all. My hypothesis is that S3 failed after the copy operation had started. In this case S3 returns 200 and, as we discussed earlier, s3fs does NOT treat a 200/300 response from the copy operation as an error, which I believe is why we ended up with 0 bytes in the destination file "GLADLYDTA.CSV".
It seems that the best course of action for s3fs would be to check the copy operation response for errors even in case of getting HTTP status 200. Please let me know if you disagree.
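One way a client could act on that check (a hedged sketch, not s3fs's actual implementation): since AWS documents that such embedded errors may be transient and the request can be retried, a caller could retry a bounded number of times before surfacing the failure. The `do_copy` callable here is a hypothetical stand-in for the real copy request.

```python
import xml.etree.ElementTree as ET

def copy_with_retry(do_copy, attempts=3):
    """Retry an S3 copy when a 200 response hides an <Error> body.

    do_copy is a hypothetical callable returning (status_code, body).
    Returns True on a verified success, False after exhausting retries.
    """
    for _ in range(attempts):
        status, body = do_copy()
        if status == 200 and not ET.fromstring(body).tag.endswith("Error"):
            return True
    return False

# Simulated copy that fails once inside a 200 response, then succeeds.
responses = iter([
    (200, "<Error><Code>InternalError</Code></Error>"),
    (200, "<CopyObjectResult><ETag>\"abc\"</ETag></CopyObjectResult>"),
])
print(copy_with_retry(lambda: next(responses)))  # True
```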
@ggtakec commented on GitHub (May 5, 2017):
@alexeits
As you say, uploading seems to have succeeded.
However, if the rename had failed, I do not think it would produce a 0 byte file.
This is because s3fs uses the copy API (PUT HEADER) for rename processing, and the copy of the file object is performed inside S3.
As far as the logs are concerned, s3fs also succeeded in this PUT processing.
(s3fs does handle errors from this PUT HEADER request, but there seems to be no error here.)
If you can reproduce it, could you get the log with the curldbg option?
It will be helpful if you get more detailed logs.
Thanks in advance for your assistance.
@alexeits commented on GitHub (May 6, 2017):
@ggtakec
It won't be possible to collect more detailed logs at this time. I will make sure to collect them if the issue occurs again, although it's so rare that we might not see it for months or even longer.
I'd like to challenge your assertion that a failed rename could not produce a 0 byte file. As we discussed earlier, the S3 copy API (PUT HEADER) may fail inside S3, and S3 would return 200 even though the operation failed. Citing the relevant passage from the S3 docs again:
Also as we have established in our discussion s3fs does NOT currently handle errors returned in 200 OK responses. I do believe the lack of handling for such errors in s3fs is the most likely reason for the silent rename failure.
@ggtakec commented on GitHub (May 6, 2017):
@alexeits
I had missed the essence of this problem.
I added a simple check based on your advice.
When renaming, if an error occurs during S3 internal processing, it is now reported as an EIO error.
Please use the latest code in the master branch and check it.
If this fix resolves the problem for you, please close this issue.
Thanks in advance for your help.
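As I read the fix described above (the actual change is C++ inside s3fs and may differ in detail), it amounts to mapping an error-in-200 copy response to -EIO, so the rename syscall fails visibly instead of silently succeeding. A minimal Python sketch of that mapping:

```python
import errno
import xml.etree.ElementTree as ET

def rename_result(status_code, body):
    """Map an S3 copy (rename) response to a FUSE-style return code.

    Returns 0 on success; -EIO when S3 reports an error, including
    the case where the error is embedded in an HTTP 200 response.
    """
    if status_code != 200:
        return -errno.EIO
    if ET.fromstring(body).tag.endswith("Error"):
        return -errno.EIO
    return 0

print(rename_result(200, "<CopyObjectResult/>"))  # 0
# An error hidden in a 200 response now maps to -EIO (-5 on Linux).
print(rename_result(200, "<Error><Code>InternalError</Code></Error>"))
```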
@sqlbot commented on GitHub (May 6, 2017):
@alexeits it seems unlikely that an S3 internal error would result in a 0 byte object, since writes are atomic thus an internal failure should result in nothing happening to the bucket at all.
It seems plausible that there's a consistency model issue at play. I don't think we have a way (from the outside looking in) to determine conclusively whether PUT/Copy always makes a copy of the most recently written instance of the object.
We can conclude anecdotally that S3 doesn't literally overwrite objects internally: overwriting an object doesn't interfere with users who are in the middle of downloading it. More likely, overwriting allocates new storage and pushes an update to the bucket's index, with index replication being the presumed cause of the eventual-consistency behavior.
So while I think the change here is a move in the right direction, I don't know that it will solve the issue, which may be more of a race condition where we inadvertently copy the prior instance of the object -- the empty one we initially create when a file is opened for writing. If this is what's happening then it's not an issue in s3fs but perhaps is something we can find another way to work around.
My buckets use versioning, and I don't think I see this issue... which could imply that buckets with versioning enabled use different internal logic inside S3... different logic that coincidentally mediates the behavior.
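If this race hypothesis is right, error checking alone won't help, because the copy "succeeds" while duplicating a stale zero-byte instance of the source. A defensive workaround (a hypothetical sketch; `do_copy` and `head_size` are stand-ins for real CopyObject and HeadObject calls) is to compare the destination size against the expected source size after the copy and retry on mismatch. Eventual consistency can affect the verification read too, so this only narrows the window rather than closing it.

```python
def copy_and_verify(do_copy, head_size, expected_size, attempts=3):
    """Copy, then verify the destination object's size.

    Guards against a read-after-write race in which the copy
    duplicates a stale (e.g. zero-byte) instance of the source.
    do_copy and head_size are hypothetical stand-ins for the
    real CopyObject and HeadObject requests.
    """
    for _ in range(attempts):
        do_copy()
        if head_size() == expected_size:
            return True
    return False

# Simulation: the first copy lands a stale 0-byte object, the second is correct.
sizes = iter([0, 77896])
print(copy_and_verify(lambda: None, lambda: next(sizes), 77896))  # True
```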
@ggtakec commented on GitHub (May 6, 2017):
@sqlbot Thanks for your kindness.
Yes, the #582 patch does not solve this problem.
It only fixes the fact that s3fs did not inspect the content of the response body.
It is still unclear why a zero byte file results, which is the essence of this problem.
If @alexeits can reproduce the phenomenon with the new code, it will log the error body, and we would like to see that log.
@ggtakec commented on GitHub (May 7, 2017):
@alexeits
Are you using SSE (the use_sse option) where this problem occurred?
I fixed a rename bug related to SSE in #585.
When I reproduced this bug case I got an error, and 0 byte files were created.
If you can, please use the latest code in the master branch.
Regards,
@alexeits commented on GitHub (May 8, 2017):
Thank you @sqlbot and @ggtakec. We do use SSE (use_sse=1). I'll get and deploy the latest fixes.
I'm going to close the issue because it's possible that it was fixed in #585. It's very rare in our environment and getting a definitive confirmation of a fix could take a long time.
@ggtakec commented on GitHub (May 9, 2017):
@alexeits If this problem seems to happen again, please reopen this issue.
Regards,