mirror of
https://github.com/s3fs-fuse/s3fs-fuse.git
synced 2026-04-25 13:26:00 +03:00
[GH-ISSUE #984] EC2 mount S3 create a lot of CLOSE_WAIT connection #549
Originally created by @reasonlin0512 on GitHub (Mar 18, 2019).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/984
Additional Information
The following information is very important in helping us help you. Omitting these details may delay your support request or cause it to receive no attention at all.
Keep in mind that the commands we provide to retrieve information are oriented to GNU/Linux distributions, so you may need to use other commands if you run s3fs on macOS or BSD.
Version of s3fs being used (s3fs --version)
v1.85
s3fs command line used, if applicable
After mounting S3 on EC2 with s3fs and changing into a folder under the S3 mount (e.g. s3/folder), the "ls" or tab-completion console commands create many TCP connections (roughly one per file in the folder).
These TCP connections hang in CLOSE_WAIT status until the keep-alive time reaches the Linux setting (/proc/sys/net/ipv4/tcp_keepalive_time, default 7200 seconds).
If too many CLOSE_WAIT connections pile up, writing a file to S3 from EC2 fails with an input/output error.
@reasonlin0512 commented on GitHub (Mar 26, 2019):
Can anyone help with this issue, or explain why this happens?
@gaul commented on GitHub (Mar 26, 2019):
Keeping the connection alive allows reuse between operations. Does s3fs create too many sockets, more than multireq_max (default 20)?
@reasonlin0512 commented on GitHub (Mar 27, 2019):
From my observation, it does not exceed 20 connections.
But it can reach the maximum connection count at the same time.
When that happens, any other new operation that wants to read/write a file on S3 fails or gets errors.
@reasonlin0512 commented on GitHub (Mar 27, 2019):
For more detail:
My code needs to check the file list of a specific folder in S3,
so the check code does essentially what "ls" does in the console, using "dirent.h".
If the folder has more than 20 files, it creates or fully occupies all 20 TCP connections, and the other parts that need to read/write data may get fwrite or input/output errors.
If the issue comes from the original AWS S3 design, I think all I can do is reduce the TCP connections, for example by not using "dirent.h" to get the file list.
But the issue still stands even if I decrease the TCP connections.
So is there any way to close the CLOSE_WAIT TCP connections immediately once they are no longer used?
Thanks,
Best Regards.
@gaul commented on GitHub (Mar 27, 2019):
I don't understand -- do the 20 idle connections prevent other operations from proceeding? Could you share an example which shows the undesired behavior?
@reasonlin0512 commented on GitHub (Mar 28, 2019):
For example,
I wrote a simple C program that repeatedly writes a file to S3 from EC2 through s3fs, with a 1 second interval between iterations; each iteration writes about 5 MB of data to the same S3 object.
The process works perfectly at the beginning, but after some iterations it randomly gets an fclose error (input/output error), and if you check the network status with netstat, you will find many TCP connections stuck in "CLOSE_WAIT" status.
The simple C code is:
#include <stdio.h>
#include <stdlib.h>

int main()
{
    int i = 0;
    while(i++)
    {
        char data[0x4E2000];
        FILE *fd = fopen("/path/to/s3/mount/point/xxx", "w");
        if(fd == NULL) continue;
        fwrite(data, 1, sizeof data, fd);
        fclose(fd);
    }
    return 0;
}
Now I guess this is caused by too many unreleased "CLOSE_WAIT" TCP connections.
The code works perfectly on EC2 if I save the file to EC2 local storage.
@ggtakec commented on GitHub (Mar 30, 2019):
Although I tested for reproduction, s3fs basically did not use more sockets than the multireq_max option number.
And I checked CLOSE_WAIT with your program, but the CLOSE_WAIT count did not increase.
(In addition, you need to change i++ to ++i in your example program.)
s3fs reuses sockets, so I do not think that CLOSE_WAIT will keep increasing.
Is CLOSE_WAIT still increasing now?
Please let us know if you have any more details (such as lsof output while CLOSE_WAIT continues to grow).
That would help us resolve this issue.
@reasonlin0512 commented on GitHub (Apr 1, 2019):
I checked the network status; the CLOSE_WAIT connections do not exceed the multireq_max option.
At first I thought the issue was caused by reaching the maximum socket connection count.
But after some analysis, I found that even if I raise the multireq_max option from 20 to 100 or more, I can still get the same error (fclose error or fwrite error) when the in-use sockets reach the maximum connection count.
For now I can only increase the multireq_max option, modify tcp_keepalive_time, and try to reduce the connections (e.g. avoid "ls"-like functionality).
The connection count has decreased, but I think the same error can still occur if the situation I mentioned above happens.
My other question is:
Is it normal for s3fs to establish so many connections? (When we use the "ls" console command to get the file list from an S3 folder.)
Thanks,
Best Regards.
@ggtakec commented on GitHub (Apr 7, 2019):
@reasonlin0512 Thanks for your reply.
Is the CLOSE_WAIT count from uploads now within multireq_max?
About the added question: s3fs uses multiple parallel HEAD requests (after LIST) to check the files in a directory.
This is triggered by the ls command etc.
If you want to reduce the number of HEAD requests (and thus the number of sockets), try caching the status with max_stat_cache_size etc.
@reasonlin0512 commented on GitHub (Apr 8, 2019):
I modified some of my code to reduce the connections to S3 and also improved the file-writing code.
Now the socket connections and CLOSE_WAIT count stay under multireq_max, and I have not seen the fwrite/fclose errors since 4/2.
I will keep checking this issue.
Thanks~
@ggtakec commented on GitHub (Apr 8, 2019):
Thanks for your cooperation.
After checking for a while, please let me know if there is a problem.
If there is no problem, please close this issue.
@gaul commented on GitHub (Feb 3, 2020):
Closing due to inactivity. Please reopen if symptoms persist.