[GH-ISSUE #984] EC2 mount S3 create a lot of CLOSE_WAIT connection #549

Closed
opened 2026-03-04 01:46:38 +03:00 by kerem · 12 comments

Originally created by @reasonlin0512 on GitHub (Mar 18, 2019).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/984

Additional Information

The following information is very important in order for us to help you. Omitting these details may delay your support request, or it may receive no attention at all.
Keep in mind that the commands we provide to retrieve information are oriented toward GNU/Linux distributions, so you may need to use different ones if you run s3fs on macOS or BSD.

Version of s3fs being used (s3fs --version)

v1.85

s3fs command line used, if applicable

sudo s3fs sv-doorbell-demo /home/ubuntu/s3 -o passwd_file=/home/ubuntu/.passwd-s3fs -ouid=1001,gid=1001,allow_other,mp_umask=022

After mounting an S3 bucket on EC2 with s3fs and changing into a folder under the mount (e.g. s3/folder), the "ls" or tab-completion console commands create a lot of TCP connections (proportional to the number of files in the folder).
These TCP connections hang in CLOSE_WAIT status until the keep-alive timeout reaches the Linux setting (/proc/sys/net/ipv4/tcp_keepalive_time, default 7200 seconds).

This situation causes input/output errors when writing files to S3 from EC2 (if too many CLOSE_WAIT connections are left hanging).

kerem closed this issue 2026-03-04 01:46:38 +03:00

@reasonlin0512 commented on GitHub (Mar 26, 2019):

Can anyone help with this issue,
or explain why it happens?


@gaul commented on GitHub (Mar 26, 2019):

Keeping the connection alive allows reuse between operations. Does s3fs create too many sockets, more than multireq_max (default 20)?


@reasonlin0512 commented on GitHub (Mar 27, 2019):

> Keeping the connection alive allows reuse between operations. Does s3fs create too many sockets, more than multireq_max (default 20)?

From my observation, it does not exceed 20 connections.
But it can hit the maximum number of connections at the same time,
and then any other new operation that wants to read/write a file on S3 fails or gets errors.


@reasonlin0512 commented on GitHub (Mar 27, 2019):

In more detail:
my code needs to check the file list of a specific folder in S3, so the check code behaves just like "ls" in the console, using "dirent.h".
If the folder contains more than 20 files, it creates or uses all 20 TCP connections, and the other parts that need to read/write data may then get fwrite or input/output errors.

If this behavior is inherent to the original AWS S3 design, I think all I can do is reduce the number of TCP connections, for example by not using "dirent.h" to get the file list.

But the issue still stands even after I reduce the TCP connections.
So is there any way to close the CLOSE_WAIT TCP connections immediately once they are no longer used?

Thanks,
Best Regards.


@gaul commented on GitHub (Mar 27, 2019):

I don't understand -- do the 20 idle connections prevent other operations from proceeding? Could you share an example which shows the undesired behavior?


@reasonlin0512 commented on GitHub (Mar 28, 2019):

For example,
I wrote a simple C program that repeatedly writes a file to S3 from EC2 through s3fs; the interval between iterations is 1 second, and each iteration writes about 5 MB of data to the same S3 object.

The process works perfectly at first, but after some iterations it randomly gets an fclose error (input/output error), and if you check the network status with netstat you will find many TCP connections stuck in CLOSE_WAIT.

The simple C code is:

#include <stdio.h>
#include <stdlib.h>

int main()
{
    int i = 0;
    while (i++)
    {
        char data[0x4E2000];
        FILE *fd = fopen("/path/to/s3/mount/point/xxx", "w");
        if (fd == NULL) continue;
        fwrite(data, 1, sizeof data, fd);
        fclose(fd);
    }
    return 0;
}

My guess is that this is caused by too many unreleased CLOSE_WAIT TCP connections.
The code works perfectly on EC2 if I save the file to EC2-local storage.


@ggtakec commented on GitHub (Mar 30, 2019):

I ran a reproduction test, but s3fs basically did not use more sockets than the multireq_max option allows.
I also checked CLOSE_WAIT with your program, but the CLOSE_WAIT count did not increase.
(In addition, your example program needs i++ changed to ++i, otherwise the loop body never runs.)
s3fs reuses sockets, so I do not think CLOSE_WAIT will keep increasing.
Is CLOSE_WAIT still increasing now?

Please let us know if you have any more details (such as lsof output while CLOSE_WAIT continues to grow).
That would help us resolve this issue.


@reasonlin0512 commented on GitHub (Apr 1, 2019):

I checked the network status; the CLOSE_WAIT connections do not exceed the multireq_max option.
At first I thought the issue was caused by reaching the maximum number of socket connections.
But after some analysis, I found that even if I raise multireq_max from 20 to 100 or more, the same errors (fclose or fwrite errors) can still occur once the in-use sockets reach the maximum connection count.

For now, all I can do is increase multireq_max, adjust tcp_keepalive_time, and try to reduce the number of connections (e.g. avoid "ls"-like functionality).
The connection count has decreased, but I think the same error can still occur if the situation I described above happens again.

One more question:
Is it normal for s3fs to establish this many connections when we use the "ls" console command to list the files in an S3 folder?

Thanks,
Best Regards.


@ggtakec commented on GitHub (Apr 7, 2019):

@reasonlin0512 Thanks for your reply.
Is the CLOSE_WAIT count during uploads now within multireq_max?

Regarding your additional question: to check the files in a directory, s3fs issues parallel HEAD requests (after a LIST).
These are triggered by the ls command and similar operations.
If you want to reduce the number of HEAD requests (and therefore the number of sockets), try caching file status with options such as max_stat_cache_size.
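The tuning discussed in this thread can be sketched as a mount command. The bucket name and paths are copied from the report above; the option values (multireq_max=30, max_stat_cache_size=100000) are illustrative assumptions, not recommendations from the thread, though both option names are documented s3fs options.

```shell
# Raise the parallel-request limit and enlarge the stat cache so that
# repeated `ls` runs can hit the cache instead of issuing new HEAD
# requests. Values below are illustrative, not from the thread.
sudo s3fs sv-doorbell-demo /home/ubuntu/s3 \
    -o passwd_file=/home/ubuntu/.passwd-s3fs \
    -o uid=1001,gid=1001,allow_other,mp_umask=022 \
    -o multireq_max=30 \
    -o max_stat_cache_size=100000
```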


@reasonlin0512 commented on GitHub (Apr 8, 2019):

I modified my process to reduce the number of connections to S3 and also improved the file-writing code.

Now the socket connections and CLOSE_WAIT count stay under multireq_max, and I have not seen the fwrite/fclose errors since April 2.

I will keep monitoring this issue.

Thanks~


@ggtakec commented on GitHub (Apr 8, 2019):

Thanks for your cooperation.
Please keep checking for a while and let me know if a problem appears.
If there is no problem, please close this issue.


@gaul commented on GitHub (Feb 3, 2020):

Closing due to inactivity. Please reopen if symptoms persist.
