[GH-ISSUE #283] s3fs causing high io.wait? #146

Closed
opened 2026-03-04 01:42:37 +03:00 by kerem · 2 comments
Owner

Originally created by @flutist599 on GitHub (Oct 22, 2015).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/283

We're using s3fs for an incoming FTP server on AWS to connect an EC2 instance to S3. The input stream is a constant 25 MByte/s; objects are 500k-30MB avg. size.

Even though the EC2 instance is a c4.large, we sometimes see up to 100% CPU io.wait; load goes high and traffic throughput drops heavily. I could not pin the io.wait on a specific process, but I know it is not related to the local disks, which are quite idle.

s3fs main-sftp.production.allexis /var/ftp/data -o rw,enable_content_md5,enable_noobj_cache,stat_cache_expire=120,allow_other,uid=99,gid=9

I have also found that the traffic uploaded to S3 is downloaded to the FTP instance again; I assume this is to calculate the MD5 checksum as configured?

So is my assumption correct that for every incoming file the FTP server is busy with

  • receiving it from ftp-uploader
  • sending to S3
  • retrieving from S3

and this causes high io? Would sound logical.
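
As a rough sketch of what that assumption would imply for network traffic: the 25 MByte/s figure is taken from the report, while the "every byte is received once, uploaded once, and downloaded once" model is only the hypothesis stated above, not confirmed s3fs behavior.

```shell
# Figure taken from the report: constant inbound FTP rate in MByte/s.
ftp_in=25

# Assumption (the poster's hypothesis, not confirmed): each byte is
# received once, uploaded to S3 once, and downloaded from S3 once.
net_in=$((ftp_in + ftp_in))   # FTP upload plus re-download from S3
net_out=$ftp_in               # upload to S3
total=$((net_in + net_out))

echo "inbound:  ${net_in} MByte/s"
echo "outbound: ${net_out} MByte/s"
echo "total:    ${total} MByte/s"
```

Under this model the instance would move three times the FTP input rate over the network, which could be compared against the network graph.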

Also, do I need the MD5 check? I don't need local caching. From the documentation it was not clear to me whether I should have MD5 enabled to verify integrity, or whether it is 'only' there to assist with local caching.

Below some monitoring graphs of the instance; hope you can help. thx, Dominik

you can see that when io.wait goes to 100%, traffic processing is stuck.
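Attached graphs:
load: https://cloud.githubusercontent.com/assets/6976834/10658333/200e1120-7896-11e5-9b42-a3ce3e952ff8.png
network: https://cloud.githubusercontent.com/assets/6976834/10658334/201046d4-7896-11e5-8a8a-bf23b4794b35.png
cpu: https://cloud.githubusercontent.com/assets/6976834/10658335/20163d8c-7896-11e5-8d6e-680a95a01cf6.png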

kerem closed this issue 2026-03-04 01:42:37 +03:00
Author
Owner

@ggtakec commented on GitHub (Nov 1, 2015):

@flutist599
If you do not need to check each part of a multipart upload, you can leave out the "enable_content_md5" option.
Also, even though you do not specify "use_cache", s3fs still uses a local file (a temporary file).
It is created as a temporary file and simply does not remain after the process is completed.

The reason for the high CPU usage is not clear to me.
But if CPU usage drops after you remove the enable_content_md5 option, the problem is likely caused by the MD5 calculation.
s3fs calculates MD5 when uploading by multipart, so you can either leave out the enable_content_md5 option or specify nomultipart.

Please try running without the enable_content_md5 option.
Thanks in advance for your help.
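
A minimal sketch of the suggested test, assuming the bucket name, mount point, and option string from the original report. It only prints the resulting command instead of mounting anything; the variable names are illustrative.

```shell
# Option string copied from the mount command in the original report.
orig_opts="rw,enable_content_md5,enable_noobj_cache,stat_cache_expire=120,allow_other,uid=99,gid=9"

# Drop enable_content_md5 wherever it sits in the comma-separated list.
new_opts=$(printf '%s\n' "$orig_opts" | tr ',' '\n' | grep -vxF 'enable_content_md5' | paste -sd, -)

# Print the mount command rather than running it.
echo "s3fs main-sftp.production.allexis /var/ftp/data -o $new_opts"
```

The alternative mentioned above would be to keep the original options and append nomultipart instead. Either way, comparing io.wait before and after the change is what shows whether the MD5 calculation is involved.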

Author
Owner

@ggtakec commented on GitHub (Mar 30, 2019):

We kept this issue open for a long time.
We have released new version 1.86, which fixes some problems (bugs).
Please use the latest version.
I will close this, but if the problem persists, please reopen or post a new issue.
