mirror of
https://github.com/s3fs-fuse/s3fs-fuse.git
synced 2026-04-25 21:35:58 +03:00
[GH-ISSUE #549] s3fs consumes lot of CPU after 3 days running #314
Originally created by @quezacoatl on GitHub (Mar 28, 2017).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/549
Additional Information
Version of s3fs being used (s3fs --version)
V1.80(commit:8a11d7b) with OpenSSL
Version of fuse being used (pkg-config --modversion fuse)
2.9.2
System information (uname -a)
Linux ip-10-0-0-133 3.13.0-48-generic #80-Ubuntu SMP Thu Mar 12 11:16:15 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
Distro (cat /etc/issue)
Ubuntu 14.04.5 LTS
s3fs command line used (if applicable)
Nothing remarkable. Lots of `s3fs_getattr` and stat cache hits, which should be due to the application executing file stat a lot.

Details about issue
Our application only checks whether files on S3 exist by calling stat. According to the debug logs, the stat cache does almost all of the work. After running for three days, s3fs consumes an entire core (top reports 95-105% CPU) under normal load. If s3fs is restarted it drops back to normal, about 0.2% CPU. The CPU usage increases over time; I am not sure whether the growth is exponential, but it seems that way: say 14% CPU after one day, 40% after two, and 100% on the third (illustrative numbers).

The `-o stat_cache_expire=3600` option is actually new, and an attempt to solve this issue. My best theory was that as the stat cache grew larger, searching it took increasingly long due to bad complexity. If the `stat_cache_expire` option works, this should not be the case, but it is still likely that some data structure in s3fs is growing out of hand. I cannot find any errors, strange behaviour, or even differences between a newly launched s3fs and the long-running one consuming lots of CPU, by checking strace, application logs, and s3fs logs.

Not sure if it makes any difference, but our bucket is quite big: it contains about 13,000,000 files and 20 GB+ of data. Most of the files are never used, and the stat cache of 100,000 entries can hold almost all of the files that are actually used.
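The theory above, that an unbounded stat cache makes every operation slower over time, can be illustrated with a toy model. This is a hedged sketch in Python, not s3fs's actual C++ implementation; the class name, data structure, and eviction policy are illustrative assumptions only. The point is that with a size cap and TTL enforced on insert, the cache can never accumulate without bound:

```python
import time
from collections import OrderedDict

class ToyStatCache:
    """Toy stat cache with a size cap and TTL.

    Illustrative only; s3fs's real cache is a C++ structure with its
    own truncation logic. The failure mode discussed in this thread is
    what happens when the eviction step below is missing.
    """

    def __init__(self, max_entries=100_000, ttl=3600):
        self.max_entries = max_entries
        self.ttl = ttl
        self._entries = OrderedDict()  # path -> (stat_dict, inserted_at)

    def get(self, path, now=None):
        now = time.time() if now is None else now
        item = self._entries.get(path)
        if item is None:
            return None
        stat, inserted_at = item
        if now - inserted_at > self.ttl:   # expired: drop and report miss
            del self._entries[path]
            return None
        return stat

    def put(self, path, stat, now=None):
        now = time.time() if now is None else now
        self._entries[path] = (stat, now)
        self._entries.move_to_end(path)
        # Without this loop the cache grows without bound -- the kind of
        # accumulation that makes per-operation cost creep up over days.
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict oldest entry

cache = ToyStatCache(max_entries=3, ttl=3600)
for i in range(10):
    cache.put(f"/bucket/file{i}", {"size": i}, now=0)
print(len(cache._entries))  # 3 -- never exceeds max_entries
```

With the eviction loop in place the cache size is bounded by `max_entries` regardless of how many distinct paths are stat'ed, so lookup cost stays flat instead of growing with uptime.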
@gaul commented on GitHub (Mar 28, 2017):
Do you have any kind of periodic process crawling s3fs, e.g., locatedb? If not, run s3fs with `-d -f` to see what files are being accessed.

@quezacoatl commented on GitHub (Mar 29, 2017):
No, there is no crawling, all access is made while handling requests in a webapp.
I ran with `-d -f`, but I am not sure what I am looking for. There are lots of `s3fs_getattr` calls for files (many files are repeated; in syslog I could see several million stat hits) and directories, and `s3fs_access` calls for files. There only seem to be HEAD requests made by curl, which makes sense as the webapp should only stat files to check whether they exist. I still can't see anything exciting. It is reasonable to assume that tens of thousands of files and directories are being accessed in total, but I can't see how that should matter.

@quezacoatl commented on GitHub (Mar 29, 2017):
I forgot to mention that I think that this never happened in V1.79.
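When `-d -f` output is too noisy to read by eye, tallying which paths are hit most often can separate a hot application path from an unexpected crawler. The sketch below is illustrative only: the log line format in the regex is an assumption, not s3fs's guaranteed output, so adapt the pattern to whatever your s3fs version actually prints for `s3fs_getattr`:

```python
import re
from collections import Counter

# Assumed log shape for illustration -- adjust to your s3fs version's
# actual debug messages mentioning s3fs_getattr.
GETATTR_RE = re.compile(r's3fs_getattr\(path="([^"]+)"')

def top_getattr_paths(log_lines, n=5):
    """Count s3fs_getattr hits per path and return the n most frequent."""
    counts = Counter()
    for line in log_lines:
        m = GETATTR_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts.most_common(n)

sample = [
    '[INF] s3fs_getattr(path="/a.txt")',
    '[INF] s3fs_getattr(path="/a.txt")',
    '[INF] s3fs_getattr(path="/b.txt")',
    '[INF] s3fs_access(path="/b.txt")',
]
print(top_getattr_paths(sample))  # [('/a.txt', 2), ('/b.txt', 1)]
```

If a path you do not recognize dominates the tally, something other than the webapp (e.g. updatedb) is probably walking the mount.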
@ggtakec commented on GitHub (Apr 2, 2017):
@quezacoatl
From v1.79 to v1.80 the logic of the stat cache was changed, and I checked the changed part.
I found a problem in one piece of the stat cache logic: entries could accumulate without ever being evicted.
I fixed this bug in #558.
If you remember, please let me know: at that time, did the process size of s3fs increase considerably?
And if possible, would you try the new (latest) code in the master branch?
Thanks in advance for your help.
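To answer the question about process size growth, the resident set size of the s3fs process can be sampled over time from `/proc/<pid>/status` on Linux. This small helper is a sketch written for this thread, not part of s3fs:

```python
# Check whether the s3fs process's resident set size (RSS) grows over
# days, as asked above. Linux-only: parses /proc/<pid>/status.

def vm_rss_kib(status_text):
    """Extract the VmRSS value (in KiB) from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return None  # kernel threads and some states have no VmRSS line

def process_rss_kib(pid):
    with open(f"/proc/{pid}/status") as f:
        return vm_rss_kib(f.read())

# Example /proc status fragment to show the parsing:
sample = "Name:\ts3fs\nVmSize:\t  500000 kB\nVmRSS:\t  123456 kB\n"
print(vm_rss_kib(sample))  # 123456
```

Logging `process_rss_kib(<s3fs pid>)` hourly (e.g. from cron) would show whether memory grows in step with the CPU usage, which would point at an accumulating in-process structure like the stat cache.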
@conradneilands commented on GitHub (Jun 18, 2017):
Seeing this issue as well, even down to the three-day death cycle. Our bucket is probably around the 250,000-file mark, with infrequently accessed data.
s3fs is mounted from /etc/fstab using v1.80:
s3fs#sentinexdatabucket /sentinexdatabucket fuse _netdev,passwd_file=/etc/gcs-auth.txt,url=http://storage.googleapis.com,sigv2,nomultipart,allow_other,rw,use_cache=/tmp,default_acl=public-read,umask=000,max_stat_cache_size=10000,stat_cache_expire=3600,enable_noobj_cache,ensure_diskfree=512,retries=1,connect_timeout=45,readwrite_timeout=45,noatime,nosscache 0 0
Note the use of http for the API URL: not using it causes a memory leak that will kill a server dead within a day.
Currently trying the recommendation from another thread to get updatedb to ignore the mount point:
https://github.com/s3fs-fuse/s3fs-fuse/issues/193
(added bucket mountpoint to PRUNEPATHS in /etc/updatedb.conf)
Will see how it goes
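Whether a mount point is actually covered by PRUNEPATHS can be verified by parsing the config rather than eyeballing it. A small sketch, assuming the common `PRUNEPATHS="..."` shell-style syntax used by mlocate's /etc/updatedb.conf; the helper and sample are illustrative, not a standard tool:

```python
# Check whether a mount point appears in PRUNEPATHS, so updatedb will
# skip crawling the s3fs mount. Assumes the usual KEY="value" format
# of /etc/updatedb.conf.

def mount_is_pruned(conf_text, mountpoint):
    for line in conf_text.splitlines():
        line = line.strip()
        if line.startswith("PRUNEPATHS="):
            value = line.split("=", 1)[1].strip().strip('"')
            return mountpoint in value.split()
    return False  # no PRUNEPATHS line at all

sample_conf = (
    'PRUNE_BIND_MOUNTS="yes"\n'
    'PRUNEPATHS="/tmp /var/spool /sentinexdatabucket"\n'
)
print(mount_is_pruned(sample_conf, "/sentinexdatabucket"))  # True
print(mount_is_pruned(sample_conf, "/home"))                # False
```

Running this against the real /etc/updatedb.conf (via `open("/etc/updatedb.conf").read()`) confirms the edit took effect before waiting another three days to find out.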
@gaul commented on GitHub (Jan 26, 2019):
@conradneilands Did `PRUNEPATHS` help your performance problem?

@conradneilands commented on GitHub (Jan 26, 2019):
No. What fixed it was turning off ssl. Not sure if they ever fixed it.
Seemed like a big memory leak at the time.
@ggtakec commented on GitHub (Mar 29, 2019):
@conradneilands I'm sorry for late reply.
As you access 250K files, it is recommended to set max_stat_cache_size higher than 10000.
We have also released a new version, 1.86, which tunes some performance issues (the HEAD request is faster and SSL renegotiation happens less often).
Please try it, or the master branch code.
Thanks in advance for your assistance.
@SkyLeite commented on GitHub (Apr 22, 2019):
Updated from V1.80 (from apt) to V1.85 (built from latest release) and CPU usage went from 98% to 4% on my VPS. Good job!
@ggtakec commented on GitHub (Apr 22, 2019):
@RodrigoLeiteF Thank you for reporting back to us.
We are glad the CPU usage fell.
@gaul commented on GitHub (Jul 9, 2019):
Seems fixed; please reopen if symptoms persist.
@conradneilands commented on GitHub (Jul 9, 2019):
Did you fix the ssl issue? My solution was to disable that.
On Wed., 10 Jul. 2019, 05:34 Andrew Gaul, notifications@github.com wrote:
@gaul commented on GitHub (Jul 9, 2019):
Possibly; I am sorry, but this issue has too many symptoms from too many versions to be sure. I recommend retesting with 1.85 and opening a new issue if problems persist. While I am eager to fix these kinds of issues, there is no way to make progress at present.
@raj-aws commented on GitHub (Dec 13, 2020):
Hi,
My s3fs version is below. I have two problems; can someone please help me fix them as soon as possible?
Amazon Simple Storage Service File System V1.87 (commit:38e1eaa) with OpenSSL
Copyright (C) 2010 Randy Rizun rrizun@gmail.com
License GPL2: GNU GPL version 2 https://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Problem 1:
The s3fs process reaches 100% CPU usage every two days; once we kill it, after another two days we again see s3fs consuming 100% CPU.
Problem 2:
My use_cache directory fills up very quickly. My total bucket size is 22 TB, but we actively use only one folder, which has around 100K objects.
Both of these problems are frequently causing production outages; I would appreciate a reply as soon as possible.
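For the second problem, measuring how much disk the use_cache directory actually consumes over time shows how fast it fills relative to the active working set. A minimal sketch (a generic directory-size walk, not an s3fs tool); pair it with s3fs's ensure_diskfree option or periodic cleanup of the cache directory:

```python
import os
import tempfile

def dir_size_bytes(root):
    """Total size in bytes of all regular files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path):  # skip sockets, broken links, etc.
                total += os.path.getsize(path)
    return total

# Demo on a throwaway directory standing in for the use_cache path:
with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "cached-object"), "wb") as f:
        f.write(b"x" * 1024)
    print(dir_size_bytes(d))  # 1024
```

Sampling `dir_size_bytes("/tmp")` (or wherever use_cache points) periodically reveals whether the cache stabilizes around the hot folder's size or grows toward the full 22 TB bucket.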
@conradneilands commented on GitHub (Dec 13, 2020):
As a test, anywhere you see https in the connection settings, change it to http.
@atulvspl commented on GitHub (Jan 25, 2021):
Hello Guys
Hope you guys doing well.
I have mounted an S3 bucket on my new production server, and it is causing a CPU load of 300+. I have been syncing for the last five days and it keeps putting the same load on the server; because of that load, my application is not working properly and is having issues.
I am syncing my old production S3 bucket to the new production server; the bucket has approximately 2.5 TB of data to sync.
So can you please advise me on how I can resolve these issues?
Thanks in advance.
@gaul commented on GitHub (Jan 25, 2021):
Please open a new issue describing your symptoms. I recommend checking whether `updatedb` is unintentionally crawling the system. As stated in https://github.com/s3fs-fuse/s3fs-fuse/issues/549#issuecomment-509846566, there are too many possible causes already addressed by newer versions of s3fs.

@atulvspl commented on GitHub (Jan 25, 2021):
@gaul Okay, thanks for the information. I have created a new issue with the details; you can see it at the URL below.
https://github.com/s3fs-fuse/s3fs-fuse/issues/1536
Thanks for the help
@alekseyen commented on GitHub (Jan 23, 2022):
Can anyone please share a way to get rid of CPU_IOWAIT due to s3fs? It really slows down my cloud server.

@conradneilands commented on GitHub (Jan 23, 2022):
Unfortunately, the best way I found was to use http in all my connection strings. There seems to be a nasty SSL bug somewhere. This may or may not have been fixed.