mirror of
https://github.com/s3fs-fuse/s3fs-fuse.git
synced 2026-04-25 21:35:58 +03:00
[GH-ISSUE #340] Memory Leak - s3fs using over 12GB memory #176
Originally created by @justinfalk on GitHub (Jan 25, 2016).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/340
Hi,
First, thank you for the great product. I've really enjoyed working with it thus far.
We have s3fs running on an AWS instance connected to a bucket with ~110 million small objects. No single directory contains more than a few hundred files, and our application accesses these files by their full paths, so there should be no traversing of the directory structure or anything like that. s3fs is used only for retrieval of these objects (no new objects are written) and for occasional deletion. On an average night there are only a few thousand downloads (say 5-20k objects). I am not using a disk cache because objects are rarely downloaded more than once.
After running for about a week, the kernel killed s3fs because it was consuming all available system memory (~12GB). See logs below.
I've searched and found other memory leak issues related to old libcurl versions and non-ssl usage but those don't seem to apply to my environment. I'm not certain if this is a configuration error or a memory leak, but it seems to be a memory leak because I can easily reproduce this by simply downloading files via my application and watching s3fs memory use grow. Please let me know if there is any other information that might be helpful.
Environment:
OS: Amazon Linux 4.1.13-18.26
s3fs: 1.79
Fuse: 2.9.4-1.17
libcurl: 7.40.0-3.52
fstab config:
Log snippet (full /var/log/messages output attached)
oom.txt
@hryang commented on GitHub (Jan 27, 2016):
Hi,
I met the same problem. In my scenario, the memory leak is due to fdcache.cpp:325, where the page is erased from the list but not deleted.
Also, the code may have other logic errors. Please see my comments in the following piece of code.
@RobbKistler commented on GitHub (Jan 27, 2016):
@justinfalk have you tried this with the tip of s3fs-fuse/master instead of the 1.79 release?
@justinfalk commented on GitHub (Jan 27, 2016):
Hi hryang,
Thanks for the response. In your situation, were you able to resolve the memory leak? It looks like this was not resolved because trunk still contains the same logic.
RobbKistler,
I have only used the 1.79 release. I browsed through commits and didn't see anything that looked like it would have resolved this issue but I will give it a try and report back.
Thanks again.
@justinfalk commented on GitHub (Jan 27, 2016):
@RobbKistler I tried building from the latest src and the result was the same. I have 12 threads downloading small files and see a very steady progression of memory use.
I have a video of the top output to demonstrate:
https://www.dropbox.com/s/jrojipbw83zojx1/top.mov?dl=0&preview=top.mov
It sounds like @hryang may have identified the cause of the leak. Can you confirm?
@justinfalk commented on GitHub (Jan 28, 2016):
Thanks @hryang ! I'll test today and report back.
@justinfalk commented on GitHub (Jan 28, 2016):
Regrettably, the fix @hryang provided did not improve the memory leak I am observing. The memory accumulation is unaffected by whether I have a disk cache configured and by the stat cache settings. It seems to grow linearly over time relative to the number of files downloaded.
Any other ideas?
@hryang commented on GitHub (Jan 29, 2016):
Hi,
Could you please use valgrind to find the source of the memory leak? In my scenario, the patch fixes the memory leak and passes the valgrind check.
The usage:
valgrind --tool=memcheck --leak-check=full --log-file=v.log s3fs -f your_mount_options
Then run a small batch of your workload and unmount.
Finally, you will get the valgrind report, v.log, which will show you detailed information.
@justinfalk commented on GitHub (Jan 31, 2016):
I apologize, but I've tried to run valgrind several times on different computers and it keeps failing. What am I doing wrong?
If I run the command outside of valgrind, it works fine.
@gaul commented on GitHub (Jan 31, 2016):
@justinfalk Please test with Valgrind 3.10 or newer which includes support for MPX instructions.
@justinfalk commented on GitHub (Jan 31, 2016):
Ahh, ok. Thanks @andrewgaul . I was using version 3.9 which is what's available in the Amazon Linux repo. I manually built and installed valgrind 3.11 and it worked.
Here is the result of letting it run for a little over an hour. Memory was over 2GB by the time I stopped it.
@ggtakec commented on GitHub (Feb 6, 2016):
I'm sorry for replying late.
I checked s3fs's cache-out logic, and I concluded that it should be changed.
So I updated the master code in #350; I hope it solves this issue.
(Please read the reasoning about the bad cache-out logic in the #350 comments.)
Please use the latest master code and test it.
Thanks in advance for your kindness.
@justinfalk commented on GitHub (Feb 6, 2016):
Thanks @ggtakec I'll try it out now.
@justinfalk commented on GitHub (Feb 6, 2016):
Using the latest from master I can't even do a directory listing. Note all of the substituted variables in the debug logs, like `folder` and `_%24folder%24`.
fstab:
version:
@justinfalk commented on GitHub (Feb 6, 2016):
I should also note that the memory leak was even more pronounced using the latest from master. I was able to get the s3fs process up to nearly 7GB resident memory after only 20 minutes or so. That may have been the result of the other issue though. Just thought I would mention in case it's relevant. Thanks.
@ggtakec commented on GitHub (Feb 7, 2016):
@justinfalk
I think this problem is not a memory leak; rather, s3fs uses a large amount of memory for the stat cache.
Probably you are listing files in a directory which has many files (objects), aren't you?
You specify the "stat_cache_expire=300" option, so stat cache entries are not cleaned up until 300 seconds have elapsed since their last access.
So I think s3fs continues to accumulate cache, and it grows to a large size.
My suggestion is that you specify the max_stat_cache_size option along with your current options.
The max_stat_cache_size value specifies the stat cache entry count; the default value is 1000.
With this, I think you can define an upper limit on the cache size.
Please try setting the option.
Thanks in advance for your help.
@justinfalk commented on GitHub (Feb 7, 2016):
No, my use case actually never does directory listings. I have full paths stored in the database and the application just occasionally accesses them for downloads. I manually do "ls" on a directory with a couple hundred files just to test the mount, but there is no other directory listing happening anywhere.
With 1.79 these exact same settings worked, albeit with a fairly large memory leak. What about the 404 and the variables shown in the log I just posted?
@justinfalk commented on GitHub (Feb 7, 2016):
Hi @ggtakec , I tried it again with the addition of the following two options:
stat_cache_expire=1
max_stat_cache_size=1
I literally just did the following:
mount /mnt/xxx-xxxxx
ls -lah /mnt/xxx-xxxxx/dir3/2015/2/6/0/837B4FEE/939AD159
This directory only has 89 files. s3fs memory use is at 250 MB, it's been hanging for 10 minutes, and /var/log/messages has > 10k lines of s3fs debug output.
If I run exactly the same on the 1.79 release it works as expected.
@ggtakec commented on GitHub (Feb 7, 2016):
@justinfalk
I tried to test on my EC2 instance, but s3fs used about 84 MB of memory after listing 1000 files in a directory.
(I did not set the stat_cache_expire or max_stat_cache_size options.)
For a listing of only 100 files, it is hard to believe it would use that much memory...
There may be other causes.
If you can, please try specifying only max_stat_cache_size (e.g. =100), or no stat options at all.
Also, were the objects in your bucket created by s3fs, or by another S3 tool (e.g. the S3 console, s3cmd)?
Note: the 404 errors occur because of compatibility differences with other S3 tools; S3 tools, including s3fs, differ subtly in how they create directory objects.
Regards,
@grutherford commented on GitHub (Mar 11, 2016):
@justinfalk I'm having a similar issue. Memory keeps being consumed until eventually the process is killed by the kernel when the server runs out of memory. Below are the fstab settings I'm using with version 1.79; the syslog shows normal s3fs logs, no errors, etc.
bucket /mnt/s3 fuse.s3fs _netdev,noatime,allow_other,uid=1001,gid=1001,dbglevel=debug,curldbg 0 0
Edit: Also, based on other issues reported, I'm running curl 7.35.0 on Ubuntu 14.04, in case that matters.
@ggtakec commented on GitHub (Mar 13, 2016):
Hi, @justinfalk , @grutherford
I tested s3fs (latest code) with the following options.
But I could not reproduce this issue and could not find a bug leaking the stat cache.
I will continue trying to reproduce it.
Regards,
@grantrutherford commented on GitHub (Mar 14, 2016):
Hi @ggtakec looks like I don't have the same issue as @justinfalk , I correctly set stat_cache_expire=300 and my memory issues seem to have gone. Thank you for your help!
@ggtakec commented on GitHub (Mar 22, 2016):
@grantrutherford thanks for reporting the result; I'm glad it no longer leaks for you.
@justinfalk I'm sorry that I have not been able to reproduce this yet.
Are you still seeing this bug?
@barsk commented on GitHub (Apr 8, 2016):
Hi, I also have the memory leak issue with 1.79. I do heavy directory listings when scanning for and loading huge amounts of data into an Elasticsearch index: about 500,000 files in a few thousand directories. I end up with s3fs using 800 MB after the indexing. That's 20% of my available memory.
So, what is the solution? Setting stat_cache_expire=300, or compiling the latest sources?
@wytcld commented on GitHub (May 12, 2016):
Had a couple of buckets, each with a few million files, mounted with s3fs v1.79 (commit d16d616). I was uploading files to them with s3cmd, using s3fs only to occasionally list a directory within the space for convenience. A couple of systems set up this way ran out of memory, and I thought it was s3cmd. But I then found one of them, with 4 GB RAM and 8 GB swap, with all memory exhausted while nothing much else was running besides s3fs, which htop showed as using all the memory. Stopping s3fs freed it all up. This machine was sitting there idle. Very dangerous.
@ggtakec commented on GitHub (May 15, 2016):
@barsk and @wytcld
You can set the max_stat_cache_size option; it specifies the number of file stat entries to cache.
The stat cache size for one file is a bit over 200 bytes (see struct stat_cache_entry); the exact size depends on the number of headers (metadata) carried by the object (file).
Please adjust max_stat_cache_size and stat_cache_expire.
Thanks in advance for your assistance.
@jamessoubry commented on GitHub (Jun 6, 2016):
I found running over HTTPS ate up my memory, so I switched back to using HTTP.
@ggtakec commented on GitHub (Jun 12, 2016):
Hi, @jamessoubry
I think there is a possibility of the same cause as in #254.
If you can, please see the following old issue on Google Code:
https://code.google.com/archive/p/s3fs/issues/314
Thanks in advance for your assistance.
@tlevi commented on GitHub (Jun 28, 2016):
I'd like to invite anybody having leaks unrelated to SSL to try my patch and provide feedback. I'm not certain yet this is a complete fix (or at all) so at this point I'm going to test it under load for a while longer before making a PR.
@ggtakec commented on GitHub (Jul 3, 2016):
@tlevi thanks for your PR.
I merged it, please try to use latest codes.
Regards,
@tlevi commented on GitHub (Jul 3, 2016):
Yes I haven't had any more memory issues since applying this to production.
@ggtakec commented on GitHub (Jul 18, 2016):
I'm sorry for my late reply.
@tlevi Thanks for your reply.
@justinfalk Can I close this issue?
@murainwood commented on GitHub (Feb 15, 2017):
We also see a similar issue...
@ggtakec commented on GitHub (May 5, 2017):
@murainwood I'm sorry for my late reply.
What s3fs version are you using? (If you use the master branch, please let us know the commit SHA1.)
If you can, please try to use latest codes in master branch.
Thanks in advance for your assistance.
@nbalakrishnan commented on GitHub (Oct 6, 2017):
Release v1.82 is still leaking memory (pretty rapidly too). Here's my setup:
I tried compiling --with-nss and also (separately) using --with-openssl. Options I'm using with s3fs:
allow_other,umask=0002,max_stat_cache_size=10000,stat_cache_expire=30,multireq_max=50,use_sse
The setup I have involves listing bucket / folder contents and reading objects (files) and is equivalent to the following:
cd
find -type f -exec cat {} \;
Some stats:
The attached valgrind output corresponds to an 8-hour run of the setup. At the end of this run, s3fs was occupying about 1.5 GB of resident memory and 24 GB of virtual memory. I'm continuing to probe this; in the meantime, if anyone has any insights, please share.
Thanks...
vg.log
@PVikash commented on GitHub (Oct 27, 2017):
I am also facing the memory leak issue with v1.82.
To work around it I have set minimum values for the following options:
max_stat_cache_size=1
stat_cache_expire=1
With the above settings I have not encountered the memory leak in 6 hours of running, but it significantly slows down performance.
Uploading a file of a few KB now takes more than a minute :( ; prior to the above settings (with the defaults) it uploaded the same file within 15 seconds.
@ggtakec , Could you please put some light on this?
Any suggestion would be much appreciated.
Regards,
Vikash
@PVikash commented on GitHub (Oct 30, 2017):
I was trying to tune the max_stat_cache_size and stat_cache_expire parameters.
I started with max_stat_cache_size=1 and stat_cache_expire=1; per the documentation, with these values the cache should expire after 1 second.
But I found that all the files I copied to the mount point still persist in the cache even after 10 minutes.
I provided both properties at mount time with the -o option as follows:
sudo /usr/local/bin/s3fs mybucket /mnt/dev/s3/mybucket -o allow_other -o passwd_file=/etc/passwd-s3fs -o use_path_request_style -o url=https://myregion.amazonaws.com -o endpoint=myregion -o uid=uid -o gid=gid -o umask=007 -o max_stat_cache_size=1 -o stat_cache_expire=1 -o use_cache=/tmp/dev/s3/cache/mybucket -o use_sse=kmsid:mykmsid
I am checking /tmp/dev/s3/cache/mybucket for the cache.
Am I missing something or using a property incorrectly?
Can someone please comment on this?
Thank you in advance.
Regards,
Vikash
@anilkumardesai commented on GitHub (Mar 30, 2018):
@PVikash
Were you able to get rid of this memory leak? We are still facing the issue, so wanted to know if there is any fix other than just a workaround of unmounting and mounting it back.
@byvalentino commented on GitHub (Apr 13, 2018):
Same here. It doesn't seem stable at all.
@ggtakec commented on GitHub (May 7, 2018):
@PVikash @byvalentino Sorry for my late reply.
The same phenomenon has also occurred in #748, and we are investigating the cause.
I think it is probably possible to fix the leak problem; please wait several days.
Thanks in advance for your assistance.
@anilkumardesai commented on GitHub (May 7, 2018):
We fixed the issue by changing the s3fs URL from HTTPS (the default) to HTTP. To my surprise, I have not seen the issue for the last 3 weeks.
@nbalakrishnan commented on GitHub (May 7, 2018):
@anilkumardesai I tried switching from HTTPS to HTTP because there was some blame on OpenSSL, but the leak remained (please see my previous message in this thread for details).
@ggtakec commented on GitHub (May 27, 2018):
@PVikash @byvalentino @anilkumardesai @nbalakrishnan
I merged #768 for this issue (memory leak).
If you can, please build the latest code in the master branch and try it. (Also see the #748 comments.)
Thanks in advance for your help.
@gaul commented on GitHub (Feb 2, 2019):
@justinfalk did this commit resolve your issue? If so please close this issue.
@srflaxu40 commented on GitHub (Feb 8, 2019):
What version is the fix for this error?
I am getting this from syslog with a crashing sftp server using s3fs:
@ggtakec commented on GitHub (Feb 11, 2019):
@srflaxu40 What version are you using?
The latest version is 1.84, which fixes some memory leaks.
But we found two more memory leaks in that code and fixed them in the master branch.
If you can build s3fs in your local environment, please use the latest master branch code.
Or please wait for the next version, which will be released after fixing one last issue.
Thanks in advance for your assistance.
@hudac commented on GitHub (Apr 24, 2019):
Hi @ggtakec, you mention in #254 that this issue should be fixed in 1.86, which is not available yet.
I'm using commit 381835e for testing and cannot reproduce this bug anymore. Are there any plans for releasing 1.86?
@gaul commented on GitHub (Apr 24, 2019):
1.86 was a typo, 1.85 is the latest version. There are a few reports of memory leaks but I cannot reproduce the symptoms.
@gaul commented on GitHub (Jul 9, 2019):
I am closing this issue since it seems the symptoms are addressed. I appreciate that a memory leak could manifest in several different ways which may not yet be fixed, but it would be better to open a new, scoped issue including the s3fs version and a description of your workload. FWIW, 455e29cbea addresses the suggestion in https://github.com/s3fs-fuse/s3fs-fuse/issues/340#issuecomment-175399227.