mirror of
https://github.com/s3fs-fuse/s3fs-fuse.git
synced 2026-04-25 13:26:00 +03:00
[GH-ISSUE #980] Out of memory s3fs since yesterday on new server using latest version s3fs-fuse #545
Originally created by @nondualit on GitHub (Mar 12, 2019).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/980
Version of s3fs being used (s3fs --version)
s3fs --version
Amazon Simple Storage Service File System V1.85(commit:99ec09f) with OpenSSL
Copyright (C) 2010 Randy Rizun rrizun@gmail.com
License GPL2: GNU GPL version 2 https://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Version of fuse being used (pkg-config --modversion fuse, rpm -qi fuse, dpkg -s fuse)
example: 2.9.4
Kernel information (uname -r)
4.15.0-1033-aws (kernel)
GNU/Linux Distribution, if applicable (cat /etc/os-release)
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04.2 LTS"
s3fs command line used, if applicable
s3fs nxxxxx-sql-dxxxxx -o use_cache=/tmp -o allow_other -o uid=1001 -o mp_umask=002 -o multireq_max=5 /mnt/sxxxxx/
/etc/fstab entry, if applicable
no fstab
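For reference, the command line above could be expressed as an /etc/fstab entry roughly like the following sketch (bucket and mount point copied from the command above; _netdev is a common addition so the mount waits for networking, and is an assumption, not part of the original report):

```
nxxxxx-sql-dxxxxx /mnt/sxxxxx/ fuse.s3fs _netdev,use_cache=/tmp,allow_other,uid=1001,mp_umask=002,multireq_max=5 0 0
```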
s3fs syslog messages (grep s3fs /var/log/syslog, journalctl | grep s3fs, or s3fs outputs)
If you execute s3fs with the dbglevel and curldbg options, you can get detailed debug messages.
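As a sketch, the debug options mentioned in the template are typically used like this (the bucket and mount point names are placeholders; -f keeps s3fs in the foreground so messages go to the terminal instead of syslog):

```shell
# Run s3fs in the foreground with verbose s3fs and libcurl debugging.
# "mybucket" and "/mnt/mybucket" are placeholder names.
s3fs mybucket /mnt/mybucket -f -o dbglevel=info -o curldbg

# For an already-mounted instance logging to syslog:
journalctl | grep s3fs
```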
Details about issue
I'm seeing a memory leak. I just downloaded the newest version, but the one before was from last week. Is there a problem again, or some dependency issue with this new Ubuntu version?
Error:
Mar 12 09:51:09 nondualit_aws kernel: [249822.806272] Out of memory: Kill process 26871 (s3fs) score 284 or sacrifice child
Mar 12 09:51:09 nondualit_aws kernel: [249822.822296] Killed process 26871 (s3fs) total-vm:831084kB, anon-rss:294936kB, file-rss:0kB, shmem-rss:0kB
Mar 12 09:52:42 nondualit_aws kernel: [    0.000000] Linux version 4.15.0-1033-aws (buildd@lcy01-amd64-019) (gcc version 7.3.0 (Ubuntu 7.3.0-16ubuntu3)) #35-Ubuntu SMP Wed Feb 6 13:29:46 UTC 2019 (Ubuntu 4.15.0-1033.35-aws 4.15.18)
Mar 12 09:52:42 nondualit_aws kernel: [    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-4.15.0-1033-aws root=UUID=fc9a41df-6d71-4f4f-a487-e5999bd67182 ro console=tty1 console=ttyS0 nvme.io_timeout=4294967295
@gaul commented on GitHub (Mar 15, 2019):
Can you share the steps to reproduce these symptoms? We have had many reports of out-of-memory over the years but cannot track down the cause. Using Valgrind massif or similar might help determine the root cause.
@nondualit commented on GitHub (Mar 16, 2019):
Actually, I don't have to do anything special, so I can't reproduce it on demand. If I mount the S3 bucket, my server goes out of memory within around 6 hours. I had to disable the mount since I really can't have this issue on my server right now. I will give Valgrind massif a try when I have the time. For now the mount stays off; it is too unsafe to use this package.
@junkert commented on GitHub (Mar 19, 2019):
We are seeing the same issues on our very active SFTP system. I'll look into getting some Massif outputs here once I get it set up. This is happening on a daily basis for us, so it shouldn't take too long.
@gaul, which flags should I add to the collection using Valgrind Massif? I have it running now with no flags, but want to make sure I get you as much information as you need. Here is how I am currently snagging the massif analysis file:
@gaul commented on GitHub (Mar 20, 2019):
@junkert I believe this will suffice. It would also be helpful to know if you use a distro package or if/how you compile it yourself.
@junkert commented on GitHub (Mar 20, 2019):
We are currently on master, built from source from GitHub, so we are running the latest on the master branch, which is currently the v1.85 release.
We built the project with the OpenSSL flag.
I have valgrind --tool=massif running now on one of the s3fs processes and should have more data for you by EOD.
@junkert commented on GitHub (Mar 21, 2019):
@gaul I have some good data for you, but I need a coworker to verify that all sensitive data has been obfuscated and everything looks good before posting the outputs here. I'll try to get something to you tomorrow AM PST some time.
From the looks of it, libtasn1.so.6.5.1 (we are on Ubuntu 16.04) accounts for around 96% of the total memory usage. It seems to plateau, however, after around an hour or so (we tested for 7 hours today). We have over 20 or so s3fs mounts on this host and it is happening on all of the mounts, but the most active ones seem to be affected most. This causes the OOM killer to randomly start killing processes (we allocate about 8GB total RAM right now for our VM, with 32GB of swap as well, and reboot daily to reset memory). This, however, is not sustainable for us, since it will eventually become a scaling issue.
Our s3fs cmd will be included in the outputs with sensitive pieces removed.
@junkert commented on GitHub (Mar 21, 2019):
@gaul here is the output from Massif https://gist.github.com/junkert/0fdb401eb3d7d77b5c84d936ec7632fb
@tisi1988 commented on GitHub (Apr 5, 2019):
Hi!
We are facing the same problem here. Using v1.85 we experience a huge memory usage that makes our machine hang.
With v1.84 we didn't face this issue but the CPU usage was constantly at ~50% of one core.
With v1.85 the CPU usage seems solved but this memory issue makes this version unusable.
@ggtakec commented on GitHub (Apr 7, 2019):
@junkert Thank you for the memory leak data.
I noticed something when I looked at your file.
You selected OpenSSL when building s3fs, but it seems that the libcurl which s3fs uses is the GnuTLS version. (In the case of the OpenSSL version, OpenSSL appears in the curl version output.)
You should try building s3fs with GnuTLS (--with-gnutls) or using the OpenSSL version of libcurl. Please try it.
Thanks in advance for your assistance.
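One way to check which TLS backend the installed libcurl uses, as suggested above, is to inspect curl's version string and the libraries the s3fs binary links against (the output contents shown in the comments are examples; they vary by system):

```shell
# The first line of curl's version output names the TLS backend,
# e.g. "OpenSSL/1.0.2k" or "GnuTLS/3.5.18".
curl --version | head -n 1

# Check which TLS libraries the s3fs binary pulls in.
ldd "$(command -v s3fs)" | grep -Ei 'curl|ssl|tls'
```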
@zhou-hongyu commented on GitHub (May 1, 2019):
We are facing the same problem here, can anyone help?
@zhou-hongyu commented on GitHub (May 1, 2019):
Or is there any chance that one of the older versions doesn't have the memory leakage issue? Like 1.8.0?
@nondualit commented on GitHub (May 1, 2019):
I stopped using this software, too buggy, and started using the AWS CLI command line to communicate with the buckets.
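For context, replacing an s3fs mount with direct AWS CLI calls usually means something like the following sketch (bucket name and paths are hypothetical; this assumes the aws CLI is installed and configured with credentials):

```shell
# One-off copy of a file up to the bucket.
aws s3 cp /var/backups/db.dump s3://my-bucket/backups/db.dump

# Mirror a local directory to the bucket
# (swap the arguments to mirror back down).
aws s3 sync /var/backups s3://my-bucket/backups

# List what is under a bucket prefix.
aws s3 ls s3://my-bucket/backups/
```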
@zhou-hongyu commented on GitHub (May 1, 2019):
lmao @nondualit, hey man, would you mind providing any details on how you use the AWS CLI to replace it?
@junkert commented on GitHub (May 1, 2019):
@nevermore2014 We are currently moving to AWS Transfer for SFTP as our permanent solution, since we can rely on AWS to handle the scaling side. The login solution is quite cumbersome right now since they only support public-key-based authentication, but they have instructions on how to build an identity provider (IdP). I am hoping to have a blog article soon on how to build an IdP in Go, Lambda, and DynamoDB. Maybe it will include code, but we'll see.
@nondualit bugs happen, the only way to make open software better is to contribute and help where you can. Start with bug reports, and grabbing data for the developers of the open source projects. Your contributions will help everyone that uses the software as a whole.
@ggtakec I'll give compiling libcurl with openssl support a shot and grab some more Massif data.
@zhou-hongyu commented on GitHub (May 1, 2019):
@junkert Thanks so much for your reply. When do you expect this could be done?
@junkert commented on GitHub (May 1, 2019):
@nevermore2014 Hopefully end of May. Will post back here when complete.
@zhou-hongyu commented on GitHub (May 2, 2019):
For those of you who have run into the same problem, I suggest you use version 1.84 as a stopgap for now: v1.84 only consumes up to about 6 GB of memory, so as long as your instance has more than 16 GB of memory it won't cause you immediate downtime. Consider it a stopgap solution.
@ggtakec commented on GitHub (May 4, 2019):
@nevermore2014
Would you like to tell us about your environment (OS), s3fs --version, and libcurl (curl --version)? I want to know the results of @junkert's test, but first we want to know whether your environment is the same as his.
I'm interested in how this problem is related to gnutls and ubuntu.
Since version 1.85 was modified to keep the SSL session, I wonder about the possibility of a bad effect from that change.
Thanks in advance for your assistance.
@zhou-hongyu commented on GitHub (May 6, 2019):
@ggtakec
sure. It's
Amazon Linux AMI
Amazon Simple Storage Service File System V1.84(commit:unknown) with OpenSSL
curl 7.61.1 (x86_64-redhat-linux-gnu) libcurl/7.61.1 OpenSSL/1.0.2k zlib/1.2.8 libidn2/0.16 libpsl/0.6.2 (+libicu/50.1.2) libssh2/1.4.2 nghttp2/1.21.1
@gaul commented on GitHub (Feb 3, 2020):
@nevermore2014 Could you test with the latest version 1.85?
@johnboker commented on GitHub (May 14, 2020):
Has there been any progress on this? I'm seeing this issue as well.
@gaul commented on GitHub (Jul 26, 2020):
Closing due to inactivity. Please retest with the latest 1.86 or master and reopen if symptoms persist.