[GH-ISSUE #2630] S3FS Mount Auto Unmounts intermittently #1254

Open
opened 2026-03-04 01:52:35 +03:00 by kerem · 2 comments
Owner

Originally created by @ranga2crazyy on GitHub (Jan 8, 2025).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/2630

Additional Information

Version of s3fs being used (s3fs --version)

[root@instance]# s3fs --version
Amazon Simple Storage Service File System V1.95 with OpenSSL
Copyright (C) 2010 Randy Rizun rrizun@gmail.com
License GPL2: GNU GPL version 2 https://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Version of fuse being used (pkg-config --modversion fuse, rpm -qi fuse or dpkg -s fuse)

rpm -qi fuse
Name : fuse
Version : 2.9.7
Release : 19.el8
Architecture: x86_64
Install Date: Sat 04 Jan 2025 08:29:57 AM UTC
Group : Unspecified
Size : 208332
License : GPL+
Signature : RSA/SHA256, Tue 20 Feb 2024 09:31:11 AM UTC, Key ID 199e2f91fd431d51
Source RPM : fuse-2.9.7-19.el8.src.rpm
Build Date : Tue 06 Feb 2024 01:37:57 PM UTC
Build Host : x86-64-01.build.eng.rdu2.redhat.com
Relocations : (not relocatable)
Packager : Red Hat, Inc. http://bugzilla.redhat.com/bugzilla
Vendor : Red Hat, Inc.
URL : http://fuse.sf.net
Summary : File System in Userspace (FUSE) v2 utilities
Description :
With FUSE it is possible to implement a fully functional filesystem in a
userspace program. This package contains the FUSE v2 userspace tools to
mount a FUSE filesystem.

Kernel information (uname -r)

uname -r
4.18.0-553.32.1.el8_10.x86_64

GNU/Linux Distribution, if applicable (cat /etc/os-release)

Red Hat Enterprise Linux release 8.10 (Ootpa)

How to run s3fs, if applicable

[ ] command line
[x] /etc/fstab

s3fs#dev3-data-staging /mnt_s3/dev3-data-staging fuse _netdev,stat_cache_expire=30,umask=0022,uid=airflow,gid=airflow,allow_other,compat_dir,iam_role=auto,disable_noobj_cache 0 0
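For anyone reproducing this, the fstab entry above can be run directly from the command line while testing option changes, which avoids editing /etc/fstab. A sketch using the paths from the report; note that _netdev is a mount(8)/systemd ordering hint, not an s3fs option, so it is dropped here:

```shell
# Equivalent command-line invocation of the fstab entry above
# (bucket name and mountpoint as given in the report):
s3fs dev3-data-staging /mnt_s3/dev3-data-staging \
    -o stat_cache_expire=30,umask=0022,uid=airflow,gid=airflow \
    -o allow_other,compat_dir,iam_role=auto,disable_noobj_cache
```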

s3fs syslog messages (grep s3fs /var/log/syslog, journalctl | grep s3fs, or s3fs outputs)

s3fs[3202602]: segfault at 0 ip 00005599d45dd5fc sp 00007f005fffe8f0 error 4 in s3fs[5599d45bd000+c9000]
s3fs[3983116]: segfault at 0 ip 00005574662f05fc sp 00007fc7e27fb8f0 error 4 in s3fs[5574662d0000+c9000]
s3fs[84599]: segfault at 0 ip 00005603766805fc sp 00007f512cff88f0 error 4 in s3fs[560376660000+c9000]

Details about issue

We are experiencing intermittent issues with S3FS mounts on our EC2 instances. Specifically, a particular S3 bucket frequently unmounts unexpectedly, leading to disruptions in our production jobs.

Symptoms:

Jobs relying on this specific S3 bucket mount fail with the error "Transport endpoint is not connected."
This issue occurs intermittently and appears to happen without a discernible pattern (e.g., concurrency, file size).
The issue has become more frequent recently, with multiple instances experiencing simultaneous unmounts.
Other S3 buckets mounted on these instances are not affected to the same degree.

Investigation:

We have enabled debug logging for S3FS but have not been able to pinpoint the root cause of the issue.
We have consulted with AWS support, but no definitive solution has been identified.

Logs:

The following error messages are observed in the dmesg -T logs:
    s3fs[3202602]: segfault at 0 ip 00005599d45dd5fc sp 00007f005fffe8f0 error 4 in s3fs[5599d45bd000+c9000]
    s3fs[3983116]: segfault at 0 ip 00005574662f05fc sp 00007fc7e27fb8f0 error 4 in s3fs[5574662d0000+c9000]
    s3fs[84599]: segfault at 0 ip 00005603766805fc sp 00007f512cff88f0 error 4 in s3fs[560376660000+c9000]
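Since the kernel log shows the s3fs process segfaulting (which is what leaves the mount in the "Transport endpoint is not connected" state), a backtrace from a core dump would localize the crash better than debug logging. A sketch of capturing one, assuming systemd-coredump as shipped on RHEL 8 and an s3fs binary with debug symbols available:

```shell
# Allow core dumps for the s3fs process before the next crash occurs.
ulimit -c unlimited

# After a crash, list captured cores and open the most recent s3fs one in gdb:
coredumpctl list s3fs
coredumpctl gdb s3fs

# Inside gdb, a full backtrace identifies the faulting function:
# (gdb) bt full
```

Attaching that backtrace to the issue would let the maintainers match the crash against known bugs.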

Request:

Has anyone else encountered similar S3FS mount unmounting issues on EC2 instances?
If so, what were the identified root causes and how were they resolved?

Note: This issue only affects a specific S3 bucket, while other buckets mounted on the same instances remain largely unaffected.

Author
Owner

@ggtakec commented on GitHub (Jan 19, 2025):

@ranga2crazyy
You are specifying iam_role option for launching s3fs.
Authentication with iam_role may not work due to a bug in v1.95. (See #2612.)
If possible, could you please use v1.94 or try it with the latest master branch code?
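Building the suggested versions from source follows the steps in the s3fs-fuse README; a sketch for RHEL 8 (package names for the build dependencies are the EL8 ones):

```shell
# Build dependencies on RHEL 8 per the s3fs-fuse README:
# automake, gcc-c++, fuse-devel, libcurl-devel, libxml2-devel, openssl-devel
git clone https://github.com/s3fs-fuse/s3fs-fuse.git
cd s3fs-fuse
git checkout v1.94   # or omit this line to test the latest master branch
./autogen.sh
./configure
make
sudo make install
s3fs --version       # confirm the running binary is the one just built
```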

Author
Owner

@ranga2crazyy commented on GitHub (Jan 21, 2025):

@ggtakec Thanks for the pointers.

  • I will try out version 1.94 as suggested.
  • We started experiencing this issue in the first week of October last year, and the frequency has been increasing significantly since then.
  • Enabling debug logging has not been helpful in identifying the root cause of the issue due to the overwhelming volume of generated events.
  • This issue primarily affects a specific, large bucket (approximately 360 TB).
  • While I understand that the bucket size might be a contributing factor, I can confirm that there has not been a drastic increase in data volume between September and October 2024.
  • Version 1.95 was released on October 26th. We likely started using this newer version around November, so we are still puzzled about what caused the October issues; before that, no such issues were observed.
  • Therefore, testing with version 1.94 might help isolate the issue and determine if it's related to the newer version.
  • This approach seems more feasible than having to manually remount the bucket every time it disconnects.

I will update this issue with the results of my testing.
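As a stopgap for the manual-remount burden mentioned above, a small watchdog run from cron can detect the dead mount and remount it from the fstab entry. This is a hypothetical sketch, not part of s3fs; the mountpoint path is taken from the report, and the demo call at the bottom uses / only so the script is safely runnable:

```shell
#!/bin/sh
# Hypothetical watchdog: remount the bucket if its FUSE mount has died.
check_and_remount() {
    mp="$1"
    if mountpoint -q "$mp"; then
        echo "ok: $mp is mounted"
    else
        echo "remounting $mp"
        # Clear a stale "Transport endpoint is not connected" endpoint first:
        fusermount -uz "$mp" 2>/dev/null || true
        mount "$mp"   # re-reads the /etc/fstab entry for this mountpoint
    fi
}

# From cron this would be: check_and_remount /mnt_s3/dev3-data-staging
# Demo against / (always a mountpoint), so running the script is harmless:
check_and_remount /
```

This only papers over the segfault; the core-dump backtrace is still needed for the root cause.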
