[GH-ISSUE #988] d-state processes on RHEL7.6 #548

Closed
opened 2026-03-04 01:46:38 +03:00 by kerem · 6 comments

Originally created by @muryoutaisuu on GitHub (Mar 22, 2019).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/988

Version of s3fs being used (s3fs --version)

$ s3fs --version
Amazon Simple Storage Service File System V1.85(commit:unknown) with OpenSSL
Copyright (C) 2010 Randy Rizun <rrizun@gmail.com>
License GPL2: GNU GPL version 2 <https://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Version of fuse being used (pkg-config --modversion fuse, rpm -qi fuse, dpkg -s fuse)

$ yum list installed | grep fuse
fuse.x86_64                  2.9.2-11.el7                    @anaconda/7.6
fuse-libs.x86_64             2.9.2-11.el7                    @anaconda/7.6

Kernel information (uname -r)

$ uname -r
3.10.0-957.1.3.el7.x86_64

GNU/Linux Distribution, if applicable (cat /etc/os-release)

$ cat /etc/os-release
NAME="Red Hat Enterprise Linux Server"
VERSION="7.6 (Maipo)"
ID="rhel"
ID_LIKE="fedora"
VARIANT="Server"
VARIANT_ID="server"
VERSION_ID="7.6"
PRETTY_NAME="Red Hat Enterprise Linux Server 7.6 (Maipo)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:7.6:GA:server"
HOME_URL="https://www.redhat.com/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"

REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 7"
REDHAT_BUGZILLA_PRODUCT_VERSION=7.6
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="7.6"

s3fs command line used, if applicable

/etc/fstab entry, if applicable

$ grep s3fs /etc/fstab
p-prometheus                                 /appl/prometheus/etc/s3    fuse.s3fs    url=https\://s3.example.ch,x-systemd.automount,x-systemd.idle-timeout=1min,_netdev,user,dbglevel=info,curldbg,instance_name=prometheus,allow_other,passwd_file=/appl/prometheus/.passwd-s3fs,umask=0022,nonempty    0    0
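
For reference, a hedged sketch of how to inspect what systemd generates from this entry; the unit name below is derived from the mount path via systemd's path escaping and is an assumption, not taken from the report:

# Status of the automount unit generated for /appl/prometheus/etc/s3
# (name assumed from systemd path-escaping rules)
$ systemctl status appl-prometheus-etc-s3.automount
# List what is mounted on the mountpoint; stacked entries would
# indicate a double mount
$ findmnt /appl/prometheus/etc/s3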

s3fs syslog messages (grep s3fs /var/log/syslog, journalctl | grep s3fs, or s3fs outputs)

If you execute s3fs with the dbglevel and curldbg options, you can get detailed debug messages.

I have changed the hostname and S3 endpoint (we run S3 on premises).

These are the last logs before I had to reboot due to D-state processes, which I only noticed on Mar 22 at 09:58.

Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]< HTTP/1.1 200 OK
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]< Date: Thu, 21 Mar 2019 01:20:13 GMT
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]< Strict-Transport-Security: max-age=31536000
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]< x-amz-request-id: b741a1eb-4a59-1fd0-be97-d8c4970cda48
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]< x-amz-meta-s3cmd-attrs: atime:1553131211/ctime:1553131211/gid:0/gname:root/md5:7a81fce2ba3af232346f1097a0e3f176/mode:33188/mtime:1553131211/uid:0/uname:root
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]< Last-Modified: Thu, 21 Mar 2019 01:20:12 GMT
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]< ETag: "7a81fce2ba3af232346f1097a0e3f176"
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]< Content-Type: text/x-c++
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]< Accept-Ranges: bytes
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]< Content-Length: 110654
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]< Server: CloudianS3
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]<
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]* Connection #0 to host p-prometheus.s3-example.ch left intact
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]      HTTP response code 200
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]      add stat cache entry[path=/alertmanager/alertmanager.yml]
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus][path=/alertmanager/alertmanager.yml][flags=32768]
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]      delete stat cache entry[path=/alertmanager/alertmanager.yml]
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]      [tpath=/alertmanager/alertmanager.yml]
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]      [tpath=/alertmanager/alertmanager.yml][bpath=][save=][sseckeypos=-1]
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]      URL is https://s3-example.ch/p-prometheus/alertmanager/alertmanager.yml
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]      URL changed is https://p-prometheus.s3-example.ch/alertmanager/alertmanager.yml
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]      computing signature [HEAD] [/alertmanager/alertmanager.yml] [] []
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]      url is https://s3-example.ch
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]* Found bundle for host p-prometheus.s3-example.ch: 0x7f5788018b30
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]* Re-using existing connection! (#0) with host p-prometheus.s3-example.ch
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]* Connected to p-prometheus.s3-example.ch (172.16.2.14) port 443 (#0)
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]> HEAD /alertmanager/alertmanager.yml HTTP/1.1
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]> User-Agent: s3fs/1.85 (commit hash unknown; OpenSSL)[prometheus]
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]> Accept: */*
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]> Authorization: AWS4-HMAC-SHA256 Credential=<redacted_access_key>/20190321/us-east-1/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date, Signature=ccf1f7642f426f38b129e88da8b674a9e87f5426394cb05aee3a7687ba79a697
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]> host: p-prometheus.s3-example.ch
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]> x-amz-content-sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]> x-amz-date: 20190321T012013Z
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]>
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]< HTTP/1.1 200 OK
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]< Date: Thu, 21 Mar 2019 01:20:13 GMT
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]< Strict-Transport-Security: max-age=31536000
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]< x-amz-request-id: b741a1ed-4a59-1fd0-be97-d8c4970cda48
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]< x-amz-meta-s3cmd-attrs: atime:1553131211/ctime:1553131211/gid:0/gname:root/md5:7a81fce2ba3af232346f1097a0e3f176/mode:33188/mtime:1553131211/uid:0/uname:root
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]< Last-Modified: Thu, 21 Mar 2019 01:20:12 GMT
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]< ETag: "7a81fce2ba3af232346f1097a0e3f176"
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]< Content-Type: text/x-c++
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]< Accept-Ranges: bytes
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]< Content-Length: 110654
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]< Server: CloudianS3
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]<
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]* Connection #0 to host p-prometheus.s3-example.ch left intact
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]      HTTP response code 200
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]      add stat cache entry[path=/alertmanager/alertmanager.yml]
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]      [path=/alertmanager/alertmanager.yml][fd=8][time=1553131212]
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]      [tpath=/alertmanager/alertmanager.yml][start=0][size=110654]
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]      [tpath=/alertmanager/alertmanager.yml][start=0][size=110654]
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]      URL is https://s3-example.ch/p-prometheus/alertmanager/alertmanager.yml
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]      URL changed is https://p-prometheus.s3-example.ch/alertmanager/alertmanager.yml
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]      computing signature [GET] [/alertmanager/alertmanager.yml] [] []
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]      url is https://s3-example.ch
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]      downloading... [path=/alertmanager/alertmanager.yml][fd=8]
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]* Found bundle for host p-prometheus.s3-example.ch: 0x7f5788018b30
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]* Re-using existing connection! (#0) with host p-prometheus.s3-example.ch
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]* Connected to p-prometheus.s3-example.ch (172.16.2.14) port 443 (#0)
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]> GET /alertmanager/alertmanager.yml HTTP/1.1
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]> User-Agent: s3fs/1.85 (commit hash unknown; OpenSSL)[prometheus]
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]> Accept: */*
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]> Authorization: AWS4-HMAC-SHA256 Credential=<redacted_access_key>/20190321/us-east-1/s3/aws4_request, SignedHeaders=host;range;x-amz-content-sha256;x-amz-date, Signature=e55c954a7135d356aebcc9d82e2cf1de6799e32ce8173bc0f5a30cf9af7452c6
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]> host: p-prometheus.s3-example.ch
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]> Range: bytes=0-110653
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]> x-amz-content-sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]> x-amz-date: 20190321T012013Z
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]>
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]< HTTP/1.1 206 Partial Content
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]< Date: Thu, 21 Mar 2019 01:20:13 GMT
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]< Strict-Transport-Security: max-age=31536000
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]< x-amz-request-id: b741a1ef-4a59-1fd0-be97-d8c4970cda48
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]< Accept-Ranges: bytes
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]< ETag: "7a81fce2ba3af232346f1097a0e3f176"
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]< Content-Type: text/x-c++
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]< x-amz-meta-s3cmd-attrs: atime:1553131211/ctime:1553131211/gid:0/gname:root/md5:7a81fce2ba3af232346f1097a0e3f176/mode:33188/mtime:1553131211/uid:0/uname:root
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]< Last-Modified: Thu, 21 Mar 2019 01:20:12 GMT
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]< Content-Range: bytes 0-110653/110654
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]< Content-Length: 110654
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]< Server: CloudianS3
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]<
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]* Connection #0 to host p-prometheus.s3-example.ch left intact
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]      HTTP response code 206
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus][path=/alertmanager/alertmanager.yml]
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus][path=/alertmanager/alertmanager.yml][fd=8]
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]      [tpath=][path=/alertmanager/alertmanager.yml][fd=8]
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus][path=/alertmanager/alertmanager.yml][fd=8]
Mar 21 02:20:13 <hostname> s3fs[30634]: [prometheus]      [path=/alertmanager/alertmanager.yml][fd=8]
Mar 21 02:20:21 <hostname> s3fs[30634]: [prometheus][path=/]
Mar 21 02:21:27 <hostname> s3fs[30634]: [prometheus][path=/]
Mar 21 02:21:27 <hostname> s3fs[30634]: [prometheus]destroy

Details about issue

After upgrading from RHEL 7.5 to RHEL 7.6, we've encountered D-state processes that referenced s3fs-fuse mounts. Everything seemed to work fine on RHEL 7.5. I also opened issue #981, but I don't know whether the two are related.

I did not find anything suspicious in the s3fs-fuse logs in rsyslog. The hanging processes (whether df, ls, or anything else) appeared to be blocked in the kernel in autofs4_expire_wait.
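
For anyone chasing the same symptom, a minimal diagnostic sketch using standard procps and /proc interfaces (the <pid> placeholder must be filled in by hand):

# List processes in uninterruptible sleep (state D) with their wait channel
$ ps -eo state,pid,wchan:32,cmd | awk '$1 == "D"'
# Dump the kernel stack of one hung process (requires root); a trace ending
# in autofs4_expire_wait points at the autofs expiry path mentioned above
$ cat /proc/<pid>/stack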

We haven't discovered how or why this happens, and we haven't seen any pattern, so we can't point in any particular direction.

As already mentioned in #976, we had to find a different workaround for our production system. Nevertheless, I have started s3fs-fuse in my lab environment; should the issue arise again, I may be able to provide additional information.

kerem closed this issue 2026-03-04 01:46:38 +03:00

@ggtakec commented on GitHub (Mar 24, 2019):

@muryoutaisuu Thank you for the log.
I checked it, and it shows the following:

  1. The GET (read) request for alertmanager.yml (110654 bytes) appears to have succeeded.
  2. After that, the mount point (/) was accessed twice.

Is this the log from the double-start situation in #981?
If so, it looks like the log of the s3fs instance you started first (not the second one).
The first s3fs probably stopped logging once the second one was running, because the second instance handles everything.

I'm not familiar with Prometheus, but isn't the first s3fs the one you started for monitoring with Prometheus?
(And did Prometheus report any problem?)
If this log is associated with #981, the cause described above seems to apply to this issue as well.
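
For reference, a quick way to check for stacked s3fs mounts on a live system (standard util-linux commands, not part of the original report):

# One line per mount; two fuse.s3fs entries with the same target mean a
# second s3fs instance is shadowing the first
$ findmnt -t fuse.s3fs
# Cross-check against the kernel's view of the mount table
$ grep s3fs /proc/mounts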


@gaul commented on GitHub (Mar 25, 2019):

Is this a regression from upgrading RHEL from 7.5 to 7.6, or from upgrading s3fs from 1.84 to 1.85? If the former, we should understand what changed and perhaps file a Red Hat Bugzilla issue.


@muryoutaisuu commented on GitHub (Mar 26, 2019):

@ggtakec I'm not sure whether it was mounted twice; I didn't check at the time. It could potentially be the case, as the nonempty option was set.

The mount only contains Prometheus alert rules and Alertmanager configurations, which are loaded only on startup and on config reload. Hence access to the mount should be quite sparse. The mount is not monitored by Prometheus; in fact, it could just as well be any other application using the mount.

@gaul The issues started as soon as we upgraded to RHEL 7.6 (still using s3fs v1.84 at that time). I then rebuilt s3fs with v1.85 (to ensure that it was compiled against the new libraries), but it still occurred.

I will remount without the nonempty option in my lab environment and see whether it still happens (and is therefore unrelated to the nonempty option), as sketched below.
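
A sketch of the scenario being tested, with the fstab options abbreviated; with nonempty set, a second mount attempt can succeed and stack over the first:

# First (intended) mount
$ s3fs p-prometheus /appl/prometheus/etc/s3 -o nonempty,allow_other,passwd_file=/appl/prometheus/.passwd-s3fs
# Accidental second mount: the mountpoint now shows the bucket contents and
# looks non-empty, so without nonempty this attempt would normally be refused
$ s3fs p-prometheus /appl/prometheus/etc/s3 -o nonempty,allow_other,passwd_file=/appl/prometheus/.passwd-s3fs
# Two stacked entries here confirm the double mount
$ findmnt /appl/prometheus/etc/s3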


@muryoutaisuu commented on GitHub (Mar 26, 2019):

I just checked our Puppet configuration. A colleague added the nonempty option around three weeks ago (coincidentally, also around the time we patched RHEL to 7.6). So the issue might have nothing to do with the OS patch.
I've sent him an email to ask why he added the option. Unfortunately, he's on vacation until next month.
I will report back.


@muryoutaisuu commented on GitHub (Apr 2, 2019):

The colleague couldn't really give me an answer as to why he configured the nonempty option. Either way, I was not able to reproduce the issue on my lab system. I tried with and without the nonempty option, and even tried mounting twice intentionally with nonempty set. I always gave it at least 24 hours to happen, but nothing has occurred so far. I'd rather not try reproducing it on our production system.
Therefore, I can't say whether the RHEL 7.6 upgrade, the nonempty option, or both in conjunction led to the issue.


@ggtakec commented on GitHub (Apr 7, 2019):

@muryoutaisuu Thank you for the reply.
For now, I understand that the cause was either a double mount or something that remains unknown (possibly related to RHEL 7.6).
However, judging from the log and the startup behavior you reported, I think it is highly likely that it was caused by double mounting.

I'll close this issue, but please reopen it if this happens again.
Thanks for your help.
