[GH-ISSUE #1391] After running for some time, s3fs starts failing with "failed: mkdir /var/lib/rexray/volumes/<bucket>: file exists" #743

Closed
opened 2026-03-04 01:48:22 +03:00 by kerem · 8 comments

Originally created by @Makeshift on GitHub (Sep 11, 2020).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/1391

Additional Information

I'm attempting to use s3fs along with the rexray/s3fs plugin on ECS in AWS. This is the userdata I'm using:

```
echo 'ECS_CLUSTER=${aws_ecs_cluster.main.name}' >> /etc/ecs/ecs.config
yum install -y amazon-linux-extras amazon-efs-utils
systemctl enable --now amazon-ecs-volume-plugin
amazon-linux-extras install epel
yum install -y s3fs-fuse
docker plugin install rexray/s3fs:latest --grant-all-permissions S3FS_REGION=${var.aws_region} S3FS_OPTIONS="allow_other,iam_role=auto,umask=000" LIBSTORAGE_INTEGRATION_VOLUME_OPERATIONS_MOUNT_ROOTPATH="/" LINUX_VOLUME_ROOTPATH="/" REXRAY_LOGLEVEL=debug S3FS_MAXRETRIES=20 LINUX_VOLUME_FILEMODE=0777
```

Everything seems fine for a little while, with tasks spawning and correctly connecting to volumes. After a little while (I'm not sure of the exact trigger yet), containers will fail to launch with the error:
`"Handler for POST /v1.25/containers/<hash>/start returned error: error while mounting volume '': VolumeDriver.Mount: docker-legacy: Mount: <bucket>: failed: mkdir /var/lib/rexray/volumes/<bucket>: file exists"`
That folder doesn't exist on the host - so I'm assuming it's within the volume plugin.
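For anyone else debugging this: a couple of hedged checks for whether the path is left over inside the plugin's private rootfs rather than on the host (the `<plugin-id>` placeholder is illustrative, fill in the ID from `docker plugin ls`):

```shell
# Managed Docker plugins run in their own rootfs under
# /var/lib/docker/plugins/<plugin-id>/rootfs, so the "file exists" path
# may only be visible there, not under the host's /var/lib/rexray.
docker plugin ls --no-trunc                     # note the full plugin ID
sudo ls -la /var/lib/docker/plugins/<plugin-id>/rootfs/var/lib/rexray/volumes/

# Also look for FUSE mounts that were never cleaned up on the host:
mount | grep -E 's3fs|fuse'
```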

From what I can tell, this happens after a period of time on all buckets that have previously been mounted. Note that in this case I do have multiple containers utilizing the same bucket - but this hasn't been an issue previously.

I'm shortly going to try building master instead of using the EPEL release to see if it changes anything.

If there's any other troubleshooting steps I can take, please let me know.

Version of s3fs being used (s3fs --version)

~~Amazon Simple Storage Service File System V1.87 (commit:unknown) with OpenSSL (Current version in AML2 EPEL)~~

This was wrong - I was using the one provided by the official rexray plugin. I've since rebuilt the plugin using the latest s3fs version and am now testing with it.

Version of fuse being used (pkg-config --modversion fuse, rpm -qi fuse, dpkg -s fuse)

```
Version     : 2.9.2
Release     : 11.amzn2
Architecture: x86_64
```

Kernel information (uname -r)

`4.14.193-149.317.amzn2.x86_64`

GNU/Linux Distribution, if applicable (cat /etc/os-release)

```
NAME="Amazon Linux"
VERSION="2"
ID="amzn"
ID_LIKE="centos rhel fedora"
VERSION_ID="2"
PRETTY_NAME="Amazon Linux 2"
ANSI_COLOR="0;33"
CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2"
HOME_URL="https://amazonlinux.com/"
```

s3fs syslog messages (grep s3fs /var/log/syslog, journalctl | grep s3fs, or s3fs outputs)

```
Sep 11 16:58:19 ip-172-25-0-42 dockerd: time="2020-09-11T16:58:19Z" level=error msg="time=\"2020-09-11T16:58:19Z\" level=error msg=\"docker-legacy: Mount: iea-prod-resat-genplanning-instance-test-data-temp-secrets: failed: mkdir /var/lib/rexray/volumes/iea-prod-resat-genplanning-instance-test-data-temp-secrets: file exists\" host=\"unix:///var/run/rexray/558708985.sock\" integrationDriver=linux osDriver=linux service=s3fs storageDriver=libstorage time=1599843499618 " plugin=e56108b10891727d176fa918b27c78eda7e62bcaee69731eef8c048943419023
Sep 11 16:58:20 ip-172-25-0-42 dockerd: time="2020-09-11T16:58:20.164127890Z" level=error msg="Handler for POST /v1.25/containers/f0b5574a274d17653df26a6be37efbb69cd60ab12da3ddf1342145497ac293b4/start returned error: error while mounting volume '': VolumeDriver.Mount: docker-legacy: Mount: iea-prod-resat-genplanning-instance-test-data-temp-secrets: failed: mkdir /var/lib/rexray/volumes/iea-prod-resat-genplanning-instance-test-data-temp-secrets: file exists"
```
kerem closed this issue 2026-03-04 01:48:23 +03:00

@Makeshift commented on GitHub (Sep 11, 2020):

I'm having quite a bit of trouble understanding the interactions between docker plugins/s3fs/rexray/libstorage/etc so I may be being really stupid and not realizing that the plugin isn't actually using the on-host s3fs. If that's the case, please let me know that building master won't do anything for me :D


@Makeshift commented on GitHub (Sep 11, 2020):

Ah, looks like this may be a rexray issue: https://github.com/rexray/rexray/issues/1336


@Makeshift commented on GitHub (Sep 11, 2020):

I've rebuilt the rexray plugin with the current master of s3fs (available here: https://hub.docker.com/r/makeshift27015/s3fs). Unfortunately now I'm getting a more unhelpful error:

```
Sep 11 23:07:09 ip-172-25-0-28 dockerd: time="2020-09-11T23:07:09.014027947Z" level=error msg="Handler for POST /v1.25/containers/create returned error: VolumeDriver.Mount: docker-legacy: Mount: iea-prod-resat-genplanning-instance-test-data-files-store: failed: error mounting s3fs bucket"
```

:(


@gaul commented on GitHub (Sep 12, 2020):

Can you try running s3fs with `-d -f` to see if there are any debug logs?
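For reference, a minimal foreground invocation that surfaces s3fs debug output (the bucket name and mountpoint are placeholders; the `-o` options just mirror the plugin's configuration above):

```shell
mkdir -p /mnt/test-bucket
# -f keeps s3fs in the foreground, -d enables debug output to stderr,
# dbglevel=info raises the verbosity of s3fs's own messages
s3fs test-bucket /mnt/test-bucket \
  -o iam_role=auto -o allow_other -o umask=000 \
  -o dbglevel=info -f -d
```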


@Makeshift commented on GitHub (Sep 14, 2020):

This seems to be some specific bug with repeatedly mounting and unmounting the same bucket(s) using Rexray and some craziness with how docker reuses plugins. I'm going to say this probably isn't an s3fs issue and is much more likely to be rexray, and because of the sheer amount of wrapping rexray does with s3fs, it's pretty hard to debug and get logs out of it.

My workaround is to simply mount the bucket on the host and pass it through to the ECS containers. Not quite as elegant, but it seems to work brilliantly!
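A sketch of that workaround, with illustrative bucket and path names and assuming an IAM instance role: mount the bucket once on the host, then bind-mount the path into containers (on ECS this would be a `host`-type volume in the task definition):

```shell
# On the host: mount the bucket at boot via /etc/fstab
echo 'my-bucket /mnt/my-bucket fuse.s3fs _netdev,iam_role=auto,allow_other,umask=000 0 0' \
  | sudo tee -a /etc/fstab
sudo mkdir -p /mnt/my-bucket
sudo mount /mnt/my-bucket

# Containers then just bind-mount the host path - no volume plugin involved:
docker run --rm -v /mnt/my-bucket:/data alpine ls /data
```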


@Poweranimal commented on GitHub (Mar 7, 2021):

@Makeshift I'm having the exact same problem.
It's super annoying that rexray for s3fs performs that badly.
Also, as you said, debugging it yourself is super difficult because of the lack of helpful logs.

If it helps anyone, I was able to resolve the issue by restarting Docker.
This, of course, is not a real solution or even a good workaround, but maybe some people can afford doing this.

I'd propose reopening this issue, because it doesn't seem to be resolved.
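For completeness, the restart workaround mentioned above amounts to the following (note that this disrupts every container on the host):

```shell
# Re-initializes dockerd and all managed plugins, clearing the plugin's
# stale state - a blunt instrument, not a fix:
sudo systemctl restart docker
```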


@himalreddy commented on GitHub (Mar 8, 2022):

I am also facing the same issue. Is there any solution for this problem?


@gaul commented on GitHub (Jun 12, 2022):

We need more information to debug further. If rexray does not offer the option to dump debug logs please open an issue against that project.
