[GH-ISSUE #2516] S3fs returns "FileNotFoundError" error in high concurrency tasks #1230

Open
opened 2026-03-04 01:52:24 +03:00 by kerem · 2 comments

Originally created by @dartzonline on GitHub (Aug 23, 2024).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/2516

Additional Information

I'm running over 30k batch jobs at scale referencing the same S3 source (one 900 KB object) mounted on local disk inside a Docker container.

Fewer than 1% of the jobs fail with `OSError: Could not find file`, randomly and without a pattern; retrying the failed job often works.

The issue is not easy to reproduce, but it keeps happening randomly.
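Since a retry usually succeeds, a caller-side retry wrapper can paper over the transient failure until the root cause is found. This is a minimal sketch (a hypothetical helper, not part of s3fs or the reporter's job code), assuming the job reads the object through the FUSE mount with ordinary file I/O:

```python
import time


def read_with_retry(path, attempts=3, delay=0.5):
    """Read a file from an s3fs mount, retrying on transient OSError.

    ENOENT from a FUSE mount can be transient under heavy concurrency,
    so we retry a few times with a simple linear backoff before giving up.
    """
    for attempt in range(1, attempts + 1):
        try:
            with open(path, "rb") as f:
                return f.read()
        except OSError:
            if attempt == attempts:
                raise          # out of retries: surface the real error
            time.sleep(delay * attempt)
```

This only masks the symptom; if the object genuinely never appears, the final attempt still raises the original `OSError`.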

The container is Ubuntu 18.04, running as an AWS Batch job on EC2 instances.

Version of s3fs being used (`s3fs --version`)

V1.94 (commit: 70a30d6)


@ggtakec commented on GitHub (Aug 24, 2024):

@dartzonline
I think we need a bit more information about your environment.
For example, how are you mounting with s3fs?

  • Do you mount S3 with s3fs on the container's parent host (EC2), and expose that s3fs directory as a volume to the Docker container?

  • Or do you mount S3 with s3fs inside the container?

  • You said that more than 30k batch jobs run simultaneously, but are they all operating on the same file (under the s3fs mount point)?

  • Or are you creating and operating on a large number of different files under the same directory?
    I think `OSError: Could not find file` is an error reported by the OS, and since the file cannot be found, I suspect it is the latter case.

If you are operating on the same file or the same directory from multiple containers at the same time, you will keep getting the same result unless you implement exclusive control on the caller side.
s3fs (and S3) do not perform any exclusive control on files (objects); exclusive control is not possible for objects on S3.
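As a sketch of what caller-side exclusive control could look like, assuming a POSIX host and a hypothetical sidecar lock file kept on local disk (an advisory `flock` taken on a file inside the FUSE mount itself may not coordinate anything, so the lock lives outside the mount; this only serializes writers on the same host, not across EC2 instances):

```python
import fcntl
import os


def locked_append(path, data):
    """Append to a file under caller-side exclusive control.

    The lock is a sidecar file on local disk (here /tmp, a hypothetical
    choice), not the s3fs-mounted file itself, because advisory locks on
    a FUSE mount are not reliably honored.
    """
    lock_path = os.path.join("/tmp", path.replace("/", "_") + ".lock")
    with open(lock_path, "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)   # block until we hold the lock
        try:
            with open(path, "a") as f:
                f.write(data)
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)
```

For coordination across multiple hosts or containers, something external to the filesystem (e.g. a database or a distributed lock service) would be needed instead.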


@akuzminsky commented on GitHub (Oct 7, 2024):

I get similar errors, though not necessarily the same problem as described in the issue.
Git/GitHub operations fail frequently while working on an s3fs mount. The commands are `gh repo clone` and `git remote update`.
Despite AWS's claim of strong consistency (https://aws.amazon.com/s3/consistency/), I'm getting

fatal: Failed to checksum 'objects/pack/tmp_pack_1Smb82': No such file or directory

The file does exist in the AWS web console.
The error is (subjectively) more likely when multiple commands run in parallel (~5), but it happens even in a single thread.

All this on s3fs 1.93. The bucket is mounted on a desktop running Ubuntu jammy.
