[GH-ISSUE #1226] randomly produced file not exist error in high concurrency #656

Closed
opened 2026-03-04 01:47:36 +03:00 by kerem · 7 comments

Originally created by @liuzqt on GitHub (Jan 8, 2020).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/1226

I'm running large-scale tasks (over 2000 EC2 instances) referencing the same S3 source mounted on local disk.
I randomly hit a `file not exist` error from `boost::filesystem::exists` on a very few instances, even though those files actually do exist, and I was able to hotfix it by retrying a few times.

This issue is very hard to reproduce: it happens randomly, at a very low rate, and on different files each time.

I'm using Ubuntu 16.04 with g++ 5.4.0, C++14, and boost 1.59.

Just curious if anyone has met this issue before. Retrying is a workaround, but it makes my code very verbose, and I'm not sure which `boost::filesystem` operations might fail unexpectedly.


@gaul commented on GitHub (Feb 2, 2020):

This may be [eventual consistency](https://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html#ConsistencyModel). We have not explored this in s3fs, since we have been fixing more basic issues, but it is something that interests me. Some of the ways s3fs accesses data are suboptimal, e.g., fetching metadata from S3 when it already has it locally. Unfortunately some of this may be inherent to S3 itself. If you can characterize this issue further, e.g., read-after-create, read-after-write, read-after-delete, etc., it might help direct our efforts. In the long run I want to explore the eventual consistency mode of S3Proxy to test s3fs.


@brianfay commented on GitHub (Jul 13, 2020):

@gaul The AWS docs that you linked mention that read-after-write consistency is guaranteed except in this one condition:

> The caveat is that if you make a HEAD or GET request to a key name before the object is created, then create the object shortly after that, a subsequent GET might not return the object due to eventual consistency.

I'm using s3fs to upload files from an sftp server to S3. The S3 bucket is configured to send notifications whenever a file is uploaded, and I have an application that listens for these notifications and tries to download the file.

I often see a similar "file does not exist" error, and it seems like I must be falling into this eventual consistency caveat. Something is trying to do a HEAD or GET before the file has been PUT for the first time.

Does s3fs always do a HEAD or GET before uploading a new file to s3? Or am I maybe running into some weird behavior specific to my sftp server?


@ggtakec commented on GitHub (Jul 13, 2020):

@liuzqt @brianfay
s3fs caches the stat information that it presents as a file system for each S3 object.
(These are stored as `x-amz-meta-*` headers on the object.)
This stat cache is controlled by the following options:

  • `max_stat_cache_size`
  • `stat_cache_expire`
  • `stat_cache_interval_expire`
  • `enable_noobj_cache`

The option that caches the fact that a target object does *not* exist is `enable_noobj_cache`.
Please check which stat cache options you are using with s3fs and tell us about them.

When operating on a file, s3fs first checks the permissions of the parent directory, and then checks the existence and attributes of the file.
This confirmation is necessary for a file system, but it results in many HEAD requests.
The stat cache is used to reduce the number of HEAD requests s3fs sends; however, this cache can introduce lag and inconsistencies.

Also, for compatibility with other clients that treat S3 as a file system, s3fs may perform checks that are otherwise unnecessary.
The option that drops this compatibility is `notsup_compat_dir`; specifying it can considerably reduce the number of HEAD requests sent.
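As a concrete illustration of the options named above, a hypothetical mount command might look like the following; the bucket name, mount point, and values are placeholders, not recommendations:

```shell
# Placeholder bucket and mount point; option values are illustrative only.
s3fs mybucket /mnt/s3 \
    -o max_stat_cache_size=100000 \
    -o stat_cache_expire=30 \
    -o notsup_compat_dir
# enable_noobj_cache is deliberately omitted here: caching "object does
# not exist" results is what can turn a transient consistency lag into a
# persistent "file not exist" error.
```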

Thanks in advance for your assistance.


@brianfay commented on GitHub (Jul 14, 2020):

Hi @ggtakec, thank you so much for the info!

I checked our fstab entry for s3fs and these are the relevant settings that we have configured:

  • `stat_cache_expire=30`
  • `enable_noobj_cache`
  • `use_cache` (pointed at a local directory)

We do not have `notsup_compat_dir` enabled; it could be interesting to try that.

To be honest, as I investigate this issue I realize there are a lot of other complications:

  • we're on a very old version of s3fs (`s3fs --version` says V1.79)
  • we have multiple sftp servers in an autoscaling group, each one mounting the same s3 bucket via s3fs. I imagine this would cause a lot of weirdness with caching, as each server in the group will need to build its own cache. And sometimes servers will be terminated, new servers will be spun up and will have to rebuild cache.

I guess upgrading s3fs, enabling `notsup_compat_dir`, and rethinking the server deployment approach are all good steps toward mitigating this problem.

But that said, it seems like no matter what we do, there's no way to completely avoid S3's eventual consistency caveat.

Do you have any recommendations for how to handle this on the application side when an "object does not exist" error is received? Right now we have some basic retry logic: we try to fetch the file every five seconds for up to fifteen seconds, but that often still fails. Perhaps we should use exponential backoff and retry over a longer period of time?


@ggtakec commented on GitHub (Jul 14, 2020):

@brianfay If you get an `object does not exist` error when calling the S3 API directly, the problem is probably not with s3fs.
However, if you receive this error when accessing via s3fs, the s3fs options may be a factor.
Try running without the `enable_noobj_cache` option, which caches non-existence.
If you do use this option, please use the latest version (some bugs have been fixed, and an updated release is coming in the near future).


@gaul commented on GitHub (Jul 27, 2022):

@brianfay Could you test with the latest master? 404c284440 may address these symptoms.


@brianfay commented on GitHub (Jul 27, 2022):

Hi @gaul, I appreciate the update, but I am no longer using s3fs and don't have any way to test the behavior that I was seeing a few years ago.
