mirror of
https://github.com/s3fs-fuse/s3fs-fuse.git
synced 2026-04-25 05:16:00 +03:00
[GH-ISSUE #1226] randomly produced file not exist error in high concurrency #656
Originally created by @liuzqt on GitHub (Jan 8, 2020).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/1226
I'm running large scale tasks (over 2000 EC2 instances) referencing the same S3 source mounted on local disk.
And I randomly get a "file not exist" error from boost::filesystem::exists on a very few instances, even though those files actually do exist. I was able to somehow hotfix it by retrying a few times. It's very hard to reproduce this issue: it happens randomly at a very low rate, and on different files each time.
I'm using Ubuntu 16.04 with g++ 5.4.0, C++14 and boost 1.59
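A minimal sketch of the retry workaround mentioned above, written at the shell level (the original code wrapped boost::filesystem::exists in C++; the function name and defaults here are made up for illustration):

```shell
# Illustrative retry wrapper: re-check that a path exists a few times
# before giving up, to paper over transient "file not exist" results
# seen through an s3fs mount.
exists_with_retry() {
  path="$1"
  tries="${2:-5}"   # number of existence checks
  delay="${3:-1}"   # seconds to wait between checks
  i=0
  while [ "$i" -lt "$tries" ]; do
    if [ -e "$path" ]; then return 0; fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}
```

A C++ equivalent would simply loop over boost::filesystem::exists with a short sleep between attempts.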
Just curious if anyone has met this issue before. Retrying is a workaround, but it makes my code very verbose, and I'm not sure what other kinds of boost::filesystem operations might fail unexpectedly.

@gaul commented on GitHub (Feb 2, 2020):
This may be eventual consistency. We have not explored this in s3fs since we have been fixing more basic issues, but it is something that interests me. Some of the ways s3fs accesses data are suboptimal, e.g., fetching metadata from S3 when it already has it locally. Unfortunately some of it may be inherent to S3 itself. If you can characterize this issue further, e.g., read-after-create, read-after-write, read-after-delete, etc., it might help direct our efforts. In the long run I want to explore the eventual consistency mode of S3Proxy to test s3fs.
@brianfay commented on GitHub (Jul 13, 2020):
@gaul The AWS docs that you linked mention that read-after-write consistency is guaranteed except in one condition: making a HEAD or GET request to a key before the object is created drops you back to eventual consistency.
I'm using s3fs to upload files from an sftp server to S3. The S3 bucket is configured to send notifications whenever a file is uploaded, and I have an application that listens for these notifications and tries to download the file.
I often see a similar "file does not exist" error, and it seems like I must be falling into this eventual consistency caveat. Something is trying to do a HEAD or GET before the file has been PUT for the first time.
Does s3fs always do a HEAD or GET before uploading a new file to s3? Or am I maybe running into some weird behavior specific to my sftp server?
@ggtakec commented on GitHub (Jul 13, 2020):
@liuzqt @brianfay
s3fs caches stat information for the S3 objects it presents as a file system.
(These are stored as x-amz-meta-* headers on the object.)
Caching of these stats can be controlled with the following options:
max_stat_cache_size
stat_cache_expire
stat_cache_interval_expire
enable_noobj_cache
The option that caches the fact that a target object does not exist is enable_noobj_cache. Please check your s3fs options related to the stat cache, and tell us about them.
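For illustration, these options appear on the s3fs command line like this (the bucket name, mount point, and values are placeholders, not recommendations):

```shell
# Placeholder bucket and mount point; the option names are the real
# s3fs stat-cache options listed above, but the values are illustrative.
s3fs mybucket /mnt/s3 \
  -o max_stat_cache_size=100000 \
  -o stat_cache_expire=30 \
  -o stat_cache_interval_expire=10 \
  -o enable_noobj_cache
```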
When operating a file, s3fs first checks the permissions of the parent directory, and then checks the existence and attributes of the file.
This is a necessary confirmation as a file system.
However, this results in a lot of HEAD requests.
The stats cache is used to reduce the number of s3fs HEAD requests.
However, this stats cache can cause lags and inconsistencies.
Also, to stay compatible with other clients that handle S3 as a file system, s3fs may perform checks that would otherwise be unnecessary.
The option to ignore this compatibility is notsup_compat_dir. Specifying this option can reduce the number of HEAD requests sent considerably.
Thanks in advance for your assistance.
@brianfay commented on GitHub (Jul 14, 2020):
Hi @ggtakec, thank you so much for the info!
I checked our fstab entry for s3fs and these are the relevant settings that we have configured:
stat_cache_expire=30
enable_noobj_cache
use_cache (pointed at a local directory)

We do not have notsup_compat_dir enabled; it could be interesting to try that.

To be honest, as I investigate this issue I realize there are a lot of other complications (for one, our s3fs is fairly old: s3fs --version says V1.79).

I guess upgrading s3fs, enabling notsup_compat_dir, and rethinking the server deployment approach are all good steps to take to mitigate this problem. But that said, it seems like no matter what we do, there's no way to completely avoid S3's eventual consistency caveat.
Do you have any recommendations for how to handle this on the application side, when an "object does not exist" error is received? Right now we have some basic retry logic - try to get the file every five seconds for fifteen seconds, but that often still fails. Perhaps we should use exponential backoff and retry for a longer period of time?
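The exponential-backoff idea raised above could be sketched like this (the function name, cap, and time budget are made up for illustration):

```shell
# Illustrative exponential backoff: check immediately, then wait
# 1s, 2s, 4s, ... (capped at 16s) between checks, until a total
# time budget is exhausted.
wait_for_object() {
  path="$1"
  budget="${2:-60}"   # total seconds to keep trying
  delay=1
  elapsed=0
  while :; do
    if [ -e "$path" ]; then return 0; fi
    if [ "$elapsed" -ge "$budget" ]; then return 1; fi
    sleep "$delay"
    elapsed=$((elapsed + delay))
    delay=$((delay * 2))
    if [ "$delay" -gt 16 ]; then delay=16; fi
  done
}
```

Compared with a fixed five-second poll, this spreads the attempts over a longer window, which gives eventual consistency more time to settle without hammering the mount.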
@ggtakec commented on GitHub (Jul 14, 2020):
@brianfay If you get an "object does not exist" error as a result of calling the S3 API directly, then you probably don't have a problem with s3fs. However, if you receive this error when accessing via s3fs, the s3fs options may be affecting it.
Try without the enable_noobj_cache option, which caches the fact that an object does not exist. If you do use this option, please use the latest version (some bugs have been fixed, and it will be updated again in the near future).
@gaul commented on GitHub (Jul 27, 2022):
@brianfay Could you test with the latest master?
Commit 404c284440 may address these symptoms.

@brianfay commented on GitHub (Jul 27, 2022):
Hi @gaul, I appreciate the update, but I am no longer using s3fs and don't have any way to test the behavior that I was seeing a few years ago.