mirror of
https://github.com/s3fs-fuse/s3fs-fuse.git
synced 2026-04-25 13:26:00 +03:00
[GH-ISSUE #94] s3fs: failed to read - randomly occurring #56
Originally created by @mknwebsolutions on GitHub (Dec 9, 2014).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/94
I've got s3fs mounted and working, but every once in a while I'll see a handful of errors like the ones below:
Eventually the retry count reaches its limit and the file is not pulled over s3fs. Not sure why this is happening; it's been pretty random. I want to say I see it occurring more often when files are 20 MB+.
I'm able to switch into the s3fs directory and view the actual files / touch new files / etc.
@gaul commented on GitHub (Dec 9, 2014):
@mknwebsolutions What does "not reach limit" mean? Is the file corrupt, or does your application not report an error? If you are running 1.78 and have an intermittent network connection, you may have encountered #64.
@mknwebsolutions commented on GitHub (Dec 9, 2014):
@andrewgaul the "not reach limit" is this error below:
Dec 9 05:40:13 ip-10-0-0-16 s3fs: Over retry count(3) limit(/file-name-here:1).
The file isn't corrupt. I've tried dumping random files with random sizes across and I see this issue - def not corrupt files. The network is through amazon AWS, and I'm sure AWS isn't having any network issues.
@chrislovecnm commented on GitHub (Jan 6, 2015):
I am getting the exact same problem. How can we help you debug this? I have compiled master.
@mknwebsolutions commented on GitHub (Jan 6, 2015):
So I was actually able to get everything working after rebooting the server. It just worked (and still working) since then.
@ggtakec commented on GitHub (Jan 6, 2015):
Hi, all
(I'm sorry for replying late.)
s3fs supports multipart requests (sending several requests in parallel), and I think this problem may depend on the number of parallel requests.
If you can, please try setting small values for the multireq_max and parallel_count options.
I want to know the result of this.
Thanks in advance for your help.
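For anyone trying this suggestion, a mount invocation with reduced parallelism might look like the following sketch. The bucket name, mount point, and credential file path are placeholders, and the exact defaults for these options can differ between s3fs versions:

```shell
# Hypothetical bucket and mount point; lower the parallel-request
# settings to test whether the retry errors are load-related.
s3fs mybucket /mnt/s3 \
    -o passwd_file=/etc/passwd-s3fs \
    -o multireq_max=5 \
    -o parallel_count=2
```

If the errors disappear with smaller values, that points at the number of concurrent requests rather than the network itself.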
@chrislovecnm commented on GitHub (Jan 6, 2015):
I am having the problem specifically on the Amazon Linux AMI. I am fine on a Gentoo distro running it locally. I am spinning up a Gentoo Docker container to see if I am OK in AWS on Gentoo.
What AMIs are confirmed to work?
@ggtakec I will test your recommendations as well.
@chrislovecnm commented on GitHub (Jan 6, 2015):
@ggtakec initial testing is showing that this appears to be a distro issue. Amazon Linux AMI is throwing those errors insanely. While gentoo docker running on the same damn box is working like a champ. Man at times I HATE Centos and RHEL...
@mknwebsolutions commented on GitHub (Jan 13, 2015):
I was able to solve the issue by just installing latest s3fs and rebooting.
@csgyuricza commented on GitHub (Jan 13, 2015):
Thank you - I am now able to run it with the latest version, but I still get that same timeout error occasionally.
@mknwebsolutions commented on GitHub (Jan 13, 2015):
What's your bash look like for mounting?
@mknwebsolutions commented on GitHub (Jan 13, 2015):
Actually I take that back, looks like my mounted s3 went bad a few hours ago "Transport endpoint is not connected"
@mknwebsolutions commented on GitHub (Jan 13, 2015):
I'm going to try out the -f option (foreground) from https://github.com/s3fs-fuse/s3fs-fuse/issues/57
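For reference, a foreground run for debugging could be sketched like this (bucket and mount point are placeholders; `-f` keeps s3fs attached to the terminal so log output goes to stdout instead of syslog):

```shell
# Hypothetical bucket/mountpoint; run s3fs in the foreground so
# messages print to the terminal while the problem is reproduced.
s3fs mybucket /mnt/s3 -f \
    -o passwd_file=/etc/passwd-s3fs
```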
@mknwebsolutions commented on GitHub (Jan 13, 2015):
-f didn't work, back to having the same issue, log below:
@mknwebsolutions commented on GitHub (Jan 13, 2015):
I bumped my instance up to a Medium Instance on EC2 -- errors are immediately gone. This is the second server that followed suit. Micro + Small EC2 instances constantly fail, must be a weak connection or so?
@chrislovecnm commented on GitHub (Jan 28, 2015):
@mknwebsolutions regardless of connection speed I have this problem with the Amazon ami.
@mknwebsolutions commented on GitHub (Jan 28, 2015):
It's a very weird issue. After my last comment here, my medium instance did fail again a few times. After numerous restarts, s3fs finally locked in and is still steady today. The bug is unknown so far. Could be a DNS issue or something.
@ggtakec commented on GitHub (Mar 8, 2015):
Hi, all
I heard about a possible libcurl problem related to this issue from @boazrf in #117.
In one case, when s3fs gets a CURLE_COULDNT_RESOLVE_HOST error, it results in a timeout error.
If you have the same problem, please try checking your libcurl version.
Thanks in advance for your assistance.
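One quick way to check the libcurl version on an affected host is via the curl binary, whose version banner reports the libcurl it is linked against:

```shell
# The first line of output names both the curl tool and the
# libcurl library version it links against, e.g.
# "curl 7.x.y (x86_64-...) libcurl/7.x.y ..."
curl --version | head -n 1
```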
@mknwebsolutions commented on GitHub (Mar 10, 2015):
@ggtakec makes sense, I figured it was something with DNS. I'd say downgrading is a temporary workaround until a solid fix is rolled out.
@ggtakec commented on GitHub (Mar 24, 2015):
I'm looking into the cause of this problem now, but I have not been able to solve it yet.
I think the cause of #117 was CURLE_COULDNT_RESOLVE_HOST, i.e. failing to resolve the host name. (There are many possible direct causes for this.)
Separately, when I ran s3fs with a very small connect timeout on EC2, I was able to reproduce a retry error like mknwebsolutions's.
If this is caused by the connect timeout, we should set a larger value for the "connect_timeout" option (the default is 10s). You may also need to set the "readwrite_timeout" option (its default is 30s).
Finally, regarding the #105 "transport endpoint is not connected" error: this is probably ENOTCONN (errno), which might be a bug in s3fs. However, since that error is also connection-related, we may be able to avoid it with the options above.
If you can, please try to specify those options, and let me know the result.
Regards,
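Applying this suggestion, a mount with larger timeouts could be sketched as follows. The bucket and mount point are placeholders, and the timeout values are illustrative examples, not values recommended in this thread:

```shell
# Hypothetical bucket/mountpoint; raise both timeouts well above
# the defaults discussed above (connect 10s, read/write 30s).
s3fs mybucket /mnt/s3 \
    -o connect_timeout=60 \
    -o readwrite_timeout=120
```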
@mknwebsolutions commented on GitHub (Mar 24, 2015):
@ggtakec I'm recalling some prior experience where a 10s timeout is definitely too low; it should be at least 30 seconds. If I remember correctly, I had issues in the past with AWS endpoints taking more than 10s to resolve "X".
@ggtakec commented on GitHub (Apr 12, 2015):
@mknwebsolutions I updated the default timeout value in #167.
Please check it.
If a timeout error still occurs, please try changing the timeout values with the connect_timeout and readwrite_timeout options.
Regards,
@ggtakec commented on GitHub (Jan 17, 2016):
I'm closing this issue. If you still have a problem, please post a new issue or reopen this one.
Thanks in advance for your help.