[GH-ISSUE #117] S3fs occasionally fails to open files #72

Closed
opened 2026-03-04 01:41:46 +03:00 by kerem · 4 comments
Owner

Originally created by @boazrf on GitHub (Feb 4, 2015).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/117

While running a process that does intensive reads and writes, s3fs occasionally fails to open files.
The problem may be related to #94.

Environment:
s3fs version: 1.78
OS: Amazon linux
Linux version 3.10.40-50.136.amzn1.x86_64 (mockbuild@gobi-build-60001) (gcc version 4.8.2 20131212 (Red Hat 4.8.2-7) (GCC) ) #1 SMP Tue May 13 21:35:08 UTC 2014

Scenario

  1. Multipart read is disabled (-o nomultipart).
  2. The application uses many threads to read from and write to S3.
  3. The same file is never read and written at the same time; a read happens long (minutes) after the write.
  4. Occasionally an attempt to open or read a file fails with an error.
  5. With -d -f -o curldbg -o f2 enabled for debugging, I see two issues in the log.

Issue 1
Even when everything works, the message Hostname was NOT found in DNS cache repeats constantly throughout the log. This might be a curl issue, but I'm not sure how to resolve it. I came across a suggestion to downgrade curl to 7.31 (http://stackoverflow.com/questions/27093467/curl-hostname-was-not-found-in-dns-cache-error), but I haven't yet figured out how to do it (yum downgrade doesn't work).

* Connection #0 to host test-boaz2.s3-us-west-2.amazonaws.com left intact
* Hostname was NOT found in DNS cache
*   Trying 54.231.164.145...
* Connected to test-boaz2.s3-us-west-2.amazonaws.com (54.231.164.145) port 80 (#0)
> HEAD /1tb2/metaData HTTP/1.1

Issue 2
After running for a while, I get Could not resolve host: and CURLE_COULDNT_RESOLVE_HOST messages, followed by a failure to open the file after several retries (I increased the retry count to 100, but it didn't help).

* Could not resolve host: test-boaz2.s3-us-west-2.amazonaws.com
* Closing connection 10
* Hostname was NOT found in DNS cache
* Could not resolve host: test-boaz2.s3-us-west-2.amazonaws.com
* Closing connection 0
* Hostname was NOT found in DNS cache
* Could not resolve host: test-boaz2.s3-us-west-2.amazonaws.com
* Closing connection 11
* Hostname was NOT found in DNS cache
* Could not resolve host: test-boaz2.s3-us-west-2.amazonaws.com
* Closing connection 11
* Hostname was NOT found in DNS cache
* Could not resolve host: test-boaz2.s3-us-west-2.amazonaws.com
* Closing connection 11
* Hostname was NOT found in DNS cache
* Could not resolve host: test-boaz2.s3-us-west-2.amazonaws.com
* Closing connection 1
* Connection timed out after 10519 milliseconds
* Closing connection 1
* Hostname was NOT found in DNS cache
* Could not resolve host: test-boaz2.s3-us-west-2.amazonaws.com
* Closing connection 12
37): [path=/1tb2/tmp/index_block-12-1-0-12-129600843.rct][fd=64][refcnt=1]
    RequestPerform(1726): ### retrying...
    RemakeHandle(1393): Retry request. [type=1][url=http://test-boaz2.s3-us-west-2.amazonaws.com/1tb2/tmp/index_block-21-1-0-12-129600843.xml][path=/1tb2/tmp/index_block-21-1-0-12-129600843.xml]
RequestPerform(1633): ### CURLE_COULDNT_RESOLVE_HOST
    s3fs_read(1935): [path=/1tb2/data/1/2/p1/12/index_block-12-1-11-11-120000000.dat][size=131072][offset=20168704][fd=494]
    Open(1147): [path=/1tb2/data/1/2/p1/12/index_block-12-1-11-11-120000000.dat][size=-1][time=-1]
    Open(577): [path=/1tb2/data/1/2/p1/12/index_block-12-1-11-11-120000000.dat][fd=494][size=-1][time=-1]
    Dup(561): [path=/1tb2/data/1/2/p1/12/index_block-12-1-11-11-120000000.dat][fd=494][refcnt=2]
    Read(933): [path=/1tb2/data/1/2/p1/12/index_block-12-1-11-11-120000000.dat][fd=494][offset=20168704][size=131072]
    Load(786): [path=/1tb2/data/1/2/p1/12/index_block-12-1-11-11-120000000.dat][fd=494][offset=20168704][size=131072]
    s3fs_read(1935): [path=/1tb2/data/1/2/p1/12/index_block-12-1-11-11-120000000.dat][size=131072][offset=20299776][fd=494]
    Open(1147): [path=/1tb2/data/1/2/p1/12/index_block-12-1-11-11-120000000.dat][size=-1][time=-1]
    GetObjectRequest(2341): [tpath=/1tb2/data/1/2/p1/12/index_block-12-1-11-11-120000000.dat][start=20195136][size=108576]
    get_object_attribute(329): [path=/1tb2/data/1/2/p1/12/index_block-12-1-11-11-120000000.dat]
    Open(577): [path=/1tb2/data/1/2/p1/12/index_block-12-1-11-11-120000000.dat][fd=494][size=-1][time=-1]
    Dup(561): [path=/1tb2/data/1/2/p1/12/index_block-12-1-11-11-120000000.dat][fd=494][refcnt=3]
    GetStat(170): stat cache hit [path=/1tb2/data/1/2/p1/12/index_block-12-1-11-11-120000000.dat][time=1423023545][hit count=187]
    PreGetObjectRequest(2275): [tpath=/1tb2/data/1/2/p1/12/index_block-12-1-11-11-120000000.dat][start=20195136][size=108576]
    prepare_url(174): URL is http://s3-us-west-2.amazonaws.com/test-boaz2/1tb2/data/1/2/p1/12/index_block-12-1-11-11-120000000.dat
    prepare_url(204): URL changed is http://test-boaz2.s3-us-west-2.amazonaws.com/1tb2/data/1/2/p1/12/index_block-12-1-11-11-120000000.dat
    GetObjectRequest(2356): downloading... [path=/1tb2/data/1/2/p1/12/index_block-12-1-11-11-120000000.dat][fd=494]
    RequestPerform(1571): connecting to URL http://test-boaz2.s3-us-west-2.amazonaws.com/1tb2/data/1/2/p1/12/index_block-12-1-11-11-120000000.dat
RequestPerform(1633): ### CURLE_COULDNT_RESOLVE_HOST
    RequestPerform(1726): ### retrying...
    RemakeHandle(1393): Retry request. [type=1][url=http://test-boaz2.s3-us-west-2.amazonaws.com/1tb2/data/1/2/p1/18/.fuse_hidden0000f460000002a9][path=/1tb2/data/1/2/p1/18/.fuse_hidden0000f460000002a9]
RequestPerform(1633): ### CURLE_COULDNT_RESOLVE_HOST
    RequestPerform(1726): ### retrying...
    RemakeHandle(1393): Retry request. [type=1][url=http://test-boaz2.s3-us-west-2.amazonaws.com/1tb2/data/1/2/p2/5/index_block-5-2-0-0-10000000.idx][path=/1tb2/data/1/2/p2/5/index_block-5-2-0-0-10000000.idx]
RequestPerform(1633): ### CURLE_COULDNT_RESOLVE_HOST
    RequestPerform(1726): ### retrying...
    RemakeHandle(1393): Retry request. [type=1][url=http://test-boaz2.s3-us-west-2.amazonaws.com/1tb2/tmp/index_block-21-1-0-12-129600843.xml][path=/1tb2/tmp/index_block-21-1-0-12-129600843.xml]
RequestPerform(1633): ### CURLE_COULDNT_RESOLVE_HOST
    RequestPerform(1726): ### retrying...
    RemakeHandle(1393): Retry request. [type=4][url=http://test-boaz2.s3-us-west-2.amazonaws.com/1tb2/data/1/2/p1/12/index_block-12-1-11-11-120000000.dat][path=/1tb2/data/1/2/p1/12/index_block-12-1-11-11-120000000.dat]
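
To see whether the system resolver itself fails intermittently, independently of libcurl, the lookup can be repeated outside s3fs. This is a minimal sketch, assuming `getent` is available on the host; `count_dns_failures` and the `RESOLVE_CMD` override are hypothetical names used only for this example. libcurl may be built with a different resolver backend than the one `getent` uses, so the result is only indicative.

```shell
#!/bin/sh
# count_dns_failures HOST ATTEMPTS
# Resolve HOST the given number of times and print how many lookups failed.
# RESOLVE_CMD is a hypothetical override hook; it defaults to `getent hosts`.
count_dns_failures() {
    host=$1
    attempts=$2
    failures=0
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # The variable is left unquoted on purpose so "getent hosts" splits into two words.
        if ! ${RESOLVE_CMD:-getent hosts} "$host" >/dev/null 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures"
}

# Example call (not executed here):
#   count_dns_failures test-boaz2.s3-us-west-2.amazonaws.com 100
```

A non-zero count while s3fs is under load would point at the resolver or the network rather than at s3fs; a count of zero leaves libcurl's own DNS handling as the more likely suspect.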
kerem closed this issue 2026-03-04 01:41:46 +03:00

@ggtakec commented on GitHub (Mar 8, 2015):

I'm sorry for the late reply.

I searched for the cause of the "CURLE_COULDNT_RESOLVE_HOST" error, but I did not find a clear answer.
I have not found the cause and have not been able to reproduce the problem yet.

By default, s3fs uses a DNS cache and a session ID cache that are shared through libcurl's share interface (CURLSHOPT_SHARE).
So if you can, please try running s3fs with the nodnscache and nosscache options.
With these options, s3fs does not use the DNS and session caches.
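
A minimal sketch of such a mount, assuming credentials are already configured; the bucket name and mount point in the example call are placeholders, and `mount_without_caches` and the `S3FS_BIN` override are hypothetical names used only here.

```shell
#!/bin/sh
# mount_without_caches BUCKET MOUNTPOINT
# Mount BUCKET with the shared DNS cache and the SSL session cache disabled.
# S3FS_BIN is a hypothetical override for this sketch; it defaults to `s3fs`.
mount_without_caches() {
    "${S3FS_BIN:-s3fs}" "$1" "$2" -o nodnscache -o nosscache
}

# Example call (not executed here; /mnt/s3 is an assumed mount point):
#   mount_without_caches test-boaz2 /mnt/s3
```

Any other options already in use, such as the debug flags from the report, can be appended to the same command line.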

Also, if you can, I would like to know the result of running s3fs over HTTP (not HTTPS).

I'm sorry that I do not know the cause of this issue yet.
(We need to find out why libcurl could not resolve the host.)

Thanks in advance for your assistance.


@boazrf commented on GitHub (Mar 8, 2015):

Takeshi, thanks for reviewing the problem.
I was able to overcome the problem by downgrading libcurl to version 7.31. It seems there is a known bug in newer versions of curl that causes its DNS cache to fail in some cases. The bug doesn't exist in version 7.31 (http://stackoverflow.com/questions/27093467/curl-hostname-was-not-found-in-dns-cache-error).

Because yum downgrade didn't work, I did the following:

  1. Downloaded the curl 7.31 source (wget http://www.execve.net/curl/curl-7.31.0.tar.gz)
  2. Built it
  3. Manually replaced libcurl (cp libcurl.so.4.3.0 /usr/lib64/libcurl.so.4.3.0)
  4. Restarted the s3fs mount
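
The steps above can be sketched as a script. This is only an outline under stated assumptions: the build uses the default `./configure` and `make`, and the built library is assumed to end up under `lib/.libs/` in the source tree. The `run` helper, `DRY_RUN`, and `downgrade_libcurl` are hypothetical names for this example; the final copy needs root.

```shell
#!/bin/sh
# run CMD...: execute CMD, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Steps 1 to 3: download, build, and replace the system libcurl.
# Assumption: the built library is found under lib/.libs/ after `make`.
downgrade_libcurl() {
    run wget http://www.execve.net/curl/curl-7.31.0.tar.gz &&
    run tar xzf curl-7.31.0.tar.gz &&
    run cd curl-7.31.0 &&
    run ./configure &&
    run make &&
    run cp lib/.libs/libcurl.so.4.3.0 /usr/lib64/libcurl.so.4.3.0
}

# Step 4 is left manual: unmount and remount s3fs so it loads the replaced library.
# Preview the commands without running them:
#   DRY_RUN=1 downgrade_libcurl
```

Overwriting a system library affects every program that links against it, so keeping a backup copy of the original file first is a sensible precaution.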

After that, the message Hostname was NOT found in DNS cache disappeared, and so did CURLE_COULDNT_RESOLVE_HOST. Most importantly, opening files stopped failing.

So I consider this issue closed. I believe this workaround will also resolve issue #94. It might be a good idea to update the install doc and add a check for a valid libcurl version.
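
A version check along those lines could look like the sketch below. It is an illustration, not something s3fs ships: `libcurl_version` and `check_libcurl` are hypothetical helpers, and the check reads the libcurl version reported by the `curl` command-line tool, which is an assumption because s3fs may load a different copy of the library.

```shell
#!/bin/sh
# libcurl_version TEXT
# Print the libcurl version found in `curl --version` style output.
libcurl_version() {
    printf '%s\n' "$1" | sed -n 's/.*libcurl\/\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# check_libcurl EXPECTED
# Compare the version reported by the local curl tool with EXPECTED.
check_libcurl() {
    expected=$1
    found=$(libcurl_version "$(curl --version 2>/dev/null)")
    if [ "$found" = "$expected" ]; then
        echo "libcurl $found matches"
    else
        echo "libcurl ${found:-unknown} found, expected $expected"
        return 1
    fi
}

# Example call (not executed here):
#   check_libcurl 7.31.0
```

To see which libcurl file the s3fs binary is actually linked against, `ldd "$(command -v s3fs)"` lists its shared libraries.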


@ggtakec commented on GitHub (Mar 8, 2015):

@boazrf Thanks a lot.


@ggtakec commented on GitHub (Jan 17, 2016):

I'm closing this issue. If you still have a problem, please open a new issue or reopen this one.

Thanks in advance for your help.
