[GH-ISSUE #821] Modify s3fs default behaviour #477

Closed
opened 2026-03-04 01:45:57 +03:00 by kerem · 5 comments

Originally created by @aurorak on GitHub (Sep 13, 2018).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/821

Additional Information

The following information is very important in order to help us help you. Omitting these details may delay your support request or cause it to receive no attention at all.

Version of s3fs being used (s3fs --version)

1.84

Version of fuse being used (pkg-config --modversion fuse)

2.9.4

System information (uname -r)

3.10.0-514.el7.x86_64

Distro (cat /etc/issue)

Red Hat Enterprise Linux Server release 7.3 (Maipo)

s3fs command line used (if applicable)

/etc/fstab entry (if applicable):

s3fs#data /base_path/bucket/subdirFolder fuse _netdev,allow_other,nodnscache,use_path_request_style,url=http://blobstore.com,passwd_file=/path/to/password_file,endpoint=blob,umask=002,connect_timeout=20,readwrite_timeout=20,multireq_max=5,max_stat_cache_size=4096,stat_cache_expire=60,dbglevel=warn,modules=subdir,subdir=/subdirFolder 0 0

s3fs syslog messages (grep s3fs /var/log/syslog, or s3fs outputs)

Not posting the whole log, just a summary:


Sep 10 20:54:53 msc02-jag-ver-001 s3fs[425925]: > HEAD /bucket/subFolder/2018/9/10/20/54/1460035 HTTP/1.1
Sep 10 20:54:53 msc02-jag-ver-001 s3fs[425925]: < HTTP/1.1 404 Not Found

Sep 10 20:54:53 msc02-jag-ver-001 s3fs[425925]: > HEAD /bucket/subFolder/2018/9/10/20/54/1460035/ HTTP/1.1
Sep 10 20:54:53 msc02-jag-ver-001 s3fs[425925]: < HTTP/1.1 404 Not Found

Sep 10 20:54:53 msc02-jag-ver-001 s3fs[425925]: > HEAD /bucket/subFolder/2018/9/10/20/54/1460035_%24folder%24 HTTP/1.1
Sep 10 20:54:53 msc02-jag-ver-001 s3fs[425925]: < HTTP/1.1 404 Not Found

Sep 10 20:54:53 msc02-jag-ver-001 s3fs[425925]: > GET /bucket/?delimiter=/&max-keys=2&prefix=subFolder/2018/9/10/20/54/1460035/ HTTP/1.1
Sep 10 20:54:53 msc02-jag-ver-001 s3fs[425925]: < HTTP/1.1 200 OK

Details about issue

We are trying to get s3fs to work with our custom S3-compatible solution. Let's call it blobstore. We have about 20 nodes in our database, each running s3fs to read from our blobstore. During each microbatch of our job we write files to the blobstore, which are then read on each database node. Once read, the files are rarely read again. For each microbatch we see connections from s3fs on each node; the requests and their response codes are summarized above. As you can see, we get three 404s and one successful request. Are there any config changes we could make to avoid those 404s?
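For context, a minimal sketch (not s3fs source) of the four requests shown in the log: s3fs 1.84 probes for a plain object, a trailing-slash directory object, a legacy `_$folder$` marker, and finally lists the prefix. The `lookup` helper below is hypothetical, for illustration only.

```shell
# Hypothetical illustration of the probe order behind the syslog excerpt above.
lookup() {
  key="$1"
  printf 'HEAD /bucket/%s\n'  "$key"                # 1: plain object
  printf 'HEAD /bucket/%s/\n' "$key"                # 2: directory object (trailing slash)
  printf 'HEAD /bucket/%s_%%24folder%%24\n' "$key"  # 3: legacy "_$folder$" marker
  printf 'GET  /bucket/?delimiter=/&max-keys=2&prefix=%s/\n' "$key"  # 4: prefix listing
}
lookup "subFolder/2018/9/10/20/54/1460035"
```

When the name exists only as a prefix (common prefix of other keys), the first three probes return 404 and only the listing succeeds, which matches the log.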

kerem closed this issue 2026-03-04 01:45:57 +03:00

@gaul commented on GitHub (Jan 24, 2019):

s3fs is trying to detect whether a file or a directory exists, including legacy directories. Does -o notsup_compat_dir eliminate some of these unwanted requests?
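For reference, this is what the fstab entry from the report might look like with that option appended (a sketch only; untested against the reporter's blobstore, and it assumes `notsup_compat_dir` composes with the existing options):

```
s3fs#data /base_path/bucket/subdirFolder fuse _netdev,allow_other,nodnscache,use_path_request_style,url=http://blobstore.com,passwd_file=/path/to/password_file,endpoint=blob,umask=002,connect_timeout=20,readwrite_timeout=20,multireq_max=5,max_stat_cache_size=4096,stat_cache_expire=60,dbglevel=warn,modules=subdir,subdir=/subdirFolder,notsup_compat_dir 0 0
```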


@acdha commented on GitHub (Mar 15, 2019):

For my project, notsup_compat_dir breaks compatibility because we have keys following the / suffix pattern but no empty directory objects. If, like awscli, s3fs did the key-prefix check first, this would work without needing to create empty metadata files.


@kahing commented on GitHub (Apr 9, 2019):

Prefix checks are 10x more expensive than 404s. Metadata operations are usually a significant part of your overall S3 expenses. Do the 404s bother you, or are they causing real problems?


@ggtakec commented on GitHub (Apr 9, 2019):

By using the latest s3fs with curl version 7.51.0 or later, you can keep the SSL session alive even across 404 responses. This minimizes the cost of the extra requests.
Please try it.
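A quick way to check whether the installed curl meets that threshold (a sketch; the `installed` value below is hard-coded to RHEL 7's stock curl 7.29.0 as an assumed example, and `sort -V` requires GNU coreutils):

```shell
# Compare the installed curl version against the 7.51.0 minimum mentioned above.
required="7.51.0"
installed="7.29.0"   # example value; in practice use: curl --version | awk 'NR==1{print $2}'
if [ "$(printf '%s\n' "$required" "$installed" | sort -V | head -n1)" = "$required" ]; then
  echo "curl $installed: SSL sessions can be reused across requests"
else
  echo "curl $installed is older than $required: sessions are re-negotiated"
fi
```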


@gaul commented on GitHub (Oct 10, 2020):

This works as intended.
