[GH-ISSUE #1465] Trying to improve scan time of files in s3fs-fuse mount #772

Closed
opened 2026-03-04 01:48:39 +03:00 by kerem · 16 comments

Originally created by @matrush900 on GitHub (Oct 30, 2020).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/1465

Additional Information

The following information is very important in order to help us to help you. Omission of the following details may delay your support request or cause it to receive no attention at all.
Keep in mind that the commands we provide to retrieve information are oriented to GNU/Linux distributions, so you may need to use other commands if you use s3fs on macOS or BSD.

Version of s3fs being used (s3fs --version)

1.86

Version of fuse being used (pkg-config --modversion fuse, rpm -qi fuse, dpkg -s fuse)

2.9.2

Kernel information (uname -r)

3.10.0-1127.13.1.el7.x86_64

GNU/Linux Distribution, if applicable (cat /etc/os-release)

NAME="Red Hat Enterprise Linux Server"
VERSION="7.8 (Maipo)"
ID="rhel"
ID_LIKE="fedora"
VARIANT="Server"
VARIANT_ID="server"
VERSION_ID="7.8"
PRETTY_NAME=RHEL
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:7.8:GA:server"
HOME_URL="https://www.redhat.com/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"

REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 7"
REDHAT_BUGZILLA_PRODUCT_VERSION=7.8
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="7.8"

s3fs command line used, if applicable

/etc/fstab entry, if applicable

accumulo-cold-archive /data-cold fuse.s3fs kernel_cache,max_background=1000,max_stat_cache_size=300000,enable_noobj_cache,multipart_size=52,parallel_count=15,multireq_max=15,dbglevel=warn,_netdev,allow_other,mp_umask=0022,nonempty,use_path_request,iam_role=auto 0 0

s3fs syslog messages (grep s3fs /var/log/syslog, journalctl | grep s3fs, or s3fs outputs)

If you execute s3fs with the dbglevel or curldbg option, you can get detailed debug messages.

Details about issue
We are running an HDFS cluster with Accumulo. We have 48 datanodes in the cluster, each with five 2 TB EBS volumes plus an S3 mount using s3fs-fuse that we've recently added. The configuration within HDFS has each node pointing to its own folder within the S3 mount, e.g. /data-cold/cloud-int-data1a/dfs/dn. We are running into a bit of an issue when restarting a datanode: part of the startup process catalogs/scans each object under each of the five 2 TB drives along with the /data-cold s3fs mountpoint. Each EBS volume takes ~45 seconds, whereas the /data-cold mount takes 14-20 minutes. I realize that s3fs will be slower, but are there any parameter changes we should try to speed this up?
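
For context, a rough way to reproduce the difference outside of HDFS (the EBS path below is a placeholder; the S3 path is the one from this report) is to time a recursive listing of each storage directory, since the datanode's startup scan has to stat every block file:

```
# Hypothetical timing comparison between one EBS data dir and the s3fs mount.
time ls -lR /data1/dfs/dn > /dev/null                          # local EBS volume (placeholder path)
time ls -lR /data-cold/cloud-int-data1a/dfs/dn > /dev/null     # s3fs mount
```

On s3fs, each stat that misses the cache typically turns into a HEAD request, which is where the extra minutes go.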

@matrush900 commented on GitHub (Oct 30, 2020):

I forgot to mention that each HDFS block in S3 is 128MB, and there are 150000-200000 objects in each datanode folder in the bucket.


@gaul commented on GitHub (Oct 30, 2020):

Could you try increasing the value of -o multireq_max? s3fs is issuing many HEAD requests to get the stat information for readdir.
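
For reference, a minimal sketch of a mount with a larger value (the numbers are illustrative, not a recommendation from this comment; credential and other site-specific options are omitted):

```
# Hypothetical remount with more concurrent metadata requests.
s3fs accumulo-cold-archive /data-cold \
    -o iam_role=auto \
    -o multireq_max=100 \
    -o parallel_count=30 \
    -o enable_noobj_cache,max_stat_cache_size=300000
```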


@tke273 commented on GitHub (Oct 30, 2020):

I work with Mat and we have that and parallel_count set to 15. When we had it set to 30, the s3fs mount went offline on multiple nodes, and a umount/remount was needed to get going again. It is now stable at 15, but the latency after mount is where we are looking for guidance. Is there a way to determine what the setting should be?


@gaul commented on GitHub (Oct 30, 2020):

Doubling the number should halve the scan time and so on. Please test again with 1.87; it includes fixes that might address your symptoms. We expect that s3fs should support a hundred or more concurrent requests. If it crashes we can investigate further.


@matrush900 commented on GitHub (Nov 2, 2020):

We've tried a few different combinations of parallel_count and multireq_max, first with 1.86, then with 1.87. None of these combinations made any appreciable difference.
1.86 parallel_count=15, multireq_max=30 - 28 minutes
1.86 parallel_count=15, multireq_max=90 - 26 minutes
1.87 parallel_count=20, multireq_max=90 - 26 minutes
1.87 parallel_count=30, multireq_max=90 - 25 minutes

Are there any other parameter changes we may want to try? It doesn't appear that this s3fs mount is crashing like before, but we may have to watch it for a day to find out.


@matrush900 commented on GitHub (Nov 3, 2020):

It looks like our GetRequests to the bucket are topping out at 10,500/minute during these times as well.
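
One way to double-check that per-minute figure, assuming S3 request metrics are enabled on the bucket (the FilterId and time window below are placeholders), is to query CloudWatch directly:

```
# Hypothetical CloudWatch query: GetRequests on the bucket, summed per minute.
# Requires a request-metrics configuration (FilterId) on the bucket.
aws cloudwatch get-metric-statistics \
    --namespace AWS/S3 \
    --metric-name GetRequests \
    --dimensions Name=BucketName,Value=accumulo-cold-archive Name=FilterId,Value=EntireBucket \
    --start-time 2020-11-03T00:00:00Z --end-time 2020-11-03T01:00:00Z \
    --period 60 --statistics Sum
```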


@gaul commented on GitHub (Nov 4, 2020):

You might also try -o enable_noobj_cache. If this does not help, can you analyze the requests being sent via the logs in -o curldbg?
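
If curldbg is enabled, one rough way to see the mix of request types during the scan (this assumes the debug output reaches syslog and includes curl's request lines; the log path is a guess for RHEL) might be:

```
# Hypothetical tally of HTTP verbs issued by s3fs during the scan window.
grep s3fs /var/log/messages | grep -oE '\b(HEAD|GET|PUT|POST|DELETE)\b' | sort | uniq -c
```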


@matrush900 commented on GitHub (Nov 11, 2020):

We already had enable_noobj_cache implemented, but I've added curldbg to several of our nodes. I'll let you know if we see anything strange from the curldbg output. We've also created a second bucket, and mounted it using s3fs-fuse. This looks promising so far, but needs some more load testing.


@gaul commented on GitHub (Jan 1, 2021):

I tried various values of multireq_max when running ls --color=always /path | wc -l on a bucket with 3,000 objects and ~100 ms latency:

| multireq_max | run 1 | run 2 | run 3 |
| -----------: | ----: | ----: | ----: |
| 10 | 74.663s | 75.892s | 77.274s |
| 20 | 55.648s | 57.224s | 60.967s |
| 40 | 47.310s | 49.731s | 50.041s |
| 80 | 43.335s | 45.279s | 46.692s |
| 160 | 42.993s | 43.069s | 44.234s |

I used a new instance of s3fs in each run to prevent caching. Increasing multireq_max clearly helps, but the improvement falls well short of the linear speedup that the added parallelism should provide. This needs more investigation.
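
For anyone reproducing this measurement, a rough sketch of the procedure (bucket name, mountpoint, and path are placeholders, and credential options are omitted) might look like:

```
# Hypothetical benchmark loop: a fresh mount per run so s3fs's stat cache and
# the kernel's dentry/attribute caches start cold each time.
for n in 10 20 40 80 160; do
    s3fs test-bucket /mnt/s3 -o multireq_max=$n
    time ls --color=always /mnt/s3/path | wc -l
    fusermount -u /mnt/s3
done
```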


@matrush900 commented on GitHub (Jan 5, 2021):

Thanks for continuing to look at this. Here's an update on our current settings. We are connecting 10 buckets to each HDFS node with kernel_cache,max_background=1000,max_stat_cache_size=300000,enable_noobj_cache,multipart_size=52,parallel_count=15,multireq_max=15,dbglevel=warn,_netdev,allow_other,mp_umask=0022,use_path_request,iam_role=auto 0 0

At this point each s3fs process is using 7-20% CPU after an HDFS restart, when it's cataloging/listing the objects contained under each bucket; each HDFS datanode has eight CPUs. I'm not sure if the 20% CPU limit is due to an s3fs or HDFS limitation.
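
A quick way to see how busy each s3fs process actually is during the post-restart scan (the tool choice here is a suggestion, not something from this thread):

```
# Hypothetical per-process CPU sampling for every running s3fs instance,
# refreshed every 5 seconds.
pidstat -u -p "$(pgrep -d, -x s3fs)" 5
```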


@matrush900 commented on GitHub (Jan 7, 2021):

It looks like HDFS is generating about 1 billion list requests a day through s3fs-fuse across our 10 buckets. Would we benefit from increasing list_object_max_keys from 1000 to 100000? Each s3fs mount is attached to a bucket with a folder containing around 85000 objects.
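
For reference, the option in question is set at mount time; a hedged sketch (options trimmed to the relevant ones) is below. Note that S3 itself caps a single ListObjects response at 1,000 keys, so values above 1000 may not reduce the number of LIST round trips.

```
# Hypothetical mount showing where list_object_max_keys is set; S3 returns at
# most 1,000 keys per ListObjects response regardless of the requested value.
s3fs accumulo-cold-archive /data-cold -o iam_role=auto -o list_object_max_keys=1000
```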


@matrush900 commented on GitHub (Jan 8, 2021):

We've removed the enable_noobj_cache flag, but no luck: LIST API calls are still high, no change.


@gaul commented on GitHub (Jan 11, 2021):

Related to #1482.


@matrush900 commented on GitHub (Jan 12, 2021):

We determined that as part of HDFS's routine, it was running a du -sk every 15 minutes on each storage directory to find out how much space was available. This was creating 90-95% of the 1 billion list requests per day. We swapped du for df, which dramatically cut our ListObject requests.
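
The difference in cost is easy to see on an s3fs mount: du has to walk and list every object, while df only issues a statvfs call, which (as noted in the next comment) s3fs answers with fixed values rather than by listing the bucket. A quick illustration, using the storage directory from this report:

```
# du recursively lists every object under the directory (many LIST requests),
# while df issues a single statvfs call that s3fs answers with fixed values.
time du -sk /data-cold/cloud-int-data1a/dfs/dn
time df -k /data-cold
```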


@gaul commented on GitHub (Jan 13, 2021):

> We determined that as part of HDFS's routine, it was running a du -sk every 15 minutes on each storage directory to find out how much space was available. This was creating 90-95% of the 1 billion list requests per day. We swapped du for df, which dramatically cut our ListObject requests.

I am glad you could diagnose this! Note that the s3fs statvfs call just returns the maximum value for available space and 0 for used, so you will not get an accurate count.

The periodic du issue is similar to the updatedb symptoms we see sometimes. I wonder if there is some way for users to see which processes are querying s3fs, so that they can more easily diagnose similar issues.

Please let us know if your performance improves without the periodic du. Let's leave this issue open, though, so I can follow up on https://github.com/s3fs-fuse/s3fs-fuse/issues/1465#issuecomment-753275976 at some point.
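
As a low-tech starting point for the "which process is hitting the mount" question (these are generic Linux tools, not anything s3fs provides, and they only show processes that currently hold files open or have their working directory on the mount):

```
# Hypothetical check for processes currently using the s3fs mountpoint.
fuser -vm /data-cold
lsof /data-cold
```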


@gaul commented on GitHub (Apr 25, 2021):

Closing since a workaround addresses the symptoms.
