[GH-ISSUE #158] Slow and/or no response when listing long directories #93

Closed
opened 2026-03-04 01:42:00 +03:00 by kerem · 20 comments
Owner

Originally created by @darrencruse on GitHub (Mar 23, 2015).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/158

I searched the issues and didn't find anything quite like this; feel free to delete this if I overlooked it.

We are running s3fs on Amazon EC2, and if you simply "ls" a directory that has a lot of files or subdirectories (I don't have numbers, but "a lot" means roughly in the hundreds), the response time is terrible.

The same type of thing happens if I'm in bash and attempt tab completion on a partial file name, btw. If it's a directory with lots of files, it just hangs.

Sometimes if I wait a long time (minutes) it will eventually return, but other times I just give up and kill the window (Ctrl-C sometimes gets me back to a shell prompt, but not always).

I don't have the problem with directories with fewer files (say less than a hundred). There the response time isn't super fast but is acceptable.

kerem closed this issue 2026-03-04 01:42:00 +03:00

@MdNor commented on GitHub (Apr 2, 2015):

I think this behavior is to be expected when you try to mount any object storage, such as S3, as a filesystem. This is an object storage limitation, not an s3fs one.

s3fs issues GET requests to S3 every time you try to list files and directories. Try mounting your bucket in debug mode and issuing a list command; you'll see what s3fs is actually doing during those slow/no-response periods.

<!-- gh-comment-id:88861863 -->

@gaul commented on GitHub (Aug 8, 2015):

Fixing #223 will help with this. Agreed that s3fs will always have worse performance than something like NFS due to the lack of readdir+.

<!-- gh-comment-id:129059285 -->

@pshenoy-uc commented on GitHub (Sep 19, 2015):

I have a similar issue.

My AWS S3 bucket has millions of files and I am mounting it using s3fs. Any time an 'ls' command is issued (even unintentionally), the terminal hangs. Tab completion of filenames triggers the same issue.

If there were a way to limit the number of results returned to some 'n', which we could define as an option for 'ls' and/or tab completion, it would help. Would this be a possible fix?
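There's no ls flag for this, but as a generic workaround sketch (not an s3fs option; the `list_first_n` helper below is hypothetical), you can stream a bounded listing yourself with `os.scandir`, which avoids sorting and stops after n entries (though on an s3fs mount the underlying readdir may still fetch the full listing internally):

```python
import itertools
import os
import tempfile

def list_first_n(path, n):
    """Yield at most n entry names, unsorted and without statting each one."""
    with os.scandir(path) as it:
        for entry in itertools.islice(it, n):
            yield entry.name

# Demo against a throwaway local directory with 50 files:
d = tempfile.mkdtemp()
for i in range(50):
    open(os.path.join(d, f"f{i:03d}"), "w").close()

first10 = list(list_first_n(d, 10))
```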

<!-- gh-comment-id:141642850 -->

@kahing commented on GitHub (Sep 29, 2015):

`ls` doesn't give you a way to limit the number of returned results, and by default it waits for the entire list and sorts it before printing anything. One thing you can try to speed it up is `ls -f`. Internally s3fs still waits for all the S3 responses from `ListObjects`, but at least `ls` won't stat the objects, which is faster.

Realistically though, how in the world are you going to visualize 1 million files in your terminal anyway?

<!-- gh-comment-id:144150596 -->

@gaul commented on GitHub (Sep 29, 2015):

@kahing `ls -f` calls `getdents`, which causes excessive HEAD object calls.

<!-- gh-comment-id:144177194 -->

@dqh-au commented on GitHub (Feb 2, 2016):

I'm having the same issue with a folder of 10,000 files. I wrote a simple Java program that can iterate over the files in the bucket, printing them out, in 11 seconds. It takes nearly 8 minutes to `ls` the same folder using s3fs!

<!-- gh-comment-id:178748567 -->

@kahing commented on GitHub (Feb 2, 2016):

It's simply not possible to have POSIX, performance, and S3 all in the same sentence. I hate to self-evangelize here, but I wrote https://github.com/kahing/goofys to make a different trade-off, and one of the things it solves is precisely this.

<!-- gh-comment-id:178783987 -->

@dqh-au commented on GitHub (Feb 2, 2016):

Do you know what a naked `ls` is doing beyond simply iterating over the bucket keys?

<!-- gh-comment-id:178801074 -->

@gaul commented on GitHub (Feb 2, 2016):

@davidqhogan s3fs issues a HEAD request for each key to get POSIX attributes like mode bits from the user metadata.
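A rough back-of-envelope model of why this hurts (the request counts follow the description above; `estimate_ls_requests` and the latency figure are illustrative assumptions, not measurements):

```python
def estimate_ls_requests(n_keys, page_size=1000):
    """Rough count of S3 requests behind one 'ls' of n_keys objects:
    one LIST per page of keys, plus one HEAD per key for POSIX attributes."""
    list_calls = -(-n_keys // page_size)  # ceiling division
    head_calls = n_keys
    return {"LIST": list_calls, "HEAD": head_calls}

reqs = estimate_ls_requests(10_000)
# At an assumed ~30 ms per serial HEAD, 10,000 HEADs alone cost ~300 s,
# the same order of magnitude as the 8 minutes reported above.
head_seconds = reqs["HEAD"] * 0.030
```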

<!-- gh-comment-id:178811694 -->

@dqh-au commented on GitHub (Feb 2, 2016):

Interesting. I wonder how feasible it would be to add an option that disables this behaviour. In my scenario all I need is key listing, get object, and delete object.

<!-- gh-comment-id:178812835 -->

@kahing commented on GitHub (Feb 2, 2016):

I haven't looked at `ls`'s source code, but readdir returns a dirent which indicates whether the entry is a file or a directory, and s3fs does a couple of HEAD requests to find that out. So even without POSIX attributes it would still need to do the HEADs.

Disabling sorting with `ls -f` can help a little, since it allows `ls` to print results as soon as it gets them.

<!-- gh-comment-id:178814061 -->

@dqh-au commented on GitHub (Feb 2, 2016):

`ls -f` seems to reduce the 8 minutes to 2.5 minutes, but that's still too long for my use case.

Couldn't s3fs inspect the key for a '/' character to determine whether or not the file is a directory?

<!-- gh-comment-id:178815827 -->

@postmaxin commented on GitHub (Feb 2, 2016):

If you're just using S3 as a simple key->data store and are not concerned about filesystem metadata, could it be simpler just to use the AWS CLI to put/get/list the objects you're interested in?


<!-- gh-comment-id:178824520 -->

@dqh-au commented on GitHub (Feb 2, 2016):

That would be great, but unfortunately I need an SFTP interface to an S3 bucket. Currently looking into whether I can override the file-listing operation somehow (OpenSSH sftp).

<!-- gh-comment-id:178826274 -->

@kahing commented on GitHub (Feb 2, 2016):

s3fs surely can do better, at least in the readdir case. General lookup without readdir is more complicated, because in those cases the kernel just sends you the path `/dir1` without the trailing `/`, so you typically do the following checks in the general case:

  1. See if `name + '/'` exists; that would be created if you do mkdir in s3fs, for example.
  2. See if `name` exists; in that case it's a regular file.
  3. See if there's any object named `name/*`; that could happen if you use other S3 tools to create objects like `dir1/file1` and `dir1/file2` but never explicitly create `dir1`.

s3fs always falls back to the general lookup case. goofys employs a readdir optimization to avoid all those HEAD requests.
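The three-step fallback can be simulated against a plain in-memory set of keys (a toy sketch, not s3fs's actual code; `lookup` and the sample keys are made up for illustration):

```python
def lookup(keys, name):
    """Toy model of the general lookup order described above,
    run against an in-memory set of object keys instead of real S3."""
    if name + "/" in keys:                           # 1. explicit dir marker (mkdir)
        return "directory"
    if name in keys:                                 # 2. plain object -> regular file
        return "file"
    if any(k.startswith(name + "/") for k in keys):  # 3. implicit directory
        return "directory"
    return None                                      # no such entry

keys = {"dir1/file1", "dir1/file2", "notes.txt", "made/"}
```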

<!-- gh-comment-id:178840476 -->

@hryang commented on GitHub (Feb 16, 2016):

@davidqhogan

  1. Suppose directory d contains n files. "ls d" triggers 1 opendir + 1 readdir + n getattr FUSE calls. When n is large, the getattr calls become the bottleneck; "ls -f" removes the getattr calls, which improves performance.
  2. You can set a larger stat cache max size with -omax_stat_cache_size=xxx. The first "ls" is slow, but subsequent ones should be very fast.
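The caching effect described in point 2 can be sketched with a toy LRU stat cache (the `StatCache` class and its numbers are illustrative assumptions, not s3fs's actual C++ implementation):

```python
from collections import OrderedDict

class StatCache:
    """Toy LRU stat cache: the first lookup of a path simulates a remote
    HEAD request; repeat lookups are served locally for free."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.cache = OrderedDict()
        self.remote_calls = 0

    def getattr(self, path):
        if path in self.cache:
            self.cache.move_to_end(path)    # refresh LRU position
            return self.cache[path]
        self.remote_calls += 1              # simulated HEAD request
        attrs = {"path": path, "mode": 0o644}
        self.cache[path] = attrs
        if len(self.cache) > self.max_size:
            self.cache.popitem(last=False)  # evict least recently used
        return attrs

cache = StatCache(max_size=100_000)
for _ in range(2):                          # two "ls" passes over 10k files
    for i in range(10_000):
        cache.getattr(f"/bucket/file{i}")
# The second pass is all cache hits, so remote_calls stays at 10,000.
```

If the cache max size were smaller than the directory, entries would be evicted between passes and the second "ls" would be slow again, which is why raising the limit helps here.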
<!-- gh-comment-id:184595794 -->

@barsk commented on GitHub (Jun 1, 2017):

Well, a bit off-topic, but I tried kahing's goofys S3 filesystem and it does indeed work fast. Directory listings that took minutes are now instantaneous. Goofys "fakes" some things that s3fs does more properly and does not support all use cases, so it might not suit everyone. But for those it does suit, I recommend a look.

Maybe we can get some of these optimizations in s3fs as well?

<!-- gh-comment-id:305450302 -->

@ylluminate commented on GitHub (Jun 1, 2017):

Hah, thanks for that pointer @barsk (and @kahing)... Wow, I was not even familiar with goofys, but after reviewing it, honestly, why would one want to use s3fs instead of goofys? Obviously one may want on-fs caching and such on some occasions, but for an always-on server scenario where performance is the most important attribute (like using S3 as a block storage device), goofys seems optimal. I suppose some kind of cache could be useful in such a scenario, but holy cow, this is some seriously advantageous speed, especially on fat pipes.

<!-- gh-comment-id:305558088 -->

@gaul commented on GitHub (Feb 2, 2019):

Could you test again with master? It includes multiple improvements to readdir performance.

<!-- gh-comment-id:459934265 -->

@gaul commented on GitHub (Apr 9, 2019):

Closing due to inactivity. Please reopen if symptoms persist.

<!-- gh-comment-id:481188084 -->