mirror of
https://github.com/s3fs-fuse/s3fs-fuse.git
synced 2026-04-25 13:26:00 +03:00
[GH-ISSUE #158] Slow and/or no response when listing long directories #93
Originally created by @darrencruse on GitHub (Mar 23, 2015).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/158
I searched the issues and didn't find anything quite like this; feel free to delete this if I overlooked it.
We are running s3fs on Amazon EC2, and in a shell window, if you simply try to do "ls" on a directory that has a lot of files or subdirectories (I don't have exact numbers, but "a lot" means roughly in the hundreds), the response time is terrible.
The same kind of thing happens in bash if I attempt tab completion on a partial file name, by the way. If it's a directory with lots of files, it just hangs.
Sometimes, if I wait a long time (minutes), it will eventually return; other times I just give up and kill the window (Ctrl-C sometimes gets me back to a shell prompt, but not always).
I don't have the problem with directories containing fewer files (say, less than a hundred). There the response time isn't super fast, but it is acceptable.
@MdNor commented on GitHub (Apr 2, 2015):
I think this behavior is to be expected when you try to mount any object storage, such as S3, as a filesystem. This is an object-storage limitation, not an s3fs one.
s3fs will issue GET requests to S3 every time you try to list files and directories. Try mounting your bucket in debug mode and issuing a list command; you'll see what s3fs is actually doing during those slow/no-response periods.
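As a rough back-of-the-envelope sketch (a toy model, not s3fs code): S3's ListObjects API returns at most 1,000 keys per response, so even enumerating the names in a large prefix is inherently many sequential round trips before `ls` can print anything:

```python
# Toy model: a single directory listing over S3 turns into
# ceil(N / 1000) sequential list requests, since ListObjects
# caps each response at 1,000 keys.
import math

PAGE_SIZE = 1000  # S3's maximum keys per ListObjects response

def count_list_requests(num_keys, page_size=PAGE_SIZE):
    """Number of sequential list requests needed to enumerate num_keys objects."""
    return max(1, math.ceil(num_keys / page_size))

for n in (100, 10_000, 1_000_000):
    print(f"{n:>9} keys -> {count_list_requests(n):>5} list round trips")
```

Each of those round trips pays full network latency, which is one reason listings degrade with directory size even before any per-object requests are made.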
@gaul commented on GitHub (Aug 8, 2015):
Fixing #223 will help with this. Agreed that s3fs will always have worse performance than something like NFS due to the lack of readdir+.
@pshenoy-uc commented on GitHub (Sep 19, 2015):
I have a similar issue.
My AWS S3 bucket has millions of files, and I am mounting it using s3fs. Any time an 'ls' command is issued (not always intentionally), the terminal hangs. Tab completion of file names triggers the same issue.
If there were a way to limit the number of results returned to some 'n', which we could define as an option for 'ls' and/or tab completion, it would help. Would this be a possible fix?
@kahing commented on GitHub (Sep 29, 2015):
`ls` doesn't give you a way to limit the number of returned results, and by default it usually waits for the entire list and sorts it before printing anything. One thing you can try to speed it up is `ls -f`. Internally, s3fs still waits for all the S3 responses from `ListObjects`, but at least `ls` won't stat the objects, which will be faster. Realistically, though, how in the world are you going to visualize 1 million files in your terminal anyway?
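The sort-before-print behavior described above can be illustrated with a toy sketch (stand-in functions, not `ls` internals): plain `ls` must consume the whole listing before emitting its first name, while `ls -f` can print each entry in arrival order.

```python
# Toy illustration of why `ls -f` feels more responsive over a slow backend.
def slow_listing():
    """Stand-in for readdir over a slow backend: yields names one at a time."""
    for i in (3, 1, 2):
        yield f"file{i}"

def ls_default(entries):
    # Must consume the whole (slow) listing, then sort, before any output.
    return sorted(entries)

def ls_f(entries):
    # Emits results in arrival order; the first name appears immediately.
    return list(entries)

print(ls_default(slow_listing()))  # ['file1', 'file2', 'file3']
print(ls_f(slow_listing()))        # ['file3', 'file1', 'file2']
```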
@gaul commented on GitHub (Sep 29, 2015):
@kahing
`ls -f` calls `getdents`, which causes excessive HEAD object calls.

@dqh-au commented on GitHub (Feb 2, 2016):
I'm having the same issue with a folder of 10,000 files. I wrote a simple java program that can iterate over the files in the bucket, printing them out, in 11 seconds. It takes nearly 8 minutes to 'ls' the same folder using s3fs!
@kahing commented on GitHub (Feb 2, 2016):
It's simply not possible to have POSIX, performance, and S3 all in the same sentence. I hate to self-evangelize here, but I wrote https://github.com/kahing/goofys to make a different trade-off, and one of the things it solves is precisely this.
@dqh-au commented on GitHub (Feb 2, 2016):
Do you know what a naked 'ls' is doing beyond simply iterating over the bucket keys?
@gaul commented on GitHub (Feb 2, 2016):
@davidqhogan s3fs issues a HEAD request for each key to get POSIX attributes like mode bits from the user metadata.
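The cost of that one-HEAD-per-key pattern can be sketched with a toy request counter (an assumption-level model, not s3fs internals): listing a directory costs one list page per 1,000 keys plus one HEAD per key for the metadata.

```python
# Toy request counter: total S3 requests for one directory listing,
# assuming one LIST page per 1,000 keys and one HEAD per key to fetch
# POSIX attributes (mode bits etc.) from user metadata.
import math

def requests_for_ls(num_keys, page_size=1000):
    list_pages = max(1, math.ceil(num_keys / page_size))
    head_calls = num_keys  # one HEAD per object for its metadata
    return list_pages + head_calls

print(requests_for_ls(10_000))  # 10 list pages + 10,000 HEADs = 10010
```

The HEAD calls dominate: they scale linearly with the number of entries, which matches the minutes-long `ls` times reported above for 10,000-file folders.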
@dqh-au commented on GitHub (Feb 2, 2016):
Interesting. I wonder how feasible it would be to add an option that disables this behaviour. In my scenario, all I need is key listing, get object, and delete object.
@kahing commented on GitHub (Feb 2, 2016):
I haven't looked at ls's source code, but readdir returns a dirent which indicates whether the entry is a file or a directory, and s3fs does a couple of HEAD requests to find that out. So even without POSIX attributes it would still need to do the HEADs.
Disabling sorting with `ls -f` can help a little, since it allows ls to print results as soon as it gets them.

@dqh-au commented on GitHub (Feb 2, 2016):
ls -f seems to reduce the 8 minutes to 2.5 minutes, but it's still too long for my use case.
Couldn't s3fs inspect the key for a '/' character to determine whether or not the file is a directory?
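The delimiter-based idea suggested above can be sketched as follows (a hypothetical helper, not what s3fs actually does): infer directories purely from the key names in one list response, with no HEAD per entry.

```python
# Sketch: classify entries under a prefix using only key names from a
# single list response. A key with a '/' after the prefix implies a
# "directory" (common prefix); everything else is a regular file.
def classify(keys, prefix=""):
    """Return (dirs, files) inferred purely from key names."""
    dirs, files = set(), set()
    for key in keys:
        rest = key[len(prefix):]
        if "/" in rest:
            dirs.add(rest.split("/", 1)[0])  # first path component
        elif rest:
            files.add(rest)
    return sorted(dirs), sorted(files)

dirs, files = classify(["a/1.txt", "a/2.txt", "b.txt"])
print(dirs, files)  # ['a'] ['b.txt']
```

Note this trades correctness for speed: it cannot see per-object POSIX metadata, and it treats any shared key prefix as a directory even if no explicit directory object exists.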
@postmaxin commented on GitHub (Feb 2, 2016):
If you're just using S3 as a simple key->data store and are not concerned about filesystem metadata, could it be simpler just to use the AWS CLI to put/get/list the objects you're interested in?
@dqh-au commented on GitHub (Feb 2, 2016):
That would be great, but unfortunately I have a need for an SFTP interface to an S3 bucket. I'm currently looking into whether I can override the file-listing operation somehow (OpenSSH SFTP).
@kahing commented on GitHub (Feb 2, 2016):
s3fs surely can do better, at least in the readdir case. General lookup without readdir is more complicated, because in those cases the kernel just sends you the path `/dir1` without the trailing `/`, so typically you do the following checks in the general case:

1. `name + '/'` exists — that's what gets created if you do mkdir in s3fs, for example.
2. `name` exists — in that case it's a regular file.
3. `name/*` exists — that could happen if you use other S3 tools to create objects like `dir1/file1`, `dir1/file2` but never explicitly created `dir1`.

s3fs always falls back to the general lookup case. goofys employs a readdir optimization for this to avoid all those HEAD requests.
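The fallback checks described above can be sketched against a toy in-memory "bucket" (a hypothetical helper for illustration, not s3fs source):

```python
# Sketch of the general S3 path-lookup fallback: try the explicit
# directory marker, then a plain object, then an implied directory.
def lookup(store, name):
    """Classify `name` the way a generic S3 path lookup must."""
    if name + "/" in store:            # 1. explicit directory marker (mkdir)
        return "directory"
    if name in store:                  # 2. plain object -> regular file
        return "file"
    if any(k.startswith(name + "/") for k in store):
        return "implicit directory"    # 3. dir1/file1 exists, dir1 never made
    return "missing"

store = {"dir1/file1": b"", "dir2/": b"", "file3": b"x"}
print(lookup(store, "dir1"))   # implicit directory
print(lookup(store, "dir2"))   # directory
print(lookup(store, "file3"))  # file
```

Against real S3, each of those checks is at least one HEAD or list request, which is why a lookup-heavy workload multiplies round trips.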
@hryang commented on GitHub (Feb 16, 2016):
@davidqhogan
@barsk commented on GitHub (Jun 1, 2017):
Well, a bit off-topic, but I tried kahing's goofys S3 filesystem and it does indeed work fast. Directory listings that took minutes are now instantaneous. goofys "fakes" some things that s3fs does more properly and does not support all use cases, so it might not suit everyone. But for those it does suit, I recommend a look.
Maybe we can get some of these optimizations into s3fs as well?
@ylluminate commented on GitHub (Jun 1, 2017):
Hah, thanks for that pointer @barsk (and @kahing)... Wow, I was not even familiar with goofys, but after reviewing it, honestly, why would one want to use s3fs instead of goofys? Obviously one may want on-fs caching and such on some occasions, but for an always-on server scenario where performance is the most important attribute (like using S3 as a block storage device), goofys seems optimal. I suppose that some kind of cache could be useful with such a scenario, but holy cow, this is some seriously advantageous speed, especially on fat pipes.

@gaul commented on GitHub (Feb 2, 2019):
Could you test again with master? It includes multiple improvements to readdir performance.

@gaul commented on GitHub (Apr 9, 2019):
Closing due to inactivity. Please reopen if symptoms persist.