[GH-ISSUE #1339] Possible to query the filesystem meta cache #718

Closed
opened 2026-03-04 01:48:11 +03:00 by kerem · 3 comments
Owner

Originally created by @oucil on GitHub (Jul 24, 2020).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/1339

Version of s3fs being used (s3fs --version)

1.85

Version of fuse being used (pkg-config --modversion fuse, rpm -qi fuse, dpkg -s fuse)

2.9.7

Kernel information (uname -r)

4.18.0-147.8.1.el8_1.x86_64

GNU/Linux Distribution, if applicable (cat /etc/os-release)

NAME="CentOS Linux"
VERSION="8 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="8"
PLATFORM_ID="platform:el8"

Details about issue

Not so much an issue as a question/request. If I understand correctly, file metadata is cached on the local filesystem to reduce the number of requests going to the S3 volume and to speed up responses. One of S3's shortcomings at the moment is the difficulty of getting the total storage currently used; the usual workarounds include running `du` through `s4cmd` or similar, writing recursive scripts, etc.

In our use case, we're ok with eventually consistent values, so is it possible to directly query the s3fs/fuse meta cache to get a total storage value of the remote filesystem?

Thanks,
Kevin.

kerem closed this issue 2026-03-04 01:48:11 +03:00

@gaul commented on GitHub (Jul 26, 2020):

s3fs has a few different kinds of caches. The stat cache caches object metadata which corresponds to the file metadata. By default, s3fs caches 100,000 entries without expiry (although this might change in #1341). You can change the size and duration of the cache via various flags. Does this help?
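As a concrete illustration of the flags mentioned above (a sketch, not an official recipe; the bucket name and mount point are placeholders), the stat cache can be tuned at mount time with the `max_stat_cache_size` and `stat_cache_expire` options:

```shell
# Placeholders: "mybucket" and /mnt/s3. Tune the stat (metadata) cache:
#   max_stat_cache_size - max number of cached entries (default 100,000)
#   stat_cache_expire   - entry lifetime in seconds (no expiry if unset)
s3fs mybucket /mnt/s3 \
    -o max_stat_cache_size=200000 \
    -o stat_cache_expire=600
```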


@oucil commented on GitHub (Jul 30, 2020):

Thanks @gaul, not exactly. It was more a question of whether the cache in some way maintains state over the total file sizes of all objects on the remote volume; but if it's limited to 100k entries, anything beyond that wouldn't be counted. Funny enough, our provider recently added the ability to see our bucket sizes via their API (still unpublished so far), so I have the solution I needed. Really appreciate your help though, thank you!


@gaul commented on GitHub (Aug 1, 2020):

The S3 API does not provide any kind of summarization of the total number of objects or their cumulative size. The Swift API does, however. The best s3fs could do is run `du` or similar, which is expensive due to issuing `HeadObject` once per object.
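A cheaper alternative than a per-object `HeadObject` sweep is to sum sizes from a `ListObjectsV2` walk, which costs one request per 1,000 keys. A minimal sketch, outside of s3fs and assuming a configured AWS CLI ("mybucket" is a placeholder):

```shell
# One-liner via the AWS CLI, which paginates ListObjectsV2 for you and
# prints "Total Objects:" and "Total Size:" lines at the end:
#
#   aws s3 ls s3://mybucket --recursive --summarize
#
# The summation itself is just a running total over "date time size key"
# listing lines, demonstrated offline here with a fabricated listing:
printf '2020-07-24 01:00:00 100 a\n2020-07-24 01:00:00 250 b\n' |
    awk '{ total += $3 } END { print total }'
```

The awk pipeline prints `350` for the two fabricated objects above; against a real listing the same total would be the bucket's cumulative size in bytes.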
