[GH-ISSUE #1485] Avoiding disk usage for serial reading #781

Open
opened 2026-03-04 01:48:45 +03:00 by kerem · 0 comments

Originally created by @AlexeyDmitriev on GitHub (Nov 27, 2020).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/1485

### Additional Information

#### Version of s3fs being used (`s3fs --version`)

Amazon Simple Storage Service File System V1.82(commit:unknown) with GnuTLS(gcrypt)

#### Version of fuse being used (`pkg-config --modversion fuse`, `rpm -qi fuse`, `dpkg -s fuse`)

2.9.7-1ubuntu1

#### Kernel information (`uname -r`)

5.3.0-1023-aws

#### GNU/Linux Distribution, if applicable (`cat /etc/os-release`)

Ubuntu 18.04

### Details about issue

So, one of my use cases with s3fs mostly involves reading whole files serially in a single pass:
for example, I might run `zcat /mnt/s3/path/to/log.gz | grep 'smth' | wc`.
The problem is that s3fs eventually stores the whole file on disk, whereas if it knew that I was only going to read the file once, it could keep (probably in memory) only the current chunk that has not yet been fed to the output.
Can this be improved somehow? In my case, for example, a per-file cache size limit could help.
Or maybe there's already an option that I have missed?
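For illustration, here is a minimal, hypothetical sketch (not part of the original report) of the bounded-memory serial read the issue asks for: the consumer decompresses and scans incrementally, so only the chunk currently being processed is resident. A locally generated gzip file stands in for the mounted S3 object.

```python
import gzip
import os
import tempfile

# Build a throwaway gzip "log" to stand in for /mnt/s3/path/to/log.gz
# (hypothetical data; in real use this would be the s3fs-mounted object).
fd, path = tempfile.mkstemp(suffix=".gz")
os.close(fd)
with gzip.open(path, "wt") as f:
    for i in range(1000):
        f.write(f"line {i} {'smth' if i % 10 == 0 else 'other'}\n")

# Serial, bounded-memory scan: the gzip module decompresses incrementally,
# so only the data currently being consumed lives in memory -- the behavior
# the issue would like s3fs to offer instead of spooling the file to disk.
matches = 0
with gzip.open(path, "rt") as f:
    for line in f:
        if "smth" in line:
            matches += 1

os.remove(path)
print(matches)
```

This is the pattern `zcat | grep` already follows on the consumer side; the request is for s3fs itself to avoid materializing the full object on local disk when reads are strictly sequential.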
