[GH-ISSUE #1485] Avoiding disk usage for serial reading #781

Open
opened 2026-03-04 01:48:45 +03:00 by kerem · 0 comments

Originally created by @AlexeyDmitriev on GitHub (Nov 27, 2020).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/1485

### Additional Information

#### Version of s3fs being used (`s3fs --version`)

Amazon Simple Storage Service File System V1.82(commit:unknown) with GnuTLS(gcrypt)

#### Version of fuse being used (`pkg-config --modversion fuse`, `rpm -qi fuse`, `dpkg -s fuse`)

2.9.7-1ubuntu1

#### Kernel information (`uname -r`)

5.3.0-1023-aws

#### GNU/Linux Distribution, if applicable (`cat /etc/os-release`)

Ubuntu 18.04

### Details about issue

So, one of my use cases with s3fs mostly involves reading whole files serially in a single pass:
for example, I might run `zcat /mnt/s3/path/to/log.gz | grep 'smth' | wc`.
The problem is that s3fs eventually stores the whole file on disk, whereas if it knew that I was only going to read the file once, it could keep (probably in memory) only the current chunk that has not yet been fed to the output.
Can this be improved somehow? In my case, for example, a per-file cache size limit could help.
Or maybe there's already an option that I have missed?
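For illustration, here is a minimal, hypothetical sketch (not part of the original report) of the bounded-memory serial read the issue asks for: the consumer decompresses and scans incrementally, so only the chunk currently being processed is resident. A locally generated gzip file stands in for the mounted S3 object.

```python
import gzip
import os
import tempfile

# Build a throwaway gzip "log" to stand in for /mnt/s3/path/to/log.gz
# (hypothetical data; in real use this would be the s3fs-mounted object).
fd, path = tempfile.mkstemp(suffix=".gz")
os.close(fd)
with gzip.open(path, "wt") as f:
    for i in range(1000):
        f.write(f"line {i} {'smth' if i % 10 == 0 else 'other'}\n")

# Serial, bounded-memory scan: the gzip module decompresses incrementally,
# so only the data currently being consumed lives in memory -- the behavior
# the issue would like s3fs to offer instead of spooling the file to disk.
matches = 0
with gzip.open(path, "rt") as f:
    for line in f:
        if "smth" in line:
            matches += 1

os.remove(path)
print(matches)
```

This is the pattern `zcat | grep` already follows on the consumer side; the request is for s3fs itself to avoid materializing the full object on local disk when reads are strictly sequential.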
