[GH-ISSUE #1805] s3fs-fuse for machine learning training #922

Closed
opened 2026-03-04 01:49:58 +03:00 by kerem · 2 comments

Originally created by @syu-lk4b on GitHub (Nov 8, 2021).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/1805

How does s3fs-fuse handle loading big files, for example a file of about 300 GB? If we only need part of the data from that file, does anyone know how s3fs-fuse handles it?

Let's say I have a 300 GB file, but I only need 2 GB of its content. Will s3fs download the whole file in order to get that 2 GB? If not, how does it work?

kerem closed this issue 2026-03-04 01:49:58 +03:00

@gaul commented on GitHub (Nov 8, 2021):

s3fs does not download the entire object to read part of a file. Instead, it issues five parallel 10 MB HTTP GET range requests at the given offset and caches that data to satisfy future read requests. In your example, if the 2 GB is contiguous, s3fs will download only about 2 GB. But if the 2 GB is scattered in small chunks throughout the object, s3fs will request a lot more data, because each small read still pulls in whole 10 MB ranges.
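
To make the contiguous versus scattered difference concrete, here is a rough back-of-the-envelope sketch in Python. It is not s3fs code: it assumes range requests are aligned to 10 MB boundaries, that each chunk is fetched once and then served from the cache, and it ignores the extra parallel requests. The names `chunks_touched` and `bytes_downloaded` are made up for illustration.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB
CHUNK = 10 * MIB  # range request size from the comment above (assumed aligned)


def chunks_touched(reads, chunk=CHUNK):
    """Return the set of chunk indices covered by (offset, length) reads."""
    touched = set()
    for offset, length in reads:
        if length <= 0:
            continue
        first = offset // chunk
        last = (offset + length - 1) // chunk
        touched.update(range(first, last + 1))
    return touched


def bytes_downloaded(reads, chunk=CHUNK):
    """Estimate transfer size, assuming every touched chunk is fetched once."""
    return len(chunks_touched(reads, chunk)) * chunk


# Case 1: the 2 GiB we need is one contiguous region.
contiguous = [(0, 2 * GIB)]

# Case 2: the same 2 GiB split into 2048 reads of 1 MiB, spaced 100 MiB apart.
scattered = [(i * 100 * MIB, MIB) for i in range(2048)]

if __name__ == "__main__":
    for name, reads in (("contiguous", contiguous), ("scattered", scattered)):
        wanted = sum(length for _, length in reads)
        fetched = bytes_downloaded(reads)
        print(f"{name}: wanted {wanted / GIB:.2f} GiB, fetched {fetched / GIB:.2f} GiB")
```

Under these assumptions the contiguous read touches 205 chunks (about 2 GiB), while the scattered reads touch 2048 separate chunks (20 GiB) to deliver the same 2 GiB of useful data. Real numbers will differ with the actual access pattern and s3fs settings.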


@syu-lk4b commented on GitHub (Nov 9, 2021):

@gaul Thanks so much for the explanation
