mirror of
https://github.com/s3fs-fuse/s3fs-fuse.git
synced 2026-04-25 05:16:00 +03:00
[GH-ISSUE #1805] s3fs-fuse for machine learning training #922
Originally created by @syu-lk4b on GitHub (Nov 8, 2021).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/1805
How does s3fs-fuse handle loading big files, e.g. a ~300 GB file? If we only need part of the data from that 300 GB file, does anyone know how s3fs-fuse handles it?
Let's say I have a 300 GB file but I only need 2 GB of content from it. Will s3fs download the whole file in order to get that 2 GB? If not, how does it work?
@gaul commented on GitHub (Nov 8, 2021):
s3fs does not download the entire object to read part of the file. Instead it issues five parallel 10 MB HTTP GET range requests at a given offset, and caches the returned data to satisfy future read requests. For your example, if the 2 GB is contiguous, s3fs will only download 2 GB. But if the 2 GB is scattered in small chunks throughout the object, s3fs will request considerably more data.
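To illustrate the comment above, here is a minimal sketch (not s3fs-fuse's actual code) of how a partial read maps onto ranged HTTP GETs. The part size (10 MB) and parallelism (5) come from the comment; the function and variable names are illustrative.

```python
# Sketch: compute the HTTP Range headers a reader like s3fs would issue
# to fetch [offset, offset + length) of an object in fixed-size parts.
PART_SIZE = 10 * 1024 * 1024  # 10 MB per GET, per the comment above
PARALLEL = 5                  # requests kept in flight at once (illustrative)

def range_headers(offset, length):
    """Yield HTTP Range header values covering [offset, offset + length)."""
    end = offset + length
    while offset < end:
        chunk_end = min(offset + PART_SIZE, end)
        # HTTP Range uses inclusive byte positions: bytes=first-last
        yield f"bytes={offset}-{chunk_end - 1}"
        offset = chunk_end

# Reading 2 GiB at offset 0 takes ceil(2 GiB / 10 MiB) = 205 ranged GETs,
# issued a few at a time -- far less than downloading the whole 300 GB object.
headers = list(range_headers(0, 2 * 1024 ** 3))
```

So the cost of a read is proportional to the bytes actually requested (rounded up to part boundaries), not to the object size, which is why a contiguous 2 GB read of a 300 GB object transfers roughly 2 GB.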
@syu-lk4b commented on GitHub (Nov 9, 2021):
@gaul Thanks so much for the explanation