Mirror of https://github.com/s3fs-fuse/s3fs-fuse.git, synced 2026-04-25 21:35:58 +03:00.
[GH-ISSUE #2079] S3FS Fio benchmark with nvme cache vs nvme #1052
Originally created by @itweixiang on GitHub (Dec 20, 2022).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/2079
First, I observed two behaviors.

Write path:
1. When s3fs writes a file, the data is written to the cache first.
2. After the cache is written, the data is synchronized to the remote object store.
3. The write succeeds only after synchronization completes.

Read path:
1. If the data is in memory, it is read directly from memory.
2. If the data is in the local cache (use_cache), the cached data is read directly.
3. If the data only exists on the remote store, it is read from there into the cache, then returned from the cache.
FIO Benchmark
I used an NVMe drive rated at about 6000 MB/s as the cache. The raw NVMe fio benchmark is:
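For reference, a sequential-read run like this can be described with an fio job file along the following lines (a sketch, not the exact job used here; the paths and sizes are assumptions):

```ini
; Sketch of an fio sequential-read job (paths and sizes are assumptions).
; Point "directory" at the raw NVMe mount for the baseline run,
; or at the s3fs mount for the cached run.
; direct=1 bypasses the page cache; O_DIRECT may not be honored on a
; FUSE mount, so for the s3fs run consider dropping the page cache first:
;   sync; echo 3 > /proc/sys/vm/drop_caches
[seq-read]
rw=read
bs=1M
size=1G
numjobs=1
ioengine=libaio
direct=1
directory=/mnt/nvme
```

Running the same job against both directories makes the two results directly comparable.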
Then I mounted the above NVMe drive as the cache for my s3fs mount. The measured s3fs performance is as follows:
[benchmark results at size=1G]
[benchmark results at size=5G]
With the NVMe drive as cache, s3fs performance is only about one-third of the raw NVMe speed at size=1G and one-seventh at size=5G.
In a potentially worse case: my Linux machine has 128 GB of memory with about 100 GB free, which means either a 1 GB or a 5 GB file fits entirely in the page cache, so fio may be reading directly from memory instead of the NVMe cache.
Should reading from memory really be only 2000 MB/s?
ENV
Thanks
Thank you for your work on s3fs, which lets us use S3 more conveniently. My description may be incomplete or unclear, or the problem may be caused by a parameter I have not set correctly. I look forward to your reply and comments.
@itweixiang commented on GitHub (Jan 3, 2023):
Two weeks have passed with no response.
@ggtakec commented on GitHub (Jan 11, 2023):
@itweixiang I'm sorry for my late reply.
s3fs uploads and downloads objects on S3, and the use_cache option sits in the middle, creating a cache file on your local disk.
Accessing this local cache file is a little more complicated than it may appear.
Reading and writing the local file itself is done directly.
For example, when reading a file (an object on the S3 server), s3fs first checks whether the object has been updated, and accesses the local file only if there is no change.
If the file is cached in its entirety, the object's stat information has not changed, and the entry is still held in s3fs's stat cache, the local file is read without any communication with the S3 server.
(I think this gives the best performance.)
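The read-path decision described above can be modeled roughly like this (an illustrative sketch with hypothetical names, not s3fs's actual internals):

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class CacheEntry:
    """Hypothetical stand-in for one s3fs stat-cache entry."""
    path: str
    etag: str
    fully_cached: bool  # is the whole object present in the local cache file?

def read_source(entry: Optional[CacheEntry],
                stat_fresh: bool,
                head_object: Callable[[str], str]) -> str:
    """Return where a read would be served from, mimicking the flow
    described above: fully cached + unexpired stat cache means no
    network traffic at all."""
    if entry is None:
        return "download-from-s3"        # nothing known locally
    if stat_fresh and entry.fully_cached:
        return "local-cache-no-network"  # fastest path
    if head_object(entry.path) == entry.etag:
        # One HEAD request confirms the object is unchanged remotely.
        if entry.fully_cached:
            return "local-cache-after-head"
        return "download-missing-parts"
    return "download-from-s3"            # object changed remotely
```

The fastest path is the one with no network round-trips at all, which is why stat-cache settings can matter as much as the cache disk's raw speed.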
Currently s3fs has several options for caching (file data and stat information) and for the upload method (multipart, streaming, etc.).
I think combinations of these options will change performance depending on your usage.
Even with a high-speed NVMe SSD, you may not get the expected numbers if time is spent on processing other than accessing the SSD used as the cache.
It's a good idea to try a few combinations of options and see what works best for you.
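For example, one combination worth trying might look like the following (a sketch only; the bucket name, paths, and values are placeholders to be tuned, not maintainer recommendations):

```sh
# Sketch of an s3fs mount tuned for an NVMe cache disk.
# Bucket name, paths, and values are placeholders.
s3fs mybucket /mnt/s3fs \
  -o use_cache=/mnt/nvme/s3fs-cache \
  -o ensure_diskfree=10240 \
  -o parallel_count=16 \
  -o multipart_size=64 \
  -o max_stat_cache_size=200000
# use_cache           : put the file cache on the NVMe drive
# ensure_diskfree     : keep this much space free on the cache disk (MB)
# parallel_count      : number of parallel requests per transfer
# multipart_size      : multipart part size in MB
# max_stat_cache_size : size of the stat cache in entries
```

Benchmarking each change in isolation makes it easier to see which option actually moves the numbers.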
@itweixiang commented on GitHub (Jan 12, 2023):
@ggtakec Thanks for your reply.
During this time, I also wondered whether the overhead of the metadata check for each file was affecting the read speed.
I am trying your suggestions to see whether they give better performance. Thank you again for your reply.
@ggtakec commented on GitHub (Jan 12, 2023):
@itweixiang Thank you for your kindness; we will also keep working on s3fs to improve its performance.