Mirror of https://github.com/s3fs-fuse/s3fs-fuse.git (synced 2026-04-25 05:16:00 +03:00)
[PR #1802] [MERGED] Add the stream upload which starts uploading parts before Flush #2185
📋 Pull Request Information
Original PR: https://github.com/s3fs-fuse/s3fs-fuse/pull/1802
Author: @ggtakec
Created: 11/2/2021
Status: ✅ Merged
Merged: 7/17/2022
Merged by: @gaul
Base: master ← Head: stream_upload
📝 Commits (5)
6585356 Add the stream upload which starts uploading parts before Flush
f3a7fb6 Reflected the result of the review in the code
3ca3cd8 Reflect the result of the review in the code again
fd81f63 Fixed an error which reported by cppcheck 2.8
7a578b6 Merged the code corresponding to the mknod fix(f11eb7d)
📊 Changes
15 files changed (+1771 additions, -135 deletions)
📝 src/Makefile.am (+1 -0)
📝 src/curl.cpp (+66 -0)
📝 src/curl.h (+7 -4)
📝 src/fdcache_entity.cpp (+287 -1)
📝 src/fdcache_entity.h (+5 -0)
📝 src/fdcache_fdinfo.cpp (+826 -20)
📝 src/fdcache_fdinfo.h (+56 -13)
📝 src/fdcache_untreated.cpp (+82 -86)
📝 src/fdcache_untreated.h (+8 -9)
📝 src/psemaphore.h (+17 -0)
📝 src/s3fs.cpp (+28 -0)
➕ src/threadpoolman.cpp (+261 -0)
➕ src/threadpoolman.h (+97 -0)
📝 src/types.h (+29 -2)
📝 test/small-integration-test.sh (+1 -0)
📄 Description
Relevant Issue (if applicable)
n/a
Overview
For multipart uploads (both mixupload and nomixupload), this PR adds the ability to upload file parts sequentially before the file is flushed.
Details
Currently, s3fs only starts uploading a file when flush is called for that file.
This PR adds an option called streamupload that allows s3fs to upload file parts before the file is flushed.
The streamupload option is only effective when multipart upload (mixupload or nomixupload) is enabled.
The individual points are explained below:
(1) streamupload option
This option enables the stream upload function.
It is a tentative option, and I will remove it once this PR has been merged and fully tested.
Since this function should become the default behavior of s3fs, I plan to enable it by default, as is done for multipart upload.
At that time, I will add a nostream option (tentative name) in its place, similar to nomultipart and so on.
(2) Multipart size
When stream upload is enabled, the size of each multipart upload part is fixed (specified by the multipart_size option).
In other words, starting from the beginning of the file, part boundaries are placed at every multiple of the multipart_size value, and each part is uploaded separately.
(3) Part upload conditions
Once all the data for a fixed-range part described in (2) has been written, the upload of that part starts (even if the file has not been flushed).
If data is written again to the range of a part that has already been uploaded, that part is uploaded again.
If the written data does not fill the entire range of a part, the part is not uploaded until flush is called; that range is uploaded at flush time.
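The fixed part boundaries from (2) and the upload conditions from (3) can be illustrated with a small tracker. This is a minimal sketch, not the s3fs implementation: the class StreamPartTracker and its methods are hypothetical names, part_size is in bytes here (the s3fs multipart_size option is given in MB), and the actual upload call is left to the caller.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical helper (not the s3fs implementation): tracks which bytes of
// each fixed-size part have been written and reports when a part is full.
class StreamPartTracker {
public:
    explicit StreamPartTracker(int64_t part_size) : part_size_(part_size) {
        assert(part_size_ > 0);
    }

    // Records a write of [offset, offset + size) and returns the indexes of
    // parts that are completely written after this call. A part that was
    // already uploaded is returned again when it is rewritten.
    std::vector<int64_t> OnWrite(int64_t offset, int64_t size) {
        std::vector<int64_t> ready;
        if (size <= 0) return ready;
        const int64_t end = offset + size;
        for (int64_t part = offset / part_size_; part * part_size_ < end; ++part) {
            const int64_t part_start = part * part_size_;
            const int64_t part_end = part_start + part_size_;
            AddRange(part, std::max(offset, part_start), std::min(end, part_end));
            if (IsFull(part)) ready.push_back(part);
        }
        return ready;
    }

    // Parts holding written data that do not fill their range; these wait for flush.
    std::vector<int64_t> PartialPartsForFlush() const {
        std::vector<int64_t> parts;
        for (const auto& entry : ranges_) {
            if (!IsFull(entry.first)) parts.push_back(entry.first);
        }
        return parts;
    }

private:
    using Range = std::pair<int64_t, int64_t>;  // [start, end)

    // Adds [start, end) to the part's list and merges overlapping or adjacent ranges.
    void AddRange(int64_t part, int64_t start, int64_t end) {
        std::vector<Range>& ranges = ranges_[part];
        ranges.emplace_back(start, end);
        std::sort(ranges.begin(), ranges.end());
        std::vector<Range> merged;
        for (const Range& r : ranges) {
            if (!merged.empty() && r.first <= merged.back().second) {
                merged.back().second = std::max(merged.back().second, r.second);
            } else {
                merged.push_back(r);
            }
        }
        ranges.swap(merged);
    }

    // A part is full when a single merged range covers its whole boundary.
    bool IsFull(int64_t part) const {
        auto it = ranges_.find(part);
        if (it == ranges_.end() || it->second.size() != 1) return false;
        const Range& r = it->second.front();
        return r.first == part * part_size_ && r.second == (part + 1) * part_size_;
    }

    int64_t part_size_;
    std::map<int64_t, std::vector<Range>> ranges_;  // part index -> written ranges
};
```

For example, with a part size of 10, writing [0, 5) and then [5, 15) completes part 0 (returned for upload) while part 1 stays partial and would only be uploaded at flush. Rewriting bytes inside part 0 returns it again, mirroring the re-upload rule in (3).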
(4) Thread pool
The code for this feature includes a thread pool.
This thread pool is used to issue each part's upload request.
The thread pool is initialized when s3fs starts: all threads are started and then wait in a standby state.
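A pool whose threads are all started up front and then wait in standby until work arrives can be sketched as follows. This is a generic illustration under stated assumptions, not the contents of threadpoolman.cpp: SimpleThreadPool and Submit are hypothetical names, and each submitted task stands in for one part upload.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical fixed-size pool: all workers start in the constructor and
// block on a condition variable (standby) until a task is submitted.
class SimpleThreadPool {
public:
    explicit SimpleThreadPool(std::size_t count) {
        assert(count > 0);
        for (std::size_t i = 0; i < count; ++i) {
            workers_.emplace_back([this] { Worker(); });
        }
    }

    // Wakes all workers, lets them drain the remaining tasks, then joins them.
    ~SimpleThreadPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (std::thread& t : workers_) t.join();
    }

    SimpleThreadPool(const SimpleThreadPool&) = delete;
    SimpleThreadPool& operator=(const SimpleThreadPool&) = delete;

    // Queues a task (e.g. one part upload) and wakes one standby worker.
    void Submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
    }

private:
    void Worker() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (stopping_ && tasks_.empty()) return;
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run outside the lock so other workers can dequeue
        }
    }

    std::vector<std::thread> workers_;
    std::queue<std::function<void()>> tasks_;
    std::mutex mutex_;
    std::condition_variable cv_;
    bool stopping_ = false;
};
```

In this PR the pool size is specified by the max_thread_count option; in the sketch it is simply the constructor argument. Starting the threads once at startup avoids creating a thread per part upload.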
Accordingly, a max_thread_count option (provisional) has been added to specify the number of threads in this pool.
Like streamupload, this option is temporary.
It will be replaced by the parallel_count option or similar once the s3fs refurbishment (including this PR) is complete.
(5) About test
Existing tests are sufficient to cover file uploads.
Opening files, writing to non-contiguous areas, and closing files can be tested with the recently added write_multiblock test.
Large files were tested separately; see (6).
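The open, non-contiguous write, and close sequence described in (5) can be sketched with POSIX calls. This is not the source of the write_multiblock test; WriteBlocks and Block are hypothetical names, and the file is filled with placeholder bytes.

```cpp
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical block description: `size` bytes written at `offset`.
struct Block {
    off_t offset;
    size_t size;
};

// Opens `path` (creating it if needed), writes each block with pwrite() at its
// own offset, then closes the file. Returns the file size observed before
// close, or -1 on error. On a FUSE mount such as s3fs, close() results in a
// flush request, which uploads any parts that were not already uploaded.
off_t WriteBlocks(const std::string& path, const std::vector<Block>& blocks) {
    int fd = open(path.c_str(), O_WRONLY | O_CREAT, 0644);
    if (fd < 0) return -1;
    bool ok = true;
    for (const Block& b : blocks) {
        assert(b.offset >= 0);
        std::vector<char> data(b.size, 'x');
        ssize_t written = pwrite(fd, data.data(), data.size(), b.offset);
        if (written < 0 || static_cast<size_t>(written) != b.size) {
            ok = false;
            break;
        }
    }
    off_t size = -1;
    struct stat st;
    if (ok && fstat(fd, &st) == 0) size = st.st_size;
    if (close(fd) != 0) size = -1;
    return size;
}
```

Writing 10 bytes at offset 0 and 5 bytes at offset 100 to a new file leaves a hole between the blocks and a resulting size of 105 bytes.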
(6) Performance
Performance comparisons with large files were run separately and are summarized in the following Gist:
https://gist.github.com/ggtakec/0482aca53643681e2e410ed4032b780f
Upload speed for 5GB files improved by about 40%.
NOTE
This PR is aimed at performance tuning and source code cleanup.
The refurbishment will consist of a series of modifications, including this PR.
In the follow-up fixes, I plan to use the thread pool mentioned above and to revise downloads, HEAD requests, and so on.
When the series of refurbishments is complete, the two tentative options mentioned above will also be sorted out.