mirror of
https://github.com/s3fs-fuse/s3fs-fuse.git
synced 2026-04-25 13:26:00 +03:00
[GH-ISSUE #798] Incredibly slow upload speeds #460
Originally created by @danielmarquard on GitHub (Jul 11, 2018).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/798
I'm using s3fs for one of my use cases after encountering a problem with goofys (which is super fast) that won't be possible for me to work around. So I'm curious: is 2 MB/s upload to an internal S3 endpoint nominal for s3fs? It just doesn't sound right.
My mount is simple:
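(The original mount command did not survive in this mirror; as a hedged illustration only, a minimal s3fs mount against an internal endpoint of the kind described, with placeholder bucket name, credentials file, endpoint URL, and mountpoint, would look like:)

```shell
# Hypothetical minimal s3fs mount -- all names below are placeholders,
# not the reporter's actual command.
s3fs my-bucket /mnt/s3 \
    -o passwd_file=${HOME}/.passwd-s3fs \
    -o url=https://s3.internal.example.com \
    -o use_path_request_style
```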
Just curious if anyone else has this problem? I mean, I'm grateful that it's at least working, but at 2 MB/s?
@kunallillaney commented on GitHub (Sep 17, 2018):
In my experience, s3fs can perform much faster than this. Can you please describe your workload and size of files?
Are you writing relatively small files to S3 (<5MB) or performing updates over existing files?
@gaul commented on GitHub (Jan 24, 2019):
You may want to try again with master, which has some write optimizations. However, using `-o curldbg` might reveal why you experience poor performance. You may need to tune the part size via `-o multipart_size`.
@ggtakec commented on GitHub (Mar 29, 2019):
You can try the latest version (1.86) or the master branch code, which have been tuned for performance.
I also recommend using `use_cache` and specifying the `multipart_size` option (as @gaul said, too).
Thanks in advance for your help.
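Combining the suggestions above into one mount line might look like the following sketch (bucket, mountpoint, and values are illustrative, not a recommendation):

```shell
# Sketch: enable the local file cache and raise the multipart chunk size.
# use_cache takes a local cache directory; multipart_size is in MB.
s3fs my-bucket /mnt/s3 \
    -o use_cache=/tmp/s3fs-cache \
    -o multipart_size=64 \
    -o parallel_count=16
```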
@pwulff commented on GitHub (Apr 3, 2019):
I get the same kind of performance on my private S3 with the master branch with a lot of files around 1 MB.
@gaul commented on GitHub (Apr 3, 2019):
It would help to explain exactly which operations exhibit poor performance and compare them with another tool like AWS CLI. Also please quantify this with file size and MB/s.
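One way to quantify write throughput in MB/s, sketched here against a local stand-in directory (point `DST` at the s3fs mountpoint to measure the mount itself; the path is a placeholder):

```shell
# Measure sustained sequential write throughput through a mountpoint.
# DST is a local stand-in; substitute the s3fs mountpoint to test s3fs.
DST=/tmp/s3fs-bench
mkdir -p "$DST"
START=$(date +%s%N)
# Write 64 MiB and fsync so the time includes the flush, not just caching.
dd if=/dev/zero of="$DST/testfile" bs=1M count=64 conv=fsync 2>/dev/null
END=$(date +%s%N)
# 64 MiB written; elapsed time is in nanoseconds.
MB_PER_S=$(( 64 * 1000000000 / (END - START) ))
echo "throughput: ${MB_PER_S} MiB/s"
rm -rf "$DST"
```

Comparing the number this prints for the s3fs mountpoint against the same run through the AWS CLI (e.g. `aws s3 cp`) isolates s3fs overhead from raw network and endpoint speed.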
@pwulff commented on GitHub (Apr 4, 2019):
I have a data set of 2000 binary files of 1,000,000 bytes created with the command line:
`for i in {1..2001}; do dd if=/dev/urandom bs=1000000 count=1 of=file$i; done`
I copy them with the command line:
`rsync --info=progress2 /src/dataset/* /dst/s3fs-fuse-mountpoint/folder`
and I get between 1.5 and 2.5 MB/s.
I have made the same transfer with a Windows client and got 12.5 MB/s.
My S3 endpoint is a local private cloud, and we have a 1 Gb network.
@pwulff commented on GitHub (Apr 4, 2019):
I have made a test with
dbglevel=info -f -o curldbgas requested by @gaul and I got a lot of 404.[INF] curl.cpp:RequestPerform(2250): HTTP response code 404 was returned, returning ENOENT@pwulff commented on GitHub (Apr 4, 2019):
After more testing I've found this:
`[WAN] curl.cpp:ResetHandle(1855): The CURLOPT_SSL_ENABLE_ALPN option could not be unset. S3 server does not support ALPN, then this option should be disabled to maximize performance. you need to use libcurl 7.36.0 or later.`
`[WAN] curl.cpp:ResetHandle(1858): The S3FS_CURLOPT_KEEP_SENDING_ON_ERROR option could not be set. For maximize performance you need to enable this option and you should use libcurl 7.51.0 or later.`
I will try to update my RHEL 7.6 with a more recent libcurl to see if that changes anything.
@pwulff commented on GitHub (Apr 4, 2019):
No change of speed with curl 7.64.1 installed.
@ggtakec commented on GitHub (Apr 7, 2019):
@pwulff
If the objects in your bucket were all created by s3fs, you can try specifying the `notsup_compat_dir` option.
I think using this option will reduce the 404 errors.
s3fs performs checks to recognize directory objects in paths that were created by other tools, and these checks generate 404 errors.
With this option, s3fs skips the checks for those other object types, so it does not generate the 404 errors.
Thanks in advance for your assistance.
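As a sketch, the option is passed like any other mount flag (bucket and mountpoint are placeholders):

```shell
# Skip the compatibility checks for directory objects created by other
# tools; only appropriate when every object in the bucket was created
# by s3fs itself.
s3fs my-bucket /mnt/s3 -o notsup_compat_dir
```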
@pwulff commented on GitHub (Apr 12, 2019):
I've profiled s3fs during my testing, and according to callgrind, 70% of the time is consumed by `EVP_DigestUpdate()`. To mount the bucket, upload 300 files of 1,000,000 bytes, and unmount the bucket, this method is called more than 800,000 times. I suppose the SHA256 hash overhead is not so painful for big files.
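For reference, a profile of this kind can be collected by running s3fs in the foreground under valgrind's callgrind tool (bucket and mountpoint are placeholders):

```shell
# Profile s3fs under callgrind; -f keeps s3fs in the foreground so the
# whole mount/upload/unmount session is captured in one profile.
valgrind --tool=callgrind s3fs my-bucket /mnt/s3 -f

# Then summarize the collected profile, e.g.:
callgrind_annotate callgrind.out.<pid>
```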
@ggtakec commented on GitHub (Apr 16, 2019):
@pwulff
`EVP_DigestUpdate` is called to create a signature for each request to S3.
And when you mount, many HEAD requests are sent to check for the existence of directories and files before uploading.
I think the number of requests has increased because of this confirmation, and the number of calls to `EVP_DigestUpdate` along with it.
Could you try the options (`max_stat_cache_size`, `stat_cache_expire`, `stat_cache_interval_expire`, `enable_noobj_cache`) to cache the file stat info once it has been confirmed?
Thanks in advance for your assistance.
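Combined into one mount line, those stat-cache options might look like this sketch (bucket, mountpoint, and values are illustrative):

```shell
# Enlarge the stat cache, lengthen its expiry, and cache negative
# lookups (enable_noobj_cache) to avoid repeated HEAD requests for
# paths that do not exist.
s3fs my-bucket /mnt/s3 \
    -o max_stat_cache_size=100000 \
    -o stat_cache_expire=600 \
    -o enable_noobj_cache
```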
@joshglenn commented on GitHub (May 6, 2019):
I'm also having this problem... on macOS Mojave, copying 19 files totaling 2.0 MB takes about 10 minutes. Slow as dial-up :-)
@nikt12 commented on GitHub (May 14, 2019):
@ggtakec Hello, could you advise what options should be used to upload large files very fast?
I am using version 1.84 and trying to upload just one file to S3-compatible storage.
The following options were used: `-o use_path_request_style -o enable_content_md5 -o allow_other -o nonempty -o kernel_cache -o max_stat_cache_size=1000000 -o parallel_count=200 -o multipart_size=100 -o singlepart_copy_limit=200 -o multireq_max=30 -o max_background=1000`.
With this I could only reach around 60 MB/s upload speed.
I'm under the assumption that multipart uploading was somehow not enabled (I had a 35 GB file, and I think several threads should be uploading it, but in htop the CPU usage for the s3fs daemon was never higher than 100%).
@gaul commented on GitHub (Feb 3, 2020):
Could you test with the latest version 1.85? It includes several optimizations which should improve your performance.
@gaul commented on GitHub (Apr 22, 2020):
Please reopen if symptoms persist.
@neilpalima commented on GitHub (May 19, 2020):
@gaul, I think I'm experiencing this issue. Is there a way to verify this? On an AWS EC2 instance, we are writing a file of ~70 MB, but it takes 40 minutes to an hour (or more) to upload to our S3 bucket. The commit hash we are using is `005a684600f83a784c526ada078ac807e5e19633` (Fix typos).
@drzraf commented on GitHub (Sep 4, 2020):
I found that on OpenStack, as a preliminary step, 4 requests are made sequentially for each file uploaded using rsync:
`/2018-09/x.mp4_$folder$`
`/2018-09/x.mp4/`
`/2018-09/x.mp4`
`delimiter=/&max-keys=2&prefix=2018-09/x.mp4/`
And I doubt files are processed in parallel.
As such, before the upload of 500 files even starts, 2,000 requests are issued. This is at least suboptimal => `notsup_compat_dir` to the rescue, limiting it to 2 requests per file instead of 4.
@gaul commented on GitHub (Sep 6, 2020):
@neilpalima s3fs should sustain 100+ MBytes/s for sequential reads and writes. Please open a new issue with your symptoms and benchmark setup if you experience poor performance.
@drzraf s3fs does not have optimal metadata performance and does some checks sequentially. #927 discusses changing these defaults. Note that this should not dramatically affect performance for large files.
@hopeseekr commented on GitHub (Sep 21, 2020):
`cp -rvf` works fine and fast. `rsync -avW --progress --inplace --size-only` seems to get frozen at `sending incremental file list`... all it does is create all of the directories.
@skepticalwaves commented on GitHub (Nov 23, 2021):
I'm having the same experience as @hopeseekr: `cp` works, but rsync hangs forever after creating directories.