[GH-ISSUE #798] Incredibly slow upload speeds #460

Closed
opened 2026-03-04 01:45:47 +03:00 by kerem · 21 comments

Originally created by @danielmarquard on GitHub (Jul 11, 2018).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/798

I'm using s3fs for one of my use cases after encountering a problem with goofys (which is super fast) that I won't be able to work around. So I'm curious: is 2 MB/s upload to an internal S3 endpoint nominal for s3fs? It just doesn't sound right.

My mount is simple:

s3fs bucket-name /mnt/dir -o uid=$(id -u user) -o gid=$(id -g group) -o allow_other -o iam_role="$(iam_role)"

Just curious if anyone else has this problem? I mean, I'm grateful that it's at least working, but at 2 MB/s?

kerem closed this issue 2026-03-04 01:45:47 +03:00

@kunallillaney commented on GitHub (Sep 17, 2018):

In my experience, s3fs can perform much faster than this. Can you please describe your workload and size of files?
Are you writing relatively small files to S3 (<5MB) or performing updates over existing files?


@gaul commented on GitHub (Jan 24, 2019):

You may want to try again with master which has some write optimizations. However, using -o curldbg might reveal why you experience poor performance. You may need to tune the part size via -o multipart_size.
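A debug invocation along these lines could surface what is slow (bucket name and mount point are placeholders; `curldbg`, `dbglevel`, and `multipart_size` are the s3fs options mentioned in this thread):

```shell
# Run s3fs in the foreground with curl-level logging so every HTTP request
# is visible; bucket-name and /mnt/dir are placeholders to substitute.
s3fs bucket-name /mnt/dir -f \
    -o dbglevel=info \
    -o curldbg \
    -o multipart_size=64    # part size in MB; worth tuning for your link
```

Watching the per-request log usually makes it obvious whether the time goes into many small requests or into slow transfers of large parts.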


@ggtakec commented on GitHub (Mar 29, 2019):

You can try the latest version (1.86) or the master branch code; its performance has been tuned.
I also recommend using use_cache and trying the multipart_size option (as @gaul said).
Thanks in advance for your help.
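A sketch combining both suggestions — a local file cache plus a larger part size. The cache directory and sizes here are assumptions to adjust for your setup:

```shell
# use_cache points at a local directory used to stage file data;
# multipart_size is the upload part size in MB. Paths are placeholders.
s3fs bucket-name /mnt/dir \
    -o use_cache=/tmp/s3fs-cache \
    -o multipart_size=64
```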


@pwulff commented on GitHub (Apr 3, 2019):

I get the same kind of performance on my private S3 with the master branch with a lot of files around 1 MB.


@gaul commented on GitHub (Apr 3, 2019):

It would help to explain exactly which operations exhibit poor performance and compare them with another tool like AWS CLI. Also please quantify this with file size and MB/s.
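To quantify throughput concretely, a small helper along these lines could measure MB/s for a single sequential write (the mount path in the usage example is hypothetical; the helper itself uses only coreutils and works against any destination directory):

```shell
# Write a file of known size, time a copy into the destination, and divide
# to get MB/s. Nothing here is s3fs-specific.
throughput() {
    dest="$1"
    size_mb=64
    src=$(mktemp)
    dd if=/dev/zero of="$src" bs=1M count="$size_mb" status=none
    start=$(date +%s.%N)
    cp "$src" "$dest/throughput-test.bin"
    end=$(date +%s.%N)
    rm -f "$src" "$dest/throughput-test.bin"
    awk -v mb="$size_mb" -v t0="$start" -v t1="$end" \
        'BEGIN { printf "%.1f MB/s\n", mb / (t1 - t0) }'
}

# Example against a hypothetical s3fs mount:
#   throughput /dst/s3fs-fuse-mountpoint
```

Running the same helper against the mount point and against a local directory gives exactly the kind of comparison asked for above.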


@pwulff commented on GitHub (Apr 4, 2019):

I have a data set of 2000 binary files of 1,000,000 bytes each, created with the command line:
for i in {1..2001}; do dd if=/dev/urandom bs=1000000 count=1 of=file$i; done
I copy them with the command line:
rsync --info=progress2 /src/dataset/* /dst/s3fs-fuse-mountpoint/folder
and I get between 1.5 and 2.5 MB/s.

I made the same transfer with a Windows client and got 12.5 MB/s.
My S3 endpoint is a local private cloud and we have a 1 Gb network.

@pwulff commented on GitHub (Apr 4, 2019):

I made a test with dbglevel=info -f -o curldbg as requested by @gaul, and I got a lot of 404s:

[INF] curl.cpp:RequestPerform(2250): HTTP response code 404 was returned, returning ENOENT


@pwulff commented on GitHub (Apr 4, 2019):

After more testing I found this:

[WAN] curl.cpp:ResetHandle(1855): The CURLOPT_SSL_ENABLE_ALPN option could not be unset. S3 server does not support ALPN, then this option should be disabled to maximize performance. you need to use libcurl 7.36.0 or later.
[WAN] curl.cpp:ResetHandle(1858): The S3FS_CURLOPT_KEEP_SENDING_ON_ERROR option could not be set. For maximize performance you need to enable this option and you should use libcurl 7.51.0 or later.

I will try to update my RHEL 7.6 with a more recent libcurl to see if that changes anything.


@pwulff commented on GitHub (Apr 4, 2019):

No change of speed with curl 7.64.1 installed.


@ggtakec commented on GitHub (Apr 7, 2019):

@pwulff
If the objects in your bucket were all created by s3fs, you can try specifying the notsup_compat_dir option.
I think using this option will reduce the 404 errors.

s3fs performs extra checks to recognize directory objects created by other tools, and those checks generate 404 errors.
With this option, s3fs skips the checks for these other object types, so the 404 errors do not occur.

Thanks in advance for your assistance.
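For reference, the suggestion above amounts to adding one option to the mount (bucket and mount point are placeholders; only safe when every object in the bucket was created by s3fs itself):

```shell
# Skip the compatibility checks for directory objects created by other
# tools, avoiding the extra HEAD requests and their 404 responses.
s3fs bucket-name /mnt/dir -o notsup_compat_dir
```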


@pwulff commented on GitHub (Apr 12, 2019):

I've profiled s3fs during my testing, and according to callgrind, 70% of the time is consumed by EVP_DigestUpdate(). To mount the bucket, upload 300 files of 1,000,000 bytes, and unmount the bucket, this method is called more than 800,000 times. I suppose the SHA256 hash overhead is not so painful for big files.
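A profile like the one described can be reproduced by running s3fs in the foreground under callgrind (bucket, mount point, and output file name are placeholders):

```shell
# Run s3fs single-process in the foreground under callgrind, exercise the
# mount from another shell, then unmount; annotate to see hot functions
# such as EVP_DigestUpdate.
valgrind --tool=callgrind --callgrind-out-file=s3fs.callgrind \
    s3fs bucket-name /mnt/dir -f
# ...perform the uploads in another shell, then: fusermount -u /mnt/dir
callgrind_annotate s3fs.callgrind | head -n 30
```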


@ggtakec commented on GitHub (Apr 16, 2019):

@pwulff
EVP_DigestUpdate is called to create the signature for each S3 request.
When you upload files, many HEAD requests are sent to check the existence of directories and files, so the number of requests grows, and with it the number of calls to EVP_DigestUpdate.

Could you try the stat-cache options (max_stat_cache_size, stat_cache_expire, stat_cache_interval_expire, enable_noobj_cache) so that file stat information is cached once confirmed?
Thanks in advance for your assistance.
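A sketch of a mount using the caching options listed above; the specific values are assumptions, not recommendations from the project:

```shell
# Cache stat results (and negative lookups) to cut repeated HEAD requests;
# bucket name, mount point, and values are placeholders to tune.
s3fs bucket-name /mnt/dir \
    -o max_stat_cache_size=100000 \
    -o stat_cache_expire=300 \
    -o enable_noobj_cache
```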


@joshglenn commented on GitHub (May 6, 2019):

I'm also having this problem... on macOS Mojave, copying 19 files for a total of 2.0 MB takes about 10 minutes. Slow as dial-up :-)


@nikt12 commented on GitHub (May 14, 2019):

@ggtakec Hello, could you advise what options should be used to upload large files very fast?
I am using version 1.84 and trying to upload just one file to S3-compatible storage.
These options were used: "-o use_path_request_style -o enable_content_md5 -o allow_other -o nonempty -o kernel_cache -o max_stat_cache_size=1000000 -o parallel_count=200 -o multipart_size=100 -o singlepart_copy_limit=200 -o multireq_max=30 -o max_background=1000".
With this I could reach only about 60 MB/s upload speed.
My assumption is that multipart uploading was somehow not enabled (I had a 35 GB file, and I would expect several threads to upload it, but in htop the CPU usage of the s3fs daemon was never higher than 100%).
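One way to check whether multipart uploads are actually in flight is to ask the endpoint directly with the AWS CLI while the copy is running (bucket name is a placeholder; for non-AWS S3-compatible storage, add the appropriate --endpoint-url):

```shell
# Lists in-progress multipart uploads; if this stays empty while a 35 GB
# file is being written, s3fs is not using multipart for that upload.
aws s3api list-multipart-uploads --bucket bucket-name
```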


@gaul commented on GitHub (Feb 3, 2020):

Could you test with the latest version 1.85? It includes several optimizations which should improve your performance.


@gaul commented on GitHub (Apr 22, 2020):

Please reopen if symptoms persist.


@neilpalima commented on GitHub (May 19, 2020):

@gaul, I think I'm experiencing this issue. Is there a way to verify this? On an AWS EC2 instance, we are writing a file of ~70 MB, but it takes 40 minutes to an hour (or more) to upload to our S3 bucket. The commit hash we are using is 005a684600f83a784c526ada078ac807e5e19633 ("Fix typos").


@drzraf commented on GitHub (Sep 4, 2020):

I found that on OpenStack, as a preliminary step, four requests are made sequentially for each file uploaded via rsync:

  • /2018-09/x.mp4_$folder$
  • /2018-09/x.mp4/
  • /2018-09/x.mp4
  • delimiter=/&max-keys=2&prefix=2018-09/x.mp4/

And I doubt the files are processed in parallel.

So before the upload of 500 files even starts, 2,000 requests are issued. This is at least suboptimal => notsup_compat_dir to the rescue, limiting it to 2 requests per file instead of 4.


@gaul commented on GitHub (Sep 6, 2020):

@neilpalima s3fs should sustain 100+ MBytes/s for sequential reads and writes. Please open a new issue with your symptoms and benchmark setup if you experience poor performance.

@drzraf s3fs does not have optimal metadata performance and does some checks sequentially. #927 discusses changing these defaults. Note that this should not dramatically affect performance for large files.


@hopeseekr commented on GitHub (Sep 21, 2020):

cp -rvf works fine and fast. rsync -avW --progress --inplace --size-only seems to get frozen at sending incremental file list... all it does is create all of the directories.


@skepticalwaves commented on GitHub (Nov 23, 2021):

I'm having the same experience as @hopeseekr, cp works, but rsync hangs forever after creating directories.
