[GH-ISSUE #2294] Copy from S3 bucket is slow #1143

Open
opened 2026-03-04 01:51:42 +03:00 by kerem · 3 comments
Owner

Originally created by @vitalyk-multinarity on GitHub (Aug 27, 2023).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/2294

Additional Information

Version of s3fs being used (s3fs --version)

V1.93 (commit:82107f4) (the same results with v1.90)

Version of fuse being used (pkg-config --modversion fuse, rpm -qi fuse or dpkg -s fuse)

2.9.9-5ubuntu3

Kernel information (uname -r)

6.2.0-1009-aws

GNU/Linux Distribution, if applicable (cat /etc/os-release)

PRETTY_NAME="Ubuntu 22.04.2 LTS"

How to run s3fs, if applicable

Used command line

s3fs syslog messages (grep s3fs /var/log/syslog, journalctl | grep s3fs, or s3fs outputs)

Aug 27 11:07:08 ip-172-31-9-108 s3fs[1016]: s3fs version 1.93(82107f4) : s3fs -o iam_role my-test-bucket mount-point/
Aug 27 11:07:08 ip-172-31-9-108 s3fs[1016]: Loaded mime information from /etc/mime.types
Aug 27 11:07:08 ip-172-31-9-108 s3fs[1019]: init v1.93(commit:82107f4) with OpenSSL, credential-library(built-in)
Aug 27 11:07:08 ip-172-31-9-108 s3fs[1019]: s3fs.cpp:s3fs_check_service(4430): Failed to connect region 'us-east-1'(default), so retry to connect region 'eu-central-1' for url(http(s)://s3-eu-central-1.amazonaws.com).

Details about issue

Copy from the S3 bucket is very slow. I have a test bucket with 3.7GB of data: "aws s3 cp --recursive s3://my-test/ /tmp" takes 1 minute, but "cp -r my-mount-point /tmp" takes >10 minutes.
Interestingly, the first 1.3GB copied in 1 minute, but after that copying is much slower...


@gaul commented on GitHub (Aug 27, 2023):

If the bucket contains many files then AWS CLI will be faster since it copies in parallel while cp copies serially.


@vitalyk-multinarity commented on GitHub (Aug 28, 2023):

If the bucket contains many files then AWS CLI will be faster since it copies in parallel while cp copies serially.

@gaul - thank you!
Yes, I have a few big files (movies), plus thousands of small images.

Somehow I was sure that s3fs is multithreaded.


@gaul commented on GitHub (Aug 28, 2023):

s3fs is multithreaded but if the application does not copy in parallel then s3fs cannot operate in parallel. So if you use something like https://superuser.com/a/536643 s3fs performance should improve.
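The linked answer boils down to fanning `cp` out across multiple processes. A minimal sketch of that idea using `find` and GNU `xargs -P` (all paths here are illustrative placeholders; a small demo tree stands in for the real s3fs mount point, and in real use you would point `SRC` at the mount and skip the demo setup):

```shell
#!/bin/sh
set -e

# "demo-src" stands in for the s3fs mount point (e.g. my-mount-point);
# "demo-dst" stands in for the local destination directory.
SRC="demo-src"
DST="demo-dst"

# Demo tree only: one "big" file plus a file in a subdirectory.
mkdir -p "$SRC/sub"
printf 'movie-bytes\n' > "$SRC/big.bin"
printf 'image-bytes\n' > "$SRC/sub/small.jpg"

# 1) Recreate the directory tree in the destination.
mkdir -p "$DST"
( cd "$SRC" && find . -type d -print0 ) | ( cd "$DST" && xargs -0 mkdir -p )

# 2) Copy files with up to 8 cp processes in flight (requires GNU xargs -P).
( cd "$SRC" && find . -type f -print0 ) \
    | xargs -0 -P 8 -I{} cp "$SRC/{}" "$DST/{}"
```

With many small objects, each serial `cp` pays one full S3 round-trip per file, so running several copies concurrently lets s3fs overlap those requests, much like the AWS CLI does internally.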
