[GH-ISSUE #1608] s3fs not parallel upload small file #845

Closed
opened 2026-03-04 01:49:18 +03:00 by kerem · 2 comments
Owner

Originally created by @dayiguizhen on GitHub (Mar 22, 2021).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/1608

I deployed MinIO (4 nodes, 2 compute hosts with 4 disks) using Docker Swarm; the docker-compose file is below.

Then I used s3fs to mount MinIO locally:

s3fs -o passwd_file=/etc/.passwd-s3fs -o url=http://10.42.0.115:9001 -o allow_other -o sigv2 -o nonempty -o no_check_certificate -o use_path_request_style -o umask=000 -o parallel_count=100 -o multipart_size=100 -o max_write=131072 -o big_writes -o use_cache=/dev/shm -o enable_noobj_cache marvel /minio/

I used `dd` to benchmark write speed on MinIO, NFS, and the local disk.

This is my python script:

import multiprocessing
import os
import subprocess
import time

minio_path = '/minio/test'
nfs_path = '/data/FileServer/test'
local_path = '/data/FileServer1/tmp_test'
goofys = '/goofys/test/test'
path = minio_path
bs = 64
ts = 'M'
parallel = 100


def run_dd(i):
    cmd = f'dd if=/dev/zero of={path}_{i} bs={bs}{ts} count=1 oflag=direct'
    speed = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, shell=True)
    out = speed.stdout
    # dd reports the transfer rate as the last comma-separated field of its summary line
    s = float(str(out).split(',')[-1][:-7].strip(' '))
    os.system(f"echo index {i} : {out} >> speed.txt")

process_list = []
for i in range(parallel):
    process_list.append(multiprocessing.Process(target=run_dd, args=(i,)))

start = time.time()
for p in process_list:
    p.start()
for p in process_list:  # join every process, not just the last one
    p.join()
print(f'elapsed: {time.time() - start:.2f}s')
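The rate-parsing line in the script (`float(str(out).split(',')[-1][:-7]...)`) is brittle: it breaks if dd's summary line changes its byte-count formatting or units. A more robust sketch parses the rate with a regular expression; the sample dd output string below is illustrative, not taken from the issue:

```python
import re


def parse_dd_rate(output: bytes) -> float:
    """Extract the transfer rate from dd's summary line, normalized to MB/s."""
    text = output.decode(errors="replace")
    # dd's final line looks like:
    #   536870912 bytes (537 MB, 512 MiB) copied, 2.5 s, 215 MB/s
    match = re.search(r"([\d.]+)\s*([kMG]?B)/s", text)
    if not match:
        raise ValueError(f"no transfer rate found in: {text!r}")
    value, unit = float(match.group(1)), match.group(2)
    scale = {"B": 1e-6, "kB": 1e-3, "MB": 1.0, "GB": 1e3}[unit]
    return value * scale


sample = b"1+0 records in\n1+0 records out\n" \
         b"536870912 bytes (537 MB, 512 MiB) copied, 2.5 s, 215 MB/s\n"
print(parse_dd_rate(sample))  # 215.0
```

This avoids the fixed `[:-7]` slice, which silently mis-parses when the unit is anything other than a five-character `MB/s` suffix.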

But when I write small files, e.g. 512K, the uploads seem to run serially rather than in parallel.

What should I configure to turn these serial uploads into parallel ones?

| file | 512K (10, 100 parallel) | 1M | 16M | 256M | 512M |
| -- | -- | -- | -- | -- | -- |
| minio | 1.6MB/S, 2MB/S | 3MB/S, 3.74MB/S | 42MB/S, 45MB/S | 204MB/S, 211MB/S | 214MB/S, 250MB/S |
| NFS | 87MB/S, 48MB/S | 95MB/S, 106MB/S | 107MB/S, 111MB/S | 106MB/S, 110MB/S | 104MB/S, 111MB/S |
| local | 242MB/S, 588MB/S | 171MB/S, 1179MB/S | 2404MB/S, 613MB/S | 600MB/S, 436MB/S | 492MB/S, 362MB/S |
kerem closed this issue 2026-03-04 01:49:18 +03:00

@gaul commented on GitHub (Mar 22, 2021):

You have set `-o multipart_size=100`, so s3fs uploads files smaller than 100 MB serially. You can set this value as low as 5 MB; the default is 10 MB. Changing it should improve performance for the 16 MB and larger columns, but 512 KiB and 1 MB will remain the same: at these small sizes, creating the zero-byte object also becomes a bottleneck (#1013).
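As a concrete sketch of this suggestion, the mount from the report could be retried with the part size lowered back toward the default; the endpoint, bucket, and remaining options are carried over from the original command, and the exact values shown are illustrative:

```shell
# Same mount as in the report, but with multipart_size lowered from 100 (MB)
# to 10 so files above 10 MB are split into parts and uploaded in parallel.
# multipart_size may be set as low as 5; parallel_count bounds how many
# part uploads run concurrently.
s3fs marvel /minio/ \
  -o passwd_file=/etc/.passwd-s3fs \
  -o url=http://10.42.0.115:9001 \
  -o use_path_request_style \
  -o parallel_count=100 \
  -o multipart_size=10
```

Note this only affects parallelism *within* one file's upload; it does not change how many separate small files upload at once.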

For multi-file parallelism, I am not sure that s3fs limits curl in any way; it may be worth experimenting to find out.

Also, which version of s3fs are you using? Later versions improve performance.


@gaul commented on GitHub (May 30, 2021):

Closing since there is nothing actionable here. Note that #1640 improves performance of small files by not creating the unneeded zero-byte object.
