[GH-ISSUE #1936] S3FS mount point, upload restart if file > 5GB #973

Open
opened 2026-03-04 01:50:21 +03:00 by kerem · 7 comments
Originally created by @yguerchet on GitHub (Apr 21, 2022).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/1936

Hello,

I created a mount point with s3fs. When I copy a file smaller than 5GB I have no problem, but when it exceeds 5GB the upload is restarted one or more times. In the end it works: my file is fully uploaded, with segment sizes matching my multipart_size directive. But all the upload segments that failed at the beginning remain in my S3 bucket.

This poses two problems for me. The first is that I pay for storage per gigabyte, so I have orphan segments that I pay for, for nothing. The second is that I pay for incoming traffic (per gigabyte), so I also pay for the uploads that fail.
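(Editor's aside, not from the original report: the orphan segments are incomplete multipart uploads, and most S3-compatible providers let you list and abort them with any S3 client. The AWS CLI sketch below assumes your provider supports these `s3api` calls and that `--endpoint-url` points at it; bucket, key, and upload ID are placeholders.)

```shell
# List incomplete multipart uploads left in the bucket (the orphan segments).
aws s3api list-multipart-uploads \
  --endpoint-url https://URL_Of_My_Provider \
  --bucket BucketName

# Abort one of them; its stored parts are then deleted and no longer billed.
aws s3api abort-multipart-upload \
  --endpoint-url https://URL_Of_My_Provider \
  --bucket BucketName \
  --key path/to/file \
  --upload-id "UPLOAD_ID_FROM_THE_LISTING"
```

On AWS itself a lifecycle rule with `AbortIncompleteMultipartUpload` can do this cleanup automatically; support for that varies by provider.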

I looked at the logs but I don't see any errors; don't hesitate to ask me for the logs, or a specific part of them, if necessary.

Here is the command I run: s3fs BucketName /MountPoint -o passwd_file="/root/.passwd-s3fs" -o url=https://URL_Of_My_Provider -o use_path_request_style -o multipart_size=500 -o max_stat_cache_size=100000 -o parallel_count=50 -o multireq_max=50

I tried to change the value of my multipart_size multiple times but it didn't change anything.

Thank you in advance for your help.

s3fs version: V1.91
fuse version: 2.9.2, release 11.el7
kernel: 3.10.0-1160.62.1.el7.x86_64
GNU/Linux distribution: CentOS Linux 7


@sbrudenell commented on GitHub (Apr 24, 2022):

I have the same problem, using s3fs with backblaze b2. I have:

```
# s3fs --version
Amazon Simple Storage Service File System V1.91 (commit:unknown) with OpenSSL
```

```
s3fs -f /s3 -d -o allow_root -o bucket=<redacted> -o url=https://s3.us-west-004.backblazeb2.com -o sigv4 -o enable_content_md5 -o multipart_size=200 -o use_cache=/tmp -o del_cache -o enable_noobj_cache
```

Debug logs around the time when the upload restarts:

```
s3fs_1   | 2022-04-24T00:26:18.690Z [INF]       curl.cpp:UploadMultipartPostRequest(3947): [tpath=/filename][start=26841858145][size=1736704][part=929]
s3fs_1   | 2022-04-24T00:26:18.690Z [INF]       curl.cpp:UploadMultipartPostSetup(3888): [tpath=/filename][start=26841858145][size=1736704][part=929]
s3fs_1   | 2022-04-24T00:26:18.699Z [INF]       curl_util.cpp:prepare_url(255): URL is https://s3.us-west-004.backblazeb2.com/bucket/filename?partNumber=929&uploadId=4_z1dfeb93105b823ad8705001c_f201967ca61852c44_d20220424_m000229_c004_v0402006_t0052_u01650758549370
s3fs_1   | 2022-04-24T00:26:18.699Z [INF]       curl_util.cpp:prepare_url(288): URL changed is https://bucket.s3.us-west-004.backblazeb2.com/filename?partNumber=929&uploadId=4_z1dfeb93105b823ad8705001c_f201967ca61852c44_d20220424_m000229_c004_v0402006_t0052_u01650758549370
s3fs_1   | 2022-04-24T00:26:18.710Z [INF]       curl.cpp:insertV4Headers(2696): computing signature [PUT] [/filename] [partNumber=929&uploadId=4_z1dfeb93105b823ad8705001c_f201967ca61852c44_d20220424_m000229_c004_v0402006_t0052_u01650758549370] [bdcfcc91c3a9d6092035b0f2f7937b63f1c04315b9a0f1ab3f77cdb8c4bef0e2]
s3fs_1   | 2022-04-24T00:26:18.710Z [INF]       curl_util.cpp:url_to_host(332): url is https://s3.us-west-004.backblazeb2.com
s3fs_1   | 2022-04-24T00:26:27.840Z [INF]       curl.cpp:RequestPerform(2324): HTTP response code 200
s3fs_1   | 2022-04-24T00:26:27.842Z [INF]       curl.cpp:CompleteMultipartPostRequest(3695): [tpath=/filename][parts=929]
s3fs_1   | 2022-04-24T00:26:27.847Z [INF]       curl_util.cpp:prepare_url(255): URL is https://s3.us-west-004.backblazeb2.com/bucket/filename?uploadId=4_z1dfeb93105b823ad8705001c_f201967ca61852c44_d20220424_m000229_c004_v0402006_t0052_u01650758549370
s3fs_1   | 2022-04-24T00:26:27.847Z [INF]       curl_util.cpp:prepare_url(288): URL changed is https://bucket.s3.us-west-004.backblazeb2.com/filename?uploadId=4_z1dfeb93105b823ad8705001c_f201967ca61852c44_d20220424_m000229_c004_v0402006_t0052_u01650758549370
s3fs_1   | 2022-04-24T00:26:27.849Z [INF]       curl.cpp:insertV4Headers(2696): computing signature [POST] [/filename] [uploadId=4_z1dfeb93105b823ad8705001c_f201967ca61852c44_d20220424_m000229_c004_v0402006_t0052_u01650758549370] [fdb57ed6ebfaa09e53e2cee64c2d474adb57b90b0b59cd6c4e6c38c6ceac328e]
s3fs_1   | 2022-04-24T00:26:27.849Z [INF]       curl_util.cpp:url_to_host(332): url is https://s3.us-west-004.backblazeb2.com
s3fs_1   | 2022-04-24T00:26:28.550Z [INF]       curl.cpp:RequestPerform(2324): HTTP response code 200
s3fs_1   | 2022-04-24T00:26:39.253Z [INF] fdcache_fdinfo.cpp:ClearUploadInfo(121): Implementation of cancellation process for multipart upload is awaited.
s3fs_1   | 2022-04-24T00:26:39.253Z [INF]       curl.cpp:PreMultipartPostRequest(3581): [tpath=/filename]
s3fs_1   | 2022-04-24T00:26:39.253Z [INF]       curl_util.cpp:prepare_url(255): URL is https://s3.us-west-004.backblazeb2.com/bucket/filename?uploads
s3fs_1   | 2022-04-24T00:26:39.253Z [INF]       curl_util.cpp:prepare_url(288): URL changed is https://bucket.s3.us-west-004.backblazeb2.com/filename?uploads
s3fs_1   | 2022-04-24T00:26:39.253Z [INF]       curl.cpp:insertV4Headers(2696): computing signature [POST] [/filename] [uploads] []
s3fs_1   | 2022-04-24T00:26:39.253Z [INF]       curl_util.cpp:url_to_host(332): url is https://s3.us-west-004.backblazeb2.com
s3fs_1   | 2022-04-24T00:26:39.492Z [INF]       curl.cpp:RequestPerform(2324): HTTP response code 200
s3fs_1   | 2022-04-24T00:26:39.492Z [INF] fdcache_fdinfo.cpp:ClearUploadInfo(121): Implementation of cancellation process for multipart upload is awaited.
s3fs_1   | 2022-04-24T00:26:39.492Z [INF]       fdcache_entity.cpp:NoCacheLoadAndPost(1075): [path=/filename][physical_fd=44][offset=0][size=27912069217]
s3fs_1   | 2022-04-24T00:26:39.493Z [INF]       curl.cpp:MultipartUploadRequest(4150): [upload_id=4_z1dfeb93105b823ad8705001c_f201967ca6185ee66_d20220424_m002639_c004_v0402000_t0009_u01650759999456][tpath=/aubrey.20220423T223601+0000.btrfs.gz.gpg][fd=44][offset=0][size=209715200]
s3fs_1   | 2022-04-24T00:26:39.493Z [INF]       curl.cpp:UploadMultipartPostRequest(3947): [tpath=/filename][start=0][size=209715200][part=1]
s3fs_1   | 2022-04-24T00:26:39.493Z [INF]       curl.cpp:UploadMultipartPostSetup(3888): [tpath=/filename][start=0][size=209715200][part=1]
s3fs_1   | 2022-04-24T00:26:40.096Z [INF]       curl_util.cpp:prepare_url(255): URL is https://s3.us-west-004.backblazeb2.com/bucket/filename?partNumber=1&uploadId=4_z1dfeb93105b823ad8705001c_f201967ca6185ee66_d20220424_m002639_c004_v0402000_t0009_u01650759999456
s3fs_1   | 2022-04-24T00:26:40.096Z [INF]       curl_util.cpp:prepare_url(288): URL changed is https://bucket.s3.us-west-004.backblazeb2.com/filename?partNumber=1&uploadId=4_z1dfeb93105b823ad8705001c_f201967ca6185ee66_d20220424_m002639_c004_v0402000_t0009_u01650759999456
s3fs_1   | 2022-04-24T00:26:40.805Z [INF]       curl.cpp:insertV4Headers(2696): computing signature [PUT] [/filename] [partNumber=1&uploadId=4_z1dfeb93105b823ad8705001c_f201967ca6185ee66_d20220424_m002639_c004_v0402000_t0009_u01650759999456] [72abf2ca8f36943ebe2e49ca3a51d409ca5f0bfcffab6c9d25643c17c32889da]
s3fs_1   | 2022-04-24T00:26:40.805Z [INF]       curl_util.cpp:url_to_host(332): url is https://s3.us-west-004.backblazeb2.com
s3fs_1   | 2022-04-24T00:26:45.707Z [INF]       curl.cpp:RequestPerform(2324): HTTP response code 200
s3fs_1   | 2022-04-24T00:26:45.707Z [INF]       curl.cpp:MultipartUploadRequest(4150): [upload_id=4_z1dfeb93105b823ad8705001c_f201967ca6185ee66_d20220424_m002639_c004_v0402000_t0009_u01650759999456][tpath=/aubrey.20220423T223601+0000.btrfs.gz.gpg][fd=44][offset=209715200][size=209715200]
```

As you can see, it creates a new `upload_id`.


@yguerchet commented on GitHub (Apr 25, 2022):

Hello,
Yes, I confirm: on my side it also creates a new upload ID.


@sbrudenell commented on GitHub (Apr 30, 2022):

I did some more testing.

  • This issue does not occur if I use `-o nomixupload`. With that option, I get a single upload ID covering all my data, as opposed to a bunch of overlapping partitions.
  • The 5GB boundaries of the partitioning seem to be due to the default value of `max_dirty_data`. If I use `-o max_dirty_data=1024` (without `-o nomixupload`) I get partitions whose sizes are multiples of 1GB rather than 5GB.

Cross-referencing this with the code, it looks like `FdEntity::RowFlushMixMultipart` has a bug where it rewrites the data from zero every time, rather than uploading only the new chunk of dirty data.

I can use `-o nomixupload` for my use case for now, so I won't do any more testing.
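(Editor's aside: putting the workaround together, a mount with mixed multipart uploads disabled might look like the sketch below. Bucket name, mount point, and endpoint URL are placeholders, not values from this thread; the other options mirror the reporters' setups.)

```shell
# Hypothetical example: mount with -o nomixupload, which avoided the
# restarted uploads (and the orphan parts they leave behind) in the
# tests reported above.
s3fs MyBucket /mnt/s3 \
  -o passwd_file=/root/.passwd-s3fs \
  -o url=https://s3.us-west-004.backblazeb2.com \
  -o use_path_request_style \
  -o multipart_size=200 \
  -o nomixupload
```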


@yguerchet commented on GitHub (May 2, 2022):

Hello,
Thank you, it works for me too! :)
But I don't understand this sentence: "as opposed to a bunch of overlapping partitions." Can you explain? Thanks.


@imorandinwnp commented on GitHub (Jul 22, 2022):

Hi,

I have the same problem as @sbrudenell, and using `nomixupload` kind of works: for a 75GB file, instead of around 11 partial files I now have 2 large files in B2:

![image](https://user-images.githubusercontent.com/35805127/180495108-518eb222-ffca-42d6-bb9e-e8137fa7b967.png)

Ignacio


@imorandinwnp commented on GitHub (Jul 25, 2022):

After the weekend's backups I can see 4 copies of each large file:

![image](https://user-images.githubusercontent.com/35805127/180782320-087cc31b-98a7-4de5-8bad-e4e64581f674.png)

Still not working for me. Any recommendations?


@celesteking commented on GitHub (Mar 1, 2024):

Absolutely terrible backblaze s3fs / b2 s3fs / backblaze Linux mount support. Same problem, with the same multiple created versions as described above (so your costs multiply $$).

Backblaze won't work properly with this thing! I have spent several hours debugging and adjusting parameters.
