[GH-ISSUE #1542] Writing large files with "-o nomultipart" fails #813

Closed
opened 2026-03-04 01:48:59 +03:00 by kerem · 18 comments
Owner

Originally created by @CarstenGrohmann on GitHub (Feb 1, 2021).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/1542

Hello,

I tried to copy a large file into a bucket mounted with s3fs -o nomultipart. The result is unexpected.

My expectation for "nomultipart" is that the upload uses a single PUT object request, stops at the 5 GB limit, and the application exits with an error.

Instead, the command cp copied the complete 7 GB file and only failed, as expected, at chown. Moreover, the checksum of the file in the S3 bucket does not match the local one.

$ cp -p bigfile /mybucket/
cp: failed to preserve ownership for '/mybucket/bigfile': Input/output error

$ date -u; ll bigfile /mybucket/bigfile
Mon Feb  1 12:07:47 UTC 2021
-rw-r--r-- 1 tcpdump tcpdump 7486832640 Jan 27 11:20 bigfile
-rw------- 1 tcpdump tcpdump 7486832640 Jan 27 11:20 /mybucket/bigfile

$ date -u; cp -p /mybucket/bigfile bigfile.2
Mon Feb  1 12:08:58 UTC 2021

$ md5sum bigfile*
92b6de21419129456f82da791e225c5b  bigfile
194d108e7572007ad27b8bc33285512e  bigfile.2

$ date -u; md5sum /mybucket/bigfile
Mon Feb  1 12:12:43 UTC 2021
194d108e7572007ad27b8bc33285512e  /mybucket/bigfile

$ xxd bigfile > bigfile.hex

$ xxd bigfile.2 > bigfile.2.hex

$ head bigfile.hex bigfile.2.hex
==> bigfile.hex <==
0000000: 33ed 9090 9090 9090 9090 9090 9090 9090  3...............
0000010: 9090 9090 9090 9090 9090 9090 9090 9090  ................
0000020: 33ed fa8e d5bc 007c fbfc 6631 db66 31c9  3......|..f1.f1.
0000030: 6653 6651 0657 8edd 8ec5 52be 007c bf00  fSfQ.W....R..|..
0000040: 06b9 0001 f3a5 ea4b 0600 0052 b441 bbaa  .......K...R.A..
0000050: 5531 c930 f6f9 cd13 7216 81fb 55aa 7510  U1.0....r...U.u.
0000060: 83e1 0174 0b66 c706 f106 b442 eb15 eb00  ...t.f.....B....
0000070: 5a51 b408 cd13 83e1 3f5b 510f b6c6 4050  ZQ......?[Q...@P
0000080: f7e1 5352 50bb 007c b904 0066 a1b0 07e8  ..SRP..|...f....
0000090: 4400 0f82 8000 6640 80c7 02e2 f266 813e  D.....f@.....f.>

==> bigfile.2.hex <==
0000000: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0000010: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0000020: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0000030: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0000040: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0000050: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0000060: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0000070: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0000080: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0000090: 0000 0000 0000 0000 0000 0000 0000 0000  ................

There are two other issues I am currently struggling with:

  1. The application writes 7 GB, but s3fs uses a multipart upload only for the first 5 GB, not for all of the data. Why does s3fs stop using multipart uploads at the 5 GB limit?
  2. The final flush writes the entire file (the previously written 5 GB plus the missing 2 GB) to the bucket again. This is fine, since S3 does not allow in-place object modification. But if I remember correctly, S3 defines a 5 GB limit for single upload operations. Shouldn't that make the flush use a multipart upload?
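
As a back-of-the-envelope check (plain Python, using only the numbers from this report): with multipart_size=256 (MiB), covering the whole 7,486,832,640-byte file would take 28 parts, far below S3's 10,000-part limit, so there is no obvious reason to stop early:

```python
PART_SIZE = 256 * 1024 * 1024   # multipart_size=256 (MiB) from the mount options
FILE_SIZE = 7486832640          # bytes, size of bigfile from the listing above

# Number of parts needed to cover the whole file (ceiling division)
parts_needed = -(-FILE_SIZE // PART_SIZE)
print(parts_needed)             # 28, yet the log below shows only 21 parts
```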

The full debug log is 270 MB, therefore I am not attaching it. I can share it on demand.

Debug output s3fs:

$ ./s3fs mybucket /mybucket -o url=https://s3service:8080 -o use_path_request_style,no_check_certificate,nomultipart,multipart_size=256,curldbg,dbglevel=debug -d -d -f
2021-02-01T12:03:36.008Z [CRT] s3fs_logger.cpp:LowSetLogLevel(202): change debug level from [CRT] to [DBG]
2021-02-01T12:03:36.008Z [INF]     s3fs.cpp:set_mountpoint_attribute(3990): PROC(uid=0, gid=0) - MountPoint(uid=0, gid=0, mode=40755)
2021-02-01T12:03:36.008Z [DBG] curl.cpp:InitMimeType(410): Try to load mime types from /etc/mime.types file.
2021-02-01T12:03:36.008Z [DBG] curl.cpp:InitMimeType(415): The old mime types are cleared to load new mime types.
2021-02-01T12:03:36.010Z [INF] curl.cpp:InitMimeType(438): Loaded mime information from /etc/mime.types
2021-02-01T12:03:36.010Z [INF] fdcache_stat.cpp:CheckCacheFileStatTopDir(79): The path to cache top dir is empty, thus not need to check permission.
...

s3fs starts writing to the S3 bucket using multipart requests:

# grep MultipartPost s3fs_20210201.log | head
2021-02-01T12:04:34.863Z [INF]       curl.cpp:PreMultipartPostRequest(3400): [tpath=/bigfile]
2021-02-01T12:04:34.869Z [INF]       curl.cpp:UploadMultipartPostRequest(3718): [tpath=/bigfile][start=0][size=268435456][part=1]
2021-02-01T12:04:34.869Z [INF]       curl.cpp:UploadMultipartPostSetup(3659): [tpath=/bigfile][start=0][size=268435456][part=1]
2021-02-01T12:04:37.299Z [INF]       curl.cpp:UploadMultipartPostRequest(3718): [tpath=/bigfile][start=268435456][size=268435456][part=2]
2021-02-01T12:04:37.299Z [INF]       curl.cpp:UploadMultipartPostSetup(3659): [tpath=/bigfile][start=268435456][size=268435456][part=2]
2021-02-01T12:04:39.354Z [INF]       curl.cpp:UploadMultipartPostRequest(3718): [tpath=/bigfile][start=536870912][size=268435456][part=3]
2021-02-01T12:04:39.354Z [INF]       curl.cpp:UploadMultipartPostSetup(3659): [tpath=/bigfile][start=536870912][size=268435456][part=3]
2021-02-01T12:04:42.301Z [INF]       curl.cpp:UploadMultipartPostRequest(3718): [tpath=/bigfile][start=805306368][size=268435456][part=4]
2021-02-01T12:04:42.301Z [INF]       curl.cpp:UploadMultipartPostSetup(3659): [tpath=/bigfile][start=805306368][size=268435456][part=4]
2021-02-01T12:04:45.158Z [INF]       curl.cpp:UploadMultipartPostRequest(3718): [tpath=/bigfile][start=1073741824][size=268435456][part=5]

$ grep MultipartPost s3fs_20210201.log | tail
2021-02-01T12:05:15.531Z [INF]       curl.cpp:UploadMultipartPostSetup(3659): [tpath=/bigfile][start=4107468800][size=268435456][part=17]
2021-02-01T12:05:18.969Z [INF]       curl.cpp:UploadMultipartPostRequest(3718): [tpath=/bigfile][start=4375904256][size=268435456][part=18]
2021-02-01T12:05:18.969Z [INF]       curl.cpp:UploadMultipartPostSetup(3659): [tpath=/bigfile][start=4375904256][size=268435456][part=18]
2021-02-01T12:05:23.422Z [INF]       curl.cpp:UploadMultipartPostRequest(3718): [tpath=/bigfile][start=4644339712][size=268435456][part=19]
2021-02-01T12:05:23.422Z [INF]       curl.cpp:UploadMultipartPostSetup(3659): [tpath=/bigfile][start=4644339712][size=268435456][part=19]
2021-02-01T12:05:27.175Z [INF]       curl.cpp:UploadMultipartPostRequest(3718): [tpath=/bigfile][start=4912775168][size=268435456][part=20]
2021-02-01T12:05:27.175Z [INF]       curl.cpp:UploadMultipartPostSetup(3659): [tpath=/bigfile][start=4912775168][size=268435456][part=20]
2021-02-01T12:05:30.112Z [INF]       curl.cpp:UploadMultipartPostRequest(3718): [tpath=/bigfile][start=5181210624][size=187498496][part=21]
2021-02-01T12:05:30.112Z [INF]       curl.cpp:UploadMultipartPostSetup(3659): [tpath=/bigfile][start=5181210624][size=187498496][part=21]
2021-02-01T12:05:31.557Z [INF]       curl.cpp:CompleteMultipartPostRequest(3500): [tpath=/bigfile][parts=21]
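
The offsets logged for part 21 line up exactly with the 5 GiB boundary, which suggests the 5 GB single-object limit is what triggered the early completion. A quick check of the log values:

```python
# Part 21 from the log: [start=5181210624][size=187498496]
start, size = 5181210624, 187498496

end = start + size
print(end)               # 5368709120
print(end == 5 * 2**30)  # True: the multipart upload covers exactly the first 5 GiB
```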

But at 5 GB ([start=5181210624][size=187498496]) s3fs finalised the multipart upload, even though it was still receiving write requests from the application:

2021-02-01T12:05:31.557Z [DBG] curl_handlerpool.cpp:ReturnHandler(103): Return handler to pool
2021-02-01T12:05:31.557Z [INF]       curl.cpp:CompleteMultipartPostRequest(3500): [tpath=/bigfile][parts=21]
2021-02-01T12:05:31.557Z [DBG] curl_handlerpool.cpp:GetHandler(81): Get handler from pool: rest = 30
2021-02-01T12:05:31.557Z [DBG] curl.cpp:ResetHandle(1915): 'no_check_certificate' option in effect.
2021-02-01T12:05:31.557Z [DBG] curl.cpp:ResetHandle(1916): The server certificate won't be checked against the available certificate authorities.
2021-02-01T12:05:31.557Z [INF]       curl_util.cpp:prepare_url(250): URL is https://s3service:8080/mybucket/bigfile?uploadId=dLE3x1Ovc1eHN0f5OFUlrDK06Ky_iMJ9Z-saECzBLUKkW3_iMEI-E1iZQg
2021-02-01T12:05:31.557Z [INF]       curl_util.cpp:prepare_url(283): URL changed is https://s3service:8080/mybucket/bigfile?uploadId=dLE3x1Ovc1eHN0f5OFUlrDK06Ky_iMJ9Z-saECzBLUKkW3_iMEI-E1iZQg
2021-02-01T12:05:31.557Z [DBG] curl.cpp:RequestPerform(2234): connecting to URL https://s3service:8080/mybucket/bigfile?uploadId=dLE3x1Ovc1eHN0f5OFUlrDK06Ky_iMJ9Z-saECzBLUKkW3_iMEI-E1iZQg
2021-02-01T12:05:31.557Z [INF]       curl.cpp:insertV4Headers(2598): computing signature [POST] [/bigfile] [uploadId=dLE3x1Ovc1eHN0f5OFUlrDK06Ky_iMJ9Z-saECzBLUKkW3_iMEI-E1iZQg] [d5a47050c3f63ac8d4404cb22cb5799be32738d5a7999620abcd6247355431c9]
2021-02-01T12:05:31.557Z [INF]       curl_util.cpp:url_to_host(327): url is https://s3service:8080
2021-02-01T12:05:31.557Z [CURL DBG] * Found bundle for host s3service: 0x7f3948018bd0
2021-02-01T12:05:31.557Z [CURL DBG] * Re-using existing connection! (#0) with host s3service
2021-02-01T12:05:31.557Z [CURL DBG] * Connected to s3service (172.20.10.152) port 8082 (#0)
2021-02-01T12:05:31.558Z [CURL DBG] > POST /mybucket/bigfile?uploadId=dLE3x1Ovc1eHN0f5OFUlrDK06Ky_iMJ9Z-saECzBLUKkW3_iMEI-E1iZQg HTTP/1.1
2021-02-01T12:05:31.558Z [CURL DBG] > User-Agent: s3fs/1.88-CG (commit hash unknown; OpenSSL)
2021-02-01T12:05:31.558Z [CURL DBG] > Authorization: AWS4-HMAC-SHA256 Credential=NHTL282ERGST9B40156H/20210201/us-east-1/s3/aws4_request, SignedHeaders=content-type;host;x-amz-content-sha256;x-amz-date, Signature=b9afa8bec1a484f967c66bc937d42007854cb0fabc09b67802f039c1994572b7
2021-02-01T12:05:31.558Z [CURL DBG] > Content-Type: application/xml
2021-02-01T12:05:31.558Z [CURL DBG] > host: s3service:8080
2021-02-01T12:05:31.558Z [CURL DBG] > x-amz-content-sha256: d5a47050c3f63ac8d4404cb22cb5799be32738d5a7999620abcd6247355431c9
2021-02-01T12:05:31.558Z [CURL DBG] > x-amz-date: 20210201T120531Z
2021-02-01T12:05:31.558Z [CURL DBG] > Content-Length: 2039
2021-02-01T12:05:31.558Z [CURL DBG] > Expect: 100-continue
2021-02-01T12:05:31.558Z [CURL DBG] >
2021-02-01T12:05:31.559Z [CURL DBG] < HTTP/1.1 100 Continue
2021-02-01T12:05:31.562Z [CURL DBG] < HTTP/1.1 200 OK
2021-02-01T12:05:31.562Z [CURL DBG] < Date: Mon, 01 Feb 2021 12:05:31 GMT
2021-02-01T12:05:31.562Z [CURL DBG] < Connection: CLOSE
2021-02-01T12:05:31.562Z [CURL DBG] < Server: StorageGRID/11.4.0.2
2021-02-01T12:05:31.562Z [CURL DBG] < x-amz-request-id: 1612181016023039
2021-02-01T12:05:31.562Z [CURL DBG] < x-amz-id-2: 12616688
2021-02-01T12:05:31.562Z [CURL DBG] < Content-Type: application/xml
2021-02-01T12:05:31.562Z [CURL DBG] <
2021-02-01T12:05:31.656Z [CURL DBG] * Closing connection 0
2021-02-01T12:05:31.657Z [INF]       curl.cpp:RequestPerform(2267): HTTP response code 200
2021-02-01T12:05:31.657Z [DBG] curl_handlerpool.cpp:ReturnHandler(103): Return handler to pool
2021-02-01T12:05:31.698Z [DBG] fdcache_entity.cpp:PunchHole(1632): [path=/bigfile][fd=7][offset=0][size=0]
2021-02-01T12:05:31.698Z [DBG] fdcache_entity.cpp:PunchHole(1646): there is no page list that have no data, so nothing to do.
2021-02-01T12:05:31.698Z [DBG] fdcache.cpp:Close(593): [ent->file=/bigfile][ent->fd=7]
2021-02-01T12:05:31.698Z [DBG] fdcache_entity.cpp:Close(200): [path=/bigfile][fd=7][refcnt=1]
   write[7] 65536 bytes to 5368643584
   unique: 81926, success, outsize: 24
unique: 81927, opcode: WRITE (16), nodeid: 2, insize: 65616, pid: 30993
write[7] 65536 bytes to 5368709120 flags: 0x8001
2021-02-01T12:05:31.698Z [DBG] s3fs.cpp:s3fs_write(2273): [path=/bigfile][size=65536][offset=5368709120][fd=7]
2021-02-01T12:05:31.698Z [DBG] fdcache.cpp:ExistOpen(526): [path=/bigfile][fd=7][ignore_existfd=false]
2021-02-01T12:05:31.698Z [DBG] fdcache.cpp:Open(446): [path=/bigfile][size=-1][time=-1]
2021-02-01T12:05:31.698Z [DBG] fdcache_entity.cpp:Dup(246): [path=/bigfile][fd=7][refcnt=2]
2021-02-01T12:05:31.698Z [DBG] fdcache_entity.cpp:Open(315): [path=/bigfile][fd=7][size=-1][time=-1]
2021-02-01T12:05:31.698Z [DBG] fdcache_entity.cpp:Dup(246): [path=/bigfile][fd=7][refcnt=3]
2021-02-01T12:05:31.698Z [DBG] fdcache_entity.cpp:Close(200): [path=/bigfile][fd=7][refcnt=2]
2021-02-01T12:05:31.698Z [DBG] fdcache_entity.cpp:Write(1439): [path=/bigfile][fd=7][offset=5368709120][size=65536]
2021-02-01T12:05:31.698Z [DBG] fdcache_entity.cpp:Load(902): [path=/bigfile][fd=7][offset=0][size=5368709120]
2021-02-01T12:05:31.698Z [DBG] fdcache.cpp:Close(593): [ent->file=/bigfile][ent->fd=7]
2021-02-01T12:05:31.698Z [DBG] fdcache_entity.cpp:Close(200): [path=/bigfile][fd=7][refcnt=1]
   write[7] 65536 bytes to 5368709120
   unique: 81927, success, outsize: 24

The application continues writing until utimens() is called:

unique: 114246, opcode: WRITE (16), nodeid: 2, insize: 65616, pid: 30993
write[7] 65536 bytes to 7486767104 flags: 0x8001
2021-02-01T12:05:44.103Z [DBG] s3fs.cpp:s3fs_write(2273): [path=/bigfile][size=65536][offset=7486767104][fd=7]
2021-02-01T12:05:44.103Z [DBG] fdcache.cpp:ExistOpen(526): [path=/bigfile][fd=7][ignore_existfd=false]
2021-02-01T12:05:44.103Z [DBG] fdcache.cpp:Open(446): [path=/bigfile][size=-1][time=-1]
2021-02-01T12:05:44.103Z [DBG] fdcache_entity.cpp:Dup(246): [path=/bigfile][fd=7][refcnt=2]
2021-02-01T12:05:44.103Z [DBG] fdcache_entity.cpp:Open(315): [path=/bigfile][fd=7][size=-1][time=-1]
2021-02-01T12:05:44.103Z [DBG] fdcache_entity.cpp:Dup(246): [path=/bigfile][fd=7][refcnt=3]
2021-02-01T12:05:44.103Z [DBG] fdcache_entity.cpp:Close(200): [path=/bigfile][fd=7][refcnt=2]
2021-02-01T12:05:44.103Z [DBG] fdcache_entity.cpp:Write(1439): [path=/bigfile][fd=7][offset=7486767104][size=65536]
2021-02-01T12:05:44.103Z [DBG] fdcache_entity.cpp:Load(902): [path=/bigfile][fd=7][offset=0][size=7486767104]
2021-02-01T12:05:44.104Z [DBG] fdcache.cpp:Close(593): [ent->file=/bigfile][ent->fd=7]
2021-02-01T12:05:44.104Z [DBG] fdcache_entity.cpp:Close(200): [path=/bigfile][fd=7][refcnt=1]
   write[7] 65536 bytes to 7486767104
   unique: 114246, success, outsize: 24
unique: 114247, opcode: SETATTR (4), nodeid: 2, insize: 128, pid: 30993
utimens /bigfile 1612179045.811590758 1611742829.138611914
2021-02-01T12:05:44.104Z [INF] s3fs.cpp:s3fs_utimens(1926): [path=/bigfile][mtime=1611742829][ctime/atime=1612179045]
2021-02-01T12:05:44.104Z [DBG] s3fs.cpp:check_parent_object_access(615): [path=/bigfile]
2021-02-01T12:05:44.104Z [DBG] s3fs.cpp:check_object_access(509): [path=/]
2021-02-01T12:05:44.104Z [DBG] s3fs.cpp:get_object_attribute(362): [path=/]
2021-02-01T12:05:44.104Z [DBG] s3fs.cpp:check_object_access(509): [path=/bigfile]
2021-02-01T12:05:44.104Z [DBG] s3fs.cpp:get_object_attribute(362): [path=/bigfile]

...

2021-02-01T12:05:44.311Z [DBG] fdcache.cpp:Close(593): [ent->file=/bigfile][ent->fd=7]
2021-02-01T12:05:44.311Z [DBG] fdcache_entity.cpp:Close(200): [path=/bigfile][fd=7][refcnt=1]
   unique: 114247, success, outsize: 120

After that, chown() is called and fails:

unique: 114249, opcode: SETATTR (4), nodeid: 2, insize: 128, pid: 30993
chown /bigfile 72 72

...

2021-02-01T12:05:44.311Z [WAN] s3fs.cpp:put_headers(729): Metadata update for failed because the file is larger than 5GB and one of the options nocopyapi or nomultipart are set: [path=/bigfile]
2021-02-01T12:05:44.311Z [DBG] fdcache.cpp:Close(593): [ent->file=/bigfile][ent->fd=7]
2021-02-01T12:05:44.311Z [DBG] fdcache_entity.cpp:Close(200): [path=/bigfile][fd=7][refcnt=1]
   unique: 114249, error: -5 (Input/output error), outsize: 16

A final flush then writes all data from the cache to the S3 bucket:

unique: 114250, opcode: FLUSH (25), nodeid: 2, insize: 64, pid: 30993
flush[7]
2021-02-01T12:05:44.312Z [INF] s3fs.cpp:s3fs_flush(2321): [path=/bigfile][fd=7]
2021-02-01T12:05:44.312Z [DBG] s3fs.cpp:check_parent_object_access(615): [path=/bigfile]
2021-02-01T12:05:44.312Z [DBG] s3fs.cpp:check_object_access(509): [path=/]
2021-02-01T12:05:44.312Z [DBG] s3fs.cpp:get_object_attribute(362): [path=/]
2021-02-01T12:05:44.312Z [DBG] s3fs.cpp:check_object_access(509): [path=/bigfile]
2021-02-01T12:05:44.312Z [DBG] s3fs.cpp:get_object_attribute(362): [path=/bigfile]
2021-02-01T12:05:44.312Z [DBG] cache.cpp:GetStat(293): stat cache hit [path=/bigfile][time=1567007.430625848][hit count=4]
2021-02-01T12:05:44.312Z [DBG] fdcache.cpp:ExistOpen(526): [path=/bigfile][fd=7][ignore_existfd=false]
2021-02-01T12:05:44.312Z [DBG] fdcache.cpp:Open(446): [path=/bigfile][size=-1][time=-1]
2021-02-01T12:05:44.312Z [DBG] fdcache_entity.cpp:Dup(246): [path=/bigfile][fd=7][refcnt=2]
2021-02-01T12:05:44.312Z [DBG] fdcache_entity.cpp:Open(315): [path=/bigfile][fd=7][size=-1][time=-1]
2021-02-01T12:05:44.312Z [DBG] fdcache_entity.cpp:Dup(246): [path=/bigfile][fd=7][refcnt=3]
2021-02-01T12:05:44.312Z [DBG] fdcache_entity.cpp:Close(200): [path=/bigfile][fd=7][refcnt=2]
2021-02-01T12:05:44.312Z [INF]       fdcache_entity.cpp:RowFlush(1194): [tpath=][path=/bigfile][fd=7]
2021-02-01T12:05:44.312Z [DBG] fdcache_entity.cpp:Load(902): [path=/bigfile][fd=7][offset=0][size=0]
2021-02-01T12:05:44.312Z [INF]       curl.cpp:PutRequest(3127): [tpath=/bigfile]
2021-02-01T12:05:44.312Z [DBG] curl_handlerpool.cpp:GetHandler(81): Get handler from pool: rest = 30
2021-02-01T12:05:44.312Z [DBG] curl.cpp:ResetHandle(1915): 'no_check_certificate' option in effect.
2021-02-01T12:05:44.312Z [DBG] curl.cpp:ResetHandle(1916): The server certificate won't be checked against the available certificate authorities.
2021-02-01T12:05:44.312Z [INF]       curl_util.cpp:prepare_url(250): URL is https://s3service:8080/mybucket/bigfile
2021-02-01T12:05:44.312Z [INF]       curl_util.cpp:prepare_url(283): URL changed is https://s3service:8080/mybucket/bigfile
2021-02-01T12:05:44.312Z [INF]       curl.cpp:PutRequest(3225): uploading... [path=/bigfile][fd=7][size=7486832640]
2021-02-01T12:05:44.312Z [DBG] curl.cpp:RequestPerform(2234): connecting to URL https://s3service:8080/mybucket/bigfile
2021-02-01T12:06:16.722Z [INF]       curl.cpp:insertV4Headers(2598): computing signature [PUT] [/bigfile] [] [44ff81417263149cf3b7a7dbe25db2c455bfc621f5b768c662220e09402cfbc1]
2021-02-01T12:06:16.722Z [INF]       curl_util.cpp:url_to_host(327): url is https://s3service:8080
2021-02-01T12:06:16.722Z [CURL DBG] * Found bundle for host s3service: 0x7f39480bb620
2021-02-01T12:06:16.722Z [CURL DBG] * Re-using existing connection! (#1) with host s3service
2021-02-01T12:06:16.722Z [CURL DBG] * Connected to s3service (172.20.10.152) port 8082 (#1)
2021-02-01T12:06:16.723Z [CURL DBG] > PUT /mybucket/bigfile HTTP/1.1
2021-02-01T12:06:16.723Z [CURL DBG] > User-Agent: s3fs/1.88-CG (commit hash unknown; OpenSSL)
2021-02-01T12:06:16.723Z [CURL DBG] > Accept: */*
2021-02-01T12:06:16.723Z [CURL DBG] > Authorization: AWS4-HMAC-SHA256 Credential=NHTL282ERGST9B40156H/20210201/us-east-1/s3/aws4_request, SignedHeaders=content-type;host;x-amz-content-sha256;x-amz-date;x-amz-meta-atime;x-amz-meta-ctime;x-amz-meta-gid;x-amz-meta-mode;x-amz-meta-mtime;x-amz-meta-uid;x-amz-metadata-directive, Signature=e480862e45c998444dbf241986aa70e7921b9106623390984e7dd6a126a9cc9a
2021-02-01T12:06:16.723Z [CURL DBG] > Content-Type: application/octet-stream
2021-02-01T12:06:16.723Z [CURL DBG] > host: s3service:8080
2021-02-01T12:06:16.723Z [CURL DBG] > x-amz-content-sha256: 44ff81417263149cf3b7a7dbe25db2c455bfc621f5b768c662220e09402cfbc1
2021-02-01T12:06:16.723Z [CURL DBG] > x-amz-date: 20210201T120616Z
2021-02-01T12:06:16.723Z [CURL DBG] > x-amz-meta-atime: 1612179045
2021-02-01T12:06:16.723Z [CURL DBG] > x-amz-meta-ctime: 1612181144
2021-02-01T12:06:16.723Z [CURL DBG] > x-amz-meta-gid: 72
2021-02-01T12:06:16.723Z [CURL DBG] > x-amz-meta-mode: 33152
2021-02-01T12:06:16.723Z [CURL DBG] > x-amz-meta-mtime: 1611742829
2021-02-01T12:06:16.723Z [CURL DBG] > x-amz-meta-uid: 72
2021-02-01T12:06:16.723Z [CURL DBG] > x-amz-metadata-directive: REPLACE
2021-02-01T12:06:16.723Z [CURL DBG] > Content-Length: 7486832640
2021-02-01T12:06:16.723Z [CURL DBG] > Expect: 100-continue
2021-02-01T12:06:16.723Z [CURL DBG] >
2021-02-01T12:06:16.725Z [CURL DBG] < HTTP/1.1 100 Continue
2021-02-01T12:06:58.137Z [CURL DBG] * We are completely uploaded and fine
2021-02-01T12:06:58.609Z [CURL DBG] < HTTP/1.1 200 OK
2021-02-01T12:06:58.609Z [CURL DBG] < Date: Mon, 01 Feb 2021 12:06:58 GMT
2021-02-01T12:06:58.609Z [CURL DBG] < Connection: KEEP-ALIVE
2021-02-01T12:06:58.609Z [CURL DBG] < Server: StorageGRID/11.4.0.2
2021-02-01T12:06:58.609Z [CURL DBG] < x-amz-request-id: 1612181144114382
2021-02-01T12:06:58.609Z [CURL DBG] < x-amz-id-2: 12293711
2021-02-01T12:06:58.609Z [CURL DBG] < Content-Length: 0
2021-02-01T12:06:58.609Z [CURL DBG] < ETag: "194d108e7572007ad27b8bc33285512e"
2021-02-01T12:06:58.609Z [CURL DBG] <
2021-02-01T12:06:58.609Z [CURL DBG] * Connection #1 to host s3service left intact
2021-02-01T12:06:58.609Z [INF]       curl.cpp:RequestPerform(2267): HTTP response code 200
2021-02-01T12:06:58.609Z [DBG] curl_handlerpool.cpp:ReturnHandler(103): Return handler to pool
2021-02-01T12:06:58.609Z [DBG] fdcache.cpp:Close(593): [ent->file=/bigfile][ent->fd=7]
2021-02-01T12:06:58.609Z [DBG] fdcache_entity.cpp:Close(200): [path=/bigfile][fd=7][refcnt=1]
   unique: 114250, success, outsize: 16

Regards,
Carsten

fdcache_entity.cpp:Dup(246): [path=/bigfile][fd=7][refcnt=3] 2021-02-01T12:05:44.312Z [DBG] fdcache_entity.cpp:Close(200): [path=/bigfile][fd=7][refcnt=2] 2021-02-01T12:05:44.312Z [INF] fdcache_entity.cpp:RowFlush(1194): [tpath=][path=/bigfile][fd=7] 2021-02-01T12:05:44.312Z [DBG] fdcache_entity.cpp:Load(902): [path=/bigfile][fd=7][offset=0][size=0] 2021-02-01T12:05:44.312Z [INF] curl.cpp:PutRequest(3127): [tpath=/bigfile] 2021-02-01T12:05:44.312Z [DBG] curl_handlerpool.cpp:GetHandler(81): Get handler from pool: rest = 30 2021-02-01T12:05:44.312Z [DBG] curl.cpp:ResetHandle(1915): 'no_check_certificate' option in effect. 2021-02-01T12:05:44.312Z [DBG] curl.cpp:ResetHandle(1916): The server certificate won't be checked against the available certificate authorities. 2021-02-01T12:05:44.312Z [INF] curl_util.cpp:prepare_url(250): URL is https://s3service:8080/mybucket/bigfile 2021-02-01T12:05:44.312Z [INF] curl_util.cpp:prepare_url(283): URL changed is https://s3service:8080/mybucket/bigfile 2021-02-01T12:05:44.312Z [INF] curl.cpp:PutRequest(3225): uploading... [path=/bigfile][fd=7][size=7486832640] 2021-02-01T12:05:44.312Z [DBG] curl.cpp:RequestPerform(2234): connecting to URL https://s3service:8080/mybucket/bigfile 2021-02-01T12:06:16.722Z [INF] curl.cpp:insertV4Headers(2598): computing signature [PUT] [/bigfile] [] [44ff81417263149cf3b7a7dbe25db2c455bfc621f5b768c662220e09402cfbc1] 2021-02-01T12:06:16.722Z [INF] curl_util.cpp:url_to_host(327): url is https://s3service:8080 2021-02-01T12:06:16.722Z [CURL DBG] * Found bundle for host s3service: 0x7f39480bb620 2021-02-01T12:06:16.722Z [CURL DBG] * Re-using existing connection! 
(#1) with host s3service 2021-02-01T12:06:16.722Z [CURL DBG] * Connected to s3service (172.20.10.152) port 8082 (#1) 2021-02-01T12:06:16.723Z [CURL DBG] > PUT /mybucket/bigfile HTTP/1.1 2021-02-01T12:06:16.723Z [CURL DBG] > User-Agent: s3fs/1.88-CG (commit hash unknown; OpenSSL) 2021-02-01T12:06:16.723Z [CURL DBG] > Accept: */* 2021-02-01T12:06:16.723Z [CURL DBG] > Authorization: AWS4-HMAC-SHA256 Credential=NHTL282ERGST9B40156H/20210201/us-east-1/s3/aws4_request, SignedHeaders=content-type;host;x-amz-content-sha256;x-amz-date;x-amz-meta-atime;x-amz-meta-ctime;x-amz-meta-gid;x-amz-meta-mode;x-amz-meta-mtime;x-amz-meta-uid;x-amz-metadata-directive, Signature=e480862e45c998444dbf241986aa70e7921b9106623390984e7dd6a126a9cc9a 2021-02-01T12:06:16.723Z [CURL DBG] > Content-Type: application/octet-stream 2021-02-01T12:06:16.723Z [CURL DBG] > host: s3service:8080 2021-02-01T12:06:16.723Z [CURL DBG] > x-amz-content-sha256: 44ff81417263149cf3b7a7dbe25db2c455bfc621f5b768c662220e09402cfbc1 2021-02-01T12:06:16.723Z [CURL DBG] > x-amz-date: 20210201T120616Z 2021-02-01T12:06:16.723Z [CURL DBG] > x-amz-meta-atime: 1612179045 2021-02-01T12:06:16.723Z [CURL DBG] > x-amz-meta-ctime: 1612181144 2021-02-01T12:06:16.723Z [CURL DBG] > x-amz-meta-gid: 72 2021-02-01T12:06:16.723Z [CURL DBG] > x-amz-meta-mode: 33152 2021-02-01T12:06:16.723Z [CURL DBG] > x-amz-meta-mtime: 1611742829 2021-02-01T12:06:16.723Z [CURL DBG] > x-amz-meta-uid: 72 2021-02-01T12:06:16.723Z [CURL DBG] > x-amz-metadata-directive: REPLACE 2021-02-01T12:06:16.723Z [CURL DBG] > Content-Length: 7486832640 2021-02-01T12:06:16.723Z [CURL DBG] > Expect: 100-continue 2021-02-01T12:06:16.723Z [CURL DBG] > 2021-02-01T12:06:16.725Z [CURL DBG] < HTTP/1.1 100 Continue 2021-02-01T12:06:58.137Z [CURL DBG] * We are completely uploaded and fine 2021-02-01T12:06:58.609Z [CURL DBG] < HTTP/1.1 200 OK 2021-02-01T12:06:58.609Z [CURL DBG] < Date: Mon, 01 Feb 2021 12:06:58 GMT 2021-02-01T12:06:58.609Z [CURL DBG] < Connection: KEEP-ALIVE 
2021-02-01T12:06:58.609Z [CURL DBG] < Server: StorageGRID/11.4.0.2 2021-02-01T12:06:58.609Z [CURL DBG] < x-amz-request-id: 1612181144114382 2021-02-01T12:06:58.609Z [CURL DBG] < x-amz-id-2: 12293711 2021-02-01T12:06:58.609Z [CURL DBG] < Content-Length: 0 2021-02-01T12:06:58.609Z [CURL DBG] < ETag: "194d108e7572007ad27b8bc33285512e" 2021-02-01T12:06:58.609Z [CURL DBG] < 2021-02-01T12:06:58.609Z [CURL DBG] * Connection #1 to host s3service left intact 2021-02-01T12:06:58.609Z [INF] curl.cpp:RequestPerform(2267): HTTP response code 200 2021-02-01T12:06:58.609Z [DBG] curl_handlerpool.cpp:ReturnHandler(103): Return handler to pool 2021-02-01T12:06:58.609Z [DBG] fdcache.cpp:Close(593): [ent->file=/bigfile][ent->fd=7] 2021-02-01T12:06:58.609Z [DBG] fdcache_entity.cpp:Close(200): [path=/bigfile][fd=7][refcnt=1] unique: 114250, success, outsize: 16 ``` Regards, Carsten
kerem (2026-03-04 01:48:59 +03:00):
  • closed this issue
  • added the `dataloss` label

@gaul commented on GitHub (Feb 2, 2021):

This inconsistent behavior is unexpected and I suspect that there is more than one bug here. s3fs tries to be helpful by enforcing a client-side 5 GB check, since AWS enforces this limit, probably to make the error handling easier. However, other services like GCS support larger single-part objects (and currently do not support multipart objects), so s3fs should not artificially limit them. We need to improve our tests in this area, which might require a few fixes to reach the ideal state. You can try removing the 5 GB check in `put_headers` in the meantime. Thanks for the detailed bug report!
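The 5 GB boundary is cheap to probe without writing real data: a sparse file created with `truncate` crosses the limit while using almost no disk. A minimal sketch, assuming GNU coreutils (`truncate`, `stat -c`); the file name is illustrative:

```shell
# Create a sparse file one byte over the 5 GiB single-PUT limit.
# The filesystem allocates no blocks for the hole, so this is fast.
limit=$((5 * 1024 * 1024 * 1024))   # 5 GiB = 5368709120 bytes
f=$(mktemp)
truncate -s "$((limit + 1))" "$f"
stat -c %s "$f"    # prints 5368709121
du -h "$f"         # actual disk usage is typically near zero
```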


@DasMagischeToastbrot commented on GitHub (Feb 3, 2021):

Using s3fs, I wrote a small test script that moves a file back and forth between the local disk and an S3 bucket.

```
#!/bin/bash
# create a 10 GB file
FILENAME="$(date +%s)_my_10g_file.txt"
echo "This run uses filename=$FILENAME"
dd if=/dev/urandom of="$HOME/$FILENAME" bs=1M count=10000
DATASETPATH="$HOME/"
echo 'this is my awesome test file' >> "$HOME/$FILENAME"
# loop for roughly four hours (14000 seconds)
START=$(date +%s)
while [ $(( $(date +%s) - 14000 )) -lt "$START" ]; do

   echo "checking md5_local hash..."
   md5_local=$(md5sum "$HOME/$FILENAME" | cut -d ' ' -f 1)
   echo "md5_local=$md5_local"

   echo "moving file local->s3"
   mv "$HOME/$FILENAME" "$DATASETPATH/$FILENAME"

   echo "checking md5_s3 hash..."
   md5_s3=$(md5sum "$DATASETPATH/$FILENAME" | cut -d ' ' -f 1)
   echo "md5_s3=$md5_s3"

   if [[ "$md5_local" == "$md5_s3" ]]
   then
      echo "The checksums match after sync to s3"
   else
      echo "Error!" 1>&2
      exit 64
   fi

   echo "moving file s3->local"
   mv "$DATASETPATH/$FILENAME" "$HOME/$FILENAME"
   #eval md5sum $FILENAME > $HOME/local

   echo "checking md5_local hash..."
   md5_local=$(md5sum "$HOME/$FILENAME" | cut -d ' ' -f 1)
   echo "md5_local=$md5_local"

   if [[ "$md5_local" == "$md5_s3" ]]
   then
      echo "The checksums match after sync to local"
   else
      echo "Error!" 1>&2
      exit 64
   fi

done
echo "Congratulations, everything worked"
```

Usually this works great, but now big files (for example 10 GB) fail with the following error:
![image](https://user-images.githubusercontent.com/76611186/106745568-93873e00-6621-11eb-86a6-b3f032b2377d.png)
I ran this script with smaller files (1 GB, 5 GB) and it still works. @gaul could you please verify whether this has the same root cause? If you need further information, feel free to ask.


@gaul commented on GitHub (Feb 3, 2021):

Please let me know if the proposed fix resolves your symptoms. Which S3 implementation do you use that lacks multipart support? s3fs generally performs better with this feature.


@DasMagischeToastbrot commented on GitHub (Feb 4, 2021):

@gaul Unfortunately this didn't solve my problem. I still get the same error even though I built from the latest commit:
![image](https://user-images.githubusercontent.com/76611186/106859195-f7157800-66c2-11eb-817e-cda985cfa767.png)
My s3fs configuration looks like:

```
s3fs#${S3PATH} ${LOCALPATH} fuse rw,_netdev,allow_other,endpoint=eu-central-1,iam_role=auto,use_cache=/tmp/s3fs-cache/,ensure_diskfree=5000,uid=<my-uid>,gid=sshusers,umask=002 0 0
```


@CarstenGrohmann commented on GitHub (Feb 4, 2021):

@DasMagischeToastbrot Can you simplify your test a little and just copy a big file to the bucket?

I personally find it better if the output of a terminal appears as text in the bug report and not as an image, as images are not searchable.

From my experience I would say that errors in `close()` are more related to the `flush` operation and less to the attribute changes in `put_headers()`.


@gaul commented on GitHub (Feb 4, 2021):

@CarstenGrohmann Could you verify whether master resolves your symptoms? I successfully wrote and ran `chmod +x` on a 5 GB + 1 byte file without multipart upload.

@DasMagischeToastbrot If the original symptom is resolved, can you open a new issue with the minimal steps to reproduce your symptoms?

As I said in https://github.com/s3fs-fuse/s3fs-fuse/issues/1542#issuecomment-771678881, I suspect there are multiple bugs with large single-part objects, and the referenced PR is only the first fix. I will work on better testing in #1543, but the tests need several improvements to handle larger files.


@CarstenGrohmann commented on GitHub (Feb 4, 2021):

@gaul chown works now. I have also successfully checked the PUT request in the debug log.

Thanks for your help.


@CarstenGrohmann commented on GitHub (Feb 5, 2021):

I don't know how far along you are with this, and I don't want to be rude or impatient. I was just curious and looked again at where the zeros could come from.

I think the zeros come from one of the many `ftruncate` calls.

For example, in `FdEntity::Write`, after uploading the data, `ftruncate` is always called to free disk space:

https://github.com/s3fs-fuse/s3fs-fuse/blob/4c6690f5f0d67604d13c91dd10525728ffd50297/src/fdcache_entity.cpp#L1521-L1536

or

https://github.com/s3fs-fuse/s3fs-fuse/blob/4c6690f5f0d67604d13c91dd10525728ffd50297/src/fdcache_entity.cpp#L1319-L1342

This works well as long as you stay under the 5 GB limit. With larger files, however, the final `flush` then misses the data below the 5 GB limit. The result is a file whose first bytes are all zeros.
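The mechanism, truncation leaving zero-filled bytes behind, can be demonstrated in miniature. This is a sketch of the general filesystem behavior, not the s3fs code path itself:

```shell
# Shrinking a file and then growing it back leaves the re-grown
# region zero-filled, the same pattern seen in the corrupted upload.
f=$(mktemp)
printf 'ABCDEFGH' > "$f"   # 8 bytes of real data
truncate -s 4 "$f"         # drop the tail, like the post-upload ftruncate
truncate -s 8 "$f"         # grow back; the restored bytes read as NUL
xxd "$f"                   # hex dump shows 4142 4344 0000 0000
```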

Take Monday's files and check 32 bytes before and 32 bytes after the 5GB limit to verify the idea:

```
$ echo "5181210624 + 187498496" | bc
5368709120

$ dd if=bigfile of=sort bs=1 skip=5368709088 count=64
64+0 records in
64+0 records out
64 bytes (64 B) copied, 0.000304265 s, 210 kB/s

$ xxd sort > sort.hex

$ cat sort.hex
0000000: 240f 990d e813 acfd 3fed c104 3dc2 0d1a  $.......?...=...
0000010: 7e9d 1739 ae42 4d45 7f69 6756 df47 ab21  ~..9.BME.igV.G.!
0000020: 9a0e 2131 7b9c 3668 8769 f6d9 d1d1 460f  ..!1{.6h.i....F.
0000030: 5a9e fd93 ea85 68bf 5798 6229 f16d dc09  Z.....h.W.b).m..

$ dd if=bigfile.2 of=sort.2 bs=1 skip=5368709088 count=64
64+0 records in
64+0 records out
64 bytes (64 B) copied, 0.000311154 s, 206 kB/s

$ xxd sort.2 > sort.2.hex

$ cat sort.2.hex
0000000: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0000010: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0000020: 9a0e 2131 7b9c 3668 8769 f6d9 d1d1 460f  ..!1{.6h.i....F.
0000030: 5a9e fd93 ea85 68bf 5798 6229 f16d dc09  Z.....h.W.b).m..
```

@gaul commented on GitHub (Feb 5, 2021):

Sorry I think I misunderstood this issue. Reopening to investigate the data corruption.


@gaul commented on GitHub (Feb 8, 2021):

@CarstenGrohmann Sorry I am still confused -- what steps do I need to perform to reproduce the unexpected zeros?


@CarstenGrohmann commented on GitHub (Feb 8, 2021):

@gaul Start `s3fs -o nomultipart`, then copy a big file (e.g. 7 GB) to the mounted share and back. After that, compare the checksums or use e.g. `xxd` to inspect the hex dump.

Steps to reproduce:

```
$ ./s3fs mybucket /mnt -o url=https://s3service:8080 -o nomultipart,curldbg,dbglevel=debug -d -d -f
$ cp -p bigfile /mnt/
$ cp -p /mnt/bigfile bigfile2
$ md5sum bigfile bigfile.2
$ xxd bigfile | head
$ xxd bigfile.2 | head
```

In https://github.com/s3fs-fuse/s3fs-fuse/issues/1542#issuecomment-773929550 I just wanted to show that the zeros stop exactly at the 5 GB limit. The first multipart upload wrote the data up to this limit.
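`cmp` can confirm where such a zeroed region starts and ends. A small stand-in sketch; the file names `orig.bin`/`copy.bin` and their miniature contents are illustrative (on the real 7 GB pair the differences would start at byte 1 and stop at the 5 GiB boundary):

```shell
# Build two tiny stand-in files: one intact, one with a zeroed prefix.
printf 'headtail'     > orig.bin
printf '\0\0\0\0tail' > copy.bin    # zeroed prefix, intact tail
cmp orig.bin copy.bin || true       # reports the first differing byte
cmp -l orig.bin copy.bin || true    # lists offset and values of every difference
```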


@gaul commented on GitHub (Feb 9, 2021):

Sorry, I cannot reproduce these symptoms. I modified `test_multipart_upload` to use a 7 GB file and successfully ran the following:

```
dd if=/dev/urandom of="${TEMP_DIR}/${BIG_FILE}" bs=$BIG_FILE_BLOCK_SIZE count=$BIG_FILE_COUNT
cp -p "${TEMP_DIR}/${BIG_FILE}" "${BIG_FILE}"
cp -p "${BIG_FILE}" "${TEMP_DIR}/${BIG_FILE}.2"
md5sum "${BIG_FILE}" "${TEMP_DIR}/${BIG_FILE}.2"
cmp "${BIG_FILE}" "${TEMP_DIR}/${BIG_FILE}.2"
```

Both files had the same MD5.


@gaul commented on GitHub (Feb 9, 2021):

I notice a difference between bigfile2 and bigfile.2 in your example -- perhaps this caused the failure?


@CarstenGrohmann commented on GitHub (Feb 9, 2021):

The missing dot is just a spelling mistake I made yesterday when I was putting the steps together.

I've updated to the current master and ran the test suite a little differently, so that I can use our local S3 appliance.

```
test_multipart_upload: "Testing multi-part upload ..."
14622720+0 records in
14622720+0 records out
7486832640 bytes (7.5 GB) copied, 75.1753 s, 99.6 MB/s
28560+0 records in
28560+0 records out
7486832640 bytes (7.5 GB) copied, 174.589 s, 42.9 MB/s
Comparing test file
/tmp/big-file-s3fs.txt big-file-s3fs.txt differ: byte 1, line 1
test_multipart_upload failed
```

My steps:

  1. Start s3fs manually:

```
$ cd tests
$ ../src/s3fs mybucket /mybucket -o url=https://s3service:8080,use_path_request_style,nomultipart
```

  2. Export the settings to run against my local S3 appliance:

```
$ export S3FS_CREDENTIALS_FILE="~/.aws/credentials"
$ export TEST_BUCKET_1="mybucket"
$ export S3PROXY_BINARY=""
$ export S3_URL="https://s3service:8080"
$ export TEST_BUCKET_MOUNT_POINT_1="/mybucket"
```

  3. Adapt the test to use my DVD ISO as bigfile:

```
> git diff .
diff --git a/test/integration-test-main.sh b/test/integration-test-main.sh
index d493c5f..41c36a8 100755
--- a/test/integration-test-main.sh
+++ b/test/integration-test-main.sh
@@ -363,8 +363,8 @@ function test_rename_before_close {
 function test_multipart_upload {
     describe "Testing multi-part upload ..."
 
-    dd if=/dev/urandom of="${TEMP_DIR}/${BIG_FILE}" bs=$BIG_FILE_BLOCK_SIZE count=$BIG_FILE_COUNT
-    dd if="${TEMP_DIR}/${BIG_FILE}" of="${BIG_FILE}" bs=$BIG_FILE_BLOCK_SIZE count=$BIG_FILE_COUNT
+    dd if=~/bigfile of="${TEMP_DIR}/${BIG_FILE}"
+    dd if="${TEMP_DIR}/${BIG_FILE}" of="${BIG_FILE}" bs=256k
 
     # Verify contents of file
     echo "Comparing test file"
```

  4. Run the test suite:

```
$ ./integration-test-main.sh
Tue Feb  9 14:54:58 CET 2021
test_cache_file_stat: "Test cache file stat ..."
1+0 records in
1+0 records out
26214400 bytes (26 MB) copied, 0.766248 s, 34.2 MB/s
test_cache_file_stat failed
...
```

@CarstenGrohmann commented on GitHub (Feb 9, 2021):

Reading from `/dev/urandom` also fails:

```
test_multipart_upload: "Testing multi-part upload ..."
+ dd if=/dev/urandom of=/tmp/big-file-s3fs.txt bs=26214400 count=300
300+0 records in
300+0 records out
7864320000 bytes (7.9 GB) copied, 51.2575 s, 153 MB/s
+ dd if=/tmp/big-file-s3fs.txt of=big-file-s3fs.txt bs=26214400 count=300
300+0 records in
300+0 records out
7864320000 bytes (7.9 GB) copied, 180.274 s, 43.6 MB/s
+ set +x
Comparing test file
/tmp/big-file-s3fs.txt big-file-s3fs.txt differ: byte 1, line 1
test_multipart_upload failed
```

@gaul commented on GitHub (Feb 9, 2021):

@CarstenGrohmann Could you test the referenced PR? I believe this will address your symptoms.


@CarstenGrohmann commented on GitHub (Feb 9, 2021):

The PR fixes the problem. The copied file is now the same as the original file.

```
$ cp -p bigfile /mybucket

$ cp -p /mybucket/bigfile bigfile.2

$ md5sum bigfile*
92b6de21419129456f82da791e225c5b  bigfile
92b6de21419129456f82da791e225c5b  bigfile.2
```

@gaul Thank you for your support.


@gaul commented on GitHub (Feb 11, 2021):

@CarstenGrohmann Thanks for working with us on this -- the test case definitely helped!

I tagged this as data corruption, although this situation was not possible in 1.88. Removing the single-part limit uncovered a second bug that incorrectly allowed the periodic dirty-flushing logic to flush while multipart was disabled.
