[GH-ISSUE #1754] Question: Does s3fs support random write when NoCacheMultipartPost kicked in? #903

Closed
opened 2026-03-04 01:49:47 +03:00 by kerem · 1 comment

Originally created by @VVoidV on GitHub (Aug 30, 2021).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/1754

In version 1.89, the start position of NoCacheMultipartPost is recorded in mp_start, and it seems this variable only ever increases (it grows by mp_size).
Consider this case:

  1. Write until NoCacheMultipartPost kicks in; assume 400 MB has been written.
  2. Seek to 500 MB (a 100 MB hole is created).
  3. Write 100 MB of data.
  4. Seek back to 400 MB and write 100 MB of data.

I find that mp_start grows in step 3, so the hole created in step 2 is uploaded, but the data written in step 4 is never uploaded.

    // check multipart uploading
    if(0 < upload_id.length()){
        mp_size += wsize;
        if(S3fsCurl::GetMultipartSize() <= mp_size){
            // over one multipart size
            if(0 != (result = NoCacheMultipartPost(fd, mp_start, mp_size))){
                S3FS_PRN_ERR("failed to multipart post(start=%lld, size=%lld) for file(%d).", static_cast<long long int>(mp_start), static_cast<long long int>(mp_size), fd);
                return result;
            }
            // [NOTE]
            // truncate file to zero and set length to part offset + size
            // after this, file length is (offset + size), but file does not use any disk space.
            //
            if(-1 == ftruncate(fd, 0) || -1 == ftruncate(fd, (mp_start + mp_size))){
                S3FS_PRN_ERR("failed to truncate file(%d).", fd);
                return -EIO;
            }
            mp_start += mp_size;
            mp_size   = 0;
        }
    }
    return wsize;

It seems that 1.90 does not support this either (I haven't tested it), because pseudo_obj->AppendUploadPart only allows an exactly continuous area to be added.

// [NOTE]
// This method adds a part for a multipart upload.
// The added new part must be an area that is exactly continuous with the
// immediately preceding part.
// An error will occur if it is discontinuous or if it overlaps with an
// existing area.
//
bool PseudoFdInfo::AppendUploadPart(off_t start, off_t size, bool is_copy, etagpair** ppetag)
{
    if(!IsUploading()){
        S3FS_PRN_ERR("Multipart Upload has not started yet.");
        return false;
    }

    AutoLock auto_lock(&upload_list_lock);
    off_t    next_start_pos = 0;
    if(!upload_list.empty()){
        next_start_pos = upload_list.back().startpos + upload_list.back().size;
    }
    if(start != next_start_pos){
        S3FS_PRN_ERR("The expected starting position for the next part is %lld, but %lld was specified.", static_cast<long long int>(next_start_pos), static_cast<long long int>(start));
        return false;
    }

    // make part number
    int partnumber = static_cast<int>(upload_list.size()) + 1;

    // add new part
    etag_entities.push_back(etagpair(NULL, partnumber));        // [NOTE] Create the etag entity and register it in the list.
    etagpair&   etag_entity = etag_entities.back();
    filepart    newpart(false, physical_fd, start, size, is_copy, &etag_entity);
    upload_list.push_back(newpart);

    // set etag pointer
    if(ppetag){
        *ppetag = &etag_entity;
    }

    return true;
}
kerem closed this issue 2026-03-04 01:49:48 +03:00

@VVoidV commented on GitHub (Sep 2, 2021):

One more thing: if I write only 10MB in step 3, then seek back to 400MB and write more than 10MB to trigger a put, the curl put request gets no data to transfer. I think we can do better here: we should fail the write request when the disk cache runs out.

  • Write until NoCacheMultipartPost kicks in; assume 400 MB has been written.
  • Seek to 500 MB (a 100 MB hole is created).
  • Write 100 MB of data.
  • Seek back to 400 MB and write 100 MB of data.