[GH-ISSUE #1754] Question: Does s3fs support random write when NoCacheMultipartPost kicked in? #903

Closed
opened 2026-03-04 01:49:47 +03:00 by kerem · 1 comment

Originally created by @VVoidV on GitHub (Aug 30, 2021).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/1754

In version 1.89, the start position of NoCacheMultipartPost is recorded in mp_start, and it seems this variable only ever increases (it grows by mp_size).
Consider this case:

  1. Write until NoCacheMultipartPost kicks in; assume 400 MB has been written.
  2. Seek to 500 MB (a 100 MB hole is created).
  3. Write 100 MB of data.
  4. Seek back to 400 MB and write 100 MB of data.

I find that mp_start grows in step 3, so the hole created in step 2 is uploaded, but the data written in step 4 is never uploaded.

    // check multipart uploading
    if(0 < upload_id.length()){
        mp_size += wsize;
        if(S3fsCurl::GetMultipartSize() <= mp_size){
            // over one multipart size
            if(0 != (result = NoCacheMultipartPost(fd, mp_start, mp_size))){
                S3FS_PRN_ERR("failed to multipart post(start=%lld, size=%lld) for file(%d).", static_cast<long long int>(mp_start), static_cast<long long int>(mp_size), fd);
                return result;
            }
            // [NOTE]
            // truncate file to zero and set length to part offset + size
            // after this, file length is (offset + size), but file does not use any disk space.
            //
            if(-1 == ftruncate(fd, 0) || -1 == ftruncate(fd, (mp_start + mp_size))){
                S3FS_PRN_ERR("failed to truncate file(%d).", fd);
                return -EIO;
            }
            mp_start += mp_size;
            mp_size   = 0;
        }
    }
    return wsize;

It seems that 1.90 does not support this either (I haven't tested it), because pseudo_obj->AppendUploadPart only allows an exactly continuous area to be added.

// [NOTE]
// This method adds a part for a multipart upload.
// The added new part must be an area that is exactly continuous with the
// immediately preceding part.
// An error will occur if it is discontinuous or if it overlaps with an
// existing area.
//
bool PseudoFdInfo::AppendUploadPart(off_t start, off_t size, bool is_copy, etagpair** ppetag)
{
    if(!IsUploading()){
        S3FS_PRN_ERR("Multipart Upload has not started yet.");
        return false;
    }

    AutoLock auto_lock(&upload_list_lock);
    off_t    next_start_pos = 0;
    if(!upload_list.empty()){
        next_start_pos = upload_list.back().startpos + upload_list.back().size;
    }
    if(start != next_start_pos){
        S3FS_PRN_ERR("The expected starting position for the next part is %lld, but %lld was specified.", static_cast<long long int>(next_start_pos), static_cast<long long int>(start));
        return false;
    }

    // make part number
    int partnumber = static_cast<int>(upload_list.size()) + 1;

    // add new part
    etag_entities.push_back(etagpair(NULL, partnumber));        // [NOTE] Create the etag entity and register it in the list.
    etagpair&   etag_entity = etag_entities.back();
    filepart    newpart(false, physical_fd, start, size, is_copy, &etag_entity);
    upload_list.push_back(newpart);

    // set etag pointer
    if(ppetag){
        *ppetag = &etag_entity;
    }

    return true;
}
kerem closed this issue 2026-03-04 01:49:48 +03:00

@VVoidV commented on GitHub (Sep 2, 2021):

One more thing: if I write only 10MB in step 3, then seek back to 400MB and write more than 10MB to trigger a put, the curl put request gets no data to transfer. I think we can do better here: we should fail the write request when the disk cache runs out.

  • Write until NoCacheMultipartPost kicks in; assume 400 MB has been written.
  • Seek to 500 MB (a 100 MB hole is created).
  • Write 100 MB of data.
  • Seek back to 400 MB and write 100 MB of data.