[GH-ISSUE #1956] error when trying to copy large files which were written to GCS, s3fs multi-part upload to blame? #988

Closed
opened 2026-03-04 01:50:28 +03:00 by kerem · 1 comment

Originally created by @hqm on GitHub (Jun 10, 2022).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/1956

Additional Information

The following information is very important in helping us to help you. Omitting these details may delay your support request or cause it to receive no attention at all.
Keep in mind that the commands we provide to retrieve information are oriented to GNU/Linux distributions, so you may need to use alternatives if you run s3fs on macOS or BSD.

Version of s3fs being used (s3fs --version)

s3fs==2021.7.0

Version of fuse being used (pkg-config --modversion fuse, rpm -qi fuse, dpkg -s fuse)

Kernel information (uname -r)

5.8.0-44-generic

GNU/Linux Distribution, if applicable (cat /etc/os-release)

NAME="Ubuntu"
VERSION="20.04.1 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.1 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

Details about issue

My app uses s3fs in Python to write moderately large files (12-25 MB) via the filesystem-like S3FileSystem API: I call s3.open(), then write(), and then close() the file.

The files are written to GCS, but when I try to copy them to another bucket, or within the same bucket, using 'gsutil cp', I get this error:

BadRequestException: 400 Rewriting objects created via Multipart Upload is not implemented yet. As a workaround, you can use compose to overwrite the object (by specifying leela-yoyodyne-dev/cameras/M2/A1/2022-06-09/21/2022-06-09T21:39:35.062Z.obj.jsonl as both the source and output of compose) prior to rewrite.

Is there some way to avoid creating multipart files?
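For what it's worth, the 400 error above names a server-side workaround: compose the object onto itself so GCS rewrites it as a non-multipart object before copying. A sketch of that suggestion (BUCKET/OBJECT are placeholders, not paths from this report):

```shell
# Workaround quoted in the error message: specify the object as both the
# source and the output of compose, so GCS rewrites it in place, then copy.
gsutil compose gs://BUCKET/OBJECT gs://BUCKET/OBJECT
gsutil cp gs://BUCKET/OBJECT gs://OTHER_BUCKET/OBJECT
```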

I create a client like this:

    self.s3 = S3FileSystem(
        anon=False,
        key=access_key_id,
        secret=secret_access_key,
        client_kwargs={
            'endpoint_url': endpoint
        })
Are there extra options for creating the S3FileSystem so that it does not use multipart uploads, to prevent this issue?
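One possible avenue, assuming the fsspec s3fs library only starts a multipart upload once the write buffer exceeds the file's block_size: pass a block_size larger than the largest file being written, so close() can flush the whole object in a single PUT. This is a sketch under that assumption, not a confirmed fix; the credential and endpoint parameters mirror the snippet above.

```python
def write_without_multipart(path, data, access_key_id, secret_access_key, endpoint):
    """Sketch: write via fsspec s3fs with a block_size larger than the payload,
    so the buffered file is flushed as one single-part PUT on close().
    Assumption: s3fs initiates a multipart upload only when the buffer
    exceeds block_size."""
    from s3fs import S3FileSystem  # the fsspec project, not s3fs-fuse

    s3 = S3FileSystem(
        anon=False,
        key=access_key_id,
        secret=secret_access_key,
        client_kwargs={"endpoint_url": endpoint},
    )
    # 32 MiB comfortably covers the 12-25 MB files described in this report.
    with s3.open(path, "wb", block_size=32 * 2**20) as f:
        f.write(data)
```

The trade-off is memory: the whole file is buffered before upload, which is acceptable at these sizes but would not scale to multi-GB objects.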

kerem closed this issue 2026-03-04 01:50:28 +03:00

@ggtakec commented on GitHub (Jun 12, 2022):

@hqm
It seems that you are using the Python version of s3fs (https://github.com/fsspec/s3fs, documented at https://s3fs.readthedocs.io/en/latest/index.html).
s3fs-fuse here is a different FUSE file system, so please post your issue to that GitHub repository.
Thanks.
