[GH-ISSUE #1956] error when trying to copy large files which were written to GCS, s3fs multi-part upload to blame? #988
Originally created by @hqm on GitHub (Jun 10, 2022).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/1956
Additional Information
The following information is very important in helping us to help you. Omitting these details may delay your support request or cause it to receive no attention at all.
Keep in mind that the commands we provide to retrieve information are oriented toward GNU/Linux distributions, so you may need different commands if you run s3fs on macOS or BSD.
Version of s3fs being used (s3fs --version)
s3fs==2021.7.0
Version of fuse being used (pkg-config --modversion fuse, rpm -qi fuse, dpkg -s fuse)
Kernel information (uname -r)
5.8.0-44-generic
GNU/Linux Distribution, if applicable (cat /etc/os-release)
NAME="Ubuntu"
VERSION="20.04.1 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.1 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
Details about the issue
My app uses s3fs in Python to write moderately large files (12-25 MB) through the filesystem-like API S3FileSystem,
with calls to s3.open(), then write(), and finally close() on the file.
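For reference, the pattern looks roughly like this (a minimal sketch; the credentials, bucket, and path are placeholders, not taken from the original report):

    from s3fs import S3FileSystem

    # Placeholder credentials/endpoint -- substitute real values.
    s3 = S3FileSystem(anon=False, key='<access-key>', secret='<secret-key>',
                      client_kwargs={'endpoint_url': '<gcs-endpoint>'})

    # open() -> write() -> close(), as described above; the file is
    # closed automatically when the `with` block exits.
    with s3.open('my-bucket/path/to/file.jsonl', 'wb') as f:
        f.write(b'...')  # 12-25 MB of JSONL data in practice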
The files are written to GCS without errors, but if I try to copy them to another bucket, or within the same bucket, using 'gsutil cp', I get this error:
BadRequestException: 400 Rewriting objects created via Multipart Upload is not implemented yet. As a workaround, you can use compose to overwrite the object (by specifying leela-yoyodyne-dev/cameras/M2/A1/2022-06-09/21/2022-06-09T21:39:35.062Z.obj.jsonl as both the source and output of compose) prior to rewrite.
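The error message itself points at a workaround: compose the object onto itself, which makes GCS rewrite it as a single non-multipart object. A minimal sketch of that workaround using the google-cloud-storage client (my assumption; the original report does not show this step, and the object name is copied from the error message):

    from google.cloud import storage

    # Uses application default credentials; compose the object onto
    # itself so GCS rewrites it as a single (non-multipart) object.
    client = storage.Client()
    bucket = client.bucket('leela-yoyodyne-dev')
    blob = bucket.blob('cameras/M2/A1/2022-06-09/21/2022-06-09T21:39:35.062Z.obj.jsonl')
    blob.compose([blob])  # source and destination are the same object

After composing, 'gsutil cp' should be able to rewrite the object normally.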
Is there some way to avoid creating multipart files?
I create a client like this:

    self.s3 = S3FileSystem(
        anon=False,
        key=access_key_id,
        secret=secret_access_key,
        client_kwargs={
            'endpoint_url': endpoint
        })
Are there extra options for creating the S3FileSystem so that it does not use multipart uploads, which would prevent this issue?
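One commonly suggested workaround (an assumption on my part, not confirmed for this s3fs version) is to raise the block size above the file size: fsspec's s3fs buffers writes per block and can fall back to a single upload request when all of the data fits in one block. A sketch:

    from s3fs import S3FileSystem

    s3 = S3FileSystem(anon=False, key='<access-key>', secret='<secret-key>',
                      client_kwargs={'endpoint_url': '<gcs-endpoint>'})

    # With block_size larger than the file (64 MiB here, for 12-25 MB
    # files), the whole payload fits in one block, so s3fs can upload
    # it in a single request instead of starting a multipart upload.
    with s3.open('my-bucket/path/to/file.jsonl', 'wb',
                 block_size=64 * 1024 * 1024) as f:
        f.write(b'...')  # the actual payload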
@ggtakec commented on GitHub (Jun 12, 2022):
@hqm
It seems that you are using the Python s3fs package (https://github.com/fsspec/s3fs, docs: https://s3fs.readthedocs.io/en/latest/index.html).
The s3fs-fuse project here is a different piece of software, a FUSE file system.
So please post your issue to that GitHub repository.
Thanks.