[GH-ISSUE #217] gsutil cp bad request #40

Closed
opened 2026-03-03 12:07:42 +03:00 by kerem · 6 comments

Originally created by @jonasfugedi on GitHub (Apr 19, 2020).
Original GitHub issue: https://github.com/fsouza/fake-gcs-server/issues/217

I need to test some bash scripts which run gsutil but I keep getting errors when trying to create objects using gsutil. The scenario I run is basically:

docker run --name fake-gcs-server -p 4443:4443 fsouza/fake-gcs-server

gsutil -o "Credentials:gs_json_host=0.0.0.0" -o "Credentials:gs_json_port=4443" -o "Boto:https_validate_certificates=False" mb "gs://test"

gsutil -o "Credentials:gs_json_host=0.0.0.0" -o "Credentials:gs_json_port=4443" -o "Boto:https_validate_certificates=False" ls "gs://test"

echo "Hello" | gsutil -o "Credentials:gs_json_host=0.0.0.0" -o "Credentials:gs_json_port=4443" -o "Boto:https_validate_certificates=False" cp - "gs://test/hello.txt"

Copying from <STDIN>...
ResumableUploadStartOverException: 404 Bad Request

gsutil -o "Credentials:gs_json_host=0.0.0.0" -o "Credentials:gs_json_port=4443" -o "Boto:https_validate_certificates=False" cp ./tmp/funny-memes-81.jpg "gs://test/"

Copying file://./tmp/funny-memes-81.jpg [Content-Type=image/jpeg]...
BadRequestException: 400 Bad Request


@fsouza commented on GitHub (Apr 20, 2020):

Hey @jonasfugedi, thanks for opening this issue. What do you see in the server logs?


@jonasfugedi commented on GitHub (Apr 21, 2020):

The logs did not give me any good clues. Is there any debug flag I can enable to get more details?

time="2020-04-19T17:01:22Z" level=info msg="172.17.0.1 - - [19/Apr/2020:17:01:22 +0000] \"POST /resumable/upload/storage/v1/b/test/o?fields=generation%2CcustomerEncryption%2Cmd5Hash%2Ccrc32c%2Cetag%2Csize&alt=json&uploadType=resumable HTTP/1.1\" 404 19"

server_log.txt (https://github.com/fsouza/fake-gcs-server/files/4508017/server_log.txt)

Also, I assume this is reproducible anywhere? I've only tried it on two machines so far.
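The request line in that log entry already says which upload path gsutil tried. As a minimal sketch (the `uploadRequest` type and `parseRequestLine` helper are names made up for this example, not part of fake-gcs-server), the line can be split and its query string decoded with the Go standard library:

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// uploadRequest holds the parts of a logged request line that matter here.
// Hypothetical type, introduced only for this example.
type uploadRequest struct {
	Method     string
	Path       string
	UploadType string
}

// parseRequestLine splits a request line of the form
// "METHOD /path?query HTTP/x.y" and decodes its query string.
func parseRequestLine(line string) (uploadRequest, error) {
	fields := strings.Fields(line)
	if len(fields) != 3 {
		return uploadRequest{}, fmt.Errorf("unexpected request line: %q", line)
	}
	u, err := url.ParseRequestURI(fields[1])
	if err != nil {
		return uploadRequest{}, err
	}
	return uploadRequest{
		Method:     fields[0],
		Path:       u.Path,
		UploadType: u.Query().Get("uploadType"),
	}, nil
}

func main() {
	// Request line copied from the log entry above.
	line := "POST /resumable/upload/storage/v1/b/test/o?fields=generation%2CcustomerEncryption%2Cmd5Hash%2Ccrc32c%2Cetag%2Csize&alt=json&uploadType=resumable HTTP/1.1"
	req, err := parseRequestLine(line)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.Path, req.UploadType)
}
```

Read this way, the 404 belongs to a POST against a path starting with /resumable/upload and carrying uploadType=resumable.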


@fsouza commented on GitHub (Apr 21, 2020):

@jonasfugedi thanks for sharing. I wanted to check where exactly the 400 was happening.

So:

  1. gsutil is calling the /resumable/upload endpoint, which is not defined in fake-gcs-server. I couldn't find docs for that endpoint, so we may need to reverse engineer it from the client code or by tapping into some requests.
  2. It happens that gsutil falls back to multipart upload and calls upload with uploadType=multipart (https://github.com/fsouza/fake-gcs-server/blob/27773ebbb8c789b7bedaaa70daea0aedba0ef32f/fakestorage/upload.go#L51-L61), which fails with a 400. As far as I can tell, that endpoint (https://github.com/fsouza/fake-gcs-server/blob/27773ebbb8c789b7bedaaa70daea0aedba0ef32f/fakestorage/upload.go#L146) fails when the Content-Type header can't be parsed as a multipart header, which may mean that whatever gsutil is sending isn't recognized by fake-gcs-server.

I believe the next step would be to tap into what gsutil is sending, create a test, and fix the issue in fake-gcs-server. Will tag this as a bug.
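To make the second point concrete, here is a minimal sketch of a handler that answers 400 when the Content-Type header cannot be parsed as a multipart media type. It is not fake-gcs-server's actual code: `multipartUpload` and `statusFor` are hypothetical names, and the real handler may check more or less than this.

```go
package main

import (
	"fmt"
	"mime"
	"net/http"
	"net/http/httptest"
	"strings"
)

// multipartUpload is a hypothetical stand-in for an upload handler. It
// rejects any request whose Content-Type cannot be parsed as a multipart
// media type carrying a boundary parameter.
func multipartUpload(w http.ResponseWriter, r *http.Request) {
	mediaType, params, err := mime.ParseMediaType(r.Header.Get("Content-Type"))
	if err != nil || !strings.HasPrefix(mediaType, "multipart/") || params["boundary"] == "" {
		http.Error(w, "invalid Content-Type header", http.StatusBadRequest)
		return
	}
	w.WriteHeader(http.StatusOK)
}

// statusFor sends a request with the given Content-Type to the handler and
// returns the HTTP status code it produced.
func statusFor(contentType string) int {
	req := httptest.NewRequest(http.MethodPost, "/upload/storage/v1/b/test/o?uploadType=multipart", nil)
	req.Header.Set("Content-Type", contentType)
	rec := httptest.NewRecorder()
	multipartUpload(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor(`multipart/related; boundary="abc"`))
	fmt.Println(statusFor("text/plain"))
}
```

A test along these lines, fed with the exact header gsutil sends, would pin down whether header parsing is where the 400 comes from.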

Thanks again for reporting and for sharing the logs!


@ex-nerd commented on GitHub (Jun 3, 2020):

I think I'm running into this same issue in my tests to create signed upload URLs. When using fake-gcs-server I just return a "direct" URL without the signing key (since I can't fake KMS stuff and this works for downloads).

Edit: I'm not entirely sure this is the same bug, so I moved this comment over to #270 as its own thing.


@StephenWithPH commented on GitHub (Aug 25, 2020):

> [...] that endpoint fails when the Content-Type header can't be parsed as a multipart header, which may mean that whatever gsutil is sending isn't recognized by fake-gcs-server.

I'm having a similar problem with gsutil cp. I did some digging. I think you are correct.

gsutil -DD cp ... enables debugging. I was able to capture the headers. In my case, they were:

Headers: {'accept': 'application/json',
 'accept-encoding': 'gzip, deflate',
 'content-length': '401',
 'content-type': 'multipart/related; '
                 "boundary='===============1523364337061494617=='",
 'user-agent': 'apitools Python/3.8.5 gsutil/4.52 (linux) analytics/disabled '
               'interactive/True command/cp google-cloud-sdk/306.0.0'}

https://stackoverflow.com/questions/43527820/mime-parsemediatype-fails-on-multipart-boundary gave me enough of a clue to mess with ' -> " (thanks, Python?), and that seems to fix it. See https://play.golang.org/p/TJ5qzwTzSOk.

I made the change on a fork and verified I was able to get past this error. See https://github.com/StephenWithPH/fake-gcs-server/commit/87e3e3e4e167793ebb862563a0f5e612ce384844.
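The quoting problem can be reproduced in isolation with Go's `mime.ParseMediaType`. The sketch below is not the code from the fork: `boundaryOf` and `requote` are names invented for this example, and the header value is the one from the captured gsutil output above.

```go
package main

import (
	"fmt"
	"mime"
	"strings"
)

// boundaryOf returns the multipart boundary from a Content-Type value.
func boundaryOf(contentType string) (string, error) {
	_, params, err := mime.ParseMediaType(contentType)
	if err != nil {
		return "", err
	}
	return params["boundary"], nil
}

// requote swaps single quotes for double quotes, mirroring the ' -> "
// workaround. It is a blunt fix: it would also rewrite a single quote that
// legitimately belongs to a parameter value.
func requote(contentType string) string {
	return strings.ReplaceAll(contentType, "'", `"`)
}

func main() {
	header := "multipart/related; boundary='===============1523364337061494617=='"

	// A single quote does not start a quoted-string, so the value is read as
	// a bare token, which stops at the first "=" and leaves unparsable text.
	if _, err := boundaryOf(header); err != nil {
		fmt.Println("as sent by gsutil:", err)
	}

	boundary, err := boundaryOf(requote(header))
	fmt.Println("after requoting:", boundary, err)
}
```

The parse fails on the header as sent and succeeds once the value is wrapped in double quotes, which is consistent with the 400 going away on the fork.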

However...

gsutil cp ... is now failing at a different point:

Traceback (most recent call last):
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gsutil", line 21, in <module>
    gsutil.RunMain()
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gsutil.py", line 123, in RunMain
    sys.exit(gslib.__main__.main())
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/__main__.py", line 429, in main
    return _RunNamedCommandAndHandleExceptions(
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/__main__.py", line 767, in _RunNamedCommandAndHandleExceptions
    _HandleUnknownFailure(e)
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/__main__.py", line 625, in _RunNamedCommandAndHandleExceptions
    return command_runner.RunNamedCommand(command_name,
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/command_runner.py", line 411, in RunNamedCommand
    return_code = command_inst.RunCommand()
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/commands/cp.py", line 1196, in RunCommand
    self.Apply(_CopyFuncWrapper,
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/command.py", line 1514, in Apply
    self._SequentialApply(func, args_iterator, exception_handler, caller_id,
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/command.py", line 1586, in _SequentialApply
    worker_thread.PerformTask(task, self)
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/command.py", line 2306, in PerformTask
    results = task.func(cls, task.args, thread_state=self.thread_gsutil_api)
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/commands/cp.py", line 778, in _CopyFuncWrapper
    cls.CopyFunc(args,
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/commands/cp.py", line 1059, in CopyFunc
    self.total_bytes_transferred += bytes_transferred
TypeError: unsupported operand type(s) for +=: 'int' and 'NoneType'

This is at https://github.com/GoogleCloudPlatform/gsutil/blob/master/gslib/commands/cp.py#L1053. I'm working backwards, but it looks like fake-gcs-server doesn't send back the size of the object in its response.
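One way to check that suspicion is to look at the JSON body of the upload response and see whether a `size` key is present at all. A minimal sketch, assuming the body is a flat JSON object; `hasField` is a hypothetical helper and both sample bodies are made up for illustration, not captured from fake-gcs-server:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// hasField reports whether a JSON object body contains the given top-level key.
func hasField(body []byte, key string) (bool, error) {
	var fields map[string]json.RawMessage
	if err := json.Unmarshal(body, &fields); err != nil {
		return false, err
	}
	_, ok := fields[key]
	return ok, nil
}

func main() {
	// Hypothetical response bodies, invented for this example.
	withSize := []byte(`{"name":"foo.txt","size":"4"}`)
	withoutSize := []byte(`{"name":"foo.txt"}`)

	for _, body := range [][]byte{withSize, withoutSize} {
		ok, err := hasField(body, "size")
		fmt.Println(ok, err)
	}
}
```

If the key is missing from the real response, a client that adds the reported size to a running total would have nothing to add, which would match the TypeError in the traceback.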

Of note, the object uploads successfully (see logs):

time="2020-08-25T01:02:11Z" level=info msg="172.18.0.3 - - [25/Aug/2020:01:02:11 +0000] \"POST /upload/storage/v1/b/<redacted>/o?alt=json&fields=crc32c%2Cgeneration%2CcustomerEncryption%2Cetag%2Csize%2Cmd5Hash&key=<redacted>&uploadType=multipart HTTP/1.1\" 200 360"

And the object is actually there in fake-gcs-server:

cat fake-gcs/<redacted>/foo.txt 
{"ContentType":"text/plain; charset=us-ascii","ContentEncoding":"","Content":"Zm9vCg==","Crc32c":"liY0ew==","Md5Hash":"07BzhNET7exJ6qYjitX/AA==","ACL":[{"Entity":"projectOwner","EntityID":"","Role":"OWNER","Domain":"","Email":"","ProjectTeam":null}],"Metadata":null,"Created":"2020-08-24T23:41:17.771381Z","Deleted":"0001-01-01T00:00:00Z","Updated":"2020-08-24T23:41:17.771384Z","Generation":0}
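The `Content` field in that file is base64, so the stored bytes can be checked directly. A small sketch, assuming only the two fields it reads (`storedObject` and `decodeStored` are names made up for this example; the real on-disk type has more fields):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// storedObject covers only the fields of the on-disk JSON that are read here.
type storedObject struct {
	ContentType string
	Content     []byte // encoding/json decodes a base64 string into []byte
}

// decodeStored parses the JSON document that was written to disk.
func decodeStored(raw []byte) (storedObject, error) {
	var obj storedObject
	err := json.Unmarshal(raw, &obj)
	return obj, err
}

func main() {
	// Trimmed copy of the file contents shown above.
	raw := []byte(`{"ContentType":"text/plain; charset=us-ascii","Content":"Zm9vCg=="}`)
	obj, err := decodeStored(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s %q\n", obj.ContentType, obj.Content)
}
```

The payload decodes to "foo" followed by a newline, four bytes in total.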

@ekimekim commented on GitHub (May 26, 2023):

An update: it seems that the second half of @StephenWithPH's comment has been fixed. Fixing the multipart boundary bug is now sufficient for gsutil cp to work, at least for the version I'm using.
I've made a PR with the ' -> " hack.
