[GH-ISSUE #623] 404 when doing a resumable upload POST #110

Open
opened 2026-03-03 12:08:23 +03:00 by kerem · 9 comments

Originally created by @BigJerBD on GitHub (Nov 15, 2021).
Original GitHub issue: https://github.com/fsouza/fake-gcs-server/issues/623

Image version: latest (v1.30.2)

I'm using Apache Beam to do resumable uploads into a fake GCS bucket (for testing purposes), but I get this error:

```
"GET /storage/v1/b/data?alt=json HTTP/1.1\" 200 112"
"POST /resumable/upload/storage/v1/b/data/o?alt=json&name=aac%2Ftest1%2Fbeam-temp-data-820e4a4e464311ecac030242ac150002%2F18266b8a-3b30-4bf3-bda5-af203113e46d.data.csv&uploadType=resumable HTTP/1.1\" 404 59"
```
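For context: the documented GCS JSON API starts a resumable upload with a `POST` to `/upload/storage/v1/b/BUCKET/o?uploadType=resumable&name=OBJECT`. The server replies with a session URI in the `Location` header, and the object bytes are then sent to that URI. The request in the log above targets `/resumable/upload/storage/v1/...` instead. One plausible explanation for the 404, which is not confirmed in this thread, is that the emulator only routes the `/upload/...` form. The sketch below uses only the standard library to build, without sending, both variants so the difference is easy to see. `build_initiation_request` is a hypothetical helper, and `http://localhost:4443` is an assumed emulator address.

```python
import json
import urllib.parse
import urllib.request


def build_initiation_request(base_url, bucket, name, resumable_prefix=False):
    """Build (but do not send) the POST that starts a resumable upload.

    Hypothetical helper. resumable_prefix=True reproduces the
    "/resumable/upload/..." path seen in the log; False uses the
    documented "/upload/..." path.
    """
    prefix = "/resumable/upload" if resumable_prefix else "/upload"
    path = f"{prefix}/storage/v1/b/{urllib.parse.quote(bucket, safe='')}/o"
    # urlencode percent-encodes "/" in the object name as %2F, as in the log
    query = urllib.parse.urlencode({"uploadType": "resumable", "name": name})
    body = json.dumps({"name": name}).encode("utf-8")  # optional object metadata
    return urllib.request.Request(
        f"{base_url}{path}?{query}",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json; charset=UTF-8"},
    )


documented = build_initiation_request("http://localhost:4443", "data", "aac/test1/x.csv")
logged = build_initiation_request(
    "http://localhost:4443", "data", "aac/test1/x.csv", resumable_prefix=True
)
print(documented.full_url)
# http://localhost:4443/upload/storage/v1/b/data/o?uploadType=resumable&name=aac%2Ftest1%2Fx.csv
print(logged.full_url)
# http://localhost:4443/resumable/upload/storage/v1/b/data/o?uploadType=resumable&name=aac%2Ftest1%2Fx.csv
```

If the emulator accepts the first URL and returns 404 for the second, that would support the routing hypothesis.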

I also confirmed that the path `test1` was present:

```
"GET /storage/v1/b/data/o?maxResults=1&projection=noAcl&prefix=aac%2Ftest1%2F2021103019551635623713%2F&delimiter=%2F&prettyPrint=false HTTP/1.1\" 200 533"
```
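GCS object names are flat, and "folders" are just name prefixes. A 200 from this listing shows that the bucket is reachable and that listing works, but it says nothing about whether the upload endpoint is routed. Decoding the query string also shows exactly which prefix was checked. Here is a small standard-library sketch that parses the request line copied from the log:

```python
from urllib.parse import parse_qs, urlsplit

# Request target copied from the log line above
request_path = (
    "/storage/v1/b/data/o?maxResults=1&projection=noAcl"
    "&prefix=aac%2Ftest1%2F2021103019551635623713%2F&delimiter=%2F&prettyPrint=false"
)

parts = urlsplit(request_path)
# parse_qs percent-decodes values and returns lists; keep the first value of each
params = {key: values[0] for key, values in parse_qs(parts.query).items()}

print(parts.path)           # /storage/v1/b/data/o  (object listing on bucket "data")
print(params["prefix"])     # aac/test1/2021103019551635623713/
print(params["delimiter"])  # /
```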

It works with the real GCS service, so I was wondering whether the POST being sent has a version-compatibility problem or whether this isn't supported yet.

Thanks!

@fsouza commented on GitHub (Apr 22, 2022):

@BigJerBD hey, would you be able to share a snippet on how to reproduce the issue? I can definitely look into this some time this weekend or early next week.

@wwwjn commented on GitHub (Apr 22, 2022):

Hi @BigJerBD, I'm also trying to use Apache Beam `FileSystems` to upload and download files, but I keep getting the error `HttpError accessing <https://www.googleapis.com/resumable/upload/storage/v1/b/`. It seems that Apache Beam keeps accessing www.googleapis.com no matter how I set the environment variable. Could you please share a snippet showing how you did this? Thanks a lot!
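A likely reason the environment variable has no effect: `STORAGE_EMULATOR_HOST` is honored by the `google-cloud-storage` Python client, but this thread (and the Beam issue linked at the end) suggests that Beam's own apitools-based `StorageV1` client, in the versions discussed here, keeps its built-in `www.googleapis.com` base URL. The sketch below shows a hypothetical helper, `storage_base_url`, that derives a base URL from that variable. Beam does not call it; you would pass its result explicitly, for example as `url=` to `StorageV1` the way the monkey-patch further down does.

```python
import os

# Default JSON API base URL (the host seen in the error above)
DEFAULT_BASE_URL = "https://www.googleapis.com/storage/v1/"


def storage_base_url(env=None):
    """Return the JSON API base URL, preferring STORAGE_EMULATOR_HOST if set.

    Hypothetical helper: nothing in Beam reads this; pass the result
    explicitly to whatever client you construct.
    """
    env = os.environ if env is None else env
    host = env.get("STORAGE_EMULATOR_HOST", "").strip()
    if not host:
        return DEFAULT_BASE_URL
    if "://" not in host:
        host = "http://" + host  # emulator hosts are often given without a scheme
    return host.rstrip("/") + "/storage/v1/"


print(storage_base_url({}))                                          # https://www.googleapis.com/storage/v1/
print(storage_base_url({"STORAGE_EMULATOR_HOST": "localhost:4443"}))  # http://localhost:4443/storage/v1/
```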

@BigJerBD commented on GitHub (Apr 22, 2022):

I'll try this weekend to share a snippet reproducing the error I had.

It's been a while, so I've probably lost it and will have to reproduce it again 😅

@wwwjn commented on GitHub (Apr 23, 2022):

Hi, I monkey-patched Apache Beam to replace `www.googleapis.com` with fake-gcs-server, and then I got the same error as @BigJerBD (a 404!).
My script is `test.py` (Apache Beam version: `apache-beam==2.36.0`):

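```python
import gzip

from apache_beam.io.filesystem import CompressionTypes
from apache_beam.io.filesystems import FileSystems


def test_GCS():
    URL = "gs://sample-bucket/test.gz"

    # Write pre-gzipped bytes to the test bucket; UNCOMPRESSED keeps Beam
    # from compressing them a second time
    with FileSystems.create(URL, compression_type=CompressionTypes.UNCOMPRESSED) as f:
        f.write(gzip.compress(b"hello world"))


if __name__ == "__main__":
    # Importing gcsio applies the GcsIO monkey-patch (relative import: run this
    # file as part of a package with `python -m`)
    from .gcsio import *
    test_GCS()
```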

And here is `gcsio.py`, which monkey-patches Apache Beam:

```python
# Monkey-patch GcsIO.__init__ so Beam's GCS client talks to fake-gcs-server
import apache_beam.io.gcp.gcsio
from apache_beam.internal.gcp import auth
from apache_beam.internal.http_client import get_new_http
from apache_beam.io.gcp.internal.clients import storage


def new_init(self, storage_client=None):
    if storage_client is None:
        storage_client = storage.StorageV1(
            # Point the client at fake-gcs-server instead of www.googleapis.com
            url="http://0.0.0.0:4443/storage/v1/",
            credentials=auth.get_service_credentials(),
            get_credentials=False,
            http=get_new_http(),
            response_encoding='utf8',
        )
    self.client = storage_client
    self._rewrite_cb = None
    self.bucket_to_project_number = {}


# Replace the constructor so every GcsIO instance uses the patched client
apache_beam.io.gcp.gcsio.GcsIO.__init__ = new_init
```

And I got the following error with the resumable URL:

[screenshot of the error]

And this is the corresponding output from the fake-gcs-server Docker container:

```
time="2022-04-23T00:36:40Z" level=info msg="172.17.0.1 - - [23/Apr/2022:00:36:40 +0000] \"GET /storage/v1/b/sample-bucket?alt=json HTTP/1.1\" 200 153"
time="2022-04-23T00:36:40Z" level=info msg="172.17.0.1 - - [23/Apr/2022:00:36:40 +0000] \"POST /resumable/upload/storage/v1/b/sample-bucket/o?alt=json&name=test.gz&uploadType=resumable HTTP/1.1\" 404 59"
```
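When comparing runs, it can help to pull the method, path, and status out of these logs so the failing endpoint stands out. The following is a rough sketch. It assumes the exact line format shown above (a `msg="..."` field wrapping an access-log entry with escaped quotes); other server versions may log differently, and `failed_requests` is a hypothetical name.

```python
import re

# Matches the escaped request line and status inside the msg="..." field,
# e.g.  \"POST /resumable/upload/... HTTP/1.1\" 404 59
ACCESS_RE = re.compile(
    r'\\"(?P<method>[A-Z]+) (?P<target>\S+) HTTP/[\d.]+\\" (?P<status>\d{3})'
)


def failed_requests(lines):
    """Yield (method, path, status) for every 4xx/5xx entry."""
    for line in lines:
        match = ACCESS_RE.search(line)
        if match and int(match["status"]) >= 400:
            path = match["target"].split("?", 1)[0]  # drop the query string
            yield match["method"], path, int(match["status"])


log = [
    r'time="2022-04-23T00:36:40Z" level=info msg="172.17.0.1 - - [23/Apr/2022:00:36:40 +0000] \"GET /storage/v1/b/sample-bucket?alt=json HTTP/1.1\" 200 153"',
    r'time="2022-04-23T00:36:40Z" level=info msg="172.17.0.1 - - [23/Apr/2022:00:36:40 +0000] \"POST /resumable/upload/storage/v1/b/sample-bucket/o?alt=json&name=test.gz&uploadType=resumable HTTP/1.1\" 404 59"',
]
print(list(failed_requests(log)))
# [('POST', '/resumable/upload/storage/v1/b/sample-bucket/o', 404)]
```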

Thanks a lot for your help, and I hope this helps!

@BigJerBD commented on GitHub (Apr 24, 2022):

@wwwjn, thank you very much for the snippet! This is indeed similar to what I did when I was trying to use fake-gcs-server.

Apache Beam or not, since this was also returning a 404, I was also wondering whether this feature is implemented in fake-gcs-server at all.

Thanks! :)

@wwwjn commented on GitHub (May 10, 2022):

Hi @fsouza, is there any progress on this bug? Thanks a lot for your help!

@fsouza commented on GitHub (May 11, 2022):

> Hi @fsouza, is there any progress on this bug? Thanks a lot for your help!

Hey, I haven't had a chance to look at it yet, but I assume the fix should be simple. I'll check it out in the coming weeks.
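For illustration only: if the 404 is caused by the emulator not routing the `/resumable/upload/...` prefix (an assumption, not confirmed in this thread), the change could be as small as treating that prefix as an alias for `/upload/...`. fake-gcs-server itself is written in Go. This Python sketch shows only the path mapping, for example for use in a test-side proxy, and `normalize_upload_path` is a hypothetical name.

```python
RESUMABLE_PREFIX = "/resumable/upload/"
UPLOAD_PREFIX = "/upload/"


def normalize_upload_path(path):
    """Map the "/resumable/upload/..." form onto "/upload/...".

    Hypothetical: treats the two prefixes as aliases. The query string is
    kept untouched, and any other path passes through unchanged.
    """
    if path.startswith(RESUMABLE_PREFIX):
        return UPLOAD_PREFIX + path[len(RESUMABLE_PREFIX):]
    return path


print(normalize_upload_path(
    "/resumable/upload/storage/v1/b/sample-bucket/o?alt=json&name=test.gz&uploadType=resumable"
))
# /upload/storage/v1/b/sample-bucket/o?alt=json&name=test.gz&uploadType=resumable
```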

@wwwjn commented on GitHub (May 11, 2022):

> > Hi @fsouza, is there any progress on this bug? Thanks a lot for your help!
>
> Hey, I haven't had a chance to look at it yet, but I assume the fix should be simple. I'll check it out in the coming weeks.

Thanks a lot! If there is anything I can do, just let me know!

@martinbjeldbak commented on GitHub (Apr 15, 2024):

For anyone who, like me, lands here from Google and simply wants to override the URL Apache Beam uses so that it points to fake-gcs-server, there's an issue tracking this here: https://github.com/apache/beam/issues/21255

For now, the solution is still to patch the URL in the test. This worked for me:

```python
from unittest import mock

import apache_beam.io.gcp.internal.clients.storage


@mock.patch.object(apache_beam.io.gcp.internal.clients.storage.StorageV1, "BASE_URL",
                   "http://localhost:4443/storage/v1/")
def test_gcs_source():
    pass  # test implementation here should now call the emulator
```

where `http://localhost:4443` is the URL of your fake-gcs-server instance.
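`mock.patch.object` swaps the class attribute only while the decorated test runs and restores it afterwards. Presumably this works because the generated client reads `BASE_URL` when it is constructed without an explicit `url`. The self-contained sketch below shows the same mechanism with a hypothetical stand-in class (`FakeStorageClient`), so it runs without Beam installed.

```python
from unittest import mock


class FakeStorageClient:
    """Hypothetical stand-in for a generated client with a class-level BASE_URL."""

    BASE_URL = "https://www.googleapis.com/storage/v1/"

    def __init__(self, url=""):
        # Fall back to the class attribute when no explicit URL is given
        self.url = url or self.BASE_URL


@mock.patch.object(FakeStorageClient, "BASE_URL", "http://localhost:4443/storage/v1/")
def make_client_under_patch():
    return FakeStorageClient()


patched = make_client_under_patch()
unpatched = FakeStorageClient()  # created after the patch has been undone
print(patched.url)    # http://localhost:4443/storage/v1/
print(unpatched.url)  # https://www.googleapis.com/storage/v1/
```

In this model, only clients constructed while the patch is active pick up the emulator URL. A client built earlier (for example, one cached at import time) keeps the original one.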
