[GH-ISSUE #1950] Timeout on download_as_bytes #232

Closed
opened 2026-03-03 12:09:19 +03:00 by kerem · 1 comment

Originally created by @meganvw on GitHub (Apr 7, 2025).
Original GitHub issue: https://github.com/fsouza/fake-gcs-server/issues/1950

I'm trying to download an object as bytes via Python, but I consistently hit `Timeout of 120.0s exceeded`:

```
google.api_core.exceptions.RetryError: Timeout of 120.0s exceeded, last exception: HTTPConnectionPool(host='0.0.0.0', port=4443): Max retries exceeded with url: /download/storage/v1/b/keeper-entries/o/ROOT_USER%2Fworkspace%2Fdd86987f-98e9-5267-87ab-07fee7135b13?alt=media (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7c412e031f20>: Failed to establish a new connection: [Errno 111] Connection refused'))
```

When I `curl` the same object from the command line, it returns instantly:

```
curl http://0.0.0.0:4443/download/storage/v1/b/keeper-entries/o/ROOT_USER%2Fworkspace%2Fdd86987f-98e9-5267-87ab-07fee7135b13\?alt\=media
{
    "type": "workspace",
    "name": "root",
    "is_root": true,
    "entry_refs": {}
}
```

Other calls from Python to the fake-gcs-server running in Docker complete successfully. Here is the code:

```
def get_entry(
    env: StorageEnv,
    gcs_client: storage.Client,
    user_id: str,
    entry_uuid: str,
    entry_type: EntryType,
) -> Entry | None:
    bucket = _get_bucket(env=env, gcs_client=gcs_client)
    blob_path = _entry_path(
        user_id=user_id,
        entry_uuid=entry_uuid,
        entry_type=entry_type,
    )
    blob = bucket.get_blob(blob_path)

    if not blob:
        return None

    try:
        # HERE IS WHERE IT HANGS
        byte_data = blob.download_as_bytes()
        str_data = byte_data.decode()
        json_data = json.loads(str_data)
        entry_data = _json_to_entry_data(json_data)
        return Entry(
            uuid=entry_uuid,
            user_id=user_id,
            created_at=datetime.now(),
            updated_at=blob.updated,
            data=entry_data,
        )
    except Exception as e:
        logger.error(f"Failed to read entry from GCS: {blob_path}: {e}")
        raise
```

I do get a warning, which I suspect may be the root cause, but I'm not sure how to work around it.

```
web_be-1            | /usr/local/lib/python3.13/site-packages/google_crc32c/__init__.py:29: RuntimeWarning: As the c extension couldn't be imported, `google-crc32c` is using a pure python implementation that is significantly slower. If possible, please configure a c build environment and compile the extension
web_be-1            |   warnings.warn(_SLOW_CRC32C_WARNING, RuntimeWarning)
```

Is this a known issue? Thank you!

kerem closed this issue 2026-03-03 12:09:19 +03:00

@meganvw commented on GitHub (Apr 8, 2025):

Figured it out.

I had overridden the GCS client endpoint by setting `STORAGE_EMULATOR_HOST`, but the call `bucket.get_blob(blob_path)` creates a blob with the default hostname `0.0.0.0` instead of the custom host, which you can verify with `blob._get_download_url(client)`.

Switching to `Blob.from_uri(blob_uri, client)` correctly sets the URI with the custom hostname, and the download completes.

Another small gotcha I hit: if you use `Blob.from_uri()` but call `blob.reload()` first (e.g. to populate server-side metadata on the blob object), it falls back to the original hostname and causes this same error when you call `blob.download_as_bytes()`.
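Putting that fix together, here is a minimal sketch. The `localhost:4443` endpoint, the bucket name, and the helper/function names are assumptions for illustration (drawn from this issue), not part of the library:

```python
import os

def entry_blob_uri(bucket_name: str, blob_path: str) -> str:
    """Build the gs:// URI that Blob.from_uri expects (hypothetical helper)."""
    return f"gs://{bucket_name}/{blob_path}"

def download_entry_bytes(bucket_name: str, blob_path: str) -> bytes:
    # STORAGE_EMULATOR_HOST must point at the emulator BEFORE the client
    # is created, or requests go to the real GCS endpoint (assumption:
    # fake-gcs-server is listening on localhost:4443).
    os.environ.setdefault("STORAGE_EMULATOR_HOST", "http://localhost:4443")
    from google.cloud import storage  # deferred so the module imports without the dependency

    client = storage.Client()
    # Blob.from_uri inherits the client's emulator endpoint, whereas
    # bucket.get_blob() returns a blob whose download URL uses the
    # default hostname and hangs until the retry timeout.
    blob = storage.Blob.from_uri(entry_blob_uri(bucket_name, blob_path), client=client)
    # Do not call blob.reload() before downloading -- per the comment
    # above, it resets the download URL back to the default host.
    return blob.download_as_bytes()
```

Usage would look like `download_entry_bytes("keeper-entries", "ROOT_USER/workspace/<uuid>")`, mirroring the path from the error message above.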
