[GH-ISSUE #70] BUG: 422 validation error when tool_result contains image content blocks (browser screenshots) #46

Open

opened 2026-02-27 07:17:41 +03:00 by kerem · 0 comments

Originally created by @bhaskoro-muthohar on GitHub (Feb 7, 2026).
Original GitHub issue: https://github.com/jwadow/kiro-gateway/issues/70

Kiro Gateway Version

v2.2 (latest, commit 9c7933c)

What happened?

When using the Anthropic Messages API route (/v1/messages) with a client that sends tool results containing image content blocks (e.g., browser screenshots from computer use / OpenClaw browser channel), the gateway returns a 422 Pydantic validation error:

422 {"detail":[{"type":"string_type","loc":["body","messages",663,"content","str"],
"msg":"Input should be a valid string",
"input":[{"type":"tool_result","tool_use_id":"tooluse_W2iRH1mYYRx5Y3Dzco3NcJ",
"content":[{"type":"text","text":"MEDIA:..."},
{"type":"image","source":{"type":"base64","media_type":"image/jpeg","data":"/9j/..."}}]}]}

Per the Anthropic API spec (https://docs.anthropic.com/en/api/messages), tool_result content can include both text and image blocks. This is common when tools return screenshots (e.g., computer use, browser automation).

Root cause (two issues):

1. `models_anthropic.py` - Pydantic model rejects image blocks

ToolResultContentBlock.content is typed as Optional[Union[str, List[TextContentBlock]]], which rejects image content blocks at validation time.

# Current (line 85)
content: Optional[Union[str, List["TextContentBlock"]]] = None

# Fix (Any must be imported from typing if it is not already)
content: Optional[Union[str, List[Any]]] = None
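
As an alternative to List[Any], the field could keep validation by listing the accepted block types explicitly. The following is only a sketch, assuming Pydantic v2 (the `string_type` error above suggests v2). The field layouts of TextContentBlock and ImageContentBlock, and the names Base64ImageSource, CurrentToolResult, and TypedToolResult, are hypothetical stand-ins and not the gateway's actual definitions.

```python
from typing import List, Literal, Optional, Union

from pydantic import BaseModel, ValidationError


class TextContentBlock(BaseModel):
    # Hypothetical layout; the real model in models_anthropic.py may differ.
    type: Literal["text"] = "text"
    text: str


class Base64ImageSource(BaseModel):
    # Hypothetical helper model for the "source" object of an image block.
    type: Literal["base64"] = "base64"
    media_type: str
    data: str


class ImageContentBlock(BaseModel):
    # Hypothetical layout, modelled on the request body in this issue.
    type: Literal["image"] = "image"
    source: Base64ImageSource


class CurrentToolResult(BaseModel):
    # Mirrors the typing reported in this issue: text blocks only.
    type: Literal["tool_result"] = "tool_result"
    tool_use_id: str
    content: Optional[Union[str, List[TextContentBlock]]] = None


class TypedToolResult(BaseModel):
    # Alternative typing: text and image blocks are both accepted and validated.
    type: Literal["tool_result"] = "tool_result"
    tool_use_id: str
    content: Optional[
        Union[str, List[Union[TextContentBlock, ImageContentBlock]]]
    ] = None


payload = {
    "type": "tool_result",
    "tool_use_id": "tooluse_W2iRH1mYYRx5Y3Dzco3NcJ",
    "content": [
        {"type": "text", "text": "MEDIA:..."},
        {
            "type": "image",
            "source": {"type": "base64", "media_type": "image/jpeg", "data": "/9j/..."},
        },
    ],
}

# The text-only typing fails on the image block.
try:
    CurrentToolResult.model_validate(payload)
    current_accepts = True
except ValidationError:
    current_accepts = False

# The widened typing parses both blocks into their models.
typed = TypedToolResult.model_validate(payload)
```

The trade-off: List[Any] is the smallest change and passes nested blocks through as plain dicts, while the explicit union keeps validation but has to be extended whenever a new block type needs to be accepted.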

2. `converters_anthropic.py` - Images inside tool_results are silently dropped

convert_anthropic_messages() calls extract_images_from_content() only on top-level message content blocks. It never looks inside tool_result.content, so images embedded in tool results (like browser screenshots) are silently dropped and never sent to Kiro.

# Current (lines 254-257) - only extracts top-level images
images = extract_images_from_content(content)

# Fix - also extract images from inside tool_result content blocks
images = extract_images_from_content(content)
if isinstance(content, list):
    for block in content:
        tr_content = None
        if isinstance(block, dict) and block.get("type") == "tool_result":
            tr_content = block.get("content", [])
        elif getattr(block, "type", None) == "tool_result":  # Pydantic model instance
            tr_content = getattr(block, "content", [])
        if isinstance(tr_content, list):
            images.extend(extract_images_from_content(tr_content))
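
The nested extraction above can be tried in isolation with a self-contained sketch. Here extract_images_stub is a hypothetical stand-in for extract_images_from_content() (the real function may return a different structure), and collect_images is an illustrative name that does not exist in the gateway.

```python
from typing import Any, Dict, List


def extract_images_stub(content: Any) -> List[Dict[str, Any]]:
    """Hypothetical stand-in: return the dict blocks whose type is "image"."""
    if not isinstance(content, list):
        return []
    return [b for b in content if isinstance(b, dict) and b.get("type") == "image"]


def collect_images(content: Any) -> List[Dict[str, Any]]:
    """Collect top-level images plus images nested inside tool_result blocks."""
    images = extract_images_stub(content)
    if not isinstance(content, list):
        return images
    for block in content:
        # Blocks may arrive as plain dicts or as model objects with attributes.
        if isinstance(block, dict):
            block_type, inner = block.get("type"), block.get("content")
        else:
            block_type = getattr(block, "type", None)
            inner = getattr(block, "content", None)
        # A tool_result whose content is a plain string carries no images.
        if block_type == "tool_result" and isinstance(inner, list):
            images.extend(extract_images_stub(inner))
    return images


screenshot = {
    "type": "image",
    "source": {"type": "base64", "media_type": "image/jpeg", "data": "/9j/..."},
}
message_content = [
    {
        "type": "tool_result",
        "tool_use_id": "tooluse_W2iRH1mYYRx5Y3Dzco3NcJ",
        "content": [{"type": "text", "text": "MEDIA:..."}, screenshot],
    }
]

nested_images = collect_images(message_content)
```

With the top-level-only behavior described in this issue, the same message would yield no images, because the only image block sits inside tool_result.content.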

Debug Logs

Request body (truncated) showing the failing message structure:

{
  "role": "user",
  "content": [
    {
      "type": "tool_result",
      "tool_use_id": "tooluse_W2iRH1mYYRx5Y3Dzco3NcJ",
      "content": [
        {"type": "text", "text": "MEDIA:/root/.openclaw/media/browser/fb44d88d.jpg"},
        {"type": "image", "source": {"type": "base64", "media_type": "image/jpeg", "data": "/9j/..."}}
      ]
    }
  ]
}

Pydantic rejects this at validation time because List[TextContentBlock] doesn't match a list containing an ImageContentBlock.

Related to #50 (similar area of code, but that was about Pydantic text extraction, not images in tool results).
