[GH-ISSUE #56] Kiro API truncates large tool call payloads mid-stream #37

Closed
opened 2026-02-27 07:17:37 +03:00 by kerem · 1 comment
Owner

Originally created by @bhaskoro-muthohar on GitHub (Jan 25, 2026).
Original GitHub issue: https://github.com/jwadow/kiro-gateway/issues/56

Summary

When Claude generates tool calls with large payloads (e.g., Write tool with 200+ lines of code), the Kiro API truncates the response stream before the tool call completes. This results in incomplete JSON that cannot be parsed, causing tool calls to fail silently with empty parameters.

This is an upstream Kiro API limitation, not a gateway bug. The gateway correctly logs and handles the truncated data, but cannot recover the missing payload.

Environment

  • Kiro Gateway version: 2.1
  • Client: Claude Code CLI
  • Models affected: claude-opus-4-5, claude-sonnet-4-5 (likely all models)

Symptoms

Symptom Description
Write: {} in client transcript Tool call recorded with empty parameters
tool_blocks=0 in gateway logs No tool calls detected despite model generating them
No contextUsagePercentage Stream cut before completion metadata sent
No "stop":true event Tool call never finalized

Evidence

Truncated vs Working Session Comparison

Metric Truncated Session Working Session
Stream size 4,775 bytes 39,318 bytes
toolUseEvent count 16 110
"stop":true events 0 2
contextUsagePercentage Not received 61.49%

Raw Stream Analysis

The truncated stream shows the tool call starting correctly but being cut off mid-payload:

1. <thinking>...</thinking> - Completes normally
2. toolUseEvent: {"name":"Write","toolUseId":"..."} - Starts correctly
3. toolUseEvent: {"input":"{\"file_pa",...} - Building file_path incrementally
4. ... (16 chunks total)
5. toolUseEvent: {"input":"or.ts\"",...} - STREAM ENDS ABRUPTLY

Missing:
- The "content" parameter (hundreds of lines of code)
- Closing braces for the JSON
- The "stop":true event to finalize the tool call
- The contextUsagePercentage event

Key Finding: No max_tokens Support

The Kiro API does not accept max_tokens or inferenceConfig parameters. The gateway payload only contains:

  • conversationState (messages, history, tools)
  • profileArn

This means there is no client-side control over output limits - truncation is entirely server-side.

Reproduction Steps

  1. Start a Claude Code session via kiro-gateway
  2. Ask Claude to write a large file (200+ lines of code)
  3. Enable DEBUG_MODE=all to capture raw stream
  4. Observe the stream truncation in debug_logs/response_stream_raw.txt
  5. Tool call fails with empty parameters

Why This Is NOT a Gateway Bug

  1. Raw stream logging proves data stops arriving - The gateway logs every byte received from Kiro API
  2. No timeout issues - STREAMING_READ_TIMEOUT=300s is generous; stream ends cleanly at packet boundary
  3. Parser handles truncation correctly - _diagnose_json_truncation() detects and logs the issue
  4. Working sessions complete normally - Same code path works for smaller payloads

Workarounds

Users can mitigate this by:

  1. Write smaller files - Break large files into multiple smaller writes (<100 lines each)
  2. Use incremental edits - Use Edit tool instead of Write for modifications
  3. Reduce context - Start fresh sessions to minimize conversation history
  4. Manual copy - Have Claude output code in chat, copy manually

Questions for Maintainer

  1. Is there a known output size limit in the Kiro API?
  2. Are there any undocumented parameters to control response size?
  3. Has this been reported to AWS/Amazon Q team?

Suggested Documentation

Consider adding a note to the README about this limitation so users are aware and can use the workarounds proactively.


Investigated with DEBUG_MODE=all. Raw stream evidence available if needed.

Originally created by @bhaskoro-muthohar on GitHub (Jan 25, 2026). Original GitHub issue: https://github.com/jwadow/kiro-gateway/issues/56 ## Summary When Claude generates tool calls with large payloads (e.g., Write tool with 200+ lines of code), the Kiro API truncates the response stream before the tool call completes. This results in incomplete JSON that cannot be parsed, causing tool calls to fail silently with empty parameters. **This is an upstream Kiro API limitation, not a gateway bug.** The gateway correctly logs and handles the truncated data, but cannot recover the missing payload. ## Environment - Kiro Gateway version: 2.1 - Client: Claude Code CLI - Models affected: claude-opus-4-5, claude-sonnet-4-5 (likely all models) ## Symptoms | Symptom | Description | |---------|-------------| | `Write: {}` in client transcript | Tool call recorded with empty parameters | | `tool_blocks=0` in gateway logs | No tool calls detected despite model generating them | | No `contextUsagePercentage` | Stream cut before completion metadata sent | | No `"stop":true` event | Tool call never finalized | ## Evidence ### Truncated vs Working Session Comparison | Metric | Truncated Session | Working Session | |--------|-------------------|-----------------| | Stream size | 4,775 bytes | 39,318 bytes | | toolUseEvent count | 16 | 110 | | `"stop":true` events | **0** | 2 | | contextUsagePercentage | **Not received** | 61.49% | ### Raw Stream Analysis The truncated stream shows the tool call starting correctly but being cut off mid-payload: ``` 1. <thinking>...</thinking> - Completes normally 2. toolUseEvent: {"name":"Write","toolUseId":"..."} - Starts correctly 3. toolUseEvent: {"input":"{\"file_pa",...} - Building file_path incrementally 4. ... (16 chunks total) 5. toolUseEvent: {"input":"or.ts\"",...} - STREAM ENDS ABRUPTLY Missing: - The "content" parameter (hundreds of lines of code) - Closing braces for the JSON - The "stop":true event to finalize the tool call - The contextUsagePercentage event ``` ### Key Finding: No `max_tokens` Support The Kiro API does not accept `max_tokens` or `inferenceConfig` parameters. The gateway payload only contains: - `conversationState` (messages, history, tools) - `profileArn` This means **there is no client-side control over output limits** - truncation is entirely server-side. ## Reproduction Steps 1. Start a Claude Code session via kiro-gateway 2. Ask Claude to write a large file (200+ lines of code) 3. Enable `DEBUG_MODE=all` to capture raw stream 4. Observe the stream truncation in `debug_logs/response_stream_raw.txt` 5. Tool call fails with empty parameters ## Why This Is NOT a Gateway Bug 1. **Raw stream logging proves data stops arriving** - The gateway logs every byte received from Kiro API 2. **No timeout issues** - `STREAMING_READ_TIMEOUT=300s` is generous; stream ends cleanly at packet boundary 3. **Parser handles truncation correctly** - `_diagnose_json_truncation()` detects and logs the issue 4. **Working sessions complete normally** - Same code path works for smaller payloads ## Workarounds Users can mitigate this by: 1. **Write smaller files** - Break large files into multiple smaller writes (<100 lines each) 2. **Use incremental edits** - Use Edit tool instead of Write for modifications 3. **Reduce context** - Start fresh sessions to minimize conversation history 4. **Manual copy** - Have Claude output code in chat, copy manually ## Questions for Maintainer 1. Is there a known output size limit in the Kiro API? 2. Are there any undocumented parameters to control response size? 3. Has this been reported to AWS/Amazon Q team? ## Suggested Documentation Consider adding a note to the README about this limitation so users are aware and can use the workarounds proactively. --- *Investigated with DEBUG_MODE=all. Raw stream evidence available if needed.*
kerem 2026-02-27 07:17:37 +03:00
Author
Owner

@jwadow commented on GitHub (Jan 30, 2026):

Hi, bro, gateway now detects when Kiro API truncates tool calls mid-stream and injects a notice in the next request so the model knows what happened and can adapt. Works for both tool calls and regular content truncation.

Implementation:

  • In-memory cache tracks truncations by tool_call_id
  • Modified tool_results get prepended with [API Limitation] notice
  • System prompt tells the model these notices are legit
  • Enabled by default, can disable via TRUNCATION_RECOVERY=false

The ~5KB limit is still there (upstream issue), but at least now the model understands why the tool call failed instead of thinking it forgot a parameter.

<!-- gh-comment-id:3822690779 --> @jwadow commented on GitHub (Jan 30, 2026): Hi, bro, gateway now detects when Kiro API truncates tool calls mid-stream and injects a notice in the next request so the model knows what happened and can adapt. Works for both tool calls and regular content truncation. Implementation: - In-memory cache tracks truncations by tool_call_id - Modified tool_results get prepended with `[API Limitation]` notice - System prompt tells the model these notices are legit - Enabled by default, can disable via `TRUNCATION_RECOVERY=false` The ~5KB limit is still there (upstream issue), but at least now the model understands why the tool call failed instead of thinking it forgot a parameter.
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
starred/kiro-gateway-jwadow#37
No description provided.