starred/kiro-gateway-jwadow

mirror of https://github.com/jwadow/kiro-gateway.git synced 2026-04-25 01:15:57 +03:00

[GH-ISSUE #56] Kiro API truncates large tool call payloads mid-stream #37

Closed

opened 2026-02-27 07:17:37 +03:00 by kerem · 1 comment

kerem commented

2026-02-27 07:17:37 +03:00

Owner

Originally created by @bhaskoro-muthohar on GitHub (Jan 25, 2026).
Original GitHub issue: https://github.com/jwadow/kiro-gateway/issues/56

Summary

When Claude generates tool calls with large payloads (e.g., Write tool with 200+ lines of code), the Kiro API truncates the response stream before the tool call completes. This results in incomplete JSON that cannot be parsed, causing tool calls to fail silently with empty parameters.

This is an upstream Kiro API limitation, not a gateway bug. The gateway correctly logs and handles the truncated data, but cannot recover the missing payload.

Environment

Kiro Gateway version: 2.1
Client: Claude Code CLI
Models affected: claude-opus-4-5, claude-sonnet-4-5 (likely all models)

Symptoms

| Symptom | Description |
|---------|-------------|
| `Write: {}` in client transcript | Tool call recorded with empty parameters |
| `tool_blocks=0` in gateway logs | No tool calls detected despite model generating them |
| No `contextUsagePercentage` | Stream cut before completion metadata sent |
| No `"stop":true` event | Tool call never finalized |

Evidence

Truncated vs Working Session Comparison

| Metric | Truncated Session | Working Session |
|--------|-------------------|-----------------|
| Stream size | 4,775 bytes | 39,318 bytes |
| toolUseEvent count | 16 | 110 |
| `"stop":true` events | **0** | 2 |
| contextUsagePercentage | **Not received** | 61.49% |

Raw Stream Analysis

The truncated stream shows the tool call starting correctly but being cut off mid-payload:

1. <thinking>...</thinking> - Completes normally
2. toolUseEvent: {"name":"Write","toolUseId":"..."} - Starts correctly
3. toolUseEvent: {"input":"{\"file_pa",...} - Building file_path incrementally
4. ... (16 chunks total)
5. toolUseEvent: {"input":"or.ts\"",...} - STREAM ENDS ABRUPTLY

Missing:
- The "content" parameter (hundreds of lines of code)
- Closing braces for the JSON
- The "stop":true event to finalize the tool call
- The contextUsagePercentage event
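
The failure mode above can be sketched as follows. This is a minimal illustration, not the gateway's actual parser: it assumes each `toolUseEvent` has already been decoded into a dict with the `name`, `input`, and `stop` keys seen in the raw stream, and the function name `assemble_tool_call` is hypothetical.

```python
import json
from typing import Iterable, Optional


def assemble_tool_call(events: Iterable[dict]) -> dict:
    """Join streamed `input` fragments and report whether the call finished.

    Assumption: each event is a decoded dict shaped like the ones in the
    raw stream above ({"name": ...}, {"input": ...}, {"stop": true}).
    """
    name: Optional[str] = None
    fragments = []
    stopped = False

    for event in events:
        if name is None and "name" in event:
            name = event["name"]
        fragments.append(event.get("input", ""))
        if event.get("stop"):
            stopped = True

    raw = "".join(fragments)
    try:
        arguments = json.loads(raw)
        parsed_ok = True
    except json.JSONDecodeError:
        arguments = None
        parsed_ok = False

    return {
        "name": name,
        "raw_input": raw,
        "arguments": arguments,
        # A call is treated as truncated if it was never finalized
        # or if the accumulated input is not valid JSON.
        "truncated": not stopped or not parsed_ok,
    }
```

With the truncated stream from this issue, the accumulated input ends partway through the JSON object and no `"stop":true` event arrives, so `truncated` comes back `True` and there are no usable arguments, which matches the `Write: {}` symptom.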

Key Finding: No `max_tokens` Support

The Kiro API does not accept max_tokens or inferenceConfig parameters. The gateway payload only contains:

- `conversationState` (messages, history, tools)
- `profileArn`

This means there is no client-side control over output limits - truncation is entirely server-side.

Reproduction Steps

1. Enable `DEBUG_MODE=all` so the gateway captures the raw stream
2. Start a Claude Code session via kiro-gateway
3. Ask Claude to write a large file (200+ lines of code)
4. Observe the stream truncation in `debug_logs/response_stream_raw.txt`
5. The tool call fails with empty parameters
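
To get the same metrics as the comparison table under Evidence, a small script along these lines may help. It is only a sketch: it assumes the raw dump contains the literal markers `toolUseEvent`, `"stop":true`, and `contextUsagePercentage`, and `summarize_stream` is a hypothetical helper, not part of the gateway.

```python
import re
from pathlib import Path


def summarize_stream(raw: bytes) -> dict:
    """Count the markers used in the Evidence table of this issue.

    Assumption: the markers appear literally in the raw dump.
    """
    return {
        "bytes": len(raw),
        "tool_use_events": raw.count(b"toolUseEvent"),
        "stop_events": len(re.findall(rb'"stop"\s*:\s*true', raw)),
        "has_context_usage": b"contextUsagePercentage" in raw,
    }


if __name__ == "__main__":
    # Path taken from the reproduction steps above.
    path = Path("debug_logs/response_stream_raw.txt")
    if path.exists():
        print(summarize_stream(path.read_bytes()))
```

A truncated session should show zero stop events and no context usage marker, while a working session should show at least one of each.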

Why This Is NOT a Gateway Bug

1. Raw stream logging proves data stops arriving - The gateway logs every byte received from Kiro API
2. No timeout issues - `STREAMING_READ_TIMEOUT=300s` is generous; stream ends cleanly at packet boundary
3. Parser handles truncation correctly - `_diagnose_json_truncation()` detects and logs the issue
4. Working sessions complete normally - Same code path works for smaller payloads
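
For readers wondering what a truncation check on the accumulated tool input could look like, here is a rough sketch. It is not the gateway's `_diagnose_json_truncation()`; the function name `diagnose_truncated_json` and its return shape are made up for illustration. The idea is to distinguish "JSON that simply stops early" from "JSON that is malformed for some other reason".

```python
import json


def diagnose_truncated_json(raw: str) -> dict:
    """Guess whether `raw` is JSON that was cut off before it finished."""
    try:
        json.loads(raw)
        return {"truncated": False, "reason": None}
    except json.JSONDecodeError:
        pass

    depth = 0          # number of currently open { or [
    in_string = False  # inside a double-quoted string
    escaped = False    # previous character was a backslash inside a string

    for ch in raw:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            depth += 1
        elif ch in "}]":
            depth -= 1

    if in_string:
        reason = "unterminated string"
    elif depth > 0:
        reason = f"{depth} unclosed brace(s) or bracket(s)"
    else:
        reason = "malformed JSON"

    return {"truncated": in_string or depth > 0, "reason": reason}
```

An input that ends after the `file_path` value, as in this issue, would be reported as having one unclosed brace.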

Workarounds

Users can mitigate this by:

1. Write smaller files - Break large files into multiple smaller writes (<100 lines each)
2. Use incremental edits - Use Edit tool instead of Write for modifications
3. Reduce context - Start fresh sessions to minimize conversation history
4. Manual copy - Have Claude output code in chat, copy manually
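
As an illustration of the first workaround, content can be split into pieces that stay under 100 lines before being written one piece at a time. This is a sketch only; `split_into_chunks` is a hypothetical helper, and the 99-line default simply mirrors the "<100 lines" suggestion above rather than any documented limit.

```python
def split_into_chunks(text: str, max_lines: int = 99) -> list[str]:
    """Split `text` into consecutive pieces of at most `max_lines` lines.

    Line endings are preserved, so joining the pieces restores the input.
    """
    if max_lines < 1:
        raise ValueError("max_lines must be at least 1")
    lines = text.splitlines(keepends=True)
    return [
        "".join(lines[start:start + max_lines])
        for start in range(0, len(lines), max_lines)
    ]
```

The first piece could go through a Write call and the remaining pieces be appended with smaller follow-up edits, so that each tool call payload stays small.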

Questions for Maintainer

1. Is there a known output size limit in the Kiro API?
2. Are there any undocumented parameters to control response size?
3. Has this been reported to the AWS/Amazon Q team?

Suggested Documentation

Consider adding a note to the README about this limitation so users are aware and can use the workarounds proactively.

Investigated with DEBUG_MODE=all. Raw stream evidence available if needed.


kerem closed this issue and added the fixed and upstream labels

2026-02-27 07:17:37 +03:00

kerem commented

2026-02-27 07:17:38 +03:00

Author

Owner

@jwadow commented on GitHub (Jan 30, 2026):

Hi, bro, gateway now detects when Kiro API truncates tool calls mid-stream and injects a notice in the next request so the model knows what happened and can adapt. Works for both tool calls and regular content truncation.

Implementation:

- In-memory cache tracks truncations by tool_call_id
- Modified tool_results get prepended with `[API Limitation]` notice
- System prompt tells the model these notices are legit
- Enabled by default, can disable via `TRUNCATION_RECOVERY=false`

The ~5KB limit is still there (upstream issue), but at least now the model understands why the tool call failed instead of thinking it forgot a parameter.
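
The recovery flow described in this comment might look roughly like the sketch below. It is not the gateway's code: the class `TruncationCache`, the function names, the `tool_call_id`/`content` keys on each tool result, and the notice wording after the `[API Limitation]` prefix are all assumptions made for illustration. Only the prefix and the `TRUNCATION_RECOVERY` variable come from the comment.

```python
import os
from typing import Mapping, Optional

NOTICE_PREFIX = "[API Limitation]"


def recovery_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Enabled by default; TRUNCATION_RECOVERY=false turns it off."""
    return env.get("TRUNCATION_RECOVERY", "true").strip().lower() != "false"


class TruncationCache:
    """In-memory record of truncated tool calls, keyed by tool_call_id."""

    def __init__(self) -> None:
        self._truncated: dict = {}

    def record(self, tool_call_id: str, detail: str) -> None:
        self._truncated[tool_call_id] = detail

    def pop(self, tool_call_id: Optional[str]) -> Optional[str]:
        return self._truncated.pop(tool_call_id, None)


def annotate_tool_results(tool_results: list, cache: TruncationCache,
                          enabled: bool = True) -> list:
    """Prepend a notice to results whose tool call was truncated.

    Assumption: each result is a dict with `tool_call_id` and `content`.
    """
    if not enabled:
        return tool_results
    annotated = []
    for result in tool_results:
        detail = cache.pop(result.get("tool_call_id"))
        if detail is None:
            annotated.append(result)
            continue
        notice = f"{NOTICE_PREFIX} {detail}"
        content = result.get("content", "")
        annotated.append({**result, "content": f"{notice}\n\n{content}"})
    return annotated
```

Popping the entry once the notice is injected keeps the cache from growing and avoids repeating the same notice on later requests, though the real implementation may handle expiry differently.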


kerem referenced this issue

2026-02-27 07:17:48 +03:00

[PR #37] [CLOSED] fix: Migrate to Q endpoints & update headers to Kiro IDE 0.8.86 #64