[GH-ISSUE #11] FR: enable thinking/reasoning mode via thinking tags? #12

Closed
opened 2026-02-27 07:17:23 +03:00 by kerem · 2 comments
Owner

Originally created by @JoeGrimes123 on GitHub (Dec 28, 2025).
Original GitHub issue: https://github.com/jwadow/kiro-gateway/issues/11

Apparently, if you add thinking_mode tag set to 'enabled', you will get response (still within content key) that contains similar to llm reasoning.

w/out thinking_mode tag:

req:
"messages": [ {"role": "user", "content": "Whats 2+2"} ]

res:
"message": {"role": "assistant", "content": "2 + 2 = **4**"}

w/ thinking_mode tag

req:
"messages": [ {"role": "user", "content": "<thinking_mode>enabled</thinking_mode>\n<max_thinking_length>32000</max_thinking_length>\n\nWhats 2+2"} ]

res:
"message": {"role": "assistant", "content": "<thinking>\nThe user is asking a simple arithmetic question: 2+2.\n\nThe answer is 4.\n</thinking>\n\n2 + 2 = **4**"}

If it is indeed how Kiro API handles thinking, I guess you can translate it to openai format so any client can pass the reasoning parameter though I think this is gonna be hard to implement since it'll be more prone to tool calling failures and errors like what happens in Antigravity models w/ thoughtsignature and what not.
Also I'm not sure if it's possible to create chain of thought/interleaved reasoning from this but it'd be fantastic if you can get it to work

Originally created by @JoeGrimes123 on GitHub (Dec 28, 2025). Original GitHub issue: https://github.com/jwadow/kiro-gateway/issues/11 Apparently, if you add thinking_mode tag set to 'enabled', you will get response (still within content key) that contains similar to llm reasoning. w/out thinking_mode tag: req: `"messages": [ {"role": "user", "content": "Whats 2+2"} ]` res: `"message": {"role": "assistant", "content": "2 + 2 = **4**"}` w/ thinking_mode tag req: `"messages": [ {"role": "user", "content": "<thinking_mode>enabled</thinking_mode>\n<max_thinking_length>32000</max_thinking_length>\n\nWhats 2+2"} ]` res: `"message": {"role": "assistant", "content": "<thinking>\nThe user is asking a simple arithmetic question: 2+2.\n\nThe answer is 4.\n</thinking>\n\n2 + 2 = **4**"}` If it is indeed how Kiro API handles thinking, I guess you can translate it to openai format so any client can pass the reasoning parameter though I think this is gonna be hard to implement since it'll be more prone to tool calling failures and errors like what happens in Antigravity models w/ thoughtsignature and what not. Also I'm not sure if it's possible to create chain of thought/interleaved reasoning from this but it'd be fantastic if you can get it to work
kerem 2026-02-27 07:17:23 +03:00
Author
Owner

@jwadow commented on GitHub (Dec 28, 2025):

Hi, I hadn't thought about this method, it's actually a brilliant solution. When I have time, I'll definitely try it out. I'm really excited about it now. Thanks for the tip.

This is also cool because in some situations, the model breaks down and starts reasoning out loud, clogging up the context with tokens and poisons itself. Otherwise, it will stuff all the junk into the reasoning tags.

On the other hand, Kiro has a limitation, probably 8192 output tokens per request (700-800 lines in VS Code), which is impossible to bypass. Consequently, some responses may be short, since your reasoning hack is essentially "content" and not "reasoning."

<!-- gh-comment-id:3694480899 --> @jwadow commented on GitHub (Dec 28, 2025): Hi, I hadn't thought about this method, it's actually a brilliant solution. When I have time, I'll definitely try it out. I'm really excited about it now. Thanks for the tip. This is also cool because in some situations, the model breaks down and starts reasoning out loud, clogging up the context with tokens and poisons itself. Otherwise, it will stuff all the junk into the reasoning tags. On the other hand, Kiro has a limitation, probably 8192 output tokens per request (700-800 lines in VS Code), which is impossible to bypass. Consequently, some responses may be short, since your reasoning hack is essentially "content" and not "reasoning."
Author
Owner

@jwadow commented on GitHub (Jan 3, 2026):

@JoeGrimes123

Done! Merged in the latest commit (git clone or in the future v1.0.8)

Added tag injection with FSM-based streaming parser that handles chunks properly. Converts to OpenAI reasoning_content format. Enabled by default.

Config: FAKE_REASONING_ENABLED, FAKE_REASONING_MAX_TOKENS (4000), FAKE_REASONING_HANDLING.

Yeah the 8k output limit is a real constraint - thinking eats into that budget. But for most cases it's fine, and you can always turn it off.

Thanks for finding this, was a fun one to implement.

<!-- gh-comment-id:3707372158 --> @jwadow commented on GitHub (Jan 3, 2026): @JoeGrimes123 Done! Merged in the latest commit (git clone or in the future v1.0.8) Added tag injection with FSM-based streaming parser that handles chunks properly. Converts to OpenAI reasoning_content format. Enabled by default. Config: FAKE_REASONING_ENABLED, FAKE_REASONING_MAX_TOKENS (4000), FAKE_REASONING_HANDLING. Yeah the 8k output limit is a real constraint - thinking eats into that budget. But for most cases it's fine, and you can always turn it off. Thanks for finding this, was a fun one to implement.
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
starred/kiro-gateway-jwadow#12
No description provided.