mirror of
https://github.com/jwadow/kiro-gateway.git
synced 2026-04-25 01:15:57 +03:00
[GH-ISSUE #11] FR: enable thinking/reasoning mode via thinking tags? #12
Labels
No labels
bug
bug
enhancement
enhancement
fixed
fixed
invalid
needs-info
needs-testing
pull-request
question
upstream
wontfix
workaround
No milestone
No project
No assignees
1 participant
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
starred/kiro-gateway-jwadow#12
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Originally created by @JoeGrimes123 on GitHub (Dec 28, 2025).
Original GitHub issue: https://github.com/jwadow/kiro-gateway/issues/11
Apparently, if you add thinking_mode tag set to 'enabled', you will get response (still within content key) that contains similar to llm reasoning.
w/out thinking_mode tag:
req:
"messages": [ {"role": "user", "content": "Whats 2+2"} ]res:
"message": {"role": "assistant", "content": "2 + 2 = **4**"}w/ thinking_mode tag
req:
"messages": [ {"role": "user", "content": "<thinking_mode>enabled</thinking_mode>\n<max_thinking_length>32000</max_thinking_length>\n\nWhats 2+2"} ]res:
"message": {"role": "assistant", "content": "<thinking>\nThe user is asking a simple arithmetic question: 2+2.\n\nThe answer is 4.\n</thinking>\n\n2 + 2 = **4**"}If it is indeed how Kiro API handles thinking, I guess you can translate it to openai format so any client can pass the reasoning parameter though I think this is gonna be hard to implement since it'll be more prone to tool calling failures and errors like what happens in Antigravity models w/ thoughtsignature and what not.
Also I'm not sure if it's possible to create chain of thought/interleaved reasoning from this but it'd be fantastic if you can get it to work
@jwadow commented on GitHub (Dec 28, 2025):
Hi, I hadn't thought about this method, it's actually a brilliant solution. When I have time, I'll definitely try it out. I'm really excited about it now. Thanks for the tip.
This is also cool because in some situations, the model breaks down and starts reasoning out loud, clogging up the context with tokens and poisons itself. Otherwise, it will stuff all the junk into the reasoning tags.
On the other hand, Kiro has a limitation, probably 8192 output tokens per request (700-800 lines in VS Code), which is impossible to bypass. Consequently, some responses may be short, since your reasoning hack is essentially "content" and not "reasoning."
@jwadow commented on GitHub (Jan 3, 2026):
@JoeGrimes123
Done! Merged in the latest commit (git clone or in the future v1.0.8)
Added tag injection with FSM-based streaming parser that handles chunks properly. Converts to OpenAI reasoning_content format. Enabled by default.
Config: FAKE_REASONING_ENABLED, FAKE_REASONING_MAX_TOKENS (4000), FAKE_REASONING_HANDLING.
Yeah the 8k output limit is a real constraint - thinking eats into that budget. But for most cases it's fine, and you can always turn it off.
Thanks for finding this, was a fun one to implement.