[PR #210] [MERGED] feat(kiro): extended thinking support dan fix token counting #303

New issue

Closed

opened 2026-02-27 07:18:53 +03:00 by kerem · 0 comments

kerem commented

2026-02-27 07:18:53 +03:00

Owner

📋 Pull Request Information

Original PR: https://github.com/justlovemaki/AIClient-2-API/pull/210
Author: @tickernelz
Created: 1/11/2026
Status: ✅ Merged
Merged: 1/12/2026
Merged by: @justlovemaki

Base: main ← Head: feat/kiro-think-token-fix

📝 Commits (3)

bdfb27d feat(kiro): implement extended thinking support with streaming and token estimation
6ff2a9b docs(kiro): restore deleted comments
10e4a48 Merge remote-tracking branch 'aiclient/main' into feat/kiro-think-token-fix

📊 Changes

1 file changed (+452 additions, -83 deletions)

View changed files

📝 src/providers/claude/claude-kiro.js (+452 -83)

📄 Description

Summary

This PR implements extended thinking support for Claude's Kiro API adapter and significantly improves input token estimation accuracy for better context management in AI agent tools.

Key Features

1. Extended Thinking Support

Implements Claude's extended thinking capability (PR #197 feature request)
Real-time thinking tag parsing with state machine for streaming responses
Proper content block ordering: thinking (index 0), text (index 1), tool_use (index 2+)
Dynamic block index management to maintain Claude API compatibility
Smart buffer management to prevent tag truncation during streaming
Support for thinking budget token configuration with validation and clamping

2. Enhanced Token Estimation

Adaptive overhead calculation based on tools definition size
Intelligent estimation strategy:
- No tools: 25% overhead + 400 base tokens
- Small tools definition (<21k tokens): 18% overhead + 400 base tokens
- Large tools definition (≥21k tokens): 8% overhead + 400 base tokens
Comprehensive content type coverage:
- System prompts
- Text content
- Tool results (critical for large context scenarios)
- Tool use inputs
- Images (1500 tokens each)
- Thinking content in message history
Minimal gap with actual context usage (0.1-7% for large requests)
Real context values still retrieved from API for accuracy

3. Performance Characteristics

No performance degradation
Message start event sent immediately before content streaming
Maintains backward compatibility with existing implementations
Proper handling of conversation history in token calculations

Testing

Extensively tested across multiple AI agent tools:

Claude Code
OpenCode
Kilo Code
Forge

All tools demonstrate improved context management and accurate token estimation across various workload patterns including:

Small requests without tools
Large requests with extensive tool definitions
Multi-turn conversations with thinking content
Requests with large tool results (30k+ tokens)

Technical Implementation

Thinking Support

Added KIRO_THINKING constants for tag management
Implemented helper functions for tag detection outside quoted strings
Created thinking-related methods for budget normalization and prefix generation
Updated buildCodewhispererRequest() to inject thinking prefixes and handle thinking blocks
Refactored generateContentStream() with proper state machine for real-time parsing
Updated buildClaudeResponse() for non-streaming thinking support

Token Estimation

Refactored estimateInputTokens() to iterate through all content types
Implemented adaptive overhead based on tools definition size
Added proper handling for conversation history overhead
Removed debug logging for production readiness

Breaking Changes

None. All changes are backward compatible.

Addresses PR #197 feature request for extended thinking support
Improves token estimation accuracy for auto-compact functionality

_{🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.}

## 📋 Pull Request Information **Original PR:** https://github.com/justlovemaki/AIClient-2-API/pull/210 **Author:** [@tickernelz](https://github.com/tickernelz) **Created:** 1/11/2026 **Status:** ✅ Merged **Merged:** 1/12/2026 **Merged by:** [@justlovemaki](https://github.com/justlovemaki) **Base:** `main` ← **Head:** `feat/kiro-think-token-fix` --- ### 📝 Commits (3) - [`bdfb27d`](https://github.com/justlovemaki/AIClient-2-API/commit/bdfb27d6d4a3fed8b392999d038aaf2b29f0678f) feat(kiro): implement extended thinking support with streaming and token estimation - [`6ff2a9b`](https://github.com/justlovemaki/AIClient-2-API/commit/6ff2a9b7bdbd54af775c49db59057c40681e2272) docs(kiro): restore deleted comments - [`10e4a48`](https://github.com/justlovemaki/AIClient-2-API/commit/10e4a48f7993d0bd90a536c031215073751b6197) Merge remote-tracking branch 'aiclient/main' into feat/kiro-think-token-fix ### 📊 Changes **1 file changed** (+452 additions, -83 deletions) <details> <summary>View changed files</summary> 📝 `src/providers/claude/claude-kiro.js` (+452 -83) </details> ### 📄 Description ## Summary This PR implements extended thinking support for Claude's Kiro API adapter and significantly improves input token estimation accuracy for better context management in AI agent tools. ## Key Features ### 1. Extended Thinking Support - Implements Claude's extended thinking capability (PR #197 feature request) - Real-time thinking tag parsing with state machine for streaming responses - Proper content block ordering: thinking (index 0), text (index 1), tool_use (index 2+) - Dynamic block index management to maintain Claude API compatibility - Smart buffer management to prevent tag truncation during streaming - Support for thinking budget token configuration with validation and clamping ### 2. Enhanced Token Estimation - Adaptive overhead calculation based on tools definition size - Intelligent estimation strategy: - No tools: 25% overhead + 400 base tokens - Small tools definition (<21k tokens): 18% overhead + 400 base tokens - Large tools definition (≥21k tokens): 8% overhead + 400 base tokens - Comprehensive content type coverage: - System prompts - Text content - Tool results (critical for large context scenarios) - Tool use inputs - Images (1500 tokens each) - Thinking content in message history - Minimal gap with actual context usage (0.1-7% for large requests) - Real context values still retrieved from API for accuracy ### 3. Performance Characteristics - No performance degradation - Message start event sent immediately before content streaming - Maintains backward compatibility with existing implementations - Proper handling of conversation history in token calculations ## Testing Extensively tested across multiple AI agent tools: - Claude Code - OpenCode - Kilo Code - Forge All tools demonstrate improved context management and accurate token estimation across various workload patterns including: - Small requests without tools - Large requests with extensive tool definitions - Multi-turn conversations with thinking content - Requests with large tool results (30k+ tokens) ## Technical Implementation ### Thinking Support - Added `KIRO_THINKING` constants for tag management - Implemented helper functions for tag detection outside quoted strings - Created thinking-related methods for budget normalization and prefix generation - Updated `buildCodewhispererRequest()` to inject thinking prefixes and handle thinking blocks - Refactored `generateContentStream()` with proper state machine for real-time parsing - Updated `buildClaudeResponse()` for non-streaming thinking support ### Token Estimation - Refactored `estimateInputTokens()` to iterate through all content types - Implemented adaptive overhead based on tools definition size - Added proper handling for conversation history overhead - Removed debug logging for production readiness ## Breaking Changes None. All changes are backward compatible. ## Related Issues - Addresses PR #197 feature request for extended thinking support - Improves token estimation accuracy for auto-compact functionality --- <sub>🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.</sub>