[PR #73] [MERGED] fix(adapters): handle UTF-8 boundaries in truncate function #121

Closed
opened 2026-02-27 10:22:18 +03:00 by kerem · 0 comments

📋 Pull Request Information

Original PR: https://github.com/mikeyobrien/ralph-orchestrator/pull/73
Author: @mikeyobrien
Created: 1/19/2026
Status: Merged
Merged: 1/19/2026
Merged by: @mikeyobrien

Base: main ← Head: fix/utf8-truncate-boundary


📝 Commits (1)

  • d8dcbf9 fix(adapters): handle UTF-8 boundaries in truncate function

📊 Changes

1 file changed (+46 additions, -1 deletions)

📝 crates/ralph-adapters/src/claude_stream.rs (+46 -1)

📄 Description

Summary

  • Fix panic in truncate() when byte limit falls inside a multi-byte UTF-8 character
  • Find the last valid char boundary before truncating instead of raw byte slicing
  • Add comprehensive tests for multi-byte characters (arrows, emojis)

Problem

The truncate function in claude_stream.rs sliced the string with &s[..max_len], which panics when max_len lands inside a multi-byte UTF-8 sequence. For example:

let s = "hello→world";  // → is 3 bytes
truncate(s, 6);         // Panics! Byte 6 is inside the → character
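
The failing case can be inspected without triggering the panic. The following is a minimal sketch that relies only on the standard-library methods `str::get` and `str::is_char_boundary`; the helper names `safe_prefix` and `char_boundaries` are hypothetical and do not come from the PR.

```rust
/// Returns the first `max_len` bytes of `s`, or `None` when `max_len`
/// is past the end of the string or lands inside a multi-byte character.
/// (`safe_prefix` is an illustrative name, not part of the PR.)
fn safe_prefix(s: &str, max_len: usize) -> Option<&str> {
    // `str::get` performs the same boundary check as `&s[..max_len]`,
    // but reports failure as `None` instead of panicking.
    s.get(..max_len)
}

/// Lists every byte offset in `0..=s.len()` that is a valid char boundary.
/// (`char_boundaries` is an illustrative name, not part of the PR.)
fn char_boundaries(s: &str) -> Vec<usize> {
    (0..=s.len()).filter(|&i| s.is_char_boundary(i)).collect()
}
```

For "hello→world", the arrow occupies bytes 5, 6 and 7, so offsets 5 and 8 are valid cut points while 6 and 7 are not.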

Solution

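Use char_indices() to find a valid UTF-8 boundary near max_len: take the last character that starts before max_len and cut at the end of that character. The cut always falls on a char boundary, though it can land up to 3 bytes past max_len when that last character is multi-byte: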

let boundary = s
    .char_indices()
    .take_while(|(i, _)| *i < max_len)
    .last()
    .map(|(i, c)| i + c.len_utf8())
    .unwrap_or(0);
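
To show the boundary expression in context, here is a minimal sketch of a complete helper. Only the `boundary` expression comes from the PR description; the function signature, the early return, and the "..." suffix are assumptions, and `truncate_strict` is a hypothetical variant added for comparison.

```rust
/// Sketch of a truncate helper built around the boundary expression above.
/// The signature and the "..." suffix are assumptions for illustration.
fn truncate(s: &str, max_len: usize) -> String {
    if s.len() <= max_len {
        return s.to_string();
    }
    // End offset of the last character that *starts* before `max_len`.
    let boundary = s
        .char_indices()
        .take_while(|(i, _)| *i < max_len)
        .last()
        .map(|(i, c)| i + c.len_utf8())
        .unwrap_or(0);
    // `boundary` is always a char boundary, so this slice cannot panic.
    format!("{}...", &s[..boundary])
}

/// Hypothetical stricter variant: keeps only characters that *end* at or
/// before `max_len`, so the kept prefix never exceeds the limit.
fn truncate_strict(s: &str, max_len: usize) -> String {
    if s.len() <= max_len {
        return s.to_string();
    }
    let boundary = s
        .char_indices()
        .map(|(i, c)| i + c.len_utf8())
        .take_while(|&end| end <= max_len)
        .last()
        .unwrap_or(0);
    format!("{}...", &s[..boundary])
}
```

With this sketch, `truncate("hello→world", 6)` keeps the whole arrow and yields "hello→...", whereas `truncate_strict` with the same arguments drops it and yields "hello...". Which behavior is preferable depends on whether max_len is a hard byte budget or a display hint.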

Test plan

  • test_truncate_utf8_boundary - verifies truncation with arrow character (3-byte)
  • test_truncate_utf8_emoji - verifies truncation with emoji (4-byte)
  • Existing tests continue to pass
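
The test bodies are not shown in this description. As a rough sketch of what such tests might check, the following reuses the two test names from the plan, but the bodies and the helpers `boundary_for` and `check_all_limits` are assumptions rather than the code that was merged.

```rust
/// Boundary expression from the PR description, wrapped in a helper.
/// (`boundary_for` is a hypothetical name used only in this sketch.)
fn boundary_for(s: &str, max_len: usize) -> usize {
    s.char_indices()
        .take_while(|(i, _)| *i < max_len)
        .last()
        .map(|(i, c)| i + c.len_utf8())
        .unwrap_or(0)
}

/// Asserts that every limit from 0 to `s.len()` yields a sliceable offset.
fn check_all_limits(s: &str) {
    for max_len in 0..=s.len() {
        let b = boundary_for(s, max_len);
        assert!(
            s.is_char_boundary(b),
            "limit {} gave non-boundary offset {}",
            max_len,
            b
        );
        // Slicing at a char boundary never panics.
        let _ = &s[..b];
    }
}

#[test]
fn test_truncate_utf8_boundary() {
    // '→' (U+2192) is 3 bytes in UTF-8.
    check_all_limits("hello→world");
}

#[test]
fn test_truncate_utf8_emoji() {
    // '😀' (U+1F600) is 4 bytes in UTF-8.
    check_all_limits("hi😀there");
}
```

Sweeping every limit, rather than picking a single one, exercises each byte offset inside the multi-byte character.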

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.
