[GH-ISSUE #609] lineStarts returns column offset instead of byte offset when wrapping multi-byte text #165

Open
opened 2026-03-02 23:45:00 +03:00 by kerem · 1 comment

Originally created by @zenyr on GitHub (Feb 1, 2026).
Original GitHub issue: https://github.com/anomalyco/opentui/issues/609

Problem

After the #255 fix, CJK/emoji text renders correctly in most cases. However, `lineInfo.lineStarts` still returns **column offsets** instead of **byte offsets** when text wraps, which corrupts characters at line boundaries.

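**Korean text wrap (representative case)**

```ts
buffer.setStyledText(stringToStyledText("흐름도")) // 9 bytes
view.setWrapWidth(4) // wraps after "흐름"
```

| | `lineStarts` | Line 0 | Line 1 |
| ----- | ---------- | ------ | ------ |
| **AS-IS** | `[0, 4]` | `"흐�"` | `"��도"` |
| **TO-BE** | `[0, 6]` | `"흐름"` | `"도"` |
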
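
The corruption in the Korean case can be reproduced outside OpenTUI by slicing the UTF-8 bytes at each set of offsets. This is only a sketch: `sliceLines` is a hypothetical helper, not an OpenTUI API, and it assumes the consumer of `lineStarts` treats the values as byte offsets into the UTF-8 text.

```ts
// Hypothetical helper (not part of OpenTUI): slice the UTF-8 bytes of `text`
// at the given line starts and decode each slice as its own line.
function sliceLines(text: string, lineStarts: number[]): string[] {
  const bytes = new TextEncoder().encode(text)
  // Non-fatal decoder: every invalid or truncated sequence becomes U+FFFD.
  const decoder = new TextDecoder("utf-8")
  return lineStarts.map((start, i) => {
    const end = lineStarts[i + 1] ?? bytes.length
    return decoder.decode(bytes.subarray(start, end))
  })
}

// Column offsets used as byte offsets cut through the 3-byte "름".
const asIs = sliceLines("흐름도", [0, 4]) // ["흐\uFFFD", "\uFFFD\uFFFD도"]
// Byte offsets land on a character boundary.
const toBe = sliceLines("흐름도", [0, 6]) // ["흐름", "도"]
console.log(asIs, toBe)
```

Offset 4 falls one byte into "름", so line 0 ends with a truncated sequence and line 1 begins with two orphaned continuation bytes, which matches the AS-IS row.
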
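**Chinese text wrap**

```ts
buffer.setStyledText(stringToStyledText("你好世界")) // 12 bytes
view.setWrapWidth(4)
```

| | `lineStarts` | Line 0 | Line 1 |
| ----- | ---------- | ------ | -------- |
| **AS-IS** | `[0, 4]` | `"你�"` | `"��世界"` |
| **TO-BE** | `[0, 6]` | `"你好"` | `"世界"` |
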
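**Korean with leading ASCII**

```ts
buffer.setStyledText(stringToStyledText(" 안녕a")) // 8 bytes
view.setWrapWidth(5)
```

| | `lineStarts` | Line 0 | Line 1 |
| ----- | ---------- | ------- | ------ |
| **AS-IS** | `[0, 5]` | `" 안�"` | `"��a"` |
| **TO-BE** | `[0, 7]` | `" 안녕"` | `"a"` |
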
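**Flag emoji**

```ts
buffer.setStyledText(stringToStyledText("🇰🇷🇯🇵🇨🇳")) // 24 bytes
view.setWrapWidth(4)
```

| | `lineStarts` | Line 0 | Line 1 |
| ----- | ---------- | ------ | ------- |
| **AS-IS** | `[0, 4]` | `"🇰"` | `"🇷🇯🇵🇨🇳"` |
| **TO-BE** | `[0, 16]` | `"🇰🇷🇯🇵"` | `"🇨🇳"` |
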
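**Skin tone emoji**

```ts
buffer.setStyledText(stringToStyledText("👋🏻👋🏿hi")) // 18 bytes
view.setWrapWidth(4)
```

| | `lineStarts` | Line 0 | Line 1 | Line 2 |
| ----- | ---------- | ------ | ------ | ------ |
| **AS-IS** | `[0, 4, 8]` | `"👋"` | `"🏻"` | `"👋🏿hi"` |
| **TO-BE** | `[0, 8, 16]` | `"👋🏻"` | `"👋🏿"` | `"hi"` |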

AS-IS splits the skin tone modifier from its base emoji.
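
To see why offset 4 separates the modifier from its base, it may help to list the code points of the test string next to the UTF-8 byte offset where each one starts. `codePointOffsets` is a hypothetical helper written for this illustration, not an OpenTUI function.

```ts
// Hypothetical helper: list every code point of `text` together with the
// UTF-8 byte offset at which it starts.
function codePointOffsets(text: string): { cp: string; byteOffset: number }[] {
  const out: { cp: string; byteOffset: number }[] = []
  let byteOffset = 0
  for (let i = 0; i < text.length; ) {
    const code = text.codePointAt(i)!
    out.push({ cp: "U+" + code.toString(16).toUpperCase(), byteOffset })
    // UTF-8 length of this code point.
    byteOffset += code < 0x80 ? 1 : code < 0x800 ? 2 : code < 0x10000 ? 3 : 4
    // Astral code points occupy two UTF-16 code units in a JS string.
    i += code > 0xffff ? 2 : 1
  }
  return out
}

const layout = codePointOffsets("👋🏻👋🏿hi")
console.log(layout)
// U+1F44B @ 0, U+1F3FB @ 4, U+1F44B @ 8, U+1F3FF @ 12, U+68 @ 16, U+69 @ 17
```

Each waving hand plus its modifier spans 8 bytes, so the TO-BE starts `[0, 8, 16]` sit on cluster boundaries, while the AS-IS value 4 points at the modifier `U+1F3FB` itself.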

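**Mixed keycap + CJK**

```ts
buffer.setStyledText(stringToStyledText("1️⃣한글")) // 13 bytes
view.setWrapWidth(4)
```

| | `lineStarts` | Line 0 | Line 1 |
| ----- | ---------- | ------ | -------- |
| **AS-IS** | `[0, 3]` | `"1�"` | `"�⃣한글"` |
| **TO-BE** | `[0, 10]` | `"1️⃣한"` | `"글"` |
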
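**Long Korean (multiple wraps)**

```ts
buffer.setStyledText(stringToStyledText("가나다라마바사")) // 21 bytes
view.setWrapWidth(6)
```

| | `lineStarts` | Line 0 | Line 1 | Line 2 |
| ----- | ---------- | ------ | ------ | -------- |
| **AS-IS** | `[0, 6, 12]` | `"가나"` | `"다라"` | `"마바사"` |
| **TO-BE** | `[0, 9, 18]` | `"가나"` | `"다라"` | `"마바사"` |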

This case happens to display correctly, but the byte offsets are still wrong (6≠9, 12≠18).

Root Cause

In `text-buffer-view.zig`, `cached_line_starts` was populated using `char_offset` (display column width) instead of tracking the actual byte position through the wrap logic.
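
The difference between the two counters can be sketched in TypeScript. This is not the Zig implementation: `columnWidth`, `utf8Length`, and `wrapLineStarts` are hypothetical names, the width table is an assumption that covers only Hangul syllables and CJK unified ideographs, and wrapping is done per code point, so the emoji cases (which need grapheme clustering) are out of scope.

```ts
// Assumed width table: Hangul syllables and CJK unified ideographs take
// 2 columns, everything else 1. A real implementation needs far more.
function columnWidth(code: number): number {
  const wide = (code >= 0xac00 && code <= 0xd7a3) || (code >= 0x4e00 && code <= 0x9fff)
  return wide ? 2 : 1
}

// Number of bytes a code point occupies in UTF-8.
function utf8Length(code: number): number {
  return code < 0x80 ? 1 : code < 0x800 ? 2 : code < 0x10000 ? 3 : 4
}

// Wrap per code point and record every line start in both units.
function wrapLineStarts(text: string, wrapWidth: number) {
  const columnStarts = [0] // running display columns (the AS-IS style value)
  const byteStarts = [0] // running UTF-8 bytes (the TO-BE style value)
  let lineColumns = 0
  let totalColumns = 0
  let totalBytes = 0
  for (let i = 0; i < text.length; ) {
    const code = text.codePointAt(i)!
    const width = columnWidth(code)
    // Start a new line when this code point would overflow the current one.
    if (lineColumns > 0 && lineColumns + width > wrapWidth) {
      columnStarts.push(totalColumns)
      byteStarts.push(totalBytes)
      lineColumns = 0
    }
    lineColumns += width
    totalColumns += width
    totalBytes += utf8Length(code)
    i += code > 0xffff ? 2 : 1
  }
  return { columnStarts, byteStarts }
}

const korean = wrapLineStarts("가나다라마바사", 6)
console.log(korean.columnStarts, korean.byteStarts) // [0, 6, 12] [0, 9, 18]
```

Under these assumptions the column counter reproduces the AS-IS values for the CJK cases and the byte counter reproduces the TO-BE values, which suggests that keeping a byte counter alongside the column counter during wrapping is enough for those inputs.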

Related

- #255


@simonklee commented on GitHub (Mar 2, 2026):

[tests/fix-609-cases](https://github.com/anomalyco/opentui/tree/tests/fix-609-cases)