[PR #610] [CLOSED] fix(core): use byte offset for line_starts in text wrap #658

Closed
opened 2026-03-02 23:47:33 +03:00 by kerem · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/anomalyco/opentui/pull/610
Author: @zenyr
Created: 2/1/2026
Status: Closed

Base: mainHead: fix/255-line-starts-byte-offset


📝 Commits (10+)

  • 841364c fix(core): use byte offset for line_starts in text wrap
  • 4f2e8d1 test(core): add no-wrap byte offset test for CJK text
  • 00336eb fix(core): use byte offset for line_starts in no-wrap mode
  • 089f334 refactor(core): use proactive byte offset tracking in no-wrap mode
  • e344f44 perf(core): store byte_start/byte_len in VirtualChunk
  • 76b11a4 test(text-buffer-view): add tests for CJK byte offset handling
  • 2823590 fix: recompute byte offsets
  • a3123a6 Merge branch 'main' into fix/255-line-starts-byte-offset
  • d241b35 add word split bug repro
  • 8f6f30b Merge branch 'main' into fix/255-line-starts-byte-offset

📊 Changes

3 files changed (+175 additions, -17 deletions)

View changed files

📝 packages/core/src/text-buffer-view.test.ts (+38 -0)
📝 packages/core/src/zig/tests/text-buffer-view_test.zig (+31 -0)
📝 packages/core/src/zig/text-buffer-view.zig (+106 -17)

📄 Description

Summary

  • Fix lineInfo.lineStarts to return byte offsets instead of column offsets
  • Prevents character corruption when slicing wrapped multi-byte text (CJK, emoji)

Fixes #609

Changes

  • Add byte_offset tracking to VirtualLine and wrap context in text-buffer-view.zig
  • Update cached_line_starts to use byte position instead of char/column position
  • Add regression tests for CJK text wrapping

Test

buffer.setStyledText(stringToStyledText("흐름도"))
view.setWrapWidth(4)
view.lineInfo.lineStarts // [0, 6] ✅ (was [0, 4] ❌)

Design Considerations

Alternatives Considered

Approach Trade-off
Compute byte offset on-demand at read time O(n) per lineInfo access — defeats caching purpose
Store only in cached_line_starts without VirtualLine field Would require extra params in commitVirtualLine, harder to maintain
Derive from chunk byte lengths Still needs global traversal; chunks are buffer-relative, not document-relative

Chosen approach: Track byte_offset during wrap (O(1) amortized). Memory overhead is ~4 bytes per wrapped line (~80KB for 10K lines), which is acceptable.

Relation to Prior Work

  • #529 (merged): Introduced VirtualLine structure with char_offset. This PR builds on that by adding byte_offset alongside it.
  • #512 (closed): Addressed wrap boundary detection with col_offset in WrapBreak. Different problem — this PR fixes lineStarts return values, not boundary calculation.

Both prior PRs left cached_line_starts using character/column offsets, which corrupts multi-byte characters when used for slicing.

Known Limitation

The no-wrap code path still uses char_offset for lineStarts. This is out of scope for this PR, as wrapping is the primary use case for lineStarts.


cc: @DonKongPaPa @kskang @soomtong @stefanahman — you showed interest in #255; this PR addresses a remaining edge case with line boundary handling. Feedback welcome!


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

## 📋 Pull Request Information **Original PR:** https://github.com/anomalyco/opentui/pull/610 **Author:** [@zenyr](https://github.com/zenyr) **Created:** 2/1/2026 **Status:** ❌ Closed **Base:** `main` ← **Head:** `fix/255-line-starts-byte-offset` --- ### 📝 Commits (10+) - [`841364c`](https://github.com/anomalyco/opentui/commit/841364cb55e46887163ea7e72b13a8e325e57f19) fix(core): use byte offset for line_starts in text wrap - [`4f2e8d1`](https://github.com/anomalyco/opentui/commit/4f2e8d153b4fde8cccfb2c116f2547a76af992d7) test(core): add no-wrap byte offset test for CJK text - [`00336eb`](https://github.com/anomalyco/opentui/commit/00336eb21323cc0fd26fc74d8f33cc79a237608f) fix(core): use byte offset for line_starts in no-wrap mode - [`089f334`](https://github.com/anomalyco/opentui/commit/089f33449dbdaf2de43a8639c394a3f03299e339) refactor(core): use proactive byte offset tracking in no-wrap mode - [`e344f44`](https://github.com/anomalyco/opentui/commit/e344f44f4801bfd7083c33360fc2ff46e654b646) perf(core): store byte_start/byte_len in VirtualChunk - [`76b11a4`](https://github.com/anomalyco/opentui/commit/76b11a43e5fdffcc3f38239d9533956d7a783c07) test(text-buffer-view): add tests for CJK byte offset handling - [`2823590`](https://github.com/anomalyco/opentui/commit/28235904434523b45720cc894ba616f631096ccd) fix: recompute byte offsets - [`a3123a6`](https://github.com/anomalyco/opentui/commit/a3123a6f6e4a590438a9bf9f75c4a5c9385ba054) Merge branch 'main' into fix/255-line-starts-byte-offset - [`d241b35`](https://github.com/anomalyco/opentui/commit/d241b35f416d33a55a42beb4f4b28d5c461d311b) add word split bug repro - [`8f6f30b`](https://github.com/anomalyco/opentui/commit/8f6f30bcb511100594b328f4434c7fe052292f89) Merge branch 'main' into fix/255-line-starts-byte-offset ### 📊 Changes **3 files changed** (+175 additions, -17 deletions) <details> <summary>View changed files</summary> 📝 `packages/core/src/text-buffer-view.test.ts` (+38 -0) 📝 `packages/core/src/zig/tests/text-buffer-view_test.zig` (+31 -0) 📝 `packages/core/src/zig/text-buffer-view.zig` (+106 -17) </details> ### 📄 Description ## Summary - Fix `lineInfo.lineStarts` to return byte offsets instead of column offsets - Prevents character corruption when slicing wrapped multi-byte text (CJK, emoji) Fixes #609 ## Changes - Add `byte_offset` tracking to `VirtualLine` and wrap context in `text-buffer-view.zig` - Update `cached_line_starts` to use byte position instead of char/column position - Add regression tests for CJK text wrapping ## Test ```ts buffer.setStyledText(stringToStyledText("흐름도")) view.setWrapWidth(4) view.lineInfo.lineStarts // [0, 6] ✅ (was [0, 4] ❌) ``` --- ## Design Considerations ### Alternatives Considered | Approach | Trade-off | |----------|-----------| | Compute byte offset on-demand at read time | O(n) per `lineInfo` access — defeats caching purpose | | Store only in `cached_line_starts` without `VirtualLine` field | Would require extra params in `commitVirtualLine`, harder to maintain | | Derive from chunk byte lengths | Still needs global traversal; chunks are buffer-relative, not document-relative | **Chosen approach:** Track `byte_offset` during wrap (O(1) amortized). Memory overhead is ~4 bytes per wrapped line (~80KB for 10K lines), which is acceptable. ### Relation to Prior Work - **#529** (merged): Introduced `VirtualLine` structure with `char_offset`. This PR builds on that by adding `byte_offset` alongside it. - **#512** (closed): Addressed wrap *boundary detection* with `col_offset` in `WrapBreak`. Different problem — this PR fixes `lineStarts` *return values*, not boundary calculation. Both prior PRs left `cached_line_starts` using character/column offsets, which corrupts multi-byte characters when used for slicing. ### Known Limitation The no-wrap code path still uses `char_offset` for `lineStarts`. This is out of scope for this PR, as wrapping is the primary use case for `lineStarts`. --- cc: @DonKongPaPa @kskang @soomtong @stefanahman — you showed interest in #255; this PR addresses a remaining edge case with line boundary handling. Feedback welcome! --- <sub>🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.</sub>
kerem 2026-03-02 23:47:33 +03:00
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
starred/opentui#658
No description provided.