[PR #506] [MERGED] perf(utf8): ASCII wrapping via strict printable-only invariant #1362

opened 2026-03-14 09:32:47 +03:00 by kerem · 0 comments

📋 Pull Request Information

Original PR: https://github.com/anomalyco/opentui/pull/506
Author: @simonklee
Created: 1/9/2026
Status: ✅ Merged
Merged: 1/13/2026
Merged by: @kommander

Base: main ← Head: perf-utf8-ascii-invariant

📝 Commits (2)

715d195 perf(utf8): ASCII wrapping via strict printable-only invariant
b187042 Merge branch 'main' into perf-utf8-ascii-invariant

📊 Changes

5 files changed (+46 additions, -134 deletions)

📝 packages/core/src/zig/bench/utf8_bench.zig (+1 -1)
📝 packages/core/src/zig/tests/utf8_test.zig (+25 -25)
📝 packages/core/src/zig/tests/utf8_wcwidth_test.zig (+1 -1)
📝 packages/core/src/zig/text-buffer-segment.zig (+1 -1)
📝 packages/core/src/zig/utf8.zig (+18 -106)

📄 Description

The isAsciiOnly check doesn't do exactly what its name implies. It strictly enforces printable ASCII (32-126), explicitly excluding control characters like tabs (\t) and newlines.

This provides a stronger guarantee than typical 7-bit ASCII checks: if isAsciiOnly is true, every byte is exactly 1 column wide.
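
To make the invariant concrete, here is a minimal Python model of the check as described above. It is a sketch only: the real implementation lives in utf8.zig and is not reproduced here, so the function name and the byte-by-byte scan are illustrative assumptions.

```python
def is_ascii_only(text: bytes) -> bool:
    """Hypothetical model of the strict check: True only for non-empty, printable ASCII."""
    if len(text) == 0:
        # The PR description states that the check returns false for empty strings.
        return False
    # Printable ASCII runs from 32 (space) through 126 (~).
    # Tabs (9), newlines (10), DEL (127), and any byte >= 128 fail the check.
    return all(32 <= b <= 126 for b in text)
```

Under this model, `is_ascii_only(b"hello world")` is true, while `b"a\tb"`, `b""`, and any UTF-8 encoded non-ASCII text are rejected.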

However, we ignored this guarantee and were still running O(N) width loops even on the fast path. This patch deletes those loops entirely:

1. If it's guaranteed printable ASCII, the display width is identical to text.len. We don't need to iterate N bytes just to add 1 N times.
2. Since width maps 1:1 to byte index, the wrap position is simply min(text.len, max_width). We don't need to scan the string to find where it overflows.
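
The two points above can be sketched in Python as follows. This is an illustrative model, not the code from utf8.zig: all function names are hypothetical, and the sketch assumes the input has already passed the strict printable-ASCII check.

```python
def ascii_width_loop(text: bytes) -> int:
    """Model of the removed O(N) loop: adds 1 column per byte."""
    width = 0
    for _ in text:
        width += 1
    return width


def ascii_width(text: bytes) -> int:
    """O(1) replacement: for printable ASCII, width equals byte length."""
    return len(text)


def ascii_wrap_pos(text: bytes, max_width: int) -> int:
    """O(1) wrap position: byte index and column index coincide,
    so the text overflows exactly at max_width (or never, if it is shorter)."""
    return min(len(text), max_width)
```

For example, `ascii_wrap_pos(b"hello world", 5)` yields 5, and `ascii_wrap_pos(b"hi", 80)` yields 2 because the text fits without wrapping.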

The obvious risk here is tabs (byte 9), which are strictly ASCII but variable width.

But since isAsciiOnly checks val >= 32, it returns false for \t. This forces tabbed content into the slow Unicode path, where tab_width is handled properly.
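
A rough Python model of that dispatch is shown below. The slow path here is a deliberately simplified stand-in: it assumes a tab costs a fixed tab_width columns and every other code point costs 1, which ignores wide characters and is not a claim about how the Zig slow path actually computes width. All names are hypothetical.

```python
def is_ascii_only(text: bytes) -> bool:
    """Hypothetical model of the strict check: non-empty, bytes 32..126 only."""
    return len(text) > 0 and all(32 <= b <= 126 for b in text)


def slow_width(text: bytes, tab_width: int) -> int:
    """Simplified stand-in for the Unicode path (assumption: fixed-width tabs,
    1 column for every other code point)."""
    width = 0
    for ch in text.decode("utf-8"):
        width += tab_width if ch == "\t" else 1
    return width


def display_width(text: bytes, tab_width: int = 4) -> int:
    """The flag is derived internally, so callers cannot force the fast path."""
    if is_ascii_only(text):
        return len(text)  # fast path: 1 byte == 1 column
    return slow_width(text, tab_width)
```

With this model, `display_width(b"a\tb", 4)` is 6, whereas the byte length is only 3. That gap is exactly the error the fast path would introduce if tabs were allowed through the check.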

I also considered whether FFI consumers could pass isAsciiOnly=true for strings with tabs or newlines. The public API doesn't expose isAsciiOnly directly; it derives the flag via utf8.isAsciiOnly(), which returns false for empty strings and control characters. This optimization is therefore safe and transparent to external users.

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.
