[GH-ISSUE #651] wrapMode "word" breaks mid-word with multi-byte UTF-8 text #946

Closed
opened 2026-03-14 09:06:43 +03:00 by kerem · 1 comment

Originally created by @la55u on GitHub (Feb 9, 2026).
Original GitHub issue: https://github.com/anomalyco/opentui/issues/651

Description

wrapMode: "word" on TextRenderable splits words in the middle when the text contains multi-byte UTF-8 characters (e.g. Hungarian accented characters such as á, é, ö, ü). The optimal line-breaking algorithm appears to use byte offsets instead of display-width offsets when computing wrap positions.
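
The gap between the two measurements can be checked directly on the first expected line of the reproduction text. This is a minimal sketch: the helper names `utf8ByteLength` and `codePointCount` are hypothetical and not part of OpenTUI, and counting code points is only a stand-in for display width that happens to hold for these precomposed, width-1 characters.

```typescript
// Hypothetical helpers for illustration only; not part of @opentui/core.
const utf8ByteLength = (s: string): number => new TextEncoder().encode(s).length
const codePointCount = (s: string): number => Array.from(s).length

// normalize("NFC") keeps each accented letter as a single precomposed code point.
const firstLine = "gyorskiszolgáló éttermek közül. Azóta".normalize("NFC")
const wrapWidth = 40

const chars = codePointCount(firstLine)
const bytes = utf8ByteLength(firstLine)

// The line fits when measured in characters but not when measured in bytes.
console.log({
  chars,
  bytes,
  fitsByChars: chars <= wrapWidth,
  fitsByBytes: bytes <= wrapWidth,
})
```

If the wrap position were compared against the byte count, this line would be rejected at width 40 even though it occupies only 37 columns.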

Minimal Reproduction

import { test } from "bun:test"
import { createTestRenderer } from "@opentui/core/testing"
import { TextRenderable } from "@opentui/core"

test("word wrap breaks mid-word with multi-byte UTF-8", async () => {
  const testSetup = await createTestRenderer({ width: 80, height: 24 })

  // Hungarian text; the first expected line is 37 visible characters, 43 bytes
  const text = new TextRenderable(testSetup.renderer, {
    id: "t",
    content: "gyorskiszolgáló éttermek közül. Azóta alapjaiban értelmeztük újra a vendéglátást",
    wrapMode: "word",
    width: 40,
  })

  testSetup.renderer.root.add(text)
  await testSetup.renderOnce()
  const frame = testSetup.captureCharFrame()
  console.log(frame)

  testSetup.renderer.destroy()
})

Actual output

gyorskiszolgáló éttermek közül.
Azóta alapjaiban értelmeztük újr
a a vendéglátást

Line 1 breaks after "közül." at 31 visible chars (36 bytes), even though the line extended through "Azóta" (37 chars, 43 bytes) fits within width 40. The word "újra" on line 2 is split into "újr" / "a".
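
A split like "újr" / "a" could be caught automatically by comparing the words in the rendered lines against the words in the source text. The following is a sketch only: `breaksOnlyAtWhitespace` is a hypothetical helper, not an OpenTUI API, and it assumes the wrapper never inserts hyphens or other characters of its own.

```typescript
// Hypothetical test helper for illustration; not part of @opentui/core.
// Returns true when the rendered lines contain exactly the source words, in order,
// which means every line break fell on whitespace.
function breaksOnlyAtWhitespace(source: string, lines: string[]): boolean {
  const words = (s: string): string[] => s.split(/\s+/).filter((w) => w.length > 0)
  const sourceWords = words(source)
  const renderedWords = words(lines.join(" "))
  return (
    sourceWords.length === renderedWords.length &&
    sourceWords.every((w, i) => w === renderedWords[i])
  )
}

const source =
  "gyorskiszolgáló éttermek közül. Azóta alapjaiban értelmeztük újra a vendéglátást"

// Lines as reported in the actual output of this issue.
const actualLines = [
  "gyorskiszolgáló éttermek közül.",
  "Azóta alapjaiban értelmeztük újr",
  "a a vendéglátást",
]

console.log(breaksOnlyAtWhitespace(source, actualLines)) // false: "újra" was split
```

In the reproduction test, the captured frame could be split on newlines and passed as `lines`, assuming the frame is a newline-separated string; padding and empty rows are ignored because the helper only looks at whitespace-separated words.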

Expected output

gyorskiszolgáló éttermek közül. Azóta
alapjaiban értelmeztük újra a
vendéglátást

All breaks occur at whitespace boundaries. This is the output produced when accented characters are replaced with their ASCII equivalents (same char count, fewer bytes):

gyorskiszolgalo ettermek kozul. Azota
alapjaiban ertelmeztuk ujra a
vendeglatast
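
The ASCII comparison text can be derived mechanically from the original, which makes it easy to confirm that only the byte count changes. This is a sketch under assumptions: `stripAccents` is a hypothetical helper, and it only handles letters that decompose into a base letter plus a combining mark in the U+0300 to U+036F range, which covers the Hungarian characters used here.

```typescript
// Hypothetical helper for illustration; not part of @opentui/core.
// Decomposes each accented letter (NFD) and drops the combining marks.
const stripAccents = (s: string): string =>
  s.normalize("NFD").replace(/[\u0300-\u036f]/g, "")

const utf8Bytes = (s: string): number => new TextEncoder().encode(s).length
const codePoints = (s: string): number => Array.from(s).length

const original =
  "gyorskiszolgáló éttermek közül. Azóta alapjaiban értelmeztük újra a vendéglátást".normalize("NFC")
const ascii = stripAccents(original)

console.log(ascii)
console.log({
  originalChars: codePoints(original),
  asciiChars: codePoints(ascii),
  originalBytes: utf8Bytes(original),
  asciiBytes: utf8Bytes(ascii),
})
```

Both strings have the same number of characters, so any difference in wrapping between them points at a measurement that depends on the encoded byte length rather than on what is displayed.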

Key observations

  • ASCII text wraps correctly — the word-wrap algorithm itself works fine with single-byte characters.
  • Individual accented characters are measured at width 1 (correct) — "é".repeat(40) fits on a single line at width: 40.
  • The bug is in the line-breaking cost calculation — adding more text after a line causes earlier lines to reflow differently, confirming the engine uses an optimal (non-greedy) algorithm. That algorithm's cost function appears to use byte length instead of display width.
  • Both widthMethod: "unicode" and widthMethod: "wcwidth" produce the same broken output.
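
The byte-length hypothesis can be explored with a small wrapper whose measuring function is pluggable. This is deliberately a greedy sketch, not the optimal algorithm the engine is believed to use, and none of these names (`greedyWrap`, `byCodePoints`, `byUtf8Bytes`) exist in OpenTUI. It only shows how swapping the measure changes where the first line ends; because it never breaks inside a word, it does not reproduce the "újr" / "a" split.

```typescript
// Hypothetical sketch for illustration; not OpenTUI's implementation.
type Measure = (s: string) => number

// Stand-in for display width; valid here because every character is width 1.
const byCodePoints: Measure = (s) => Array.from(s).length
// The suspected (incorrect) measure: encoded UTF-8 length.
const byUtf8Bytes: Measure = (s) => new TextEncoder().encode(s).length

// Greedy word wrap: keep adding words until the measured line would exceed `width`.
function greedyWrap(text: string, width: number, measure: Measure): string[] {
  const lines: string[] = []
  let current = ""
  for (const word of text.split(/\s+/).filter((w) => w.length > 0)) {
    const candidate = current === "" ? word : current + " " + word
    if (current !== "" && measure(candidate) > width) {
      lines.push(current)
      current = word
    } else {
      current = candidate
    }
  }
  if (current !== "") lines.push(current)
  return lines
}

const text =
  "gyorskiszolgáló éttermek közül. Azóta alapjaiban értelmeztük újra a vendéglátást".normalize("NFC")

console.log(greedyWrap(text, 40, byCodePoints))
console.log(greedyWrap(text, 40, byUtf8Bytes))
```

Measured in code points, the greedy wrapper yields the three lines shown under expected output. Measured in bytes, its first line ends after "közül.", the same place the reported output breaks, which is consistent with (though not proof of) a byte-based cost somewhere in the wrapping path.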

Environment

  • Platform: Linux
  • Runtime: Bun 1.3.5
kerem closed this issue 2026-03-14 09:06:48 +03:00

@simonklee commented on GitHub (Feb 21, 2026):

I can't reproduce this on HEAD anymore. I wonder if f891d9631d0c3b8b9b2dd24fe2b8c58804a1b76c fixed it, but I'm not 100% sure.

The output I get before f891d963 is:

  • gyorskiszolgáló éttermek közül. Azóta
  • alapjaiban értelmeztük újra a
  • vendéglátást

So before that fix I see a formatting issue (leading spaces on wrapped continuation lines), but not the exact mid-word split.
