Mirror of https://github.com/anomalyco/opentui.git (synced 2026-04-25 13:06:00 +03:00)
[GH-ISSUE #596] bug: ctrl+w (deleteWordBackward) doesn't recognize word boundaries between CJK and ASCII characters #163
Originally created by @mkusaka on GitHub (Jan 27, 2026).
Original GitHub issue: https://github.com/anomalyco/opentui/issues/596
Description
When mixing CJK (Japanese/Chinese/Korean) characters with ASCII text, ctrl+w / deleteWordBackward deletes both together instead of treating them as separate words.
Steps to Reproduce
Type 日本語abc or テストtest, then press ctrl+w.
Expected Behavior
Only abc (or test) should be deleted, stopping at the CJK-ASCII boundary.
This is the expected behavior per Unicode UAX #29 (Text Segmentation), where CJK characters are treated as individual word units, creating implicit boundaries between CJK and Latin scripts.
Actual Behavior
The entire string 日本語abc is deleted at once.
Root Cause
Looking at utf8.zig, the findWrapBreaks function only recognizes a limited set of characters as word boundaries, including punctuation (-, /, ., etc.) and brackets.
CJK characters and script transitions are not considered word boundaries.
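The root cause described above can be illustrated with a small model. This is a hypothetical TypeScript sketch, not the code in utf8.zig: the names BREAK_CHARS, isBreakChar, and deleteWordBackwardNaive are made up for illustration, and the break set is an assumption (whitespace, the punctuation named in this issue, and brackets). Because no CJK character and no script transition counts as a boundary, the whole run is removed.

```typescript
// Hypothetical model of the reported behavior; NOT the actual findWrapBreaks logic.
// Assumption: only whitespace, a few punctuation characters, and brackets break words.
const BREAK_CHARS = new Set([
  " ", "\t", "\n", "-", "/", ".", "(", ")", "[", "]", "{", "}",
]);

function isBreakChar(ch: string): boolean {
  return BREAK_CHARS.has(ch);
}

// Delete one "word" ending at the end of `text` and return what remains.
function deleteWordBackwardNaive(text: string): string {
  const chars = Array.from(text); // iterate by code point, not UTF-16 unit
  let end = chars.length;
  // Skip trailing break characters first.
  while (end > 0 && isBreakChar(chars[end - 1] ?? "")) end--;
  // Then remove everything back to the previous break character.
  while (end > 0 && !isBreakChar(chars[end - 1] ?? "")) end--;
  return chars.slice(0, end).join("");
}

// With no boundary between 語 and a, the whole string is one "word".
console.log(JSON.stringify(deleteWordBackwardNaive("日本語abc")));
console.log(JSON.stringify(deleteWordBackwardNaive("foo 日本語abc")));
```

In this model the first call leaves an empty string and the second leaves only the text up to and including the space, which matches the behavior reported under Actual Behavior.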
Suggested Fix
Consider implementing UAX #29 compliant word boundary detection, or at minimum:
References
Environment
@hwisu commented on GitHub (Jan 28, 2026):
Treating each character as a separate word does not match the expectations of Korean users.
In practice, word boundaries are expected to be determined by whitespace, not by individual characters.
Changing word-navigation behavior depending on the script (especially when mixed with other languages)
would break long-established editor behavior inherited from Vim and similar tools.
This kind of language-specific branching can easily lead to inconsistent and unpredictable cursor movement.
@simonklee commented on GitHub (Jan 28, 2026):
If I understand this issue correctly, you're both right about the core issue. There's no word boundary detection at CJK-ASCII transitions, so ctrl+w on 日本語abc deletes everything.
Could the following approach work:
Am I missing something?
@mkusaka commented on GitHub (Feb 1, 2026):
@simonklee Thanks — makes sense to me.
FYI: what we’re discussing here seems close to how Vim gets its word-boundary behavior (via “character classes” in mbyte.c):
https://github.com/vim/vim/blob/master/src/mbyte.c#L2925-L2932
Not saying we need to replicate Vim — just sharing it as a reference in case we want to extend the heuristics later.
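As a rough illustration of the character-class idea, here is a minimal TypeScript sketch. It is an assumption-laden example, not Vim's table and not opentui code: the names CharClass, classify, and deleteWordBackward are hypothetical, and the code point ranges (Hiragana and Katakana, CJK Unified Ideographs, Hangul Syllables) are a deliberately small subset. A word boundary is placed wherever the class changes, so a CJK-ASCII transition stops the deletion, while a run of Hangul or kanji is still treated as one unit rather than one word per character.

```typescript
// Hypothetical character classes; the ranges are illustrative, not Vim's table.
type CharClass = "space" | "punct" | "cjk" | "hangul" | "word";

const SPACES = new Set([" ", "\t", "\n", "\r", "\u3000"]); // includes ideographic space

function classify(ch: string): CharClass {
  const cp = ch.codePointAt(0) ?? 0;
  if (SPACES.has(ch)) return "space";
  // Hiragana + Katakana (U+3040..U+30FF) and CJK Unified Ideographs share one class here.
  if ((cp >= 0x3040 && cp <= 0x30ff) || (cp >= 0x4e00 && cp <= 0x9fff)) return "cjk";
  // Hangul Syllables stay in one class, so Korean words are not split per character.
  if (cp >= 0xac00 && cp <= 0xd7a3) return "hangul";
  if (/[A-Za-z0-9_]/.test(ch)) return "word";
  // Assumption: any other non-ASCII character is treated like a word character.
  if (cp >= 0x80) return "word";
  return "punct";
}

// Delete backward to the nearest class transition and return what remains.
function deleteWordBackward(text: string): string {
  const chars = Array.from(text); // iterate by code point
  let end = chars.length;
  // Skip trailing whitespace.
  while (end > 0 && classify(chars[end - 1] ?? "") === "space") end--;
  if (end === 0) return "";
  // Remove the run of characters that share the class of the last one.
  const cls = classify(chars[end - 1] ?? "");
  while (end > 0 && classify(chars[end - 1] ?? "") === cls) end--;
  return chars.slice(0, end).join("");
}

console.log(deleteWordBackward("日本語abc"));
console.log(deleteWordBackward("안녕하세요 세계"));
```

Under these assumptions, deleting backward from 日本語abc removes only abc, and deleting backward from 안녕하세요 세계 removes the whole trailing Hangul run up to the space. Whether Hiragana, Katakana, and kanji should share a class, or whether Hangul should be separated from Latin text at all, is a policy choice this sketch does not settle.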