mirror of
https://github.com/DavidAnson/markdownlint.git
[GH-ISSUE #1458] MD013: Incorrect count on lines with multi-byte unicode characters #687
Originally created by @Maneren on GitHub (Dec 26, 2024).
Original GitHub issue: https://github.com/DavidAnson/markdownlint/issues/1458
Hi, I copied a paragraph from a PDF, and it contained hard-coded Unicode italic characters, each of which takes 4 bytes in UTF-8 or two 16-bit code units (also 4 bytes) in UTF-16. After pasting it into a Markdown file and saving the file as UTF-8, I started receiving a `Line length [Expected: 80, Actual: 85]` warning, even though only 74 Unicode characters are displayed on the line (stored as 107 bytes). I assume the intention of the rule is to count characters as they are displayed in the editor, which would be 74 in this case.
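To make the three counts concrete, here is a small JavaScript sketch (Node.js or a browser console). The reporter's exact line is not given, so it uses a hypothetical line built to match the reported numbers, assuming every character other than the italic ones is ASCII: if n of the 74 characters take 4 bytes, then 74 - n + 4n = 107 gives n = 11, and 74 + 11 = 85 UTF-16 code units, which matches the `Actual: 85` in the warning. U+1D44E MATHEMATICAL ITALIC SMALL A is an assumed stand-in for the pasted italics, and the variable names are illustrative only.

```javascript
// Hypothetical sample line (not the reporter's actual text):
// 63 ASCII letters followed by 11 copies of U+1D44E, a code point
// outside the Basic Multilingual Plane (assumed stand-in for the italics).
const italicA = String.fromCodePoint(0x1d44e);
const line = "a".repeat(63) + italicA.repeat(11);

// UTF-16 code units: what String.prototype.length reports.
const utf16Units = line.length;

// Unicode code points: spreading a string iterates it by code point.
const codePoints = [...line].length;

// UTF-8 bytes: how the line is stored in a UTF-8 file.
const utf8Bytes = new TextEncoder().encode(line).length;

// Without the `u` flag, `.` matches a single UTF-16 code unit,
// so each astral character is matched as two separate surrogates.
const dotMatchesLegacy = line.match(/./g).length;

// With the `u` flag, `.` matches one whole code point.
const dotMatchesUnicode = line.match(/./gu).length;

console.log({ utf16Units, codePoints, utf8Bytes, dotMatchesLegacy, dotMatchesUnicode });
```

This reports 85 UTF-16 code units, 74 code points and 107 UTF-8 bytes, and the `.` regex matches 85 times without `u` but 74 times with it. Code points are still not always what an editor shows as one character (combining accents and emoji sequences span several code points); `Intl.Segmenter` with `granularity: "grapheme"` counts grapheme clusters if that level of accuracy is wanted.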
I may be missing some context or detail of the implementation, but I think the issue is a combination of two things: JavaScript handles strings as UTF-16 code units rather than UTF-8 (hence the seemingly incorrect `.length` reported for the line), and the rule uses regular, Unicode-unaware regular expressions, where `.` again matches a single UTF-16 code unit. So I think the correct way to handle this would be to use `[...line].length` to get the total length of the line, and to add the `u` flag to the regular expressions to switch them to Unicode mode.
@DavidAnson commented on GitHub (Dec 26, 2024):
Related: #564