Mirror of https://github.com/DavidAnson/markdownlint.git (synced 2026-04-25 09:16:02 +03:00)
[GH-ISSUE #1869] Supporting CJK characters in MD060 #763
Originally created by @HowieHz on GitHub (Nov 19, 2025).
Original GitHub issue: https://github.com/DavidAnson/markdownlint/issues/1869
Hello, I would like to ask how to fix the MD060 issue.
I use Prettier to format Markdown files, which produced the following table:
It looks most like the "aligned" style, so I set:
I also used markdownlint-cli to run the lint check, but it reports the following error:
This is the PR containing the update; the GitHub Actions runs in it might help analyze the issue: https://github.com/HowieHz/halo-theme-higan-hz/pull/291
I created a demo to reproduce this issue. You can access it here: stackblitz
I suspect this rule has poor support for CJK characters.
Thank you for reading this far. I wish you a pleasant life.
(My English is not good, so I used a translator.)
@yuluo-yx commented on GitHub (Nov 19, 2025):
Hi @HowieHz, you can refer to: https://github.com/vllm-project/semantic-router/pull/700
@DavidAnson commented on GitHub (Nov 19, 2025):
Below is an example of your table with no warnings. It's not that the rule does not support extended characters; rather, the width at which they are rendered can be inconsistent. This scenario is described in a little more detail, with emoji examples, here: https://github.com/DavidAnson/markdownlint/blob/main/doc/md060.md
https://dlaa.me/markdownlint/#%25m%23%20Issue%201869%0A%0A%7C%20%E8%B7%AF%E5%BE%84%E5%8C%B9%E9%85%8D%20%20%20%20%20%20%20%20%20%20%20%7C%20%E5%8C%B9%E9%85%8D%E5%8C%BA%E5%9F%9F%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%7C%0A%7C%20--------------%20%7C%20--------------------------%20%7C%0A%7C%20%60%2Farchives%2F**%60%20%7C%20%60article%20.content%60%20%20%20%20%20%20%20%20%20%7C%0A%7C%20%60%2Fmoments%60%20%20%20%20%20%7C%20%60article%20.content%20.medium%60%20%7C%0A%7C%20%60%2Fmoments%2F**%60%20%20%7C%20%60article%20.content%20.medium%60%20%7C%0A%7C%20%60%2Fphotos%60%20%20%20%20%20%20%7C%20%60article%20.content%60%20%20%20%20%20%20%20%20%20%7C%0A%7C%20%60%2Fphotos%2F**%60%20%20%20%7C%20%60article%20.content%60%20%20%20%20%20%20%20%20%20%7C%0A%0A%3C!--%20markdownlint-configure-file%20%7B%0A%20%20%22MD060%22%3A%20%7B%0A%20%20%20%20%22style%22%3A%20%22aligned%22%0A%20%20%7D%0A%7D%20--%3E%0A
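To make the point concrete, here is a minimal TypeScript sketch that measures the first header cell of the linked example in two ways. The helper names are hypothetical and this is not markdownlint's code. Counting one column per code point, "路径匹配" plus its 10 padding spaces is 14 columns, matching the 14-dash delimiter cell, which is why the example passes. Counting each CJK ideograph as two columns gives 18 instead. As a simplifying assumption, only the CJK Unified Ideographs block (U+4E00 to U+9FFF) is treated as wide here.

```typescript
// Hypothetical helpers for illustration; not markdownlint's implementation.

// One column per code point: how the example table above is padded.
function codePointWidth(text: string): number {
  return Array.from(text).length;
}

// Simplified assumption: only CJK Unified Ideographs (U+4E00..U+9FFF) are wide.
function isCjkIdeograph(codePoint: number): boolean {
  return codePoint >= 0x4e00 && codePoint <= 0x9fff;
}

// Two columns per CJK ideograph, one column for everything else.
function eastAsianWidth(text: string): number {
  let width = 0;
  for (const ch of text) {
    width += isCjkIdeograph(ch.codePointAt(0)!) ? 2 : 1;
  }
  return width;
}

// Header cell from the linked example: four Chinese characters plus ten spaces.
const headerCell = "路径匹配" + " ".repeat(10);
const delimiterCell = "-".repeat(14);

console.log(codePointWidth(headerCell), codePointWidth(delimiterCell)); // 14 14
console.log(eastAsianWidth(headerCell), eastAsianWidth(delimiterCell)); // 18 14
```

Under the first measure the pipes line up in the source text; under the second (the convention discussed in the rest of this thread) the header cell would need four fewer padding spaces.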
@HowieHz commented on GitHub (Nov 20, 2025):
Thank you for your reply @DavidAnson. I think Prettier's formatting results might be more "aligned". What I mean is, considering that CJK (Chinese, Japanese, Korean) characters typically occupy two character widths in monospaced fonts, perhaps there could be an option to count CJK characters as two units wide, or a configuration setting that allows specifying character sets to be counted with specified widths.
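The second idea, user-specified character sets with their own widths, could be modeled roughly as a list of rules where the first matching rule decides a code point's width. This is a hypothetical configuration shape sketched in TypeScript; MD060 is not claimed to accept anything like it. The Script property escapes used below are standard ES2018 RegExp syntax.

```typescript
// Hypothetical configuration shape; not an actual MD060 option.
interface WidthRule {
  pattern: RegExp; // tested against a single code point
  width: number; // columns to count for a matching code point
}

// Width of `text`: the first matching rule wins per code point, default is 1.
function configurableWidth(text: string, rules: WidthRule[]): number {
  let width = 0;
  for (const ch of text) {
    const rule = rules.find((r) => r.pattern.test(ch));
    width += rule ? rule.width : 1;
  }
  return width;
}

// Example rule set: Han, Hiragana, Katakana, and Hangul count as two columns.
const cjkRules: WidthRule[] = [
  {
    pattern: /[\p{Script=Han}\p{Script=Hiragana}\p{Script=Katakana}\p{Script=Hangul}]/u,
    width: 2,
  },
];

console.log(configurableWidth("表格 ok", cjkRules)); // 7
```

The patterns deliberately omit the g flag, because RegExp.prototype.test with g is stateful across calls.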
For example, like the implementation in this PR: https://github.com/prettier/prettier/pull/3003
@DavidAnson commented on GitHub (Nov 20, 2025):
On my iPhone, in the GitHub web UI for editing a Markdown file, the right-most pipe characters would align perfectly if each check mark emoji rendered with the width of two standard characters (3x2 = 5+1). However, that pipe character renders too far to the right, which means these emoji occupy something like 2.1 normal character widths!
Assuming an exact width of 2.0 does not work for the first character I tried, so this does not seem like a guaranteed improvement to me. Furthermore, I do not see a good way to account for fractional widths even if we could tell exactly how wide each character would be for each font/program/OS/etc.
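The arithmetic behind that observation, written out as a short TypeScript snippet. The 2.1 figure is the approximate width reported above, not a measured constant.

```typescript
// Illustrative arithmetic for the iPhone observation (widths in "normal" columns).
const emojiCount = 3;
const assumedWidth = 2.0; // what a "count emoji as two columns" rule would assume
const observedWidth = 2.1; // roughly what was observed on the device

const assumedTotal = emojiCount * assumedWidth; // 6 columns, i.e. 5 + 1
const observedTotal = emojiCount * observedWidth; // about 6.3 columns
const overshoot = observedTotal - assumedTotal; // about 0.3 columns

// Padding can only change in whole spaces, so the best correction rounds to zero
// and the pipe still lands about 0.3 columns too far to the right.
const bestPaddingCorrection = Math.round(overshoot);
console.log(assumedTotal, overshoot.toFixed(1), bestPaddingCorrection); // 6 0.3 0
```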
@Mister-Hope commented on GitHub (Nov 20, 2025):
Yes, that's why I also suggest providing some options that may:
Currently, when handling multilingual docs and working with tools like Prettier, I don't see any way to get this working.
@HowieHz commented on GitHub (Nov 20, 2025):
@DavidAnson Thank you for the response. I understand your concern about emoji rendering inconsistencies, but I'd like to suggest we treat CJK characters separately from emoji as a starting point.
CJK characters have much more reliable width behavior than emoji:
CJK characters (Chinese/Japanese/Korean) in monospaced fonts have a well-established convention of occupying exactly 2 character widths. This is standardized through:
wcwidth implementation
Unlike emoji (which, as you demonstrated, can render at fractional widths like 2.1), CJK characters have predictable width behavior that tools like Prettier, VS Code, and most terminal emulators handle consistently.
There are already mature implementation solutions:
Prettier handles similar issues using the is-fullwidth-code-point library to determine character width. For example, in prettier/prettier#3003, they implemented accurate width calculation for fullwidth characters using this library.
For a more precise implementation of the Unicode EastAsianWidth specification, there's also the is-full-width library, which provides a complete implementation conforming to the official Unicode EastAsianWidth specification v11.0.
Proposed approach:
This would solve the immediate alignment issues with CJK content (which is very common in Asian markets) without trying to solve the more complex emoji rendering problem at the same time. Emoji support could be considered separately later if needed.
What do you think about this approach?
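At its core, the library-based approach mentioned above reduces to a code point range check. The TypeScript sketch below is written in the spirit of is-fullwidth-code-point but does not reproduce it: it lists only a small subset of the Unicode East Asian Width "W" and "F" ranges, which is an explicit simplification.

```typescript
// Hypothetical, simplified fullwidth check; only a subset of the
// East Asian Width "W" (wide) and "F" (fullwidth) ranges is listed.
const WIDE_RANGES: Array<[number, number]> = [
  [0x1100, 0x115f], // Hangul Jamo leading consonants
  [0x3000, 0x303e], // CJK Symbols and Punctuation (e.g. ideographic full stop)
  [0x3041, 0x30ff], // Hiragana and Katakana
  [0x4e00, 0x9fff], // CJK Unified Ideographs
  [0xac00, 0xd7a3], // Hangul Syllables
  [0xff01, 0xff60], // Fullwidth forms of ASCII (e.g. fullwidth comma U+FF0C)
];

function isWideCodePoint(codePoint: number): boolean {
  return WIDE_RANGES.some(([start, end]) => codePoint >= start && codePoint <= end);
}

// Two columns for wide code points, one for everything else.
function displayWidth(text: string): number {
  let width = 0;
  for (const ch of text) {
    width += isWideCodePoint(ch.codePointAt(0)!) ? 2 : 1;
  }
  return width;
}

console.log(displayWidth("한글"), displayWidth("かな"), displayWidth("a\uFF0Cb")); // 4 4 4
```

Because fullwidth punctuation falls inside these ranges, a range table like this also covers CJK punctuation, which script-based matching alone would not.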
@DavidAnson commented on GitHub (Nov 20, 2025):
If I understand correctly, you're saying CJK characters are consistently rendered at exactly the width of 2 "normal" characters. The original example from above uses the below four Chinese (?) characters, so I'd expect the rendering of the following table on GitHub under macOS with all defaults to line up perfectly. But that's not what I see - the result is off by 2 full "normal" character widths. (I can provide an image if you don't see the same thing. But I also checked the behavior in VS Code and it's off by about 1.5 characters.) Therefore, I do not think this 2-character width convention is consistent or universal. Maybe if I changed my OS language the rendering would change, but I don't love the ambiguity and inconsistency of that.
The suggestion above to ignore all tables with "special" characters seems like something that could reasonably be done. It would probably mean that tables with any non-Latin characters like emoji, CJK, etc. would never be subject to MD060. Maybe that's the least bad option, but I'd love to come up with something better if we can.
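The fallback floated above, skipping MD060 for tables that contain "special" characters, could look roughly like this hypothetical TypeScript check. It treats anything outside printable ASCII as special. That is an assumption: it also skips tables with accented Latin letters, which may or may not be desirable.

```typescript
// Hypothetical fallback: a line is "special" if it has any character outside
// printable ASCII (U+0020..U+007E), which covers emoji, CJK, and accented letters.
function containsSpecialCharacters(line: string): boolean {
  return /[^\x20-\x7e]/.test(line);
}

// Only check alignment when no line of the table contains such characters.
function shouldCheckTableAlignment(tableLines: string[]): boolean {
  return !tableLines.some(containsSpecialCharacters);
}

console.log(shouldCheckTableAlignment(["| a | b |", "| - | - |"])); // true
console.log(shouldCheckTableAlignment(["| 路径 | b |", "| - | - |"])); // false
```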
@HowieHz commented on GitHub (Nov 20, 2025):
@DavidAnson Thank you for the response. I mean that, with a monospaced font, a fullwidth character should occupy the width of two halfwidth characters. In practice, however, there may be differences due to various typographic optimizations (fonts are not perfectly monospaced). How to implement this is up to you — having grown up in a CJK environment, I may treat this as common sense, so it might be hard to explain the matter clearly at first.
EDIT: Screenshot from China's largest Q&A community.
https://www.zhihu.com/question/334669192
@DavidAnson commented on GitHub (Nov 20, 2025):
Thanks. I'll experiment with this tonight.
@gibfahn commented on GitHub (Nov 20, 2025):
For what it's worth, I came across this because I updated markdownlint, and now the way Prettier formats my Markdown trips markdownlint. It would be great if these two tools did the same thing.
Example:
What it looks like in kitty terminal / Hasklug Nerd Font Mono:
Interesting that GitHub shows it as non-aligned...
@DavidAnson commented on GitHub (Nov 20, 2025):
@gibfahn, what behavior are you saying prettier implements? Your sample suggests it may be treating emoji as two characters also? Can you give some examples of how prettier formats things so I can get a sense of what compatibility might look like?
@Mister-Hope commented on GitHub (Nov 21, 2025):
Thanks to @gibfahn for the research.
Again, I want to express my opinion: a better default should certainly be provided. But since some CJK fonts are not exactly 2:1, and emoji widths may also differ between fonts, should we provide an option to tweak the width of emoji and CJK characters?
In most editors (like VS Code), you can customize the editor font. Most Chinese developers are expected to choose a font with an exact 2:1 ratio (which greatly improves the experience of working with CJK), while GitHub itself might render things a little differently, so people should be able to choose manually how they want tables aligned.
@HowieHz commented on GitHub (Nov 21, 2025):
@DavidAnson Thank you for your work. Here is an online demo demonstrating the incompatibility issue between the new version of markdownlint and prettier. This is why I initially opened this issue, because prettier has been treating full-width characters as width 2 since prettier/prettier#3003.
@DavidAnson commented on GitHub (Nov 23, 2025):
Okay, here's the proposed implementation I've come up with after a bunch of research: https://github.com/DavidAnson/markdownlint/commit/f44a15e4309d31d710f2b481fda8aa7452b77d47
The documentation isn't updated, and this commit does NOT handle CJK characters yet, but it DOES handle emoji and demonstrates the concept. Handling CJK characters should be as easy as updating the default RegExp to include something like \p{East_Asian_Width=Wide}, but unfortunately, that Unicode property does not seem to be supported anywhere (see https://github.com/tc39/proposal-regexp-unicode-property-escapes/issues/28), so I need to do something more complicated. (Coming soon...)
Note that this implementation IS user-configurable, so folks won't be locked to the default list.
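A quick way to confirm the limitation mentioned above is to feature-detect a property escape before relying on it. The TypeScript sketch below does that, then shows a hypothetical configurable default that counts Extended_Pictographic code points (most emoji) as two columns. It illustrates the idea only and does not reproduce the linked commit.

```typescript
// Returns true if the current engine accepts `source` as a Unicode-mode RegExp.
function supportsPropertyEscape(source: string): boolean {
  try {
    new RegExp(source, "u");
    return true;
  } catch {
    return false; // SyntaxError, e.g. "Invalid property name"
  }
}

// East_Asian_Width is not among the properties ECMAScript RegExp supports,
// while Extended_Pictographic (useful for emoji) is.
console.log(supportsPropertyEscape("\\p{East_Asian_Width=Wide}")); // false
console.log(supportsPropertyEscape("\\p{Extended_Pictographic}")); // true

// Hypothetical user-overridable default pattern for "wide" code points.
const defaultWidePattern = /\p{Extended_Pictographic}/u;

function widthWithPattern(text: string, widePattern: RegExp = defaultWidePattern): number {
  let width = 0;
  for (const ch of text) {
    width += widePattern.test(ch) ? 2 : 1;
  }
  return width;
}

console.log(widthWithPattern("ok \u2705")); // 5
```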
@HowieHz commented on GitHub (Nov 23, 2025):
A regex reference for matching Chinese, Japanese, and Korean:
GPT told me that in ES2018+ you can use the regexp /(\p{East_Asian_Width=Fullwidth}|\p{East_Asian_Width=Wide})/gu, but none of the environments I have on hand can run it; they all report "Invalid property name". Maybe it's an AI hallucination.
@DavidAnson commented on GitHub (Nov 23, 2025):
@HowieHz That's great news, thank you! I was afraid I would have to build a set of Unicode character ranges for CJK and that would've been annoying. A short RegExp string will be easier to work with and easier to customize. I should be able to update my prototype tomorrow to get this working for CJK, though it might take another day after that before I'm ready to commit the changes.
@DavidAnson commented on GitHub (Nov 24, 2025):
Okay, I've got it working for Emoji and CJK (after a few detours): https://github.com/DavidAnson/markdownlint/compare/next...wide
@DavidAnson commented on GitHub (Nov 24, 2025):
Actually, the more I look at this, the more I think I need to just use this package, because there are so many special cases around character width: https://github.com/sindresorhus/string-width
That would mean removing the ability to customize the behavior, but that shouldn't be necessary if this package gets things right, which it seems to do better than anything else.
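Whatever measures width, an "aligned" check only needs a width function that it can call per character. The TypeScript sketch below is a hypothetical simplification, not MD060's logic: it records the column of every pipe in each row and compares them. A real implementation could pass string-width's default export (stringWidth(text)) as the width function. The sketch ignores escaped pipes and pipes inside code spans.

```typescript
// Hypothetical width function type; string-width's default export would fit it.
type WidthFn = (text: string) => number;

const codePointCount: WidthFn = (text) => Array.from(text).length;

// Simple stand-in: Han characters are two columns, everything else one.
const hanWide: WidthFn = (text) => {
  let width = 0;
  for (const ch of text) width += /\p{Script=Han}/u.test(ch) ? 2 : 1;
  return width;
};

// Columns at which "|" appears in a row, measured with `width`.
function pipeColumns(row: string, width: WidthFn): number[] {
  const columns: number[] = [];
  let column = 0;
  for (const ch of row) {
    if (ch === "|") columns.push(column);
    column += width(ch);
  }
  return columns;
}

// True when every row's pipes land in the same columns as the first row's.
function isAligned(rows: string[], width: WidthFn): boolean {
  if (rows.length === 0) return true;
  const expected = pipeColumns(rows[0], width).join(",");
  return rows.every((row) => pipeColumns(row, width).join(",") === expected);
}

const rows = ["| 表 | b |", "| -- | - |"];
console.log(isAligned(rows, codePointCount), isAligned(rows, hanWide)); // false true
```

Keeping the width function pluggable is what would let the measuring strategy change (regular expression, range table, or a package) without touching the alignment logic.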
@HowieHz commented on GitHub (Nov 24, 2025):
I think that makes sense, this package is good.
I just realized that https://github.com/sindresorhus/is-fullwidth-code-point and https://github.com/sindresorhus/string-width are by the same author.
In CJK environments, we generally use fullwidth punctuation, which also takes up two columns and aligns with CJK characters, such as ,。“”‘’: rather than ,.""'':
@DavidAnson commented on GitHub (Nov 24, 2025):
Yes, one of the new test cases I added happened to have a full width comma...
@Mister-Hope commented on GitHub (Nov 24, 2025):
I don't think this is the correct approach. Tables are expected to align according to the raw content width shown in editors, so the third example in the package's documentation is not acceptable (at least to me).
I would prefer a \uxxxx sequence to be counted at a width of 6, the way it is displayed. Tables should be human-readable, not print-readable.
@DavidAnson commented on GitHub (Nov 24, 2025):
@Mister-Hope, can you please explain more? Are you asking to be able to control the width of every individual code point? I don't think that's practical. The regular expression approach I have proposed allows identifying characters that should be treated as two columns instead of one. But it does not handle fractional width for complexity reasons. The approach I proposed immediately above to use the string-width package is more consistent with how the prettier formatter behaves and should produce the same behavior as what I've already prototyped for markdownlint. This new test file, for example, produces no violations and should look right to anyone with the appropriate CJK fonts installed: https://raw.githubusercontent.com/DavidAnson/markdownlint/refs/heads/wide/test/table-column-style-wide-characters.md
@Mister-Hope commented on GitHub (Nov 24, 2025):
I just mean this isn't correct with our usage.
@DavidAnson commented on GitHub (Nov 25, 2025):
@Mister-Hope Those look like ANSI escape codes. I would not expect them in Markdown, but if you pass the countAnsiEscapeCodes option, they should get counted?
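The disagreement comes down to whether escape sequences are stripped before measuring. The hypothetical TypeScript sketch below contrasts the two views on a bold "古", similar to the string-width README example. It does not reproduce string-width's exact rules: it handles only SGR sequences of the form ESC "[" params "m" and counts only Han characters as wide.

```typescript
// Hypothetical sketch; only SGR sequences (ESC "[" digits/semicolons "m") are handled.
const SGR_PATTERN = /\u001b\[[0-9;]*m/g;

// Raw width: every code point counts as one column, escape characters included.
function rawWidth(text: string): number {
  return Array.from(text).length;
}

// Visible width: strip SGR sequences first, then count Han characters as two columns.
function visibleWidth(text: string): number {
  let width = 0;
  for (const ch of text.replace(SGR_PATTERN, "")) {
    width += /\p{Script=Han}/u.test(ch) ? 2 : 1;
  }
  return width;
}

const styled = "\u001b[1m古\u001b[22m"; // bold 古
console.log(rawWidth(styled), visibleWidth(styled)); // 10 2
```

An editor that shows the escape character as a visible placeholder would yield yet another width, which is part of why no single count satisfies every view.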
@HowieHz commented on GitHub (Nov 25, 2025):
Let's test it in Prettier. stackblitz
@DavidAnson commented on GitHub (Nov 25, 2025):
The examples above are all clean (no violations) in the demo app now. Thanks everyone for helping!