[GH-ISSUE #1869] Supporting CJK characters in MD060 #763


@yuluo-yx commented on GitHub (Nov 19, 2025):

Hi @HowieHz, you can refer to: https://github.com/vllm-project/semantic-router/pull/700


@DavidAnson commented on GitHub (Nov 19, 2025):

Below is an example of your table with no warnings: https://dlaa.me/markdownlint/#%25m%23%20Issue%201869%0A%0A%7C%20%E8%B7%AF%E5%BE%84%E5%8C%B9%E9%85%8D%20%20%20%20%20%20%20%20%20%20%20%7C%20%E5%8C%B9%E9%85%8D%E5%8C%BA%E5%9F%9F%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%7C%0A%7C%20--------------%20%7C%20--------------------------%20%7C%0A%7C%20%60%2Farchives%2F**%60%20%7C%20%60article%20.content%60%20%20%20%20%20%20%20%20%20%7C%0A%7C%20%60%2Fmoments%60%20%20%20%20%20%7C%20%60article%20.content%20.medium%60%20%7C%0A%7C%20%60%2Fmoments%2F**%60%20%20%7C%20%60article%20.content%20.medium%60%20%7C%0A%7C%20%60%2Fphotos%60%20%20%20%20%20%20%7C%20%60article%20.content%60%20%20%20%20%20%20%20%20%20%7C%0A%7C%20%60%2Fphotos%2F**%60%20%20%20%7C%20%60article%20.content%60%20%20%20%20%20%20%20%20%20%7C%0A%0A%3C!--%20markdownlint-configure-file%20%7B%0A%20%20%22MD060%22%3A%20%7B%0A%20%20%20%20%22style%22%3A%20%22aligned%22%0A%20%20%7D%0A%7D%20--%3E%0A

It's not that the rule does not support extended characters, but rather that the width at which they are rendered can be inconsistent. This scenario is described in a little more detail, with emoji examples, here: https://github.com/DavidAnson/markdownlint/blob/main/doc/md060.md


@HowieHz commented on GitHub (Nov 20, 2025):

> Below is an example of your table with no warnings. It's not that the rule does not support extended characters but instead that the width they are rendered at can be inconsistent. This scenario is described a little more with emoji examples here: https://github.com/DavidAnson/markdownlint/blob/main/doc/md060.md
>
> | 路径匹配           | 匹配区域                       |
> | -------------- | -------------------------- |
> | `/archives/**` | `article .content`         |
> | `/moments`     | `article .content .medium` |
> | `/moments/**`  | `article .content .medium` |
> | `/photos`      | `article .content`         |
> | `/photos/**`   | `article .content`         |

Thank you for your reply, @DavidAnson. I think Prettier's formatting results look more "aligned". What I mean is this: since CJK (Chinese, Japanese, Korean) characters typically occupy two character widths in monospaced fonts, perhaps we could provide an option to count CJK characters as two units wide, or a configuration that lets users specify character sets and the widths they should be counted with.

For example, see the implementation in this PR: https://github.com/prettier/prettier/pull/3003
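A minimal sketch of the second idea (user-specified widths per character set), assuming a hypothetical option shaped as a list of pattern/width pairs. Neither that option nor the `displayWidth` helper exists in markdownlint; code points that match no rule count as one column.

```javascript
// Hypothetical width rules: each entry pairs a Unicode-aware RegExp with a column width.
// Rules are checked in order and the first match wins.
const defaultRules = [
  { pattern: /\p{Script=Han}|\p{Script=Hiragana}|\p{Script=Katakana}|\p{Script=Hangul}/u, width: 2 },
];

// Sums the column width of each code point (for...of iterates code points, not UTF-16 units).
function displayWidth(text, rules = defaultRules) {
  let total = 0;
  for (const ch of text) {
    const rule = rules.find((r) => r.pattern.test(ch));
    total += rule ? rule.width : 1;
  }
  return total;
}

console.log(displayWidth("路径匹配")); // 8
console.log(displayWidth("12345678")); // 8
console.log(displayWidth("路径匹配", [])); // 4 (no rules: every code point counts as 1)
```

With the CJK rule in place, four Chinese characters and eight ASCII digits have the same width, which is the behavior the proposal asks for.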


@DavidAnson commented on GitHub (Nov 20, 2025):

On my iPhone, editing a Markdown file in the GitHub web UI, the right-most pipe characters would align perfectly if each check mark emoji rendered at exactly the width of two standard characters (3 × 2 = 5 + 1: three emoji versus the five-letter "Emoji" heading plus its trailing space). However, that pipe character renders too far to the right, which means these emoji occupy something like 2.1 normal character widths!

Assuming an exact width of 2.0 does not work for the first character I tried, so this does not seem like a guaranteed improvement to me. Furthermore, I do not see a good way to account for fractional widths, even if we could tell exactly how wide each character would be for each font/program/OS/etc.

```
| Response | Emoji |
| -------- | ----- |
| Yes      | ✅✅✅|
```
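The alignment arithmetic in this comment can be made concrete with a small sketch. It assumes an idealized model in which a pipe's horizontal position is simply the sum of the widths before it (real renderers also involve font fallback, kerning, and so on). With emoji exactly 2.0 columns wide the pipes line up; at 2.1 columns the data row's pipe overshoots by 0.3 columns.

```javascript
// Columns between the pipes in "| Emoji |": a leading space, the 5-letter heading, a trailing space.
const headingSpan = 1 + "Emoji".length + 1; // 7

// The data cell "| ✅✅✅|" has one leading space and no trailing space before the pipe.
function cellSpan(emojiWidth) {
  return 1 + 3 * emojiWidth;
}

// Positive values mean the data row's pipe lands to the right of the heading's pipe.
function overshoot(emojiWidth) {
  return cellSpan(emojiWidth) - headingSpan;
}

console.log(overshoot(2)); // 0
console.log(overshoot(2.1).toFixed(1)); // 0.3
```

Even a small per-character error accumulates with the number of emoji in a cell, which is why a fixed integer width cannot match every renderer.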


@Mister-Hope commented on GitHub (Nov 20, 2025):

Yes, that's why I also suggest providing some options that could:

1. Adjust the widths of CJK characters, emoji, and even other characters individually.
2. Support ignoring tables that contain special characters (tables written in English would still be checked for "aligned", while tables with other characters are ignored). This is better than disabling the rule entirely, since markdownlint itself and its CLIs do not support ESLint-style overrides that let us tweak rules for some files using globs.

Currently, when handling multilingual docs and working with tools like Prettier, I see no way to keep this rule enabled.
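A rough sketch of the second option, assuming "special" means any character outside printable ASCII. The `shouldCheckTable` helper is hypothetical and not part of markdownlint: a table would be subject to MD060's "aligned" check only when every row is plain ASCII.

```javascript
// Hypothetical filter: returns true when every line of a table contains only
// printable ASCII (space through tilde), so an English-only table is still checked.
function shouldCheckTable(tableLines) {
  return tableLines.every((line) => /^[\x20-\x7E]*$/.test(line));
}

const english = ["| Foo | Bar |", "| --- | --- |", "| A   | B   |"];
const chinese = ["| 路径匹配 |", "| -------- |", "| 12345678 |"];

console.log(shouldCheckTable(english)); // true
console.log(shouldCheckTable(chinese)); // false
```

The trade-off is that a single emoji or CJK character anywhere in a table silently exempts the whole table from the rule.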


@HowieHz commented on GitHub (Nov 20, 2025):

@DavidAnson Thank you for the response. I understand your concern about emoji rendering inconsistencies, but I'd like to suggest we treat CJK characters separately from emoji as a starting point.

CJK characters have much more reliable width behavior than emoji:

CJK characters (Chinese/Japanese/Korean) in monospaced fonts have a well-established convention of occupying exactly 2 character widths. This is standardized through:

- Unicode Standard Annex #11 (East Asian Width): https://www.unicode.org/reports/tr11/
- The widely adopted wcwidth implementation
- Consistent behavior across modern terminals, editors, and monospaced fonts

Unlike emoji (which as you demonstrated can render at fractional widths like 2.1), CJK characters have predictable width behavior that tools like Prettier, VS Code, and most terminal emulators handle consistently.

There are already mature implementation solutions:

Prettier handles similar issues using the is-fullwidth-code-point library (https://github.com/sindresorhus/is-fullwidth-code-point) to determine character width. For example, in prettier/prettier#3003 (https://github.com/prettier/prettier/pull/3003), they implemented accurate width calculation for fullwidth characters using this library.

For a more precise implementation of the Unicode East Asian Width specification, there's also the is-full-width library (https://github.com/NXMIX/is-full-width), which provides a complete implementation conforming to the official Unicode EastAsianWidth specification v11.0.

Proposed approach:

```jsonc
"MD060": {
  "style": "aligned",
  "use_east_asian_width": true // Uses fullwidth calculation for CJK
}
```

This would solve the immediate alignment issues with CJK content (which is very common in Asian markets) without trying to solve the more complex emoji rendering problem at the same time. Emoji support could be considered separately later if needed.

What do you think about this approach?


@DavidAnson commented on GitHub (Nov 20, 2025):

If I understand correctly, you're saying CJK characters are consistently rendered at exactly the width of 2 "normal" characters. The original example above uses the four Chinese (?) characters below, so I'd expect the rendering of the following table on GitHub under macOS with all defaults to line up perfectly. But that's not what I see: the result is off by 2 full "normal" character widths. (I can provide an image if you don't see the same thing. I also checked the behavior in VS Code, and it's off by about 1.5 characters.) Therefore, I do not think this 2-character width convention is consistent or universal. Maybe the rendering would change if I changed my OS language, but I don't love the ambiguity and inconsistency of that.

```
| 路径匹配 |
| -------- |
| 12345678 |
```

The suggestion above to ignore all tables with "special" characters seems like something that could reasonably be done. It would probably mean that tables with any non-Latin characters like emoji, CJK, etc. would never be subject to MD060. Maybe that's the least bad option, but I'd love to come up with something better if we can.


@HowieHz commented on GitHub (Nov 20, 2025):

> | 路径匹配 |
> | -------- |
> | 12345678 |

@DavidAnson Thank you for the response. What I mean is that, with a monospaced font, a fullwidth character should occupy the width of two halfwidth characters. In practice, however, there may be differences due to various typographic optimizations (fonts are not perfectly monospaced). How to implement this is up to you. Having grown up in a CJK environment, I may treat this as common sense, so it might be hard for me to explain the matter clearly at first.

EDIT: Screenshot from China's largest Q&A community.
https://www.zhihu.com/question/334669192


@DavidAnson commented on GitHub (Nov 20, 2025):

Thanks. I'll experiment with this tonight.


@gibfahn commented on GitHub (Nov 20, 2025):

For what it's worth, I came across this because I updated markdownlint, and now the way Prettier (https://prettier.io) formats my Markdown triggers markdownlint warnings. It would be great if these two tools did the same thing.

Example:

```markdown
| Foo | Bar    |
| --- | ------ |
| A   | ✅⚠️❌ |
```

```
path/to/t.md:23:14 MD060/table-column-style Table column style [Table pipe does not align with heading for style "aligned"]
```

What it looks like in kitty terminal / Hasklug Nerd Font Mono:

Interesting that GitHub shows it as non-aligned...
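Part of the difficulty in this example is that "one character" is ambiguous in JavaScript. The sketch below (standard APIs only, no markdownlint code) spells the cell with escapes so the invisible emoji variation selector after the warning sign is explicit, then counts UTF-16 code units, code points, and grapheme clusters. The closing assumption, that each emoji grapheme counts as two columns, is what the sample output suggests, not a confirmed description of Prettier's internals.

```javascript
// The cell from the example above:
// U+2705 (check mark), U+26A0 U+FE0F (warning sign + emoji presentation selector), U+274C (cross mark).
const cell = "\u2705\u26A0\uFE0F\u274C";

const codeUnits = cell.length; // UTF-16 code units
const codePoints = [...cell].length; // Unicode code points
const graphemes = [...new Intl.Segmenter("en", { granularity: "grapheme" }).segment(cell)].length;

console.log(codeUnits, codePoints, graphemes); // 4 4 3
// Counting each emoji grapheme as two columns gives 6, matching the "------" separator.
console.log(graphemes * 2); // 6
```

Neither `.length` nor the code point count gives the 6 that the separator row implies, so any width calculation has to work on grapheme clusters.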


@DavidAnson commented on GitHub (Nov 20, 2025):

@gibfahn, what behavior are you saying prettier implements? Your sample suggests it may be treating emoji as two characters also? Can you give some examples of how prettier formats things so I can get a sense of what compatibility might look like?


@Mister-Hope commented on GitHub (Nov 21, 2025):

Thanks for the research @gibfahn did.

Again, I want to express my opinion: a better default should certainly be provided. But since some CJK fonts are not exactly 2:1, and emoji widths may also differ between fonts, we might need to provide an option to tweak the widths of emoji and CJK characters.

In most editors (like VS Code), you can customize the editor font. Most Chinese developers are expected to choose a font with an exact 2:1 ratio (which greatly improves the development experience with CJK), while GitHub itself may render slightly differently, so people should be able to choose manually how they want their tables aligned.


@HowieHz commented on GitHub (Nov 21, 2025):

> I created a demo to reproduce this issue. You can access it here: stackblitz (https://stackblitz.com/edit/vitejs-vite-jdypgh5u?file=README.md&view=editor)

@DavidAnson Thank you for your work. Here is an online demo of the incompatibility between the new version of markdownlint and Prettier (https://stackblitz.com/edit/vitejs-vite-jdypgh5u?file=README.md&view=editor). This is why I opened this issue in the first place: Prettier has treated full-width characters as width 2 since prettier/prettier#3003 (https://github.com/prettier/prettier/pull/3003).


@DavidAnson commented on GitHub (Nov 23, 2025):

Okay, here's the proposed implementation I've come up with after a bunch of research: https://github.com/DavidAnson/markdownlint/commit/f44a15e4309d31d710f2b481fda8aa7452b77d47

The documentation isn't updated and this commit does NOT handle CJK characters yet, but it DOES handle emoji and demonstrates the concept. CJK characters *should* be as easy as updating the default RegExp to include something like `\p{East_Asian_Width=Wide}`, but unfortunately that Unicode property does not seem to be supported anywhere (see https://github.com/tc39/proposal-regexp-unicode-property-escapes/issues/28), so I need to do something more complicated. (Coming soon...)

Note that this implementation IS user-configurable, so folks won't be locked to the default list.
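To illustrate what a user-configurable default might look like, here is a sketch that compiles the wide-character RegExp from a string, assuming a hypothetical option that accepts RegExp source and a hypothetical default based on `\p{Extended_Pictographic}`. The actual pattern and option name in the linked commit may differ. `Extended_Pictographic` is a safer choice than `\p{Emoji}` because the latter also matches ASCII digits, `#`, and `*`.

```javascript
// Hypothetical default pattern source; a user could override it in configuration.
const defaultWidePattern = "\\p{Extended_Pictographic}";

// Compiles the pattern once and returns a predicate for single characters.
function makeWideMatcher(patternSource = defaultWidePattern) {
  const wide = new RegExp(patternSource, "u"); // the "u" flag is required for \p{...}
  return (ch) => wide.test(ch);
}

const isWide = makeWideMatcher();
console.log(isWide("\u2705")); // true (check mark)
console.log(isWide("1")); // false
console.log(/\p{Emoji}/u.test("1")); // true: digits have the Emoji property (keycap bases)
```

Keeping the pattern as a string means a user-supplied override goes through the same `new RegExp(..., "u")` path as the default.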


@HowieHz commented on GitHub (Nov 23, 2025):

A regex reference for matching Chinese, Japanese, and Korean:

```javascript
const sentence = "A ticket to 大阪 上海 창원 costs ¥2000 👌.";

// Matches one code point from the Han (Chinese characters / kanji), Hiragana, Katakana, or Hangul scripts.
const regexp = /(?:\p{Script=Han}|\p{Script=Hiragana}|\p{Script=Katakana}|\p{Script=Hangul})/gu;

// "¥" and "👌" belong to other scripts, so they are not matched.
console.log(sentence.match(regexp));
// Expect: Array ["大", "阪", "上", "海", "창", "원"]
```

GPT told me that in ES2018+ you can use the regexp `/(\p{East_Asian_Width=Fullwidth}|\p{East_Asian_Width=Wide})/gu`, but none of the environments I have on hand can run it; they all report "Invalid property name". Maybe it's an AI hallucination.
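The "Invalid property name" error is expected: ECMAScript only accepts a fixed list of Unicode properties in `\p{...}` (General_Category, Script, Script_Extensions, plus a set of binary properties), and East_Asian_Width is not among them. A small sketch that detects this at runtime and falls back to the script-based pattern above; `tryCompile` is an illustrative helper, not a library function.

```javascript
// Returns the compiled RegExp, or null if the engine rejects the pattern.
function tryCompile(source, flags) {
  try {
    return new RegExp(source, flags);
  } catch (err) {
    if (err instanceof SyntaxError) return null; // e.g. "Invalid property name"
    throw err;
  }
}

const eaw = tryCompile("\\p{East_Asian_Width=Wide}|\\p{East_Asian_Width=Fullwidth}", "gu");
const cjkScripts = /\p{Script=Han}|\p{Script=Hiragana}|\p{Script=Katakana}|\p{Script=Hangul}/gu;

const wideRegExp = eaw ?? cjkScripts; // falls back to the script-based pattern
console.log(eaw === null); // true in current JavaScript engines
console.log("A ticket to 大阪 上海 창원".match(wideRegExp).join("")); // 大阪上海창원
```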


@DavidAnson commented on GitHub (Nov 23, 2025):

@HowieHz That's great news, thank you! I was afraid I would have to build a set of Unicode character ranges for CJK and that would've been annoying. A short RegExp string will be easier to work with and easier to customize. I should be able to update my prototype tomorrow to get this working for CJK, though it might take another day after that before I'm ready to commit the changes.


@DavidAnson commented on GitHub (Nov 24, 2025):

Okay, I've got it working for Emoji and CJK (after a few detours): https://github.com/DavidAnson/markdownlint/compare/next...wide


@DavidAnson commented on GitHub (Nov 24, 2025):

Actually, the more I look at this, the more I think I need to just use this package because there are so many special cases around character width: https://github.com/sindresorhus/string-width

That would mean removing the ability to customize the behavior, but that shouldn't be necessary if this package gets things right, which it seems to do better than anything else.
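To show the kind of special cases involved, here is a much-simplified, grapheme-based width estimate. This is NOT string-width's algorithm; it assumes that marks, format characters (such as the zero-width joiner U+200D), and controls add no width on their own, and that any grapheme containing a pictograph or a CJK-script character is two columns wide.

```javascript
const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });
const wide = /\p{Extended_Pictographic}|\p{Script=Han}|\p{Script=Hiragana}|\p{Script=Katakana}|\p{Script=Hangul}/u;
// A grapheme made only of marks, format characters, or controls contributes no width.
const zeroWidth = /^[\p{M}\p{Cf}\p{Cc}]+$/u;

// Measures each grapheme cluster rather than each code point.
function estimateWidth(text) {
  let width = 0;
  for (const { segment } of segmenter.segment(text)) {
    if (zeroWidth.test(segment)) continue;
    width += wide.test(segment) ? 2 : 1;
  }
  return width;
}

console.log(estimateWidth("e\u0301")); // 1 ("é" built from e + combining acute accent)
console.log(estimateWidth("\u{1F468}\u200D\u{1F469}\u200D\u{1F467}")); // 2 (one ZWJ family sequence)
console.log(estimateWidth("路径")); // 4
```

A per-code-point count would give 2 for the accented "é" and 8 for the family emoji, which is why grapheme segmentation is the usual starting point.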


@HowieHz commented on GitHub (Nov 24, 2025):

> many special cases

I think that makes sense; this package is good.
I just realized that https://github.com/sindresorhus/is-fullwidth-code-point and https://github.com/sindresorhus/string-width are by the same author.

In CJK environments, we generally use fullwidth punctuation, which also takes up two widths and aligns with CJK characters, such as `，` `。` `“”` `‘’` `：`, rather than `,` `.` `""` `''` `:`.


@DavidAnson commented on GitHub (Nov 24, 2025):

Yes, one of the new test cases I added happened to have a full width comma...


@Mister-Hope commented on GitHub (Nov 24, 2025):

> Actually, the more I look at this, the more I think I need to just use this package because there are so many special cases around character width: https://github.com/sindresorhus/string-width
>
> That would mean removing the ability to customize the behavior, but that shouldn't be necessary if this package gets things right. Which it seems to do better at than anything else.

I don't think this is the correct approach. Tables are expected to be aligned using the raw content width as shown in editors, so the third example in the package's README is not acceptable (at least to me).

I would prefer `\uxxxx` to be counted as 6 columns wide. Tables should be human-readable, not print-readable.


@DavidAnson commented on GitHub (Nov 24, 2025):

@Mister-Hope, can you please explain more? Are you asking to be able to control the width of every individual code point? I don't think that's practical. The regular expression approach I have proposed allows identifying characters that should be treated as two columns instead of one. But it does not handle fractional width for complexity reasons. The approach I proposed immediately above to use the string-width package is more consistent with how the prettier formatter behaves and should produce the same behavior as what I've already prototyped for markdownlint. This new test file, for example, produces no violations and should look right to anyone with the appropriate CJK fonts installed: https://raw.githubusercontent.com/DavidAnson/markdownlint/refs/heads/wide/test/table-column-style-wide-characters.md
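The "two columns instead of one" idea can be illustrated by how an "aligned" cell would be padded: the padding is the column width minus the cell's display width, not minus its `.length`. A sketch assuming the script-based CJK pattern from earlier in the thread; `displayWidth` and `padCell` are hypothetical helpers, not markdownlint's code, and fractional widths are deliberately not modeled.

```javascript
const wideChar = /\p{Script=Han}|\p{Script=Hiragana}|\p{Script=Katakana}|\p{Script=Hangul}/u;

// Integer display width: 2 columns for CJK-script code points, 1 for everything else.
function displayWidth(text) {
  let width = 0;
  for (const ch of text) width += wideChar.test(ch) ? 2 : 1;
  return width;
}

// Pads a cell so that it occupies exactly `columnWidth` display columns.
function padCell(text, columnWidth) {
  return text + " ".repeat(Math.max(0, columnWidth - displayWidth(text)));
}

const cells = ["路径匹配", "`/archives/**`"];
const columnWidth = Math.max(...cells.map(displayWidth)); // 14
for (const cell of cells) console.log(`| ${padCell(cell, columnWidth)} |`);
```

Here "路径匹配" receives 6 spaces of padding, whereas a `.length`-based count (4) would add 10 and leave the pipe 4 columns too far right in a 2:1 font.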


@Mister-Hope commented on GitHub (Nov 24, 2025):

```
stringWidth('\u001B[1m古\u001B[22m');
//=> 2
```

I just mean this isn't correct for our usage.


@DavidAnson commented on GitHub (Nov 25, 2025):

@Mister-Hope Those look like ANSI escape codes. I would not expect them in Markdown, but if you pass the countAnsiEscapeCodes option, they should get counted?
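For context on this exchange: the example string contains real ESC (U+001B) control characters forming SGR sequences (`ESC [ 1 m` for bold, `ESC [ 22 m` to reset), which terminal-oriented width functions strip before measuring. A Markdown file that literally contains the six characters `\u001B` has no ESC character at all. A sketch with a simplified SGR-only pattern (an assumption; real ANSI stripping covers more sequence types):

```javascript
const styled = "\u001B[1m古\u001B[22m";
// Simplified pattern covering only SGR sequences: ESC, "[", digits/semicolons, "m".
const sgr = /\u001B\[[0-9;]*m/g;

const visible = styled.replace(sgr, "");
console.log(styled.length); // 10 (4 + 1 + 5 UTF-16 code units)
console.log(visible); // 古

// Source text that spells out the escape contains a backslash, "u", and four hex digits.
const literal = String.raw`\u001B`;
console.log(literal.length); // 6
```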


@HowieHz commented on GitHub (Nov 25, 2025):

> Prettier handles similar issues using the is-fullwidth-code-point library to determine character width. For example, in prettier/prettier#3003, they implemented accurate width calculation for fullwidth characters using this library.

Let's test it in Prettier: stackblitz (https://stackblitz.com/edit/vitejs-vite-jdypgh5u?file=README.md)
