starred/markdownlint

mirror of https://github.com/DavidAnson/markdownlint.git synced 2026-04-26 01:36:03 +03:00

[GH-ISSUE #564] MD013: Unicode character width #451

Open

opened 2026-03-03 01:27:02 +03:00 by kerem · 5 comments

kerem commented

2026-03-03 01:27:02 +03:00

Owner

Originally created by @m15a on GitHub (Aug 19, 2022).
Original GitHub issue: https://github.com/DavidAnson/markdownlint/issues/564

Writing Markdown in non-ASCII text is everyday practice worldwide, and people like me want to lint Markdown files that contain non-ASCII characters.

Regarding MD013, the current implementation of markdownlint seems to check the character count of each line using a regular expression. However, the display width of Unicode characters varies. For example, CJK characters often have double width.

Imagine that you configured MD013 as

$ cat .markdownlint.yml
MD013:
  line_length: 40

and you have example.md:

# An old story

<!-- Line below is of length 56, so it will be warned by markdownlint. -->
Long, long ago there lived, an old man and an old woman.

<!-- Line below has total unicode width 70 but won't be warned as its character counts 35. -->
むかしむかし，あるところに，おじいさんとおばあさんがくらしていました。

It is expected that both lines 4 and 7 are flagged by MD013. However, only line 4 is reported:

$ markdownlint example.md
example.md:4:41 MD013/line-length Line length [Expected: 40; Actual: 56]

Related: https://github.com/psf/black/issues/1197
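To make the discrepancy concrete, here is a minimal Python sketch (not markdownlint's actual implementation, which is JavaScript) that compares a plain character count with a display width estimated from the East Asian Width property. The helper name `display_width` is hypothetical. The sketch assumes Wide (`W`) and Fullwidth (`F`) characters occupy two columns and everything else one, and it ignores combining marks and ambiguous-width characters.

```python
import unicodedata


def display_width(text: str) -> int:
    """Estimate columns: 2 for Wide/Fullwidth characters, 1 for everything else."""
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1 for ch in text
    )


english = "Long, long ago there lived, an old man and an old woman."
japanese = "むかしむかし，あるところに，おじいさんとおばあさんがくらしていました。"

for line in (english, japanese):
    # First number: character count; second number: estimated display width.
    print(len(line), display_width(line))
# prints:
# 56 56
# 35 70
```

With a limit of 40, only the English line exceeds the character count, while both lines exceed the estimated display width.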

kerem added the enhancement label 2026-03-03 01:27:02 +03:00

kerem commented

2026-03-03 01:27:03 +03:00

Author

Owner

@DavidAnson commented on GitHub (Aug 19, 2022):

In my mind, rules about line length are meant to ensure that everything fits on the screen or that line lengths are consistent when using a monospaced font. With that understanding, the current implementation of this rule seems correct: it counts the number of visible characters. Trying to base a rule like this on the underlying encoding/representation of the data does not seem generally useful.

Something that complicates things is shown in your example: some characters may render wider than the default monospace character width. However, that is a rendering behavior that will vary by font, program, and operating system - and does not seem like it could be addressed by a rule.

As such, I feel the current implementation is valid.

kerem commented

2026-03-03 01:27:03 +03:00

Author

Owner

@m15a commented on GitHub (Aug 20, 2022):

> In my mind, rules about line length are meant to ensure that everything fits on the screen or that line lengths are consistent when using a monospaced font.

Agreed, but displayed line lengths are inconsistent when writing CJK text, since the majority of monospace CJK fonts are actually duospaced (https://en.wikipedia.org/wiki/Duospaced_font).

> However, that is a rendering behavior that will vary by font, program, and operating system - and does not seem like it could be addressed by a rule.

Rather than changing the rule, I would probably want an option to customize how line length is measured: by the number of characters or by Unicode width.
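As a rough illustration of what such an option could look like, the Python sketch below switches between the two measurements. The `mode` parameter and the function names `measure` and `long_lines` are hypothetical and do not correspond to any existing markdownlint setting. The width mode assumes Wide and Fullwidth characters take two columns.

```python
import unicodedata


def measure(line: str, mode: str = "chars") -> int:
    """Return the line length under the chosen (hypothetical) measurement mode."""
    if mode == "chars":
        return len(line)
    if mode == "width":
        # Assumption: Wide/Fullwidth characters take two columns, all others one.
        return sum(
            2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1 for ch in line
        )
    raise ValueError(f"unknown mode: {mode!r}")


def long_lines(lines: list[str], limit: int, mode: str = "chars") -> list[int]:
    """Return 1-based numbers of lines whose measured length exceeds the limit."""
    return [n for n, line in enumerate(lines, start=1) if measure(line, mode) > limit]


# The example file from the report, with the HTML comment lines left blank.
example = [
    "# An old story",
    "",
    "",
    "Long, long ago there lived, an old man and an old woman.",
    "",
    "",
    "むかしむかし，あるところに，おじいさんとおばあさんがくらしていました。",
]

print(long_lines(example, 40, mode="chars"))  # [4]
print(long_lines(example, 40, mode="width"))  # [4, 7]
```

Keeping character counting as the default would leave existing behavior unchanged for anyone who does not opt in.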

kerem commented

2026-03-03 01:27:03 +03:00

Author

Owner

@DavidAnson commented on GitHub (Aug 20, 2022):

Do you know if there is a RegExp character class to identify "wide" characters?

kerem commented

2026-03-03 01:27:03 +03:00

Author

Owner

@m15a commented on GitHub (Aug 20, 2022):

Hmm, no. I searched and found cjk-regex (https://www.npmjs.com/package/cjk-regex), which matches CJK characters, but not all CJK characters are double width (e.g., single-width ｱｲｳｴｵ and double-width アイウエオ).
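The halfwidth versus fullwidth katakana case can be checked with Python's standard `unicodedata` module. This is only a sketch: the `classify` helper is hypothetical, and testing the character name for "KATAKANA" is a crude stand-in for a script-based match like the one a CJK regex performs.

```python
import unicodedata


def classify(ch: str) -> tuple[str, bool, str]:
    """Return (Unicode name, looks-like-katakana flag, East Asian Width class)."""
    name = unicodedata.name(ch)
    return name, "KATAKANA" in name, unicodedata.east_asian_width(ch)


halfwidth_a = "\uff71"  # ｱ, the first character of the single-width example
fullwidth_a = "\u30a2"  # ア, the first character of the double-width example

for ch in (halfwidth_a, fullwidth_a):
    name, is_katakana, width_class = classify(ch)
    print(f"U+{ord(ch):04X} {name}: katakana={is_katakana}, width class={width_class}")
# prints:
# U+FF71 HALFWIDTH KATAKANA LETTER A: katakana=True, width class=H
# U+30A2 KATAKANA LETTER A: katakana=True, width class=W
```

Both characters count as katakana, but only the second is classified as Wide, so a script-based match alone cannot decide the column width.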

kerem commented

2026-03-03 01:27:04 +03:00

Author

Owner

@DavidAnson commented on GitHub (Aug 20, 2022):

This part of the Unicode spec seems relevant:

- https://unicode.org/reports/tr11/

As do these packages:

- https://github.com/vangie/east-asian-width
- https://github.com/susisu/meaw

However, I have a strict "no dependencies" rule and I don't see a clean way of referencing this data otherwise.
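One conceivable way to use the TR11 data without a runtime dependency would be to generate a range table offline and vendor the result. The Python sketch below is only an illustration of that idea and not something proposed in this thread. The function names `wide_ranges` and `wide_char_pattern` are hypothetical, and the output depends on the Unicode version bundled with the Python interpreter that runs it.

```python
import re
import unicodedata


def wide_ranges() -> list[tuple[int, int]]:
    """Collapse all Wide/Fullwidth code points into inclusive (start, end) ranges."""
    ranges: list[tuple[int, int]] = []
    start = None
    for cp in range(0x110000):
        wide = unicodedata.east_asian_width(chr(cp)) in ("W", "F")
        if wide and start is None:
            start = cp
        elif not wide and start is not None:
            ranges.append((start, cp - 1))
            start = None
    if start is not None:
        ranges.append((start, 0x10FFFF))
    return ranges


def wide_char_pattern(ranges: list[tuple[int, int]]) -> re.Pattern:
    """Build a regex character class that matches any code point in the ranges."""
    body = "".join(
        f"\\U{a:08x}" if a == b else f"\\U{a:08x}-\\U{b:08x}" for a, b in ranges
    )
    return re.compile(f"[{body}]")


WIDE = wide_char_pattern(wide_ranges())

print(bool(WIDE.search("むかし")))              # True: hiragana is Wide
print(bool(WIDE.search("abc")))                 # False: ASCII is Narrow
print(bool(WIDE.search("\uff71\uff72\uff73")))  # False: halfwidth katakana
```

The generated ranges could be emitted as a static table or a character class in whatever language the linter uses, at the cost of regenerating it when the Unicode data changes.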

kerem referenced this issue

2026-03-07 20:05:42 +03:00

[GH-ISSUE #451] MD029 is triggered on unordered with numbers in items #2221