[GH-ISSUE #643] Unicode characters with MD051 #488

Closed
opened 2026-03-03 01:27:21 +03:00 by kerem · 3 comments

Originally created by @TomStrepsil on GitHub (Nov 18, 2022).
Original GitHub issue: https://github.com/DavidAnson/markdownlint/issues/643

Given the following markdown:

```md
[Link](#🔗-link)
```

...the linter complains:

```txt
MD051/link-fragments: Link fragments should be valid
```

The same is true when trying to percent-encode that:

```md
[Link](#%F0%9F%94%97-link)
```
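Both spellings name the same fragment: percent-encoding only changes how the emoji's UTF-8 bytes are written, not which text they stand for. A small TypeScript check with the standard `encodeURIComponent`/`decodeURIComponent` globals illustrates this. It is a sketch of the encoding relationship only, not a description of how markdownlint compares fragments internally.

```typescript
// "🔗" is U+1F517, encoded in UTF-8 as the bytes F0 9F 94 97.
const raw = "🔗-link";

// encodeURIComponent leaves "-" and ASCII letters alone and
// percent-encodes each UTF-8 byte of the emoji.
const encoded = encodeURIComponent(raw);
console.log(encoded); // prints %F0%9F%94%97-link

// Decoding restores the original text, so the two link forms are
// equivalent ways of writing the same fragment.
console.log(decodeURIComponent(encoded) === raw); // prints true
```

Because the forms are equivalent, the question is really whether `🔗-link` is the fragment GitHub generates for the heading, not how the emoji is spelled.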

Should Unicode be allowed in inline links? If so, how should they be specified to satisfy the linter?

The [CommonMark spec](https://github.com/commonmark/commonmark-spec/blob/master/spec.txt) has this to say about non-ASCII characters in URLs:

```txt
For example, the spec says what counts as a link
destination, but it doesn't mandate that non-ASCII characters in
the URL be percent-encoded.  To use the automatic tests,
implementers will need to provide a renderer that conforms to
the expectations of the spec examples (percent-encoding
non-ASCII characters in URLs).  But a conforming implementation
can use a different renderer and may choose not to
percent-encode non-ASCII characters in URLs.
```

...and:

```
A [link destination](@) consists of either

- a sequence of zero or more characters between an opening `<` and a
  closing `>` that contains no line endings or unescaped
  `<` or `>` characters, or

- a nonempty sequence of characters that does not start with `<`,
  does not include [ASCII control characters][ASCII control character]
  or [space] character, and includes parentheses only if (a) they are
  backslash-escaped or (b) they are part of a balanced pair of
  unescaped parentheses.
  (Implementations may impose limits on parentheses nesting to
  avoid performance issues, but at least three levels of nesting
  should be supported.)
```

...and regarding in-line links:

```
URL-escaping should be left alone inside the destination, as all
URL-escaped characters are also valid URL characters. Entity and
numerical character references in the destination will be parsed
into the corresponding Unicode code points, as usual.  These may
be optionally URL-escaped when written as HTML, but this spec
does not enforce any particular policy for rendering URLs in
HTML or other formats.  Renderers may make different decisions
about how to escape or normalize URLs in the output.
```
kerem closed this issue and added the question label (2026-03-03 01:27:21 +03:00).

@DavidAnson commented on GitHub (Nov 18, 2022):

The documentation for the next release has been updated to clarify this rule enforces the GitHub algorithm for automatic heading link generation. I'm guessing neither of the forms you show successfully connect to a generated link in a document on GitHub - in which case the warning is accurate. If one of those forms does work on GitHub, please give an example and I will look into how that character is being transformed. Thank you!

https://github.com/DavidAnson/markdownlint/blob/next/doc/md051.md


@TomStrepsil commented on GitHub (Nov 18, 2022):

Thanks for the feedback! I realised subsequently that GitHub actually wants a link to this:

```md
### 🔗 Link
```

...to appear thus:

```md
### [Link](#-link)
```

...so a leading hyphen, with the Unicode character omitted, seems to work in this instance.
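The leading hyphen falls out of a GitHub-style slug algorithm: lowercase the heading text, drop characters such as emoji, then turn each space into a hyphen. The TypeScript below is a simplified sketch under that assumption. `headingToFragment` is a hypothetical helper, and the exact set of characters kept is an approximation rather than GitHub's or markdownlint's actual source.

```typescript
// Hypothetical helper approximating GitHub-style heading anchors.
// Assumption: keep only letters, combining marks, numbers, connector
// punctuation (e.g. "_"), spaces and hyphens; everything else is dropped.
function headingToFragment(heading: string): string {
  const slug = heading
    .toLowerCase()
    // The "u" flag enables Unicode property escapes and makes the
    // emoji (a surrogate pair) match as a single code point.
    .replace(/[^\p{L}\p{M}\p{N}\p{Pc} -]/gu, "")
    // Each remaining space becomes a hyphen; runs are not collapsed.
    .replace(/ /g, "-");
  return "#" + slug;
}

console.log(headingToFragment("🔗 Link")); // prints #-link
console.log(headingToFragment("Issue 643")); // prints #issue-643
```

Under these assumptions the emoji is removed but the space after it survives, which is why the fragment for `🔗 Link` starts with a hyphen. The MD051 documentation remains the authority on exactly which characters the rule keeps.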


@DavidAnson commented on GitHub (Nov 18, 2022):

Great. What you show above produces no warnings so I will close this issue.

https://dlaa.me/markdownlint/#%25m%23%20Issue%20643%0A%0A%23%23%20%F0%9F%94%97%20Link%0A%0A%5BLink%5D(%23-link)%0A
