[GH-ISSUE #1904] MD059: Identify bad link texts using "non-descriptive words" rather than "disallowed texts"? #2625

Open
opened 2026-03-07 20:09:28 +03:00 by kerem · 2 comments

Originally created by @benblank on GitHub (Dec 31, 2025).
Original GitHub issue: https://github.com/DavidAnson/markdownlint/issues/1904

MD059 will always be prone to false negatives just because of the nature of trying to translate the idea of "descriptive" into code/config. But I was looking at the list of default `disallowedTexts` and thought there might be an alternate way of checking for descriptive link text which at least reduces false negatives without introducing false positives.

Basically, instead of having the rule act as a simple "disallow list" for complete link texts, perhaps it could keep a list of words which are known to be "non-descriptive" and ensure link texts contain *at least one word not on that list*. The configuration would then contain a `nonDescriptiveWords` (or some such) array containing words like "click", "for", "here", "link", "more", "on", "the", "this", etc. It still wouldn't be perfect, of course, but I think the idea of changing from "these phrases aren't descriptive" to "these words don't make a phrase descriptive" more closely matches the problem with non-descriptive links.

MD059 already normalizes the link text in a way which would simplify the change. The [test at its heart](https://github.com/DavidAnson/markdownlint/blob/63fefcbd4a7cbfa4fb49b40b5c4020c25df24c7d/lib/md059.mjs) is `prohibitedTexts.has(normalize(text))`; changing it to e.g. `!normalize(text).split(" ").some((word) => !nonDescriptiveWords.has(word))` would give the rule the ability to catch additional link texts without requiring that each full link text be added to the config.

Some examples:

| link text                      | v0.40.0 | proposed |
|:-------------------------------|:-------:|:--------:|
| click \[here\]                 | ⚠️      | ⚠️       |
| \[click here\]                 | ⚠️      | ⚠️       |
| click this \[link\]            | ⚠️      | ⚠️       |
| \[click this link\]            | ✅      | ⚠️       |
| \[click here for more\]        | ✅      | ⚠️       |
| \[click here for more *info*\] | ✅      | ✅       |

I think this could really improve MD059's utility. How would you feel about introducing this kind of change?
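The two checks quoted in the issue description can be compared side by side. The following is only a minimal sketch, not markdownlint's actual code: the `normalize` helper, the contents of both sets, and the function names `isProhibitedExact` and `isNonDescriptive` are assumptions made for illustration. The word list is the example list from the description.

```javascript
// Hypothetical stand-in for the rule's normalization (assumption):
// collapse punctuation and whitespace to single spaces, trim, lower-case.
function normalize(text) {
  return text.replace(/[\W_]+/g, " ").trim().toLowerCase();
}

// Illustrative values only; the real defaults may differ.
const prohibitedTexts = new Set(["click here", "here", "link", "more"]);
const nonDescriptiveWords = new Set([
  "click", "for", "here", "link", "more", "on", "the", "this",
]);

// Current style of check: the whole normalized link text must be on the list.
function isProhibitedExact(text) {
  return prohibitedTexts.has(normalize(text));
}

// Proposed style of check: flag the text when every word is non-descriptive,
// i.e. when it contains no word that is absent from the list.
function isNonDescriptive(text) {
  const words = normalize(text).split(" ").filter(Boolean);
  return words.length > 0 && words.every((word) => nonDescriptiveWords.has(word));
}

// Print both verdicts for the link texts from the examples table.
for (const text of ["here", "click here", "link", "click this link",
                    "click here for more", "click here for more info"]) {
  console.log(text, isProhibitedExact(text), isNonDescriptive(text));
}
```

`words.every(...)` is the same test as the `!….some((word) => !…)` expression in the description, written positively. Under these assumed lists, "click this link" and "click here for more" are flagged only by the word-based check, while "click here for more info" passes both because "info" is not on the list.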

@DavidAnson commented on GitHub (Dec 31, 2025):

This is an interesting idea! My first thought is that while the current implementation is imperfect, it is imperfect in a consistent, predictable way that only reports violations for obviously-problematic inputs. My concern with this proposal is that a link with the text "Click here on this link for more" would report a violation. And arguably it should, but some people may think that link is descriptive enough as-is.

Basically, the proposed approach will report violations for a much broader set of inputs and some of those combinations could be controversial. If so, "tuning" the list of words might require careful creativity in contrast to the current implementation where it's obvious what to change to restrict or broaden the scope of what's reported.


@benblank commented on GitHub (Dec 31, 2025):

I don't think the proposed approach would necessarily be less consistent or predictable, but I do worry it would be more difficult to communicate, which in many ways amounts to much the same thing. While writing the suggestion, it took a few minutes to hit on a good, concise description of the behavior and it still doesn't even come close to the simplicity of "it's a deny list" or "don't use these". 😅

On the other hand, though, I think the simplicity of the current implementation works against it when it comes to configuration. If I do think that "Click here on this link for more" isn't descriptive, then I'm surely going to think the same of any number of variations that have the same problem. Actually creating a list of every possible variation quickly becomes time-consuming and the result has clear redundancies:

  • click here
  • click this link
  • click on this link
  • click here on this link
  • click here for more
  • click this link for more
  • click on this link for more
  • click here on this link for more

And then you realize that "click" may be outside the link text and add another eight lines… and that "here" also makes sense after "this link" instead, for another six. It feels like the current implementation almost requires being configured with close to 2^n combinations of n optional parts.

I think replacing it with a list of those parts then becomes much more manageable, both conceptually and in the config file. And the implementation becomes much simpler when those "parts" are just individual words.
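To make the size difference concrete, here is a rough sketch that enumerates every non-empty combination of a few optional parts and compares that count with the number of distinct words involved. The `optionalParts` list and the `allCombinations` helper are hypothetical, chosen only to mirror the variations listed above. Many generated phrases are not natural English, so the count is an upper bound rather than the size of a realistic deny list.

```javascript
// Hypothetical optional parts, in the order they would appear in a phrase.
const optionalParts = ["click", "here", "on", "this link", "for more"];

// Build every non-empty, order-preserving combination of the parts.
// Each bit of `mask` decides whether the part at that index is included.
function allCombinations(parts) {
  const phrases = [];
  for (let mask = 1; mask < 2 ** parts.length; mask++) {
    const chosen = parts.filter((_, index) => mask & (1 << index));
    phrases.push(chosen.join(" "));
  }
  return phrases;
}

const phrases = allCombinations(optionalParts);

// The word-based configuration only needs the distinct words.
const words = new Set(optionalParts.flatMap((part) => part.split(" ")));

// Every generated phrase consists solely of listed words, so a word-based
// check would flag all of them without listing any phrase explicitly.
const allCaught = phrases.every((phrase) =>
  phrase.split(" ").every((word) => words.has(word)));

console.log(phrases.length, words.size, allCaught);
```

With these five parts there are 2^5 - 1 = 31 phrase entries to maintain in a deny list, against 7 words in a word list, and each extra optional part doubles the former while adding only a word or two to the latter.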

But I also completely agree about any list of words used like this needing to be well-tuned. The example list I gave in the description is only meant to be illustrative; it may well be missing words which should be added or already contain words it shouldn't. 🙂
