mirror of
https://github.com/DavidAnson/markdownlint.git
synced 2026-04-25 01:05:55 +03:00
[GH-ISSUE #1904] MD059: Identify bad link texts using "non-descriptive words" rather than "disallowed texts"? #776
Originally created by @benblank on GitHub (Dec 31, 2025).
Original GitHub issue: https://github.com/DavidAnson/markdownlint/issues/1904
MD059 will always be prone to false negatives just because of the nature of trying to translate the idea of "descriptive" into code/config. But I was looking at the list of default `disallowedTexts` and thought there might be an alternate way of checking for descriptive link text which at least reduces false negatives without introducing false positives.
Basically, instead of having the rule act as a simple "disallow list" for complete link texts, perhaps it could keep a list of words which are known to be "non-descriptive" and ensure link texts contain at least one word not on that list. The configuration would then contain a `nonDescriptiveWords` (or some such) array containing words like "click", "for", "here", "link", "more", "on", "the", "this", etc. It still wouldn't be perfect, of course, but I think the idea of changing from "these phrases aren't descriptive" to "these words don't make a phrase descriptive" more closely matches the problem with non-descriptive links.
MD059 already normalizes the link text in a way which would simplify the change. The test at its heart is `prohibitedTexts.has(normalize(text))`; changing it to e.g. `!normalize(text).split(" ").some((word) => !nonDescriptiveWords.has(word))` would give the rule the ability to catch additional link texts without requiring that each full link text be added to the config.
Some examples:
I think this could really improve MD059's utility. How would you feel about introducing this kind of change?
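As a rough illustration of the difference between the two checks, here is a minimal sketch. The `normalize` function below is a hypothetical stand-in (lowercase, punctuation to spaces, whitespace collapsed) and may not match what MD059 actually does; the entries in `prohibitedTexts` are sample values assumed for illustration, and `nonDescriptiveWords` is the example list from this proposal, not a shipped default.

```javascript
// Hypothetical stand-in for MD059's normalization; the real rule may differ.
function normalize(text) {
  return text
    .toLowerCase()
    .replace(/[^\p{L}\p{N}]+/gu, " ") // punctuation and whitespace runs -> one space
    .trim();
}

// Sample full-text deny list (assumed values, for illustration only).
const prohibitedTexts = new Set(["click here", "here", "link", "more"]);

// Example word list from the proposal; illustrative, not tuned.
const nonDescriptiveWords = new Set([
  "click", "for", "here", "link", "more", "on", "the", "this"
]);

// Current style of check: the whole normalized text must be on the list.
function isProhibitedText(text) {
  return prohibitedTexts.has(normalize(text));
}

// Proposed style of check: report when no word falls outside the list.
// Equivalent to the `!...some((word) => !nonDescriptiveWords.has(word))` form.
function isNonDescriptive(text) {
  return normalize(text)
    .split(" ")
    .every((word) => nonDescriptiveWords.has(word));
}

const samples = [
  "Click here",
  "Click here on this link for more",
  "Download the budget document"
];
for (const text of samples) {
  console.log(text, isProhibitedText(text), isNonDescriptive(text));
}
```

With these assumed lists, "Click here" is reported by both checks, "Click here on this link for more" only by the word-based check, and "Download the budget document" by neither, because "download", "budget", and "document" are not on the word list.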
@DavidAnson commented on GitHub (Dec 31, 2025):
This is an interesting idea! My first thought is that while the current implementation is imperfect, it is imperfect in a consistent, predictable way that only reports violations for obviously-problematic inputs. My concern with this proposal is that a link with the text "Click here on this link for more" would report a violation. And arguably it should, but some people may think that link is descriptive enough as-is.
Basically, the proposed approach will report violations for a much broader set of inputs and some of those combinations could be controversial. If so, "tuning" the list of words might require careful creativity in contrast to the current implementation where it's obvious what to change to restrict or broaden the scope of what's reported.
@benblank commented on GitHub (Dec 31, 2025):
I don't think the proposed approach would necessarily be less consistent or predictable, but I do worry it would be more difficult to communicate, which in many ways amounts to much the same thing. While writing the suggestion, it took a few minutes to hit on a good, concise description of the behavior and it still doesn't even come close to the simplicity of "it's a deny list" or "don't use these". 😅
On the other hand, though, I think the simplicity of the current implementation works against it when it comes to configuration. If I do think that "Click here on this link for more" isn't descriptive, then I'm surely going to think the same of any number of variations that have the same problem. Actually creating a list of every possible variation quickly becomes time-consuming and the result has clear redundancies:
And then you realize that "click" may be outside the link text and add another eight lines… and that "here" also makes sense after "this link" instead, for another six. It feels like the current implementation almost requires being configured with close to 2^n combinations of n optional parts.
I think replacing it with a list of those parts then becomes much more manageable, both conceptually and in the config file. And the implementation becomes much simpler when those "parts" are just individual words.
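To make the growth concrete, here is a small sketch that expands a phrase template into every full link text a deny list would need, then counts the distinct words a word list would need instead. The `expandVariations` helper and the `template` shape are hypothetical, invented for this illustration; they are not part of markdownlint.

```javascript
// Expand a template into every link text it can produce.
// Each optional part doubles the count, so n optional parts give 2^n texts.
function expandVariations(parts) {
  let results = [[]];
  for (const part of parts) {
    const next = [];
    for (const prefix of results) {
      next.push([...prefix, part.text]); // variant that includes this part
      if (part.optional) {
        next.push(prefix); // variant that omits this part
      }
    }
    results = next;
  }
  return results.map((pieces) => pieces.join(" "));
}

// "Click here on this link for more" with three optional parts.
const template = [
  { text: "click", optional: true },
  { text: "here", optional: false },
  { text: "on this link", optional: true },
  { text: "for more", optional: true }
];

const variations = expandVariations(template);
const distinctWords = new Set(variations.flatMap((v) => v.split(" ")));

console.log(variations.length); // 8 full texts for the deny list
console.log(distinctWords.size); // 7 words for the word list
```

Adding one more optional part doubles the number of full texts again, while the word list grows only by the words that part introduces.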
But I also completely agree about any list of words used like this needing to be well-tuned. The example list I gave in the description is only meant to be illustrative; it may well be missing words which should be added or already contain words it shouldn't. 🙂