[GH-ISSUE #1904] Non leading underscore in Name::from_utf8 parsing fails #814

Open
opened 2026-03-16 00:20:35 +03:00 by kerem · 10 comments

Originally created by @nrempel on GitHub (Mar 10, 2023).
Original GitHub issue: https://github.com/hickory-dns/hickory-dns/issues/1904

Hello. I'm wondering where I can find more information about this particular parsing logic:

https://github.com/bluejekyll/trust-dns/blob/5492bdedba3479b480fcb904844568fd82d12500/crates/proto/src/rr/domain/name.rs#L495-L512

```rust
// Error, underscore in the end
assert!(Name::from_utf8("dis_allowed.example.com.").is_err());
```

Underscores are allowed at the beginning of a label but not anywhere else.
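To make the behaviour in question concrete, here is a minimal Rust sketch of the rule as this issue describes it. A label may contain letters, digits and hyphens, optionally preceded by a single leading underscore. The names `strict_label_ok` and `strict_name_ok` are hypothetical. This is not the actual trust-dns `Name::from_utf8` code, and it ignores escapes, wildcards and IDNA.

```rust
/// Hypothetical sketch (not the trust-dns implementation): a label is
/// letters, digits and hyphens, optionally preceded by one underscore.
fn strict_label_ok(label: &str) -> bool {
    // Enforce the 63-octet label limit on the whole label.
    if label.is_empty() || label.len() > 63 {
        return false;
    }
    // Strip at most one leading underscore; the rest must be letters/digits/hyphens.
    let rest = label.strip_prefix('_').unwrap_or(label);
    !rest.is_empty() && rest.chars().all(|c| c.is_ascii_alphanumeric() || c == '-')
}

/// Checks every label of a name, tolerating one trailing root dot.
fn strict_name_ok(name: &str) -> bool {
    let trimmed = name.strip_suffix('.').unwrap_or(name);
    !trimmed.is_empty() && trimmed.split('.').all(strict_label_ok)
}
```

Under this rule, `_dmarc.example.com.` passes, while `dis_allowed.example.com.` and `shopify_verification.example.com.` are rejected.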

I couldn't find any mention of this in https://www.rfc-editor.org/rfc/rfc1035.html. Could anyone point me to the relevant rfc?

Thanks!

@cpu commented on GitHub (Mar 10, 2023):

In general my understanding is that underscores are forbidden for host names but may occur in domain names in other contexts as the DNS is general beyond host names. Notably there's a pattern of using a leading _ to distinguish from a host name (e.g. for SRV records). I suspect that's what this parsing logic is attempting to honour. I think the best reference for this scoping hack is https://www.rfc-editor.org/rfc/rfc8552#section-1.1 but would be curious if someone else can dig up better chapter and verse to cite :-)
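The sketch below illustrates the distinction drawn here. It classifies a single label into one of three kinds:

- host-name style: letters, digits and hyphens, with no leading or trailing hyphen, as in RFC 952/1123;
- underscored: the leading `_` convention used for SRV and similar records;
- other: anything else.

The `LabelKind` enum and the `classify` function are hypothetical names chosen for illustration.

```rust
#[derive(Debug, PartialEq)]
enum LabelKind {
    /// Letters, digits and hyphens, not starting or ending with a hyphen.
    Hostname,
    /// Starts with an underscore, e.g. "_tcp" in an SRV owner name.
    Underscored,
    /// Representable in DNS, but neither of the above.
    Other,
}

fn classify(label: &str) -> LabelKind {
    // Host-name (LDH) rule applied to a single label.
    let is_ldh = |s: &str| -> bool {
        !s.is_empty()
            && s.chars().all(|c| c.is_ascii_alphanumeric() || c == '-')
            && !s.starts_with('-')
            && !s.ends_with('-')
    };
    if is_ldh(label) {
        LabelKind::Hostname
    } else if label.starts_with('_') {
        LabelKind::Underscored
    } else {
        LabelKind::Other
    }
}
```

Under this classification, `shopify_verification` falls into `Other`. That is exactly the case the parser has to decide about.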

@nrempel commented on GitHub (Mar 10, 2023):

Thank you!

I was under the impression that underscores are forbidden in host names but that all other labels allow them. The leading underscore seems like more of a convention than a hard requirement, but maybe I'm wrong?

For instance, Shopify now (newly?) requires creating a TXT record at shopify_verification.domain.com with a token. Should I be parsing shopify_verification with Name::from_str_relaxed in this case or is there a better type to represent more flexible labels?

Edit: or is this an opportunity to loosen the requirements of Name::from_utf8 so that parsing this label doesn't fail?

@djc commented on GitHub (Mar 13, 2023):

> I was under the impression that underscores are forbidden in host names but all other labels allow them.

That makes it sound like you think "host names" is used to refer to the first label? I'm not sure that is the case.

The aforelinked RFC 8552 section 1.1 (Scoped Interpretation of DNS Resource Records through "Underscored" Naming of Attribute Leaves) says:

> Because the DNS rules for a "host" (host name) do not allow use of the underscore character, the underscored name is distinguishable from all legal host names [RFC0952].

RFC 952 ("DOD INTERNET HOST TABLE SPECIFICATION") says:

> 1. A "name" (Net, Host, Gateway, or Domain name) is a text string up to 24 characters drawn from the alphabet (A-Z), digits (0-9), minus sign (-), and period (.). Note that periods are only allowed when they serve to delimit components of "domain style names". (See RFC-921, "Domain Name System Implementation Schedule", for background).

So the way I understand it, names in general probably allow underscores, but names specifically referring to an (IP?) endpoint would not?
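For comparison, the character rule quoted from RFC 952 can be sketched as follows. The sketch treats letters case-insensitively, since RFC 952 makes no distinction between upper and lower case. It deliberately omits RFC 952's other rules. Later RFCs, notably RFC 1123, relaxed parts of this rule, including the 24-character limit. `rfc952_name_ok` is a hypothetical helper.

```rust
/// Sketch of the quoted RFC 952 rule: at most 24 characters drawn from
/// letters, digits, '-' and '.'. First/last-character rules are omitted.
fn rfc952_name_ok(name: &str) -> bool {
    !name.is_empty()
        && name.len() <= 24
        && name
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || c == '-' || c == '.')
}
```

No underscore appears anywhere in this alphabet. That is why underscored labels can never collide with legal host names.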

It's not completely obvious to me if/how we should change the trust-dns-proto API based on these observations, though.

@nrempel commented on GitHub (Mar 13, 2023):

> That makes it sound like you think "host names" is used to refer to the first label? I'm not sure that is the case.

Just in the case of shopify_verification.domain.com.

Basically, from my reading, shopify_verification.domain.com seems to be a perfectly valid domain for a TXT record. Should Name::from_utf8 be updated so that it successfully parses shopify_verification.domain.com? It seems like a strange determination that only leading underscores can be parsed.
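Here is a sketch of what a more lenient check might look like. It accepts an underscore anywhere and enforces only the RFC 1035 section 2.3.4 size limits: labels of at most 63 octets, and names of at most 255 octets in wire form. RFC 2181 section 11 notes that, apart from these length restrictions, any binary string can be a label. `lenient_name_ok` is hypothetical. It additionally restricts labels to printable ASCII, which is an assumption of this sketch. It does not handle escape sequences or the root name ".".

```rust
/// Hypothetical lenient check: underscores allowed anywhere; only the
/// RFC 1035 size limits plus a printable-ASCII restriction are enforced.
fn lenient_name_ok(name: &str) -> bool {
    let trimmed = name.strip_suffix('.').unwrap_or(name);
    if trimmed.is_empty() {
        return false;
    }
    let mut wire_len = 1; // the terminating zero-length root label
    for label in trimmed.split('.') {
        if label.is_empty() || label.len() > 63 {
            return false;
        }
        if !label.bytes().all(|b| b.is_ascii_graphic()) {
            return false;
        }
        wire_len += 1 + label.len(); // length octet + label bytes
    }
    wire_len <= 255
}
```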

@djc commented on GitHub (Mar 16, 2023):

@bluejekyll curious about your thoughts on this! I feel like making from_utf8() more lenient is probably a decent option, although we should probably add some language to the documentation around these issues.

@bluejekyll commented on GitHub (Mar 31, 2023):

Sorry for the late response. When I implemented a lot of this stuff, I did it strictly... It turns out that in the wild people do things that aren't to the standard, and other nameservers/clients out there are much more flexible. I think that at the end of the day, we're probably too strict and should just allow it.

@bluejekyll commented on GitHub (Mar 31, 2023):

Basically the question becomes: do we want to restrict the name in some way in any record, or leave it up to the caller to determine whether _ is valid and stay out of it? Shopify is probably wrong, but who am I to challenge a multi-billion-dollar company.
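Here is one possible shape for leaving the decision to the caller. The parser takes a policy argument chosen by whoever knows the context, such as a TXT owner name versus a host name. The names `UnderscorePolicy`, `label_allowed` and `name_allowed` are hypothetical. Only underscore placement is checked; length and other character rules are left out of this sketch.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum UnderscorePolicy {
    /// Underscore allowed in any position (e.g. TXT owner names).
    Anywhere,
    /// Underscore allowed only as the first character of a label.
    LeadingOnly,
    /// No underscores at all (host names per RFC 952).
    Never,
}

fn label_allowed(label: &str, policy: UnderscorePolicy) -> bool {
    let body = match policy {
        UnderscorePolicy::Anywhere => return !label.is_empty(),
        UnderscorePolicy::LeadingOnly => label.strip_prefix('_').unwrap_or(label),
        UnderscorePolicy::Never => label,
    };
    // Whatever remains after the allowed prefix must be underscore-free.
    !body.is_empty() && !body.contains('_')
}

fn name_allowed(name: &str, policy: UnderscorePolicy) -> bool {
    let trimmed = name.strip_suffix('.').unwrap_or(name);
    !trimmed.is_empty() && trimmed.split('.').all(|l| label_allowed(l, policy))
}
```

With this design, `shopify_verification.example.com.` is accepted under `Anywhere` and rejected under `LeadingOnly`. The library never has to guess whether a name is a host name.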

@nrempel commented on GitHub (Apr 4, 2023):

Thanks @bluejekyll!

I actually think that sticking to the spec is ideal here. We do have a way to work around this using from_str_relaxed.

However, I couldn't find any RFC that definitively prohibits non-leading underscores.

So I think we could improve the docs here to reference the relevant RFC sections and/or make this more lenient if there is no hard rule against non-leading underscores.

Do we know that shopify_verification.domain.com for sure violates some specification?

@bluejekyll commented on GitHub (Apr 4, 2023):

The problem is this notion of “hostname” vs. other names. We can’t know whether something is a hostname or not; only the API user has the context to make that distinction. If that’s the case, then trust-dns shouldn’t try to determine whether it’s a hostname.

Is Shopify’s usage wrong? By my reading of the RFCs I’d say yes, but I can see someone else arguing that TXT records are never “host names”, and therefore it’s OK. That is why, given their real-world usage, I’m inclined to say that we’re being too strict.

@darnuria commented on GitHub (Jul 2, 2023):

I think this is good material, and a useful clarification, that could end up in this future RFC from NLnet Labs: https://github.com/NLnetLabs/draft-koekkoek-dnsop-zone-file-format. For now it seems paused while they first build their own SIMD zone parser: https://github.com/NLnetLabs/simdzone.

I've also seen at work (gandi.net registrar) lots of records using underscores, either for TXT, like __domain_key, or for SRV.
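As an example of the SRV case, RFC 2782 places SRV records at owner names of the form `_Service._Proto.Name`. The hypothetical helper below splits such a name into its parts. Every name it accepts begins with two underscored labels. Such a name passes a leading-underscore rule but fails a strict host-name check.

```rust
/// Hypothetical helper: split "_Service._Proto.Name" (RFC 2782 form) into
/// (service, proto, name). Returns None if the shape does not match.
fn split_srv_name(name: &str) -> Option<(&str, &str, &str)> {
    let mut parts = name.splitn(3, '.');
    let service = parts.next()?.strip_prefix('_')?;
    let proto = parts.next()?.strip_prefix('_')?;
    let domain = parts.next()?;
    if service.is_empty() || proto.is_empty() || domain.is_empty() {
        return None;
    }
    Some((service, proto, domain))
}
```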

SVCB/HTTPS records will also accept _.

https://datatracker.ietf.org/doc/draft-ietf-dnsop-svcb-https/

> 2.3. SVCB query names
>
> When querying the SVCB RR, a service is translated into a QNAME by
> prepending the service name with a label indicating the scheme,
> prefixed with an underscore, resulting in a domain name like
> "_examplescheme.api.example.com.". This follows the Attrleaf naming
> pattern [Attrleaf], so the scheme MUST be registered appropriately
> with IANA (see Section 11).
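The query-name rule quoted from the SVCB draft can be sketched as a small helper. `svcb_qname` is a hypothetical name. The sketch ignores any port prefix and does not check that the scheme is registered with IANA.

```rust
/// Hypothetical helper: prepend an underscored scheme label to the
/// service's domain name, producing a fully qualified QNAME.
fn svcb_qname(scheme: &str, service_name: &str) -> String {
    // Normalise to exactly one trailing root dot.
    let host = service_name.trim_end_matches('.');
    format!("_{}.{}.", scheme, host)
}
```

For example, `svcb_qname("examplescheme", "api.example.com")` yields `_examplescheme.api.example.com.`, the name used in the quoted text.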

My reading led to this RFC: Scoped Interpretation of DNS Resource Records through "Underscored" Naming of Attribute Leaves
https://datatracker.ietf.org/doc/html/rfc8552

This is the one @cpu already pointed to in https://github.com/bluejekyll/trust-dns/issues/1904#issuecomment-1464260896.

I'd also point to this one, which tries to fix it all: "DNS AttrLeaf Changes: Fixing Specifications That Use Underscored Node Names" https://www.rfc-editor.org/rfc/rfc8553.html

Found via https://www.bortzmeyer.org/8553.html

Check out: https://www.rfc-editor.org/rfc/rfc8553.html#section-2
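Our reading of RFC 8552 (an interpretation, not a quotation) is as follows. Among the underscored labels at the left of a name, the one closest to the root is the "global" underscored node name covered by the IANA registry. For example, in `_sip._tcp.example.com.` that label is `_tcp`. The function `global_underscored_label` below is a hypothetical helper that extracts it.

```rust
/// Hypothetical helper: return the rightmost label of the leading run of
/// underscored labels, or None if the first label is not underscored.
fn global_underscored_label(name: &str) -> Option<&str> {
    name.trim_end_matches('.')
        .split('.')
        .take_while(|label| label.starts_with('_'))
        .last()
}
```

Note that `shopify_verification.example.com` yields `None`. Its underscore is not in the leading position, so it does not follow the Attrleaf pattern at all.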

Sorry for the edits: some last-minute reading was worth editing my comment for. :)
