[GH-ISSUE #139] UTF-8 Assumption #364
Originally created by @Nemo157 on GitHub (May 26, 2017).
Original GitHub issue: https://github.com/hickory-dns/hickory-dns/issues/139
So, after looking over RFC1035, I don't think there's any way in which sending and receiving UTF-8 encoded strings is against the standard; it's just a question of interoperability.

The standard specifies that, as far as the DNS system cares, pretty much everything is just binary strings. Clients that use the DNS system might limit which characters are allowed (e.g. only a subset of ASCII for hostnames in most clients). RFC2181 §11 and RFC4343 §2 both restate/clarify that even labels should be treated as plain binary data by the DNS system (except for comparison).

I think trust-dns has made the right choice in treating users dealing with UTF-8 encoded strings as the common case (and punycode could quite easily be added as an optional transparent layer below this). Internally, though, I think it's necessary to work with raw bytes, since servers you're connecting to could send arbitrary binary data. If the internals are all binary-safe, it would also be useful to have alternative APIs giving direct access to that data as an escape hatch for clients doing weird stuff (e.g. IP-over-DNS without having to base64 encode everything).

As an example of other clients/servers using UTF-8: while coming up with the test cases below, I tried adding an A record for §.example.; on my machine BIND, dig and trust-dns all handled this interoperably via UTF-8 encoding. BIND's check-names did complain that it was an invalid hostname, though.

I threw together a couple of test cases where trust-dns currently fails 😈; to avoid cluttering this discussion I've posted them in a gist here. Whether or not you decide to support binary data, I think each of these shows an issue in the current implementation:
\255 in a label doesn't match what is specified in RFC4343. Rather than parsing to the Unicode character with value 255, I think this should either parse to the byte 0xFF if binary data is supported, or only escapes up to \127 should be supported, the same way Rust's strings only support escapes up to \x7F.

@bluejekyll commented on GitHub (May 26, 2017):
Excellent research, thank you!
Changing to storing binary internally is definitely an option, though for some reason it makes me feel odd. That being said, I have been considering changing the internal representation of the Name struct; I've never been very happy with it.
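Nemo157's proposal (raw bytes internally, UTF-8 as the common-case API, plus an escape hatch for binary data) could look roughly like the minimal sketch below. The Label and LabelError types and their method names are hypothetical illustrations, not trust-dns's actual Name API; the only rule enforced is the RFC1035 limit of 63 octets per label.

```rust
/// Hypothetical label type: the raw bytes are the source of truth.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Label(Vec<u8>);

#[derive(Debug, PartialEq, Eq)]
pub enum LabelError {
    /// Non-root labels may not be empty (the root label is handled elsewhere).
    Empty,
    /// Labels are limited to 63 octets (RFC1035 §2.3.4).
    TooLong(usize),
}

impl Label {
    /// Binary-safe constructor: any octets are accepted.
    pub fn from_raw(bytes: &[u8]) -> Result<Self, LabelError> {
        match bytes.len() {
            0 => Err(LabelError::Empty),
            n if n > 63 => Err(LabelError::TooLong(n)),
            _ => Ok(Label(bytes.to_vec())),
        }
    }

    /// Common case: a UTF-8 string is stored as its UTF-8 bytes.
    pub fn from_utf8(s: &str) -> Result<Self, LabelError> {
        Self::from_raw(s.as_bytes())
    }

    /// Escape hatch: direct access to the binary data.
    pub fn as_bytes(&self) -> &[u8] {
        &self.0
    }

    /// Returns `Some` only if the stored bytes happen to be valid UTF-8.
    pub fn to_utf8(&self) -> Option<&str> {
        std::str::from_utf8(&self.0).ok()
    }
}
```

With this shape, the §.example. label from the thread is simply the two UTF-8 bytes 0xC2 0xA7, while a label containing 0xFF is still representable but has no UTF-8 view.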
I'm not sure whether you're referring to "binary" labels (RFC2673) above at all. I never added support for binary labels because they were deprecated in RFC6891 (EDNS). I can't remember exactly, but I think they posed some issues with EDNS and other things in general.
Your gist is excellent. Those are some good test cases that should be added. Assuming what you have there is valid, we can add those as test cases and fix the code. I don't do any explicit decoding of escaped character sequences in binary; it looks like I should probably cleanse the data before inserting it into the string.
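Decoding escapes straight into bytes, rather than into a string, can be sketched as follows. It follows the RFC1035 §5.1 master-file rules that RFC4343 builds on: \X quotes a non-digit character X, and \DDD is an octet given as three decimal digits. unescape_label and EscapeError are hypothetical names; the sketch handles a single label only (splitting a name on unescaped dots is left out) and rejects \DDD values above 255 instead of guessing.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum EscapeError {
    /// A backslash at the end of input, or fewer than three digits after it.
    Truncated,
    /// A \DDD escape whose value does not fit in one octet.
    OutOfRange(u32),
}

/// Decodes one presentation-format label into raw bytes.
pub fn unescape_label(text: &str) -> Result<Vec<u8>, EscapeError> {
    let bytes = text.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] != b'\\' {
            // Unescaped bytes (including UTF-8 sequences) are copied as-is.
            out.push(bytes[i]);
            i += 1;
            continue;
        }
        let rest = &bytes[i + 1..];
        match rest.first() {
            None => return Err(EscapeError::Truncated),
            Some(d) if d.is_ascii_digit() => {
                // \DDD: exactly three decimal digits, value 0..=255.
                if rest.len() < 3 || !rest[..3].iter().all(u8::is_ascii_digit) {
                    return Err(EscapeError::Truncated);
                }
                let value = rest[..3]
                    .iter()
                    .fold(0u32, |acc, d| acc * 10 + u32::from(d - b'0'));
                if value > 255 {
                    return Err(EscapeError::OutOfRange(value));
                }
                out.push(value as u8);
                i += 4;
            }
            // \X: the next byte is taken literally.
            Some(&c) => {
                out.push(c);
                i += 2;
            }
        }
    }
    Ok(out)
}
```

Under these rules, \255 becomes the single byte 0xFF rather than the Unicode character U+00FF, which is the behaviour Nemo157 describes as matching RFC4343.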
I don't do anything special with the TXT record processing today. It's basically just a direct call through to this function: https://github.com/bluejekyll/trust-dns/blob/master/client/src/serialize/binary/decoder.rs#L94
I think you're right about the \255 parsing. That seems like a straightforward fix here: https://github.com/bluejekyll/trust-dns/blob/master/client/src/rr/domain.rs#L244

@Nemo157 commented on GitHub (May 27, 2017):
Nope, by "binary label" I just meant labels containing non-ASCII data; I never came across RFC2673 during my random walk through the DNS RFCs.
The example I used in my gist came from RFC4343 §2.2: a\000\\\255z.example., which is both non-ASCII and non-UTF-8 because of the 0xFF byte. I'm not sure whether these are actually useful in practice; BIND defaults to erroring out on any zone file that has a hostname outside a limited subset of ASCII, but BIND does at least let you disable this error and will then serve the hostnames.

Interestingly, BIND's implementation in this regard is non-standard according to RFC2181 §11:
@dimbleby commented on GitHub (Oct 23, 2017):
Valid TXT records are certainly allowed to contain arbitrary binary data. Section 6.5 of RFC6763 (DNS service discovery, which makes use of TXT records) is explicit about this:
Here's an example found in the wild:
@vorner commented on GitHub (Dec 17, 2017):
I think there's a difference between not wanting to load and serve a zone with non-ASCII names, and failing to parse a valid DNS message because it contains such a name (e.g. a client asking for one, or a resolver forwarding such a message): in the latter case the returned answer would be different.
Also, I haven't read the code, but I guess the case-insensitive comparison is… strange in DNS; every DNS library I looked into implemented its own to avoid problems with locale support and such.
@bluejekyll commented on GitHub (Dec 17, 2017):
Yes, TXT records should be capable of containing arbitrary binary data.
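Because TXT RDATA is just a sequence of <character-string>s, each a single length octet followed by that many bytes (RFC1035 §3.3 and §3.3.14), a binary-safe decoder never needs to assume UTF-8. The following is a minimal sketch, not trust-dns's actual decoder; decode_txt_rdata and TruncatedRdata are hypothetical names.

```rust
#[derive(Debug, PartialEq, Eq)]
pub struct TruncatedRdata;

/// Splits TXT RDATA into its <character-string>s, returning each one's
/// bytes untouched (no UTF-8 validation or lossy conversion).
/// RFC1035 requires at least one string; this sketch does not enforce that.
pub fn decode_txt_rdata(mut rdata: &[u8]) -> Result<Vec<Vec<u8>>, TruncatedRdata> {
    let mut strings = Vec::new();
    while let Some((&len, rest)) = rdata.split_first() {
        let len = usize::from(len);
        if rest.len() < len {
            // The length octet claims more bytes than remain.
            return Err(TruncatedRdata);
        }
        let (data, tail) = rest.split_at(len);
        strings.push(data.to_vec());
        rdata = tail;
    }
    Ok(strings)
}
```

Callers that want text can then attempt std::str::from_utf8 on each string and fall back to the raw bytes, which keeps records like the DNS-SD examples above intact.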
For returned queries, I finally implemented something in a recent change (#317) to return the exact bytes that were sent: https://github.com/bluejekyll/trust-dns/pull/317/files#diff-a1e92465cce4e170ace723b45258b94dR218
We'll see how far the Rust variants get us. If you know of any tests, maybe we could incorporate them.