[GH-ISSUE #139] UTF-8 Assumption #67

Closed
opened 2026-03-07 22:18:31 +03:00 by kerem · 5 comments

Originally created by @Nemo157 on GitHub (May 26, 2017).
Original GitHub issue: https://github.com/hickory-dns/hickory-dns/issues/139

So, after looking over RFC1035, I don't think there's any way in which sending and receiving UTF-8 encoded strings is against the standard; it's just a question of interoperability.

The standard specifies that, as far as the DNS system cares, pretty much everything is just binary strings. Clients that use the DNS system might limit which characters are allowed (e.g. only a subset of ASCII for hostnames in most clients). [RFC2181 §11](https://tools.ietf.org/html/rfc2181#section-11) and [RFC4343 §2](https://tools.ietf.org/html/rfc4343#section-2) both restate/clarify that even `label`s should be treated as plain binary data by the DNS system (except for comparison).

I think `trust-dns` has made the right choice in making UTF-8 encoded strings the common case for users (and punycode could quite easily be added as an optional transparent layer below this). Internally, I think it's necessary to work with raw bytes, as servers you're connecting to could send arbitrary binary data. If the internals are all binary safe, it would also be useful to have alternative APIs giving direct access to that data as an escape hatch for clients doing weird stuff (e.g. [IP-over-DNS](http://code.kryo.se/iodine/) without having to base64 encode everything).

As an example of other clients/servers using UTF-8: while coming up with the test cases below, I tried adding an A record for `§.example.`. On my machine BIND, `dig` and `trust-dns` all handled this interoperably via UTF-8 encoding, although BIND's `check-names` did complain that it was an invalid hostname.

I threw together a couple of test cases where `trust-dns` currently fails 😈. To avoid cluttering this discussion, I've posted them [in a gist here](https://gist.github.com/Nemo157/5fe69c52fbf106a9314033ebdc079df0). Whether or not you decide to support binary data, I think each of them shows an issue in the current implementation:

  1. The error returned when a server sends non-UTF-8 data seems like it should be more specific to that case.
  2. The parsing of `\255` in a `label` doesn't match what RFC4343 specifies. Rather than parsing to the Unicode character with value 255, I think it should either parse to the byte 0xFF (if binary data is supported) or only support escapes up to `\127`, the same way Rust's strings only support escapes up to `\x7F`.
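Item 2 above can be made concrete with a minimal Rust sketch. `parse_label` is a hypothetical helper, not trust-dns code; it assumes the RFC 4343 §2.1 presentation-format escapes (`\DDD` as a decimal octet value, `\X` as a literal character) and decodes each escape to a raw byte rather than a Unicode code point, so `\255` becomes 0xFF and anything above 255 is rejected.

```rust
/// Hypothetical helper (not trust-dns code): decode one label written in
/// presentation format into raw octets. Escapes follow RFC 4343 §2.1:
/// `\DDD` is a decimal octet value (000-255), `\X` is the literal character X.
fn parse_label(text: &str) -> Result<Vec<u8>, String> {
    let bytes = text.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] != b'\\' {
            // Ordinary octet; multi-byte UTF-8 sequences are copied unchanged.
            out.push(bytes[i]);
            i += 1;
            continue;
        }
        match bytes.get(i + 1) {
            None => return Err("dangling backslash at end of label".to_string()),
            Some(d) if d.is_ascii_digit() => {
                // \DDD: exactly three decimal digits must follow the backslash.
                let digits = bytes
                    .get(i + 1..i + 4)
                    .filter(|ds| ds.iter().all(u8::is_ascii_digit))
                    .ok_or_else(|| "\\DDD escape needs three digits".to_string())?;
                let value = digits
                    .iter()
                    .fold(0u32, |acc, digit| acc * 10 + u32::from(digit - b'0'));
                // The value names an octet, not a Unicode code point.
                if value > 255 {
                    return Err(format!("\\{} does not fit in one octet", value));
                }
                out.push(value as u8);
                i += 4;
            }
            Some(&c) => {
                // \X: take the next octet literally (e.g. "\." or "\\").
                out.push(c);
                i += 2;
            }
        }
    }
    Ok(out)
}
```

With this sketch, the RFC 4343 §2.2 example label `a\000\\\255z` decodes to the five octets `61 00 5C FF 7A`, which is not valid UTF-8; turning the result into a `String` then becomes a separate, fallible step.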
kerem closed this issue and added the cleanup label on 2026-03-07 22:18:31 +03:00.

@bluejekyll commented on GitHub (May 26, 2017):

Excellent research, thank you!

> Internally I think it's necessary to work with raw bytes as servers you're connecting to could send arbitrary binary data, and if the internals are all binary safe then it would be useful to have alternative APIs giving direct access to that as an escape hatch for clients doing weird stuff (e.g. IP-over-DNS without having to base64 encode everything).

Changing to storing binary internally is definitely an option, though for some reason it makes me feel odd. That said, I have been considering changing the internal representation of the `Name` struct; I've never been very happy with it.

I'm not sure if you're referring to "binary" labels above at all, as in [RFC2673](https://tools.ietf.org/html/rfc2673). I never added support for binary labels because they were deprecated in [RFC6891](https://tools.ietf.org/html/rfc6891), EDNS, so I decided not to support them. I can't remember the details, but I think they posed some issues with EDNS and other things in general.

Your gist is excellent. Those are good test cases that should be added. Assuming what you have there is valid, we can add them as test cases and fix the code. I don't do any explicit decoding of escaped character sequences in binary; it looks like I should probably cleanse the data before inserting it into the string.

  1. I don't do anything special with TXT record processing today. It's basically just a direct call through to this function: https://github.com/bluejekyll/trust-dns/blob/master/client/src/serialize/binary/decoder.rs#L94

  2. I think you're right about the `\255` parsing. That seems like a straightforward fix here: https://github.com/bluejekyll/trust-dns/blob/master/client/src/rr/domain.rs#L244
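One way to picture the binary-internal representation discussed above is a label type that stores raw octets and offers UTF-8 text only as a fallible view. The sketch below is hypothetical (`Label`, `NonUtf8Label`, `from_wire`, `as_bytes` and `to_utf8` are illustrative names, not the trust-dns `Name` API); it also shows how a dedicated error type could make the non-UTF-8 case more specific, as the first point of the original report asks.

```rust
use std::fmt;
use std::str::Utf8Error;

/// Hypothetical binary-safe label (not the trust-dns `Name` API): the octets
/// are stored exactly as they arrived on the wire.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Label(Vec<u8>);

/// Hypothetical dedicated error for "this label is not UTF-8".
#[derive(Debug)]
pub struct NonUtf8Label {
    pub bytes: Vec<u8>,
    pub utf8_error: Utf8Error,
}

impl fmt::Display for NonUtf8Label {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "label is not valid UTF-8 ({}): {:?}", self.utf8_error, self.bytes)
    }
}

impl std::error::Error for NonUtf8Label {}

impl Label {
    /// Build a label from raw wire octets; never fails on content.
    pub fn from_wire(octets: &[u8]) -> Label {
        Label(octets.to_vec())
    }

    /// Escape hatch: direct access to the raw octets.
    pub fn as_bytes(&self) -> &[u8] {
        &self.0
    }

    /// Common case: borrow the label as text, or report precisely why not.
    pub fn to_utf8(&self) -> Result<&str, NonUtf8Label> {
        std::str::from_utf8(&self.0).map_err(|utf8_error| NonUtf8Label {
            bytes: self.0.clone(),
            utf8_error,
        })
    }
}
```

The design choice here is that decoding never fails on content; only the conversion to text can fail, and it fails with an error that names the cause and keeps the offending bytes.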

@Nemo157 commented on GitHub (May 27, 2017):

> I'm not sure if you're referring to "binary" labels above at all, RFC2673.

Nope, by "binary label" I just meant labels containing non-ASCII data; I never came across RFC2673 during my random walk through the DNS RFCs.

The example I used in my gist came from [RFC4343 §2.2](https://tools.ietf.org/html/rfc4343#section-2.2): `a\000\\\255z.example.`, which is both non-ASCII and non-UTF-8 because of the 0xFF byte. I'm not sure whether these are actually useful in practice. By default, BIND rejects any zone file containing a hostname outside a limited subset of ASCII, but it does at least let you disable this error, and it will then serve those hostnames.

Interestingly, BIND's behavior here is non-standard according to [RFC2181 §11](https://tools.ietf.org/html/rfc2181#section-11):

> In particular, DNS servers must not refuse to serve a zone because it contains labels that might not be acceptable to some DNS client programs. A DNS server may be configurable to issue warnings when loading, or even to refuse to load, a primary zone containing labels that might be considered questionable, however this should not happen by default.
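Going the other way, from raw octets back to the escaped text form, shows why that RFC 4343 example cannot round-trip through a Rust `String`. `format_label` is a hypothetical helper; it assumes printable ASCII is written as-is, `.` and `\` are backslash-escaped, and every other octet is written as `\DDD`. Real zone-file printers such as BIND's may escape additional characters (for example `"` or `;`).

```rust
/// Hypothetical helper: render raw label octets in presentation format.
/// Assumed rules: printable ASCII (0x21-0x7E) is written as-is, `.` and `\`
/// get a backslash, and every other octet is written as three-digit `\DDD`.
fn format_label(octets: &[u8]) -> String {
    let mut out = String::with_capacity(octets.len());
    for &b in octets {
        match b {
            // Label separator and escape character must themselves be escaped.
            b'.' | b'\\' => {
                out.push('\\');
                out.push(b as char);
            }
            // Remaining printable ASCII is safe to emit directly.
            0x21..=0x7E => out.push(b as char),
            // Space, control bytes, DEL and all non-ASCII octets.
            _ => out.push_str(&format!("\\{:03}", b)),
        }
    }
    out
}
```

Under these assumptions, the octets `a`, 0x00, `\`, 0xFF, `z` render as `a\000\\\255z`, and the UTF-8 label `§` comes out as `\194\167`.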

@dimbleby commented on GitHub (Oct 23, 2017):

Valid TXT records are certainly allowed to contain arbitrary binary data. Section 6.5 of RFC6763 (DNS service discovery, which makes use of TXT records) is explicit about this:

> The value is opaque binary data. Often the value for a particular attribute will be US-ASCII [RFC20] or UTF-8 [RFC3629] text, but it is legal for a value to be any binary data.

Here's an example found in the wild:

```
$ dig @8.8.8.8 like.com.sa TXT

; <<>> DiG 9.9.4-RedHat-9.9.4-51.el7 <<>> @8.8.8.8 like.com.sa TXT
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 15148
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;like.com.sa.                   IN      TXT

;; ANSWER SECTION:
like.com.sa.            14399   IN      TXT     "v=spf1 ip4:70.38.11.53 +a +mx +ip4:\184k\180c ?all"

;; Query time: 300 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Mon Oct 23 08:35:50 BST 2017
;; MSG SIZE  rcvd: 97
```
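Handling data like this only requires keeping TXT values as bytes. The sketch below assumes the RFC 1035 §3.3.14 wire layout (TXT RDATA is a sequence of `<character-string>`s, each a length octet followed by that many octets); `parse_txt_rdata` is a hypothetical name, and it never attempts UTF-8 decoding, so values such as the `\184k\180c` bytes above survive intact.

```rust
/// Hypothetical TXT RDATA decoder: each <character-string> is one length
/// octet followed by that many octets. Values are returned as raw bytes;
/// no UTF-8 decoding is attempted.
fn parse_txt_rdata(rdata: &[u8]) -> Result<Vec<Vec<u8>>, String> {
    let mut strings = Vec::new();
    let mut rest = rdata;
    while let Some((&len, tail)) = rest.split_first() {
        let len = usize::from(len);
        if tail.len() < len {
            return Err(format!(
                "character-string claims {} octets, only {} left",
                len,
                tail.len()
            ));
        }
        // Split off exactly `len` octets as one opaque value.
        let (value, remainder) = tail.split_at(len);
        strings.push(value.to_vec());
        rest = remainder;
    }
    Ok(strings)
}
```

Callers that want text can then try `String::from_utf8` on each value and fall back to an escaped rendering when it fails, instead of rejecting the whole record.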

@vorner commented on GitHub (Dec 17, 2017):

I think there's a difference between refusing to load and serve a zone with non-ASCII names and failing to parse a valid DNS message because it contains such a name (e.g. a client asking for it, where the returned answer would be different, or a resolver forwarding such a message).

Also, I haven't read the code, but I'd guess case-insensitive comparison is… strange in DNS: every DNS library I've looked into implements its own to avoid problems with locale support and such.
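For comparison, RFC 4343 defines DNS case-insensitivity purely on octets: only the ASCII letters are folded, and no locale or Unicode case mapping is involved. A minimal sketch under that assumption follows; `fold`, `labels_equal` and `names_equal` are hypothetical helpers, and the standard library's `<[u8]>::eq_ignore_ascii_case` applies the same per-octet rule to a single label.

```rust
/// Fold only ASCII 'A'-'Z' to lowercase; all other octets are left alone.
fn fold(octet: u8) -> u8 {
    if octet.is_ascii_uppercase() {
        octet + (b'a' - b'A')
    } else {
        octet
    }
}

/// Hypothetical label comparison: equal length, and equal octet by octet
/// after ASCII-only folding. Non-ASCII bytes (e.g. UTF-8) must match exactly.
fn labels_equal(a: &[u8], b: &[u8]) -> bool {
    a.len() == b.len() && a.iter().zip(b).all(|(&x, &y)| fold(x) == fold(y))
}

/// Hypothetical name comparison over label lists (root label omitted).
fn names_equal(a: &[&[u8]], b: &[&[u8]]) -> bool {
    a.len() == b.len() && a.iter().zip(b).all(|(x, y)| labels_equal(x, y))
}
```

Because the rule is defined on bytes, `Ä` and `ä` (UTF-8 `C3 84` and `C3 A4`) compare as different labels, while binary labels containing 0xFF still compare correctly.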

@bluejekyll commented on GitHub (Dec 17, 2017):

> Valid TXT records are certainly allowed to contain arbitrary binary data

Yes, TXT records should be capable of containing arbitrary binary data.

> think there's a difference between not wanting to load & serve a zone with non-ASCII names, and with failing to parse a valid DNS message because of that (eg. client asking for such a name) ‒ the returned answer would be different, or to forward such message in a resolver.

For returned queries, I finally implemented something in a recent change, #317, to return the exact bytes that were sent: https://github.com/bluejekyll/trust-dns/pull/317/files#diff-a1e92465cce4e170ace723b45258b94dR218

> every DNS library I looked into implemented their own to avoid problems with locale support and such.

We'll see how far the Rust variants get us... If you know of any tests, maybe we could incorporate them.
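Case-insensitive lookup and returning the exact bytes that were sent can coexist: fold a copy of the name for matching and echo the original octets. The following is a hypothetical sketch in that spirit, not the code from #317; `Zone` and `answer` are illustrative names, names are simplified to flat byte strings, and folding uses Rust's ASCII-only `<[u8]>::to_ascii_lowercase` rather than the Unicode-aware `str::to_lowercase`.

```rust
use std::collections::HashMap;

/// Hypothetical zone: keys are ASCII-folded names, values are simplified RDATA.
struct Zone {
    records: HashMap<Vec<u8>, Vec<u8>>,
}

impl Zone {
    /// Look the name up case-insensitively, but return the question name
    /// exactly as the client sent it, alongside the matching RDATA.
    fn answer(&self, qname: &[u8]) -> Option<(Vec<u8>, Vec<u8>)> {
        // ASCII-only folding: non-ASCII octets are left untouched.
        let key = qname.to_ascii_lowercase();
        self.records
            .get(&key)
            .map(|rdata| (qname.to_vec(), rdata.clone()))
    }
}
```

As one Rust variant to avoid for this purpose: `str::to_lowercase` would fold `Ä` to `ä`, which DNS comparison does not do, so the ASCII-only methods are the closer fit.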