[GH-ISSUE #1782] Stack overflow in trust-dns-recursor #767

Open
opened 2026-03-16 00:10:38 +03:00 by kerem · 3 comments
Owner

Originally created by @iustin24 on GitHub (Sep 27, 2022).
Original GitHub issue: https://github.com/hickory-dns/hickory-dns/issues/1782

Describe the bug

The recursor binary in trust-dns-util does not work and returns a stack overflow error. I've also tried writing my own implementation using trust-dns-recursor and I'm getting the same stack overflow error.

To Reproduce

> cargo install --bin recurse trust-dns-util
> recurse google.com -n 9.9.9.9:53
Recursing for google.com A from roots

thread 'main' has overflowed its stack
fatal runtime error: stack overflow
zsh: abort      recurse google.com -n 9.9.9.9:53

System:

  • OS: MacOS
  • Architecture: arm64
  • Version: rustc 1.65.0-nightly (b44197abb 2022-09-05)

@bluejekyll commented on GitHub (Sep 27, 2022):

Thanks for the report, I was able to reproduce easily. The recursor doesn't currently have any test coverage, something may have regressed in the lead up to the last release.


@bluejekyll commented on GitHub (Sep 30, 2022):

Ok, started looking at this. Something about quad9's response is triggering this issue. If you use this query, things work fine:

cargo run --bin recurse -- google.com -n 192.58.128.30:53
    Finished dev [unoptimized + debuginfo] target(s) in 0.20s
     Running `target/debug/recurse google.com -n '192.58.128.30:53'`
Recursing for google.com A from roots
Success for query Lookup { query: Query { name: Name("google.com."), query_type: A, query_class: IN }, records: [Record { name_labels: Name("google.com."), rr_type: A, dns_class: IN, ttl: 300, rdata: Some(A(142.250.189.174)) }], valid_until: Instant { t: 53822582600830 } }
        google.com. 300 IN A 142.250.189.174

Stack overflow happens with 8.8.8.8:53 as the name server as well. Based on the --debug output, it looks like we're not treating the forward to .net from these name servers properly.

Looking at the responses from the name server, it appears that they expect us to have the glue for the root nameservers always registered, see:

2022-09-29T23:56:50.364692Z DEBUG trust_dns_resolver::error: Response:; header 16835:RESPONSE:RA:NoError:QUERY:13/0/0
; query
;; net. IN NS
; answers 13
net. 20132 IN NS h.gtld-servers.net.
net. 20132 IN NS f.gtld-servers.net.
net. 20132 IN NS b.gtld-servers.net.
net. 20132 IN NS j.gtld-servers.net.
net. 20132 IN NS l.gtld-servers.net.
net. 20132 IN NS c.gtld-servers.net.
net. 20132 IN NS e.gtld-servers.net.
net. 20132 IN NS a.gtld-servers.net.
net. 20132 IN NS k.gtld-servers.net.
net. 20132 IN NS i.gtld-servers.net.
net. 20132 IN NS g.gtld-servers.net.
net. 20132 IN NS d.gtld-servers.net.
net. 20132 IN NS m.gtld-servers.net.
; nameservers 0
; additionals 0
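
As a rough illustration of what the response above implies, here is a minimal sketch that lists which NS targets arrive without glue. It is not trust-dns code: `ns_without_glue` and its string-based inputs are hypothetical stand-ins for the parsed answer and additional sections.

```rust
use std::collections::HashSet;

/// Hypothetical helper: returns the NS target names that have no address
/// record among the owners of the additional section, i.e. the name
/// servers whose IPs would have to be resolved separately.
fn ns_without_glue<'a>(ns_targets: &[&'a str], additional_owners: &[&str]) -> Vec<&'a str> {
    // DNS names compare case-insensitively, so normalize before matching.
    let glued: HashSet<String> = additional_owners
        .iter()
        .map(|n| n.to_ascii_lowercase())
        .collect();
    ns_targets
        .iter()
        .copied()
        .filter(|t| !glued.contains(&t.to_ascii_lowercase()))
        .collect()
}

fn main() {
    // Two of the thirteen NS targets from the log, with an empty
    // additional section ("; additionals 0").
    let ns = ["a.gtld-servers.net.", "b.gtld-servers.net."];
    let additionals: [&str; 0] = [];
    let missing = ns_without_glue(&ns, &additionals);
    println!("{} of {} name servers lack glue", missing.len(), ns.len());
}
```

With an empty additional section every NS target ends up in the "missing" list, which matches the situation in the log.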

So my guess is we're getting stuck looking for glue for .net. and .com. because 9.9.9.9 and 8.8.8.8 are not root name servers. To me this implies that we should always register the roots, which currently live in a separate file, but maybe we should consider compiling them in.
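
A minimal sketch of what "compiling them in" could look like, using only the standard library. The constant name `ROOT_HINTS_V4` and the helper `root_socket_addrs` are hypothetical, not part of trust-dns. The table is deliberately a small subset of the root servers (it includes 192.58.128.30, the address used in the working query above), and any real list should be checked against the root hints file published by IANA.

```rust
use std::net::{IpAddr, Ipv4Addr, SocketAddr};

/// Hypothetical compiled-in subset of the root hints (IPv4 only).
/// Verify against the IANA root hints file before relying on it.
const ROOT_HINTS_V4: &[(&str, Ipv4Addr)] = &[
    ("a.root-servers.net.", Ipv4Addr::new(198, 41, 0, 4)),
    ("f.root-servers.net.", Ipv4Addr::new(192, 5, 5, 241)),
    ("j.root-servers.net.", Ipv4Addr::new(192, 58, 128, 30)),
    ("k.root-servers.net.", Ipv4Addr::new(193, 0, 14, 129)),
];

/// Turns the static table into socket addresses on the DNS port (53).
fn root_socket_addrs() -> Vec<SocketAddr> {
    ROOT_HINTS_V4
        .iter()
        .map(|(_, ip)| SocketAddr::new(IpAddr::V4(*ip), 53))
        .collect()
}

fn main() {
    for addr in root_socket_addrs() {
        println!("{}", addr);
    }
}
```

A table like this only gives the recursor somewhere to start. It does not by itself supply addresses for the gTLD name servers.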


@Arnavion commented on GitHub (Jun 7, 2023):

> To me this implies that we should always register the roots, which currently is a separate file, but maybe we should consider compiling them in.

Are you talking about what IANA calls the root hints file or the root zone file? The former is the 26 A and AAAA records for {a-m}.root-servers.net, but it wouldn't solve OP's problem, because it doesn't help with figuring out the IPs of the *.gtld-servers.net servers. The latter contains the A and AAAA records of all the gTLD nameservers as well (including {a-m}.gtld-servers.net), which would solve OP's problem, but is also much bigger (~11k A and AAAA records if I counted correctly) and is not static.

Hard-coding the full root zone shouldn't be necessary to solve this problem, though it would certainly be nice if trust-dns-recursor had a pub static WELL_KNOWN_ROOT_SERVERS: Vec<NameServerConfig> = ...; for the thirteen root-servers.net servers, just so that a user can create a Recursor with them. The real issue as I see it is that trust-dns-recursor implements QNAME minimization but doesn't fall back to a larger query if a smaller query leads to a loop. In OP's case it means trust-dns-recursor does a stack overflow instead of falling back to querying the 9.9.9.9 server for IN A a.gtld-servers.net.
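
The loop and the suggested fallback can be sketched with a toy model. This is not how trust-dns-recursor is implemented. `zone_server`, `find_address`, `resolve`, and `Outcome` are all hypothetical, and the zone data is hard-coded to mirror the thread: `com.` and `net.` are both served by `a.gtld-servers.net.`, which itself lives under `net.` and comes without glue. An in-flight set turns the unbounded recursion into an error, and the caller reacts by issuing the full query instead.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq)]
enum Outcome {
    /// Resolved by walking the delegation chain.
    Walked(String),
    /// The walk revisited a name; ask the configured server for it directly.
    FullQueryFallback(String),
}

/// Toy delegation data (an assumption for illustration): a glueless
/// name server for each known TLD.
fn zone_server(zone: &str) -> Option<&'static str> {
    match zone {
        "com." | "net." => Some("a.gtld-servers.net."),
        _ => None,
    }
}

/// Last label of a fully qualified name, with the trailing dot restored.
fn tld_of(name: &str) -> String {
    let last = name.trim_end_matches('.').rsplit('.').next().unwrap_or("");
    format!("{}.", last)
}

/// Returns Err(name) when resolving `name` would require resolving
/// `name` again, instead of recursing until the stack overflows.
fn find_address(name: &str, in_flight: &mut HashSet<String>) -> Result<String, String> {
    if !in_flight.insert(name.to_string()) {
        return Err(name.to_string());
    }
    let result = match zone_server(&tld_of(name)) {
        // Glueless delegation: the server's own address is needed first.
        Some(server) => {
            find_address(server, in_flight).map(|_| format!("{} via {}", name, server))
        }
        None => Ok(format!("{} via roots", name)),
    };
    in_flight.remove(name);
    result
}

fn resolve(name: &str) -> Outcome {
    let mut in_flight = HashSet::new();
    match find_address(name, &mut in_flight) {
        Ok(path) => Outcome::Walked(path),
        Err(looped) => Outcome::FullQueryFallback(looped),
    }
}

fn main() {
    println!("{:?}", resolve("google.com."));
    println!("{:?}", resolve("example.org."));
}
```

In this model `resolve("google.com.")` ends in `FullQueryFallback("a.gtld-servers.net.")`, which corresponds to sending IN A a.gtld-servers.net to the configured server rather than recursing again.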

FYI, another example of a server that doesn't return the A/AAAA glue records for gTLD NS queries is Unbound. It filters the additional data out unless val-clean-additional is set to no. That is how I came across this issue.
