[GH-ISSUE #1889] Infinite recursion and stack overflow on handling broken glueless referral #807

Open
opened 2026-03-16 00:19:13 +03:00 by kerem · 1 comment

Originally created by @jonasbb on GitHub (Jan 26, 2023).
Original GitHub issue: https://github.com/hickory-dns/hickory-dns/issues/1889

Describe the bug

There is an infinite recursion while looking for missing glue data, leading to a stack overflow.
The main stack overflow is this part, which repeats until the crash at the end. After the crash, no further DNS processing is done, effectively causing a denial-of-service.

1674744963:DEBUG:trust_dns_recursor::recursor:391:glue not found for ns-ns.ns.
1674744963:DEBUG:trust_dns_recursor::recursor:400:need glue for ns.
1674744963:DEBUG:trust_dns_recursor::recursor:330:using roots for ns. nameservers
1674744963:DEBUG:trust_dns_recursor::recursor:286:cached data Ok(Lookup { query: Query { name: Name("ns."), query_type: NS, query_class: IN }, records: [Record { name_labels: Name("ns."), rr_type: NS, dns_class: IN, ttl: 86400, rdata: Some(NS(Name("ns-ns.ns."))) }], valid_until: Instant { tv_sec: 3907817, tv_nsec: 149618961 } })

thread 'trust-dns-server-runtime' has overflowed its stack
fatal runtime error: stack overflow
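
The loop in the log above can be modeled with a small sketch. This is a toy model and not the trust-dns recursor code: `ToyDns`, `Outcome`, `holder_zone`, `nameserver_for`, and `issue_scenario` are hypothetical names, and the address from `root.hints` is treated as if it were glue. The sketch assumes the recursor could track which zones are already being looked up and fail instead of re-entering one of them.

```rust
use std::collections::{HashMap, HashSet};

/// Toy view of the data a resolver could learn. Hypothetical, not the trust-dns API.
#[derive(Default)]
pub struct ToyDns {
    /// zone -> names of its nameservers (NS records)
    pub ns: HashMap<String, Vec<String>>,
    /// nameserver name -> address delivered as glue (or via root hints)
    pub glue: HashMap<String, String>,
    /// nameserver name -> address its holder zone's server would return for an A query
    pub zone_a: HashMap<String, String>,
}

#[derive(Debug, PartialEq)]
pub enum Outcome {
    Address(String),
    ServFail,
}

/// Zone that holds `name` in this toy model: "ns-ns.ns." -> "ns.", "ns." -> ".".
pub fn holder_zone(name: &str) -> String {
    match name.trim_end_matches('.').split_once('.') {
        Some((_, rest)) => format!("{}.", rest),
        None => ".".to_string(),
    }
}

/// Find the address of a nameserver for `zone`, refusing to re-enter a zone
/// whose lookup is already in progress further up the call stack.
pub fn nameserver_for(dns: &ToyDns, zone: &str, in_progress: &mut HashSet<String>) -> Outcome {
    if !in_progress.insert(zone.to_string()) {
        // Cycle: this lookup would need its own result to finish.
        return Outcome::ServFail;
    }
    let mut outcome = Outcome::ServFail;
    for ns_name in dns.ns.get(zone).into_iter().flatten() {
        if let Some(addr) = dns.glue.get(ns_name) {
            outcome = Outcome::Address(addr.clone());
            break;
        }
        // Glueless: a server for the zone holding the NS name must be reachable
        // before the address of the NS name itself can be asked for.
        if let Outcome::Address(_) = nameserver_for(dns, &holder_zone(ns_name), in_progress) {
            if let Some(addr) = dns.zone_a.get(ns_name) {
                outcome = Outcome::Address(addr.clone());
                break;
            }
        }
    }
    in_progress.remove(zone);
    outcome
}

/// Data from this issue: `ns.` is served only by `ns-ns.ns.`, which has no glue.
pub fn issue_scenario() -> ToyDns {
    let mut dns = ToyDns::default();
    dns.ns.insert(".".into(), vec!["ns-root.ns.".into()]);
    dns.ns.insert("ns.".into(), vec!["ns-ns.ns.".into()]);
    dns.glue.insert("ns-root.ns.".into(), "127.64.1.1".into());
    dns
}
```

With the data from this issue, `nameserver_for(&issue_scenario(), "ns.", &mut HashSet::new())` ends in `Outcome::ServFail` after one nested call instead of recursing until the stack is exhausted. A fixed depth limit would be an alternative guard; the in-progress set is used here because it names the exact zone that closes the cycle.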
Debug logs until the first recursion loop:

1674744963:DEBUG:trust_dns_server::server::server_future:80:received udp request from: 127.0.0.1:37540
1674744963:DEBUG:trust_dns_server::server::server_future:818:request:15997 src:UDP://127.0.0.1#37540 type:QUERY dnssec:false QUERY:.:NS:IN qflags:RD,AD
1674744963:DEBUG:trust_dns_server::authority::catalog:141:query received: 15997
1674744963:DEBUG:trust_dns_server::authority::catalog:387:searching authorities for: .
1674744963:DEBUG:trust_dns_server::authority::catalog:410:request: 15997 found authority: .
1674744963:DEBUG:trust_dns_server::authority::catalog:490:performing name: . type: NS class: IN on .
1674744963:DEBUG:trust_dns_server::authority::authority_object:190:performing name: . type: NS class: IN on .
1674744963:DEBUG:trust_dns_server::store::recursor::authority:120:recursive lookup: . NS
1674744963:DEBUG:trust_dns_recursor::recursor:330:using roots for . nameservers
1674744963:INFO:trust_dns_recursor::recursor_pool:96:querying . for . IN NS
1674744963:DEBUG:trust_dns_proto::xfer::dns_handle:67:querying: . NS
1674744963:DEBUG:trust_dns_resolver::name_server::name_server_pool:257:sending request: [Query { name: Name("."), query_type: NS, query_class: IN }]
1674744963:DEBUG:trust_dns_resolver::name_server::name_server:115:reconnecting: NameServerConfig { socket_addr: 127.64.1.1:53, protocol: Udp, tls_dns_name: None, trust_negative_responses: false, bind_addr: None }
1674744963:DEBUG:trust_dns_proto::xfer:171:enqueueing message:QUERY:[Query { name: Name("."), query_type: NS, query_class: IN }]
1674744963:DEBUG:trust_dns_proto::udp::udp_client_stream:190:final message: ; header 17530:QUERY::NoError:QUERY:0/0/0
; query
;; . IN NS

1674744963:DEBUG:trust_dns_proto::udp::udp_stream:254:created socket successfully
1674744963:DEBUG:trust_dns_proto::udp::udp_client_stream:304:received message id: 17530
1674744963:DEBUG:trust_dns_resolver::error:148:Response:; header 17530:RESPONSE:AA:NoError:QUERY:1/0/0
; query
;; . IN NS
; answers 1
. 86400 IN NS ns-root.ns.
; nameservers 0
; additionals 0

1674744963:INFO:trust_dns_recursor::recursor:298:response: 17530:RESPONSE:AA:NoError:QUERY:1/0/0
1674744963:DEBUG:trust_dns_recursor::recursor:391:glue not found for ns-root.ns.
1674744963:DEBUG:trust_dns_recursor::recursor:400:need glue for .
1674744963:DEBUG:trust_dns_recursor::recursor:330:using roots for ns. nameservers
1674744963:INFO:trust_dns_recursor::recursor_pool:96:querying . for ns. IN NS
1674744963:DEBUG:trust_dns_proto::xfer::dns_handle:67:querying: ns. NS
1674744963:DEBUG:trust_dns_resolver::name_server::name_server_pool:257:sending request: [Query { name: Name("ns."), query_type: NS, query_class: IN }]
1674744963:DEBUG:trust_dns_resolver::name_server::name_server:128:existing connection: NameServerConfig { socket_addr: 127.64.1.1:53, protocol: Udp, tls_dns_name: None, trust_negative_responses: false, bind_addr: None }
1674744963:DEBUG:trust_dns_proto::xfer:171:enqueueing message:QUERY:[Query { name: Name("ns."), query_type: NS, query_class: IN }]
1674744963:DEBUG:trust_dns_recursor::recursor:330:using roots for ns. nameservers
1674744963:DEBUG:trust_dns_proto::udp::udp_client_stream:190:final message: ; header 42444:QUERY::NoError:QUERY:0/0/0
; query
;; ns. IN NS

1674744963:DEBUG:trust_dns_proto::udp::udp_stream:254:created socket successfully
1674744963:DEBUG:trust_dns_proto::udp::udp_client_stream:304:received message id: 42444
1674744963:DEBUG:trust_dns_resolver::error:148:Response:; header 42444:RESPONSE:AA:NoError:QUERY:1/0/0
; query
;; ns. IN NS
; answers 1
ns. 86400 IN NS ns-ns.ns.
; nameservers 0
; additionals 0

1674744963:INFO:trust_dns_recursor::recursor:298:response: 42444:RESPONSE:AA:NoError:QUERY:1/0/0
1674744963:DEBUG:trust_dns_recursor::recursor:391:glue not found for ns-ns.ns.
1674744963:DEBUG:trust_dns_recursor::recursor:400:need glue for ns.
1674744963:DEBUG:trust_dns_recursor::recursor:330:using roots for ns. nameservers
1674744963:DEBUG:trust_dns_recursor::recursor:286:cached data Ok(Lookup { query: Query { name: Name("ns."), query_type: NS, query_class: IN }, records: [Record { name_labels: Name("ns."), rr_type: NS, dns_class: IN, ttl: 86400, rdata: Some(NS(Name("ns-ns.ns."))) }], valid_until: Instant { tv_sec: 3907817, tv_nsec: 149618961 } })

I am performing some DNS tests and encountered this crash with trust-dns. No other resolver seems to have a problem with the glueless delegations.
The DNS client starts with the query `kdig @127.0.0.1 NS .`. The log above is the result of that query.

trust-dns seems to get stuck on this answer record, since it doesn't contain an IP address:

1674744963:DEBUG:trust_dns_resolver::error:148:Response:; header 42444:RESPONSE:AA:NoError:QUERY:1/0/0
; query
;; ns. IN NS
; answers 1
ns. 86400 IN NS ns-ns.ns.
; nameservers 0
; additionals 0

I think this response is technically wrong and should be a referral instead. A referral would have an empty answer section, with the NS record in the nameserver section and the IP addresses as additional records. Other resolvers handle the situation without breaking.
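
To make the difference between the two response shapes concrete, here is a minimal sketch. The types are simplified stand-ins and not the trust-dns message types; `ToyRecord`, `ToyResponse`, `Shape`, and `classify` are hypothetical names. The address `192.0.2.53` is a documentation-range placeholder, since the real address of `ns-ns.ns.` does not appear in the log.

```rust
/// Simplified resource record; a hypothetical stand-in, not a trust-dns type.
#[derive(Debug, Clone, PartialEq)]
pub struct ToyRecord {
    pub owner: &'static str,
    pub rtype: &'static str,
    pub rdata: &'static str,
}

/// The three record sections of a response, without header or question.
#[derive(Debug, Default)]
pub struct ToyResponse {
    pub answers: Vec<ToyRecord>,
    pub name_servers: Vec<ToyRecord>,
    pub additionals: Vec<ToyRecord>,
}

#[derive(Debug, PartialEq)]
pub enum Shape {
    /// Records in the answer section.
    Answer { ns_without_address: bool },
    /// Empty answer section, NS records in the nameserver section.
    Referral { ns_without_address: bool },
    Empty,
}

/// True if any NS target in `section` has no A/AAAA record in `additionals`.
fn has_glueless_ns(section: &[ToyRecord], additionals: &[ToyRecord]) -> bool {
    section.iter().filter(|r| r.rtype == "NS").any(|ns| {
        !additionals
            .iter()
            .any(|a| (a.rtype == "A" || a.rtype == "AAAA") && a.owner == ns.rdata)
    })
}

pub fn classify(resp: &ToyResponse) -> Shape {
    if !resp.answers.is_empty() {
        Shape::Answer {
            ns_without_address: has_glueless_ns(&resp.answers, &resp.additionals),
        }
    } else if resp.name_servers.iter().any(|r| r.rtype == "NS") {
        Shape::Referral {
            ns_without_address: has_glueless_ns(&resp.name_servers, &resp.additionals),
        }
    } else {
        Shape::Empty
    }
}

/// The response from the log: one NS record in the answer section, nothing else.
pub fn observed_response() -> ToyResponse {
    ToyResponse {
        answers: vec![ToyRecord { owner: "ns.", rtype: "NS", rdata: "ns-ns.ns." }],
        ..Default::default()
    }
}

/// The referral described above; the address is a placeholder.
pub fn referral_with_glue() -> ToyResponse {
    ToyResponse {
        name_servers: vec![ToyRecord { owner: "ns.", rtype: "NS", rdata: "ns-ns.ns." }],
        additionals: vec![ToyRecord { owner: "ns-ns.ns.", rtype: "A", rdata: "192.0.2.53" }],
        ..Default::default()
    }
}
```

In this model the observed response classifies as `Shape::Answer { ns_without_address: true }`, while the referral classifies as `Shape::Referral { ns_without_address: false }`. Only the first case forces the resolver to go looking for an address on its own.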

To Reproduce
Reproduction requires a custom root server setup. The full trust-dns configuration is under "Additional context". The root DNS server communication is in the log and can be replayed.

Expected behavior
trust-dns should not run into a stack overflow. It should either handle the broken response gracefully, or return a SERVFAIL.
To handle the error gracefully, trust-dns could contact the pre-configured root server, of which it has the IP address, to ask it to provide the missing IP address, i.e.:
Send a query @127.64.1.1 IN A ns-ns.ns and @127.64.1.1 IN AAAA ns-ns.ns.
Use these IP addresses to continue the recursive lookup.
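
The fallback described above could look roughly like the following sketch. It is not based on the actual recursor internals: the `Upstream` trait, `StaticUpstream`, `GlueResult`, and `fetch_missing_glue` are hypothetical names, and network access is replaced by a canned lookup so the example stays self-contained.

```rust
use std::net::{IpAddr, Ipv4Addr};

/// Hypothetical abstraction over "send one query to one server".
pub trait Upstream {
    /// Ask `server` for records of type `qtype` ("A" or "AAAA") owned by `name`.
    fn query_addresses(&self, server: IpAddr, name: &str, qtype: &str) -> Vec<IpAddr>;
}

#[derive(Debug, PartialEq)]
pub enum GlueResult {
    Addresses(Vec<IpAddr>),
    /// Nothing usable came back; the client should receive SERVFAIL.
    ServFail,
}

/// Ask a server whose address is already known (here: the configured root)
/// for the A and AAAA records of a glueless nameserver name.
pub fn fetch_missing_glue(upstream: &dyn Upstream, known_server: IpAddr, ns_name: &str) -> GlueResult {
    let mut found = Vec::new();
    for qtype in ["A", "AAAA"] {
        found.extend(upstream.query_addresses(known_server, ns_name, qtype));
    }
    if found.is_empty() {
        GlueResult::ServFail
    } else {
        GlueResult::Addresses(found)
    }
}

/// Canned upstream for the example: (owner, type, address) triples.
pub struct StaticUpstream {
    pub records: Vec<(String, String, IpAddr)>,
}

impl Upstream for StaticUpstream {
    fn query_addresses(&self, _server: IpAddr, name: &str, qtype: &str) -> Vec<IpAddr> {
        self.records
            .iter()
            .filter(|(owner, rtype, _)| owner.as_str() == name && rtype.as_str() == qtype)
            .map(|(_, _, addr)| *addr)
            .collect()
    }
}

/// Address of the root server from `root.hints` in this issue.
pub fn configured_root() -> IpAddr {
    IpAddr::V4(Ipv4Addr::new(127, 64, 1, 1))
}
```

If the root server returns an address for `ns-ns.ns.`, the recursive lookup can continue with it. If both queries come back empty, the function ends in `GlueResult::ServFail`, which corresponds to the second acceptable outcome: a SERVFAIL for the client rather than a crash.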

System:

  • OS: Fedora
  • Architecture: x86_64
  • Version 37
  • rustc version: 1.66.1

Version:
Crate: recursor
Version: latest commit 0b6fefea3fefe1086fed4df6781550462de51553

Additional context
I tested BIND9, Knot Resolver, PowerDNS Recursor, Unbound, and Deadwood; all of them can handle the wrong response fine.

root.hints:

.           3600000 NS  ns-root.ns.
ns-root.ns. 3600000 A   127.64.1.1

trust-dns.toml:

listen_addrs_ipv4 = [ "127.0.0.1" ]
listen_addrs_ipv6 = []

log_level = "debug"

[[zones]]
## zone: this is the ORIGIN of the zone, aka the base name, '.' is implied on the end
##  specifying something other than '.' here, will restrict this recursor to only queries
##  where the search name is a subzone of the name, e.g. if zone is "example.com.", then
##  queries for "www.example.com" or "example.com" would be recursively queried.
zone = "."

## zone_type: Primary, Secondary, Hint, Forward
zone_type = "Hint"

## remember the port, defaults: 53 for Udp & Tcp, 853 for Tls and 443 for Https.
##   Tls and/or Https require features dns-over-tls and/or dns-over-https
stores = { type = "recursor", roots = "/usr/local/etc/trust-dns/root.hints" }

tcpdump.zip (https://github.com/bluejekyll/trust-dns/files/10511204/tcpdump.zip)


@bluejekyll commented on GitHub (Jan 26, 2023):

We really need to get some integration tests set up for the recursor. This isn’t surprising, but it would be good to reproduce it in some tests and fix it there.

I’ll try to take a look when I have some time, but if anyone finds a way to deal with this, please don’t hesitate to put up a PR.
