mirror of
https://github.com/hickory-dns/hickory-dns.git
synced 2026-04-25 11:15:54 +03:00
[GH-ISSUE #2466] Recursor: follow NS referrals other than the first one #997
Originally created by @divergentdave on GitHub (Sep 18, 2024).
Original GitHub issue: https://github.com/hickory-dns/hickory-dns/issues/2466
There are three places in `RecursorDnsHandle` where we call `take(1)` on an iterator of NS records, and then look up A/AAAA records. If there are multiple addresses for the first eligible name server, we will set up a name server pool covering all of them, but we will only use the one eligible NS record. This means that the recursor may only contact primary name servers, and not secondary name servers.
This has first-order impacts on resilience because we can't fall back properly to the secondary, and second-order impacts on resilience because it increases the likelihood of getting rate-limited by the primary name server.
We should at least randomly pick an NS record, or better yet, load balance between name server names based on past request statistics, much as we currently do between different IP addresses for the same name server name. RFC 1034 and RFC 1035 provide a recommended algorithm that is more sophisticated. For best reliability, we may want to look up addresses for one or two NS records, try making requests to some addresses, and look up more name server addresses corresponding to other NS records as needed if the initial requests are taking too long, and we need more addresses.
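To make the "load balance based on past request statistics" idea concrete, here is a minimal sketch that ranks name server names by a score built from smoothed round-trip time and recent failures. Everything in it (`ServerStats`, `FAILURE_PENALTY`, `score`, `rank_servers`) is hypothetical and invented for illustration. It is not how hickory-dns tracks statistics today, and it is far simpler than the algorithm the RFCs recommend.

```rust
use std::time::Duration;

/// Hypothetical per-name-server statistics (not a hickory-dns type).
#[derive(Debug, Clone)]
pub struct ServerStats {
    pub ns_name: String,
    /// Smoothed round-trip time; `None` if this server was never queried.
    pub srtt: Option<Duration>,
    pub consecutive_failures: u32,
}

/// Penalty added per consecutive failure (an arbitrary illustrative value).
const FAILURE_PENALTY: Duration = Duration::from_millis(500);

/// Lower scores are better.
fn score(stats: &ServerStats) -> Duration {
    // Assumption: untried servers start at zero so that they get probed early.
    let base = stats.srtt.unwrap_or(Duration::ZERO);
    base + FAILURE_PENALTY * stats.consecutive_failures
}

/// Returns the servers ordered from most to least preferred.
pub fn rank_servers(mut servers: Vec<ServerStats>) -> Vec<ServerStats> {
    // `sort_by_key` is stable, so servers with equal scores keep their input order.
    servers.sort_by_key(score);
    servers
}
```

With a ranking like this, a fast primary that has started failing drops behind a slower but healthy secondary, which is the fallback behavior the `take(1)` calls currently prevent.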
Would it be appropriate to put all the connections for different name servers in the same `NameServerPool` for a zone, rather than just connections for different addresses of one name server?
@marcus0x62 commented on GitHub (Sep 18, 2024):
This has been on my list for a while. The short answer is, the take(1) calls can be removed without any issue (I almost included that change in the recent infinite recursion PR, but there was already a lot going on in that PR.)
That doesn't fix the problem of tracking nameserver reliability, but it does add resiliency to the lookup process for the scenario where the parent nameserver(s) don't return glue records. Note that for lookups where the parent nameservers do return glue records, we will add whatever number of records are returned to the NS pool for the zone.
What this means, practically speaking, is that domains that have in-domain nameservers are very likely to have all or most of the NS records in their NS pool, while domains with out-of-domain nameservers are likely to have only a single entry (or a single v4 and a single v6 entry) in their NS pool.
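A simplified model of that asymmetry, as a sketch only: `Referral`, `pool_addresses`, and the `ns_limit` parameter are hypothetical stand-ins, not the recursor's real types or control flow. Glue addresses all join the pool, while glueless NS names are resolved only up to `ns_limit`, so `ns_limit = 1` plays the role of `take(1)`.

```rust
use std::collections::HashMap;
use std::net::IpAddr;

/// Simplified stand-in for a referral: NS names plus any glue addresses.
pub struct Referral {
    pub ns_names: Vec<String>,
    pub glue: HashMap<String, Vec<IpAddr>>,
}

/// Collects the addresses that would end up in the zone's pool.
/// `resolve` stands in for a separate A/AAAA lookup of a glueless NS name.
pub fn pool_addresses<F>(referral: &Referral, ns_limit: usize, mut resolve: F) -> Vec<IpAddr>
where
    F: FnMut(&str) -> Vec<IpAddr>,
{
    // Glue path: every address the parent returned joins the pool.
    let mut pool: Vec<IpAddr> = referral
        .ns_names
        .iter()
        .filter_map(|ns| referral.glue.get(ns))
        .flatten()
        .copied()
        .collect();

    // Glueless path: only the first `ns_limit` NS names are looked up.
    for ns in referral
        .ns_names
        .iter()
        .filter(|ns| !referral.glue.contains_key(*ns))
        .take(ns_limit)
    {
        pool.extend(resolve(ns.as_str()));
    }
    pool
}
```

In this model, a zone with two in-domain name servers and glue for both gets both addresses regardless of `ns_limit`, while a zone with two out-of-domain name servers gets only the first server's addresses when `ns_limit` is 1. Removing the limit is what lets the second case fill its pool.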
@divergentdave commented on GitHub (Nov 15, 2024):
This was fixed by #2522. Both `ns_pool_for_zone()` and `ns_pool_for_referral()` now use all NS records to build pools.