mirror of
https://github.com/hickory-dns/hickory-dns.git
synced 2026-04-25 03:05:51 +03:00
[GH-ISSUE #2306] hickory as validating resolver times out against bind security aware name server #964
Originally created by @justahero on GitHub (Jul 10, 2024).
Original GitHub issue: https://github.com/hickory-dns/hickory-dns/issues/2306
Describe the bug
When running `hickory-dns` as a validating resolver with `bind` as a security-aware name server, the client `dig` can cause the resolver to time out; hickory then crashes.
To Reproduce
Run the following test
Other tests fail too.
Expected behavior
The test should pass. Running the same test against `unbound` (DNS_TEST_PEER=unbound) passes. It could be caused by insufficiently configuring `bind`, but the effect it has on hickory might be worth checking regardless.
System:
Version:
Crate: resolver (most likely)
Version: 514f713afd
Additional context
The hickory output is pretty large; appended here are the first dozen lines that might indicate the issue.
stdout output
@japaric commented on GitHub (Jul 12, 2024):
7 conformance tests fail when TEST_PEER is set to BIND instead of `unbound` (nsd) and TEST_SUBJECT is hickory. TEST_SUBJECT=unbound + TEST_PEER=bind does not produce any failure.
currently investigating, but at first glance it seems that the test `resolver::dnssec::rfc4035::section_3::section_3_2::do_bit_not_set_in_request` also goes into some sort of loop like the one reported here, because the `tshark` capture file in the TEST_SUBJECT=hickory case is 40 times longer than in the TEST_SUBJECT=unbound case.
@japaric commented on GitHub (Jul 12, 2024):
Sebastian said
given that
I would think that BIND is properly configured and that hickory should not fail any test that unbound passes
@japaric commented on GitHub (Jul 12, 2024):
I have come up with a minimal repro in #2309
the one thing these 7 failures had in common was that the DNS network consisted of a single nameserver, the root nameserver.
what appears to be the problem is that BIND does not include a glue record (an A record) when it responds to NS queries; an NS + A record pair is a common way to give a referral to another nameserver.
when hickory-dns sees that the response to `. NS` lacks an A record, it will try to get that information from the authoritative nameserver. the `NS .` response includes the domain `primaryNN.nameservers.com.` so hickory will send out queries like `com. NS` and `nameservers.com. NS`, but the only `.` nameserver it knows about cannot answer those queries. hickory gets stuck in that loop of trying to get the `primaryNN.nameservers.com. A` record from the authority over `nameservers.com.` when in fact the `.` nameserver can serve that record.
This is not a bug in DNSSEC functionality but rather a bug in plain DNS recursive resolution. the regression test in #2309 does not use any DNSSEC functionality, e.g. signed zone files.
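To make the missing-glue situation concrete, here is a minimal toy model in Rust. It is not hickory's code: `NsResponse`, `ToyServer`, and `nameserver_addr` are hypothetical names invented for this sketch, and the address is a documentation-range placeholder. It only illustrates the point above, that when the NS response carries no glue, the nameserver that is already known may still be able to answer the A query directly.

```rust
use std::collections::HashMap;
use std::net::Ipv4Addr;

/// Toy response to an NS query: nameserver names plus any glue A records.
/// (Hypothetical type, not part of hickory-dns.)
struct NsResponse {
    ns_names: Vec<String>,
    glue: HashMap<String, Ipv4Addr>,
}

/// Toy authoritative server holding A records keyed by fully qualified name.
struct ToyServer {
    a_records: HashMap<String, Ipv4Addr>,
}

impl ToyServer {
    fn query_a(&self, name: &str) -> Option<Ipv4Addr> {
        self.a_records.get(name).copied()
    }
}

/// Pick an address for one of the nameservers listed in `resp`.
/// Glue is used when present; otherwise the already-known server is asked
/// for the A record instead of walking down `com.` and `nameservers.com.`.
fn nameserver_addr(resp: &NsResponse, known_server: &ToyServer) -> Option<Ipv4Addr> {
    resp.ns_names.iter().find_map(|name| {
        resp.glue
            .get(name)
            .copied()
            .or_else(|| known_server.query_a(name))
    })
}

fn main() {
    let ns = "primaryNN.nameservers.com.".to_string();
    let addr = Ipv4Addr::new(192, 0, 2, 1); // placeholder address
    // The single root server can serve the A record itself.
    let root = ToyServer {
        a_records: HashMap::from([(ns.clone(), addr)]),
    };
    // BIND-style answer in this scenario: NS record, but no glue.
    let resp = NsResponse {
        ns_names: vec![ns],
        glue: HashMap::new(),
    };
    assert_eq!(nameserver_addr(&resp, &root), Some(addr));
}
```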
@marcus0x62 commented on GitHub (Jul 30, 2024):
I ran into this bug while doing some testing and did some more research. The root cause is mutual recursion between ns_pool_for_zone and resolve in recursor_dns_handle.rs, specifically the calls to resolve here and here.
I describe this in more detail in the PR, but, essentially the possibility of infinite recursion exists in the following circumstances:
Triggering this bug in the wild depends mostly on point #1 -- whether or not the parent DNS server returns glue records. BIND seems to be particularly idiosyncratic in that respect.
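The mutual recursion described above can be illustrated with a small self-contained sketch. The function names mirror `ns_pool_for_zone` and `resolve`, but the signatures, the `Toy` and `Delegation` types, the `Error` enum, and `MAX_DEPTH` are all assumptions made for this example; this is not the code in recursor_dns_handle.rs. A depth counter is shown as one common way to bound such recursion; nothing here claims that PR #2332 takes this approach.

```rust
use std::collections::HashMap;
use std::net::Ipv4Addr;

const MAX_DEPTH: u8 = 8; // arbitrary bound chosen for this sketch

#[derive(Debug, PartialEq)]
enum Error {
    DepthExceeded,
    NoSuchZone,
}

/// Toy delegation: the zone's nameserver name and optional glue address.
struct Delegation {
    ns_name: String,
    glue: Option<Ipv4Addr>,
}

struct Toy {
    delegations: HashMap<String, Delegation>,
}

/// Strip the leftmost label: "a.b.com." -> "b.com.", "com." -> ".".
fn parent_zone(name: &str) -> &str {
    match name.split_once('.') {
        Some((_, rest)) if !rest.is_empty() => rest,
        _ => ".",
    }
}

impl Toy {
    /// Find a nameserver address for `zone`; without glue, the nameserver's
    /// own name has to be resolved, which calls back into `resolve`.
    fn ns_pool_for_zone(&self, zone: &str, depth: u8) -> Result<Ipv4Addr, Error> {
        if depth > MAX_DEPTH {
            return Err(Error::DepthExceeded);
        }
        let d = self.delegations.get(zone).ok_or(Error::NoSuchZone)?;
        match d.glue {
            Some(addr) => Ok(addr),
            None => self.resolve(&d.ns_name, depth + 1),
        }
    }

    /// Resolve `name` by first finding a nameserver for its parent zone.
    /// In this toy the only names resolved are nameserver names, so the
    /// pool address stands in for the final answer.
    fn resolve(&self, name: &str, depth: u8) -> Result<Ipv4Addr, Error> {
        if depth > MAX_DEPTH {
            return Err(Error::DepthExceeded);
        }
        self.ns_pool_for_zone(parent_zone(name), depth + 1)
    }
}

fn main() {
    let zone = "nameservers.com.".to_string();
    let ns = "primaryNN.nameservers.com.";
    let mut toy = Toy { delegations: HashMap::new() };
    toy.delegations.insert(
        zone.clone(),
        Delegation { ns_name: ns.to_string(), glue: None },
    );
    // No glue: the two functions keep calling each other until the guard trips.
    assert_eq!(toy.resolve(ns, 0), Err(Error::DepthExceeded));

    // With glue, the recursion ends immediately.
    let addr = Ipv4Addr::new(192, 0, 2, 1); // placeholder address
    toy.delegations.get_mut(&zone).unwrap().glue = Some(addr);
    assert_eq!(toy.resolve(ns, 0), Ok(addr));
}
```

Without the depth check, the no-glue case would recurse until the stack overflows, which is consistent with the crash reported at the top of this issue.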
@marcus0x62 commented on GitHub (Aug 27, 2024):
This should be resolved with the merging of PR #2332