mirror of
https://github.com/hickory-dns/hickory-dns.git
synced 2026-04-25 11:15:54 +03:00
[GH-ISSUE #2228] hickory-resolver retries NXDOMAINs over TCP if using try_tcp_on_error #930
Originally created by @x86pup on GitHub (Jun 1, 2024).
Original GitHub issue: https://github.com/hickory-dns/hickory-dns/issues/2228
Describe the bug
hickory-resolver seems to treat NXDOMAIN (non-existent domain) responses as errors that qualify for a retry over TCP. But an NXDOMAIN is not an "error" in the sense that the query failed over UDP and needs to be retried over TCP in the expectation of a different response.
The code path that this seems to hit is: https://github.com/hickory-dns/hickory-dns/blob/main/crates/resolver/src/name_server/name_server_pool.rs#L267-L270 when using try_tcp_on_error.
To Reproduce
Steps to reproduce the behavior:
1. Enable try_tcp_on_error with hickory-resolver and debug logs enabled
2. Query awawa.google.com
3. See the log line hickory_resolver::name_server::name_server_pool: error from UDP, retrying over TCP: no record found for Query { name: Name("awawa.google.com."), query_type: A, query_class: IN }, which indicates that the domain has no record (NXDOMAIN), but hickory_resolver decides to retry over TCP anyway even though the query was successful
The same domain when queried with dig:
Expected behavior
Domains such as awawa.google.com that return NXDOMAIN do not get retried over TCP.
System:
Version:
Crate: resolver
Version: v0.24.1
Additional context
If this behaviour is truly intended (which would need a very thorough explanation, as I don't see any obvious reason for it), it would be preferable to put it behind a config option.
My crate is a Matrix homeserver, and "dead" servers over federation that were previously Matrix homeservers but no longer are homeservers can return NXDOMAIN because the owner deleted the DNS records.
There is little to no value in retrying the query over TCP, as an NXDOMAIN is almost guaranteed not to be a false negative. Even then, retrying over TCP will definitely not get a successful or better response, so skipping the TCP queries would save resources. This is especially noticeable when, as part of Matrix delegation/destination resolution, we have to query multiple DNS records. This behaviour amplifies the number of DNS queries we make for a single server, and there can be upwards of hundreds of remote servers in a single Matrix room.
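To make the amplification concrete, here is a rough worst-case count. It is only a sketch: `worst_case_queries` and the figures of 200 servers and 4 lookups per server are hypothetical and not taken from any measurement, and it assumes every lookup ends in NXDOMAIN.

```rust
/// Upper bound on DNS queries sent when every lookup ends in NXDOMAIN.
/// All parameters are hypothetical illustration values.
fn worst_case_queries(servers: u64, lookups_per_server: u64, retry_nxdomain_over_tcp: bool) -> u64 {
    // One UDP query per lookup.
    let udp = servers * lookups_per_server;
    // If NXDOMAIN triggers the retry, each UDP query is repeated once over TCP.
    let tcp = if retry_nxdomain_over_tcp { udp } else { 0 };
    udp + tcp
}

fn main() {
    let without_retry = worst_case_queries(200, 4, false);
    let with_retry = worst_case_queries(200, 4, true);
    println!("without TCP retry: {without_retry}, with TCP retry: {with_retry}");
}
```

Under these assumptions the query count doubles, and each extra query also pays for a TCP handshake.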
If needed, here's my crate's hickory-resolver config:
github.com/girlbossceo/conduwuit@f4cfc77a57/src/service/globals/resolver.rs (L44-L82)
Tracing debug logs from my crate from hickory_resolver
@bluejekyll commented on GitHub (Jun 2, 2024):
Do you get the behavior you want when you disable retry over TCP?
@bluejekyll commented on GitHub (Jun 2, 2024):
I do wonder whether, after a few of these revisions, this config value isn't actually doing what should be the expected behavior. That is, we retry over TCP no matter what in certain cases right now.
It seems like this should really say
if e.is_no_connections() || (opts.try_tcp_on_error && e.is_io())
as I think you're right that DNS failures shouldn't be retried. I'd have to look for past issues in this area to understand how we ended up here. It's possible that there are some odd configurations in the wild where local DNS servers are improperly responding to queries, though that really shouldn't be handled in this location.
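As a self-contained sketch of what that condition would mean in practice: the `LookupError` enum and `should_retry_over_tcp` function below are hypothetical stand-ins written for illustration, not the resolver's real error type or code path. Only the shape of the boolean expression is taken from the suggestion above.

```rust
/// Hypothetical stand-in for the resolver's error kinds; not hickory's real type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LookupError {
    /// No usable UDP connection to any name server.
    NoConnections,
    /// Transport-level failure (socket error, timeout, and so on).
    Io,
    /// The server answered, but there is no such record (e.g. NXDOMAIN).
    NoRecordsFound,
}

impl LookupError {
    fn is_no_connections(self) -> bool {
        matches!(self, LookupError::NoConnections)
    }

    fn is_io(self) -> bool {
        matches!(self, LookupError::Io)
    }
}

/// Mirrors the proposed condition: always fall back when there are no
/// connections, and fall back on I/O errors only if the option is enabled.
fn should_retry_over_tcp(e: LookupError, try_tcp_on_error: bool) -> bool {
    e.is_no_connections() || (try_tcp_on_error && e.is_io())
}

fn main() {
    for e in [LookupError::NoConnections, LookupError::Io, LookupError::NoRecordsFound] {
        println!("{e:?}: retry over TCP = {}", should_retry_over_tcp(e, true));
    }
}
```

With this shape, a "no record found" result never triggers the TCP retry, whatever the option is set to.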
@x86pup commented on GitHub (Jun 2, 2024):
Not really because it's expected for us to support TCP fallback. My crate has retry over TCP enabled for everyone by default with an optional config option for someone to toggle it off.
SRV queries on Matrix can have large responses which only TCP fallback can provide, and if a user has exhausted UDP resources then retrying on TCP is not a bad default to have. We just don't want it to be amplifying unnecessary amounts of DNS queries by retrying on NXDOMAIN.
@x86pup commented on GitHub (Jun 2, 2024):
I don't know all the cases where it makes sense to retry over TCP on error, other than DNS responses that are too large (e.g. SRV records) and UDP I/O errors, but on the surface this makes sense to me.
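For the too-large-response case, the signal on the wire is the TC (truncated) bit in the DNS header, while NXDOMAIN is response code 3 in the same 16-bit flags field (RFC 1035, section 4.1.1). A minimal sketch of telling the two apart from the raw header bytes follows; the function names are made up for illustration and this is not how hickory-resolver itself is structured.

```rust
/// Bit positions within the 16-bit DNS header flags field (RFC 1035, 4.1.1).
const TC_BIT: u16 = 0x0200;
const RCODE_MASK: u16 = 0x000F;
const RCODE_NXDOMAIN: u16 = 3;

/// Reads the flags field (bytes 2 and 3, big-endian) of a raw DNS message.
/// Returns None if the message is too short to contain it.
fn header_flags(msg: &[u8]) -> Option<u16> {
    let bytes = msg.get(2..4)?;
    Some(u16::from_be_bytes([bytes[0], bytes[1]]))
}

/// True if the response was truncated, the case where a TCP retry can help.
fn is_truncated(msg: &[u8]) -> Option<bool> {
    Some((header_flags(msg)? & TC_BIT) != 0)
}

/// True if the response code is NXDOMAIN, a complete answer that a TCP
/// retry would not be expected to change.
fn is_nxdomain(msg: &[u8]) -> Option<bool> {
    Some((header_flags(msg)? & RCODE_MASK) == RCODE_NXDOMAIN)
}

fn main() {
    // Hand-built 12-byte headers: ID 0x1234, then the flags field.
    let nxdomain = [0x12, 0x34, 0x81, 0x83, 0, 1, 0, 0, 0, 0, 0, 0]; // QR, RD, RA, RCODE=3
    let truncated = [0x12, 0x34, 0x83, 0x80, 0, 1, 0, 0, 0, 0, 0, 0]; // QR, TC, RD, RA, RCODE=0
    println!("nxdomain: truncated={:?} nxdomain={:?}", is_truncated(&nxdomain), is_nxdomain(&nxdomain));
    println!("truncated: truncated={:?} nxdomain={:?}", is_truncated(&truncated), is_nxdomain(&truncated));
}
```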