[PR #1589] [MERGED] Do not retry the same name server on a negative response #2443

Closed

opened 2026-03-16 08:54:06 +03:00 by kerem · 0 comments

📋 Pull Request Information

Original PR: https://github.com/hickory-dns/hickory-dns/pull/1589
Author: @peterthejohnston
Created: 11/17/2021
Status: ✅ Merged
Merged: 11/24/2021
Merged by: @bluejekyll

Base: main ← Head: main

📝 Commits (1)

4471bdf Do not retry the same name server on a negative response

📊 Changes

4 files changed (+102 additions, -6 deletions)

📝 crates/proto/src/xfer/retry_dns_handle.rs (+10 -1)
📝 crates/resolver/src/config.rs (+6 -4)
📝 crates/resolver/src/error.rs (+9 -1)
➕ tests/integration-tests/tests/retry_dns_handle_tests.rs (+77 -0)

📄 Description

Currently in Trust-DNS, there are two mechanisms that allow failed queries to be retried by the resolver:

* the name server pool retries negative responses against other name servers in the resolver's pool, depending on its configuration
* the `RetryDnsHandle` reattempts queries against the name server pool if they fail

The name server pool retries almost any unsuccessful response against fallback name servers, unless it gets a `trusted` `NoRecordsFound` error, which occurs when `NameServerConfig::trust_nx_responses` is `true` for that server and the resolver receives an empty `NXDomain` response.
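
The pool's fallback rule described above can be sketched roughly as follows. This is a minimal model, not the Trust-DNS implementation: `Outcome` and `query_pool` are hypothetical names, and each server's response is supplied up front instead of being fetched over the network.

```rust
/// Hypothetical outcome of querying one name server (not a Trust-DNS type).
#[derive(Debug, Clone, PartialEq)]
enum Outcome {
    /// The server returned records.
    Records(Vec<String>),
    /// The server answered with no records; `trusted` stands in for
    /// `NameServerConfig::trust_nx_responses` being true for that server.
    NoRecordsFound { trusted: bool },
    /// The query never completed (connection failure, timeout, ...).
    Io(String),
}

/// Walks the servers in order and returns the final outcome together with
/// the number of servers that were actually queried.
fn query_pool(responses: &[Outcome]) -> (Option<Outcome>, usize) {
    let mut last = None;
    for (i, outcome) in responses.iter().enumerate() {
        match outcome {
            // Success, or a negative answer we trust: stop here.
            Outcome::Records(_) | Outcome::NoRecordsFound { trusted: true } => {
                return (Some(outcome.clone()), i + 1);
            }
            // Anything else: remember it and fall back to the next server.
            _ => last = Some(outcome.clone()),
        }
    }
    (last, responses.len())
}
```

With three servers that all return an untrusted `NoRecordsFound`, this model queries all three before giving up; a trusted one from the first server ends the walk after a single query.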

The `RetryDnsHandle` uses the `RetryableError` trait to determine if an error should be retried. However, the implementation of `RetryableError::should_retry` for `ResolveError` uses the same criteria as the name server pool, which I think is not the desired behavior. This leads to the same query being retried against the same name server when it shouldn't be.
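
To illustrate how a `should_retry` check drives the retry decision, here is a minimal sketch. The trait below is a simplified stand-in for `RetryableError`, and `SketchError` and `retry` are hypothetical names that are not part of Trust-DNS. It shows one possible alternative criterion: retry when the query could not be completed, but not when a server actually answered.

```rust
use std::io::ErrorKind;

/// Simplified stand-in for the `RetryableError` trait (assumption: only
/// the `should_retry` method matters for this illustration).
trait RetryableError {
    fn should_retry(&self) -> bool;
}

/// Hypothetical error type; the real `ResolveError` is richer.
#[derive(Debug)]
enum SketchError {
    /// The query never reached a server or never got an answer.
    Io(ErrorKind),
    /// A server answered, but with no records (e.g. NODATA).
    NoRecordsFound,
}

impl RetryableError for SketchError {
    fn should_retry(&self) -> bool {
        match self {
            // Asking again may succeed once the connection works.
            SketchError::Io(_) => true,
            // The server already gave its answer; asking again will not change it.
            SketchError::NoRecordsFound => false,
        }
    }
}

/// Runs `op` once, then up to `retries` more times while the error is retryable.
fn retry<T, E: RetryableError>(
    retries: usize,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut remaining = retries;
    loop {
        match op() {
            Err(e) if remaining > 0 && e.should_retry() => remaining -= 1,
            other => return other,
        }
    }
}
```

Under this sketch, an operation that always fails with `SketchError::NoRecordsFound` runs exactly once, while one that always fails with `SketchError::Io` runs `retries + 1` times.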

For example (this was real behavior observed when testing the resolver on a device running Fuchsia):

* let's say a resolver has 3 name servers configured
* a query gets a `NODATA` response from the first name server, and the name server pool retries the query on the other ones, getting the same response
* the `RetryDnsHandle` now retries that entire query over the whole name server pool again, because it got an error for which `RetryableError::should_retry` is `true`. This happens `ResolverOpts::attempts` number of times.

In effect, we send this query 3 (# of name servers) * 3 (# of total attempts) = 9 times, 3 times to each name server. The name server pool is used correctly here to retry on a negative response; however, the `RetryDnsHandle` should probably only retry on IO errors (e.g. we failed to connect to a given server) or other errors for which it's reasonable to ask the same name server again. If we successfully get a negative response from a server, e.g. a `NODATA` response, there is no reason to expect an OK response on a retry, so we should not retry the query against that same name server.

The desired end state is one where, if the resolver encounters no IO errors, at most one query is made to each name server in the pool.
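
The query counts in the example can be checked with a small model. This is only a sketch under stated assumptions: `queries_sent` is a hypothetical helper, every server is assumed to return an untrusted negative response, and no IO errors occur.

```rust
/// Counts the queries sent when every server in the pool returns an
/// untrusted negative response (e.g. NODATA) and no IO error occurs.
///
/// `retry_negative` models whether the retry layer treats a negative
/// response as retryable (the behavior described above) or not (the
/// desired end state).
fn queries_sent(name_servers: usize, total_attempts: usize, retry_negative: bool) -> usize {
    let mut sent = 0;
    for _ in 0..total_attempts {
        // On a negative response, the pool falls back through every server.
        sent += name_servers;
        if !retry_negative {
            // A negative response is final: do not go around the pool again.
            break;
        }
    }
    sent
}
```

With 3 name servers and 3 total attempts, `queries_sent(3, 3, true)` gives 9 and `queries_sent(3, 3, false)` gives 3, i.e. one query per name server.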

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.
