[GH-ISSUE #1702] Name server sorting is problematic #739

Closed
opened 2026-03-16 00:04:28 +03:00 by kerem · 9 comments
Owner

Originally created by @peterthejohnston on GitHub (Apr 25, 2022).
Original GitHub issue: https://github.com/hickory-dns/hickory-dns/issues/1702

Describe the bug
The Trust-DNS resolver sorts the name servers it's been configured with, by (1) "connection state", so that name servers that have been successfully reached are prioritized above unreachable ones; and (2) based on stats collected on how many queries to a given resolver were successful vs. failed.

However, there are a few problems with this sorting:

  • Connection state (NameServerState) seems to be sorted backwards from what I would expect. Because Failed = 0 and Established = 2, and servers are sorted in (default) increasing order, servers with which we've established communication are sorted last, while servers we've previously attempted and failed to reach are ranked first. (Maybe I'm missing something here?)
  • Stats about successful/failed queries (NameServerStats) are collected and compared in unintuitive ways.
    • From local testing, it looks like failed queries only get recorded in stats if the name server fails to respond and the overall query fails. In other words, if two of three name servers in a pool are nonresponsive, but one does eventually respond, those two name servers won't have their failure stats incremented. I need to dig deeper into why exactly this is happening.
    • Name servers with more failures are sorted later (this matches my expectations), but successes are compared backwards, i.e. name servers with fewer successful queries are queried first, presumably to balance things out.

Unfortunately, when you combine the last two properties, you can get into a situation where a nonresponsive name server always gets queried before a good one as long as the overall query never fails. :( If, for example, NameServerA always drops queries and NameServerB always responds, NameServerA's failure stats won't get incremented, so you can imagine that at some point A's stats are successes: 0 failures: 0 and B's stats are successes: 10 failures: 0 and A will get queried first. When in practice this happens with pairs or groups of name servers, this can result in long timeouts / unexpected latency.

To Reproduce
Configure the resolver with multiple name servers, one or more of which is nonresponsive. Observe that over time (depending on the particulars of the network setup), the nonresponsive name server(s) get queried first.

Expected behavior
Here's what I think we should do:

  • Short-term, we'd like name server sorting to be configurable (at least able to turn on/off)
  • I need to do some more digging to fully understand how the stats and connection state are used to sort, but I think some of this sorting logic needs to be fixed
  • It would be great if we could incorporate latency into NameServerStats (see this TODO in the code).
  • Long-term, it would be great to have more fine-grained control over how this prioritization/sorting is done. As an example, it would be great to be able to turn on prioritization based on connection status (NameServerState), but not successes vs. failures (NameServerStats), or something similar.

I'm happy to take on this work, but I wanted to raise the issues that we've run into and see what folks think.

Originally created by @peterthejohnston on GitHub (Apr 25, 2022). Original GitHub issue: https://github.com/hickory-dns/hickory-dns/issues/1702 **Describe the bug** The Trust-DNS resolver [sorts](https://github.com/bluejekyll/trust-dns/blob/main/crates/resolver/src/name_server/name_server_pool.rs#L185) the name servers it's been configured with, by (1) ["connection state"](https://github.com/bluejekyll/trust-dns/blob/main/crates/resolver/src/name_server/name_server.rs#L204), so that name servers that have been successfully reached are prioritized above unreachable ones; and (2) based on [stats](https://github.com/bluejekyll/trust-dns/blob/main/crates/resolver/src/name_server/name_server_stats.rs#L12) collected on how many queries to a given resolver were successful vs. failed. However, there are a few problems with this sorting: - Connection state (`NameServerState`) seems to be sorted backwards from what I would expect. Because `Failed = 0` and `Established = 2`, and servers are sorted in (default) increasing order, servers with which we've established communication are sorted last, while servers we've previously attempted and failed to reach are ranked first. (Maybe I'm missing something here?) - Stats about successful/failed queries (`NameServerStats`) are collected and compared in unintuitive ways. - From local testing, it looks like failed queries only get recorded in stats if the name server fails to respond _and_ the overall query fails. In other words, if two of three name servers in a pool are nonresponsive, but one does eventually respond, those two name servers won't have their failure stats incremented. I need to dig deeper into why exactly this is happening. - Name servers with more failures are sorted later (this matches my expectations), but successes are [compared backwards](https://github.com/bluejekyll/trust-dns/blob/main/crates/resolver/src/name_server/name_server_stats.rs#L96), i.e. name servers with fewer successful queries are queried first, presumably to balance things out. Unfortunately, when you combine the last two properties, you can get into a situation where a nonresponsive name server always gets queried before a good one as long as the overall query never fails. :( If, for example, NameServerA always drops queries and NameServerB always responds, NameServerA's failure stats won't get incremented, so you can imagine that at some point A's stats are `successes: 0 failures: 0` and B's stats are `successes: 10 failures: 0` and A will get queried first. When in practice this happens with pairs or groups of name servers, this can result in long timeouts / unexpected latency. **To Reproduce** Configure the resolver with multiple name servers, one or more of which is nonresponsive. Observe that over time (depending on the particulars of the network setup), the nonresponsive name server(s) get queried first. **Expected behavior** Here's what I think we should do: - Short-term, we'd like name server sorting to be configurable (at least able to turn on/off) - I need to do some more digging to fully understand how the stats and connection state are used to sort, but I think some of this sorting logic needs to be fixed - It would be great if we could incorporate latency into `NameServerStats` (see [this TODO](https://github.com/bluejekyll/trust-dns/blob/main/crates/resolver/src/name_server/name_server_stats.rs#L15) in the code). - Long-term, it would be great to have more fine-grained control over how this prioritization/sorting is done. As an example, it would be great to be able to turn on prioritization based on connection status (`NameServerState`), but not successes vs. failures (`NameServerStats`), or something similar. I'm happy to take on this work, but I wanted to raise the issues that we've run into and see what folks think.
kerem 2026-03-16 00:04:28 +03:00
Author
Owner

@bluejekyll commented on GitHub (Apr 26, 2022):

Short-term, we'd like name server sorting to be configurable (at least able to turn on/off)

Yes, that sounds like a good idea.

I need to do some more digging to fully understand how the stats and connection state are used to sort, but I think some of this sorting logic needs to be fixed

This sounds good, I thought there were more tests on ordering than there are. Let me know if you need any additional clarifications, but it looks like you've understood the code really well at this point.

I need to do some more digging to fully understand how the stats and connection state are used to sort, but I think some of this sorting logic needs to be fixed

I agree

It would be great if we could incorporate latency into NameServerStats (see this TODO in the code).

Yes, I've wanted to do this, but never got around to it.

long-term, it would be great to have more fine-grained control over how this prioritization/sorting is done. As an example, it would be great to be able to turn on prioritization based on connection status (NameServerState), but not successes vs. failures (NameServerStats), or something similar.

I think one thing you've identified is that we also don't "reset" the stats at any point, I wonder if we would want to move to histograms instead? Please see this PR as well, which also was taking steps at improving the ordering, #1632

I'm happy to take on this work, but I wanted to raise the issues that we've run into and see what folks think.

Excellent, thank you!

<!-- gh-comment-id:1109159585 --> @bluejekyll commented on GitHub (Apr 26, 2022): > Short-term, we'd like name server sorting to be configurable (at least able to turn on/off) Yes, that sounds like a good idea. > I need to do some more digging to fully understand how the stats and connection state are used to sort, but I think some of this sorting logic needs to be fixed This sounds good, I thought there were more tests on ordering than there are. Let me know if you need any additional clarifications, but it looks like you've understood the code really well at this point. > I need to do some more digging to fully understand how the stats and connection state are used to sort, but I think some of this sorting logic needs to be fixed I agree > It would be great if we could incorporate latency into NameServerStats (see [this TODO](https://github.com/bluejekyll/trust-dns/blob/main/crates/resolver/src/name_server/name_server_stats.rs#L15) in the code). Yes, I've wanted to do this, but never got around to it. > long-term, it would be great to have more fine-grained control over how this prioritization/sorting is done. As an example, it would be great to be able to turn on prioritization based on connection status (NameServerState), but not successes vs. failures (NameServerStats), or something similar. I think one thing you've identified is that we also don't "reset" the stats at any point, I wonder if we would want to move to histograms instead? Please see this PR as well, which also was taking steps at improving the ordering, #1632 > I'm happy to take on this work, but I wanted to raise the issues that we've run into and see what folks think. Excellent, thank you!
Author
Owner

@djc commented on GitHub (Apr 26, 2022):

See also discussion in https://github.com/bluejekyll/trust-dns/issues/158.

@peterthejohnston in your ideal end state, would sorting ever be off? It'd be interesting to understand what your optimal configuration would look like and if/how you think that configuration might differ from other's use cases. Like, what are the trade-offs here? Because ideally we wouldn't expose many knobs and just wire it up so that it does the right thing for "everyone".

<!-- gh-comment-id:1109459677 --> @djc commented on GitHub (Apr 26, 2022): See also discussion in https://github.com/bluejekyll/trust-dns/issues/158. @peterthejohnston in your ideal end state, would sorting ever be off? It'd be interesting to understand what your optimal configuration would look like and if/how you think that configuration might differ from other's use cases. Like, what are the trade-offs here? Because ideally we wouldn't expose many knobs and just wire it up so that it does the right thing for "everyone".
Author
Owner

@peterthejohnston commented on GitHub (Aug 8, 2022):

Apologies for my very delayed reply.

See also discussion in https://github.com/bluejekyll/trust-dns/issues/158.

Thanks for the pointer!

@peterthejohnston in your ideal end state, would sorting ever be off? ... Because ideally we wouldn't expose many knobs and just wire it up so that it does the right thing for "everyone".

Yeah, I see what you mean, that's a good question. I think we'd potentially be OK with sorting being non-configurable if it worked sensibly out of the box, but I'm honestly not sure exactly what our optimal configuration would look like. I think that @nhurley3 (also working on Fuchsia's networking stack) is doing some investigation into that and might have more thoughts.

<!-- gh-comment-id:1208478006 --> @peterthejohnston commented on GitHub (Aug 8, 2022): Apologies for my very delayed reply. > See also discussion in https://github.com/bluejekyll/trust-dns/issues/158. Thanks for the pointer! > @peterthejohnston in your ideal end state, would sorting ever be off? ... Because ideally we wouldn't expose many knobs and just wire it up so that it does the right thing for "everyone". Yeah, I see what you mean, that's a good question. I think we'd potentially be OK with sorting being non-configurable if it worked sensibly out of the box, but I'm honestly not sure exactly what our optimal configuration would look like. I think that @nhurley3 (also working on Fuchsia's networking stack) is doing some investigation into that and might have more thoughts.
Author
Owner

@nhurley3 commented on GitHub (Aug 9, 2022):

Thanks for looping me in, Peter!

My intuition is that Trust-DNS should generally just do the right thing. One potential option could be to adopt something similar to the SRTT based algorithms that the more heavyweight caching resolvers use (e.g. BIND, Unbound, etc.) [1]. For example, something like:

  1. Maintain a time-weighted average latency for each server (SRTT). The values could initially be random within some range.
  2. For a given query, order the servers in the pool by the aforementioned value.
  3. If a server was used, then incorporate the recorded latency into the time-weighted average.
  4. If a server was not used, then apply some decay value to the time-weighted average.
  5. If the query to a server was not successful (e.g. the server wasn't responsive), then apply some penalty to the recorded value.

The benefit of such an algorithm is that it responds to both negative and positive changes while also generally preferring the least latent server. This may be overkill though. What does everyone think?

Additionally, I think it also makes sense to add the on/off knob that Peter mentioned above. This seems like a fairly common practice with resolvers and is the resolv.conf default [2]. Any objections to me starting with a pull request that adds this config? It sounds like there was probably earlier agreement with this type of change, but just wanted to double check before proceeding.

[1] https://irl.cs.ucla.edu/data/files/papers/res_ns_selection.pdf
[2] https://man7.org/linux/man-pages/man5/resolv.conf.5.html

<!-- gh-comment-id:1209814007 --> @nhurley3 commented on GitHub (Aug 9, 2022): Thanks for looping me in, Peter! My intuition is that Trust-DNS should generally just do the right thing. One potential option could be to adopt something similar to the SRTT based algorithms that the more heavyweight caching resolvers use (e.g. BIND, Unbound, etc.) [1]. For example, something like: 1. Maintain a time-weighted average latency for each server (SRTT). The values could initially be random within some range. 2. For a given query, order the servers in the pool by the aforementioned value. 3. If a server was used, then incorporate the recorded latency into the time-weighted average. 4. If a server was not used, then apply some decay value to the time-weighted average. 5. If the query to a server was not successful (e.g. the server wasn't responsive), then apply some penalty to the recorded value. The benefit of such an algorithm is that it responds to both negative and positive changes while also generally preferring the least latent server. This may be overkill though. What does everyone think? Additionally, I think it also makes sense to add the on/off knob that Peter mentioned above. This seems like a fairly common practice with resolvers and is the resolv.conf default [2]. Any objections to me starting with a pull request that adds this config? It sounds like there was probably earlier agreement with this type of change, but just wanted to double check before proceeding. [1] https://irl.cs.ucla.edu/data/files/papers/res_ns_selection.pdf [2] https://man7.org/linux/man-pages/man5/resolv.conf.5.html
Author
Owner

@bluejekyll commented on GitHub (Aug 9, 2022):

@nhurley3, I really like your proposal on improving the existing algorithm. That is inline with where I was trying to get to, but never completed. This would be really useful in the recursive resolver implementation that I plan to reuse the name server pool for.

I also think we may want a dumb resolution option as well, as that seems to be what most people actually want. So if that's the first simplest thing to do (i.e. the on/off switch), I think that would be a great change to get in.

<!-- gh-comment-id:1209831039 --> @bluejekyll commented on GitHub (Aug 9, 2022): @nhurley3, I really like your proposal on improving the existing algorithm. That is inline with where I was trying to get to, but never completed. This would be really useful in the recursive resolver implementation that I plan to reuse the name server pool for. I also think we may want a dumb resolution option as well, as that seems to be what most people actually want. So if that's the first simplest thing to do (i.e. the on/off switch), I think that would be a great change to get in.
Author
Owner

@djc commented on GitHub (Aug 9, 2022):

The benefit of such an algorithm is that it responds to both negative and positive changes while also generally preferring the least latent server. This may be overkill though. What does everyone think?

Doesn't sound like overkill -- sounds pretty good to me!

I also think we may want a dumb resolution option as well, as that seems to be what most people actually want. So if that's the first simplest thing to do (i.e. the on/off switch), I think that would be a great change to get in.

How dumb do they want? @nhurley3 in what scenario do you think you/someone might want to turn it off?

<!-- gh-comment-id:1209850163 --> @djc commented on GitHub (Aug 9, 2022): > The benefit of such an algorithm is that it responds to both negative and positive changes while also generally preferring the least latent server. This may be overkill though. What does everyone think? Doesn't sound like overkill -- sounds pretty good to me! > I also think we may want a dumb resolution option as well, as that seems to be what most people actually want. So if that's the first simplest thing to do (i.e. the on/off switch), I think that would be a great change to get in. How dumb do they want? @nhurley3 in what scenario do you think you/someone might want to turn it off?
Author
Owner

@nhurley3 commented on GitHub (Aug 10, 2022):

@nhurley3 in what scenario do you think you/someone might want to turn it off?

When we say "turn it off", we really mean that the resolver should respect a user provided ordering. As a result, my envisioned scenario is similar to the case where a user manually edits resolv.conf in Linux or sets the primary + alternate servers in Windows. That is, a power user simply wants to enforce a particular ordering or the implemented algorithm isn't doing what the user wants for whatever reason. This type of escape hatch seems to be the simplest and most common, which is why I think it makes sense.

<!-- gh-comment-id:1210858974 --> @nhurley3 commented on GitHub (Aug 10, 2022): > @nhurley3 in what scenario do you think you/someone might want to turn it off? When we say "turn it off", we really mean that the resolver should respect a user provided ordering. As a result, my envisioned scenario is similar to the case where a user manually edits resolv.conf in Linux or sets the primary + alternate servers in Windows. That is, a power user simply wants to enforce a particular ordering or the implemented algorithm isn't doing what the user wants for whatever reason. This type of escape hatch seems to be the simplest and most common, which is why I think it makes sense.
Author
Owner

@bluejekyll commented on GitHub (Aug 10, 2022):

a power user simply wants to enforce a particular ordering or the implemented algorithm isn't doing what the user wants for whatever reason. This type of escape hatch seems to be the simplest and most common, which is why I think it makes sense.

Yes, this aligns with what I believe people would like as well.

<!-- gh-comment-id:1210976560 --> @bluejekyll commented on GitHub (Aug 10, 2022): > a power user simply wants to enforce a particular ordering or the implemented algorithm isn't doing what the user wants for whatever reason. This type of escape hatch seems to be the simplest and most common, which is why I think it makes sense. Yes, this aligns with what I believe people would like as well.
Author
Owner

@nhurley3 commented on GitHub (Aug 10, 2022):

Yes, this aligns with what I believe people would like as well.

Great, I'll start working on a PR.

<!-- gh-comment-id:1211267130 --> @nhurley3 commented on GitHub (Aug 10, 2022): > Yes, this aligns with what I believe people would like as well. Great, I'll start working on a PR.
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
starred/hickory-dns#739
No description provided.