[GH-ISSUE #3440] SRV record ordering not respected. #1182

Open
opened 2026-03-16 01:49:41 +03:00 by kerem · 7 comments
Owner

Originally created by @davidv1992 on GitHub (Jan 21, 2026).
Original GitHub issue: https://github.com/hickory-dns/hickory-dns/issues/3440

Describe the bug
The specification for SRV records (RFC 2782) specifies that such records should be used by clients in a specific, potentially randomized order. The current implementation in the hickory resolver always provides the SRV records in the order that they are produced on the wire, breaking this requirement from the RFC

To Reproduce
Run the following rust program repeatedly, and observe the order of the records printed is always the same.

use hickory_resolver::Resolver;

#[tokio::main]
async fn main() {
    let mut builder = Resolver::builder_tokio().unwrap();
    builder.options_mut().validate = true;
    let resolver = builder.build();
    let lookup_result = resolver
        .srv_lookup("_ntske._tcp.srvtest.ntspooltest.org")
        .await
        .unwrap();
    for srv in lookup_result.iter() {
        println!(
            "{} {} {} {}",
            srv.target(),
            srv.port(),
            srv.weight(),
            srv.priority()
        );
    }
}

Expected behavior
Expected the order of the records to be randomized according to the priority and weight fields.

System:

  • OS: Debian
  • Architecture: x86_64
  • Versio: Trixie
  • rustc version: 1.92

Version:
Crate: hickory-resolver
Version: 0.25.2

Originally created by @davidv1992 on GitHub (Jan 21, 2026). Original GitHub issue: https://github.com/hickory-dns/hickory-dns/issues/3440 **Describe the bug** The specification for SRV records (RFC 2782) specifies that such records should be used by clients in a specific, potentially randomized order. The current implementation in the hickory resolver always provides the SRV records in the order that they are produced on the wire, breaking this requirement from the RFC **To Reproduce** Run the following rust program repeatedly, and observe the order of the records printed is always the same. ``` use hickory_resolver::Resolver; #[tokio::main] async fn main() { let mut builder = Resolver::builder_tokio().unwrap(); builder.options_mut().validate = true; let resolver = builder.build(); let lookup_result = resolver .srv_lookup("_ntske._tcp.srvtest.ntspooltest.org") .await .unwrap(); for srv in lookup_result.iter() { println!( "{} {} {} {}", srv.target(), srv.port(), srv.weight(), srv.priority() ); } } ``` **Expected behavior** Expected the order of the records to be randomized according to the priority and weight fields. **System:** - OS: Debian - Architecture: x86_64 - Versio: Trixie - rustc version: 1.92 **Version:** Crate: hickory-resolver Version: 0.25.2
Author
Owner

@djc commented on GitHub (Jan 21, 2026):

Would be helpful to reference exactly the sections of the RFC that you're looking at.

<!-- gh-comment-id:3778532924 --> @djc commented on GitHub (Jan 21, 2026): Would be helpful to reference exactly the sections of the RFC that you're looking at.
Author
Owner

@davidv1992 commented on GitHub (Jan 21, 2026):

Unfortunately the RFC is old enough not to have section numbers, but it is specified in the subitem Weight of the RR type definition.

<!-- gh-comment-id:3779168598 --> @davidv1992 commented on GitHub (Jan 21, 2026): Unfortunately the RFC is old enough not to have section numbers, but it is specified in the subitem Weight of the RR type definition.
Author
Owner

@cpu commented on GitHub (Jan 21, 2026):

I have limited experience here and don't hold strong opinions at this point, but is there some intuition you have as to why this should be handled by the hickory resolver library instead of the application processing the SRV records? Or perhaps you can make a comparison to another DNS resolver API that does do what you're asking for if you're aware of one?

The RFC text talks about selecting a SRV record based on the weighted order, but it doesn't seem immediately obvious to me that the SrvLookupIter is the right place to encode that selection process. I'm also not sure I'd characterize this as a requirement since the normative language is described as a 2119 "SHOULD".

<!-- gh-comment-id:3780838943 --> @cpu commented on GitHub (Jan 21, 2026): I have limited experience here and don't hold strong opinions at this point, but is there some intuition you have as to why this should be handled by the hickory resolver library instead of the application processing the SRV records? Or perhaps you can make a comparison to another DNS resolver API that does do what you're asking for if you're aware of one? The RFC text talks about selecting a SRV record based on the weighted order, but it doesn't seem immediately obvious to me that the `SrvLookupIter` is the right place to encode that selection process. I'm also not sure I'd characterize this as a _requirement_ since the normative language is described as a 2119 "SHOULD".
Author
Owner

@davidv1992 commented on GitHub (Jan 22, 2026):

My main point of comparison for this is the GetDNS library, which like hickory-resolver seems to be aimed at making a number of common usecases easy for application developers.

The problem with leaving this to the application is that they might forget to do this (or outright not know that this is a thing). Application developers are typically not going that deep into DNS, and judging by djcs response even for those in the ecosystem this isn't exactly an obvious feature of SRV records.

On the other hand, there might already be (and definitely will be in a few months) use cases that depend on these load balancing features of SRV records in the wild, which would get a real headache should an application forget to do this and just always try the top one first, especially if that application becomes popular. Having the library handling this transparently will prevent that outcome.

<!-- gh-comment-id:3782979495 --> @davidv1992 commented on GitHub (Jan 22, 2026): My main point of comparison for this is the GetDNS library, which like hickory-resolver seems to be aimed at making a number of common usecases easy for application developers. The problem with leaving this to the application is that they might forget to do this (or outright not know that this is a thing). Application developers are typically not going that deep into DNS, and judging by djcs response even for those in the ecosystem this isn't exactly an obvious feature of SRV records. On the other hand, there might already be (and definitely will be in a few months) use cases that depend on these load balancing features of SRV records in the wild, which would get a real headache should an application forget to do this and just always try the top one first, especially if that application becomes popular. Having the library handling this transparently will prevent that outcome.
Author
Owner

@djc commented on GitHub (Jan 22, 2026):

Okay, let's copy that section over:

A server selection mechanism. The weight field specifies a
relative weight for entries with the same priority. Larger
weights SHOULD be given a proportionately higher probability of
being selected. The range of this number is 0-65535. This is a
16 bit unsigned integer in network byte order. Domain
administrators SHOULD use Weight 0 when there isn't any server
selection to do, to make the RR easier to read for humans (less
noisy). In the presence of records containing weights greater
than 0, records with weight 0 should have a very small chance of
being selected.

In the absence of a protocol whose specification calls for the
use of other weighting information, a client arranges the SRV
RRs of the same Priority in the order in which target hosts,
specified by the SRV RRs, will be contacted. The following
algorithm SHOULD be used to order the SRV RRs of the same
priority:

To select a target to be contacted next, arrange all SRV RRs
(that have not been ordered yet) in any order, except that all
those with weight 0 are placed at the beginning of the list.

Compute the sum of the weights of those RRs, and with each RR
associate the running sum in the selected order. Then choose a
uniform random number between 0 and the sum computed
(inclusive), and select the RR whose running sum value is the
first in the selected order which is greater than or equal to
the random number selected. The target host specified in the
selected SRV RR is the next one to be contacted by the client.
Remove this SRV RR from the set of the unordered SRV RRs and
apply the described algorithm to the unordered SRV RRs to select
the next target host. Continue the ordering process until there
are no unordered SRV RRs. This process is repeated for each
Priority.

I don't think we'd want to do this unconditionally as that would be a tax on use cases that don't need it. We could offer explicit API for this. Either way, it doesn't feel like we're non-compliant with the RFC in the current state. I'd be open to reviewing a PR that adds API to create a weighted_iter() or something like that.

<!-- gh-comment-id:3783326585 --> @djc commented on GitHub (Jan 22, 2026): Okay, let's copy that section over: > A server selection mechanism. The weight field specifies a > relative weight for entries with the same priority. Larger > weights SHOULD be given a proportionately higher probability of > being selected. The range of this number is 0-65535. This is a > 16 bit unsigned integer in network byte order. Domain > administrators SHOULD use Weight 0 when there isn't any server > selection to do, to make the RR easier to read for humans (less > noisy). In the presence of records containing weights greater > than 0, records with weight 0 should have a very small chance of > being selected. > > In the absence of a protocol whose specification calls for the > use of other weighting information, a client arranges the SRV > RRs of the same Priority in the order in which target hosts, > specified by the SRV RRs, will be contacted. The following > algorithm SHOULD be used to order the SRV RRs of the same > priority: > > To select a target to be contacted next, arrange all SRV RRs > (that have not been ordered yet) in any order, except that all > those with weight 0 are placed at the beginning of the list. > > Compute the sum of the weights of those RRs, and with each RR > associate the running sum in the selected order. Then choose a > uniform random number between 0 and the sum computed > (inclusive), and select the RR whose running sum value is the > first in the selected order which is greater than or equal to > the random number selected. The target host specified in the > selected SRV RR is the next one to be contacted by the client. > Remove this SRV RR from the set of the unordered SRV RRs and > apply the described algorithm to the unordered SRV RRs to select > the next target host. Continue the ordering process until there > are no unordered SRV RRs. This process is repeated for each > Priority. I don't think we'd want to do this unconditionally as that would be a tax on use cases that don't need it. We could offer explicit API for this. Either way, it doesn't feel like we're non-compliant with the RFC in the current state. I'd be open to reviewing a PR that adds API to create a `weighted_iter()` or something like that.
Author
Owner

@rnijveld commented on GitHub (Jan 22, 2026):

I would like to add that the meaning of SHOULD in RFCs is somewhat stricter than should in regular language use. Specifically RFC2119 specifies SHOULD as:

there may exist valid reasons in particular circumstances to ignore a
particular item, but the full implications must be understood and
carefully weighed before choosing a different course

Looking at RFC2782 the text only indicates two possible reasons for deviating. First is to set weights to 0 if no server selection should be made, which should be a choice the server administrator makes. In this case I would say the current behavior is exactly what is needed.

Second reason mentioned is if a specific protocol that uses the SRV records has its own weighting that overrides the SRV records weights. I would imagine this could happen if there is any protocol that needs specific load balancing properties which cannot be fulfilled according to the algorithm specified in that RFC. I've tried searching for any standardized algorithm that has any kind of stipulation that overrides using the RFC2782 algorithm, but I could not find any. So from that point of view I don't really see any strong reason to set the default behavior to not follow the RFC here.

Making this an optional choice where the default behavior is to not follow the RFC also doesn't make much sense to me: the meaning of SHOULD specifically makes something more of an opt-out than an opt-in. Given that I also couldn't find any standard that relies on the opt-out, making that the default feels very weird, because basically everyone using the library should enable the behavior to follow the RFC correctly.

I would also consider looking at this from the server administrator's point of view: what would their reason for including weights (so they aren't set to 0, and they return multiple SRV records) be? The only reason I can think of is that they would expect their clients to do some kind of load balancing. The implication here would be that an administrator might expect traffic to be shared between multiple nodes, where they will end up getting all traffic on the node that the first record specified. That seems to me like a pretty hefty fine to pay for not following this section of the RFC.

<!-- gh-comment-id:3783640483 --> @rnijveld commented on GitHub (Jan 22, 2026): I would like to add that the meaning of SHOULD in RFCs is somewhat stricter than should in regular language use. Specifically RFC2119 specifies SHOULD as: > there may exist valid reasons in particular circumstances to ignore a > particular item, but the full implications must be understood and > carefully weighed before choosing a different course Looking at RFC2782 the text only indicates two possible reasons for deviating. First is to set weights to 0 if no server selection should be made, which should be a choice the server administrator makes. In this case I would say the current behavior is exactly what is needed. Second reason mentioned is if a specific protocol that uses the SRV records has its own weighting that overrides the SRV records weights. I would imagine this could happen if there is any protocol that needs specific load balancing properties which cannot be fulfilled according to the algorithm specified in that RFC. I've tried searching for any standardized algorithm that has any kind of stipulation that overrides using the RFC2782 algorithm, but I could not find any. So from that point of view I don't really see any strong reason to set the default behavior to not follow the RFC here. Making this an optional choice where the default behavior is to not follow the RFC also doesn't make much sense to me: the meaning of SHOULD specifically makes something more of an opt-out than an opt-in. Given that I also couldn't find any standard that relies on the opt-out, making that the default feels very weird, because basically everyone using the library should enable the behavior to follow the RFC correctly. I would also consider looking at this from the server administrator's point of view: what would their reason for including weights (so they aren't set to 0, and they return multiple SRV records) be? The only reason I can think of is that they would expect their clients to do some kind of load balancing. The implication here would be that an administrator might expect traffic to be shared between multiple nodes, where they will end up getting all traffic on the node that the first record specified. That seems to me like a pretty hefty fine to pay for not following this section of the RFC.
Author
Owner

@cpu commented on GitHub (Feb 12, 2026):

Is someone from the ntpd-rs side interested in providing a PR & test coverage for the behaviour you want to see? I'm not sure we have consensus that it should be the default instead of a separate API but that feels like a fairly easy detail to hash out in review relative to the rest.

<!-- gh-comment-id:3893184488 --> @cpu commented on GitHub (Feb 12, 2026): Is someone from the ntpd-rs side interested in providing a PR & test coverage for the behaviour you want to see? I'm not sure we have consensus that it should be the default instead of a separate API but that feels like a fairly easy detail to hash out in review relative to the rest.
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
starred/hickory-dns#1182
No description provided.