[GH-ISSUE #1171] Can no longer detect NXDOMAIN/empty answer in Resolver queries #618

Closed
opened 2026-03-15 23:28:45 +03:00 by kerem · 14 comments
Owner

Originally created by @glts on GitHub (Jul 23, 2020).
Original GitHub issue: https://github.com/hickory-dns/hickory-dns/issues/1171

I have been using the synchronous Resolver to do DNS lookups and switching on ResolveErrorKind::NoRecordsFound { .. } to detect NXDOMAIN errors.

This no longer works, instead I now get an untyped/generic protocol error:

Err(ResolveError { kind: Proto(ProtoError { kind: Message("Nameserver responded with NXDomain"), backtrack: None }), backtrack: None })

I believe #1086 changed this, a breaking change. Is this intentional? How does one recognise NXDOMAIN/empty answers in the Resolver API?

To reproduce:

  • make a DNS query for a nonexistent domain or one that returns an empty answer
  • compare returned answer in Trust-DNS Resolver 0.19.5 with current master (d67acf9)
Originally created by @glts on GitHub (Jul 23, 2020). Original GitHub issue: https://github.com/hickory-dns/hickory-dns/issues/1171 I have been using the synchronous `Resolver` to do DNS lookups and switching on `ResolveErrorKind::NoRecordsFound { .. }` to detect NXDOMAIN errors. This no longer works, instead I now get an untyped/generic protocol error: ``` Err(ResolveError { kind: Proto(ProtoError { kind: Message("Nameserver responded with NXDomain"), backtrack: None }), backtrack: None }) ``` I believe #1086 changed this, a breaking change. Is this intentional? How does one recognise NXDOMAIN/empty answers in the `Resolver` API? **To reproduce:** * make a DNS query for a nonexistent domain or one that returns an empty answer * compare returned answer in Trust-DNS Resolver 0.19.5 with current master (d67acf9)
kerem 2026-03-15 23:28:45 +03:00
Author
Owner

@bluejekyll commented on GitHub (Jul 23, 2020):

We can make that more strongly typed. Not sure if we missed that in the review or if this is a new code path, but that’s probably the easiest fix for this.

<!-- gh-comment-id:663050177 --> @bluejekyll commented on GitHub (Jul 23, 2020): We can make that more strongly typed. Not sure if we missed that in the review or if this is a new code path, but that’s probably the easiest fix for this.
Author
Owner

@glts commented on GitHub (Jul 23, 2020):

To give just a little bit of context, my application is SPF protocol implementation. For the SPF protocol I must be able to distinguish both the NXDOMAIN and the empty answer results from other errors.

<!-- gh-comment-id:663162603 --> @glts commented on GitHub (Jul 23, 2020): To give just a little bit of context, my application is SPF protocol implementation. For the SPF protocol I *must* be able to distinguish both the NXDOMAIN and the empty answer results from other errors.
Author
Owner

@bluejekyll commented on GitHub (Jul 23, 2020):

Understood. There are two error codes here, NxDomain and NoError. No error is the empty response, NxDomain is the authoritative non-existent domain. All we need to do is associate the proper error code to a real type in the errors.

<!-- gh-comment-id:663204103 --> @bluejekyll commented on GitHub (Jul 23, 2020): Understood. There are two error codes here, NxDomain and NoError. No error is the empty response, NxDomain is the authoritative non-existent domain. All we need to do is associate the proper error code to a real type in the errors.
Author
Owner

@zaharidichev commented on GitHub (Aug 28, 2020):

@bluejekyll what issue does https://github.com/bluejekyll/trust-dns/issues/1171 solve exactly? It seems to me that we can not have the NxDomain and NoError codes as proto errors and just leave them be processed by this logic so we get a NoRecordsFound error. I am fine with putting up a PR for that. Does this approach sound ok to you?

<!-- gh-comment-id:682433515 --> @zaharidichev commented on GitHub (Aug 28, 2020): @bluejekyll what issue does https://github.com/bluejekyll/trust-dns/issues/1171 solve exactly? It seems to me that we can not have the `NxDomain` and `NoError` codes as proto errors and just leave them be processed by [this logic](https://github.com/bluejekyll/trust-dns/blob/main/crates/resolver/src/lookup_state.rs#L161) so we get a `NoRecordsFound ` error. I am fine with putting up a PR for that. Does this approach sound ok to you?
Author
Owner

@bluejekyll commented on GitHub (Aug 28, 2020):

@zaharidichev I'm sorry, I'm not fully understanding the way in which your suggesting to fix this. If you have a PR ready, I'm happy to take a look.

One of the issues here is really a question of what Error means in this context. I'm not entirely certain we should surface an error to the caller of the lookup as an NXDomain is a valid response from the upstream server. Right now, we don't pass that information back to the lookup caller at all. So I think the first question we should decide on is, should this be an error returned via the Result on the Resolver::lookup, or should we add a new field on Lookup that returns the underlying response code from the request?

<!-- gh-comment-id:682951664 --> @bluejekyll commented on GitHub (Aug 28, 2020): @zaharidichev I'm sorry, I'm not fully understanding the way in which your suggesting to fix this. If you have a PR ready, I'm happy to take a look. One of the issues here is really a question of what `Error` means in this context. I'm not entirely certain we should surface an error to the caller of the lookup as an `NXDomain` is a valid response from the upstream server. Right now, we don't pass that information back to the lookup caller at all. So I think the first question we should decide on is, should this be an error returned via the `Result` on the `Resolver::lookup`, or should we add a new field on `Lookup` that returns the underlying response code from the request?
Author
Owner

@bluejekyll commented on GitHub (Aug 29, 2020):

For a little more context here. There are a number of things that make returning NXDomain accurately from the lookup call difficult. Off the top of my head:

  • Misconfigured internal DNS servers some times return NXDomain when they are not in fact authorities, see #933 or conversely are inappropriately using unregistered public domains
  • The search list for resolution of non-fqdn (fully-qualified-domain-names) will continue on NXDomain, b/c we have to.

So for the API change, I think we need to incorporate this, and also anticipate that NXDomain will only be semi-accurate for fqdns...

<!-- gh-comment-id:683333642 --> @bluejekyll commented on GitHub (Aug 29, 2020): For a little more context here. There are a number of things that make returning NXDomain accurately from the lookup call difficult. Off the top of my head: - Misconfigured internal DNS servers some times return NXDomain when they are not in fact authorities, see #933 or conversely are inappropriately using unregistered public domains - The search list for resolution of non-fqdn (fully-qualified-domain-names) will continue on NXDomain, b/c we have to. So for the API change, I think we need to incorporate this, and also anticipate that NXDomain will only be semi-accurate for fqdns...
Author
Owner

@zaharidichev commented on GitHub (Aug 31, 2020):

@bluejekyll Sorry I might be missing a big part of the context here but it seems to me that before this commit NameServer::inner_send was returning the response instead of an error when encountering NXDomain. Later that response was processed by CachingClient::inner_lookup by looking at the response code. Are you saying that all this logic should not be there and instead the caller should be responsible for handling all that?

<!-- gh-comment-id:683634768 --> @zaharidichev commented on GitHub (Aug 31, 2020): @bluejekyll Sorry I might be missing a big part of the context here but it seems to me that before [this commit](https://github.com/bluejekyll/trust-dns/pull/1086) `NameServer::inner_send` was returning the response instead of an error when encountering NXDomain. Later that response was processed by `CachingClient::inner_lookup` by looking at the response code. Are you saying that all this logic should not be there and instead the caller should be responsible for handling all that?
Author
Owner

@bluejekyll commented on GitHub (Aug 31, 2020):

Maybe I'm overthinking the issue here. Would it be enough to surface the error in this case directly with the ResponseCode?

<!-- gh-comment-id:683899501 --> @bluejekyll commented on GitHub (Aug 31, 2020): Maybe I'm overthinking the issue here. Would it be enough to surface the error in this case directly with the ResponseCode?
Author
Owner

@zaharidichev commented on GitHub (Sep 2, 2020):

Yes I think this is what needs to happen.

<!-- gh-comment-id:685652702 --> @zaharidichev commented on GitHub (Sep 2, 2020): Yes I think this is what needs to happen.
Author
Owner

@bluejekyll commented on GitHub (Sep 5, 2020):

Let me work on a quick PR for this, perhaps there are some folks on this thread that would be interested in testing with it.

Also, please notice the flag distrust_nx_responses, that flag purposefully ignores NXDomain as an upstream permanent failure (which it should be), and instead regards it as a potential issue in upstream resolver configuration and continues lookups. Making that false, should return NXDomain properly, I hope.

<!-- gh-comment-id:687658529 --> @bluejekyll commented on GitHub (Sep 5, 2020): Let me work on a quick PR for this, perhaps there are some folks on this thread that would be interested in testing with it. Also, please notice the flag `distrust_nx_responses`, that flag purposefully ignores NXDomain as an upstream permanent failure (which it should be), and instead regards it as a potential issue in upstream resolver configuration and continues lookups. Making that false, should return NXDomain properly, I hope.
Author
Owner

@bluejekyll commented on GitHub (Sep 7, 2020):

@zaharidichev @glts please see #1197, I believe this will fix the issue.

I know remember why this wasn't correct before, I had punted on doing the type golf needed to support this. I need to take a pass to see if I can simplify any of the type signatures, but this should resolve the concerns here, I hope.

<!-- gh-comment-id:688448031 --> @bluejekyll commented on GitHub (Sep 7, 2020): @zaharidichev @glts please see #1197, I believe this will fix the issue. I know remember why this wasn't correct before, I had punted on doing the type golf needed to support this. I need to take a pass to see if I can simplify any of the type signatures, but this should resolve the concerns here, I hope.
Author
Owner

@zaharidichev commented on GitHub (Sep 16, 2020):

@bluejekyll This is great. Thanks a lot for the fix! Is there any ETA on when a new release will be out including these changes?

<!-- gh-comment-id:693186479 --> @zaharidichev commented on GitHub (Sep 16, 2020): @bluejekyll This is great. Thanks a lot for the fix! Is there any ETA on when a new release will be out including these changes?
Author
Owner

@bluejekyll commented on GitHub (Sep 16, 2020):

I will publish today. There are a few library deps I want to update first.

<!-- gh-comment-id:693468402 --> @bluejekyll commented on GitHub (Sep 16, 2020): I will publish today. There are a few library deps I want to update first.
Author
Owner

@bluejekyll commented on GitHub (Sep 17, 2020):

@zaharidichev 0.20.0-alpha.2 has been published.

<!-- gh-comment-id:693741858 --> @bluejekyll commented on GitHub (Sep 17, 2020): @zaharidichev 0.20.0-alpha.2 has been published.
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
starred/hickory-dns#618
No description provided.