[GH-ISSUE #2066] No default root certs when using rustls #868

Open
opened 2026-03-16 00:39:16 +03:00 by kerem · 14 comments
Owner

Originally created by @simonsmd on GitHub (Oct 15, 2023).
Original GitHub issue: https://github.com/hickory-dns/hickory-dns/issues/2066

Is your feature request related to a problem? Please describe.
The dns-over-rustls and dns-over-https-rustls features of the hickory-resolver crate do not enable the native-certs or webpki-roots feature by default. If the user does not enable one of these features manually, no root certificates are added to the default rustls configuration and connecting to a DoT or DoH server fails because the server certificate cannot be validated.

Describe the solution you'd like
Ideally the webpki-roots feature would be enabled by default if dns-over-rustls or dns-over-https-rustls is enabled.
However, as far as i can tell the webpki-roots feature overwrites the native-certs feature which means it would not be possible to use native-certs anymore. This can be solved either by removing the webpki-roots feature and always enabling the code that adds the webpki-roots certificates unless native-certs is enabled or making the native-certs feature overwrite the webpki-roots feature.

Describe alternatives you've considered
If the above solution is not possible/desired, the requirement to enable one of the two features when using DoT or DoH should at least be documented.

Additional context
The missing root certificates lead to one of the following two errors (in my testing):
ResolveError { kind: Proto(ProtoError { kind: Io(Kind(ConnectionRefused)) }) }
ResolveError { kind: Proto(ProtoError { kind: Io(Os { code: 101, kind: NetworkUnreachable, message: "Network is unreachable" }) }) }

These errors do not mention the real cause of the issue which makes debugging this problem challenging. The failed server certificate validation is logged if debug logging is enabled but I think it would be better if the error would mention the failed certificate validation. I am not familiar with the code base though so I do not know how hard it is to implement this.

Originally created by @simonsmd on GitHub (Oct 15, 2023). Original GitHub issue: https://github.com/hickory-dns/hickory-dns/issues/2066 **Is your feature request related to a problem? Please describe.** The `dns-over-rustls` and `dns-over-https-rustls` features of the hickory-resolver crate do not enable the `native-certs` or `webpki-roots` feature by default. If the user does not enable one of these features manually, no root certificates are added to the default rustls configuration and connecting to a DoT or DoH server fails because the server certificate cannot be validated. **Describe the solution you'd like** Ideally the `webpki-roots` feature would be enabled by default if `dns-over-rustls` or `dns-over-https-rustls` is enabled. However, as far as i can tell the `webpki-roots` feature overwrites the `native-certs` feature which means it would not be possible to use `native-certs` anymore. This can be solved either by removing the `webpki-roots` feature and always enabling the code that adds the webpki-roots certificates unless `native-certs` is enabled or making the `native-certs` feature overwrite the `webpki-roots` feature. **Describe alternatives you've considered** If the above solution is not possible/desired, the requirement to enable one of the two features when using DoT or DoH should at least be documented. **Additional context** The missing root certificates lead to one of the following two errors (in my testing): `ResolveError { kind: Proto(ProtoError { kind: Io(Kind(ConnectionRefused)) }) }` `ResolveError { kind: Proto(ProtoError { kind: Io(Os { code: 101, kind: NetworkUnreachable, message: "Network is unreachable" }) }) }` These errors do not mention the real cause of the issue which makes debugging this problem challenging. The failed server certificate validation is logged if debug logging is enabled but I think it would be better if the error would mention the failed certificate validation. I am not familiar with the code base though so I do not know how hard it is to implement this.
Author
Owner

@bluejekyll commented on GitHub (Oct 15, 2023):

@daxpedda, do you have any thoughts on this?

<!-- gh-comment-id:1763527656 --> @bluejekyll commented on GitHub (Oct 15, 2023): @daxpedda, do you have any thoughts on this?
Author
Owner

@bluejekyll commented on GitHub (Oct 15, 2023):

btw, I think the NetworkUnreachable message is an IPv6 issue, and not necessarily related.

<!-- gh-comment-id:1763529639 --> @bluejekyll commented on GitHub (Oct 15, 2023): btw, I think the `NetworkUnreachable` message is an IPv6 issue, and not necessarily related.
Author
Owner

@daxpedda commented on GitHub (Oct 16, 2023):

The problem with adding either webpki-roots or native-certs to dns-over-rustls is that it forces the dependency on users which might not want to use it.

The alternative of course is to add it to the default feature set, but that is weird because the default feature set doesn't even include dns-over-rustls/quic/http/h3 to begin with. This might be the best way forward though.

Whatever the decision, I believe this issue will be common among consumers, so it should be documented appropriately.

Ideally the webpki-roots feature would be enabled by default if dns-over-rustls or dns-over-https-rustls is enabled.

I think it was mentioned before that the goal is to have native root certs by default.

<!-- gh-comment-id:1764110894 --> @daxpedda commented on GitHub (Oct 16, 2023): The problem with adding either `webpki-roots` or `native-certs` to `dns-over-rustls` is that it forces the dependency on users which might not want to use it. The alternative of course is to add it to the default feature set, but that is weird because the default feature set doesn't even include `dns-over-rustls/quic/http/h3` to begin with. This might be the best way forward though. Whatever the decision, I believe this issue will be common among consumers, so it should be documented appropriately. > Ideally the `webpki-roots` feature would be enabled by default if `dns-over-rustls` or `dns-over-https-rustls` is enabled. I think it was mentioned before that the goal is to have native root certs by default.
Author
Owner

@djc commented on GitHub (Oct 16, 2023):

IIRC most libraries pick a default via the default feature, allowing downstream users to work with the defaults while also allowing them to disable the defaults and enable the other option. This is friendlier especially for inexperienced users.

<!-- gh-comment-id:1764322932 --> @djc commented on GitHub (Oct 16, 2023): IIRC most libraries pick a default via the default feature, allowing downstream users to work with the defaults while also allowing them to disable the defaults and enable the other option. This is friendlier especially for inexperienced users.
Author
Owner

@bluejekyll commented on GitHub (Oct 16, 2023):

it forces the dependency on users which might not want to use it.

I think we need to expose the choice as part of the API (and the feature only adds/removes the dependency). This would make the choice explicit in the code and not change behavior unexpectedly as feature enablement is additive.

<!-- gh-comment-id:1765447520 --> @bluejekyll commented on GitHub (Oct 16, 2023): > it forces the dependency on users which might not want to use it. I think we need to expose the choice as part of the API (and the feature only adds/removes the dependency). This would make the choice explicit in the code and not change behavior unexpectedly as feature enablement is additive.
Author
Owner

@cpu commented on GitHub (Nov 3, 2023):

I bumped into this updating some code from v0.23.2 to latest.

On 0.23.2 this code works without panic when run with --features dns-over-https-rustls alone:

    use trust_dns_proto::rr::RecordType;
    use trust_dns_resolver::config::{ResolverConfig, ResolverOpts};
    use trust_dns_resolver::Resolver;

    #[test]
    fn repro() {
        let resolver =
            Resolver::new(ResolverConfig::cloudflare_https(), ResolverOpts::default()).unwrap();
        let data = resolver
            .lookup("crypto.cloudflare.com", RecordType::HTTPS)
            .expect("failed to lookup HTTPS record type");
        println!("{data:?}")
    }

However, switching to 0.24+ (and updating the trust_dns_xxx imports to hickory_) and running with just --features dns-over-https-rustls produces an unclear error from lookup

thread 'tests::repro' panicked at 'failed to lookup HTTPS record type: ResolveError { kind: Proto(ProtoError { kind: Io(Kind(InvalidData)) }) }', crates/resolver/tests/regression.rs:18:14

As described in this issue enabling the webpki-roots feature and running with --features dns-over-https-rustls,webpki-roots resolves the problem.

I think the main point of confusion for me is why the error is being reported as a invalid data instead of something about being unable to verify the presented certificate chain to a known trust anchor. If the error indicated that I think it would be a lot easier to debug. Unlike the OP I don't see a connection refused or IPv6 unreachable error.

<!-- gh-comment-id:1792552087 --> @cpu commented on GitHub (Nov 3, 2023): I bumped into this updating some code from v0.23.2 to latest. On 0.23.2 this code works without panic when run with `--features dns-over-https-rustls` alone: ```rust use trust_dns_proto::rr::RecordType; use trust_dns_resolver::config::{ResolverConfig, ResolverOpts}; use trust_dns_resolver::Resolver; #[test] fn repro() { let resolver = Resolver::new(ResolverConfig::cloudflare_https(), ResolverOpts::default()).unwrap(); let data = resolver .lookup("crypto.cloudflare.com", RecordType::HTTPS) .expect("failed to lookup HTTPS record type"); println!("{data:?}") } ``` However, switching to 0.24+ (and updating the `trust_dns_xxx` imports to `hickory_`) and running with just `--features dns-over-https-rustls` produces an unclear error from `lookup` ``` thread 'tests::repro' panicked at 'failed to lookup HTTPS record type: ResolveError { kind: Proto(ProtoError { kind: Io(Kind(InvalidData)) }) }', crates/resolver/tests/regression.rs:18:14 ``` As described in this issue enabling the `webpki-roots` feature and running with `--features dns-over-https-rustls,webpki-roots` resolves the problem. I think the main point of confusion for me is why the error is being reported as a invalid data instead of something about being unable to verify the presented certificate chain to a known trust anchor. If the error indicated that I think it would be a lot easier to debug. Unlike the OP I don't see a connection refused or IPv6 unreachable error.
Author
Owner

@XOR-op commented on GitHub (Nov 17, 2023):

Struggled with this bug for a few hours today. I suggest we should add a bug label for this issue and/or add notes to the changelog, since this behavior is a breaking change (requiring users to manually enable features if they use rustls before) and not well documented.

<!-- gh-comment-id:1815662697 --> @XOR-op commented on GitHub (Nov 17, 2023): Struggled with this bug for a few hours today. I suggest we should add a bug label for this issue and/or add notes to the changelog, since this behavior is a breaking change (requiring users to manually enable features if they use rustls before) and not well documented.
Author
Owner

@cpu commented on GitHub (Dec 11, 2023):

I think there's two issues here:

  1. Addressing what should happen if a feature for the source of trust anchors isn't provided (e.g. accounting for the webpki-roots default being removed)
  2. Looking into what's eating the more specific errors from tokio-rustls/rustls

Resolving 1 makes sense and would fix the breakage users are seeing, but I suspect 2 will cause other issues down the road (like if the DNS server's certificate happens to be invalid). I started trying to figure out where the error was being swallowed/remapped ~last week, but didn't arrive at a conclusion before I ran out of time.

As a quick fix it seems like it might be worthwhile to have the lazy CLIENT_CONFIG constructor in resolver/src/tls/dns_over_rustls.rs error if root_store.is_empty() after evaluating the feature flags for native-certs and webpki-roots by surfacing an error suggesting using one of the two feature flags. That will probably cut down on some confusion in the meantime.

I hope I can find time to look deeper but it won't be for a little while.

<!-- gh-comment-id:1850893272 --> @cpu commented on GitHub (Dec 11, 2023): I think there's two issues here: 1. Addressing what should happen if a feature for the source of trust anchors isn't provided (e.g. accounting for the `webpki-roots` default being removed) 2. Looking into what's eating the more specific errors from tokio-rustls/rustls Resolving 1 makes sense and would fix the breakage users are seeing, but I suspect 2 will cause other issues down the road (like if the DNS server's certificate happens to be invalid). I started trying to figure out where the error was being swallowed/remapped ~last week, but didn't arrive at a conclusion before I ran out of time. As a quick fix it seems like it might be worthwhile to have the lazy `CLIENT_CONFIG` constructor in `resolver/src/tls/dns_over_rustls.rs` error if `root_store.is_empty()` after evaluating the feature flags for `native-certs` and `webpki-roots` by surfacing an error suggesting using one of the two feature flags. That will probably cut down on some confusion in the meantime. I hope I can find time to look deeper but it won't be for a little while.
Author
Owner

@cpu commented on GitHub (Apr 13, 2024):

I think there's two issues here:

Addressing what should happen if a feature for the source of trust anchors isn't provided (e.g. accounting for the webpki-roots default being removed)

Here's a PR that attempts to fix this aspect. This has come up for a few people now (both in Discord and in other issues) so it seems worth fixing ASAP: https://github.com/hickory-dns/hickory-dns/pull/2179

Looking into what's eating the more specific errors from tokio-rustls/rustls

I went down this rabbit hole again today and haven't found a conclusive answer. In my testing upstream we get a nice io::Error like err: Custom { kind: InvalidData, error: InvalidCertificate(UnknownIssuer) } frrom a tokio rustls connection made with a config without roots to a remote server as expected. I can't figure out where or why, but something in the Hickory DNS stack is reducing this down to just a simple io error with type InvalidData - no nested source error or message.

<!-- gh-comment-id:2053689091 --> @cpu commented on GitHub (Apr 13, 2024): > I think there's two issues here: > > Addressing what should happen if a feature for the source of trust anchors isn't provided (e.g. accounting for the webpki-roots default being removed) Here's a PR that attempts to fix this aspect. This has come up for a few people now (both in Discord and in other issues) so it seems worth fixing ASAP: https://github.com/hickory-dns/hickory-dns/pull/2179 > Looking into what's eating the more specific errors from tokio-rustls/rustls I went down this rabbit hole again today and haven't found a conclusive answer. In my testing upstream we get a nice `io::Error ` like err: `Custom { kind: InvalidData, error: InvalidCertificate(UnknownIssuer) }` frrom a tokio rustls connection made with a config without roots to a remote server as expected. I can't figure out where or why, but something in the Hickory DNS stack is reducing this down to just a simple io error with type InvalidData - no nested source error or message.
Author
Owner

@djc commented on GitHub (Apr 13, 2024):

I remember digging into this, IIRC there's an issue where hickory tries to reconstruct an io::Error and there's no way to extract the existing custom error kind or something like that. Probably in one of the error conversions in one of the error.rs modules.

<!-- gh-comment-id:2053715466 --> @djc commented on GitHub (Apr 13, 2024): I remember digging into this, IIRC there's an issue where hickory tries to reconstruct an io::Error and there's no way to extract the existing custom error kind or something like that. Probably in one of the error conversions in one of the error.rs modules.
Author
Owner

@cpu commented on GitHub (Apr 13, 2024):

Probably in one of the error conversions in one of the error.rs modules.

Good pointer, thanks! It's the Clone impl for ProtoErrorKind, specifically the else clause here:

github.com/hickory-dns/hickory-dns@6c2a1e2c23/crates/proto/src/error.rs (L644-L648)

That creates a "simple representation of error kind" and so boils our upstream error into just InvalidData.

<!-- gh-comment-id:2053743003 --> @cpu commented on GitHub (Apr 13, 2024): > Probably in one of the error conversions in one of the error.rs modules. Good pointer, thanks! It's the `Clone` impl for `ProtoErrorKind`, specifically the `else` clause here: https://github.com/hickory-dns/hickory-dns/blob/6c2a1e2c238eaecff2b6b6c1c0e435685bc0997e/crates/proto/src/error.rs#L644-L648 That creates a "simple representation of error kind" and so boils our upstream error into just `InvalidData`.
Author
Owner

@cpu commented on GitHub (Apr 13, 2024):

there's no way to extract the existing custom error kind or something like that

By my read we have the original error with full info as e from the Io(ref e) binding, but it isn't Clone so some contortions are needed. We can't use e or the inner error from get_ref() with io::Error::new(e.kind(), xxx) to make our own new error because borrowed data would escape the method.

I think the usual fix would be to have Io(io::Error) be Io(Rc<io::Error>) or Io(Arc<io::Error>) instead right? 🤔 That's a breaking change, but I'm not sure I spot another fix and it seems fairly important to keep the broader error context.

<!-- gh-comment-id:2053746377 --> @cpu commented on GitHub (Apr 13, 2024): > there's no way to extract the existing custom error kind or something like that By my read we have the original error with full info as `e` from the `Io(ref e)` binding, but it isn't `Clone` so some contortions are needed. We can't use `e` or the inner error from `get_ref()` with `io::Error::new(e.kind(), xxx)` to make our own new error because borrowed data would escape the method. I think the usual fix would be to have `Io(io::Error)` be `Io(Rc<io::Error>)` or `Io(Arc<io::Error>)` instead right? :thinking: That's a breaking change, but I'm not sure I spot another fix and it seems fairly important to keep the broader error context.
Author
Owner

@djc commented on GitHub (Apr 13, 2024):

Breaking change is fine, we're going to release a breaking change anyway once Quinn 0.11 is published.

<!-- gh-comment-id:2053747976 --> @djc commented on GitHub (Apr 13, 2024): Breaking change is fine, we're going to release a breaking change anyway once Quinn 0.11 is published.
Author
Owner

@cpu commented on GitHub (Apr 13, 2024):

Breaking change is fine

Sounds good. Here's a fix: https://github.com/hickory-dns/hickory-dns/pull/2181

I think it might be sensible to consider landing it & https://github.com/hickory-dns/hickory-dns/pull/2179 both, but with the InvalidCertificate(UnknownIssuer) error coming through correctly #2179 seems less pressing. It's a lot easier to arrive at the right fix when you know the underlying error.

<!-- gh-comment-id:2053758515 --> @cpu commented on GitHub (Apr 13, 2024): > Breaking change is fine Sounds good. Here's a fix: https://github.com/hickory-dns/hickory-dns/pull/2181 I think it might be sensible to consider landing it & https://github.com/hickory-dns/hickory-dns/pull/2179 both, but with the `InvalidCertificate(UnknownIssuer)` error coming through correctly #2179 seems less pressing. It's a lot easier to arrive at the right fix when you know the underlying error.
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
starred/hickory-dns#868
No description provided.