[GH-ISSUE #532] dns-over-https should reuse connections #522

Closed
opened 2026-03-15 22:55:51 +03:00 by kerem · 16 comments

Originally created by @kpcyrd on GitHub (Jul 15, 2018).
Original GitHub issue: https://github.com/hickory-dns/hickory-dns/issues/532

I've used the dns-over-https resolver from master for a bit and it seems it's creating a fresh TLS connection for each lookup, so each lookup takes around 300ms for me. I think trust-dns-resolver should try to keep the connection alive for as long as possible to reduce the number of round trips needed, since creating a TLS session is expensive in a latency-sensitive context. :)


@bluejekyll commented on GitHub (Jul 15, 2018):

Hm. Mind sharing your code (though I won't have much time to look at it, as I'll be on a plane and on vacation for the next couple of weeks)? It's possible I screwed up that implementation, but if you have the same Resolver shared across uses, it shouldn't recreate connections; it should share them across uses.


@kpcyrd commented on GitHub (Jul 15, 2018):

I've reduced my code to a standalone program to reproduce the issue:

```rust
extern crate trust_dns_resolver;
extern crate trust_dns_proto;

use trust_dns_resolver::Resolver;
use trust_dns_resolver::config::*;
use trust_dns_proto::rr::record_type::RecordType;

use std::thread;
use std::time::Duration;

fn sleep(secs: u64) {
    thread::sleep(Duration::from_secs(secs));
}

fn main() {
    let resolver_config = ResolverConfig::cloudflare_https();
    let mut resolver_opts = ResolverOpts::default();
    resolver_opts.use_hosts_file = false;
    let resolver = Resolver::new(resolver_config, resolver_opts).expect("Resolver::new");

    println!("resolving google.com");
    resolver.lookup("google.com", RecordType::A).expect("lookup google.com");
    println!("done");

    sleep(1);

    println!("resolving github.com");
    resolver.lookup("github.com", RecordType::A).expect("lookup github.com");
    println!("done");
}
```

Using sniffglue you can observe that a new TLS connection is created for each query:

```
// google resolve starts here
[tcp] 192.168.X.X:57772      -> 1.1.1.1:443            [tls] ClientHello (hostname: "cloudflare-dns.com")
[tcp] 192.168.X.X:54826      -> 1.0.0.1:443            [tls] ClientHello (hostname: "cloudflare-dns.com")
// github resolve starts here
[tcp] 192.168.X.X:57776      -> 1.1.1.1:443            [tls] ClientHello (hostname: "cloudflare-dns.com")
[tcp] 192.168.X.X:54830      -> 1.0.0.1:443            [tls] ClientHello (hostname: "cloudflare-dns.com")
```

@bluejekyll commented on GitHub (Jul 15, 2018):

Ok, thanks. I'll have to review the usage of h2 to see why this is happening...


@bluejekyll commented on GitHub (Aug 29, 2018):

I haven't had time to look into this, but I was in the area of code that may be affecting this yesterday. This boolean option in the h2 connection may be wrong, and the fix might be as simple as setting it to false:

https://github.com/bluejekyll/trust-dns/blob/6777f648df50b4be35b38f9850ba1f224c9a558b/https/src/https_client_stream.rs#L277


@bluejekyll commented on GitHub (Oct 2, 2018):

Ok. Finally tracked this down: a little too much cloning going on. I have a patch locally that resolves this; I just need to clean it up.

Basically, the ConnectionHandle type was cloning the connection details for a NameServer before connecting, causing every connection to connect as if it were the first time, every time.
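To illustrate the shape of that bug with stand-in types (this is not the actual trust-dns code): clones that share one established session reuse it, whereas cloning the pre-connect details, as described above, makes every lookup pay for a fresh TLS handshake:

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Stand-in for a connection handle whose clones share one established
// session. The reported bug was effectively the opposite: cloning the
// pre-connect details, so every lookup dialed a fresh TLS connection.
#[derive(Clone)]
struct ConnectionHandle {
    conn: Rc<RefCell<Option<String>>>,
}

impl ConnectionHandle {
    fn lookup(&self, name: &str) -> String {
        let mut slot = self.conn.borrow_mut();
        // Connect only if no session exists yet; later lookups (including
        // those through clones) reuse it instead of paying the handshake again.
        let session = slot.get_or_insert_with(|| {
            println!("connecting (expensive TLS handshake)");
            "tls-session".to_string()
        });
        format!("{} resolved over {}", name, session)
    }
}

fn main() {
    let handle = ConnectionHandle {
        conn: Rc::new(RefCell::new(None)),
    };
    let clone = handle.clone();
    println!("{}", handle.lookup("google.com")); // prints "connecting" once
    println!("{}", clone.lookup("github.com")); // reuses the session
}
```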


@juliangehring commented on GitHub (Jan 2, 2019):

When I run @kpcyrd's example with trust-dns-resolver v0.10.2 and trust-dns-proto v0.6.1, wireshark still shows me one TLS initialisation (i.e. one TLS Client Hello) per request. Maybe I'm missing something, but is the TLS connection currently reused?


@bluejekyll commented on GitHub (Jan 2, 2019):

Over how many uses, what time period, and how many configured TLS endpoints?


@juliangehring commented on GitHub (Jan 2, 2019):

I saw it consistently during 12+ runs over the last 5 days, all using the `cloudflare_https` and `cloudflare_tls` resolver config (1.0.0.1).


@bluejekyll commented on GitHub (Jan 2, 2019):

Do you know the frequency of the requests? I’m wondering if the connections are being closed due to timeout. If you want, you can enable debug log lines for the proto crate and see what’s being reported.


@juliangehring commented on GitHub (Jan 3, 2019):

Here's the full example (almost identical to the original one):

```rust
use trust_dns_proto::rr::record_type::RecordType;
use trust_dns_resolver::config::*;
use trust_dns_resolver::Resolver;

use std::thread;
use std::time::Duration;

fn sleep(secs: u64) {
    thread::sleep(Duration::from_secs(secs));
}

fn main() {
    let resolver_config = ResolverConfig::cloudflare_https();
    let mut resolver_opts = ResolverOpts::default();
    resolver_opts.use_hosts_file = false;
    let resolver = Resolver::new(resolver_config, resolver_opts).expect("Resolver::new");

    println!("{:?}", resolver_opts);

    println!("resolving google.com");
    resolver
        .lookup("google.com", RecordType::A)
        .expect("lookup google.com");
    println!("done");

    sleep(1);

    println!("resolving github.com");
    resolver
        .lookup("github.com", RecordType::A)
        .expect("lookup github.com");
    println!("done");

    sleep(1);

    println!("resolving example.com");
    resolver
        .lookup("example.com", RecordType::A)
        .expect("lookup example.com");
    println!("done");
}
```

@juliangehring commented on GitHub (Jan 3, 2019):

It's 3 requests, one every 1.2 seconds (1000ms sleep plus around 200ms to handle the request). Here's an excerpt of a wireshark log for the example code:

```
[...]
13 7.560516 192.168.0.12 1.0.0.1 TLSv1.2 226 Client Hello
[...]
45 8.784649 192.168.0.12 1.0.0.1 TLSv1.2 226 Client Hello
[...]
76 10.003687 192.168.0.12 1.0.0.1 TLSv1.2 226 Client Hello
[...]
```

The example uses the default `ResolverOpts`, which specifies a timeout of 5 seconds.
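(For reference, a minimal sketch of where that timeout lives, assuming the public `timeout` field on `ResolverOpts` from this era; since 5 seconds is well above the ~1.2s gap between lookups, a request timeout alone shouldn't be closing these connections:)

```rust
extern crate trust_dns_resolver;

use std::time::Duration;
use trust_dns_resolver::config::ResolverOpts;

fn main() {
    let mut opts = ResolverOpts::default();
    // 5 seconds is the default; raising it here only shows that the field
    // is configurable, not that it's a fix for the reconnect behavior.
    opts.timeout = Duration::from_secs(10);
    println!("{:?}", opts.timeout);
}
```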


@juliangehring commented on GitHub (Jan 3, 2019):

And here is the [debug.log](https://github.com/bluejekyll/trust-dns/files/2725423/debug.log).


@bluejekyll commented on GitHub (Jan 4, 2019):

Getting back to you late on this, but I'll leave it here as a reference for others: the library uses the standard `log` crate. With `env_logger` you can enable all of the relevant logs with this environment variable:

```
RUST_LOG=trust_dns=debug,trust_dns_client=debug,trust_dns_proto=debug,trust_dns_resolver=debug
```
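For completeness, a minimal sketch of wiring that into the reproduction program, assuming the `env_logger` crate:

```rust
extern crate env_logger;

fn main() {
    // Reads RUST_LOG from the environment and installs the global logger;
    // run with the RUST_LOG value above to see the proto/resolver debug lines.
    env_logger::init();

    // ... the rest of the reproduction program from the earlier comments ...
}
```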


@bluejekyll commented on GitHub (Jan 4, 2019):

Thanks for posting the log, it does look like the stream is being closed immediately, based on this:

```console
2019-01-03 21:59:36 DEBUG [trust_dns_https::https_client_stream] got bytes: 55
2019-01-03 21:59:36 DEBUG [trust_dns_proto::rr::record_data] reading A
2019-01-03 21:59:36 DEBUG [tokio_reactor] dropping I/O source: 0
```

I'll try to look into this when I have some time. There may have been a regression recently. If so, we'll need to build a test specifically for this case to make sure that doesn't happen again.


@juliangehring commented on GitHub (Jan 4, 2019):

Thanks for looking into this. Let me know if I can provide any more details for identifying the issue.


@bluejekyll commented on GitHub (Jan 7, 2019):

Ah, I looked at your code for a while, trying to figure this out, and then I realized: you're using the synchronous Resolver, not the AsyncResolver. In a nutshell, the synchronous Resolver does not maintain a long-lived connection pool.

I'll open a separate issue to describe this. In any case, I did validate that the root cause of this issue is resolved.
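For anyone landing here: a minimal sketch of the async path, assuming the 0.10-era `AsyncResolver` API, where `new` returns the resolver together with a background future that drives the connection pool and must be spawned (names and exact signatures may differ between versions):

```rust
extern crate tokio;
extern crate trust_dns_resolver;

use trust_dns_resolver::config::{ResolverConfig, ResolverOpts};
use trust_dns_resolver::AsyncResolver;

fn main() {
    let mut runtime = tokio::runtime::Runtime::new().expect("tokio runtime");

    // The background future owns the connection pool; spawning it (and
    // keeping the runtime alive) is what keeps the HTTPS connections open
    // between lookups.
    let (resolver, background) =
        AsyncResolver::new(ResolverConfig::cloudflare_https(), ResolverOpts::default());
    runtime.spawn(background);

    let response = runtime
        .block_on(resolver.lookup_ip("google.com"))
        .expect("lookup google.com");
    println!("google.com -> {:?}", response.iter().collect::<Vec<_>>());

    let response = runtime
        .block_on(resolver.lookup_ip("github.com"))
        .expect("lookup github.com");
    println!("github.com -> {:?}", response.iter().collect::<Vec<_>>());
}
```

Because the background task owns the connections, lookups made through the same `AsyncResolver` (or clones of it) can reuse the established HTTPS sessions.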
