[GH-ISSUE #1043] Resolver: HTTPS client never closes connections #591

Closed
opened 2026-03-15 23:20:12 +03:00 by kerem · 8 comments

Originally created by @balboah on GitHub (Mar 17, 2020).
Original GitHub issue: https://github.com/hickory-dns/hickory-dns/issues/1043

Describe the bug
When configuring with `NameServerConfigGroup::from_ips_https()`, the resulting client leaves connections in the ESTABLISHED state and leaks memory, adding a new connection for each lookup.

My loop of 200 lookups caused 200 connections in the ESTABLISHED state.

To Reproduce

Running this example prints 196-200:

example main.rs

```rust
use std::net::IpAddr;
use std::process::Command;

use tokio::runtime::Builder;

use trust_dns_client::rr::{Name, RecordType};
use trust_dns_resolver::config::ResolverConfig;
use trust_dns_resolver::config::{NameServerConfigGroup, ResolverOpts};
use trust_dns_resolver::TokioAsyncResolver;

fn main() {
  // tokio 0.2-style runtime with a single core thread.
  let mut runtime = Builder::new()
    .threaded_scheduler()
    .core_threads(1)
    .enable_all()
    .build()
    .unwrap();

  // DNS-over-HTTPS upstream: Cloudflare at 1.1.1.1:443.
  let dns_name = "cloudflare-dns.com";
  let dns_ips: Vec<IpAddr> = vec!["1.1.1.1".parse().unwrap()];
  let ns = NameServerConfigGroup::from_ips_https(&dns_ips, 443, dns_name.to_string());
  let options = ResolverOpts {
    preserve_intermediates: true,
    ..ResolverOpts::default()
  };
  let config = ResolverConfig::from_parts(None, vec![], ns);
  let resolver = runtime
    .block_on(TokioAsyncResolver::new(
      config,
      options,
      runtime.handle().clone(),
    ))
    .unwrap();

  // 200 lookups for distinct names, so none are served from cache.
  for n in 0..200 {
    let name: Name = format!("{}.zendesk.com.", n).parse().unwrap();
    runtime
      .block_on(resolver.lookup(name, RecordType::A, Default::default()))
      .unwrap();
  }

  // Count connections to 1.1.1.1 still in the ESTABLISHED state.
  let mut netstat = Command::new("netstat");
  let output = netstat
    .arg("-na")
    .output()
    .expect("failed to execute process");
  let output = String::from_utf8_lossy(&output.stdout);

  let connections: Vec<_> = output
    .lines()
    .filter(|l| l.contains("ESTA"))
    .filter(|l| l.contains("1.1.1.1"))
    .collect();
  println!("{}", connections.len());
}
```

Expected behavior

Connections should be closed after the lookup resolves, which is what happens with the TLS (a.k.a. DoT) version.

System:

  • OS: darwin
  • Architecture: x86_64
  • rustc version: 1.41.1

Version:
Version: trust-dns git rev 77fd933d

Additional context
Using the rustls TLS-only (DoT) edition works fine.
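For reference, a minimal sketch of the DoT configuration that behaves correctly here, assuming the same 0.19-era `NameServerConfigGroup` API with the dns-over-rustls feature enabled:

```rust
use std::net::IpAddr;
use trust_dns_resolver::config::NameServerConfigGroup;

// Same upstream as the reproduction, but DNS-over-TLS on port 853
// instead of DNS-over-HTTPS on 443.
fn dot_group() -> NameServerConfigGroup {
    let ips: Vec<IpAddr> = vec!["1.1.1.1".parse().unwrap()];
    NameServerConfigGroup::from_ips_tls(&ips, 853, "cloudflare-dns.com".to_string())
}
```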
Tokio runtime is created with:

```rust
Builder::new()
    .threaded_scheduler()
    .core_threads(1)
    .enable_all()
    .build()
```

@bluejekyll commented on GitHub (Mar 19, 2020):

> resulting client cause connections to keep ESTABLISHED state and cause a memory leak

This is surprising. The design isn't to drop connections between uses; in fact, TCP and TLS should maintain their connections, so the behavior seems unexpected there as well. I think there are tests that cover this, but we should do a better review here.

The bigger concern is that you're seeing a memory leak, and we should determine why that is.


@balboah commented on GitHub (Mar 19, 2020):

Yes, the optimal behavior for my HTTPS case would be a single keep-alive connection, especially if it's able to multiplex queries over h2 streams. I'm not sure how that works with TLS only.
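A minimal sketch of that multiplexing, assuming the `h2`, `http`, `bytes`, and `tokio` crates; the endpoint and the wire-format payload are placeholders, not a working DoH client:

```rust
use bytes::Bytes;
use http::Request;
use tokio::net::TcpStream;

// One connection, many queries: each DNS query becomes a new h2 stream
// on the same socket instead of a new connection.
async fn multiplexed_queries(tcp: TcpStream) -> Result<(), h2::Error> {
    // A real DoH client would wrap `tcp` in TLS first; skipped here.
    let (mut send_request, connection) = h2::client::handshake(tcp).await?;

    // Drive the connection state on a background task.
    tokio::spawn(async move {
        let _ = connection.await;
    });

    for _ in 0..3 {
        send_request = send_request.ready().await?;
        let request = Request::post("https://cloudflare-dns.com/dns-query")
            .header("content-type", "application/dns-message")
            .body(())
            .unwrap();
        // `false`: the request body (the wire-format query) follows.
        let (response, mut body) = send_request.send_request(request, false)?;
        body.send_data(Bytes::from_static(b"..."), true)?; // placeholder payload
        let _response = response.await?;
    }
    Ok(())
}
```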

But the critical part is that it leaks and triggers a crash in my limited environment.


@balboah commented on GitHub (Mar 19, 2020):

I haven't dug into the code yet, but I remember there is a "max idle connections per host" setting in reqwest which defaults to the maximum integer size. Maybe this is a similar issue: rapid queries aren't limited in the number of idle connections they may spawn, and each one takes a fair amount of memory.
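For comparison, a minimal sketch of bounding that pool, assuming reqwest 0.10's builder API (the exact method name has varied across releases):

```rust
use std::time::Duration;

// Cap reqwest's per-host idle pool so rapid requests can't accumulate
// an unbounded number of kept-alive connections.
fn build_client() -> Result<reqwest::Client, reqwest::Error> {
    reqwest::Client::builder()
        .pool_max_idle_per_host(1)                  // at most one idle connection per host
        .pool_idle_timeout(Duration::from_secs(30)) // and drop it if it sits unused
        .build()
}
```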


@balboah commented on GitHub (Mar 24, 2020):

Currently suspecting that the bug is around `NameServerPool::parallel_conn_loop`: a new `HttpsClientStream` is created for every query that wasn't cached, so it would be impossible to reuse the connection. I'm having trouble following the future calls and all the different types that collectively get the job done, so I'll leave this for now.
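Roughly, the suspected shape of the bug versus the intended one; the types below are hypothetical stand-ins for illustration, not the real trust-dns internals:

```rust
use std::sync::Arc;
use tokio::sync::Mutex;

// Hypothetical stand-in for an established DoH connection.
struct HttpsConn;

impl HttpsConn {
    async fn connect() -> HttpsConn { HttpsConn }
    async fn query(&self, _name: &str) { /* send over existing h2 streams */ }
}

// Suspected buggy shape: every uncached query builds a fresh stream,
// so the previous connection can never be reused.
async fn query_leaky(name: &str) {
    let conn = HttpsConn::connect().await; // new socket each call
    conn.query(name).await;
    // `conn` is dropped here, but the session may linger ESTABLISHED.
}

// Intended shape: the pool caches the connection and hands out clones.
async fn query_pooled(pool: &Mutex<Option<Arc<HttpsConn>>>, name: &str) {
    let conn = {
        let mut slot = pool.lock().await;
        match &*slot {
            Some(c) => Arc::clone(c),
            None => {
                let c = Arc::new(HttpsConn::connect().await);
                *slot = Some(Arc::clone(&c));
                c
            }
        }
    };
    conn.query(name).await; // reuses the one cached connection
}
```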


@bluejekyll commented on GitHub (Mar 24, 2020):

Yes. Some of these are older futures, and the code hasn’t been 100% cleaned up, so I can understand it being hard to follow.

I’ll take a look at that. I spent a lot of time last release cleaning up that area, so it’s fresh in my mind.


@balboah commented on GitHub (Apr 1, 2020):

@bluejekyll hey, did you find some time to squash this bug? No stress, but it would save me a lot of time :)


@bluejekyll commented on GitHub (Apr 1, 2020):

Not yet. I was hoping to find some time this week.


@bluejekyll commented on GitHub (Apr 5, 2020):

I've been looking at this today. I think what happened is that, through all the refactoring between 0.18 and 0.19 to support async/await, this area of the code got broken. I'm building a test to detect this so that we don't lose the functionality again in the future; then I'll work on a patch to fix it.
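A minimal sketch of what such a test could look like, reusing the netstat-based count from the reproduction above (the real test may well hook the connection layer directly instead):

```rust
use std::process::Command;

/// Count connections to `ip` in the ESTABLISHED state by shelling out
/// to netstat, as the reproduction in this issue does.
fn established_to(ip: &str) -> usize {
    let output = Command::new("netstat")
        .arg("-na")
        .output()
        .expect("failed to execute netstat");
    String::from_utf8_lossy(&output.stdout)
        .lines()
        .filter(|l| l.contains("ESTABLISHED") && l.contains(ip))
        .count()
}

// In the test body: perform N uncached DoH lookups against 1.1.1.1,
// then assert the count stayed bounded instead of growing with N, e.g.
// assert!(established_to("1.1.1.1") <= 2);
```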
