[GH-ISSUE #2050] Sync client hang #863

Open
opened 2026-03-16 00:38:09 +03:00 by kerem · 9 comments
Owner

Originally created by @marcbrevoort-cyberhive on GitHub (Oct 6, 2023).
Original GitHub issue: https://github.com/hickory-dns/hickory-dns/issues/2050

Describe the bug
Using the sync client, the query

    let response = client.query(&name, dns_class, record_type);

may hang and take forever to complete.

To Reproduce
Left a process running overnight.

Expected behavior
If response not received within a reasonable time frame, the function

    fn query(
        &self,
        name: &Name,
        query_class: DNSClass,
        query_type: RecordType,
    ) -> ClientResult<DnsResponse> {
        let (mut client, runtime) = self.spawn_client()?;

        runtime.block_on(client.query(name.clone(), query_class, query_type))
    }

could just return an error or an empty result. Alternatively, since the sync function just seems a wrapper for the async function, a parallel to the async library call with_timeout could be provided.

System:

  • OS: Windows (observed), probably others
  • Architecture: x86_64
  • Version 0.22.0
  • rustc version: 1.72.1

Version:
trust-dns-client = { version = "0.22.0", default-features = false, features = ["dns-over-https", "dns-over-https-rustls"]}

Additional context
librustls reports
[06:39:41.767] (9a90) DEBUG No cached session for DnsName(DnsName(DnsName("dns.google"))) - this nested DnsName seems odd.

Update: We're using the client to do DNS-over-HTTPS queries, so this issue relates to trust_dns_client::https::HttpsClientConnection which does not appear to have a with_timeout method.

Originally created by @marcbrevoort-cyberhive on GitHub (Oct 6, 2023). Original GitHub issue: https://github.com/hickory-dns/hickory-dns/issues/2050 **Describe the bug** Using the sync client, the query ``` let response = client.query(&name, dns_class, record_type); ``` may hang and take forever to complete. **To Reproduce** Left a process running overnight. **Expected behavior** If response not received within a reasonable time frame, the function ``` fn query( &self, name: &Name, query_class: DNSClass, query_type: RecordType, ) -> ClientResult<DnsResponse> { let (mut client, runtime) = self.spawn_client()?; runtime.block_on(client.query(name.clone(), query_class, query_type)) } ``` could just return an error or an empty result. Alternatively, since the sync function just seems a wrapper for the async function, a parallel to the async library call `with_timeout` could be provided. **System:** - OS: Windows (observed), probably others - Architecture: x86_64 - Version 0.22.0 - rustc version: 1.72.1 **Version:** trust-dns-client = { version = "0.22.0", default-features = false, features = ["dns-over-https", "dns-over-https-rustls"]} **Additional context** librustls reports ```[06:39:41.767] (9a90) DEBUG No cached session for DnsName(DnsName(DnsName("dns.google")))``` - this nested DnsName seems odd. Update: We're using the client to do DNS-over-HTTPS queries, so this issue relates to `trust_dns_client::https::HttpsClientConnection` which does not appear to have a `with_timeout` method.
Author
Owner

@bluejekyll commented on GitHub (Oct 6, 2023):

Maybe we need some better defaults here, but could you try setting a timeout with this method when you construct your connection for the Client? https://docs.rs/trust-dns-client/0.23.0/trust_dns_client/udp/struct.UdpClientConnection.html#method.with_timeout

<!-- gh-comment-id:1751200080 --> @bluejekyll commented on GitHub (Oct 6, 2023): Maybe we need some better defaults here, but could you try setting a timeout with this method when you construct your connection for the Client? https://docs.rs/trust-dns-client/0.23.0/trust_dns_client/udp/struct.UdpClientConnection.html#method.with_timeout
Author
Owner

@marcbrevoort-cyberhive commented on GitHub (Oct 12, 2023):

Actually the UdpClientConnection will not do - while I can make it do lookups with UdpClientConnection, WireShark shows it is not secured. I have looked for a timeout function in HttpsClientConnection but cannot find it. Is there a way I'm unaware of that allows me to do DNS over HTTPS and also set the maximum timeout to something non-infinite? Rust reports that with_timeout is not a method of HttpsClientConnection.

My current working (0.22) prototype code for a DNS lookup looks something like this -

use std::net::Ipv4Addr;
use std::str::FromStr;
use trust_dns_client::client::{SyncClient};

use tokio::runtime::Runtime;
use std::sync::Arc;
use trust_dns_client::https::HttpsClientConnection;
use trust_dns_client::op::DnsResponse;
use trust_dns_client::rr::{DNSClass, Name, RData, Record, RecordType};
use rustls::{ClientConfig, OwnedTrustAnchor, RootCertStore};
use trust_dns_proto::iocompat::AsyncIoTokioAsStd;
use trust_dns_client::client::Client;

pub fn main() {

    let name_server = "8.8.8.8:443".parse().unwrap();
    let dns_name = "example.com".to_string();

    let mut root_store = RootCertStore::empty();
    root_store.add_server_trust_anchors(webpki_roots::TLS_SERVER_ROOTS.0.iter().map(|ta| {
        OwnedTrustAnchor::from_subject_spki_name_constraints(
            ta.subject,
            ta.spki,
            ta.name_constraints,
        )
    }));

    let client_config = ClientConfig::builder()
        .with_safe_default_cipher_suites()
        .with_safe_default_kx_groups()
        .with_safe_default_protocol_versions()
        .unwrap()
        .with_root_certificates(root_store)
        .with_no_client_auth();

    let shared_client_config = std::sync::Arc::new(client_config);
    // "string" below is type of phantom data, if used
    let conn:HttpsClientConnection<AsyncIoTokioAsStd<tokio::net::TcpStream>> = HttpsClientConnection::new(name_server, "dns.google".to_string(), shared_client_config);

    let client = SyncClient::new(conn);
    let name = Name::from_ascii(&dns_name).unwrap();
    let dns_class = DNSClass::IN;
    let record_type = RecordType::A;

    let response = client.query(&name, dns_class, record_type);
    match response {
        Ok(answer) => {
            println!("ok={:?}", answer);
        }
        Err(e) => {
            println!("err Resp={:?}", e);

        }
    }
}

Note: This also works as 0.23, but requires these settings to be changed in Cargo.toml to avoid compile errors:

- rustls = { version = "0.20.9" }
+ rustls = { version = "0.21.7" }

- trust-dns-proto = { version = "0.22.0" }
+ trust-dns-proto = { version = "0.23.0" }

- trust-dns-client = { version="0.22.0", default-features = false,features=["dns-over-https", "dns-over-https-rustls"]}
+ trust-dns-client = { version="0.23.0", default-features = false,features=["dns-over-https", "dns-over-https-rustls"]}
<!-- gh-comment-id:1759296816 --> @marcbrevoort-cyberhive commented on GitHub (Oct 12, 2023): Actually the UdpClientConnection will not do - while I can make it do lookups with UdpClientConnection, WireShark shows it is not secured. I have looked for a timeout function in HttpsClientConnection but cannot find it. Is there a way I'm unaware of that allows me to do DNS over HTTPS _and_ also set the maximum timeout to something non-infinite? Rust reports that `with_timeout` is not a method of HttpsClientConnection. My current working (0.22) prototype code for a DNS lookup looks something like this - ``` use std::net::Ipv4Addr; use std::str::FromStr; use trust_dns_client::client::{SyncClient}; use tokio::runtime::Runtime; use std::sync::Arc; use trust_dns_client::https::HttpsClientConnection; use trust_dns_client::op::DnsResponse; use trust_dns_client::rr::{DNSClass, Name, RData, Record, RecordType}; use rustls::{ClientConfig, OwnedTrustAnchor, RootCertStore}; use trust_dns_proto::iocompat::AsyncIoTokioAsStd; use trust_dns_client::client::Client; pub fn main() { let name_server = "8.8.8.8:443".parse().unwrap(); let dns_name = "example.com".to_string(); let mut root_store = RootCertStore::empty(); root_store.add_server_trust_anchors(webpki_roots::TLS_SERVER_ROOTS.0.iter().map(|ta| { OwnedTrustAnchor::from_subject_spki_name_constraints( ta.subject, ta.spki, ta.name_constraints, ) })); let client_config = ClientConfig::builder() .with_safe_default_cipher_suites() .with_safe_default_kx_groups() .with_safe_default_protocol_versions() .unwrap() .with_root_certificates(root_store) .with_no_client_auth(); let shared_client_config = std::sync::Arc::new(client_config); // "string" below is type of phantom data, if used let conn:HttpsClientConnection<AsyncIoTokioAsStd<tokio::net::TcpStream>> = HttpsClientConnection::new(name_server, "dns.google".to_string(), shared_client_config); let client = SyncClient::new(conn); let name = Name::from_ascii(&dns_name).unwrap(); let dns_class = DNSClass::IN; let record_type = RecordType::A; let response = client.query(&name, dns_class, record_type); match response { Ok(answer) => { println!("ok={:?}", answer); } Err(e) => { println!("err Resp={:?}", e); } } } ``` Note: This also works as 0.23, but requires these settings to be changed in `Cargo.toml` to avoid compile errors: ``` - rustls = { version = "0.20.9" } + rustls = { version = "0.21.7" } - trust-dns-proto = { version = "0.22.0" } + trust-dns-proto = { version = "0.23.0" } - trust-dns-client = { version="0.22.0", default-features = false,features=["dns-over-https", "dns-over-https-rustls"]} + trust-dns-client = { version="0.23.0", default-features = false,features=["dns-over-https", "dns-over-https-rustls"]} ```
Author
Owner

@marcbrevoort-cyberhive commented on GitHub (Oct 17, 2023):

Have upgraded to Hickory (0.24) now. I'm still at a loss about how to set a timeout for HttpsClientConnection.
The documentation states

Use with hickory_client::client::Client impls

but I don't understand in what way this is intended (an example might be useful there). Any insights please?

<!-- gh-comment-id:1766749104 --> @marcbrevoort-cyberhive commented on GitHub (Oct 17, 2023): Have upgraded to Hickory (0.24) now. I'm still at a loss about how to set a timeout for HttpsClientConnection. The documentation states > Use with hickory_client::client::Client impls but I don't understand in what way this is intended (an example might be useful there). Any insights please?
Author
Owner

@bluejekyll commented on GitHub (Oct 18, 2023):

Hm, I was just taking a look at the http interface, and you're right, there's not timeout option to pass through on the configuration. This seems like a missing option. I had thought we had an interface like the UDP variant above, but it appears not. We should definitely add one.

<!-- gh-comment-id:1767592545 --> @bluejekyll commented on GitHub (Oct 18, 2023): Hm, I was just taking a look at the http interface, and you're right, there's not timeout option to pass through on the configuration. This seems like a missing option. I had thought we had an interface like the UDP variant above, but it appears not. We should definitely add one.
Author
Owner

@marcbrevoort-cyberhive commented on GitHub (Nov 6, 2023):

We should definitely add [with_timeout].

May I ask if there's a known timeline for this yet?

<!-- gh-comment-id:1795188908 --> @marcbrevoort-cyberhive commented on GitHub (Nov 6, 2023): > We should definitely add [with_timeout]. May I ask if there's a known timeline for this yet?
Author
Owner

@bluejekyll commented on GitHub (Nov 7, 2023):

I'm not sure that anyone has stepped forward to offer a patch for this at this point. I was just looking through the code, and while it would be easy to add the TCP parameters here: https://github.com/hickory-dns/hickory-dns/blob/main/crates/proto/src/h2/h2_client_stream.rs#L325C26-L325C43 and probably here: https://github.com/hickory-dns/hickory-dns/blob/main/crates/resolver/src/name_server/connection_provider.rs#L355C22-L355C22

you can see the timeout example in the standard TCP impl here: https://github.com/hickory-dns/hickory-dns/blob/main/crates/resolver/src/name_server/connection_provider.rs#L303

If we want to implement timeouts on individual operations, I think that will take more effort, and I'd recommend that we rewrite the H2 client stream into modern async/await syntax before starting to do that. Also, it looks like we need timeouts on h2, h3, and quic. I think this logic is baked into the tcp and tls impls in the same way with the DnsMultiplexer implementation, but since h2, h3, and quic have multiplexing baked in, that interface wasn't necessary for them which is probably why timeouts aren't as robust there right now. That's also probably worth reviewing to see if it offers any good methods for implementing these timeouts.

<!-- gh-comment-id:1799299332 --> @bluejekyll commented on GitHub (Nov 7, 2023): I'm not sure that anyone has stepped forward to offer a patch for this at this point. I was just looking through the code, and while it would be easy to add the TCP parameters here: https://github.com/hickory-dns/hickory-dns/blob/main/crates/proto/src/h2/h2_client_stream.rs#L325C26-L325C43 and probably here: https://github.com/hickory-dns/hickory-dns/blob/main/crates/resolver/src/name_server/connection_provider.rs#L355C22-L355C22 you can see the timeout example in the standard TCP impl here: https://github.com/hickory-dns/hickory-dns/blob/main/crates/resolver/src/name_server/connection_provider.rs#L303 If we want to implement timeouts on individual operations, I think that will take more effort, and I'd recommend that we rewrite the H2 client stream into modern async/await syntax before starting to do that. Also, it looks like we need timeouts on h2, h3, and quic. I think this logic is baked into the tcp and tls impls in the same way with the `DnsMultiplexer` implementation, but since h2, h3, and quic have multiplexing baked in, that interface wasn't necessary for them which is probably why timeouts aren't as robust there right now. That's also probably worth reviewing to see if it offers any good methods for implementing these timeouts.
Author
Owner

@marcbrevoort-cyberhive commented on GitHub (Nov 10, 2023):

Can I ask how the timeout is actually enforced - is it by blocks such as these?

(udp_client_stream.rs)

        S::Time::timeout::<Pin<Box<dyn Future<Output = Result<DnsResponse, ProtoError>> + Send>>>(
            self.timeout,
            Box::pin(async move {
                let socket: S = NextRandomUdpSocket::new_with_closure(&addr, creator).await?;
                send_serial_message_inner(message, message_id, verifier, socket, recv_buf_size)
                    .await
            }),
        )
        .into()

(tcp_stream.rs)

    async fn connect_with_future<F: Future<Output = Result<S, io::Error>> + Send + 'static>(
        future: F,
        name_server: SocketAddr,
        timeout: Duration,
        outbound_messages: StreamReceiver,
    ) -> Result<Self, io::Error> {
        S::Time::timeout(timeout, future)
            .map(move |tcp_stream: Result<Result<S, io::Error>, _>| {
                tcp_stream
                    .and_then(|tcp_stream| tcp_stream)
                    .map(|tcp_stream| {
                        debug!("TCP connection established to: {}", name_server);
                        Self {
                            socket: tcp_stream,
                            outbound_messages,
                            send_state: None,
                            read_state: ReadTcpState::LenBytes {
                                pos: 0,
                                bytes: [0u8; 2],
                            },
                            peer_addr: name_server,
                        }
                    })
            })
            .await
    }
<!-- gh-comment-id:1806029357 --> @marcbrevoort-cyberhive commented on GitHub (Nov 10, 2023): Can I ask how the timeout is actually enforced - is it by blocks such as these? (udp_client_stream.rs) ``` S::Time::timeout::<Pin<Box<dyn Future<Output = Result<DnsResponse, ProtoError>> + Send>>>( self.timeout, Box::pin(async move { let socket: S = NextRandomUdpSocket::new_with_closure(&addr, creator).await?; send_serial_message_inner(message, message_id, verifier, socket, recv_buf_size) .await }), ) .into() ``` (tcp_stream.rs) ``` async fn connect_with_future<F: Future<Output = Result<S, io::Error>> + Send + 'static>( future: F, name_server: SocketAddr, timeout: Duration, outbound_messages: StreamReceiver, ) -> Result<Self, io::Error> { S::Time::timeout(timeout, future) .map(move |tcp_stream: Result<Result<S, io::Error>, _>| { tcp_stream .and_then(|tcp_stream| tcp_stream) .map(|tcp_stream| { debug!("TCP connection established to: {}", name_server); Self { socket: tcp_stream, outbound_messages, send_state: None, read_state: ReadTcpState::LenBytes { pos: 0, bytes: [0u8; 2], }, peer_addr: name_server, } }) }) .await } ```
Author
Owner

@bluejekyll commented on GitHub (Nov 12, 2023):

Yes, I think in many cases that is the case. We rely on the socket being dropped to cleanup and close all the connections. I'd have to look at all the implementations to remember how and where we did all of that. A bunch of that is older code that I haven't looked at in a while.

<!-- gh-comment-id:1807275602 --> @bluejekyll commented on GitHub (Nov 12, 2023): Yes, I think in many cases that is the case. We rely on the socket being dropped to cleanup and close all the connections. I'd have to look at all the implementations to remember how and where we did all of that. A bunch of that is older code that I haven't looked at in a while.
Author
Owner

@marcbrevoort-cyberhive commented on GitHub (Nov 14, 2023):

I've ultimately resorted to wrapping the DNS call in a function that shuts down the call if it doesn't return in a timely fashion, broadly following this example but also allowing the wrapper to store the result in an Arc<Mutex<Option<Result<DnsResponse, ClientError>>>>. On timeout, there is no result, and this None is handled by raising a timeout exception. Otherwise the Result<DnsResponse, ClientError> is returned.

The benefit is that it doesn't touch Hickory, and when the update lands, in principle it should just keep working - so both fixing the lack of timeout in Hickory and removing the wrapper once Hickory is fixed can happen in slow time.

While this workaround works, it doesn't actually solve the Sync client hang, so I'll leave this issue open.

<!-- gh-comment-id:1810035446 --> @marcbrevoort-cyberhive commented on GitHub (Nov 14, 2023): I've ultimately resorted to wrapping the DNS call in a function that shuts down the call if it doesn't return in a timely fashion, broadly following [this example](https://docs.rs/tokio/latest/tokio/runtime/struct.Runtime.html#method.shutdown_timeout) but also allowing the wrapper to store the result in an `Arc<Mutex<Option<Result<DnsResponse, ClientError>>>>`. On timeout, there is no result, and this None is handled by raising a timeout exception. Otherwise the Result<DnsResponse, ClientError> is returned. The benefit is that it doesn't touch Hickory, and when the update lands, in principle it should just keep working - so both fixing the lack of timeout in Hickory and removing the wrapper once Hickory is fixed can happen in slow time. While this workaround works, it doesn't actually solve the Sync client hang, so I'll leave this issue open.
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
starred/hickory-dns#863
No description provided.