[GH-ISSUE #1819] cloudflare_tls failed, both dns-over-native-tls and dns-over-rustls tls handshake stuck #780

Closed
opened 2026-03-16 00:12:48 +03:00 by kerem · 11 comments
Owner

Originally created by @mokeyish on GitHub (Nov 2, 2022).
Original GitHub issue: https://github.com/hickory-dns/hickory-dns/issues/1819

Describe the bug

cloudflare_tls fails: with both dns-over-native-tls and dns-over-rustls, the TLS handshake gets stuck. cloudflare_tls fails while cloudflare_https succeeds.

~~It seems the internals ignore the IP address defined in NameServerConfig and query the IP address of cloudflare-dns.com again.~~

To Reproduce
Steps to reproduce the behavior:


DOT
image

  1. Query the IP address with dig.
  2. Test with openssl: it gets stuck.

DOH

image
  1. Query the IP address with dig.
  2. Test with openssl: it succeeds.
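
The two manual steps separate "the port is reachable" from "the TLS handshake completes". The first half can be sketched with the standard library alone. This is a minimal sketch, not part of the original report: `tcp_reachable` is a hypothetical helper name, and a local listener stands in for 1.1.1.1:853 so that the example runs offline.

```rust
use std::io;
use std::net::{SocketAddr, TcpListener, TcpStream};
use std::time::Duration;

/// Hypothetical helper: returns Ok(()) if a TCP connection to `addr`
/// is established within `timeout`. It does not perform a TLS handshake.
fn tcp_reachable(addr: SocketAddr, timeout: Duration) -> io::Result<()> {
    let stream = TcpStream::connect_timeout(&addr, timeout)?;
    // Bound later reads so that a silent peer shows up as a timeout, not a hang.
    stream.set_read_timeout(Some(timeout))?;
    Ok(())
}

fn main() -> io::Result<()> {
    // Assumption: a local listener replaces the real resolver address.
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;
    tcp_reachable(addr, Duration::from_secs(1))?;
    println!("TCP connect to {} succeeded", addr);
    Ok(())
}
```

If the TCP connect succeeds but the handshake still stalls, the problem lies above the transport layer, which matches what the openssl test shows for port 853.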

Expected behavior
The DNS records are queried successfully.

System:

  • OS: MacOS/Linux
  • Architecture: x86_64
  • Version --
  • rustc version: rustc 1.66.0

Version:
Crate: resolver
Version: 0.22.

Additional context

BTW: If I set a breakpoint on resolver.lookup_ip(name).await and then run under the debugger, the records are queried successfully.

The reproduction code:

use std::{net::ToSocketAddrs, str::FromStr};

use tokio::runtime;
use trust_dns_resolver::{
    config::{NameServerConfig, NameServerConfigGroup, Protocol, ResolverConfig, ResolverOpts},
    Name, TokioAsyncResolver,
};

fn main() {
    println!("start");
    let cfg: NameServerConfigGroup = vec![NameServerConfig {
        socket_addr: "1.1.1.1:853".to_socket_addrs().unwrap().next().unwrap(),
        protocol: Protocol::Tls,
        tls_dns_name: Some("cloudflare-dns.com".to_string()),
        trust_nx_responses: true,
        tls_config: None,
        bind_addr: None,
    }]
    .into();

    let resolver = TokioAsyncResolver::tokio(
        ResolverConfig::from_parts(None, vec![], cfg),
        ResolverOpts::default(),
    )
    .unwrap();

    runtime::Builder::new_multi_thread()
        .enable_all()
        .build()
        .unwrap()
        .block_on(async {
            // set breakpoint the line below. 👇👇👇
            let res = resolver
                .lookup_ip(Name::from_str("dns.google").unwrap())
                .await
                .unwrap();

            println!("{:?}", res);
        });
}

The output when a breakpoint is set and the program is run under the debugger:

start
TRACE:trust_dns_resolver::async_resolver:238:handle passed back
DEBUG:trust_dns_proto::xfer::dns_handle:67:querying: dns.google A
DEBUG:trust_dns_resolver::name_server::name_server_pool:257:sending request: [Query { name: Name("dns.google"), query_type: A, query_class: IN }]
DEBUG:trust_dns_resolver::name_server::name_server_pool:266:error from UDP, retrying over TCP: No connections available
DEBUG:trust_dns_resolver::name_server::name_server:115:reconnecting: NameServerConfig { socket_addr: 1.1.1.1:853, protocol: Tls, tls_dns_name: Some("cloudflare-dns.com"), trust_nx_responses: true, bind_addr: None }
TRACE:log:registering event source with poller: token=Token(0), interests=READABLE | WRITABLE    
DEBUG:trust_dns_proto::xfer:171:enqueueing message:QUERY:[Query { name: Name("dns.google"), query_type: A, query_class: IN }]
DEBUG:trust_dns_proto::xfer::dns_multiplexer:317:sending message id: 1265
DEBUG:trust_dns_proto::xfer::dns_multiplexer:320:final message: ; header 1265:QUERY:RD:NoError:QUERY:0/0/0
; query
;; dns.google. IN A

DEBUG:trust_dns_proto::tcp::tcp_stream:336:sending message len: 28 to: 1.1.1.1:853
DEBUG:trust_dns_proto::tcp::tcp_stream:383:in ReadTcpState::LenBytes: 0
DEBUG:trust_dns_proto::tcp::tcp_stream:391:got length: 60
DEBUG:trust_dns_proto::tcp::tcp_stream:395:move ReadTcpState::Bytes: 60
DEBUG:trust_dns_proto::tcp::tcp_stream:416:in ReadTcpState::Bytes: 60
DEBUG:trust_dns_proto::tcp::tcp_stream:423:reset ReadTcpState::LenBytes: 0
DEBUG:trust_dns_proto::tcp::tcp_stream:436:returning bytes
DEBUG:trust_dns_proto::tcp::tcp_stream:445:returning buffer
TRACE:trust_dns_proto::rr::record_data:717:reading A
TRACE:trust_dns_proto::rr::record_data:717:reading A
DEBUG:trust_dns_resolver::error:148:Response:; header 1265:RESPONSE:RD,RA:NoError:QUERY:2/0/0
; query
;; dns.google. IN A
; answers 2
dns.google. 617 IN A 8.8.8.8
dns.google. 617 IN A 8.8.4.4
; nameservers 0
; additionals 0

DEBUG:trust_dns_resolver::error:148:Response:; header 1265:RESPONSE:RD,RA:NoError:QUERY:2/0/0
; query
;; dns.google. IN A
; answers 2
dns.google. 617 IN A 8.8.8.8
dns.google. 617 IN A 8.8.4.4
; nameservers 0
; additionals 0

LookupIp(Lookup { query: Query { name: Name("dns.google"), query_type: A, query_class: IN }, records: [Record { name_labels: Name("dns.google."), rr_type: A, dns_class: IN, ttl: 617, rdata: Some(A(8.8.8.8)) }, Record { name_labels: Name("dns.google."), rr_type: A, dns_class: IN, ttl: 617, rdata: Some(A(8.8.4.4)) }], valid_until: Instant { tv_sec: 681087, tv_nsec: 337162713 } })
TRACE:log:deregistering event source from poller   
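
The `sending message len: 28` and `ReadTcpState::LenBytes` lines in the log above reflect the DNS-over-TCP framing: each message is preceded by a two-byte big-endian length. The sketch below rebuilds the `dns.google A` query and confirms the 28-byte size. The function names are hypothetical and this is not the trust-dns encoder; it assumes a single question, no EDNS record, and labels shorter than 64 bytes.

```rust
/// Hypothetical encoder: one A/IN question with the RD flag set.
fn encode_a_query(id: u16, name: &str) -> Vec<u8> {
    let mut msg = Vec::new();
    msg.extend_from_slice(&id.to_be_bytes()); // message id
    msg.extend_from_slice(&[0x01, 0x00]); // flags: recursion desired
    msg.extend_from_slice(&1u16.to_be_bytes()); // QDCOUNT = 1
    msg.extend_from_slice(&[0; 6]); // ANCOUNT, NSCOUNT, ARCOUNT = 0
    for label in name.trim_end_matches('.').split('.') {
        msg.push(label.len() as u8); // assumes label.len() < 64
        msg.extend_from_slice(label.as_bytes());
    }
    msg.push(0); // root label terminates the name
    msg.extend_from_slice(&1u16.to_be_bytes()); // QTYPE = A
    msg.extend_from_slice(&1u16.to_be_bytes()); // QCLASS = IN
    msg
}

/// Prefix a message with its two-byte big-endian length.
fn frame_for_tcp(msg: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(2 + msg.len());
    out.extend_from_slice(&(msg.len() as u16).to_be_bytes());
    out.extend_from_slice(msg);
    out
}

/// Split one framed message off the front of `buf`, if it is complete.
fn split_frame(buf: &[u8]) -> Option<(&[u8], &[u8])> {
    let len = u16::from_be_bytes([*buf.first()?, *buf.get(1)?]) as usize;
    let body = buf.get(2..2 + len)?;
    Some((body, &buf[2 + len..]))
}

fn main() {
    let query = encode_a_query(1265, "dns.google");
    assert_eq!(query.len(), 28); // matches "sending message len: 28"
    let framed = frame_for_tcp(&query);
    let (body, rest) = split_frame(&framed).unwrap();
    assert_eq!(body, &query[..]);
    assert!(rest.is_empty());
    println!("query: {} bytes, framed: {} bytes", query.len(), framed.len());
}
```

The 28 bytes are the 12-byte header, the 12-byte encoded name, and 2 bytes each for type and class.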

The output when run directly:


start
TRACE:trust_dns_resolver::async_resolver:238:handle passed back
DEBUG:trust_dns_proto::xfer::dns_handle:67:querying: dns.google A
DEBUG:trust_dns_resolver::name_server::name_server_pool:257:sending request: [Query { name: Name("dns.google"), query_type: A, query_class: IN }]
DEBUG:trust_dns_resolver::name_server::name_server_pool:266:error from UDP, retrying over TCP: No connections available
DEBUG:trust_dns_resolver::name_server::name_server:115:reconnecting: NameServerConfig { socket_addr: 1.1.1.1:853, protocol: Tls, tls_dns_name: Some("cloudflare-dns.com"), trust_nx_responses: true, bind_addr: None }
TRACE:log:registering event source with poller: token=Token(0), interests=READABLE | WRITABLE    

--- stuck  here ---

kerem closed this issue 2026-03-16 00:12:54 +03:00

@mokeyish commented on GitHub (Nov 2, 2022):

This line gets stuck:
https://github.com/bluejekyll/trust-dns/blob/fe70e51f5d4e50e7e59f7c1642c9b68571c012d2/crates/resolver/src/name_server/connection_provider.rs#L326

and then fails with:

DEBUG:trust_dns_proto::xfer::dns_exchange:327:stream errored while connecting: ProtoError { kind: Io(Custom { kind: ConnectionRefused, error: "tls error: tls handshake eof" }) }
Err(ResolveError { kind: Proto(ProtoError { kind: Io(Kind(ConnectionRefused)) }) })

Is there any way to fix it? If I set a breakpoint on that line, it can retry and eventually gets the DNS records.
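
The observation that a retry eventually succeeds can be sketched as a generic retry loop. This is only an illustration under stated assumptions: `retry_with_delay` is a hypothetical helper, not a trust-dns API, and the failing operation is simulated with a counter instead of a real TLS connection.

```rust
use std::thread;
use std::time::Duration;

/// Hypothetical helper: run `op` at least once and at most `attempts` times,
/// sleeping `delay` between failed attempts. Returns the last result.
fn retry_with_delay<T, E>(
    attempts: u32,
    delay: Duration,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut last = op();
    for _ in 1..attempts {
        if last.is_ok() {
            break;
        }
        thread::sleep(delay);
        last = op();
    }
    last
}

fn main() {
    // Simulated operation: fails twice, then succeeds.
    let mut calls = 0;
    let result: Result<&str, &str> = retry_with_delay(3, Duration::from_millis(10), || {
        calls += 1;
        if calls < 3 { Err("tls handshake eof") } else { Ok("answer") }
    });
    assert_eq!(result, Ok("answer"));
    assert_eq!(calls, 3);
    println!("succeeded after {} calls", calls);
}
```

A retry only hides the symptom; it does not explain why the first handshake ends with EOF.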


@mokeyish commented on GitHub (Nov 2, 2022):

image

I solved this problem temporarily by adding a DelayFuture that sleeps for 800 milliseconds.


@djc commented on GitHub (Nov 2, 2022):

I tried running your reproduction code but it returns immediately with a lookup result for me. Please give me better instructions on how to reproduce this issue.


@mokeyish commented on GitHub (Nov 2, 2022):

> I tried running your reproduction code but it returns immediately with a lookup result for me. Please give me better instructions on how to reproduce this issue.

@djc Thanks. Maybe we are in different network environments. If I use google_tls, it also returns immediately. It seems the server response is checked too quickly during the TLS handshake. After I add a delay of 800 ms, it returns lookup results; with a delay shorter than 800 ms, it may fail.


@mokeyish commented on GitHub (Nov 3, 2022):

@djc Hello,

I added some logging to MidHandshake. Do you have any ideas for solving this problem? It seems MidHandshake is never polled a second time unless I add a delay future.

Screenshot of the failing run:

image

Screenshot of the successful run, where the FirstAnswerFuture is wrapped in a new DelayFuture that sleeps for 1 second before polling FirstAnswerFuture:

image

The code of DelayFuture:

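    // Workaround wrapper: sleeps for the given duration, then polls the inner future.
    struct DelayFuture<T: std::future::Future + std::marker::Unpin>(T, std::time::Duration);

    impl<T: std::future::Future + std::marker::Unpin> std::future::Future for DelayFuture<T> {
        type Output = T::Output;

        fn poll(
            mut self: std::pin::Pin<&mut Self>,
            cx: &mut std::task::Context<'_>,
        ) -> std::task::Poll<Self::Output> {
            // Note: this blocks the executor thread on every poll, not just the first.
            std::thread::sleep(self.1);
            // Fully qualified so the snippet compiles without extra `use` lines.
            std::future::Future::poll(std::pin::Pin::new(&mut self.0), cx)
        }
    }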
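
Because the DelayFuture above calls `std::thread::sleep` inside `poll`, it blocks the executor thread for the whole delay. A non-blocking variant returns `Poll::Pending` and arranges a wake-up instead. The sketch below uses only the standard library so that it is self-contained; `Sleep`, `delayed`, `ThreadWaker` and `block_on` are hypothetical names, and inside a tokio runtime `tokio::time::sleep` would normally take the place of `Sleep`.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread;
use std::time::{Duration, Instant};

/// Completes once `deadline` has passed, without blocking the polling thread.
struct Sleep {
    deadline: Instant,
    timer_started: bool,
}

impl Future for Sleep {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        let now = Instant::now();
        if now >= self.deadline {
            return Poll::Ready(());
        }
        if !self.timer_started {
            self.timer_started = true;
            // Simplification: only the first waker is kept, which is enough
            // for the single-future executor below.
            let waker = cx.waker().clone();
            let remaining = self.deadline - now;
            thread::spawn(move || {
                thread::sleep(remaining);
                waker.wake();
            });
        }
        Poll::Pending
    }
}

/// Wait for `delay`, then drive `inner` to completion.
async fn delayed<F: Future>(delay: Duration, inner: F) -> F::Output {
    Sleep { deadline: Instant::now() + delay, timer_started: false }.await;
    inner.await
}

/// Waker that unparks the thread running `block_on`.
struct ThreadWaker(thread::Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Minimal executor for one future, used only to make the sketch runnable.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(value) => return value,
            Poll::Pending => thread::park(), // re-polls after any wake-up
        }
    }
}

fn main() {
    let start = Instant::now();
    let value = block_on(delayed(Duration::from_millis(50), async { 42 }));
    assert_eq!(value, 42);
    assert!(start.elapsed() >= Duration::from_millis(50));
    println!("inner future ran after {:?}", start.elapsed());
}
```

Either variant only delays the first poll; neither addresses why the handshake stalls without the delay.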

@mokeyish commented on GitHub (Nov 4, 2022):

The trust-dns internal tests also get stuck, while a query with dog succeeds.

The dog tool: https://github.com/ogham/dog

image image

@mokeyish commented on GitHub (Nov 4, 2022):

@djc Hello,

Do you have an IPv6 address? I can provide a Docker environment to reproduce it, but the Docker container only has an IPv6 address.

I found that both native-tls and rustls get stuck, so this may be a bug in https://github.com/tokio-rs/tls that shows up in a weak network environment.


@djc commented on GitHub (Nov 4, 2022):

I'm also a tokio-rustls maintainer. If you can isolate the bug to that level, it probably makes sense to file an issue there.


@mokeyish commented on GitHub (Nov 4, 2022):

> I'm also a tokio-rustls maintainer. If you can isolate the bug to that level, it probably makes sense to file an issue there.

The Docker environment doesn't modify anything. Just run the unit test here to reproduce: https://github.com/bluejekyll/trust-dns/blob/ccd875bce4c6bb05e6f5778b7759fd90bb70b7cb/crates/resolver/src/tls/mod.rs#L72


@rohitcoder commented on GitHub (Jan 18, 2023):

Hi guys, I just wanted to check whether anyone has found a solution for this. I'm using the reqwest crate in Rust and trying to use danger_accept_invalid_certs(true) to avoid this TLS issue with Cloudflare. Has anyone found a solution?

 let client = Client::builder().use_rustls_tls().danger_accept_invalid_certs(true).build().unwrap();
image

@mokeyish commented on GitHub (Jan 18, 2023):

@rohitcoder I ended up solving it by disabling SNI, so the server name is no longer sent. The ISP blocks the connection by detecting the plaintext SNI in the first few bytes of the TCP stream.

Here is my code:

https://github.com/mokeyish/smartdns-rs/blob/785a66c387f359852ea5a305a387d0618575792f/src/dns_client.rs#L551
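
To illustrate why a middlebox can match on the server name, the sketch below encodes the TLS `server_name` extension (RFC 6066) and reads the host name back from the raw bytes, as a passive observer could. It is not the code from the link above; `encode_sni_extension` and `extract_sni` are hypothetical helpers, and only the extension is built, not a full ClientHello. In rustls, the `enable_sni` field of `ClientConfig` controls whether this extension is sent.

```rust
/// Encode a server_name extension carrying a single DNS host name.
fn encode_sni_extension(host: &str) -> Vec<u8> {
    let name = host.as_bytes();
    let entry_len = 1 + 2 + name.len(); // name_type + name length + name
    let mut ext = Vec::with_capacity(4 + 2 + entry_len);
    ext.extend_from_slice(&0u16.to_be_bytes()); // extension type 0 = server_name
    ext.extend_from_slice(&((2 + entry_len) as u16).to_be_bytes()); // extension data length
    ext.extend_from_slice(&(entry_len as u16).to_be_bytes()); // server_name_list length
    ext.push(0); // name_type 0 = host_name
    ext.extend_from_slice(&(name.len() as u16).to_be_bytes());
    ext.extend_from_slice(name); // the host name itself, unencrypted
    ext
}

/// Read the host name back out of the extension bytes.
fn extract_sni(ext: &[u8]) -> Option<&str> {
    // Layout: type(2) ext_len(2) list_len(2) name_type(1) name_len(2) name
    if ext.len() < 9 || ext[0] != 0 || ext[1] != 0 || ext[6] != 0 {
        return None;
    }
    let name_len = u16::from_be_bytes([ext[7], ext[8]]) as usize;
    let name = ext.get(9..9 + name_len)?;
    std::str::from_utf8(name).ok()
}

fn main() {
    let ext = encode_sni_extension("cloudflare-dns.com");
    assert_eq!(ext.len(), 27);
    assert_eq!(extract_sni(&ext), Some("cloudflare-dns.com"));
    println!("observer sees: {:?}", extract_sni(&ext));
}
```

With SNI disabled the extension is absent, so there is no host name for such a filter to match in the ClientHello.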

This issue should be closed.
