mirror of
https://github.com/hickory-dns/hickory-dns.git
synced 2026-04-25 03:05:51 +03:00
[GH-ISSUE #2050] Sync client hang #863
Labels
No labels
blocked
breaking-change
bug
bug:critical
bug:tests
cleanup
compliance
compliance
compliance
crate:all
crate:client
crate:native-tls
crate:proto
crate:recursor
crate:resolver
crate:resolver
crate:rustls
crate:server
crate:util
dependencies
docs
duplicate
easy
easy
enhance
enhance
enhance
feature:dns-over-https
feature:dns-over-quic
feature:dns-over-tls
feature:dnsssec
feature:global_lb
feature:mdns
feature:tsig
features:edns
has workaround
ops
perf
platform:WASM
platform:android
platform:fuchsia
platform:linux
platform:macos
platform:windows
pull-request
question
test
tools
tools
trust
unclear
wontfix
No milestone
No project
No assignees
1 participant
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
starred/hickory-dns#863
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Originally created by @marcbrevoort-cyberhive on GitHub (Oct 6, 2023).
Original GitHub issue: https://github.com/hickory-dns/hickory-dns/issues/2050
Describe the bug
Using the sync client, the query
may hang and take forever to complete.
To Reproduce
Left a process running overnight.
Expected behavior
If response not received within a reasonable time frame, the function
could just return an error or an empty result. Alternatively, since the sync function just seems a wrapper for the async function, a parallel to the async library call
with_timeoutcould be provided.System:
Version:
trust-dns-client = { version = "0.22.0", default-features = false, features = ["dns-over-https", "dns-over-https-rustls"]}
Additional context
librustls reports
[06:39:41.767] (9a90) DEBUG No cached session for DnsName(DnsName(DnsName("dns.google")))- this nested DnsName seems odd.Update: We're using the client to do DNS-over-HTTPS queries, so this issue relates to
trust_dns_client::https::HttpsClientConnectionwhich does not appear to have awith_timeoutmethod.@bluejekyll commented on GitHub (Oct 6, 2023):
Maybe we need some better defaults here, but could you try setting a timeout with this method when you construct your connection for the Client? https://docs.rs/trust-dns-client/0.23.0/trust_dns_client/udp/struct.UdpClientConnection.html#method.with_timeout
@marcbrevoort-cyberhive commented on GitHub (Oct 12, 2023):
Actually the UdpClientConnection will not do - while I can make it do lookups with UdpClientConnection, WireShark shows it is not secured. I have looked for a timeout function in HttpsClientConnection but cannot find it. Is there a way I'm unaware of that allows me to do DNS over HTTPS and also set the maximum timeout to something non-infinite? Rust reports that
with_timeoutis not a method of HttpsClientConnection.My current working (0.22) prototype code for a DNS lookup looks something like this -
Note: This also works as 0.23, but requires these settings to be changed in
Cargo.tomlto avoid compile errors:@marcbrevoort-cyberhive commented on GitHub (Oct 17, 2023):
Have upgraded to Hickory (0.24) now. I'm still at a loss about how to set a timeout for HttpsClientConnection.
The documentation states
but I don't understand in what way this is intended (an example might be useful there). Any insights please?
@bluejekyll commented on GitHub (Oct 18, 2023):
Hm, I was just taking a look at the http interface, and you're right, there's not timeout option to pass through on the configuration. This seems like a missing option. I had thought we had an interface like the UDP variant above, but it appears not. We should definitely add one.
@marcbrevoort-cyberhive commented on GitHub (Nov 6, 2023):
May I ask if there's a known timeline for this yet?
@bluejekyll commented on GitHub (Nov 7, 2023):
I'm not sure that anyone has stepped forward to offer a patch for this at this point. I was just looking through the code, and while it would be easy to add the TCP parameters here: https://github.com/hickory-dns/hickory-dns/blob/main/crates/proto/src/h2/h2_client_stream.rs#L325C26-L325C43 and probably here: https://github.com/hickory-dns/hickory-dns/blob/main/crates/resolver/src/name_server/connection_provider.rs#L355C22-L355C22
you can see the timeout example in the standard TCP impl here: https://github.com/hickory-dns/hickory-dns/blob/main/crates/resolver/src/name_server/connection_provider.rs#L303
If we want to implement timeouts on individual operations, I think that will take more effort, and I'd recommend that we rewrite the H2 client stream into modern async/await syntax before starting to do that. Also, it looks like we need timeouts on h2, h3, and quic. I think this logic is baked into the tcp and tls impls in the same way with the
DnsMultiplexerimplementation, but since h2, h3, and quic have multiplexing baked in, that interface wasn't necessary for them which is probably why timeouts aren't as robust there right now. That's also probably worth reviewing to see if it offers any good methods for implementing these timeouts.@marcbrevoort-cyberhive commented on GitHub (Nov 10, 2023):
Can I ask how the timeout is actually enforced - is it by blocks such as these?
(udp_client_stream.rs)
(tcp_stream.rs)
@bluejekyll commented on GitHub (Nov 12, 2023):
Yes, I think in many cases that is the case. We rely on the socket being dropped to cleanup and close all the connections. I'd have to look at all the implementations to remember how and where we did all of that. A bunch of that is older code that I haven't looked at in a while.
@marcbrevoort-cyberhive commented on GitHub (Nov 14, 2023):
I've ultimately resorted to wrapping the DNS call in a function that shuts down the call if it doesn't return in a timely fashion, broadly following this example but also allowing the wrapper to store the result in an
Arc<Mutex<Option<Result<DnsResponse, ClientError>>>>. On timeout, there is no result, and this None is handled by raising a timeout exception. Otherwise the Result<DnsResponse, ClientError> is returned.The benefit is that it doesn't touch Hickory, and when the update lands, in principle it should just keep working - so both fixing the lack of timeout in Hickory and removing the wrapper once Hickory is fixed can happen in slow time.
While this workaround works, it doesn't actually solve the Sync client hang, so I'll leave this issue open.