mirror of
https://github.com/hickory-dns/hickory-dns.git
synced 2026-04-25 11:15:54 +03:00
[GH-ISSUE #633] Possible resource leak, UDP sockets #257
Labels
No labels
blocked
breaking-change
bug
bug:critical
bug:tests
cleanup
compliance
compliance
compliance
crate:all
crate:client
crate:native-tls
crate:proto
crate:recursor
crate:resolver
crate:resolver
crate:rustls
crate:server
crate:util
dependencies
docs
duplicate
easy
easy
enhance
enhance
enhance
feature:dns-over-https
feature:dns-over-quic
feature:dns-over-tls
feature:dnsssec
feature:global_lb
feature:mdns
feature:tsig
features:edns
has workaround
ops
perf
platform:WASM
platform:android
platform:fuchsia
platform:linux
platform:macos
platform:windows
pull-request
question
test
tools
tools
trust
unclear
wontfix
No milestone
No project
No assignees
1 participant
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
starred/hickory-dns#257
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Originally created by @bluejekyll on GitHub (Dec 8, 2018).
Original GitHub issue: https://github.com/hickory-dns/hickory-dns/issues/633
See https://github.com/actix/actix-web/issues/616
@bluejekyll commented on GitHub (Dec 8, 2018):
Currently no steps to reproduce, version 0.10.
@bluejekyll commented on GitHub (Dec 13, 2018):
local testing has not made any issues apparent.
to try and replicate, I've disabled caching in the global_resolver example, and then added a loop around it's resolution tests. Then monitored UDP socket creation.
initial test: no changes, using google for lookups, no issues. Issued those in a continuous manner, no UDP leaks were apparent. UDP socket creation is bounded to the number of active requests as expected.
next theory: a badly configured DNS resolver ip (i.e. non-existant), this also doesn't experience any resource leak.
next, perform lookup for non-existent name: no resource leak.
future test that will take a bit to setup: lookup name with no response...
@jkilpatr commented on GitHub (Dec 13, 2018):
I'll note that in my usecase requests that do not get a response may be an order of magnitude more likely than normal. These devices are out in the field backed by wireless antennas and packet loss or total connection loss is common.
I'm trying to reproduce with minimum code. A loop + some iptables packet drops but it seems to me that when spawning thousands of unresolved futures you run out of memory before you run out of sockets.
@bluejekyll commented on GitHub (Dec 13, 2018):
that would be very helpful, if you can get that working. There were some changes in connection management in the 0.10 release, I'm going to take some time and review the differences.
@jkilpatr commented on GitHub (Dec 13, 2018):
http://updates.altheamesh.com/trust-dns-file-descriptor-exhaustion.log.gz
@bluejekyll here is the log file from the all up reproduction, it's A) very hefty (304mb uncompressed) and B) contains a lot of irrelevant junk as running a large program in trace for hours is wont to do.
I couldn't get anything useful out of it. The out of file descriptors stack trace hit a shell out to ip route not the creation of a new future. But you can see how tokio_reactor has a thousand connections open. Maybe you can glean something useful.
@bluejekyll commented on GitHub (Dec 13, 2018):
thanks. I can't promise a quick turn around on this.
@jkilpatr commented on GitHub (Dec 13, 2018):
@bluejekyll no problem I think we'll move to synchronous dns resolution across the board for now. Since downgrading seems infeasible.
@bluejekyll commented on GitHub (Dec 13, 2018):
I'm having trouble reproducing the root cause of this issue, but I believe it's related to the way the multiplexing connections were reimplimented in the 0.10 (to support DoH). It appears that in the case of Timeouts we will not close down the stream. I'll work on a patch today.
@jkilpatr commented on GitHub (Dec 14, 2018):
Adding a 1 second timeout to all of my requests which involve DNS has survived an overnight stress test with no problems at all.
@bluejekyll commented on GitHub (Dec 14, 2018):
Life got in the way. I’m about 50% of the way through a patch, but it won’t be coming before tomorrow.
@bluejekyll commented on GitHub (Dec 16, 2018):
Ok, #635 has all tests passing against the Resolver. It's safe to test. I have some work left to get this working across all former uses of the UdpClientStream, such as the Client.
@bluejekyll commented on GitHub (Dec 17, 2018):
FYI, @jkilpatr #635 is ready to go. Not sure if you want to test that PR before I merge it in.
@bluejekyll commented on GitHub (Dec 17, 2018):
Resolver 0.10.1 has been published.
chronodependency with libstd dependency in -client #422