Mirror of https://github.com/hickory-dns/hickory-dns.git (synced 2026-04-25 11:15:54 +03:00)
[GH-ISSUE #1910] Memory leak when responses timeout #812
Originally created by @hottea773 on GitHub (Mar 20, 2023).
Original GitHub issue: https://github.com/hickory-dns/hickory-dns/issues/1910
Describe the bug

When doing DNS resolution (using the resolver) with responses which time out, trust-dns is leaking memory.

To Reproduce

Steps to reproduce the behaviour: with the following `main.rs`:

and the following `Cargo.toml`:

Run `cargo run` and watch the memory usage over time (0.8% -> 1.3% -> 2.1% -> 5.8%):

Expected behaviour

`trust-dns-resolver` not to leak memory.

System:

rustc 1.66.0-dev (adb13e80e 2022-12-16)

Version:

Crate: resolver
Version: `main` at commit 72f2a07c901743f3a4cc7900cda22b75de60d569. This does not reproduce with 0.22.0.

Additional context

The leak seems to be within the `Resolver`. When we create a new resolver for every query (move the `let resolver ...` line inside the `loop` in the example), we don't seem to see a leak.

@bluejekyll commented on GitHub (Mar 21, 2023):
Looking at the example posted, it looks more like the loop is unbounded and creating infinite requests until it crashes. Is there any indication that they leak if given the time for the requests to clean themselves up?
@djc commented on GitHub (Mar 21, 2023):
Can you run a `git bisect` on this?

@hottea773 commented on GitHub (Mar 21, 2023):
They're synchronous requests, so I wouldn't expect them to be doing any "clean-up" after they're done either way.
Anyway, I've modified my example to a more realistic 50 millisecond timeout and given it time to clean up after itself and the memory remains in use after the loop's finished:
Output:
Top output summary:
@mokeyish commented on GitHub (Mar 26, 2023):
I also encountered this problem. But I don't know if it's the same reason.
Reproduce (Windows 11 only):

Repeat the command above about 10 times, and you can see:

fatal runtime error: stack overflow

@mokeyish commented on GitHub (Mar 26, 2023):
Not the same reason; I fixed it in #1912.
@hottea773 commented on GitHub (Mar 30, 2023):
Ok, so I've done a `git bisect` and found that it very clearly started leaking in github.com/bluejekyll/trust-dns@15423b8610.

I've done a bit of testing, and found that if we simply revert the change in github.com/bluejekyll/trust-dns@15423b8610 (diff-8c08c0c200), it fixes the leak. I'm no expert, but it seems to me that we're doing something along the lines of adding a handle for every request but never joining on them, or otherwise tidying them up, hence leaking memory per request. I'd propose that we fix this by simply reverting that part of the change, but happy to be told otherwise. @jeff-hiner probably has a better understanding.

@jeff-hiner commented on GitHub (Mar 30, 2023):
There's a reason the join sets were added.
For context here: the spawned futures typically hold cloned TCP or DoH sockets. If those futures are spawned in the background without retaining the JoinHandles, then disposing of the parent `ConnectionProvider` leaves those tasks running and their sockets open until they time out.

It looks like in this case nothing is reaping `self.join_set`. The easiest way to fix that is to add reaping within `spawn_bg` while the join set is locked. This fix will retain finished tasks until either the next `spawn_bg` call or the parent `ConnectionProvider` is torn down.

Give me a moment, I'll have a PR up shortly.