[GH-ISSUE #633] Possible resource leak, UDP sockets #257

Closed
opened 2026-03-07 23:04:15 +03:00 by kerem · 13 comments
Owner

Originally created by @bluejekyll on GitHub (Dec 8, 2018).
Original GitHub issue: https://github.com/hickory-dns/hickory-dns/issues/633

See https://github.com/actix/actix-web/issues/616

Originally created by @bluejekyll on GitHub (Dec 8, 2018). Original GitHub issue: https://github.com/hickory-dns/hickory-dns/issues/633 See https://github.com/actix/actix-web/issues/616
kerem 2026-03-07 23:04:15 +03:00
Author
Owner

@bluejekyll commented on GitHub (Dec 8, 2018):

Currently no steps to reproduce, version 0.10.

<!-- gh-comment-id:445486060 --> @bluejekyll commented on GitHub (Dec 8, 2018): Currently no steps to reproduce, version 0.10.
Author
Owner

@bluejekyll commented on GitHub (Dec 13, 2018):

local testing has not made any issues apparent.

to try and replicate, I've disabled caching in the global_resolver example, and then added a loop around it's resolution tests. Then monitored UDP socket creation.

initial test: no changes, using google for lookups, no issues. Issued those in a continuous manner, no UDP leaks were apparent. UDP socket creation is bounded to the number of active requests as expected.

next theory: a badly configured DNS resolver ip (i.e. non-existant), this also doesn't experience any resource leak.

next, perform lookup for non-existent name: no resource leak.

future test that will take a bit to setup: lookup name with no response...

<!-- gh-comment-id:446975113 --> @bluejekyll commented on GitHub (Dec 13, 2018): local testing has not made any issues apparent. to try and replicate, I've disabled caching in the global_resolver example, and then added a loop around it's resolution tests. Then monitored UDP socket creation. initial test: no changes, using google for lookups, no issues. Issued those in a continuous manner, no UDP leaks were apparent. UDP socket creation is bounded to the number of active requests as expected. next theory: a badly configured DNS resolver ip (i.e. non-existant), this also doesn't experience any resource leak. next, perform lookup for non-existent name: no resource leak. future test that will take a bit to setup: lookup name with no response...
Author
Owner

@jkilpatr commented on GitHub (Dec 13, 2018):

I'll note that in my usecase requests that do not get a response may be an order of magnitude more likely than normal. These devices are out in the field backed by wireless antennas and packet loss or total connection loss is common.

I'm trying to reproduce with minimum code. A loop + some iptables packet drops but it seems to me that when spawning thousands of unresolved futures you run out of memory before you run out of sockets.

<!-- gh-comment-id:446976558 --> @jkilpatr commented on GitHub (Dec 13, 2018): I'll note that in my usecase requests that do not get a response may be an order of magnitude more likely than normal. These devices are out in the field backed by wireless antennas and packet loss or total connection loss is common. I'm trying to reproduce with minimum code. A loop + some iptables packet drops but it seems to me that when spawning thousands of unresolved futures you run out of memory before you run out of sockets.
Author
Owner

@bluejekyll commented on GitHub (Dec 13, 2018):

that would be very helpful, if you can get that working. There were some changes in connection management in the 0.10 release, I'm going to take some time and review the differences.

<!-- gh-comment-id:446977902 --> @bluejekyll commented on GitHub (Dec 13, 2018): that would be very helpful, if you can get that working. There were some changes in connection management in the 0.10 release, I'm going to take some time and review the differences.
Author
Owner

@jkilpatr commented on GitHub (Dec 13, 2018):

http://updates.altheamesh.com/trust-dns-file-descriptor-exhaustion.log.gz

@bluejekyll here is the log file from the all up reproduction, it's A) very hefty (304mb uncompressed) and B) contains a lot of irrelevant junk as running a large program in trace for hours is wont to do.

I couldn't get anything useful out of it. The out of file descriptors stack trace hit a shell out to ip route not the creation of a new future. But you can see how tokio_reactor has a thousand connections open. Maybe you can glean something useful.

<!-- gh-comment-id:446981458 --> @jkilpatr commented on GitHub (Dec 13, 2018): http://updates.altheamesh.com/trust-dns-file-descriptor-exhaustion.log.gz @bluejekyll here is the log file from the all up reproduction, it's A) very hefty (304mb uncompressed) and B) contains a lot of irrelevant junk as running a large program in trace for hours is wont to do. I couldn't get anything useful out of it. The out of file descriptors stack trace hit a shell out to ip route not the creation of a new future. But you can see how tokio_reactor has a thousand connections open. Maybe you can glean something useful.
Author
Owner

@bluejekyll commented on GitHub (Dec 13, 2018):

thanks. I can't promise a quick turn around on this.

<!-- gh-comment-id:446982340 --> @bluejekyll commented on GitHub (Dec 13, 2018): thanks. I can't promise a quick turn around on this.
Author
Owner

@jkilpatr commented on GitHub (Dec 13, 2018):

@bluejekyll no problem I think we'll move to synchronous dns resolution across the board for now. Since downgrading seems infeasible.

<!-- gh-comment-id:446982844 --> @jkilpatr commented on GitHub (Dec 13, 2018): @bluejekyll no problem I think we'll move to synchronous dns resolution across the board for now. Since downgrading seems infeasible.
Author
Owner

@bluejekyll commented on GitHub (Dec 13, 2018):

I'm having trouble reproducing the root cause of this issue, but I believe it's related to the way the multiplexing connections were reimplimented in the 0.10 (to support DoH). It appears that in the case of Timeouts we will not close down the stream. I'll work on a patch today.

<!-- gh-comment-id:447130005 --> @bluejekyll commented on GitHub (Dec 13, 2018): I'm having trouble reproducing the root cause of this issue, but I believe it's related to the way the multiplexing connections were reimplimented in the 0.10 (to support DoH). It appears that in the case of Timeouts we will not close down the stream. I'll work on a patch today.
Author
Owner

@jkilpatr commented on GitHub (Dec 14, 2018):

Adding a 1 second timeout to all of my requests which involve DNS has survived an overnight stress test with no problems at all.

<!-- gh-comment-id:447308882 --> @jkilpatr commented on GitHub (Dec 14, 2018): Adding a 1 second timeout to all of my requests which involve DNS has survived an overnight stress test with no problems at all.
Author
Owner

@bluejekyll commented on GitHub (Dec 14, 2018):

Life got in the way. I’m about 50% of the way through a patch, but it won’t be coming before tomorrow.

<!-- gh-comment-id:447388319 --> @bluejekyll commented on GitHub (Dec 14, 2018): Life got in the way. I’m about 50% of the way through a patch, but it won’t be coming before tomorrow.
Author
Owner

@bluejekyll commented on GitHub (Dec 16, 2018):

Ok, #635 has all tests passing against the Resolver. It's safe to test. I have some work left to get this working across all former uses of the UdpClientStream, such as the Client.

<!-- gh-comment-id:447616578 --> @bluejekyll commented on GitHub (Dec 16, 2018): Ok, #635 has all tests passing against the Resolver. It's safe to test. I have some work left to get this working across all former uses of the UdpClientStream, such as the Client.
Author
Owner

@bluejekyll commented on GitHub (Dec 17, 2018):

FYI, @jkilpatr #635 is ready to go. Not sure if you want to test that PR before I merge it in.

<!-- gh-comment-id:447689305 --> @bluejekyll commented on GitHub (Dec 17, 2018): FYI, @jkilpatr #635 is ready to go. Not sure if you want to test that PR before I merge it in.
Author
Owner

@bluejekyll commented on GitHub (Dec 17, 2018):

Resolver 0.10.1 has been published.

<!-- gh-comment-id:447850854 --> @bluejekyll commented on GitHub (Dec 17, 2018): Resolver 0.10.1 has been published.
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
starred/hickory-dns#257
No description provided.