mirror of
https://github.com/hickory-dns/hickory-dns.git
synced 2026-04-25 11:15:54 +03:00
[GH-ISSUE #2230] What is the reason for NextRandomUdpSocket? #933
Originally created by @Luap99 on GitHub (Jun 6, 2024).
Original GitHub issue: https://github.com/hickory-dns/hickory-dns/issues/2230
We got an issue[1] where a user has basically all UDP ports in the range bound, thus causing NextRandomUdpSocket to fail to bind a free port. I see that the default range was changed in github.com/hickory-dns/hickory-dns@8f05d14eed, so I guess it should be very unlikely to hit this, but I wonder what the reason for doing this in the first place is. Why does it not just pass port 0 to the kernel and let the kernel assign a free port? This seems more robust and it doesn't require a lot of retries to find a free port. I tried to look in the git history, but it seems this logic was added very early on without an explanation of why this approach was chosen.
Also, on Linux at least, there is the net.ipv4.ip_local_port_range sysctl to specify the ephemeral port range, but this logic seems to ignore that as well.
I guess we can implement our own SocketType that does that so it shouldn't be a big problem but I was interested in knowing why it is like this before I do so.
[1] https://github.com/containers/aardvark-dns/issues/473
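For reference, the net.ipv4.ip_local_port_range sysctl mentioned above is exposed on Linux as a file under /proc, so a custom socket type could read it. The following is only a minimal sketch: the function names `parse_port_range` and `local_port_range` are made up for illustration and are not part of hickory-dns.

```rust
use std::fs;
use std::io;

/// Parse the sysctl contents: two whitespace-separated port numbers,
/// e.g. "32768\t60999\n". Returns None if the format is unexpected.
/// (Hypothetical helper, not a hickory-dns API.)
fn parse_port_range(contents: &str) -> Option<(u16, u16)> {
    let mut parts = contents.split_whitespace();
    let low: u16 = parts.next()?.parse().ok()?;
    let high: u16 = parts.next()?.parse().ok()?;
    if low <= high {
        Some((low, high))
    } else {
        None
    }
}

/// Read the ephemeral port range configured on a Linux host.
/// This fails with an I/O error on systems without this /proc entry.
fn local_port_range() -> io::Result<(u16, u16)> {
    let contents = fs::read_to_string("/proc/sys/net/ipv4/ip_local_port_range")?;
    parse_port_range(&contents).ok_or_else(|| {
        io::Error::new(
            io::ErrorKind::InvalidData,
            "unexpected ip_local_port_range format",
        )
    })
}
```

Keeping the parsing separate from the file read makes the format handling easy to check on any platform.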
@bluejekyll commented on GitHub (Jun 6, 2024):
This is designed to mitigate DNS cache poisoning attacks, you can read more here: http://www.unixwiz.net/techtips/iguide-kaminsky-dns-vuln.html
Randomizing the ports used adds more entropy to the randomness of the message ID, making response spoofing over UDP harder for attackers.
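As a rough worked example of that entropy argument: the DNS message ID is 16 bits, so an off-path attacker who only has to guess the ID has a 1 in 2^16 chance per forged response. If the source port is also drawn uniformly from N candidate ports, the attacker has to match both. The value N = 16384 below is an assumption for illustration (the size of the IANA dynamic range 49152-65535), not necessarily the range hickory-dns uses.

```math
P_{\text{ID only}} = \frac{1}{2^{16}}, \qquad
P_{\text{ID + port}} = \frac{1}{2^{16} \cdot N} = \frac{1}{2^{16} \cdot 2^{14}} = \frac{1}{2^{30}} \quad (N = 16384)
```

The same formula shows why a host with only a handful of free ports gains little from the randomization: with small N the second factor barely changes the odds.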
@Luap99 commented on GitHub (Jun 6, 2024):
Thanks, yes that sounds good. Would it make sense, instead of aborting after the 10 tries, to do the bind with port 0? That would increase the risk a bit I guess, but if a user only has a few free UDP ports it should at least work, and the kernel will give us one of those free ports. And if there are only a few ports free, the other side could try to predict them regardless, I think.
I don't know what causes all the retries exactly, but I see thousands of bind syscalls until it finally manages to pick a free one on a system with only a few free ports, so that isn't very practical in such situations.
@bluejekyll commented on GitHub (Jun 6, 2024):
The only reason we don't use port 0 is because I don't believe any OS makes guarantees about randomness in the port issuance.
That said, if it's not a risk you are concerned with, maybe we could just have an option that is passed in to disable random port selection and only require a new port on each use?
@Luap99 commented on GitHub (Jun 6, 2024):
Well, to be fair, I am not sure, but I guess we would still like to have randomness.
There is a balance to be made here, and I think maybe trying a random port 10 times and then using 0 is good enough in practice?
@djc commented on GitHub (Jun 7, 2024):
Seems reasonable to try 0 after trying a few random ports -- chances seem good that remaining unused ports are sparsely scattered across the space. Can we reset to trying at random after some time?
@Luap99 commented on GitHub (Jun 7, 2024):
I was thinking about doing the bind 0 after the 10 tries always so it doesn't need to store any state to keep track of it. In the worst case it wastes 10 bind syscalls but the overhead of that seems negligible compared to any network latency for the actual requests.
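The stateless approach described above (always try 10 random ports, then fall back to port 0) could look roughly like the sketch below. This is an illustration using only the standard library, not the code from hickory-dns or the eventual PR: `bind_random_then_zero`, `random_port`, and `pseudo_random_u64` are hypothetical names, and the random source is a non-cryptographic stand-in that real code should replace with a proper CSPRNG.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};
use std::io;
use std::net::{IpAddr, SocketAddr, UdpSocket};
use std::ops::RangeInclusive;

/// Number of random-port attempts before falling back to port 0
/// (the thread suggests 10).
const ATTEMPTS: usize = 10;

/// Stand-in random source so the sketch needs no external crates.
/// NOT cryptographically secure; an assumption made for illustration only.
fn pseudo_random_u64() -> u64 {
    RandomState::new().build_hasher().finish()
}

/// Pick a port inside the inclusive range (assumes start <= end;
/// the slight modulo bias is ignored in this sketch).
fn random_port(range: &RangeInclusive<u16>) -> u16 {
    let span = u64::from(*range.end() - *range.start()) + 1;
    *range.start() + (pseudo_random_u64() % span) as u16
}

/// Try ATTEMPTS random ports; if all are in use, bind port 0 and let the
/// kernel pick any free port. No state is kept between calls.
fn bind_random_then_zero(ip: IpAddr, range: RangeInclusive<u16>) -> io::Result<UdpSocket> {
    for _ in 0..ATTEMPTS {
        let addr = SocketAddr::new(ip, random_port(&range));
        match UdpSocket::bind(addr) {
            Ok(socket) => return Ok(socket),
            // Port already taken: try another random one.
            Err(e) if e.kind() == io::ErrorKind::AddrInUse => continue,
            // Any other error (e.g. permissions) is reported to the caller.
            Err(e) => return Err(e),
        }
    }
    // Fallback: port 0 asks the kernel to assign a free port.
    UdpSocket::bind(SocketAddr::new(ip, 0))
}
```

In the worst case this costs 10 failed bind calls before the fallback, which matches the overhead estimate in the comment above.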
@Luap99 commented on GitHub (Jun 26, 2024):
I created a PR https://github.com/hickory-dns/hickory-dns/pull/2260, let me know if you think this makes sense.