mirror of
https://github.com/hickory-dns/hickory-dns.git
synced 2026-04-25 11:15:54 +03:00
[GH-ISSUE #2140] Resolver couldn't handle a response that has lots of records #900
Originally created by @zonyitoo on GitHub (Feb 8, 2024).
Original GitHub issue: https://github.com/hickory-dns/hickory-dns/issues/2140
Describe the bug
Bug was originally reported in: shadowsocks/shadowsocks-rust#1425 .
Reporter @qwerttvv was using smartdns to aggregate DNS resolution results from multiple DNS servers, which produces a large response: more than 40 IPs in this reproducible case.
Tokio's built-in resolver could handle this case well, but hickory-dns rejected this response with the error `NoRecordsFound`. Here are some related logs from the original issue:
To Reproduce
Steps to reproduce the behavior:
Call `AsyncResolver::lookup_ip`. It returns `Err` with type `NoRecordsFound`.
Expected behavior
hickory-dns should be able to handle these cases.
System:
Version:
Crate: resolver
Version: 0.24
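For context on the failure mode discussed below: when a UDP datagram is larger than the buffer a resolver hands to `recv`, the excess bytes are silently discarded (on Linux), so the truncated message no longer parses. A minimal, self-contained sketch in plain Rust (std only, not hickory-dns code) demonstrating that truncation:

```rust
use std::net::UdpSocket;

/// Send an oversized "response" over loopback and receive it into a
/// 512-byte buffer. Returns (bytes_received, bytes_sent).
fn truncated_recv(payload_len: usize) -> std::io::Result<(usize, usize)> {
    // Bind a "server" and a "client" on loopback with OS-chosen ports.
    let server = UdpSocket::bind("127.0.0.1:0")?;
    let client = UdpSocket::bind("127.0.0.1:0")?;

    // Simulate a large DNS response, well over the classic
    // 512-byte UDP message limit from RFC 1035.
    let big_response = vec![0xABu8; payload_len];
    server.send_to(&big_response, client.local_addr()?)?;

    // A resolver that only allocated 512 bytes sees the datagram
    // silently cut short on Linux; the tail of the message is lost.
    let mut buf = [0u8; 512];
    let (n, _) = client.recv_from(&mut buf)?;
    Ok((n, payload_len))
}

fn main() {
    let (got, sent) = truncated_recv(700).expect("loopback sockets");
    println!("received {got} of {sent} bytes");
}
```

Note the truncation-on-small-buffer behavior is platform-dependent (Windows returns an error instead); the sketch assumes Linux semantics.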
@qwerttvv commented on GitHub (Feb 8, 2024):
I'm sorry my English isn't good. I can read but I'm not good at writing.
Here is my short addition
Please investigate the performance of all kinds of records, not only A records and AAAA records
The example here is the domain s3.amazonaws.com, which has no AAAA
Of course, there are all kinds of other records
@zonyitoo commented on GitHub (Feb 16, 2024):
Just realized that `ResolverOpts::edns0` defaults to `false`.
@t0m commented on GitHub (Feb 17, 2024):
Hi, would this be considered a breaking change between 0.22.0 and 0.23.0? We have some queries that return slightly over 512 bytes using eDNS that work on 0.22.0, but are rejected on 0.23.0. It looks to be related to the refactor in #1975.
If it helps, we're not using hickory-dns directly, but software we depend on (apollo router) is using it.
@djc commented on GitHub (Feb 17, 2024):
I think we usually use "breaking change" to mean an intentional change -- are you asking whether we would consider this a regression?
I'm not quite familiar with the workings of eDNS, but it certainly looks like #1975 helps compliance with the relevant standards rather than reducing it.
@t0m commented on GitHub (Feb 17, 2024):
Certainly unintended, so a regression then.
0.22.0 works with eDNS out of the box, but apparently had truncation issues fixed in #1975
0.23.0 fixes the truncation issues, but apparently changes the default behavior that we were relying on for DNS responses above 512 bytes.
@t0m commented on GitHub (Feb 17, 2024):
Here's a quick example:
Setup:
0.22.0 working
0.23.0 timing out
@bluejekyll commented on GitHub (Feb 17, 2024):
Is there a reason you need to use UDP in this context? I think Google and a few other DNS resolvers out there restrict packet sizes to 512 bytes over UDP to prevent DDoS attacks. Can you use TCP instead?
@bluejekyll commented on GitHub (Feb 17, 2024):
On another note, it's not obvious to me why that change (which is mostly oriented around the server), would impact the client stream. Do you think it's clear what is causing the issue there?
@bluejekyll commented on GitHub (Feb 17, 2024):
Ok, looking at this a little more: if the EDNS setting is 512 but the packet returned is larger, then I could see this cutting off the response, and that's probably the issue. But isn't that what we want? Is there a reason we'd want to accept an upstream response that is greater than the size we've specified in EDNS?
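The size a client "specifies in EDNS" travels in the OPT pseudo-record defined by RFC 6891: the record's CLASS field carries the requestor's maximum UDP payload size. A minimal sketch (plain Rust, not hickory-dns code) of what that record looks like on the wire:

```rust
/// Build a minimal EDNS(0) OPT pseudo-record (RFC 6891) advertising
/// `payload_size` as the maximum UDP payload the requestor accepts.
fn opt_record(payload_size: u16) -> Vec<u8> {
    let mut rr = Vec::new();
    rr.push(0x00); // NAME: root domain (empty label)
    rr.extend_from_slice(&41u16.to_be_bytes()); // TYPE: OPT (41)
    rr.extend_from_slice(&payload_size.to_be_bytes()); // CLASS: UDP payload size
    rr.extend_from_slice(&0u32.to_be_bytes()); // TTL: extended RCODE + flags, all zero
    rr.extend_from_slice(&0u16.to_be_bytes()); // RDLENGTH: no options
    rr
}

fn main() {
    let rr = opt_record(4096);
    // 1 (NAME) + 2 (TYPE) + 2 (CLASS) + 4 (TTL) + 2 (RDLENGTH) = 11 bytes.
    assert_eq!(rr.len(), 11);
    // The advertised payload size sits in the CLASS field, bytes 3..5.
    assert_eq!(u16::from_be_bytes([rr[3], rr[4]]), 4096);
    println!("OPT RR: {rr:02x?}");
}
```

Without this record in the query, a spec-following server must assume the client only accepts 512 bytes, which is the limit under discussion here.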
@t0m commented on GitHub (Feb 18, 2024):
I was actually expecting it to fall back to TCP after the UDP path fails, but that does not seem to happen. I can dig in to the apollo-router settings we're using to see if it's forcing UDP somewhere.
I worked backwards from the warn message `udp_client_stream: dropped malformed message waiting` to udp_client_stream.rs, and noticed the changes around hoisting `recv_buf` out of the loop in #1975 looked suspicious. I'm not a Rust programmer though, so my debugging is not worth much.
One of the odd things I noticed when making the test reduction is that `512.size.dns.netmeister.org` actually returns a response slightly smaller than 512 bytes, and it still triggers the issue.
@t0m commented on GitHub (Feb 21, 2024):
I had a chance to dig in a bit more, and I believe what we're seeing is this bug from coredns (we're running within k8s): https://github.com/coredns/coredns/issues/5366.
The bug causes coredns to respond with payloads larger than 512 bytes even if the client does not send an OPT RR. The client then fails to parse the response because it only has a buffer of 512 bytes. This explains why we didn't see the client fall back to TCP.
I think the Docker embedded DNS has a similar problem, which is why I was able to replicate the error in the `rockylinux/rockylinux:8` container. tcpdump shows no truncate flag/fallback to TCP, as expected:
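The missing fallback hinges on the TC (truncation) bit: per RFC 1035, a server that cannot fit the answer in the client's limit sets TC=1, and that bit is what tells a resolver to retry over TCP. A small sketch (plain Rust, not hickory-dns code) of reading that bit from a raw header; the header bytes below are hypothetical examples:

```rust
/// Check the TC (truncation) bit in a raw DNS message (RFC 1035 §4.1.1).
/// Returns None if the message is shorter than the 12-byte header.
fn is_truncated(msg: &[u8]) -> Option<bool> {
    if msg.len() < 12 {
        return None;
    }
    // The flags occupy bytes 2..4; TC is bit 0x02 of the high flags byte.
    Some(msg[2] & 0x02 != 0)
}

fn main() {
    // Hypothetical headers: ID=0x1234, then flags.
    // 0x86 = QR|AA|TC set; 0x84 = QR|AA set, TC clear.
    let truncated = [0x12, 0x34, 0x86, 0x00, 0, 0, 0, 0, 0, 0, 0, 0];
    let clean = [0x12, 0x34, 0x84, 0x00, 0, 0, 0, 0, 0, 0, 0, 0];
    assert_eq!(is_truncated(&truncated), Some(true));
    assert_eq!(is_truncated(&clean), Some(false));
    println!("TC checks pass");
}
```

A coredns instance affected by the bug above sends an oversized response with TC=0, so a client enforcing the 512-byte limit never sees the signal to retry over TCP.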
@bluejekyll commented on GitHub (Feb 21, 2024):
Thanks for digging into that. I'm open to solutions here, which I suppose could be to not enforce the limit, but that feels wrong to me.
@t0m commented on GitHub (Feb 27, 2024):
Thanks for your help. For what it's worth, I think you're right to continue enforcing the limit and stick to the spec.