Mirror of https://github.com/hickory-dns/hickory-dns.git (synced 2026-04-25 03:05:51 +03:00)
[GH-ISSUE #2049] Axfr and AxfrStream issues with big zones #862
Originally created by @darnuria on GitHub (Oct 5, 2023).
Original GitHub issue: https://github.com/hickory-dns/hickory-dns/issues/2049
EDIT: Maybe related to https://github.com/bluejekyll/trust-dns/issues/351, as pointed out by @bluejekyll in the discussion.
Hello! Sorry, this is a bit messy; I found it while playing around with the AXFR (zone transfer) parts of the client library.
Describe the bug
A large AXFR over TCP may trigger unexpected failures such as ServerTimeout (tested against PowerDNS) and a futures_channel mpsc "queue full" error (observed with a 30 s TCP read_timeout; I hit the issue with other values too, but I have to recheck).
To Reproduce
I tried an AXFR of a zone with 200,000 records. Why would I do that? Because I wanted to test the biggest zone at work against trust-dns.
Expected behavior
AXFR should handle huge zones, not only small ones transferred one record at a time.
AxfrStream should stream the zone fast enough and apply backpressure when the sender is full because the worker on the other side of the channel is slower than the producer.
I wrote a small dig-like tool that just performs an AXFR, for testing.
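The backpressure behavior described above can be sketched with std's bounded `sync_channel`, used here as a stand-in for the futures mpsc channel the library actually uses (`transfer_zone` and the record type are hypothetical names, not hickory-dns API):

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Stream `zone_size` records through a bounded channel of capacity `cap`,
/// returning how many records the consumer received.
fn transfer_zone(zone_size: usize, cap: usize) -> usize {
    let (tx, rx) = sync_channel::<usize>(cap);
    let producer = thread::spawn(move || {
        for record in 0..zone_size {
            // `send` blocks while the buffer is full: natural backpressure,
            // instead of a "queue full" error from a failed try_send.
            tx.send(record).expect("receiver hung up");
        }
    });
    let mut received = 0;
    for _record in rx {
        received += 1; // a real worker would process the record here
    }
    producer.join().unwrap();
    received
}

fn main() {
    // Even a 200,000-record zone fits through a 16-slot channel,
    // because the producer simply waits when the buffer is full.
    assert_eq!(transfer_zone(200_000, 16), 200_000);
    println!("all records delivered");
}
```

With a blocking (or, in async code, awaiting) send, the producer's speed is automatically throttled to the consumer's, which is the behavior the bug report asks for.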
System:
Version:
Crate: client library, version 0.23
Code
The logs led me here for the mpsc "full" debug message:
https://github.com/bluejekyll/trust-dns/blob/4a1c4fe2d1ad7e987ad23de9dbf4c927d84212c1/crates/proto/src/xfer/dns_multiplexer.rs#L402
And about the zero bytes:
https://github.com/bluejekyll/trust-dns/blob/a7d4184792fc26c41de0f56bb671aa7dd7705b18/crates/proto/src/tcp/tcp_stream.rs#L397
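For illustration, the "queue full" failure mode linked above can be reproduced with std's bounded channel (the actual code uses a futures mpsc channel; `overflows_when_full` is a hypothetical helper, not library code):

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

/// Fill a bounded channel of capacity `cap` with no consumer draining it,
/// then report whether one more `try_send` fails with `Full`.
fn overflows_when_full(cap: usize) -> bool {
    let (tx, _rx) = sync_channel::<usize>(cap);
    for i in 0..cap {
        tx.try_send(i).expect("buffer should have room");
    }
    // One message more than the buffer holds: try_send fails immediately
    // instead of waiting for the consumer, analogous to the debug log above.
    matches!(tx.try_send(cap), Err(TrySendError::Full(_)))
}

fn main() {
    assert!(overflows_when_full(16));
    println!("try_send on a full bounded channel fails fast");
}
```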
Reproduction code
Logs
getting:
log server side:
With prints and debug trace
Same tcpdump, with the dig-like tool's prints removed but logging kept
End of TCP dump of dig
@bluejekyll commented on GitHub (Oct 5, 2023):
Yes, this is a known issue; I think it's the same as reported in #351.
@darnuria commented on GitHub (Oct 6, 2023):
Oh, thanks! I did a quick search for AXFR/stream issues and didn't find it; it was already late in Europe. Feel free to close this; I can report whatever info you think is useful over there. :)
EDIT: I'll try to spend some time reading the other issue carefully to check whether it's really the same or whether this one adds information. I can't promise to do it today, but I can try next week if you think it's worth it.
@bluejekyll commented on GitHub (Oct 6, 2023):
A quick synopsis on this: I think to fix this we need to change, or add a new method to, the Authority/Catalog for this operation to work properly. Right now, if I remember correctly, we return the entire zone in a single call and then stream that back. That's bad for large zones. What we need is a continuation call: get the first 0..N records, then the next N..2N records, etc. That would allow us to properly stream back an entire very large zone.
We probably have some other deficiencies to take care of as well; for example, we really need an on-disk option for zone lookups rather than storing zones entirely in memory. I think there might be an old issue for that.
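The continuation-style lookup described above could look roughly like this. This is only a sketch with hypothetical names (`PagedAuthority`, `InMemoryZone`, `stream_zone`), not the real Authority/Catalog API:

```rust
/// Hypothetical continuation-style lookup: fetch records page by page
/// instead of materializing the whole zone in one call.
trait PagedAuthority {
    /// Return up to `count` records starting at `start`; an empty
    /// slice means the transfer is complete.
    fn records(&self, start: usize, count: usize) -> &[String];
}

struct InMemoryZone {
    records: Vec<String>,
}

impl PagedAuthority for InMemoryZone {
    fn records(&self, start: usize, count: usize) -> &[String] {
        let len = self.records.len();
        let end = (start + count).min(len);
        &self.records[start.min(len)..end]
    }
}

/// Drive a transfer in fixed-size pages, returning how many records
/// were sent; a real server would serialize each page into AXFR messages.
fn stream_zone(zone: &impl PagedAuthority, page: usize) -> usize {
    let mut sent = 0;
    loop {
        let chunk = zone.records(sent, page);
        if chunk.is_empty() {
            break;
        }
        sent += chunk.len();
    }
    sent
}

fn main() {
    let zone = InMemoryZone {
        records: (0..1_000).map(|i| format!("rr{i}")).collect(),
    };
    assert_eq!(stream_zone(&zone, 64), 1_000);
    println!("streamed the zone in pages of 64");
}
```

The key property is that memory use per call is bounded by the page size, not the zone size, which would also compose naturally with an on-disk backing store.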