mirror of
https://github.com/hickory-dns/hickory-dns.git
synced 2026-04-26 03:35:52 +03:00
[GH-ISSUE #2555] Flaky "DS for KSK not found" failures #1015
Originally created by @divergentdave on GitHub (Nov 5, 2024).
Original GitHub issue: https://github.com/hickory-dns/hickory-dns/issues/2555
I've had two different tests flake on me today with the message "DS for KSK not found" (see https://github.com/hickory-dns/hickory-dns/actions/runs/11692262300/job/32561220236 and https://github.com/hickory-dns/hickory-dns/actions/runs/11687976077/job/32547325532). This is some sort of issue with parsing the output of ldns-key2ds.
@marcus0x62 commented on GitHub (Nov 5, 2024):
The chain of events for this is roughly:
Interestingly, a key tag collision between the KSK and ZSK would trigger the KSK error specifically, as would ldns-signzone failing, for some reason, to add the KSK to the zone file.
Since I've been mucking around in this code for semi-related reasons (adding support to use bind's dnssec-signzone so we can really test NSEC3 opt-out), I'll put up a PR with at least some debug logging and maybe a fix for the collision issue...
@marcus0x62 commented on GitHub (Nov 6, 2024):
See #2556.
I think the keytag collision is the only plausible explanation for the intermittent test failures. That PR calls ldns-key2ds over the KSK and ZSK separately instead of passing in the signed zone file and eliminates the need to do a tag calculation to try to identify which key is which.
@divergentdave commented on GitHub (Nov 6, 2024):
I added some logging and stress-tested the test suite, and this is the first error I got out:
This is logging the expected key tags and each line of the produced DS file. Based on this, I think it's possible we may be rarely miscalculating the key tag as well. I'll add more logging of intermediate values and keep running it.
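The comparison described above (expected key tags against each line of the DS file) can be sketched as follows. This is a minimal illustration, not the dns-test code: `ds_key_tag` and `find_ds_for_tag` are hypothetical names, and the sketch assumes each DS record is on one line in standard presentation format, where the key tag is the first field after the `DS` type token.

```rust
/// Extracts the key tag from one DS record line in presentation format,
/// e.g. `<owner> [<ttl>] IN DS <key tag> <algorithm> <digest type> <digest>`.
/// Hypothetical helper; assumes the owner name itself is not literally "DS".
fn ds_key_tag(line: &str) -> Option<u16> {
    let mut fields = line.split_whitespace();
    // Skip owner, optional TTL, and class until the record type token.
    fields.find(|field| field.eq_ignore_ascii_case("DS"))?;
    // The field right after the type is the key tag.
    fields.next()?.parse().ok()
}

/// Returns the first DS line whose key tag equals `tag`, if any.
fn find_ds_for_tag(ds_file: &str, tag: u16) -> Option<&str> {
    ds_file.lines().find(|line| ds_key_tag(line) == Some(tag))
}
```

With a lookup like this, a "not found" result means either that the DS file has no record for that key or that the expected tag was computed incorrectly, which is why logging both sides helps tell the two cases apart.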
@marcus0x62 commented on GitHub (Nov 6, 2024):
That's possible. On the one hand, I added the key tag calculation routines from dnstest and hickory to a high-throughput key generation program I use to generate key collisions. After ~7m keys, I couldn't provoke a single instance of different tag calculations between the three routines. On the other hand, there are some obvious differences between the reference implementation and the one in dnstest, particularly the lack of a mask on the accumulating add (tag += tag >> 16 in dnstest vs. tag += (tag >> 16) & 0xffff in the reference implementation), so perhaps the random keys I'm generating aren't overflowing in a way that would trigger this difference. My collision generator only creates ECDSAP384 keys, for instance.
Edit: if you can generate these failures even periodically, it would be really beneficial to have the ZSK and KSK public key bytes logged as well; that way we could verify whether the key tag calculation is correct. It's also interesting that the failure you logged has the ZSK and KSK key tags one apart, although that could be a coincidence.
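For reference, here is a minimal sketch of the two folding steps quoted above, following the key tag algorithm in RFC 4034 Appendix B. The function names are hypothetical, and the 32-bit accumulator is an assumption that may not match the integer types dnstest or hickory actually use.

```rust
/// Sums the DNSKEY RDATA (flags, protocol, algorithm, public key) as
/// big-endian 16-bit words, per RFC 4034 Appendix B.
/// Assumes a u32 accumulator and RDATA of at most 65,535 bytes, so the
/// sum cannot overflow. Not valid for algorithm 1 (RSA/MD5).
fn sum_rdata(rdata: &[u8]) -> u32 {
    let mut acc: u32 = 0;
    for (i, &byte) in rdata.iter().enumerate() {
        // Even offsets are the high byte of a word, odd offsets the low byte.
        acc += if i % 2 == 0 { u32::from(byte) << 8 } else { u32::from(byte) };
    }
    acc
}

/// Folding step as in the reference implementation (with the mask).
fn key_tag_masked(rdata: &[u8]) -> u16 {
    let mut acc = sum_rdata(rdata);
    acc += (acc >> 16) & 0xffff;
    (acc & 0xffff) as u16
}

/// Folding step without the mask, as quoted for dnstest.
fn key_tag_unmasked(rdata: &[u8]) -> u16 {
    let mut acc = sum_rdata(rdata);
    acc += acc >> 16;
    (acc & 0xffff) as u16
}
```

Under these assumptions the sum stays below 2^31, so `acc >> 16` already fits in 16 bits and the mask changes nothing. The two variants could only diverge with a wider accumulator or a different summation, which would be consistent with the collision generator never observing a mismatch.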
@marcus0x62 commented on GitHub (Nov 6, 2024):
I generated some more key collisions. It looks like ldns-signzone cannot handle a ZSK and KSK with the same keytag -- despite passing two keys, it only added whichever was specified first as an argument.
I then generated a new ksk with a tag one higher than the zsk (the scenario you logged above...) and saw the same behavior. Only the first key I specified was added to the zone file.
I then generated a ksk with a tag that was two more than the zsk. Both keys were added to the zone.
I then generated a ksk with a tag one less than the zsk. Both keys were added to the zone.
So, tl;dr: for some reason ldns-signzone cannot deal with colliding zone/key-signing keys, or one where the ksk has a tag one higher than the zsk.
I'll open a report with the Unbound team about this, but I think for the time being we'll need to either use bind's zone signing tool in the dns-test suite, or add some logic to check for collisions (or near collisions) between the zsk and ksk that might trigger the bug and recreate the ksk when one is detected.
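The second option (detect a conflicting KSK and regenerate it) might look roughly like this. It is only a sketch of the idea, not the code in #2556: `tags_conflict`, `generate_compatible_ksk`, and the `generate` closure are hypothetical, and the conflict rule simply encodes the two cases observed above (equal tags, or a KSK tag one higher than the ZSK tag). Wraparound from 65535 to 0 was not tested in this thread and is not treated as a conflict here.

```rust
/// Returns true for the tag combinations that ldns-signzone was observed
/// to mishandle: identical tags, or a KSK tag exactly one above the ZSK tag.
fn tags_conflict(zsk_tag: u16, ksk_tag: u16) -> bool {
    // Widen to u32 so that zsk_tag + 1 cannot overflow.
    ksk_tag == zsk_tag || u32::from(ksk_tag) == u32::from(zsk_tag) + 1
}

/// Calls `generate` (a hypothetical key generator returning a key and its
/// key tag) until the tag does not conflict with `zsk_tag`, giving up
/// after `max_attempts` tries.
fn generate_compatible_ksk<K>(
    zsk_tag: u16,
    max_attempts: usize,
    mut generate: impl FnMut() -> (K, u16),
) -> Option<K> {
    for _ in 0..max_attempts {
        let (key, ksk_tag) = generate();
        if !tags_conflict(zsk_tag, ksk_tag) {
            return Some(key);
        }
    }
    None
}
```

Bounding the number of attempts keeps a broken generator from looping forever; since conflicts are rare, a retry should almost never be needed in practice.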
Unfortunately, the fix I put in place in #2556 won't be sufficient, as while it will fix the immediate error, not having both DNSKEY records in our test zones is going to cause other sporadic test failures.
#2556 will now attempt to generate non-conflicting KSKs and should provide a complete fix.
@divergentdave commented on GitHub (Nov 14, 2024):
This is fixed by the workaround in #2556.