[GH-ISSUE #2555] Flaky "DS for KSK not found" failures #1015

Closed
opened 2026-03-16 01:16:56 +03:00 by kerem · 6 comments
Owner

Originally created by @divergentdave on GitHub (Nov 5, 2024).
Original GitHub issue: https://github.com/hickory-dns/hickory-dns/issues/2555

I've had two different tests flake on me today with the message "DS for KSK not found". (see https://github.com/hickory-dns/hickory-dns/actions/runs/11692262300/job/32561220236 and https://github.com/hickory-dns/hickory-dns/actions/runs/11687976077/job/32547325532) This is some sort of issue with parsing the output of ldns-key2ds.

kerem closed this issue 2026-03-16 01:17:02 +03:00

@marcus0x62 commented on GitHub (Nov 5, 2024):

The chain of events for this is roughly:

  • A zone file is created with dns-test
  • KSK and ZSK keys are generated using ldns-keygen
  • The zone is signed with ldns-signzone, which adds DNSKEY records for the KSK and ZSK
  • ldns-key2ds is run against the signed zone file to extract the DS records for the KSK and ZSK.
  • DS2::classify (defined in conformance/packages/dns-test/src/name_server.rs) is called to create a DS2 struct with both keys.

Interestingly, a key tag collision between the KSK and ZSK would trigger the KSK error specifically, as would ldns-signzone failing, for some reason, to add the KSK to the zone file.
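To make the two failure modes concrete, here is a minimal sketch of classifying DS records by key tag. The `Ds` struct and `classify` function are hypothetical simplifications for illustration, not the actual `DS2::classify` in dns-test, whose details are not shown in this thread.

```rust
/// Hypothetical, simplified DS record: only the key tag matters here.
#[derive(Debug, Clone, PartialEq)]
struct Ds {
    key_tag: u16,
}

/// Match DS records to the expected ZSK and KSK key tags.
/// Illustrative only; this is not the real `DS2::classify`.
fn classify(records: &[Ds], zsk_tag: u16, ksk_tag: u16) -> Result<(Ds, Ds), &'static str> {
    let mut zsk = None;
    let mut ksk = None;
    for ds in records {
        if ds.key_tag == zsk_tag {
            // Checked first, so a DS whose tag matches both keys counts as the ZSK.
            zsk = Some(ds.clone());
        } else if ds.key_tag == ksk_tag {
            ksk = Some(ds.clone());
        }
    }
    let zsk = zsk.ok_or("DS for ZSK not found")?;
    let ksk = ksk.ok_or("DS for KSK not found")?;
    Ok((zsk, ksk))
}
```

Under this sketch, two DS records that share one tag are both claimed by the ZSK branch, and a DS set that lacks the KSK entirely never reaches the KSK branch. Either way the result is the "DS for KSK not found" error.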

Since I've been mucking around in this code for semi-related reasons (adding support to use BIND's dnssec-signzone so we can really test NSEC3 opt-out), I'll put a PR up with at least some debug logging and maybe a fix for the collision issue...


@marcus0x62 commented on GitHub (Nov 6, 2024):

See #2556.

I think the keytag collision is the only plausible explanation for the intermittent test failures. That PR calls ldns-key2ds over the KSK and ZSK separately instead of passing in the signed zone file and eliminates the need to do a tag calculation to try to identify which key is which.
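The idea of deriving each DS from its own key file can be sketched as follows. This is an illustration of the approach described, not the code in #2556: `DsPair`, `ds_per_key`, and the `key2ds` closure (standing in for one `ldns-key2ds` invocation on a single key file) are all hypothetical names.

```rust
/// DS records whose roles are known from how they were produced,
/// so no key tag comparison is needed to tell them apart.
#[derive(Debug, PartialEq)]
struct DsPair {
    ksk: String,
    zsk: String,
}

/// `key2ds` is a hypothetical stand-in for running `ldns-key2ds` on one
/// key file and returning the DS line it prints.
fn ds_per_key(ksk_file: &str, zsk_file: &str, key2ds: impl Fn(&str) -> String) -> DsPair {
    DsPair {
        // The role comes from which file was passed in, not from the tag.
        ksk: key2ds(ksk_file),
        zsk: key2ds(zsk_file),
    }
}
```

Because the role is fixed by the input file, this shape still labels both records correctly when the two keys happen to share a key tag.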


@divergentdave commented on GitHub (Nov 6, 2024):

I added some logging and stress-tested the test suite, and this is the first error I got out:

[packages/dns-test/src/zone_file/signer.rs:187:9] zsk.rdata().calculate_key_tag() = 15208
[packages/dns-test/src/zone_file/signer.rs:188:9] ksk.rdata().calculate_key_tag() = 3355
[packages/dns-test/src/zone_file/signer.rs:202:17] line = "hickory-dns.testing.\t86400\tIN\tDS\t15208 8 2 54974a8b5aec48dcb77ed1c40afb89a85ffc3344580b2f66f6c1bd461d26ca74"
[packages/dns-test/src/zone_file/signer.rs:202:17] line = "hickory-dns.testing.\t86400\tIN\tDS\t3355 8 2 b2779069f82a236ab975101d72058ba475a82ad752bbfa4c55362444ff7da664"
[packages/dns-test/src/zone_file/signer.rs:187:9] zsk.rdata().calculate_key_tag() = 5868
[packages/dns-test/src/zone_file/signer.rs:188:9] ksk.rdata().calculate_key_tag() = 49191
[packages/dns-test/src/zone_file/signer.rs:202:17] line = "dsa.testing.\t86400\tIN\tDS\t5868 3 2 bc7fb7b505d960c70d2b79a15d1ea04ba19085eb896cc4754d540cc7a0b9195c"
[packages/dns-test/src/zone_file/signer.rs:202:17] line = "dsa.testing.\t86400\tIN\tDS\t49191 3 2 5b15b3386ade2f58f07d9fe6bb7fd2084464bb7de3312f475a96a03bc8ab653a"
[packages/dns-test/src/zone_file/signer.rs:187:9] zsk.rdata().calculate_key_tag() = 41476
[packages/dns-test/src/zone_file/signer.rs:188:9] ksk.rdata().calculate_key_tag() = 59996
[packages/dns-test/src/zone_file/signer.rs:202:17] line = "testing.\t86400\tIN\tDS\t41476 8 2 358550820fe840006c8378901b8e3d76b6dec9a2a106bd6e6a16636d05d872b4"
[packages/dns-test/src/zone_file/signer.rs:202:17] line = "testing.\t86400\tIN\tDS\t59996 8 2 e6e71f6afa69a8a5fcd3fc5f8e7a04f002b30b9ba6a495b2001512389ab60ec4"
[packages/dns-test/src/zone_file/signer.rs:187:9] zsk.rdata().calculate_key_tag() = 14683
[packages/dns-test/src/zone_file/signer.rs:188:9] ksk.rdata().calculate_key_tag() = 14684
[packages/dns-test/src/zone_file/signer.rs:202:17] line = ".\t86400\tIN\tDS\t14683 8 2 83d54a6be79322b2e461c7286dfb243c96d68518a180e692fc015f922a81b983"
thread 'resolver::dnssec::scenarios::insecure::deprecated_algorithm::dsa' panicked at packages/dns-test/src/name_server.rs:511:25:
DS for KSK not found

This is logging the expected key tags and each line of the produced DS file. Based on this, I think it's possible we may be rarely miscalculating the key tag as well. I'll add more logging of intermediate values and keep running it.
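A check like the one done by eye on this log can be sketched in a few lines: pull the key tag out of each DS line and list the expected tags that have no matching line. Both helper names are hypothetical and this is not the parser used in dns-test; it assumes whitespace-separated presentation format as seen in the logged lines.

```rust
/// Extract the key tag from one DS line in presentation format:
/// `<owner> <ttl> IN DS <key tag> <algorithm> <digest type> <digest>`.
/// Hypothetical helper, not the dns-test parser.
fn ds_key_tag(line: &str) -> Option<u16> {
    let mut fields = line.split_whitespace();
    // Skip forward to the record type, then read the field after it.
    fields.by_ref().find(|f| *f == "DS")?;
    fields.next()?.parse().ok()
}

/// Return the expected key tags that have no corresponding DS line.
fn missing_tags(lines: &[&str], expected: &[u16]) -> Vec<u16> {
    let found: Vec<u16> = lines.iter().filter_map(|l| ds_key_tag(l)).collect();
    expected.iter().copied().filter(|t| !found.contains(t)).collect()
}
```

Applied to the last group in the log (expected tags 14683 and 14684, one DS line for 14683), `missing_tags` reports 14684, the KSK.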


@marcus0x62 commented on GitHub (Nov 6, 2024):

That's possible. On the one hand, I added the key tag calculation routines from dns-test and hickory to a high-throughput key generation program I use to generate key collisions. After roughly 7 million keys, I couldn't provoke a single instance of differing tag calculations between the three routines. On the other hand, there are some obvious differences between the reference implementation and the one in dns-test, particularly the lack of a mask on the accumulating add (tag += tag >> 16 in dns-test vs. tag += (tag >> 16) & 0xffff in the reference implementation), so perhaps the random keys I'm generating aren't overflowing in a way that would trigger this difference. My collision generator only creates ECDSAP384 keys, for instance.
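For comparison, here is a sketch of the two variants side by side, following the reference algorithm in RFC 4034 Appendix B. It assumes a 32-bit unsigned accumulator; the accumulator type actually used in dns-test is not stated in this thread, and the function names are made up for the example.

```rust
/// Key tag over DNSKEY RDATA, following RFC 4034 Appendix B
/// (this formula does not apply to algorithm 1, RSA/MD5).
fn key_tag_masked(rdata: &[u8]) -> u16 {
    let mut ac: u32 = 0;
    for (i, &b) in rdata.iter().enumerate() {
        // Even-indexed bytes are the high octet of a 16-bit word.
        ac += if i & 1 == 1 { u32::from(b) } else { u32::from(b) << 8 };
    }
    ac += (ac >> 16) & 0xffff;
    (ac & 0xffff) as u16
}

/// Same calculation without the mask on the folding add.
/// Assumption: a u32 accumulator, which may not match dns-test.
fn key_tag_unmasked(rdata: &[u8]) -> u16 {
    let mut ac: u32 = 0;
    for (i, &b) in rdata.iter().enumerate() {
        ac += if i & 1 == 1 { u32::from(b) } else { u32::from(b) << 8 };
    }
    ac += ac >> 16;
    (ac & 0xffff) as u16
}
```

With a 32-bit accumulator, `ac >> 16` is already at most 16 bits wide, so the mask changes nothing and the two functions agree on every input. RDATA is limited to 65535 bytes, which keeps the running sum below 2^31, so the accumulator cannot overflow either. Under that assumption the missing mask alone would not explain a miscalculated tag.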

Edit: if you can generate these failures even periodically, it would be really beneficial to have the ZSK and KSK public key bytes logged as well; that way we could verify whether the key tag calculation is correct. It's also interesting that the failure you logged has the ZSK and KSK key tags one apart, although that could be a coincidence.


@marcus0x62 commented on GitHub (Nov 6, 2024):

I generated some more key collisions. It looks like ldns-signzone cannot handle a ZSK and KSK with the same keytag -- despite passing two keys, it only added whichever was specified first as an argument.

I then generated a new ksk with a tag one higher than the zsk (the scenario you logged above...) and saw the same behavior. Only the first key I specified was added to the zone file.

I then generated a ksk with a tag that was two more than the zsk. Both keys were added to the zone.

I then generated a ksk with a tag one less than the zsk. Both keys were added to the zone.

So, tl;dr: for some reason ldns-signzone cannot deal with a ZSK and KSK whose key tags collide, or with a KSK whose tag is one higher than the ZSK's.

I'll open a report with the Unbound team about this, but I think for the time being we'll need to either use bind's zone signing tool in the dns-test suite, or add some logic to check for collisions (or near collisions) between the zsk and ksk that might trigger the bug and recreate the ksk when one is detected.
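The second option, detecting a collision or near collision and recreating the KSK, could look roughly like this. It is a sketch under stated assumptions rather than the logic in #2556: `tags_conflict` and `pick_ksk_tag` are hypothetical names, `generate_ksk_tag` stands in for generating a fresh KSK and computing its tag, and treating the 65535 to 0 wraparound as "one higher" is a conservative guess that was not tested in this thread.

```rust
/// True when the KSK tag equals the ZSK tag or is exactly one higher,
/// the two cases observed to make ldns-signzone drop a key.
/// Counting the 65535 -> 0 wraparound as "one higher" is an assumption.
fn tags_conflict(zsk_tag: u16, ksk_tag: u16) -> bool {
    ksk_tag == zsk_tag || ksk_tag == zsk_tag.wrapping_add(1)
}

/// Call `generate_ksk_tag` (a hypothetical stand-in for creating a new KSK
/// and computing its tag) until the result does not conflict with the ZSK,
/// giving up after `max_attempts` tries.
fn pick_ksk_tag(
    zsk_tag: u16,
    max_attempts: usize,
    mut generate_ksk_tag: impl FnMut() -> u16,
) -> Option<u16> {
    (0..max_attempts)
        .map(|_| generate_ksk_tag())
        .find(|&ksk_tag| !tags_conflict(zsk_tag, ksk_tag))
}
```

Bounding the retries keeps a broken generator from looping forever. With random keys a conflict should be rare, so a small limit is normally enough.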

~~Unfortunately, the fix I put in place in #2556 won't be sufficient, as while it will fix the immediate error, not having both DNSKEY records in our test zones is going to cause other sporadic test failures.~~ #2556 will now attempt to generate non-conflicting KSKs and should provide a complete fix.


@divergentdave commented on GitHub (Nov 14, 2024):

This is fixed by the workaround in #2556.
