[GH-ISSUE #2602] Consider removing DnsLru::insert_records() #1023

Closed
opened 2026-03-16 01:18:21 +03:00 by kerem · 2 comments
Owner

Originally created by @divergentdave on GitHub (Nov 21, 2024).
Original GitHub issue: https://github.com/hickory-dns/hickory-dns/issues/2602

Currently, the resolver and recursor use disjoint APIs for inserting into the record cache. The recursor uses DnsLru::insert_records(), which takes a list of records, groups them by name and record type (with some exceptions) and stores the groups in the cache under synthesized query keys. The two exceptions for record types are CNAME records, which get stored under the original query, and RRSIG records, which get stored with the record set they are a signature over.

These exceptions are incomplete, because they result in DS records getting stored separate from NS records when present in a response. They also result in storing NSEC and NSEC3 records under the name of the record, not the query name, which means they won't be found when performing the same query again. This affects #2595. Additionally, glue records may get stored in the cache if they are in-bailiwick, which means the recursor may nondeterministically reuse glue records when handling subsequent recursive requests, rather than requesting A/AAAA records from the relevant authoritative server.

Rather than extend the exceptions in insert_records()'s grouping algorithm, I think it might make more sense to eliminate its use. Instead, we would store the records from an authoritative response together, keyed by their original query. This would ensure we always have the necessary DNSSEC records sent in the original response, and prevent parent and child nodes from fighting over cache entries in case they have a disagreement about a record set. However, the recursor as it is written today is reliant on the behavior of insert_records(), so we would need to change its operation in tandem. For example, we would have to pull out glue A/AAAA records directly from a response, rather than relying on insert_records() to populate cache entries with them, and loading IP addresses from the cache in RecursorDnsHandle::ns_pool_for_zone().

Originally created by @divergentdave on GitHub (Nov 21, 2024). Original GitHub issue: https://github.com/hickory-dns/hickory-dns/issues/2602 Currently, the resolver and recursor use disjoint APIs for inserting into the record cache. The recursor uses `DnsLru::insert_records()`, which takes a list of records, groups them by name and record type (with some exceptions) and stores the groups in the cache under synthesized query keys. The two exceptions for record types are `CNAME` records, which get stored under the original query, and `RRSIG` records, which get stored with the record set they are a signature over. These exceptions are incomplete, because they result in DS records getting stored separate from NS records when present in a response. They also result in storing NSEC and NSEC3 records under the name of the record, not the query name, which means they won't be found when performing the same query again. This affects #2595. Additionally, glue records may get stored in the cache if they are in-bailiwick, which means the recursor may nondeterministically reuse glue records when handling subsequent recursive requests, rather than requesting A/AAAA records from the relevant authoritative server. Rather than extend the exceptions in `insert_records()`'s grouping algorithm, I think it might make more sense to eliminate its use. Instead, we would store the records from an authoritative response together, keyed by their original query. This would ensure we always have the necessary DNSSEC records sent in the original response, and prevent parent and child nodes from fighting over cache entries in case they have a disagreement about a record set. However, the recursor as it is written today is reliant on the behavior of `insert_records()`, so we would need to change its operation in tandem. For example, we would have to pull out glue A/AAAA records directly from a response, rather than relying on `insert_records()` to populate cache entries with them, and loading IP addresses from the cache in `RecursorDnsHandle::ns_pool_for_zone()`.
kerem 2026-03-16 01:18:21 +03:00
Author
Owner

@divergentdave commented on GitHub (Dec 6, 2024):

RFC 4035 section 4.5 has a related recommendation:

A security-aware resolver SHOULD cache each response as a single atomic entry containing the entire answer, including the named RRset and any associated DNSSEC RRs. ...

... Resolvers that follow this recommendation will have a more consistent view of the namespace.

The recommendation here is more focused on not synthesizing positive responses from wildcard RRSIGs, or performing aggressive NSEC caching, but I think the point about consistency still applies.

<!-- gh-comment-id:2523940605 --> @divergentdave commented on GitHub (Dec 6, 2024): RFC 4035 section 4.5 has a related recommendation: > A security-aware resolver SHOULD cache each response as a single atomic entry containing the entire answer, including the named RRset and any associated DNSSEC RRs. ... > > ... Resolvers that follow this recommendation will have a more consistent view of the namespace. The recommendation here is more focused on not synthesizing positive responses from wildcard RRSIGs, or performing aggressive NSEC caching, but I think the point about consistency still applies.
Author
Owner

@divergentdave commented on GitHub (Mar 28, 2025):

I tried looking up ideleg.net. IN A with the validating recursive resolver, and got back a SERVFAIL response due to this issue. This zone is unusual because one of its nameservers is ideleg.net. itself. Thus, one of the glue records from the referral from net. gets stored in the cache under the same query as the recursive query. The glue does not come with an RRSIG, which should be fine because the glue is not authoritative data anyway, but then this stored RRset is also used as the response to the recursive query. DNSSEC validation then fails because there is no RRSIG in the Lookup returned by the recursor.

<!-- gh-comment-id:2761406086 --> @divergentdave commented on GitHub (Mar 28, 2025): I tried looking up `ideleg.net. IN A` with the validating recursive resolver, and got back a SERVFAIL response due to this issue. This zone is unusual because one of its nameservers is `ideleg.net.` itself. Thus, one of the glue records from the referral from `net.` gets stored in the cache under the same query as the recursive query. The glue does not come with an `RRSIG`, which should be fine because the glue is not authoritative data anyway, but then this stored RRset is also used as the response to the recursive query. DNSSEC validation then fails because there is no RRSIG in the `Lookup` returned by the recursor.
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
starred/hickory-dns#1023
No description provided.