[GH-ISSUE #2261] Concurrent access to the DNS cache for better performance #942

Closed
opened 2026-03-16 01:02:27 +03:00 by kerem · 6 comments

Originally created by @telbizov on GitHub (Jun 26, 2024).
Original GitHub issue: https://github.com/hickory-dns/hickory-dns/issues/2261

Hello everyone,

I am building a high-performance asynchronous forward proxy with tokio. I started using the hickory resolver for the DNS resolution part, and so far I like it a lot. For example, I really appreciate that I can use my own `RuntimeProvider` to allocate TCP and UDP sockets, because of an edge case I have.

I run multiple `current_thread` tokio runtimes (one per core) in dedicated worker threads that I launch myself. I then have a lazy static for the `AsyncResolver`, which is shared across all threads. The default mechanics work really well in my case. Every thread that calls `lookup_ip()` gets to run the async future on its own local runtime, distributing the workload of the lookups themselves well across workers. The result of each lookup is then shared in the common `DnsLru` cache behind the global `AsyncResolver` variable, which is exactly what I am after -- a shared DNS cache.

The issue I am facing is that caching is implemented via the `DnsLru` type, which uses `cache: Arc<Mutex<LruCache<Query, LruValue>>>` for storage. Any access to any value within that data structure locks the mutex of the **entire** `LruCache`, for both writes and reads. This essentially serializes accesses and causes a significant performance drop -- around 35% in my experiments.

It seems to me that hickory's performance could benefit greatly from a structure that is designed for concurrent access. A quick search for concurrent hash maps turned up [DashMap](https://docs.rs/dashmap/latest/dashmap/struct.DashMap.html) and [scc::HashMap](https://docs.rs/scc/2.1.1/scc/#hashmap). I tested their performance in a concurrent read/write scenario against an `Arc<Mutex<HashMap>>` that locks/unlocks on every read/write operation (similar to what `DnsLru` seems to be doing), and the differences were significant (with 16 threads): `scc::HashMap` is 18x faster in a write-only test and 30x faster in a read-only test.

What I was hoping for is one of two things:

  1. Either replace the `Arc<Mutex<LruCache<...>>>` with a concurrent analog that provides better performance characteristics, or better yet ...
  2. Create a cache trait that allows users of the library to supply their own caching type. This would be the most generic way to solve the problem, and it might prove useful in other cases -- for example, using Redis as a backend for the cache.
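As a rough illustration of option 2, a pluggable cache could look something like the trait below. All names and signatures here are purely hypothetical, not hickory-resolver API; a real version would also need to carry TTL/expiry information.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// Hypothetical pluggable-cache trait. Implementors could be backed by
// DashMap, scc::HashMap, Redis, etc. TTL handling is omitted for brevity.
trait DnsCache<K, V>: Send + Sync {
    fn get(&self, key: &K) -> Option<V>;
    fn insert(&self, key: K, value: V);
}

// Minimal reference implementation mirroring today's single-mutex design.
struct MutexMapCache<K, V> {
    inner: Mutex<HashMap<K, V>>,
}

impl<K: std::hash::Hash + Eq + Send, V: Clone + Send> DnsCache<K, V> for MutexMapCache<K, V> {
    fn get(&self, key: &K) -> Option<V> {
        self.inner.lock().unwrap().get(key).cloned()
    }

    fn insert(&self, key: K, value: V) {
        self.inner.lock().unwrap().insert(key, value);
    }
}
```

The resolver would then hold a `dyn DnsCache` (or a generic parameter), letting users swap in a concurrent or remote backend without touching the lookup code.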

Any chance you would consider abstracting the caching in a trait?

Thank you,
Rumen Telbizov

kerem closed this issue 2026-03-16 01:02:32 +03:00
@bdaehlie commented on GitHub (Jun 27, 2024):

Can both be done? Abstracting the caching in a trait and also improving the default to be something concurrent?

@telbizov commented on GitHub (Jun 27, 2024):

> Can both be done? Abstracting the caching in a trait and also improving the default to be something concurrent?

Sounds good to me!

@djc commented on GitHub (Jun 27, 2024):

Yup, both of these seem like reasonable ideas. I won't be able to implement these but happy to review a short design proposal and/or a PR implementing these.

@telbizov commented on GitHub (Jun 27, 2024):

Thanks for the feedback @djc. I'll consider spending some time on this next week and will let you know.

@telbizov commented on GitHub (Jul 1, 2024):

@djc et al. I'll try to take a shot at tackling this issue. I'll keep you posted in the coming days and weeks, and will hopefully present a PR for your consideration. I might reach out to you on Discourse to speed up the back-and-forth in the beginning.

@divergentdave commented on GitHub (Nov 13, 2024):

Currently, there are two sources of cache mutations when we read from the `DnsLru` cache. At the application level, we may delete an entry after attempting to read it if its TTL is expired. Inside `lru-cache`, the cache moves/refreshes entries in the `linked-hash-map` in order to maintain LRU order. Since almost all `lru-cache` methods require mutable references, we unfortunately can't just swap our Tokio `Mutex` for an `RwLock` for a quick win.

It looks like the [`moka`](https://github.com/moka-rs/moka/blob/main/README.md) library would work well for the resolver cache. It can handle both eviction, using a more sophisticated policy than the current LRU policy, and per-element expiry times, pruning entries past their TTL before we attempt to look them up again. The biggest downsides seem to be code size/dependency count and lack of WASM support, but we can polyfill with something like the existing implementation if need be. The cache returns clones of values, but that's fine because we were already copying in order to rewrite TTLs.
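The read-path mutation described above (an expired entry being removed during a lookup) can be sketched as follows, using only the standard library and hypothetical names. It shows why a plain `RwLock` read guard would not be enough: the read path itself needs write access.

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::time::Instant;

// Simplified, hypothetical stand-in for a TTL-aware cache read path.
// Even a "read" may mutate the map, because expired entries are
// removed as part of the lookup.
struct TtlCache<K, V> {
    inner: Mutex<HashMap<K, (V, Instant)>>,
}

impl<K: std::hash::Hash + Eq, V: Clone> TtlCache<K, V> {
    fn insert(&self, key: K, value: V, valid_until: Instant) {
        self.inner.lock().unwrap().insert(key, (value, valid_until));
    }

    fn get(&self, key: &K, now: Instant) -> Option<V> {
        let mut map = self.inner.lock().unwrap();
        let expired = matches!(map.get(key), Some((_, valid_until)) if now > *valid_until);
        if expired {
            map.remove(key); // the "read" turned into a write
            return None;
        }
        map.get(key).map(|(v, _)| v.clone())
    }
}
```

A cache like `moka`, which handles per-entry expiry internally and exposes a shared-reference API, would absorb this mutation instead of forcing an exclusive lock onto every lookup.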
