[GH-ISSUE #1398] how to respond to backpressure when sending many queries #662

Open
opened 2026-03-15 23:44:14 +03:00 by kerem · 0 comments
Owner

Originally created by @cmusser on GitHub (Mar 2, 2021).
Original GitHub issue: https://github.com/hickory-dns/hickory-dns/issues/1398

Creating a large number of concurrent queries managed with a `FuturesUnordered` can result in a large proportion of failed requests. In the 0.19 release series, there didn't seem to be any limit on how many queries could be started; in tests with ~6 million queries, divided into 10,000-query batches, I've seen 3/5 of the requests time out. In 0.20, backpressure was introduced to prevent queries from overflowing internal resources: if there are more than 32 active requests, subsequent ones result in a `ProtoError` with a kind of `Busy`.

The question is: what's an effective strategy for using the `Busy` return as a backpressure mechanism? The goal is to minimize the chance of overflowing local system resources while still having lots of concurrent requests.
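One generic way to treat a busy signal is retry with exponential backoff. The sketch below is not hickory-dns code: `QueryError::Busy` is a hypothetical stand-in for the `Busy` error kind described above, and `send_with_backoff` is an illustrative helper, shown with blocking `sleep` for simplicity (an async program would await a timer instead).

```rust
use std::thread::sleep;
use std::time::Duration;

// Hypothetical error type standing in for a `ProtoError` whose kind is `Busy`.
#[derive(Debug, PartialEq)]
enum QueryError {
    Busy,
    Other(String),
}

/// Retry `send` with exponential backoff whenever it reports `Busy`.
/// Any other outcome (success or a different error) is returned as-is.
fn send_with_backoff<T>(
    mut send: impl FnMut() -> Result<T, QueryError>,
    max_attempts: u32,
) -> Result<T, QueryError> {
    let mut delay = Duration::from_millis(1);
    for attempt in 0..max_attempts {
        match send() {
            Err(QueryError::Busy) if attempt + 1 < max_attempts => {
                sleep(delay); // back off before retrying
                delay *= 2;   // double the delay each round
            }
            other => return other,
        }
    }
    Err(QueryError::Busy)
}
```

This only spreads the load out in time; it doesn't bound how many callers pile up retrying at once, which is why the batching approach below still sees losses.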

I tried splitting the queries into fixed-size batches (each batch goes into a `FuturesUnordered`) and then accumulating timeouts into a list to be retried. That strategy eventually gets all the queries answered (given enough iterations of the retry logic), but it doesn't *prevent* packet loss in the first place. Some more specific questions are:

  • Can the program interactively see the number of active requests, so it can insert new queries to keep that number near some "reasonable" amount?
  • Can the number of active requests be increased from 32, or be tunable within sane limits?
  • Is using TCP an improvement? I converted to TCP and got about the same number of `Busy` errors, but I'm wondering whether, if the in-flight limit were increased, it might provide another layer of backpressure.