[GH-ISSUE #3151] CI test flakiness: resolver::tests::test_sec_lookup and resolver::tests::test_sec_lookup_fails #1144

Closed
opened 2026-03-16 01:43:19 +03:00 by kerem · 2 comments

Originally created by @divergentdave on GitHub (Jul 23, 2025).
Original GitHub issue: https://github.com/hickory-dns/hickory-dns/issues/3151

The `resolver::tests::test_sec_lookup` and `resolver::tests::test_sec_lookup_fails` tests have been failing frequently in CI lately. The cause appears to be timeouts while fetching DNSSEC records, which then lead to validation failures.

kerem closed this issue 2026-03-16 01:43:25 +03:00

@divergentdave commented on GitHub (Jul 23, 2025):

I wanted to test whether recent changes have affected retry behavior in a way that worsens the impact of dropped packets on this test. I ran the following commands to set up simulated packet loss in a network namespace (loosely assembled from https://ilmanzo.github.io/post/faulty_network_simulation/ and https://josephmuia.ca/2018-05-16-net-namespaces-veth-nat/):

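```bash
# Host's outbound interface (column 8 of `route` output is Iface)
IFACE=$(route | grep ^default | awk '{ print $8 }')
# Create the namespace and a veth pair linking it to the host
sudo ip netns add testing
sudo ip link add veth-default type veth peer name veth-testing
sudo ip link set veth-testing netns testing
# Address both ends on 10.0.178.0/24 and bring them up
sudo ip addr add 10.0.178.1/24 dev veth-default
sudo ip netns exec testing ip addr add 10.0.178.2/24 dev veth-testing
sudo ip link set veth-default up
sudo ip netns exec testing ip link set veth-testing up
# Forward and NAT namespace traffic out through the host interface
# (relies on IPv4 forwarding already being enabled on the host)
sudo iptables -A FORWARD -o "$IFACE" -i veth-default -j ACCEPT
sudo iptables -A FORWARD -i "$IFACE" -o veth-default -j ACCEPT
sudo iptables -t nat -A POSTROUTING -s 10.0.178.2/24 -o "$IFACE" -j MASQUERADE
# Route everything from the namespace via the host end of the veth pair
sudo ip netns exec testing ip route add default via 10.0.178.1
# Drop 10% of packets leaving veth-testing, with 0% correlation between drops
sudo ip netns exec testing tc qdisc add dev veth-testing root netem loss 10% 0%
```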
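The setup above leaves a few things implicit. The `FORWARD` and `MASQUERADE` rules only take effect when the host has IPv4 forwarding enabled, which these commands assume rather than configure. A `netem` qdisc attached at the root of `veth-testing` shapes only packets leaving that interface, so it drops outgoing queries but not the replies coming back. The following is a minimal sketch, not part of the original experiment, of helpers that enable forwarding if needed, add the same loss on the host-side `veth-default` so replies are dropped too, change the query loss rate in place (for example, to approximate conditions worse than 10%), and tear everything down afterwards. The helper names and the `DRY_RUN` switch, which prints commands instead of running them, are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the "testing" namespace set up above.

# Print the command instead of running it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# FORWARD/MASQUERADE rules only take effect if the kernel forwards IPv4 packets.
ensure_ip_forward() {
  if [[ "$(cat /proc/sys/net/ipv4/ip_forward 2>/dev/null)" != 1 ]]; then
    run sudo sysctl -w net.ipv4.ip_forward=1
  fi
}

# A root netem qdisc acts on egress; on veth-default that means replies
# travelling from the host into the namespace. Defaults to 10% loss.
add_reply_loss() {
  run sudo tc qdisc add dev veth-default root netem loss "${1:-10%}"
}

# Change the loss rate of the existing qdisc on veth-testing (outgoing queries).
set_query_loss() {
  run sudo ip netns exec testing tc qdisc change dev veth-testing root netem loss "$1"
}

# Once no processes remain in it, deleting the namespace destroys veth-testing
# and its veth-default peer; iptables -D removes the rules appended with -A.
teardown() {
  local iface="$1"
  run sudo ip netns del testing
  run sudo iptables -D FORWARD -o "$iface" -i veth-default -j ACCEPT
  run sudo iptables -D FORWARD -i "$iface" -o veth-default -j ACCEPT
  run sudo iptables -t nat -D POSTROUTING -s 10.0.178.2/24 -o "$iface" -j MASQUERADE
}
```

For example, `ensure_ip_forward; add_reply_loss 10%` after the setup, `set_query_loss 30%` before rerunning the tests under harsher conditions, and `teardown "$IFACE"` when finished.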

I ran tests with the following commands:

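```bash
# 25 runs; the test name filter is a substring match, so it selects both test_sec_lookup and test_sec_lookup_fails
sudo ip netns exec testing sudo -u "$USER" bash -c "for i in {1..25}; do $(which cargo) test -p hickory-resolver --features dnssec-aws-lc-rs --lib -- resolver::tests::test_sec_lookup; done"
# One run of test_sec_lookup only, with debug logging and test output shown
sudo ip netns exec testing sudo -u "$USER" bash -c "RUST_LOG=debug $(which cargo) test -p hickory-resolver --features dnssec-aws-lc-rs --lib -- resolver::tests::test_sec_lookup --exact --nocapture"
```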
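Cargo's output from the loop above reports pass/fail, but not how long each iteration took, which is where slow, retry-heavy runs become visible. Below is a minimal sketch of a hypothetical `time_runs` helper that repeats a command, prints each run's status and wall-clock duration in milliseconds, and ends with a summary. It assumes GNU `date` (for `%N` nanoseconds), which should hold on the Linux host that network namespaces require.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command N times, print each run's status and
# wall-clock duration, then a summary. Returns non-zero if any run failed.
time_runs() {
  local n="$1"
  shift
  local pass=0 fail=0 i start end status
  for ((i = 1; i <= n; i++)); do
    start=$(date +%s%N) # GNU date: nanoseconds since the epoch
    # The command's own output is discarded; only status and timing are kept.
    if "$@" >/dev/null 2>&1; then
      status=ok
      pass=$((pass + 1))
    else
      status=FAIL
      fail=$((fail + 1))
    fi
    end=$(date +%s%N)
    printf 'run %d: %s in %d ms\n' "$i" "$status" "$(((end - start) / 1000000))"
  done
  printf 'passed=%d failed=%d\n' "$pass" "$fail"
  [[ "$fail" -eq 0 ]]
}
```

Saved to a file (hypothetically `time_runs.sh`), it could stand in for the `for` loop: `sudo ip netns exec testing sudo -u "$USER" bash -c "source ./time_runs.sh && time_runs 25 $(which cargo) test -p hickory-resolver --features dnssec-aws-lc-rs --lib -- resolver::tests::test_sec_lookup"`. The first run may include compile time, and a failing case still needs the `--nocapture` command to see its logs.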

I did not get any test failures, but sometimes the tests took five, ten, or twenty seconds to run instead of the usual half second. This suggests that our retry logic is effective in this test, but that some of our Mac/Windows CI runs are exhausting retries due to network conditions even worse than the 10% packet loss I chose (or due to rate limiting by Google Public DNS). Logs from the second test command show the internal workings of the resolver as it hits timeouts and resends requests. Note that some DNSKEY responses were truncated, so the resolver also had to fall back to TCP.
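Truncation of this kind can be checked by hand with `dig`: `+dnssec` sets the DO bit so DNSSEC records are returned, `+bufsize` sets the advertised EDNS UDP buffer size, and `+ignore` stops `dig` from retrying over TCP, so the TC flag stays visible in the response header. The sketch below is an illustration, not part of the original investigation. `has_tc_flag` and `check_dnskey_truncation` are hypothetical helpers, and the default server (Google Public DNS at 8.8.8.8) and default buffer size of 1232 bytes are assumptions.

```shell
#!/usr/bin/env bash
# Succeed if the ";; flags:" header line of dig output (on stdin) includes TC.
has_tc_flag() {
  grep -Eq '^;; flags:[a-z ]* tc[ ;]'
}

# Query a zone's DNSKEY RRset over UDP only and report whether the reply was truncated.
check_dnskey_truncation() {
  local zone="$1" server="${2:-8.8.8.8}" bufsize="${3:-1232}" out
  # +ignore: do not retry over TCP, so a truncated UDP reply is shown as-is.
  # A non-zero dig exit status (e.g. no reply at all) is reported as 2.
  out=$(dig "@$server" +dnssec +ignore +bufsize="$bufsize" DNSKEY "$zone") || return 2
  if has_tc_flag <<<"$out"; then
    echo "$zone: truncated over UDP (bufsize=$bufsize)"
  else
    echo "$zone: not truncated (bufsize=$bufsize)"
  fi
}
```

For example, `check_dnskey_truncation example.com 8.8.8.8 512`. Whether a particular zone's DNSKEY reply exceeds the buffer depends on its key sizes and signatures, so results will vary by zone.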


@divergentdave commented on GitHub (Jul 31, 2025):

#3157 ignores these tests, and I don't think there are any further underlying issues to address, so I'm going to close this issue.
