[GH-ISSUE #185] Resolver: mpsc error #86

Closed
opened 2026-03-07 22:18:40 +03:00 by kerem · 8 comments

Originally created by @NfNitLoop on GitHub (Sep 12, 2017).
Original GitHub issue: https://github.com/hickory-dns/hickory-dns/issues/185

I'm getting this Error when trying to use Resolver::lookup():

    Err(Error { repr: Custom(Custom { kind: Other, error: StringError("ClientError: Cloned error: Cloned error: error sending to mpsc: send failed because receiver is gone") }) })

I'm not creating any mpsc channels so I'm pretty sure that's all on the Resolver. :p

Here's the code that's doing the lookup. I'm calling from an outer loop that just reads a "domain" (a line from stdin) and calls this function:

    fn resolve<S: Into<String>>(resolver: &mut Resolver, domain: S) -> Self {
        // normalize domain:
        let mut domain: String = domain.into().trim_left_matches(
            |x| x == '*' || x == '.' || x == ' '
        ).into();

        let mut info = Self::new(domain.as_str());

        // Resolver wants a trailing .:
        domain.push('.');
        let result = resolver.lookup(&domain, RecordType::A);
        println!("result: {:?}", result);
        let first_ip = match result {
            Err(_) => {
                eprintln!("Error querying {}", domain);
                None
            },
            Ok(result) => {
                let mut value = None;
                for rdata in result.iter() {
                    if let &RData::A(ref addr) = rdata {
                        value = Some(addr.clone());
                        break;
                    } else {
                        eprintln!("Skipping unexpected record type: {:?}", rdata);
                    }
                }
                value
            }
        };

        if let Some(ip) = first_ip {
            info.domain_ip = format!("{}", ip);
        } else {
            eprintln!("No IP found for domain '{}'", domain);
        }

        return info;
    }
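
The normalization step at the top of that function can be isolated as a small helper. This is only a sketch using the standard library: `normalize_domain` is a hypothetical name, `trim_start_matches` stands in for the older `trim_left_matches`, and the extra `trim()` plus the "only append the dot if it is missing" check are assumptions added for illustration, not part of the reported code.

```rust
/// Hypothetical helper: strips leading wildcard/dot/space characters and
/// makes sure the name ends with a single trailing dot (FQDN form).
pub fn normalize_domain(raw: &str) -> String {
    // Assumption: also trim surrounding whitespace such as a trailing newline
    // left over from reading a line from stdin.
    let trimmed = raw
        .trim()
        .trim_start_matches(|c: char| c == '*' || c == '.' || c == ' ');

    let mut fqdn = trimmed.to_string();
    // Append the trailing dot only when it is not already there.
    if !fqdn.ends_with('.') {
        fqdn.push('.');
    }
    fqdn
}
```

Calling `normalize_domain("*.example.com")` yields a name with the wildcard prefix removed and a trailing dot appended.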

@bluejekyll commented on GitHub (Sep 12, 2017):

Yikes! Yes, there is an mpsc channel internal to the Client for tracking messages in and out. Is this on the first request or on subsequent requests? Also, what platform are you on: Linux, Windows, or Mac?
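
For readers unfamiliar with this class of error, here is a minimal sketch of how a "receiver is gone" send failure arises. It uses `std::sync::mpsc` purely as an analogy; the channel inside the Client is a different (futures-based) implementation, and the function names below are hypothetical.

```rust
use std::sync::mpsc;

/// Sends a message after the receiving half has been dropped.
/// The send fails because nothing can ever read the message.
pub fn send_after_receiver_dropped() -> Result<(), mpsc::SendError<u32>> {
    let (tx, rx) = mpsc::channel::<u32>();
    drop(rx); // simulate the task that owned the receiver going away
    tx.send(42)
}

/// Same send, but with the receiver still alive, so it succeeds.
pub fn send_with_live_receiver() -> Result<u32, String> {
    let (tx, rx) = mpsc::channel::<u32>();
    tx.send(42).map_err(|e| e.to_string())?;
    rx.recv().map_err(|e| e.to_string())
}
```

The point is that the caller holding the sender never touches the receiver directly, which is why the error can surface from code that creates no channels of its own.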

Can you enable debug! logging for the library and see what's going on there?

Btw, you don't "need" to put in the trailing dot, that just forces it to be an FQDN, so that only one query is attempted. Also, if you upgrade to the 0.5.0 version, I added a new method for this in particular: https://docs.rs/trust-dns-resolver/0.5.0/trust_dns_resolver/struct.Resolver.html#method.ipv4_lookup


@bluejekyll commented on GitHub (Sep 12, 2017):

Are you configuring this with the system's resolv.conf? By default that adds Nameserver configs for both UDP and TCP to the Resolver. I wonder if you've gotten unlucky with it trying TCP first, and that not working (i.e. the remote authority might not have TCP enabled?). The resolver by default reattempts 2 times. This can be increased to see if that is the issue.

Another option would be to explicitly construct the NameserverConfig and create a ResolverConfig with that information, only specifying the protocol you want to use, e.g. UDP.
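
As a rough illustration of that second option, the sketch below models a nameserver list in which every address appears once per protocol, then keeps only the UDP entries. The types `Protocol` and `NameServerEntry` and both functions are hypothetical stand-ins written against the standard library; they are not the resolver crate's actual config types.

```rust
use std::net::SocketAddr;

/// Hypothetical transport marker for a nameserver entry.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Protocol {
    Udp,
    Tcp,
}

/// Hypothetical nameserver entry: one address paired with one protocol.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct NameServerEntry {
    pub addr: SocketAddr,
    pub protocol: Protocol,
}

/// Models the default described above: each address is added for UDP and TCP.
pub fn expand_both_protocols(addrs: &[SocketAddr]) -> Vec<NameServerEntry> {
    let mut entries = Vec::new();
    for &addr in addrs {
        entries.push(NameServerEntry { addr, protocol: Protocol::Udp });
        entries.push(NameServerEntry { addr, protocol: Protocol::Tcp });
    }
    entries
}

/// Keeps only the entries for the protocol the caller wants, here UDP.
pub fn udp_only(entries: Vec<NameServerEntry>) -> Vec<NameServerEntry> {
    entries
        .into_iter()
        .filter(|entry| entry.protocol == Protocol::Udp)
        .collect()
}
```

With the real crate, the equivalent step would be building the nameserver list by hand and passing it to the resolver's configuration instead of relying on the system defaults.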


@NfNitLoop commented on GitHub (Sep 12, 2017):

> Can you enable debug! logging for the library and see what's going on there?

Hmm, I'm not able to reproduce it now. The network I'm on is a bit flaky, so maybe it was just having issues? But it seems like a flaky network shouldn't cause channel errors.

> Are you configuring this with the system's resolv.conf?

Yep!

> By default that adds Nameserver configs for both UDP and TCP to the Resolver.

Aha, good to know. I'd prefer UDP-only. Maybe I should manually configure it.

> I wonder if you've gotten unlucky in it trying TCP first, and that not working (i.e. the remote authority might not have TCP enabled?). The resolver by default reattempts 2 times. This can be increased, to see if that is the issue.

I've got the attempts down to 1.

> Is this on first request or on subsequent requests?

It was often on subsequent requests, after having left the Resolver unused for a while. Could it be that it was using a TCP connection, and the connection was getting dropped in the meantime?


@bluejekyll commented on GitHub (Sep 12, 2017):

> it seems like flaky network shouldn't cause channel errors.

Yeah, this might just be poor Error messaging, i.e. I might be losing the original Error in translation. It's something I've been meaning to go clean up...

> I'd prefer UDP-only. Maybe I should manually configure it.

Yes, definitely do that. I think I will also add some logic to prefer UDP over TCP as well. This is related to another issue I've filed, #178, for promoting to TCP when responses are truncated (generally only on large packets). resolv.conf doesn't (as far as I'm aware) have an option for disabling TCP... But I should definitely err on the side of using UDP over TCP.

> I've got the attempts down to 1.

Yeah, the logic right now is to work through all the Nameservers in the pool and try to determine the best one. It's weighted at the moment to try ones that haven't yet been tried, to make sure to balance the requests. So with attempts set to 1, if it hits one that hasn't been tried before, it could fail. Any reason you don't want a retry in there?

> Could it be that it was using a TCP connection, and the connection was getting dropped in the meantime?

Possibly. There could be a bug there, so I'll see about building a test case for that specific event. The logic should cause connections that fail to be closed, and then reopened. I believe I have test coverage for that, so I suspect that if that's what's happening, then the issue is that since only one attempt is being made, it won't have a chance to reconnect (for TCP) if the connection was dropped.
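
That suspicion can be modeled with a toy retry loop. This is a sketch under stated assumptions, not the resolver's real logic: `Connection` and `lookup_with_attempts` are hypothetical, and the "close and reopen on failure" behavior is reduced to flipping a flag.

```rust
/// Hypothetical stand-in for a pooled TCP connection to one nameserver.
pub struct Connection {
    alive: bool,
}

impl Connection {
    /// A connection that was dropped while the resolver sat idle.
    pub fn new_stale() -> Self {
        Connection { alive: false }
    }

    /// Sending on a dropped connection fails; otherwise it "answers".
    fn send(&mut self, name: &str) -> Result<String, String> {
        if self.alive {
            Ok(format!("answer for {}", name))
        } else {
            Err("connection dropped".to_string())
        }
    }

    /// Models closing the failed connection and opening a fresh one.
    fn reconnect(&mut self) {
        self.alive = true;
    }
}

/// Tries the lookup up to `attempts` times, reconnecting after each failure.
pub fn lookup_with_attempts(
    conn: &mut Connection,
    name: &str,
    attempts: usize,
) -> Result<String, String> {
    let mut last_err = String::from("no attempts made");
    for _ in 0..attempts {
        match conn.send(name) {
            Ok(answer) => return Ok(answer),
            Err(e) => {
                last_err = e;
                conn.reconnect();
            }
        }
    }
    Err(last_err)
}
```

With a stale connection, one attempt returns the error because the reconnect happens too late to be used, while two attempts succeed on the second try.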


@NfNitLoop commented on GitHub (Sep 12, 2017):

> Any reason you don't want a retry in there?

The docs (https://docs.rs/trust-dns-resolver/0.5.0/trust_dns_resolver/config/struct.ResolverOpts.html) weren't 100% clear, so I assumed that the timeout duration was per attempt, and I didn't want to double that time. But to be honest it's because I'm trying to first do a like-for-like rewrite of the C tool I'm hoping to replace, and it didn't do retries. (The ability to easily update the attempt config in the future is a feature I'm going to point out in Rust/trust-dns's favor, though.) 😄
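
The arithmetic behind that worry, assuming the timeout really is applied per attempt (the thread does not confirm this), is just a multiplication. `worst_case_wait` is a hypothetical helper for illustration only.

```rust
use std::time::Duration;

/// Worst-case time spent waiting if every attempt runs to its full timeout.
/// Assumption: the configured timeout applies to each attempt separately.
pub fn worst_case_wait(timeout: Duration, attempts: u32) -> Duration {
    timeout * attempts
}
```

Under that assumption, going from one attempt to two doubles the upper bound on how long a failing lookup can take.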


@bluejekyll commented on GitHub (Sep 12, 2017):

> The docs weren't 100% clear so I assumed that the timeout duration was per attempt, so I didn't want to double that time.

I could definitely use help on the documentation side of things... A lot of those options are straight out of the resolv.conf definitions from POSIX systems, so some of the wording could definitely be more accurate.


@bluejekyll commented on GitHub (Sep 20, 2017):

Part of this discussion raised the issue of TCP being used instead of UDP. This was resolved in #189.


@bluejekyll commented on GitHub (Jun 12, 2018):

Closing as out-of-date
