mirror of
https://github.com/nsupdate-info/nsupdate.info.git
synced 2026-04-25 16:45:55 +03:00
[GH-ISSUE #138] dnspython: strange effect with dns.resolver.zone_for_name(fqdn) #128
Originally created by @ThomasWaldmann on GitHub (Jan 21, 2014).
Original GitHub issue: https://github.com/nsupdate-info/nsupdate.info/issues/138
That call seems to either malfunction or at least behave unexpectedly sometimes.
I did zone_for_name("x.tests.nsupdate.info") and it gave "nsupdate.info" (incorrect) instead of "tests.nsupdate.info" (correct).
Then I edited /etc/resolv.conf to set nameserver = 127.0.0.1 (pointing to a freshly started bind9) and it gave the correct answer.
Then I switched the resolv.conf nameserver back to the previous IP (my router's IP) and it also gave the correct answer. Huh!?
This call giving the correct zone is essential for correct operation of nsupdate.info, as we use it to determine the origin zone (in production as well as in the tests).
@rthalley commented on GitHub (Jan 22, 2014):
This doesn't seem likely to be a dnspython bug. All dnspython does is do SOA queries for the name and its parents to the system's configured resolver until one of them gets a positive answer.
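The walk described here can be sketched in a few lines of plain Python (hypothetical helper names, with the SOA query stubbed out as a simple membership test; the real logic lives in dns.resolver.zone_for_name()):

```python
def candidate_zones(fqdn):
    """Yield fqdn and each of its parents, ending with the root zone."""
    labels = fqdn.rstrip(".").split(".")
    for i in range(len(labels)):
        yield ".".join(labels[i:])
    yield "."

def zone_for_name_sketch(fqdn, has_soa):
    """Return the first candidate name whose (stubbed) SOA query succeeds."""
    for name in candidate_zones(fqdn):
        if has_soa(name):
            return name
    raise LookupError("no SOA found, not even at the root")

# With a healthy resolver, the walk stops at tests.nsupdate.info:
healthy = {"tests.nsupdate.info", "nsupdate.info", "info", "."}
print(zone_for_name_sketch("x.tests.nsupdate.info", healthy.__contains__))
# tests.nsupdate.info

# If the resolver answers negatively for tests.nsupdate.info (e.g. from
# its negative cache), the walk overshoots to the parent zone:
broken = {"nsupdate.info", "info", "."}
print(zone_for_name_sketch("x.tests.nsupdate.info", broken.__contains__))
# nsupdate.info
```

This is exactly the failure mode reported above: a single negative answer at one level makes the walk skip to the parent zone.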
Intuitively, I think what's happening is that there's a negative cache entry above your target name in the system resolver's cache. E.g. for your x.tests.nsupdate.info example, if someone or something had queried anything for "tests.nsupdate.info" while it didn't exist yet, that would have created a negative cache entry in the configured resolver's cache. Note that if this is an ISP's resolver, you might not have been the one whose query caused that entry to be created.
If you then subsequently create the "tests.nsupdate.info" zone and then try to do zone_for_name() on "x.tests.nsupdate.info", then you won't find the right zone until the negative cache entry for tests.nsupdate.info expires. This theory is supported by the evidence you cite, namely that when you start a local BIND it works, and that when you came back to the router IP (presumably after the negative cache expiration) it worked too.
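The timing effect can be modeled with a toy caching resolver (all names invented here; this is not dnspython code, just an illustration of the theory):

```python
class ToyResolver:
    """Toy model of a caching resolver that remembers NXDOMAIN answers."""

    def __init__(self, authoritative, neg_ttl=3600):
        self.authoritative = authoritative  # name -> SOA record (toy data)
        self.neg_ttl = neg_ttl              # negative-cache TTL in seconds
        self.neg_cache = {}                 # name -> expiry timestamp

    def query_soa(self, name, now):
        expiry = self.neg_cache.get(name)
        if expiry is not None and now < expiry:
            return None  # answered from the negative cache, zone invisible
        if name in self.authoritative:
            return self.authoritative[name]
        self.neg_cache[name] = now + self.neg_ttl  # remember the NXDOMAIN
        return None

r = ToyResolver(authoritative={"nsupdate.info": "soa"})
r.query_soa("tests.nsupdate.info", now=0)       # NXDOMAIN, now cached
r.authoritative["tests.nsupdate.info"] = "soa"  # zone is created afterwards
r.query_soa("tests.nsupdate.info", now=100)     # still None: entry not expired
r.query_soa("tests.nsupdate.info", now=4000)    # entry expired: "soa"
```

A freshly started local BIND starts with an empty negative cache (immediate correct answer), and by the time you switched back to the router the old entry had presumably expired, which matches both observations.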
If you control the system resolver, you can tell it to flush the cache for a name (and all names below it) that you just created.
If all of the authoritative servers for nsupdate.info know about all subdomains of nsupdate.info too, then you might be able to point dnspython at the authority servers instead of the system resolver. This should work so long as you only ask it about zones under nsupdate.info. Note that this does NOT work if you have some subdomains of nsupdate.info served by one system and some by another.
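Pointing dnspython at the authoritative servers could look roughly like this (an untested, network-dependent sketch; the IP addresses are placeholders you would replace with the real authoritative servers for nsupdate.info):

```python
import dns.resolver

# A resolver that ignores /etc/resolv.conf and asks the authoritative
# servers directly, so no shared recursive cache is involved.
auth = dns.resolver.Resolver(configure=False)
auth.nameservers = ["192.0.2.1", "192.0.2.2"]  # placeholder addresses

# zone_for_name() accepts an explicit resolver to use for its SOA walk.
zone = dns.resolver.zone_for_name("x.tests.nsupdate.info", resolver=auth)
print(zone)
```

As noted above, this only helps while every subdomain of nsupdate.info is served by those same servers.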
The only other workaround would be to determine zones in a way that doesn't use caching, by querying all of the authorities directly from the root down, but this is a tedious and complicated process that you really don't want to attempt (it amounts to writing the heart of a recursive resolver like BIND). Easier in dnspython, I grant, but still quite painful! This is why dnspython does not offer a full resolver! It's also quite rude not to cache if you're doing a lot of lookups! :)
Finally, as one of the notes on issue #122 says, you might be able to address this in the API if you can remember what the zone is supposed to be. Though even that might not free you completely from caching effects, depending on what you do with that info. E.g. if you query for the zone's NS RRs and there's a negative cache entry, you may still have a problem.
@ThomasWaldmann commented on GitHub (Jan 22, 2014):
thanks for looking at the issue.
About negative caching: when I first saw this issue (for a subdomain of another domain that had been created shortly before), I had the same suspicion.
But tests.nsupdate.info has existed for weeks. My unit tests worked a few days ago, and yesterday they suddenly failed massively (they do such queries). After what I described above, they suddenly worked again. Thus I am no longer convinced that this is caused by (wrong) negative caching. Next time it happens I'll try to find out more about it.
The authoritative nameservers for nsupdate.info both have the NS entry for tests.nsupdate.info; I checked that with dig.
But: there is only one nameserver for tests.nsupdate.info (no secondary here). Could this cause issues? (Reason: a secondary is not needed, as this zone is just for unit tests, and I also didn't want a flood of useless AXFRs.)
In the nsupdate.info software, we usually talk to the master NS of the domain directly (and do not use the system resolver) to get fresh and correct information. But due to how our internal API is made (it passes the fqdn around, NOT hostname and origin separately), at one place we use zone_for_name() to find out the origin, so that we know which zone to look up in our database to find out which master NS to talk to. I guess I'll have to rewrite the internal API some day...
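Since the zones are already in the application's database, one possible shape for that rewrite (an illustrative sketch with invented names, not nsupdate.info's actual code) is a longest-suffix match against the locally known zone list, with no DNS queries and therefore no caching effects:

```python
def split_fqdn(fqdn, known_zones):
    """Split an fqdn into (hostname, origin) using a locally known zone
    list, choosing the longest matching zone suffix."""
    fqdn = fqdn.rstrip(".")
    best = None
    for zone in known_zones:
        if fqdn == zone or fqdn.endswith("." + zone):
            if best is None or len(zone) > len(best):
                best = zone  # prefer the most specific (longest) zone
    if best is None:
        raise LookupError("no known zone for %s" % fqdn)
    host = fqdn[: -len(best)].rstrip(".") or "@"  # "@" = the zone apex
    return host, best

print(split_fqdn("x.tests.nsupdate.info",
                 {"nsupdate.info", "tests.nsupdate.info"}))
# ('x', 'tests.nsupdate.info')
```

The longest match matters: with both zones known, "x.tests.nsupdate.info" must map to tests.nsupdate.info, not the shorter nsupdate.info suffix.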
About "flush the local resolver cache": I don't have nscd installed. I did some research, but all I could find were pointers to restarting nscd. So is there some other "resolver caching" mechanism I should look for? I use Ubuntu Linux 12.04.
@rthalley commented on GitHub (Jan 23, 2014):
Re just one nameserver: I don't think that would cause issues, as dnspython should raise a timeout exception if the server is down.
If you see it happen again, try getting the results of a "dig tests.nsupdate.info ANY" to see what your resolver has cached.
Re "flush the local resolver cache" I didn't mean nscd, I meant "the resolver indicated by /etc/resolv.conf".
@ykmm commented on GitHub (Jul 26, 2014):
ep2014: trying to look into this.
It is difficult to reproduce. In order to understand what happens, I wrote a decorator that sniffs network data for the duration of a function call; it could be used to decorate the zone_for_name() function:
https://gist.github.com/ykmm/f66cb485a3f3353402a5
@ThomasWaldmann commented on GitHub (Aug 27, 2014):
I am seeing this bug on Travis CI all the time; quite a few unit tests fail there just because of this.
It could also be related to #122. I put just these two bugs into the 0.8 release milestone.
to fix both: refactor the api so we do not need zone_for_name(fqdn) any more, see post 2 of #122.
@ThomasWaldmann commented on GitHub (Aug 30, 2014):
fixed by
cf2c46e612