mirror of
https://github.com/nsupdate-info/nsupdate.info.git
synced 2026-04-25 16:45:55 +03:00
[GH-ISSUE #138] dnspython: strange effect with dns.resolver.zone_for_name(fqdn) #128
Originally created by @ThomasWaldmann on GitHub (Jan 21, 2014).
Original GitHub issue: https://github.com/nsupdate-info/nsupdate.info/issues/138
That call seems to either malfunction or at least behave unexpectedly sometimes.
I did zone_for_name("x.tests.nsupdate.info") and it gave "nsupdate.info" (incorrect) instead of "tests.nsupdate.info" (correct).
Then I edited /etc/resolv.conf to set nameserver = 127.0.0.1 (pointing to a freshly started bind9) and it gave the correct answer.
Then I switched the resolv.conf nameserver back to the previous IP (my router's IP) and it also gave the correct answer. Huh!?
This call giving the correct zone is essential for correct operation of nsupdate.info, as we use it to determine the origin zone (in production as well as in the tests).
@rthalley commented on GitHub (Jan 22, 2014):
This doesn't seem likely to be a dnspython bug. All dnspython does is do SOA queries for the name and its parents to the system's configured resolver until one of them gets a positive answer.
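The walk described here can be sketched in a few lines of plain Python (hypothetical helper names, with the SOA query stubbed out as a simple membership test; the real logic lives in dns.resolver.zone_for_name()):

```python
def candidate_zones(fqdn):
    """Yield fqdn and each of its parents, ending with the root zone."""
    labels = fqdn.rstrip(".").split(".")
    for i in range(len(labels)):
        yield ".".join(labels[i:])
    yield "."

def zone_for_name_sketch(fqdn, has_soa):
    """Return the first candidate name whose (stubbed) SOA query succeeds."""
    for name in candidate_zones(fqdn):
        if has_soa(name):
            return name
    raise LookupError("no SOA found, not even at the root")

# With a healthy resolver, the walk stops at tests.nsupdate.info:
healthy = {"tests.nsupdate.info", "nsupdate.info", "info", "."}
print(zone_for_name_sketch("x.tests.nsupdate.info", healthy.__contains__))
# tests.nsupdate.info

# If the resolver answers negatively for tests.nsupdate.info (e.g. from
# its negative cache), the walk overshoots to the parent zone:
broken = {"nsupdate.info", "info", "."}
print(zone_for_name_sketch("x.tests.nsupdate.info", broken.__contains__))
# nsupdate.info
```

This is exactly the failure mode reported above: a single negative answer at one level makes the walk skip to the parent zone.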
Intuitively, I think what's happening is that there's a negative cache entry above your target name in the system resolver's cache. E.g. for your x.tests.nsupdate.info example, if someone or something had queried anything for "tests.nsupdate.info" while it didn't exist yet, that would have created a negative cache entry in the configured resolver's cache. Note that if this is an ISP's resolver, you might not have been the one whose query caused that entry to be created.
If you then subsequently create the "tests.nsupdate.info" zone and then try to do zone_for_name() on "x.tests.nsupdate.info", then you won't find the right zone until the negative cache entry for tests.nsupdate.info expires. This theory is supported by the evidence you cite, namely that when you start a local BIND it works, and that when you came back to the router IP (presumably after the negative cache expiration) it worked too.
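The timing effect can be modeled with a toy caching resolver (all names invented here; this is not dnspython code, just an illustration of the theory):

```python
class ToyResolver:
    """Toy model of a caching resolver that remembers NXDOMAIN answers."""

    def __init__(self, authoritative, neg_ttl=3600):
        self.authoritative = authoritative  # name -> SOA record (toy data)
        self.neg_ttl = neg_ttl              # negative-cache TTL in seconds
        self.neg_cache = {}                 # name -> expiry timestamp

    def query_soa(self, name, now):
        expiry = self.neg_cache.get(name)
        if expiry is not None and now < expiry:
            return None  # answered from the negative cache, zone invisible
        if name in self.authoritative:
            return self.authoritative[name]
        self.neg_cache[name] = now + self.neg_ttl  # remember the NXDOMAIN
        return None

r = ToyResolver(authoritative={"nsupdate.info": "soa"})
r.query_soa("tests.nsupdate.info", now=0)       # NXDOMAIN, now cached
r.authoritative["tests.nsupdate.info"] = "soa"  # zone is created afterwards
r.query_soa("tests.nsupdate.info", now=100)     # still None: entry not expired
r.query_soa("tests.nsupdate.info", now=4000)    # entry expired: "soa"
```

A freshly started local BIND starts with an empty negative cache (immediate correct answer), and by the time you switched back to the router the old entry had presumably expired, which matches both observations.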
If you control the system resolver, you can tell it to flush the cache for a name (and all names below it) that you just created.
If all of the authoritative servers for nsupdate.info know about all subdomains of nsupdate.info too, then you might be able to point dnspython at the authority servers instead of the system resolver. This should work so long as you only ask it about zones under nsupdate.info. Note that this does NOT work if you have some subdomains of nsupdate.info served by one system and some by another.
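Pointing dnspython at the authoritative servers could look roughly like this (an untested, network-dependent sketch; the IP addresses are placeholders you would replace with the real authoritative servers for nsupdate.info):

```python
import dns.resolver

# A resolver that ignores /etc/resolv.conf and asks the authoritative
# servers directly, so no shared recursive cache is involved.
auth = dns.resolver.Resolver(configure=False)
auth.nameservers = ["192.0.2.1", "192.0.2.2"]  # placeholder addresses

# zone_for_name() accepts an explicit resolver to use for its SOA walk.
zone = dns.resolver.zone_for_name("x.tests.nsupdate.info", resolver=auth)
print(zone)
```

As noted above, this only helps while every subdomain of nsupdate.info is served by those same servers.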
The only other workaround would be to determine zones in a way that doesn't use caching, by querying all of the authorities directly from the root down, but this is a tedious and complicated process that you really don't want to attempt (it amounts to writing the heart of a recursive resolver like BIND). Easier in dnspython, I grant, but still quite painful! This is why dnspython does not offer a full resolver! It's also quite rude not to cache if you're doing a lot of lookups! :)
Finally, as one of the notes on issue #122 says, you might be able to address this in the API if you can remember what the zone is supposed to be. Though even that might not free you completely from caching effects, depending on what you do with that info. E.g. if you query for the zone's NS RRs and there's a negative cache entry, you may still have a problem.
@ThomasWaldmann commented on GitHub (Jan 22, 2014):
thanks for looking at the issue.
About negative caching: when I first saw this issue (for a subdomain of another domain that had been created shortly before), I had the same suspicion.
But tests.nsupdate.info has existed for weeks. My unit tests worked a few days ago, and yesterday they suddenly failed massively (they do such queries). After what I described above, they suddenly worked again. Thus I am no longer convinced that this is caused by (wrong) negative caching. Next time it happens I'll try to find out more about it.
The authoritative nameservers for nsupdate.info both have the NS entry for tests.nsupdate.info; I checked that with dig.
But: there is only one nameserver for tests.nsupdate.info (no secondary here). Could this cause issues? (Reason: a secondary is not needed, as this zone is just for unit tests, and I also didn't want a flood of useless AXFRs.)
In the nsupdate.info software, we usually talk to the master NS of the domain directly (and do not use the system resolver) to get fresh and correct information. But due to how our internal API is made (it passes the fqdn around, NOT hostname and origin separately), at one place we use zone_for_name() to find out the origin, so that we know which zone to look up in our database to find out which master NS to talk to. I guess I'll have to rewrite the internal API some day...
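Since the zones are already in the application's database, one possible shape for that rewrite (an illustrative sketch with invented names, not nsupdate.info's actual code) is a longest-suffix match against the locally known zone list, with no DNS queries and therefore no caching effects:

```python
def split_fqdn(fqdn, known_zones):
    """Split an fqdn into (hostname, origin) using a locally known zone
    list, choosing the longest matching zone suffix."""
    fqdn = fqdn.rstrip(".")
    best = None
    for zone in known_zones:
        if fqdn == zone or fqdn.endswith("." + zone):
            if best is None or len(zone) > len(best):
                best = zone  # prefer the most specific (longest) zone
    if best is None:
        raise LookupError("no known zone for %s" % fqdn)
    host = fqdn[: -len(best)].rstrip(".") or "@"  # "@" = the zone apex
    return host, best

print(split_fqdn("x.tests.nsupdate.info",
                 {"nsupdate.info", "tests.nsupdate.info"}))
# ('x', 'tests.nsupdate.info')
```

The longest match matters: with both zones known, "x.tests.nsupdate.info" must map to tests.nsupdate.info, not the shorter nsupdate.info suffix.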
About "flush the local resolver cache": I don't have nscd installed. I did some research, but all I could find were pointers to restarting nscd. So is there some other "resolver caching" mechanism I should look for? I use Ubuntu Linux 12.04.
@rthalley commented on GitHub (Jan 23, 2014):
Re just one nameserver: I don't think that would cause issues, as dnspython should raise a timeout exception if the server is down.
If you see it happen again, try getting the results of a "dig tests.nsupdate.info ANY" to see what your resolver has cached.
Re "flush the local resolver cache" I didn't mean nscd, I meant "the resolver indicated by /etc/resolv.conf".
@ykmm commented on GitHub (Jul 26, 2014):
ep2014: trying to look into this.
It is difficult to reproduce. In order to understand what happens, I wrote a decorator that sniffs network data for the duration of a function call; it could be used to decorate the zone_for_name() function:
https://gist.github.com/ykmm/f66cb485a3f3353402a5
@ThomasWaldmann commented on GitHub (Aug 27, 2014):
I am seeing this bug on Travis CI all the time; quite a few unit tests fail there just because of this.
It could also be related to #122. I put just these two bugs into the 0.8 release milestone.
to fix both: refactor the api so we do not need zone_for_name(fqdn) any more, see post 2 of #122.
@ThomasWaldmann commented on GitHub (Aug 30, 2014):
fixed by
cf2c46e612