mirror of
https://github.com/hickory-dns/hickory-dns.git
synced 2026-04-25 03:05:51 +03:00
[GH-ISSUE #2014] do not leak and requested domain to the log #849
Labels
No labels
blocked
breaking-change
bug
bug:critical
bug:tests
cleanup
compliance
compliance
compliance
crate:all
crate:client
crate:native-tls
crate:proto
crate:recursor
crate:resolver
crate:resolver
crate:rustls
crate:server
crate:util
dependencies
docs
duplicate
easy
easy
enhance
enhance
enhance
feature:dns-over-https
feature:dns-over-quic
feature:dns-over-tls
feature:dnsssec
feature:global_lb
feature:mdns
feature:tsig
features:edns
has workaround
ops
perf
platform:WASM
platform:android
platform:fuchsia
platform:linux
platform:macos
platform:windows
pull-request
question
test
tools
tools
trust
unclear
wontfix
No milestone
No project
No assignees
1 participant
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
starred/hickory-dns#849
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Originally created by @LuckyTurtleDev on GitHub (Sep 2, 2023).
Original GitHub issue: https://github.com/hickory-dns/hickory-dns/issues/2014
Is your feature request related to a problem? Please describe.
After updating trust dns server to 0.23.0 it has starting leaking the ip address an the requested domain to the logs.
Because of privacy reasons this is problematic.
Describe the solution you'd like
Do not leak ip and requested domain
Describe alternatives you've considered
@bluejekyll commented on GitHub (Sep 3, 2023):
Is there any guidance you want to provide on why this is a privacy issue? This is valuable debug information, so we could split this into two separate log lines? one with full context, one with only the request parameters and not the source IP? Other option would be to have a parameter to choose to log or not? or config for the log line itself?
I'm open to options.
@LuckyTurtleDev commented on GitHub (Sep 3, 2023):
I have write a pi-hole clone. It can also run at a public server so android can access it via dot.
But after the updating trust dns server to 0.23.0 it has start logging every ip which, connect to it and the requested domain.
Since this are very private data I think this is a big privacy issue.
I can increase the log level to warning, but then it does not even show the open ports/protocols and some other important inflammations anymore. And logging them with
warnis also wrong because they are no warnings.@msrd0 commented on GitHub (Sep 5, 2023):
This is not sufficient, as I have seen IP addresses and domains queried in error messages as well.
@XOR-op commented on GitHub (Sep 5, 2023):
What is your threat model? I think your program can always see the details of those requests. So in my assumption, you want to dump your logs into a file, but don't want other users/process under the same user read those logs. For this case, I would suggest you set a write-only permission for the log file (-w-------). Correct me if I make a wrong assumption.
@msrd0 commented on GitHub (Sep 5, 2023):
This is less of a security "thread" and more of a privacy issue. If I were hosting a DNS server using trust-dns software, I don't want any log of my user's ip addresses or requests. I will always have the opportunity to modify the code on my server to report those details, so a little bit of trust is always necessary.
That being said, here in the EU, IP addresses can be considered personal data under the GDPR. Given the nature of DNS queries, it is impractical to ask for a user's consent to store their IP address. This means it is only lawful to do so if storing the IP address is necessary under certain criteria. Unfortunately, I believe that logging IP addresses is unnecessary, so it will be hard to argue should someone complain.
The only option left currently is to manually implement the logger downstream and apply some regular expression that tries to remove sensitive data such as domain names or IP addresses from the log messages, given that even error messages can contain such details. It would, however, be much easier if trust-dns just removes those details from the log messages again.
@bluejekyll commented on GitHub (Sep 6, 2023):
I think this is a reasonable feature request, I just don't know off the top of my head how we should implement it. It would require a fairly extensive audit to identify all the potential areas where we log source IPs, and then have a solution in place to not log them.
I'm definitely open to us making this change. I think one thing we could do is say that all info logs and lower (debug, etc) can contain IPs but warnings or maybe error and up messages should be treated as restricted to not containing IPs. Then disabling info logs and only enabling warning or error would ensure that no IPs would be logged. That seems like the simplest thing to do, would that be sufficient?
@djc commented on GitHub (Sep 6, 2023):
I would argue that info shouldn't contain PII either (as IIRC info is commonly enabled by default), so only debug and trace should. Also I think we can be explicit that this is only about source IPs? That is, it mainly concerns the server logging the IP of requests it receives -- which reduces the scope quite a bit. IMO IPs contained in DNS responses are unlikely to be PII, as are name server IPs.
@bluejekyll commented on GitHub (Sep 6, 2023):
I think the problem I have right now is that we have, imo, a conflict between operational issues and privacy. I think the IP information for requests coming in on a publicly accessible server is important in case there is some malicious behavior. This could be used to help identify if there are folks working to DOS or DDOS a system. Then there is the use case mentioned here in regards to making sure we're not disclosing the behavior of an individual user. To me, the trust-dns service operator will always have access to this information (if they want it) regardless of what we do in the trust-dns server code. That is, it's not hard to sniff traffic on a server, and collect all of these details. quic and https can make that harder, but still the potential exists.
So I see a couple of options for us. The trust-dns server currently supports these log levels out of the box:
infois default,debug, andquiet, where quiet will only report warning or error messages. I'd like us to come up with a policy that balances all the needs that folks have. I think to answer what should appear in log messages at what time, we need to identify the various people involved. First, I'm going to assume we are all ok with any information being present in DEBUG log lines. This gives the power to an operator to collect a lot of information, but that is a certain amount of trust that is being put into the DNS resolver/server being used, so I think this is acceptable. So let's not discuss DEBUG log lines for now.That leaves INFO, WARN, and ERROR. When the INFO log level is disabled, there is no output from the server at all on any request. Based on this conversation, what I'm gathering is that INFO is enabled by default in many cases (it is by the trust-dns binary as well). What I want to know is that given this, it can be disabled today, why is INFO being enabled at all? Is it reasonable for us to suggest to people that they do not enable INFO on trust-dns when privacy is important? It sounds like based on this conversation, people think that isn't putting us on a privacy first posture. I'm not sure I agree with that at the moment, but I can be convinced. So assuming we remove this information from INFO, then my question becomes what information do people want in the standard INFO log line? For reference this is the default INFO log line on a request from trust-dns:
Let's split this out and discuss what things are ok, I'll have checkmarks for each item, unchecked things are what I assume the folks on this issue believe should not be included:
1694015948- timestampINFO:trust_dns_server::server::server_future:909:request- level, location, and type43742- DNS message ID from the request packetsrc:UDP- request is via UDP//127.0.0.1#60636- source IP address and port (disclosing PII)www.example.com.- requested name (disclosing the name requested)A- record type (is this safe?)IN- DNS classqflags:RD- query flags setresponse:NoError- response coderr:1/0/0- response record counts, answer/authority/additionalrflags:RD,AA- response flags setI realize I'm being a little pedantic here, but I'm trying to figure out the best path forward. Without the source IP and query name, what is valuable out of this message to keep? Or said another way, what information do people need to track on each request made to the server and for what reason?
In terms of error and warn messages, it will be problematic not to include the requested details of what query failed, and then if there is malicious behavior, the errors and warnings could help identify that (like badly configured zone or server, slow TCP connections, scanning operations trying to disclose and query the entire catalog or zone file, etc). This is why I'm struggling with a proper response to this issue. It feels like there are competing priorities depending on the needs people have for the logs.
@msrd0 commented on GitHub (Sep 6, 2023):
I guess to prevent DoS attacks an similar, a rate limit per IP would be much more effective, at least for people like me that don't inspect what their server does 99% of the time. That being said, I agree that the log messages is kinda irrelevant, especially when you remove IP and Domain name from it, so why don't you just make those log messages optional? If you suspect your server is under attack, you could enable logging, and if that's not the case, you can just leave it disabled?
@bluejekyll commented on GitHub (Sep 7, 2023):
Currently I'd argue the INFO log lines are optional, you must enable them for them to be visible. In the trust-dns binary, we default to INFO being on but it's easily disabled with the quiet flag. I think we'd still need to go through and cleanse WARN and ERROR messages of any IP data, and then we'd set a policy in the project recognizing that... (maybe we could add some tests as well somehow)
@djc commented on GitHub (Sep 7, 2023):
From the message contents you listed, I think it's really only the source IP that's problematic in terms of privacy. I do think a good default would be to avoid logging PII at levels above debug. It's fine that the operator still has ways to access this data, but at the same time it's good to avoid making it easy to leak by accident.
@bluejekyll commented on GitHub (Sep 12, 2023):
Thanks, @djc, do you feel like it's ok to keep the requested name in the logs? Is that something we might want to leave in INFO, but remove from WARN and ERROR?
@LuckyTurtleDev commented on GitHub (Sep 12, 2023):
I think requested domains are very private data too, especially if the server is only used by a few users . I would like to avoid logging them at info/warn/error.
@djc commented on GitHub (Sep 12, 2023):
That's fair, I guess the requested domain is also something that should not be logged at the default log levels (IMO there's no difference between INFO and WARN in terms of privacy). There's a reason a bunch of people are working on EncryptedClientHello for TLS, and it's mostly because they want to avoid leaking the server name.
@bluejekyll commented on GitHub (Sep 12, 2023):
If we are agreeing to not log name/ip etc, how much value is there in the rest of the message? should we just drop it to a DEBUG line, and then do an audit on other messages to see if there's data we don't want there?
@djc commented on GitHub (Sep 13, 2023):
I think there's still value in terms of figuring out how many messages are coming in, the message ID is useful to correlate with packet captures.
@jpds commented on GitHub (Sep 21, 2023):
I cannot think of any other DNS server software that implements this (or even a HTTP server that does this). Unbound, BIND, CoreDNS, and NSD do not do this.
As for GDPR and from a production systems and networks perspective; logging client IPs would fall squarely under https://gdpr-info.eu/recitals/no-49/ and I'm not going to wait for an attack to happen, or worse, "suspecting" that one has happened before I turn on logging. You'd have very little in terms of figuring out what actually happened to the compromised system (which, at this point, cannot be trusted anymore and you'd also have senior management breathing down your neck for answers).
It's far better to do this at a metric level, I filed #2032 about this.
@bluejekyll commented on GitHub (Sep 22, 2023):
to be clear, I think you mean, “logs the client IP address”, correct?
I think we’re all in agreement with this at this point. I’ll even happily make the change, the rest of the discussion was just around the value here.
@jpds commented on GitHub (Sep 22, 2023):
The opposite, obfuscating a client IP address / the domain it requested:
@bluejekyll commented on GitHub (Sep 22, 2023):
Thank you for clarifying, I was able to interpret your comment both ways. This gets me back to some of my original questions about just making these guarantees in different logging levels, and saying INFO will have this, but WARN and ERROR will not?
@LuckyTurtleDev commented on GitHub (Oct 15, 2023):
I still thinking adding the information at a log message manual by the user would be easier than filter the out by using a regex. Especial for domains this would be not easy, because they end at
.like many other stuff and a regex would match to much or not all domains.An other option would be a option to configure, the logging of this information independent of the log level.
For example by function or feature flag.