[GH-ISSUE #528] Problems with max_ttl on embedded systems #520

Closed
opened 2026-03-15 22:55:51 +03:00 by kerem · 12 comments
Owner

Originally created by @kingoflolz on GitHub (Jul 12, 2018).
Original GitHub issue: https://github.com/hickory-dns/hickory-dns/issues/528

Hi, I'm using trust-dns-resolver on some openwrt systems and found that version 0.9 crashes because it overflows time structs in new_with_max_ttl, as the time type defines seconds as an i32.

I've worked around this by making MAX_TTL lower (to one year), and from reading the RFC 2181 it seems that it is standard conforming behavior " Implementations are always free to place an upper bound on any TTL received, and treat any larger values as if they were that upper bound".

However, I'm not familiar enough with the codebase to know what happens if trust-dns-resolver stays running for over a year, and if this will break any assumptions the rest of the code has in the codebase. If there are no problems with using a shorter max TTL, I suggest it be the default as the one we have right now makes it unusable on devices with a shorter time type

Originally created by @kingoflolz on GitHub (Jul 12, 2018). Original GitHub issue: https://github.com/hickory-dns/hickory-dns/issues/528 Hi, I'm using trust-dns-resolver on some openwrt systems and found that version 0.9 crashes because it overflows time structs in `new_with_max_ttl`, as the time type defines seconds as an i32. I've worked around this by making MAX_TTL lower (to one year), and from reading the RFC 2181 it seems that it is standard conforming behavior " Implementations are always free to place an upper bound on any TTL received, and treat any larger values as if they were that upper bound". However, I'm not familiar enough with the codebase to know what happens if trust-dns-resolver stays running for over a year, and if this will break any assumptions the rest of the code has in the codebase. If there are no problems with using a shorter max TTL, I suggest it be the default as the one we have right now makes it unusable on devices with a shorter time type
kerem 2026-03-15 22:55:51 +03:00
Author
Owner

@bluejekyll commented on GitHub (Jul 12, 2018):

Thanks for the report. I'd love to hear more sometime about your use case.

This seems to be the relevant section of the RFC: https://tools.ietf.org/html/rfc2181#section-8

The definition of values appropriate to the TTL field in STD 13 is
   not as clear as it could be, with respect to how many significant
   bits exist, and whether the value is signed or unsigned.  It is
   hereby specified that a TTL value is an unsigned number, with a
   minimum value of 0, and a maximum value of 2147483647.  That is, a
   maximum of 2^31 - 1.  When transmitted, this value shall be encoded
   in the less significant 31 bits of the 32 bit TTL field, with the
   most significant, or sign, bit set to zero.

   Implementations should treat TTL values received with the most
   significant bit set as if the entire value received was zero.

   Implementations are always free to place an upper bound on any TTL
   received, and treat any larger values as if they were that upper
   bound.  The TTL specifies a maximum time to live, not a mandatory
   time to live.

You probably want to address this section of code: https://github.com/bluejekyll/trust-dns/blob/master/proto/src/xfer/dns_multiplexer.rs#L333-L334

I'm also ok with reducing the TTL to a year, that shouldn't be a problem. Could you make sure to put a link to the RFC and a note about why the TTL is being set to what it will be? I'd love a PR on this, thanks!

I did a quick scan of the code, the only other spot (this is for the server/client which it looks like you aren't using so you don't need to address it) would be: https://github.com/bluejekyll/trust-dns/blob/master/client/src/serialize/txt/master.rs#L274

<!-- gh-comment-id:404552377 --> @bluejekyll commented on GitHub (Jul 12, 2018): Thanks for the report. I'd love to hear more sometime about your use case. This seems to be the relevant section of the RFC: https://tools.ietf.org/html/rfc2181#section-8 ```text The definition of values appropriate to the TTL field in STD 13 is not as clear as it could be, with respect to how many significant bits exist, and whether the value is signed or unsigned. It is hereby specified that a TTL value is an unsigned number, with a minimum value of 0, and a maximum value of 2147483647. That is, a maximum of 2^31 - 1. When transmitted, this value shall be encoded in the less significant 31 bits of the 32 bit TTL field, with the most significant, or sign, bit set to zero. Implementations should treat TTL values received with the most significant bit set as if the entire value received was zero. Implementations are always free to place an upper bound on any TTL received, and treat any larger values as if they were that upper bound. The TTL specifies a maximum time to live, not a mandatory time to live. ``` You probably want to address this section of code: https://github.com/bluejekyll/trust-dns/blob/master/proto/src/xfer/dns_multiplexer.rs#L333-L334 I'm also ok with reducing the TTL to a year, that shouldn't be a problem. Could you make sure to put a link to the RFC and a note about why the TTL is being set to what it will be? I'd love a PR on this, thanks! I did a quick scan of the code, the only other spot (this is for the server/client which it looks like you aren't using so you don't need to address it) would be: https://github.com/bluejekyll/trust-dns/blob/master/client/src/serialize/txt/master.rs#L274
Author
Owner

@azdlowry commented on GitHub (Jul 31, 2018):

Seem to be having this problem on 32bit Android as well. It's hard to narrow down the exact area of code that has the issue as backtraces don't work on Android. But we get a panic with any of the resolvers:

thread 'service_location::test::run_trust_dns_example' panicked at 'overflow when adding duration to time', libcore/option.rs:960:5

<!-- gh-comment-id:409191301 --> @azdlowry commented on GitHub (Jul 31, 2018): Seem to be having this problem on 32bit Android as well. It's hard to narrow down the exact area of code that has the issue as backtraces don't work on Android. But we get a panic with any of the resolvers: thread 'service_location::test::run_trust_dns_example' panicked at 'overflow when adding duration to time', libcore/option.rs:960:5
Author
Owner

@azdlowry commented on GitHub (Jul 31, 2018):

I've reduced MAX_TTL to 2^28-1 and it no longer panics. Though I don't think this is the best solution.

<!-- gh-comment-id:409198881 --> @azdlowry commented on GitHub (Jul 31, 2018): I've reduced MAX_TTL to 2^28-1 and it no longer panics. Though I don't think this is the best solution.
Author
Owner

@bluejekyll commented on GitHub (Jul 31, 2018):

We could create a new TTL type that expresses maximum bounds with conversions from i32 as necessary, validating and storing a maximum based on platform.

<!-- gh-comment-id:409202239 --> @bluejekyll commented on GitHub (Jul 31, 2018): We could create a new TTL type that expresses maximum bounds with conversions from i32 as necessary, validating and storing a maximum based on platform.
Author
Owner

@azdlowry commented on GitHub (Jul 31, 2018):

Only glanced through the code, so this might not be correct, but it seems the issue is with the use Instant type as this wraps a libc time_t structure, which is i32 on 32bit linux and i64 on 64bit linux. Should trust-dns be using a different time structure that is consistent across platforms?

<!-- gh-comment-id:409231690 --> @azdlowry commented on GitHub (Jul 31, 2018): Only glanced through the code, so this might not be correct, but it seems the issue is with the use Instant type as this wraps a libc time_t structure, which is i32 on 32bit linux and i64 on 64bit linux. Should trust-dns be using a different time structure that is consistent across platforms?
Author
Owner

@bluejekyll commented on GitHub (Aug 2, 2018):

So one issue I have is access to a testing platform to verify a fix for this. The library relies on 'std::time::Duration' for all TTL and time evaluation, I don’t really want to change that, but we can add restrictions on top of that.

I’m fine with forcing these values to be safe for 32bit platforms. I don’t see any need for longer TTLs than that.

I propose a new TTL type that would enforce these limits and make it typesafe, but I’m open to other ideas.

<!-- gh-comment-id:409974277 --> @bluejekyll commented on GitHub (Aug 2, 2018): So one issue I have is access to a testing platform to verify a fix for this. The library relies on 'std::time::Duration' for all TTL and time evaluation, I don’t really want to change that, but we can add restrictions on top of that. I’m fine with forcing these values to be safe for 32bit platforms. I don’t see any need for longer TTLs than that. I propose a new TTL type that would enforce these limits and make it typesafe, but I’m open to other ideas.
Author
Owner

@azdlowry commented on GitHub (Aug 2, 2018):

I'm happy to make the changes, and I can look into adding 32bit linux to CI as well.

My only concern with limiting the TTL is in picking a duration that is sensible. We can pick 19 years today but we'll be back here with the same problem in January 2019. If we say 10 years we'll be good till year 2028. If 10 years is good enough on 32 bit systems why is it not good enough on 64 bit systems as well? I think I don't understand the reason for the long TTL enough to be able to pick sensible ones.

I'll also have a look into replacing the Instant type with something that is consistent across platforms, then we don't need to change the TTL at all.

<!-- gh-comment-id:409991584 --> @azdlowry commented on GitHub (Aug 2, 2018): I'm happy to make the changes, and I can look into adding 32bit linux to CI as well. My only concern with limiting the TTL is in picking a duration that is sensible. We can pick 19 years today but we'll be back here with the same problem in January 2019. If we say 10 years we'll be good till year 2028. If 10 years is good enough on 32 bit systems why is it not good enough on 64 bit systems as well? I think I don't understand the reason for the long TTL enough to be able to pick sensible ones. I'll also have a look into replacing the Instant type with something that is consistent across platforms, then we don't need to change the TTL at all.
Author
Owner

@bluejekyll commented on GitHub (Aug 4, 2018):

If 10 years is good enough on 32 bit systems why is it not good enough on 64 bit systems as well? I think I don't understand the reason for the long TTL enough to be able to pick sensible ones.

If it sounded like I had a differing opinion on this, I apologize. I agree that it should be the same limit on 64bit and 32bit. TTL is really just a caching hint, too short is better than too long. I'm happy with anything over a day. The only reason the current values are there is that they seemed to be in accordance with the RFCs. The issue here is that I've incorrectly added the MAX_TTL value, to an existing time. We should be safe to just drop the MAX_TTL to 86400.

Instant and Duration don't appear to have a saturating_add, which would really be the correct addition to use here...

<!-- gh-comment-id:410472838 --> @bluejekyll commented on GitHub (Aug 4, 2018): > If 10 years is good enough on 32 bit systems why is it not good enough on 64 bit systems as well? I think I don't understand the reason for the long TTL enough to be able to pick sensible ones. If it sounded like I had a differing opinion on this, I apologize. I agree that it should be the same limit on 64bit and 32bit. TTL is really just a caching hint, too short is better than too long. I'm happy with anything over a day. The only reason the current values are there is that they seemed to be in accordance with the RFCs. The issue here is that I've incorrectly added the MAX_TTL value, to an existing time. We should be safe to just drop the MAX_TTL to 86400. Instant and Duration don't appear to have a saturating_add, which would really be the correct addition to use here...
Author
Owner

@bluejekyll commented on GitHub (Aug 5, 2018):

#542 and #543 are partial fixes for this issue. I would like us to come up with a type-safe response, but they will at least fix the current issue.

<!-- gh-comment-id:410523931 --> @bluejekyll commented on GitHub (Aug 5, 2018): #542 and #543 are partial fixes for this issue. I would like us to come up with a type-safe response, but they will at least fix the current issue.
Author
Owner

@bluejekyll commented on GitHub (Aug 5, 2018):

This std::time issue appears related: https://github.com/rust-lang/rust/issues/48980

<!-- gh-comment-id:410540704 --> @bluejekyll commented on GitHub (Aug 5, 2018): This `std::time` issue appears related: https://github.com/rust-lang/rust/issues/48980
Author
Owner

@bluejekyll commented on GitHub (Aug 5, 2018):

I'm going to close this, we can open a different issue we want to harden the TTL max with a type, but I think we might want to get saturating add into the std::time impls.

<!-- gh-comment-id:410542655 --> @bluejekyll commented on GitHub (Aug 5, 2018): I'm going to close this, we can open a different issue we want to harden the TTL max with a type, but I think we might want to get saturating add into the std::time impls.
Author
Owner

@azdlowry commented on GitHub (Aug 6, 2018):

No apologies needed, I didn't want to dive into any changes without understanding all the details first. Saturating add does seem to be the best solution. Thanks.

<!-- gh-comment-id:410668873 --> @azdlowry commented on GitHub (Aug 6, 2018): No apologies needed, I didn't want to dive into any changes without understanding all the details first. Saturating add does seem to be the best solution. Thanks.
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
starred/hickory-dns#520
No description provided.