I was asked to assist in debugging a strange issue involving a BIND resolver: seemingly correlating with an upgrade to Debian 10 a while ago, the chaps were reporting that their 9.11.5 BIND resolvers were responding with impossible TTLs on NOERROR/NODATA responses. My answer: nope – can’t happen.

Spoiler: it can.

Let’s look at an SOA record. It’s incomplete, but the important bits are showing (and for those in the know, the negative TTL here is also 3600). You’re welcome to follow along – example.com is delegated. :-)

Querying the authority server for the SOA shows

;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 2, ADDITIONAL: 1

example.com.   3600  IN   SOA

Now, when querying the resolver for this owner with a non-existent type, SRV, say, I should get a NOERROR/NODATA response, so here it is:

;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

example.com.            10800   IN      SOA

Whazzat! Where does the TTL of 10800 come from!?! I blamed everything and the kitchen sink. The network, the authority server, the selection of menus in the company restaurant (even though I’m at home) – you name it, I blamed it, but I honestly didn’t believe BIND was doing this. It must be that hand-rolled ancient Perl-based database-driven unmaintained authoritative server from which they’re feeding it records … I even blamed AppArmor so that was disabled as well.
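
To spell out why 10800 looked impossible, here is a minimal sketch of the negative-caching rule as I understand it from RFC 2308: the SOA in the authority section of a NODATA answer should carry the smaller of the SOA record's own TTL and its MINIMUM field. The function name and variables are mine, purely for illustration; this is not BIND code.

```python
# Sketch of the RFC 2308 negative-caching rule; names are illustrative only.

def negative_ttl(soa_ttl: int, soa_minimum: int) -> int:
    """TTL a NOERROR/NODATA answer should carry: min(SOA TTL, SOA MINIMUM)."""
    return min(soa_ttl, soa_minimum)

# Values from the SOA above: record TTL 3600, MINIMUM (negative TTL) 3600.
expected = negative_ttl(soa_ttl=3600, soa_minimum=3600)
observed = 10800  # what the resolver actually handed back

print(expected)             # 3600
print(observed > expected)  # True: larger than anything the zone allows
```

Whatever the resolver does with its cache, I would not expect it to hand out more than that expected value.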

In cases like these I remove as much complexity as possible, and thankfully the chaps I was working with were game to attempt anything; they’d been working at this problem for a while. I should add that I’d spent an hour in the morning trying and failing to reproduce the issue on multiple versions of BIND.

First things first, let’s start with the following configuration file: (I can almost hear Evan tell me “that’s too verbose!” :-)

options {
        directory ".";

This configuration also surfaced the 10800 TTL. Hmm.

We started off by capping max-ncache-ttl; I choose idiotic numbers I will recognize. Let’s set that to 37.

example.com.            37      IN      SOA

OK, that looks sane. Let’s up it a bit to, say, max-ncache-ttl 12341234; even though it “will be silently truncated to 7 days if set to a greater value”.

example.com.            604800  IN      SOA

DAFUQ? The behavior of that TTL increasing this way doesn’t make sense to me at all.
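
A small sketch of what I would have expected instead, assuming max-ncache-ttl acts purely as an upper bound on the negative TTL derived from the SOA (3600 here) and is itself limited to 7 days. The helper names are hypothetical, and this models my expectation rather than BIND's actual code path.

```python
# Expected vs. observed TTLs when max-ncache-ttl is only an upper bound.
SEVEN_DAYS = 7 * 24 * 60 * 60  # 604800, the documented ceiling for max-ncache-ttl

def effective_cap(max_ncache_ttl: int) -> int:
    """Larger settings are truncated to 7 days."""
    return min(max_ncache_ttl, SEVEN_DAYS)

def expected_ttl(negative_ttl: int, max_ncache_ttl: int) -> int:
    """A cap can lower the TTL taken from the SOA, but never raise it."""
    return min(negative_ttl, effective_cap(max_ncache_ttl))

# (configured max-ncache-ttl, TTL the resolver returned)
for configured, observed in [(37, 37), (12341234, 604800)]:
    print(configured, expected_ttl(3600, configured), observed)
# 37 37 37
# 12341234 3600 604800
```

In both runs the observed TTL equals the effective cap rather than the smaller of cap and SOA value. As far as I know the documented default for max-ncache-ttl is 10800, which would fit the same pattern for the original symptom.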

Well into the morning, people were getting hungry so we took a 30 minute break, and I got a cup of coffee and came to the conclusion the cause must be a combination of strange defaults. On the way back to my desk it occurred to me what the issue might be, and a moment later I was able to reproduce the 10800 TTL on my own test install of 9.11.5, and then I “fixed” it. Here’s how:

options {
        directory ".";
        dnssec-validation auto;

In 9.11.5 dnssec-validation defaults to yes, and the Bv9ARM clearly says

If set to yes, DNSSEC validation is enabled, but a trust anchor must be manually configured using a trusted-keys or managed-keys statement. The default is yes.

A trust anchor was not configured, neither manually nor at all. As they say in Turkey “trust anchor yok”. My chaps had noticed the option now defaulting to yes, but they’d ignored it during the update thinking “we’ll check that later when all quiets down, let’s concentrate on the important things first”. Ouch. That caused the pain. (If you’re reading this, beware: in future versions the semantics of this option change.)

So why does the auto fix it? Quoting from the ARM again:

If set to auto, DNSSEC validation is enabled, and a default trust anchor for the DNS root zone is used.

All’s well that ends well.

I’d normally submit a bug report, but I’ve decided it’s probably not worthwhile; it’s basically a configuration error, although it wasn’t logged as such. (If somebody at ISC sees this and wants me to, they know where to find me. :-)

OK, so I lied: we were at it for 16800 seconds, and it was a very pleasant debugging / hair-pulling session with fun and very capable chaps. I cannot tell you why this TTL business is important because you could deduce a company name from that, and I don’t divulge those. Unless we’re out having a drink. ;-)