It’s a wee bit ironic that a day after I tell a group of students that DNSSEC is very stable today compared to what it was like years ago (grandpa telling tales at the fireplace), I get a mail from the .CH registry:
My first thought was “they must be mistaken; I have only one dummy zone there”, and my second thought was “wow, that’s pretty neat that they monitor customer zones!”. Still convinced it must be a case of mixed-up identity, I click on the link which takes me to
OK, that’s definitely my zone; it’s the one I set up for testing DNSSEC provisioning automation with CDS/CDNSKEY.
I query the NS RRset for the domain and get a … SERVFAIL. What?!
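A SERVFAIL from a validating resolver does not by itself say that DNSSEC is to blame. One common way to check is to repeat the query with the CD (checking disabled) bit set, using dig's `+cd` flag: if that query returns data, validation is the likely culprit. A minimal sketch, in which `dig_status` and `classify` are hypothetical helper names and the resolver is assumed to be a validating one:

```shell
#!/bin/sh
# Sketch: tell a DNSSEC validation failure apart from other SERVFAILs.
# dig_status and classify are hypothetical helper names.

# Extract the RCODE ("status:") from dig's header line, read on stdin.
dig_status() {
    sed -n 's/^;; ->>HEADER<<-.*status: \([A-Z]*\),.*/\1/p'
}

# Compare the RCODE of a normal query with that of a +cd query.
classify() {
    if [ "$1" = SERVFAIL ] && [ "$2" = NOERROR ]; then
        echo "likely DNSSEC validation failure"
    elif [ "$1" = "$2" ]; then
        echo "same answer either way ($1): not validation-specific"
    else
        echo "inconclusive ($1 vs $2)"
    fi
}

# Usage against a validating resolver (needs network access):
#   normal=$(dig tcp53.ch NS | dig_status)
#   nocheck=$(dig +cd tcp53.ch NS | dig_status)
#   classify "$normal" "$nocheck"
```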
Next up: DNSviz, of course, and there’s no doubt: the cryptographic signature in the RRSIG covering the NS RRset does not compute:
I’m getting a bit nervous because the third day of training commences in an hour, but let me see what I see. Two of the four NS are under my control, so I force a transfer from the primary and notify all secondaries. The SOA serials match across the board. I contact the operator of the other two secondaries and ask him to force a transfer; no change – the kaputtness remains.
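Comparing SOA serials across the four servers is easy to script. The sketch below is one way to do it, with `soa_serial` and `serials_agree` as hypothetical helper names. As this story shows, matching serials only say that every server holds the same version of the zone, not that the version is a good one.

```shell
#!/bin/sh
# Sketch: check that every nameserver serves the same SOA serial.
# soa_serial and serials_agree are hypothetical helper names.

# Ask one server directly (no recursion) and print the serial, which is
# the third field of dig's short SOA output.
soa_serial() {   # usage: soa_serial ZONE SERVER
    dig +short +norec "@$2" "$1" SOA | awk '{ print $3 }'
}

# Read "server serial" lines on stdin; succeed only if every line has a
# serial and all serials are identical.
serials_agree() {
    awk 'NF < 2 { bad = 1 }
         { seen[$2] = 1 }
         END { n = 0; for (s in seen) n++; exit (bad || n != 1) }'
}

# Usage (needs network access):
#   for ns in ns1.dnspartner.de ns2.dnspartner.de lumpy.jpmens.net woozle.jpmens.net; do
#       echo "$ns $(soa_serial tcp53.ch "$ns")"
#   done | serials_agree && echo "serials match"
```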
There’s no way for me to force re-signing of the zone, so I simply don’t know how to “fix” that one signature. I copy the zone file aside on primary and secondary, and then un-sign the primary zone, bump the serial, reload. This will now of course completely break validation as there still is a DS in the parent zone, but kaputt is kaputt and I can’t kaputt it further.
I bump the SOA serial again and sign the zone and see that the new zone is now on all servers.
Green. Relief.
post mortem
My training ends at 21:00 UTC, and I can’t switch off – I need to know what happened.
Using ldns-verify-zone on the kaputt zone, I see it too reports errors:
% ldns-verify-zone -V5 tcp53.ch.kaputt > /dev/null
Error: Bogus DNSSEC signature for tcp53.ch. NS
RRSet:
tcp53.ch. 3600 IN NS ns1.dnspartner.de.
tcp53.ch. 3600 IN NS ns2.dnspartner.de.
tcp53.ch. 3600 IN NS lumpy.jpmens.net.
tcp53.ch. 3600 IN NS woozle.jpmens.net.
Signature:
tcp53.ch. 3600 IN RRSIG NS 13 2 3600 20221120102241 20221021101736 44444 tcp53.ch. TPfWAl4GffhEcyX50bZ4z43dtsrjL3dj/i+sSAMnJPXTuYmMCtQvLM8Hr/TbracpOPjymPrgvSQ+8wfBLeZgxw==
There were errors in the zone
I then load the zone into a fresh server, and see what delv has to say:
% delv +vtrace +root=tcp53.ch +multiline +trust +rrcomments +crypto +rtrace -d 99 -a tcp53.keys @::1 tcp53.ch NS
...
;; validating tcp53.ch/NS: verify rdataset (keyid=44444): RRSIG failed to verify
;; validating tcp53.ch/NS: failed to verify rdataset
;; validating tcp53.ch/NS: verify failure: success
;; validating tcp53.ch/NS: no valid signature found
...
OK, I know about the failure; I was hoping for a few more details.
It occurred to me to use Perl Net::DNS to see if I could obtain more details. I asked for a bit of help, and Oli Schacher came to the rescue. First I verify that the current zone is OK so I use the current NS RRset and its RRSIG:
#!/usr/bin/env perl
use strict;
use Net::DNS;
use Net::DNS::SEC;
my $dnskey = "tcp53.ch. 3600 IN DNSKEY 256 3 13 egTvRrsMdaMjapWI4pC2M5dq6s0W6gpsLT4LwiwXvYs66CqPu+N+JgbO kLVIAwm8PGnPDEIDcAcHViYSvFbHpg==";
# current one: verifies
my $rrsig = "tcp53.ch. 3600 IN RRSIG NS 13 2 3600 20221122031533 20221109133528 44444 tcp53.ch. xxW7tx5fIMUiIOIYrjfCq4h/T28rLlR6NSa0NOZC5NFalz/ShKPkpL3K KgsBKjTD0lleKUd5cqGCtyM4vIFm1Q==";
# this one is kaputt
# $rrsig = "tcp53.ch. 3600 IN RRSIG NS 13 2 3600 20221120102241 20221021101736 44444 tcp53.ch. TPfWAl4GffhEcyX50bZ4z43dtsrjL3dj/i+sSAMnJPXTuYmMCtQvLM8Hr/TbracpOPjymPrgvSQ+8wfBLeZgxw==";
my @data = ();
push(@data, Net::DNS::RR->new("tcp53.ch. 3600 IN NS woozle.jpmens.net."));
push(@data, Net::DNS::RR->new("tcp53.ch. 3600 IN NS ns2.dnspartner.de."));
push(@data, Net::DNS::RR->new("tcp53.ch. 3600 IN NS lumpy.jpmens.net."));
push(@data, Net::DNS::RR->new("tcp53.ch. 3600 IN NS ns1.dnspartner.de."));
my $dnskeyrr = Net::DNS::RR->new($dnskey);
my $nssig = Net::DNS::RR->new($rrsig);
my $v = $nssig->verify( [ @data ], [ $dnskeyrr ]);
print "verifies\n" if $v or die $nssig->vrfyerrstr;
% ./sig.pl
verifies
I then run the program with the kaputt RRSIG, and voilà, we’re on the right track:
% ./sig.pl
key 44444: signature verification failed at ./sig.pl line 28.
I say “on the right track”, but I’m not really – I’ve simply verified what the registry’s email informed me of that morning.
But why is this signature broken?
Oli checks the logs from their scanner tool that actually caused this email to see if they have more details from the Extended DNS Error, but it also just reports “(DNSSEC Bogus)”.
It’s now quite late, and Oli thinks the kaputt RRSIG is probably due to a bit flip, cosmic rays, a bug in the signer, or the Pentium FDIV, and I am starting to think he’s right about that…
Time to sleep.
I rise, get coffee, and look through the repository which gets a copy of all zone activity upon notify.
The (now broken) signature was introduced on 2022-10-21. Then, on 2022-10-31T11:21:05, I replaced one of the nameservers, retiring “kanga” and introducing “woozle”, and due to cosmic whatever, the signature was not updated. (It must be a bug, but how on earth do I report that?!)
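The kaputt RRSIG itself carries the same date: its inception field reads 20221021101736 and its expiration field 20221120102241, so on 31 October the signature was still well inside its validity window and simply no longer matched the data. A small sketch for pulling those fields out of an RRSIG in presentation format follows; `rrsig_window` is a hypothetical helper name, and it assumes the YYYYMMDDHHmmSS timestamp form that dig prints.

```shell
#!/bin/sh
# Sketch: print the validity window of an RRSIG given in presentation format.
# rrsig_window is a hypothetical helper name. It assumes timestamps in the
# YYYYMMDDHHmmSS form, not the seconds-since-epoch form.
rrsig_window() {
    # Fields: owner ttl class RRSIG covered alg labels origttl
    #         expiration inception keytag signer signature
    printf '%s\n' "$1" | awk '
        function iso(t) {
            d = substr(t, 1, 4) "-" substr(t, 5, 2) "-" substr(t, 7, 2)
            h = substr(t, 9, 2) ":" substr(t, 11, 2) ":" substr(t, 13, 2)
            return d "T" h "Z"
        }
        $4 == "RRSIG" {
            printf "covers=%s keytag=%s inception=%s expiration=%s\n",
                   $5, $11, iso($10), iso($9)
        }'
}

rrsig_window "tcp53.ch. 3600 IN RRSIG NS 13 2 3600 20221120102241 20221021101736 44444 tcp53.ch. TPfWAl4GffhEcyX50bZ4z43dtsrjL3dj/i+sSAMnJPXTuYmMCtQvLM8Hr/TbracpOPjymPrgvSQ+8wfBLeZgxw=="
# prints: covers=NS keytag=44444 inception=2022-10-21T10:17:36Z expiration=2022-11-20T10:22:41Z
```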
Can I prove that’s the reason? Yes, I can. In our Perl program, if I s/woozle/kanga/ in the NS RRset the (reportedly broken) RRSIG validates!
I deduce it’s a problem in the signer, but have no idea why it occurred. However, my biggest question is: why did this take so long to be noticed?
While getting the data for this blogticle, I note that DNSviz noticed the failure on 2022-10-31 15:13:10Z which was the time at which the zone went insecure because of the NS RRset change. (But that’s a story for another day.)
I was lucky that this is currently a toy zone.
And kudos to our friends at SWITCH for the excellent service; thank you!
whodunnit?
I went for groceries and a spot of lunch. This topic didn’t leave me alone, and I didn’t really believe in signer bugs or Oli’s cosmic rays. Upon returning to my desk I checked the dynamic DNS update logs:
% grep tcp53\.ch update.log
...
22-Sep-2021 06:24:38.475 ::1#59860/key local-ddns: updating zone 'tcp53.ch/IN': adding an RR at 'tcp53.ch' RP . jpm.people.dnslab.org.
25-Sep-2021 18:44:43.001 ::1#54888/key local-ddns: updating zone 'tcp53.ch/IN': adding an RR at 'tcp53.ch' CDS 0 0 0 00
31-Oct-2022 17:02:14.689 ::1#62762/key <redacted> updating zone 'tcp53.ch/IN': adding an RR at '<redacted>.tcp53.ch' NS <redacted>.
31-Oct-2022 17:07:48.056 2a03:b0c0:3:d0::1453:6001#45137/key <redacted> updating zone 'tcp53.ch/IN': adding an RR at '<redacted>.tcp53.ch' DS 1722 13 2 AD....
I bet you see it coming…
I serve a few zones for myself, some of my projects, and a few other people, and most of my zones are signed; other people’s are signed only if they ask me to. (There’s no specific reason for me not to sign them by default.)
Anyhow, the plain (i.e. unsigned) zone files live in a separate directory, and (you do see it coming, dontcha?) upon changing the NS RRset, and being a lazy bum, I “automated” the task:
% for z in *; do sed -i -e s,kanga,woozle, $z; done
Do you want to guess which other zone file was in this directory? Yes.
This should have become apparent when the simple change of a single NS RR in our Perl program suddenly validated.
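A more cautious variant of that loop might name the files explicitly, touch only the target of NS records, and show what it would change before changing anything. The sketch below is one way to do that, not what I actually ran; `rename_ns` is a hypothetical helper, and it defaults to a dry run unless APPLY=1 is set.

```shell
#!/bin/sh
# Sketch of a more cautious bulk edit. rename_ns is a hypothetical helper:
# it only touches the files named on the command line, only rewrites the
# target of NS records, and does a dry run unless APPLY=1 is set.
# Note: awk rebuilds a changed line with single spaces between fields.
rename_ns() {   # usage: rename_ns OLD NEW FILE...
    old=$1 new=$2
    shift 2
    for f in "$@"; do
        if awk -v old="$old" -v new="$new" '
            {
                for (i = 1; i < NF; i++)
                    if ($i == "NS" && $(i + 1) == old) { $(i + 1) = new; hit = 1 }
                print
            }
            END { exit (hit ? 0 : 1) }' "$f" > "$f.new"
        then
            if [ "${APPLY:-0}" = 1 ]; then
                mv "$f.new" "$f" && echo "changed: $f"
            else
                rm -f "$f.new"; echo "would change: $f"
            fi
        else
            rm -f "$f.new"   # no NS record pointed at OLD; leave file alone
        fi
    done
}

# Usage:
#   rename_ns kanga.jpmens.net. woozle.jpmens.net. zone1 zone2           # dry run
#   APPLY=1 rename_ns kanga.jpmens.net. woozle.jpmens.net. zone1 zone2   # for real
```

The dry run prints one "would change:" line per affected file, which is exactly the list I should have read before pressing Enter.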
So, the very good news is that I can now finally put this to rest, and the even better news is that Oli was wrong with his FDIV bug and cosmic whatnots. On the flip side, this is embarrassing, but not a fraction as embarrassing as the cake fiasco as this didn’t impact anybody other than me and my pride.
Guillaume-J puts it very nicely:
it’s not a DNSSEC error per se: let’s say that signing your zone helped noticing the failure in your “manual” zone edition process :-)