DNS servers optionally log queries on demand by formatting a message and storing that in a file, sending it through syslog, etc. This is an I/O-intensive operation which can dramatically slow down busy servers, and the biggest issue is we get the query but not the associated response.

[1505125481] unbound[89142:0] info: 127.0.0.2 dnstap.info. A IN

10-Sep-2017 08:31:03.644 client @0x7f9b12dd5c00 127.0.0.2#52872 (dnstap.info): view internal: query: dnstap.info IN A +E(0)K (127.0.0.2)

In addition to having to format the data into human-readable format and write the resulting string to a file, DNS server authors haven’t been able to standardize on query logging formats. As can be seen by the two examples above (first Unbound, then BIND), the strings differ dramatically. The different results also mean that further parsing/processing of these logs will have to be different as well. (Have fun building regular expressions for both and having more than two problems.)

One method to overcome this is to capture packets externally, such as how DSC does it, but doing it in this fashion means the software must deal with several things the name server has already dealt with: fragments, TCP stream reassembly, spoofed packets, etc. (Here’s a bit of a “versus” thread.)

An issue with both these methods is that the query a name server received and the response it returned aren’t bundled together. Only the name server software itself knows, really, what belongs together at the time the query occurred and the response was returned. Can you imagine a DNS log so complete that you could see what query a client issued and which response it got?

dnstap is a solution which introduces a flexible, binary log-format for DNS servers together with Protocol Buffers, a mechanism for serializing structured data. Robert Edmonds had the idea for dnstap and created the first implementation with two specific use cases in mind:

  • make query-logging faster by eliminating synchronous I/O bottlenecks and message formatting
  • avoid complicated state reconstruction by capturing full messages instead of packets for passive DNS

What dnstap basically does is to add a lightweight message-copy routine into a DNS server. This routine duplicates a DNS message with context and moves it out of the DNS server to a listening process, typically via a Unix domain socket. In case of extreme load, a DNS server can simply start dropping log payloads instead of degrading server performance.

dnstap enabled DNS server (from project)

The dnstap protocol buffer content is defined in this schema, and includes a type of message (see below), the socket family queries/responses were transported on, socket protocols, query and responder addresses, the initiator’s DNS query message in wire format, timestamps, and the original wire-format DNS response message, verbatim.

dnstap is currently implemented in a few utilities as well as in these DNS servers:

  • BIND
  • CoreDNS
  • Knot 2.x
  • Knot Resolver (> 1.2.5)
  • Unbound

For my experiments, I’ll be using BIND 9.11.2, CoreDNS-011, Knot 2.5.4, and Unbound 1.6.5.

Before launching a dnstap-enabled (and configured) DNS server, we have to ensure a listener has created the Unix domain socket. The dnstap code in the BIND, Unbound etc. acts as a client rather than a server, so it requires a server which will accept connections. Robert Edmonds, dnstap inventor, did it this way so that a single socket could be used by different dnstap senders (like how a system daemon listens to messages from multiple clients). If the Unix socket isn’t present or nothing’s listening on it, the client code (in the DNS server) will periodically attempt reconnection.

We’ll be looking at two programs which provide this functionality.

Prerequisites

We’ll likely have to build our DNS server installations ourselves as official packages are typically not built with dnstap support. The requirements for all the below (except CoreDNS which provides everything in its single statically linked binary) will be fstrm, protobuf, and protobuf-c:

  • fstrm is a frame streams implementation in C. It implements a lightweight protocol with which any serialized data format which produces byte sequences can be transported and provides a Unix domain listener (fstrm_capture) for dnstap records written by the DNS servers.
  • protobuf is the implementation of Google’s Protocol Buffers format. We install it in order to build and use some of the utilities, namely the protobuf compiler.
  • protobuf-c is a C implementation of the latter; this includes a library (libprotobuf-c) which some of the utilities require.

Other than these requirements, a number of the DNS server implementation I mention have their own additional requirements which I will not specify here – the projects’ documentation will tell you more.

Utilities

What follows are some utilities we’ll be using for working with and/or decoding (i.e. printing) dnstap records.

dnstap -u

fstrm_capture

I can use fstrm_capture to create a required Unix domain socket which dnstap clients can write to. The program needs a “Frame Streams” content type specified as well as the path to the Unix socket and the file name it should protocol buffer frames to:

$ fstrm_capture -t protobuf:dnstap.Dnstap -u /var/run/dnstap.sock -w fstrm.tap

While there is provision in the code to handle SIGHUP (fstrm_capture then flushes the output file), there is no provision for file rotation.

An alternative method for doing similarly is to use the dnstap utility from the dnstap packag.

dnstap

The dnstap project maintains a dnstap utility written in Go. There are, unfortunately, no prebuilt binaries on the releases page, but building the program is easy (after you go through the hassle of installing go).

I launch the dnstap utility (instead of launching fstrm_capture) like this:

dnstap -u /var/run/dnstap.sock -w file.tap

I can also use dnstap to read a tap file from the file system and print it out in various formats, which I will be showing below when we look at some examples. dnstap can also create a TCP endpoint (e.g. for CoreDNS) with dnstap -l <address:port>

dnstap-ldns

For the actual decoding of dnstap files (i.e. printing them out), we can use dnstap as just discussed, or the reference utility which is called dnstap-ldns which has thankfully kept the option letters used by dnstap. However, this utility, as the name implies, brings an additional dependency, namely ldns. (But you have that already for its utility programs, don’t you?)

dnstap-read

Whilst on the subject of decoding dnstap files, dnstap-read, from the BIND distribution, can also do that nicely. By default it prints the short version, but with -y it’ll also do the long YAML format.

$ dnstap-read file.tap
11-Sep-2017 10:59:00.652 CR 127.0.0.2:55453 <- 127.0.0.2:53 UDP 107b www.test.aa/IN/A
11-Sep-2017 10:59:00.954 CR 127.0.0.2:61308 <- 127.0.0.2:53 UDP 107b www.test.aa/IN/A
$ dnstap-read -y file.tap
---
type: MESSAGE
identity: tiggr
version: bind-9.11.2
message:
  type: CLIENT_RESPONSE
  response_time: !!timestamp 2017-09-11T08:59:00Z
  message_size: 107b
  socket_family: INET
  socket_protocol: UDP
  query_address: 127.0.0.2
  response_address: 127.0.0.2
  query_port: 61308
  response_port: 53
  response_message_data:
    opcode: QUERY
    status: NOERROR
    id:  24094
    flags: qr aa rd ra
    QUESTION: 1
    ANSWER: 1
    AUTHORITY: 1
    ADDITIONAL: 1
    OPT_PSEUDOSECTION:
      EDNS:
        version: 0
        flags:
        udp: 4096
        COOKIE: 67816834b9432822c5a508fd59b65054fb5bbab0c5fe14f8
    QUESTION_SECTION:
      - www.test.aa. IN A
    ANSWER_SECTION:
      - www.test.aa. 60 IN A 192.168.1.20
    AUTHORITY_SECTION:
      - test.aa. 60 IN NS localhost.
  response_message: |
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id:  24094
    ;; flags: qr aa rd ra    ; QUESTION: 1, ANSWER: 1, AUTHORITY: 1, ADDITIONAL: 1
    ;; OPT PSEUDOSECTION:
    ; EDNS: version: 0, flags:; udp: 4096
    ; COOKIE: 67816834b9432822c5a508fd59b65054fb5bbab0c5fe14f8
    ;; QUESTION SECTION:
    ;www.test.aa.			IN	A

    ;; ANSWER SECTION:
    www.test.aa.		60	IN	A	192.168.1.20

    ;; AUTHORITY SECTION:
    test.aa.		60	IN	NS	localhost.

kdig

kdig is the dig-like utility shipped with Knot, and it too can read dnstap files and present them in dig-like (I should probably say kdig-like) manner. I note that kdig doesn’t show version information recorded in the tap file.

$ kdig -G file.tap +multiline
;; Received 759 B
;; Time 2017-09-10 06:28:20 UTC
;; From 199.249.113.1@53(TCP) in 32.1 ms
;; ->>HEADER<<- opcode: QUERY; status: NOERROR; id: 38621
;; Flags: qr aa; QUERY: 1; ANSWER: 3; AUTHORITY: 0; ADDITIONAL: 1

;; EDNS PSEUDOSECTION:
;; Version: 0; flags: do; UDP size: 1280 B; ext-rcode: NOERROR

;; QUESTION SECTION:
;; dnstap.info.          IN A

;; ANSWER SECTION:
dnstap.info.            3600 IN A 192.30.252.154
dnstap.info.            3600 IN A 192.30.252.153
dnstap.info.            3600 IN RRSIG A 5 2 3600 20171006034256 (
                                20170906032611 36186 dnstap.info.
                                ekqkRvSmO0csExldaVG5RlxEvKSs/Spi4szHukeM
                                dz6dW0493rv+wXsKzjnyyTrOWPZWiplfGJZL2MQL
                                Yc4hg1h0J89YPEfomkE0d6yIybuFjljhuX/YT34E
                                FLv45Wbq+N20mBrdSupajgPQEmFUgnhUT6hg4Ayf
                                t6T3UuVcJ3M=
                                )

What’s quite practical is that kdig can record a live query / response (i.e. something you’d do right now) into a tap file. So, in the following example, I use kdig to perform a query and the program writes in dnstap format to the specified file what we get on stdout:

$ kdig -E iis-a.tap iis.se AAAA
;; ->>HEADER<<- opcode: QUERY; status: NOERROR; id: 53652
;; Flags: qr rd ra; QUERY: 1; ANSWER: 1; AUTHORITY: 3; ADDITIONAL: 0

;; QUESTION SECTION:
;; iis.se.             		IN	AAAA

;; ANSWER SECTION:
iis.se.             	60	IN	AAAA	2001:67c:124c:4006::214

;; AUTHORITY SECTION:
iis.se.             	3600	IN	NS	i.ns.se.
iis.se.             	3600	IN	NS	ns.nic.se.
iis.se.             	3600	IN	NS	ns3.nic.se.

$ ls -l *.tap
-rw-r--r-- 1 jpm users    305 Sep 10 14:55 iis-a.tap

# (I have reported the epoch of 1970-01-01 as a bug to the knot-dns project)

$ dnstap-ldns -r iis-a.tap
1970-01-01 04:22:23.659631 TQ 192.168.1.81 UDP 24b "iis.se." IN AAAA
1970-01-01 04:22:23.725209 TR 192.168.1.81 UDP 110b "iis.se." IN AAAA

$ dnstap-ldns -r iis-a.tap -y
---
type: MESSAGE
version: "kdig 2.5.4"
message:
  type: TOOL_RESPONSE
  query_time: !!timestamp 1970-01-01 04:22:23.659631
  response_time: !!timestamp 1970-01-01 04:22:23.725209
  socket_family: INET
  socket_protocol: UDP
  query_address: 0.0.0.0
  response_address: 192.168.1.81
  query_port: 54370
  response_port: 53
  response_message: |
    ;; ->>HEADER<<- opcode: QUERY, rcode: NOERROR, id: 53652
    ;; flags: qr rd ra ; QUERY: 1, ANSWER: 1, AUTHORITY: 3, ADDITIONAL: 0

    ;; QUESTION SECTION:
    ;iis.se.	IN	AAAA

    ;; ANSWER SECTION:
    iis.se.	60	IN	AAAA	2001:67c:124c:4006::214

(Note how kdig uses the TOOL_* subtypes in the dnstap records.)

After discussing some of the tools for working with (in particular for decoding) dnstap, I now turn to the DNS servers proper.

DNS servers

Before we look at the individual DNS servers and how to configure them for support of dnstap, it’s interesting to know that dnstap currently has 12 defined subtypes of dnstap “Message”. dnstap tags a log record with a subtype corresponding to the location at which a record was recorded, so we can at any point in time see where the record was collected.

dnstap flow

  • AUTH_QUERY (AQ)
  • AUTH_RESPONSE (AR)
  • RESOLVER_QUERY (RQ)
  • RESOLVER_RESPONSE (RR)
  • CLIENT_QUERY (CQ)
  • CLIENT_RESPONSE (CR)
  • FORWARDER_QUERY (FQ)
  • FORWARDER_RESPONSE (FR)
  • STUB_QUERY (SQ)
  • STUB_RESPONSE (SR)
  • TOOL_QUERY (TQ)
  • TOOL_RESPONSE (TR)

These subtypes ought to be pretty self-explanatory, but their full description is in the dnstap protocol schema. The diagram above illustrates at which point they are obtained. The mnemonics in parenthesis are those which are output by the utilities in “quiet” mode.

BIND

BIND’s configuration is as flexible as Unbound’s in terms of dnstap logging. I build named by adding --enable-dnstap to ./configure and then modify named.conf.

I can set different types to be logged for each view (but I dislike views so I wont’t do that). Supported types are client, auth, resolver, and forwarder as well as all, which causes all dnstap messages to be logged regardless of their type. Each type may take an additional argument to indicate whether to log query or response messages. If not specified, BIND should log both, but this didn’t work for me.

options {
    dnstap { all; };
    // dnstap { auth; resolver query; resolver response; };

    /* where to capture to: file or unix (socket) */
    // dnstap-output file "/tmp/named.tap";
    dnstap-output unix "/var/run/dnstap.sock";

    dnstap-identity "tiggr";
    dnstap-version "bind-9.11.2";
};

When named starts it starts producing dnstap log data. When writing to a file, we can instruct named to truncate and reopen the file or we can instruct named to roll its dnstap output file using rndc:

$ rndc dnstap -reopen        # Close, truncate and re-open the DNSTAP output file.
$ rndc dnstap -roll <count>  # Close, rename and re-open the DNSTAP output file(s).

If you’re interested in the nitty-gritty of dnstap with servers with are both authoritative and recursive, here’s a thread Evan Hunt started. (But in my opinion you should not be interested in servers which are simultaneously authoritative and recursive …)

Other than that, there’s a good single-page document on using dnstap with BIND.

In BIND 9.12.x, dnstap logfiles can be configured to automatically roll when they reach a specified size, for example:

dnstap-output file "/taps/prod.tap" size 15M versions 100 suffix increment;

CoreDNS

I spoke earlier of CoreDNS, and one of the really great things of this single-binary program is that it’s bundled with all I need to produce dnstap frames.

The following configuration suffices for CoreDNS to provide a forwarder which logs all requests in to the specified Unix domain socket:

.:53 {
    dnstap /var/run/dnstap.sock full
    proxy . 192.168.1.10:53
}

If I then look at a query I see what type of DNS server produced this query, namely a forwarder.

$ dnstap -r coredns.tap -y
type: MESSAGE
message:
  type: FORWARDER_QUERY
  socket_family: INET
  socket_protocol: UDP
  response_address: 192.168.1.10
  response_port: 53
  query_message: |
    ;; opcode: QUERY, status: NOERROR, id: 60806
    ;; flags: rd ad; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

    ;; QUESTION SECTION:
    ;dnstap.info.       IN       A

    ;; ADDITIONAL SECTION:

    ;; OPT PSEUDOSECTION:
    ; EDNS: version 0; flags: ; udp: 4096
    ; COOKIE: 7f8f3ebbf66ffc95
---

...

I note that CoreDNS records neither identity nor version in the tap file. CoreDNS can log to a remote endpoint by specifying tcp://address:port as sink.

Unbound

Unbound has had dnstap support for a few versions, since Robert Edmonds did the first prototype. I build dnstap support into Unbound with --enable-dnstap.

dnstap:
    dnstap-enable: yes
    dnstap-socket-path: "/var/run/dnstap.sock"
    dnstap-send-identity: yes
    dnstap-send-version: yes
    dnstap-log-client-query-messages: yes
    dnstap-log-client-response-messages: yes
    dnstap-log-forwarder-query-messages: yes
    dnstap-log-forwarder-response-messages: yes
    dnstap-log-resolver-query-messages: yes
    dnstap-log-resolver-response-messages: yes

A local-zone is answered directly by Unbound without performing recursion, so you’ll only see response messages for those domains if you set “dnstap-log-client-response-messages: yes”.

The documentation of dnstap in unbound.conf is, well, no it’s not, it’s simply not there. Actually there is no documentation at all for dnstap in unbound-1.6.5/doc/ which is quite atypical: Unbound’s usually very good about that…

Knot

I tested knot-2.5.4 (authoritative) and built it with

./configure --with-module-dnstap=yes --enable-dnstap

I configure dnstap in knot.conf by specifying the module to load (mod-dnstap) and its parameters, most of which are self-explanatory and have sensible defaults. The sink directive specifies either a file on the file system (which is opened for truncate) or, if prefixed with the string "unix:", a Unix domain socket, e.g. as created by fstrm_capture.

mod-dnstap:
  - id: tap
    sink: /root/taps/knot-auth.tap
    # sink: unix:/var/run/dnstap.sock
    log-queries: false
    log-responses: true

template:
  - id: default
    global-module: mod-dnstap/tap
$ dnstap-ldns -r knot-auth.tap  -y
type: MESSAGE
identity: "knot.ww.mens.de"
version: "Knot DNS 2.5.4"
message:
  type: AUTH_RESPONSE
  query_time: !!timestamp 1970-01-01 21:10:59.484261
  response_time: !!timestamp 1970-01-01 21:10:59.484261
  socket_family: INET
  socket_protocol: UDP
  query_address: 192.168.1.130
  query_port: 53394
  response_message: |
    ;; ->>HEADER<<- opcode: QUERY, rcode: NOERROR, id: 35495
    ;; flags: qr aa rd ; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

    ;; QUESTION SECTION:
    ;www.k20.aa.	IN	A

    ;; ANSWER SECTION:
    www.k20.aa.	3600	IN	A	192.168.1.111

    ;; AUTHORITY SECTION:

    ;; ADDITIONAL SECTION:

    ;; EDNS: version 0; flags: ; udp: 4096
---

Daniel writes that a knotc reload will rotate the dnstap output file as will a SIGHUP.

Wrapping up

There’s a lot of good in dnstap, and it’s a huge improvement over what existed beforehand. There are however a few things to take note of:

  • While debugging, it can take a while until queries start showing up in the dnstap file due to the buffering which the DNS servers do
  • I’m convinced (but have not taken the time to prove) some servers drop logs even though they’re completely idle. This might be due to either real drops or to not flushing existing records when a server is stopped. For example, after shutting down named (rndc stop) and killing dnstap I find the last query missing. People with heavy-traffic servers won’t notice this of course.
  • The existing toolset is a bit sloppy at times. For example fstrm_capture or dnstap -u which cannot rotate output files (the former can rotate every N seconds). This is easy to fix and it needs doing.
  • There’s no network transport of dnstap other than CoreDNS which can send dnstap to a tcp:// target. Somebody started some work but I haven’t seen it.

dnstap is a relatively young, open standard for DNS query logging. It was designed for large, busy DNS servers and offers minimal performance loss. It has already wide adoption amongst open source DNS server implementations, even if there are some missing: NSD, PowerDNS Authoritative, PowerDNS Recursor come to mind, and I hope they’ll very soon join the party. (There’s already a pull request to add dnstap support to dnsdist which will greatly simplify the work for the PowerDNS products, according to the project.)

Further reading:

Flattr this
DNS, logging, and monitoring :: 11 Sep 2017 :: e-mail

Comments

blog comments powered by Disqus