You’re into systems administration, but you don’t know Collectd? Shame on you! Go read about what it is, and what it does, and then come back here.

Back? Good.

You’ll have read that Collectd basically gathers statistics and stores them in RRD files. It doesn’t generate graphs; just data.

I’m in the process of getting Collectd onto quite a number of hosts, and, of course, I’m using Puppet for that. I’m also distributing Collectd’s collectd.conf via Puppet, from a template, which allows me to get particular configurations onto individual nodes if I need that. So what I end up with is basically this: Puppet supplies nodes with their configuration, which includes a collectd.conf file.

What I really want is to have Collectd send data collected on the individual nodes to the server located in the vertical center of the diagram. The powers that be behind Collectd foresaw that, and they provide the Network plugin with which to do that. Configuration is quite trivial and well documented (as most of Collectd’s plugins are). The end result is a hub-and-spoke formation which sends encrypted traffic over a predefined UDP port from the nodes on the left to the central Collectd server pictured in the middle of the diagram. If your topology is even more segregated, you can even send those accumulated results from the central server to the hinted server on the far right; in this case the server in the middle acts as a Collectd proxy. (This is also documented in the networking introduction.)
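To give an idea of what that looks like, here is a rough sketch of the Network plugin configuration for the setup described above. The host name, user name, password, and AuthFile path are placeholders I made up for illustration; 25826 is Collectd’s default port, and the encryption options assume your build has the necessary crypto support.

```
# On each node: send values to the central server, encrypted.
LoadPlugin network
<Plugin network>
  <Server "collectd.example.com" "25826">
    SecurityLevel Encrypt
    Username "node"          # placeholder credentials
    Password "changeme"
  </Server>
</Plugin>

# On the central server: accept encrypted values from the nodes.
LoadPlugin network
<Plugin network>
  <Listen "0.0.0.0" "25826">
    SecurityLevel Encrypt
    AuthFile "/etc/collectd/auth_file"   # placeholder path; lines of "user: password"
  </Listen>
  # Only needed when this server should relay received values to a
  # further <Server> block, i.e. act as a proxy:
  # Forward true
</Plugin>
```

Because collectd.conf comes from a Puppet template, the node half of this can be rolled out everywhere in one go.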

The result is that any plugin on one of the nodes automatically sends its results to the server. Thusly, if I add a plugin to Collectd’s configuration, upon Puppet’s next run, all nodes will load that plugin and begin transferring results to the server. Way cool!

I can create my own RRDs that Collectd captures using any of Collectd’s numerous plugins. I’m in need of a custom plugin, so I’m using the Exec plugin, for which there are some examples. (Another very flexible tool is the Tail plugin, which allows me to react to the content of files by tail -f’ing them.)

Let me show you a trivial example. First the program that will be executed by Collectd as a long-running process; it is launched when Collectd starts up and re-spawned if it dies. The program creates two RRDs containing random values. In real life I obtain data from defined sources, of course.

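    #!/bin/bash
    # Long-running script for Collectd's Exec plugin ($RANDOM requires bash).
    HOSTNAME="${COLLECTD_HOSTNAME:-$(hostname -f)}"
    # Collectd exports COLLECTD_INTERVAL; the 10 s fallback is an assumed
    # default for running the script by hand.
    INTERVAL="${COLLECTD_INTERVAL:-10}"
    while sleep "$INTERVAL"; do
      time="$(date +%s)"
      value=$((RANDOM % 20))   # random value between 0 and 19
      echo "PUTVAL $HOSTNAME/jptest-1/gauge-ddns_updates interval=$INTERVAL $time:$value"
      echo "PUTVAL $HOSTNAME/jptest-one/gauge-mygauge interval=$INTERVAL $time:$RANDOM"
    done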

Note that the filename of an RRD you create (called gauge-ddns_updates in the first PUTVAL line) must start with one of the types defined in Collectd’s types.db file; add custom types as you see fit and distribute those to your nodes using Puppet.
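If you want to catch a mistyped type before Collectd complains about it, you can check an identifier against types.db yourself. The following is only a minimal sketch: `type_known` is a hypothetical helper of mine, not something Collectd ships, and it assumes the usual types.db layout of one type per line with the type name in the first column.

```shell
#!/bin/sh
# Hypothetical helper (not part of Collectd).
# Usage: type_known "host/plugin-instance/type-instance" /path/to/types.db
# Returns 0 if the type part of the identifier is listed in types.db.
type_known() {
  ident=$1
  types_db=$2
  # The last path component is "type" or "type-instance".
  last=${ident##*/}
  # Type names contain no hyphen, so cut at the first one.
  type=${last%%-*}
  # Match the type name against the first column of types.db.
  awk -v t="$type" '$1 == t { found = 1 } END { exit !found }' "$types_db"
}
```

For the first PUTVAL line above, `type_known "$HOSTNAME/jptest-1/gauge-ddns_updates" /path/to/types.db` succeeds as long as `gauge` is defined; where types.db actually lives depends on how Collectd was installed.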

The configuration for this plugin is simple:

    LoadPlugin exec
    <Plugin exec>
      Exec "nobody:nobody" "/usr/local/sbin/"
    </Plugin>

As soon as the plugin is launched, it starts collecting data. This data is stored on the node proper, and it is also transmitted via the Network plugin to the server, from which I can visualize it. I mentioned above that Collectd doesn’t create pretty pictures – you need to use other utilities for that, although the distribution does include some rudimentary tools to help you get started. If you want a very good-looking Web-based tool for that, I recommend Collectd-web. So what do my RRDs look like in a default installation? Here they are:

Ideally, I’d tweak Collectd-web’s output to add descriptive labels for my RRDs, but that’s another exercise. Collectd-web also has a good-looking snappy iPhone interface which reportedly also works on other mobile devices. I highly recommend you look into these tools more closely – it’s worth it.

CLI, monitoring, Puppet, and collectd :: 03 Mar 2011