Several of my readers will recall the incredibly embarrassing “sudo cake” experience I underwent many years ago, and the world’s most popular search engine still lists the page as first hit on those two keywords:

Google search for sudo cake

I recount the experience during each of my Ansible trainings, not to scare but to illustrate how things can go wrong. While doing so today, a student infected me with what I think is a very good idea. I’ve not had time to think this through completely but will jot it down here nevertheless.

The student’s company has begun migrating from CFEngine to Ansible, and he pointed out that CFEngine would have been able to fix the broken /etc/sudoers file after a while because its agent runs as root on all nodes. He then suggested Ansible in pull mode could do likewise, and I’m enthusiastic about that idea.

Recall that “pull mode” means a local installation of Ansible on each node. Each node subscribes to a “repository” (of any type), periodically pulls configuration files and playbooks from that repository, and runs them locally, as in

ansible-playbook -c local playbook.yml

Because “pull mode” is less a product than a concept, everything can be fine-tuned however I wish.

Let’s assume for a moment that a “cake scenario” happens in which, say, the sudoers file is corrupted in such a way that become: true no longer works in push mode. If I’ve set up pull mode correctly and run it as root, I could distribute a (possibly self-destroying) playbook which the node would download from the repository to fix whatever broke; no working sudo on the node is required, because the pull runs as root already.

I’ve said already that I’ve not thought this through completely, but here are some things I would look into:

  • Ansible installed on the node
  • Periodic pull of a repository / directory by whichever means suitable (this could be a git or SVN repo, a directory I download via SFTP, etc.)
  • Periodic launch of ansible-playbook -c local with a predefined $hostname.yml playbook
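The three items above fit in one small wrapper that cron runs as root. The following is an added example, not code from the original post: it assumes git and ansible-playbook are installed on the node, that the script is run as `pull-run.sh run` from root’s crontab, that the repository URL and paths (REPO_URL, DEST, LOCK) are placeholders you would replace, and that flock from util-linux and `hostname -f` are available. The playbook lookup order (fqdn.yml, then short hostname.yml, then local.yml) mirrors what the stock ansible-pull command does.

```shell
#!/bin/sh
# pull-run.sh: hypothetical cron-driven "pull mode" wrapper (added example).
# Example crontab entry for root (assumption):  */10 * * * * /usr/local/sbin/pull-run.sh run
set -eu

REPO_URL="${REPO_URL:-git@git.example.com:cfg/nodes.git}"   # placeholder, not a real repo
DEST="${DEST:-/var/lib/ansible-pull}"                        # local checkout of the repository
LOCK="${LOCK:-/run/ansible-pull.lock}"                       # prevents overlapping cron runs

# Bring the local checkout up to date; clone it on the first run.
sync_repo() {
    if [ -d "$DEST/.git" ]; then
        git -C "$DEST" pull --ff-only --quiet
    else
        git clone --quiet "$REPO_URL" "$DEST"
    fi
}

# Choose the playbook for this node: <fqdn>.yml, then <shorthostname>.yml,
# then local.yml. Prints the path of the first one found, returns 1 if none.
# Arguments: directory of the checkout, fully qualified hostname.
pick_playbook() {
    dir=$1
    fqdn=$2
    short=${fqdn%%.*}
    for name in "$fqdn.yml" "$short.yml" local.yml; do
        if [ -f "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
            return 0
        fi
    done
    return 1
}

# Run the chosen playbook locally. We are already root, so no become/sudo is
# involved; a broken /etc/sudoers does not stop this from working.
run_playbook() {
    ansible-playbook -c local -i localhost, "$1"
}

main() {
    # Take a non-blocking lock so a slow run is never overtaken by the next one.
    exec 9>"$LOCK"
    if ! flock -n 9; then
        echo "pull-run: another run is still active, skipping" >&2
        exit 0
    fi
    sync_repo
    if pb=$(pick_playbook "$DEST" "$(hostname -f)"); then
        run_playbook "$pb"
    else
        echo "pull-run: no playbook for this node in $DEST" >&2
    fi
}

# Only act when explicitly asked to, so the file can be sourced for testing.
if [ "${1:-}" = "run" ]; then
    main
fi
```

A one-shot recovery playbook would live in the repository under the affected node’s name; once it has done its job, removing it from the repository (or having its last task remove it) is what makes it “self-destroying”. Keeping the checkout root-owned and pulling over an authenticated transport matters here: whoever can write to that repository is root on every node that pulls from it.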

If periodic runs aren’t possible or desired, Runner (Dial A for Ansible and R for Runner), triggered by a REST call, could be an alternative for one-shot “recovery playbooks”.

Please let me know what your reaction to the idea is.