When I heard that the CentOS Project was going to publish official CentOS images on Amazon EC2 (official CentOS announcement) I thought the time was ripe to finally try stuff there, and a cloudy weekend suited me perfectly for doing so. In terms of EC2 I’m a beginner, so bear with me if some of the terminology is incorrect: there’s a lot of terminology involved, so I started by reading this.
To start using EC2 you need an Amazon AWS account and a credit card. The good news is that they have a model in which you can create some smallish machines (called instances) free of charge. Check the fine print on the AWS site. (And by fine print I don’t mean it’s in a small font or hidden – their documentation is quite extensive :-)
First off, you don’t need the AWS tools installed on your management system, and that saves you from having to install Java as well. We need the following components on our CentOS management system:
- Ansible and its small list of dependencies. There’s help in getting started.
- The Euca2ools, command-line utilities for interacting with Amazon’s EC2 and S3 services. I think of these as the answer to Amazon’s tools, but they’re written in Python. Furthermore, we’ll need this and its prerequisites for Ansible as well. Euca2ools are in EPEL so installation is easy.
- A rather large collection of keys and authorization codes which you obtain from the AWS site. I won’t bore you with how to do that, but I will show you a list of variables which must be correctly set for things to work.
You’ve set up your AWS account, and you’ve obtained authorization secrets to interact with EC2. For everything we do from here onwards, we need the following variables in our shell’s environment:
If that environment is set up correctly, you should be able to use the euca-describe-images command to find out which images are available to create instances. This program talks to EC2, so if that works, the rest should too.
Note the ami-xxxx in the second column of the list which is returned: that’s the name we’ll use to choose which image we will instantiate.
In order to launch a new instance (i.e. a new machine) on EC2, we require an SSH key-pair. The private part we keep safely, and the public portion is injected into EC2. Upon creating an instance, EC2 automatically populates the root account of the machine we create with that public key so that we can log in.
We create as many different keypairs as we need, and note their names:
- EC2 needs to be instructed which key it should inject into the instance. It does so with this name.
- SSH needs to use the private key to connect to the instance, and we use the pemfile for that.
- Ansible uses the SSH key to talk to the instance.
If you don’t have an SSH agent running yet, I recommend you start one now:
To make sure everything is running, let us manually set up an instance now from the command-line. I’ve chosen the ami image I want to use, and I select the SSH key I want EC2 to inject into the instance:
Make note of the instance name (i-b37bcdcc): this is the handle into that machine, and we need that name to reboot or destroy it.
After a few moments, I use euca-describe-instances to check that the machine is actually booting. I can also see (but not interact with) its console, using euca-get-console-output, and I’ll see a public hostname which I use to SSH into the instance as the root user.
Ansible uses an inventory in which I describe the machines I want it to speak to, the groups they belong to and specific variables I want those machines to use. The inventory file defaults to /etc/ansible/hosts, but I can override that by setting $ANSIBLE_HOSTS to a different path. An inventory file can be as short as this, and believe it or not, this is actually the inventory we’re going to give to Ansible in order to launch EC2 instances:
On the other hand we need to provision EC2 instances running on the other side of the world (for me at least). How do we do that? How do we know their hostnames?
The Ansible ec2.py inventory script enumerates the EC2 instances we can access. If I launch this program, I see the following JSON output because I already have an instance running. (Compare the instance ID and the public hostname to what we saw earlier.)
Even though we have a single machine only, it shows up in different groups. These groups will allow us to target specific groups of instances when we use Ansible to provision them. (Note: the ec2.py program caches its output in configurable paths, so it may take a minute until the list is refreshed.) If I invoke the inventory program with a specific host, I get a list of variables particular to that instance (I’m omitting lots of output for brevity):
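To make the shape of that output a little more concrete, here is a minimal Python sketch of how the JSON printed by an inventory script can be turned into a "which groups is this host in?" lookup. It assumes the list output is a JSON object mapping each group name to a list of hostnames; the sample data, the group names in it and the function name groups_for_host are made up for illustration and are not taken from ec2.py itself.

```python
import json


def groups_for_host(inventory_json, hostname):
    """Return the sorted names of all groups that contain hostname."""
    inventory = json.loads(inventory_json)
    found = []
    for group, members in inventory.items():
        # Some inventory scripts emit {"hosts": [...]} per group instead
        # of a bare list; accept both shapes.
        if isinstance(members, dict):
            members = members.get("hosts", [])
        if hostname in members:
            found.append(group)
    return sorted(found)


# Hypothetical sample output: one instance that appears in several groups.
SAMPLE = json.dumps({
    "us-east-1": ["ec2-host.example.com"],
    "key_jp1": ["ec2-host.example.com"],
    "type_t1_micro": ["ec2-host.example.com"],
    "security_group_web": ["other.example.com"],
})

if __name__ == "__main__":
    print(groups_for_host(SAMPLE, "ec2-host.example.com"))
```

Any of the group names returned this way is the kind of name a playbook can target.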
The ec2.py inventory script will allow Ansible to interact with instances on EC2. This happens either by installing the file as /etc/ansible/hosts or by pointing $ANSIBLE_HOSTS to that file.
Will it “ping”?
As this instance is costing money (not much but do check the AWS pricelist), I’ll terminate (kill, destroy, zap) the instance:
At this point I could leave you to it, and you could successfully use Ansible to install and configure your EC2 instances. But I won’t leave you to it: let’s do a bit of provisioning.
However, I’ll admit that my poor, tired brain had some trouble coupling two completely distinct and seemingly unrelated operations:
- The first is instance creation. Fine and dandy, but there is very little that associates the information returned on instance creation with a hostname we later require to access that host.
- For managing the host with Ansible we need a hostname (or IP address) but
neither is obtainable at the time of instance creation. (Compare the output
Furthermore, EC2 hostnames and addresses are volatile: if I stop the instance and restart it, it gets a different IP address and hostname associated with it, meaning I’d “lose” touch. The problem I was facing was: how do I create a self-defined DNS hostname which resolves to a particular instance? (AWS does offer so-called Elastic IP addresses, but I wanted to try and solve the problem without using those.)
If that isn’t an issue for you, ignore the following bits which discuss DNS.
I learned that during instance creation on EC2, I can inject user-defined data into the instance. What I’ll be doing is to ask the operator to enter a hostname (shortname) and inject that into user-data. Once the machine has booted and that data is available, I’ll obtain said hostname and create a DNS address record associating that name with the machine’s IP address.
Ansible launches a CentOS instance on EC2
The ec2 module creates an instance on EC2 and optionally waits for that instance to become ready. (Note: ready doesn’t mean booted; that can take a few minutes.) Upon creating an instance, I specify the SSH keypair I want to use (we created a key called jp1 for that), the image name, and a few other parameters which are described in the module’s documentation.
One parameter I want to point out is called group. This is a so-called security group which, as far as I’ve been able to determine, specifies e.g. firewall rules from the AWS point of view. By default only port 22 (SSH) is allowed into my instance, but I’m creating Web servers so I also want (at least) port 80.
To create an EC2 security group, I used the following commands:
The specified security group later on shows up as a group in the Ansible inventory (
Note how the ec2 module is given user_data: in this case, I simply push a JSON blob into it, but it could just as well be something else. I believe it must be less than 16KB. (The user_data parameter to the module is brand new in Ansible: you’re welcome. :-)
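As a rough illustration of the kind of blob I mean, this Python sketch serializes a small dictionary to JSON and refuses anything larger than 16 KB, the limit I believe applies. The key names (hostname, domain) and the function name build_user_data are assumptions for the example, not the fields used in my playbook.

```python
import json

# Size limit as mentioned in the text; treated here as an assumption.
MAX_USER_DATA_BYTES = 16 * 1024


def build_user_data(fields):
    """Serialize fields to a JSON string suitable for use as user-data."""
    blob = json.dumps(fields, sort_keys=True)
    size = len(blob.encode("utf-8"))
    if size > MAX_USER_DATA_BYTES:
        raise ValueError(
            "user-data is %d bytes, over the %d byte limit"
            % (size, MAX_USER_DATA_BYTES)
        )
    return blob


if __name__ == "__main__":
    # Hypothetical values an operator might have entered.
    print(build_user_data({"hostname": "web01", "domain": "example.org"}))
```

Checking the size before launching saves a round trip to EC2 if the blob turns out to be too big.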
ec2, obtained by registering the result of the
contains the following values, which I use in the e-mail I fire off to the
Let me run Ansible on this playbook. Note that I use the simple inventory containing just localhost, because these modules run on my Ansible management machine and not remotely.
To recapitulate: so far Ansible used modules locally (i.e. on our management
machine) to remotely create a CentOS instance on EC2 and to send the
e-mail. I could now use some of the
euca- tools to look and see what is
I’ll also reiterate that during the creation of the instance our public key was injected into that machine, so it will be available to us as soon as we connect to it.
Also, have a bit of patience: it can take a few minutes for the EC2 instance to actually come alive.
Oh, I have mail:
As an aside, recall from above that we have the EC2 access codes in shell environment variables. The ec2 module allows specifying these as parameters so we could also use Ansible host_vars to set these, instead of relying on Ansible’s run-time environment.
Ansible provisions CentOS instances
After a couple minutes of patience I use Ansible to actually provision the instance I just brought up. To illustrate, I’ll just install an Apache Web server and a template, so nothing special, at least not in the first part of the playbook:
The hosts Ansible should act upon are specified by a group name as obtained by ec2.py, and I’m connecting as root because that’s the user into whose account EC2 has injected our key. After the first portion of the playbook has run, I could use a Web browser to connect to the ec2-*.compute-1.amazonaws.com hostname or to its public IP address.
I wanted to somehow be able to pre-determine the DNS name by which an instance is reachable. As mentioned earlier, that isn’t as easy as it sounds because EC2 hostnames and addresses are volatile.
Fiddling with DNS
Recall we had injected user_data into the instance. If, at this point, I logged onto the instance, I could obtain that data from a specific URL (available to every instance, but reachable only from within the instance itself).
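For the curious, here is a minimal Python sketch of reading that data from within an instance. It assumes the user-data is served by the EC2 metadata service at the link-local address shown in the code, and that the blob is the JSON we injected earlier; the function names fetch_url and read_user_data are mine and purely illustrative. The fetcher is passed in as a parameter so the parsing can be tried out on any machine, not just on EC2.

```python
import json
from urllib.request import urlopen

# Assumed location of user-data as seen from inside an EC2 instance.
USER_DATA_URL = "http://169.254.169.254/latest/user-data"


def fetch_url(url, timeout=5):
    """Fetch url and return the response body as text."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def read_user_data(fetch=fetch_url):
    """Return the instance's user-data as a dictionary.

    If the data is not a JSON object, it is returned unparsed
    under the key "raw".
    """
    raw = fetch(USER_DATA_URL)
    try:
        data = json.loads(raw)
    except ValueError:
        return {"raw": raw}
    if not isinstance(data, dict):
        return {"raw": raw}
    return data


if __name__ == "__main__":
    # Only works on an EC2 instance; elsewhere the request fails.
    print(read_user_data())
```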
Because I was able to “smuggle” that hostname into the user-data, I can now have Ansible retrieve that via a module to further process it. I’ve created a custom module called userdata to read that data when on the instance. Its output is registered in the playbook as variable ud, and I use bits of that information in the template (for index.html) as well as using portions to update the DNS with our hostname pointing to the instance’s public IP address. Ansible will do that for us, and it’ll use a brand-new dnsupdate Ansible module I wrote to fire off a dynamic DNS update.
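This is not the dnsupdate module itself, but to give an idea of what a dynamic DNS update boils down to, here is a small Python sketch which generates input for the nsupdate command-line utility: delete any existing address record for the name, then add a fresh one pointing at the instance. The function name nsupdate_script, the default TTL and all server, zone and host names are assumptions made up for the example.

```python
import ipaddress


def nsupdate_script(server, zone, fqdn, ip, ttl=300):
    """Return nsupdate input that points fqdn at ip with an A record."""
    # Raises ValueError if ip is not a valid IPv4 address.
    ipaddress.IPv4Address(ip)
    lines = [
        "server %s" % server,
        "zone %s" % zone,
        # Remove a stale record from a previous incarnation of the host.
        "update delete %s A" % fqdn,
        "update add %s %d A %s" % (fqdn, ttl, ip),
        "send",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical values; the address is from a documentation range.
    print(nsupdate_script("ns.example.org", "example.org",
                          "web01.example.org.", "203.0.113.10"), end="")
```

The resulting text could be fed to nsupdate on standard input, typically together with a key file so that the name server accepts the update.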
So let’s make this happen!
The instance is still waiting for us to do something with it. We tell Ansible to use a different inventory this time, i.e. the EC2 inventory, and launch the configuration playbook:
Provisioning is complete, services are running, and the dynamic DNS update has been performed. I can immediately connect to our new host via the name I chose upon first launching the instance!
One final note: pop over to Seth Vidal’s site and read on how he uses Ansible on cloud instances. I particularly recommend that because he knows a lot more about all this and has more experience with it than I do. He takes a different approach by creating a module which injects a host into Ansible’s in-memory inventory; that module is now in Ansible core. I could have copied and pasted and been done with it, but I wanted to do this differently.