The process of booting a computer system over the network is well understood, and it’s been around for donkey’s ages. Basically, the way it works is that a computer system requests an IP address from a BOOTP/DHCP server, obtains the name of a bootstrap program (e.g. PXELINUX) it should load from a TFTP server, and subsequently uses that to boot the machine. This is used extensively when installing operating systems onto a number of machines. I’ve been wanting to avoid using TFTP because:
- The first T in TFTP stands for trivial; TFTP is unreliable and error-prone and won’t work over wide area networks. Ideally, PXE systems would implement alternative protocols but most don’t.
- TFTP is an all-or-nothing proposition: there’s no access control to the content of the server’s directory. (There is at least one server that includes libwrap capabilities.)
- Configuration files for PXELINUX (i.e. the things that live in its pxelinux.cfg directory) cannot be created on demand. I can pre-create a file and save it in the required directory for TFTP to send out, but files must exist by the time PXELINUX asks for them.
Earlier this year I mentioned I was setting up lots of bare metal, and I mentioned iPXE (formerly gPXE, formerly Etherboot). iPXE is a network boot loader which provides a full PXE implementation with some exciting features: it can boot via HTTP (and from an iSCSI SAN), and I can control the boot process with a script. Ideally, the network cards (NIC) we use would have iPXE burnt in (which can be done) but in this project we haven’t yet evaluated what that would mean in terms of hardware.
In the following discussion I assume you’ve downloaded a copy of the iPXE source code, unpacked it, and run make in the source tree. The initial make takes a bit of time; it creates all of iPXE’s target formats. Later on I’ll show you how to embed a script, and the make for that takes a second or two.
iPXE can be used in a variety of ways, but I’ll concentrate on three scenarios in the following diagram:
The three machines boot as follows:
- machine1 sends out a PXE request which is answered by a nearby DHCP server. It then loads iPXE as undionly.kpxe from the TFTP server, and the rest happens over HTTP. undionly.kpxe is created with
make bin/undionly.kpxe, and I drop that file into my TFTP root directory and then have my DHCP server give this file as boot file to my clients, ensuring I break the infinite loop that would result. (My
- machine2 boots with a customized iPXE script, either from a modified network ROM or via, say, a CD-ROM. It obtains its network address via DHCP and can then directly “speak” to an HTTP server. To create a customized boot loader with an embedded script (e.g. jpmens.ipxe), I invoke make bin/undionly.kpxe EMBED=jpmens.ipxe and store the resulting file on a bootable floppy or burn it onto a CD-ROM, etc. The embedded script uses iPXE commands to obtain DHCP parameters when it starts, or I can hard-code IP address, net mask, etc., and I can use iPXE settings in the script, as we’ll see for machine3.
- In the case of machine3, I’ve created a custom iPXE image with which the machine boots. The script contains hard-coded network addresses, and it should be straightforward to mass-create custom images with a bit of make. This is interesting if there is no DHCP server (or relay) close to (network-wise) the node.
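Mass-creating such images could look roughly like the following Python sketch. It renders one static iPXE script per node and lists the make invocation that would embed it. The node names, the addresses (taken from the documentation ranges), the DNS server and the boot URL are all made up for illustration, and the commands are only printed, not run.

```python
# Sketch: render a static iPXE script per node and plan the builds.
# Node names, addresses, DNS server and boot URL are hypothetical placeholders.

NODES = {
    "node1": {"ip": "192.0.2.11", "netmask": "255.255.255.0", "gateway": "192.0.2.1"},
    "node2": {"ip": "192.0.2.12", "netmask": "255.255.255.0", "gateway": "192.0.2.1"},
}
BOOT_URL = "http://boot.example.net/netboot.php"  # assumed URL


def static_script(net, boot_url=BOOT_URL, dns="192.0.2.53"):
    """Return an iPXE script that configures net0 statically, then chains."""
    return "\n".join([
        "#!ipxe",
        "ifopen net0",
        f"set net0/ip {net['ip']}",
        f"set net0/netmask {net['netmask']}",
        f"set net0/gateway {net['gateway']}",
        f"set dns {dns}",
        f"chain {boot_url}",
        "",
    ])


def build_plan(nodes):
    """Yield (script name, script text, make command) for each node."""
    for name, net in sorted(nodes.items()):
        script = f"{name}.ipxe"
        command = ["make", "bin/undionly.kpxe", f"EMBED={script}"]
        yield script, static_script(net), command


if __name__ == "__main__":
    for script, text, command in build_plan(NODES):
        print(f"# {script}\n{text}")
        print(" ".join(command))
```

Every build writes to the same bin/undionly.kpxe, so each result would have to be copied aside before the next make runs.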
DHCP, TFTP, and HTTP
machine1 uses DHCP and a TFTP server to load iPXE’s undionly.kpxe, after which the latter takes over. The DHCP server configuration I’m using is:
When the machine (node) boots it fires off its first PXE request; our DHCP server receives the request and gives it an IP address, netmask, etc. as well as a boot filename, undionly.kpxe. The node then retrieves undionly.kpxe via TFTP and loads and executes it. iPXE (undionly.kpxe) then again issues a DHCP request. Without the if exists user-class magic we’d enter an endless loop where iPXE would load itself, then load itself, ad nauseam. The if ensures that when iPXE issues a DHCP request, it is given the filename netboot.php, which resides on an HTTP server. From this point onwards, everything happens over HTTP!
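The loop-breaking decision can be sketched in a few lines of Python. This is not dhcpd syntax, only the logic: iPXE identifies itself with the DHCP user class "iPXE", whereas the NIC's own PXE ROM does not. The URL is a placeholder.

```python
# Sketch of the decision the DHCP server makes; not dhcpd syntax.
TFTP_BOOT_FILE = "undionly.kpxe"
HTTP_BOOT_URL = "http://boot.example.net/netboot.php"  # assumed URL


def boot_filename(user_class=None):
    """Pick the boot file name for a DHCP reply.

    A request carrying the user class "iPXE" comes from iPXE itself and is
    pointed at the HTTP URL; anything else gets the TFTP boot file.
    """
    if user_class == "iPXE":
        return HTTP_BOOT_URL
    return TFTP_BOOT_FILE


if __name__ == "__main__":
    # The first request comes from the NIC's PXE ROM, the second from iPXE.
    for user_class in (None, "iPXE"):
        print(user_class, "->", boot_filename(user_class))
```

Because the second request is answered with a different file name, iPXE never loads itself again.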
The file name iPXE chains into is an HTTP URL which, in my case, creates an on-the-fly configuration script for iPXE. (The strange-looking construct in dhcpd.conf is to ensure the hardware address is correctly formatted.) To make things easier, I’ll omit showing the code the iPXE script is generated from (basically a database access and some Mustache); instead, here is its output:
echo prints information to the screen, using some of iPXE’s
settings. Apart from that, a kernel is loaded together with
an initrd image, and we attempt to boot that. If that fails, we
fall back into iPXE’s shell.
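A generator for a script of that shape might look like the following Python sketch. It is not the database-and-Mustache code mentioned above; the lookup table, the URLs and the kernel arguments are hypothetical stand-ins. The hardware address is normalized first so that differently formatted MACs map to the same node.

```python
import re

# Hypothetical per-node data; a real setup would query a database instead.
NODES = {
    "00:0c:29:ab:01:02": {
        "kernel": "http://boot.example.net/images/vmlinuz",
        "initrd": "http://boot.example.net/images/initrd.img",
        "args": "console=tty0",
    },
}


def normalize_mac(raw):
    """Return a MAC as zero-padded, lower-case, colon-separated hex."""
    parts = re.split(r"[:-]", raw.strip())
    if len(parts) != 6:
        raise ValueError(f"not a MAC address: {raw!r}")
    octets = [int(part, 16) for part in parts]
    if any(octet > 255 for octet in octets):
        raise ValueError(f"octet out of range in {raw!r}")
    return ":".join(f"{octet:02x}" for octet in octets)


def ipxe_script(raw_mac, nodes=NODES):
    """Render the iPXE script for one node; unknown nodes get the shell."""
    mac = normalize_mac(raw_mac)
    lines = ["#!ipxe", "echo Booting ${net0/mac} with address ${net0/ip}"]
    node = nodes.get(mac)
    if node is None:
        lines += [f"echo No configuration for {mac}", "shell"]
    else:
        lines += [
            f"kernel {node['kernel']} {node['args']}",
            f"initrd {node['initrd']}",
            "boot || shell",  # fall back to the iPXE shell if booting fails
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(ipxe_script("0:C:29:AB:1:2"))
```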
The configuration for machine2 and machine3 differs only slightly in that the former lets iPXE obtain network parameters via DHCP, and the latter has them embedded in the script. I can test with a VirtualBox client which boots from an ISO image created with one of the iPXE make targets. What I did was to create a script called jpstatic.ipxe, and I then built the ISO image I attached to VirtualBox with that script embedded.
jpstatic.ipxe is an iPXE script which defines network addresses for the machine and subsequently chains to the boot file.
When I launch the virtual machine, it boots from the ISO image containing iPXE. iPXE initializes its network stack and proceeds to run the embedded script. Note that the chain command loads a script or image from the specified HTTP server and then boots into that.
The node.ipxe script I’m chaining into doesn’t do much except print out some of iPXE’s variable values, obtained via DHCP or hard-coded into the script, and it then launches the iPXE shell:
PXELINUX over HTTP
To be as flexible as possible with regard to booting different types of images, allowing boot menus, etc., I’m adding a level of indirection. PXELINUX versions >= 3.70 can boot over HTTP. (I tried with the latest version (4.04) but that failed, so I fell back to using version 3.86.) I installed nasm and built the code from a SYSLINUX distribution:
Take note that I’m copying pxelinux.0 to the HTTP document root, and not the TFTP root. I then changed my netboot.php to return the following iPXE script:
The two DHCP options define the configuration file for PXELINUX (209) and the HTTP URL of the root of the HTTP server (210) respectively. Without option 209, when PXELINUX is loaded it will attempt to retrieve its configuration (via HTTP) from the following URLs:
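As a sketch of that fallback search, the following Python function lists the candidate names in PXELINUX's documented lookup order for a given MAC and IP address. It leaves out the UUID-based name that PXELINUX tries first when the client supplies one. The root URL and the addresses are examples only.

```python
import ipaddress


def pxelinux_config_candidates(mac, ip, root_url=""):
    """List the configuration names PXELINUX tries, most specific first."""
    # ARP hardware type 01 (Ethernet), then the MAC in lower case with dashes.
    by_mac = "01-" + mac.lower().replace(":", "-")
    # The IPv4 address as eight upper-case hex digits, which PXELINUX then
    # shortens one digit at a time before falling back to "default".
    hex_ip = f"{int(ipaddress.IPv4Address(ip)):08X}"
    by_ip = [hex_ip[:length] for length in range(8, 0, -1)]
    names = [by_mac] + by_ip + ["default"]
    return [f"{root_url}pxelinux.cfg/{name}" for name in names]


if __name__ == "__main__":
    root = "http://boot.example.net/pxe/"  # assumed HTTP root
    for url in pxelinux_config_candidates("88:99:AA:BB:CC:DD", "192.168.2.91", root):
        print(url)
```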
Instead of using static files I create PXELINUX configuration on the fly. For example, one generated configuration makes the node boot CentOS, whereas a different one makes the machine boot from the first hard disk. It is important to realize that all paths I’ve used (including those of files that also come from SYSLINUX) are relative to the HTTP root we specified as option 210 above. (Keep an eye on your HTTP access log when experimenting with this.)
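To make the two layers concrete, here is a minimal Python sketch under stated assumptions: one function renders an iPXE script that sets options 209 and 210 and chains into pxelinux.0, the other renders a PXELINUX configuration for either an install or a local-disk boot. The root URL, the configuration name and the kernel and initrd paths are hypothetical.

```python
ROOT_URL = "http://boot.example.net/pxe/"  # assumed HTTP root (option 210)


def ipxe_chain_to_pxelinux(root_url=ROOT_URL, config="pxelinux.cfg/default"):
    """Render an iPXE script that sets options 209/210 and chains pxelinux.0."""
    return "\n".join([
        "#!ipxe",
        f"set 209:string {config}",       # PXELINUX configuration file
        f"set 210:string {root_url}",     # path prefix for everything else
        "chain ${210:string}pxelinux.0",  # expanded by iPXE at run time
        "",
    ])


def pxelinux_config(action):
    """Render a PXELINUX configuration for 'install' or 'local'."""
    if action == "install":
        body = [
            "DEFAULT install",
            "LABEL install",
            "  KERNEL centos/vmlinuz",  # hypothetical path below the root
            "  APPEND initrd=centos/initrd.img",
        ]
    elif action == "local":
        body = ["DEFAULT local", "LABEL local", "  LOCALBOOT 0"]
    else:
        raise ValueError(f"unknown action: {action!r}")
    return "\n".join(body) + "\n"


if __name__ == "__main__":
    print(ipxe_chain_to_pxelinux())
    print(pxelinux_config("local"))
```

The config argument could just as well name a server-side script, which is what makes per-node configuration possible.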
dnsmasq as a DHCP server
If you use dnsmasq as your DHCP server, you can also do this. Here’s a snippet from my dnsmasq configuration:
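As a hedged sketch of how the same loop-breaking trick is commonly expressed for dnsmasq, the Python function below renders three configuration lines: requests carrying DHCP option 175 (which iPXE sends) are tagged, untagged clients get the TFTP file, and tagged ones get the HTTP URL. The tag name ipxe is arbitrary and the URL is a placeholder.

```python
def dnsmasq_ipxe_lines(tftp_file="undionly.kpxe",
                       http_url="http://boot.example.net/netboot.php"):
    """Render dnsmasq lines that hand iPXE a different boot file."""
    return [
        "dhcp-match=set:ipxe,175",          # tag requests that come from iPXE
        f"dhcp-boot=tag:!ipxe,{tftp_file}",  # plain PXE ROM: load iPXE via TFTP
        f"dhcp-boot=tag:ipxe,{http_url}",    # iPXE: continue over HTTP
    ]


if __name__ == "__main__":
    print("\n".join(dnsmasq_ipxe_lines()))
```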
To summarize, I need a DHCP server and a TFTP server close by the machines
(nodes) I’ll be booting this way, unless I go the extra mile and create custom
undionly.kpxe images that can be booted from local media. When nodes boot they go
through the following chain of events:
- Machine boots.
- If configured to use local boot media, loads iPXE from that.
- Hardware does a PXE boot and sends out a DHCP request.
- DHCP server returns reply and boot filename.
- Node requests file from TFTP server.
- undionly.kpxe (iPXE) loads and optionally issues another DHCP request, and then chains (boots) into the script returned by netboot.php.
- Node loads pxelinux.0.
- pxelinux.0 loads configuration file specified in option 209.
- pxelinux.0 loads further kernel via HTTP depending on configuration.
This sounds quite convoluted, and it is rather, but we gain a lot of functionality:
- Nodes can boot over WAN links (e.g. the Internet).
- If necessary, we can use caching HTTP proxies to reduce the volume of data transferred from the deployment server to groups of nodes.
- We can apply granular access controls to the HTTP server, something very difficult (or impossible?) to do with TFTP.
- We are highly flexible in how we create configuration for clients; we can use database queries to provision boot scripts to individual nodes or groups of nodes.
Client nodes can be set to always PXE boot, and we can remote-control what they do when they’re power-cycled: install, boot from disk, show menu, etc.