Each of you will install the Ganeti virtualization cluster management software on your Linux server.
Each pair of machines (vm1 & vm2, vm3 & vm4, ..., vm13 & vm14) will be set up in a cluster configuration, where the odd-numbered machine (1, 3, 5, ..., 13) will be the master node and the even-numbered machine (2, 4, 6, ..., 14) will be the slave.
cluster | master node | additional node(s) |
---|---|---|
gnt1.ws.nsrc.org | vm1.ws.nsrc.org | vm2.ws.nsrc.org |
gnt2.ws.nsrc.org | vm3.ws.nsrc.org | vm4.ws.nsrc.org |
gnt3.ws.nsrc.org | vm5.ws.nsrc.org | vm6.ws.nsrc.org |
... | ... | ... |
If there is an odd number of machines in the class then the last cluster will have three machines.
Note that Ganeti requires you to use fully-qualified domain names, and these must resolve to the correct IP addresses (either in the DNS or in the /etc/hosts file on every node).
All of the actions in this exercise are done as "root", so if you are not root already type:
$ sudo -s
#
Look at the contents of the file /etc/hostname and check that it contains the fully-qualified domain name, i.e.

vmX.ws.nsrc.org

(where X is your machine number). If not, edit it so that it does, then get the system to re-read this file:
# hostname -F /etc/hostname
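To confirm that the system has picked up the new name, you can run hostname with no arguments; it should now print the fully-qualified name (vmX.ws.nsrc.org):

# hostname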
Also check /etc/hosts to ensure that you have both the fully-qualified name and the short name there, pointing to the correct IP address:
127.0.0.1 localhost
10.10.0.X vmX.ws.nsrc.org vmX
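As a quick sanity check, getent consults /etc/hosts as well as the DNS, so it shows exactly what the system will resolve. Something like the following should return your management address:

# getent hosts vmX.ws.nsrc.org
10.10.0.X       vmX.ws.nsrc.org vmX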
We're now going to reconfigure the network on our machine so that we will be using VLANs. While it would be perfectly fine to use a single network for running virtual machines, there are a number of limitations, including:
* no separation between the network used to manage the servers (management) and the one where the virtual machines are placed (service)
* we will be using network-based disk replication, and we'd like to keep the disk traffic separate from the management and service traffic
Instead of using separate ethernet cards, we'll use VLANs. We need to implement three networks: management, replication, and service. Ideally, we would create three VLANs, one for each network; in this lab, however, the management network stays on the untagged (default) VLAN carried by the existing br-lan bridge [1], while replication will use VLAN 100 and service will use VLAN 255.
To be on the safe side, let's install the vlan and bridge management tools (you should already have installed these earlier).
# apt-get install vlan bridge-utils
Let's make changes to the network configuration file for your system. If you remember, this is /etc/network/interfaces.

Edit this file, and look for the br-lan definition. This is the bridge interface you created earlier, and eth0 is attached to it. It should look something like this:
# Management interface
auto eth0
iface eth0 inet manual
auto br-lan
iface br-lan inet static
address 10.10.0.X
netmask 255.255.255.0
gateway 10.10.0.254
dns-nameservers 10.10.0.241
bridge_ports eth0
bridge_stp off
bridge_fd 0
bridge_maxwait 0
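Before making any changes, you can confirm that eth0 really is attached to br-lan, using brctl from the bridge-utils package installed above; eth0 should appear in the interfaces column of the output:

# brctl show br-lan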
We're going to leave this alone, and we are not going to use VLAN tagging (802.1q) for our management network. This means that we will have both untagged and tagged (VLAN) frames going through eth0 [2].
We will proceed to create VLANs 100 and 255, and the associated bridge interfaces for them.
Let's start with VLAN 100. To do this, add the following lines below the br-lan section:
# VLAN 100
auto eth0.100
iface eth0.100 inet manual
auto br-rep
iface br-rep inet static
address 10.10.100.X
netmask 255.255.255.0
bridge_ports eth0.100
bridge_stp off
bridge_fd 0
bridge_maxwait 0
Remember to replace X with the number of your class PC.
This does two things:

1. Adds a new "sub"-interface for VLAN 100. The new interface is called eth0.100. This naming convention using a dot is quite common, and immediately identifies that VLAN 100 is associated with eth0.
2. Creates a new bridge br-rep (for 'replication'). This bridge has only one interface associated with it: eth0.100 [3].
Now we add VLAN 255. Same as before, go to the end of the file, and add the following lines:
# VLAN 255
auto eth0.255
iface eth0.255 inet manual
auto br-svc
iface br-svc inet manual
bridge_ports eth0.255
bridge_stp off
bridge_fd 0
bridge_maxwait 0
This is very similar to VLAN 100, but notice that we have NOT configured an IP address for br-svc. This is because we do not want the physical host OS to have an IP address on this network: for security reasons, the host OS shouldn't be reachable via SSH on the service network.
Review the work you have just done. The resulting file should look something like this (IPs should be the ones for your PC, of course):
# The loopback network interface
auto lo
iface lo inet loopback
# The primary network interface
auto eth0
iface eth0 inet manual
auto br-lan
iface br-lan inet static
address 10.10.0.X
netmask 255.255.255.0
gateway 10.10.0.254
dns-nameservers 10.10.0.241
bridge_ports eth0
bridge_stp off
bridge_fd 0
bridge_maxwait 0
# VLAN 100
auto eth0.100
iface eth0.100 inet manual
auto br-rep
iface br-rep inet static
address 10.10.100.X
netmask 255.255.255.0
bridge_ports eth0.100
bridge_stp off
bridge_fd 0
bridge_maxwait 0
# VLAN 255
auto eth0.255
iface eth0.255 inet manual
auto br-svc
iface br-svc inet manual
bridge_ports eth0.255
bridge_stp off
bridge_fd 0
bridge_maxwait 0
We now have the following configuration. Think of eth0, eth0.100 and eth0.255 as three different interfaces, connected to three different virtual switches (br-lan, br-rep and br-svc, respectively).
-------------+-----------------
             |
          br-lan
             |                 host X
   +---------+----------+
   |        eth0        |
   |                    |
   | eth0.255  eth0.100 |
   +----+--------+------+
        |        |
     br-svc    br-rep
        |        |
VMs ----+        +------> to other hosts
At this point, it may be best to reboot completely to test all the network changes we've done. Proceed to reboot:
# reboot
Once the machine is up again, log in again, and verify that your colleagues have finished their configuration, and test that you can ping each other:
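For example, assuming your partner's machine is vmY, you could check both the management and the replication networks (replace Y with your partner's number):

# ping -c 3 10.10.0.Y      # management network, via br-lan
# ping -c 3 10.10.100.Y    # replication network, via br-rep (VLAN 100)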
If you have problems:

* check /etc/network/interfaces for mistakes
* use ifconfig to see if the interfaces are configured

You may want to test that you can resolve the following hostnames using the dig command:
dig +short vm1.ws.nsrc.org
dig +short vm2.ws.nsrc.org
..
dig +short gnt1.ws.nsrc.org
dig +short gnt2.ws.nsrc.org
..
Now install the software from the right package repository. How to do this depends on whether your machine is running Debian or Ubuntu.
On Debian, the available version of Ganeti is too old, but fortunately the current version is available in a backports repository [4].
As root, create a file /etc/apt/sources.list.d/wheezy-backports.list containing this one line:
deb http://cdn.debian.net/debian/ wheezy-backports main
Then refresh the index of available packages:
# apt-get update
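As an optional check, you can ask apt which versions of the ganeti package are now visible; you should see the newer version from wheezy-backports alongside the older stable one:

# apt-cache policy ganeti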
Now, install the Ganeti software package. Note that the backports packages are not used unless you ask for them explicitly.
# apt-get install ganeti/wheezy-backports
This will install the current released version of Ganeti on your system; but any dependencies it pulls in will be the stable versions.
On Ubuntu, the version of Ganeti available is too old, and there is no version in backports. Luckily, a newer version of Ganeti is available for the version of Ubuntu we're running, via a "Personal Package Archive" (PPA):

https://launchpad.net/~pkg-ganeti-devel/+archive/lts

To add the necessary information to our list of package sources (/etc/apt/sources.list), run the following commands:
# apt-get install python-software-properties
# add-apt-repository ppa:pkg-ganeti-devel/lts
The second command will prompt you:
You are about to add the following PPA to your system:
This PPA contains stable versions of Ganeti backported to Ubuntu LTS. Currently it covers 12.04 LTS.
More info: https://launchpad.net/~pkg-ganeti-devel/+archive/lts
Press [ENTER] to continue or ctrl-c to cancel adding it
Just press [ENTER]
The package archive will now be available. We still need to update the list of available packages:
# apt-get update
Now, install the Ganeti software package:
# apt-get install ganeti
This will install the current released version of Ganeti on your system.
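Whichever distribution you are running, an optional way to confirm which version of Ganeti ended up installed (the exact version string will vary) is:

# dpkg -l 'ganeti*'
# gnt-cluster --version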
We'll now set up DRBD (Distributed Replicated Block Device), which will make it possible for VMs to have redundant storage across two physical machines.
DRBD was already installed when we installed Ganeti, but we still need to change the configuration:
# echo "options drbd minor_count=128 usermode_helper=/bin/true" >/etc/modprobe.d/drbd.conf
# echo "drbd" >>/etc/modules
# rmmod drbd # ignore error if the module isn't already loaded
# modprobe drbd
The entry in /etc/modules ensures that drbd is loaded at boot time.
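As an optional check that the module is loaded, /proc/drbd should exist and report the DRBD version:

# lsmod | grep drbd
# cat /proc/drbd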
Ganeti will need to log in as root to the other nodes in the cluster so it can set up the configuration files there. After the first login, SSH keys are used (and therefore no password is needed), but for the first connection, we need to set a root password.
For Ubuntu servers only: you need to set a root password on each node. (For Debian servers, this will have already been done at installation time)
Note: You only need to do this on the slave node in each pair of servers.
# passwd root
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
Use the in-class password!
Finally, create a directory for SSH keys to be stored for the root user:
# mkdir /root/.ssh
# chmod 700 /root/.ssh
We are now ready to run the command that will create the Ganeti cluster. Do this only on the MASTER node of the cluster.
# gnt-cluster init --master-netdev=br-lan --enabled-hypervisors=kvm \
-H kvm:kernel_path="",initrd_path="" -N link=br-svc -s 10.10.100.X \
gntN.ws.nsrc.org
where X is the number of your host (like vm1, vm2 etc), and N is the number of your cluster (gnt1, gnt2 etc)
Explanation of the above parameters:
* --master-netdev => Ganeti uses a virtual IP which the Ganeti software service will use when communicating with the tools and the other nodes in the cluster. In our case, our management network is br-lan, thus we set master-netdev to be br-lan. Observe that there is indeed an interface br-lan:0 now configured:

  $ ifconfig br-lan:0

  The IP address should be the one that the hostname gntN.ws.nsrc.org resolves to.

* --enabled-hypervisors => We are using KVM as our hypervisor.

* -N link => Here we set the default network to which the virtual machines we create will be attached. In our case, this will be br-svc.

* -s => This tells Ganeti which secondary IP to use for disk replication. We created a dedicated network for this.

Finally, gntN.ws.nsrc.org is the name of the cluster you are creating.
If everything goes well, this command will take 5-6 seconds to complete.
The command gnt-cluster init will not output anything unless a problem occurred.
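If you would like some positive confirmation, you can ask Ganeti which node is currently the cluster master; it should print the fully-qualified name of the node you ran the command on (vmX.ws.nsrc.org):

# gnt-cluster getmaster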
During the cluster creation, the node you ran the command on (the master node) was automatically added to the cluster. So we don't need to do that and can proceed directly to adding the other nodes in the cluster. In this case, there is only one other node: vmY.ws.nsrc.org.
So let's run the command to add the other node. Note the use of the -s option to indicate which IP address will be used for disk replication on the node you are adding.
Run this command only on the MASTER node of the cluster.
# gnt-node add -s 10.10.100.Y vmY.ws.nsrc.org
You will be warned that the command will replace the SSH keys on the destination machine (the node you are adding) with new ones. This is normal.
-- WARNING --
Performing this operation is going to replace the ssh daemon keypair
on the target machine (vmY) with the ones of the current one
and grant full intra-cluster ssh root access to/from it
When asked if you want to continue connecting, say yes:
The authenticity of host 'vmY (10.10.0.Y)' can't be established.
ECDSA key fingerprint is a1:af:e8:20:ad:77:6f:96:4a:19:56:41:68:40:2f:06.
Are you sure you want to continue connecting (yes/no)? yes
When prompted for the root password for vmY, enter it:
Warning: Permanently added 'vmY' (ECDSA) to the list of known hosts.
root@vmY's password:
You may see the following informational message; you can ignore it:
Rather than invoking init scripts through /etc/init.d, use the service(8)
utility, e.g. service ssh restart
Since the script you are attempting to invoke has been converted to an
Upstart job, you may also use the stop(8) and then start(8) utilities,
e.g. stop ssh ; start ssh. The restart(8) utility is also available.
ssh stop/waiting
ssh start/running, process 2921
The last message you should see is this:
Tue Jan 14 01:07:40 2014 - INFO: Node will be a master candidate
This means that the machine you have just added to the cluster (vmY) can take over the role of configuration master for the cluster, should the master (vmX) crash or be unavailable.
If your cluster has three nodes, then repeat the above for the third node.
Again only on the MASTER node of the cluster:
# gnt-cluster verify
This will tell you if there are any errors in your configuration. You may see errors about "orphan volumes":
Thu Feb 6 05:02:47 2014 * Verifying orphan volumes
Thu Feb 6 05:02:47 2014 - ERROR: node vmX.ws.nsrc.org: volume xenvg/swap is unknown
Thu Feb 6 05:02:47 2014 - ERROR: node vmX.ws.nsrc.org: volume xenvg/var is unknown
Thu Feb 6 05:02:47 2014 - ERROR: node vmX.ws.nsrc.org: volume xenvg/root is unknown
These are logical volumes which were created outside Ganeti, so Ganeti does not know about or manage them. You can avoid this error by telling Ganeti to ignore those logical volumes:
# gnt-cluster modify --reserved-lvs=xenvg/root,xenvg/swap,xenvg/var
# gnt-cluster verify
If you still have any errors, please talk to the instructors.
To see detailed information on how your cluster is configured, try these commands:
# gnt-cluster info | more
Look at the output.
# gnt-node list
You are done with the basic installation!
It would be a good idea to make sure that the VNC consoles for the VMs are protected by a password.
To do this, we can create a cluster-wide password for every VM console.
This can later be overridden (changed) for each instance (VM).
To create the cluster-wide password, run this command on the master:
# echo 'xyzzy' >/etc/ganeti/vnc-cluster-password
# gnt-cluster modify -H kvm:vnc_password_file=/etc/ganeti/vnc-cluster-password
You will probably see an error message:
Failure: command execution error:
Hypervisor parameter validation failed on node vmY.ws.nsrc.org: Parameter 'vnc_password_file' fails validation: not found or not a file (current value: '/etc/ganeti/vnc-cluster-password')
Hmm, we just added the file - but wait! It's telling us that the file is missing from the secondary node (vmY.ws.nsrc.org), where Y is our partner node.
That's because we only created /etc/ganeti/vnc-cluster-password on the master node. It needs to be on every node (host) since any one of them could become the cluster master in the future.
There's a great command for this in Ganeti: gnt-cluster copyfile. It takes a file as a parameter, and takes care of copying it to every node in the cluster. In this case, we want our file /etc/ganeti/vnc-cluster-password to be copied.
To do this (on the master host - you will get a complaint if you try and run this on the other nodes):
# gnt-cluster copyfile /etc/ganeti/vnc-cluster-password
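If you want to confirm the file is now present everywhere, gnt-cluster command runs a shell command on every node in the cluster, so an optional check is:

# gnt-cluster command ls -l /etc/ganeti/vnc-cluster-password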
You can now re-run the command from earlier:
# gnt-cluster modify -H kvm:vnc_password_file=/etc/ganeti/vnc-cluster-password
That's it! Next up, we'll create some instances (VMs) and test migration.
Proceed to the next lab ex-ganeti-create
[1] Note that VLAN 1 can have a special meaning. On many switches, VLAN 1 is the "default" VLAN, and cannot be removed. Some switches only allow management using VLAN 1. For security reasons, it's good practice to disable VLAN 1 and use other VLAN numbers. In our workshop, we'll keep it to make things simpler in our labs.

[2] This isn't a typical network setup, but it keeps things simpler here so we don't have to change the network configuration for our management network.

[3] We won't be attaching (connecting) any virtual machines to br-rep, so the bridge interface is not strictly necessary (we could have allocated the IP directly to eth0.100).

[4] Backports are newer versions of third-party software than the ones originally packaged for your version of the operating system. These newer versions, packaged for a newer release of Debian (or Ubuntu), have been made available (or backported) for the version we are using.