If the instructors have already assembled your cluster: please skip the first section and jump straight to the section headed “Live migration”.
In this lab, you’re going to add new nodes to the cluster, and then perform a live migration of a virtual machine.
There are multiple clusters (depending on the class size), each with five nodes, connected like this:

Managing nodeX1 is a shared responsibility between all the groups using a particular cluster; but then each group is responsible for one of the other nodes. Your group number (node number) should have been assigned to you by the instructors.
It is VERY IMPORTANT that the groups on the same cluster work together and do this ONE STEP at a time! That is: first initialize the cluster on nodeX1, then add nodeX2, and only move on to the next node once the previous one has joined successfully.
Following recommended practice, we are going to use a separate NIC for cluster communication (corosync), on a separate subnet, 100.64.4.0/24. This NIC has already been set up for you; in real life you’d have to configure this network interface first.
Every node must select its 100.64.4.X address when joining the cluster; otherwise the nodes won’t be able to communicate.
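For reference, this is roughly what that configuration would look like: on Proxmox VE, network interfaces are defined in /etc/network/interfaces. A minimal sketch for nodeX1, assuming the corosync NIC shows up as eth1 (as it does in the cluster dialogs later in this lab; the name may differ on other hardware):

# /etc/network/interfaces (fragment) - eth1 is an example interface name
auto eth1
iface eth1 inet static
    address 100.64.4.1X1/24

After editing the file, the change can be applied with "ifreload -a" (or a reboot).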
PLEASE BE CAREFUL. Fixing a broken cluster in Proxmox VE is HARD.
For this part, ONE person is going to prepare nodeX1 for the whole cluster. So, groups 12-15 should nominate one person and gather around their laptop; groups 22-25 choose another person, and so on.
nodeX1 has to be initialized as a standalone cluster, before other nodes can join. There are two ways to do this: choose whichever way you prefer.
Login to nodeX1 GUI as you did before.
Go to Datacenter (first column), Cluster (second column)
Click “Create Cluster”, which gives you a dialog box.
For Link 0, make sure address 100.64.4.1X1 (eth1) is selected.
It should generate a few lines of output ending with “TASK OK”. Close the output window.
You should then see your single node under “Cluster Nodes” in the GUI.
If there’s a problem, ask your instructors for help.
Login to nodeX1 GUI, then get a command line by selecting nodeX1 in
the left-hand menu, then clicking >_ Shell in the next
column.

Then enter this command:
pvecm create clusterX --link0 100.64.4.1X1 --nodeid 1
(replace both occurrences of X with your cluster number)
You should see a few lines of output. To confirm the cluster status afterwards, enter this command:
pvecm status
which should end like this:
Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 100.64.4.1X1 (local)
If there’s a problem, ask your instructors for help.
Now the group which is managing nodeX2 (see the table at the top) needs to add it to the cluster. One person in that group should do these steps, while the other people working on that cluster watch and help.
Again, you have two options for doing this. Choose either one; it doesn’t matter which one was used to set up nodeX1.
This requires you to use both nodeX1 and nodeX2 GUIs. Open both on the same laptop.
First, login to nodeX1 GUI. Go to Datacenter (first column), Cluster (second column), then click “Join Information”.
This will give you a dialog box showing three fields: IP address (of nodeX1), Fingerprint, and Join Information which is a long blob of encoded text.
Click “Copy Information” at the bottom left, and it will copy the Join Information to the clipboard.

Now open a window to nodeX2 GUI and login there.
Go to Datacenter (first column), Cluster (second column)
Click “Join Cluster”, which gives you a dialog box. Paste the text blob into the “Information” field, and more fields will open up:

Under “Cluster Network”, next to Link 0, select address 100.64.4.XXX (interface enp6s0).
Finally, you have to enter the peer’s root password (i.e. the root password for nodeX1, which you already know), and click “Join ‘ClusterX’”
At this point you’ll start getting some results:
Establishing API connection with host '100.64.0.101'
Login succeeded.
check cluster join API version
Request addition of this node
At this point the web interface may hang, or it may get a bit further:
Join request OK, finishing setup locally
stopping pve-cluster service
and/or you may see an error like “permission denied - invalid PVE ticket (401)”. This is due to the web server restarting after joining the cluster. If so, refresh the web page and login again. If the browser shows that the site now has an invalid certificate, try quitting and restarting the browser.
When logged back in, you should be able to see the success in the “Tasks” log at the bottom of the screen:

You should also see both nodes in the GUI of both nodeX1 and nodeX2, when you go to “Datacenter” in column 1, “Cluster” in column 2.
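(Optional) If you want to cross-check from the command line as well, open a shell on either node and run

pvecm nodes

which lists the node IDs, votes and names of all current cluster members (the same information as the “Membership information” section of pvecm status).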
Login to nodeX2 GUI, then get a command line by selecting nodeX2 in
the left-hand menu, then clicking >_ Shell in the next
column.
Then enter this command:
pvecm add nodeX1.ws.nsrc.org --link0 100.64.4.1XY --nodeid Y
(replace X with your cluster number, and Y with the last digit of your node number, e.g. 2 for nodeX2)
The first parameter, nodeX1.ws.nsrc.org, is the existing
cluster node that we want to join.
You will be prompted for the root password for nodeX1; enter it.
If you get an error saying “hostname verification failed” then you’ll need a longer version of this command:
pvecm add nodeX1.ws.nsrc.org --link0 100.64.4.1XY --nodeid Y \
--fingerprint ZZ:ZZ:ZZ:ZZ...
(the trailing backslash continues the command onto a second line; you can also type it as a single line). Replace ZZ:ZZ:ZZ:ZZ... with the fingerprint shown in the cluster “Join Information” dialog in the nodeX1 GUI.
You should see some results like this:
Establishing API connection with host 'node01.ws.nsrc.org'
Login succeeded.
check cluster join API version
Request addition of this node
Join request OK, finishing setup locally
stopping pve-cluster service
backup old database to '/var/lib/pve-cluster/backup/config-1753349590.sql.gz'
waiting for quorum...OK
(re)generate node files
generate new node certificate
merge authorized SSH keys
generated new node certificate, restart pveproxy and pvedaemon services
successfully added node 'node02' to cluster.
(You may then be logged out of the GUI; if so, login again)
To confirm the cluster status afterwards, enter this command:
pvecm status
which should end like this:
Votequorum information
----------------------
Expected votes:   2
Highest expected: 2
Total votes:      2
Quorum:           2
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 100.64.4.1X1
0x00000002          1 100.64.4.1X2 (local)
Note the “Quorum”. For the cluster to work, more than half of the nodes have to be online to agree on cluster state. Since we have only two nodes, the quorum is 2; both nodes must be online.
“Quorate” means that the cluster does currently have sufficient active nodes to form a quorum.
ONE GROUP AT A TIME, repeat the above steps which were done for nodeX2 on the remaining nodes. That is: the next group should add their nodeX3; only when that is complete and working move to nodeX4; and so on.
Once this is finished, you should have a 5-node cluster. (This is a good cluster size: it is recommended that clusters have an odd number of nodes.)
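To see why an odd number of nodes is recommended, it helps to work through the quorum arithmetic. With the default of one vote per node, corosync requires a strict majority, i.e. quorum = floor(N/2) + 1:

N = 2  ->  quorum = 2   (no node may be down)
N = 4  ->  quorum = 3   (tolerates 1 node down)
N = 5  ->  quorum = 3   (tolerates 2 nodes down)

So going from 4 to 5 nodes gains fault tolerance, whereas going from 5 to 6 would not.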
At this point, you can break up into your separate groups. Each group can run this exercise in parallel with the others.
You’re going to live-migrate the VM you created earlier
(groupXY-web) to your group’s new node.
Choose someone in the group who hasn’t done much so far
Find your groupXY-web VM in the GUI. It should be under
nodeX1. Start it, if it’s not already running, and then open the
console.
Login to the VM (username “ubuntu” and the password you set when creating the VM), then start the following command:
ping 8.8.8.8
You should see pings scrolling up the screen, with increasing
sequence numbers (icmp_seq).
You are now going to move this VM, while it is running, to another node.
Click the “Migrate” button, which is at the top right above the console.

For the target, choose the node which your group is managing. For example, if you’re group12, you will migrate to node12.

Click “Migrate” and a progress window should pop up. You will see the disk contents being migrated:
...
drive mirror is starting for drive-scsi0
mirror-scsi0: transferred 6.2 MiB of 5.0 GiB (0.12%) in 0s
mirror-scsi0: transferred 1.7 GiB of 5.0 GiB (34.69%) in 1s
mirror-scsi0: transferred 2.7 GiB of 5.0 GiB (53.51%) in 2s
mirror-scsi0: transferred 5.0 GiB of 5.0 GiB (100.00%) in 3s, ready
...
This is followed by the migration of the VM state (RAM). Hopefully it will end with TASK OK after a few seconds, with the ping still running in the console. Close the task viewer window.
But if you look in the left-hand pane, you’ll see that your VM is now running on a different node!
Live migration involves copying the entire RAM and CPU state, and because this VM has local disk, it also involves copying the entire disk image. That’s why there’s a warning that it might take a long time. Since RAM and disk can be changing, any “dirty” data may have to be copied again. There may be a short pause of a second or two while the final copy takes place.
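For reference, the same live migration can also be started from a shell on the node currently hosting the VM. A sketch, where the VM ID 100 and target node12 are examples to be replaced with your own:

# 100 = example VM ID, node12 = example target node
qm migrate 100 node12 --online --with-local-disks

Here --online keeps the VM running during the migration, and --with-local-disks copies the local disk image as well, just as the GUI did above.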
Now that all the nodes share a replicated cluster database, it’s possible to create user accounts which can login to the GUI and manage all nodes, without having to create system users on every node.
Using these steps, everyone in the class can create their own user account, so they don’t need to use the PAM ‘root’ login any more.
In the GUI, go to Datacenter (first column), Permissions > Groups (second column)
If you find there’s already a group called “admin”, then somebody else has beaten you to this: go straight to the next section, “Create your own user account”.
Otherwise, you are the first. Click on “Create”
In the “Create: Group” box, enter name “admin”, then press Create. (If it fails with “group ‘admin’ already exists” then someone else has beaten you to it, skip to the next section).

Next, click on “Permissions” in the second column (directly above “Users”). Click Add > Group Permission.

From the drop-down menus select Path “/”, Group “admin” and Role “Administrator”, then click Add. This gives anyone in the “admin” group all permissions on all resources.
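For reference, the same group and permission could also be created from a root shell on any cluster node using the pveum tool (a sketch; the exact option names are those of recent Proxmox VE releases):

pveum group add admin
pveum acl modify / --groups admin --roles Administrator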
Now you are going to create your own user account.
Go to Permissions > Users and click “Add”. In the dialog box enter:

You should see your new user in the list of users:

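Again for reference, a rough command-line equivalent, where the username “alice” is just an example (the realm “pve” is the Proxmox VE authentication server, and “admin” is the group created above):

# alice is an example username - use your own
pveum user add alice@pve --groups admin
pveum passwd alice@pve

The second command prompts for the new user’s password.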
Now log out; the Logout option is in the root@pam drop-down menu at the top-right corner.
You should be prompted to login again. Login with your new
credentials, remembering to select realm “Proxmox VE authentication
server” if it’s not already selected. You should be able to login, and
now see <username>@pve in the top-right corner.
A cluster set up this way means that most users don’t need to know
the root password. Furthermore, any actions taken in the GUI (such as
stopping and starting VMs) will be shown in the log with their username
instead of root@pam.
You can also assign different levels of permissions to groups or users. We will not investigate that here.
NOTE: if you try to open a shell on a Proxmox node while logged into the GUI as a user other than root@pam, you will be required to log in to that shell. Log in as “root” with the root password.
For this lab, we have installed the web server certificates for you. To do this manually, you’d upload
/etc/pve/nodes/<nodename>/pveproxy-ssl.key # private key
/etc/pve/nodes/<nodename>/pveproxy-ssl.pem # certificate
and then systemctl restart pveproxy. Alternatively, the
web interface has options for uploading certificates and configuring
ACME for automated certificate issuance - we don’t have time to go into
this. We recommend you configure this after you have joined
your nodes into the cluster, since adding a node to the cluster wipes
its local copy of /etc/pve and synchronizes it to the
shared cluster database, and hence any existing certificate it has is
lost.
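For reference, if you had obtained a key and certificate for your node, the manual installation described above might look roughly like this (the source file names are placeholders for wherever your key and certificate actually live):

# your-node.key / your-node-fullchain.pem are placeholder file names
cp your-node.key /etc/pve/nodes/<nodename>/pveproxy-ssl.key
cp your-node-fullchain.pem /etc/pve/nodes/<nodename>/pveproxy-ssl.pem
systemctl restart pveproxy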
To see how we configured the NICs, go to nodeXY (first column), System > Network (second column). For maximum cluster reliability it is even possible to configure two corosync NICs, on separate subnets and connected to separate switches. Again, we won’t do that here. Instead, we have reserved an extra NIC for storage replication traffic.
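For reference only, a redundant corosync link is given at cluster creation (or join) time with an additional --link parameter. A sketch, assuming a hypothetical second dedicated subnet 100.64.5.0/24, which is not configured in this lab:

# 100.64.5.0/24 is a hypothetical second corosync subnet
pvecm create clusterX --link0 100.64.4.1X1 --link1 100.64.5.1X1 --nodeid 1

Corosync can then fail over between the two links if one of the networks goes down.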