This section describes how to recreate a failed KVM node together with all VCP VMs that were hosted on it. The replacement KVM node is assigned the same IP addresses as the failed KVM node.
To replace a failed KVM node:
Log in to the Salt Master node.
Copy and keep the hostname and GlusterFS UUID of the old KVM node.
To obtain the UUIDs of all peers in the cluster:
salt '*kvm<NUM>*' cmd.run "gluster peer status"
Note
Run the command above from a different KVM node of the same cluster, because the command lists only the other peers.
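For example, assuming the failed node is kvm01 and kvm02 is a healthy peer of the same cluster (hypothetical names), you can save the hostname and UUID of the failed node from the Salt Master node:
salt 'kvm02*' cmd.run "gluster peer status | grep -A1 kvm01" > /root/kvm01_gluster_uuid.txt
The output layout may differ between GlusterFS versions, so verify that the saved Uuid line belongs to the failed node; the file path is only an example.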
Verify that the KVM node is not registered in salt-key. If the node is present, remove it:
salt-key | grep kvm<NUM>
salt-key -d kvm<NUM>.domain_name
Remove the salt-key records for all VMs that were originally running on the failed KVM node (see the example after the note below):
salt-key -d <kvm_node_name><NUM>.domain_name
Note
You can list all VMs running on the KVM node using the salt '*kvm<NUM>*' cmd.run 'virsh list --all' command. Alternatively, obtain the list of VMs from cluster/infra/kvm.yml.
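For example, if the failed KVM node hosted the ctl01, msg01, and dbs01 VMs (hypothetical names), you can remove their keys non-interactively from the Salt Master node:
for vm in ctl01 msg01 dbs01; do salt-key -d "${vm}.domain_name" -y; done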
Add or reprovision a physical node using MAAS as described in the MCP Deployment Guide: Provision physical nodes using MAAS.
Verify that the new node has been registered on the Salt Master node successfully:
salt-key | grep kvm
Note
If the new node is not available in the list, wait until it becomes available or use the IPMI console to troubleshoot the node (see the example below).
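For example, assuming ipmitool is installed on the host you troubleshoot from and <kvm_ipmi_address>, <user>, and <password> are the IPMI address and credentials of the new node (all placeholder values), you can check its power state:
ipmitool -I lanplus -H <kvm_ipmi_address> -U <user> -P <password> chassis power status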
Verify that the target node has connectivity with the Salt Master node:
salt '*kvm<NUM>*' test.ping
Verify that salt-common and salt-minion on the new node have the same versions as on the rest of the cluster:
salt -t 10 'kvm*' cmd.run 'dpkg -l |grep "salt-minion\|salt-common"'
Note
If the command above shows a different version for the new node, follow the steps described in Install the correct versions of salt-common and salt-minion.
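As a minimal sketch, assuming <version> is the package version reported for the rest of the cluster by the command above, the packages on the new node can be aligned with it:
salt '*kvm<NUM>*' cmd.run 'apt-get install -y --allow-downgrades salt-minion=<version> salt-common=<version>'
Refer to Install the correct versions of salt-common and salt-minion for the complete procedure.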
Verify that the Salt Minion nodes are synchronized:
salt '*' saltutil.refresh_pillar
Apply the linux state for the added node:
salt '*kvm<NUM>*' state.sls linux
Perform the initial Salt configuration:
Run the following commands:
salt '*kvm<NUM>*' cmd.run "touch /run/is_rebooted"
salt '*kvm<NUM>*' cmd.run 'reboot'
Wait until the node is rebooted.
Verify that the node is rebooted:
salt '*kvm<NUM>*' cmd.run 'if [ -f "/run/is_rebooted" ];then echo \
"Has not been rebooted!";else echo "Rebooted";fi'
Note
The node must be in the Rebooted state.
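Optionally, as a minimal sketch, you can poll from the Salt Master node until the minion responds again after the reboot (the 30-second interval is arbitrary):
while ! salt '*kvm<NUM>*' test.ping | grep -q True; do sleep 30; done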
Set up the network interfaces and SSH access:
salt -C 'I@salt:control' state.sls linux.system.user,openssh,linux.network,ntp
Apply the libvirt state for the added node:
salt '*kvm<NUM>*' state.sls libvirt
Recreate the original VCP VMs on the new node:
salt '*kvm<NUM>*' state.sls salt.control
Note
Salt virt takes the name of a VM and registers it on the Salt Master node. Once created, the instance picks up an IP address from the MAAS DHCP service, and its key is seen as accepted on the Salt Master node.
Verify that the added VCP VMs are registered on the Salt Master node:
salt-key
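For example, to check a specific VM (hypothetical name ctl01) and, if its key is still listed as unaccepted for some reason, accept it manually:
salt-key | grep ctl01
salt-key -a ctl01.domain_name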
Verify that the Salt Minion nodes are synchronized:
salt '*' saltutil.sync_all
Apply the highstate for the VCP VMs:
salt '*kvm<NUM>*' state.highstate
Verify that the new node has the correct IP address and proceed to restoring the GlusterFS configuration as described in Recover GlusterFS on a replaced KVM node.
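For example, you can query the IP addresses assigned to the new KVM node through Salt (a sketch; adjust the targeting to your naming scheme):
salt '*kvm<NUM>*' network.ip_addrs
salt '*kvm<NUM>*' cmd.run 'ip addr show'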