Replace a failed KVM node

This section describes how to recreate a failed KVM node together with all VCP VMs that were hosted on it. The replacement KVM node is assigned the same IP addresses as the failed KVM node.

To replace a failed KVM node:

  1. Log in to the Salt Master node.

  2. Copy and keep the hostname and GlusterFS UUID of the old KVM node.

    To obtain the UUIDs of all peers in the cluster:

    salt '*kvm<NUM>*' cmd.run "gluster peer status"

    Run the command above from a different KVM node of the same cluster, since the command outputs only the other peers.

  3. Verify that the KVM node is not registered in salt-key. If the node is present, remove it:

    salt-key | grep kvm<NUM>
    salt-key -d kvm<NUM>.domain_name

  4. Remove the salt-key records for all VMs originally running on the failed KVM node:

    salt-key -d <kvm_node_name><NUM>.domain_name

    You can list all VMs running on the KVM node using the salt '*kvm<NUM>*' cmd.run 'virsh list --all' command. Alternatively, obtain the list of VMs from cluster/infra/kvm.yml.

  5. Add or reprovision a physical node using MAAS as described in the MCP Deployment Guide: Provision physical nodes using MAAS.

  6. Verify that the new node has been registered on the Salt Master node successfully:

    salt-key | grep kvm

    If the new node does not appear in the list, wait until the node becomes available or use the IPMI console to troubleshoot the node.

  7. Verify that the target node has connectivity with the Salt Master node:

    salt '*kvm<NUM>*' test.ping

  8. Verify that salt-common and salt-minion on the new node have the same versions as on the rest of the cluster:

    salt -t 10 'kvm*' cmd.run 'dpkg -l |grep "salt-minion\|salt-common"'

    If the command above shows a different version for the new node, follow the steps described in Install the correct versions of salt-common and salt-minion.

  9. Verify that the Salt Minion nodes are synchronized:

    salt '*' saltutil.refresh_pillar

  10. Apply the linux state for the added node:

    salt '*kvm<NUM>*' state.sls linux

  11. Perform the initial Salt configuration:

    1. Run the following commands:

      salt '*kvm<NUM>*' cmd.run "touch /run/is_rebooted"
      salt '*kvm<NUM>*' cmd.run 'reboot'

      Wait for the node to reboot.

    2. Verify that the node is rebooted:

      salt '*kvm<NUM>*' cmd.run 'if [ -f "/run/is_rebooted" ];then echo \
      "Has not been rebooted!";else echo "Rebooted";fi'

      The node must be in the Rebooted state.

  12. Set up the network interfaces and the SSH access:

    salt -C 'I@salt:control' state.sls linux.system.user,openssh,linux.network,ntp

  13. Apply the libvirt state for the added node:

    salt '*kvm<NUM>*' state.sls libvirt

  14. Recreate the original VCP VMs on the new node:

    salt '*kvm<NUM>*' state.sls salt.control

    Salt virt takes the name of a VM and registers it on the Salt Master node.

    Once created, the instance picks up an IP address from the MAAS DHCP service, and its key appears as accepted on the Salt Master node.

  15. Verify that the added VCP VMs are registered on the Salt Master node:

    salt-key

  16. Verify that the Salt Minion nodes are synchronized:

    salt '*' saltutil.sync_all

  17. Apply the highstate for the VCP VMs:

    salt '*kvm<NUM>*' state.highstate

  18. Verify that the new node has the correct IP address, and then restore the GlusterFS configuration as described in Recover GlusterFS on a replaced KVM node.
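
Step 2 of the procedure asks you to record the hostname and GlusterFS UUID of the failed node. As a minimal sketch, assuming that gluster peer status reports each peer as a "Hostname:" line followed by a "Uuid:" line, the hypothetical helper peer_uuids below (an illustration only, not part of MCP, Salt, or GlusterFS) reduces that output to one "hostname UUID" pair per line so that the values are easier to copy and keep:

```shell
#!/bin/sh
# Hypothetical helper for illustration; not part of MCP, Salt, or GlusterFS.
# Reads "gluster peer status" output on stdin (plain, or indented as Salt
# prints it) and writes one "<hostname> <uuid>" pair per peer to stdout.
# Assumption: every peer is reported as a "Hostname:" line followed by a
# "Uuid:" line. Other lines, such as the minion ID or "State:", are ignored.
peer_uuids() {
  awk '
    $1 == "Hostname:" { host = $2 }        # remember the most recent peer name
    $1 == "Uuid:"     { print host, $2 }   # emit the pair once the UUID is seen
  '
}
```

If the helper is defined in the current shell on the Salt Master node, the output of the peer status command from step 2 can be piped into peer_uuids. Because awk splits fields on whitespace, the indentation that Salt adds to command output does not affect the result.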