Replace a failed KVM node

This section describes how to recreate a failed KVM node with all VCP VMs that were hosted on the old KVM node. The replaced KVM node will be assigned the same IP addresses as the failed KVM node.

To replace a failed KVM node:

  1. Log in to the Salt Master node.

  2. Copy and keep the hostname and GlusterFS UUID of the old KVM node.

    To obtain the UUIDs of all peers in the cluster:

    salt '*kvm<NUM>*' cmd.run "gluster peer status"
    

    Note

    Run the command above against a different KVM node of the same cluster, because gluster peer status lists only the other peers, not the node it is run on.
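
    For example, a minimal way to keep this information for later reference, assuming kvm03 is the failed node and kvm01 is a healthy peer of the same cluster (hypothetical names):

    salt 'kvm01*' cmd.run "gluster peer status" | tee /var/tmp/kvm03_peer_info.txt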

  3. Verify that the KVM node is not registered in salt-key. If the node is present, remove it:

    salt-key | grep kvm<NUM>
    salt-key -d kvm<NUM>.domain_name
    
  4. Remove the salt-key records for all VMs originally running on the failed KVM node:

    salt-key -d <vm_name><NUM>.domain_name
    

    Note

    You can list all VMs running on the KVM node using the salt '*kvm<NUM>*' cmd.run 'virsh list --all' command. Alternatively, obtain the list of VMs from cluster/infra/kvm.yml.
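
    For example, a minimal sketch of removing the keys in one loop, assuming the failed node hosted the hypothetical VMs ctl03, msg03, and dbs03:

    for vm in ctl03 msg03 dbs03; do
        salt-key -d "${vm}.domain_name" -y
    done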

  5. Add or reprovision a physical node using MAAS as described in the MCP Deployment Guide: Provision physical nodes using MAAS.

  6. Verify that the new node has been registered on the Salt Master node successfully:

    salt-key | grep kvm
    

    Note

    If the new node is not available in the list, wait some time until the node becomes available or use the IPMI console to troubleshoot the node.
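
    For example, you can poll the key list until the new node appears, assuming kvm03 is the node being replaced (hypothetical name):

    watch -n 30 'salt-key | grep kvm03'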

  7. Verify that the target node has connectivity with the Salt Master node:

    salt '*kvm<NUM>*' test.ping
    
  8. Verify that salt-common and salt-minion have the same versions on the new node as on the rest of the cluster:

    salt -t 10 'kvm*' cmd.run 'dpkg -l |grep "salt-minion\|salt-common"'
    

    Note

    If the command above shows a different version for the new node, follow the steps described in Install the correct versions of salt-common and salt-minion.
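
    For example, to check which salt-common and salt-minion versions are installed on the new node and which versions are available from the configured repositories (a sketch using apt-cache):

    salt '*kvm<NUM>*' cmd.run 'apt-cache policy salt-common salt-minion'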

  9. Verify that the Salt Minion nodes are synchronized:

    salt '*' saltutil.refresh_pillar
    
  10. Apply the linux state for the added node:

    salt '*kvm<NUM>*' state.sls linux
    
  11. Perform the initial Salt configuration:

    1. Run the following commands:

      salt '*kvm<NUM>*' cmd.run "touch /run/is_rebooted"
      salt '*kvm<NUM>*' cmd.run 'reboot'
      

      Wait for some time until the node is rebooted, or poll it as shown below.
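
      For example, a minimal polling loop that reuses the test.ping check (a sketch; adjust the interval as needed):

      until salt '*kvm<NUM>*' test.ping | grep -q True; do sleep 30; done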

    2. Verify that the node is rebooted:

      salt '*kvm<NUM>*' cmd.run 'if [ -f "/run/is_rebooted" ];then echo \
      "Has not been rebooted!";else echo "Rebooted";fi'
      

      Note

      The node must be in the Rebooted state.

  12. Set up the network interfaces and the SSH access:

    salt -C 'I@salt:control' state.sls linux.system.user,openssh,linux.network,ntp
    
  13. Apply the libvirt state for the added node:

    salt '*kvm<NUM>*' state.sls libvirt
    
  14. Recreate the original VCP VMs on the new node:

    salt '*kvm<NUM>*' state.sls salt.control
    

    Note

    Salt virt takes the name of a VM and registers it on the Salt Master node.

    Once created, the instance obtains an IP address from the MAAS DHCP service and its key appears as accepted on the Salt Master node.
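
    For example, to confirm that a recreated VM has registered its key, assuming ctl03 is one of the VCP VMs that were hosted on the failed node (hypothetical name):

    salt-key -l acc | grep ctl03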

  15. Verify that the added VCP VMs are registered on the Salt Master node:

    salt-key
    
  16. Verify that the Salt Minion nodes are synchronized:

    salt '*' saltutil.sync_all
    
  17. Apply the highstate for the KVM node:

    salt '*kvm<NUM>*' state.highstate
    
  18. Verify that the new node has the correct IP address and proceed to restore the GlusterFS configuration as described in Recover GlusterFS on a replaced KVM node.
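
    For example, a minimal check of the IP addresses assigned to the new node (a sketch using the Salt network module; compare the output with the addresses of the failed node):

    salt '*kvm<NUM>*' network.ip_addrs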