Use Masakari

This section describes how to verify that the cloud operator has configured the Masakari service correctly and that the service recovers an instance from both process and compute node failures.

Verify recovery from a VM process failure

  1. Create an instance:

    openstack server create --image Cirros-5.1 --flavor m1.tiny --network DemoNetwork DemoInstance01
    
  2. If required, set the HA_Enabled property on the instance:

    Note

    Depending on the Masakari service configuration, you may need to set the HA_Enabled property on the instances that must be recovered. For more information about service configuration, refer to Configure high availability with Masakari.

    openstack server set --property HA_Enabled=True DemoInstance01
    
  3. Identify the compute host with the instance:

    openstack server show DemoInstance01 |grep host
    | OS-EXT-SRV-ATTR:host | vs-ps-vyvsrkrdpusv-1-w2mtagbeyhel-server-cgpejthzbztt |
    
  4. Log in to the compute host and obtain the instance PID:

    ps xafu |grep qemu
    nova      5231 34.3  1.1 5459452 184712 ?      Sl   07:39   0:18  |   \_ /usr/bin/qemu-system-x86_64 -name guest=instance-00000002....
    
  5. Simulate the failure by killing the process:

    kill -9 5231
    
  6. Verify that Masakari has created a notification for the failure and that its status changes from running to finished:

    openstack notification list
    +--------------------------------------+----------------------------+---------+------+--------------------------------------+-----------------------------------------------------------------------------------------------------------------------+
    | notification_uuid                    | generated_time             | status  | type | source_host_uuid                     | payload                                                                                                               |
    +--------------------------------------+----------------------------+---------+------+--------------------------------------+-----------------------------------------------------------------------------------------------------------------------+
    | 2fb82a5c-9a8b-4cef-a06e-a737e1b565a0 | 2021-07-06T07:40:40.000000 | running | VM   | 6f1bd5aa-0c21-446a-b6dd-c1b4d09759be | {'event': 'LIFECYCLE', 'instance_uuid': '165cdfaf-b9e5-42b2-bbb9-af9283a789ae', 'vir_domain_event': 'STOPPED_FAILED'} |
    +--------------------------------------+----------------------------+---------+------+--------------------------------------+-----------------------------------------------------------------------------------------------------------------------+
    
    openstack notification list
    +--------------------------------------+----------------------------+----------+------+--------------------------------------+-----------------------------------------------------------------------------------------------------------------------+
    | notification_uuid                    | generated_time             | status   | type | source_host_uuid                     | payload                                                                                                               |
    +--------------------------------------+----------------------------+----------+------+--------------------------------------+-----------------------------------------------------------------------------------------------------------------------+
    | 2fb82a5c-9a8b-4cef-a06e-a737e1b565a0 | 2021-07-06T07:40:40.000000 | finished | VM   | 6f1bd5aa-0c21-446a-b6dd-c1b4d09759be | {'event': 'LIFECYCLE', 'instance_uuid': '165cdfaf-b9e5-42b2-bbb9-af9283a789ae', 'vir_domain_event': 'STOPPED_FAILED'} |
    +--------------------------------------+----------------------------+----------+------+--------------------------------------+-----------------------------------------------------------------------------------------------------------------------+
    
  7. Verify that the instance process has been recovered:

    ps xafu |grep qemu
    root      8800  0.0  0.0  11488  1104 pts/1    S+   07:41   0:00  |   |   \_ grep --color=auto qemu
    nova      8323  104  0.7 1262628 128936 ?      Sl   07:40   0:09  |   \_ /usr/bin/qemu-system-x86_64 -name guest=instance-00000002
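
The notification check in the procedure above can be automated with a small polling helper. The following is a minimal sketch, not part of Masakari: the function name wait_for_recovery and its timeout and interval arguments are hypothetical, and the sketch assumes that openstack notification list accepts the standard -f value and -c output options with the column names shown in the table output above.

```shell
#!/bin/sh
# Sketch: wait until a Masakari notification for an instance is finished.
# wait_for_recovery is a hypothetical helper name, not a Masakari command.

# Usage: wait_for_recovery INSTANCE_UUID [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
# Returns 0 once a notification whose payload mentions INSTANCE_UUID
# reports the "finished" status, or 1 when the timeout expires.
wait_for_recovery() {
    instance_uuid=$1
    timeout=${2:-300}
    interval=${3:-10}
    elapsed=0
    while [ "$elapsed" -le "$timeout" ]; do
        # Assumption: each output line holds the status and the payload
        # of one notification, as in the table output of the same command.
        if openstack notification list -f value -c status -c payload \
            | grep -F "$instance_uuid" | grep -qw finished; then
            return 0
        fi
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    return 1
}
```

For example, after killing the QEMU process, pass the instance UUID to the helper:

    wait_for_recovery "$(openstack server show DemoInstance01 -f value -c id)" && echo "recovery finished"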
    

Verify recovery from a node failure

  1. Create an instance:

    openstack server create --image Cirros-5.1 --flavor m1.tiny --network DemoNetwork DemoInstance01
    
  2. If required, set the HA_Enabled property on the instance:

    Note

    Depending on the Masakari service configuration, you may need to set the HA_Enabled property on the instances that must be recovered. For more information about service configuration, refer to Configure high availability with Masakari.

    openstack server set --property HA_Enabled=True DemoInstance01
    
  3. Identify the compute host with the instance:

    openstack server show DemoInstance01 |grep host
    | OS-EXT-SRV-ATTR:host | vs-ps-vyvsrkrdpusv-1-w2mtagbeyhel-server-cgpejthzbztt |
    
  4. Log in to the compute host and obtain the instance PID:

    ps xafu |grep qemu
    nova      5231 34.3  1.1 5459452 184712 ?      Sl   07:39   0:18  |   \_ /usr/bin/qemu-system-x86_64 -name guest=instance-00000002....
    
  5. Simulate the failure by killing the process:

    kill -9 5231
    
  6. Log in to the compute host and power it off.

  7. Wait for the recovery to complete, then verify that the instance has been evacuated to a different compute host:

    openstack server show DemoInstance01 |grep host
    | OS-EXT-SRV-ATTR:host | vs-ps-vyvsrkrdpusv-0-ukqbpy2pkcuq-server-s4u2thvgxdfi |
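
Instead of rerunning the command manually, the evacuation check can be scripted. The following is a minimal sketch under stated assumptions: the helper names current_host and wait_for_evacuation and the default timeout values are hypothetical, and the sketch assumes administrative credentials so that openstack server show returns the OS-EXT-SRV-ATTR:host field.

```shell
#!/bin/sh
# Sketch: wait until an instance reports a different compute host.
# current_host and wait_for_evacuation are hypothetical helper names.

# Print the compute host that currently runs the given server.
current_host() {
    openstack server show "$1" -f value -c OS-EXT-SRV-ATTR:host
}

# Usage: wait_for_evacuation SERVER OLD_HOST [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
# Prints the new host and returns 0 once the server has moved away from
# OLD_HOST, or returns 1 when the timeout expires.
wait_for_evacuation() {
    server=$1
    old_host=$2
    timeout=${3:-600}
    interval=${4:-15}
    elapsed=0
    while [ "$elapsed" -le "$timeout" ]; do
        host=$(current_host "$server")
        # An empty value means the query failed; keep polling in that case.
        if [ -n "$host" ] && [ "$host" != "$old_host" ]; then
            echo "$host"
            return 0
        fi
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    return 1
}
```

Record the original host before powering off the node, then pass it to the helper:

    old_host=$(openstack server show DemoInstance01 -f value -c OS-EXT-SRV-ATTR:host)
    wait_for_evacuation DemoInstance01 "$old_host"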