Configure high availability with Masakari

The Instances High Availability Service, or Masakari, is an OpenStack project designed to ensure high availability of instances and compute processes running on hosts.

Before end users can benefit from Masakari, the cloud operator must configure the service. This section explains how to create segments and hosts through the Masakari API and lists additional settings that are useful in certain use cases.

Group compute nodes into segments

The segment object is a logical grouping of compute nodes into zones, also known as availability zones. Through the segment object, the cloud operator can list, create, show details for, update, and delete segments.

To create a segment named allcomputes with service_type = compute and recovery_method = auto, run:

openstack segment create allcomputes auto compute

Example of a positive system response:

+-----------------+--------------------------------------+
| Field           | Value                                |
+-----------------+--------------------------------------+
| created_at      | 2021-07-06T07:34:23.000000           |
| updated_at      | None                                 |
| uuid            | b8b0d7ca-1088-49db-a1e2-be004522f3d1 |
| name            | allcomputes                          |
| description     | None                                 |
| id              | 2                                    |
| service_type    | compute                              |
| recovery_method | auto                                 |
+-----------------+--------------------------------------+
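
To confirm that the segment was registered, the operator can list the segments or show a single one. The following is a minimal sketch that assumes the Masakari plugin for the openstack client is installed and that admin credentials are loaded in the environment. The OPENSTACK variable and the two helper function names are hypothetical and exist only so that the commands can be dry-run.

```shell
#!/bin/sh
# Command used to reach the OpenStack CLI. Overridable for dry runs
# (illustrative convention, not part of Masakari).
OPENSTACK="${OPENSTACK:-openstack}"

# List all segments known to Masakari.
list_segments() {
    $OPENSTACK segment list
}

# Show the details of one segment, addressed by name or UUID.
show_segment() {
    $OPENSTACK segment show "$1"
}
```

For example, show_segment allcomputes prints the same fields as the creation response above.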

Create hosts under segments

The host object represents a compute service hypervisor. A host belongs to a segment and can be any kind of virtual machine that has the compute service running on it. Through the host object, the operator can list, create, show details for, update, and delete hosts.

To create a host under a given segment:

  1. Obtain the hypervisor hostname:

    openstack hypervisor list
    

    Example of a positive system response:

    +----+-------------------------------------------------------+-----------------+------------+-------+
    | ID | Hypervisor Hostname                                   | Hypervisor Type | Host IP    | State |
    +----+-------------------------------------------------------+-----------------+------------+-------+
    |  2 | vs-ps-vyvsrkrdpusv-1-w2mtagbeyhel-server-cgpejthzbztt | QEMU            | 10.10.0.39 | up    |
    |  5 | vs-ps-vyvsrkrdpusv-0-ukqbpy2pkcuq-server-s4u2thvgxdfi | QEMU            | 10.10.0.14 | up    |
    +----+-------------------------------------------------------+-----------------+------------+-------+
    
  2. Create the host under the previously created segment. For example, for the segment with uuid = b8b0d7ca-1088-49db-a1e2-be004522f3d1:

    Caution

    The segment under which you create a host must exist.

    openstack segment host create \
        vs-ps-vyvsrkrdpusv-1-w2mtagbeyhel-server-cgpejthzbztt \
        compute \
        SSH \
        b8b0d7ca-1088-49db-a1e2-be004522f3d1
    

    Example of a positive system response:

    +---------------------+-------------------------------------------------------+
    | Field               | Value                                                 |
    +---------------------+-------------------------------------------------------+
    | created_at          | 2021-07-06T07:37:26.000000                            |
    | updated_at          | None                                                  |
    | uuid                | 6f1bd5aa-0c21-446a-b6dd-c1b4d09759be                  |
    | name                | vs-ps-vyvsrkrdpusv-1-w2mtagbeyhel-server-cgpejthzbztt |
    | type                | compute                                               |
    | control_attributes  | SSH                                                   |
    | reserved            | False                                                 |
    | on_maintenance      | False                                                 |
    | failover_segment_id | b8b0d7ca-1088-49db-a1e2-be004522f3d1                  |
    +---------------------+-------------------------------------------------------+
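
When a segment should contain every compute node, the two steps above can be combined in a loop. The following is a sketch, not a definitive procedure: it assumes that every hypervisor returned by the list command belongs in the segment and that the compute type and SSH control attributes from the example apply to all of them. The OPENSTACK variable and the add_all_hypervisors function name are hypothetical.

```shell
#!/bin/sh
# Command used to reach the OpenStack CLI. Overridable for dry runs
# (illustrative convention, not part of Masakari).
OPENSTACK="${OPENSTACK:-openstack}"

# Register every hypervisor reported by the compute service as a host
# in an existing segment.
# $1: UUID of the segment.
add_all_hypervisors() {
    segment_id="$1"
    # -f value -c <column> prints only the hostname column, one per line.
    $OPENSTACK hypervisor list -f value -c "Hypervisor Hostname" |
    while read -r hostname; do
        # Skip empty lines, if any.
        [ -n "$hostname" ] || continue
        # Positional arguments: name, type, control attributes, segment UUID.
        # stdin is redirected so the command cannot consume the hostname list.
        $OPENSTACK segment host create \
            "$hostname" compute SSH "$segment_id" < /dev/null
    done
}
```

For example, add_all_hypervisors b8b0d7ca-1088-49db-a1e2-be004522f3d1 creates one host per hypervisor under the segment created earlier.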
    

Enable notifications

Masakari monitors use the alerting API to report a failure of a host, process, or instance. Through the notification object, the operator can list, create, and show details of notifications.
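
To check whether the monitors are delivering notifications, the operator can inspect them from the command line. The following is a minimal sketch that assumes the Masakari plugin for the openstack client is installed. The OPENSTACK variable and the helper function names are hypothetical. The sketch covers only the read-only list and show operations, because creating a notification manually may trigger a real recovery workflow.

```shell
#!/bin/sh
# Command used to reach the OpenStack CLI. Overridable for dry runs
# (illustrative convention, not part of Masakari).
OPENSTACK="${OPENSTACK:-openstack}"

# List the notifications received by the Masakari API.
list_notifications() {
    $OPENSTACK notification list
}

# Show the details of one notification.
# $1: UUID of the notification.
show_notification() {
    $OPENSTACK notification show "$1"
}
```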

Useful tunings

The list of useful tunings for the Masakari service includes:

  • [host_failure]\evacuate_all_instances

    Enables the operator to decide whether to evacuate all instances or only the instances that have [host_failure]\ha_enabled_instance_metadata_key set to True. By default, the parameter is set to False.

  • [host_failure]\ha_enabled_instance_metadata_key

    Enables the operator to choose the name of the instance metadata key that controls the per-instance behavior of [host_failure]\evacuate_all_instances. The default is the same for both failure types (host and instance), but you can override the value to use a different metadata key per failure type.

  • [host_failure]\ignore_instances_in_error_state

    Enables the operator to decide whether instances in the error state are evacuated from a failed source compute node. If set to True, Masakari skips instances in the error state during evacuation. Otherwise, it evacuates them along with the other instances from the failed source compute node.

  • [instance_failure]\process_all_instances

    Enables the operator to decide whether instance failure events trigger recovery for all instances or only for the instances that have [instance_failure]\ha_enabled_instance_metadata_key set to True. If set to True, Masakari runs the instance failure recovery actions for any instance, regardless of that metadata key. Otherwise, it runs them only for the instances that have the key set to True.

  • [instance_failure]\ha_enabled_instance_metadata_key

    Enables the operator to choose the name of the instance metadata key that controls the per-instance behavior of [instance_failure]\process_all_instances. The default is the same for both failure types (host and instance), but you can override the value to use a different metadata key per failure type.
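
When evacuate_all_instances or process_all_instances is set to False, only the instances that carry the metadata key are recovered, so the key has to be set on each protected instance. The following is a sketch under the assumption that the key name is HA_Enabled, which is believed to be the upstream Masakari default. Verify the value of ha_enabled_instance_metadata_key in your deployment before relying on it. The OPENSTACK and HA_KEY variables and the function name are hypothetical.

```shell
#!/bin/sh
# Command used to reach the OpenStack CLI. Overridable for dry runs
# (illustrative convention, not part of Masakari).
OPENSTACK="${OPENSTACK:-openstack}"

# Metadata key that Masakari checks on instances.
# HA_Enabled is an assumption; it must match ha_enabled_instance_metadata_key.
HA_KEY="${HA_KEY:-HA_Enabled}"

# Mark one instance as HA-enabled by setting the metadata key to True.
# $1: name or UUID of the instance.
enable_ha_for_server() {
    $OPENSTACK server set --property "${HA_KEY}=True" "$1"
}
```

For example, enable_ha_for_server my-instance tags the hypothetical instance my-instance so that it is recovered even when the all-instances options are disabled.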