Configure high availability with Masakari

The Instances High Availability Service, or Masakari, is an OpenStack project designed to ensure high availability of instances and compute processes running on hosts.

Before end users can benefit from Masakari, the cloud operator must configure the service properly. This section explains how to create segments and hosts through the Masakari API and lists additional settings that can be useful in certain use cases.

Group compute nodes into segments

The segment object is a logical grouping of compute nodes into zones, also known as availability zones. The segment object enables the cloud operator to list, create, show details for, update, and delete segments.

To create a segment named allcomputes with service_type = compute and recovery_method = auto, run:

openstack segment create allcomputes auto compute

Example of a positive system response:

+-----------------+--------------------------------------+
| Field           | Value                                |
+-----------------+--------------------------------------+
| created_at      | 2021-07-06T07:34:23.000000           |
| updated_at      | None                                 |
| uuid            | b8b0d7ca-1088-49db-a1e2-be004522f3d1 |
| name            | allcomputes                          |
| description     | None                                 |
| id              | 2                                    |
| service_type    | compute                              |
| recovery_method | auto                                 |
+-----------------+--------------------------------------+
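The hosts you create later are attached to a segment by its UUID, so it can be convenient to look the UUID up by segment name rather than copy it from the table. The following is a minimal sketch, assuming that the Masakari plugin for the openstack client is installed and that admin credentials are loaded in the environment. The segment_uuid function name is illustrative and not part of any product tooling.

```shell
#!/usr/bin/env bash

# Print only the UUID of a segment, looked up by segment name or ID.
# "-f value -c uuid" are the generic openstack client output options
# that restrict the output to the bare value of the uuid field.
segment_uuid() {
    local segment="$1"
    openstack segment show "$segment" -f value -c uuid
}

# Example usage (requires a configured cloud):
#   SEGMENT_UUID="$(segment_uuid allcomputes)"
```

Storing the result in a variable such as SEGMENT_UUID lets you reuse it in the host creation commands without retyping the identifier.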

Create hosts under segments

The host object represents a compute service hypervisor. A host belongs to a segment and can be any kind of virtual machine that has the compute service running on it. The host object enables the operator to list, create, show details for, update, and delete hosts.

To create a host under a given segment:

Obtain the hypervisor hostname:

openstack hypervisor list

Example of a positive system response:

+----+-------------------------------------------------------+-----------------+------------+-------+
| ID | Hypervisor Hostname                                   | Hypervisor Type | Host IP    | State |
+----+-------------------------------------------------------+-----------------+------------+-------+
|  2 | vs-ps-vyvsrkrdpusv-1-w2mtagbeyhel-server-cgpejthzbztt | QEMU            | 10.10.0.39 | up    |
|  5 | vs-ps-vyvsrkrdpusv-0-ukqbpy2pkcuq-server-s4u2thvgxdfi | QEMU            | 10.10.0.14 | up    |
+----+-------------------------------------------------------+-----------------+------------+-------+

Create the host under the previously created segment. For example, under the segment with uuid = b8b0d7ca-1088-49db-a1e2-be004522f3d1:

Caution

The segment under which you create a host must exist.

openstack segment host create \
    vs-ps-vyvsrkrdpusv-1-w2mtagbeyhel-server-cgpejthzbztt \
    compute \
    SSH \
    b8b0d7ca-1088-49db-a1e2-be004522f3d1

Example of a positive system response:

+---------------------+-------------------------------------------------------+
| Field               | Value                                                 |
+---------------------+-------------------------------------------------------+
| created_at          | 2021-07-06T07:37:26.000000                            |
| updated_at          | None                                                  |
| uuid                | 6f1bd5aa-0c21-446a-b6dd-c1b4d09759be                  |
| name                | vs-ps-vyvsrkrdpusv-1-w2mtagbeyhel-server-cgpejthzbztt |
| type                | compute                                               |
| control_attributes  | SSH                                                   |
| reserved            | False                                                 |
| on_maintenance      | False                                                 |
| failover_segment_id | b8b0d7ca-1088-49db-a1e2-be004522f3d1                  |
+---------------------+-------------------------------------------------------+
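When a segment must contain every compute node, the two steps above can be combined into a loop. The following is a rough sketch rather than an official procedure. It assumes that all hypervisors returned by the list command belong to the same segment and that the compute type and SSH control attributes used in the example above suit every host. The add_all_hypervisors function name is illustrative.

```shell
#!/usr/bin/env bash

# Register every hypervisor known to the compute service as a host
# in the given segment (segment UUID passed as the first argument).
add_all_hypervisors() {
    local segment="$1"
    local host
    # Print bare hypervisor hostnames, one per line.
    openstack hypervisor list -f value -c "Hypervisor Hostname" |
    while read -r host; do
        # Skip empty lines, if any.
        [ -n "$host" ] || continue
        # Redirect stdin so the client cannot consume the hostname list.
        openstack segment host create "$host" compute SSH "$segment" < /dev/null
    done
}

# Example usage (requires a configured cloud):
#   add_all_hypervisors b8b0d7ca-1088-49db-a1e2-be004522f3d1
```

The loop does not check whether a host is already registered, so the client reports an error for a hostname that already exists and the loop continues with the next one.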

Enable notifications

Masakari monitors use the alerting API to report the failure of a host, process, or instance. The notification object enables the operator to list, create, and show details of notifications.
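To check what the monitors have reported, the operator can list the notifications and inspect one of them. The following is a minimal sketch, assuming that the Masakari plugin for the openstack client is installed. The inspect_notifications function name is illustrative.

```shell
#!/usr/bin/env bash

# List all notifications. If a notification UUID is passed as the
# first argument, also print the details of that notification.
inspect_notifications() {
    openstack notification list
    if [ -n "${1:-}" ]; then
        openstack notification show "$1"
    fi
}

# Example usage (requires a configured cloud):
#   inspect_notifications
#   inspect_notifications <notification-uuid>
```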

Useful tunings

The list of useful tunings for the Masakari service includes:

[host_failure]\evacuate_all_instances

Defines whether to evacuate all instances or only the instances that have the metadata key named by [host_failure]\ha_enabled_instance_metadata_key set to True. By default, the parameter is set to False.

[host_failure]\ha_enabled_instance_metadata_key

Defines the name of the instance metadata key that controls the per-instance behavior of [host_failure]\evacuate_all_instances. The default key name is the same for both failure types, host and instance, but you can override the value to use a different metadata key per failure type.

[host_failure]\ignore_instances_in_error_state

Defines whether instances in the error state are excluded from evacuation from a failed source compute node. If set to True, such instances are not evacuated. Otherwise, they are evacuated along with the other instances.

[host_failure]\ha_enabled_project_tag (available since MOSK 24.2)

By default, instances belonging to any project are evacuated. To restrict evacuation to specific projects, tag these projects with a designated tag and pass this tag as the value of this option. Instances from projects that do not have the specified tag are then not considered for evacuation, even if they have the corresponding metadata key and value set.

[instance_failure]\process_all_instances

Defines whether all instances or only the instances that have the metadata key named by [instance_failure]\ha_enabled_instance_metadata_key set to True are recovered from instance failure events. If set to True, Masakari executes instance failure recovery actions for every instance regardless of that metadata key. Otherwise, it executes the recovery actions only for instances that have the key set to True.

[instance_failure]\ha_enabled_instance_metadata_key

Defines the name of the instance metadata key that controls the per-instance behavior of [instance_failure]\process_all_instances. The default key name is the same for both failure types, host and instance, but you can override the value to use a different metadata key per failure type.
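When evacuate_all_instances or process_all_instances is set to False, recovery depends on metadata set on individual instances and, with ha_enabled_project_tag, on project tags. The following sketch shows one way to set both through the openstack client. It assumes that the metadata key keeps the upstream Masakari default name, HA_Enabled, and that the option refers to Identity service project tags. Verify both against your deployment. The function names and the ha-projects tag are illustrative.

```shell
#!/usr/bin/env bash

# Mark an instance as HA-enabled by setting the metadata key to True.
# The key name defaults to HA_Enabled (assumed upstream default) and
# must match ha_enabled_instance_metadata_key in your configuration.
enable_instance_ha() {
    local server="$1"
    local key="${2:-HA_Enabled}"
    openstack server set --property "${key}=True" "$server"
}

# Add a tag to a project. The tag must match the value configured
# for [host_failure]\ha_enabled_project_tag.
tag_project_for_ha() {
    local project="$1"
    local tag="$2"
    openstack project set --tag "$tag" "$project"
}

# Example usage (requires a configured cloud):
#   enable_instance_ha my-instance
#   tag_project_for_ha my-project ha-projects
```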