Use the Instance HA service

The OpenStack Instance HA service (Masakari) provides automated recovery for Nova instances and compute hosts to minimize downtime.

If an instance process crashes or hangs, the Instance HA service detects the failure and automatically restarts the instance on the same compute host.

If an entire physical server fails, the Instance HA service triggers an evacuation of instances from the failed compute host. This process automatically rebuilds the affected instances on a healthy compute host within the same failover segment. Because this is a recovery process rather than a live migration, note the following impact on your data:

- Memory state: all data stored in the instance RAM is lost.
- Disk state:
  - If your instance uses Cinder volumes or shared storage, such as Ceph, your data is preserved and the instance boots from its last written state.
  - If the instance uses local ephemeral storage, all data on the disk is lost because the instance is rebuilt from its base image.
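
To get a first indication of which disk-state case applies to an instance, you could inspect its image and attached volumes. The following is a minimal sketch rather than a definitive check: `show_storage_fields` is a hypothetical helper name, field names can differ between client versions, and the output only reveals Cinder volumes. Whether ephemeral disks reside on shared storage such as Ceph is a cloud-side setting that you may need to confirm with your cloud administrator.

```shell
# Hypothetical helper: print the image and attached volumes of an instance.
# Assumes the openstack client is installed and credentials are loaded.
show_storage_fields() {
    # $1 is the instance name or ID.
    openstack server show "$1" -c image -c volumes_attached
}
```

For example, run `show_storage_fields DemoInstance01` for an instance named DemoInstance01.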

To verify that the Instance HA service is active and available in your OpenStack cloud, execute the following command and verify that the response is successful:

```
openstack catalog show instance-ha
```

Example of a positive system response:

```
+-----------+----------------------------------------------------------------------+
| Field     | Value                                                                |
+-----------+----------------------------------------------------------------------+
| endpoints | CustomRegion                                                         |
|           |   public: https://masakari.it.just.works/v1                          |
|           | CustomRegion                                                         |
|           |   internal: http://masakari-api.openstack.svc.cluster.local:15868/v1 |
|           | CustomRegion                                                         |
|           |   admin: http://masakari-api.openstack.svc.cluster.local:15868/v1    |
|           |                                                                      |
| id        | ed359ae64a2847f89c82c38177eb8392                                     |
| name      | masakari                                                             |
| type      | instance-ha                                                          |
+-----------+----------------------------------------------------------------------+
```
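
If you want to run the availability check from a script, the exit status of the catalog command can be used instead of reading the table. The following is a minimal sketch under stated assumptions: `check_instance_ha` is a hypothetical helper name, and the sketch assumes that the openstack client is installed, credentials are loaded, and the command exits with a non-zero status when the service is not found.

```shell
# Hypothetical helper: report whether the Instance HA service is listed
# in the service catalog.
check_instance_ha() {
    # Discard the table output and rely on the exit status only.
    if openstack catalog show instance-ha >/dev/null 2>&1; then
        echo "Instance HA service is available"
    else
        echo "Instance HA service was not found in the service catalog"
        return 1
    fi
}
```

Calling `check_instance_ha` prints one of the two messages and returns a non-zero status if the lookup fails, so it can also guard later steps in a script. A failed lookup can also be caused by missing credentials, not only by a missing service.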

Note

The Instance HA service is primarily managed by cloud administrators. As a non-admin user, you cannot interact with the Masakari API directly to create failover segments or manage hosts.

Enable High Availability for an instance

Create an instance. For example, to create a minimal CirrOS instance:

```
openstack server create --image Cirros-6.0 --flavor m1.tiny --network DemoNetwork DemoInstance01
```

Enable High Availability (HA) for the instance:

Note

Although the Instance HA service may be enabled globally in your cloud, it typically operates on an opt-in basis to prioritize critical workloads.

By default, the service monitors and recovers only instances marked with a specific metadata key (typically HA_Enabled). Depending on how the ha_enabled_instance_metadata_key setting is configured in your cloud, host failures and instance failures may require different keys.

To ensure your instance is protected by the HA engine, apply the metadata property. For example, using the default HA_Enabled metadata key:
```
openstack server set --property HA_Enabled=True DemoInstance01
```
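As an alternative for new instances, the metadata property could be set at creation time through the `--property` option of `openstack server create`. The following is a minimal sketch: `create_ha_instance` is a hypothetical helper name, and the sketch assumes the default HA_Enabled key and the image, flavor, and network names from the creation example above.

```shell
# Hypothetical helper: create an instance with the HA metadata key already set.
# Assumes the default HA_Enabled key; replace it if your cloud uses a custom key.
create_ha_instance() {
    # $1 is the name of the new instance.
    openstack server create \
        --image Cirros-6.0 \
        --flavor m1.tiny \
        --network DemoNetwork \
        --property HA_Enabled=True \
        "$1"
}
```

For example, `create_ha_instance DemoInstance02` would create a second instance that carries the property from the start.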
Confirm that the metadata property has been successfully applied to the instance:
```
openstack server show DemoInstance01 -c properties
```
The output should list the HA_Enabled='True' property under the properties field.
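
The confirmation step can also be scripted. The following is a minimal sketch, not a definitive implementation: `has_ha_property` is a hypothetical helper name, and because the exact formatting of the properties field differs between client versions, the sketch only matches the key name followed by True anywhere in the value.

```shell
# Hypothetical helper: succeed if the instance carries the HA metadata key.
# $1 is the instance name or ID, $2 is the key name (defaults to HA_Enabled).
has_ha_property() {
    server="$1"
    key="${2:-HA_Enabled}"
    # -f value prints only the field value, without the table borders.
    openstack server show "$server" -c properties -f value \
        | grep -q "${key}.*True"
}
```

For example, `has_ha_property DemoInstance01 && echo "HA metadata is set"` prints the message only when the property is present. If your cloud uses a custom key, pass it as the second argument.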

Note

If an instance with HA_Enabled=True is not recovered after a failure, contact your cloud administrator to verify whether a custom metadata key name has been configured in the Instance HA service settings. For details, refer to Configure high availability with Masakari for cloud administrators.