Instance HA overview
The Instance High Availability (HA) service (OpenStack Masakari) provides automated recovery for virtual machine workloads by triggering evacuation or restart workflows when failures are detected.
Instance HA components
The Instance HA service consists of the following components:
| Component | Description |
|---|---|
| API | Receives recovery requests from users and failure events from monitors, then dispatches them to the Engine. |
| Engine | Orchestrates and executes the defined recovery workflows based on the received failure events. |
| Monitors | Detect failures and notify the API. MOSK utilizes the following monitor types: Note: The Processes monitor is not included in MOSK because high availability for the OpenStack control plane processes is managed natively by Kubernetes. |
Instance HA recovery actions
Depending on the failure type, the service performs the following actions:
| Recovery action | Description |
|---|---|
| Host evacuation | If a compute host fails, the service automatically evacuates the affected instances, rebuilding them on a healthy compute host to restore availability. |
| Instance recovery | In the case of an internal instance failure, such as a crashed instance process, the service automatically restarts the instance to resume normal operation. |
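The relationship between failure types and recovery actions can be sketched in a few lines of Python. This is only an illustrative model of the dispatch described above, not Masakari code: the `FailureType`, `RecoveryAction`, `FailureEvent`, and `select_recovery_action` names are hypothetical and exist only for this example.

```python
from dataclasses import dataclass
from enum import Enum


class FailureType(Enum):
    HOST = "compute host failure"
    INSTANCE = "internal instance failure"


class RecoveryAction(Enum):
    HOST_EVACUATION = "host evacuation"
    INSTANCE_RECOVERY = "instance recovery"


# Hypothetical lookup table that mirrors the recovery actions table above.
RECOVERY_BY_FAILURE = {
    FailureType.HOST: RecoveryAction.HOST_EVACUATION,
    FailureType.INSTANCE: RecoveryAction.INSTANCE_RECOVERY,
}


@dataclass(frozen=True)
class FailureEvent:
    """A failure event as a monitor might report it to the API (illustrative)."""
    failure_type: FailureType
    source: str  # host name or instance ID, depending on the failure type


def select_recovery_action(event: FailureEvent) -> RecoveryAction:
    """Return the recovery action that corresponds to the reported failure."""
    return RECOVERY_BY_FAILURE[event.failure_type]
```

In this model, a monitor would create a `FailureEvent`, the API would hand it over, and the Engine would run the workflow named by `select_recovery_action`.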
Instance HA recovery management
As a cloud user, you can control whether your instances are managed by the
Instance HA service by applying a specific metadata key to your instances.
The metadata keys are defined by a cloud administrator. By default, the
HA_Enabled=True key-value pair authorizes both host evacuation and
instance restart recovery workflows. For the procedure, refer to
Use the Instance HA service.
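As a rough illustration of what applying the metadata key can look like programmatically, the following Python sketch sets the default `HA_Enabled=True` pair on one instance. It assumes that `conn` is an openstacksdk `Connection` (or any object that exposes `compute.set_server_metadata` with the same signature). The `enable_instance_ha` helper name, the cloud name, and the server ID placeholder are hypothetical, and the key configured in your cloud may differ from the default.

```python
def enable_instance_ha(conn, server_id, key="HA_Enabled", value="True"):
    """Set the Instance HA metadata key on a single instance.

    `conn` is assumed to be an openstacksdk Connection. The default key and
    value match the default described above; an administrator may have
    configured a different key.
    """
    # Instance metadata values are strings, so the flag is sent as "True".
    metadata = {key: value}
    conn.compute.set_server_metadata(server_id, **metadata)
    return metadata


# Possible usage (requires openstacksdk and a configured clouds.yaml entry):
#
#   import openstack
#   conn = openstack.connect(cloud="my-cloud")   # hypothetical cloud name
#   enable_instance_ha(conn, "<server-id>")
```

The helper takes the connection as a parameter so that the same call can be reused for several instances or exercised against a stub in tests.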
As a cloud administrator, you can fine-tune the scope and priority of the Instance HA service to match your workload requirements:
| Control mechanism | Description |
|---|---|
| Recovery scope and priority | Administrators can configure the service to monitor all instances or only those labeled with a specific metadata key. If all instances are monitored, those with the metadata key are prioritized during host evacuation to ensure critical workloads are restored first. |
| Granular control | Administrators can define separate metadata keys to authorize instance restarts and host evacuations independently, allowing for distinct recovery logic per failure type. |
| Project-based limits | Administrators can restrict host evacuation capabilities to specific OpenStack projects. This ensures that HA resources are reserved for high-priority projects while excluding non-critical workloads from automated recovery. |
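The interplay of these control mechanisms can be modeled with a short Python sketch. It is a simplified illustration under stated assumptions, not the service's implementation: the `RecoveryPolicy` fields, the `Instance` class, and the helper functions are hypothetical names, and the rule that a key counts as set when its value equals `"True"` is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Optional


@dataclass
class Instance:
    id: str
    project_id: str
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class RecoveryPolicy:
    """Hypothetical administrator settings; the names are illustrative only."""
    evacuate_all: bool = False            # monitor all instances or only tagged ones
    evacuation_key: str = "HA_Enabled"    # key that authorizes host evacuation
    restart_key: str = "HA_Enabled"       # key that authorizes instance restart
    allowed_projects: Optional[FrozenSet[str]] = None  # None means no project limit


def is_tagged(instance: Instance, key: str) -> bool:
    # Assumption: the key counts as set only when its value is "True".
    return instance.metadata.get(key) == "True"


def evacuation_order(instances: List[Instance], policy: RecoveryPolicy) -> List[Instance]:
    """Return the instances to evacuate, with tagged instances first."""
    eligible = [
        i for i in instances
        if policy.allowed_projects is None or i.project_id in policy.allowed_projects
    ]
    if not policy.evacuate_all:
        return [i for i in eligible if is_tagged(i, policy.evacuation_key)]
    # sorted() is stable, so tagged instances move to the front and the
    # original order is preserved within each group.
    return sorted(eligible, key=lambda i: not is_tagged(i, policy.evacuation_key))


def restart_allowed(instance: Instance, policy: RecoveryPolicy) -> bool:
    """Check the separate key that authorizes instance restart."""
    return is_tagged(instance, policy.restart_key)
```

For example, with `evacuate_all=True` and `allowed_projects=frozenset({"p1"})`, only instances of project `p1` are returned, and those that carry the evacuation key come first. Setting `restart_key` to a different value than `evacuation_key` models the granular control described in the table.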
For Instance HA configuration procedures for cloud administrators, refer to: