Instance HA overview

The Instance High Availability (HA) service (OpenStack Masakari) provides automated recovery for virtual machine workloads by triggering evacuation or restart workflows when it detects failures.

Instance HA components

The Instance HA service consists of the following components:

Instance HA components
Component	Description
API	Receives recovery requests from users and failure events from monitors, then dispatches them to the Engine.
Engine	Orchestrates and executes the defined recovery workflows based on the received failure events.
Monitors	Detect failures and notify the API. MOSK uses the following monitor types: Instance monitor, which checks the liveness of instance processes on the compute host; Introspective instance monitor, which detects system-level failures inside the guest OS through the QEMU Guest Agent; Host monitor, which checks compute host liveness and runs as part of the Node controller within the OpenStack Controller (Rockoon).

Note: The Processes monitor is not included in MOSK because Kubernetes natively manages high availability for the OpenStack control plane processes.

Instance HA recovery actions

Depending on the failure type, the service performs the following actions:

Instance HA recovery actions
Recovery action	Description
Host evacuation	If a compute host fails, the service automatically evacuates the affected instances, rebuilding them on a healthy compute host to restore availability.
Instance recovery	In the case of an internal instance failure, such as a crashed instance process, the service automatically restarts the instance to resume normal operation.
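The mapping between failure types and recovery actions can be illustrated with a minimal Python sketch. It is a model of the behavior described in the table, not Masakari code: the failure type strings `COMPUTE_HOST` and `VM` are assumptions loosely based on Masakari notification types, and `FailureEvent` and `choose_recovery_action` are hypothetical names.

```python
from dataclasses import dataclass

# Hypothetical failure type labels, loosely modeled on Masakari notification types.
HOST_FAILURE = "COMPUTE_HOST"
INSTANCE_FAILURE = "VM"

# One recovery action per failure type, following the recovery actions table.
RECOVERY_ACTIONS = {
    HOST_FAILURE: "Host evacuation",
    INSTANCE_FAILURE: "Instance recovery",
}


@dataclass(frozen=True)
class FailureEvent:
    failure_type: str  # type of failure reported by a monitor
    target: str        # failed compute host name or instance ID


def choose_recovery_action(event: FailureEvent) -> str:
    """Return the recovery action for a failure event, or raise for unknown types."""
    try:
        return RECOVERY_ACTIONS[event.failure_type]
    except KeyError:
        raise ValueError(f"no recovery action for {event.failure_type!r}") from None
```

In this sketch, an event type with no defined workflow, such as a process failure, raises `ValueError` instead of silently doing nothing.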

Instance HA recovery management

As a cloud user, you can control whether the Instance HA service manages your instances by applying a specific metadata key to them. A cloud administrator defines the metadata keys. By default, the HA_Enabled=True metadata key authorizes both the host evacuation and instance restart recovery workflows. For the procedure, refer to Use the Instance HA service.
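The opt-in rule can be sketched in Python as a check on an instance's metadata. This is a minimal illustration assuming that metadata values are strings and that a case-insensitive `True` (plus a few other truthy spellings chosen here for illustration) authorizes recovery. The function `is_ha_enabled` and the accepted values are hypothetical and do not reproduce the service's actual parsing rules. On a real cloud, the metadata itself is typically set with the standard `openstack server set --property HA_Enabled=True <server>` command.

```python
from typing import Mapping

# Default key named in this document. The accepted truthy spellings are an
# assumption made for illustration, not the service's documented behavior.
DEFAULT_HA_KEY = "HA_Enabled"
TRUTHY_VALUES = {"true", "1", "yes", "on"}


def is_ha_enabled(metadata: Mapping[str, str], key: str = DEFAULT_HA_KEY) -> bool:
    """Return True if the instance metadata opts the instance into Instance HA."""
    value = metadata.get(key)
    if value is None:
        return False  # key absent: the instance is not opted in
    return value.strip().lower() in TRUTHY_VALUES
```

Passing a different `key` models the case where a cloud administrator has defined a custom metadata key instead of the default one.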

As a cloud administrator, you can fine-tune the scope and priority of the Instance HA service to match your workload requirements:

Instance HA administrative controls
Control mechanism	Description
Recovery scope and priority	Administrators can configure the service to monitor all instances or only those labeled with a specific metadata key. If all instances are monitored, those with the metadata key are prioritized during host evacuation to ensure critical workloads are restored first.
Granular control	Administrators can define separate metadata keys to authorize instance restarts and host evacuations independently, allowing for distinct recovery logic per failure type.
Project-based limits	Administrators can restrict host evacuation capabilities to specific OpenStack projects. This ensures that HA resources are reserved for high-priority projects while excluding non-critical workloads from automated recovery.
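The interaction of these controls during a host failure can be modeled with a short Python sketch. It is a conceptual illustration only: `HAPolicy`, its fields (`evacuate_all`, `evacuation_key`, `restart_key`, `allowed_projects`), and the helper functions are hypothetical and do not correspond to real Masakari or MOSK configuration options. The sketch assumes that a key authorizes recovery when its value is `True` (case-insensitive), and that project-based limits apply to host evacuation only, as the table states.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Mapping, Optional, Set


@dataclass
class Instance:
    name: str
    project_id: str
    metadata: Mapping[str, str] = field(default_factory=dict)


@dataclass
class HAPolicy:
    # Hypothetical settings; the names do not match real configuration options.
    evacuate_all: bool = False           # scope: all instances, or tagged ones only
    evacuation_key: str = "HA_Enabled"   # key authorizing host evacuation
    restart_key: str = "HA_Enabled"      # key authorizing instance restart
    allowed_projects: Optional[Set[str]] = None  # None: no project-based limit


def has_key(instance: Instance, key: str) -> bool:
    """Return True if the instance carries the key with the value "True"."""
    return instance.metadata.get(key, "").strip().lower() == "true"


def plan_host_evacuation(instances: Iterable[Instance], policy: HAPolicy) -> List[Instance]:
    """Select and order the instances to evacuate from a failed host."""
    selected = []
    for inst in instances:
        # Project-based limit: skip instances from projects that are not allowed.
        if policy.allowed_projects is not None and inst.project_id not in policy.allowed_projects:
            continue
        # Recovery scope: all instances, or only those tagged for evacuation.
        if policy.evacuate_all or has_key(inst, policy.evacuation_key):
            selected.append(inst)
    # Priority: tagged instances first. sorted() is stable, so the original
    # order is preserved within the tagged and untagged groups.
    return sorted(selected, key=lambda inst: not has_key(inst, policy.evacuation_key))


def should_restart(instance: Instance, policy: HAPolicy) -> bool:
    """Granular control: instance restart is authorized by its own key."""
    return has_key(instance, policy.restart_key)
```

For example, with `evacuate_all=True` and `allowed_projects={"p1"}`, a tagged `db` instance and an untagged `web` instance from project `p1` are both evacuated, `db` first, while a tagged instance from project `p2` is skipped. Setting `restart_key` to a different value than `evacuation_key` models separate authorization for the two workflows.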

For Instance HA configuration procedures for cloud administrators, refer to: