Instance HA overview

The Instance High Availability (HA) service (OpenStack Masakari) provides automated recovery for virtual machine workloads by triggering evacuation or restart workflows when failures are detected.

Instance HA components

The Instance HA service consists of the following components:

Component

Description

API

Receives recovery requests from users and failure events from monitors, then dispatches them to the Engine.

Engine

Orchestrates and executes the defined recovery workflows based on the received failure events.

Monitors

Detect failures and notify the API. MOSK utilizes the following monitor types:

  • Instance monitor: Monitors the liveness of instance processes on the compute host.

  • Introspective instance monitor: Enhances availability by identifying system-level failures within the guest OS via the QEMU Guest Agent.

  • Host monitor: Monitors compute host liveness; runs as part of the Node controller within the OpenStack Controller (Rockoon).

Note

The Processes monitor is not included in MOSK because high availability for the OpenStack control plane processes is managed natively by Kubernetes.
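
To make the handoff from a monitor to the API more concrete, the following Python sketch builds the kind of notification body that an instance monitor might send when an instance process fails. The field names follow the upstream Masakari notification format as an assumption and should be verified against the API reference for your release. The helper name and the sample values are hypothetical.

```python
from datetime import datetime, timezone


def build_instance_notification(hostname, instance_uuid, generated_time=None):
    """Build a notification body for a failed instance process.

    Assumption: the layout mirrors the upstream Masakari notification
    format (type "VM", LIFECYCLE event). Verify it before relying on it.
    """
    if generated_time is None:
        generated_time = datetime.now(timezone.utc)
    return {
        "notification": {
            # "VM" marks an instance-level failure rather than a host failure.
            "type": "VM",
            # Compute host on which the monitor observed the failure.
            "hostname": hostname,
            "generated_time": generated_time.strftime("%Y-%m-%dT%H:%M:%S.%f"),
            "payload": {
                "event": "LIFECYCLE",
                "instance_uuid": instance_uuid,
                "vir_domain_event": "STOPPED_FAILED",
            },
        }
    }
```

In this sketch, the API would receive the body and pass it to the Engine, which selects the recovery workflow from the notification type.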

Instance HA recovery actions

Depending on the failure type, the service performs the following actions:

Recovery action

Description

Host evacuation

If a compute host fails, the service automatically evacuates the affected instances, rebuilding them on a healthy compute host to restore availability.

Instance recovery

In the case of an internal instance failure, such as a crashed instance process, the service automatically restarts the instance to resume normal operation.
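
The relationship between failure types and recovery actions described above can be summarized as a simple lookup. The following Python sketch is illustrative only: the enum and function names are hypothetical and do not correspond to identifiers in the service itself.

```python
from enum import Enum


class FailureType(Enum):
    COMPUTE_HOST = "compute host failure"
    INSTANCE = "internal instance failure"


class RecoveryAction(Enum):
    HOST_EVACUATION = "rebuild affected instances on a healthy compute host"
    INSTANCE_RECOVERY = "restart the instance"


# One recovery action per failure type, as listed in the table above.
_ACTIONS = {
    FailureType.COMPUTE_HOST: RecoveryAction.HOST_EVACUATION,
    FailureType.INSTANCE: RecoveryAction.INSTANCE_RECOVERY,
}


def select_recovery_action(failure_type):
    """Return the recovery action that matches the detected failure type."""
    return _ACTIONS[failure_type]
```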

Instance HA recovery management

As a cloud user, you can control whether the Instance HA service manages your instances by applying a specific metadata key to them. The metadata keys are defined by a cloud administrator. By default, the HA_Enabled=True metadata key is used to authorize both the host evacuation and instance restart recovery workflows. For the procedure, refer to Use the Instance HA service.
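
As an illustration of applying the metadata key, the following Python sketch tags one instance through openstacksdk. It assumes that conn is an openstacksdk Connection and that your cloud uses the default HA_Enabled key. The function name is hypothetical, and the documented procedure remains the authoritative reference.

```python
def enable_instance_ha(conn, server_name_or_id, key="HA_Enabled", value="True"):
    """Tag one instance so that the Instance HA service may recover it.

    conn is expected to be an openstacksdk Connection, for example the
    result of openstack.connect(cloud="mycloud"), where "mycloud" is a
    hypothetical clouds.yaml entry.
    """
    # Fail loudly if the instance does not exist instead of returning None.
    server = conn.compute.find_server(server_name_or_id, ignore_missing=False)
    # Compute metadata values are strings, so pass "True" rather than True.
    conn.compute.set_server_metadata(server, **{key: value})
    return server
```

If your cloud administrator has defined different metadata keys, pass the relevant one through the key argument.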

As a cloud administrator, you can fine-tune the scope and priority of the Instance HA service to match your workload requirements:

Instance HA administrative controls

Control mechanism

Description

Recovery scope and priority

Administrators can configure the service to monitor all instances or only those labeled with a specific metadata key. If all instances are monitored, those with the metadata key are prioritized during host evacuation to ensure critical workloads are restored first.

Granular control

Administrators can define separate metadata keys to authorize instance restarts and host evacuations independently, allowing for distinct recovery logic per failure type.

Project-based limits

Administrators can restrict host evacuation capabilities to specific OpenStack projects. This ensures that HA resources are reserved for high-priority projects while excluding non-critical workloads from automated recovery.
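
The way these controls combine during host evacuation can be sketched in Python as follows. This is a simplified model under stated assumptions, not the service implementation: the EvacuationPolicy and Instance classes and all their field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EvacuationPolicy:
    """Hypothetical policy object; the field names are illustrative only."""
    evacuate_all_instances: bool = False
    metadata_key: str = "HA_Enabled"
    # None means that host evacuation is not restricted to specific projects.
    allowed_projects: Optional[frozenset] = None


@dataclass
class Instance:
    name: str
    project_id: str
    metadata: dict = field(default_factory=dict)


def _is_tagged(instance, key):
    """Return True if the instance carries the metadata key set to true."""
    return str(instance.metadata.get(key, "")).lower() == "true"


def plan_host_evacuation(instances, policy):
    """Return the instances to evacuate, with tagged instances first."""
    # Project-based limits: drop instances from projects that are not allowed.
    candidates = [
        i for i in instances
        if policy.allowed_projects is None
        or i.project_id in policy.allowed_projects
    ]
    # Recovery scope: keep only tagged instances unless all are monitored.
    if not policy.evacuate_all_instances:
        candidates = [i for i in candidates if _is_tagged(i, policy.metadata_key)]
    # Priority: sorted() is stable, so order is preserved within each group.
    return sorted(candidates, key=lambda i: not _is_tagged(i, policy.metadata_key))
```

In this model, granular control corresponds to using a second policy object with a different metadata_key for the instance restart workflow.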

For Instance HA configuration procedures for cloud administrators, refer to: