OpenStack Controller maintenance API

When LCM creates the ClusterMaintenanceRequest object, the OpenStack Controller ensures that all OpenStack components are in the Healthy state, which means that the pods are up and running, and the readiness probes are passing.

The ClusterMaintenanceRequest object creation flow:

ClusterMaintenanceRequest - create

When LCM creates the NodeMaintenanceRequest, the OpenStack Controller:

  1. Prepares components on the node for maintenance by removing nova-compute from scheduling.

  2. If the reboot of a node is possible, the instance migration workflow is triggered. The Operator can configure the instance migration flow through the Kubernetes node annotation and should define the required option before the managed cluster update.

    To mitigate the potential impact on the cloud workloads, you can define the instance migration flow for the compute nodes running the most valuable instances.

    The list of available options for the instance migration configuration includes:

    • The openstack.lcm.mirantis.com/instance_migration_mode annotation:

      • live

        Default. The OpenStack controller live migrates instances automatically. The update mechanism tries to move the memory and local storage of all instances on the node to another node without interrupting before applying any changes to the node. By default, the update mechanism makes three attempts to migrate each instance before falling back to the manual mode.

        Note

        Success of live migration depends on many factors including the selected vCPU type and model, the amount of data that needs to be transferred, the intensity of the disk IO and memory writes, the type of the local storage, and others. Instances using the following product features are known to have issues with live migration:

        • LVM-based ephemeral storage with and without encryption

        • Encrypted block storage volumes

        • CPU and NUMA node pinning

      • manual

        The OpenStack Controller waits for the Operator to migrate instances from the compute node. When it is time to update the compute node, the update mechanism asks you to manually migrate the instances and proceeds only once you confirm the node is safe to update.

      • skip

        The OpenStack Controller skips the instance check on the node and reboots it.

        Note

        For the clouds relying on the converged LVM with iSCSI block storage that offer persistent volumes in a remote edge sub-region, it is important to keep in mind that applying a major change to a compute node may impact not only the instances running on this node but also the instances attached to the LVM devices hosted there. We recommend that in such environments you perform the update procedure in the manual mode with mitigation measures taken by the Operator for each compute node. Otherwise, all the instances that have LVM with iSCSI volumes attached would need reboot to restore the connectivity.

    • The openstack.lcm.mirantis.com/instance_migration_attempts annotation

      Defines the number of times the OpenStack Controller attempts to migrate a single instance before giving up. Defaults to 3.

    Note

    You can also use annotations to control the update of non-compute nodes if they represent critical points of a specific cloud architecture. For example, setting the instance_migration_mode to manual on a controller node with a collocated gateway (Open vSwitch) will allow the Operator to gracefully shut down all the virtual routers hosted on this node.

  3. If the OpenStack Controller cannot migrate instances due to errors, it is suspended unless all instances are migrated manually or the openstack.lcm.mirantis.com/instance_migration_mode annotation is set to skip.

The NodeMaintenanceRequest object creation flow:

NodeMaintenanceRequest - create

When the node maintenance is over, LCM removes the NodeMaintenanceRequest object and the OpenStack Controller:

  • Verifies that the Kubernetes Node becomes Ready.

  • Verifies that all OpenStack components on a given node are Healthy, which means that the pods are up and running, and the readiness probes are passing.

  • Ensures that the OpenStack components are connected to RabbitMQ. For example, the Neutron Agents become alive on the node, and compute instances are in the UP state.

Note

The OpenStack Controller enables you to have only one nodeworkloadlock object at a time in the inactive state. Therefore, the update process for nodes is sequential.

The NodeMaintenanceRequest object removal flow:

NodeMaintenanceRequest - delete

When the cluster maintenance is over, the OpenStack Controller sets the ClusterWorkloadLock object to back active and the update completes.

The CLusterMaintenanceRequest object removal flow:

ClusterMaintenanceRequest - delete