OpenStack Controller maintenance API
When LCM creates the ClusterMaintenanceRequest object, the OpenStack Controller ensures that all OpenStack components are in the Healthy state, which means that the pods are up and running, and the readiness probes are passing.
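As a minimal sketch of what the Healthy state means, the following Python helpers evaluate pods represented as plain dictionaries shaped like the Kubernetes Pod status. The function names are hypothetical. The sketch assumes that a pod in the Running phase whose Ready condition is True has passed its readiness probes. It does not reflect how the OpenStack Controller actually implements the check.

```python
from typing import Any, Dict, List


def pod_is_healthy(pod: Dict[str, Any]) -> bool:
    """Return True if the pod is running and its Ready condition is True."""
    status = pod.get("status", {})
    if status.get("phase") != "Running":
        return False
    # Kubernetes sets the Ready condition to "True" only when all
    # containers report ready, that is, their readiness probes pass.
    for cond in status.get("conditions", []):
        if cond.get("type") == "Ready":
            return cond.get("status") == "True"
    return False


def component_is_healthy(pods: List[Dict[str, Any]]) -> bool:
    """Hypothetical rule: a component is Healthy if it has pods and all are healthy."""
    return bool(pods) and all(pod_is_healthy(p) for p in pods)


running_ready = {
    "status": {"phase": "Running", "conditions": [{"type": "Ready", "status": "True"}]}
}
running_not_ready = {
    "status": {"phase": "Running", "conditions": [{"type": "Ready", "status": "False"}]}
}
print(component_is_healthy([running_ready]))                     # True
print(component_is_healthy([running_ready, running_not_ready]))  # False
```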
The ClusterMaintenanceRequest object creation flow:
When LCM creates the NodeMaintenanceRequest object, the OpenStack Controller:
Prepares components on the node for maintenance by removing
If the node can be rebooted, the OpenStack Controller triggers the instance migration workflow. The Operator can configure the instance migration flow through Kubernetes node annotations and should define the required option before updating the managed cluster.
To mitigate the potential impact on the cloud workloads, you can define the instance migration flow for the compute nodes running the most valuable instances.
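The following sketch builds a merge-patch body that sets the instance migration mode on a node. It assumes the openstack.lcm.mirantis.com/instance_migration_mode annotation key named in this section, and it assumes the values live, manual, and skip for the options described below. The helper name is hypothetical. From the command line, a roughly equivalent action would be kubectl annotate node <node-name> openstack.lcm.mirantis.com/instance_migration_mode=manual --overwrite.

```python
from typing import Any, Dict

# Annotation key as named in this section.
MIGRATION_MODE_KEY = "openstack.lcm.mirantis.com/instance_migration_mode"
# Assumed set of accepted values, one per migration option.
ALLOWED_MODES = {"live", "manual", "skip"}


def migration_mode_patch(mode: str) -> Dict[str, Any]:
    """Build a merge-patch body that sets the migration mode annotation on a Node."""
    if mode not in ALLOWED_MODES:
        raise ValueError(f"unsupported instance migration mode: {mode!r}")
    return {"metadata": {"annotations": {MIGRATION_MODE_KEY: mode}}}


body = migration_mode_patch("manual")
print(body)
# With the official Kubernetes Python client, this body could be applied as:
#   kubernetes.client.CoreV1Api().patch_node("<node-name>", body)
```

Validating the value before patching catches a typo locally instead of leaving an annotation on the node that the controller may not recognize.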
The list of available options for the instance migration configuration includes:
live
Default. The OpenStack Controller live migrates instances automatically. Before applying any changes to the node, the update mechanism tries to move the memory and local storage of all instances on the node to another node without interrupting them. By default, the update mechanism makes three attempts to migrate each instance before falling back to the manual mode.
The success of live migration depends on many factors, including the selected vCPU type and model, the amount of data that needs to be transferred, the intensity of disk I/O and memory writes, the type of local storage, and others. Instances using the following product features are known to have issues with live migration:
LVM-based ephemeral storage with and without encryption
Encrypted block storage volumes
CPU and NUMA node pinning
manual
The OpenStack Controller waits for the Operator to migrate instances from the compute node. When it is time to update the compute node, the update mechanism asks you to manually migrate the instances and proceeds only once you confirm that the node is safe to update.
skip
The OpenStack Controller skips the instance check on the node and reboots it.
For clouds relying on converged LVM with iSCSI block storage that offer persistent volumes in a remote edge sub-region, keep in mind that applying a major change to a compute node may impact not only the instances running on this node but also the instances attached to the LVM devices hosted there. In such environments, we recommend performing the update procedure in the manual mode, with the Operator taking mitigation measures for each compute node. Otherwise, all the instances that have LVM with iSCSI volumes attached would need a reboot to restore connectivity.
Defines the number of times the OpenStack Controller attempts to migrate a single instance before giving up. Defaults to 3.
You can also use annotations to control the update of non-compute nodes if they represent critical points of a specific cloud architecture. For example, setting the openstack.lcm.mirantis.com/instance_migration_mode annotation to manual on a controller node with a collocated gateway (Open vSwitch) allows the Operator to gracefully shut down all the virtual routers hosted on this node.
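To illustrate the retry behavior of the default mode, the following hypothetical Python sketch tries each instance a fixed number of times (three by default, as described above) and returns the instances that would fall back to manual migration. The migrate callable is an assumption standing in for whatever mechanism actually performs the live migration.

```python
from typing import Callable, Dict, Iterable, List


def live_migrate_node(
    instances: Iterable[str],
    migrate: Callable[[str], bool],
    attempts: int = 3,
) -> List[str]:
    """Try to live migrate each instance up to `attempts` times.

    Returns the instances that still need manual migration, modeling the
    fallback from automatic live migration to manual migration.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    needs_manual: List[str] = []
    for instance in instances:
        # any() consumes the generator lazily, so it stops at the first success.
        if not any(migrate(instance) for _ in range(attempts)):
            needs_manual.append(instance)
    return needs_manual


# Fake migration: "vm-a" succeeds on its second attempt, "vm-pinned" never does.
calls: Dict[str, int] = {}


def fake_migrate(instance: str) -> bool:
    calls[instance] = calls.get(instance, 0) + 1
    if instance == "vm-pinned":
        return False
    return calls[instance] >= 2


print(live_migrate_node(["vm-a", "vm-pinned"], fake_migrate))  # ['vm-pinned']
print(calls)  # {'vm-a': 2, 'vm-pinned': 3}
```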
If the OpenStack Controller cannot migrate instances due to errors, it is suspended until all instances are migrated manually or the openstack.lcm.mirantis.com/instance_migration_mode annotation is set to skip.
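The suspension rule can be expressed as a small predicate. This is a hypothetical sketch that models only the rule stated above and assumes the mode values live, manual, and skip. It deliberately ignores the Operator confirmation step of the manual mode.

```python
from typing import Sequence


def node_may_proceed(mode: str, remaining_instances: Sequence[str]) -> bool:
    """Decide whether the node update may continue (hypothetical model).

    skip:         the instance check is skipped, so the update proceeds.
    live, manual: the update proceeds only when no instances remain on the
                  node; otherwise the controller stays suspended.
    """
    if mode == "skip":
        return True
    if mode in ("live", "manual"):
        return len(remaining_instances) == 0
    raise ValueError(f"unknown instance migration mode: {mode!r}")


print(node_may_proceed("live", ["vm-pinned"]))  # False
print(node_may_proceed("skip", ["vm-pinned"]))  # True
print(node_may_proceed("manual", []))           # True
```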
The NodeMaintenanceRequest object creation flow:
When the node maintenance is over, LCM removes the NodeMaintenanceRequest object and the OpenStack Controller:
Verifies that the Kubernetes Node becomes Ready.
Verifies that all OpenStack components on a given node are Healthy, which means that the pods are up and running, and the readiness probes are passing.
Ensures that the OpenStack components are connected to RabbitMQ. For example, the Neutron Agents become alive on the node, and compute instances are in the
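The node readiness check at the start of this sequence can be sketched as follows, assuming the Node is represented as a dictionary shaped like the Kubernetes Node status. The remaining checks are passed in as hypothetical callables, because this section does not specify how the OpenStack Controller performs them.

```python
from typing import Any, Callable, Dict, List


def node_is_ready(node: Dict[str, Any]) -> bool:
    """Return True if the Node reports the Ready condition with status "True"."""
    for cond in node.get("status", {}).get("conditions", []):
        if cond.get("type") == "Ready":
            return cond.get("status") == "True"
    return False


def verify_node_after_maintenance(
    node: Dict[str, Any], checks: List[Callable[[], bool]]
) -> bool:
    """Check node readiness first, then run the other checks in order.

    The `and` and all() both short-circuit, so later checks are skipped
    as soon as one condition is not met.
    """
    return node_is_ready(node) and all(check() for check in checks)


node = {"status": {"conditions": [{"type": "Ready", "status": "True"}]}}
# Hypothetical stand-ins for component health and RabbitMQ connectivity checks.
checks = [lambda: True, lambda: True]
print(verify_node_after_maintenance(node, checks))  # True
```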
The OpenStack Controller allows only one nodeworkloadlock object to be in the inactive state at a time. Therefore, nodes are updated sequentially.
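A minimal model of this constraint follows. It uses hypothetical names and treats each node lock as either active or inactive. A node lock can become inactive only while no other node lock is inactive, which forces the sequential update order.

```python
from typing import Dict, Iterable, Optional


class NodeLockRegistry:
    """Hypothetical model: at most one node lock may be inactive at a time."""

    def __init__(self, nodes: Iterable[str]) -> None:
        # Every lock starts active, meaning its node is not under maintenance.
        self.states: Dict[str, str] = {n: "active" for n in nodes}

    def inactive_node(self) -> Optional[str]:
        """Return the node whose lock is currently inactive, if any."""
        for node, state in self.states.items():
            if state == "inactive":
                return node
        return None

    def try_deactivate(self, node: str) -> bool:
        """Release the lock for maintenance unless another node holds the slot."""
        current = self.inactive_node()
        if current is not None and current != node:
            return False
        self.states[node] = "inactive"
        return True

    def activate(self, node: str) -> None:
        """Mark the node's maintenance as finished."""
        self.states[node] = "active"


reg = NodeLockRegistry(["cmp-1", "cmp-2"])
print(reg.try_deactivate("cmp-1"))  # True
print(reg.try_deactivate("cmp-2"))  # False: cmp-1 is still under maintenance
reg.activate("cmp-1")
print(reg.try_deactivate("cmp-2"))  # True
```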
The NodeMaintenanceRequest object removal flow:
When the cluster maintenance is over, the OpenStack Controller sets the ClusterWorkloadLock object back to active, and the update completes.
The ClusterMaintenanceRequest object removal flow: