Cluster update flow
This section describes the MOSK cluster update flow for product releases that contain major updates and require a node reboot, for example, releases that add support for a new Linux kernel.
The diagram below illustrates the sequence of operations that LCM controls under the hood during the update. We assume that the ClusterWorkloadLock and NodeWorkloadLock objects present in the cluster are in the active state before the cloud operator triggers the update.
See also
For details about the Application Controllers flow during different maintenance stages, refer to:
Phase 1: The Operator triggers the update
The Operator sets the appropriate annotations on nodes and selects a suitable migration mode for workloads.
The Operator triggers the managed cluster update through the Mirantis Container Cloud web UI as described in Step 2. Initiate MOSK cluster update.
LCM creates the ClusterMaintenanceRequest object and notifies the Application Controllers about the planned maintenance.
Phase 2: LCM triggers the OpenStack and Ceph update
The OpenStack update starts.
Ceph waits for the OpenStack ClusterWorkloadLock object to become inactive.
When the OpenStack update is finalized, the OpenStack Controller marks the OpenStack ClusterWorkloadLock object as inactive.
The Ceph Controller triggers an update of the Ceph cluster.
When the Ceph update is finalized, the Ceph Controller marks its ClusterWorkloadLock object as inactive.
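The ordering in this phase can be illustrated with a small toy model. This is only a sketch in plain Python, not the LCM or controller implementation: the `ClusterWorkloadLock` dataclass, its fields, and the `ceph_may_start` and `run_phase_2` helpers are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class ClusterWorkloadLock:
    """Toy stand-in for a ClusterWorkloadLock object; fields are hypothetical."""
    controller: str
    active: bool = True


def ceph_may_start(openstack_lock: ClusterWorkloadLock) -> bool:
    # Ceph waits until the OpenStack lock becomes inactive.
    return not openstack_lock.active


def run_phase_2() -> list:
    """Replay the Phase 2 ordering and return the events in sequence."""
    openstack = ClusterWorkloadLock("openstack")
    ceph = ClusterWorkloadLock("ceph")
    events = ["openstack update started"]

    if not ceph_may_start(openstack):
        events.append("ceph waiting")

    # The OpenStack Controller finalizes its update and releases its lock.
    openstack.active = False
    events.append("openstack lock inactive")

    if ceph_may_start(openstack):
        events.append("ceph update started")
        # The Ceph update is finalized and the Ceph lock is released.
        ceph.active = False
        events.append("ceph lock inactive")

    return events
```

Calling `run_phase_2()` returns the events in the order described above: the Ceph update starts only after the OpenStack lock has become inactive.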
Phase 3: LCM initiates the Kubernetes master nodes update
If a master node has collocated roles, LCM creates NodeMaintenanceRequest for the node.
All Application Controllers mark their NodeWorkloadLock objects for this node as inactive.
LCM starts draining the node by gracefully moving all pods off the node. The DaemonSet pods are not evacuated and are left running.
LCM downloads the new version of the LCM Agent and runs its states.
Note
While running Ansible states, the services on the node may be restarted.
The above flow is applied to all Kubernetes master nodes one by one.
LCM removes NodeMaintenanceRequest.
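The drain rule in this phase, where all pods are moved except those owned by a DaemonSet, can be sketched as a simple filter. The example below is an illustration only and does not reproduce how LCM drains a node. It assumes pods are given as plain dictionaries shaped like Kubernetes pod manifests, where `metadata.ownerReferences[].kind` identifies the owning controller; the helper names and the pod names are hypothetical.

```python
def is_daemonset_pod(pod: dict) -> bool:
    """Return True if any owner reference of the pod is a DaemonSet."""
    owners = pod.get("metadata", {}).get("ownerReferences", [])
    return any(owner.get("kind") == "DaemonSet" for owner in owners)


def pods_to_evict(pods: list) -> list:
    """Return the names of pods that a drain would move off the node."""
    return [p["metadata"]["name"] for p in pods if not is_daemonset_pod(p)]


# Hypothetical pods running on the node being drained.
example_pods = [
    {"metadata": {"name": "api-0", "ownerReferences": [{"kind": "StatefulSet"}]}},
    {"metadata": {"name": "log-agent-x", "ownerReferences": [{"kind": "DaemonSet"}]}},
    {"metadata": {"name": "web-7f9c", "ownerReferences": [{"kind": "ReplicaSet"}]}},
]
```

With these sample pods, `pods_to_evict(example_pods)` selects `api-0` and `web-7f9c`, while `log-agent-x` stays on the node.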
Phase 4: LCM initiates the Kubernetes worker nodes update
LCM creates NodeMaintenanceRequest for the node and specifies the scope.
Application Controllers start preparing the node according to the scope.
LCM waits until all Application Controllers mark their NodeWorkloadLock objects for this node as inactive.
All pods are evacuated from the node by draining it. This does not apply to the DaemonSet pods, which cannot be removed.
LCM downloads the new version of the LCM Agent and runs its states.
Note
While running Ansible states, the services on the node may be restarted.
The above flow is applied to all Kubernetes worker nodes one by one.
LCM removes NodeMaintenanceRequest.
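The per-node sequence of this phase can be summarized with the following toy model. It is a sketch under stated assumptions, not the actual LCM logic: the `Node` dataclass, its fields, the controller names, and both functions are hypothetical, the controllers release their locks synchronously instead of LCM waiting for them, and the sketch assumes that NodeMaintenanceRequest is removed for each node before the next node is processed.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy worker node; all fields are hypothetical stand-ins."""
    name: str
    # One NodeWorkloadLock per Application Controller; True means active.
    locks: dict = field(default_factory=lambda: {"openstack": True, "ceph": True})
    maintenance_request: bool = False
    agent_version: str = "old"


def update_node(node: Node, new_version: str, log: list) -> None:
    """Run the Phase 4 steps for a single node and record them in log."""
    node.maintenance_request = True
    log.append(f"{node.name}: NodeMaintenanceRequest created")

    # Each Application Controller prepares the node, then releases its lock.
    for controller in node.locks:
        node.locks[controller] = False

    # A real system would wait here; the toy model releases the locks
    # synchronously, so this guard only documents the precondition.
    if any(node.locks.values()):
        raise RuntimeError(f"{node.name}: active NodeWorkloadLock remains")

    log.append(f"{node.name}: drained")
    node.agent_version = new_version
    log.append(f"{node.name}: LCM Agent {new_version} applied")

    node.maintenance_request = False
    log.append(f"{node.name}: NodeMaintenanceRequest removed")


def rolling_update(nodes: list, new_version: str) -> list:
    """Update the nodes strictly one by one and return the event log."""
    log = []
    for node in nodes:
        update_node(node, new_version, log)
    return log
```

For example, `rolling_update([Node("worker-0"), Node("worker-1")], "new")` records four events for `worker-0` before the first event for `worker-1`, which mirrors the one-by-one processing described above.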
Phase 5: Finalization
LCM triggers the update for all other applications present in the cluster, such as StackLight, Tungsten Fabric, and others.
LCM removes ClusterMaintenanceRequest.
After a while, the cluster update completes and the cluster becomes fully operable again.