Cluster update flow

This section describes the flow of a MOSK cluster update to product releases that contain major updates and require a node reboot, for example, to support a new Linux kernel.

The diagram below illustrates the sequence of operations that LCM controls under the hood during the update. It assumes that the ClusterWorkloadLock and NodeWorkloadLock objects present in the cluster are in the active state before the cloud operator triggers the update.

[Diagram: Cluster update flow]

See also

For details about the Application Controllers flow during different maintenance stages, refer to:

Phase 1: The Operator triggers the update

  1. The Operator sets the appropriate annotations on nodes and selects a suitable migration mode for workloads.

  2. The Operator triggers the managed cluster update through the Mirantis Container Cloud web UI as described in Update the cluster to MOSK 22.1 or above: Step 3. Initiate MOSK cluster update.

  3. LCM creates the ClusterMaintenanceRequest object and notifies the Application Controllers about the planned maintenance.
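The flow assumes that all ClusterWorkloadLock and NodeWorkloadLock objects are active before the update is triggered. The following is a minimal sketch of such a pre-check over an in-memory model. The `WorkloadLock` class, its fields, and the node and controller names are hypothetical illustrations and not the actual MOSK API.

```python
from dataclasses import dataclass


@dataclass
class WorkloadLock:
    """Hypothetical stand-in for a ClusterWorkloadLock or NodeWorkloadLock."""
    kind: str        # "ClusterWorkloadLock" or "NodeWorkloadLock"
    controller: str  # owning Application Controller, for example "openstack"
    state: str       # "active" or "inactive"
    node: str = ""   # set only for a NodeWorkloadLock


def locks_not_active(locks):
    """Return the locks that violate the 'all locks are active' assumption."""
    return [lock for lock in locks if lock.state != "active"]


# Illustrative data only; names are assumptions, not real cluster output.
locks = [
    WorkloadLock("ClusterWorkloadLock", "openstack", "active"),
    WorkloadLock("ClusterWorkloadLock", "ceph", "active"),
    WorkloadLock("NodeWorkloadLock", "openstack", "inactive", node="worker-1"),
]

blocking = locks_not_active(locks)
for lock in blocking:
    scope = lock.node or "cluster"
    print(f"{lock.kind} of {lock.controller} ({scope}) is {lock.state}")
```

In this sketch, an empty `blocking` list means the starting condition assumed by the flow holds.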

Phase 2: LCM triggers the OpenStack and Ceph update

  1. The OpenStack update starts.

  2. The Ceph Controller waits for the OpenStack ClusterWorkloadLock object to become inactive.

  3. When the OpenStack update is finalized, the OpenStack Controller marks its ClusterWorkloadLock object as inactive.

  4. The Ceph Controller triggers an update of the Ceph cluster.

  5. When the Ceph update is finalized, the Ceph Controller marks its ClusterWorkloadLock object as inactive.
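The ordering in this phase can be pictured as a small simulation in which the Ceph update is gated on the OpenStack lock. This is only a sketch under stated assumptions: the tick-based loop, the function names, and the dictionary model of the locks are hypothetical and do not reflect how the controllers are implemented.

```python
def ceph_may_start(locks):
    """Ceph proceeds only after the OpenStack lock becomes inactive."""
    return locks["openstack"] == "inactive"


def simulate_phase_two(openstack_update_ticks):
    """Simulate Phase 2; returns the final lock states and an event log.

    `openstack_update_ticks` is a hypothetical duration of the OpenStack
    update, measured in polling intervals.
    """
    locks = {"openstack": "active", "ceph": "active"}
    tick = 0
    events = [(tick, "openstack update started")]

    # The Ceph Controller keeps polling until the OpenStack lock is released.
    while not ceph_may_start(locks):
        tick += 1
        if tick >= openstack_update_ticks:
            locks["openstack"] = "inactive"
            events.append((tick, "openstack lock inactive"))
        else:
            events.append((tick, "ceph waiting"))

    events.append((tick, "ceph update started"))
    # Once the Ceph update is finalized, its own lock is released as well.
    locks["ceph"] = "inactive"
    events.append((tick, "ceph lock inactive"))
    return locks, events


final_locks, events = simulate_phase_two(openstack_update_ticks=3)
for tick, message in events:
    print(tick, message)
```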

Phase 3: LCM initiates the Kubernetes master nodes update

  1. If a master node has collocated roles, LCM creates NodeMaintenanceRequest for the node.

  2. All Application Controllers mark their NodeWorkloadLock objects for this node as inactive.

  3. LCM starts draining the node by gracefully moving all pods off the node. The DaemonSet pods are not evacuated and are left running.

  4. LCM downloads the new version of the LCM Agent and runs its states.

    Note

    While LCM runs Ansible states, the services on the node may be restarted.

  5. The above flow is applied to all Kubernetes master nodes one by one.

  6. LCM removes NodeMaintenanceRequest.
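The drain behavior described above, where regular pods are moved off the node and DaemonSet pods stay in place, can be illustrated with a short partitioning function. This is a sketch only: the `owner_kind` key and the pod names are hypothetical, and the code does not call any Kubernetes API.

```python
def split_pods_for_drain(pods):
    """Partition pods into those to evict and those left running.

    Each pod is a dict with a hypothetical 'owner_kind' key. Pods owned by
    a DaemonSet are left in place; all other pods are evicted.
    """
    evict, keep = [], []
    for pod in pods:
        if pod["owner_kind"] == "DaemonSet":
            keep.append(pod["name"])
        else:
            evict.append(pod["name"])
    return evict, keep


# Illustrative pod list; names are assumptions, not real cluster output.
pods = [
    {"name": "node-exporter-x2k", "owner_kind": "DaemonSet"},
    {"name": "api-7f9c", "owner_kind": "ReplicaSet"},
    {"name": "db-0", "owner_kind": "StatefulSet"},
]

evicted, left_running = split_pods_for_drain(pods)
print("evict:", evicted)
print("left running:", left_running)
```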

Phase 4: LCM initiates the Kubernetes worker nodes update

  1. LCM creates NodeMaintenanceRequest for the node and specifies the maintenance scope.

  2. Application Controllers start preparing the node according to the scope.

  3. LCM waits until all Application Controllers mark their NodeWorkloadLock objects for this node as inactive.

  4. All pods are evacuated from the node by draining it. This does not apply to the DaemonSet pods, which cannot be removed.

  5. LCM downloads the new version of the LCM Agent and runs its states.

    Note

    While LCM runs Ansible states, the services on the node may be restarted.

  6. The above flow is applied to all Kubernetes worker nodes one by one.

  7. LCM removes NodeMaintenanceRequest.
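The worker node steps above can be sketched as a sequential loop over nodes. The sketch makes several assumptions that the text does not confirm: the `Node` class, the `prepare` callback standing in for the Application Controllers, the scope value `"os"`, and the removal of the request per node rather than once at the end are all hypothetical. Real LCM would keep waiting for the locks, whereas this sketch stops with an error.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical model of a worker node."""
    name: str
    # Controller name -> NodeWorkloadLock state for this node.
    locks: dict = field(default_factory=dict)
    agent_version: str = "old"


def update_workers(nodes, new_version, scope, prepare):
    """Apply the Phase 4 steps to each worker in turn; return an event log."""
    events = []
    for node in nodes:  # nodes are processed one by one, never in parallel
        events.append(f"{node.name}: NodeMaintenanceRequest created (scope={scope})")
        # Application Controllers prepare the node and release their locks.
        prepare(node, scope)
        if any(state != "inactive" for state in node.locks.values()):
            raise RuntimeError(f"{node.name}: NodeWorkloadLock still active")
        events.append(f"{node.name}: drained (DaemonSet pods left running)")
        node.agent_version = new_version
        events.append(f"{node.name}: LCM Agent {new_version} states applied")
        # Assumption: the request is removed as soon as the node is done.
        events.append(f"{node.name}: NodeMaintenanceRequest removed")
    return events


def release_all_locks(node, scope):
    """Hypothetical controller behavior: release every lock on the node."""
    for controller in node.locks:
        node.locks[controller] = "inactive"


workers = [
    Node("worker-1", {"openstack": "active", "ceph": "active"}),
    Node("worker-2", {"openstack": "active", "ceph": "active"}),
]
log = update_workers(workers, "new", "os", release_all_locks)
print("\n".join(log))
```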

Phase 5: Finalization

  1. LCM triggers the update for all other applications present in the cluster, such as StackLight, Tungsten Fabric, and others.

  2. LCM removes ClusterMaintenanceRequest.

After a while, the cluster update completes and the cluster becomes fully operable again.