Granularly update a managed cluster using the ClusterUpdatePlan object
Available since MCC 2.27.0 (Cluster releases 17.2.0 and 16.2.0) TechPreview
You can control the process of a managed cluster update by manually launching update stages using the ClusterUpdatePlan custom resource. Between the update stages, the cluster remains functional from the perspective of cloud users and workloads.
A ClusterUpdatePlan object provides the following functionality:

- The object is automatically created by the bare metal provider when a new Cluster release becomes available for your cluster.
- The object is created in the management cluster in the same namespace that the corresponding managed cluster refers to.
- The object contains a list of self-descriptive update steps that are cluster-specific. These steps are defined in the spec section of the object together with information about their impact on the cluster.
- The object starts the cluster update when the operator manually changes the commence field of the first update step to true. All steps have the commence flag initially set to false so that the operator can decide when to pause or resume the update process.
- The object has the following naming convention: <managedClusterName>-<targetClusterReleaseVersion>.
- Since Container Cloud 2.28.0 (Cluster releases 17.3.0 and 16.3.0), the object is covered by several StackLight alerts that notify the operator about the update progress and potential update issues. For details, see StackLight alerts: Container Cloud.
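For example, you can inspect the update plan from the management cluster with standard kubectl commands. The following is a minimal sketch, assuming that kubectl targets the management cluster, that the custom resource is served under the clusterupdateplan name, and reusing the demo names from the example later in this section:

  # List update plans in the project namespace of the managed cluster
  kubectl get clusterupdateplan -n managed-namespace

  # Review the update steps, their impact, and current status
  kubectl get clusterupdateplan demo-managed-67835-17.3.0 -n managed-namespace -o yaml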
To update a managed cluster granularly:

1. Verify that the management cluster is upgraded successfully as described in Verify the management cluster status before MOSK update.
2. Open the ClusterUpdatePlan object for editing.

3. Start the cluster update by changing the spec:steps:commence field of the first update step to true, for example, as in the sketch below.

   Once done, the following actions are applied to the cluster:

   - The Cluster release in the corresponding Cluster spec is changed to the target Cluster version defined in the ClusterUpdatePlan spec.
   - The cluster update starts and pauses before the next update step that has commence: false set in the ClusterUpdatePlan spec.
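As an alternative to an interactive edit, the following is a minimal sketch of commencing the first step from the command line. It assumes the demo object name and namespace from the example below and that the object accepts a standard JSON patch:

  # Set the commence flag of the first update step (index 0 in spec:steps) to true
  kubectl patch clusterupdateplan demo-managed-67835-17.3.0 \
    -n managed-namespace --type=json \
    -p '[{"op": "replace", "path": "/spec/steps/0/commence", "value": true}]'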
Caution

Cancelling an already started update step is not supported.

The following example illustrates the ClusterUpdatePlan object of a MOSK cluster update that has completed.

Example of a completed ClusterUpdatePlan object:

apiVersion: kaas.mirantis.com/v1alpha1
kind: ClusterUpdatePlan
metadata:
  creationTimestamp: "2024-05-20T14:03:47Z"
  generation: 3
  name: demo-managed-67835-17.3.0
  namespace: managed-namespace
  resourceVersion: "534402"
  uid: 2eab536b-55aa-4870-b732-67ebf0a8a5bb
spec:
  cluster: demo-managed-67835
  source: mosk-17-2-0-24-2
  steps:
  - commence: true
    constraints:
    - until the step is complete, it wont be possible to perform normal LCM operations on the cluster
    description:
    - install new version of life cycle management modules
    - restart OpenStack control plane components in parallel
    duration:
      eta: 2h0m0s
      info:
      - 15 minutes to update one OpenStack controller node
      - 5 minutes to update one compute node
    granularity: cluster
    impact:
      info:
      - 'up to 8% unavailability of APIs: OpenStack'
      users: minor
      workloads: none
    id: openstack
    name: Update OpenStack control plane on a MOSK cluster
  - commence: true
    description:
    - major Ceph version upgrade
    - update monitors, managers, RGW/MDS
    - OSDs are restarted sequentially, or by rack
    - takes into account the failure domain config in cluster (rack updated in parallel)
    duration:
      eta: 40m0s
      info:
      - up to 40 minutes to update Ceph cluster (30 nodes)
    granularity: cluster
    impact:
      info:
      - 'up to 8% unavailability of APIs: S3/Swift'
      users: none
      workloads: none
    id: ceph
    name: Update Ceph cluster on a MOSK cluster
  - commence: true
    description:
    - new host OS kernel and packages get installed
    - host OS configuration re-applied
    - new versions of Kubernetes components installed
    duration:
      eta: 45m0s
      info:
      - 15 minutes per Kubernetes master node, nodes updated sequentially
    granularity: cluster
    impact:
      users: none
      workloads: none
    id: k8s-controllers
    name: Update host OS and Kubernetes components on master nodes
  - commence: true
    description:
    - new host OS kernel and packages get installed
    - host OS configuration re-applied
    - new versions of Kubernetes components installed
    - containerd and MCR get bumped
    - Open vSwitch and Neutron L3 agents gets restarted on gateway and compute nodes
    duration:
      eta: 12h0m0s
      info:
      - 'depends on the type of the nodes: controller, compute, OSD'
    granularity: machine
    impact:
      info:
      - some OpenStack running operations might not complete due to restart of docker/containerd on controller nodes (up to 30%, assuming seq. controller update)
      - OpenStack LCM will prevent OpenStack controllers and gateways from parallel cordon / drain, despite node-group config
      - Ceph LCM will prevent parallel restart of OSDs, monitors and managers, despite node-group config
      - minor loss of the East-West connectivity with the Open vSwitch networking back end that causes approximately 5 min of downtime per compute node
      - 'minor loss of the North-South connectivity with the Open vSwitch networking back end: a non-distributed HA virtual router needs up to 1 minute to fail over; a non-distributed and non-HA virtual router failover time depends on many factors and may take up to 10 minutes'
      users: minor
      workloads: major
    id: k8s-workers-demo-managed-67835-default
    name: Update host OS and Kubernetes components on worker nodes, group default
  - commence: true
    description:
    - restart of StackLight, MetalLB services
    - restart of auxilary controllers and charts
    duration:
      eta: 30m0s
      info:
      - 30 minutes minimum
    granularity: cluster
    impact:
      info:
      - minor cloud API downtime due restart of MetalLB components
      users: minor
      workloads: none
    id: mcc-components
    name: Auxilary components update
  target: mosk-17-3-0-24-3
status:
  startedAt: "2024-05-20T14:05:23Z"
  status: Completed
  steps:
  - duration: 29m16.887573286s
    message: Ready
    id: openstack
    name: Update OpenStack control plane
    startedAt: "2024-05-20T14:05:23Z"
    status: Completed
  - duration: 8m1.808804491s
    message: Ready
    id: ceph
    name: Update Ceph cluster
    startedAt: "2024-05-20T14:34:39Z"
    status: Completed
  - duration: 33m5.100480887s
    message: Ready
    id: k8s-controllers
    name: Update host OS and Kubernetes components on master nodes
    startedAt: "2024-05-20T14:42:40Z"
    status: Completed
  - duration: 1h39m9.896875724s
    message: Ready
    id: k8s-workers-demo-managed-67835-default
    name: Update host OS and Kubernetes components on worker nodes, group default
    startedAt: "2024-05-20T15:34:46Z"
    status: Completed
  - duration: 2m1.426000849s
    message: Ready
    id: mcc-components
    name: Auxilary components update
    startedAt: "2024-05-20T17:13:55Z"
    status: Completed
4. Monitor the message and status fields of the first step. The message field contains information about the progress of the current step. The status field can have the following values:

   - NotStarted
   - Scheduled (since MCC 2.28.0 (17.3.0 and 16.3.0))
   - InProgress
   - Stuck
   - Completed

   The Stuck status indicates an issue: the step does not fit into the ETA defined in the duration field for this step. The ETA for each step is defined statically and does not change depending on the cluster.

   The Scheduled status indicates that a step is already triggered but its execution has not started yet.

   For example, you can track the status of each step as shown in the sketch below.
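A minimal sketch of tracking step statuses, assuming the demo object name and namespace from the example above and the clusterupdateplan resource name:

  # Print the name, status, and message of every update step
  kubectl get clusterupdateplan demo-managed-67835-17.3.0 -n managed-namespace \
    -o jsonpath='{range .status.steps[*]}{.name}{"\t"}{.status}{"\t"}{.message}{"\n"}{end}'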
Caution

The status is not populated for ClusterUpdatePlan objects that have not been started by setting the commence: true flag in the first object step. Therefore, always start updating the object from the first step.

5. Optional. Available since Container Cloud 2.28.0 (Cluster releases 17.3.0 and 16.3.0). Add or remove update groups of worker nodes on the fly, unless the group update has already been scheduled. These changes are reflected in ClusterUpdatePlan.

   You can also reassign a machine to a different update group while the cluster is being updated, but only if the new update group has not finished updating yet. Disabled machines are considered updated immediately.
Note

Depending on the number of update groups for worker nodes present in the cluster, the number of steps in spec differs. Each update group for worker nodes that has at least one machine is represented by a step with the ID k8s-workers-<UpdateGroupName>.
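For example, to see which update steps, including the per-group worker steps, are present in the plan, you can list the step IDs. A minimal sketch, reusing the demo names above:

  # List the IDs of all update steps defined in the plan
  kubectl get clusterupdateplan demo-managed-67835-17.3.0 -n managed-namespace \
    -o jsonpath='{.spec.steps[*].id}'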
6. Proceed with changing the commence flag of the following update steps granularly, depending on the cluster update requirements, for example, as in the sketch after the caution below.

   Caution

   Launch the update steps sequentially. A consecutive step does not start until the previous step is completed.
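A minimal sketch of commencing a subsequent step, assuming the demo names above; the index in the patch path matches the position of the step in spec:steps, for example, 1 for the Ceph step of the example object:

  # Commence the second update step only after the previous one is Completed
  kubectl patch clusterupdateplan demo-managed-67835-17.3.0 \
    -n managed-namespace --type=json \
    -p '[{"op": "replace", "path": "/spec/steps/1/commence", "value": true}]'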