Granularly update a managed cluster using the ClusterUpdatePlan object¶
Available since MCC 2.27.0 (17.2.0) TechPreview
You can control the process of a managed cluster update by manually launching
update stages using the ClusterUpdatePlan
custom resource. Between the
update stages, a cluster remains functional from the perspective of cloud
users and workloads.
A ClusterUpdatePlan object provides the following functionality:

- The object is automatically created by the bare metal provider when a new Cluster release becomes available for your cluster.
- The object is created in the management cluster in the same namespace that the corresponding managed cluster refers to.
- The object contains a list of self-descriptive update steps that are cluster-specific. These steps are defined in the spec section of the object along with information about their impact on the cluster.
- The object starts the cluster update when the operator manually changes the commence field of the first update step to true. All steps initially have the commence flag set to false so that the operator can decide when to pause or resume the update process.
- The object has the following naming convention: <managedClusterName>-<targetClusterReleaseVersion>.
- Since Container Cloud 2.28.0 (Cluster release 17.3.0), the object includes several StackLight alerts that notify the operator about the update progress and potential update issues. For details, see StackLight alerts: Container Cloud.
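For orientation, the overall shape of the object can be sketched as follows. This is an abridged outline based on the full example later in this section; placeholder values in angle brackets are illustrative, not literal field values:

```yaml
apiVersion: kaas.mirantis.com/v1alpha1
kind: ClusterUpdatePlan
metadata:
  name: <managedClusterName>-<targetClusterReleaseVersion>
  namespace: <managedClusterNamespace>
spec:
  cluster: <managedClusterName>
  source: <currentClusterReleaseVersion>
  target: <targetClusterReleaseVersion>
  steps:
  - id: openstack
    name: Update OpenStack and Tungsten Fabric
    commence: false        # set to true to launch this step
    duration:
      estimated: 1h30m0s   # static ETA for the step
    impact:
      users: minor
      workloads: minor
  # ...further steps: ceph, k8s-controllers, k8s-workers-<group>, mcc-components
```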
Granularly update a managed cluster using CLI¶
Verify that the management cluster is upgraded successfully as described in Verify the management cluster status before MOSK update.
Optional. Available since Container Cloud 2.29.0 (Cluster release 17.4.0) as Technology Preview. Enable update auto-pause to be triggered by specific StackLight alerts. For details, see Configure update auto-pause.
Open the ClusterUpdatePlan object for editing.

Start the cluster update by changing the spec:steps:commence field of the first update step to true.

Once done, the following actions are applied to the cluster:

- The Cluster release in the corresponding Cluster spec is changed to the target Cluster version defined in the ClusterUpdatePlan spec.
- The cluster update starts and pauses before the next update step that has commence: false set in the ClusterUpdatePlan spec.
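A minimal sketch of this edit, assuming the plan from the example later in this section (name mosk-17.4.0 in the child namespace), where only the commence flag of the first step is changed:

```yaml
spec:
  steps:
  - commence: true   # changed from false to launch the first step
    id: openstack
    name: Update OpenStack and Tungsten Fabric
    # ...remaining step fields unchanged
```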
Caution
Cancelling an already started update step is not supported.

The following example illustrates the ClusterUpdatePlan object of a completed MOSK cluster update.

Example of a completed ClusterUpdatePlan object:

```yaml
apiVersion: kaas.mirantis.com/v1alpha1
kind: ClusterUpdatePlan
metadata:
  creationTimestamp: "2025-02-06T16:53:51Z"
  generation: 11
  name: mosk-17.4.0
  namespace: child
  resourceVersion: "6072567"
  uid: 82c072be-1dc5-43dd-b8cf-bc643206d563
spec:
  cluster: mosk
  releaseNotes: https://docs.mirantis.com/mosk/latest/25.1-series.html
  source: mosk-17-3-0-24-3
  steps:
  - commence: true
    description:
    - install new version of OpenStack and Tungsten Fabric life cycle management modules
    - OpenStack and Tungsten Fabric container images pre-cached
    - OpenStack and Tungsten Fabric control plane components restarted in parallel
    duration:
      estimated: 1h30m0s
      info:
      - 15 minutes to cache the images and update the life cycle management modules
      - 1h to restart the components
    granularity: cluster
    id: openstack
    impact:
      info:
      - some of the running cloud operations may fail due to restart of API services and schedulers
      - DNS might be affected
      users: minor
      workloads: minor
    name: Update OpenStack and Tungsten Fabric
  - commence: true
    description:
    - Ceph version update
    - restart Ceph monitor, manager, object gateway (radosgw), and metadata services
    - restart OSD services node-by-node, or rack-by-rack depending on the cluster configuration
    duration:
      estimated: 8m30s
      info:
      - 15 minutes for the Ceph version update
      - around 40 minutes to update Ceph cluster of 30 nodes
    granularity: cluster
    id: ceph
    impact:
      info:
      - 'minor unavailability of object storage APIs: S3/Swift'
      - workloads may experience IO performance degradation for the virtual storage devices backed by Ceph
      users: minor
      workloads: minor
    name: Update Ceph
  - commence: true
    description:
    - new host OS kernel and packages get installed
    - host OS configuration re-applied
    - container runtime version gets bumped
    - new versions of Kubernetes components installed
    duration:
      estimated: 1h40m0s
      info:
      - about 20 minutes to update host OS per a Kubernetes controller, nodes updated one-by-one
      - Kubernetes components update takes about 40 minutes, all nodes in parallel
    granularity: cluster
    id: k8s-controllers
    impact:
      users: none
      workloads: none
    name: Update host OS and Kubernetes components on master nodes
  - commence: true
    description:
    - new host OS kernel and packages get installed
    - host OS configuration re-applied
    - container runtime version gets bumped
    - new versions of Kubernetes components installed
    - data plane components (Open vSwitch and Neutron L3 agents, TF agents and vrouter) restarted on gateway and compute nodes
    - storage nodes put to "no-out" mode to prevent rebalancing
    - by default, nodes are updated one-by-one, a node group can be configured to update several nodes in parallel
    duration:
      estimated: 8h0m0s
      info:
      - host OS update - up to 15 minutes per node (not including host OS configuration modules)
      - Kubernetes components update - up to 15 minutes per node
      - OpenStack controllers and gateways updated one-by-one
      - nodes hosting Ceph OSD, monitor, manager, metadata, object gateway (radosgw) services updated one-by-one
    granularity: machine
    id: k8s-workers-vdrok-child-default
    impact:
      info:
      - 'OpenStack controller nodes: some running OpenStack operations might not complete due to restart of components'
      - 'OpenStack compute nodes: minor loss of the East-West connectivity with the Open vSwitch networking back end that causes approximately 5 min of downtime'
      - 'OpenStack gateway nodes: minor loss of the North-South connectivity with the Open vSwitch networking back end: a non-distributed HA virtual router needs up to 1 minute to fail over; a non-distributed and non-HA virtual router failover time depends on many factors and may take up to 10 minutes'
      users: major
      workloads: major
    name: Update host OS and Kubernetes components on worker nodes, group vdrok-child-default
  - commence: true
    description:
    - restart of StackLight, MetalLB services
    - restart of auxiliary controllers and charts
    duration:
      estimated: 1h30m0s
    granularity: cluster
    id: mcc-components
    impact:
      info:
      - minor cloud API downtime due to restart of MetalLB components
      users: minor
      workloads: none
    name: Auxiliary components update
  target: mosk-17-4-0-25-1
status:
  completedAt: "2025-02-07T19:24:51Z"
  startedAt: "2025-02-07T17:07:02Z"
  status: Completed
  steps:
  - duration: 26m36.355605528s
    id: openstack
    message: Ready
    name: Update OpenStack and Tungsten Fabric
    startedAt: "2025-02-07T17:07:02Z"
    status: Completed
  - duration: 6m1.124356485s
    id: ceph
    message: Ready
    name: Update Ceph
    startedAt: "2025-02-07T17:33:38Z"
    status: Completed
  - duration: 24m3.151554465s
    id: k8s-controllers
    message: Ready
    name: Update host OS and Kubernetes components on master nodes
    startedAt: "2025-02-07T17:39:39Z"
    status: Completed
  - duration: 1h19m9.359184228s
    id: k8s-workers-vdrok-child-default
    message: Ready
    name: Update host OS and Kubernetes components on worker nodes, group vdrok-child-default
    startedAt: "2025-02-07T18:03:42Z"
    status: Completed
  - duration: 2m0.772243006s
    id: mcc-components
    message: Ready
    name: Auxiliary components update
    startedAt: "2025-02-07T19:22:51Z"
    status: Completed
```
Monitor the message and status fields of the first step. The message field contains information about the progress of the current step. The status field can have the following values:

- NotStarted
- Scheduled (since MCC 2.28.0 (17.3.0))
- InProgress
- AutoPaused (TechPreview since MCC 2.29.0 (17.4.0))
- Stuck
- Completed

The Scheduled status indicates that a step is already triggered but its execution has not started yet.

The AutoPaused status indicates that the update process is paused by a firing StackLight alert defined in the UpdateAutoPause object. For details, see Configure update auto-pause.

The Stuck status indicates an issue: the step cannot fit into the ETA defined in the duration field for this step. The ETA for each step is defined statically and does not change depending on the cluster.

Caution

The status is not populated for ClusterUpdatePlan objects that have not been started by setting the commence: true flag in the first object step. Therefore, always start updating the object from the first step.

Optional. Available since Container Cloud 2.28.0 (Cluster releases 17.3.0 and 16.3.0). Add or remove update groups of worker nodes on the fly, unless the update of the group being removed has already been scheduled, or a newly added group would receive an index lower than or equal to that of a group that is already scheduled. These changes are reflected in ClusterUpdatePlan.

You can also reassign a machine to a different update group while the cluster is being updated, but only if the new update group has an index higher than the index of the last scheduled worker update group. Disabled machines are considered updated immediately.
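As a sketch of adding an update group on the fly, the following assumes an UpdateGroup resource that exposes index and concurrentUpdates fields, as in the Container Cloud update-group concept; the object name and field values are hypothetical, so verify the exact schema against the CRDs in your cluster:

```yaml
# Hypothetical illustration: a new worker update group whose index is
# higher than that of the last scheduled group, so it can still be added
# while the cluster update is in progress.
apiVersion: kaas.mirantis.com/v1alpha1
kind: UpdateGroup
metadata:
  name: late-batch          # hypothetical group name
  namespace: child
spec:
  index: 200                # must exceed the index of the last scheduled group
  concurrentUpdates: 2      # update two machines of this group in parallel
```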
Note
Depending on the number of update groups for worker nodes present in the cluster, the number of steps in spec differs. Each update group for worker nodes that has at least one machine is represented by a step with the ID k8s-workers-<UpdateGroupName>.

Proceed with changing the commence flag of the subsequent update steps granularly, depending on the cluster update requirements.

Caution

Launch the update steps sequentially. A consecutive step does not start until the previous step is completed.
Granularly update a managed cluster using the Container Cloud web UI¶
Available since MCC 2.29.0 (17.4.0 and 16.4.0)
Verify that the management cluster is upgraded successfully as described in Verify the management cluster status before MOSK update.
Optional. Available since Container Cloud 2.29.0 (Cluster release 17.4.0) as Technology Preview. Enable update auto-pause to be triggered by specific StackLight alerts. For details, see Configure update auto-pause.
Log in to the Container Cloud web UI with the m:kaas:namespace@operator or m:kaas:namespace@writer permissions.

Switch to the required project using the Switch Project action icon located on top of the main left-side navigation panel.
On the Clusters page, in the Updates column of the required cluster, click the Available link. The Updates tab opens.
Note
If the Updates column is absent, it indicates that the cluster is up-to-date.
Note
For your convenience, the Cluster updates menu is also available in the right-side kebab menu of the cluster on the Clusters page.
On the Updates page, click the required version in the Target column to open update details, including the list of update steps, current and target cluster versions, and estimated update time.
In the Target version section of the Cluster update window, click Release notes and carefully read the updates for the target release, including the Update notes section that contains important pre-update and post-update steps.
Expand each step to verify information about update impact and other useful details.
Select one of the following options:
Enable Auto-commence all at the top-right of the first update step section and click Start Update to launch update and start each step automatically.
Click Start Update to only launch the first update step.
Note
This option allows you to auto-commence consecutive steps while the current step is in progress. Enable the Auto-commence toggle for the required steps and click Save to launch the selected steps automatically. You will only be prompted to confirm the consecutive step; all remaining steps will be launched without manual confirmation.
Before launching the update, you will be prompted to manually type in the target Cluster release name and confirm that you have read the release notes for the target release.
Caution
Cancelling an already started update step is not supported.
Monitor the status of each step by hovering over the In Progress icon at the top-right of the step window. While the step is in progress, its current status is updated every minute.
Once the required step is completed, the Waiting for input status is displayed at the top of the update window, requiring you to confirm the next step.
The update history is retained in the Updates tab with the completion timestamp. The update plans that were not started and can no longer be used are cleaned up automatically.