ClusterUpdatePlan resource
Available since Container Cloud 2.27.0 (Cluster releases 17.2.0 and 16.2.0) as Technology Preview
This section describes the ClusterUpdatePlan custom resource (CR) used in
the Container Cloud API for all supported providers. Use this resource to
granularly control the update process of a managed cluster by stopping the
update after each step.
The ClusterUpdatePlan CR contains the following fields:
apiVersion
API version of the object, which is kaas.mirantis.com/v1alpha1.
kind
Object type, which is ClusterUpdatePlan.
metadata
Metadata of the ClusterUpdatePlan CR that contains the following fields:
name
Name of the ClusterUpdatePlan object.
namespace
Project name of the cluster to which ClusterUpdatePlan relates.
spec
Specification of the ClusterUpdatePlan CR that contains the following fields:
source
Source name of the Cluster release from which the cluster is updated.
target
Target name of the Cluster release to which the cluster is updated.
cluster
Name of the cluster for which ClusterUpdatePlan is created.
steps
List of update steps, where each step contains the following fields:
id
Available since Container Cloud 2.28.0 (Cluster releases 17.3.0 and 16.3.0). Step ID.
name
Step name.
description
Step description.
constraints
Description of constraints applied during the step execution.
impact
Impact of the step on the cluster functionality and workloads. Contains the following fields:
users
Impact on the Container Cloud user operations. Possible values: none, major, or minor.
workloads
Impact on workloads. Possible values: none, major, or minor.
info
Additional details on impact, if any.
duration
Details about duration of the step execution. Contains the following fields:
eta
Estimated time to complete the update step.
info
Additional details on update duration, if any.
granularity
Information on the current step granularity. Indicates whether the current step is applied to each machine individually or to the entire cluster at once. Possible values are cluster or machine.
commence
Flag that allows controlling the step execution. Boolean, false by default. If set to true, the step starts execution after all previous steps are completed.
Caution
Cancelling an already started update step is unsupported.
status
Status of the ClusterUpdatePlan CR that contains the following fields:
startedAt
Time when ClusterUpdatePlan has started.
status
Overall object status.
steps
List of step statuses in the same order as defined in spec. Each step status contains the following fields:
id
Available since Container Cloud 2.28.0 (Cluster releases 17.3.0 and 16.3.0). Step ID.
name
Step name.
status
Step status. Possible values are:
NotStarted
Step has not started yet.
InProgress
Step is currently in progress.
Completed
Step has been completed.
Stuck
Step execution has encountered an issue, which also indicates that the step does not fit into the ETA defined in the duration field for this step in spec.
Scheduled
Available since Container Cloud 2.28.0 (Cluster releases 17.3.0 and 16.3.0). Step is already triggered but its execution has not started yet.
message
Message describing status details of the current update step.
duration
Current duration of the step execution.
startedAt
Start time of the step execution.
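To illustrate how the commence flag gates execution, the following fragment is a minimal sketch of a spec.steps list in which only the first step is released to run. The step IDs are borrowed from the example object in this section purely for illustration, and all other step fields are omitted for brevity. A real plan contains the steps generated for your cluster.

```yaml
# Illustrative fragment only: other step fields (name, description, and so on) are omitted.
spec:
  steps:
  - id: openstack
    commence: true    # released to run; no earlier step is pending
  - id: ceph
    commence: false   # default value; the update stops before this step
```

Under the behavior described for commence, setting the flag to true on the ceph step later would let it start once the openstack step completes. Because cancelling an already started step is unsupported, set the flag only when you are ready for the step to run.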
Example of a ClusterUpdatePlan object:
apiVersion: kaas.mirantis.com/v1alpha1
kind: ClusterUpdatePlan
metadata:
  creationTimestamp: "2024-05-20T14:03:47Z"
  generation: 3
  name: demo-managed-67835-17.3.0
  namespace: managed-namespace
  resourceVersion: "534402"
  uid: 2eab536b-55aa-4870-b732-67ebf0a8a5bb
spec:
  cluster: demo-managed-67835
  source: mosk-17-2-0-24-2
  steps:
  - commence: true
    constraints:
    - until the step is complete, it won't be possible to perform normal LCM operations
      on the cluster
    description:
    - install new version of life cycle management modules
    - restart OpenStack control plane components in parallel
    duration:
      eta: 2h0m0s
      info:
      - 15 minutes to update one OpenStack controller node
      - 5 minutes to update one compute node
    granularity: cluster
    impact:
      info:
      - 'up to 8% unavailability of APIs: OpenStack'
      users: minor
      workloads: none
    id: openstack
    name: Update OpenStack control plane on a MOSK cluster
  - commence: true
    description:
    - major Ceph version upgrade
    - update monitors, managers, RGW/MDS
    - OSDs are restarted sequentially, or by rack
    - takes into account the failure domain config in cluster (rack updated in parallel)
    duration:
      eta: 40m0s
      info:
      - up to 40 minutes to update Ceph cluster (30 nodes)
    granularity: cluster
    impact:
      info:
      - 'up to 8% unavailability of APIs: S3/Swift'
      users: none
      workloads: none
    id: ceph
    name: Update Ceph cluster on a MOSK cluster
  - commence: true
    description:
    - new host OS kernel and packages get installed
    - host OS configuration re-applied
    - new versions of Kubernetes components installed
    duration:
      eta: 45m0s
      info:
      - 15 minutes per Kubernetes master node, nodes updated sequentially
    granularity: cluster
    impact:
      users: none
      workloads: none
    id: k8s-controllers
    name: Update host OS and Kubernetes components on master nodes
  - commence: true
    description:
    - new host OS kernel and packages get installed
    - host OS configuration re-applied
    - new versions of Kubernetes components installed
    - containerd and MCR get bumped
    - Open vSwitch and Neutron L3 agents gets restarted on gateway and compute nodes
    duration:
      eta: 12h0m0s
      info:
      - 'depends on the type of the nodes: controller, compute, OSD'
    granularity: machine
    impact:
      info:
      - some OpenStack running operations might not complete due to restart of docker/containerd
        on controller nodes (up to 30%, assuming seq. controller update)
      - OpenStack LCM will prevent OpenStack controllers and gateways from parallel
        cordon / drain, despite node-group config
      - Ceph LCM will prevent parallel restart of OSDs, monitors and managers, despite
        node-group config
      - minor loss of the East-West connectivity with the Open vSwitch networking
        back end that causes approximately 5 min of downtime per compute node
      - 'minor loss of the North-South connectivity with the Open vSwitch networking
        back end: a non-distributed HA virtual router needs up to 1 minute to fail
        over; a non-distributed and non-HA virtual router failover time depends
        on many factors and may take up to 10 minutes'
      users: minor
      workloads: major
    id: k8s-workers-demo-managed-67835-default
    name: Update host OS and Kubernetes components on worker nodes, group default
  - commence: true
    description:
    - restart of StackLight, MetalLB services
    - restart of auxilary controllers and charts
    duration:
      eta: 30m0s
      info:
      - 30 minutes minimum
    granularity: cluster
    impact:
      info:
      - minor cloud API downtime due restart of MetalLB components
      users: minor
      workloads: none
    id: mcc-components
    name: Auxilary components update
  target: mosk-17-3-0-24-3
status:
  startedAt: "2024-05-20T14:05:23Z"
  status: Completed
  steps:
  - duration: 29m16.887573286s
    message: Ready
    id: openstack
    name: Update OpenStack control plane
    startedAt: "2024-05-20T14:05:23Z"
    status: Completed
  - duration: 8m1.808804491s
    message: Ready
    id: ceph
    name: Update Ceph cluster
    startedAt: "2024-05-20T14:34:39Z"
    status: Completed
  - duration: 33m5.100480887s
    message: Ready
    id: k8s-controllers
    name: Update host OS and Kubernetes components on master nodes
    startedAt: "2024-05-20T14:42:40Z"
    status: Completed
  - duration: 1h39m9.896875724s
    message: Ready
    id: k8s-workers-demo-managed-67835-default
    name: Update host OS and Kubernetes components on worker nodes, group default
    startedAt: "2024-05-20T15:34:46Z"
    status: Completed
  - duration: 2m1.426000849s
    message: Ready
    id: mcc-components
    name: Auxilary components update
    startedAt: "2024-05-20T17:13:55Z"
    status: Completed
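The object above shows a finished update. For contrast, the following fragment is a hypothetical sketch of how the status section might look partway through an update, assuming the second step was set to commence while the first step was still running. The duration value is invented for illustration. The message and overall status fields are omitted because their values for an unfinished plan are not documented in this section.

```yaml
# Hypothetical mid-update snapshot; values are illustrative, not captured from a real cluster.
status:
  startedAt: "2024-05-20T14:05:23Z"
  steps:
  - id: openstack
    name: Update OpenStack control plane
    startedAt: "2024-05-20T14:05:23Z"
    duration: 12m3s        # current duration of the running step
    status: InProgress
  - id: ceph
    name: Update Ceph cluster
    status: Scheduled      # already triggered, execution has not started yet
  - id: k8s-controllers
    name: Update host OS and Kubernetes components on master nodes
    status: NotStarted     # step has not started yet
```

Reading the step statuses in order gives the position of the update: the first step that is not Completed is the one currently running or awaiting execution.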