ClusterUpdatePlan resource

Available since Container Cloud 2.27.0 (Cluster releases 17.2.0 and 16.2.0) as Technology Preview

This section describes the ClusterUpdatePlan custom resource (CR) used in the Container Cloud API to granularly control the update process of a managed cluster by stopping the update after each step.

The ClusterUpdatePlan CR contains the following fields:

  • apiVersion

    API version of the object, which is kaas.mirantis.com/v1alpha1.

  • kind

    Object type, which is ClusterUpdatePlan.

  • metadata

    Metadata of the ClusterUpdatePlan CR that contains the following fields:

    • name

      Name of the ClusterUpdatePlan object.

    • namespace

      Project name of the cluster that relates to ClusterUpdatePlan.

  • spec

    Specification of the ClusterUpdatePlan CR that contains the following fields:

    • source

      Source name of the Cluster release from which the cluster is updated.

    • target

      Target name of the Cluster release to which the cluster is updated.

    • cluster

      Name of the cluster for which ClusterUpdatePlan is created.

    • releaseNotes

      Available since Container Cloud 2.29.0 (Cluster releases 17.4.0 and 16.4.0). Link to MOSK release notes of the target release.

    • steps

      List of update steps, where each step contains the following fields:

      • id

        Available since Container Cloud 2.28.0 (Cluster releases 17.3.0 and 16.3.0). Step ID.

      • name

        Step name.

      • description

        Step description.

      • constraints

        Description of constraints applied during the step execution.

      • impact

        Impact of the step on the cluster functionality and workloads. Contains the following fields:

        • users

          Impact on the Container Cloud user operations. Possible values: none, major, or minor.

        • workloads

          Impact on workloads. Possible values: none, major, or minor.

        • info

          Additional details on impact, if any.

      • duration

        Details about duration of the step execution. Contains the following fields:

        • estimated

          Estimated time to complete the update step.

          Note

          Before Container Cloud 2.29.0 (Cluster releases 17.4.0 and 16.4.0), this field was named eta.

        • info

          Additional details on update duration, if any.

      • granularity

        Information on the current step granularity. Indicates whether the current step is applied to each machine individually or to the entire cluster at once. Possible values are cluster or machine.

      • commence

        Flag that allows controlling the step execution. Boolean, false by default. If set to true, the step starts execution after all previous steps are completed.

        Caution

        Cancelling an already started update step is unsupported.

  • status

    Status of the ClusterUpdatePlan CR that contains the following fields:

    • startedAt

      Time when the ClusterUpdatePlan execution started.

    • completedAt

      Available since Container Cloud 2.29.0 (Cluster releases 17.4.0 and 16.4.0). Time of update completion.

    • status

      Overall object status.

    • steps

      List of step statuses in the same order as defined in spec. Each step status contains the following fields:

      • id

        Available since Container Cloud 2.28.0 (Cluster releases 17.3.0 and 16.3.0). Step ID.

      • name

        Step name.

      • status

        Step status. Possible values are:

        • NotStarted

          Step has not started yet.

        • Scheduled

          Available since Container Cloud 2.28.0 (Cluster releases 17.3.0 and 16.3.0). Step is already triggered but its execution has not started yet.

        • InProgress

          Step is currently in progress.

        • AutoPaused

          Available since Container Cloud 2.29.0 (Cluster release 17.4.0) as Technology Preview. Update is automatically paused by the trigger from a firing alert defined in the UpdateAutoPause configuration. For details, see UpdateAutoPause resource.

        • Stuck

          Step execution has encountered an issue. This status also indicates that the step has exceeded the estimate defined in the duration field for this step in spec.

        • Completed

          Step has been completed.

      • message

        Message describing the status details of the current update step.

      • duration

        Current duration of the step execution.

      • startedAt

        Start time of the step execution.
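
The interplay between the commence flag in spec and the step statuses can be sketched in a few lines of Python. This is only an illustration of the documented rule (a step with commence set to true starts after all previous steps are completed), not the controller logic of Container Cloud. The helper name next_runnable_step and the plain dictionary standing in for the object are assumptions made for this example.

```python
from typing import Optional


def next_runnable_step(plan: dict) -> Optional[str]:
    """Return the name of the step that may start next, or None.

    Hypothetical helper: it looks at the first step that is not Completed
    and reports it only if it has not started and commence is true.
    """
    spec_steps = plan["spec"]["steps"]
    status_steps = plan.get("status", {}).get("steps", [])
    # status.steps is listed in the same order as spec.steps.
    for index, spec_step in enumerate(spec_steps):
        if index < len(status_steps):
            status = status_steps[index]["status"]
        else:
            status = "NotStarted"
        if status == "Completed":
            continue
        # First step that is not completed: all previous steps are completed.
        if status == "NotStarted" and spec_step.get("commence", False):
            return spec_step["name"]
        # Either the step waits for commence: true or it is already running.
        return None
    return None


# Trimmed-down object with only the fields the helper reads.
plan = {
    "spec": {"steps": [
        {"name": "Update OpenStack and Tungsten Fabric", "commence": True},
        {"name": "Update Ceph", "commence": True},
        {"name": "Auxiliary components update"},  # commence is false by default
    ]},
    "status": {"steps": [
        {"name": "Update OpenStack and Tungsten Fabric", "status": "Completed"},
        {"name": "Update Ceph", "status": "NotStarted"},
        {"name": "Auxiliary components update", "status": "NotStarted"},
    ]},
}
print(next_runnable_step(plan))  # prints: Update Ceph
```

Because cancelling an already started step is unsupported, a check of this kind is useful before setting commence to true rather than after.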

Example of a ClusterUpdatePlan object:

apiVersion: kaas.mirantis.com/v1alpha1
kind: ClusterUpdatePlan
metadata:
  creationTimestamp: "2025-02-06T16:53:51Z"
  generation: 11
  name: mosk-17.4.0
  namespace: child
  resourceVersion: "6072567"
  uid: 82c072be-1dc5-43dd-b8cf-bc643206d563
spec:
  cluster: mosk
  releaseNotes: https://docs.mirantis.com/mosk/latest/25.1-series.html
  source: mosk-17-3-0-24-3
  steps:
  - commence: true
    description:
    - install new version of OpenStack and Tungsten Fabric life cycle management
      modules
    - OpenStack and Tungsten Fabric container images pre-cached
    - OpenStack and Tungsten Fabric control plane components restarted in parallel
    duration:
      estimated: 1h30m0s
      info:
      - 15 minutes to cache the images and update the life cycle management modules
      - 1h to restart the components
    granularity: cluster
    id: openstack
    impact:
      info:
      - some of the running cloud operations may fail due to restart of API services
        and schedulers
      - DNS might be affected
      users: minor
      workloads: minor
    name: Update OpenStack and Tungsten Fabric
  - commence: true
    description:
    - Ceph version update
    - restart Ceph monitor, manager, object gateway (radosgw), and metadata services
    - restart OSD services node-by-node, or rack-by-rack depending on the cluster
      configuration
    duration:
      estimated: 8m30s
      info:
      - 15 minutes for the Ceph version update
      - around 40 minutes to update Ceph cluster of 30 nodes
    granularity: cluster
    id: ceph
    impact:
      info:
      - 'minor unavailability of object storage APIs: S3/Swift'
      - workloads may experience IO performance degradation for the virtual storage
        devices backed by Ceph
      users: minor
      workloads: minor
    name: Update Ceph
  - commence: true
    description:
    - new host OS kernel and packages get installed
    - host OS configuration re-applied
    - container runtime version gets bumped
    - new versions of Kubernetes components installed
    duration:
      estimated: 1h40m0s
      info:
      - about 20 minutes to update host OS per a Kubernetes controller, nodes updated
        one-by-one
      - Kubernetes components update takes about 40 minutes, all nodes in parallel
    granularity: cluster
    id: k8s-controllers
    impact:
      users: none
      workloads: none
    name: Update host OS and Kubernetes components on master nodes
  - commence: true
    description:
    - new host OS kernel and packages get installed
    - host OS configuration re-applied
    - container runtime version gets bumped
    - new versions of Kubernetes components installed
    - data plane components (Open vSwitch and Neutron L3 agents, TF agents and vrouter)
      restarted on gateway and compute nodes
    - storage nodes put to “no-out” mode to prevent rebalancing
    - by default, nodes are updated one-by-one, a node group can be configured to
      update several nodes in parallel
    duration:
      estimated: 8h0m0s
      info:
      - host OS update - up to 15 minutes per node (not including host OS configuration
        modules)
      - Kubernetes components update - up to 15 minutes per node
      - OpenStack controllers and gateways updated one-by-one
      - nodes hosting Ceph OSD, monitor, manager, metadata, object gateway (radosgw)
        services updated one-by-one
    granularity: machine
    id: k8s-workers-vdrok-child-default
    impact:
      info:
      - 'OpenStack controller nodes: some running OpenStack operations might not
        complete due to restart of components'
      - 'OpenStack compute nodes: minor loss of the East-West connectivity with
        the Open vSwitch networking back end that causes approximately 5 min of
        downtime'
      - 'OpenStack gateway nodes: minor loss of the North-South connectivity with
        the Open vSwitch networking back end: a non-distributed HA virtual router
        needs up to 1 minute to fail over; a non-distributed and non-HA virtual
        router failover time depends on many factors and may take up to 10 minutes'
      users: major
      workloads: major
    name: Update host OS and Kubernetes components on worker nodes, group vdrok-child-default
  - commence: true
    description:
    - restart of StackLight, MetalLB services
    - restart of auxiliary controllers and charts
    duration:
      estimated: 1h30m0s
    granularity: cluster
    id: mcc-components
    impact:
      info:
      - minor cloud API downtime due to restart of MetalLB components
      users: minor
      workloads: none
    name: Auxiliary components update
  target: mosk-17-4-0-25-1
status:
  completedAt: "2025-02-07T19:24:51Z"
  startedAt: "2025-02-07T17:07:02Z"
  status: Completed
  steps:
  - duration: 26m36.355605528s
    id: openstack
    message: Ready
    name: Update OpenStack and Tungsten Fabric
    startedAt: "2025-02-07T17:07:02Z"
    status: Completed
  - duration: 6m1.124356485s
    id: ceph
    message: Ready
    name: Update Ceph
    startedAt: "2025-02-07T17:33:38Z"
    status: Completed
  - duration: 24m3.151554465s
    id: k8s-controllers
    message: Ready
    name: Update host OS and Kubernetes components on master nodes
    startedAt: "2025-02-07T17:39:39Z"
    status: Completed
  - duration: 1h19m9.359184228s
    id: k8s-workers-vdrok-child-default
    message: Ready
    name: Update host OS and Kubernetes components on worker nodes, group vdrok-child-default
    startedAt: "2025-02-07T18:03:42Z"
    status: Completed
  - duration: 2m0.772243006s
    id: mcc-components
    message: Ready
    name: Auxiliary components update
    startedAt: "2025-02-07T19:22:51Z"
    status: Completed
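
The duration values in the example above can be compared programmatically, for instance to see whether a step ran longer than its estimate. The following Python sketch assumes that the values use only the h, m, and s units, as they do in this example (the strings resemble the Go duration format, but that is an assumption). The functions parse_duration and exceeds_estimate are hypothetical helpers. The sketch does not reproduce how Container Cloud decides that a step is Stuck.

```python
import re
from datetime import timedelta

# Seconds per unit; only the units that appear in this section are handled.
_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0}
_TOKEN = re.compile(r"(\d+(?:\.\d+)?)([hms])")


def parse_duration(text: str) -> timedelta:
    """Parse strings such as '1h30m0s' or '26m36.355605528s'."""
    seconds = 0.0
    position = 0
    for match in _TOKEN.finditer(text):
        # Tokens must follow each other without gaps or unknown characters.
        if match.start() != position:
            raise ValueError(f"unsupported duration: {text!r}")
        seconds += float(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        position = match.end()
    if position == 0 or position != len(text):
        raise ValueError(f"unsupported duration: {text!r}")
    return timedelta(seconds=seconds)


def exceeds_estimate(estimated: str, actual: str) -> bool:
    """Return True if the actual duration is longer than the estimate."""
    return parse_duration(actual) > parse_duration(estimated)


# (step id, spec duration.estimated, status duration) from the example above.
steps = [
    ("openstack", "1h30m0s", "26m36.355605528s"),
    ("ceph", "8m30s", "6m1.124356485s"),
    ("k8s-controllers", "1h40m0s", "24m3.151554465s"),
    ("k8s-workers-vdrok-child-default", "8h0m0s", "1h19m9.359184228s"),
    ("mcc-components", "1h30m0s", "2m0.772243006s"),
]
over = [step_id for step_id, estimated, actual in steps
        if exceeds_estimate(estimated, actual)]
print(over)  # prints: []
```

In this example, every step finished within its estimate, so the resulting list is empty.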