ClusterUpdatePlan resource

Available since Container Cloud 2.27.0 (Cluster releases 17.2.0 and 16.2.0) TechPreview

This section describes the ClusterUpdatePlan custom resource (CR) used in the Container Cloud API for all supported providers. Use this resource to granularly control the update process of a managed cluster by stopping the update after each step.

The ClusterUpdatePlan CR contains the following fields:

  • apiVersion

    API version of the object, which is kaas.mirantis.com/v1alpha1.

  • kind

    Object type, which is ClusterUpdatePlan.

  • metadata

    Metadata of the ClusterUpdatePlan CR that contains the following fields:

    • name

      Name of the ClusterUpdatePlan object.

    • namespace

      Name of the project that contains the cluster related to ClusterUpdatePlan.

  • spec

    Specification of the ClusterUpdatePlan CR that contains the following fields:

    • source

      Name of the source Cluster release from which the cluster is updated.

    • target

      Name of the target Cluster release to which the cluster is updated.

    • cluster

      Name of the cluster for which ClusterUpdatePlan is created.

    • steps

      List of update steps, where each step contains the following fields:

      • id

        Available since Container Cloud 2.28.0 (Cluster releases 17.3.0 and 16.3.0). Step ID.

      • name

        Step name.

      • description

        Step description.

      • constraints

        Description of constraints applied during the step execution.

      • impact

        Impact of the step on the cluster functionality and workloads. Contains the following fields:

        • users

          Impact on the Container Cloud user operations. Possible values: none, major, or minor.

        • workloads

          Impact on workloads. Possible values: none, major, or minor.

        • info

          Additional details on impact, if any.

      • duration

        Details about the duration of the step execution. Contains the following fields:

        • eta

          Estimated time to complete the update step.

        • info

          Additional details on update duration, if any.

      • granularity

        Information on the current step granularity. Indicates whether the current step is applied to each machine individually or to the entire cluster at once. Possible values are cluster or machine.

      • commence

        Flag that allows controlling the step execution. Boolean, false by default. If set to true, the step starts execution after all previous steps are completed.

        Caution

        Cancelling an already started update step is unsupported.

  • status

    Status of the ClusterUpdatePlan CR that contains the following fields:

    • startedAt

      Time when ClusterUpdatePlan started.

    • status

      Overall object status.

    • steps

      List of step statuses in the same order as defined in spec. Each step status contains the following fields:

      • id

        Available since Container Cloud 2.28.0 (Cluster releases 17.3.0 and 16.3.0). Step ID.

      • name

        Step name.

      • status

        Step status. Possible values are:

        • NotStarted

          Step has not started yet.

        • InProgress

          Step is currently in progress.

        • Completed

          Step has been completed.

        • Stuck

          Step execution has encountered an issue. This status also indicates that the step has exceeded the ETA defined in the duration field for this step in spec.

        • Scheduled

          Available since Container Cloud 2.28.0 (Cluster releases 17.3.0 and 16.3.0). Step is already triggered but its execution has not started yet.

      • message

        Message describing status details of the current update step.

      • duration

        Current duration of the step execution.

      • startedAt

        Start time of the step execution.
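
The commence flag described above gates each step, so a client that reads the plan can work out which step is still waiting for approval. The following Python snippet is a minimal sketch of that lookup, not part of the Container Cloud API. It assumes the plan has already been loaded into a dictionary, for example from YAML or JSON, and the function name next_step_to_commence is hypothetical.

```python
from typing import Optional


def next_step_to_commence(plan: dict) -> Optional[str]:
    """Return the name of the first step in spec.steps that is not approved yet.

    A step counts as approved when its commence flag is true. A missing flag
    is treated as false, which matches the documented default.
    """
    for step in plan.get("spec", {}).get("steps", []):
        if not step.get("commence", False):
            return step.get("name")
    return None  # every step already has commence: true


# Hypothetical plan trimmed to the two fields used by this sketch.
plan = {
    "spec": {
        "steps": [
            {"name": "Update OpenStack control plane on a MOSK cluster", "commence": True},
            {"name": "Update Ceph cluster on a MOSK cluster"},  # flag omitted
            {"name": "Update host OS and Kubernetes components on master nodes", "commence": False},
        ]
    }
}

print(next_step_to_commence(plan))
```

Because a step starts only after all previous steps are completed, the first step without commence: true is the point where the update pauses.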

Example of a ClusterUpdatePlan object:

apiVersion: kaas.mirantis.com/v1alpha1
kind: ClusterUpdatePlan
metadata:
  creationTimestamp: "2024-05-20T14:03:47Z"
  generation: 3
  name: demo-managed-67835-17.3.0
  namespace: managed-namespace
  resourceVersion: "534402"
  uid: 2eab536b-55aa-4870-b732-67ebf0a8a5bb
spec:
  cluster: demo-managed-67835
  source: mosk-17-2-0-24-2
  steps:
  - commence: true
    constraints:
    - until the step is complete, it won't be possible to perform normal LCM operations
      on the cluster
    description:
    - install new version of life cycle management modules
    - restart OpenStack control plane components in parallel
    duration:
      eta: 2h0m0s
      info:
      - 15 minutes to update one OpenStack controller node
      - 5 minutes to update one compute node
    granularity: cluster
    impact:
      info:
      - 'up to 8% unavailability of APIs: OpenStack'
      users: minor
      workloads: none
    id: openstack
    name: Update OpenStack control plane on a MOSK cluster
  - commence: true
    description:
    - major Ceph version upgrade
    - update monitors, managers, RGW/MDS
    - OSDs are restarted sequentially, or by rack
    - takes into account the failure domain config in cluster (rack updated in parallel)
    duration:
      eta: 40m0s
      info:
      - up to 40 minutes to update Ceph cluster (30 nodes)
    granularity: cluster
    impact:
      info:
      - 'up to 8% unavailability of APIs: S3/Swift'
      users: none
      workloads: none
    id: ceph
    name: Update Ceph cluster on a MOSK cluster
  - commence: true
    description:
    - new host OS kernel and packages get installed
    - host OS configuration re-applied
    - new versions of Kubernetes components installed
    duration:
      eta: 45m0s
      info:
      - 15 minutes per Kubernetes master node, nodes updated sequentially
    granularity: cluster
    impact:
      users: none
      workloads: none
    id: k8s-controllers
    name: Update host OS and Kubernetes components on master nodes
  - commence: true
    description:
    - new host OS kernel and packages get installed
    - host OS configuration re-applied
    - new versions of Kubernetes components installed
    - containerd and MCR get bumped
    - Open vSwitch and Neutron L3 agents get restarted on gateway and compute nodes
    duration:
      eta: 12h0m0s
      info:
      - 'depends on the type of the nodes: controller, compute, OSD'
    granularity: machine
    impact:
      info:
      - some OpenStack running operations might not complete due to restart of docker/containerd
        on controller nodes (up to 30%, assuming seq. controller update)
      - OpenStack LCM will prevent OpenStack controllers and gateways from parallel
        cordon / drain, despite node-group config
      - Ceph LCM will prevent parallel restart of OSDs, monitors and managers, despite
        node-group config
      - minor loss of the East-West connectivity with the Open vSwitch networking
        back end that causes approximately 5 min of downtime per compute node
      - 'minor loss of the North-South connectivity with the Open vSwitch networking
        back end: a non-distributed HA virtual router needs up to 1 minute to fail
        over; a non-distributed and non-HA virtual router failover time depends
        on many factors and may take up to 10 minutes'
      users: minor
      workloads: major
    id: k8s-workers-demo-managed-67835-default
    name: Update host OS and Kubernetes components on worker nodes, group default
  - commence: true
    description:
    - restart of StackLight, MetalLB services
    - restart of auxiliary controllers and charts
    duration:
      eta: 30m0s
      info:
      - 30 minutes minimum
    granularity: cluster
    impact:
      info:
      - minor cloud API downtime due to restart of MetalLB components
      users: minor
      workloads: none
    id: mcc-components
    name: Auxiliary components update
  target: mosk-17-3-0-24-3
status:
  startedAt: "2024-05-20T14:05:23Z"
  status: Completed
  steps:
  - duration: 29m16.887573286s
    message: Ready
    id: openstack
    name: Update OpenStack control plane
    startedAt: "2024-05-20T14:05:23Z"
    status: Completed
  - duration: 8m1.808804491s
    message: Ready
    id: ceph
    name: Update Ceph cluster
    startedAt: "2024-05-20T14:34:39Z"
    status: Completed
  - duration: 33m5.100480887s
    message: Ready
    id: k8s-controllers
    name: Update host OS and Kubernetes components on master nodes
    startedAt: "2024-05-20T14:42:40Z"
    status: Completed
  - duration: 1h39m9.896875724s
    message: Ready
    id: k8s-workers-demo-managed-67835-default
    name: Update host OS and Kubernetes components on worker nodes, group default
    startedAt: "2024-05-20T15:34:46Z"
    status: Completed
  - duration: 2m1.426000849s
    message: Ready
    id: mcc-components
    name: Auxiliary components update
    startedAt: "2024-05-20T17:13:55Z"
    status: Completed
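
The object above records an ETA for each step in spec and the actual duration in status, so the two values can be compared. The following Python snippet is a minimal sketch of such a comparison. It only illustrates the relationship between the fields and does not show how Container Cloud itself decides that a step is Stuck. It assumes that duration strings use only the h, m, and s units seen in the example, which appear to follow Go-style duration notation. The helper names parse_duration and exceeds_eta are hypothetical.

```python
import re

# Seconds per unit. Only the units that appear in the example object are handled.
_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0}
_PART = re.compile(r"(\d+(?:\.\d+)?)([hms])")
_WHOLE = re.compile(r"(?:\d+(?:\.\d+)?[hms])+")


def parse_duration(text: str) -> float:
    """Convert a string such as '1h39m9.896875724s' to seconds."""
    if not _WHOLE.fullmatch(text):
        raise ValueError(f"unsupported duration: {text!r}")
    return sum(float(value) * _UNIT_SECONDS[unit] for value, unit in _PART.findall(text))


def exceeds_eta(eta: str, actual: str) -> bool:
    """Return True if the actual duration is longer than the ETA."""
    return parse_duration(actual) > parse_duration(eta)


# Step ID mapped to (eta from spec, duration from status), copied from the example object.
steps = {
    "openstack": ("2h0m0s", "29m16.887573286s"),
    "ceph": ("40m0s", "8m1.808804491s"),
    "k8s-controllers": ("45m0s", "33m5.100480887s"),
    "k8s-workers-demo-managed-67835-default": ("12h0m0s", "1h39m9.896875724s"),
    "mcc-components": ("30m0s", "2m1.426000849s"),
}

for step_id, (eta, actual) in steps.items():
    print(step_id, "exceeds ETA:", exceeds_eta(eta, actual))
```

Every step in the example finished within its ETA, which is consistent with all steps reporting the Completed status.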