Configure update auto-pause

Available since MOSK 25.1 as Technology Preview

Using the UpdateAutoPause object, the operator can define specific StackLight alerts that trigger auto-pause of an update phase execution in a MOSK cluster. The feature enhances update management of MOSK clusters by preventing harmful changes from propagating across the entire cloud.

Note

The feature is not available for management clusters.

When an update auto-pause is configured on a cluster, the following workflow applies:

  • During cluster updates, the system continuously monitors for the alerts defined in the UpdateAutoPause object

  • If any configured alert fires:

    • The update process automatically pauses

    • The commence field is removed from all steps that have not started

    • The commence field is also removed from the steps related to Update host OS and Kubernetes components on worker nodes, even if such a step is already in progress, and that step is paused

    • The ClusterUpdatePlan status changes to AutoPaused

    • The firing alerts are recorded in the UpdateAutoPause status

    • A condition is added to the Cluster object indicating the pause state

Configure auto-pausing of a MOSK cluster update

  1. Verify that StackLight is enabled on the MOSK cluster.

  2. In the cluster namespace, create an UpdateAutoPause object whose name matches the cluster name. For example:

    apiVersion: kaas.mirantis.com/v1alpha1
    kind: UpdateAutoPause
    metadata:
      name: managed-cluster-example    # Must match cluster name
      namespace: managed-cluster-ns   # Must match cluster namespace
    spec:
      alerts:
        - AlertName1
        - AlertName2
    

    The list of alerts can include standard and custom StackLight alerts previously configured for the cluster.

    For the object spec, see Container Cloud API Reference: UpdateAutoPause resource.

  3. Apply the configuration:

    kubectl apply -f update-autopause.yaml
    

Resume paused updates

  1. Select one of the following options:

    • Investigate and resolve the conditions that triggered the alerts, then wait for the alerts to clear automatically

    • Remove the problematic alert from the UpdateAutoPause configuration

  2. To resume the update, set the commence field to true for the relevant ClusterUpdatePlan steps.

Caution

The Admission Controller blocks attempts to set commence: true while any alert defined in the UpdateAutoPause object is still firing.
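
As a rough illustration of resuming a step, the excerpt below sketches how a step might look after you edit the ClusterUpdatePlan object in place, for example, with kubectl edit. The steps list under spec, the id key, and the <step-id> placeholder are assumptions made for illustration. Verify the actual layout against the ClusterUpdatePlan object in your cluster before changing it.

```yaml
# Illustrative excerpt only, not a complete manifest.
# Assumption: steps are listed under spec.steps and identified by an id key.
spec:
  steps:
    - id: <step-id>     # Hypothetical placeholder for the paused step
      commence: true    # Set only after the configured alerts stop firing
```

Edit the existing object rather than applying this excerpt as a separate file, because the excerpt omits the rest of the plan.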

Monitor the status of an update auto-pause

You can monitor the status of an update auto-pause using the following resources:

  • The UpdateAutoPause object status:

    kubectl get updateautopause <cluster-name> -n <namespace> -o yaml
    
  • The ClusterUpdatePlan object status, which displays the following details:

    • The AutoPaused status when updates are paused.

    • Messages indicating which alerts caused the pause and other relevant information.

  • StackLight alerts:

    • ClusterUpdateAutoPaused, which indicates that an update is currently paused.

    • ClusterUpdateStepAutoPaused, which describes specific steps that are paused.

    For alert details, see Container Cloud.