Perform a graceful reboot of a cluster

You can perform a graceful reboot on a management or MOSK cluster. Use the following procedure to cordon, drain, and reboot the required cluster machines one after another as a rolling reboot, without interrupting workloads. The procedure is also useful for a bulk reboot of machines, for example, on large clusters.

The reboot follows the order defined by the cluster upgrade policy, which you can change for MOSK clusters as described in Change the upgrade order of a machine.

Caution

The cluster and machines must have the Ready status to perform a graceful reboot.
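
As a CLI pre-check for the requirement above, the following sketch lists machines whose READY column is not true. It assumes the column layout of `kubectl get machines -o wide` shown in the example system response later in this document. The helper name `not_ready_machines` is hypothetical and not part of the product.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `kubectl get machines -o wide` output on stdin
# and prints the names of machines whose READY column is not "true".
not_ready_machines() {
  awk '
    NR == 1 {
      # Locate the NAME and READY columns by their header text.
      for (i = 1; i <= NF; i++) {
        if ($i == "NAME")  name = i
        if ($i == "READY") ready = i
      }
      next
    }
    # Print nothing at all if the expected headers were not found.
    name && ready && $ready != "true" { print $name }
  '
}

# Usage against a live cluster (placeholder project name):
#   kubectl get machines -n <projectName> -o wide | not_ready_machines
```

An empty result suggests that all listed machines report READY as true. The cluster status still needs to be verified separately.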

Perform a rolling reboot of a cluster using web UI

  1. Log in to the MOSK management console with the m:kaas:namespace@operator or m:kaas:namespace@writer permissions.

  2. Switch to the required project using the Switch Project action icon located on top of the main left-side navigation panel.

  3. On the Clusters page, verify that the status of the required cluster is Ready. Otherwise, the Reboot machines option is disabled.

  4. Click the More action icon in the last column of the required cluster and select Reboot machines. Confirm the selection.

    Note

    While a graceful reboot is in progress, the Reboot machines option is disabled.

    To monitor the cluster readiness, see Verify cluster status.

Caution

Machine configuration changes are forbidden during a graceful reboot. Therefore, either wait until the reboot completes or cancel it using the CLI, as described in the following section.

Perform a rolling reboot of a cluster using CLI

  1. Create a GracefulRebootRequest resource with a name that matches the name of the required cluster. For a description of the resource fields, see GracefulRebootRequest resource.

  2. In spec.machines, list the machines to reboot, or leave the list empty to reboot all cluster machines.

    When rebooting both Kubernetes control plane machines and OpenStack control plane machines, either by specifying machines of both types in spec.machines or by leaving the list empty, the two node groups may be rebooted in parallel, which can cause issues with workload management. To avoid this, perform the reboot in two sequential steps as described below.

    Perform the reboot in two sequential steps
    1. Create a GracefulRebootRequest object that lists only the Kubernetes control plane machines in spec.machines.

      To obtain the names of Kubernetes control plane machines, run:

      kubectl get machines -n <projectName> \
        -l cluster.sigs.k8s.io/control-plane
      

      Example GracefulRebootRequest for Kubernetes control plane nodes:

      apiVersion: kaas.mirantis.com/v1alpha1
      kind: GracefulRebootRequest
      metadata:
        name: <clusterName>
        namespace: <projectName>
      spec:
        machines:
        - <k8sControlPlaneMachine1>
        - <k8sControlPlaneMachine2>
        - <k8sControlPlaneMachine3>
      
    2. Once the GracefulRebootRequest completes and is automatically deleted, create another GracefulRebootRequest for the remaining cluster nodes:

      apiVersion: kaas.mirantis.com/v1alpha1
      kind: GracefulRebootRequest
      metadata:
        name: <clusterName>
        namespace: <projectName>
      spec:
        machines:
        - <remainingMachine1>
        - <remainingMachine2>
        - <remainingMachine3>
      
  3. Wait until all specified machines are rebooted. You can monitor the reboot status of the cluster and machines using the Conditions:GracefulReboot fields of the corresponding Cluster and Machine objects.

    The GracefulRebootRequest object is automatically deleted once the reboot on all target machines completes.

    To monitor the live machine status:

    kubectl get machines -n <projectName> <machineName> -o wide
    

    Example of system response:

    NAME    READY  LCMPHASE  NODENAME            UPGRADEINDEX  REBOOTREQUIRED  WARNINGS
    demo-0  true   Ready     kaas-node-c6aa8ad3  1             true
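
Because the GracefulRebootRequest object is deleted automatically on completion, waiting for the reboot can be scripted as polling until the object disappears. The following is a minimal sketch, not a product tool. The function name `wait_for_reboot_request` and the `POLL_SECONDS` and `MAX_POLLS` variables are hypothetical. The `--ignore-not-found` and `-o name` options are standard `kubectl get` flags.

```shell
#!/usr/bin/env bash
# Hypothetical helper: polls until the GracefulRebootRequest named after the
# cluster no longer exists in the given project.
# Returns 0 when the object is gone, 1 on timeout, 2 if kubectl itself fails.
wait_for_reboot_request() {
  local project="$1" cluster="$2"
  # Illustrative defaults (assumptions): poll every 30 s, at most 240 times.
  local interval="${POLL_SECONDS:-30}" max="${MAX_POLLS:-240}"
  local i=0 found
  while [ "$i" -lt "$max" ]; do
    # With --ignore-not-found, a missing object yields exit code 0 and empty
    # output, so a non-zero exit code means the API call itself failed.
    found="$(kubectl -n "$project" get gracefulrebootrequest "$cluster" \
      --ignore-not-found -o name)" || return 2
    [ -z "$found" ] && return 0
    i=$((i + 1))
    sleep "$interval"
  done
  echo "GracefulRebootRequest ${cluster} still exists after ${max} polls" >&2
  return 1
}

# Usage (placeholders):
#   wait_for_reboot_request <projectName> <clusterName>
```

In the two-step procedure, such a helper could gate the creation of the second request. Note that it also returns 0 if the request was never created, so run it only after the request has been submitted.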
    

Caution

Machine configuration changes are forbidden during graceful reboot.

In emergency cases, for example, to migrate StackLight or Ceph services from a disabled machine that fails during graceful reboot and blocks the process, cancel the reboot by deleting the GracefulRebootRequest object:

kubectl -n <projectName> delete gracefulrebootrequest <gracefulRebootRequestName>

Once you migrate StackLight or Ceph services to another machine and disable the failed machine, re-create the GracefulRebootRequest object for the remaining machines that require reboot.
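
When re-creating the request for a subset of machines, the manifest can be generated from a list of machine names instead of being edited by hand. The sketch below mirrors the manifest structure used in the examples in this document. The function name `render_reboot_request` is hypothetical. As a safety choice of this sketch, it refuses to render an empty machine list, because an empty list reboots all cluster machines.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints a GracefulRebootRequest manifest for the given
# project, cluster, and one or more machine names.
render_reboot_request() {
  if [ "$#" -lt 3 ]; then
    echo "usage: render_reboot_request <projectName> <clusterName> <machine>..." >&2
    return 1
  fi
  local project="$1" cluster="$2"
  shift 2
  # The request name must match the cluster name, as the procedure requires.
  cat <<EOF
apiVersion: kaas.mirantis.com/v1alpha1
kind: GracefulRebootRequest
metadata:
  name: ${cluster}
  namespace: ${project}
spec:
  machines:
EOF
  # Emit one list item per remaining machine.
  for machine in "$@"; do
    printf '  - %s\n' "$machine"
  done
}

# Usage (placeholders); review the output before applying it:
#   render_reboot_request <projectName> <clusterName> <remainingMachine1> <remainingMachine2> \
#     | kubectl apply -f -
```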

Note

To reboot a single node, for example, for maintenance purposes, refer to Enable cluster and machine maintenance mode.