Disable a machine

TechPreview since 2.25.0 (17.0.0 and 16.0.0) for workers on managed clusters

You can use the machine disabling API to remove a worker machine from the LCM control of a managed cluster. This action isolates the affected node without impacting other machines in the cluster and removes the node from the Kubernetes cluster. It is useful when a malfunctioning machine blocks cluster updates.

Note

The Technology Preview support of the machine disabling feature also applies during cluster update to the Cluster release 17.1.0 or 16.1.0.

Precautions for machine disablement

Before disabling a cluster machine, carefully read the following precautions:

  • Container Cloud supports machine disablement of worker machines only.

    If an issue occurs on the control plane, which is updated before worker machines, fix the issue or replace the affected control machine as soon as possible to prevent issues with workloads. For reference, see Troubleshooting and Delete a cluster machine.

  • Disabling a machine can break high availability (HA) of components such as StackLight. Therefore, Mirantis recommends adding a new machine as soon as possible to provide enough nodes for component HA.

    Note

    It is expected that the cluster status contains degraded replicas of some components during or after cluster update with a disabled machine. These replicas become available as soon as you replace the disabled machine.

  • When a machine is disabled, some services may switch to the NotReady state and may require additional actions to unblock LCM tasks.

  • A disabled machine is removed from the overall cluster status and is labeled as Disabled. The requested node number for the cluster remains the same, but an additional disabled field is displayed with the number of disabled nodes.

  • A disabled machine is not taken into account for any calculations, for example, when the number of StackLight nodes is required for some restriction check.

  • Container Cloud removes the node running the disabled machine from the Kubernetes cluster.

  • Deletion of the disabled machine with the graceful deletion policy is not allowed. Use the unsafe deletion policy instead. For details, see Delete a cluster machine.

  • For a major cluster update, the Cluster release of a disabled machine must match the Cluster release of other cluster machines.

    If a machine is disabled during a major Cluster release update, the update can still complete if all other requirements are met. However, cluster update to the next available major Cluster release is blocked until you re-enable or replace the disabled machine.

    Patch updates do not have this limitation. You can update a cluster with a disabled machine to several patch Cluster releases in the scope of one major Cluster release.

  • After you enable the machine, it is updated to match the Cluster release of the corresponding cluster, including all related components.

  • For Ceph machines, you need to perform additional disablement steps.

Disable a machine using the Container Cloud web UI

  1. Carefully read the precautions for machine disablement.

  2. Power off the underlying host of a machine to be disabled.

    Warning

    If the underlying host of a machine is not powered off, the cluster may still contain the disabled machine in the list of available nodes with kubelet attempting to start the corresponding containers on the disabled machine.

    Therefore, Mirantis strongly recommends powering off the underlying host so that you do not have to manually remove the related Kubernetes node from the Docker Swarm cluster using the MKE web UI.

  3. In the Clusters tab, click the required cluster name to open the list of machines running on it.

  4. Click the More action icon in the last column of the required machine and click Disable.

  5. Wait until the machine Status switches to Disabled.

  6. If the disabled machine contains StackLight or Ceph, migrate these services to a healthy machine:

    1. Verify that the required disabled and healthy machines are not currently added to GracefulRebootRequest:

      Note

      Machine configuration changes, such as reassigning Ceph and StackLight labels from a disabled machine to a healthy one, which are described in the following steps, are not allowed during graceful reboot. For details, see Perform a graceful reboot of a cluster.

      1. Verify that the More > Reboot machines option is not disabled. If the option is active, skip the following sub-step and proceed to the next step. If the option is disabled, proceed to the following sub-step.

      2. Using the Container Cloud CLI, verify that the new machine, which you are going to use for StackLight or Ceph services migration, is not included in the list of the GracefulRebootRequest resource. Otherwise, remove GracefulRebootRequest before proceeding. For details, see Disable a machine using the Container Cloud CLI.

      Note

      Since Container Cloud 2.27.0 (Cluster releases 17.2.0 and 16.2.0), reboot of the disabled machine is automatically skipped in GracefulRebootRequest.

    2. If StackLight is deployed on the machine, unblock LCM tasks by moving the stacklight=enabled label to another healthy machine with a sufficient amount of resources and manually remove StackLight Pods along with related local persistent volumes from the disabled machine. For details, see Deschedule StackLight Pods from a worker machine.

    3. If Ceph is deployed on the machine:

      Disable a Ceph machine
      1. Select one of the following options to open the Ceph cluster spec:

        • In the CephClusters tab, click the required Ceph cluster name to open its spec.

        • Open the KaaSCephCluster object for editing:

          kubectl edit kaascephcluster -n <managedClusterProjectName> <KaaSCephClusterName>
        
      2. In spec.node, find the machine to be disabled.

      3. Back up the machine configuration.

      4. Verify the machine role:

        • For mgr, rgw, or mds, move the role to another node listed in the node section. That node must meet the resource requirements to run the corresponding daemon type and must not have the respective role assigned yet.

        • For mon, refer to Move a Ceph Monitor daemon to another node for further instructions. Mirantis recommends considering nodes with sufficient resources to run the moved monitor daemon.

        • For osd, proceed to the next step.

      5. Remove the machine from spec.
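
The "Back up the machine configuration" step in the Ceph procedure above can be done from the CLI before you edit the object. The following is a minimal sketch, not an official procedure: the function name backup_ceph_spec and the default backup file name are hypothetical, and the helper saves the whole KaaSCephCluster object rather than only the entry of one machine.

```shell
# Hypothetical helper: dump the KaaSCephCluster object to a file before
# removing a machine entry from its spec.
# Arguments: <managedClusterProjectName> <KaaSCephClusterName> [backupFile]
backup_ceph_spec() {
  local project="$1" name="$2"
  local out="${3:-${name}-backup.yaml}"   # assumed default file name

  # Save the full object; remove the file if kubectl fails so that an
  # empty file is never mistaken for a valid backup.
  if ! kubectl get kaascephcluster -n "$project" "$name" -o yaml > "$out"; then
    rm -f "$out"
    return 1
  fi
  echo "$out"
}
```

For example, backup_ceph_spec <managedClusterProjectName> <KaaSCephClusterName> prints the path of the saved file, from which the machine entry can be copied back when the machine is re-enabled.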

Enable a machine using the Container Cloud web UI

  1. In the Clusters tab, click the required cluster name to open the list of machines running on it.

  2. Click the More action icon in the last column of the required machine and click Enable.

  3. Wait until the machine Status switches to Ready.

  4. If Ceph is deployed on the machine:

    Enable a Ceph machine
    1. Select one of the following options to open the Ceph cluster spec:

      • In the CephClusters tab, click the required Ceph cluster name to open its spec.

      • Open the KaaSCephCluster object for editing:

        kubectl edit kaascephcluster -n <managedClusterProjectName> <KaaSCephClusterName>
      
    2. In spec.node, add a new or backed-up configuration of the machine to be enabled.

      If the machine must have any role besides osd, consider the following options to return a role back to the node:

Disable a machine using the Container Cloud CLI

  1. Carefully read the precautions for machine disablement.

  2. Power off the underlying host of a machine to be disabled.

    Warning

    If the underlying host of a machine is not powered off, the cluster may still contain the disabled machine in the list of available nodes with kubelet attempting to start the corresponding containers on the disabled machine.

    Therefore, Mirantis strongly recommends powering off the underlying host so that you do not have to manually remove the related Kubernetes node from the Docker Swarm cluster using the MKE web UI.

  3. Open the required Machine object for editing.

  4. In the providerSpec:value section, set disable to true:

    kubectl patch machines.cluster.k8s.io -n <projectName> <machineName> --type=merge -p '{"spec":{"providerSpec":{"value":{"disable":true}}}}'
    
  5. Wait until the machine status switches to Disabled:

    kubectl get machines.cluster.k8s.io -n <projectName> <machineName> -o jsonpath='{.status.providerStatus.status}'
    
  6. If the disabled machine contains StackLight or Ceph, migrate these services to a healthy machine:

    1. Verify that the required disabled and healthy machines are not currently added to GracefulRebootRequest:

      Note

      Machine configuration changes, such as reassigning Ceph and StackLight labels from a disabled machine to a healthy one, which are described in the following steps, are not allowed during graceful reboot. For details, see Perform a graceful reboot of a cluster.

      kubectl get gracefulrebootrequest -A
      
      kubectl -n <projectName> get gracefulrebootrequest <gracefulRebootRequestName> -o yaml
      

      If the machine is listed in the object spec section, remove the GracefulRebootRequest object:

      kubectl -n <projectName> delete gracefulrebootrequest <gracefulRebootRequestName>
      

      Note

      Since Container Cloud 2.27.0 (Cluster releases 17.2.0 and 16.2.0), reboot of the disabled machine is automatically skipped in GracefulRebootRequest.

    2. If StackLight is deployed on the machine, unblock LCM tasks by moving the stacklight=enabled label to another healthy machine with a sufficient amount of resources and manually remove StackLight Pods along with related local persistent volumes from the disabled machine. For details, see Deschedule StackLight Pods from a worker machine.

    3. If Ceph is deployed on the machine:

      Disable a Ceph machine
      1. Select one of the following options to open the Ceph cluster spec:

        • In the CephClusters tab, click the required Ceph cluster name to open its spec.

        • Open the KaaSCephCluster object for editing:

          kubectl edit kaascephcluster -n <managedClusterProjectName> <KaaSCephClusterName>
        
      2. In spec.node, find the machine to be disabled.

      3. Back up the machine configuration.

      4. Verify the machine role:

        • For mgr, rgw, or mds, move the role to another node listed in the node section. That node must meet the resource requirements to run the corresponding daemon type and must not have the respective role assigned yet.

        • For mon, refer to Move a Ceph Monitor daemon to another node for further instructions. Mirantis recommends considering nodes with sufficient resources to run the moved monitor daemon.

        • For osd, proceed to the next step.

      5. Remove the machine from spec.
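
The CLI steps above that set disable to true, wait for the Disabled status, and check GracefulRebootRequest can be wrapped in small shell helpers. This is a sketch under stated assumptions: the function names, the retry count, and the delay are hypothetical, and the GracefulRebootRequest check is a coarse text match on the object YAML because the exact spec layout is not described here. The kubectl commands themselves are the ones used in the procedure.

```shell
# Set spec.providerSpec.value.disable to true on the Machine object.
disable_machine() {
  local project="$1" machine="$2"
  kubectl patch machines.cluster.k8s.io -n "$project" "$machine" --type=merge \
    -p '{"spec":{"providerSpec":{"value":{"disable":true}}}}'
}

# Poll the machine status until it equals the wanted value.
# Arguments: <projectName> <machineName> <status> [attempts] [delaySeconds]
wait_for_machine_status() {
  local project="$1" machine="$2" want="$3"
  local attempts="${4:-60}" delay="${5:-10}"   # assumed defaults
  local status="" i=0
  while [ "$i" -lt "$attempts" ]; do
    status="$(kubectl get machines.cluster.k8s.io -n "$project" "$machine" \
      -o jsonpath='{.status.providerStatus.status}')"
    [ "$status" = "$want" ] && return 0
    sleep "$delay"
    i=$((i + 1))
  done
  echo "timed out waiting for $machine to become $want (last: $status)" >&2
  return 1
}

# Return 0 if the machine name appears anywhere in the GracefulRebootRequest
# object. Text match only; review the YAML manually if in doubt.
machine_in_reboot_request() {
  local project="$1" request="$2" machine="$3"
  kubectl -n "$project" get gracefulrebootrequest "$request" -o yaml \
    | grep -qwF -- "$machine"
}
```

A possible sequence is disable_machine <projectName> <machineName>, then wait_for_machine_status <projectName> <machineName> Disabled, and then machine_in_reboot_request for each machine involved in the StackLight or Ceph migration before changing its configuration.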

Enable a machine using the Container Cloud CLI

  1. Open the required Machine object for editing.

  2. In the providerSpec:value section, set disable to false:

    kubectl patch machines.cluster.k8s.io -n <projectName> <machineName> --type=merge -p '{"spec":{"providerSpec":{"value":{"disable":false}}}}'
    
  3. Wait until the machine status switches to Ready:

    kubectl get machines.cluster.k8s.io -n <projectName> <machineName> -o jsonpath='{.status.providerStatus.status}'
    
  4. If Ceph is deployed on the machine:

    Enable a Ceph machine
    1. Select one of the following options to open the Ceph cluster spec:

      • In the CephClusters tab, click the required Ceph cluster name to open its spec.

      • Open the KaaSCephCluster object for editing:

        kubectl edit kaascephcluster -n <managedClusterProjectName> <KaaSCephClusterName>
      
    2. In spec.node, add a new or backed-up configuration of the machine to be enabled.

      If the machine must have any role besides osd, consider the following options to return a role back to the node:
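
Separately from the Ceph steps, steps 2 and 3 of the CLI enable procedure (setting disable to false and waiting for the Ready status) can be combined in one helper. The sketch below is illustrative only: the function name, the 600-second timeout, and the 10-second polling interval are assumptions, while the patch and jsonpath commands are taken from the procedure.

```shell
# Hypothetical helper: re-enable a machine and wait until it reports Ready.
# Arguments: <projectName> <machineName> [timeoutSeconds] [intervalSeconds]
enable_machine_and_wait() {
  local project="$1" machine="$2"
  local timeout="${3:-600}" interval="${4:-10}"   # assumed defaults
  local elapsed=0 status=""

  # Step 2: set disable to false in providerSpec.value.
  kubectl patch machines.cluster.k8s.io -n "$project" "$machine" --type=merge \
    -p '{"spec":{"providerSpec":{"value":{"disable":false}}}}' || return 1

  # Step 3: poll the status until it is Ready or the timeout expires.
  while :; do
    status="$(kubectl get machines.cluster.k8s.io -n "$project" "$machine" \
      -o jsonpath='{.status.providerStatus.status}')"
    if [ "$status" = "Ready" ]; then
      return 0
    fi
    if [ "$elapsed" -ge "$timeout" ]; then
      break
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  echo "machine $machine is not Ready after ${elapsed}s (last: $status)" >&2
  return 1
}
```

Because the machine is also updated to match the Cluster release of the cluster after it is enabled, a longer timeout than the assumed default may be needed in practice.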
