Delete a cluster machine using CLI

Available since 2.21.0 for non-MOSK clusters as TechPreview

This section instructs you on how to scale down an existing management, regional, or managed cluster through the Container Cloud API. To delete a machine using the Container Cloud web UI, see Delete a cluster machine using web UI.

Using the Container Cloud API, you can delete a cluster machine using the following methods:

  • Recommended. Set the delete field to true in the providerSpec section of the required Machine object. This method allows you to abort a graceful machine deletion before the node is removed from Docker Swarm.

  • Not recommended. Apply the delete request to the Machine object.

You can control machine deletion steps by following a specific machine deletion policy.

Overview of machine deletion policies

The deletion policy of the Machine resource in the Container Cloud API defines the specific steps that occur before a machine is deleted.

The Container Cloud API supports the following deletion policies: graceful, unsafe, and forced.

By default, the unsafe deletion policy is used. In future Container Cloud releases, the default policy will be changed to the graceful one.

You can change the deletion policy at any time. If the deletion process has already started, you can reduce the deletion policy restrictions in the following order: graceful > unsafe > forced.
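
For example, to select the policy for a specific machine before starting its deletion, patch the Machine object. The exact parameter name may differ between releases; the following sketch assumes that the policy is exposed as the deletionPolicy field in the providerSpec.value section, next to the delete flag used later in this procedure:

  kubectl patch machines.cluster.k8s.io -n <projectName> <machineName> --type=merge -p '{"spec":{"providerSpec":{"value":{"deletionPolicy":"graceful"}}}}'

Replace the parameters enclosed in angle brackets with the corresponding values.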

Graceful machine deletion

During a graceful machine deletion, the cloud provider and LCM controllers perform the following steps:

  1. Cordon and drain the node being deleted.

  2. Remove the node from Docker Swarm.

  3. Send the delete request to the corresponding Machine resource.

  4. Remove the provider resources such as the VM instance, network, volume, and so on. Remove the related Kubernetes resources.

  5. Remove the finalizer from the Machine resource. This step completes the machine deletion from Kubernetes resources.
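
While these steps run, you can observe the node being cordoned and drained using standard kubectl commands against the cluster that the machine belongs to. This is a generic Kubernetes check rather than a Container Cloud-specific API:

  kubectl get nodes
  # The node being deleted reports Ready,SchedulingDisabled while it is cordoned.

  kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=<nodeName>
  # Lists the pods that still run on the node during the drain.

Replace <nodeName> with the name of the node that corresponds to the machine being deleted.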

Caution

You can abort a graceful machine deletion only before the corresponding node is removed from Docker Swarm.
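
If you decide to abort a graceful deletion at this stage, and assuming that the abort is triggered by reverting the same delete flag that starts the deletion (verify this assumption against your Container Cloud release), the patch looks as follows:

  kubectl patch machines.cluster.k8s.io -n <projectName> <machineName> --type=merge -p '{"spec":{"providerSpec":{"value":{"delete":false}}}}'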

During a graceful machine deletion, the Machine object status displays the prepareDeletionPhase field with the following possible values (see the status check example after the list):

  • started

    The cloud provider controller prepares the machine for deletion by cordoning and draining the node, and so on.

  • completed

    LCM Controller starts removing the machine resources since the preparation for deletion is complete.

  • aborting

    The cloud provider controller attempts to uncordon the node. If the attempt fails, the status changes to failed.

  • failed

    An error occurs in the deletion workflow.
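
To check the current prepareDeletionPhase value without assuming a particular layout of the Machine status, dump the object and filter for the field name:

  kubectl get machines.cluster.k8s.io -n <projectName> <machineName> -o yaml | grep prepareDeletionPhase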

Unsafe machine deletion

During an unsafe machine deletion, the cloud provider and LCM controllers perform the following steps:

  1. Send the delete request to the corresponding Machine resource.

  2. Remove the provider resources such as the VM instance, network, volume, and so on. Remove the related Kubernetes resources.

  3. Remove the finalizer from the Machine resource. This step completes the machine deletion from Kubernetes resources.

Forced machine deletion

During a forced machine deletion, the cloud provider and LCM controllers perform the following steps:

  1. Send the delete request to the corresponding Machine resource.

  2. Remove the provider resources such as the VM instance, network, volume, and so on. Remove the related Kubernetes resources.

  3. Remove the finalizer from the Machine resource. This step completes the machine deletion from Kubernetes resources.

This policy type allows deleting a Machine resource even if the cloud provider or LCM controller gets stuck at some step. However, this policy may require a manual cleanup of machine resources if the controller fails.

Warning

If forced deletion fails at any step, the LCM Controller removes the finalizer anyway.
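
To verify whether the finalizer is already removed from the Machine object, which means that the deletion is complete on the Kubernetes side, check the object metadata using generic Kubernetes tooling:

  kubectl get machines.cluster.k8s.io -n <projectName> <machineName> -o jsonpath='{.metadata.finalizers}'

An empty output, or a NotFound error, means that the finalizer is gone and the object is removed or about to be removed.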

Delete a machine from a cluster using CLI

  1. Carefully read the machine deletion precautions.

  2. Log in to the Container Cloud web UI with the m:kaas:namespace@operator or m:kaas:namespace@writer permissions.

  3. Log in to the host where your management cluster kubeconfig is located and where kubectl is installed.

  4. For the Equinix Metal and bare metal providers, ensure that the machine being deleted is not a Ceph Monitor. If it is, migrate the Ceph Monitor to keep an odd number of Ceph Monitors in the quorum after the machine deletion. For details, see Migrate a Ceph Monitor before machine replacement.
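
    To check whether the node being deleted hosts a Ceph Monitor, you can list the monitor pods in the managed cluster. This sketch assumes a Rook-based Ceph deployment where the monitors run in the rook-ceph namespace with the app=rook-ceph-mon label; adjust the namespace and label selector to your deployment:

      kubectl get pods -n rook-ceph -l app=rook-ceph-mon -o wide

    The NODE column of the output shows which nodes currently host Ceph Monitors.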

  5. If the machine is assigned to a machine pool, decrease the replicas count of the pool as described in Change replicas count of a machine pool.

  6. Select from the following options:

    • Recommended. In the providerSpec.value section of the Machine object, set delete to true:

      kubectl patch machines.cluster.k8s.io -n <projectName> <machineName> --type=merge -p '{"spec":{"providerSpec":{"value":{"delete":true}}}}'
      

      Replace the parameters enclosed in angle brackets with the corresponding values.

    • Not recommended. Delete the Machine object:

      kubectl delete machines.cluster.k8s.io -n <projectName> <machineName>
      

    Note

    • After a successful unsafe or graceful machine deletion, the resources allocated to the machine are automatically freed up.

    • After a forced machine deletion, you may need to manually clean up the machine resources if the controller gets stuck at some stage.
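
    Regardless of the selected option, you can watch the Machine object until it disappears from the list, which indicates that the finalizer is removed and the deletion is complete:

      kubectl get machines.cluster.k8s.io -n <projectName> -w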

  7. Applicable only to managed clusters. Skip this step if your management cluster is upgraded to Container Cloud 2.17.0.

    If StackLight in HA mode is enabled and the deleted machine had the StackLight label, perform the following steps:

    1. Connect to the managed cluster as described in steps 5-7 of Connect to a Mirantis Container Cloud cluster.

    2. Identify the pods in the Pending state:

      kubectl get po -n stacklight | grep Pending
      

      Example of system response:

      opensearch-master-2             0/1       Pending       0       49s
      patroni-12-0                    0/3       Pending       0       51s
      patroni-13-0                    0/3       Pending       0       48s
      prometheus-alertmanager-1       0/1       Pending       0       47s
      prometheus-server-0             0/2       Pending       0       47s
      
    3. Verify that the reason for the pod Pending state is volume node affinity conflict:

      kubectl describe pod <POD_NAME> -n stacklight
      

      Example of system response:

      Events:
        Type     Reason            Age    From               Message
        ----     ------            ----   ----               -------
        Warning  FailedScheduling  6m53s  default-scheduler  0/6 nodes are available:
                                                             3 node(s) didn't match node selector,
                                                             3 node(s) had volume node affinity conflict.
        Warning  FailedScheduling  6m53s  default-scheduler  0/6 nodes are available:
                                                             3 node(s) didn't match node selector,
                                                             3 node(s) had volume node affinity conflict.
      
    4. Obtain the PVC of one of the pods:

      kubectl get pod <POD_NAME> -n stacklight -o=jsonpath='{range .spec.volumes[*]}{.persistentVolumeClaim}{"\n"}{end}'
      

      Example of system response:

      {"claimName":"opensearch-master-opensearch-master-2"}
      
    5. Remove the PVC using the obtained name. For example, for opensearch-master-opensearch-master-2:

      kubectl delete pvc opensearch-master-opensearch-master-2 -n stacklight
      
    6. Delete the pod:

      kubectl delete po <POD_NAME> -n stacklight
      
    7. Verify that a new pod is created and scheduled to the spare node. This may take some time. For example:

      kubectl get po opensearch-master-2 -n stacklight
      NAME                     READY   STATUS   RESTARTS   AGE
      opensearch-master-2   1/1     Running  0          7m1s
      
    8. Repeat the steps above for the remaining pods in the Pending state.