Disable a machine¶
Available for workers on MOSK clusters TechPreview
You can use the machine disabling API to seamlessly remove a worker machine from the LCM control of a MOSK cluster. This action isolates the affected node without impacting other machines in the cluster, effectively eliminating it from the Kubernetes cluster. This functionality proves invaluable in scenarios where a malfunctioning machine impedes cluster updates.
Precautions for machine disablement¶
Before disabling a cluster machine, carefully read the following essential information for a successful machine disablement:
MOSK supports machine disablement of worker machines only.
If an issue occurs on the control plane, which is updated before worker machines, fix the issue or replace the affected control machine as soon as possible to prevent issues with workloads. For reference, see Troubleshooting Guide and Delete a cluster machine.
Disabling a machine can break high availability (HA) of components such as StackLight. Therefore, Mirantis recommends adding a new machine as soon as possible to provide enough nodes for component HA.
Note
It is expected that the cluster status contains degraded replicas of some components during or after cluster update with a disabled machine. These replicas become available as soon as you replace the disabled machine.
When a machine is disabled, some services may switch to the NodeReady state and may require additional actions to unblock LCM tasks.
A disabled machine is removed from the overall cluster status and is labeled as Disabled. The requested node number for the cluster remains the same, but an additional disabled field is displayed with the number of disabled nodes.
A disabled machine is not taken into account for any calculations, for example, when the number of StackLight nodes is required for some restriction check.
MOSK removes the node running the disabled machine from the Kubernetes cluster.
Deletion of the disabled machine with the graceful deletion policy is not allowed. Use the unsafe deletion policy instead. For details, see Delete a cluster machine.
For a major cluster update, the Cluster release of a disabled machine must match the Cluster release of other cluster machines.
If a machine is disabled during the major Cluster release update, then the upgrade should be completed if all other requirements are met. However, cluster update to the next available major Cluster release will be blocked until you re-enable or replace the disabled machine.
Patch updates do not have such limitation on different patch Cluster releases. You can update a cluster with a disabled machine to several patch Cluster releases in the scope of one major Cluster release.
After you enable the machine, it is updated to match the Cluster release of the corresponding cluster, including all related components.
For Ceph machines, you need to perform additional disablement steps.
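Because a disabled machine is removed from the overall cluster status, it can help to list disabled machines explicitly before planning an update. The following is a minimal sketch, not an official tool. It assumes that kubectl is configured as for the CLI procedure on this page and reuses the status.providerStatus.status field that the procedure queries. The function names list_machine_statuses and count_disabled_machines are hypothetical.

```shell
# Print "<machineName><TAB><status>" for every machine in a project.
list_machine_statuses() {
  project=$1
  kubectl get machines.cluster.k8s.io -n "$project" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.providerStatus.status}{"\n"}{end}'
}

# Count the machines whose reported status is exactly "Disabled".
count_disabled_machines() {
  list_machine_statuses "$1" |
    awk -F '\t' '$2 == "Disabled" { n++ } END { print n + 0 }'
}

# Example (placeholder as in the commands on this page):
#   list_machine_statuses <projectName>
#   count_disabled_machines <projectName>
```

A non-zero count is a reminder that the next major Cluster release update can be blocked until the listed machines are re-enabled or replaced.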
Disable a machine using the MOSK management console¶
Carefully read the precautions for machine disablement.
Power off the underlying host of a machine to be disabled.
Warning
If the underlying host of a machine is not powered off, the cluster may still contain the disabled machine in the list of available nodes with kubelet attempting to start the corresponding containers on the disabled machine.
Therefore, Mirantis strongly recommends powering off the underlying host to avoid having to manually remove the related Kubernetes node from the Docker Swarm cluster using the MKE web UI.
In the Clusters tab, click the required cluster name to open the list of machines running on it.
Click the More action icon in the last column of the required machine and click Disable.
Wait until the machine Status switches to Disabled.
If the disabled machine contains StackLight or Ceph, migrate these services to a healthy machine:
Verify that the required disabled and healthy machines are not currently added to GracefulRebootRequest:
Note
Machine configuration changes, such as reassigning Ceph and StackLight labels from a disabled machine to a healthy one, which are described in the following steps, are not allowed during graceful reboot. For details, see Perform a graceful reboot of a cluster.
Verify that the More > Reboot machines option is not disabled. If the option is active, skip the following sub-step and proceed to the next step. If the option is disabled, proceed to the following sub-step.
Using the MOSK management CLI, verify that the new machine, which you are going to use for StackLight or Ceph services migration, is not included in the list of the GracefulRebootRequest resource. Otherwise, remove GracefulRebootRequest before proceeding. For details, see Disable a machine using the MOSK management CLI.
Note
Reboot of the disabled machine is automatically skipped in GracefulRebootRequest.
If StackLight is deployed on the machine, unblock LCM tasks by moving the stacklight=enabled label to another healthy machine with a sufficient amount of resources and manually remove StackLight Pods along with related local persistent volumes from the disabled machine. For details, see Deschedule StackLight Pods from a worker machine.
If Ceph is deployed on the machine:
Disable a Ceph machine
Open the CephDeployment object for editing:
kubectl -n ceph-lcm-mirantis edit cephdeployment
In spec.nodes, find the machine to be disabled.
Back up the machine configuration.
Verify the machine role:
For mgr, rgw, or mds, move such role to another node located in the node section. Such node must meet resource requirements to run the corresponding daemon type and must not have the respective role assigned yet.
For mon, refer to Move a Ceph Monitor daemon to another node for further instructions. Mirantis recommends considering nodes with sufficient resources to run the moved monitor daemon.
For osd, proceed to the next step.
Remove the machine from spec.
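For the backup step above, one option is to save the CephDeployment objects to a file before editing them. The following is a minimal sketch under stated assumptions: it relies on kubectl access to the ceph-lcm-mirantis namespace used in the edit command, it saves the whole object rather than only the entry of the machine, and the function name backup_cephdeployment and the default file name are hypothetical.

```shell
# Save all CephDeployment objects from the ceph-lcm-mirantis namespace to a file
# and print the file name. Pass a path as the first argument to override the
# timestamped default name (hypothetical naming scheme).
backup_cephdeployment() {
  outfile=${1:-cephdeployment-backup-$(date +%Y%m%d-%H%M%S).yaml}
  if ! kubectl -n ceph-lcm-mirantis get cephdeployment -o yaml > "$outfile"; then
    # Do not leave an empty or partial backup behind if the export failed.
    rm -f "$outfile"
    return 1
  fi
  echo "$outfile"
}

# Example:
#   backup_cephdeployment
#   backup_cephdeployment /tmp/cephdeployment-before-disable.yaml
```

The saved file can later serve as the source of the backed-up machine configuration when the machine is enabled again.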
Enable a machine using the MOSK management console¶
In the Clusters tab, click the required cluster name to open the list of machines running on it.
Click the More action icon in the last column of the required machine and click Enable.
Wait until the machine Status switches to Ready.
If Ceph is deployed on the machine:
Enable a Ceph machine
Open the CephDeployment object for editing:
kubectl -n ceph-lcm-mirantis edit cephdeployment
In spec.nodes, add a new or backed-up configuration of the machine to be enabled.
If the machine must have any role besides osd, consider the following options to return a role back to the node:
For mgr, rgw, or mds, add the role to the enabled node in the node section.
For mon, refer to Move a Ceph Monitor daemon to another node for further instructions.
Disable a machine using the MOSK management CLI¶
Carefully read the precautions for machine disablement.
Power off the underlying host of a machine to be disabled.
Warning
If the underlying host of a machine is not powered off, the cluster may still contain the disabled machine in the list of available nodes with kubelet attempting to start the corresponding containers on the disabled machine.
Therefore, Mirantis strongly recommends powering off the underlying host to avoid having to manually remove the related Kubernetes node from the Docker Swarm cluster using the MKE web UI.
Open the required Machine object for editing.
In the providerSpec:value section, set disable to true:
kubectl patch machines.cluster.k8s.io -n <projectName> <machineName> --type=merge -p '{"spec":{"providerSpec":{"value":{"disable":true}}}}'
Wait until the machine status switches to Disabled:
kubectl get machines.cluster.k8s.io -n <projectName> <machineName> -o jsonpath='{.status.providerStatus.status}'
If the disabled machine contains StackLight or Ceph, migrate these services to a healthy machine:
Verify that the required disabled and healthy machines are not currently added to GracefulRebootRequest:
Note
Machine configuration changes, such as reassigning Ceph and StackLight labels from a disabled machine to a healthy one, which are described in the following steps, are not allowed during graceful reboot. For details, see Perform a graceful reboot of a cluster.
kubectl get gracefulrebootrequest -A
kubectl -n <projectName> get gracefulrebootrequest <gracefulRebootRequestName> -o yaml
If the machine is listed in the object spec section, remove the GracefulRebootRequest object:
kubectl -n <projectName> delete gracefulrebootrequest <gracefulRebootRequestName>
Note
Reboot of the disabled machine is automatically skipped in GracefulRebootRequest.
If StackLight is deployed on the machine, unblock LCM tasks by moving the stacklight=enabled label to another healthy machine with a sufficient amount of resources and manually remove StackLight Pods along with related local persistent volumes from the disabled machine. For details, see Deschedule StackLight Pods from a worker machine.
If Ceph is deployed on the machine:
Disable a Ceph machine
Open the CephDeployment object for editing:
kubectl -n ceph-lcm-mirantis edit cephdeployment
In spec.nodes, find the machine to be disabled.
Back up the machine configuration.
Verify the machine role:
For mgr, rgw, or mds, move such role to another node located in the node section. Such node must meet resource requirements to run the corresponding daemon type and must not have the respective role assigned yet.
For mon, refer to Move a Ceph Monitor daemon to another node for further instructions. Mirantis recommends considering nodes with sufficient resources to run the moved monitor daemon.
For osd, proceed to the next step.
Remove the machine from spec.
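The status wait and the GracefulRebootRequest check from the procedure above can be scripted. The following is a minimal sketch, assuming the same kubectl access as the commands above. The function names, the default timeout of 600 seconds, and the polling interval of 10 seconds are hypothetical. The second function makes no assumption about the layout of the GracefulRebootRequest spec: it only does a coarse text match on the machine name, so similar names (for example, worker-1 and worker-10) can produce false positives that you still need to review with -o yaml.

```shell
# Poll the machine status until it equals the expected value or the timeout expires.
# Arguments: <projectName> <machineName> <expectedStatus> [timeoutSeconds] [intervalSeconds]
wait_for_machine_status() {
  project=$1 machine=$2 expected=$3
  timeout=${4:-600} interval=${5:-10}
  elapsed=0
  while :; do
    status=$(kubectl get machines.cluster.k8s.io -n "$project" "$machine" \
      -o jsonpath='{.status.providerStatus.status}')
    if [ "$status" = "$expected" ]; then
      echo "machine $machine is $status"
      return 0
    fi
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "timed out waiting for $machine to become $expected (last status: $status)" >&2
      return 1
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
}

# Print the names of GracefulRebootRequest objects in a project whose spec
# mentions the given machine name (substring match, not a structural check).
reboot_requests_mentioning() {
  project=$1 machine=$2
  kubectl -n "$project" get gracefulrebootrequest \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec}{"\n"}{end}' |
    awk -F '\t' -v m="$machine" 'index($2, m) > 0 { print $1 }'
}

# Example (placeholders as in the commands above):
#   wait_for_machine_status <projectName> <machineName> Disabled
#   reboot_requests_mentioning <projectName> <machineName>
```

An empty result from reboot_requests_mentioning suggests that no request in the project names the machine. Any printed name is a candidate for the delete command shown above.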
Enable a machine using the MOSK management CLI¶
Open the required Machine object for editing.
In the providerSpec:value section, set disable to false:
kubectl patch machines.cluster.k8s.io -n <projectName> <machineName> --type=merge -p '{"spec":{"providerSpec":{"value":{"disable":false}}}}'
Wait until the machine status switches to Ready:
kubectl get machines.cluster.k8s.io -n <projectName> <machineName> -o jsonpath='{.status.providerStatus.status}'
If Ceph is deployed on the machine:
Enable a Ceph machine
Open the CephDeployment object for editing:
kubectl -n ceph-lcm-mirantis edit cephdeployment
In spec.nodes, add a new or backed-up configuration of the machine to be enabled.
If the machine must have any role besides osd, consider the following options to return a role back to the node:
For mgr, rgw, or mds, add the role to the enabled node in the node section.
For mon, refer to Move a Ceph Monitor daemon to another node for further instructions.
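After enabling a machine, it may be useful to confirm in one step that the disable flag is cleared and that the machine reports Ready. The following is a minimal sketch, assuming the same kubectl access as the commands above. It reads only the two fields that this page already uses, spec.providerSpec.value.disable and status.providerStatus.status, and the function name machine_is_enabled is hypothetical.

```shell
# Succeed only if the disable flag is not "true" and the machine status is "Ready".
# Prints both values so that the result can be reviewed.
machine_is_enabled() {
  project=$1 machine=$2
  flag=$(kubectl get machines.cluster.k8s.io -n "$project" "$machine" \
    -o jsonpath='{.spec.providerSpec.value.disable}')
  status=$(kubectl get machines.cluster.k8s.io -n "$project" "$machine" \
    -o jsonpath='{.status.providerStatus.status}')
  echo "disable=${flag:-<unset>} status=${status:-<unknown>}"
  [ "$flag" != "true" ] && [ "$status" = "Ready" ]
}

# Example (placeholders as in the commands above):
#   machine_is_enabled <projectName> <machineName> && echo "machine is back under LCM control"
```

The check treats a missing disable field the same as false, which is an assumption about how an unset flag behaves rather than documented behavior.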