Technology Preview since Container Cloud 2.25.0 (Cluster releases 17.0.0 and
16.0.0) for worker machines on managed clusters
You can use the machine disabling API to remove a worker machine from the LCM
control of a managed cluster. Disabling isolates the affected node without
affecting other machines in the cluster and removes the node from the
Kubernetes cluster. This is useful when a malfunctioning machine blocks
cluster updates.
Note
Technology Preview support of the machine disabling feature also applies
during cluster update to Cluster release 17.1.0 or 16.1.0.
Before disabling a cluster machine, carefully read the following precautions
for machine disablement:
Container Cloud supports machine disablement of worker machines only.
If an issue occurs on the control plane, which is updated before worker
machines, fix the issue or replace the affected
control machine as soon as possible to prevent issues with workloads.
For reference, see Troubleshooting and Delete a cluster machine.
Disabling a machine can break high availability (HA) of components such as
StackLight. Therefore, Mirantis recommends adding a new machine as soon
as possible to provide a sufficient number of nodes for HA of these components.
Note
It is expected that the cluster status contains degraded replicas
of some components during or after cluster update with a disabled machine.
These replicas become available as soon as you replace the disabled machine.
When a machine is disabled, some services may switch to the NodeReady
state and may require additional actions to unblock LCM tasks.
A disabled machine is removed from the overall cluster status and is labeled
as Disabled. The requested number of nodes for the cluster remains
the same, but an additional disabled field displays the number
of disabled nodes.
A disabled machine is not taken into account for any calculations,
for example, when the number of StackLight nodes is required for some
restriction check.
Container Cloud removes the node running the disabled machine from the
Kubernetes cluster.
Deletion of the disabled machine with the graceful deletion policy is
not allowed. Use the unsafe deletion policy instead.
For details, see Delete a cluster machine.
For a major cluster update, the Cluster release of a disabled machine must
match the Cluster release of other cluster machines.
If a machine is disabled during a major Cluster release update, the update
completes if all other requirements are met. However, cluster update to the
next available major Cluster release is blocked until you re-enable or
replace the disabled machine.
Patch updates have no such limitation. You can update a cluster with a
disabled machine to several patch Cluster releases in the scope of one major
Cluster release.
After you re-enable the machine, it is updated to match the Cluster release
of the corresponding cluster, including all related components.
For Ceph machines, you need to perform additional disablement steps.
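The precautions above can be summarized as a few simple checks. The following Python snippet is a minimal sketch, not part of Container Cloud: the MachineInfo type and all function names are hypothetical, the node counters only approximate the status fields described above, and the code assumes that a major Cluster release is identified by its first two version components (for example, 17.0.x versus 17.1.x), following the release numbers used on this page. The update check follows the literal rule that, for a major update, a disabled machine must be on the same Cluster release as the rest of the cluster.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class MachineInfo:
    # Hypothetical, simplified view of a cluster machine.
    name: str
    control_plane: bool
    disabled: bool
    cluster_release: str  # for example, "17.1.0"


def release_series(release: str) -> Tuple[int, int]:
    """Assumed major release series: "17.1.2" -> (17, 1)."""
    major, minor = release.split(".")[:2]
    return int(major), int(minor)


def can_disable(machine: MachineInfo) -> bool:
    # Only worker machines support disablement.
    return not machine.control_plane


def deletion_allowed(machine: MachineInfo, policy: str) -> bool:
    # Graceful deletion of a disabled machine is rejected; use "unsafe" instead.
    return not (machine.disabled and policy == "graceful")


def node_counts(machines: List[MachineInfo]) -> Dict[str, int]:
    # The requested node number still includes disabled machines, a separate
    # counter reports them, and checks use only the remaining machines.
    disabled = sum(1 for m in machines if m.disabled)
    return {"requested": len(machines), "disabled": disabled,
            "counted": len(machines) - disabled}


def update_blockers(machines: List[MachineInfo], current: str,
                    target: str) -> List[str]:
    """Names of disabled machines that block an update from current to target."""
    if release_series(target) == release_series(current):
        # Patch update within one major Cluster release: not blocked.
        return []
    # Major update: each disabled machine must match the cluster's Cluster release.
    return [m.name for m in machines
            if m.disabled and m.cluster_release != current]
```

With worker-1 disabled on 17.0.0 while the rest of the cluster runs 17.1.0, update_blockers reports worker-1 for an update to 17.2.0 but returns an empty list for a patch update to 17.1.1, which mirrors the scenario described above.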
Disable a machine using the Container Cloud web UI
Carefully read the precautions for
machine disablement.
Power off the underlying host of a machine to be disabled.
Warning
If the underlying host of a machine is not powered off, the
cluster may still contain the disabled machine in the list of available
nodes with kubelet attempting to start the corresponding containers
on the disabled machine.
Therefore, Mirantis strongly recommends powering off the underlying host
to avoid having to manually remove the related Kubernetes node from the
Docker Swarm cluster using the MKE web UI.
In the Clusters tab, click the required cluster name to
open the list of machines running on it.
Click the More action icon in the last column of the required
machine and click Disable.
Wait until the machine Status switches to Disabled.
If the disabled machine contains StackLight or Ceph, migrate these services
to a healthy machine:
Verify that the required disabled and healthy machines are not currently
added to GracefulRebootRequest:
Note
Machine configuration changes, such as reassigning Ceph and
StackLight labels from a disabled machine to a healthy one, which
are described in the following steps, are not allowed during graceful
reboot. For details, see Perform a graceful reboot of a cluster.
Verify that the More > Reboot machines option is not
disabled. If the option is active, skip the following sub-step and
proceed to the next step. If the option is disabled, proceed to the
following sub-step.
Using the Container Cloud CLI, verify that the new machine, which you
are going to use for StackLight or Ceph services migration, is not
included in the list of the GracefulRebootRequest resource.
Otherwise, remove GracefulRebootRequest before proceeding.
For details, see Disable a machine using the Container Cloud CLI.
Note
Since Container Cloud 2.27.0 (Cluster releases 17.2.0 and
16.2.0), reboot of the disabled machine is automatically skipped in
GracefulRebootRequest.
If StackLight is deployed on the machine, unblock LCM tasks by moving the
stacklight=enabled label to another healthy machine with a sufficient
amount of resources and manually remove StackLight Pods along with related
local persistent volumes from the disabled machine. For details, see
Deschedule StackLight Pods from a worker machine.
If Ceph is deployed on the machine:
Disable a Ceph machine
Select one of the following options to open the Ceph cluster spec:
Web UI
In the CephClusters tab, click the required Ceph cluster
name to open its spec.
For mgr, rgw, or mds, move the role to another node
listed in the node section. That node must meet the resource
requirements to run the corresponding daemon type and must not have the
respective role assigned yet.
For mon, refer to Move a Ceph Monitor daemon to another node for further instructions.
Mirantis recommends considering nodes with sufficient resources to run
the moved monitor daemon.
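The GracefulRebootRequest verification and the move of a Ceph mgr, rgw, or mds role can be sketched as follows. This Python snippet is illustrative only and operates on plain dictionaries rather than on live cluster objects. It assumes that GracefulRebootRequest lists machine names in spec.machines and that the node section of the Ceph cluster spec maps machine names to a roles list; both layouts, as well as all function names, are assumptions to verify against your actual objects. Resource requirements of the target node are not checked.

```python
import copy
from typing import Dict, List

# Roles that can be moved by editing the node section; mon has its own procedure.
MOVABLE_ROLES = {"mgr", "rgw", "mds"}


def reboot_request_conflicts(reboot_request: dict, machines: List[str]) -> List[str]:
    """Return the machines from `machines` that the reboot request may cover.

    Assumes names are listed in spec.machines. An empty or missing list is
    conservatively treated as covering every machine.
    """
    listed = (reboot_request.get("spec") or {}).get("machines") or []
    if not listed:
        return list(machines)
    return [name for name in machines if name in listed]


def move_ceph_role(nodes: Dict[str, dict], role: str,
                   source: str, target: str) -> Dict[str, dict]:
    """Return a copy of the node section with `role` moved from source to target."""
    if role not in MOVABLE_ROLES:
        raise ValueError(f"{role} cannot be moved this way; see the dedicated procedure")
    if role not in nodes.get(source, {}).get("roles", []):
        raise ValueError(f"{source} does not run the {role} role")
    if target not in nodes:
        raise ValueError(f"{target} is not listed in the node section")
    if role in nodes[target].get("roles", []):
        raise ValueError(f"{target} already has the {role} role")
    # Resource requirements of the target node are not checked here.
    updated = copy.deepcopy(nodes)
    updated[source]["roles"] = [r for r in updated[source]["roles"] if r != role]
    updated[target].setdefault("roles", []).append(role)
    return updated
```

Returning a modified copy instead of editing the spec in place makes it easy to compare the result with the original spec before you apply the change.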
Disable a machine using the Container Cloud CLI
Carefully read the precautions for
machine disablement.
Power off the underlying host of a machine to be disabled.
Warning
If the underlying host of a machine is not powered off, the
cluster may still contain the disabled machine in the list of available
nodes with kubelet attempting to start the corresponding containers
on the disabled machine.
Therefore, Mirantis strongly recommends powering off the underlying host
to avoid having to manually remove the related Kubernetes node from the
Docker Swarm cluster using the MKE web UI.
Open the required Machine object for editing.
In the providerSpec:value section, set the disable field to true.
If the disabled machine contains StackLight or Ceph, migrate these services
to a healthy machine:
Verify that the required disabled and healthy machines are not currently
added to GracefulRebootRequest. If they are, remove GracefulRebootRequest
before proceeding.
Note
Machine configuration changes, such as reassigning Ceph and
StackLight labels from a disabled machine to a healthy one, which
are described in the following steps, are not allowed during graceful
reboot. For details, see Perform a graceful reboot of a cluster.
Since Container Cloud 2.27.0 (Cluster releases 17.2.0 and
16.2.0), reboot of the disabled machine is automatically skipped in
GracefulRebootRequest.
If StackLight is deployed on the machine, unblock LCM tasks by moving the
stacklight=enabled label to another healthy machine with a sufficient
amount of resources and manually remove StackLight Pods along with related
local persistent volumes from the disabled machine. For details, see
Deschedule StackLight Pods from a worker machine.
If Ceph is deployed on the machine:
Disable a Ceph machine
Select one of the following options to open the Ceph cluster spec:
Web UI
In the CephClusters tab, click the required Ceph cluster
name to open its spec.
For mgr, rgw, or mds, move the role to another node
listed in the node section. That node must meet the resource
requirements to run the corresponding daemon type and must not have the
respective role assigned yet.
For mon, refer to Move a Ceph Monitor daemon to another node for further instructions.
Mirantis recommends considering nodes with sufficient resources to run
the moved monitor daemon.
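In the CLI procedure, setting disable to true can also be expressed as a JSON merge patch instead of an interactive edit, and moving the stacklight=enabled label amounts to editing two label lists. The following Python snippet is a minimal sketch that only builds data and a command line; it does not contact a cluster. It assumes that kubectl points at the management cluster, that the Machine resource is addressable as machine, and that node labels are stored in providerSpec:value:nodeLabels as key/value entries. All function names are hypothetical. Removing StackLight Pods and local persistent volumes from the disabled machine remains a separate manual step.

```python
import json
import shlex
from typing import List


def disable_patch(disable: bool = True) -> dict:
    """JSON merge patch that sets providerSpec:value:disable on a Machine."""
    return {"spec": {"providerSpec": {"value": {"disable": disable}}}}


def kubectl_patch_command(machine: str, namespace: str,
                          disable: bool = True) -> List[str]:
    # Assumes kubectl points at the management cluster and that the Machine
    # resource is addressable as "machine"; "--type merge" sends a JSON merge patch.
    return ["kubectl", "-n", namespace, "patch", "machine", machine,
            "--type", "merge", "-p", json.dumps(disable_patch(disable))]


def move_node_label(source_value: dict, target_value: dict,
                    key: str = "stacklight", value: str = "enabled") -> None:
    """Move a key/value entry between two providerSpec:value dicts in place.

    Assumes labels are stored as a nodeLabels list of {"key": ..., "value": ...}.
    """
    label = {"key": key, "value": value}
    source_value["nodeLabels"] = [
        lbl for lbl in source_value.get("nodeLabels", []) if lbl != label
    ]
    target_labels = target_value.setdefault("nodeLabels", [])
    if label not in target_labels:
        target_labels.append(label)


if __name__ == "__main__":
    # Placeholder names; replace them with your machine and project namespace.
    print(shlex.join(kubectl_patch_command("worker-1", "demo-project")))
```

Printing the command instead of running it lets you review it first. The disable parameter is kept so that the same sketch can express re-enabling, assuming that re-enabling corresponds to setting the field back to false.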