Replace a failed manager node
This section describes how to replace a failed manager node in both
MOSK management clusters and MOSK clusters. The
procedure applies to manager nodes that have permanently failed, for
example, due to a hardware failure, and remain in the NotReady state.
Caution
If your MOSK cluster is deployed with a compact control plane, follow the Replace a failed controller node procedure.
To replace a failed manager node:
Verify that the affected manager node is in the NotReady state:

kubectl get nodes <NODE-NAME>

Example of system response:

NAME          STATUS     ROLES    AGE   VERSION
<NODE-NAME>   NotReady   <none>   10d   v1.18.8-mirantis-1
Delete the affected manager node as described in Delete a cluster machine.
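The referenced procedure is the authoritative source for machine deletion. As an illustration only, on clusters managed through the Kubernetes API, removing a machine typically means deleting its Machine object in the management cluster. The namespace and object names below are hypothetical placeholders, not values from this document:

```shell
# Run against the management cluster.
# <project-namespace> and <MACHINE-NAME> are placeholders; look them up first.
kubectl get machines --all-namespaces

# Delete the Machine object that corresponds to the failed manager node:
kubectl -n <project-namespace> delete machine <MACHINE-NAME>
```

Verify that the machine and its node are removed before proceeding to the next step.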
Add a manager node as described in Add a machine.
Strongly recommended. Back up MKE as described in Mirantis Kubernetes Engine documentation: Back up MKE.
Since the procedure above modifies the cluster configuration, a fresh backup is required to restore the cluster in case further reconfigurations fail.
Important
Because the MKE restoration process is complicated, we strongly recommend contacting Mirantis support for assistance.
If you still decide to restore MKE from a backup on your own, you must scale down
helm-controller on the cluster being restored if the MKE version of the affected cluster after the restore differs from the MKE version in the ClusterRelease object that is set in the MOSK Cluster objects in the management cluster:

If you are restoring MKE on a management cluster: before starting the restore, scale down
helm-controller on each affected MOSK cluster. This prevents unintended Ceph and OpenStack downgrades on MOSK clusters after the management cluster is restored.

If you are restoring MKE on a MOSK cluster: immediately after the restore completes, scale down
helm-controller. Because the restore rolls the cluster back to an older release, this prevents it from triggering a premature upgrade of Helm releases.
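The scale-down described above can be sketched with kubectl as follows. The deployment name and namespace are assumptions and may differ in your environment; confirm them before running the commands:

```shell
# Locate the helm-controller workload (namespace assumed, verify first):
kubectl get deployments --all-namespaces | grep helm-controller

# Scale it down to stop Helm release reconciliation during/after the restore:
kubectl -n <helm-controller-namespace> scale deployment helm-controller --replicas=0

# Once the MKE versions are reconciled, scale it back up:
kubectl -n <helm-controller-namespace> scale deployment helm-controller --replicas=1
```

Remember to restore the original replica count once the cluster is back on the expected release; leaving helm-controller scaled down blocks all further Helm release updates.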