This section describes how to replace a failed control plane node in your
MOS deployment. The procedure applies to control plane nodes that have
permanently failed, for example due to a hardware failure, and appear in the
NotReady state:
kubectl get nodes <CONTAINER-CLOUD-NODE-NAME>
Example of system response:
NAME                          STATUS     ROLES    AGE   VERSION
<CONTAINER-CLOUD-NODE-NAME>   NotReady   <none>   10d   v1.18.8-mirantis-1
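Optionally, before starting the replacement, you can inspect the node conditions to confirm why the node reports NotReady. This quick sanity check is a suggested addition, not part of the original procedure:
# Show the node conditions; a failed node typically reports Unknown statuses
kubectl describe node <CONTAINER-CLOUD-NODE-NAME> | grep -A 6 Conditions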
To replace a failed controller node:
Remove the Kubernetes labels from the failed node by editing
the .metadata.labels section of the node object:
kubectl edit node <CONTAINER-CLOUD-NODE-NAME>
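As a non-interactive alternative to kubectl edit, you can remove individual labels with kubectl label by appending a dash to the label key. The label key below is a hypothetical example; remove the keys actually present on your node:
# Show the labels currently set on the node
kubectl get node <CONTAINER-CLOUD-NODE-NAME> --show-labels
# Remove a label by appending "-" to its key (openstack-control-plane is an example key)
kubectl label node <CONTAINER-CLOUD-NODE-NAME> openstack-control-plane-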
Add the control plane node to your deployment as described in Add a controller node.
Identify all stateful applications present on the failed node:
# Set to the name of the failed node
node=<CONTAINER-CLOUD-NODE-NAME>
# List the claims of PersistentVolumes whose node affinity pins them to the failed node
claims=$(kubectl -n openstack get pv -o jsonpath="{.items[?(@.spec.nodeAffinity.required.nodeSelectorTerms[0].matchExpressions[0].values[0] == '${node}')].spec.claimRef.name}")
for i in $claims; do echo $i; done
Example of system response:
mysql-data-mariadb-server-2
openstack-operator-bind-mounts-rfr-openstack-redis-1
etcd-data-etcd-etcd-0
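To double-check that a claim from the list is indeed pinned to the failed node, you can trace it to its bound PersistentVolume and read the node affinity. This verification is an illustrative addition that uses one of the example claim names above; substitute a claim name from your own output:
# Resolve the claim to its bound PersistentVolume
pv=$(kubectl -n openstack get pvc mysql-data-mariadb-server-2 -o jsonpath='{.spec.volumeName}')
# Print the node that the volume is pinned to; it should match the failed node name
kubectl get pv "$pv" -o jsonpath='{.spec.nodeAffinity.required.nodeSelectorTerms[0].matchExpressions[0].values[0]}'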
Reschedule the pods of stateful applications to healthy controller nodes as described in Reschedule stateful applications.
Remove the OpenStack port related to the Octavia health manager pod of the failed node:
kubectl -n openstack exec -t <KEYSTONE-CLIENT-POD-NAME> -- openstack port delete octavia-health-manager-listen-port-<NODE-NAME>
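To confirm that the port was removed, you can list ports with the same name; an empty result indicates success. This check is a suggested addition, not part of the original procedure:
# An empty list confirms the port no longer exists
kubectl -n openstack exec -t <KEYSTONE-CLIENT-POD-NAME> -- openstack port list --name octavia-health-manager-listen-port-<NODE-NAME>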