If one of the Tungsten Fabric (TF) controller nodes has failed, follow this procedure to replace it with a new node.
To replace a TF controller node:
Note
Pods that belong to the failed node can stay in the Terminating
state.
Delete the failed TF controller node from the Kubernetes cluster:
kubectl delete node <FAILED-TF-CONTROLLER-NODE-NAME>
Note
Once the failed node has been removed from the cluster, all pods
that were stuck in the Terminating
state are removed as well.
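To confirm that nothing is left behind after the node deletion, you can list the pods that are still assigned to the failed node. The following is a minimal sketch, not part of the official procedure: `pods_on_node` is a hypothetical helper that filters lines of the form `<namespace> <pod> <node>`, which the commented `kubectl` command produces.

```shell
# Hypothetical helper: print "namespace/pod" for every pod still assigned
# to the given node. Reads "<namespace> <pod> <node>" lines on stdin.
pods_on_node() {
  awk -v node="$1" '$3 == node { print $1 "/" $2 }'
}

# Usage against a live cluster (requires kubectl access):
# kubectl get pods -A --no-headers \
#   -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name,NODE:.spec.nodeName \
#   | pods_on_node <FAILED-TF-CONTROLLER-NODE-NAME>
```

An empty result means that no pods reference the deleted node anymore.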
Assign the TF labels to the new control plane node according to the table below, using the following command:
kubectl label node <NODE-NAME> <LABEL-KEY=LABEL-VALUE> ...
Node role | Description | Kubernetes labels | Minimal count |
---|---|---|---|
TF control plane | Hosts the TF control plane services such as | tfconfig=enabled tfcontrol=enabled tfwebui=enabled tfconfigdb=enabled | 3 |
TF analytics | Hosts the TF analytics services. | tfanalytics=enabled tfanalyticsdb=enabled | 3 |
TF vRouter | Hosts the TF vRouter module and vRouter agent. | tfvrouter=enabled | Varies |
TF vRouter DPDK Technical Preview | Hosts the TF vRouter agent in DPDK mode. | tfvrouter-dpdk=enabled | Varies |
Note
TF supports only Kubernetes OpenStack workloads.
Therefore, label OpenStack compute nodes with the tfvrouter=enabled label.
Note
Do not specify the openstack-gateway=enabled and openvswitch=enabled
labels for MOS deployments that use TF as the networking back end
for OpenStack.
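As an illustration of the labeling step, the sketch below maps a node role from the table to its label set. The function name `tf_labels_for_role` and the short role names (`control`, `analytics`, `vrouter`, `vrouter-dpdk`) are assumptions made for this example only; the label values are taken from the table.

```shell
# Hypothetical helper: print the Kubernetes labels for a TF node role.
# The short role names are illustrative, not part of any TF tooling.
tf_labels_for_role() {
  case "$1" in
    control)      echo "tfconfig=enabled tfcontrol=enabled tfwebui=enabled tfconfigdb=enabled" ;;
    analytics)    echo "tfanalytics=enabled tfanalyticsdb=enabled" ;;
    vrouter)      echo "tfvrouter=enabled" ;;
    vrouter-dpdk) echo "tfvrouter-dpdk=enabled" ;;
    *)            echo "unknown role: $1" >&2; return 1 ;;
  esac
}

# Usage against a live cluster (requires kubectl access).
# The unquoted substitution is intentional so that each label is a separate argument:
# kubectl label node <NODE-NAME> $(tf_labels_for_role control)
# kubectl get node <NODE-NAME> --show-labels
```

The second command in the usage comment prints the node labels so that you can check the result before pods start scheduling.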
Once you label the new Kubernetes node, new pods start scheduling on the
node. However, pods that use Persistent Volume Claims remain stuck in the
Pending
state because their volume claims stay bound to the local volumes
from the deleted node. To resolve the issue:
Delete the PersistentVolumeClaim (PVC) bound to the local volume from the failed node:
kubectl -n tf delete pvc <PVC-BOUNDED-TO-NON-EXISTING-VOLUME>
Note
Clustered services that use PVC, such as Cassandra, Kafka,
and ZooKeeper, start the replication process when new pods move
to the Ready
state.
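If it is not obvious which PVC to delete, one possible approach is to list the persistent volumes pinned to the failed node together with their claims. This is a sketch under a stated assumption: the local volumes record the node host name in the first node affinity match expression. Confirm that with `kubectl get pv <PV-NAME> -o yaml` before relying on it. `claims_on_node` is a hypothetical helper name.

```shell
# Hypothetical helper: print "namespace/claim" for every persistent volume
# pinned to the given node.
# Reads "<pv> <claim-namespace> <claim-name> <node>" lines on stdin.
claims_on_node() {
  awk -v node="$1" '$4 == node { print $2 "/" $3 }'
}

# Usage against a live cluster (requires kubectl access).
# Assumption: the node name sits in the first node affinity match expression.
# kubectl get pv --no-headers \
#   -o custom-columns='PV:.metadata.name,NS:.spec.claimRef.namespace,CLAIM:.spec.claimRef.name,NODE:.spec.nodeAffinity.required.nodeSelectorTerms[0].matchExpressions[0].values[0]' \
#   | claims_on_node <FAILED-TF-CONTROLLER-NODE-NAME>
```

Each printed claim in the tf namespace is a candidate for the PVC deletion command above.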
Delete the pod that is using the removed PVC:
kubectl -n tf delete pod <POD-NAME>
Verify that the pods have successfully started on the new controller
node and stay in the Ready
state.
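The verification can be sketched as a filter over the default `kubectl get pods` output, which prints the pod name in the first column and the ready container count (for example, `3/3`) in the second. `not_ready_pods` is a hypothetical helper name.

```shell
# Hypothetical helper: print the names of pods whose ready container count
# differs from the total. Reads default "kubectl get pods --no-headers"
# output on stdin (column 1 is the pod name, column 2 is READY as "n/m").
not_ready_pods() {
  awk '{ split($2, r, "/"); if (r[1] != r[2]) print $1 }'
}

# Usage against a live cluster (requires kubectl access):
# kubectl -n tf get pods --no-headers \
#   --field-selector spec.nodeName=<NODE-NAME> | not_ready_pods
```

An empty result means that every pod on the node reports all of its containers as ready. Pods of finished jobs in the Completed state also appear in the output, so review any listed names rather than treating them as failures.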
Delete terminated nodes from the TF configuration through the TF web UI:
Log in to the TF web UI.
Navigate to Configure > BGP Routers.
Delete all terminated control nodes.
Note
You can manage nodes of other types from Configure > Nodes.