Deschedule StackLight Pods from a worker machine
On an existing managed cluster, when you add a worker machine to replace the one that carries the StackLight node label, you must migrate the label to the new machine and manually remove the StackLight Pods from the old machine, from which you remove the label.
Caution
In this procedure, replace <machine-name> with the name of the machine from which you remove the StackLight node label.
To deschedule StackLight Pods from a worker machine:
Remove the stacklight=enabled node label from the spec section of the target Machine object.
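Depending on the provider, the label entry typically resides in the nodeLabels list under spec.providerSpec.value of the Machine object. The following extract is a hypothetical example of the entry to delete, assuming that layout:
spec:
  providerSpec:
    value:
      nodeLabels:
      - key: stacklight   # delete this entry to remove the stacklight=enabled label
        value: enabled
To migrate the label, add the same entry to the Machine object of the new worker machine.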
Connect to the required cluster using its kubeconfig.
Verify that the stacklight=enabled label was removed successfully:
kubectl get node -l "kaas.mirantis.com/machine-name=<machine-name>" --show-labels | grep "stacklight=enabled"
If the label was removed successfully, the system response is empty.
List the StackLight Pods that run on the target machine and will be deleted:
kubectl get pods -n stacklight -o wide --field-selector spec.nodeName=$(kubectl get node -l "kaas.mirantis.com/machine-name=<machine-name>" -o jsonpath='{.items[0].metadata.name}')
Example of system response extract:
NAME                                             READY   STATUS    AGE   IP             NODE
alerta-fc45c8f6-6qlfx                            1/1     Running   63m   10.233.76.3    node-3a0de232-c1b4-43b0-8f21-44cd1
grafana-9bc56cdff-sl5w6                          3/3     Running   63m   10.233.76.4    node-3a0de232-c1b4-43b0-8f21-44cd1
iam-proxy-alerta-57585798d7-kqwd7                1/1     Running   58m   10.233.76.17   node-3a0de232-c1b4-43b0-8f21-44cd1
iam-proxy-alertmanager-6b4c4c8867-pdwcs          1/1     Running   56m   10.233.76.18   node-3a0de232-c1b4-43b0-8f21-44cd1
iam-proxy-grafana-87b984c45-2qwvb                1/1     Running   55m   10.233.76.19   node-3a0de232-c1b4-43b0-8f21-44cd1
iam-proxy-prometheus-545789585-9mll8             1/1     Running   54m   10.233.76.21   node-3a0de232-c1b4-43b0-8f21-44cd1
patroni-13-0                                     3/3     Running   61m   10.233.76.11   node-3a0de232-c1b4-43b0-8f21-44cd1
prometheus-alertmanager-0                        1/1     Running   55m   10.233.76.20   node-3a0de232-c1b4-43b0-8f21-44cd1
prometheus-blackbox-exporter-9f6bdfd75-8zn4w     2/2     Running   61m   10.233.76.8    node-3a0de232-c1b4-43b0-8f21-44cd1
prometheus-kube-state-metrics-67ff88649f-tslxc   1/1     Running   61m   10.233.76.7    node-3a0de232-c1b4-43b0-8f21-44cd1
prometheus-node-exporter-zl8pj                   1/1     Running   61m   10.10.10.143   node-3a0de232-c1b4-43b0-8f21-44cd1
telegraf-docker-swarm-69567fcf7f-jvbgn           1/1     Running   61m   10.233.76.10   node-3a0de232-c1b4-43b0-8f21-44cd1
telemeter-client-55d465dcc5-9thds                1/1     Running   61m   10.233.76.9    node-3a0de232-c1b4-43b0-8f21-44cd1
Delete all StackLight Pods from the target machine:
kubectl -n stacklight delete $(kubectl get pods -n stacklight -o wide --field-selector spec.nodeName=$(kubectl get node -l "kaas.mirantis.com/machine-name=<machine-name>" -o jsonpath='{.items[0].metadata.name}') -o name)
Example of system response:
pod "alerta-fc45c8f6-6qlfx" deleted pod "grafana-9bc56cdff-sl5w6" deleted pod "iam-proxy-alerta-57585798d7-kqwd7" deleted pod "iam-proxy-alertmanager-6b4c4c8867-pdwcs" deleted pod "iam-proxy-grafana-87b984c45-2qwvb" deleted pod "iam-proxy-prometheus-545789585-9mll8" deleted pod "patroni-13-0" deleted pod "prometheus-alertmanager-0" deleted pod "prometheus-blackbox-exporter-9f6bdfd75-8zn4w" deleted pod "prometheus-kube-state-metrics-67ff88649f-tslxc" deleted pod "prometheus-node-exporter-zl8pj" deleted pod "telegraf-docker-swarm-69567fcf7f-jvbgn" deleted pod "telemeter-client-55d465dcc5-9thds" deleted
Wait about three minutes for Pods to be rescheduled.
Verify that you do not have Pending Pods in the stacklight namespace:
kubectl -n stacklight get pods --field-selector status.phase=Pending
If the system response is No resources found in stacklight namespace, all Pods are rescheduled successfully.
If the system response still contains some Pods, remove the local persistent volumes (LVP) bound to the target machine as described below.
Remove LVP from a machine
Connect to the managed cluster as described in steps 5-7 of Connect to a Mirantis Container Cloud cluster.
Identify the pods in the Pending state:
kubectl get po -n stacklight | grep Pending
Example of system response:
opensearch-master-2         0/1   Pending   0   49s
patroni-12-0                0/3   Pending   0   51s
patroni-13-0                0/3   Pending   0   48s
prometheus-alertmanager-1   0/1   Pending   0   47s
prometheus-server-0         0/2   Pending   0   47s
Verify that the reason for the pod Pending state is volume node affinity conflict:
kubectl describe pod <POD_NAME> -n stacklight
Example of system response:
Events:
  Type     Reason            Age    From               Message
  ----     ------            ----   ----               -------
  Warning  FailedScheduling  6m53s  default-scheduler  0/6 nodes are available: 3 node(s) didn't match node selector, 3 node(s) had volume node affinity conflict.
  Warning  FailedScheduling  6m53s  default-scheduler  0/6 nodes are available: 3 node(s) didn't match node selector, 3 node(s) had volume node affinity conflict.
Obtain the PVC of one of the pods:
kubectl get pod <POD_NAME> -n stacklight -o=jsonpath='{range .spec.volumes[*]}{.persistentVolumeClaim}{"\n"}{end}'
Example of system response:
{"claimName":"opensearch-master-opensearch-master-2"}
Remove the PVC using the obtained name. For example, for opensearch-master-opensearch-master-2:
kubectl delete pvc opensearch-master-opensearch-master-2 -n stacklight
Delete the pod:
kubectl delete po <POD_NAME> -n stacklight
Verify that a new pod is created and scheduled to the spare node. This may take some time. For example:
kubectl get po opensearch-master-2 -n stacklight
NAME                  READY   STATUS    RESTARTS   AGE
opensearch-master-2   1/1     Running   0          7m1s
Repeat the steps above for the remaining pods in the Pending state.
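If many pods remain in the Pending state, the per-pod cleanup can be scripted. The following is a minimal Bash sketch, assuming that every Pending pod in the stacklight namespace is affected by the volume node affinity conflict and that its PVCs can be safely recreated:
# Iterate over all Pending StackLight pods, delete their PVCs, then the pods themselves
for pod in $(kubectl get pods -n stacklight --field-selector status.phase=Pending -o name | cut -d/ -f2); do
  for pvc in $(kubectl get pod "$pod" -n stacklight -o jsonpath='{range .spec.volumes[*]}{.persistentVolumeClaim.claimName}{"\n"}{end}'); do
    kubectl delete pvc "$pvc" -n stacklight
  done
  kubectl delete pod "$pod" -n stacklight
done
Afterwards, verify that all pods return to the Running state, as in the previous step.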