Deschedule StackLight Pods from a worker machine

On an existing managed cluster, when you add a worker machine to replace the one that carries the StackLight node label, you must migrate the label to the new machine and manually remove the StackLight Pods from the old machine after removing the label from it.

Caution

In this procedure, replace <machine name> with the name of the machine from which you remove the StackLight node label.

To deschedule StackLight Pods from a worker machine:

  1. Remove the stacklight=enabled node label from the spec section of the target Machine object.
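
    For example, you can remove the label by editing the Machine object on the management cluster. The following sketch assumes that node labels reside in the nodeLabels list of the Machine spec and uses the illustrative <management-kubeconfig> and <project-name> placeholders:

    kubectl --kubeconfig <management-kubeconfig> -n <project-name> edit machine <machine name>

    In the editor, delete the entry with the stacklight key and the enabled value from the nodeLabels list.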

  2. Connect to the required cluster using its kubeconfig.
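
    For example, assuming the cluster kubeconfig is saved locally as <kubeconfig-path> (an illustrative placeholder):

    export KUBECONFIG=<kubeconfig-path>
    kubectl get nodes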

  3. Verify that the stacklight=enabled label was removed successfully:

    kubectl get node -l "kaas.mirantis.com/machine-name=<machine name>" --show-labels | grep "stacklight=enabled"
    

    If the label was removed successfully, the system response is empty.

  4. Obtain the list of StackLight Pods that run on the target machine and will be deleted:

    kubectl get pods -n stacklight -o wide --field-selector spec.nodeName=$(kubectl get node -l "kaas.mirantis.com/machine-name=<machine name>" -o jsonpath='{.items[0].metadata.name}')
    

    Example of system response extract:

    NAME                                           READY STATUS    AGE   IP             NODE
    alerta-fc45c8f6-6qlfx                          1/1   Running   63m   10.233.76.3    node-3a0de232-c1b4-43b0-8f21-44cd1
    grafana-9bc56cdff-sl5w6                        3/3   Running   63m   10.233.76.4    node-3a0de232-c1b4-43b0-8f21-44cd1
    iam-proxy-alerta-57585798d7-kqwd7              1/1   Running   58m   10.233.76.17   node-3a0de232-c1b4-43b0-8f21-44cd1
    iam-proxy-alertmanager-6b4c4c8867-pdwcs        1/1   Running   56m   10.233.76.18   node-3a0de232-c1b4-43b0-8f21-44cd1
    iam-proxy-grafana-87b984c45-2qwvb              1/1   Running   55m   10.233.76.19   node-3a0de232-c1b4-43b0-8f21-44cd1
    iam-proxy-prometheus-545789585-9mll8           1/1   Running   54m   10.233.76.21   node-3a0de232-c1b4-43b0-8f21-44cd1
    patroni-13-0                                   3/3   Running   61m   10.233.76.11   node-3a0de232-c1b4-43b0-8f21-44cd1
    prometheus-alertmanager-0                      1/1   Running   55m   10.233.76.20   node-3a0de232-c1b4-43b0-8f21-44cd1
    prometheus-blackbox-exporter-9f6bdfd75-8zn4w   2/2   Running   61m   10.233.76.8    node-3a0de232-c1b4-43b0-8f21-44cd1
    prometheus-kube-state-metrics-67ff88649f-tslxc 1/1   Running   61m   10.233.76.7    node-3a0de232-c1b4-43b0-8f21-44cd1
    prometheus-node-exporter-zl8pj                 1/1   Running   61m   10.10.10.143   node-3a0de232-c1b4-43b0-8f21-44cd1
    telegraf-docker-swarm-69567fcf7f-jvbgn         1/1   Running   61m   10.233.76.10   node-3a0de232-c1b4-43b0-8f21-44cd1
    telemeter-client-55d465dcc5-9thds              1/1   Running   61m   10.233.76.9    node-3a0de232-c1b4-43b0-8f21-44cd1
    
  5. Delete all StackLight Pods from the target machine:

    kubectl -n stacklight delete $(kubectl get pods -n stacklight --field-selector spec.nodeName=$(kubectl get node -l "kaas.mirantis.com/machine-name=<machine name>" -o jsonpath='{.items[0].metadata.name}') -o name)
    

    Example of system response:

    pod "alerta-fc45c8f6-6qlfx" deleted
    pod "grafana-9bc56cdff-sl5w6" deleted
    pod "iam-proxy-alerta-57585798d7-kqwd7" deleted
    pod "iam-proxy-alertmanager-6b4c4c8867-pdwcs" deleted
    pod "iam-proxy-grafana-87b984c45-2qwvb" deleted
    pod "iam-proxy-prometheus-545789585-9mll8" deleted
    pod "patroni-13-0" deleted
    pod "prometheus-alertmanager-0" deleted
    pod "prometheus-blackbox-exporter-9f6bdfd75-8zn4w" deleted
    pod "prometheus-kube-state-metrics-67ff88649f-tslxc" deleted
    pod "prometheus-node-exporter-zl8pj" deleted
    pod "telegraf-docker-swarm-69567fcf7f-jvbgn" deleted
    pod "telemeter-client-55d465dcc5-9thds" deleted
    
  6. Wait about three minutes for Pods to be rescheduled.
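
    For example, you can watch the Pods while they are being rescheduled:

    kubectl -n stacklight get pods -w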

  7. Verify that you do not have Pending Pods in the stacklight namespace:

    kubectl -n stacklight get pods --field-selector status.phase=Pending
    
    • If the system response is No resources found in stacklight namespace, all Pods are rescheduled successfully.

    • If the system response still contains some Pods, remove the local persistent volumes (LVP) bound to the target machine as described below.

      Remove LVP from a machine
      1. Connect to the managed cluster as described in steps 5-7 of Connect to a Mirantis Container Cloud cluster.

      2. Identify the pods in the Pending state:

        kubectl get po -n stacklight | grep Pending
        

        Example of system response:

        opensearch-master-2             0/1       Pending       0       49s
        patroni-12-0                    0/3       Pending       0       51s
        patroni-13-0                    0/3       Pending       0       48s
        prometheus-alertmanager-1       0/1       Pending       0       47s
        prometheus-server-0             0/2       Pending       0       47s
        
      3. Verify that the reason for the pod Pending state is a volume node affinity conflict:

        kubectl describe pod <POD_NAME> -n stacklight
        

        Example of system response:

        Events:
          Type     Reason            Age    From               Message
          ----     ------            ----   ----               -------
          Warning  FailedScheduling  6m53s  default-scheduler  0/6 nodes are available:
                                                               3 node(s) didn't match node selector,
                                                               3 node(s) had volume node affinity conflict.
          Warning  FailedScheduling  6m53s  default-scheduler  0/6 nodes are available:
                                                               3 node(s) didn't match node selector,
                                                               3 node(s) had volume node affinity conflict.
        
      4. Obtain the PVC of one of the pods:

        kubectl get pod <POD_NAME> -n stacklight -o=jsonpath='{range .spec.volumes[*]}{.persistentVolumeClaim}{"\n"}{end}'
        

        Example of system response:

        {"claimName":"opensearch-master-opensearch-master-2"}
        
      5. Remove the PVC using the obtained name. For example, for opensearch-master-opensearch-master-2:

        kubectl delete pvc opensearch-master-opensearch-master-2 -n stacklight
        
      6. Delete the pod:

        kubectl delete po <POD_NAME> -n stacklight
        
      7. Verify that a new pod is created and scheduled to the spare node. This may take some time. For example:

        kubectl get po opensearch-master-2 -n stacklight
        NAME                  READY   STATUS    RESTARTS   AGE
        opensearch-master-2   1/1     Running   0          7m1s
        
      8. Repeat the steps above for the remaining pods in the Pending state.
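
        To review the remaining Pending pods together with their PVC claim names in one pass, you can use the following illustrative snippet; the PVC and pod deletion itself remains manual, as described in the previous steps:

        # Print each Pending pod in the stacklight namespace followed by its PVC claim names
        for pod in $(kubectl -n stacklight get pods --field-selector status.phase=Pending -o jsonpath='{.items[*].metadata.name}'); do
          echo "${pod}:"
          kubectl -n stacklight get pod "${pod}" -o jsonpath='{range .spec.volumes[*]}{.persistentVolumeClaim.claimName}{"\n"}{end}'
        done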