Reschedule stateful applications

Rescheduling of stateful applications may be required when you replace a permanently failed node, decommission a node, or migrate applications to nodes with more suitable hardware, among other use cases.

MOS deployment profiles include the following stateful applications:

  • OpenStack database (MariaDB)

  • OpenStack coordination (etcd)

  • OpenStack Time Series Database back end (Redis)

Each stateful application from the list above has a persistent volume claim (PVC) based on a local persistent volume per pod. Each control plane node has a set of local volumes available. To migrate an application pod to another node, recreate its PVC with a persistent volume from the target node.

Caution

A stateful application pod can only be migrated to a node that does not contain other pods of this application.

Caution

When a PVC is removed, all data present in the related persistent volume is removed from the node as well.

Reschedule pods to another control plane node

This section describes how to reschedule pods for MariaDB, etcd, and Redis to another control plane node.

To reschedule pods for MariaDB:

  1. Recreate PVCs as described in Recreate a PVC on another control plane node.

  2. Remove the pod:

    Note

    To remove a pod from a node in the NotReady state, add --grace-period=0 --force to the following command.

    kubectl -n openstack delete pod <STATEFULSET-NAME>-<NUMBER>
    
  3. Wait until the pod appears in the Ready state.

    When the rescheduling is finalized, the <STATEFULSET-NAME>-<NUMBER> pod rejoins the Galera cluster with a clean MySQL data directory and requests the Galera state transfer from the available nodes.
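    If preferred, the wait in the previous step can be scripted with kubectl wait instead of polling manually. A minimal sketch, assuming the same pod name placeholder as above and an illustrative 600-second timeout:

    ```shell
    # Block until the rescheduled MariaDB pod reports Ready.
    # The timeout value is illustrative; adjust it to your deployment.
    kubectl -n openstack wait pod <STATEFULSET-NAME>-<NUMBER> \
        --for=condition=Ready --timeout=600s
    ```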

To reschedule pods for Redis:

  1. Recreate PVCs as described in Recreate a PVC on another control plane node.

  2. Remove the pod:

    Note

    To remove a pod from a node in the NotReady state, add --grace-period=0 --force to the following command.

    kubectl -n openstack-redis delete pod <STATEFULSET-NAME>-<NUMBER>
    
  3. Wait until the pod is in the Ready state.

To reschedule pods for etcd:

Warning

A short cluster downtime is expected during the etcd rescheduling procedure.

  1. Identify the ID of the etcd replica on the failed node. The ID is the numeric suffix in the pod name. For example, the ID of the etcd-etcd-0 pod is 0. This ID is required during the rescheduling procedure.

    kubectl -n openstack get pods | grep etcd
    
    etcd-etcd-0                    0/1     Pending                 0          3m52s
    etcd-etcd-1                    1/1     Running                 0          39m
    etcd-etcd-2                    1/1     Running                 0          39m
    
  2. If the replica ID is 1 or higher:

    1. Add the coordination section to the spec.services section of the OsDpl object:

      spec:
        services:
          coordination:
            etcd:
              values:
                conf:
                  etcd:
                    ETCD_INITIAL_CLUSTER_STATE: existing
      
    2. Wait for the etcd StatefulSet to apply the new state parameter:

      kubectl -n openstack get sts etcd-etcd -o jsonpath='{.spec.template.spec.containers[0].env[?(@.name=="ETCD_INITIAL_CLUSTER_STATE")].value}'
      
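      To avoid re-running the verification command manually, the check above can be wrapped in a simple polling loop. A sketch, assuming the same StatefulSet name and an arbitrary 5-second interval:

      ```shell
      # Poll until the StatefulSet template carries the new cluster state.
      until [ "$(kubectl -n openstack get sts etcd-etcd -o \
          jsonpath='{.spec.template.spec.containers[0].env[?(@.name=="ETCD_INITIAL_CLUSTER_STATE")].value}')" = "existing" ]; do
        sleep 5
      done
      ```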
  3. Scale down the etcd StatefulSet to 0 replicas. Verify that no replicas are running on the failed node.

    kubectl -n openstack scale sts etcd-etcd --replicas=0
    
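    To confirm that the scale-down has completed, check that no etcd pods remain. A sketch, assuming the default etcd-etcd pod name prefix:

    ```shell
    # Expect no matching pods once all etcd replicas are terminated.
    kubectl -n openstack get pods --no-headers | grep etcd-etcd \
        || echo "no etcd pods left"
    ```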
  4. Recreate PVCs as described in Recreate a PVC on another control plane node.

  5. Scale the etcd StatefulSet to the initial number of replicas:

    kubectl -n openstack scale sts etcd-etcd --replicas=<NUMBER-OF-REPLICAS>
    
  6. Wait until all etcd pods are in the Ready state.
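    The wait can be scripted with kubectl wait. A sketch that assumes the default etcd-etcd-<N> pod naming and three replicas; adjust the pod list and timeout to your deployment:

    ```shell
    # Block until every etcd replica reports Ready.
    kubectl -n openstack wait pod etcd-etcd-0 etcd-etcd-1 etcd-etcd-2 \
        --for=condition=Ready --timeout=600s
    ```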

  7. Verify that the etcd cluster is healthy:

    kubectl -n openstack exec -t etcd-etcd-1 -- etcdctl cluster-health
    
  8. If the replica ID is 1 or higher:

    1. Remove the coordination section from the spec.services section of the OsDpl object.

    2. Wait until all etcd pods appear in the Ready state.

    3. Verify that the etcd cluster is healthy:

      kubectl -n openstack exec -t etcd-etcd-1 -- etcdctl cluster-health
      
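  The coordination section removed in the last step can be edited out of the OsDpl object interactively. A sketch, assuming the same OsDpl object name placeholder as in the PVC recreation procedure:

  ```shell
  # Open the OsDpl object for editing and delete the
  # spec.services.coordination section added earlier.
  kubectl -n openstack edit osdpl <OSDPL_OBJECT_NAME>
  ```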

Recreate a PVC on another control plane node

This section describes how to recreate a PVC of a stateful application on another control plane node.

To recreate a PVC on another control plane node:

  1. Select one of the persistent volumes available on the node:

    Caution

    A stateful application pod can only be migrated to the node that does not contain other pods of this application.

    NODE_NAME=<NODE-NAME>
    STORAGE_CLASS=$(kubectl -n openstack get osdpl <OSDPL_OBJECT_NAME> -o jsonpath='{.spec.local_volume_storage_class}')
    kubectl -n openstack get pv -o json | jq --arg NODE_NAME $NODE_NAME --arg STORAGE_CLASS $STORAGE_CLASS -r '.items[] | select(.spec.nodeAffinity.required.nodeSelectorTerms[0].matchExpressions[0].values[0] == $NODE_NAME and .spec.storageClassName == $STORAGE_CLASS and .status.phase == "Available") | .metadata.name'
    
  2. Because the new PVC must contain the same parameters as the deleted one except for volumeName, save the old PVC configuration in YAML:

    kubectl -n <NAMESPACE> get pvc <PVC-NAME> -o yaml > <OLD-PVC>.yaml
    

    Note

    <NAMESPACE> is a Kubernetes namespace where the PVC is created. For Redis, specify openstack-redis, for other applications specify openstack.
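    You can also print the individual values that are reused when recreating the PVC in step 4 directly, without inspecting the saved file. A sketch using jsonpath:

    ```shell
    # Print the storage size and storage class of the old PVC;
    # these values are needed for the replacement PVC.
    kubectl -n <NAMESPACE> get pvc <PVC-NAME> \
        -o jsonpath='{.spec.resources.requests.storage}{"\n"}{.spec.storageClassName}{"\n"}'
    ```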

  3. Delete the old PVC:

    kubectl -n <NAMESPACE> delete pvc <PVC-NAME>
    

    Note

    If a PVC is stuck in the Terminating state, run kubectl -n <NAMESPACE> edit pvc <PVC-NAME> and remove the finalizers section from the PVC metadata.
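    Alternatively, the finalizers can be cleared non-interactively with kubectl patch. A sketch using the same placeholders as above:

    ```shell
    # Remove all finalizers from the stuck PVC so that its deletion completes.
    kubectl -n <NAMESPACE> patch pvc <PVC-NAME> \
        --type merge -p '{"metadata":{"finalizers":null}}'
    ```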

  4. Create a PVC with a new persistent volume:

    cat <<EOF | kubectl apply -f -
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: <PVC-NAME>
      namespace: <NAMESPACE>
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: <STORAGE-SIZE>
      storageClassName: <STORAGE-CLASS>
      volumeMode: Filesystem
      volumeName: <PV-NAME>
    EOF
    

    Caution

    The <STORAGE-SIZE>, <STORAGE-CLASS>, and <NAMESPACE> values must match the storage, storageClassName, and namespace values from the <OLD-PVC>.yaml file with the old PVC configuration.