AWS-based managed cluster requiring PVs fails to be deployed

Note

The issue below affects only Kubernetes 1.18 deployments and is fixed in the Cluster release 7.0.0, which introduces support for Kubernetes 1.20.

On a management cluster with multiple AWS-based managed clusters, some clusters may fail to complete the deployments that require persistent volumes (PVs), for example, Elasticsearch. Some of the affected pods may get stuck in the Pending state with the "pod has unbound immediate PersistentVolumeClaims" and "node(s) had volume node affinity conflict" errors.
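
To identify the affected pods, you can list the pods stuck in the Pending state and inspect their events for the errors mentioned above. The commands below are a sketch that assumes the StackLight namespace (stacklight); substitute <pod_name> with the name of a Pending pod:

    kubectl -n stacklight get pods --field-selector=status.phase=Pending

    kubectl -n stacklight describe pod <pod_name>

In the describe output, the Events section contains the scheduling errors, including the volume node affinity conflict message.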

Warning

The issue resolution below applies only to HA deployments in which data can be rebuilt from replicas. If you have a non-HA deployment, back up any existing data before proceeding, since all data will be lost when you apply the issue resolution.

To apply the issue resolution:

  1. Obtain the persistent volume claims related to the storage mounts of the affected pods:

    kubectl get pod/<pod_name1> pod/<pod_name2> \
    -o jsonpath='{.spec.volumes[?(@.persistentVolumeClaim)].persistentVolumeClaim.claimName}'
    

    Note

    In the above command and in the subsequent step, substitute the parameters enclosed in angle brackets with the corresponding values.

  2. Delete the affected Pods and PersistentVolumeClaims to reschedule them. For example, for StackLight:

    kubectl -n stacklight delete \
      pod/<pod_name1> pod/<pod_name2> ... \
      pvc/<pvc_name1> pvc/<pvc_name2> ...
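
After the pods and PVCs are deleted, the corresponding controllers recreate them, and the scheduler places the new pods on nodes that satisfy the volume node affinity. As a quick check, assuming the StackLight example above, verify that the new pods reach the Running state and the recreated PVCs are Bound:

    kubectl -n stacklight get pods,pvc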