Cluster update known issues

This section lists the cluster update known issues, with workarounds, for the Mirantis OpenStack for Kubernetes release 21.5.

[4288] Cluster update failure with kubelet being stuck

A MOS managed cluster may fail to update to the latest Cluster release with kubelet being stuck and reporting authorization errors.

The cluster is affected by the issue if you see the Failed to make webhook authorizer request: context canceled error in the kubelet logs:

docker logs ucp-kubelet --since 5m 2>&1 | grep 'Failed to make webhook authorizer request: context canceled'
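
To check every node at once, you can loop over the cluster nodes. The following is a minimal sketch, assuming direct SSH access to the nodes; the SSH user is a placeholder:

for node in $(kubectl get nodes -o jsonpath='{.items[*].status.addresses[?(@.type=="InternalIP")].address}'); do
  echo "=== ${node} ==="
  ssh <user>@${node} "docker logs ucp-kubelet --since 5m 2>&1 | grep -c 'Failed to make webhook authorizer request: context canceled'"
done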

As a workaround, restart the ucp-kubelet container on the affected node(s):

ctr -n com.docker.ucp snapshot rm ucp-kubelet
docker rm -f ucp-kubelet

Note

Ignore failures in the output of the first command, if any.
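
After the container is recreated, verify that kubelet no longer reports the error. A minimal sketch, assuming UCP has restarted ucp-kubelet on the node:

docker ps --filter name=ucp-kubelet --format '{{.Names}}: {{.Status}}'
docker logs ucp-kubelet --since 5m 2>&1 | grep 'Failed to make webhook authorizer request: context canceled' || echo "No authorization errors logged"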


[16987] Cluster update fails at Ceph CSI pod eviction

An update of a MOS managed cluster may fail with the ceph csi-driver is not evacuated yet, waiting… error during the Ceph CSI pod eviction.

Workaround:

  1. Scale the affected StatefulSet of the pod that fails to initialize down to 0 replicas. If the pod belongs to a DaemonSet, such as nova-compute, make sure it is not scheduled on the affected node. For example commands, see the sketch after this procedure.

  2. On every csi-rbdplugin pod, search for the stuck csi-vol (the sketch after this procedure shows one way to obtain the UUID):

    rbd device list | grep <csi-vol-uuid>
    
  3. Unmap the affected csi-vol:

    rbd unmap -o force /dev/rbd<i>
    
  4. Delete the VolumeAttachment of the affected pod:

    kubectl get volumeattachments | grep <csi-vol-uuid>
    kubectl delete volumeattachment <id>
    
  5. Scale the affected StatefulSet back to the original number of replicas and wait until its state is Running. If it is a DaemonSet, run the pod on the affected node again.
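
The exact commands for steps 1, 2, and 5 depend on the environment. The following is a minimal sketch, assuming the StatefulSet runs in the openstack namespace and the cluster uses ceph-csi provisioning; all names in angle brackets are placeholders:

# Steps 1 and 5: scale the StatefulSet down to 0 and, after the cleanup,
# back to its original replica count
kubectl -n openstack scale statefulset <statefulset-name> --replicas=0
kubectl -n openstack scale statefulset <statefulset-name> --replicas=<original-replicas>

# Step 2: one way to obtain the csi-vol UUID is the volumeHandle of the
# PersistentVolume bound to the affected pod; with ceph-csi, the UUID is
# the trailing part of the handle
kubectl get pv <pv-name> -o jsonpath='{.spec.csi.volumeHandle}'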


[15525] HelmBundle controller gets stuck during cluster update

The HelmBundle controller that handles OpenStack releases gets stuck during a cluster update and does not apply HelmBundle changes. The issue is caused by an unlimited release history, which increases the amount of RAM consumed by Tiller. As a workaround, manually limit the release history to three entries.

Workaround:

  1. Remove the old releases:

    1. Clean up releases in the stacklight namespace:

      # Delete the Helm release ConfigMaps that match the pattern, keeping
      # only the newest ${left_items} (default: 3) revisions.
      function cleanup_release_history {
         pattern=$1
         left_items=${2:-3}
         # sort -V orders the revisions; head -n -N excludes the newest N
         # from the deletion list
         for i in $(kubectl -n stacklight get cm | grep "$pattern" | awk '{print $1}' | sort -V | head -n -${left_items})
         do
           kubectl -n stacklight delete cm $i
         done
      }
      

      For example:

      kubectl -n stacklight get cm | grep "openstack-cinder.v" | awk '{print $1}'
      openstack-cinder.v1
      ...
      openstack-cinder.v50
      openstack-cinder.v51
      cleanup_release_history openstack-cinder.v
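
      To clean up the history for several releases at once, you can loop over the release name patterns. A minimal sketch; the patterns listed are examples and may differ in your deployment:

      for pattern in openstack-cinder.v openstack-nova.v openstack-neutron.v openstack-keystone.v openstack-heat.v
      do
        cleanup_release_history "$pattern"
      done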
      
  2. Fix the releases in the FAILED state:

    1. Connect to one of the StackLight Helm controller pods and list the releases in the FAILED state:

      kubectl -n stacklight exec -it stacklight-helm-controller-699cc6949-dtfgr -- sh
      ./helm --host localhost:44134 list
      

      Example of system response:

      # openstack-heat            2313   Wed Jun 23 06:50:55 2021   FAILED   heat-0.1.0-mcp-3860      openstack
      # openstack-keystone        76     Sun Jun 20 22:47:50 2021   FAILED   keystone-0.1.0-mcp-3860  openstack
      # openstack-neutron         147    Wed Jun 23 07:00:37 2021   FAILED   neutron-0.1.0-mcp-3860   openstack
      # openstack-nova            1      Wed Jun 23 07:09:43 2021   FAILED   nova-0.1.0-mcp-3860      openstack
      # openstack-nova-rabbitmq   15     Wed Jun 23 07:04:38 2021   FAILED   rabbitmq-0.1.0-mcp-2728  openstack
      
    2. Determine the reason for the release failure. Typically, failures are caused by changes in immutable objects such as jobs. For example:

      ./helm --host localhost:44134 history openstack-mariadb
      

      Example of system response:

      REVISION   UPDATED                    STATUS     CHART                   APP VERSION   DESCRIPTION
      173        Thu Jun 17 20:26:14 2021   DEPLOYED   mariadb-0.1.0-mcp-2710                Upgrade complete
      212        Wed Jun 23 07:07:58 2021   FAILED     mariadb-0.1.0-mcp-2728                Upgrade "openstack-mariadb" failed: Job.batch "openstack-...
      213        Wed Jun 23 07:55:22 2021   FAILED     mariadb-0.1.0-mcp-2728                Upgrade "openstack-mariadb" failed: Job.batch "exporter-c...
      
    3. Remove the FAILED job and roll back the release. For example:

      kubectl -n openstack delete job -l application=mariadb
      ./helm --host localhost:44134 rollback openstack-mariadb 213
      
    4. Verify that the release is in the DEPLOYED state. For example:

      ./helm --host localhost:44134 history openstack-mariadb
      
    5. Perform the steps above for all releases in the FAILED state one by one.
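
      To verify that no releases remain in the FAILED state, Helm v2 can filter for them directly. A minimal sketch, run inside the controller pod:

      ./helm --host localhost:44134 list --failed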

  3. Set TILLER_HISTORY_MAX in the StackLight controller deployment to 3:

    kubectl -n stacklight edit deployment stacklight-helm-controller
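
    In the editor, set the variable in the container environment. A minimal sketch of the relevant deployment fragment; the container name is an assumption and may differ in your deployment:

    spec:
      template:
        spec:
          containers:
          - name: controller
            env:
            - name: TILLER_HISTORY_MAX
              value: "3"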