OpenStack known issues

This section lists the OpenStack known issues with workarounds for the Mirantis OpenStack for Kubernetes release 21.3.


[15525] HelmBundle Controller gets stuck during cluster update

Affects only MOS 21.3

The HelmBundle Controller that handles OpenStack releases gets stuck during a cluster update and does not apply HelmBundle changes. The issue is caused by an unlimited release history, which increases the amount of RAM consumed by Tiller. As a workaround, manually limit the release history to three revisions.

Workaround:

  1. Remove the old releases:

    1. Clean up releases in the stacklight namespace:

      function cleanup_release_history {
         pattern=$1
         left_items=${2:-3}  # number of newest release revisions to keep
         # List release ConfigMaps matching the pattern, sort them by
         # version, and delete all but the newest ${left_items} revisions.
         for i in $(kubectl -n stacklight get cm | grep "$pattern" | awk '{print $1}' | sort -V | head -n -${left_items})
         do
           kubectl -n stacklight delete cm "$i"
         done
      }
      

      For example:

      kubectl -n stacklight get cm |grep "openstack-cinder.v" | awk '{print $1}'
      openstack-cinder.v1
      ...
      openstack-cinder.v50
      openstack-cinder.v51
      cleanup_release_history openstack-cinder.v
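
      The pruning pipeline used by cleanup_release_history can be exercised locally without a cluster. The following stand-alone sketch feeds hypothetical ConfigMap names through the same sort -V | head filter and prints the revisions that would be deleted; note that the version sort keeps .v9 before .v10, which a plain lexical sort would not:

      ```shell
      # Stand-alone check of the pruning pipeline (no cluster needed).
      # The release names below are made up; only the 3 newest revisions
      # are kept, so .v1 and .v2 are printed as deletion candidates.
      printf 'openstack-cinder.v%s\n' 1 2 9 10 11 | sort -V | head -n -3
      ```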
      
  2. Fix the releases in the FAILED state:

    1. Connect to one of StackLight Helm Controller pods and list the releases in the FAILED state:

      kubectl -n stacklight exec -it stacklight-helm-controller-699cc6949-dtfgr -- sh
      ./helm --host localhost:44134 list
      

      Example of system response:

      # openstack-heat            2313   Wed Jun 23 06:50:55 2021   FAILED   heat-0.1.0-mcp-3860      openstack
      # openstack-keystone        76     Sun Jun 20 22:47:50 2021   FAILED   keystone-0.1.0-mcp-3860  openstack
      # openstack-neutron         147    Wed Jun 23 07:00:37 2021   FAILED   neutron-0.1.0-mcp-3860   openstack
      # openstack-nova            1      Wed Jun 23 07:09:43 2021   FAILED   nova-0.1.0-mcp-3860      openstack
      # openstack-nova-rabbitmq   15     Wed Jun 23 07:04:38 2021   FAILED   rabbitmq-0.1.0-mcp-2728  openstack
      
    2. Determine the reason for a release failure. Typically, a release fails because of changes to immutable objects, such as jobs. For example:

      ./helm --host localhost:44134 history openstack-mariadb
      

      Example of system response:

      REVISION   UPDATED                    STATUS     CHART                   APP VERSION   DESCRIPTION
      173        Thu Jun 17 20:26:14 2021   DEPLOYED   mariadb-0.1.0-mcp-2710                Upgrade complete
      212        Wed Jun 23 07:07:58 2021   FAILED     mariadb-0.1.0-mcp-2728                Upgrade "openstack-mariadb" failed: Job.batch "openstack-...
      213        Wed Jun 23 07:55:22 2021   FAILED     mariadb-0.1.0-mcp-2728                Upgrade "openstack-mariadb" failed: Job.batch "exporter-c...
      
    3. Remove the FAILED job and roll back the release. For example:

      kubectl -n openstack delete job -l application=mariadb
      ./helm --host localhost:44134 rollback openstack-mariadb 213
      
    4. Verify that the release is in the DEPLOYED state. For example:

      ./helm --host localhost:44134 history openstack-mariadb
      
    5. Perform the steps above for all releases in the FAILED state one by one.
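
      When many releases are in the FAILED state, their names can be extracted from the helm list output with a short filter. This is a local sketch, assuming the column layout shown in the example response above (the status is the eighth whitespace-separated field); in practice, pipe ./helm --host localhost:44134 list into the same awk filter:

      ```shell
      # Extract FAILED release names from sample `helm list` output.
      # The sample lines below mirror the example response above.
      awk '$8 == "FAILED" {print $1}' <<'EOF'
      openstack-heat      2313   Wed Jun 23 06:50:55 2021   FAILED     heat-0.1.0-mcp-3860      openstack
      openstack-mariadb   173    Thu Jun 17 20:26:14 2021   DEPLOYED   mariadb-0.1.0-mcp-2710   openstack
      EOF
      ```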

  3. Set the TILLER_HISTORY_MAX environment variable to 3 in the StackLight Helm Controller deployment:

    kubectl -n stacklight edit deployment stacklight-helm-controller
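
    In the editor, the limit is applied through the Tiller environment variable on the controller container. A sketch of the relevant fragment; the container name is an assumption and may differ in your deployment:

    ```
    spec:
      template:
        spec:
          containers:
          - name: controller        # container name may differ
            env:
            - name: TILLER_HISTORY_MAX
              value: "3"
    ```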
    

[13273] Octavia amphora may get stuck after cluster update

Fixed in MOS 21.4

After the MOS cluster update, an Octavia amphora may get stuck, with the "WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [-] Could not connect to instance. Retrying." error message present in the Octavia worker logs. As a workaround, manually switch the Octavia amphora provider driver from V2 back to V1 (the amphora driver).

Workaround:

  1. In the OsDpl CR, specify the following configuration:

    spec:
      services:
        load-balancer:
          octavia:
            values:
              conf:
                octavia:
                  api_settings:
                    default_provider_driver: amphora
    
  2. Trigger the OpenStack deployment to restart Octavia:

    kubectl apply -f openstackdeployment.yaml
    

    To monitor the status:

    kubectl -n openstack get pods
    kubectl -n openstack describe osdpl osh-dev
    

[6912] Octavia load balancers may not work properly with DVR

Limitation

When Neutron is deployed in the DVR mode, Octavia load balancers may not work correctly. The symptoms include both failure to properly balance traffic and failure to perform an amphora failover. For details, see DVR incompatibility with ARP announcements and VRRP.