OpenStack known issues

This section lists the OpenStack known issues with workarounds for the Mirantis OpenStack for Kubernetes release 24.2.

[31186,34132] Pods get stuck during MariaDB operations

During MariaDB operations on a management cluster, Pods may get stuck in continuous restarts with the following example error:

[ERROR] WSREP: Corrupt buffer header: \
addr: 0x7faec6f8e518, \
seqno: 3185219421952815104, \
size: 909455917, \
ctx: 0x557094f65038, \
flags: 11577. store: 49, \
type: 49
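
To check whether a mariadb-server Pod is affected, inspect its logs for the error above. The namespace and Pod name in the following example are illustrative and depend on your deployment:

    kubectl -n <namespace> logs mariadb-server-0 | grep 'Corrupt buffer header'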

Workaround:

  1. Create a backup of the /var/lib/mysql directory on the mariadb-server Pod.

  2. Verify that other replicas are up and ready.

  3. Remove the galera.cache file for the affected mariadb-server Pod.

  4. Remove the affected mariadb-server Pod or wait until it is automatically restarted.

After Kubernetes restarts the Pod, the Pod clones the database in 1-2 minutes and restores the quorum.
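
The following commands illustrate the workaround steps above as a sketch only. The namespace, Pod name, and label selector are examples and may differ in your deployment:

    # Step 1: back up the data directory of the affected Pod
    kubectl -n <namespace> exec mariadb-server-2 -- cp -a /var/lib/mysql /var/lib/mysql.bak

    # Step 2: verify that the other replicas are up and ready
    kubectl -n <namespace> get pods -l application=mariadb

    # Step 3: remove the galera.cache file of the affected Pod
    kubectl -n <namespace> exec mariadb-server-2 -- rm /var/lib/mysql/galera.cache

    # Step 4: remove the affected Pod; Kubernetes restarts it automatically
    kubectl -n <namespace> delete pod mariadb-server-2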

[42386] A load balancer service does not obtain the external IP address

Due to a MetalLB upstream issue, a load balancer service may not obtain the external IP address.

The issue occurs when two services share the same external IP address and have the same externalTrafficPolicy value. Initially, the services have the external IP address assigned and are accessible. After modifying the externalTrafficPolicy value for both services from Cluster to Local, the service that was changed first remains with no external IP address assigned, while the service that was changed later obtains the external IP address as expected.
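
For example, the following command switches the policy of one of the services to Local. The namespace and service name are taken from the example below and are illustrative:

    kubectl -n stacklight patch svc iam-proxy-prometheus -p '{"spec":{"externalTrafficPolicy":"Local"}}'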

To work around the issue, make a dummy change to the service object whose external IP is <pending>:

  1. Identify the service that is stuck:

    kubectl get svc -A | grep pending
    

    Example of system response:

    stacklight  iam-proxy-prometheus  LoadBalancer  10.233.28.196  <pending>  443:30430/TCP
    
  2. Add an arbitrary label to the service that is stuck; the update triggers reconciliation of the service. For example:

    kubectl label svc -n stacklight iam-proxy-prometheus reconcile=1
    

    Example of system response:

    service/iam-proxy-prometheus labeled
    
  3. Verify that the external IP was allocated to the service:

    kubectl get svc -n stacklight iam-proxy-prometheus
    

    Example of system response:

    NAME                  TYPE          CLUSTER-IP     EXTERNAL-IP  PORT(S)        AGE
    iam-proxy-prometheus  LoadBalancer  10.233.28.196  10.0.34.108  443:30430/TCP  12d
    

[42725] OpenStack Controller Exporter fails to scrape metrics after credential rotation

Fixed in 24.1.6

Occasionally, after the credential rotation, OpenStack Controller Exporter fails to scrape the metrics. To work around the issue, restart the pod running openstack-controller-exporter.
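
For example, assuming that the exporter runs in the osh-system namespace:

    kubectl -n osh-system get pods | grep exporter
    kubectl -n osh-system delete pod <openstack-controller-exporter-pod-name>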

[43058] [Antelope] Cronjob for MariaDB is not created

Sometimes, after a change to the OpenStackDeployment custom resource, the resource does not transition to the APPLYING state as expected.

To work around the issue, restart the openstack-controller pod in the osh-system namespace.
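
For example:

    kubectl -n osh-system get pods | grep openstack-controller
    kubectl -n osh-system delete pod <openstack-controller-pod-name>

After the deletion, Kubernetes recreates the pod automatically.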

[44813] [Antelope] Traffic disruption observed on trunk ports

Fixed in 24.2.1 and 24.3

After upgrading to OpenStack Antelope, clusters with configured trunk ports experience traffic flow disruptions that block cluster updates.

To work around the issue, pin the MOSK Networking service (OpenStack Neutron) container image by adding the following content to the OpenStackDeployment custom resource:

spec:
  services:
    networking:
      neutron:
        values:
          images:
            tags:
              neutron_openvswitch_agent: mirantis.azurecr.io/openstack/neutron:antelope-jammy-20240816113600
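
You can add this override by editing the OpenStackDeployment object directly. The object name osh-dev in the following example is illustrative; use the name of the object in your deployment:

    kubectl -n openstack edit osdpl osh-dev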

Caution

Remove the pinning after updating to MOSK 24.2.1, or to a later patch or major release.

[45879] [Antelope] Incorrect packet handling between instance and its gateway

Fixed in 24.2.1

After upgrading to OpenStack Antelope, virtual machines experience connectivity disruptions when sending data over virtual networks: network packets of full MTU size are dropped.

The issue affects the MOSK clusters with Open vSwitch as the networking backend and with the following specific MTU settings:

  • The MTU configured on the tunnel interface of compute nodes is equal to the value of the spec:services:networking:neutron:values:conf:neutron:DEFAULT:global_physnet_mtu parameter of the OpenStackDeployment custom resource (if not specified, the default is 1500 bytes).

    If the MTU of the tunnel interface is higher by at least 4 bytes, the cluster is not affected by the issue.

  • The cluster contains virtual machines whose guest operating system network interfaces have an MTU larger than the value of the global_physnet_mtu parameter above minus 50 bytes.
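
For example, with the default global_physnet_mtu of 1500 bytes, the cluster is affected if the tunnel interface MTU on compute nodes is also 1500 bytes and any guest operating system network interface has an MTU larger than 1450 bytes (1500 minus the 50-byte tunneling overhead). With a tunnel interface MTU of 1504 bytes or larger, the cluster is not affected.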

To work around the issue, pin the MOSK Networking service (OpenStack Neutron) container image by adding the following content to the OpenStackDeployment custom resource:

spec:
  services:
    networking:
      neutron:
        values:
          images:
            tags:
              neutron_openvswitch_agent: mirantis.azurecr.io/openstack/neutron:antelope-jammy-20240816113600

Caution

Remove the pinning after updating to MOSK 24.2.1, or to a later patch or major release.