Known issues

This section lists MOSK known issues with workarounds for the Mirantis OpenStack for Kubernetes release 24.1.5.

OpenStack

[31186,34132] Pods get stuck during MariaDB operations

Due to an upstream MariaDB issue, during MariaDB operations on a management cluster, Pods may get stuck in continuous restarts with the following example error:

[ERROR] WSREP: Corrupt buffer header: \
addr: 0x7faec6f8e518, \
seqno: 3185219421952815104, \
size: 909455917, \
ctx: 0x557094f65038, \
flags: 11577. store: 49, \
type: 49

Workaround:

  1. Create a backup of the /var/lib/mysql directory on the mariadb-server Pod.

  2. Verify that other replicas are up and ready.

  3. Remove the galera.cache file for the affected mariadb-server Pod.

  4. Remove the affected mariadb-server Pod or wait until it is automatically restarted.

After Kubernetes restarts the Pod, the Pod clones the database in 1-2 minutes and restores the quorum.
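
The following is a minimal command sketch of the above steps; the namespace, Pod and container names, and the backup location are placeholders and assumptions to adjust for your environment:

# Back up the MariaDB data directory of the affected replica
kubectl -n <namespace> exec <mariadb-server-pod> -c mariadb -- \
    cp -a /var/lib/mysql /var/lib/mysql-backup

# Verify that the other replicas are up and ready
kubectl -n <namespace> get pods | grep mariadb-server

# Remove the galera.cache file on the affected replica
kubectl -n <namespace> exec <mariadb-server-pod> -c mariadb -- \
    rm /var/lib/mysql/galera.cache

# Delete the affected Pod or wait until it is restarted automatically
kubectl -n <namespace> delete pod <mariadb-server-pod>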

[36524] etcd enters a panic state after replacement of the controller node

Fixed in 24.2

After the controller node is provisioned, the etcd pod starts before Kubernetes networking is fully operational. As a result, the pod cannot resolve DNS or establish connections with the other members, which ultimately causes the etcd service to enter a panic state.

Workaround:

  1. Delete the PVC related to the replaced controller node:

    kubectl -n openstack delete pvc <PVC-NAME>
    
  2. Delete pods related to the crashing etcd service on the replaced controller node:

    kubectl -n openstack delete pods <ETCD-POD-NAME>
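
To identify the objects to delete, you can first list the etcd PVCs and pods and pick the ones bound to the replaced controller node; the simple name filter below is an assumption, adjust it to your environment:

kubectl -n openstack get pvc | grep etcd
kubectl -n openstack get pods -o wide | grep etcd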
    

Tungsten Fabric

[40032] tf-rabbitmq fails to start after rolling reboot

Occasionally, RabbitMQ instances in tf-rabbitmq pods fail to enable the tracking_records_in_ets feature flag during initialization.

To work around the problem, restart the affected pods manually.
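
For example, to locate and restart the affected pods, assuming the default tf-rabbitmq pod naming in the tf namespace:

kubectl -n tf get pods | grep tf-rabbitmq
kubectl -n tf delete pod <tf-rabbitmq-pod-name>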

[13755] TF pods switch to CrashLoopBackOff after a simultaneous reboot

Rebooting all nodes of a TFConfig or TFAnalytics Cassandra cluster, maintenance, or other circumstances that cause the Cassandra pods to start simultaneously may break the Cassandra TFConfig and/or TFAnalytics cluster. In this case, the Cassandra nodes do not join the ring and do not update the IP addresses of the neighbor nodes. As a result, the TF services cannot operate the Cassandra cluster(s).

To verify that a Cassandra cluster is affected:

Run the nodetool status command specifying the config or analytics cluster and the replica number:

kubectl -n tf exec -it tf-cassandra-<config/analytics>-dc1-rack1-<replica number> -c cassandra -- nodetool status

Example of system response with outdated IP addresses:

Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens       Owns (effective)  Host ID                               Rack
DN  <outdated ip>   ?          256          64.9%             a58343d0-1e3f-4d54-bcdf-9b9b949ca873  r1
DN  <outdated ip>   ?          256          69.8%             67f1d07c-8b13-4482-a2f1-77fa34e90d48  r1
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load       Tokens       Owns (effective)  Host ID                               Rack
UN  <actual ip>      3.84 GiB   256          65.2%             7324ebc4-577a-425f-b3de-96faac95a331  rack1

Workaround:

Manually delete the Cassandra pod from the failed config or analytics cluster to re-initiate the bootstrap process for one of the Cassandra nodes:

kubectl -n tf delete pod tf-cassandra-<config/analytics>-dc1-rack1-<replica_num>

Ceph

[42903] Inconsistent handling of missing pools by ceph-controller

Fixed in 24.2

In rare cases, when ceph-controller cannot confirm the existence of MOSK pools, instead of denying the action and raising an error, it proceeds to recreate the Cinder Ceph client. This behavior may cause issues with OpenStack workloads.

Workaround:

  1. In spec.cephClusterSpec of the KaaSCephCluster custom resource, remove the external section.
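
    For example, open the KaaSCephCluster resource for editing and delete the external block under spec.cephClusterSpec; the project namespace and resource name below are placeholders:

      kubectl -n <project-namespace> edit kaascephcluster <kaascephcluster-name>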

  2. Wait for the Not all mgrs are running: 1/2 message to disappear from the KaaSCephCluster status.
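
    To monitor the status, you can periodically inspect the resource and check whether the message is still present; the project namespace and resource name below are placeholders:

      kubectl -n <project-namespace> get kaascephcluster <kaascephcluster-name> -o yaml | grep "Not all mgrs"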

  3. Verify that the nova Ceph client integrated with MOSK has the same keyring as in the Ceph cluster:

    Keyring verification for the Ceph nova client
    1. Compare the keyring used in the nova-compute and libvirt pods with the one from the Ceph cluster:

      kubectl -n openstack get pod | grep nova-compute
      kubectl -n openstack exec -it <nova-compute-pod-name> -- cat /etc/ceph/ceph.client.nova.keyring
      kubectl -n openstack get pod | grep libvirt
      kubectl -n openstack exec -it <libvirt-pod-name> -- cat /etc/ceph/ceph.client.nova.keyring
      kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph auth get client.nova
      
    2. If the keyrings differ, replace the one stored in the Ceph cluster with the key from the OpenStack pods:

      kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash
      ceph auth get client.nova -o /tmp/nova.key
      vi /tmp/nova.key
      # In the editor, change the "key" value to the key obtained from the OpenStack pods,
      # then save and exit the editor
      ceph auth import -i /tmp/nova.key
      
    3. Verify that the client.nova keyring of the Ceph cluster matches the one obtained from the OpenStack pods:

      kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph auth get client.nova
      
    4. Verify that nova-compute and libvirt pods have access to the Ceph cluster:

      kubectl -n openstack get pod | grep nova-compute
      kubectl -n openstack exec -it <nova-compute-pod-name> -- ceph -s -n client.nova
      kubectl -n openstack get pod | grep libvirt
      kubectl -n openstack exec -it <libvirt-pod-name> -- ceph -s -n client.nova
      
  4. Verify that the cinder Ceph client integrated with MOSK has the same keyring as in the Ceph cluster:

    Keyring verification for the Ceph cinder client
    1. Compare the keyring used in the cinder-volume pods with the one from the Ceph cluster:

      kubectl -n openstack get pod | grep cinder-volume
      kubectl -n openstack exec -it <cinder-volume-pod-name> -- cat /etc/ceph/ceph.client.cinder.keyring
      kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph auth get client.cinder
      
    2. If the keyrings differ, replace the one stored in the Ceph cluster with the key from the OpenStack pods:

      kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash
      ceph auth get client.cinder -o /tmp/cinder.key
      vi /tmp/cinder.key
      # In the editor, change the "key" value to the key obtained from the OpenStack pods,
      # then save and exit the editor
      ceph auth import -i /tmp/cinder.key
      
    3. Verify that the client.cinder keyring of the Ceph cluster matches the one obtained from the OpenStack pods:

      kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph auth get client.cinder
      
    4. Verify that the cinder-volume pods have access to the Ceph cluster:

      kubectl -n openstack get pod | grep cinder-volume
      kubectl -n openstack exec -it <cinder-volume-pod-name> -- ceph -s -n client.cinder
      
  5. Verify that the glance Ceph client integrated with MOSK has the same keyring as in the Ceph cluster:

    Keyring verification for the Ceph glance client
    1. Compare the keyring used in the glance-api pods with the one from the Ceph cluster:

      kubectl -n openstack get pod | grep glance-api
      kubectl -n openstack exec -it <glance-api-pod-name> -- cat /etc/ceph/ceph.client.glance.keyring
      kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph auth get client.glance
      
    2. If the keyrings differ, replace the one stored in the Ceph cluster with the key from the OpenStack pods:

      kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash
      ceph auth get client.glance -o /tmp/glance.key
      vi /tmp/glance.key
      # In the editor, change the "key" value to the key obtained from the OpenStack pods,
      # then save and exit the editor
      ceph auth import -i /tmp/glance.key
      
    3. Verify that the client.glance keyring of the Ceph cluster matches the one obtained from the OpenStack pods:

      kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph auth get client.glance
      
    4. Verify that the glance-api pods have access to the Ceph cluster:

      kubectl -n openstack get pod | grep glance-api
      kubectl -n openstack exec -it <glance-api-pod-name> -- ceph -s -n client.glance