Ceph known issues

This section lists the Ceph known issues with workarounds for the Mirantis OpenStack for Kubernetes release 23.1.

[30857] Irrelevant error during Ceph OSD deployment on removable devices

Fixed in 23.2

The deployment of Ceph OSDs fails with the following messages in the status section of the KaaSCephCluster custom resource:

shortClusterInfo:
  messages:
  - Not all osds are deployed
  - Not all osds are in
  - Not all osds are up

To determine whether your cluster is affected, verify whether the devices used for the Ceph OSD deployment on the AMD hosts are removable. For example, if the sdb device name is specified in spec.cephClusterSpec.nodes.storageDevices of the KaaSCephCluster custom resource for the affected host, run:

# cat /sys/block/sdb/removable
1

The system output shows that the messages in the status section are caused by the hotplug functionality enabled on the AMD nodes, which marks all drives as removable. Ceph in MOSK does not support the hotplug functionality.

As a workaround, disable the hotplug functionality in the BIOS settings for disks that are configured to be used as Ceph OSD data devices.
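To scan a host for all such devices at once, the check above can be looped over sysfs. This is a hedged sketch, not part of the official procedure: the /sys/block/<device>/removable layout is standard Linux sysfs, and the helper takes the directory as a parameter so the logic can be demonstrated on a throwaway fixture instead of a live host.

```shell
# Hedged sketch: list block devices whose "removable" flag is set,
# scanning a sysfs-style directory tree (on a real host, pass /sys/block).
list_removable() {
    for dev in "$1"/*; do
        [ -f "$dev/removable" ] || continue
        [ "$(cat "$dev/removable")" = "1" ] && basename "$dev"
    done
}

# Demonstration on a throwaway fixture that mimics /sys/block:
fixture=$(mktemp -d)
mkdir -p "$fixture/sda" "$fixture/sdb"
echo 0 > "$fixture/sda/removable"
echo 1 > "$fixture/sdb/removable"
list_removable "$fixture"   # prints: sdb
rm -rf "$fixture"
```

Any device the helper prints would trigger the misleading status messages if used as a Ceph OSD data device.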

[31630] Ceph cluster upgrade to Pacific is stuck with Rook connection failure

Fixed in 23.2

During the update to MOSK 23.1, the Ceph cluster may get stuck while upgrading to Ceph Pacific.

To verify whether your cluster is affected:

The cluster is affected if both of the following conditions are true:

  • The ceph-status-controller pod on the MOSK cluster contains the following log lines:

    kubectl -n ceph-lcm-mirantis logs <ceph-status-controller-podname>
    ...
    E0405 08:07:15.603247       1 cluster.go:222] Cluster health: "HEALTH_ERR"
    W0405 08:07:15.603266       1 cluster.go:230] found issue error: {Urgent failed to get status. . timed out: exit status 1}
    
  • The KaaSCephCluster custom resource contains the following configuration option in the rookConfig section:

    spec:
      cephClusterSpec:
        rookConfig:
          ms_crc_data: "false" # or 'ms crc data: "false"'
    

As a workaround, remove the ms_crc_data (or ms crc data) configuration key from the KaaSCephCluster custom resource and wait for the rook-ceph-mon pods to restart on the MOSK cluster:

kubectl -n rook-ceph get pod -l app=rook-ceph-mon -w
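If you prefer a single command over editing the resource manually, the key can be removed with a JSON patch. This is a hedged sketch: <mgmt-namespace> and <cluster-name> are placeholders for your environment, and the pointer path assumes the ms_crc_data spelling (adjust the last path segment if your spec uses the "ms crc data" spelling).

```shell
# Hedged sketch: remove the ms_crc_data key from rookConfig via a JSON patch.
# The kubectl line is commented out because <mgmt-namespace> and <cluster-name>
# are placeholders for a live environment.
PATCH='[{"op": "remove", "path": "/spec/cephClusterSpec/rookConfig/ms_crc_data"}]'
# kubectl -n <mgmt-namespace> patch kaascephcluster <cluster-name> --type=json -p "$PATCH"

# Sanity-check that the patch document is well-formed JSON before applying it:
printf '%s' "$PATCH" | python3 -m json.tool > /dev/null && echo "patch is valid JSON"
```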

[31555] Ceph can find only 1 out of 2 ‘mgr’ after update to MOSK 23.1

Fixed in 23.2

After the update to MOSK 23.1, the status section of the KaaSCephCluster custom resource may contain the following message:

shortClusterInfo:
  messages:
  - Not all mgrs are running: 1/2

To verify whether the cluster is affected:

If the KaaSCephCluster spec contains the external section, the cluster is affected:

spec:
  cephClusterSpec:
    external:
      enable: false

Workaround:

  1. In spec.cephClusterSpec of the KaaSCephCluster custom resource, remove the external section.

  2. Wait for the Not all mgrs are running: 1/2 message to disappear from the KaaSCephCluster status.

  3. Verify that the nova Ceph client integrated with MOSK has the same keyring as in the Ceph cluster.

    Keyring verification for the Ceph nova client
    1. Compare the keyring used in the nova-compute and libvirt pods with the one from the Ceph cluster:

      kubectl -n openstack get pod | grep nova-compute
      kubectl -n openstack exec -it <nova-compute-pod-name> -- cat /etc/ceph/ceph.client.nova.keyring
      kubectl -n openstack get pod | grep libvirt
      kubectl -n openstack exec -it <libvirt-pod-name> -- cat /etc/ceph/ceph.client.nova.keyring
      kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph auth get client.nova
      
    2. If the keyrings differ, replace the one stored in the Ceph cluster with the key from the OpenStack pods:

      kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash
      ceph auth get client.nova -o /tmp/nova.key
      vi /tmp/nova.key
      # in the editor, change "key" value to the key obtained from the OpenStack pods
      # then save and exit editing
      ceph auth import -i /tmp/nova.key
      
    3. Verify that the client.nova keyring of the Ceph cluster matches the one obtained from the OpenStack pods:

      kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph auth get client.nova
      
    4. Verify that nova-compute and libvirt pods have access to the Ceph cluster:

      kubectl -n openstack get pod | grep nova-compute
      kubectl -n openstack exec -it <nova-compute-pod-name> -- ceph -s -n client.nova
      kubectl -n openstack get pod | grep libvirt
      kubectl -n openstack exec -it <libvirt-pod-name> -- ceph -s -n client.nova
      
  4. Verify that the cinder Ceph client integrated with MOSK has the same keyring as in the Ceph cluster:

    Keyring verification for the Ceph cinder client
    1. Compare the keyring used in the cinder-volume pods with the one from the Ceph cluster:

      kubectl -n openstack get pod | grep cinder-volume
      kubectl -n openstack exec -it <cinder-volume-pod-name> -- cat /etc/ceph/ceph.client.cinder.keyring
      kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph auth get client.cinder
      
    2. If the keyrings differ, replace the one stored in the Ceph cluster with the key from the OpenStack pods:

      kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash
      ceph auth get client.cinder -o /tmp/cinder.key
      vi /tmp/cinder.key
      # in the editor, change "key" value to the key obtained from the OpenStack pods
      # then save and exit editing
      ceph auth import -i /tmp/cinder.key
      
    3. Verify that the client.cinder keyring of the Ceph cluster matches the one obtained from the OpenStack pods:

      kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph auth get client.cinder
      
    4. Verify that the cinder-volume pods have access to the Ceph cluster:

      kubectl -n openstack get pod | grep cinder-volume
      kubectl -n openstack exec -it <cinder-volume-pod-name> -- ceph -s -n client.cinder
      
  5. Verify that the glance Ceph client integrated with MOSK has the same keyring as in the Ceph cluster:

    Keyring verification for the Ceph glance client
    1. Compare the keyring used in the glance-api pods with the one from the Ceph cluster:

      kubectl -n openstack get pod | grep glance-api
      kubectl -n openstack exec -it <glance-api-pod-name> -- cat /etc/ceph/ceph.client.glance.keyring
      kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph auth get client.glance
      
    2. If the keyrings differ, replace the one stored in the Ceph cluster with the key from the OpenStack pods:

      kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash
      ceph auth get client.glance -o /tmp/glance.key
      vi /tmp/glance.key
      # in the editor, change "key" value to the key obtained from the OpenStack pods
      # then save and exit editing
      ceph auth import -i /tmp/glance.key
      
    3. Verify that the client.glance keyring of the Ceph cluster matches the one obtained from the OpenStack pods:

      kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph auth get client.glance
      
    4. Verify that the glance-api pods have access to the Ceph cluster:

      kubectl -n openstack get pod | grep glance-api
      kubectl -n openstack exec -it <glance-api-pod-name> -- ceph -s -n client.glance
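Each keyring verification in steps 3-5 above reduces to comparing the key = value line from two sources. The comparison can be sketched as below; this is a hedged, illustrative helper with fabricated keyring texts, not part of the official procedure. On a live cluster, you would feed it the outputs of the cat and ceph auth get commands shown in the steps.

```shell
# Hedged sketch: extract the "key = ..." value from two keyring texts and
# report whether they match. The keyring contents below are fabricated examples.
keyring_value() {
    printf '%s\n' "$1" | sed -n 's/^[[:space:]]*key[[:space:]]*=[[:space:]]*//p'
}

compare_keyrings() {
    if [ "$(keyring_value "$1")" = "$(keyring_value "$2")" ]; then
        echo "MATCH"
    else
        echo "MISMATCH"
    fi
}

pod_keyring='[client.nova]
        key = AQAexampleKeyFromPod=='
cluster_keyring='[client.nova]
        key = AQAexampleKeyFromCluster=='

compare_keyrings "$pod_keyring" "$pod_keyring"       # prints: MATCH
compare_keyrings "$pod_keyring" "$cluster_keyring"   # prints: MISMATCH
```

Because sed extracts whatever follows key =, the check is insensitive to the indentation differences between pod-mounted keyring files and ceph auth get output.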