Ceph known issues¶
This section lists the Ceph known issues with workarounds for the Mirantis OpenStack for Kubernetes release 23.1.
[30857] Irrelevant error during Ceph OSD deployment on removable devices
[31630] Ceph cluster upgrade to Pacific is stuck with Rook connection failure
[31555] Ceph can find only 1 out of 2 ‘mgr’ after update to MOSK 23.1
[30857] Irrelevant error during Ceph OSD deployment on removable devices¶
The deployment of Ceph OSDs fails with the following messages in the status section of the KaaSCephCluster custom resource:
shortClusterInfo:
  messages:
  - Not all osds are deployed
  - Not all osds are in
  - Not all osds are up
To find out whether your cluster is affected, verify whether the devices used for the Ceph OSD deployment on the AMD hosts are removable.
For example, if the sdb device name is specified in spec.cephClusterSpec.nodes.storageDevices of the KaaSCephCluster custom resource for the affected host, run:
# cat /sys/block/sdb/removable
1
The output 1 shows that the above messages in status are caused by the hotplug functionality enabled on the AMD nodes, which marks all drives as removable. The hotplug functionality is not supported by Ceph in MOSK.
As a workaround, disable the hotplug functionality in the BIOS settings for disks that are configured to be used as Ceph OSD data devices.
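After changing the BIOS settings and rebooting the node, you can repeat the check above for each disk configured as a Ceph OSD data device to confirm that it is no longer reported as removable. A minimal sketch, assuming hypothetical device names sdb and sdc; replace them with the device names from spec.cephClusterSpec.nodes.storageDevices for the host:

for dev in sdb sdc; do   # hypothetical device names, adjust for your host
  echo "${dev}: removable=$(cat /sys/block/${dev}/removable)"   # 0 is expected after the BIOS change
done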
[31630] Ceph cluster upgrade to Pacific is stuck with Rook connection failure¶
During update to MOSK 23.1, the Ceph cluster gets stuck while upgrading to Ceph Pacific due to a Rook connection failure.
To verify whether your cluster is affected:
The cluster is affected if the following conditions are true:
The ceph-status-controller pod on the MOSK cluster contains the following log lines:

kubectl -n ceph-lcm-mirantis logs <ceph-status-controller-podname>
...
E0405 08:07:15.603247 1 cluster.go:222] Cluster health: "HEALTH_ERR"
W0405 08:07:15.603266 1 cluster.go:230] found issue error: {Urgent failed to get status. . timed out: exit status 1}
The KaaSCephCluster custom resource contains the following configuration option in the rookConfig section:

spec:
  cephClusterSpec:
    rookConfig:
      ms_crc_data: "false" # or 'ms crc data: "false"'
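To quickly check the second condition, you can query the KaaSCephCluster resource. A minimal sketch, assuming <project> and <cluster-name> are placeholders for your project namespace and KaaSCephCluster name, and that the lowercase kaascephcluster resource name resolves to the KaaSCephCluster custom resource:

kubectl -n <project> get kaascephcluster <cluster-name> -o yaml | grep -E 'ms_crc_data|ms crc data'
# any output means the option is set and the condition is met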
As a workaround, remove the ms_crc_data (or ms crc data) configuration key from the KaaSCephCluster custom resource and wait for the rook-ceph-mon pods to restart on the MOSK cluster:
kubectl -n rook-ceph get pod -l app=rook-ceph-mon -w
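The key can be removed, for example, by editing the resource on the management cluster, using the same <project> and <cluster-name> placeholders for your project namespace and KaaSCephCluster name:

kubectl -n <project> edit kaascephcluster <cluster-name>
# in the editor, delete the ms_crc_data (or 'ms crc data') line from
# spec.cephClusterSpec.rookConfig, keep any other rookConfig keys, then save and exit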
[31555] Ceph can find only 1 out of 2 ‘mgr’ after update to MOSK 23.1¶
Fixed in 23.2
After update to MOSK 23.1, the status section of the KaaSCephCluster custom resource can contain the following message:
shortClusterInfo:
  messages:
  - Not all mgrs are running: 1/2
To verify whether the cluster is affected:

If the KaaSCephCluster spec contains the external section, the cluster is affected:
spec:
  cephClusterSpec:
    external:
      enable: false
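To check for the external section without opening the resource in an editor, you can query it directly. A minimal sketch, assuming <project> and <cluster-name> are placeholders for your project namespace and KaaSCephCluster name, and that the lowercase kaascephcluster resource name resolves to the KaaSCephCluster custom resource:

kubectl -n <project> get kaascephcluster <cluster-name> \
  -o jsonpath='{.spec.cephClusterSpec.external}'
# non-empty output means the external section is present and the cluster is affected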
Workaround:

1. In spec.cephClusterSpec of the KaaSCephCluster custom resource, remove the external section.

2. Wait for the Not all mgrs are running: 1/2 message to disappear from the KaaSCephCluster status.

3. Verify that the nova Ceph client that is integrated with MOSK has the same keyring as in the Ceph cluster. A combined check for the nova, cinder, and glance clients is also sketched after this procedure.

   Keyring verification for the Ceph nova client:

   1. Compare the keyring used in the nova-compute and libvirt pods with the one from the Ceph cluster:

      kubectl -n openstack get pod | grep nova-compute
      kubectl -n openstack exec -it <nova-compute-pod-name> -- cat /etc/ceph/ceph.client.nova.keyring
      kubectl -n openstack get pod | grep libvirt
      kubectl -n openstack exec -it <libvirt-pod-name> -- cat /etc/ceph/ceph.client.nova.keyring
      kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph auth get client.nova
   2. If the keyring differs, replace the one stored in the Ceph cluster with the key obtained from the OpenStack pods:

      kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash
      ceph auth get client.nova -o /tmp/nova.key
      vi /tmp/nova.key
      # in the editor, change the "key" value to the key obtained from the OpenStack pods,
      # then save and exit editing
      ceph auth import -i /tmp/nova.key
   3. Verify that the client.nova keyring of the Ceph cluster matches the one obtained from the OpenStack pods:

      kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph auth get client.nova
   4. Verify that the nova-compute and libvirt pods have access to the Ceph cluster:

      kubectl -n openstack get pod | grep nova-compute
      kubectl -n openstack exec -it <nova-compute-pod-name> -- ceph -s -n client.nova
      kubectl -n openstack get pod | grep libvirt
      kubectl -n openstack exec -it <libvirt-pod-name> -- ceph -s -n client.nova
4. Verify that the cinder Ceph client that is integrated with MOSK has the same keyring as in the Ceph cluster.

   Keyring verification for the Ceph cinder client:

   1. Compare the keyring used in the cinder-volume pods with the one from the Ceph cluster:

      kubectl -n openstack get pod | grep cinder-volume
      kubectl -n openstack exec -it <cinder-volume-pod-name> -- cat /etc/ceph/ceph.client.cinder.keyring
      kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph auth get client.cinder
   2. If the keyring differs, replace the one stored in the Ceph cluster with the key obtained from the OpenStack pods:

      kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash
      ceph auth get client.cinder -o /tmp/cinder.key
      vi /tmp/cinder.key
      # in the editor, change the "key" value to the key obtained from the OpenStack pods,
      # then save and exit editing
      ceph auth import -i /tmp/cinder.key
   3. Verify that the client.cinder keyring of the Ceph cluster matches the one obtained from the OpenStack pods:

      kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph auth get client.cinder
   4. Verify that the cinder-volume pods have access to the Ceph cluster:

      kubectl -n openstack get pod | grep cinder-volume
      kubectl -n openstack exec -it <cinder-volume-pod-name> -- ceph -s -n client.cinder
5. Verify that the glance Ceph client that is integrated with MOSK has the same keyring as in the Ceph cluster.

   Keyring verification for the Ceph glance client:

   1. Compare the keyring used in the glance-api pods with the one from the Ceph cluster:

      kubectl -n openstack get pod | grep glance-api
      kubectl -n openstack exec -it <glance-api-pod-name> -- cat /etc/ceph/ceph.client.glance.keyring
      kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph auth get client.glance
   2. If the keyring differs, replace the one stored in the Ceph cluster with the key obtained from the OpenStack pods:

      kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash
      ceph auth get client.glance -o /tmp/glance.key
      vi /tmp/glance.key
      # in the editor, change the "key" value to the key obtained from the OpenStack pods,
      # then save and exit editing
      ceph auth import -i /tmp/glance.key
   3. Verify that the client.glance keyring of the Ceph cluster matches the one obtained from the OpenStack pods:

      kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph auth get client.glance
   4. Verify that the glance-api pods have access to the Ceph cluster:

      kubectl -n openstack get pod | grep glance-api
      kubectl -n openstack exec -it <glance-api-pod-name> -- ceph -s -n client.glance
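The per-client keyring comparison above can also be scripted. The following is a minimal sketch, assuming bash 4 or later on the host where kubectl runs, the pod name prefixes used in the steps above (nova-compute, cinder-volume, glance-api), and the keyring paths shown above; the script only reports mismatches and does not modify any keyring:

#!/usr/bin/env bash
# Hypothetical helper: compare the keyrings used by the OpenStack pods with
# the keys stored in the Ceph cluster for the nova, cinder, and glance clients.
set -e

declare -A pods=( [nova]=nova-compute [cinder]=cinder-volume [glance]=glance-api )

for client in nova cinder glance; do
  # first pod whose name matches the prefix used in the procedure above
  pod=$(kubectl -n openstack get pod -o name | grep "${pods[$client]}" | head -n 1)
  # key as mounted into the OpenStack pod
  os_key=$(kubectl -n openstack exec "${pod}" -- \
    cat "/etc/ceph/ceph.client.${client}.keyring" | awk '/key =/ {print $3}')
  # key as stored in the Ceph cluster
  ceph_key=$(kubectl -n rook-ceph exec deploy/rook-ceph-tools -- \
    ceph auth get-key "client.${client}")
  if [ "${os_key}" = "${ceph_key}" ]; then
    echo "client.${client}: keyrings match"
  else
    echo "client.${client}: keyring MISMATCH, follow the import steps above"
  fi
done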