CephOsdRemoveRequest failure with a timeout during rebalance
Warning
This procedure is valid for MOSK clusters that use the MiraCeph custom
resource (CR), which is available since MOSK 25.2 and replaces the unsupported
KaaSCephCluster resource. MiraCeph will be automatically migrated
to CephDeployment in MOSK 26.1. For details, see Deprecation Notes:
KaaSCephCluster API on management clusters.
For the equivalent procedure with the unsupported KaaSCephCluster CR, refer
to the following section:
KaaSCephOperationRequest failure with a timeout during rebalance
The Ceph OSD removal procedure includes the Ceph OSD out action, which starts
the Ceph PGs rebalancing process. The total rebalancing time depends on the
cluster hardware configuration: network bandwidth, Ceph PGs placement, number
of Ceph OSDs, and so on. The default rebalance timeout is 30 minutes, which
is sufficient for standard cluster configurations.
If the rebalance takes more than 30 minutes, the CephOsdRemoveRequest
resources created for removing Ceph OSDs or nodes fail with the following
example message:
    status:
      messages:
      - Timeout (30m0s) reached for waiting pg rebalance for osd 2
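To confirm that a failed request hit this particular timeout, the status messages can be scanned for the text shown above. The following is a minimal sketch, assuming the status has already been loaded into a Python dictionary. The function name and the regular expression are illustrative and modeled only on the example message, so the exact wording may differ in your environment.

```python
import re

# Pattern modeled on the example message above; the exact wording is an
# assumption and may differ between product versions.
TIMEOUT_RE = re.compile(
    r"Timeout \((?P<timeout>[^)]+)\) reached for waiting pg rebalance "
    r"for osd (?P<osd>\d+)"
)


def find_rebalance_timeouts(status):
    """Return (osd_id, timeout) pairs found in status['messages']."""
    hits = []
    for message in status.get("messages") or []:
        match = TIMEOUT_RE.search(message)
        if match:
            hits.append((int(match.group("osd")), match.group("timeout")))
    return hits


example_status = {
    "messages": ["Timeout (30m0s) reached for waiting pg rebalance for osd 2"]
}
print(find_rebalance_timeouts(example_status))  # prints [(2, '30m0s')]
```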
To resolve the issue, increase the timeout for all future
CephOsdRemoveRequest resources:
1. On the management cluster, open the Cluster resource of the affected MOSK
   cluster for editing:

       kubectl -n <moskClusterProjectName> edit cluster <moskClusterName>

   Replace <moskClusterProjectName> and <moskClusterName> with the
   corresponding values of the affected MOSK cluster.

2. Add pgRebalanceTimeoutMin to the ceph-controller Helm release values
   section in the Cluster spec:

       spec:
         providerSpec:
           value:
             helmReleases:
             - name: ceph-controller
               values:
                 controllers:
                   cephRequest:
                     parameters:
                       pgRebalanceTimeoutMin: <rebalanceTimeout>

   The <rebalanceTimeout> value is the required rebalance timeout in minutes.
   It must be an integer greater than zero. For example, 60.

3. Save the edits and exit from the Cluster resource.
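If the change is prepared programmatically rather than in an editor, the same nesting can be applied to a Cluster object held as a dictionary. The following is a minimal sketch, assuming the object was obtained elsewhere (for example, exported as JSON or YAML and parsed). The function name and the other-release entry are hypothetical, and the sketch does not handle keys that are present but set to null.

```python
import copy


def set_pg_rebalance_timeout(cluster, minutes):
    """Return a copy of a Cluster dict with pgRebalanceTimeoutMin set."""
    # The value must be an integer greater than zero; bool is rejected
    # explicitly because it is a subclass of int in Python.
    if isinstance(minutes, bool) or not isinstance(minutes, int) or minutes <= 0:
        raise ValueError("pgRebalanceTimeoutMin must be an integer greater than zero")

    updated = copy.deepcopy(cluster)
    releases = (
        updated.setdefault("spec", {})
        .setdefault("providerSpec", {})
        .setdefault("value", {})
        .setdefault("helmReleases", [])
    )

    # Reuse the existing ceph-controller entry if present, otherwise append one.
    release = next((r for r in releases if r.get("name") == "ceph-controller"), None)
    if release is None:
        release = {"name": "ceph-controller"}
        releases.append(release)

    parameters = (
        release.setdefault("values", {})
        .setdefault("controllers", {})
        .setdefault("cephRequest", {})
        .setdefault("parameters", {})
    )
    parameters["pgRebalanceTimeoutMin"] = minutes
    return updated


# "other-release" is a made-up entry that stands in for existing releases.
cluster = {"spec": {"providerSpec": {"value": {"helmReleases": [{"name": "other-release"}]}}}}
patched = set_pg_rebalance_timeout(cluster, 60)
print(patched["spec"]["providerSpec"]["value"]["helmReleases"][-1])
```

Working on a deep copy leaves the original object untouched, so the two can be compared before anything is applied to the cluster.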
If you have an existing CephOsdRemoveRequest resource that failed with issues
in messages and still needs to be processed:
1. In the failed CephOsdRemoveRequest resource, copy the spec section.

2. Create a new CephOsdRemoveRequest with a different name. For details, see
   Creating a Ceph OSD removal request.

3. Paste the previously copied spec section of the failed
   CephOsdRemoveRequest resource to the new one.

4. Remove the failed CephOsdRemoveRequest resource.
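The copy-and-recreate steps can be sketched as a small helper that builds the new manifest from the failed one. This is an illustration only, assuming the failed resource has been loaded into a dictionary. The function name, resource names, namespace, and spec content below are placeholders and do not represent the actual resource schema.

```python
import copy


def clone_remove_request(failed, new_name):
    """Build a new request manifest that reuses the spec of a failed one."""
    old_meta = failed.get("metadata", {})
    if new_name == old_meta.get("name"):
        raise ValueError("the new request must have a different name")

    metadata = {"name": new_name}
    if "namespace" in old_meta:
        metadata["namespace"] = old_meta["namespace"]

    return {
        "apiVersion": failed["apiVersion"],
        "kind": failed["kind"],
        "metadata": metadata,
        # Only spec is carried over; status and server-set metadata are dropped.
        "spec": copy.deepcopy(failed["spec"]),
    }


# Placeholder values for illustration; take the real ones from the failed resource.
failed_request = {
    "apiVersion": "<apiVersion of the failed request>",
    "kind": "CephOsdRemoveRequest",
    "metadata": {"name": "remove-osd-old", "namespace": "example-namespace", "uid": "abc"},
    "spec": {"placeholder": "copied as is"},
    "status": {"messages": ["Timeout (30m0s) reached for waiting pg rebalance for osd 2"]},
}
new_request = clone_remove_request(failed_request, "remove-osd-retry")
print(new_request["metadata"])
```

The helper intentionally omits the status section and fields such as uid, so that the new resource starts with a clean state.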