KaaSCephOperationRequest failure with a timeout during rebalance
The Ceph OSD removal procedure includes the Ceph OSD out action, which starts
the Ceph PGs rebalancing process. The total rebalancing time depends on the
cluster hardware configuration: network bandwidth, Ceph PGs placement, number
of Ceph OSDs, and so on. The default rebalance timeout is 30 minutes,
which suits standard cluster configurations.
If the rebalance takes more than 30 minutes, the KaaSCephOperationRequest
resources created for removing Ceph OSDs or nodes fail with the following
status:
removeStatus:
  osdRemoveStatus:
    errorReason: Timeout (30m0s) reached for waiting pg rebalance for osd 2
    status: Failed
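The failure status shown above can be recognized programmatically, for example when checking several removal requests at once. The following is a minimal Python sketch, not part of the product tooling. It assumes the removeStatus section has already been loaded into a dictionary, and that the message wording matches the sample above, which may differ between versions. The function name parse_rebalance_timeout is hypothetical.

```python
import re

# Pattern built from the sample message above. The exact wording is an
# assumption and may differ between product versions.
TIMEOUT_RE = re.compile(
    r"Timeout \((?P<duration>[^)]+)\) reached for waiting pg rebalance "
    r"for osd (?P<osd>\d+)"
)


def parse_rebalance_timeout(remove_status: dict):
    """Return (duration, osd_id) for a rebalance timeout failure, else None."""
    osd_status = remove_status.get("osdRemoveStatus", {})
    if osd_status.get("status") != "Failed":
        return None
    match = TIMEOUT_RE.search(osd_status.get("errorReason", ""))
    if match is None:
        return None
    return match.group("duration"), int(match.group("osd"))


# The removeStatus section from the sample above, as a dictionary.
sample = {
    "osdRemoveStatus": {
        "errorReason": "Timeout (30m0s) reached for waiting pg rebalance for osd 2",
        "status": "Failed",
    }
}

print(parse_rebalance_timeout(sample))  # ('30m0s', 2)
```

A result other than None indicates that the request failed because of the rebalance timeout rather than for another reason, so the resolution below applies.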
To resolve the issue, increase the timeout for all future KaaSCephOperationRequest resources:
On the management cluster, open the Cluster resource of the affected managed cluster for editing:
kubectl -n <managedClusterProjectName> edit cluster <managedClusterName>
Substitute <managedClusterProjectName> and <managedClusterName> with the corresponding values of the affected managed cluster.
In the Cluster spec, add the following parameter to the ceph-controller Helm release values section:
spec:
  providerSpec:
    value:
      helmReleases:
      - name: ceph-controller
        values:
          controllers:
            cephRequest:
              parameters:
                pgRebalanceTimeoutMin: <rebalanceTimeout>
The <rebalanceTimeout> value is the required rebalance timeout in minutes. It must be an integer greater than zero.
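As an illustration of the constraint on the timeout value, the following Python sketch validates a candidate value and builds the corresponding configuration fragment as a dictionary. The function name build_ceph_controller_values and the value 60 are hypothetical and chosen only for the example. The key names come from the configuration shown above. The sketch does not merge the fragment into an existing helmReleases list, which still has to be done when editing the Cluster resource.

```python
def build_ceph_controller_values(rebalance_timeout_min: int) -> dict:
    """Build the Cluster spec fragment that sets pgRebalanceTimeoutMin."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(rebalance_timeout_min, bool) or not isinstance(
        rebalance_timeout_min, int
    ):
        raise TypeError("the rebalance timeout must be an integer")
    if rebalance_timeout_min <= 0:
        raise ValueError("the rebalance timeout must be greater than zero")
    return {
        "spec": {
            "providerSpec": {
                "value": {
                    "helmReleases": [
                        {
                            "name": "ceph-controller",
                            "values": {
                                "controllers": {
                                    "cephRequest": {
                                        "parameters": {
                                            "pgRebalanceTimeoutMin": rebalance_timeout_min
                                        }
                                    }
                                }
                            },
                        }
                    ]
                }
            }
        }
    }


# 60 is an arbitrary illustrative value, not a recommendation.
fragment = build_ceph_controller_values(60)
release = fragment["spec"]["providerSpec"]["value"]["helmReleases"][0]
print(release["values"]["controllers"]["cephRequest"]["parameters"])
```

Values such as 0, -5, or "60" (a string) are rejected, which mirrors the requirement that the timeout is an integer greater than zero.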
Save the edits and exit from the Cluster resource.
If you have an existing KaaSCephOperationRequest resource that failed with the timeout errorReason and still needs to be processed:
Copy the spec section of the failed KaaSCephOperationRequest resource.
Create a new KaaSCephOperationRequest with a different name. For details, see Creating a Ceph OSD removal request.
Paste the previously copied spec section of the failed KaaSCephOperationRequest resource to the new one.
Remove the failed KaaSCephOperationRequest resource.
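The copy-and-recreate steps above can be sketched in Python as a transformation of the failed resource, assuming it has been exported and loaded into a dictionary. This is an illustration of the procedure, not a product tool. The function name clone_failed_request, the resource names, and the placeholder values in the sample are hypothetical. The spec content is copied verbatim, and the status of the failed resource is intentionally left out.

```python
import copy


def clone_failed_request(failed: dict, new_name: str) -> dict:
    """Build a new request manifest that reuses the spec of a failed one."""
    old_meta = failed.get("metadata", {})
    if new_name == old_meta.get("name"):
        raise ValueError("the new request must have a different name")
    metadata = {"name": new_name}
    if "namespace" in old_meta:
        metadata["namespace"] = old_meta["namespace"]
    return {
        "apiVersion": failed["apiVersion"],
        "kind": failed["kind"],
        "metadata": metadata,
        # Deep copy so that later edits do not alter the original object.
        "spec": copy.deepcopy(failed["spec"]),
    }


# Hypothetical failed resource. All placeholder values are assumptions.
failed_request = {
    "apiVersion": "<apiVersion of the failed resource>",
    "kind": "KaaSCephOperationRequest",
    "metadata": {"name": "remove-osd-old", "namespace": "<managedClusterProjectName>"},
    "spec": {"<copied field>": "<copied value>"},
    "status": {"removeStatus": {"osdRemoveStatus": {"status": "Failed"}}},
}

new_request = clone_failed_request(failed_request, "remove-osd-new")
print(new_request["metadata"]["name"])  # remove-osd-new
```

The resulting dictionary can be serialized to YAML and used to create the new request, after which the failed resource is removed as described in the last step.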