CephOsdRemoveRequest failure with a timeout during rebalance

Warning

This procedure is valid for MOSK clusters that use the MiraCeph custom resource (CR), which is available since MOSK 25.2 and replaces the unsupported KaaSCephCluster resource. MiraCeph will be automatically migrated to CephDeployment in MOSK 26.1. For details, see Deprecation Notes: KaaSCephCluster API on management clusters.

For the equivalent procedure with the unsupported KaaSCephCluster CR, refer to the following section:

KaaSCephOperationRequest failure with a timeout during rebalance

The Ceph OSD removal procedure includes the Ceph OSD out action, which starts rebalancing of Ceph PGs. The total rebalancing time depends on the cluster hardware configuration: network bandwidth, Ceph PG placement, number of Ceph OSDs, and so on. The default rebalance timeout is 30 minutes, which suits standard cluster configurations.

If the rebalance takes more than 30 minutes, the CephOsdRemoveRequest resources created for removing Ceph OSDs or nodes fail with a message similar to the following:

status:
  messages:
  - Timeout (30m0s) reached for waiting pg rebalance for osd 2
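
To confirm that a request failed because of this timeout and not for another reason, you can inspect its status messages. The following is a minimal sketch, not an official tool: the function names `request_messages` and `is_rebalance_timeout` are hypothetical, the namespace and request name are placeholders that you supply, and it assumes that `kubectl` points at the cluster where the CephOsdRemoveRequest resource lives.

```shell
# Print the status messages of a CephOsdRemoveRequest, one per line.
# Arguments: <namespace> <requestName> (placeholders, supply your own values).
request_messages() {
  kubectl -n "$1" get cephosdremoverequest "$2" \
    -o jsonpath='{range .status.messages[*]}{@}{"\n"}{end}'
}

# Succeed if the text on stdin contains the PG rebalance timeout message.
is_rebalance_timeout() {
  grep -Eq 'Timeout \([^)]*\) reached for waiting pg rebalance'
}

# Usage (hypothetical names):
#   request_messages <namespace> <requestName> | is_rebalance_timeout \
#     && echo "failed on PG rebalance timeout"
```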

To resolve the issue, increase the timeout for all future CephOsdRemoveRequest resources:

  1. On the management cluster, open the Cluster resource of the affected MOSK cluster for editing:

    kubectl -n <moskClusterProjectName> edit cluster <moskClusterName>
    

    Replace <moskClusterProjectName> and <moskClusterName> with the corresponding values of the affected MOSK cluster.

  2. Add pgRebalanceTimeoutMin to the ceph-controller Helm release values section in the Cluster spec:

    spec:
      providerSpec:
        value:
          helmReleases:
          - name: ceph-controller
            values:
              controllers:
                cephRequest:
                  parameters:
                    pgRebalanceTimeoutMin: <rebalanceTimeout>
    

    The <rebalanceTimeout> value is the required rebalance timeout in minutes. It must be an integer greater than zero, for example, 60.

  3. Save the edits and exit from the Cluster resource.
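
The value constraint and a check of the saved setting can be sketched in shell as follows. This is an illustrative sketch only: `is_valid_timeout` and `current_rebalance_timeout` are hypothetical helper names, and the JSONPath expression assumes that the Cluster spec has exactly the layout shown in step 2.

```shell
# Succeed only if $1 is an integer greater than zero.
is_valid_timeout() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit character
  esac
  [ "$1" -gt 0 ]
}

# Print the pgRebalanceTimeoutMin value currently set in the Cluster resource.
# Arguments: <moskClusterProjectName> <moskClusterName>
current_rebalance_timeout() {
  kubectl -n "$1" get cluster "$2" -o jsonpath='{.spec.providerSpec.value.helmReleases[?(@.name=="ceph-controller")].values.controllers.cephRequest.parameters.pgRebalanceTimeoutMin}'
}

# Usage:
#   is_valid_timeout 60 && echo "60 is acceptable"
#   current_rebalance_timeout <moskClusterProjectName> <moskClusterName>
```

Running the second function on the management cluster after saving the edits should print the value that you entered. Empty output would suggest that the parameter was not saved under the expected path.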


If an existing CephOsdRemoveRequest resource has already failed with this timeout message, recreate it:

  1. In the failed CephOsdRemoveRequest resource, copy the spec section.

  2. Create a new CephOsdRemoveRequest with a different name. For details, see Creating a Ceph OSD removal request.

  3. Paste the previously copied spec section of the failed CephOsdRemoveRequest resource into the new one.

  4. Remove the failed CephOsdRemoveRequest resource.
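
Steps 1 and 4 above can be sketched with plain `kubectl` commands. The helper names `show_failed_request` and `remove_failed_request` are hypothetical, and the namespace and request name are placeholders. The sketch deliberately leaves out creating the new request (steps 2 and 3), which should follow Creating a Ceph OSD removal request.

```shell
# Step 1: print the failed request as YAML so that its spec section can be copied.
# Arguments: <namespace> <failedRequestName>
show_failed_request() {
  kubectl -n "$1" get cephosdremoverequest "$2" -o yaml
}

# Step 4: delete the failed request after the new request has been created.
# Arguments: <namespace> <failedRequestName>
remove_failed_request() {
  kubectl -n "$1" delete cephosdremoverequest "$2"
}

# Usage (hypothetical names):
#   show_failed_request <namespace> <failedRequestName> > failed-request.yaml
#   remove_failed_request <namespace> <failedRequestName>
```

Saving the YAML output to a file before deleting the failed request keeps a copy of the original spec in case the new request needs to be corrected.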