KaaSCephOperationRequest failure with a timeout during rebalance

Warning

This procedure applies to MOSK clusters that use the deprecated KaaSCephCluster custom resource (CR) instead of the MiraCeph CR, which is available since MOSK 25.2 as the new Ceph configuration entrypoint. For the equivalent procedure with the MiraCeph CR, refer to the following section:

CephOsdRemoveRequest failure with a timeout during rebalance

The Ceph OSD removal procedure includes the Ceph OSD out action, which starts the rebalancing of Ceph placement groups (PGs). The total rebalancing time depends on the cluster hardware configuration: network bandwidth, Ceph PG placement, number of Ceph OSDs, and so on. The default rebalance timeout is 30 minutes, which is sufficient for standard cluster configurations.

If the rebalance takes more than 30 minutes, the KaaSCephOperationRequest resources created for removing Ceph OSDs or nodes fail with a message similar to the following:

status:
  removeStatus:
    osdRemoveStatus:
      errorReason: Timeout (30m0s) reached for waiting pg rebalance for osd 2
      status: Failed
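The status fields above can also be read from the command line to confirm that a failed request hit the rebalance timeout rather than another error. The following is a minimal sketch, not an official tool: the helper names get_error_reason and is_rebalance_timeout are hypothetical, the field path is taken from the example status above, and kaascephoperationrequest is assumed to be a resource name that kubectl accepts on the management cluster.

```shell
# Hypothetical helper: print the errorReason of a removal request.
# $1: project (namespace) of the managed cluster, $2: request name.
# The field path mirrors the example status shown above.
get_error_reason() {
  kubectl -n "$1" get kaascephoperationrequest "$2" \
    -o jsonpath='{.status.removeStatus.osdRemoveStatus.errorReason}'
}

# Hypothetical helper: succeed only if the message reports a PG rebalance
# timeout in the format of the example above.
is_rebalance_timeout() {
  case "$1" in
    "Timeout ("*") reached for waiting pg rebalance for osd "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example usage (requires access to the management cluster):
#   reason=$(get_error_reason "$project" "$request")
#   is_rebalance_timeout "$reason" && echo "Rebalance timeout: $reason"
```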

To resolve the issue, increase the timeout for all future KaaSCephOperationRequest resources:

  1. On the management cluster, open the Cluster resource of the affected managed cluster for editing:

    kubectl -n <managedClusterProjectName> edit cluster <managedClusterName>
    

    Replace <managedClusterProjectName> and <managedClusterName> with the corresponding values of the affected managed cluster.

  2. Add pgRebalanceTimeoutMin to the ceph-controller Helm release values section in the Cluster spec:

    spec:
      providerSpec:
        value:
          helmReleases:
          - name: ceph-controller
            values:
              controllers:
                cephRequest:
                  parameters:
                    pgRebalanceTimeoutMin: <rebalanceTimeout>
    

    Replace <rebalanceTimeout> with the required rebalance timeout in minutes. The value must be an integer greater than zero, for example, 60.

  3. Save the edits and exit from the Cluster resource.
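Before editing the Cluster resource, it may help to check that the chosen timeout satisfies the constraint from step 2 and, afterwards, to confirm that the value was saved. The following is a minimal sketch under those assumptions: the helper names validate_rebalance_timeout and show_rebalance_timeout are hypothetical and are not part of the product.

```shell
# Hypothetical helper: succeed only if $1 is an integer greater than zero,
# as required for pgRebalanceTimeoutMin.
validate_rebalance_timeout() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;  # empty or contains a non-digit character
  esac
  [ "$1" -gt 0 ]
}

# Hypothetical helper: print the pgRebalanceTimeoutMin line from the Cluster
# resource. $1: project (namespace), $2: managed cluster name.
show_rebalance_timeout() {
  kubectl -n "$1" get cluster "$2" -o yaml | grep pgRebalanceTimeoutMin
}

# Example usage (the second call requires access to the management cluster):
#   validate_rebalance_timeout 60 && echo "60 is a valid timeout"
#   show_rebalance_timeout "$project" "$cluster"
```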


To process an existing KaaSCephOperationRequest resource that failed with errorReason:

  1. Copy the spec section in the failed KaaSCephOperationRequest resource.

  2. Create a new KaaSCephOperationRequest with a different name. For details, see Creating a Ceph OSD removal request.

  3. Paste the previously copied spec section of the failed KaaSCephOperationRequest resource into the new one.

  4. Remove the failed KaaSCephOperationRequest resource.
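
The four steps above can be sketched as a script. This is an illustration under stated assumptions, not a supported tool: the function names build_request_manifest and recreate_failed_request are hypothetical, kaascephoperationrequest is assumed to be a resource name that kubectl accepts, and the kubectl version is assumed to render the spec object as JSON in jsonpath output. The apiVersion is read from the failed resource rather than hard-coded.

```shell
# Hypothetical helper: print a JSON manifest for a new request.
# $1: apiVersion, $2: new request name, $3: project (namespace),
# $4: spec of the failed request as a JSON object.
build_request_manifest() {
  printf '{"apiVersion":"%s","kind":"KaaSCephOperationRequest","metadata":{"name":"%s","namespace":"%s"},"spec":%s}\n' \
    "$1" "$2" "$3" "$4"
}

# Hypothetical helper: copy the spec of a failed request into a new request
# with a different name, then remove the failed one.
# $1: project (namespace), $2: failed request name, $3: new request name.
recreate_failed_request() {
  api_version=$(kubectl -n "$1" get kaascephoperationrequest "$2" \
    -o jsonpath='{.apiVersion}') || return 1
  spec=$(kubectl -n "$1" get kaascephoperationrequest "$2" \
    -o jsonpath='{.spec}') || return 1
  build_request_manifest "$api_version" "$3" "$1" "$spec" \
    | kubectl apply -f - || return 1
  kubectl -n "$1" delete kaascephoperationrequest "$2"
}

# Example usage (requires access to the management cluster):
#   recreate_failed_request "$project" "$failed_request" "$new_request"
```

Review the generated manifest before applying it, and increase the rebalance timeout first. Otherwise, the new request can fail with the same error.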