Create a Ceph OSD removal request

Available since 2.14.0 TechPreview

The KaaSCephOperationRequest CR automates lifecycle management (LCM) operations for Ceph OSDs and Ceph nodes by creating separate CephOsdRemoveRequest requests. It enables automated removal of healthy or unhealthy Ceph OSDs from a Ceph cluster and covers the following scenarios:

  • Reducing hardware. All Ceph OSDs are up and in, but you want to decrease the number of Ceph OSDs by reducing the number of disks or hosts.

  • Hardware issues. For example, a host unexpectedly goes down and will not be restored, or a disk on a host fails and requires replacement.

For the description of the removal request phases, see KaaSCephOperationRequest status.

To create a Ceph OSD removal request:

  1. Remove obsolete nodes or disks from the spec.nodes section of the KaaSCephCluster CR as described in Ceph advanced configuration.

    Note

    Record the names of the removed nodes and devices, or the device paths, exactly as they were specified in KaaSCephCluster. You will need them in the next step.
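    For illustration, the following fragment sketches what removing a disk and a whole node from the spec.nodes section might look like. The node names storage-worker-1 and storage-worker-2, the device names sdb and sdc, and the storageDevices and name fields are hypothetical placeholders, not taken from this page. Check Ceph advanced configuration for the exact schema of your release.

    ```yaml
    # Hypothetical fragment of the spec.nodes section of KaaSCephCluster.
    # The storageDevices/name field names are assumptions; verify them in
    # Ceph advanced configuration before editing a real cluster.
    nodes:
      storage-worker-1:          # hypothetical node that stays in the cluster
        storageDevices:
          - name: sdb            # disk that stays in the cluster
          # - name: sdc          # removed disk: record "storage-worker-1" and "sdc"
      # storage-worker-2:        # removed node: record "storage-worker-2"
      #   storageDevices:
      #     - name: sdb
    ```

    Commenting out the entries, rather than deleting them, is only a visual aid here; in practice, you remove the entries from the CR.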

  2. Edit the KaaSCephOperationRequest CR as described in KaaSCephOperationRequest CR specification.

    • If KaaSCephOperationRequest specifies the Ceph OSDs to remove in the correct format, this information is validated to catch human errors and prevent removal of the wrong Ceph OSDs.

    • If the osdRemove.nodes section of KaaSCephOperationRequest is empty, the Ceph Request Controller automatically detects the Ceph OSDs to remove, if any. Auto-detection uses both the information provided in KaaSCephCluster and the information from the Ceph cluster itself.

    Once validation or auto-detection completes, the KaaSCephOperationRequest object displays the full information about the Ceph OSDs to remove: the hosts they belong to, OSD IDs, disks, partitions, and so on. The request then moves to the ApproveWaiting phase and stays there until you manually set the approve flag in the spec.
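    A minimal sketch of a KaaSCephOperationRequest that lists the Ceph OSDs to remove explicitly is shown below. Only osdRemove.nodes comes from this procedure; the apiVersion, the metadata values, the kaasCephCluster reference, and the cleanupByDevice key are assumptions based on typical manifests of this kind. Treat KaaSCephOperationRequest CR specification as the authoritative schema.

    ```yaml
    # Sketch only: every field except spec.osdRemove.nodes is an assumption;
    # verify names against the KaaSCephOperationRequest CR specification.
    apiVersion: kaas.mirantis.com/v1alpha1   # assumed API group/version
    kind: KaaSCephOperationRequest
    metadata:
      name: remove-osd-storage-worker-1      # hypothetical request name
      namespace: <MANAGED-CLUSTER-PROJECT>   # assumed: project of the managed cluster
    spec:
      osdRemove:
        # Use the node and device names recorded in step 1 exactly as they
        # appeared in KaaSCephCluster. Leave nodes empty ({}) to let the
        # Ceph Request Controller auto-detect the Ceph OSDs to remove.
        nodes:
          storage-worker-1:                  # hypothetical node name
            cleanupByDevice:                 # assumed key for per-device removal
              - device: sdc                  # hypothetical device name
      kaasCephCluster:                       # assumed reference to the target cluster
        name: <KAASCEPHCLUSTER-NAME>
        namespace: <MANAGED-CLUSTER-PROJECT>
    ```

    The approve flag is intentionally absent at this stage, so the request stops in the ApproveWaiting phase and you can review the detected Ceph OSDs first.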

  3. To execute the request, manually set the approve flag to true in the KaaSCephOperationRequest spec. Once done, the Ceph Status Controller reconciliation pauses until the request is handled, and the following actions are performed:

    • Stops regular Ceph Controller reconciliation

    • Removes Ceph OSDs

    • Runs batch jobs to clean up the device, if possible

    • Removes host information from the Ceph cluster if the entire Ceph node is removed

    • Marks the request with the appropriate result and a description of any issues that occurred

    Note

    If the request completes successfully, Ceph Controller reconciliation resumes. Otherwise, it remains paused until the issue is resolved.
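    The approval itself is a one-line change to the object, for example, made with kubectl edit. The fragment below assumes that the approve flag is located at spec.osdRemove.approve and takes a boolean value; confirm the exact path in KaaSCephOperationRequest CR specification.

    ```yaml
    # Fragment of the KaaSCephOperationRequest spec after approval.
    # Assumed location of the approve flag: spec.osdRemove.approve.
    spec:
      osdRemove:
        approve: true   # set only after reviewing the Ceph OSDs listed in the request
    ```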

  4. Verify the status of the Ceph OSD removal as described in KaaSCephOperationRequest status.

  5. Manually remove the device cleanup jobs:

    Note

    The device cleanup jobs are not removed automatically. They are kept in the ceph-lcm-mirantis namespace along with the pods that contain information about the executed actions. The jobs have the following labels:

    labels:
      app: miraceph-cleanup-disks
      host: <HOST-NAME>
      osd: <OSD-ID>
      rook-cluster: <ROOK-CLUSTER-NAME>
    

    Additionally, each job is labeled with the names of the disks it cleans up, for example, vdb=true. You can remove a single job or a group of jobs using any of the labels described above, such as host or disk. For example, to remove all device cleanup jobs at once:

    kubectl delete jobs -n ceph-lcm-mirantis -l app=miraceph-cleanup-disks
    

See also

Known issues