Create a Ceph OSD removal request
Available since 2.14.0 TechPreview
The KaaSCephOperationRequest CR provides automated LCM operations for
Ceph OSDs and Ceph nodes, each handled through a separate
request. It allows for automated removal of healthy or non-healthy Ceph OSDs
from a Ceph cluster and covers the following scenarios:
Reducing hardware. All Ceph OSDs are up/in, but you want to decrease the number of Ceph OSDs by reducing the number of disks or hosts.
Hardware issues. For example, a host unexpectedly goes down and will not be restored, or a disk on a host goes down and requires replacement.
For the description of the removal request phases, see KaaSCephOperationRequest status.
To create a Ceph OSD removal request:
Remove obsolete nodes or disks from the spec.nodes section of the KaaSCephCluster CR as described in Ceph advanced configuration.
Note the names of the removed nodes, devices, or their paths exactly as they were specified in KaaSCephCluster for further usage.
Create a KaaSCephOperationRequest CR as described in KaaSCephOperationRequest CR specification.
If KaaSCephOperationRequest contains information about the Ceph OSDs to remove in a proper format, the information will be validated to eliminate human error and avoid a wrong Ceph OSD removal.
If KaaSCephOperationRequest is empty, the Ceph Request Controller will automatically detect Ceph OSDs for removal, if any. Auto-detection is based not only on the information provided in KaaSCephCluster but also on the information from the Ceph cluster itself.
Once the validation or auto-detection completes, you will see the complete information about the Ceph OSDs to remove: hosts they belong to, OSD IDs, disks, partitions, and so on. Once the information appears in the KaaSCephOperationRequest object, the request moves to the ApproveWaiting phase and stays there until you manually specify the approve flag in the spec.
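To check whether a request has reached the ApproveWaiting phase, you can read the phase from the request object. The following is a minimal sketch, not the documented procedure. It assumes that the phase is exposed in the status.phase field and that the CR can be addressed as kaascephoperationrequest. The request_phase function and the KUBECTL variable are hypothetical helpers introduced for illustration.

```shell
# KUBECTL can be overridden, for example to point at a wrapper script.
KUBECTL="${KUBECTL:-kubectl}"

# request_phase NAME NAMESPACE
# Prints the current phase of the given removal request.
# ASSUMPTION: the phase is stored in .status.phase of the object;
# verify the field against KaaSCephOperationRequest status.
request_phase() {
    "$KUBECTL" get kaascephoperationrequest "$1" -n "$2" \
        -o jsonpath='{.status.phase}'
}
```

If these assumptions hold, running request_phase <REQUEST-NAME> <NAMESPACE> is expected to print ApproveWaiting while the request awaits approval.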
To execute the request, manually add an affirmative approve flag in the KaaSCephOperationRequest spec. Once done, the Ceph Status Controller reconciliation pauses until the request is handled, and the request execution does the following:
Stops regular Ceph Controller reconciliation
Removes Ceph OSDs
Runs batch jobs to clean up the device, if possible
Removes host information from the Ceph cluster if the entire Ceph node is removed
Marks the request with the appropriate result and a description of any issues that occurred
If the request completes successfully, Ceph Controller reconciliation resumes. Otherwise, it remains paused until the issue is resolved.
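One way to set the approval flag without editing the object by hand is a merge patch. The following is a sketch under stated assumptions, not the documented procedure. The field path spec.osdRemove.approve is an assumption that is not confirmed by this section, so check it against the KaaSCephOperationRequest CR specification before use. The approve_patch and approve_request functions and the KUBECTL variable are hypothetical helpers.

```shell
# KUBECTL can be overridden, for example to point at a wrapper script.
KUBECTL="${KUBECTL:-kubectl}"

# approve_patch
# Prints a JSON merge patch that sets the approval flag to true.
# ASSUMPTION: the flag is located at spec.osdRemove.approve.
approve_patch() {
    printf '%s\n' '{"spec":{"osdRemove":{"approve":true}}}'
}

# approve_request NAME NAMESPACE
# Applies the patch to the given request object.
approve_request() {
    "$KUBECTL" patch kaascephoperationrequest "$1" -n "$2" \
        --type merge -p "$(approve_patch)"
}
```

A merge patch is used here because it changes only the listed field and leaves the validated or auto-detected removal information in the object untouched.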
Verify the status of the Ceph OSD removal as described in KaaSCephOperationRequest status.
Manually remove the device cleanup jobs:
The device cleanup jobs are not removed automatically and are kept in the ceph-lcm-mirantis namespace along with pods containing information about the executed actions. The jobs have the following labels:
labels:
  app: miraceph-cleanup-disks
  host: <HOST-NAME>
  osd: <OSD-ID>
  rook-cluster: <ROOK-CLUSTER-NAME>
Additionally, jobs are labeled with the names of the disks that will be cleaned up, such as vdb=true. You can remove a single job or a group of jobs using any label described above, such as host, disk, and so on.
kubectl delete jobs -n ceph-lcm-mirantis -l app=miraceph-cleanup-disks
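To narrow the deletion to a specific host, OSD, or disk, several labels can be combined in one selector, because kubectl treats comma-separated label requirements as a logical AND. The following sketch builds such a selector and prints the resulting command for review instead of running it. The cleanup_selector and cleanup_command functions are hypothetical helpers, and worker-1 is a made-up host name.

```shell
# Namespace that holds the cleanup jobs, as stated above.
NAMESPACE="ceph-lcm-mirantis"

# cleanup_selector [LABEL=VALUE ...]
# Builds a comma-separated label selector. The app label is always
# included so that only device cleanup jobs are matched.
cleanup_selector() {
    selector="app=miraceph-cleanup-disks"
    for label in "$@"; do
        selector="${selector},${label}"
    done
    printf '%s\n' "$selector"
}

# cleanup_command [LABEL=VALUE ...]
# Prints the delete command for review instead of running it.
cleanup_command() {
    printf 'kubectl delete jobs -n %s -l %s\n' \
        "$NAMESPACE" "$(cleanup_selector "$@")"
}

# Example with hypothetical values: jobs for disk vdb on host worker-1.
cleanup_command host=worker-1 vdb=true
# prints: kubectl delete jobs -n ceph-lcm-mirantis -l app=miraceph-cleanup-disks,host=worker-1,vdb=true
```

Printing the command first makes it easy to confirm the selector before any job is deleted.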