Creating a Ceph OSD removal request
The workflow of creating a Ceph OSD removal request includes the following steps:
Removing obsolete nodes or disks from the spec.nodes section of the KaaSCephCluster CR as described in Ceph advanced configuration.
Note
Note the names of the removed nodes, devices, or their paths exactly as they were specified in KaaSCephCluster for further usage.
Creating a YAML template for the KaaSCephOperationRequest CR. For details, see KaaSCephOperationRequest OSD removal specification.
If KaaSCephOperationRequest contains information about Ceph OSDs to remove in a proper format, the information will be validated to eliminate human error and avoid a wrong Ceph OSD removal.
If the osdRemove.nodes section of KaaSCephOperationRequest is empty, the Ceph Request Controller will automatically detect Ceph OSDs for removal, if any. Auto-detection is based not only on the information provided in the KaaSCephCluster but also on the information from the Ceph cluster itself.
Once the validation or auto-detection completes, the complete information about the Ceph OSDs to remove appears in the KaaSCephOperationRequest object: hosts they belong to, OSD IDs, disks, partitions, and so on. The request then moves to the ApproveWaiting phase until the Operator manually specifies the approve flag in the spec.
Manually adding an affirmative approve flag in the KaaSCephOperationRequest spec. Once done, the Ceph Status Controller reconciliation pauses until the request is handled, and the request processing does the following:
Stops regular Ceph Controller reconciliation
Removes Ceph OSDs
Runs batch jobs to clean up the device, if possible
Removes host information from the Ceph cluster if the entire Ceph node is removed
Marks the request with an appropriate result and a description of any issues that occurred
Note
If the request completes successfully, Ceph Controller reconciliation resumes. Otherwise, it remains paused until the issue is resolved.
Reviewing the Ceph OSD removal status. For details, see KaaSCephOperationRequest OSD removal status.
Manually removing device cleanup jobs.
Note
Device cleanup jobs are not removed automatically and are kept in the ceph-lcm-mirantis namespace along with pods containing information about the executed actions. The jobs have the following labels:
labels:
  app: miraceph-cleanup-disks
  host: <HOST-NAME>
  osd: <OSD-ID>
  rook-cluster: <ROOK-CLUSTER-NAME>
Additionally, jobs are labeled with disk names that will be cleaned up, such as vdb=true. You can remove a single job or a group of jobs using any label described above, such as host, disk, and so on.
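As an illustration of removing cleanup jobs by label, the following minimal Python sketch composes a label selector from the labels listed in the note and prints a matching kubectl delete command. The helper names cleanup_job_selector and delete_jobs_command are hypothetical and are not part of any product tooling. The sketch only prints the command and does not run it, and it assumes that kubectl is pointed at the cluster that holds the ceph-lcm-mirantis namespace.

```python
import shlex

# Values taken from the note above.
CLEANUP_NAMESPACE = "ceph-lcm-mirantis"
CLEANUP_APP_LABEL = "miraceph-cleanup-disks"


def cleanup_job_selector(host=None, osd=None, disk=None):
    """Build a label selector for device cleanup jobs (hypothetical helper).

    host and osd map to the `host` and `osd` labels; disk maps to the
    per-disk label of the form `<disk>=true`, for example `vdb=true`.
    """
    parts = [f"app={CLEANUP_APP_LABEL}"]
    if host is not None:
        parts.append(f"host={host}")
    if osd is not None:
        parts.append(f"osd={osd}")
    if disk is not None:
        parts.append(f"{disk}=true")
    return ",".join(parts)


def delete_jobs_command(selector, namespace=CLEANUP_NAMESPACE):
    """Return the kubectl argument list that deletes the matching jobs."""
    return [
        "kubectl", "delete", "jobs",
        "--namespace", namespace,
        "--selector", selector,
    ]


if __name__ == "__main__":
    # Example: all cleanup jobs for disk vdb on host worker-3.
    selector = cleanup_job_selector(host="worker-3", disk="vdb")
    print(shlex.join(delete_jobs_command(selector)))
```

Running the same selector with kubectl get jobs first is a cautious way to review which jobs match before deleting them.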
Example of KaaSCephOperationRequest resource
apiVersion: kaas.mirantis.com/v1alpha1
kind: KaaSCephOperationRequest
metadata:
  name: remove-osd-3-4-request
  namespace: managed-namespace
spec:
  osdRemove:
    approve: true
    nodes:
      worker-3:
        cleanupByDevice:
        - name: sdb
        - path: /dev/disk/by-path/pci-0000:00:1t.9
  kaasCephCluster:
    name: ceph-cluster-managed-cluster
    namespace: managed-namespace
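The example above already carries approve: true. For a request that is waiting in the ApproveWaiting phase, the flag has to be added to the existing object. The following Python sketch shows one possible way to express that change as a JSON merge patch and to print the corresponding kubectl patch command. The helper names are hypothetical, and the resource name kaascephoperationrequest passed to kubectl is an assumption derived from the kind, so verify it against the CRDs installed in your cluster. The sketch prints the command and does not execute it.

```python
import json
import shlex


def approve_patch():
    """Merge patch that sets spec.osdRemove.approve to true.

    A JSON merge patch merges nested objects, so other keys under
    spec.osdRemove (such as nodes) are left untouched.
    """
    return {"spec": {"osdRemove": {"approve": True}}}


def patch_command(name, namespace):
    """Return kubectl arguments that apply the approval patch (hypothetical helper)."""
    return [
        "kubectl", "patch",
        "kaascephoperationrequest",  # assumed resource name, check your CRDs
        name,
        "--namespace", namespace,
        "--type", "merge",
        "--patch", json.dumps(approve_patch()),
    ]


if __name__ == "__main__":
    # Names taken from the example resource above.
    cmd = patch_command("remove-osd-3-4-request", "managed-namespace")
    print(shlex.join(cmd))
```

Review the list of Ceph OSDs recorded in the request before approving it, because approval starts the removal.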
Example of Ceph OSDs ready for removal
apiVersion: kaas.mirantis.com/v1alpha1
kind: KaaSCephOperationRequest
metadata:
  generateName: remove-osds
  namespace: managed-ns
spec:
  osdRemove: {}
  kaasCephCluster:
    name: ceph-cluster-managed-cl
    namespace: managed-ns
See also
ceph-failed-kcor-timeout