Creating a Ceph OSD removal request

Creating a Ceph OSD removal request involves the following steps:

  1. Removing obsolete nodes or disks from the spec.nodes section of the KaaSCephCluster CR as described in Ceph advanced configuration.

    Note

    Record the names of the removed nodes and devices, or the device paths, exactly as they were specified in KaaSCephCluster. You will need them in the following steps.

  2. Creating a YAML template for the KaaSCephOperationRequest CR. For details, see KaaSCephOperationRequest OSD removal specification.

    • If KaaSCephOperationRequest specifies the Ceph OSDs to remove in the proper format, this information is validated to eliminate human error and prevent removal of the wrong Ceph OSDs.

    • If the osdRemove.nodes section of KaaSCephOperationRequest is empty, the Ceph Request Controller automatically detects the Ceph OSDs to remove, if any. Auto-detection relies both on the information provided in KaaSCephCluster and on the information obtained from the Ceph cluster itself.

    Once validation or auto-detection completes, the full information about the Ceph OSDs to remove appears in the KaaSCephOperationRequest object: the hosts they belong to, OSD IDs, disks, partitions, and so on. The request then moves to the ApproveWaiting phase and stays there until the Operator manually sets the approve flag in the spec.

  3. Manually setting the approve flag to true in the KaaSCephOperationRequest spec. Once you do this, the Ceph Status Controller reconciliation pauses until the request is handled, and request handling performs the following actions:

    • Stops regular Ceph Controller reconciliation

    • Removes Ceph OSDs

    • Runs batch jobs to clean up the affected devices, if possible

    • Removes host information from the Ceph cluster if the entire Ceph node is removed

    • Marks the request with the appropriate result and a description of any issues that occurred

    Note

    If the request completes successfully, Ceph Controller reconciliation resumes. Otherwise, it remains paused until the issue is resolved.

  4. Reviewing the Ceph OSD removal status. For details, see KaaSCephOperationRequest OSD removal status.

  5. Manually removing the device cleanup jobs.

    Note

    Device cleanup jobs are not removed automatically. They remain in the ceph-lcm-mirantis namespace together with their pods, which contain information about the executed actions. The jobs have the following labels:

    labels:
      app: miraceph-cleanup-disks
      host: <HOST-NAME>
      osd: <OSD-ID>
      rook-cluster: <ROOK-CLUSTER-NAME>
    

    Additionally, each job is labeled with the names of the disks it cleans up, for example, vdb=true. You can remove a single job or a group of jobs by selecting them with any of these labels, such as host or the disk label.
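For illustration, the labels of a single cleanup job might combine the common labels with a disk label as sketched below. This is not captured output: the host and disk values are borrowed from the worker-3 and sdb entries of the example request that follows, the OSD ID 3 is an assumption, and the Rook cluster name is left as a placeholder. Values are quoted because Kubernetes label values are strings.

```yaml
# Hypothetical labels of one device cleanup job (all values are illustrative).
metadata:
  labels:
    app: miraceph-cleanup-disks
    host: worker-3                      # node the Ceph OSD was removed from
    osd: "3"                            # ID of the removed Ceph OSD (assumed)
    rook-cluster: <ROOK-CLUSTER-NAME>
    sdb: "true"                         # disk label, same form as vdb=true
```

Any of these key-value pairs, for example host=worker-3 or sdb=true, can serve as a label selector to pick one job or a group of jobs in the ceph-lcm-mirantis namespace.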

Example of a KaaSCephOperationRequest resource that explicitly lists the devices to remove
apiVersion: kaas.mirantis.com/v1alpha1
kind: KaaSCephOperationRequest
metadata:
  name: remove-osd-3-4-request
  namespace: managed-namespace
spec:
  osdRemove:
    approve: true
    nodes:
      worker-3:
        cleanupByDevice:
        - name: sdb
        - path: /dev/disk/by-path/pci-0000:00:1t.9
  kaasCephCluster:
    name: ceph-cluster-managed-cluster
    namespace: managed-namespace

Example of a KaaSCephOperationRequest resource that auto-detects Ceph OSDs ready for removal
apiVersion: kaas.mirantis.com/v1alpha1
kind: KaaSCephOperationRequest
metadata:
  generateName: remove-osds
  namespace: managed-ns
spec:
  osdRemove: {}
  kaasCephCluster:
    name: ceph-cluster-managed-cl
    namespace: managed-ns
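A request created from the auto-detection example carries no approval. Once the Ceph Request Controller has filled in the detected Ceph OSDs and the request has reached the ApproveWaiting phase, approving it is a one-field change to the spec, mirroring the approve: true line of the explicit example. The fragment below is a sketch of the resulting spec, assuming the detected Ceph OSDs are reported by the controller rather than written back to spec.osdRemove.nodes. Because the resource uses generateName, Kubernetes assigns the actual object name, so look it up before editing the request.

```yaml
# Spec of the auto-detection request after manual approval (sketch).
spec:
  osdRemove:
    approve: true        # set manually once the request is in ApproveWaiting
  kaasCephCluster:
    name: ceph-cluster-managed-cl
    namespace: managed-ns
```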

See also

ceph-failed-kcor-timeout