Add, remove, or reconfigure Ceph OSDs

Mirantis Ceph Controller simplifies Ceph cluster management by automating lifecycle management (LCM) operations. This section describes how to add, remove, or reconfigure Ceph OSDs.

Add a Ceph OSD on a managed cluster

  1. Manually prepare the required machine devices with LVM2 on the existing node because BareMetalHostProfile does not support in-place changes.

  2. Open the KaaSCephCluster CR of a managed cluster for editing:

    kubectl edit kaascephcluster -n <managedClusterProjectName>
    

    Substitute <managedClusterProjectName> with the corresponding value.

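  3. In the nodes.<machineName>.storageDevices section, specify the parameters for a Ceph OSD as required. For parameter descriptions, see Node parameters.

    Example configurations of the nodes section with a new storage device. The first example addresses the devices by their by-id symlinks (fullPath), the second one by device name (name):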

    nodes:
      kaas-node-5bgk6:
        roles:
        - mon
        - mgr
        storageDevices:
        - config: # existing item
            deviceClass: hdd
          fullPath: /dev/disk/by-id/scsi-SATA_HGST_HUS724040AL_PN1334PEHN18ZS
        - config: # new item
            deviceClass: hdd
          fullPath: /dev/disk/by-id/scsi-0ATA_HGST_HUS724040AL_PN1334PEHN1VBC
    
    nodes:
      kaas-node-5bgk6:
        roles:
        - mon
        - mgr
        storageDevices:
        - config: # existing item
            deviceClass: hdd
          name: sdb
        - config: # new item
            deviceClass: hdd
          name: sdc
    

    Warning

    Since Container Cloud 2.25.0, Mirantis highly recommends using the non-wwn by-id symlinks to specify storage devices in the storageDevices list.

    For details, see Addressing storage devices.

  4. Verify that the Ceph OSD on the specified node is successfully deployed. In the output of the following command, the fullClusterInfo section must not contain any issues:

    kubectl -n <managedClusterProjectName> get kaascephcluster -o yaml
    

    For example:

    status:
      fullClusterInfo:
        daemonsStatus:
          ...
          osd:
            running: '3/3 running: 3 up, 3 in'
            status: Ok
    

    Note

    Since Container Cloud 2.24.0, cephDeviceMapping is removed because its large size can potentially exceed the Kubernetes 1.5 MB quota.

  5. Verify the Ceph OSD on the managed cluster:

    kubectl -n rook-ceph get pod -l app=rook-ceph-osd -o wide | grep <machineName>
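
The verification in step 4 can also be scripted. The following is a minimal shell sketch, not part of the product tooling. The helper names osds_all_up_in and check_osds are hypothetical, the parser assumes that the running string has the same shape as in the example above, and the jsonpath expression assumes that the project namespace contains exactly one KaaSCephCluster object.

```shell
# Succeed only if a daemonsStatus.osd.running string such as
# '3/3 running: 3 up, 3 in' reports every Ceph OSD as running, up, and in.
osds_all_up_in() {
  # Extract the four counters: <running> <total> <up> <in>.
  # Assumption: the string format matches the example in this section.
  set -- $(printf '%s\n' "$1" | sed -n \
    's|^\([0-9][0-9]*\)/\([0-9][0-9]*\) running: \([0-9][0-9]*\) up, \([0-9][0-9]*\) in$|\1 \2 \3 \4|p')
  [ "$#" -eq 4 ] && [ "$1" -eq "$2" ] && [ "$3" -eq "$2" ] && [ "$4" -eq "$2" ]
}

# Read the running string from the KaaSCephCluster status and evaluate it.
# Argument: the managed cluster project name (namespace).
# Assumption: the namespace holds a single KaaSCephCluster object.
check_osds() {
  running=$(kubectl -n "$1" get kaascephcluster \
    -o jsonpath='{.items[0].status.fullClusterInfo.daemonsStatus.osd.running}')
  osds_all_up_in "$running"
}
```

For example, calling check_osds with the managed cluster project name returns a zero exit code only when all Ceph OSDs, including the new one, are up and in. An unexpected string format is treated as a failure rather than guessed at.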
    

Remove a Ceph OSD from a managed cluster

Note

Ceph OSD removal requires a KaaSCephOperationRequest CR. For the workflow overview and the description of the spec and phases, see High-level workflow of Ceph OSD or node removal.

Warning

Ceph OSD removal cannot be performed on Ceph pools that use the non-recommended replicated.size of less than 3. The minimal replica size equals half of the specified replicated.size, rounded up.

For example, if replicated.size is 2, the minimal replica size is 1, and if replicated.size is 3, the minimal replica size is 2. A replica size of 1 allows Ceph to have PGs with only one Ceph OSD in the acting state, which may cause a PG_TOO_DEGRADED health warning that blocks Ceph OSD removal. Mirantis recommends setting replicated.size to 3 for each Ceph pool.
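
The rounding rule above can be checked with a few lines of shell. This is only an illustration of the arithmetic stated in the warning. The function name min_replica_size is hypothetical and is not a Ceph or Container Cloud command.

```shell
# Minimal replica size as described above: half of replicated.size, rounded up.
# Adding 1 before the integer division by 2 implements the rounding up.
min_replica_size() {
  echo $(( ($1 + 1) / 2 ))
}

min_replica_size 2   # replicated.size of 2
min_replica_size 3   # replicated.size of 3
```

The two calls print 1 and 2, which matches the values given in the warning.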

  1. Open the KaaSCephCluster CR of a managed cluster for editing:

    kubectl edit kaascephcluster -n <managedClusterProjectName>
    

    Substitute <managedClusterProjectName> with the corresponding value.

  2. Remove the required Ceph OSD specification from the spec.cephClusterSpec.nodes.<machineName>.storageDevices list:

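    Example configurations of the nodes section with the storage device to remove. The first example addresses the devices by their by-id symlinks (fullPath), the second one by device name (name):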

    nodes:
      kaas-node-5bgk6:
        roles:
        - mon
        - mgr
        storageDevices:
        - config:
            deviceClass: hdd
          fullPath: /dev/disk/by-id/scsi-SATA_HGST_HUS724040AL_PN1334PEHN18ZS
        - config: # remove the entire item entry from storageDevices list
            deviceClass: hdd
          fullPath: /dev/disk/by-id/scsi-0ATA_HGST_HUS724040AL_PN1334PEHN1VBC
    
    nodes:
      kaas-node-5bgk6:
        roles:
        - mon
        - mgr
        storageDevices:
        - config:
            deviceClass: hdd
          name: sdb
        - config: # remove the entire item entry from storageDevices list
            deviceClass: hdd
          name: sdc
    
  3. Create a YAML template for the KaaSCephOperationRequest CR. Select from the following options:

    • Remove Ceph OSD by device name, by-path symlink, or by-id symlink:

      apiVersion: kaas.mirantis.com/v1alpha1
      kind: KaaSCephOperationRequest
      metadata:
        name: remove-osd-<machineName>-sdb
        namespace: <managedClusterProjectName>
      spec:
        osdRemove:
          nodes:
            <machineName>:
              cleanupByDevice:
              - name: sdb
        kaasCephCluster:
          name: <kaasCephClusterName>
          namespace: <managedClusterProjectName>
      

      Substitute <managedClusterProjectName> with the corresponding cluster namespace and <kaasCephClusterName> with the corresponding KaaSCephCluster name.

      Warning

      Since Container Cloud 2.25.0, Mirantis does not recommend setting device name or device by-path symlink in the cleanupByDevice field as these identifiers are not persistent and can change at node boot. Remove Ceph OSDs with by-id symlinks specified in the path field or use cleanupByOsdId instead.

      For details, see Addressing storage devices.

      Note

      • Since Container Cloud 2.23.0 and 2.23.1 for MOSK 23.1, cleanupByDevice is not supported if a device was physically removed from a node. Therefore, use cleanupByOsdId instead. For details, see Remove a failed Ceph OSD by Ceph OSD ID.

      • Before Container Cloud 2.23.0 and 2.23.1 for MOSK 23.1, if the storageDevice item was specified with by-id, specify the path parameter in the cleanupByDevice section instead of name.

      • If the storageDevice item was specified with a by-path device path, specify the path parameter in the cleanupByDevice section instead of name.

    • Remove Ceph OSD by OSD ID:

      apiVersion: kaas.mirantis.com/v1alpha1
      kind: KaaSCephOperationRequest
      metadata:
        name: remove-osd-<machineName>-sdb
        namespace: <managedClusterProjectName>
      spec:
        osdRemove:
          nodes:
            <machineName>:
              cleanupByOsdId:
              - 2
        kaasCephCluster:
          name: <kaasCephClusterName>
          namespace: <managedClusterProjectName>
      

      Substitute <managedClusterProjectName> with the corresponding cluster namespace and <kaasCephClusterName> with the corresponding KaaSCephCluster name.

  4. Apply the template on the management cluster in the corresponding namespace:

    kubectl apply -f remove-osd-<machineName>-sdb.yaml
    
  5. Verify that the corresponding request has been created:

    kubectl get kaascephoperationrequest remove-osd-<machineName>-sdb -n <managedClusterProjectName>
    
  6. Verify that the removeInfo section appeared in the KaaSCephOperationRequest CR status:

    kubectl -n <managedClusterProjectName> get kaascephoperationrequest remove-osd-<machineName>-sdb -o yaml
    

    Example of system response:

    status:
      childNodesMapping:
        kaas-node-d4aac64d-1721-446c-b7df-e351c3025591: <machineName>
      osdRemoveStatus:
        removeInfo:
          cleanUpMap:
            kaas-node-d4aac64d-1721-446c-b7df-e351c3025591:
              osdMapping:
                "10":
                  deviceMapping:
                    sdb:
                      path: "/dev/disk/by-path/pci-0000:00:1t.9"
                      partition: "/dev/ceph-b-vg_sdb/osd-block-b-lv_sdb"
                      type: "block"
                      class: "hdd"
                      zapDisk: true
    
  7. Verify that the cleanUpMap section matches the required removal and wait for the ApproveWaiting phase to appear in status:

    kubectl -n <managedClusterProjectName> get kaascephoperationrequest remove-osd-<machineName>-sdb -o yaml
    

    Example of system response:

    status:
      phase: ApproveWaiting
    
  8. Edit the KaaSCephOperationRequest CR and set the approve flag to true:

    kubectl -n <managedClusterProjectName> edit kaascephoperationrequest remove-osd-<machineName>-sdb
    

    For example:

    spec:
      osdRemove:
        approve: true
    
  9. Review the processing status of the KaaSCephOperationRequest resource. The key parameters are as follows:

    • status.phase - the current state of request processing

    • status.messages - the description of the current phase

    • status.conditions - full history of request processing before the current phase

    • status.removeInfo.issues and status.removeInfo.warnings - error and warning messages that occurred during request processing

  10. Verify that the KaaSCephOperationRequest has been completed. For example:

    status:
      phase: Completed # or CompletedWithWarnings if there are non-critical issues
    
  11. Remove the device cleanup jobs:

    kubectl delete jobs -n ceph-lcm-mirantis -l app=miraceph-cleanup-disks
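
Steps 3 through 8 can be sketched as shell helpers for the removal by OSD ID. This is a hedged illustration under several assumptions, not a supported tool. The function names are hypothetical, the request name remove-osd-<machineName> is an arbitrary choice, the polling interval and attempt limit are arbitrary, and kubectl patch with a merge patch is used here in place of the interactive kubectl edit described in step 8. Review the cleanUpMap section as described in step 7 before approving.

```shell
# Print a KaaSCephOperationRequest manifest that removes Ceph OSDs by ID.
# Arguments: <machineName> <managedClusterProjectName> <kaasCephClusterName> <osdId>...
render_osd_remove_request() {
  machine="$1"; project="$2"; cluster="$3"
  shift 3
  cat <<EOF
apiVersion: kaas.mirantis.com/v1alpha1
kind: KaaSCephOperationRequest
metadata:
  name: remove-osd-${machine}
  namespace: ${project}
spec:
  osdRemove:
    nodes:
      ${machine}:
        cleanupByOsdId:
EOF
  # One list item per requested Ceph OSD ID.
  for id in "$@"; do
    printf '        - %s\n' "$id"
  done
  cat <<EOF
  kaasCephCluster:
    name: ${cluster}
    namespace: ${project}
EOF
}

# Poll status.phase until it equals the wanted phase, for example ApproveWaiting.
# Arguments: <requestName> <managedClusterProjectName> <phase>
# Gives up after 60 attempts (an arbitrary limit chosen for this sketch).
wait_for_phase() {
  attempts=0
  while [ "$attempts" -lt 60 ]; do
    phase=$(kubectl -n "$2" get kaascephoperationrequest "$1" -o jsonpath='{.status.phase}')
    [ "$phase" = "$3" ] && return 0
    attempts=$((attempts + 1))
    sleep 10
  done
  return 1
}

# Set spec.osdRemove.approve to true with a merge patch.
# Arguments: <requestName> <managedClusterProjectName>
approve_request() {
  kubectl -n "$2" patch kaascephoperationrequest "$1" --type merge \
    -p '{"spec":{"osdRemove":{"approve":true}}}'
}
```

A possible sequence is to redirect the output of render_osd_remove_request to a file, apply it with kubectl apply -f, call wait_for_phase with ApproveWaiting, inspect the cleanUpMap section, and only then call approve_request. Rendering the manifest separately from applying it keeps the request reviewable before anything is sent to the management cluster.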
    

Reconfigure a Ceph OSD on a managed cluster

There is no hot reconfiguration procedure for existing Ceph OSDs. To reconfigure an existing Ceph OSD, follow the steps below:

  1. Remove a Ceph OSD from the Ceph cluster as described in Remove a Ceph OSD from a managed cluster.

  2. Add the same Ceph OSD but with a modified configuration as described in Add a Ceph OSD on a managed cluster.