Replace a failed Ceph OSD

After a physical disk replacement, you can use the Ceph LCM API to redeploy a failed Ceph OSD. The common flow of replacing a failed Ceph OSD is as follows:

  1. Remove the obsolete Ceph OSD from the Ceph cluster by device name, by Ceph OSD ID, or by path.

  2. Add a new Ceph OSD on the new disk to the Ceph cluster.

Note

Ceph OSD replacement presupposes the usage of a KaaSCephOperationRequest CR. For a workflow overview and a description of the spec and phases, see High-level workflow of Ceph OSD or node removal.

Remove a failed Ceph OSD by device name, path, or ID

Warning

The procedure below presupposes that the operator knows the exact device name, by-path, or by-id of the replaced device, as well as the node on which the replacement occurred.

Warning

Since Container Cloud 2.23.1 (Cluster release 12.7.0), a Ceph OSD removal using by-path, by-id, or device name is not supported if a device was physically removed from a node. Therefore, use cleanupByOsdId instead. For details, see Remove a failed Ceph OSD by Ceph OSD ID.

Warning

Since MOSK 23.3, Mirantis does not recommend setting the device name or the device by-path symlink in the cleanupByDevice field because these identifiers are not persistent and can change at node boot. Instead, remove Ceph OSDs with by-id symlinks specified in the path field or use cleanupByOsdId.

For details, see Container Cloud documentation: Addressing storage devices.

  1. Open the KaaSCephCluster CR of a managed cluster for editing:

    kubectl edit kaascephcluster -n <managedClusterProjectName>
    

    Substitute <managedClusterProjectName> with the corresponding value.

  2. In the nodes section, remove the required device:

    spec:
      cephClusterSpec:
        nodes:
          <machineName>:
            storageDevices:
            - name: <deviceName>  # remove the entire item from storageDevices list
              # fullPath: <deviceByPath> if the device is specified with a by-path symlink instead of the name
              config:
                deviceClass: hdd
    

    Substitute <machineName> with the machine name of the node where the device <deviceName> or <deviceByPath> is going to be replaced.

  3. Save KaaSCephCluster and close the editor.

  4. Create a KaaSCephOperationRequest CR template and save it as replace-failed-osd-<machineName>-<deviceName>-request.yaml:

    apiVersion: kaas.mirantis.com/v1alpha1
    kind: KaaSCephOperationRequest
    metadata:
      name: replace-failed-osd-<machineName>-<deviceName>
      namespace: <managedClusterProjectName>
    spec:
      osdRemove:
        nodes:
          <machineName>:
            cleanupByDevice:
            - name: <deviceName>
              # If the device is specified with a by-path or by-id symlink instead of
              # the name, use path: <deviceByPath> or path: <deviceById>.
      kaasCephCluster:
        name: <kaasCephClusterName>
        namespace: <managedClusterProjectName>
    

    Substitute <kaasCephClusterName> with the name of the corresponding KaaSCephCluster resource from the <managedClusterProjectName> namespace.
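
    For illustration, assuming a hypothetical machine worker-1 and a device identified by its by-id symlink (the namespace, cluster name, and symlink value below are illustrative), the request might look as follows:

    apiVersion: kaas.mirantis.com/v1alpha1
    kind: KaaSCephOperationRequest
    metadata:
      name: replace-failed-osd-worker-1-sdb
      namespace: managed-ns
    spec:
      osdRemove:
        nodes:
          worker-1:
            cleanupByDevice:
            - path: /dev/disk/by-id/wwn-0x5000c500a0b1c2d3
      kaasCephCluster:
        name: ceph-cluster-managed
        namespace: managed-ns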

  5. Apply the template to the cluster:

    kubectl apply -f replace-failed-osd-<machineName>-<deviceName>-request.yaml
    
  6. Verify that the corresponding request has been created:

    kubectl get kaascephoperationrequest -n <managedClusterProjectName>
    
  7. Verify that the removeInfo section appeared in the KaaSCephOperationRequest CR status:

    kubectl -n <managedClusterProjectName> get kaascephoperationrequest replace-failed-osd-<machineName>-<deviceName> -o yaml
    

    Example of system response:

    status:
      childNodesMapping:
        <nodeName>: <machineName>
      osdRemoveStatus:
        removeInfo:
          cleanUpMap:
            <nodeName>:
              osdMapping:
                <osdId>:
                  deviceMapping:
                    <dataDevice>:
                      deviceClass: hdd
                      devicePath: <dataDeviceByPath>
                      devicePurpose: block
                      usedPartition: /dev/ceph-d2d3a759-2c22-4304-b890-a2d87e056bd4/osd-block-ef516477-d2da-492f-8169-a3ebfc3417e2
                      zapDisk: true
    

    Definition of values in angle brackets:

    • <machineName> - name of the machine on which the device is being replaced, for example, worker-1

    • <nodeName> - underlying node name of the machine, for example, kaas-node-5a74b669-7e53-4535-aabd-5b509ec844af

    • <osdId> - Ceph OSD ID for the device being replaced, for example, 1

    • <dataDeviceByPath> - by-path of the device placed on the node, for example, /dev/disk/by-path/pci-0000:00:1t.9

    • <dataDevice> - name of the device placed on the node, for example, /dev/sdb
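
    To inspect only the removeInfo section instead of the whole resource, you can, for example, use a JSONPath query:

    kubectl -n <managedClusterProjectName> get kaascephoperationrequest replace-failed-osd-<machineName>-<deviceName> -o jsonpath='{.status.osdRemoveStatus.removeInfo}'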

  8. Verify that the cleanUpMap section matches the required removal and wait for the ApproveWaiting phase to appear in status:

    kubectl -n <managedClusterProjectName> get kaascephoperationrequest replace-failed-osd-<machineName>-<deviceName> -o yaml
    

    Example of system response:

    status:
      phase: ApproveWaiting
    
  9. Edit the KaaSCephOperationRequest CR and set the approve flag to true:

    kubectl -n <managedClusterProjectName> edit kaascephoperationrequest replace-failed-osd-<machineName>-<deviceName>
    

    For example:

    spec:
      osdRemove:
        approve: true
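
    Alternatively, you can apply the same change non-interactively with kubectl patch, for example:

    kubectl -n <managedClusterProjectName> patch kaascephoperationrequest replace-failed-osd-<machineName>-<deviceName> --type merge -p '{"spec":{"osdRemove":{"approve":true}}}'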
    
  10. Review the following status fields of the KaaSCephOperationRequest CR to monitor the request processing:

    • status.phase - current state of request processing

    • status.messages - description of the current phase

    • status.conditions - full history of request processing before the current phase

    • status.removeInfo.issues and status.removeInfo.warnings - error and warning messages that occurred during request processing, if any
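
    To track the current phase without opening the full resource, you can, for example, query it with JSONPath:

    kubectl -n <managedClusterProjectName> get kaascephoperationrequest replace-failed-osd-<machineName>-<deviceName> -o jsonpath='{.status.phase}{"\n"}'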

  11. Verify that the KaaSCephOperationRequest has been completed. For example:

    status:
      phase: Completed # or CompletedWithWarnings if there are non-critical issues
    
  12. Remove the device cleanup jobs:

    kubectl delete jobs -n ceph-lcm-mirantis -l app=miraceph-cleanup-disks
    

Remove a failed Ceph OSD by Ceph OSD ID

Caution

The procedure below presupposes that the operator knows only the failed Ceph OSD ID.

  1. Identify the node and device names used by the affected Ceph OSD:

    Using the Ceph CLI in the rook-ceph-tools Pod, run:

    kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph osd metadata <osdId>
    

    Substitute <osdId> with the affected OSD ID.

    Example output:

    {
      "id": 1,
      ...
      "bluefs_db_devices": "vdc",
      ...
      "bluestore_bdev_devices": "vde",
      ...
      "devices": "vdc,vde",
      ...
      "hostname": "kaas-node-6c5e76f9-c2d2-4b1a-b047-3c299913a4bf",
      ...
    },
    

    In the example above, hostname is the node name and devices are all devices used by the affected Ceph OSD.

    In the status section of the KaaSCephCluster CR, obtain the osd-device mapping:

    kubectl get kaascephcluster -n <managedClusterProjectName> -o yaml
    

    Substitute <managedClusterProjectName> with the corresponding value.

    For example:

    status:
      fullClusterInfo:
        cephDetails:
          cephDeviceMapping:
            <nodeName>:
              <osdId>: <deviceName>
    

    In the system response, capture the following parameters:

    • <nodeName> - the corresponding node name that hosts the Ceph OSD

    • <osdId> - the ID of the Ceph OSD to replace

    • <deviceName> - an actual device name to replace
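
    For example, if you know the KaaSCephCluster resource name (<kaasCephClusterName>), you can extract only the mapping with a JSONPath query:

    kubectl -n <managedClusterProjectName> get kaascephcluster <kaasCephClusterName> -o jsonpath='{.status.fullClusterInfo.cephDetails.cephDeviceMapping}'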

  2. Obtain <machineName> for <nodeName> where the Ceph OSD is placed:

    kubectl -n rook-ceph get node -o jsonpath='{range .items[*]}{@.metadata.name}{" "}{@.metadata.labels.kaas\.mirantis\.com\/machine-name}{"\n"}{end}'
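
    The command prints pairs of node name and machine name, one per line, for example (illustrative values):

    kaas-node-5a74b669-7e53-4535-aabd-5b509ec844af worker-1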
    
  3. Open the KaaSCephCluster CR of a managed cluster for editing:

    kubectl edit kaascephcluster -n <managedClusterProjectName>
    

    Substitute <managedClusterProjectName> with the corresponding value.

  4. In the nodes section, remove the required device:

    spec:
      cephClusterSpec:
        nodes:
          <machineName>:
            storageDevices:
            - name: <deviceName>  # remove the entire item from storageDevices list
              config:
                deviceClass: hdd
    

    Substitute <machineName> with the machine name of the node where the device <deviceName> is going to be replaced.

  5. Save KaaSCephCluster and close the editor.

  6. Create a KaaSCephOperationRequest CR template and save it as replace-failed-<machineName>-osd-<osdId>-request.yaml:

    apiVersion: kaas.mirantis.com/v1alpha1
    kind: KaaSCephOperationRequest
    metadata:
      name: replace-failed-<machineName>-osd-<osdId>
      namespace: <managedClusterProjectName>
    spec:
      osdRemove:
        nodes:
          <machineName>:
            cleanupByOsdId:
            - <osdId>
      kaasCephCluster:
        name: <kaasCephClusterName>
        namespace: <managedClusterProjectName>
    

    Substitute <kaasCephClusterName> with the name of the corresponding KaaSCephCluster resource from the <managedClusterProjectName> namespace.
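
    For example, assuming the machine worker-1 and the failed Ceph OSD ID 1 (the namespace and cluster names below are illustrative), the request might look as follows:

    apiVersion: kaas.mirantis.com/v1alpha1
    kind: KaaSCephOperationRequest
    metadata:
      name: replace-failed-worker-1-osd-1
      namespace: managed-ns
    spec:
      osdRemove:
        nodes:
          worker-1:
            cleanupByOsdId:
            - 1
      kaasCephCluster:
        name: ceph-cluster-managed
        namespace: managed-ns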

  7. Apply the template to the cluster:

    kubectl apply -f replace-failed-<machineName>-osd-<osdId>-request.yaml
    
  8. Verify that the corresponding request has been created:

    kubectl get kaascephoperationrequest -n <managedClusterProjectName>
    
  9. Verify that the removeInfo section appeared in the KaaSCephOperationRequest CR status:

    kubectl -n <managedClusterProjectName> get kaascephoperationrequest replace-failed-<machineName>-osd-<osdId> -o yaml
    

    Example of system response:

    status:
      childNodesMapping:
        <nodeName>: <machineName>
      osdRemoveStatus:
        removeInfo:
          cleanUpMap:
            <nodeName>:
              osdMapping:
                <osdId>:
                  deviceMapping:
                    <dataDevice>:
                      deviceClass: hdd
                      devicePath: <dataDeviceByPath>
                      devicePurpose: block
                      usedPartition: /dev/ceph-d2d3a759-2c22-4304-b890-a2d87e056bd4/osd-block-ef516477-d2da-492f-8169-a3ebfc3417e2
                      zapDisk: true
    

    Definition of values in angle brackets:

    • <machineName> - name of the machine on which the device is being replaced, for example, worker-1

    • <nodeName> - underlying node name of the machine, for example, kaas-node-5a74b669-7e53-4535-aabd-5b509ec844af

    • <osdId> - Ceph OSD ID for the device being replaced, for example, 1

    • <dataDeviceByPath> - by-path of the device placed on the node, for example, /dev/disk/by-path/pci-0000:00:1t.9

    • <dataDevice> - name of the device placed on the node, for example, /dev/sdb

  10. Verify that the cleanUpMap section matches the required removal and wait for the ApproveWaiting phase to appear in status:

    kubectl -n <managedClusterProjectName> get kaascephoperationrequest replace-failed-<machineName>-osd-<osdId> -o yaml
    

    Example of system response:

    status:
      phase: ApproveWaiting
    
  11. Edit the KaaSCephOperationRequest CR and set the approve flag to true:

    kubectl -n <managedClusterProjectName> edit kaascephoperationrequest replace-failed-<machineName>-osd-<osdId>
    

    For example:

    spec:
      osdRemove:
        approve: true
    
  12. Review the following status fields of the KaaSCephOperationRequest CR to monitor the request processing:

    • status.phase - current state of request processing

    • status.messages - description of the current phase

    • status.conditions - full history of request processing before the current phase

    • status.removeInfo.issues and status.removeInfo.warnings - error and warning messages that occurred during request processing, if any

  13. Verify that the KaaSCephOperationRequest has been completed. For example:

    status:
      phase: Completed # or CompletedWithWarnings if there are non-critical issues
    
  14. Remove the device cleanup jobs:

    kubectl delete jobs -n ceph-lcm-mirantis -l app=miraceph-cleanup-disks
    

Deploy a new device after removal of a failed one

Note

You can spawn a Ceph OSD on a raw device, but it must be clean and without any data or partitions. If you want to add a device that was previously in use, also ensure that it is raw and clean. To clean up all data and partitions from a device, refer to the official Rook documentation.
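
For reference, a typical cleanup sequence based on the Rook documentation, run as root on the node, looks similar to the following (adjust the device path to your environment; blkdiscard applies to SSDs only):

    DISK="/dev/sdX"                                                # replace with the actual device
    sgdisk --zap-all "$DISK"                                       # wipe GPT and MBR data structures
    dd if=/dev/zero of="$DISK" bs=1M count=100 oflag=direct,dsync  # zero the beginning of the disk
    blkdiscard "$DISK"                                             # SSDs only: discard all blocks
    partprobe "$DISK"                                              # inform the kernel of partition table changes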

  1. If you want to add a Ceph OSD on top of a raw device that already exists on a node or is hot-plugged, add the required device using the following guidelines:

    • You can add a raw device to a node during node deployment.

    • If a node supports adding devices without node reboot, you can hot plug a raw device to a node.

    • If a node does not support adding devices without node reboot, you can hot plug a raw device during node shutdown. In this case, complete the following steps:

      1. Enable maintenance mode on the managed cluster.

      2. Turn off the required node.

      3. Attach the required raw device to the node.

      4. Turn on the required node.

      5. Disable maintenance mode on the managed cluster.

  2. Open the KaaSCephCluster CR of a managed cluster for editing:

    kubectl edit kaascephcluster -n <managedClusterProjectName>
    

    Substitute <managedClusterProjectName> with the corresponding value.

  3. In the nodes section, add a new device:

    spec:
      cephClusterSpec:
        nodes:
          <machineName>:
            storageDevices:
            - fullPath: <deviceByID> # Since Container Cloud 2.25.0 (17.0.0), if the device is added using its by-id symlink
              # name: <deviceByID> # Prior to Container Cloud 2.25.0, if the device is added using its by-id symlink
              # fullPath: <deviceByPath> # If the device is added using its by-path symlink
              config:
                deviceClass: hdd
    

    Substitute <machineName> with the machine name of the node where the device <deviceByID> or <deviceByPath> is going to be added as a Ceph OSD.
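
    For example, assuming a hypothetical machine worker-1 and a new disk exposed through a by-id symlink (the symlink value is illustrative):

    spec:
      cephClusterSpec:
        nodes:
          worker-1:
            storageDevices:
            - fullPath: /dev/disk/by-id/wwn-0x5000c500a0b1c2d3
              config:
                deviceClass: hdd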

  4. Verify that the new Ceph OSD has appeared in the Ceph cluster and is up and in. The fullClusterInfo section should not contain any issues.

    kubectl -n <managedClusterProjectName> get kaascephcluster -o yaml
    

    For example:

    status:
      fullClusterInfo:
        daemonStatus:
          osd:
            running: '3/3 running: 3 up, 3 in'
            status: Ok
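
    You can also cross-check from the Ceph side, for example, using the Ceph CLI in the rook-ceph-tools Pod:

    kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph osd tree

    The new Ceph OSD must appear in the tree with the up status.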