Replace a failed metadata device

This section describes how to recover when a metadata device fails together with all Ceph OSDs that use it. In this case, the only solution is to remove all Ceph OSDs related to the failed metadata device, attach a new device to serve as the metadata device, and re-create all affected Ceph OSDs.

Caution

If you used BareMetalHostProfile to automatically partition the failed device, you must partition the new device manually because BareMetalHostProfile does not support hot-load changes and partitions devices automatically only during node provisioning.

Remove failed Ceph OSDs with the affected metadata device

  1. Save the KaaSCephCluster specification of all Ceph OSDs affected by the failed metadata device to re-use this specification during re-creation of Ceph OSDs after disk replacement.

  2. Identify Ceph OSD IDs related to the failed metadata device, for example, using Ceph CLI in the rook-ceph-tools Pod:

    ceph osd metadata
    

    Example of system response:

    {
        "id": 11,
        ...
        "bluefs_db_devices": "vdc",
        ...
        "bluestore_bdev_devices": "vde",
        ...
        "devices": "vdc,vde",
        ...
        "hostname": "kaas-node-6c5e76f9-c2d2-4b1a-b047-3c299913a4bf",
        ...
    },
    {
        "id": 12,
        ...
        "bluefs_db_devices": "vdd",
        ...
        "bluestore_bdev_devices": "vde",
        ...
        "devices": "vdd,vde",
        ...
        "hostname": "kaas-node-6c5e76f9-c2d2-4b1a-b047-3c299913a4bf",
        ...
    },
    {
        "id": 13,
        ...
        "bluefs_db_devices": "vdf",
        ...
        "bluestore_bdev_devices": "vde",
        ...
        "devices": "vde,vdf",
        ...
        "hostname": "kaas-node-6c5e76f9-c2d2-4b1a-b047-3c299913a4bf",
        ...
    },
    ...
    
  3. Open the KaaSCephCluster custom resource (CR) for editing:

    kubectl edit kaascephcluster -n <managedClusterProjectName>
    

    Substitute <managedClusterProjectName> with the corresponding value.

  4. In the nodes section, remove all storageDevices items that relate to the failed metadata device. For example:

    spec:
      cephClusterSpec:
        nodes:
          <machineName>:
            storageDevices:
            - name: <deviceName1>  # remove the entire item from the storageDevices list
              # fullPath: <deviceByPath> if device is specified using symlink instead of name
              config:
                deviceClass: hdd
                metadataDevice: <metadataDevice>
            - name: <deviceName2>  # remove the entire item from the storageDevices list
              config:
                deviceClass: hdd
                metadataDevice: <metadataDevice>
            - name: <deviceName3>  # remove the entire item from the storageDevices list
              config:
                deviceClass: hdd
                metadataDevice: <metadataDevice>
            ...
    

    In the example above, <machineName> is the machine name of the node where the metadata device <metadataDevice> must be replaced.

  5. Create a KaaSCephOperationRequest CR template and save it as replace-failed-meta-<machineName>-<metadataDevice>-request.yaml:

    apiVersion: kaas.mirantis.com/v1alpha1
    kind: KaaSCephOperationRequest
    metadata:
      name: replace-failed-meta-<machineName>-<metadataDevice>
      namespace: <managedClusterProjectName>
    spec:
      osdRemove:
        nodes:
          <machineName>:
            cleanupByOsdId:
            - <osdID-1>
            - <osdID-2>
            ...
      kaasCephCluster:
        name: <kaasCephClusterName>
        namespace: <managedClusterProjectName>
    

    Substitute the following parameters:

    • <machineName> and <metadataDevice> with the machine and device names from the previous step

    • <managedClusterProjectName> with the project name of the related managed cluster

    • <osdID-*> with IDs of the affected Ceph OSDs

    • <kaasCephClusterName> with the KaaSCephCluster CR name

  6. Apply the template to the cluster:

    kubectl apply -f replace-failed-meta-<machineName>-<metadataDevice>-request.yaml
    
  7. Verify that the corresponding request has been created:

    kubectl get kaascephoperationrequest -n <managedClusterProjectName>
    
  8. Verify that the removeInfo section is present in the KaaSCephOperationRequest CR status and that the cleanUpMap section matches the required removal:

    kubectl -n <managedClusterProjectName> get kaascephoperationrequest replace-failed-meta-<machineName>-<metadataDevice> -o yaml
    

    Example of system response:

    childNodesMapping:
      <nodeName>: <machineName>
    removeInfo:
      cleanUpMap:
        <nodeName>:
          osdMapping:
            "<osdID-1>":
              deviceMapping:
                <dataDevice-1>:
                  deviceClass: hdd
                  devicePath: <dataDeviceByPath-1>
                  devicePurpose: block
                  usedPartition: <dataLvPartition-1>
                  zapDisk: true
                <metadataDevice>:
                  deviceClass: hdd
                  devicePath: <metadataDeviceByPath>
                  devicePurpose: db
                  usedPartition: /dev/ceph-b0c70c72-8570-4c9d-93e9-51c3ab4dd9f9/osd-db-ecf64b20-1e07-42ac-a8ee-32ba3c0b7e2f
              uuid: ef516477-d2da-492f-8169-a3ebfc3417e2
            "<osdID-2>":
              deviceMapping:
                <dataDevice-2>:
                  deviceClass: hdd
                  devicePath: <dataDeviceByPath-2>
                  devicePurpose: block
                  usedPartition: <dataLvPartition-2>
                  zapDisk: true
                <metadataDevice>:
                  deviceClass: hdd
                  devicePath: <metadataDeviceByPath>
                  devicePurpose: db
                  usedPartition: /dev/ceph-b0c70c72-8570-4c9d-93e9-51c3ab4dd9f9/osd-db-ecf64b20-1e07-42ac-a8ee-32ba3c0b7e2f
              uuid: ef516477-d2da-492f-8169-a3ebfc3417e2
            ...
    

    Definition of values in angle brackets:

    • <machineName> - name of the machine on which the device is being replaced, for example, worker-1

    • <nodeName> - underlying node name of the machine, for example, kaas-node-5a74b669-7e53-4535-aabd-5b509ec844af

    • <osdID> - ID of the Ceph OSD for the device being replaced, for example, 1

    • <dataDevice> - name of the device placed on the node, for example, /dev/vdc

    • <dataDeviceByPath> - by-path of the device placed on the node, for example, /dev/disk/by-path/pci-0000:00:1t.9

    • <metadataDevice> - name of the metadata device placed on the node, for example, /dev/vde

    • <metadataDeviceByPath> - by-path of the metadata device placed on the node, for example, /dev/disk/by-path/pci-0000:00:12.0

    • <dataLvPartition> - logical volume partition of the data device

  9. Wait for the ApproveWaiting phase to appear in status:

    kubectl -n <managedClusterProjectName> get kaascephoperationrequest replace-failed-meta-<machineName>-<metadataDevice> -o yaml
    

    Example of system response:

    status:
      phase: ApproveWaiting
    
  10. In the KaaSCephOperationRequest CR, set the approve flag to true:

    kubectl -n <managedClusterProjectName> edit kaascephoperationrequest replace-failed-meta-<machineName>-<metadataDevice>
    

    Configuration snippet:

    spec:
      osdRemove:
        approve: true
    
  11. Review the following status fields of the KaaSCephOperationRequest CR to track request processing:

    • status.phase - current state of request processing

    • status.messages - description of the current phase

    • status.conditions - full history of request processing before the current phase

    • status.removeInfo.issues and status.removeInfo.warnings - error and warning messages that occurred during request processing, if any

  12. Verify that the KaaSCephOperationRequest has been completed. For example:

    status:
      phase: Completed # or CompletedWithWarnings if there are non-critical issues
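
The KaaSCephOperationRequest template from the steps above can also be rendered by a small script instead of being edited by hand. The following is a minimal sketch, not part of the product tooling: the function name render_osd_remove_request and its argument order are hypothetical, and the manifest layout simply mirrors the template shown earlier in this procedure.

```shell
#!/usr/bin/env bash
# Hypothetical helper (an assumption of this sketch, not a product command):
# prints a KaaSCephOperationRequest manifest for OSD removal to stdout.
# Usage: render_osd_remove_request <machineName> <metadataDevice> \
#            <managedClusterProjectName> <kaasCephClusterName> <osdID>...
# Example with placeholder values:
#   render_osd_remove_request worker-1 vde my-project my-ceph 11 12 13 \
#       > replace-failed-meta-worker-1-vde-request.yaml
render_osd_remove_request() {
  # Four fixed arguments plus at least one Ceph OSD ID are required.
  if [ "$#" -lt 5 ]; then
    echo "usage: render_osd_remove_request <machine> <device> <project> <cluster> <osdID>..." >&2
    return 1
  fi
  local machine="$1" device="$2" project="$3" cluster="$4"
  shift 4

  # Header of the manifest, matching the template in this procedure.
  cat <<EOF
apiVersion: kaas.mirantis.com/v1alpha1
kind: KaaSCephOperationRequest
metadata:
  name: replace-failed-meta-${machine}-${device}
  namespace: ${project}
spec:
  osdRemove:
    nodes:
      ${machine}:
        cleanupByOsdId:
EOF

  # One list item per affected Ceph OSD ID.
  local id
  for id in "$@"; do
    printf '        - %s\n' "$id"
  done

  # Reference to the related KaaSCephCluster CR.
  cat <<EOF
  kaasCephCluster:
    name: ${cluster}
    namespace: ${project}
EOF
}
```

Review the rendered file before applying it with kubectl apply, because the script does not verify that the listed OSD IDs actually belong to the failed metadata device.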
    

Prepare the replaced metadata device for Ceph OSD re-creation

Note

This section describes how to partition a metadata disk into N logical volumes. To create one partition on a metadata disk, refer to Reconfigure a partition of a Ceph OSD metadata device.

  1. Partition the replaced metadata device into N logical volumes (LVs), where N is the number of Ceph OSDs previously located on the failed metadata device.

    Calculate the new metadata LV percentage of used volume group capacity using the 100 / N formula.

  2. Log in to the node with the replaced metadata disk.

  3. Create an LVM physical volume atop the replaced metadata device:

    pvcreate <metadataDisk>
    

    Substitute <metadataDisk> with the replaced metadata device.

  4. Create an LVM volume group atop the physical volume:

    vgcreate bluedb <metadataDisk>
    

    Substitute <metadataDisk> with the replaced metadata device.

  5. Create N LVM logical volumes with the calculated capacity for each volume:

    lvcreate -l <X>%VG -n meta_<i> bluedb
    

    Substitute <X> with the result of the 100 / N formula and <i> with the sequential number of the logical volume being created.

As a result, the replaced metadata device will have N LVM paths, for example, /dev/bluedb/meta_1.
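
The 100 / N calculation and the repeated lvcreate calls can be sketched as a small dry-run script. This is an illustration under stated assumptions: the function name plan_metadata_lvs is hypothetical, the percentage is rounded down to a whole number (so a small part of the volume group may stay unallocated), and the meta_<i> names are numbered from 1 to match the /dev/bluedb/meta_1 example. The script only prints the commands and does not run them.

```shell
#!/usr/bin/env bash
# Hypothetical helper (an assumption of this sketch): prints, but does not run,
# the lvcreate commands for N metadata logical volumes.
# Usage: plan_metadata_lvs <N> [<vgName>]   (vgName defaults to bluedb)
plan_metadata_lvs() {
  local n="$1" vg="${2:-bluedb}"

  # Accept only a non-empty string of digits.
  case "$n" in
    ''|*[!0-9]*) echo "N must be a positive integer" >&2; return 1 ;;
  esac
  n=$(( 10#$n ))  # force base 10 so that values such as 08 are not read as octal
  if [ "$n" -lt 1 ] || [ "$n" -gt 100 ]; then
    echo "N must be between 1 and 100" >&2
    return 1
  fi

  # Integer division: 100 / N rounded down, for example 33 for N = 3.
  local pct=$(( 100 / n ))

  local i
  for (( i = 1; i <= n; i++ )); do
    echo "lvcreate -l ${pct}%VG -n meta_${i} ${vg}"
  done
}
```

For example, plan_metadata_lvs 3 prints three commands, the first of which is lvcreate -l 33%VG -n meta_1 bluedb. Inspect the printed commands and run them manually on the node once they match the intended layout.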

Re-create a Ceph OSD on the replaced metadata device

Note

You can spawn a Ceph OSD on a raw device, but it must be clean and without any data or partitions. If you want to add a device that was in use, also ensure it is raw and clean. To clean up all data and partitions from a device, refer to the official Rook documentation.

  1. Open the KaaSCephCluster CR for editing:

    kubectl edit kaascephcluster -n <managedClusterProjectName>
    

    Substitute <managedClusterProjectName> with the corresponding value.

  2. In the nodes section, add the cleaned Ceph OSD devices together with the LVM paths created on the replaced metadata device in the previous steps. For example:

    spec:
      cephClusterSpec:
        nodes:
          <machineName>:
            storageDevices:
            - name: <deviceByID-1> # Recommended. Add the new device by ID /dev/disk/by-id/...
              #fullPath: <deviceByPath-1> # Add a new device by path /dev/disk/by-path/...
              config:
                deviceClass: hdd
                metadataDevice: /dev/<vgName>/<lvName-1>
            - name: <deviceByID-2> # Recommended. Add the new device by ID /dev/disk/by-id/...
              #fullPath: <deviceByPath-2> # Add a new device by path /dev/disk/by-path/...
              config:
                deviceClass: hdd
                metadataDevice: /dev/<vgName>/<lvName-2>
            - name: <deviceByID-3> # Recommended. Add the new device by ID /dev/disk/by-id/...
              #fullPath: <deviceByPath-3> # Add a new device by path /dev/disk/by-path/...
              config:
                deviceClass: hdd
                metadataDevice: /dev/<vgName>/<lvName-3>
    
    • Substitute <machineName> with the machine name of the node where the metadata device has been replaced.

    • Add all data devices for the re-created Ceph OSDs and set metadataDevice to the path of the corresponding previously created logical volume. Substitute <vgName> with the name of the volume group that contains the N logical volumes <lvName-i>.

  3. Wait for the re-created Ceph OSDs to be applied to the Ceph cluster.

    You can monitor the application state either in the status section of the KaaSCephCluster CR or in the rook-ceph-tools Pod:

    kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph -s
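
When several Ceph OSDs are re-created at once, the storageDevices items from step 2 can be generated rather than typed. The sketch below is an illustration only: the function name render_storage_devices is hypothetical, it assumes the logical volumes are named meta_1 to meta_N as in the preparation section, and it hardcodes deviceClass: hdd as in the example above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (an assumption of this sketch): prints storageDevices
# items that pair each data device with /dev/<vgName>/meta_<i>.
# Usage: render_storage_devices <vgName> <deviceByID>...
render_storage_devices() {
  # The volume group name plus at least one data device are required.
  if [ "$#" -lt 2 ]; then
    echo "usage: render_storage_devices <vgName> <deviceByID>..." >&2
    return 1
  fi
  local vg="$1"
  shift

  local i=1 dev
  for dev in "$@"; do
    # deviceClass is fixed to hdd here; adjust it for other device classes.
    cat <<EOF
- name: ${dev}
  config:
    deviceClass: hdd
    metadataDevice: /dev/${vg}/meta_${i}
EOF
    i=$(( i + 1 ))
  done
}
```

The output is a YAML fragment at list level. Indent it to match the storageDevices key of the corresponding <machineName> before pasting it into the KaaSCephCluster CR.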