Replace a failed Ceph OSD disk with a metadata device as a device name¶
You can apply the following procedure if a Ceph OSD failed due to a data disk outage and the metadata partition is not specified in the BareMetalHostProfile custom resource (CR). This scenario implies that the Ceph cluster automatically creates the required metadata logical volume on the desired device.
Remove a Ceph OSD with a metadata device as a device name¶
To remove the affected Ceph OSD with a metadata device as a device name, follow the Remove a failed Ceph OSD by ID with a defined metadata device procedure and capture the following details:
While editing KaasCephCluster in the nodes section, capture the metadataDevice path to reuse it during re-creation of the Ceph OSD.
Example of the spec.nodes section:
spec:
  cephClusterSpec:
    nodes:
      <machineName>:
        storageDevices:
        - name: <deviceName> # remove the entire item from the storageDevices list
          # fullPath: <deviceByPath> if device is specified using by-path instead of name
          config:
            deviceClass: hdd
            metadataDevice: /dev/nvme0n1
In the example above, save the metadataDevice device name /dev/nvme0n1.
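If you prefer to review the current nodes configuration without opening an editor, the following read-only sketch may help. It assumes kubectl access to the management cluster and uses the same namespace placeholder as the kubectl edit command later in this procedure:
kubectl -n <managedClusterProjectName> get kaascephcluster -o yaml | less   # read-only review of the spec (sketch)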
During verification of removeInfo, capture the usedPartition value of the metadata device located in the deviceMapping.<metadataDevice> section.
Example of the removeInfo section:
removeInfo:
  cleanUpMap:
    <nodeName>:
      osdMapping:
        "<osdID>":
          deviceMapping:
            <dataDevice>:
              deviceClass: hdd
              devicePath: <dataDeviceByPath>
              devicePurpose: block
              usedPartition: /dev/ceph-d2d3a759-2c22-4304-b890-a2d87e056bd4/osd-block-ef516477-d2da-492f-8169-a3ebfc3417e2
              zapDisk: true
            <metadataDevice>:
              deviceClass: hdd
              devicePath: <metadataDeviceByPath>
              devicePurpose: db
              usedPartition: /dev/ceph-b0c70c72-8570-4c9d-93e9-51c3ab4dd9f9/osd-db-ecf64b20-1e07-42ac-a8ee-32ba3c0b7e2f
          uuid: ef516477-d2da-492f-8169-a3ebfc3417e2
In the example above, capture the following values from the <metadataDevice> section:
ceph-b0c70c72-8570-4c9d-93e9-51c3ab4dd9f9 - the name of the volume group that contains all metadata partitions on the <metadataDevice> disk
osd-db-ecf64b20-1e07-42ac-a8ee-32ba3c0b7e2f - the name of the logical volume that relates to the failed Ceph OSD
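In the examples above, the usedPartition value is an LVM device path of the form /dev/<volume group>/<logical volume>, so you can split it into the two names with standard shell tools. A minimal Bash sketch, assuming the example value above; the variable names are illustrative only:
# Split the captured usedPartition path of the metadata device into the
# volume group and logical volume names (illustrative variable names):
USED_PARTITION="/dev/ceph-b0c70c72-8570-4c9d-93e9-51c3ab4dd9f9/osd-db-ecf64b20-1e07-42ac-a8ee-32ba3c0b7e2f"
VG_NAME="$(basename "$(dirname "${USED_PARTITION}")")"
LV_NAME="$(basename "${USED_PARTITION}")"
echo "Volume group:   ${VG_NAME}"
echo "Logical volume: ${LV_NAME}"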
Re-create the metadata partition on the existing metadata disk¶
After you remove the Ceph OSD disk, manually create a separate logical volume for the metadata partition in an existing volume group on the metadata device:
lvcreate -l 100%FREE -n meta_1 <vgName>
Substitute <vgName> with the name of the volume group captured in the usedPartition parameter.
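Optionally, before creating the logical volume, you can confirm that the volume group still has free space for the new metadata partition. A minimal check, assuming the standard LVM tools are available on the node:
vgs -o vg_name,vg_size,vg_free <vgName>   # free space left in the metadata volume group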
Note
If you removed more than one Ceph OSD, replace 100%FREE with the corresponding partition size. For example:
lvcreate -L <partitionSize> -n meta_1 <vgName>
Substitute <partitionSize> with a value that matches the size of the other partitions placed on the affected metadata drive, for example, 16G. To obtain <partitionSize>, use the output of the lvs command. Note that the uppercase -L option accepts an absolute size such as 16G, while the lowercase -l option accepts a number of extents or a percentage such as 100%FREE.
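A minimal example of obtaining the sizes of the existing metadata partitions with lvs, assuming the volume group name captured earlier:
lvs -o lv_name,lv_size <vgName>   # sizes of the existing metadata partitions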
During execution of the lvcreate command, the system asks whether to wipe the bluestore signature detected on the metadata device. For example:
WARNING: ceph_bluestore signature detected on /dev/ceph-b0c70c72-8570-4c9d-93e9-51c3ab4dd9f9/meta_1 at offset 0. Wipe it? [y/n]:
Using the interactive shell, answer n to keep all existing metadata partitions intact. After answering n, the system outputs the following:
Aborted wiping of ceph_bluestore.
1 existing signature left on the device.
Logical volume "meta_1" created.
Re-create the Ceph OSD with the re-created metadata partition¶
Note
You can spawn a Ceph OSD on a raw device, but it must be clean and without any data or partitions. If you want to add a device that was in use, also ensure it is raw and clean. To clean up all data and partitions from a device, refer to the official Rook documentation.
If you want to add a Ceph OSD on top of a raw device that already exists on a node or is hot-plugged, add the required device using the following guidelines:
You can add a raw device to a node during node deployment.
If a node supports adding devices without node reboot, you can hot plug a raw device to a node.
If a node does not support adding devices without node reboot, you can hot plug a raw device during node shutdown. In this case, complete the following steps:
Enable maintenance mode on the managed cluster.
Turn off the required node.
Attach the required raw device to the node.
Turn on the required node.
Disable maintenance mode on the managed cluster.
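Before editing the cluster specification, it can help to note the by-id symlink of the newly attached device so that you can reference it in the storageDevices section. A sketch to run on the node itself; <deviceName> stands for the kernel device name, for example sdb, and is illustrative only:
ls -l /dev/disk/by-id/ | grep <deviceName>   # find the by-id symlink of the new raw device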
Open the KaasCephCluster CR for editing:
kubectl edit kaascephcluster -n <managedClusterProjectName>
Substitute <managedClusterProjectName> with the corresponding value.
In the nodes section, add the replaced device with the same metadataDevice path as in the previous Ceph OSD:
spec:
  cephClusterSpec:
    nodes:
      <machineName>:
        storageDevices:
        - fullPath: <deviceByID> # Recommended since Container Cloud 2.25.0.
          # Add a new device by-id symlink, for example, /dev/disk/by-id/...
          #name: <deviceByID> # Add a new device by ID, for example, /dev/disk/by-id/...
          #fullPath: <deviceByPath> # Add a new device by path, for example, /dev/disk/by-path/...
          config:
            deviceClass: hdd
            metadataDevice: /dev/<vgName>/meta_1
Substitute <machineName> with the machine name of the node where the new device <deviceByID> or <deviceByPath> must be added. Also specify metadataDevice with the path to the logical volume created during the Re-create the metadata partition on the existing metadata disk procedure.
Wait for the replaced disk to apply to the Ceph cluster as a new Ceph OSD.
You can monitor the application state using either the status section of the KaaSCephCluster CR or the rook-ceph-tools Pod:
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph -s
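Once the new Ceph OSD appears, you can additionally confirm that it is up and review its metadata, including the DB device information that should point at the re-created logical volume. A follow-up sketch using the same rook-ceph-tools Pod; substitute <osdID> with the ID of the new Ceph OSD:
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph osd tree
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph osd metadata <osdID>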