Replace a failed Ceph OSD with a metadata device as a logical volume path
Warning
This procedure is valid for MOSK clusters that use the deprecated
KaaSCephCluster custom resource (CR) instead of the MiraCeph CR that is
available since MOSK 25.2 as a new Ceph configuration entrypoint. For the
equivalent procedure with the MiraCeph CR, refer to the following section:
Replace a failed Ceph OSD with a metadata device as a logical volume path
You can apply the procedure below in the following cases:
A Ceph OSD failed without data or metadata device outage. In this case, first remove a failed Ceph OSD and clean up all corresponding disks and partitions. Then add a new Ceph OSD to the same data and metadata paths.
A Ceph OSD failed with data or metadata device outage. In this case, you also first remove a failed Ceph OSD and clean up all corresponding disks and partitions. Then add a new Ceph OSD to a newly replaced data device with the same metadata path.
Note
The below procedure also applies to manually created metadata partitions.
Remove a failed Ceph OSD by ID with a defined metadata device
Identify the ID of the Ceph OSD related to the failed device. For example, use the Ceph CLI in the rook-ceph-tools Pod:
ceph osd metadata
Example of system response:
{ "id": 0, ... "bluestore_bdev_devices": "vdc", ... "devices": "vdc", ... "hostname": "kaas-node-6c5e76f9-c2d2-4b1a-b047-3c299913a4bf", ... "pod_name": "rook-ceph-osd-0-7b8d4d58db-f6czn", ... }, { "id": 1, ... "bluefs_db_devices": "vdf", ... "bluestore_bdev_devices": "vde", ... "devices": "vde,vdf", ... "hostname": "kaas-node-6c5e76f9-c2d2-4b1a-b047-3c299913a4bf", ... "pod_name": "rook-ceph-osd-1-78fbc47dc5-px9n2", ... }, ...
Open the KaaSCephCluster custom resource (CR) for editing:
kubectl edit kaascephcluster -n <moskClusterProjectName>
Substitute <moskClusterProjectName> with the corresponding value.
In the nodes section:
Find and capture the metadataDevice path to reuse it during re-creation of the Ceph OSD.
Remove the required device:
Example configuration snippet:
spec:
  cephClusterSpec:
    nodes:
      <machineName>:
        storageDevices:
        - name: <deviceName>  # remove the entire item from the storageDevices list
          # fullPath: <deviceByPath> if the device is specified using by-path instead of name
          config:
            deviceClass: hdd
            metadataDevice: /dev/bluedb/meta_1
In the example above, <machineName> is the name of the machine that relates to the node on which the device <deviceName> or <deviceByPath> must be replaced.
Create a KaaSCephOperationRequest CR template and save it as replace-failed-osd-<machineName>-<osdID>-request.yaml:
apiVersion: kaas.mirantis.com/v1alpha1
kind: KaaSCephOperationRequest
metadata:
  name: replace-failed-osd-<machineName>-<osdID>
  namespace: <moskClusterProjectName>
spec:
  osdRemove:
    nodes:
      <machineName>:
        cleanupByOsdId:
        - <osdID>
  kaasCephCluster:
    name: <kaasCephClusterName>
    namespace: <moskClusterProjectName>
Substitute the following parameters:
<machineName> with the machine name from the previous step
<moskClusterProjectName> with the project name of the related MOSK cluster
<osdID> with the ID of the affected Ceph OSD
<kaasCephClusterName> with the KaaSCephCluster resource name
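For illustration only, a template filled in with hypothetical values (machine worker-1, Ceph OSD ID 1, project child-ns, and KaaSCephCluster name ceph-cluster) could be created as follows:
cat > replace-failed-osd-worker-1-1-request.yaml <<'EOF'
apiVersion: kaas.mirantis.com/v1alpha1
kind: KaaSCephOperationRequest
metadata:
  name: replace-failed-osd-worker-1-1
  namespace: child-ns
spec:
  osdRemove:
    nodes:
      worker-1:
        cleanupByOsdId:
        - 1
  kaasCephCluster:
    name: ceph-cluster
    namespace: child-ns
EOF
All names and IDs above are placeholders; use the values from your own cluster.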
Apply the template to the cluster:
kubectl apply -f replace-failed-osd-<machineName>-<osdID>-request.yaml
Verify that the corresponding request has been created:
kubectl get kaascephoperationrequest -n <moskClusterProjectName>
Verify that the status section of KaaSCephOperationRequest contains the removeInfo section:
kubectl -n <moskClusterProjectName> get kaascephoperationrequest replace-failed-osd-<machineName>-<osdID> -o yaml
Example of system response:
childNodesMapping:
  <nodeName>: <machineName>
removeInfo:
  cleanUpMap:
    <nodeName>:
      osdMapping:
        "<osdID>":
          deviceMapping:
            <dataDevice>:
              deviceClass: hdd
              devicePath: <dataDeviceByPath>
              devicePurpose: block
              usedPartition: /dev/ceph-d2d3a759-2c22-4304-b890-a2d87e056bd4/osd-block-ef516477-d2da-492f-8169-a3ebfc3417e2
              zapDisk: true
            <metadataDevice>:
              deviceClass: hdd
              devicePath: <metadataDeviceByPath>
              devicePurpose: db
              usedPartition: /dev/bluedb/meta_1
          uuid: ef516477-d2da-492f-8169-a3ebfc3417e2
Definition of values in angle brackets:
<machineName> - name of the machine on which the device is being replaced, for example, worker-1
<nodeName> - underlying node name of the machine, for example, kaas-node-5a74b669-7e53-4535-aabd-5b509ec844af
<osdID> - Ceph OSD ID of the device being replaced, for example, 1
<dataDevice> - name of the data device placed on the node, for example, /dev/vde
<dataDeviceByPath> - by-path of the data device placed on the node, for example, /dev/disk/by-path/pci-0000:00:1t.9
<metadataDevice> - name of the metadata device placed on the node, for example, /dev/vdf
<metadataDeviceByPath> - by-path of the metadata device placed on the node, for example, /dev/disk/by-path/pci-0000:00:12.0
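To print only the removeInfo section instead of the full object, you can filter the output. A minimal sketch, assuming yq is installed locally:
kubectl -n <moskClusterProjectName> get kaascephoperationrequest replace-failed-osd-<machineName>-<osdID> -o yaml \
  | yq '.status.removeInfo'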
Note
The partitions that are manually created or configured using the BareMetalHostProfile object can be removed only manually, or during a complete metadata disk removal, or during the Machine object removal or re-provisioning.
Verify that the cleanUpMap section matches the required removal and wait for the ApproveWaiting phase to appear in status:
kubectl -n <moskClusterProjectName> get kaascephoperationrequest replace-failed-osd-<machineName>-<osdID> -o yaml
Example of system response:
status:
  phase: ApproveWaiting
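Instead of repeatedly dumping the whole object, you can poll only the phase field with a jsonpath query, for example:
kubectl -n <moskClusterProjectName> get kaascephoperationrequest replace-failed-osd-<machineName>-<osdID> \
  -o jsonpath='{.status.phase}{"\n"}'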
In the KaaSCephOperationRequest CR, set the approve flag to true:
kubectl -n <moskClusterProjectName> edit kaascephoperationrequest replace-failed-osd-<machineName>-<osdID>
Configuration snippet:
spec:
  osdRemove:
    approve: true
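If you prefer a non-interactive alternative to kubectl edit, the same flag can be set with a merge patch, for example:
kubectl -n <moskClusterProjectName> patch kaascephoperationrequest replace-failed-osd-<machineName>-<osdID> \
  --type merge -p '{"spec":{"osdRemove":{"approve":true}}}'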
Review the following status fields of the Ceph LCM CR request processing:
status.phase - current state of request processing
status.messages - description of the current phase
status.conditions - full history of request processing before the current phase
status.removeInfo.issues and status.removeInfo.warnings - error and warning messages that occurred during request processing, if any
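For a quick look at errors and warnings without opening the full manifest, you can query these fields directly, for example:
kubectl -n <moskClusterProjectName> get kaascephoperationrequest replace-failed-osd-<machineName>-<osdID> \
  -o jsonpath='{.status.phase}{"\n"}{.status.removeInfo.issues}{"\n"}{.status.removeInfo.warnings}{"\n"}'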
Verify that the KaaSCephOperationRequest has been completed. For example:
status:
  phase: Completed # or CompletedWithWarnings if there are non-critical issues
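As an additional, optional check after completion, you can confirm that the removed Ceph OSD is no longer listed in the cluster from the rook-ceph-tools Pod:
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd tree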
Re-create a Ceph OSD with the same metadata partition
Note
You can spawn a Ceph OSD on a raw device, but it must be clean and without any data or partitions. If you want to add a device that was previously in use, ensure that it is raw and clean as well. To clean up all data and partitions from a device, refer to the official Rook documentation.
If you want to add a Ceph OSD on top of a raw device that already exists on a node or is hot-plugged, add the required device using the following guidelines:
You can add a raw device to a node during node deployment.
If a node supports adding devices without node reboot, you can hot plug a raw device to a node.
If a node does not support adding devices without node reboot, you can hot plug a raw device during node shutdown. In this case, complete the following steps:
Enable maintenance mode on the MOSK cluster.
Turn off the required node.
Attach the required raw device to the node.
Turn on the required node.
Disable maintenance mode on the MOSK cluster.
Open the KaaSCephCluster CR for editing:
kubectl edit kaascephcluster -n <moskClusterProjectName>
Substitute <moskClusterProjectName> with the corresponding value.
In the nodes section, add the replaced device with the same metadataDevice path as on the removed Ceph OSD. For example:
spec:
  cephClusterSpec:
    nodes:
      <machineName>:
        storageDevices:
        - name: <deviceByID> # Recommended. Add a new device by ID, for example, /dev/disk/by-id/...
          #fullPath: <deviceByPath> # Add a new device by path, for example, /dev/disk/by-path/...
          config:
            deviceClass: hdd
            metadataDevice: /dev/bluedb/meta_1 # Must match the value of the previously removed OSD
Substitute <machineName> with the machine name of the node where the new device <deviceByID> or <deviceByPath> must be added.
Wait for the replaced disk to apply to the Ceph cluster as a new Ceph OSD.
You can monitor the application state using either the status section of the KaaSCephCluster CR or the rook-ceph-tools Pod:
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph -s
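Optionally, once the new Ceph OSD appears, you can verify that it uses the intended metadata device by inspecting its metadata again, as at the beginning of this procedure:
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd metadata <osdID>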