After a physical disk replacement, you can use the Ceph LCM API to redeploy
a failed Ceph OSD. The common flow for replacing a failed Ceph OSD is as
follows:
1. Remove the obsolete Ceph OSD from the Ceph cluster by device name, by Ceph
   OSD ID, or by path.
2. Add a new Ceph OSD on the new disk to the Ceph cluster.
Remove a failed Ceph OSD by device name, path, or ID
Warning
The procedure below presupposes that the Operator knows the exact
device name, by-path, or by-id of the replaced device, as well as the
node on which the replacement occurred.
Warning
Since Container Cloud 2.23.0 and 2.23.1 for MOSK
23.1, a Ceph OSD removal using by-path, by-id, or device name is
not supported if a device was physically removed from a node. Therefore, use
cleanupByOsdId instead. For details, see
Remove a failed Ceph OSD by Ceph OSD ID.
Warning
Since Container Cloud 2.25.0, Mirantis does not recommend
setting device name or device by-path symlink in the
cleanupByDevice field as these identifiers are not persistent and
can change at node boot. Remove Ceph OSDs with by-id symlinks
specified in the path field or use cleanupByOsdId instead.
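The distinction between the three identifier kinds can be sketched in a few lines of Python. This is only an illustrative helper, not part of the Ceph LCM API: the function names are hypothetical, and the sketch assumes the standard Linux udev locations /dev/disk/by-id/ and /dev/disk/by-path/ for the symlinks.

```python
def classify_device(identifier: str) -> str:
    """Return 'by-id', 'by-path', or 'name' for a device identifier.

    Assumes the standard udev symlink directories; anything that is not
    under one of them is treated as a plain device name such as 'sdb'.
    """
    if identifier.startswith("/dev/disk/by-id/"):
        return "by-id"
    if identifier.startswith("/dev/disk/by-path/"):
        return "by-path"
    return "name"


def is_persistent(identifier: str) -> bool:
    """Treat only by-id symlinks as stable across node boots,
    following the recommendation in the warning above."""
    return classify_device(identifier) == "by-id"
```

A check like this could be run on an identifier before it is placed into a removal request, to flag device names and by-path symlinks that may change at node boot.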
Open the KaaSCephCluster CR of a managed cluster for editing:
Substitute <managedClusterProjectName> with the corresponding value.
In the nodes section, remove the required device:
spec:
  cephClusterSpec:
    nodes:
      <machineName>:
        storageDevices:
        - name: <deviceName>  # remove the entire item from storageDevices list
          # fullPath: <deviceByPath> if device is specified with symlink instead of name
          config:
            deviceClass: hdd
Substitute <machineName> with the machine name of the node where the
device <deviceName> or <deviceByPath> is going to be replaced.
Save KaaSCephCluster and close the editor.
Create a KaaSCephOperationRequest CR template and save it as
replace-failed-osd-<machineName>-<deviceName>-request.yaml:
apiVersion: kaas.mirantis.com/v1alpha1
kind: KaaSCephOperationRequest
metadata:
  name: replace-failed-osd-<machineName>-<deviceName>
  namespace: <managedClusterProjectName>
spec:
  osdRemove:
    nodes:
      <machineName>:
        cleanupByDevice:
        - name: <deviceName>
          # If a device is specified with by-path or by-id instead of
          # name, path: <deviceByPath> or <deviceById>.
  kaasCephCluster:
    name: <kaasCephClusterName>
    namespace: <managedClusterProjectName>
Substitute <kaasCephClusterName> with the name of the corresponding
KaaSCephCluster resource from the <managedClusterProjectName>
namespace.
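If such requests are generated by tooling rather than written by hand, the template can be assembled programmatically. The following Python sketch mirrors the fields of the template above. The function name, its parameters, and the sample values (worker-1, sdb, ceph-cluster, my-project) are hypothetical. It also assumes that any value starting with /dev/disk/ is a by-path or by-id symlink that belongs in the path field.

```python
import json


def build_osd_remove_request(machine, device, request_name, cluster, project):
    """Build a KaaSCephOperationRequest body for removing one device.

    Assumption: values starting with '/dev/disk/' are by-path or by-id
    symlinks and go into 'path'; anything else is a plain device name.
    """
    key = "path" if device.startswith("/dev/disk/") else "name"
    return {
        "apiVersion": "kaas.mirantis.com/v1alpha1",
        "kind": "KaaSCephOperationRequest",
        "metadata": {"name": request_name, "namespace": project},
        "spec": {
            "osdRemove": {
                "nodes": {machine: {"cleanupByDevice": [{key: device}]}},
            },
            "kaasCephCluster": {"name": cluster, "namespace": project},
        },
    }


if __name__ == "__main__":
    # Hypothetical sample values, for illustration only.
    request = build_osd_remove_request(
        machine="worker-1",
        device="sdb",
        request_name="replace-failed-osd-worker-1-sdb",
        cluster="ceph-cluster",
        project="my-project",
    )
    print(json.dumps(request, indent=2))
```

The request name is passed in explicitly rather than derived from the device, because by-id symlink names may contain characters that are not suitable for a resource name.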
Deploy a new device after removal of a failed one
Note
You can spawn a Ceph OSD on a raw device, but the device must be clean and
contain no data or partitions. If you want to add a device that was in use,
also ensure it is raw and clean. To clean up all data and partitions from a
device, refer to the official Rook documentation.
If you want to add a Ceph OSD on top of a raw device that already exists
on a node or is hot-plugged, add the required device using the following
guidelines:
- You can add a raw device to a node during node deployment.
- If a node supports adding devices without node reboot, you can hot plug
  a raw device to a node.
- If a node does not support adding devices without node reboot, you can
  attach a raw device while the node is shut down. In this case, complete the
  following steps:
  1. Enable maintenance mode on the managed cluster.
  2. Turn off the required node.
  3. Attach the required raw device to the node.
  4. Turn on the required node.
  5. Disable maintenance mode on the managed cluster.
Open the KaaSCephCluster CR of a managed cluster for editing:
Substitute <managedClusterProjectName> with the corresponding value.
In the nodes section, add a new device:
spec:
  cephClusterSpec:
    nodes:
      <machineName>:
        storageDevices:
        - fullPath: <deviceByID>  # Since Container Cloud 2.25.0 if device is supposed to be added with by-id
          # name: <deviceByID>  # Prior to Container Cloud 2.25.0 if device is supposed to be added with by-id
          # fullPath: <deviceByPath>  # if device is supposed to be added with by-path
          config:
            deviceClass: hdd
Substitute <machineName> with the machine name of the node where the device
<deviceByID> or <deviceByPath> is going to be added as a Ceph OSD.
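The choice between the name and fullPath keys depends on both the identifier kind and the Container Cloud version, which is easy to get wrong by hand. The Python sketch below encodes the rules from the comments in the snippet above. The function name and the since_2_25 flag are hypothetical, and the sketch assumes that symlinks live under the standard /dev/disk/by-id/ and /dev/disk/by-path/ directories.

```python
def storage_device_entry(device, device_class="hdd", since_2_25=True):
    """Build one storageDevices item for the KaaSCephCluster nodes section.

    Rules taken from the snippet above:
    - by-path symlink: 'fullPath'
    - by-id symlink:   'fullPath' since Container Cloud 2.25.0,
                       'name' before that
    - plain device name: 'name'
    """
    by_id = device.startswith("/dev/disk/by-id/")
    by_path = device.startswith("/dev/disk/by-path/")
    if by_path or (by_id and since_2_25):
        key = "fullPath"
    else:
        key = "name"
    return {key: device, "config": {"deviceClass": device_class}}
```

For example, calling storage_device_entry with a by-id symlink and the default arguments yields an item keyed by fullPath with deviceClass set to hdd.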
Verify that the new Ceph OSD has appeared in the Ceph cluster and is in
and up. The fullClusterInfo section should not contain any issues.
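As a complementary check outside the KaaSCephCluster status, the in and up state can also be read from the Ceph OSD map. The Python sketch below is not the procedure described above. It assumes a dictionary shaped like the JSON output of the ceph osd dump command, with an osds list whose entries carry the integer fields osd, up, and in. The function name and the sample data are hypothetical.

```python
def osd_is_in_and_up(osd_dump: dict, osd_id: int) -> bool:
    """Return True if the given OSD is present and marked both up and in.

    Assumption: osd_dump has the shape {"osds": [{"osd": 0, "up": 1,
    "in": 1, ...}, ...]}, as in JSON-formatted 'ceph osd dump' output.
    """
    for osd in osd_dump.get("osds", []):
        if osd.get("osd") == osd_id:
            return osd.get("up") == 1 and osd.get("in") == 1
    # The OSD has not appeared in the map at all.
    return False


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    sample = {"osds": [{"osd": 0, "up": 1, "in": 1}, {"osd": 1, "up": 0, "in": 1}]}
    print(osd_is_in_and_up(sample, 0))
    print(osd_is_in_and_up(sample, 1))
```

An OSD that is missing from the map is reported as not ready, which covers the case where the new Ceph OSD has not been deployed yet.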