Replace a failed Ceph OSD

After a physical disk replacement, you can use Rook to redeploy a failed Ceph OSD: restarting the Rook operator triggers the reconfiguration of the management or managed cluster and recreates the OSD on the new disk.

To redeploy a failed Ceph OSD:

  1. Log in to a local machine running Ubuntu 18.04 with kubectl installed.

  2. Obtain and export kubeconfig of the required management or managed cluster as described in Connect to a Mirantis Container Cloud cluster.
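
     For example, after downloading the kubeconfig file, export its path
     (the placeholder below stands for your actual file location) and
     verify access to the cluster:

    export KUBECONFIG=<path-to-kubeconfig>
    kubectl get nodes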

  3. Identify the failed Ceph OSD ID:

    ceph osd tree
    
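     In the output, the failed OSD is typically reported as down. The
     following output is illustrative only, with placeholder IDs, hosts,
     and weights, and osd.2 as the failed OSD:

    ID  CLASS  WEIGHT   TYPE NAME         STATUS  REWEIGHT  PRI-AFF
    -1         0.29306  root default
    -3         0.09769      host node-1
     0    hdd  0.09769          osd.0         up   1.00000  1.00000
    -5         0.09769      host node-2
     1    hdd  0.09769          osd.1         up   1.00000  1.00000
    -7         0.09769      host node-3
     2    hdd  0.09769          osd.2       down         0  1.00000
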
  4. Remove the Ceph OSD deployment from the management or managed cluster:

    kubectl delete deployment -n rook-ceph rook-ceph-osd-<ID>
    
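     To confirm the removal, list the remaining OSD deployments. This
     check assumes Rook's standard app=rook-ceph-osd label:

    kubectl -n rook-ceph get deployment -l app=rook-ceph-osd
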
  5. Connect to the terminal of the ceph-tools pod:

    kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod \
    -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') -- bash
    
  6. Remove the failed Ceph OSD from the Ceph cluster:

    ceph osd purge osd.<ID> --yes-i-really-mean-it
    
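     While still in the ceph-tools pod, you can verify that the removed
     OSD no longer appears in the CRUSH tree and check the overall
     cluster state:

    ceph osd tree
    ceph status
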
  7. Replace the failed disk.

  8. Restart the Rook operator:

    kubectl delete pod $(kubectl -n rook-ceph get pod -l "app=rook-ceph-operator" \
    -o jsonpath='{.items[0].metadata.name}') -n rook-ceph
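
     After the restart, the operator reconciles the Ceph cluster and
     creates a new OSD deployment for the replaced disk. As a sanity
     check, assuming the standard app=rook-ceph-osd label, verify that
     the new OSD pod starts and that the OSD joins the cluster:

    kubectl -n rook-ceph get pods -l app=rook-ceph-osd
    # From the ceph-tools pod (see step 5), confirm the new OSD is up:
    ceph osd tree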