Ceph Monitors recovery

This section describes how to recover failed Ceph Monitors of an existing Ceph cluster in the following state:

  • The Ceph cluster contains failed Ceph Monitors that cannot start and are stuck in the Error or CrashLoopBackOff state.

  • The logs of the failed Ceph Monitor pods contain the following lines:

    mon.g does not exist in monmap, will attempt to join an existing cluster
    ...
    mon.g@-1(???) e11 not in monmap and have been in a quorum before; must have been removed
    mon.g@-1(???) e11 commit suicide!
    
  • The Ceph cluster contains at least one Ceph Monitor in the Running state, and the ceph -s command output includes at least one healthy mon and one healthy mgr instance.
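
    You can check this from the ceph-tools pod using the same command as in the final verification step, for example:

    kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l \
    app=rook-ceph-tools -o jsonpath='{.items[0].metadata.name}') -- ceph -s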

Unless stated otherwise, perform the following steps for all failed Ceph Monitors at once.

To recover failed Ceph Monitors:

  1. Obtain and export the kubeconfig of the affected cluster.
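
    For example, a minimal sketch, assuming the kubeconfig file has already been downloaded locally (the path below is a placeholder):

    export KUBECONFIG=<pathToKubeconfigFile>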

  2. Scale the rook-ceph/rook-ceph-operator deployment down to 0 replicas:

    kubectl -n rook-ceph scale deploy rook-ceph-operator --replicas 0
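
    Optionally, wait until the operator pod is gone before proceeding. A minimal sketch, assuming the operator pod carries the app=rook-ceph-operator label:

    kubectl -n rook-ceph wait --for=delete pod -l app=rook-ceph-operator --timeout=2m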
    
  3. Delete all failed Ceph Monitor deployments:

    1. Identify the Ceph Monitor pods in the Error or CrashLoopBackOff state:

      kubectl -n rook-ceph get pod -l 'app in (rook-ceph-mon,rook-ceph-mon-canary)'
      
    2. Verify that the affected pods contain the failure logs described above:

      kubectl -n rook-ceph logs <failedMonPodName>
      

      Substitute <failedMonPodName> with the Ceph Monitor pod name. For example, rook-ceph-mon-g-845d44b9c6-fjc5d.

    3. Save the identifying letters of the failed Ceph Monitors for further usage. For example, e, f, and so on.
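
      For convenience, you can keep these letters in a shell variable and reuse it in later commands. A minimal sketch; the variable name FAILED_MON_LETTERS is arbitrary, and e and f are example letters:

      # Letters of the failed Ceph Monitors identified above (example values)
      FAILED_MON_LETTERS="e f"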

    4. Delete all corresponding deployments of these pods:

      1. Identify the affected Ceph Monitor pod deployments:

        kubectl -n rook-ceph get deploy -l 'app in (rook-ceph-mon,rook-ceph-mon-canary)'
        
      2. Delete the affected Ceph Monitor pod deployments. For example, if the Ceph cluster has the rook-ceph-mon-c-845d44b9c6-fjc5d pod in the CrashLoopBackOff state, remove the corresponding rook-ceph-mon-c deployment:

        kubectl -n rook-ceph delete deploy rook-ceph-mon-c
        

        Canary mon deployments have the suffix -canary.
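
        If several deployments are affected, you can remove them in one loop. A minimal sketch, reusing the hypothetical FAILED_MON_LETTERS variable from the earlier sketch:

        for letter in ${FAILED_MON_LETTERS}; do
          kubectl -n rook-ceph delete deploy "rook-ceph-mon-${letter}"
        done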

  4. Remove all corresponding entries of Ceph Monitors from the MON map:

    1. Enter the ceph-tools pod:

      kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l \
      app=rook-ceph-tools -o jsonpath='{.items[0].metadata.name}') -- bash
      
    2. Inspect the current MON map and save the IP addresses of the failed Ceph Monitors for further usage:

      ceph mon dump
      
    3. Remove all entries of failed Ceph Monitors using the previously saved letters:

      ceph mon rm <monLetter>
      

      Substitute <monLetter> with the corresponding letter of a failed Ceph Monitor.
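
      If several Ceph Monitors failed, you can remove their entries in one loop from inside the ceph-tools pod. A minimal sketch, using e and f as example letters:

      for letter in e f; do
        ceph mon rm "${letter}"
      done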

    4. Exit the ceph-tools pod.

  5. Remove all failed Ceph Monitor entries from the Rook mon endpoints ConfigMap:

    1. Open the rook-ceph/rook-ceph-mon-endpoints ConfigMap for editing:

      kubectl -n rook-ceph edit cm rook-ceph-mon-endpoints
      
    2. Remove all entries of failed Ceph Monitors from the ConfigMap data and update the maxMonId value with the current number of Running Ceph Monitors. For example, rook-ceph-mon-endpoints has the following data:

      data:
        csi-cluster-config-json: '[{"clusterID":"rook-ceph","monitors":["172.0.0.222:6789","172.0.0.223:6789","172.0.0.224:6789","172.16.52.217:6789","172.16.52.216:6789"]}]'
        data: a=172.0.0.222:6789,b=172.0.0.223:6789,c=172.0.0.224:6789,f=172.0.0.217:6789,e=172.0.0.216:6789
        mapping: '{"node":{
            "a":{"Name":"kaas-node-21465871-42d0-4d56-911f-7b5b95cb4d34","Hostname":"kaas-node-21465871-42d0-4d56-911f-7b5b95cb4d34","Address":"172.16.52.222"},
            "b":{"Name":"kaas-node-43991b09-6dad-40cd-93e7-1f02ed821b9f","Hostname":"kaas-node-43991b09-6dad-40cd-93e7-1f02ed821b9f","Address":"172.16.52.223"},
            "c":{"Name":"kaas-node-15225c81-3f7a-4eba-b3e4-a23fd86331bd","Hostname":"kaas-node-15225c81-3f7a-4eba-b3e4-a23fd86331bd","Address":"172.16.52.224"},
            "e":{"Name":"kaas-node-ba3bfa17-77d2-467c-91eb-6291fb219a80","Hostname":"kaas-node-ba3bfa17-77d2-467c-91eb-6291fb219a80","Address":"172.16.52.216"},
            "f":{"Name":"kaas-node-6f669490-f0c7-4d19-bf73-e51fbd6c7672","Hostname":"kaas-node-6f669490-f0c7-4d19-bf73-e51fbd6c7672","Address":"172.16.52.217"}}
        }'
        maxMonId: "5"
      

      If e and f are the letters of failed Ceph Monitors, the resulting ConfigMap data must be as follows:

      data:
        csi-cluster-config-json: '[{"clusterID":"rook-ceph","monitors":["172.0.0.222:6789","172.0.0.223:6789","172.0.0.224:6789"]}]'
        data: a=172.0.0.222:6789,b=172.0.0.223:6789,c=172.0.0.224:6789
        mapping: '{"node":{
            "a":{"Name":"kaas-node-21465871-42d0-4d56-911f-7b5b95cb4d34","Hostname":"kaas-node-21465871-42d0-4d56-911f-7b5b95cb4d34","Address":"172.16.52.222"},
            "b":{"Name":"kaas-node-43991b09-6dad-40cd-93e7-1f02ed821b9f","Hostname":"kaas-node-43991b09-6dad-40cd-93e7-1f02ed821b9f","Address":"172.16.52.223"},
            "c":{"Name":"kaas-node-15225c81-3f7a-4eba-b3e4-a23fd86331bd","Hostname":"kaas-node-15225c81-3f7a-4eba-b3e4-a23fd86331bd","Address":"172.16.52.224"}}
        }'
        maxMonId: "3"
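
      After saving the changes, you can review the resulting ConfigMap, for example:

      kubectl -n rook-ceph get cm rook-ceph-mon-endpoints -o yaml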
      
  6. Back up the data of the failed Ceph Monitors one by one:

    1. SSH to the node of a failed Ceph Monitor using the previously saved IP address.

    2. Move the Ceph Monitor data directory to a backup location:

      mv /var/lib/rook/mon-<letter> /var/lib/rook/mon-<letter>.backup
      
    3. Close the SSH connection.

  7. Scale the rook-ceph/rook-ceph-operator deployment up to 1 replica:

    kubectl -n rook-ceph scale deploy rook-ceph-operator --replicas 1
    
  8. Wait until all Ceph Monitors are in the Running state:

    kubectl -n rook-ceph get pod -l app=rook-ceph-mon -w
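
    Alternatively, you can block until the Ceph Monitor pods report ready instead of watching the list. A minimal sketch, assuming a 10-minute timeout is acceptable:

    kubectl -n rook-ceph wait --for=condition=Ready pod -l app=rook-ceph-mon --timeout=10m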
    
  9. Restore the data from the backup for each recovered Ceph Monitor one by one:

    1. Enter a recovered Ceph Monitor pod:

      kubectl -n rook-ceph exec -it <monPodName> -- bash
      

      Substitute <monPodName> with the recovered Ceph Monitor pod name. For example, rook-ceph-mon-g-845d44b9c6-fjc5d.

    2. Restore the mon data from the backup for the current Ceph Monitor:

      ceph-monstore-tool /var/lib/rook/mon-<letter>.backup/data store-copy /var/lib/rook/mon-<letter>/data/
      

      Substitute <letter> with the current Ceph Monitor pod letter, for example, e.

  10. Verify the Ceph cluster state. The output must indicate the desired number of Ceph Monitors, and all of them must be in quorum:

    kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l app=rook-ceph-tools -o jsonpath='{.items[0].metadata.name}') -- ceph -s