Ceph Monitors recovery
Warning

This procedure is valid for MOSK clusters that use the MiraCeph custom resource (CR), which is available since MOSK 25.2 to replace the deprecated KaaSCephCluster. For the equivalent procedure with the KaaSCephCluster CR, refer to the following section:
This section describes how to recover failed Ceph Monitors of an existing Ceph cluster in the following state:

- The Ceph cluster contains failed Ceph Monitors that cannot start and hang in the Error or CrashLoopBackOff state.

- The logs of the failed Ceph Monitor pods contain the following lines:

  ```
  mon.g does not exist in monmap, will attempt to join an existing cluster
  ...
  mon.g@-1(???) e11 not in monmap and have been in a quorum before; must have been removed
  mon.g@-1(???) e11 commit suicide!
  ```

- The Ceph cluster contains at least one Running Ceph Monitor, and the ceph -s command outputs one healthy mon and one healthy mgr instance.
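To confirm that the cluster matches this state, you can inspect the Ceph Monitor pods and query the cluster status from the rook-ceph-tools pod, for example:

```
# List Ceph Monitor pods and their current states
kubectl -n rook-ceph get pod -l app=rook-ceph-mon

# Check the overall cluster health and the mon quorum
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph -s
```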
Unless stated otherwise, perform the following steps for all failed Ceph Monitors at once.
To recover failed Ceph Monitors:
- Scale the rook-ceph/rook-ceph-operator deployment down to 0 replicas:

  ```
  kubectl -n rook-ceph scale deploy rook-ceph-operator --replicas 0
  ```
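  Optionally, confirm that the operator has no running replicas before proceeding; the READY column should show 0/0:

  ```
  kubectl -n rook-ceph get deploy rook-ceph-operator
  ```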
- Delete all failed Ceph Monitor deployments:

  - Identify the Ceph Monitor pods in the Error or CrashLoopBackOff state:

    ```
    kubectl -n rook-ceph get pod -l 'app in (rook-ceph-mon,rook-ceph-mon-canary)'
    ```

  - Verify that the affected pods contain the failure logs described above:

    ```
    kubectl -n rook-ceph logs <failedMonPodName>
    ```

    Substitute <failedMonPodName> with the Ceph Monitor pod name, for example, rook-ceph-mon-g-845d44b9c6-fjc5d.

  - Save the identifying letters of the failed Ceph Monitors for further usage, for example, f, e, and so on.

  - Delete all corresponding deployments of these pods:

    - Identify the affected Ceph Monitor deployments:

      ```
      kubectl -n rook-ceph get deploy -l 'app in (rook-ceph-mon,rook-ceph-mon-canary)'
      ```

    - Delete the affected Ceph Monitor deployments. For example, if the Ceph cluster has the rook-ceph-mon-c-845d44b9c6-fjc5d pod in the CrashLoopBackOff state, remove the corresponding rook-ceph-mon-c deployment:

      ```
      kubectl -n rook-ceph delete deploy rook-ceph-mon-c
      ```

      Canary mon deployments have the suffix -canary. To delete several deployments at once, see the sketch after this step.
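A minimal sketch for removing the deployments of several failed Ceph Monitors in one pass, assuming the failed monitor letters are e and f as in the ConfigMap example further below:

```
# Hypothetical example: the failed Ceph Monitor letters are e and f
for letter in e f; do
  kubectl -n rook-ceph delete deploy "rook-ceph-mon-${letter}"
done
```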
 
 
- Remove all corresponding entries of the failed Ceph Monitors from the MON map:

  - Enter the ceph-tools pod:

    ```
    kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash
    ```

  - Inspect the current MON map and save the IP addresses of the failed Ceph Monitors for further usage:

    ```
    ceph mon dump
    ```

  - Remove all entries of the failed Ceph Monitors using the previously saved letters:

    ```
    ceph mon rm <monLetter>
    ```

    Substitute <monLetter> with the corresponding letter of a failed Ceph Monitor.

  - Exit the ceph-tools pod.
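For example, the whole session inside the ceph-tools pod may look as follows, assuming the failed Ceph Monitors are e and f as in the ConfigMap example below:

```
# Inside the rook-ceph-tools pod
ceph mon dump        # record the addresses of mon.e and mon.f before removal
ceph mon rm e
ceph mon rm f
ceph mon dump        # verify that only the healthy Ceph Monitors remain
exit
```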
 
- Remove all failed Ceph Monitor entries from the Rook mon endpoints ConfigMap:

  - Open the rook-ceph/rook-ceph-mon-endpoints ConfigMap for editing:

    ```
    kubectl -n rook-ceph edit cm rook-ceph-mon-endpoints
    ```

  - Remove all entries of the failed Ceph Monitors from the ConfigMap data and update the maxMonId value with the current number of Running Ceph Monitors. For example, rook-ceph-mon-endpoints has the following data:

    ```
    data:
      csi-cluster-config-json: '[{"clusterID":"rook-ceph","monitors":["172.0.0.222:6789","172.0.0.223:6789","172.0.0.224:6789","172.16.52.217:6789","172.16.52.216:6789"]}]'
      data: a=172.0.0.222:6789,b=172.0.0.223:6789,c=172.0.0.224:6789,f=172.0.0.217:6789,e=172.0.0.216:6789
      mapping: '{"node":{
        "a":{"Name":"kaas-node-21465871-42d0-4d56-911f-7b5b95cb4d34","Hostname":"kaas-node-21465871-42d0-4d56-911f-7b5b95cb4d34","Address":"172.16.52.222"},
        "b":{"Name":"kaas-node-43991b09-6dad-40cd-93e7-1f02ed821b9f","Hostname":"kaas-node-43991b09-6dad-40cd-93e7-1f02ed821b9f","Address":"172.16.52.223"},
        "c":{"Name":"kaas-node-15225c81-3f7a-4eba-b3e4-a23fd86331bd","Hostname":"kaas-node-15225c81-3f7a-4eba-b3e4-a23fd86331bd","Address":"172.16.52.224"},
        "e":{"Name":"kaas-node-ba3bfa17-77d2-467c-91eb-6291fb219a80","Hostname":"kaas-node-ba3bfa17-77d2-467c-91eb-6291fb219a80","Address":"172.16.52.216"},
        "f":{"Name":"kaas-node-6f669490-f0c7-4d19-bf73-e51fbd6c7672","Hostname":"kaas-node-6f669490-f0c7-4d19-bf73-e51fbd6c7672","Address":"172.16.52.217"}}
      }'
      maxMonId: "5"
    ```

    If e and f are the letters of the failed Ceph Monitors, the resulting ConfigMap data must be as follows:

    ```
    data:
      csi-cluster-config-json: '[{"clusterID":"rook-ceph","monitors":["172.0.0.222:6789","172.0.0.223:6789","172.0.0.224:6789"]}]'
      data: a=172.0.0.222:6789,b=172.0.0.223:6789,c=172.0.0.224:6789
      mapping: '{"node":{
        "a":{"Name":"kaas-node-21465871-42d0-4d56-911f-7b5b95cb4d34","Hostname":"kaas-node-21465871-42d0-4d56-911f-7b5b95cb4d34","Address":"172.16.52.222"},
        "b":{"Name":"kaas-node-43991b09-6dad-40cd-93e7-1f02ed821b9f","Hostname":"kaas-node-43991b09-6dad-40cd-93e7-1f02ed821b9f","Address":"172.16.52.223"},
        "c":{"Name":"kaas-node-15225c81-3f7a-4eba-b3e4-a23fd86331bd","Hostname":"kaas-node-15225c81-3f7a-4eba-b3e4-a23fd86331bd","Address":"172.16.52.224"}}
      }'
      maxMonId: "3"
    ```
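Optionally, verify the result of the edit before proceeding. This sketch only prints the data and maxMonId fields shown above:

```
kubectl -n rook-ceph get cm rook-ceph-mon-endpoints \
  -o jsonpath='{.data.data}{"\n"}{.data.maxMonId}{"\n"}'
```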
 
- Back up the data of the failed Ceph Monitors one by one:

  - SSH to the node of a failed Ceph Monitor using the previously saved IP address.

  - Move the Ceph Monitor data directory to another place:

    ```
    mv /var/lib/rook/mon-<letter> /var/lib/rook/mon-<letter>.backup
    ```

  - Close the SSH connection.
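For example, backing up a single failed Ceph Monitor may look as follows, assuming the monitor letter is e and <nodeIP> is the node address saved from the ceph mon dump output:

```
# From your workstation; <nodeIP> is a placeholder for the previously saved address
ssh <nodeIP>

# On the node; add sudo if your user cannot write to /var/lib/rook
mv /var/lib/rook/mon-e /var/lib/rook/mon-e.backup
exit
```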
 
- Scale the rook-ceph/rook-ceph-operator deployment up to 1 replica:

  ```
  kubectl -n rook-ceph scale deploy rook-ceph-operator --replicas 1
  ```
- Wait until all Ceph Monitors are in the Running state:

  ```
  kubectl -n rook-ceph get pod -l app=rook-ceph-mon -w
  ```
- Restore the data from the backup for each recovered Ceph Monitor one by one:

  - Enter a recovered Ceph Monitor pod:

    ```
    kubectl -n rook-ceph exec -it <monPodName> -- bash
    ```

    Substitute <monPodName> with the recovered Ceph Monitor pod name, for example, rook-ceph-mon-g-845d44b9c6-fjc5d.

  - Recover the mon data backup for the current Ceph Monitor:

    ```
    ceph-monstore-tool /var/lib/rook/mon-<letter>.backup/data store-copy /var/lib/rook/mon-<letter>/data/
    ```

    Substitute <letter> with the current Ceph Monitor pod letter, for example, e.
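Combined, the restore for a single recovered Ceph Monitor may look as follows, assuming the monitor letter is e and rook-ceph-mon-e-<hash> is its pod name (<hash> is a placeholder for the actual pod suffix):

```
kubectl -n rook-ceph exec -it rook-ceph-mon-e-<hash> -- bash

# Inside the Ceph Monitor pod
ceph-monstore-tool /var/lib/rook/mon-e.backup/data store-copy /var/lib/rook/mon-e/data/
exit
```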
 
- Verify the Ceph state. The output must indicate the desired number of Ceph Monitors, and all of them must be in quorum:

  ```
  kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph -s
  ```
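  The exact output depends on the cluster, but the lines to check resemble the following (placeholders instead of real values):

  ```
    services:
      mon: <desiredMonCount> daemons, quorum <monLetters> (age ...)
      mgr: <mgrName>(active, since ...)
  ```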