Ceph disaster recovery

This section describes how to recover a failed or accidentally removed Ceph cluster in the following cases:

  • The Ceph controller underlying a running Rook Ceph cluster has failed, and you want to install a new Ceph controller Helm release and recover the failed Ceph cluster onto the new Ceph controller.

  • You want to migrate the data of an existing Ceph cluster to a new Container Cloud or Mirantis OpenStack on Kubernetes (MOS) deployment, and downtime can be tolerated.

Consider the common state of a failed or removed Ceph cluster:

  • The rook-ceph namespace contains no pods, or its pods are in the Terminating state.

  • The rook-ceph and/or ceph-lcm-mirantis namespaces are in the Terminating state.

  • The ceph-operator is in the FAILED state:

    • For Container Cloud: the state of the ceph-operator Helm release in the management HelmBundle, such as default/kaas-mgmt, has switched from DEPLOYED to FAILED.

    • For MOS: the state of the osh-system/ceph-operator HelmBundle, or a related namespace, has switched from DEPLOYED to FAILED.

  • The Rook CephCluster, CephBlockPool, and CephObjectStore CRs in the rook-ceph namespace cannot be found or have the deletionTimestamp parameter set in the metadata section.
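
The last symptom in the list above can be checked from the command line. The following is a minimal sketch, assuming kubectl access to the affected cluster. The helper name list_terminating_crs is hypothetical and not part of any product tooling. It prints each CR of the given kind with its deletionTimestamp, which stays empty for a CR that is not marked for deletion.

```shell
# Hypothetical helper: print "<name><TAB><deletionTimestamp>" for every CR of
# the given kind in the rook-ceph namespace. An empty second column means
# that the CR is not marked for deletion.
list_terminating_crs() {
  kubectl -n rook-ceph get "$1" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.deletionTimestamp}{"\n"}{end}'
}

# Usage (requires access to the cluster):
#   for kind in cephcluster cephblockpool cephobjectstore; do
#     list_terminating_crs "$kind"
#   done
```

If a kind returns no lines at all, the corresponding CRs cannot be found, which matches the other half of the symptom.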

Note

Prior to recovering the Ceph cluster, verify that your deployment meets the following prerequisites:

  1. The Ceph cluster fsid exists.

  2. The Ceph cluster Monitor keyrings exist.

  3. The Ceph cluster devices exist and include the data previously handled by Ceph OSDs.

Overview of the recovery procedure workflow:

  1. Create a backup of the remaining data and resources.

  2. Clean up the failed or removed ceph-operator Helm release.

  3. Deploy a new ceph-operator Helm release with the previously used KaaSCephCluster and one Ceph Monitor.

  4. Replace the ceph-mon data with the old cluster data.

  5. Replace fsid in secrets/rook-ceph-mon with the old one.

  6. Fix the Monitor map in the ceph-mon database.

  7. Fix the Ceph Monitor authentication key and disable authentication.

  8. Start the restored cluster and inspect the recovery.

  9. Fix the admin authentication key and enable authentication.

  10. Restart the cluster.

To recover a failed or removed Ceph cluster:

  1. Back up the remaining resources. Skip the commands for the resources that have already been removed:

    kubectl -n rook-ceph get cephcluster <clusterName> -o yaml > backup/cephcluster.yaml
    # perform this for each cephblockpool
    kubectl -n rook-ceph get cephblockpool <cephBlockPool-i> -o yaml > backup/<cephBlockPool-i>.yaml
    # perform this for each client
    kubectl -n rook-ceph get cephclient <cephclient-i> -o yaml > backup/<cephclient-i>.yaml
    kubectl -n rook-ceph get cephobjectstore <cephObjectStoreName> -o yaml > backup/<cephObjectStoreName>.yaml
    # perform this for each secret
    kubectl -n rook-ceph get secret <secret-i> -o yaml > backup/<secret-i>.yaml
    # perform this for each configMap
    kubectl -n rook-ceph get cm <cm-i> -o yaml > backup/<cm-i>.yaml
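
The per-resource commands above can be looped over instead of typed one by one. The following is a sketch under stated assumptions: the helper name backup_kind is hypothetical, and the backup/<kind>-<name>.yaml file naming is chosen here only to avoid collisions between resources of different kinds that share a name.

```shell
# Hypothetical helper: save every resource of one kind from the rook-ceph
# namespace as backup/<kind>-<name>.yaml.
backup_kind() {
  kind=$1
  mkdir -p backup
  # "-o name" prints one "<resource>/<name>" reference per line.
  kubectl -n rook-ceph get "$kind" -o name | while read -r ref; do
    kubectl -n rook-ceph get "$ref" -o yaml > "backup/${kind}-${ref##*/}.yaml"
  done
}

# Usage (requires access to the cluster):
#   for kind in cephcluster cephblockpool cephclient cephobjectstore secret cm; do
#     backup_kind "$kind"
#   done
```

A kind whose resources have already been removed produces no files, which corresponds to skipping the command for that resource.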
    
  2. SSH to each node where the Ceph Monitors or Ceph OSDs were placed before the failure and back up the valuable data:

    mv /var/lib/rook /var/lib/rook.backup
    mv /etc/ceph /etc/ceph.backup
    mv /etc/rook /etc/rook.backup
    

    Once done, close the SSH connection.
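
When mv is given a destination directory that already exists, it moves the source inside it instead of replacing it, so rerunning the commands above on the same node can nest the backups. The following sketch guards against that. The helper name backup_dir is hypothetical.

```shell
# Hypothetical helper: move a directory to <path>.backup, but only if the
# directory exists and no earlier backup would be affected.
backup_dir() {
  src=$1
  if [ ! -e "$src" ]; then
    echo "skip: $src does not exist"
  elif [ -e "$src.backup" ]; then
    echo "skip: $src.backup already exists" >&2
    return 1
  else
    mv "$src" "$src.backup" && echo "moved: $src -> $src.backup"
  fi
}

# Usage on a node (run with sufficient privileges):
#   for dir in /var/lib/rook /etc/ceph /etc/rook; do backup_dir "$dir"; done
```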

  3. Clean up the previous installation of ceph-operator. For details, see Rook documentation: Cleaning up a cluster.

    1. Delete the ceph-lcm-mirantis/ceph-controller deployment:

      kubectl -n ceph-lcm-mirantis delete deployment ceph-controller
      
    2. Delete all deployments, DaemonSets, and jobs from the rook-ceph namespace, if any:

      kubectl -n rook-ceph delete deployment --all
      kubectl -n rook-ceph delete daemonset --all
      kubectl -n rook-ceph delete job --all
      
    3. Edit the MiraCeph and MiraCephLog CRs of the ceph-lcm-mirantis namespace and remove the finalizers parameter from the metadata section:

      kubectl -n ceph-lcm-mirantis edit miraceph
      kubectl -n ceph-lcm-mirantis edit miracephlog
      
    4. Edit the CephCluster, CephBlockPool, CephClient, CephObjectStore, and CephObjectStoreUser CRs of the rook-ceph namespace and remove the finalizers parameter from the metadata section:

      kubectl -n rook-ceph edit cephclusters
      kubectl -n rook-ceph edit cephblockpools
      kubectl -n rook-ceph edit cephclients
      kubectl -n rook-ceph edit cephobjectstores
      kubectl -n rook-ceph edit cephobjectusers
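
The finalizers can also be removed without opening an editor. The following is a sketch, assuming that a JSON merge patch that sets metadata.finalizers to null is acceptable for these CRs. The helper name clear_finalizers is hypothetical.

```shell
# Hypothetical helper: clear metadata.finalizers on every resource of the
# given kind in the given namespace.
clear_finalizers() {
  ns=$1
  kind=$2
  kubectl -n "$ns" get "$kind" -o name | while read -r ref; do
    kubectl -n "$ns" patch "$ref" --type merge \
      -p '{"metadata":{"finalizers":null}}'
  done
}

# Usage (requires access to the cluster):
#   for kind in miraceph miracephlog; do
#     clear_finalizers ceph-lcm-mirantis "$kind"
#   done
#   for kind in cephclusters cephblockpools cephclients cephobjectstores; do
#     clear_finalizers rook-ceph "$kind"
#   done
```

Unlike the interactive edit, the patch touches only the finalizers field, so it is worth confirming with kubectl get that the resources are actually removed afterwards.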
      
    5. Once you clean up every single resource related to the Ceph release, open the Cluster CR for editing:

      kubectl -n <projectName> edit cluster <clusterName>
      

      Substitute <projectName> with default for the management cluster or with a related project name for the managed cluster.

    6. Remove the ceph-controller Helm release item from the spec.providerSpec.value.helmReleases array and save the Cluster CR:

      - name: ceph-controller
        values: {}
      
    7. Verify that ceph-controller has disappeared from the corresponding HelmBundle:

      kubectl -n <projectName> get helmbundle -o yaml
      
  4. Open the KaaSCephCluster CR of the related management or managed cluster for editing:

    kubectl -n <projectName> edit kaascephcluster
    

    Substitute <projectName> with default for the management cluster or with a related project name for the managed cluster.

  5. Edit the roles of nodes so that the entire nodes spec contains only one mon role. Save KaaSCephCluster after editing.

  6. Open the Cluster CR for editing:

    kubectl -n <projectName> edit cluster <clusterName>
    

    Substitute <projectName> with default for the management cluster or with a related project name for the managed cluster.

  7. Add the ceph-controller item to spec.providerSpec.value.helmReleases to restore the ceph-controller Helm release, then save the Cluster CR:

    - name: ceph-controller
      values: {}
    
  8. Verify that the ceph-controller Helm release is deployed:

    1. Inspect the Rook operator logs and wait until the orchestration has settled:

      kubectl -n rook-ceph logs -l app=rook-ceph-operator
      
    2. Verify that rook-ceph-mon-a, rook-ceph-mgr-a, and all the auxiliary pods in the rook-ceph namespace are up and running, and that no rook-ceph-osd-ID-xxxxxx pods are running:

      kubectl -n rook-ceph get pod
      
    3. Verify the Ceph state. The output must indicate that one mon and one mgr are running, all Ceph OSDs are down, and all PGs are in the Unknown state.

      kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l app=rook-ceph-tools -o jsonpath='{.items[0].metadata.name}') -- ceph -s
      

      Note

      Rook should not start any Ceph OSD daemon because all devices belong to the old cluster that has a different fsid. To verify the Ceph OSD daemons, inspect the osd-prepare pods logs:

      kubectl -n rook-ceph logs -l app=rook-ceph-osd-prepare
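
The absence of Ceph OSD pods described in the Note above can also be confirmed with a count. This is a minimal sketch. The helper name count_osd_pods is hypothetical, and it assumes that the OSD pods carry the app=rook-ceph-osd label used elsewhere in this procedure.

```shell
# Hypothetical helper: print the number of pods labeled app=rook-ceph-osd in
# the rook-ceph namespace. The "No resources found" message that kubectl
# writes to stderr is discarded.
count_osd_pods() {
  kubectl -n rook-ceph get pod -l app=rook-ceph-osd --no-headers 2>/dev/null \
    | wc -l | tr -d ' '
}

# Usage (requires access to the cluster); 0 is expected at this stage:
#   count_osd_pods
```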
      
  9. Connect to the terminal of the rook-ceph-mon-a pod:

    kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod \
    -l app=rook-ceph-mon -o jsonpath='{.items[0].metadata.name}') -- bash
    
  10. Output the keyring file and save it for further usage:

    cat /etc/ceph/keyring-store/keyring
    exit
    
  11. Obtain and save the nodeName of mon-a for further usage:

    kubectl -n rook-ceph get pod $(kubectl -n rook-ceph get pod \
    -l app=rook-ceph-mon -o jsonpath='{.items[0].metadata.name}') -o jsonpath='{.spec.nodeName}'
    
  12. Obtain and save the cephImage used in the Ceph cluster for further usage:

    kubectl -n ceph-lcm-mirantis get cm ccsettings -o jsonpath='{.data.cephImage}'
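
The values from steps 11 and 12 are needed again on the Ceph Monitor host, so it may help to capture them in a file. The following is a sketch. The helper name collect_recovery_vars, the variable names MON_NODE and CEPH_IMAGE, and the recovery.env file name are all hypothetical.

```shell
# Hypothetical helper: print the nodeName of the first rook-ceph-mon pod and
# the cephImage value as shell variable assignments.
collect_recovery_vars() {
  mon_pod=$(kubectl -n rook-ceph get pod -l app=rook-ceph-mon \
    -o jsonpath='{.items[0].metadata.name}')
  node=$(kubectl -n rook-ceph get pod "$mon_pod" -o jsonpath='{.spec.nodeName}')
  image=$(kubectl -n ceph-lcm-mirantis get cm ccsettings \
    -o jsonpath='{.data.cephImage}')
  printf 'MON_NODE=%s\nCEPH_IMAGE=%s\n' "$node" "$image"
}

# Usage (requires access to the cluster):
#   collect_recovery_vars > recovery.env
```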
    
  13. Stop the Rook operator by scaling its deployment replicas to 0:

    kubectl -n rook-ceph scale deploy rook-ceph-operator --replicas 0
    
  14. Remove the Rook deployments generated with the Rook operator:

    kubectl -n rook-ceph delete deploy -l app=rook-ceph-mon
    kubectl -n rook-ceph delete deploy -l app=rook-ceph-mgr
    kubectl -n rook-ceph delete deploy -l app=rook-ceph-osd
    kubectl -n rook-ceph delete deploy -l app=rook-ceph-crashcollector
    
  15. Using the saved nodeName, SSH to the host where rook-ceph-mon-a in the new Kubernetes cluster is placed and perform the following steps:

    1. Remove /var/lib/rook/mon-a or move it to another folder:

      mv /var/lib/rook/mon-a /var/lib/rook/mon-a.new
      
    2. Pick a healthy rook-ceph-mon-ID directory (/var/lib/rook.backup/mon-ID) from the previous backup and copy it to /var/lib/rook/mon-a:

      cp -rp /var/lib/rook.backup/mon-<ID> /var/lib/rook/mon-a
      

      Substitute <ID> with the ID of any healthy mon node of the old cluster.

    3. Replace /var/lib/rook/mon-a/keyring with the previously saved keyring, preserving only the [mon.] section. Remove the [client.admin] section.
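
Trimming the keyring by hand is error-prone, so the wanted section can be extracted with awk instead. This is a sketch. The helper name keyring_section and the saved-keyring file name are hypothetical, and the keyring is assumed to use the usual layout of bracketed section headers followed by key and caps lines.

```shell
# Hypothetical helper: print only one section of a Ceph keyring file.
#   $1: section name without brackets, for example "mon."
#   $2: path to the keyring file
keyring_section() {
  awk -v want="[$1]" '
    /^\[/ { keep = ($0 == want) }  # each header line switches printing on or off
    keep                           # print lines that belong to the wanted section
  ' "$2"
}

# Usage on the Ceph Monitor host:
#   keyring_section mon. saved-keyring > /var/lib/rook/mon-a/keyring
```

The same helper called with client.admin yields the section that is needed later when importing the admin authentication key.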

    4. Run a Docker container from the previously saved cephImage:

      docker run -it --rm -v /var/lib/rook:/var/lib/rook <cephImage> bash
      
    5. Inside the container, create /etc/ceph/ceph.conf for a stable operation of ceph-mon:

      touch /etc/ceph/ceph.conf
      
    6. Change the directory to /var/lib/rook and edit monmap by replacing the existing mon hosts with the new mon-a endpoints:

      cd /var/lib/rook
      rm -f /var/lib/rook/mon-a/data/store.db/LOCK # make sure the quorum lock file does not exist
      ceph-mon --extract-monmap monmap --mon-data ./mon-a/data  # Extract monmap from old ceph-mon db and save as monmap
      monmaptool --print monmap  # Print the monmap content, which reflects the old cluster ceph-mon configuration.
      monmaptool --rm a monmap  # Delete `a` from monmap.
      monmaptool --rm b monmap  # Repeat, and delete `b` from monmap.
      monmaptool --rm c monmap  # Repeat this pattern until all the old ceph-mons are removed from monmap.
      monmaptool --addv a [v2:<nodeIP>:3300,v1:<nodeIP>:6789] monmap   # Add the new rook-ceph-mon-a with the IP address of its node.
      ceph-mon --inject-monmap monmap --mon-data ./mon-a/data  # Replace monmap in ceph-mon db with our modified version.
      rm monmap
      exit
      

      Substitute <nodeIP> with the IP address of the current <nodeName> node.
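
If the old cluster had Ceph Monitors with names other than a, b, and c, the names to remove can be read from the printed monmap instead of guessed. The following is a sketch, assuming that monmaptool --print lists each monitor on a line that ends with mon.<name>. The helper name monmap_names is hypothetical.

```shell
# Hypothetical helper: read `monmaptool --print` output on stdin and print
# the name of every monitor it lists, one per line.
monmap_names() {
  awk '$NF ~ /^mon\./ { sub(/^mon\./, "", $NF); print $NF }'
}

# Usage inside the container, in place of the individual --rm calls:
#   for name in $(monmaptool --print monmap | monmap_names); do
#     monmaptool --rm "$name" monmap
#   done
```

After the loop, the new mon-a still has to be added with monmaptool --addv and the map injected as shown in the sub-step above.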

    7. Close the SSH connection.

  16. Change fsid to the original one to run Rook as an old cluster:

    kubectl -n rook-ceph edit secret/rook-ceph-mon
    

    Note

    The fsid value in the secret is base64-encoded, and the encoded string must not include a trailing newline character. For example:

    echo -n a811f99a-d865-46b7-8f2c-f94c064e4356 | base64  # Replace with the fsid from the old cluster.
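
To confirm that the encoded value is correct before pasting it into the secret, it can be decoded again and compared with the original. This is a small sketch. The helper names encode_fsid and decode_fsid are hypothetical, and base64 -d assumes the GNU coreutils variant of the tool.

```shell
# Encode an fsid without a trailing newline (printf '%s' appends none).
encode_fsid() {
  printf '%s' "$1" | base64
}

# Decode an encoded value to confirm the round trip.
decode_fsid() {
  printf '%s' "$1" | base64 -d
}

# Usage:
#   enc=$(encode_fsid a811f99a-d865-46b7-8f2c-f94c064e4356)
#   decode_fsid "$enc"   # should print the fsid exactly as entered
```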
    
  17. Scale the ceph-lcm-mirantis/ceph-controller deployment replicas to 0:

    kubectl -n ceph-lcm-mirantis scale deployment ceph-controller --replicas 0
    
  18. Disable authentication:

    1. Open the cm/rook-config-override ConfigMap for editing:

      kubectl -n rook-ceph edit cm/rook-config-override
      
    2. Add the following content:

      data:
        config: |
          [global]
          ...
          auth cluster required = none
          auth service required = none
          auth client required = none
          auth supported = none
      
  19. Start the Rook operator by scaling its deployment replicas to 1:

    kubectl -n rook-ceph scale deploy rook-ceph-operator --replicas 1
    
  20. Inspect the Rook operator logs and wait until the orchestration has settled:

    kubectl -n rook-ceph logs -l app=rook-ceph-operator
    
  21. Verify that rook-ceph-mon-a, rook-ceph-mgr-a, and all the auxiliary pods in the rook-ceph namespace are up and running, and that the number of running rook-ceph-osd-ID-xxxxxx pods is greater than zero:

    kubectl -n rook-ceph get pod
    
  22. Verify the Ceph state. The output must indicate that one mon, one mgr, and all Ceph OSDs are up and running and all PGs are in either the Active or Degraded state:

    kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod \
    -l app=rook-ceph-tools -o jsonpath='{.items[0].metadata.name}') -- ceph -s
    
  23. Enter the ceph-tools pod and import the authentication key:

    kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod \
    -l app=rook-ceph-tools -o jsonpath='{.items[0].metadata.name}') -- bash
    vi key
    [paste the keyring content saved before, preserving only the `[client.admin]` section]
    ceph auth import -i key
    rm key
    exit
    
  24. Stop the Rook operator by scaling the deployment to 0 replicas:

    kubectl -n rook-ceph scale deploy rook-ceph-operator --replicas 0
    
  25. Re-enable authentication:

    1. Open the cm/rook-config-override ConfigMap for editing:

      kubectl -n rook-ceph edit cm/rook-config-override
      
    2. Remove the following content:

      data:
        config: |
          [global]
          ...
          auth cluster required = none
          auth service required = none
          auth client required = none
          auth supported = none
      
  26. Remove all Rook deployments generated with the Rook operator:

    kubectl -n rook-ceph delete deploy -l app=rook-ceph-mon
    kubectl -n rook-ceph delete deploy -l app=rook-ceph-mgr
    kubectl -n rook-ceph delete deploy -l app=rook-ceph-osd
    kubectl -n rook-ceph delete deploy -l app=rook-ceph-crashcollector
    
  27. Start the Ceph controller by scaling its deployment replicas to 1:

    kubectl -n ceph-lcm-mirantis scale deployment ceph-controller --replicas 1
    
  28. Start the Rook operator by scaling its deployment replicas to 1:

    kubectl -n rook-ceph scale deploy rook-ceph-operator --replicas 1
    
  29. Inspect the Rook operator logs and wait until the orchestration has settled:

    kubectl -n rook-ceph logs -l app=rook-ceph-operator
    
  30. Verify that rook-ceph-mon-a, rook-ceph-mgr-a, and all the auxiliary pods in the rook-ceph namespace are up and running, and that the number of running rook-ceph-osd-ID-xxxxxx pods is greater than zero:

    kubectl -n rook-ceph get pod
    
  31. Verify the Ceph state. The output must indicate that one mon, one mgr, and all Ceph OSDs are up and running and the overall stored data size equals the old cluster data size:

    kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l app=rook-ceph-tools -o jsonpath='{.items[0].metadata.name}') -- ceph -s
    
  32. Edit the MiraCeph CR and add two more mon and mgr roles to the corresponding nodes:

    kubectl -n ceph-lcm-mirantis edit miraceph
    
  33. Inspect the Rook namespace and wait until all Ceph Monitors are in the Running state:

    kubectl -n rook-ceph get pod -l app=rook-ceph-mon
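
Instead of polling the pod list by hand, kubectl wait can block until the Ceph Monitor pods are ready. This is a sketch. The helper name wait_for_mons and the default timeout of 600s are assumptions, and the Ready condition it waits for is stricter than the Running state named in this step.

```shell
# Hypothetical helper: block until every pod labeled app=rook-ceph-mon in the
# rook-ceph namespace reports the Ready condition, or until the timeout
# (first argument, 600s by default) expires.
wait_for_mons() {
  kubectl -n rook-ceph wait --for=condition=Ready pod \
    -l app=rook-ceph-mon --timeout="${1:-600s}"
}

# Usage (requires access to the cluster):
#   wait_for_mons        # default timeout
#   wait_for_mons 900s   # longer timeout
```

kubectl wait considers only the pods that exist when it starts, so run it after all three rook-ceph-mon pods have been created.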
    
  34. Verify the Ceph state. The output must indicate that three mon (three in quorum), one mgr, and all Ceph OSDs are up and running and the overall stored data size equals the old cluster data size:

    kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l app=rook-ceph-tools -o jsonpath='{.items[0].metadata.name}') -- ceph -s
    

Once done, the data from the failed or removed Ceph cluster is restored and ready to use.