Verify Ceph Controller and Rook

The starting point for Ceph troubleshooting is the ceph-controller and rook-operator logs. Once you locate the component causing the issue, verify the logs of the related pod. This section describes how to verify the Ceph Controller and Rook objects of a Ceph cluster.

To verify Ceph Controller and Rook:

  1. Verify data access. Ceph volumes can be consumed directly by Kubernetes workloads and internally, for example, by OpenStack services. To verify the Kubernetes storage:

    1. Verify the available storage classes. The storage classes that are automatically managed by Ceph Controller use the rook-ceph.rbd.csi.ceph.com provisioner.

      kubectl get storageclass
      

      Example of system response:

      NAME                            PROVISIONER                    RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
      iam-kaas-iam-data               kubernetes.io/no-provisioner   Delete          WaitForFirstConsumer   false                  64m
      kubernetes-ssd (default)        rook-ceph.rbd.csi.ceph.com     Delete          Immediate              false                  55m
      stacklight-alertmanager-data    kubernetes.io/no-provisioner   Delete          WaitForFirstConsumer   false                  55m
      stacklight-elasticsearch-data   kubernetes.io/no-provisioner   Delete          WaitForFirstConsumer   false                  55m
      stacklight-postgresql-db        kubernetes.io/no-provisioner   Delete          WaitForFirstConsumer   false                  55m
      stacklight-prometheus-data      kubernetes.io/no-provisioner   Delete          WaitForFirstConsumer   false                  55m
      
    2. Verify that the volumes are properly connected to their pods:

      1. Obtain the list of volumes:

        kubectl get persistentvolumeclaims -n kaas
        

        Example of system response:

        NAME                         STATUS  VOLUME                                    CAPACITY  ACCESS MODES  STORAGECLASS       AGE
        ironic-aio-pvc               Bound   pvc-9132beb2-6a17-4877-af40-06031d52da47  5Gi       RWO           kubernetes-ssd     62m
        ironic-inspector-pvc         Bound   pvc-e84e9a9e-51b8-4c57-b116-0e1e6a9e7e94  1Gi       RWO           kubernetes-ssd     62m
        mariadb-pvc                  Bound   pvc-fb0dbf01-ee4b-4c88-8b08-901080ee8e14  2Gi       RWO           kubernetes-ssd     62m
        mysql-data-mariadb-server-0  Bound   local-pv-d1ecc89d                         457Gi     RWO           iam-kaas-iam-data  62m
        mysql-data-mariadb-server-1  Bound   local-pv-1f385d17                         457Gi     RWO           iam-kaas-iam-data  62m
        mysql-data-mariadb-server-2  Bound   local-pv-79a820d7                         457Gi     RWO           iam-kaas-iam-data  62m
        
      2. For each volume, verify the connection. For example:

        kubectl describe pvc ironic-aio-pvc -n kaas
        

        Example of a positive system response:

        Name:          ironic-aio-pvc
        Namespace:     kaas
        StorageClass:  kubernetes-ssd
        Status:        Bound
        Volume:        pvc-9132beb2-6a17-4877-af40-06031d52da47
        Labels:        <none>
        Annotations:   pv.kubernetes.io/bind-completed: yes
                       pv.kubernetes.io/bound-by-controller: yes
                       volume.beta.kubernetes.io/storage-provisioner: rook-ceph.rbd.csi.ceph.com
        Finalizers:    [kubernetes.io/pvc-protection]
        Capacity:      5Gi
        Access Modes:  RWO
        VolumeMode:    Filesystem
        Mounted By:    dnsmasq-dbd84d496-6fcz4
                       httpd-0
                       ironic-555bff5dd8-kb8p2
        Events:        <none>
        

        In case of connection issues, inspect the pod description for the volume information:

        kubectl describe pod <crashloopbackoff-pod-name>
        

        Example of system response:

        ...
        Events:
          FirstSeen LastSeen Count From    SubObjectPath Type     Reason           Message
          --------- -------- ----- ----    ------------- -------- ------           -------
          1h        1h       3     default-scheduler     Warning  FailedScheduling PersistentVolumeClaim is not bound: "mysql-pv-claim" (repeated 2 times)
          1h        35s      36    kubelet, 172.17.8.101 Warning  FailedMount      Unable to mount volumes for pod "wordpress-mysql-918363043-50pjr_default(08d14e75-bd99-11e7-bc4c-001c428b9fc8)": timeout expired waiting for volumes to attach/mount for pod "default"/"wordpress-mysql-918363043-50pjr". list of unattached/unmounted volumes=[mysql-persistent-storage]
          1h        35s      36    kubelet, 172.17.8.101 Warning  FailedSync       Error syncing pod
        
    3. Verify that the CSI provisioner plugins were started properly and have the Running status:

      1. Obtain the list of CSI provisioner plugins:

        kubectl -n rook-ceph get pod -l app=csi-rbdplugin-provisioner
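
        Example of system response (illustrative; pod names, container counts, and ages vary by cluster):

        NAME                                         READY   STATUS    RESTARTS   AGE
        csi-rbdplugin-provisioner-6f689965cc-8vrgj   6/6     Running   0          64m
        csi-rbdplugin-provisioner-6f689965cc-pcv9m   6/6     Running   0          64m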
        
      2. Verify the logs of the required CSI provisioner:

        kubectl logs -n rook-ceph <csi-provisioner-plugin-name> -c csi-provisioner
        
  2. Verify the Ceph cluster status:

    1. Verify that the status of each pod in the ceph-lcm-mirantis and rook-ceph namespaces is Running:

      • For ceph-lcm-mirantis:

        kubectl get pod -n ceph-lcm-mirantis
        
      • For rook-ceph:

        kubectl get pod -n rook-ceph
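
        Example of system response (illustrative; the exact set of pods depends on the cluster configuration):

        NAME                                         READY   STATUS    RESTARTS   AGE
        csi-rbdplugin-hbrrm                          3/3     Running   0          66m
        csi-rbdplugin-provisioner-6f689965cc-8vrgj   6/6     Running   0          66m
        rook-ceph-mgr-a-6c44cb9776-nv2qh             1/1     Running   0          65m
        rook-ceph-mon-a-55bf4d7c6c-gpqjl             1/1     Running   0          66m
        rook-ceph-operator-74d6bc4847-hgcq7          1/1     Running   0          67m
        rook-ceph-osd-0-758b4c6bc8-2m86w             1/1     Running   0          64m
        rook-ceph-tools-76b8b6bbc4-9sdhf             1/1     Running   0          65m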
        
  3. Verify Ceph Controller. Ceph Controller prepares the configuration that Rook uses to deploy the Ceph cluster, which is managed through the KaasCephCluster resource. If Rook cannot finish the deployment, verify the Rook Operator logs as described in step 4.

    1. List the pods:

      kubectl -n ceph-lcm-mirantis get pods
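
      Example of system response (illustrative; the generated pod name suffix varies):

      NAME                               READY   STATUS    RESTARTS   AGE
      ceph-controller-5c6475fb4c-gjvx8   1/1     Running   0          70m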
      
    2. Verify the logs of the required pod:

      kubectl -n ceph-lcm-mirantis logs <ceph-controller-pod-name>
      
    3. Verify the configuration:

      kubectl get kaascephcluster -n <managedClusterProjectName> -o yaml
      
    4. On the managed cluster, verify the MiraCeph subresource:

      kubectl get miraceph -n ceph-lcm-mirantis -o yaml
      
  4. Verify the Rook Operator logs. Rook deploys a Ceph cluster based on the custom resources created by Ceph Controller, such as pools, clients, and cephcluster. Rook logs contain the details of component orchestration. For details about the Ceph cluster status and access to the CLI tools, connect to the ceph-tools pod as described in step 5.

    1. Verify the Rook Operator logs:

      kubectl -n rook-ceph logs -l app=rook-ceph-operator
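
      To narrow the output down to potential problems, you can filter the logs with standard shell tools, for example:

      kubectl -n rook-ceph logs -l app=rook-ceph-operator | grep -iE 'error|fail'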
      
    2. Verify the CephCluster configuration:

      Note

      The Ceph Controller manages the CephCluster CR. Open the CephCluster CR for verification only and do not modify it manually.

      kubectl get cephcluster -n rook-ceph -o yaml
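
      For a quick check without reading the whole resource, you can extract the health summary that Rook writes to the CephCluster status. For example, assuming a single CephCluster object in the namespace:

      kubectl -n rook-ceph get cephcluster -o jsonpath='{.items[0].status.ceph.health}'

      A healthy cluster reports HEALTH_OK.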
      
  5. Verify the ceph-tools pod:

    1. Exec into the ceph-tools pod:

      kubectl --kubeconfig <pathToManagedClusterKubeconfig> -n rook-ceph exec -it $(kubectl --kubeconfig <pathToManagedClusterKubeconfig> -n rook-ceph get pod -l app=rook-ceph-tools -o jsonpath='{.items[0].metadata.name}') -- bash
      
    2. Verify that CLI commands can run on the ceph-tools pod:

      ceph -s
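
      Example of a positive system response (illustrative; the cluster ID, daemon names, and capacity figures vary by cluster):

        cluster:
          id:     8b3c5f2a-...
          health: HEALTH_OK

        services:
          mon: 3 daemons, quorum a,b,c (age 2h)
          mgr: a(active, since 2h)
          osd: 3 osds: 3 up (since 2h), 3 in (since 2h)

        data:
          pools:   2 pools, 64 pgs
          objects: 1.25k objects, 4.5 GiB
          usage:   13 GiB used, 1.3 TiB / 1.3 TiB avail
          pgs:     64 active+clean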
      
  6. Verify hardware:

    1. From the ceph-tools pod, locate the required Ceph OSD and the node it runs on:

      ceph osd tree
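
      Example of system response (illustrative; IDs, weights, and host names vary by cluster). All OSDs must be up:

      ID  CLASS  WEIGHT   TYPE NAME             STATUS  REWEIGHT  PRI-AFF
      -1         1.30789  root default
      -3         0.43596      host kaas-node-1
       0    ssd  0.43596          osd.0             up   1.00000  1.00000
      -5         0.43596      host kaas-node-2
       1    ssd  0.43596          osd.1             up   1.00000  1.00000
      -7         0.43596      host kaas-node-3
       2    ssd  0.43596          osd.2             up   1.00000  1.00000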
      
    2. Exec into each Ceph OSD pod in the rook-ceph namespace, one by one:

      kubectl exec -it -n rook-ceph <osd-pod-name> -- bash
      
    3. Verify that the ceph-volume tool is available in each Ceph OSD pod running on the target node:

      ceph-volume lvm list
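
      Example of system response (illustrative and trimmed; <vg-uuid> and <lv-uuid> are placeholders, and device paths vary by node):

      ====== osd.0 =======

        [block]       /dev/ceph-<vg-uuid>/osd-block-<lv-uuid>

            block device              /dev/ceph-<vg-uuid>/osd-block-<lv-uuid>
            cluster name              ceph
            osd id                    0
            type                      block
            devices                   /dev/sdb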