Verify Ceph Controller and Rook

The starting point for Ceph troubleshooting is the ceph-controller and rook-operator logs. Once you locate the component that causes issues, verify the logs of the related pod. This section describes how to verify the Ceph Controller and Rook objects of a Ceph cluster.

To verify Ceph Controller and Rook:

Verify the Ceph cluster status:
1. Verify that the status of each pod in the ceph-lcm-mirantis and rook-ceph namespaces is Running:
  - For ceph-lcm-mirantis:
    kubectl get pod -n ceph-lcm-mirantis
  - For rook-ceph:
    kubectl get pod -n rook-ceph
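
The status check above can be narrowed to the pods that need attention. The following is a minimal sketch, not part of the product tooling: the helper name `list_not_running` is hypothetical, and it assumes the default `kubectl get pod --no-headers` column order (NAME, READY, STATUS, ...). Treating Completed pods as healthy is also an assumption made here, because job pods finish in that state.

```shell
# Print "<pod> <status>" for every pod that is neither Running nor Completed.
# Reads `kubectl get pod --no-headers` output on stdin.
# Assumption: STATUS is the third whitespace-separated column.
list_not_running() {
  awk '$3 != "Running" && $3 != "Completed" { print $1 " " $3 }'
}
```

For example, `kubectl get pod -n rook-ceph --no-headers | list_not_running` prints nothing when every pod in the namespace is in one of the two accepted states.
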
Verify Ceph Controller. Ceph Controller prepares the configuration that Rook uses to deploy the Ceph cluster, which is managed through the KaasCephCluster resource. If Rook cannot finish the deployment, verify the Rook Operator logs as described in the Rook Operator verification step below.
1. List the pods:
```
kubectl -n ceph-lcm-mirantis get pods
```
2. Verify the logs of the required pod:
```
kubectl -n ceph-lcm-mirantis logs <ceph-controller-pod-name>
```
3. Verify the configuration:
```
kubectl get kaascephcluster -n <managedClusterProjectName> -o yaml
```
4. On the managed cluster, verify the MiraCeph subresource:
```
kubectl get miraceph -n ceph-lcm-mirantis -o yaml
```
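
Controller logs can be long, so it may help to filter them before reading. The sketch below is illustrative only: the helper name `filter_problem_lines` is hypothetical, and the keyword list is an assumption rather than a documented log format.

```shell
# Keep only log lines that look like problems (case-insensitive match).
# Assumption: problem lines contain one of the keywords below.
# `|| true` keeps the exit status zero when nothing matches.
filter_problem_lines() {
  grep -iE 'error|fail|panic' || true
}
```

A possible invocation is `kubectl -n ceph-lcm-mirantis logs <ceph-controller-pod-name> | filter_problem_lines`. When logs are selected with a label selector (`-l`), kubectl returns only the last 10 lines per pod by default, so adding `--tail=<number>` is usually necessary in that case.
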
Verify the Rook Operator logs. Rook deploys a Ceph cluster based on the custom resources that Ceph Controller creates, such as pools, clients, and cephcluster. Rook logs contain details about component orchestration. To inspect the Ceph cluster status and access the CLI tools, connect to the ceph-tools pod as described in the ceph-tools verification step below.
1. Verify the Rook Operator logs:
```
kubectl -n rook-ceph logs -l app=rook-ceph-operator
```
2. Verify the CephCluster configuration:
  
  Note
  
  The Ceph Controller manages the CephCluster CR. Open the CephCluster CR only for verification and do not modify it manually.
```
kubectl get cephcluster -n rook-ceph -o yaml
```

Verify the ceph-tools pod:

Open a shell in the ceph-tools pod:

kubectl --kubeconfig <pathToManagedClusterKubeconfig> -n rook-ceph exec -it $(kubectl --kubeconfig <pathToManagedClusterKubeconfig> -n rook-ceph get pod -l app=rook-ceph-tools -o jsonpath='{.items[0].metadata.name}') -- bash

Verify that CLI commands can run on the ceph-tools pod:
```
ceph -s
```
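
For scripted checks, the one-line `ceph health` output is easier to evaluate than the full `ceph -s` report. The following sketch maps that output to an exit code. The helper name `health_exit_code` and the exit code convention are assumptions made for illustration.

```shell
# Map the first word of `ceph health` output (read from stdin) to an exit code:
#   0 for HEALTH_OK, 1 for HEALTH_WARN, 2 for anything else (including no input).
health_exit_code() {
  read -r state _rest
  case "$state" in
    HEALTH_OK)   return 0 ;;
    HEALTH_WARN) return 1 ;;
    *)           return 2 ;;
  esac
}
```

Inside the ceph-tools pod, `ceph health | health_exit_code` then returns a non-zero status whenever the cluster is not fully healthy.
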

Verify hardware:
1. From the ceph-tools pod, list the Ceph OSD tree to identify the required device and the node that hosts it:
```
ceph osd tree
```
2. Open a shell in each Ceph OSD pod in the rook-ceph namespace, one at a time:
```
kubectl exec -it -n rook-ceph <osd-pod-name> -- bash
```
3. Verify that the ceph-volume tool is available on all pods running on the target node:
```
ceph-volume lvm list
```
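
When the OSD tree is large, the OSDs that are down can be picked out mechanically. This is a rough sketch under one stated assumption: in the `ceph osd tree` output, the STATUS value directly follows the `osd.N` name. The helper name `list_down_osds` is hypothetical.

```shell
# Print the names of OSDs that `ceph osd tree` reports as down.
# Reads the command output on stdin.
# Assumption: the status word is the field right after the osd.N name.
list_down_osds() {
  awk '{
    for (i = 1; i < NF; i++)
      if ($i ~ /^osd\.[0-9]+$/ && $(i + 1) == "down") print $i
  }'
}
```

Run from the ceph-tools pod as `ceph osd tree | list_down_osds`. An empty result means that no OSD matched the down state.
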

Verify data access. Ceph volumes can be consumed directly by Kubernetes workloads and internally, for example, by OpenStack services. To verify the Kubernetes storage:

Verify the available storage classes. The storage classes that are automatically managed by Ceph Controller use the rook-ceph.rbd.csi.ceph.com provisioner.

kubectl get storageclass

Example of system response:

NAME                            PROVISIONER                    RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
kubernetes-ssd (default)        rook-ceph.rbd.csi.ceph.com     Delete          Immediate              false                  55m
stacklight-alertmanager-data    kubernetes.io/no-provisioner   Delete          WaitForFirstConsumer   false                  55m
stacklight-elasticsearch-data   kubernetes.io/no-provisioner   Delete          WaitForFirstConsumer   false                  55m
stacklight-postgresql-db        kubernetes.io/no-provisioner   Delete          WaitForFirstConsumer   false                  55m
stacklight-prometheus-data      kubernetes.io/no-provisioner   Delete          WaitForFirstConsumer   false                  55m
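
To list only the storage classes that use the Ceph RBD provisioner, the output can be filtered by the provisioner name given above. The helper name `list_ceph_rbd_classes` is hypothetical. The sketch searches every column because a `(default)` marker shifts the PROVISIONER column by one field.

```shell
# Print names of storage classes backed by the Ceph RBD CSI provisioner.
# Reads `kubectl get storageclass --no-headers` output on stdin.
list_ceph_rbd_classes() {
  awk '{
    for (i = 2; i <= NF; i++)
      if ($i == "rook-ceph.rbd.csi.ceph.com") { print $1; break }
  }'
}
```

For example: `kubectl get storageclass --no-headers | list_ceph_rbd_classes`.
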

Verify that volumes are properly connected to the Pod:

Obtain the list of volumes in all namespaces or use a particular one:

kubectl get persistentvolumeclaims -A

Example of system response:

NAMESPACE   NAME       STATUS   VOLUME    CAPACITY   ACCESS MODES   STORAGECLASS     AGE
rook-ceph   app-test   Bound    pv-test   1Gi        RWO            kubernetes-ssd   11m

For each volume, verify the connection. For example:

kubectl describe pvc app-test -n rook-ceph

Example of a positive system response:

Name:          app-test
Namespace:     rook-ceph
StorageClass:  kubernetes-ssd
Status:        Bound
Volume:        pv-test
Labels:        <none>
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: rook-ceph.rbd.csi.ceph.com
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      1Gi
Access Modes:  RWO
VolumeMode:    Filesystem
Events:        <none>
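
With many claims, it can be quicker to describe only those that are not Bound. The sketch below selects them from the claim list. It assumes the `-A` column order NAMESPACE, NAME, STATUS, and the helper name `list_unbound_pvcs` is hypothetical.

```shell
# Print "<namespace>/<name> <status>" for claims whose STATUS is not Bound.
# Reads `kubectl get persistentvolumeclaims -A --no-headers` output on stdin.
# Assumption: STATUS is the third whitespace-separated column.
list_unbound_pvcs() {
  awk '$3 != "Bound" { print $1 "/" $2 " " $3 }'
}
```

For example: `kubectl get persistentvolumeclaims -A --no-headers | list_unbound_pvcs`. Each printed claim is a candidate for the `kubectl describe pvc` check.
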

In case of connection issues, inspect the Pod description for the volume information:

kubectl describe pod <crashloopbackoff-pod-name>

Example of system response:

...
Events:
  FirstSeen  LastSeen  Count  From                   SubObjectPath  Type     Reason            Message
  ---------  --------  -----  ----                   -------------  ----     ------            -------
  1h         1h        3      default-scheduler                     Warning  FailedScheduling  PersistentVolumeClaim is not bound: "app-test" (repeated 2 times)
  1h         35s       36     kubelet, 172.17.8.101                 Warning  FailedMount       Unable to mount volumes for pod "wordpress-mysql-918363043-50pjr_default(08d14e75-bd99-11e7-bc4c-001c428b9fc8)": timeout expired waiting for volumes to attach/mount for pod "default"/"wordpress-mysql-918363043-50pjr". list of unattached/unmounted volumes=[mysql-persistent-storage]
  1h         35s       36     kubelet, 172.17.8.101                 Warning  FailedSync        Error syncing pod
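
The volume-related events in such a response can be isolated with a simple filter. This is a sketch only: the helper name `volume_event_lines` is hypothetical, and the list of event reasons is an assumption based on the example above plus the standard FailedAttachVolume reason.

```shell
# Keep only event lines whose reason points at scheduling or volume problems.
# Reads `kubectl describe pod` output on stdin.
# `|| true` keeps the exit status zero when nothing matches.
volume_event_lines() {
  grep -E 'FailedScheduling|FailedMount|FailedAttachVolume' || true
}
```

For example: `kubectl describe pod <crashloopbackoff-pod-name> | volume_event_lines`.
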

Verify that the CSI provisioner plugins started properly and are in the Running status:
1. Obtain the list of CSI provisioner plugins:
```
kubectl -n rook-ceph get pod -l app=csi-rbdplugin-provisioner
```
2. Verify the logs of the required CSI provisioner:
```
kubectl logs -n rook-ceph <csi-provisioner-plugin-name> csi-provisioner
```