Verify Ceph Controller and Rook
The starting point for Ceph troubleshooting is the ceph-controller and rook-operator logs. Once you locate the component that causes issues, verify the logs of the related pod. This section describes how to verify the Ceph Controller and Rook objects of a Ceph cluster.
To verify Ceph Controller and Rook:
1. Verify the Ceph cluster status. Verify that the status of each pod in the ceph-lcm-mirantis and rook-ceph namespaces is Running:

   For ceph-lcm-mirantis:

      kubectl get pod -n ceph-lcm-mirantis

   For rook-ceph:

      kubectl get pod -n rook-ceph
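   If the namespaces contain many pods, you can list only the pods that are not in the Running phase. The field selector below is a standard kubectl feature, offered as an optional shortcut rather than as part of the original procedure:

      kubectl get pod -n ceph-lcm-mirantis --field-selector status.phase!=Running
      kubectl get pod -n rook-ceph --field-selector status.phase!=Running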
2. Verify Ceph Controller. Ceph Controller prepares the configuration that Rook uses to deploy the Ceph cluster, which is managed through the KaasCephCluster resource. If Rook cannot finish the deployment, verify the Rook Operator logs as described in step 3.

   1. List the pods:

      kubectl -n ceph-lcm-mirantis get pods

   2. Verify the logs of the required pod:

      kubectl -n ceph-lcm-mirantis logs <ceph-controller-pod-name>

   3. Verify the configuration:

      kubectl get kaascephcluster -n <managedClusterProjectName> -o yaml

   4. On the managed cluster, verify the MiraCeph subresource:

      kubectl get miraceph -n ceph-lcm-mirantis -o yaml
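   To check whether Ceph Controller reports the configuration as applied without reading the full YAML, you can print only the status section of the resource. The jsonpath expression is a standard kubectl feature, and the exact status fields vary between product versions, so treat this as an optional convenience:

      kubectl get miraceph -n ceph-lcm-mirantis -o jsonpath='{.items[0].status}'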
3. Verify the Rook Operator logs. Rook deploys a Ceph cluster based on the custom resources created by the Ceph Controller, such as pools, clients, cephcluster, and so on. The Rook Operator logs contain details about the orchestration of these components. For details about the Ceph cluster status and access to the CLI tools, connect to the ceph-tools pod as described in step 4.

   1. Verify the Rook Operator logs:

      kubectl -n rook-ceph logs -l app=rook-ceph-operator

   2. Verify the CephCluster configuration:

      Note

      The Ceph Controller manages the CephCluster CR. Open the CephCluster CR for verification only and do not modify it manually.

      kubectl get cephcluster -n rook-ceph -o yaml
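   The Rook Operator log can be long. As an optional shortcut that is not part of the original procedure, you can filter the recent log output for error-level messages using standard shell tools:

      kubectl -n rook-ceph logs -l app=rook-ceph-operator --tail=300 | grep -iE 'error|fail'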
4. Verify the ceph-tools pod:

   1. Execute the ceph-tools pod:

      kubectl --kubeconfig <pathToManagedClusterKubeconfig> -n rook-ceph exec -it $(kubectl --kubeconfig <pathToManagedClusterKubeconfig> -n rook-ceph get pod -l app=rook-ceph-tools -o jsonpath='{.items[0].metadata.name}') -- bash

   2. Verify that CLI commands can run on the ceph-tools pod:

      ceph -s
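   Inside the ceph-tools pod, a few other stock Ceph CLI commands provide a quick first view of the cluster health. They are listed here as optional extras:

      ceph health detail
      ceph df
      ceph osd status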
5. Verify hardware:

   1. Through the ceph-tools pod, obtain the information about the devices in your cluster:

      ceph osd tree

   2. Enter all Ceph OSD pods in the rook-ceph namespace one by one. To obtain the pod names, see the listing command after this step:

      kubectl exec -it -n rook-ceph <osd-pod-name> -- bash

   3. Verify that the ceph-volume tool is available on all pods running on the target node:

      ceph-volume lvm list
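   To obtain the Ceph OSD pod names and the nodes they run on for the step above, list the pods by the standard Rook app=rook-ceph-osd label. This listing is a convenience addition to the procedure:

      kubectl -n rook-ceph get pod -l app=rook-ceph-osd -o wide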
6. Verify data access. Ceph volumes can be consumed directly by Kubernetes workloads and internally, for example, by OpenStack services. To verify the Kubernetes storage:

   1. Verify the available storage classes. The storage classes that are automatically managed by Ceph Controller use the rook-ceph.rbd.csi.ceph.com provisioner:

      kubectl get storageclass

      Example of system response:

      NAME                            PROVISIONER                    RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
      kubernetes-ssd (default)        rook-ceph.rbd.csi.ceph.com     Delete          Immediate              false                  55m
      stacklight-alertmanager-data    kubernetes.io/no-provisioner   Delete          WaitForFirstConsumer   false                  55m
      stacklight-elasticsearch-data   kubernetes.io/no-provisioner   Delete          WaitForFirstConsumer   false                  55m
      stacklight-postgresql-db        kubernetes.io/no-provisioner   Delete          WaitForFirstConsumer   false                  55m
      stacklight-prometheus-data      kubernetes.io/no-provisioner   Delete          WaitForFirstConsumer   false                  55m
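      To confirm that the default provisioner actually serves new requests, you can create a throwaway claim against the kubernetes-ssd storage class and verify that it becomes Bound. The file and claim names below are hypothetical examples, not part of the original procedure. Create a pvc-provision-test.yaml file with the following content:

      apiVersion: v1
      kind: PersistentVolumeClaim
      metadata:
        name: pvc-provision-test
        namespace: rook-ceph
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi
        storageClassName: kubernetes-ssd

      Apply the file, verify the claim status, and delete the claim after the check:

      kubectl apply -f pvc-provision-test.yaml
      kubectl -n rook-ceph get pvc pvc-provision-test
      kubectl -n rook-ceph delete pvc pvc-provision-test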
   2. Verify that the volumes are properly connected to the Pod:

      1. Obtain the list of volumes in all namespaces or in a particular one:

         kubectl get persistentvolumeclaims -A

         Example of system response:

         NAMESPACE   NAME       STATUS   VOLUME    CAPACITY   ACCESS MODES   STORAGECLASS     AGE
         rook-ceph   app-test   Bound    pv-test   1Gi        RWO            kubernetes-ssd   11m

      2. For each volume, verify the connection. For example:

         kubectl describe pvc app-test -n rook-ceph

         Example of a positive system response:

         Name:          app-test
         Namespace:     kaas
         StorageClass:  rook-ceph
         Status:        Bound
         Volume:        pv-test
         Labels:        <none>
         Annotations:   pv.kubernetes.io/bind-completed: yes
                        pv.kubernetes.io/bound-by-controller: yes
                        volume.beta.kubernetes.io/storage-provisioner: rook-ceph.rbd.csi.ceph.com
         Finalizers:    [kubernetes.io/pvc-protection]
         Capacity:      1Gi
         Access Modes:  RWO
         VolumeMode:    Filesystem
         Events:        <none>
      3. In case of connection issues, inspect the Pod description for the volume information:

         kubectl describe pod <crashloopbackoff-pod-name>

         Example of system response:

         ...
         Events:
           FirstSeen   LastSeen   Count   From                    SubObjectPath   Type      Reason             Message
           ---------   --------   -----   ----                    -------------   ----      ------             -------
           1h          1h         3       default-scheduler                       Warning   FailedScheduling   PersistentVolumeClaim is not bound: "app-test" (repeated 2 times)
           1h          35s        36      kubelet, 172.17.8.101                   Warning   FailedMount        Unable to mount volumes for pod "wordpress-mysql-918363043-50pjr_default(08d14e75-bd99-11e7-bc4c-001c428b9fc8)": timeout expired waiting for volumes to attach/mount for pod "default"/"wordpress-mysql-918363043-50pjr". list of unattached/unmounted volumes=[mysql-persistent-storage]
           1h          35s        36      kubelet, 172.17.8.101                   Warning   FailedSync         Error syncing pod
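      When a PersistentVolumeClaim stays unbound, the events in its namespace usually contain the provisioner error. Filtering events by object name is a standard kubectl feature, shown here as an optional extra with the app-test claim from the example above:

         kubectl get events -n rook-ceph --field-selector involvedObject.name=app-test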
   3. Verify that the CSI provisioner plugins started properly and are in the Running status:

      1. Obtain the list of the CSI provisioner plugins:

         kubectl -n rook-ceph get pod -l app=csi-rbdplugin-provisioner

      2. Verify the logs of the required CSI provisioner:

         kubectl logs -n rook-ceph <csi-provisioner-plugin-name> csi-provisioner
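      If you are not sure which container of the provisioner Pod reports the problem, you can output the recent logs of all its containers at once. The --all-containers and --tail flags are standard kubectl logs options:

         kubectl logs -n rook-ceph <csi-provisioner-plugin-name> --all-containers --tail=50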