Verify Ceph Controller and Rook
The starting point for Ceph troubleshooting is the ceph-controller and rook-operator logs. Once you locate the component that causes issues, verify the logs of the related pod. This section describes how to verify the Ceph Controller and Rook objects of a Ceph cluster.

To verify Ceph Controller and Rook:
Verify data access. Ceph volumes can be consumed directly by Kubernetes workloads and internally, for example, by OpenStack services. To verify the Kubernetes storage:
Verify the available storage classes. The storage classes that are automatically managed by Ceph Controller use the rook-ceph.rbd.csi.ceph.com provisioner:

kubectl get storageclass
Example of system response:
NAME                            PROVISIONER                    RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
iam-kaas-iam-data               kubernetes.io/no-provisioner   Delete          WaitForFirstConsumer   false                  64m
kubernetes-ssd (default)        rook-ceph.rbd.csi.ceph.com     Delete          Immediate              false                  55m
stacklight-alertmanager-data    kubernetes.io/no-provisioner   Delete          WaitForFirstConsumer   false                  55m
stacklight-elasticsearch-data   kubernetes.io/no-provisioner   Delete          WaitForFirstConsumer   false                  55m
stacklight-postgresql-db        kubernetes.io/no-provisioner   Delete          WaitForFirstConsumer   false                  55m
stacklight-prometheus-data      kubernetes.io/no-provisioner   Delete          WaitForFirstConsumer   false                  55m
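To single out the classes that Ceph Controller manages, you can filter the listing by provisioner. A minimal sketch, assuming kubectl is configured against the affected cluster; the helper name is hypothetical:

```shell
# Hypothetical helper: print only the storage classes backed by the
# Rook RBD provisioner. Assumes kubectl points at the affected cluster.
rook_storage_classes() {
  kubectl get storageclass \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.provisioner}{"\n"}{end}' \
    | awk -F'\t' '$2 == "rook-ceph.rbd.csi.ceph.com" {print $1}'
}
```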
Verify that volumes are properly connected to the pod:
Obtain the list of volumes:
kubectl get persistentvolumeclaims -n kaas
Example of system response:
NAME                          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS        AGE
ironic-aio-pvc                Bound    pvc-9132beb2-6a17-4877-af40-06031d52da47   5Gi        RWO            kubernetes-ssd      62m
ironic-inspector-pvc          Bound    pvc-e84e9a9e-51b8-4c57-b116-0e1e6a9e7e94   1Gi        RWO            kubernetes-ssd      62m
mariadb-pvc                   Bound    pvc-fb0dbf01-ee4b-4c88-8b08-901080ee8e14   2Gi        RWO            kubernetes-ssd      62m
mysql-data-mariadb-server-0   Bound    local-pv-d1ecc89d                          457Gi      RWO            iam-kaas-iam-data   62m
mysql-data-mariadb-server-1   Bound    local-pv-1f385d17                          457Gi      RWO            iam-kaas-iam-data   62m
mysql-data-mariadb-server-2   Bound    local-pv-79a820d7                          457Gi      RWO            iam-kaas-iam-data   62m
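All PVCs are expected to be in the Bound state. Surfacing only the ones that are not can be scripted; a sketch with a hypothetical helper name, assuming kubectl points at the affected cluster:

```shell
# Hypothetical helper: print the PVCs in a namespace that are not Bound.
# Assumes kubectl points at the affected cluster.
unbound_pvcs() {
  kubectl get persistentvolumeclaims -n "$1" --no-headers \
    | awk '$2 != "Bound" {print $1 " is " $2}'
}
# usage: unbound_pvcs kaas
```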
For each volume, verify the connection. For example:
kubectl describe pvc ironic-aio-pvc -n kaas
Example of a positive system response:
Name:          ironic-aio-pvc
Namespace:     kaas
StorageClass:  kubernetes-ssd
Status:        Bound
Volume:        pvc-9132beb2-6a17-4877-af40-06031d52da47
Labels:        <none>
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: rook-ceph.rbd.csi.ceph.com
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      5Gi
Access Modes:  RWO
VolumeMode:    Filesystem
Events:        <none>
Mounted By:    dnsmasq-dbd84d496-6fcz4
               httpd-0
               ironic-555bff5dd8-kb8p2
In case of connection issues, inspect the pod description for the volume information:
kubectl describe pod <crashloopbackoff-pod-name>
Example of system response:
...
Events:
  FirstSeen   LastSeen   Count   From                    SubObjectPath   Type      Reason             Message
  ---------   --------   -----   ----                    -------------   ----      ------             -------
  1h          1h         3       default-scheduler                       Warning   FailedScheduling   PersistentVolumeClaim is not bound: "mysql-pv-claim" (repeated 2 times)
  1h          35s        36      kubelet, 172.17.8.101                   Warning   FailedMount        Unable to mount volumes for pod "wordpress-mysql-918363043-50pjr_default(08d14e75-bd99-11e7-bc4c-001c428b9fc8)": timeout expired waiting for volumes to attach/mount for pod "default"/"wordpress-mysql-918363043-50pjr". list of unattached/unmounted volumes=[mysql-persistent-storage]
  1h          35s        36      kubelet, 172.17.8.101                   Warning   FailedSync         Error syncing pod
Verify that the CSI provisioner plugins were started properly and have the Running status:

Obtain the list of CSI provisioner plugins:
kubectl -n rook-ceph get pod -l app=csi-rbdplugin-provisioner
Verify the logs of the required CSI provisioner:
kubectl logs -n rook-ceph <csi-provisioner-plugin-name> csi-provisioner
Verify the Ceph cluster status:

Verify that the status of each pod in the ceph-lcm-mirantis and rook-ceph namespaces is Running:

For ceph-lcm-mirantis:

kubectl get pod -n ceph-lcm-mirantis

For rook-ceph:

kubectl get pod -n rook-ceph
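The two checks above can be combined into one loop that prints only the pods that are not Running, using the kubectl field selector. A sketch with a hypothetical helper name, assuming kubectl points at the affected cluster:

```shell
# Hypothetical helper: print any pod in the Ceph namespaces whose
# phase is not Running. Assumes kubectl points at the affected cluster.
not_running_ceph_pods() {
  for ns in ceph-lcm-mirantis rook-ceph; do
    kubectl get pod -n "$ns" --field-selector=status.phase!=Running --no-headers
  done
}
```

An empty result means that all pods in both namespaces are in the Running phase.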
Verify Ceph Controller. Ceph Controller prepares the configuration that Rook uses to deploy the Ceph cluster, which is managed through the KaasCephCluster resource. If Rook cannot finish the deployment, verify the Rook Operator logs as described in the following step.

List the pods:
kubectl -n ceph-lcm-mirantis get pods
Verify the logs of the required pod:
kubectl -n ceph-lcm-mirantis logs <ceph-controller-pod-name>
Verify the configuration:
kubectl get kaascephcluster -n <managedClusterProjectName> -o yaml
On the managed cluster, verify the MiraCeph subresource:

kubectl get miraceph -n ceph-lcm-mirantis -o yaml
Verify the Rook Operator logs. Rook deploys a Ceph cluster based on custom resources created by the Ceph Controller, such as pools, clients, cephcluster, and so on. Rook logs contain details about component orchestration. For details about the Ceph cluster status and to get access to CLI tools, connect to the ceph-tools pod as described in the next step.

Verify the Rook Operator logs:
kubectl -n rook-ceph logs -l app=rook-ceph-operator
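Operator logs can be long, so filtering for error-level lines narrows the search. A sketch with a hypothetical helper name, assuming kubectl points at the affected cluster:

```shell
# Hypothetical helper: show only error/failure lines from the last
# 500 Rook Operator log lines. Assumes kubectl points at the cluster.
rook_operator_errors() {
  kubectl -n rook-ceph logs -l app=rook-ceph-operator --tail=500 \
    | grep -iE 'error|fail' || true
}
```

The trailing `|| true` keeps the helper from failing when the logs contain no matches.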
Verify the CephCluster configuration:

Note

The Ceph Controller manages the CephCluster CR. Open the CephCluster CR only for verification and do not modify it manually.

kubectl get cephcluster -n rook-ceph -o yaml
Verify the ceph-tools pod:

Exec into the ceph-tools pod:

kubectl --kubeconfig <pathToManagedClusterKubeconfig> -n rook-ceph exec -it $(kubectl --kubeconfig <pathToManagedClusterKubeconfig> -n rook-ceph get pod -l app=rook-ceph-tools -o jsonpath='{.items[0].metadata.name}') -- bash
Verify that CLI commands can run on the ceph-tools pod:

ceph -s
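A healthy cluster reports HEALTH_OK in the health field of the ceph -s output, and that check can be scripted. In the sketch below, the here-doc merely stands in for live output (inside the ceph-tools pod, pipe ceph -s instead), and the cluster id shown is a placeholder:

```shell
# Extract the health field from `ceph -s`-style output.
# The here-doc is illustrative; in the ceph-tools pod, pipe `ceph -s`
# into the awk filter instead. The cluster id is a placeholder.
health=$(awk '/health:/ {print $2}' <<'EOF'
  cluster:
    id:     00000000-0000-0000-0000-000000000000
    health: HEALTH_OK
EOF
)
echo "$health"
```

Anything other than HEALTH_OK (for example, HEALTH_WARN or HEALTH_ERR) warrants a closer look with ceph health detail.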
Verify hardware:

From the ceph-tools pod, identify the required device in your cluster:

ceph osd tree
Enter all Ceph OSD pods in the rook-ceph namespace one by one:

kubectl exec -it -n rook-ceph <osd-pod-name> -- bash
Verify that the ceph-volume tool is available on all pods running on the target node:

ceph-volume lvm list