View Grafana dashboards
Using the Grafana web UI, you can view metrics from the time series databases as graphs.
Most Grafana dashboards include a View logs in OpenSearch Dashboards link to immediately view relevant logs in the OpenSearch Dashboards web UI. The OpenSearch Dashboards web UI displays logs filtered using the Grafana dashboard variables, such as the drop-downs. Once you amend the variables, wait for Grafana to generate a new URL.
Note
Due to a known issue, the View logs in OpenSearch Dashboards link does not work in Container Cloud 2.26.0 (Cluster releases 17.1.0 and 16.1.0). The issue is addressed in Container Cloud 2.26.1 (Cluster releases 17.1.1 and 16.1.1).
Caution
The Grafana dashboards that contain drop-down lists are limited to 1000 lines. Therefore, if you require data on a specific item, use the filter by name instead.
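The effect of the limit can be illustrated with a minimal Python sketch. It is not Grafana code: the item names are hypothetical, and it assumes, as the caution implies, that the name filter is applied to the full set of values rather than to the truncated list.

```python
DROPDOWN_LIMIT = 1000  # display limit stated in the caution above


def visible_entries(values, limit=DROPDOWN_LIMIT):
    """Return the entries that a truncated drop-down list would show."""
    return values[:limit]


def filter_by_name(values, text, limit=DROPDOWN_LIMIT):
    """Filter by substring first, then apply the display limit."""
    matches = [value for value in values if text in value]
    return matches[:limit]


# Hypothetical item names, for illustration only.
pods = [f"pod-{i:04d}" for i in range(1500)]

shown = visible_entries(pods)             # the first 1000 items only
found = filter_by_name(pods, "pod-1234")  # finds the item beyond the limit
```

In this sketch, pod-1234 never appears in the truncated list but is returned by the filter, which is why filtering by name is the safer way to reach a specific item.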
Note
Grafana dashboards that present node data have an additional Node identifier drop-down menu. By default, it is set to machine to display short names for Kubernetes nodes. To display Kubernetes node name labels, change this option to node.
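Grafana also accepts dashboard variable values as var-<name> query parameters in the dashboard URL, so a selection such as the node identifier can be captured in a link. The following Python sketch only builds such a URL. The host name, dashboard UID, slug, and the variable name node_identifier are hypothetical placeholders; check the dashboard settings for the actual variable name.

```python
from urllib.parse import urlencode


def dashboard_url(base_url, uid, slug, variables):
    """Build a Grafana dashboard URL with var-<name>=<value> parameters."""
    query = urlencode({f"var-{name}": value for name, value in variables.items()})
    return f"{base_url.rstrip('/')}/d/{uid}/{slug}?{query}"


# All values below are hypothetical and used for illustration only.
url = dashboard_url(
    "https://grafana.example.com",
    "abc123",
    "kubernetes-nodes",
    {"node_identifier": "node"},
)
```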
To view the Grafana dashboards:
Log in to the Grafana web UI as described in Access StackLight web UIs.
From the drop-down list, select the required dashboard to inspect the status and statistics of the corresponding service in your management or managed cluster:
Component
Dashboard
Description
Ceph cluster
Ceph Cluster
Provides the overall health status of the Ceph cluster, capacity, latency, and recovery metrics.
Ceph Nodes
Provides an overview of the host-related metrics, such as the number of Ceph Monitors, Ceph OSD hosts, average usage of resources across the cluster, and network and host load.
This dashboard is deprecated since Container Cloud 2.25.0 (Cluster releases 17.0.0, 16.0.0, 14.1.0) and is removed in Container Cloud 2.26.0 (Cluster releases 17.1.0 and 16.1.0).
Therefore, Mirantis recommends switching to the following dashboards in the current release:
For Ceph stats, use the Ceph Cluster dashboard.
For resource utilization, use the System dashboard, which includes filtering by Ceph node labels, such as ceph_role_osd, ceph_role_mon, and ceph_role_mgr.
Ceph OSDs
Provides metrics for Ceph OSDs, including the Ceph OSD read and write latencies, distribution of PGs per Ceph OSD, and Ceph OSD and physical device performance.
Ceph Pools
Provides metrics for Ceph pools, including client IOPS and throughput by pool, and pool capacity usage.
Ironic bare metal
Ironic BM
Provides graphs on Ironic health, HTTP API availability, provisioned nodes by state, and installed ironic-conductor backend drivers.
Container Cloud
Clusters Overview
Represents the main cluster capacity statistics for all clusters of a Mirantis Container Cloud deployment where StackLight is installed.
Note
Due to a known issue, the Prometheus Targets Unavailable panel of the Clusters Overview dashboard does not display data for managed clusters of the 11.7.0, 11.7.4, 12.5.0, and 12.7.x series Cluster releases after an update to Container Cloud 2.24.0.
Etcd
Available since Container Cloud 2.21.0 and 2.21.1 for MOSK 22.5. Provides graphs on database size, leader elections, request duration, and incoming and outgoing traffic.
MCC Applications Performance
Available since Container Cloud 2.23.0 and 2.23.1 for MOSK 23.1. Provides information on how the Container Cloud internals operate, based on Golang, controller runtime, and custom metrics. You can use it to verify application performance and for troubleshooting.
Kubernetes resources
Kubernetes Calico
Provides metrics of the entire Calico cluster usage, including the cluster status, host status, and Felix resources.
Kubernetes Cluster
Provides metrics for the entire Kubernetes cluster, including the cluster status, host status, and resource consumption.
Kubernetes Containers
Provides charts showing resource consumption per deployed Pod containers running on Kubernetes nodes.
Kubernetes Deployments
Provides information on the desired and current state of all service replicas deployed on a Container Cloud cluster.
Kubernetes Namespaces
Provides the Pod state summary and the CPU, memory, network, and IOPS resource consumption per namespace.
Kubernetes Nodes
Provides charts showing resource consumption per Container Cloud cluster node.
Kubernetes Pods
Provides charts showing resource consumption per deployed Pod.
NGINX
NGINX
Provides the overall status of the NGINX cluster and information about NGINX requests and connections.
StackLight
Alertmanager
Provides performance metrics on the overall health status of the Prometheus Alertmanager service, the number of firing and resolved alerts received for various periods, the rate of successful and failed notifications, and resource consumption.
OpenSearch
Provides information about the overall health status of the OpenSearch cluster, including resource consumption and the number of operations and their performance.
OpenSearch Indices
Provides detailed information about the state of indices, including their size and the number and size of segments.
Grafana
Provides performance metrics for the Grafana service, including the total number of Grafana entities, CPU and memory consumption.
PostgreSQL
Provides PostgreSQL statistics, including read (DQL) and write (DML) row operations, transaction and lock, replication lag and conflict, and checkpoint statistics, as well as PostgreSQL performance metrics.
Prometheus
Provides the availability and performance behavior of the Prometheus servers, the sample ingestion rate, and system usage statistics per server. It also provides statistics about the overall status and uptime of the Prometheus service, the number of chunks in the local storage memory, target scrapes, and query duration.
Prometheus Relay
Provides service status and resource consumption metrics.
Reference Application
Available since Container Cloud 2.21.0 for non-MOSK clusters. Provides check statuses of Reference Application and statistics such as response time and content length.
Note
For the feature support on MOSK deployments, refer to MOSK documentation: Deploy RefApp using automation tools.
Telemeter Server
Provides statistics and the overall health status of the Telemeter service.
Note
Due to a known issue, the Telemeter Client Status panel of the Telemeter Server dashboard does not display data for managed clusters of the 11.7.0, 11.7.4, 12.5.0, and 12.7.x series Cluster releases after an update to Container Cloud 2.24.0.
System
System
Provides detailed resource consumption and operating system information per Container Cloud cluster node.
Mirantis Kubernetes Engine (MKE)
MKE Cluster
Provides a global overview of an MKE cluster: statistics about the number of worker and manager nodes, containers, images, and Swarm services.
MKE Containers
Provides per-container resource consumption metrics for the MKE containers, such as CPU, RAM, and network.
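As an alternative to browsing the drop-down list, dashboards can be looked up by name through the Grafana HTTP search API (GET /api/search). The following Python sketch only builds the request URL and parses a response body. The host name and the sample response are hypothetical, authentication is omitted, and whether your account can reach the Grafana API in a given StackLight deployment is an assumption to verify.

```python
import json
from urllib.parse import urlencode


def search_url(base_url, query):
    """Build a Grafana search URL that returns dashboards matching the query."""
    params = urlencode({"type": "dash-db", "query": query})
    return f"{base_url.rstrip('/')}/api/search?{params}"


def dashboard_titles(response_body):
    """Extract sorted dashboard titles from a search response body (JSON list)."""
    return sorted(item["title"] for item in json.loads(response_body))


# Hypothetical response body, trimmed to the fields used here.
sample = (
    '[{"uid": "a1", "title": "Kubernetes Pods", "type": "dash-db"},'
    ' {"uid": "b2", "title": "Kubernetes Nodes", "type": "dash-db"}]'
)

request = search_url("https://grafana.example.com/", "Kubernetes")
titles = dashboard_titles(sample)
```

To run the lookup against a live instance, send the built URL with an HTTP client of your choice and pass the response body to dashboard_titles.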