Using MKE cluster metrics with Prometheus

Using MKE cluster metrics with Prometheus

MKE metrics

The following table lists the metrics that MKE exposes in Prometheus, along with descriptions. Note that only the metrics labeled with ucp_ are documented. Other metrics are exposed in Prometheus but are not documented.

Name

Units

Description

Labels

Metric source

ucp_controller_services

number of services

The total number of Swarm services.

Controller

ucp_engine_container_cpu_percent

percentage

The percentage of CPU time this container is using.

container labels

Node

ucp_engine_container_cpu_total_time_nanoseconds

nanoseconds

Total CPU time used by this container in nanoseconds.

container labels

Node

ucp_engine_container_health

0.0 or 1.0

Whether or not this container is healthy, according to its healthcheck. Note that if this value is 0, it just means that the container is not reporting healthy; it might not have a healthcheck defined at all, or its healthcheck might not have returned any results yet.

container labels

Node

ucp_engine_container_memory_max_usage_bytes

bytes

Maximum memory used by this container in bytes.

container labels

Node

ucp_engine_container_memory_usage_bytes

bytes

Current memory used by this container in bytes.

container labels

Node

ucp_engine_container_memory_usage_percent

percentage

Percentage of total node memory currently being used by this container.

container labels

Node

ucp_engine_container_network_rx_bytes_total

bytes

Number of bytes received by this container on this network in the last sample.

container networking labels

Node

ucp_engine_container_network_rx_dropped_packets_total

number of packets

Number of packets bound for this container on this network that were dropped in the last sample.

container networking labels

Node

ucp_engine_container_network_rx_errors_total

number of errors

Number of received network errors for this container on this network in the last sample.

container networking labels

Node

ucp_engine_container_network_rx_packets_total

number of packets

Number of received packets for this container on this network in the last sample.

container networking labels

Node

ucp_engine_container_network_tx_bytes_total

bytes

Number of bytes sent by this container on this network in the last sample.

container networking labels

Node

ucp_engine_container_network_tx_dropped_packets_total

number of packets

Number of packets sent from this container on this network that were dropped in the last sample.

container networking labels

Node

ucp_engine_container_network_tx_errors_total

number of errors

Number of sent network errors for this container on this network in the last sample.

container networking labels

Node

ucp_engine_container_network_tx_packets_total

number of packets

Number of sent packets for this container on this network in the last sample.

container networking labels

Node

ucp_engine_container_unhealth

0.0 or 1.0

Whether or not this container is unhealthy, according to its healthcheck. Note that if this value is 0, it just means that the container is not reporting unhealthy; it might not have a healthcheck defined at all, or its healthcheck might not have returned any results yet.

container labels

Node

ucp_engine_containers

number of containers

Total number of containers on this node.

node labels

Node

ucp_engine_cpu_total_time_nanoseconds

nanoseconds

System CPU time used by this container in nanoseconds.

container labels

Node

ucp_engine_disk_free_bytes

bytes

Free disk space on the Docker root directory on this node in bytes. Note that this metric is not available for Windows nodes.

node labels

Node

ucp_engine_disk_total_bytes

bytes

Total disk space on the Docker root directory on this node in bytes. Note that this metric is not available for Windows nodes.

node labels

Node

ucp_engine_images

number of images

Total number of images on this node.

node labels

Node

ucp_engine_memory_total_bytes

bytes

Total amount of memory on this node in bytes.

node labels

Node

ucp_engine_networks

number of networks

Total number of networks on this node.

node labels

Node

ucp_engine_node_health

0.0 or 1.0

Whether or not this node is healthy, as determined by MKE.

nodeName: node name, nodeAddr: node IP address

Controller

ucp_engine_num_cpu_cores

number of cores

Number of CPU cores on this node.

node labels

Node

ucp_engine_pod_container_ready

0.0 or 1.0

Whether or not this container in a Kubernetes pod is ready, as determined by its readiness probe.

pod labels

Controller

ucp_engine_pod_ready

0.0 or 1.0

Whether or not this Kubernetes pod is ready, as determined by its readiness probe.

pod labels

Controller

ucp_engine_volumes

number of volumes

Total number of volumes on this node.

node labels

Node

Metrics labels

Metrics exposed by MKE in Prometheus have standardized labels, depending on the resource that they are measuring. The following table lists some of the labels that are used, along with their values:

Container labels

Label name

Value

collection

The collection ID of the collection this container is in, if any.

container

The ID of this container.

image

The name of this container’s image.

manager

“true” if the container’s node is a MKE manager, “false” otherwise.

name

The name of the container.

podName

If this container is part of a Kubernetes pod, this is the pod’s name.

podNamespace

If this container is part of a Kubernetes pod, this is the pod’s namespace.

podContainerName

If this container is part of a Kubernetes pod, this is the container’s name in the pod spec.

service

If this container is part of a Swarm service, this is the service ID.

stack

If this container is part of a Docker compose stack, this is the name of the stack.

Container networking labels

The following metrics measure network activity for a given network attached to a given container. They have the same labels as Container labels, with one addition:

Label name

Value

network

The ID of the network.

Node labels

Label name

Value

manager

“true” if the node is a MKE manager, “false” otherwise.

Metric source

MKE exports metrics on every node and also exports additional metrics from every controller. The metrics that are exported from controllers are cluster-scoped, for example, the total number of Swarm services. Metrics that are exported from nodes are specific to those nodes, for example, the total memory on that node.