Monitor an MKE cluster
You can monitor the health of your MKE cluster using the MKE web UI, the CLI,
and the _ping endpoint. This topic describes how to monitor your cluster
health, vulnerability counts, and disk usage.
For those running MSR in addition to MKE, MKE displays image vulnerability scanning count data obtained from MSR for containers, Swarm services, Pods, and images. This feature requires that you run MSR 2.6.x or later and enable MKE single sign-on.
The MKE web UI displays disk usage metrics, including space availability,
only for the /var/lib/docker part of the filesystem. Monitoring
the total space available on each filesystem of an MKE worker or manager node
requires that you deploy a third-party operating system-monitoring solution.
Monitor with the MKE web UI
Log in to the MKE web UI.
From the left-side navigation panel, navigate to the Dashboard page.
Cluster health-related warnings that require your immediate attention display on the cluster dashboard. MKE administrators are likely to see more of these warnings than regular users.
Navigate to Shared Resources > Nodes to inspect the health of the nodes that MKE manages. To read the node health status, hover over the colored indicator.
Click a particular node to learn more about its health.
Click on the vertical ellipsis in the top right corner and select Tasks.
From the left-side navigation panel, click Agent Logs to examine log entries.
Monitor with the CLI
Examine the health of the nodes in your cluster:
docker node ls
Status messages that begin with [Pending] indicate a transient state that is expected to resolve itself and return to a healthy state.
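As a rough sketch of how this check might be scripted, the function below reads hostname and status pairs and prints the nodes that do not report Ready. The function name unready_nodes is hypothetical, and the sketch assumes that your Docker CLI is already connected to the cluster, for example through an MKE client bundle.

```shell
# Print the hostnames of nodes whose status is anything other than "Ready".
# Reads "<hostname> <status>" pairs, one per line, from standard input.
# The function name is illustrative and is not part of Docker or MKE.
unready_nodes() {
  awk 'NF >= 2 && $2 != "Ready" { print $1 }'
}

# Example usage (requires a Docker CLI connected to the cluster):
#   docker node ls --format '{{.Hostname}} {{.Status}}' | unready_nodes
```

An empty result means that every listed node reports Ready. The filter looks only at the status column, so it does not distinguish a transient state from a persistent failure.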
Automate the monitoring process
Automate the MKE cluster monitoring process by using the
https://<mke-manager-url>/_ping endpoint to evaluate the health of a single
manager node. The MKE manager evaluates whether its internal components are
functioning properly, and returns one of the following HTTP codes:
200 - all components are healthy
500 - one or more components are not healthy
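The two response codes above can be turned into a simple health label for use in a monitoring script. The following is a minimal sketch: the function name classify_ping_status is hypothetical, and treating every other code as "unknown" is an assumption of this example rather than documented MKE behavior.

```shell
# Map an HTTP status code returned by /_ping to a health label.
# Codes other than 200 and 500 are reported as "unknown"; this is an
# assumption of the sketch (for example, the request never reached MKE).
classify_ping_status() {
  case "$1" in
    200) echo "healthy" ;;
    500) echo "unhealthy" ;;
    *)   echo "unknown" ;;
  esac
}

# Example usage (replace <mke-manager-url> with the address of one manager;
# curl sends a GET request by default):
#   code=$(curl --silent --output /dev/null --write-out '%{http_code}' \
#     "https://<mke-manager-url>/_ping")
#   classify_ping_status "$code"
```

If the manager certificate is not trusted by the host running curl, the request may fail before any HTTP code is returned, in which case you would likely need to pass the cluster CA certificate to curl.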
Using an administrator client certificate as a TLS client certificate for the
_ping endpoint returns a detailed error message if any component is not healthy.
Do not access the _ping endpoint through a load balancer, as this method does
not allow you to determine which manager node is not healthy. Instead, connect
directly to the URL of a manager node. Use GET to ping the endpoint instead of
HEAD, as HEAD returns a 404 error code.
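One way to follow this guidance is to loop over the manager addresses and query each one directly. The sketch below is illustrative only: the function names ping_manager and unhealthy_managers and the PING_CMD variable are hypothetical, and the ca.pem file is assumed to be the cluster CA certificate available in the current directory.

```shell
# Fetch the /_ping status code for a single manager.
# curl sends a GET request by default, so no explicit method is set.
# ca.pem is assumed to be the cluster CA certificate in the current directory.
ping_manager() {
  curl --silent --output /dev/null --write-out '%{http_code}' \
    --cacert ca.pem "https://$1/_ping"
}

# Query each manager address directly and print those that do not return 200.
# PING_CMD (hypothetical) lets a caller substitute another probe command;
# it defaults to ping_manager.
unhealthy_managers() {
  for manager in "$@"; do
    # A failed connection makes curl exit non-zero but still print 000.
    code=$("${PING_CMD:-ping_manager}" "$manager") || true
    [ "$code" = "200" ] || echo "$manager $code"
  done
}

# Example usage, with one argument per manager node address:
#   unhealthy_managers <manager-1-address> <manager-2-address>
```

Because each request targets a single manager, any line in the output identifies exactly which node needs attention.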