Kubernetes
This section describes the alerts for Kubernetes.
ContainerScrapeError
| Severity | Warning |
| Summary | Prometheus was not able to scrape metrics from the container on the {{ $labels.instance }} Kubernetes instance. |
| Raise condition | container_scrape_error != 0 |
| Description | Raises when cAdvisor fails to scrape metrics from a container. See the query example after this table. |
| Tuning | Not required |
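To see which containers are currently affected, you can run the raise-condition query against the Prometheus HTTP API. A minimal sketch, assuming Prometheus is reachable at http://prometheus:9090 (the address is an assumption; adjust it for your deployment):

```bash
# Hypothetical Prometheus address; replace with your server's URL.
PROM_URL=http://prometheus:9090

# List every series where cAdvisor reported a scrape error; the
# instance and container labels identify the affected containers.
curl -sG "${PROM_URL}/api/v1/query" \
  --data-urlencode 'query=container_scrape_error != 0'
```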
KubernetesProcessDown
| Severity | Minor |
| Summary | The Kubernetes {{ $labels.process_name }} process on the {{ $labels.host }} node has been down for 2 minutes. |
| Raise condition | procstat_running{process_name=~"hyperkube-.*"} == 0 |
| Description | Raises when Telegraf cannot find running hyperkube-kubelet, hyperkube-proxy, hyperkube-apiserver, hyperkube-controller-manager, or hyperkube-scheduler processes on any ctl host, or hyperkube-kubelet or hyperkube-proxy processes on any cmp host. The process_name label in the raised alert contains the process name. |
| Troubleshooting | Verify the containerd status on the affected node using systemctl status containerd. Verify the Docker status on the affected node using systemctl status docker. For issues on the cmp nodes, verify the criproxy status using systemctl status criproxy. Inspect the logs in /var/log/kubernetes.log. See the sketch after this table. |
| Tuning | Not required |
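A minimal sketch of the troubleshooting steps above, run on the node from the alert's host label (the service names are taken from the steps; the criproxy check applies only to cmp nodes):

```bash
#!/usr/bin/env bash
# Run on the affected node reported in the alert's host label.

# Container runtimes that back the hyperkube processes.
systemctl status containerd
systemctl status docker

# On cmp nodes only: criproxy mediates between kubelet and the runtimes.
systemctl status criproxy

# Recent log entries often show why a hyperkube process exited.
tail -n 100 /var/log/kubernetes.log
```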
KubernetesProcessDownMinor
| Severity | Minor |
| Summary | {{ $value }} Kubernetes {{ $labels.process_name }} processes (>= {{ instance_minor_threshold_percent * 100 }}%) have been down for 2 minutes. |
| Raise condition | count(procstat_running{process_name=~"hyperkube-.*"} == 0) by (process_name) > count(procstat_running{process_name=~"hyperkube-.*"}) by (process_name) * {{ instance_minor_threshold_percent }} |
| Description | Raises when Telegraf cannot find running hyperkube-kubelet, hyperkube-proxy, hyperkube-apiserver, hyperkube-controller-manager, or hyperkube-scheduler processes on more than 30% of the ctl or cmp hosts. The process_name label in the raised alert contains the process name. For the affected nodes, see the host label in the KubernetesProcessDown alerts. See the worked example after this table. |
| Troubleshooting | Verify the containerd status on the affected nodes using systemctl status containerd. Verify the Docker status on the affected nodes using systemctl status docker. For issues on the cmp nodes, verify the criproxy status using systemctl status criproxy. Inspect the logs in /var/log/kubernetes.log. |
| Tuning | Not required |
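As a worked example of the raise condition: with instance_minor_threshold_percent set to 0.3 and five hosts running hyperkube-apiserver, the alert fires once more than 5 * 0.3 = 1.5 processes, that is, two or more, are down. To see how close each process is to the threshold, you can query the down ratio per process name. A sketch, again assuming a hypothetical Prometheus at http://prometheus:9090:

```bash
PROM_URL=http://prometheus:9090  # assumption; adjust for your deployment

# Fraction of hyperkube processes that are down, per process name.
# Values above 0.3 correspond to this alert's minor threshold.
curl -sG "${PROM_URL}/api/v1/query" \
  --data-urlencode 'query=count(procstat_running{process_name=~"hyperkube-.*"} == 0) by (process_name) / count(procstat_running{process_name=~"hyperkube-.*"}) by (process_name)'
```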
KubernetesProcessDownMajor
| Severity | Major |
| Summary | {{ $value }} Kubernetes {{ $labels.process_name }} processes (>= {{ instance_major_threshold_percent * 100 }}%) have been down for 2 minutes. |
| Raise condition | count(procstat_running{process_name=~"hyperkube-.*"} == 0) by (process_name) > count(procstat_running{process_name=~"hyperkube-.*"}) by (process_name) * {{ instance_major_threshold_percent }} |
| Description | Raises when Telegraf cannot find running hyperkube-kubelet, hyperkube-proxy, hyperkube-apiserver, hyperkube-controller-manager, or hyperkube-scheduler processes on more than 60% of the ctl or cmp hosts. The process_name label in the raised alert contains the process name. For the affected nodes, see the host label in the KubernetesProcessDown alerts, or the host-check sketch after this table. |
| Troubleshooting | Verify the containerd status on the affected nodes using systemctl status containerd. Verify the Docker status on the affected nodes using systemctl status docker. For issues on the cmp nodes, verify the criproxy status using systemctl status criproxy. Inspect the logs in /var/log/kubernetes.log. |
| Tuning | Not required |
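Besides the host label in the KubernetesProcessDown alerts, you can identify the affected nodes by checking for the processes directly. A minimal sketch, assuming SSH access; the node names are placeholders for your inventory:

```bash
# Placeholder host names; substitute the ctl and cmp nodes of your cluster.
for node in ctl01 ctl02 ctl03; do
  echo "== ${node} =="
  # List running hyperkube processes by full command line.
  ssh "${node}" 'pgrep -a -f hyperkube- || echo "no hyperkube processes running"'
done
```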
KubernetesProcessOutage
| Severity | Critical |
| Summary | All Kubernetes {{ $labels.process_name }} processes have been down for 2 minutes. |
| Raise condition | count(procstat_running{process_name=~"hyperkube-.*"}) by (process_name) == count(procstat_running{process_name=~"hyperkube-.*"} == 0) by (process_name) |
| Description | Raises when Telegraf cannot find running hyperkube-kubelet, hyperkube-proxy, hyperkube-apiserver, hyperkube-controller-manager, or hyperkube-scheduler processes on all ctl and cmp hosts, that is, when the number of instances reporting a process equals the number reporting it as not running. The process_name label in the raised alert contains the process name. See the availability check after this table. |
| Troubleshooting | Verify the containerd status on the affected nodes using systemctl status containerd. Verify the Docker status on the affected nodes using systemctl status docker. For issues on the cmp nodes, verify the criproxy status using systemctl status criproxy. Inspect the logs in /var/log/kubernetes.log. |
| Tuning | Not required |
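Because this condition fires only when every instance of a process is down, a full hyperkube-apiserver outage also makes the cluster API unreachable. A quick availability check, assuming a hypothetical API endpoint (replace the URL with your cluster's API server address):

```bash
# Hypothetical API server URL; use your cluster's address and port.
curl -ks --max-time 5 https://ctl-vip:443/healthz \
  || echo "Kubernetes API server is unreachable"
```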