Kubernetes system

This section lists the alerts for the Kubernetes system.


KubeNodeNotReady

Severity

Warning

Summary

Node {{ $labels.node }} is not ready.

Description

The {{ $labels.node }} Kubernetes node has been unready for more than an hour.

KubeStateMetricsTargetDown

Severity

Critical

Summary

kube-state-metrics Prometheus target is down.

Description

Prometheus fails to scrape metrics from the kube-state-metrics service.

KubeVersionMismatch

Severity

Warning

Summary

Kubernetes components version mismatch.

Description

There are {{ $value }} different semantic versions of Kubernetes components running.

KubeletTargetDown

Severity

Critical

Summary

Kubelet Prometheus target is down.

Description

Prometheus fails to scrape metrics from kubelet on the {{ $labels.node }} node (more than 1/10 failed scrapes).

KubeletTargetsOutage

Severity

Critical

Summary

Kubelet Prometheus targets outage.

Description

Prometheus fails to scrape metrics from kubelet on all nodes (more than 1/10 failed scrapes).
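
The "more than 1/10 failed scrapes" condition used by the kubelet alerts above can be read as a ratio check over recent scrape results. The following is a minimal sketch of that reading, not the actual alert expression: the function names and the list-of-booleans input are assumptions made for illustration, and the evaluation window is not stated in this section.

```python
def failed_scrape_ratio(scrape_results):
    """Return the fraction of failed scrapes.

    scrape_results is a hypothetical list of booleans, one per scrape,
    where True means the scrape succeeded.
    """
    if not scrape_results:
        raise ValueError("no scrape results to evaluate")
    failed = sum(1 for ok in scrape_results if not ok)
    return failed / len(scrape_results)


def exceeds_failure_threshold(scrape_results, threshold=0.1):
    """True when strictly more than `threshold` (1/10) of scrapes failed."""
    return failed_scrape_ratio(scrape_results) > threshold
```

Under this reading, exactly one failed scrape out of ten does not meet the condition, while two out of ten does.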

KubeClientErrors

Severity

Warning

Summary

More than 1% of Kubernetes API client requests result in errors.

Description

The {{ $labels.instance }} Kubernetes API server client has {{ printf "%0.0f" $value }}% errors.
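
The `printf "%0.0f"` call in the description above formats the alert value with zero decimal places, so the rendered message shows a whole-number percentage. The sketch below imitates that rendering in Python, whose `%` operator accepts the same C-style verb. The `render_description` helper and the sample instance address are hypothetical and serve only to illustrate the resulting text.

```python
def render_description(instance, value):
    """Imitate the rendered KubeClientErrors description.

    `instance` stands in for {{ $labels.instance }} and `value` for $value.
    "%0.0f" prints the value rounded to zero decimal places.
    """
    return "The %s Kubernetes API server client has %0.0f%% errors." % (
        instance,
        value,
    )


# Hypothetical instance address, used only for illustration.
message = render_description("10.0.0.1:6443", 2.6)
```

With a value of 2.6, the message reports 3% errors.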

KubeContainerScrapeError

Severity

Warning

Summary

Failure to get Kubernetes container metrics.

Description

cAdvisor was not able to scrape metrics from some containers on the {{ $labels.node }} Kubernetes node.

KubeDNSTargetsOutage

Removed in 17.0.0, 16.0.0, and 14.1.0

Severity

Critical

Summary

CoreDNS Prometheus targets outage.

Description

Prometheus fails to scrape metrics from all CoreDNS endpoints (more than 1/10 failed scrapes).

KubeletTooManyPods

Severity

Warning

Summary

kubelet reached 90% of Pods limit.

Description

The kubelet container on the {{ $labels.node }} Node is running {{ $value }} Pods, nearly 90% of possible allocation.

cAdvisorTargetDown

Severity

Major

Summary

cAdvisor Prometheus target is down.

Description

Prometheus fails to scrape metrics from the cAdvisor endpoint on the {{ $labels.node }} node.

cAdvisorTargetsOutage

Severity

Critical

Summary

cAdvisor Prometheus targets outage.

Description

Prometheus fails to scrape metrics from all cAdvisor endpoints.

KubeAPIDown

Severity

Critical

Summary

A Kubernetes API endpoint is down.

Description

The {{ $labels.node }} Kubernetes API endpoint has not been accessible for the last 3 minutes.

KubeAPIOutage

Severity

Critical

Summary

Kubernetes API is down.

Description

The Kubernetes API has not been accessible for the last 30 seconds.

KubeAPIErrorsHighMajor

Severity

Major

Summary

API server is returning errors for more than 3% of requests.

Description

The {{ $labels.instance }} API server is returning errors for {{ $value }}% of requests.

KubeAPIErrorsHighWarning

Severity

Warning

Summary

API server is returning errors for more than 1% of requests.

Description

The API server is returning errors for {{ $value }}% of requests.
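
KubeAPIErrorsHighMajor and KubeAPIErrorsHighWarning apply two thresholds, 3% and 1%, to the same error percentage. The sketch below shows one way to map a percentage to the highest matching severity. It assumes the higher severity takes precedence when both thresholds are exceeded; the function name is hypothetical, and the alerts themselves are defined as separate rules that may both fire.

```python
def api_error_severity(error_percent):
    """Map an API server error percentage to the highest matching severity.

    Thresholds come from the alert summaries above:
    more than 3% is Major, more than 1% is Warning.
    Returns None when neither threshold is exceeded.
    """
    if error_percent > 3:
        return "Major"
    if error_percent > 1:
        return "Warning"
    return None
```

For example, 2.5% of failed requests maps to Warning, and 4% maps to Major.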

KubeAPIResourceErrorsHighMajor

Severity

Major

Summary

API server is returning errors for 10% of requests.

Description

The {{ $labels.instance }} API server is returning errors for {{ $value }}% of requests for {{ $labels.resource }} {{ $labels.subresource }}.

KubeAPIResourceErrorsHighWarning

Severity

Warning

Summary

API server is returning errors for 5% of requests.

Description

The {{ $labels.instance }} API server is returning errors for {{ $value }}% of requests for {{ $labels.resource }} {{ $labels.subresource }}.

KubeClientCertificateExpirationInSevenDays

Severity

Warning

Summary

Client certificate expires in 7 days.

Description

The client certificate used to authenticate to the API server expires in less than 7 days.

KubeClientCertificateExpirationInOneDay

Severity

Critical

Summary

Client certificate expires in 24 hours.

Description

The client certificate used to authenticate to the API server expires in less than 24 hours.

KubeAPICertExpirationMajor

Severity

Major

Summary

Kubernetes API certificate expires in less than 10 days.

Description

The SSL certificate for Kubernetes API expires in less than 10 days.

KubeAPICertExpirationWarning

Severity

Warning

Summary

Kubernetes API certificate expires in less than 30 days.

Description

The SSL certificate for Kubernetes API expires in less than 30 days.
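
The two Kubernetes API certificate alerts differ only in how much validity time remains: less than 10 days for Major and less than 30 days for Warning. The following sketch illustrates that tiering with the standard `datetime` module. The function name and the sample dates are assumptions for illustration and do not reflect how the alert expressions are actually written.

```python
from datetime import datetime, timedelta, timezone


def cert_expiration_severity(not_after, now):
    """Return the severity that matches the remaining certificate lifetime.

    `not_after` is the certificate expiry time and `now` is the evaluation
    time; both are timezone-aware datetimes. Thresholds follow the alert
    descriptions above: under 10 days is Major, under 30 days is Warning.
    """
    remaining = not_after - now
    if remaining < timedelta(days=10):
        return "Major"
    if remaining < timedelta(days=30):
        return "Warning"
    return None


# Hypothetical evaluation time, used only for illustration.
now = datetime(2024, 1, 1, tzinfo=timezone.utc)
severity = cert_expiration_severity(now + timedelta(days=20), now)
```

Here a certificate with 20 days of validity left falls into the Warning tier.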

KubernetesApiserverTargetsOutage

Severity

Critical

Summary

Kubernetes API server Prometheus targets outage.

Description

Prometheus fails to scrape metrics from 2/3 of Kubernetes API server endpoints.

KubernetesMasterAPITargetsOutage

Severity

Critical

Summary

Kubernetes master API Prometheus targets outage.

Description

Prometheus fails to scrape metrics from 2/3 of Kubernetes master API nodes.