Kubernetes system

This section lists the alerts for the Kubernetes system.

For troubleshooting guidelines, see Troubleshoot Kubernetes system alerts.


KubeAPICertExpirationHigh

Severity

Critical

Summary

Kubernetes API certificate expires on {{ $value | humanizeTimestamp }}.

Description

The SSL certificate for the Kubernetes API expires on {{ $value | humanizeTimestamp }}. Fewer than 10 days are left.

KubeAPICertExpirationMedium

Severity

Major

Summary

Kubernetes API certificate expires on {{ $value | humanizeTimestamp }}.

Description

The SSL certificate for the Kubernetes API expires on {{ $value | humanizeTimestamp }}. Fewer than 30 days are left.
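
The two certificate alerts differ only in how much time remains before expiry. The following Python sketch illustrates that relationship. It assumes that the alert value is a Unix timestamp in seconds (suggested by the humanizeTimestamp filter) and that the Critical threshold takes precedence over the Major one. The function name cert_alert_severity is hypothetical and is not part of any product API.

```python
from typing import Optional

SECONDS_PER_DAY = 86_400


def cert_alert_severity(expiry_ts: float, now_ts: float) -> Optional[str]:
    """Map the time left before certificate expiry to an alert severity.

    The 10-day and 30-day thresholds come from the alert descriptions.
    The precedence of Critical over Major is an assumption.
    """
    days_left = (expiry_ts - now_ts) / SECONDS_PER_DAY
    if days_left < 10:
        return "Critical"  # corresponds to KubeAPICertExpirationHigh
    if days_left < 30:
        return "Major"  # corresponds to KubeAPICertExpirationMedium
    return None  # neither alert applies
```

For example, a certificate that expires 20 days from now would map to Major under these assumptions.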

KubeAPIDown

Severity

Critical

Summary

A Kubernetes API endpoint is down.

Description

The {{ $labels.node }} Kubernetes API endpoint has not been accessible for the last 3 minutes.

KubeAPIErrorsHighMajor

Severity

Major

Summary

API server is returning errors for more than 3% of requests.

Description

The {{ $labels.instance }} API server is returning errors for {{ $value }}% of requests.

KubeAPIErrorsHighWarning

Severity

Warning

Summary

API server is returning errors for more than 1% of requests.

Description

The API server is returning errors for {{ $value }}% of requests.
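
Both KubeAPIErrorsHigh alerts compare the share of failed requests with a fixed threshold. The following Python sketch shows the arithmetic only. It assumes that the reported value is the number of failed requests divided by the total, times 100; the actual rule expressions may compute it differently. The function names are hypothetical.

```python
from typing import Optional


def error_percentage(error_requests: int, total_requests: int) -> float:
    """Return the share of failed requests as a percentage."""
    if total_requests == 0:
        return 0.0  # no traffic, so nothing to report (assumption)
    return 100.0 * error_requests / total_requests


def api_errors_severity(percentage: float) -> Optional[str]:
    """Apply the 3% (Major) and 1% (Warning) thresholds from the summaries."""
    if percentage > 3:
        return "Major"  # corresponds to KubeAPIErrorsHighMajor
    if percentage > 1:
        return "Warning"  # corresponds to KubeAPIErrorsHighWarning
    return None  # below both thresholds
```

Under these assumptions, 2 failed requests out of 100 would map to Warning, and 5 out of 100 would map to Major.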

KubeAPIOutage

Severity

Critical

Summary

Kubernetes API is down.

Description

The Kubernetes API has not been accessible for the last 30 seconds.

KubeAPIResourceErrorsHighMajor

Severity

Major

Summary

API server is returning errors for 10% of requests.

Description

The {{ $labels.instance }} API server is returning errors for {{ $value }}% of requests for {{ $labels.resource }} {{ $labels.subresource }}.

KubeAPIResourceErrorsHighWarning

Severity

Warning

Summary

API server is returning errors for 5% of requests.

Description

The {{ $labels.instance }} API server is returning errors for {{ $value }}% of requests for {{ $labels.resource }} {{ $labels.subresource }}.

KubeClientCertificateExpirationInOneDay

Removed in 2.28.0 (17.3.0 and 16.3.0)

Severity

Critical

Summary

Client certificate expires in 24 hours.

Description

The client certificate used to authenticate to the API server expires in less than 24 hours.

KubeClientCertificateExpirationInSevenDays

Removed in 2.28.0 (17.3.0 and 16.3.0)

Severity

Major

Summary

Client certificate expires in 7 days.

Description

The client certificate used to authenticate to the API server expires in less than 7 days.

KubeClientErrors

Severity

Warning

Summary

Kubernetes API client has more than 1% of error requests.

Description

The {{ $labels.instance }} Kubernetes API server client has {{ printf "%0.0f" $value }}% errors.

KubeDNSTargetsOutage

Removed in 17.0.0, 16.0.0, and 14.1.0

Severity

Critical

Summary

CoreDNS Prometheus targets outage.

Description

Prometheus fails to scrape metrics from all CoreDNS endpoints (more than 1/10 failed scrapes).

KubeletTargetDown

Severity

Critical

Summary

Kubelet Prometheus target is down.

Description

Prometheus fails to scrape metrics from kubelet on the {{ $labels.node }} node (more than 1/10 failed scrapes).

KubeletTargetsOutage

Severity

Critical

Summary

Kubelet Prometheus targets outage.

Description

Prometheus fails to scrape metrics from kubelet on all nodes (more than 1/10 failed scrapes).

KubeletTooManyPods

Severity

Warning

Summary

Kubelet reached 90% of the Pods limit.

Description

The kubelet container on the {{ $labels.node }} Node is running {{ $value }} Pods, nearly 90% of possible allocation.
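
The 90% figure relates the number of running Pods to the Pod capacity of the node. The following Python sketch illustrates the check using integer arithmetic. It assumes an inclusive comparison (at or above 90%); the actual rule may use a strict one. The function name and the pod_limit parameter are hypothetical.

```python
def kubelet_too_many_pods(running_pods: int, pod_limit: int,
                          threshold_percent: int = 90) -> bool:
    """Return True if running Pods reach the given share of the Pod limit.

    Integer arithmetic avoids floating-point rounding at the boundary.
    The inclusive comparison is an assumption.
    """
    if pod_limit <= 0:
        raise ValueError("pod_limit must be positive")
    return running_pods * 100 >= pod_limit * threshold_percent
```

For example, with a hypothetical limit of 110 Pods, 99 running Pods would meet the threshold, while 98 would not.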

KubeNodeNotReady

Severity

Warning

Summary

Node {{ $labels.node }} is not ready.

Description

The {{ $labels.node }} Kubernetes node has been unready for more than an hour.

KubernetesApiserverTargetsOutage

Severity

Critical

Summary

Kubernetes API server Prometheus targets outage.

Description

Prometheus fails to scrape metrics from 2/3 of Kubernetes API server endpoints.

KubernetesMasterAPITargetsOutage

Severity

Critical

Summary

Kubernetes master API Prometheus targets outage.

Description

Prometheus fails to scrape metrics from 2/3 of Kubernetes master API nodes.

KubeStateMetricsTargetDown

Severity

Critical

Summary

kube-state-metrics Prometheus target is down.

Description

Prometheus fails to scrape metrics from the kube-state-metrics service.

KubeVersionMismatch

Severity

Warning

Summary

Kubernetes components version mismatch.

Description

There are {{ $value }} different semantic versions of Kubernetes components running.
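
The value reported by this alert is a count of distinct versions. The following Python sketch shows one way to produce such a count. It assumes that versions are compared by their major.minor prefix (for example, v1.27), so that patch releases of the same minor version count as one. The actual rule may normalize version strings differently, and count_distinct_versions is a hypothetical helper.

```python
import re
from typing import Iterable

# Matches a leading "v<major>.<minor>" prefix, for example "v1.27" in "v1.27.3".
_VERSION_PREFIX = re.compile(r"^(v\d+\.\d+)")


def count_distinct_versions(git_versions: Iterable[str]) -> int:
    """Count distinct major.minor versions among component version strings."""
    distinct = set()
    for version in git_versions:
        match = _VERSION_PREFIX.match(version)
        # Fall back to the full string if the prefix is not recognized.
        distinct.add(match.group(1) if match else version)
    return len(distinct)
```

Under these assumptions, a result greater than 1 would indicate that components run different versions.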