Kubernetes system

This section lists the alerts for the Kubernetes system.


KubeNodeNotReady

Severity

Warning

Summary

The {{ $labels.node }} node is not ready.

Description

The Kubernetes {{ $labels.node }} node has not been ready for more than one hour.


KubeVersionMismatch

Severity

Warning

Summary

Kubernetes components have mismatching versions.

Description

Kubernetes has components with {{ $value }} different semantic versions running.


KubeClientErrors

Severity

Warning

Summary

More than 1% of Kubernetes API client requests return errors.

Description

The {{ $labels.job }}/{{ $labels.instance }} Kubernetes API server client has {{ printf "%0.0f" $value }}% errors.
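
The alert descriptions use Go template syntax. The sketch below renders this description with Go's `text/template` package. Prepending `{{$labels := .Labels}}{{$value := .Value}}` approximates how Prometheus exposes `$labels` and `$value` to annotation templates. The `alertData` type, the `renderDescription` function, and the label values are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// alertData carries the values that the template prefix binds to $labels and $value.
type alertData struct {
	Labels map[string]string
	Value  float64
}

// renderDescription expands an alert description template. The defs prefix
// approximates how Prometheus defines $labels and $value; it is not Prometheus code.
func renderDescription(tmpl string, data alertData) (string, error) {
	const defs = "{{$labels := .Labels}}{{$value := .Value}}"
	t, err := template.New("description").Parse(defs + tmpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, data); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	desc := `The {{ $labels.job }}/{{ $labels.instance }} Kubernetes API server client has {{ printf "%0.0f" $value }}% errors.`
	out, err := renderDescription(desc, alertData{
		Labels: map[string]string{"job": "kubelet", "instance": "10.0.0.1:10250"}, // hypothetical labels
		Value:  2.6,
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

`printf "%0.0f"` rounds the value to a whole number, so a value of 2.6 appears as `3%` in the rendered description.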


KubeletTooManyPods

Severity

Warning

Summary

The kubelet reached 90% of its Pod limit.

Description

The {{ $labels.instance }}/{{ $labels.node }} kubelet runs {{ $value }} Pods, which is nearly 90% of its Pod capacity.
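
The Go sketch below shows the 90% check. It assumes the limit is the kubelet's maximum Pod count for the node (the kubelet default is 110). The function name is hypothetical, and treating "reached 90%" as greater than or equal to 90% is an assumption.

```go
package main

import "fmt"

// podLimitRatio is the 90% threshold named in the KubeletTooManyPods summary.
const podLimitRatio = 0.9

// nearPodLimit reports whether runningPods has reached podLimitRatio of
// podCapacity. A non-positive capacity never triggers the check.
func nearPodLimit(runningPods, podCapacity int) bool {
	if podCapacity <= 0 {
		return false
	}
	return float64(runningPods)/float64(podCapacity) >= podLimitRatio
}

func main() {
	fmt.Println(nearPodLimit(100, 110)) // true: about 91% of 110
	fmt.Println(nearPodLimit(50, 110))  // false: about 45% of 110
}
```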


KubeAPIDown

Severity

Critical

Summary

Kubernetes API endpoint is down.

Description

The Kubernetes API endpoint {{ $labels.instance }} has not been accessible for the last 3 minutes.


KubeAPIOutage

Severity

Critical

Summary

Kubernetes API is down.

Description

The Kubernetes API has not been accessible for the last 30 seconds.


KubeAPILatencyHighWarning

Severity

Warning

Summary

The API server has a 99th percentile latency of more than 1 second.

Description

The API server has a 99th percentile latency of {{ $value }} seconds for {{ $labels.verb }} {{ $labels.resource }}.


KubeAPILatencyHighMajor

Severity

Major

Summary

The API server has a 99th percentile latency of more than 4 seconds.

Description

The API server has a 99th percentile latency of {{ $value }} seconds for {{ $labels.verb }} {{ $labels.resource }}.


KubeAPIErrorsHighMajor

Severity

Major

Summary

The API server returns errors for more than 3% of requests.

Description

The API server returns errors for {{ $value }}% of requests.


KubeAPIErrorsHighWarning

Severity

Warning

Summary

The API server returns errors for more than 1% of requests.

Description

The API server returns errors for {{ $value }}% of requests.


KubeAPIResourceErrorsHighMajor

Severity

Major

Summary

The API server returns errors for more than 10% of requests.

Description

The API server returns errors for {{ $value }}% of requests for {{ $labels.verb }} {{ $labels.resource }} {{ $labels.subresource }}.


KubeAPIResourceErrorsHighWarning

Severity

Warning

Summary

The API server returns errors for more than 5% of requests.

Description

The API server returns errors for {{ $value }}% of requests for {{ $labels.verb }} {{ $labels.resource }} {{ $labels.subresource }}.


KubeClientCertificateExpirationInSevenDays

Severity

Warning

Summary

A client certificate expires in 7 days.

Description

A client certificate used to authenticate to the API server expires in less than 7 days.


KubeClientCertificateExpirationInOneDay

Severity

Critical

Summary

A client certificate expires in 24 hours.

Description

A client certificate used to authenticate to the API server expires in less than 24 hours.


ContainerScrapeError

Severity

Warning

Summary

Failed to get Kubernetes container metrics.

Description

Prometheus was not able to scrape metrics from the container on the {{ $labels.node }} Kubernetes node.