Etcd

This section describes the alerts for the etcd service.

etcdDbSizeCritical

Available since 12.5.0, 11.5.0, and 7.11.0

Severity

Critical

Summary

Etcd database passed 95% of quota.

Description

The {{ $labels.job }} etcd database reached {{ $value }} % of defined quota on the {{ $labels.node }} node.

etcdDbSizeMajor

Available since 12.5.0, 11.5.0, and 7.11.0

Severity

Major

Summary

Etcd database passed 85% of quota.

Description

The {{ $labels.job }} etcd database reached {{ $value }} % of defined quota on the {{ $labels.node }} node.
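Both database size alerts compare the current etcd database size with the configured backend quota. The following Python sketch shows that arithmetic and the two thresholds from the summaries above. It assumes the size and quota are available in bytes, for example from the etcd metrics `etcd_mvcc_db_total_size_in_bytes` and `etcd_server_quota_backend_bytes`. Which metrics and comparison operators (`>` or `>=`) the shipped alert rules use is an assumption, and the function names are hypothetical.

```python
from typing import Optional


def db_size_percent(db_size_bytes: float, quota_bytes: float) -> float:
    """Return the etcd database size as a percentage of the backend quota."""
    if quota_bytes <= 0:
        raise ValueError("quota_bytes must be positive")
    return 100.0 * db_size_bytes / quota_bytes


def db_size_severity(percent: float) -> Optional[str]:
    """Return the most severe database size alert that applies, if any.

    Thresholds follow the summaries: Critical past 95%, Major past 85%.
    A strict "greater than" comparison is an assumption of this sketch.
    """
    if percent > 95:
        return "etcdDbSizeCritical"
    if percent > 85:
        return "etcdDbSizeMajor"
    return None
```

With a 2 GiB quota, a 1.95 GiB database is at roughly 97.5% and falls into the Critical tier. The sketch reports only the more severe alert; in Prometheus, two independent rules could both be active above 95%.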

etcdInsufficientMembers

Severity

Critical

Summary

Etcd cluster has insufficient members.

Description

The {{ $labels.job }} etcd cluster has {{ $value }} insufficient members.
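etcd needs a majority (quorum) of its voting members to elect a leader and commit writes. The Python sketch below shows the quorum arithmetic behind a check like etcdInsufficientMembers, assuming that "insufficient" means the number of healthy members has dropped below quorum. The function names are hypothetical, and this is not the expression used by the shipped alert rule.

```python
def quorum_size(cluster_size: int) -> int:
    """Smallest majority of voting members: floor(n / 2) + 1."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    return cluster_size // 2 + 1


def has_insufficient_members(cluster_size: int, healthy_members: int) -> bool:
    """True if the healthy members can no longer form a quorum."""
    return healthy_members < quorum_size(cluster_size)
```

For a three-member cluster the quorum is 2, so losing one member is tolerated and losing two meets the condition.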

etcdNoLeader

Severity

Critical

Summary

Etcd cluster has no leader.

Description

The {{ $labels.node }} member of the {{ $labels.job }} etcd cluster has no leader.
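Each etcd member exports the `etcd_server_has_leader` gauge, which is 1 when the member knows of a leader and 0 otherwise. The Python sketch below assumes that the alert evaluates this gauge per node and lists the members that report no leader. The node names in the example input and the function name are hypothetical.

```python
from typing import Dict, List


def members_without_leader(has_leader: Dict[str, float]) -> List[str]:
    """Return nodes whose etcd member reports no known leader (gauge value 0)."""
    return sorted(node for node, value in has_leader.items() if value == 0)
```

For example, `members_without_leader({"ctl01": 1, "ctl02": 0, "ctl03": 0})` returns `["ctl02", "ctl03"]`.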

etcdHighNumberOfLeaderChanges

Severity

Warning

Summary

Etcd cluster has detected more than 3 leader changes within the last hour.

Description

The {{ $labels.node }} node of the {{ $labels.job }} etcd cluster has {{ $value }} leader changes within the last hour.
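etcd exposes the `etcd_server_leader_changes_seen_total` counter for this purpose. The Python sketch below uses a different technique to illustrate the same condition: it counts transitions in periodically sampled leader IDs over the last hour and compares the count with the threshold of 3 from the summary. The sample format, the constants, and the function names are assumptions for illustration only.

```python
from typing import List, Optional, Tuple

# (unix timestamp, member ID that this node reported as leader)
LeaderSample = Tuple[float, int]

WINDOW_SECONDS = 3600.0  # "within the last hour"
MAX_CHANGES = 3          # alert on more than 3 changes


def count_leader_changes(samples: List[LeaderSample], now: float,
                         window: float = WINDOW_SECONDS) -> int:
    """Count transitions of the observed leader ID in (now - window, now]."""
    recent = sorted(s for s in samples if now - window < s[0] <= now)
    changes = 0
    previous: Optional[int] = None
    for _, leader in recent:
        if previous is not None and leader != previous:
            changes += 1
        previous = leader
    return changes


def too_many_leader_changes(samples: List[LeaderSample], now: float) -> bool:
    """True if more than MAX_CHANGES leader changes were observed in the window."""
    return count_leader_changes(samples, now) > MAX_CHANGES
```

Sampling can miss a change that occurs and reverts between two samples, which is why a monotonically increasing counter such as the one etcd exports is the more reliable source.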

etcdHighNumberOfFailedProposals

Severity

Warning

Summary

Etcd cluster has more than 5 proposal failures.

Description

The {{ $labels.job }} etcd cluster has {{ $value }} proposal failures on the {{ $labels.node }} etcd node within the last hour.
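etcd counts failed Raft proposals in the `etcd_server_proposals_failed_total` counter. A counter resets to zero when the process restarts, so the increase over the last hour must account for resets. The Python sketch below computes that increase from raw samples and applies the threshold of 5 from the summary. Unlike the Prometheus `increase()` function, it does not extrapolate to the window edges. The sample format and function names are assumptions.

```python
from typing import List, Tuple

# (unix timestamp, counter value)
CounterSample = Tuple[float, float]


def counter_increase(samples: List[CounterSample], now: float,
                     window: float = 3600.0) -> float:
    """Increase of a monotonic counter over (now - window, now], tolerating resets.

    A drop in value is treated as a counter reset, so the post-reset value
    is counted as new increments.
    """
    points = sorted(s for s in samples if now - window < s[0] <= now)
    total = 0.0
    for (_, prev), (_, cur) in zip(points, points[1:]):
        total += cur - prev if cur >= prev else cur
    return total


def too_many_failed_proposals(samples: List[CounterSample], now: float) -> bool:
    """True if more than 5 proposals failed within the last hour."""
    return counter_increase(samples, now) > 5
```

For example, samples of 12, 15, 2 (after a restart), and 4 within the window yield an increase of 3 + 2 + 2 = 7 failed proposals.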

etcdTargetDown

Available since 17.0.0, 16.0.0, and 14.1.0 to replace etcdTargetsOutage

Severity

Critical

Summary

Etcd cluster Prometheus target down.

Description

Prometheus fails to scrape metrics from the etcd {{ $labels.job }} cluster instance on the {{ $labels.node }} node.
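Prometheus records the result of every scrape in the `up` series of the target: 1 for a successful scrape and 0 for a failed one. Alert rules usually also require the condition to hold for some time before firing. The Python sketch below finds the nodes whose etcd target has failed every scrape for at least a given duration. The 120-second default, the sample format, and the function names are hypothetical; the actual duration configured for etcdTargetDown is not stated here.

```python
from typing import Dict, List, Optional, Tuple

# (unix timestamp, value of the `up` series)
UpSample = Tuple[float, int]


def down_since(samples: List[UpSample]) -> Optional[float]:
    """Timestamp of the first failed scrape in the current run of failures.

    Returns None if there are no samples or the latest scrape succeeded.
    """
    start: Optional[float] = None
    for t, value in sorted(samples):
        if value == 0:
            if start is None:
                start = t
        else:
            start = None  # a successful scrape ends the run of failures
    return start


def down_nodes(up_by_node: Dict[str, List[UpSample]], now: float,
               duration: float = 120.0) -> List[str]:
    """Nodes whose target has been failing continuously for at least `duration` seconds."""
    result = []
    for node, samples in up_by_node.items():
        start = down_since(samples)
        if start is not None and now - start >= duration:
            result.append(node)
    return sorted(result)
```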

etcdTargetsOutage

Replaced with etcdTargetDown in 17.0.0, 16.0.0, and 14.1.0

Severity

Critical

Summary

Etcd cluster Prometheus targets outage.

Description

Prometheus fails to scrape metrics from 2/3 of etcd nodes (more than 1/10 failed scrapes).
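The outage condition combines two ratios. The Python sketch below shows one reading of it: a node counts as failing when more than 1/10 of its recent scrapes failed (`up` equal to 0), and the outage condition holds when at least 2/3 of the etcd nodes are failing. Whether the original rule used "at least" or "more than" 2/3, and over which time window, are assumptions. Integer arithmetic avoids floating-point rounding around 2/3, and the function names are hypothetical.

```python
from typing import Dict, List


def failed_ratio_exceeds(up_values: List[int], limit_num: int = 1,
                         limit_den: int = 10) -> bool:
    """True if more than limit_num/limit_den of the scrapes failed (up == 0)."""
    if not up_values:
        return False  # assumption: no samples means no evidence of failure
    failed = sum(1 for v in up_values if v == 0)
    return failed * limit_den > limit_num * len(up_values)


def targets_outage(up_by_node: Dict[str, List[int]]) -> bool:
    """True if at least 2/3 of the etcd nodes exceed the failed-scrape ratio."""
    total = len(up_by_node)
    if total == 0:
        return False
    failing = sum(1 for values in up_by_node.values() if failed_ratio_exceeds(values))
    return 3 * failing >= 2 * total
```

With three nodes, two of which failed 2 and 5 of their last 10 scrapes, the condition holds; if only one node exceeds the 1/10 ratio, it does not.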