Ceph

This section describes the alerts for the Ceph cluster.


CephClusterHealthMinor

Severity

Minor

Summary

Ceph cluster health is WARNING.

Description

The Ceph cluster {{ $labels.rook_cluster }} is in the WARNING state. For details, run ceph -s.
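
To investigate, start from the overall cluster status and then list the specific health checks behind the WARNING state. The commands below are a minimal troubleshooting sketch; the exact output format varies between Ceph releases.

  # Overall cluster status: health, Monitors, OSDs, and PG states
  ceph -s
  # Individual health checks that put the cluster into HEALTH_WARN
  ceph health detail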


CephClusterHealthCritical

Severity

Critical

Summary

Ceph cluster health is CRITICAL.

Description

The Ceph cluster {{ $labels.rook_cluster }} is in the CRITICAL state. For details, run ceph -s.


CephClusterTargetDown

Available since 2.13.0

Severity

Critical

Summary

Ceph cluster Prometheus target is down.

Description

Prometheus fails to scrape metrics from the Ceph cluster {{ $labels.rook_cluster }} endpoint(s): more than 10% of scrape attempts have failed.
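
To verify that the metrics endpoint itself is still serving data, check the ceph-mgr Prometheus module and query it directly. This is a sketch assuming the default module port 9283; substitute the address reported by ceph mgr services for your deployment.

  # Confirm the prometheus module is enabled on the active ceph-mgr
  ceph mgr module ls
  # Show the URI the module listens on (port 9283 by default)
  ceph mgr services
  # Fetch metrics manually from that URI (replace <mgr-address> with the reported address)
  curl http://<mgr-address>:9283/metrics | head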


CephMonQuorumAtRisk

Severity

Major

Summary

Ceph cluster quorum is at risk.

Description

The number of Ceph Monitors in quorum on the {{ $labels.rook_cluster }} cluster is low, putting the cluster quorum at risk.
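
To see how many Monitors are currently in quorum and which ones are missing, query the Monitor state directly. A minimal sketch:

  # Current quorum membership and the elected leader
  ceph quorum_status -f json-pretty
  # Short summary of Monitor count and quorum
  ceph mon stat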


CephOSDDown

Severity

Critical

Summary

Ceph OSDs are down.

Description

{{ $value }} Ceph OSDs on the {{ $labels.rook_cluster }} cluster are down. For details, run ceph osd tree.
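
To identify the affected OSDs and the hosts they run on, inspect the OSD tree. The state filter in the second command is a convenience available on recent Ceph releases.

  # Up/down and in/out state of every OSD, grouped by host
  ceph osd tree
  # Only the OSDs currently reported down
  ceph osd tree down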


CephOSDDiskNotResponding

Severity

Critical

Summary

Disk not responding.

Description

The {{ $labels.device }} disk device is not responding to {{ $labels.ceph_daemon }} on the {{ $labels.node }} node of the {{ $labels.rook_cluster }} Ceph cluster.


CephOSDDiskUnavailable

Severity

Critical

Summary

Disk not accessible.

Description

The {{ $labels.device }} disk device is not accessible by {{ $labels.ceph_daemon }} on the {{ $labels.node }} node of the {{ $labels.rook_cluster }} Ceph cluster.
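
For both disk alerts, low-level checks on the affected node help distinguish a failed disk from a transient I/O problem. The following is a sketch using common Linux tools; /dev/sdX is a placeholder for the reported device.

  # Kernel messages often reveal I/O errors or controller resets for the device
  dmesg | grep -i sdX
  # SMART health summary (requires smartmontools)
  smartctl -H /dev/sdX
  # Confirm the device is still visible to the kernel
  lsblk /dev/sdX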


CephClusterFullWarning

Severity

Warning

Summary

Ceph cluster is nearly full.

Description

The Ceph cluster {{ $labels.rook_cluster }} utilization has crossed 85%. Expansion is required.
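
To confirm the utilization and see which pools or OSDs consume the space, use the built-in usage reports. A minimal sketch:

  # Cluster-wide and per-pool space usage
  ceph df
  # Per-OSD utilization, useful for spotting unbalanced OSDs
  ceph osd df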


CephClusterFullCritical

Severity

Critical

Summary

Ceph cluster is full.

Description

The Ceph cluster {{ $labels.rook_cluster }} utilization has crossed 95% and needs immediate expansion.


CephOSDPgNumTooHighWarning

Severity

Warning

Summary

Ceph OSDs have more than 200 PGs.

Description

Some Ceph OSDs on the {{ $labels.rook_cluster }} cluster contain more than 200 Placement Groups, which may negatively impact cluster performance. For details, run ceph pg dump.
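
To see how many Placement Groups each OSD holds, the per-OSD report is usually quicker to read than the full PG dump. A minimal sketch:

  # The PGS column shows the number of Placement Groups mapped to each OSD
  ceph osd df
  # Full PG-to-OSD mapping for deeper analysis
  ceph pg dump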


CephOSDPgNumTooHighCritical

Severity

Critical

Summary

Ceph OSDs have more than 300 PGs.

Description

Some Ceph OSDs on the {{ $labels.rook_cluster }} cluster contain more than 300 Placement Groups, which may negatively impact cluster performance. For details, run ceph pg dump.


CephMonHighNumberOfLeaderChanges

Severity

Warning

Summary

Ceph cluster has too many leader changes.

Description

The Ceph Monitor {{ $labels.ceph_daemon }} on the {{ $labels.rook_cluster }} cluster has detected {{ $value }} leader changes per minute.
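
Frequent leader elections are often caused by network instability or clock skew between Monitors. Both can be checked from the cluster; a minimal sketch:

  # Current Monitor ranks and the elected leader
  ceph mon stat
  # Clock skew between Monitors, a common cause of repeated elections
  ceph time-sync-status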


CephNodeDown

Severity

Critical

Summary

Ceph node {{ $labels.node }} went down.

Description

The Ceph node {{ $labels.node }} of the {{ $labels.rook_cluster }} cluster went down and requires immediate verification.


CephOSDVersionMismatch

Severity

Warning

Summary

Multiple versions of Ceph OSDs running.

Description

{{ $value }} different versions of Ceph OSD daemons are running on the {{ $labels.rook_cluster }} cluster.
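
To list the versions actually running and to identify the outdated daemons, query the cluster directly. A minimal sketch:

  # Running version of every daemon, grouped by daemon type
  ceph versions
  # Per-OSD detail if a specific daemon needs to be identified
  ceph tell osd.* version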


CephMonVersionMismatch

Severity

Warning

Summary

Multiple versions of Ceph Monitors running.

Description

{{ $value }} different versions of Ceph Monitor components are running on the {{ $labels.rook_cluster }} cluster.


CephPGInconsistent

Severity

Minor

Summary

Too many inconsistent Ceph PGs.

Description

The Ceph cluster {{ $labels.rook_cluster }} has detected inconsistencies in one or more replicas of an object in {{ $value }} Placement Groups in the {{ $labels.name }} pool.
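
The inconsistent Placement Groups can be listed and inspected before deciding on a repair. The following is a sketch; <pgid> is a placeholder for a PG ID reported by the health check, and a repair should only be triggered once the faulty replica is understood.

  # List the PG IDs reported as inconsistent
  ceph health detail | grep inconsistent
  # Show which objects and shards differ in a given PG
  rados list-inconsistent-obj <pgid> --format=json-pretty
  # Trigger a repair of the PG
  ceph pg repair <pgid>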


CephPGUndersized

Severity

Minor

Summary

Too many undersized Ceph PGs.

Description

The Ceph cluster {{ $labels.rook_cluster }} reports that {{ $value }} Placement Groups in the {{ $labels.name }} pool have fewer copies than the configured pool replication level.
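
Undersized Placement Groups usually point to down or out OSDs that reduce the number of available replicas. A minimal sketch to correlate the two:

  # PGs currently stuck in the undersized state
  ceph pg dump_stuck undersized
  # Down or out OSDs that would explain the missing copies
  ceph osd tree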