Ceph

This section describes the alerts for the Ceph cluster.


CephClusterHealthWarning

Severity

Warning

Summary

Ceph cluster health is WARNING.

Description

The Ceph cluster {{ $labels.rook_cluster }} is in the WARNING state. For details, run ceph -s.

CephClusterHealthCritical

Severity

Critical

Summary

Ceph cluster health is CRITICAL.

Description

The Ceph cluster {{ $labels.rook_cluster }} is in the CRITICAL state. For details, run ceph -s.
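
When triaging either health alert, the output of ceph -s can also be read programmatically. The following is a minimal sketch, assuming the command is run as ceph -s --format json and that the WARNING and CRITICAL states correspond to the HEALTH_WARN and HEALTH_ERR strings reported by Ceph. The mapping and the alert_for_status helper are illustrative assumptions, not the actual alert rules.

```python
import json

# Assumed mapping between Ceph health strings and the alerts in this section.
# It is illustrative only and is not taken from the alert definitions.
HEALTH_TO_ALERT = {
    "HEALTH_OK": None,
    "HEALTH_WARN": "CephClusterHealthWarning",
    "HEALTH_ERR": "CephClusterHealthCritical",
}


def alert_for_status(status_json):
    """Return the alert name matching the health reported in the JSON status."""
    status = json.loads(status_json)["health"]["status"]
    return HEALTH_TO_ALERT.get(status)


# Hand-written sample that imitates the relevant part of the status output.
sample = '{"health": {"status": "HEALTH_WARN", "checks": {}}}'
print(alert_for_status(sample))
```

On a live cluster, the sample string would be replaced with the captured output of the command.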

CephClusterTargetDown

Severity

Critical

Summary

Ceph cluster Prometheus target is down.

Description

Prometheus fails to scrape metrics from the endpoints of the Ceph cluster {{ $labels.rook_cluster }} (more than 10% of scrapes failed).
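
The failed-scrape threshold can be illustrated with a short calculation. The sketch below assumes a list of up samples (1 for a successful scrape, 0 for a failed one) collected over some window. The function names and the window are hypothetical, and the real alert expression may be written differently.

```python
def scrape_failure_ratio(up_samples):
    """Return the fraction of samples in which the scrape failed (up == 0)."""
    if not up_samples:
        raise ValueError("at least one sample is required")
    failed = sum(1 for sample in up_samples if sample == 0)
    return failed / len(up_samples)


def target_down(up_samples, threshold=0.1):
    """Return True when more than `threshold` of the scrapes failed."""
    return scrape_failure_ratio(up_samples) > threshold


# Two failures out of ten samples exceed the one-in-ten threshold.
print(target_down([1, 1, 0, 1, 1, 1, 0, 1, 1, 1]))
```

Exactly one failure in ten does not exceed the threshold, because the comparison is strict.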

CephMonQuorumAtRisk

Severity

Major

Summary

Ceph cluster quorum at risk.

Description

The Ceph Monitor quorum on the {{ $labels.rook_cluster }} cluster is low.
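
The description does not state the exact condition behind a low quorum. One plausible reading, sketched below as an assumption, is that the quorum is at risk when the loss of one more Monitor would leave fewer members than a strict majority. The function names are hypothetical.

```python
def quorum_size(total_monitors):
    """Return the smallest number of Monitors that forms a strict majority."""
    return total_monitors // 2 + 1


def quorum_at_risk(total_monitors, monitors_in_quorum):
    """Return True when losing one more Monitor would break the majority."""
    return monitors_in_quorum - 1 < quorum_size(total_monitors)


# With three Monitors, a majority is two; two members leave no margin.
print(quorum_at_risk(3, 3))
print(quorum_at_risk(3, 2))
```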

CephOSDDown

Severity

Critical

Summary

Ceph OSDs are down.

Description

{{ $value }} Ceph OSDs on the {{ $labels.rook_cluster }} cluster are down. For details, run ceph osd tree.
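
To list the affected OSDs without reading the tree by eye, the JSON form of the command can be filtered. This is a minimal sketch, assuming the command is run as ceph osd tree --format json and that each entry under nodes carries name, type, and status fields. The sample data is hand-written for illustration.

```python
import json


def down_osds(osd_tree_json):
    """Return the sorted names of OSD entries whose status is 'down'."""
    nodes = json.loads(osd_tree_json).get("nodes", [])
    return sorted(
        node["name"]
        for node in nodes
        if node.get("type") == "osd" and node.get("status") == "down"
    )


# Hand-written sample: one host with one healthy and one failed OSD.
sample = json.dumps({
    "nodes": [
        {"id": -1, "name": "default", "type": "root"},
        {"id": -2, "name": "node-1", "type": "host"},
        {"id": 0, "name": "osd.0", "type": "osd", "status": "up"},
        {"id": 1, "name": "osd.1", "type": "osd", "status": "down"},
    ]
})
print(down_osds(sample))
```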

CephOSDDiskNotResponding

Severity

Critical

Summary

Disk not responding.

Description

The {{ $labels.device }} disk device is not responding to {{ $labels.ceph_daemon }} on the {{ $labels.node }} node of the {{ $labels.rook_cluster }} Ceph cluster.

CephOSDDiskUnavailable

Severity

Critical

Summary

Disk not accessible.

Description

The {{ $labels.device }} disk device is not accessible by {{ $labels.ceph_daemon }} on the {{ $labels.node }} node of the {{ $labels.rook_cluster }} Ceph cluster.

CephClusterFullWarning

Severity

Warning

Summary

Ceph cluster is nearly full.

Description

The Ceph cluster {{ $labels.rook_cluster }} utilization has crossed 85%. Expansion is required.

CephClusterFullCritical

Severity

Critical

Summary

Ceph cluster is full.

Description

The Ceph cluster {{ $labels.rook_cluster }} utilization has crossed 95% and needs immediate expansion.
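
The two capacity thresholds can be summarized in a short sketch. It assumes that utilization is the ratio of used to total raw capacity and that crossing a threshold means strictly exceeding it. The fullness_alert helper is hypothetical and returns only the most severe matching alert.

```python
def fullness_alert(used_bytes, total_bytes):
    """Return the most severe capacity alert for the given usage, or None."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    ratio = used_bytes / total_bytes
    if ratio > 0.95:
        return "CephClusterFullCritical"
    if ratio > 0.85:
        return "CephClusterFullWarning"
    return None


# 90 of 100 units used is above 85% but below 95%.
print(fullness_alert(90, 100))
```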

CephOSDPgNumTooHighWarning

Severity

Warning

Summary

Ceph OSDs have more than 200 PGs.

Description

Some Ceph OSDs on the {{ $labels.rook_cluster }} cluster contain more than 200 Placement Groups. This may have a negative impact on the cluster performance. For details, run ceph pg dump.

CephOSDPgNumTooHighCritical

Severity

Critical

Summary

Ceph OSDs have more than 300 PGs.

Description

Some Ceph OSDs on the {{ $labels.rook_cluster }} cluster contain more than 300 Placement Groups. This may have a negative impact on the cluster performance. For details, run ceph pg dump.
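
The Placement Group thresholds of both alerts can be checked against per-OSD counts as follows. This is a sketch only: the input dictionary that maps OSD names to PG counts is a hypothetical structure that would have to be built from the cluster output, and the pg_alerts helper is not part of any Ceph tooling.

```python
def pg_alerts(pgs_per_osd, warning=200, critical=300):
    """Group OSD names by the PG-count threshold that they exceed."""
    result = {
        "CephOSDPgNumTooHighWarning": [],
        "CephOSDPgNumTooHighCritical": [],
    }
    for osd, pgs in sorted(pgs_per_osd.items()):
        if pgs > critical:
            result["CephOSDPgNumTooHighCritical"].append(osd)
        elif pgs > warning:
            result["CephOSDPgNumTooHighWarning"].append(osd)
    return result


# Illustrative counts: one OSD within limits, one above 200, one above 300.
counts = {"osd.0": 150, "osd.1": 240, "osd.2": 320}
print(pg_alerts(counts))
```

An OSD above the critical threshold is listed only under the critical alert here, which keeps the output free of duplicates.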

CephMonHighNumberOfLeaderChanges

Severity

Warning

Summary

Ceph cluster has too many leader changes.

Description

The Ceph Monitor {{ $labels.ceph_daemon }} on the {{ $labels.rook_cluster }} cluster has detected {{ $value }} leader changes per minute.

CephNodeDown

Severity

Critical

Summary

Ceph node {{ $labels.node }} went down.

Description

The Ceph node {{ $labels.node }} of the {{ $labels.rook_cluster }} cluster went down and requires immediate verification.

CephOSDVersionMismatch

Severity

Warning

Summary

Multiple versions of Ceph OSDs running.

Description

{{ $value }} different versions of Ceph OSD daemons are running on the {{ $labels.rook_cluster }} cluster.

CephMonVersionMismatch

Severity

Warning

Summary

Multiple versions of Ceph Monitors running.

Description

{{ $value }} different versions of Ceph Monitor components are running on the {{ $labels.rook_cluster }} cluster.
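
For both version mismatch alerts, the number of distinct versions can be derived from the JSON printed by the ceph versions command, which groups running daemons by type and version string. The sketch below assumes that layout. The version strings in the sample are shortened placeholders, not real build identifiers.

```python
import json


def distinct_version_counts(versions_json):
    """Return the number of distinct versions running per daemon type."""
    data = json.loads(versions_json)
    return {daemon: len(data.get(daemon, {})) for daemon in ("osd", "mon")}


# Hand-written sample with placeholder version strings.
sample = json.dumps({
    "mon": {"ceph version A": 3},
    "osd": {"ceph version A": 4, "ceph version B": 2},
})
print(distinct_version_counts(sample))
```

A value greater than 1 for a daemon type corresponds to the condition that the matching alert describes.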

CephPGInconsistent

Severity

Warning

Summary

Too many inconsistent Ceph PGs.

Description

The Ceph cluster {{ $labels.rook_cluster }} detects inconsistencies in one or more replicas of an object in {{ $value }} Placement Groups on the {{ $labels.name }} pool.

CephPGUndersized

Severity

Warning

Summary

Too many undersized Ceph PGs.

Description

The Ceph cluster {{ $labels.rook_cluster }} reports that {{ $value }} Placement Groups on the {{ $labels.name }} pool have fewer copies than the configured pool replication level.