Prometheus

This section describes the alerts for the Prometheus service.


PrometheusConfigReloadFailed

Severity

Warning

Summary

Failure to reload the Prometheus configuration.

Description

Reloading of the Prometheus configuration has failed.
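
For orientation, an alert of this kind could be expressed as a Prometheus alerting rule along the following lines. This is a minimal sketch, not the rule shipped with the product: the group name, the `for` duration, and the expression are assumptions. The expression relies on the standard `prometheus_config_last_reload_successful` gauge, which Prometheus sets to 0 after a failed reload.

```yaml
groups:
  - name: prometheus-config-example   # hypothetical group name
    rules:
      - alert: PrometheusConfigReloadFailed
        # Assumed expression: the gauge reports 0 when the last reload failed.
        expr: prometheus_config_last_reload_successful == 0
        for: 10m   # assumed hold duration before the alert fires
        labels:
          severity: warning
        annotations:
          summary: Failure to reload the Prometheus configuration.
          description: Reloading of the Prometheus configuration has failed.
```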

PrometheusNotificationQueueRunningFull

Severity

Warning

Summary

Prometheus alert notification queue is running full.

Description

The Prometheus alert notification queue is running full for the {{ $labels.namespace }}/{{ $labels.pod }} Pod.
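
A queue that is "running full" is commonly detected by extrapolating its length and comparing the result against its capacity. The sketch below illustrates that approach under stated assumptions: the 5-minute range, the 30-minute prediction horizon, the `for` duration, and the group name are not taken from this document, and the shipped rule may differ.

```yaml
groups:
  - name: prometheus-notification-queue-example   # hypothetical group name
    rules:
      - alert: PrometheusNotificationQueueRunningFull
        # Assumed expression: the queue length, extrapolated 30 minutes
        # ahead from the last 5 minutes, exceeds the queue capacity.
        expr: |
          predict_linear(prometheus_notifications_queue_length[5m], 60 * 30)
            >
          prometheus_notifications_queue_capacity
        for: 10m   # assumed hold duration
        labels:
          severity: warning
        annotations:
          summary: Prometheus alert notification queue is running full.
          description: "The Prometheus alert notification queue is running full for the {{ $labels.namespace }}/{{ $labels.pod }} Pod."
```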

PrometheusErrorSendingAlertsWarning

Severity

Warning

Summary

Errors while sending alerts from Prometheus.

Description

Errors while sending alerts from the {{ $labels.namespace }}/{{ $labels.pod }} Prometheus Pod to the {{ $labels.alertmanager }} Alertmanager.

PrometheusErrorSendingAlertsMajor

Severity

Major

Summary

Errors while sending alerts from Prometheus.

Description

Errors while sending alerts from the {{ $labels.namespace }}/{{ $labels.pod }} Prometheus Pod to the {{ $labels.alertmanager }} Alertmanager.
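
The PrometheusErrorSendingAlertsWarning and PrometheusErrorSendingAlertsMajor alerts share a summary and description and differ only in severity, which suggests one condition evaluated against two thresholds. The sketch below shows one way to model that as an error ratio per Alertmanager. The thresholds (1% and 3%), the 5-minute rate window, the `for` duration, and the group name are illustrative assumptions, not values from this document.

```yaml
groups:
  - name: prometheus-alert-delivery-example   # hypothetical group name
    rules:
      - alert: PrometheusErrorSendingAlertsWarning
        # Assumed expression: share of failed notifications per Alertmanager.
        expr: |
          rate(prometheus_notifications_errors_total[5m])
            /
          rate(prometheus_notifications_sent_total[5m])
            > 0.01
        for: 10m   # assumed hold duration
        labels:
          severity: warning
        annotations:
          summary: Errors while sending alerts from Prometheus.
          description: "Errors while sending alerts from the {{ $labels.namespace }}/{{ $labels.pod }} Prometheus Pod to the {{ $labels.alertmanager }} Alertmanager."
      - alert: PrometheusErrorSendingAlertsMajor
        # Same ratio with a higher, equally hypothetical, threshold.
        expr: |
          rate(prometheus_notifications_errors_total[5m])
            /
          rate(prometheus_notifications_sent_total[5m])
            > 0.03
        for: 10m   # assumed hold duration
        labels:
          severity: major
        annotations:
          summary: Errors while sending alerts from Prometheus.
          description: "Errors while sending alerts from the {{ $labels.namespace }}/{{ $labels.pod }} Prometheus Pod to the {{ $labels.alertmanager }} Alertmanager."
```

Both counters carry an `alertmanager` label, which is what makes the `{{ $labels.alertmanager }}` template in the description resolve to a value.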

PrometheusNotConnectedToAlertmanagers

Severity

Warning

Summary

Prometheus is not connected to any Alertmanager.

Description

The {{ $labels.namespace }}/{{ $labels.pod }} Prometheus Pod is not connected to any Alertmanager instance.

PrometheusTSDBReloadsFailing

Severity

Warning

Summary

Prometheus has issues reloading data blocks from disk.

Description

The {{ $labels.namespace }}/{{ $labels.pod }} Prometheus Pod had {{ $value | humanize }} reload failures over the last 12 hours.

PrometheusTSDBCompactionsFailing

Severity

Warning

Summary

Prometheus has issues compacting sample blocks.

Description

The {{ $labels.namespace }}/{{ $labels.pod }} Prometheus Pod had {{ $value | humanize }} compaction failures over the last 12 hours.
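
Both TSDB alerts report a failure count over the last 12 hours through `{{ $value | humanize }}`, so the value of the rule expression is presumably that count. A minimal sketch, assuming the standard TSDB failure counters and a hypothetical group name and `for` duration:

```yaml
groups:
  - name: prometheus-tsdb-example   # hypothetical group name
    rules:
      - alert: PrometheusTSDBReloadsFailing
        # Assumed expression: block reload failures over the 12-hour
        # window named in the description.
        expr: increase(prometheus_tsdb_reloads_failures_total[12h]) > 0
        for: 12h   # assumed hold duration
        labels:
          severity: warning
        annotations:
          summary: Prometheus has issues reloading data blocks from disk.
          description: "The {{ $labels.namespace }}/{{ $labels.pod }} Prometheus Pod had {{ $value | humanize }} reload failures over the last 12 hours."
      - alert: PrometheusTSDBCompactionsFailing
        # Assumed expression: failed compactions over the same window.
        expr: increase(prometheus_tsdb_compactions_failed_total[12h]) > 0
        for: 12h   # assumed hold duration
        labels:
          severity: warning
        annotations:
          summary: Prometheus has issues compacting sample blocks.
          description: "The {{ $labels.namespace }}/{{ $labels.pod }} Prometheus Pod had {{ $value | humanize }} compaction failures over the last 12 hours."
```

Because `increase()` extrapolates between samples, the rendered value can be fractional, which is why piping it through `humanize` is useful.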

PrometheusTSDBWALCorruptions

Severity

Warning

Summary

Prometheus encountered WAL corruptions.

Description

The {{ $labels.namespace }}/{{ $labels.pod }} Prometheus Pod has write-ahead log (WAL) corruptions in the time series database (TSDB) for the last 5 minutes.

PrometheusNotIngestingSamples

Severity

Major

Summary

Prometheus does not ingest samples.

Description

The {{ $labels.namespace }}/{{ $labels.pod }} Prometheus Pod does not ingest samples.

PrometheusTargetScrapesDuplicate

Severity

Warning

Summary

Prometheus has many samples rejected.

Description

The {{ $labels.namespace }}/{{ $labels.pod }} Prometheus Pod has many samples rejected due to duplicate timestamps but different values.

PrometheusRuleEvaluationsFailed

Severity

Warning

Summary

Prometheus failed to evaluate recording rules.

Description

The {{ $labels.namespace }}/{{ $labels.pod }} Prometheus Pod has failed evaluations for recording rules. Verify the rules state in the Status/Rules section of the Prometheus Web UI.

PrometheusServerTargetDown

Available since 17.0.0, 16.0.0, and 14.1.0 as a replacement for PrometheusServerTargetsOutage.

Severity

Critical

Summary

Prometheus server target down.

Description

Prometheus fails to scrape metrics from the {{ $labels.pod }} Pod on the {{ $labels.node }} node.

PrometheusServerTargetsOutage

Replaced with PrometheusServerTargetDown in 17.0.0, 16.0.0, and 14.1.0.

Severity

Critical

Summary

Prometheus server targets outage.

Description

Prometheus fails to scrape metrics from all of its endpoints (more than 1/10 failed scrapes).
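
The difference between the per-target PrometheusServerTargetDown alert and the aggregate PrometheusServerTargetsOutage alert can be sketched as follows. The `job` label value, the `for` durations, and the group name are placeholders, and the sketch assumes that relabeling attaches `pod` and `node` labels to the `up` series. Only the 1/10 ratio comes from the description above.

```yaml
groups:
  - name: prometheus-server-targets-example   # hypothetical group name
    rules:
      - alert: PrometheusServerTargetDown
        # Fires once per failing target, so the pod and node labels
        # remain available to the description template.
        expr: up{job="prometheus-server"} == 0   # placeholder job label
        for: 2m   # assumed hold duration
        labels:
          severity: critical
        annotations:
          summary: Prometheus server target down.
          description: "Prometheus fails to scrape metrics from the {{ $labels.pod }} Pod on the {{ $labels.node }} node."
      - alert: PrometheusServerTargetsOutage
        # Fires once for the whole job when more than 1/10 of its
        # targets fail to be scraped; per-target labels are aggregated away.
        expr: |
          count(up{job="prometheus-server"} == 0)
            /
          count(up{job="prometheus-server"})
            > 0.1
        for: 2m   # assumed hold duration
        labels:
          severity: critical
        annotations:
          summary: Prometheus server targets outage.
          description: Prometheus fails to scrape metrics from all of its endpoints (more than 1/10 failed scrapes).
```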