Telegraf

This section lists the alerts for the Telegraf service.


TelegrafGatherErrors

Removed in MCC 2.29.0 (17.4.0 and 16.4.0)

Note

TelegrafGatherErrors was replaced by TelegrafDockerSwarmGatherErrors and TelegrafSMARTGatherErrors.

Severity

Major

Summary

{{ $labels.job }} failed to gather metrics.

Description

The {{ $labels.job }} Prometheus target has gathering errors for the last 10 minutes.

TelegrafDockerSwarmGatherErrors

Since MCC 2.29.0 (17.4.0 and 16.4.0)

Severity

Major

Summary

Telegraf Docker Swarm failed to gather metrics.

Description

The Telegraf Docker Swarm Prometheus target contains gathering errors for the last 30 minutes.

TelegrafDockerSwarmTargetDown

Severity

Critical

Summary

Telegraf Docker Swarm Prometheus target is down.

Description

Prometheus fails to scrape metrics from the {{ $labels.pod }} Pod on the {{ $labels.node }} node.
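
The `{{ $labels.<name> }}` references in alert summaries and descriptions are placeholders that are filled in with the label values of the firing alert. The following Python snippet is only an illustrative sketch of that substitution, not the templating engine Prometheus actually uses. The `render_labels` helper and the label values are hypothetical, and rendering unknown labels as an empty string is a choice made for this sketch.

```python
import re

# Matches label references such as {{ $labels.node }}.
_LABEL_REF = re.compile(r"\{\{\s*\$labels\.(\w+)\s*\}\}")


def render_labels(template: str, labels: dict[str, str]) -> str:
    """Replace each {{ $labels.<name> }} reference with its label value.

    In this sketch, a label that is not present renders as an empty string.
    """
    return _LABEL_REF.sub(lambda m: labels.get(m.group(1), ""), template)


description = (
    "Prometheus fails to scrape metrics from the {{ $labels.pod }} Pod "
    "on the {{ $labels.node }} node."
)

# Hypothetical label values, for illustration only.
example_labels = {"pod": "telegraf-docker-swarm-0", "node": "worker-1"}

rendered = render_labels(description, example_labels)
print(rendered)
```

With the hypothetical labels above, the placeholders are replaced by `telegraf-docker-swarm-0` and `worker-1`, which is the form an operator would see in a notification.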

TelegrafOpenstackTargetDown

Removed in MOSK 24.1

Severity

Critical

Summary

Telegraf OpenStack Prometheus target is down.

Description

Prometheus fails to scrape metrics from the Telegraf OpenStack service.

TelegrafSMARTGatherErrors

Since MCC 2.29.0 (17.4.0 and 16.4.0)

Severity

Major

Summary

Telegraf SMART failed to gather metrics.

Description

The Telegraf SMART Prometheus target contains gathering errors for the last 10 minutes.

TelegrafSMARTTargetDown

Severity

Major

Summary

Telegraf SMART Prometheus target is down.

Description

Prometheus fails to scrape metrics from the Telegraf SMART endpoint on the {{ $labels.node }} node.

TelegrafSMARTTargetsOutage

Severity

Critical

Summary

Telegraf SMART Prometheus targets outage.

Description

Prometheus fails to scrape metrics from all Telegraf SMART endpoints.
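
The difference between TelegrafSMARTTargetDown (a single endpoint on one node is not scraped) and TelegrafSMARTTargetsOutage (no endpoint is scraped) can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual alert rule: the `smart_target_alerts` helper and the node names are hypothetical, the input is assumed to map each node to a scrape status of 1 (up) or 0 (down), and whether per-node alerts are suppressed during a full outage is not covered here, so the sketch simply reports both.

```python
def smart_target_alerts(up_by_node: dict[str, int]) -> list[str]:
    """Return illustrative alert names for per-node scrape status (1 = up, 0 = down)."""
    # Nodes whose Telegraf SMART endpoint could not be scraped.
    down = sorted(node for node, up in up_by_node.items() if up == 0)

    # One per-node alert for every endpoint that is down.
    alerts = [f"TelegrafSMARTTargetDown(node={node})" for node in down]

    # The outage alert applies only when every known endpoint is down.
    if up_by_node and len(down) == len(up_by_node):
        alerts.append("TelegrafSMARTTargetsOutage")
    return alerts


# Hypothetical scrape statuses, for illustration only.
partial = {"node-1": 1, "node-2": 0, "node-3": 1}
total = {"node-1": 0, "node-2": 0}

print(smart_target_alerts(partial))
print(smart_target_alerts(total))
```

In the first case only the endpoint on `node-2` is reported. In the second case every endpoint is down, so the outage condition is reported in addition to the per-node entries.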