Telegraf

This section describes the alerts for the Telegraf service.


TelegrafGatherErrors

Available starting from the 2019.2.5 maintenance update

Severity

Major

Summary

Telegraf failed to gather metrics.

Raise condition

  • In 2019.2.9 and prior: rate(internal_agent_gather_errors[10m]) > 0

  • In 2019.2.10 and newer: rate(internal_agent_gather_errors{job!="remote_agent"}[10m]) > 0
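To illustrate what the raise condition checks, the following is a minimal Python sketch of a simplified counter rate. It is not Prometheus's actual rate() implementation (which also extrapolates to the window boundaries). The names simple_rate and should_raise and the sample values are hypothetical.

```python
from typing import List, Tuple

# One scraped sample: (unix timestamp in seconds, counter value).
Sample = Tuple[float, float]


def simple_rate(samples: List[Sample]) -> float:
    """Approximate the per-second increase of a counter over a window.

    Simplified stand-in for PromQL rate(): no extrapolation, and 0.0 is
    returned where rate() would return no data (fewer than two samples).
    """
    if len(samples) < 2:
        return 0.0
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop in value is treated as a counter reset (for example,
        # after a Telegraf restart), so the new value counts as the increase.
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else 0.0


def should_raise(samples: List[Sample]) -> bool:
    """Mirror the '> 0' comparison: any growth in the window raises the alert."""
    return simple_rate(samples) > 0
```

Under these assumptions, a single new gathering error inside the 10-minute window is enough to satisfy the condition, because any increase of the counter yields a rate above zero.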

Description

Raises when Telegraf on a node has reported metric gathering errors within the last 10 minutes. The host label of the raised alert contains the host name of the affected node.

Troubleshooting

Inspect the Telegraf logs by running journalctl -u telegraf on the affected node.
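When the log is long, it can help to narrow it down to error entries. The sketch below assumes that Telegraf tags error-level log lines with E! (alongside W!, I!, and D! for the other levels). The function name error_lines and the sample lines are made up for illustration and are not real output.

```python
from typing import Iterable, List


def error_lines(lines: Iterable[str]) -> List[str]:
    """Keep only lines that carry Telegraf's error-level tag 'E! '."""
    return [line.rstrip("\n") for line in lines if "E! " in line]


# Made-up sample lines in the general shape of Telegraf log output.
sample = [
    "2019-11-05T10:00:00Z I! Starting Telegraf\n",
    "2019-11-05T10:00:10Z E! [inputs.example] Error in plugin: timeout\n",
]

for line in error_lines(sample):
    print(line)
```

In practice, the input would be the lines produced by journalctl -u telegraf on the affected node rather than the hard-coded sample.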

Tuning

Not required

TelegrafRemoteGatherErrors

Available starting from the 2019.2.10 maintenance update

Severity

Major

Summary

Remote Telegraf failed to gather metrics.

Raise condition

rate(internal_agent_gather_errors{job="remote_agent"}[10m]) > 0
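In 2019.2.10 and newer, the job label decides which of the two Telegraf alerts a series can trigger. The following Python sketch illustrates that split. The function name alert_for and the example label values are hypothetical.

```python
from typing import Mapping


def alert_for(labels: Mapping[str, str]) -> str:
    """Return the alert whose label matcher selects a series with these labels."""
    # PromQL treats a missing label as an empty string, so the matcher
    # job!="remote_agent" also selects series that have no job label at all.
    if labels.get("job", "") == "remote_agent":
        return "TelegrafRemoteGatherErrors"
    return "TelegrafGatherErrors"


print(alert_for({"job": "remote_agent"}))
print(alert_for({"job": "telegraf", "host": "node01"}))
```

Because the two matchers are complementary, a given internal_agent_gather_errors series is evaluated by exactly one of the two alerts.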

Description

Raises when the remote Telegraf agent has reported metric gathering errors within the last 10 minutes.

Troubleshooting

Inspect the Telegraf monitoring_remote_agent service logs.

Tuning

Not required