Telegraf

This section describes the alerts for the Telegraf service.


TelegrafGatherErrors

Available starting from the 2019.2.5 maintenance update

Severity Major
Summary Telegraf failed to gather metrics.
Raise condition
  • In 2019.2.9 and prior: rate(internal_agent_gather_errors[10m]) > 0
  • In 2019.2.10 and newer: rate(internal_agent_gather_errors{job!="remote_agent"}[10m]) > 0
Description Raises when Telegraf has reported metric-gathering errors on a node within the last 10 minutes. The host label in the raised alert contains the host name of the affected node.
Troubleshooting Inspect the Telegraf logs by running journalctl -u telegraf on the affected node.
Tuning Not required
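
When the alert fires, it can help to narrow the journal output to error-level lines. The following is a minimal sketch, not part of the product: the helper name telegraf_error_lines is hypothetical, and it assumes the default Telegraf log format, in which error lines carry the E! marker.

```shell
#!/bin/sh
# Hypothetical helper: keep only Telegraf error-level lines read from stdin.
# Assumption: default Telegraf log format, where error lines contain " E! ".
telegraf_error_lines() {
    # "|| true" keeps the exit status zero when no error lines are found.
    grep -F ' E! ' || true
}

# Possible usage on the affected node, limited to the 10-minute alert window:
#   journalctl -u telegraf --since "10 minutes ago" --no-pager | telegraf_error_lines
```

The function reads from standard input so that it can be applied to saved log files as well as to live journalctl output.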

TelegrafRemoteGatherErrors

Available starting from the 2019.2.10 maintenance update

Severity Major
Summary Remote Telegraf failed to gather metrics.
Raise condition rate(internal_agent_gather_errors{job="remote_agent"}[10m]) > 0
Description Raises when the remote Telegraf agent has reported metric-gathering errors within the last 10 minutes.
Troubleshooting Inspect the Telegraf monitoring_remote_agent service logs.
Tuning Not required
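
Both raise conditions in this section compare the rate of the internal_agent_gather_errors counter over 10 minutes to zero, so any growth of the counter within the window triggers the alert. The sketch below illustrates that check on raw samples. It is a simplification, not the Prometheus implementation: the function name gather_error_rate and the input format are assumptions made for illustration, and unlike rate() it does not extrapolate to the window boundaries.

```shell
#!/bin/sh
# Simplified sketch of a "rate(counter[window]) > 0" check.
# Input on stdin: "<unix_timestamp> <counter_value>" lines, oldest first.
# Output: per-second increase followed by "firing" or "ok",
# or "no data" if fewer than two usable samples are given.
gather_error_rate() {
    awk '
        NR == 1 { first_t = $1 + 0; prev = $2 + 0; next }
        {
            cur = $2 + 0
            # A drop in the counter is treated as a reset to zero,
            # so the new value itself counts as growth.
            inc += (cur >= prev) ? cur - prev : cur
            prev = cur
            last_t = $1 + 0
        }
        END {
            if (NR < 2 || last_t == first_t) { print "no data"; exit }
            r = inc / (last_t - first_t)
            printf "%.4f %s\n", r, (r > 0 ? "firing" : "ok")
        }
    '
}

# Example: three samples spanning 10 minutes, with 3 new errors.
#   printf '%s\n' '0 5' '300 5' '600 8' | gather_error_rate
```

Because the threshold is zero, a single gather error within the window is enough to raise the alert.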