Keepalived


KeepalivedProcessDown

Severity Major
Summary The Keepalived process on the {{ $labels.host }} node is down.
Raise condition procstat_running{process_name="keepalived"} == 0
Description Raises when Keepalived on a particular host does not respond to Telegraf, typically indicating that Keepalived is down. The host label in the raised alert contains the host name of the affected node.
Troubleshooting
  • Verify the Keepalived status on the affected node using systemctl status keepalived.
  • Inspect the Keepalived logs on the affected node using journalctl -u keepalived.
  • Inspect the Telegraf logs on the affected node using journalctl -u telegraf.
Tuning Not required

KeepalivedProcessNotResponsive

Severity Major
Summary The Keepalived process on the {{ $labels.host }} node is not responding.
Raise condition keepalived_up == 0
Description Raises when Keepalived on a particular host does not respond to Telegraf, typically indicating that Keepalived is running but is not responsive on that node. The host label in the raised alert contains the host name of the affected node.
Troubleshooting
  • Verify the Keepalived status on the affected node using systemctl status keepalived.
  • Inspect the Keepalived logs on the affected node using journalctl -u keepalived.
  • Inspect the Telegraf logs on the affected node using journalctl -u telegraf.
Tuning Not required

KeepalivedFailedState

Severity Minor
Summary The Keepalived VRRP {{ $labels.name }} is in the FAILED state on the {{ $labels.host }} node.
Raise condition keepalived_state == 0
Description Raises when the Keepalived Virtual Router Redundancy Protocol (VRRP) is in the FAILED state on a node, typically indicating network issues. The host label in the raised alert contains the host name of the affected node.
Troubleshooting
  • Inspect the Keepalived logs on the affected node using journalctl -u keepalived.
  • Inspect the Telegraf logs on the affected node using journalctl -u telegraf.
  • Inspect the affected node for any network issues.
Tuning Not required

KeepalivedUnknownState

Severity Minor
Summary The Keepalived VRRP {{ $labels.name }} is in the UNKNOWN state on the {{ $labels.host }} node.
Raise condition keepalived_state == -1
Description Raises when the Keepalived Virtual Router Redundancy Protocol (VRRP) is in the UNKNOWN state on a node, typically indicating that Keepalived has improperly reported its state or Telegraf cannot gather the state. The host label in the raised alert contains the host name of the affected node.
Troubleshooting
  • Inspect the Keepalived logs on the affected node using journalctl -u keepalived.
  • Inspect the Telegraf logs on the affected node using journalctl -u telegraf.
Tuning Not required

KeepalivedMultipleIPAddr

Severity Major
Summary The Keepalived {{ $labels.ip }} virtual IP is assigned more than once.
Raise condition count(ipcheck_assigned) by (ip) > 1
Description Raises when the virtual IP address (VIP) of Keepalived is assigned more than once (on more than one node within a cluster).
Troubleshooting On each node of the Keepalived cluster (by default, the ctl nodes), verify whether the VIP is assigned to two or more nodes or interfaces using the ip a | grep VIP_address command.
Tuning Not required
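
The raise condition above counts the ipcheck_assigned series per ip label, regardless of sample value, and fires for any IP reported by more than one series. A minimal Python sketch of that grouping follows. The host label and the sample addresses are hypothetical and only illustrate the idea; the actual label set depends on the Telegraf input configuration.

```python
from collections import Counter


def duplicated_vips(samples):
    """Return {ip: series_count} for every IP reported by more than one series.

    `samples` is an iterable of label dicts, one per ipcheck_assigned series.
    This mirrors `count(ipcheck_assigned) by (ip) > 1`, which counts series
    and ignores their sample values.
    """
    counts = Counter(labels["ip"] for labels in samples)
    return {ip: n for ip, n in counts.items() if n > 1}


# Hypothetical label sets; the "host" label is an assumption for illustration.
samples = [
    {"host": "ctl01", "ip": "10.0.0.10"},
    {"host": "ctl02", "ip": "10.0.0.10"},
    {"host": "ctl03", "ip": "10.0.0.11"},
]

print(duplicated_vips(samples))
```

In this sketch, the IP reported by both ctl01 and ctl02 is the one that would raise the alert.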

KeepalivedServiceOutage

Severity Critical
Summary All Keepalived processes within the {{ $labels.cluster }} cluster are down.
Raise condition count(label_replace(procstat_running{process_name="keepalived"}, "cluster", "$1", "host", "([^0-9]+).+")) by (cluster) == count(label_replace(procstat_running{process_name="keepalived"} == 0, "cluster", "$1", "host", "([^0-9]+).+")) by (cluster)
Description Raises when all Keepalived services across the cluster do not respond to Telegraf, typically indicating configuration or deployment issues.
Troubleshooting
  • Inspect the KeepalivedProcessDown alerts for the host names of the affected nodes.
  • Inspect the Keepalived logs on the affected nodes using journalctl -u keepalived.
  • Inspect the Telegraf logs on the affected nodes using journalctl -u telegraf.
Tuning Not required
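
The raise condition above derives a cluster label from each host name with label_replace and then compares, per cluster, the total number of Keepalived series with the number of series whose value is 0. The following Python sketch reproduces that logic outside Prometheus, assuming host names such as ctl01 or cmp02 (hypothetical examples). PromQL regular expressions are fully anchored, so the sketch uses re.fullmatch. When the pattern does not match, label_replace leaves the series without a cluster label, which the sketch models as an empty string.

```python
import re
from collections import defaultdict

# Same pattern as in the alert expression: the leading non-digit part of the
# host name becomes the cluster name.
CLUSTER_RE = re.compile(r"([^0-9]+).+")


def cluster_of(host):
    """Derive the cluster label from a host name, or "" if there is no match."""
    match = CLUSTER_RE.fullmatch(host)
    return match.group(1) if match else ""


def clusters_in_outage(running_by_host):
    """Return the clusters in which every Keepalived process reports 0.

    `running_by_host` maps a host name to its procstat_running value.
    """
    total = defaultdict(int)
    down = defaultdict(int)
    for host, running in running_by_host.items():
        cluster = cluster_of(host)
        total[cluster] += 1
        if running == 0:
            down[cluster] += 1
    # The alert fires when both counts are equal for a cluster.
    return sorted(c for c in total if down[c] == total[c])


# Hypothetical sample values for illustration only.
running_by_host = {"ctl01": 0, "ctl02": 0, "ctl03": 0, "cmp01": 1, "cmp02": 0}
print(clusters_in_outage(running_by_host))
```

With these sample values, only the ctl cluster qualifies, because cmp01 still reports a running process.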