Keepalived

KeepalivedProcessDown
KeepalivedProcessNotResponsive
KeepalivedFailedState
KeepalivedUnknownState
KeepalivedMultipleIPAddr
KeepalivedServiceOutage

KeepalivedProcessDown

Severity	Major
Summary	The Keepalived process on the `{{ $labels.host }}` node is down.
Raise condition	`procstat_running{process_name="keepalived"} == 0`
Description	Raises when Telegraf does not find a running Keepalived process on a particular host, typically indicating that Keepalived is down. The `host` label in the raised alert contains the host name of the affected node.
Troubleshooting	Verify the Keepalived status on the affected node using `systemctl status keepalived`. Inspect the Keepalived logs on the affected node using `journalctl -u keepalived`. Inspect the Telegraf logs on the affected node using `journalctl -u telegraf`.
Tuning	Not required

KeepalivedProcessNotResponsive

Severity	Major
Summary	The Keepalived process on the `{{ $labels.host }}` node is not responding.
Raise condition	`keepalived_up == 0`
Description	Raises when Keepalived on a particular host does not respond to Telegraf, typically indicating that Keepalived is running but is not responsive on that node. The `host` label in the raised alert contains the host name of the affected node.
Troubleshooting	Verify the Keepalived status on the affected node using `systemctl status keepalived`. Inspect the Keepalived logs on the affected node using `journalctl -u keepalived`. Inspect the Telegraf logs on the affected node using `journalctl -u telegraf`.
Tuning	Not required
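The two process-level alerts use different metrics, so their raise conditions are evaluated independently and both can fire for the same host. The following Python sketch mirrors the two comparisons for a single host. It is not part of StackLight, and the per-host samples it takes are hypothetical inputs. A missing sample (`None`) is treated as "not firing", because a PromQL comparison against an absent series returns no result.

```python
from typing import Optional


def firing_process_alerts(
    procstat_running: Optional[float],
    keepalived_up: Optional[float],
) -> list[str]:
    """Return the process-level Keepalived alerts whose raise condition holds.

    Both arguments are hypothetical samples for one host:
    procstat_running{process_name="keepalived"} and keepalived_up.
    None means that no sample exists, so the comparison yields nothing.
    """
    alerts = []
    # KeepalivedProcessDown: procstat_running{process_name="keepalived"} == 0
    if procstat_running is not None and procstat_running == 0:
        alerts.append("KeepalivedProcessDown")
    # KeepalivedProcessNotResponsive: keepalived_up == 0
    if keepalived_up is not None and keepalived_up == 0:
        alerts.append("KeepalivedProcessNotResponsive")
    return alerts


# The process is running but does not answer Telegraf.
print(firing_process_alerts(procstat_running=1, keepalived_up=0))
# prints ['KeepalivedProcessNotResponsive']
```

If both alerts fire for the same host, checking `KeepalivedProcessDown` first is a reasonable order, since a stopped process would also explain a missing response.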

KeepalivedFailedState

Severity	Minor
Summary	The Keepalived VRRP `{{ $labels.name }}` is in the `FAILED` state on the `{{ $labels.host }}` node.
Raise condition	`keepalived_state == 0`
Description	Raises when a Keepalived Virtual Router Redundancy Protocol (VRRP) instance is in the `FAILED` state on a node, typically indicating network issues. The `host` label in the raised alert contains the host name of the affected node.
Troubleshooting	Inspect the Keepalived logs on the affected node using `journalctl -u keepalived`. Inspect the Telegraf logs on the affected node using `journalctl -u telegraf`. Inspect the affected node for any network issues.
Tuning	Not required

KeepalivedUnknownState

Severity	Minor
Summary	The Keepalived VRRP `{{ $labels.name }}` is in the `UNKNOWN` state on the `{{ $labels.host }}` node.
Raise condition	`keepalived_state == -1`
Description	Raises when a Keepalived Virtual Router Redundancy Protocol (VRRP) instance is in the `UNKNOWN` state on a node, typically indicating that Keepalived has reported its state incorrectly or that Telegraf cannot gather the state. The `host` label in the raised alert contains the host name of the affected node.
Troubleshooting	Inspect the Keepalived logs on the affected node using `journalctl -u keepalived`. Inspect the Telegraf logs on the affected node using `journalctl -u telegraf`.
Tuning	Not required
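Both VRRP state alerts compare the same `keepalived_state` metric against a fixed value: `0` for `FAILED` and `-1` for `UNKNOWN`. The Python sketch below applies these two comparisons to a hypothetical snapshot of samples keyed by the `host` and `name` labels and builds messages modeled on the alert summaries. Values other than `0` and `-1` are skipped, since this section does not document what they mean; the host and VRRP names in the example are made up.

```python
# keepalived_state values matched by the raise conditions in this section.
# Only 0 (FAILED) and -1 (UNKNOWN) are documented here, so no other values
# are mapped.
STATE_ALERTS = {
    0: ("FAILED", "KeepalivedFailedState"),
    -1: ("UNKNOWN", "KeepalivedUnknownState"),
}


def vrrp_state_alerts(samples: dict[tuple[str, str], float]) -> list[str]:
    """Build alert messages from a {(host, name): keepalived_state} snapshot."""
    messages = []
    for (host, name), value in sorted(samples.items()):
        # Float samples such as 0.0 and -1.0 compare equal to the int keys.
        match = STATE_ALERTS.get(value)
        if match is None:
            continue  # neither raise condition holds for this value
        state, alert = match
        messages.append(
            f"{alert}: The Keepalived VRRP {name} is in the {state} state "
            f"on the {host} node."
        )
    return messages


# Hypothetical samples; 2.0 stands for any value outside the two conditions.
samples = {
    ("ctl01", "VIP_1"): 0.0,
    ("ctl02", "VIP_1"): -1.0,
    ("ctl03", "VIP_1"): 2.0,
}
for message in vrrp_state_alerts(samples):
    print(message)
```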

KeepalivedMultipleIPAddr

Severity	Major
Summary	The Keepalived `{{ $labels.ip }}` virtual IP is assigned more than once.
Raise condition	`count(ipcheck_assigned) by (ip) > 1`
Description	Raises when a Keepalived virtual IP address (VIP) is assigned more than once, for example, on more than one node within a cluster. The `ip` label in the raised alert contains the affected VIP.
Troubleshooting	On each node of the Keepalived cluster (by default, the `ctl` nodes), verify whether the VIP is assigned on two or more nodes or interfaces using the `ip a | grep VIP_address` command, where `VIP_address` is the affected VIP.
Tuning	Not required
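Checking for a duplicated VIP by hand follows the same logic as `count(ipcheck_assigned) by (ip) > 1`: collect the addresses from every node and count where the VIP appears. The Python sketch below assumes that the output of `ip -o -4 addr show` (one IPv4 address per line) has already been collected from each node into a string; the node names, interface names, and addresses are hypothetical. It compares addresses exactly, which avoids the kind of false match that `grep 10.0.0.1` would produce on `10.0.0.10`.

```python
def parse_ipv4_addrs(output: str) -> list[tuple[str, str]]:
    """Return (interface, address) pairs from `ip -o -4 addr show` output.

    Each line is expected to look like:
    "2: ens3    inet 10.0.0.11/24 brd 10.0.0.255 scope global ens3 ..."
    """
    pairs = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[2] == "inet":
            # Strip the prefix length, for example "/24" or "/32".
            pairs.append((fields[1], fields[3].split("/")[0]))
    return pairs


def vip_locations(outputs: dict[str, str], vip: str) -> list[str]:
    """Return the "node/interface" entries on which the VIP is assigned."""
    locations = []
    for node, output in sorted(outputs.items()):
        for ifname, address in parse_ipv4_addrs(output):
            if address == vip:
                locations.append(f"{node}/{ifname}")
    return locations


# Hypothetical output collected from two controller nodes.
outputs = {
    "ctl01": "2: ens3    inet 10.0.0.11/24 brd 10.0.0.255 scope global ens3\n"
             "2: ens3    inet 10.0.0.100/32 scope global ens3",
    "ctl02": "2: ens3    inet 10.0.0.12/24 brd 10.0.0.255 scope global ens3\n"
             "2: ens3    inet 10.0.0.100/32 scope global ens3",
}
locations = vip_locations(outputs, "10.0.0.100")
print(locations)  # ['ctl01/ens3', 'ctl02/ens3']
print(len(locations) > 1)  # True: the VIP is assigned more than once
```

A VIP that is assigned twice on one node, on two different interfaces, also yields two entries, which matches the "two or more nodes or interfaces" check above.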

KeepalivedServiceOutage

Severity	Critical
Summary	All Keepalived processes within the `{{ $labels.cluster }}` cluster are down.
Raise condition	`count(label_replace(procstat_running{process_name="keepalived"}, "cluster", "$1", "host", "([^0-9]+).+")) by (cluster) == count(label_replace(procstat_running{process_name="keepalived"} == 0, "cluster", "$1", "host", "([^0-9]+).+")) by (cluster)`
Description	Raises when Telegraf reports the Keepalived process as down on every node of a cluster, typically indicating configuration or deployment issues. The cluster name is derived from the host names of the nodes.
Troubleshooting	Inspect the `KeepalivedProcessDown` alerts for the host names of the affected nodes. Inspect the Keepalived logs on the affected nodes using `journalctl -u keepalived`. Inspect the Telegraf logs on the affected nodes using `journalctl -u telegraf`.
Tuning	Not required
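The `KeepalivedServiceOutage` expression derives a `cluster` label from each host name with `label_replace` and the regular expression `([^0-9]+).+`, so `ctl01`, `ctl02`, and `ctl03` all map to `ctl`. It then fires for a cluster when the number of `procstat_running` series equals the number of those series that are `0`. The following Python sketch reproduces that grouping and comparison on hypothetical samples; it is an illustration of the expression, not StackLight code.

```python
import re
from collections import Counter

# Same pattern as the label_replace() calls in the raise condition.
HOST_PATTERN = re.compile(r"([^0-9]+).+")


def cluster_of(host: str) -> str:
    """Derive the cluster label from a host name, as label_replace() does."""
    # label_replace() anchors its regex, which fullmatch() reproduces.
    match = HOST_PATTERN.fullmatch(host)
    # On no match, label_replace() leaves the series without a cluster label,
    # so `by (cluster)` groups it under an empty value.
    return match.group(1) if match else ""


def clusters_in_outage(procstat_running: dict[str, float]) -> list[str]:
    """Return the clusters in which every Keepalived process is reported down."""
    total = Counter()
    down = Counter()
    for host, value in procstat_running.items():
        cluster = cluster_of(host)
        total[cluster] += 1
        if value == 0:
            down[cluster] += 1
    # The alert fires when the two per-cluster counts are equal.
    return sorted(cluster for cluster in total if down[cluster] == total[cluster])


# Hypothetical procstat_running{process_name="keepalived"} samples per host.
samples = {"ctl01": 0, "ctl02": 0, "ctl03": 0, "prx01": 1, "prx02": 0}
print(clusters_in_outage(samples))  # ['ctl']
```

The grouping only reflects the real clusters when host names follow a prefix-plus-number pattern such as `ctl01`, because the capture group keeps everything before the first digit.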

updated: 2025-01-10 08:56