Calico

This section describes the alerts for Calico.

CalicoProcessDown

Severity Minor
Summary The Calico {{ $labels.process_name }} process on the {{ $labels.host }} node is down for 2 minutes.
Raise condition procstat_running{process_name=~"calico-felix|bird|bird6|confd"} == 0
Description Raises when Telegraf cannot find a running calico-felix, bird, bird6, or confd process on a ctl host. The alert is raised separately for each affected process and host: the process_name and host labels in the raised alert contain the name of the affected process and the host name of the affected node, respectively.
Troubleshooting
  • Inspect dmesg and /var/log/kern.log
  • Inspect the logs in /var/log/calico
  • Inspect the output of the systemctl status containerd and journalctl -u containerd commands
Tuning Not required
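
As an illustration only, the raise condition above amounts to a filter over per-host samples of procstat_running. The Python sketch below assumes each sample is a (host, process_name, value) tuple. The tuple layout, the function name down_processes, and the host names used with it are hypothetical and not part of the alert definition. The 2-minute duration from the summary is not modeled.

```python
# Process names taken from the raise condition's process_name matcher.
CALICO_PROCESSES = {"calico-felix", "bird", "bird6", "confd"}


def down_processes(samples):
    """Return sorted (host, process_name) pairs whose value is 0.

    samples: iterable of (host, process_name, procstat_running value).
    This tuple layout is an assumption made for the sketch.
    """
    return sorted(
        (host, process)
        for host, process, running in samples
        # Mirrors procstat_running{process_name=~"..."} == 0
        if process in CALICO_PROCESSES and running == 0
    )
```

Each returned pair corresponds to one raised CalicoProcessDown alert, with the pair's elements playing the role of the host and process_name labels.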

CalicoProcessDownMinor

Severity Minor
Summary More than 30% of Calico {{ $labels.process_name }} processes are down for 2 minutes.
Raise condition count(procstat_running{process_name=~"calico-felix|bird|bird6|confd"} == 0) by (process_name) > count(procstat_running{process_name=~"calico-felix|bird|bird6|confd"}) by (process_name) * {{ instance_minor_threshold_percent }}
Description Raises when Telegraf cannot find a running calico-felix, bird, bird6, or confd process on more than 30% of the ctl hosts. The condition is evaluated separately for each process name, and the process_name label in the raised alert contains the name of the affected process.
Troubleshooting
  • Inspect the CalicoProcessDown alerts for the host names of the affected nodes
  • Inspect dmesg and /var/log/kern.log
  • Inspect the logs in /var/log/calico
  • Inspect the output of the systemctl status containerd and journalctl -u containerd commands
Tuning Not required

CalicoProcessDownMajor

Severity Major
Summary More than 60% of Calico {{ $labels.process_name }} processes are down for 2 minutes.
Raise condition count(procstat_running{process_name=~"calico-felix|bird|bird6|confd"} == 0) by (process_name) > count(procstat_running{process_name=~"calico-felix|bird|bird6|confd"}) by (process_name) * {{ instance_major_threshold_percent }}
Description Raises when Telegraf cannot find a running calico-felix, bird, bird6, or confd process on more than 60% of the ctl hosts. The condition is evaluated separately for each process name, and the process_name label in the raised alert contains the name of the affected process.
Troubleshooting
  • Inspect the CalicoProcessDown alerts for the host names of the affected nodes
  • Inspect dmesg and /var/log/kern.log
  • Inspect the logs in /var/log/calico
  • Inspect the output of the systemctl status containerd and journalctl -u containerd commands
Tuning Not required

CalicoProcessOutage

Severity Critical
Summary All Calico {{ $labels.process_name }} processes are down for 2 minutes.
Raise condition count(procstat_running{process_name=~"calico-felix|bird|bird6|confd"}) by (process_name) == count(procstat_running{process_name=~"calico-felix|bird|bird6|confd"} == 0) by (process_name)
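Description Raises when Telegraf cannot find a running calico-felix, bird, bird6, or confd process on any of the ctl hosts, that is, the process is down on all of them. The condition is evaluated separately for each process name, and the process_name label in the raised alert contains the name of the affected process.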
Troubleshooting
  • Inspect the CalicoProcessDown alerts for the host names of the affected nodes
  • Inspect dmesg and /var/log/kern.log
  • Inspect the logs in /var/log/calico
  • Inspect the output of the systemctl status containerd and journalctl -u containerd commands
Tuning Not required
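
The three aggregate raise conditions (CalicoProcessDownMinor, CalicoProcessDownMajor, and CalicoProcessOutage) compare, per process name, the number of hosts where the process is down with the total number of hosts that report it. The Python sketch below is a rough model of that comparison, not the alerting implementation. It assumes that the instance_minor_threshold_percent and instance_major_threshold_percent template variables render to 0.3 and 0.6, which is inferred from the 30% and 60% figures in the summaries. The function name firing_alerts and the (host, process_name, value) sample layout are hypothetical. Integer percentages replace the fractional multiplication to avoid floating-point rounding at the boundary, and the 2-minute duration is not modeled.

```python
from collections import defaultdict

# Assumed threshold values, inferred from the alert summaries (30% and 60%).
MINOR_PERCENT = 30
MAJOR_PERCENT = 60


def firing_alerts(samples):
    """Return {process_name: [alert names]} for the aggregate alerts.

    samples: iterable of (host, process_name, procstat_running value).
    """
    total = defaultdict(int)  # count(procstat_running) by (process_name)
    down = defaultdict(int)   # count(procstat_running == 0) by (process_name)
    for _host, process, running in samples:
        total[process] += 1
        if running == 0:
            down[process] += 1

    result = {}
    for process, n_total in total.items():
        n_down = down.get(process, 0)
        if n_down == 0:
            # No down instances: the "== 0" selector matches nothing,
            # so none of the aggregate conditions can hold.
            continue
        alerts = []
        # Strict ">" as in the raise conditions: exactly 30% does not fire.
        if n_down * 100 > n_total * MINOR_PERCENT:
            alerts.append("CalicoProcessDownMinor")
        if n_down * 100 > n_total * MAJOR_PERCENT:
            alerts.append("CalicoProcessDownMajor")
        if n_down == n_total:
            alerts.append("CalicoProcessOutage")
        if alerts:
            result[process] = alerts
    return result
```

Because each condition is evaluated independently, a full outage of a process satisfies all three conditions at once in this model. With 10 reporting hosts, 3 down instances do not satisfy the Minor condition, because the comparison is strict, while 4 do.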