Calico

This section describes the alerts for Calico.

CalicoProcessDown

Severity

Minor

Summary

The Calico {{ $labels.process_name }} process on the {{ $labels.host }} node is down for 2 minutes.

Raise condition

procstat_running{process_name=~"calico-felix|bird|bird6|confd"} == 0

Description

Raises when Telegraf cannot find a running calico-felix, bird, bird6, or confd process on a ctl host. The process_name and host labels of the raised alert contain the name of the affected process and the host name of the affected node, respectively.
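As a rough illustration of this raise condition, the following Python sketch mimics how the expression selects series: one alert per (process_name, host) pair whose procstat_running value is 0. The sample values and the host names ctl01 and ctl02 are hypothetical, and the 2-minute hold period mentioned in the summary is not modeled.

```python
from typing import Dict, List, Tuple

# Process names matched by the process_name regex in the raise condition.
WATCHED = ("calico-felix", "bird", "bird6", "confd")


def process_down_alerts(samples: Dict[Tuple[str, str], int]) -> List[Tuple[str, str]]:
    """Return the (process_name, host) pairs whose procstat_running value is 0."""
    return sorted(
        (proc, host)
        for (proc, host), running in samples.items()
        if proc in WATCHED and running == 0
    )


# Hypothetical samples keyed by (process_name, host).
samples = {
    ("calico-felix", "ctl01"): 1,
    ("bird", "ctl01"): 0,
    ("confd", "ctl02"): 0,
    ("sshd", "ctl02"): 0,  # ignored: not one of the watched process names
}

print(process_down_alerts(samples))
```

Each returned pair corresponds to one alert instance carrying its own process_name and host labels.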

Troubleshooting

  • Inspect dmesg and /var/log/kern.log

  • Inspect the logs in /var/log/calico

  • Inspect the output of the systemctl status containerd and journalctl -u containerd commands

Tuning

Not required

CalicoProcessDownMinor

Severity

Minor

Summary

More than 30% of Calico {{ $labels.process_name }} processes are down for 2 minutes.

Raise condition

count(procstat_running{process_name=~"calico-felix|bird|bird6|confd"} == 0) by (process_name) > count(procstat_running{process_name=~"calico-felix|bird|bird6|confd"}) by (process_name) * {{ instance_minor_threshold_percent }}

Description

Raises when Telegraf cannot find a running calico-felix, bird, bird6, or confd process on more than 30% of the ctl hosts. The process_name label of the raised alert contains the name of the affected process.
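The comparison in the raise condition can be sketched in Python as follows. This is an approximation under stated assumptions: the template variable is assumed to render to 0.3 (inferred from the "more than 30%" wording), the threshold is passed as a whole percentage so that the check uses integer arithmetic, and the host names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def processes_over_threshold(
    samples: Dict[Tuple[str, str], int], threshold_percent: int
) -> List[str]:
    """Return process names that are down on more than threshold_percent of hosts."""
    total = defaultdict(int)  # hosts reporting the process at all
    down = defaultdict(int)   # hosts reporting procstat_running == 0
    for (proc, _host), running in samples.items():
        total[proc] += 1
        if running == 0:
            down[proc] += 1
    # Same test as down > total * (threshold_percent / 100), without floats.
    return sorted(
        proc for proc in down if down[proc] * 100 > total[proc] * threshold_percent
    )


# Hypothetical cluster of three ctl hosts: bird is down on one of them.
samples = {
    ("bird", "ctl01"): 0,
    ("bird", "ctl02"): 1,
    ("bird", "ctl03"): 1,
    ("confd", "ctl01"): 1,
    ("confd", "ctl02"): 1,
    ("confd", "ctl03"): 1,
}

print(processes_over_threshold(samples, 30))
```

The comparison is strict: a process that is down on exactly 30% of the hosts does not satisfy the condition.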

Troubleshooting

  • Inspect the CalicoProcessDown alerts for the host names of the affected nodes

  • Inspect dmesg and /var/log/kern.log

  • Inspect the logs in /var/log/calico

  • Inspect the output of the systemctl status containerd and journalctl -u containerd commands

Tuning

Not required

CalicoProcessDownMajor

Severity

Major

Summary

More than 60% of Calico {{ $labels.process_name }} processes are down for 2 minutes.

Raise condition

count(procstat_running{process_name=~"calico-felix|bird|bird6|confd"} == 0) by (process_name) > count(procstat_running{process_name=~"calico-felix|bird|bird6|confd"}) by (process_name) * {{ instance_major_threshold_percent }}

Description

Raises when Telegraf cannot find a running calico-felix, bird, bird6, or confd process on more than 60% of the ctl hosts. The process_name label of the raised alert contains the name of the affected process.
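Because the comparison is strict, the number of affected hosts needed to satisfy the condition depends on the cluster size. The following Python sketch works through that arithmetic for a few hypothetical cluster sizes. It assumes the thresholds render to 0.3 and 0.6, as the alert summaries suggest; the function name is illustrative only.

```python
from typing import Optional


def min_hosts_down(total_hosts: int, threshold_percent: int) -> Optional[int]:
    """Smallest host count n for which n > total_hosts * threshold_percent / 100."""
    if total_hosts <= 0:
        return None  # no hosts report the process, so nothing can be compared
    return (total_hosts * threshold_percent) // 100 + 1


# Columns: ctl hosts, hosts down for the 30% threshold, hosts down for the 60% threshold.
for hosts in (3, 5, 10):
    print(hosts, min_hosts_down(hosts, 30), min_hosts_down(hosts, 60))
```

For example, with five ctl hosts the 60% threshold requires four hosts without the running process, since three hosts equal exactly 60% and do not exceed it.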

Troubleshooting

  • Inspect the CalicoProcessDown alerts for the host names of the affected nodes

  • Inspect dmesg and /var/log/kern.log

  • Inspect the logs in /var/log/calico

  • Inspect the output of the systemctl status containerd and journalctl -u containerd commands

Tuning

Not required

CalicoProcessOutage

Severity

Critical

Summary

All Calico {{ $labels.process_name }} processes are down for 2 minutes.

Raise condition

count(procstat_running{process_name=~"calico-felix|bird|bird6|confd"}) by (process_name) == count(procstat_running{process_name=~"calico-felix|bird|bird6|confd"} == 0) by (process_name)

Description

Raises when Telegraf cannot find a running instance of calico-felix, bird, bird6, or confd on any of the ctl hosts, that is, the process is down on every ctl host. The process_name label of the raised alert contains the name of the affected process.
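The equality check in the raise condition can be sketched in Python as follows. The sample data and host names are hypothetical. The sketch also reflects one consequence of how the expression is written: a process for which no procstat_running series exists at all has nothing to compare, so it does not satisfy the condition.

```python
from collections import Counter
from typing import Dict, List, Tuple


def outage_processes(samples: Dict[Tuple[str, str], int]) -> List[str]:
    """Return process names whose procstat_running value is 0 on every reporting host."""
    total = Counter(proc for (proc, _host) in samples)
    down = Counter(
        proc for (proc, _host), running in samples.items() if running == 0
    )
    # Only processes with at least one zero-valued sample appear in `down`.
    return sorted(proc for proc in down if down[proc] == total[proc])


# Hypothetical samples: bird is down everywhere, confd on one host only,
# and calico-felix is not reported by any host.
samples = {
    ("bird", "ctl01"): 0,
    ("bird", "ctl02"): 0,
    ("confd", "ctl01"): 0,
    ("confd", "ctl02"): 1,
}

print(outage_processes(samples))
```

In this example only bird qualifies; confd still runs on one host, and calico-felix is absent from the samples altogether.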

Troubleshooting

  • Inspect the CalicoProcessDown alerts for the host names of the affected nodes

  • Inspect dmesg and /var/log/kern.log

  • Inspect the logs in /var/log/calico

  • Inspect the output of the systemctl status containerd and journalctl -u containerd commands

Tuning

Not required