Calico
This section describes the alerts for Calico.
CalicoProcessDown
Severity |
Minor |
Summary |
The Calico {{ $labels.process_name }} process on the
{{ $labels.host }} node is down for 2 minutes. |
Raise condition |
procstat_running{process_name=~"calico-felix|bird|bird6|confd"} == 0 |
Description |
Raises when Telegraf cannot find a running process named
calico-felix, bird, bird6, or confd on any ctl host.
The process_name and host labels in the raised alert contain the
name of the affected process and the host name of the affected node,
respectively. |
Troubleshooting |
- Inspect dmesg and /var/log/kern.log
- Inspect the logs in /var/log/calico
- Inspect the output of the systemctl status containerd and journalctl -u containerd commands
|
Tuning |
Not required |
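
The CalicoProcessDown raise condition can be illustrated outside Prometheus. The following is a minimal Python sketch, not StackLight code: the `Sample` type, the `processes_down` helper, and the host names are hypothetical, and the samples only stand in for `procstat_running` series that carry `process_name` and `host` labels.

```python
import re
from typing import NamedTuple


# Hypothetical stand-in for one procstat_running series (not a Telegraf type).
class Sample(NamedTuple):
    host: str
    process_name: str
    value: int  # number of running processes reported for this name


# Same alternation as the process_name matcher in the raise condition.
# Prometheus regex matchers are fully anchored, hence fullmatch() below.
CALICO_PROCESSES = re.compile(r"calico-felix|bird|bird6|confd")


def processes_down(samples):
    """Return (host, process_name) pairs the expression would select."""
    return [
        (s.host, s.process_name)
        for s in samples
        if CALICO_PROCESSES.fullmatch(s.process_name) and s.value == 0
    ]


samples = [
    Sample("ctl01", "calico-felix", 1),
    Sample("ctl01", "bird", 0),
    Sample("ctl02", "confd", 1),
    Sample("ctl02", "nginx", 0),  # ignored: not a Calico process name
]
down = processes_down(samples)
print(down)
```

Each returned pair corresponds to one raised alert, with the pair's values appearing in the `host` and `process_name` labels.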
CalicoProcessDownMinor
Severity |
Minor |
Summary |
More than 30% of Calico {{ $labels.process_name }} processes are
down for 2 minutes. |
Raise condition |
count(procstat_running{process_name=~"calico-felix|bird|bird6|confd"}
== 0) by (process_name) >
count(procstat_running{process_name=~"calico-felix|bird|bird6|confd"})
by (process_name) * {{ instance_minor_threshold_percent }} |
Description |
Raises when Telegraf cannot find a running process named
calico-felix, bird, bird6, or confd on more than 30% of the
ctl hosts. The process_name label in the raised alert contains
the name of the affected process. |
Troubleshooting |
- Inspect the CalicoProcessDown alerts for the host names of the affected nodes
- Inspect dmesg and /var/log/kern.log
- Inspect the logs in /var/log/calico
- Inspect the output of the systemctl status containerd and journalctl -u containerd commands
|
Tuning |
Not required |
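
The percentage-based raise condition compares, per process name, the number of hosts where the process is down with a fraction of all hosts that report it. The sketch below mirrors that comparison in Python under stated assumptions: the `processes_over_threshold` helper and the sample data are hypothetical, and the threshold is assumed to be a fraction such as 0.3, inferred from the 30% in the summary rather than from the actual value of `instance_minor_threshold_percent`.

```python
from collections import defaultdict


def processes_over_threshold(samples, threshold):
    """samples: iterable of (host, process_name, value) tuples.

    Returns the process names for which the number of hosts reporting
    value == 0 is strictly greater than total hosts * threshold.
    """
    total = defaultdict(int)
    down = defaultdict(int)
    for _host, name, value in samples:
        total[name] += 1
        if value == 0:
            down[name] += 1
    # Names with no down hosts never appear in `down`, just as the
    # left-hand side of the expression yields no series for them.
    return sorted(name for name in down if down[name] > total[name] * threshold)


samples = [
    ("ctl01", "bird", 0),
    ("ctl02", "bird", 1),
    ("ctl03", "bird", 1),
    ("ctl01", "confd", 1),
    ("ctl02", "confd", 1),
    ("ctl03", "confd", 1),
]
flagged = processes_over_threshold(samples, 0.3)  # assumed threshold value
print(flagged)
```

With one of three hosts down, `bird` exceeds an assumed 0.3 threshold but would not exceed an assumed 0.6 threshold, which is the same comparison with a higher fraction.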
CalicoProcessDownMajor
Severity |
Major |
Summary |
More than 60% of Calico {{ $labels.process_name }} processes are
down for 2 minutes. |
Raise condition |
count(procstat_running{process_name=~"calico-felix|bird|bird6|confd"}
== 0) by (process_name) >
count(procstat_running{process_name=~"calico-felix|bird|bird6|confd"})
by (process_name) * {{ instance_major_threshold_percent }} |
Description |
Raises when Telegraf cannot find a running process named
calico-felix, bird, bird6, or confd on more than 60% of the
ctl hosts. The process_name label in the raised alert contains
the name of the affected process. |
Troubleshooting |
- Inspect the CalicoProcessDown alerts for the host names of the affected nodes
- Inspect dmesg and /var/log/kern.log
- Inspect the logs in /var/log/calico
- Inspect the output of the systemctl status containerd and journalctl -u containerd commands
|
Tuning |
Not required |
CalicoProcessOutage
Severity |
Critical |
Summary |
All Calico {{ $labels.process_name }} processes are down for 2
minutes. |
Raise condition |
count(procstat_running{process_name=~"calico-felix|bird|bird6|confd"})
by (process_name) ==
count(procstat_running{process_name=~"calico-felix|bird|bird6|confd"}
== 0) by (process_name) |
Description |
Raises when Telegraf cannot find a running process named
calico-felix, bird, bird6, or confd on all ctl hosts.
The process_name label in the raised alert contains the name of
the affected process. |
Troubleshooting |
- Inspect the CalicoProcessDown alerts for the host names of the affected nodes
- Inspect dmesg and /var/log/kern.log
- Inspect the logs in /var/log/calico
- Inspect the output of the systemctl status containerd and journalctl -u containerd commands
|
Tuning |
Not required |
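
The CalicoProcessOutage condition holds for a process name only when the count of all reporting hosts equals the count of hosts where the process is down. A minimal Python sketch of that equality check follows; the `processes_in_outage` helper and the sample data are hypothetical and only approximate how the two sides of the expression are matched by `process_name`.

```python
from collections import Counter


def processes_in_outage(samples):
    """samples: iterable of (host, process_name, value) tuples.

    Returns the process names that are down on every host reporting them.
    """
    total = Counter(name for _host, name, _value in samples)
    down = Counter(name for _host, name, value in samples if value == 0)
    # A name with no down hosts is absent from `down`, so it can never
    # match, mirroring an empty right-hand side in the expression.
    return sorted(name for name, count in down.items() if count == total[name])


samples = [
    ("ctl01", "calico-felix", 0),
    ("ctl02", "calico-felix", 0),
    ("ctl01", "bird", 0),
    ("ctl02", "bird", 1),
]
outage = processes_in_outage(samples)
print(outage)
```

Here `calico-felix` is down on both hosts and is reported, while `bird` still runs on one host and is not.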