This section describes the alerts for the Open vSwitch (OVS) processes.
Warning

The OVSInstanceArpingCheckDown alert is available starting from the
MCP 2019.2.4 update. The OVSTooManyPortRunningOnAgent, OVSErrorOnPort,
OVSNonInternalPortDown, and OVSGatherFailed alerts are available
starting from the MCP 2019.2.6 update.

Available starting from the 2019.2.3 maintenance update
| Severity | Warning |
|---|---|
| Summary | The ovs-vswitchd process consumes more than 20% of system memory. |
| Raise condition | procstat_memory_vms{process_name="ovs-vswitchd"} / on(host) mem_total > 0.2 |
| Description | Raises when the virtual memory of the ovs-vswitchd process exceeds 20% of the host memory. |
| Tuning | Not required |
Available starting from the 2019.2.3 maintenance update
| Severity | Critical |
|---|---|
| Summary | The ovs-vswitchd process consumes more than 30% of system memory. |
| Raise condition | procstat_memory_vms{process_name="ovs-vswitchd"} / on(host) mem_total > 0.3 |
| Description | Raises when the virtual memory of the ovs-vswitchd process exceeds 30% of the host memory. |
| Tuning | Not required |
OVSInstanceArpingCheckDown

Available starting from the 2019.2.4 maintenance update
| Severity | Major |
|---|---|
| Summary | The OVS instance arping check is down. |
| Raise condition | instance_arping_check_up == 0 |
| Description | Raises when the OVS instance arping check on the {{ $labels.host }} node is down for 2 minutes. The host label in the raised alert contains the affected node name. |
| Tuning | Not required |
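The raise condition, duration, and severity in the table map onto a standard Prometheus alerting rule. The following YAML is a minimal sketch of such a rule, assuming the stock expression and a 2-minute pending period; the rule that StackLight actually generates may differ in group name and annotation layout:

```yaml
groups:
  - name: openvswitch-arping
    rules:
      - alert: OVSInstanceArpingCheckDown
        # Fires when the arping check metric reports 0 for a host
        expr: instance_arping_check_up == 0
        # The condition must hold for 2 minutes before the alert raises
        for: 2m
        labels:
          severity: major
        annotations:
          summary: The OVS instance arping check is down
          description: >-
            The OVS instance arping check on the {{ $labels.host }}
            node is down for 2 minutes.
```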
OVSTooManyPortRunningOnAgent

Available starting from the 2019.2.6 maintenance update
| Severity | Major |
|---|---|
| Summary | The number of OVS ports is {{ $value }} (ovs-vsctl list port) on the {{ $labels.host }} host, which is more than the expected limit. |
| Raise condition | sum by (host) (ovs_bridge_status) > 1500 |
| Description | Raises when too many networks are created or OVS does not properly clean up the OVS ports. OVS may malfunction if too many ports are assigned to a single agent. Warning: For production environments, configure the alert after deployment. |
| Troubleshooting | Inspect the OVS ports on the affected node using ovs-vsctl list port. |
| Tuning | The threshold can be changed by overriding the alert expression on the cluster level of the Reclass model; see the sketch after this table. |
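In MCP, StackLight alert thresholds are typically customized by redefining the alert under prometheus:server:alert in the cluster model. The snippet below is a minimal sketch of such an override, assuming a hypothetical new threshold of 2000 ports; verify the parameter path and expression against your cluster model before applying it:

```yaml
parameters:
  prometheus:
    server:
      alert:
        OVSTooManyPortRunningOnAgent:
          # Raise the port-count threshold from the default 1500 to 2000
          if: >-
            sum by (host) (ovs_bridge_status) > 2000
```

After changing the model, reapply the Prometheus server state from the Salt Master node, for example, salt 'I@prometheus:server' state.sls prometheus.server, so that the new threshold takes effect.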
OVSErrorOnPort

Available starting from the 2019.2.6 maintenance update
| Severity | Critical |
|---|---|
| Summary | The {{ $labels.port }} OVS port on the {{ $labels.bridge }} bridge running on the {{ $labels.host }} host is reporting errors. |
| Raise condition | ovs_bridge_status == 2 |
| Description | Raises when an OVS port reports errors, indicating that the port is not working properly. |
| Troubleshooting | |
| Tuning | Not required |
OVSNonInternalPortDown

Available starting from the 2019.2.6 maintenance update
| Severity | Critical |
|---|---|
| Summary | The {{ $labels.port }} OVS port on the {{ $labels.bridge }} bridge running on the {{ $labels.host }} host is down. |
| Raise condition | ovs_bridge_status{type!="internal"} == 0 |
| Description | Raises when the port on the OVS bridge is in the DOWN state, which may lead to an unexpected network disturbance. |
| Troubleshooting | |
| Tuning | Not required |
OVSGatherFailed

Available starting from the 2019.2.6 maintenance update
| Severity | Critical |
|---|---|
| Summary | Failure to gather the OVS information on the {{ $labels.host }} host. |
| Raise condition | ovs_bridge_check == 0 |
| Description | Raises when the check script for the OVS bridge fails to gather data. OVS is not monitored. |
| Troubleshooting | Run /usr/local/bin/ovs_parse_bridge.py from the affected host and inspect the output. |
| Tuning | Not required |