Open vSwitch

This section describes the alerts for the Open vSwitch (OVS) processes.

Warning

  • Monitoring of the OVS processes is available starting from the MCP 2019.2.3 update.
  • The OVSInstanceArpingCheckDown alert is available starting from the MCP 2019.2.4 update.
  • The OVSTooManyPortRunningOnAgent, OVSErrorOnPort, OVSNonInternalPortDown and OVSGatherFailed alerts are available starting from the MCP 2019.2.6 update.

ProcessOVSVswitchdMemoryWarning

Available starting from the 2019.2.3 maintenance update

Severity Warning
Summary The ovs-vswitchd process consumes more than 20% of system memory.
Raise condition procstat_memory_vms{process_name="ovs-vswitchd"} / on(host) mem_total > 0.2
Description Raises when the virtual memory consumed by the ovs-vswitchd process exceeds 20% of the total host memory.
Tuning Not required
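
To cross-check the reported value directly on the affected node, compare the virtual memory size of the ovs-vswitchd process with the total host memory. This is only a rough manual equivalent of the alert expression; it assumes that procstat_memory_vms maps to the VSZ value reported by ps, which is how the Telegraf procstat input typically populates this metric:

  # Virtual memory size (VSZ, in KiB) of ovs-vswitchd versus the host total.
  ps -C ovs-vswitchd -o pid,vsz,rss,%mem,cmd
  grep MemTotal /proc/meminfo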

ProcessOVSVswitchdMemoryCritical

Available starting from the 2019.2.3 maintenance update

Severity Critical
Summary The ovs-vswitchd process consumes more than 30% of system memory.
Raise condition procstat_memory_vms{process_name="ovs-vswitchd"} / on(host) mem_total > 0.3
Description Raises when the virtual memory consumed by the ovs-vswitchd process exceeds 30% of the total host memory.
Tuning Not required

OVSInstanceArpingCheckDown

Available starting from the 2019.2.4 maintenance update

Severity Major
Summary The OVS instance arping check is down.
Raise condition instance_arping_check_up == 0
Description Raises when the OVS instance arping check on the {{ $labels.host }} node is down for 2 minutes. The host label in the raised alert contains the affected node name.
Tuning Not required
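
If the alert fires, you can manually verify ARP reachability of the affected workloads, for example from a Neutron namespace on the corresponding node. The namespace name, interface, and IP address below are placeholders, and the exact mechanics of the StackLight arping check may differ from this generic sketch:

  # List the available namespaces, then send a few ARP requests from one of them.
  ip netns list
  ip netns exec qdhcp-<network_id> arping -c 3 -I <interface> <instance_ip>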

OVSTooManyPortRunningOnAgent

Available starting from the 2019.2.6 maintenance update

Severity Major
Summary The number of OVS ports is {{ $value }} (ovs-vsctl list port) on the {{ $labels.host }} host, which is more than the expected limit.
Raise condition sum by (host) (ovs_bridge_status) > 1500
Description

Raises when too many networks are created or OVS does not properly clean up the OVS ports. OVS may malfunction if too many ports are assigned to a single agent.

Warning

For production environments, configure the alert after deployment.

Troubleshooting
  • Run ovs-vsctl show from the affected node and openstack port list from the OpenStack controller nodes, then inspect the existing ports (see the example after this list).
  • Remove the unneeded ports or redistribute the OVS ports.
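
For example, to obtain a quick per-bridge port count on the affected node (br-int is used only as a typical bridge name; substitute the bridges listed in the ovs-vsctl show output):

  # Show the OVS configuration, then count the ports attached to a bridge.
  ovs-vsctl show
  ovs-vsctl list-ports br-int | wc -l
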
Tuning

For example, to change the threshold to 1600:

  1. On the cluster level of the Reclass model, create a common file for all alert customizations. Skip this step if such a file already exists.

    1. Create a file for alert customizations:

      touch cluster/<cluster_name>/stacklight/custom/alerts.yml
      
    2. Define the new file in cluster/<cluster_name>/stacklight/server.yml:

      classes:
      - cluster.<cluster_name>.stacklight.custom.alerts
      ...
      
  2. In the defined alert customizations file, modify the alert threshold by overriding the if parameter:

    parameters:
      prometheus:
        server:
          alert:
            OVSTooManyPortRunningOnAgent:
              if: >-
                sum by (host) (ovs_bridge_status) > 1600
    
  3. From the Salt Master node, apply the changes:

    salt 'I@prometheus:server' state.sls prometheus.server
    
  4. Verify the updated alert definition in the Prometheus web UI.
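
You can also confirm from the command line that the new threshold has been loaded by querying the Prometheus HTTP API. The command below is a sketch: it assumes Prometheus 2.2 or later, which exposes the /api/v1/rules endpoint, and Python available for pretty-printing; substitute the host and port of your deployment:

  curl -s http://<prometheus_host>:<prometheus_port>/api/v1/rules | \
    python -m json.tool | grep -A 6 '"OVSTooManyPortRunningOnAgent"'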

OVSErrorOnPort

Available starting from the 2019.2.6 maintenance update

Severity Critical
Summary The {{ $labels.port }} OVS port on the {{ $labels.bridge }} bridge running on the {{ $labels.host }} host is reporting errors.
Raise condition ovs_bridge_status == 2
Description Raises when an OVS port reports errors, indicating that the port is not working properly.
Troubleshooting
  1. From the affected node, run ovs-vsctl show.
  2. Inspect the output for error entries, as in the example below.
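
For example, to narrow the output down to the error entries and, optionally, to query the error column of the Interface table directly:

  # Show only the lines around error entries, keeping the owning port as context.
  ovs-vsctl show | grep -i -B 2 error
  # Optionally, list the name and error columns of all interfaces.
  ovs-vsctl --columns=name,error list Interface
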
Tuning Not required

OVSNonInternalPortDown

Available starting from the 2019.2.6 maintenance update

Severity Critical
Summary The {{ $labels.port }} OVS port on the {{ $labels.bridge }} bridge running on the {{ $labels.host }} host is down.
Raise condition ovs_bridge_status{type!="internal"} == 0
Description Raises when the port on the OVS bridge is in the DOWN state, which may lead to an unexpected network disturbance.
Troubleshooting
  1. From the affected node, run ip a to verify whether the port is in the DOWN state.
  2. If required, bring the port up using ifconfig <interface> up, as in the example after this list.
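
A minimal iproute2-only sequence for the same steps is shown below; the interface name is a placeholder, and ip link set can be used instead of ifconfig on hosts where the legacy net-tools package is not installed:

  # Check the operational state of the port, then bring it up.
  ip link show dev <interface>
  ip link set dev <interface> up
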
Tuning Not required

OVSGatherFailed

Available starting from the 2019.2.6 maintenance update

Severity Critical
Summary Failure to gather the OVS information on the {{ $labels.host }} host.
Raise condition ovs_bridge_check == 0
Description Raises when the check script for the OVS bridge fails to gather data. In this case, OVS is not monitored.
Troubleshooting Run /usr/local/bin/ovs_parse_bridge.py from the affected host and inspect the output.
Tuning Not required