Neutron

This section describes the alerts for Neutron.

NeutronApiOutage
NeutronApiEndpointDown
NeutronApiEndpointsDownMajor
NeutronApiEndpointsOutage
NeutronAgentDown
NeutronAgentsDownMinor
NeutronAgentsDownMajor
NeutronAgentsOutage
NeutronErrorLogsTooHigh

NeutronApiOutage

Removed since the 2019.2.11 maintenance update

Severity	Critical
Summary	Neutron API is not accessible for the Neutron endpoint in the OpenStack service catalog.
Raise condition	`openstack_api_check_status{name="neutron"} == 0`
Description	Raises when the checks against all available internal Neutron endpoints in the OpenStack service catalog do not pass. Telegraf sends HTTP requests to the URLs from the OpenStack service catalog and compares the expected and actual HTTP response codes. The expected response code for Neutron is `200`. For a list of all available endpoints, run `openstack endpoint list`.
Troubleshooting	Verify the availability of internal Neutron endpoints (URLs) from the output of `openstack endpoint list`.
Tuning	Not required
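The catalog-based check described above can be sketched in Python as follows. This is a minimal, hypothetical model rather than the Telegraf implementation: it assumes that the output of `openstack endpoint list -f json` uses the `Service Name`, `Interface`, and `URL` columns, and the helper names `internal_neutron_urls` and `api_check_status` are illustrative only. Collecting the response codes is left out; `api_check_status` only aggregates them and returns `0`, the value that the raise condition matches, when no internal endpoint answered with `200`.

```python
import json
from typing import Dict, List, Optional


def internal_neutron_urls(endpoint_list_json: str) -> List[str]:
    """Return the internal Neutron URLs from `openstack endpoint list -f json` output.

    Assumption: the JSON records use the "Service Name", "Interface", and "URL" columns.
    """
    return [
        endpoint["URL"]
        for endpoint in json.loads(endpoint_list_json)
        if endpoint.get("Service Name") == "neutron"
        and endpoint.get("Interface") == "internal"
    ]


def api_check_status(codes_by_url: Dict[str, Optional[int]], expected: int = 200) -> int:
    """Return 1 if any internal endpoint answered with the expected code, else 0.

    `codes_by_url` maps each URL to its HTTP status code, or None if unreachable.
    Assumption of this sketch: an empty mapping (no endpoints) counts as a failure.
    """
    return int(any(code == expected for code in codes_by_url.values()))
```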

NeutronApiEndpointDown

Severity	Minor
Summary	The `neutron-api` endpoint on the `{{ $labels.host }}` node is not accessible for 2 minutes.
Raise condition	`http_response_status{name="neutron-api"} == 0`
Description	Raises when the check against a Neutron API endpoint does not pass, typically indicating that the service endpoint is down or unreachable due to connectivity issues. Telegraf sends a request to the URL configured in `/etc/telegraf/telegraf.d/input-http_response.conf` on the corresponding node and compares the expected and actual HTTP response codes from the configuration file. The `host` label in the raised alert contains the host name of the affected node.
Troubleshooting	Inspect the Telegraf logs using `journalctl -u telegraf` or in the `/var/log/telegraf` directory. Verify the availability of the configured URL using `curl`.
Tuning	Not required
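A per-node check of this kind can be sketched with the Python standard library. The sketch assumes that a status of `1` means the observed response code matched the expected one and that `0` means it did not or the endpoint was unreachable. The function names are hypothetical, and the sketch does not parse `/etc/telegraf/telegraf.d/input-http_response.conf`; you pass the URL and expected code yourself. Note that `urllib` follows redirects by default, which may differ from the configured check.

```python
import urllib.error
import urllib.request
from typing import Optional


def fetch_status(url: str, timeout: float = 5.0) -> Optional[int]:
    """Return the HTTP status code of a GET request, or None if the URL is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx, but the error still carries the status code
        return err.code
    except OSError:
        # URLError, refused connections, and timeouts all derive from OSError
        return None


def http_response_status(url: str, expected_code: int, timeout: float = 5.0) -> int:
    """Return 1 if the endpoint answered with expected_code, else 0.

    Assumption: 0 is the value that the NeutronApiEndpointDown raise condition matches.
    """
    return int(fetch_status(url, timeout) == expected_code)
```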

NeutronApiEndpointsDownMajor

Severity	Major
Summary	More than 50% of `neutron-api` endpoints are not accessible for 2 minutes.
Raise condition	`count(http_response_status{name="neutron-api"} == 0) >= count (http_response_status{name="neutron-api"}) * 0.5`
Description	Raises when the checks against the Neutron API endpoints do not pass on 50% or more of the OpenStack controller nodes, typically indicating that the service endpoints are down or unreachable due to connectivity issues. Telegraf sends a request to the URL configured in `/etc/telegraf/telegraf.d/input-http_response.conf` on each node and compares the expected and actual HTTP response codes from the configuration file. To identify the affected nodes, see the `host` label in the `NeutronApiEndpointDown` alerts.
Troubleshooting	Inspect the Telegraf logs using `journalctl -u telegraf` or in the `/var/log/telegraf` directory. Verify the availability of the configured URL using `curl`.
Tuning	Not required
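The raise condition compares two counts, and because it uses `>=`, the alert already fires when exactly half of the endpoints fail. The following Python sketch mirrors that comparison for a mapping of host names to per-node status values (`1` for a passing check, `0` for a failing one) and also returns the failing hosts, which correspond to the `host` labels of the `NeutronApiEndpointDown` alerts. The function name is hypothetical and the 2-minute pending period is not modeled. The sketch also reflects a PromQL detail: `count(... == 0)` returns no result when no endpoint is failing, so the comparison cannot fire in that case.

```python
from typing import Dict, List, Tuple


def evaluate_major(status_by_host: Dict[str, int]) -> Tuple[bool, List[str]]:
    """Mirror count(status == 0) >= count(status) * 0.5 and list the failing hosts."""
    failing = sorted(host for host, status in status_by_host.items() if status == 0)
    # count() over an empty vector yields no result, so nothing fires without failures
    fires = bool(failing) and len(failing) >= len(status_by_host) * 0.5
    return fires, failing
```

For example, with two controller nodes, a single failing host is enough, because 1 >= 2 * 0.5.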

NeutronApiEndpointsOutage

Severity	Critical
Summary	All available `neutron-api` endpoints are not accessible for 2 minutes.
Raise condition	`count(http_response_status{name="neutron-api"} == 0) == count (http_response_status{name="neutron-api"})`
Description	Raises when the checks against the Neutron API endpoints do not pass on all OpenStack controller nodes, typically indicating that the service endpoints are down or unreachable due to connectivity issues. Telegraf sends a request to the URL configured in `/etc/telegraf/telegraf.d/input-http_response.conf` on each node and compares the expected and actual HTTP response codes from the configuration file. To identify the affected nodes, see the `host` label in the `NeutronApiEndpointDown` alerts.
Troubleshooting	Inspect the Telegraf logs using `journalctl -u telegraf` or in the `/var/log/telegraf` directory. Verify the availability of the configured URL using `curl`.
Tuning	Not required
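Because the three endpoint alerts use overlapping raise conditions, a full outage satisfies all of them at once. The sketch below evaluates the three conditions for a given number of failing and total endpoints. It is a simplified, hypothetical model: it ignores the 2-minute duration and any Alertmanager inhibition rules that may be configured, and it lists `NeutronApiEndpointDown` once even though that alert is raised once per failing node.

```python
from typing import List


def firing_endpoint_alerts(down: int, total: int) -> List[str]:
    """Return the endpoint alerts whose raise conditions hold for `down` of `total` endpoints."""
    if not 0 <= down <= total:
        raise ValueError("expected 0 <= down <= total")
    if down == 0:
        # count(http_response_status == 0) is empty, so no condition can match
        return []
    alerts = ["NeutronApiEndpointDown"]  # raised per failing node; listed once here
    if down >= total * 0.5:
        alerts.append("NeutronApiEndpointsDownMajor")
    if down == total:
        alerts.append("NeutronApiEndpointsOutage")
    return alerts
```

With three controller nodes, one failing endpoint raises only `NeutronApiEndpointDown` (1 < 1.5), two failing endpoints add `NeutronApiEndpointsDownMajor` (2 >= 1.5), and only three failing endpoints add `NeutronApiEndpointsOutage`.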

NeutronAgentDown

Severity	Minor
Summary	The `{{ $labels.binary }}` agent on the `{{ $labels.hostname }}` node is down.
Raise condition	`openstack_neutron_agent_state == 0`
Description	Raises when a Neutron agent is in the `DOWN` state, according to the information from the Neutron API. For the list of Neutron services, see Networking service overview. This alert can also indicate issues with the Telegraf `monitoring_remote_agent` service. The `binary` and `hostname` labels contain the name of the agent that is in the `DOWN` state and the node that hosts the agent.
Troubleshooting	Verify the statuses of Neutron agents using `openstack network agent list`. Verify the status of the `monitoring_remote_agent` by running `docker service ls` on a `mon` node. Inspect the `monitoring_remote_agent` service logs by running `docker service logs monitoring_remote_agent` on a `mon` node.
Tuning	Not required
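To list the agents that the alert refers to, you can post-process the output of `openstack network agent list -f json`. The following sketch is hypothetical: it assumes the `Binary`, `Host`, and `Alive` columns and treats `Alive` as the liveness flag. It accepts both a JSON boolean and, as an assumption about older client versions, the `:-)` marker used in the table output. Whether `openstack_neutron_agent_state` is derived from exactly this flag is also an assumption.

```python
import json
from typing import Any, List, Tuple


def _is_alive(value: Any) -> bool:
    # Assumption: JSON output carries a boolean; ":-)" covers clients that print
    # the same marker as in the table output.
    return value is True or value == ":-)"


def down_agents(agent_list_json: str) -> List[Tuple[str, str]]:
    """Return (binary, host) pairs for agents that are not reported as alive.

    Assumption: the records use the "Binary", "Host", and "Alive" columns of
    `openstack network agent list -f json`.
    """
    return [
        (agent["Binary"], agent["Host"])
        for agent in json.loads(agent_list_json)
        if not _is_alive(agent.get("Alive"))
    ]
```

For example, save the output with `openstack network agent list -f json > agents.json` and pass the file contents to `down_agents`. The returned pairs correspond to the `binary` and `hostname` labels of the alert.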

NeutronAgentsDownMinor

Severity	Minor
Summary	More than 30% of `{{ $labels.binary }}` agents are down.
Raise condition	`count by(binary) (openstack_neutron_agent_state == 0) >= on(binary) count by(binary) (openstack_neutron_agent_state) * 0.3`
Description	Raises when 30% or more of the Neutron agents of the same type are in the `DOWN` state, according to the information from the Neutron API. For the list of Neutron services, see Networking service overview. This alert can also indicate issues with the Telegraf `monitoring_remote_agent` service. The `binary` label contains the name of the agent type (binary) whose agents are in the `DOWN` state.
Troubleshooting	Verify the statuses of Neutron agents using `openstack network agent list`. Inspect the `NeutronAgentDown` alert for the nodes and services that are in the `DOWN` state. Verify the status of the `monitoring_remote_agent` by running `docker service ls` on a `mon` node. Inspect the `monitoring_remote_agent` service logs by running `docker service logs monitoring_remote_agent` on one of the `mon` nodes.
Tuning	Not required
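The `by(binary)` and `on(binary)` clauses evaluate the threshold separately for each agent type. The sketch below mirrors that grouping for a list of `(binary, state)` samples, where `0` means `DOWN` and `1` means `UP`. The function name is hypothetical, and only the comparison is modeled, not the alert duration. As in PromQL, a binary without `DOWN` agents never appears on the left-hand side of the comparison, so it cannot match.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def binaries_over_threshold(samples: Iterable[Tuple[str, int]], fraction: float) -> List[str]:
    """Return the binaries for which
    count by(binary)(state == 0) >= on(binary) count by(binary)(state) * fraction.
    """
    samples = list(samples)
    total = Counter(binary for binary, _ in samples)
    down = Counter(binary for binary, state in samples if state == 0)
    # Only binaries with at least one DOWN agent exist on the left-hand side,
    # so a fully healthy binary can never match, whatever the fraction.
    return sorted(binary for binary, n_down in down.items() if n_down >= total[binary] * fraction)
```

Calling it with `fraction=0.3` corresponds to `NeutronAgentsDownMinor`, and with `fraction=0.6` to `NeutronAgentsDownMajor`.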

NeutronAgentsDownMajor

Severity	Major
Summary	More than 60% of `{{ $labels.binary }}` agents are down.
Raise condition	`count by(binary) (openstack_neutron_agent_state == 0) >= on(binary) count by(binary) (openstack_neutron_agent_state) * 0.6`
Description	Raises when 60% or more of the Neutron agents of the same type are in the `DOWN` state, according to the information from the Neutron API. For the list of Neutron services, see Networking service overview. This alert can also indicate issues with the Telegraf `monitoring_remote_agent` service. The `binary` label contains the name of the agent type (binary) whose agents are in the `DOWN` state.
Troubleshooting	Verify the statuses of Neutron agents using `openstack network agent list`. Inspect the `NeutronAgentDown` alert for the nodes and services that are in the `DOWN` state. Verify the status of the `monitoring_remote_agent` by running `docker service ls` on a `mon` node. Inspect the `monitoring_remote_agent` service logs by running `docker service logs monitoring_remote_agent` on one of the `mon` nodes.
Tuning	Not required
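Because the thresholds scale with the number of agents of each type, it helps to work out how many `DOWN` agents each condition needs. The sketch below computes that minimum for a given number of agents of one binary, using the same floating-point comparison as the raise conditions; the function name is hypothetical. For example, with three agents of one type, one `DOWN` agent satisfies the 30% condition (1 >= 0.9), two satisfy the 60% condition (2 >= 1.8), and all three are needed for `NeutronAgentsOutage`.

```python
from typing import Dict


def min_down_to_fire(total: int) -> Dict[str, int]:
    """Return the smallest number of DOWN agents of one binary that satisfies each condition."""
    if total < 1:
        raise ValueError("total must be at least 1")

    def first_match(fraction: float) -> int:
        # Same floating-point comparison as the raise condition: down >= total * fraction
        return next(down for down in range(1, total + 1) if down >= total * fraction)

    return {
        "NeutronAgentsDownMinor": first_match(0.3),
        "NeutronAgentsDownMajor": first_match(0.6),
        "NeutronAgentsOutage": total,  # every agent of the binary must be DOWN
    }
```

With only two agents of a type, the 60% condition already requires both of them, so `NeutronAgentsDownMajor` and `NeutronAgentsOutage` are raised together.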

NeutronAgentsOutage

Severity	Critical
Summary	All `{{ $labels.binary }}` agents are down.
Raise condition	`count by(binary) (openstack_neutron_agent_state == 0) == on(binary) count by(binary) (openstack_neutron_agent_state)`
Description	Raises when all Neutron agents of the same type are in the `DOWN` state and unavailable, according to the information from the Neutron API. For the list of Neutron services, see Networking service overview. This alert can also indicate issues with the Telegraf `monitoring_remote_agent` service. The `binary` label contains the name of the agent type (binary) whose agents are all in the `DOWN` state.
Troubleshooting	Verify the statuses of Neutron agents using `openstack network agent list`. Inspect the `NeutronAgentDown` alert for the nodes and services that are in the `DOWN` state. Verify the status of the `monitoring_remote_agent` by running `docker service ls` on a `mon` node. Inspect the `monitoring_remote_agent` service logs by running `docker service logs monitoring_remote_agent` on one of the `mon` nodes.
Tuning	Not required
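All agent alerts above can also be caused by the `monitoring_remote_agent` service rather than by Neutron itself. One way to rule this out is to compare running and desired replicas, for example, from the output of `docker service ls --format '{{.Name}} {{.Replicas}}'` on a `mon` node. The parsing helpers below are a hypothetical sketch that assumes this output format, in which the replica column starts with `running/desired` and may be followed by extra text.

```python
from typing import Dict, Tuple


def parse_service_replicas(output: str) -> Dict[str, Tuple[int, int]]:
    """Map service names to (running, desired) replicas.

    Assumption: the input is `docker service ls --format '{{.Name}} {{.Replicas}}'`
    output, where the replica column starts with "running/desired".
    """
    services: Dict[str, Tuple[int, int]] = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        name, replicas = line.split(None, 1)
        running, desired = replicas.split()[0].split("/")
        services[name] = (int(running), int(desired))
    return services


def service_degraded(output: str, name: str = "monitoring_remote_agent") -> bool:
    """Return True if the service is missing from the output or runs fewer replicas than desired."""
    running, desired = parse_service_replicas(output).get(name, (0, 1))
    return running < desired
```

If the service is degraded, inspect its logs with `docker service logs monitoring_remote_agent` before investigating the Neutron agents themselves.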

NeutronErrorLogsTooHigh

Severity	Warning
Summary	The average per-second rate of errors in Neutron logs on the `{{ $labels.host }}` node is `{{ $value }}` (as measured over the last 5 minutes).
Raise condition	`sum without(level) (rate(log_messages{level=~"(?i:(error|emergency|fatal))",service="neutron"}[5m])) > 0.2`
Description	Raises when the average per-second rate of the `error`, `fatal`, or `emergency` messages in Neutron logs on the node is more than 0.2 per second. Fluentd forwards all logs from Neutron to Elasticsearch and counts the number of log messages per severity. The `host` label in the raised alert contains the host name of the affected node.
Troubleshooting	Inspect the Neutron logs in the `/var/log/neutron/` directory on the affected node.
Tuning	Typically, you should not change the default value. If the alert fires constantly, inspect the Neutron error logs in the Kibana web UI. However, you can adjust the threshold to an acceptable error rate for a particular environment. In the Prometheus web UI, use the raise condition query to view the appearance rate of a particular message type in logs over a longer period of time and define the best threshold.

For example, to change the threshold to `0.4`:

1. On the cluster level of the Reclass model, create a common file for all alert customizations. Skip this step to use an existing defined file.

   1. Create a file for alert customizations:

      ```bash
      touch cluster/<cluster_name>/stacklight/custom/alerts.yml
      ```

   2. Define the new file in `cluster/<cluster_name>/stacklight/server.yml`:

      ```yaml
      classes:
      - cluster.<cluster_name>.stacklight.custom.alerts
      ...
      ```

2. In the defined alert customizations file, modify the alert by overriding the `if` parameter:

   ```yaml
   parameters:
     prometheus:
       server:
         alert:
           NeutronErrorLogsTooHigh:
             if: >-
               sum(rate(log_messages{service="neutron", level=~"(?i:(error|emergency|fatal))"}[5m])) without (level) > 0.4
   ```

3. From the Salt Master node, apply the changes:

   ```bash
   salt 'I@prometheus:server' state.sls prometheus.server
   ```

4. Verify the updated alert definition in the Prometheus web UI.
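The threshold is easier to reason about as a message count: at `0.2` per second, a 5-minute window corresponds to 60 matching messages, and because the comparison is `>`, exactly 60 messages do not raise the alert. The sketch below is a simplified model for a single host, not the PromQL `rate()` implementation: it uses the first and last counter samples of the window, ignoring counter resets and boundary extrapolation, keeps only the levels that fully match the case-insensitive pattern from the raise condition, and sums them as `sum without(level)` does. The function names are hypothetical.

```python
import re
from typing import Dict, List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, counter value)

# Same pattern as the raise condition; PromQL anchors regex matches, hence fullmatch()
LEVEL_RE = re.compile(r"(?i:(error|emergency|fatal))")


def simple_rate(samples: List[Sample]) -> float:
    """Per-second increase between the first and last sample of the window.

    Simplification: unlike rate(), no counter-reset handling or extrapolation.
    """
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)


def error_rate_too_high(per_level: Dict[str, List[Sample]], threshold: float = 0.2) -> bool:
    """Sum the rates of the matching levels and compare the result with the threshold."""
    total = sum(
        simple_rate(samples)
        for level, samples in per_level.items()
        if LEVEL_RE.fullmatch(level)
    )
    return total > threshold
```

With the threshold raised to `0.4`, the same 5-minute window tolerates up to 120 matching messages.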

updated: 2025-01-10 08:56
