Cinder

This section describes the alerts for Cinder.


CinderApiOutage

Removed since the 2019.2.11 maintenance update

Severity

Critical

Summary

Cinder API is not accessible for all available Cinder endpoints in the OpenStack service catalog.

Raise condition

max(openstack_api_check_status{name=~"cinder.*"}) == 0

Description

Raises when the checks against all available internal Cinder endpoints in the OpenStack service catalog do not pass. Telegraf sends HTTP requests to the URLs from the OpenStack service catalog and compares the expected and actual HTTP response codes. The expected response codes for Cinder, Cinderv2, and Cinderv3 are 200 and 300. For a list of all available endpoints, run openstack endpoint list.
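The comparison that Telegraf performs can be sketched with a small shell helper. The expected codes (200 and 300) come from the description above; the endpoint URL in the usage example is a placeholder to be replaced with a real URL from openstack endpoint list:

```shell
#!/bin/sh
# Sketch of the check described above: fetch a Cinder endpoint and compare
# the actual HTTP response code against the expected ones.

# Pure helper: classify an HTTP response code the way the check does
# (200 and 300 are the expected codes for Cinder, Cinderv2, and Cinderv3).
classify_code() {
  case "$1" in
    200|300) echo "PASS" ;;
    *)       echo "FAIL" ;;
  esac
}

check_endpoint() {
  code=$(curl -s -o /dev/null -w '%{http_code}' "$1")
  echo "$(classify_code "$code") $1 ($code)"
}

# Example (placeholder URL; take real ones from `openstack endpoint list`):
# check_endpoint http://ctl01:8776/v3
```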

Troubleshooting

Verify the availability of internal Cinder endpoints (URLs) from the output of openstack endpoint list.

Tuning

Not required

CinderApiDown

Removed since the 2019.2.11 maintenance update

Severity

Major

Summary

Cinder API is not accessible for the {{ $labels.name }} endpoint.

Raise condition

openstack_api_check_status{name=~"cinder.*"} == 0

Description

Raises when the check against one of the available internal Cinder endpoints in the OpenStack service catalog does not pass. Telegraf sends HTTP requests to the URLs from the OpenStack service catalog and compares the expected and actual HTTP response codes. The expected response codes for Cinder, Cinderv2, and Cinderv3 are 200 and 300. For a list of all available endpoints, run openstack endpoint list.

Troubleshooting

Verify the availability of internal Cinder endpoints (URLs) from the output of the openstack endpoint list command.

Tuning

Not required

CinderApiEndpointDown

Severity

Minor

Summary

The cinder-api endpoint on the {{ $labels.host }} node is not accessible for 2 minutes.

Raise condition

http_response_status{name=~"cinder-api"} == 0

Description

Raises when the check against a Cinder API endpoint does not pass, typically meaning that the service endpoint is down or unreachable due to connectivity issues. The host label in the raised alert contains the host name of the affected node. Telegraf sends a request to the URL configured in /etc/telegraf/telegraf.d/input-http_response.conf on the corresponding node and compares the expected and actual HTTP response codes from the configuration file.

Troubleshooting

  • Inspect the Telegraf logs using journalctl -u telegraf or in /var/log/telegraf.

  • Verify the configured URL availability using curl.
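The URL that Telegraf checks can be pulled straight out of input-http_response.conf and probed with curl. A sketch, assuming the usual Telegraf TOML form with a urls = ["..."] line (the grep pattern is deliberately loose and matches any quoted string in the file):

```shell
#!/bin/sh
# Extract the checked URL(s) from the Telegraf http_response input and
# probe each one with curl.
CONF=/etc/telegraf/telegraf.d/input-http_response.conf

extract_urls() {
  # Pull quoted strings out of the configuration file; in a minimal
  # http_response input these are the entries of the urls = [...] list.
  grep -o '"[^"]*"' "$1" | tr -d '"'
}

# for url in $(extract_urls "$CONF"); do
#   curl -sv "$url" -o /dev/null
# done
```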

Tuning

Not required

CinderApiEndpointDownMajor

Severity

Major

Summary

More than 50% of cinder-api endpoints are not accessible for 2 minutes.

Raise condition

count(http_response_status{name=~"cinder-api"} == 0) >= count(http_response_status{name=~"cinder-api"}) * 0.5
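The condition compares the number of failing endpoints with half the total, so with four cinder-api endpoints the alert fires as soon as two of them fail. The same >= 50% test as a quick arithmetic sketch:

```shell
#!/bin/sh
# Mirror of the raise condition: fire when down_count >= total_count * 0.5.
# Using down * 2 >= total keeps the comparison in integer arithmetic.
fires_major() {
  down=$1; total=$2
  [ $((down * 2)) -ge "$total" ]
}

# fires_major 2 4 && echo fires   # 2 of 4 endpoints down -> alert fires
# fires_major 1 4 || echo quiet   # 1 of 4 endpoints down -> no alert
```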

Description

Raises when the check against a Cinder API endpoint does not pass on more than 50% of OpenStack controller nodes. For details on the affected nodes, see the host label in the CinderApiEndpointDown alerts. Telegraf sends a request to the URL configured in /etc/telegraf/telegraf.d/input-http_response.conf on the corresponding node and compares the expected and actual HTTP response codes from the configuration file.

Troubleshooting

  • Inspect the CinderApiEndpointDown alerts for the host names of the affected nodes.

  • Inspect the Telegraf logs using journalctl -u telegraf or in /var/log/telegraf.

  • Verify the configured URL availability using curl.

Tuning

Not required

CinderApiEndpointsOutage

Severity

Critical

Summary

All available cinder-api endpoints are not accessible for 2 minutes.

Raise condition

count(http_response_status{name=~"cinder-api"} == 0) == count(http_response_status{name=~"cinder-api"})

Description

Raises when the check against a Cinder API endpoint does not pass on all OpenStack controller nodes. Telegraf sends a request to the URL configured in /etc/telegraf/telegraf.d/input-http_response.conf on the corresponding node and compares the expected and actual HTTP response codes from the configuration file.

Troubleshooting

  • Inspect the CinderApiEndpointDown alerts for the host names of the affected nodes.

  • Inspect the Telegraf logs using journalctl -u telegraf or in /var/log/telegraf.

  • Verify the configured URL availability using curl.

Tuning

Not required

CinderServiceDown

Severity

Minor

Summary

The {{ $labels.binary }} service on the {{ $labels.hostname }} node is down.

Raise condition

openstack_cinder_service_state == 0

Description

Raises when a Cinder service on the OpenStack controller or compute node is in the DOWN state. For the list of Cinder services, see Cinder Block Storage service overview. The binary and hostname labels contain the name of the service that is in the DOWN state and the node that hosts the service.

Troubleshooting

  • Verify the list of Cinder services and their states using openstack volume service list.

  • Verify the status of the corresponding Cinder service on the affected node using systemctl status <binary>.

  • Inspect the logs of the corresponding Cinder service on the affected node in the /var/log/cinder/ directory.

  • Verify the Telegraf monitoring_remote_agent service:

    • Verify the status of the monitoring_remote_agent service using docker service ls.

    • Inspect the monitoring_remote_agent service logs by running docker service logs monitoring_remote_agent on one of the mon nodes.
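The output of openstack volume service list can also be filtered for down services directly. A sketch of such a filter, assuming the usual tabular output with a State column containing up or down:

```shell
#!/bin/sh
# Filter service-listing lines whose State column reads "down".
# The function reads stdin, so it can be fed captured output, e.g.:
#   openstack volume service list | find_down_services
find_down_services() {
  grep -i '| *down *|'
}
```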

Tuning

Not required

CinderServicesDownMinor

Severity

Minor

Summary

More than 30% of {{ $labels.binary }} services are down.

Raise condition

count by(binary) (openstack_cinder_service_state == 0) >= on(binary) count by(binary) (openstack_cinder_service_state) * 0.3

Description

Raises when a Cinder service is in the DOWN state on more than 30% of the ctl or cmp hosts. For the list of services, see Cinder Block Storage service overview. Inspect the hostname label in the CinderServiceDown alerts for details on the affected services and nodes.

Troubleshooting

  • Verify the list of Cinder services and their states using openstack volume service list.

  • Verify the status of the corresponding Cinder service on the affected node using systemctl status <binary>.

  • Inspect the logs of the corresponding Cinder service on the affected node in the /var/log/cinder/ directory.

  • Verify the Telegraf monitoring_remote_agent service:

    • Verify the status of the monitoring_remote_agent service using docker service ls.

    • Inspect the monitoring_remote_agent service logs by running docker service logs monitoring_remote_agent on one of the mon nodes.

Tuning

Not required

CinderServicesDownMajor

Severity

Major

Summary

More than 60% of {{ $labels.binary }} services are down.

Raise condition

count by(binary) (openstack_cinder_service_state == 0) >= on(binary) count by(binary) (openstack_cinder_service_state) * 0.6

Description

Raises when a Cinder service is in the DOWN state on more than 60% of the ctl or cmp hosts. For the list of services, see Cinder Block Storage service overview. Inspect the hostname label in the CinderServiceDown alerts for details on the affected services and nodes.

Troubleshooting

  • Verify the list of Cinder services and their states using openstack volume service list.

  • Verify the status of the corresponding Cinder service on the affected node using systemctl status <binary>.

  • Inspect the logs of the corresponding Cinder service on the affected node in the /var/log/cinder/ directory.

  • Verify the Telegraf monitoring_remote_agent service:

    • Verify the status of the monitoring_remote_agent service using docker service ls.

    • Inspect the monitoring_remote_agent service logs by running docker service logs monitoring_remote_agent on one of the mon nodes.

Tuning

Not required

CinderServiceOutage

Severity

Critical

Summary

All {{ $labels.binary }} services are down.

Raise condition

count by(binary) (openstack_cinder_service_state == 0) == on(binary) count by(binary) (openstack_cinder_service_state)

Description

Raises when a Cinder service is in the DOWN state on all ctl or cmp hosts. For the list of services, see Cinder Block Storage service overview. Inspect the hostname label in the CinderServiceDown alerts for details on the affected services and nodes.

Troubleshooting

  • Verify the list of Cinder services and their states using openstack volume service list.

  • Verify the status of the corresponding Cinder service on the affected node using systemctl status <binary>.

  • Inspect the logs of the corresponding Cinder service on the affected node in the /var/log/cinder/ directory.

  • Verify the Telegraf monitoring_remote_agent service:

    • Verify the status of the monitoring_remote_agent service using docker service ls.

    • Inspect the monitoring_remote_agent service logs by running docker service logs monitoring_remote_agent on one of the mon nodes.

Tuning

Not required

CinderVolumeProcessDown

Available starting from the 2019.2.8 maintenance update

Severity

Minor

Summary

A cinder-volume process is down.

Raise condition

procstat_running{process_name="cinder-volume"} == 0

Description

Raises when a cinder-volume process on a node is down. The host label in the raised alert contains the affected node.

Troubleshooting

  • Log in to the corresponding node and verify the process status using systemctl status cinder-volume.

  • Inspect the cinder-volume log files in the /var/log/cinder/ directory.
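The procstat check behind this alert boils down to counting running cinder-volume processes. A rough manual equivalent on the node might look like this (a sketch; pgrep from procps is assumed to be available):

```shell
#!/bin/sh
# Count processes whose command line matches a pattern, roughly what the
# Telegraf procstat input exposes as procstat_running.
running_count() {
  # pgrep -c prints 0 and exits non-zero when nothing matches;
  # ignore the exit status and keep the printed count.
  pgrep -cf "$1" || true
}

# On a healthy node this prints a non-zero count:
# running_count cinder-volume
```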

Tuning

Not required

CinderVolumeProcessesDownMinor

Available starting from the 2019.2.8 maintenance update

Severity

Minor

Summary

30% of cinder-volume processes are down.

Raise condition

count(procstat_running{process_name="cinder-volume"} == 0) >= count(procstat_running{process_name="cinder-volume"}) * {{ minor_threshold }}

Description

Raises when 30% or more of the cinder-volume processes are in the DOWN state. The alert includes the number of cinder-volume processes in the DOWN state.

Troubleshooting

  • Log in to the corresponding node and verify the process status using systemctl status cinder-volume.

  • Inspect the cinder-volume log files in the /var/log/cinder/ directory.

Tuning

Not required

CinderVolumeProcessesDownMajor

Available starting from the 2019.2.8 maintenance update

Severity

Major

Summary

60% of cinder-volume processes are down.

Raise condition

count(procstat_running{process_name="cinder-volume"} == 0) >= count(procstat_running{process_name="cinder-volume"}) * {{ major_threshold }}

Description

Raises when 60% or more of the cinder-volume processes are in the DOWN state. The alert includes the number of cinder-volume processes in the DOWN state.

Troubleshooting

  • Log in to the corresponding node and verify the process status using systemctl status cinder-volume.

  • Inspect the cinder-volume log files in the /var/log/cinder/ directory.

Tuning

Not required

CinderVolumeServiceOutage

Available starting from the 2019.2.8 maintenance update

Severity

Critical

Summary

The cinder-volume service is down.

Raise condition

count(procstat_running{process_name="cinder-volume"} == 0) == count(procstat_running{process_name="cinder-volume"})

Description

Raises when all cinder-volume processes are down.

Troubleshooting

  • Log in to the corresponding node and verify the process status using systemctl status cinder-volume.

  • Inspect the cinder-volume log files in the /var/log/cinder/ directory.

Tuning

Not required

CinderErrorLogsTooHigh

Severity

Warning

Summary

The average rate of errors in Cinder logs on the {{ $labels.host }} node is higher than 0.2 messages per second.

Raise condition

sum without(level) (rate(log_messages{level=~"(?i:(error|emergency|fatal))", service="cinder"}[5m])) > 0.2

Description

Raises when the average rate of error, fatal, or emergency messages in Cinder logs on the node exceeds 0.2 per second. The host label in the raised alert contains the affected node. Fluentd forwards all logs from Cinder to Elasticsearch and counts the number of log messages per severity.

Troubleshooting

  • Inspect the log files in the /var/log/cinder/ directory on the corresponding node.

  • Inspect Cinder logs in the Kibana web UI.

Tuning description

Typically, you should not change the default value. However, you can adjust the threshold to an acceptable error rate for a particular environment. In the Prometheus Web UI, use the raise condition query to view the appearance rate of a particular message type in logs for a longer period of time and define the best threshold.
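Besides the Prometheus Web UI, the raise condition can also be evaluated from the command line through the Prometheus HTTP API. A sketch; the Prometheus address is an assumption, substitute your own:

```shell
#!/bin/sh
# Evaluate the alert's raise condition against the Prometheus HTTP API.
# PROM_URL is a placeholder; point it at your Prometheus server.
PROM_URL="${PROM_URL:-http://127.0.0.1:9090}"

QUERY='sum without(level) (rate(log_messages{level=~"(?i:(error|emergency|fatal))", service="cinder"}[5m]))'

# curl -G with --data-urlencode handles the special characters in the query:
# curl -sG "$PROM_URL/api/v1/query" --data-urlencode "query=$QUERY"
```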

To change the threshold to 0.4:

  1. On the cluster level of the Reclass model, create a common file for all alert customizations. Skip this step if such a file already exists.

    1. Create a file for alert customizations:

      touch cluster/<cluster_name>/stacklight/custom/alerts.yml
      
    2. Define the new file in cluster/<cluster_name>/stacklight/server.yml:

      classes:
      - cluster.<cluster_name>.stacklight.custom.alerts
      ...
      
  2. In the defined alert customizations file, modify the alert threshold by overriding the if parameter:

    parameters:
      prometheus:
        server:
          alert:
            CinderErrorLogsTooHigh:
              if: >-
                sum(rate(log_messages{service="cinder",
                level=~"(?i:(error|emergency|fatal))"}[5m])) without (level) > 0.4
    
  3. From the Salt Master node, apply the changes:

    salt 'I@prometheus:server' state.sls prometheus.server
    
  4. Verify the updated alert definition in the Prometheus web UI.