Glance
This section describes the alerts for Glance.
GlanceApiOutage
Removed since the 2019.2.11 maintenance update
Severity | Critical
Summary | Glance API is not accessible for the Glance endpoint in the OpenStack
  service catalog.
Raise condition | openstack_api_check_status{name="glance"} == 0
Description | Raises when the checks against all available internal Glance endpoints
  in the OpenStack service catalog do not pass. Telegraf sends HTTP requests to the
  URLs from the OpenStack service catalog and compares the expected and actual HTTP
  response codes. The expected response codes for Glance are 200 and 300.
Troubleshooting | Obtain the list of available endpoints using openstack endpoint list
  and verify the availability of the internal Glance endpoints (URLs) from the list
  (see the sketch after this table).
Tuning | Not required
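The following is a minimal verification sketch for the Troubleshooting row above. It
assumes the OpenStack client is configured with admin credentials; the endpoint URL
and port 9292 are placeholders, so substitute the actual internal Glance URL taken
from the catalog output.
  # List the internal endpoint of the Image (Glance) service.
  openstack endpoint list --service image --interface internal
  # Query the endpoint directly; Glance is expected to return 200 or 300.
  curl -sS -o /dev/null -w '%{http_code}\n' http://<internal_vip>:9292/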
GlareApiOutage
Removed since the 2019.2.11 maintenance update
Severity | Critical
Summary | Glare API is not accessible for the Glare endpoint in the OpenStack
  service catalog.
Raise condition | openstack_api_check_status{name="glare"} == 0
Description | Raises when the checks against all available internal Glare endpoints
  in the OpenStack service catalog do not pass. Telegraf sends HTTP requests to the
  URLs from the OpenStack service catalog and compares the expected and actual HTTP
  response codes. The expected response codes for Glare are 200 and 300.
Troubleshooting | Obtain the list of available endpoints using openstack endpoint list
  and verify the availability of the internal Glare endpoints (URLs) from the list.
Tuning | Not required
GlanceApiEndpointDown
Severity | Minor
Summary | The {{ $labels.name }} endpoint on the {{ $labels.host }} node is not
  accessible for 2 minutes.
Raise condition | http_response_status{name=~"glance.*"} == 0
Description | Raises when the check against the Glance API endpoint does not pass,
  typically meaning that the service endpoint is down or unreachable due to
  connectivity issues. The host label in the raised alert contains the host name of
  the affected node. Telegraf sends an HTTP request to the URL configured in
  /etc/telegraf/telegraf.d/input-http_response.conf on the corresponding node and
  compares the expected and actual HTTP response codes from the configuration file.
Troubleshooting |
  - Inspect the Telegraf logs using journalctl -u telegraf or in /var/log/telegraf.
  - Verify the configured URL availability using curl (see the sketch after this
    table).
Tuning | Not required
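A minimal sketch of the Troubleshooting steps above, to be run on the node reported
in the host label. The grep pattern and the example URL are assumptions; take the
actual URL and expected response code from the Telegraf configuration file.
  # Check the recent Telegraf messages related to the HTTP response checks.
  journalctl -u telegraf --since "1 hour ago" | grep -i glance
  # Find the Glance URL that Telegraf polls on this node.
  grep -i -A 5 glance /etc/telegraf/telegraf.d/input-http_response.conf
  # Query the URL directly and compare the returned code with the expected one.
  curl -sS -o /dev/null -w '%{http_code}\n' http://<node_address>:9292/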
GlanceApiEndpointsDownMajor
Severity | Major
Summary | More than 50% of {{ $labels.name }} endpoints are not accessible for
  2 minutes.
Raise condition | count by(name) (http_response_status{name=~"glance.*"} == 0) >= count by(name) (http_response_status{name=~"glance.*"}) * 0.5
Description | Raises when the check against the Glance API endpoint does not pass on
  more than 50% of the ctl nodes, typically meaning that the service endpoint is down
  or unreachable due to connectivity issues. For details on the affected nodes, see
  the host label in the GlanceApiEndpointDown alerts. Telegraf sends an HTTP request
  to the URL configured in /etc/telegraf/telegraf.d/input-http_response.conf on the
  corresponding node and compares the expected and actual HTTP response codes from
  the configuration file.
Troubleshooting |
  - Inspect the Telegraf logs using journalctl -u telegraf or in /var/log/telegraf.
  - Verify the configured URL availability using curl.
Tuning | Not required
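To identify the endpoints that currently fail the check behind
GlanceApiEndpointsDownMajor, you can run the raise condition subquery against the
Prometheus HTTP API instead of the web UI. This is a sketch only; the Prometheus host
and port 9090 are assumptions that depend on the deployment.
  # Return the glance http_response_status series that are currently failing (== 0),
  # including the host label of each affected node.
  curl -sG 'http://<prometheus_host>:9090/api/v1/query' \
    --data-urlencode 'query=http_response_status{name=~"glance.*"} == 0'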
GlanceApiEndpointsOutage
Severity | Critical
Summary | All available {{ $labels.name }} endpoints are not accessible for
  2 minutes.
Raise condition | count by(name) (http_response_status{name=~"glance.*"} == 0) == count by(name) (http_response_status{name=~"glance.*"})
Description | Raises when the check against the Glance API endpoint does not pass on
  all controller nodes, typically meaning that the service endpoint is down or
  unreachable due to connectivity issues. For details on the affected nodes, see the
  host label in the GlanceApiEndpointDown alerts. Telegraf sends an HTTP request to
  the URL configured in /etc/telegraf/telegraf.d/input-http_response.conf on the
  corresponding node and compares the expected and actual HTTP response codes from
  the configuration file.
Troubleshooting |
  - Inspect the Telegraf logs using journalctl -u telegraf or in /var/log/telegraf.
  - Verify the configured URL availability using curl.
Tuning | Not required
GlanceErrorLogsTooHigh
Severity | Warning
Summary | The average per-second rate of errors in Glance logs on the
  {{ $labels.host }} node is {{ $value }} (as measured over the last 5 minutes).
Raise condition | sum without(level) (rate(log_messages{level=~"(?i:(error|emergency|fatal))",service="glance"}[5m])) > 0.2
Description | Raises when the average per-second rate of error, fatal, or emergency
  messages in Glance logs on the node is more than 0.2 messages per second. The host
  label in the raised alert contains the affected node. Fluentd forwards all logs
  from Glance to Elasticsearch and counts the number of log messages by severity.
Troubleshooting | Inspect the log files in /var/log/glance/ on the corresponding node
  (see the sketch at the end of this alert).
Tuning |
  Typically, you should not change the default value. If the alert is constantly
  firing, inspect the Glance error logs in Kibana and adjust the threshold to an
  acceptable error rate for a particular environment. In the Prometheus Web UI, use
  the raise condition query to view the appearance rate of a particular message type
  in logs for a longer period of time and define the best threshold.
  For example, to change the threshold to 0.4:
  1. On the cluster level of the Reclass model, create a common file for all alert
     customizations. Skip this step to use an existing defined file.
     - Create a file for alert customizations:
       touch cluster/<cluster_name>/stacklight/custom/alerts.yml
     - Define the new file in cluster/<cluster_name>/stacklight/server.yml:
       classes:
       - cluster.<cluster_name>.stacklight.custom.alerts
       ...
  2. In the defined alert customizations file, modify the alert threshold by
     overriding the if parameter:
     parameters:
       prometheus:
         server:
           alert:
             GlanceErrorLogsTooHigh:
               if: >-
                 sum(rate(log_messages{service="glance", level=~"(?i:(error|emergency|fatal))"}[5m])) without (level) > 0.4
  3. From the Salt Master node, apply the changes:
     salt 'I@prometheus:server' state.sls prometheus.server
  4. Verify the updated alert definition in the Prometheus web UI.
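A minimal sketch for the log inspection mentioned in the Troubleshooting row of this
alert, to be run on the affected node. The log file names under /var/log/glance/ and
the matched severity strings depend on the enabled Glance services and their logging
configuration.
  # Show the most recent error-level messages across the Glance log files.
  grep -iE 'ERROR|CRITICAL' /var/log/glance/*.log | tail -n 50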