StackLight¶

In the MCP 2019.2.4 maintenance update, Mirantis introduces the following enhancements for StackLight LMA:

Elasticsearch and Kibana versions update
Prometheus Elasticsearch exporter
TLS encryption for StackLight
VM state indicator
Docker services logging
KPI measurements
Alerts optimization
CADF notifications handled by Fluentd

To obtain the enhancements, follow the steps described in Apply maintenance updates.

Elasticsearch and Kibana versions update¶

Updated Elasticsearch and Kibana from version 5.6.12 to 6.8.0.

Prometheus Elasticsearch exporter¶

Added support for Prometheus Elasticsearch exporter that periodically sends configured queries to the Elasticsearch cluster and exposes the results as Prometheus metrics that you can view in the Prometheus web UI.

TLS encryption for StackLight¶

Added the capability to encrypt the communication between Prometheus and Telegraf, as well as between Fluentd and Elasticsearch, inside an MCP deployment using the Transport Layer Security (TLS) protocol.

Warning

This functionality does not encrypt the traffic between HAProxy and Elasticsearch.

Learn more

MCP Operations Guide: Enable TLS for StackLight

VM state indicator¶

Implemented the openstack_nova_instance_status and libvirt_domain_info_state metrics, which report the status of a VM from the OpenStack perspective and its state from the libvirt perspective, respectively. To view the metrics, use the Prometheus web UI.

Learn more

MCP Operations Guide: Use the Prometheus web UI
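Besides the web UI, the same metrics can usually be read through the standard Prometheus HTTP API (`/api/v1/query`). The following is a minimal sketch, not part of the MCP tooling: the base URL in the usage comment is a placeholder, and the helper names `build_query_url` and `fetch_series` are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Metric names taken from the release note above.
METRICS = ("openstack_nova_instance_status", "libvirt_domain_info_state")


def build_query_url(base_url, metric):
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': metric})}"


def fetch_series(base_url, metric, timeout=10):
    """Return the list of series that Prometheus reports for a metric.

    Requires network access to a Prometheus server.
    """
    with urlopen(build_query_url(base_url, metric), timeout=timeout) as resp:
        payload = json.load(resp)
    return payload["data"]["result"]


# Usage (assumption: replace the placeholder with your Prometheus address):
#   for metric in METRICS:
#       for series in fetch_series("http://prometheus.example:9090", metric):
#           print(series["metric"], series["value"])
```

The labels attached to each series depend on the deployment, so the sketch prints them as returned rather than assuming specific label names.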

Docker services logging¶

Added the capability for Fluentd to parse the Docker logs and send them to Elasticsearch. Now, you can view the Docker services logs in the Kibana web UI.

KPI measurements¶

Implemented the KPI Downtime and KPI Provisioning Grafana dashboards as well as the OVSInstanceArpingCheckDown and OpencontrailInstancePingCheckDownKey alerts to provide an overview of the infrastructure stability based on the following Key Performance Indicator (KPI) measurements:

Provisioning KPI

Provides the percentage of instance provisioning failures from the perspective of OpenStack notifications by tracking the compute.instance.create.start, compute.instance.create.end, and compute.instance.create.error Nova notifications and calculating the KPI on a daily basis. The measurements reset at midnight.

Downtime KPI

Provides the percentage of downtime check failures. Depending on the MCP cluster configuration, the downtime KPI includes the following measurements:

The states of instances from the OpenStack perspective. In this case, a check is considered failed if the instance state is ERROR.
The instance network checks from the OVS or OpenContrail perspective:
- For OVS, StackLight LMA performs Address Resolution Protocol (ARP) pings of the DHCP-assigned IP addresses of the OpenStack instances. The check is considered failed if none of the DHCP-assigned IPs of the instance responds to ARP pings for 10 minutes.
- For OpenContrail, StackLight LMA pings the link-local IP addresses of the OpenStack instances. The check is considered failed if none of the link-local IPs of the instance responds to pings for 10 minutes.
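The arithmetic behind both KPIs can be illustrated with a short sketch. This is not the StackLight LMA implementation: the function names are hypothetical, and the choice to count only finished provisioning attempts (end plus error notifications) as the denominator is an assumption, since the exact formula is not given here.

```python
from collections import Counter

FAILURE_WINDOW_SECONDS = 10 * 60  # the 10-minute threshold from the text


def provisioning_failure_pct(event_types):
    """Percentage of failed provisioning attempts for one day.

    Assumption: an attempt is finished once it emits either
    compute.instance.create.end or compute.instance.create.error.
    """
    counts = Counter(event_types)
    failed = counts["compute.instance.create.error"]
    finished = failed + counts["compute.instance.create.end"]
    return 100.0 * failed / finished if finished else 0.0


def network_check_failed(last_reply_by_ip, now):
    """True if none of the instance IPs replied within the last 10 minutes.

    Assumption: an instance without any known IP is not counted as failed.
    """
    if not last_reply_by_ip:
        return False
    return all(now - last >= FAILURE_WINDOW_SECONDS
               for last in last_reply_by_ip.values())


def downtime_pct(check_failed_flags):
    """Percentage of downtime checks that failed (True means failed)."""
    if not check_failed_flags:
        return 0.0
    return 100.0 * sum(check_failed_flags) / len(check_failed_flags)


# Illustrative data only: 20 attempts, 19 succeeded, 1 failed.
day = (["compute.instance.create.start"] * 20
       + ["compute.instance.create.end"] * 19
       + ["compute.instance.create.error"] * 1)
print(provisioning_failure_pct(day))

now = 1000.0
silent = {"10.0.0.5": now - 700, "10.0.0.6": now - 650}
partly_up = {"10.0.0.5": now - 700, "10.0.0.6": now - 100}
print(downtime_pct([network_check_failed(silent, now),
                    network_check_failed(partly_up, now)]))
```

Note that a single responding IP is enough for the network check to pass, which mirrors the "all IPs do not respond" condition described above.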

Alerts optimization¶

Enhanced the StackLight LMA alerts to optimize infrastructure monitoring:

Reconsidered the severities of the RabbitMQ* alerts and adjusted the Alertmanager* and SystemMemory* alerts.
Restructured and enhanced the alerts documentation to describe alert customization and troubleshooting recommendations, and to list the alerts that require post-deployment tuning according to the deployment configuration.
Added the following alerts:
Removed the inefficient ContrailFLows*, NovaHypervisor*, NovaAggregate*, NovaTotalVCPUs*, NovaTotalMemory*, NovaTotalDisk*, MemcachedServiceRespawn, MemcachedItemsNoneMinor, SystemSwap*, PrometheusTargetSamples*, and PrometheusDataIngestionWarning alerts.

Note

These alerts will be removed automatically when updating to MCP 2019.2.4. However, if you have modified any of these alerts, you must remove them manually as described in MCP Operations Guide: Manage alerts.

CADF notifications handled by Fluentd¶

Added the capability for Fluentd to handle the OpenStack Cloud Auditing Data Federation (CADF) notifications instead of Heka. Deprecated the Heka service.

If required, you can configure Fluentd running on the RabbitMQ nodes to forward the CADF events to specific external security information and event management (SIEM) systems. For details, see MCP Operations Guide: Enable sending CADF events to external SIEM systems.

To enable CADF notifications handling by Fluentd and remove Heka:

On the cluster level of the Reclass model:

In openstack/message_queue.yml, add the following class:
```
- system.fluentd.label.notifications
```

In stacklight/client.yml, remove the following class:

- system.docker.swarm.stack.monitoring.remote_collector

In stacklight/server.yml, remove the Heka classes:

- system.heka.remote_collector.container
- system.heka.remote_collector.input.amqp
- system.heka.remote_collector.output.elasticsearch
- system.heka.remote_collector.output.telegraf

From the Salt Master node:

Update the Fluentd configuration:

salt -C "I@fluentd:agent" state.sls fluentd

Apply the changes:

salt -C "I@docker:swarm:role:master and I@prometheus:server" state.sls docker.client

Remove the Docker service with Heka:

salt -C "I@docker:swarm:role:master and I@prometheus:server" cmd.run 'docker service rm monitoring_remote_collector'

updated: 2025-01-10 09:02
