Deployment architecture
Mirantis Container Cloud deploys the StackLight stack as a release of a Helm chart that contains the helm-controller and helmbundles.lcm.mirantis.com (HelmBundle) custom resources.
The StackLight HelmBundle consists of a set of Helm charts with the StackLight components that include:
StackLight component |
Description |
---|---|
Alerta |
Receives, consolidates, and deduplicates the alerts sent by Alertmanager and visually represents them through a simple web UI. Using the Alerta web UI, you can view the most recent or watched alerts, as well as group and filter alerts. |
Alertmanager |
Handles the alerts sent by client applications such as Prometheus:
deduplicates, groups, and routes them to receiver integrations.
Using the Alertmanager web UI, you can view the most recent |
Elasticsearch Curator |
Maintains the data (indexes) in OpenSearch by performing operations such as creating, closing, or opening an index, as well as deleting a snapshot. Also manages the data retention policy in OpenSearch. |
Elasticsearch Exporter (compatible with OpenSearch) |
The Prometheus exporter that gathers internal OpenSearch metrics. |
Grafana |
Builds and visually represents metric graphs based on time series databases. Grafana supports querying of Prometheus using the PromQL language. |
Database backends |
StackLight uses PostgreSQL for Alerta and Grafana. PostgreSQL reduces the data storage fragmentation while enabling high availability. High availability is achieved using Patroni, the PostgreSQL cluster manager that monitors for node failures and manages failover of the primary node. StackLight also uses Patroni to manage major version upgrades of PostgreSQL clusters, which allows leveraging the database engine functionality and improvements as they are introduced upstream in new releases, maintaining functional continuity without version lock-in. |
Logging stack |
Responsible for collecting, processing, and persisting logs and Kubernetes events. By default, when deploying through the Container Cloud web UI, only the metrics stack is enabled on managed clusters. To enable StackLight to gather managed cluster logs, enable the logging stack during deployment. On management clusters, the logging stack is enabled by default. The logging stack components include:
Note: The performance of the logging mechanism depends on the cluster log load. In
case of a high load, you may need to increase the default resource requests
and limits for |
Metric collector |
Collects telemetry data (CPU or memory usage, number of active alerts, and so on) from Prometheus and sends the data to centralized cloud storage for further processing and analysis. Metric collector runs on the management cluster. Note: This component is intended for internal StackLight use only. |
Prometheus |
Gathers metrics. Automatically discovers and monitors the endpoints. Using the Prometheus web UI, you can view simple visualizations and debug. By default, the Prometheus database stores metrics of the past 15 days or up to 15 GB of data, whichever limit is reached first. |
Prometheus Blackbox Exporter |
Allows monitoring endpoints over HTTP, HTTPS, DNS, TCP, and ICMP. |
Prometheus-es-exporter |
Presents the OpenSearch data as Prometheus metrics by periodically sending configured queries to the OpenSearch cluster and exposing the results to a scrapable HTTP endpoint like other Prometheus targets. |
Prometheus Node Exporter |
Gathers hardware and operating system metrics exposed by the kernel. |
Prometheus Relay |
Adds a proxy layer to Prometheus that merges the results from the underlying Prometheus servers to prevent gaps in case some data is missing on some servers. Available only in the HA StackLight mode. |
Reference Application (removed in 2.28.3 (16.3.3)) |
Enables workload monitoring on non-MOSK managed clusters. Mimics a classical microservice application and provides metrics that describe the likely behavior of user workloads. Note: For feature support on MOSK deployments, refer to the MOSK documentation: Deploy your first cloud application using automation. |
Salesforce notifier |
Enables sending Alertmanager notifications to Salesforce to allow creating Salesforce cases and closing them once the alerts are resolved. Disabled by default. |
Salesforce reporter |
Queries Prometheus for the data about the amount of vCPU, vRAM, and vStorage used and available, combines the data, and sends it to Salesforce daily. Mirantis uses the collected data for further analysis and reports to improve the quality of customer support. Disabled by default. |
Telegraf |
Collects metrics from the system. Telegraf is plugin-driven and has two distinct sets of plugins: input plugins collect metrics from the system, services, or third-party APIs; output plugins write and expose metrics to various destinations. The Telegraf agents used in Container Cloud include:
|
Telemeter |
Enables a multi-cluster view through a Grafana dashboard of the management cluster. Telemeter includes a Prometheus federation push server and clients to enable isolated Prometheus instances, which cannot be scraped from a central Prometheus instance, to push metrics to the central location. The Telemeter services are distributed between the management cluster that hosts the Telemeter server and managed clusters that host the Telemeter client. The metrics from managed clusters are aggregated on management clusters. Note: This component is intended for internal StackLight use only. |
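The default Prometheus retention described in the table above combines a time limit (15 days) and a size limit (15 GB). The following Python sketch is a simplified model of how two such limits interact. It is not the Prometheus implementation: the Block type and the apply_retention function are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass

MAX_AGE_DAYS = 15    # default time limit stated in the table above
MAX_SIZE_GB = 15.0   # default size limit stated in the table above


@dataclass(frozen=True)
class Block:
    """Hypothetical unit of stored metrics data (illustration only)."""
    age_days: float
    size_gb: float


def apply_retention(blocks, max_age_days=MAX_AGE_DAYS, max_size_gb=MAX_SIZE_GB):
    """Return the blocks that survive both limits, newest first."""
    # Time limit: drop everything older than the retention window.
    kept = sorted(
        (b for b in blocks if b.age_days <= max_age_days),
        key=lambda b: b.age_days,
    )
    # Size limit: drop the oldest remaining blocks until the total fits.
    while kept and sum(b.size_gb for b in kept) > max_size_gb:
        kept.pop()  # the last element is the oldest block
    return kept


blocks = [Block(1, 6.0), Block(5, 6.0), Block(10, 6.0), Block(20, 1.0)]
survivors = apply_retention(blocks)
print([b.age_days for b in survivors])  # prints [1, 5]
```

In this example, the 20-day-old block is removed by the time limit, and the 10-day-old block is removed by the size limit even though it is still within the 15-day window.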
Every Helm chart contains a default values.yaml file. These default values are partially overridden by custom values defined in the StackLight Helm chart.
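To illustrate what a partial override of default values means, the following Python sketch layers a set of custom values over chart defaults: nested maps are merged key by key, and any other value is replaced. This mirrors the general way Helm layers values and is not taken from the Container Cloud code. The merge_values function and all keys in the sample data are hypothetical.

```python
import copy


def merge_values(defaults, overrides):
    """Return defaults with overrides applied; the inputs are not modified."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested maps are merged recursively, so untouched keys survive.
            merged[key] = merge_values(merged[key], value)
        else:
            # Scalars and lists from the override replace the default.
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical defaults of one component chart.
chart_defaults = {
    "replicas": 1,
    "resources": {"requests": {"cpu": "100m", "memory": "128Mi"}},
}
# Hypothetical custom values defined one level above.
stacklight_overrides = {
    "replicas": 2,
    "resources": {"requests": {"memory": "512Mi"}},
}

effective = merge_values(chart_defaults, stacklight_overrides)
print(effective)
```

Here, replicas and the memory request take the custom values, while the cpu request keeps its chart default because the override does not mention it.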
Before deploying a managed cluster, you can select the HA or non-HA StackLight architecture type. The non-HA mode is set by default. On management clusters, StackLight is deployed in the HA mode only. The following table lists the differences between the HA and non-HA modes:
Non-HA StackLight mode (default) |
HA StackLight mode |
---|---|
One persistent volume is provided for storing data. In case of a service or node failure, a new pod is redeployed and the volume is reattached to provide the existing data. This setup has a smaller hardware footprint but provides lower performance. |
Local Volume Provisioner is used to provide local host storage. In case
of a service or node failure, the traffic is automatically redirected to
any other running Prometheus or OpenSearch server. For better
performance, Mirantis recommends that you deploy StackLight in the HA
mode. Two |
Note
Before Container Cloud 2.24.0, Alertmanager had 2 replicas in the non-HA mode.
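The mode selection rules described above can be summarized in a short Python sketch: management clusters always use the HA mode, and managed clusters use the non-HA mode unless HA is explicitly selected. The resolve_stacklight_mode function and its parameters are hypothetical and do not correspond to any Container Cloud API.

```python
def resolve_stacklight_mode(cluster_type, ha_selected=False):
    """Return "HA" or "non-HA" for a cluster, following the rules above.

    cluster_type: "management" or "managed" (hypothetical labels).
    ha_selected: whether the operator selected the HA mode before deployment.
    """
    if cluster_type == "management":
        # Management clusters are deployed in the HA mode only.
        return "HA"
    if cluster_type == "managed":
        # Managed clusters default to non-HA unless HA is selected.
        return "HA" if ha_selected else "non-HA"
    raise ValueError(f"unknown cluster type: {cluster_type!r}")


print(resolve_stacklight_mode("managed"))
print(resolve_stacklight_mode("managed", ha_selected=True))
print(resolve_stacklight_mode("management"))
```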
Depending on the Container Cloud cluster type and selected StackLight database mode, StackLight is deployed on the following number of nodes:
Cluster |
StackLight database mode |
Target nodes |
---|---|---|
Management |
HA mode |
All Kubernetes master nodes |
Managed |
Non-HA mode |
|
HA mode |
All nodes with the |