StackLight configuration parameters

This section describes the StackLight configuration keys that you can specify in the values section to change StackLight settings as required. Prior to making any changes to StackLight configuration, perform the steps described in StackLight configuration procedure. After changing StackLight configuration, verify the changes as described in Verify StackLight after configuration.

Important

Some parameters are marked as mandatory. Failure to specify values for such parameters causes the Admission Controller to reject cluster creation.


Alerta

Key

Description

Example values

alerta.enabled (bool)

Enables or disables Alerta. Set to true by default.

true or false

Grafana

Key

Description

Example values

grafana.renderer.enabled (bool) Removed in 2.27.0 (17.2.0 and 16.2.0)

Enables or disables Grafana Image Renderer. You can disable it, for example, in resource-limited environments. Enabled by default.

true or false

grafana.homeDashboard (string)

Defines the home dashboard. Set to kubernetes-cluster by default. You can define any of the available dashboards.

kubernetes-cluster
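The dotted keys in these tables map to nested keys in the StackLight values, as the logging.retentionTime example below shows. As an illustration, the following fragment sets the Grafana keys described above. The renderer key applies only to releases earlier than 2.27.0, where it still exists.

```yaml
grafana:
  # Home dashboard; kubernetes-cluster is the default
  homeDashboard: kubernetes-cluster
  # Only for releases earlier than 2.27.0 (17.2.0 and 16.2.0), where this key was removed
  renderer:
    enabled: false
```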

Logging

Key

Description

Example values

logging.enabled (bool) Mandatory

Enables or disables the StackLight logging stack. For details about the logging components, see Deployment architecture. Set to true by default. On management clusters, true is mandatory.

true or false

logging.level (string)

Removed in Container Cloud 2.26.0 (Cluster releases 17.1.0 and 16.1.0). Sets the least important level of log messages to send to OpenSearch. Requires logging.enabled set to true.

The default logging level is INFO, meaning that StackLight drops log messages of the lower DEBUG and TRACE levels. Levels from WARNING to EMERGENCY require attention.

Note

The FLUENTD_ERROR logs are of special type and cannot be dropped.

  • TRACE - the most verbose logs. Such level generates large amounts of data.

  • DEBUG - messages typically of use only for debugging purposes.

  • INFO - informational messages describing common processes such as service starting or stopping. Can be ignored during normal system operation but may provide additional input for investigation.

  • NOTICE - normal but significant conditions that may require special handling.

  • WARNING - messages on unexpected conditions that may require attention.

  • ERROR - messages on error conditions that prevent normal system operation and require action.

  • CRITICAL - messages on critical conditions indicating that a service is not working or working incorrectly.

  • ALERT - messages on severe events indicating that action is needed immediately.

  • EMERGENCY - messages indicating that a service is unusable.

logging.metricQueries (map)

Allows configuring OpenSearch queries on the data stored in OpenSearch. Prometheus Elasticsearch Exporter runs these queries against the OpenSearch database and exposes the results as metrics in the Prometheus format. For details, see Create logs-based metrics. Includes the following parameters:

  • indices - specifies the index pattern

  • interval and timeout - specify in seconds how often to send the query to OpenSearch and how long it can last before timing out

  • onError and onMissing - modify the prometheus-es-exporter behavior on query error and missing index. For details, see Prometheus Elasticsearch Exporter.

For a usage example, see Create logs-based metrics.

logging.retentionTime (map)

Removed in Container Cloud 2.26.0 (Cluster releases 17.1.0 and 16.1.0). Specifies the retention time per index. Includes the following parameters:

  • logstash - specifies the logstash-* index retention time.

  • events - specifies the kubernetes_events-* index retention time.

  • notifications - specifies the notification-* index retention time.

Allowed values are integers (interpreted as days) and numbers with one of the following suffixes, in lowercase or uppercase: y, m, w, d, h.

logging:
  retentionTime:
    logstash: 3
    events: "2w"
    notifications: "1M"

Log verbosity

Key

Description

Example values

stacklightLogLevels.default (string)

Defines the log verbosity level for all StackLight components that do not have a level defined in stacklightLogLevels.component. To use the default log verbosity level of each component, leave the string empty.

  • trace - most verbose log messages, generates large amounts of data

  • debug - messages typically of use only for debugging purposes

  • info - informational messages describing common processes such as service starting or stopping; can be ignored during normal system operation but may provide additional input for investigation

  • warn - messages about conditions that may require attention

  • error - messages on error conditions that prevent normal system operation and require action

  • crit - messages on critical conditions indicating that a service is not working, working incorrectly or is unusable, requiring immediate attention

Since Cluster releases 17.0.0, 16.0.0, and 14.1.0, StackLight automatically adds the NO_SEVERITY severity label to logs that have no severity label in the message. This gives greater control over which logs Fluentd processes and prevents logs from being skipped by mistake.

stacklightLogLevels.component (map)

Defines the log verbosity level for each StackLight component separately, overriding stacklightLogLevels.default. To use the default log verbosity level of a component, leave its string empty.

component:
  kubeStateMetrics: ""
  prometheusAlertManager: ""
  prometheusBlackboxExporter: ""
  prometheusNodeExporter: ""
  prometheusServer: ""
  alerta: ""
  alertmanagerWebhookServicenow: ""
  elasticsearchCurator: ""
  postgresql: ""
  prometheusEsExporter: ""
  sfNotifier: ""
  sfReporter: ""
  fluentd: ""
  # fluentdElasticsearch: ""
  fluentdLogs: ""
  telemeterClient: ""
  telemeterServer: ""
  tfControllerExporter: ""
  tfVrouterExporter: ""
  telegrafDs: ""
  telegrafS: ""
  # elasticsearch: ""
  opensearch: ""
  # kibana: ""
  grafana: ""
  opensearchDashboards: ""
  metricbeat: ""
  prometheusMsTeams: ""
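A minimal sketch combining both keys under stacklightLogLevels: every component logs at the info level, except Prometheus server and Alerta, which get their own levels. The chosen levels are arbitrary and serve only as an illustration.

```yaml
stacklightLogLevels:
  # Applies to every component without its own entry under component
  default: info
  component:
    # Override the default level for these components only
    prometheusServer: debug
    alerta: warn
```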

Logging to external outputs

Available since 2.23.0 and 2.23.1 for MOSK 23.1

Key

Description

Example values

logging.externalOutputs (map)

Specifies external Elasticsearch, OpenSearch, and syslog destinations as fluentd-logs outputs. Requires logging.enabled: true. For the configuration procedure, see Enable log forwarding to external destinations.

logging:
  externalOutputs:
    elasticsearch:
      # disabled: false
      type: elasticsearch
      level: info
      plugin_log_level: info
      tag_exclude: '{fluentd-logs,systemd}'
      host: elasticsearch-host
      port: 9200
      logstash_date_format: '%Y.%m.%d'
      logstash_format: true
      logstash_prefix: logstash
      ...
      buffer:
        # disabled: false
        chunk_limit_size: 16m
        flush_interval: 15s
        flush_mode: interval
        overflow_action: block
        ...
    opensearch:
      disabled: true
      type: opensearch
      ...

Secrets for external log outputs

Available since 2.23.0 and 2.23.1 for MOSK 23.1

Key

Description

Example values

logging.externalOutputSecretMounts (map)

Specifies authentication secret mounts for external log destinations. Requires logging.externalOutputs to be enabled and a Kubernetes secret to be created in the stacklight namespace. Contains the following values:

  • secretName

    Mandatory. Kubernetes secret name.

  • mountPath

    Mandatory. Mount path of the Kubernetes secret defined in secretName.

  • defaultMode

    Optional. Decimal number defining the secret permissions. Set to 420 by default, which is 0644 in octal notation.

Secret mount configuration:

logging:
  externalOutputSecretMounts:
  - secretName: elasticsearch-certs
    mountPath: /tmp/elasticsearch-certs
    defaultMode: 420
  - secretName: opensearch-certs
    mountPath: /tmp/opensearch-certs

Elasticsearch configuration for the above secret mount:

logging:
  externalOutputs:
    elasticsearch:
      ...
      ca_file: /tmp/elasticsearch-certs/ca.pem
      client_cert: /tmp/elasticsearch-certs/client.pem
      client_key: /tmp/elasticsearch-certs/client.key
      client_key_pass: password

Logging to syslog

Deprecated since 2.23.0

Note

Since Container Cloud 2.23.0, logging.syslog is deprecated in favor of logging.externalOutputs. For details, see Logging to external outputs.

Key

Description

Example values

logging.syslog.enabled (bool)

Enables or disables remote logging to syslog. Disabled by default. Requires logging.enabled set to true. For details and a configuration example, see Enable remote logging to syslog.

true or false

logging.syslog.host (string)

Specifies the remote syslog host.

remote-syslog.svc

logging.syslog.level (string)

Removed in Container Cloud 2.26.0 (Cluster releases 17.1.0 and 16.1.0). Specifies logging level for the syslog output.

INFO

logging.syslog.port (string)

Specifies the remote syslog port.

514

logging.syslog.packetSize (string)

Defines the packet size in bytes for the syslog logging output. Set to 1024 by default. A larger value may be useful for syslog setups that allow packets larger than 1 kB. Mirantis recommends tuning this parameter so that full log lines can be sent.

1024

logging.syslog.protocol (string)

Specifies the remote syslog protocol. Set to udp by default.

tcp or udp

logging.syslog.tls.enabled (bool)

Optional. Enables or disables TLS. Disabled by default. TLS applies only to the TCP protocol: if you set a protocol other than TCP, TLS is not enabled.

true or false

logging.syslog.tls.verify_mode (int)

Optional. Configures TLS verification.

  • 0 for OpenSSL::SSL::VERIFY_NONE

  • 1 for OpenSSL::SSL::VERIFY_PEER

  • 2 for OpenSSL::SSL::VERIFY_FAIL_IF_NO_PEER_CERT

  • 4 for OpenSSL::SSL::VERIFY_CLIENT_ONCE

logging.syslog.tls.certificate (string)

Defines how to pass the certificate. If both options are set, secret takes precedence over hostPath.

  • secret - specifies the name of the secret holding the certificate.

  • hostPath - specifies an absolute host path to the PEM certificate.

certificate:
  secret: ""
  hostPath: "/etc/ssl/certs/ca-bundle.pem"

tag_exclude (string) Since 2.23.0

Optional. Overrides tag_include. Specifies the tags of logs to exclude from the destination output. For example, to exclude all logs with a tag containing test, set tag_exclude: '/.*test.*/'.

How to obtain tags for logs

Select from the following options:

  • In the main OpenSearch output, use the logger field that equals the tag.

  • For logs of a particular Pod or container, the tag is determined in the following order, with the first match winning:

    1. The value of the app Pod label. For example, for app=opensearch-master, use opensearch-master as the log tag.

    2. The value of the k8s-app Pod label.

    3. The value of the app.kubernetes.io/name Pod label.

    4. If a release_group Pod label exists and the component Pod label starts with app, use the value of the component label as the tag. Otherwise, the tag is the application label joined to the component label with a -.

    5. The name of the container from which the log is taken.

The values of tag_exclude and tag_include are placed into the <match> directives of Fluentd and accept only the patterns supported by the Fluentd <match> directive. For details, refer to the Fluentd official documentation.

'{fluentd-logs,systemd}'

tag_include (string) Since 2.23.0

Optional. Is overridden by tag_exclude. Specifies the tags of logs to include in the destination output. For example, to include all logs with a tag containing auth, set tag_include: '/.*auth.*/'.

'/.*auth.*/'
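For clusters that still use the deprecated logging.syslog keys, the following sketch shows how the syslog keys above fit together for a TLS-enabled TCP connection. The host and certificate path are the placeholders from the example values, and verify_mode: 1 corresponds to OpenSSL::SSL::VERIFY_PEER. For new configurations, use logging.externalOutputs instead.

```yaml
logging:
  enabled: true              # required by logging.syslog
  syslog:
    enabled: true
    host: remote-syslog.svc  # placeholder host from the example values
    port: 514
    protocol: tcp            # TLS works only with TCP
    packetSize: 1024
    tls:
      enabled: true
      verify_mode: 1         # OpenSSL::SSL::VERIFY_PEER
      certificate:
        # secret takes precedence over hostPath if both are set
        secret: ""
        hostPath: "/etc/ssl/certs/ca-bundle.pem"
```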

Log filtering for namespaces

Available since Cluster releases 17.0.0, 16.0.0, 14.1.0

Key

Description

Example values

logging.namespaceFiltering.logs.enabled (bool)

Limits Pod log collection to a list of Kubernetes namespaces. Enabled by default with the following monitored namespaces:

Kubernetes namespaces monitored by default
  • ceph (if Ceph is enabled)

  • ceph-lcm-mirantis (if Ceph is enabled)

  • default

  • kaas

  • kube-node-lease

  • kube-public

  • kube-system

  • lcm-system

  • local-path-storage

  • metallb (for bare metal and vSphere clusters)

  • metallb-system (for bare metal and vSphere clusters)

  • node-feature-discovery

  • openstack

  • openstack-ceph-shared (if Ceph is enabled)

  • openstack-lma-shared

  • openstack-provider-system

  • openstack-redis

  • openstack-tf-share (if Tungsten Fabric is enabled)

  • openstack-vault

  • osh-system

  • rook-ceph (if Ceph is enabled)

  • stacklight

  • system

  • tf (if Tungsten Fabric is enabled)

true or false

logging.namespaceFiltering.logs.extraNamespaces (map)

Adds extra namespaces to collect Kubernetes Pod logs from. Requires logging.enabled and logging.namespaceFiltering.logs.enabled set to true. Defines a YAML-formatted list of namespaces, which is empty by default.

logging:
  namespaceFiltering:
    logs:
      enabled: true
      extraNamespaces:
      - custom-ns-1

logging.namespaceFiltering.events.enabled (bool)

Limits Kubernetes events collection to a list of namespaces. Disabled by default because the sysdig scanner present on some MOSK clusters and cluster-scoped objects produce events in the default namespace, which is not passed to the StackLight configuration. Requires logging.enabled set to true.

true or false

logging.namespaceFiltering.events.extraNamespaces (map)

Adds extra namespaces to collect Kubernetes events from. Requires logging.enabled and logging.namespaceFiltering.events.enabled set to true. Defines a YAML-formatted list of namespaces, which is empty by default.

logging:
  namespaceFiltering:
    events:
      enabled: true
      extraNamespaces:
      - custom-ns-1

Enforce OOPS compression

Available since Cluster releases 17.0.0, 16.0.0, 14.1.0

Key

Description

Example values

logging.enforceOopsCompression

Enforces a heap size of 32 GB unless the defined memory limit allows using 50 GB of heap. Requires logging.enabled set to true. Enabled by default. When disabled, StackLight computes the heap size as ⅘ of the defined memory limit, whatever the resulting value. For details, see Tune OpenSearch performance for the bare metal provider.

logging:
  enforceOopsCompression: true
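As a worked example of the ⅘ rule, assume a hypothetical OpenSearch memory limit of 48 Gi with logging.enforceOopsCompression disabled:

```latex
\text{heap} = \tfrac{4}{5} \times 48\,\text{Gi} = 38.4\,\text{Gi}
```

Read literally, the description above implies that with the parameter enabled this heap would instead be held at 32 GB, because 38.4 Gi does not reach the 50 GB threshold. This is an interpretation of the description; confirm the exact behavior in Tune OpenSearch performance for the bare metal provider.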

OpenSearch

Key

Description

Example values

elasticsearch.retentionTime (map)

Removed in Container Cloud 2.26.0 (Cluster releases 17.1.0 and 16.1.0). Specifies the retention time per index. Includes the following parameters:

  • logstash - specifies the logstash-* index retention time.

  • events - specifies the kubernetes_events-* index retention time.

  • notifications - specifies the notification-* index retention time.

Allowed values are integers (interpreted as days) and numbers with one of the following suffixes, in lowercase or uppercase: y, m, w, d, h.

By default, values set in elasticsearch.logstashRetentionTime are used. However, the elasticsearch.retentionTime parameters, if defined, take precedence over elasticsearch.logstashRetentionTime.

elasticsearch:
  retentionTime:
    logstash: 3
    events: "2w"
    notifications: "1M"

elasticsearch.logstashRetentionTime (int)

Removed in Container Cloud 2.26.0 (Cluster releases 17.1.0 and 16.1.0).

Defines the OpenSearch (Elasticsearch) logstash-* index retention time in days. The logstash-* index stores all logs gathered from all nodes and containers. Set to 1 by default.

Note

Due to the known issue 27732-2, a custom setting for this parameter is ignored during cluster deployment and reset to the default of one day. Refer to the known issue description for the affected Cluster releases and available workaround.

1, 5, 15

elasticsearch.persistentVolumeClaimSize (string) Mandatory

Specifies the OpenSearch (Elasticsearch) PVC size. The number of PVCs depends on the StackLight database mode: in HA mode, StackLight creates three PVCs, each of the specified size; in non-HA mode, it creates one PVC of the specified size.

Important

You cannot modify this parameter after cluster creation.

Note

Due to the known issue 27732-1 that is fixed in Container Cloud 2.22.0 (Cluster releases 11.6.0 and 12.7.0), the OpenSearch PVC size configuration is ignored during cluster deployment. Refer to the known issue description for the affected Cluster releases and available workarounds.

elasticsearch:
  persistentVolumeClaimSize: 30Gi

elasticsearch.persistentVolumeUsableStorageSizeGB (integer) Available since 2.26.0 (17.1.0, 16.1.0)

Optional. Specifies the number of gigabytes available exclusively for OpenSearch data. Defines the ceiling for storage-based retention, where 80% of the defined value is considered the disk space available for normal OpenSearch node operation. If not set (default), the number of gigabytes from elasticsearch.persistentVolumeClaimSize is used.

This parameter is useful in the following cases:

  • The real storage behind the volume is shared between multiple consumers. As a result, OpenSearch cannot use the full elasticsearch.persistentVolumeClaimSize.

  • The real volume size is bigger than elasticsearch.persistentVolumeClaimSize. As a result, OpenSearch can use more than elasticsearch.persistentVolumeClaimSize.

elasticsearch:
  persistentVolumeUsableStorageSizeGB: 160
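For example, with the value of 160 shown above, the disk space that storage-based retention treats as available works out as follows:

```latex
0.8 \times 160\,\text{GB} = 128\,\text{GB}
```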

OpenSearch extra settings

Key

Description

Example values

logging.extraConfig (map)

Additional configuration for opensearch.yml.

logging:
  extraConfig:
    cluster.max_shards_per_node: 5000

OpenSearch Dashboards extra settings

Key

Description

Example values

logging.dashboardsExtraConfig (map)

Additional configuration for opensearch_dashboards.yml.

logging:
  dashboardsExtraConfig:
    opensearch.requestTimeout: 60000

High availability

Key

Description

Example values

highAvailabilityEnabled (bool) Mandatory

Enables or disables StackLight multiserver mode. For details, see StackLight database modes in Deployment architecture. On managed clusters, set to false by default. On management clusters, true is mandatory.

true or false
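On management clusters, both highAvailabilityEnabled and logging.enabled must be set to true. The following sketch shows these mandatory values together with the mandatory PVC sizes, using the example values from this section.

```yaml
highAvailabilityEnabled: true       # mandatory on management clusters
logging:
  enabled: true                     # mandatory on management clusters
elasticsearch:
  persistentVolumeClaimSize: 30Gi   # cannot be changed after cluster creation
prometheusServer:
  persistentVolumeClaimSize: 16Gi   # cannot be changed after cluster creation
```

In HA mode, each of these sizes applies to three PVCs, so this example requests 3 × 30Gi for OpenSearch and 3 × 16Gi for Prometheus.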

Prometheus

Key

Description

Example values

prometheusServer.alertResendDelay (string)

Defines the minimum amount of time for Prometheus to wait before resending an alert to Alertmanager. Passed to the --rules.alert.resend-delay flag. Set to 2m by default.

2m, 90s

prometheusServer.alertsCommonLabels (dict) Since 2.26.0 (17.1.0, 16.1.0)

Defines the labels to inject into firing alerts when they are sent to Alertmanager. Empty by default.

The following labels are reserved for internal purposes and cannot be overridden: cluster_id, service, severity.

Caution

When new labels are injected, Prometheus sends alert updates with the new set of labels. If the cluster has firing alerts at that moment, Alertmanager may briefly show duplicated alerts.

alertsCommonLabels:
  region: west
  environment: prod

prometheusServer.persistentVolumeClaimSize (string) Mandatory

Specifies the Prometheus PVC size. The number of PVCs depends on the StackLight database mode: in HA mode, StackLight creates three PVCs, each of the specified size; in non-HA mode, it creates one PVC of the specified size.

Important

You cannot modify this parameter after cluster creation.

prometheusServer:
  persistentVolumeClaimSize: 16Gi

prometheusServer.queryConcurrency (string) Since 2.24.0

Defines the maximum number of concurrent queries. Passed to the --query.max-concurrency flag. Set to 20 by default.

25

prometheusServer.retentionSize (string)

Defines the Prometheus database retention size. Passed to the --storage.tsdb.retention.size flag. Set to 15GB by default.

15GB, 512MB

prometheusServer.retentionTime (string)

Defines the Prometheus database retention period. Passed to the --storage.tsdb.retention.time flag. Set to 15d by default.

15d, 1000h, 10d12h
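The following sketch gathers the Prometheus server keys above into one fragment, using the default or example values from this table. If both retentionTime and retentionSize are set, Prometheus applies whichever limit is reached first.

```yaml
prometheusServer:
  alertResendDelay: 2m          # --rules.alert.resend-delay
  queryConcurrency: 25          # --query.max-concurrency; since 2.24.0
  retentionTime: 15d            # --storage.tsdb.retention.time
  retentionSize: 15GB           # --storage.tsdb.retention.size
  alertsCommonLabels:           # since 2.26.0
    region: west
    environment: prod
```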

Prometheus remote write

Allows sending metrics from Prometheus to a custom monitoring endpoint. For details, see Prometheus Documentation: remote_write.

Key

Description

Example values

prometheusServer.remoteWriteSecretMounts (slice)

Skip this parameter if your remote server does not require authorization. Defines additional mounts for remoteWrites secrets. Secret objects with the credentials required to access the remote endpoint must be created in the stacklight namespace beforehand. For details, see Kubernetes Secrets.

Note

To create more than one file for the same remote write endpoint, for example, to configure TLS connections, use a single secret object with multiple keys in the data field. Using the following example configuration, two files will be created, cert_file and key_file:

...
  data:
    cert_file: aWx1dnRlc3Rz
    key_file: dGVzdHVzZXI=
...
remoteWriteSecretMounts:
- secretName: prom-secret-files
  mountPath: /etc/config/remote_write

prometheusServer.remoteWrites (slice)

Defines the configuration of a custom remote_write endpoint for sending Prometheus samples.

Note

If the remote server uses authorization, first create the secrets in the stacklight namespace and mount them to Prometheus through prometheusServer.remoteWriteSecretMounts. Then reference the mounted secret file in the authorization field.

remoteWrites:
- url: http://remote_url/push
  authorization:
    credentials_file: /etc/config/remote_write/key_file
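A fuller sketch of an authenticated remote write setup, assuming a bearer token stored in a Secret. The Secret name prom-secret-files and the mount path come from the examples above; the token key, its value, and the endpoint URL are hypothetical. Kubernetes creates one file per Secret key under mountPath, so the token ends up in /etc/config/remote_write/token.

```yaml
# 1. Secret created in the stacklight namespace beforehand
apiVersion: v1
kind: Secret
metadata:
  name: prom-secret-files
  namespace: stacklight
type: Opaque
stringData:
  token: example-token                     # hypothetical credential
---
# 2. StackLight values that mount the Secret and reference the resulting file
prometheusServer:
  remoteWriteSecretMounts:
  - secretName: prom-secret-files
    mountPath: /etc/config/remote_write
  remoteWrites:
  - url: https://remote.example.com/push   # hypothetical endpoint
    authorization:
      # The authorization type defaults to Bearer in Prometheus
      credentials_file: /etc/config/remote_write/token
```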

Prometheus Relay

Note

Prometheus Relay is set up as an endpoint in the Prometheus datasource in Grafana. Therefore, all requests from Grafana are sent to Prometheus through Prometheus Relay. If Prometheus Relay reports request timeouts or exceeds the response size limits, you can configure the parameters below. In this case, Prometheus Relay resource limits may also require tuning.

Key

Description

Example values

prometheusRelay.clientTimeout (string)

Specifies the client timeout in seconds. If empty, defaults to a value determined by the cluster size: 10 for small, 30 for medium, 60 for large.

Note

The cluster size parameters are available since Container Cloud 2.24.0.

10

prometheusRelay.responseLimitBytes (string)

Specifies the response size limit in bytes. If empty, defaults to a value determined by the cluster size: 6291456 for small, 18874368 for medium, 37748736 for large.

Note

The cluster size parameters are available since Container Cloud 2.24.0.

1048576
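If Prometheus Relay keeps reporting timeouts or oversized responses, you can set both values explicitly. The sketch below applies the large-size defaults quoted above, for example, on a cluster whose clusterSize is medium. The values are strings, as declared in the table; 37748736 bytes equals 36 MiB (36 × 1048576).

```yaml
prometheusRelay:
  clientTimeout: "60"             # seconds; the large-size default
  responseLimitBytes: "37748736"  # 36 MiB; the large-size default
```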

Custom Prometheus recording rules

Key

Description

Example values

prometheusServer.customRecordingRules (slice)

Defines custom Prometheus recording rules. Overriding existing recording rules is not supported.

customRecordingRules:
- name: ExampleRule.http_requests_total
  rules:
  - expr: sum by(job) (rate(http_requests_total[5m]))
    record: job:http_requests:rate5m
  - expr: avg_over_time(job:http_requests:rate5m[1w])
    record: job:http_requests:rate5m:avg_over_time_1w

Custom Prometheus scrape configurations

Key

Description

Example values

prometheusServer.customScrapeConfigs (map)

Defines custom Prometheus scrape configurations. For details, see Prometheus documentation: scrape_config. The names of the default StackLight scrape configurations, which you can view in the Status -> Targets tab of the Prometheus web UI, are reserved for internal use, and any overrides of them are discarded. Therefore, give custom scrape configurations unique names.

customScrapeConfigs:
  custom-grafana:
    scrape_interval: 10s
    scrape_timeout: 5s
    kubernetes_sd_configs:
    - role: endpoints
    relabel_configs:
    - source_labels:
      - __meta_kubernetes_service_label_app
      - __meta_kubernetes_endpoint_port_name
      regex: grafana;service
      action: keep
    - source_labels:
      - __meta_kubernetes_pod_name
      target_label: pod

Cluster size

Key

Description

Example values

clusterSize (string)

Specifies the approximate expected cluster size. Set to small by default. Other possible values include medium and large. Depending on the choice, appropriate resource limits are passed according to the resources or deprecated resourcesPerClusterSize parameter. The sizes differ in the OpenSearch and Prometheus resource limits:

  • small (default) - 2 CPU, 6 Gi RAM for OpenSearch, 1 CPU, 8 Gi RAM for Prometheus. Use small only for testing and evaluation purposes with no workloads expected.

  • medium - 4 CPU, 16 Gi RAM for OpenSearch, 3 CPU, 16 Gi RAM for Prometheus.

  • large - 8 CPU, 32 Gi RAM for OpenSearch, 6 CPU, 32 Gi RAM for Prometheus. Use large only if OpenSearch and Prometheus lack resources.

small, medium, or large
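A minimal sketch that selects the medium size and additionally overrides the Prometheus server memory limit through the resources parameter described in Resource limits. The 24Gi value is hypothetical.

```yaml
# Medium size: 4 CPU and 16 Gi RAM for OpenSearch,
# 3 CPU and 16 Gi RAM for Prometheus
clusterSize: medium
resources:
  prometheusServer:
    limits:
      memory: "24Gi"   # hypothetical override of the size-based limit
```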

Resource limits

Key

Description

Example values

resourcesPerClusterSize (map)

Provides the capability to override the default resource requests or limits for any StackLight component for the predefined cluster sizes.

Caution

Since Container Cloud 2.28.0 (Cluster releases 17.3.0 and 16.3.0), resourcesPerClusterSize is deprecated. Use the resources parameter instead.

StackLight components for resource limits customization

Note

The below list has the componentName: <podNamePrefix>/<containerName> format.

alerta: alerta/alerta
alertmanager: prometheus-alertmanager/prometheus-alertmanager
alertmanagerWebhookServicenow: alertmanager-webhook-servicenow/alertmanager-webhook-servicenow
blackboxExporter: prometheus-blackbox-exporter/blackbox-exporter
elasticsearch: opensearch-master/opensearch # Deprecated
elasticsearchCurator: elasticsearch-curator/elasticsearch-curator
elasticsearchExporter: elasticsearch-exporter/elasticsearch-exporter
fluentdElasticsearch: fluentd-logs/fluentd-logs # Deprecated
fluentdLogs: fluentd-logs/fluentd-logs
fluentdNotifications: fluentd-notifications/fluentd # for MOSK
grafana: grafana/grafana
grafanaRenderer: grafana/grafana-renderer # Removed in 2.27.0 (Cluster releases 17.2.0 and 16.2.0)
iamProxy: iam-proxy/iam-proxy # Deprecated
iamProxyAlerta: iam-proxy-alerta/iam-proxy
iamProxyAlertmanager: iam-proxy-alertmanager/iam-proxy
iamProxyGrafana: iam-proxy-grafana/iam-proxy
iamProxyKibana: iam-proxy-kibana/iam-proxy # Deprecated
iamProxyOpenSearchDashboards: iam-proxy-kibana/iam-proxy
iamProxyPrometheus: iam-proxy-prometheus/iam-proxy
kibana: opensearch-dashboards/opensearch-dashboards # Deprecated
kubeStateMetrics: prometheus-kube-state-metrics/prometheus-kube-state-metrics
libvirtExporter: prometheus-libvirt-exporter/prometheus-libvirt-exporter # for MOSK
metricCollector: metric-collector/metric-collector
metricbeat: metricbeat/metricbeat
nodeExporter: prometheus-node-exporter/prometheus-node-exporter
opensearch: opensearch-master/opensearch
opensearchDashboards: opensearch-dashboards/opensearch-dashboards
patroniExporter: patroni/patroni-patroni-exporter
pgsqlExporter: patroni/patroni-pgsql-exporter
postgresql: patroni/patroni
prometheusEsExporter: prometheus-es-exporter/prometheus-es-exporter
prometheusMsTeams: prometheus-msteams/prometheus-msteams
prometheusRelay: prometheus-relay/prometheus-relay
prometheusServer: prometheus-server/prometheus-server
refapp: refapp/refapp # Removed in 2.28.3 (16.3.3)
refappCleanup: refapp-cleanup/refapp-cleanup # Removed in 2.28.3 (16.3.3)
refappInit: db-init/db-init # Removed in 2.28.3 (16.3.3)
sfNotifier: sf-notifier/sf-notifier
sfReporter: sf-reporter/sf-reporter
stacklightHelmControllerController: stacklight-helm-controller/controller
telegrafDockerSwarm: telegraf-docker-swarm/telegraf-docker-swarm
telegrafDs: telegraf-ds-smart/telegraf-ds-smart # Deprecated
telegrafDsSmart: telegraf-ds-smart/telegraf-ds-smart
telegrafOpenstack: telegraf-openstack/telegraf-openstack # for MOSK, replaced with osdpl-exporter in 24.1
telegrafS: telegraf-docker-swarm/telegraf-docker-swarm # Deprecated
telemeterClient: telemeter-client/telemeter-client
telemeterServer: telemeter-server/telemeter-server
telemeterServerAuthServer: telemeter-server/telemeter-server-authorization-server
tfControllerExporter: prometheus-tf-controller-exporter/prometheus-tungstenfabric-exporter # for MOSK
tfVrouterExporter: prometheus-tf-vrouter-exporter/prometheus-tungstenfabric-exporter # for MOSK
resourcesPerClusterSize:
  # elasticsearch:
  opensearch:
    small:
      limits:
        cpu: "1000m"
        memory: "4Gi"
    medium:
      limits:
        cpu: "2000m"
        memory: "8Gi"
      requests:
        cpu: "1000m"
        memory: "4Gi"
    large:
      limits:
        cpu: "4000m"
        memory: "16Gi"

resources (map)

Provides the capability to override the container resource requests or limits for any StackLight component.

StackLight components for resource limits customization

Note

The below list has the componentName: <podNamePrefix>/<containerName> format.

alerta: alerta/alerta
alertmanager: prometheus-alertmanager/prometheus-alertmanager
alertmanagerWebhookServicenow: alertmanager-webhook-servicenow/alertmanager-webhook-servicenow
blackboxExporter: prometheus-blackbox-exporter/blackbox-exporter
elasticsearch: opensearch-master/opensearch # Deprecated
elasticsearchCurator: elasticsearch-curator/elasticsearch-curator
elasticsearchExporter: elasticsearch-exporter/elasticsearch-exporter
fluentdElasticsearch: fluentd-logs/fluentd-logs # Deprecated
fluentdLogs: fluentd-logs/fluentd-logs
fluentdNotifications: fluentd-notifications/fluentd # for MOSK
grafana: grafana/grafana
grafanaRenderer: grafana/grafana-renderer # Removed in 2.27.0 (Cluster releases 17.2.0 and 16.2.0)
iamProxy: iam-proxy/iam-proxy # Deprecated
iamProxyAlerta: iam-proxy-alerta/iam-proxy
iamProxyAlertmanager: iam-proxy-alertmanager/iam-proxy
iamProxyGrafana: iam-proxy-grafana/iam-proxy
iamProxyKibana: iam-proxy-kibana/iam-proxy # Deprecated
iamProxyOpenSearchDashboards: iam-proxy-kibana/iam-proxy
iamProxyPrometheus: iam-proxy-prometheus/iam-proxy
kibana: opensearch-dashboards/opensearch-dashboards # Deprecated
kubeStateMetrics: prometheus-kube-state-metrics/prometheus-kube-state-metrics
libvirtExporter: prometheus-libvirt-exporter/prometheus-libvirt-exporter # for MOSK
metricCollector: metric-collector/metric-collector
metricbeat: metricbeat/metricbeat
nodeExporter: prometheus-node-exporter/prometheus-node-exporter
opensearch: opensearch-master/opensearch
opensearchDashboards: opensearch-dashboards/opensearch-dashboards
patroniExporter: patroni/patroni-patroni-exporter
pgsqlExporter: patroni/patroni-pgsql-exporter
postgresql: patroni/patroni
prometheusEsExporter: prometheus-es-exporter/prometheus-es-exporter
prometheusMsTeams: prometheus-msteams/prometheus-msteams
prometheusRelay: prometheus-relay/prometheus-relay
prometheusServer: prometheus-server/prometheus-server
refapp: refapp/refapp # Removed in 2.28.3 (16.3.3)
refappCleanup: refapp-cleanup/refapp-cleanup # Removed in 2.28.3 (16.3.3)
refappInit: db-init/db-init # Removed in 2.28.3 (16.3.3)
sfNotifier: sf-notifier/sf-notifier
sfReporter: sf-reporter/sf-reporter
stacklightHelmControllerController: stacklight-helm-controller/controller
telegrafDockerSwarm: telegraf-docker-swarm/telegraf-docker-swarm
telegrafDs: telegraf-ds-smart/telegraf-ds-smart # Deprecated
telegrafDsSmart: telegraf-ds-smart/telegraf-ds-smart
telegrafOpenstack: telegraf-openstack/telegraf-openstack # for MOSK, replaced with osdpl-exporter in 24.1
telegrafS: telegraf-docker-swarm/telegraf-docker-swarm # Deprecated
telemeterClient: telemeter-client/telemeter-client
telemeterServer: telemeter-server/telemeter-server
telemeterServerAuthServer: telemeter-server/telemeter-server-authorization-server
tfControllerExporter: prometheus-tf-controller-exporter/prometheus-tungstenfabric-exporter # for MOSK
tfVrouterExporter: prometheus-tf-vrouter-exporter/prometheus-tungstenfabric-exporter # for MOSK
resources:
  alerta:
    requests:
      cpu: "50m"
      memory: "200Mi"
    limits:
      memory: "500Mi"

With the example above, each alerta container requests 50 millicores of CPU and 200 MiB of memory and is hard-limited to 500 MiB of memory. Each configuration key is optional.

Note

The performance of the logging mechanism depends on the cluster log load. If the cluster components send an excessive amount of logs, the default resource requests and limits for fluentdLogs (or fluentdElasticsearch) may be insufficient, which may cause its Pods to be OOMKilled and trigger the KubePodCrashLooping alert. In this case, increase the default resource requests and limits for fluentdLogs. For example:

resources:
  # fluentdElasticsearch:
  fluentdLogs:
    requests:
      memory: "500Mi"
    limits:
      memory: "1500Mi"

Byte limit for Telemeter client

For internal StackLight use only

Key

Description

Example values

telemetry.telemeterClient.limitBytes (string)

Specifies the maximum size, in bytes, of incoming data for the Telemeter client. Defaults to 1048576 (1 MiB).

4194304
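The default value 1048576 equals 1 MiB (1024 × 1024 bytes), and the example value 4194304 equals 4 MiB. Because the key is a string, a value derived from a size in MiB must be written as a decimal string. A minimal Python sketch with a hypothetical helper name:

```python
# Hypothetical helper: express telemetry.telemeterClient.limitBytes in MiB.
# The key is a string, so the byte count is returned as a decimal string.

def limit_bytes_from_mib(mib: int) -> str:
    """Return the byte count for `mib` mebibytes as a string value."""
    return str(mib * 1024 * 1024)


print(limit_bytes_from_mib(1))  # 1048576, the default
print(limit_bytes_from_mib(4))  # 4194304, the example value
```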

Kubernetes network policies

Available since Cluster releases 17.0.1 and 16.0.1

Key

Description

Example values

networkPolicies.enabled (bool)

Enables or disables the Kubernetes network policies that control network connections to and from Pods deployed in the stacklight namespace. Enabled by default.

For the list of network policy rules, refer to StackLight rules for Kubernetes network policies. Customization of network policies is not supported.

true or false

Kubernetes tolerations

Key

Description

Example values

tolerations.default (slice)

Kubernetes tolerations to add to all StackLight components.

default:
- key: "com.docker.ucp.manager"
  operator: "Exists"
  effect: "NoSchedule"

tolerations.component (map)

Defines Kubernetes tolerations for individual StackLight components. For a listed component, these tolerations override tolerations.default.

component:
  # elasticsearch:
  opensearch:
  - key: "com.docker.ucp.manager"
    operator: "Exists"
    effect: "NoSchedule"
  postgresql:
  - key: "node-role.kubernetes.io/master"
    operator: "Exists"
    effect: "NoSchedule"
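The precedence between the two keys can be sketched in Python as follows. The sketch assumes that a component entry replaces, rather than merges with, tolerations.default, which is how the "overrides" wording above reads; the function name is hypothetical.

```python
# Sketch of how component-specific tolerations take precedence over the
# default ones. Assumption: a component entry replaces (does not merge with)
# tolerations.default.

def effective_tolerations(tolerations: dict, component: str) -> list:
    overrides = tolerations.get("component", {})
    if component in overrides:
        return overrides[component]
    return tolerations.get("default", [])


tolerations = {
    "default": [
        {"key": "com.docker.ucp.manager", "operator": "Exists", "effect": "NoSchedule"},
    ],
    "component": {
        "postgresql": [
            {"key": "node-role.kubernetes.io/master", "operator": "Exists", "effect": "NoSchedule"},
        ],
    },
}

print(effective_tolerations(tolerations, "postgresql")[0]["key"])  # node-role.kubernetes.io/master
print(effective_tolerations(tolerations, "alerta")[0]["key"])      # com.docker.ucp.manager
```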

Storage class

In an HA StackLight setup, when highAvailabilityEnabled is set to true, all StackLight Persistent Volumes (PVs) use the Local Volume Provisioner (LVP) storage class so as not to rely on dynamic provisioners such as Ceph, which are not available in every Container Cloud deployment. In a non-HA StackLight setup, when no storage class is specified, PVs use the default storage class of the cluster.

Key

Description

Example values

storage.defaultStorageClass (string)

Defines the StorageClass to use for all StackLight Persistent Volume Claims (PVCs) whose component storage class is not defined in componentStorageClasses. To use the default storage class, leave the string empty.

lvp, standard

storage.componentStorageClasses (map)

Defines the storage class for individual StackLight components, overriding the defaultStorageClass value. To use the default storage class, leave the string empty.

componentStorageClasses:
  elasticsearch: ""
  opensearch: ""
  fluentd: ""
  postgresql: ""
  prometheusAlertManager: ""
  prometheusServer: ""
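A possible resolution order for a component's storage class is sketched below in Python. This document does not spell out every fallback, so two points are assumptions: an empty or missing componentStorageClasses entry falls back to defaultStorageClass, and an empty result means the cluster default StorageClass (returned as None). The function name is hypothetical.

```python
# Sketch of storage class resolution for a StackLight component.
# Assumptions (not confirmed by this document): a missing or empty
# componentStorageClasses entry falls back to defaultStorageClass, and an
# empty result means "use the cluster default StorageClass" (None here).
from typing import Optional


def resolve_storage_class(storage: dict, component: str) -> Optional[str]:
    per_component = storage.get("componentStorageClasses", {}).get(component, "")
    if per_component:
        return per_component
    default = storage.get("defaultStorageClass", "")
    return default or None


storage = {
    "defaultStorageClass": "lvp",
    "componentStorageClasses": {"opensearch": "", "prometheusServer": "standard"},
}
print(resolve_storage_class(storage, "prometheusServer"))  # standard
print(resolve_storage_class(storage, "opensearch"))        # lvp
print(resolve_storage_class({}, "postgresql"))             # None
```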

NodeSelector

Key

Description

Example values

nodeSelector.default (map)

Defines the NodeSelector to use for most StackLight pods (except some pods that belong to DaemonSets) if the NodeSelector of a component is not defined.

default:
  role: stacklight

nodeSelector.component (map)

Defines the NodeSelector to use for particular StackLight component pods. Overrides nodeSelector.default.

component:
  alerta:
    role: stacklight
    component: alerta
  # kibana:
  #   role: stacklight
  #   component: kibana
  opensearchDashboards:
    role: stacklight
    component: opensearchdashboards
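In Kubernetes, a pod with a nodeSelector can be scheduled only on nodes that carry every key/value pair of the selector. The Python sketch below checks this for the alerta selector from the example; the node labels are hypothetical.

```python
# Kubernetes nodeSelector semantics: a pod can be scheduled only on a node
# whose labels include every key/value pair of the selector.

def node_matches(selector: dict, node_labels: dict) -> bool:
    return all(node_labels.get(key) == value for key, value in selector.items())


alerta_selector = {"role": "stacklight", "component": "alerta"}
node_a = {"role": "stacklight", "component": "alerta", "kubernetes.io/os": "linux"}
node_b = {"role": "stacklight"}  # missing the component label

print(node_matches(alerta_selector, node_a))  # True
print(node_matches(alerta_selector, node_b))  # False
```

Extra labels on a node do not prevent a match, so labeling nodes with role: stacklight alone satisfies the default selector but not the alerta component selector.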

Prometheus Node Exporter

Key

Description

Example values

nodeExporter.netDeviceExclude (string)

Excludes from monitoring the network devices whose names match the specified regular expression. Network interface metrics are numerous and may significantly increase Prometheus RAM usage in large clusters. Therefore, Prometheus Node Exporter collects information only about a basic set of interfaces (both host and container) and excludes the following interfaces:

  • veth/cali - the host-side part of the container-host Ethernet tunnel

  • o-hm0 - the OpenStack Octavia management interface for communication with the amphora machine

  • tap, qg-, qr-, ha- - the Open vSwitch virtual bridge ports

  • br-(ex|int|tun) - the Open vSwitch virtual bridges

  • docker0, br- - the Docker bridge (master for the veth interfaces)

  • ovs-system - the Open vSwitch interface (mapping interfaces to bridges)

To collect information for any of the interfaces above, remove the corresponding pattern from the exclusion regular expression.

nodeExporter:
  netDeviceExclude: "^(veth.+|cali.+|o-hm0|tap.+|qg-.+|qr-.+|ha-.+|br-.+|ovs-system|docker0)$"
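Before changing the pattern, you can check which device names it excludes. Node Exporter uses Go regular expressions, but this particular pattern behaves the same in Python. The device names in this sketch are hypothetical examples.

```python
import re

# Default exclusion pattern from the example above.
NET_DEVICE_EXCLUDE = re.compile(
    r"^(veth.+|cali.+|o-hm0|tap.+|qg-.+|qr-.+|ha-.+|br-.+|ovs-system|docker0)$"
)


def is_excluded(device: str) -> bool:
    """True if the device name matches the exclusion pattern."""
    return NET_DEVICE_EXCLUDE.match(device) is not None


for dev in ["eth0", "bond0", "veth1a2b3c", "cali12345", "br-ex", "docker0", "tap0"]:
    print(dev, "excluded" if is_excluded(dev) else "monitored")
```

For example, to start monitoring tap interfaces, drop the tap.+ alternative from the pattern.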

nodeExporter.extraCollectorsEnabled (slice)

Enables Node Exporter collectors. For a list of available collectors, see Node Exporter Collectors. The following collectors are enabled by default in StackLight:

  • arp

  • conntrack

  • cpu

  • diskstats

  • entropy

  • filefd

  • filesystem

  • hwmon

  • loadavg

  • meminfo

  • netdev

  • netstat

  • nfs

  • stat

  • sockstat

  • textfile

  • time

  • timex

  • uname

  • vmstat

extraCollectorsEnabled:
  - bcache
  - bonding
  - softnet
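Node Exporter enables an individual collector with a --collector.<name> flag. The Python sketch below combines the StackLight default collectors listed above with extraCollectorsEnabled and renders the result as such flags. How StackLight actually passes collectors to Node Exporter is an assumption here, and the helper name is hypothetical.

```python
# Hypothetical sketch: combine the StackLight default collectors with
# extraCollectorsEnabled and render them as Node Exporter --collector.<name>
# flags. How StackLight passes them to Node Exporter is an assumption.

DEFAULT_COLLECTORS = [
    "arp", "conntrack", "cpu", "diskstats", "entropy", "filefd", "filesystem",
    "hwmon", "loadavg", "meminfo", "netdev", "netstat", "nfs", "stat",
    "sockstat", "textfile", "time", "timex", "uname", "vmstat",
]


def collector_flags(extra: list) -> list:
    enabled = sorted(set(DEFAULT_COLLECTORS) | set(extra))
    return [f"--collector.{name}" for name in enabled]


flags = collector_flags(["bcache", "bonding", "softnet"])
print(len(flags))   # 23
print(flags[:3])    # ['--collector.arp', '--collector.bcache', '--collector.bonding']
```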

Prometheus Blackbox Exporter

Key

Description

Example values

blackboxExporter.customModules (map)

Specifies a set of custom Blackbox Exporter modules. For details, see Blackbox Exporter configuration: module. The http_2xx, http_2xx_verify, http_openstack, http_openstack_insecure, tls, tls_verify names are reserved for internal usage and any overrides will be discarded.

customModules:
  http_post_2xx:
    prober: http
    timeout: 5s
    http:
      method: POST
      headers:
        Content-Type: application/json
      body: '{}'

blackboxExporter.timeoutOffset (string)

Specifies the offset, in seconds, to subtract from the probe timeout (--timeout-offset). The value is upper-bounded by 5.0 to comply with the built-in StackLight functionality. If not specified, the Blackbox Exporter default value is used. For example, for Blackbox Exporter v0.19.0, the default value is 0.5.

timeoutOffset: "0.1"
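The offset reduces the time budget that the Blackbox Exporter derives from the Prometheus scrape timeout, and a shorter module timeout, such as timeout: 5s in the custom module above, takes precedence. The Python sketch below is a simplified approximation of that calculation, not the exporter's code. Treating offsets above 5.0 as invalid is an assumption based on the documented bound. Note that timeoutOffset is a string, so convert it with float() first.

```python
# Simplified approximation (not the exporter's code) of how the Blackbox
# Exporter derives the effective probe timeout from the Prometheus scrape
# timeout, the --timeout-offset value, and an optional module timeout.

MAX_OFFSET = 5.0  # upper bound documented for StackLight


def effective_probe_timeout(scrape_timeout: float, offset: float,
                            module_timeout: float = 0.0) -> float:
    # Assumption: values outside 0..5.0 are treated as invalid.
    if not 0 <= offset <= MAX_OFFSET:
        raise ValueError(f"timeoutOffset must be between 0 and {MAX_OFFSET}")
    budget = scrape_timeout - offset
    if 0 < module_timeout < budget:
        return module_timeout  # a shorter module timeout wins
    return budget


print(effective_probe_timeout(10.0, float("0.1")))  # 9.9
print(effective_probe_timeout(10.0, 0.5, 5.0))      # 5.0
```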

Reference Application

Unsupported and removed in 2.28.3 (16.3.3). Available since 2.21.0 for non-MOSK managed clusters.

Note

For the feature support on MOSK deployments, refer to MOSK documentation: Deploy your first cloud application using automation.

Key

Description

Example values

refapp.enabled (bool)

Enables or disables Reference Application, a small microservice application that enables workload monitoring on non-MOSK managed clusters. Disabled by default.

true or false

refapp.workload.persistentVolumeEnabled (bool)

Available since Container Cloud 2.23.0.
Enables or disables persistent volumes for Reference Application. Enabled by default. Disabling is not recommended for production clusters. Once set, the value cannot be changed.

true or false

refapp.workload.storageClassName (string)

Defines the StorageClass to use for Reference Application persistent volumes. Empty by default, in which case the default storage class is used. Once set, the value cannot be changed. Takes effect only if persistent volumes are enabled.

refapp:
  workload:
    storageClassName: kubernetes-ssd

refapp.workload.persistentVolumeSize (string)

Available since Container Cloud 2.23.0.
Defines the size of persistent volumes for Reference Application. Defaults to 1Gi. Takes effect only if persistent volumes are enabled.

refapp:
  workload:
    persistentVolumeSize: 1Gi

Salesforce reporter

On managed clusters with limited Internet access, a proxy is required for StackLight components that use HTTP and HTTPS, are disabled by default, and need external access when enabled. The Salesforce reporter requires Internet access over HTTPS.

Key

Description

Example values

clusterId (string)

Unique cluster identifier clusterId="<Cluster Project>/<Cluster Name>/<UID>", generated for each cluster from the Cluster Project, Cluster Name, and cluster UID, separated by slashes. Used by both the sf-reporter and sf-notifier services.

The clusterId key is automatically defined for each cluster. Do not set or modify it manually.

Do not modify clusterId.
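Since clusterId is generated automatically, you only ever need to read it, for example, to match Salesforce reports to a cluster. The Python sketch below splits a value into its three parts. The helper and the sample value are hypothetical and are not meant for constructing or overriding clusterId.

```python
# Hypothetical helper that splits a clusterId value into its parts for
# reading or reporting only. clusterId is generated automatically.
from typing import NamedTuple


class ClusterId(NamedTuple):
    project: str
    name: str
    uid: str


def parse_cluster_id(cluster_id: str) -> ClusterId:
    parts = cluster_id.split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"unexpected clusterId format: {cluster_id!r}")
    return ClusterId(*parts)


cid = parse_cluster_id("demo-project/demo-cluster/0f3c9a5e-1234-4abc-9def-0123456789ab")
print(cid.project, cid.name)  # demo-project demo-cluster
```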

sfReporter.enabled (bool)

Enables or disables reporting of Prometheus metrics to Salesforce. For details, see Deployment architecture. Disabled by default.

true or false

sfReporter.salesForceAuth (map)

Salesforce parameters and credentials for the metrics reporting integration.

Note

Modify this parameter if sf-notifier is not configured or if you want to send reports using a different Salesforce user account.

salesForceAuth:
  url: "<SF instance URL>"
  username: "<SF account email address>"
  password: "<SF password>"
  environment_id: "<Cloud identifier>"
  organization_id: "<Organization identifier>"
  sandbox_enabled: "<Set to true or false>"

sfReporter.cronjob (map)

Defines the Kubernetes cron job for sending metrics to Salesforce. By default, reports are sent at midnight server time.

cronjob:
  schedule: "0 0 * * *"
  concurrencyPolicy: "Allow"
  failedJobsHistoryLimit: ""
  successfulJobsHistoryLimit: ""
  startingDeadlineSeconds: 200
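With the default schedule "0 0 * * *", the job fires once a day at 00:00. In Kubernetes, startingDeadlineSeconds limits how late a missed run may still start before it is skipped. The Python sketch below models both behaviors. The helper names and dates are hypothetical.

```python
# Sketch of the default sfReporter.cronjob behavior: "0 0 * * *" fires once a
# day at 00:00, and startingDeadlineSeconds (200) limits how late a missed run
# may still start. Hypothetical helper names.
from datetime import datetime, timedelta

STARTING_DEADLINE = timedelta(seconds=200)


def next_midnight(now: datetime) -> datetime:
    """Next firing time of the "0 0 * * *" schedule strictly after `now`."""
    today_midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return today_midnight + timedelta(days=1)


def can_still_start(scheduled: datetime, now: datetime) -> bool:
    """True while a run scheduled at `scheduled` is within its start deadline."""
    return scheduled <= now <= scheduled + STARTING_DEADLINE


run = next_midnight(datetime(2024, 5, 14, 15, 30))
print(run)                                                 # 2024-05-15 00:00:00
print(can_still_start(run, run + timedelta(seconds=150)))  # True
print(can_still_start(run, run + timedelta(minutes=5)))    # False
```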

Ceph monitoring

Key

Description

Example values

ceph.enabled (bool)

Enables or disables Ceph monitoring on baremetal-based managed clusters. Set to false by default.

true or false

External endpoint monitoring

Key

Description

Example values

externalEndpointMonitoring.enabled (bool)

Enables or disables HTTP endpoint monitoring. If enabled, the monitoring tool probes the defined endpoints every 15 seconds. Set to false by default.

true or false

externalEndpointMonitoring.certificatesHostPath (string)

Defines the path to the host directory that contains the certificates of external endpoints.

/etc/ssl/certs/

externalEndpointMonitoring.domains (slice)

Defines the list of HTTP endpoints to monitor. The endpoints must successfully respond to a liveness probe. For success, a request to a specific endpoint must result in a 2xx HTTP response code.

domains:
- https://prometheus.io/health
- http://example.com:8080/status
- http://example.net:8080/pulse
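Before adding an endpoint to the list, you can check by hand that it returns a 2xx code. The Python sketch below uses only the standard library. It is a stand-alone check, not the prober StackLight uses, and the function names are hypothetical. Note that urlopen follows redirects, so a 3xx response is judged by the final response code.

```python
import urllib.error
import urllib.request


def is_success(status: int) -> bool:
    """A probe succeeds only on a 2xx HTTP response code."""
    return 200 <= status <= 299


def probe(url: str, timeout: float = 5.0) -> bool:
    """Send a GET request to `url` and report whether it returned 2xx."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return is_success(response.status)
    except urllib.error.HTTPError as err:   # 4xx/5xx responses raise HTTPError
        return is_success(err.code)
    except (urllib.error.URLError, OSError):  # DNS, connection, or timeout errors
        return False


# Usage (requires network access):
# probe("http://example.com:8080/status")
```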

Ironic monitoring

Key

Description

Example values

ironic.endpoint (string)

Enables or disables monitoring of bare metal Ironic on baremetal-based clusters. To enable, specify the Ironic API URL.

http://ironic-api-http.kaas.svc:6385/v1

ironic.insecure (bool)

Defines whether to skip verification of the certificate chain and host name. Set to false by default.

true or false

SSL certificates monitoring

Key

Description

Example values

sslCertificateMonitoring.enabled (bool)

Enables or disables StackLight monitoring and alerting on the expiration dates of TLS certificates of HTTPS endpoints. If enabled, the monitoring tool probes the defined endpoints every hour. Set to false by default.

true or false

sslCertificateMonitoring.domains (slice)

Defines the list of HTTPS endpoints to monitor the certificates from.

domains:
- https://prometheus.io
- https://example.com:8080
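To see how close a certificate is to expiry before adding its endpoint, you can read its notAfter field with the Python standard library. The sketch below is a manual check, not the StackLight prober, and the helper names are hypothetical.

```python
import socket
import ssl


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Return the notAfter field of the certificate served by host:port."""
    context = ssl.create_default_context()  # verifies chain and host name
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after: str, now: float) -> int:
    """Whole days between `now` (epoch seconds) and the certificate expiry."""
    return int((ssl.cert_time_to_seconds(not_after) - now) // 86400)


# Usage (requires network access):
# import time
# days_until_expiry(fetch_not_after("example.com", 8080), time.time())
```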

Mirantis Kubernetes Engine monitoring

Key

Description

Example values

mke.enabled (bool)

Enables or disables Mirantis Kubernetes Engine (MKE) monitoring. Set to true by default.

true or false

mke.dockerdDataRoot (string)

Defines the dockerd data root directory of persistent Docker state. For details, see Docker documentation: Daemon CLI (dockerd).

/var/lib/docker

Workload monitoring

Key

Description

Example values

metricFilter (map)

On clusters that run large-scale workloads, workload monitoring generates a large number of resource-intensive metrics. To prevent excessive metrics generation, you can disable workload monitoring in the StackLight metrics and monitor only the infrastructure.

The metricFilter parameter enables the cAdvisor (Container Advisor) and kubeStateMetrics metric ingestion filters for Prometheus. Set to false by default. If set to true, you can define the namespaces to which the filter will apply. The parameter is designed for managed clusters.

metricFilter:
  enabled: true
  action: keep
  namespaces:
  - kaas
  - kube-system
  - stacklight

  • enabled - enable or disable metricFilter using true or false

  • action - action to take by Prometheus:

    • keep - keep only metrics from namespaces that are defined in the namespaces list

    • drop - ignore metrics from namespaces that are defined in the namespaces list

  • namespaces - list of namespaces to keep or drop metrics from regardless of the boolean value for every namespace
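The keep and drop actions can be modeled per namespace as follows. This Python sketch illustrates the semantics described above for cAdvisor and kubeStateMetrics metrics only. It is not how Prometheus applies the filter, which is typically done with relabeling rules, and the function name is hypothetical.

```python
# Sketch of the metricFilter keep/drop semantics, applied to the namespace
# of a cAdvisor or kubeStateMetrics metric. Hypothetical function name.

def keep_metric(metric_filter: dict, namespace: str) -> bool:
    if not metric_filter.get("enabled", False):
        return True  # filter disabled: all namespaces are ingested
    listed = namespace in metric_filter.get("namespaces", [])
    if metric_filter.get("action") == "keep":
        return listed
    if metric_filter.get("action") == "drop":
        return not listed
    raise ValueError("action must be 'keep' or 'drop'")


metric_filter = {"enabled": True, "action": "keep",
                 "namespaces": ["kaas", "kube-system", "stacklight"]}
print(keep_metric(metric_filter, "stacklight"))  # True
print(keep_metric(metric_filter, "my-app"))      # False
```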

Prometheus metrics filtering

Available since 2.24.0 and 2.24.2 for MOSK 23.2

Key

Description

Example values

metricsFiltering.enabled (bool)

Enables or disables Prometheus metrics filtering. When enabled (default), Prometheus scrapes only actively used and explicitly white-listed metrics.

prometheusServer:
  metricsFiltering:
    enabled: true

metricsFiltering.extraMetricsInclude (map)

Defines extra metrics to white-list that are otherwise dropped by default. Contains the following parameters:

  • <job name> - scraping job name as a key for extra white-listed metrics to add under the key. For the list of job names, see White list of Prometheus scrape jobs. If a job name is not present in this list, its target metrics are not dropped and are collected by Prometheus by default.

    You can also use group key names to add metrics to more than one job using _group-<key name>. The following list combines jobs by groups:

    List of jobs by groups
    _group-blackbox-metrics
     - blackbox
     - blackbox-external-endpoint
     - kubernetes-master-api
     - mcc-blackbox
     - mke-manager-api
     - msr-api
     - openstack-blackbox-ext
     - openstack-dns-probe # Since MOSK 24.3
     - refapp
    
    _group-controller-runtime-metrics
     - helm-controller
     - kaas-exporter
     - kubelet
     - kubernetes-apiservers
     - mcc-controllers
     - mcc-providers
     - rabbitmq-operator-metrics
    
    _group-etcd-metrics
     - etcd-server
     - ucp-kv
    
    _group-go-collector-metrics
     - cadvisor
     - calico
     - etcd-server
     - helm-controller
     - ironic
     - kaas-exporter
     - kubelet
     - kubernetes-apiservers
     - mcc-cache
     - mcc-controllers
     - mcc-providers
     - mke-metrics-controller
     - mke-metrics-engine
     - openstack-ingress-controller
     - postgresql
     - prometheus-alertmanager
     - prometheus-elasticsearch-exporter
     - prometheus-grafana
     - prometheus-libvirt-exporter
     - prometheus-memcached-exporter
     - prometheus-msteams
     - prometheus-mysql-exporter
     - prometheus-node-exporter
     - prometheus-rabbitmq-exporter
     - prometheus-relay
     - prometheus-server
     - rabbitmq-operator-metrics
     - telegraf-docker-swarm
     - telemeter-client
     - telemeter-server
     - tf-control
     - tf-redis
     - tf-vrouter
     - ucp-kv
    
    _group-process-collector-metrics
     - alertmanager-webhook-servicenow
     - cadvisor
     - calico
     - etcd-server
     - helm-controller
     - ironic
     - kaas-exporter
     - kubelet
     - kubernetes-apiservers
     - mcc-cache
     - mcc-controllers
     - mcc-providers
     - mke-metrics-controller
     - mke-metrics-engine
     - openstack-ingress-controller
     - patroni
     - postgresql
     - prometheus-alertmanager
     - prometheus-elasticsearch-exporter
     - prometheus-grafana
     - prometheus-libvirt-exporter
     - prometheus-memcached-exporter
     - prometheus-msteams
     - prometheus-mysql-exporter
     - prometheus-node-exporter
     - prometheus-rabbitmq-exporter
     - prometheus-relay
     - prometheus-server
     - rabbitmq-operator-metrics
     - sf-notifier
     - telegraf-docker-swarm
     - telemeter-client
     - telemeter-server
     - tf-control
     - tf-redis
     - tf-vrouter
     - tf-zookeeper
     - ucp-kv
    
    _group-rest-client-metrics
     - helm-controller
     - kaas-exporter
     - mcc-controllers
     - mcc-providers
    
    _group-service-handler-metrics
     - mcc-controllers
     - mcc-providers
    
    _group-service-http-metrics
     - mcc-cache
     - mcc-controllers
    
    _group-service-reconciler-metrics
     - mcc-controllers
     - mcc-providers
    

    Note

    The prometheus-coredns job from the go-collector-metrics and process-collector-metrics groups is removed in Cluster releases 17.0.0, 16.0.0, and 14.1.0.

  • <list of metrics to collect> - extra metrics of <job name> to be white-listed.

prometheusServer:
  metricsFiltering:
    enabled: true
    extraMetricsInclude:
      cadvisor:
        - container_memory_failcnt
        - container_network_transmit_errors_total
      calico:
        - felix_route_table_per_iface_sync_seconds_sum
        - felix_bpf_dataplane_endpoints
      _group-go-collector-metrics:
        - go_gc_heap_goal_bytes
        - go_gc_heap_objects_objects
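A _group-<key name> entry applies its metrics to every job in that group. The Python sketch below shows one way such keys could expand into per-job white lists. It includes only two of the groups listed above, the expansion logic is an assumption, and the metric names are examples.

```python
# Sketch of expanding _group-<name> keys in extraMetricsInclude into per-job
# white lists. Only two groups from the list above are included; the
# expansion logic is an assumption.

GROUPS = {
    "_group-etcd-metrics": ["etcd-server", "ucp-kv"],
    "_group-service-http-metrics": ["mcc-cache", "mcc-controllers"],
}


def expand_extra_metrics(extra: dict) -> dict:
    per_job: dict = {}
    for key, metrics in extra.items():
        # Unknown _group- keys raise KeyError instead of being treated as jobs.
        jobs = GROUPS[key] if key.startswith("_group-") else [key]
        for job in jobs:
            per_job.setdefault(job, set()).update(metrics)
    return per_job


extra = {
    "_group-etcd-metrics": ["process_open_fds"],
    "etcd-server": ["etcd_server_has_leader"],
}
result = expand_extra_metrics(extra)
print(sorted(result["etcd-server"]))  # ['etcd_server_has_leader', 'process_open_fds']
print(sorted(result["ucp-kv"]))       # ['process_open_fds']
```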

Alerts configuration

Key

Description

Example values

prometheusServer.customAlerts (slice)

Defines custom alerts and modifies or disables existing alert configurations. For the list of predefined alerts, see Available StackLight alerts. While adding or modifying alerts, follow the Alerting rules.

customAlerts:
# To add a new alert:
- alert: ExampleAlert
  annotations:
    description: Alert description
    summary: Alert summary
  expr: example_metric > 0
  for: 5m
  labels:
    severity: warning
# To modify an existing alert expression:
- alert: AlertmanagerFailedReload
  expr: alertmanager_config_last_reload_successful == 5
# To disable an existing alert:
- alert: TargetDown
  enabled: false

The alert body accepts an optional enabled field. Set it to false to disable an existing alert. All fields specified in the customAlerts definition override the corresponding predefined definitions in the chart values.
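The override behavior can be sketched in Python by merging the customAlerts entries from the example into a set of predefined alerts keyed by name. The predefined entries use placeholder values rather than StackLight's actual definitions, the merge function is hypothetical, and the shallow (top-level field) merge is an assumption.

```python
# Sketch of the documented customAlerts semantics: fields of a custom entry
# override the predefined alert with the same name, new names add alerts, and
# enabled: false removes an alert. Assumption: top-level (shallow) merge.

def merge_alerts(predefined: list, custom: list) -> dict:
    alerts = {a["alert"]: dict(a) for a in predefined}
    for entry in custom:
        name = entry["alert"]
        merged = {**alerts.get(name, {}), **entry}
        if merged.pop("enabled", True) is False:
            alerts.pop(name, None)
        else:
            alerts[name] = merged
    return alerts


predefined = [  # placeholder values, not the real StackLight definitions
    {"alert": "TargetDown", "expr": "<predefined expression>"},
    {"alert": "AlertmanagerFailedReload", "expr": "<predefined expression>",
     "for": "<predefined duration>"},
]
custom = [
    {"alert": "ExampleAlert", "expr": "example_metric > 0", "for": "5m",
     "labels": {"severity": "warning"}},
    {"alert": "AlertmanagerFailedReload",
     "expr": "alertmanager_config_last_reload_successful == 5"},
    {"alert": "TargetDown", "enabled": False},
]
result = merge_alerts(predefined, custom)
print(sorted(result))  # ['AlertmanagerFailedReload', 'ExampleAlert']
```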

Watchdog alert

Key

Description

Example values

prometheusServer.watchDogAlertEnabled (bool)

Enables or disables the Watchdog alert that constantly fires as long as the entire alerting pipeline is functional. You can use this alert to verify that Alertmanager notifications properly flow to the Alertmanager receivers. Set to true by default.

true or false

Alertmanager integrations

On managed clusters with limited Internet access, a proxy is required for StackLight components that use HTTP and HTTPS, are disabled by default, and need external access when enabled, for example, the Salesforce integration and Alertmanager notifications to external services.

Key

Description

Example values

alertmanagerSimpleConfig.genericReceivers (slice)

Provides a generic template for notification receiver configurations. For a list of supported receivers, see Prometheus Alertmanager documentation: Receiver.

For example, to enable notifications to OpsGenie:

alertmanagerSimpleConfig:
  genericReceivers:
  - name: HTTP-opsgenie
    enabled: true # optional
    opsgenie_configs:
    - api_url: "https://example.app.eu.opsgenie.com/"
      api_key: "secret-key"
      send_resolved: true

alertmanagerSimpleConfig.genericRoutes (slice)

Provides a template for notification route configuration. For details, see Prometheus Alertmanager documentation: Route.

genericRoutes:
- receiver: HTTP-opsgenie
  enabled: true # optional
  matchers:
  - severity=~"major|critical"
  continue: true
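Alertmanager regex matchers (=~) are fully anchored, so the pattern must match the entire label value. The Python sketch below mimics this for the severity matcher above using re.fullmatch. The helper name is hypothetical.

```python
import re

# Alertmanager =~ matchers are fully anchored: the regex must match the whole
# label value. re.fullmatch gives the same behavior for this pattern.

def regex_matcher(pattern: str):
    compiled = re.compile(pattern)
    return lambda value: compiled.fullmatch(value) is not None


severity_matches = regex_matcher("major|critical")
for severity in ["critical", "major", "minor", "critical-ish"]:
    print(severity, severity_matches(severity))
```

Because of the anchoring, critical-ish does not match, so a pattern such as critical.* is needed to cover values with suffixes.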

alertmanagerSimpleConfig.inhibitRules.enabled (bool)

Enables or disables alert inhibition rules. If enabled, Alertmanager reduces alert noise by suppressing notifications for dependent alerts, which provides a clearer view of the cloud status and simplifies troubleshooting. Enabled by default. For details, see Alert dependencies. For details on inhibition rules, see Prometheus documentation.

true or false

Notifications to email

Key

Description

Example values

alertmanagerSimpleConfig.email.enabled (bool)

Enables or disables Alertmanager integration with email. Set to false by default.

true or false

alertmanagerSimpleConfig.email (map)

Defines the notification parameters for Alertmanager integration with email. For details, see Prometheus Alertmanager documentation: Email configuration.

email:
  enabled: false
  send_resolved: true
  to: "to@test.com"
  from: "from@test.com"
  smarthost: smtp.gmail.com:587
  auth_username: "from@test.com"
  auth_password: password
  auth_identity: "from@test.com"
  require_tls: true

alertmanagerSimpleConfig.email.route (map)

Defines the route for Alertmanager integration with email. For details, see Prometheus Alertmanager documentation: Route.

route:
  matchers: []
  routes: []

Notifications to Salesforce

On managed clusters with limited Internet access, a proxy is required for StackLight components that use HTTP and HTTPS, are disabled by default, and need external access when enabled. The Salesforce integration requires Internet access over HTTPS.

Key

Description

Example values

clusterId (string)

Unique cluster identifier clusterId="<Cluster Project>/<Cluster Name>/<UID>", generated for each cluster using Cluster Project, Cluster Name, and cluster UID, separated by a slash. Used for both sf-notifier and sf-reporter services.

The clusterId is automatically defined for each cluster. Do not set or modify it manually.

Do not modify clusterId.

alertmanagerSimpleConfig.salesForce.enabled (bool)

Enables or disables Alertmanager integration with Salesforce using the sf-notifier service. Disabled by default.

true or false

alertmanagerSimpleConfig.salesForce.auth (map)

Defines the Salesforce parameters and credentials for integration with Alertmanager.

auth:
  url: "<SF instance URL>"
  username: "<SF account email address>"
  password: "<SF password>"
  environment_id: "<Cloud identifier>"
  organization_id: "<Organization identifier>"
  sandbox_enabled: "<Set to true or false>"

alertmanagerSimpleConfig.salesForce.route (map)

Defines the notifications route for Alertmanager integration with Salesforce. For details, see Prometheus Alertmanager documentation: Route.

route:
  matchers:
  - severity="critical"
  routes: []

Note

By default, only Critical alerts are sent to Salesforce.

alertmanagerSimpleConfig.salesForce.feed_enabled (bool)

Enables or disables feed update in Salesforce. To save API calls, this parameter is set to false by default.

true or false

alertmanagerSimpleConfig.salesForce.link_prometheus (bool)

Enables or disables links to the Prometheus web UI in alerts sent to Salesforce. To simplify troubleshooting, set to true by default.

true or false

Notifications to Slack

On managed clusters with limited Internet access, a proxy is required for StackLight components that use HTTP and HTTPS, are disabled by default, and need external access when enabled. The Slack integration requires Internet access over HTTPS.

Key

Description

Example values

alertmanagerSimpleConfig.slack.enabled (bool)

Enables or disables Alertmanager integration with Slack. For details, see Prometheus Alertmanager documentation: Slack configuration. Set to false by default.

true or false

alertmanagerSimpleConfig.slack.api_url (string)

Defines the Slack webhook URL.

http://localhost:8888

alertmanagerSimpleConfig.slack.channel (string)

Defines the Slack channel or user to send notifications to.

monitoring

alertmanagerSimpleConfig.slack.route (map)

Defines the notifications route for Alertmanager integration with Slack. For details, see Prometheus Alertmanager documentation: Route.

route:
  matchers: []
  routes: []

Notifications to Microsoft Teams

On managed clusters with limited Internet access, a proxy is required for StackLight components that use HTTP and HTTPS, are disabled by default, and need external access when enabled. The Microsoft Teams integration requires Internet access over HTTPS.

Key

Description

Example values

alertmanagerSimpleConfig.msteams.enabled (bool)

Enables or disables Alertmanager integration with Microsoft Teams. Requires a configured Microsoft Teams channel and channel connector. Set to false by default.

true or false

alertmanagerSimpleConfig.msteams.url (string)

Defines the URL of an Incoming Webhook connector of a Microsoft Teams channel. For details about channel connectors, see Microsoft documentation.

https://example.webhook.office.com/webhookb2/UUID

alertmanagerSimpleConfig.msteams.route (map)

Defines the notifications route for Alertmanager integration with Microsoft Teams. For details, see Prometheus Alertmanager documentation: Route.

route:
  matchers: []
  routes: []

Notifications to ServiceNow

Caution

Prior to configuring the integration with ServiceNow, perform the following prerequisite steps using the ServiceNow documentation of the required version.

  1. In a new or existing Incident table, add the Alert ID field as described in Add fields to a table. To avoid alert duplication, select Unique.

  2. Create an Access Control List (ACL) with read/write permissions for the Incident table as described in Securing table records.

  3. Set up a service account.

Key

Description

Example values

alertmanagerSimpleConfig.serviceNow.enabled (bool)

Enables or disables Alertmanager integration with ServiceNow. Set to false by default. Requires a configured ServiceNow account and compliance with the Incident table requirements above.

true or false

alertmanagerSimpleConfig.serviceNow (map)

Defines the ServiceNow parameters and credentials for integration with Alertmanager:

  • incident_table - name of the table created in ServiceNow. Do not confuse with the table label.

  • api_version - version of the ServiceNow HTTP API. By default, v1.

  • alert_id_field - name of the unique string field configured in ServiceNow to hold Prometheus alert IDs. Do not confuse with the table label.

  • auth.instance - URL of the instance.

  • auth.username - name of the ServiceNow user account with access to the Incident table.

  • auth.password - password of the ServiceNow user account.

serviceNow:
  enabled: true
  incident_table: "incident"
  api_version: "v1"
  alert_id_field: "u_alert_id"
  auth:
    instance: "https://dev00001.service-now.com"
    username: "testuser"
    password: "testpassword"
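Before enabling the integration, you can check the table name, API version, and Alert ID field by querying the ServiceNow Table API with the service account credentials. The Python sketch below only builds such a lookup URL from the serviceNow values. It uses the standard /api/now/<version>/table/<table> path and the sysparm_query and sysparm_limit parameters. Whether the StackLight integration issues this exact request is an assumption, and the helper name and alert ID are hypothetical.

```python
from urllib.parse import urlencode


def incident_lookup_url(cfg: dict, alert_id: str) -> str:
    """Table API URL that finds incidents holding the given alert ID."""
    base = cfg["auth"]["instance"].rstrip("/")
    path = f"/api/now/{cfg['api_version']}/table/{cfg['incident_table']}"
    query = urlencode({
        "sysparm_query": f"{cfg['alert_id_field']}={alert_id}",
        "sysparm_limit": 1,
    })
    return f"{base}{path}?{query}"


service_now = {
    "incident_table": "incident",
    "api_version": "v1",
    "alert_id_field": "u_alert_id",
    "auth": {"instance": "https://dev00001.service-now.com"},
}
print(incident_lookup_url(service_now, "abc123"))
# https://dev00001.service-now.com/api/now/v1/table/incident?sysparm_query=u_alert_id%3Dabc123&sysparm_limit=1
```

An empty result list for a known alert ID usually points to a wrong alert_id_field or incident_table value, for example, a table label used instead of the table name.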