StackLight configuration parameters

StackLight configuration parameters

This section describes the StackLight configuration keys that you can specify in the values section to change StackLight settings as required. Prior to making any changes to StackLight configuration, perform the steps described in Configure StackLight. After changing StackLight configuration, verify the changes as described in Verify StackLight after configuration.


Alerta

Key

Description

Example values

alerta.enabled (bool)

Enables or disables Alerta. Set to true by default.

true or false


Elasticsearch

Key

Description

Example values

elasticsearch.logstashRetentionTime (int)

Defines the Elasticsearch logstash-* index retention time in days. The logstash-* index stores all logs gathered from all nodes and containers. Set to 1 by default.

1, 5, 15


Grafana

Key

Description

Example values

grafana.renderer.enabled (bool)

Disables Grafana Image Renderer. For example, for resource-limited environments. Enabled by default.

true or false

grafana.homeDashboard (string)

Defines the home dashboard. Set to kubernetes-cluster by default. You can define any of the available dashboards.

kubernetes-cluster


Logging

Key

Description

Example values

logging.enabled (bool)

Enables or disables the StackLight logging stack. For details about the logging components, see Reference Architecture: StackLight deployment architecture. Set to true by default.

true or false


High availability

Key

Description

Example values

highAvailabilityEnabled (bool)

Enables or disables StackLight multiserver mode. For details, see StackLight database modes in Reference Architecture: StackLight deployment architecture. Set to false by default.

true or false


Metric collector

Key

Description

Example values

metricCollector.enabled (bool)

Disables or enables the metric collector. Modify this parameter for the management cluster only. Set to false by default.

false or true


Prometheus

Key

Description

Example values

prometheusServer.retentionTime (string)

Defines the Prometheus database retention period. Passed to the --storage.tsdb.retention.time flag. Set to 15d by default.

15d, 1000h, 10d12h

prometheusServer.retentionSize (string)

Defines the Prometheus database retention size. Passed to the --storage.tsdb.retention.size flag. Set to 15GB by default.

15GB, 512MB

prometheusServer.alertResendDelay (string)

Defines the minimum amount of time for Prometheus to wait before resending an alert to Alertmanager. Passed to the --rules.alert.resend-delay flag. Set to 2m by default.

2m, 90s


Cluster size

Key

Description

Example values

clusterSize (string)

Specifies the approximate expected cluster size. Set to small by default. Other possible values include medium and large. Depending on the choice, appropriate resource limits are passed according to the resourcesPerClusterSize parameter. The values differ by the Elasticsearch and Prometheus resource limits:

  • small (default) - 2 CPU, 6 Gi RAM for Elasticsearch, 1 CPU, 8 Gi RAM for Prometheus. Use small only for testing and evaluation purposes with no workloads expected.

  • medium - 4 CPU, 16 Gi RAM for Elasticsearch, 3 CPU, 16 Gi RAM for Prometheus.

  • large - 8 CPU, 32 Gi RAM for Elasticsearch, 6 CPU, 32 Gi RAM for Prometheus. Set to large only in case of lack of resources for Elasticsearch and Prometheus.

small, medium, or large


Resource limits

Key

Description

Example values

resourcesPerClusterSize (map)

Provides the capability to override the default resource requests or limits for any StackLight component for the predefined cluster sizes. For a list of StackLight components, see Components versions in Release Notes: Cluster releases.

resourcesPerClusterSize:
  elasticsearch:
    small:
      limits:
        cpu: "1000m"
        memory: "4Gi"
    medium:
      limits:
        cpu: "2000m"
        memory: "8Gi"
      requests:
        cpu: "1000m"
        memory: "4Gi"
    large:
      limits:
        cpu: "4000m"
        memory: "16Gi"

resources (map)

Provides the capability to override the containers resource requests or limits for any StackLight component. For a list of StackLight components, see Components versions in Release Notes: Cluster releases.

resources:
  alerta:
    requests:
      cpu: "50m"
      memory: "200Mi"
    limits:
      memory: "500Mi"

Using the example above, each pod in the alerta service will be requesting 50 millicores of CPU and 200 MiB of memory, while being hard-limited to 500 MiB of memory usage. Each configuration key is optional.


Kubernetes tolerations

Key

Description

Example values

tolerations.default (slice)

Kubernetes tolerations to add to all StackLight components.

default:
- key: "com.docker.ucp.manager"
  operator: "Exists"
  effect: "NoSchedule"

tolerations.component (map)

Defines Kubernetes tolerations (overrides the default ones) for any StackLight component.

component:
  elasticsearch:
  - key: "com.docker.ucp.manager"
    operator: "Exists"
    effect: "NoSchedule"
  postgresql:
  - key: "node-role.kubernetes.io/master"
    operator: "Exists"
    effect: "NoSchedule"

Storage class

Key

Description

Example values

storage.defaultStorageClass (string)

Defines the StorageClass to use for all StackLight Persistent Volume Claims (PVCs) if a component StorageClass is not defined using the componentStorageClasses. To use the cluster default storage class, leave the string empty.

lvp, standard

storage.componentStorageClasses (map)

Defines (overrides the defaultStorageClass value) the storage class for any StackLight component separately. To use the cluster default storage class, leave the string empty.

componentStorageClasses:
  elasticsearch: ""
  fluentd: ""
  postgresql: ""
  prometheusAlertManager: ""
  prometheusPushGateway: ""
  prometheusServer: ""

NodeSelector

Key

Description

Example values

nodeSelector.default (map)

Defines the NodeSelector to use for the most of StackLight pods (except some pods that refer to DaemonSets) if the NodeSelector of a component is not defined.

default:
  role: stacklight

nodeSelector.component (map)

Defines the NodeSelector to use for particular StackLight component pods. Overrides nodeSelector.default.

component:
  alerta:
    role: stacklight
    component: alerta
  kibana:
    role: stacklight
    component: kibana

Salesforce reporter

Key

Description

Example values

clusterId (string)

Unique cluster identifier, clusterId="<namespace>/<name>/<uid>", generated for each cluster from the cluster namespace, cluster name, and cluster UID, separated by a slash.

Cluster <namespace> is a Kubernetes namespace where the cluster is deployed. Cluster <name> is the name of the cluster object in the namespace. Cluster <uid> is the annotation set for each cluster. To obtain the cluster UID:

kubectl  get cluster -n ``<namespace>`` \
``<name>`` -o jsonpath=\
'{.metadata.annotations.kaas\.mirantis\.com/uid}'

The sf-reporter and sf-notifier services use the same clusterId parameter.

default/demo-9952/bcdb27ce-34b0-11eb-805d-0242c0a85b02

sfReporter.enabled (bool)

Enables or disables reporting of Prometheus metrics to Salesforce. For details, see StackLight deployment architecture. Disabled by default.

true or false

sfReporter.salesForceAuth (map)

Salesforce parameters and credentials for the metrics reporting integration.

Note

Modify this parameter if sf-notifier is not configured or if you want to use a different Salesforce user account to send reports to.

salesForceAuth:
  url: "<SF instance URL>"
  username: "<SF account email address>"
  password: "<SF password>"
  environment_id: "<Cloud identifier>"
  organization_id: "<Organization identifier>"
  sandbox_enabled: "<Set to true or false>"

sfReporter.cronjob (map)

Defines the Kubernetes cron job for sending metrics to Salesforce. By default, reports are sent at midnight server time.

cronjob:
  schedule: "0 0 * * *"
  concurrencyPolicy: "Allow"
  failedJobsHistoryLimit: ""
  successfulJobsHistoryLimit: ""
  startingDeadlineSeconds: 200

Ceph monitoring

Key

Description

Example values

ceph.enabled (bool)

Enables or disables Ceph monitoring. Set to false by default.

true or false


External endpoint monitoring

Key

Description

Example values

externalEndpointMonitoring.enabled (bool)

Enables or disables HTTP endpoints monitoring. If enabled, the monitoring tool performs the probes against the defined endpoints every 15 seconds. Set to false by default.

true or false

externalEndpointMonitoring.certificatesHostPath (string)

Defines the directory path with external endpoints certificates on host.

/etc/ssl/certs/

externalEndpointMonitoring.domains (slice)

Defines the list of HTTP endpoints to monitor.

domains:
- https://prometheus.io_health
- http://example.com:8080_status
- http://example.net:8080_pulse

Ironic monitoring

Key

Description

Example values

ironic.endpoint (string)

Enables or disables monitoring of bare metal Ironic. To enable, specify the Ironic API URL.

http://ironic-api-http.kaas.svc:6385/v1


SSL certificates monitoring

Key

Description

Example values

sslCertificateMonitoring.enabled (bool)

Enables or disables StackLight to monitor and alert on the expiration date of the TLS certificate of an HTTPS endpoint. If enabled, the monitoring tool performs the probes against the defined endpoints every hour. Set to false by default.

true or false

sslCertificateMonitoring.domains (slice)

Defines the list of HTTPS endpoints to monitor the certificates from.

domains:
- https://prometheus.io
- http://example.com:8080

Workload monitoring

Key

Description

Example values

metricFilter (map)

On the clusters that run large-scale workloads, workload monitoring generates a big amount of resource-consuming metrics. To prevent generation of excessive metrics, you can disable workload monitoring in the StackLight metrics and monitor only the infrastructure.

The metricFilter parameter enables the cAdvisor (Container Advisor) and kubeStateMetrics metric ingestion filters for Prometheus. Set to false by default. If set to true, you can define the namespaces to which the filter will apply.

metricFilter:
  enabled: true
  action: keep
  namespaces:
  - kaas
  - kube-system
  - stacklight
  • enabled - enable or disable metricFilter using true or false

  • action - action to take by Prometheus:

    • keep - keep only metrics from namespaces that are defined in the namespaces list

    • drop - ignore metrics from namespaces that are defined in the namespaces list

  • namespaces - list of namespaces to keep or drop metrics from regardless of the boolean value for every namespace


Mirantis Kubernetes Engine monitoring

Key

Description

Example values

mke.enabled (bool)

Enables or disables Mirantis Kubernetes Engine (MKE) monitoring. Set to false by default.

true or false

mke.dockerdDataRoot (string)

Defines the dockerd data root directory of persistent Docker state. For details, see Docker documentation: Daemon CLI (dockerd).

/var/lib/docker


Alerts configuration

Key

Description

Example values

prometheusServer.customAlerts (slice)

Defines custom alerts. Also, modifies or disables existing alert configurations. For the list of predefined alerts, see Available StackLight alerts. While adding or modifying alerts, follow the Alerting rules.

customAlerts:
# To add a new alert:
- alert: ExampleAlert
  annotations:
    description: Alert description
    summary: Alert summary
  expr: example_metric > 0
  for: 5m
  labels:
    severity: warning
# To modify an existing alert expression:
- alert: AlertmanagerFailedReload:
  expr: alertmanager_config_last_reload_successful == 5
# To disable an existing alert:
- alert: TargetDown
  enabled: false

An optional field enabled is accepted in the alert body to disable an existing alert by setting to false. All fields specified using the customAlerts definition override the default predefined definitions in the charts’ values.


Watchdog alert

Key

Description

Example values

prometheusServer.watchDogAlertEnabled (bool)

Enables or disables the Watchdog alert that constantly fires as long as the entire alerting pipeline is functional. You can use this alert to verify that Alertmanager notifications properly flow to the Alertmanager receivers. Set to true by default.

true or false


Alertmanager integrations

Key

Description

Example values

alertmanagerSimpleConfig.genericReceivers (slice)

Provides a genetic template for notifications receiver configurations. For a list of supported receivers, see Prometheus Alertmanager documentation: Receiver.

For example, to enable notifications to OpsGenie:

alertmanagerSimpleConfig:
  genericReceivers:
  - name: HTTP-opsgenie
    enabled: true # optional
    opsgenie_configs:
    - api_url: "https://example.app.eu.opsgenie.com/"
      api_key: "secret-key"
      send_resolved: true

alertmanagerSimpleConfig.genericRoutes (slice)

Provides a template for notifications route configuration. For details, see Prometheus Alertmanager documentation: Route.

genericRoutes:
- receiver: HTTP-opsgenie
  enabled: true # optional
  match_re:
    severity: major|critical
  continue: true

Notifications to email

Key

Description

Example values

alertmanagerSimpleConfig.email.enabled (bool)

Enables or disables Alertmanager integration with email. Set to false by default.

true or false

alertmanagerSimpleConfig.email (map)

Defines the notification parameters for Alertmanager integration with email. For details, see Prometheus Alertmanager documentation: Email configuration.

email:
  enabled: false
  send_resolved: true
  to: "to@test.com"
  from: "from@test.com"
  smarthost: smtp.gmail.com:587
  auth_username: "from@test.com"
  auth_password: password
  auth_identity: "from@test.com"
  require_tls: true

alertmanagerSimpleConfig.email.route (map)

Defines the route for Alertmanager integration with email. For details, see Prometheus Alertmanager documentation: Route.

route:
  match: {}
  match_re: {}
  routes: []

Notifications to Salesforce

Key

Description

Example values

clusterId (string)

Unique cluster identifier, clusterId="<namespace>/<name>/<uid>", generated for each cluster from the cluster namespace, cluster name, and cluster UID, separated by a slash.

Cluster <namespace> is a Kubernetes namespace where the cluster is deployed. Cluster <name> is the name of the cluster object in the namespace. Cluster <uid> is the annotation set for each cluster. To obtain the cluster UID:

kubectl  get cluster -n ``<namespace>`` \
``<name>`` -o jsonpath=\
'{.metadata.annotations.kaas\.mirantis\.com/uid}'

The sf-notifier and sf-reporter services use the same clusterId parameter.

default/demo-9952/bcdb27ce-34b0-11eb-805d-0242c0a85b02

alertmanagerSimpleConfig.salesForce.enabled (bool)

Enables or disables Alertmanager integration with Salesforce using the sf-notifier service. Disabled by default.

true or false

alertmanagerSimpleConfig.salesForce.auth (map)

Defines the Salesforce parameters and credentials for integration with Alertmanager.

auth:
  url: "<SF instance URL>"
  username: "<SF account email address>"
  password: "<SF password>"
  environment_id: "<Cloud identifier>"
  organization_id: "<Organization identifier>"
  sandbox_enabled: "<Set to true or false>"

alertmanagerSimpleConfig.salesForce.route (map)

Defines the notifications route for Alertmanager integration with Salesforce. For details, see Prometheus Alertmanager documentation: Route.

route:
  match: {}
  match_re: {}
  routes: []

Notifications to Slack

Key

Description

Example values

alertmanagerSimpleConfig.slack.enabled (bool)

Enables or disables Alertmanager integration with Slack. For details, see Prometheus Alertmanager documentation: Slack configuration. Set to false by default.

true or false

alertmanagerSimpleConfig.slack.api_url (string)

Defines the Slack webhook URL.

http://localhost:8888

alertmanagerSimpleConfig.slack.channel (string)

Defines the Slack channel or user to send notifications to.

monitoring

alertmanagerSimpleConfig.slack.route (map)

Defines the notifications route for Alertmanager integration with Slack. For details, see Prometheus Alertmanager documentation: Route.

route:
  match: {}
  match_re: {}
  routes: []