StackLight configuration parameters
This section describes the StackLight configuration keys that you can specify
in the values section to change StackLight settings as required. Prior to
making any changes to StackLight configuration, perform the steps described in
StackLight configuration procedure.
After changing StackLight configuration, verify the changes as described in
Verify StackLight after configuration.
Important
Some parameters are marked as mandatory. Failure to specify values for such parameters causes the Admission Controller to reject cluster creation.
OpenStack cluster configuration parameters
This section describes the OpenStack-related StackLight configuration keys. For MOSK cluster configuration keys, see MOSK cluster configuration parameters.
General
openstack.enabled (bool)
Enables OpenStack monitoring. Defaults to true.
openstack.namespace (string)
Defines the namespace in which the OpenStack virtualized control plane is installed. Defaults to openstack.
openstack:
  enabled: true
  namespace: openstack
Gnocchi
openstack.gnocchi.enabled (bool)
Enables Gnocchi monitoring. Defaults to false.
openstack:
  gnocchi:
    enabled: false
Ironic
openstack.ironic.enabled (bool)
Enables Ironic monitoring. Defaults to false.
openstack:
  ironic:
    enabled: false
RabbitMQ
openstack.rabbitmq.credentialsConfig (map)
Defines the RabbitMQ credentials to use if credentials discovery is disabled or if some required parameters were not found during discovery.
openstack.rabbitmq.credentialsDiscovery (map)
Enables credentials discovery to obtain the username and password from the secret object.
openstack:
  rabbitmq:
    credentialsConfig:
      username: "stacklight"
      password: "stacklight"
      host: "rabbitmq.openstack.svc"
      queue: "notifications"
      vhost: "openstack"
    credentialsDiscovery:
      enabled: true
      namespace: openstack
      secretName: os-rabbitmq-user-credentials
SSL certificates
openstack.externalFQDN (string) Deprecated
External FQDN used to communicate with OpenStack services for certificates monitoring. For example, https://os.ssl.mirantis.net/. The option is deprecated, use openstack.externalFQDNs.enabled instead.
openstack.externalFQDNs.enabled (bool)
Enables the use of external FQDNs to communicate with OpenStack services for certificates monitoring. Defaults to false.
openstack.insecure (string)
Defines whether to verify the trust chain of the OpenStack endpoint SSL certificates during monitoring.
openstack:
  externalFQDNs:
    enabled: false
  insecure:
    internal: true
    external: false
Tungsten Fabric
tungstenFabricMonitoring.enabled (bool)
Enables Tungsten Fabric monitoring. Defaults to true if Tungsten Fabric is deployed.
tungstenFabricMonitoring.exportersTimeout (string)
Defines the timeout of the tungstenfabric-exporter client requests. Defaults to 5s.
tungstenFabricMonitoring.analyticsEnabled (bool)
Enables or disables monitoring of the Tungsten Fabric analytics services. The default value is set automatically based on the actual state of the Tungsten Fabric analytics services (enabled or disabled) in the Tungsten Fabric cluster.
tungstenFabricMonitoring:
  enabled: true
  exportersTimeout: "5s"
  analyticsEnabled: true
MOSK cluster configuration parameters
This section describes the MOSK cluster StackLight configuration keys. For OpenStack cluster configuration keys, see OpenStack cluster configuration parameters.
Alert configuration
prometheusServer.customAlerts (slice)
Defines custom alerts. Also modifies or disables existing alert configurations. For the list of predefined alerts, see StackLight alerts. While adding or modifying alerts, follow the Alerting rules.
prometheusServer:
  customAlerts:
    # To add a new alert:
    - alert: ExampleAlert
      annotations:
        description: Alert description
        summary: Alert summary
      expr: example_metric > 0
      for: 5m
      labels:
        severity: warning
    # To modify an existing alert expression:
    - alert: AlertmanagerFailedReload
      expr: alertmanager_config_last_reload_successful == 5
    # To disable an existing alert:
    - alert: TargetDown
      enabled: false
The alert body accepts an optional enabled field. Set it to false to disable an existing alert. All fields specified using the customAlerts definition override the default predefined definitions in the charts' values.
Alerta
alerta.enabled (bool)
Enables or disables Alerta. Using the Alerta web UI, you can view the most recent or watched alerts, and group and filter alerts. Defaults to true.
alerta:
  enabled: true
Alertmanager integrations
On MOSK clusters with limited internet access, a proxy is required for StackLight components that use HTTP and HTTPS, are disabled by default, and need external access if enabled, for example, the Salesforce integration and Alertmanager notifications external rules.
alertmanagerSimpleConfig.genericReceivers (slice)
Provides a generic template for notifications receiver configurations. For a list of supported receivers, see Prometheus Alertmanager documentation: Receiver.
For example, to enable notifications to OpsGenie:
alertmanagerSimpleConfig:
  genericReceivers:
    - name: HTTP-opsgenie
      enabled: true # optional
      opsgenie_configs:
        - api_url: "https://example.app.eu.opsgenie.com/"
          api_key: "secret-key"
          send_resolved: true
alertmanagerSimpleConfig.genericRoutes (slice)
Provides a template for notifications route configuration. For details, see Prometheus Alertmanager documentation: Route.
alertmanagerSimpleConfig:
  genericRoutes:
    - receiver: HTTP-opsgenie
      enabled: true # optional
      matchers: severity=~"major|critical"
      continue: true
alertmanagerSimpleConfig.inhibitRules.enabled (bool)
Enables or disables alert inhibition rules. If enabled, Alertmanager decreases alert noise by suppressing notifications for dependent alerts, which provides a clearer view of the cloud status and simplifies troubleshooting. Enabled by default. For details, see Alert dependencies. For details on inhibition rules, see Prometheus documentation.
alertmanagerSimpleConfig:
  inhibitRules:
    enabled: true
Alertmanager: notifications to email
alertmanagerSimpleConfig.email.enabled (bool)
Enables or disables Alertmanager integration with email. Defaults to false.
alertmanagerSimpleConfig:
  email:
    enabled: false
alertmanagerSimpleConfig.email (map)
Defines the notification parameters for Alertmanager integration with email. For details, see Prometheus Alertmanager documentation: Email configuration.
alertmanagerSimpleConfig:
  email:
    enabled: false
    send_resolved: true
    to: "to@test.com"
    from: "from@test.com"
    smarthost: smtp.gmail.com:587
    auth_username: "from@test.com"
    auth_password: password
    auth_identity: "from@test.com"
    require_tls: true
alertmanagerSimpleConfig.email.customTemplates (slice)
Defines custom notification templates for email alerts. You can override the subject, html, and text templates while keeping the define and end blocks unchanged. You can customize each template independently.
The following example is provided for demonstration purposes. For production environments, use the official Prometheus documentation to create relevant templates that fit your deployment.
Simple example template
alertmanagerSimpleConfig:
  email:
    customTemplates:
      subject: |
        {{- define "email.notification.subject" -}}
        {{ if (index .Alerts 0).Labels.severity }}{{ (index .Alerts 0).Labels.severity | toUpper }}: {{ end }}{{ (index .Alerts 0).Labels.alertname }}{{ if (index .Alerts 0).Labels.cluster }} [{{ (index .Alerts 0).Labels.cluster }}]{{ end }}
        {{- end -}}
      html: |
        {{- define "email.notification.html" -}}
        {{ range .Alerts }}
        <b>{{ .Labels.alertname }}</b> ({{ .Labels.severity }}) [{{ .Labels.cluster }}]<br>
        {{ .Annotations.description }}<br>
        {{ end }}
        {{- end -}}
      text: |
        {{- define "email.notification.text" -}}
        {{ range .Alerts }}
        Alert: {{ .Labels.alertname }} ({{ .Labels.severity }}) [{{ .Labels.cluster }}]
        {{ .Annotations.description }}
        {{ end }}
        {{- end -}}
Example notification
Subject: CRITICAL: KernelIOErrorsDetectedFake [west-cluster]
Text:
KernelIOErrorsDetectedFake (critical) [west-cluster]
The kaas-node-0f724961-95e8-483f-a466-b2759ff9c5ea node kernel reports IO errors. Investigate kernel logs.
KernelIOErrorsDetectedFake (critical) [west-cluster]
The kaas-node-c7b4982c-ae86-4055-92db-dc9a8afd3bed node kernel reports IO errors. Investigate kernel logs.
alertmanagerSimpleConfig.email.route (map)
Defines the route for Alertmanager integration with email. For details, see Prometheus Alertmanager documentation: Route.
alertmanagerSimpleConfig:
  email:
    route:
      matchers: []
      routes: []
Alertmanager: notifications to Microsoft Teams
On MOSK clusters with limited internet access, a proxy is required for StackLight components that use HTTP and HTTPS, are disabled by default, and need external access if enabled. The Microsoft Teams integration depends on internet access through HTTPS.
alertmanagerSimpleConfig.msteams.enabled (bool)
Enables or disables Alertmanager integration with Microsoft Teams. Requires a configured Microsoft Teams channel and a channel connector. Defaults to false.
alertmanagerSimpleConfig:
  msteams:
    enabled: false
alertmanagerSimpleConfig.msteams.url (string)
Defines the URL of an Incoming Webhook connector of a Microsoft Teams channel. For details about channel connectors, see Microsoft documentation.
alertmanagerSimpleConfig:
  msteams:
    url: "https://example.webhook.office.com/webhookb2/UUID"
alertmanagerSimpleConfig.msteams.route (map)
Defines the notifications route for Alertmanager integration with Microsoft Teams. For details, see Prometheus Alertmanager documentation: Route.
alertmanagerSimpleConfig:
  msteams:
    route:
      matchers: []
      routes: []
Alertmanager: notifications to Salesforce
On MOSK clusters with limited internet access, a proxy is required for StackLight components that use HTTP and HTTPS, are disabled by default, and need external access if enabled. The Salesforce integration depends on internet access through HTTPS.
clusterId (string)
Unique cluster identifier in the clusterId="<Cluster Project>/<Cluster Name>/<UID>" format, generated for each cluster using the Cluster Project, Cluster Name, and cluster UID, separated by a slash. Used for both the sf-notifier and sf-reporter services.
The clusterId is automatically defined for each cluster. Do not set or modify it manually.
alertmanagerSimpleConfig.salesForce.enabled (bool)
Enables or disables Alertmanager integration with Salesforce using the sf-notifier service. Disabled by default.
alertmanagerSimpleConfig:
  salesForce:
    enabled: false
alertmanagerSimpleConfig.salesForce.auth (map)
Defines the Salesforce parameters and credentials for integration with Alertmanager.
alertmanagerSimpleConfig:
  salesForce:
    auth:
      url: "<SF instance URL>"
      username: "<SF account email address>"
      password: "<SF password>"
      environment_id: "<Cloud identifier>"
      organization_id: "<Organization identifier>"
      sandbox_enabled: "<Set to true or false>"
alertmanagerSimpleConfig.salesForce.route (map)
Defines the notifications route for Alertmanager integration with Salesforce. For details, see Prometheus Alertmanager documentation: Route.
alertmanagerSimpleConfig:
  salesForce:
    route:
      matchers:
        - severity="critical"
      routes: []
Note
By default, only Critical alerts are sent to Salesforce.
alertmanagerSimpleConfig.salesForce.feed_enabled (bool)
Enables or disables feed update in Salesforce. To save API calls, this defaults to false.
alertmanagerSimpleConfig:
  salesForce:
    feed_enabled: false
alertmanagerSimpleConfig.salesForce.link_prometheus (bool)
Enables or disables links to the Prometheus web UI in alerts sent to Salesforce. To simplify troubleshooting, defaults to true.
alertmanagerSimpleConfig:
  salesForce:
    link_prometheus: true
Alertmanager: notifications to ServiceNow
Caution
Prior to configuring the integration with ServiceNow, perform the following prerequisite steps using the ServiceNow documentation of the required version.
In a new or existing Incident table, add the Alert ID field as described in Add fields to a table. To avoid alerts duplication, select Unique.
Create an Access Control List (ACL) with read/write permissions for the Incident table as described in Securing table records.
alertmanagerSimpleConfig.serviceNow.enabled (bool)
Enables or disables Alertmanager integration with ServiceNow. Defaults to false. Requires a configured ServiceNow account and compliance with the Incident table requirements above.
alertmanagerSimpleConfig:
  serviceNow:
    enabled: false
alertmanagerSimpleConfig.serviceNow (map)
Defines the ServiceNow parameters and credentials for integration with Alertmanager:
incident_table - name of the table created in ServiceNow. Do not confuse with the table label.
api_version - version of the ServiceNow HTTP API. By default, v1.
alert_id_field - name of the unique string field configured in ServiceNow to hold Prometheus alert IDs. Do not confuse with the table label.
auth.instance - URL of the instance.
auth.username - name of the ServiceNow user account with access to the Incident table.
auth.password - password of the ServiceNow user account.
alertmanagerSimpleConfig:
  serviceNow:
    enabled: true
    incident_table: "incident"
    api_version: "v1"
    alert_id_field: "u_alert_id"
    auth:
      instance: "https://dev00001.service-now.com"
      username: "testuser"
      password: "testpassword"
Alertmanager: notifications to Slack
On MOSK clusters with limited internet access, a proxy is required for StackLight components that use HTTP and HTTPS, are disabled by default, and need external access if enabled. The Slack integration depends on internet access through HTTPS.
alertmanagerSimpleConfig.slack.enabled (bool)
Enables or disables Alertmanager integration with Slack. For details, see Prometheus Alertmanager documentation: Slack configuration. Defaults to false.
alertmanagerSimpleConfig:
  slack:
    enabled: false
alertmanagerSimpleConfig.slack.api_url (string)
Defines the Slack webhook URL.
alertmanagerSimpleConfig:
  slack:
    api_url: "http://localhost:8888"
alertmanagerSimpleConfig.slack.channel (string)
Defines the Slack channel or user to send notifications to.
alertmanagerSimpleConfig:
  slack:
    channel: "monitoring"
alertmanagerSimpleConfig.slack.customTemplates (slice)
Defines custom notification templates for Slack alerts. You can override the title and text templates while keeping the define and end blocks unchanged. You can customize each template independently.
The following example is provided for demonstration purposes. For production environments, use the official Prometheus documentation to create relevant templates that fit your deployment.
Simple example template
alertmanagerSimpleConfig:
  slack:
    customTemplates:
      title: |
        {{- define "slack.notification.title" -}}
        {{ if (index .Alerts 0).Labels.severity }}{{ (index .Alerts 0).Labels.severity | toUpper }}: {{ end }}{{ (index .Alerts 0).Labels.alertname }}{{ if (index .Alerts 0).Labels.cluster }} [{{ (index .Alerts 0).Labels.cluster }}]{{ end }}
        {{- end -}}
      text: |
        {{- define "slack.notification.text" -}}
        {{ range .Alerts }}
        *{{ .Labels.alertname }}* ({{ .Labels.severity }}) [{{ .Labels.cluster }}]
        {{ .Annotations.description }}
        {{ end }}
        {{- end -}}
Example notification
Title: CRITICAL: KernelIOErrorsDetectedFake [west-cluster]
Text:
KernelIOErrorsDetectedFake (critical) [west-cluster]
The kaas-node-0f724961-95e8-483f-a466-b2759ff9c5ea node kernel reports IO errors. Investigate kernel logs.
KernelIOErrorsDetectedFake (critical) [west-cluster]
The kaas-node-c7b4982c-ae86-4055-92db-dc9a8afd3bed node kernel reports IO errors. Investigate kernel logs.
alertmanagerSimpleConfig.slack.route (map)
Defines the notifications route for Alertmanager integration with Slack. For details, see Prometheus Alertmanager documentation: Route.
alertmanagerSimpleConfig:
  slack:
    route:
      matchers: []
      routes: []
Alertmanager: Watchdog alert
prometheusServer.watchDogAlertEnabled (bool)
Enables or disables the Watchdog alert that constantly fires as long as the entire alerting pipeline is functional. You can use this alert to verify that Alertmanager notifications properly flow to the Alertmanager receivers. Defaults to true.
prometheusServer:
  watchDogAlertEnabled: true
Bond interface monitoring
bondInterfaceMonitoring.enabled (bool)
Enables bond interface monitoring. Defaults to true.
bondInterfaceMonitoring:
  enabled: true
Byte limit for Telemeter client
For internal StackLight use only
telemetry.telemeterClient.limitBytes (string)
Specifies the size limit of the incoming data in bytes for the Telemeter client. Defaults to 1048576.
telemetry:
  telemeterClient:
    limitBytes: "1048576"
Cluster size
clusterSize (string)
Specifies the approximate expected cluster size. Possible values: small, medium, large. Defaults to small. Depending on the choice, appropriate resource limits are passed according to the resources or resourcesPerClusterSize parameter.
Caution
The resourcesPerClusterSize parameter is deprecated and will be overridden by the resources parameter. Therefore, use the resources parameter instead.
The values differ by the OpenSearch and Prometheus resource limits:
small (default) - 2 CPU, 6 Gi RAM for OpenSearch, 1 CPU, 8 Gi RAM for Prometheus. Use small only for testing and evaluation purposes with no workloads expected.
medium - 4 CPU, 16 Gi RAM for OpenSearch, 3 CPU, 16 Gi RAM for Prometheus.
large - 8 CPU, 32 Gi RAM for OpenSearch, 6 CPU, 32 Gi RAM for Prometheus. Set to large only in case of lack of resources for OpenSearch and Prometheus.
clusterSize: small
Grafana
grafana.homeDashboard (string)
Defines the home dashboard. Defaults to kubernetes-cluster. You can define any of the available dashboards.
grafana:
  homeDashboard: kubernetes-cluster
High availability
highAvailabilityEnabled (bool) Mandatory
Enables or disables StackLight multiserver mode. For details, see StackLight database modes in Deployment architecture. On MOSK clusters, defaults to false. On management clusters, true is mandatory.
highAvailabilityEnabled: true
Kubernetes network policies
networkPolicies.enabled (bool)
Enables or disables the Kubernetes Network Policy resource that allows controlling network connections to and from Pods deployed in the stacklight namespace. Enabled by default.
For the list of network policy rules, refer to StackLight rules for Kubernetes network policies. Customization of network policies is not supported.
networkPolicies:
  enabled: true
Kubernetes tolerations
tolerations.default (slice)
Defines Kubernetes tolerations to add to all StackLight components.
tolerations:
  default:
    - key: "com.docker.ucp.manager"
      operator: "Exists"
      effect: "NoSchedule"
tolerations.component (map)
Defines Kubernetes tolerations for any StackLight component. Overrides the default ones.
tolerations:
  component:
    # elasticsearch:
    opensearch:
      - key: "com.docker.ucp.manager"
        operator: "Exists"
        effect: "NoSchedule"
    postgresql:
      - key: "node-role.kubernetes.io/master"
        operator: "Exists"
        effect: "NoSchedule"
Log filtering for namespaces
logging.namespaceFiltering.logs.enabled (bool)
Limits the number of namespaces for Pods log collection. Enabled by default with the following list of monitored Kubernetes namespaces:
Kubernetes namespaces monitored by default
ceph (if Ceph is enabled)
ceph-lcm-mirantis (if Ceph is enabled)
default
kaas
kube-node-lease
kube-public
kube-system
lcm-system
local-path-storage
metallb
metallb-system
node-feature-discovery
openstack
openstack-ceph-shared (if Ceph is enabled)
openstack-lma-shared
openstack-provider-system
openstack-redis
openstack-tf-share (if Tungsten Fabric is enabled)
openstack-vault
osh-system
rook-ceph (if Ceph is enabled)
stacklight
system
tf (if Tungsten Fabric is enabled)
logging:
  namespaceFiltering:
    logs:
      enabled: true
logging.namespaceFiltering.logs.extraNamespaces (slice)
Adds extra namespaces to collect Kubernetes Pod logs from. Requires logging.enabled and logging.namespaceFiltering.logs.enabled set to true. Defines a YAML-formatted list of namespaces, which is empty by default.
logging:
  namespaceFiltering:
    logs:
      enabled: true
      extraNamespaces:
        - custom-ns-1
logging.namespaceFiltering.events.enabled (bool)
Limits the number of namespaces for Kubernetes events collection. Disabled by default for two reasons: the sysdig scanner is present on some MOSK clusters, and cluster-scoped objects produce events to the default namespace by default, which is not passed to the StackLight configuration in any way. Requires logging.enabled set to true.
logging:
  namespaceFiltering:
    events:
      enabled: false
logging.namespaceFiltering.events.extraNamespaces (slice)
Adds extra namespaces to collect Kubernetes events from. Requires logging.enabled and logging.namespaceFiltering.events.enabled set to true. Defines a YAML-formatted list of namespaces, which is empty by default.
logging:
  namespaceFiltering:
    events:
      enabled: true
      extraNamespaces:
        - custom-ns-1
Log verbosity
stacklightLogLevels.default (string)
Defines the log verbosity level for all StackLight components if not defined using component. To use the component default log verbosity level, leave the string empty. Possible values:
trace - most verbose log messages, generates large amounts of data
debug - messages typically of use only for debugging purposes
info - informational messages describing common processes such as service starting or stopping; can be ignored during normal system operation but may provide additional input for investigation
warn - messages about conditions that may require attention
error - messages on error conditions that prevent normal system operation and require action
crit - messages on critical conditions indicating that a service is not working, working incorrectly, or is unusable, requiring immediate attention
The NO_SEVERITY severity label is automatically added to a log with no severity label in the message. This enables greater control over determining which logs Fluentd processes and which ones are skipped by mistake.
stacklightLogLevels:
  default: ""
stacklightLogLevels.component (map)
Defines the log verbosity level for any StackLight component separately. Overrides the default value. To use the component default log verbosity, leave the string empty.
stacklightLogLevels:
  component:
    kubeStateMetrics: ""
    prometheusAlertManager: ""
    prometheusBlackboxExporter: ""
    prometheusNodeExporter: ""
    prometheusServer: ""
    alerta: ""
    alertmanagerWebhookServicenow: ""
    elasticsearchCurator: ""
    postgresql: ""
    prometheusEsExporter: ""
    sfNotifier: ""
    sfReporter: ""
    fluentd: ""
    # fluentdElasticsearch: ""
    fluentdLogs: ""
    telemeterClient: ""
    telemeterServer: ""
    tfControllerExporter: ""
    tfVrouterExporter: ""
    telegrafDs: ""
    telegrafS: ""
    # elasticsearch: ""
    opensearch: ""
    # kibana: ""
    grafana: ""
    opensearchDashboards: ""
    metricbeat: ""
    prometheusMsTeams: ""
Logging
logging.enabled (bool) Mandatory
Enables or disables the StackLight logging stack. For details about the logging components, see Deployment architecture. Defaults to true. On management clusters, true is mandatory.
logging:
  enabled: true
logging.metricQueries (map)
Allows configuring OpenSearch queries for the data present in OpenSearch. Prometheus Elasticsearch Exporter then queries the OpenSearch database and exposes such metrics in the Prometheus format. For details, see Create log-based metrics. Includes the following parameters:
indices - specifies the index pattern
interval and timeout - specify in seconds how often to send the query to OpenSearch and how long it can last before timing out
onError and onMissing - modify the prometheus-es-exporter behavior on query error and missing index. For details, see Prometheus Elasticsearch Exporter.
For a usage example, see Create log-based metrics.
Logging: Enforce OOPS compression
logging.enforceOopsCompression
Enforces 32 GB of heap size, unless the defined memory limit allows using 50 GB of heap. Requires logging.enabled set to true. Enabled by default. When disabled, StackLight computes heap as ⅘ of the set memory limit for any resulting heap value. For more details, see Tune OpenSearch performance.
logging:
  enforceOopsCompression: true
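The heap sizing rule above can be illustrated with a short Python sketch. It is one plausible reading of the description, not StackLight code: the opensearch_heap_gb helper is hypothetical, and the handling of memory limits that yield less than 32 GB of heap (the ⅘ value is used unchanged) is an assumption that the text does not spell out.

```python
def opensearch_heap_gb(memory_limit_gb, enforce_oops_compression=True):
    """Estimate the OpenSearch heap size in GB (hypothetical helper)."""
    # Baseline: heap is 4/5 of the configured memory limit.
    heap = memory_limit_gb * 4 / 5
    if not enforce_oops_compression:
        # Enforcement disabled: use the 4/5 value whatever it is.
        return heap
    if heap >= 50:
        # The limit allows 50 GB of heap or more: no cap is applied.
        return heap
    # Otherwise cap the heap at 32 GB (assumed: smaller values pass through).
    return min(heap, 32)
```

Under these assumptions, a 50 GB memory limit results in a 32 GB heap with enforcement enabled and a 40 GB heap with enforcement disabled.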
Logging to external outputs
logging.externalOutputs (map)
Specifies external Elasticsearch, OpenSearch, and syslog destinations as fluentd-logs outputs. Requires logging.enabled: true. For the configuration procedure, see Enable log forwarding to external destinations.
logging:
  externalOutputs:
    elasticsearch:
      # disabled: false
      type: elasticsearch
      level: info
      plugin_log_level: info
      tag_exclude: '{fluentd-logs,systemd}'
      host: elasticsearch-host
      port: 9200
      logstash_date_format: '%Y.%m.%d'
      logstash_format: true
      logstash_prefix: logstash
      ...
      buffer:
        # disabled: false
        chunk_limit_size: 16m
        flush_interval: 15s
        flush_mode: interval
        overflow_action: block
        ...
    opensearch:
      disabled: true
      type: opensearch
      ...
Logging to external outputs: secrets
logging.externalOutputSecretMounts (map)
Specifies authentication secret mounts for external log destinations. Requires logging.externalOutputs to be enabled and a Kubernetes secret to be created in the stacklight namespace. Contains the following values:
secretName - Mandatory. Kubernetes secret name.
mountPath - Mandatory. Mount path of the Kubernetes secret defined in secretName.
defaultMode - Optional. Decimal number defining secret permissions. Defaults to 420.
Secret mount configuration:
logging:
  externalOutputSecretMounts:
    - secretName: elasticsearch-certs
      mountPath: /tmp/elasticsearch-certs
      defaultMode: 420
    - secretName: opensearch-certs
      mountPath: /tmp/opensearch-certs
Elasticsearch configuration for the above secret mount:
logging:
  externalOutputs:
    elasticsearch:
      ...
      ca_file: /tmp/elasticsearch-certs/ca.pem
      client_cert: /tmp/elasticsearch-certs/client.pem
      client_key: /tmp/elasticsearch-certs/client.key
      client_key_pass: password
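Because defaultMode is a decimal number, it can be unclear which file permissions a value stands for. The following Python sketch converts the decimal value to the more familiar octal notation. The mode_to_octal helper is hypothetical and serves only as an illustration.

```python
def mode_to_octal(mode):
    """Render a decimal permission value as a four-digit octal string."""
    return format(mode, "04o")


# The default value 420 corresponds to octal 0644:
# read/write for the owner, read-only for group and others.
print(mode_to_octal(420))
```

For example, a secret that only the owner may read corresponds to octal 0400, which is 256 in decimal.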
Logging to syslog
Note
The logging.syslog parameter is deprecated in favor of
logging.externalOutputs. For details, see
Logging to external outputs.
logging.syslog.enabled (bool)
Enables or disables remote logging to syslog. Disabled by default. Requires logging.enabled set to true. For details and a configuration example, see Enable remote logging to syslog.
logging:
  syslog:
    enabled: true
logging.syslog.host (string)
Specifies the remote syslog host.
logging:
  syslog:
    host: remote-syslog.svc
logging.syslog.port (string)
Specifies the remote syslog port.
logging:
  syslog:
    port: "514"
logging.syslog.packetSize (string)
Defines the packet size in bytes for the syslog logging output. Defaults to 1024. May be useful for syslog setups that allow a packet size larger than 1 kB. Mirantis recommends that you tune this parameter to allow sending full log lines.
logging:
  syslog:
    packetSize: "1024"
logging.syslog.protocol (string)
Specifies the remote syslog protocol. Defaults to udp. Possible values: tcp or udp.
logging:
  syslog:
    protocol: udp
logging.syslog.tls.enabled (bool)
Optional. Disabled by default. Enables or disables TLS. Use TLS only for the TCP protocol. TLS will not be enabled if you set a protocol other than TCP.
logging:
  syslog:
    tls:
      enabled: true
logging.syslog.tls.verify_mode (int)
Optional. Configures TLS verification. Possible values:
0 for OpenSSL::SSL::VERIFY_NONE
1 for OpenSSL::SSL::VERIFY_PEER
2 for OpenSSL::SSL::VERIFY_FAIL_IF_NO_PEER_CERT
4 for OpenSSL::SSL::VERIFY_CLIENT_ONCE
logging:
  syslog:
    tls:
      verify_mode: 1
logging.syslog.tls.certificate (string)
Defines how to pass the certificate. secret takes precedence over hostPath.
secret - specifies the name of the secret holding the certificate.
hostPath - specifies an absolute host path to the PEM certificate.
logging:
  syslog:
    tls:
      certificate:
        secret: ""
        hostPath: "/etc/ssl/certs/ca-bundle.pem"
tag_exclude (string)
Optional. Overrides tag_include. Sets logs by tags to exclude from the destination output. For example, to exclude all logs with the test tag, set tag_exclude: '/.*test.*/'.
How to obtain tags for logs
Select from the following options:
In the main OpenSearch output, use the logger field that equals the tag.
Use logs of a particular Pod or container by following the below order, with the first match winning:
1. The value of the app Pod label. For example, for app=opensearch-master, use opensearch-master as the log tag.
2. The value of the k8s-app Pod label.
3. The value of the app.kubernetes.io/name Pod label.
4. If a release_group Pod label exists and the component Pod label starts with app, use the value of the component label as the tag. Otherwise, the tag is the application label joined to the component label with a -.
5. The name of the container from which the log is taken.
The values for tag_exclude and tag_include are placed into <match> directives of Fluentd and only accept regex types that are supported by the <match> directive of Fluentd. For details, refer to the Fluentd official documentation.
logging:
  syslog:
    tag_exclude: '{fluentd-logs,systemd}'
tag_include (string)
Optional. Is overridden by tag_exclude. Sets logs by tags to include in the destination output. For example, to include all logs with the auth tag, set tag_include: '/.*auth.*/'.
logging:
  syslog:
    tag_include: '/.*auth.*/'
Monitoring of Ceph
ceph.enabled (bool)
Enables or disables Ceph monitoring on MOSK clusters. Defaults to false.
ceph:
  enabled: false
Monitoring of external endpoint
externalEndpointMonitoring.enabled (bool)
Enables or disables HTTP endpoints monitoring. If enabled, the monitoring tool probes the defined endpoints every 15 seconds. Defaults to false.
externalEndpointMonitoring.certificatesHostPath (string)
Defines the directory path on the host that contains the certificates of external endpoints.
externalEndpointMonitoring.domains (slice)
Defines the list of HTTP endpoints to monitor. The endpoints must successfully respond to a liveness probe. For success, a request to a specific endpoint must result in a 2xx HTTP response code.
externalEndpointMonitoring:
  enabled: false
  certificatesHostPath: /etc/ssl/certs/
  domains:
    - https://prometheus.io/health
    - http://example.com:8080/status
    - http://example.net:8080/pulse
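The success criterion for the probe, a 2xx HTTP response code, can be expressed as a simple check. The following Python sketch is illustrative only and does not reflect how StackLight implements the probe. The probe_succeeded helper is hypothetical.

```python
def probe_succeeded(status_code):
    """Return True if the HTTP status code counts as a successful probe."""
    # Any code from 200 to 299 is a 2xx response.
    return 200 <= status_code <= 299
```

Under this rule, an endpoint that answers with a redirect (for example, 301) or an error (for example, 503) fails the probe. Whether the monitoring tool follows redirects before evaluating the code is not covered here.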
Monitoring of Ironic
ironic.endpoint (string)
Enables or disables monitoring of Ironic. To enable, specify the Ironic API URL.
ironic.insecure (bool)
Defines whether to skip the chain and host verification. Defaults to false.
ironic:
  endpoint: http://ironic-api-http.kaas.svc:6385/v1
  insecure: false
Monitoring of Mirantis Kubernetes Engine
mke.enabled (bool)
Enables or disables Mirantis Kubernetes Engine (MKE) monitoring. Defaults to true.
mke.dockerdDataRoot (string)
Defines the dockerd data root directory of persistent Docker state. For details, see Docker documentation: Daemon CLI (dockerd).
mke:
  enabled: true
  dockerdDataRoot: /var/lib/docker
Monitoring of SSL certificates
sslCertificateMonitoring.enabled (bool)
Enables or disables StackLight monitoring of and alerting on the expiration date of the TLS certificate of an HTTPS endpoint. If enabled, the monitoring tool probes the defined endpoints every hour. Defaults to false.
sslCertificateMonitoring.domains (slice)
Defines the list of HTTPS endpoints whose certificates to monitor.
sslCertificateMonitoring:
  enabled: false
  domains:
    - https://prometheus.io
    - https://example.com:8080
Monitoring of workload
metricFilter (map)
On clusters that run large-scale workloads, workload monitoring generates a large number of resource-consuming metrics. To prevent generation of excessive metrics, you can disable workload monitoring in the StackLight metrics and monitor only the infrastructure.
The metricFilter parameter enables the cAdvisor (Container Advisor) and kubeStateMetrics metric ingestion filters for Prometheus. Defaults to false. If set to true, you can define the namespaces to which the filter will apply. The parameter is designed for MOSK clusters.
enabled - enable or disable metricFilter using true or false
action - action to take by Prometheus:
  keep - keep only metrics from namespaces that are defined in the namespaces list
  drop - ignore metrics from namespaces that are defined in the namespaces list
namespaces - list of namespaces to keep or drop metrics from regardless of the boolean value for every namespace
metricFilter:
  enabled: true
  action: keep
  namespaces:
    - kaas
    - kube-system
    - stacklight
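The effect of the keep and drop actions can be illustrated with a short Python sketch. It models only the decision described above and is not how Prometheus or StackLight implement the filter. The metric_is_ingested helper and its arguments are hypothetical.

```python
def metric_is_ingested(namespace, enabled, action, namespaces):
    """Decide whether a metric from the given namespace passes the filter."""
    if not enabled:
        # With the filter disabled, nothing is filtered out.
        return True
    listed = namespace in namespaces
    if action == "keep":
        # keep: only metrics from the listed namespaces pass.
        return listed
    if action == "drop":
        # drop: metrics from the listed namespaces are ignored.
        return not listed
    raise ValueError(f"unsupported action: {action!r}")
```

With the example configuration above (action: keep), metrics from the kaas, kube-system, and stacklight namespaces are ingested, while metrics from any other namespace, such as a workload namespace, are not.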
NodeSelector
nodeSelector.default (map)
Defines the NodeSelector to use for most StackLight pods (except some pods that refer to DaemonSets) if the NodeSelector of a component is not defined.
nodeSelector.component (map)
Defines the NodeSelector to use for particular StackLight component pods. Overrides nodeSelector.default.
nodeSelector:
  default:
    role: stacklight
  component:
    alerta:
      role: stacklight
      component: alerta
    # kibana:
    #   role: stacklight
    #   component: kibana
    opensearchDashboards:
      role: stacklight
      component: opensearchdashboards
OpenSearch
elasticsearch.persistentVolumeClaimSize (string) Mandatory. Specifies the size of the OpenSearch (Elasticsearch) Persistent Volume Claims (PVCs). The number of PVCs depends on the StackLight database mode. For HA, three PVCs are created, each of the size specified in this parameter. For non-HA, one PVC of the specified size is created.
In HA mode, Local Volume Provisioner (LVP) acts as the storage provisioner. All PVCs located on the same node share the same storage pool and perceive the total available capacity as that of the entire LVP disk.
Important
You cannot modify this parameter after cluster creation.
elasticsearch.persistentVolumeUsableStorageSizeGB (integer) Optional. Specifies the number of gigabytes that is exclusively available for the OpenSearch data. Elasticsearch Curator uses this value as the available storage per node, multiplies it by the node count to calculate the total usable space of the cluster, and removes older indices when this capacity is reached. However, this setting does not enforce a per-node storage limit; it serves only for retention size calculation.
Note
To limit the maximum storage capacity usage per node, consider using watermark parameters described in OpenSearch extra settings.
This parameter defines the ceiling for storage-based retention, though only a portion of this storage will be available for indices, depending on the total size and cluster configuration.
If not set (by default), the number of gigabytes from elasticsearch.persistentVolumeClaimSize is used.
This parameter is useful in the following cases for HA mode using LVP:
The real storage behind the volume is shared between multiple consumers. As a result, OpenSearch cannot use all of elasticsearch.persistentVolumeClaimSize.
The real volume size is bigger than elasticsearch.persistentVolumeClaimSize. As a result, OpenSearch can use more than elasticsearch.persistentVolumeClaimSize.
elasticsearch:
  persistentVolumeClaimSize: 30Gi
  persistentVolumeUsableStorageSizeGB: 160
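To illustrate the retention calculation for the shared-storage case, the following sketch assumes a hypothetical HA deployment with three OpenSearch nodes where the LVP disk is shared with other consumers, so only part of each claim is reserved for OpenSearch data. All sizes are illustrative assumptions.

```yaml
elasticsearch:
  # Each of the three HA PVCs is created with this size.
  persistentVolumeClaimSize: 500Gi
  # Assumption: only 300 GB per node is reserved for OpenSearch data.
  # Retention capacity that Elasticsearch Curator calculates:
  #   300 GB per node x 3 nodes = 900 GB in total
  persistentVolumeUsableStorageSizeGB: 300
```

With these values, older indices would be removed once the cluster approaches 900 GB of index data, regardless of the 500Gi claim size.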
OpenSearch Dashboards extra settings
logging.dashboardsExtraConfig (map) Additional configuration for opensearch_dashboards.yml.
logging:
  dashboardsExtraConfig:
    opensearch.requestTimeout: 60000
OpenSearch extra settings
logging.extraConfig (map) Additional configuration for opensearch.yml that allows setting various OpenSearch parameters, including logging settings, node watermarks, and other cluster-level configurations.
By default, StackLight manages watermarks efficiently: the low, high, and flood thresholds are set to 15%, 10%, and 5% of the node's usable storage, respectively. The usable storage is defined by persistentVolumeUsableStorageSizeGB. If this parameter is not set, StackLight uses persistentVolumeClaimSize instead. These percentage values are converted to absolute GB values and capped at 150 GB, 100 GB, and 50 GB, respectively.
If logging.extraConfig sets any watermark, StackLight stops managing them. In this case, explicitly set all watermarks, not only one or two, preferably using absolute GB values. If you decide to use percentage values, make sure you calculate them based on the size of the whole available volume pool (in the LVP case, volume capacity can be larger than the defined size). For details on watermark settings, refer to the official OpenSearch documentation.
logging:
  extraConfig:
    cluster.max_shards_per_node: 5000
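As a worked example of the default thresholds, assume a node with 160 GB of usable storage. StackLight would derive 24 GB (15%), 16 GB (10%), and 8 GB (5%) for the low, high, and flood watermarks, all below the 150 GB, 100 GB, and 50 GB caps. The sketch below sets the same values explicitly, assuming that StackLight passes these keys to opensearch.yml unchanged. The key names are the standard OpenSearch disk-based shard allocation settings, where an absolute value denotes the minimum free space to keep on a node.

```yaml
logging:
  extraConfig:
    # 15% of 160 GB = 24 GB (cap: 150 GB)
    cluster.routing.allocation.disk.watermark.low: 24gb
    # 10% of 160 GB = 16 GB (cap: 100 GB)
    cluster.routing.allocation.disk.watermark.high: 16gb
    # 5% of 160 GB = 8 GB (cap: 50 GB)
    cluster.routing.allocation.disk.watermark.flood_stage: 8gb
```

All three watermarks are set together because StackLight stops managing them as soon as any one of them is defined.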
Prometheus
prometheusServer.alertResendDelay (string) Defines the minimum amount of time for Prometheus to wait before resending an alert to Alertmanager. Passed to the --rules.alert.resend-delay flag. For example, 2m or 90s. Defaults to 2m.
prometheusServer.alertsCommonLabels (dict) Defines the list of labels to be injected into firing alerts while they are sent to Alertmanager. Empty by default.
The following labels are reserved for internal purposes and cannot be overridden: cluster_id, service, severity.
Caution
When new labels are injected, Prometheus sends alert updates with a new set of labels, which can potentially cause Alertmanager to have duplicated alerts for a short period of time if the cluster currently has firing alerts.
Warning
In releases before MOSK management 2.31.0 and MOSK 26.1, do not use the environment label so that Alerta does not reject alert notifications.
prometheusServer.persistentVolumeClaimSize (string) Mandatory. Specifies the size of the Prometheus PVCs. The number of PVCs depends on the StackLight database mode. For HA, three PVCs are created, each of the size specified in this parameter. For non-HA, one PVC of the specified size is created.
Important
You cannot modify this parameter after cluster creation.
prometheusServer.queryConcurrency (string) Defines the maximum number of concurrent queries. Passed to the --query.max-concurrency flag. Defaults to 20.
prometheusServer.retentionSize (string) Defines the Prometheus database retention size. Passed to the --storage.tsdb.retention.size flag. For example, 15GB or 512MB. Defaults to 15GB.
prometheusServer.retentionTime (string) Defines the Prometheus database retention period. Passed to the --storage.tsdb.retention.time flag. For example, 15d, 1000h, or 10d12h. Defaults to 15d.
prometheusServer:
  alertResendDelay: 2m
  alertsCommonLabels:
    region: west
    env: prod
  persistentVolumeClaimSize: 16Gi
  queryConcurrency: 20
  retentionSize: 15GB
  retentionTime: 15d
Prometheus Blackbox Exporter
blackboxExporter.customModules (map) Specifies a set of custom Blackbox Exporter modules. For details, see Blackbox Exporter configuration: module. The http_2xx, http_2xx_verify, http_openstack, http_openstack_insecure, tls, and tls_verify names are reserved for internal usage and any overrides will be discarded.
blackboxExporter.timeoutOffset (string) Specifies the offset to subtract from timeout in seconds (--timeout-offset), upper bounded by 5.0 to comply with the built-in StackLight functionality. If nothing is specified, the Blackbox Exporter default value is used. For example, for Blackbox Exporter v0.19.0, the default value is 0.5.
blackboxExporter:
  customModules:
    http_post_2xx:
      prober: http
      timeout: 5s
      http:
        method: POST
        headers:
          Content-Type: application/json
        body: '{}'
  timeoutOffset: "0.1"
Prometheus custom recording rules
prometheusServer.customRecordingRules (slice) Defines custom Prometheus recording rules. Overriding of existing recording rules is not supported.
prometheusServer:
  customRecordingRules:
  - name: ExampleRule.http_requests_total
    rules:
    - expr: sum by(job) (rate(http_requests_total[5m]))
      record: job:http_requests:rate5m
    - expr: avg_over_time(job:http_requests:rate5m[1w])
      record: job:http_requests:rate5m:avg_over_time_1w
Prometheus custom scrape configurations
prometheusServer.customScrapeConfigs (map) Defines custom Prometheus scrape configurations. For details, see Prometheus documentation: scrape_config. The names of default StackLight scrape configurations, which you can view in the Status -> Targets tab of the Prometheus web UI, are reserved for internal usage and any overrides will be discarded. Therefore, provide unique names to avoid overrides.
prometheusServer:
  customScrapeConfigs:
    custom-grafana:
      scrape_interval: 10s
      scrape_timeout: 5s
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels:
        - __meta_kubernetes_service_label_app
        - __meta_kubernetes_endpoint_port_name
        regex: grafana;service
        action: keep
      - source_labels:
        - __meta_kubernetes_pod_name
        target_label: pod
Prometheus experimental features
Available since MOSK 25.2.2 and MOSK management 2.30.2
prometheusServer.enabledFeatures (slice) Defines the list of experimental features to enable in Prometheus server. For a list of available features, see Prometheus documentation: Feature Flags.
The memory-snapshot-on-shutdown feature is enabled by default in StackLight.
prometheusServer:
  enabledFeatures:
  - memory-snapshot-on-shutdown
  - use-uncached-io
To disable all experimental features:
prometheusServer:
  enabledFeatures: []
Prometheus metrics filtering
metricsFiltering.enabled (bool) Configuration for managing Prometheus metrics filtering. When enabled (default), only actively used and explicitly white-listed metrics get scraped by Prometheus.
metricsFiltering.extraMetricsInclude (map) List of extra metrics to whitelist, which are dropped by default. Contains the following parameters:
<job name> - scraping job name as a key for extra white-listed metrics to add under the key. For the list of job names, see White list of Prometheus scrape jobs. If a job name is not present in this list, its target metrics are not dropped and are collected by Prometheus by default.
You can also use group key names to add metrics to more than one job using _group-<key name>. The following list combines jobs by groups:
List of jobs by groups
_group-blackbox-metrics
  - blackbox
  - blackbox-external-endpoint
  - kubernetes-master-api
  - mcc-blackbox
  - mke-manager-api
  - openstack-blackbox-ext
  - openstack-dns-probe
  - refapp
_group-controller-runtime-metrics
  - helm-controller
  - kaas-exporter
  - kubelet
  - kubernetes-apiservers
  - mcc-controllers
  - mcc-providers
  - rabbitmq-operator-metrics
_group-etcd-metrics
  - etcd-server
  - ucp-kv
_group-go-collector-metrics
  - cadvisor
  - calico
  - etcd-server
  - helm-controller
  - ironic
  - kaas-exporter
  - kubelet
  - kubernetes-apiservers
  - mcc-cache
  - mcc-controllers
  - mcc-providers
  - mke-metrics-controller
  - mke-metrics-engine
  - openstack-ingress-controller
  - postgresql
  - prometheus-alertmanager
  - prometheus-elasticsearch-exporter
  - prometheus-grafana
  - prometheus-libvirt-exporter
  - prometheus-memcached-exporter
  - prometheus-msteams
  - prometheus-mysql-exporter
  - prometheus-node-exporter
  - prometheus-rabbitmq-exporter # Removed in MOSK 25.2
  - prometheus-relay
  - prometheus-server
  - rabbitmq-operator-metrics
  - telegraf-docker-swarm
  - telemeter-client
  - telemeter-server
  - tf-control
  - tf-redis
  - tf-vrouter
  - ucp-kv
_group-process-collector-metrics
  - alertmanager-webhook-servicenow
  - cadvisor
  - calico
  - etcd-server
  - helm-controller
  - ironic
  - kaas-exporter
  - kubelet
  - kubernetes-apiservers
  - mcc-cache
  - mcc-controllers
  - mcc-providers
  - mke-metrics-controller
  - mke-metrics-engine
  - openstack-ingress-controller
  - patroni
  - postgresql
  - prometheus-alertmanager
  - prometheus-elasticsearch-exporter
  - prometheus-grafana
  - prometheus-libvirt-exporter
  - prometheus-memcached-exporter
  - prometheus-msteams
  - prometheus-mysql-exporter
  - prometheus-node-exporter
  - prometheus-rabbitmq-exporter # Removed in MOSK 25.2
  - prometheus-relay
  - prometheus-server
  - rabbitmq-operator-metrics
  - sf-notifier
  - telegraf-docker-swarm
  - telemeter-client
  - telemeter-server
  - tf-control
  - tf-redis
  - tf-vrouter
  - tf-zookeeper
  - ucp-kv
_group-rest-client-metrics
  - helm-controller
  - kaas-exporter
  - mcc-controllers
  - mcc-providers
_group-service-handler-metrics
  - mcc-controllers
  - mcc-providers
_group-service-http-metrics
  - mcc-cache
  - mcc-controllers
_group-service-reconciler-metrics
  - mcc-controllers
  - mcc-providers
<list of metrics to collect> - extra metrics of <job name> to be white-listed.
prometheusServer:
  metricsFiltering:
    enabled: true
    extraMetricsInclude:
      cadvisor:
      - container_memory_failcnt
      - container_network_transmit_errors_total
      calico:
      - felix_route_table_per_iface_sync_seconds_sum
      - felix_bpf_dataplane_endpoints
      _group-go-collector-metrics:
      - go_gc_heap_goal_bytes
      - go_gc_heap_objects_objects
Prometheus Node Exporter
nodeExporter.netDeviceExclude (string) Deprecated in favor of extraArgs. Excludes monitoring of RegExp-specified network devices. The number of network interface-related metrics is significant and may cause increased Prometheus RAM usage in big clusters. Therefore, Prometheus Node Exporter only collects information about a basic set of interfaces (both host and container) and excludes the following interfaces from monitoring:
veth/cali - the host-side part of the container-host Ethernet tunnel
o-hm0 - the OpenStack Octavia management interface for communication with the amphora machine
tap, qg-, qr-, ha- - the Open vSwitch virtual bridge ports
br-(ex|int|tun) - the Open vSwitch virtual bridges
docker0, br- - the Docker bridge (master for the veth interfaces)
ovs-system - the Open vSwitch interface (mapping interfaces to bridges)
vxlan_sys (since 2.31.2 and 25.2.7) - the shared kernel VXLAN interface used internally by Open vSwitch
To enable information collecting for the interfaces above, edit the list of blacklisted devices as needed.
Since 2.31.2 and 25.2.7:
nodeExporter:
  netDeviceExclude: "^(veth.+|cali.+|o-hm0|tap.+|qg-.+|qr-.+|ha-.+|br-.+|ovs-system|docker0|vxlan_sys)$"
Before 2.31.2 and 25.2.7:
nodeExporter:
  netDeviceExclude: "^(veth.+|cali.+|o-hm0|tap.+|qg-.+|qr-.+|ha-.+|br-.+|ovs-system|docker0)$"
nodeExporter.extraCollectorsEnabled (slice) Deprecated in favor of extraArgs. Enables Node Exporter collectors. For a list of available collectors, see Node Exporter Collectors. The following collectors are enabled by default in StackLight:
arp, conntrack[0], cpu, cpu.info, diskstats, entropy, filefd, filesystem, hwmon, loadavg, meminfo, netdev, netstat, nfs, stat, sockstat, textfile, time, timex, uname, vmstat
nodeExporter:
  extraCollectorsEnabled:
  - bcache
  - bonding
  - softnet
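Because netDeviceExclude and extraCollectorsEnabled are both deprecated in favor of nodeExporter.extraArgs, the same settings can likely be expressed as Node Exporter flags. The following is a minimal sketch, assuming that the deployed Node Exporter version supports the --collector.netdev.device-exclude flag and the per-collector --collector.<name> flags, and that StackLight passes extraArgs entries through as flags. The pattern is the default used before 2.31.2 and 25.2.7 with docker0 removed so that its metrics are collected.

```yaml
nodeExporter:
  extraArgs:
    # Replaces netDeviceExclude; docker0 is removed from the exclusion pattern.
    collector.netdev.device-exclude: "^(veth.+|cali.+|o-hm0|tap.+|qg-.+|qr-.+|ha-.+|br-.+|ovs-system)$"
    # Replace extraCollectorsEnabled; boolean flags take an empty string.
    collector.bcache: ""
    collector.bonding: ""
    collector.softnet: ""
```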
nodeExporter.extraArgs (map) Additional command-line arguments passed to Node Exporter. This field has the highest priority when merging with default and other user-provided arguments, including nodeExporter.netDeviceExclude and nodeExporter.extraCollectorsEnabled (both are deprecated).
The value should be a map of flags to values. For boolean flags, use an empty string ("") as the value.
nodeExporter:
  extraArgs:
    collector.filesystem.ignored-mount-points: "^/(dev|proc|sys|var/lib/docker/.+)($|/)"
    collector.netdev.device-exclude: "lo"
    collector.cpu: ""
nodeExporter.extraVolumes (slice) Additional volumes to be mounted into the Node Exporter pod. Do not define volumes using the following names, as they are already defined by default. Duplicating them will break subsequent Helm upgrades: rootfs, proc, and sys.
nodeExporter:
  extraVolumes:
  - name: custom-mount
    hostPath:
      path: /custom
      type: Directory
nodeExporter.extraVolumeMounts (slice) Additional volume mounts to be added to the Node Exporter container. Do not define mounts using the following mountPaths, as they are already included by default. Adding them here will result in duplicate definitions and cause Helm upgrade failures:
/host/rootfs
/host/proc
/host/sys
nodeExporter:
  extraVolumeMounts:
  - name: custom-mount
    mountPath: /host/custom
    readOnly: true
Prometheus Relay
Note
Prometheus Relay is set up as an endpoint in the Prometheus datasource in Grafana. Therefore, all requests from Grafana are sent to Prometheus through Prometheus Relay. If Prometheus Relay reports request timeouts or exceeds the response size limits, you can configure the parameters below. In this case, Prometheus Relay resource limits may also require tuning.
prometheusRelay.clientTimeout (string) Specifies the client timeout in seconds. If empty, defaults to a value determined by the cluster size: 10 for small, 30 for medium, 60 for large.
prometheusRelay.responseLimitBytes (string) Specifies the response size limit in bytes. If empty, defaults to a value determined by the cluster size: 6291456 for small, 18874368 for medium, 37748736 for large.
prometheusRelay:
  clientTimeout: 10
  responseLimitBytes: 1048576
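If Grafana dashboards hit request timeouts or response size limits, one possible adjustment is sketched below. The values are illustrative assumptions that double the documented defaults for a large cluster (60 seconds and 37748736 bytes, that is, 36 MiB). The memory limit for the prometheusRelay component is a placeholder to tune for your environment, not a documented default.

```yaml
prometheusRelay:
  # 2 x 60 seconds, the default for large clusters
  clientTimeout: 120
  # 2 x 37748736 bytes (36 MiB) = 75497472 bytes (72 MiB)
  responseLimitBytes: 75497472
resources:
  prometheusRelay:
    limits:
      # Placeholder value; larger responses may need more memory.
      memory: "1Gi"
```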
Prometheus remote write
Allows sending of metrics from Prometheus to a custom monitoring endpoint. For details, see Prometheus Documentation: remote_write.
prometheusServer.remoteWriteSecretMounts (slice) Skip this parameter if your remote server does not require authorization. Defines additional mounts for remoteWrites secrets. Secret objects with credentials needed to access the remote endpoint must be precreated in the stacklight namespace. For details, see Kubernetes Secrets.
Note
To create more than one file for the same remote write endpoint, for example, to configure TLS connections, use a single secret object with multiple keys in the data field. Using the following example configuration, two files will be created, cert_file and key_file:
...
data:
  cert_file: aWx1dnRlc3Rz
  key_file: dGVzdHVzZXI=
...
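For context, a complete Secret manifest with both keys might look as follows. This is a sketch: the name prom-secret-files is borrowed from the example configuration in this section, and the data values are the placeholder strings from the snippet above, not real credentials.

```yaml
apiVersion: v1
kind: Secret
metadata:
  # Must match secretName in prometheusServer.remoteWriteSecretMounts.
  name: prom-secret-files
  namespace: stacklight
type: Opaque
data:
  # Values are base64-encoded; each key becomes a file under the mount path.
  cert_file: aWx1dnRlc3Rz
  key_file: dGVzdHVzZXI=
```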
prometheusServer.remoteWrites (slice) Defines the configuration of a custom remote_write endpoint for sending Prometheus samples.
Note
If the remote server uses authorization, first create secret(s) in the stacklight namespace and mount them to Prometheus through prometheusServer.remoteWriteSecretMounts. Then define the created secret in the authorization field.
prometheusServer:
  remoteWriteSecretMounts:
  - secretName: prom-secret-files
    mountPath: /etc/config/remote_write
  remoteWrites:
  - url: http://remote_url/push
    authorization:
      credentials_file: /etc/config/remote_write/key_file
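For the TLS case mentioned in the note on multi-key secrets, a configuration might look like the following sketch. It assumes that each remoteWrites entry is passed to Prometheus as a regular remote_write block, so the standard tls_config fields apply. The endpoint URL is a hypothetical placeholder.

```yaml
prometheusServer:
  remoteWriteSecretMounts:
  - secretName: prom-secret-files
    mountPath: /etc/config/remote_write
  remoteWrites:
  # Hypothetical HTTPS endpoint; replace with your remote write URL.
  - url: https://metrics.example.com/api/v1/write
    tls_config:
      # Files created from the cert_file and key_file keys of the secret.
      cert_file: /etc/config/remote_write/cert_file
      key_file: /etc/config/remote_write/key_file
```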
Resource limits
resourcesPerClusterSize (map) Deprecated. Provides the capability to override the default resource requests or limits for any StackLight component for the predefined cluster sizes.
Caution
The resourcesPerClusterSize parameter is deprecated and is overridden by the resources parameter. Therefore, use the resources parameter instead.
StackLight components for resource limits customization
Note
The below list has the componentName: <podNamePrefix>/<containerName> format.
alerta: alerta/alerta
alertmanager: prometheus-alertmanager/prometheus-alertmanager
alertmanagerWebhookServicenow: alertmanager-webhook-servicenow/alertmanager-webhook-servicenow
blackboxExporter: prometheus-blackbox-exporter/blackbox-exporter
elasticsearch: opensearch-master/opensearch # Deprecated
elasticsearchCurator: elasticsearch-curator/elasticsearch-curator
elasticsearchExporter: elasticsearch-exporter/elasticsearch-exporter
fluentdElasticsearch: fluentd-logs/fluentd-logs # Deprecated
fluentdLogs: fluentd-logs/fluentd-logs
fluentdNotifications: fluentd-notifications/fluentd
grafana: grafana/grafana
iamProxy: iam-proxy/iam-proxy # Deprecated
iamProxyAlerta: iam-proxy-alerta/iam-proxy
iamProxyAlertmanager: iam-proxy-alertmanager/iam-proxy
iamProxyGrafana: iam-proxy-grafana/iam-proxy
iamProxyKibana: iam-proxy-kibana/iam-proxy # Deprecated
iamProxyOpenSearchDashboards: iam-proxy-kibana/iam-proxy
iamProxyPrometheus: iam-proxy-prometheus/iam-proxy
kibana: opensearch-dashboards/opensearch-dashboards # Deprecated
kubeStateMetrics: prometheus-kube-state-metrics/prometheus-kube-state-metrics
libvirtExporter: prometheus-libvirt-exporter/prometheus-libvirt-exporter
metricCollector: metric-collector/metric-collector
metricbeat: metricbeat/metricbeat
nodeExporter: prometheus-node-exporter/prometheus-node-exporter
opensearch: opensearch-master/opensearch
opensearchDashboards: opensearch-dashboards/opensearch-dashboards
patroniExporter: patroni/patroni-patroni-exporter
pgsqlExporter: patroni/patroni-pgsql-exporter
postgresql: patroni/patroni
prometheusEsExporter: prometheus-es-exporter/prometheus-es-exporter
prometheusMsTeams: prometheus-msteams/prometheus-msteams
prometheusRelay: prometheus-relay/prometheus-relay
prometheusServer: prometheus-server/prometheus-server
sfNotifier: sf-notifier/sf-notifier
sfReporter: sf-reporter/sf-reporter
stacklightHelmControllerController: stacklight-helm-controller/controller
telegrafDockerSwarm: telegraf-docker-swarm/telegraf-docker-swarm
telegrafDs: telegraf-ds-smart/telegraf-ds-smart # Deprecated
telegrafDsSmart: telegraf-ds-smart/telegraf-ds-smart
telegrafS: telegraf-docker-swarm/telegraf-docker-swarm # Deprecated
telemeterClient: telemeter-client/telemeter-client
telemeterServer: telemeter-server/telemeter-server
telemeterServerAuthServer: telemeter-server/telemeter-server-authorization-server
tfControllerExporter: prometheus-tf-controller-exporter/prometheus-tungstenfabric-exporter
tfVrouterExporter: prometheus-tf-vrouter-exporter/prometheus-tungstenfabric-exporter
resourcesPerClusterSize:
  # elasticsearch:
  opensearch:
    small:
      limits:
        cpu: "1000m"
        memory: "4Gi"
    medium:
      limits:
        cpu: "2000m"
        memory: "8Gi"
      requests:
        cpu: "1000m"
        memory: "4Gi"
    large:
      limits:
        cpu: "4000m"
        memory: "16Gi"
resources (map) Provides the capability to override the container resource requests or limits for any StackLight component.
StackLight components for resource limits customization
Note
The below list has the componentName: <podNamePrefix>/<containerName> format.
alerta: alerta/alerta
alertmanager: prometheus-alertmanager/prometheus-alertmanager
alertmanagerWebhookServicenow: alertmanager-webhook-servicenow/alertmanager-webhook-servicenow
blackboxExporter: prometheus-blackbox-exporter/blackbox-exporter
elasticsearch: opensearch-master/opensearch # Deprecated
elasticsearchCurator: elasticsearch-curator/elasticsearch-curator
elasticsearchExporter: elasticsearch-exporter/elasticsearch-exporter
fluentdElasticsearch: fluentd-logs/fluentd-logs # Deprecated
fluentdLogs: fluentd-logs/fluentd-logs
fluentdNotifications: fluentd-notifications/fluentd
grafana: grafana/grafana
iamProxy: iam-proxy/iam-proxy # Deprecated
iamProxyAlerta: iam-proxy-alerta/iam-proxy
iamProxyAlertmanager: iam-proxy-alertmanager/iam-proxy
iamProxyGrafana: iam-proxy-grafana/iam-proxy
iamProxyKibana: iam-proxy-kibana/iam-proxy # Deprecated
iamProxyOpenSearchDashboards: iam-proxy-kibana/iam-proxy
iamProxyPrometheus: iam-proxy-prometheus/iam-proxy
kibana: opensearch-dashboards/opensearch-dashboards # Deprecated
kubeStateMetrics: prometheus-kube-state-metrics/prometheus-kube-state-metrics
libvirtExporter: prometheus-libvirt-exporter/prometheus-libvirt-exporter
metricCollector: metric-collector/metric-collector
metricbeat: metricbeat/metricbeat
nodeExporter: prometheus-node-exporter/prometheus-node-exporter
opensearch: opensearch-master/opensearch
opensearchDashboards: opensearch-dashboards/opensearch-dashboards
patroniExporter: patroni/patroni-patroni-exporter
pgsqlExporter: patroni/patroni-pgsql-exporter
postgresql: patroni/patroni
prometheusEsExporter: prometheus-es-exporter/prometheus-es-exporter
prometheusMsTeams: prometheus-msteams/prometheus-msteams
prometheusRelay: prometheus-relay/prometheus-relay
prometheusServer: prometheus-server/prometheus-server
sfNotifier: sf-notifier/sf-notifier
sfReporter: sf-reporter/sf-reporter
stacklightHelmControllerController: stacklight-helm-controller/controller
telegrafDockerSwarm: telegraf-docker-swarm/telegraf-docker-swarm
telegrafDs: telegraf-ds-smart/telegraf-ds-smart # Deprecated
telegrafDsSmart: telegraf-ds-smart/telegraf-ds-smart
telegrafS: telegraf-docker-swarm/telegraf-docker-swarm # Deprecated
telemeterClient: telemeter-client/telemeter-client
telemeterServer: telemeter-server/telemeter-server
telemeterServerAuthServer: telemeter-server/telemeter-server-authorization-server
tfControllerExporter: prometheus-tf-controller-exporter/prometheus-tungstenfabric-exporter
tfVrouterExporter: prometheus-tf-vrouter-exporter/prometheus-tungstenfabric-exporter
resources:
  alerta:
    requests:
      cpu: "50m"
      memory: "200Mi"
    limits:
      memory: "500Mi"
Using the example above, each pod in the alerta service requests 50 millicores of CPU and 200 MiB of memory, while being hard-limited to 500 MiB of memory usage. Each configuration key is optional.
Note
The logging mechanism performance depends on the cluster log load. If the cluster components send an excessive amount of logs, the default resource requests and limits for fluentdLogs (or fluentdElasticsearch) may be insufficient, which may cause its pods to be OOMKilled and trigger the KubePodCrashLooping alert. In such a case, increase the default resource requests and limits for fluentdLogs. For example:
resources:
  # fluentdElasticsearch:
  fluentdLogs:
    requests:
      memory: "500Mi"
    limits:
      memory: "1500Mi"
Salesforce reporter
On MOSK clusters with limited internet access, a proxy is required for the StackLight components that use HTTP or HTTPS, are disabled by default, and need external access once enabled. The Salesforce reporter depends on internet access through HTTPS.
clusterId (string) Unique cluster identifier clusterId="<Cluster Project>/<Cluster Name>/<UID>", generated for each cluster using Cluster Project, Cluster Name, and cluster UID, separated by a slash. Used for both sf-reporter and sf-notifier services.
The clusterId key is automatically defined for each cluster. Do not set or modify it manually.
sfReporter.enabled (bool) Enables or disables reporting of Prometheus metrics to Salesforce. For details, see Deployment architecture. Disabled by default.
sfReporter.salesForceAuth (map) Salesforce parameters and credentials for the metrics reporting integration.
Note
Modify this parameter if sf-notifier is not configured or if you want to use a different Salesforce user account to send reports to.
sfReporter.cronjob (map) Defines the Kubernetes cron job for sending metrics to Salesforce. By default, reports are sent at midnight server time.
sfReporter:
  enabled: false
  salesForceAuth:
    url: "<SF instance URL>"
    username: "<SF account email address>"
    password: "<SF password>"
    environment_id: "<Cloud identifier>"
    organization_id: "<Organization identifier>"
    sandbox_enabled: "<Set to true or false>"
  cronjob:
    schedule: "0 0 * * *"
    concurrencyPolicy: "Allow"
    failedJobsHistoryLimit: ""
    successfulJobsHistoryLimit: ""
    startingDeadlineSeconds: 200
Storage class
In an HA StackLight setup, when highAvailabilityEnabled is set to true,
all StackLight Persistent Volumes (PVs) use the Local Volume Provisioner (LVP)
storage class to avoid relying on dynamic provisioners such as Ceph, which are
not available in every deployment. In a non-HA StackLight setup, when no storage
class is specified, PVs use the default storage class of a cluster.
storage.defaultStorageClass (string) Defines the StorageClass to use for all StackLight Persistent Volume Claims (PVCs) if a component StorageClass is not defined using componentStorageClasses. For example, lvp or standard. To use the default storage class, leave the string empty.
storage.componentStorageClasses (map) Defines the storage class for any StackLight component separately, overriding the defaultStorageClass value. To use the default storage class, leave the string empty.
storage:
  defaultStorageClass: ""
  componentStorageClasses:
    elasticsearch: ""
    opensearch: ""
    fluentd: ""
    postgresql: ""
    prometheusAlertManager: ""
    prometheusServer: ""
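The example above leaves every value empty. As a sketch of how the override works, the following values set a cluster-wide class and a different class for a single component. The class names lvp and standard are the example names from the parameter description and are assumed to exist in the cluster.

```yaml
storage:
  # Used for every component without an explicit override.
  defaultStorageClass: lvp
  componentStorageClasses:
    # Hypothetical override: only Prometheus uses the "standard" class.
    prometheusServer: standard
```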
Telegraf S.M.A.R.T
telegrafSmart.enabled (bool) Enables the Telegraf S.M.A.R.T input plugin. Enabled by default.
telegrafSmart.configParameters (map) Configuration block for the S.M.A.R.T input plugin. The list of the supported parameters includes:
enable_extensions
tag_with_device_type
attributes (default: true)
interval (default: "30s")
excludes
devices
timeout
Warning
When the devices parameter is defined, all specified devices are selected on each node. If a listed device is not present on a given node, Telegraf reports the smart_device_exit_status{device=...} metric with a value of 2, indicating a non-successful exit status.
Refer to the official Telegraf S.M.A.R.T Input Plugin documentation for details and expected formats.
telegrafSmart:
  enabled: true
  configParameters:
    enable_extensions:
    - "auto-on"
    excludes:
    - "/dev/sdb"
    attributes: true
    tag_with_device_type: false
    interval: "30s"
    timeout: "25s"
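The example above uses excludes. For comparison, a sketch that uses the devices parameter instead is shown below. The device paths are hypothetical placeholders. As described in the warning, every listed device is expected on each node, so this form suits clusters with a uniform disk layout.

```yaml
telegrafSmart:
  enabled: true
  configParameters:
    # Hypothetical device paths; a device missing on a node is reported
    # through smart_device_exit_status with a value of 2.
    devices:
    - "/dev/sda"
    - "/dev/sdb"
    interval: "30s"
```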