StackLight configuration parameters¶
This section describes the StackLight configuration keys that you can specify
in the values section to change StackLight settings as required. Prior to
making any changes to StackLight configuration, perform the steps described in
StackLight configuration procedure.
After changing StackLight configuration, verify the changes as described in
Verify StackLight after configuration.
Important
Some parameters are marked as mandatory. Failure to specify values for such parameters causes the Admission Controller to reject cluster creation.
Alerta¶
Key |
Description |
Example values |
---|---|---|
|
Enables or disables Alerta. Set to |
|
Grafana¶
Key |
Description |
Example values |
---|---|---|
|
Disables Grafana Image Renderer, for example, in resource-limited environments. Enabled by default. |
|
|
Defines the home dashboard. Set to |
|
Logging¶
Key |
Description |
Example values |
---|---|---|
|
Enables or disables the StackLight logging stack. For details about the
logging components, see Deployment architecture. Set to |
|
|
Removed in Container Cloud 2.26.0 (Cluster releases 17.1.0 and 16.1.0).
Sets the least important level of log messages to send to OpenSearch.
Requires The default logging level is Note The |
|
|
Configures OpenSearch queries for the data stored in OpenSearch. Prometheus Elasticsearch Exporter then runs these queries against the OpenSearch database and exposes the results as metrics in the Prometheus format. For details, see Create logs-based metrics. Includes the following parameters:
|
For usage example, see Create logs-based metrics. |
|
Removed in Container Cloud 2.26.0 (Cluster releases 17.1.0 and 16.1.0). Specifies the retention time per index. Includes the following parameters:
The allowed values include integers (days) and numbers with suffixes: y, m, w, d, h, including capital letters. |
logging:
  retentionTime:
    logstash: 3
    events: "2w"
    notifications: "1M"
|
Log verbosity¶
Key |
Description |
Example values |
---|---|---|
|
Defines the log verbosity level for all StackLight components if not
defined using |
|
|
Defines (overrides the |
component:
  kubeStateMetrics: ""
  prometheusAlertManager: ""
  prometheusBlackboxExporter: ""
  prometheusNodeExporter: ""
  prometheusServer: ""
  alerta: ""
  alertmanagerWebhookServicenow: ""
  elasticsearchCurator: ""
  postgresql: ""
  prometheusEsExporter: ""
  sfNotifier: ""
  sfReporter: ""
  fluentd: ""
  # fluentdElasticsearch: ""
  fluentdLogs: ""
  telemeterClient: ""
  telemeterServer: ""
  tfControllerExporter: ""
  tfVrouterExporter: ""
  telegrafDs: ""
  telegrafS: ""
  # elasticsearch: ""
  opensearch: ""
  # kibana: ""
  grafana: ""
  opensearchDashboards: ""
  metricbeat: ""
  prometheusMsTeams: ""
|
Logging to external outputs¶
Available since 2.23.0 and 2.23.1 for MOSK 23.1
Key |
Description |
Example values |
---|---|---|
|
Specifies external Elasticsearch, OpenSearch, and syslog destinations
as |
logging:
  externalOutputs:
    elasticsearch:
      # disabled: false
      type: elasticsearch
      level: info
      plugin_log_level: info
      tag_exclude: '{fluentd-logs,systemd}'
      host: elasticsearch-host
      port: 9200
      logstash_date_format: '%Y.%m.%d'
      logstash_format: true
      logstash_prefix: logstash
      ...
      buffer:
        # disabled: false
        chunk_limit_size: 16m
        flush_interval: 15s
        flush_mode: interval
        overflow_action: block
        ...
    opensearch:
      disabled: true
      type: opensearch
      ...
|
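As an illustration, the following sketch turns on the OpenSearch output, which the example above disables. It reuses only keys that appear in that example. The host name opensearch-host is hypothetical. That the opensearch output accepts the same host, port, and buffer keys as the elasticsearch output is an assumption to verify against your release.

```yaml
logging:
  externalOutputs:
    opensearch:
      disabled: false          # the example above sets "disabled: true"
      type: opensearch
      level: info
      host: opensearch-host    # hypothetical host name
      port: 9200               # default OpenSearch REST port
      # Assumption: buffer settings are accepted as for the elasticsearch output
      buffer:
        flush_mode: interval
        flush_interval: 15s
```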
Secrets for external log outputs¶
Available since 2.23.0 and 2.23.1 for MOSK 23.1
Key |
Description |
Example values |
---|---|---|
|
Specifies authentication secret mounts for external log destinations.
Requires
|
Secret mount configuration:
logging:
  externalOutputSecretMounts:
    - secretName: elasticsearch-certs
      mountPath: /tmp/elasticsearch-certs
      defaultMode: 420
    - secretName: opensearch-certs
      mountPath: /tmp/opensearch-certs
Elasticsearch configuration for the above secret mount:
logging:
  externalOutputs:
    elasticsearch:
      ...
      ca_file: /tmp/elasticsearch-certs/ca.pem
      client_cert: /tmp/elasticsearch-certs/client.pem
      client_key: /tmp/elasticsearch-certs/client.key
      client_key_pass: password
|
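The secretName values above refer to Kubernetes Secret objects that must already exist. A minimal sketch of the elasticsearch-certs Secret follows. The stacklight namespace is an assumption, so check the requirements for this parameter in your release. The PEM bodies are placeholders. Kubernetes mounts each data key as a file under mountPath, so the keys ca.pem, client.pem, and client.key become the files referenced by ca_file, client_cert, and client_key. The defaultMode: 420 in the example is the decimal form of the octal file mode 0644.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: elasticsearch-certs   # must match secretName in externalOutputSecretMounts
  namespace: stacklight       # assumption: namespace where StackLight runs
type: Opaque
stringData:
  # Each key is mounted as /tmp/elasticsearch-certs/<key>
  ca.pem: |
    -----BEGIN CERTIFICATE-----
    <CA certificate placeholder>
    -----END CERTIFICATE-----
  client.pem: |
    -----BEGIN CERTIFICATE-----
    <client certificate placeholder>
    -----END CERTIFICATE-----
  client.key: |
    -----BEGIN PRIVATE KEY-----
    <client key placeholder>
    -----END PRIVATE KEY-----
```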
Logging to syslog¶
Deprecated since 2.23.0
Note
Since Container Cloud 2.23.0, logging.syslog is deprecated in favor of
logging.externalOutputs. For details, see Logging to external outputs.
Key |
Description |
Example values |
---|---|---|
|
Enables or disables remote logging to syslog. Disabled by default.
Requires |
|
|
Specifies the remote syslog host. |
|
|
Removed in Container Cloud 2.26.0 (Cluster releases 17.1.0 and 16.1.0). Specifies logging level for the syslog output. |
|
|
Specifies the remote syslog port. |
|
|
Defines the packet size in bytes for the syslog logging output. Set to
|
|
|
Specifies the remote syslog protocol. Set to |
|
|
Optional. Enables or disables TLS. Disabled by default. Use TLS only with the TCP protocol. If you set a protocol other than TCP, TLS is not enabled. |
|
|
Optional. Configures TLS verification. |
|
|
Defines how to pass the certificate.
|
certificate:
  secret: ""
  hostPath: "/etc/ssl/certs/ca-bundle.pem"
|
|
Optional. Overrides
How to obtain tags for logs
Select from the following options:
The values for |
|
|
Optional. Is overridden by |
|
Log filtering for namespaces¶
Available since Cluster releases 17.0.0, 16.0.0, 14.1.0
Key |
Description |
Example values |
---|---|---|
|
Limits the number of namespaces for Pod log collection. Enabled by default with the following list of monitored Kubernetes namespaces:
Kubernetes namespaces monitored by default
|
|
|
Adds extra namespaces to collect Kubernetes Pod logs from. Requires
|
logging:
  namespaceFiltering:
    logs:
      enabled: true
      extraNamespaces:
        - custom-ns-1
|
|
Limits the number of namespaces for Kubernetes events collection.
Disabled by default because of the sysdig scanner present on some
MOSK clusters and because cluster-scoped objects produce events
by default to the |
|
|
Adds extra namespaces to collect Kubernetes events from. Requires
|
logging:
  namespaceFiltering:
    events:
      enabled: true
      extraNamespaces:
        - custom-ns-1
|
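You can set both filters in one logging block. The following sketch enables namespace filtering for Pod logs and for Kubernetes events at the same time. It uses only the keys shown in the two examples above, and custom-ns-1 and custom-ns-2 are placeholder namespace names.

```yaml
logging:
  namespaceFiltering:
    logs:
      enabled: true          # enabled by default
      extraNamespaces:
        - custom-ns-1        # placeholder namespace
    events:
      enabled: true          # disabled by default, so enable explicitly
      extraNamespaces:
        - custom-ns-1
        - custom-ns-2        # placeholder namespace
```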
Enforce OOPS compression¶
Available since Cluster releases 17.0.0, 16.0.0, 14.1.0
Key |
Description |
Example values |
---|---|---|
|
Enforces 32 GB of heap size, unless the defined memory limit allows using
50 GB of heap. Requires |
logging:
  enforceOopsCompression: true
|
OpenSearch¶
Key |
Description |
Example values |
---|---|---|
|
Removed in Container Cloud 2.26.0 (Cluster releases 17.1.0 and 16.1.0). Specifies the retention time per index. Includes the following parameters:
The allowed values include integers (days) and numbers with suffixes: y, m, w, d, h, including capital letters. By default, values set in |
elasticsearch:
  retentionTime:
    logstash: 3
    events: "2w"
    notifications: "1M"
|
|
Removed in Container Cloud 2.26.0 (Cluster releases 17.1.0 and 16.1.0). Defines the OpenSearch (Elasticsearch)
Note
Due to the known issue 27732-2, a custom setting for this parameter is ignored during cluster deployment and reverts to one day (default). Refer to the known issue description for the affected |
|
|
Specifies the OpenSearch (Elasticsearch) PVC(s) size. The number of PVCs depends on the StackLight database mode. For HA, three PVCs are created, each of the size specified in this parameter. For non-HA, one PVC of the specified size is created.
Important
You cannot modify this parameter after cluster creation.
Note
Due to the known issue 27732-1 that is fixed in Container Cloud 2.22.0 (Cluster releases 11.6.0 and 12.7.0), the OpenSearch PVC size configuration is ignored during cluster deployment. Refer to the known issue description for affected
|
elasticsearch:
  persistentVolumeClaimSize: 30Gi
|
|
Optional. Specifies the number of gigabytes that is exclusively available
for the OpenSearch data. Defines the ceiling for storage-based retention,
where 80% of the defined value is considered the available disk space
for normal OpenSearch node operation. If not set (by default),
the number of gigabytes from
This parameter is useful in the following cases:
|
elasticsearch:
  persistentVolumeUsableStorageSizeGB: 160
|
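Take the example value above to see how the 80% rule works. With persistentVolumeUsableStorageSizeGB set to 160, storage-based retention treats about 0.8 × 160 = 128 GB as the disk space available for normal OpenSearch node operation. The comments in this fragment show the arithmetic.

```yaml
elasticsearch:
  # Ceiling for storage-based retention: 160 GB
  # Space assumed available for normal node operation: 0.8 * 160 = 128 GB
  persistentVolumeUsableStorageSizeGB: 160
```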
OpenSearch extra settings¶
Key |
Description |
Example values |
---|---|---|
|
Additional configuration for |
logging:
  extraConfig:
    cluster.max_shards_per_node: 5000
|
OpenSearch Dashboards extra settings¶
Key |
Description |
Example values |
---|---|---|
|
Additional configuration for |
logging:
  dashboardsExtraConfig:
    opensearch.requestTimeout: 60000
|
High availability¶
Key |
Description |
Example values |
---|---|---|
|
Enables or disables StackLight multiserver mode. For details, see
StackLight database modes in Deployment architecture.
On managed clusters, set to |
|
Prometheus¶
Key |
Description |
Example values |
---|---|---|
|
Defines the minimum amount of time for Prometheus to wait before
resending an alert to Alertmanager. Passed to the
|
|
|
Defines the list of labels to be injected into firing alerts when they are sent to Alertmanager. Empty by default. The following labels are reserved for internal purposes and cannot be overridden:
Caution
When new labels are injected, Prometheus sends alert updates with a new set of labels. If the cluster currently has firing alerts, Alertmanager may briefly show duplicate alerts. |
alertsCommonLabels:
  region: west
  environment: prod
|
|
Specifies the Prometheus PVC(s) size. The number of PVCs depends on the StackLight database mode. For HA, three PVCs are created, each of the size specified in this parameter. For non-HA, one PVC of the specified size is created.
Important
You cannot modify this parameter after cluster creation. |
prometheusServer:
  persistentVolumeClaimSize: 16Gi
|
|
Defines the limit on the number of concurrent queries. Passed to the
|
|
|
Defines the Prometheus database retention size. Passed to the
|
|
|
Defines the Prometheus database retention period. Passed to the
|
|
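The following sketch puts two of the settings above in one prometheusServer block. The persistentVolumeClaimSize nesting matches the example above. Placing alertsCommonLabels under prometheusServer is an assumption, because this table does not show the full key path. With HA enabled, 16Gi results in three PVCs of 16Gi each. In non-HA mode, it results in one.

```yaml
prometheusServer:
  # Set at cluster creation; cannot be modified afterward.
  # HA: three PVCs of this size; non-HA: one PVC.
  persistentVolumeClaimSize: 16Gi
  # Assumption: alertsCommonLabels is nested under prometheusServer.
  alertsCommonLabels:
    region: west
    environment: prod
```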
Prometheus remote write¶
Allows sending metrics from Prometheus to a custom monitoring endpoint. For details, see Prometheus Documentation: remote_write.
Key |
Description |
Example values |
---|---|---|
|
Skip this parameter if your remote server does not use authorization.
Defines additional mounts for
Note
To create more than one file for the same remote write endpoint, for example, to configure TLS connections, use a single secret object with multiple keys in the
...
data:
  cert_file: aWx1dnRlc3Rz
  key_file: dGVzdHVzZXI=
...
|
remoteWriteSecretMounts:
  - secretName: prom-secret-files
    mountPath: /etc/config/remote_write
|
|
Defines the configuration of a custom remote_write endpoint for sending Prometheus samples. Note If the remote server uses authorization, first create
secret(s) in the |
remoteWrites:
  - url: http://remote_url/push
    authorization:
      credentials_file: /etc/config/remote_write/key_file
|
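If the remote endpoint requires client TLS certificates, you can reference the cert_file and key_file keys of the prom-secret-files secret through the standard Prometheus remote_write tls_config block. Kubernetes mounts each secret key as a file under mountPath, so the files appear at /etc/config/remote_write/cert_file and /etc/config/remote_write/key_file. This is a sketch only. That StackLight passes tls_config to Prometheus unchanged is an assumption, and https://remote_url/push is a placeholder.

```yaml
remoteWriteSecretMounts:
  - secretName: prom-secret-files        # secret with cert_file and key_file keys
    mountPath: /etc/config/remote_write
remoteWrites:
  - url: https://remote_url/push         # placeholder endpoint
    # Assumption: passed through as a Prometheus remote_write tls_config block
    tls_config:
      cert_file: /etc/config/remote_write/cert_file
      key_file: /etc/config/remote_write/key_file
```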
Prometheus Relay¶
Note
Prometheus Relay is set up as an endpoint in the Prometheus datasource in Grafana. Therefore, all requests from Grafana are sent to Prometheus through Prometheus Relay. If Prometheus Relay reports request timeouts or exceeds the response size limits, you can configure the parameters below. In this case, Prometheus Relay resource limits may also require tuning.
Key |
Description |
Example values |
---|---|---|
|
Specifies the client timeout in seconds. If empty, defaults to a value
determined by the cluster size: Note The cluster size parameters are available since Container Cloud 2.24.0. |
|
|
Specifies the response size limit in bytes. If empty, defaults to a
value determined by the cluster size: Note The cluster size parameters are available since Container Cloud 2.24.0. |
|
Custom Prometheus recording rules¶
Key |
Description |
Example values |
---|---|---|
|
Defines custom Prometheus recording rules. Overriding existing recording rules is not supported. |
customRecordingRules:
  - name: ExampleRule.http_requests_total
    rules:
      - expr: sum by(job) (rate(http_requests_total[5m]))
        record: job:http_requests:rate5m
      - expr: avg_over_time(job:http_requests:rate5m[1w])
        record: job:http_requests:rate5m:avg_over_time_1w
|
Custom Prometheus scrape configurations¶
Key |
Description |
Example values |
---|---|---|
|
Defines custom Prometheus scrape configurations. For details, see Prometheus documentation: scrape_config. The names of the default StackLight scrape configurations, which you can view in the Status -> Targets tab of the Prometheus web UI, are reserved for internal use, and StackLight discards any overrides of them. Therefore, use unique names for custom scrape configurations. |
customScrapeConfigs:
  custom-grafana:
    scrape_interval: 10s
    scrape_timeout: 5s
    kubernetes_sd_configs:
      - role: endpoints
    relabel_configs:
      - source_labels:
          - __meta_kubernetes_service_label_app
          - __meta_kubernetes_endpoint_port_name
        regex: grafana;service
        action: keep
      - source_labels:
          - __meta_kubernetes_pod_name
        target_label: pod
|
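For targets that Kubernetes service discovery does not cover, a custom scrape configuration can use the standard Prometheus static_configs field instead of kubernetes_sd_configs. In the sketch below, the job name custom-external-exporter and the target address are hypothetical. The name only needs to differ from the default StackLight scrape configurations, and scrape_timeout must not exceed scrape_interval.

```yaml
customScrapeConfigs:
  custom-external-exporter:        # hypothetical, unique job name
    scrape_interval: 30s
    scrape_timeout: 10s
    metrics_path: /metrics
    static_configs:
      - targets:
          - 192.0.2.10:9100        # hypothetical host:port (documentation IP range)
        labels:
          environment: prod
```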
Cluster size¶
Key |
Description |
Example values |
---|---|---|
|
Specifies the approximate expected cluster size. Set to
|
|
Resource limits¶
Key |
Description |
Example values |
---|---|---|
|
Provides the capability to override the default resource requests or limits for any StackLight component for the predefined cluster sizes.
Caution
Since Container Cloud 2.28.0 (Cluster releases 17.3.0 and 16.3.0),
StackLight components for resource limits customization
Note
The list below has the
alerta: alerta/alerta
alertmanager: prometheus-alertmanager/prometheus-alertmanager
alertmanagerWebhookServicenow: alertmanager-webhook-servicenow/alertmanager-webhook-servicenow
blackboxExporter: prometheus-blackbox-exporter/blackbox-exporter
elasticsearch: opensearch-master/opensearch # Deprecated
elasticsearchCurator: elasticsearch-curator/elasticsearch-curator
elasticsearchExporter: elasticsearch-exporter/elasticsearch-exporter
fluentdElasticsearch: fluentd-logs/fluentd-logs # Deprecated
fluentdLogs: fluentd-logs/fluentd-logs
fluentdNotifications: fluentd-notifications/fluentd # for MOSK
grafana: grafana/grafana
grafanaRenderer: grafana/grafana-renderer # Removed in 2.27.0 (Cluster releases 17.2.0 and 16.2.0)
iamProxy: iam-proxy/iam-proxy # Deprecated
iamProxyAlerta: iam-proxy-alerta/iam-proxy
iamProxyAlertmanager: iam-proxy-alertmanager/iam-proxy
iamProxyGrafana: iam-proxy-grafana/iam-proxy
iamProxyKibana: iam-proxy-kibana/iam-proxy # Deprecated
iamProxyOpenSearchDashboards: iam-proxy-kibana/iam-proxy
iamProxyPrometheus: iam-proxy-prometheus/iam-proxy
kibana: opensearch-dashboards/opensearch-dashboards # Deprecated
kubeStateMetrics: prometheus-kube-state-metrics/prometheus-kube-state-metrics
libvirtExporter: prometheus-libvirt-exporter/prometheus-libvirt-exporter # for MOSK
metricCollector: metric-collector/metric-collector
metricbeat: metricbeat/metricbeat
nodeExporter: prometheus-node-exporter/prometheus-node-exporter
opensearch: opensearch-master/opensearch
opensearchDashboards: opensearch-dashboards/opensearch-dashboards
patroniExporter: patroni/patroni-patroni-exporter
pgsqlExporter: patroni/patroni-pgsql-exporter
postgresql: patroni/patroni
prometheusEsExporter: prometheus-es-exporter/prometheus-es-exporter
prometheusMsTeams: prometheus-msteams/prometheus-msteams
prometheusRelay: prometheus-relay/prometheus-relay
prometheusServer: prometheus-server/prometheus-server
refapp: refapp/refapp # Removed in 2.28.3 (16.3.3)
refappCleanup: refapp-cleanup/refapp-cleanup # Removed in 2.28.3 (16.3.3)
refappInit: db-init/db-init # Removed in 2.28.3 (16.3.3)
sfNotifier: sf-notifier/sf-notifier
sfReporter: sf-reporter/sf-reporter
stacklightHelmControllerController: stacklight-helm-controller/controller
telegrafDockerSwarm: telegraf-docker-swarm/telegraf-docker-swarm
telegrafDs: telegraf-ds-smart/telegraf-ds-smart # Deprecated
telegrafDsSmart: telegraf-ds-smart/telegraf-ds-smart
telegrafOpenstack: telegraf-openstack/telegraf-openstack # for MOSK, replaced with osdpl-exporter in 24.1
telegrafS: telegraf-docker-swarm/telegraf-docker-swarm # Deprecated
telemeterClient: telemeter-client/telemeter-client
telemeterServer: telemeter-server/telemeter-server
telemeterServerAuthServer: telemeter-server/telemeter-server-authorization-server
tfControllerExporter: prometheus-tf-controller-exporter/prometheus-tungstenfabric-exporter # for MOSK
tfVrouterExporter: prometheus-tf-vrouter-exporter/prometheus-tungstenfabric-exporter # for MOSK
|
resourcesPerClusterSize:
  # elasticsearch:
  opensearch:
    small:
      limits:
        cpu: "1000m"
        memory: "4Gi"
    medium:
      limits:
        cpu: "2000m"
        memory: "8Gi"
      requests:
        cpu: "1000m"
        memory: "4Gi"
    large:
      limits:
        cpu: "4000m"
        memory: "16Gi"
|
|
Provides the capability to override the container resource requests or limits for any StackLight component.
StackLight components for resource limits customization
Note
The list below has the
alerta: alerta/alerta
alertmanager: prometheus-alertmanager/prometheus-alertmanager
alertmanagerWebhookServicenow: alertmanager-webhook-servicenow/alertmanager-webhook-servicenow
blackboxExporter: prometheus-blackbox-exporter/blackbox-exporter
elasticsearch: opensearch-master/opensearch # Deprecated
elasticsearchCurator: elasticsearch-curator/elasticsearch-curator
elasticsearchExporter: elasticsearch-exporter/elasticsearch-exporter
fluentdElasticsearch: fluentd-logs/fluentd-logs # Deprecated
fluentdLogs: fluentd-logs/fluentd-logs
fluentdNotifications: fluentd-notifications/fluentd # for MOSK
grafana: grafana/grafana
grafanaRenderer: grafana/grafana-renderer # Removed in 2.27.0 (Cluster releases 17.2.0 and 16.2.0)
iamProxy: iam-proxy/iam-proxy # Deprecated
iamProxyAlerta: iam-proxy-alerta/iam-proxy
iamProxyAlertmanager: iam-proxy-alertmanager/iam-proxy
iamProxyGrafana: iam-proxy-grafana/iam-proxy
iamProxyKibana: iam-proxy-kibana/iam-proxy # Deprecated
iamProxyOpenSearchDashboards: iam-proxy-kibana/iam-proxy
iamProxyPrometheus: iam-proxy-prometheus/iam-proxy
kibana: opensearch-dashboards/opensearch-dashboards # Deprecated
kubeStateMetrics: prometheus-kube-state-metrics/prometheus-kube-state-metrics
libvirtExporter: prometheus-libvirt-exporter/prometheus-libvirt-exporter # for MOSK
metricCollector: metric-collector/metric-collector
metricbeat: metricbeat/metricbeat
nodeExporter: prometheus-node-exporter/prometheus-node-exporter
opensearch: opensearch-master/opensearch
opensearchDashboards: opensearch-dashboards/opensearch-dashboards
patroniExporter: patroni/patroni-patroni-exporter
pgsqlExporter: patroni/patroni-pgsql-exporter
postgresql: patroni/patroni
prometheusEsExporter: prometheus-es-exporter/prometheus-es-exporter
prometheusMsTeams: prometheus-msteams/prometheus-msteams
prometheusRelay: prometheus-relay/prometheus-relay
prometheusServer: prometheus-server/prometheus-server
refapp: refapp/refapp # Removed in 2.28.3 (16.3.3)
refappCleanup: refapp-cleanup/refapp-cleanup # Removed in 2.28.3 (16.3.3)
refappInit: db-init/db-init # Removed in 2.28.3 (16.3.3)
sfNotifier: sf-notifier/sf-notifier
sfReporter: sf-reporter/sf-reporter
stacklightHelmControllerController: stacklight-helm-controller/controller
telegrafDockerSwarm: telegraf-docker-swarm/telegraf-docker-swarm
telegrafDs: telegraf-ds-smart/telegraf-ds-smart # Deprecated
telegrafDsSmart: telegraf-ds-smart/telegraf-ds-smart
telegrafOpenstack: telegraf-openstack/telegraf-openstack # for MOSK, replaced with osdpl-exporter in 24.1
telegrafS: telegraf-docker-swarm/telegraf-docker-swarm # Deprecated
telemeterClient: telemeter-client/telemeter-client
telemeterServer: telemeter-server/telemeter-server
telemeterServerAuthServer: telemeter-server/telemeter-server-authorization-server
tfControllerExporter: prometheus-tf-controller-exporter/prometheus-tungstenfabric-exporter # for MOSK
tfVrouterExporter: prometheus-tf-vrouter-exporter/prometheus-tungstenfabric-exporter # for MOSK
|
resources:
  alerta:
    requests:
      cpu: "50m"
      memory: "200Mi"
    limits:
      memory: "500Mi"
Using the example above, each pod in the
Note
The logging mechanism performance depends on the cluster log load. If the cluster components send an excessive amount of logs, the default resource requests and limits for
resources:
  # fluentdElasticsearch:
  fluentdLogs:
    requests:
      memory: "500Mi"
    limits:
      memory: "1500Mi"
|
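The component names in the list above are the keys that you use under resources. As a sketch, the following raises the Prometheus server requests and limits through the prometheusServer key from that list. The CPU and memory values are illustrative, not sizing recommendations.

```yaml
resources:
  prometheusServer:          # key from the component list above
    requests:
      cpu: "1000m"           # illustrative value
      memory: "4Gi"
    limits:
      cpu: "2000m"
      memory: "8Gi"
```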
Byte limit for Telemeter client¶
For internal StackLight use only
Key |
Description |
Example values |
---|---|---|
|
Specifies the size limit, in bytes, of the incoming data for the
Telemeter client. Defaults to |
|
Kubernetes network policies¶
Available since Cluster releases 17.0.1 and 16.0.1
Key |
Description |
Example values |
---|---|---|
|
Enables or disables the Kubernetes Network Policy resource that allows
controlling network connections to and from Pods deployed in the
For the list of network policy rules, refer to StackLight rules for Kubernetes network policies. Customization of network policies is not supported. |
|
Kubernetes tolerations¶
Key |
Description |
Example values |
---|---|---|
|
Kubernetes tolerations to add to all StackLight components. |
default:
  - key: "com.docker.ucp.manager"
    operator: "Exists"
    effect: "NoSchedule"
|
|
Defines Kubernetes tolerations (overrides the default ones) for any StackLight component. |
component:
  # elasticsearch:
  opensearch:
    - key: "com.docker.ucp.manager"
      operator: "Exists"
      effect: "NoSchedule"
  postgresql:
    - key: "node-role.kubernetes.io/master"
      operator: "Exists"
      effect: "NoSchedule"
|
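The default and component examples above are two parts of the same setting, shown combined in the sketch below. The parent key tolerations is an assumption based on the section title, because the table does not show the full key path. Component-level tolerations override the default ones. A component that must also tolerate the default taint therefore has to list that toleration again.

```yaml
tolerations:                       # assumption: parent key name
  default:
    - key: "com.docker.ucp.manager"
      operator: "Exists"
      effect: "NoSchedule"
  component:
    postgresql:
      # Overrides the default list, so the default toleration is repeated here.
      - key: "com.docker.ucp.manager"
        operator: "Exists"
        effect: "NoSchedule"
      - key: "node-role.kubernetes.io/master"
        operator: "Exists"
        effect: "NoSchedule"
```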
Storage class¶
In an HA StackLight setup, when highAvailabilityEnabled is set to true,
all StackLight Persistent Volumes (PVs) use the Local Volume Provisioner (LVP)
storage class to avoid relying on dynamic provisioners such as Ceph, which are not
available in every Container Cloud deployment. In a non-HA StackLight setup,
when no storage class is specified, PVs use the default storage class of the
cluster.
Key |
Description |
Example values |
---|---|---|
|
Defines the |
|
|
Defines (overrides the |
componentStorageClasses:
  elasticsearch: ""
  opensearch: ""
  fluentd: ""
  postgresql: ""
  prometheusAlertManager: ""
  prometheusServer: ""
|
NodeSelector¶
Key |
Description |
Example values |
---|---|---|
|
Defines the |
default:
  role: stacklight
|
|
Defines the |
component:
  alerta:
    role: stacklight
    component: alerta
  # kibana:
  #   role: stacklight
  #   component: kibana
  opensearchDashboards:
    role: stacklight
    component: opensearchdashboards
|
Prometheus Node Exporter¶
Key |
Description |
Example values |
---|---|---|
|
Excludes the network devices that match the specified regular expression from monitoring. Network interfaces produce a large number of metrics, which may significantly increase Prometheus RAM usage in large clusters. Therefore, Prometheus Node Exporter collects information only about a basic set of interfaces (both host and container) and excludes the following interfaces from monitoring:
To enable collecting information for the interfaces above, edit the list of excluded devices as needed. |
nodeExporter:
  netDeviceExclude: "^(veth.+|cali.+|o-hm0|tap.+|qg-.+|qr-.+|ha-.+|br-.+|ovs-system|docker0)$"
|
|
Enables Node Exporter collectors. For a list of available collectors, see Node Exporter Collectors. The following collectors are enabled by default in StackLight:
|
extraCollectorsEnabled:
  - bcache
  - bonding
  - softnet
|
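For example, the default expression excludes bridge interfaces through br-.+. To collect metrics for them, you could remove that alternative from the regular expression. This sketch leaves every other excluded pattern unchanged, and the number of extra metrics depends on how many bridges the nodes have. Nesting extraCollectorsEnabled under nodeExporter is an assumption, because the table does not show its full key path.

```yaml
nodeExporter:
  # br-.+ removed, so br-* bridge interfaces are monitored again
  netDeviceExclude: "^(veth.+|cali.+|o-hm0|tap.+|qg-.+|qr-.+|ha-.+|ovs-system|docker0)$"
  # Assumption: extraCollectorsEnabled is nested under nodeExporter
  extraCollectorsEnabled:
    - bonding
```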
Prometheus Blackbox Exporter¶
Key |
Description |
Example values |
---|---|---|
|
Specifies a set of custom Blackbox Exporter modules. For details, see
Blackbox Exporter configuration: module.
The |
customModules:
  http_post_2xx:
    prober: http
    timeout: 5s
    http:
      method: POST
      headers:
        Content-Type: application/json
      body: '{}'
|
|
Specifies the offset to subtract from timeout in seconds
( |
|
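Besides HTTP modules, Blackbox Exporter supports other probers, such as tcp. The sketch below defines a plain TCP connectivity module and an HTTP GET module that also accepts 204 responses. The module names are hypothetical, and the fields follow the upstream Blackbox Exporter module configuration.

```yaml
customModules:
  tcp_connect_custom:          # hypothetical module name
    prober: tcp
    timeout: 5s
  http_get_200_204:            # hypothetical module name
    prober: http
    timeout: 5s
    http:
      method: GET
      valid_status_codes: [200, 204]
```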
Reference Application¶
Unsupported and removed in 2.28.3 (16.3.3)
Available since 2.21.0 for non-MOSK managed clusters
Note
For the feature support on MOSK deployments, refer to MOSK documentation: Deploy your first cloud application using automation.
Key |
Description |
Example values |
---|---|---|
|
Enables or disables Reference Application, a small microservice application that enables workload monitoring on non-MOSK managed clusters. Disabled by default. |
|
|
Available since Container Cloud 2.23.0.
Enables or disables persistent volumes for Reference Application.
Enabled by default. Disabling is not recommended for production clusters.
Once set, the value cannot be changed.
|
|
|
Defines |
refapp:
  workload:
    storageClassName: kubernetes-ssd
|
|
Available since Container Cloud 2.23.0.
Defines the size of persistent volumes for the Reference Application.
Defaults to 1Gi. Applies only if persistent volumes are enabled. |
refapp:
  workload:
    persistentVolumeSize: 1Gi
|
Salesforce reporter¶
On managed clusters with limited Internet access, a proxy is required for StackLight components that use HTTP and HTTPS, are disabled by default, and need external access if enabled. The Salesforce reporter depends on Internet access through HTTPS.
Key |
Description |
Example values |
---|---|---|
|
Unique cluster identifier
The |
Do not modify |
|
Enables or disables reporting of Prometheus metrics to Salesforce. For details, see Deployment architecture. Disabled by default. |
|
|
Salesforce parameters and credentials for the metrics reporting integration. |
Note
Modify this parameter if
salesForceAuth:
  url: "<SF instance URL>"
  username: "<SF account email address>"
  password: "<SF password>"
  environment_id: "<Cloud identifier>"
  organization_id: "<Organization identifier>"
  sandbox_enabled: "<Set to true or false>"
|
|
Defines the Kubernetes cron job for sending metrics to Salesforce. By default, reports are sent at midnight server time. |
cronjob:
  schedule: "0 0 * * *"
  concurrencyPolicy: "Allow"
  failedJobsHistoryLimit: ""
  successfulJobsHistoryLimit: ""
  startingDeadlineSeconds: 200
|
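The schedule field uses the standard five-field cron format of Kubernetes CronJob objects: minute, hour, day of month, month, and day of week. As an example, the following sketch sends reports every day at 02:30 server time instead of at midnight. It also sets concurrencyPolicy to Forbid, so Kubernetes skips a run while the previous one is still active. Both values are illustrative.

```yaml
cronjob:
  schedule: "30 2 * * *"          # minute 30, hour 2, every day
  concurrencyPolicy: "Forbid"     # skip a run if the previous one is still active
  startingDeadlineSeconds: 200
```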
Ceph monitoring¶
Key |
Description |
Example values |
---|---|---|
|
Enables or disables Ceph monitoring on baremetal-based managed clusters.
Set to |
|
External endpoint monitoring¶
Key |
Description |
Example values |
---|---|---|
|
Enables or disables HTTP endpoints monitoring. If enabled, the
monitoring tool performs the probes against the defined endpoints every
15 seconds. Set to |
|
|
Defines the host directory path that contains the certificates of external endpoints. |
|
|
Defines the list of HTTP endpoints to monitor. The endpoints must successfully respond to a liveness probe, which succeeds only if the request to the endpoint returns a 2xx HTTP response code. |
domains:
  - https://prometheus.io/health
  - http://example.com:8080/status
  - http://example.net:8080/pulse
|
Ironic monitoring¶
Key |
Description |
Example values |
---|---|---|
|
Enables or disables monitoring of bare metal Ironic on baremetal-based clusters. To enable, specify the Ironic API URL. |
|
|
Defines whether to skip the chain and host verification. Set to
|
|
SSL certificates monitoring¶
Key |
Description |
Example values |
---|---|---|
|
Enables or disables StackLight to monitor and alert on the expiration
date of the TLS certificate of an HTTPS endpoint. If enabled, the
monitoring tool performs the probes against the defined endpoints every
hour. Set to |
|
|
Defines the list of HTTPS endpoints to monitor the certificates from. |
domains:
  - https://prometheus.io
  - https://example.com:8080
|
Mirantis Kubernetes Engine monitoring¶
Key |
Description |
Example values |
---|---|---|
|
Enables or disables Mirantis Kubernetes Engine (MKE) monitoring.
Set to |
|
|
Defines the dockerd data root directory of persistent Docker state. For details, see Docker documentation: Daemon CLI (dockerd). |
|
Workload monitoring¶
Key |
Description |
Example values |
---|---|---|
|
On clusters that run large-scale workloads, workload monitoring generates a large number of resource-consuming metrics. To prevent excessive metrics generation, you can disable workload monitoring in the StackLight metrics and monitor only the infrastructure. The |
metricFilter:
  enabled: true
  action: keep
  namespaces:
    - kaas
    - kube-system
    - stacklight
|
Prometheus metrics filtering¶
Available since 2.24.0 and 2.24.2 for MOSK 23.2
Key |
Description |
Example values |
---|---|---|
|
Configuration for managing Prometheus metrics filtering. When enabled (default), Prometheus scrapes only actively used and explicitly whitelisted metrics. |
prometheusServer:
  metricsFiltering:
    enabled: true
|
|
List of extra metrics to whitelist, which are dropped by default. Contains the following parameters:
|
prometheusServer:
  metricsFiltering:
    enabled: true
    extraMetricsInclude:
      cadvisor:
        - container_memory_failcnt
        - container_network_transmit_errors_total
      calico:
        - felix_route_table_per_iface_sync_seconds_sum
        - felix_bpf_dataplane_endpoints
      _group-go-collector-metrics:
        - go_gc_heap_goal_bytes
        - go_gc_heap_objects_objects
|
Alerts configuration¶
Key |
Description |
Example values |
---|---|---|
|
Defines custom alerts and modifies or disables existing alert configurations. For the list of predefined alerts, see Available StackLight alerts. When adding or modifying alerts, follow the Alerting rules. |
customAlerts:
  # To add a new alert:
  - alert: ExampleAlert
    annotations:
      description: Alert description
      summary: Alert summary
    expr: example_metric > 0
    for: 5m
    labels:
      severity: warning
  # To modify an existing alert expression:
  - alert: AlertmanagerFailedReload
    expr: alertmanager_config_last_reload_successful == 5
  # To disable an existing alert:
  - alert: TargetDown
    enabled: false
An optional field |
Watchdog alert¶
Key |
Description |
Example values |
---|---|---|
|
Enables or disables the |
|
Alertmanager integrations¶
On managed clusters with limited Internet access, a proxy is required for StackLight components that use HTTP and HTTPS, are disabled by default, and need external access if enabled. This applies, for example, to the Salesforce integration and the Alertmanager notifications external rules.
Key |
Description |
Example values |
---|---|---|
|
Provides a generic template for notifications receiver configurations. For a list of supported receivers, see Prometheus Alertmanager documentation: Receiver. |
For example, to enable notifications to OpsGenie:
alertmanagerSimpleConfig:
  genericReceivers:
    - name: HTTP-opsgenie
      enabled: true # optional
      opsgenie_configs:
        - api_url: "https://example.app.eu.opsgenie.com/"
          api_key: "secret-key"
          send_resolved: true
|
|
Provides a template for notifications route configuration. For details, see Prometheus Alertmanager documentation: Route. |
genericRoutes:
  - receiver: HTTP-opsgenie
    enabled: true # optional
    matchers:
      - severity=~"major|critical"
    continue: true
|
|
Enables or disables alert inhibition rules. If enabled, Alertmanager reduces alert noise by suppressing notifications of dependent alerts. This provides a clearer view of the cloud status and simplifies troubleshooting. Enabled by default. For details, see Alert dependencies. For details on inhibition rules, see Prometheus documentation. |
|
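The same generic templates work for other Alertmanager receiver types. As a sketch, the following pairs a Slack receiver with a route that forwards only critical alerts. The receiver name, webhook URL, and channel are hypothetical, and the slack_configs fields follow the upstream Alertmanager receiver configuration. Nesting genericRoutes under alertmanagerSimpleConfig, next to genericReceivers, is an assumption, because the route example above shows no parent key.

```yaml
alertmanagerSimpleConfig:
  genericReceivers:
    - name: HTTP-slack-critical            # hypothetical receiver name
      enabled: true
      slack_configs:
        - api_url: "https://hooks.slack.com/services/<placeholder>"
          channel: "#alerts"               # hypothetical channel
          send_resolved: true
  # Assumption: genericRoutes is a sibling of genericReceivers
  genericRoutes:
    - receiver: HTTP-slack-critical
      enabled: true
      matchers:
        - severity="critical"
      continue: true
```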
Notifications to email¶
Key |
Description |
Example values |
---|---|---|
|
Enables or disables Alertmanager integration with email. Set to
|
|
|
Defines the notification parameters for Alertmanager integration with email. For details, see Prometheus Alertmanager documentation: Email configuration. |
email:
  enabled: false
  send_resolved: true
  to: "to@test.com"
  from: "from@test.com"
  smarthost: smtp.gmail.com:587
  auth_username: "from@test.com"
  auth_password: password
  auth_identity: "from@test.com"
  require_tls: true
|
|
Defines the route for Alertmanager integration with email. For details, see Prometheus Alertmanager documentation: Route. |
route:
  matchers: []
  routes: []
|
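To email only critical alerts, you can fill the empty matchers list in the route example above with the matcher syntax that the Salesforce route example on this page uses. The severity value is illustrative.

```yaml
route:
  matchers:
    - severity="critical"    # only critical alerts are sent by email
  routes: []
```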
Notifications to Salesforce¶
On managed clusters with limited Internet access, a proxy is required for StackLight components that use HTTP and HTTPS, are disabled by default, and need external access if enabled. The Salesforce integration depends on Internet access through HTTPS.
Key |
Description |
Example values |
---|---|---|
|
Unique cluster identifier
The |
Do not modify |
|
Enables or disables Alertmanager integration with Salesforce using the
|
|
|
Defines the Salesforce parameters and credentials for integration with Alertmanager. |
auth:
  url: "<SF instance URL>"
  username: "<SF account email address>"
  password: "<SF password>"
  environment_id: "<Cloud identifier>"
  organization_id: "<Organization identifier>"
  sandbox_enabled: "<Set to true or false>"
|
|
Defines the notifications route for Alertmanager integration with Salesforce. For details, see Prometheus Alertmanager documentation: Route. |
route:
  matchers:
    - severity="critical"
  routes: []
Note By default, only |
|
Enables or disables feed update in Salesforce. To save API calls, this
parameter is set to |
|
|
Enables or disables links to the Prometheus web UI in alerts sent to
Salesforce. To simplify troubleshooting, set to |
|
Notifications to Slack¶
On managed clusters with limited Internet access, a proxy is required for StackLight components that use HTTP and HTTPS, are disabled by default, and need external access if enabled. The Slack integration depends on Internet access through HTTPS.
Key |
Description |
Example values |
---|---|---|
|
Enables or disables Alertmanager integration with Slack. For
details, see Prometheus Alertmanager documentation: Slack configuration.
Set to |
|
|
Defines the Slack webhook URL. |
|
|
Defines the Slack channel or user to send notifications to. |
|
|
Defines the notifications route for Alertmanager integration with Slack. For details, see Prometheus Alertmanager documentation: Route. |
route:
  matchers: []
  routes: []
|
Notifications to Microsoft Teams¶
On managed clusters with limited Internet access, a proxy is required for StackLight components that use HTTP and HTTPS, are disabled by default, and need external access if enabled. The Microsoft Teams integration depends on Internet access through HTTPS.
Key |
Description |
Example values |
---|---|---|
|
Enables or disables Alertmanager integration with Microsoft Teams.
Requires a configured Microsoft Teams channel and a channel connector. Set
to |
|
|
Defines the URL of an Incoming Webhook connector of a Microsoft Teams channel. For details about channel connectors, see Microsoft documentation. |
|
|
Defines the notifications route for Alertmanager integration with Microsoft Teams. For details, see Prometheus Alertmanager documentation: Route. |
route:
  matchers: []
  routes: []
|
Notifications to ServiceNow¶
Caution
Prior to configuring the integration with ServiceNow, perform the following prerequisite steps using the ServiceNow documentation of the required version.
In a new or existing Incident table, add the Alert ID field as described in Add fields to a table. To avoid alert duplication, select Unique.
Create an Access Control List (ACL) with read/write permissions for the Incident table as described in Securing table records.
Key |
Description |
Example values |
---|---|---|
|
Enables or disables Alertmanager integration with ServiceNow. Set to
|
|
|
Defines the ServiceNow parameters and credentials for integration with Alertmanager:
|
serviceNow:
  enabled: true
  incident_table: "incident"
  api_version: "v1"
  alert_id_field: "u_alert_id"
  auth:
    instance: "https://dev00001.service-now.com"
    username: "testuser"
    password: "testpassword"
|