alerta.enabled
|
Verify that Alerta is present in the list of StackLight resources. An
empty output indicates that Alerta is disabled.
kubectl get all -n stacklight -l app=alerta
|
alertmanagerSimpleConfig.email
alertmanagerSimpleConfig.email.enabled
alertmanagerSimpleConfig.email.route
|
In the Alertmanager web UI, navigate to Status and verify
that the Config section contains the Email receiver and
route.
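Alternatively, a quick CLI check, assuming the rendered Alertmanager
configuration is stored in the prometheus-alertmanager ConfigMap (as in
the inhibitRules check below) and uses the standard email_configs key:
kubectl get cm -n stacklight prometheus-alertmanager -o \
yaml | grep -A 6 email_configs
|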
alertmanagerSimpleConfig.genericReceivers
|
In the Alertmanager web UI, navigate to Status and verify
that the Config section contains the intended receiver(s). |
alertmanagerSimpleConfig.genericRoutes
|
In the Alertmanager web UI, navigate to Status and verify
that the Config section contains the intended route(s). |
alertmanagerSimpleConfig.inhibitRules.enabled
|
Run the following command. An empty output indicates either a failure
or that the feature is disabled.
kubectl get cm -n stacklight prometheus-alertmanager -o \
yaml | grep -A 6 inhibit_rules
|
alertmanagerSimpleConfig.msteams.enabled
alertmanagerSimpleConfig.msteams.url
alertmanagerSimpleConfig.msteams.route
|
Verify that the Prometheus Microsoft Teams pod is up and running:
kubectl get pods -n stacklight -l \
'app=prometheus-msteams'
Verify that the Prometheus Microsoft Teams pod logs have no errors:
kubectl logs -f -n stacklight -l \
'app=prometheus-msteams'
Verify that notifications are being sent to the Microsoft Teams
channel.
|
alertmanagerSimpleConfig.salesForce.enabled
alertmanagerSimpleConfig.salesForce.auth
alertmanagerSimpleConfig.salesForce.route
|
Verify that sf-notifier is enabled. The output must include the
sf-notifier pod name, 1/1 in the READY field and
Running in the STATUS field.
kubectl get pods -n stacklight
Verify that sf-notifier successfully authenticates to Salesforce.
The output must include the Salesforce authentication successful
line.
kubectl logs -f -n stacklight <sf-notifier-pod-name>
In the Alertmanager web UI, navigate to Status and verify
that the Config section contains the HTTP-salesforce
receiver and route.
|
alertmanagerSimpleConfig.salesForce.feed_enabled
|
Verify that the sf-notifier pod logs include Creating feed item
messages. For such messages to appear in the logs, the DEBUG logging
level must be enabled.
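For example, to check the logs for these messages directly:
kubectl logs -n stacklight <sf-notifier-pod-name> | grep 'Creating feed item'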
Verify through Salesforce:
Log in to the Salesforce web UI.
Click the Feed tab for a case created by
sf-notifier.
Verify that All Messages gets updated.
|
alertmanagerSimpleConfig.salesForce.link_prometheus
|
Verify that SF_NOTIFIER_ADD_LINKS has changed to true or
false according to your customization:
kubectl get deployment sf-notifier \
-o=jsonpath='{.spec.template.spec.containers[0].env}' | jq .
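To narrow the output to this single variable, you can, for example,
extend the jq filter:
kubectl get deployment sf-notifier \
-o=jsonpath='{.spec.template.spec.containers[0].env}' | \
jq '.[] | select(.name=="SF_NOTIFIER_ADD_LINKS")'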
|
alertmanagerSimpleConfig.serviceNow
|
Verify that the alertmanager-webhook-servicenow pod is up and
running:
kubectl get pods -n stacklight -l \
'app=alertmanager-webhook-servicenow'
Verify that authentication to ServiceNow was successful. The output
should include ServiceNow authentication successful. In case of an
authentication failure, the ServiceNowAuthFailure alert is raised.
kubectl logs -f -n stacklight \
<alertmanager-webhook-servicenow-pod-name>
In your ServiceNow instance, verify that the Watchdog
alert appears in the Incident table. Once the incident is
created, the pod logs should include a line similar to
Created Incident: bef260671bdb2010d7b540c6cc4bcbed.
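For example, to search the pod logs for this confirmation line:
kubectl logs -n stacklight \
<alertmanager-webhook-servicenow-pod-name> | grep 'Created Incident'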
In case of any failure:
Verify that your ServiceNow instance is not in hibernation.
Verify that the service user credentials, table name, and
alert_id_field are correct.
Verify that the ServiceNow user has access to the table with
permission to read, create, and update records.
|
alertmanagerSimpleConfig.slack.enabled
alertmanagerSimpleConfig.slack.api_url
alertmanagerSimpleConfig.slack.channel
alertmanagerSimpleConfig.slack.route
|
In the Alertmanager web UI, navigate to Status and verify
that the Config section contains the HTTP-slack receiver
and route. |
blackboxExporter.customModules
|
Verify that your module is present in the list of modules. It can
take up to 10 minutes for the module to appear in the ConfigMap.
kubectl get cm prometheus-blackbox-exporter -n stacklight \
-o=jsonpath='{.data.blackbox\.yaml}'
Review the configmap-reload container logs to verify that the
reload happened successfully. It can take up to 1 minute for the
reload to happen after the module appears in the ConfigMap.
kubectl logs -l app=prometheus-blackbox-exporter -n stacklight -c \
configmap-reload
|
blackboxExporter.timeoutOffset
|
Verify that the args parameter of the blackbox-exporter
container contains the specified --timeout-offset:
kubectl get deployment.apps/prometheus-blackbox-exporter -n stacklight \
-o=jsonpath='{.spec.template.spec.containers[?(@.name=="blackbox-exporter")].args}'
For example, for blackboxExporter.timeoutOffset set to 0.1, the
output should include
["--config.file=/config/blackbox.yaml","--timeout-offset=0.1"].
It can take up to 10 minutes for the parameter to be populated.
|
ceph.enabled
|
In the Grafana web UI, verify that Ceph dashboards are present in the
list of dashboards and are populated with data.
In the Prometheus web UI, click Alerts and verify that
the list of alerts contains Ceph* alerts.
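As an additional check, run a PromQL query in the Prometheus web UI;
the example below assumes the standard ceph_health_status metric
exposed by the Ceph Manager Prometheus module, and a non-empty result
indicates that Ceph metrics are being collected:
ceph_health_status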
|
|
Obtain the list of pods:
kubectl get po -n stacklight
Verify that the desired resource limits or requests are set in the
resources section of every container in the pod:
kubectl get po <pod_name> -n stacklight -o yaml
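For example, to print only the resources section of every container
in a pod (a jsonpath variant of the command above):
kubectl get po <pod_name> -n stacklight \
-o=jsonpath='{range .spec.containers[*]}{.name}{": "}{.resources}{"\n"}{end}'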
|
elasticsearch.logstashRetentionTime
Removed in MCC 2.26.0 (17.1.0, 16.1.0)
|
Verify that the unit_count parameter contains the desired number of
days:
kubectl get cm elasticsearch-curator-config -n \
stacklight -o=jsonpath='{.data.action_file\.yml}'
|
elasticsearch.persistentVolumeClaimSize
|
Verify that the capacity of the PVC(s) equals the specified size or
exceeds it (the latter is possible for statically provisioned volumes):
kubectl get pvc -n stacklight -l "app=opensearch-master"
|
|
Verify that the ConfigMap includes the new data. The output should
include the changed values.
kubectl get cm elasticsearch-curator-config -n stacklight --kubeconfig=<pathToKubeconfig> -o yaml
Verify that the elasticsearch-curator-<jobID>-<podID> job has
successfully completed:
kubectl logs elasticsearch-curator-<jobID>-<podID> -n stacklight --kubeconfig=<pathToKubeconfig>
|
|
In the Prometheus web UI, navigate to Status -> Targets.
Verify that the blackbox-external-endpoint target contains the
configured domains (URLs).
|
grafana.homeDashboard
|
In the Grafana web UI, verify that the desired dashboard is set as a
home dashboard. |
grafana.renderer.enabled
Removed in MCC 2.27.0 (17.2.0, 16.2.0)
|
Verify the Grafana Image Renderer. If set to true, the output should
include HTTP Server started, listening at http://localhost:8081.
kubectl logs -f -n stacklight -l app=grafana \
--container grafana-renderer
|
highAvailabilityEnabled
|
Verify the number of service replicas for the HA or non-HA StackLight
mode. For details, see Deployment architecture.
kubectl get sts -n stacklight
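For example, to display only the configured replica counts:
kubectl get sts -n stacklight \
-o=custom-columns=NAME:.metadata.name,REPLICAS:.spec.replicas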
|
ironic.endpoint
ironic.insecure
|
In the Grafana web UI, verify that the Ironic BM dashboard
displays valuable data (no false-positive or empty panels). |
logging.dashboardsExtraConfig
|
Verify that the customization has applied:
kubectl -n stacklight get cm opensearch-dashboards -o=jsonpath='{.data}'
Example of system response:
{"opensearch_dashboards.yml":"opensearch.hosts: http://opensearch-master:9200\
\nopensearch.requestTimeout: 60000\
\nopensearchDashboards.defaultAppId: dashboard/2d53aa40-ad1f-11e9-9839-052bda0fdf49\
\nserver:\
\n host: 0.0.0.0\
\n name: opensearch-dashboards\n"}
|
logging.enabled
|
Verify that OpenSearch, Fluentd, and OpenSearch Dashboards are present
in the list of StackLight resources. An empty output indicates that the
StackLight logging stack is disabled.
kubectl get all -n stacklight -l 'app in
(opensearch-master,opensearchDashboards,fluentd-logs)'
|
logging.externalOutputs
|
Verify the fluentd-logs Kubernetes configmap in the
stacklight namespace:
kubectl get cm -n stacklight fluentd-logs -o \
"jsonpath={.data['output-logs\.conf']}"
The output must contain an additional output stream according to the
configured external outputs.
After the fluentd-logs pods restart, verify that
their logs do not contain any delivery error messages. For example:
kubectl logs -n stacklight -f <fluentd-logs-pod-name> | grep '\[error\]'
Example output with a missing parameter:
[...]
2023-07-25 09:39:33 +0000 [error]: config error file="/etc/fluentd/fluent.conf" error_class=Fluent::ConfigError error="host or host_with_port is required"
If a parameter is missing, verify the configuration as described in
Enable log forwarding to external destinations.
Verify that the log messages are appearing in the external server
database.
To troubleshoot issues with Splunk, refer to No logs are forwarded to Splunk.
|
logging.externalOutputSecretMounts
|
Verify that files were created for the specified path in the Fluentd
container:
kubectl get pods -n stacklight -o name | grep fluentd-logs | \
xargs -I{} kubectl exec -i {} -c fluentd-logs -n stacklight -- \
ls <logging.externalOutputSecretMounts.mountPath>
|
logging.extraConfig
|
Verify that the customization has applied:
kubectl -n stacklight get cm opensearch-master-config -o=jsonpath='{.data}'
Example of system response:
{"opensearch.yml":"cluster.name: opensearch\
\nnetwork.host: 0.0.0.0\
\nplugins.security.disabled: true\
\nplugins.index_state_management.enabled: false\
\npath.data: /usr/share/opensearch/data\
\ncompatibility.override_main_response_version: true\
\ncluster.max_shards_per_node: 5000\n"}
|
logging.level
Removed in MCC 2.26.0 (17.1.0, 16.1.0)
|
Inspect the fluentd-logs Kubernetes configmap in the
stacklight namespace:
kubectl get cm -n stacklight fluentd-logs \
-o "jsonpath={.data['output-logs\.conf']}"
Verify that the output contains a grep filter similar to the
following. The pattern should contain all logging levels below the
expected one.
@type grep
<exclude>
key severity_label
pattern /^<pattern>$/
</exclude>
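For example, to extract this filter from the ConfigMap output (the
-B and -A values are an assumption about the filter layout):
kubectl get cm -n stacklight fluentd-logs \
-o "jsonpath={.data['output-logs\.conf']}" | grep -B 2 -A 2 severity_label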
|
logging.metricQueries
|
For details, see steps 4.2 and 4.3 in Create logs-based metrics. |
logging.syslog.enabled
|
Verify the fluentd-logs Kubernetes configmap in the
stacklight namespace:
kubectl get cm -n stacklight fluentd-logs -o \
"jsonpath={.data['output-logs\.conf']}"
The output must contain an additional output section with the remote
syslog configuration.
After the fluentd-logs pods restart, verify that
their logs do not contain any delivery error messages.
Verify that the log messages are appearing in the remote syslog
database.
|
logging.syslog.packetSize
|
Verify that the packetSize has changed according to your
customization:
kubectl get cm -n stacklight fluentd-logs -o \
yaml | grep packet_size
|
metricFilter
|
In the Prometheus web UI, navigate to
Status > Configuration.
Verify that the following fields in the metric_relabel_configs
section for the kubernetes-nodes-cadvisor and
prometheus-kube-state-metrics scrape jobs have the required
configuration:
action is set to keep or drop
regex contains a regular expression with configured namespaces
delimited by |
source_labels is set to [namespace]
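For illustration only, such an entry might look as follows; the
namespaces shown are hypothetical and depend on your metricFilter
configuration:
metric_relabel_configs:
- source_labels: [namespace]
  regex: kube-system|stacklight
  action: keep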
|
mke.dockerdDataRoot
|
In the Prometheus web UI, navigate to Alerts and verify that
the MKEAPIDown alert is not firing false-positively due to a missing
certificate. |
mke.enabled
|
In the Grafana web UI, verify that the MKE Cluster and
MKE Containers dashboards are present and not empty.
In the Prometheus web UI, navigate to Alerts and verify
that the MKE* alerts are present in the list of alerts.
|
nodeExporter.extraCollectorsEnabled
|
In the Prometheus web UI, run the following PromQL queries. The result
should not be empty.
node_scrape_collector_duration_seconds{collector="<COLLECTOR_NAME>"}
node_scrape_collector_success{collector="<COLLECTOR_NAME>"}
|
nodeExporter.netDeviceExclude
|
Verify the DaemonSet configuration of the Node Exporter:
kubectl get daemonset -n stacklight prometheus-node-exporter \
-o=jsonpath='{.spec.template.spec.containers[0].args}' | jq .
Expected system response:
[
"--path.procfs=/host/proc",
"--path.sysfs=/host/sys",
"--collector.netclass.ignored-devices=<paste_your_excluding_regexp_here>",
"--collector.netdev.device-blacklist=<paste_your_excluding_regexp_here>",
"--no-collector.ipvs"
]
In the Prometheus web UI, run the following PromQL query. The
expected result is 1.
absent(node_network_transmit_bytes_total{device=~"<paste_your_excluding_regexp_here>"})
|
nodeSelector.component
nodeSelector.default
tolerations.component
tolerations.default
|
Verify that the pods of the appropriate components are located on the
intended nodes:
kubectl get pod -o=custom-columns=NAME:.metadata.name,\
STATUS:.status.phase,NODE:.spec.nodeName -n stacklight
|
|
Verify that the Prometheus Relay pod is up and running:
kubectl get pods -n stacklight -l 'component=relay'
Verify that the values have changed according to your customization:
kubectl get pods -n stacklight <prometheus-relay-pod-name> \
-o=jsonpath='{.spec.containers[0].env}' | jq .
|
prometheusServer.alertsCommonLabels
|
In the Prometheus web UI, navigate to Status > Configuration.
Verify that the alerting.alert_relabel_configs section contains
the customization for common labels that you added in
prometheusServer.alertsCommonLabels during StackLight configuration.
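For illustration only, an alert_relabel_configs entry that adds a
static common label might look as follows; the label name and value
are hypothetical and depend on your customization:
alerting:
  alert_relabel_configs:
  - action: replace
    target_label: cluster
    replacement: my-cluster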
|
prometheusServer.customAlerts
|
In the Prometheus web UI, navigate to Alerts and verify that
the list of alerts has changed according to your customization. |
prometheusServer.customRecordingRules
|
In the Prometheus web UI, navigate to Status > Rules.
Verify that the list of Prometheus recording rules has changed
according to your customization.
|
prometheusServer.customScrapeConfigs
|
In the Prometheus web UI, navigate to Status > Targets.
Verify that the required target has appeared in the list of targets.
It may take up to 10 minutes for the change to apply.
|
prometheusServer.persistentVolumeClaimSize
|
Verify that the capacity of the PVC(s) equals the specified size or
exceeds it (the latter is possible for statically provisioned volumes):
kubectl get pvc -n stacklight -l "app=prometheus,component=server"
|
prometheusServer.alertResendDelay
prometheusServer.queryConcurrency
prometheusServer.retentionSize
prometheusServer.retentionTime
|
In the Prometheus web UI, navigate to
Status > Command-Line Flags.
Verify the values for the following flags:
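The flag names below assume the standard Prometheus server options
for these parameters:
--rules.alert.resend-delay
--query.max-concurrency
--storage.tsdb.retention.size
--storage.tsdb.retention.time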
|
prometheusServer.remoteWrites
|
Inspect the remote_write configuration in the
Status > Configuration section of the Prometheus web UI.
Inspect the Prometheus server logs for errors:
kubectl logs prometheus-server-0 prometheus-server -n stacklight
|
prometheusServer.remoteWriteSecretMounts
|
Verify that files were created for the specified path in the Prometheus
container:
kubectl exec -it prometheus-server-0 -c prometheus-server -n \
stacklight -- ls <remoteWriteSecretMounts.mountPath>
|
prometheusServer.watchDogAlertEnabled
|
In the Prometheus web UI, navigate to Alerts and verify that
the list of alerts contains the Watchdog alert. |
sfReporter.cronjob
sfReporter.enabled
sfReporter.salesForce
|
Verify that Salesforce reporter is enabled. The SUSPEND field in
the output must be False.
kubectl get cronjob -n stacklight
Verify that the Salesforce reporter configuration includes all
expected queries:
kubectl get configmap -n stacklight \
sf-reporter-config -o yaml
After the cron job executes (by default, at midnight server time),
obtain the Salesforce reporter pod name. The output should include
the Salesforce reporter pod name with Completed in the
STATUS field.
kubectl get pods -n stacklight
Verify that Salesforce reporter successfully authenticates to
Salesforce and creates records. The output must include the
Salesforce authentication successful, Created record or
Duplicate record and Updated record lines.
kubectl logs -n stacklight <sf-reporter-pod-name>
|
|
In the Prometheus web UI, navigate to Status -> Targets.
Verify that the blackbox target contains the configured domains
(URLs).
|
|
Verify that the PVCs of the appropriate components have been created
according to the configured StorageClass:
kubectl get pvc -n stacklight
|