Verify StackLight after configuration

This section describes how to verify StackLight after configuring its parameters as described in Configure StackLight and StackLight configuration parameters. Perform the verification procedure provided for the particular StackLight key that you have modified.

To verify StackLight after configuration:

Key

Verification procedure

alerta.enabled

Verify that Alerta is present in the list of StackLight resources. An empty output indicates that Alerta is disabled.

kubectl get all -n stacklight -l app=alerta

elasticsearch.logstashRetentionTime

Verify that the unit_count parameter contains the desired number of days:

kubectl get cm elasticsearch-curator-config -n \
stacklight -o=jsonpath='{.data.action_file\.yml}'
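
For example, to show only the retention setting, you can append a grep to the documented command (an optional shortcut, assuming grep is available on your workstation):

kubectl get cm elasticsearch-curator-config -n stacklight \
-o=jsonpath='{.data.action_file\.yml}' | grep unit_count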

grafana.renderer.enabled

Verify the Grafana Image Renderer. If set to true, the output should include HTTP Server started, listening at http://localhost:8081.

kubectl logs -f -n stacklight -l app=grafana \
--container grafana-renderer

grafana.homeDashboard

In the Grafana web UI, verify that the desired dashboard is set as a home dashboard.
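
Optionally, you can also read the home dashboard setting through the Grafana HTTP API. The following sketch assumes that the Grafana service in the stacklight namespace is named grafana, listens on port 3000 inside the cluster, and that you have Grafana admin credentials; adjust these assumptions to your deployment:

# In a separate terminal, forward the assumed Grafana service port
# (adjust the service name and port if they differ in your deployment):
kubectl port-forward -n stacklight svc/grafana 3000:3000
# The home dashboard is reported in the homeDashboardId or
# homeDashboardUID field of the organization preferences:
curl -s -u <admin_user>:<admin_password> \
http://127.0.0.1:3000/api/org/preferences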

logging.enabled

Verify that Elasticsearch, Fluentd, and Kibana are present in the list of StackLight resources. An empty output indicates that the StackLight logging stack is disabled.

kubectl get all -n stacklight -l 'app in
(elasticsearch-master,kibana,fluentd-elasticsearch)'

logging.syslog.enabled

  1. Verify the fluentd-elasticsearch Kubernetes configmap in the stacklight namespace:

    kubectl get cm -n stacklight fluentd-elasticsearch -o \
    "jsonpath={.data['output-logs\.conf']}"
    

    The output must contain an additional section with the remote syslog configuration.

  2. After the fluentd-elasticsearch pods restart, verify that their logs do not contain any delivery error messages (see the example after this list).

  3. Verify that the log messages are appearing in the remote syslog database.
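
For the check in step 2, a minimal sketch that scans the logs of one fluentd-elasticsearch pod for error messages (replace <fluentd-elasticsearch-pod-name> with a pod from your cluster):

# No output from grep means no error messages were found:
kubectl logs -n stacklight <fluentd-elasticsearch-pod-name> \
| grep -i error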

logging.level

  1. Run the following command to inspect the fluentd-elasticsearch Kubernetes configmap in the stacklight namespace:

    kubectl get cm -n stacklight fluentd-elasticsearch \
    -o "jsonpath={.data['output-logs\.conf']}"
    
  2. Verify that the output contains the following grep filter section, where the pattern lists all logging levels below the expected one (a command to extract this section is shown below):

    @type grep
    <exclude>
     key severity_label
     pattern /^<pattern>$/
    </exclude>
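
    For example, the following optional command extracts this grep filter section directly from the configmap output (assuming grep is available on your workstation):

    kubectl get cm -n stacklight fluentd-elasticsearch \
    -o "jsonpath={.data['output-logs\.conf']}" | grep -A 4 '@type grep'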
    

highAvailabilityEnabled

Verify the number of service replicas for the HA or non-HA StackLight mode. For details, see Deployment architecture.

kubectl get sts -n stacklight
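
For example, to list the configured replica count of each StatefulSet directly (a minimal sketch using standard kubectl output formatting):

kubectl get sts -n stacklight \
-o custom-columns=NAME:.metadata.name,REPLICAS:.spec.replicas
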
  • prometheusServer.retentionTime

  • prometheusServer.retentionSize

  • prometheusServer.alertResendDelay

  1. In the Prometheus web UI, navigate to Status > Command-Line Flags.

  2. Verify the values for the following flags (an optional API-based check is provided after this list):

    • storage.tsdb.retention.time

    • storage.tsdb.retention.size

    • rules.alert.resend-delay
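
Optionally, you can read the same flags through the Prometheus HTTP API. This sketch assumes that the Prometheus service in the stacklight namespace is named prometheus-server and exposes port 80; adjust the service name and port to your deployment:

# In a separate terminal, forward the assumed Prometheus service port:
kubectl port-forward -n stacklight svc/prometheus-server 9090:80
# Inspect the retention and resend-delay values in the JSON output:
curl -s http://127.0.0.1:9090/api/v1/status/flags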

  • clusterSize

  • resourcesPerClusterSize

  • resources

  1. Obtain the list of pods:

    kubectl get po -n stacklight
    
  2. Verify that the desired resource limits or requests are set in the resources section of every container in the pod:

    kubectl get po <pod_name> -n stacklight -o yaml
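
    To print only the name and resources section of each container in the pod, you can instead use a jsonpath expression, for example:

    kubectl get po <pod_name> -n stacklight -o \
    jsonpath='{range .spec.containers[*]}{.name}{": "}{.resources}{"\n"}{end}'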
    
  • nodeSelector.default

  • nodeSelector.component

  • tolerations.default

  • tolerations.component

Verify that the pods of the appropriate components are located on the intended nodes:

kubectl get pod -o=custom-columns=NAME:.metadata.name,\
STATUS:.status.phase,NODE:.spec.nodeName -n stacklight

  • storage.defaultStorageClass

  • storage.componentStorageClasses

Verify that the PVCs of the appropriate components have been created according to the configured StorageClass:

kubectl get pvc -n stacklight
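
To see the StorageClass of each PVC at a glance, you can also use custom columns, for example:

kubectl get pvc -n stacklight -o \
custom-columns=NAME:.metadata.name,STORAGECLASS:.spec.storageClassName
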
  • sfReporter.enabled

  • sfReporter.salesForce

  • sfReporter.cronjob

  1. Verify that Salesforce reporter is enabled. The SUSPEND field in the output must be False.

    kubectl get cronjob -n stacklight
    
  2. Verify that the Salesforce reporter configuration includes all expected queries:

    kubectl get configmap -n stacklight \
    sf-reporter-config -o yaml
    
  3. After the cron job execution (by default, at midnight server time), obtain the Salesforce reporter pod name. The output must include the Salesforce reporter pod name with Completed in the STATUS field.

    kubectl get pods -n stacklight
    
  4. Verify that Salesforce reporter successfully authenticates to Salesforce and creates records. The output must include the Salesforce authentication successful, Created record or Duplicate record and Updated record lines.

    kubectl logs -n stacklight <sf-reporter-pod-name>
    

ceph.enabled

  1. In the Grafana web UI, verify that Ceph dashboards are present in the list of dashboards and are populated with data.

  2. In the Prometheus web UI, click Alerts and verify that the list of alerts contains Ceph* alerts.
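
Optionally, with the Prometheus port forwarded as in the earlier sketch for the prometheusServer keys, you can list the loaded Ceph alert rule names from the API (assuming grep and sort are available on your workstation):

curl -s http://127.0.0.1:9090/api/v1/rules | \
grep -o 'Ceph[A-Za-z]*' | sort -u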

  • externalEndpointMonitoring.enabled

  • externalEndpointMonitoring.domains

  1. In the Prometheus web UI, navigate to Status > Targets.

  2. Verify that the blackbox-external-endpoint target contains the configured domains (URLs).
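
Alternatively, with the Prometheus port forwarded as in the earlier sketch, you can inspect the active targets through the API and check the blackbox-external-endpoint entries and their configured URLs:

curl -s http://127.0.0.1:9090/api/v1/targets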

  • ironic.endpoint

  • ironic.insecure

In the Grafana web UI, verify that the Ironic BM dashboard displays valuable data (no false-positive or empty panels).

metricFilter

  1. In the Prometheus web UI, navigate to Status > Configuration.

  2. Verify that the following fields in the metric_relabel_configs section for the kubernetes-nodes-cadvisor and prometheus-kube-state-metrics scrape jobs have the required configuration (an API-based check is provided after this list):

    • action is set to keep or drop

    • regex contains a regular expression with configured namespaces delimited by |

    • source_labels is set to [namespace]
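
Optionally, with the Prometheus port forwarded as in the earlier sketch, you can dump the loaded configuration through the API and search it for the metric_relabel_configs entries of these scrape jobs:

curl -s http://127.0.0.1:9090/api/v1/status/config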

  • sslCertificateMonitoring.enabled

  • sslCertificateMonitoring.domains

  1. In the Prometheus web UI, navigate to Status > Targets.

  2. Verify that the blackbox target contains the configured domains (URLs).

mke.enabled

  1. In the Grafana web UI, verify that the MKE Cluster and MKE Containers dashboards are present and not empty.

  2. In the Prometheus web UI, navigate to Alerts and verify that the MKE* alerts are present in the list of alerts.

mke.dockerdDataRoot

In the Prometheus web UI, navigate to Alerts and verify that the MKEAPIDown alert is not firing falsely due to a missing certificate.

prometheusServer.customAlerts

In the Prometheus web UI, navigate to Alerts and verify that the list of alerts has changed according to your customization.

prometheusServer.watchDogAlertEnabled

In the Prometheus web UI, navigate to Alerts and verify that the list of alerts contains the Watchdog alert.
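
Because the Watchdog alert is designed to fire continuously, you can also confirm its presence through the Prometheus API, with the port forwarded as in the earlier sketch (any output from the grep confirms that the alert exists):

curl -s http://127.0.0.1:9090/api/v1/alerts | grep -o Watchdog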

alertmanagerSimpleConfig.genericReceivers

In the Alertmanager web UI, navigate to Status and verify that the Config section contains the intended receiver(s).
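
You can also inspect the rendered Alertmanager configuration without the web UI by dumping the prometheus-alertmanager configmap (the same configmap used later in this section) and reviewing its receivers and route sections:

kubectl get cm -n stacklight prometheus-alertmanager -o yaml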

alertmanagerSimpleConfig.genericRoutes

In the Alertmanager web UI, navigate to Status and verify that the Config section contains the intended route(s).

alertmanagerSimpleConfig.inhibitRules.enabled

Run the following command. An empty output indicates either a failure or that the feature is disabled.

kubectl get cm -n stacklight prometheus-alertmanager -o \
yaml | grep -A 6 inhibit_rules

  • alertmanagerSimpleConfig.email.enabled

  • alertmanagerSimpleConfig.email

  • alertmanagerSimpleConfig.email.route

In the Alertmanager web UI, navigate to Status and verify that the Config section contains the Email receiver and route.

  • alertmanagerSimpleConfig.salesForce.enabled

  • alertmanagerSimpleConfig.salesForce.auth

  • alertmanagerSimpleConfig.salesForce.route

  1. Verify that sf-notifier is enabled. The output must include the sf-notifier pod name, 1/1 in the READY field and Running in the STATUS field.

    kubectl get pods -n stacklight
    
  2. Verify that sf-notifier successfully authenticates to Salesforce. The output must include the Salesforce authentication successful line.

    kubectl logs -f -n stacklight <sf-notifier-pod-name>
    
  3. In the Alertmanager web UI, navigate to Status and verify that the Config section contains the HTTP-salesforce receiver and route.

  • alertmanagerSimpleConfig.slack.enabled

  • alertmanagerSimpleConfig.slack.api_url

  • alertmanagerSimpleConfig.slack.channel

  • alertmanagerSimpleConfig.slack.route

In the Alertmanager web UI, navigate to Status and verify that the Config section contains the HTTP-slack receiver and route.

  • alertmanagerSimpleConfig.msteams.enabled

  • alertmanagerSimpleConfig.msteams.url

  1. Verify that the Prometheus Microsoft Teams pod is up and running:

    kubectl get pods -n stacklight -l \
    'app=prometheus-msteams'
    
  2. Verify that the Prometheus Microsoft Teams pod logs have no errors:

    kubectl logs -f -n stacklight -l \
    'app=prometheus-msteams'
    
  3. Verify that notifications are being sent to the Microsoft Teams channel.

alertmanagerSimpleConfig.serviceNow

  1. Verify that the alertmanager-webhook-servicenow pod is up and running:

    kubectl get pods -n stacklight -l \
    'app=alertmanager-webhook-servicenow'
    
  2. Verify that authentication to ServiceNow was successful. The output should include ServiceNow authentication successful. In case of authentication failure, the ServiceNowAuthFailure alert is raised.

    kubectl logs -f -n stacklight \
    <alertmanager-webhook-servicenow-pod-name>
    
  3. In your ServiceNow instance, verify that the Watchdog alert appears in the Incident table. Once the incident is created, the pod logs should include a line similar to Created Incident: bef260671bdb2010d7b540c6cc4bcbed.

In case of any failure:

  • Verify that your ServiceNow instance is not in hibernation.

  • Verify that the service user credentials, table name, and alert_id_field are correct.

  • Verify that the ServiceNow user has access to the table with permission to read, create, and update records.