Create logs-based metrics
StackLight provides a wide variety of metrics for Container Cloud components. However, you may need to create a custom logs-based metric and use it for alert notifications, for example, in the following cases:
If a component producing logs does not expose scraping targets. In this case, component-specific metrics may be missing.
If a scraping target lacks information that can be collected by aggregating the log messages.
If alerting reasons are more explicitly presented in log messages.
For example, you want to receive alert notifications when 10 or more cases are created in Salesforce within an hour. The sf-notifier scraping endpoint does not expose such information. However, sf-notifier logs are stored in OpenSearch, and using prometheus-es-exporter you can perform the following:
Configure a query using Query DSL (Domain Specific Language) and test it in Dev Tools in OpenSearch Dashboards.
Configure Prometheus Elasticsearch Exporter to expose the result as a Prometheus metric showing the total number of Salesforce cases created daily, for example, salesforce_cases_daily_total_value.
Configure StackLight to send a notification once the value of this metric increases by 10 or more within an hour.
Caution
StackLight logging must be enabled and functional.
Prometheus-es-exporter uses the OpenSearch Search API. Therefore, configured queries must be tuned for this specific API and must include:
The query part to filter documents
The aggregation part to combine filtered documents into a metric-oriented result
For details, see Supported Aggregations.
The following procedure is based on the salesforce_cases_daily_total_value metric described in the example above.
To create a custom logs-based metric:
Perform steps 1-2 as described in StackLight configuration procedure.
In the manifest that opens, verify that StackLight logging is enabled:
logging:
  enabled: true
Create a query using Query DSL:
In the OpenSearch Dashboards web UI, select an index to query. StackLight stores logs in hourly OpenSearch indices. To select all indices for a day, use the <logstash-{now/d}*> index pattern, which stands for %3Clogstash-%7Bnow%2Fd%7D*%3E when URL-encoded.
Note
Optimize the query time by limiting the number of results. For example, we will use the OpenSearch logger field set to sf-notifier to limit the number of logs to search.
For example:
GET /%3Clogstash-%7Bnow%2Fd%7D*%3E/_search
{
  "query": {
    "bool": {
      "must": {
        "term": {
          "logger": {
            "value": "sf-notifier"
          }
        }
      }
    }
  }
}
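The URL-encoded form of the index pattern can be reproduced programmatically, which helps when a different date-math pattern is needed. The following is a minimal sketch using only the Python standard library; the helper name encode_index_pattern is hypothetical and not part of StackLight.

```python
from urllib.parse import quote, unquote


def encode_index_pattern(pattern: str) -> str:
    """Percent-encode a date-math index pattern for use in a request path."""
    # Keep "*" literal, as in the encoded pattern used in this procedure.
    # Everything else outside letters, digits, and "_.-~" is percent-encoded,
    # including "/", which would otherwise be read as a path separator.
    return quote(pattern, safe="*")


encoded = encode_index_pattern("<logstash-{now/d}*>")
print(encoded)           # encoded form to paste into the GET request path
print(unquote(encoded))  # decodes back to the original pattern
```

Decoding the result with unquote is a quick way to confirm that the path in a request or a log line refers to the intended pattern.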
Test the query in Dev Tools in OpenSearch Dashboards.
Select the log lines that include information about Salesforce case creation. For the INFO logging level, to indicate case creation, sf-notifier produces log messages similar to the following one:
[2021-07-02 12:35:28,596] INFO in client: Created case: OrderedDict([('id', '5007h000007iqmKAAQ'), ('success', True), ('errors', [])]).
Such log messages include the Created case phrase. Use it in the query to filter log messages for created cases:
"filter": {
  "match_phrase_prefix": {
    "message": "Created case"
  }
}
Combine the query result into a single value that prometheus-es-exporter will expose as a metric. Use the value_count aggregation:
GET /%3Clogstash-%7Bnow%2Fd%7D*%3E/_search
{
  "query": {
    "bool": {
      "must": {
        "term": {
          "logger": {
            "value": "sf-notifier"
          }
        }
      },
      "filter": {
        "match_phrase_prefix": {
          "message": "Created case"
        }
      }
    }
  },
  "aggs": {
    "daily_total": {
      "value_count": {
        "field": "logger"
      }
    }
  }
}
The aggregation result in Dev Tools should look as follows:
"aggregations" : {
  "daily_total" : {
    "value" : 19
  }
}
Note
The metric name is suffixed with the aggregation name and the result field name: salesforce_cases_daily_total_value.
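To see how an aggregation result maps to a metric name, the naming rule from the note can be modeled in a few lines. This is only an illustration of the rule, not the exporter's actual code; the function metric_samples is hypothetical, and the response is the sample aggregation result from this procedure.

```python
def metric_samples(query_name: str, response: dict) -> dict:
    """Map single-value aggregation results to metric names.

    Names follow <query_name>_<aggregation_name>_<result_field_name>.
    """
    samples = {}
    for agg_name, result in response.get("aggregations", {}).items():
        for field, value in result.items():
            # Only numeric result fields can become metric values.
            if isinstance(value, (int, float)):
                samples[f"{query_name}_{agg_name}_{field}"] = value
    return samples


# Aggregation result as returned in Dev Tools for the example query.
response = {"aggregations": {"daily_total": {"value": 19}}}
print(metric_samples("salesforce_cases", response))
```

With the salesforce_cases query name, the daily_total aggregation, and the value result field, the sketch yields the salesforce_cases_daily_total_value name used throughout this procedure.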
Configure Prometheus Elasticsearch Exporter:
In StackLight values of the cluster resource, specify the new metric using the logging.metricQueries parameter and configure the query parameters as described in StackLight configuration parameters: logging.metricQueries.
In the example below, salesforce_cases is the query name. The final metric name can be generalized using the <query_name>_<aggregation_name>_<aggregation_result_field_name> template.
logging:
  metricQueries:
    salesforce_cases:
      indices: '<logstash-{now/d}*>'
      interval: 600
      timeout: 60
      onError: preserve
      onMissing: zero
      body: "{\"query\":{\"bool\":{\"must\":{\"term\":{\"logger\":{\"value\":\"sf-notifier\"}}},\"filter\":{\"match_phrase_prefix\":{\"message\":\"Created case\"}}}},\"aggs\":{\"daily_total\":{\"value_count\":{\"field\":\"logger\"}}}}"
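Escaping the query by hand for the body parameter is error-prone. As one possible approach, the escaped string can be generated from the query tested in Dev Tools. The sketch below assumes that a JSON string literal is acceptable as a YAML double-quoted value, which holds for plain ASCII content like this query; the variable names are illustrative.

```python
import json

# The query tested in Dev Tools, written as a Python dictionary.
query = {
    "query": {
        "bool": {
            "must": {"term": {"logger": {"value": "sf-notifier"}}},
            "filter": {"match_phrase_prefix": {"message": "Created case"}},
        }
    },
    "aggs": {"daily_total": {"value_count": {"field": "logger"}}},
}

# First pass: compact JSON without spaces after separators.
compact = json.dumps(query, separators=(",", ":"))

# Second pass: wrap the JSON text in double quotes and escape inner quotes.
yaml_value = json.dumps(compact)

print(f"body: {yaml_value}")
```

Decoding the printed value twice with json.loads should return the original dictionary, which is a simple round-trip check before pasting the line into the StackLight values.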
Verify that the prometheus-es-exporter ConfigMap has been updated:
kubectl describe cm -n stacklight prometheus-es-exporter
Example of system response:
QueryOnError = preserve
QueryOnMissing = zero
QueryJson = "{\"aggs\":{\"component\":{\"terms\":{\"field\":\"logger\"}}},\"query\":{\"match_all\":{}},\"size\":0}"

[query_salesforce_cases]
QueryIntervalSecs = 600
QueryTimeoutSecs = 60
QueryIndices = <logstash-{now/d}*>
QueryOnError = preserve
QueryOnMissing = zero
QueryJson = "{\"query\":{\"bool\":{\"must\":{\"term\":{\"logger\":{\"value\":\"sf-notifier\"}}},\"filter\":{\"match_phrase_prefix\":{\"message\":\"Created case\"}}}},\"aggs\":{\"daily_total\":{\"value_count\":{\"field\":\"logger\"}}}}"

Events: <none>
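The escaped QueryJson value is hard to compare by eye with the query tested in Dev Tools. A possible way to check it is to paste the query section into a small script that decodes it back into a dictionary. This is a sketch for manual verification only: the load_query helper is hypothetical, and the sample text is the [query_salesforce_cases] section from the response above.

```python
import configparser
import json

# The [query_salesforce_cases] section copied from the ConfigMap output.
SAMPLE = r'''
[query_salesforce_cases]
QueryIntervalSecs = 600
QueryTimeoutSecs = 60
QueryIndices = <logstash-{now/d}*>
QueryOnError = preserve
QueryOnMissing = zero
QueryJson = "{\"query\":{\"bool\":{\"must\":{\"term\":{\"logger\":{\"value\":\"sf-notifier\"}}},\"filter\":{\"match_phrase_prefix\":{\"message\":\"Created case\"}}}},\"aggs\":{\"daily_total\":{\"value_count\":{\"field\":\"logger\"}}}}"
'''


def load_query(text: str, section: str) -> dict:
    """Return the QueryJson value of a section as a Python dictionary."""
    # Disable interpolation so that "%" characters are never interpreted.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    raw = parser[section]["QueryJson"]
    # The value is a quoted, escaped JSON string: decode the quoting first,
    # then decode the JSON document itself.
    return json.loads(json.loads(raw))


query = load_query(SAMPLE, "query_salesforce_cases")
print(json.dumps(query, indent=2))
```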
ConfigMap update triggers the prometheus-es-exporter pod restart.
Verify that the newly configured query has been executed:
kubectl logs -f -n stacklight <prometheus-es-exporter-pod-id>
Example of system response:
[...]
[2021-08-04 12:08:51,989] opensearch.INFO MainThread POST http://opensearch-master:9200/%3Cnotification-%7Bnow%2Fd%7D%3E/_search [status:200 request:0.040s]
[2021-08-04 12:08:52,089] opensearch.INFO MainThread POST http://opensearch-master:9200/%3Cnotification-%7Bnow%2Fd%7D%3E/_search [status:200 request:0.100s]
[2021-08-04 12:08:54,469] opensearch.INFO MainThread POST http://opensearch-master:9200/%3Clogstash-%7Bnow%2Fd%7D*%3E/_search [status:200 request:2.278s]
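When the log is long, the search requests can be extracted with a short script instead of reading the output line by line. The following sketch assumes the log line layout shown in the example response; the search_results helper and the regular expression are illustrative and may need adjusting for other log formats.

```python
import re
from urllib.parse import unquote

# Matches the request path, HTTP status, and duration of a search request.
LINE_RE = re.compile(
    r"POST \S+?/(?P<index>%3C[^/\s]+%3E)/_search "
    r"\[status:(?P<status>\d+) request:(?P<secs>[\d.]+)s\]"
)

SAMPLE_LOG = """\
[2021-08-04 12:08:51,989] opensearch.INFO MainThread POST http://opensearch-master:9200/%3Cnotification-%7Bnow%2Fd%7D%3E/_search [status:200 request:0.040s]
[2021-08-04 12:08:52,089] opensearch.INFO MainThread POST http://opensearch-master:9200/%3Cnotification-%7Bnow%2Fd%7D%3E/_search [status:200 request:0.100s]
[2021-08-04 12:08:54,469] opensearch.INFO MainThread POST http://opensearch-master:9200/%3Clogstash-%7Bnow%2Fd%7D*%3E/_search [status:200 request:2.278s]
"""


def search_results(log_text: str) -> list:
    """Return (decoded index pattern, HTTP status, seconds) per search request."""
    return [
        (unquote(m["index"]), int(m["status"]), float(m["secs"]))
        for m in LINE_RE.finditer(log_text)
    ]


results = search_results(SAMPLE_LOG)
for index, status, secs in results:
    print(f"{index}: status {status}, {secs}s")
```

A request against the <logstash-{now/d}*> pattern with status 200 indicates that the new query has been executed successfully.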
Once done, prometheus-es-exporter will expose the new metric in its scraping endpoint for Prometheus to collect. You can view the new metric in the Prometheus web UI.
(Optional) Configure StackLight notifications:
Add a new alert as described in Alerts configuration. For example:
prometheusServer:
  customAlerts:
    - alert: SalesforceCasesDailyWarning
      annotations:
        description: The number of cases created today in Salesforce increased by 10 within the last hour.
        summary: Too many cases in Salesforce
      expr: increase(salesforce_cases_daily_total_value[1h]) >= 10
      labels:
        severity: warning
        service: custom
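To reason about when this alert fires, the behavior of the expression can be approximated offline. The sketch below is a simplified model, not the Prometheus implementation: it sums the growth between consecutive samples and treats any drop as a reset, but it omits the extrapolation to the window boundaries that increase() performs, so real values may differ slightly. The sample series are hypothetical values collected every 600 seconds over one hour.

```python
def approx_increase(samples: list) -> float:
    """Sum the growth between consecutive samples, treating a drop as a reset."""
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        # After a reset, the series restarts near zero, so the current
        # value itself is counted as the growth since the reset.
        total += cur - prev if cur >= prev else cur
    return total


def alert_fires(samples: list, threshold: float = 10) -> bool:
    """Mirror the alert condition: increase over the window >= threshold."""
    return approx_increase(samples) >= threshold


busy_hour = [19, 21, 24, 26, 27, 30, 31]   # hypothetical: 12 new cases
quiet_hour = [19, 20, 22, 23, 23, 24, 25]  # hypothetical: 6 new cases
midnight = [40, 42, 1, 3, 5, 6, 8]         # hypothetical: daily index rollover

for name, series in [("busy", busy_hour), ("quiet", quiet_hour), ("midnight", midnight)]:
    print(name, approx_increase(series), alert_fires(series))
```

The midnight series illustrates why a daily total still works with this expression: when the value restarts with a new daily index, the drop is handled as a reset rather than as a negative change.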
Configure receivers as described in StackLight configuration parameters. For example, to send alert notifications to Slack only:
alertmanagerSimpleConfig:
  slack:
    enabled: true
    api_url: https://hooks.slack.com/services/i45f3k3/w3bh00kU9L/06vi0u5ly
    channel: Slackbot
    route:
      match:
        alertname: SalesforceCasesDailyWarning
  salesForce:
    enabled: true
    route:
      routes:
        - receiver: HTTP-slack
          match:
            - alertname: SalesforceCasesDailyWarning