You can extend StackLight LMA to support a new service check by adding a custom alert. You can also modify or disable the default alerts as required.
To create a custom alert:
Log in to the Salt Master node.
Add the new alert to the prometheus:server:alert section in the classes/cluster/cluster_name/stacklight/server.yml file of the Reclass model. Enter the alert name, alerting conditions, severity level, and annotations that will be shown in the alert message.
Example:
prometheus:
  server:
    alert:
      EtcdFailedTotalIn5m:
        if: >-
          sum by(method) (rate(etcd_http_failed_total{code!~"4[0-9]{2}"}[5m]))
          / sum by(method) (rate(etcd_http_received_total[5m])) > {{
          prometheus_server.get('alert', {}).get('EtcdFailedTotalIn5m',
          {}).get('var', {}).get('threshold', 0.01) }}
        labels:
          severity: warning
          service: etcd
        annotations:
          summary: 'High number of HTTP requests are failing on etcd'
          description: '{{ $value }}% of requests for {{ $labels.method }} failed on etcd instance {{ $labels.instance }}'
Apply the Salt formula:
salt -C 'I@docker:swarm and I@prometheus:server' state.sls prometheus.server -b1
To view the new alert, see the Prometheus logs:
docker service logs monitoring_server
Alternatively, see the Alerts tab of the Prometheus web UI.
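The threshold in the example alert condition is resolved through a chain of .get() lookups that falls back to 0.01. The following Python sketch mimics that chain on plain dictionaries to show when the fallback applies. The function name resolve_threshold and the sample dictionaries are hypothetical stand-ins for the data that the template reads, not part of StackLight LMA.

```python
def resolve_threshold(prometheus_server, alert_name, default=0.01):
    """Mimic the chained .get() lookup used in the example alert condition."""
    return (
        prometheus_server.get("alert", {})
        .get(alert_name, {})
        .get("var", {})
        .get("threshold", default)
    )


# Hypothetical data: the alert is defined without a var:threshold override.
no_override = {"alert": {"EtcdFailedTotalIn5m": {}}}

# Hypothetical data: the alert carries an explicit var:threshold value.
with_override = {"alert": {"EtcdFailedTotalIn5m": {"var": {"threshold": 0.05}}}}

# Without an override, the default of 0.01 is used.
print(resolve_threshold(no_override, "EtcdFailedTotalIn5m"))

# With an override, the configured value is used.
print(resolve_threshold(with_override, "EtcdFailedTotalIn5m"))

# Keys are case-sensitive, so a differently cased name falls back to the default.
print(resolve_threshold(with_override, "EtcdFailedTotalin5m"))
```

Because every lookup in the chain supplies an empty dictionary as its fallback, a missing or misspelled key never raises an error. It only yields the default, so a mistyped alert name in the lookup is easy to overlook.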
To modify a default alert:
Log in to the Salt Master node.
Modify the required alert in the prometheus:server:alert section in the classes/cluster/cluster_name/stacklight/server.yml file of the Reclass model.
Apply the Salt formula:
salt -C 'I@docker:swarm and I@prometheus:server' state.sls prometheus.server -b1
To view the changes, see the Prometheus logs:
docker service logs monitoring_server
Alternatively, see the alert details in the Alerts tab of the Prometheus web UI.
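The modification step above typically changes only some fields of an alert. The following Python sketch illustrates how such a partial override could combine with a default definition, assuming that nested dictionaries in the model are merged key by key. The alert name ExampleDefaultAlert, its fields, and the deep_merge helper are hypothetical and serve only as an illustration. Verify the resulting definition in your own deployment.

```python
import copy


def deep_merge(base, override):
    """Return a copy of base with override applied key by key."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dictionaries: merge them recursively.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars and new keys: the override wins.
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical default definition of an alert.
default_alerts = {
    "ExampleDefaultAlert": {
        "if": "example_metric > 10",
        "labels": {"severity": "warning", "service": "example"},
    }
}

# Hypothetical override that changes only the severity label.
override = {"ExampleDefaultAlert": {"labels": {"severity": "critical"}}}

effective = deep_merge(default_alerts, override)
print(effective["ExampleDefaultAlert"])
```

Under this assumption, the override changes the severity label while the condition and the remaining labels keep their default values.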
To disable an alert:
Log in to the Salt Master node.
Create the required alert definition in the prometheus:server:alert section in the classes/cluster/cluster_name/stacklight/server.yml file of the Reclass model and set the enabled parameter to false.
Example:
prometheus:
  server:
    alert:
      EtcdClusterSmall:
        enabled: false
Apply the Salt formula:
salt -C 'I@docker:swarm and I@prometheus:server' state.sls prometheus.server -b1
Verify the changes in the Alerts tab of the Prometheus web UI.
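As a scripted alternative to the web UI check, the loaded rules can be inspected through the Prometheus HTTP API, assuming that your Prometheus version exposes the /api/v1/rules endpoint. The following Python sketch collects the names of the loaded alerting rules. The base URL passed to fetch_rules is a placeholder for your deployment, the sample payload is a trimmed illustration rather than real output, and the sketch assumes that a disabled alert no longer appears among the loaded rules.

```python
import json
from urllib.request import urlopen


def alerting_rule_names(rules_payload):
    """Collect the names of alerting rules from a /api/v1/rules response."""
    names = set()
    for group in rules_payload.get("data", {}).get("groups", []):
        for rule in group.get("rules", []):
            if rule.get("type") == "alerting":
                names.add(rule["name"])
    return names


def fetch_rules(base_url):
    """Fetch the rules payload; base_url is a placeholder for your server."""
    with urlopen(f"{base_url}/api/v1/rules", timeout=10) as response:
        return json.load(response)


# Trimmed, illustrative payload used here instead of a live request.
sample_payload = {
    "status": "success",
    "data": {
        "groups": [
            {
                "name": "example.rules",
                "rules": [
                    {"name": "EtcdFailedTotalIn5m", "type": "alerting"},
                    {"name": "example:recorded_metric", "type": "recording"},
                ],
            }
        ]
    },
}

loaded = alerting_rule_names(sample_payload)
print("EtcdFailedTotalIn5m" in loaded)  # custom alert is loaded
print("EtcdClusterSmall" in loaded)     # disabled alert is not loaded
```

Against a live server, replace sample_payload with the result of fetch_rules called with the address of your Prometheus instance.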