Manage alerts


You can extend StackLight LMA to support a new service check by adding a custom alert. You can also modify or disable the default alerts as required.

To create a custom alert:

  1. Log in to the Salt Master node.

  2. Add the new alert to the prometheus:server:alert section of the classes/cluster/cluster_name/stacklight/server.yml file in the Reclass model. Specify the alert name, the alerting condition, the severity level, and the annotations to show in the alert message.

    Example:

    prometheus:
      server:
        alert:
          EtcdFailedTotalIn5m:
            if: >-
              sum by(method) (rate(etcd_http_failed_total{code!~"4[0-9]{2}"}[5m]))
              / sum by(method) (rate(etcd_http_received_total[5m])) > {{
              prometheus_server.get('alert', {}).get('EtcdFailedTotalIn5m',
              {}).get('var', {}).get('threshold', 0.01) }}
            labels:
              severity: warning
              service: etcd
            annotations:
              summary: 'High number of HTTP requests are failing on etcd'
              description: '{{ $value }}% of requests for {{ $labels.method }}
                failed on etcd instance {{ $labels.instance }}'
    
  3. Apply the Salt formula:

    salt -C 'I@docker:swarm and I@prometheus:server' state.sls prometheus.server -b1
    
  4. To verify that the new alert is loaded, check the Prometheus logs:

    docker service logs monitoring_server
    

    Alternatively, see the Alerts tab of the Prometheus web UI.
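
As a further illustration of step 2, the following is a minimal sketch of a simpler custom alert that fires when a service stops responding to scrapes. The alert name MyServiceDown and the job label my_service are hypothetical placeholders, while up is the standard metric that Prometheus records for every scrape target. The sketch assumes that the service is already configured as a scrape target.

```yaml
prometheus:
  server:
    alert:
      # Hypothetical alert name; choose one that is unique in your model
      MyServiceDown:
        # "up" is 0 when the last scrape of a target failed;
        # the job name "my_service" is an assumption
        if: >-
          up{job="my_service"} == 0
        labels:
          severity: warning
          service: my_service
        annotations:
          summary: 'my_service endpoint is down'
          description: 'Prometheus cannot scrape {{ $labels.instance }}'
```

This form hard-codes the condition. The etcd example in step 2 instead reads its threshold from the model, which makes the alert tunable without editing the expression.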

To modify a default alert:

  1. Log in to the Salt Master node.

  2. Modify the required alert in the prometheus:server:alert section in the classes/cluster/cluster_name/stacklight/server.yml file of the Reclass model.

  3. Apply the Salt formula:

    salt -C 'I@docker:swarm and I@prometheus:server' state.sls prometheus.server -b1
    
  4. To verify the changes, check the Prometheus logs:

    docker service logs monitoring_server
    

    Alternatively, see the alert details in the Alerts tab of the Prometheus web UI.
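
For example, the EtcdFailedTotalIn5m alert shown earlier reads its threshold from a var:threshold key and falls back to 0.01 when the key is absent. Assuming that the alert you want to change follows the same pattern, and that Reclass merges your keys over the existing definition, a modification could be as small as the following sketch. The 0.05 value and the critical severity are illustrative only.

```yaml
prometheus:
  server:
    alert:
      EtcdFailedTotalIn5m:
        # Read by the alert expression through
        # .get('var', {}).get('threshold', 0.01)
        var:
          threshold: 0.05
        # Illustrative severity change; other labels are assumed
        # to be kept from the existing definition
        labels:
          severity: critical
```

If the alert does not expose a var key, override its if expression directly instead.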

To disable an alert:

  1. Log in to the Salt Master node.

  2. Create the required alert definition in the prometheus:server:alert section in the classes/cluster/cluster_name/stacklight/server.yml file of the Reclass model and set the enabled parameter to false.

    Example:

    prometheus:
      server:
        alert:
          EtcdClusterSmall:
            enabled: false
    
  3. Apply the Salt formula:

    salt -C 'I@docker:swarm and I@prometheus:server' state.sls prometheus.server -b1
    
  4. Verify the changes in the Alerts tab of the Prometheus web UI.