HAProxy

This section describes the alerts for the HAProxy service.


HaproxyServiceDown

Severity

Minor

Summary

The HAProxy service on the {{ $labels.host }} node is down.

Raise condition

haproxy_up != 1

Description

Raises when the HAProxy service on a node does not respond to Telegraf, typically meaning that the HAProxy process is in the DOWN state on that node. The host label in the raised alert contains the host name of the affected node.

Troubleshooting

  • Verify the HAProxy status by running systemctl status haproxy on the affected node.

  • If HAProxy is up and running, inspect the Telegraf logs on the affected node using journalctl -u telegraf, as in the command sketch below.
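
A minimal command sketch for these checks, assuming the default systemd unit names haproxy and telegraf:

  # Check whether the HAProxy process is running on the affected node
  systemctl status haproxy

  # If HAProxy is active, look for collection errors in the Telegraf logs
  journalctl -u telegraf --since "30 min ago" | grep -i haproxy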

Tuning

Not required

HaproxyServiceDownMajor

Severity

Major

Summary

More than 50% of HAProxy services within the {{ $labels.cluster }} cluster are down.

Raise condition

count(label_replace(haproxy_up, "cluster", "$1", "host", "([^0-9]+).+") != 1) by (cluster) >= 0.5 * count(label_replace(haproxy_up, "cluster", "$1", "host", "([^0-9]+).+")) by (cluster)

Description

Raises when the HAProxy service does not respond to Telegraf on 50% or more of the cluster nodes. The cluster label in the raised alert contains the cluster prefix, for example, ctl, dbs, or mon.
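
The cluster label is derived from the host name by the label_replace() call in the raise condition: the capture group ([^0-9]+) keeps the leading non-numeric prefix of the host name. A quick shell illustration of the same mapping (the host names below are examples only):

  for host in ctl01 dbs02 mon03; do
      echo "${host} -> $(echo "${host}" | sed -E 's/([^0-9]+).+/\1/')"
  done
  # Output: ctl01 -> ctl, dbs02 -> dbs, mon03 -> mon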

Troubleshooting

  • Inspect the HaproxyServiceDown alerts for the host names of the affected nodes.

  • Inspect dmesg and /var/log/kern.log.

  • Inspect the logs in /var/log/haproxy.log.

  • Inspect the Telegraf logs using journalctl -u telegraf, as in the command sketch below.
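
A possible sequence for these checks on an affected node, assuming the default log locations:

  # Kernel messages and kernel log
  dmesg | tail -n 50
  tail -n 100 /var/log/kern.log

  # HAProxy and Telegraf logs
  tail -n 100 /var/log/haproxy.log
  journalctl -u telegraf --since "30 min ago"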

Tuning

Not required

HaproxyServiceOutage

Severity

Critical

Summary

All HAProxy services within the {{ $labels.cluster }} cluster are down.

Raise condition

count(label_replace(haproxy_up, "cluster", "$1", "host", "([^0-9]+).+") != 1) by (cluster) == count(label_replace(haproxy_up, "cluster", "$1", "host", "([^0-9]+).+")) by (cluster)

Description

Raises when the HAProxy service does not respond to Telegraf on all nodes of a cluster, typically indicating deployment or configuration issues. The cluster label in the raised alert contains the cluster prefix, for example, ctl, dbs, or mon.

Troubleshooting

  • Inspect the HaproxyServiceDown alerts for the host names of the affected nodes.

  • Inspect dmesg and /var/log/kern.log.

  • Inspect the logs in /var/log/haproxy.log.

  • Inspect the Telegraf logs using journalctl -u telegraf.
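
Because this alert typically points to deployment or configuration issues, it may also help to validate the HAProxy configuration on the affected nodes. A minimal sketch, assuming the default configuration path:

  # Check the configuration for syntax errors without restarting the service
  haproxy -c -f /etc/haproxy/haproxy.cfg

  # Look for startup or configuration errors in the service log
  journalctl -u haproxy --since "1 hour ago" | grep -iE 'alert|error'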

Tuning

Not required

HaproxyHTTPResponse5xxTooHigh

Severity

Warning

Summary

The average per-second rate of 5xx HTTP errors on the {{ $labels.host }} node for the {{ $labels.proxy }} back end is {{ $value }} (as measured over the last 2 minutes).

Raise condition

rate(haproxy_http_response_5xx{sv="FRONTEND"}[2m]) > 1

Description

Raises when the per-second rate of HTTP 5xx responses sent by HAProxy, measured over the last 2 minutes, exceeds 1, indicating a configuration issue with the HAProxy service or the back-end servers within the cluster. The host label in the raised alert contains the host name of the affected node.

Troubleshooting

Inspect the HAProxy logs by running journalctl -u haproxy on the affected node and verify the state of the back-end servers.
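
A rough sketch for narrowing down the failing requests; the exact position of the status code depends on the configured HAProxy log format:

  # Inspect recent HAProxy log entries on the affected node
  journalctl -u haproxy --since "15 min ago"

  # Roughly filter for 5xx responses; adjust the pattern to your log format
  journalctl -u haproxy --since "15 min ago" | grep -E ' 5[0-9]{2} '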

Tuning

Not required

HaproxyBackendDown

Severity

Minor

Summary

The {{ $labels.proxy }} back end on the {{ $labels.host }} node is down.

Raise condition

increase(haproxy_chkdown{sv="BACKEND"}[1m]) > 0

Description

Raises when an internal HAProxy availability check reports a back-end outage. The host and proxy labels in the raised alert contain the host name of the affected node and the service proxy name.

Troubleshooting

  • Inspect the HAProxy logs by running journalctl -u haproxy on the affected node.

  • Verify the state of the affected back-end server:

    • Verify that the server is responding and the back-end service is active and responsive.

    • Verify the state of the back-end service using an HTTP GET request, for example, curl -XGET http://ctl01:8888/. Typically, a 200 response code indicates a healthy state. You can also query HAProxy for its own view of the back end, as in the sketch below.
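
If the HAProxy runtime statistics socket is enabled (the socket path below is an assumption and may differ between deployments), you can query HAProxy directly for the server states it reports:

  # Print proxy name, server name, and status for every back end and server
  echo "show stat" | socat unix-connect:/run/haproxy/admin.sock stdio | cut -d, -f1,2,18

  # List only the servers that HAProxy currently reports as DOWN
  echo "show stat" | socat unix-connect:/run/haproxy/admin.sock stdio | awk -F, '$18 ~ /DOWN/ {print $1, $2, $18}'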

Tuning

Not required

HaproxyBackendDownMajor

Severity

Major

Summary

More than 50% of {{ $labels.proxy }} back ends are down.

Raise condition

  • In 2019.2.10 and prior: 0.5 * avg(sum(haproxy_active_servers{type="server"}) by (host, proxy) + sum(haproxy_backup_servers{type="server"}) by (host, proxy)) by (proxy) >= avg(sum(haproxy_active_servers{type="backend"}) by (host, proxy) + sum(haproxy_backup_servers{type="backend"}) by (host, proxy)) by (proxy)

  • In 2019.2.11 and newer: avg(sum(haproxy_active_servers{type="server"}) by (host, proxy) + sum(haproxy_backup_servers{type="server"}) by (host, proxy)) by (proxy) - avg(sum(haproxy_active_servers{type="backend"}) by (host, proxy) + sum(haproxy_backup_servers{type="backend"}) by (host, proxy)) by (proxy) >= 0.5 * avg(sum(haproxy_active_servers{type="server"}) by (host, proxy) + sum(haproxy_backup_servers{type="server"}) by (host, proxy)) by (proxy)

Description

Raises when 50% or more of the back-end servers used by the HAProxy service are in the DOWN state. The host and proxy labels in the raised alert contain the host name of the affected node and the service proxy name.

Troubleshooting

  • Inspect the HAProxy logs by running journalctl -u haproxy on the affected node.

  • Verify the state of the affected back-end server:

    • Verify that the server is responding and the back-end service is active and responsive.

    • Verify the state of the back-end service using an HTTP GET request, for example, curl -XGET http://ctl01:8888/. Typically, a 200 response code indicates a healthy state.

Tuning

Not required

HaproxyBackendOutage

Severity

Critical

Summary

All {{ $labels.proxy }} back ends are down.

Raise condition

max(haproxy_active_servers{sv="BACKEND"}) by (proxy) + max(haproxy_backup_servers{sv="BACKEND"}) by (proxy) == 0

Description

Raises when all back-end servers used by the HAProxy service across the cluster are not available to process the requests proxied by HAProxy, typically indicating deployment or configuration issues. The proxy label in the raised alert contains the service proxy name.

Troubleshooting

  • Verify the affected back ends.

  • Inspect the HAProxy logs by running journalctl -u haproxy on the affected node.

  • Inspect Telegraf logs by running journalctl -u telegraf on the affected node.

Tuning

Not required