HAProxy

This section describes the alerts for the HAProxy service.


HaproxyServiceDown

Severity Minor
Summary The HAProxy service on the {{ $labels.host }} node is down.
Raise condition haproxy_up != 1
Description Raises when the HAProxy service on a node does not respond to Telegraf, which typically means that the HAProxy process on that node is down. The host label in the raised alert contains the host name of the affected node.
Troubleshooting
  • Verify the HAProxy status by running systemctl status haproxy on the affected node.
  • If HAProxy is up and running, inspect the Telegraf logs on the affected node using journalctl -u telegraf.
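  For example, both checks above can be combined into a quick pass on the affected node (a sketch; the unit names assume the default haproxy and telegraf systemd units):
    # Is the HAProxy process running?
    systemctl status haproxy
    # If it is, look for recent HAProxy-related errors reported by Telegraf
    journalctl -u telegraf --since "30 min ago" --no-pager | grep -i haproxy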
Tuning Not required

HaproxyServiceDownMajor

Severity Major
Summary More than 50% of HAProxy services within the {{ $labels.cluster }} cluster are down.
Raise condition count(label_replace(haproxy_up, "cluster", "$1", "host", "([^0-9]+).+") != 1) by (cluster) >= 0.5 * count(label_replace(haproxy_up, "cluster", "$1", "host", "([^0-9]+).+")) by (cluster)
Description Raises when the HAProxy service does not respond to Telegraf on more than 50% of cluster nodes. The cluster label in the raised alert contains the cluster prefix, for example, ctl, dbs, or mon.
Troubleshooting
  • Inspect the HaproxyServiceDown alerts for the host names of the affected nodes.
  • Inspect dmesg and /var/log/kern.log.
  • Inspect the logs in /var/log/haproxy.log.
  • Inspect the Telegraf logs using journalctl -u telegraf.
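  The log checks above can be scripted for a quick first pass (a sketch; the file locations are the usual defaults and may differ on your nodes):
    dmesg -T | tail -n 50
    grep -i haproxy /var/log/kern.log | tail -n 50
    tail -n 100 /var/log/haproxy.log
    journalctl -u telegraf --since "1 hour ago" --no-pager | tail -n 50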
Tuning Not required

HaproxyServiceOutage

Severity Critical
Summary All HAProxy services within the {{ $labels.cluster }} cluster are down.
Raise condition count(label_replace(haproxy_up, "cluster", "$1", "host", "([^0-9]+).+") != 1) by (cluster) == count(label_replace(haproxy_up, "cluster", "$1", "host", "([^0-9]+).+")) by (cluster)
Description Raises when the HAProxy service does not respond to Telegraf on all nodes of a cluster, typically indicating deployment or configuration issues. The cluster label in the raised alert contains the cluster prefix, for example, ctl, dbs, or mon.
Troubleshooting
  • Inspect the HaproxyServiceDown alerts for the host names of the affected nodes.
  • Inspect dmesg and /var/log/kern.log.
  • Inspect the logs in /var/log/haproxy.log.
  • Inspect the Telegraf logs using journalctl -u telegraf.
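  Because a cluster-wide outage typically points to a deployment or configuration problem, it can also help to validate the HAProxy configuration on one of the affected nodes (a sketch; /etc/haproxy/haproxy.cfg is the default configuration path and may differ in your deployment):
    # Exits non-zero and reports parsing errors if the configuration is invalid
    haproxy -c -f /etc/haproxy/haproxy.cfg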
Tuning Not required

HaproxyHTTPResponse5xxTooHigh

Severity Warning
Summary The average per-second rate of 5xx HTTP errors on the {{ $labels.host }} node for the {{ $labels.proxy }} back end is {{ $value }} (as measured over the last 2 minutes).
Raise condition rate(haproxy_http_response_5xx{sv="FRONTEND"}[2m]) > 1
Description Raises when the average per-second rate of HTTP 5xx responses sent by HAProxy over the last 2 minutes exceeds 1, indicating a configuration issue with the HAProxy service or with the back-end servers within the cluster. The host label in the raised alert contains the host name of the affected node.
Troubleshooting Inspect the HAProxy logs by running journalctl -u haproxy on the affected node and verify the state of the back-end servers.
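  For a first pass, the HAProxy logs can be filtered for 5xx status codes (a rough sketch; the exact log format depends on your HAProxy logging configuration, so treat the pattern as a heuristic):
    journalctl -u haproxy --since "10 min ago" --no-pager | grep -E ' 5[0-9]{2} '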
Tuning Not required

HaproxyBackendDown

Severity Minor
Summary The {{ $labels.proxy }} back end on the {{ $labels.host }} node is down.
Raise condition increase(haproxy_chkdown{sv="BACKEND"}[1m]) > 0
Description Raises when an internal HAProxy availability check reports an outage of a back end. The host and proxy labels in the raised alert contain the host name of the affected node and the service proxy name.
Troubleshooting
  • Inspect the HAProxy logs by running journalctl -u haproxy on the affected node.
  • Verify the state of the affected back-end server:
    • Verify that the server is responding and the back-end service is active and responsive.
    • Verify the state of the back-end service using an HTTP GET request, for example, curl -XGET http://ctl01:8888/. Typically, a 200 response code indicates a healthy state (see the expanded example after this list).
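  A variant of the check above that prints only the HTTP status code (ctl01 and port 8888 are the example endpoint from the step above; substitute the address of the affected back end):
    curl -s -o /dev/null -w '%{http_code}\n' -X GET http://ctl01:8888/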
Tuning Not required

HaproxyBackendDownMajor

Severity Major
Summary More than 50% of {{ $labels.proxy }} back ends are down.
Raise condition
  • In 2019.2.10 and prior: 0.5 * avg(sum(haproxy_active_servers{type="server"}) by (host, proxy) + sum(haproxy_backup_servers{type="server"}) by (host, proxy)) by (proxy) >= avg(sum(haproxy_active_servers{type="backend"}) by (host, proxy) + sum(haproxy_backup_servers{type="backend"}) by (host, proxy)) by (proxy)
  • In 2019.2.11 and newer: avg(sum(haproxy_active_servers{type="server"}) by (host, proxy) + sum(haproxy_backup_servers{type="server"}) by (host, proxy)) by (proxy) - avg(sum(haproxy_active_servers{type="backend"}) by (host, proxy) + sum(haproxy_backup_servers{type="backend"}) by (host, proxy)) by (proxy) >= 0.5 * avg(sum(haproxy_active_servers{type="server"}) by (host, proxy) + sum(haproxy_backup_servers{type="server"}) by (host, proxy)) by (proxy)
Description Raises when 50% or more of the back-end servers used by the HAProxy service are in the DOWN state. The host and proxy labels in the raised alert contain the host name of the affected node and the service proxy name.
Troubleshooting
  • Inspect the HAProxy logs by running journalctl -u haproxy on the affected node.
  • Verify the state of the affected back-end server:
    • Verify that the server is responding and the back-end service is active and responsive.
    • Verify the state of the back-end service using an HTTP GET request, for example, curl -XGET http://ctl01:8888/. Typically, a 200 response code indicates a healthy state.
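  In addition to the checks above, HAProxy itself can report the state of every server it balances. A sketch using the HAProxy stats socket (it assumes the stats socket is enabled at /run/haproxy/admin.sock, which depends on your configuration, and that socat is installed; fields 1, 2, and 18 of the CSV output are the proxy name, the server name, and the server status):
    echo "show stat" | socat stdio /run/haproxy/admin.sock | cut -d, -f1,2,18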
Tuning Not required

HaproxyBackendOutage

Severity Critical
Summary All {{ $labels.proxy }} back ends are down.
Raise condition max(haproxy_active_servers{sv="BACKEND"}) by (proxy) + max(haproxy_backup_servers{sv="BACKEND"}) by (proxy) == 0
Description Raises when all back-end servers used by the HAProxy service across the cluster are not available to process the requests proxied by HAProxy, typically indicating deployment or configuration issues. The proxy label in the raised alert contains the service proxy name.
Troubleshooting
  • Verify the affected back ends.
  • Inspect the HAProxy logs by running journalctl -u haproxy on the affected node.
  • Inspect Telegraf logs by running journalctl -u telegraf on the affected node.
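  Since this alert typically means that no back-end server is reachable at all, it is also worth checking a back-end node directly, bypassing HAProxy (a sketch; port 8888 is the example port used in the HaproxyBackendDown troubleshooting, substitute the port of the affected service):
    ss -tlnp | grep -w 8888                                           # is the back-end service listening?
    curl -s -o /dev/null -w '%{http_code}\n' http://127.0.0.1:8888/   # does it respond locally?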
Tuning Not required