HAProxy
This section describes the alerts for the HAProxy service.
HaproxyServiceDown
Severity: Minor
Summary: The HAProxy service on the {{ $labels.host }} node is down.
Raise condition: haproxy_up != 1
Description: Raises when the HAProxy service on a node does not respond
to Telegraf, typically meaning that the HAProxy process is in the DOWN
state on that node. The host label in the raised alert contains the host
name of the affected node.
Troubleshooting (see the sketch after this entry):
- Verify the HAProxy status by running systemctl status haproxy on the
  affected node.
- If HAProxy is up and running, inspect the Telegraf logs on the
  affected node using journalctl -u telegraf.
Tuning: Not required
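
A minimal shell sketch of these two checks, assuming both services are
managed by systemd as elsewhere in this section:

    # Check whether the HAProxy process is running on the affected node.
    systemctl status haproxy

    # If HAProxy is active, the alert likely comes from the metrics side:
    # inspect recent Telegraf logs for scrape or plugin errors.
    journalctl -u telegraf --since "30 min ago" | tail -n 50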
HaproxyServiceDownMajor
Severity: Major
Summary: More than 50% of HAProxy services within the
{{ $labels.cluster }} cluster are down.
Raise condition:
count(label_replace(haproxy_up, "cluster", "$1", "host",
"([^0-9]+).+") != 1) by (cluster) >= 0.5 *
count(label_replace(haproxy_up, "cluster", "$1", "host",
"([^0-9]+).+")) by (cluster)
Description: Raises when the HAProxy service does not respond to Telegraf
on more than 50% of cluster nodes. The cluster label in the raised alert
contains the cluster prefix, for example, ctl, dbs, or mon.
Troubleshooting (see the sketch after this entry):
- Inspect the HaproxyServiceDown alerts for the host names of the
  affected nodes.
- Inspect dmesg and /var/log/kern.log.
- Inspect the logs in /var/log/haproxy.log.
- Inspect the Telegraf logs using journalctl -u telegraf.
Tuning: Not required
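
To run these checks on every node of the affected cluster at once, a
loop such as the following may help; the node names (ctl01 through
ctl03) are placeholders for the hosts reported by the
HaproxyServiceDown alerts:

    # Query the HAProxy service state and recent log lines on each node.
    for node in ctl01 ctl02 ctl03; do
        echo "--- ${node} ---"
        ssh "${node}" 'systemctl is-active haproxy; tail -n 5 /var/log/haproxy.log'
    done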
HaproxyServiceOutage
Severity: Critical
Summary: All HAProxy services within the {{ $labels.cluster }} cluster
are down.
Raise condition:
count(label_replace(haproxy_up, "cluster", "$1", "host",
"([^0-9]+).+") != 1) by (cluster) == count(label_replace(haproxy_up,
"cluster", "$1", "host", "([^0-9]+).+")) by (cluster)
Description: Raises when the HAProxy service does not respond to Telegraf
on all nodes of a cluster, typically indicating deployment or
configuration issues. The cluster label in the raised alert contains the
cluster prefix, for example, ctl, dbs, or mon.
Troubleshooting (see the sketch after this entry):
- Inspect the HaproxyServiceDown alerts for the host names of the
  affected nodes.
- Inspect dmesg and /var/log/kern.log.
- Inspect the logs in /var/log/haproxy.log.
- Inspect the Telegraf logs using journalctl -u telegraf.
Tuning: Not required
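
Since a full outage often points at a broken configuration rolled out to
every node, validating the configuration file is a quick first check;
the path below is the default and may differ in your deployment:

    # Parse and validate the HAProxy configuration without restarting
    # the service; a non-zero exit code indicates a configuration error.
    haproxy -c -f /etc/haproxy/haproxy.cfg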
HaproxyHTTPResponse5xxTooHigh
Severity: Warning
Summary: The average per-second rate of 5xx HTTP errors on the
{{ $labels.host }} node for the {{ $labels.proxy }} back end is
{{ $value }} (as measured over the last 2 minutes).
Raise condition: rate(haproxy_http_response_5xx{sv="FRONTEND"}[2m]) > 1
Description: Raises when the rate of HTTP 5xx responses sent by HAProxy
has exceeded 1 per second over the last 2 minutes, indicating a
configuration issue with the HAProxy service or back-end servers within
the cluster. The host label in the raised alert contains the host name
of the affected node.
Troubleshooting: Inspect the HAProxy logs by running
journalctl -u haproxy on the affected node and verify the state of the
back-end servers. See the sketch after this entry.
Tuning: Not required
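
In the default HTTP log format, the status code appears as a standalone
field, so recent 5xx responses can be pulled from the journal; the grep
pattern below is a loose heuristic and may need adjusting to your log
format:

    # List recent requests that HAProxy answered with a 5xx status code.
    journalctl -u haproxy --since "10 min ago" | grep -E ' 5[0-9]{2} ' | tail -n 20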
HaproxyBackendDown
Severity: Minor
Summary: The {{ $labels.proxy }} back end on the {{ $labels.host }} node
is down.
Raise condition: increase(haproxy_chkdown{sv="BACKEND"}[1m]) > 0
Description: Raises when an internal HAProxy health check reported a
back-end outage. The host and proxy labels in the raised alert contain
the host name of the affected node and the service proxy name.
Troubleshooting (see the sketch after this entry):
- Inspect the HAProxy logs by running journalctl -u haproxy on the
  affected node.
- Verify the state of the affected back-end server:
  - Verify that the server is responding and the back-end service is
    active and responsive.
  - Verify the state of the back-end service using an HTTP GET request,
    for example, curl -XGET http://ctl01:8888/. Typically, the 200
    response code indicates the healthy state.
Tuning: Not required
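
A slightly more scriptable variant of the curl check from the list
above; the host and port (ctl01:8888) come from the example and stand in
for the real back-end endpoint:

    # Print only the HTTP status code returned by the back end;
    # 200 typically indicates a healthy service.
    curl -s -o /dev/null -w '%{http_code}\n' http://ctl01:8888/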
HaproxyBackendDownMajor
Severity: Major
Summary: More than 50% of {{ $labels.proxy }} back ends are down.
Raise condition:
- In 2019.2.10 and prior:
  0.5 * avg(sum(haproxy_active_servers{type="server"}) by (host,
  proxy) + sum(haproxy_backup_servers{type="server"}) by (host,
  proxy)) by (proxy) >=
  avg(sum(haproxy_active_servers{type="backend"}) by (host, proxy) +
  sum(haproxy_backup_servers{type="backend"}) by (host, proxy)) by
  (proxy)
- In 2019.2.11 and newer:
  avg(sum(haproxy_active_servers{type="server"}) by (host, proxy) +
  sum(haproxy_backup_servers{type="server"}) by (host, proxy)) by
  (proxy) - avg(sum(haproxy_active_servers{type="backend"}) by (host,
  proxy) + sum(haproxy_backup_servers{type="backend"}) by (host, proxy))
  by (proxy) >= 0.5 * avg(sum(haproxy_active_servers{type="server"}) by
  (host, proxy) + sum(haproxy_backup_servers{type="server"}) by (host,
  proxy)) by (proxy)
Description: Raises when at least 50% of the back-end servers used by the
HAProxy service are in the DOWN state. The host and proxy labels in the
raised alert contain the host name of the affected node and the service
proxy name.
Troubleshooting (see the sketch after this entry):
- Inspect the HAProxy logs by running journalctl -u haproxy on the
  affected node.
- Verify the state of the affected back-end server:
  - Verify that the server is responding and the back-end service is
    active and responsive.
  - Verify the state of the back-end service using an HTTP GET request,
    for example, curl -XGET http://ctl01:8888/. Typically, the 200
    response code indicates the healthy state.
Tuning: Not required
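
To see how close a proxy is to the 50% threshold, the metrics from the
raise condition can be queried directly through the Prometheus HTTP API;
the prometheus:9090 endpoint is a placeholder for your Prometheus
server:

    # Count active back-end servers per proxy via the Prometheus API.
    curl -sG 'http://prometheus:9090/api/v1/query' \
        --data-urlencode 'query=sum(haproxy_active_servers{type="server"}) by (proxy)'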
HaproxyBackendOutage
Severity: Critical
Summary: All {{ $labels.proxy }} back ends are down.
Raise condition:
max(haproxy_active_servers{sv="BACKEND"}) by (proxy)
+ max(haproxy_backup_servers{sv="BACKEND"}) by (proxy) == 0
Description: Raises when all back-end servers used by the HAProxy service
across the cluster are unavailable to process the requests proxied by
HAProxy, typically indicating deployment or configuration issues. The
proxy label in the raised alert contains the service proxy name.
Troubleshooting (see the sketch after this entry):
- Verify the affected back ends.
- Inspect the HAProxy logs by running journalctl -u haproxy on the
  affected node.
- Inspect the Telegraf logs by running journalctl -u telegraf on the
  affected node.
Tuning: Not required
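
The per-back-end status that HAProxy itself reports can be listed
through its stats socket; the socket path and the CSV column position of
the status field are assumptions that may vary between HAProxy versions
and deployments:

    # List each proxy's BACKEND row together with its status (UP/DOWN);
    # the status is assumed to be the 18th column of the stats CSV.
    echo "show stat" | socat stdio /run/haproxy/admin.sock | \
        awk -F, '$2 == "BACKEND" {print $1, $18}'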