ZooKeeper

This section describes the alerts for the ZooKeeper service.

ZookeeperServiceDown
ZookeeperServiceErrorWarning
ZookeeperServicesDownMinor
ZookeeperServicesDownMajor
ZookeeperServiceOutage

ZookeeperServiceDown

Severity	Minor
Summary	The ZooKeeper service on the `{{ $labels.host }}` node is down for 2 minutes.
Raise condition	`zookeeper_up == 0`
Description	Raises when the ZooKeeper service on a host node does not respond to Telegraf, typically indicating that ZooKeeper is down on that node. The `host` label in the raised alert contains the host name of the affected node.
Troubleshooting	Verify the ZooKeeper status on the affected node using `service zookeeper status`. If ZooKeeper is up and running, inspect the Telegraf logs on the affected node using `journalctl -u telegraf`.
Tuning	Not required
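
To see which nodes currently satisfy the raise condition, you can evaluate the same expression through the Prometheus HTTP API. The following is a minimal sketch and not part of the product: the Prometheus address is an assumption about your deployment, and `down_hosts` and `fetch_down_hosts` are hypothetical helper names. It relies only on the standard `/api/v1/query` endpoint and on the `host` label described above.

```python
import json
import urllib.parse
import urllib.request


def down_hosts(payload):
    """Extract `host` label values from a Prometheus instant-query response."""
    if payload.get("status") != "success":
        raise ValueError("Prometheus query failed: %r" % payload.get("error"))
    results = payload["data"]["result"]
    # Each result is one time series that matched `zookeeper_up == 0`.
    return sorted(r["metric"].get("host", "<unknown>") for r in results)


def fetch_down_hosts(prometheus_url):
    """Run the alert expression against Prometheus (URL is deployment-specific)."""
    query = urllib.parse.urlencode({"query": "zookeeper_up == 0"})
    url = "%s/api/v1/query?%s" % (prometheus_url.rstrip("/"), query)
    with urllib.request.urlopen(url, timeout=10) as response:
        return down_hosts(json.load(response))
```

For example, `fetch_down_hosts("http://prometheus.example:9090")` (a placeholder address) returns the sorted host names of the nodes where `zookeeper_up` is currently 0, or an empty list if no node matches.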

ZookeeperServiceErrorWarning

Severity	Warning
Summary	The ZooKeeper service on the `{{ $labels.host }}` node is not responding for 2 minutes.
Raise condition	`zookeeper_service_health == 0`
Description	Raises when the ZooKeeper service on a node reports an unhealthy operational state, typically because the service is unresponsive due to high load or an operating system or hardware issue on the node.
Troubleshooting	Inspect `dmesg` and `/var/log/kern.log`. Inspect the logs in `/var/log/zookeeper`.
Tuning	Not required
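
Before digging into the logs, it can help to confirm whether the service answers at all. The sketch below sends ZooKeeper's `ruok` four-letter command over a plain TCP socket; a healthy server replies `imok`. The default client port 2181 and the helper name `zookeeper_ruok` are assumptions. On ZooKeeper 3.5 and later, the command must be allowed through `4lw.commands.whitelist`; otherwise this probe reports a failure even when the service is healthy.

```python
import socket


def zookeeper_ruok(host, port=2181, timeout=5.0):
    """Return True if the server answers the `ruok` command with `imok`."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(b"ruok")
            chunks = []
            while True:
                data = conn.recv(64)
                if not data:  # server closed the connection
                    break
                chunks.append(data)
        return b"".join(chunks) == b"imok"
    except OSError:
        # Refused, unreachable, or timed out: treat as not responding.
        return False
```

A `False` result from the affected node itself, combined with a running process, points toward the load or system-level causes listed in the description rather than a stopped service.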

ZookeeperServicesDownMinor

Severity	Minor
Summary	More than 30% of ZooKeeper services are down for 2 minutes.
Raise condition	`count(zookeeper_up == 0) >= count(zookeeper_up) * 0.3`
Description	Raises when at least 30% of the ZooKeeper services in a cluster are unavailable. The raise condition uses `>=`, so the alert also fires at exactly 30%.
Troubleshooting	Inspect the ZooKeeper logs on any node of the affected cluster using `journalctl -u zookeeper`.
Tuning	Not required

ZookeeperServicesDownMajor

Severity	Major
Summary	More than 60% of ZooKeeper services are down for 2 minutes.
Raise condition	`count(zookeeper_up == 0) >= count(zookeeper_up) * 0.6`
Description	Raises when at least 60% of the ZooKeeper services in a cluster are unavailable. The raise condition uses `>=`, so the alert also fires at exactly 60%.
Troubleshooting	Inspect the ZooKeeper logs on any node of the affected cluster using `journalctl -u zookeeper`.
Tuning	Not required

ZookeeperServiceOutage

Severity	Critical
Summary	All ZooKeeper services are down for 2 minutes.
Raise condition	`count(zookeeper_up == 0) == count(zookeeper_up)`
Description	Raises when none of the ZooKeeper services in a cluster responds to Telegraf, typically indicating deployment or configuration issues.
Troubleshooting	Inspect the ZooKeeper logs on any node of the affected cluster using `journalctl -u zookeeper`. If ZooKeeper is up and running, inspect the Telegraf logs on the affected nodes using `journalctl -u telegraf`.
Tuning	Not required
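
The three cluster-level raise conditions (ZookeeperServicesDownMinor, ZookeeperServicesDownMajor, and ZookeeperServiceOutage) differ only in their thresholds, so a small worked example shows where each one starts to apply. The sketch below mirrors the expressions in plain Python. The function name `cluster_alerts` is hypothetical, the input is assumed to be one `zookeeper_up` sample per service of a single cluster, and the 2-minute hold period is ignored.

```python
def cluster_alerts(up_values):
    """Return the cluster-level alerts whose raise condition holds.

    `up_values` holds one `zookeeper_up` sample (1 or 0) per service.
    """
    total = len(up_values)
    down = sum(1 for value in up_values if value == 0)
    if down == 0:
        # `count(zookeeper_up == 0)` returns no data, so nothing can fire.
        return []
    firing = []
    if down >= total * 0.3:
        firing.append("ZookeeperServicesDownMinor")
    if down >= total * 0.6:
        firing.append("ZookeeperServicesDownMajor")
    if down == total:
        firing.append("ZookeeperServiceOutage")
    return firing
```

In a three-node ensemble, a single failed service (33%) already satisfies the Minor condition and two failed services (67%) satisfy the Major one. In a five-node ensemble, a single failure (20%) stays below every cluster-level threshold and is covered only by the per-node ZookeeperServiceDown alert.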

updated: 2025-01-10 08:56
