Memcached¶

This section describes the alerts for the Memcached service.

MemcachedServiceDown
MemcachedServiceRespawn
MemcachedConnectionThrottled
MemcachedConnectionsNoneMinor
MemcachedConnectionsNoneMajor
MemcachedItemsNoneMinor
MemcachedEvictionsLimit

MemcachedServiceDown¶

Severity	Minor
Summary	The Memcached service on the `{{ $labels.host }}` node is down.
Raise condition	`memcached_up == 0`
Description	Raises when Telegraf cannot gather metrics from the Memcached service, typically indicating that Memcached is down on one node and that caching does not work on that node. The `host` label in the raised alert contains the host name of the affected node.
Troubleshooting	Verify the Memcached service status using `systemctl status memcached`. Inspect the Memcached service logs using `journalctl -xfu memcached`.
Tuning	Not required

MemcachedServiceRespawn¶

Removed since the 2019.2.4 maintenance update.

Severity	Warning
Summary	The Memcached service on the `{{ $labels.host }}` node was respawned.
Raise condition	`memcached_uptime < 180`
Description	Raises when the Memcached service uptime is below 180 seconds, indicating that the service was recently respawned (restarted). If the respawn happened during maintenance, the alert is expected. Otherwise, this alert indicates an issue with the service. The `host` label in the raised alert contains the host name of the affected node. Warning: The alert is a partial duplicate of `MemcachedServiceDown` and has been removed starting from the 2019.2.4 maintenance update. For existing MCP deployments, verify and disable this alert.
Troubleshooting	Verify the Memcached service status using `systemctl status memcached`. Inspect the Memcached service logs using `journalctl -xfu memcached`.
Tuning	Disable the alert as described in Manage alerts.

MemcachedConnectionThrottled¶

Severity	Warning
Summary	More than 5 client connections to the Memcached database on the `{{ $labels.host }}` node throttle for 2 minutes.
Raise condition	`increase(memcached_conn_yields[1m]) > 5`
Description	Raises when the number of times the Memcached connection was throttled exceeds 5 over the last minute. This warning appears together with the "Too many open connections" error message in Memcached. Too many connections may cause write errors because of process starvation (blocking). To avoid this, Memcached throttles the connection. The `host` label in the raised alert contains the host name of the affected node.
Troubleshooting	Use `telnet` to connect to Memcached by running `telnet localhost 11211` on the affected node. Then run `stats` to obtain the server information. Inspect the Memcached service logs using `journalctl -xfu memcached`. Adjust the threshold if required.
Tuning	To change the throttling threshold to `10`:
  1. On the cluster level of the Reclass model, create a common file for all alert customizations. Skip this step to use an existing defined file.
     a. Create a file for alert customizations:
            touch cluster/<cluster_name>/stacklight/custom/alerts.yml
     b. Define the new file in `cluster/<cluster_name>/stacklight/server.yml`:
            classes:
            - cluster.<cluster_name>.stacklight.custom.alerts
            ...
  2. In the defined alert customizations file, modify the alert threshold by overriding the `if` parameter:
            parameters:
              prometheus:
                server:
                  alert:
                    MemcachedConnectionThrottled:
                      if: >-
                        increase(memcached_conn_yields[1m]) > 10
  3. From the Salt Master node, apply the changes:
            salt 'I@prometheus:server' state.sls prometheus.server
  4. Verify the updated alert definition in the Prometheus web UI.
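To judge whether a new threshold is sensible, it can help to reason about what the raise condition compares. The following is a minimal sketch, not the Prometheus implementation: the hypothetical `counter_exceeded` helper takes two samples of a counter such as `conn_yields`, taken about a minute apart, and checks whether the growth exceeds a limit. PromQL `increase()` additionally extrapolates over the window, so its results may differ slightly from this plain difference.

```shell
# counter_exceeded PREV CURR LIMIT (hypothetical helper, not part of StackLight)
# Succeeds when the counter grew by more than LIMIT between two samples.
# A current value below the previous one is treated as a counter reset
# (for example, after a service restart), so growth is counted from zero.
counter_exceeded() {
    prev=$1
    curr=$2
    limit=$3
    if [ "$curr" -lt "$prev" ]; then
        delta=$curr
    else
        delta=$((curr - prev))
    fi
    [ "$delta" -gt "$limit" ]
}
```

For example, `counter_exceeded 100 106 5` succeeds because the counter grew by 6, while `counter_exceeded 100 105 5` does not because the growth equals the limit.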

MemcachedConnectionsNoneMinor¶

Severity	Minor
Summary	The Memcached database on the `{{ $labels.host }}` node has no open connections.
Raise condition	`memcached_curr_connections == 0`
Description	Raises when no connections to Memcached exist on one node, typically indicating that the connections were dropped. The state may affect performance. The `host` label in the raised alert contains the host name of the affected node.
Troubleshooting	Use `telnet` to connect to Memcached by running `telnet localhost 11211` on the affected node. Then run `stats` to obtain the server information. Inspect the Memcached service logs using `journalctl -xfu memcached`.
Tuning	Not required
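The `stats` check above can also be scripted rather than typed into an interactive `telnet` session. The following is a minimal sketch under stated assumptions: `stat_value` is a hypothetical helper name, and it only parses `stats` output passed on standard input, so obtaining that output is left to the operator.

```shell
# stat_value NAME (hypothetical helper)
# Reads Memcached `stats` output on stdin and prints the value of one statistic.
# Memcached terminates protocol lines with CRLF, so carriage returns are stripped first.
stat_value() {
    tr -d '\r' | awk -v name="$1" '$1 == "STAT" && $2 == name { print $3 }'
}
```

Assuming `nc` is installed on the affected node, `printf 'stats\nquit\n' | nc localhost 11211 | stat_value curr_connections` would print the current number of open connections.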

MemcachedConnectionsNoneMajor¶

Severity	Major
Summary	The Memcached database has no open connections on all nodes.
Raise condition	`count(memcached_curr_connections == 0) == count(memcached_up)`
Description	Raises when no connections to Memcached exist on any node, indicating that Memcached has no clients connected to it and does not receive data.
Troubleshooting	Use `telnet` to connect to Memcached by running `telnet localhost 11211` on the affected node. Then run `stats` to obtain the server information. Inspect the Memcached service logs using `journalctl -xfu memcached`.
Tuning	Not required
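The raise condition compares the number of nodes that report zero connections with the total number of monitored nodes, so the alert fires only when every node is idle. The logic can be sketched as follows. The `all_nodes_idle` helper and the `ctl01`-style host names in the usage example are hypothetical, and the input format (one `host value` pair per line) is an assumption made for illustration.

```shell
# all_nodes_idle (hypothetical helper)
# Reads "host curr_connections" pairs on stdin, one per line.
# Succeeds only when at least one node is listed and every node reports 0 connections.
all_nodes_idle() {
    awk '{ total++ } $2 == 0 { idle++ } END { exit !(total > 0 && idle == total) }'
}
```

For example, `printf 'ctl01 0\nctl02 0\n' | all_nodes_idle` succeeds, while the same input with `ctl02 4` fails because one node still has clients.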

MemcachedItemsNoneMinor¶

Removed since the 2019.2.4 maintenance update.

Severity	Minor
Summary	The Memcached database on the `{{ $labels.host }}` node is empty.
Raise condition	`memcached_curr_items == 0`
Description	Raises when the Memcached database has no items on one node. Because Memcached is an in-memory database, this may be the result of a Memcached respawn. Otherwise, investigate the reason. The `host` label in the raised alert contains the host name of the affected node. Warning: The alert has been removed starting from the 2019.2.4 maintenance update. For existing MCP deployments, disable this alert.
Troubleshooting	To confirm the issue, use `telnet` to connect to Memcached by running `telnet localhost 11211` on the affected node. Run `stats` and search for `curr_items` and `evictions` to verify that the items were not removed before their TTL. Run `stats items` for further details on the status of the items.
Tuning	Disable the alert as described in Manage alerts.

MemcachedEvictionsLimit¶

Severity	Warning
Summary	More than 10 evictions in the Memcached database occurred on the `{{ $labels.host }}` node during the last minute.
Raise condition	`increase(memcached_evictions[1m]) > 10`
Description	Raises when the number of Memcached items that were removed before their TTL expired has increased by more than 10 (default threshold) over the last minute. Memcached is used on the OpenStack controller nodes to cache the service authentication tokens. A high number of evictions indicates heavy token rotation, since old items must be removed to free space for new ones, based on pseudo-LRU. The `host` label in the raised alert contains the host name of the affected node.
Troubleshooting	Use `telnet` to connect to Memcached by running `telnet localhost 11211` on the affected node. Run `stats slabs` and search for `total_pages`, `chunk_size`, and `chunks_per_page` to verify if the slabs consume too much space.
Tuning	To change the evictions limit to `60`:
  1. On the cluster level of the Reclass model, create a common file for all alert customizations. Skip this step to use an existing defined file.
     a. Create a file for alert customizations:
            touch cluster/<cluster_name>/stacklight/custom/alerts.yml
     b. Define the new file in `cluster/<cluster_name>/stacklight/server.yml`:
            classes:
            - cluster.<cluster_name>.stacklight.custom.alerts
            ...
  2. In the defined alert customizations file, modify the alert by overriding the `if` parameter:
            parameters:
              prometheus:
                server:
                  alert:
                    MemcachedEvictionsLimit:
                      if: >-
                        increase(memcached_evictions[1m]) > 60
  3. From the Salt Master node, apply the changes:
            salt 'I@prometheus:server' state.sls prometheus.server
  4. Verify the updated alert definition in the Prometheus web UI.
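When reviewing `stats slabs` output during troubleshooting, a quick total of the allocated pages gives a first impression of how much memory the slabs hold. The following is a minimal sketch: `slab_pages` is a hypothetical helper that sums the `total_pages` values of all slab classes from output passed on standard input. Reading the result as megabytes assumes the default Memcached page size of 1 MB, which may not match a tuned deployment.

```shell
# slab_pages (hypothetical helper)
# Reads Memcached `stats slabs` output on stdin and prints the total number of
# pages allocated across all slab classes. Lines look like "STAT 1:total_pages 3".
# Carriage returns are stripped because Memcached ends protocol lines with CRLF.
slab_pages() {
    tr -d '\r' | awk '$1 == "STAT" && $2 ~ /:total_pages$/ { sum += $3 } END { print sum + 0 }'
}
```

Assuming `nc` is installed on the affected node, `printf 'stats slabs\nquit\n' | nc localhost 11211 | slab_pages` would print the page total.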

updated: 2025-01-10 08:56
