GlusterFS
This section describes the alerts for the GlusterFS service.
GlusterfsServiceMinor
Severity |
Minor |
Summary |
The GlusterFS service on the {{ $labels.host }} host is down. |
Raise condition |
procstat_running{process_name="glusterd"} < 1
|
Description |
Raises when Telegraf cannot find a running glusterd process on the
kvm hosts. |
Troubleshooting |
|
Tuning |
Not required |
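The raise condition can be reproduced manually on a suspect host. A minimal sketch, assuming GNU pgrep is available; the process name glusterd comes from the alert expression, everything else is illustrative:

```shell
#!/bin/sh
# Count running glusterd processes, mirroring the raise condition
# procstat_running{process_name="glusterd"} < 1.
check_glusterd() {
  # pgrep -c prints 0 and exits non-zero when nothing matches,
  # so the fallback keeps the count usable either way.
  running=$(pgrep -c -x glusterd 2>/dev/null || true)
  running=${running:-0}
  if [ "$running" -lt 1 ]; then
    echo "ALERT: no running glusterd process found"
  else
    echo "OK: $running glusterd process(es) running"
  fi
}
check_glusterd
```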
GlusterfsServiceOutage
Severity |
Critical |
Summary |
All GlusterFS services are down. |
Raise condition |
glusterfs_up != 1
|
Description |
Raises when the Telegraf service cannot connect to or gather metrics
from the GlusterFS service, typically indicating an issue with the
GlusterFS service, the Telegraf monitoring_remote_agent service, or
the network. |
Troubleshooting |
Inspect the Telegraf monitoring_remote_agent service logs by
running docker service logs monitoring_remote_agent on any
mon node.
Verify the GlusterFS status using systemctl status glusterfs-server.
Inspect GlusterFS logs in the /var/log/glusterfs/ directory.
|
Tuning |
Not required |
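The three troubleshooting steps above can be run in order from one sketch; tools that are not present on the current node (docker lives on the mon nodes only) are skipped rather than failing. The log path and the --tail/-n depths are illustrative:

```shell
#!/bin/sh
# Run each triage command if its tool exists on this node,
# otherwise note the skip and continue.
step() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "== $*"
    "$@" 2>&1 || true
  else
    echo "skip: $1 not available here"
  fi
}
step docker service logs --tail 50 monitoring_remote_agent
step systemctl status glusterfs-server
step tail -n 20 /var/log/glusterfs/glusterd.log
```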
GlusterfsInodesUsedMinor
Severity |
Minor |
Summary |
More than 80% of GlusterFS {{ $labels.volume }} volume inodes are
used for 2 minutes. |
Raise condition |
glusterfs_inodes_percent_used >=
{{ monitoring.inodes_percent_used_minor_threshold_percent*100 }} and
glusterfs_inodes_percent_used <
{{ monitoring.inodes_percent_used_major_threshold_percent*100 }}
|
Description |
Raises when GlusterFS uses more than 80% and less than 90% of available
inodes. The volume label in the raised alert contains the affected
GlusterFS volume. |
Troubleshooting |
|
Tuning |
Typically, you should not change the default value. If the alert is
constantly firing, verify the available inodes on the GlusterFS nodes
and adjust the threshold according to the number of available inodes.
In the Prometheus web UI, run the raise condition query over a longer
time range to determine the best threshold.
For example, to change the threshold to the 90-95% interval:
On the cluster level of the Reclass model, create a common file for
all alert customizations. Skip this step if an alert customizations
file is already defined.
Create a file for alert customizations:
touch cluster/<cluster_name>/stacklight/custom/alerts.yml
Define the new file in
cluster/<cluster_name>/stacklight/server.yml :
classes:
- cluster.<cluster_name>.stacklight.custom.alerts
...
In the defined alert customizations file, modify the alert threshold
by overriding the if parameter:
parameters:
prometheus:
server:
alert:
GlusterfsInodesUsedMinor:
if: >-
glusterfs_inodes_percent_used >= 90 and
glusterfs_inodes_percent_used < 95
From the Salt Master node, apply the changes:
salt 'I@prometheus:server' state.sls prometheus.server
Verify the updated alert definition in the Prometheus web UI.
|
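The glusterfs_inodes_percent_used value can be cross-checked directly on a storage node with df. A minimal sketch; the brick mount point argument is hypothetical, and the example defaults to / only so it runs anywhere:

```shell
#!/bin/sh
# Report inode usage percent for a mount point, the quantity that
# glusterfs_inodes_percent_used tracks per volume.
check_inodes() {
  path=${1:-/}
  # df -P gives a stable column layout: $2=total inodes, $3=used, $6=mount.
  df -Pi "$path" | awk 'NR == 2 {
    pct = ($2 > 0) ? $3 / $2 * 100 : 0
    printf "%s: %.1f%% of inodes used\n", $6, pct
    exit (pct >= 80) ? 1 : 0   # non-zero once the 80% minor threshold is hit
  }'
}
check_inodes /
```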
GlusterfsInodesUsedMajor
Severity |
Major |
Summary |
{{ $value }}% of GlusterFS {{ $labels.volume }} volume inodes
are used for 2 minutes.
|
Raise condition |
glusterfs_inodes_percent_used >=
{{ monitoring.inodes_percent_used_major_threshold_percent*100 }}
|
Description |
Raises when GlusterFS uses more than 90% of available inodes. The
volume label in the raised alert contains the affected GlusterFS
volume. |
Troubleshooting |
|
Tuning |
Typically, you should not change the default value. If the alert is
constantly firing, verify the available inodes on the GlusterFS nodes
and adjust the threshold according to the number of available inodes.
In the Prometheus web UI, run the raise condition query over a longer
time range to determine the best threshold.
For example, to change the threshold to 95%:
On the cluster level of the Reclass model, create a common file for
all alert customizations. Skip this step if an alert customizations
file is already defined.
Create a file for alert customizations:
touch cluster/<cluster_name>/stacklight/custom/alerts.yml
Define the new file in
cluster/<cluster_name>/stacklight/server.yml :
classes:
- cluster.<cluster_name>.stacklight.custom.alerts
...
In the defined alert customizations file, modify the alert threshold
by overriding the if parameter:
parameters:
prometheus:
server:
alert:
GlusterfsInodesUsedMajor:
if: >-
glusterfs_inodes_percent_used >= 95
From the Salt Master node, apply the changes:
salt 'I@prometheus:server' state.sls prometheus.server
Verify the updated alert definition in the Prometheus web UI.
|
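The >= 90 figure is not hardcoded: the templated raise condition multiplies a pillar fraction by 100. A hypothetical sketch of how such defaults could look in pillar data — the key names are taken from the template variables above, but their exact location in the model is an assumption:

```yaml
monitoring:
  # Rendered into the raise conditions as {{ ... * 100 }}:
  inodes_percent_used_minor_threshold_percent: 0.8   # -> >= 80
  inodes_percent_used_major_threshold_percent: 0.9   # -> >= 90
```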
GlusterfsSpaceUsedMinor
Severity |
Minor |
Summary |
{{ $value }}% of GlusterFS {{ $labels.volume }} volume disk
space is used for 2 minutes.
|
Raise condition |
glusterfs_space_percent_used >=
{{ monitoring.space_percent_used_minor_threshold_percent*100 }} and
glusterfs_space_percent_used <
{{ monitoring.space_percent_used_major_threshold_percent*100 }}
|
Description |
Raises when GlusterFS uses more than 80% and less than 90% of available
space. The volume label in the raised alert contains the affected
GlusterFS volume. |
Troubleshooting |
|
Tuning |
Typically, you should not change the default value. If the alert is
constantly firing, verify the available space on the GlusterFS nodes
and adjust the threshold accordingly. In the Prometheus web UI, run
the raise condition query over a longer time range to determine the
best threshold.
For example, to change the threshold to the 90-95% interval:
On the cluster level of the Reclass model, create a common file for
all alert customizations. Skip this step if an alert customizations
file is already defined.
Create a file for alert customizations:
touch cluster/<cluster_name>/stacklight/custom/alerts.yml
Define the new file in
cluster/<cluster_name>/stacklight/server.yml :
classes:
- cluster.<cluster_name>.stacklight.custom.alerts
...
In the defined alert customizations file, modify the alert threshold
by overriding the if parameter:
parameters:
prometheus:
server:
alert:
GlusterfsSpaceUsedMinor:
if: >-
glusterfs_space_percent_used >= 90 and
glusterfs_space_percent_used < 95
From the Salt Master node, apply the changes:
salt 'I@prometheus:server' state.sls prometheus.server
Verify the updated alert definition in the Prometheus web UI.
|
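As with inodes, the glusterfs_space_percent_used value can be cross-checked on a storage node with df. A minimal sketch; the brick mount point argument is hypothetical, and the default of / only makes the example runnable anywhere:

```shell
#!/bin/sh
# Report disk-space usage percent for a mount point, the quantity that
# glusterfs_space_percent_used tracks per volume.
check_space() {
  path=${1:-/}
  df -P "$path" | awk 'NR == 2 {
    gsub(/%/, "", $5)             # $5 is "Use%" with a trailing percent sign
    printf "%s: %s%% of space used\n", $6, $5
    exit ($5 >= 80) ? 1 : 0      # non-zero once the 80% minor threshold is hit
  }'
}
check_space /
```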
GlusterfsSpaceUsedMajor
Severity |
Major |
Summary |
{{ $value }}% of GlusterFS {{ $labels.volume }} volume disk
space is used for 2 minutes.
|
Raise condition |
glusterfs_space_percent_used >=
{{ monitoring.space_percent_used_major_threshold_percent*100 }}
|
Description |
Raises when GlusterFS uses more than 90% of available space. The
volume label in the raised alert contains the affected GlusterFS
volume. |
Troubleshooting |
|
Tuning |
Typically, you should not change the default value. If the alert is
constantly firing, verify the available space on the GlusterFS nodes
and adjust the threshold accordingly. In the Prometheus web UI, run
the raise condition query over a longer time range to determine the
best threshold.
For example, to change the threshold to 95%:
On the cluster level of the Reclass model, create a common file for
all alert customizations. Skip this step if an alert customizations
file is already defined.
Create a file for alert customizations:
touch cluster/<cluster_name>/stacklight/custom/alerts.yml
Define the new file in
cluster/<cluster_name>/stacklight/server.yml :
classes:
- cluster.<cluster_name>.stacklight.custom.alerts
...
In the defined alert customizations file, modify the alert threshold
by overriding the if parameter:
parameters:
prometheus:
server:
alert:
GlusterfsSpaceUsedMajor:
if: >-
glusterfs_space_percent_used >= 95
From the Salt Master node, apply the changes:
salt 'I@prometheus:server' state.sls prometheus.server
Verify the updated alert definition in the Prometheus web UI.
|
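Before applying such an override, the new expression can be sanity-checked against sample values. A small illustration with made-up per-volume readings:

```shell
# Which volumes would fire at the overridden >= 95 threshold?
# The volume names and values below are fabricated examples.
printf '%s\n' 'vol0 93.2' 'vol1 96.7' |
  awk '{ print $1, ($2 >= 95) ? "FIRING" : "ok" }'
```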
GlusterfsMountMissing
Available since the 2019.2.11 maintenance update
Severity |
Major |
Summary |
GlusterFS mount point is not mounted. |
Raise condition |
delta(glusterfs_mount_scrapes:rate5m{fstype=~"(fuse.)?glusterfs"}[5m])
< 0 or glusterfs_mount_scrapes:rate5m{fstype=~"(fuse.)?glusterfs"}
== 0
|
Description |
Raises when a GlusterFS mount point is not mounted. The path,
host, and device labels in the raised alert contain the path,
node name, and device of the affected mount point. |
Troubleshooting |
To perform a manual remount, apply the
salt '*' state.sls glusterfs.client Salt state from the Salt Master
node. |
Tuning |
Not required |
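What the alert detects can be confirmed locally by checking /proc/mounts for the GlusterFS filesystem type. A minimal sketch assuming a Linux node; per the alert's fstype matcher, the type to look for on a GlusterFS client is fuse.glusterfs:

```shell
#!/bin/sh
# Check whether any filesystem of the given type is currently mounted.
check_mounted() {
  fstype=$1
  # Field 3 of /proc/mounts is the filesystem type.
  if awk -v t="$fstype" '$3 == t { found = 1 } END { exit !found }' /proc/mounts
  then
    echo "$fstype: mounted"
  else
    echo "$fstype: NOT mounted"
  fi
}
check_mounted fuse.glusterfs
```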