GlusterFS

This section describes the alerts for the GlusterFS service.


GlusterfsServiceMinor

Severity Minor
Summary The GlusterFS service on the {{ $labels.host }} host is down.
Raise condition procstat_running{process_name="glusterd"} < 1
Description Raises when Telegraf cannot find running glusterd processes on the kvm hosts.
Troubleshooting
  • Verify the GlusterFS status using systemctl status glusterfs-server.
  • Inspect GlusterFS logs in the /var/log/glusterfs/ directory.
Tuning Not required
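
In addition to systemctl, a quick manual check on the affected kvm host can confirm whether the glusterd daemon is running and whether the host still sees its GlusterFS peers. The following is a sketch, assuming shell access to the affected host:

  # Check for a running glusterd process
  pgrep -af glusterd

  # Verify that the host is still part of the trusted storage pool
  gluster peer status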

GlusterfsServiceOutage

Severity Critical
Summary All GlusterFS services are down.
Raise condition glusterfs_up != 1
Description Raises when the Telegraf service cannot connect to or gather metrics from the GlusterFS service, which typically indicates an issue with the GlusterFS service, the Telegraf monitoring_remote_agent service, or the network.
Troubleshooting
  • Inspect the Telegraf monitoring_remote_agent service logs by running docker service logs monitoring_remote_agent on any mon node.
  • Verify the GlusterFS status using systemctl status glusterfs-server.
  • Inspect GlusterFS logs in the /var/log/glusterfs/ directory.
Tuning Not required
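
To narrow down the cause, you can also check the state of the monitoring_remote_agent replicas and query the metric directly from the Prometheus HTTP API. The following is a sketch; <prometheus_address> is a placeholder for the address and port of your Prometheus server endpoint:

  # On any mon node, verify that the monitoring_remote_agent replicas are running
  docker service ps monitoring_remote_agent

  # Query the glusterfs_up metric directly from Prometheus
  curl -s 'http://<prometheus_address>/api/v1/query?query=glusterfs_up'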

GlusterfsInodesUsedMinor

Severity Minor
Summary More than 80% of GlusterFS {{ $labels.volume }} volume inodes are used for 2 minutes.
Raise condition glusterfs_inodes_percent_used >= {{ monitoring.inodes_percent_used_minor_threshold_percent*100 }} and glusterfs_inodes_percent_used < {{ monitoring.inodes_percent_used_major_threshold_percent*100 }}
Description Raises when GlusterFS uses more than 80% and less than 90% of available inodes. The volume label in the raised alert contains the affected GlusterFS volume.
Troubleshooting
  • Verify the number of objects stored in GlusterFS.
  • If possible, increase the number of inodes.
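
To verify the inode usage of the affected volume directly on a GlusterFS node, you can use the following command. This is a sketch; replace <volume_name> with the value of the volume label from the alert:

  # Show brick-level details for the volume, including the inode count
  # and the number of free inodes per brick
  gluster volume status <volume_name> detail
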
Tuning

Typically, you should not change the default value. If the alert is constantly firing, verify the available inodes on the GlusterFS nodes and adjust the threshold accordingly. In the Prometheus web UI, run the raise condition query over a longer time range to define the best threshold.
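
The following query, entered in the Prometheus web UI, shows the highest inode usage per volume over the last seven days and can help to pick a realistic threshold. This is a sketch; adjust the range to the period you want to inspect:

  max_over_time(glusterfs_inodes_percent_used[7d])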

For example, to change the threshold to the 90-95% interval:

  1. On the cluster level of the Reclass model, create a common file for all alert customizations. Skip this step if such a file already exists.

    1. Create a file for alert customizations:

      touch cluster/<cluster_name>/stacklight/custom/alerts.yml
      
    2. Define the new file in cluster/<cluster_name>/stacklight/server.yml:

      classes:
      - cluster.<cluster_name>.stacklight.custom.alerts
      ...
      
  2. In the defined alert customizations file, modify the alert threshold by overriding the if parameter:

    parameters:
      prometheus:
        server:
          alert:
            GlusterfsInodesUsedMinor:
              if: >-
                glusterfs_inodes_percent_used >= 90 and
                glusterfs_inodes_percent_used < 95
    
  3. From the Salt Master node, apply the changes:

    salt 'I@prometheus:server' state.sls prometheus.server
    
  4. Verify the updated alert definition in the Prometheus web UI.
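
Instead of opening the web UI, you can also confirm that Prometheus loaded the updated rule through its HTTP API. The following is a sketch; <prometheus_address> is a placeholder for the address and port of your Prometheus server endpoint:

  # List the loaded alerting rules and show the definition of the modified alert
  curl -s 'http://<prometheus_address>/api/v1/rules' | python -m json.tool | grep -A 3 GlusterfsInodesUsedMinor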

GlusterfsInodesUsedMajor

Severity Major
Summary {{ $value }}% of GlusterFS {{ $labels.volume }} volume inodes are used for 2 minutes.
Raise condition glusterfs_inodes_percent_used >= {{ monitoring.inodes_percent_used_major_threshold_percent*100 }}
Description Raises when GlusterFS uses more than 90% of available inodes. The volume label in the raised alert contains the affected GlusterFS volume.
Troubleshooting
  • Verify the number of objects stored in GlusterFS.
  • If possible, increase the number of inodes.
Tuning

Typically, you should not change the default value. If the alert is constantly firing, verify the available inodes on the GlusterFS nodes and adjust the threshold accordingly. In the Prometheus web UI, run the raise condition query over a longer time range to define the best threshold.

For example, to change the threshold to 95%:

  1. On the cluster level of the Reclass model, create a common file for all alert customizations. Skip this step if such a file already exists.

    1. Create a file for alert customizations:

      touch cluster/<cluster_name>/stacklight/custom/alerts.yml
      
    2. Define the new file in cluster/<cluster_name>/stacklight/server.yml:

      classes:
      - cluster.<cluster_name>.stacklight.custom.alerts
      ...
      
  2. In the defined alert customizations file, modify the alert threshold by overriding the if parameter:

    parameters:
      prometheus:
        server:
          alert:
            GlusterfsInodesUsedMajor:
              if: >-
                glusterfs_inodes_percent_used >= 95
    
  3. From the Salt Master node, apply the changes:

    salt 'I@prometheus:server' state.sls prometheus.server
    
  4. Verify the updated alert definition in the Prometheus web UI.

GlusterfsSpaceUsedMinor

Severity Minor
Summary {{ $value }}% of GlusterFS {{ $labels.volume }} volume disk space is used for 2 minutes.
Raise condition glusterfs_space_percent_used >= {{ monitoring.space_percent_used_minor_threshold_percent*100 }} and glusterfs_space_percent_used < {{ monitoring.space_percent_used_major_threshold_percent*100 }}
Description Raises when GlusterFS uses more than 80% and less than 90% of available space. The volume label in the raised alert contains the affected GlusterFS volume.
Troubleshooting
  • Inspect the data stored in GlusterFS.
  • Increase the GlusterFS capacity.
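
To inspect the space usage directly on a node where the volume is mounted, you can use the following commands. This is a sketch; the mount point path is only an example, use the one reported by df:

  # List the mounted GlusterFS volumes with their disk usage
  df -h -t fuse.glusterfs

  # Identify the largest directories on the affected volume
  # (example path, use the mount point reported by df)
  du -sh /mnt/glusterfs/<volume_name>/* | sort -rh | head
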
Tuning

Typically, you should not change the default value. If the alert is constantly firing, verify the available space on the GlusterFS nodes and adjust the threshold accordingly. In the Prometheus web UI, run the raise condition query over a longer time range to define the best threshold.

For example, to change the threshold to the 90-95% interval:

  1. On the cluster level of the Reclass model, create a common file for all alert customizations. Skip this step if such a file already exists.

    1. Create a file for alert customizations:

      touch cluster/<cluster_name>/stacklight/custom/alerts.yml
      
    2. Define the new file in cluster/<cluster_name>/stacklight/server.yml:

      classes:
      - cluster.<cluster_name>.stacklight.custom.alerts
      ...
      
  2. In the defined alert customizations file, modify the alert threshold by overriding the if parameter:

    parameters:
      prometheus:
        server:
          alert:
            GlusterfsSpaceUsedMinor:
              if: >-
                glusterfs_space_percent_used >= 90 and
                glusterfs_space_percent_used < 95
    
  3. From the Salt Master node, apply the changes:

    salt 'I@prometheus:server' state.sls prometheus.server
    
  4. Verify the updated alert definition in the Prometheus web UI.

GlusterfsSpaceUsedMajor

Severity Major
Summary {{ $value }}% of GlusterFS {{ $labels.volume }} volume disk space is used for 2 minutes.
Raise condition glusterfs_space_percent_used >= {{ monitoring.space_percent_used_major_threshold_percent*100 }}
Description Raises when GlusterFS uses more than 90% of available space. The volume label in the raised alert contains the affected GlusterFS volume.
Troubleshooting
  • Inspect the data stored in GlusterFS.
  • Increase the GlusterFS capacity.
Tuning

Typically, you should not change the default value. If the alert is constantly firing, verify the available space on the GlusterFS nodes and adjust the threshold accordingly. In the Prometheus web UI, run the raise condition query over a longer time range to define the best threshold.

For example, to change the threshold to 95%:

  1. On the cluster level of the Reclass model, create a common file for all alert customizations. Skip this step if such a file already exists.

    1. Create a file for alert customizations:

      touch cluster/<cluster_name>/stacklight/custom/alerts.yml
      
    2. Define the new file in cluster/<cluster_name>/stacklight/server.yml:

      classes:
      - cluster.<cluster_name>.stacklight.custom.alerts
      ...
      
  2. In the defined alert customizations file, modify the alert threshold by overriding the if parameter:

    parameters:
      prometheus:
        server:
          alert:
            GlusterfsSpaceUsedMajor:
              if: >-
                glusterfs_space_percent_used >= 95
    
  3. From the Salt Master node, apply the changes:

    salt 'I@prometheus:server' state.sls prometheus.server
    
  4. Verify the updated alert definition in the Prometheus web UI.


GlusterfsMountMissing

Available since the 2019.2.11 maintenance update

Severity Major
Summary GlusterFS mount point is not mounted.
Raise condition delta(glusterfs_mount_scrapes:rate5m{fstype=~"(fuse.)?glusterfs"}[5m]) < 0 or glusterfs_mount_scrapes:rate5m{fstype=~"(fuse.)?glusterfs"} == 0
Description Raises when a GlusterFS mount point is not mounted. The path, host, and device labels in the raised alert contain the path, node name, and device of the affected mount point.
Troubleshooting To perform a manual remount, apply the salt '*' state.sls glusterfs.client Salt state from the Salt Master node.
Tuning Not required
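
To verify the state of the mount point before and after applying the Salt state, you can use the following commands on the affected node. This is a sketch, assuming shell access to the node:

  # Check whether the GlusterFS mount point is currently present
  grep glusterfs /proc/mounts

  # After applying the glusterfs.client state, confirm that the volume is mounted again
  df -h -t fuse.glusterfs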