GlusterFS

This section describes the alerts for the GlusterFS service.


GlusterfsServiceMinor

Severity

Minor

Summary

The GlusterFS service on the {{ $labels.host }} host is down.

Raise condition

procstat_running{process_name="glusterd"} < 1

Description

Raises when Telegraf cannot find running glusterd processes on a kvm host.

Troubleshooting

  • Verify the GlusterFS status using systemctl status glusterfs-server.

  • Inspect GlusterFS logs in the /var/log/glusterfs/ directory.
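The two checks above can be combined into a quick sketch that mirrors what the procstat input measures: the alert fires when the count of running glusterd processes drops below 1. The `systemctl` call repeats the documented troubleshooting step; output details depend on your distribution.

```shell
# Count running glusterd processes, mirroring procstat_running < 1.
running=$(pgrep -c -x glusterd || true)
if [ "${running:-0}" -lt 1 ]; then
    echo "no glusterd process found; GlusterfsServiceMinor would fire"
    systemctl status glusterfs-server --no-pager || true
else
    echo "glusterd is running (${running} process(es))"
fi
```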

Tuning

Not required

GlusterfsServiceOutage

Severity

Critical

Summary

All GlusterFS services are down.

Raise condition

glusterfs_up != 1

Description

Raises when the Telegraf service cannot connect to or gather metrics from the GlusterFS service, typically indicating an issue with GlusterFS itself, the Telegraf monitoring_remote_agent service, or the network.

Troubleshooting

  • Inspect the Telegraf monitoring_remote_agent service logs by running docker service logs monitoring_remote_agent on any mon node.

  • Verify the GlusterFS status using systemctl status glusterfs-server.

  • Inspect GlusterFS logs in the /var/log/glusterfs/ directory.
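To make the raise condition concrete, the sketch below evaluates `glusterfs_up != 1` against a sample Prometheus exposition line. The metric name comes from the raise condition; the label set and value shown are illustrative.

```shell
# Sample exposition line for the metric used in the raise condition.
sample='glusterfs_up{host="kvm01"} 0'

# Extract the value; the alert fires whenever it is not 1.
value=$(echo "$sample" | awk '{print $NF}')
if [ "$value" != "1" ]; then
    echo "GlusterfsServiceOutage would fire (glusterfs_up=${value})"
else
    echo "GlusterFS is reachable"
fi
```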

Tuning

Not required

GlusterfsInodesUsedMinor

Severity

Minor

Summary

More than 80% of GlusterFS {{ $labels.volume }} volume inodes are used for 2 minutes.

Raise condition

glusterfs_inodes_percent_used >= {{ monitoring.inodes_percent_used_minor_threshold_percent*100 }} and glusterfs_inodes_percent_used < {{ monitoring.inodes_percent_used_major_threshold_percent*100 }}

Description

Raises when GlusterFS uses more than 80% and less than 90% of available inodes. The volume label in the raised alert contains the affected GlusterFS volume.
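The pillar parameters in the raise condition are stored as fractions and multiplied by 100 in the alert template. The 0.8 and 0.9 values below are assumptions inferred from the 80%/90% bounds stated above; verify the actual defaults in your Reclass model.

```shell
# Assumed pillar defaults (fractions); the template multiplies by 100
# before comparing against glusterfs_inodes_percent_used.
minor_threshold=0.8
major_threshold=0.9
awk -v lo="$minor_threshold" -v hi="$major_threshold" \
    'BEGIN { printf "fires between %d%% and %d%% inode usage\n", lo*100, hi*100 }'
```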

Troubleshooting

  • Verify the number of objects stored in GlusterFS.

  • If possible, increase the number of inodes.
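When verifying inode consumption on a brick filesystem, `df -i` reports the totals the percentage is derived from. A sample data line is parsed here so the arithmetic is visible; the device and mount path are illustrative.

```shell
# Sample `df -i` data line: device, inodes total, used, free, use%, mount.
sample='/dev/vdb1 6553600 5242880 1310720 80% /srv/glusterfs'

# used / total * 100 gives the percentage the alert compares against.
echo "$sample" | awk '{ printf "inodes used: %.0f%%\n", $3/$2*100 }'
```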

Tuning

Typically, you should not change the default value. If the alert is constantly firing, verify the available inodes on the GlusterFS nodes and adjust the threshold according to the number of available inodes. In the Prometheus web UI, run the raise condition query over a longer time range to determine an optimal threshold.

For example, to change the threshold to the 90 - 95% interval:

  1. On the cluster level of the Reclass model, create a common file for all alert customizations. Skip this step if a customizations file already exists.

    1. Create a file for alert customizations:

      touch cluster/<cluster_name>/stacklight/custom/alerts.yml
      
    2. Define the new file in cluster/<cluster_name>/stacklight/server.yml:

      classes:
      - cluster.<cluster_name>.stacklight.custom.alerts
      ...
      
  2. In the defined alert customizations file, modify the alert threshold by overriding the if parameter:

    parameters:
      prometheus:
        server:
          alert:
            GlusterfsInodesUsedMinor:
              if: >-
                glusterfs_inodes_percent_used >= 90 and
                glusterfs_inodes_percent_used < 95
    
  3. From the Salt Master node, apply the changes:

    salt 'I@prometheus:server' state.sls prometheus.server
    
  4. Verify the updated alert definition in the Prometheus web UI.

GlusterfsInodesUsedMajor

Severity

Major

Summary

{{ $value }}% of GlusterFS {{ $labels.volume }} volume inodes are used for 2 minutes.

Raise condition

glusterfs_inodes_percent_used >= {{ monitoring.inodes_percent_used_major_threshold_percent*100 }}

Description

Raises when GlusterFS uses more than 90% of available inodes. The volume label in the raised alert contains the affected GlusterFS volume.

Troubleshooting

  • Verify the number of objects stored in GlusterFS.

  • If possible, increase the number of inodes.

Tuning

Typically, you should not change the default value. If the alert is constantly firing, verify the available inodes on the GlusterFS nodes and adjust the threshold according to the number of available inodes. In the Prometheus web UI, run the raise condition query over a longer time range to determine an optimal threshold.

For example, to change the threshold to 95%:

  1. On the cluster level of the Reclass model, create a common file for all alert customizations. Skip this step if a customizations file already exists.

    1. Create a file for alert customizations:

      touch cluster/<cluster_name>/stacklight/custom/alerts.yml
      
    2. Define the new file in cluster/<cluster_name>/stacklight/server.yml:

      classes:
      - cluster.<cluster_name>.stacklight.custom.alerts
      ...
      
  2. In the defined alert customizations file, modify the alert threshold by overriding the if parameter:

    parameters:
      prometheus:
        server:
          alert:
            GlusterfsInodesUsedMajor:
              if: >-
                glusterfs_inodes_percent_used >= 95
    
  3. From the Salt Master node, apply the changes:

    salt 'I@prometheus:server' state.sls prometheus.server
    
  4. Verify the updated alert definition in the Prometheus web UI.

GlusterfsSpaceUsedMinor

Severity

Minor

Summary

{{ $value }}% of GlusterFS {{ $labels.volume }} volume disk space is used for 2 minutes.

Raise condition

glusterfs_space_percent_used >= {{ monitoring.space_percent_used_minor_threshold_percent*100 }} and glusterfs_space_percent_used < {{ monitoring.space_percent_used_major_threshold_percent*100 }}

Description

Raises when GlusterFS uses more than 80% and less than 90% of available space. The volume label in the raised alert contains the affected GlusterFS volume.

Troubleshooting

  • Inspect the data stored in GlusterFS.

  • Increase the GlusterFS capacity.
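When inspecting disk consumption on a brick filesystem, `df` reports the totals the percentage is derived from. A sample data line is parsed here so the arithmetic is visible; the device, sizes, and mount path are illustrative.

```shell
# Sample `df` data line: device, 1K-blocks total, used, available, use%, mount.
sample='/dev/vdb1 102400 92160 10240 90% /srv/glusterfs'

# used / total * 100 gives the percentage the alert compares against.
echo "$sample" | awk '{ printf "space used: %.0f%%\n", $3/$2*100 }'
```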

Tuning

Typically, you should not change the default value. If the alert is constantly firing, verify the available space on the GlusterFS nodes and adjust the threshold accordingly. In the Prometheus web UI, run the raise condition query over a longer time range to determine an optimal threshold.

For example, to change the threshold to the 90-95% interval:

  1. On the cluster level of the Reclass model, create a common file for all alert customizations. Skip this step if a customizations file already exists.

    1. Create a file for alert customizations:

      touch cluster/<cluster_name>/stacklight/custom/alerts.yml
      
    2. Define the new file in cluster/<cluster_name>/stacklight/server.yml:

      classes:
      - cluster.<cluster_name>.stacklight.custom.alerts
      ...
      
  2. In the defined alert customizations file, modify the alert threshold by overriding the if parameter:

    parameters:
      prometheus:
        server:
          alert:
            GlusterfsSpaceUsedMinor:
              if: >-
                glusterfs_space_percent_used >= 90 and
                glusterfs_space_percent_used < 95
    
  3. From the Salt Master node, apply the changes:

    salt 'I@prometheus:server' state.sls prometheus.server
    
  4. Verify the updated alert definition in the Prometheus web UI.

GlusterfsSpaceUsedMajor

Severity

Major

Summary

{{ $value }}% of GlusterFS {{ $labels.volume }} volume disk space is used for 2 minutes.

Raise condition

glusterfs_space_percent_used >= {{ monitoring.space_percent_used_major_threshold_percent*100 }}

Description

Raises when GlusterFS uses more than 90% of available space. The volume label in the raised alert contains the affected GlusterFS volume.

Troubleshooting

  • Inspect the data stored in GlusterFS.

  • Increase the GlusterFS capacity.

Tuning

Typically, you should not change the default value. If the alert is constantly firing, verify the available space on the GlusterFS nodes and adjust the threshold accordingly. In the Prometheus web UI, run the raise condition query over a longer time range to determine an optimal threshold.

For example, to change the threshold to 95%:

  1. On the cluster level of the Reclass model, create a common file for all alert customizations. Skip this step if a customizations file already exists.

    1. Create a file for alert customizations:

      touch cluster/<cluster_name>/stacklight/custom/alerts.yml
      
    2. Define the new file in cluster/<cluster_name>/stacklight/server.yml:

      classes:
      - cluster.<cluster_name>.stacklight.custom.alerts
      ...
      
  2. In the defined alert customizations file, modify the alert threshold by overriding the if parameter:

    parameters:
      prometheus:
        server:
          alert:
            GlusterfsSpaceUsedMajor:
              if: >-
                glusterfs_space_percent_used >= 95
    
  3. From the Salt Master node, apply the changes:

    salt 'I@prometheus:server' state.sls prometheus.server
    
  4. Verify the updated alert definition in the Prometheus web UI.


GlusterfsMountMissing

Available since the 2019.2.11 maintenance update

Severity

Major

Summary

GlusterFS mount point is not mounted.

Raise condition

delta(glusterfs_mount_scrapes:rate5m{fstype=~"(fuse.)?glusterfs"}[5m]) < 0 or glusterfs_mount_scrapes:rate5m{fstype=~"(fuse.)?glusterfs"} == 0

Description

Raises when a GlusterFS mount point is not mounted. The path, host, and device labels in the raised alert contain the path, node name, and device of the affected mount point.

Troubleshooting

To remount manually, apply the salt '*' state.sls glusterfs.client Salt state from the Salt Master node.
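Before remounting, you can check whether any GlusterFS mount points are currently present on the node; the filesystem-type pattern below mirrors the `(fuse.)?glusterfs` match in the raise condition.

```shell
# The third field of /proc/mounts is the filesystem type; match the
# same (fuse.)?glusterfs pattern the alert uses.
if grep -Eq '[[:space:]](fuse\.)?glusterfs[[:space:]]' /proc/mounts; then
    echo "GlusterFS mount(s) present"
else
    echo "no GlusterFS mounts found"
fi
```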

Tuning

Not required