Enable SMART disk monitoring

Warning

This feature is available starting from the MCP 2019.2.3 maintenance update. Before enabling the feature, follow the steps described in Apply maintenance updates.

If your MCP cluster includes physical disks that support Self-Monitoring, Analysis and Reporting Technology (SMART), you can configure StackLight LMA to monitor these disks by parsing their SMART data and to raise alerts if disk errors occur. By default, all disks on the bare metal servers are scanned.

To enable SMART disk monitoring:

  1. Log in to the Salt Master node.

  2. Verify that you have updated the salt-formulas-linux package.

  3. Install the smartmontools package as a required dependency:

    salt -C 'I@salt:minion' cmd.run 'apt update; apt install -y smartmontools'
    
  4. (Optional) Modify the default parameters per node or server as required. For example, to exclude a specific device from the checks:

    (...)
    parameters:
      _param:
      (...)
      telegraf:
        agent:
          input:
            smart:
              excludes:
                - /dev/sdd
    
  5. Refresh the Salt pillar:

    salt -C 'I@salt:minion' saltutil.refresh_pillar
    
  6. Update the Salt mine:

    salt -C 'I@salt:minion' state.sls salt.minion.grains
    salt -C 'I@salt:minion' mine.update
    
  7. Update the Telegraf configuration:

    salt -C 'I@telegraf:agent' state.sls telegraf
    
  8. Update the Prometheus configuration:

    salt -C 'I@prometheus:server and I@docker:swarm' state.sls prometheus.server
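The commands from steps 3 and 5 through 8 can be chained into one script run from the Salt Master node. The following is a minimal sketch, not part of the official procedure: the `run` helper, the `enable_smart_monitoring` function, and the `DRY_RUN` variable are hypothetical names introduced here for illustration. By default, the script only prints the commands so that you can review them first.

```shell
#!/bin/sh
# Sketch: run the SMART monitoring enablement commands in order.
# DRY_RUN is a hypothetical variable; 1 (the default) prints each
# command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
# Note that the printed form does not preserve the original quoting.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical wrapper: stop at the first command that fails.
enable_smart_monitoring() {
    # Step 3: install smartmontools on all minions.
    run salt -C 'I@salt:minion' cmd.run 'apt update; apt install -y smartmontools' &&
    # Step 5: refresh the Salt pillar.
    run salt -C 'I@salt:minion' saltutil.refresh_pillar &&
    # Step 6: update the Salt mine.
    run salt -C 'I@salt:minion' state.sls salt.minion.grains &&
    run salt -C 'I@salt:minion' mine.update &&
    # Step 7: update the Telegraf configuration.
    run salt -C 'I@telegraf:agent' state.sls telegraf &&
    # Step 8: update the Prometheus configuration.
    run salt -C 'I@prometheus:server and I@docker:swarm' state.sls prometheus.server
}

enable_smart_monitoring
```

Setting `DRY_RUN=0` in the environment executes the commands instead of printing them. The sketch does not cover the optional pillar change from step 4, which you would need to make before running it.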