Tune Prometheus IPMI exporter

In MOSK, IPMI monitoring is provided by the Prometheus IPMI exporter. The exporter is enabled by default on management clusters, collects hardware telemetry from server Baseboard Management Controller (BMC) endpoints, and exposes it as Prometheus metrics. It monitors hosts in both management and MOSK clusters. For architecture details, see Deployment architecture: Prometheus IPMI exporter.

Note

The IPMI exporter monitors only BareMetalHost objects with IPMI-based BMC addresses, that is, addresses that use the ipmi:// scheme or specify no scheme, in which case IPMI is assumed. Hosts configured with other BMC protocols, such as redfish://, are excluded from IPMI monitoring.

The procedures below describe how to exclude specific clusters or hosts from IPMI monitoring and how to add custom alerts and Grafana dashboards. To configure the IPMI exporter globally, adjust collector settings, or disable the Prometheus IPMI exporter entirely, see Prometheus IPMI exporter.

Disable IPMI monitoring for hosts or clusters

IPMI monitoring can be disabled at two levels: for entire clusters or for individual hosts. When disabled at the cluster level, all hosts belonging to that cluster are excluded from IPMI monitoring. Host‑level exclusions take effect only when cluster‑level exclusion is not enabled.

To disable IPMI monitoring for an entire cluster:

  1. Open the Cluster object for editing.

  2. In spec.providerSpec.value, set disableIPMIMonitoring: true.

  3. Save the Cluster object to apply the change.

This disables IPMI monitoring for all hosts in the cluster.
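
As a minimal sketch, the edited Cluster object would contain a fragment similar to the following. Only the field named in the procedure is shown; all other fields of the object are omitted and stay unchanged.

```yaml
# Fragment of a Cluster object; unrelated fields are omitted.
spec:
  providerSpec:
    value:
      # Excludes every host of this cluster from IPMI monitoring.
      disableIPMIMonitoring: true
```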

To disable IPMI monitoring for a host:

  1. Open the BareMetalHostInventory object for the specific host you want to exclude:

    kubectl -n <project-name> edit baremetalhostinventory <host-name>
    
  2. Under metadata.annotations, add the kaas.mirantis.com/disable-ipmi-monitoring annotation with the value "true".

  3. Save the BareMetalHostInventory object to apply the change.

This disables IPMI monitoring for that host only.
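
For illustration, the annotation would appear in the BareMetalHostInventory object roughly as follows. This is a sketch of the relevant fragment only. The value is quoted because Kubernetes annotation values must be strings.

```yaml
# Fragment of a BareMetalHostInventory object; unrelated fields are omitted.
metadata:
  annotations:
    # Excludes only this host from IPMI monitoring.
    kaas.mirantis.com/disable-ipmi-monitoring: "true"
```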

Note

The cluster-level setting takes precedence over host-level annotations. If a cluster has IPMI monitoring disabled, individual host annotations are ignored.

Create custom alerts and Grafana dashboards

IPMI monitoring includes preconfigured Grafana dashboards and Prometheus alert rules. They offer built-in visibility into common hardware health scenarios and can be customized as needed.

You can extend IPMI monitoring by creating custom Prometheus alerts and Grafana dashboards:

  • To create custom Grafana dashboards, see Create custom dashboards in Grafana.

  • To configure custom alerts, use prometheusServer.customAlerts in your StackLight configuration. For details, see Alert configuration.

    Examples of custom alerts:

    • Temperature thresholds (warning at 70°C, critical at 80°C)

    • Fan speed below minimum thresholds

    • Voltage outside acceptable ranges

    • Power consumption exceeding capacity

    • Chassis power state changes

    • Power supply failures
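
The temperature thresholds from the list above could be expressed along the following lines. This is a hedged sketch, not a shipped configuration: the alert names are hypothetical, and the ipmi_temperature_celsius metric name is assumed from the upstream Prometheus IPMI exporter, so verify it against the metrics that your deployment actually exposes before using the rules.

```yaml
# Fragment of the StackLight configuration; unrelated keys are omitted.
prometheusServer:
  customAlerts:
  # Hypothetical alert name; metric name assumed from the upstream exporter.
  - alert: IpmiTemperatureWarning
    expr: ipmi_temperature_celsius > 70
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "IPMI temperature above 70°C"
      description: "An IPMI temperature sensor reports {{ $value }}°C."
  # Hypothetical alert name; same assumed metric with the critical threshold.
  - alert: IpmiTemperatureCritical
    expr: ipmi_temperature_celsius > 80
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "IPMI temperature above 80°C"
      description: "An IPMI temperature sensor reports {{ $value }}°C."
```

The for: 5m clause delays each alert until the condition has held for five minutes, which helps avoid firing on short temperature spikes. Adjust the duration and thresholds to match your hardware.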