Mirantis Container Cloud (MCC) becomes part of Mirantis OpenStack for Kubernetes (MOSK)!

Starting with MOSK 25.2, the MOSK documentation set covers all product layers, including MOSK management (formerly Container Cloud). This means everything you need is in one place. Some legacy names may remain in the code and documentation and will be updated in future releases. The separate Container Cloud documentation site will be retired, so please update your bookmarks for continued easy access to the latest content.

General

This section lists the general alerts for Kubernetes nodes.


FileDescriptorUsageMajor

Severity

Major

Summary

Node uses 90% of file descriptors.

Description

The {{ $labels.node }} node uses 90% of file descriptors.

FileDescriptorUsageWarning

Severity

Warning

Summary

Node uses 80% of file descriptors.

Description

The {{ $labels.node }} node uses 80% of file descriptors.
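
The two file descriptor alerts differ only in their threshold. As a minimal sketch of that logic, the Python below classifies a usage ratio against the 80% and 90% levels from the summaries above. The function names, the inclusive comparison, and the use of `/proc/sys/fs/file-nr` as the data source are assumptions made for illustration; the actual alert expressions are not shown in this section.

```python
from typing import Optional, Tuple

# Thresholds taken from the alert summaries (fractions of the limit).
WARNING_RATIO = 0.80
MAJOR_RATIO = 0.90


def fd_usage_severity(allocated: int, maximum: int) -> Optional[str]:
    """Return "Major", "Warning", or None for a file descriptor usage ratio."""
    if maximum <= 0:
        raise ValueError("maximum must be positive")
    ratio = allocated / maximum
    # Assumption: thresholds are inclusive; the real rule may compare differently.
    if ratio >= MAJOR_RATIO:
        return "Major"
    if ratio >= WARNING_RATIO:
        return "Warning"
    return None


def read_file_nr(path: str = "/proc/sys/fs/file-nr") -> Tuple[int, int]:
    """Read (allocated, maximum) from the Linux file-nr pseudo-file.

    The file holds three numbers: allocated handles, unused handles,
    and the system-wide maximum.
    """
    with open(path) as f:
        allocated, _unused, maximum = f.read().split()
    return int(allocated), int(maximum)
```

On a Linux node, `fd_usage_severity(*read_file_nr())` gives a quick local check that can be compared with what the alert reports.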

NodeDown

Severity

Critical

Summary

{{ $labels.node }} node is down.

Description

The {{ $labels.node }} node is down. During the last 2 minutes Kubernetes treated the node as Not Ready or Unknown and kubelet was not accessible from Prometheus.

NodeExporterCollectorFailure

Available since MOSK 25.2 and MOSK management 2.30.0

Severity

Warning

Summary

Node Exporter failure detected for {{ $labels.collector }} collector.

Description

The {{ $labels.collector }} collector has failed to scrape at least once in the last 20 minutes on {{ $value }} node(s).

NodeExporterTargetDown

Severity

Critical

Summary

Node Exporter Prometheus target is down.

Description

Prometheus fails to scrape metrics from the Node Exporter endpoint on the {{ $labels.node }} node.

NodeExporterTargetsOutage

Severity

Critical

Summary

Node Exporter Prometheus targets outage.

Description

Prometheus fails to scrape metrics from all Node Exporter endpoints.

SystemCpuFullWarning

Severity

Warning

Summary

High CPU consumption.

Description

The average CPU consumption on the {{ $labels.node }} node is {{ $value }}% for 2 minutes.

SystemLoadTooHighWarning

Severity

Warning

Summary

System load is more than 1 per CPU.

Description

The system load per CPU on the {{ $labels.node }} node is {{ $value }} for 5 minutes.

SystemLoadTooHighCritical

Severity

Critical

Summary

System load is more than 2 per CPU.

Description

The system load per CPU on the {{ $labels.node }} node is {{ $value }} for 5 minutes.
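
Both load alerts compare the system load divided by the number of CPUs against a fixed level. The following Python sketch shows that normalization. The function name is hypothetical, and which load average the real rule evaluates is not stated in this section, so the caller chooses the value to pass in.

```python
from typing import Optional


def load_severity(load_avg: float, cpu_count: int) -> Optional[str]:
    """Classify load per CPU using the levels from the alert summaries."""
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    per_cpu = load_avg / cpu_count
    # "More than 2 per CPU" and "more than 1 per CPU": strict comparisons.
    if per_cpu > 2:
        return "Critical"
    if per_cpu > 1:
        return "Warning"
    return None
```

For a local check on a Unix node, `os.getloadavg()` returns the 1-, 5-, and 15-minute load averages and `os.cpu_count()` returns the number of logical CPUs, so `load_severity(os.getloadavg()[1], os.cpu_count() or 1)` evaluates the 5-minute average.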

SystemDiskFullWarning

Severity

Warning

Summary

Disk partition {{ $labels.device }} is 85% full.

Description

The {{ $labels.device }} disk partition on the {{ $labels.node }} node is {{ printf "%.1f" $value }} % full for 2 minutes.

SystemDiskFullMajor

Severity

Major

Summary

Disk partition {{ $labels.device }} is 95% full.

Description

The {{ $labels.device }} disk partition on the {{ $labels.node }} node is {{ printf "%.1f" $value }} % full for 2 minutes.
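
The description template formats the value with `printf "%.1f"`, that is, with one decimal place. The Python sketch below reproduces that rendering and the 85% and 95% levels for a single partition. The function names and the inclusive comparison are illustrative assumptions; the way the real rule derives the percentage is not shown in this section.

```python
from typing import Optional


def disk_used_percent(used_bytes: int, total_bytes: int) -> float:
    """Return the used share of a partition as a percentage."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * used_bytes / total_bytes


def disk_severity(percent: float) -> Optional[str]:
    """Classify a usage percentage against the levels in the summaries."""
    # Assumption: thresholds are inclusive.
    if percent >= 95:
        return "Major"
    if percent >= 85:
        return "Warning"
    return None


def describe(device: str, node: str, percent: float) -> str:
    """Render the description text; {percent:.1f} mirrors printf "%.1f"."""
    return (
        f"The {device} disk partition on the {node} node "
        f"is {percent:.1f} % full for 2 minutes."
    )
```

Locally, `shutil.disk_usage("/")` returns `total`, `used`, and `free` in bytes, which can be passed to `disk_used_percent`.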

SystemMemoryFullWarning

Severity

Warning

Summary

{{ $labels.node }} memory warning usage.

Description

The {{ $labels.node }} node uses {{ $value }}% of memory for 10 minutes. More than 90% of memory is used and less than 8 GB of memory is available.

SystemMemoryFullMajor

Severity

Major

Summary

{{ $labels.node }} memory major usage.

Description

The {{ $labels.node }} node uses {{ $value }}% of memory for 10 minutes. More than 95% of memory is used and less than 4 GB of memory is available.
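
Each memory alert combines two conditions: a relative one (share of memory used) and an absolute one (amount still available). Both must hold, so a large node that is 98% full but still has 20 GB available does not trigger either alert. The Python sketch below illustrates this combined check. The function name, the interpretation of GB as GiB, and the definition of used memory as total minus available are assumptions.

```python
from typing import Optional

GIB = 1024 ** 3  # Assumption: "GB" in the descriptions is treated as GiB here.


def memory_severity(total_bytes: int, available_bytes: int) -> Optional[str]:
    """Classify memory pressure using both the relative and absolute limits."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    used_percent = 100.0 * (total_bytes - available_bytes) / total_bytes
    # Major: more than 95% used AND less than 4 GB available.
    if used_percent > 95 and available_bytes < 4 * GIB:
        return "Major"
    # Warning: more than 90% used AND less than 8 GB available.
    if used_percent > 90 and available_bytes < 8 * GIB:
        return "Warning"
    return None
```

For example, `memory_severity(1024 * GIB, 20 * GIB)` returns `None` even though about 98% of memory is in use, because 20 GiB is still available.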

SystemDiskInodesFullWarning

Severity

Warning

Summary

85% of inodes for {{ $labels.device }} are used.

Description

The {{ $labels.device }} disk on the {{ $labels.node }} node uses {{ printf "%.1f" $value }} % of disk inodes for 2 minutes.

SystemDiskInodesFullMajor

Severity

Major

Summary

95% of inodes for {{ $labels.device }} are used.

Description

The {{ $labels.device }} disk on the {{ $labels.node }} node uses {{ printf "%.1f" $value }} % of disk inodes for 2 minutes.
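
Inode usage can be checked locally in the same terms as the alert value. The Python sketch below computes the percentage of used inodes from the counters that `os.statvfs` exposes. The function names are hypothetical, and the formula is an assumption about how the percentage is derived.

```python
import os


def inode_used_percent(total_inodes: int, free_inodes: int) -> float:
    """Return the share of used inodes as a percentage."""
    if total_inodes <= 0:
        # Some filesystems report no fixed inode count.
        raise ValueError("filesystem reports no inode count")
    return 100.0 * (total_inodes - free_inodes) / total_inodes


def inode_used_percent_for_path(path: str = "/") -> float:
    """Compute inode usage for the filesystem that contains path (Unix only)."""
    st = os.statvfs(path)  # f_files: total inodes, f_ffree: free inodes
    return inode_used_percent(st.f_files, st.f_ffree)
```

A result at or above 85 or 95 corresponds to the levels named in the SystemDiskInodesFullWarning and SystemDiskInodesFullMajor summaries.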

SystemUptimeCritical

Available since MOSK 25.2 and MOSK management 2.30.0

Severity

Critical

Summary

System with vulnerable AMD CPU (Erratum 1474) exceeds uptime of 1000 days.

Description

Node {{ $labels.node }} has been running for {{ printf "%.0f" $value }} days without reboot. Installed CPU ({{ $labels.model_name }}) might be affected by Erratum 1474 that may cause system crashes after 1044 days of continuous operation.

SystemUptimeWarning

Available since MOSK 25.2 and MOSK management 2.30.0

Severity

Warning

Summary

System with vulnerable AMD CPU (Erratum 1474) exceeds uptime of 800 days.

Description

Node {{ $labels.node }} has been running for {{ printf "%.0f" $value }} days without reboot. Installed CPU ({{ $labels.model_name }}) might be affected by Erratum 1474 that may cause system crashes after 1044 days of continuous operation.
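
The two uptime alerts leave a margin before the 1044-day limit named in the descriptions: 244 days at the Warning level and 44 days at the Critical level. The Python sketch below shows that arithmetic. The function names are hypothetical, and the sketch does not check the CPU model, whereas the alerts apply only to nodes with an affected AMD CPU.

```python
from typing import Optional

ERRATUM_LIMIT_DAYS = 1044  # From the alert descriptions.
WARNING_DAYS = 800
CRITICAL_DAYS = 1000
SECONDS_PER_DAY = 86400


def uptime_days(uptime_seconds: float) -> float:
    """Convert an uptime in seconds to days."""
    return uptime_seconds / SECONDS_PER_DAY


def uptime_severity(days: float) -> Optional[str]:
    """Classify uptime; "exceeds" is read as a strict comparison."""
    if days > CRITICAL_DAYS:
        return "Critical"
    if days > WARNING_DAYS:
        return "Warning"
    return None


def days_until_limit(days: float) -> float:
    """Return how many days remain before the 1044-day limit (never negative)."""
    return max(0.0, ERRATUM_LIMIT_DAYS - days)
```

On Linux, the first field of `/proc/uptime` is the uptime in seconds, so `uptime_days(float(open("/proc/uptime").read().split()[0]))` gives the input for the other two functions.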