Alert dependencies

Using alert inhibition rules, Alertmanager reduces alert noise by suppressing notifications for dependent alerts. This gives a clearer view of the cloud status and simplifies troubleshooting. Alert inhibition rules are enabled by default.

The following tables describe the dependencies between the OpenStack-related and MOSK cluster alerts.

When an alert from the Alert column fires, the alerts listed in the Inhibits and rules column are suppressed and shown with the Inhibited status in the Alertmanager web UI.

The Inhibits and rules column also lists the labels and conditions, if any, that must match for the inhibition to apply.
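The matching logic described above can be sketched in a few lines of Python. This is an illustrative model only, not the Alertmanager implementation or the shipped configuration: the `InhibitRule` class, the `is_inhibited` function, and all label values are hypothetical, while the alert names and the label list come from the CassandraTombstonesTooManyCritical row of the OpenStack table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InhibitRule:
    source: str        # alert from the Alert column
    target: str        # alert from the Inhibits and rules column
    equal: tuple = ()  # labels whose values must be identical on both alerts


def is_inhibited(alert, firing, rules):
    """Return True if any firing source alert suppresses the given alert."""
    for rule in rules:
        if alert["alertname"] != rule.target:
            continue
        for source in firing:
            if source["alertname"] != rule.source:
                continue
            # Every label listed in `equal` must carry the same value on both alerts.
            if all(source.get(name) == alert.get(name) for name in rule.equal):
                return True
    return False


# Hypothetical label values, used only to exercise the rule.
rules = [
    InhibitRule(
        source="CassandraTombstonesTooManyCritical",
        target="CassandraTombstonesTooManyMajor",
        equal=("cassandra_cluster", "namespace", "pod"),
    )
]
critical = {
    "alertname": "CassandraTombstonesTooManyCritical",
    "cassandra_cluster": "cluster-a", "namespace": "tf", "pod": "cassandra-0",
}
major_same_pod = dict(critical, alertname="CassandraTombstonesTooManyMajor")
major_other_pod = dict(major_same_pod, pod="cassandra-1")

print(is_inhibited(major_same_pod, [critical], rules))
print(is_inhibited(major_other_pod, [critical], rules))
```

In this model, the Major alert for the same pod is suppressed, while the Major alert for a different pod stays active because its pod label differs from the firing Critical alert.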

Alert inhibition rules for OpenStack clusters

Alert

Inhibits and rules

CassandraTombstonesTooManyCritical

CassandraTombstonesTooManyMajor with the same cassandra_cluster, namespace, and pod labels

CassandraTombstonesTooManyMajor

CassandraTombstonesTooManyWarning with the same cassandra_cluster, namespace, and pod labels

CinderServiceOutage

  • CinderVolumeServiceDown with the same binary, zone, and backend labels

  • CinderServiceDown with the same binary and zone labels

KubeDaemonSetOutage

  • LibvirtExporterTargetsOutage

  • TungstenFabricControllerOutage

  • TungstenFabricControllerTargetsOutage

  • TungstenFabricVrouterOutage

  • TungstenFabricVrouterTargetsOutage

Also inhibits the alerts listed for KubeDaemonSetOutage in Alert inhibition rules for MOSK clusters.

KubeDeploymentOutage

  • RabbitMQExporterTargetDown for the particular OpenStack service

  • RabbitMQOperatorTargetDown

  • TelegrafOpenstackTargetDown

Also inhibits the alerts listed for KubeDeploymentOutage in Alert inhibition rules for MOSK clusters.

KubeStatefulSetOutage

  • CassandraClusterTargetDown

  • KafkaClusterTargetDown

  • MariadbClusterDown

  • MariadbExporterTargetDown

  • MemcachedClusterDown

  • MemcachedExporterTargetDown

  • OpenstackPowerDNSTargetDown

  • OpenstackPowerDNSProbeFailure

  • RabbitMQTargetDown

  • RabbitMQDown for the particular OpenStack service

  • ZooKeeperClusterTargetDown

Also inhibits the alerts listed for KubeStatefulSetOutage in Alert inhibition rules for MOSK clusters.

LibvirtExporterTargetsOutage

LibvirtExporterTargetDown

MemcachedConnectionsNoneMajor

MemcachedConnectionsNoneWarning with the same namespace label

NeutronAgentOutage

NeutronAgentDown with the same binary and zone labels

NodeDown

  • Alerts with the same node label:

    • LibvirtExporterTargetDown

    • OpenstackPowerDNSTargetDown

    • RabbitMQTargetDown

    • CassandraClusterTargetDown

    • KafkaClusterTargetDown

    • MariadbExporterTargetDown

    • MemcachedExporterTargetDown

    • OpenstackCloudproberTargetDown

    • RabbitMQOperatorTargetDown

    • RabbitMQExporterTargetDown

    • RedisClusterTargetDown

    • ZooKeeperClusterTargetDown

    Also inhibits the alerts listed for NodeDown in Alert inhibition rules for MOSK clusters.

NovaServiceOutage

NovaServiceDown with the same binary and zone labels

OpenstackPowerDNSProbeFailure

OpenstackPowerDNSQueryDurationHigh with the same target_name, target_type, and protocol labels

OpenstackSSLCertExpirationHigh

OpenstackSSLCertExpirationMedium with the same namespace and service_name labels

OsDplSSLCertExpirationHigh

OsDplSSLCertExpirationMedium with the same identifier label

TungstenFabricControllerOutage

TungstenFabricControllerDown

TungstenFabricVrouterOutage

TungstenFabricVrouterDown

TungstenFabricVrouterTargetsOutage

TungstenFabricVrouterTargetDown
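A single alert can inhibit many dependent alerts at once, as the NodeDown row above shows. The following sketch, written under the same assumptions as any simplified model, splits a list of firing alerts into active and inhibited ones for that row. The function name, the subset of target alerts, and the node names are hypothetical and chosen only for illustration.

```python
# A subset of the alerts that NodeDown inhibits, taken from the table above.
NODE_DOWN_TARGETS = {
    "LibvirtExporterTargetDown",
    "RabbitMQTargetDown",
    "MariadbExporterTargetDown",
}


def split_by_node_down(firing):
    """Split firing alerts into (active, inhibited) using the NodeDown row."""
    # Nodes for which NodeDown is currently firing.
    down_nodes = {a.get("node") for a in firing if a["alertname"] == "NodeDown"}
    active, inhibited = [], []
    for alert in firing:
        suppressed = (
            alert["alertname"] in NODE_DOWN_TARGETS
            and alert.get("node") in down_nodes  # same node label as NodeDown
        )
        (inhibited if suppressed else active).append(alert)
    return active, inhibited


# Hypothetical node names.
firing = [
    {"alertname": "NodeDown", "node": "cmp-1"},
    {"alertname": "LibvirtExporterTargetDown", "node": "cmp-1"},
    {"alertname": "LibvirtExporterTargetDown", "node": "cmp-2"},
]
active, inhibited = split_by_node_down(firing)
print([(a["alertname"], a["node"]) for a in active])
print([(a["alertname"], a["node"]) for a in inhibited])
```

Only the LibvirtExporterTargetDown alert on the node that is down ends up in the inhibited list. The same alert on a healthy node remains active and still needs attention.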

Alert inhibition rules for MOSK clusters

Alert

Inhibits and rules

cAdvisorTargetsOutage

cAdvisorTargetDown

CalicoTargetsOutage

CalicoTargetDown

CephClusterFullCritical

CephClusterFullWarning

CephClusterHealthCritical

CephClusterHealthWarning

CephOSDNodeDown

With the same node label:

  • CephOSDDiskNotResponding

  • CephOSDDiskUnavailable

  • CnncNodeDown

CephOSDPgNumTooHighCritical

CephOSDPgNumTooHighWarning

CnncNodeDown

CnncAgentDown with the same node label

DockerSwarmServiceReplicasFlapping

DockerSwarmServiceReplicasDown with the same service_id, service_mode, and service_name labels

DockerSwarmServiceReplicasOutage

DockerSwarmServiceReplicasDown with the same service_id, service_mode, and service_name labels

etcdDbSizeCritical

etcdDbSizeMajor with the same job and instance labels

etcdHighNumberOfFailedGRPCRequestsCritical

etcdHighNumberOfFailedGRPCRequestsWarning with the same grpc_method, grpc_service, job, and instance labels

ExternalEndpointDown

ExternalEndpointTCPFailure with the same instance and job labels

FileDescriptorUsageMajor

FileDescriptorUsageWarning with the same node label

FluentdTargetsOutage

FluentdTargetDown

KeycloakBlackboxServiceProbeFailing

KeycloakBlackboxEndpointProbeFailing with the same service_name label

KeycloakHttpErrors4xxHighMajor

KeycloakHttpErrors4xxHighWarning

KeycloakHttpErrors5xxHighMajor

KeycloakHttpErrors5xxHighWarning

KeycloakLoginErrorsHighMajor

KeycloakLoginErrorsHighWarning

KeycloakSSLCertExpirationHigh

KeycloakSSLCertExpirationMedium

KubeAPICertExpirationHigh

KubeAPICertExpirationMedium

KubeAPIErrorsHighMajor

KubeAPIErrorsHighWarning with the same instance label

KubeAPIOutage

KubeAPIDown

KubeAPIResourceErrorsHighMajor

KubeAPIResourceErrorsHighWarning with the same instance, resource, and subresource labels

KubeDaemonSetOutage

  • CalicoTargetsOutage

  • KubeDaemonSetRolloutStuck with the same daemonset and namespace labels

  • FluentdTargetsOutage

  • NodeExporterTargetsOutage

  • TelegrafSMARTTargetsOutage

KubeDeploymentOutage

  • KubeDeploymentReplicasMismatch with the same deployment and namespace labels

  • GrafanaTargetDown

  • KubernetesMasterAPITargetsOutage

  • KubeStateMetricsTargetDown

  • PrometheusEsExporterTargetDown

  • PrometheusMsTeamsTargetDown

  • PrometheusRelayTargetDown

  • ServiceNowWebhookReceiverTargetDown

  • SfNotifierTargetDown

  • TelegrafDockerSwarmTargetDown

  • TelegrafOpenstackTargetDown

KubeletTargetsOutage

KubeletTargetDown

KubePersistentVolumeUsageCritical

With the same namespace and persistentvolumeclaim labels:

  • KubePersistentVolumeFullInFourDays

  • OpenSearchStorageUsageCritical

  • OpenSearchStorageUsageMajor

KubePodsCrashLooping

KubePodsRegularLongTermRestarts with the same created_by_name, created_by_kind, and namespace labels

KubeStatefulSetOutage

  • Alerts with the same namespace and statefulset labels:

    • KubeStatefulSetUpdateNotRolledOut

    • KubeStatefulSetReplicasMismatch

  • AlertmanagerTargetDown

  • ElasticsearchExporterTargetDown

  • FluentdTargetsOutage

  • KeycloakTargetDown

  • OpenSearchClusterStatusCritical

  • PostgresqlReplicaDown

  • PostgresqlTargetDown

  • PrometheusEsExporterTargetDown

  • PrometheusServerTargetDown

MCCLicenseExpirationHigh

MCCLicenseExpirationMedium

MCCSSLCertExpirationHigh

MCCSSLCertExpirationMedium with the same namespace and service_name labels

MCCSSLProbesServiceTargetOutage

MCCSSLProbesEndpointTargetOutage with the same namespace and service_name labels

MKEAPICertExpirationHigh

MKEAPICertExpirationMedium

MKEAPIOutage

MKEAPIDown

MKEMetricsEngineTargetsOutage

MKEMetricsEngineTargetDown

MKENodeDiskFullCritical

MKENodeDiskFullWarning with the same node label

MKENodeDown

CnncNodeDown with the same node label

NodeDown

  • KubeDaemonSetRolloutStuck for the calico-node, ucp-node-feature-discovery, and ucp-nvidia-device-plugin DaemonSets

  • For resource=nodes:

    • KubeAPIResourceErrorsHighMajor

    • KubeAPIResourceErrorsHighWarning

  • Alerts with the same node label:

    • AlertmanagerTargetDown

    • CalicoTargetDown

    • cAdvisorTargetDown

    • CephClusterTargetDown

    • CnncNodeDown

    • etcdTargetDown

    • FluentdTargetDown

    • GrafanaTargetDown

    • HelmControllerTargetDown

    • KeycloakTargetDown

    • KubeAPIDown

    • KubeletDown

    • KubeletTargetDown

    • KubeNodeNotReady

    • LibvirtExporterTargetDown

    • MCCCacheTargetDown

    • MCCControllerTargetDown

    • MCCProviderTargetDown

    • MKEAPIDown

    • MKEMetricsEngineTargetDown

    • MKENodeDown

    • NodeExporterTargetDown

    • PostgresqlTargetDown

    • PrometheusMsTeamsTargetDown

    • PrometheusRelayTargetDown

    • PrometheusServerTargetDown

    • ServiceNowWebhookReceiverTargetDown

    • SfNotifierTargetDown

    • TelegrafDockerSwarmTargetDown

    • TelegrafSMARTTargetDown

    • TelemeterClientTargetDown

    • TelemeterServerFederationTargetDown

    • TelemeterServerTargetDown

NodeExporterTargetsOutage

NodeExporterTargetDown

OpenSearchClusterStatusCritical

  • KubeJobFailed for created_by_name=~"elasticsearch-curator-.*"

  • OpenSearchClusterStatusWarning with the same cluster label (removed in MOSK 25.2 and MOSK management 2.30.0)

OpenSearchHeapUsageCritical

OpenSearchHeapUsageWarning with the same cluster and name labels

OpenSearchStorageUsageCritical

KubePersistentVolumeFullInFourDays and OpenSearchStorageUsageMajor with the same namespace and persistentvolumeclaim labels

OpenSearchStorageUsageMajor

KubePersistentVolumeFullInFourDays with the same namespace and persistentvolumeclaim labels

PostgresqlPatroniClusterUnlocked

With the same cluster and namespace labels:

  • PostgresqlReplicationNonStreamingReplicas

  • PostgresqlReplicationPaused

PostgresqlReplicaDown

  • Alerts with the same cluster and namespace labels:

    • PostgresqlReplicationNonStreamingReplicas

    • PostgresqlReplicationPaused

    • PostgresqlReplicationSlowWalApplication

    • PostgresqlReplicationSlowWalDownload

    • PostgresqlReplicationWalArchiveWriteFailing

PrometheusErrorSendingAlertsMajor

PrometheusErrorSendingAlertsWarning with the same alertmanager and pod labels

SSLCertExpirationHigh

SSLCertExpirationMedium with the same instance label

SystemDiskFullMajor

SystemDiskFullWarning with the same device, mountpoint, and node labels

SystemDiskInodesFullMajor

SystemDiskInodesFullWarning with the same device, mountpoint, and node labels

SystemLoadTooHighCritical

SystemLoadTooHighWarning with the same node label

SystemMemoryFullMajor

SystemMemoryFullWarning with the same node label

SystemUptimeCritical

SystemUptimeWarning with the same node and model_name labels

TelegrafSMARTTargetsOutage

TelegrafSMARTTargetDown

TelemeterServerTargetDown

TelemeterServerFederationTargetDown
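Some rows restrict the inhibited alert by a label pattern instead of equal label values, such as the created_by_name condition for KubeJobFailed in the OpenSearchClusterStatusCritical row. The sketch below shows how such a condition can be checked, assuming the pattern must match the entire label value, which is how Alertmanager regular-expression matchers behave. The pattern, the function name, and the job names are illustrative assumptions.

```python
import re

# Illustrative pattern: any job whose creator name starts with this prefix.
CURATOR_JOB = re.compile(r"elasticsearch-curator-.*")


def matches_curator_job(labels):
    """Return True if created_by_name matches the pattern as a whole."""
    value = labels.get("created_by_name", "")
    # fullmatch anchors the pattern at both ends of the label value.
    return CURATOR_JOB.fullmatch(value) is not None


print(matches_curator_job({"created_by_name": "elasticsearch-curator-1700000000"}))
print(matches_curator_job({"created_by_name": "backup-elasticsearch-curator-1"}))
```

Because the match is anchored, a label value that merely contains the prefix somewhere in the middle does not satisfy the condition.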