This section outlines the new features implemented in the Cluster release 8.5.0, introduced in the Container Cloud release 2.15.1.

MOSK on local RAID devices

Available since 2.16.0 Technology Preview

Implemented the initial Technology Preview support for Mirantis OpenStack for Kubernetes (MOSK) deployment on local software-based Redundant Array of Independent Disks (RAID) devices to withstand failure of one device at a time. The feature becomes available once your Container Cloud cluster is automatically upgraded to 2.16.0.

Using a custom bare metal host profile, you can configure and create an mdadm-based software RAID device of type raid10 if you have an even number of devices available on your servers. At least four storage devices are required for such a RAID device.
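As an illustration, a custom bare metal host profile for such a device might contain a fragment like the following. The softRaidDevices field name and layout are assumptions modeled on a typical BareMetalHostProfile structure, not the exact CRD schema; verify them against the product documentation before use.

```yaml
# Hypothetical BareMetalHostProfile fragment; field names are assumed.
spec:
  softRaidDevices:
    - name: /dev/md0
      level: raid10          # requires an even number of devices, minimum four
      devices:               # placeholder device names
        - /dev/sdb
        - /dev/sdc
        - /dev/sdd
        - /dev/sde
```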

MKE and Kubernetes major versions update

Introduced support for the Mirantis Kubernetes Engine version 3.4.6 with Kubernetes 1.20 for the Container Cloud management, regional, and managed clusters. Also, added support for attachment of existing MKE 3.4.6 clusters.

MCR version update

Updated the Mirantis Container Runtime (MCR) version from 20.10.6 to 20.10.8 for the Container Cloud management, regional, and managed clusters on all supported cloud providers.

Network interfaces monitoring

Limited the number of monitored network interfaces to prevent excessive Prometheus RAM consumption in large clusters. By default, Prometheus Node Exporter now collects metrics only for a basic set of interfaces, both host and container. If required, you can edit the list of excluded devices as needed.

Custom Prometheus recording rules

Implemented the capability to define custom Prometheus recording rules through the prometheusServer.customRecordingRules parameter in the StackLight Helm chart. Overriding of existing recording rules is not supported.
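For example, a custom recording rule could be defined in the StackLight Helm chart values as follows. The rule body is illustrative, and the exact list format under customRecordingRules is an assumption:

```yaml
prometheusServer:
  customRecordingRules:
    # Example rule: per-instance CPU utilization averaged over 5 minutes.
    - record: instance:node_cpu_utilisation:avg5m
      expr: 1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))
```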

Syslog packet size configuration

Implemented the capability to configure packet size for the syslog logging output. If remote logging to syslog is enabled in StackLight, use the logging.syslog.packetSize parameter in the StackLight Helm chart to configure the packet size.
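A minimal values fragment enabling the option might look as follows; the packet size value is a placeholder, and any additional syslog connection settings are omitted:

```yaml
logging:
  syslog:
    enabled: true      # remote logging to syslog must be enabled in StackLight
    packetSize: 2048   # example packet size in bytes
```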

Prometheus Relay configuration

Implemented the capability to configure the Prometheus Relay client timeout and response size limit through the prometheusRelay.clientTimeout and prometheusRelay.responseLimitBytes parameters in the StackLight Helm chart.
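A sketch of the corresponding values fragment; both values and the timeout format are example assumptions:

```yaml
prometheusRelay:
  clientTimeout: 40s            # example client timeout; format is an assumption
  responseLimitBytes: 1048576   # example response size limit: 1 MiB
```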

Mirantis Container Cloud alerts

Implemented the MCCLicenseExpirationCritical and MCCLicenseExpirationMajor alerts that notify about the Mirantis Container Cloud license expiring in less than 10 and 30 days, respectively.

Improvements to StackLight alerting

Implemented the following improvements to StackLight alerting:

  • Enhanced Kubernetes applications alerting:

    • Reworked the Kubernetes applications alerts to minimize flapping, avoid firing during pod rescheduling, and detect crash looping for pods that restart less frequently.

    • Added the KubeDeploymentOutage, KubeStatefulSetOutage, and KubeDaemonSetOutage alerts.

    • Removed the redundant KubeJobCompletion alert.

    • Enhanced the alert inhibition rules to reduce alert flooding.

    • Improved alert descriptions.

  • Split TelemeterClientFederationFailed into TelemeterClientFailed and TelemeterClientHAFailed to separate the alerts depending on whether HA mode is disabled or enabled.

  • Updated the description for DockerSwarmNodeFlapping.

Node Exporter collectors

Disabled unused Node Exporter collectors and implemented the capability to manually enable needed collectors using the nodeExporter.extraCollectorsEnabled parameter. Only the following collectors are now enabled by default in StackLight:

  • arp

  • conntrack

  • cpu

  • diskstats

  • entropy

  • filefd

  • filesystem

  • hwmon

  • loadavg

  • meminfo

  • netdev

  • netstat

  • nfs

  • stat

  • sockstat

  • textfile

  • time

  • timex

  • uname

  • vmstat
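For instance, to re-enable collectors that are now disabled by default, you could list them in the StackLight Helm chart values. The collector names below are standard Node Exporter collectors used here as examples:

```yaml
nodeExporter:
  extraCollectorsEnabled:
    - systemd     # collect systemd unit states
    - processes   # collect aggregate process statistics
```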

Enhanced Ceph architecture

To improve debugging and log reading, separated Ceph Controller, Ceph Status Controller, and Ceph Request Controller, which previously ran in a single pod, into three separate deployments.

Ceph networks validation

Implemented additional validation of the networks specified in spec.cephClusterSpec.network.publicNet and spec.cephClusterSpec.network.clusterNet and prohibited the use of the 0.0.0.0/0 CIDR. Now, the bare metal provider automatically translates the network range to the default LCM IPAM subnet if it exists.

You can now also add corresponding labels for the bare metal IPAM subnets when configuring the Ceph cluster during the management cluster deployment.
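A schematic fragment of the validated network settings, using concrete subnets; the address ranges are placeholders:

```yaml
spec:
  cephClusterSpec:
    network:
      publicNet: 10.0.10.0/24    # placeholder public network range
      clusterNet: 10.0.11.0/24   # placeholder cluster (replication) network range
```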

Automated Ceph LCM

Implemented full support for automated Ceph LCM operations using the KaaSCephOperationRequest CR, such as addition or removal of Ceph OSDs and nodes, as well as replacement of failed Ceph OSDs or nodes.

Learn more

Automated Ceph LCM
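As a sketch, a KaaSCephOperationRequest for removing a failed Ceph OSD might look like the following. The apiVersion, spec layout, and node and device names are all assumptions for illustration; consult the linked documentation for the actual schema:

```yaml
apiVersion: kaas.mirantis.com/v1alpha1   # assumed API group/version
kind: KaaSCephOperationRequest
metadata:
  name: remove-osd-worker-3
spec:
  osdRemove:                 # assumed section name for OSD removal
    nodes:
      worker-3:              # placeholder node name
        cleanupByDevice:
          - device: sdb      # placeholder device to clean up
```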

Ceph CSI provisioner tolerations and node affinity

Implemented the capability to specify Container Storage Interface (CSI) provisioner tolerations and node affinity for different Rook resources. Added support for the all and mds keys in toleration rules.
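A hedged example of toleration rules using the all and mds keys; the section name and rule layout are assumptions modeled on common KaaSCephCluster configuration, and the taint keys are placeholders:

```yaml
spec:
  cephClusterSpec:
    hyperconverge:             # assumed section for toleration settings
      tolerations:
        all:                   # applies to all Rook resources
          rules:
            - key: node-role.kubernetes.io/controlplane
              operator: Exists
              effect: NoSchedule
        mds:                   # applies to Ceph MDS pods specifically
          rules:
            - key: dedicated   # placeholder taint key
              operator: Equal
              value: cephfs
              effect: NoSchedule
```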

Ceph KaaSCephCluster.status enhancement

Extended the fullClusterInfo section of the KaaSCephCluster.status resource with the following fields:

  • cephDetails - contains verbose details of a Ceph cluster state

  • cephCSIPluginDaemonsStatus - contains details on all Ceph CSI plugin daemons

Ceph Shared File System (CephFS)


Implemented the capability to enable the Ceph Shared File System, or CephFS, to create read/write shared file system Persistent Volumes (PVs).
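Once CephFS is enabled, shared volumes are requested through the standard Kubernetes ReadWriteMany access mode. The StorageClass name below is a placeholder:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data
spec:
  accessModes:
    - ReadWriteMany            # CephFS allows read/write sharing across pods
  resources:
    requests:
      storage: 10Gi
  storageClassName: cephfs     # placeholder StorageClass backed by CephFS
```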