Mirantis Container Cloud Documentation

The documentation is intended to help operators understand the core concepts of the product.

The information provided in this documentation set is constantly improved and amended based on feedback and requests from our software consumers. This documentation set describes the features supported within the three latest Container Cloud minor releases and their supported Cluster releases, each marked with a corresponding note Available since <release-version>.

The following table lists the guides included in the documentation set you are reading:

Guides list

Guide

Purpose

Reference Architecture

Learn the fundamentals of Container Cloud reference architecture to plan your deployment.

Deployment Guide

Deploy Container Cloud in a preferred configuration using supported deployment profiles tailored to the demands of specific business cases.

Operations Guide

Deploy and operate the Container Cloud managed clusters.

Release Compatibility Matrix

Deployment compatibility of the Container Cloud component versions for each product release.

Release Notes

Learn about new features and bug fixes in the current Container Cloud version as well as in the Container Cloud minor releases.

Intended audience

This documentation assumes that the reader is familiar with network and cloud concepts and is intended for the following users:

  • Infrastructure Operator

    • Is a member of the IT operations team

    • Has working knowledge of Linux, virtualization, Kubernetes API and CLI, and OpenStack to support the application development team

    • Accesses Mirantis Container Cloud and Kubernetes through a local machine or web UI

    • Provides verified artifacts through a central repository to the Tenant DevOps engineers

  • Tenant DevOps engineer

    • Is a member of the application development team and reports to a line of business (LOB)

    • Has working knowledge of Linux, virtualization, Kubernetes API and CLI to support application owners

    • Accesses Container Cloud and Kubernetes through a local machine or web UI

    • Consumes artifacts from a central repository approved by the Infrastructure Operator

Conventions

This documentation set uses the following conventions in the HTML format:

Documentation conventions

Convention

Description

boldface font

Inline CLI tools and commands, titles of the procedures and system response examples, table titles.

monospaced font

File names and paths, Helm chart parameters and their values, package names, node names and labels, and so on.

italic font

Information that distinguishes some concept or term.

Links

External links and cross-references, footnotes.

Main menu > menu item

GUI elements that include any part of interactive user interface and menu navigation.

Superscript

Some extra, brief information. For example, if a feature is available from a specific release or if a feature is in the Technology Preview development stage.

Note

The Note block

Messages of generic meaning that may be useful to the user.

Caution

The Caution block

Information that helps the user avoid mistakes and undesirable consequences when following the procedures.

Warning

The Warning block

Messages with details that can be easily missed but should not be ignored, as they are valuable for the user before proceeding.

See also

The See also block

A list of references that may be helpful for understanding related tools, concepts, and so on.

Learn more

The Learn more block

Used in the Release Notes to wrap a list of internal references to the reference architecture, deployment and operation procedures specific to a newly implemented product feature.

Technology Preview features

A Technology Preview feature provides early access to upcoming product innovations, allowing customers to experiment with the functionality and provide feedback.

Technology Preview features may be privately or publicly available, but they are not intended for production use. While Mirantis will provide assistance with such features through official channels, normal Service Level Agreements do not apply.

As Mirantis considers making future iterations of Technology Preview features generally available, we will do our best to resolve any issues that customers experience when using these features.

During the development of a Technology Preview feature, additional components may become available to the public for evaluation. Mirantis cannot guarantee the stability of such features. As a result, if you are using Technology Preview features, you may not be able to seamlessly update to subsequent product releases or to upgrade or migrate to functionality that has not yet been announced as fully supported.

Mirantis makes no guarantees that Technology Preview features will graduate to generally available features.

Documentation history

The documentation set refers to Mirantis Container Cloud GA as the latest released GA version of the product. For details about the Container Cloud GA minor release dates, refer to Container Cloud releases.

Product Overview

Mirantis Container Cloud enables you to ship code faster by enabling speed with choice, simplicity, and security. Through a single pane of glass you can deploy, manage, and observe Kubernetes clusters on bare metal infrastructure.

The most common use cases include:

Kubernetes cluster lifecycle management

The consistent lifecycle management of a single Kubernetes cluster is a complex task on its own, and it becomes significantly more difficult when you have to manage multiple clusters across different platforms spread across the globe. Mirantis Container Cloud provides a single, centralized point from which you can perform full lifecycle management of your container clusters, including automated updates and upgrades.

Highly regulated industries

Regulated industries need fine-grained access control, high security standards, and extensive reporting capabilities to meet and exceed security requirements. Mirantis Container Cloud provides a fine-grained Role-Based Access Control (RBAC) mechanism and easy integration and federation with existing identity management (IDM) systems.

Logging, monitoring, alerting

Complete operational visibility is required to identify and address issues in the shortest amount of time, before a problem becomes serious. Mirantis StackLight is a proactive monitoring, logging, and alerting solution designed for large-scale container and cloud observability with extensive collectors, dashboards, trend reporting, and alerts.

Storage

Cloud environments require a unified pool of storage that can be scaled up by simply adding storage server nodes. Ceph is a unified, distributed storage system designed for excellent performance, reliability, and scalability. Deployed through Rook, Ceph provides and manages robust persistent storage that can be used by Kubernetes workloads on baremetal-based clusters.

Security

Security is a core concern for all enterprises, especially as exposing systems to the Internet becomes the norm. Mirantis Container Cloud provides a multi-layered security approach that includes effective identity management and role-based authentication, secure out-of-the-box defaults, and extensive security scanning and monitoring during the development process.

5G and Edge

The introduction of 5G technologies and the support of Edge workloads require an effective multi-tenant solution to manage the underlying container infrastructure. Mirantis Container Cloud provides a full-stack, secure, multi-cloud cluster management and Day-2 operations solution.

Reference Architecture

Overview

Mirantis Container Cloud is a set of microservices that are deployed using Helm charts and run in a Kubernetes cluster. Container Cloud is based on the Kubernetes Cluster API community initiative.

The following diagram illustrates an overview of Container Cloud and the clusters it manages:

_images/cluster-overview.png

All artifacts used by Kubernetes and workloads are stored on the Container Cloud content delivery network (CDN):

  • mirror.mirantis.com (Debian packages including the Ubuntu mirrors)

  • binary.mirantis.com (Helm charts and binary artifacts)

  • mirantis.azurecr.io (Docker image registry)

All Container Cloud components are deployed in the Kubernetes clusters. All Container Cloud APIs are implemented using Kubernetes Custom Resource Definitions (CRDs) that represent custom objects stored in Kubernetes and allow you to extend the Kubernetes API.

The Container Cloud logic is implemented using controllers. A controller handles changes in the custom resources defined in the controller CRD. A custom resource contains a spec that describes the desired state of the resource as provided by the user. On every change, the controller reconciles the external state of the custom resource with the user-provided parameters and stores the resulting external state in the status subresource of that custom resource.
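
The following minimal sketch illustrates this spec/status pattern with a hypothetical custom resource. The API group, kind, and fields below are illustrative only and do not correspond to an actual Container Cloud API object:

 apiVersion: example.mirantis.com/v1alpha1   # hypothetical API group and version
 kind: ExampleResource                       # hypothetical kind
 metadata:
   name: demo
   namespace: default
 spec:                      # desired state provided by the user
   replicas: 3
   flavor: small
 status:                    # external state observed and stored by the controller
   observedReplicas: 3
   phase: Ready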

Container Cloud cluster types

The types of the Container Cloud clusters include:

Bootstrap cluster
  • Runs the bootstrap process on a seed data center bare metal node that can be reused after the management cluster deployment for other purposes.

  • Requires access to the bare metal provider backend.

  • Initially, the bootstrap cluster is created with the following minimal set of components: Bootstrap Controller, public API charts, and the Bootstrap API.

  • The user can interact with the bootstrap cluster through the Bootstrap API to create the configuration for a management cluster and start its deployment. More specifically, the user performs the following operations:

    1. Create required deployment objects.

    2. Optionally add proxy and SSH keys.

    3. Configure the cluster and machines.

    4. Deploy a management cluster.

  • The user can monitor the deployment progress of the cluster and machines.

  • After a successful deployment, the user can download the kubeconfig artifact of the provisioned cluster.

Management cluster

Comprises Container Cloud as product and provides the following functionality:

  • Runs all public APIs and services including the web UIs of Container Cloud.

  • Runs the provider-specific services and internal API including LCMMachine and LCMCluster. Also, it runs an LCM controller for orchestrating managed clusters and other controllers for handling different resources.

  • Requires two-way access to a provider backend. The provider connects to the backend to spawn managed cluster nodes, and the agent running on the nodes accesses the management cluster to obtain the deployment information.

For deployment details of a management cluster, see Deployment Guide.

Managed cluster
  • A Mirantis Kubernetes Engine (MKE) cluster that an end user creates using the Container Cloud web UI.

  • Requires access to its management cluster. Each node of a managed cluster runs an LCM Agent that connects to its LCMMachine object in the management cluster to obtain the deployment details.

  • Supports Mirantis OpenStack for Kubernetes (MOSK). For details, see MOSK documentation.

All types of the Container Cloud clusters except the bootstrap cluster are based on the MKE and Mirantis Container Runtime (MCR) architecture. For details, see MKE and MCR documentation.

The following diagram illustrates the distribution of services between each type of the Container Cloud clusters:

_images/cluster-types.png

Container Cloud provider

The Mirantis Container Cloud provider is the central component of Container Cloud that provisions a node of a management or managed cluster and runs the LCM Agent on this node. It runs in a management cluster and requires connection to a provider backend.

The Container Cloud provider interacts with the following types of public API objects:

Public API object name

Description

Container Cloud release object

Contains the following information about clusters:

  • Version of the supported Cluster release for a management cluster

  • List of supported Cluster releases for the managed clusters and supported upgrade path

  • Description of Helm charts that are installed on the management cluster

Cluster release object

  • Provides a specific version of a management or managed cluster. A Cluster release object, as well as a Container Cloud release object, never changes; only new releases can be added. Any change leads to a new release of a cluster.

  • Contains references to all components and their versions that are used to deploy all cluster types:

    • LCM components:

      • LCM Agent

      • Ansible playbooks

      • Scripts

      • Description of steps to execute during a cluster deployment and upgrade

      • Helm Controller image references

    • Supported Helm charts description:

      • Helm chart name and version

      • Helm release name

      • Helm values

Cluster object

  • References the Credentials, KaaSRelease and ClusterRelease objects.

  • Represents all cluster-level resources, for example, networks, load balancer for the Kubernetes API, and so on. It uses data from the Credentials object to create these resources and data from the KaaSRelease and ClusterRelease objects to ensure that all lower-level cluster objects are created.

Machine object

  • References the Cluster object.

  • Represents one node of a managed cluster and contains all data to provision it.

Credentials object

Contains all information necessary to connect to a provider backend.

PublicKey object

Is provided to every machine to enable SSH access.
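
The sketch below illustrates, in a heavily simplified and hypothetical form, how these objects reference each other. The exact field layout is provider-specific and differs between releases; the providerSpec contents, names, and values are assumptions for illustration only:

 apiVersion: cluster.k8s.io/v1alpha1
 kind: Cluster
 metadata:
   name: managed-cluster                    # placeholder name
   namespace: managed-ns
 spec:
   providerSpec:
     value:
       # Illustrative references only; actual fields are provider-specific.
       credentials: cloud-config            # Credentials object
       release: <cluster-release-name>      # ClusterRelease reference, placeholder
       publicKeys:
       - name: bootstrap-key                # PublicKey object
 ---
 apiVersion: cluster.k8s.io/v1alpha1
 kind: Machine
 metadata:
   name: managed-cluster-worker-0
   namespace: managed-ns
   labels:
     cluster.sigs.k8s.io/cluster-name: managed-cluster   # reference to the Cluster object
 spec:
   providerSpec:
     value: {}                              # provider-specific machine parameters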

The following diagram illustrates the Container Cloud provider data flow:

_images/provider-dataflow.png

The Container Cloud provider performs the following operations in Container Cloud:

  • Consumes the below types of data from a management cluster:

    • Credentials to connect to a provider backend

    • Deployment instructions from the KaaSRelease and ClusterRelease objects

    • The cluster-level parameters from the Cluster objects

    • The machine-level parameters from the Machine objects

  • Prepares data for all Container Cloud components:

    • Creates the LCMCluster and LCMMachine custom resources for LCM Controller and LCM Agent. The LCMMachine custom resources are created empty to be later handled by the LCM Controller.

    • Creates the HelmBundle custom resources for the Helm Controller using data from the KaaSRelease and ClusterRelease objects.

    • Creates service accounts for these custom resources.

    • Creates a scope in Identity and access management (IAM) for user access to a managed cluster.

  • Provisions nodes for a managed cluster using the cloud-init script that downloads and runs the LCM Agent.

  • Installs Helm Controller as a Helm v3 chart.
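
As an illustration of how a node can be provisioned with cloud-init, the fragment below shows the general shape of a user data payload that installs an agent as a systemd unit. This is a hypothetical sketch only: the actual payload, file paths, unit name, and download URL are generated by the Container Cloud provider and differ from this example:

 #cloud-config
 write_files:
 - path: /etc/systemd/system/lcm-agent.service   # hypothetical unit name and path
   content: |
     [Unit]
     Description=LCM Agent
     After=network-online.target

     [Service]
     ExecStart=/usr/local/bin/lcm-agent
     Restart=always

     [Install]
     WantedBy=multi-user.target
 runcmd:
 - curl -fsSL -o /usr/local/bin/lcm-agent <lcm-agent-download-url>   # placeholder URL
 - chmod +x /usr/local/bin/lcm-agent
 - systemctl daemon-reload
 - systemctl enable --now lcm-agent.service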

Release Controller

The Mirantis Container Cloud Release Controller is responsible for the following functionality:

  • Monitor and control the KaaSRelease and ClusterRelease objects present in a management cluster. If any release object is used in a cluster, the Release Controller prevents the deletion of such an object.

  • Sync the KaaSRelease and ClusterRelease objects published at https://binary.mirantis.com/releases/ with an existing management cluster.

  • Trigger the Container Cloud auto-update procedure if a new KaaSRelease object is found:

    1. Search for the managed clusters with old Cluster releases that are not supported by a new Container Cloud release. If any are detected, abort the auto-update and display a corresponding note about an old Cluster release in the Container Cloud web UI for the managed clusters. In this case, a user must update all managed clusters using the Container Cloud web UI. Once all managed clusters are updated to the Cluster releases supported by a new Container Cloud release, the Container Cloud auto-update is retriggered by the Release Controller.

    2. Trigger the Container Cloud release update of all Container Cloud components in a management cluster. The update itself is processed by the Container Cloud provider.

    3. Trigger the Cluster release update of a management cluster to the Cluster release version that is indicated in the updated Container Cloud release version. The LCMCluster components, such as MKE, are updated before the HelmBundle components, such as StackLight or Ceph.

      Once a management cluster is updated, an option to update a managed cluster becomes available in the Container Cloud web UI. During a managed cluster update, all cluster components including Kubernetes are automatically updated to newer versions if available. The LCMCluster components, such as MKE, are updated before the HelmBundle components, such as StackLight or Ceph.

The Operator can delay the Container Cloud automatic upgrade procedure for a limited amount of time or schedule the upgrade to run at desired hours or on desired weekdays. For details, see Schedule Mirantis Container Cloud updates.

Container Cloud remains operational during the management cluster upgrade. Managed clusters are not affected during this upgrade. For the list of components that are updated during the Container Cloud upgrade, see the Components versions section of the corresponding Container Cloud release in Release Notes.

When Mirantis announces support of the newest versions of Mirantis Container Runtime (MCR) and Mirantis Kubernetes Engine (MKE), Container Cloud automatically upgrades these components as well. For the maintenance window best practices before upgrade of these components, see MKE Documentation.

See also

Patch releases

Web UI

The Mirantis Container Cloud web UI is mainly designed for creating and updating managed clusters as well as adding machines to or removing machines from an existing managed cluster.

You can use the Container Cloud web UI to obtain the management cluster details including endpoints, release version, and so on. The management cluster update occurs automatically, with the change log of each new release available through the Container Cloud web UI.

The Container Cloud web UI is a JavaScript application that is based on the React framework. The Container Cloud web UI is designed to work on a client side only. Therefore, it does not require a special backend. It interacts with the Kubernetes and Keycloak APIs directly. The Container Cloud web UI uses a Keycloak token to interact with Container Cloud API and download kubeconfig for the management and managed clusters.

The Container Cloud web UI uses NGINX that runs on a management cluster and handles the Container Cloud web UI static files. NGINX proxies the Kubernetes and Keycloak APIs for the Container Cloud web UI.

Bare metal

The bare metal service provides the discovery, deployment, and management of bare metal hosts.

The bare metal management in Mirantis Container Cloud is implemented as a set of modular microservices. Each microservice implements a certain requirement or function within the bare metal management system.

Bare metal components

The bare metal management solution for Mirantis Container Cloud includes the following components:

Bare metal components

Component

Description

OpenStack Ironic

The backend bare metal manager in a standalone mode with its auxiliary services that include httpd, dnsmasq, and mariadb.

OpenStack Ironic Inspector

Introspects and discovers the bare metal host inventory. Includes the OpenStack Ironic Python Agent (IPA) that is used as a provision-time agent for managing bare metal hosts.

Ironic Operator

Monitors changes in the external IP addresses of httpd, ironic, and ironic-inspector and automatically reconciles the configuration for dnsmasq, ironic, baremetal-provider, and baremetal-operator.

Bare Metal Operator

Manages bare metal hosts through the Ironic API. The Container Cloud bare-metal operator implementation is based on the Metal³ project.

Bare metal resources manager

Ensures that the bare metal provisioning artifacts, such as the distribution image of the operating system, are available and up to date.

cluster-api-provider-baremetal

The plugin for the Kubernetes Cluster API integrated with Container Cloud. Container Cloud uses the Metal³ implementation of cluster-api-provider-baremetal for the Cluster API.

HAProxy

Load balancer for external access to the Kubernetes API endpoint.

LCM Agent

Manages physical and logical storage, physical and logical networking, and the life cycle of bare metal machine resources.

Ceph

Distributed shared storage required by the Container Cloud services to create persistent volumes for storing their data.

MetalLB

Load balancer for Kubernetes services on bare metal. 1

Keepalived

Monitoring service that ensures availability of the virtual IP for the external load balancer endpoint (HAProxy). 1

IPAM

IP address management services provide consistent IP address space to the machines in bare metal clusters. See details in IP Address Management.

1 For details, see Built-in load balancing.

The diagram below summarizes the following components and resource kinds:

  • Metal³-based bare metal management in Container Cloud (white)

  • Internal APIs (yellow)

  • External dependency components (blue)

_images/bm-component-stack.png
Bare metal networking

This section provides an overview of the networking configuration and the IP address management in the Mirantis Container Cloud on bare metal.

IP Address Management

Mirantis Container Cloud on bare metal uses IP Address Management (IPAM) to keep track of the network addresses allocated to bare metal hosts. This is necessary to avoid IP address conflicts and expiration of address leases to machines through DHCP.

Note

Only the IPv4 address family is currently supported by Container Cloud and IPAM. IPv6 is not supported and not used in Container Cloud.

IPAM is provided by the kaas-ipam controller. Its functions include:

  • Allocation of IP address ranges or subnets to newly created clusters using the Subnet resource.

    Note

    Before Container Cloud 2.27.0 (Cluster releases 17.1.0, 16.1.0, or earlier), the deprecated SubnetPool resource was also used for this purpose. For details, see MOSK Deprecation Notes: SubnetPool resource management.

  • Allocation of IP addresses to machines and cluster services at the request of baremetal-provider using the IpamHost and IPaddr resources.

  • Creation and maintenance of host networking configuration on the bare metal hosts using the IpamHost resources.
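
For illustration, the Subnet resource mentioned above generally looks as follows. This is a simplified sketch: the required labels and fields depend on the product release and on the role of the subnet, so treat all values below as placeholders:

 apiVersion: ipam.mirantis.com/v1alpha1
 kind: Subnet
 metadata:
   name: demo-lcm-subnet
   namespace: managed-ns
   labels:
     kaas.mirantis.com/provider: baremetal   # example label, release-dependent
 spec:
   cidr: 172.16.42.0/24
   gateway: 172.16.42.1
   includeRanges:
   - 172.16.42.100-172.16.42.200
   nameservers:
   - 172.16.40.10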

The IPAM service can support different networking topologies and network hardware configurations on the bare metal hosts.

In the most basic network configuration, IPAM uses a single L3 network to assign addresses to all bare metal hosts, as defined in Managed cluster networking.

You can apply complex networking configurations to a bare metal host using the L2 templates. The L2 templates imply multihomed host networking and enable you to create a managed cluster where nodes use separate host networks for different types of traffic. Multihoming is required to ensure the security and performance of a managed cluster.

Caution

Modification of L2 templates in use is allowed with a mandatory validation step from the Infrastructure Operator to prevent accidental cluster failures due to unsafe changes. The list of risks posed by modifying L2 templates includes:

  • Services running on hosts cannot reconfigure automatically to switch to the new IP addresses and/or interfaces.

  • Connections between services are interrupted unexpectedly, which can cause data loss.

  • Incorrect configurations on hosts can lead to irrevocable loss of connectivity between services and unexpected cluster partition or disassembly.

For details, see Modify network configuration on an existing machine.
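
To make the idea of L2 templates more concrete, the following heavily simplified sketch maps two host interfaces to separate bridges for LCM and Kubernetes workloads traffic. The subnet names, interface mapping, and exact template macros are assumptions for illustration only; refer to Create L2 templates for the authoritative format:

 apiVersion: ipam.mirantis.com/v1alpha1
 kind: L2Template
 metadata:
   name: demo-l2template
   namespace: managed-ns
   labels:
     cluster.sigs.k8s.io/cluster-name: managed-cluster
 spec:
   l3Layout:
   - subnetName: demo-lcm-subnet        # hypothetical Subnet names
     scope: namespace
   - subnetName: demo-pods-subnet
     scope: namespace
   npTemplate: |
     version: 2
     ethernets:
       {{nic 0}}:
         dhcp4: false
         dhcp6: false
         match:
           macaddress: {{mac 0}}
         set-name: {{nic 0}}
       {{nic 1}}:
         dhcp4: false
         dhcp6: false
         match:
           macaddress: {{mac 1}}
         set-name: {{nic 1}}
     bridges:
       k8s-lcm:
         interfaces: [{{nic 0}}]
         addresses:
         - {{ip "k8s-lcm:demo-lcm-subnet"}}
         gateway4: {{gateway_from_subnet "demo-lcm-subnet"}}
       k8s-pods:
         interfaces: [{{nic 1}}]
         addresses:
         - {{ip "k8s-pods:demo-pods-subnet"}}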

Management cluster networking

The main purpose of networking in a Container Cloud management cluster is to provide access to the Container Cloud Management API that consists of:

  • Container Cloud Public API

    Used by end users to provision and configure managed clusters and machines. Includes the Container Cloud web UI.

  • Container Cloud LCM API

    Used by LCM agents in managed clusters to obtain configuration and report status. Contains provider-specific services and internal API including LCMMachine and LCMCluster objects.

The following types of networks are supported for the management clusters in Container Cloud:

  • PXE network

    Enables PXE boot of all bare metal machines in the Container Cloud region.

    • PXE subnet

      Provides IP addresses for DHCP and network boot of the bare metal hosts for initial inspection and operating system provisioning. This network may not have the default gateway or a router connected to it. The PXE subnet is defined by the Container Cloud Operator during bootstrap.

      Provides IP addresses for the bare metal management services of Container Cloud, such as bare metal provisioning service (Ironic). These addresses are allocated and served by MetalLB.

  • Management network

    Connects LCM Agents running on the hosts to the Container Cloud LCM API. Serves the external connections to the Container Cloud Management API. The network is also used for communication between kubelet and the Kubernetes API server inside a Kubernetes cluster. The MKE components use this network for communication inside a swarm cluster.

    • LCM subnet

      Provides IP addresses for the Kubernetes nodes in the management cluster. This network also provides a Virtual IP (VIP) address for the load balancer that enables external access to the Kubernetes API of a management cluster. This VIP is also the endpoint to access the Container Cloud Management API in the management cluster.

      Provides IP addresses for the externally accessible services of Container Cloud, such as Keycloak, web UI, StackLight. These addresses are allocated and served by MetalLB.

  • Kubernetes workloads network

    Technology Preview

    Serves the internal traffic between workloads on the management cluster.

    • Kubernetes workloads subnet

      Provides IP addresses that are assigned to nodes and used by Calico.

  • Out-of-Band (OOB) network

    Connects to Baseboard Management Controllers of the servers that host the management cluster. The OOB subnet must be accessible from the management network through IP routing. The OOB network is not managed by Container Cloud and is not represented in the IPAM API.

Managed cluster networking

Kubernetes cluster networking is typically focused on connecting pods that run on different nodes. On bare metal, however, cluster networking is more complex because it needs to facilitate many different types of traffic.

Kubernetes clusters managed by Mirantis Container Cloud have the following types of traffic:

  • PXE network

    Enables the PXE boot of all bare metal machines in Container Cloud. This network is not configured on the hosts in a managed cluster. It is used by the bare metal provider to provision additional hosts in managed clusters and is disabled on the hosts after provisioning is done.

  • Life-cycle management (LCM) network

    Connects LCM Agents running on the hosts to the Container Cloud LCM API. The LCM API is provided by the management cluster. The LCM network is also used for communication between kubelet and the Kubernetes API server inside a Kubernetes cluster. The MKE components use this network for communication inside a swarm cluster.

    When using the BGP announcement of the IP address for the cluster API load balancer, which is available as Technology Preview since Container Cloud 2.24.4, no segment stretching is required between Kubernetes master nodes. Also, in this scenario, the load balancer IP address is not required to match the LCM subnet CIDR address.

    • LCM subnet(s)

      Provides IP addresses that are statically allocated by the IPAM service to bare metal hosts. This network must be connected to the Kubernetes API endpoint of the management cluster through an IP router.

      LCM Agents running on managed clusters will connect to the management cluster API through this router. LCM subnets may be different per managed cluster as long as this connection requirement is satisfied.

      The Virtual IP (VIP) address for load balancer that enables access to the Kubernetes API of the managed cluster must be allocated from the LCM subnet.

    • Cluster API subnet

      Technology Preview

      Provides a load balancer IP address for external access to the cluster API. Mirantis recommends that this subnet stays unique per managed cluster.

  • Kubernetes workloads network

    Serves as an underlay network for traffic between pods in the managed cluster. Do not share this network between clusters.

    • Kubernetes workloads subnet(s)

      Provides IP addresses that are statically allocated by the IPAM service to all nodes and that are used by Calico for cross-node communication inside a cluster. By default, VXLAN overlay is used for Calico cross-node communication.

  • Kubernetes external network

    Serves ingress traffic to the managed cluster from the outside world. You can share this network between clusters, but with dedicated subnets per cluster. Several or all cluster nodes must be connected to this network. Traffic from external users to the externally available Kubernetes load-balanced services comes through the nodes that are connected to this network.

    • Services subnet(s)

      Provides IP addresses for externally available Kubernetes load-balanced services. The address ranges for MetalLB are assigned from this subnet. There can be several subnets per managed cluster that define the address ranges or address pools for MetalLB.

    • External subnet(s)

      Provides IP addresses that are statically allocated by the IPAM service to nodes. The IP gateway in this network is used as the default route on all nodes that are connected to this network. This network allows external users to connect to the cluster services exposed as Kubernetes load-balanced services. MetalLB speakers must run on the same nodes. For details, see Configure node selector for MetalLB speaker.

  • Storage network

    Serves storage access and replication traffic from and to Ceph OSD services. The storage network does not need to be connected to any IP routers and does not require external access, unless you want to use Ceph from outside of a Kubernetes cluster. To use a dedicated storage network, define and configure both subnets listed below.

    • Storage access subnet(s)

      Provides IP addresses that are statically allocated by the IPAM service to Ceph nodes. The Ceph OSD services bind to these addresses on their respective nodes. Serves Ceph access traffic from and to storage clients. This is a public network in Ceph terms. 1

    • Storage replication subnet(s)

      Provides IP addresses that are statically allocated by the IPAM service to Ceph nodes. The Ceph OSD services bind to these addresses on their respective nodes. Serves Ceph internal replication traffic. This is a cluster network in Ceph terms. 1

  • Out-of-Band (OOB) network

    Connects baseboard management controllers (BMCs) of the bare metal hosts. This network must not be accessible from the managed clusters.

The following diagram illustrates the networking schema of the Container Cloud deployment on bare metal with a managed cluster:

_images/bm-cluster-l3-networking-multihomed.png

1 For more details about Ceph networks, see Ceph Network Configuration Reference.

Host networking

The following network roles are defined for all Mirantis Container Cloud cluster nodes on bare metal, including the bootstrap, management, and managed cluster nodes:

  • Out-of-band (OOB) network

    Connects the Baseboard Management Controllers (BMCs) of the hosts in the network to Ironic. This network is out of band for the host operating system.

  • PXE network

    Enables remote booting of servers through the PXE protocol. In management clusters, the DHCP server listens on this network for host discovery and inspection. In managed clusters, hosts use this network for the initial PXE boot and provisioning.

  • LCM network

    Connects LCM Agents running on the node to the LCM API of the management cluster. It is also used for communication between kubelet and the Kubernetes API server inside a Kubernetes cluster. The MKE components use this network for communication inside a swarm cluster. In management clusters, it is replaced by the management network.

  • Kubernetes workloads (pods) network

    Technology Preview

    Serves connections between Kubernetes pods. Each host has an address on this network, and this address is used by Calico as an endpoint to the underlay network.

  • Kubernetes external network

    Technology Preview

    Serves external connection to the Kubernetes API and the user services exposed by the cluster. In management clusters, it is replaced by the management network.

  • Management network

    Serves external connections to the Container Cloud Management API and services of the management cluster. Not available in a managed cluster.

  • Storage access network

    Connects Ceph nodes to the storage clients. The Ceph OSD service is bound to the address on this network. This is a public network in Ceph terms. 0

  • Storage replication network

    Connects Ceph nodes to each other. Serves internal replication traffic. This is a cluster network in Ceph terms. 0

Each network is represented on the host by a virtual Linux bridge. Physical interfaces may be connected to one of the bridges directly, or through a logical VLAN subinterface, or combined into a bond interface that is in turn connected to a bridge.

The following table summarizes the default names used for the bridges connected to the networks listed above:

Management cluster

  Network type                            Bridge name    Assignment method (Technology Preview)
  OOB network                             N/A            N/A
  PXE network                             bm-pxe         By a static interface name
  Management network                      k8s-lcm 2      By the subnet label ipam/SVC-k8s-lcm
  Kubernetes workloads network            k8s-pods 1     By a static interface name

Managed cluster

  Network type                            Bridge name    Assignment method
  OOB network                             N/A            N/A
  PXE network                             N/A            N/A
  LCM network                             k8s-lcm 2      By the subnet label ipam/SVC-k8s-lcm
  Kubernetes workloads network            k8s-pods 1     By a static interface name
  Kubernetes external network             k8s-ext        By a static interface name
  Storage access (public) network         ceph-public    By the subnet label ipam/SVC-ceph-public
  Storage replication (cluster) network   ceph-cluster   By the subnet label ipam/SVC-ceph-cluster

0  See Ceph network configuration reference.

1  The interface name for this network role is static and cannot be changed.

2  Use of this interface name (and network role) is mandatory for every cluster.

Storage

The baremetal-based Mirantis Container Cloud uses Ceph as a distributed storage system for file, block, and object storage. This section provides an overview of a Ceph cluster deployed by Container Cloud.

Overview

Mirantis Container Cloud deploys Ceph on baremetal-based managed clusters using Helm charts with the following components:

Rook Ceph Operator

A storage orchestrator that deploys Ceph on top of a Kubernetes cluster. Also known as Rook or Rook Operator. Rook operations include:

  • Deploying and managing a Ceph cluster based on provided Rook CRs such as CephCluster, CephBlockPool, CephObjectStore, and so on.

  • Orchestrating the state of the Ceph cluster and all its daemons.

KaaSCephCluster custom resource (CR)

Represents the customization of a Kubernetes installation and allows you to define the required Ceph configuration through the Container Cloud web UI before deployment. For example, you can define the failure domain, Ceph pools, Ceph node roles, number of Ceph components such as Ceph OSDs, and so on. The ceph-kcc-controller controller on the Container Cloud management cluster manages the KaaSCephCluster CR.

Ceph Controller

A Kubernetes controller that obtains the parameters from Container Cloud through a CR, creates CRs for Rook and updates its CR status based on the Ceph cluster deployment progress. It creates users, pools, and keys for OpenStack and Kubernetes and provides Ceph configurations and keys to access them. Also, Ceph Controller eventually obtains the data from the OpenStack Controller for the Keystone integration and updates the RADOS Gateway services configurations to use Kubernetes for user authentication. Ceph Controller operations include:

  • Transforming user parameters from the Container Cloud Ceph CR into Rook CRs and deploying a Ceph cluster using Rook.

  • Providing integration of the Ceph cluster with Kubernetes.

  • Providing data for OpenStack to integrate with the deployed Ceph cluster.

Ceph Status Controller

A Kubernetes controller that collects all valuable parameters from the current Ceph cluster, its daemons, and entities and exposes them into the KaaSCephCluster status. Ceph Status Controller operations include:

  • Collecting all statuses from a Ceph cluster and corresponding Rook CRs.

  • Collecting additional information on the health of Ceph daemons.

  • Providing information to the status section of the KaaSCephCluster CR.

Ceph Request Controller

A Kubernetes controller that obtains the parameters from Container Cloud through a CR and handles Ceph OSD life cycle management (LCM) operations. It allows for safe removal of Ceph OSDs from the Ceph cluster. Ceph Request Controller operations include:

  • Providing an ability to perform Ceph OSD LCM operations.

  • Obtaining specific CRs to remove Ceph OSDs and executing them.

  • Pausing the regular Ceph Controller reconcile until all requests are completed.

A typical Ceph cluster consists of the following components:

  • Ceph Monitors - three or, in rare cases, five Ceph Monitors.

  • Ceph Managers:

    • Before Container Cloud 2.22.0, one Ceph Manager.

    • Since Container Cloud 2.22.0, two Ceph Managers.

  • RADOS Gateway services - Mirantis recommends having three or more RADOS Gateway instances for HA.

  • Ceph OSDs - the number of Ceph OSDs may vary according to the deployment needs.

    Warning

    • A Ceph cluster with 3 Ceph nodes does not provide hardware fault tolerance and is not eligible for recovery operations, such as a disk or an entire Ceph node replacement.

    • A Ceph cluster uses a replication factor of 3. If the number of Ceph OSDs is less than 3, the Ceph cluster moves to the degraded state with write operations restricted until the number of alive Ceph OSDs equals the replication factor again.

The placement of Ceph Monitors and Ceph Managers is defined in the KaaSCephCluster CR.

The following diagram illustrates the way a Ceph cluster is deployed in Container Cloud:

_images/ceph-deployment.png

The following diagram illustrates the processes within a deployed Ceph cluster:

_images/ceph-data-flow.png
Limitations

A Ceph cluster configuration in Mirantis Container Cloud includes but is not limited to the following limitations:

  • Only one Ceph Controller per managed cluster and only one Ceph cluster per Ceph Controller are supported.

  • The replication size for any Ceph pool must be set to more than 1.

  • All CRUSH rules must have the same failure_domain.

  • Only one CRUSH tree per cluster is supported. The separation of devices per Ceph pool is supported through device classes, with only one pool of each type for a device class.

  • Only the following types of CRUSH buckets are supported:

    • topology.kubernetes.io/region

    • topology.kubernetes.io/zone

    • topology.rook.io/datacenter

    • topology.rook.io/room

    • topology.rook.io/pod

    • topology.rook.io/pdu

    • topology.rook.io/row

    • topology.rook.io/rack

    • topology.rook.io/chassis

  • Only IPv4 is supported.

  • If two or more Ceph OSDs are located on the same device, there must be no dedicated WAL or DB for this class.

  • Only a full collocation or dedicated WAL and DB configurations are supported.

  • The minimum size of any defined Ceph OSD device is 5 GB.

  • Lifted since Container Cloud 2.24.2 (Cluster releases 14.0.1 and 15.0.1). Ceph cluster does not support removable devices (with hotplug enabled) for deploying Ceph OSDs.

  • Ceph OSDs support only raw disks as data devices meaning that no dm or lvm devices are allowed.

  • When adding a Ceph node with the Ceph Monitor role, if any issues occur with the Ceph Monitor, rook-ceph removes it and adds a new Ceph Monitor instead, named using the next alphabetic character in order. Therefore, the Ceph Monitor names may not follow the alphabetical order. For example, a, b, d, instead of a, b, c.

  • Reducing the number of Ceph Monitors is not supported and causes the removal of Ceph Monitor daemons from random nodes.

  • Removal of the mgr role in the nodes section of the KaaSCephCluster CR does not remove Ceph Managers. To remove a Ceph Manager from a node, remove it from the nodes spec and manually delete the mgr pod in the Rook namespace.

  • Lifted since Container Cloud 2.26.0 (Cluster releases 17.1.0 and 16.1.10). Ceph does not support allocation of Ceph RGW pods on nodes where the Federal Information Processing Standard (FIPS) mode is enabled.

Addressing storage devices

There are several formats to use when specifying and addressing storage devices of a Ceph cluster. The default and recommended one is the /dev/disk/by-id format. This format is reliable and unaffected by the disk controller actions, such as device name shuffling or /dev/disk/by-path recalculating.

Difference between by-id, name, and by-path formats

In most cases, the storage device /dev/disk/by-id format is based on a disk serial number, which is unique for each disk. A by-id symlink is created by the udev rules in the following format, where <BusID> is the ID of the bus to which the disk is attached and <DiskSerialNumber> stands for the unique disk serial number:

/dev/disk/by-id/<BusID>-<DiskSerialNumber>

Typical by-id symlinks for storage devices look as follows:

/dev/disk/by-id/nvme-SAMSUNG_MZ1LB3T8HMLA-00007_S46FNY0R394543
/dev/disk/by-id/scsi-SATA_HGST_HUS724040AL_PN1334PEHN18ZS
/dev/disk/by-id/ata-WDC_WD4003FZEX-00Z4SA0_WD-WMC5D0D9DMEH

In the example above, symlinks contain the following IDs:

  • Bus IDs: nvme, scsi-SATA and ata

  • Disk serial numbers: SAMSUNG_MZ1LB3T8HMLA-00007_S46FNY0R394543, HGST_HUS724040AL_PN1334PEHN18ZS and WDC_WD4003FZEX-00Z4SA0_WD-WMC5D0D9DMEH.

An exception to this rule is the wwn by-id symlinks, which are programmatically generated at boot. They are not solely based on disk serial numbers but also include other node information. This can lead to the wwn being recalculated when the node reboots. As a result, this symlink type cannot guarantee a persistent disk identifier and should not be used as a stable storage device symlink in a Ceph cluster.

The storage device name and by-path formats cannot be considered persistent because the sequence in which block devices are added during boot is semi-arbitrary. This means that block device names, for example, nvme0n1 and sdc, are assigned to physical disks during discovery, which may vary inconsistently from the previous node state. The same inconsistency applies to by-path symlinks, as they rely on the shortest physical path to the device at boot and may differ from the previous node state.

Therefore, Mirantis highly recommends using storage device by-id symlinks that contain disk serial numbers. This approach enables you to use a persistent device identifier addressed in the Ceph cluster specification.

Example KaaSCephCluster with device by-id identifiers

Below is an example KaaSCephCluster custom resource using the /dev/disk/by-id format for storage devices specification:

Note

Since Container Cloud 2.25.0, you can use the fullPath field for the by-id symlinks. For earlier product versions, use the name field instead.

 apiVersion: kaas.mirantis.com/v1alpha1
 kind: KaaSCephCluster
 metadata:
   name: ceph-cluster-managed-cluster
   namespace: managed-ns
 spec:
   cephClusterSpec:
     nodes:
       # Add the exact ``nodes`` names.
       # Obtain the name from the "get machine" list.
       cz812-managed-cluster-storage-worker-noefi-58spl:
         roles:
         - mgr
         - mon
       # All disk configuration must be reflected in ``status.providerStatus.hardware.storage`` of the ``Machine`` object
         storageDevices:
         - config:
             deviceClass: ssd
           fullPath: /dev/disk/by-id/scsi-1ATA_WDC_WDS100T2B0A-00SM50_200231440912
       cz813-managed-cluster-storage-worker-noefi-lr4k4:
         roles:
         - mgr
         - mon
         storageDevices:
         - config:
             deviceClass: nvme
           fullPath: /dev/disk/by-id/nvme-SAMSUNG_MZ1LB3T8HMLA-00007_S46FNY0R394543
       cz814-managed-cluster-storage-worker-noefi-z2m67:
         roles:
         - mgr
         - mon
         storageDevices:
         - config:
             deviceClass: nvme
           fullPath: /dev/disk/by-id/nvme-SAMSUNG_ML1EB3T8HMLA-00007_S46FNY1R130423
     pools:
     - default: true
       deviceClass: ssd
       name: kubernetes
       replicated:
         size: 3
       role: kubernetes
   k8sCluster:
     name: managed-cluster
     namespace: managed-ns
Extended hardware configuration

Mirantis Container Cloud provides APIs that enable you to define hardware configurations that extend the reference architecture:

  • Bare Metal Host Profile API

    Enables quick configuration of host boot and storage devices and assignment of custom configuration profiles to individual machines. See Create a custom bare metal host profile.

  • IP Address Management API

    Enables quick configuration of host network interfaces and IP addresses as well as setting up IP address ranges for automatic allocation. See Create L2 templates.

Typically, operations with the extended hardware configurations are available through the API and CLI, but not the web UI.

Automatic upgrade of a host operating system

To keep the operating system on a bare metal host up to date with the latest security updates, the operating system requires periodic software package upgrades that may or may not require a host reboot.

Mirantis Container Cloud uses life cycle management tools to update the operating system packages on the bare metal hosts. Container Cloud may also trigger restart of bare metal hosts to apply the updates.

In the management cluster of Container Cloud, software package upgrades and host restarts are applied automatically when a new Container Cloud version with available kernel or software package upgrades is released.

In managed clusters, package upgrades and host restarts are applied as part of the usual cluster update using the Update cluster option in the Container Cloud web UI.

Operating system upgrades and host restarts are applied to cluster nodes one by one. If Ceph is installed in the cluster, the Container Cloud orchestration securely pauses the Ceph OSDs on the node before restart. This avoids degradation of the storage service.

Caution

  • Depending on the cluster configuration, applying security updates and host restart can increase the update time for each node to up to 1 hour.

  • Cluster nodes are updated one by one. Therefore, for large clusters, the update may take several days to complete.

Built-in load balancing

The Mirantis Container Cloud managed clusters use MetalLB for load balancing of services, and HAProxy with a virtual IP address (VIP) managed by the Virtual Router Redundancy Protocol (VRRP) through Keepalived as the Kubernetes API load balancer.

Kubernetes API load balancing

Every control plane node of each Kubernetes cluster runs the kube-api service in a container. This service provides a Kubernetes API endpoint. Every control plane node also runs the haproxy server that provides load balancing with backend health checking for all kube-api endpoints as backends.

The default load balancing method is least_conn. With this method, a request is sent to the server with the least number of active connections. The default load balancing method cannot be changed using the Container Cloud API.

Only one of the control plane nodes at any given time serves as a front end for Kubernetes API. To ensure this, the Kubernetes clients use a virtual IP (VIP) address for accessing Kubernetes API. This VIP is assigned to one node at a time using VRRP. Keepalived running on each control plane node provides health checking and failover of the VIP.

Keepalived is configured in multicast mode.

Note

The use of a VIP address for load balancing of the Kubernetes API requires that all control plane nodes of a Kubernetes cluster are connected to a shared L2 segment. This limitation prevents installing full L3 topologies where control plane nodes are split between different L2 segments and L3 networks.

Services load balancing

The services provided by the Kubernetes clusters, including Container Cloud and user services, are balanced by MetalLB. The metallb-speaker service runs on every worker node in the cluster and handles connections to the service IP addresses.

MetalLB runs in the MAC-based (L2) mode. This means that all control plane nodes must be connected to a shared L2 segment, which is a limitation that does not allow installing full L3 cluster topologies.
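
Container Cloud manages the MetalLB configuration through its own API objects, so you do not create upstream MetalLB resources directly. Purely as an illustration of the L2 announcement mode described above, the equivalent upstream MetalLB objects look as follows; the names and address range are placeholders:

 apiVersion: metallb.io/v1beta1
 kind: IPAddressPool
 metadata:
   name: services-pool              # placeholder name
   namespace: metallb-system
 spec:
   addresses:
   - 10.0.10.100-10.0.10.120        # placeholder range from the Services subnet
 ---
 apiVersion: metallb.io/v1beta1
 kind: L2Advertisement
 metadata:
   name: services-l2
   namespace: metallb-system
 spec:
   ipAddressPools:
   - services-pool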

Kubernetes lifecycle management

The Kubernetes lifecycle management (LCM) engine in Mirantis Container Cloud consists of the following components:

LCM Controller

Responsible for all LCM operations. Consumes the LCMCluster object and orchestrates actions through LCM Agent.

LCM Agent

Runs on the target host. Executes Ansible playbooks in headless mode.

Helm Controller

Responsible for the Helm charts life cycle, is installed by the provider as a Helm v3 chart.

The Kubernetes LCM components handle the following custom resources:

  • LCMCluster

  • LCMMachine

  • HelmBundle

The following diagram illustrates handling of the LCM custom resources by the Kubernetes LCM components. On a managed cluster, apiserver handles multiple Kubernetes objects, for example, deployments, nodes, RBAC, and so on.

_images/lcm-components.png
LCM custom resources

The Kubernetes LCM components handle the following custom resources (CRs):

  • LCMMachine

  • LCMCluster

  • HelmBundle

LCMMachine

Describes a machine that is located in a cluster. It contains the machine type (control or worker) and StateItems that correspond to Ansible playbooks and miscellaneous actions, for example, downloading a file or executing a shell command. LCMMachine reflects the current state of the machine, for example, a node IP address, as well as the state of each StateItem, through its status. Multiple LCMMachine CRs can correspond to a single cluster.

LCMCluster

Describes a managed cluster. In its spec, LCMCluster contains a set of StateItems for each type of LCMMachine, which describe the actions that must be performed to deploy the cluster. LCMCluster is created by the provider, using machineTypes of the Release object. The status field of LCMCluster reflects the status of the cluster, for example, the number of ready or requested nodes.

HelmBundle

Wrapper for Helm charts that is handled by Helm Controller. HelmBundle tracks what Helm charts must be installed on a managed cluster.
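
The following abridged sketch shows the general shape of an LCMMachine object as described above. The field names under spec and status are illustrative assumptions and may differ between product releases:

 apiVersion: lcm.mirantis.com/v1alpha1
 kind: LCMMachine
 metadata:
   name: managed-cluster-worker-0        # placeholder name
   namespace: managed-ns
 spec:
   clusterName: managed-cluster          # illustrative field names
   type: worker                          # control or worker
 status:
   hostname: kaas-node-worker-0          # reported by LCM Agent
   addresses:
   - address: 172.16.42.101              # node IP address, placeholder
     type: InternalIP
   state: Ready                          # lifecycle state, see LCM Controller below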

LCM Controller

LCM Controller runs on the management cluster and orchestrates the LCMMachine objects according to their type and their LCMCluster object.

Once the LCMCluster and LCMMachine objects are created, LCM Controller starts monitoring them to modify the spec fields and update the status fields of the LCMMachine objects when required. The status field of LCMMachine is updated by LCM Agent running on a node of a management or managed cluster.

Each LCMMachine has the following lifecycle states:

  1. Uninitialized - the machine is not yet assigned to an LCMCluster.

  2. Pending - the agent reports a node IP address and host name.

  3. Prepare - the machine executes StateItems that correspond to the prepare phase. This phase usually involves downloading the necessary archives and packages.

  4. Deploy - the machine executes StateItems that correspond to the deploy phase, that is, becoming a Mirantis Kubernetes Engine (MKE) node.

  5. Ready - the machine is deployed.

  6. Upgrade - the machine is being upgraded to the new MKE version.

  7. Reconfigure - the machine executes StateItems that correspond to the reconfigure phase. The machine configuration is being updated without affecting workloads running on the machine.

The templates for StateItems are stored in the machineTypes field of an LCMCluster object, with separate lists for the MKE manager and worker nodes. Each StateItem has the execution phase field for a management and managed cluster:

  1. The prepare phase is executed for all machines for which it was not executed yet. This phase comprises downloading the files necessary for the cluster deployment, installing the required packages, and so on.

  2. During the deploy phase, a node is added to the cluster. LCM Controller applies the deploy phase to the nodes in the following order:

    1. The first manager node is deployed.

    2. The remaining manager nodes are deployed one by one and the worker nodes are deployed in batches (by default, up to 50 worker nodes at the same time).

LCM Controller deploys and upgrades a Mirantis Container Cloud cluster by setting the StateItems of the LCMMachine objects following the corresponding StateItem phases described above. The Container Cloud cluster upgrade process follows the same logic that is used for a new deployment, that is, applying a new set of StateItems to the LCMMachines after updating the LCMCluster object. However, if an existing worker node is being upgraded, LCM Controller cordons and drains this node honoring the Pod Disruption Budgets, which prevents unexpected disruptions of the workloads.

LCM Agent

LCM Agent handles a single machine that belongs to a management or managed cluster. It runs on the machine operating system but communicates with apiserver of the management cluster. LCM Agent is deployed as a systemd unit using cloud-init. LCM Agent has a built-in self-upgrade mechanism.

LCM Agent monitors the spec of a particular LCMMachine object to reconcile the machine state with the object StateItems and update the LCMMachine status accordingly. The actions that LCM Agent performs while handling the StateItems are as follows:

  • Download configuration files

  • Run shell commands

  • Run Ansible playbooks in headless mode

LCM Agent provides the IP address and host name of the machine for the LCMMachine status parameter.

Helm Controller

Helm Controller is used by Mirantis Container Cloud to handle the core addons of management and managed clusters, such as StackLight, and the application addons, such as the OpenStack components.

Helm Controller is installed as a separate Helm v3 chart by the Container Cloud provider. Its Pods are created using Deployment.

The Helm release information is stored in the KaaSRelease object for the management clusters and in the ClusterRelease object for all types of the Container Cloud clusters. These objects are used by the Container Cloud provider. The Container Cloud provider uses the information from the ClusterRelease object together with the Container Cloud API Cluster spec. In the Cluster spec, the operator can specify the Helm release name and charts to use. By combining the information from the Cluster providerSpec parameter and its ClusterRelease object, the cluster actuator generates the LCMCluster objects, which are further handled by LCM Controller, and the HelmBundle object, which is handled by Helm Controller. HelmBundle must have the same name as the LCMCluster object of the cluster that it applies to.

Although a cluster actuator can only create a single HelmBundle per cluster, Helm Controller can handle multiple HelmBundle objects per cluster.

Helm Controller handles the HelmBundle objects and reconciles them with the state of Helm in its cluster.

Helm Controller can also be used by the management cluster with corresponding HelmBundle objects created as part of the initial management cluster setup.
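
For illustration, a HelmBundle object has roughly the following shape. The chart reference and field names are assumptions for illustration; the real objects are generated by the Container Cloud provider from the ClusterRelease data:

 apiVersion: lcm.mirantis.com/v1alpha1
 kind: HelmBundle
 metadata:
   name: managed-cluster              # matches the LCMCluster name
   namespace: managed-ns
 spec:
   releases:
   - name: stacklight                 # Helm release name
     chartURL: <chart-location>       # placeholder chart reference
     version: <chart-version>         # placeholder version
     namespace: stacklight
     values: {}                       # Helm values for the release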

Identity and access management

Identity and access management (IAM) provides a central point of user and permission management for the Mirantis Container Cloud cluster resources in a granular and unified manner. Also, IAM provides the infrastructure for a single sign-on user experience across all Container Cloud web portals.

IAM for Container Cloud consists of the following components:

Keycloak
  • Provides the OpenID Connect endpoint

  • Integrates with an external identity provider (IdP), for example, existing LDAP or Google Open Authorization (OAuth)

  • Stores roles mapping for users

IAM Controller
  • Provides IAM API with data about Container Cloud projects

  • Handles all role-based access control (RBAC) components in Kubernetes API

IAM API

Provides an abstraction API for creating user scopes and roles

External identity provider integration

To keep the user database and user permissions consistent and intact, IAM in Mirantis Container Cloud stores the user identity information internally. However, in real deployments, the identity provider usually already exists.

Out of the box, in Container Cloud, IAM supports integration with LDAP and Google Open Authorization (OAuth). If LDAP is configured as an external identity provider, IAM performs one-way synchronization by mapping attributes according to configuration.

In the case of the Google Open Authorization (OAuth) integration, the user is automatically registered and their credentials are stored in the internal database according to the user template configuration. The Google OAuth registration workflow is as follows:

  1. The user requests a Container Cloud web UI resource.

  2. The user is redirected to the IAM login page and logs in using the Log in with Google account option.

  3. IAM creates a new user with the default access rights that are defined in the user template configuration.

  4. The user can access the Container Cloud web UI resource.

The following diagram illustrates the external IdP integration to IAM:

_images/iam-ext-idp.png

You can configure simultaneous integration with both external IdPs with the user identity matching feature enabled.

Authentication and authorization

Mirantis IAM uses the OpenID Connect (OIDC) protocol for handling authentication.

Implementation flow

Mirantis IAM acts as an OpenID Connect (OIDC) provider: it issues tokens and exposes discovery endpoints.

The credentials can be handled by IAM itself or delegated to an external identity provider (IdP).

The issued JSON Web Token (JWT) is sufficient to perform operations across Mirantis Container Cloud according to the scope and role defined in it. Mirantis recommends using asymmetric cryptography for token signing (RS256) to minimize the dependency between IAM and managed components.

When Container Cloud calls Mirantis Kubernetes Engine (MKE), the user in Keycloak is created automatically with a JWT issued by Keycloak on behalf of the end user. MKE, in its turn, verifies whether the JWT is issued by Keycloak. If the user retrieved from the token does not exist in the MKE database, the user is automatically created in the MKE database based on the information from the token.

The authorization implementation is out of the scope of IAM in Container Cloud. This functionality is delegated to the component level: IAM provides the OIDC token, and each Container Cloud component processes the token content and enforces the required authorization itself. Such an approach enables any underlying authorization mechanism that is not dependent on IAM while still providing a unified user experience across all Container Cloud components.

Kubernetes CLI authentication flow

The following diagram illustrates the Kubernetes CLI authentication flow. The authentication flow for Helm and other Kubernetes-oriented CLI utilities is identical to the Kubernetes CLI flow, but JSON Web Tokens (JWT) must be pre-provisioned.

_images/iam-authn-k8s.png
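
As a generic illustration of the pre-provisioned token approach (not a Container Cloud-specific procedure), a JWT obtained from Keycloak can be stored in a kubeconfig user entry so that kubectl and Helm send it with every request. The user and context names below are hypothetical:

  # Store the pre-provisioned JWT in a kubeconfig user entry
  kubectl config set-credentials example-user --token=<JWT>

  # Point a context at the target cluster and switch to it
  kubectl config set-context example-context --cluster=<cluster-name> --user=example-user
  kubectl config use-context example-context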

See also

IAM resources

Monitoring

Mirantis Container Cloud uses StackLight, the logging, monitoring, and alerting solution that provides a single pane of glass for cloud maintenance and day-to-day operations. StackLight also offers critical insights into cloud health, including operational information about the components deployed in management and managed clusters.

StackLight is based on Prometheus, an open-source monitoring solution and a time series database.

Deployment architecture

Mirantis Container Cloud deploys the StackLight stack as a release of a Helm chart that contains the helm-controller and helmbundles.lcm.mirantis.com (HelmBundle) custom resources. The StackLight HelmBundle consists of a set of Helm charts with the StackLight components that include:

StackLight components overview

StackLight component

Description

Alerta

Receives, consolidates, and deduplicates the alerts sent by Alertmanager and visually represents them through a simple web UI. Using the Alerta web UI, you can view the most recent or watched alerts, and group and filter them.

Alertmanager

Handles the alerts sent by client applications such as Prometheus, deduplicates, groups, and routes alerts to receiver integrations. Using the Alertmanager web UI, you can view the most recent fired alerts, silence them, or view the Alertmanager configuration.

Elasticsearch Curator

Maintains the data (indexes) in OpenSearch by performing such operations as creating, closing, or opening an index as well as deleting a snapshot. Also, manages the data retention policy in OpenSearch.

Elasticsearch Exporter Compatible with OpenSearch

The Prometheus exporter that gathers internal OpenSearch metrics.

Grafana

Builds and visually represents metric graphs based on time series databases. Grafana supports querying of Prometheus using the PromQL language.

Database backends

StackLight uses PostgreSQL for Alerta and Grafana. PostgreSQL reduces the data storage fragmentation while enabling high availability. High availability is achieved using Patroni, the PostgreSQL cluster manager that monitors for node failures and manages failover of the primary node. StackLight also uses Patroni to manage major version upgrades of PostgreSQL clusters, which allows leveraging the database engine functionality and improvements as they are introduced upstream in new releases, maintaining functional continuity without version lock-in.

Logging stack

Responsible for collecting, processing, and persisting logs and Kubernetes events. By default, when deploying through the Container Cloud web UI, only the metrics stack is enabled on managed clusters. To enable StackLight to gather managed cluster logs, enable the logging stack during deployment. On management clusters, the logging stack is enabled by default. The logging stack components include:

  • OpenSearch, which stores logs and notifications.

  • Fluentd-logs, which collects logs, sends them to OpenSearch, generates metrics based on analysis of incoming log entries, and exposes these metrics to Prometheus.

  • OpenSearch Dashboards, which provides real-time visualization of the data stored in OpenSearch and enables you to detect issues.

  • Metricbeat, which collects Kubernetes events and sends them to OpenSearch for storage.

  • Prometheus-es-exporter, which presents the OpenSearch data as Prometheus metrics by periodically sending configured queries to the OpenSearch cluster and exposing the results to a scrapable HTTP endpoint like other Prometheus targets.

Note

The logging mechanism performance depends on the cluster log load. In case of a high load, you may need to increase the default resource requests and limits for fluentdLogs. For details, see StackLight configuration parameters: Resource limits.
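
As an illustration only, such an increase is typically expressed through the StackLight Helm chart values. The keys below are hypothetical and may differ between product versions; follow the referenced StackLight configuration parameters document for the authoritative syntax:

  resources:
    fluentdLogs:
      requests:
        cpu: "500m"
        memory: "1Gi"
      limits:
        cpu: "1000m"
        memory: "2Gi"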

Metric collector

Collects telemetry data (CPU or memory usage, number of active alerts, and so on) from Prometheus and sends the data to centralized cloud storage for further processing and analysis. Metric collector runs on the management cluster.

Note

This component is designated for internal StackLight use only.

Prometheus

Gathers metrics. Automatically discovers and monitors the endpoints. Using the Prometheus web UI, you can view simple visualizations and debug. By default, the Prometheus database stores metrics of the past 15 days or up to 15 GB of data depending on the limit that is reached first.
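
These defaults conceptually map to the standard upstream Prometheus retention options, shown below only to clarify what the limits mean; this is not a Container Cloud configuration method:

  # Upstream Prometheus server flags expressing the same retention limits
  --storage.tsdb.retention.time=15d
  --storage.tsdb.retention.size=15GB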

Prometheus Blackbox Exporter

Allows monitoring endpoints over HTTP, HTTPS, DNS, TCP, and ICMP.

Prometheus-es-exporter

Presents the OpenSearch data as Prometheus metrics by periodically sending configured queries to the OpenSearch cluster and exposing the results to a scrapable HTTP endpoint like other Prometheus targets.

Prometheus Node Exporter

Gathers hardware and operating system metrics exposed by kernel.

Prometheus Relay

Adds a proxy layer to Prometheus to merge the results from underlay Prometheus servers to prevent gaps in case some data is missing on some servers. Is available only in the HA StackLight mode.

Salesforce notifier

Enables sending Alertmanager notifications to Salesforce to allow creating Salesforce cases and closing them once the alerts are resolved. Disabled by default.

Salesforce reporter

Queries Prometheus for the data about the amount of vCPU, vRAM, and vStorage used and available, combines the data, and sends it to Salesforce daily. Mirantis uses the collected data for further analysis and reports to improve the quality of customer support. Disabled by default.

Telegraf

Collects metrics from the system. Telegraf is plugin-driven and has the concept of two distinct sets of plugins: input plugins collect metrics from the system, services, or third-party APIs; output plugins write and expose metrics to various destinations.

The Telegraf agents used in Container Cloud include:

  • telegraf-ds-smart monitors SMART disks, and runs on both management and managed clusters.

  • telegraf-ironic monitors Ironic on the baremetal-based management clusters. The ironic input plugin collects and processes data from Ironic HTTP API, while the http_response input plugin checks Ironic HTTP API availability. As an output plugin, to expose collected data as Prometheus target, Telegraf uses prometheus.

  • telegraf-docker-swarm gathers metrics from the Mirantis Container Runtime API about the Docker nodes, networks, and Swarm services. This is a Docker Telegraf input plugin with downstream additions.

Telemeter

Enables a multi-cluster view through a Grafana dashboard of the management cluster. Telemeter includes a Prometheus federation push server and clients to enable isolated Prometheus instances, which cannot be scraped from a central Prometheus instance, to push metrics to the central location.

The Telemeter services are distributed between the management cluster that hosts the Telemeter server and managed clusters that host the Telemeter client. The metrics from managed clusters are aggregated on management clusters.

Note

This component is designated for internal StackLight use only.

Every Helm chart contains a default values.yml file. These default values are partially overridden by custom values defined in the StackLight Helm chart.

Before deploying a managed cluster, you can select the HA or non-HA StackLight architecture type. The non-HA mode is set by default on managed clusters. On management clusters, StackLight is deployed in the HA mode only. The following table lists the differences between the HA and non-HA modes:

StackLight database modes

Non-HA StackLight mode default

HA StackLight mode

  • One Prometheus instance

  • One Alertmanager instance Since 2.24.0 and 2.24.2 for MOSK 23.2

  • One OpenSearch instance

  • One PostgreSQL instance

  • One iam-proxy instance

One persistent volume is provided for storing data. In case of a service or node failure, a new pod is redeployed and the volume is reattached to provide the existing data. Such a setup has a reduced hardware footprint but provides lower performance.

  • Two Prometheus instances

  • Two Alertmanager instances

  • Three OpenSearch instances

  • Three PostgreSQL instances

  • Two iam-proxy instances Since 2.23.0 and 2.23.1 for MOSK 23.1

Local Volume Provisioner is used to provide local host storage. In case of a service or node failure, the traffic is automatically redirected to any other running Prometheus or OpenSearch server. For better performance, Mirantis recommends that you deploy StackLight in the HA mode. Two iam-proxy instances ensure access to HA components if one iam-proxy node fails.

Note

Before Container Cloud 2.24.0, Alertmanager had 2 replicas in the non-HA mode.

Caution

Non-HA StackLight requires a backend storage provider, for example, a Ceph cluster. For details, see Storage.

Depending on the Container Cloud cluster type and selected StackLight database mode, StackLight is deployed on the following number of nodes:

StackLight database modes

Cluster

StackLight database mode

Target nodes

Management

HA mode

All Kubernetes master nodes

Managed

Non-HA mode

  • All nodes with the stacklight label.

  • If no nodes have the stacklight label, StackLight is spread across all worker nodes. The minimal requirement is at least 1 worker node.

HA mode

All nodes with the stacklight label. The minimal requirement is 3 nodes with the stacklight label. Otherwise, StackLight deployment does not start.

Authentication flow

StackLight provides five web UIs including Prometheus, Alertmanager, Alerta, OpenSearch Dashboards, and Grafana. Access to StackLight web UIs is protected by Keycloak-based Identity and access management (IAM). All web UIs except Alerta are exposed to IAM through the IAM proxy middleware. The Alerta configuration provides direct integration with IAM.

The following diagram illustrates accessing the IAM-proxied StackLight web UIs, for example, Prometheus web UI:

_images/sl-auth-iam-proxied.png

Authentication flow for the IAM-proxied StackLight web UIs:

  1. A user enters the public IP of a StackLight web UI, for example, Prometheus web UI.

  2. The public IP leads to IAM proxy, deployed as a Kubernetes LoadBalancer, which protects the Prometheus web UI.

  3. LoadBalancer routes the HTTP request to Kubernetes internal IAM proxy service endpoints, specified in the X-Forwarded-Proto or X-Forwarded-Host headers.

  4. The Keycloak login form opens (the login_url field in the IAM proxy configuration, which points to Keycloak realm) and the user enters the user name and password.

  5. Keycloak validates the user name and password.

  6. The user obtains access to the Prometheus web UI (the upstreams field in the IAM proxy configuration).

Note

  • The discovery URL is the URL of the IAM service.

  • The upstream URL is the hidden endpoint of a web UI (Prometheus web UI in the example above).
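
A minimal sketch of how these fields relate in an IAM proxy configuration; the field names only approximate the terms used above, and the actual configuration is generated and managed by StackLight:

  discovery_url: https://<keycloak-address>/auth/realms/iam                # URL of the IAM service
  login_url: https://<keycloak-address>/auth/realms/iam/<login-endpoint>   # opens the Keycloak login form
  upstreams:
  - <hidden-web-ui-endpoint>                                               # for example, the internal Prometheus service endpoint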

The following diagram illustrates accessing the Alerta web UI:

_images/sl-authentication-direct.png

Authentication flow for the Alerta web UI:

  1. A user enters the public IP of the Alerta web UI.

  2. The public IP leads to Alerta deployed as a Kubernetes LoadBalancer type.

  3. LoadBalancer routes the HTTP request to the Kubernetes internal Alerta service endpoint.

  4. The Keycloak login form opens (Alerta refers to the IAM realm) and the user enters the user name and password.

  5. Keycloak validates the user name and password.

  6. The user obtains access to the Alerta web UI.

Supported features

Using the Mirantis Container Cloud web UI, at the pre-deployment stage of a managed cluster, you can view, enable or disable, or tune the following StackLight features:

  • StackLight HA mode.

  • Database retention size and time for Prometheus.

  • Tunable index retention period for OpenSearch.

  • Tunable PersistentVolumeClaim (PVC) size for Prometheus and OpenSearch set to 16 GB for Prometheus and 30 GB for OpenSearch by default. The PVC size must be logically aligned with the retention periods or sizes for these components.

  • Email and Slack receivers for the Alertmanager notifications.

  • Predefined set of dashboards.

  • Predefined set of alerts and capability to add new custom alerts for Prometheus in the following exemplary format:

    - alert: HighErrorRate
      expr: job:request_latency_seconds:mean5m{job="myjob"} > 0.5
      for: 10m
      labels:
        severity: page
      annotations:
        summary: High request latency
    
Monitored components

StackLight measures, analyzes, and promptly reports failures that may occur in the following Mirantis Container Cloud components and their sub-components, if any:

  • Ceph

  • Ironic

  • Kubernetes services:

    • Calico

    • etcd

    • Kubernetes cluster

    • Kubernetes containers

    • Kubernetes deployments

    • Kubernetes nodes

  • NGINX

  • Node hardware and operating system

  • PostgreSQL

  • StackLight:

    • Alertmanager

    • OpenSearch

    • Grafana

    • Prometheus

    • Prometheus Relay

    • Salesforce notifier

    • Telemeter

  • SSL certificates

  • Mirantis Kubernetes Engine (MKE)

    • Docker/Swarm metrics (through Telegraf)

    • Built-in MKE metrics

Storage-based log retention strategy

Available since 2.26.0 (17.1.0 and 16.1.0)

StackLight uses a storage-based log retention strategy that optimizes storage utilization and ensures effective data retention. The usable proportion of disk space is defined as 80% of the disk space allocated to the OpenSearch node and is split across the following data types:

  • 80% for system logs

  • 10% for audit logs

  • 5% for OpenStack notifications (applies only to MOSK clusters)

  • 5% for Kubernetes events

This approach ensures that storage resources are efficiently allocated based on the importance and volume of different data types.
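
For example, on a node with 1000 GB of disk space allocated to OpenSearch, roughly 800 GB is considered usable for retention, of which approximately 640 GB serves system logs, 80 GB audit logs, 40 GB OpenStack notifications, and 40 GB Kubernetes events. These figures are illustrative and derived only from the percentages above.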

The logging index management implies the following advantages:

  • Storage-based rollover mechanism

    The rollover mechanism for system and audit indices enforces shard size based on available storage, ensuring optimal resource utilization.

  • Consistent shard allocation

    The number of primary shards per index is dynamically set based on cluster size, which boosts search and facilitates ingestion for large clusters.

  • Minimal size of cluster state

    The logging-related part of the cluster state is minimal and uses static mappings, which are based on the Elastic Common Schema (ECS) with slight deviations from the standard. Dynamic mapping in index templates is avoided to reduce overhead.

  • Storage compression

    The system and audit indices utilize the best_compression codec that minimizes the size of stored indices, resulting in significant storage savings of up to 50% on average.

  • No filter by logging level

    Because severity levels are used unevenly across Container Cloud components, logs of all severity levels are collected so that important low-severity logs are not missed while debugging a cluster. Filtering by tags is still available.

Outbound cluster metrics

The data is collected and transmitted back to Mirantis through an encrypted channel. It provides the Mirantis Customer Success Organization with information to better understand the operational usage patterns that our customers experience, as well as product usage statistics that enable our product teams to enhance our products and services for our customers.

Mirantis collects the following statistics using configuration-collector:

Mirantis collects hardware information using the following metrics:

  • mcc_hw_machine_chassis

  • mcc_hw_machine_cpu_model

  • mcc_hw_machine_cpu_number

  • mcc_hw_machine_nics

  • mcc_hw_machine_ram

  • mcc_hw_machine_storage (storage devices and disk layout)

  • mcc_hw_machine_vendor

Mirantis collects the summary of all deployed Container Cloud configurations using the following objects, if any:

Note

The data is anonymized from all sensitive information, such as IDs, IP addresses, passwords, private keys, and so on.

  • Cluster

  • Machine

  • MCCUpgrade

  • BareMetalHost

  • BareMetalHostProfile

  • IPAMHost

  • IPAddr

  • KaaSCephCluster

  • L2Template

  • Subnet

Note

In the Cluster releases 17.0.0, 16.0.0, and 14.1.0, Mirantis does not collect any configuration summary in light of the configuration-collector refactoring.

The node-level resource data are broken down into three broad categories: Cluster, Node, and Namespace. The telemetry data tracks Allocatable, Capacity, Limits, Requests, and actual Usage of node-level resources.

Terms explanation

Term

Definition

Allocatable

On a Kubernetes Node, the amount of compute resources that are available for pods

Capacity

The total number of available resources regardless of current consumption

Limits

Constraints imposed by Administrators

Requests

The resources that a given container application is requesting

Usage

The actual usage or consumption of a given resource

The full list of the outbound data includes:

From management clusters
  • hostos_module_usage Since 2.28.0 (17.3.0, 16.3.0)

From Mirantis OpenStack for Kubernetes (MOSK) clusters
  • cluster_alerts_firing Since MOSK 23.1

  • cluster_filesystem_size_bytes

  • cluster_filesystem_usage_bytes

  • cluster_filesystem_usage_ratio

  • cluster_master_nodes_total

  • cluster_nodes_total

  • cluster_persistentvolumeclaim_requests_storage_bytes

  • cluster_total_alerts_triggered

  • cluster_capacity_cpu_cores

  • cluster_capacity_memory_bytes

  • cluster_usage_cpu_cores

  • cluster_usage_memory_bytes

  • cluster_usage_per_capacity_cpu_ratio

  • cluster_usage_per_capacity_memory_ratio

  • cluster_worker_nodes_total

  • cluster_workload_pods_total Since MOSK 23.1

  • cluster_workload_containers_total Since MOSK 23.1

  • kaas_info

  • kaas_cluster_machines_ready_total

  • kaas_cluster_machines_requested_total

  • kaas_clusters

  • kaas_cluster_updating Since MOSK 22.5

  • kaas_license_expiry

  • kaas_machines_ready

  • kaas_machines_requested

  • kubernetes_api_availability

  • mcc_cluster_update_plan_status Since MOSK 24.3 as TechPreview

  • mke_api_availability

  • mke_cluster_nodes_total

  • mke_cluster_containers_total

  • mke_cluster_vcpu_free

  • mke_cluster_vcpu_used

  • mke_cluster_vram_free

  • mke_cluster_vram_used

  • mke_cluster_vstorage_free

  • mke_cluster_vstorage_used

  • node_labels Since MOSK 23.2

  • openstack_cinder_api_latency_90

  • openstack_cinder_api_latency_99

  • openstack_cinder_api_status Removed in MOSK 24.1

  • openstack_cinder_availability

  • openstack_cinder_volumes_total

  • openstack_glance_api_status

  • openstack_glance_availability

  • openstack_glance_images_total

  • openstack_glance_snapshots_total Removed in MOSK 24.1

  • openstack_heat_availability

  • openstack_heat_stacks_total

  • openstack_host_aggregate_instances Removed in MOSK 23.2

  • openstack_host_aggregate_memory_used_ratio Removed in MOSK 23.2

  • openstack_host_aggregate_memory_utilisation_ratio Removed in MOSK 23.2

  • openstack_host_aggregate_cpu_utilisation_ratio Removed in MOSK 23.2

  • openstack_host_aggregate_vcpu_used_ratio Removed in MOSK 23.2

  • openstack_instance_availability

  • openstack_instance_create_end

  • openstack_instance_create_error

  • openstack_instance_create_start

  • openstack_keystone_api_latency_90

  • openstack_keystone_api_latency_99

  • openstack_keystone_api_status Removed in MOSK 24.1

  • openstack_keystone_availability

  • openstack_keystone_tenants_total

  • openstack_keystone_users_total

  • openstack_kpi_provisioning

  • openstack_lbaas_availability

  • openstack_mysql_flow_control

  • openstack_neutron_api_latency_90

  • openstack_neutron_api_latency_99

  • openstack_neutron_api_status Removed in MOSK 24.1

  • openstack_neutron_availability

  • openstack_neutron_lbaas_loadbalancers_total

  • openstack_neutron_networks_total

  • openstack_neutron_ports_total

  • openstack_neutron_routers_total

  • openstack_neutron_subnets_total

  • openstack_nova_all_compute_cpu_utilisation

  • openstack_nova_all_compute_mem_utilisation

  • openstack_nova_all_computes_total

  • openstack_nova_all_vcpus_total

  • openstack_nova_all_used_vcpus_total

  • openstack_nova_all_ram_total_gb

  • openstack_nova_all_used_ram_total_gb

  • openstack_nova_all_disk_total_gb

  • openstack_nova_all_used_disk_total_gb

  • openstack_nova_api_status Removed in MOSK 24.1

  • openstack_nova_availability

  • openstack_nova_compute_cpu_utilisation

  • openstack_nova_compute_mem_utilisation

  • openstack_nova_computes_total

  • openstack_nova_disk_total_gb

  • openstack_nova_instances_active_total

  • openstack_nova_ram_total_gb

  • openstack_nova_used_disk_total_gb

  • openstack_nova_used_ram_total_gb

  • openstack_nova_used_vcpus_total

  • openstack_nova_vcpus_total

  • openstack_public_api_status Since MOSK 22.5

  • openstack_quota_instances

  • openstack_quota_ram_gb

  • openstack_quota_vcpus

  • openstack_quota_volume_storage_gb

  • openstack_rmq_message_deriv

  • openstack_usage_instances

  • openstack_usage_ram_gb

  • openstack_usage_vcpus

  • openstack_usage_volume_storage_gb

  • osdpl_aodh_alarms Since MOSK 23.3

  • osdpl_api_success Since MOSK 24.1

  • osdpl_cinder_zone_volumes Since MOSK 23.3

  • osdpl_ironic_nodes Since MOSK 25.1

  • osdpl_manila_shares Since MOSK 24.2

  • osdpl_masakari_hosts Since MOSK 24.2

  • osdpl_neutron_availability_zone_info Since MOSK 23.3

  • osdpl_neutron_zone_routers Since MOSK 23.3

  • osdpl_nova_aggregate_hosts Since MOSK 23.3

  • osdpl_nova_audit_orphaned_allocations Since MOSK 24.3

  • osdpl_nova_availability_zone_info Since MOSK 23.3

  • osdpl_nova_availability_zone_instances Since MOSK 23.3

  • osdpl_nova_availability_zone_hosts Since MOSK 23.3

  • osdpl_version_info Since MOSK 23.3

  • tf_operator_info Since MOSK 23.3 for Tungsten Fabric

StackLight proxy

StackLight components, which require external access, automatically use the same proxy that is configured for Mirantis Container Cloud clusters. Therefore, you only need to configure proxy during deployment of your management or managed clusters. No additional actions are required to set up proxy for StackLight. For more details about implementation of proxy support in Container Cloud, see Proxy and cache support.

Note

Proxy handles only the HTTP and HTTPS traffic. Therefore, for clusters with limited or no Internet access, it is not possible to set up Alertmanager email notifications, which use SMTP, when proxy is used.

Proxy is used for the following StackLight components:

Component

Cluster type

Usage

Alertmanager

Any

As a default http_config for all HTTP-based receivers except the predefined HTTP-alerta and HTTP-salesforce. For these receivers, http_config is overridden on the receiver level.

Metric Collector

Management

To send outbound cluster metrics to Mirantis.

Salesforce notifier

Any

To send notifications to the Salesforce instance.

Salesforce reporter

Any

To send metric reports to the Salesforce instance.

Requirements

Using Mirantis Container Cloud, you can deploy a Mirantis Kubernetes Engine (MKE) cluster on bare metal, which requires the corresponding resources described below.

If you use a firewall or proxy, make sure that the bootstrap and management clusters have access to the following IP ranges and domain names required for the Container Cloud content delivery network and alerting:

  • IP ranges:

  • Domain names:

    • mirror.mirantis.com and repos.mirantis.com for packages

    • binary.mirantis.com for binaries and Helm charts

    • mirantis.azurecr.io and *.blob.core.windows.net for Docker images

    • mcc-metrics-prod-ns.servicebus.windows.net:9093 for Telemetry (port 9093 if proxy is disabled, or port 443 if proxy is enabled)

    • mirantis.my.salesforce.com and login.salesforce.com for Salesforce alerts

Note

  • Access to Salesforce is required from any Container Cloud cluster type.

  • If any additional Alertmanager notification receiver is enabled, for example, Slack, its endpoint must also be accessible from the cluster.

Caution

Regional clusters are unsupported since Container Cloud 2.25.0. Mirantis does not perform functional integration testing of the feature and the related code is removed in Container Cloud 2.26.0. If you still require this feature, contact Mirantis support for further information.

Reference hardware configuration

The following hardware configuration is used as a reference to deploy Mirantis Container Cloud with bare metal Container Cloud clusters with Mirantis Kubernetes Engine.

Reference hardware configuration for Container Cloud management and managed clusters on bare metal

Server role

Management cluster

Managed cluster

# of servers

3 1

6 2

CPU cores

Minimal: 16
Recommended: 32
Minimal: 16
Recommended: depends on workload

RAM, GB

Minimal: 64
Recommended: 256
Minimal: 64
Recommended: 128

System disk, GB 3

Minimal: SSD 1x 120
Recommended: NVME 1 x 960
Minimal: SSD 1 x 120
Recommended: NVME 1 x 960

SSD/HDD storage, GB

1 x 1900 4

2 x 1900

NICs 5

Minimal: 1 x 2-port
Recommended: 2 x 2-port
Minimal: 2 x 2-port
Recommended: depends on workload
1

Adding more than 3 nodes to a management cluster is not supported.

2

Three manager nodes for HA and three worker storage nodes for a minimal Ceph cluster.

3

A management cluster requires 2 volumes for Container Cloud (total 50 GB) and 5 volumes for StackLight (total 60 GB). A managed cluster requires 5 volumes for StackLight.

4

In total, at least 2 disks are required:

  • disk0 - minimum 120 GB for system

  • disk1 - minimum 120 GB for LocalVolumeProvisioner

For the default storage schema, see Default configuration of the host system storage

5

Only one PXE port per node is allowed. The out-of-band management (IPMI) port is not included.

System requirements for the seed node

The seed node is necessary only to deploy the management cluster. When the bootstrap is complete, the bootstrap node can be redeployed and its resources can be reused for the managed cluster workloads.

The minimum reference system requirements for a baremetal-based bootstrap seed node are as follows:

  • Basic server on Ubuntu 22.04 with the following configuration:

    • Kernel version 4.15.0-76.86 or later

    • 8 GB of RAM

    • 4 CPU

    • 10 GB of free disk space for the bootstrap cluster cache

  • No DHCP or TFTP servers on any NIC networks

  • Routable access IPMI network for the hardware servers. For more details, see Host networking.

  • Internet access for downloading of all required artifacts

Network fabric

The following diagram illustrates the physical and virtual L2 underlay networking schema for the final state of the Mirantis Container Cloud bare metal deployment.

_images/bm-cluster-physical-and-l2-networking.png

The network fabric reference configuration is a spine/leaf with 2 leaf ToR switches and one out-of-band (OOB) switch per rack.

Reference configuration uses the following switches for ToR and OOB:

  • Cisco WS-C3560E-24TD has 24 x 1 GbE ports. Used in the OOB network segment.

  • Dell Force 10 S4810P has 48 x 1/10 GbE ports. Used as ToR in the Common/PXE network segment.

In the reference configuration, all odd interfaces from NIC0 are connected to TOR Switch 1, and all even interfaces from NIC0 are connected to TOR Switch 2. The Baseboard Management Controller (BMC) interfaces of the servers are connected to OOB Switch 1.

The following recommendations apply to all types of nodes:

  • Use the Link Aggregation Control Protocol (LACP) bonding mode with MC-LAG domains configured on leaf switches. This corresponds to the 802.3ad bond mode on hosts (see the illustrative bond configuration after this list).

  • Use ports from different multi-port NICs when creating bonds. This makes network connections redundant if failure of a single NIC occurs.

  • Configure the ports that connect servers to the PXE network with PXE VLAN as native or untagged. On these ports, configure LACP fallback to ensure that the servers can reach DHCP server and boot over network.
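
The following netplan snippet is a generic illustration of the 802.3ad (LACP) bond mode recommended above on an Ubuntu host. The interface names are hypothetical, and this snippet only illustrates the bond mode; it is not the Container Cloud host configuration method:

  network:
    version: 2
    ethernets:
      enp1s0f0: {}
      enp2s0f0: {}
    bonds:
      bond0:
        interfaces: [enp1s0f0, enp2s0f0]   # ports from different multi-port NICs
        parameters:
          mode: 802.3ad                    # LACP, matching MC-LAG on the leaf switches
          lacp-rate: fast
          transmit-hash-policy: layer3+4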

DHCP range requirements for PXE

When setting up the network range for DHCP Preboot Execution Environment (PXE), keep in mind several considerations to ensure smooth server provisioning:

  • Determine the network size. For instance, if you target a concurrent provisioning of 50+ servers, a /24 network is recommended. This specific size is crucial as it provides sufficient address space for the DHCP server to assign a unique IP address to each new Media Access Control (MAC) address, thereby minimizing the risk of collision.

    The concept of collision refers to the likelihood of two or more devices being assigned the same IP address. With a /24 network, the collision probability using the SDBM hash function, which is used by the DHCP server, is low. If a collision occurs, the DHCP server provides a free address using a linear lookup strategy.

  • In the context of PXE provisioning, technically, the IP address does not need to be consistent for every new DHCP request associated with the same MAC address. However, maintaining the same IP address can enhance user experience, making the /24 network size more of a recommendation than an absolute requirement.

  • For a minimal network size, it is sufficient to cover the number of concurrently provisioned servers plus one additional address (50 + 1). This calculation applies after covering any exclusions that exist in the range. You can define excludes in the corresponding field of the Subnet object. For details, see API Reference: Subnet resource.

  • When the available address space is less than the minimum described above, you cannot automatically provision all servers. However, you can provision them manually by combining manual IP assignment for each bare metal host with manual pauses. For these operations, use the host.dnsmasqs.metal3.io/address and baremetalhost.metal3.io/detached annotations in the BareMetalHostInventory object, as shown in the sketch after this list. For details, see Operations Guide: Manually allocate IP addresses for bare metal hosts.

  • All addresses within the specified range must remain unused before provisioning. If an IP address in-use is issued by the DHCP server to a BOOTP client, that specific client cannot complete provisioning.
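
A minimal sketch of the manual allocation mentioned above, using the two annotations named in this section; the object name and IP address are hypothetical, and the referenced Operations Guide section remains the authoritative procedure:

  # Fragment of a BareMetalHostInventory object
  metadata:
    name: example-host
    annotations:
      host.dnsmasqs.metal3.io/address: "10.0.0.101"   # manually assigned IP address
      baremetalhost.metal3.io/detached: "true"        # pause automatic provisioning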

Management cluster storage

The management cluster requires a minimum of two storage devices per node. Each device is used for a different type of storage.

  • The first device is always used for boot partitions and the root file system. SSD is recommended. RAID device is not supported.

  • One storage device per server is reserved for local persistent volumes. These volumes are served by the Local Storage Static Provisioner (local-volume-provisioner) and used by many services of Container Cloud.

You can configure host storage devices using the BareMetalHostProfile resources. For details, see Customize the default bare metal host profile.

Proxy and cache support

Proxy support

If you require all Internet access to go through a proxy server for security and audit purposes, you can bootstrap management clusters using proxy. The proxy server settings consist of three standard environment variables that are set prior to the bootstrap process:

  • HTTP_PROXY

  • HTTPS_PROXY

  • NO_PROXY

These settings are not propagated to managed clusters. However, you can enable a separate proxy access on a managed cluster using the Container Cloud web UI. This proxy is intended for the end user needs and is not used for a managed cluster deployment or for access to the Mirantis resources.

Caution

Since Container Cloud uses the OpenID Connect (OIDC) protocol for IAM authentication, management clusters require a direct non-proxy access from managed clusters.

StackLight components, which require external access, automatically use the same proxy that is configured for Container Cloud clusters.

On the managed clusters with limited Internet access, a proxy is required for StackLight components that use HTTP and HTTPS and are disabled by default but need external access if enabled, for example, for the Salesforce integration and Alertmanager notifications external rules. For more details about proxy implementation in StackLight, see StackLight proxy.

For the list of Mirantis resources and IP addresses to be accessible from the Container Cloud clusters, see Requirements.

After enabling proxy support on managed clusters, proxy is used for:

  • Docker traffic on managed clusters

  • StackLight

  • OpenStack on MOSK-based clusters

Warning

Any modification to the Proxy object used in any cluster, for example, changing the proxy URL, NO_PROXY values, or certificate, leads to cordon-drain and Docker restart on the cluster machines.

Artifacts caching

The Container Cloud managed clusters are deployed without direct Internet access in order to consume less Internet traffic in your cloud. The Mirantis artifacts used during managed clusters deployment are downloaded through a cache running on a management cluster. The feature is enabled by default on new managed clusters and will be automatically enabled on existing clusters during upgrade to the latest version.

Caution

IAM operations require a direct non-proxy access of a managed cluster to a management cluster.

MKE API limitations

To ensure Mirantis Container Cloud stability in managing the Container Cloud-based Mirantis Kubernetes Engine (MKE) clusters, the following MKE API functionality is not available for Container Cloud-based MKE clusters as compared to MKE clusters that are not deployed by Container Cloud. Use the Container Cloud web UI or CLI for this functionality instead.

Public APIs limitations in a Container Cloud-based MKE cluster

API endpoint

Limitation

GET /swarm

Swarm Join Tokens are filtered out for all users, including admins.

PUT /api/ucp/config-toml

All requests are forbidden.

POST /nodes/{id}/update

Requests for the following changes are forbidden:

  • Change Role

  • Add or remove the com.docker.ucp.orchestrator.swarm and com.docker.ucp.orchestrator.kubernetes labels.

DELETE /nodes/{id}

All requests are forbidden.

MKE configuration management

This section describes configuration specifics of an MKE cluster deployed using Container Cloud.

MKE configuration managed by Container Cloud

Since 2.25.1 (Cluster releases 16.0.1 and 17.0.1), Container Cloud does not override changes in MKE configuration except the following list of parameters that are automatically managed by Container Cloud. These parameters are always overridden by the Container Cloud default values if modified directly using the MKE API. For details on configuration using the MKE API, see MKE configuration managed directly by the MKE API.

However, you can manually configure a few options from this list using the Cluster object of a Container Cloud cluster. They are labeled with the superscript and contain references to the respective configuration procedures in the Comments columns of the tables.

[audit_log_configuration]

MKE parameter name

Default value in Container Cloud

Comments

level

"metadata" 0
"" 1

You can configure this option either using MKE API with no Container Cloud overrides or using the Cluster object of a Container Cloud cluster. For details, see Configure Kubernetes auditing and profiling and MKE documentation: MKE audit logging.

If configured using the Cluster object, use the same object to disable the option. Otherwise, it will be overridden by Container Cloud.

support_bundle_include_audit_logs

false

For configuration procedure, see comments above.

0

For management clusters since 2.26.0 (Cluster release 16.1.0)

1

For management and managed clusters since 2.24.3 (Cluster releases 15.0.2 and 14.0.2)

[auth]

MKE parameter name

Default value in Container Cloud

default_new_user_role

"restrictedcontrol"

backend

"managed"

samlEnabled

false

managedPasswordDisabled

false

[auth.external_identity_provider]

MKE parameter name

Default value in Container Cloud

issuer

"https://<Keycloak-external-address>/auth/realms/iam"

userServiceId

"<userServiceId>"

clientId

"kaas"

wellKnownConfigUrl

"https://<Keycloak-external-address>/auth/realms/iam/.well-known/openid-configuration"

caBundle

"<caCert>"

usernameClaim

""

httpProxy

""

httpsProxy

""

[hardening_configuration]

MKE parameter name

Default value in Container Cloud

hardening_enabled

true

limit_kernel_capabilities

true

pids_limit_int

100000

pids_limit_k8s

100000

pids_limit_swarm

100000

[scheduling_configuration]

MKE parameter name

Default value in Container Cloud

enable_admin_ucp_scheduling

true

default_node_orchestrator

kubernetes

[tracking_configuration]

MKE parameter name

Default value in Container Cloud

cluster_label

"prod"

[cluster_config]

MKE parameter name

Default value in Container Cloud

Comments

calico_ip_auto_method

interface=k8s-pods

calico_mtu

"1440"

For configuration steps, see Set the MTU size for Calico.

calico_vxlan

true

calico_vxlan_mtu

"1440"

calico_vxlan_port

"4792"

cloud_provider

""

controller_port

4443

custom_kube_api_server_flags

["--event-ttl=720h"]

Applies only to MKE on the management cluster.

custom_kube_controller_manager_flags

["--leader-elect-lease-duration=120s", "--leader-elect-renew-deadline=60s"]

custom_kube_scheduler_flags

["--leader-elect-lease-duration=120s", "--leader-elect-renew-deadline=60s"]

custom_kubelet_flags

["--serialize-image-pulls=false"]

etcd_storage_quota

""

For configuration steps, see Increase storage quota for etcd.

exclude_server_identity_headers

true

ipip_mtu

"1440"

kube_api_server_auditing

true 3
false 4

For configuration steps, see Configure Kubernetes auditing and profiling.

kube_api_server_audit_log_maxage 5

30

kube_api_server_audit_log_maxbackup 5

10

kube_api_server_audit_log_maxsize 5

10

kube_api_server_profiling_enabled

false

For configuration steps, see Configure Kubernetes auditing and profiling.

kube_apiserver_port

5443

kube_protect_kernel_defaults

true

local_volume_collection_mapping

false

manager_kube_reserved_resources

"cpu=1000m,memory=2Gi,ephemeral-storage=4Gi"

metrics_retention_time

"24h"

metrics_scrape_interval

"1m"

nodeport_range

"30000-32768"

pod_cidr

"10.233.64.0/18"

You can override this value in spec::clusterNetwork::pods::cidrBlocks: of the Cluster object.

priv_attributes_allowed_for_service_accounts 2

["hostBindMounts", "hostIPC", "hostNetwork", "hostPID", "kernelCapabilities", "privileged"]

priv_attributes_service_accounts 2

["kube-system:helm-controller-sa", "kube-system:pod-garbage-collector", "stacklight:stacklight-helm-controller"]

profiling_enabled

false

prometheus_memory_limit

"4Gi"

prometheus_memory_request

"2Gi"

secure_overlay

true

service_cluster_ip_range

"10.233.0.0/18"

You can override this value in spec::clusterNetwork::services::cidrBlocks: of the Cluster object.

swarm_port

2376

swarm_strategy

"spread"

unmanaged_cni

false

vxlan_vni

10000

worker_kube_reserved_resources

"cpu=100m,memory=300Mi,ephemeral-storage=500Mi"

2(1,2)

For priv_attributes parameters, you can add custom options on top of existing parameters using the MKE API.

3

For management clusters since 2.26.0 (Cluster release 16.1.0).

4

For management and managed clusters since 2.24.3 (Cluster releases 15.0.2 and 14.0.2).

5(1,2,3)

For management and managed clusters since 2.27.0 (Cluster releases 17.2.0 and 16.2.0). For configuration steps, see Configure Kubernetes auditing and profiling.

Note

All possible values for parameters labeled with the superscript, which you can manually configure using the Cluster object, are described in MKE Operations Guide: Configuration options.

MKE configuration managed directly by the MKE API

Since 2.25.1, aside from MKE parameters described in MKE configuration managed by Container Cloud, Container Cloud does not override changes in MKE configuration that are applied directly through the MKE API. For the configuration options and procedure, see MKE documentation:

  • MKE configuration options

  • Configure an existing MKE cluster

    While using this procedure, replace the command to upload the newly edited MKE configuration file with the following one:

    curl --silent --insecure -X PUT -H "X-UCP-Allow-Restricted-API: i-solemnly-swear-i-am-up-to-no-good" -H "accept: application/toml" -H "Authorization: Bearer $AUTHTOKEN" --upload-file 'mke-config.toml' https://$MKE_HOST/api/ucp/config-toml
    

Important

Mirantis cannot guarantee the expected behavior of the functionality configured using the MKE API as long as customer-specific configuration does not undergo testing within Container Cloud. Therefore, Mirantis recommends that you test custom MKE settings configured through the MKE API on a staging environment before applying them to production.

Deployment Guide

Deploy a Container Cloud management cluster

Note

The deprecated bootstrap procedure using Bootstrap v1 was removed for the sake of Bootstrap v2 in Container Cloud 2.26.0.

Introduction

Available since 2.25.0

Mirantis Container Cloud Bootstrap v2 provides the best user experience for setting up Container Cloud. Using Bootstrap v2, you can provision and operate management clusters through the Container Cloud API using the required objects.

Basic concepts and components of Bootstrap v2 include:

  • Bootstrap cluster

    Bootstrap cluster is any kind-based Kubernetes cluster that contains a minimal set of Container Cloud bootstrap components allowing the user to prepare the configuration for management cluster deployment and start the deployment. The list of these components includes:

    • Bootstrap Controller

      Controller that is responsible for:

      1. Configuration of a bootstrap cluster with provider charts through the bootstrap Helm bundle.

      2. Configuration and deployment of a management cluster and its related objects.

    • Helm Controller

      Operator that manages Helm chart releases. It installs the Container Cloud bootstrap and provider charts configured in the bootstrap Helm bundle.

    • Public API charts

      Helm charts that contain custom resource definitions for Container Cloud resources.

    • Admission Controller

      Controller that performs mutations and validations for the Container Cloud resources including cluster and machines configuration.

    Currently, one bootstrap cluster can be used to deploy only one management cluster. Therefore, to add a new management cluster with different settings, a new bootstrap cluster must be recreated from scratch.

  • Bootstrap region

    BootstrapRegion is the first object to create in the bootstrap cluster for the Bootstrap Controller to identify and install provider components onto the bootstrap cluster. After that, the user can prepare and deploy a management cluster with its related resources.

    The bootstrap region is a starting point for the cluster deployment. The user needs to approve the BootstrapRegion object. Otherwise, the Bootstrap Controller will not be triggered for the cluster deployment.

  • Bootstrap Helm bundle

    Helm bundle that contains charts configuration for the bootstrap cluster. This object is managed by the Bootstrap Controller that updates the provider bundle in the BootstrapRegion object. The Bootstrap Controller always configures provider charts listed in the regional section of the Container Cloud release for the provider. Depending on the cluster configuration, the Bootstrap Controller may update or reconfigure this bundle even after the cluster deployment starts. For example, the Bootstrap Controller enables the provider in the bootstrap cluster only after the bootstrap region is approved for the deployment.

Overview of the deployment workflow

Management cluster deployment consists of several sequential stages. Each stage finishes when a specific condition is met or specific configuration applies to a cluster or its machines.

In case of issues at any deployment stage, you can identify the problem and fix it on the fly. Thanks to the infinite-timeout option enabled by default in Bootstrap v2, the cluster deployment does not abort on timeout and continues until all stages complete.

Infinite timeout prevents the bootstrap failure due to timeout. This option is useful in the following cases:

  • The network speed is slow for downloading artifacts

  • The infrastructure configuration does not allow fast booting

  • Inspection of a bare metal node presupposes more than two HDD/SATA disks attached to a machine

You can track the status of each stage in the bootstrapStatus section of the Cluster object that is updated by the Bootstrap Controller.
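
For example, you can watch these states from the seed node with kubectl, assuming the bootstrap cluster kubeconfig created by the bootstrap script and hypothetical cluster and namespace names; fully qualify the resource name if several cluster CRDs are registered in the bootstrap cluster:

  # Inspect the bootstrapStatus section reported by the Bootstrap Controller
  kubectl --kubeconfig <bootstrap-cluster-kubeconfig> \
    -n <project-namespace> get cluster <cluster-name> -o yaml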

The Bootstrap Controller starts deploying the cluster after you approve the BootstrapRegion configuration.

The following table describes deployment states of a management cluster that apply in the strict order.

Deployment states of a management cluster

Step

State

Description

1

ProxySettingsHandled

Verifies proxy configuration in the Cluster object. If the bootstrap cluster was created without a proxy, no actions are applied to the cluster.

2

ClusterSSHConfigured

Verifies SSH configuration for the cluster and machines.

You can provide any number of SSH public keys, which are added to cluster machines. But the Bootstrap Controller always adds the bootstrap-key SSH public key to the cluster configuration. The Bootstrap Controller uses this SSH key to manage the lcm-agent configuration on cluster machines.

The bootstrap-key SSH key is copied to a bootstrap-key-<clusterName> object containing the cluster name in its name.

3

ProviderUpdatedInBootstrap

Synchronizes the provider and settings of its components between the Cluster object and bootstrap Helm bundle. Settings provided in the cluster configuration have higher priority than the default settings of the bootstrap cluster, except CDN.

4

ProviderEnabledInBootstrap

Enables the provider and its components if any were disabled by the Bootstrap Controller during preparation of the bootstrap region. A cluster and machines deployment starts after the provider enablement.

5

Nodes readiness

Waits for the provider to complete node deployment, which comprises VM creation and MKE installation.

6

ObjectsCreated

Creates required namespaces and IAM secrets.

7

ProviderConfigured

Verifies the provider configuration in the provisioned cluster.

8

HelmBundleReady

Verifies the Helm bundle readiness for the provisioned cluster.

9

ControllersDisabledBeforePivot

Collects the list of deployment controllers and disables them to prepare for pivot.

10

PivotDone

Moves all cluster-related objects from the bootstrap cluster to the provisioned cluster. The copies of Cluster and Machine objects remain in the bootstrap cluster to provide the status information to the user. About every minute, the Bootstrap Controller reconciles the status of the Cluster and Machine objects of the provisioned cluster to the bootstrap cluster.

11

ControllersEnabledAfterPivot

Enables controllers in the provisioned cluster.

12

MachinesLCMAgentUpdated

Updates the lcm-agent configuration on machines to target LCM agents to the provisioned cluster.

13

HelmControllerDisabledBeforeConfig

Disables the Helm Controller before reconfiguration.

14

HelmControllerConfigUpdated

Updates the Helm Controller configuration for the provisioned cluster.

15

Cluster readiness

Contains information about the global cluster status. The Bootstrap Controller verifies that OIDC, Helm releases, and all Deployments are ready. Once the cluster is ready, the Bootstrap Controller stops managing the cluster.

Set up a bootstrap cluster

The setup of a bootstrap cluster comprises preparation of the seed node, configuration of environment variables, acquisition of the Container Cloud license file, and execution of the bootstrap script.

To set up a bootstrap cluster:

  1. Prepare the seed node:

    1. Verify that the hardware allocated for the installation meets the minimal requirements described in Requirements.

    2. Install basic Ubuntu 22.04 server using standard installation images of the operating system on the bare metal seed node.

    3. Log in to the seed node that is running Ubuntu 22.04.

    4. Prepare the system and network configuration:

      1. Establish a virtual bridge using an IP address of the PXE network on the seed node. Use the following netplan-based configuration file as an example:

        # cat /etc/netplan/config.yaml
        network:
          version: 2
          renderer: networkd
          ethernets:
            ens3:
                dhcp4: false
                dhcp6: false
          bridges:
              br0:
                  addresses:
                  # Replace with IP address from PXE network to create a virtual bridge
                  - 10.0.0.15/24
                  dhcp4: false
                  dhcp6: false
                  # Adjust for your environment
                  gateway4: 10.0.0.1
                  interfaces:
                  # Interface name may be different in your environment
                  - ens3
                  nameservers:
                      addresses:
                      # Adjust for your environment
                      - 8.8.8.8
                  parameters:
                      forward-delay: 4
                      stp: false
        
      2. Apply the new network configuration using netplan:

        sudo netplan apply
        
      3. Verify the new network configuration:

        sudo apt update && sudo apt install -y bridge-utils
        sudo brctl show
        

        Example of system response:

        bridge name     bridge id               STP enabled     interfaces
        br0             8000.fa163e72f146       no              ens3
        

        Verify that the interface connected to the PXE network belongs to the previously configured bridge.

      4. Install the current Docker version available for Ubuntu 22.04:

        sudo apt-get update
        sudo apt-get install docker.io
        
      5. Grant your user access to the Docker daemon by adding it to the docker group:

        sudo usermod -aG docker $USER
        
      6. Log out and log in again to the seed node to apply the changes.

      7. Verify that Docker is configured correctly and has access to Container Cloud CDN. For example:

        docker run --rm alpine sh -c "apk add --no-cache curl; \
        curl https://binary.mirantis.com"
        

        The system output must contain a json file with no error messages. In case of errors, follow the steps provided in Troubleshooting.

        Note

        If you require all Internet access to go through a proxy server for security and audit purposes, configure Docker proxy settings as described in the official Docker documentation.

        To verify that Docker is configured correctly and has access to Container Cloud CDN:

        docker run --rm alpine sh -c "export http_proxy=http://<proxy_ip:proxy_port>; \
        sed -i 's/https/http/g' /etc/apk/repositories; \
        apk add --no-cache wget; \
        wget http://binary.mirantis.com; \
        cat index.html"
        
    5. Verify that the seed node has direct access to the Baseboard Management Controller (BMC) of each bare metal host. All target hardware nodes must be in the power off state.

      For example, using the IPMI tool:

      apt install ipmitool
      ipmitool -I lanplus -H 'IPMI IP' -U 'IPMI Login' -P 'IPMI password' \
      chassis power status
      

      Example of system response:

      Chassis Power is off
      
  2. Prepare the bootstrap script:

    1. Download and run the Container Cloud bootstrap script:

      sudo apt-get update
      sudo apt-get install wget
      wget https://binary.mirantis.com/releases/get_container_cloud.sh
      chmod 0755 get_container_cloud.sh
      ./get_container_cloud.sh
      
    2. Change the directory to the kaas-bootstrap folder created by the script.

  3. Obtain a Container Cloud license file required for the bootstrap:

    Obtain a Container Cloud license
    1. Select from the following options:

      • Open the email from support@mirantis.com with the subject Mirantis Container Cloud License File or Mirantis OpenStack License File

      • In the Mirantis CloudCare Portal, open the Account or Cloud page

    2. Download the License File and save it as mirantis.lic under the kaas-bootstrap directory on the bootstrap node.

    3. Verify that mirantis.lic contains the previously downloaded Container Cloud license by decoding the license JWT token, for example, using jwt.io.

      Example of a valid decoded Container Cloud license data with the mandatory license field:

      {
          "exp": 1652304773,
          "iat": 1636669973,
          "sub": "demo",
          "license": {
              "dev": false,
              "limits": {
                  "clusters": 10,
                  "workers_per_cluster": 10
              },
              "openstack": null
          }
      }
      

    Warning

    The MKE license does not apply to mirantis.lic. For details about MKE license, see MKE documentation.

  4. Export mandatory parameters:

    Bare metal network mandatory parameters

    Export the following mandatory parameters using the commands and table below:

    export KAAS_BM_ENABLED="true"
    export KAAS_BM_PXE_IP="172.16.59.5"
    export KAAS_BM_PXE_MASK="24"
    export KAAS_BM_PXE_BRIDGE="br0"
    
    Bare metal prerequisites data

    Parameter

    Description

    Example value

    KAAS_BM_PXE_IP

    The provisioning IP address in the PXE network. This address will be assigned on the seed node to the interface defined by the KAAS_BM_PXE_BRIDGE parameter described below. The PXE service of the bootstrap cluster uses this address to network boot bare metal hosts.

    172.16.59.5

    KAAS_BM_PXE_MASK

    The PXE network address prefix length to be used with the KAAS_BM_PXE_IP address when assigning it to the seed node interface.

    24

    KAAS_BM_PXE_BRIDGE

    The PXE network bridge name that must match the name of the bridge created on the seed node during the Set up a bootstrap cluster stage.

    br0

  5. Optional. Configure proxy settings to bootstrap the cluster using proxy:

    Proxy configuration

    Add the following environment variables:

    • HTTP_PROXY

    • HTTPS_PROXY

    • NO_PROXY

    • PROXY_CA_CERTIFICATE_PATH

    Example snippet:

    export HTTP_PROXY=http://proxy.example.com:3128
    export HTTPS_PROXY=http://user:pass@proxy.example.com:3128
    export NO_PROXY=172.18.10.0,registry.internal.lan
    export PROXY_CA_CERTIFICATE_PATH="/home/ubuntu/.mitmproxy/mitmproxy-ca-cert.cer"
    

    The following formats of variables are accepted:

    Proxy configuration data

    Variable

    Format

    HTTP_PROXY
    HTTPS_PROXY
    • http://proxy.example.com:port - for anonymous access.

    • http://user:password@proxy.example.com:port - for restricted access.

    NO_PROXY

    Comma-separated list of IP addresses or domain names.

    PROXY_CA_CERTIFICATE_PATH

    Optional. Absolute path to the proxy CA certificate for man-in-the-middle (MITM) proxies. Must be placed on the bootstrap node to be trusted. For details, see Install a CA certificate for a MITM proxy on a bootstrap node.

    Warning

    If you require Internet access to go through a MITM proxy, ensure that the proxy has streaming enabled as described in Enable streaming for MITM.

    For implementation details, see Proxy and cache support.

    After the bootstrap cluster is set up, the bootstrap-proxy object is created with the provided proxy settings. You can use this object later for the Cluster object configuration.
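
    For example, once the bootstrap cluster is up, you can inspect the generated object, assuming that your kubeconfig points to the bootstrap (kind) cluster and that the object resides in the default namespace:

    ./kaas-bootstrap/bin/kubectl get proxy bootstrap-proxy -n default -o yaml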

  6. Deploy the bootstrap cluster:

    ./bootstrap.sh bootstrapv2
    
  7. Make sure that port 80 is open for localhost and is not blocked by security restrictions on the seed node:

    Note

    Kind uses port mapping for the master node.

    telnet localhost 80
    

    Example of a positive system response:

    Connected to localhost.
    

    Example of a negative system response:

    telnet: connect to address ::1: Connection refused
    telnet: Unable to connect to remote host
    

    To open port 80:

    iptables -A INPUT -p tcp --dport 80 -j ACCEPT
    
Deploy a management cluster using the Container Cloud API

This section contains an overview of the cluster-related objects along with the configuration procedure of these objects during deployment of a management cluster using Bootstrap v2 through the Container Cloud API.

Deploy a management cluster using CLI

The following procedure describes how to prepare and deploy a management cluster using Bootstrap v2 by operating YAML templates available in the kaas-bootstrap/templates/ folder.

To deploy a management cluster using CLI:

  1. Set up a bootstrap cluster.

  2. Export kubeconfig of the kind cluster:

    export KUBECONFIG=<pathToKindKubeconfig>
    

    By default, <pathToKindKubeconfig> is $HOME/.kube/kind-config-clusterapi.
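
    As a quick check that the exported kubeconfig points to the bootstrap cluster, you can list its nodes, for example:

    ./kaas-bootstrap/bin/kubectl get nodes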

  3. Configure BIOS on a bare metal host.

  4. Navigate to kaas-bootstrap/templates/bm.

    Warning

    The kubectl apply command automatically saves the applied data as plain text into the kubectl.kubernetes.io/last-applied-configuration annotation of the corresponding object. This may result in revealing sensitive data in this annotation when creating or modifying objects containing credentials. Such Container Cloud objects include:

    • BareMetalHostCredential

    • ClusterOIDCConfiguration

    • License

    • Proxy

    • ServiceUser

    • TLSConfig

    Therefore, do not use kubectl apply on these objects. Use kubectl create, kubectl patch, or kubectl edit instead.

    If you used kubectl apply on these objects, you can remove the kubectl.kubernetes.io/last-applied-configuration annotation from the objects using kubectl edit.
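
    For example, to check whether a Proxy object still carries the annotation and then remove it, you can use the following commands, where <proxyName> is the name of your object:

    ./kaas-bootstrap/bin/kubectl -n default get proxy <proxyName> \
        -o jsonpath='{.metadata.annotations.kubectl\.kubernetes\.io/last-applied-configuration}'
    ./kaas-bootstrap/bin/kubectl -n default edit proxy <proxyName>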

  5. Create the BootstrapRegion object by modifying bootstrapregion.yaml.template.

    Configuration of bootstrapregion.yaml.template
    1. Select from the following options:

      • Since Container Cloud 2.26.0 (Cluster releases 16.1.0 and 17.1.0), set provider: baremetal and use the default <regionName>, which is region-one.

      • Before Container Cloud 2.26.0, set the required <providerName> and <regionName>.

      apiVersion: kaas.mirantis.com/v1alpha1
      kind: BootstrapRegion
      metadata:
        name: <regionName>
        namespace: default
      spec:
        provider: baremetal
      
    2. Create the object:

      ./kaas-bootstrap/bin/kubectl create -f \
          kaas-bootstrap/templates/bm/bootstrapregion.yaml.template
      

    Note

    In the following steps, apply the changes to objects using the commands below with the required template name:

    ./kaas-bootstrap/bin/kubectl create -f \
        kaas-bootstrap/templates/bm/<templateName>.yaml.template
    
  6. Create the ServiceUser object by modifying serviceusers.yaml.template.

    Configuration of serviceusers.yaml.template

    Service user is the initial user to create in Keycloak for access to a newly deployed management cluster. By default, it has the global-admin, operator (namespaced), and bm-pool-operator (namespaced) roles.

    You can delete the ServiceUser object after setting up other required users with specific roles or after integrating with an external identity provider, such as LDAP.

    apiVersion: kaas.mirantis.com/v1alpha1
    kind: ServiceUserList
    items:
    - apiVersion: kaas.mirantis.com/v1alpha1
      kind: ServiceUser
      metadata:
        name: SET_USERNAME
      spec:
        password:
          value: SET_PASSWORD
    
  7. Optional. Prepare any number of additional SSH keys using the following example:

    apiVersion: kaas.mirantis.com/v1alpha1
    kind: PublicKey
    metadata:
      name: <SSHKeyName>
      namespace: default
    spec:
      publicKey: |
        <insert your public key here>
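
    If you do not have a key pair yet, you can generate one and paste the public part into the publicKey field, for example (the path and key type below are illustrative):

    ssh-keygen -t ed25519 -f ~/.ssh/mcc-bootstrap -N ""
    cat ~/.ssh/mcc-bootstrap.pub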
    
  8. Optional. Add the Proxy object using the example below:

    apiVersion: kaas.mirantis.com/v1alpha1
    kind: Proxy
    metadata:
      labels:
        kaas.mirantis.com/region: <regionName>
      name: <proxyName>
      namespace: default
    spec:
      ...
    

    The region label must match the BootstrapRegion object name.

    Note

    The kaas.mirantis.com/region label is removed from all Container Cloud objects in 2.26.0 (Cluster releases 17.1.0 and 16.1.0). Therefore, do not add this label starting from these releases. On existing clusters updated to these releases, or if the label was added manually, Container Cloud ignores it.

  9. Configure and apply the cluster configuration using cluster deployment templates:

    1. In cluster.yaml.template, set mandatory cluster labels:

      labels:
        kaas.mirantis.com/provider: baremetal
        kaas.mirantis.com/region: <regionName>
      

      Note

      The kaas.mirantis.com/region label is removed from all Container Cloud objects in 2.26.0 (Cluster releases 17.1.0 and 16.1.0). Therefore, do not add this label starting from these releases. On existing clusters updated to these releases, or if the label was added manually, Container Cloud ignores it.

    2. Configure provider settings as required:

      1. Inspect the default bare metal host profile definition in templates/bm/baremetalhostprofiles.yaml.template and adjust it to fit your hardware configuration. For details, see Customize the default bare metal host profile.

        Warning

        Any data stored on any device defined in the fileSystems list can be deleted or corrupted during cluster (re)deployment. It happens because each device from the fileSystems list is a part of the rootfs directory tree that is overwritten during (re)deployment.

        Examples of affected devices include:

        • A raw device partition with a file system on it

        • A device partition in a volume group with a logical volume that has a file system on it

        • An mdadm RAID device with a file system on it

        • An LVM RAID device with a file system on it

        The wipe field (deprecated) or wipeDevice structure (recommended since Container Cloud 2.26.0) has no effect in this case and cannot protect data on these devices.

        Therefore, to prevent data loss, move the necessary data from these file systems to another server beforehand, if required.

      2. In templates/bm/baremetalhostinventory.yaml.template, update the bare metal host definitions according to your environment configuration. Use the reference table below to manually set all parameters that start with SET_.

        Note

        Before Container Cloud 2.26.0 (Cluster releases 17.1.0 and 16.1.0), also set the name of the bootstrapRegion object from bootstrapregion.yaml.template for the kaas.mirantis.com/region label across all objects listed in templates/bm/baremetalhosts.yaml.template.

        Bare metal hosts template mandatory parameters

        Parameter

        Description

        Example value

        SET_MACHINE_0_IPMI_USERNAME

        The IPMI user name to access the BMC.

        user

        SET_MACHINE_0_IPMI_PASSWORD

        The IPMI password to access the BMC.

        password

        SET_MACHINE_0_MAC

        The MAC address of the first master node in the PXE network.

        ac:1f:6b:02:84:71

        SET_MACHINE_0_BMC_ADDRESS

        The IP address of the BMC endpoint for the first master node in the cluster. Must be an address from the OOB network that is accessible through the management network gateway.

        192.168.100.11

        SET_MACHINE_1_IPMI_USERNAME

        The IPMI user name to access the BMC.

        user

        SET_MACHINE_1_IPMI_PASSWORD

        The IPMI password to access the BMC.

        password

        SET_MACHINE_1_MAC

        The MAC address of the second master node in the PXE network.

        ac:1f:6b:02:84:72

        SET_MACHINE_1_BMC_ADDRESS

        The IP address of the BMC endpoint for the second master node in the cluster. Must be an address from the OOB network that is accessible through the management network gateway.

        192.168.100.12

        SET_MACHINE_2_IPMI_USERNAME

        The IPMI user name to access the BMC.

        user

        SET_MACHINE_2_IPMI_PASSWORD

        The IPMI password to access the BMC.

        password

        SET_MACHINE_2_MAC

        The MAC address of the third master node in the PXE network.

        ac:1f:6b:02:84:73

        SET_MACHINE_2_BMC_ADDRESS

        The IP address of the BMC endpoint for the third master node in the cluster. Must be an address from the OOB network that is accessible through the management network gateway.

        192.168.100.13

        Note

        The IPMI user name and password parameters require values in plain text.
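
        For example, you can substitute these placeholders with sed; the values below are illustrative and must match your environment:

        sed -i \
          -e 's|SET_MACHINE_0_IPMI_USERNAME|user|g' \
          -e 's|SET_MACHINE_0_IPMI_PASSWORD|password|g' \
          -e 's|SET_MACHINE_0_MAC|ac:1f:6b:02:84:71|g' \
          -e 's|SET_MACHINE_0_BMC_ADDRESS|192.168.100.11|g' \
          templates/bm/baremetalhostinventory.yaml.template

        Repeat the same substitutions for the SET_MACHINE_1_* and SET_MACHINE_2_* parameters.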

      3. Configure cluster network:

        Important

        Bootstrap V2 supports only separated PXE and LCM networks.

        • To ensure successful bootstrap, enable asymmetric routing on the interfaces of the management cluster nodes. This is required because the seed node relies on one network by default, which can potentially cause traffic asymmetry.

          In the kernelParameters section of bm/baremetalhostprofiles.yaml.template, set rp_filter to 2. This enables loose mode as defined in RFC3704.

          Example configuration of asymmetric routing
          ...
          kernelParameters:
            ...
            sysctl:
              # Enables the "Loose mode" for the "k8s-lcm" interface (management network)
              net.ipv4.conf.k8s-lcm.rp_filter: "2"
              # Enables the "Loose mode" for the "bond0" interface (PXE network)
              net.ipv4.conf.bond0.rp_filter: "2"
              ...
          

          Note

          More complicated solutions, which are not described in this manual, involve eliminating traffic asymmetry, for example:

          • Configure source routing on management cluster nodes.

          • Plug the seed node into the same networks as the management cluster nodes, which requires custom configuration of the seed node.

        • Update the network objects definition in templates/bm/ipam-objects.yaml.template according to the environment configuration. By default, this template implies the use of separate PXE and life-cycle management (LCM) networks.

        • Manually set all parameters that start with SET_.

        For configuration details of bond network interface for the PXE and management network, see Configure NIC bonding.

        Example of the default L2 template snippet for a management cluster:

        bonds:
          bond0:
            interfaces:
              - {{ nic 0 }}
              - {{ nic 1 }}
            parameters:
              mode: active-backup
              primary: {{ nic 0 }}
            dhcp4: false
            dhcp6: false
            addresses:
              - {{ ip "bond0:mgmt-pxe" }}
        vlans:
          k8s-lcm:
            id: SET_VLAN_ID
            link: bond0
            addresses:
              - {{ ip "k8s-lcm:kaas-mgmt" }}
            nameservers:
              addresses: {{ nameservers_from_subnet "kaas-mgmt" }}
            routes:
              - to: 0.0.0.0/0
                via: {{ gateway_from_subnet "kaas-mgmt" }}
        

        In this example, the following configuration applies:

        • A bond of two NIC interfaces

        • A static address in the PXE network set on the bond

        • An isolated L2 segment for the LCM network is configured using the k8s-lcm VLAN with the static address in the LCM network

        • The default gateway address is in the LCM network

        For general concepts of configuring separate PXE and LCM networks for a management cluster, see Separate PXE and management networks. For the latest object templates and variable names to use, see the following tables.

        Network parameters mapping overview

        Deployment file name

        Parameters list to update manually

        ipam-objects.yaml.template

        • SET_LB_HOST

        • SET_MGMT_ADDR_RANGE

        • SET_MGMT_CIDR

        • SET_MGMT_DNS

        • SET_MGMT_NW_GW

        • SET_MGMT_SVC_POOL

        • SET_PXE_ADDR_POOL

        • SET_PXE_ADDR_RANGE

        • SET_PXE_CIDR

        • SET_PXE_SVC_POOL

        • SET_VLAN_ID

        bootstrap.env

        • KAAS_BM_PXE_IP

        • KAAS_BM_PXE_MASK

        • KAAS_BM_PXE_BRIDGE

        The below table contains examples of mandatory parameter values to set in templates/bm/ipam-objects.yaml.template for the network scheme that has the following networks:

        • 172.16.59.0/24 - PXE network

        • 172.16.61.0/25 - LCM network

        Mandatory network parameters of the IPAM objects template

        Parameter

        Description

        Example value

        SET_PXE_CIDR

        The IP address of the PXE network in the CIDR notation. The minimum recommended network size is 256 addresses (/24 prefix length).

        172.16.59.0/24

        SET_PXE_SVC_POOL

        The IP address range to use for endpoints of load balancers in the PXE network for the Container Cloud services: Ironic-API, DHCP server, HTTP server, and caching server. The minimum required range size is 5 addresses.

        172.16.59.6-172.16.59.15

        SET_PXE_ADDR_POOL

        The IP address range in the PXE network to use for dynamic address allocation for hosts during inspection and provisioning.

        The minimum recommended range size is 30 addresses for management cluster nodes if it is located in a separate PXE network segment. Otherwise, it depends on the number of managed cluster nodes to deploy in the same PXE network segment as the management cluster nodes.

        172.16.59.51-172.16.59.200

        SET_PXE_ADDR_RANGE

        The IP address range in the PXE network to use for static address allocation on each management cluster node. The minimum recommended range size is 6 addresses.

        172.16.59.41-172.16.59.50

        SET_MGMT_CIDR

        The IP address of the LCM network for the management cluster in the CIDR notation. If managed clusters will have their separate LCM networks, those networks must be routable to the LCM network. The minimum recommended network size is 128 addresses (/25 prefix length).

        172.16.61.0/25

        SET_MGMT_NW_GW

        The default gateway address in the LCM network. This gateway must provide access to the OOB network of the Container Cloud cluster and to the Internet to download the Mirantis artifacts.

        172.16.61.1

        SET_LB_HOST

        The IP address of the externally accessible MKE API endpoint of the cluster in the CIDR notation. This address must be within the management SET_MGMT_CIDR network but must NOT overlap with any other addresses or address ranges within this network. External load balancers are not supported.

        172.16.61.5/32

        SET_MGMT_DNS

        An external (non-Kubernetes) DNS server accessible from the LCM network.

        8.8.8.8

        SET_MGMT_ADDR_RANGE

        The IP address range that includes addresses to be allocated to bare metal hosts in the LCM network for the management cluster.

        When this network is shared with managed clusters, the size of this range limits the number of hosts that can be deployed in all clusters sharing this network.

        When this network is solely used by a management cluster, the range must include at least 6 addresses for bare metal hosts of the management cluster.

        172.16.61.30-172.16.61.40

        SET_MGMT_SVC_POOL

        The IP address range to use for the externally accessible endpoints of load balancers in the LCM network for the Container Cloud services, such as Keycloak, web UI, and so on. The minimum required range size is 19 addresses.

        172.16.61.10-172.16.61.29

        SET_VLAN_ID

        The VLAN ID used for isolation of the LCM network. The bootstrap.sh process and the seed node must have routable access to the network in this VLAN.

        3975

        When using separate PXE and LCM networks, the management cluster services are exposed in different networks using two separate MetalLB address pools:

        • Services exposed through the PXE network are as follows:

          • Ironic API as a bare metal provisioning server

          • HTTP server that provides images for network boot and server provisioning

          • Caching server for accessing the Container Cloud artifacts deployed on hosts

        • Services exposed through the LCM network are all other Container Cloud services, such as Keycloak, web UI, and so on.

        The default MetalLB configuration described in the MetalLBConfig object template of templates/bm/metallbconfig.yaml.template uses two separate MetalLB address pools. Also, it uses the interfaces selector in its l2Advertisements template.

        Caution

        When you change the L2Template object template in templates/bm/ipam-objects.yaml.template, ensure that interfaces listed in the interfaces field of the MetalLBConfig.spec.l2Advertisements section match those used in your L2Template. For details about the interfaces selector, see API Reference: MetalLBConfig spec.

        See Configure MetalLB for details on MetalLB configuration.

      4. In cluster.yaml.template, update the cluster-related settings to fit your deployment.

      5. Optional. Technology Preview. Deprecated since Container Cloud 2.29.0 (Cluster releases 17.4.0 and 16.4.0). Available since Container Cloud 2.24.0 (Cluster release 14.0.0). Enable WireGuard for traffic encryption on the Kubernetes workloads network.

        WireGuard configuration
        1. Ensure that the Calico MTU size is at least 60 bytes smaller than the interface MTU size of the workload network. IPv4 WireGuard uses a 60-byte header. For details, see Set the MTU size for Calico.

        2. In templates/bm/cluster.yaml.template, enable WireGuard by adding the secureOverlay parameter:

          spec:
            ...
            providerSpec:
              value:
                ...
                secureOverlay: true
          

          Caution

          Changing this parameter on a running cluster causes a downtime that can vary depending on the cluster size.

        For more details about WireGuard, see Calico documentation: Encrypt in-cluster pod traffic.

    3. Configure StackLight. For parameters description, see StackLight configuration parameters.

    4. Optional. Configure additional cluster settings as described in Configure optional cluster settings.

  10. Apply configuration for machines using machines.yaml.template.

    Configuration of machines.yaml.template
    1. Add the following mandatory machine labels:

      labels:
        kaas.mirantis.com/provider: baremetal
        cluster.sigs.k8s.io/cluster-name: <clusterName>
        kaas.mirantis.com/region: <regionName>
        cluster.sigs.k8s.io/control-plane: "true"
      

      Note

      The kaas.mirantis.com/region label is removed from all Container Cloud objects in 2.26.0 (Cluster releases 17.1.0 and 16.1.0). Therefore, do not add this label starting from these releases. On existing clusters updated to these releases, or if the label was added manually, Container Cloud ignores it.

    2. Configure the provider-specific settings:

      Inspect the machines.yaml.template and adjust spec and labels of each entry according to your deployment. Adjust spec.providerSpec.value.hostSelector values to match BareMetalHostInventory corresponding to each machine. For details, see API Reference: Bare metal Machine spec.

  11. Monitor the inspection process of the bare metal hosts and wait until all hosts are in the available state:

    kubectl get bmh -o go-template='{{- range .items -}} {{.status.provisioning.state}}{{"\n"}} {{- end -}}'
    

    Example of system response:

    available
    available
    available
    
  12. Monitor the BootstrapRegion object status and wait until it is ready.

    kubectl get bootstrapregions -o go-template='{{(index .items 0).status.ready}}{{"\n"}}'
    

    To obtain more granular status details, monitor status.conditions:

    kubectl get bootstrapregions -o go-template='{{(index .items 0).status.conditions}}{{"\n"}}'
    

    For a more convenient system response, consider using dedicated tools such as jq or yq and adjust the -o flag to output in json or yaml format accordingly.
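
    For example, with jq:

    kubectl get bootstrapregions -o json | jq '.items[0].status.conditions'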

    Note

    Before Container Cloud 2.26.0 (Cluster release 16.1.0), the BareMetalObjectReferences condition is not mandatory and may remain in the not ready state with no effect on the BootstrapRegion object. Since Container Cloud 2.26.0, this condition is mandatory.

  13. Change the directory to /kaas-bootstrap/.

  14. Approve the BootstrapRegion object to start the cluster deployment:

    Select from the following options:

    • To approve all bootstrap regions:

      ./container-cloud bootstrap approve all

    • To approve a specific bootstrap region:

      ./container-cloud bootstrap approve <bootstrapRegionName>

    Caution

    Once you approve the BootstrapRegion object, no cluster or machine modification is allowed.

    Warning

    Do not manually restart or power off any of the bare metal hosts during the bootstrap process.

  15. Monitor the deployment progress. For deployment stages description, see Overview of the deployment workflow.

  16. Verify that network addresses used on your clusters do not overlap with the following default MKE network addresses for Swarm and MCR:

    • 10.0.0.0/16 is used for Swarm networks. IP addresses from this network are virtual.

    • 10.99.0.0/16 is used for MCR networks. IP addresses from this network are allocated on hosts.

    Verification of Swarm and MCR network addresses

    To verify Swarm and MCR network addresses, run on any master node:

    docker info
    

    Example of system response:

    Server:
     ...
     Swarm:
      ...
      Default Address Pool: 10.0.0.0/16
      SubnetSize: 24
      ...
     Default Address Pools:
       Base: 10.99.0.0/16, Size: 20
     ...
    

    Usually, not all Swarm and MCR addresses are in use. One Swarm Ingress network is created by default and occupies the 10.0.0.0/24 address block. Also, three MCR networks are created by default and occupy three address blocks: 10.99.0.0/20, 10.99.16.0/20, 10.99.32.0/20.

    To verify the actual networks state and addresses in use, run:

    docker network ls
    docker network inspect <networkName>
    
  17. Optional. If you plan to use multiple L2 segments for provisioning of managed cluster nodes, consider the requirements specified in Configure multiple DHCP address ranges.

Configure a bare metal deployment

During creation of a bare metal management cluster using Bootstrap v2, configure several cluster settings to fit your deployment.

Configure BIOS on a bare metal host

Note

Before update of the management cluster to Container Cloud 2.29.0 (Cluster release 16.4.0), instead of BareMetalHostInventory, use the BareMetalHost object. For details, see BareMetalHost.

Caution

While the Cluster release of the management cluster is 16.4.0, BareMetalHostInventory operations are allowed to m:kaas@management-admin only. Once the management cluster is updated to the Cluster release 16.4.1 (or later), this limitation will be lifted.

Before adding new BareMetalHostInventory objects, configure hardware hosts to correctly boot them over the PXE network.

Important

Consider the following common requirements for hardware hosts configuration:

  • Update firmware for BIOS and Baseboard Management Controller (BMC) to the latest available version, especially if you are going to apply the UEFI configuration.

    Container Cloud uses the ipxe.efi binary loader that might not be compatible with old firmware and might have vendor-related issues with UEFI booting, for example, the Supermicro issue. In this case, we recommend using the legacy booting format.

  • Configure all or at least the PXE NIC on switches.

    If the hardware host has more than one PXE NIC to boot, we strongly recommend setting up only one in the boot order. It speeds up the provisioning phase significantly.

    Some hardware vendors require a host to be rebooted during BIOS configuration changes from legacy to UEFI or vice versa for the extra option with NIC settings to appear in the menu.

  • Connect only one Ethernet port on a host to the PXE network at any given time. Collect the physical address (MAC) of this interface and use it to configure the BareMetalHostInventory object describing the host.

To configure BIOS on a bare metal host, use the legacy or UEFI configuration depending on your hardware boot mode:

Legacy boot configuration

  1. Enable the global BIOS mode using BIOS > Boot > boot mode select > legacy. Reboot the host if required.

  2. Enable the LAN-PXE-OPROM support using the following menus:

    • BIOS > Advanced > PCI/PCIe Configuration > LAN OPROM TYPE > legacy

    • BIOS > Advanced > PCI/PCIe Configuration > Network Stack > enabled

    • BIOS > Advanced > PCI/PCIe Configuration > IPv4 PXE Support > enabled

  3. Set up the configured boot order:

    1. BIOS > Boot > Legacy-Boot-Order#1 > Hard Disk

    2. BIOS > Boot > Legacy-Boot-Order#2 > NIC

  4. Save changes and power off the host.

UEFI boot configuration

  1. Enable the global BIOS mode using BIOS > Boot > boot mode select > UEFI. Reboot the host if required.

  2. Enable the LAN-PXE-OPROM support using the following menus:

    • BIOS > Advanced > PCI/PCIe Configuration > LAN OPROM TYPE > uefi

    • BIOS > Advanced > PCI/PCIe Configuration > Network Stack > enabled

    • BIOS > Advanced > PCI/PCIe Configuration > IPv4 PXE Support > enabled

    Note

    UEFI support might not apply to all NICs, but at least the built-in network interfaces should support it.

  3. Set up the configured boot order:

    1. BIOS > Boot > UEFI-Boot-Order#1 > UEFI Hard Disk

    2. BIOS > Boot > UEFI-Boot-Order#2 > UEFI Network

  4. Save changes and power off the host.

Customize the default bare metal host profile

This section describes the bare metal host profile settings and instructs how to configure this profile before deploying Mirantis Container Cloud on physical servers.

The bare metal host profile is a Kubernetes custom resource. It allows the Infrastructure Operator to define how the storage devices and the operating system are provisioned and configured.

The bootstrap templates for a bare metal deployment include the template for the default BareMetalHostProfile object in the following file that defines the default bare metal host profile:

templates/bm/baremetalhostprofiles.yaml.template

Note

Using BareMetalHostProfile, you can configure LVM or mdadm-based software RAID support during a management or managed cluster creation. For details, see Configure RAID support.

This feature is available as Technology Preview. Use such configuration for testing and evaluation purposes only. For the Technology Preview feature definition, refer to Technology Preview features.

Warning

Any data stored on any device defined in the fileSystems list can be deleted or corrupted during cluster (re)deployment. It happens because each device from the fileSystems list is a part of the rootfs directory tree that is overwritten during (re)deployment.

Examples of affected devices include:

  • A raw device partition with a file system on it

  • A device partition in a volume group with a logical volume that has a file system on it

  • An mdadm RAID device with a file system on it

  • An LVM RAID device with a file system on it

The wipe field (deprecated) or wipeDevice structure (recommended since Container Cloud 2.26.0) has no effect in this case and cannot protect data on these devices.

Therefore, to prevent data loss, move the necessary data from these file systems to another server beforehand, if required.

The customization procedure of BareMetalHostProfile is almost the same for the management and managed clusters, with the following differences:

  • For a management cluster, the customization automatically applies to machines during bootstrap. And for a managed cluster, you apply the changes using kubectl before creating a managed cluster.

  • For a management cluster, you edit the default baremetalhostprofiles.yaml.template. And for a managed cluster, you create a new BareMetalHostProfile with the necessary configuration.

For the procedure details, see Create a custom bare metal host profile. Use this procedure for both types of clusters considering the differences described above.

Configure NIC bonding

You can configure L2 templates for the management cluster to set up a bond network interface for the PXE and management network.

This configuration must be applied to the bootstrap templates, before you run the bootstrap script to deploy the management cluster.

Configuration requirements for NIC bonding

  • Add at least two physical interfaces to each host in your management cluster.

  • Connect at least two interfaces per host to an Ethernet switch that supports Link Aggregation Control Protocol (LACP) port groups and LACP fallback.

  • Configure an LACP group on the ports connected to the NICs of a host.

  • Configure the LACP fallback on the port group to ensure that the host can boot over the PXE network before the bond interface is set up on the host operating system.

  • Configure server BIOS for both NICs of a bond to be PXE-enabled.

  • If the server does not support booting from multiple NICs, configure the port of the LACP group that is connected to the PXE-enabled NIC of a server to be the primary port. With this setting, the port becomes active in the fallback mode.

  • Configure the ports that connect servers to the PXE network with the PXE VLAN as native or untagged.

For reference configuration of network fabric in a baremetal-based cluster, see Network fabric.

To configure a bond interface that aggregates two interfaces for the PXE and management network:

  1. In kaas-bootstrap/templates/bm/ipam-objects.yaml.template:

    1. Verify that only the following parameters for the declaration of {{nic 0}} and {{nic 1}} are set, as shown in the example below:

      • dhcp4

      • dhcp6

      • match

      • set-name

      Remove other parameters.

    2. Verify that the declaration of the bond interface bond0 has the interfaces parameter listing both Ethernet interfaces.

    3. Verify that the node address in the PXE network (ip "bond0:mgmt-pxe" in the below example) is bound to the bond interface or to the virtual bridge interface tied to that bond.

      Caution

      Do not configure a VLAN ID for the PXE network on the host side.

    4. Configure bonding options using the parameters field. The only mandatory option is mode. See the example below for details.

      Note

      You can set any mode supported by netplan and your hardware.

      Important

      Bond monitoring is disabled in Ubuntu by default. However, Mirantis highly recommends enabling it using Media Independent Interface (MII) monitoring by setting the mii-monitor-interval parameter to a non-zero value. For details, see Linux documentation: bond monitoring.

  2. Verify your configuration using the following example:

    kind: L2Template
    metadata:
      name: kaas-mgmt
      ...
    spec:
      ...
      l3Layout:
        - subnetName: kaas-mgmt
          scope:      namespace
      npTemplate: |
        version: 2
        ethernets:
          {{nic 0}}:
            dhcp4: false
            dhcp6: false
            match:
              macaddress: {{mac 0}}
            set-name: {{nic 0}}
          {{nic 1}}:
            dhcp4: false
            dhcp6: false
            match:
              macaddress: {{mac 1}}
            set-name: {{nic 1}}
        bonds:
          bond0:
            interfaces:
              - {{nic 0}}
              - {{nic 1}}
            parameters:
              mode: 802.3ad
              mii-monitor-interval: 100
            dhcp4: false
            dhcp6: false
            addresses:
              - {{ ip "bond0:mgmt-pxe" }}
        vlans:
          k8s-lcm:
            id: SET_VLAN_ID
            link: bond0
            addresses:
              - {{ ip "k8s-lcm:kaas-mgmt" }}
            nameservers:
              addresses: {{ nameservers_from_subnet "kaas-mgmt" }}
            routes:
              - to: 0.0.0.0/0
                via: {{ gateway_from_subnet "kaas-mgmt" }}
        ...
    
  3. Proceed to bootstrap your management cluster as described in Deploy a management cluster using CLI.

Separate PXE and management networks

This section describes how to configure a dedicated PXE network for a bare metal management cluster. A separate PXE network allows isolating the sensitive bare metal provisioning process from end users. The users still have access to Container Cloud services, such as Keycloak, to authenticate workloads in managed clusters, such as Horizon in a Mirantis OpenStack for Kubernetes cluster.

Note

This additional configuration procedure must be completed as part of the Deploy a management cluster using CLI steps. It substitutes or appends some configuration parameters and templates that are used in Deploy a management cluster using CLI for the management cluster to use two networks, PXE and management, instead of one PXE/management network. We recommend considering the Deploy a management cluster using CLI procedure first.

The following table describes the overall network mapping scheme with all L2/L3 parameters, for example, for two networks, PXE (CIDR 10.0.0.0/24) and management (CIDR 10.0.11.0/24):

Network mapping overview

Deployment file name

Network

Parameters and values

cluster.yaml

Management

  • SET_LB_HOST=10.0.11.90

  • SET_METALLB_ADDR_POOL=10.0.11.61-10.0.11.80

ipam-objects.yaml

PXE

  • SET_IPAM_CIDR=10.0.0.0/24

  • SET_PXE_NW_GW=10.0.0.1

  • SET_PXE_NW_DNS=8.8.8.8

  • SET_IPAM_POOL_RANGE=10.0.0.100-10.0.0.109

  • SET_METALLB_PXE_ADDR_POOL=10.0.0.61-10.0.0.70

ipam-objects.yaml

Management

  • SET_LCM_CIDR=10.0.11.0/24

  • SET_LCM_RANGE=10.0.11.100-10.0.11.199

  • SET_LB_HOST=10.0.11.90

  • SET_METALLB_ADDR_POOL=10.0.11.61-10.0.11.80

bootstrap.sh

PXE

  • KAAS_BM_PXE_IP=10.0.0.20

  • KAAS_BM_PXE_MASK=24

  • KAAS_BM_PXE_BRIDGE=br0

  • KAAS_BM_BM_DHCP_RANGE=10.0.0.30,10.0.0.59,255.255.255.0

  • BOOTSTRAP_METALLB_ADDRESS_POOL=10.0.0.61-10.0.0.80


When using separate PXE and management networks, the management cluster services are exposed in different networks using two separate MetalLB address pools:

  • Services exposed through the PXE network are as follows:

    • Ironic API as a bare metal provisioning server

    • HTTP server that provides images for network boot and server provisioning

    • Caching server for accessing the Container Cloud artifacts deployed on hosts

  • Services exposed through the management network are all other Container Cloud services, such as Keycloak, web UI, and so on.

To configure separate PXE and management networks:

  1. Inspect the guidelines for configuring the Subnet object as a MetalLB address pool as described in MetalLB configuration guidelines for subnets.

  2. To ensure successful bootstrap, enable asymmetric routing on the interfaces of the management cluster nodes. This is required because the seed node relies on one network by default, which can potentially cause traffic asymmetry.

    In the kernelParameters section of bm/baremetalhostprofiles.yaml.template, set rp_filter to 2. This enables loose mode as defined in RFC3704.

    Example configuration of asymmetric routing
    ...
    kernelParameters:
      ...
      sysctl:
        # Enables the "Loose mode" for the "k8s-lcm" interface (management network)
        net.ipv4.conf.k8s-lcm.rp_filter: "2"
        # Enables the "Loose mode" for the "bond0" interface (PXE network)
        net.ipv4.conf.bond0.rp_filter: "2"
        ...
    

    Note

    More complicated solutions, which are not described in this manual, involve eliminating traffic asymmetry, for example:

    • Configure source routing on management cluster nodes.

    • Plug the seed node into the same networks as the management cluster nodes, which requires custom configuration of the seed node.

  3. In kaas-bootstrap/templates/bm/ipam-objects.yaml.template:

    • Substitute all the Subnet object templates with the new ones as described in the example template below

    • Update the L2 template spec.l3Layout and spec.npTemplate fields as described in the example template below

    Example of the Subnet object templates
    # Subnet object that provides IP addresses for bare metal hosts of
    # management cluster in the PXE network.
    apiVersion: "ipam.mirantis.com/v1alpha1"
    kind: Subnet
    metadata:
      name: mgmt-pxe
      namespace: default
      labels:
        kaas.mirantis.com/provider: baremetal
        kaas-mgmt-pxe-subnet: ""
    spec:
      cidr: SET_IPAM_CIDR
      gateway: SET_PXE_NW_GW
      nameservers:
        - SET_PXE_NW_DNS
      includeRanges:
        - SET_IPAM_POOL_RANGE
      excludeRanges:
        - SET_METALLB_PXE_ADDR_POOL
    ---
    # Subnet object that provides IP addresses for bare metal hosts of
    # management cluster in the management network.
    apiVersion: "ipam.mirantis.com/v1alpha1"
    kind: Subnet
    metadata:
      name: mgmt-lcm
      namespace: default
      labels:
        kaas.mirantis.com/provider: baremetal
        kaas-mgmt-lcm-subnet: ""
        ipam/SVC-k8s-lcm: "1"
        ipam/SVC-ceph-cluster: "1"
        ipam/SVC-ceph-public: "1"
        cluster.sigs.k8s.io/cluster-name: CLUSTER_NAME
    spec:
      cidr: {{ SET_LCM_CIDR }}
      includeRanges:
        - {{ SET_LCM_RANGE }}
      excludeRanges:
        - SET_LB_HOST
        - SET_METALLB_ADDR_POOL
    ---
    # Deprecated since 2.27.0. Subnet object that provides configuration
    # for "services-pxe" MetalLB address pool that will be used to expose
    # services LB endpoints in the PXE network.
    apiVersion: "ipam.mirantis.com/v1alpha1"
    kind: Subnet
    metadata:
      name: mgmt-pxe-lb
      namespace: default
      labels:
        kaas.mirantis.com/provider: baremetal
        metallb/address-pool-name: services-pxe
        metallb/address-pool-protocol: layer2
        metallb/address-pool-auto-assign: "false"
        cluster.sigs.k8s.io/cluster-name: CLUSTER_NAME
    spec:
      cidr: SET_IPAM_CIDR
      includeRanges:
        - SET_METALLB_PXE_ADDR_POOL
    
    Example of the L2 template spec
    kind: L2Template
    ...
    spec:
      ...
      l3Layout:
        - scope: namespace
          subnetName: kaas-mgmt-pxe
          labelSelector:
            kaas.mirantis.com/provider: baremetal
            kaas-mgmt-pxe-subnet: ""
        - scope: namespace
          subnetName: kaas-mgmt-lcm
          labelSelector:
            kaas.mirantis.com/provider: baremetal
            kaas-mgmt-lcm-subnet: ""
      npTemplate: |
        version: 2
        renderer: networkd
        ethernets:
          {{nic 0}}:
            dhcp4: false
            dhcp6: false
            match:
              macaddress: {{mac 0}}
            set-name: {{nic 0}}
          {{nic 1}}:
            dhcp4: false
            dhcp6: false
            match:
              macaddress: {{mac 1}}
            set-name: {{nic 1}}
        bridges:
          bm-pxe:
            interfaces:
             - {{ nic 0 }}
            dhcp4: false
            dhcp6: false
            addresses:
              - {{ ip "bm-pxe:kaas-mgmt-pxe" }}
            nameservers:
              addresses: {{ nameservers_from_subnet "kaas-mgmt-pxe" }}
            routes:
              - to: 0.0.0.0/0
                via: {{ gateway_from_subnet "kaas-mgmt-pxe" }}
          k8s-lcm:
            interfaces:
             - {{ nic 1 }}
            dhcp4: false
            dhcp6: false
            addresses:
              - {{ ip "k8s-lcm:kaas-mgmt-lcm" }}
            nameservers:
              addresses: {{ nameservers_from_subnet "kaas-mgmt-lcm" }}
    

    Deprecated since Container Cloud 2.27.0 (Cluster releases 17.2.0 and 16.2.0): the last Subnet template named mgmt-pxe-lb in the example above will be used to configure the MetalLB address pool in the PXE network. The bare metal provider will automatically configure MetalLB with address pools using the Subnet objects identified by specific labels.

    Warning

    The bm-pxe address must have a separate interface with only one address on this interface.

  4. Verify the current MetalLB configuration that is stored in MetalLB objects:

    kubectl -n metallb-system get ipaddresspools,l2advertisements
    

    For the example configuration described above, the system output is similar to the following:

    NAME                                    AGE
    ipaddresspool.metallb.io/default        129m
    ipaddresspool.metallb.io/services-pxe   129m
    
    NAME                                      AGE
    l2advertisement.metallb.io/default        129m
    l2advertisement.metallb.io/services-pxe   129m
    

    To verify the MetalLB objects:

    kubectl -n metallb-system get <object> -o json | jq '.spec'
    

    For the example configuration described above, the system output for the ipaddresspool objects is similar to the following:

    $ kubectl -n metallb-system get ipaddresspool.metallb.io/default -o json | jq '.spec'
    {
      "addresses": [
        "10.0.11.61-10.0.11.80"
      ],
      "autoAssign": true,
      "avoidBuggyIPs": false
    }
    $ kubectl -n metallb-system get ipaddresspool.metallb.io/services-pxe -o json | jq '.spec'
    {
      "addresses": [
        "10.0.0.61-10.0.0.70"
      ],
      "autoAssign": false,
      "avoidBuggyIPs": false
    }
    

    The auto-assign parameter will be set to false for all address pools except the default one. So, a particular service will get an address from such an address pool only if the Service object has a special metallb.universe.tf/address-pool annotation that points to the specific address pool name.

    Note

    It is expected that every Container Cloud service on a management cluster will be assigned to one of the address pools. Current consideration is to have two MetalLB address pools:

    • services-pxe is a reserved address pool name to use for the Container Cloud services in the PXE network (Ironic API, HTTP server, caching server).

      The bootstrap cluster also uses the services-pxe address pool for its provision services for management cluster nodes to be provisioned from the bootstrap cluster. After the management cluster is deployed, the bootstrap cluster is deleted and that address pool is solely used by the newly deployed cluster.

    • default is an address pool to use for all other Container Cloud services in the management network. No annotation is required on the Service objects in this case.

  5. Select from the following options for configuration of the dedicatedMetallbPools flag:

    • Skip this step because the flag is hardcoded to true.

    • Verify that the flag is set to the default true value.

    The flag enables splitting of LB endpoints for the Container Cloud services. The metallb.universe.tf/address-pool annotations on the Service objects are configured by the bare metal provider automatically when the dedicatedMetallbPools flag is set to true.

    Example Service object configured by the baremetal-operator Helm release:

    apiVersion: v1
    kind: Service
    metadata:
      name: ironic-api
      annotations:
        metallb.universe.tf/address-pool: services-pxe
    spec:
      ports:
      - port: 443
        targetPort: 443
      type: LoadBalancer
    

    The metallb.universe.tf/address-pool annotation on the Service object is set to services-pxe by the baremetal provider, so the ironic-api service will be assigned an LB address from the corresponding MetalLB address pool.
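
    For example, you can check which address pool a service is annotated with, assuming that the service runs in the kaas namespace:

    kubectl -n kaas get service ironic-api \
      -o jsonpath='{.metadata.annotations.metallb\.universe\.tf/address-pool}{"\n"}'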

  6. In addition to the network parameters defined in Deploy a management cluster using CLI, configure the following ones by replacing them in templates/bm/ipam-objects.yaml.template:

    New subnet template parameters

    Parameter

    Description

    Example value

    SET_LCM_CIDR

    Address of a management network for the management cluster in the CIDR notation. You can later share this network with managed clusters where it will act as the LCM network. If managed clusters have their separate LCM networks, those networks must be routable to the management network.

    10.0.11.0/24

    SET_LCM_RANGE

    Address range that includes addresses to be allocated to bare metal hosts in the management network for the management cluster. When this network is shared with managed clusters, the size of this range limits the number of hosts that can be deployed in all clusters that share this network. When this network is solely used by a management cluster, the range should include at least 3 IP addresses for bare metal hosts of the management cluster.

    10.0.11.100-10.0.11.109

    SET_METALLB_PXE_ADDR_POOL

    Address range to be used for LB endpoints of the Container Cloud services: Ironic-API, HTTP server, and caching server. This range must be within the PXE network. The minimum required range is 5 IP addresses.

    10.0.0.61-10.0.0.70

    The following parameters will now be tied to the management network while their meaning remains the same as described in Deploy a management cluster using CLI:

    Subnet template parameters migrated to management network

    Parameter

    Description

    Example value

    SET_LB_HOST

    IP address of the externally accessible API endpoint of the management cluster. This address must NOT be within the SET_METALLB_ADDR_POOL range but within the management network. External load balancers are not supported.

    10.0.11.90

    SET_METALLB_ADDR_POOL

    The address range to be used for the externally accessible LB endpoints of the Container Cloud services, such as Keycloak, web UI, and so on. This range must be within the management network. The minimum required range is 19 IP addresses.

    10.0.11.61-10.0.11.80

  7. Proceed to further steps in Deploy a management cluster using CLI.

Configure multiple DHCP address ranges

To facilitate multi-rack and other types of distributed bare metal datacenter topologies, the dnsmasq DHCP server used for host provisioning in Container Cloud supports working with multiple L2 segments through network routers that support DHCP relay.

Container Cloud has its own DHCP relay running on one of the management cluster nodes. That DHCP relay proxies DHCP requests in the same L2 domain where the management cluster nodes are located.

Caution

Networks used for hosts provisioning of a managed cluster must have routes to the PXE network (when a dedicated PXE network is configured) or to the combined PXE/management network of the management cluster. This configuration enables hosts to have access to the management cluster services that are used during host provisioning.

Management cluster nodes must have routes through the PXE network to PXE network segments used on a managed cluster. The following example contains L2 template fragments for a management cluster node:

l3Layout:
  # PXE/static subnet for a management cluster
  - scope: namespace
    subnetName: kaas-mgmt-pxe
    labelSelector:
      kaas-mgmt-pxe-subnet: "1"
  # management (LCM) subnet for a management cluster
  - scope: namespace
    subnetName: kaas-mgmt-lcm
    labelSelector:
      kaas-mgmt-lcm-subnet: "1"
  # PXE/dhcp subnets for a managed cluster
  - scope: namespace
    subnetName: managed-dhcp-rack-1
  - scope: namespace
    subnetName: managed-dhcp-rack-2
  - scope: namespace
    subnetName: managed-dhcp-rack-3
  ...
npTemplate: |
  ...
  bonds:
    bond0:
      interfaces:
        - {{ nic 0 }}
        - {{ nic 1 }}
      parameters:
        mode: active-backup
        primary: {{ nic 0 }}
        mii-monitor-interval: 100
      dhcp4: false
      dhcp6: false
      addresses:
        # static address on management node in the PXE network
        - {{ ip "bond0:kaas-mgmt-pxe" }}
      routes:
        # routes to managed PXE network segments
        - to: {{ cidr_from_subnet "managed-dhcp-rack-1" }}
          via: {{ gateway_from_subnet "kaas-mgmt-pxe" }}
        - to: {{ cidr_from_subnet "managed-dhcp-rack-2" }}
          via: {{ gateway_from_subnet "kaas-mgmt-pxe" }}
        - to: {{ cidr_from_subnet "managed-dhcp-rack-3" }}
          via: {{ gateway_from_subnet "kaas-mgmt-pxe" }}
        ...

To configure DHCP ranges for dnsmasq, create the Subnet objects tagged with the ipam/SVC-dhcp-range label while setting up subnets for a managed cluster using CLI.

Caution

Support of multiple DHCP ranges has the following limitations:

  • Using custom DNS server addresses for servers that boot over PXE is not supported.

  • The Subnet objects for DHCP ranges cannot be associated with any specific cluster, as DHCP server configuration is only applicable to the management cluster where DHCP server is running. The cluster.sigs.k8s.io/cluster-name label will be ignored.

    Note

    Before the Cluster release 16.1.0, the Subnet object contains the kaas.mirantis.com/region label that specifies the region where the DHCP ranges will be applied.

Migration of DHCP configuration for existing management clusters

Note

This section applies only to existing management clusters that were created before Container Cloud 2.24.0.

Caution

Since Container Cloud 2.24.0, you can only remove the deprecated dnsmasq.dhcp_range, dnsmasq.dhcp_ranges, dnsmasq.dhcp_routers, and dnsmasq.dhcp_dns_servers values from the cluster spec.

The Admission Controller does not accept any other changes in these values. This configuration is completely superseded by the Subnet object.

The DHCP configuration was automatically migrated from the cluster spec to Subnet objects after the cluster upgrade to Container Cloud 2.21.0.

To remove the deprecated dnsmasq parameters from the cluster spec:

  1. Open the management cluster spec for editing.

  2. In the baremetal-operator release values, remove the dnsmasq.dhcp_range, dnsmasq.dhcp_ranges, dnsmasq.dhcp_routers, and dnsmasq.dhcp_dns_servers parameters. For example:

    regional:
    - helmReleases:
      - name: baremetal-operator
        values:
          dnsmasq:
            dhcp_range: 10.204.1.0,10.204.5.255,255.255.255.0
    

    Caution

    The dnsmasq.dhcp_<name> parameters of the baremetal-operator Helm chart values in the Cluster spec are deprecated since the Cluster release 11.5.0 and removed in the Cluster release 14.0.0.

  3. Ensure that the required DHCP ranges and options are set in the Subnet objects. For configuration details, see Configure DHCP ranges for dnsmasq.

The dnsmasq configuration options dhcp-option=3 and dhcp-option=6 are absent in the default configuration. So, by default, dnsmasq will send the DNS server and default route to DHCP clients as defined in the dnsmasq official documentation:

  • The netmask and broadcast address are the same as on the host running dnsmasq.

  • The DNS server and default route are set to the address of the host running dnsmasq.

  • If the domain name option is set, this name is sent to DHCP clients.

Configure DHCP ranges for dnsmasq
  1. Create the Subnet objects tagged with the ipam/SVC-dhcp-range label.

    Caution

    For cluster-specific subnets, create Subnet objects in the same namespace as the related Cluster object project. For shared subnets, create Subnet objects in the default namespace.

    To create the Subnet objects, refer to Create subnets.

    Use the following Subnet object example to specify DHCP ranges and DHCP options to pass the default route address:

    apiVersion: "ipam.mirantis.com/v1alpha1"
    kind: Subnet
    metadata:
      name: mgmt-dhcp-range
      namespace: default
      labels:
        ipam/SVC-dhcp-range: ""
        kaas.mirantis.com/provider: baremetal
    spec:
      cidr: 10.11.0.0/24
      gateway: 10.11.0.1
      includeRanges:
        - 10.11.0.121-10.11.0.125
        - 10.11.0.191-10.11.0.199
    

    Note

    Setting of custom nameservers in the DHCP subnet is not supported.

    After creation of the above Subnet object, the provided data will be utilized to render the Dnsmasq object used for configuration of the dnsmasq deployment. You do not have to manually edit the Dnsmasq object.

  2. Verify that the changes are applied to the Dnsmasq object:

    kubectl --kubeconfig <pathToMgmtClusterKubeconfig> \
    -n kaas get dnsmasq dnsmasq-dynamic-config -o json
    
Configure DHCP relay on ToR switches

For servers to access the DHCP server across the L2 segment boundaries, for example, from another rack with a different VLAN for PXE network, you must configure DHCP relay (agent) service on the border switch of the segment. For example, on a top-of-rack (ToR) or leaf (distribution) switch, depending on the data center network topology.

Warning

To ensure predictable routing for the relay of DHCP packets, Mirantis strongly advises against the use of chained DHCP relay configurations. This precaution limits the number of hops for DHCP packets, with an optimal scenario being a single hop.

This approach is justified by the unpredictable nature of chained relay configurations and potential incompatibilities between software and hardware relay implementations.

The dnsmasq server listens on the PXE network of the management cluster by using the dhcp-lb Kubernetes Service.

To configure the DHCP relay service, specify the external address of the dhcp-lb Kubernetes Service as an upstream address for the relayed DHCP requests, which is the IP helper address for DHCP. There is the dnsmasq deployment behind this service that can only accept relayed DHCP requests.

Container Cloud has its own DHCP relay running on one of the management cluster nodes. That DHCP relay proxies DHCP requests in the same L2 domain where the management cluster nodes are located.

To obtain the actual IP address issued to the dhcp-lb Kubernetes Service:

kubectl -n kaas get service dhcp-lb
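
To print only the external IP address assigned to the dhcp-lb service, you can use jsonpath, for example:

kubectl -n kaas get service dhcp-lb \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}{"\n"}'
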
Enable dynamic IP allocation

Available since the Cluster release 16.1.0

This section instructs you on how to enable the dynamic IP allocation feature to increase the number of bare metal hosts provisioned in parallel on managed clusters.

Using this feature, you can effortlessly deploy a large managed cluster by provisioning up to 100 hosts simultaneously. In addition to dynamic IP allocation, this feature disables the ping check in the DHCP server. Therefore, if you plan to deploy large managed clusters, enable this feature during the management cluster bootstrap.

Caution

Before using this feature, familiarize yourself with DHCP range requirements for PXE.

To enable dynamic IP allocation for large managed clusters:

In the Cluster object of the management cluster, modify the configuration of baremetal-operator by setting dynamic_bootp to true:

spec:
  ...
  providerSpec:
    value:
      kaas:
        ...
        regional:
          - helmReleases:
            - name: baremetal-operator
              values:
                dnsmasq:
                  dynamic_bootp: true
            provider: baremetal
          ...
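
One way to apply this change is to edit the Cluster object of the management cluster directly with kubectl. The example below assumes that the management cluster Cluster object resides in the default project; the kubeconfig path and cluster name are placeholders:

kubectl --kubeconfig <pathToMgmtClusterKubeconfig> \
-n default edit cluster <mgmtClusterName>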
Set a custom external IP address for the DHCP service

Available since Container Cloud 2.25.0 (Cluster release 16.0.0)

This section instructs you on how to set a custom external IP address for the dhcp-lb service so that it remains the same during management cluster upgrades and other LCM operations.

A change of the dhcp-lb service address may require reconfiguring the DHCP relays on ToR switches. The described procedure allows you to avoid such unwanted changes. This configuration makes sense when you use multiple DHCP address ranges in your deployment. See Configure multiple DHCP address ranges for details.

To set a custom external IP address for the dhcp-lb service:

  1. In the Cluster object of the management cluster, modify the configuration of the baremetal-operator release by setting dnsmasq.dedicated_udp_service_address_pool to true:

    spec:
      ...
      providerSpec:
        value:
          kaas:
            ...
            regional:
              - helmReleases:
                ...
                - name: baremetal-operator
                  values:
                    dnsmasq:
                      dedicated_udp_service_address_pool: true
                      ...
                provider: baremetal
              ...
    
  2. In the MetalLBConfig object of the management cluster, modify the ipAddressPools object list by adding the dhcp-lb object and the serviceAllocation parameters for the default object:

    ipAddressPools:
    - name: default
      spec:
        addresses:
        - 112.181.11.41-112.181.11.60
        autoAssign: true
        avoidBuggyIPs: false
        serviceAllocation:
          serviceSelectors:
          - matchExpressions:
            - key: app.kubernetes.io/name
              operator: NotIn
              values:
              - dhcp-lb
    - name: services-pxe
      spec:
        addresses:
        - 10.0.24.122-10.0.24.140
        autoAssign: false
        avoidBuggyIPs: false
    - name: dhcp-lb
      spec:
        addresses:
        - 10.0.24.121/32
        autoAssign: true
        avoidBuggyIPs: false
        serviceAllocation:
          namespaces:
          - kaas
          serviceSelectors:
          - matchExpressions:
            - key: app.kubernetes.io/name
              operator: In
              values:
              - dhcp-lb
    

    Select non-overlapping IP addresses for all the ipAddressPools that you use: default, services-pxe, and dhcp-lb.

  3. In the MetalLBConfig object of the management cluster, modify the l2Advertisements object list by adding dhcp-lb to the ipAddressPools section in the pxe object spec:

    Note

    A cluster may have a different L2Advertisement object name instead of pxe.

    l2Advertisements:
    ...
    - name: pxe
      spec:
        ipAddressPools:
        - services-pxe
        - dhcp-lb
        ...
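
After the changes are applied, verify that the dhcp-lb service obtained the expected address from the dedicated dhcp-lb pool:

kubectl --kubeconfig <pathToMgmtClusterKubeconfig> -n kaas get service dhcp-lb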
    
Configure optional cluster settings

Note

Consider this section as part of the Bootstrap v2 CLI procedure.

During creation of a management cluster using Bootstrap v2, you can configure optional cluster settings using the Container Cloud API by modifying cluster.yaml.template.

To configure optional cluster settings:

  1. Technology Preview. Enable custom host names for cluster machines. When enabled, any machine host name in a particular region matches the related Machine object name. For example, instead of the default kaas-node-<UID>, a machine host name will be master-0. The custom naming format is more convenient and easier to operate with.

    Configuration for custom host names on the management and its future managed clusters
    1. In cluster.yaml.template, find the spec.providerSpec.value.kaas.regional.helmReleases.name: baremetal-provider section.

    2. Under values.config, add customHostnamesEnabled: true:

      regional:
       - helmReleases:
         - name: baremetal-provider
           values:
             config:
               allInOneAllowed: false
               customHostnamesEnabled: true
               internalLoadBalancers: false
         provider: baremetal-provider
      
    1. In cluster.yaml.template, find the spec.providerSpec.value.kaas.regional section of the required region.

    2. In this section, find baremetal-provider under helmReleases.

    3. Under values.config, add customHostnamesEnabled: true. For example, in region-one:

      regional:
       - helmReleases:
         - name: baremetal-provider
           values:
             config:
               allInOneAllowed: false
               customHostnamesEnabled: true
               internalLoadBalancers: false
         provider: baremetal-provider
      
  2. Technology Preview. Enable the Linux Audit daemon auditd to monitor activity of cluster processes and prevent potential malicious activity.

    Configuration for auditd

    In cluster.yaml.template, add the auditd parameters:

    spec:
      providerSpec:
        value:
          audit:
            auditd:
              enabled: <bool>
              enabledAtBoot: <bool>
              backlogLimit: <int>
              maxLogFile: <int>
              maxLogFileAction: <string>
              maxLogFileKeep: <int>
              mayHaltSystem: <bool>
              presetRules: <string>
              customRules: <string>
              customRulesX32: <text>
              customRulesX64: <text>
    

    Configuration parameters for auditd:

    enabled

    Boolean, default - false. Enables the auditd role to install the auditd packages and configure rules. CIS rules: 4.1.1.1, 4.1.1.2.

    enabledAtBoot

    Boolean, default - false. Configures grub to audit processes that can be audited even if they start up prior to auditd startup. CIS rule: 4.1.1.3.

    backlogLimit

    Integer, default - none. Configures the backlog to hold records. If audit=1 is configured during boot, the backlog holds 64 records. If more than 64 records are created during boot, auditd records will be lost, with potential malicious activity left undetected. CIS rule: 4.1.1.4.

    maxLogFile

    Integer, default - none. Configures the maximum size of the audit log file. Once the log reaches the maximum size, it is rotated and a new log file is created. CIS rule: 4.1.2.1.

    maxLogFileAction

    String, default - none. Defines handling of the audit log file reaching the maximum file size. Allowed values:

    • keep_logs - rotate logs but never delete them

    • rotate - add a cron job to compress rotated log files and keep maximum 5 compressed files.

    • compress - compress log files and keep them under the /var/log/auditd/ directory. Requires auditd_max_log_file_keep to be enabled.

    CIS rule: 4.1.2.2.

    maxLogFileKeep

    Integer, default - 5. Defines the number of compressed log files to keep under the /var/log/auditd/ directory. Requires auditd_max_log_file_action=compress. CIS rules - none.

    mayHaltSystem

    Boolean, default - false. Halts the system when the audit logs are full. Applies the following configuration:

    • space_left_action = email

    • action_mail_acct = root

    • admin_space_left_action = halt

    CIS rule: 4.1.2.3.

    customRules

    String, default - none. Base64-encoded content of the 60-custom.rules file for any architecture. CIS rules - none.

    customRulesX32

    String, default - none. Base64-encoded content of the 60-custom.rules file for the i386 architecture. CIS rules - none.

    customRulesX64

    String, default - none. Base64-encoded content of the 60-custom.rules file for the x86_64 architecture. CIS rules - none.

    presetRules

    String, default - none. Comma-separated list of the following built-in preset rules:

    • access

    • actions

    • delete

    • docker

    • identity

    • immutable

    • logins

    • mac-policy

    • modules

    • mounts

    • perm-mod

    • privileged

    • scope

    • session

    • system-locale

    • time-change

    Since Container Cloud 2.28.0 (Cluster releases 17.3.0 and 16.3.0) in the Technology Preview scope, some of the preset rules indicated above are also available as groups that you can use in presetRules:

    • ubuntu-cis-rules - this group contains rules to comply with the Ubuntu CIS Benchmark recommendations, including the following CIS Ubuntu 20.04 v2.0.1 rules:

      • scope - 5.2.3.1

      • actions - same as 5.2.3.2

      • time-change - 5.2.3.4

      • system-locale - 5.2.3.5

      • privileged - 5.2.3.6

      • access - 5.2.3.7

      • identity - 5.2.3.8

      • perm-mod - 5.2.3.9

      • mounts - 5.2.3.10

      • session - 5.2.3.11

      • logins - 5.2.3.12

      • delete - 5.2.3.13

      • mac-policy - 5.2.3.14

      • modules - 5.2.3.19

    • docker-cis-rules - this group contains rules to comply with the Docker CIS Benchmark recommendations, including the Docker CIS v1.6.0 rules 1.1.3 - 1.1.18.

    You can also use two additional keywords inside presetRules:

    • none - select no built-in rules.

    • all - select all built-in rules. When using this keyword, you can add the ! prefix to a rule name to exclude it. The all keyword must be the first rule in the list, and any rule with the ! prefix must be placed after it.

    Example configurations:

    • presetRules: none - disable all preset rules

    • presetRules: docker - enable only the docker rules

    • presetRules: access,actions,logins - enable only the access, actions, and logins rules

    • presetRules: ubuntu-cis-rules - enable all rules from the ubuntu-cis-rules group

    • presetRules: docker-cis-rules,actions - enable all rules from the docker-cis-rules group and the actions rule

    • presetRules: all - enable all preset rules

    • presetRules: all,!immutable,!session - enable all preset rules except immutable and session


    CIS controls
    • 4.1.3 (time-change)
    • 4.1.4 (identity)
    • 4.1.5 (system-locale)
    • 4.1.6 (mac-policy)
    • 4.1.7 (logins)
    • 4.1.8 (session)
    • 4.1.9 (perm-mod)
    • 4.1.10 (access)
    • 4.1.11 (privileged)
    • 4.1.12 (mounts)
    • 4.1.13 (delete)
    • 4.1.14 (scope)
    • 4.1.15 (actions)
    • 4.1.16 (modules)
    • 4.1.17 (immutable)

    Docker CIS controls
    • 1.1.4
    • 1.1.8
    • 1.1.10
    • 1.1.12
    • 1.1.13
    • 1.1.15
    • 1.1.16
    • 1.1.17
    • 1.1.18
    • 1.2.3
    • 1.2.4
    • 1.2.5
    • 1.2.6
    • 1.2.7
    • 1.2.10
    • 1.2.11
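
    For illustration only, a filled-in auditd section might look as follows. The values are arbitrary examples based on the parameters described above, not recommendations:

    spec:
      providerSpec:
        value:
          audit:
            auditd:
              enabled: true
              enabledAtBoot: true
              backlogLimit: 8192
              maxLogFile: 10
              maxLogFileAction: keep_logs
              presetRules: docker,access,actions
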
  3. Configure external identity provider integration for IAM:

    LDAP configuration

    Example configuration:

    spec:
      providerSpec:
        value:
          kaas:
            management:
              helmReleases:
              - name: iam
                values:
                  keycloak:
                    userFederation:
                      providers:
                        - displayName: "<LDAP_NAME>"
                          providerName: "ldap"
                          priority: 1
                          fullSyncPeriod: -1
                          changedSyncPeriod: -1
                          config:
                            pagination: "true"
                            debug: "false"
                            searchScope: "1"
                            connectionPooling: "true"
                            usersDn: "<DN>" # "ou=People, o=<ORGANIZATION>, dc=<DOMAIN_COMPONENT>"
                            userObjectClasses: "inetOrgPerson,organizationalPerson"
                            usernameLDAPAttribute: "uid"
                            rdnLDAPAttribute: "uid"
                            vendor: "ad"
                            editMode: "READ_ONLY"
                            uuidLDAPAttribute: "uid"
                            connectionUrl: "ldap://<LDAP_DNS>"
                            syncRegistrations: "false"
                            authType: "simple"
                            bindCredential: ""
                            bindDn: ""
                      mappers:
                        - name: "username"
                          federationMapperType: "user-attribute-ldap-mapper"
                          federationProviderDisplayName: "<LDAP_NAME>"
                          config:
                            ldap.attribute: "uid"
                            user.model.attribute: "username"
                            is.mandatory.in.ldap: "true"
                            read.only: "true"
                            always.read.value.from.ldap: "false"
                        - name: "full name"
                          federationMapperType: "full-name-ldap-mapper"
                          federationProviderDisplayName: "<LDAP_NAME>"
                          config:
                            ldap.full.name.attribute: "cn"
                            read.only: "true"
                            write.only: "false"
                        - name: "last name"
                          federationMapperType: "user-attribute-ldap-mapper"
                          federationProviderDisplayName: "<LDAP_NAME>"
                          config:
                            ldap.attribute: "sn"
                            user.model.attribute: "lastName"
                            is.mandatory.in.ldap: "true"
                            read.only: "true"
                            always.read.value.from.ldap: "true"
                        - name: "email"
                          federationMapperType: "user-attribute-ldap-mapper"
                          federationProviderDisplayName: "<LDAP_NAME>"
                          config:
                            ldap.attribute: "mail"
                            user.model.attribute: "email"
                            is.mandatory.in.ldap: "false"
                            read.only: "true"
                            always.read.value.from.ldap: "true"
    

    Note

    • Verify that the userFederation section is located on the same level as the initUsers section.

    • Verify that all attributes set in the mappers section are defined for users in the specified LDAP system. Missing attributes may cause authorization issues.

    For details, see Configure LDAP for IAM.

    Google OAuth configuration

    Example configuration:

    keycloak:
      externalIdP:
        google:
          enabled: true
          config:
            clientId: <Google_OAuth_client_ID>
            clientSecret: <Google_OAuth_client_secret>
    

    For details, see Configure Google OAuth IdP for IAM.

  4. Disable NTP, which is enabled by default. This option disables the management of the chrony configuration by Container Cloud so that you can use your own system for chrony management. Otherwise, configure the regional NTP server parameters as described below.

    NTP configuration

    Configure the regional NTP server parameters to be applied to all machines of managed clusters.

    In cluster.yaml.template, add the ntp:servers section with the list of required server names:

    spec:
      ...
      providerSpec:
        value:
          ...
          ntpEnabled: true
          kaas:
            ...
            regional:
              - helmReleases:
                - name: <providerName>-provider
                  values:
                    config:
                      lcm:
                        ...
                        ntp:
                          servers:
                          - 0.pool.ntp.org
                          ...
                provider: <providerName>
                ...
    

    To disable NTP:

    spec:
      ...
      providerSpec:
        value:
          ...
          ntpEnabled: false
          ...
    
  5. Applies since Container Cloud 2.26.0 (Cluster release 16.1.0). If you plan to deploy large managed clusters, enable dynamic IP allocation to increase the number of bare metal hosts provisioned in parallel. For details, see Enable dynamic IP allocation.

Now, proceed with completing the bootstrap process using the Container Cloud Bootstrap API as described in Deploy a management cluster using the Container Cloud API.

Post-deployment steps

After bootstrapping the management cluster, collect and save the following cluster details in a secure location:

  1. Obtain the management cluster kubeconfig:

    ./container-cloud get cluster-kubeconfig \
    --kubeconfig <pathToKindKubeconfig> \
    --cluster-name <clusterName>
    

    By default, pathToKindKubeconfig is $HOME/.kube/kind-config-clusterapi.

  2. Obtain the Keycloak credentials as described in Access the Keycloak Admin Console.

  3. Obtain MariaDB credentials for IAM.

  4. Remove the kind cluster:

    ./bin/kind delete cluster -n <kindClusterName>
    

    By default, kindClusterName is clusterapi.

Now, you can proceed with operating your management cluster through the Container Cloud web UI and deploying managed clusters as described in Operations Guide.

Troubleshooting

This section provides solutions to the issues that may occur while deploying a cluster with Container Cloud Bootstrap v2.

Troubleshoot the bootstrap region creation

If the BootstrapRegion object is in the Error state, find the error type in the Status field of the object for the following components to resolve the issue:

Field name

Troubleshooting steps

Helm

If the bootstrap HelmBundle is not ready for a long time, for example, for 15 minutes with an average network bandwidth, verify the statuses of non-ready releases and resolve the issue depending on the error message of a particular release:

kubectl --kubeconfig <pathToKindKubeconfig> \
get helmbundle bootstrap -o json | \
jq '.status.releaseStatuses[] | select(.ready == false) | {name: .chart, message: .message}'

If fixing the issues with Helm releases does not help, collect the Helm Controller logs and filter them by error to find the root cause:

kubectl --kubeconfig <pathToKindKubeconfig> -n kube-system \
logs -lapp=helm-controller | grep "ERROR"

Deployments

If some of the deployments are not ready for a long time while the bootstrap HelmBundle is ready, restart the affected deployments:

kubectl --kubeconfig <pathToKindKubeconfig> \
-n kaas rollout restart deploy <notReadyDeploymentName>

If restarting the affected deployments does not help, collect and assess the logs of the non-ready deployments:

kubectl --kubeconfig <pathToKindKubeconfig> \
-n kaas logs -lapp.kubernetes.io/name=<notReadyDeploymentName>

Provider

The status of this field becomes Ready when all provider-related HelmBundle charts are configured and in the Ready status.

Troubleshoot machines creation

If a Machine object is stuck in the same status for a long time, identify the status phase of the affected machine and proceed as described below.

To verify the status of the created Machine objects:

kubectl --kubeconfig <pathToKindKubeconfig> \
get machines -o jsonpath='{.items[*].status.phase}'
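
To also see which machine is in which state, you can print the machine names alongside the phases using the standard kubectl custom-columns output:

kubectl --kubeconfig <pathToKindKubeconfig> \
get machines -o custom-columns=NAME:.metadata.name,PHASE:.status.phase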

The deployment statuses of a Machine object are the same as the LCMMachine object states:

  1. Uninitialized - the machine is not yet assigned to an LCMCluster.

  2. Pending - the agent reports a node IP address and host name.

  3. Prepare - the machine executes StateItems that correspond to the prepare phase. This phase usually involves downloading the necessary archives and packages.

  4. Deploy - the machine executes StateItems that correspond to the deploy phase, that is, becoming a Mirantis Kubernetes Engine (MKE) node.

  5. Ready - the machine is deployed.

  6. Upgrade - the machine is being upgraded to the new MKE version.

  7. Reconfigure - the machine executes StateItems that correspond to the reconfigure phase. The machine configuration is being updated without affecting workloads running on the machine.

If the system response is empty, approve the BootstrapRegion object:

  • Using the Container Cloud web UI, navigate to the Bootstrap tab and approve the related BootstrapRegion object

  • Using the Container Cloud CLI:

    ./container-cloud bootstrap approve all
    

If the system response is not empty and the status remains the same for a while, the issue may relate to machine misconfiguration. Therefore, verify and adjust the parameters of the affected Machine object. For provider-related issues, refer to the Troubleshooting section.

Troubleshoot deployment stages

If the cluster deployment is stuck on the same stage for a long time, it may be related to configuration issues in the Machine or other deployment objects.

To troubleshoot cluster deployment:

  1. Identify the current deployment stage that got stuck:

    kubectl --kubeconfig <pathToKindKubeconfig> \
    get cluster <cluster-name> -o jsonpath='{.status.bootstrapStatus}{"\n"}'
    

    For the deployment stages description, see Overview of the deployment workflow.

  2. Collect the bootstrap-provider logs and identify a repetitive error that relates to the stuck deployment stage:

    kubectl --kubeconfig <pathToKindKubeconfig> \
    -n kaas logs -lapp.kubernetes.io/name=bootstrap-provider
    
    Examples of repetitive errors

    Error name

    Solution

    Cluster nodes are not yet ready

    Verify the Machine objects configuration.

    Starting pivot

    Contact Mirantis support for further issue assessment.

    Some objects in cluster are not ready with the same deployment names

    Verify the related deployment configuration.

Collect the bootstrap logs

If the bootstrap process is stuck or fails, collect and inspect the bootstrap and management cluster logs.

To collect the bootstrap logs:

If the Cluster object is not created yet
  1. List all available deployments:

    kubectl --kubeconfig <pathToKindKubeconfig> \
    -n kaas get deploy
    
  2. Collect the logs of the required deployment:

    kubectl --kubeconfig <pathToKindKubeconfig> \
    -n kaas logs -lapp.kubernetes.io/name=<deploymentName>
    
If the Cluster object is created

Select from the following options:

  • If a management cluster is not deployed yet:

    CLUSTER_NAME=<clusterName> ./bootstrap.sh collect_logs
    
  • If a management cluster is deployed or pivoting is done:

    1. Obtain the cluster kubeconfig:

      ./container-cloud get cluster-kubeconfig \
      --kubeconfig <pathToKindKubeconfig> \
      --cluster-name <clusterName> \
      --kubeconfig-output <pathToMgmtClusterKubeconfig>
      
    2. Collect the logs:

      CLUSTER_NAME=<cluster-name> \
      KUBECONFIG=<pathToMgmtClusterKubeconfig> \
      ./bootstrap.sh collect_logs
      
    3. Technology Preview. Assess the Ironic pod logs:

      • Extract the content of the 'message' fields from every log message:

        kubectl -n kaas logs <ironicPodName> -c syslog | jq -rRM 'fromjson? | .message'
        
      • Extract the content of the 'message' fields from the ironic_conductor source log messages:

        kubectl -n kaas logs <ironicPodName> -c syslog | jq -rRM 'fromjson? | select(.source == "ironic_conductor") | .message'
        

      The syslog container collects logs generated by Ansible during the node deployment and cleanup and outputs them in the JSON format.

Note

Add COLLECT_EXTENDED_LOGS=true before the collect_logs command to output the extended version of logs that contains system and MKE logs, logs from LCM Ansible and LCM Agent, as well as cluster events, Kubernetes resources descriptions, and logs.

Without this variable, the basic version of logs is collected, which is sufficient for most use cases. The basic version of logs contains all events, Kubernetes custom resources, and logs from all Container Cloud components. This version does not require passing --key-file.

The logs are collected in the directory where the bootstrap script is located.

Logs structure

The Container Cloud logs structure in <output_dir>/<cluster_name>/ is as follows:

  • /events.log

    Human-readable table that contains information about the cluster events.

  • /system

    System logs.

  • /system/mke (or /system/MachineName/mke)

    Mirantis Kubernetes Engine (MKE) logs.

  • /objects/cluster

    Logs of the non-namespaced Kubernetes objects.

  • /objects/namespaced

    Logs of the namespaced Kubernetes objects.

  • /objects/namespaced/<namespaceName>/core/pods

    Logs of the pods from a specific Kubernetes namespace. For example, logs of the pods from the kaas namespace contain logs of Container Cloud controllers, including bootstrap-cluster-controller since Container Cloud 2.25.0.

  • /objects/namespaced/<namespaceName>/core/pods/<containerName>.prev.log

    Logs of the pods from a specific Kubernetes namespace that were previously removed or failed.

  • /objects/namespaced/<namespaceName>/core/pods/<ironicPodName>/syslog.log

    Technology Preview. Ironic pod logs.

    Note

    Logs collected by the syslog container during the bootstrap phase are not transferred to the management cluster during pivoting. These logs are located in /volume/log/ironic/ansible_conductor.log inside the Ironic pod.

Each log entry of the management cluster logs contains a request ID that identifies chronology of actions performed on a cluster or machine. The format of the log entry is as follows:

<process ID>.[<subprocess ID>...<subprocess ID N>].req:<requestID>: <logMessage>

For example, bm.machine.req:28 contains information about the task 28 applied to a bare metal machine.
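
For example, to trace all actions that belong to a particular request in a collected log file, filter the entries by this identifier. The file path below is a placeholder:

grep "bm.machine.req:28" <logFilePath>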

Since Container Cloud 2.22.0, the logging format has the following extended structure for the admission-controller, storage-discovery, and all supported baremetal-provider services of a management cluster:

level:<debug,info,warn,error,panic>,
ts:<YYYY-MM-DDTHH:mm:ssZ>,
logger:<processID>.<subProcessID(s)>.req:<requestID>,
caller:<lineOfCode>,
msg:<message>,
error:<errorMessage>,
stacktrace:<codeInfo>

Since Container Cloud 2.23.0, this structure also applies to the <name>-controller services of a management cluster.

Example of a log extract for baremetal-provider since 2.22.0
{"level":"error","ts":"2022-11-14T21:37:18Z","logger":"bm.cluster.req:318","caller":"lcm/machine.go:808","msg":"","error":"could not determine machine demo-46880 host name”,”stacktrace”:”sigs.k8s.io/cluster-api-provider-baremetal/pkg/lcm.GetMachineConditions\n\t/go/src/sigs.k8s.io/cluster-api-provider-baremetal/pkg/lcm/machine.go:808\nsigs.k8s.io/cluster-api-provider-baremetal/pkg...."}
{"level":"info","ts":"2022-11-14T21:37:23Z","logger":"bm.machine.req:476","caller":"service/reconcile.go:128","msg":"request: default/demo-46880-2"}
{"level":"info","ts":"2022-11-14T21:37:23Z","logger":"bm.machine.req:476","caller":"machine/machine_controller.go:201","msg":"Reconciling Machine \"default/demo-46880-2\""}
{"level":"info","ts":"2022-11-14T21:37:23Z","logger":"bm.machine.req:476","caller":"machine/actuator.go:454","msg":"Checking if machine exists: \"default/demo-46880-2\" (cluster: \"default/demo-46880\")"}
{"level":"info","ts":"2022-11-14T21:37:23Z","logger":"bm.machine.req:476","caller":"machine/machine_controller.go:327","msg":"Reconciling machine \"default/demo-46880-2\" triggers idempotent update"}
{"level":"info","ts":"2022-11-14T21:37:23Z","logger":"bm.machine.req:476","caller":"machine/actuator.go:290","msg":"Updating machine: \"default/demo-46880-2\" (cluster: \"default/demo-46880\")"}
{"level":"info","ts":"2022-11-14T21:37:24Z","logger":"bm.machine.req:476","caller":"lcm/machine.go:73","msg":"Machine in LCM cluster, reconciling LCM objects"}
{"level":"info","ts":"2022-11-14T21:37:26Z","logger":"bm.machine.req:476","caller":"lcm/machine.go:902","msg":"Updating Machine default/demo-46880-2 conditions"}
  • level

    Informational level. Possible values: debug, info, warn, error, panic.

  • ts

    Time stamp in the <YYYY-MM-DDTHH:mm:ssZ> format. For example: 2022-11-14T21:37:23Z.

  • logger

    Details on the process ID being logged:

    • <processID>

      Primary process identifier. The list of possible values includes bm, os, iam, license, and bootstrap.

      Note

      The iam and license values are available since Container Cloud 2.23.0. The bootstrap value is available since Container Cloud 2.25.0.

    • <subProcessID(s)>

      One or more secondary process identifiers. The list of possible values includes cluster, machine, controller, and cluster-ctrl.

      Note

      The controller value is available since Container Cloud 2.23.0. The cluster-ctrl value is available since Container Cloud 2.25.0 for the bootstrap process identifier.

    • req

      Request ID number that increases when a service performs the following actions:

      • Receives a request from Kubernetes about creating, updating, or deleting an object

      • Receives an HTTP request

      • Runs a background process

      The request ID allows combining all operations performed with an object within one request. For example, the result of a Machine object creation, update of its statuses, and so on has the same request ID.

  • caller

    Code line used to apply the corresponding action to an object.

  • msg

    Description of a deployment or update phase. If msg is empty, the log entry contains the "error" key with a message followed by the "stacktrace" key with stack trace details. For example:

    "msg"="" "error"="Cluster nodes are not yet ready" "stacktrace": "<stack-trace-info>"
    

    The log format of the following Container Cloud components does not contain the "stacktrace" key for easier log handling: baremetal-provider, bootstrap-provider, and host-os-modules-controller.

Note

Logs may also include a number of informational key-value pairs containing additional cluster details. For example, "name": "object-name", "foobar": "baz".

Depending on the type of issue found in logs, apply the corresponding fixes. For example, if you detect the LoadBalancer ERROR state errors during the bootstrap of an OpenStack-based management cluster, contact your system administrator to fix the issue.
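
Because the log entries of the extended structure described above are JSON, you can narrow a collected log file down to, for example, error-level messages with jq, similarly to the Ironic log examples earlier in this section. The file path below is a placeholder:

jq -rRM 'fromjson? | select(.level == "error")' <logFilePath>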

Requirements for a MITM proxy

Note

For MOSK, the feature is generally available since MOSK 23.1.

While bootstrapping a Container Cloud management cluster using a proxy, you may require Internet access to go through a man-in-the-middle (MITM) proxy. Such a configuration requires that you enable streaming and install a CA certificate on a bootstrap node.

Enable streaming for MITM

Ensure that the MITM proxy is configured with enabled streaming. For example, if you use mitmproxy, enable the stream_large_bodies=1 option:

./mitmdump --set stream_large_bodies=1
Install a CA certificate for a MITM proxy on a bootstrap node
  1. Log in to the bootstrap node.

  2. Install ca-certificates:

    apt install ca-certificates
    
  3. Copy your CA certificate to the /usr/local/share/ca-certificates/ directory. For example:

    sudo cp ~/.mitmproxy/mitmproxy-ca-cert.cer /usr/local/share/ca-certificates/mitmproxy-ca-cert.crt
    

    Replace ~/.mitmproxy/mitmproxy-ca-cert.cer with the path to your CA certificate.

    Caution

    The target CA certificate file must be in the PEM format with the .crt extension.

  4. Apply the changes:

    sudo update-ca-certificates
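
    Optionally, confirm that the certificate was added to the system bundle. The update-ca-certificates tool creates a symlink in /etc/ssl/certs for every added certificate, so the file name that you copied earlier should appear there, for example:

    ls /etc/ssl/certs | grep -i mitmproxy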
    

Now, proceed with bootstrapping your management cluster.

Create initial users after a management cluster bootstrap

Once you bootstrap your management cluster, create Keycloak users for access to the Container Cloud web UI. Use the created credentials to log in to the Container Cloud web UI.

Mirantis recommends creating at least two users, user and operator, that are required for a typical Container Cloud deployment.

To create the user for access to the Container Cloud web UI, use:

./container-cloud bootstrap user add \
    --username <userName> \
    --roles <roleName> \
    --kubeconfig <pathToMgmtKubeconfig>
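
For example, to create a global user named operator with the operator role for managing BareMetalHost objects (role names are described in the table below; adjust them to your needs):

./container-cloud bootstrap user add \
    --username operator \
    --roles operator \
    --kubeconfig <pathToMgmtKubeconfig>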

Note

You will be asked for the user password interactively.

User creation parameters

Flag

Description

--username

Required. Name of the user to create.

--roles

Required. Comma-separated list of roles to assign to the user.

  • If you run the command without the --namespace flag, you can assign the following roles:

    • global-admin - read and write access for global role bindings

    • writer - read and write access

    • reader - view access

    • operator - create and manage access to the BareMetalHost objects

    • management-admin - full access to the management cluster, available since Container Cloud 2.25.0 (Cluster releases 17.0.0, 16.0.0, 14.1.0)

  • If you run the command for a specific project using the --namespace flag, you can assign the following roles:

    • operator or writer - read and write access

    • user or reader - view access

    • member - read and write access (excluding IAM objects)

    • bm-pool-operator - create and manage access to the BareMetalHost objects

--kubeconfig

Required. Path to the management cluster kubeconfig generated during the management cluster bootstrap.

--namespace

Optional. Name of the Container Cloud project where the user will be created. If not set, a global user will be created for all Container Cloud projects with the corresponding role access to view or manage all Container Cloud public objects.

--password-stdin

Optional. Flag to provide the user password through stdin:

echo '$PASSWORD' | ./container-cloud bootstrap user add \
    --username <userName> \
    --roles <roleName> \
    --kubeconfig <pathToMgmtKubeconfig> \
    --password-stdin

To delete the user, run:

./container-cloud bootstrap user delete --username <userName> --kubeconfig <pathToMgmtKubeconfig>

Troubleshooting

This section provides solutions to the issues that may occur while deploying a management cluster.

Troubleshoot the bootstrap node configuration

This section provides solutions to the issues that may occur while configuring the bootstrap node.

DNS settings

If you have issues related to the DNS settings, the following error message may occur:

curl: (6) Could not resolve host

The issue may occur if a VPN is used to connect to the cloud or a local DNS forwarder is set up.

The workaround is to change the default DNS settings for Docker:

  1. Log in to your local machine.

  2. Identify your internal or corporate DNS server address:

    systemd-resolve --status
    
  3. Create or edit /etc/docker/daemon.json by specifying your DNS address:

    {
      "dns": ["<YOUR_DNS_ADDRESS>"]
    }
    
  4. Restart the Docker daemon:

    sudo systemctl restart docker
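
    To confirm that containers now use the specified DNS server, you can run a quick lookup from a disposable container. Replace <hostName> with any name that your DNS server resolves:

    docker run --rm busybox nslookup <hostName>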
    
Default network addresses

If you have issues related to the default network address configuration, curl either hangs or the following error occurs:

curl: (7) Failed to connect to xxx.xxx.xxx.xxx port xxxx: Host is unreachable

The issue may occur because the default Docker network address 172.17.0.0/16 and/or the kind Docker network overlap with your cloud address or other addresses of the network configuration.

Workaround:

  1. Log in to your local machine.

  2. Verify routing to the IP addresses of the target cloud endpoints:

    1. Obtain the IP address of your target cloud. For example:

      nslookup auth.openstack.example.com
      

      Example of system response:

      Name:   auth.openstack.example.com
      Address: 172.17.246.119
      
    2. Verify that this IP address is not routed through docker0 but through any other interface, for example, ens3:

      ip r get 172.17.246.119
      

      Example of the system response if the routing is configured correctly:

      172.17.246.119 via 172.18.194.1 dev ens3 src 172.18.1.1 uid 1000
        cache
      

      Example of the system response if the routing is configured incorrectly:

      172.17.246.119 via 172.18.194.1 dev docker0 src 172.18.1.1 uid 1000
        cache
      
  3. If the routing is incorrect, change the IP address of the default Docker bridge:

    1. Create or edit /etc/docker/daemon.json by adding the "bip" option:

      {
        "bip": "192.168.91.1/24"
      }
      
    2. Restart the Docker daemon:

      sudo systemctl restart docker
      
  4. If required, customize addresses for your kind Docker network or any other additional Docker networks:

    1. Remove the kind network:

      docker network rm 'kind'
      
    2. Choose from the following options:

      • Configure /etc/docker/daemon.json:

        Note

        The following steps apply to customizing addresses for the kind Docker network. Use these steps as an example for any other additional Docker networks.

        1. Add the following section to /etc/docker/daemon.json:

          {
           "default-address-pools":
           [
             {"base":"192.169.0.0/16","size":24}
           ]
          }
          
        2. Restart the Docker daemon:

          sudo systemctl restart docker
          

          After Docker restart, the newly created local or global scope networks, including 'kind', will be dynamically assigned a subnet from the defined pool.

      • Recreate the 'kind' Docker network manually with a subnet that is not in use in your network. For example:

        docker network create -o com.docker.network.bridge.enable_ip_masquerade=true -d bridge --subnet 192.168.0.0/24 'kind'
        

        Caution

        Docker pruning removes the user-defined networks, including 'kind'. Therefore, every time after running the Docker pruning commands, re-create the 'kind' network using the command above.

Configure external identity provider for IAM

This section describes how to configure authentication for Mirantis Container Cloud depending on the external identity provider type integrated to your deployment.

Configure LDAP for IAM

If you integrate LDAP for IAM to Mirantis Container Cloud, add the required LDAP configuration to cluster.yaml.template during the bootstrap of the management cluster.

Note

The example below defines the recommended non-anonymous authentication type. If you require anonymous authentication, replace the following parameters with authType: "none":

authType: "simple"
bindCredential: ""
bindDn: ""

To configure LDAP for IAM:

  1. Open templates/bm/cluster.yaml.template.

  2. Configure the keycloak:userFederation:providers: and keycloak:userFederation:mappers: sections as required:

    spec:
      providerSpec:
        value:
          kaas:
            management:
              helmReleases:
              - name: iam
                values:
                  keycloak:
                    userFederation:
                      providers:
                        - displayName: "<LDAP_NAME>"
                          providerName: "ldap"
                          priority: 1
                          fullSyncPeriod: -1
                          changedSyncPeriod: -1
                          config:
                            pagination: "true"
                            debug: "false"
                            searchScope: "1"
                            connectionPooling: "true"
                            usersDn: "<DN>" # "ou=People, o=<ORGANIZATION>, dc=<DOMAIN_COMPONENT>"
                            userObjectClasses: "inetOrgPerson,organizationalPerson"
                            usernameLDAPAttribute: "uid"
                            rdnLDAPAttribute: "uid"
                            vendor: "ad"
                            editMode: "READ_ONLY"
                            uuidLDAPAttribute: "uid"
                            connectionUrl: "ldap://<LDAP_DNS>"
                            syncRegistrations: "false"
                            authType: "simple"
                            bindCredential: ""
                            bindDn: ""
                      mappers:
                        - name: "username"
                          federationMapperType: "user-attribute-ldap-mapper"
                          federationProviderDisplayName: "<LDAP_NAME>"
                          config:
                            ldap.attribute: "uid"
                            user.model.attribute: "username"
                            is.mandatory.in.ldap: "true"
                            read.only: "true"
                            always.read.value.from.ldap: "false"
                        - name: "full name"
                          federationMapperType: "full-name-ldap-mapper"
                          federationProviderDisplayName: "<LDAP_NAME>"
                          config:
                            ldap.full.name.attribute: "cn"
                            read.only: "true"
                            write.only: "false"
                        - name: "last name"
                          federationMapperType: "user-attribute-ldap-mapper"
                          federationProviderDisplayName: "<LDAP_NAME>"
                          config:
                            ldap.attribute: "sn"
                            user.model.attribute: "lastName"
                            is.mandatory.in.ldap: "true"
                            read.only: "true"
                            always.read.value.from.ldap: "true"
                        - name: "email"
                          federationMapperType: "user-attribute-ldap-mapper"
                          federationProviderDisplayName: "<LDAP_NAME>"
                          config:
                            ldap.attribute: "mail"
                            user.model.attribute: "email"
                            is.mandatory.in.ldap: "false"
                            read.only: "true"
                            always.read.value.from.ldap: "true"
    

    Note

    • Verify that the userFederation section is located on the same level as the initUsers section.

    • Verify that all attributes set in the mappers section are defined for users in the specified LDAP system. Missing attributes may cause authorization issues.

Now, return to the bootstrap instruction of your management cluster.

Configure Google OAuth IdP for IAM

Caution

The instruction below applies to the DNS-based management clusters. If you bootstrap a non-DNS-based management cluster, configure Google OAuth IdP for Keycloak after bootstrap using the official Keycloak documentation.

If you integrate Google OAuth external identity provider for IAM to Mirantis Container Cloud, create the authorization credentials for IAM in your Google OAuth account and configure cluster.yaml.template during the bootstrap of the management cluster.

To configure Google OAuth IdP for IAM:

  1. Create Google OAuth credentials for IAM:

    1. Log in to https://console.developers.google.com.

    2. Navigate to Credentials.

    3. In the APIs Credentials menu, select OAuth client ID.

    4. In the window that opens:

      1. In the Application type menu, select Web application.

      2. In the Authorized redirect URIs field, type in <keycloak-url>/auth/realms/iam/broker/google/endpoint, where <keycloak-url> is the corresponding DNS address.

      3. Press Enter to add the URI.

      4. Click Create.

      A page with your client ID and client secret opens. Save these credentials for further usage.

  2. Log in to the bootstrap node.

  3. Open templates/bm/cluster.yaml.template.

  4. In the keycloak:externalIdP: section, add the following snippet with your credentials created in previous steps:

    keycloak:
      externalIdP:
        google:
          enabled: true
          config:
            clientId: <Google_OAuth_client_ID>
            clientSecret: <Google_OAuth_client_secret>
    

Now, return to the bootstrap instruction of your management cluster.

Operations Guide

Mirantis Container Cloud CLI

This section was moved to MOSK documentation: Container Cloud CLI.

Create and operate managed clusters

Note

This tutorial applies only to the Container Cloud web UI users with the m:kaas:namespace@operator or m:kaas:namespace@writer access role assigned by the Infrastructure Operator. To add a bare metal host, the m:kaas@operator or m:kaas:namespace@bm-pool-operator role is required.

After you deploy the Mirantis Container Cloud management cluster, you can start creating managed clusters depending on your cloud needs.

The deployment procedure is performed using the Container Cloud web UI and comprises the following steps:

  1. Create a dedicated non-default project for managed clusters.

  2. Create and configure bare metal hosts with corresponding labels for machines such as worker, manager, or storage.

  3. Create an initial cluster configuration.

  4. Add the required amount of machines with the corresponding configuration to the managed cluster.

  5. Add a Ceph cluster.

Note

The Container Cloud web UI communicates with Keycloak to authenticate users. Keycloak is exposed using HTTPS with self-signed TLS certificates that are not trusted by web browsers.

To use your own TLS certificates for Keycloak, refer to Configure TLS certificates for cluster applications.

Create a project for managed clusters

Note

The procedure below applies only to the Container Cloud web UI users with the m:kaas@global-admin or m:kaas@writer access role assigned by the Infrastructure Operator.

The default project (Kubernetes namespace) in Container Cloud is dedicated for management clusters only. Managed clusters require a separate project. You can create as many projects as required by your company infrastructure.

To create a project for managed clusters using the Container Cloud web UI:

  1. Log in to the Container Cloud web UI as m:kaas@global-admin or m:kaas@writer.

  2. In the Projects tab, click Create.

  3. Type the new project name.

  4. Click Create.

Note

Due to the known issue 50168, access to the newly created project becomes available in five minutes after project creation.

Generate a kubeconfig for a managed cluster using API

This section was moved to Mirantis OpenStack for Kubernetes documentation: Getting access - Generate a kubeconfig for a cluster using API.

Create and operate a baremetal-based managed cluster

After bootstrapping your baremetal-based Mirantis Container Cloud management cluster as described in Deploy a Container Cloud management cluster, you can start creating the baremetal-based managed clusters.

Add a bare metal host

The subsections of this section were moved to MOSK Deployment Guide: Add a bare metal host.

Add a bare metal host using web UI

This section was moved to MOSK Deployment Guide: Add a bare metal host using web UI.

Add a bare metal host using CLI

This section was moved to MOSK Deployment Guide: Add a bare metal host using CLI.

Create a custom bare metal host profile

The subsections of this section were moved to MOSK Deployment Guide: Create MOSK host profiles.

Default configuration of the host system storage

This section was moved to MOSK Deployment Guide: Default configuration of the host system storage.

Wipe a device or partition

Available since 2.26.0 (17.1.0 and 16.1.0)

This section was moved to MOSK Deployment Guide: Wipe a device or partition.

Create a custom host profile

This section was moved to MOSK Deployment Guide: Create a custom host profile.

Configure Ceph disks in a host profile

This section was moved to MOSK Deployment Guide: Configure Ceph disks in a host profile.

Enable huge pages

This section was moved to MOSK Deployment Guide: Enable huge pages.

Configure RAID support

Caution

This feature is available as Technology Preview. Use such configuration for testing and evaluation purposes only. For the Technology Preview feature definition, refer to Technology Preview features.

The subsections of this section were moved to MOSK Deployment Guide: Configure RAID support.

Create an LVM software RAID level 1 (raid1)

Caution

This feature is available as Technology Preview. Use such configuration for testing and evaluation purposes only. For the Technology Preview feature definition, refer to Technology Preview features.

This section was moved to MOSK Deployment Guide: Create an LVM software RAID level 1 (raid1).

Create an mdadm software RAID level 1 (raid1)

Caution

This feature is available as Technology Preview. Use such configuration for testing and evaluation purposes only. For the Technology Preview feature definition, refer to Technology Preview features.

This section was moved to MOSK Deployment Guide: Create an mdadm software RAID level 1 (raid1).

Create an mdadm software RAID level 10 (raid10)

Technology Preview

This section was moved to MOSK Deployment Guide: Create an mdadm software RAID level 10 (raid10).

Add a managed baremetal cluster

This section instructs you on how to configure and deploy a managed cluster that is based on the baremetal-based management cluster.

By default, Mirantis Container Cloud configures a single interface on the cluster nodes, leaving all other physical interfaces intact.

With L2 networking templates, you can create advanced host networking configurations for your clusters. For example, you can create bond interfaces on top of physical interfaces on the host or use multiple subnets to separate different types of network traffic.

You can use several host-specific L2 templates per cluster to support different hardware configurations. For example, you can create L2 templates with a different number and layout of NICs to be applied to specific machines of one cluster.

Caution

Modification of L2 templates in use is allowed with a mandatory validation step from the Infrastructure Operator to prevent accidental cluster failures due to unsafe changes. The list of risks posed by modifying L2 templates includes:

  • Services running on hosts cannot reconfigure automatically to switch to the new IP addresses and/or interfaces.

  • Connections between services are interrupted unexpectedly, which can cause data loss.

  • Incorrect configurations on hosts can lead to irrevocable loss of connectivity between services and unexpected cluster partition or disassembly.

For details, see Modify network configuration on an existing machine.

Since Container Cloud 2.24.4, in the Technology Preview scope, you can create a managed cluster with a multi-rack topology, where cluster nodes including Kubernetes masters are distributed across multiple racks without L2 layer extension between them, and use BGP for announcement of the cluster API load balancer address and external addresses of Kubernetes load-balanced services.

Implementation of the multi-rack topology implies the use of Rack and MultiRackCluster objects that support configuration of BGP announcement of the cluster API load balancer address. For the configuration procedure, refer to Configure BGP announcement for cluster API LB address. For configuring the BGP announcement of external addresses of Kubernetes load-balanced services, refer to Configure MetalLB.

Follow the procedures described in the below subsections to configure initial settings and advanced network objects for your managed clusters.

Create a cluster using web UI

This section instructs you on how to create initial configuration of a managed cluster that is based on the baremetal-based management cluster through the Mirantis Container Cloud web UI.

Note

Due to the known issue 50181, creation of a compact managed cluster or addition of any labels to the control plane nodes is not available through the Container Cloud web UI.

To create a managed cluster on bare metal:

  1. Available since the Cluster release 16.1.0 on the management cluster. If you plan to deploy a large managed cluster, enable dynamic IP allocation to increase the amount of baremetal hosts to be provisioned in parallel. For details, see Enable dynamic IP allocation.

  2. Available since Container Cloud 2.24.0. Optional. Technology Preview. Enable custom host names for cluster machines. When enabled, any machine host name in a particular region matches the related Machine object name. For example, instead of the default kaas-node-<UID>, a machine host name will be master-0. The custom naming format is more convenient and easier to operate with.

    For details, see Configure host names for cluster machines.

    If you enabled this feature during management cluster bootstrap, skip this step, as the feature applies to any cluster type.

  3. Log in to the Container Cloud web UI with the m:kaas:namespace@operator or m:kaas:namespace@writer permissions.

  4. Switch to the required non-default project using the Switch Project action icon located on top of the main left-side navigation panel.

    To create a project, refer to Create a project for managed clusters.

  5. Optional. In the SSH Keys tab, click Add SSH Key to upload the public SSH key(s) for SSH access to VMs.

  6. Optional. Enable proxy access to the cluster.

    In the Proxies tab, configure proxy:

    1. Click Add Proxy.

    2. In the Add New Proxy wizard, fill out the form with the following parameters:

      Proxy configuration

      Parameter

      Description

      Proxy Name

      Name of the proxy server to use during cluster creation.

      Region Removed in 2.26.0 (16.1.0 and 17.1.0)

      From the drop-down list, select the required region.

      HTTP Proxy

      Add the HTTP proxy server domain name in the following format:

      • http://proxy.example.com:port - for anonymous access

      • http://user:password@proxy.example.com:port - for restricted access

      HTTPS Proxy

      Add the HTTPS proxy server domain name in the same format as for HTTP Proxy.

      No Proxy

      Comma-separated list of IP addresses or domain names.

      For implementation details, see Proxy and cache support.

    3. If your proxy requires a trusted CA certificate, select the CA Certificate check box and paste a CA certificate for a MITM proxy to the corresponding field or upload a certificate using Upload Certificate.

    For MOSK-based deployments, the possibility to use a MITM proxy with a CA certificate is available since MOSK 23.1.

    For the list of Mirantis resources and IP addresses to be accessible from the Container Cloud clusters, see Requirements.

  7. In the Clusters tab, click Create Cluster.

  8. Configure the new cluster in the Create New Cluster wizard that opens:

    1. Define general and Kubernetes parameters:

      Create new cluster: General, Provider, and Kubernetes

      Section

      Parameter name

      Description

      General settings

      Cluster name

      The cluster name.

      Provider

      Select Baremetal.

      Region Removed in 2.26.0 (17.1.0 and 16.1.0)

      From the drop-down list, select Baremetal.

      Release version

      The Container Cloud version.

      Proxy

      Optional. From the drop-down list, select the proxy server name that you have previously created.

      SSH keys

      From the drop-down list, select the SSH key name(s) that you have previously added for SSH access to the bare metal hosts.

      Container Registry

      From the drop-down list, select the Docker registry name that you have previously added using the Container Registries tab. For details, see Define a custom CA certificate for a private Docker registry.

      Note

      For MOSK-based deployments, the feature support is available since MOSK 22.5.

      Enable WireGuard

      Optional. Technology Preview. Deprecated since Container Cloud 2.29.0 (Cluster releases 17.4.0 and 16.4.0). Available since Container Cloud 2.24.0 (Cluster release 14.0.0). Enable WireGuard for traffic encryption on the Kubernetes workloads network.

      WireGuard configuration
      1. Ensure that the Calico MTU size is at least 60 bytes smaller than the interface MTU size of the workload network. IPv4 WireGuard uses a 60-byte header. For details, see Set the MTU size for Calico.

      2. Enable WireGuard by selecting the Enable WireGuard check box.

        Caution

        Changing this parameter on a running cluster causes a downtime that can vary depending on the cluster size.

      For more details about WireGuard, see Calico documentation: Encrypt in-cluster pod traffic.

      Note

      This parameter was renamed from Enable Secure Overlay to Enable WireGuard in Cluster releases 17.0.0 and 16.0.0.

      Parallel Upgrade Of Worker Machines

      Optional. Available since Cluster releases 17.0.0 and 16.0.0.

      The maximum number of worker nodes to update simultaneously. It serves as an upper limit on the number of machines that are drained at any given moment. Defaults to 1.

      You can also configure this option after deployment before the cluster update.

      Parallel Preparation For Upgrade Of Worker Machines

      Optional. Available since Cluster releases 17.0.0 and 16.0.0.

      The maximum number of worker nodes being prepared at any given moment, which includes downloading of new artifacts. It serves as a limit for the network load that can occur when downloading the files to the nodes. Defaults to 50.

      You can also configure this option after deployment before the cluster update.

      Provider

      LB host IP

      The IP address of the load balancer endpoint that will be used to access the Kubernetes API of the new cluster. This IP address must be in the LCM network if a separate LCM network is in use and if L2 (ARP) announcement of cluster API load balancer IP is in use.

      LB address range

      Removed in Container Cloud 2.28.0 (Cluster releases 17.3.0 and 16.3.0). The range of IP addresses that can be assigned to load balancers for Kubernetes Services by MetalLB. For a more flexible MetalLB configuration, refer to Configure MetalLB.

      Note

      Since Container Cloud 2.28.0 (Cluster releases 17.3.0 and 16.3.0), MetalLB configuration must be added after cluster creation.

      Kubernetes

      Services CIDR blocks

      The Kubernetes Services CIDR blocks. For example, 10.233.0.0/18.

      Pods CIDR blocks

      The Kubernetes pods CIDR blocks. For example, 10.233.64.0/18.

      Note

      The network subnet size of Kubernetes pods influences the number of nodes that can be deployed in the cluster. The default subnet size /18 is enough to create a cluster with up to 256 nodes: each node uses a /26 address block (64 addresses), and a /18 subnet contains 16384 / 64 = 256 such blocks. At least one address block is allocated per node. These addresses are used by the Kubernetes pods with hostNetwork: false. The cluster size may be limited further if some nodes use more than one address block.

    2. Configure StackLight:

      Note

      If StackLight is enabled in non-HA mode but Ceph is not deployed yet, StackLight will not be installed and will be stuck in the Yellow state waiting for a successful Ceph installation. Once the Ceph cluster is deployed, the StackLight installation resumes. To deploy a Ceph cluster, refer to Add a Ceph cluster.

      Section

      Parameter name

      Description

      StackLight

      Enable Monitoring

      Selected by default. Deselect to skip StackLight deployment. You can also enable, disable, or configure StackLight parameters after deploying a managed cluster. For details, see Change a cluster configuration or Configure StackLight.

      Enable Logging

      Select to deploy the StackLight logging stack. For details about the logging components, see Deployment architecture.

      Note

      The logging mechanism performance depends on the cluster log load. In case of a high load, you may need to increase the default resource requests and limits for fluentdLogs. For details, see StackLight configuration parameters: Resource limits.

      HA Mode

      Select to enable StackLight monitoring in the HA mode. For the differences between HA and non-HA modes, see Deployment architecture. If disabled, StackLight requires a Ceph cluster. To deploy a Ceph cluster, refer to Add a Ceph cluster.

      StackLight Default Logs Severity Level

      Log severity (verbosity) level for all StackLight components. The default value for this parameter is Default component log level that respects original defaults of each StackLight component. For details about severity levels, see MOSK Operations Guide: StackLight configuration parameters - Log verbosity.

      StackLight Component Logs Severity Level

      The severity level of logs for a specific StackLight component that overrides the value of the StackLight Default Logs Severity Level parameter. For details about severity levels, see MOSK Operations Guide: StackLight configuration parameters - Log verbosity.

      Expand the drop-down menu for a specific component to display its list of available log levels.

      OpenSearch

      Logstash Retention Time

      Skip this parameter since Container Cloud 2.26.0 (17.1.0, 16.1.0). It was removed from the code base and will be removed from the web UI in one of the following releases.

      Available if you select Enable Logging. Specifies the logstash-* index retention time.

      Events Retention Time

      Available if you select Enable Logging. Specifies the kubernetes_events-* index retention time.

      Notifications Retention

      Available if you select Enable Logging. Specifies the notification-* index retention time and is used for Mirantis OpenStack for Kubernetes.

      Persistent Volume Claim Size

      Available if you select Enable Logging. The OpenSearch persistent volume claim size.

      Collected Logs Severity Level

      Available if you select Enable Logging. The minimum severity of all Container Cloud components logs collected in OpenSearch. For details about severity levels, see MOSK Operations Guide: StackLight configuration parameters - Logging.

      Prometheus

      Retention Time

      The Prometheus database retention period.

      Retention Size

      The Prometheus database retention size.

      Persistent Volume Claim Size

      The Prometheus persistent volume claim size.

      Enable Watchdog Alert

      Select to enable the Watchdog alert that fires as long as the entire alerting pipeline is functional.

      Custom Alerts

      Specify alerting rules for new custom alerts or upload a YAML file in the following exemplary format:

      - alert: HighErrorRate
        expr: job:request_latency_seconds:mean5m{job="myjob"} > 0.5
        for: 10m
        labels:
          severity: page
        annotations:
          summary: High request latency
      

      For details, see Official Prometheus documentation: Alerting rules. For the list of the predefined StackLight alerts, see Operations Guide: Available StackLight alerts.

      StackLight Email Alerts

      Enable Email Alerts

      Select to enable the StackLight email alerts.

      Send Resolved

      Select to enable notifications about resolved StackLight alerts.

      Require TLS

      Select to enable transmitting emails through TLS.

      Email alerts configuration for StackLight

      Fill out the following email alerts parameters as required:

      • To - the email address to send notifications to.

      • From - the sender address.

      • SmartHost - the SMTP host through which the emails are sent.

      • Authentication username - the SMTP user name.

      • Authentication password - the SMTP password.

      • Authentication identity - the SMTP identity.

      • Authentication secret - the SMTP secret.

      StackLight Slack Alerts

      Enable Slack alerts

      Select to enable the StackLight Slack alerts.

      Send Resolved

      Select to enable notifications about resolved StackLight alerts.

      Slack alerts configuration for StackLight

      Fill out the following Slack alerts parameters as required:

      • API URL - The Slack webhook URL.

      • Channel - The channel to send notifications to, for example, #channel-for-alerts.

  9. Available since Container Cloud 2.24.0 and 2.24.2 for MOSK 23.2. Optional. Technology Preview. Enable the Linux Audit daemon auditd to monitor activity of cluster processes and prevent potential malicious activity.

    Configuration for auditd

    In the Cluster object, add the auditd parameters:

    spec:
      providerSpec:
        value:
          audit:
            auditd:
              enabled: <bool>
              enabledAtBoot: <bool>
              backlogLimit: <int>
              maxLogFile: <int>
              maxLogFileAction: <string>
              maxLogFileKeep: <int>
              mayHaltSystem: <bool>
              presetRules: <string>
              customRules: <string>
              customRulesX32: <text>
              customRulesX64: <text>
    

    Configuration parameters for auditd:

    enabled

    Boolean, default - false. Enables the auditd role to install the auditd packages and configure rules. CIS rules: 4.1.1.1, 4.1.1.2.

    enabledAtBoot

    Boolean, default - false. Configures grub to audit processes that can be audited even if they start up prior to auditd startup. CIS rule: 4.1.1.3.

    backlogLimit

    Integer, default - none. Configures the backlog to hold records. If audit=1 is configured during boot, the backlog holds 64 records. If more than 64 records are created during boot, audit records are lost and potential malicious activity can remain undetected. CIS rule: 4.1.1.4.

    maxLogFile

    Integer, default - none. Configures the maximum size of the audit log file. Once the log reaches the maximum size, it is rotated and a new log file is created. CIS rule: 4.1.2.1.

    maxLogFileAction

    String, default - none. Defines handling of the audit log file reaching the maximum file size. Allowed values:

    • keep_logs - rotate logs but never delete them

    • rotate - add a cron job to compress rotated log files and keep a maximum of 5 compressed files.

    • compress - compress log files and keep them under the /var/log/auditd/ directory. Requires auditd_max_log_file_keep to be enabled.

    CIS rule: 4.1.2.2.

    maxLogFileKeep

    Integer, default - 5. Defines the number of compressed log files to keep under the /var/log/auditd/ directory. Requires auditd_max_log_file_action=compress. CIS rules - none.

    mayHaltSystem

    Boolean, default - false. Halts the system when the audit logs are full. Applies the following configuration:

    • space_left_action = email

    • action_mail_acct = root

    • admin_space_left_action = halt

    CIS rule: 4.1.2.3.

    customRules

    String, default - none. Base64-encoded content of the 60-custom.rules file for any architecture. CIS rules - none.

    customRulesX32

    String, default - none. Base64-encoded content of the 60-custom.rules file for the i386 architecture. CIS rules - none.

    customRulesX64

    String, default - none. Base64-encoded content of the 60-custom.rules file for the x86_64 architecture. CIS rules - none.

    presetRules

    String, default - none. Comma-separated list of the following built-in preset rules:

    • access

    • actions

    • delete

    • docker

    • identity

    • immutable

    • logins

    • mac-policy

    • modules

    • mounts

    • perm-mod

    • privileged

    • scope

    • session

    • system-locale

    • time-change

    Since Container Cloud 2.28.0 (Cluster releases 17.3.0 and 16.3.0) in the Technology Preview scope, you can use the following groups that combine some of the preset rules listed above in presetRules:

    • ubuntu-cis-rules - this group contains rules to comply with the Ubuntu CIS Benchmark recommendations, including the following CIS Ubuntu 20.04 v2.0.1 rules:

      • scope - 5.2.3.1

      • actions - same as 5.2.3.2

      • time-change - 5.2.3.4

      • system-locale - 5.2.3.5

      • privileged - 5.2.3.6

      • access - 5.2.3.7

      • identity - 5.2.3.8

      • perm-mod - 5.2.3.9

      • mounts - 5.2.3.10

      • session - 5.2.3.11

      • logins - 5.2.3.12

      • delete - 5.2.3.13

      • mac-policy - 5.2.3.14

      • modules - 5.2.3.19

    • docker-cis-rules - this group contains rules to comply with the Docker CIS Benchmark recommendations, including the docker preset rule that covers the Docker CIS v1.6.0 rules 1.1.3 - 1.1.18.

    You can also use two additional keywords inside presetRules:

    • none - select no built-in rules.

    • all - select all built-in rules. When using this keyword, you can add the ! prefix to a rule name to exclude some rules. You can use the ! prefix for rules only if you add the all keyword as the first rule. Place a rule with the ! prefix only after the all keyword.

    Example configurations:

    • presetRules: none - disable all preset rules

    • presetRules: docker - enable only the docker rules

    • presetRules: access,actions,logins - enable only the access, actions, and logins rules

    • presetRules: ubuntu-cis-rules - enable all rules from the ubuntu-cis-rules group

    • presetRules: docker-cis-rules,actions - enable all rules from the docker-cis-rules group and the actions rule

    • presetRules: all - enable all preset rules

    • presetRules: all,!immutable,!session - enable all preset rules except immutable and session


    CIS controls (with the corresponding preset rules):

    • 4.1.3 (time-change)
    • 4.1.4 (identity)
    • 4.1.5 (system-locale)
    • 4.1.6 (mac-policy)
    • 4.1.7 (logins)
    • 4.1.8 (session)
    • 4.1.9 (perm-mod)
    • 4.1.10 (access)
    • 4.1.11 (privileged)
    • 4.1.12 (mounts)
    • 4.1.13 (delete)
    • 4.1.14 (scope)
    • 4.1.15 (actions)
    • 4.1.16 (modules)
    • 4.1.17 (immutable)

    Docker CIS controls:

    • 1.1.4
    • 1.1.8
    • 1.1.10
    • 1.1.12
    • 1.1.13
    • 1.1.15
    • 1.1.16
    • 1.1.17
    • 1.1.18
    • 1.2.3
    • 1.2.4
    • 1.2.5
    • 1.2.6
    • 1.2.7
    • 1.2.10
    • 1.2.11
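
    Putting the parameters above together, the following is a minimal sketch of a filled-in audit section, assuming the skeleton shown earlier in this step. The values are illustrative, and the customRules value is a placeholder for the Base64-encoded content of a local 60-custom.rules file, which you can produce, for example, with base64 -w0 60-custom.rules:

    spec:
      providerSpec:
        value:
          audit:
            auditd:
              enabled: true
              enabledAtBoot: true
              maxLogFile: 30
              maxLogFileAction: rotate
              maxLogFileKeep: 5
              presetRules: all,!immutable
              customRules: <base64-encoded-60-custom.rules-content>
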
  10. Click Create.

    To monitor the cluster readiness, hover over the status icon of a specific cluster in the Status column of the Clusters page.

    Once the orange blinking status icon becomes green and Ready, the cluster deployment or update is complete.

    You can monitor live deployment status of the following cluster components:

    Component

    Description

    Helm

    Installation or upgrade status of all Helm releases

    Kubelet

    Readiness of the node in a Kubernetes cluster, as reported by kubelet

    Kubernetes

    Readiness of all requested Kubernetes objects

    Nodes

    Whether the number of requested nodes in the cluster equals the number of nodes that have the Ready LCM status

    OIDC

    Readiness of the cluster OIDC configuration

    StackLight

    Health of all StackLight-related objects in a Kubernetes cluster

    Swarm

    Readiness of all nodes in a Docker Swarm cluster

    LoadBalancer

    Readiness of the Kubernetes API load balancer

    ProviderInstance

    Readiness of all machines in the underlying infrastructure (virtual or bare metal, depending on the provider type)

    Graceful Reboot

    Readiness of a cluster during a scheduled graceful reboot, available since Cluster releases 15.0.1 and 14.0.0.

    Infrastructure Status

    Available since Container Cloud 2.25.0 (Cluster releases 17.0.0 and 16.0.0). Readiness of the MetalLBConfig object along with MetalLB and DHCP subnets.

    LCM Operation

    Available since Container Cloud 2.26.0 (Cluster releases 17.1.0 and 16.1.0). Health of all LCM operations on the cluster and its machines.

    LCM Agent

    Available since Container Cloud 2.27.0 (Cluster releases 17.2.0 and 16.2.0). Health of all LCM agents on cluster machines and the status of LCM agents update to the version from the current Cluster release.

    For the history of a cluster deployment or update, refer to Inspect the history of a cluster and machine deployment or update.

  11. Configure MetalLB as described in Configure MetalLB.

  12. Create and add required subnets as described in Create subnets.

  13. Configure an L2 template for a new cluster. For initial details, see Workflow of network interface naming.

Workflow of network interface naming

To simplify operations with L2 templates, before you start creating them, inspect the general workflow of network interface name gathering and processing.

Network interface naming workflow:

  1. The Operator creates a BareMetalHostInventory object.

    Note

    Before update of the management cluster to Container Cloud 2.29.0 (Cluster release 16.4.0), instead of BareMetalHostInventory, use the BareMetalHost object. For details, see BareMetalHost.

    Caution

    While the Cluster release of the management cluster is 16.4.0, BareMetalHostInventory operations are allowed to m:kaas@management-admin only. Once the management cluster is updated to the Cluster release 16.4.1 (or later), this limitation will be lifted.

  2. The BareMetalHostInventory object executes the introspection stage and becomes ready.

  3. The Operator collects information about NIC count, naming, and so on for further changes in the mapping logic.

    At this stage, the NIC order in the object may change randomly during each introspection, but the NIC names always stay the same. For more details, see Predictable Network Interface Names.

    For example:

    # Example commands:
    # kubectl -n managed-ns get bmh baremetalhost1 -o custom-columns='NAME:.metadata.name,STATUS:.status.provisioning.state'
    # NAME            STATUS
    # baremetalhost1  ready
    
    # kubectl -n managed-ns get bmh baremetalhost1 -o yaml
    # Example output:
    
    apiVersion: metal3.io/v1alpha1
    kind: BareMetalHost
    ...
    status:
    ...
        nics:
        - ip: fe80::ec4:7aff:fe6a:fb1f%eno2
          mac: 0c:c4:7a:6a:fb:1f
          model: 0x8086 0x1521
          name: eno2
          pxe: false
        - ip: fe80::ec4:7aff:fe1e:a2fc%ens1f0
          mac: 0c:c4:7a:1e:a2:fc
          model: 0x8086 0x10fb
          name: ens1f0
          pxe: false
        - ip: fe80::ec4:7aff:fe1e:a2fd%ens1f1
          mac: 0c:c4:7a:1e:a2:fd
          model: 0x8086 0x10fb
          name: ens1f1
          pxe: false
        - ip: 192.168.1.151 # Temp. PXE network address
          mac: 0c:c4:7a:6a:fb:1e
          model: 0x8086 0x1521
          name: eno1
          pxe: true
     ...
    
  4. The Operator selects from the following options:

  5. The Operator creates a Machine or Subnet object.

  6. The baremetal-provider service links the Machine object to the BareMetalHostInventory object.

  7. The kaas-ipam and baremetal-provider services collect hardware information from the BareMetalHostInventory object and use it to configure host networking and services.

  8. The kaas-ipam service:

    1. Spawns the IpamHost object.

    2. Renders the l2template object.

    3. Spawns the ipaddr object.

    4. Updates the IpamHost object status with all rendered and linked information.

  9. The baremetal-provider service collects the rendered networking information from the IpamHost object.

  10. The baremetal-provider service proceeds with the IpamHost object provisioning.

Create subnets

After creating a basic Cluster object along with the MetalLBConfig object and before creating an L2 template, create the required subnets that can be used in the L2 template to allocate IP addresses for the managed cluster nodes. Where required, create a number of subnets for a particular project using the Subnet CR.

Each subnet used in an L2 template has a logical scope that is set using the scope parameter in the corresponding L2Template.spec.l3Layout section (see the sketch after the following list). One of the following logical scopes is used for each subnet referenced in an L2 template:

  • global - CR uses the default namespace. A subnet can be used for any cluster located in any project.

  • namespaced - CR uses the namespace that corresponds to a particular project where managed clusters are located. A subnet can be used for any cluster located in the same project.

  • cluster - Unsupported since Container Cloud 2.28.0 (Cluster releases 17.3.0 and 16.3.0). CR uses the namespace where the referenced cluster is located. A subnet is only accessible to the cluster that L2Template.metadata.labels:cluster.sigs.k8s.io/cluster-name (mandatory since 2.25.0) or L2Template.spec.clusterRef (deprecated since 2.25.0) refers to. The Subnet objects with the cluster scope will be created for every new cluster depending on the provided SubnetPool.
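
For illustration, a minimal sketch of how the scope parameter may appear in the l3Layout section of an L2Template object; the subnet names are placeholders, and the exact set of l3Layout fields depends on your Container Cloud release:

spec:
  l3Layout:
    - subnetName: lcm-subnet
      scope: namespaced
    - subnetName: storage-public-subnet
      scope: global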

Note

The use of the ipam/SVC-MetalLB label in Subnet objects is unsupported as part of the MetalLBConfigTemplate object deprecation since Container Cloud 2.27.0 (Cluster releases 17.2.0 and 16.2.0). No actions are required for existing objects. A Subnet object containing this label will be ignored by baremetal-provider after cluster update to the mentioned Cluster releases.

You can have subnets with the same name in different projects. In this case, the subnet that is in the same project as the cluster is used. One L2 template may reference several subnets; in this case, those subnets may have different scopes.

The IP address objects (IPaddr CR) that are allocated from subnets always have the same project as their corresponding IpamHost objects, regardless of the subnet scope.

You can create subnets using either the Container Cloud web UI or CLI.

Service labels and their life cycle

Any Subnet object may contain ipam/SVC-<serviceName> labels. All IP addresses allocated from a Subnet object that has service labels defined will inherit those labels.

When a particular IpamHost uses IP addresses allocated from such labeled Subnet objects, the ServiceMap field in IpamHost.Status will contain information about which IPs and interfaces correspond to which service labels (that have been set in the Subnet objects). Using ServiceMap, you can understand what IPs and interfaces of a particular host are used for network traffic of a given service.

Currently, Container Cloud uses the following service labels that allow for the use of specific subnets for particular Container Cloud services:

  • ipam/SVC-k8s-lcm

  • ipam/SVC-ceph-cluster

  • ipam/SVC-ceph-public

  • ipam/SVC-dhcp-range

  • ipam/SVC-MetalLB Unsupported since 2.28.0 (17.3.0 and 16.3.0)

  • ipam/SVC-LBhost

Caution

The use of the ipam/SVC-k8s-lcm label is mandatory for every cluster.

You can also add custom service labels to the Subnet objects the same way you add Container Cloud service labels. The mapping of IPs and interfaces to the defined services is displayed in IpamHost.Status.ServiceMap.
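
For illustration, a minimal sketch of a Subnet object that carries a Container Cloud service label; the names and CIDR are placeholders, and the label value "1" is an assumption that follows the convention described for service labels later in this guide:

apiVersion: ipam.mirantis.com/v1alpha1
kind: Subnet
metadata:
  name: lcm-subnet
  namespace: managed-ns
  labels:
    kaas.mirantis.com/provider: baremetal
    cluster.sigs.k8s.io/cluster-name: test-cluster
    ipam/SVC-k8s-lcm: "1"
spec:
  cidr: 10.0.1.0/24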

You can assign multiple service labels to one network. You can also assign the ceph-* and dhcp-range services to multiple networks. In the latter case, the system sorts the IP addresses in ascending order:

serviceMap:
  ipam/SVC-ceph-cluster:
    - ifName: ceph-br2
      ipAddress: 10.0.10.11
    - ifName: ceph-br1
      ipAddress: 10.0.12.22
  ipam/SVC-ceph-public:
    - ifName: ceph-public
      ipAddress: 10.1.1.15
  ipam/SVC-k8s-lcm:
    - ifName: k8s-lcm
      ipAddress: 10.0.1.52

You can add service labels during creation of subnets as described in Create subnets for a managed cluster using CLI.

MetalLB configuration guidelines for subnets

Note

Consider this section as obsolete since Container Cloud 2.27.0 (Cluster releases 17.2.0 and 16.2.0) due to the MetalLBConfigTemplate object deprecation. For details, see MOSK Deprecation Notes: MetalLBConfigTemplate resource management.

Caution

This section also applies to the bootstrap procedure of a management cluster with the following difference: instead of creating the Subnet object, add its configuration to ipam-objects.yaml.template located in kaas-bootstrap/templates/bm/.

The Kubernetes Subnet object is created for a management cluster from templates during bootstrap.

Each Subnet object can be used to define either a MetalLB address range or MetalLB address pool. A MetalLB address pool may contain one or several address ranges. The following rules apply to creation of address ranges or pools:

  • To designate a subnet as a MetalLB address pool or range, use the ipam/SVC-MetalLB label key. Set the label value to "1".

  • The object must contain the cluster.sigs.k8s.io/cluster-name label that references the name of the target cluster where the MetalLB address pool is used.

  • You may create multiple subnets with the ipam/SVC-MetalLB label to define multiple IP address ranges or multiple address pools for MetalLB in the cluster.

  • The IP addresses of the MetalLB address pool are not assigned to the interfaces on hosts. This subnet is virtual. Do not include such subnets in the L2 template definitions for your cluster.

  • If a Subnet object defines a MetalLB address range, no additional object properties are required.

  • You can use any number of Subnet objects with each defining a single MetalLB address range. In this case, all address ranges are aggregated into a single MetalLB L2 address pool named services that has the auto-assign policy enabled.

  • Intersection of IP address ranges within any single MetalLB address pool is not allowed.

    The bare metal provider verifies intersection of IP address ranges. If it detects intersection, the MetalLB configuration is blocked and the provider logs contain corresponding error messages.

Use the following labels to identify the Subnet object as a MetalLB address pool and configure the name and protocol for that address pool. All labels below are mandatory for the Subnet object that configures a MetalLB address pool.

Mandatory Subnet labels for a MetalLB address pool

Label

Description

Labels to link Subnet to the target cluster and region

cluster.sigs.k8s.io/cluster-name

Specifies the cluster name where the MetalLB address pool is used.

kaas.mirantis.com/provider

Specifies the provider of the cluster where the MetalLB address pool is used.

kaas.mirantis.com/region

Specifies the region name of the cluster where the MetalLB address pool is used.

Note

The kaas.mirantis.com/region label is removed from all Container Cloud objects in 2.26.0 (Cluster releases 17.1.0 and 16.1.0). Therefore, do not add the label starting from these releases. On existing clusters updated to these releases, or if added manually, this label is ignored by Container Cloud.

ipam/SVC-MetalLB

Defines that the Subnet object will be used to provide a new address pool or range for MetalLB.

metallb/address-pool-name

Every address pool must have a distinct name.

The services-pxe address pool is mandatory when configuring a dedicated PXE network in the management cluster. This name will be used in annotations for services exposed through the PXE network. A bootstrap cluster also uses the services-pxe address pool for its provision services so that management cluster nodes can be provisioned from the bootstrap cluster. After a management cluster is deployed, the bootstrap cluster is deleted and that address pool is solely used by the newly deployed cluster.

metallb/address-pool-auto-assign

Configures the auto-assign policy of an address pool. Boolean.

Caution

For the address pools defined using the MetalLB Helm chart values in the Cluster spec section, auto-assign policy is set to true and is not configurable.

For any service that does not have a specific MetalLB annotation configured, MetalLB allocates external IPs from arbitrary address pools that have the auto-assign policy set to true.

MetalLB allocates external IPs from an address pool that has the auto-assign policy set to false only for the services that have a specific MetalLB annotation with that address pool name.

metallb/address-pool-protocol

Sets the address pool protocol. The only supported value is layer2 (default).

Caution

Do not set the same address pool name for two or more Subnet objects. Otherwise, the corresponding MetalLB address pool configuration fails with a warning message in the bare metal provider log.

Caution

For the auto-assign policy, the following configuration rules apply:

  • At least one MetalLB address pool must have the auto-assign policy enabled so that unannotated services can have load balancer IPs allocated for them. To satisfy this requirement, either configure one of the address pools using a Subnet object with metallb/address-pool-auto-assign: "true" or configure address range(s) using Subnet object(s) without metallb/address-pool-* labels.

  • When configuring multiple address pools with the auto-assign policy enabled, keep in mind that it is not determined in advance which pool of those multiple address pools is used to allocate an IP for a particular unannotated service.
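
Putting the labels described above together, a minimal sketch of a Subnet object that defines a MetalLB address pool; the object name, pool name, and address ranges are placeholders:

apiVersion: ipam.mirantis.com/v1alpha1
kind: Subnet
metadata:
  name: metallb-services-pool
  namespace: managed-ns
  labels:
    kaas.mirantis.com/provider: baremetal
    cluster.sigs.k8s.io/cluster-name: test-cluster
    ipam/SVC-MetalLB: "1"
    metallb/address-pool-name: services
    metallb/address-pool-auto-assign: "true"
    metallb/address-pool-protocol: layer2
spec:
  cidr: 10.0.11.0/24
  includeRanges:
  - 10.0.11.61-10.0.11.80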

Configure MetalLB

This section describes how to set up and verify MetalLB parameters before configuring subnets for a managed cluster.

Caution

This section also applies to the bootstrap procedure of a management cluster with the following differences:

  • Instead of the Cluster object, configure templates/bm/cluster.yaml.template.

  • Instead of the MetalLBConfig object, configure templates/bm/metallbconfig.yaml.template.

  • Instead of creating specific IPAM objects such as Subnet, add their configuration to templates/bm/ipam-objects.yaml.template.

The Kubernetes objects described below are created for a management cluster from template files during bootstrap.

Configuration rules for the ‘MetalLBConfig’ object

Caution

The use of the MetalLBConfig object is mandatory for management and managed clusters after a management cluster upgrade to the Cluster release 16.0.0.

The following rules and requirements apply to configuration of the MetalLBConfig object:

  • Define one MetalLBConfig object per cluster.

  • Define the following mandatory labels:

    cluster.sigs.k8s.io/cluster-name

    Specifies the cluster name where the MetalLB address pool is used.

    kaas.mirantis.com/provider

    Specifies the provider of the cluster where the MetalLB address pool is used.

    kaas.mirantis.com/region

    Specifies the region name of the cluster where the MetalLB address pool is used.

    Note

    The kaas.mirantis.com/region label is removed from all Container Cloud objects in 2.26.0 (Cluster releases 17.1.0 and 16.1.0). Therefore, do not add the label starting from these releases. On existing clusters updated to these releases, or if added manually, this label is ignored by Container Cloud.

  • Intersection of IP address ranges within any single MetalLB address pool is not allowed.

  • At least one MetalLB address pool must have the auto-assign policy enabled so that unannotated services can have load balancer IP addresses allocated to them.

  • When configuring multiple address pools with the auto-assign policy enabled, keep in mind that it is not determined in advance which pool of those multiple address pools is used to allocate an IP address for a particular unannotated service.

Note

You can optimize address announcement for load-balanced services using the interfaces selector for the l2Advertisements object. This selector allows for address announcement only on selected host interfaces. For details, see API Reference: MetalLB configuration examples.
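
For illustration only, a minimal sketch of a MetalLBConfig object for the L2 (ARP) announcement mode. It follows the same name/spec pattern as the BGP examples later in this section; the pool, interface, and object names are placeholders, and the exact field set depends on your Container Cloud release, so verify it against API Reference: MetalLB configuration examples:

apiVersion: ipam.mirantis.com/v1alpha1
kind: MetalLBConfig
metadata:
  name: test-cluster-metallb-config
  namespace: managed-ns
  labels:
    cluster.sigs.k8s.io/cluster-name: test-cluster
    kaas.mirantis.com/provider: baremetal
spec:
  ipAddressPools:
    - name: services
      spec:
        addresses:
          - 10.0.11.61-10.0.11.80
        autoAssign: true
  l2Advertisements:
    - name: services
      spec:
        ipAddressPools:
          - services
        interfaces:
          - k8s-lcm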

Configuration rules for MetalLBConfigTemplate (obsolete since 2.27.0)

Caution

The MetalLBConfigTemplate object is deprecated in Container Cloud 2.27.0 (Cluster releases 17.2.0 and 16.2.0) and unsupported since Container Cloud 2.28.0 (Cluster releases 17.3.0 and 16.3.0). For details, see MOSK Deprecation Notes: MetalLBConfigTemplate resource management.

  • All rules described above for MetalLBConfig also apply to MetalLBConfigTemplate.

  • Optional. Define one MetalLBConfigTemplate object per cluster. The use of this object without MetalLBConfig is not allowed.

  • When using MetalLBConfigTemplate:

    • MetalLBConfig must reference MetalLBConfigTemplate by name:

      spec:
        templateName: <managed-metallb-template>
      
    • You can use Subnet objects for defining MetalLB address pools. Refer to MetalLB configuration guidelines for subnets for guidelines on configuring MetalLB address pools using Subnet objects.

    • You can optimize address announcement for load-balanced services using the interfaces selector for the l2Advertisements object. This selector allows for address announcement only on selected host interfaces. For details, see API Reference: MetalLBConfigTemplate spec.

Configure and verify MetalLB using the CLI
  1. Optional. Configure parameters related to MetalLB components life cycle such as deployment and update using the metallb Helm chart values in the Cluster spec section. For example:
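
    A minimal sketch that reuses the speaker.nodeSelector value described in Configure node selector for MetalLB speaker; other available values depend on the metallb chart version:

    spec:
      providerSpec:
        value:
          helmReleases:
          - name: metallb
            values:
              speaker:
                nodeSelector:
                  metallbSpeakerEnabled: "true"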

  2. Configure the MetalLB parameters related to IP address allocation and announcement for load-balanced cluster services. Select from the following options:

    Recommended. Default. Mandatory after a management cluster upgrade to the Cluster release 17.2.0.

    Create the MetalLBConfig object:

    In the Technology Preview scope, you can use BGP for announcement of external addresses of Kubernetes load-balanced services for managed clusters. To configure the BGP announcement mode for MetalLB, use the MetalLBConfig object.

    The use of BGP is required to announce IP addresses for load-balanced services when using MetalLB on nodes that are distributed across multiple racks. In this case, you must set rack-id labels on nodes; these labels are used in node selectors for the BGPPeer, BGPAdvertisement, or both MetalLB objects to properly configure BGP connections from each node.

    Configuration example of the Machine object for the BGP announcement mode
    apiVersion: cluster.k8s.io/v1alpha1
    kind: Machine
    metadata:
      name: test-cluster-compute-1
      namespace: managed-ns
      labels:
        cluster.sigs.k8s.io/cluster-name: test-cluster
        ipam/RackRef: rack-1  # reference to the "rack-1" Rack
        kaas.mirantis.com/provider: baremetal
        kaas.mirantis.com/region: region-one
    spec:
      providerSpec:
        value:
          ...
          nodeLabels:
          - key: rack-id   # node label can be used in "nodeSelectors" inside
            value: rack-1  # "BGPPeer" and/or "BGPAdvertisement" MetalLB objects
      ...
    
    Configuration example of the MetalLBConfig object for the BGP announcement mode
    apiVersion: ipam.mirantis.com/v1alpha1
    kind: MetalLBConfig
    metadata:
      name: test-cluster-metallb-config
      namespace: managed-ns
      labels:
        cluster.sigs.k8s.io/cluster-name: test-cluster
        kaas.mirantis.com/provider: baremetal
        kaas.mirantis.com/region: region-one
    spec:
      ...
      bgpPeers:
        - name: svc-peer-1
          spec:
            holdTime: 0s
            keepaliveTime: 0s
            peerAddress: 10.77.42.1
            peerASN: 65100
            myASN: 65101
            nodeSelectors:
              - matchLabels:
                  rack-id: rack-1  # references the nodes having
                                   # the "rack-id=rack-1" label
      bgpAdvertisements:
        - name: services
          spec:
            aggregationLength: 32
            aggregationLengthV6: 128
            ipAddressPools:
              - services
            peers:
              - svc-peer-1
              ...
    

    Select from the following options:

    • Deprecated in Container Cloud 2.27.0 (Cluster releases 17.2.0 and 16.2.0) and unsupported since Container Cloud 2.28.0 (Cluster releases 17.3.0 and 16.3.0). Mandatory after a management cluster upgrade to the Cluster release 16.0.0.

      Create MetalLBConfig and MetalLBConfigTemplate objects. This method allows using the Subnet object to define MetalLB address pools.

      Note

      For managed clusters, this configuration method is generally available since Cluster releases 17.0.0 and 16.0.0. It is also available as Technology Preview since Cluster releases 15.0.1, 14.0.1, and 14.0.0.

      Since Cluster releases 15.0.3 and 14.0.3, in the Technology Preview scope, you can use BGP for announcement of external addresses of Kubernetes load-balanced services for managed clusters. To configure the BGP announcement mode for MetalLB, use MetalLBConfig and MetalLBConfigTemplate objects.

      The use of BGP is required to announce IP addresses for load-balanced services when using MetalLB on nodes that are distributed across multiple racks. In this case, you must set rack-id labels on nodes; these labels are used in node selectors for the BGPPeer, BGPAdvertisement, or both MetalLB objects to properly configure BGP connections from each node.

      Configuration example of the Machine object for the BGP announcement mode
      apiVersion: cluster.k8s.io/v1alpha1
      kind: Machine
      metadata:
        name: test-cluster-compute-1
        namespace: managed-ns
        labels:
          cluster.sigs.k8s.io/cluster-name: test-cluster
          ipam/RackRef: rack-1  # reference to the "rack-1" Rack
          kaas.mirantis.com/provider: baremetal
          kaas.mirantis.com/region: region-one
      spec:
        providerSpec:
          value:
            ...
            nodeLabels:
            - key: rack-id   # node label can be used in "nodeSelectors" inside
              value: rack-1  # "BGPPeer" and/or "BGPAdvertisement" MetalLB objects
        ...
      
      Configuration example of the MetalLBConfigTemplate object for the BGP announcement mode
      apiVersion: ipam.mirantis.com/v1alpha1
      kind: MetalLBConfigTemplate
      metadata:
        name: test-cluster-metallb-config-template
        namespace: managed-ns
        labels:
          cluster.sigs.k8s.io/cluster-name: test-cluster
          kaas.mirantis.com/provider: baremetal
          kaas.mirantis.com/region: region-one
      spec:
        templates:
          ...
          bgpPeers: |
            - name: svc-peer-1
              spec:
                peerAddress: 10.77.42.1
                peerASN: 65100
                myASN: 65101
                nodeSelectors:
                  - matchLabels:
                      rack-id: rack-1  # references the nodes having
                                       # the "rack-id=rack-1" label
          bgpAdvertisements: |
            - name: services
              spec:
                ipAddressPools:
                  - services
                peers:
                  - svc-peer-1
                  ...
      

      The bgpPeers and bgpAdvertisements fields are used to configure BGP announcement instead of l2Advertisements.

      Note

      The kaas.mirantis.com/region label is removed from all Container Cloud objects in 2.26.0 (Cluster releases 17.1.0 and 16.1.0). Therefore, do not add the label starting from these releases. On existing clusters updated to these releases, or if added manually, this label is ignored by Container Cloud.

      The use of BGP for announcement also allows for better balancing of service traffic between cluster nodes as well as gives more configuration control and flexibility for infrastructure administrators. For configuration examples, refer to MetalLB configuration examples. For configuration procedure, refer to Configure BGP announcement for cluster API LB address.

    • Deprecated since Container Cloud 2.24.0. Configure the configInline value in the MetalLB chart of the Cluster object.

      Warning

      This functionality is removed during the management cluster upgrade to the Cluster release 16.0.0. Therefore, this option becomes unavailable on managed clusters after the parent management cluster upgrade to 16.0.0.

    • Deprecated since Container Cloud 2.24.0. Configure the Subnet objects without MetalLBConfigTemplate.

      Warning

      This functionality is removed during the management cluster upgrade to the Cluster release 16.0.0. Therefore, this option becomes unavailable on managed clusters after the parent management cluster upgrade to 16.0.0.

    Caution

    If the MetalLBConfig object is not used for MetalLB configuration related to address allocation and announcement for load-balanced services, then automated migration applies during creation of clusters of any type or cluster update to Cluster releases 15.0.x or 14.0.x.

    During automated migration, the MetalLBConfig and MetalLBConfigTemplate objects are created, and the contents of the MetalLB chart configInline value are converted to the parameters of the MetalLBConfigTemplate object.

    Any change to the configInline value made on a 15.0.x or 14.0.x cluster will be reflected in the MetalLBConfigTemplate object.

    This automated migration is removed during your management cluster upgrade to the Cluster release 16.0.0, which is introduced in Container Cloud 2.25.0, together with the possibility to use the configInline value of the MetalLB chart. After that, any changes in MetalLB configuration related to address allocation and announcement for load-balanced services will be applied using the MetalLBConfigTemplate and Subnet objects only.

    Select from the following options:

    • Configure Subnet objects. For details, see MetalLB configuration guidelines for subnets.

    • Configure the configInline value for the MetalLB chart in the Cluster object.

    • Configure both the configInline value for the MetalLB chart and Subnet objects.

      The resulting MetalLB address pools configuration will contain address ranges from both cluster specification and Subnet objects. All address ranges for L2 address pools will be aggregated into a single L2 address pool and sorted as strings.

    Changes to be applied since Container Cloud 2.25.0

    The configuration options above are deprecated since Container Cloud 2.24.0, after your management cluster upgrade to the Cluster release 14.0.0 or 14.0.1. Automated migration of MetalLB parameters applies during cluster creation or update to Container Cloud 2.24.x.

    During automated migration, the MetalLBConfig and MetalLBConfigTemplate objects are created, and the contents of the MetalLB chart configInline value are converted to the parameters of the MetalLBConfigTemplate object.

    Any change to the configInline value made on a Container Cloud 2.24.x cluster will be reflected in the MetalLBConfigTemplate object.

    This automated migration is removed during your management cluster upgrade to the Cluster release 16.0.0, which is introduced in Container Cloud 2.25.0, together with the possibility to use the configInline value of the MetalLB chart. After that, any changes in MetalLB configuration related to address allocation and announcement for load-balanced services will be applied using the MetalLBConfigTemplate and Subnet objects only.

  3. Verify the current MetalLB configuration:

    Verify the MetalLB configuration that is stored in MetalLB objects:

    kubectl -n metallb-system get ipaddresspools,l2advertisements
    

    The example system output:

    NAME                                    AGE
    ipaddresspool.metallb.io/default        129m
    ipaddresspool.metallb.io/services-pxe   129m
    
    NAME                                      AGE
    l2advertisement.metallb.io/default        129m
    l2advertisement.metallb.io/services-pxe   129m
    

    Verify one of the MetalLB objects listed above:

    kubectl -n metallb-system get <object> -o json | jq '.spec'
    

    The example system output for ipaddresspool objects:

    $ kubectl -n metallb-system get ipaddresspool.metallb.io/default -o json | jq '.spec'
    {
      "addresses": [
        "10.0.11.61-10.0.11.80"
      ],
      "autoAssign": true,
      "avoidBuggyIPs": false
    }
    $ kubectl -n metallb-system get ipaddresspool.metallb.io/services-pxe -o json | jq '.spec'
    {
      "addresses": [
        "10.0.0.61-10.0.0.70"
      ],
      "autoAssign": false,
      "avoidBuggyIPs": false
    }
    

    Verify the MetalLB configuration that is stored in the ConfigMap object:

    kubectl -n metallb-system get cm metallb -o jsonpath={.data.config}
    

    An example of a successful output:

    address-pools:
    - name: default
      protocol: layer2
      addresses:
      - 10.0.11.61-10.0.11.80
    - name: services-pxe
      protocol: layer2
      auto-assign: false
      addresses:
      - 10.0.0.61-10.0.0.70
    

    The auto-assign parameter will be set to false for all address pools except the default one. So, a particular service will get an address from such an address pool only if the Service object has a special metallb.universe.tf/address-pool annotation that points to the specific address pool name.
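
    For illustration, a minimal sketch of a Service object that requests an address from a specific pool through this annotation; the service name, selector, and ports are placeholders:

    apiVersion: v1
    kind: Service
    metadata:
      name: example-ui
      namespace: default
      annotations:
        metallb.universe.tf/address-pool: services-pxe
    spec:
      type: LoadBalancer
      selector:
        app: example-ui
      ports:
      - port: 80
        targetPort: 8080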

    Note

    It is expected that every Container Cloud service on a management cluster is assigned to one of the address pools. The current design assumes two MetalLB address pools:

    • services-pxe is a reserved address pool name to use for the Container Cloud services in the PXE network (Ironic API, HTTP server, caching server).

    • default is an address pool to use for all other Container Cloud services in the management network. No annotation is required on the Service objects in this case.

  4. Proceed to creating cluster subnets as described in Create subnets for a managed cluster using CLI.

Configure and verify MetalLB using the web UI

Available since 2.28.0 (17.3.0 and 16.3.0)

Note

The BGP configuration is not yet supported in the Container Cloud web UI. In the meantime, use the CLI for this purpose. For details, see Configure and verify MetalLB using the CLI.

  1. Read the MetalLB configuration guidelines described in Configure MetalLB.

  2. Optional. Configure parameters related to MetalLB components life cycle such as deployment and update using the metallb Helm chart values in the Cluster spec section. For example:
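
    As in the CLI procedure, a minimal sketch that reuses the speaker.nodeSelector value described in Configure node selector for MetalLB speaker; other available values depend on the metallb chart version:

    spec:
      providerSpec:
        value:
          helmReleases:
          - name: metallb
            values:
              speaker:
                nodeSelector:
                  metallbSpeakerEnabled: "true"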

  3. Log in to the Container Cloud web UI with the m:kaas:namespace@operator or m:kaas:namespace@writer permissions.

  4. Switch to the required non-default project using the Switch Project action icon located on top of the main left-side navigation panel.

    To create a project, refer to Create a project for managed clusters.

  5. In the Networks section, click the MetalLB Configs tab.

  6. Click Create MetalLB Config.

  7. Fill out the Create MetalLB Config form as required:

    • Name

      Name of the MetalLB object being created.

    • Cluster

      Name of the cluster that the MetalLB object is being created for.

    • IP Address Pools

      List of MetalLB IP address pool descriptions that will be used to create the MetalLB IPAddressPool objects. Click the + button on the right side of the section to add more objects.

      • Name

        IP address pool name.

      • Addresses

        Comma-separated ranges of the IP addresses included into the address pool.

      • Auto Assign

        Enable auto-assign policy for unannotated services to have load balancer IP addresses allocated to them. At least one MetalLB address pool must have the auto-assign policy enabled.

      • Service Allocation

        IP address pool allocation to services. Click Edit to insert a service allocation object with required label selectors for services in the YAML format. For example:

        serviceSelectors:
        - matchExpressions:
          - key: app.kubernetes.io/name
            operator: NotIn
            values:
            - dhcp-lb
        

        For details on the MetalLB IPAddressPool object type, see MetalLB documentation.

      • L2 Advertisements

        List of L2 advertisement descriptions that will be used to create the MetalLB L2Advertisement objects.

        The l2Advertisements object allows defining interfaces to optimize the announcement. When you use the interfaces selector, LB addresses are announced only on selected host interfaces.

        Mirantis recommends using the interfaces selector if nodes use separate host networks for different types of traffic. Such a configuration reduces announcement traffic on other interfaces and networks and limits the chances of reaching IP addresses of load-balanced services from irrelevant interfaces and networks.

        Caution

        Interface names in the interfaces list must match those on the corresponding nodes.

        Add the following parameters:

        • Name

          Name of the l2Advertisements object.

        • Interfaces

          Optional. Comma-separated list of interface names that must match the ones on the corresponding nodes. These names are defined in L2 templates that are linked to the selected cluster.

        • IP Address Pools

          Select the IP address pool to use for the l2Advertisements object.

        • Node Selectors

          Optional. Match labels and values for the Kubernetes node selector to limit the nodes announced as next hops for the LoadBalancer IP. If you do not provide any labels, all nodes are announced as next hops.

        For details on the MetalLB L2Advertisements object type, see MetalLB documentation.

  8. Click Create.

  9. In Networks > MetalLB Configs, verify the status of the created MetalLB object:

    • Ready - object is operational.

    • Error - object is non-operational. Hover over the status to obtain details of the issue.

    Note

    To verify the object details, in Networks > MetalLB Configs, click the More action icon in the last column of the required object section and select MetalLB Config info.

  10. Proceed to creating cluster subnets as described in Create subnets for a managed cluster using web UI.

Configure node selector for MetalLB speaker

By default, MetalLB speakers are deployed on all Kubernetes nodes. You can configure MetalLB to run its speakers on a particular set of nodes. This decreases the number of nodes that must be connected to the external network. In this scenario, only a few nodes are exposed for ingress traffic from the outside world.

To customize the MetalLB speaker node selector:

  1. Using kubeconfig of the management cluster, open the Cluster object of the managed cluster for editing:

    kubectl --kubeconfig <pathToManagementClusterKubeconfig> -n <TargetClusterProjectName> edit cluster <TargetClusterName>
    
  2. In the spec:providerSpec:value:helmReleases section, add the speaker.nodeSelector field for metallb:

     spec:
       ...
       providerSpec:
         value:
           ...
           helmReleases:
           - name: metallb
             values:
               ...
               speaker:
                 nodeSelector:
                   metallbSpeakerEnabled: "true"
    

    The metallbSpeakerEnabled: "true" parameter in this example is the label on Kubernetes nodes where MetalLB speakers will be deployed. It can be an already existing node label or a new one.

    You can add user-defined labels to nodes using the nodeLabels field.

    This field contains the list of node labels to be attached to a node so that certain components run on separate cluster nodes. The list of allowed node labels is located in the providerStatus.releaseRef.current.allowedNodeLabels field of the Cluster object status.

    If the value field is not defined in allowedNodeLabels, a label can have any value.

    Before or after a machine deployment, add the required label from the allowed node labels list with the corresponding value to spec.providerSpec.value.nodeLabels in machine.yaml. For example:

    nodeLabels:
    - key: stacklight
      value: enabled
    

    The addition of a node label that is not available in the list of allowed node labels is restricted.

Create subnets for a managed cluster using web UI

After creating the MetalLB configuration as described in Configure MetalLB and before creating an L2 template, create the required subnets to use in the L2 template to allocate IP addresses for the managed cluster nodes.

To create subnets for a managed cluster using web UI:

  1. Log in to the Container Cloud web UI with the operator permissions.

  2. Switch to the required non-default project using the Switch Project action icon located on top of the main left-side navigation panel.

    To create a project, refer to Create a project for managed clusters.

  3. Create basic cluster settings as described in Create a cluster using web UI.

  4. Select one of the following options:

    1. In the left sidebar, navigate to Networks. The Subnets tab opens.

    2. Click Create Subnet.

    3. Fill out the Create subnet form as required:

      • Name

        Subnet name.

      • Subnet Type

        Subnet type:

        • DHCP

          DHCP subnet that configures DHCP address ranges used by the DHCP server on the management cluster. For details, see Configure multiple DHCP address ranges.

        • LB

          Cluster API LB subnet.

        • LCM

          LCM subnet(s).

        • Storage access

          Available in the web UI since Container Cloud 2.28.0 (17.3.0 and 16.3.0). Storage access subnet.

        • Storage replication

          Available in the web UI since Container Cloud 2.28.0 (17.3.0 and 16.3.0). Storage replication subnet.

        • Custom

          Custom subnet. For example, external or Kubernetes workloads.

        • MetalLB

          Services subnet(s).

          Warning

          Since Container Cloud 2.28.0 (Cluster releases 17.3.0 and 16.3.0), disregard this parameter during subnet creation. Configure MetalLB separately as described in Configure MetalLB.

          This parameter is removed from the Container Cloud web UI in Container Cloud 2.29.0 (Cluster releases 17.4.0 and 16.4.0).

        For description of subnet types in a managed cluster, see Managed cluster networking.

      • Cluster

        Name of the cluster that the subnet is being created for. Required for all subnet types except DHCP.

      • CIDR

        A valid IPv4 address of the subnet in the CIDR notation, for example, 10.11.0.0/24.

      • Include Ranges Optional

        A comma-separated list of IP address ranges within the given CIDR that should be used in the allocation of IPs for nodes. The gateway, network, broadcast, and DNS addresses will be excluded (protected) automatically if they intersect with one of the ranges. The IPs outside the given ranges will not be used in the allocation. Each element of the list can be either an interval 10.11.0.5-10.11.0.70 or a single address 10.11.0.77.

        Warning

        Do not use values that are out of the given CIDR.

      • Exclude Ranges Optional

        A comma-separated list of IP address ranges within the given CIDR that should not be used in the allocation of IPs for nodes. The IPs within the given CIDR but outside the given ranges will be used in the allocation. The gateway, network, broadcast, and DNS addresses will be excluded (protected) automatically if they are included in the CIDR. Each element of the list can be either an interval 10.11.0.5-10.11.0.70 or a single address 10.11.0.77.

        Warning

        Do not use values that are out of the given CIDR.

      • Gateway Optional

        A valid IPv4 gateway address, for example, 10.11.0.9. Does not apply to the MetalLB subnet.

      • Nameservers

        IP addresses of nameservers separated by a comma. Does not apply to the DHCP and MetalLB subnet types.

      • Use whole CIDR

        Optional. Select to use the whole IPv4 address range that is set in the CIDR field. Useful when defining a single IP address (/32), for example, in the Cluster API load balancer (LB) subnet.

        If not set, the network address and broadcast address in the IP subnet are excluded from the address allocation.

      • Labels

        Key-value pairs attached to the selected subnet:

        Caution

        The values of the created subnet labels must match the ones in spec.l3Layout section of the corresponding L2Template object.

        • Optional user-defined labels to distinguish different subnets of the same type. For an example of user-defined labels, see Expand IP addresses capacity in an existing cluster.

          The following special values define the storage subnets:

          • ipam/SVC-ceph-cluster

          • ipam/SVC-ceph-public

          For more examples of label usage, see Service labels and their life cycle and Create subnets for a managed cluster using CLI.

          Click Add a label and assign the first custom label with the required name and value. To assign consecutive labels, use the + button located on the right side of the Labels section.

        • MetalLB:

          Warning

          Since Container Cloud 2.28.0 (Cluster releases 17.3.0 and 16.3.0), disregard this label during subnet creation. Configure MetalLB separately as described in Configure MetalLB.

          The label will be removed from the Container Cloud web UI in one of the following releases.

          • metallb/address-pool-name

            Name of the subnet address pool. Example values: services, default, external, services-pxe.

            The services-pxe value is dedicated for management clusters only. For details about address pool names of a management cluster, see Separate PXE and management networks.

          • metallb/address-pool-auto-assign

            Enables automatic assignment of address pool. Boolean.

          • metallb/address-pool-protocol

            Defines the address pool protocol. Possible values:

            • layer2 - announcement using the ARP protocol.

            • bgp - announcement using the BGP protocol. Technology Preview.

            For description of these protocols, refer to the MetalLB documentation.

    4. Click Create.

    5. In the Networks tab, verify the status of the created subnet:

      • Ready - object is operational.

      • Error - object is non-operational. Hover over the status to obtain details of the issue.

      Note

      To verify subnet details, in the Networks tab, click the More action icon in the last column of the required subnet and select Subnet info.

    1. In the Clusters tab, click the required cluster and scroll down to the Subnets section.

    2. Click Add Subnet.

    3. Fill out the Add new subnet form as required:

      • Subnet Name

        Subnet name.

      • CIDR

        A valid IPv4 CIDR, for example, 10.11.0.0/24.

      • Include Ranges Optional

        A comma-separated list of IP address ranges within the given CIDR that should be used in the allocation of IPs for nodes. The gateway, network, broadcast, and DNS addresses will be excluded (protected) automatically if they intersect with one of the ranges. The IPs outside the given ranges will not be used in the allocation. Each element of the list can be either an interval 10.11.0.5-10.11.0.70 or a single address 10.11.0.77.

        Warning

        Do not use values that are out of the given CIDR.

      • Exclude Ranges Optional

        A comma-separated list of IP address ranges within the given CIDR that should not be used in the allocation of IPs for nodes. The IPs within the given CIDR but outside the given ranges will be used in the allocation. The gateway, network, broadcast, and DNS addresses will be excluded (protected) automatically if they are included in the CIDR. Each element of the list can be either an interval 10.11.0.5-10.11.0.70 or a single address 10.11.0.77.

        Warning

        Do not use values that are out of the given CIDR.

      • Gateway Optional

        A valid gateway address, for example, 10.11.0.9.

    4. Click Create.

Proceed to creating L2 templates as described in Create L2 templates.

Create subnets for a managed cluster using CLI

After creating the MetalLB configuration as described in Configure MetalLB and before creating an L2 template, create the required subnets to use in the L2 template to allocate IP addresses for the managed cluster nodes.

To create subnets for a managed cluster using CLI:

  1. Log in to a local machine where your management cluster kubeconfig is located and where kubectl is installed.

    Note

    The management cluster kubeconfig is created during the last stage of the management cluster bootstrap.

  2. Create a cluster using one of the following options:

  3. Create the subnet.yaml file with a number of global or namespaced subnets depending on the configuration of your cluster and apply it using the following command:

    kubectl --kubeconfig <pathToManagementClusterKubeconfig> apply -f <SubnetFileName.yaml>
    

    Note

    In the command above and in the steps below, substitute the parameters enclosed in angle brackets with the corresponding values.

    Example of a subnet.yaml file:

    apiVersion: ipam.mirantis.com/v1alpha1
    kind: Subnet
    metadata:
      name: demo
      namespace: demo-namespace
      labels:
        kaas.mirantis.com/provider: baremetal
        kaas.mirantis.com/region: region-one
    spec:
      cidr: 10.11.0.0/24
      gateway: 10.11.0.9
      includeRanges:
      - 10.11.0.5-10.11.0.70
      nameservers:
      - 172.18.176.6
    

    Note

    The kaas.mirantis.com/region label is removed from all Container Cloud objects in 2.26.0 (Cluster releases 17.1.0 and 16.1.0). Therefore, do not add the label starting from these releases. On existing clusters updated to these releases, or if manually added, this label will be ignored by Container Cloud.

    Specification fields of the Subnet object

    Parameter

    Description

    cidr (singular)

    A valid IPv4 CIDR, for example, 10.11.0.0/24.

    includeRanges (list)

    A comma-separated list of IP address ranges within the given CIDR that should be used in the allocation of IPs for nodes. The gateway, network, broadcast, and DNS addresses will be excluded (protected) automatically if they intersect with one of the ranges. The IPs outside the given ranges will not be used in the allocation. Each element of the list can be either an interval 10.11.0.5-10.11.0.70 or a single address 10.11.0.77.

    Warning

    Do not use values that are out of the given CIDR.

    excludeRanges (list)

    A comma-separated list of IP address ranges within the given CIDR that should not be used in the allocation of IPs for nodes. The IPs within the given CIDR but outside the given ranges will be used in the allocation. The gateway, network, broadcast, and DNS addresses will be excluded (protected) automatically if they are included in the CIDR. Each element of the list can be either an interval 10.11.0.5-10.11.0.70 or a single address 10.11.0.77.

    Warning

    Do not use values that are out of the given CIDR.

    useWholeCidr (boolean)

    If set to true, the subnet address (10.11.0.0 in the example above) and the broadcast address (10.11.0.255 in the example above) are included into the address allocation for nodes. Otherwise (false by default), the subnet address and broadcast address are excluded from the address allocation.

    gateway (singular)

    A valid gateway address, for example, 10.11.0.9.

    nameservers (list)

    A list of the IP addresses of name servers. Each element of the list is a single address, for example, 172.18.176.6.
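
    For illustration, the following minimal Subnet sketch combines a single-address (/32) CIDR with useWholeCidr; the object name, namespace, and address are hypothetical:

    apiVersion: ipam.mirantis.com/v1alpha1
    kind: Subnet
    metadata:
      name: demo-single-ip
      namespace: demo-namespace
      labels:
        kaas.mirantis.com/provider: baremetal
    spec:
      # A /32 CIDR holds exactly one address; useWholeCidr makes that address
      # allocatable instead of being excluded as the network address.
      cidr: 10.11.0.200/32
      useWholeCidr: true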

    Caution

    • The subnet for the PXE network of the management cluster is automatically created during deployment.

    • The subnet for the LCM network must contain the ipam/SVC-k8s-lcm: "1" label. For details, see Service labels and their life cycle.

    • Each cluster must use at least one subnet for its LCM network. Every node must have an address allocated in the LCM network using such subnet(s).

    Each node of every cluster must have only one IP address in the LCM network that is allocated from one of the Subnet objects having the ipam/SVC-k8s-lcm label defined. Therefore, all Subnet objects used for LCM networks must have the ipam/SVC-k8s-lcm label defined. For details, see Service labels and their life cycle.

    Note

    You may use different subnets to allocate IP addresses to different Container Cloud components in your cluster. Add a label with the ipam/SVC- prefix to each subnet that is used to configure a Container Cloud service. For details, see Service labels and their life cycle and the optional steps below.

    Caution

    Use of a dedicated network for Kubernetes pods traffic, for external connection to the Kubernetes services exposed by the cluster, and for the Ceph cluster access and replication traffic is available as Technology Preview. Use such configurations for testing and evaluation purposes only. For the Technology Preview feature definition, refer to Technology Preview features.

  4. Optional. Technology Preview. Add a subnet for the externally accessible API endpoint of the managed cluster.

    • Make sure that loadBalancerHost is set to "" (empty string) in the Cluster spec.

      spec:
        providerSpec:
          value:
            apiVersion: baremetal.k8s.io/v1alpha1
            kind: BaremetalClusterProviderSpec
            ...
            loadBalancerHost: ""
      
    • Create a subnet with the ipam/SVC-LBhost label having the "1" value to make the baremetal-provider use this subnet for allocation of cluster API endpoint addresses.

    One IP address will be allocated for each cluster to serve its Kubernetes/MKE API endpoint.

    Caution

    Make sure that master nodes have host local-link addresses in the same subnet as the cluster API endpoint address. These host IP addresses will be used for VRRP traffic. The cluster API endpoint address will be assigned to the same interface on one of the master nodes where these host IPs are assigned.

    Note

    We highly recommend that you assign the cluster API endpoint address from the LCM network. For details on cluster networks types, refer to Managed cluster networking. See also the Single managed cluster use case example in the following table.

    You can define the allocation scope of API endpoint addresses using subnets in several ways:

    Use case

    Example configuration

    Several managed clusters in one management cluster

    Create a subnet in the default namespace with no reference to any cluster.

    apiVersion: ipam.mirantis.com/v1alpha1
    kind: Subnet
    metadata:
      name: lbhost-per-region
      namespace: default
      labels:
        kaas.mirantis.com/provider: baremetal
        kaas.mirantis.com/region: region-one
        ipam/SVC-LBhost: "1"
    spec:
      cidr: 191.11.0.0/24
      includeRanges:
      - 191.11.0.6-191.11.0.20
    

    Note

    The kaas.mirantis.com/region label is removed from all Container Cloud objects in 2.26.0 (Cluster releases 17.1.0 and 16.1.0). Therefore, do not add the label starting from these releases. On existing clusters updated to these releases, or if manually added, this label will be ignored by Container Cloud.

    Warning

    Combining the ipam/SVC-LBhost label with any other service labels on a single subnet is not supported. Use a dedicated subnet for address allocation for cluster API endpoints.

    Several managed clusters in a project

    Create a subnet in a namespace corresponding to your project with no reference to any cluster. Such a subnet has priority over the one described above.

    apiVersion: ipam.mirantis.com/v1alpha1
    kind: Subnet
    metadata:
      name: lbhost-per-namespace
      namespace: my-project
      labels:
        kaas.mirantis.com/provider: baremetal
        kaas.mirantis.com/region: region-one
        ipam/SVC-LBhost: "1"
    spec:
      cidr: 191.11.0.0/24
      includeRanges:
      - 191.11.0.6-191.11.0.20
    

    Warning

    Combining the ipam/SVC-LBhost label with any other service labels on a single subnet is not supported. Use a dedicated subnet for address allocation for cluster API endpoints.

    Single managed cluster

    Create a subnet in a namespace corresponding to your project with a reference to the target cluster using the cluster.sigs.k8s.io/cluster-name label. Such a subnet has priority over the ones described above. In this case, it is not obligatory to use a dedicated subnet for address allocation of API endpoints. You can add the ipam/SVC-LBhost label to the LCM subnet, and one of the addresses from this subnet will be allocated for an API endpoint:

    apiVersion: ipam.mirantis.com/v1alpha1
    kind: Subnet
    metadata:
      name: lbhost-per-cluster
      namespace: my-project
      labels:
        kaas.mirantis.com/provider: baremetal
        kaas.mirantis.com/region: region-one
        ipam/SVC-LBhost: "1"
        ipam/SVC-k8s-lcm: "1"
        cluster.sigs.k8s.io/cluster-name: my-cluster
    spec:
      cidr: 10.11.0.0/24
      includeRanges:
      - 10.11.0.6-10.11.0.50
    

    Warning

    You can combine the ipam/SVC-LBhost label only with the following service labels on a single subnet:

    • ipam/SVC-k8s-lcm

    • ipam/SVC-ceph-cluster

    • ipam/SVC-ceph-public

    Otherwise, use a dedicated subnet for address allocation for the cluster API endpoint. Other combinations are not supported and can lead to unexpected results.

    The above options can be used in conjunction. For example, you can define a subnet for a region, a number of subnets within this region defined for particular namespaces, and a number of subnets within the same region and namespaces defined for particular clusters.

  5. Optional. Add a subnet(s) for the Storage access network. See the example after the following list.

    • Set the ipam/SVC-ceph-public label with the value "1" to create a subnet that will be used to configure the Ceph public network.

    • Set the cluster.sigs.k8s.io/cluster-name label to the name of the target cluster during the subnet creation.

    • Use this subnet in the L2 template for storage nodes.

    • Assign this subnet to the interface connected to your Storage access network.

    • Ceph will automatically use this subnet for its external connections.

    • A Ceph OSD will look for and bind to an address from this subnet when it is started on a machine.
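
    A minimal sketch of such a subnet; the object, namespace, and cluster names as well as the CIDR are hypothetical:

    apiVersion: ipam.mirantis.com/v1alpha1
    kind: Subnet
    metadata:
      name: demo-ceph-public
      namespace: demo-namespace
      labels:
        kaas.mirantis.com/provider: baremetal
        ipam/SVC-ceph-public: "1"
        cluster.sigs.k8s.io/cluster-name: demo-cluster
    spec:
      cidr: 10.12.0.0/24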

  6. Optional. Add a subnet(s) for the Storage replication network. See the example after the following list.

    • Set the ipam/SVC-ceph-cluster label with the value "1" to create a subnet that will be used to configure the Ceph cluster network.

    • Set the cluster.sigs.k8s.io/cluster-name label to the name of the target cluster during the subnet creation.

    • Use this subnet in the L2 template for storage nodes.

    • Assign this subnet to the interface connected to your Storage replication network.

    • Ceph will automatically use this subnet for its internal replication traffic.
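
    A minimal sketch of such a subnet, analogous to the previous example; all names and the CIDR are hypothetical:

    apiVersion: ipam.mirantis.com/v1alpha1
    kind: Subnet
    metadata:
      name: demo-ceph-cluster
      namespace: demo-namespace
      labels:
        kaas.mirantis.com/provider: baremetal
        ipam/SVC-ceph-cluster: "1"
        cluster.sigs.k8s.io/cluster-name: demo-cluster
    spec:
      cidr: 10.13.0.0/24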

  7. Optional. Add a subnet for Kubernetes pods traffic.

    • Use this subnet in the L2 template for all nodes in the cluster.

    • Assign this subnet to the interface connected to your Kubernetes workloads network.

    • Use the npTemplate.bridges.k8s-pods bridge name in the L2 template. This bridge name is reserved for the Kubernetes workloads network. When the k8s-pods bridge is defined in an L2 template, Calico CNI uses that network for routing the pods traffic between nodes.

  8. Optional. Add subnets for configuring multiple DHCP ranges. For details, see Configure multiple DHCP address ranges.

  9. Verify that the subnet is successfully created:

    kubectl get subnet kaas-mgmt -o yaml
    

    In the system output, verify the status fields of the Subnet object using the table below.
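
    To check only the short state of the object, you can also query the corresponding status field directly, for example:

    kubectl get subnet kaas-mgmt -o jsonpath='{.status.state}'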

    Status fields of the Subnet object

    Parameter

    Description

    state Since 2.23.0

    Contains a short state description and a more detailed one if applicable. The short status values are as follows:

    • OK - object is operational.

    • ERR - object is non-operational. This status has a detailed description in the messages list.

    • TERM - object was deleted and is terminating.

    messages Since 2.23.0

    Contains error or warning messages if the object state is ERR. For example, ERR: Wrong includeRange for CIDR….

    statusMessage

    Deprecated since Container Cloud 2.23.0 and will be removed in one of the following releases in favor of state and messages. Since Container Cloud 2.24.0, this field is not set for the objects of newly created clusters.

    cidr

    Reflects the actual CIDR, has the same meaning as spec.cidr.

    gateway

    Reflects the actual gateway, has the same meaning as spec.gateway.

    nameservers

    Reflects the actual name servers, has the same meaning as spec.nameservers.

    ranges

    Specifies the address ranges that are calculated using the fields from spec: cidr, includeRanges, excludeRanges, gateway, useWholeCidr. These ranges are directly used for node IP allocation.

    allocatable

    Includes the number of currently available IP addresses that can be allocated for nodes from the subnet.

    allocatedIPs

    Specifies the list of IPv4 addresses with the corresponding IPaddr object IDs that were already allocated from the subnet.

    capacity

    Contains the total number of IP addresses held by ranges, which equals the sum of the allocatable and allocatedIPs parameter values.

    objCreated

    Date, time, and IPAM version of the Subnet CR creation.

    objStatusUpdated

    Date, time, and IPAM version of the last update of the status field in the Subnet CR.

    objUpdated

    Date, time, and IPAM version of the last Subnet CR update by kaas-ipam.

    Example of a successfully created subnet:

    apiVersion: ipam.mirantis.com/v1alpha1
    kind: Subnet
    metadata:
      labels:
        ipam/UID: 6039758f-23ee-40ba-8c0f-61c01b0ac863
        kaas.mirantis.com/provider: baremetal
        kaas.mirantis.com/region: region-one
        ipam/SVC-k8s-lcm: "1"
      name: kaas-mgmt
      namespace: default
    spec:
      cidr: 172.16.170.0/24
      excludeRanges:
      - 172.16.170.100
      - 172.16.170.101-172.16.170.139
      gateway: 172.16.170.1
      includeRanges:
      - 172.16.170.70-172.16.170.99
      nameservers:
      - 172.18.176.6
      - 172.18.224.6
    status:
      allocatable: 27
      allocatedIPs:
      - 172.16.170.70:ebabace8-7d9e-4913-a938-3d9e809f49fc
      - 172.16.170.71:c1109596-fba1-471b-950b-b1b60ef2c37c
      - 172.16.170.72:94c25734-c046-4a7e-a0fb-75582c5f20a9
      capacity: 30
      checksums:
        annotations: sha256:38e0b9de817f645c4bec37c0d4a3e58baecccb040f5718dc069a72c7385a0bed
        labels: sha256:5ed97704b05f15b204c1347603f9749ac015c29a4a16c6f599eed06babfb312e
        spec: sha256:60ead7c744564b3bfbbb3c4e846bce54e9128be49a279bf0c2bbebac2cfcebe6
      cidr: 172.16.170.0/24
      gateway: 172.16.170.1
      labelSetChecksum: 5ed97704b05f15b204c1347603f9749ac015c29a4a16c6f599eed06babfb312e
      nameservers:
      - 172.18.176.6
      - 172.18.224.6
      objCreated: 2023-03-03T03:06:20.00000Z  by  v6.4.999-20230127-091906-c451398
      objStatusUpdated: 2023-03-03T04:05:14.48469Z  by  v6.4.999-20230127-091906-c451398
      objUpdated: 2023-03-03T04:05:14.48469Z  by  v6.4.999-20230127-091906-c451398
      ranges:
      - 172.16.170.70-172.16.170.99
      state: OK
    
  10. Proceed to creating an L2 template for one or multiple managed clusters as described in Create L2 templates.

Automate multiple subnet creation using SubnetPool

Unsupported since 2.28.0 (17.3.0 and 16.3.0)

Warning

The SubnetPool object is unsupported since Container Cloud 2.28.0 (17.3.0 and 16.3.0). For details, see MOSK Deprecation Notes: SubnetPool resource management.

Operators of Mirantis Container Cloud that offer on-demand self-service Kubernetes deployments may want their users to create networks without extensive knowledge of network topology or IP addresses. For that purpose, the Operator can prepare L2 network templates in advance so that users can assign these templates to machines in their clusters.

The Operator can ensure that the users’ clusters have separate IP address spaces using the SubnetPool resource.

SubnetPool allows for automatic creation of Subnet objects that will consume blocks from the parent SubnetPool CIDR IP address range. The SubnetPool blockSize setting defines the IP address block size to allocate to each child Subnet. SubnetPool has a global scope, so any SubnetPool can be used to create the Subnet objects for any namespace and for any cluster.

You can use the SubnetPool resource in the L2Template resources to automatically allocate IP addresses from an appropriate IP range that corresponds to a specific cluster, or create a Subnet resource if it does not exist yet. This way, every cluster will use subnets that do not overlap with other clusters.

To automate multiple subnet creation using SubnetPool:

  1. Log in to a local machine where your management cluster kubeconfig is located and where kubectl is installed.

    Note

    The management cluster kubeconfig is created during the last stage of the management cluster bootstrap.

  2. Create the subnetpool.yaml file with a number of subnet pools:

    Note

    You can define either or both subnets and subnet pools, depending on the use case. A single L2 template can use either or both subnets and subnet pools.

    kubectl --kubeconfig <pathToManagementClusterKubeconfig> apply -f <SubnetPoolFileName.yaml>
    

    Note

    In the command above and in the steps below, substitute the parameters enclosed in angle brackets with the corresponding values.

    Example of a subnetpool.yaml file:

    apiVersion: ipam.mirantis.com/v1alpha1
    kind: SubnetPool
    metadata:
      name: kaas-mgmt
      namespace: default
      labels:
        kaas.mirantis.com/provider: baremetal
        kaas.mirantis.com/region: region-one
    spec:
      cidr: 10.10.0.0/16
      blockSize: /25
      nameservers:
      - 172.18.176.6
      gatewayPolicy: first
    

    For the specification fields description of the SubnetPool object, see SubnetPool spec.
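
    For illustration only, a child Subnet carved out of this pool may look similar to the following sketch. The object name, namespace, and exact addresses are assumptions because child objects are created automatically:

    apiVersion: ipam.mirantis.com/v1alpha1
    kind: Subnet
    metadata:
      name: subnet-1           # hypothetical; the real name is derived from the L2 template l3Layout
      namespace: managed-ns    # project of the consuming cluster
    spec:
      cidr: 10.10.0.0/25       # the first /25 block from the 10.10.0.0/16 pool
      gateway: 10.10.0.1       # gatewayPolicy: first selects the first address of the block
      nameservers:
      - 172.18.176.6           # inherited from the SubnetPool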

    Note

    The kaas.mirantis.com/region label is removed from all Container Cloud objects in 2.26.0 (Cluster releases 17.1.0 and 16.1.0). Therefore, do not add the label starting from these releases. On existing clusters updated to these releases, or if manually added, this label will be ignored by Container Cloud.

  3. Verify that the subnet pool is successfully created:

    kubectl get subnetpool kaas-mgmt -oyaml
    

    In the system output, verify the status fields of the SubnetPool object. For the description of these fields, see SubnetPool status.

  4. Proceed to creating an L2 template for one or multiple managed clusters as described in Create L2 templates. In this procedure, select the exemplary L2 template for multiple subnets.

    Caution

    Using the l3Layout section, define all subnets that are used in the npTemplate section. Defining only a subset of these subnets is not allowed.

    If labelSelector is used in l3Layout, use any custom label name that differs from system names. This allows for easier cluster scaling in case of adding new subnets as described in Expand IP addresses capacity in an existing cluster.

    Mirantis recommends using a unique label prefix such as user-defined/.

Create L2 templates

Caution

Since Container Cloud 2.9.0, L2 templates have a new format. In the new L2 templates format, l2template:status:npTemplate is used directly during provisioning. Therefore, a hardware node obtains and applies a complete network configuration during the first system boot.

Update any L2 template created before Container Cloud 2.9.0 as described in Release Notes: Switch L2 templates to the new format.

After you create subnets for one or more managed clusters or projects as described in Create subnets or Automate multiple subnet creation using SubnetPool, follow the procedure below to create L2 templates for a managed cluster. This procedure contains exemplary L2 templates for the following use cases:

L2 template example with bonds and bridges

This section contains an exemplary L2 template that demonstrates how to set up bonds and bridges on hosts for your managed clusters as described in Create L2 templates.

Caution

Use of a dedicated network for Kubernetes pods traffic, for external connection to the Kubernetes services exposed by the cluster, and for the Ceph cluster access and replication traffic is available as Technology Preview. Use such configurations for testing and evaluation purposes only. For the Technology Preview feature definition, refer to Technology Preview features.

Parameters of the bond interface

Configure bonding options using the parameters field. The only mandatory option is mode. See the example below for details.

Note

You can set any mode supported by netplan and your hardware.

Important

Bond monitoring is disabled in Ubuntu by default. However, Mirantis highly recommends enabling it using Media Independent Interface (MII) monitoring by setting the mii-monitor-interval parameter to a non-zero value. For details, see Linux documentation: bond monitoring.

Kubernetes LCM network

The Kubernetes LCM network connects LCM Agents running on nodes to the LCM API of the management cluster. It is also used for communication between kubelet and Kubernetes API server inside a Kubernetes cluster. The MKE components use this network for communication inside a swarm cluster.

To configure each node with an IP address that will be used for LCM traffic, use the npTemplate.bridges.k8s-lcm bridge in the L2 template, as demonstrated in the example below.

Each node of every cluster must have only one IP address in the LCM network that is allocated from one of the Subnet objects having the ipam/SVC-k8s-lcm label defined. Therefore, all Subnet objects used for LCM networks must have the ipam/SVC-k8s-lcm label defined. For details, see Service labels and their life cycle.

As defined in Host networking, the LCM network can be collocated with the PXE network.

Dedicated network for the Kubernetes pods traffic

If you want to use a dedicated network for Kubernetes pods traffic, configure each node with an IPv4 address that will be used to route the pods traffic between nodes. To accomplish that, use the npTemplate.bridges.k8s-pods bridge in the L2 template, as demonstrated in the example below. As defined in Host networking, this bridge name is reserved for the Kubernetes pods network. When the k8s-pods bridge is defined in an L2 template, Calico CNI uses that network for routing the pods traffic between nodes.

Dedicated network for the Kubernetes services traffic (MetalLB)

You can use a dedicated network for external connection to the Kubernetes services exposed by the cluster. If enabled, MetalLB will listen and respond on the dedicated virtual bridge. To accomplish that, configure each node where metallb-speaker is deployed with an IPv4 address. For details on selecting nodes for metallb-speaker, see Configure node selector for MetalLB speaker. Both the MetalLB IP address ranges and the IP addresses configured on those nodes must fit in the same CIDR.

Use the npTemplate.bridges.k8s-ext bridge in the L2 template, as demonstrated in the example below. This bridge name is reserved for the Kubernetes external network. The Subnet object that corresponds to the k8s-ext bridge must explicitly exclude the IP address ranges that are in use by MetalLB.
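
For example, if MetalLB uses the address range 10.100.100.100-10.100.100.150 within the external network, the corresponding Subnet can exclude that range explicitly. The subnet name below matches the demo-ext alias used in the exemplary L2 template that follows; all addresses are hypothetical:

apiVersion: ipam.mirantis.com/v1alpha1
kind: Subnet
metadata:
  name: demo-ext
  namespace: managed-ns
  labels:
    kaas.mirantis.com/provider: baremetal
    cluster.sigs.k8s.io/cluster-name: my-cluster
spec:
  cidr: 10.100.100.0/24
  excludeRanges:
  - 10.100.100.100-10.100.100.150   # reserved for MetalLB, not allocated to nodes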

Dedicated network for the Ceph distributed storage traffic

You can configure dedicated networks for the Ceph cluster access and replication traffic. Set labels on the Subnet CRs for the corresponding networks, as described in Create subnets. Container Cloud automatically configures Ceph to use the addresses from these subnets. Ensure that the addresses are assigned to the storage nodes.

Use the npTemplate.bridges.ceph-cluster and npTemplate.bridges.ceph-public bridges in the L2 template, as demonstrated in the example below. These names are reserved for the Ceph cluster access (public) and replication (cluster) networks.

The Subnet objects used to assign IP addresses to these bridges must have corresponding labels ipam/SVC-ceph-public for the ceph-public bridge and ipam/SVC-ceph-cluster for the ceph-cluster bridge.

Example of an L2 template with interfaces bonding
apiVersion: ipam.mirantis.com/v1alpha1
kind: L2Template
metadata:
  name: test-managed
  namespace: managed-ns
  labels:
    kaas.mirantis.com/provider: baremetal
    kaas.mirantis.com/region: region-one
    cluster.sigs.k8s.io/cluster-name: my-cluster
spec:
  autoIfMappingPrio:
    - provision
    - eno
    - ens
    - enp
  l3Layout:
    - subnetName: demo-lcm
      scope:      namespace
    - subnetName: demo-pods
      scope:      namespace
    - subnetName: demo-ext
      scope:      namespace
    - subnetName: demo-ceph-cluster
      scope:      namespace
    - subnetName: demo-ceph-public
      scope:      namespace
  npTemplate: |
    version: 2
    ethernets:
      {{nic 2}}:
        dhcp4: false
        dhcp6: false
        match:
          macaddress: {{mac 2}}
        set-name: {{nic 2}}
      {{nic 3}}:
        dhcp4: false
        dhcp6: false
        match:
          macaddress: {{mac 3}}
        set-name: {{nic 3}}
    bonds:
      bond0:
        interfaces:
          - {{nic 2}}
          - {{nic 3}}
        parameters:
          mode: 802.3ad
          mii-monitor-interval: 100
    vlans:
      k8s-ext-vlan:
        id: 1001
        link: bond0
      k8s-pods-vlan:
        id: 1002
        link: bond0
      stor-frontend:
        id: 1003
        link: bond0
      stor-backend:
        id: 1004
        link: bond0
    bridges:
      k8s-lcm:
        interfaces: [bond0]
        addresses:
          - {{ip "k8s-lcm:demo-lcm"}}
        gateway4: {{gateway_from_subnet "demo-lcm"}}
        nameservers:
          addresses: {{nameservers_from_subnet "demo-lcm"}}
      k8s-ext:
        interfaces: [k8s-ext-vlan]
        addresses:
          - {{ip "k8s-ext:demo-ext"}}
      k8s-pods:
        interfaces: [k8s-pods-vlan]
        addresses:
          - {{ip "k8s-pods:demo-pods"}}
      ceph-cluster:
        interfaces: [stor-backend]
        addresses:
          - {{ip "ceph-cluster:demo-ceph-cluster"}}
      ceph-public:
        interfaces: [stor-frontend]
        addresses:
          - {{ip "ceph-public:demo-ceph-public"}}

Note

The kaas.mirantis.com/region label is removed from all Container Cloud objects in 2.26.0 (Cluster releases 17.1.0 and 16.1.0). Therefore, do not add the label starting from these releases. On existing clusters updated to these releases, or if manually added, this label will be ignored by Container Cloud.

L2 template example for automatic multiple subnet creation

Unsupported since 2.28.0 (17.3.0 and 16.3.0)

Warning

The SubnetPool object is unsupported since Container Cloud 2.28.0 (17.3.0 and 16.3.0). For details, see MOSK Deprecation Notes: SubnetPool resource management.

This section contains an exemplary L2 template for automatic multiple subnet creation as described in Automate multiple subnet creation using SubnetPool. This template also contains the l3Layout section that allows defining the Subnet scopes and enables auto-creation of the Subnet objects from the SubnetPool objects. For details about auto-creation of the Subnet objects, see Automate multiple subnet creation using SubnetPool.

For details on how to create L2 templates, see Create L2 templates.

Caution

Do not explicitly assign an IP address to the PXE NIC ({{nic 0}}) to prevent IP duplication during updates. The IP address is automatically assigned by the bootstrapping engine.

Example of an L2 template for multiple subnets:

apiVersion: ipam.mirantis.com/v1alpha1
kind: L2Template
metadata:
  name: test-managed
  namespace: managed-ns
  labels:
    kaas.mirantis.com/provider: baremetal
    kaas.mirantis.com/region: region-one
    cluster.sigs.k8s.io/cluster-name: my-cluster
spec:
  autoIfMappingPrio:
    - provision
    - eno
    - ens
    - enp
  l3Layout:
    - subnetName: lcm-subnet
      scope:      namespace
    - subnetName: subnet-1
      subnetPool: kaas-mgmt
      scope:      namespace
    - subnetName: subnet-2
      subnetPool: kaas-mgmt
      scope:      cluster
  npTemplate: |
    version: 2
    ethernets:
      onboard1gbe0:
        dhcp4: false
        dhcp6: false
        match:
          macaddress: {{mac 0}}
        set-name: {{nic 0}}
        # IMPORTANT: do not assign an IP address here explicitly
        # to prevent IP duplication issues. The IP will be assigned
        # automatically by the bootstrapping engine.
        # addresses: []
      onboard1gbe1:
        dhcp4: false
        dhcp6: false
        match:
          macaddress: {{mac 1}}
        set-name: {{nic 1}}
      ten10gbe0s0:
        dhcp4: false
        dhcp6: false
        match:
          macaddress: {{mac 2}}
        set-name: {{nic 2}}
        addresses:
          - {{ip "2:subnet-1"}}
      ten10gbe0s1:
        dhcp4: false
        dhcp6: false
        match:
          macaddress: {{mac 3}}
        set-name: {{nic 3}}
        addresses:
          - {{ip "3:subnet-2"}}
    bridges:
      k8s-lcm:
        interfaces: [onboard1gbe0]
        addresses:
          - {{ip "k8s-lcm:lcm-subnet"}}
        gateway4: {{gateway_from_subnet "lcm-subnet"}}
        nameservers:
          addresses: {{nameservers_from_subnet "lcm-subnet"}}

Note

The kaas.mirantis.com/region label is removed from all Container Cloud objects in 2.26.0 (Cluster releases 17.1.0 and 16.1.0). Therefore, do not add the label starting from these releases. On existing clusters updated to these releases, or if manually added, this label will be ignored by Container Cloud.

In the template above, the following networks are defined in the l3Layout section:

  • lcm-subnet - the subnet name to use for the LCM network in the npTemplate. This subnet is shared between the project clusters because it has the namespaced scope.

    • Since a subnet pool is not in use, manually create the corresponding Subnet object before machines are attached to the cluster. For details, see Create subnets for a managed cluster using CLI.

    • Mark this Subnet with the ipam/SVC-k8s-lcm label. The L2 template must contain the definition of the virtual Linux bridge (k8s-lcm in the L2 template example) that is used to set up the LCM network interface. IP addresses for the defined bridge must be assigned from the LCM subnet, which is marked with the ipam/SVC-k8s-lcm label.

      Each node of every cluster must have only one IP address in the LCM network that is allocated from one of the Subnet objects having the ipam/SVC-k8s-lcm label defined. Therefore, all Subnet objects used for LCM networks must have the ipam/SVC-k8s-lcm label defined. For details, see Service labels and their life cycle.

  • subnet-1 - unless already created, this subnet will be created from the kaas-mgmt subnet pool. The subnet name must be unique within the project. This subnet is shared between the project clusters.

  • subnet-2 - will be created from the kaas-mgmt subnet pool. This subnet has the cluster scope. Therefore, the real name of the Subnet CR object consists of the subnet name defined in l3Layout and the cluster UID. But the npTemplate section of the L2 template must contain only the subnet name defined in l3Layout. The subnets of the cluster scope are not shared between clusters.

Caution

Using the l3Layout section, define all subnets that are used in the npTemplate section. Defining only a subset of these subnets is not allowed.

If labelSelector is used in l3Layout, use any custom label name that differs from system names. This allows for easier cluster scaling in case of adding new subnets as described in Expand IP addresses capacity in an existing cluster.

Mirantis recommends using a unique label prefix such as user-defined/.
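
A minimal l3Layout sketch that selects a subnet by a user-defined label rather than by name; the label name and value are hypothetical and must match the labels set on the corresponding Subnet object:

l3Layout:
  - subnetName: lcm-subnet             # alias referenced by the npTemplate macros
    scope: namespace
    labelSelector:
      user-defined/lcm-network: "1"    # hypothetical label assigned to the Subnet object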

Caution

Modification of L2 templates in use is allowed with a mandatory validation step from the Infrastructure Operator to prevent accidental cluster failures due to unsafe changes. The list of risks posed by modifying L2 templates includes:

  • Services running on hosts cannot reconfigure automatically to switch to the new IP addresses and/or interfaces.

  • Connections between services are interrupted unexpectedly, which can cause data loss.

  • Incorrect configurations on hosts can lead to irrevocable loss of connectivity between services and unexpected cluster partition or disassembly.

For details, see Modify network configuration on an existing machine.

Create an L2 template for a new managed cluster

Caution

Make sure that you create L2 templates before adding any machines to your new managed cluster.

  1. Log in to a local machine where your management cluster kubeconfig is located and where kubectl is installed.

    Note

    The management cluster kubeconfig is created during the last stage of the management cluster bootstrap.

  2. Inspect the existing L2 templates to select the one that fits your deployment:

    kubectl --kubeconfig <pathToManagementClusterKubeconfig> \
    get l2template -n <ProjectNameForNewManagedCluster>
    
  3. Create an L2 YAML template specific to your deployment using one of the exemplary templates:

    Note

    You can create several L2 templates with different configurations to be applied to different nodes of the same cluster. See Assign L2 templates to machines for details.

  4. Add or edit the mandatory parameters in the new L2 template. The following tables provide the description of the mandatory parameters in the example templates mentioned in the previous step.

    L2 template mandatory parameters

    Parameter

    Description

    clusterRef

    Caution

    Deprecated since Container Cloud 2.25.0 in favor of the mandatory cluster.sigs.k8s.io/cluster-name label. Will be removed in one of the following releases.

    On existing clusters, this parameter is automatically migrated to the cluster.sigs.k8s.io/cluster-name label since 2.25.0.

    If an existing cluster has clusterRef: default set, the migration process involves removing this parameter. Subsequently, it is not substituted with the cluster.sigs.k8s.io/cluster-name label, ensuring the application of the L2 template across the entire Kubernetes namespace.

    The Cluster object name that this template is applied to. The default value is used to apply the given template to all clusters within a particular project, unless an L2 template that references a specific cluster name exists. The clusterRef field has priority over the cluster.sigs.k8s.io/cluster-name label:

    • When clusterRef is set to a non-default value, the cluster.sigs.k8s.io/cluster-name label will be added or updated with that value.

    • When clusterRef is set to default, the cluster.sigs.k8s.io/cluster-name label will be absent or removed.

    L2 template requirements

    • An L2 template must have the same project (Kubernetes namespace) as the referenced cluster.

    • A cluster can be associated with many L2 templates. Only one of them can have the ipam/DefaultForCluster label. Every L2 template that does not have the ipam/DefaultForCluster label can be later assigned to a particular machine using l2TemplateSelector.

    • The following rules apply to the default L2 template of a namespace:

      • Since Container Cloud 2.25.0, creation of the default L2 template for a namespace is disabled. On existing clusters, the Spec.clusterRef: default parameter of such an L2 template is automatically removed during the migration process. Subsequently, this parameter is not substituted with the cluster.sigs.k8s.io/cluster-name label, ensuring the application of the L2 template across the entire Kubernetes namespace. Therefore, you can continue using existing default namespaced L2 templates.

      • Before Container Cloud 2.25.0, the default L2Template object of a namespace must have the Spec.clusterRef: default parameter that is deprecated since 2.25.0.

    ifMapping or autoIfMappingPrio

    • ifMapping

      List of interface names for the template. The interface mapping is defined globally for all bare metal hosts in the cluster but can be overridden at the host level, if required, by editing the IpamHost object for a particular host. The ifMapping parameter is mutually exclusive with autoIfMappingPrio.

    • autoIfMappingPrio

      autoIfMappingPrio is a list of prefixes, such as eno, ens, and so on, to match the interfaces to automatically create a list for the template. If you are not aware of any specific ordering of interfaces on the nodes, use the default ordering from Predictable Network Interfaces Names specification for systemd. You can also override the default NIC list per host using the IfMappingOverride parameter of the corresponding IpamHost. The provision value corresponds to the network interface that was used to provision a node. Usually, it is the first NIC found on a particular node. It is defined explicitly to ensure that this interface will not be reconfigured accidentally.

      The autoIfMappingPrio parameter is mutually exclusive with ifMapping.

    l3Layout

    Subnets to be used in the npTemplate section. The field contains a list of subnet definitions with parameters used by template macros.

    • subnetName

      Defines the alias name of the subnet that can be used to reference this subnet from the template macros. This parameter is mandatory for every entry in the l3Layout list.

    • subnetPool Unsupported since 2.28.0 (17.3.0 and 16.3.0)

      Optional. Default: none. Defines a name of the parent SubnetPool object that will be used to create a Subnet object with a given subnetName and scope. For deprecation details, see MOSK Deprecation Notes: SubnetPool resource management.

      If a corresponding Subnet object already exists, nothing will be created and the existing object will be used. If no SubnetPool is provided, no new Subnet object will be created.

    • scope

      Logical scope of the Subnet object with a corresponding subnetName. Possible values:

      • global - the Subnet object is accessible globally, for any Container Cloud project and cluster, for example, the PXE subnet.

      • namespace - the Subnet object is accessible within the same project where the L2 template is defined.

      • cluster - the Subnet object is only accessible to the cluster that L2Template.spec.clusterRef refers to. The Subnet objects with the cluster scope will be created for every new cluster.

    • labelSelector

      Contains a dictionary of labels and their respective values that will be used to find the matching Subnet object for the subnet. If the labelSelector field is omitted, the Subnet object will be selected by name, specified by the subnetName parameter.

      Caution

      The labels and their values in this section must match the ones added for the corresponding Subnet object.

    Caution

    The l3Layout section is mandatory for each L2Template custom resource.

    npTemplate

    A netplan-compatible configuration with special lookup functions that defines the networking settings for the cluster hosts, where physical NIC names and details are parameterized. This configuration will be processed using Go templates. Instead of specifying IP and MAC addresses, interface names, and other network details specific to a particular host, the template supports use of special lookup functions. These lookup functions, such as nic, mac, ip, and so on, return host-specific network information when the template is rendered for a particular host.

    Caution

    All rules and restrictions of the netplan configuration also apply to L2 templates. For details, see the official netplan documentation.

    Caution

    We strongly recommend following the below conventions on network interface naming:

    • A physical NIC name set by an L2 template must not exceed 15 symbols. Otherwise, an L2 template creation fails. This limit is set by the Linux kernel.

    • Names of virtual network interfaces such as VLANs, bridges, bonds, veth, and so on must not exceed 15 symbols.

    We recommend setting interface names that do not exceed 13 symbols for both physical and virtual interfaces to avoid corner cases and issues in netplan rendering.

    The following table describes the main lookup functions for an L2 template.

    Lookup function

    Description

    {{nic N}}

    Name of a NIC number N. NIC numbers correspond to the interface mapping list. This macro can be used as a key for the elements of the ethernets map, or as the value of the name and set-name parameters of a NIC. It is also used to reference the physical NIC from definitions of virtual interfaces (vlan, bridge).

    {{mac N}}

    MAC address of a NIC number N registered during a host hardware inspection.

    {{ip "N:subnet-a"}}

    IP address and mask for a NIC number N. The address will be auto-allocated from the given subnet if the address does not exist yet.

    {{ip "br0:subnet-x"}}

    IP address and mask for a virtual interface, “br0” in this example. The address will be auto-allocated from the given subnet if the address does not exist yet.

    For virtual interfaces names, an IP address placeholder must contain a human-readable ID that is unique within the L2 template and must have the following format:

    {{ip "<shortUniqueHumanReadableID>:<subnetNameFromL3Layout>"}}

    The <shortUniqueHumanReadableID> is made equal to a virtual interface name throughout this document and Container Cloud bootstrap templates.

    {{cidr_from_subnet "subnet-a"}}

    IPv4 CIDR address from the given subnet.

    {{gateway_from_subnet "subnet-a"}}

    IPv4 default gateway address from the given subnet.

    {{nameservers_from_subnet "subnet-a"}}

    List of the IP addresses of name servers from the given subnet.

    {{cluster_api_lb_ip}}

    Technology Preview since Container Cloud 2.24.4. IP address for a cluster API load balancer.

    Note

    Every subnet referenced in an L2 template can have either a global or namespaced scope. In the latter case, the subnet must exist in the same project where the corresponding cluster and L2 template are located.
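
    For illustration, a hypothetical npTemplate fragment that combines several of these lookup functions, assuming the demo-lcm and demo-ext aliases are defined in the l3Layout section:

    bridges:
      k8s-lcm:
        interfaces: [bond0]
        addresses:
          - {{ip "k8s-lcm:demo-lcm"}}                        # address auto-allocated from demo-lcm
        nameservers:
          addresses: {{nameservers_from_subnet "demo-lcm"}}  # name servers of demo-lcm
        routes:
          - to: {{cidr_from_subnet "demo-ext"}}              # static route to the demo-ext CIDR
            via: {{gateway_from_subnet "demo-lcm"}}          # next hop taken from demo-lcm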

  5. Optional. To designate an L2 template as default, assign the ipam/DefaultForCluster label to it. Only one L2 template in a cluster can have this label. It will be used for machines that do not have an L2 template explicitly assigned to them.

    To assign the default template to the cluster:

    • Since Container Cloud 2.25.0, use the mandatory cluster.sigs.k8s.io/cluster-name label in the L2 template metadata section.

    • Before Container Cloud 2.25.0, use the cluster.sigs.k8s.io/cluster-name label or the clusterRef parameter in the L2 template spec section. This parameter is deprecated and will be removed in one of the following releases. During cluster update to 2.25.0, this parameter is automatically migrated to the cluster.sigs.k8s.io/cluster-name label.

  6. Optional. Add the l2template-<NAME>: "exists" label to the L2 template. Replace <NAME> with the unique L2 template name or any other unique identifier. You can refer to this label to assign this L2 template when you create machines.
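
    A sketch of the resulting metadata section that combines the labels from the two previous steps. The template and cluster names are placeholders, and the "1" value for ipam/DefaultForCluster is an assumption:

    metadata:
      name: test-managed
      namespace: managed-ns
      labels:
        kaas.mirantis.com/provider: baremetal
        cluster.sigs.k8s.io/cluster-name: my-cluster
        ipam/DefaultForCluster: "1"           # marks the template as default for the cluster
        l2template-test-managed: "exists"     # custom label to reference during machine creation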

  7. Add the L2 template to your management cluster. Select one of the following options:

    kubectl --kubeconfig <pathToManagementClusterKubeconfig> apply -f <pathToL2TemplateYamlFile>
    

    Available since Container Cloud 2.26.0 (Cluster releases 17.1.0 and 16.1.0)

    1. Log in to the Container Cloud web UI with the operator permissions.

    2. Switch to the required non-default project using the Switch Project action icon located on top of the main left-side navigation panel.

      To create a project, refer to Create a project for managed clusters.

    3. In the left sidebar, navigate to Networks and click the L2 Templates tab.

    4. Click Create L2 Template.

    5. Fill out the Create L2 Template form as required:

      • Name

        L2 template name.

      • Cluster

        Cluster name that the L2 template is being added for. To set the L2 template as default for all machines, also select Set default for the cluster.

      • Specification

        L2 specification in the YAML format that you have previously created. Click Edit to edit the L2 template if required.

        Note

        Before Container Cloud 2.28.0 (Cluster releases 17.3.0 and 16.3.0), the field name is YAML file, and you can upload the required YAML file instead of inserting and editing it.

      • Labels

        Available since Container Cloud 2.28.0 (Cluster releases 17.3.0 and 16.3.0). Key-value pairs attached to the L2 template. For details, see API Reference: L2Template metadata.

  8. Proceed with Add a machine. The resulting L2 template will be used to render the netplan configuration for the managed cluster machines.

Workflow of the netplan configuration using an L2 template
  1. The kaas-ipam service uses the data from BareMetalHostInventory, the L2 template, and subnets to generate the netplan configuration for every cluster machine.

    Note

    Before update of the management cluster to Container Cloud 2.29.0 (Cluster release 16.4.0), instead of BareMetalHostInventory, use the BareMetalHost object. For details, see BareMetalHost.

    Caution

    While the Cluster release of the management cluster is 16.4.0, BareMetalHostInventory operations are allowed to m:kaas@management-admin only. Once the management cluster is updated to the Cluster release 16.4.1 (or later), this limitation will be lifted.

  2. The generated netplan configuration is saved in the status.netconfigFiles section of the IpamHost resource. If the status.netconfigFilesState field of the IpamHost resource is OK, the configuration was rendered in the IpamHost resource successfully. Otherwise, the status contains an error message.

    Caution

    The following fields of the ipamHost status are renamed since Container Cloud 2.22.0 in the scope of the L2Template and IpamHost objects refactoring:

    • netconfigV2 to netconfigCandidate

    • netconfigV2state to netconfigCandidateState

    • netconfigFilesState to netconfigFilesStates (per file)

    No user actions are required after renaming.

    The format of netconfigFilesState changed after renaming. The netconfigFilesStates field contains a dictionary of statuses of network configuration files stored in netconfigFiles. The dictionary contains the keys that are file paths and values that have the same meaning for each file that netconfigFilesState had:

    • For a successfully rendered configuration file: OK: <timestamp> <sha256-hash-of-rendered-file>, where a timestamp is in the RFC 3339 format.

    • For a failed rendering: ERR: <error-message>.
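
    To inspect the rendering statuses of a particular host, you can, for example, query the IpamHost object directly; the project and host names are placeholders:

    kubectl --kubeconfig <pathToManagementClusterKubeconfig> \
    -n <projectName> get ipamhost <ipamHostName> -o jsonpath='{.status.netconfigFilesStates}'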

  3. The baremetal-provider service copies data from the status.netconfigFiles of IpamHost to the Spec.StateItemsOverwrites['deploy']['bm_ipam_netconfigv2'] parameter of LCMMachine.

  4. The lcm-agent service on every host synchronizes the LCMMachine data to its host. The lcm-agent service runs a playbook to update the netplan configuration on the host during the pre-download and deploy phases.

Configure BGP announcement for cluster API LB address

TechPreview Available since 2.24.4

When you create a bare metal managed cluster with the multi-rack topology, where Kubernetes masters are distributed across multiple racks without an L2 layer extension between them, you must configure BGP announcement of the cluster API load balancer address.

For clusters where Kubernetes masters are in the same rack or with an L2 layer extension between masters, you can configure either BGP or L2 (ARP) announcement of the cluster API load balancer address. The L2 (ARP) announcement is used by default and its configuration is covered in Create a cluster using web UI.

Caution

Create Rack and MultiRackCluster objects, which are described in the below procedure, before initiating the provisioning of master nodes to ensure that both BGP and netplan configurations are applied simultaneously during the provisioning process.

To enable the use of BGP announcement for the cluster API LB address:

  1. In the Cluster object, set the useBGPAnnouncement parameter to true:

    spec:
      providerSpec:
        value:
          useBGPAnnouncement: true
    
  2. Create the MultiRackCluster object that is mandatory when configuring BGP announcement for the cluster API LB address. This object enables you to set cluster-wide parameters for configuration of BGP announcement.

    In this scenario, the MultiRackCluster object must be bound to the corresponding Cluster object using the cluster.sigs.k8s.io/cluster-name label.

    Container Cloud uses the bird BGP daemon for announcement of the cluster API LB address. For this reason, set the corresponding bgpdConfigFileName and bgpdConfigFilePath parameters in the MultiRackCluster object, so that bird can locate the configuration file. For details, see the configuration example below.

    The bgpdConfigTemplate object contains the default configuration file template for the bird BGP daemon, which you can override in Rack objects.

    The defaultPeer parameter contains default parameters of the BGP connection from master nodes to infrastructure BGP peers, which you can override in Rack objects.

    Configuration example for MultiRackCluster
    apiVersion: ipam.mirantis.com/v1alpha1
    kind: MultiRackCluster
    metadata:
      name: multirack-test-cluster
      namespace: managed-ns
      labels:
        cluster.sigs.k8s.io/cluster-name: test-cluster
        kaas.mirantis.com/provider: baremetal
        kaas.mirantis.com/region: region-one
    spec:
      bgpdConfigFileName: bird.conf
      bgpdConfigFilePath: /etc/bird
      bgpdConfigTemplate: |
        ...
      defaultPeer:
        localASN: 65101
        neighborASN: 65100
        neighborIP: ""
        password: deadbeef
    

    For the object description, see API Reference: MultiRackCluster resource.

    Note

    The kaas.mirantis.com/region label is removed from all Container Cloud objects in 2.26.0 (Cluster releases 17.1.0 and 16.1.0). Therefore, do not add the label starting from these releases. On existing clusters updated to these releases, or if manually added, this label will be ignored by Container Cloud.

  3. Create the Rack object(s). This object is mandatory when configuring BGP announcement for the cluster API LB address and it allows you to configure BGP announcement parameters for each rack.

    In this scenario, Rack objects must be bound to Machine objects corresponding to master nodes of the cluster. Each Rack object describes the configuration for the bird BGP daemon used to announce the cluster API LB address from a particular master node or from several master nodes in the same rack.

    The Machine object can optionally define the rack-id node label that is not used for BGP announcement of the cluster API LB IP but can be used for MetalLB. This label is required for MetalLB node selectors when MetalLB is used to announce LB IP addresses on nodes that are distributed across multiple racks. In this scenario, the L2 (ARP) announcement mode cannot be used for MetalLB because master nodes are in different L2 segments. So, the BGP announcement mode must be used for MetalLB, and node selectors are required to properly configure BGP connections from each node. See Configure MetalLB for details.

    The L2Template object includes the lo interface configuration to set the IP address for the bird BGP daemon that will be advertised as the cluster API LB address. The {{ cluster_api_lb_ip }} function is used in npTemplate to obtain the cluster API LB address value.

    Configuration example for Rack
    apiVersion: ipam.mirantis.com/v1alpha1
    kind: Rack
    metadata:
      name: rack-master-1
      namespace: managed-ns
      labels:
        cluster.sigs.k8s.io/cluster-name: test-cluster
        kaas.mirantis.com/provider: baremetal
        kaas.mirantis.com/region: region-one
    spec:
      bgpdConfigTemplate: |  # optional
        ...
      peeringMap:
        lcm-rack-control-1:
          peers:
          - neighborIP: 10.77.31.2  # "localASN" & "neighborASN" are taken from
          - neighborIP: 10.77.31.3  # "MultiRackCluster.spec.defaultPeer" if
                                    # not set here
    
    Configuration example for Machine
    apiVersion: cluster.k8s.io/v1alpha1
    kind: Machine
    metadata:
      name: test-cluster-master-1
      namespace: managed-ns
      annotations:
        metal3.io/BareMetalHost: managed-ns/test-cluster-master-1
      labels:
        cluster.sigs.k8s.io/cluster-name: test-cluster
        cluster.sigs.k8s.io/control-plane: controlplane
        hostlabel.bm.kaas.mirantis.com/controlplane: controlplane
        ipam/RackRef: rack-master-1  # reference to the "rack-master-1" Rack
        kaas.mirantis.com/provider: baremetal
        kaas.mirantis.com/region: region-one
    spec:
      providerSpec:
        value:
          kind: BareMetalMachineProviderSpec
          apiVersion: baremetal.k8s.io/v1alpha1
          hostSelector:
            matchLabels:
              kaas.mirantis.com/baremetalhost-id: test-cluster-master-1
          l2TemplateSelector:
            name: test-cluster-master-1
          nodeLabels:            # optional. it is not used for BGP announcement
          - key: rack-id         # of the cluster API LB IP but it can be used
            value: rack-master-1 # for MetalLB if "nodeSelectors" are required
      ...
    
    Configuration example for L2Template
    apiVersion: ipam.mirantis.com/v1alpha1
    kind: L2Template
    metadata:
      labels:
        cluster.sigs.k8s.io/cluster-name: test-cluster
        kaas.mirantis.com/provider: baremetal
        kaas.mirantis.com/region: region-one
      name: test-cluster-master-1
      namespace: managed-ns
    spec:
      ...
      l3Layout:
        - subnetName: lcm-rack-control-1  # this network is referenced
          scope:      namespace           # in the "rack-master-1" Rack
        - subnetName: ext-rack-control-1  # optional. this network is used
          scope:      namespace           # for k8s services traffic and
                                          # MetalLB BGP connections
      ...
      npTemplate: |
        ...
        ethernets:
          lo:
            addresses:
              - {{ cluster_api_lb_ip }}  # function for cluster API LB IP
            dhcp4: false
            dhcp6: false
        ...
    

    The Rack object fields are described in API Reference: Rack resource.

    The configuration example for the scenario where Kubernetes masters are in the same rack or with an L2 layer extension between masters is described in Single rack configuration example.

    The configuration example for the scenario where Kubernetes masters are distributed across multiple racks without L2 layer extension between them is described in Multiple rack configuration example.

Add a machine

The subsections of this section were moved to MOSK Deployment Guide: Add a machine.

Create a machine using web UI

This section was moved to MOSK Deployment Guide: Add a machine using web UI.

Create a machine using CLI

The subsections of this section were moved to MOSK Deployment Guide: Add a machine.

Deploy a machine to a specific bare metal host

This section was moved to MOSK Deployment Guide: Deploy a machine to a specific bare metal host.

Assign L2 templates to machines

This section was moved to MOSK Deployment Guide: Assign L2 templates to machines.

Override network interfaces naming and order

This section was moved to MOSK Deployment Guide: Override network interfaces naming and order.

Manually allocate IP addresses for bare metal hosts

Available since Cluster releases 16.0.0 and 17.0.0 as TechPreview and since 16.1.0 and 17.1.0 as GA

This section was moved to MOSK Deployment Guide: Manually allocate IP addresses for bare metal hosts.

Add a Ceph cluster

After you add machines to your new bare metal cluster as described in Add a machine to bare metal managed cluster, create a Ceph cluster on top of this managed cluster using the Mirantis Container Cloud web UI or CLI.

Add a Ceph cluster using web UI

Warning

Mirantis highly recommends adding a Ceph cluster using the CLI instead of the web UI. For the CLI procedure, refer to Add a Ceph cluster using CLI.

The web UI capabilities for adding a Ceph cluster are limited and lack flexibility in defining Ceph cluster specifications. For example, if an error occurs while adding a Ceph cluster using the web UI, usually you can address it only through the CLI.

The web UI functionality for managing Ceph clusters will be deprecated in one of the following releases.

This section explains how to create a Ceph cluster on top of a managed cluster using the Mirantis Container Cloud web UI. As a result, you will deploy a Ceph cluster with minimum three Ceph nodes that provide persistent volumes to the Kubernetes workloads for your managed cluster.

Note

For the advanced configuration through the KaaSCephCluster custom resource, see Ceph advanced configuration.

For the configuration of the Ceph Controller through Kubernetes templates to manage Ceph node resources, see Enable Ceph tolerations and resources management.

To create a Ceph cluster in the managed cluster:

  1. Log in to the Container Cloud web UI with the m:kaas:namespace@operator or m:kaas:namespace@writer permissions.

  2. Switch to the required project using the Switch Project action icon located on top of the main left-side navigation panel.

  3. In the Clusters tab, click the required cluster name. The Cluster page with the Machines and Ceph clusters lists opens.

  4. In the Ceph Clusters block, click Create Cluster.

  5. Configure the Ceph cluster in the Create New Ceph Cluster wizard that opens:

    Create new Ceph cluster

    Section

    Parameter name

    Description

    General settings

    Name

    The Ceph cluster name.

    Cluster Network

    Replication network for Ceph OSDs. Must contain the CIDR definition and match the corresponding values of the cluster Subnet object or the environment network values. For configuration examples, see the descriptions of the managed-ns_Subnet_storage YAML files in Example of a complete L2 templates configuration for cluster creation.

    Public Network

    Public network for Ceph data. Must contain the CIDR definition and match the corresponding values of the cluster Subnet object or the environment network values. For configuration examples, see the descriptions of the managed-ns_Subnet_storage YAML files in Example of a complete L2 templates configuration for cluster creation.

    Enable OSDs LCM

    Select to enable LCM for Ceph OSDs.

    Machines / Machine #1-3

    Select machine

    Select the name of the Kubernetes machine that will host the corresponding Ceph node in the Ceph cluster.

    Manager, Monitor

    Select the required Ceph services to install on the Ceph node.

    Devices

    Select the disk that Ceph will use.

    Warning

    Do not select the device used for system services, for example, sda.

    Warning

    A Ceph cluster does not support removable devices, that is, devices with the hotplug functionality enabled. To use such devices as Ceph OSD data devices, make them non-removable or disable the hotplug functionality in the BIOS settings for the disks that are configured to be used as Ceph OSD data devices.

    Enable Object Storage

    Select to enable the single-instance RGW Object Storage.

  6. To add more Ceph nodes to the new Ceph cluster, click + next to any Ceph Machine title in the Machines tab. Configure a Ceph node as required.

    Warning

    Do not add more than 3 Manager and/or Monitor services to the Ceph cluster.

  7. After you add and configure all nodes in your Ceph cluster, click Create.

  8. Verify your Ceph cluster as described in Verify Ceph.

  9. Verify that network addresses used on your clusters do not overlap with the following default MKE network addresses for Swarm and MCR:

    • 10.0.0.0/16 is used for Swarm networks. IP addresses from this network are virtual.

    • 10.99.0.0/16 is used for MCR networks. IP addresses from this network are allocated on hosts.

    Verification of Swarm and MCR network addresses

    To verify Swarm and MCR network addresses, run on any master node:

    docker info
    

    Example of system response:

    Server:
     ...
     Swarm:
      ...
      Default Address Pool: 10.0.0.0/16
      SubnetSize: 24
      ...
     Default Address Pools:
       Base: 10.99.0.0/16, Size: 20
     ...
    

    Typically, not all Swarm and MCR addresses are in use. One Swarm ingress network is created by default and occupies the 10.0.0.0/24 address block. Also, three MCR networks are created by default and occupy three address blocks: 10.99.0.0/20, 10.99.16.0/20, and 10.99.32.0/20.

    To verify the actual networks state and addresses in use, run:

    docker network ls
    docker network inspect <networkName>
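
    For example, to print only the IPAM configuration of the default Swarm ingress network (the ingress network name is the upstream Swarm default and is used here as an assumption):

    docker network inspect --format '{{json .IPAM.Config}}' ingress
    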
    
Add a Ceph cluster using CLI

This section explains how to create a Ceph cluster on top of a managed cluster using the Mirantis Container Cloud CLI. As a result, you will deploy a Ceph cluster with minimum three Ceph nodes that provide persistent volumes to the Kubernetes workloads for your managed cluster.

Note

For the advanced configuration through the KaaSCephCluster custom resource, see Ceph advanced configuration.

For the configuration of the Ceph Controller through Kubernetes templates to manage Ceph node resources, see Enable Ceph tolerations and resources management.

To create a Ceph cluster in a managed cluster:

  1. Verify that the managed cluster overall status is ready with all conditions in the Ready state:

    kubectl -n <managedClusterProject> get cluster <clusterName> -o yaml
    

    Substitute <managedClusterProject> and <clusterName> with the corresponding managed cluster namespace and name.

    Example output:

    status:
      providerStatus:
        ready: true
        conditions:
        - message: Helm charts are successfully installed(upgraded).
          ready: true
          type: Helm
        - message: Kubernetes objects are fully up.
          ready: true
          type: Kubernetes
        - message: All requested nodes are ready.
          ready: true
          type: Nodes
        - message: Maintenance state of the cluster is false
          ready: true
          type: Maintenance
        - message: TLS configuration settings are applied
          ready: true
          type: TLS
        - message: Kubelet is Ready on all nodes belonging to the cluster
          ready: true
          type: Kubelet
        - message: Swarm is Ready on all nodes belonging to the cluster
          ready: true
          type: Swarm
        - message: All provider instances of the cluster are Ready
          ready: true
          type: ProviderInstance
        - message: LCM agents have the latest version
          ready: true
          type: LCMAgent
        - message: StackLight is fully up.
          ready: true
          type: StackLight
        - message: OIDC configuration has been applied.
          ready: true
          type: OIDC
        - message: Load balancer 10.100.91.150 for kubernetes API has status HEALTHY
          ready: true
          type: LoadBalancer
    
  2. Create a YAML file with the Ceph cluster specification:

    apiVersion: kaas.mirantis.com/v1alpha1
    kind: KaaSCephCluster
    metadata:
      name: <cephClusterName>
      namespace: <managedClusterProject>
    spec:
      k8sCluster:
        name: <clusterName>
        namespace: <managedClusterProject>
    

    Substitute <cephClusterName> with the desired name for the Ceph cluster. This name will be used in the Ceph LCM operations. Substitute <managedClusterProject> and <clusterName> with the managed cluster project (namespace) and name.

  3. Select from the following options:

    • Add explicit network configuration of the Ceph cluster using the network section:

      spec:
        cephClusterSpec:
          network:
            publicNet: <publicNet>
            clusterNet: <clusterNet>
      

      Substitute the following values:

      • <publicNet> is a CIDR definition or a comma-separated list of CIDR definitions (if the managed cluster uses multiple networks) of the public network for Ceph data. The values must match the corresponding values of the cluster Subnet object.

      • <clusterNet> is a CIDR definition or a comma-separated list of CIDR definitions (if the managed cluster uses multiple networks) of the replication network for Ceph data. The values must match the corresponding values of the cluster Subnet object.

    • Configure Subnet objects for the Storage access network by setting the ipam/SVC-ceph-public: "1" and ipam/SVC-ceph-cluster: "1" labels on the corresponding Subnet objects, as illustrated in the sketch below. For more details, refer to Create subnets for a managed cluster using CLI, Step 5.
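
      A minimal sketch of a Subnet object labeled for the Ceph public network; the object name, CIDR value, and placeholders are assumptions for illustration:

      apiVersion: ipam.mirantis.com/v1alpha1
      kind: Subnet
      metadata:
        name: ceph-public
        namespace: <managedClusterProject>
        labels:
          cluster.sigs.k8s.io/cluster-name: <clusterName>
          kaas.mirantis.com/provider: baremetal
          ipam/SVC-ceph-public: "1"
      spec:
        cidr: 10.10.0.0/24
      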

  4. Configure Ceph Manager and Ceph Monitor roles to select nodes that should place Ceph Monitor and Ceph Manager daemons:

    1. Obtain the names of the machines on which to place the Ceph Monitor and Ceph Manager daemons:

      kubectl -n <managedClusterProject> get machine
      
    2. Add the nodes section with mon and mgr roles defined:

      spec:
        cephClusterSpec:
          nodes:
            <mgr-node-1>:
              roles:
              - <role-1>
              - <role-2>
              ...
            <mgr-node-2>:
              roles:
              - <role-1>
              - <role-2>
              ...
      

      Substitute <mgr-node-X> with the corresponding Machine object names and <role-X> with the corresponding roles of daemon placement, for example, mon or mgr.

  5. Configure Ceph OSD daemons for Ceph cluster data storage:

    Note

    This step involves the deployment of Ceph Monitor and Ceph Manager daemons on nodes that are different from the ones hosting Ceph cluster OSDs. However, you can also colocate Ceph OSD, Ceph Monitor, and Ceph Manager daemons on the same nodes by configuring the roles and storageDevices sections accordingly, as shown in the sketch below. This configuration flexibility is particularly useful in scenarios such as hyper-converged clusters.
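
    For example, a minimal sketch of a hyper-converged node entry that combines the roles and storageDevices sections on the same machine (the node name, symlink, and device class are placeholders):

      spec:
        cephClusterSpec:
          nodes:
            <hyperconverged-node-1>:
              roles:
              - mon
              - mgr
              storageDevices:
              - fullPath: <byIDSymlink-1>
                config:
                  deviceClass: <deviceClass-1>
    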

    Warning

    The minimal production cluster requires at least three nodes for Ceph Monitor daemons and three nodes for Ceph OSDs.

    1. Obtain the names of the machines with disks intended for storing Ceph data:

      kubectl -n <managedClusterProject> get machine
      
    2. For each machine, use status.providerStatus.hardware.storage to obtain information about node disks:

      kubectl -n <managedClusterProject> get machine <machineName> -o yaml
      

      Output example of the machine hardware details:

      status:
        providerStatus:
          hardware:
            storage:
            - byID: /dev/disk/by-id/wwn-0x05ad99618d66a21f
              byIDs:
              - /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_05ad99618d66a21f
              - /dev/disk/by-id/scsi-305ad99618d66a21f
              - /dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_05ad99618d66a21f
              - /dev/disk/by-id/wwn-0x05ad99618d66a21f
              byPath: /dev/disk/by-path/pci-0000:00:05.0-scsi-0:0:0:0
              byPaths:
              - /dev/disk/by-path/pci-0000:00:05.0-scsi-0:0:0:0
              name: /dev/sda
              serialNumber: 05ad99618d66a21f
              size: 61
              type: hdd
            - byID: /dev/disk/by-id/wwn-0x26d546263bd312b8
              byIDs:
              - /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_26d546263bd312b8
              - /dev/disk/by-id/scsi-326d546263bd312b8
              - /dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_26d546263bd312b8
              - /dev/disk/by-id/wwn-0x26d546263bd312b8
              byPath: /dev/disk/by-path/pci-0000:00:05.0-scsi-0:0:0:2
              byPaths:
              - /dev/disk/by-path/pci-0000:00:05.0-scsi-0:0:0:2
              name: /dev/sdb
              serialNumber: 26d546263bd312b8
              size: 32
              type: hdd
            - byID: /dev/disk/by-id/wwn-0x2e52abb48862dbdc
              byIDs:
              - /dev/disk/by-id/lvm-pv-uuid-MncrcO-6cel-0QsB-IKaY-e8UK-6gDy-k2hOtf
              - /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_2e52abb48862dbdc
              - /dev/disk/by-id/scsi-32e52abb48862dbdc
              - /dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_2e52abb48862dbdc
              - /dev/disk/by-id/wwn-0x2e52abb48862dbdc
              byPath: /dev/disk/by-path/pci-0000:00:05.0-scsi-0:0:0:1
              byPaths:
              - /dev/disk/by-path/pci-0000:00:05.0-scsi-0:0:0:1
              name: /dev/sdc
              serialNumber: 2e52abb48862dbdc
              size: 61
              type: hdd
      
    3. Select by-id symlinks for the disks to be used in the Ceph cluster. The symlinks must meet the following requirements:

      • A by-id symlink should contain status.providerStatus.hardware.storage.serialNumber

      • A by-id symlink should not contain wwn

      For the example above, if you want to use the sdc disk to store Ceph data, use the /dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_2e52abb48862dbdc symlink. It is persistent and is not affected by node reboots.
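
      For instance, a minimal sketch of a shell command to list the by-id symlinks of this disk directly on the node and filter out the wwn-based ones (the serial number is taken from the example output above):

      ls -l /dev/disk/by-id/ | grep 2e52abb48862dbdc | grep -v wwn
      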

    4. Specify by-id symlinks:

      Specify the selected by-id symlinks in the spec.cephClusterSpec.nodes.storageDevices.fullPath field along with the spec.cephClusterSpec.nodes.storageDevices.config.deviceClass field:

      spec:
        cephClusterSpec:
          nodes:
            <storage-node-1>:
              storageDevices:
              - fullPath: <byIDSymlink-1>
                config:
                  deviceClass: <deviceClass-1>
              - fullPath: <byIDSymlink-2>
                config:
                  deviceClass: <deviceClass-1>
              - fullPath: <byIDSymlink-3>
                config:
                  deviceClass: <deviceClass-2>
              ...
            <storage-node-2>:
              storageDevices:
              - fullPath: <byIDSymlink-4>
                config:
                  deviceClass: <deviceClass-1>
              - fullPath: <byIDSymlink-5>
                config:
                  deviceClass: <deviceClass-1>
              - fullPath: <byIDSymlink-6>
                config:
                  deviceClass: <deviceClass-2>
            <storage-node-3>:
              storageDevices:
              - fullPath: <byIDSymlink-7>
                config:
                  deviceClass: <deviceClass-1>
              - fullPath: <byIDSymlink-8>
                config:
                  deviceClass: <deviceClass-1>
              - fullPath: <byIDSymlink-9>
                config:
                  deviceClass: <deviceClass-2>
      

      Substitute the following values:

      • <storage-node-X> with the corresponding Machine object names

      • <byIDSymlink-X> with the obtained by-id symlinks from status.providerStatus.hardware.storage.byIDs

      • <deviceClass-X> with the obtained disk types from status.providerStatus.hardware.storage.type

      Alternatively, specify the selected by-id symlinks in the spec.cephClusterSpec.nodes.storageDevices.name field along with the spec.cephClusterSpec.nodes.storageDevices.config.deviceClass field:

      spec:
        cephClusterSpec:
          nodes:
            <storage-node-1>:
              storageDevices:
              - name: <byIDSymlink-1>
                config:
                  deviceClass: <deviceClass-1>
              - name: <byIDSymlink-2>
                config:
                  deviceClass: <deviceClass-1>
              - name: <byIDSymlink-3>
                config:
                  deviceClass: <deviceClass-2>
              ...
            <storage-node-2>:
              storageDevices:
              - name: <byIDSymlink-4>
                config:
                  deviceClass: <deviceClass-1>
              - name: <byIDSymlink-5>
                config:
                  deviceClass: <deviceClass-1>
              - name: <byIDSymlink-6>
                config:
                  deviceClass: <deviceClass-2>
            <storage-node-3>:
              storageDevices:
              - name: <byIDSymlink-7>
                config:
                  deviceClass: <deviceClass-1>
              - name: <byIDSymlink-8>
                config:
                  deviceClass: <deviceClass-1>
              - name: <byIDSymlink-9>
                config:
                  deviceClass: <deviceClass-2>
      

      Substitute the following values:

      • <storage-node-X> with the corresponding Machine object names

      • <byIDSymlink-X> with the obtained by-id symlinks from status.providerStatus.hardware.storage.byIDs

      • <deviceClass-X> with the obtained disk types from status.providerStatus.hardware.storage.type

  6. Optional. Configure Ceph Block Pools to use RBD. For the detailed configuration, refer to MOSK documentation: Ceph advanced configuration - Pool parameters.

    Example configuration:

    spec:
      cephClusterSpec:
        pools:
        - name: kubernetes
          role: kubernetes
          deviceClass: hdd
          replicated:
            size: 3
            targetSizeRatio: 10.0
          default: true
    
  7. Optional. Configure Ceph Object Storage to use RGW. For the detailed configuration, refer to MOSK documentation: Ceph advanced configuration - RADOS Gateway parameters.

    Example configuration:

    spec:
      cephClusterSpec:
        objectStorage:
          rgw:
            dataPool:
              deviceClass: hdd
              erasureCoded:
                codingChunks: 1
                dataChunks: 2
              failureDomain: host
            gateway:
              instances: 3
              port: 80
              securePort: 8443
            metadataPool:
              deviceClass: hdd
              failureDomain: host
              replicated:
                size: 3
            name: object-store
            preservePoolsOnDelete: false
    
  8. Optional. Configure Ceph Shared Filesystem to use CephFS. For the detailed configuration, refer to Enable Ceph Shared File System (CephFS).

    Example configuration:

    spec:
      cephClusterSpec:
        sharedFilesystem:
          cephFS:
          - name: cephfs-store
            dataPools:
            - name: cephfs-pool-1
              deviceClass: hdd
              replicated:
                size: 3
              failureDomain: host
            metadataPool:
              deviceClass: nvme
              replicated:
                size: 3
              failureDomain: host
            metadataServer:
              activeCount: 1
              activeStandby: false
    
  9. When the Ceph cluster specification is complete, apply the built YAML file on the management cluster:

    kubectl apply -f <kcc-template>.yaml
    

    Substitute <kcc-template> with the name of the file containing the KaaSCephCluster specification.

    The resulting example of the KaaSCephCluster template
    apiVersion: kaas.mirantis.com/v1alpha1
    kind: KaaSCephCluster
    metadata:
      name: kaas-ceph
      namespace: child-namespace
    spec:
      k8sCluster:
        name: child-cluster
        namespace: child-namespace
      cephClusterSpec:
        network:
          publicNet: 10.10.0.0/24
          clusterNet: 10.11.0.0/24
        nodes:
          master-1:
            roles:
            - mon
            - mgr
          master-2:
            roles:
            - mon
            - mgr
          master-3:
            roles:
            - mon
            - mgr
          worker-1:
            storageDevices:
            - fullPath: /dev/disk/by-id/scsi-1ATA_WDC_WDS100T2B0A-00SM50_200231443409
              config:
                deviceClass: ssd
          worker-2:
            storageDevices:
            - fullPath: /dev/disk/by-id/scsi-1ATA_WDC_WDS100T2B0A-00SM50_200231440912
              config:
                deviceClass: ssd
          worker-3:
            storageDevices:
            - fullPath: /dev/disk/by-id/scsi-1ATA_WDC_WDS100T2B0A-00SM50_200231434939
              config:
                deviceClass: ssd
        pools:
        - name: kubernetes
          role: kubernetes
          deviceClass: ssd
          replicated:
            size: 3
            targetSizeRatio: 10.0
          default: true
        objectStorage:
          rgw:
            dataPool:
              deviceClass: ssd
              erasureCoded:
                codingChunks: 1
                dataChunks: 2
              failureDomain: host
            gateway:
              instances: 3
              port: 80
              securePort: 8443
            metadataPool:
              deviceClass: ssd
              failureDomain: host
              replicated:
                size: 3
            name: object-store
            preservePoolsOnDelete: false
          sharedFilesystem:
            cephFS:
            - name: cephfs-store
              dataPools:
              - name: cephfs-pool-1
                deviceClass: ssd
                replicated:
                  size: 3
                failureDomain: host
              metadataPool:
                deviceClass: ssd
                replicated:
                  size: 3
                failureDomain: host
              metadataServer:
                activeCount: 1
                activeStandby: false
    
  10. Wait for the status.shortClusterInfo.state field of the KaaSCephCluster object to become Ready:

    kubectl -n <managedClusterProject> get kcc -o yaml
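
    A minimal sketch of the expected status fragment, with all other fields omitted:

    status:
      shortClusterInfo:
        state: Ready
    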
    
Example of a complete L2 templates configuration for cluster creation

The following example contains all required objects of an advanced network and host configuration for a baremetal-based managed cluster.

The procedure below contains:

  • Various .yaml objects to be applied with a managed cluster kubeconfig

  • Useful comments inside the .yaml example files

  • Example hardware and configuration data, such as network, disk, and authentication parameters, that you must update to fit your cluster configuration

  • Example templates, such as l2template and baremetalhostprofile, that illustrate how to implement a specific configuration

Caution

The example configuration described below is not production-ready and is provided for illustration purposes only.

For illustration purposes, all files provided in this example procedure are named after the Kubernetes object types:

Note

Before the update of the management cluster to Container Cloud 2.29.0 (Cluster release 16.4.0), use the BareMetalHost object instead of BareMetalHostInventory. For details, see BareMetalHost.

Caution

While the Cluster release of the management cluster is 16.4.0, BareMetalHostInventory operations are allowed only to m:kaas@management-admin. Once the management cluster is updated to the Cluster release 16.4.1 (or later), this limitation will be lifted.

managed-ns_BareMetalHostInventory_cz7700-managed-cluster-control-noefi.yaml
managed-ns_BareMetalHostInventory_cz7741-managed-cluster-control-noefi.yaml
managed-ns_BareMetalHostInventory_cz7743-managed-cluster-control-noefi.yaml
managed-ns_BareMetalHostInventory_cz812-managed-cluster-storage-worker-noefi.yaml
managed-ns_BareMetalHostInventory_cz813-managed-cluster-storage-worker-noefi.yaml
managed-ns_BareMetalHostInventory_cz814-managed-cluster-storage-worker-noefi.yaml
managed-ns_BareMetalHostInventory_cz815-managed-cluster-worker-noefi.yaml
managed-ns_BareMetalHostProfile_bmhp-cluster-default.yaml
managed-ns_BareMetalHostProfile_worker-storage1.yaml
managed-ns_Cluster_managed-cluster.yaml
managed-ns_KaaSCephCluster_ceph-cluster-managed-cluster.yaml
managed-ns_L2Template_bm-1490-template-controls-netplan-cz7700-pxebond.yaml
managed-ns_L2Template_bm-1490-template-controls-netplan.yaml
managed-ns_L2Template_bm-1490-template-workers-netplan.yaml
managed-ns_Machine_cz7700-managed-cluster-control-noefi-.yaml
managed-ns_Machine_cz7741-managed-cluster-control-noefi-.yaml
managed-ns_Machine_cz7743-managed-cluster-control-noefi-.yaml
managed-ns_Machine_cz812-managed-cluster-storage-worker-noefi-.yaml
managed-ns_Machine_cz813-managed-cluster-storage-worker-noefi-.yaml
managed-ns_Machine_cz814-managed-cluster-storage-worker-noefi-.yaml
managed-ns_Machine_cz815-managed-cluster-worker-noefi-.yaml
managed-ns_PublicKey_managed-cluster-key.yaml
managed-ns_cz7700-cred.yaml
managed-ns_cz7741-cred.yaml
managed-ns_cz7743-cred.yaml
managed-ns_cz812-cred.yaml
managed-ns_cz813-cred.yaml
managed-ns_cz814-cred.yaml
managed-ns_cz815-cred.yaml
managed-ns_Subnet_lcm-nw.yaml
managed-ns_Subnet_metallb-public-for-managed.yaml (obsolete)
managed-ns_Subnet_metallb-public-for-extiface.yaml
managed-ns_MetalLBConfig-lb-managed.yaml
managed-ns_MetalLBConfigTemplate-lb-managed-template.yaml (obsolete)
managed-ns_Subnet_storage-backend.yaml
managed-ns_Subnet_storage-frontend.yaml
default_Namespace_managed-ns.yaml

Caution

The procedure below assumes that you apply each new .yaml file using kubectl create -f <file_name.yaml>.
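
For example, to create the namespace object from the first file of this procedure:

  kubectl create -f default_Namespace_managed-ns.yaml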

To create an example configuration for a managed cluster creation:

  1. Verify that you have configured the following items:

    1. All bmh nodes for PXE boot as described in Add a bare metal host using CLI

    2. All physical NICs of the bmh nodes

    3. All required physical subnets and routing

  2. Create a .yaml file with the Namespace object (the managed-ns name matches the default_Namespace_managed-ns.yaml file listed above):

    apiVersion: v1
    kind: Namespace
    metadata:
      name: managed-ns
    
  3. Select from the following options:

    Create the required number of .yaml files with a BareMetalHostCredential object for each bmh node, each with a unique name and authentication data. The following example contains one BareMetalHostCredential object:

    Note

    The kaas.mirantis.com/region label is removed from all Container Cloud objects in 2.26.0 (Cluster releases 17.1.0 and 16.1.0). Therefore, do not add this label starting from these releases. On existing clusters updated to these releases, or if the label was added manually, Container Cloud ignores it.

    managed-ns_cz815-cred.yaml
    apiVersion: kaas.mirantis.com/v1alpha1
    kind: BareMetalHostCredential
    metadata:
      name: cz815-cred
      namespace: managed-ns
      labels:
        kaas.mirantis.com/region: region-one
    spec:
      username: admin
      password:
        value: supersecret
    

    Create the required number of .yaml files with a Secret object for each bmh node, each with a unique name and authentication data. The following example contains one Secret object:

    managed-ns_cz815-cred.yaml
    apiVersion: v1
    data:
      password: YWRtaW4=
      username: ZW5naW5lZXI=
    kind: Secret
    metadata:
      labels:
        kaas.mirantis.com/credentials: 'true'
        kaas.mirantis.com/provider: baremetal
        kaas.mirantis.com/region: region-one
      name: cz815-cred
      namespace: managed-ns
    
  4. Create a set of files with the bmh nodes configuration:

    • managed-ns_BareMetalHostInventory_cz7700-managed-cluster-control-noefi.yaml
      apiVersion: kaas.mirantis.com/v1alpha1
      kind: BareMetalHostInventory
      metadata:
        annotations:
          inspect.metal3.io/hardwaredetails-storage-sort-term: hctl ASC, wwn ASC, by_id ASC, name ASC
        labels:
          cluster.sigs.k8s.io/cluster-name: managed-cluster
          # this label is used to link the Machine object to the exact bmh node
          kaas.mirantis.com/baremetalhost-id: cz7700
          kaas.mirantis.com/provider: baremetal
        name: cz7700-managed-cluster-control-noefi
        namespace: managed-ns
      spec:
        bmc:
          address: 192.168.1.12
          bmhCredentialsName: 'cz7700-cred'
        bootMACAddress: 0c:c4:7a:34:52:04
        bootMode: legacy
        online: true
      
    • managed-ns_BareMetalHostInventory_cz7741-managed-cluster-control-noefi.yaml
      apiVersion: kaas.mirantis.com/v1alpha1
      kind: BareMetalHostInventory
      metadata:
        annotations:
          inspect.metal3.io/hardwaredetails-storage-sort-term: hctl ASC, wwn ASC, by_id ASC, name ASC
        labels:
          cluster.sigs.k8s.io/cluster-name: managed-cluster
          kaas.mirantis.com/baremetalhost-id: cz7741
          kaas.mirantis.com/provider: baremetal
        name: cz7741-managed-cluster-control-noefi
        namespace: managed-ns
      spec:
        bmc:
          address: 192.168.1.76
          bmhCredentialsName: 'cz7741-cred'
        bootMACAddress: 0c:c4:7a:34:92:f4
        bootMode: legacy
        online: true
      
    • managed-ns_BareMetalHostInventory_cz7743-managed-cluster-control-noefi.yaml
      apiVersion: kaas.mirantis.com/v1alpha1
      kind: BareMetalHostInventory
      metadata:
        annotations:
          inspect.metal3.io/hardwaredetails-storage-sort-term: hctl ASC, wwn ASC, by_id ASC, name ASC
        labels:
          cluster.sigs.k8s.io/cluster-name: managed-cluster
          kaas.mirantis.com/baremetalhost-id: cz7743
          kaas.mirantis.com/provider: baremetal
        name: cz7743-managed-cluster-control-noefi
        namespace: managed-ns
      spec:
        bmc:
          address: 192.168.1.78
          bmhCredentialsName: 'cz7743-cred'
        bootMACAddress: 0c:c4:7a:34:66:fc
        bootMode: legacy
        online: true
      
    • managed-ns_BareMetalHostInventory_cz812-managed-cluster-storage-worker-noefi.yaml
      apiVersion: kaas.mirantis.com/v1alpha1
      kind: BareMetalHostInventory
      metadata:
        annotations:
          inspect.metal3.io/hardwaredetails-storage-sort-term: hctl ASC, wwn ASC, by_id ASC, name ASC
        labels:
          cluster.sigs.k8s.io/cluster-name: managed-cluster
          kaas.mirantis.com/baremetalhost-id: cz812
          kaas.mirantis.com/provider: baremetal
        name: cz812-managed-cluster-storage-worker-noefi
        namespace: managed-ns
      spec:
        bmc:
          address: 192.168.1.182
          bmhCredentialsName: 'cz812-cred'
        bootMACAddress: 0c:c4:7a:bc:ff:2e
        bootMode: legacy
        online: true
      
    • managed-ns_BareMetalHostInventory_cz813-managed-cluster-storage-worker-noefi.yaml
      apiVersion: kaas.mirantis.com/v1alpha1
      kind: BareMetalHostInventory
      metadata:
        annotations:
          inspect.metal3.io/hardwaredetails-storage-sort-term: hctl ASC, wwn ASC, by_id ASC, name ASC
        labels:
          cluster.sigs.k8s.io/cluster-name: managed-cluster
          kaas.mirantis.com/baremetalhost-id: cz813
          kaas.mirantis.com/provider: baremetal
        name: cz813-managed-cluster-storage-worker-noefi
        namespace: managed-ns
      spec:
        bmc:
          address: 192.168.1.183
          bmhCredentialsName: 'cz813-cred'
        bootMACAddress: 0c:c4:7a:bc:fe:36
        bootMode: legacy
        online: true
      
    • managed-ns_BareMetalHostInventory_cz814-managed-cluster-storage-worker-noefi.yaml
      apiVersion: kaas.mirantis.com/v1alpha1
      kind: BareMetalHostInventory
      metadata:
        annotations:
          inspect.metal3.io/hardwaredetails-storage-sort-term: hctl ASC, wwn ASC, by_id ASC, name ASC
        labels:
          cluster.sigs.k8s.io/cluster-name: managed-cluster
          kaas.mirantis.com/baremetalhost-id: cz814
          kaas.mirantis.com/provider: baremetal
        name: cz814-managed-cluster-storage-worker-noefi
        namespace: managed-ns
      spec:
        bmc:
          address: 192.168.1.184
          bmhCredentialsName: 'cz814-cred'
        bootMACAddress: 0c:c4:7a:bc:fb:20
        bootMode: legacy
        online: true
      
    • managed-ns_BareMetalHostInventory_cz815-managed-cluster-worker-noefi.yaml
      apiVersion: kaas.mirantis.com/v1alpha1
      kind: BareMetalHostInventory
      metadata:
        annotations:
          inspect.metal3.io/hardwaredetails-storage-sort-term: hctl ASC, wwn ASC, by_id ASC, name ASC
        labels:
          cluster.sigs.k8s.io/cluster-name: managed-cluster
          kaas.mirantis.com/baremetalhost-id: cz815
          kaas.mirantis.com/provider: baremetal
        name: cz815-managed-cluster-worker-noefi
        namespace: managed-ns
      spec:
        bmc:
          address: 192.168.1.185
          bmhCredentialsName: 'cz815-cred'
        bootMACAddress: 0c:c4:7a:bc:fc:3e
        bootMode: legacy
        online: true
      
    • managed-ns_BareMetalHost_cz7700-managed-cluster-control-noefi.yaml
      apiVersion: metal3.io/v1alpha1
      kind: BareMetalHost
      metadata:
        labels:
          cluster.sigs.k8s.io/cluster-name: managed-cluster
          hostlabel.bm.kaas.mirantis.com/controlplane: controlplane
          # this label is used to link the Machine object to the exact bmh node
          kaas.mirantis.com/baremetalhost-id: cz7700
          kaas.mirantis.com/provider: baremetal
          kaas.mirantis.com/region: region-one
        annotations:
          kaas.mirantis.com/baremetalhost-credentials-name: cz7700-cred
        name: cz7700-managed-cluster-control-noefi
        namespace: managed-ns
      spec:
        bmc:
          address: 192.168.1.12
          # credentialsName is updated automatically during cluster deployment
          credentialsName: ''
        bootMACAddress: 0c:c4:7a:34:52:04
        bootMode: legacy
        online: true
      
    • managed-ns_BareMetalHost_cz7741-managed-cluster-control-noefi.yaml
      apiVersion: metal3.io/v1alpha1
      kind: BareMetalHost
      metadata:
        labels:
          cluster.sigs.k8s.io/cluster-name: managed-cluster
          hostlabel.bm.kaas.mirantis.com/controlplane: controlplane
          kaas.mirantis.com/baremetalhost-id: cz7741
          kaas.mirantis.com/provider: baremetal
          kaas.mirantis.com/region: region-one
        annotations:
          kaas.mirantis.com/baremetalhost-credentials-name: cz7741-cred
        name: cz7741-managed-cluster-control-noefi
        namespace: managed-ns
      spec:
        bmc:
          address: 192.168.1.76
          credentialsName: ''
        bootMACAddress: 0c:c4:7a:34:92:f4
        bootMode: legacy
        online: true
      
    • managed-ns_BareMetalHost_cz7743-managed-cluster-control-noefi.yaml
      apiVersion: metal3.io/v1alpha1
      kind: BareMetalHost
      metadata:
        labels:
          cluster.sigs.k8s.io/cluster-name: managed-cluster
          hostlabel.bm.kaas.mirantis.com/controlplane: controlplane
          kaas.mirantis.com/baremetalhost-id: cz7743
          kaas.mirantis.com/provider: baremetal
          kaas.mirantis.com/region: region-one
        annotations:
          kaas.mirantis.com/baremetalhost-credentials-name: cz7743-cred
        name: cz7743-managed-cluster-control-noefi
        namespace: managed-ns
      spec:
        bmc:
          address: 192.168.1.78
          credentialsName: ''
        bootMACAddress: 0c:c4:7a:34:66:fc
        bootMode: legacy
        online: true
      
    • managed-ns_BareMetalHost_cz812-managed-cluster-storage-worker-noefi.yaml
      apiVersion: metal3.io/v1alpha1
      kind: BareMetalHost
      metadata:
        labels:
          cluster.sigs.k8s.io/cluster-name: managed-cluster
          hostlabel.bm.kaas.mirantis.com/worker: worker
          kaas.mirantis.com/baremetalhost-id: cz812
          kaas.mirantis.com/provider: baremetal
          kaas.mirantis.com/region: region-one
        annotations:
          kaas.mirantis.com/baremetalhost-credentials-name: cz812-cred
        name: cz812-managed-cluster-storage-worker-noefi
        namespace: managed-ns
      spec:
        bmc:
          address: 192.168.1.182
          credentialsName: ''
        bootMACAddress: 0c:c4:7a:bc:ff:2e
        bootMode: legacy
        online: true
      
    • managed-ns_BareMetalHost_cz813-managed-cluster-storage-worker-noefi.yaml
      apiVersion: metal3.io/v1alpha1
      kind: BareMetalHost
      metadata:
        labels:
          cluster.sigs.k8s.io/cluster-name: managed-cluster
          hostlabel.bm.kaas.mirantis.com/worker: worker
          kaas.mirantis.com/baremetalhost-id: cz813
          kaas.mirantis.com/provider: baremetal
          kaas.mirantis.com/region: region-one
        annotations:
          kaas.mirantis.com/baremetalhost-credentials-name: cz813-cred
        name: cz813-managed-cluster-storage-worker-noefi
        namespace: managed-ns
      spec:
        bmc:
          address: 192.168.1.183
          credentialsName: ''
        bootMACAddress: 0c:c4:7a:bc:fe:36
        bootMode: legacy
        online: true
      
    • managed-ns_BareMetalHost_cz814-managed-cluster-storage-worker-noefi.yaml
      apiVersion: metal3.io/v1alpha1
      kind: BareMetalHost
      metadata:
        labels:
          cluster.sigs.k8s.io/cluster-name: managed-cluster
          hostlabel.bm.kaas.mirantis.com/worker: worker
          kaas.mirantis.com/baremetalhost-id: cz814
          kaas.mirantis.com/provider: baremetal
          kaas.mirantis.com/region: region-one
        annotations:
          kaas.mirantis.com/baremetalhost-credentials-name: cz814-cred
        name: cz814-managed-cluster-storage-worker-noefi
        namespace: managed-ns
      spec:
        bmc:
          address: 192.168.1.184
          credentialsName: ''
        bootMACAddress: 0c:c4:7a:bc:fb:20
        bootMode: legacy
        online: true
      
    • managed-ns_BareMetalHost_cz815-managed-cluster-worker-noefi.yaml
      apiVersion: metal3.io/v1alpha1
      kind: BareMetalHost
      metadata:
        labels:
          cluster.sigs.k8s.io/cluster-name: managed-cluster
          hostlabel.bm.kaas.mirantis.com/worker: worker
          kaas.mirantis.com/baremetalhost-id: cz815
          kaas.mirantis.com/provider: baremetal
          kaas.mirantis.com/region: region-one
        annotations:
          kaas.mirantis.com/baremetalhost-credentials-name: cz815-cred
        name: cz815-managed-cluster-worker-noefi
        namespace: managed-ns
      spec:
        bmc:
          address: 192.168.1.185
          credentialsName: ''
        bootMACAddress: 0c:c4:7a:bc:fc:3e
        bootMode: legacy
        online: true
      
    • managed-ns_BareMetalHost_cz7700-managed-cluster-control-noefi.yaml
      apiVersion: metal3.io/v1alpha1
      kind: BareMetalHost
      metadata:
        labels:
          cluster.sigs.k8s.io/cluster-name: managed-cluster
          hostlabel.bm.kaas.mirantis.com/controlplane: controlplane
          # this label is used to link the Machine object to the exact bmh node
          kaas.mirantis.com/baremetalhost-id: cz7700
          kaas.mirantis.com/provider: baremetal
          kaas.mirantis.com/region: region-one
        name: cz7700-managed-cluster-control-noefi
        namespace: managed-ns
      spec:
        bmc:
          address: 192.168.1.12
          # The secret for credentials requires the username and password
          # keys in the Base64 encoding.
          credentialsName: cz7700-cred
        bootMACAddress: 0c:c4:7a:34:52:04
        bootMode: legacy
        online: true
      
    • managed-ns_BareMetalHost_cz7741-managed-cluster-control-noefi.yaml
      apiVersion: metal3.io/v1alpha1
      kind: BareMetalHost
      metadata:
        labels:
          cluster.sigs.k8s.io/cluster-name: managed-cluster
          hostlabel.bm.kaas.mirantis.com/controlplane: controlplane
          kaas.mirantis.com/baremetalhost-id: cz7741
          kaas.mirantis.com/provider: baremetal
          kaas.mirantis.com/region: region-one
        name: cz7741-managed-cluster-control-noefi
        namespace: managed-ns
      spec:
        bmc:
          address: 192.168.1.76
          credentialsName: cz7741-cred
        bootMACAddress: 0c:c4:7a:34:92:f4
        bootMode: legacy
        online: true
      
    • managed-ns_BareMetalHost_cz7743-managed-cluster-control-noefi.yaml
      apiVersion: metal3.io/v1alpha1
      kind: BareMetalHost
      metadata:
        labels:
          cluster.sigs.k8s.io/cluster-name: managed-cluster
          hostlabel.bm.kaas.mirantis.com/controlplane: controlplane
          kaas.mirantis.com/baremetalhost-id: cz7743
          kaas.mirantis.com/provider: baremetal
          kaas.mirantis.com/region: region-one
        name: cz7743-managed-cluster-control-noefi
        namespace: managed-ns
      spec:
        bmc:
          address: 192.168.1.78
          credentialsName: cz7743-cred
        bootMACAddress: 0c:c4:7a:34:66:fc
        bootMode: legacy
        online: true
      
    • managed-ns_BareMetalHost_cz812-managed-cluster-storage-worker-noefi.yaml
      apiVersion: metal3.io/v1alpha1
      kind: BareMetalHost
      metadata:
        labels:
          cluster.sigs.k8s.io/cluster-name: managed-cluster
          hostlabel.bm.kaas.mirantis.com/worker: worker
          kaas.mirantis.com/baremetalhost-id: cz812
          kaas.mirantis.com/provider: baremetal
          kaas.mirantis.com/region: region-one
        name: cz812-managed-cluster-storage-worker-noefi
        namespace: managed-ns
      spec:
        bmc:
          address: 192.168.1.182
          credentialsName: cz812-cred
        bootMACAddress: 0c:c4:7a:bc:ff:2e
        bootMode: legacy
        online: true
      
    • managed-ns_BareMetalHost_cz813-managed-cluster-storage-worker-noefi.yaml
      apiVersion: metal3.io/v1alpha1
      kind: BareMetalHost
      metadata:
        labels:
          cluster.sigs.k8s.io/cluster-name: managed-cluster
          hostlabel.bm.kaas.mirantis.com/worker: worker
          kaas.mirantis.com/baremetalhost-id: cz813
          kaas.mirantis.com/provider: baremetal
          kaas.mirantis.com/region: region-one
        name: cz813-managed-cluster-storage-worker-noefi
        namespace: managed-ns
      spec:
        bmc:
          address: 192.168.1.183
          credentialsName: cz813-cred
        bootMACAddress: 0c:c4:7a:bc:fe:36
        bootMode: legacy
        online: true
      
    • managed-ns_BareMetalHost_cz814-managed-cluster-storage-worker-noefi.yaml
      apiVersion: metal3.io/v1alpha1
      kind: BareMetalHost
      metadata:
        labels:
          cluster.sigs.k8s.io/cluster-name: managed-cluster
          hostlabel.bm.kaas.mirantis.com/worker: worker
          kaas.mirantis.com/baremetalhost-id: cz814
          kaas.mirantis.com/provider: baremetal
          kaas.mirantis.com/region: region-one
        name: cz814-managed-cluster-storage-worker-noefi
        namespace: managed-ns
      spec:
        bmc:
          address: 192.168.1.184
          credentialsName: cz814-cred
        bootMACAddress: 0c:c4:7a:bc:fb:20
        bootMode: legacy
        online: true
      
    • managed-ns_BareMetalHost_cz815-managed-cluster-worker-noefi.yaml
      apiVersion: metal3.io/v1alpha1
      kind: BareMetalHost
      metadata:
        labels:
          cluster.sigs.k8s.io/cluster-name: managed-cluster
          hostlabel.bm.kaas.mirantis.com/worker: worker
          kaas.mirantis.com/baremetalhost-id: cz815
          kaas.mirantis.com/provider: baremetal
          kaas.mirantis.com/region: region-one
        name: cz815-managed-cluster-worker-noefi
        namespace: managed-ns
      spec:
        bmc:
          address: 192.168.1.185
          credentialsName: cz815-cred
        bootMACAddress: 0c:c4:7a:bc:fc:3e
        bootMode: legacy
        online: true
      
  5. Verify that the inspecting phase has started:

    KUBECONFIG=kubeconfig kubectl -n managed-ns get bmh -o wide
    

    Example of system response:

    NAME                                       STATUS STATE CONSUMER BMC           BOOTMODE ONLINE ERROR REGION
    cz7700-managed-cluster-control-noefi       OK     inspecting     192.168.1.12  legacy   true         region-one
    cz7741-managed-cluster-control-noefi       OK     inspecting     192.168.1.76  legacy   true         region-one
    cz7743-managed-cluster-control-noefi       OK     inspecting     192.168.1.78  legacy   true         region-one
    cz812-managed-cluster-storage-worker-noefi OK     inspecting     192.168.1.182 legacy   true         region-one
    

    Wait for inspection to complete. Usually, it takes up to 15 minutes.
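
    Optionally, watch the state changes until all hosts become ready, for example:

    KUBECONFIG=kubeconfig kubectl -n managed-ns get bmh -o wide -w
    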

  6. Collect the bmh hardware information to create the l2template and bmh objects:

    KUBECONFIG=kubeconfig kubectl -n managed-ns get bmh -o wide
    

    Example of system response:

    NAME                                       STATUS STATE CONSUMER BMC           BOOTMODE ONLINE ERROR REGION
    cz7700-managed-cluster-control-noefi       OK     ready          192.168.1.12  legacy   true         region-one
    cz7741-managed-cluster-control-noefi       OK     ready          192.168.1.76  legacy   true         region-one
    cz7743-managed-cluster-control-noefi       OK     ready          192.168.1.78  legacy   true         region-one
    cz812-managed-cluster-storage-worker-noefi OK     ready          192.168.1.182 legacy   true         region-one
    
    KUBECONFIG=kubeconfig kubectl -n managed-ns get bmh cz7700-managed-cluster-control-noefi -o yaml | less
    

    Example of system response:

    ..
    nics:
    - ip: ""
      mac: 0c:c4:7a:1d:f4:a6
      model: 0x8086 0x10fb
      # discovered interfaces
      name: ens4f0
      pxe: false
      # temporary PXE address discovered from baremetal-mgmt
    - ip: 172.16.170.30
      mac: 0c:c4:7a:34:52:04
      model: 0x8086 0x1521
      name: enp9s0f0
      pxe: true
      # duplicates temporary PXE address discovered from baremetal-mgmt
      # since we have fallback-bond configured on host
    - ip: 172.16.170.33
      mac: 0c:c4:7a:34:52:05
      model: 0x8086 0x1521
      # discovered interfaces
      name: enp9s0f1
      pxe: false
    ...
    storage:
    - by_path: /dev/disk/by-path/pci-0000:00:1f.2-ata-1
      model: Samsung SSD 850
      name: /dev/sda
      rotational: false
      sizeBytes: 500107862016
    - by_path: /dev/disk/by-path/pci-0000:00:1f.2-ata-2
      model: Samsung SSD 850
      name: /dev/sdb
      rotational: false
      sizeBytes: 500107862016
    ....
    
  7. Create bare metal host profiles:

    • managed-ns_BareMetalHostProfile_bmhp-cluster-default.yaml
      apiVersion: metal3.io/v1alpha1
      kind: BareMetalHostProfile
      metadata:
        labels:
          cluster.sigs.k8s.io/cluster-name: managed-cluster
          # This label indicates that this profile is the default one in
          # the namespace, so machines without an explicitly selected profile
          # use this template
          kaas.mirantis.com/defaultBMHProfile: 'true'
          kaas.mirantis.com/provider: baremetal
          kaas.mirantis.com/region: region-one
        name: bmhp-cluster-default
        namespace: managed-ns
      spec:
        devices:
        - device:
            byPath: /dev/disk/by-path/pci-0000:00:1f.2-ata-1
            minSize: 120Gi
            wipe: true
          partitions:
          - name: bios_grub
            partflags:
            - bios_grub
            size: 4Mi
            wipe: true
          - name: uefi
            partflags:
            - esp
            size: 200Mi
            wipe: true
          - name: config-2
            size: 64Mi
            wipe: true
          - name: lvm_dummy_part
            size: 1Gi
            wipe: true
          - name: lvm_root_part
            size: 0
            wipe: true
        - device:
            byPath: /dev/disk/by-path/pci-0000:00:1f.2-ata-2
            minSize: 30Gi
            wipe: true
        - device:
            byPath: /dev/disk/by-path/pci-0000:00:1f.2-ata-3
            minSize: 30Gi
            wipe: true
          partitions:
          - name: lvm_lvp_part
            size: 0
            wipe: true
        - device:
            byPath: /dev/disk/by-path/pci-0000:00:1f.2-ata-4
            wipe: true
        fileSystems:
        - fileSystem: vfat
          partition: config-2
        - fileSystem: vfat
          mountPoint: /boot/efi
          partition: uefi
        - fileSystem: ext4
          logicalVolume: root
          mountPoint: /
        - fileSystem: ext4
          logicalVolume: lvp
          mountPoint: /mnt/local-volumes/
        grubConfig:
          defaultGrubOptions:
          - GRUB_DISABLE_RECOVERY="true"
          - GRUB_PRELOAD_MODULES=lvm
          - GRUB_TIMEOUT=30
        kernelParameters:
          modules:
          - content: 'options kvm_intel nested=1'
            filename: kvm_intel.conf
          sysctl:
          # For the list of options prohibited to change, refer to
          # https://docs.mirantis.com/mke/3.7/install/predeployment/set-up-kernel-default-protections.html
            fs.aio-max-nr: '1048576'
            fs.file-max: '9223372036854775807'
            fs.inotify.max_user_instances: '4096'
            kernel.core_uses_pid: '1'
            kernel.dmesg_restrict: '1'
            net.ipv4.conf.all.rp_filter: '0'
            net.ipv4.conf.default.rp_filter: '0'
            net.ipv4.conf.k8s-ext.rp_filter: '0'
            net.ipv4.conf.m-pub.rp_filter: '0'
            vm.max_map_count: '262144'
        logicalVolumes:
        - name: root
          size: 0
          vg: lvm_root
        - name: lvp
          size: 0
          vg: lvm_lvp
        postDeployScript: |
          #!/bin/bash -ex
          # used for test-debug only!
          echo "root:r00tme" | sudo chpasswd
          echo 'ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="deadline"' > /etc/udev/rules.d/60-ssd-scheduler.rules
          echo $(date) 'post_deploy_script done' >> /root/post_deploy_done
      
        preDeployScript: |
          #!/bin/bash -ex
          echo "$(date) pre_deploy_script done" >> /root/pre_deploy_done
        volumeGroups:
        - devices:
          - partition: lvm_root_part
          name: lvm_root
        - devices:
          - partition: lvm_lvp_part
          name: lvm_lvp
        - devices:
          - partition: lvm_dummy_part
          # here we create an LVM volume group but do not format or mount it anywhere
          name: lvm_forawesomeapp
      
    • managed-ns_BareMetalHostProfile_worker-storage1.yaml
      apiVersion: metal3.io/v1alpha1
      kind: BareMetalHostProfile
      metadata:
        labels:
          cluster.sigs.k8s.io/cluster-name: managed-cluster
          kaas.mirantis.com/provider: baremetal
          kaas.mirantis.com/region: region-one
        name: worker-storage1
        namespace: managed-ns
      spec:
        devices:
        - device:
            minSize: 120Gi
            wipe: true
          partitions:
          - name: bios_grub
            partflags:
            - bios_grub
            size: 4Mi
            wipe: true
          - name: uefi
            partflags:
            - esp
            size: 200Mi
            wipe: true
          - name: config-2
            size: 64Mi
            wipe: true
          # Create a dummy partition without mounting it
          - name: lvm_dummy_part
            size: 1Gi
            wipe: true
          - name: lvm_root_part
            size: 0
            wipe: true
        - device:
            # Will be used for Ceph, so it must be wiped
            byPath: /dev/disk/by-path/pci-0000:00:1f.2-ata-1
            minSize: 30Gi
            wipe: true
        - device:
            byPath: /dev/disk/by-path/pci-0000:00:1f.2-ata-2
            minSize: 30Gi
            wipe: true
          partitions:
          - name: lvm_lvp_part
            size: 0
            wipe: true
        - device:
            byPath: /dev/disk/by-path/pci-0000:00:1f.2-ata-3
            wipe: true
        - device:
            byPath: /dev/disk/by-path/pci-0000:00:1f.2-ata-4
            minSize: 30Gi
            wipe: true
          partitions:
            - name: lvm_lvp_part_sdf
              wipe: true
              size: 0
        fileSystems:
        - fileSystem: vfat
          partition: config-2
        - fileSystem: vfat
          mountPoint: /boot/efi
          partition: uefi
        - fileSystem: ext4
          logicalVolume: root
          mountPoint: /
        - fileSystem: ext4
          logicalVolume: lvp
          mountPoint: /mnt/local-volumes/
        grubConfig:
          defaultGrubOptions:
          - GRUB_DISABLE_RECOVERY="true"
          - GRUB_PRELOAD_MODULES=lvm
          - GRUB_TIMEOUT=30
        kernelParameters:
          modules:
          - content: 'options kvm_intel nested=1'
            filename: kvm_intel.conf
          sysctl:
          # For the list of options prohibited to change, refer to
          # https://docs.mirantis.com/mke/3.6/install/predeployment/set-up-kernel-default-protections.html
            fs.aio-max-nr: '1048576'
            fs.file-max: '9223372036854775807'
            fs.inotify.max_user_instances: '4096'
            kernel.core_uses_pid: '1'
            kernel.dmesg_restrict: '1'
            net.ipv4.conf.all.rp_filter: '0'
            net.ipv4.conf.default.rp_filter: '0'
            net.ipv4.conf.k8s-ext.rp_filter: '0'
            net.ipv4.conf.m-pub.rp_filter: '0'
            vm.max_map_count: '262144'
        logicalVolumes:
        - name: root
          size: 0
          vg: lvm_root
        - name: lvp
          size: 0
          vg: lvm_lvp
        postDeployScript: |
      
          #!/bin/bash -ex
      
          # used for test-debug only! It allows the operator to log in via TTY.
          echo "root:r00tme" | sudo chpasswd
          # Just an example of enforcing "ssd" disks to use the "deadline" I/O scheduler.
          echo 'ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="deadline"' > /etc/udev/rules.d/60-ssd-scheduler.rules
          echo $(date) 'post_deploy_script done' >> /root/post_deploy_done
      
        preDeployScript: |
          #!/bin/bash -ex
          echo "$(date) pre_deploy_script done" >> /root/pre_deploy_done
      
        volumeGroups:
        - devices:
          - partition: lvm_root_part
          name: lvm_root
        - devices:
          - partition: lvm_lvp_part
          - partition: lvm_lvp_part_sdf
          name: lvm_lvp
        - devices:
          - partition: lvm_dummy_part
          name: lvm_forawesomeapp
      

    Note

    If you mount the /var directory, review Mounting recommendations for the /var directory before configuring BareMetalHostProfile.

  8. Create the L2Template objects:

    • managed-ns_L2Template_bm-1490-template-controls-netplan.yaml
      apiVersion: ipam.mirantis.com/v1alpha1
      kind: L2Template
      metadata:
        labels:
          bm-1490-template-controls-netplan: anymagicstring
          cluster.sigs.k8s.io/cluster-name: managed-cluster
          kaas.mirantis.com/provider: baremetal
          kaas.mirantis.com/region: region-one
        name: bm-1490-template-controls-netplan
        namespace: managed-ns
      spec:
        ifMapping:
        - enp9s0f0
        - enp9s0f1
        - eno1
        - ens3f1
        l3Layout:
        - scope: namespace
          subnetName: lcm-nw
        - scope: namespace
          subnetName: storage-frontend
        - scope: namespace
          subnetName: storage-backend
        - scope: namespace
          subnetName: metallb-public-for-extiface
        npTemplate: |-
          version: 2
          ethernets:
            {{nic 0}}:
              dhcp4: false
              dhcp6: false
              match:
                macaddress: {{mac 0}}
              set-name: {{nic 0}}
              mtu: 1500
            {{nic 1}}:
              dhcp4: false
              dhcp6: false
              match:
                macaddress: {{mac 1}}
              set-name: {{nic 1}}
              mtu: 1500
            {{nic 2}}:
              dhcp4: false
              dhcp6: false
              match:
                macaddress: {{mac 2}}
              set-name: {{nic 2}}
              mtu: 1500
            {{nic 3}}:
              dhcp4: false
              dhcp6: false
              match:
                macaddress: {{mac 3}}
              set-name: {{nic 3}}
              mtu: 1500
          bonds:
            bond0:
              parameters:
                mode: 802.3ad
                #transmit-hash-policy: layer3+4
                #mii-monitor-interval: 100
              interfaces:
                - {{ nic 0 }}
                - {{ nic 1 }}
            bond1:
              parameters:
                mode: 802.3ad
                #transmit-hash-policy: layer3+4
                #mii-monitor-interval: 100
              interfaces:
                - {{ nic 2 }}
                - {{ nic 3 }}
          vlans:
            stor-f:
              id: 1494
              link: bond1
              addresses:
                - {{ip "stor-f:storage-frontend"}}
            stor-b:
              id: 1489
              link: bond1
              addresses:
                - {{ip "stor-b:storage-backend"}}
            m-pub:
              id: 1491
              link: bond0
          bridges:
            k8s-ext:
              interfaces: [m-pub]
              addresses:
                - {{ ip "k8s-ext:metallb-public-for-extiface" }}
            k8s-lcm:
              dhcp4: false
              dhcp6: false
              gateway4: {{ gateway_from_subnet "lcm-nw" }}
              addresses:
                - {{ ip "k8s-lcm:lcm-nw" }}
              nameservers:
                addresses: [ 172.18.176.6 ]
              interfaces:
                - bond0
      
    • managed-ns_L2Template_bm-1490-template-workers-netplan.yaml
      apiVersion: ipam.mirantis.com/v1alpha1
      kind: L2Template
      metadata:
        labels:
          bm-1490-template-workers-netplan: anymagicstring
          cluster.sigs.k8s.io/cluster-name: managed-cluster
          kaas.mirantis.com/provider: baremetal
          kaas.mirantis.com/region: region-one
        name: bm-1490-template-workers-netplan
        namespace: managed-ns
      spec:
        ifMapping:
        - eno1
        - eno2
        - ens7f0
        - ens7f1
        l3Layout:
        - scope: namespace
          subnetName: lcm-nw
        - scope: namespace
          subnetName: storage-frontend
        - scope: namespace
          subnetName: storage-backend
        - scope: namespace
          subnetName: metallb-public-for-extiface
        npTemplate: |-
          version: 2
          ethernets:
            {{nic 0}}:
              match:
                macaddress: {{mac 0}}
              set-name: {{nic 0}}
              mtu: 1500
            {{nic 1}}:
              dhcp4: false
              dhcp6: false
              match:
                macaddress: {{mac 1}}
              set-name: {{nic 1}}
              mtu: 1500
            {{nic 2}}:
              dhcp4: false
              dhcp6: false
              match:
                macaddress: {{mac 2}}
              set-name: {{nic 2}}
              mtu: 1500
            {{nic 3}}:
              dhcp4: false
              dhcp6: false
              match:
                macaddress: {{mac 3}}
              set-name: {{nic 3}}
              mtu: 1500
          bonds:
            bond0:
              interfaces:
                - {{ nic 1 }}
            bond1:
              parameters:
                mode: 802.3ad
                #transmit-hash-policy: layer3+4
                #mii-monitor-interval: 100
              interfaces:
                - {{ nic 2 }}
                - {{ nic 3 }}
          vlans:
            stor-f:
              id: 1494
              link: bond1
              addresses:
                - {{ip "stor-f:storage-frontend"}}
            stor-b:
              id: 1489
              link: bond1
              addresses:
                - {{ip "stor-b:storage-backend"}}
            m-pub:
              id: 1491
              link: {{ nic 1 }}
          bridges:
            k8s-lcm:
              interfaces:
                - {{ nic 0 }}
              gateway4: {{ gateway_from_subnet "lcm-nw" }}
              addresses:
                - {{ ip "k8s-lcm:lcm-nw" }}
              nameservers:
                addresses: [ 172.18.176.6 ]
            k8s-ext:
              interfaces: [m-pub]
      
    • managed-ns_L2Template_bm-1490-template-controls-netplan-cz7700-pxebond.yaml
      apiVersion: ipam.mirantis.com/v1alpha1
      kind: L2Template
      metadata:
        labels:
          bm-1490-template-controls-netplan-cz7700-pxebond: anymagicstring
          cluster.sigs.k8s.io/cluster-name: managed-cluster
          kaas.mirantis.com/provider: baremetal
          kaas.mirantis.com/region: region-one
        name: bm-1490-template-controls-netplan-cz7700-pxebond
        namespace: managed-ns
      spec:
        ifMapping:
        - enp9s0f0
        - enp9s0f1
        - eno1
        - ens3f1
        l3Layout:
        - scope: namespace
          subnetName: lcm-nw
        - scope: namespace
          subnetName: storage-frontend
        - scope: namespace
          subnetName: storage-backend
        - scope: namespace
          subnetName: metallb-public-for-extiface
        npTemplate: |-
          version: 2
          ethernets:
            {{nic 0}}:
              dhcp4: false
              dhcp6: false
              match:
                macaddress: {{mac 0}}
              set-name: {{nic 0}}
              mtu: 1500
            {{nic 1}}:
              dhcp4: false
              dhcp6: false
              match:
                macaddress: {{mac 1}}
              set-name: {{nic 1}}
              mtu: 1500
            {{nic 2}}:
              dhcp4: false
              dhcp6: false
              match:
                macaddress: {{mac 2}}
              set-name: {{nic 2}}
              mtu: 1500
            {{nic 3}}:
              dhcp4: false
              dhcp6: false
              match:
                macaddress: {{mac 3}}
              set-name: {{nic 3}}
              mtu: 1500
          bonds:
            bond0:
              parameters:
                mode: 802.3ad
                #transmit-hash-policy: layer3+4
                #mii-monitor-interval: 100
              interfaces:
                - {{ nic 0 }}
                - {{ nic 1 }}
            bond1:
              parameters:
                mode: 802.3ad
                #transmit-hash-policy: layer3+4
                #mii-monitor-interval: 100
              interfaces:
                - {{ nic 2 }}
                - {{ nic 3 }}
          vlans:
            stor-f:
              id: 1494
              link: bond1
              addresses:
                - {{ip "stor-f:storage-frontend"}}
            stor-b:
              id: 1489
              link: bond1
              addresses:
                - {{ip "stor-b:storage-backend"}}
            m-pub:
              id: 1491
              link: bond0
          bridges:
            k8s-ext:
              interfaces: [m-pub]
              addresses:
                - {{ ip "k8s-ext:metallb-public-for-extiface" }}
            k8s-lcm:
              dhcp4: false
              dhcp6: false
              gateway4: {{ gateway_from_subnet "lcm-nw" }}
              addresses:
                - {{ ip "k8s-lcm:lcm-nw" }}
              nameservers:
                addresses: [ 172.18.176.6 ]
              interfaces:
                - bond0
      
  9. Create the Subnet objects:

    • managed-ns_Subnet_lcm-nw.yaml
      apiVersion: ipam.mirantis.com/v1alpha1
      kind: Subnet
      metadata:
        labels:
          cluster.sigs.k8s.io/cluster-name: managed-cluster
          ipam/SVC-k8s-lcm: '1'
          kaas.mirantis.com/provider: baremetal
          kaas.mirantis.com/region: region-one
        name: lcm-nw
        namespace: managed-ns
      spec:
        cidr: 172.16.170.0/24
        excludeRanges:
        - 172.16.170.150
        gateway: 172.16.170.1
        includeRanges:
        - 172.16.170.150-172.16.170.250
      
    • managed-ns_Subnet_metallb-public-for-extiface.yaml
      apiVersion: ipam.mirantis.com/v1alpha1
      kind: Subnet
      metadata:
        labels:
          cluster.sigs.k8s.io/cluster-name: managed-cluster
          kaas.mirantis.com/provider: baremetal
          kaas.mirantis.com/region: region-one
        name: metallb-public-for-extiface
        namespace: managed-ns
      spec:
        cidr: 172.16.168.0/24
        gateway: 172.16.168.1
        includeRanges:
        - 172.16.168.10-172.16.168.30
      
    • managed-ns_Subnet_storage-backend.yaml
      apiVersion: ipam.mirantis.com/v1alpha1
      kind: Subnet
      metadata:
        labels:
          cluster.sigs.k8s.io/cluster-name: managed-cluster
          ipam/SVC-ceph-cluster: '1'
          kaas.mirantis.com/provider: baremetal
          kaas.mirantis.com/region: region-one
        name: storage-backend
        namespace: managed-ns
      spec:
        cidr: 10.12.0.0/24
      
    • managed-ns_Subnet_storage-frontend.yaml
      apiVersion: ipam.mirantis.com/v1alpha1
      kind: Subnet
      metadata:
        labels:
          cluster.sigs.k8s.io/cluster-name: managed-cluster
          ipam/SVC-ceph-public: '1'
          kaas.mirantis.com/provider: baremetal
          kaas.mirantis.com/region: region-one
        name: storage-frontend
        namespace: managed-ns
      spec:
        cidr: 10.12.1.0/24
      
  10. Create MetalLB configuration objects:

    • Since Container Cloud 2.27.0 (Cluster releases 17.2.0 and 16.2.0):

      managed-ns_MetalLBConfig-lb-managed.yaml
      apiVersion: kaas.mirantis.com/v1alpha1
      kind: MetalLBConfig
      metadata:
        labels:
          cluster.sigs.k8s.io/cluster-name: managed-cluster
          kaas.mirantis.com/provider: baremetal
          kaas.mirantis.com/region: region-one
        name: lb-managed
        namespace: managed-ns
      spec:
        ipAddressPools:
        - name: services
          spec:
            addresses:
            - 10.100.91.151-10.100.91.170
            autoAssign: true
            avoidBuggyIPs: false
        l2Advertisements:
        - name: services
          spec:
            ipAddressPools:
            - services
      
    • Before Container Cloud 2.27.0 (Cluster releases earlier than 17.2.0 and 16.2.0):

      • managed-ns_Subnet_metallb-public-for-managed.yaml
        apiVersion: ipam.mirantis.com/v1alpha1
        kind: Subnet
        metadata:
          labels:
            cluster.sigs.k8s.io/cluster-name: managed-cluster
            ipam/SVC-MetalLB: '1'
            kaas.mirantis.com/provider: baremetal
            kaas.mirantis.com/region: region-one
          name: metallb-public-for-managed
          namespace: managed-ns
        spec:
          cidr: 172.16.168.0/24
          includeRanges:
          - 172.16.168.31-172.16.168.50
        
      • managed-ns_MetalLBConfig-lb-managed.yaml

        Note

        Applies since Container Cloud 2.21.0 and 2.21.1 for MOSK as TechPreview, and since 2.24.0 as GA for management clusters. For managed clusters, it is generally available since Container Cloud 2.25.0.

        apiVersion: kaas.mirantis.com/v1alpha1
        kind: MetalLBConfig
        metadata:
          labels:
            cluster.sigs.k8s.io/cluster-name: managed-cluster
            kaas.mirantis.com/provider: baremetal
            kaas.mirantis.com/region: region-one
          name: lb-managed
          namespace: managed-ns
        spec:
          templateName: lb-managed-template
        
      • managed-ns_MetalLBConfigTemplate-lb-managed-template.yaml

        Note

        The MetalLBConfigTemplate object is available as Technology Preview since Container Cloud 2.24.0 and is generally available since Container Cloud 2.25.0.

        apiVersion: ipam.mirantis.com/v1alpha1
        kind: MetalLBConfigTemplate
        metadata:
          labels:
            cluster.sigs.k8s.io/cluster-name: managed-cluster
            kaas.mirantis.com/provider: baremetal
            kaas.mirantis.com/region: region-one
          name: lb-managed-template
          namespace: managed-ns
        spec:
          templates:
            l2Advertisements: |
              - name: services
                spec:
                  ipAddressPools:
                    - services
        
    • Before Container Cloud 2.24.0 (Cluster releases earlier than 14.0.0):

      managed-ns_Subnet_metallb-public-for-managed.yaml
      apiVersion: ipam.mirantis.com/v1alpha1
      kind: Subnet
      metadata:
        labels:
          cluster.sigs.k8s.io/cluster-name: managed-cluster
          ipam/SVC-MetalLB: '1'
          kaas.mirantis.com/provider: baremetal
          kaas.mirantis.com/region: region-one
        name: metallb-public-for-managed
        namespace: managed-ns
      spec:
        cidr: 172.16.168.0/24
        includeRanges:
        - 172.16.168.31-172.16.168.50
      
  11. Create the PublicKey object for a managed cluster connection. For details, see Public key resources.

    managed-ns_PublicKey_managed-cluster-key.yaml
    apiVersion: kaas.mirantis.com/v1alpha1
    kind: PublicKey
    metadata:
      name: managed-cluster-key
      namespace: managed-ns
    spec:
      publicKey: ssh-rsa AAEXAMPLEXXX
    
  12. Create the Cluster object. For details, see Cluster resources.

    managed-ns_Cluster_managed-cluster.yaml
    apiVersion: cluster.k8s.io/v1alpha1
    kind: Cluster
    metadata:
      annotations:
        kaas.mirantis.com/lcm: 'true'
      labels:
        kaas.mirantis.com/provider: baremetal
        kaas.mirantis.com/region: region-one
      name: managed-cluster
      namespace: managed-ns
    spec:
      clusterNetwork:
        pods:
          cidrBlocks:
          - 192.169.0.0/16
        serviceDomain: ''
        services:
          cidrBlocks:
          - 10.232.0.0/18
      providerSpec:
        value:
          apiVersion: baremetal.k8s.io/v1alpha1
          dedicatedControlPlane: false
          helmReleases:
          - name: ceph-controller
          - enabled: true
            name: stacklight
            values:
              alertmanagerSimpleConfig:
                email:
                  enabled: false
                slack:
                  enabled: false
              highAvailabilityEnabled: false
              logging:
                enabled: false
                persistentVolumeClaimSize: 30Gi
              prometheusServer:
                customAlerts: []
                persistentVolumeClaimSize: 16Gi
                retentionSize: 15GB
                retentionTime: 15d
                watchDogAlertEnabled: false
          - name: metallb
            values: {}
          kind: BaremetalClusterProviderSpec
          loadBalancerHost: 172.16.168.3
          publicKeys:
          - name: managed-cluster-key
          region: region-one
          release: mke-5-16-0-3-3-6
    
  13. Create the Machine objects linked to each bmh node. For details, see Machine resources.

    • managed-ns_Machine_cz7700-managed-cluster-control-noefi-.yaml
      apiVersion: cluster.k8s.io/v1alpha1
      kind: Machine
      metadata:
        generateName: cz7700-managed-cluster-control-noefi-
        labels:
          cluster.sigs.k8s.io/cluster-name: managed-cluster
          cluster.sigs.k8s.io/control-plane: controlplane
          hostlabel.bm.kaas.mirantis.com/controlplane: controlplane
          kaas.mirantis.com/provider: baremetal
          kaas.mirantis.com/region: region-one
        namespace: managed-ns
      spec:
        providerSpec:
          value:
            apiVersion: baremetal.k8s.io/v1alpha1
            hostSelector:
              matchLabels:
                kaas.mirantis.com/baremetalhost-id: cz7700
            kind: BareMetalMachineProviderSpec
            l2TemplateSelector:
              label: bm-1490-template-controls-netplan-cz7700-pxebond
            publicKeys:
            - name: managed-cluster-key
      
    • managed-ns_Machine_cz7741-managed-cluster-control-noefi-.yaml
      apiVersion: cluster.k8s.io/v1alpha1
      kind: Machine
      metadata:
        generateName: cz7741-managed-cluster-control-noefi-
        labels:
          cluster.sigs.k8s.io/cluster-name: managed-cluster
          cluster.sigs.k8s.io/control-plane: controlplane
          hostlabel.bm.kaas.mirantis.com/controlplane: controlplane
          kaas.mirantis.com/provider: baremetal
          kaas.mirantis.com/region: region-one
        namespace: managed-ns
      spec:
        providerSpec:
          value:
            apiVersion: baremetal.k8s.io/v1alpha1
            bareMetalHostProfile:
              name: bmhp-cluster-default
              namespace: managed-ns
            hostSelector:
              matchLabels:
                kaas.mirantis.com/baremetalhost-id: cz7741
            kind: BareMetalMachineProviderSpec
            l2TemplateSelector:
              label: bm-1490-template-controls-netplan
            publicKeys:
            - name: managed-cluster-key
      
    • managed-ns_Machine_cz7743-managed-cluster-control-noefi-.yaml
      apiVersion: cluster.k8s.io/v1alpha1
      kind: Machine
      metadata:
        generateName: cz7743-managed-cluster-control-noefi-
        labels:
          cluster.sigs.k8s.io/cluster-name: managed-cluster
          cluster.sigs.k8s.io/control-plane: controlplane
          hostlabel.bm.kaas.mirantis.com/controlplane: controlplane
          kaas.mirantis.com/provider: baremetal
          kaas.mirantis.com/region: region-one
        namespace: managed-ns
      spec:
        providerSpec:
          value:
            apiVersion: baremetal.k8s.io/v1alpha1
            bareMetalHostProfile:
              name: bmhp-cluster-default
              namespace: managed-ns
            hostSelector:
              matchLabels:
                kaas.mirantis.com/baremetalhost-id: cz7743
            kind: BareMetalMachineProviderSpec
            l2TemplateSelector:
              label: bm-1490-template-controls-netplan
            publicKeys:
            - name: managed-cluster-key
      
    • managed-ns_Machine_cz812-managed-cluster-storage-worker-noefi-.yaml
      apiVersion: cluster.k8s.io/v1alpha1
      kind: Machine
      metadata:
        generateName: cz812-managed-cluster-storage-worker-noefi-
        labels:
          cluster.sigs.k8s.io/cluster-name: managed-cluster
          hostlabel.bm.kaas.mirantis.com/storage: storage
          hostlabel.bm.kaas.mirantis.com/worker: worker
          kaas.mirantis.com/provider: baremetal
          kaas.mirantis.com/region: region-one
        namespace: managed-ns
      spec:
        providerSpec:
          value:
            apiVersion: baremetal.k8s.io/v1alpha1
            bareMetalHostProfile:
              name: worker-storage1
              namespace: managed-ns
            hostSelector:
              matchLabels:
                kaas.mirantis.com/baremetalhost-id: cz812
            kind: BareMetalMachineProviderSpec
            l2TemplateSelector:
              label: bm-1490-template-workers-netplan
            publicKeys:
            - name: managed-cluster-key
      
    • managed-ns_Machine_cz813-managed-cluster-storage-worker-noefi-.yaml
      apiVersion: cluster.k8s.io/v1alpha1
      kind: Machine
      metadata:
        generateName: cz813-managed-cluster-storage-worker-noefi-
        labels:
          cluster.sigs.k8s.io/cluster-name: managed-cluster
          hostlabel.bm.kaas.mirantis.com/storage: storage
          hostlabel.bm.kaas.mirantis.com/worker: worker
          kaas.mirantis.com/provider: baremetal
          kaas.mirantis.com/region: region-one
        namespace: managed-ns
      spec:
        providerSpec:
          value:
            apiVersion: baremetal.k8s.io/v1alpha1
            bareMetalHostProfile:
              name: worker-storage1
              namespace: managed-ns
            hostSelector:
              matchLabels:
                kaas.mirantis.com/baremetalhost-id: cz813
            kind: BareMetalMachineProviderSpec
            l2TemplateSelector:
              label: bm-1490-template-workers-netplan
            publicKeys:
            - name: managed-cluster-key
      
    • managed-ns_Machine_cz814-managed-cluster-storage-worker-noefi-.yaml
      apiVersion: cluster.k8s.io/v1alpha1
      kind: Machine
      metadata:
        generateName: cz814-managed-cluster-storage-worker-noefi-
        labels:
          cluster.sigs.k8s.io/cluster-name: managed-cluster
          hostlabel.bm.kaas.mirantis.com/storage: storage
          hostlabel.bm.kaas.mirantis.com/worker: worker
          kaas.mirantis.com/provider: baremetal
          kaas.mirantis.com/region: region-one
        namespace: managed-ns
      spec:
        providerSpec:
          value:
            apiVersion: baremetal.k8s.io/v1alpha1
            bareMetalHostProfile:
              name: worker-storage1
              namespace: managed-ns
            hostSelector:
              matchLabels:
                kaas.mirantis.com/baremetalhost-id: cz814
            kind: BareMetalMachineProviderSpec
            l2TemplateSelector:
              label: bm-1490-template-workers-netplan
            publicKeys:
            - name: managed-cluster-key
      
    • managed-ns_Machine_cz815-managed-cluster-worker-noefi-.yaml
      apiVersion: cluster.k8s.io/v1alpha1
      kind: Machine
      metadata:
        generateName: cz815-managed-cluster-worker-noefi-
        labels:
          cluster.sigs.k8s.io/cluster-name: managed-cluster
          hostlabel.bm.kaas.mirantis.com/worker: worker
          kaas.mirantis.com/provider: baremetal
          kaas.mirantis.com/region: region-one
          si-role/node-for-delete: 'true'
        namespace: managed-ns
      spec:
        providerSpec:
          value:
            apiVersion: baremetal.k8s.io/v1alpha1
            bareMetalHostProfile:
              name: worker-storage1
              namespace: managed-ns
            hostSelector:
              matchLabels:
                kaas.mirantis.com/baremetalhost-id: cz815
            kind: BareMetalMachineProviderSpec
            l2TemplateSelector:
              label: bm-1490-template-workers-netplan
            publicKeys:
            - name: managed-cluster-key
      
  14. Verify that the bmh nodes are in the provisioning state:

    KUBECONFIG=kubeconfig kubectl -n managed-ns get bmh -o wide
    

    Example of system response:

    NAME                                  STATUS STATE          CONSUMER                                    BMC          BOOTMODE   ONLINE  ERROR REGION
    cz7700-managed-cluster-control-noefi  OK     provisioning   cz7700-managed-cluster-control-noefi-8bkqw  192.168.1.12  legacy     true          region-one
    cz7741-managed-cluster-control-noefi  OK     provisioning   cz7741-managed-cluster-control-noefi-42tp2  192.168.1.76  legacy     true          region-one
    cz7743-managed-cluster-control-noefi  OK     provisioning   cz7743-managed-cluster-control-noefi-8cwpw  192.168.1.78  legacy     true          region-one
    ...
    

    Wait until all bmh nodes are in the provisioned state.
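
    To watch the provisioning progress, you can, for example, stream updates until the STATE column changes to provisioned (the -w flag of kubectl get keeps the listing open):

    KUBECONFIG=kubeconfig kubectl -n managed-ns get bmh -o wide -w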

  15. Verify that the lcmmachine phase has started:

    KUBECONFIG=kubeconfig kubectl -n managed-ns get lcmmachines -o wide
    

    Example of system response:

    NAME                                       CLUSTERNAME       TYPE      STATE   INTERNALIP     HOSTNAME                                         AGENTVERSION
    cz7700-managed-cluster-control-noefi-8bkqw managed-cluster   control   Deploy  172.16.170.153 kaas-node-803721b4-227c-4675-acc5-15ff9d3cfde2   v0.2.0-349-g4870b7f5
    cz7741-managed-cluster-control-noefi-42tp2 managed-cluster   control   Prepare 172.16.170.152 kaas-node-6b8f0d51-4c5e-43c5-ac53-a95988b1a526   v0.2.0-349-g4870b7f5
    cz7743-managed-cluster-control-noefi-8cwpw managed-cluster   control   Prepare 172.16.170.151 kaas-node-e9b7447d-5010-439b-8c95-3598518f8e0a   v0.2.0-349-g4870b7f5
    ...
    
  16. Verify that the lcmmachine phase is complete and the Kubernetes cluster is created:

    KUBECONFIG=kubeconfig kubectl -n managed-ns get lcmmachines -o wide
    

    Example of system response:

    NAME                                       CLUSTERNAME       TYPE     STATE  INTERNALIP      HOSTNAME                                        AGENTVERSION
    cz7700-managed-cluster-control-noefi-8bkqw  managed-cluster  control  Ready  172.16.170.153  kaas-node-803721b4-227c-4675-acc5-15ff9d3cfde2  v0.2.0-349-g4870b7f5
    cz7741-managed-cluster-control-noefi-42tp2  managed-cluster  control  Ready  172.16.170.152  kaas-node-6b8f0d51-4c5e-43c5-ac53-a95988b1a526  v0.2.0-349-g4870b7f5
    cz7743-managed-cluster-control-noefi-8cwpw  managed-cluster  control  Ready  172.16.170.151  kaas-node-e9b7447d-5010-439b-8c95-3598518f8e0a  v0.2.0-349-g4870b7f5
    ...
    
  17. Create the KaaSCephCluster object:

    managed-ns_KaaSCephCluster_ceph-cluster-managed-cluster.yaml
    apiVersion: kaas.mirantis.com/v1alpha1
    kind: KaaSCephCluster
    metadata:
      name: ceph-cluster-managed-cluster
      namespace: managed-ns
    spec:
      cephClusterSpec:
        nodes:
          # Add the exact node names.
          # Obtain the names from the CONSUMER field of the "kubectl get bmh -o wide" output.
          cz812-managed-cluster-storage-worker-noefi-58spl:
            roles:
            - mgr
            - mon
            # All disk configuration must be reflected in the corresponding BareMetalHostProfile object.
            storageDevices:
            - config:
                deviceClass: ssd
              fullPath: /dev/disk/by-id/scsi-1ATA_WDC_WDS100T2B0A-00SM50_200231434939
          cz813-managed-cluster-storage-worker-noefi-lr4k4:
            roles:
            - mgr
            - mon
            storageDevices:
            - config:
                deviceClass: ssd
              fullPath: /dev/disk/by-id/scsi-1ATA_WDC_WDS100T2B0A-00SM50_200231440912
          cz814-managed-cluster-storage-worker-noefi-z2m67:
            roles:
            - mgr
            - mon
            storageDevices:
            - config:
                deviceClass: ssd
              fullPath: /dev/disk/by-id/scsi-1ATA_WDC_WDS100T2B0A-00SM50_200231443409
        pools:
        - default: true
          deviceClass: ssd
          name: kubernetes
          replicated:
            size: 3
          role: kubernetes
      k8sCluster:
        name: managed-cluster
        namespace: managed-ns
    

    Note

    The storageDevices[].fullPath field is available since Container Cloud 2.25.0. For the clusters running earlier product versions, define the /dev/disk/by-id symlinks using storageDevices[].name instead.
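
    For example, on clusters running earlier product versions, the same device could be described as follows. This is a sketch only: the storageDevices[].name field comes from the note above, while the exact value format (device name or by-id symlink, with or without the /dev/disk/by-id/ prefix) depends on your product version and must be verified against its API reference.

    storageDevices:
    - config:
        deviceClass: ssd
      name: scsi-1ATA_WDC_WDS100T2B0A-00SM50_200231434939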

  18. Obtain kubeconfig of the newly created managed cluster:

    KUBECONFIG=kubeconfig kubectl -n managed-ns get secrets managed-cluster-kubeconfig -o jsonpath='{.data.admin\.conf}' | base64 -d | tee managed.kubeconfig
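
    You can, for example, verify access to the managed cluster using the obtained kubeconfig:

    KUBECONFIG=managed.kubeconfig kubectl get nodes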
    
  19. Verify the status of the Ceph cluster in your managed cluster:

    KUBECONFIG=managed.kubeconfig kubectl -n rook-ceph exec -it $(KUBECONFIG=managed.kubeconfig kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') -- ceph -s
    

    Example of system response:

    cluster:
      id:     e75c6abd-c5d5-4ae8-af17-4711354ff8ef
      health: HEALTH_OK
    services:
      mon: 3 daemons, quorum a,b,c (age 55m)
      mgr: a(active, since 55m)
      osd: 3 osds: 3 up (since 54m), 3 in (since 54m)
    data:
      pools:   1 pools, 32 pgs
      objects: 273 objects, 555 MiB
      usage:   4.0 GiB used, 1.6 TiB / 1.6 TiB avail
      pgs:     32 active+clean
    io:
      client:   51 KiB/s wr, 0 op/s rd, 4 op/s wr
    
Manage an existing bare metal cluster

The subsections of this section were moved to Mirantis OpenStack for Kubernetes documentation: Bare metal operations.

Manage machines of a bare metal cluster

The subsections of this section were moved to Mirantis OpenStack for Kubernetes documentation: Bare metal operations.

Upgrade an operating system distribution

Available since 14.0.1 and 15.0.1 for MOSK 23.2

This section was moved to Mirantis OpenStack for Kubernetes documentation: Bare metal operations.

Remove old Ubuntu kernel packages

Available since 2.25.0

This section was moved to Mirantis OpenStack for Kubernetes documentation: Bare metal operations.

Modify network configuration on an existing machine

TechPreview

This section was moved to Mirantis OpenStack for Kubernetes documentation: Bare metal operations.

Change a user name and password for a bare metal host

This section was moved to Mirantis OpenStack for Kubernetes documentation: Bare metal operations.

Manage Ceph

The subsections of this section were moved to Mirantis OpenStack for Kubernetes documentation: Manage Ceph.

Ceph advanced configuration

This section was moved to Mirantis OpenStack for Kubernetes documentation: Ceph advanced configuration.

Ceph default configuration options

This section was moved to Mirantis OpenStack for Kubernetes documentation: Ceph advanced configuration.

Automated Ceph LCM

The subsections of this section were moved to MOSK documentation: Automated Ceph LCM.

High-level workflow of Ceph OSD or node removal

The subsections of this section were moved to Mirantis OpenStack for Kubernetes documentation: High-level workflow of Ceph OSD or node removal.

Creating a Ceph OSD removal request

This section was moved to Mirantis OpenStack for Kubernetes documentation: Creating a Ceph OSD removal request.

KaaSCephOperationRequest OSD removal specification

This section was moved to Mirantis OpenStack for Kubernetes documentation: KaaSCephOperationRequest OSD removal specification.

KaaSCephOperationRequest OSD removal status

This section was moved to Mirantis OpenStack for Kubernetes documentation: KaaSCephOperationRequest OSD removal status.

Add, remove, or reconfigure Ceph nodes

This section was moved to MOSK documentation: Ceph operations - Add, remove, or reconfigure Ceph nodes.

Add, remove, or reconfigure Ceph OSDs

This section was moved to MOSK documentation: Ceph operations - Add, remove, or reconfigure Ceph OSDs.

Add, remove, or reconfigure Ceph OSDs with metadata devices

This section was moved to MOSK documentation: Automated Ceph LCM - Add, remove, or reconfigure Ceph OSDs with metadata devices.

Replace a failed Ceph OSD

This section was moved to MOSK documentation: Automated Ceph LCM - Replace a failed Ceph OSD.

Replace a failed Ceph OSD with a metadata device

The subsections of this section were moved to MOSK documentation: Automated Ceph LCM - Replace a failed Ceph OSD with a metadata device.

Replace a failed Ceph OSD with a metadata device as a logical volume path

This section was moved to MOSK documentation: Automated Ceph LCM - Replace a failed Ceph OSD with a metadata device as a logical volume path.

Replace a failed Ceph OSD disk with a metadata device as a device name

This section was moved to MOSK documentation: Automated Ceph LCM - Replace a failed Ceph OSD disk with a metadata device as a device name.

Replace a failed metadata device

This section was moved to MOSK documentation: Automated Ceph LCM - Replace a failed metadata device.

Replace a failed Ceph node

This section was moved to MOSK documentation: Automated Ceph LCM - Replace a failed Ceph node.

Migrate Ceph cluster to address storage devices using by-id

This section was moved to MOSK documentation: Ceph operations - Migrate Ceph cluster to address storage devices using by-id.

Increase Ceph cluster storage size

This section was moved to MOSK documentation: Ceph operations - Increase Ceph cluster storage size.

Move a Ceph Monitor daemon to another node

This section was moved to MOSK documentation: Ceph operations - Move a Ceph Monitor daemon to another node.

Migrate a Ceph Monitor before machine replacement

This section was moved to MOSK documentation: Ceph operations - Migrate a Ceph Monitor before machine replacement.

Enable Ceph RGW Object Storage

This section was moved to MOSK documentation: Ceph operations - Enable Ceph RGW Object Storage.

Enable multisite for Ceph RGW Object Storage

This section was moved to MOSK documentation: Ceph operations - Enable multisite for Ceph RGW Object Storage.

Manage Ceph RBD or CephFS clients and RGW users

Available since 2.21.0 for non-MOSK clusters

The subsections of this section were moved to MOSK documentation: Ceph operations - Manage Ceph RBD or CephFS clients and RGW users.

Manage Ceph RBD or CephFS clients

This section was moved to MOSK documentation: Ceph operations - Manage Ceph RBD or CephFS clients.

Manage Ceph Object Storage users

This section was moved to MOSK documentation: Ceph operations - Manage Ceph Object Storage users.

Set an Amazon S3 bucket policy

The subsections of this section were moved to MOSK documentation: Ceph operations - Set an Amazon S3 bucket policy.

Create Ceph Object Storage users

This section was moved to MOSK documentation: Ceph operations - Create Ceph Object Storage users.

Set a bucket policy for a Ceph Object Storage user

This section was moved to MOSK documentation: Ceph operations - Set a bucket policy for a Ceph Object Storage user.

Verify Ceph

The subsections of this section were moved to Mirantis OpenStack for Kubernetes documentation: Ceph operations - Verify Ceph.

Enable Ceph tolerations and resources management

The subsections of this section were moved to Mirantis OpenStack for Kubernetes documentation: Ceph operations - Enable Ceph tolerations and resources management.

Enable Ceph multinetwork

This section was moved to Mirantis OpenStack for Kubernetes documentation: Ceph operations - Enable Ceph multinetwork.

Enable TLS for Ceph public endpoints

This section was moved to Mirantis OpenStack for Kubernetes documentation: Ceph operations - Configure Ceph Object Gateway TLS.

Enable Ceph RBD mirroring

This section was moved to Mirantis OpenStack for Kubernetes documentation: Ceph operations - Enable Ceph RBD mirroring.

Enable Ceph Shared File System (CephFS)

Available since 2.22.0 as GA

This section was moved to Mirantis OpenStack for Kubernetes documentation: Ceph operations - Enable Ceph Shared File System (CephFS).

Share Ceph across two managed clusters

TechPreview Available since 2.22.0

This section was moved to Mirantis OpenStack for Kubernetes documentation: Ceph operations - Share Ceph across two managed clusters.

Calculate target ratio for Ceph pools

This section was moved to MOSK documentation: Ceph operations - Calculate target ratio for Ceph pools.

Specify placement of Ceph cluster daemons

This section was moved to Mirantis OpenStack for Kubernetes documentation: Ceph operations - Specify placement of Ceph cluster daemons.

Migrate Ceph pools from one failure domain to another

This section was moved to Mirantis OpenStack for Kubernetes documentation: Ceph operations - Migrate Ceph pools from one failure domain to another.

Enable periodic Ceph performance testing

TechPreview

The subsections of this section were moved to Mirantis OpenStack for Kubernetes documentation: Enable periodic Ceph performance testing.

Create a Ceph performance test request

TechPreview

This section was moved to Mirantis OpenStack for Kubernetes documentation: Ceph operations - Create a Ceph performance test request.

KaaSCephOperationRequest CR perftest specification

TechPreview

This section was moved to Mirantis OpenStack for Kubernetes documentation: Ceph operations - KaaSCephOperationRequest CR perftest specification.

KaaSCephOperationRequest perftest status

TechPreview

This section was moved to Mirantis OpenStack for Kubernetes documentation: Ceph operations - KaaSCephOperationRequest perftest status.

Delete a managed cluster

This section was moved to MOSK documentation: General operations - Delete a managed cluster.

Day-2 operations

TechPreview since 2.26.0 (17.1.0 and 16.1.0)

The subsections of this section were moved to MOSK documentation: Host operating system configuration - Day-2 operations.

Day-2 operations workflow

TechPreview since 2.26.0 (17.1.0 and 16.1.0)

This section was moved to MOSK documentation: Day-2 operations - Day-2 operations workflow.

Global recommendations for implementation of custom modules

This section was moved to MOSK documentation: Day-2 operations - Global recommendations for implementation of custom modules.

Format and structure of a module package

TechPreview since 2.26.0 (17.1.0 and 16.1.0)

This section was moved to MOSK documentation: Day-2 operations - Format and structure of a module package.

Modules provided by Container Cloud

TechPreview since 2.27.0 (17.2.0 and 16.2.0)

The subsections of this section were moved to host-os-modules documentation.

irqbalance module

TechPreview since 2.27.0 (17.2.0 and 16.2.0)

This section was moved to host-os-modules documentation: irqbalance module.

package module

TechPreview since 2.27.0 (17.2.0 and 16.2.0)

This section was moved to host-os-modules documentation: package module.

sysctl module

TechPreview since 2.27.0 (17.2.0 and 16.2.0)

This section was moved to host-os-modules documentation: sysctl module.

HostOSConfiguration and HostOSConfigurationModules concepts

TechPreview since 2.26.0 (17.1.0 and 16.1.0)

This section was moved to MOSK documentation: Day-2 operations - HostOSConfiguration and HostOSConfigurationModules concepts.

Internal API for day-2 operations

TechPreview since 2.26.0 (17.1.0 and 16.1.0)

This section was moved to MOSK documentation: Day-2 operations - Internal API for day-2 operations.

Add a custom module to a Container Cloud deployment

TechPreview since 2.26.0 (17.1.0 and 16.1.0)

This section was moved to MOSK documentation: Day-2 operations - Add a custom module to a Container Cloud deployment.

Test a custom or Container Cloud module after creation

TechPreview since 2.26.0 (17.1.0 and 16.1.0)

This section was moved to MOSK documentation: Day-2 operations - Test a custom or Container Cloud module after creation.

Retrigger a module configuration

This section was moved to MOSK documentation: Day-2 operations - Retrigger a module configuration.

Troubleshooting

This section was moved to MOSK documentation: Day-2 operations - Troubleshooting.

Add or update a CA certificate for a MITM proxy using API

This section was moved to MOSK documentation: Underlay Kubernetes operations - Add or update a CA certificate for a MITM proxy using API.

Add a custom OIDC provider for MKE

Available since 17.0.0, 16.0.0, and 14.1.0

By default, MKE uses Keycloak as the OIDC provider. Using the ClusterOIDCConfiguration custom resource, you can add your own OpenID Connect (OIDC) provider for MKE on managed clusters to authenticate user requests to Kubernetes. For OIDC provider requirements, see OIDC official specification.

Note

For OpenStack and StackLight, Container Cloud supports only Keycloak, which is configured on the management cluster, as the OIDC provider.

To add a custom OIDC provider for MKE:

  1. Configure the OIDC provider:

    1. Log in to the OIDC provider dashboard.

    2. Create an OIDC client. If you are going to use an existing one, skip this step.

    3. Add the MKE redirectURL of the managed cluster to the OIDC client. By default, the URL format is https://<MKE IP>:6443/login.

    4. Add the <Container Cloud web UI IP>/token URL to the OIDC client to enable generation of kubeconfig files for the target managed cluster through the Container Cloud web UI.

    5. Ensure that the aud (audience) claim of the issued id_token equals the created client ID.

    6. Optional. Allow MKE to refresh authentication when id_token expires by allowing the offline_access claim for the OIDC client.

  2. Create the ClusterOIDCConfiguration object in the YAML format containing the OIDC client settings. For details, see API Reference: ClusterOIDCConfiguration resource for MKE.

    Warning

    The kubectl apply command automatically saves the applied data as plain text into the kubectl.kubernetes.io/last-applied-configuration annotation of the corresponding object. This may result in revealing sensitive data in this annotation when creating or modifying the object.

    Therefore, do not use kubectl apply on this object. Use kubectl create, kubectl patch, or kubectl edit instead.

    If you used kubectl apply on this object, you can remove the kubectl.kubernetes.io/last-applied-configuration annotation from the object using kubectl edit.

    The ClusterOIDCConfiguration object is created in the management cluster. Users with the m:kaas:ns@operator/writer/member roles have access to this object.

    Once done, the following dependent object is created automatically in the target managed cluster: the rbac.authorization.k8s.io/v1/ClusterRoleBinding object that binds the admin group defined in spec:adminRoleCriteria:value to the cluster-admin rbac.authorization.k8s.io/v1/ClusterRole object.

  3. In the Cluster object of the managed cluster, add the name of the ClusterOIDCConfiguration object to the spec.providerSpec.value.oidc field. For a CLI sketch of steps 2 and 3, see the example after this procedure.

  4. Wait until the cluster machines switch from the Reconfigure to Ready state for the changes to apply.
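
A minimal CLI sketch of steps 2 and 3, assuming the ClusterOIDCConfiguration manifest is saved as oidc.yaml, the object is created in the project namespace of the managed cluster (managed-ns), and kubeconfig is the management cluster kubeconfig, as in the procedures above (all file and object names are illustrative):

  # Step 2: create the object in the management cluster; do not use kubectl apply.
  KUBECONFIG=kubeconfig kubectl -n managed-ns create -f oidc.yaml
  # Step 3: open the Cluster object of the managed cluster and set
  # spec.providerSpec.value.oidc to the ClusterOIDCConfiguration object name.
  KUBECONFIG=kubeconfig kubectl -n managed-ns edit clusters.cluster.k8s.io managed-cluster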

Change a cluster configuration

This section was moved to MOSK documentation: General Operations - Change a cluster configuration.

Disable a machine

TechPreview since 2.25.0 (17.0.0 and 16.0.0) for workers on managed clusters

This section was moved to MOSK documentation: General Operations - Disable a machine.

Configure the parallel update of worker nodes

Available since 17.0.0, 16.0.0, and 14.1.0 as GA Available since 14.0.1(0) and 15.0.1 as TechPreview

This section was moved to MOSK documentation: Cluster update - Configure the parallel update of worker nodes.

Create update groups for worker machines

Available since 2.27.0 (17.2.0 and 16.2.0)

This section was moved to MOSK documentation: Cluster update - Create update groups for worker machines.

Change the upgrade order of a machine or machine pool

This section was moved to MOSK documentation: Cluster update - Change the upgrade order of a machine or machine pool.

Update a managed cluster

The subsections of this section were moved to MOSK documentation: Cluster update.

Verify the Container Cloud status before managed cluster update

This section was moved to MOSK documentation: Cluster update - Verify the management cluster status before MOSK update.

Update a managed cluster using the Container Cloud web UI

This section was moved to MOSK documentation: Cluster update - Update to a major version.

Granularly update a managed cluster using the ClusterUpdatePlan object

Available since 2.27.0 (17.2.0 and 16.2.0) TechPreview

This section was moved to MOSK documentation: Cluster update - Granularly update a managed cluster using the ClusterUpdatePlan object.

Update a patch Cluster release of a managed cluster

Available since 2.23.2

This section was moved to MOSK documentation: Cluster update - Update a patch Cluster release of a managed cluster.

Add a Container Cloud cluster to Lens

This section was moved to Mirantis OpenStack for Kubernetes documentation: Getting access - Add a Container Cloud cluster to Lens.

Connect to the Mirantis Kubernetes Engine web UI

This section was moved to Mirantis OpenStack for Kubernetes documentation: Getting access.

Connect to a Mirantis Container Cloud cluster

This section was moved to Mirantis OpenStack for Kubernetes documentation: Getting access - Connect to a MOSK cluster.

Inspect the history of a cluster and machine deployment or update

Available since 2.22.0

This section was moved to MOSK Troubleshooting Guide: Inspect the history of a cluster and machine deployment or update.

Operate management clusters

The subsections of this section were moved to MOSK documentation: Management cluster operations.

Workflow and configuration of management cluster upgrade

This section was moved to MOSK documentation: Management cluster operations - Workflow and configuration of management cluster upgrade.

Schedule Mirantis Container Cloud updates

This section was moved to MOSK documentation: Management cluster operations - Schedule Mirantis Container Cloud updates.

Renew the Container Cloud and MKE licenses

This section was moved to MOSK documentation: Management cluster operations - Renew the Container Cloud and MKE licenses.

Configure NTP server

This section was moved to MOSK documentation: Management cluster operations - Configure NTP server.

Automatically propagate Salesforce configuration to all clusters

This section was moved to MOSK documentation: Management cluster operations - Automatically propagate Salesforce configuration to all clusters.

Update the Keycloak IP address on bare metal clusters

This section was moved to MOSK documentation: Management cluster operations - Update the Keycloak IP address on bare metal clusters.

Configure host names for cluster machines

TechPreview Available since 2.24.0

This section was moved to MOSK documentation: Management cluster operations - Configure host names for cluster machines.

Back up MariaDB on a management cluster

The subsections of this section were moved to MOSK documentation: Management cluster operations - Back up MariaDB on a management cluster.

Configure periodic backups of MariaDB

This section was moved to MOSK documentation: Management cluster operations - Configure periodic backups of MariaDB.

Verify operability of the MariaDB backup jobs

This section was moved to MOSK documentation: Management cluster operations - Verify operability of the MariaDB backup jobs.

Restore MariaDB databases

This section was moved to MOSK documentation: Management cluster operations - Restore MariaDB databases.

Change the storage node for MariaDB on bare metal clusters

This section was moved to MOSK documentation: Management cluster operations - Change the storage node for MariaDB.

Remove a management cluster

This section was moved to MOSK documentation: Management cluster operations - Remove a management cluster.

Warm up the Container Cloud cache

TechPreview Available since 2.24.0 and 23.2 for MOSK clusters

This section was moved to MOSK documentation: Management cluster operations - Warm up the Container Cloud cache.

Self-diagnostics for management and managed clusters

Available since 2.28.0 (17.3.0 and 16.3.0)

The subsections of this section were moved to MOSK Operations Guide: Bare metal operations - Run cluster self-diagnostics.

Trigger self-diagnostics for a management or managed cluster

Available since 2.28.0 (17.3.0 and 16.3.0)

This section was moved to MOSK Operations Guide: Bare metal operations - Trigger self-diagnostics for a management or managed cluster.

Self-upgrades of the Diagnostic Controller

Available since 2.28.0 (17.3.0 and 16.3.0)

This section was moved to MOSK Operations Guide: Bare metal operations - Self-upgrades of the Diagnostic Controller.

Diagnostic checks for the bare metal provider

Available since 2.28.0 (17.3.0 and 16.3.0) Technology Preview

This section was moved to MOSK Operations Guide: Bare metal operations - Diagnostic checks for the bare metal provider.

Increase memory limits for cluster components

This section was moved to MOSK documentation: Underlay Kubernetes operations - Increase memory limits for cluster components.

Set the MTU size for Calico

TechPreview Available since 2.24.0 and 2.24.2 for MOSK 23.2

This section was moved to MOSK documentation: Underlay Kubernetes operations - Set the MTU size for Calico.

Increase storage quota for etcd

Available since Cluster releases 15.0.3 and 14.0.3

This section was moved to MOSK documentation: Underlay Kubernetes operations - Increase storage quota for etcd.

Configure Kubernetes auditing and profiling

Available since 2.24.3 (Cluster releases 15.0.2 and 14.0.2)

This section was moved to MOSK documentation: Underlay Kubernetes operations - Configure Kubernetes auditing and profiling.

Configure TLS certificates for cluster applications

Technology Preview

This section was moved to MOSK documentation: Underlay Kubernetes operations - Configure TLS certificates for cluster applications.

Define a custom CA certificate for a private Docker registry

This section was moved to MOSK documentation: Underlay Kubernetes operations - Define a custom CA certificate for a private Docker registry.

Enable cluster and machine maintenance mode

The subsections of this section were moved to MOSK documentation: General Operations - Enable cluster and machine maintenance mode.

Enable maintenance mode on a cluster and machine using web UI

This section was moved to MOSK documentation: General Operations - Enable maintenance mode on a cluster and machine using web UI.

Enable maintenance mode on a cluster and machine using CLI

This section was moved to MOSK documentation: General Operations - Enable maintenance mode on a cluster and machine using CLI.

Perform a graceful reboot of a cluster

Available since 2.23.0

This section was moved to MOSK documentation: General Operations - Perform a graceful reboot of a cluster.

Delete a cluster machine

The subsections of this section were moved to MOSK documentation: Delete a cluster machine.

Precautions for a cluster machine deletion

This section was moved to MOSK documentation: Precautions for a cluster machine deletion.

Delete a cluster machine using web UI

This section was moved to MOSK documentation: Delete a cluster machine using web UI.

Delete a cluster machine using CLI

This section was moved to MOSK documentation: Delete a cluster machine using CLI.

Manage IAM

The subsections of this section were moved to MOSK documentation: IAM operations.

Manage user roles through Container Cloud API

The subsections of this section were moved to MOSK documentation: IAM operations - Manage user roles through Container Cloud API.

Manage user roles through the Container Cloud web UI

This section was moved to MOSK documentation: IAM operations - Manage user roles through the Container Cloud web UI.

Manage user roles through Keycloak

The subsections of this section were moved to MOSK documentation: IAM operations - Manage user roles through Keycloak.

Container Cloud roles and scopes

This section was moved to MOSK documentation: IAM operations - Container Cloud roles and scopes.

Use cases

This section was moved to MOSK documentation: Manage user roles through Keycloak - Use cases.

Access the Keycloak Admin Console

This section was moved to Mirantis OpenStack for Kubernetes documentation: Getting access - Access the Keycloak Admin Console.

Change passwords for IAM users

This section was moved to MOSK documentation: IAM operations - Change passwords for IAM users.

Obtain MariaDB credentials for IAM

Available since Container Cloud 2.22.0

This section was moved to MOSK documentation: IAM operations - Obtain MariaDB credentials for IAM.

Manage Keycloak truststore using the Container Cloud web UI

Available since 2.26.0 (17.1.0 and 16.1.0)

This section was moved to MOSK documentation: IAM operations - Manage Keycloak truststore using the Container Cloud web UI.

Manage StackLight

The subsections of this section were moved to MOSK Operations Guide: StackLight operations.

Access StackLight web UIs

This section was moved to Mirantis OpenStack for Kubernetes documentation: Getting access.

StackLight logging indices

Available since 2.26.0 (17.1.0 and 16.1.0)

This section was moved to MOSK Reference Architecture: StackLight logging indices.

OpenSearch Dashboards

The subsections of this section were moved to MOSK Operations Guide: OpenSearch Dashboards.

View OpenSearch Dashboards

This section was moved to MOSK Operations Guide: View OpenSearch Dashboards.

Search in OpenSearch Dashboards

This section was moved to MOSK Operations Guide: Search in OpenSearch Dashboards.

Export logs from OpenSearch Dashboards to CSV

Available since 2.23.0 (12.7.0 and 11.7.0)

This section was moved to MOSK Operations Guide: Export logs from OpenSearch Dashboards to CSV.

Tune OpenSearch performance for the bare metal provider

This section was moved to MOSK Operations Guide: Tune OpenSearch performance.

View Grafana dashboards

This section was moved to MOSK Operations Guide: View Grafana dashboards.

Export data from Table panels of Grafana dashboards to CSV

This section was moved to MOSK Operations Guide: Export data from Table panels of Grafana dashboards to CSV.

Available StackLight alerts

The subsections of this section were moved to MOSK Operations Guide: StackLight alerts.

Alert dependencies

This section was moved to MOSK Operations Guide: Alert dependencies.

Alertmanager

This section was moved to MOSK Operations Guide: StackLight alerts - Alertmanager.

Bond interface

Available since 2.24.0 and 2.24.2 for MOSK 23.2

This section was moved to MOSK Operations Guide: Bare metal alerts - Bond interface.

cAdvisor

This section was moved to MOSK Operations Guide: StackLight alerts - cAdvisor.

Calico

This section was moved to MOSK Operations Guide: Generic alerts - Calico.

Ceph

This section was moved to MOSK Operations Guide: StackLight alerts - Ceph.

Docker Swarm

This section was moved to MOSK Operations Guide: StackLight alerts - Mirantis Kubernetes Engine.

Elasticsearch Exporter

This section was moved to MOSK Operations Guide: StackLight alerts - Elasticsearch Exporter.

Etcd

This section was moved to MOSK Operations Guide: Generic alerts - Etcd.

External endpoint

This section was moved to MOSK Operations Guide: StackLight alerts - Monitoring of external endpoints.

Fluentd

This section was moved to MOSK Operations Guide: StackLight alerts - Fluentd.

General alerts

This section was moved to MOSK Operations Guide: General StackLight alerts.

General node alerts

This section was moved to MOSK Operations Guide: StackLight alerts - Node.

Grafana

This section was moved to MOSK Operations Guide: StackLight alerts - Grafana.

Helm Controller

This section was moved to MOSK Operations Guide: Container Cloud alerts - Helm Controller.

Host Operating System Modules Controller

TechPreview since 2.28.0 (17.3.0 and 16.3.0)

This section was moved to MOSK Operations Guide: Bare metal alerts - Host Operating System Modules Controller.

Ironic

This section was moved to MOSK Operations Guide: Bare metal alerts - Ironic.

Kernel

This section was moved to MOSK Operations Guide: Bare metal alerts - Kernel.

Kubernetes applications

This section was moved to MOSK Operations Guide: Kubernetes alerts - Kubernetes applications.

Kubernetes resources

This section was moved to MOSK Operations Guide: Kubernetes alerts - Kubernetes resources.

Kubernetes storage

This section was moved to MOSK Operations Guide: Kubernetes alerts - Kubernetes storage.

Kubernetes system

This section was moved to MOSK Operations Guide: Kubernetes alerts - Kubernetes system.

Mirantis Container Cloud

This section was moved to MOSK Operations Guide: StackLight alerts - Mirantis Container Cloud.

Mirantis Container Cloud cache

This section was moved to MOSK Operations Guide: StackLight alerts - Mirantis Container Cloud cache.

Mirantis Container Cloud controllers

Available since Cluster releases 12.7.0 and 11.7.0

This section was moved to MOSK Operations Guide: StackLight alerts - Mirantis Container Cloud controllers.

Mirantis Container Cloud providers

Available since Cluster releases 12.7.0 and 11.7.0

This section was moved to MOSK Operations Guide: StackLight alerts - Mirantis Container Cloud providers.

Mirantis Kubernetes Engine

This section was moved to MOSK Operations Guide: StackLight alerts - Mirantis Kubernetes Engine.

Node network

This section was moved to MOSK Operations Guide: StackLight alerts - Node network.

Node time

This section was moved to MOSK Operations Guide: StackLight alerts - Node time.

OpenSearch

This section was moved to MOSK Operations Guide: StackLight alerts - OpenSearch.

PostgreSQL

This section was moved to MOSK Operations Guide: StackLight alerts - PostgreSQL.

Prometheus

This section was moved to MOSK Operations Guide: StackLight alerts - Prometheus.

Prometheus MS Teams

This section was moved to MOSK Operations Guide: StackLight alerts - Prometheus MS Teams.

Prometheus Relay

This section was moved to MOSK Operations Guide: StackLight alerts - Prometheus Relay.

Release Controller

This section was moved to MOSK Operations Guide: StackLight alerts - Release Controller.

ServiceNow

This section was moved to MOSK Operations Guide: StackLight alerts - ServiceNow.

Salesforce notifier

This section was moved to MOSK Operations Guide: StackLight alerts - Salesforce notifier.

SSL certificates

This section was moved to MOSK Operations Guide - StackLight alerts: Monitoring of external endpoints and Container Cloud SSL.

Telegraf

This section was moved to MOSK Operations Guide: StackLight alerts - Telegraf.

Telemeter

This section was moved to MOSK Operations Guide: StackLight alerts - Telemeter.

Troubleshoot alerts

The subsections of this section were moved to MOSK Troubleshooting Guide: Troubleshoot StackLight - Troubleshoot alerts.

Troubleshoot cAdvisor alerts

This section was moved to MOSK Troubleshooting Guide: Troubleshoot StackLight - Troubleshoot cAdvisor alerts.

Troubleshoot Helm Controller alerts

This section was moved to MOSK Troubleshooting Guide: Troubleshoot StackLight - Troubleshoot Helm Controller alerts.

Troubleshoot Host Operating System Modules Controller alerts

This section was moved to MOSK Troubleshooting Guide: Troubleshoot StackLight - Troubleshoot Host Operating System Modules Controller alerts.

Troubleshoot Ubuntu kernel alerts

This section was moved to MOSK Troubleshooting Guide: Troubleshoot StackLight - Troubleshoot Ubuntu kernel alerts.

Troubleshoot Kubernetes applications alerts

This section was moved to MOSK Troubleshooting Guide: Troubleshoot StackLight - Troubleshoot Kubernetes applications alerts.

Troubleshoot Kubernetes resources alerts

This section was moved to MOSK Troubleshooting Guide: Troubleshoot StackLight - Troubleshoot Kubernetes resources alerts.

Troubleshoot Kubernetes storage alerts

This section was moved to MOSK Troubleshooting Guide: Troubleshoot StackLight - Troubleshoot Kubernetes storage alerts.

Troubleshoot Kubernetes system alerts

This section was moved to MOSK Troubleshooting Guide: Troubleshoot StackLight - Troubleshoot Kubernetes system alerts.

Troubleshoot Mirantis Container Cloud Exporter alerts

This section was moved to MOSK Troubleshooting Guide: Troubleshoot StackLight - Troubleshoot Mirantis Container Cloud Exporter alerts.

Troubleshoot Mirantis Kubernetes Engine alerts

This section was moved to MOSK Troubleshooting Guide: Troubleshoot StackLight - Troubleshoot Mirantis Kubernetes Engine alerts.

Troubleshoot OpenSearch alerts

Available since 2.26.0 (17.1.0 and 16.1.0)

This section was moved to MOSK Troubleshooting Guide: Troubleshoot StackLight - Troubleshoot OpenSearch alerts.

Troubleshoot Release Controller alerts

This section was moved to MOSK Troubleshooting Guide: Troubleshoot StackLight - Troubleshoot Release Controller alerts.

Troubleshoot Telemeter client alerts

This section was moved to MOSK Troubleshooting Guide: Troubleshoot StackLight - Troubleshoot Telemeter client alerts.

Silence alerts

This section was moved to MOSK documentation: Silence alerts.

StackLight rules for Kubernetes network policies

Available since Cluster releases 17.0.1 and 16.0.1

This section was moved to MOSK documentation: StackLight rules for Kubernetes network policies.

Configure StackLight

The subsections of this section were moved to MOSK Operations Guide: Configure StackLight.

StackLight configuration procedure

This section was moved to MOSK Operations Guide: Configure StackLight - StackLight configuration procedure.

StackLight configuration parameters

This section was moved to MOSK Operations Guide: Configure StackLight - StackLight configuration parameters.

Verify StackLight after configuration

This section was moved to MOSK Operations Guide: Configure StackLight - Verify StackLight after configuration.

Tune StackLight for long-term log retention

Available since 2.24.0 and 2.24.2 for MOSK 23.2

This section was moved to MOSK Operations Guide: StackLight operations - Tune StackLight for long-term log retention.

Enable log forwarding to external destinations

Available since 2.23.0 and 2.23.1 for MOSK 23.1

This section was moved to MOSK Operations Guide: StackLight operations - Enable log forwarding to external destinations.

Enable remote logging to syslog

Deprecated since 2.23.0

This section was moved to MOSK Operations Guide: StackLight operations - Enable remote logging to syslog.

Create logs-based metrics

This section was moved to MOSK Operations Guide: StackLight operations - Create logs-based metrics.

Enable generic metric scraping

This section was moved to MOSK Operations Guide: StackLight operations - Enable generic metric scraping.

Manage metrics filtering

Available since 2.24.0 and 2.24.2 for MOSK 23.2

This section was moved to MOSK Operations Guide: StackLight operations - Manage metrics filtering.

Use S.M.A.R.T. metrics for creating alert rules on bare metal clusters

Available since 2.27.0 (Cluster releases 17.2.0 and 16.2.0)

This section was moved to MOSK Operations Guide: StackLight operations - Use S.M.A.R.T. metrics for creating alert rules.

Deschedule StackLight Pods from a worker machine

This section was moved to MOSK Operations Guide: Deschedule StackLight Pods from a worker machine.

Calculate the storage retention time

Obsolete since 2.26.0 (17.1.0, 16.1.0) for OpenSearch

Available since 2.22.0 and 2.23.1 (12.7.0, 11.6.0)

This section was moved to MOSK documentation: Calculate the storage retention time.

Troubleshooting

This section was moved to MOSK documentation: Troubleshooting Guide.

Collect cluster logs

This section was moved to MOSK documentation: Collect cluster logs.

Cluster deletion or detachment freezes

This section was moved to MOSK documentation: Cluster deletion or detachment freezes.

Keycloak admin console becomes inaccessible after changing the theme

This section was moved to MOSK documentation: Keycloak admin console becomes inaccessible after changing the theme.

The ‘database space exceeded’ error on large clusters

This section was moved to MOSK documentation: The ‘database space exceeded’ error on large clusters.

The auditd events cause ‘backlog limit exceeded’ messages

This section was moved to MOSK documentation: The auditd events cause ‘backlog limit exceeded’ messages.

Troubleshoot baremetal-based clusters

This section was moved to MOSK documentation: Troubleshoot bare metal.

Log in to the IPA virtual console for hardware troubleshooting

This section was moved to MOSK documentation: Log in to the IPA virtual console for hardware troubleshooting.

Bare metal hosts in ‘provisioned registration error’ state after update

This section was moved to MOSK documentation: Bare metal hosts in provisioned registration error state after update.

Troubleshoot an operating system upgrade with host restart

This section was moved to MOSK documentation: Troubleshoot an operating system upgrade with host restart.

Troubleshoot iPXE boot issues

This section was moved to MOSK documentation: Troubleshoot iPXE boot issues.

Provisioning failure due to device naming issues in a bare metal host profile

This section was moved to MOSK documentation: Provisioning failure due to device naming issues in a bare metal host profile.

Troubleshoot Ceph

This section was moved to MOSK documentation: Troubleshoot Ceph.

Ceph disaster recovery

This section was moved to MOSK documentation: Ceph disaster recovery.

Ceph Monitors recovery

This section was moved to MOSK documentation: Ceph Monitors recovery.

Remove Ceph OSD manually

This section was moved to MOSK documentation: Remove Ceph OSD manually.

KaaSCephOperationRequest failure with a timeout during rebalance

This section was moved to MOSK documentation: KaaSCephOperationRequest failure with a timeout during rebalance.

Ceph Monitors store.db size rapidly growing

This section was moved to MOSK documentation: Ceph Monitors store.db size rapidly growing.

Replaced Ceph OSD fails to start on authorization

This section was moved to MOSK documentation: Replaced Ceph OSD fails to start on authorization.

The ceph-exporter pods are present in the Ceph crash list

This section was moved to MOSK documentation: The ceph-exporter pods are present in the Ceph crash list.

Troubleshoot StackLight

This section was moved to MOSK documentation: Troubleshoot StackLight.

Patroni replication lag

This section was moved to MOSK documentation: Patroni replication lag.

Alertmanager does not send resolve notifications for custom alerts

This section was moved to MOSK documentation: Alertmanager does not send resolve notifications for custom alerts.

OpenSearchPVCMismatch alert raises due to the OpenSearch PVC size mismatch

This section was moved to MOSK documentation: OpenSearchPVCMismatch alert raises due to the OpenSearch PVC size mismatch.

OpenSearch cluster deadlock due to the corrupted index

This section was moved to MOSK documentation: OpenSearch cluster deadlock due to the corrupted index.

Failure of shard relocation in the OpenSearch cluster

This section was moved to MOSK documentation: Failure of shard relocation in the OpenSearch cluster.

StackLight pods get stuck with the ‘NodeAffinity failed’ error

This section was moved to MOSK documentation: StackLight pods get stuck with the NodeAffinity failed error.

No logs are forwarded to Splunk

This section was moved to MOSK documentation: No logs are forwarded to Splunk.

Security Guide

This guide was moved to MOSK documentation: Security Guide.

Firewall configuration

This section was moved to MOSK documentation: Firewall configuration.

Container Cloud

This section was moved to MOSK documentation: Firewall configuration - Container Cloud.

Mirantis Kubernetes Engine

For available Mirantis Kubernetes Engine (MKE) ports, refer to MKE Documentation: Open ports to incoming traffic.

StackLight

This section was moved to MOSK documentation: Firewall configuration - StackLight.

Ceph

This section was moved to MOSK documentation: Firewall configuration - Ceph.

Container images signing and validation

Available since 2.26.0 (17.1.0 and 16.1.0) Technology Preview

This section was moved to MOSK documentation: Container images signing and validation.

API Reference

Warning

This section is intended only for advanced Infrastructure Operators who are familiar with Kubernetes Cluster API.

Mirantis currently supports only those Mirantis Container Cloud API features that are implemented in the Container Cloud web UI. Use other Container Cloud API features for testing and evaluation purposes only.

The Container Cloud APIs are implemented using the Kubernetes CustomResourceDefinitions (CRDs) that enable you to expand the Kubernetes API. Different types of resources are grouped in dedicated files, such as cluster.yaml or machines.yaml.

For testing and evaluation purposes, you may also use the experimental public Container Cloud API that allows for the implementation of custom clients for creating and operating managed clusters. This repository contains branches that correspond to the Container Cloud releases. For an example usage, refer to the README file of the repository.

Public key resources

This section describes the PublicKey resource used in Mirantis Container Cloud API to provide SSH access to every machine of a cluster.

The Container Cloud PublicKey CR contains the following fields:

  • apiVersion

    API version of the object that is kaas.mirantis.com/v1alpha1

  • kind

    Object type that is PublicKey

  • metadata

    The metadata object field of the PublicKey resource contains the following fields:

    • name

      Name of the public key

    • namespace

      Project where the public key is created

  • spec

    The spec object field of the PublicKey resource contains the publicKey field that is an SSH public key value.

The PublicKey resource example:

apiVersion: kaas.mirantis.com/v1alpha1
kind: PublicKey
metadata:
  name: demokey
  namespace: test
spec:
  publicKey: |
    ssh-rsa AAAAB3NzaC1yc2EAAAA…

License resource

This section describes the License custom resource (CR) used in Mirantis Container Cloud API to maintain the Mirantis Container Cloud license data.

Warning

The kubectl apply command automatically saves the applied data as plain text into the kubectl.kubernetes.io/last-applied-configuration annotation of the corresponding object. This may result in revealing sensitive data in this annotation when creating or modifying the object.

Therefore, do not use kubectl apply on this object. Use kubectl create, kubectl patch, or kubectl edit instead.

If you used kubectl apply on this object, you can remove the kubectl.kubernetes.io/last-applied-configuration annotation from the object using kubectl edit.
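
For reference, a possible non-interactive way to drop the annotation is the generic kubectl annotate command with a trailing minus sign. This is only a sketch and assumes that the license resource name is exposed by the corresponding CRD:

kubectl annotate license license kubectl.kubernetes.io/last-applied-configuration-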

The Container Cloud License CR contains the following fields:

  • apiVersion

    The API version of the object that is kaas.mirantis.com/v1alpha1.

  • kind

    The object type that is License.

  • metadata

    The metadata object field of the License resource contains the following fields:

    • name

      The name of the License object, must be license.

  • spec

    The spec object field of the License resource contains the Secret reference where license data is stored.

    • license

      • secret

        The Secret reference where the license data is stored.

        • key

          The name of a key in the license Secret data field under which the license data is stored.

        • name

          The name of the Secret where the license data is stored.

      • value

The value of the updated license. If you need to update the license, place it under this field. The new license data will be placed in the Secret, and the value field will be cleared.

  • status
    • customerID

      The unique ID of a customer generated during the license issuance.

    • instance

      The unique ID of the current Mirantis Container Cloud instance.

    • dev

      The license is for development.

    • openstack

      The license limits for MOSK clusters:

      • clusters

        The maximum number of MOSK clusters to be deployed. If the field is absent, the number of deployments is unlimited.

      • workersPerCluster

        The maximum number of workers per MOSK cluster to be created. If the field is absent, the number of workers is unlimited.

    • expirationTime

      The license expiration time in the ISO 8601 format.

    • expired

      The license expiration state. If the value is true, the license has expired. If the field is absent, the license is valid.

Configuration example of the status fields:

status:
 customerID: "auth0|5dd501e54138450d337bc356"
 instance: 7589b5c3-57c5-4e64-96a0-30467189ae2b
 dev: true
 limits:
   clusters: 3
   workersPerCluster: 5
 expirationTime: 2028-11-28T23:00:00Z
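
A minimal sketch of the spec section used to supply an updated license, based on the fields described above; the placeholder license string is an assumption. After processing, the license data is moved to the referenced Secret and the value field is cleared:

spec:
  license:
    value: <NEW_LICENSE_DATA>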

Diagnostic resource

Available since 2.28.0 (17.3.0 and 16.3.0)

This section describes the Diagnostic custom resource (CR) used in Mirantis Container Cloud API to trigger self-diagnostics for management or managed clusters.

The Container Cloud Diagnostic CR contains the following fields:

  • apiVersion

    API version of the object that is diagnostic.mirantis.com/v1alpha1.

  • kind

    Object type that is Diagnostic.

  • metadata

    Object metadata that contains the following fields:

    • name

      Name of the Diagnostic object.

    • namespace

      Namespace used to create the Diagnostic object. Must be equal to the namespace of the target cluster.

  • spec

    Resource specification that contains the following fields:

    • cluster

      Name of the target cluster to run diagnostics on.

    • checks

Reserved for internal usage; any override will be discarded.

  • status
    • finishedAt

      Completion timestamp of diagnostics. If the Diagnostic Controller version is outdated, this field is not set and the corresponding error message is displayed in the error field.

    • error

      Error that occurs during diagnostics or if the Diagnostic Controller version is outdated. Omitted if empty.

    • controllerVersion

      Version of the controller that launched diagnostics.

    • result

      Map of check statuses where the key is the check name and the value is the result of the corresponding diagnostic check:

      • description

        Description of the check in plain text.

      • result

        Result of diagnostics. Possible values are PASS, ERROR, FAIL, WARNING, INFO.

      • message

        Optional. Explanation of the check results. It may optionally contain a reference to the documentation describing a known issue related to the check results, including the existing workaround for the issue.

      • success

        Success status of the check. Boolean.

      • ticketInfo

        Optional. Information about the ticket to track the resolution progress of the known issue related to the check results. For example, FIELD-12345.

The Diagnostic resource example:

apiVersion: diagnostic.mirantis.com/v1alpha1
kind: Diagnostic
metadata:
  name: test-diagnostic
  namespace: test-namespace
spec:
  cluster: test-cluster
status:
  finishedAt: 2024-07-01T11:27:14Z
  error: ""
  controllerVersion: v1.40.11
  result:
    bm_address_capacity:
      description: Baremetal addresses capacity
      message: LCM Subnet 'default/k8s-lcm-nics' has 8 allocatable addresses (threshold
        is 5) - OK; PXE-NIC Subnet 'default/k8s-pxe-nics' has 7 allocatable addresses
        (threshold is 5) - OK; Auto-assignable address pool 'default' from MetallbConfig
        'default/kaas-mgmt-metallb' has left 21 available IP addresses (threshold
        is 10) - OK
      result: INFO
      success: true
    bm_artifacts_overrides:
      description: Baremetal overrides check
      message: BM operator has no undesired overrides
      result: PASS
      success: true
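
A possible way to trigger diagnostics and read the results using generic kubectl commands; this is a sketch that assumes the object above is stored in a file named diagnostic.yaml and that the diagnostic resource name is exposed by the CRD:

kubectl create -f diagnostic.yaml
kubectl -n test-namespace get diagnostic test-diagnostic -o yaml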

IAM resources

This section contains descriptions and examples of the IAM resources for Mirantis Container Cloud. For management details, see Manage user roles through Container Cloud API.


IAMUser

IAMUser is the Cluster (non-namespaced) object. Its objects are synced from Keycloak, that is, they are created upon user creation in Keycloak and deleted upon user deletion in Keycloak. The IAMUser object is exposed as read-only to all users. It contains the following fields:

  • apiVersion

    API version of the object that is iam.mirantis.com/v1alpha1

  • kind

    Object type that is IAMUser

  • metadata

    Object metadata that contains the following field:

    • name

Sanitized user name without special characters, with the first 8 symbols of the user UUID appended to the end

  • displayName

    Name of the user as defined in the Keycloak database

  • externalID

    ID of the user as defined in the Keycloak database

Configuration example:

apiVersion: iam.mirantis.com/v1alpha1
kind: IAMUser
metadata:
  name: userone-f150d839
displayName: userone
externalID: f150d839-d03a-47c4-8a15-4886b7349791

IAMRole

IAMRole is the read-only cluster-level object that can have global, namespace, or cluster scope. It contains the following fields:

  • apiVersion

    API version of the object that is iam.mirantis.com/v1alpha1.

  • kind

    Object type that is IAMRole.

  • metadata

    Object metadata that contains the following field:

    • name

      Role name. Possible values are: global-admin, cluster-admin, operator, bm-pool-operator, user, member, stacklight-admin, management-admin.

      For details on user role assignment, see Manage user roles through Container Cloud API.

      Note

      The management-admin role is available since Container Cloud 2.25.0 (Cluster releases 17.0.0, 16.0.0, 14.1.0).

  • description

    Role description.

  • scope

    Role scope.

Configuration example:

apiVersion: iam.mirantis.com/v1alpha1
kind: IAMRole
metadata:
  name: global-admin
description: Gives permission to manage IAM role bindings in the Container Cloud deployment.
scope: global

IAMGlobalRoleBinding

IAMGlobalRoleBinding is the Cluster (non-namespaced) object that should be used for global role bindings in all namespaces. This object is accessible to users with the global-admin IAMRole assigned through the IAMGlobalRoleBinding object. The object contains the following fields:

  • apiVersion

    API version of the object that is iam.mirantis.com/v1alpha1.

  • kind

    Object type that is IAMGlobalRoleBinding.

  • metadata

    Object metadata that contains the following field:

    • name

Role binding name. If the role binding is user-created, the user can set any unique name. If a name relates to a binding that is synced by user-controller from Keycloak, the naming convention is <username>-<rolename>.

  • role

    Object role that contains the following field:

    • name

      Role name.

  • user

    Object name that contains the following field:

    • name

      Name of the iamuser object that the defined role is provided to. Not equal to the user name in Keycloak.

  • legacy

    Defines whether the role binding is legacy. Possible values are true or false.

  • legacyRole

    Applicable when the legacy field value is true. Defines the legacy role name in Keycloak.

  • external

    Defines whether the role is assigned through Keycloak and is synced by user-controller with the Container Cloud API as the IAMGlobalRoleBinding object. Possible values are true or false.

Caution

If you create the IAM*RoleBinding, do not set or modify the legacy, legacyRole, and external fields unless absolutely necessary and you understand all implications.

Configuration example:

apiVersion: iam.mirantis.com/v1alpha1
kind: IAMGlobalRoleBinding
metadata:
  name: userone-global-admin
role:
  name: global-admin
user:
  name: userone-f150d839
external: false
legacy: false
legacyRole: ""

IAMRoleBinding

IAMRoleBinding is the namespaced object that represents a grant of one role to one user in all clusters of the namespace. It is accessible to users that have either of the following bindings assigned to them:

  • IAMGlobalRoleBinding that binds them with the global-admin, operator, or user iamRole. For user, the bindings are read-only.

  • IAMRoleBinding that binds them with the operator or user iamRole in a particular namespace. For user, the bindings are read-only.

The IAMRoleBinding object contains the following fields:

  • apiVersion

    API version of the object that is iam.mirantis.com/v1alpha1.

  • kind

    Object type that is IAMRoleBinding.

  • metadata

    Object metadata that contains the following fields:

    • namespace

      Namespace that the defined binding belongs to.

    • name

Role binding name. If the role binding is user-created, the user can set any unique name. If a name relates to a binding that is synced from Keycloak, the naming convention is <userName>-<roleName>.

  • legacy

    Defines whether the role binding is legacy. Possible values are true or false.

  • legacyRole

    Applicable when the legacy field value is true. Defines the legacy role name in Keycloak.

  • external

Defines whether the role is assigned through Keycloak and is synced by user-controller with the Container Cloud API as the IAMRoleBinding object. Possible values are true or false.

Caution

If you create the IAM*RoleBinding, do not set or modify the legacy, legacyRole, and external fields unless absolutely necessary and you understand all implications.

  • role

    Object role that contains the following field:

    • name

      Role name.

  • user

    Object user that contains the following field:

    • name

      Name of the iamuser object that the defined role is granted to. Not equal to the user name in Keycloak.

Configuration example:

apiVersion: iam.mirantis.com/v1alpha1
kind: IAMRoleBinding
metadata:
  namespace: nsone
  name: userone-operator
external: false
legacy: false
legacyRole: ""
role:
  name: operator
user:
  name: userone-f150d839

IAMClusterRoleBinding

IAMClusterRoleBinding is the namespaced object that represents a grant of one role to one user on one cluster in the namespace. This object is accessible to users that have either of the following bindings assigned to them:

  • IAMGlobalRoleBinding that binds them with the global-admin, operator, or user iamRole. For user, the bindings are read-only.

  • IAMRoleBinding that binds them with the operator or user iamRole in a particular namespace. For user, the bindings are read-only.

The IAMClusterRoleBinding object contains the following fields:

  • apiVersion

    API version of the object that is iam.mirantis.com/v1alpha1.

  • kind

    Object type that is IAMClusterRoleBinding.

  • metadata

    Object metadata that contains the following fields:

    • namespace

      Namespace of the cluster that the defined binding belongs to.

    • name

Role binding name. If the role binding is user-created, the user can set any unique name. If a name relates to a binding that is synced from Keycloak, the naming convention is <userName>-<roleName>-<clusterName>.

  • role

    Object role that contains the following field:

    • name

      Role name.

  • user

    Object user that contains the following field:

    • name

      Name of the iamuser object that the defined role is granted to. Not equal to the user name in Keycloak.

  • cluster

    Object cluster that contains the following field:

    • name

      Name of the cluster on which the defined role is granted.

  • legacy

    Defines whether the role binding is legacy. Possible values are true or false.

  • legacyRole

    Applicable when the legacy field value is true. Defines the legacy role name in Keycloak.

  • external

Defines whether the role is assigned through Keycloak and is synced by user-controller with the Container Cloud API as the IAMClusterRoleBinding object. Possible values are true or false.

Caution

If you create the IAM*RoleBinding, do not set or modify the legacy, legacyRole, and external fields unless absolutely necessary and you understand all implications.

Configuration example:

apiVersion: iam.mirantis.com/v1alpha1
kind: IAMClusterRoleBinding
metadata:
  namespace: nsone
  name: userone-clusterone-admin
role:
  name: cluster-admin
user:
  name: userone-f150d839
cluster:
  name: clusterone
legacy: false
legacyRole: ""
external: false
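
To review the existing bindings, you can use generic kubectl listings, for example as in the following sketch; the lowercase plural resource names are assumed to be exposed by the corresponding CRDs:

kubectl get iamglobalrolebindings
kubectl -n nsone get iamrolebindings
kubectl -n nsone get iamclusterrolebindings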

ClusterOIDCConfiguration resource for MKE

Available since 17.0.0, 16.0.0, and 14.1.0

This section contains the description of the OpenID Connect (OIDC) custom resource for Mirantis Container Cloud that you can use to customize OIDC for Mirantis Kubernetes Engine (MKE) on managed clusters. Using this resource, add your own OIDC provider to authenticate user requests to Kubernetes. For OIDC provider requirements, see the OIDC official specification.

The creation procedure of the ClusterOIDCConfiguration for a managed cluster is described in Add a custom OIDC provider for MKE.

The Container Cloud ClusterOIDCConfiguration custom resource contains the following fields:

  • apiVersion

    The API version of the object that is kaas.mirantis.com/v1alpha1.

  • kind

    The object type that is ClusterOIDCConfiguration.

  • metadata

    The metadata object field of the ClusterOIDCConfiguration resource contains the following fields:

    • name

      The object name.

    • namespace

      The project name (Kubernetes namespace) of the related managed cluster.

  • spec

    The spec object field of the ClusterOIDCConfiguration resource contains the following fields:

    • adminRoleCriteria

      Definition of the id_token claim with the admin role and the role value.

      • matchType

        Matching type of the claim with the requested role. Possible values that MKE uses to match the claim with the requested value:

        • must

          Requires a plain string in the id_token claim, for example, "iam_role": "mke-admin".

        • contains

          Requires an array of strings in the id_token claim, for example, "iam_role": ["mke-admin", "pod-reader"].

      • name

        Name of the admin id_token claim containing a role or array of roles.

      • value

        Role value that matches the "iam_role" value in the admin id_token claim.

    • caBundle

      Base64-encoded certificate authority bundle of the OIDC provider endpoint.

    • clientID

      ID of the OIDC client to be used by Kubernetes.

    • clientSecret

      Secret value of the clientID parameter. After the ClusterOIDCConfiguration object creation, this field is updated automatically with a reference to the corresponding Secret. For example:

      clientSecret:
        secret:
          key: value
          name: CLUSTER_NAME-wqbkj
      
    • issuer

      OIDC endpoint.

Configuration example:

apiVersion: kaas.mirantis.com/v1alpha1
kind: ClusterOIDCConfiguration
metadata:
  name: CLUSTER_NAME
  namespace: CLUSTER_NAMESPACE
spec:
  adminRoleCriteria:
    matchType: contains
    name: iam_roles
    value: mke-admin
  caBundle: BASE64_ENCODED_CA
  clientID: MY_CLIENT
  clientSecret:
    value: MY_SECRET
  issuer: https://auth.example.com/
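
After creation, you can verify that the clientSecret field has been rewritten with the reference to the generated Secret, for example as in the following sketch; the clusteroidcconfiguration resource name is assumed to be exposed by the CRD:

kubectl -n CLUSTER_NAMESPACE get clusteroidcconfiguration CLUSTER_NAME -o yaml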

UpdateGroup resource

Available since 2.27.0 (17.2.0 and 16.2.0)

This section describes the UpdateGroup custom resource (CR) used in the Container Cloud API to configure update concurrency for specific sets of machines or machine pools within a cluster. This resource enhances the update process by allowing more granular control over the concurrency of machine updates. It also provides a way to control the reboot behavior of machines during a Cluster release update.

The Container Cloud UpdateGroup CR contains the following fields:

  • apiVersion

    API version of the object that is kaas.mirantis.com/v1alpha1.

  • kind

    Object type that is UpdateGroup.

  • metadata

    Metadata of the UpdateGroup CR that contains the following fields. All of them are required.

    • name

      Name of the UpdateGroup object.

    • namespace

      Project where the UpdateGroup is created.

    • labels

      Label to associate the UpdateGroup with a specific cluster in the cluster.sigs.k8s.io/cluster-name: <cluster-name> format.

  • spec

    Specification of the UpdateGroup CR that contains the following fields:

    • index

      Index to determine the processing order of the UpdateGroup object. Groups with the same index are processed concurrently.

      The update order of a machine within the same group is determined by the upgrade index of a specific machine. For details, see Change the upgrade order of a machine or machine pool.

    • concurrentUpdates

      Number of machines to update concurrently within UpdateGroup.

    • rebootIfUpdateRequires Since 2.28.0 (17.3.0 and 16.3.0)

Technology Preview. Automatic reboot of controller or worker machines of an update group if a Cluster release update involves node reboot, for example, when a kernel version update is available in the new Cluster release. You can set this parameter for management or managed clusters.

      Boolean. By default, true on management clusters and false on managed clusters. On managed clusters:

      • If set to true, related machines are rebooted as part of a Cluster release update that requires a reboot.

      • If set to false, machines are not rebooted even if a Cluster release update requires a reboot.

      Caution

      During a distribution upgrade, machines are always rebooted, overriding rebootIfUpdateRequires: false.

Configuration example:

apiVersion: kaas.mirantis.com/v1alpha1
kind: UpdateGroup
metadata:
  name: update-group-example
  namespace: managed-ns
  labels:
    cluster.sigs.k8s.io/cluster-name: managed-cluster
spec:
  index: 10
  concurrentUpdates: 2
  rebootIfUpdateRequires: false
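
To list the update groups associated with a particular cluster, you can filter by the documented label, for example as in the following sketch; the updategroups plural resource name is assumed to be exposed by the CRD:

kubectl -n managed-ns get updategroups -l cluster.sigs.k8s.io/cluster-name=managed-cluster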

MCCUpgrade resource

This section describes the MCCUpgrade resource used in Mirantis Container Cloud API to configure a schedule for the Container Cloud update.

The Container Cloud MCCUpgrade CR contains the following fields:

  • apiVersion

    API version of the object that is kaas.mirantis.com/v1alpha1.

  • kind

    Object type that is MCCUpgrade.

  • metadata

    The metadata object field of the MCCUpgrade resource contains the following fields:

    • name

The name of the MCCUpgrade object, must be mcc-upgrade.

  • spec

The spec object field of the MCCUpgrade resource contains the schedule that defines when the Container Cloud update is allowed or blocked. This field contains the following fields:

    • blockUntil

      Deprecated since Container Cloud 2.28.0 (Cluster release 16.3.0). Use autoDelay instead.

Time stamp in the ISO 8601 format, for example, 2021-12-31T12:30:00-05:00. Updates will be disabled until this time. You cannot set this field to more than 7 days in the future or to more than 30 days after the latest Container Cloud release.

    • autoDelay

      Available since Container Cloud 2.28.0 (Cluster release 16.3.0).

      Flag that enables delay of the management cluster auto-update to a new Container Cloud release and ensures that auto-update is not started immediately on the release date. Boolean, false by default.

The delay period is a minimum of 20 days for each newly discovered release and depends on the specifics of each release cycle and on the optional configuration of week days and hours selected for update. You can verify the exact date of a scheduled auto-update in the status section of the MCCUpgrade object.

      Note

      Modifying the delay period is not supported.

    • timeZone

      Name of a time zone in the IANA Time Zone Database. This time zone will be used for all schedule calculations. For example: Europe/Samara, CET, America/Los_Angeles.

    • schedule

List of schedule items that allow an update at specific hours or weekdays. The update process can proceed if at least one of these items allows it. A schedule item allows an update when both its hours and weekdays conditions are met. When this list is empty or absent, update is allowed at any hour of any day. Every schedule item contains the following fields:

      • hours

Object with two fields: from and to. Both must be non-negative integers not greater than 24, and to must be greater than from. Update is allowed if the current hour in the time zone specified by timeZone is greater than or equal to from and less than to. If hours is absent, update is allowed at any hour.

      • weekdays

        Object with boolean fields with these names:

        • monday

        • tuesday

        • wednesday

        • thursday

        • friday

        • saturday

        • sunday

        Update is allowed only on weekdays that have the corresponding field set to true. If all fields are false or absent, or weekdays is empty or absent, update is allowed on all weekdays.

    Full spec example:

    spec:
      autoDelay: true
      timeZone: CET
      schedule:
      - hours:
          from: 10
          to: 17
        weekdays:
          monday: true
          tuesday: true
      - hours:
          from: 7
          to: 10
        weekdays:
          monday: true
          friday: true
    

    In this example, all schedule calculations are done in the CET timezone and upgrades are allowed only:

    • From 7:00 to 17:00 on Mondays

    • From 10:00 to 17:00 on Tuesdays

    • From 7:00 to 10:00 on Fridays

  • status

    The status object field of the MCCUpgrade resource contains information about the next planned Container Cloud update, if available. This field contains the following fields:

    • nextAttempt Deprecated since 2.28.0 (Cluster release 16.3.0)

      Time stamp in the ISO 8601 format indicating the time when the Release Controller will attempt to discover and install a new Container Cloud release. Set to the next allowed time according to the schedule configured in spec or one minute in the future if the schedule currently allows update.

    • message Deprecated since 2.28.0 (Cluster release 16.3.0)

      Message from the last update step or attempt.

    • nextRelease

      Object describing the next release that Container Cloud will be updated to. Absent if no new releases have been discovered. Contains the following fields:

      • version

        Semver-compatible version of the next Container Cloud release, for example, 2.22.0.

      • date

        Time stamp in the ISO 8601 format of the Container Cloud release defined in version:

        • Since 2.28.0 (Cluster release 16.3.0), the field indicates the publish time stamp of a new release.

        • Before 2.28.0 (Cluster release 16.2.x or earlier), the field indicates the discovery time stamp of a new release.

      • scheduled

        Available since Container Cloud 2.28.0 (Cluster release 16.3.0). Time window that the pending Container Cloud release update is scheduled for:

        • startTime

          Time stamp in the ISO 8601 format indicating the start time of the update for the pending Container Cloud release.

        • endTime

          Time stamp in the ISO 8601 format indicating the end time of the update for the pending Container Cloud release.

    • lastUpgrade

      Time stamps of the latest Container Cloud update:

      • startedAt

        Time stamp in the ISO 8601 format indicating the time when the last Container Cloud update started.

      • finishedAt

        Time stamp in the ISO 8601 format indicating the time when the last Container Cloud update finished.

    • conditions

      Available since Container Cloud 2.28.0 (Cluster release 16.3.0). List of status conditions describing the status of the MCCUpgrade resource. Each condition has the following format:

      • type

        Condition type representing a particular aspect of the MCCUpgrade object. Currently, the only supported condition type is Ready that defines readiness to process a new release.

        If the status field of the Ready condition type is False, the Release Controller blocks the start of update operations.

      • status

        Condition status. Possible values: True, False, Unknown.

      • reason

        Machine-readable explanation of the condition.

      • lastTransitionTime

        Time of the latest condition transition.

      • message

        Human-readable description of the condition.

Example of MCCUpgrade status:

status:
  conditions:
  - lastTransitionTime: "2024-09-16T13:22:27Z"
    message: New release scheduled for upgrade
    reason: ReleaseScheduled
    status: "True"
    type: Ready
  lastUpgrade: {}
  message: ''
  nextAttempt: "2024-09-16T13:23:27Z"
  nextRelease:
    date: "2024-08-25T21:05:46Z"
    scheduled:
      endTime: "2024-09-17T00:00:00Z"
      startTime: "2024-09-16T00:00:00Z"
    version: 2.28.0
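
Because the object name must be mcc-upgrade, the schedule is typically adjusted in place, for example as in the following sketch; the mccupgrade resource name is assumed to be exposed by the CRD:

kubectl edit mccupgrade mcc-upgrade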

ClusterUpdatePlan resource

Available since 2.27.0 (17.2.0 and 16.2.0) TechPreview

This section describes the ClusterUpdatePlan custom resource (CR) used in the Container Cloud API to granularly control the update process of a managed cluster by stopping the update after each step.

The ClusterUpdatePlan CR contains the following fields:

  • apiVersion

    API version of the object that is kaas.mirantis.com/v1alpha1.

  • kind

    Object type that is ClusterUpdatePlan.

  • metadata

    Metadata of the ClusterUpdatePlan CR that contains the following fields:

    • name

      Name of the ClusterUpdatePlan object.

    • namespace

      Project name of the cluster that relates to ClusterUpdatePlan.

  • spec

    Specification of the ClusterUpdatePlan CR that contains the following fields:

    • source

Name of the source Cluster release from which the cluster is updated.

    • target

Name of the target Cluster release to which the cluster is updated.

    • cluster

      Name of the cluster for which ClusterUpdatePlan is created.

    • releaseNotes

      Available since Container Cloud 2.29.0 (Cluster releases 17.4.0 and 16.4.0). Link to MOSK release notes of the target release.

    • steps

      List of update steps, where each step contains the following fields:

      • id

        Available since Container Cloud 2.28.0 (Cluster releases 17.3.0 and 16.3.0). Step ID.

      • name

        Step name.

      • description

        Step description.

      • constraints

        Description of constraints applied during the step execution.

      • impact

        Impact of the step on the cluster functionality and workloads. Contains the following fields:

        • users

          Impact on the Container Cloud user operations. Possible values: none, major, or minor.

        • workloads

          Impact on workloads. Possible values: none, major, or minor.

        • info

          Additional details on impact, if any.

      • duration

        Details about duration of the step execution. Contains the following fields:

        • estimated

          Estimated time to complete the update step.

          Note

          Before Container Cloud 2.29.0 (Cluster releases 17.4.0 and 16.4.0), this field was named eta.

        • info

          Additional details on update duration, if any.

      • granularity

        Information on the current step granularity. Indicates whether the current step is applied to each machine individually or to the entire cluster at once. Possible values are cluster or machine.

      • commence

        Flag that allows controlling the step execution. Boolean, false by default. If set to true, the step starts execution after all previous steps are completed.

        Caution

        Cancelling an already started update step is unsupported.

  • status

    Status of the ClusterUpdatePlan CR that contains the following fields:

    • startedAt

      Time when ClusterUpdatePlan has started.

    • completedAt

      Available since Container Cloud 2.29.0 (Cluster releases 17.4.0 and 16.4.0). Time of update completion.

    • status

      Overall object status.

    • steps

      List of step statuses in the same order as defined in spec. Each step status contains the following fields:

      • id

        Available since Container Cloud 2.28.0 (Cluster releases 17.3.0 and 16.3.0). Step ID.

      • name

        Step name.

      • status

        Step status. Possible values are:

        • NotStarted

          Step has not started yet.

        • Scheduled

          Available since Container Cloud 2.28.0 (Cluster releases 17.3.0 and 16.3.0). Step is already triggered but its execution has not started yet.

        • InProgress

          Step is currently in progress.

        • AutoPaused

          Available since Container Cloud 2.29.0 (Cluster release 17.4.0) as Technology Preview. Update is automatically paused by the trigger from a firing alert defined in the UpdateAutoPause configuration. For details, see UpdateAutoPause resource.

        • Stuck

          Step execution contains an issue, which also indicates that the step does not fit into the estimate defined in the duration field for this step in spec.

        • Completed

          Step has been completed.

      • message

Message describing status details of the current update step.

      • duration

        Current duration of the step execution.

      • startedAt

        Start time of the step execution.

Example of a ClusterUpdatePlan object:

apiVersion: kaas.mirantis.com/v1alpha1
kind: ClusterUpdatePlan
metadata:
  creationTimestamp: "2025-02-06T16:53:51Z"
  generation: 11
  name: mosk-17.4.0
  namespace: child
  resourceVersion: "6072567"
  uid: 82c072be-1dc5-43dd-b8cf-bc643206d563
spec:
  cluster: mosk
  releaseNotes: https://docs.mirantis.com/mosk/latest/25.1-series.html
  source: mosk-17-3-0-24-3
  steps:
  - commence: true
    description:
    - install new version of OpenStack and Tungsten Fabric life cycle management
      modules
    - OpenStack and Tungsten Fabric container images pre-cached
    - OpenStack and Tungsten Fabric control plane components restarted in parallel
    duration:
      estimated: 1h30m0s
      info:
      - 15 minutes to cache the images and update the life cycle management modules
      - 1h to restart the components
    granularity: cluster
    id: openstack
    impact:
      info:
      - some of the running cloud operations may fail due to restart of API services
        and schedulers
      - DNS might be affected
      users: minor
      workloads: minor
    name: Update OpenStack and Tungsten Fabric
  - commence: true
    description:
    - Ceph version update
    - restart Ceph monitor, manager, object gateway (radosgw), and metadata services
    - restart OSD services node-by-node, or rack-by-rack depending on the cluster
      configuration
    duration:
      estimated: 8m30s
      info:
      - 15 minutes for the Ceph version update
      - around 40 minutes to update Ceph cluster of 30 nodes
    granularity: cluster
    id: ceph
    impact:
      info:
      - 'minor unavailability of object storage APIs: S3/Swift'
      - workloads may experience IO performance degradation for the virtual storage
        devices backed by Ceph
      users: minor
      workloads: minor
    name: Update Ceph
  - commence: true
    description:
    - new host OS kernel and packages get installed
    - host OS configuration re-applied
    - container runtime version gets bumped
    - new versions of Kubernetes components installed
    duration:
      estimated: 1h40m0s
      info:
      - about 20 minutes to update host OS per a Kubernetes controller, nodes updated
        one-by-one
      - Kubernetes components update takes about 40 minutes, all nodes in parallel
    granularity: cluster
    id: k8s-controllers
    impact:
      users: none
      workloads: none
    name: Update host OS and Kubernetes components on master nodes
  - commence: true
    description:
    - new host OS kernel and packages get installed
    - host OS configuration re-applied
    - container runtime version gets bumped
    - new versions of Kubernetes components installed
    - data plane components (Open vSwitch and Neutron L3 agents, TF agents and vrouter)
      restarted on gateway and compute nodes
    - storage nodes put to “no-out” mode to prevent rebalancing
    - by default, nodes are updated one-by-one, a node group can be configured to
      update several nodes in parallel
    duration:
      estimated: 8h0m0s
      info:
      - host OS update - up to 15 minutes per node (not including host OS configuration
        modules)
      - Kubernetes components update - up to 15 minutes per node
      - OpenStack controllers and gateways updated one-by-one
      - nodes hosting Ceph OSD, monitor, manager, metadata, object gateway (radosgw)
        services updated one-by-one
    granularity: machine
    id: k8s-workers-vdrok-child-default
    impact:
      info:
      - 'OpenStack controller nodes: some running OpenStack operations might not
        complete due to restart of components'
      - 'OpenStack compute nodes: minor loss of the East-West connectivity with
        the Open vSwitch networking back end that causes approximately 5 min of
        downtime'
      - 'OpenStack gateway nodes: minor loss of the North-South connectivity with
        the Open vSwitch networking back end: a non-distributed HA virtual router
        needs up to 1 minute to fail over; a non-distributed and non-HA virtual
        router failover time depends on many factors and may take up to 10 minutes'
      users: major
      workloads: major
    name: Update host OS and Kubernetes components on worker nodes, group vdrok-child-default
  - commence: true
    description:
    - restart of StackLight, MetalLB services
    - restart of auxiliary controllers and charts
    duration:
      estimated: 1h30m0s
    granularity: cluster
    id: mcc-components
    impact:
      info:
      - minor cloud API downtime due restart of MetalLB components
      users: minor
      workloads: none
    name: Auxiliary components update
  target: mosk-17-4-0-25-1
status:
  completedAt: "2025-02-07T19:24:51Z"
  startedAt: "2025-02-07T17:07:02Z"
  status: Completed
  steps:
  - duration: 26m36.355605528s
    id: openstack
    message: Ready
    name: Update OpenStack and Tungsten Fabric
    startedAt: "2025-02-07T17:07:02Z"
    status: Completed
  - duration: 6m1.124356485s
    id: ceph
    message: Ready
    name: Update Ceph
    startedAt: "2025-02-07T17:33:38Z"
    status: Completed
  - duration: 24m3.151554465s
    id: k8s-controllers
    message: Ready
    name: Update host OS and Kubernetes components on master nodes
    startedAt: "2025-02-07T17:39:39Z"
    status: Completed
  - duration: 1h19m9.359184228s
    id: k8s-workers-vdrok-child-default
    message: Ready
    name: Update host OS and Kubernetes components on worker nodes, group vdrok-child-default
    startedAt: "2025-02-07T18:03:42Z"
    status: Completed
  - duration: 2m0.772243006s
    id: mcc-components
    message: Ready
    name: Auxiliary components update
    startedAt: "2025-02-07T19:22:51Z"
    status: Completed
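
To start the next update step, switch the commence flag of that step to true, for example by editing the object in place as in the following sketch; the clusterupdateplan resource name is assumed to be exposed by the CRD, and the object name is taken from the example above:

kubectl -n child edit clusterupdateplan mosk-17.4.0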

UpdateAutoPause resource

Available since 2.29.0 (17.4.0) Technology Preview

This section describes the UpdateAutoPause custom resource (CR) used in the Container Cloud API to configure automatic pausing of cluster release updates in a managed cluster using StackLight alerts.

The Container Cloud UpdateAutoPause CR contains the following fields:

  • apiVersion

    API version of the object that is kaas.mirantis.com/v1alpha1.

  • kind

    Object type that is UpdateAutoPause.

  • metadata

    Metadata of the UpdateAutoPause CR that contains the following fields:

    • name

      Name of the UpdateAutoPause object. Must match the cluster name.

    • namespace

      Project where the UpdateAutoPause is created. Must match the cluster namespace.

  • spec

    Specification of the UpdateAutoPause CR that contains the following field:

    • alerts

      List of alert names. The occurrence of any alert from this list triggers auto-pause of the cluster release update.

  • status

    Status of the UpdateAutoPause CR that contains the following fields:

    • firingAlerts

      List of currently firing alerts from the specified set.

    • error

      Error message, if any, encountered during object processing.

Configuration example:

apiVersion: kaas.mirantis.com/v1alpha1
kind: UpdateAutoPause
metadata:
  name: example-cluster
  namespace: example-ns
spec:
  alerts:
    - KubernetesNodeNotReady
    - KubernetesContainerOOMKilled
status:
  firingAlerts:
    - KubernetesNodeNotReady
  error: ""

CacheWarmupRequest resource

TechPreview Available since 2.24.0 and 23.2 for MOSK clusters

This section describes the CacheWarmupRequest custom resource (CR) used in the Container Cloud API to predownload images and store them in the mcc-cache service.

The Container Cloud CacheWarmupRequest CR contains the following fields:

  • apiVersion

    API version of the object that is kaas.mirantis.com/v1alpha1.

  • kind

    Object type that is CacheWarmupRequest.

  • metadata

    The metadata object field of the CacheWarmupRequest resource contains the following fields:

    • name

      Name of the CacheWarmupRequest object that must match the existing management cluster name to which the warm-up operation applies.

    • namespace

Container Cloud project in which the cluster is created. Always set to default as the only project available for management cluster creation.

  • spec

    The spec object field of the CacheWarmupRequest resource contains the settings for artifacts fetching and artifacts filtering through Cluster releases. This field contains the following fields:

    • clusterReleases

      Array of strings. Defines a set of Cluster release names to warm up in the mcc-cache service.

    • openstackReleases

Optional. Array of strings. Defines a set of OpenStack releases to warm up in mcc-cache. Applicable only if the clusterReleases field contains mosk releases.

      If you plan to upgrade an OpenStack version, define the current and the target versions including the intermediate versions, if any. For example, to upgrade OpenStack from Victoria to Yoga:

      openstackReleases:
      - victoria
      - wallaby
      - xena
      - yoga
      
    • fetchRequestTimeout

      Optional. String. Time for a single request to download a single artifact. Defaults to 30m. For example, 1h2m3s.

    • clientsPerEndpoint

      Optional. Integer. Number of clients to use for fetching artifacts per each mcc-cache service endpoint. Defaults to 2.

    • openstackOnly

Optional. Boolean. Enables fetching of the OpenStack-related artifacts for MOSK. Defaults to false. Applicable only if the clusterReleases field contains mosk releases. Useful when you need to upgrade only an OpenStack version.

Example configuration:

apiVersion: kaas.mirantis.com/v1alpha1
kind: CacheWarmupRequest
metadata:
  name: example-cluster-name
  namespace: default
spec:
  clusterReleases:
  - mke-14-0-1
  - mosk-15-0-1
  openstackReleases:
  - yoga
  fetchRequestTimeout: 30m
  clientsPerEndpoint: 2
  openstackOnly: false

In this example:

  • The CacheWarmupRequest object is created for a management cluster named example-cluster-name.

  • The CacheWarmupRequest object is created in the only allowed default Container Cloud project.

  • Two Cluster releases mosk-15-0-1 and mke-14-0-1 will be predownloaded.

  • For mosk-15-0-1, only images related to the OpenStack version Yoga will be predownloaded.

  • Maximum time-out for a single request to download a single artifact is 30 minutes.

  • Two parallel workers will fetch artifacts per each mcc-cache service endpoint.

  • All artifacts will be fetched, not only those related to OpenStack.

GracefulRebootRequest resource

Available since 2.23.0 and 2.23.1 for MOSK 23.1

This section describes the GracefulRebootRequest custom resource (CR) used in the Container Cloud API for a rolling reboot of several or all cluster machines without workload interruption. The resource is also useful for a bulk reboot of machines, for example, on large clusters.

The Container Cloud GracefulRebootRequest CR contains the following fields:

  • apiVersion

    API version of the object that is kaas.mirantis.com/v1alpha1.

  • kind

    Object type that is GracefulRebootRequest.

  • metadata

    Metadata of the GracefulRebootRequest CR that contains the following fields:

    • name

      Name of the GracefulRebootRequest object. The object name must match the name of the cluster on which you want to reboot machines.

    • namespace

      Project where the GracefulRebootRequest is created.

  • spec

    Specification of the GracefulRebootRequest CR that contains the following fields:

    • machines

      List of machines for a rolling reboot. Each machine of the list is cordoned, drained, rebooted, and uncordoned in the order of cluster upgrade policy. For details about the upgrade order, see Change the upgrade order of a machine or machine pool.

      Leave this field empty to reboot all cluster machines.

      Caution

      The cluster and machines must have the Ready status to perform a graceful reboot.

Configuration example:

apiVersion: kaas.mirantis.com/v1alpha1
kind: GracefulRebootRequest
metadata:
  name: demo-cluster
  namespace: demo-project
spec:
  machines:
  - demo-worker-machine-1
  - demo-worker-machine-3
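
Before creating the request, you can confirm that the cluster and machines are in the Ready status using generic listings, for example as in the following sketch; the cluster and machines resource names are assumed to be available in the project namespace:

kubectl -n demo-project get cluster demo-cluster
kubectl -n demo-project get machines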

ContainerRegistry resource

This section describes the ContainerRegistry custom resource (CR) used in Mirantis Container Cloud API to configure CA certificates on machines to access private Docker registries.

The Container Cloud ContainerRegistry CR contains the following fields:

  • apiVersion

    API version of the object that is kaas.mirantis.com/v1alpha1

  • kind

    Object type that is ContainerRegistry

  • metadata

    The metadata object field of the ContainerRegistry CR contains the following fields:

    • name

      Name of the container registry

    • namespace

      Project where the container registry is created

  • spec

    The spec object field of the ContainerRegistry CR contains the following fields:

    • domain

      Host name and optional port of the registry

    • CACert

      CA certificate of the registry in the base64-encoded format

Caution

Only one ContainerRegistry resource can exist per domain. To configure multiple CA certificates for the same domain, combine them into one certificate.

The ContainerRegistry resource example:

apiVersion: kaas.mirantis.com/v1alpha1
kind: ContainerRegistry
metadata:
  name: demoregistry
  namespace: test
spec:
  domain: demohost:5000
  CACert: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0...
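
A possible way to produce the base64-encoded value for CACert from a PEM file on Linux; this is a sketch, and ca.crt is an assumed file name:

base64 -w0 ca.crt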

TLSConfig resource

This section describes the TLSConfig resource used in Mirantis Container Cloud API to configure TLS certificates for cluster applications.

Warning

The kubectl apply command automatically saves the applied data as plain text into the kubectl.kubernetes.io/last-applied-configuration annotation of the corresponding object. This may result in revealing sensitive data in this annotation when creating or modifying the object.

Therefore, do not use kubectl apply on this object. Use kubectl create, kubectl patch, or kubectl edit instead.

If you used kubectl apply on this object, you can remove the kubectl.kubernetes.io/last-applied-configuration annotation from the object using kubectl edit.

The Container Cloud TLSConfig CR contains the following fields:

  • apiVersion

    API version of the object that is kaas.mirantis.com/v1alpha1.

  • kind

    Object type that is TLSConfig.

  • metadata

    The metadata object field of the TLSConfig resource contains the following fields:

    • name

      Name of the public key.

    • namespace

      Project where the TLS certificate is created.

  • spec

    The spec object field contains the configuration to apply for an application. It contains the following fields:

    • serverName

      Host name of a server.

    • serverCertificate

      Certificate to authenticate the server’s identity to a client. A valid certificate bundle can be passed. The server certificate must be at the top of the chain.

    • privateKey

      Reference to the Secret object that contains a private key. A private key is a key for the server. It must correspond to the public key used in the server certificate.

      • key

        Key name in the secret.

      • name

        Secret name.

    • caCertificate

      Certificate that issued the server certificate. The top-most intermediate certificate should be used if a CA certificate is unavailable.

Configuration example:

apiVersion: kaas.mirantis.com/v1alpha1
kind: TLSConfig
metadata:
  namespace: default
  name: keycloak
spec:
  caCertificate: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0...
  privateKey:
    secret:
      key: value
      name: keycloak-s7mcj
  serverCertificate: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0...
  serverName: keycloak.mirantis.com
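
Because kubectl apply must not be used on this object, the following sketch illustrates creating the referenced private key Secret and the TLSConfig object with kubectl create. The file names are hypothetical; the secret name and key match the example above:

kubectl -n default create secret generic keycloak-s7mcj --from-file=value=server.key
kubectl -n default create -f tlsconfig.yaml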

Bare metal resources

This section contains descriptions and examples of the baremetal-based Kubernetes resources for Mirantis Container Cloud.

BareMetalHost

Private API since Container Cloud 2.29.0 (Cluster release 16.4.0)

Warning

Since Container Cloud 2.29.0 (Cluster release 16.4.0), use the BareMetalHostInventory resource instead of BareMetalHost for adding and modifying the configuration of a bare metal server. Any change in the BareMetalHost object will be overwritten by BareMetalHostInventory.

For any existing BareMetalHost object, a BareMetalHostInventory object is created automatically during management cluster update to the Cluster release 16.4.0.

This section describes the BareMetalHost resource used in the Mirantis Container Cloud API. A BareMetalHost object is created for each Machine and contains all information about the machine hardware configuration. BareMetalHost objects are used to monitor and manage the state of a bare metal server, which includes inspecting the host hardware and firmware, provisioning the operating system, controlling power, and deprovisioning the server. When a machine is created, the bare metal provider assigns a BareMetalHost to that machine using labels and the BareMetalHostProfile configuration.

For demonstration purposes, the Container Cloud BareMetalHost custom resource (CR) can be split into the following major sections:

BareMetalHost metadata

The Container Cloud BareMetalHost CR contains the following fields:

  • apiVersion

    API version of the object that is metal3.io/v1alpha1.

  • kind

    Object type that is BareMetalHost.

  • metadata

    The metadata field contains the following subfields:

    • name

      Name of the BareMetalHost object.

    • namespace

      Project in which the BareMetalHost object was created.

    • annotations

      Available since Cluster releases 12.5.0, 11.5.0, and 7.11.0. Key-value pairs to attach additional metadata to the object:

      • kaas.mirantis.com/baremetalhost-credentials-name

        Key that connects the BareMetalHost object with a previously created BareMetalHostCredential object. The value of this key must match the BareMetalHostCredential object name.

      • host.dnsmasqs.metal3.io/address

        Available since Cluster releases 17.0.0 and 16.0.0. Key that assigns a particular IP address to a bare metal host during PXE provisioning.

      • baremetalhost.metal3.io/detached

        Available since Cluster releases 17.0.0 and 16.0.0. Key that pauses host management by the bare metal Operator for a manual IP address assignment.

        Note

        If the host provisioning has already started or completed, adding this annotation deletes the information about the host from Ironic without triggering deprovisioning. The bare metal Operator recreates the host in Ironic once you remove the annotation. For details, see Metal3 documentation.

      • inspect.metal3.io/hardwaredetails-storage-sort-term

        Available since Cluster releases 17.0.0 and 16.0.0. Optional. Key that defines sorting of the bmh:status:storage[] list during inspection of a bare metal host. Accepts multiple tags separated by a comma or semicolon with the ASC/DESC suffix for the sorting direction. Example terms: sizeBytes DESC, hctl ASC, type ASC, name DESC.

        Since Cluster releases 17.1.0 and 16.1.0, the following default value applies: hctl ASC, wwn ASC, by_id ASC, name ASC.

    • labels

      Labels used by the bare metal provider to find a matching BareMetalHost object to deploy a machine:

      • hostlabel.bm.kaas.mirantis.com/controlplane

      • hostlabel.bm.kaas.mirantis.com/worker

      • hostlabel.bm.kaas.mirantis.com/storage

      Each BareMetalHost object added using the Container Cloud web UI is assigned one of these labels. If the BareMetalHost and Machine objects are created using the API, any label can be used to match these objects so that a machine is deployed on the matching bare metal host.

      Warning

      Labels and annotations that are not documented in this API Reference are generated automatically by Container Cloud. Do not modify them using the Container Cloud API.

Configuration example:

apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: master-0
  namespace: default
  labels:
    kaas.mirantis.com/baremetalhost-id: hw-master-0 # Or <bareMetalHostHardwareNodeUniqueId>
  annotations: # Since 2.21.0 (7.11.0, 12.5.0, 11.5.0)
    kaas.mirantis.com/baremetalhost-credentials-name: hw-master-0-credentials
BareMetalHost configuration

The spec section for the BareMetalHost object defines the desired state of BareMetalHost. It contains the following fields:

  • bmc

    Details for communication with the Baseboard Management Controller (bmc) module on a host. Contains the following subfields:

    • address

      URL for communicating with the BMC. URLs vary depending on the communication protocol and the BMC type, for example:

      • IPMI

        Default BMC type in the ipmi://<host>:<port> format. You can also use a plain <host>:<port> format. A port is optional if using the default port 623.

        You can change the IPMI privilege level from the default ADMINISTRATOR to OPERATOR with an optional URL parameter privilegelevel: ipmi://<host>:<port>?privilegelevel=OPERATOR.

      • Redfish

        BMC type in the redfish:// format. To disable TLS, you can use the redfish+http:// format. A host name or IP address and a path to the system ID are required for both formats. For example, redfish://myhost.example/redfish/v1/Systems/System.Embedded.1 or redfish://myhost.example/redfish/v1/Systems/1.

    • credentialsName

      Name of the secret containing the BareMetalHost object credentials.

      • Since Container Cloud 2.21.0 and 2.21.1 for MOSK 22.5, this field is updated automatically during cluster deployment. For details, see BareMetalHostCredential.

      • Before Container Cloud 2.21.0 or MOSK 22.5, the secret requires the username and password keys in the Base64 encoding.

    • disableCertificateVerification

      Boolean to skip certificate validation when true.

  • bootMACAddress

    MAC address for booting.

  • bootMode

    Boot mode: UEFI if UEFI is enabled and legacy if disabled.

  • online

    Defines whether the server must be online after provisioning is done.

    Warning

    Setting online: false for more than one bare metal host in a management cluster at a time can make the cluster non-operational.

Configuration example for Container Cloud 2.21.0 or later:

metadata:
  name: node-1-name
  annotations:
    kaas.mirantis.com/baremetalhost-credentials-name: node-1-credentials # Since Container Cloud 2.21.0
spec:
  bmc:
    address: 192.168.33.106:623
    credentialsName: ''
  bootMACAddress: 0c:c4:7a:a8:d3:44
  bootMode: legacy
  online: true

Configuration example for Container Cloud 2.20.1 or earlier:

metadata:
  name: node-1-name
spec:
  bmc:
    address: 192.168.33.106:623
    credentialsName: node-1-credentials-secret-f9g7d9f8h79
  bootMACAddress: 0c:c4:7a:a8:d3:44
  bootMode: legacy
  online: true
BareMetalHost status

The status field of the BareMetalHost object defines the current state of BareMetalHost. It contains the following fields:

  • errorMessage

    Last error message reported by the provisioning subsystem.

  • goodCredentials

    Last credentials that were validated.

  • hardware

    Hardware discovered on the host. Contains information about the storage, CPU, host name, firmware, and so on.

  • operationalStatus

    Status of the host:

    • OK

      Host is configured correctly and is manageable.

    • discovered

      Host is only partially configured. For example, the bmc address is discovered but not the login credentials.

    • error

      Host has any sort of error.

  • poweredOn

    Host availability status: powered on (true) or powered off (false).

  • provisioning

    State information tracked by the provisioner:

    • state

      Current action being done with the host by the provisioner.

    • id

      UUID of a machine.

  • triedCredentials

    Details of the last credentials sent to the provisioning backend.

Configuration example:

status:
  errorMessage: ""
  goodCredentials:
    credentials:
      name: master-0-bmc-secret
      namespace: default
    credentialsVersion: "13404"
  hardware:
    cpu:
      arch: x86_64
      clockMegahertz: 3000
      count: 32
      flags:
      - 3dnowprefetch
      - abm
      ...
      model: Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
    firmware:
      bios:
        date: ""
        vendor: ""
        version: ""
    hostname: ipa-fcab7472-892f-473c-85a4-35d64e96c78f
    nics:
    - ip: ""
      mac: 0c:c4:7a:a8:d3:45
      model: 0x8086 0x1521
      name: enp8s0f1
      pxe: false
      speedGbps: 0
      vlanId: 0
      ...
    ramMebibytes: 262144
    storage:
    - by_path: /dev/disk/by-path/pci-0000:00:1f.2-ata-1
      hctl: "4:0:0:0"
      model: Micron_5200_MTFD
      name: /dev/sda
      rotational: false
      serialNumber: 18381E8DC148
      sizeBytes: 1920383410176
      vendor: ATA
      wwn: "0x500a07511e8dc148"
      wwnWithExtension: "0x500a07511e8dc148"
      ...
    systemVendor:
      manufacturer: Supermicro
      productName: SYS-6018R-TDW (To be filled by O.E.M.)
      serialNumber: E16865116300188
  operationalStatus: OK
  poweredOn: true
  provisioning:
    state: provisioned
  triedCredentials:
    credentials:
      name: master-0-bmc-secret
      namespace: default
    credentialsVersion: "13404"
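
To review these status fields on a live host, you can query the object directly. A minimal sketch, assuming the bmh short name is registered for BareMetalHost and reusing the placeholder names from the examples above:

kubectl -n default get bmh master-0 -o yaml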
BareMetalHostCredential

Available since 2.21.0 and 2.21.1 for MOSK 22.5

This section describes the BareMetalHostCredential custom resource (CR) used in the Mirantis Container Cloud API. The BareMetalHostCredential object is created for each BareMetalHostInventory and contains all information about the Baseboard Management Controller (bmc) credentials.

Note

Before update of the management cluster to Container Cloud 2.29.0 (Cluster release 16.4.0), instead of BareMetalHostInventory, use the BareMetalHost object. For details, see BareMetalHost.

Caution

While the Cluster release of the management cluster is 16.4.0, BareMetalHostInventory operations are allowed to m:kaas@management-admin only. Once the management cluster is updated to the Cluster release 16.4.1 (or later), this limitation will be lifted.

Warning

The kubectl apply command automatically saves the applied data as plain text into the kubectl.kubernetes.io/last-applied-configuration annotation of the corresponding object. This may result in revealing sensitive data in this annotation when creating or modifying the object.

Therefore, do not use kubectl apply on this object. Use kubectl create, kubectl patch, or kubectl edit instead.

If you used kubectl apply on this object, you can remove the kubectl.kubernetes.io/last-applied-configuration annotation from the object using kubectl edit.

For demonstration purposes, the BareMetalHostCredential CR can be split into the following sections:

BareMetalHostCredential metadata

The BareMetalHostCredential metadata contains the following fields:

  • apiVersion

    API version of the object that is kaas.mirantis.com/v1alpha1

  • kind

    Object type that is BareMetalHostCredential

  • metadata

    The metadata field contains the following subfields:

    • name

      Name of the BareMetalHostCredential object

    • namespace

      Container Cloud project in which the related BareMetalHostInventory object was created

    • labels

      Labels used by the bare metal provider:

      • kaas.mirantis.com/region

        Region name

        Note

        The kaas.mirantis.com/region label is removed from all Container Cloud objects in 2.26.0 (Cluster releases 17.1.0 and 16.1.0). Therefore, do not add the label starting from these releases. On existing clusters updated to these releases, or if the label is added manually, Container Cloud ignores it.

BareMetalHostCredential configuration

The spec section for the BareMetalHostCredential object contains sensitive information that is moved to a separate Secret object during cluster deployment:

  • username

    User name of the bmc account with administrator privileges to control the power state and boot source of the bare metal host

  • password

    Details on the user password of the bmc account with administrator privileges:

    • value

      Password that will be automatically removed once saved in a separate Secret object

    • name

      Name of the Secret object where credentials are saved

The BareMetalHostCredential object creation triggers the following automatic actions:

  1. Create an underlying Secret object containing data about username and password of the bmc account of the related BareMetalHostCredential object.

  2. Erase sensitive password data of the bmc account from the BareMetalHostCredential object.

  3. Add the created Secret object name to the spec.password.name section of the related BareMetalHostCredential object.

  4. Update BareMetalHostInventory.spec.bmc.bmhCredentialsName with the BareMetalHostCredential object name.

    Note

    Before Container Cloud 2.29.0 (17.4.0 and 16.4.0), BareMetalHost.spec.bmc.credentialsName was updated with the BareMetalHostCredential object name.

Note

When you delete a BareMetalHostInventory object, the related BareMetalHostCredential object is deleted automatically.

Note

On existing clusters, a BareMetalHostCredential object is automatically created for each BareMetalHostInventory object during a cluster update.

Example of BareMetalHostCredential before the cluster deployment starts:

apiVersion: kaas.mirantis.com/v1alpha1
kind: BareMetalHostCredential
metadata:
  name: hw-master-0-credentials
  namespace: default
spec:
  username: admin
  password:
    value: superpassword

Example of BareMetalHostCredential created during cluster deployment:

apiVersion: kaas.mirantis.com/v1alpha1
kind: BareMetalHostCredential
metadata:
  name: hw-master-0-credentials
  namespace: default
spec:
  username: admin
  password:
    name: secret-cv98n7c0vb9
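
For illustration only, the automatically created Secret referenced in spec.password.name might look similar to the following sketch. The exact layout is managed by Container Cloud and may differ; the values shown are the base64-encoded placeholders from the example above:

apiVersion: v1
kind: Secret
metadata:
  name: secret-cv98n7c0vb9
  namespace: default
data:
  username: YWRtaW4= # "admin"
  password: c3VwZXJwYXNzd29yZA== # "superpassword"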
BareMetalHostInventory

Available since Container Cloud 2.29.0 (Cluster release 16.4.0)

Note

Before update of the management cluster to Container Cloud 2.29.0 (Cluster release 16.4.0), instead of BareMetalHostInventory, use the BareMetalHost object. For details, see BareMetalHost.

Caution

While the Cluster release of the management cluster is 16.4.0, BareMetalHostInventory operations are allowed to m:kaas@management-admin only. Once the management cluster is updated to the Cluster release 16.4.1 (or later), this limitation will be lifted.

This section describes the BareMetalHostInventory resource used in the Mirantis Container Cloud API to monitor and manage the state of a bare metal server. This includes inspecting the host hardware and firmware, provisioning the operating system, controlling power, and deprovisioning the server. The BareMetalHostInventory object is created for each Machine and contains all information about the machine hardware configuration.

Each BareMetalHostInventory object is synchronized with an automatically created BareMetalHost object, which is used for internal purposes of the Container Cloud private API.

Use the BareMetalHostInventory object instead of BareMetalHost for adding and modifying configuration of a bare metal server.

Caution

Any change in the BareMetalHost object will be overwritten by BareMetalHostInventory.

For any existing BareMetalHost object, a BareMetalHostInventory object is created automatically during management cluster update to Container Cloud 2.29.0 (Cluster release 16.4.0).

For demonstration purposes, the Container Cloud BareMetalHostInventory custom resource (CR) can be split into the following major sections:

BareMetalHostInventory metadata

The BareMetalHostInventory CR contains the following fields:

  • apiVersion

    API version of the object that is kaas.mirantis.com/v1alpha1.

  • kind

    Object type that is BareMetalHostInventory.

  • metadata

    The metadata field contains the following subfields:

    • name

      Name of the BareMetalHostInventory object.

    • namespace

      Project in which the BareMetalHostInventory object was created.

    • annotations

      • host.dnsmasqs.metal3.io/address

        Key that assigns a particular IP address to a bare metal host during PXE provisioning. For details, see Manually allocate IP addresses for bare metal hosts.

      • baremetalhost.metal3.io/detached

        Key that pauses host management by the bare metal Operator for a manual IP address assignment.

        Note

        If the host provisioning has already started or completed, adding this annotation deletes the information about the host from Ironic without triggering deprovisioning. The bare metal Operator recreates the host in Ironic once you remove the annotation. For details, see Metal3 documentation.

      • inspect.metal3.io/hardwaredetails-storage-sort-term

        Optional. Key that defines sorting of the bmh:status:storage[] list during inspection of a bare metal host. Accepts multiple tags separated by a comma or semicolon with the ASC/DESC suffix for the sorting direction. Example terms: sizeBytes DESC, hctl ASC, type ASC, name DESC.

        The default value is hctl ASC, wwn ASC, by_id ASC, name ASC.

    • labels

      Labels used by the bare metal provider to find a matching BareMetalHostInventory object for machine deployment. For example:

      • hostlabel.bm.kaas.mirantis.com/controlplane

      • hostlabel.bm.kaas.mirantis.com/worker

      • hostlabel.bm.kaas.mirantis.com/storage

      Warning

      Labels and annotations that are not documented in this API Reference are generated automatically by Container Cloud. Do not modify them using the Container Cloud API.

Configuration example:

apiVersion: kaas.mirantis.com/v1alpha1
kind: BareMetalHostInventory
metadata:
  name: master-0
  namespace: default
  labels:
    kaas.mirantis.com/baremetalhost-id: hw-master-0
  annotations:
    inspect.metal3.io/hardwaredetails-storage-sort-term: hctl ASC, wwn ASC, by_id ASC, name ASC
BareMetalHostInventory configuration

The spec section for the BareMetalHostInventory object defines the required state of BareMetalHostInventory. It contains the following fields:

  • bmc

    Details for communication with the Baseboard Management Controller (bmc) module on a host. Contains the following subfields:

    • address

      URL for communicating with the BMC. URLs vary depending on the communication protocol and the BMC type. For example:

      • IPMI

        Default BMC type in the ipmi://<host>:<port> format. You can also use a plain <host>:<port> format. A port is optional if using the default port 623.

        You can change the IPMI privilege level from the default ADMINISTRATOR to OPERATOR with an optional URL parameter privilegelevel: ipmi://<host>:<port>?privilegelevel=OPERATOR.

      • Redfish

        BMC type in the redfish:// format. To disable TLS, you can use the redfish+http:// format. A host name or IP address and a path to the system ID are required for both formats. For example, redfish://myhost.example/redfish/v1/Systems/System.Embedded.1 or redfish://myhost.example/redfish/v1/Systems/1.

    • bmhCredentialsName

      Name of the BareMetalHostCredential object.

    • disableCertificateVerification

      Key that disables certificate validation. Boolean, false by default. When true, the validation is skipped.

  • bootMACAddress

    MAC address for booting.

  • bootMode

    Boot mode: UEFI if UEFI is enabled and legacy if disabled.

  • online

    Defines whether the server must be online after provisioning is done.

    Warning

    Setting online: false for more than one bare metal host in a management cluster at a time can make the cluster non-operational.

Configuration example:

metadata:
  name: master-0
spec:
  bmc:
    address: 192.168.33.106:623
    bmhCredentialsName: 'master-0-bmc-credentials'
  bootMACAddress: 0c:c4:7a:a8:d3:44
  bootMode: legacy
  online: true
BareMetalHostInventory status

The status field of the BareMetalHostInventory object defines the current state of BareMetalHostInventory. It contains the following fields:

  • errorMessage

    Latest error message reported by the provisioning subsystem.

  • errorCount

    Number of errors that the host has encountered since the last successful operation.

  • operationalStatus

    Status of the host:

    • OK

      Host is configured correctly and is manageable.

    • discovered

      Host is only partially configured. For example, the bmc address is discovered but the login credentials are not.

    • error

      Host has any type of error.

  • poweredOn

    Host availability status: powered on (true) or powered off (false).

  • operationHistory

    Key that contains information about performed operations.

Status example:

status:
  errorCount: 0
  errorMessage: ""
  operationHistory:
    deprovision:
      end: null
      start: null
    inspect:
      end: "2025-01-01T00:00:00Z"
      start: "2025-01-01T00:00:00Z"
    provision:
      end: "2025-01-01T00:00:00Z"
      start: "2025-01-01T00:00:00Z"
    register:
      end: "2025-01-01T00:00:00Z"
      start: "2025-01-01T00:00:00Z"
  operationalStatus: OK
  poweredOn: true
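
To review these status fields on a live object, you can query the resource by its kind name. A minimal sketch with placeholder names; no short name is assumed here:

kubectl -n default get baremetalhostinventory master-0 -o yaml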
BareMetalHostProfile

This section describes the BareMetalHostProfile resource used in Mirantis Container Cloud API to define how the storage devices and operating system are provisioned and configured.

For demonstration purposes, the Container Cloud BareMetalHostProfile custom resource (CR) is split into the following major sections:

metadata

The Container Cloud BareMetalHostProfile CR contains the following fields:

  • apiVersion

    API version of the object that is metal3.io/v1alpha1.

  • kind

    Object type that is BareMetalHostProfile.

  • metadata

    The metadata field contains the following subfields:

    • name

      Name of the bare metal host profile.

    • namespace

      Project in which the bare metal host profile was created.

Configuration example:

apiVersion: metal3.io/v1alpha1
kind: BareMetalHostProfile
metadata:
  name: default
  namespace: default
spec

The spec field of the BareMetalHostProfile object contains the fields to customize your hardware configuration:

Warning

Any data stored on any device defined in the fileSystems list can be deleted or corrupted during cluster (re)deployment. It happens because each device from the fileSystems list is a part of the rootfs directory tree that is overwritten during (re)deployment.

Examples of affected devices include:

  • A raw device partition with a file system on it

  • A device partition in a volume group with a logical volume that has a file system on it

  • An mdadm RAID device with a file system on it

  • An LVM RAID device with a file system on it

Neither the wipe field (deprecated) nor the wipeDevice structure (recommended since Container Cloud 2.26.0) has any effect in this case; they cannot protect data on these devices.

Therefore, to prevent data loss, move the necessary data from these file systems to another server beforehand, if required.

  • devices

    List of definitions of the physical storage devices. To configure more than three storage devices per host, add additional devices to this list. Each device in the list can have one or more partitions defined by the list in the partitions field.

    • Each device in the list must have the following fields in the properties section for device handling:

      • workBy (recommended, string)

        Defines how the device should be identified. Accepts a comma-separated string with the following recommended value (in order of priority): by_id,by_path,by_wwn,by_name. Since 2.25.1, this value is set by default.

      • wipeDevice (recommended, object)

        Available since Container Cloud 2.26.0 (Cluster releases 17.1.0 and 16.1.0). Enables and configures cleanup of a device or its metadata before cluster deployment. Contains the following fields:

        • eraseMetadata (dictionary)

          Enables metadata cleanup of a device. Contains the following field:

          • enabled (boolean)

            Enables the eraseMetadata option. False by default.

        • eraseDevice (dictionary)

          Configures a complete cleanup of a device. Contains the following fields:

          • blkdiscard (object)

            Executes the blkdiscard command on the target device to discard all data blocks. Contains the following fields:

            • enabled (boolean)

              Enables the blkdiscard option. False by default.

            • zeroout (string)

              Configures writing of zeroes to each block during device erasure. Contains the following options:

              • fallback - default, blkdiscard attempts to write zeroes only if the device does not support the block discard feature. In this case, the blkdiscard command is re-executed with an additional --zeroout flag.

              • always - always write zeroes.

              • never - never write zeroes.

          • userDefined (object)

            Enables execution of a custom command or shell script to erase the target device. Contains the following fields:

            • enabled (boolean)

              Enables the userDefined option. False by default.

            • command (string)

              Defines a command to erase the target device. Empty by default. Mutually exclusive with script. For the command execution, the ansible.builtin.command module is called.

            • script (string)

              Defines a plain-text script allowing pipelines (|) to erase the target device. Empty by default. Mutually exclusive with command. For the script execution, the ansible.builtin.shell module is called.

            When executing a command or a script, you can use the following environment variables:

            • DEVICE_KNAME (always defined by Ansible)

              Device kernel path, for example, /dev/sda

            • DEVICE_BY_NAME (optional)

              Link from /dev/disk/by-name/ if it was added by udev

            • DEVICE_BY_ID (optional)

              Link from /dev/disk/by-id/ if it was added by udev

            • DEVICE_BY_PATH (optional)

              Link from /dev/disk/by-path/ if it was added by udev

            • DEVICE_BY_WWN (optional)

              Link from /dev/disk/by-wwn/ if it was added by udev

        For configuration details, see Wipe a device or partition.

      • wipe (boolean, deprecated)

        Defines whether the device must be wiped of the data before being used.

        Note

        This field is deprecated since Container Cloud 2.26.0 (Cluster releases 17.1.0 and 16.1.0) for the sake of wipeDevice and will be removed in one of the following releases.

        For backward compatibility, any existing wipe: true option is automatically converted to the following structure:

        wipeDevice:
          eraseMetadata:
            enabled: True
        

        Before Container Cloud 2.26.0, the wipe field is mandatory.

    • Each device in the list can have the following fields in its properties section that affect the selection of the specific device when the profile is applied to a host:

      • type (optional, string)

        The device type. Possible values: hdd, ssd, nvme. This property is used to filter selected devices by type.

      • partflags (optional, string)

        Extra partition flags to be applied on a partition. For example, bios_grub.

      • minSizeGiB, maxSizeGiB (deprecated, optional, string)

        The lower and upper limit of the selected device size. Only the devices matching these criteria are considered for allocation. Omitted parameter means no upper or lower limit.

        The minSize and maxSize parameter names are also available for the same purpose.

        Caution

        Mirantis recommends using only one parameter name type and units throughout the configuration files. If both sizeGiB and size are used, sizeGiB is ignored during deployment and the suffix is adjusted accordingly. For example, 1.5Gi will be serialized as 1536Mi. The size without units is counted in bytes. For example, size: 120 means 120 bytes.

        Since Container Cloud 2.26.0 (Cluster releases 17.1.0 and 16.1.0), minSizeGiB and maxSizeGiB are deprecated. Instead of floats that define sizes in GiB for *GiB fields, use the <sizeNumber>Gi text notation (Ki, Mi, and so on). All newly created profiles are automatically migrated to the Gi syntax. In existing profiles, migrate the syntax manually.

      • byName (forbidden in new profiles since 2.27.0, optional, string)

        The specific device name to be selected during provisioning, such as /dev/sda.

        Warning

        With NVME devices and certain hardware disk controllers, you cannot reliably select such device by the system name. Therefore, use a more specific byPath, serialNumber, or wwn selector.

        Caution

        Since Container Cloud 2.26.0 (Cluster releases 17.1.0 and 16.1.0), byName is deprecated. Since Container Cloud 2.27.0 (Cluster releases 17.2.0 and 16.2.0), byName is blocked by admission-controller in new BareMetalHostProfile objects. As a replacement, use a more specific selector, such as byPath, serialNumber, or wwn.

      • byPath (optional, string) Since 2.26.0 (17.1.0, 16.1.0)

        The specific device name with its path to be selected during provisioning, such as /dev/disk/by-path/pci-0000:00:07.0.

      • serialNumber (optional, string) Since 2.26.0 (17.1.0, 16.1.0)

        The specific serial number of a physical disk to be selected during provisioning, such as S2RBNXAH116186E.

      • wwn (optional, string) Since 2.26.0 (17.1.0, 16.1.0)

        The specific World Wide Name number of a physical disk to be selected during provisioning, such as 0x5002538d409aeeb4.

        Warning

        When using strict filters, such as byPath, serialNumber, or wwn, Mirantis strongly recommends not combining them with a soft filter, such as minSize / maxSize. Use only one approach.

  • softRaidDevices Tech Preview

    List of definitions of a software-based Redundant Array of Independent Disks (RAID) created by mdadm. Use the following fields to describe an mdadm RAID device:

    • name (mandatory, string)

      Name of a RAID device. Supports the following formats:

      • dev path, for example, /dev/md0.

      • simple name, for example, raid-name that will be created as /dev/md/raid-name on the target OS.

    • devices (mandatory, list)

      List of partitions from the devices list. The resulting list of devices must expand to at least two partitions.

    • level (optional, string)

      Level of a RAID device, defaults to raid1. Possible values: raid1, raid0, raid10.

    • metadata (optional, string)

      Metadata version of RAID, defaults to 1.0. Possible values: 1.0, 1.1, 1.2. For details about the differences in metadata, see man 8 mdadm.

      Warning

      The EFI system partition partflags: ['esp'] must be a physical partition in the main partition table of the disk, not under LVM or mdadm software RAID.

  • fileSystems

    List of file systems. Each file system can be created on top of either a device, a partition, or a logical volume. If more file systems are required for additional devices, define them in this field. Each file system in the list has the following fields:

    • fileSystem (mandatory, string)

      Type of a file system to create on a partition. For example, ext4, vfat.

    • mountOpts (optional, string)

      Comma-separated string of mount options. For example, rw,noatime,nodiratime,lazytime,nobarrier,commit=240,data=ordered.

    • mountPoint (optional, string)

      Target mount point for a file system. For example, /mnt/local-volumes/.

    • partition (optional, string)

      Partition name to be selected for creation from the list in the devices section. For example, uefi.

    • logicalVolume (optional, string)

      LVM logical volume name if the file system is supposed to be created on an LVM volume defined in the logicalVolumes section. For example, lvp.

  • logicalVolumes

    List of LVM logical volumes. Every logical volume belongs to a volume group from the volumeGroups list and has the size attribute for a size in the corresponding units.

    You can also add a software-based RAID raid1 created by LVM using the following fields:

    • name (mandatory, string)

      Name of a logical volume.

    • vg (mandatory, string)

      Name of a volume group that must be a name from the volumeGroups list.

    • sizeGiB or size (mandatory, string)

      Size of a logical volume in gigabytes. When set to 0, all available space on the corresponding volume group will be used. The 0 value equals -l 100%FREE in the lvcreate command.

    • type (optional, string)

      Type of a logical volume. If you require a usual logical volume, you can omit this field.

      Possible values:

      • linear

        Default. A usual logical volume. This value is implied for bare metal host profiles created using the Container Cloud release earlier than 2.12.0 where the type field is unavailable.

      • raid1 Tech Preview

        Serves to build the raid1 type of LVM. Equals to the lvcreate --type raid1... command. For details, see man 8 lvcreate and man 7 lvmraid.

      Caution

      Mirantis recommends using only one parameter name type and units throughout the configuration files. If both sizeGiB and size are used, sizeGiB is ignored during deployment and the suffix is adjusted accordingly. For example, 1.5Gi will be serialized as 1536Mi. The size without units is counted in bytes. For example, size: 120 means 120 bytes.

  • volumeGroups

    List of definitions of LVM volume groups. Each volume group contains one or more devices or partitions from the devices list. Contains the following field:

    • devices (mandatory, list)

      List of partitions to be used in a volume group. For example:

      - partition: lvm_root_part1
      - partition: lvm_root_part2
      

      Must contain the following field:

      • name (mandatory, string)

        Name of a volume group to be created. For example: lvm_root.

  • preDeployScript (optional, string)

    Shell script that executes on a host before provisioning the target operating system inside the ramfs system.

  • postDeployScript (optional, string)

    Shell script that executes on a host after deploying the operating system inside the ramfs system that is chrooted to the target operating system. To use a specific default gateway (for example, to have Internet access) at this stage, refer to Migration of DHCP configuration for existing management clusters.

  • grubConfig (optional, object)

    Set of options for the Linux GRUB bootloader on the target operating system. Contains the following field:

    • defaultGrubOptions (optional, array)

      Set of options passed to the Linux GRUB bootloader. Each string in the list defines one parameter. For example:

      defaultGrubOptions:
      - GRUB_DISABLE_RECOVERY="true"
      - GRUB_PRELOAD_MODULES=lvm
      - GRUB_TIMEOUT=20
      
  • kernelParameters:sysctl (optional, object)

    List of kernel sysctl options passed to /etc/sysctl.d/999-baremetal.conf during a bmh provisioning. For example:

    kernelParameters:
      sysctl:
        fs.aio-max-nr: "1048576"
        fs.file-max: "9223372036854775807"
    

    For the list of options prohibited to change, refer to MKE documentation: Set up kernel default protections.

    Note

    If asymmetric traffic is expected on some of the managed cluster nodes, enable the loose mode for the corresponding interfaces on those nodes by setting the net.ipv4.conf.<interface-name>.rp_filter parameter to "2" in the kernelParameters.sysctl section. For example:

    kernelParameters:
      sysctl:
        net.ipv4.conf.k8s-lcm.rp_filter: "2"
    
  • kernelParameters:modules (optional, object)

    List of options for kernel modules to be passed to /etc/modprobe.d/{filename} during a bare metal host provisioning. For example:

    kernelParameters:
      modules:
      - content: |
          options kvm_intel nested=1
        filename: kvm_intel.conf
    
Configuration example with strict filtering for device - applies since 2.26.0 (17.1.0 and 16.1.0)
spec:
  devices:
  - device:
      wipe: true
      workBy: by_wwn,by_path,by_id,by_name
      wwn: "0x5002538d409aeeb4"
    partitions:
    - name: bios_grub
      partflags:
      - bios_grub
      size: 4Mi
      wipe: true
    - name: uefi
      partflags:
      - esp
      size: 200Mi
      wipe: true
    - name: config-2
      size: 64Mi
      wipe: true
    - name: lvm_root_part
      size: 0
      wipe: true
  - device:
      byPath: /dev/disk/by-path/pci-0000:00:1f.2-ata-1
      minSize: 30Gi
      wipe: true
      workBy: by_id,by_path,by_wwn,by_name
    partitions:
    - name: lvm_lvp_part1
      size: 0
      wipe: true
  - device:
      byPath: /dev/disk/by-path/pci-0000:00:1f.2-ata-3
      minSize: 30Gi
      wipe: true
      workBy: by_id,by_path,by_wwn,by_name
    partitions:
    - name: lvm_lvp_part2
      size: 0
      wipe: true
  - device:
      serialNumber: 'Z1X69DG6'
      wipe: true
      workBy: by_id,by_path,by_wwn,by_name
  fileSystems:
  - fileSystem: vfat
    partition: config-2
  - fileSystem: vfat
    mountPoint: /boot/efi
    partition: uefi
  - fileSystem: ext4
    logicalVolume: root
    mountPoint: /
  - fileSystem: ext4
    logicalVolume: lvp
    mountPoint: /mnt/local-volumes/
  grubConfig:
    defaultGrubOptions:
    - GRUB_DISABLE_RECOVERY="true"
    - GRUB_PRELOAD_MODULES=lvm
    - GRUB_TIMEOUT=5
  ...
  logicalVolumes:
  - name: root
    size: 0
    type: linear
    vg: lvm_root
  - name: lvp
    size: 0
    type: linear
    vg: lvm_lvp
  ...
  volumeGroups:
  - devices:
    - partition: lvm_root_part
    name: lvm_root
  - devices:
    - partition: lvm_lvp_part1
    - partition: lvm_lvp_part2
    name: lvm_lvp
General configuration example with the wipeDevice option for devices - applies since 2.26.0 (17.1.0 and 16.1.0)
spec:
  devices:
  - device:
      wipeDevice:
        eraseMetadata:
          enabled: true
      workBy: by_wwn,by_path,by_id,by_name
    partitions:
    - name: bios_grub
      partflags:
      - bios_grub
      size: 4Mi
    - name: uefi
      partflags:
      - esp
      size: 200Mi
    - name: config-2
      size: 64Mi
    - name: lvm_root_part
      size: 0
  - device:
      minSize: 30Gi
      wipeDevice:
        eraseMetadata:
          enabled: true
      workBy: by_id,by_path,by_wwn,by_name
    partitions:
    - name: lvm_lvp_part1
      size: 0
      wipe: true
  - device:
      minSize: 30Gi
      wipeDevice:
        eraseMetadata:
          enabled: true
      workBy: by_id,by_path,by_wwn,by_name
    partitions:
    - name: lvm_lvp_part2
      size: 0
  - device:
      wipeDevice:
        eraseMetadata:
          enabled: true
      workBy: by_id,by_path,by_wwn,by_name
  fileSystems:
  - fileSystem: vfat
    partition: config-2
  - fileSystem: vfat
    mountPoint: /boot/efi
    partition: uefi
  - fileSystem: ext4
    logicalVolume: root
    mountPoint: /
  - fileSystem: ext4
    logicalVolume: lvp
    mountPoint: /mnt/local-volumes/
  grubConfig:
    defaultGrubOptions:
    - GRUB_DISABLE_RECOVERY="true"
    - GRUB_PRELOAD_MODULES=lvm
    - GRUB_TIMEOUT=5
  ...
  logicalVolumes:
  - name: root
    size: 0
    type: linear
    vg: lvm_root
  - name: lvp
    size: 0
    type: linear
    vg: lvm_lvp
  ...
  volumeGroups:
  - devices:
    - partition: lvm_root_part
    name: lvm_root
  - devices:
    - partition: lvm_lvp_part1
    - partition: lvm_lvp_part2
    name: lvm_lvp
General configuration example with the deprecated wipe option for devices - applies before 2.26.0 (17.1.0 and 16.1.0)
spec:
  devices:
   - device:
       #byName: /dev/sda
       minSize: 61GiB
       wipe: true
       workBy: by_wwn,by_path,by_id,by_name
     partitions:
       - name: bios_grub
         partflags:
         - bios_grub
         size: 4Mi
         wipe: true
       - name: uefi
         partflags: ['esp']
         size: 200Mi
         wipe: true
       - name: config-2
         # limited to 64Mb
         size: 64Mi
         wipe: true
       - name: md_root_part1
         wipe: true
         partflags: ['raid']
         size: 60Gi
       - name: lvm_lvp_part1
         wipe: true
         partflags: ['raid']
         # 0 Means, all left space
         size: 0
   - device:
       #byName: /dev/sdb
       minSize: 61GiB
       wipe: true
       workBy: by_wwn,by_path,by_id,by_name
     partitions:
       - name: md_root_part2
         wipe: true
         partflags: ['raid']
         size: 60Gi
       - name: lvm_lvp_part2
         wipe: true
         # 0 Means, all left space
         size: 0
   - device:
       #byName: /dev/sdc
       minSize: 30GiB
       wipe: true
       workBy: by_wwn,by_path,by_id,by_name
  softRaidDevices:
    - name: md_root
      metadata: "1.2"
      devices:
        - partition: md_root_part1
        - partition: md_root_part2
  volumeGroups:
    - name: lvm_lvp
      devices:
        - partition: lvm_lvp_part1
        - partition: lvm_lvp_part2
  logicalVolumes:
    - name: lvp
      vg: lvm_lvp
      # Means, all left space
      sizeGiB: 0
  postDeployScript: |
    #!/bin/bash -ex
    echo $(date) 'post_deploy_script done' >> /root/post_deploy_done
  preDeployScript: |
    #!/bin/bash -ex
    echo 'ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="deadline"' > /etc/udev/rules.d/60-ssd-scheduler.rules
    echo $(date) 'pre_deploy_script done' >> /root/pre_deploy_done
  fileSystems:
    - fileSystem: vfat
      partition: config-2
    - fileSystem: vfat
      partition: uefi
      mountPoint: /boot/efi/
    - fileSystem: ext4
      softRaidDevice: md_root
      mountPoint: /
    - fileSystem: ext4
      logicalVolume: lvp
      mountPoint: /mnt/local-volumes/
  grubConfig:
    defaultGrubOptions:
    - GRUB_DISABLE_RECOVERY="true"
    - GRUB_PRELOAD_MODULES=lvm
    - GRUB_TIMEOUT=20
  kernelParameters:
    sysctl:
    # For the list of options prohibited to change, refer to
    # https://docs.mirantis.com/mke/3.7/install/predeployment/set-up-kernel-default-protections.html
      kernel.dmesg_restrict: "1"
      kernel.core_uses_pid: "1"
      fs.file-max: "9223372036854775807"
      fs.aio-max-nr: "1048576"
      fs.inotify.max_user_instances: "4096"
      vm.max_map_count: "262144"
    modules:
      - filename: kvm_intel.conf
        content: |
          options kvm_intel nested=1
Mounting recommendations for the /var directory

When configuring volume mounts, Mirantis strongly advises against mounting the entire /var directory to a separate disk or partition. Otherwise, the cloud-init service may fail to configure the target host system during the first boot.

This recommendation prevents the following cloud-init issue caused by an asynchronous mount in systemd that ignores the mount dependency:

  1. The system boots and mounts the root (/) file system.

  2. The cloud-init service starts and processes data in /var/lib/cloud-init, which at this point still resides on the root file system ([/]var/lib/cloud-init).

  3. The systemd service mounts /var/lib/cloud-init as a separate file system, which breaks the cloud-init service logic.

Recommended configuration example for /var/lib/nova
spec:
  devices:
    ...
    - device:
        serialNumber: BTWA516305VE480FGN
        type: ssd
        wipeDevice:
          eraseMetadata:
            enabled: true
      partitions:
        - name: var_lib_nova_part
          size: 0
  fileSystems:
    ....
    - fileSystem: ext4
      partition: var_lib_nova_part
      mountPoint: '/var/lib/nova'
      mountOpts: 'rw,noatime,nodiratime,lazytime'
Not recommended configuration example for /var
spec:
  devices:
    ...
    - device:
        serialNumber: BTWA516305VE480FGN
        type: ssd
        wipeDevice:
          eraseMetadata:
            enabled: true
      partitions:
        - name: var_part
          size: 0
  fileSystems:
    ....
    - fileSystem: ext4
      partition: var_part
      mountPoint: '/var' # NOT RECOMMENDED
      mountOpts: 'rw,noatime,nodiratime,lazytime'
Cluster

This section describes the Cluster resource used in the Mirantis Container Cloud API to describe the cluster-level parameters.

For demonstration purposes, the Container Cloud Cluster custom resource (CR) is split into the following major sections:

Warning

The fields of the Cluster resource that are located under the status section including providerStatus are available for viewing only. They are automatically generated by the bare metal cloud provider and must not be modified using Container Cloud API.

metadata

The Container Cloud Cluster CR contains the following fields:

  • apiVersion

    API version of the object that is cluster.k8s.io/v1alpha1.

  • kind

    Object type that is Cluster.

The metadata object field of the Cluster resource contains the following fields:

  • name

    Name of a cluster. A managed cluster name is specified under the Cluster Name field in the Create Cluster wizard of the Container Cloud web UI. A management cluster name is configurable in the bootstrap script.

  • namespace

    Project in which the cluster object was created. The management cluster is always created in the default project. The managed cluster project equals the selected project name.

  • labels

    Key-value pairs attached to the object:

    • kaas.mirantis.com/provider

      Provider type that is baremetal for the baremetal-based clusters.

    • kaas.mirantis.com/region

      Region name. The default region name for the management cluster is region-one.

      Note

      The kaas.mirantis.com/region label is removed from all Container Cloud objects in 2.26.0 (Cluster releases 17.1.0 and 16.1.0). Therefore, do not add the label starting from these releases. On existing clusters updated to these releases, or if the label is added manually, Container Cloud ignores it.

    Warning

    Labels and annotations that are not documented in this API Reference are generated automatically by Container Cloud. Do not modify them using the Container Cloud API.

Configuration example:

apiVersion: cluster.k8s.io/v1alpha1
kind: Cluster
metadata:
  name: demo
  namespace: test
  labels:
    kaas.mirantis.com/provider: baremetal
spec:providerSpec

The spec object field of the Cluster object represents the BaremetalClusterProviderSpec subresource that contains a complete description of the desired bare metal cluster state and all details to create the cluster-level resources. It also contains the fields required for LCM deployment and integration of the Container Cloud components.

The providerSpec object field is custom for each cloud provider and contains the following generic fields for the bare metal provider:

  • apiVersion

    API version of the object that is baremetal.k8s.io/v1alpha1

  • kind

    Object type that is BaremetalClusterProviderSpec

Configuration example:

spec:
  ...
  providerSpec:
    value:
      apiVersion: baremetal.k8s.io/v1alpha1
      kind: BaremetalClusterProviderSpec
spec:providerSpec common

The common providerSpec object field of the Cluster resource contains the following fields:

  • credentials

    Field reserved for other cloud providers, has an empty value. Disregard this field.

  • release

    Name of the ClusterRelease object to install on a cluster

  • helmReleases

    List of enabled Helm releases from the Release object that run on a cluster

  • proxy

    Name of the Proxy object

  • tls

    TLS configuration for endpoints of a cluster

    • keycloak

      Keycloak endpoint

      • tlsConfigRef

        Reference to the TLSConfig object

    • ui

      Web UI endpoint

      • tlsConfigRef

        Reference to the TLSConfig object

    For more details, see TLSConfig resource.

  • maintenance

    Maintenance mode of a cluster. Prepares a cluster for maintenance and enables the possibility to switch machines into maintenance mode.

  • containerRegistries

    List of the ContainerRegistry resource names.

  • ntpEnabled

    NTP server mode. Boolean, enabled by default.

    Since Container Cloud 2.23.0, you can optionally disable NTP so that Container Cloud does not manage the chrony configuration, and use your own system for chrony management. Otherwise, configure the regional NTP server parameters to be applied to all machines of managed clusters.

    Before Container Cloud 2.23.0, you can optionally configure NTP parameters if servers from the Ubuntu NTP pool (*.ubuntu.pool.ntp.org) are accessible from the node where a management cluster is being provisioned. Otherwise, this configuration is mandatory.

    NTP configuration

    Configure the regional NTP server parameters to be applied to all machines of managed clusters.

    In the Cluster object, add the ntp:servers section with the list of required server names:

    spec:
      ...
      providerSpec:
        value:
          ...
          ntpEnabled: true
          kaas:
            ...
            regional:
              - helmReleases:
                - name: <providerName>-provider
                  values:
                    config:
                      lcm:
                        ...
                        ntp:
                          servers:
                          - 0.pool.ntp.org
                          ...
                provider: <providerName>
                ...
    

    To disable NTP:

    spec:
      ...
      providerSpec:
        value:
          ...
          ntpEnabled: false
          ...
    
  • audit Since 2.24.0 as TechPreview

    Optional. Auditing tools enabled on the cluster. Contains the auditd field that enables the Linux Audit daemon auditd to monitor activity of cluster processes and prevent potential malicious activity.

    Configuration for auditd

    In the Cluster object, add the auditd parameters:

    spec:
      providerSpec:
        value:
          audit:
            auditd:
              enabled: <bool>
              enabledAtBoot: <bool>
              backlogLimit: <int>
              maxLogFile: <int>
              maxLogFileAction: <string>
              maxLogFileKeep: <int>
              mayHaltSystem: <bool>
              presetRules: <string>
              customRules: <string>
              customRulesX32: <text>
              customRulesX64: <text>
    

    Configuration parameters for auditd:

    enabled

    Boolean, default - false. Enables the auditd role to install the auditd packages and configure rules. CIS rules: 4.1.1.1, 4.1.1.2.

    enabledAtBoot

    Boolean, default - false. Configures grub to audit processes that can be audited even if they start up prior to auditd startup. CIS rule: 4.1.1.3.

    backlogLimit

    Integer, default - none. Configures the backlog to hold records. If audit=1 is configured during boot, the backlog holds 64 records. If more than 64 records are created during boot, auditd records are lost and potential malicious activity may remain undetected. CIS rule: 4.1.1.4.

    maxLogFile

    Integer, default - none. Configures the maximum size of the audit log file. Once the log reaches the maximum size, it is rotated and a new log file is created. CIS rule: 4.1.2.1.

    maxLogFileAction

    String, default - none. Defines handling of the audit log file reaching the maximum file size. Allowed values:

    • keep_logs - rotate logs but never delete them

    • rotate - add a cron job to compress rotated log files and keep maximum 5 compressed files.

    • compress - compress log files and keep them under the /var/log/auditd/ directory. Requires auditd_max_log_file_keep to be enabled.

    CIS rule: 4.1.2.2.

    maxLogFileKeep

    Integer, default - 5. Defines the number of compressed log files to keep under the /var/log/auditd/ directory. Requires auditd_max_log_file_action=compress. CIS rules - none.

    mayHaltSystem

    Boolean, default - false. Halts the system when the audit logs are full. Applies the following configuration:

    • space_left_action = email

    • action_mail_acct = root

    • admin_space_left_action = halt

    CIS rule: 4.1.2.3.

    customRules

    String, default - none. Base64-encoded content of the 60-custom.rules file for any architecture. CIS rules - none.

    customRulesX32

    String, default - none. Base64-encoded content of the 60-custom.rules file for the i386 architecture. CIS rules - none.

    customRulesX64

    String, default - none. Base64-encoded content of the 60-custom.rules file for the x86_64 architecture. CIS rules - none.

    presetRules

    String, default - none. Comma-separated list of the following built-in preset rules:

    • access

    • actions

    • delete

    • docker

    • identity

    • immutable

    • logins

    • mac-policy

    • modules

    • mounts

    • perm-mod

    • privileged

    • scope

    • session

    • system-locale

    • time-change

    Since Container Cloud 2.28.0 (Cluster releases 17.3.0 and 16.3.0), in the Technology Preview scope, some of the preset rules indicated above are also available as groups that you can use in presetRules:

    • ubuntu-cis-rules - this group contains rules to comply with the Ubuntu CIS Benchmark recommendations, including the following CIS Ubuntu 20.04 v2.0.1 rules:

      • scope - 5.2.3.1

      • actions - same as 5.2.3.2

      • time-change - 5.2.3.4

      • system-locale - 5.2.3.5

      • privileged - 5.2.3.6

      • access - 5.2.3.7

      • identity - 5.2.3.8

      • perm-mod - 5.2.3.9

      • mounts - 5.2.3.10

      • session - 5.2.3.11

      • logins - 5.2.3.12

      • delete - 5.2.3.13

      • mac-policy - 5.2.3.14

      • modules - 5.2.3.19

    • docker-cis-rules - this group contains rules to comply with the Docker CIS Benchmark recommendations, including the docker preset rules that cover the Docker CIS v1.6.0 controls 1.1.3 - 1.1.18.

    You can also use two additional keywords inside presetRules:

    • none - select no built-in rules.

    • all - select all built-in rules. When using this keyword, you can add the ! prefix to a rule name to exclude some rules. You can use the ! prefix for rules only if you add the all keyword as the first rule. Place a rule with the ! prefix only after the all keyword.

    Example configurations:

    • presetRules: none - disable all preset rules

    • presetRules: docker - enable only the docker rules

    • presetRules: access,actions,logins - enable only the access, actions, and logins rules

    • presetRules: ubuntu-cis-rules - enable all rules from the ubuntu-cis-rules group

    • presetRules: docker-cis-rules,actions - enable all rules from the docker-cis-rules group and the actions rule

    • presetRules: all - enable all preset rules

    • presetRules: all,!immutable,!session - enable all preset rules except immutable and session


    CIS controls covered by the preset rules:

    • 4.1.3 (time-change)
    • 4.1.4 (identity)
    • 4.1.5 (system-locale)
    • 4.1.6 (mac-policy)
    • 4.1.7 (logins)
    • 4.1.8 (session)
    • 4.1.9 (perm-mod)
    • 4.1.10 (access)
    • 4.1.11 (privileged)
    • 4.1.12 (mounts)
    • 4.1.13 (delete)
    • 4.1.14 (scope)
    • 4.1.15 (actions)
    • 4.1.16 (modules)
    • 4.1.17 (immutable)

    Docker CIS controls covered by the preset rules:

    • 1.1.4
    • 1.1.8
    • 1.1.10
    • 1.1.12
    • 1.1.13
    • 1.1.15
    • 1.1.16
    • 1.1.17
    • 1.1.18
    • 1.2.3
    • 1.2.4
    • 1.2.5
    • 1.2.6
    • 1.2.7
    • 1.2.10
    • 1.2.11
  • secureOverlay

    Optional. Technology Preview. Deprecated since Container Cloud 2.29.0 (Cluster releases 17.4.0 and 16.4.0). Available since Container Cloud 2.24.0 (Cluster release 14.0.0). Enables WireGuard for traffic encryption on the Kubernetes workloads network. Boolean. Disabled by default. See the sketch after the configuration example below.

    Caution

    Before enabling WireGuard, ensure that the Calico MTU size is at least 60 bytes smaller than the interface MTU size of the workload network. IPv4 WireGuard uses a 60-byte header. For details, see Set the MTU size for Calico.

    Caution

    Changing this parameter on a running cluster causes a downtime that can vary depending on the cluster size.

    For more details about WireGuard, see Calico documentation: Encrypt in-cluster pod traffic.

Configuration example:

spec:
  ...
  providerSpec:
    value:
      credentials: ""
      publicKeys:
        - name: bootstrap-key
      release: ucp-5-7-0-3-3-3-tp11
      helmReleases:
        - name: metallb
          values:
            configInline:
              address-pools:
                - addresses:
                    - 10.0.0.101-10.0.0.120
                  name: default
                  protocol: layer2
        ...
        - name: stacklight
          ...
      tls:
        keycloak:
          certificate:
            name: keycloak
          hostname: container-cloud-auth.example.com
        ui:
          certificate:
            name: ui
          hostname: container-cloud-ui.example.com
      containerRegistries:
      - demoregistry
      ntpEnabled: false
      ...
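
The example above does not include the audit daemon and WireGuard parameters described earlier in this section. The following minimal sketch shows how they could be set; the audit:auditd nesting and the placement of secureOverlay directly under value are assumptions based on the field descriptions above, not a verbatim product excerpt:

spec:
  providerSpec:
    value:
      # Assumed nesting of the auditd parameters described above.
      audit:
        auditd:
          maxLogFileAction: compress
          maxLogFileKeep: 5
          presetRules: docker,access
      # Assumed placement of the WireGuard encryption toggle (Technology Preview).
      secureOverlay: true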
spec:providerSpec configuration

This section represents the Container Cloud components that are enabled on a cluster. These components are defined in the kaas object field of spec:providerSpec, which contains the following fields:

  • management

    Configuration for the management cluster components:

    • enabled

      Management cluster enabled (true) or disabled (false).

    • helmReleases

      List of the management cluster Helm releases that will be installed on the cluster. A Helm release includes the name and values fields. The specified values will be merged with relevant Helm release values of the management cluster in the Release object.

  • regional

    List of regional cluster components for the provider:

    • provider

      Provider type that is baremetal.

    • helmReleases

      List of the regional Helm releases that will be installed on the cluster. A Helm release includes the name and values fields. The specified values will be merged with relevant regional Helm release values in the Release object.

  • release

    Name of the Container Cloud Release object.

Configuration example:

spec:
  ...
  providerSpec:
     value:
       kaas:
         management:
           enabled: true
           helmReleases:
             - name: kaas-ui
               values:
                 serviceConfig:
                   server: https://10.0.0.117
         regional:
           - helmReleases:
             - name: baremetal-provider
               values: {}
             provider: baremetal
           ...
         release: kaas-2-0-0
status:providerStatus common

Must not be modified using API

The common providerStatus object field of the Cluster resource contains the following fields:

  • apiVersion

    API version of the object that is baremetal.k8s.io/v1alpha1

  • kind

    Object type that is BaremetalClusterProviderStatus

  • loadBalancerHost

    Load balancer IP or host name of the Container Cloud cluster

  • apiServerCertificate

    Server certificate of Kubernetes API

  • ucpDashboard

    URL of the Mirantis Kubernetes Engine (MKE) Dashboard

  • maintenance

    Maintenance mode of a cluster. Prepares a cluster for maintenance and enables switching machines into maintenance mode.

Configuration example:

status:
  providerStatus:
    apiVersion: baremetal.k8s.io/v1alpha1
    kind: BaremetalClusterProviderStatus
    loadBalancerHost: 10.0.0.100
    apiServerCertificate: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS…
    ucpDashboard: https://10.0.0.100:6443
status:providerStatus for cluster readiness

Must not be modified using API

The providerStatus object field of the Cluster resource that reflects the cluster readiness contains the following fields:

  • persistentVolumesProviderProvisioned

    Status of the persistent volumes provisioning. Prevents the Helm releases that require persistent volumes from being installed until a default StorageClass is added to the Cluster object.

  • helm

    Details about the deployed Helm releases:

    • ready

      Status of the deployed Helm releases. The true value indicates that all Helm releases are deployed successfully.

    • releases

      List of the enabled Helm releases that run on the Container Cloud cluster:

      • releaseStatuses

        List of the deployed Helm releases. The success: true field indicates that the release is deployed successfully.

      • stacklight

        Status of the StackLight deployment. Contains URLs of all StackLight components. The success: true field indicates that StackLight is deployed successfully.

  • nodes

    Details about the cluster nodes:

    • ready

      Number of nodes that completed the deployment or update.

    • requested

      Total number of nodes. If the number of ready nodes does not match the number of requested nodes, it means that a cluster is being currently deployed or updated.

  • notReadyObjects

    List of the Kubernetes services, deployments, and statefulsets objects that are not yet in the Ready state. A service is not ready if its external address has not been provisioned yet. A deployment or statefulset is not ready if the number of ready replicas does not equal the number of desired replicas. Both object types include the name and namespace of the object and, for controllers, the number of ready and desired replicas. If all objects are ready, the notReadyObjects list is empty.

Configuration example:

status:
  providerStatus:
    persistentVolumesProviderProvisioned: true
    helm:
      ready: true
      releases:
        releaseStatuses:
          iam:
            success: true
          ...
        stacklight:
          alerta:
            url: http://10.0.0.106
          alertmanager:
            url: http://10.0.0.107
          grafana:
            url: http://10.0.0.108
          kibana:
            url: http://10.0.0.109
          prometheus:
            url: http://10.0.0.110
          success: true
    nodes:
      ready: 3
      requested: 3
    notReadyObjects:
      services:
        - name: testservice
          namespace: default
      deployments:
        - name: baremetal-provider
          namespace: kaas
          replicas: 3
          readyReplicas: 2
      statefulsets: {}
status:providerStatus for OpenID Connect

Must not be modified using API

The oidc section of the providerStatus object field in the Cluster resource reflects the OpenID Connect (OIDC) configuration details. It contains the details required to obtain a token for a Container Cloud cluster and consists of the following fields:

  • certificate

    Base64-encoded OIDC certificate.

  • clientId

    Client ID for OIDC requests.

  • groupsClaim

    Name of an OIDC groups claim.

  • issuerUrl

    Issuer URL to obtain the representation of the realm.

  • ready

    OIDC status relevance. If true, the status corresponds to the LCMCluster OIDC configuration.

Configuration example:

status:
  providerStatus:
    oidc:
      certificate: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUREekNDQWZ...
      clientId: kaas
      groupsClaim: iam_roles
      issuerUrl: https://10.0.0.117/auth/realms/iam
      ready: true
status:providerStatus for cluster releases

Must not be modified using API

The releaseRefs section of the providerStatus object field in the Cluster resource provides the current Cluster release version as well as the one available for upgrade. It contains the following fields:

  • current

    Details of the currently installed Cluster release:

    • lcmType

      Type of the Cluster release (ucp).

    • name

      Name of the Cluster release resource.

    • version

      Version of the Cluster release.

    • unsupportedSinceKaaSVersion

      Indicates that a Container Cloud release newer than the current one exists and that it does not support the current Cluster release.

  • available

    List of the releases available for upgrade. Contains the name and version fields.

Configuration example:

status:
  providerStatus:
    releaseRefs:
      available:
        - name: ucp-5-5-0-3-4-0-dev
          version: 5.5.0+3.4.0-dev
      current:
        lcmType: ucp
        name: ucp-5-4-0-3-3-0-beta1
        version: 5.4.0+3.3.0-beta1
HostOSConfiguration

TechPreview since 2.26.0 (17.1.0 and 16.1.0)

Warning

For security reasons and to ensure safe and reliable cluster operability, test this configuration on a staging environment before applying it to production. For any questions, contact Mirantis support.

Caution

While the feature is still in the development stage, Mirantis highly recommends deleting all HostOSConfiguration objects, if any, before the automatic upgrade of the management cluster to Container Cloud 2.27.0 (Cluster release 16.2.0). After the upgrade, you can recreate the required objects using the updated parameters.

This precautionary step prevents re-processing and re-applying of existing configuration, which is defined in HostOSConfiguration objects, during management cluster upgrade to 2.27.0. Such behavior is caused by changes in the HostOSConfiguration API introduced in 2.27.0.

This section describes the HostOSConfiguration custom resource (CR) used in the Container Cloud API. It contains all necessary information to introduce and load modules for further configuration of the host operating system of the related Machine object.

Note

This object must be created and managed on the management cluster.

For demonstration purposes, we split the Container Cloud HostOSConfiguration CR into the following sections:

HostOSConfiguration metadata
metadata

The Container Cloud HostOSConfiguration custom resource (CR) contains the following fields:

  • apiVersion

    Object API version that is kaas.mirantis.com/v1alpha1.

  • kind

    Object type that is HostOSConfiguration.

The metadata object field of the HostOSConfiguration resource contains the following fields:

  • name

    Object name.

  • namespace

    Project in which the HostOSConfiguration object is created.

Configuration example:

apiVersion: kaas.mirantis.com/v1alpha1
kind: HostOSConfiguration
metadata:
  name: host-os-configuration-sample
  namespace: default
HostOSConfiguration configuration

The spec object field contains configuration for a HostOSConfiguration object and has the following fields:

  • machineSelector

    Required for production deployments. A set of Machine objects to apply the HostOSConfiguration object to. Has the format of the Kubernetes label selector.

  • configs

    Required. List of configurations to apply to Machine objects defined in machineSelector. Each entry has the following fields:

    • module

      Required. Name of the module that refers to an existing module in one of the HostOSConfigurationModules objects.

    • moduleVersion

      Required. Version of the module in use in the SemVer format.

    • description

      Optional. Description and purpose of the configuration.

    • order

      Optional. Positive integer between 1 and 1024 that indicates the order of applying the module configuration. A configuration with the lowest order value is applied first. If the order field is not set:

      The configuration is applied in the order of appearance in the list, after all configurations that have the order value set are applied.

      The following rules apply to the ordering when comparing each pair of entries:

      1. Entries are ordered alphabetically by their module values, unless the values are equal.

      2. Entries with equal module values are ordered by their moduleVersion values, with the lower version applied first.

      For example, a configuration that uses module a-module version 1.0.0 is applied before a-module version 1.1.0, and both are applied before b-module.

    • values

      Optional if secretValues is set. Module configuration in the format of key-value pairs.

    • secretValues

      Optional if values is set. Reference to a Secret object that contains the configuration values for the module (a minimal sketch of such a Secret is provided after the configuration example below):

      • namespace

        Project name of the Secret object.

      • name

        Name of the Secret object.

      Note

      You can use values and secretValues together. If a key is defined in both, the value from secretValues overrides the duplicated key from values.

      Warning

      The referenced Secret object must contain only primitive non-nested values. Otherwise, the values will not be applied correctly.

    • phase

      Optional. LCM phase, in which a module configuration must be executed. The only supported and default value is reconfigure. Hence, you may omit this field.

  • order Removed in 2.27.0 (17.2.0 and 16.2.0)

    Optional. Positive integer between 1 and 1024 that indicates the order of applying HostOSConfiguration objects on newly added or newly assigned machines. An object with the lowest order value is applied first. If the value is not set, the object is applied last in the order.

    If no order field is set for all HostOSConfiguration objects, the objects are sorted by name.

    Note

    If a user changes the HostOSConfiguration object that was already applied on some machines, then only the changed items from the spec.configs section of the HostOSConfiguration object are applied to those machines, and the execution order applies only to the changed items.

    The configuration changes are applied on corresponding LCMMachine objects almost immediately after host-os-modules-controller verifies the changes.

Configuration example:

spec:
   machineSelector:
      matchLabels:
        label-name: "label-value"
   configs:
   - description: Brief description of the configuration
     module: container-cloud-provided-module-name
     moduleVersion: 1.0.0
     order: 1
     # The 'phase' field is shown for illustration purposes only. It is redundant
     # because the only supported value is "reconfigure".
     phase: "reconfigure"
     values:
       foo: 1
       bar: "baz"
     secretValues:
       name: values-from-secret
       namespace: default
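
The secretValues section in the example above references a Secret object that stores module values. The following is a minimal sketch of such a Secret, with hypothetical keys; as stated in the warning above, the referenced object must contain only primitive, non-nested values, and duplicated keys override the values section:

apiVersion: v1
kind: Secret
metadata:
  name: values-from-secret
  namespace: default
type: Opaque
stringData:
  # Hypothetical primitive keys. The duplicated 'bar' key overrides
  # the 'bar' value defined in the values section of the object spec.
  bar: "baz-from-secret"
  token: "example-token"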
HostOSConfiguration status

The status field of the HostOSConfiguration object contains the current state of the object:

  • controllerUpdate Since 2.27.0 (17.2.0 and 16.2.0)

    Reserved. Indicates whether the status updates are initiated by host-os-modules-controller.

  • isValid Since 2.27.0 (17.2.0 and 16.2.0)

    Indicates whether all given configurations have been validated successfully and are ready to be applied on machines. An invalid object is discarded from processing.

  • specUpdatedAt Since 2.27.0 (17.2.0 and 16.2.0)

    Defines the time of the last change in the object spec observed by host-os-modules-controller.

  • containsDeprecatedModules Since 2.28.0 (17.3.0 and 16.3.0)

    Indicates whether the object uses one or several deprecated modules. Boolean.

  • machinesStates Since 2.27.0 (17.2.0 and 16.2.0)

    Specifies the per-machine state observed by baremetal-provider. The keys are machine names, and each entry has the following fields:

    • observedGeneration

      Read-only. Specifies the sequence number representing the number of changes in the object since its creation. For example, during object creation, the value is 1.

    • selected

      Indicates whether the machine satisfied the selector of the object. Non-selected machines are not defined in machinesStates. Boolean.

    • secretValuesChanged

      Indicates whether the secret values have been changed and the corresponding stateItems have to be updated. Boolean.

      The value is set to true by host-os-modules-controller if changes in the secret data are detected. The value is set to false by baremetal-provider after processing.

    • configStateItemsStatuses

      Specifies key-value pairs with statuses of StateItems that are applied to the machine. Each key contains the name and version of the configuration module. Each key value has the following format:

      • Key: name of a configuration StateItem

      • Value: simplified status of the configuration StateItem that has the following fields:

        • hash

          Value of the hash sum from the status of the corresponding StateItem in the LCMMachine object. Appears when the status switches to Success.

        • state

          Actual state of the corresponding StateItem from the LCMMachine object. Possible values: Not Started, Running, Success, Failed.

  • configs

    List of configuration statuses indicating the results of applying each configuration. Each entry has the following fields:

    • moduleName

      Existing module name from the list defined in the spec:modules section of the related HostOSConfigurationModules object.

    • moduleVersion

      Existing module version defined in the spec:modules section of the related HostOSConfigurationModules object.

    • modulesReference

      Name of the HostOSConfigurationModules object that contains the related module configuration.

    • modulePlaybook

      Name of the Ansible playbook of the module. The value is taken from the related HostOSConfigurationModules object where this module is defined.

    • moduleURL

      URL to the module package in the FQDN format. The value is taken from the related HostOSConfigurationModules object where this module is defined.

    • moduleHashsum

      Hash sum of the module. The value is taken from the related HostOSConfigurationModules object where this module is defined.

    • lastDesignatedConfiguration

      Removed in Container Cloud 2.27.0 (Cluster releases 17.2.0 and 16.2.0). Key-value pairs representing the latest designated configuration data for modules. Each key corresponds to a machine name, while the associated value contains the configuration data encoded in the gzip+base64 format.

    • lastValidatedSpec

      Removed in Container Cloud 2.27.0 (Cluster releases 17.2.0 and 16.2.0). Last validated module configuration encoded in the gzip+base64 format.

    • valuesValid

      Removed in Container Cloud 2.27.0 (Cluster releases 17.2.0 and 16.2.0). Validation state of the configuration and secret values defined in the object spec against the module valuesValidationSchema. Always true when valuesValidationSchema is empty.

    • error

      Details of an error, if any, that occurs during the object processing by host-os-modules-controller.

    • secretObjectVersion

      Available since Container Cloud 2.27.0 (Cluster releases 17.2.0 and 16.2.0). Resource version of the corresponding Secret object observed by host-os-modules-controller. Is present only if secretValues is set.

    • moduleDeprecatedBy

      Available since Container Cloud 2.28.0 (Cluster releases 17.3.0 and 16.3.0). List of modules that deprecate the currently configured module. Contains the name and version fields specifying one or more modules that deprecate the current module.

    • supportedDistributions

      Available since Container Cloud 2.28.0 (Cluster releases 17.3.0 and 16.3.0). List of operating system distributions that are supported by the current module. An empty list means that the current module supports any distribution.

HostOSConfiguration status example:

status:
  configs:
  - moduleHashsum: bc5fafd15666cb73379d2e63571a0de96fff96ac28e5bce603498cc1f34de299
    moduleName: module-name
    modulePlaybook: main.yaml
    moduleURL: <url-to-module-archive.tgz>
    moduleVersion: 1.1.0
    modulesReference: mcc-modules
    moduleDeprecatedBy:
    - name: another-module-name
      version: 1.0.0
  - moduleHashsum: 53ec71760dd6c00c6ca668f961b94d4c162eef520a1f6cb7346a3289ac5d24cd
    moduleName: another-module-name
    modulePlaybook: main.yaml
    moduleURL: <url-to-another-module-archive.tgz>
    moduleVersion: 1.1.0
    modulesReference: mcc-modules
    secretObjectVersion: "14234794"
  containsDeprecatedModules: true
  isValid: true
  machinesStates:
    default/master-0:
      configStateItemsStatuses:
        # moduleName-moduleVersion
        module-name-1.1.0:
          # corresponding state item
          host-os-download-<object-name>-module-name-1.1.0-reconfigure:
            hash: 0e5c4a849153d3278846a8ed681f4822fb721f6d005021c4509e7126164f428d
            state: Success
          host-os-<object-name>-module-name-1.1.0-reconfigure:
            state: Not Started
        another-module-name-1.1.0:
          host-os-download-<object-name>-another-module-name-1.1.0-reconfigure:
            state: Not Started
          host-os-<object-name>-another-module-name-1.1.0-reconfigure:
            state: Not Started
      observedGeneration: 1
      selected: true
  updatedAt: "2024-04-23T14:10:28Z"
HostOSConfigurationModules

TechPreview since 2.26.0 (17.1.0 and 16.1.0)

Warning

For security reasons and to ensure safe and reliable cluster operability, test this configuration on a staging environment before applying it to production. For any questions, contact Mirantis support.

This section describes the HostOSConfigurationModules custom resource (CR) used in the Container Cloud API. It contains all necessary information to introduce and load modules for further configuration of the host operating system of the related Machine object. For description of module format, schemas, and rules, see Format and structure of a module package.

Note

This object must be created and managed on the management cluster.

For demonstration purposes, we split the Container Cloud HostOSConfigurationModules CR into the following sections:

HostOSConfigurationModules metadata
metadata

The Container Cloud HostOSConfigurationModules custom resource (CR) contains the following fields:

  • apiVersion

    Object API version that is kaas.mirantis.com/v1alpha1.

  • kind

    Object type that is HostOSConfigurationModules.

The metadata object field of the HostOSConfigurationModules resource contains the following fields:

  • name

    Object name.

Configuration example:

apiVersion: kaas.mirantis.com/v1alpha1
kind: HostOSConfigurationModules
metadata:
  name: host-os-configuration-modules-sample
HostOSConfigurationModules configuration

The spec object field contains configuration for a HostOSConfigurationModules object and has the following fields:

  • modules

    List of available modules to use as a configuration. Each entry has the following fields:

    • name

      Required. Module name that must equal the corresponding custom module name defined in the metadata section of the corresponding module. For reference, see MOSK documentation: Day-2 operations - Metadata file format.

    • url

      Required for custom modules. URL to the archive containing the module package in the FQDN format. If omitted, the module is considered as the one provided and validated by Container Cloud.

    • version

      Required. Module version in SemVer format that must equal the corresponding custom module version defined in the metadata section of the corresponding module. For reference, see MOSK documentation: Day-2 operations - Metadata file format.

    • sha256sum

      Required. Hash sum computed using the SHA-256 algorithm. The hash sum is automatically validated when the module package is fetched; the module is not loaded if the hash sum is invalid.

    • deprecates Since 2.28.0 (17.3.0 and 16.3.0)

      Reserved. List of modules that will be deprecated by the module. This field is overridden by the same field, if any, of the module metadata section.

      Contains the name and version fields specifying one or more modules to be deprecated. If name is omitted, it inherits the name of the current module.

Configuration example:

spec:
    modules:
    - name: mirantis-provided-module-name
      sha256sum: ff3c426d5a2663b544acea74e583d91cc2e292913fc8ac464c7d52a3182ec146
      version: 1.0.0
    - name: custom-module-name
      url: https://fully.qualified.domain.name/to/module/archive/module-name-1.0.0.tgz
      sha256sum: 258ccafac1570de7b7829bde108fa9ee71b469358dbbdd0215a081f8acbb63ba
      version: 1.0.0
HostOSConfigurationModules status

The status field of the HostOSConfigurationModules object contains the current state of the object:

  • modules

    List of module statuses, indicating the loading results of each module. Each entry has the following fields:

    • name

      Name of the loaded module.

    • version

      Version of the loaded module.

    • url

      URL to the archive containing the loaded module package in the FQDN format.

    • docURL

      URL to the loaded module documentation if it was initially present in the module package.

    • description

      Description of the loaded module if it was initially present in the module package.

    • sha256sum

      Actual SHA-256 hash sum of the loaded module.

    • valuesValidationSchema

      JSON schema used against the module configuration values if it was initially present in the module package. The value is encoded in the gzip+base64 format.

    • state

      Actual availability state of the module. Possible values are: available or error.

    • error

      Error, if any, that occurred during the module fetching and verification.

    • playbookName

      Name of the module package playbook.

    • deprecates Since 2.28.0 (17.3.0 and 16.3.0)

      List of modules that are deprecated by the module. Contains the name and version fields specifying one or more modules deprecated by the current module.

    • deprecatedBy Since 2.28.0 (17.3.0 and 16.3.0)

      List of modules that deprecate the current module. Contains the name and version fields specifying one or more modules that deprecate the current module.

    • supportedDistributions Since 2.28.0 (17.3.0 and 16.3.0)

      List of operating system distributions that are supported by the current module. An empty list means that the current module supports any distribution.

HostOSConfigurationModules status example:

status:
  modules:
  - description: Brief description of the module
    docURL: https://docs.mirantis.com
    name: mirantis-provided-module-name
    playbookName: directory/main.yaml
    sha256sum: ff3c426d5a2663b544acea74e583d91cc2e292913fc8ac464c7d52a3182ec146
    state: available
    url: https://example.mirantis.com/path/to/module-name-1.0.0.tgz
    valuesValidationSchema: <gzip+base64 encoded data>
    version: 1.0.0
    deprecates:
    - name: custom-module-name
      version: 1.0.0
  - description: Brief description of the module
    docURL: https://example.documentation.page/module-name
    name: custom-module-name
    playbookName: directory/main.yaml
    sha256sum: 258ccafac1570de7b7829bde108fa9ee71b469358dbbdd0215a081f8acbb63ba
    state: available
    url: https://fully.qualified.domain.name/to/module/archive/module-name-1.0.0.tgz
    version: 1.0.0
    deprecatedBy:
    - name: mirantis-provided-module-name
      version: 1.0.0
    supportedDistributions:
    - ubuntu/jammy
IPaddr

This section describes the IPaddr resource used in Mirantis Container Cloud API. The IPAddr object describes an IP address and contains all information about the associated MAC address.

For demonstration purposes, the Container Cloud IPaddr custom resource (CR) is split into the following major sections:

IPaddr metadata

The Container Cloud IPaddr CR contains the following fields:

  • apiVersion

    API version of the object that is ipam.mirantis.com/v1alpha1

  • kind

    Object type that is IPaddr

  • metadata

    The metadata field contains the following subfields:

    • name

      Name of the IPaddr object in the auto-XX-XX-XX-XX-XX-XX format where XX-XX-XX-XX-XX-XX is the associated MAC address

    • namespace

      Project in which the IPaddr object was created

    • labels

      Key-value pairs that are attached to the object:

      • ipam/IP

        IPv4 address

      • ipam/IpamHostID

        Unique ID of the associated IpamHost object

      • ipam/MAC

        MAC address

      • ipam/SubnetID

        Unique ID of the Subnet object

      • ipam/UID

        Unique ID of the IPAddr object

      Warning

      Labels and annotations that are not documented in this API Reference are generated automatically by Container Cloud. Do not modify them using the Container Cloud API.

Configuration example:

apiVersion: ipam.mirantis.com/v1alpha1
kind: IPaddr
metadata:
  name: auto-0c-c4-7a-a8-b8-18
  namespace: default
  labels:
    ipam/IP: 172.16.48.201
    ipam/IpamHostID: 848b59cf-f804-11ea-88c8-0242c0a85b02
    ipam/MAC: 0C-C4-7A-A8-B8-18
    ipam/SubnetID: 572b38de-f803-11ea-88c8-0242c0a85b02
    ipam/UID: 84925cac-f804-11ea-88c8-0242c0a85b02
IPAddr spec

The spec object field of the IPAddr resource contains the associated MAC address and the reference to the Subnet object:

  • mac

    MAC address in the XX:XX:XX:XX:XX:XX format

  • subnetRef

    Reference to the Subnet resource in the <subnetProjectName>/<subnetName> format

Configuration example:

spec:
  mac: 0C:C4:7A:A8:B8:18
  subnetRef: default/kaas-mgmt
IPAddr status

The status object field of the IPAddr resource reflects the actual state of the IPAddr object. It contains the following fields:

  • address

    IP address.

  • cidr

    IPv4 CIDR for the Subnet.

  • gateway

    Gateway address for the Subnet.

  • mac

    MAC address in the XX:XX:XX:XX:XX:XX format.

  • nameservers

    List of the IP addresses of name servers of the Subnet. Each element of the list is a single address, for example, 172.18.176.6.

  • state Since 2.23.0

    Message that reflects the current status of the resource. The list of possible values includes the following:

    • OK - object is operational.

    • ERR - object is non-operational. This status has a detailed description in the messages list.

    • TERM - object was deleted and is terminating.

  • messages Since 2.23.0

    List of error or warning messages if the object state is ERR.

  • objCreated

    Date, time, and IPAM version of the resource creation.

  • objStatusUpdated

    Date, time, and IPAM version of the last update of the status field in the resource.

  • objUpdated

    Date, time, and IPAM version of the last resource update.

  • phase

    Deprecated since Container Cloud 2.23.0 and will be removed in one of the following releases in favor of state. Possible values: Active, Failed, or Terminating.

Configuration example:

status:
  address: 172.16.48.201
  cidr: 172.16.48.201/24
  gateway: 172.16.48.1
  objCreated: 2021-10-21T19:09:32Z  by  v5.1.0-20210930-121522-f5b2af8
  objStatusUpdated: 2021-10-21T19:14:18.748114886Z  by  v5.1.0-20210930-121522-f5b2af8
  objUpdated: 2021-10-21T19:09:32.606968024Z  by  v5.1.0-20210930-121522-f5b2af8
  mac: 0C:C4:7A:A8:B8:18
  nameservers:
  - 172.18.176.6
  state: OK
  phase: Active
IpamHost

This section describes the IpamHost resource used in Mirantis Container Cloud API. The kaas-ipam controller monitors the current state of the bare metal Machine and verifies that the BareMetalHost object is successfully created and inspection is completed. The kaas-ipam controller then fetches the information about the network interface configuration, creates the IpamHost object, and requests the IP addresses.

The IpamHost object is created for each Machine and contains the entire configuration of the host network interfaces and IP addresses. It also contains the information about the associated BareMetalHost and Machine objects and the MAC addresses.

Note

Before update of the management cluster to Container Cloud 2.29.0 (Cluster release 16.4.0), instead of BareMetalHostInventory, use the BareMetalHost object. For details, see BareMetalHost.

Caution

While the Cluster release of the management cluster is 16.4.0, BareMetalHostInventory operations are allowed to m:kaas@management-admin only. Once the management cluster is updated to the Cluster release 16.4.1 (or later), this limitation will be lifted.

For demonstration purposes, the Container Cloud IpamHost custom resource (CR) is split into the following major sections:

IpamHost metadata

The Container Cloud IpamHost CR contains the following fields:

  • apiVersion

    API version of the object that is ipam.mirantis.com/v1alpha1

  • kind

    Object type that is IpamHost

  • metadata

    The metadata field contains the following subfields:

    • name

      Name of the IpamHost object

    • namespace

      Project in which the IpamHost object has been created

    • labels

      Key-value pairs that are attached to the object:

      • cluster.sigs.k8s.io/cluster-name

        References the Cluster object name that IpamHost is assigned to

      • ipam/BMHostID

        Unique ID of the associated BareMetalHost object

      • ipam/MAC-XX-XX-XX-XX-XX-XX: "1"

        Number of NICs of the host that the corresponding MAC address is assigned to

      • ipam/MachineID

        Unique ID of the associated Machine object

      • ipam/UID

        Unique ID of the IpamHost object

      Warning

      Labels and annotations that are not documented in this API Reference are generated automatically by Container Cloud. Do not modify them using the Container Cloud API.

Configuration example:

apiVersion: ipam.mirantis.com/v1alpha1
kind: IpamHost
metadata:
  name: master-0
  namespace: default
  labels:
    cluster.sigs.k8s.io/cluster-name: kaas-mgmt
    ipam/BMHostID: 57250885-f803-11ea-88c8-0242c0a85b02
    ipam/MAC-0C-C4-7A-1E-A9-5C: "1"
    ipam/MAC-0C-C4-7A-1E-A9-5D: "1"
    ipam/MachineID: 573386ab-f803-11ea-88c8-0242c0a85b02
    ipam/UID: 834a2fc0-f804-11ea-88c8-0242c0a85b02
IpamHost configuration

The spec field of the IpamHost resource describes the desired state of the object. It contains the following fields:

  • nicMACmap

    Represents an unordered list of all NICs of the host obtained during the bare metal host inspection. Each NIC entry contains such fields as name, mac, ip, and so on. The primary field defines which NIC was used for PXE booting. Only one NIC can be primary. The IP address is not configurable and is provided only for debug purposes.

  • l2TemplateSelector

    If specified, contains the name (first priority) or label of the L2 template that will be applied during a machine creation. The l2TemplateSelector field is copied from the Machine providerSpec object to the IpamHost object only once, during a machine creation. To modify l2TemplateSelector after creation of a Machine CR, edit the IpamHost object.

  • netconfigUpdateMode TechPreview

    Update mode of network configuration. Possible values:

    • MANUAL

      Default, recommended. An operator manually applies new network configuration.

    • AUTO-UNSAFE

      Unsafe, not recommended. If new network configuration is rendered by kaas-ipam successfully, it is applied automatically with no manual approval.

    • MANUAL-GRACEPERIOD

      Initial value set during the IpamHost object creation. If new network configuration is rendered by kaas-ipam successfully, it is applied automatically with no manual approval. This value is implemented for automatic changes in the IpamHost object during the host provisioning and deployment. The value is changed automatically to MANUAL three hours after the IpamHost object creation.

    Caution

    For MKE clusters that are part of MOSK infrastructure, the feature support will become available in one of the following Container Cloud releases.

  • netconfigUpdateAllow TechPreview

    Manual approval of network changes. Possible values: true or false. Set to true to approve the Netplan configuration file candidate (stored in netconfigCandidate) and copy its contents to the effective Netplan configuration file list (stored in netconfigFiles). After that, the value is automatically switched back to false. See the approval sketch after the configuration example below.

    Note

    This value has effect only if netconfigUpdateMode is set to MANUAL.

    Set to true only if the status.netconfigCandidateState field of the network configuration candidate is OK.

    Caution

    The following fields of the ipamHost status are renamed since Container Cloud 2.22.0 in the scope of the L2Template and IpamHost objects refactoring:

    • netconfigV2 to netconfigCandidate

    • netconfigV2state to netconfigCandidateState

    • netconfigFilesState to netconfigFilesStates (per file)

    No user actions are required after renaming.

    The format of netconfigFilesState changed after renaming. The netconfigFilesStates field contains a dictionary of statuses of the network configuration files stored in netconfigFiles. The dictionary keys are file paths, and the values have, per file, the same meaning that netconfigFilesState had:

    • For a successfully rendered configuration file: OK: <timestamp> <sha256-hash-of-rendered-file>, where a timestamp is in the RFC 3339 format.

    • For a failed rendering: ERR: <error-message>.

    Caution

    For MKE clusters that are part of MOSK infrastructure, the feature support will become available in one of the following Container Cloud releases.

Configuration example:

spec:
  nicMACmap:
  - mac: 0c:c4:7a:1e:a9:5c
    name: ens11f0
  - ip: 172.16.48.157
    mac: 0c:c4:7a:1e:a9:5d
    name: ens11f1
    primary: true
  l2TemplateSelector:
    label: xxx
  netconfigUpdateMode: MANUAL
  netconfigUpdateAllow: false
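
As described for netconfigUpdateAllow above, approving a pending network configuration candidate in the MANUAL mode amounts to switching the flag in the IpamHost spec. A minimal sketch of such an approval, assuming that status.netconfigCandidateState is already OK:

spec:
  netconfigUpdateMode: MANUAL
  # Approves the candidate stored in status.netconfigCandidate and copies it
  # to netconfigFiles; the value automatically switches back to false afterwards.
  netconfigUpdateAllow: true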
IpamHost status

Caution

The following fields of the ipamHost status are renamed since Container Cloud 2.22.0 in the scope of the L2Template and IpamHost objects refactoring:

  • netconfigV2 to netconfigCandidate

  • netconfigV2state to netconfigCandidateState

  • netconfigFilesState to netconfigFilesStates (per file)

No user actions are required after renaming.

The format of netconfigFilesState changed after renaming. The netconfigFilesStates field contains a dictionary of statuses of the network configuration files stored in netconfigFiles. The dictionary keys are file paths, and the values have, per file, the same meaning that netconfigFilesState had:

  • For a successfully rendered configuration file: OK: <timestamp> <sha256-hash-of-rendered-file>, where a timestamp is in the RFC 3339 format.

  • For a failed rendering: ERR: <error-message>.

The status field of the IpamHost resource describes the observed state of the object. It contains the following fields:

  • netconfigCandidate

    Candidate of the Netplan configuration file in human readable format that is rendered using the corresponding L2Template. This field contains valid data if l2RenderResult and netconfigCandidateState retain the OK result.

  • l2RenderResult Deprecated

    Status of a rendered Netplan configuration candidate stored in netconfigCandidate. Possible values:

    • For a successful L2 template rendering: OK: timestamp sha256-hash-of-rendered-netplan, where timestamp is in the RFC 3339 format

    • For a failed rendering: ERR: <error-message>

    This field is deprecated and will be removed in one of the following releases. Use netconfigCandidateState instead.

  • netconfigCandidateState TechPreview

    Status of a rendered Netplan configuration candidate stored in netconfigCandidate. Possible values:

    • For a successful L2 template rendering: OK: timestamp sha256-hash-of-rendered-netplan, where timestamp is in the RFC 3339 format

    • For a failed rendering: ERR: <error-message>

    Caution

    For MKE clusters that are part of MOSK infrastructure, the feature support will become available in one of the following Container Cloud releases.

  • netconfigFiles

    List of Netplan configuration files rendered using the corresponding L2Template. It is used to configure host networking during bare metal host provisioning and during Kubernetes node deployment. For details, refer to Workflow of the netplan configuration using an L2 template.

    Its contents are changed only if rendering of Netplan configuration was successful. So, it always retains the last successfully rendered Netplan configuration. To apply changes in contents, the Infrastructure Operator approval is required. For details, see Modify network configuration on an existing machine.

    Every item in this list contains:

    • content

      The base64-encoded Netplan configuration file that was rendered using the corresponding L2Template.

    • path

      The file path for the Netplan configuration file on the target host.

  • netconfigFilesStates

    Status of Netplan configuration files stored in netconfigFiles. Possible values are:

    • For a successful L2 template rendering: OK: timestamp sha256-hash-of-rendered-netplan, where timestamp is in the RFC 3339 format

    • For a failed rendering: ERR: <error-message>

  • serviceMap

    Dictionary of services and their endpoints (IP address and optional interface name) that have the ipam/SVC-<serviceName> label. These addresses are added to the ServiceMap dictionary during rendering of an L2 template for a given IpamHost. For details, see Service labels and their life cycle.

  • state Since 2.23.0

    Message that reflects the current status of the resource. The list of possible values includes the following:

    • OK - object is operational.

    • ERR - object is non-operational. This status has a detailed description in the messages list.

    • TERM - object was deleted and is terminating.

  • messages Since 2.23.0

    List of error or warning messages if the object state is ERR.

  • objCreated

    Date, time, and IPAM version of the resource creation.

  • objStatusUpdated

    Date, time, and IPAM version of the last update of the status field in the resource.

  • objUpdated

    Date, time, and IPAM version of the last resource update.

Configuration example:

status:
  l2RenderResult: OK
  l2TemplateRef: namespace_name/l2-template-name/1/2589/88865f94-04f0-4226-886b-2640af95a8ab
  netconfigFiles:
    - content: ...<base64-encoded Netplan configuration file>...
      path: /etc/netplan/60-kaas-lcm-netplan.yaml
  netconfigFilesStates:
    /etc/netplan/60-kaas-lcm-netplan.yaml: 'OK: 2023-01-23T09:27:22.71802Z ece7b73808999b540e32ca1720c6b7a6e54c544cc82fa40d7f6b2beadeca0f53'
  netconfigCandidate:
    ...
    <Netplan configuration file in plain text, rendered from L2Template>
    ...
  netconfigCandidateState: 'OK: 2022-06-08T03:18:08.49590Z a4a128bc6069638a37e604f05a5f8345cf6b40e62bce8a96350b5a29bc8bccde'
  serviceMap:
    ipam/SVC-ceph-cluster:
      - ifName: ceph-br2
        ipAddress: 10.0.10.11
      - ifName: ceph-br1
        ipAddress: 10.0.12.22
    ipam/SVC-ceph-public:
      - ifName: ceph-public
        ipAddress: 10.1.1.15
    ipam/SVC-k8s-lcm:
      - ifName: k8s-lcm
        ipAddress: 10.0.1.52
  phase: Active
  state: OK
  objCreated: 2021-10-21T19:09:32Z  by  v5.1.0-20210930-121522-f5b2af8
  objStatusUpdated: 2021-10-21T19:14:18.748114886Z  by  v5.1.0-20210930-121522-f5b2af8
  objUpdated: 2021-10-21T19:09:32.606968024Z  by  v5.1.0-20210930-121522-f5b2af8
L2Template

This section describes the L2Template resource used in Mirantis Container Cloud API.

By default, Container Cloud configures a single interface on cluster nodes, leaving all other physical interfaces intact. With L2Template, you can create advanced host networking configurations for your clusters. For example, you can create bond interfaces on top of physical interfaces on the host.

For demonstration purposes, the Container Cloud L2Template custom resource (CR) is split into the following major sections:

L2Template metadata

The Container Cloud L2Template CR contains the following fields:

  • apiVersion

    API version of the object that is ipam.mirantis.com/v1alpha1.

  • kind

    Object type that is L2Template.

  • metadata

    The metadata field contains the following subfields:

    • name

      Name of the L2Template object.

    • namespace

      Project in which the L2Template object was created.

    • labels

      Key-value pairs that are attached to the object:

      Caution

      All ipam/* labels, except ipam/DefaultForCluster, are set automatically and must not be configured manually.

      • cluster.sigs.k8s.io/cluster-name

        References the Cluster object name that this template is applied to. Mandatory for newly created L2Template since Container Cloud 2.25.0.

        The process of selecting the L2Template object for a specific cluster is as follows:

        1. The kaas-ipam controller monitors the L2Template objects with the cluster.sigs.k8s.io/cluster-name: <clusterName> label.

        2. The L2Template object with the cluster.sigs.k8s.io/cluster-name: <clusterName> label is assigned to a cluster with Name: <clusterName>, if available.

      • ipam/PreInstalledL2Template: "1"

        Is automatically added during a management cluster deployment. Indicates that the current L2Template object was preinstalled. Represents L2 templates that are automatically copied to a project once it is created. Once the L2 templates are copied, the ipam/PreInstalledL2Template label is removed.

        Note

        Preinstalled L2 templates are removed in Container Cloud 2.26.0 (Cluster releases 17.1.0 and 16.1.0) along with the ipam/PreInstalledL2Template label. During cluster update to the mentioned releases, existing preinstalled templates are automatically removed.

      • ipam/DefaultForCluster

        This label is unique per cluster. When you use several L2 templates per cluster, only the first template is automatically labeled as the default one. All subsequent templates must be referenced in the machine configuration using l2TemplateSelector. You can manually configure this label if required.

      • ipam/UID

        Unique ID of an object.

      • kaas.mirantis.com/provider

        Provider type.

      • kaas.mirantis.com/region

        Region name.

        Note

        The kaas.mirantis.com/region label is removed from all Container Cloud objects in 2.26.0 (Cluster releases 17.1.0 and 16.1.0). Therefore, do not add the label starting these releases. On existing clusters updated to these releases, or if manually added, this label will be ignored by Container Cloud.

      Warning

      Labels and annotations that are not documented in this API Reference are generated automatically by Container Cloud. Do not modify them using the Container Cloud API.

Configuration example:

apiVersion: ipam.mirantis.com/v1alpha1
kind: L2Template
metadata:
  name: l2template-test
  namespace: default
  labels:
    ipam/DefaultForCluster: "1"
    cluster.sigs.k8s.io/cluster-name: test-cluster
    kaas.mirantis.com/provider: baremetal
L2Template configuration

The spec field of the L2Template resource describes the desired state of the object. It contains the following fields:

  • clusterRef

    Caution

    Deprecated since Container Cloud 2.25.0 in favor of the mandatory cluster.sigs.k8s.io/cluster-name label. Will be removed in one of the following releases.

    On existing clusters, this parameter is automatically migrated to the cluster.sigs.k8s.io/cluster-name label since 2.25.0.

    If an existing cluster has clusterRef: default set, the migration process involves removing this parameter. Subsequently, it is not substituted with the cluster.sigs.k8s.io/cluster-name label, ensuring the application of the L2 template across the entire Kubernetes namespace.

    The Cluster object name that this template is applied to. The default value is used to apply the given template to all clusters within a particular project, unless an L2 template that references a specific cluster name exists. The clusterRef field has priority over the cluster.sigs.k8s.io/cluster-name label:

    • When clusterRef is set to a non-default value, the cluster.sigs.k8s.io/cluster-name label will be added or updated with that value.

    • When clusterRef is set to default, the cluster.sigs.k8s.io/cluster-name label will be absent or removed.

    L2 template requirements

    • An L2 template must have the same project (Kubernetes namespace) as the referenced cluster.

    • A cluster can be associated with many L2 templates. Only one of them can have the ipam/DefaultForCluster label. Every L2 template that does not have the ipam/DefaultForCluster label can be later assigned to a particular machine using l2TemplateSelector.

    • The following rules apply to the default L2 template of a namespace:

      • Since Container Cloud 2.25.0, creation of the default L2 template for a namespace is disabled. On existing clusters, the Spec.clusterRef: default parameter of such an L2 template is automatically removed during the migration process. Subsequently, this parameter is not substituted with the cluster.sigs.k8s.io/cluster-name label, ensuring the application of the L2 template across the entire Kubernetes namespace. Therefore, you can continue using existing default namespaced L2 templates.

      • Before Container Cloud 2.25.0, the default L2Template object of a namespace must have the Spec.clusterRef: default parameter that is deprecated since 2.25.0.

  • ifMapping

    List of interface names for the template. The interface mapping is defined globally for all bare metal hosts in the cluster but can be overridden at the host level, if required, by editing the IpamHost object for a particular host. The ifMapping parameter is mutually exclusive with autoIfMappingPrio.

  • autoIfMappingPrio

    List of interface name prefixes, such as eno, ens, and so on, used to match the host interfaces and automatically create the interface list for the template. If you are not aware of any specific ordering of interfaces on the nodes, use the default ordering from the Predictable Network Interfaces Names specification for systemd. You can also override the default NIC list per host using the IfMappingOverride parameter of the corresponding IpamHost. The provision value corresponds to the network interface that was used to provision a node. Usually, it is the first NIC found on a particular node. It is defined explicitly to ensure that this interface will not be reconfigured accidentally.

    The autoIfMappingPrio parameter is mutually exclusive with ifMapping.

  • l3Layout

    Subnets to be used in the npTemplate section. The field contains a list of subnet definitions with parameters used by template macros.

    • subnetName

      Defines the alias name of the subnet that can be used to reference this subnet from the template macros. This parameter is mandatory for every entry in the l3Layout list.

    • subnetPool Unsupported since 2.28.0 (17.3.0 and 16.3.0)

      Optional. Default: none. Defines a name of the parent SubnetPool object that will be used to create a Subnet object with a given subnetName and scope. For deprecation details, see MOSK Deprecation Notes: SubnetPool resource management.

      If a corresponding Subnet object already exists, nothing will be created and the existing object will be used. If no SubnetPool is provided, no new Subnet object will be created.

    • scope

      Logical scope of the Subnet object with a corresponding subnetName. Possible values:

      • global - the Subnet object is accessible globally, for any Container Cloud project and cluster, for example, the PXE subnet.

      • namespace - the Subnet object is accessible within the same project where the L2 template is defined.

      • cluster - the Subnet object is only accessible to the cluster that L2Template.spec.clusterRef refers to. The Subnet objects with the cluster scope will be created for every new cluster.

    • labelSelector

      Contains a dictionary of labels and their respective values that will be used to find the matching Subnet object for the subnet. If the labelSelector field is omitted, the Subnet object will be selected by name, specified by the subnetName parameter. A sketch of a matching Subnet object is provided after the configuration example below.

      Caution

      The labels and their values in this section must match the ones added for the corresponding Subnet object.

    Caution

    The l3Layout section is mandatory for each L2Template custom resource.

  • npTemplate

    A netplan-compatible configuration with special lookup functions that defines the networking settings for the cluster hosts, where physical NIC names and details are parameterized. This configuration will be processed using Go templates. Instead of specifying IP and MAC addresses, interface names, and other network details specific to a particular host, the template supports use of special lookup functions. These lookup functions, such as nic, mac, ip, and so on, return host-specific network information when the template is rendered for a particular host.

    Caution

    All rules and restrictions of the netplan configuration also apply to L2 templates. For details, see the official netplan documentation.

    Caution

    We strongly recommend following the conventions below on network interface naming:

    • A physical NIC name set by an L2 template must not exceed 15 symbols. Otherwise, an L2 template creation fails. This limit is set by the Linux kernel.

    • Names of virtual network interfaces such as VLANs, bridges, bonds, veth, and so on must not exceed 15 symbols.

    We recommend setting interface names that do not exceed 13 symbols for both physical and virtual interfaces to avoid corner cases and issues in netplan rendering.

Configuration example:

spec:
  autoIfMappingPrio:
  - provision
  - eno
  - ens
  - enp
  l3Layout:
    - subnetName: kaas-mgmt
      scope:      global
      labelSelector:
        kaas-mgmt-subnet: ""
    - subnetName: demo-pods
      scope:      namespace
    - subnetName: demo-ext
      scope:      namespace
    - subnetName: demo-ceph-cluster
      scope:      namespace
    - subnetName: demo-ceph-replication
      scope:      namespace
  npTemplate: |
    version: 2
    ethernets:
      {{nic 1}}:
        dhcp4: false
        dhcp6: false
        addresses:
          - {{ip "1:kaas-mgmt"}}
        gateway4: {{gateway_from_subnet "kaas-mgmt"}}
        nameservers:
          addresses: {{nameservers_from_subnet "kaas-mgmt"}}
        match:
          macaddress: {{mac 1}}
        set-name: {{nic 1}}
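
The labelSelector entry for the kaas-mgmt subnet in the example above matches a label on the corresponding Subnet object. The following minimal sketch shows how such a Subnet could be labeled; the apiVersion is assumed to match the ipam.mirantis.com/v1alpha1 group used by the other IPAM objects in this reference, and the CIDR value is illustrative only:

apiVersion: ipam.mirantis.com/v1alpha1
kind: Subnet
metadata:
  name: kaas-mgmt
  namespace: default
  labels:
    # Must match the labelSelector defined in the l3Layout section above.
    kaas-mgmt-subnet: ""
spec:
  cidr: 172.16.48.0/24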
L2Template status

The status field of the L2Template resource reflects the actual state of the L2Template object and contains the following fields:

  • state Since 2.23.0

    Message that reflects the current status of the resource. The list of possible values includes the following:

    • OK - object is operational.

    • ERR - object is non-operational. This status has a detailed description in the messages list.

    • TERM - object was deleted and is terminating.

  • messages Since 2.23.0

    List of error or warning messages if the object state is ERR.

  • objCreated

    Date, time, and IPAM version of the resource creation.

  • objStatusUpdated

    Date, time, and IPAM version of the last update of the status field in the resource.

  • objUpdated

    Date, time, and IPAM version of the last resource update.

  • phase

    Deprecated since Container Cloud 2.23.0 and will be removed in one of the following releases in favor of state. Possible values: Active, Failed, or Terminating.

  • reason

    Deprecated since Container Cloud 2.23.0 and will be removed in one of the following releases in favor of messages. For the field description, see messages.

Configuration example:

status:
  phase: Failed
  state: ERR
  messages:
    - "ERR: The kaas-mgmt subnet in the terminating state."
  objCreated: 2021-10-21T19:09:32Z  by  v5.1.0-20210930-121522-f5b2af8
  objStatusUpdated: 2021-10-21T19:14:18.748114886Z  by  v5.1.0-20210930-121522-f5b2af8
  objUpdated: 2021-10-21T19:09:32.606968024Z  by  v5.1.0-20210930-121522-f5b2af8
Machine

This section describes the Machine resource used in Mirantis Container Cloud API for bare metal provider. The Machine resource describes the machine-level parameters.

For demonstration purposes, the Container Cloud Machine custom resource (CR) is split into the following major sections:

metadata

The Container Cloud Machine CR contains the following fields:

  • apiVersion

    API version of the object that is cluster.k8s.io/v1alpha1.

  • kind

    Object type that is Machine.

The metadata object field of the Machine resource contains the following fields:

  • name

    Name of the Machine object.

  • namespace

    Project in which the Machine object is created.

  • annotations

    Key-value pair to attach arbitrary metadata to the object:

    • metal3.io/BareMetalHost

      Annotation attached to the Machine object to reference the corresponding BareMetalHostInventory object in the <BareMetalHostProjectName/BareMetalHostName> format.

      Note

      Before update of the management cluster to Container Cloud 2.29.0 (Cluster release 16.4.0), instead of BareMetalHostInventory, use the BareMetalHost object. For details, see BareMetalHost.

      Caution

      While the Cluster release of the management cluster is 16.4.0, BareMetalHostInventory operations are allowed to m:kaas@management-admin only. Once the management cluster is updated to the Cluster release 16.4.1 (or later), this limitation will be lifted.

  • labels

    Key-value pairs that are attached to the object:

    • kaas.mirantis.com/provider

      Provider type that matches the provider type in the Cluster object and must be baremetal.

    • kaas.mirantis.com/region

      Region name that matches the region name in the Cluster object.

      Note

      The kaas.mirantis.com/region label is removed from all Container Cloud objects in 2.26.0 (Cluster releases 17.1.0 and 16.1.0). Therefore, do not add the label starting with these releases. On existing clusters updated to these releases, or if added manually, this label is ignored by Container Cloud.

    • cluster.sigs.k8s.io/cluster-name

      Cluster name that the Machine object is linked to.

    • cluster.sigs.k8s.io/control-plane

      For the control plane role of a machine, this label contains any value, for example, "true". For the worker role, this label is absent.

    Warning

    Labels and annotations that are not documented in this API Reference are generated automatically by Container Cloud. Do not modify them using the Container Cloud API.

Configuration example:

apiVersion: cluster.k8s.io/v1alpha1
kind: Machine
metadata:
  name: example-control-plane
  namespace: example-ns
  annotations:
    metal3.io/BareMetalHost: default/master-0
  labels:
    kaas.mirantis.com/provider: baremetal
    cluster.sigs.k8s.io/cluster-name: example-cluster
    cluster.sigs.k8s.io/control-plane: "true" # remove for worker
spec:providerSpec for instance configuration

The spec object field of the Machine object represents the BareMetalMachineProviderSpec subresource with all required details to create a bare metal instance. It contains the following fields:

  • apiVersion

    API version of the object that is baremetal.k8s.io/v1alpha1.

  • kind

    Object type that is BareMetalMachineProviderSpec.

  • bareMetalHostProfile

    Configuration profile of a bare metal host:

    • name

      Name of a bare metal host profile.

    • namespace

      Project in which the bare metal host profile is created.

  • l2TemplateIfMappingOverride

    If specified, overrides the interface mapping value for the corresponding L2Template object.

  • l2TemplateSelector

    If specified, contains the name (first priority) or label of the L2 template that will be applied during a machine creation. The l2TemplateSelector field is copied from the Machine providerSpec object to the IpamHost object only once, during a machine creation. To modify l2TemplateSelector after creation of a Machine CR, edit the IpamHost object.

  • hostSelector

    Specifies the matching criteria for labels on the bare metal hosts. Limits the set of the BareMetalHostInventory objects considered for claiming for the Machine object. The following selector labels can be added when creating a machine using the Container Cloud web UI:

    • hostlabel.bm.kaas.mirantis.com/controlplane

    • hostlabel.bm.kaas.mirantis.com/worker

    • hostlabel.bm.kaas.mirantis.com/storage

    Any custom label that is assigned to one or more bare metal hosts using API can be used as a host selector. If the BareMetalHostInventory objects with the specified label are missing, the Machine object will not be deployed until at least one bare metal host with the specified label is available.

    Note

    Before update of the management cluster to Container Cloud 2.29.0 (Cluster release 16.4.0), instead of BareMetalHostInventory, use the BareMetalHost object. For details, see BareMetalHost.

    Caution

    While the Cluster release of the management cluster is 16.4.0, BareMetalHostInventory operations are allowed to m:kaas@management-admin only. Once the management cluster is updated to the Cluster release 16.4.1 (or later), this limitation will be lifted.

  • nodeLabels

    List of node labels to be attached to a node for the user to run certain components on separate cluster nodes. The list of allowed node labels is located in the Cluster object status providerStatus.releaseRef.current.allowedNodeLabels field.

    If the value field is not defined in allowedNodeLabels, a label can have any value.

    Before or after a machine deployment, add the required label from the allowed node labels list with the corresponding value to spec.providerSpec.value.nodeLabels in machine.yaml. For example:

    nodeLabels:
    - key: stacklight
      value: enabled
    

    The addition of a node label that is not available in the list of allowed node labels is restricted.

  • distribution Mandatory

    Specifies an operating system (OS) distribution ID that is present in the current ClusterRelease object under the AllowedDistributions list. When specified, the BareMetalHostInventory object linked to this Machine object will be provisioned using the selected OS distribution instead of the default one.

    By default, ubuntu/jammy is installed on greenfield managed clusters:

    • Since Container Cloud 2.28.0 (Cluster releases 17.3.0 and 16.3.0), for MOSK clusters

    • Since Container Cloud 2.27.0 (Cluster releases 17.2.0 and 16.2.0), for non-MOSK clusters

    The default distribution is marked with the boolean flag default inside one of the elements under the AllowedDistributions list.

    The ubuntu/focal distribution was deprecated in Container Cloud 2.28.0 and is only supported for existing managed clusters. The Container Cloud 2.28.x release series is the last one to support Ubuntu 20.04 as the host operating system for managed clusters.

    Caution

    The outdated ubuntu/bionic distribution, which is removed in Cluster releases 17.0.0 and 16.0.0, is only supported for existing clusters based on Ubuntu 18.04. For greenfield deployments of managed clusters, only ubuntu/jammy is supported.

    Warning

    During the course of the Container Cloud 2.28.x series, Mirantis highly recommends upgrading the operating system on all nodes of your managed cluster machines to Ubuntu 22.04 before the next major Cluster release becomes available.

    It is not mandatory to upgrade all machines at once. You can upgrade them one by one or in small batches, for example, if the maintenance window is limited in time.

    Otherwise, the Cluster release update of the Ubuntu 20.04-based managed clusters will become impossible as of Container Cloud 2.29.0 with Ubuntu 22.04 as the only supported version.

    Management cluster update to Container Cloud 2.29.1 will be blocked if at least one node of any related managed cluster is running Ubuntu 20.04.

  • maintenance

    Maintenance mode of a machine. If enabled, the node of the selected machine is drained, cordoned, and prepared for maintenance operations.

  • upgradeIndex (optional)

    Positive integer that determines the order in which machines are upgraded. The first machine to upgrade is always one of the control plane machines with the lowest upgradeIndex. Other control plane machines are upgraded one by one according to their upgrade indexes.

    If the Cluster spec dedicatedControlPlane field is false, worker machines are upgraded only after the upgrade of all control plane machines finishes. Otherwise, they are upgraded after the first control plane machine, concurrently with other control plane machines.

    If two or more machines have the same value of upgradeIndex, these machines are equally prioritized during upgrade.

  • deletionPolicy

    Generally available since Container Cloud 2.25.0 (Cluster releases 17.0.0 and 16.0.0). Technology Preview since 2.21.0 (Cluster releases 11.5.0 and 7.11.0) for non-MOSK clusters. Policy used to identify steps required during a Machine object deletion. Supported policies are as follows:

    • graceful

      Prepares a machine for deletion by cordoning and draining the related node and removing it from Docker Swarm. Then deletes Kubernetes objects and associated resources. Can be aborted only before the node is removed from Docker Swarm.

    • unsafe

      Default. Deletes Kubernetes objects and associated resources without any preparations.

    • forced

      Deletes Kubernetes objects and associated resources without any preparations. Removes the Machine object even if the cloud provider or LCM Controller gets stuck at some step. May require a manual cleanup of machine resources in case of the controller failure.

    For more details on the workflow of machine deletion policies, see MOSK documentation: Overview of machine deletion policies.

Configuration example:

spec:
  ...
  providerSpec:
    value:
      apiVersion: baremetal.k8s.io/v1alpha1
      kind: BareMetalMachineProviderSpec
      bareMetalHostProfile:
        name: default
        namespace: default
      l2TemplateIfMappingOverride:
        - eno1
        - enp0s0
      l2TemplateSelector:
        label: l2-template1-label-1
      hostSelector:
        matchLabels:
          kaas.mirantis.com/baremetalhost-id: hw-master-0
      nodeLabels:
      - key: stacklight
        value: enabled
      distribution: ubuntu/jammy
      delete: false
      deletionPolicy: graceful
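
The example above omits the optional maintenance and upgradeIndex fields and selects a host by the predefined kaas.mirantis.com/baremetalhost-id label. The following minimal sketch shows the same providerSpec section with a custom host label instead; the label name and all values are illustrative:

spec:
  providerSpec:
    value:
      apiVersion: baremetal.k8s.io/v1alpha1
      kind: BareMetalMachineProviderSpec
      hostSelector:
        matchLabels:
          rack: rack-1      # any custom label assigned to bare metal hosts through the API
      maintenance: false    # set to true to put the machine into maintenance mode
      upgradeIndex: 2       # positive integer that defines the machine upgrade order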
Machine status

The status object field of the Machine object represents the BareMetalMachineProviderStatus subresource that describes the current bare metal instance state and contains the following fields:

  • apiVersion

    API version of the object that is cluster.k8s.io/v1alpha1.

  • kind

    Object type that is BareMetalMachineProviderStatus.

  • hardware

    Provides machine hardware information:

    • cpu

      Number of CPUs.

    • ram

      RAM capacity in GB.

    • storage

      List of hard drives mounted on the machine. Contains the disk name and size in GB.

  • status

    Represents the current status of a machine:

    • Provision

      A machine is yet to obtain a status

    • Uninitialized

      A machine is yet to obtain the node IP address and host name

    • Pending

      A machine is yet to receive the deployment instructions and it is either not booted yet or waits for the LCM controller to be deployed

    • Prepare

      A machine is running the Prepare phase during which Docker images and packages are being predownloaded

    • Deploy

      A machine is processing the LCM Controller instructions

    • Reconfigure

      A machine is being updated with a configuration without affecting workloads running on the machine

    • Ready

      A machine is deployed and the supported Mirantis Kubernetes Engine (MKE) version is set

    • Maintenance

      A machine host is cordoned, drained, and prepared for maintenance operations

  • currentDistribution Since 2.24.0 as TechPreview and 2.24.2 as GA

    Distribution ID of the current operating system installed on the machine. For example, ubuntu/jammy.

  • maintenance

    Maintenance mode of a machine. If enabled, the node of the selected machine is drained, cordoned, and prepared for maintenance operations.

  • reboot Available since 2.22.0

    Indicator of a host reboot to complete the Ubuntu operating system updates, if any.

    • required

      Specifies whether a host reboot is required. Boolean. If true, a manual host reboot is required.

    • reason

      Lists the package name(s) whose updates require a host reboot to be applied.

  • upgradeIndex

    Positive integer that determines the order in which machines are upgraded. If upgradeIndex is set in the Machine object spec, this status value equals the one in the spec. Otherwise, this value displays the automatically generated upgrade order.

  • delete

    Generally available since Container Cloud 2.25.0 (Cluster releases 17.0.0 and 16.0.0). Technology Preview since 2.21.0 for non-MOSK clusters. Indicates the start of a machine deletion or its successful abortion. Boolean.

  • prepareDeletionPhase

    Generally available since Container Cloud 2.25.0 (Cluster releases 17.0.0 and 16.0.0). Technology Preview since 2.21.0 for non-MOSK clusters. Preparation phase for a graceful machine deletion. Possible values are as follows:

    • started

      Cloud provider controller prepares a machine for deletion by cordoning, draining the machine, and so on.

    • completed

      LCM Controller starts removing the machine resources since the preparation for deletion is complete.

    • aborting

      Cloud provider controller attempts to uncordon the node. If the attempt fails, the status changes to failed.

    • failed

      Error in the deletion workflow.

    For the workflow description of a graceful deletion, see MOSK documentation: Overview of machine deletion policies.

Configuration example:

status:
  providerStatus:
    apiVersion: baremetal.k8s.io/v1alpha1
    kind: BareMetalMachineProviderStatus
    hardware:
      cpu: 11
      ram: 16
      storage:
      - name: /dev/vda
        size: 61
      - name: /dev/vdb
        size: 32
      - name: /dev/vdc
        size: 32
    reboot:
      required: true
      reason: |
        linux-image-5.13.0-51-generic
        linux-base
    status: Ready
    upgradeIndex: 1
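
For a machine that is being gracefully deleted, the status additionally reports the delete and prepareDeletionPhase fields described above. A minimal sketch with illustrative values:

status:
  providerStatus:
    apiVersion: baremetal.k8s.io/v1alpha1
    kind: BareMetalMachineProviderStatus
    delete: true
    prepareDeletionPhase: started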
MetalLBConfig

TechPreview since 2.21.0 and 2.21.1 for MOSK 22.5. GA since 2.24.0 for management and regional clusters. GA since 2.25.0 for managed clusters.

This section describes the MetalLBConfig custom resource used in the Container Cloud API that contains the MetalLB configuration objects for a particular cluster.

For demonstration purposes, the Container Cloud MetalLBConfig custom resource description is split into the following major sections:

The Container Cloud API also uses the third-party open source MetalLB API. For details, see MetalLB objects.

MetalLBConfig metadata

The Container Cloud MetalLBConfig CR contains the following fields:

  • apiVersion

    API version of the object that is kaas.mirantis.com/v1alpha1.

  • kind

    Object type that is MetalLBConfig.

The metadata object field of the MetalLBConfig resource contains the following fields:

  • name

    Name of the MetalLBConfig object.

  • namespace

    Project in which the object was created. Must match the project name of the target cluster.

  • labels

    Key-value pairs attached to the object. Mandatory labels:

    • kaas.mirantis.com/provider

      Provider type that is baremetal.

    • kaas.mirantis.com/region

      Region name that matches the region name of the target cluster.

      Note

      The kaas.mirantis.com/region label is removed from all Container Cloud objects in 2.26.0 (Cluster releases 17.1.0 and 16.1.0). Therefore, do not add the label starting with these releases. On existing clusters updated to these releases, or if added manually, this label is ignored by Container Cloud.

    • cluster.sigs.k8s.io/cluster-name

      Name of the cluster that the MetalLB configuration must apply to.

    Warning

    Labels and annotations that are not documented in this API Reference are generated automatically by Container Cloud. Do not modify them using the Container Cloud API.

Configuration example:

apiVersion: kaas.mirantis.com/v1alpha1
kind: MetalLBConfig
metadata:
  name: metallb-demo
  namespace: test-ns
  labels:
    kaas.mirantis.com/provider: baremetal
    cluster.sigs.k8s.io/cluster-name: test-cluster
MetalLBConfig spec

The spec field of the MetalLBConfig object represents the MetalLBConfigSpec subresource that contains the description of MetalLB configuration objects. These objects are created in the target cluster during its deployment.

The spec field contains the following optional fields:

  • addressPools

    Removed in Container Cloud 2.27.0 (Cluster releases 17.2.0 and 16.2.0), deprecated in 2.26.0 (Cluster releases 17.1.0 and 16.1.0).

    List of MetalLBAddressPool objects to create MetalLB AddressPool objects.

  • bfdProfiles

    List of MetalLBBFDProfile objects to create MetalLB BFDProfile objects.

  • bgpAdvertisements

    List of MetalLBBGPAdvertisement objects to create MetalLB BGPAdvertisement objects.

  • bgpPeers

    List of MetalLBBGPPeer objects to create MetalLB BGPPeer objects.

  • communities

    List of MetalLBCommunity objects to create MetalLB Community objects.

  • ipAddressPools

    List of MetalLBIPAddressPool objects to create MetalLB IPAddressPool objects.

  • l2Advertisements

    List of MetalLBL2Advertisement objects to create MetalLB L2Advertisement objects.

    The l2Advertisements object allows defining interfaces to optimize the announcement. When you use the interfaces selector, LB addresses are announced only on selected host interfaces.

    Mirantis recommends using the interfaces selector if nodes use separate host networks for different types of traffic. The pros of such configuration are as follows: less spam on other interfaces and networks and limited chances to reach IP addresses of load-balanced services from irrelevant interfaces and networks.

    Caution

    Interface names in the interfaces list must match those on the corresponding nodes.

  • templateName

    Unsupported since 2.28.0 (17.3.0 and 16.3.0). Available since 2.24.0 (14.0.0). For details, see MOSK Deprecation Notes: MetalLBConfigTemplate resource management.

    Name of the MetalLBConfigTemplate object used as a source of MetalLB configuration objects. Mutually exclusive with the fields listed below that will be part of the MetalLBConfigTemplate object. For details, see MetalLBConfigTemplate.

    Before Cluster releases 17.2.0 and 16.2.0, MetalLBConfigTemplate was the default configuration method for MetalLB on bare metal deployments. Since Cluster releases 17.2.0 and 16.2.0, use the MetalLBConfig object instead.

    Caution

    For MKE clusters that are part of MOSK infrastructure, the feature support will become available in one of the following Container Cloud releases.

    Caution

    For managed clusters, this field is available as Technology Preview since Container Cloud 2.24.0, is generally available since 2.25.0, and is deprecated since 2.27.0.


The objects listed in the spec field of the MetalLBConfig object, such as MetalLBIPAddressPool, MetalLBL2Advertisement, and so on, are used as templates for the MetalLB objects that will be created in the target cluster. Each of these objects has the following structure:

  • labels

    Optional. Key-value pairs attached to the metallb.io/<objectName> object as metadata.labels.

  • name

    Name of the metallb.io/<objectName> object.

  • spec

    Contents of the spec section of the metallb.io/<objectName> object. The spec field has the metallb.io/<objectName>Spec type. For details, see MetalLB objects.

For example, MetalLBIPAddressPool is a template for the metallb.io/IPAddressPool object and has the following structure:

  • labels

    Optional. Key-value pairs attached to the metallb.io/IPAddressPool object as metadata.labels.

  • name

    Name of the metallb.io/IPAddressPool object.

  • spec

    Contents of spec section of the metallb.io/IPAddressPool object. The spec has the metallb.io/IPAddressPoolSpec type.
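
A minimal sketch of such a template entry in the MetalLBConfig spec; the pool name, label, and address range are illustrative:

ipAddressPools:
  - name: services
    labels:
      purpose: lb-services    # optional, copied to metadata.labels of the created IPAddressPool
    spec:
      addresses:
        - 10.100.91.151-10.100.91.170
      autoAssign: true
      avoidBuggyIPs: false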

MetalLB objects

Container Cloud supports the following MetalLB object types of the metallb.io API group:

  • IPAddressPool

  • Community

  • L2Advertisement

  • BFDProfile

  • BGPAdvertisement

  • BGPPeer

As of v1beta1 and v1beta2 API versions, metadata of MetalLB objects has a standard format with no specific fields or labels defined for any particular object:

  • apiVersion

    API version of the object that can be metallb.io/v1beta1 or metallb.io/v1beta2.

  • kind

    Object type that is one of the metallb.io types listed above. For example, IPAddressPool.

  • metadata

    Object metadata that contains the following subfields:

    • name

      Name of the object.

    • namespace

      Namespace where the MetalLB components are located. It matches metallb-system in Container Cloud.

    • labels

      Optional. Key-value pairs that are attached to the object. It can be an arbitrary set of labels. No special labels are defined as of v1beta1 and v1beta2 API versions.

The MetalLBConfig object contains spec sections of the metallb.io/<objectName> objects that have the metallb.io/<objectName>Spec type. For the metallb.io/<objectName> and metallb.io/<objectName>Spec type definitions, refer to the official MetalLB documentation.

Note

Before Container Cloud 2.27.0 (Cluster releases 17.2.0 and 16.2.0), metallb.io/<objectName> objects v0.13.9 are supported.

The l2Advertisements object allows defining interfaces to optimize the announcement. When you use the interfaces selector, LB addresses are announced only on selected host interfaces. Mirantis recommends this configuration if nodes use separate host networks for different types of traffic. The pros of such configuration are as follows: less spam on other interfaces and networks, limited chances to reach services LB addresses from irrelevant interfaces and networks.

Configuration example:

l2Advertisements: |
  - name: management-lcm
    spec:
      ipAddressPools:
        - default
      interfaces:
        # LB addresses from the "default" address pool will be announced
        # on the "k8s-lcm" interface
        - k8s-lcm

Caution

Interface names in the interfaces list must match those on the corresponding nodes.

MetalLBConfig status

Available since 2.24.0 for management clusters

Caution

For managed clusters, this field is available as Technology Preview and is generally available since Container Cloud 2.25.0.

Caution

For MKE clusters that are part of MOSK infrastructure, the feature support will become available in one of the following Container Cloud releases.

The status field describes the actual state of the object. It contains the following fields:

  • bootstrapMode Only in 2.24.0

    Field that appears only during a management cluster bootstrap with the value true and is used internally for bootstrap. Once the deployment completes, the value changes to false and the field is excluded from the status output.

  • objects

    Description of MetalLB objects that is used to create MetalLB native objects in the target cluster.

    The format of underlying objects is the same as for those in the spec field, except templateName, which is obsolete since Container Cloud 2.28.0 (Cluster releases 17.3.0 and 16.3.0) and which is not present in this field. The objects contents are rendered from the following locations, with possible modifications for the bootstrap cluster:

    • Since Container Cloud 2.28.0 (Cluster releases 17.3.0 and 16.3.0), MetalLBConfig.spec

    • Before Container Cloud 2.28.0 (Cluster releases 17.2.0, 16.2.0, or earlier):

      • MetalLBConfigTemplate.status of the corresponding template if MetalLBConfig.spec.templateName is defined

      • MetalLBConfig.spec if MetalLBConfig.spec.templateName is not defined

  • propagateResult

    Result of objects propagation. During objects propagation, native MetalLB objects of the target cluster are created and updated according to the description of the objects present in the status.objects field.

    This field contains the following information:

    • message

      Text message that describes the result of the last attempt of objects propagation. Contains an error message if the last attempt was unsuccessful.

    • success

      Result of the last attempt of objects propagation. Boolean.

    • time

      Timestamp of the last attempt of objects propagation. For example, 2023-07-04T00:30:36Z.

    If the objects propagation was successful, the MetalLB objects of the target cluster match the ones present in the status.objects field.

  • updateResult

    Status of the MetalLB objects update. Has the same subfield format as propagateResult described above.

    During objects update, the status.objects contents are rendered as described in the objects field definition above.

    If the objects update was successful, the MetalLB objects description present in status.objects is rendered successfully and up to date. This description is used to update MetalLB objects in the target cluster. If the objects update was not successful, MetalLB objects will not be propagated to the target cluster.

MetalLB configuration examples

Example of configuration template for using L2 announcements:

apiVersion: kaas.mirantis.com/v1alpha1
kind: MetalLBConfig
metadata:
  labels:
    cluster.sigs.k8s.io/cluster-name: managed-cluster
    kaas.mirantis.com/provider: baremetal
  name: managed-l2
  namespace: managed-ns
spec:
  ipAddressPools:
    - name: services
      spec:
        addresses:
          - 10.100.91.151-10.100.91.170
        autoAssign: true
        avoidBuggyIPs: false
  l2Advertisements:
    - name: services
      spec:
        ipAddressPools:
        - services

Example of configuration extract for using the interfaces selector, which enables announcement of LB addresses only on selected host interfaces:

l2Advertisements:
  - name: services
    spec:
      ipAddressPools:
      - default
      interfaces:
      - k8s-lcm

Caution

Interface names in the interfaces list must match the ones on the corresponding nodes.

After the object is created and processed by the MetalLB Controller, the status field is added. For example:

status:
  objects:
    ipAddressPools:
    - name: services
      spec:
        addresses:
        - 10.100.100.151-10.100.100.170
        autoAssign: true
        avoidBuggyIPs: false
    l2Advertisements:
      - name: services
        spec:
          ipAddressPools:
          - services
  propagateResult:
    message: Objects were successfully updated
    success: true
    time: "2023-07-04T14:31:40Z"
  updateResult:
    message: Objects were successfully read from MetalLB configuration specification
    success: true
    time: "2023-07-04T14:31:39Z"

Example of native MetalLB objects to be created in the managed-ns/managed-cluster cluster during deployment:

apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: services
  namespace: metallb-system
spec:
  addresses:
  - 10.100.91.151-10.100.91.170
  autoAssign: true
  avoidBuggyIPs: false
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: services
  namespace: metallb-system
spec:
  ipAddressPools:
  - services

Example of configuration template for using BGP announcements:

apiVersion: kaas.mirantis.com/v1alpha1
kind: MetalLBConfig
metadata:
  labels:
    cluster.sigs.k8s.io/cluster-name: managed-cluster
    kaas.mirantis.com/provider: baremetal
  name: managed-bgp
  namespace: managed-ns
spec:
  bgpPeers:
    - name: bgp-peer-rack1
      spec:
        peerAddress: 10.0.41.1
        peerASN: 65013
        myASN: 65099
        nodeSelectors:
          - matchLabels:
              rack-id: rack1
    - name: bgp-peer-rack2
      spec:
        peerAddress: 10.0.42.1
        peerASN: 65023
        myASN: 65099
        nodeSelectors:
          - matchLabels:
              rack-id: rack2
    - name: bgp-peer-rack3
      spec:
        peerAddress: 10.0.43.1
        peerASN: 65033
        myASN: 65099
        nodeSelectors:
          - matchLabels:
              rack-id: rack3
  ipAddressPools:
    - name: services
      spec:
        addresses:
          - 10.100.191.151-10.100.191.170
        autoAssign: true
        avoidBuggyIPs: false
  bgpAdvertisements:
    - name: services
      spec:
        ipAddressPools:
        - services
MetalLBConfigTemplate

Unsupported since 2.28.0 (17.3.0 and 16.3.0)

Warning

The MetalLBConfigTemplate object may not work as expected due to its deprecation. For details, see MOSK Deprecation Notes: MetalLBConfigTemplate resource management.

Support status of MetalLBConfigTemplate

Container Cloud release    Cluster release           Support status
2.29.0                     17.4.0 and 16.4.0         Admission Controller blocks creation of the object
2.28.0                     17.3.0 and 16.3.0         Unsupported for any cluster type
2.27.0                     17.2.0 and 16.2.0         Deprecated for any cluster type
2.25.0                     17.0.0 and 16.0.0         Generally available for managed clusters
2.24.2                     15.0.1, 14.0.1, 14.0.0    Technology Preview for managed clusters
2.24.0                     14.0.0                    Generally available for management clusters

This section describes the MetalLBConfigTemplate custom resource used in the Container Cloud API that contains the template for MetalLB configuration for a particular cluster.

Note

The MetalLBConfigTemplate object applies to bare metal deployments only.

Before Cluster releases 17.2.0 and 16.2.0, MetalLBConfigTemplate was the default configuration method for MetalLB on bare metal deployments. This method allowed the use of Subnet objects to define MetalLB IP address pools in the same way as they were used before the introduction of the MetalLBConfig and MetalLBConfigTemplate objects. Since Cluster releases 17.2.0 and 16.2.0, use the MetalLBConfig object for this purpose instead.

For demonstration purposes, the Container Cloud MetalLBConfigTemplate custom resource description is split into the following major sections:

MetalLBConfigTemplate metadata

The Container Cloud MetalLBConfigTemplate CR contains the following fields:

  • apiVersion

    API version of the object that is ipam.mirantis.com/v1alpha1.

  • kind

    Object type that is MetalLBConfigTemplate.

The metadata object field of the MetalLBConfigTemplate resource contains the following fields:

  • name

    Name of the MetalLBConfigTemplate object.

  • namespace

    Project in which the object was created. Must match the project name of the target cluster.

  • labels

    Key-value pairs attached to the object. Mandatory labels:

    • kaas.mirantis.com/provider

      Provider type that is baremetal.

    • kaas.mirantis.com/region

      Region name that matches the region name of the target cluster.

      Note

      The kaas.mirantis.com/region label is removed from all Container Cloud objects in 2.26.0 (Cluster releases 17.1.0 and 16.1.0). Therefore, do not add the label starting with these releases. On existing clusters updated to these releases, or if added manually, this label is ignored by Container Cloud.

    • cluster.sigs.k8s.io/cluster-name

      Name of the cluster that the MetalLB configuration applies to.

    Warning

    Labels and annotations that are not documented in this API Reference are generated automatically by Container Cloud. Do not modify them using the Container Cloud API.

Configuration example:

apiVersion: ipam.mirantis.com/v1alpha1
kind: MetalLBConfigTemplate
metadata:
  name: metallb-demo
  namespace: test-ns
  labels:
    kaas.mirantis.com/provider: baremetal
    cluster.sigs.k8s.io/cluster-name: test-cluster
MetalLBConfigTemplate spec

The spec field of the MetalLBConfigTemplate object contains the templates of MetalLB configuration objects and optional auxiliary variables. Container Cloud uses these templates to create MetalLB configuration objects during the cluster deployment.

The spec field contains the following optional fields:

  • machines

    Key-value dictionary to select IpamHost objects corresponding to nodes of the target cluster. Keys contain machine aliases used in spec.templates. Values contain the NameLabelsSelector items that select IpamHost by name or by labels. For example:

    machines:
      control1:
        name: mosk-control-uefi-0
      worker1:
        labels:
          uid: kaas-node-4003a5f6-2667-40e3-aa64-ebe713a8a7ba
    

    This field is required if some IP addresses of nodes are used in spec.templates.

  • vars

    Key-value dictionary of arbitrary user-defined variables that are used in spec.templates. For example:

    vars:
      localPort: 4561
    
  • templates

    List of templates for MetalLB configuration objects that are used to render MetalLB configuration definitions and create MetalLB objects in the target cluster. Contains the following optional fields:

    • bfdProfiles

      Template for the MetalLBBFDProfile object list to create MetalLB BFDProfile objects.

    • bgpAdvertisements

      Template for the MetalLBBGPAdvertisement object list to create MetalLB BGPAdvertisement objects.

    • bgpPeers

      Template for the MetalLBBGPPeer object list to create MetalLB BGPPeer objects.

    • communities

      Template for the MetalLBCommunity object list to create MetalLB Community objects.

    • ipAddressPools

      Template for the MetalLBIPAddressPool object list to create MetalLB IPAddressPool objects.

    • l2Advertisements

      Template for the MetalLBL2Advertisement object list to create MetalLB L2Advertisement objects.

    Each template is a string and has the same structure as the list of the corresponding objects described in MetalLBConfig spec such as MetalLBIPAddressPool and MetalLBL2Advertisement, but you can use additional functions and variables inside these templates.

    Note

    When using the MetalLBConfigTemplate object, you can define MetalLB IP address pools using both Subnet objects and spec.ipAddressPools templates. IP address pools rendered from these sources will be concatenated and then written to status.renderedObjects.ipAddressPools.

    You can use the following functions in templates:

    • ipAddressPoolNames

      Selects all IP address pools of the given announcement type found for the target cluster. Possible types: layer2, bgp, any.

      The any type includes all IP address pools found for the target cluster. The announcement types of IP address pools are verified using the metallb/address-pool-protocol labels of the corresponding Subnet object.

      The ipAddressPools templates have no types as native MetalLB IPAddressPool objects have no announcement type.

      The l2Advertisements template can refer to IP address pools of the layer2 or any type.

      The bgpAdvertisements template can refer to IP address pools of the bgp or any type.

      IP address pools are searched in the templates.ipAddressPools field and in the Subnet objects of the target cluster. For example:

      l2Advertisements: |
        - name: l2services
          spec:
            ipAddressPools: {{ipAddressPoolNames "layer2"}}
      
      bgpAdvertisements: |
        - name: l3services
          spec:
            ipAddressPools: {{ipAddressPoolNames "bgp"}}
      
      l2Advertisements: |
        - name: any
          spec:
            ipAddressPools: {{ipAddressPoolNames "any"}}
      
      bgpAdvertisements: |
        - name: any
          spec:
            ipAddressPools: {{ipAddressPoolNames "any"}}
      

    The l2Advertisements object allows defining interfaces to optimize the announcement. When you use the interfaces selector, LB addresses are announced only on selected host interfaces. Mirantis recommends this configuration if nodes use separate host networks for different types of traffic. The pros of such configuration are as follows: less spam on other interfaces and networks, limited chances to reach services LB addresses from irrelevant interfaces and networks.

    Configuration example:

    l2Advertisements: |
      - name: management-lcm
        spec:
          ipAddressPools:
            - default
          interfaces:
            # LB addresses from the "default" address pool will be announced
            # on the "k8s-lcm" interface
            - k8s-lcm
    

    Caution

    Interface names in the interfaces list must match those on the corresponding nodes.

MetalLBConfigTemplate status

The status field describes the actual state of the object. It contains the following fields:

  • renderedObjects

    MetalLB objects description rendered from spec.templates in the same format as they are defined in the MetalLBConfig spec field.

    All underlying objects are optional. The following objects can be present: bfdProfiles, bgpAdvertisements, bgpPeers, communities, ipAddressPools, l2Advertisements.

  • state Since 2.23.0

    Message that reflects the current status of the resource. The list of possible values includes the following:

    • OK - object is operational.

    • ERR - object is non-operational. This status has a detailed description in the messages list.

    • TERM - object was deleted and is terminating.

  • messages Since 2.23.0

    List of error or warning messages if the object state is ERR.

  • objCreated

    Date, time, and IPAM version of the resource creation.

  • objStatusUpdated

    Date, time, and IPAM version of the last update of the status field in the resource.

  • objUpdated

    Date, time, and IPAM version of the last resource update.

MetalLB configuration examples

The following examples contain configuration templates that include MetalLBConfigTemplate.

Configuration example for using L2 (ARP) announcement
Configuration example for MetalLBConfig
apiVersion: kaas.mirantis.com/v1alpha1
kind: MetalLBConfig
metadata:
  labels:
    cluster.sigs.k8s.io/cluster-name: kaas-mgmt
    kaas.mirantis.com/provider: baremetal
  name: mgmt-l2
  namespace: default
spec:
  templateName: mgmt-metallb-template
Configuration example for MetalLBConfigTemplate
apiVersion: ipam.mirantis.com/v1alpha1
kind: MetalLBConfigTemplate
metadata:
  labels:
    cluster.sigs.k8s.io/cluster-name: kaas-mgmt
    kaas.mirantis.com/provider: baremetal
  name: mgmt-metallb-template
  namespace: default
spec:
  templates:
    l2Advertisements: |
      - name: management-lcm
        spec:
          ipAddressPools:
            - default
          interfaces:
            # IPs from the "default" address pool will be announced on the "k8s-lcm" interface
            - k8s-lcm
      - name: provision-pxe
        spec:
          ipAddressPools:
            - services-pxe
          interfaces:
            # IPs from the "services-pxe" address pool will be announced on the "k8s-pxe" interface
            - k8s-pxe
Configuration example for Subnet of the default pool
apiVersion: ipam.mirantis.com/v1alpha1
kind: Subnet
metadata:
  labels:
    cluster.sigs.k8s.io/cluster-name: kaas-mgmt
    ipam/SVC-MetalLB: ""
    kaas.mirantis.com/provider: baremetal
    metallb/address-pool-auto-assign: "true"
    metallb/address-pool-name: default
    metallb/address-pool-protocol: layer2
  name: master-lb-default
  namespace: default
spec:
  cidr: 10.0.34.0/24
  includeRanges:
  - 10.0.34.101-10.0.34.120
Configuration example for Subnet of the services-pxe pool
apiVersion: ipam.mirantis.com/v1alpha1
kind: Subnet
metadata:
  labels:
    cluster.sigs.k8s.io/cluster-name: kaas-mgmt
    ipam/SVC-MetalLB: ""
    kaas.mirantis.com/provider: baremetal
    metallb/address-pool-auto-assign: "false"
    metallb/address-pool-name: services-pxe
    metallb/address-pool-protocol: layer2
  name: master-lb-pxe
  namespace: default
spec:
  cidr: 10.0.24.0/24
  includeRanges:
  - 10.0.24.221-10.0.24.230

After the objects are created and processed by the kaas-ipam Controller, the status field displays for MetalLBConfigTemplate:

Configuration example of the status field for MetalLBConfigTemplate
status:
  checksums:
    annotations: sha256:38e0b9de817f645c4bec37c0d4a3e58baecccb040f5718dc069a72c7385a0bed
    labels: sha256:380337902278e8985e816978c349910a4f7ed98169c361eb8777411ac427e6ba
    spec: sha256:0860790fc94217598e0775ab2961a02acc4fba820ae17c737b94bb5d55390dbe
  messages:
  - Template for BFDProfiles is undefined
  - Template for BGPAdvertisements is undefined
  - Template for BGPPeers is undefined
  - Template for Communities is undefined
  objCreated: 2023-06-30T21:22:56.00000Z  by  v6.5.999-20230627-072014-ba8d918
  objStatusUpdated: 2023-07-04T00:30:35.82023Z  by  v6.5.999-20230627-072014-ba8d918
  objUpdated: 2023-06-30T22:10:51.73822Z  by  v6.5.999-20230627-072014-ba8d918
  renderedObjects:
    ipAddressPools:
    - name: default
      spec:
        addresses:
        - 10.0.34.101-10.0.34.120
        autoAssign: true
    - name: services-pxe
      spec:
        addresses:
        - 10.0.24.221-10.0.24.230
        autoAssign: false
    l2Advertisements:
    - name: management-lcm
      spec:
        interfaces:
        - k8s-lcm
        ipAddressPools:
        - default
    - name: provision-pxe
      spec:
        interfaces:
        - k8s-pxe
        ipAddressPools:
        - services-pxe
  state: OK

The following example illustrates contents of the status field that displays for MetalLBConfig after the objects are processed by the MetalLB Controller.

Configuration example of the status field for MetalLBConfig
status:
  objects:
    ipAddressPools:
    - name: default
      spec:
        addresses:
        - 10.0.34.101-10.0.34.120
        autoAssign: true
        avoidBuggyIPs: false
    - name: services-pxe
      spec:
        addresses:
        - 10.0.24.221-10.0.24.230
        autoAssign: false
        avoidBuggyIPs: false
    l2Advertisements:
    - name: management-lcm
      spec:
        interfaces:
        - k8s-lcm
        ipAddressPools:
        - default
    - name: provision-pxe
      spec:
        interfaces:
        - k8s-pxe
        ipAddressPools:
        - services-pxe
  propagateResult:
    message: Objects were successfully updated
    success: true
    time: "2023-07-05T03:10:23Z"
  updateResult:
    message: Objects were successfully read from MetalLB configuration specification
    success: true
    time: "2023-07-05T03:10:23Z"

Using the objects described above, several native MetalLB objects are created in the kaas-mgmt cluster during deployment.

Configuration example of MetalLB objects created during cluster deployment
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: management-lcm
  namespace: metallb-system
spec:
  interfaces:
  - k8s-lcm
  ipAddressPools:
  - default

apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: provision-pxe
  namespace: metallb-system
spec:
  interfaces:
  - k8s-pxe
  ipAddressPools:
  - services-pxe

apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: default
  namespace: metallb-system
spec:
  addresses:
  - 10.0.34.101-10.0.34.120
  autoAssign: true
  avoidBuggyIPs: false

apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: services-pxe
  namespace: metallb-system
spec:
  addresses:
  - 10.0.24.221-10.0.24.230
  autoAssign: false
  avoidBuggyIPs: false
Configuration example for using BGP announcement

In the following configuration example, MetalLB is configured to use BGP for announcement of external addresses of Kubernetes load-balanced services for the managed cluster from master nodes. Each master node is located in its own rack without the L2 layer extension between racks.

This section contains only examples of the objects required to illustrate the MetalLB configuration. For Rack, MultiRackCluster, L2Template and other objects required to configure BGP announcement of the cluster API load balancer address for this scenario, refer to Multiple rack configuration example.

Configuration example for MetalLBConfig
apiVersion: kaas.mirantis.com/v1alpha1
kind: MetalLBConfig
metadata:
  labels:
    cluster.sigs.k8s.io/cluster-name: test-cluster
    kaas.mirantis.com/provider: baremetal
  name: test-cluster-metallb-bgp
  namespace: managed-ns
spec:
  templateName: test-cluster-metallb-bgp-template
Configuration example for MetalLBConfigTemplate
apiVersion: ipam.mirantis.com/v1alpha1
kind: MetalLBConfigTemplate
metadata:
  labels:
    cluster.sigs.k8s.io/cluster-name: test-cluster
    kaas.mirantis.com/provider: baremetal
  name: test-cluster-metallb-bgp-template
  namespace: managed-ns
spec:
  templates:
    bgpAdvertisements: |
      - name: services
        spec:
          ipAddressPools:
            - services
          peers:            # "peers" can be omitted if all defined peers
          - svc-peer-rack1  # are used in a particular "bgpAdvertisement"
          - svc-peer-rack2
          - svc-peer-rack3
    bgpPeers: |
      - name: svc-peer-rack1
        spec:
          peerAddress: 10.77.41.1  # peer address is in the external subnet #1
          peerASN: 65100
          myASN: 65101
          nodeSelectors:
            - matchLabels:
                rack-id: rack-master-1  # references the node corresponding
                                        # to the "test-cluster-master-1" Machine
      - name: svc-peer-rack2
        spec:
          peerAddress: 10.77.42.1  # peer address is in the external subnet #2
          peerASN: 65100
          myASN: 65101
          nodeSelectors:
            - matchLabels:
                rack-id: rack-master-2  # references the node corresponding
                                        # to the "test-cluster-master-2" Machine
      - name: svc-peer-rack3
        spec:
          peerAddress: 10.77.43.1  # peer address is in the external subnet #3
          peerASN: 65100
          myASN: 65101
          nodeSelectors:
            - matchLabels:
                rack-id: rack-master-3  # references the node corresponding
                                        # to the "test-cluster-master-3" Machine
Configuration example for Subnet
apiVersion: ipam.mirantis.com/v1alpha1
kind: Subnet
metadata:
  labels:
    cluster.sigs.k8s.io/cluster-name: test-cluster
    ipam/SVC-MetalLB: ""
    kaas.mirantis.com/provider: baremetal
    metallb/address-pool-auto-assign: "true"
    metallb/address-pool-name: services
    metallb/address-pool-protocol: bgp
  name: test-cluster-lb
  namespace: managed-ns
spec:
  cidr: 134.33.24.0/24
  includeRanges:
    - 134.33.24.221-134.33.24.240

The following objects illustrate configuration for three subnets that are used to configure external network in three racks. Each master node uses its own external L2/L3 network segment.

Configuration example for the Subnet ext-rack-control-1
apiVersion: ipam.mirantis.com/v1alpha1
kind: Subnet
metadata:
  labels:
    cluster.sigs.k8s.io/cluster-name: test-cluster
    kaas.mirantis.com/provider: baremetal
  name: ext-rack-control-1
  namespace: managed-ns
spec:
  cidr: 10.77.41.0/28
  gateway: 10.77.41.1
  includeRanges:
    - 10.77.41.3-10.77.41.13
  nameservers:
    - 1.2.3.4
Configuration example for the Subnet ext-rack-control-2
apiVersion: ipam.mirantis.com/v1alpha1
kind: Subnet
metadata:
  labels:
    cluster.sigs.k8s.io/cluster-name: test-cluster
    kaas.mirantis.com/provider: baremetal
  name: ext-rack-control-2
  namespace: managed-ns
spec:
  cidr: 10.77.42.0/28
  gateway: 10.77.42.1
  includeRanges:
    - 10.77.42.3-10.77.42.13
  nameservers:
    - 1.2.3.4
Configuration example for the Subnet ext-rack-control-3
apiVersion: ipam.mirantis.com/v1alpha1
kind: Subnet
metadata:
  labels:
    cluster.sigs.k8s.io/cluster-name: test-cluster
    kaas.mirantis.com/provider: baremetal
  name: ext-rack-control-3
  namespace: managed-ns
spec:
  cidr: 10.77.43.0/28
  gateway: 10.77.43.1
  includeRanges:
    - 10.77.43.3-10.77.43.13
  nameservers:
    - 1.2.3.4

Rack objects and ipam/RackRef labels in Machine objects are not required for the MetalLB configuration. In this example, however, Rack objects are implied to be used for configuring BGP announcement of the cluster API load balancer address; they are not shown in this example.

Machine objects select different L2 templates because each master node uses different L2/L3 network segments for LCM, external, and other networks.

Configuration example for the Machine test-cluster-master-1
apiVersion: cluster.k8s.io/v1alpha1
kind: Machine
metadata:
  name: test-cluster-master-1
  namespace: managed-ns
  annotations:
    metal3.io/BareMetalHost: managed-ns/test-cluster-master-1
  labels:
    cluster.sigs.k8s.io/cluster-name: test-cluster
    cluster.sigs.k8s.io/control-plane: controlplane
    hostlabel.bm.kaas.mirantis.com/controlplane: controlplane
    ipam/RackRef: rack-master-1
    kaas.mirantis.com/provider: baremetal
spec:
  providerSpec:
    value:
      kind: BareMetalMachineProviderSpec
      apiVersion: baremetal.k8s.io/v1alpha1
      hostSelector:
        matchLabels:
          kaas.mirantis.com/baremetalhost-id: test-cluster-master-1
      l2TemplateSelector:
        name: test-cluster-master-1
      nodeLabels:
      - key: rack-id          # it is used in "nodeSelectors"
        value: rack-master-1  # of "bgpPeer" MetalLB objects
Configuration example for the Machine test-cluster-master-2
apiVersion: cluster.k8s.io/v1alpha1
kind: Machine
metadata:
  name: test-cluster-master-2
  namespace: managed-ns
  annotations:
    metal3.io/BareMetalHost: managed-ns/test-cluster-master-2
  labels:
    cluster.sigs.k8s.io/cluster-name: test-cluster
    cluster.sigs.k8s.io/control-plane: controlplane
    hostlabel.bm.kaas.mirantis.com/controlplane: controlplane
    ipam/RackRef: rack-master-2
    kaas.mirantis.com/provider: baremetal
spec:
  providerSpec:
    value:
      kind: BareMetalMachineProviderSpec
      apiVersion: baremetal.k8s.io/v1alpha1
      hostSelector:
        matchLabels:
          kaas.mirantis.com/baremetalhost-id: test-cluster-master-2
      l2TemplateSelector:
        name: test-cluster-master-2
      nodeLabels:
      - key: rack-id          # it is used in "nodeSelectors"
        value: rack-master-2  # of "bgpPeer" MetalLB objects
Configuration example for the Machine test-cluster-master-3
apiVersion: cluster.k8s.io/v1alpha1
kind: Machine
metadata:
  name: test-cluster-master-3
  namespace: managed-ns
  annotations:
    metal3.io/BareMetalHost: managed-ns/test-cluster-master-3
  labels:
    cluster.sigs.k8s.io/cluster-name: test-cluster
    cluster.sigs.k8s.io/control-plane: controlplane
    hostlabel.bm.kaas.mirantis.com/controlplane: controlplane
    ipam/RackRef: rack-master-3
    kaas.mirantis.com/provider: baremetal
spec:
  providerSpec:
    value:
      kind: BareMetalMachineProviderSpec
      apiVersion: baremetal.k8s.io/v1alpha1
      hostSelector:
        matchLabels:
          kaas.mirantis.com/baremetalhost-id: test-cluster-master-3
      l2TemplateSelector:
        name: test-cluster-master-3
      nodeLabels:
      - key: rack-id          # it is used in "nodeSelectors"
        value: rack-master-3  # of "bgpPeer" MetalLB objects
MultiRackCluster

TechPreview Available since 2.24.4

This section describes the MultiRackCluster resource used in the Container Cloud API.

When you create a bare metal managed cluster with a multi-rack topology, where Kubernetes masters are distributed across multiple racks without L2 layer extension between them, the MultiRackCluster resource allows you to set cluster-wide parameters for configuration of the BGP announcement of the cluster API load balancer address. In this scenario, the MultiRackCluster object must be bound to the Cluster object.

The MultiRackCluster object is generally used for a particular cluster in conjunction with Rack objects described in Rack.

For demonstration purposes, the Container Cloud MultiRackCluster custom resource (CR) description is split into the following major sections:

MultiRackCluster metadata

The Container Cloud MultiRackCluster CR metadata contains the following fields:

  • apiVersion

    API version of the object that is ipam.mirantis.com/v1alpha1.

  • kind

    Object type that is MultiRackCluster.

  • metadata

    The metadata field contains the following subfields:

    • name

      Name of the MultiRackCluster object.

    • namespace

      Container Cloud project (Kubernetes namespace) in which the object was created.

    • labels

      Key-value pairs that are attached to the object:

      • cluster.sigs.k8s.io/cluster-name

        Cluster object name that this MultiRackCluster object is applied to. To enable the use of BGP announcement for the cluster API LB address, set the useBGPAnnouncement parameter in the Cluster object to true:

        spec:
          providerSpec:
            value:
              useBGPAnnouncement: true
        
      • kaas.mirantis.com/provider

        Provider name that is baremetal.

      • kaas.mirantis.com/region

        Region name.

        Note

        The kaas.mirantis.com/region label is removed from all Container Cloud objects in 2.26.0 (Cluster releases 17.1.0 and 16.1.0). Therefore, do not add the label starting with these releases. On existing clusters updated to these releases, or if added manually, this label is ignored by Container Cloud.

      Warning

      Labels and annotations that are not documented in this API Reference are generated automatically by Container Cloud. Do not modify them using the Container Cloud API.

The MultiRackCluster metadata configuration example:

apiVersion: ipam.mirantis.com/v1alpha1
kind: MultiRackCluster
metadata:
  name: multirack-test-cluster
  namespace: managed-ns
  labels:
    cluster.sigs.k8s.io/cluster-name: test-cluster
    kaas.mirantis.com/provider: baremetal
MultiRackCluster spec

The spec field of the MultiRackCluster resource describes the desired state of the object. It contains the following fields:

  • bgpdConfigFileName

    Name of the configuration file for the BGP daemon (bird). Recommended value is bird.conf.

  • bgpdConfigFilePath

    Path to the directory where the configuration file for the BGP daemon (bird) is added. The recommended value is /etc/bird.

  • bgpdConfigTemplate

    Optional. Configuration text file template for the BGP daemon (bird) configuration file where you can use go template constructs and the following variables:

    • RouterID, LocalIP

      Local IP on the given network, which is a key in the Rack.spec.peeringMap dictionary, for a given node. You can use it, for example, in the router id {{$.RouterID}}; instruction.

    • LocalASN

      Local AS number.

    • NeighborASN

      Neighbor AS number.

    • NeighborIP

      Neighbor IP address. Its values are taken from Rack.spec.peeringMap, and it can be used only inside the range iteration through the Neighbors list.

    • Neighbors

      List of peers in the given network and node. It can be iterated through the range statement in the go template.

    Values for LocalASN and NeighborASN are taken from:

    • MultiRackCluster.defaultPeer - if not used as a field inside the range iteration through the Neighbors list.

    • Corresponding values of Rack.spec.peeringMap - if used as a field inside the range iteration through the Neighbors list.

    This template can be overridden using the Rack objects. For details, see Rack spec.

  • defaultPeer

    Configuration parameters for the default BGP peer. These parameters will be used in rendering of the configuration file for BGP daemon from the template if they are not overridden for a particular rack or network using Rack objects. For details, see Rack spec.

    • localASN

      Mandatory. Local AS number.

    • neighborASN

      Mandatory. Neighbor AS number.

    • neighborIP

      Reserved. Neighbor IP address. Leave it as an empty string.

    • password

      Optional. Neighbor password. If not set, you can hardcode it in bgpdConfigTemplate. It is required for MD5 authentication between BGP peers.
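
      A minimal sketch of a defaultPeer section that enables MD5 authentication; the password value is illustrative:

      defaultPeer:
        localASN: 65101
        neighborASN: 65100
        neighborIP: ""
        password: "bgp-md5-secret"  # required for MD5 authentication between BGP peers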

Configuration examples:

Since Cluster releases 17.1.0 and 16.1.0 for bird v2.x
spec:
  bgpdConfigFileName: bird.conf
  bgpdConfigFilePath: /etc/bird
  bgpdConfigTemplate: |
    protocol device {
    }
    #
    protocol direct {
      interface "lo";
      ipv4;
    }
    #
    protocol kernel {
      ipv4 {
        export all;
      };
    }
    #
    {{range $i, $peer := .Neighbors}}
    protocol bgp 'bgp_peer_{{$i}}' {
      local port 1179 as {{.LocalASN}};
      neighbor {{.NeighborIP}} as {{.NeighborASN}};
      ipv4 {
        import none;
        export filter {
          if dest = RTD_UNREACHABLE then {
            reject;
          }
          accept;
        };
      };
    }
    {{end}}
  defaultPeer:
    localASN: 65101
    neighborASN: 65100
    neighborIP: ""
Before Cluster releases 17.1.0 and 16.1.0 for bird v1.x
spec:
  bgpdConfigFileName: bird.conf
  bgpdConfigFilePath: /etc/bird
  bgpdConfigTemplate: |
    listen bgp port 1179;
    protocol device {
    }
    #
    protocol direct {
      interface "lo";
    }
    #
    protocol kernel {
      export all;
    }
    #
    {{range $i, $peer := .Neighbors}}
    protocol bgp 'bgp_peer_{{$i}}' {
      local as {{.LocalASN}};
      neighbor {{.NeighborIP}} as {{.NeighborASN}};
      import all;
      export filter {
        if dest = RTD_UNREACHABLE then {
          reject;
        }
        accept;
      };
    }
    {{end}}
  defaultPeer:
    localASN: 65101
    neighborASN: 65100
    neighborIP: ""
MultiRackCluster status

The status field of the MultiRackCluster resource reflects the actual state of the MultiRackCluster object and contains the following fields:

  • state Since 2.23.0

    Message that reflects the current status of the resource. The list of possible values includes the following:

    • OK - object is operational.

    • ERR - object is non-operational. This status has a detailed description in the messages list.

    • TERM - object was deleted and is terminating.

  • messages Since 2.23.0

    List of error or warning messages if the object state is ERR.

  • objCreated

    Date, time, and IPAM version of the resource creation.

  • objStatusUpdated

    Date, time, and IPAM version of the last update of the status field in the resource.

  • objUpdated

    Date, time, and IPAM version of the last resource update.

Configuration example:

status:
  checksums:
    annotations: sha256:38e0b9de817f645c4bec37c0d4a3e58baecccb040f5718dc069a72c7385a0bed
    labels: sha256:d8f8eacf487d57c22ca0ace29bd156c66941a373b5e707d671dc151959a64ce7
    spec: sha256:66b5d28215bdd36723fe6230359977fbede828906c6ae96b5129a972f1fa51e9
  objCreated: 2023-08-11T12:25:21.00000Z  by  v6.5.999-20230810-155553-2497818
  objStatusUpdated: 2023-08-11T12:32:58.11966Z  by  v6.5.999-20230810-155553-2497818
  objUpdated: 2023-08-11T12:32:57.32036Z  by  v6.5.999-20230810-155553-2497818
  state: OK
MultiRackCluster and Rack usage examples

The following configuration examples of several bare metal objects illustrate how to configure BGP announcement of the load balancer address used to expose the cluster API.

Single rack configuration example

In the following example, all master nodes are in a single rack. One Rack object is required in this case for master nodes. Some worker nodes can coexist in the same rack with master nodes or occupy separate racks. It is implied that the useBGPAnnouncement parameter is set to true in the corresponding Cluster object.
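
For reference, the following fragment shows where this parameter is enabled in the Cluster object. This is a minimal sketch that shows only the relevant part of the object; verify the exact field location against the Cluster resource description for your release.

spec:
  providerSpec:
    value:
      useBGPAnnouncement: true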

Configuration example for MultiRackCluster

Since Cluster releases 17.1.0 and 16.1.0 for bird v2.x:

apiVersion: ipam.mirantis.com/v1alpha1
kind: MultiRackCluster
metadata:
  name: multirack-test-cluster
  namespace: managed-ns
  labels:
    cluster.sigs.k8s.io/cluster-name: test-cluster
    kaas.mirantis.com/provider: baremetal
    kaas.mirantis.com/region: region-one
spec:
  bgpdConfigFileName: bird.conf
  bgpdConfigFilePath: /etc/bird
  bgpdConfigTemplate: |
    protocol device {
    }
    #
    protocol direct {
      interface "lo";
      ipv4;
    }
    #
    protocol kernel {
      ipv4 {
        export all;
      };
    }
    #
    {{range $i, $peer := .Neighbors}}
    protocol bgp 'bgp_peer_{{$i}}' {
      local port 1179 as {{.LocalASN}};
      neighbor {{.NeighborIP}} as {{.NeighborASN}};
      ipv4 {
        import none;
        export filter {
          if dest = RTD_UNREACHABLE then {
            reject;
          }
          accept;
        };
      };
    }
    {{end}}
  defaultPeer:
    localASN: 65101
    neighborASN: 65100
    neighborIP: ""

Before Cluster releases 17.1.0 and 16.1.0 for bird v1.x:

apiVersion: ipam.mirantis.com/v1alpha1
kind: MultiRackCluster
metadata:
  name: multirack-test-cluster
  namespace: managed-ns
  labels:
    cluster.sigs.k8s.io/cluster-name: test-cluster
    kaas.mirantis.com/provider: baremetal
spec:
  bgpdConfigFileName: bird.conf
  bgpdConfigFilePath: /etc/bird
  bgpdConfigTemplate: |
    listen bgp port 1179;
    protocol device {
    }
    #
    protocol direct {
      interface "lo";
    }
    #
    protocol kernel {
      export all;
    }
    #
    {{range $i, $peer := .Neighbors}}
    protocol bgp 'bgp_peer_{{$i}}' {
      local as {{.LocalASN}};
      neighbor {{.NeighborIP}} as {{.NeighborASN}};
      import all;
      export filter {
        if dest = RTD_UNREACHABLE then {
          reject;
        }
        accept;
      };
    }
    {{end}}
  defaultPeer:
    localASN: 65101
    neighborASN: 65100
    neighborIP: ""
Configuration example for Rack
apiVersion: ipam.mirantis.com/v1alpha1
kind: Rack
metadata:
  name: rack-master
  namespace: managed-ns
  labels:
    cluster.sigs.k8s.io/cluster-name: test-cluster
    kaas.mirantis.com/provider: baremetal
spec:
  peeringMap:
    lcm-rack-control:
      peers:
      - neighborIP: 10.77.31.1  # "localASN" and "neighborASN" are taken from
      - neighborIP: 10.77.37.1  # "MultiRackCluster.spec.defaultPeer"
                                # if not set here
Configuration example for Machine
# "Machine" templates for "test-cluster-master-2" and "test-cluster-master-3"
# differ only in BMH selectors in this example.
apiVersion: cluster.k8s.io/v1alpha1
kind: Machine
metadata:
  name: test-cluster-master-1
  namespace: managed-ns
  annotations:
    metal3.io/BareMetalHost: managed-ns/test-cluster-master-1
  labels:
    cluster.sigs.k8s.io/cluster-name: test-cluster
    cluster.sigs.k8s.io/control-plane: controlplane
    hostlabel.bm.kaas.mirantis.com/controlplane: controlplane
    ipam/RackRef: rack-master # used to connect "IpamHost" to "Rack" objects, so that
                              # BGP parameters can be obtained from "Rack" to
                              # render BGP configuration for the given "IpamHost" object
    kaas.mirantis.com/provider: baremetal
spec:
  providerSpec:
    value:
      kind: BareMetalMachineProviderSpec
      apiVersion: baremetal.k8s.io/v1alpha1
      hostSelector:
        matchLabels:
          kaas.mirantis.com/baremetalhost-id: test-cluster-master-1
      l2TemplateSelector:
        name: test-cluster-master

Note

Before the management cluster is updated to Container Cloud 2.29.0 (Cluster release 16.4.0), use the BareMetalHost object instead of BareMetalHostInventory. For details, see BareMetalHost.

Caution

While the Cluster release of the management cluster is 16.4.0, BareMetalHostInventory operations are allowed to m:kaas@management-admin only. Once the management cluster is updated to the Cluster release 16.4.1 (or later), this limitation will be lifted.

Configuration example for L2Template
apiVersion: ipam.mirantis.com/v1alpha1
kind: L2Template
metadata:
  labels:
    cluster.sigs.k8s.io/cluster-name: test-cluster
    kaas.mirantis.com/provider: baremetal
  name: test-cluster-master
  namespace: managed-ns
spec:
  ...
  l3Layout:
    - subnetName: lcm-rack-control # this network is referenced in "rack-master" Rack
      scope:      namespace
  ...
  npTemplate: |
    ...
    ethernets:
      lo:
        addresses:
          - {{ cluster_api_lb_ip }}  # function for cluster API LB IP
        dhcp4: false
        dhcp6: false
    ...

After the objects are created and nodes are provisioned, the IpamHost objects will have BGP daemon configuration files in their status fields. For example:

Configuration example for IpamHost
apiVersion: ipam.mirantis.com/v1alpha1
kind: IpamHost
...
status:
  ...
  netconfigFiles:
  - content: bGlzdGVuIGJncCBwb3J0IDExNzk7CnByb3RvY29sIGRldmljZSB7Cn0KIwpwcm90b2NvbCBkaXJlY3QgewogIGludGVyZmFjZSAibG8iOwp9CiMKcHJvdG9jb2wga2VybmVsIHsKICBleHBvcnQgYWxsOwp9CiMKCnByb3RvY29sIGJncCAnYmdwX3BlZXJfMCcgewogIGxvY2FsIGFzIDY1MTAxOwogIG5laWdoYm9yIDEwLjc3LjMxLjEgYXMgNjUxMDA7CiAgaW1wb3J0IGFsbDsKICBleHBvcnQgZmlsdGVyIHsKICAgIGlmIGRlc3QgPSBSVERfVU5SRUFDSEFCTEUgdGhlbiB7CiAgICAgIHJlamVjdDsKICAgIH0KICAgIGFjY2VwdDsKICB9Owp9Cgpwcm90b2NvbCBiZ3AgJ2JncF9wZWVyXzEnIHsKICBsb2NhbCBhcyA2NTEwMTsKICBuZWlnaGJvciAxMC43Ny4zNy4xIGFzIDY1MTAwOwogIGltcG9ydCBhbGw7CiAgZXhwb3J0IGZpbHRlciB7CiAgICBpZiBkZXN0ID0gUlREX1VOUkVBQ0hBQkxFIHRoZW4gewogICAgICByZWplY3Q7CiAgICB9CiAgICBhY2NlcHQ7CiAgfTsKfQoK
    path: /etc/bird/bird.conf
  - content: ...
    path: /etc/netplan/60-kaas-lcm-netplan.yaml
  netconfigFilesStates:
    /etc/bird/bird.conf: 'OK: 2023-08-17T08:00:58.96140Z 25cde040e898fd5bf5b28aacb12f046b4adb510570ecf7d7fa5a8467fa4724ec'
    /etc/netplan/60-kaas-lcm-netplan.yaml: 'OK: 2023-08-11T12:33:24.54439Z 37ac6e9fe13e5969f35c20c615d96b4ed156341c25e410e95831794128601e01'
  ...

You can decode /etc/bird/bird.conf contents and verify the configuration:

echo "<<base64-string>>" | base64 -d

The following system output applies to the above configuration examples:

Configuration example for the decoded bird.conf

Since Cluster releases 17.1.0 and 16.1.0 for bird v2.x:

protocol device {
}
#
protocol direct {
  interface "lo";
  ipv4;
}
#
protocol kernel {
  ipv4 {
    export all;
  };
}
#

protocol bgp 'bgp_peer_0' {
  local port 1179 as 65101;
  neighbor 10.77.31.1 as 65100;
  ipv4 {
    import none;
    export filter {
      if dest = RTD_UNREACHABLE then {
        reject;
      }
      accept;
    };
  };
}

protocol bgp 'bgp_peer_1' {
  local port 1179 as 65101;
  neighbor 10.77.37.1 as 65100;
  ipv4 {
    import none;
    export filter {
      if dest = RTD_UNREACHABLE then {
        reject;
      }
      accept;
    };
  };
}

Before Cluster releases 17.1.0 and 16.1.0 for bird v1.x:

listen bgp port 1179;
protocol device {
}
#
protocol direct {
  interface "lo";
}
#
protocol kernel {
  export all;
}
#

protocol bgp 'bgp_peer_0' {
  local as 65101;
  neighbor 10.77.31.1 as 65100;
  import all;
  export filter {
    if dest = RTD_UNREACHABLE then {
      reject;
    }
    accept;
  };
}

protocol bgp 'bgp_peer_1' {
  local as 65101;
  neighbor 10.77.37.1 as 65100;
  import all;
  export filter {
    if dest = RTD_UNREACHABLE then {
      reject;
    }
    accept;
  };
}

BGP daemon configuration files are copied from IpamHost.status to the corresponding LCMMachine object in the same way as netplan configuration files. Then, the LCM-Agent writes the configuration files to the corresponding node.
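
To confirm that the rendered file has reached a node, you can also inspect it directly on that node. The following commands are a sketch that assumes SSH access to the node; the birdc control utility can be used only if it is available on the node and the BGP daemon control socket is accessible:

sudo cat /etc/bird/bird.conf   # verify the rendered configuration
sudo birdc show protocols      # if birdc is available, list the bgp_peer_* sessions and their states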

Multiple rack configuration example

In the following configuration example, each master node is located in its own rack. In this case, three Rack objects are required for the master nodes. Some worker nodes can coexist in the same racks with master nodes or occupy separate racks. Only the objects required to illustrate the configuration for BGP announcement of the cluster API load balancer address are provided here.

For the description of Rack, MetalLBConfig, and other objects that are required for MetalLB configuration in this scenario, refer to Configuration example for using BGP announcement.

It is implied that the useBGPAnnouncement parameter is set to true in the corresponding Cluster object.

Configuration example for MultiRackCluster

Since Cluster releases 17.1.0 and 16.1.0 for bird v2.x:

# It is the same object as in the single rack example.
apiVersion: ipam.mirantis.com/v1alpha1
kind: MultiRackCluster
metadata:
  name: multirack-test-cluster
  namespace: managed-ns
  labels:
    cluster.sigs.k8s.io/cluster-name: test-cluster
    kaas.mirantis.com/provider: baremetal
    kaas.mirantis.com/region: region-one
spec:
  bgpdConfigFileName: bird.conf
  bgpdConfigFilePath: /etc/bird
  bgpdConfigTemplate: |
    protocol device {
    }
    #
    protocol direct {
      interface "lo";
      ipv4;
    }
    #
    protocol kernel {
      ipv4 {
        export all;
      };
    }
    #
    {{range $i, $peer := .Neighbors}}
    protocol bgp 'bgp_peer_{{$i}}' {
      local port 1179 as {{.LocalASN}};
      neighbor {{.NeighborIP}} as {{.NeighborASN}};
      ipv4 {
        import none;
        export filter {
          if dest = RTD_UNREACHABLE then {
            reject;
          }
          accept;
        };
      };
    }
    {{end}}
  defaultPeer:
    localASN: 65101
    neighborASN: 65100
    neighborIP: ""

Before Cluster releases 17.1.0 and 16.1.0 for bird v1.x:

# It is the same object as in the single rack example.
apiVersion: ipam.mirantis.com/v1alpha1
kind: MultiRackCluster
metadata:
  name: multirack-test-cluster
  namespace: managed-ns
  labels:
    cluster.sigs.k8s.io/cluster-name: test-cluster
    kaas.mirantis.com/provider: baremetal
spec:
  bgpdConfigFileName: bird.conf
  bgpdConfigFilePath: /etc/bird
  bgpdConfigTemplate: |
    listen bgp port 1179;
    protocol device {
    }
    #
    protocol direct {
      interface "lo";
    }
    #
    protocol kernel {
      export all;
    }
    #
    {{range $i, $peer := .Neighbors}}
    protocol bgp 'bgp_peer_{{$i}}' {
      local as {{.LocalASN}};
      neighbor {{.NeighborIP}} as {{.NeighborASN}};
      import all;
      export filter {
        if dest = RTD_UNREACHABLE then {
          reject;
        }
        accept;
      };
    }
    {{end}}
  defaultPeer:
    localASN: 65101
    neighborASN: 65100
    neighborIP: ""

The following Rack objects differ in neighbor IP addresses and in the network (L3 subnet) used for BGP connection to announce the cluster API LB IP and for cluster API traffic.

Configuration example for Rack 1
apiVersion: ipam.mirantis.com/v1alpha1
kind: Rack
metadata:
  name: rack-master-1
  namespace: managed-ns
  labels:
    cluster.sigs.k8s.io/cluster-name: test-cluster
    kaas.mirantis.com/provider: baremetal
spec:
  peeringMap:
    lcm-rack-control-1:
      peers:
      - neighborIP: 10.77.31.2  # "localASN" and "neighborASN" are taken from
      - neighborIP: 10.77.31.3  # "MultiRackCluster.spec.defaultPeer" if
                                # not set here
Configuration example for Rack 2
apiVersion: ipam.mirantis.com/v1alpha1
kind: Rack
metadata:
  name: rack-master-2
  namespace: managed-ns
  labels:
    cluster.sigs.k8s.io/cluster-name: test-cluster
    kaas.mirantis.com/provider: baremetal
spec:
  peeringMap:
    lcm-rack-control-2:
      peers:
      - neighborIP: 10.77.32.2  # "localASN" and "neighborASN" are taken from
      - neighborIP: 10.77.32.3  # "MultiRackCluster.spec.defaultPeer" if
                                # not set here
Configuration example for Rack 3
apiVersion: ipam.mirantis.com/v1alpha1
kind: Rack
metadata:
  name: rack-master-3
  namespace: managed-ns
  labels:
    cluster.sigs.k8s.io/cluster-name: test-cluster
    kaas.mirantis.com/provider: baremetal
spec:
  peeringMap:
    lcm-rack-control-3:
      peers:
      - neighborIP: 10.77.33.2  # "localASN" and "neighborASN" are taken from
      - neighborIP: 10.77.33.3  # "MultiRackCluster.spec.defaultPeer" if
                                # not set here

Compared to the single rack configuration example, the following Machine objects differ in:

  • BMH selectors

  • L2Template selectors

  • Rack selectors (the ipam/RackRef label)

  • The rack-id node labels

    The labels on master nodes are required for MetalLB node selectors if MetalLB is used to announce LB IP addresses on master nodes. In this scenario, the L2 (ARP) announcement mode cannot be used for MetalLB because master nodes are in different L2 segments. So, the BGP announcement mode must be used for MetalLB. Node selectors are required to properly configure BGP connections from each master node.

Note

Before the management cluster is updated to Container Cloud 2.29.0 (Cluster release 16.4.0), use the BareMetalHost object instead of BareMetalHostInventory. For details, see BareMetalHost.

Caution

While the Cluster release of the management cluster is 16.4.0, BareMetalHostInventory operations are allowed to m:kaas@management-admin only. Once the management cluster is updated to the Cluster release 16.4.1 (or later), this limitation will be lifted.

Configuration example for Machine 1
apiVersion: cluster.k8s.io/v1alpha1
kind: Machine
metadata:
  name: test-cluster-master-1
  namespace: managed-ns
  annotations:
    metal3.io/BareMetalHost: managed-ns/test-cluster-master-1
  labels:
    cluster.sigs.k8s.io/cluster-name: test-cluster
    cluster.sigs.k8s.io/control-plane: controlplane
    hostlabel.bm.kaas.mirantis.com/controlplane: controlplane
    ipam/RackRef: rack-master-1
    kaas.mirantis.com/provider: baremetal
spec:
  providerSpec:
    value:
      kind: BareMetalMachineProviderSpec
      apiVersion: baremetal.k8s.io/v1alpha1
      hostSelector:
        matchLabels:
          kaas.mirantis.com/baremetalhost-id: test-cluster-master-1
      l2TemplateSelector:
        name: test-cluster-master-1
      nodeLabels:             # not used for BGP announcement of the
      - key: rack-id          # cluster API LB IP but can be used for
        value: rack-master-1  # MetalLB if "nodeSelectors" are required
Configuration example for Machine 2
apiVersion: cluster.k8s.io/v1alpha1
kind: Machine
metadata:
  name: test-cluster-master-2
  namespace: managed-ns
  annotations:
    metal3.io/BareMetalHost: managed-ns/test-cluster-master-2
  labels:
    cluster.sigs.k8s.io/cluster-name: test-cluster
    cluster.sigs.k8s.io/control-plane: controlplane
    hostlabel.bm.kaas.mirantis.com/controlplane: controlplane
    ipam/RackRef: rack-master-2
    kaas.mirantis.com/provider: baremetal
spec:
  providerSpec:
    value:
      kind: BareMetalMachineProviderSpec
      apiVersion: baremetal.k8s.io/v1alpha1
      hostSelector:
        matchLabels:
          kaas.mirantis.com/baremetalhost-id: test-cluster-master-2
      l2TemplateSelector:
        name: test-cluster-master-2
      nodeLabels:             # not used for BGP announcement of the
      - key: rack-id          # cluster API LB IP but can be used for
        value: rack-master-2  # MetalLB if "nodeSelectors" are required
Configuration example for Machine 3
apiVersion: cluster.k8s.io/v1alpha1
kind: Machine
metadata:
  name: test-cluster-master-3
  namespace: managed-ns
  annotations:
    metal3.io/BareMetalHost: managed-ns/test-cluster-master-3
  labels:
    cluster.sigs.k8s.io/cluster-name: test-cluster
    cluster.sigs.k8s.io/control-plane: controlplane
    hostlabel.bm.kaas.mirantis.com/controlplane: controlplane
    ipam/RackRef: rack-master-3
    kaas.mirantis.com/provider: baremetal
spec:
  providerSpec:
    value:
      kind: BareMetalMachineProviderSpec
      apiVersion: baremetal.k8s.io/v1alpha1
      hostSelector:
        matchLabels:
          kaas.mirantis.com/baremetalhost-id: test-cluster-master-3
      l2TemplateSelector:
        name: test-cluster-master-3
      nodeLabels:             # optional. not used for BGP announcement of
      - key: rack-id          # the cluster API LB IP but can be used for
        value: rack-master-3  # MetalLB if "nodeSelectors" are required
Configuration example for Subnet defining the cluster API LB IP address
apiVersion: ipam.mirantis.com/v1alpha1
kind: Subnet
metadata:
  name: test-cluster-api-lb
  namespace: managed-ns
  labels:
    kaas.mirantis.com/provider: baremetal
    ipam/SVC-LBhost: "1"
    cluster.sigs.k8s.io/cluster-name: test-cluster
spec:
  cidr: 134.33.24.201/32
  useWholeCidr: true
Configuration example for Subnet of the LCM network in the rack-master-1 rack
apiVersion: ipam.mirantis.com/v1alpha1
kind: Subnet
metadata:
  labels:
    cluster.sigs.k8s.io/cluster-name: test-cluster
    kaas.mirantis.com/provider: baremetal
  name: lcm-rack-control-1
  namespace: managed-ns
spec:
  cidr: 10.77.31.0/28
  gateway: 10.77.31.1
  includeRanges:
    - 10.77.31.4-10.77.31.13
  nameservers:
    - 1.2.3.4
Configuration example for Subnet of the LCM network in the rack-master-2 rack
apiVersion: ipam.mirantis.com/v1alpha1
kind: Subnet
metadata:
  labels:
    cluster.sigs.k8s.io/cluster-name: test-cluster
    kaas.mirantis.com/provider: baremetal
  name: lcm-rack-control-2
  namespace: managed-ns
spec:
  cidr: 10.77.32.0/28
  gateway: 10.77.32.1
  includeRanges:
    - 10.77.32.4-10.77.32.13
  nameservers:
    - 1.2.3.4
Configuration example for Subnet of the LCM network in the rack-master-3 rack
apiVersion: ipam.mirantis.com/v1alpha1
kind: Subnet
metadata:
  labels:
    cluster.sigs.k8s.io/cluster-name: test-cluster
    kaas.mirantis.com/provider: baremetal
  name: lcm-rack-control-3
  namespace: managed-ns
spec:
  cidr: 10.77.33.0/28
  gateway: 10.77.33.1
  includeRanges:
    - 10.77.33.4-10.77.33.13
  nameservers:
    - 1.2.3.4

The following L2Template objects differ in LCM and external subnets that each master node uses.

Configuration example for L2Template 1
apiVersion: ipam.mirantis.com/v1alpha1
kind: L2Template
metadata:
  labels:
    cluster.sigs.k8s.io/cluster-name: test-cluster
    kaas.mirantis.com/provider: baremetal
  name: test-cluster-master-1
  namespace: managed-ns
spec:
  ...
  l3Layout:
    - subnetName: lcm-rack-control-1  # this network is referenced
      scope:      namespace           # in the "rack-master-1" Rack
    - subnetName: ext-rack-control-1  # this optional network is used for
      scope:      namespace           # Kubernetes services traffic and
                                      # MetalLB BGP connections
  ...
  npTemplate: |
    ...
    ethernets:
      lo:
        addresses:
          - {{ cluster_api_lb_ip }}  # function for cluster API LB IP
        dhcp4: false
        dhcp6: false
    ...
Configuration example for L2Template 2
apiVersion: ipam.mirantis.com/v1alpha1
kind: L2Template
metadata:
  labels:
    cluster.sigs.k8s.io/cluster-name: test-cluster
    kaas.mirantis.com/provider: baremetal
  name: test-cluster-master-2
  namespace: managed-ns
spec:
  ...
  l3Layout:
    - subnetName: lcm-rack-control-2  # this network is referenced
      scope:      namespace           # in "rack-master-2" Rack
    - subnetName: ext-rack-control-2  # this network is used for Kubernetes services
      scope:      namespace           # traffic and MetalLB BGP connections
  ...
  npTemplate: |
    ...
    ethernets:
      lo:
        addresses:
          - {{ cluster_api_lb_ip }}  # function for cluster API LB IP
        dhcp4: false
        dhcp6: false
    ...
Configuration example for L2Template 3
apiVersion: ipam.mirantis.com/v1alpha1
kind: L2Template
metadata:
  labels:
    cluster.sigs.k8s.io/cluster-name: test-cluster
    kaas.mirantis.com/provider: baremetal
  name: test-cluster-master-3
  namespace: managed-ns
spec:
  ...
  l3Layout:
    - subnetName: lcm-rack-control-3  # this network is referenced
      scope:      namespace           # in "rack-master-3" Rack
    - subnetName: ext-rack-control-3  # this network is used for Kubernetes services
      scope:      namespace           # traffic and MetalLB BGP connections
  ...
  npTemplate: |
    ...
    ethernets:
      lo:
        addresses:
          - {{ cluster_api_lb_ip }}  # function for cluster API LB IP
        dhcp4: false
        dhcp6: false
    ...

The following MetalLBConfig example illustrates how node labels are used in nodeSelectors of bgpPeers. Each bgpPeers entry corresponds to one of the master nodes.

Configuration example for MetalLBConfig
apiVersion: ipam.mirantis.com/v1alpha1
kind: MetalLBConfig
metadata:
  labels:
    cluster.sigs.k8s.io/cluster-name: test-cluster
    kaas.mirantis.com/provider: baremetal
  name: test-cluster-metallb-config
  namespace: managed-ns
spec:
  ...
  bgpPeers:
    - name: svc-peer-rack1
      spec:
        holdTime: 0s
        keepaliveTime: 0s
        peerAddress: 10.77.41.1 # peer address is in external subnet
                                # instead of LCM subnet used for BGP
                                # connection to announce cluster API LB IP
        peerASN: 65100  # the same as for BGP connection used to announce
                        # cluster API LB IP
        myASN: 65101    # the same as for BGP connection used to announce
                        # cluster API LB IP
        nodeSelectors:
          - matchLabels:
              rack-id: rack-master-1  # references the node corresponding
                                      # to "test-cluster-master-1" Machine
    - name: svc-peer-rack2
      spec:
        holdTime: 0s
        keepaliveTime: 0s
        peerAddress: 10.77.42.1
        peerASN: 65100
        myASN: 65101
        nodeSelectors:
          - matchLabels:
              rack-id: rack-master-2  # references the node corresponding
                                      # to "test-cluster-master-2" Machine
    - name: svc-peer-rack3
      spec:
        holdTime: 0s
        keepaliveTime: 0s
        peerAddress: 10.77.43.1
        peerASN: 65100
        myASN: 65101
        nodeSelectors:
          - matchLabels:
              rack-id: rack-master-3  # references the node corresponding
                                      # to "test-cluster-master-3" Machine
  ...
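
To check that the rack-id labels referenced by these node selectors are present on the Kubernetes nodes, you can run the following command using the kubeconfig of the managed cluster:

kubectl get nodes -L rack-id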

After the objects are created and nodes are provisioned, the IpamHost objects will have BGP daemon configuration files in their status fields. Refer to Single rack configuration example for the procedure to verify the BGP configuration files.

Rack

TechPreview Available since 2.24.4

This section describes the Rack resource used in the Container Cloud API.

When you create a bare metal managed cluster with a multi-rack topology, where Kubernetes masters are distributed across multiple racks without L2 layer extension between them, the Rack resource allows you to configure BGP announcement of the cluster API load balancer address from each rack.

In this scenario, Rack objects must be bound to Machine objects corresponding to master nodes of the cluster. Each Rack object describes the configuration of the BGP daemon (bird) used to announce the cluster API LB address from a particular master node (or from several nodes in the same rack).

Rack objects are used for a particular cluster only in conjunction with the MultiRackCluster object described in MultiRackCluster.

For demonstration purposes, the Container Cloud Rack custom resource (CR) description is split into the following major sections:

For configuration examples, see MultiRackCluster and Rack usage examples.

Rack metadata

The Container Cloud Rack CR metadata contains the following fields:

  • apiVersion

    API version of the object that is ipam.mirantis.com/v1alpha1.

  • kind

    Object type that is Rack.

  • metadata

    The metadata field contains the following subfields:

    • name

      Name of the Rack object. Corresponding Machine objects must have their ipam/RackRef label value set to the name of the Rack object. This label is required only for Machine objects of the master nodes that announce the cluster API LB address.

    • namespace

      Container Cloud project (Kubernetes namespace) where the object was created.

    • labels

      Key-value pairs that are attached to the object:

      • cluster.sigs.k8s.io/cluster-name

        Cluster object name that this Rack object is applied to.

      • kaas.mirantis.com/provider

        Provider name that is baremetal.

      • kaas.mirantis.com/region

        Region name.

        Note

        The kaas.mirantis.com/region label is removed from all Container Cloud objects in 2.26.0 (Cluster releases 17.1.0 and 16.1.0). Therefore, do not add the label starting from these releases. On existing clusters updated to these releases, or if added manually, this label will be ignored by Container Cloud.

      Warning

      Labels and annotations that are not documented in this API Reference are generated automatically by Container Cloud. Do not modify them using the Container Cloud API.

Rack metadata example:

apiVersion: ipam.mirantis.com/v1alpha1
kind: Rack
metadata:
  name: rack-1
  namespace: managed-ns
  labels:
    cluster.sigs.k8s.io/cluster-name: test-cluster
    kaas.mirantis.com/provider: baremetal

Corresponding Machine metadata example:

apiVersion: cluster.k8s.io/v1alpha1
kind: Machine
metadata:
  labels:
    cluster.sigs.k8s.io/cluster-name: test-cluster
    cluster.sigs.k8s.io/control-plane: controlplane
    hostlabel.bm.kaas.mirantis.com/controlplane: controlplane
    ipam/RackRef: rack-1
    kaas.mirantis.com/provider: baremetal
  name: managed-master-1-control-efi-6tg52
  namespace: managed-ns
Rack spec

The spec field of the Rack resource describes the desired state of the object. It contains the following fields:

  • bgpdConfigTemplate

    Optional. Configuration file template that is used to create the configuration file for the BGP daemon on nodes in this rack. If not set, the configuration file template from the corresponding MultiRackCluster object is used.

  • peeringMap

    Structure that describes general parameters of BGP peers to be used in the configuration file for a BGP daemon for each network where BGP announcement is used. Also, you can define a separate configuration file template for the BGP daemon for each of those networks. The peeringMap structure is as follows:

    peeringMap:
      <network-name-a>:
        peers:
          - localASN: <localASN-1>
            neighborASN: <neighborASN-1>
            neighborIP: <neighborIP-1>
            password: <password-1>
          - localASN: <localASN-2>
            neighborASN: <neighborASN-2>
            neighborIP: <neighborIP-2>
            password: <password-2>
        bgpdConfigTemplate: |
        <configuration file template for a BGP daemon>
      ...
    
    • <network-name-a>

      Name of the network where the BGP daemon should connect to the neighbor BGP peers. By default, it is implied that the same network is used on the node to connect to the neighbor BGP peers and to receive and respond to the traffic directed to the advertised IP address. In this scenario, the advertised IP address is the cluster API LB IP address.

      This network name must be the same as the subnet name used in the L2 template (l3Layout section) for the corresponding master node(s).

    • peers

      Optional. List of dictionaries where each dictionary defines configuration parameters for a particular BGP peer. Peer parameters are as follows:

      • localASN

        Optional. Local AS number. If not set, it can be taken from MultiRackCluster.spec.defaultPeer or can be hardcoded in bgpdConfigTemplate.

      • neighborASN

        Optional. Neighbor AS number. If not set, it can be taken from MultiRackCluster.spec.defaultPeer or can be hardcoded in bgpdConfigTemplate.

      • neighborIP

        Mandatory. Neighbor IP address.

      • password

        Optional. Neighbor password. If not set, it can be taken from MultiRackCluster.spec.defaultPeer or can be hardcoded in bgpdConfigTemplate. It is required when MD5 authentication between BGP peers is used.

    • bgpdConfigTemplate

      Optional. Configuration file template that is used to create the configuration file for the BGP daemon of the <network-name-a> network on a particular node. If not set, Rack.spec.bgpdConfigTemplate is used.

Configuration example:

Since Cluster releases 17.1.0 and 16.1.0 for bird v2.x
spec:
  bgpdConfigTemplate: |
    protocol device {
    }
    #
    protocol direct {
      interface "lo";
      ipv4;
    }
    #
    protocol kernel {
      ipv4 {
        export all;
      };
    }
    #
    protocol bgp bgp_lcm {
      local port 1179 as {{.LocalASN}};
      neighbor {{.NeighborIP}} as {{.NeighborASN}};
      ipv4 {
         import none;
         export filter {
           if dest = RTD_UNREACHABLE then {
             reject;
           }
           accept;
         };
      };
    }
  peeringMap:
    lcm-rack1:
      peers:
      - localASN: 65050
        neighborASN: 65011
        neighborIP: 10.77.31.1
Before Cluster releases 17.1.0 and 16.1.0 for bird v1.x
spec:
  bgpdConfigTemplate: |
    listen bgp port 1179;
    protocol device {
    }
    #
    protocol direct {
      interface "lo";
    }
    #
    protocol kernel {
      export all;
    }
    #
    protocol bgp bgp_lcm {
      local as {{.LocalASN}};
      neighbor {{.NeighborIP}} as {{.NeighborASN}};
      import all;
      export filter {
        if dest = RTD_UNREACHABLE then {
          reject;
        }
        accept;
      };
    }
  peeringMap:
    lcm-rack1:
      peers:
      - localASN: 65050
        neighborASN: 65011
        neighborIP: 10.77.31.1
Rack status

The status field of the Rack resource reflects the actual state of the Rack object and contains the following fields:

  • state Since 2.23.0

    Message that reflects the current status of the resource. The list of possible values includes the following:

    • OK - object is operational.

    • ERR - object is non-operational. This status has a detailed description in the messages list.

    • TERM - object was deleted and is terminating.

  • messages Since 2.23.0

    List of error or warning messages if the object state is ERR.

  • objCreated

    Date, time, and IPAM version of the resource creation.

  • objStatusUpdated

    Date, time, and IPAM version of the last update of the status field in the resource.

  • objUpdated

    Date, time, and IPAM version of the last resource update.

Configuration example:

status:
  checksums:
    annotations: sha256:cd4b751d9773eacbfd5493712db0cbebd6df0762156aefa502d65a9d5e8af31d
    labels: sha256:fc2612d12253443955e1bf929f437245d304b483974ff02a165bc5c78363f739
    spec: sha256:8f0223b1eefb6a9cd583905a25822fd83ac544e62e1dfef26ee798834ef4c0c1
  objCreated: 2023-08-11T12:25:21.00000Z  by  v6.5.999-20230810-155553-2497818
  objStatusUpdated: 2023-08-11T12:33:00.92163Z  by  v6.5.999-20230810-155553-2497818
  objUpdated: 2023-08-11T12:32:59.11951Z  by  v6.5.999-20230810-155553-2497818
  state: OK
Subnet

This section describes the Subnet resource used in Mirantis Container Cloud API to allocate IP addresses for the cluster nodes.

For demonstration purposes, the Container Cloud Subnet custom resource (CR) can be split into the following major sections:

Subnet metadata

The Container Cloud Subnet CR contains the following fields:

  • apiVersion

    API version of the object that is ipam.mirantis.com/v1alpha1.

  • kind

    Object type that is Subnet.

  • metadata

    This field contains the following subfields:

    • name

      Name of the Subnet object.

    • namespace

      Project in which the Subnet object was created.

    • labels

      Key-value pairs that are attached to the object:

      • ipam/DefaultSubnet: "1" Deprecated since 2.14.0

        Indicates that this subnet was automatically created for the PXE network.

      • ipam/UID

        Unique ID of a subnet.

      • kaas.mirantis.com/provider

        Provider type.

      • kaas.mirantis.com/region

        Region name.

        Note

        The kaas.mirantis.com/region label is removed from all Container Cloud objects in 2.26.0 (Cluster releases 17.1.0 and 16.1.0). Therefore, do not add the label starting from these releases. On existing clusters updated to these releases, or if added manually, this label will be ignored by Container Cloud.

      Warning

      Labels and annotations that are not documented in this API Reference are generated automatically by Container Cloud. Do not modify them using the Container Cloud API.

Configuration example:

apiVersion: ipam.mirantis.com/v1alpha1
kind: Subnet
metadata:
  name: kaas-mgmt
  namespace: default
  labels:
    ipam/UID: 1bae269c-c507-4404-b534-2c135edaebf5
    kaas.mirantis.com/provider: baremetal
Subnet spec

The spec field of the Subnet resource describes the desired state of a subnet. It contains the following fields:

  • cidr

    A valid IPv4 CIDR, for example, 10.11.0.0/24.

  • gateway

    A valid gateway address, for example, 10.11.0.9.

  • includeRanges

    A comma-separated list of IP address ranges within the given CIDR that should be used in the allocation of IPs for nodes. The gateway, network, broadcast, and DNS addresses will be excluded (protected) automatically if they intersect with one of the ranges. The IPs outside the given ranges will not be used in the allocation. Each element of the list can be either an interval 10.11.0.5-10.11.0.70 or a single address 10.11.0.77.

    Warning

    Do not use values that are out of the given CIDR.

  • excludeRanges

    A comma-separated list of IP address ranges within the given CIDR that should not be used in the allocation of IPs for nodes. The IPs within the given CIDR but outside the given ranges will be used in the allocation. The gateway, network, broadcast, and DNS addresses will be excluded (protected) automatically if they are included in the CIDR. Each element of the list can be either an interval 10.11.0.5-10.11.0.70 or a single address 10.11.0.77.

    Warning

    Do not use values that are out of the given CIDR.

  • useWholeCidr

    If set to false (default), the subnet address and the broadcast address are excluded from the address allocation. If set to true, the subnet address and the broadcast address are included in the address allocation for nodes.

  • nameservers

    The list of IP addresses of name servers. Each element of the list is a single address, for example, 172.18.176.6.

Configuration example:

spec:
  cidr: 172.16.48.0/24
  excludeRanges:
  - 172.16.48.99
  - 172.16.48.101-172.16.48.145
  gateway: 172.16.48.1
  nameservers:
  - 172.18.176.6
Subnet status

The status field of the Subnet resource describes the actual state of a subnet. It contains the following fields:

  • allocatable

    The number of IP addresses that are available for allocation.

  • allocatedIPs

    The list of allocated IP addresses in the IP:<IPAddr object UID> format.

  • capacity

    The total number of IP addresses, that is, the sum of allocatable and already allocated IP addresses.

  • cidr

    The IPv4 CIDR for a subnet.

  • gateway

    The gateway address for a subnet.

  • nameservers

    The list of IP addresses of name servers.

  • ranges

    The list of IP address ranges within the given CIDR that are used in the allocation of IPs for nodes.

  • statusMessage

    Deprecated since Container Cloud 2.23.0 and will be removed in one of the following releases in favor of state and messages. Since Container Cloud 2.24.0, this field is not set for the subnets of newly created clusters. For the field description, see state.

  • state Since 2.23.0

    Message that reflects the current status of the resource. The list of possible values includes the following:

    • OK - object is operational.

    • ERR - object is non-operational. This status has a detailed description in the messages list.

    • TERM - object was deleted and is terminating.

  • messages Since 2.23.0

    List of error or warning messages if the object state is ERR.

  • objCreated

    Date, time, and IPAM version of the resource creation.

  • objStatusUpdated

    Date, time, and IPAM version of the last update of the status field in the resource.

  • objUpdated

    Date, time, and IPAM version of the last resource update.

Configuration example:

status:
  allocatable: 51
  allocatedIPs:
  - 172.16.48.200:24e94698-f726-11ea-a717-0242c0a85b02
  - 172.16.48.201:2bb62373-f726-11ea-a717-0242c0a85b02
  - 172.16.48.202:37806659-f726-11ea-a717-0242c0a85b02
  capacity: 54
  cidr: 172.16.48.0/24
  gateway: 172.16.48.1
  nameservers:
  - 172.18.176.6
  ranges:
  - 172.16.48.200-172.16.48.253
  objCreated: 2021-10-21T19:09:32Z  by  v5.1.0-20210930-121522-f5b2af8
  objStatusUpdated: 2021-10-21T19:14:18.748114886Z  by  v5.1.0-20210930-121522-f5b2af8
  objUpdated: 2021-10-21T19:09:32.606968024Z  by  v5.1.0-20210930-121522-f5b2af8
  state: OK
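
You can inspect these status fields on a live cluster by querying the Subnet object directly. For example:

kubectl -n <project-name> get subnet <subnet-name> -o yaml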
SubnetPool

Unsupported since 2.28.0 (17.3.0 and 16.3.0)

Warning

The SubnetPool object is unsupported since Container Cloud 2.28.0 (17.3.0 and 16.3.0). For details, see MOSK Deprecation Notes: SubnetPool resource management.

This section describes the SubnetPool resource used in Mirantis Container Cloud API to manage a pool of addresses from which subnets can be allocated.

For demonstration purposes, the Container Cloud SubnetPool custom resource (CR) is split into the following major sections:

SubnetPool metadata

The Container Cloud SubnetPool CR contains the following fields:

  • apiVersion

    API version of the object that is ipam.mirantis.com/v1alpha1.

  • kind

    Object type that is SubnetPool.

  • metadata

    The metadata field contains the following subfields:

    • name

      Name of the SubnetPool object.

    • namespace

      Project in which the SubnetPool object was created.

    • labels

      Key-value pairs that are attached to the object:

      • kaas.mirantis.com/provider

        Provider type that is baremetal.

      • kaas.mirantis.com/region

        Region name.

        Note

        The kaas.mirantis.com/region label is removed from all Container Cloud objects in 2.26.0 (Cluster releases 17.1.0 and 16.1.0). Therefore, do not add the label starting from these releases. On existing clusters updated to these releases, or if added manually, this label will be ignored by Container Cloud.

      Warning

      Labels and annotations that are not documented in this API Reference are generated automatically by Container Cloud. Do not modify them using the Container Cloud API.

Configuration example:

apiVersion: ipam.mirantis.com/v1alpha1
kind: SubnetPool
metadata:
  name: kaas-mgmt
  namespace: default
  labels:
    kaas.mirantis.com/provider: baremetal
SubnetPool spec

The spec field of the SubnetPool resource describes the desired state of a subnet pool. It contains the following fields:

  • cidr

    Valid IPv4 CIDR. For example, 10.10.0.0/16.

  • blockSize

    IP address block size to use when assigning an IP address block to every new child Subnet object. For example, if you set /25, every new child Subnet will have 128 IPs to allocate. Possible values are from /29 to the cidr size. Immutable.

  • nameservers

    Optional. List of IP addresses of name servers to use for every new child Subnet object. Each element of the list is a single address, for example, 172.18.176.6. Default: empty.

  • gatewayPolicy

    Optional. Method of assigning a gateway address to new child Subnet objects. Default: none. Possible values are:

    • first - first IP of the IP address block assigned to a child Subnet, for example, 10.11.10.1.

    • last - last IP of the IP address block assigned to a child Subnet, for example, 10.11.10.254.

    • none - no gateway address.

Configuration example:

spec:
  cidr: 10.10.0.0/16
  blockSize: /25
  nameservers:
  - 172.18.176.6
  gatewayPolicy: first
SubnetPool status

The status field of the SubnetPool resource describes the actual state of a subnet pool. It contains the following fields:

  • allocatedSubnets

    List of allocated subnets. Each subnet has the <CIDR>:<SUBNET_UID> format.

  • blockSize

    Block size to use for IP address assignments from the defined pool.

  • capacity

    Total number of IP addresses to be allocated. Includes the number of allocatable and already allocated IP addresses.

  • allocatable

    Number of subnets with the blockSize size that are available for allocation.

  • state Since 2.23.0

    Message that reflects the current status of the resource. The list of possible values includes the following:

    • OK - object is operational.

    • ERR - object is non-operational. This status has a detailed description in the messages list.

    • TERM - object was deleted and is terminating.

  • messages Since 2.23.0

    List of error or warning messages if the object state is ERR.

  • objCreated

    Date, time, and IPAM version of the resource creation.

  • objStatusUpdated

    Date, time, and IPAM version of the last update of the status field in the resource.

  • objUpdated

    Date, time, and IPAM version of the last resource update.

Example:

status:
  allocatedSubnets:
  - 10.10.0.0/24:0272bfa9-19de-11eb-b591-0242ac110002
  blockSize: /24
  capacity: 54
  allocatable: 51
  objCreated: 2021-10-21T19:09:32Z  by  v5.1.0-20210930-121522-f5b2af8
  objStatusUpdated: 2021-10-21T19:14:18.748114886Z  by  v5.1.0-20210930-121522-f5b2af8
  objUpdated: 2021-10-21T19:09:32.606968024Z  by  v5.1.0-20210930-121522-f5b2af8
  state: OK

Release Compatibility Matrix

The Mirantis Container Cloud Release Compatibility Matrix outlines the specific operating environments that are validated and supported.

The document provides the deployment compatibility for each product release and defines the upgrade paths between major component versions. The document also provides the Container Cloud browser compatibility.

A Container Cloud management cluster upgrades automatically when a new product release becomes available. Once the management cluster has been updated, you can trigger the upgrade of managed clusters through the Container Cloud web UI or API.

To view the full components list with their respective versions for each Container Cloud release, refer to the Container Cloud Release Notes related to the release version of your deployment or use the Releases section in the web UI or API.
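
For example, on the management cluster you can list the available releases through the Kubernetes API. This is a sketch; the resource names below are the ones exposed by the Container Cloud release objects and may vary between product versions:

kubectl get kaasreleases        # Container Cloud releases
kubectl get clusterreleases     # Cluster releases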

Caution

The document applies to regular Container Cloud deployments. For supported configurations of existing Mirantis Kubernetes Engine (MKE) clusters that are not deployed by Container Cloud, refer to MKE Compatibility Matrix.

Compatibility matrix of component versions

The following tables outline the compatibility matrices of the most recent major Container Cloud and Cluster releases along with patch releases and their component versions. For details about unsupported releases, see Releases summary.

Major and patch versions update path

The primary distinction between major and patch product versions is that major release versions introduce new functionality, whereas patch release versions predominantly deliver minor product enhancements, mostly CVE resolutions, for your clusters.

Depending on your deployment needs, you can either update only between major Cluster releases or apply patch updates between major releases. Choosing the latter option ensures that you receive security fixes as soon as they become available, though you must be prepared to update your clusters frequently, approximately once every three weeks. Otherwise, you can update only between major Cluster releases because each subsequent major Cluster release includes the patch Cluster release updates of the previous major Cluster release.

Legend

Definitions of the symbols used in the following compatibility tables:

  • Cluster release is not included in the Container Cloud release yet.

  • Latest supported Cluster release to use for cluster deployment or update.

  • Deprecated Cluster release that you must update to the latest supported Cluster release. The deprecated Cluster release will become unsupported in one of the following Container Cloud releases. Greenfield deployments based on a deprecated Cluster release are not supported. Use the latest supported Cluster release instead.

  • Unsupported Cluster release that blocks automatic upgrade of a management cluster. Update the Cluster release to the latest supported one to unblock management cluster upgrade and obtain newest product features and enhancements.

  • Component is included in the Container Cloud release.

  • Component is available in the Technology Preview scope. Use it only for testing purposes on staging environments.

  • Component is unsupported in the Container Cloud release.

The following table outlines the compatibility matrix for the Container Cloud release series 2.28.x and 2.29.0.

Container Cloud compatibility matrix 2.28.x and 2.29.0

Release

Container Cloud

2.29.0 (current)

2.28.5

2.28.4

2.28.3

2.28.2

2.28.1

2.28.0

Release history

Release date

Mar 11, 2025

Feb 03, 2025

Jan 06, 2025

Dec 09, 2024

Nov 18, 2024

Oct 30, 2024

Oct 16, 2024

Major Cluster releases (managed)

17.4.0 +
MOSK 25.1
MKE 3.7.19

17.3.0 +
MOSK 24.3
MKE 3.7.12

17.2.0 +
MOSK 24.2
MKE 3.7.8

17.1.0 +
MOSK 24.1
MKE 3.7.5

16.4.0
MKE 3.7.19

16.3.0
MKE 3.7.12

16.2.0
MKE 3.7.8

16.1.0
MKE 3.7.5

Patch Cluster releases (managed)

17.3.x + MOSK 24.3.x

17.3.5
17.3.4
17.3.5
17.3.4

17.3.4

17.2.x + MOSK 24.2.x

17.2.7
17.2.6
17.2.5
17.2.4
17.2.3
17.2.7
17.2.6
17.2.5
17.2.4
17.2.3
17.2.7
17.2.6
17.2.5
17.2.4
17.2.3

17.2.6
17.2.5
17.2.4
17.2.3


17.2.5
17.2.4
17.2.3



17.2.4
17.2.3

17.1.x + MOSK 24.1.x

16.3.x

16.3.5
16.3.4
16.3.3
16.3.2
16.3.1
16.3.5
16.3.4
16.3.3
16.3.2
16.3.1

16.3.4
16.3.3
16.3.2
16.3.1


16.3.3
16.3.2
16.3.1



16.3.2
16.3.1




16.3.1

16.2.x

16.2.7
16.2.6
16.2.5
16.2.4
16.2.3


16.2.7
16.2.6
16.2.5
16.2.4
16.2.3


16.2.7
16.2.6
16.2.5
16.2.4
16.2.3



16.2.6
16.2.5
16.2.4
16.2.3




16.2.5
16.2.4
16.2.3





16.2.4
16.2.3
16.2.2
16.2.1

16.1.x

Fully managed cluster

Mirantis Kubernetes Engine (MKE)

3.7.19
17.4.0, 16.4.0
3.7.18
17.3.5, 16.3.5
3.7.17
17.3.4, 16.3.4
3.7.16
17.2.7, 16.3.3, 16.2.7
3.7.16
17.2.6, 16.3.2, 16.2.6
3.7.15
17.2.5, 16.3.1, 16.2.5
3.7.12
17.3.0, 16.3.0

Container orchestration

Kubernetes

1.27 17.x, 16.x

1.27 17.x, 16.x

1.27 17.x, 16.x

1.27 17.x, 16.x

1.27 17.x, 16.x

1.27 17.x, 16.x

1.27 17.x, 16.x

Container runtime

Mirantis Container Runtime (MCR)

25.0.8
17.4.0, 16.4.0
23.0.15
17.3.5, 16.3.5
23.0.15
17.3.5, 16.3.5
23.0.14
17.3.0, 16.3.0
23.0.15
17.3.4, 16.3.4
23.0.14
17.3.0, 16.3.0
23.0.14
17.3.0, 16.3.x
23.0.11
17.2.7, 16.2.7
23.0.14
17.3.0, 16.3.x
23.0.11
17.2.6, 16.2.6
23.0.14
17.3.0, 16.3.x
23.0.11
17.2.5, 16.2.5
23.0.14
17.3.0, 16.3.0
23.0.11
17.2.4, 16.2.4

OS distributions

Ubuntu

22.04 9

22.04 9

22.04 9

22.04 9

22.04 9

22.04 9

22.04 9

Infrastructure platform

Bare metal 8

kernel 5.15.0-131-generic Jammy
kernel 5.15.0-130-generic Jammy, Focal
kernel 5.15.0-126-generic Jammy, Focal
kernel 5.15.0-125-generic Jammy, Focal
kernel 5.15.0-124-generic Jammy, Focal
kernel 5.15.0-122-generic Jammy, Focal
kernel 5.15.0-119-generic Jammy, Focal

MOSK Yoga or Antelope with OVS 3

OpenStack (Octavia)
Queens
Yoga
Antelope
Queens
Yoga
Antelope
Queens
Yoga
Antelope
Queens
Yoga
Antelope
Queens
Yoga
Antelope
Queens
Yoga
Antelope
Queens
Yoga
Antelope

Software defined storage

Ceph

18.2.4-12.cve
17.4.0, 16.4.0
18.2.4-11.cve
17.3.5, 16.3.5
18.2.4-11.cve
17.3.4, 16.3.4
18.2.4-10.cve
17.2.7, 16.3.3, 16.2.7
18.2.4-8.cve
17.2.6, 16.3.2, 16.2.6
18.2.4-6.cve
16.3.1
18.2.4-7.cve
17.2.5, 16.2.5
18.2.4-6.cve
17.3.0, 16.3.0

Rook

1.14.10-26
17.4.0, 16.4.0
1.13.5-28
17.3.5, 16.3.5
1.13.5-28
17.3.4, 16.3.4
1.13.5-26
17.2.7, 16.3.3, 16.2.7
1.13.5-23
17.2.6, 16.3.2, 16.2.6
1.13.5-21
16.3.1
1.13.5-22
17.2.5, 16.2.5
1.13.5-21
17.3.0, 16.3.0

Logging, monitoring, and alerting

StackLight


The following table outlines the compatibility matrix for the Container Cloud release series 2.27.x.

Container Cloud compatibility matrix 2.27.x

Release

Container Cloud

2.27.4

2.27.3

2.27.2

2.27.1

2.27.0

Release history

Release date

Sep 16, 2024

Aug 27, 2024

Aug 05, 2024

July 16, 2024

July 02, 2024

Major Cluster releases (managed)

17.2.0 +
MOSK 24.2
MKE 3.7.8

17.1.0 +
MOSK 24.1
MKE 3.7.5

16.2.0
MKE 3.7.8

16.1.0
MKE 3.7.5

Patch Cluster releases (managed)

17.2.x + MOSK 24.2.x

17.2.4
17.2.3

17.2.3

17.1.x + MOSK 24.1.x

17.1.7+24.1.7
17.1.6+24.1.6
17.1.5+24.1.5
17.1.7+24.1.7
17.1.6+24.1.6
17.1.5+24.1.5
17.1.7+24.1.7
17.1.6+24.1.6
17.1.5+24.1.5

17.1.6+24.1.6
17.1.5+24.1.5


17.1.5+24.1.5

16.2.x

16.2.4
16.2.3
16.2.2
16.2.1

16.2.3
16.2.2
16.2.1


16.2.2
16.2.1



16.2.1

16.1.x

16.1.7
16.1.6
16.1.5
16.1.7
16.1.6
16.1.5
16.1.7
16.1.6
16.1.5

16.1.6
16.1.5


16.1.5

Fully managed cluster

Mirantis Kubernetes Engine (MKE)

3.7.12
17.2.4, 16.2.4
3.7.12
17.2.3, 16.2.3
3.7.11
17.1.7, 16.2.2, 16.1.7
3.7.10
17.1.6, 16.2.1, 16.1.6
3.7.8
17.2.0, 16.2.0

Attached managed cluster

MKE 7

3.6.8
19.1.0
3.6.1
19.0.0
3.5.5
18.1.0
3.5.3
18.0.0
3.6.8
19.1.0
3.6.1
19.0.0
3.5.5
18.1.0
3.5.3
18.0.0
3.6.8
19.1.0
3.6.1
19.0.0
3.5.5
18.1.0
3.5.3
18.0.0

Container orchestration

Kubernetes

1.27 17.x, 16.x

1.27 17.x, 16.x

1.27 17.x, 16.x

1.27 17.x, 16.x

1.27 17.x, 16.x

Container runtime

Mirantis Container Runtime (MCR)

23.0.11 17.2.x, 16.2.x 10
23.0.9 17.1.x, 16.1.x 10
23.0.11 17.2.x, 16.2.x 10
23.0.9 17.1.x, 16.1.x 10
23.0.11 17.2.x, 16.2.x 10
23.0.9 17.1.x, 16.1.x 10
23.0.11 17.2.x, 16.2.x 10
23.0.9 17.1.x, 16.1.x 10

23.0.11 17.2.x, 16.2.x

OS distributions

Ubuntu

22.04 9
20.04
22.04 9
20.04
22.04 9
20.04
22.04 9
20.04
22.04 9
20.04

Infrastructure platform

Bare metal 8

kernel 5.15.0-119-generic Jammy
kernel 5.15.0-118-generic Focal
kernel 5.15.0-117-generic Jammy, Focal
kernel 5.15.0-116-generic Jammy
kernel 5.15.0-113-generic Focal
kernel 5.15.0-113-generic
kernel 5.15.0-107-generic

MOSK Yoga or Antelope with OVS 3

OpenStack (Octavia)
Queens
Yoga
Antelope
Queens
Yoga
Antelope
Queens
Yoga
Antelope
Queens
Yoga
Antelope
Queens
Yoga
Antelope

VMware vSphere 5

7.0, 6.7

7.0, 6.7

7.0, 6.7

Software defined storage

Ceph

18.2.4-4.cve
17.2.4, 16.2.4
18.2.4-3.cve
17.2.3, 16.2.3
18.2.3-2.cve
16.2.2
17.2.7-15.cve
17.1.7, 16.1.7
18.2.3-2.cve
16.2.1
17.2.7-15.cve
17.1.6, 16.1.6
18.2.3-1.release
17.2.0, 16.2.0

Rook

1.13.5-19
17.2.4, 16.2.4
1.13.5-18
17.2.3, 16.2.3
1.13.5-16
16.2.2
1.12.10-21
17.1.7, 16.1.7
1.13.5-16
16.2.1
1.12.10-21
17.1.6, 16.1.6
1.13.5-15
17.2.0, 16.2.0

Logging, monitoring, and alerting

StackLight


The following table outlines the compatibility matrix for the Container Cloud release series 2.26.x.

Container Cloud compatibility matrix 2.26.x

Release

Container Cloud

2.26.5

2.26.4

2.26.3

2.26.2

2.26.1

2.26.0

Release history

Release date

June 18, 2024

May 20, 2024

Apr 29, 2024

Apr 08, 2024

Mar 20, 2024

Mar 04, 2024

Major Cluster releases (managed)

17.1.0 +
MOSK 24.1
MKE 3.7.5

17.0.0 +
MOSK 23.3
MKE 3.7.1

16.1.0
MKE 3.7.5

16.0.0
MKE 3.7.1

Patch Cluster releases (managed)

17.1.x + MOSK 24.1.x

17.1.5+24.1.5
17.1.4+24.1.4
17.1.3+24.1.3
17.1.2+24.1.2
17.1.1+24.1.1

17.1.4+24.1.4
17.1.3+24.1.3
17.1.2+24.1.2
17.1.1+24.1.1


17.1.3+24.1.3
17.1.2+24.1.2
17.1.1+24.1.1



17.1.2+24.1.2
17.1.1+24.1.1




17.1.1+24.1.1




17.0.x + MOSK 23.3.x

17.0.4+23.3.4
17.0.4+23.3.4
17.0.4+23.3.4
17.0.4+23.3.4
17.0.4+23.3.4
17.0.4+23.3.4
17.0.3+23.3.3
17.0.2+23.3.2
17.0.1+23.3.1

16.1.x

16.1.5
16.1.4
16.1.3
16.1.2
16.1.1

16.1.4
16.1.3
16.1.2
16.1.1


16.1.3
16.1.2
16.1.1



16.1.2
16.1.1




16.1.1




16.0.x

16.0.4
16.0.4
16.0.4
16.0.4
16.0.4
16.0.4
16.0.3
16.0.2
16.0.1

Fully managed cluster

Mirantis Kubernetes Engine (MKE)

3.7.8
17.1.5, 16.1.5
3.7.8
17.1.4, 16.1.4
3.7.7
17.1.3, 16.1.3
3.7.6
17.1.2, 16.1.2
3.7.5
17.1.1, 16.1.1
3.7.5
17.1.0, 16.1.0

Attached managed cluster

MKE 7

3.6.8
19.1.0
3.6.1
19.0.0
3.5.5
18.1.0
3.5.3
18.0.0
3.6.8
19.1.0
3.6.1
19.0.0
3.5.5
18.1.0
3.5.3
18.0.0
3.6.8
19.1.0
3.6.1
19.0.0
3.5.5
18.1.0
3.5.3
18.0.0
3.6.8
19.1.0
3.6.1
19.0.0
3.5.5
18.1.0
3.5.3
18.0.0
3.6.8
19.1.0
3.6.1
19.0.0
3.5.5
18.1.0
3.5.3
18.0.0
3.6.8
19.1.0
3.6.1
19.0.0
3.5.5
18.1.0
3.5.3
18.0.0

Container orchestration

Kubernetes

1.27 17.1.x, 16.1.x

1.27 17.1.x, 16.1.x

1.27 17.1.x, 16.1.x

1.27 17.1.x, 16.1.x

1.27 17.1.x, 16.1.x

1.27 17.1.x, 16.1.x

Container runtime

Mirantis Container Runtime (MCR)

23.0.9 17.1.x, 16.1.x 2

23.0.9 17.1.x, 16.1.x 2

23.0.9 17.1.x, 16.1.x 2

23.0.9 17.1.x, 16.1.x 2

23.0.9 17.1.x, 16.1.x

23.0.9 17.1.x, 16.1.x

OS distributions

Ubuntu

20.04

20.04

20.04

20.04

20.04

20.04

Infrastructure platform

Bare metal 8

kernel 5.15.0-107-generic
kernel 5.15.0-105-generic
kernel 5.15.0-102-generic
kernel 5.15.0-101-generic
kernel 5.15.0-97-generic
kernel 5.15.0-92-generic

MOSK Yoga or Antelope with OVS 3

OpenStack (Octavia)
Queens
Yoga
Antelope
Queens
Yoga
Antelope
Queens
Yoga
Antelope
Queens
Yoga
Antelope
Queens
Yoga
Antelope
Queens
Yoga
Antelope

VMware vSphere 5

7.0, 6.7

7.0, 6.7

7.0, 6.7

7.0, 6.7

7.0, 6.7

7.0, 6.7

Software defined storage

Ceph

17.2.7-13.cve
17.1.5, 16.1.5
17.2.7-12.cve
17.1.4, 16.1.4
17.2.7-11.cve
17.1.3, 16.1.3
17.2.7-10.release
17.1.2, 16.1.2
17.2.7-9.release
17.1.1, 16.1.1
17.2.7-8.release
17.1.0, 16.1.0

Rook

1.12.10-19
17.1.5, 16.1.5
1.12.10-18
17.1.4, 16.1.4
1.12.10-17
17.1.3, 16.1.3
1.12.10-16
17.1.2, 16.1.2
1.12.10-14
17.1.1, 16.1.1
1.12.10-13
17.1.0, 16.1.0

Logging, monitoring, and alerting

StackLight


The following table outlines the compatibility matrix for the Container Cloud release series 2.25.x.

Container Cloud compatibility matrix 2.25.x

Release

Container Cloud

2.25.4

2.25.3

2.25.2

2.25.1

2.25.0

Release history

Release date

Jan 10, 2024

Dec 18, 2023

Dec 05, 2023

Nov 27, 2023

Nov 06, 2023

17.0.0 +
MOSK 23.3
MKE 3.7.1

16.0.0
MKE 3.7.1

15.0.1 +
MOSK 23.2
MKE 3.6.5

14.1.0 1
MKE 3.6.6

14.0.1
MKE 3.6.5

12.7.0 +
MOSK 23.1
MKE 3.5.7

11.7.0
MKE 3.5.7

Patch Cluster releases (managed)

17.0.x + MOSK 23.3.x

17.0.4+23.3.4
17.0.3+23.3.3
17.0.2+23.3.2
17.0.1+23.3.1

17.0.3+23.3.3
17.0.2+23.3.2
17.0.1+23.3.1


17.0.2+23.3.2
17.0.1+23.3.1



17.0.1+23.3.1

16.0.x

16.0.4
16.0.3
16.0.2
16.0.1

16.0.3
16.0.2
16.0.1


16.0.2
16.0.1



16.0.1

15.0.x + MOSK 23.2.x

15.0.4+23.2.3

15.0.4+23.2.3

15.0.4+23.2.3

15.0.4+23.2.3

15.0.4+23.2.3

14.0.x

14.0.4

14.0.4

14.0.4

14.0.4

14.0.4

Fully managed cluster

Mirantis Kubernetes Engine (MKE)

3.7.3
Since 17.0.3, 16.0.3
3.7.2
Since 17.0.1, 16.0.1
3.7.1
17.0.0, 16.0.0
3.7.3
Since 17.0.3, 16.0.3
3.7.2
Since 17.0.1, 16.0.1
3.7.1
17.0.0, 16.0.0
3.7.2
Since 17.0.1, 16.0.1
3.7.1
17.0.0, 16.0.0
3.7.2
Since 17.0.1, 16.0.1
3.7.1
17.0.0, 16.0.0
3.7.1
17.0.0, 16.0.0

Attached managed cluster

MKE 7

3.6.8
19.1.0
3.6.1
19.0.0
3.5.5
18.1.0
3.5.3
18.0.0
3.6.8
19.1.0
3.6.1
19.0.0
3.5.5
18.1.0
3.5.3
18.0.0
3.6.8
19.1.0
3.6.1
19.0.0
3.5.5
18.1.0
3.5.3
18.0.0

Container orchestration

Kubernetes

1.27 17.0.x, 16.0.x

1.27 17.0.x, 16.0.x

1.27 17.0.x, 16.0.x

1.27 17.0.x, 16.0.x

1.27 17.0.0, 16.0.0

Container runtime

Mirantis Container Runtime (MCR)

23.0.7 17.0.x, 16.0.x

23.0.7 17.0.x, 16.0.x

23.0.7 17.0.x, 16.0.x

23.0.7 17.0.x, 16.0.x

23.0.7 17.0.0, 16.0.0

OS distributions

Ubuntu

20.04

20.04

20.04

20.04

20.04

Infrastructure platform

Bare metal 8

kernel 5.15.0-86-generic

kernel 5.15.0-86-generic

kernel 5.15.0-86-generic

kernel 5.15.0-86-generic

kernel 5.15.0-86-generic

MOSK Yoga or Antelope with Tungsten Fabric 3

MOSK Yoga or Antelope with OVS 3

OpenStack (Octavia)
Queens
Yoga
Antelope
Queens
Yoga
Antelope
Queens
Yoga
Antelope
Queens
Yoga
Antelope
Queens
Yoga
Antelope

VMware vSphere 5

7.0, 6.7

7.0, 6.7

7.0, 6.7

7.0, 6.7

7.0, 6.7

Software defined storage

Ceph

17.2.6-8.cve
Since 17.0.3, 16.0.3
17.2.6-5.cve
17.0.2, 16.0.2
17.2.6-2.cve
17.0.1, 16.0.1
17.2.6-cve-1
17.0.0, 16.0.0, 14.1.0
17.2.6-8.cve
17.0.3, 16.0.3
17.2.6-5.cve
17.0.2, 16.0.2
17.2.6-2.cve
17.0.1, 16.0.1
17.2.6-cve-1
17.0.0, 16.0.0, 14.1.0
17.2.6-5.cve
17.0.2, 16.0.2
17.2.6-2.cve
17.0.1, 16.0.1
17.2.6-cve-1
17.0.0, 16.0.0, 14.1.0
17.2.6-2.cve
17.0.1, 16.0.1
17.2.6-cve-1
17.0.0, 16.0.0, 14.1.0
17.2.6-cve-1
17.0.0, 16.0.0, 14.1.0

Rook

1.11.11-22
17.0.4, 16.0.4
1.11.11-21
17.0.3, 16.0.3
1.11.11-17
17.0.2, 16.0.2
1.11.11-15
17.0.1, 16.0.1
1.11.11-13
17.0.0, 16.0.0, 14.1.0
1.11.11-21
17.0.3, 16.0.3
1.11.11-17
17.0.2, 16.0.2
1.11.11-15
17.0.1, 16.0.1
1.11.11-13
17.0.0, 16.0.0, 14.1.0
1.11.11-17
17.0.2, 16.0.2
1.11.11-15
17.0.1, 16.0.1
1.11.11-13
17.0.0, 16.0.0, 14.1.0
1.11.11-15
17.0.1, 16.0.1
1.11.11-13
17.0.0, 16.0.0, 14.1.0
1.11.11-13
17.0.0, 16.0.0, 14.1.0

Logging, monitoring, and alerting

StackLight

The following table outlines the compatibility matrix for the Container Cloud release series 2.24.x.

Container Cloud compatibility matrix 2.24.x

Release

Container Cloud

2.24.5

2.24.4

2.24.3

2.24.2

2.24.0
2.24.1 0

Release history

Release date

Sep 26, 2023

Sep 14, 2023

Aug 29, 2023

Aug 21, 2023

Jul 20, 2023
Jul 27, 2023

Major Cluster releases (managed)

15.0.1 +
MOSK 23.2
MKE 3.6.5

14.0.1
MKE 3.6.5

14.0.0
MKE 3.6.5

12.7.0 +
MOSK 23.1
MKE 3.5.7

11.7.0
MKE 3.5.7

Patch Cluster releases (managed)

15.0.x + MOSK 23.2.x

15.0.4+23.2.3
15.0.3+23.2.2
15.0.2+23.2.1

15.0.3+23.2.2
15.0.2+23.2.1


15.0.2+23.2.1

14.0.x

14.0.4
14.0.3
14.0.2

14.0.3
14.0.2


14.0.2

Managed cluster

Mirantis Kubernetes Engine (MKE)

3.6.6
Since 15.0.2, 14.0.2
3.6.5
15.0.1, 14.0.1
3.6.6
Since 15.0.2, 14.0.2
3.6.5
15.0.1, 14.0.1
3.6.6
15.0.2, 14.0.2
3.6.5
15.0.1, 14.0.1
3.6.5
15.0.1, 14.0.1
3.6.5
14.0.0

Container orchestration

Kubernetes

1.24
15.0.x, 14.0.x
1.24
15.0.x, 14.0.x
1.24
15.0.x, 14.0.x
1.24
15.0.1, 14.0.1
1.24
14.0.0

Container runtime

Mirantis Container Runtime (MCR)

20.10.17
15.0.x, 14.0.x
20.10.17
15.0.x, 14.0.x
20.10.17 2
15.0.x, 14.0.x
20.10.17
15.0.1, 14.0.1
20.10.17
14.0.0

OS distributions

Ubuntu

20.04

20.04

20.04

20.04

20.04

Infrastructure platform

Bare metal

kernel 5.4.0-150-generic

kernel 5.4.0-150-generic

kernel 5.4.0-150-generic

kernel 5.4.0-150-generic

kernel 5.4.0-150-generic

MOSK Yoga or Antelope with Tungsten Fabric 3

MOSK Yoga or Antelope with OVS 3

OpenStack (Octavia)
Queens
Yoga
Queens
Yoga
Queens
Yoga
Queens
Yoga
Queens
Yoga

VMware vSphere 5

7.0, 6.7

7.0, 6.7

7.0, 6.7

7.0, 6.7

7.0, 6.7

Software defined storage

Ceph 6

17.2.6-cve-1 Since 15.0.2, 14.0.2
17.2.6-rel-5 15.0.1, 14.0.1
17.2.6-cve-1
Since 15.0.2, 14.0.2
17.2.6-rel-5
15.0.1, 14.0.1
17.2.6-cve-1
15.0.2, 14.0.2
17.2.6-rel-5
15.0.1, 14.0.1
17.2.6-rel-5
17.2.6-rel-5
16.2.11-cve-4
16.2.11

Rook 6

1.11.4-12
Since 15.0.3, 14.0.3
1.11.4-11
15.0.2, 14.0.2
1.11.4-10
15.0.1, 14.0.1
1.11.4-12
15.0.3, 14.0.3
1.11.4-11
15.0.2, 14.0.2
1.11.4-10
15.0.1, 14.0.1
1.11.4-11
15.0.2, 14.0.2
1.11.4-10
15.0.1, 14.0.1
1.11.4-10
1.11.4-10
1.10.10-10
1.0.0-20230120144247

Logging, monitoring, and alerting

StackLight

The following table outlines the compatibility matrix for the Container Cloud release series 2.23.x.

Container Cloud compatibility matrix 2.23.x

Release

Container Cloud

2.23.5

2.23.4

2.23.3

2.23.2

2.23.1

2.23.0

Release history

Release date

Jun 05, 2023

May 22, 2023

May 04, 2023

Apr 20, 2023

Apr 04, 2023

Mar 07, 2023

Major Cluster releases (managed)

12.7.0 +
MOSK 23.1 MKE 3.5.7

12.5.0 +
MOSK 22.5 MKE 3.5.5

11.7.0
MKE 3.5.7

11.6.0
MKE 3.5.5

Patch Cluster releases (managed)

12.7.x + MOSK 23.1.x

12.7.4 + 23.1.4
12.7.3 + 23.1.3
12.7.2 + 23.1.2
12.7.1 + 23.1.1

12.7.3 + 23.1.3
12.7.2 + 23.1.2
12.7.1 + 23.1.1


12.7.2 + 23.1.2
12.7.1 + 23.1.1



12.7.1 + 23.1.1

11.7.x

11.7.4
11.7.3
11.7.2
11.7.1

11.7.3
11.7.2
11.7.1


11.7.2
11.7.1



11.7.1

Managed cluster

Mirantis Kubernetes Engine (MKE)

3.5.7 12.7.x, 11.7.x

3.5.7 12.7.x, 11.7.x

3.5.7 12.7.x, 11.7.x

3.5.7 12.7.x, 11.7.x

3.5.7 12.7.0, 11.7.0

3.5.7 11.7.0

Container orchestration

Kubernetes

1.21 12.7.x, 11.7.x

1.21 12.7.x, 11.7.x

1.21 12.7.x, 11.7.x

1.21 12.7.x, 11.7.x

1.21 12.7.0, 11.7.0

1.21 12.5.0, 11.7.0

Container runtime

Mirantis Container Runtime (MCR) 2

20.10.13

20.10.13

20.10.13

20.10.13

20.10.13

20.10.13

OS distributions

Ubuntu

20.04

20.04

20.04

20.04

20.04

20.04

Infrastructure platform

Bare metal

kernel 5.4.0-137-generic

kernel 5.4.0-137-generic

kernel 5.4.0-137-generic

kernel 5.4.0-137-generic

kernel 5.4.0-137-generic

kernel 5.4.0-137-generic

MOSK Victoria or Yoga with Tungsten Fabric 3

MOSK Victoria or Yoga with OVS 3

OpenStack (Octavia)
Queens
Victoria
Yoga
Queens
Victoria
Yoga
Queens
Victoria
Yoga
Queens
Victoria
Yoga
Queens
Victoria
Yoga
Queens
Victoria
Yoga

VMware vSphere 5

7.0, 6.7

7.0, 6.7

7.0, 6.7

7.0, 6.7

7.0, 6.7

7.0, 6.7

Software defined storage

Ceph 6

16.2.11-cve-4
16.2.11-cve-2
16.2.11
16.2.11-cve-4
16.2.11-cve-2
16.2.11
16.2.11-cve-4
16.2.11-cve-2
16.2.11

16.2.11-cve-2
16.2.11


16.2.11


16.2.11

Rook 6

1.10.10-10
1.10.10-9
1.0.0-20230120144247
1.10.10-10
1.10.10-9
1.0.0-20230120144247
1.10.10-10
1.10.10-9
1.0.0-20230120144247

1.10.10-9
1.0.0-20230120144247


1.0.0-20230120144247


1.0.0-20230120144247

Logging, monitoring, and alerting

StackLight

0

Container Cloud 2.23.5 or 2.24.0 automatically upgrades to the 2.24.1 patch release containing several hot fixes.

1

The major Cluster release 14.1.0 is dedicated to the vSphere provider only. This is the last Cluster release for the vSphere provider based on MCR 20.10 and MKE 3.6.6 with Kubernetes 1.24.

Container Cloud 2.25.1 introduces the patch Cluster release 16.0.1 that supports the vSphere provider on MCR 23.0.7 and MKE 3.7.2 with Kubernetes 1.27. For details, see External vSphere CCM with CSI supporting vSphere 6.7 on Kubernetes 1.27.

2(1,2,3,4,5,6)
  • In Container Cloud 2.26.2, docker-ee-cli is updated to 23.0.10 for MCR 23.0.9 to fix several CVEs.

  • In Container Cloud 2.24.3, docker-ee-cli is updated to 20.10.18 for MCR 20.10.17 to fix the following CVEs: CVE-2023-28840, CVE-2023-28642, CVE-2022-41723.

3(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15)
  • OpenStack Antelope is supported as TechPreview since MOSK 23.3.

  • A Container Cloud cluster based on MOSK Yoga or Antelope with Tungsten Fabric is supported as TechPreview since Container Cloud 2.25.1. Since Container Cloud 2.26.0, support for this configuration is suspended. If you still require this configuration, contact Mirantis support for further information.

  • OpenStack Victoria is supported until September 2023. MOSK 23.2 is the last release version where OpenStack Victoria packages are updated.

    If you have not already upgraded your OpenStack version to Yoga, Mirantis highly recommends doing this during the course of the MOSK 23.2 series. For details, see MOSK documentation: Upgrade OpenStack.

4(1,2,3,4,5,6)

Only Cinder API V3 is supported.

5(1,2,3,4,5)
  • Since Container Cloud 2.27.3 (Cluster release 16.2.3), the VMware vSphere configuration is unsupported. For details, see Deprecation notes.

  • VMware vSphere is supported on RHEL 8.7 or Ubuntu 20.04.

  • RHEL 8.7 is generally available since Cluster releases 16.0.0 and 14.1.0. Before these Cluster releases, it is supported within the Technology Preview features scope.

  • For Ubuntu deployments, Packer builds a vSphere virtual machine template that is based on Ubuntu 20.04 with kernel 5.15.0-116-generic. If you build a VM template manually, we recommend installing the same kernel version 5.15.0-116-generic.

6(1,2,3,4)
  • Ceph Pacific supported in 2.23.0 is automatically updated to Quincy during cluster update to 2.24.0.

  • Ceph Pacific 16.2.11 and Rook 1.0.0-20230120144247 apply to major Cluster releases 12.7.0 and 11.7.0 only.

7(1,2,3)

Attachment of non Container Cloud based MKE clusters is supported only for vSphere-based management clusters on Ubuntu 20.04. Since Container Cloud 2.27.3 (Cluster release 16.2.3), the vSphere-based configuration is unsupported. For details, see Deprecation notes.

8(1,2,3,4)

The kernel version of the host operating system is validated by Mirantis and confirmed to be working for the supported use cases. If you use custom kernel versions or third-party vendor-provided kernels, such as FIPS-enabled ones, you assume full responsibility for validating the compatibility of components in such environments.

9(1,2,3,4,5,6,7,8,9,10,11,12)
  • On non-MOSK clusters, Ubuntu 22.04 is installed by default on management and managed clusters. Ubuntu 20.04 is not supported.

  • On MOSK clusters:

    • Since Container Cloud 2.28.0 (Cluster releases 17.3.0), Ubuntu 22.04 is generally available for managed clusters. All existing deployments based on Ubuntu 20.04 must be upgraded to 22.04 within the course of 2.28.x. Otherwise, update of managed clusters to 2.29.0 will become impossible and management cluster update to 2.29.1 will be blocked.

    • Before Container Cloud 2.28.0 (Cluster releases 17.2.0, 16.2.0, or earlier), Ubuntu 22.04 is installed by default on management clusters only. And Ubuntu 20.04 is the only supported distribution for managed clusters.

10(1,2,3,4,5,6,7,8)

In Container Cloud 2.27.1, docker-ee-cli is updated to 23.0.13 for MCR 23.0.11 and 23.0.9 to fix several CVEs.

See also

Release Notes

Container Cloud web UI browser compatibility

The Container Cloud web UI runs in the browser, separate from any backend software. As such, Mirantis aims to support browsers separately from the backend software in use, although each Container Cloud release is tested with specific browser versions.

Mirantis currently supports the following web browsers for the Container Cloud web UI:

Browser

Supported version

Release date

Supported operating system

Firefox

94.0 or newer

November 2, 2021

Windows, macOS

Google Chrome

96.0.4664 or newer

November 15, 2021

Windows, macOS

Microsoft Edge

95.0.1020 or newer

October 21, 2021

Windows

Caution

This table does not apply to third-party web UIs such as the StackLight or Keycloak endpoints that are available through the Container Cloud web UI. Refer to the official documentation of the corresponding third-party component for details about its supported browser versions.

To ensure the best user experience, Mirantis recommends that you use the latest version of any of the supported browsers. The use of other browsers or older versions of the supported browsers can result in rendering issues and can even lead to glitches and crashes if the browser does not support some JavaScript language features or browser web APIs that the Container Cloud web UI relies on.

Important

Mirantis does not tie browser support to any particular Container Cloud release.

Mirantis strives to leverage the latest in browser technology to build more performant client software, as well as ensuring that our customers benefit from the latest browser security updates. To this end, our strategy is to regularly move our supported browser versions forward, while also lagging behind the latest releases by approximately one year to give our customers a sufficient upgrade buffer.

See also

Release Notes

Release Notes

Major and patch versions update path

The primary distinction between major and patch product versions lies in the fact that major release versions introduce new functionalities, whereas patch release versions predominantly offer minor product enhancements, mostly CVE resolutions for your clusters.

Depending on your deployment needs, you can either update only between major Cluster releases or apply patch updates between major releases. Applying patch updates ensures that you receive security fixes as soon as they become available, though be prepared to update your cluster frequently, approximately once every three weeks. Alternatively, you can update only between major Cluster releases because each subsequent major Cluster release includes the patch Cluster release updates of the previous major Cluster release.

Releases summary

Container Cloud release

Release date

Supported Cluster releases

Summary

2.29.0

Mar 11, 2025

  • Improvements in the CIS Benchmark compliance for Ubuntu Linux 22.04 LTS v2.0.0 L1 Server

  • Support for MKE 3.7.19

  • Support for MCR 25.0.8

  • Switch of the default container runtime from Docker to containerd

  • BareMetalHostInventory instead of BareMetalHost

  • Validation of the Subnet object changes against allocated IP addresses

  • Improvements in calculation of update estimates using ClusterUpdatePlan

2.28.5

Feb 03, 2025

Container Cloud 2.28.5 is the fifth patch release of the 2.28.x release series that introduces the following updates:

  • Support for the patch Cluster releases 16.3.5 and 17.3.5 that represent MOSK patch release 24.3.2.

  • Support for Mirantis Kubernetes Engine 3.7.18 and Mirantis Container Runtime 23.0.15, which includes containerd 1.6.36.

  • Optional migration of container runtime from Docker to containerd.

  • Bare metal: update of Ubuntu mirror to ubuntu-2025-01-08-003900 along with update of minor kernel version to 5.15.0-130-generic.

  • Security fixes for CVEs in images.

2.28.4

Jan 06, 2025

Container Cloud 2.28.4 is the fourth patch release of the 2.28.x release series that introduces the following updates:

  • Support for the patch Cluster releases 16.3.4 and 17.3.4 that represent MOSK patch release 24.3.1.

  • Support for Mirantis Kubernetes Engine 3.7.17 and Mirantis Container Runtime 23.0.15, which includes containerd 1.6.36.

  • Optional migration of container runtime from Docker to containerd.

  • Bare metal: update of Ubuntu mirror to ubuntu-2024-12-05-003900 along with update of minor kernel version to 5.15.0-126-generic.

  • Security fixes for CVEs in images.

  • OpenStack provider: suspension of support for cluster deployment and update

2.28.3

Dec 09, 2024

Container Cloud 2.28.3 is the third patch release of the 2.28.x release series that introduces the following updates:

  • Support for the patch Cluster release 16.3.3.

  • Support for the patch Cluster releases 16.2.7 and 17.2.7 that represent MOSK patch release 24.2.5.

  • Bare metal: update of Ubuntu mirror to ubuntu-2024-11-18-003900 along with update of minor kernel version to 5.15.0-125-generic.

  • Security fixes for CVEs in images.

2.28.2

Nov 18, 2024

Container Cloud 2.28.2 is the second patch release of the 2.28.x release series that introduces the following updates:

  • Support for the patch Cluster release 16.3.2.

  • Support for the patch Cluster releases 16.2.6 and 17.2.6 that represent MOSK patch release 24.2.4.

  • Support for MKE 3.7.16.

  • Bare metal: update of Ubuntu mirror to ubuntu-2024-10-28-012906 along with update of minor kernel version to 5.15.0-124-generic.

  • Security fixes for CVEs in images.

2.28.1

Oct 30, 2024

Container Cloud 2.28.1 is the first patch release of the 2.28.x release series that introduces the following updates:

  • Support for the patch Cluster release 16.3.1.

  • Support for the patch Cluster releases 16.2.5 and 17.2.5 that represent MOSK patch release 24.2.3.

  • Support for MKE 3.7.15.

  • Bare metal: update of Ubuntu mirror to ubuntu-2024-10-14-013948 along with update of minor kernel version to 5.15.0-122-generic.

  • Security fixes for CVEs in images.

2.28.0

Oct 16, 2024

  • General availability for Ubuntu 22.04 on MOSK clusters

  • Improvements in the CIS Benchmark compliance for Ubuntu Linux 22.04 LTS v2.0.0 L1 Server

  • Support for MKE 3.7.12 on clusters following the major update path

  • Support for MCR 23.0.14

  • Update group for controller nodes

  • Reboot of machines using update groups

  • Amendments for the ClusterUpdatePlan object

  • Refactoring of delayed auto-update of a management cluster

  • Self-diagnostics for management and managed clusters

  • Configuration of groups in auditd

  • Container Cloud web UI enhancements for the bare metal provider

  • Day-2 operations for bare metal:

    • Updating modules

    • Configuration enhancements for modules

  • StackLight:

    • Monitoring of LCM issues

    • Refactoring of StackLight expiration alerts

  • Documentation enhancements

- Cluster release is deprecated and will become unsupported in one of the following Container Cloud releases.

Container Cloud releases

This section outlines the release notes for the Mirantis Container Cloud GA release. Within the scope of the Container Cloud GA release, major releases are being published continuously with new features, improvements, and critical issues resolutions to enhance the Container Cloud GA version. Between major releases, patch releases that incorporate fixes for CVEs of high and critical severity are being delivered. For details, see Container Cloud releases, Cluster releases (managed), and Patch releases.

Once a new Container Cloud release is available, a management cluster automatically upgrades to a newer consecutive release unless this cluster contains managed clusters with a Cluster release unsupported by the newer Container Cloud release. For more details about the Container Cloud release mechanism, see Reference Architecture: Release Controller.

2.29.0 (current)

The Mirantis Container Cloud major release 2.29.0:

  • Introduces support for the Cluster release 17.4.0 that is based on the Cluster release 16.4.0 and represents Mirantis OpenStack for Kubernetes (MOSK) 25.1.

  • Introduces support for the Cluster release 16.4.0 that is based on Mirantis Container Runtime (MCR) 25.0.8 and Mirantis Kubernetes Engine (MKE) 3.7.19 with Kubernetes 1.27.

  • Does not support greenfield deployments on deprecated Cluster releases of the 17.3.x and 16.3.x series. Use the latest available Cluster releases of the series instead.

    Caution

    Make sure to update the Cluster release version of your managed cluster before the current Cluster release version becomes unsupported by a new Container Cloud release version. Otherwise, Container Cloud stops auto-upgrade and eventually Container Cloud itself becomes unsupported.

This section outlines release notes for the Container Cloud release 2.29.0.

Enhancements

This section outlines new features and enhancements introduced in the Container Cloud release 2.29.0.

  • For the list of enhancements delivered with the Cluster releases introduced by Container Cloud 2.29.0, see 17.4.0 and 16.4.0.

  • For the list of enhancements delivered with MOSK 25.1 introduced together with Container Cloud 2.29.0, see MOSK release notes 25.1: New features.

BareMetalHostInventory instead of BareMetalHost

To allow the operator to use the GitOps approach, implemented the BareMetalHostInventory resource that must be used instead of BareMetalHost for adding and modifying the configuration of bare metal servers.

The BareMetalHostInventory resource monitors and manages the state of a bare metal server and is created for each Machine with all information about the machine hardware configuration.

Each BareMetalHostInventory object is synchronized with an automatically created BareMetalHost object, which is now used for internal purposes of the Container Cloud private API.

Caution

Any change in the BareMetalHost object will be overwritten by BareMetalHostInventory.

For any existing BareMetalHost object, a BareMetalHostInventory object is created automatically during cluster update.

Caution

While the Cluster release of the management cluster is 16.4.0, BareMetalHostInventory operations are allowed to m:kaas@management-admin only. Once the management cluster is updated to the Cluster release 16.4.1 (or later), this limitation will be lifted.
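For example, to review BareMetalHostInventory objects and compare them with the automatically created BareMetalHost objects, you can use standard kubectl commands against the management cluster. The resource names below are assumptions based on the object kind; verify them with kubectl api-resources on your cluster:

kubectl --kubeconfig <mgmtKubeconfig> api-resources | grep -i baremetalhost
kubectl --kubeconfig <mgmtKubeconfig> get baremetalhostinventories -A
kubectl --kubeconfig <mgmtKubeconfig> -n <projectName> get baremetalhostinventory <hostName> -o yaml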

Validation of the Subnet object changes against allocated IP addresses

Implemented a validation of the Subnet object changes against already allocated IP addresses. This validation is performed by the Admission Controller, which now blocks changes to the Subnet object that would leave already allocated IP addresses outside the allocatable IP address space formed by the CIDR address and the include/exclude address ranges.
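For reference, a minimal Subnet sketch that illustrates the address space this validation operates on. The apiVersion and the includeRanges/excludeRanges field names are assumptions to verify against the API schema of your cluster:

apiVersion: ipam.mirantis.com/v1alpha1
kind: Subnet
metadata:
  name: example-subnet
  namespace: example-project
spec:
  cidr: 10.0.34.0/24
  # Addresses allocatable from this subnet
  includeRanges:
  - 10.0.34.100-10.0.34.200
  # Addresses excluded from allocation
  excludeRanges:
  - 10.0.34.150-10.0.34.160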

Improvements in calculation of update estimates using ClusterUpdatePlan

Improved calculation of update estimates for a managed cluster update that is controlled by the ClusterUpdatePlan object. Each step of ClusterUpdatePlan now provides more precise estimates that are based on the following calculations:

  • The amount and type of components updated between releases during patch updates

  • The amount of nodes with particular roles in the OpenStack cluster

  • The number of nodes and storage used in the Ceph cluster

Also, the ClusterUpdatePlan object now contains the releaseNotes field that links to MOSK release notes of the target release.
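For example, to review the per-step estimates and the releaseNotes link, inspect the ClusterUpdatePlan object of your managed cluster. The object and project names below are placeholders:

kubectl --kubeconfig <mgmtKubeconfig> -n <projectName> get clusterupdateplan
kubectl --kubeconfig <mgmtKubeconfig> -n <projectName> get clusterupdateplan <planName> -o yaml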

Switch of the default container runtime from Docker to containerd

Switched the default container runtime from Docker to containerd on greenfield management and managed clusters. The use of containerd allows for better Kubernetes performance and component update without pod restart when applying fixes for CVEs.

On existing clusters, perform the mandatory migration from Docker to containerd in the scope of Container Cloud 2.29.x. Otherwise, the management cluster update to Container Cloud 2.30.0 will be blocked.

Important

Container runtime migration involves machine cordoning and draining.

Addressed issues

The following issues have been addressed in the Mirantis Container Cloud release 2.29.0 along with the Cluster releases 17.4.0 and 16.4.0. For the list of MOSK addressed issues, see MOSK release notes 25.1: Addressed issues.

Note

This section provides descriptions of issues addressed since the last Container Cloud patch release 2.28.5.

For details on addressed issues in earlier patch releases since 2.28.0, which are also included into the major release 2.29.0, refer to 2.28.x patch releases.

  • [47263] [StackLight] Fixed the issue with configuration inconsistencies for requests and limits between the deprecated resourcesPerClusterSize and resources parameters.

  • [44193] [StackLight] Fixed the issue with OpenSearch reaching the 85% disk usage watermark on High Availability clusters that use Local Volume Provisioner, which caused the OpenSearch cluster state to switch to Warning or Critical.

  • [46858] [Container Cloud web UI] Fixed the issue that prevented the drop-down menu from displaying the full list of allowed node labels.

  • [39437] [LCM] Fixed the issue that caused a failure to replace a master node, with the Kubelet's NodeReady condition is Unknown message appearing in the machine status on the remaining master nodes.

Known issues

This section lists known issues with workarounds for the Mirantis Container Cloud release 2.29.0 including the Cluster releases 17.4.0 and 16.4.0. For the list of MOSK known issues, see MOSK release notes 25.1: Known issues.

For other issues that can occur while deploying and operating a Container Cloud cluster, see Deployment Guide: Troubleshooting and Operations Guide: Troubleshooting.

Note

This section also outlines still valid known issues from previous Container Cloud releases.

Bare metal
[50287] BareMetalHost with a Redfish BMC address is stuck on registering phase

During addition, a bare metal host containing a Redfish Baseboard Management Controller address with the following exemplary configuration may get stuck in the registering phase:

bmc:
  address: redfish://192.168.1.150/redfish/v1/Systems/1

Workaround:

  1. Open the ironic-config configmap for editing:

    KUBECONFIG=mgmt_kubeconfig kubectl -n kaas edit cm ironic-config
    
  2. In the data:ironic.conf section, add the enabled_firmware_interfaces parameter:

    data:
      ironic.conf: |
    
        [DEFAULT]
        ...
        enabled_firmware_interfaces = redfish,no-firmware
        ...
    
  3. Restart Ironic:

    KUBECONFIG=mgmt_kubeconfig kubectl -n kaas rollout restart deployment/ironic
    
[42386] A load balancer service does not obtain the external IP address

Due to the MetalLB upstream issue, a load balancer service may not obtain the external IP address.

The issue occurs when two services share the same external IP address and have the same externalTrafficPolicy value. Initially, the services have the external IP address assigned and are accessible. After modifying the externalTrafficPolicy value for both services from Cluster to Local, the first service that has been changed remains with no external IP address assigned. However, the second service, which was changed later, has the external IP assigned as expected.

To work around the issue, make a dummy change to the service object where external IP is <pending>:

  1. Identify the service that is stuck:

    kubectl get svc -A | grep pending
    

    Example of system response:

    stacklight  iam-proxy-prometheus  LoadBalancer  10.233.28.196  <pending>  443:30430/TCP
    
  2. Add an arbitrary label to the service that is stuck. For example:

    kubectl label svc -n stacklight iam-proxy-prometheus reconcile=1
    

    Example of system response:

    service/iam-proxy-prometheus labeled
    
  3. Verify that the external IP was allocated to the service:

    kubectl get svc -n stacklight iam-proxy-prometheus
    

    Example of system response:

    NAME                  TYPE          CLUSTER-IP     EXTERNAL-IP  PORT(S)        AGE
    iam-proxy-prometheus  LoadBalancer  10.233.28.196  10.0.34.108  443:30430/TCP  12d
    
[24005] Deletion of a node with ironic Pod is stuck in the Terminating state

During deletion of a manager machine running the ironic Pod from a bare metal management cluster, the following problems occur:

  • All Pods are stuck in the Terminating state

  • A new ironic Pod fails to start

  • The related bare metal host is stuck in the deprovisioning state

As a workaround, before deletion of the node running the ironic Pod, cordon and drain the node using the kubectl cordon <nodeName> and kubectl drain <nodeName> commands.
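A minimal sketch of this cordon-and-drain sequence follows; the drain flags are common defaults and may require adjustment depending on your workloads:

kubectl cordon <nodeName>
kubectl drain <nodeName> --ignore-daemonsets --delete-emptydir-data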

Ceph
[50637] Ceph creates second miracephnodedisable object during node disabling

During managed cluster update, if some node is being disabled and at the same time ceph-maintenance-controller is restarted, a second miracephnodedisable object is erroneously created for the node. As a result, the second object fails in the Cleaning state, which blocks managed cluster update.

Workaround

  1. On the affected managed cluster, obtain the list of miracephnodedisable objects:

    kubectl get miracephnodedisable -n ceph-lcm-mirantis
    

    The system response must contain one completed and one failed miracephnodedisable object for the node being disabled. For example:

    NAME                                               AGE   NODE NAME                                        STATE      LAST CHECK             ISSUE
    nodedisable-353ccad2-8f19-4c11-95c9-a783abb531ba   58m   kaas-node-91207a35-3200-41d1-9ba9-388500970981   Ready      2025-03-06T22:04:48Z
    nodedisable-58bbf563-1c76-4319-8c28-363d73a5efef   57m   kaas-node-91207a35-3200-41d1-9ba9-388500970981   Cleaning   2025-03-07T11:59:27Z   host clean up Job 'ceph-lcm-mirantis/host-cleanup-nodedisable-58bbf563-1c76-4319-8c28-363d73a5efef' is failed, check logs
    
  2. Remove the failed miracephnodedisable object. For example:

    kubectl delete miracephnodedisable -n ceph-lcm-mirantis nodedisable-58bbf563-1c76-4319-8c28-363d73a5efef
    
[50566] Ceph upgrade is very slow during patch or major cluster update

Due to the upstream Ceph issue 66717, during CVE upgrade of the Ceph daemon image of Ceph Reef 18.2.4, OSDs may start slowly and even fail the startup probe with the following describe output in the rook-ceph-osd-X pod:

 Warning  Unhealthy  57s (x16 over 3m27s)  kubelet  Startup probe failed:
 ceph daemon health check failed with the following output:
> no valid command found; 10 closest matches:
> 0
> 1
> 2
> abort
> assert
> bluefs debug_inject_read_zeros
> bluefs files list
> bluefs stats
> bluestore bluefs device info [<alloc_size:int>]
> config diff
> admin_socket: invalid command

Workaround:

Complete the following steps during every patch or major cluster update of the Cluster releases 17.2.x, 17.3.x, and 17.4.x (until Ceph 18.2.5 becomes supported):

  1. Plan extra time in the maintenance window for the patch cluster update.

    Slow starts will still impact the update procedure, but after completing the following step, the recovery process noticeably shortens without affecting the overall cluster state and data responsiveness.

  2. Select one of the following options:

    • Before the cluster update, set the noout flag:

      ceph osd set noout
      

      Once the Ceph OSDs image upgrade is done, unset the flag:

      ceph osd unset noout
      
    • Monitor the Ceph OSDs image upgrade. If the symptoms of slow start appear, set the noout flag as soon as possible. Once the Ceph OSDs image upgrade is done, unset the flag.
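If you manage Ceph flags through the Rook toolbox, one possible way to set and unset the flag is the following; the rook-ceph-tools deployment name is an assumption that depends on your Rook setup:

kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd set noout
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd unset noout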

[26441] Cluster update fails with the MountDevice failed for volume warning

Update of a managed cluster based on bare metal with Ceph enabled fails with a PersistentVolumeClaim getting stuck in the Pending state for the prometheus-server StatefulSet and the MountVolume.MountDevice failed for volume warning in the StackLight event logs.

Workaround:

  1. Verify that the description of the Pods that failed to run contain the FailedMount events:

    kubectl -n <affectedProjectName> describe pod <affectedPodName>
    

    In the command above, replace the following values:

    • <affectedProjectName> is the Container Cloud project name where the Pods failed to run

    • <affectedPodName> is a Pod name that failed to run in the specified project

    In the Pod description, identify the node name where the Pod failed to run.

  2. Verify that the csi-rbdplugin logs of the affected node contain the rbd volume mount failed: <csi-vol-uuid> is being used error. The <csi-vol-uuid> is a unique RBD volume name.

    1. Identify csiPodName of the corresponding csi-rbdplugin:

      kubectl -n rook-ceph get pod -l app=csi-rbdplugin \
      -o jsonpath='{.items[?(@.spec.nodeName == "<nodeName>")].metadata.name}'
      
    2. Output the affected csiPodName logs:

      kubectl -n rook-ceph logs <csiPodName> -c csi-rbdplugin
      
  3. Scale down the affected StatefulSet or Deployment of the Pod that fails to 0 replicas.

  4. On every csi-rbdplugin Pod, search for stuck csi-vol:

    for pod in `kubectl -n rook-ceph get pods|grep rbdplugin|grep -v provisioner|awk '{print $1}'`; do
      echo $pod
      kubectl exec -it -n rook-ceph $pod -c csi-rbdplugin -- rbd device list | grep <csi-vol-uuid>
    done
    
  5. Unmap the affected csi-vol:

    rbd unmap -o force /dev/rbd<i>
    

    The /dev/rbd<i> value is a mapped RBD volume that uses csi-vol.

  6. Delete volumeattachment of the affected Pod:

    kubectl get volumeattachments | grep <csi-vol-uuid>
    kubectl delete volumeattachment <id>
    
  7. Scale up the affected StatefulSet or Deployment back to the original number of replicas and wait until its state becomes Running.

LCM
[50768] Failure to update the MCCUpgrade object

While editing the MCCUpgrade object, the following error occurs when trying to save changes:

HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure",
"message":"Internal error occurred: failed calling webhook \"mccupgrades.kaas.mirantis.com\":
failed to call webhook: the server could not find the requested resource",
"reason":"InternalError",
"details":{"causes":[{"message":"failed calling webhook \"mccupgrades.kaas.mirantis.com\":
failed to call webhook: the server could not find the requested resource"}]},"code":500}

To work around the issue, remove the name: mccupgrades.kaas.mirantis.com entry from mutatingwebhookconfiguration:

kubectl --kubeconfig kubeconfig edit mutatingwebhookconfiguration admission-controller

Example configuration:

- admissionReviewVersions:
  - v1
  - v1beta1
  clientConfig:
    caBundle: <REDACTED>
    service:
      name: admission-controller
      namespace: kaas
      path: /mccupgrades
      port: 443
  failurePolicy: Fail
  matchPolicy: Equivalent
  name: mccupgrades.kaas.mirantis.com
  namespaceSelector: {}
  objectSelector: {}
  reinvocationPolicy: Never
  rules:
  - apiGroups:
    - kaas.mirantis.com
    apiVersions:
    - v1alpha1
    operations:
    - CREATE
    - UPDATE
    resources:
    - mccupgrades
    scope: '*'
  sideEffects: NoneOnDryRun
  timeoutSeconds: 5
[50561] The local-volume-provisioner pod switches to CrashLoopBackOff

After machine disablement and consequent re-enablement, persistent volumes (PVs) provisioned by local-volume-provisioner that are not used by any pod may cause the local-volume-provisioner pod on such a machine to switch to the CrashLoopBackOff state.

Workaround:

  1. Identify the ID of the affected local-volume-provisioner:

    kubectl -n kube-system get pods
    

    Example of system response extract:

    local-volume-provisioner-h5lrc   0/1   CrashLoopBackOff   33 (2m3s ago)   90m
    
  2. In the local-volume-provisioner logs, identify the affected PVs. For example:

    kubectl logs -n kube-system local-volume-provisioner-h5lrc | less
    

    Example of system response extract:

    E0304 23:21:31.455148    1 discovery.go:221] Failed to discover local volumes:
    5 error(s) while discovering volumes: [error creating PV "local-pv-1d04ed53"
    for volume at "/mnt/local-volumes/openstack-operator/bind-mounts/vol04":
    persistentvolumes "local-pv-1d04ed53" already exists error creating PV "local-pv-ce2dfc24"
    for volume at "/mnt/local-volumes/openstack-operator/bind-mounts/vol01":
    persistentvolumes "local-pv-ce2dfc24" already exists error creating PV "local-pv-bcb9e4bd"
    for volume at "/mnt/local-volumes/openstack-operator/bind-mounts/vol02":
    persistentvolumes "local-pv-bcb9e4bd" already exists error creating PV "local-pv-c5924ada"
    for volume at "/mnt/local-volumes/openstack-operator/bind-mounts/vol03":
    persistentvolumes "local-pv-c5924ada" already exists error creating PV "local-pv-7c7150cf"
    for volume at "/mnt/local-volumes/openstack-operator/bind-mounts/vol00":
    persistentvolumes "local-pv-7c7150cf" already exists]
    
  3. Delete all PVs that contain the already exists error in logs. For example:

    kubectl delete pv local-pv-1d04ed53
    
[31186,34132] Pods get stuck during MariaDB operations

During MariaDB operations on a management cluster, Pods may get stuck in continuous restarts with the following example error:

[ERROR] WSREP: Corrupt buffer header: \
addr: 0x7faec6f8e518, \
seqno: 3185219421952815104, \
size: 909455917, \
ctx: 0x557094f65038, \
flags: 11577. store: 49, \
type: 49

Workaround:

  1. Create a backup of the /var/lib/mysql directory on the mariadb-server Pod.

  2. Verify that other replicas are up and ready.

  3. Remove the galera.cache file for the affected mariadb-server Pod.

  4. Remove the affected mariadb-server Pod or wait until it is automatically restarted.

After Kubernetes restarts the Pod, the Pod clones the database in 1-2 minutes and restores the quorum.
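As an illustration only, the backup and cleanup steps above may look as follows; the kaas namespace, the mariadb-server-0 Pod name, and the galera.cache path are assumptions that differ per deployment:

# 1. Copy the MariaDB data directory out of the affected Pod as a backup
kubectl -n kaas cp mariadb-server-0:/var/lib/mysql ./mariadb-server-0-mysql-backup
# 2. Verify that the other replicas are up and ready
kubectl -n kaas get pods | grep mariadb
# 3. Remove the galera.cache file of the affected Pod
kubectl -n kaas exec mariadb-server-0 -- rm /var/lib/mysql/galera.cache
# 4. Delete the affected Pod and let Kubernetes restart it
kubectl -n kaas delete pod mariadb-server-0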

StackLight
[43474] Custom Grafana dashboards are corrupted

Custom Grafana panels and dashboards may be corrupted after automatic migration of deprecated Angular-based plugins to the React-based ones. For details, see MOSK Deprecation Notes: Angular plugins in Grafana dashboards and the post-update step Back up custom Grafana dashboards in Container Cloud 2.28.4 update notes.

To work around the issue, manually adjust the affected dashboards to restore their custom appearance.

Container Cloud web UI
[50181] Failure to deploy a compact cluster

A compact MOSK cluster fails to be deployed through the Container Cloud web UI due to the inability to add any label to the control plane machines and to change dedicatedControlPlane: false using the web UI.

To work around the issue, manually add the required labels using CLI. Once done, the cluster deployment resumes.

[50168] Inability to use a new project right after creation

A newly created project does not display all available tabs in the Container Cloud web UI and contains various access denied errors during the first five minutes after creation.

To work around the issue, wait five minutes after the project creation and refresh the browser.

[50140] The Ceph Clusters tab does not display Ceph cluster details

The Clusters page for the bare metal provider does not display information about the Ceph cluster in the Ceph Clusters tab and contains access denied errors.

To work around the issue, verify the Ceph cluster state through CLI. For details, see MOSK documentation: Ceph operations - Verify Ceph.
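For a quick check from CLI, one possible approach is to query Ceph health through the Rook toolbox on the affected cluster; the rook-ceph-tools deployment name is an assumption that depends on your Rook setup:

kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph -s
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph health detail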

Components versions

The following table lists major components and their versions delivered in Container Cloud 2.29.0. The components that are newly added, updated, deprecated, or removed as compared to 2.28.0, are marked with a corresponding superscript, for example, admission-controller Updated.

Component

Application/Service

Version

Bare metal Updated

ambassador

1.42.9

baremetal-dnsmasq

base-2-29-alpine-20250217104113

baremetal-operator

base-2-29-alpine-20250217104322

baremetal-provider

1.42.9

bm-collective

base-2-29-alpine-20250217104943

cluster-api-provider-baremetal

1.42.9

ironic

caracal-jammy-20250128120200

ironic-inspector

caracal-jammy-20250128120200

ironic-prometheus-exporter

0.1-20240913123302

kaas-ipam

1.42.9

kubernetes-entrypoint

1.0.1-202a68c-20250203183923

mariadb

10.6.20-jammy-20241104184039

syslog-ng

base-alpine-20250217103755

Container Cloud Updated

admission-controller

1.42.9

agent-controller

1.42.9

byo-cluster-api-controller

1.42.9

ceph-kcc-controller

1.42.9

cert-manager-controller

1.11.0-11

configuration-collector

1.42.9

event-controller

1.42.9

frontend

1.42.9

golang

1.23.6-alpine3.20

iam-controller

1.42.9

kaas-exporter

1.42.9

kproxy

1.42.9

lcm-controller

1.42.9

license-controller

1.42.9

machinepool-controller

1.42.9

nginx

1.42.9

portforward-controller

1.42.9

rbac-controller

1.42.9

registry

2.8.1-15

release-controller

1.42.9

scope-controller

1.42.9

secret-controller

1.42.9

user-controller

1.42.9

IAM Updated

iam

1.42.9

mariadb

10.6.20-jammy-20241104184039

mcc-keycloak

25.0.6-20241114073807

OpenStack Deprecated

host-os-modules-controller

1.42.9

openstack-cluster-api-controller

1.42.9

openstack-provider

1.42.9

Artifacts

This section lists the artifacts of components included in the Container Cloud release 2.29.0. The components that are newly added, updated, deprecated, or removed as compared to 2.28.0, are marked with a corresponding superscript, for example, admission-controller Updated.

Bare metal artifacts

Artifact

Component

Path

Binaries

ironic-python-agent.initramfs Updated

https://binary.mirantis.com/bm/bin/ironic/ipa/ubuntu/initramfs-caracal-jammy-debug-20250217102957

ironic-python-agent.kernel Updated

https://binary.mirantis.com/bm/bin/ironic/ipa/ubuntu/kernel-caracal-jammy-debug-20250217102957

provisioning_ansible

https://binary.mirantis.com/bm/bin/ansible/provisioning_ansible-0.1.1-167-e7a55fd.tgz

Helm charts Updated

baremetal-api

https://binary.mirantis.com/core/helm/baremetal-api-1.42.9.tgz

baremetal-operator

https://binary.mirantis.com/core/helm/baremetal-operator-1.42.9.tgz

baremetal-provider

https://binary.mirantis.com/core/helm/baremetal-provider-1.42.9.tgz

baremetal-public-api

https://binary.mirantis.com/core/helm/baremetal-public-api-1.42.9.tgz

kaas-ipam

https://binary.mirantis.com/core/helm/kaas-ipam-1.42.9.tgz

Docker images Updated

ambassador

mirantis.azurecr.io/core/external/nginx:1.42.9

baremetal-dnsmasq

mirantis.azurecr.io/bm/baremetal-dnsmasq:base-2-29-alpine-20250217104113

baremetal-operator

mirantis.azurecr.io/bm/baremetal-operator:base-2-29-alpine-20250217104322

bm-collective

mirantis.azurecr.io/bm/bm-collective:base-2-29-alpine-20250217104943

cluster-api-provider-baremetal

mirantis.azurecr.io/core/cluster-api-provider-baremetal:1.42.9

ironic

mirantis.azurecr.io/openstack/ironic:caracal-jammy-20250128120200

ironic-inspector

mirantis.azurecr.io/openstack/ironic-inspector:caracal-jammy-20250128120200

ironic-prometheus-exporter

mirantis.azurecr.io/stacklight/ironic-prometheus-exporter:0.1-20240913123302

kaas-ipam

mirantis.azurecr.io/core/kaas-ipam:1.42.9

kubernetes-entrypoint

mirantis.azurecr.io/openstack/extra/kubernetes-entrypoint:v1.0.1-202a68c-20250203183923

mariadb

mirantis.azurecr.io/general/mariadb:10.6.20-jammy-20241104184039

syslog-ng

mirantis.azurecr.io/bm/syslog-ng:base-alpine-20250217103755

Core artifacts

Artifact

Component

Path

Bootstrap tarball Updated

bootstrap-darwin

https://binary.mirantis.com/core/bin/bootstrap-darwin-1.42.9.tgz

bootstrap-linux

https://binary.mirantis.com/core/bin/bootstrap-linux-1.42.9.tgz

Helm charts Updated

admission-controller

https://binary.mirantis.com/core/helm/admission-controller-1.42.9.tgz

agent-controller

https://binary.mirantis.com/core/helm/agent-controller-1.42.9.tgz

byo-provider Removed

n/a

ceph-kcc-controller

https://binary.mirantis.com/core/helm/ceph-kcc-controller-1.42.9.tgz

cert-manager

https://binary.mirantis.com/core/helm/cert-manager-1.42.9.tgz

configuration-collector

https://binary.mirantis.com/core/helm/configuration-collector-1.42.9.tgz

credentials-controller Deprecated

https://binary.mirantis.com/core/helm/credentials-controller-1.42.9.tgz

event-controller

https://binary.mirantis.com/core/helm/event-controller-1.42.9.tgz

host-os-modules-controller

https://binary.mirantis.com/core/helm/host-os-modules-controller-1.42.9.tgz

iam-controller

https://binary.mirantis.com/core/helm/iam-controller-1.42.9.tgz

kaas-exporter

https://binary.mirantis.com/core/helm/kaas-exporter-1.42.9.tgz

kaas-public-api

https://binary.mirantis.com/core/helm/kaas-public-api-1.42.9.tgz

kaas-ui

https://binary.mirantis.com/core/helm/kaas-ui-1.42.9.tgz

lcm-controller

https://binary.mirantis.com/core/helm/lcm-controller-1.42.9.tgz

license-controller

https://binary.mirantis.com/core/helm/license-controller-1.42.9.tgz

machinepool-controller

https://binary.mirantis.com/core/helm/machinepool-controller-1.42.9.tgz

mcc-cache

https://binary.mirantis.com/core/helm/mcc-cache-1.42.9.tgz

mcc-cache-warmup

https://binary.mirantis.com/core/helm/mcc-cache-warmup-1.42.9.tgz

openstack-provider Deprecated

https://binary.mirantis.com/core/helm/openstack-provider-1.42.9.tgz

portforward-controller

https://binary.mirantis.com/core/helm/portforward-controller-1.42.9.tgz

rbac-controller

https://binary.mirantis.com/core/helm/rbac-controller-1.42.9.tgz

release-controller

https://binary.mirantis.com/core/helm/release-controller-1.42.9.tgz

scope-controller

https://binary.mirantis.com/core/helm/scope-controller-1.42.9.tgz

secret-controller

https://binary.mirantis.com/core/helm/secret-controller-1.42.9.tgz

squid-proxy Removed

n/a

user-controller

https://binary.mirantis.com/core/helm/user-controller-1.42.9.tgz

Docker images Updated

admission-controller

mirantis.azurecr.io/core/admission-controller:1.42.9

agent-controller

mirantis.azurecr.io/core/agent-controller:1.42.9

byo-cluster-api-controller Removed

n/a

ceph-kcc-controller

mirantis.azurecr.io/core/ceph-kcc-controller:1.42.9

cert-manager-controller

mirantis.azurecr.io/core/external/cert-manager-controller:v1.11.0-11

configuration-collector

mirantis.azurecr.io/core/configuration-collector:1.42.9

credentials-controller Deprecated

mirantis.azurecr.io/core/credentials-controller:1.42.9

event-controller

mirantis.azurecr.io/core/event-controller:1.42.9

frontend

mirantis.azurecr.io/core/frontend:1.42.9

host-os-modules-controller

mirantis.azurecr.io/core/host-os-modules-controller:1.42.9

iam-controller

mirantis.azurecr.io/core/iam-controller:1.42.9

kaas-exporter

mirantis.azurecr.io/core/kaas-exporter:1.42.9

kproxy

mirantis.azurecr.io/core/kproxy:1.42.9

lcm-controller

mirantis.azurecr.io/core/lcm-controller:1.42.9

license-controller

mirantis.azurecr.io/core/license-controller:1.42.9

machinepool-controller

mirantis.azurecr.io/core/machinepool-controller:1.42.9

mcc-cache-warmup

mirantis.azurecr.io/core/mcc-cache-warmup:1.42.9

nginx

mirantis.azurecr.io/core/external/nginx:1.42.9

openstack-cluster-api-controller Deprecated

mirantis.azurecr.io/core/openstack-cluster-api-controller:1.42.9

portforward-controller

mirantis.azurecr.io/core/portforward-controller:1.42.9

rbac-controller

mirantis.azurecr.io/core/rbac-controller:1.42.9

registry

mirantis.azurecr.io/lcm/registry:v2.8.1-15

release-controller

mirantis.azurecr.io/core/release-controller:1.42.9

scope-controller

mirantis.azurecr.io/core/scope-controller:1.42.9

secret-controller

mirantis.azurecr.io/core/secret-controller:1.42.9

squid-proxy Removed

n/a

user-controller

mirantis.azurecr.io/core/user-controller:1.42.9

IAM artifacts

Artifact

Component

Path

Binaries

hash-generate-darwin

http://binary.mirantis.com/iam/bin/hash-generate-0.0.1-268-3cf7f17-darwin

hash-generate-linux

http://binary.mirantis.com/iam/bin/hash-generate-0.0.1-268-3cf7f17-linux

Helm charts Updated

iam

https://binary.mirantis.com/core/helm/iam-1.42.9.tgz

Docker images

kubectl

mirantis.azurecr.io/general/kubectl:20240926142019

kubernetes-entrypoint

mirantis.azurecr.io/openstack/extra/kubernetes-entrypoint:v1.0.1-ba8ada4-20240405150338

mariadb Updated

mirantis.azurecr.io/general/mariadb:10.6.20-jammy-20241104184039

mcc-keycloak Updated

mirantis.azurecr.io/iam/mcc-keycloak:25.0.6-20241114073807

Security notes

In total, since Container Cloud 2.28.5, in 2.29.0, 736 Common Vulnerabilities and Exposures (CVE) have been fixed: 125 of critical and 611 of high severity.

The table below includes the total numbers of addressed unique and common vulnerabilities and exposures (CVE) by product component since the 2.28.5 patch release. The common CVEs are issues addressed across several images.

Addressed CVEs - summary

Product component

CVE type

Critical

High

Total

Ceph

Unique

0

6

6

Common

0

177

177

Kaas core

Unique

1

8

9

Common

88

229

317

StackLight

Unique

7

48

55

Common

37

205

242

Mirantis Security Portal

For the detailed list of fixed and existing CVEs across the Mirantis Container Cloud and MOSK products, refer to Mirantis Security Portal.

MOSK CVEs

For the number of fixed CVEs in the MOSK-related components including OpenStack and Tungsten Fabric, refer to MOSK release notes 25.1: Security notes.

Update notes

This section describes the specific actions you as a cloud operator need to complete before or after your Container Cloud cluster update to the Cluster releases 17.4.0 or 16.4.0. For details on update impact and maintenance window planning, see MOSK Update notes.

Consider the information below as a supplement to the generic update procedures published in MOSK Operations Guide: Workflow and configuration of management cluster upgrade and MOSK Cluster update.

Pre-update actions
Update managed clusters to Ubuntu 22.04

In Container Cloud 2.29.0, the Cluster release update of the Ubuntu 20.04-based managed clusters becomes impossible, and Ubuntu 22.04 becomes the only supported version of the operating system. Therefore, ensure that every node of your managed clusters is running Ubuntu 22.04 to unblock managed cluster update in Container Cloud 2.29.0.

For the update procedure, refer to Mirantis OpenStack for Kubernetes documentation: Bare metal operations - Upgrade an operating system distribution.

Warning

Management cluster update to Container Cloud 2.29.1 will be blocked if at least one node of any related managed cluster is running Ubuntu 20.04.

Note

Existing management clusters were automatically updated to Ubuntu 22.04 during cluster upgrade to the Cluster release 16.2.0 in Container Cloud 2.27.0. Greenfield deployments of management clusters are also based on Ubuntu 22.04.

Back up custom Grafana dashboards on managed clusters

In Container Cloud 2.29.0, Grafana is updated to version 11 where the following deprecated Angular-based plugins are automatically migrated to the React-based ones:

  • Graph (old) -> Time Series

  • Singlestat -> Stat

  • Stat (old) -> Stat

  • Table (old) -> Table

  • Worldmap -> Geomap

This migration may corrupt custom Grafana dashboards that have Angular-based panels. Therefore, if you have such dashboards on managed clusters, back them up and manually upgrade Angular-based panels before updating to the Cluster release 17.4.0 to prevent custom appearance issues after plugin migration.
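As an illustration of the backup, you can export dashboard JSON definitions through the Grafana HTTP API; the Grafana address and the API token are placeholders for your environment:

# List dashboards with their UIDs and titles
curl -sk -H "Authorization: Bearer <grafanaApiToken>" "https://<grafanaAddress>/api/search?type=dash-db"
# Export a single dashboard by UID to a JSON file
curl -sk -H "Authorization: Bearer <grafanaApiToken>" "https://<grafanaAddress>/api/dashboards/uid/<dashboardUID>" > <dashboardUID>.json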

Note

All Grafana dashboards provided by StackLight are also migrated to React automatically. For the list of default dashboards, see MOSK Operations Guide: View Grafana dashboards.

Caution

For management clusters that are updated automatically, it is important to remove all Angular-based panels and prepare the backup of custom Grafana dashboards before Container Cloud 2.29.0 is released. For details, see Post update notes in 2.28.5 release notes. Otherwise, custom dashboards using Angular-based plugins may be corrupted and must be manually restored without a backup.

Post-update actions
Start using BareMetalHostInventory instead of BareMetalHost

Container Cloud 2.29.0 introduces the BareMetalHostInventory resource that must be used instead of BareMetalHost for adding and modifying configuration of bare metal servers. Therefore, if you need to modify an existing or create a new configuration of a bare metal host, use BareMetalHostInventory.

Each BareMetalHostInventory object is synchronized with an automatically created BareMetalHost object, which is now used for internal purposes of the Container Cloud private API.

Caution

Any change in the BareMetalHost object will be overwritten by BareMetalHostInventory.

For any existing BareMetalHost object, a BareMetalHostInventory object is created automatically during cluster update.

Update passwords for custom Linux accounts

To match CIS Benchmark compliance checks for Ubuntu Linux 22.04 LTS v2.0.0 L1 Server, Container Cloud 2.29.0 introduces new password policies for local (Linux) user accounts. For details, see Improvements in the CIS Benchmark compliance for Ubuntu, MKE, and Docker.

The rules are applied automatically to all cluster nodes during cluster update. Therefore, if you use custom Linux accounts protected by passwords, do not plan any critical maintenance activities right after cluster upgrade as you may need to update Linux user passwords.

Note

By default, during cluster creation, mcc-user is created without a password with an option to add an SSH key.

Migrate container runtime from Docker to containerd

Container Cloud 2.29.0 introduces switching of the default container runtime from Docker to containerd on greenfield management and managed clusters.

On existing clusters, perform the mandatory migration from Docker to containerd in the scope of Container Cloud 2.29.x. Otherwise, the management cluster update to Container Cloud 2.30.0 will be blocked.

Important

Container runtime migration involves machine cordoning and draining.

Note

If you have not upgraded the operating system distribution on your machines to Jammy yet, Mirantis recommends migrating machines from Docker to containerd on managed clusters together with distribution upgrade to minimize the maintenance window.

In this case, ensure that all cluster machines are updated at once during the same maintenance window to prevent machines from running different container runtimes.

Unsupported releases
Unsupported Container Cloud releases history - 2025

Version

Release date

Summary

2.28.5

Feb 03, 2025

Container Cloud 2.28.5 is the fifth patch release of the 2.28.x release series that introduces the following updates:

  • Support for the patch Cluster releases 16.3.5 and 17.3.5 that represent MOSK patch release 24.3.2.

  • Support for Mirantis Kubernetes Engine 3.7.18 and Mirantis Container Runtime 23.0.15, which includes containerd 1.6.36.

  • Optional migration of container runtime from Docker to containerd.

  • Bare metal: update of Ubuntu mirror to ubuntu-2025-01-08-003900 along with update of minor kernel version to 5.15.0-130-generic.

  • Security fixes for CVEs in images.

2.28.4

Jan 06, 2025

Container Cloud 2.28.4 is the fourth patch release of the 2.28.x release series that introduces the following updates:

  • Support for the patch Cluster releases 16.3.4 and 17.3.4 that represent MOSK patch release 24.3.1.

  • Support for Mirantis Kubernetes Engine 3.7.17 and Mirantis Container Runtime 23.0.15, which includes containerd 1.6.36.

  • Optional migration of container runtime from Docker to containerd.

  • Bare metal: update of Ubuntu mirror to ubuntu-2024-12-05-003900 along with update of minor kernel version to 5.15.0-126-generic.

  • Security fixes for CVEs in images.

  • OpenStack provider: suspension of support for cluster deployment and update

2.28.5

The Container Cloud patch release 2.28.5, which is based on the 2.28.0 major release, provides the following updates:

  • Support for the patch Cluster release 16.3.5.

  • Support for the patch Cluster release 17.3.5 that represents Mirantis OpenStack for Kubernetes (MOSK) patch release 24.3.2.

  • Support for Mirantis Kubernetes Engine 3.7.18 and Mirantis Container Runtime 23.0.15, which includes containerd 1.6.36.

  • Optional migration of container runtime from Docker to containerd.

  • Bare metal: update of Ubuntu mirror from ubuntu-2024-12-05-003900 to ubuntu-2025-01-08-003900 along with update of minor kernel version from 5.15.0-126-generic to 5.15.0-130-generic.

  • Security fixes for CVEs in images.

This patch release also supports the latest major Cluster releases 17.3.0 and 16.3.0. It does not support greenfield deployments based on deprecated Cluster releases. Use the latest available Cluster release instead.

For main deliverables of the parent Container Cloud release of 2.28.5, refer to 2.28.0.

Update notes

This section describes the specific actions you as a cloud operator need to complete before or after your Container Cloud cluster update to the Cluster releases 17.3.5 or 16.3.5.

Consider the information below as a supplement to the generic update procedures published in MOSK Operations Guide: Automatic upgrade of a management cluster and Update to a patch version.

Post-update actions
Optional migration of container runtime from Docker to containerd

Since Container Cloud 2.28.4, Mirantis introduced an optional migration of container runtime from Docker to containerd, which is implemented for existing management and managed bare metal clusters. The use of containerd allows for better Kubernetes performance and component update without pod restart when applying fixes for CVEs. For the migration procedure, refer to MOSK Operations Guide: Migrate container runtime from Docker to containerd.

Note

Container runtime migration becomes mandatory in the scope of Container Cloud 2.29.x. Otherwise, the management cluster update to Container Cloud 2.30.0 will be blocked.

Note

In the Container Cloud 2.28.x series, the default container runtime remains Docker for greenfield deployments. Support for greenfield deployments based on containerd will be announced in one of the following releases.

Important

Container runtime migration involves machine cordoning and draining.

Note

If you have not upgraded the operating system distribution on your machines to Jammy yet, Mirantis recommends migrating machines from Docker to containerd on managed clusters together with distribution upgrade to minimize the maintenance window.

In this case, ensure that all cluster machines are updated at once during the same maintenance window to prevent machines from running different container runtimes.

Back up custom Grafana dashboards

In Container Cloud 2.29.0, Grafana will be updated to version 11 where the following deprecated Angular-based plugins will be automatically migrated to the React-based ones:

  • Graph (old) -> Time Series

  • Singlestat -> Stat

  • Stat (old) -> Stat

  • Table (old) -> Table

  • Worldmap -> Geomap

This migration may corrupt custom Grafana dashboards that have Angular-based panels. Therefore, if you have such dashboards, back them up and manually upgrade Angular-based panels during the course of Container Cloud 2.28.x (Cluster releases 17.3.x and 16.3.x) to prevent custom appearance issues after plugin migration in Container Cloud 2.29.0 (Cluster releases 17.4.0 and 16.4.0).
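
A minimal sketch of one way to back up a custom dashboard using the Grafana HTTP API; the Grafana address, credentials, and dashboard UID are placeholders to substitute for your environment:

  # List dashboards to find the UID of each custom dashboard
  curl -s -u <user>:<password> "https://<grafana-address>/api/search?type=dash-db"
  # Export the dashboard JSON and store it outside the cluster
  curl -s -u <user>:<password> "https://<grafana-address>/api/dashboards/uid/<dashboard-uid>" > <dashboard-name>-backup.json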

Note

All Grafana dashboards provided by StackLight are also migrated to React automatically. For the list of default dashboards, see MOSK Operations Guide: View Grafana dashboards.

Warning

For management clusters that are updated automatically, it is important to prepare the backup before Container Cloud 2.29.0 is released. Otherwise, custom dashboards using Angular-based plugins may be corrupted.

For managed clusters, you can perform the backup after the Container Cloud 2.29.0 release date but before updating them to the Cluster release 17.4.0.

Security notes

In total, since Container Cloud 2.28.4, 1 Common Vulnerability and Exposure (CVE) of high severity has been fixed in 2.28.5.

The table below includes the total numbers of addressed unique and common CVEs in images by product component since Container Cloud 2.28.4. The common CVEs are issues addressed across several images.

Addressed CVEs - summary

Product component   CVE type   Critical   High   Total

Kaas core           Unique     0          1      1
                    Common     0          1      1

Mirantis Security Portal

For the detailed list of fixed and existing CVEs across the Mirantis Container Cloud and MOSK products, refer to Mirantis Security Portal.

MOSK CVEs

For the number of fixed CVEs in the MOSK-related components including OpenStack and Tungsten Fabric, refer to MOSK 24.3.2: Security notes.

Known issues

This section lists known issues with workarounds for the Mirantis Container Cloud release 2.28.5 including the Cluster releases 16.3.5 and 17.3.5.

For other issues that can occur while deploying and operating a Container Cloud cluster, see Deployment Guide: Troubleshooting and Operations Guide: Troubleshooting.

Note

This section also outlines still valid known issues from previous Container Cloud releases.

Bare metal
[47202] Inspection error on bare metal hosts after dnsmasq restart

Note

Moving forward, the workaround for this issue will be moved from Release Notes to MOSK Troubleshooting Guide: Inspection error on bare metal hosts after dnsmasq restart.

If the dnsmasq pod is restarted during the bootstrap of newly added nodes, those nodes may fail to undergo inspection. This can result in an inspection error in the corresponding BareMetalHost objects.

The issue can occur when:

  • The dnsmasq pod was moved to another node.

  • DHCP subnets were changed, including addition or removal. In this case, the dhcpd container of the dnsmasq pod is restarted.

    Caution

    If you need to change or add DHCP subnets to bootstrap new nodes, first apply the subnet changes, wait until the dnsmasq pod becomes ready, and only then create the BareMetalHost objects.
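
A minimal sketch of how to wait for the dnsmasq pod readiness before creating the BareMetalHost objects; the pod runs in the kaas namespace, as in the verification commands below, and the pod name is a placeholder to substitute:

  kubectl -n kaas get pods | grep dnsmasq
  kubectl -n kaas wait --for=condition=Ready pod/<dnsmasq-pod-name> --timeout=10m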

To verify whether the nodes are affected:

  1. Verify whether the BareMetalHost objects contain the inspection error:

    kubectl get bmh -n <managed-cluster-namespace-name>
    

    Example of system response:

    NAME            STATE         CONSUMER        ONLINE   ERROR              AGE
    test-master-1   provisioned   test-master-1   true                        9d
    test-master-2   provisioned   test-master-2   true                        9d
    test-master-3   provisioned   test-master-3   true                        9d
    test-worker-1   provisioned   test-worker-1   true                        9d
    test-worker-2   provisioned   test-worker-2   true                        9d
    test-worker-3   inspecting                    true     inspection error   19h
    
  2. Verify whether the dnsmasq pod was in the Ready state when the inspection of the affected bare metal host (test-worker-3 in the example above) started:

    kubectl -n kaas get pod <dnsmasq-pod-name> -oyaml
    

    Example of system response:

    ...
    status:
      conditions:
      - lastProbeTime: null
        lastTransitionTime: "2024-10-10T15:37:34Z"
        status: "True"
        type: Initialized
      - lastProbeTime: null
        lastTransitionTime: "2024-10-11T07:38:54Z"
        status: "True"
        type: Ready
      - lastProbeTime: null
        lastTransitionTime: "2024-10-11T07:38:54Z"
        status: "True"
        type: ContainersReady
      - lastProbeTime: null
        lastTransitionTime: "2024-10-10T15:37:34Z"
        status: "True"
        type: PodScheduled
      containerStatuses:
      - containerID: containerd://6dbcf2fc4b36ce4c549c9191ab01f72d0236c51d42947675302675e4bfaf4cdf
        image: docker-dev-kaas-virtual.artifactory-eu.mcp.mirantis.net/bm/baremetal-dnsmasq:base-2-28-alpine-20240812132650
        imageID: docker-dev-kaas-virtual.artifactory-eu.mcp.mirantis.net/bm/baremetal-dnsmasq@sha256:3dad3e278add18e69b2608e462691c4823942641a0f0e25e6811e703e3c23b3b
        lastState:
          terminated:
            containerID: containerd://816fcf079cd544acd74e312065de5b5ed4dbf1dc6159fefffff4f644b5e45987
            exitCode: 0
            finishedAt: "2024-10-11T07:38:35Z"
            reason: Completed
            startedAt: "2024-10-10T15:37:45Z"
        name: dhcpd
        ready: true
        restartCount: 2
        started: true
        state:
          running:
            startedAt: "2024-10-11T07:38:37Z"
      ...
    

    In the system response above, the dhcpd container was not ready between "2024-10-11T07:38:35Z" and "2024-10-11T07:38:54Z".

  3. Verify the affected baremetal host. For example:

    kubectl get bmh -n managed-ns test-worker-3 -oyaml
    

    Example of system response:

    ...
    status:
      errorCount: 15
      errorMessage: Introspection timeout
      errorType: inspection error
      ...
      operationHistory:
        deprovision:
          end: null
          start: null
        inspect:
          end: null
          start: "2024-10-11T07:38:19Z"
        provision:
          end: null
          start: null
        register:
          end: "2024-10-11T07:38:19Z"
          start: "2024-10-11T07:37:25Z"
    

    In the system response above, inspection was started at "2024-10-11T07:38:19Z", immediately before the period of the dhcpd container downtime. Therefore, this node is most likely affected by the issue.

Workaround

  1. Reboot the node using the IPMI reset or cycle command.

  2. If the node fails to boot, remove the failed BareMetalHost object and create it again:

    1. Remove BareMetalHost object. For example:

      kubectl delete bmh -n managed-ns test-worker-3
      
    2. Verify that the BareMetalHost object is removed:

      kubectl get bmh -n managed-ns test-worker-3
      
    3. Create a BareMetalHost object from the template. For example:

      kubectl create -f bmhc-test-worker-3.yaml
      kubectl create -f bmh-test-worker-3.yaml
      
[42386] A load balancer service does not obtain the external IP address

Due to the MetalLB upstream issue, a load balancer service may not obtain the external IP address.

The issue occurs when two services share the same external IP address and have the same externalTrafficPolicy value. Initially, the services have the external IP address assigned and are accessible. After modifying the externalTrafficPolicy value for both services from Cluster to Local, the service that was changed first is left without an external IP address assigned, while the second service, which was changed later, has the external IP assigned as expected.

To work around the issue, make a dummy change to the service object where external IP is <pending>:

  1. Identify the service that is stuck:

    kubectl get svc -A | grep pending
    

    Example of system response:

    stacklight  iam-proxy-prometheus  LoadBalancer  10.233.28.196  <pending>  443:30430/TCP
    
  2. Add an arbitrary label to the service that is stuck. For example:

    kubectl label svc -n stacklight iam-proxy-prometheus reconcile=1
    

    Example of system response:

    service/iam-proxy-prometheus labeled
    
  3. Verify that the external IP was allocated to the service:

    kubectl get svc -n stacklight iam-proxy-prometheus
    

    Example of system response:

    NAME                  TYPE          CLUSTER-IP     EXTERNAL-IP  PORT(S)        AGE
    iam-proxy-prometheus  LoadBalancer  10.233.28.196  10.0.34.108  443:30430/TCP  12d
    
[24005] Deletion of a node with ironic Pod is stuck in the Terminating state

During deletion of a manager machine running the ironic Pod from a bare metal management cluster, the following problems occur:

  • All Pods are stuck in the Terminating state

  • A new ironic Pod fails to start

  • The related bare metal host is stuck in the deprovisioning state

As a workaround, before deletion of the node running the ironic Pod, cordon and drain the node using the kubectl cordon <nodeName> and kubectl drain <nodeName> commands.
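
For reference, a minimal sketch of the cordon and drain commands; the --ignore-daemonsets and --delete-emptydir-data flags are common additions when draining nodes that run DaemonSet-managed Pods or Pods with emptyDir volumes, not a prescribed part of this workaround:

  kubectl cordon <nodeName>
  kubectl drain <nodeName> --ignore-daemonsets --delete-emptydir-data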


Ceph
[50566] Ceph upgrade is very slow during patch or major cluster update

Due to the upstream Ceph issue 66717, during CVE upgrade of the Ceph daemon image of Ceph Reef 18.2.4, OSDs may start slowly and even fail the startup probe with the following describe output in the rook-ceph-osd-X pod:

 Warning  Unhealthy  57s (x16 over 3m27s)  kubelet  Startup probe failed:
 ceph daemon health check failed with the following output:
> no valid command found; 10 closest matches:
> 0
> 1
> 2
> abort
> assert
> bluefs debug_inject_read_zeros
> bluefs files list
> bluefs stats
> bluestore bluefs device info [<alloc_size:int>]
> config diff
> admin_socket: invalid command

Workaround:

Complete the following steps during every patch or major cluster update of the Cluster releases 17.2.x, 17.3.x, and 17.4.x (until Ceph 18.2.5 becomes supported):

  1. Plan extra time in the maintenance window for the patch cluster update.

    Slow starts will still impact the update procedure, but after completing the following step, the recovery process noticeably shortens without affecting the overall cluster state and data responsiveness.

  2. Select one of the following options:

    • Before the cluster update, set the noout flag:

      ceph osd set noout
      

      Once the Ceph OSDs image upgrade is done, unset the flag:

      ceph osd unset noout
      
    • Monitor the Ceph OSDs image upgrade. If the symptoms of slow start appear, set the noout flag as soon as possible. Once the Ceph OSDs image upgrade is done, unset the flag.
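
A minimal sketch of how such monitoring might look; the app=rook-ceph-osd label and the rook-ceph-tools deployment are standard Rook components and are given here as assumptions about your setup:

  # Watch the OSD pods restarting on the new image
  kubectl -n rook-ceph get pods -l app=rook-ceph-osd -w
  # Check the overall Ceph health, assuming the rook-ceph-tools deployment is available
  kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph -s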

[26441] Cluster update fails with the MountDevice failed for volume warning

Update of a bare metal-based managed cluster with Ceph enabled fails with a PersistentVolumeClaim stuck in the Pending state for the prometheus-server StatefulSet and the MountVolume.MountDevice failed for volume warning in the StackLight event logs.

Workaround:

  1. Verify that the descriptions of the Pods that failed to run contain the FailedMount events:

    kubectl -n <affectedProjectName> describe pod <affectedPodName>
    

    In the command above, replace the following values:

    • <affectedProjectName> is the Container Cloud project name where the Pods failed to run

    • <affectedPodName> is a Pod name that failed to run in the specified project

    In the Pod description, identify the node name where the Pod failed to run.

  2. Verify that the csi-rbdplugin logs of the affected node contain the rbd volume mount failed: <csi-vol-uuid> is being used error. The <csi-vol-uuid> is a unique RBD volume name.

    1. Identify csiPodName of the corresponding csi-rbdplugin:

      kubectl -n rook-ceph get pod -l app=csi-rbdplugin \
      -o jsonpath='{.items[?(@.spec.nodeName == "<nodeName>")].metadata.name}'
      
    2. Output the affected csiPodName logs:

      kubectl -n rook-ceph logs <csiPodName> -c csi-rbdplugin
      
  3. Scale down the affected StatefulSet or Deployment of the failing Pod to 0 replicas.

  4. On every csi-rbdplugin Pod, search for stuck csi-vol:

    for pod in `kubectl -n rook-ceph get pods|grep rbdplugin|grep -v provisioner|awk '{print $1}'`; do
      echo $pod
      kubectl exec -it -n rook-ceph $pod -c csi-rbdplugin -- rbd device list | grep <csi-vol-uuid>
    done
    
  5. Unmap the affected csi-vol:

    rbd unmap -o force /dev/rbd<i>
    

    The /dev/rbd<i> value is a mapped RBD volume that uses csi-vol.

  6. Delete volumeattachment of the affected Pod:

    kubectl get volumeattachments | grep <csi-vol-uuid>
    kubectl delete volumeattachment <id>
    
  7. Scale up the affected StatefulSet or Deployment back to the original number of replicas and wait until its state becomes Running.


LCM
[39437] Failure to replace a master node on a Container Cloud cluster

Fixed in 2.29.0 (17.4.0 and 16.4.0)

During the replacement of a master node on a cluster of any type, the process may get stuck with Kubelet's NodeReady condition is Unknown in the machine status on the remaining master nodes.

As a workaround, log in on the affected node and run the following command:

docker restart ucp-kubelet
[31186,34132] Pods get stuck during MariaDB operations

During MariaDB operations on a management cluster, Pods may get stuck in continuous restarts with the following example error:

[ERROR] WSREP: Corrupt buffer header: \
addr: 0x7faec6f8e518, \
seqno: 3185219421952815104, \
size: 909455917, \
ctx: 0x557094f65038, \
flags: 11577. store: 49, \
type: 49

Workaround:

  1. Create a backup of the /var/lib/mysql directory on the mariadb-server Pod.

  2. Verify that other replicas are up and ready.

  3. Remove the galera.cache file for the affected mariadb-server Pod.

  4. Remove the affected mariadb-server Pod or wait until it is automatically restarted.

After Kubernetes restarts the Pod, the Pod clones the database in 1-2 minutes and restores the quorum.
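
A minimal sketch of the workaround steps as commands; the namespace, Pod name, and the app=mariadb label selector are placeholders and assumptions to adjust for your environment:

  # 1. Back up the /var/lib/mysql directory of the affected Pod
  kubectl cp <namespace>/<mariadb-server-pod>:/var/lib/mysql ./mysql-backup
  # 2. Verify that the other replicas are up and ready
  kubectl -n <namespace> get pods -l app=mariadb
  # 3. Remove the galera.cache file on the affected Pod
  kubectl -n <namespace> exec <mariadb-server-pod> -- rm /var/lib/mysql/galera.cache
  # 4. Delete the affected Pod so that it is recreated and re-clones the database
  kubectl -n <namespace> delete pod <mariadb-server-pod>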


StackLight
[44193] OpenSearch reaches 85% disk usage watermark affecting the cluster state

Fixed in 2.29.0 (17.4.0 and 16.4.0)

On High Availability (HA) clusters that use Local Volume Provisioner (LVP), Prometheus and OpenSearch from StackLight may share the same pool of storage. In such configuration, OpenSearch may approach the 85% disk usage watermark due to the combined storage allocation and usage patterns set by the Persistent Volume Claim (PVC) size parameters for Prometheus and OpenSearch, which consume storage the most.

When the 85% threshold is reached, the affected node is transitioned to the read-only state, preventing shard allocation and causing the OpenSearch cluster state to transition to Warning (Yellow) or Critical (Red).

Caution

The issue and the provided workaround apply only for clusters on which OpenSearch and Prometheus utilize the same storage pool.

To verify that the cluster is affected:

  1. Verify the result of the following formula:

    0.8 × OpenSearch_PVC_Size_GB + Prometheus_PVC_Size_GB > 0.85 × Total_Storage_Capacity_GB
    

    In the formula, define the following values:

    OpenSearch_PVC_Size_GB

    Derived from .values.elasticsearch.persistentVolumeUsableStorageSizeGB, defaulting to .values.elasticsearch.persistentVolumeClaimSize if unspecified. To obtain the OpenSearch PVC size:

    kubectl -n <namespaceName> get cluster <clusterName> -o yaml |\
    yq '.spec.providerSpec.value.helmReleases[] | select(.name == "stacklight") | .values.elasticsearch.persistentVolumeClaimSize '
    

    Example of system response:

    10000Gi
    
    Prometheus_PVC_Size_GB

    Sourced from .values.prometheusServer.persistentVolumeClaimSize. To obtain the Prometheus PVC size:

    kubectl -n <namespaceName> get cluster <clusterName> -o yaml |\
    yq '.spec.providerSpec.value.helmReleases[] | select(.name == "stacklight") | .values.prometheusServer.persistentVolumeClaimSize '
    

    Example of system response:

    4000Gi
    
    Total_Storage_Capacity_GB

    Total capacity of the OpenSearch PVCs. For LVP, the capacity of the storage pool. To obtain the total capacity:

    kubectl get pvc -n stacklight -l app=opensearch-master \
    -o custom-columns=NAME:.metadata.name,CAPACITY:.status.capacity.storage
    

    The system response contains multiple outputs, one per opensearch-master node. Select the capacity for the affected node.

    Note

    Convert the values to GB if they are set in different units.

    If the formula result is positive, it is an early indication that the cluster is affected. See the worked example after this procedure.

  2. Verify whether the OpenSearchClusterStatusWarning or OpenSearchClusterStatusCritical alert is firing. And if so, verify the following:

    1. Log in to the OpenSearch web UI.

    2. In Management -> Dev Tools, run the following command:

      GET _cluster/allocation/explain
      

      The following system response indicates that the corresponding node is affected:

      "explanation": "the node is above the low watermark cluster setting \
      [cluster.routing.allocation.disk.watermark.low=85%], using more disk space \
      than the maximum allowed [85.0%], actual free: [xx.xxx%]"
      

      Note

      The system response may contain even higher watermark percent than 85.0%, depending on the case.
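
As a worked illustration of the formula, take the example PVC sizes above (10000 GB for OpenSearch and 4000 GB for Prometheus) and a hypothetical total storage capacity of 13000 GB:

  0.8 × 10000 + 4000 = 12000
  0.85 × 13000 = 11050
  12000 > 11050, so the formula result is positive and the cluster is affected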

Workaround:

Warning

The workaround implies adjustment of the retention threshold for OpenSearch. Depending on the new threshold, some old logs will be deleted.

  1. Adjust or set .values.elasticsearch.persistentVolumeUsableStorageSizeGB to a lower value so that the verification formula above becomes non-positive. For configuration details, see MOSK Operations Guide: StackLight configuration parameters - OpenSearch.

    Mirantis also recommends reserving some space for other PVCs using storage from the pool. Use the following formula to calculate the required space:

    persistentVolumeUsableStorageSizeGB =
    0.84 × ((1 - Reserved_Percentage - Filesystem_Reserve) ×
    Total_Storage_Capacity_GB - Prometheus_PVC_Size_GB) /
    0.8
    

    In the formula, define the following values:

    Reserved_Percentage

    A user-defined variable that specifies what percentage of the total storage capacity should not be used by OpenSearch or Prometheus. This is used to reserve space for other components. It should be expressed as a decimal. For example, for 5% of reservation, Reserved_Percentage is 0.05. Mirantis recommends using 0.05 as a starting point.

    Filesystem_Reserve

    Percentage to deduct for filesystems that may reserve some portion of the available storage, which is marked as occupied. For example, for EXT4, it is 5% by default, so the value must be 0.05.

    Prometheus_PVC_Size_GB

    Sourced from .values.prometheusServer.persistentVolumeClaimSize.

    Total_Storage_Capacity_GB

    Total capacity of the OpenSearch PVCs. For LVP, the capacity of the storage pool. To obtain the total capacity:

    kubectl get pvc -n stacklight -l app=opensearch-master \
    -o custom-columns=NAME:.metadata.name,CAPACITY:.status.capacity.storage
    

    The system response contains multiple outputs, one per opensearch-master node. Select the capacity for the affected node.

    Note

    Convert the values to GB if they are set in different units.

    Calculation of the above formula provides the maximum safe storage to allocate for .values.elasticsearch.persistentVolumeUsableStorageSizeGB. Use this formula as a reference for setting this parameter on a cluster. See the worked example after this procedure.

  2. Wait up to 15-20 minutes for OpenSearch to perform the cleaning.

  3. Verify that the cluster is not affected anymore using the procedure above.
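
A worked example of the reservation formula with hypothetical values: a total storage capacity of 16000 GB, a Prometheus PVC size of 4000 GB, Reserved_Percentage of 0.05, and Filesystem_Reserve of 0.05:

  persistentVolumeUsableStorageSizeGB =
  0.84 × ((1 - 0.05 - 0.05) × 16000 - 4000) / 0.8 =
  0.84 × (14400 - 4000) / 0.8 =
  0.84 × 13000 = 10920

In this hypothetical case, setting the parameter to at most approximately 10920 GB keeps the cluster within the safe threshold.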


Container Cloud web UI
[50181] Failure to deploy a compact cluster

A compact MOSK cluster fails to be deployed through the Container Cloud web UI because the web UI does not allow adding labels to the control plane machines or changing the dedicatedControlPlane: false setting.

To work around the issue, manually add the required labels using the CLI. Once done, the cluster deployment resumes.

[50168] Inability to use a new project right after creation

A newly created project does not display all available tabs in the Container Cloud web UI and shows various access denied errors during the first five minutes after creation.

To work around the issue, refresh the browser five minutes after the project creation.

Artifacts

This section lists the artifacts of components included in the Container Cloud patch release 2.28.5. For artifacts of the Cluster releases introduced in 2.28.5, see patch Cluster releases 17.3.5 and 16.3.5.

Note

The components that are newly added, updated, deprecated, or removed as compared to the previous release version, are marked with a corresponding superscript, for example, lcm-ansible Updated.

Bare metal artifacts

Artifact

Component

Path

Binaries

ironic-python-agent.initramfs Updated

https://binary.mirantis.com/bm/bin/ironic/ipa/ubuntu/initramfs-antelope-jammy-debug-20250108133235

ironic-python-agent.kernel Updated

https://binary.mirantis.com/bm/bin/ironic/ipa/ubuntu/kernel-antelope-jammy-debug-20250108133235

provisioning_ansible

https://binary.mirantis.com/bm/bin/ansible/provisioning_ansible-0.1.1-167-e7a55fd.tgz

Helm charts Updated

baremetal-api

https://binary.mirantis.com/core/helm/baremetal-api-1.41.28.tgz

baremetal-operator

https://binary.mirantis.com/core/helm/baremetal-operator-1.41.28.tgz

baremetal-provider

https://binary.mirantis.com/core/helm/baremetal-provider-1.41.28.tgz

baremetal-public-api

https://binary.mirantis.com/core/helm/baremetal-public-api-1.41.28.tgz

kaas-ipam

https://binary.mirantis.com/core/helm/kaas-ipam-1.41.28.tgz

Docker images

ambassador Updated

mirantis.azurecr.io/core/external/nginx:1.41.28

baremetal-dnsmasq

mirantis.azurecr.io/bm/baremetal-dnsmasq:base-2-28-alpine-20241022121257

baremetal-operator

mirantis.azurecr.io/bm/baremetal-operator:base-2-28-alpine-20241217153430

bm-collective

mirantis.azurecr.io/bm/bm-collective:base-2-28-alpine-20241217153957

cluster-api-provider-baremetal Updated

mirantis.azurecr.io/core/cluster-api-provider-baremetal:1.41.28

ironic

mirantis.azurecr.io/openstack/ironic:antelope-jammy-20241128095555

ironic-inspector

mirantis.azurecr.io/openstack/ironic-inspector:antelope-jammy-20241128095555

ironic-prometheus-exporter

mirantis.azurecr.io/stacklight/ironic-prometheus-exporter:0.1-20240913123302

kaas-ipam

mirantis.azurecr.io/bm/kaas-ipam:base-2-28-alpine-20241217153549

kubernetes-entrypoint

mirantis.azurecr.io/openstack/extra/kubernetes-entrypoint:v1.0.1-34a4f54-20240910081335

mariadb

mirantis.azurecr.io/general/mariadb:10.6.17-jammy-20240927170336

syslog-ng

mirantis.azurecr.io/bm/syslog-ng:base-alpine-20241022120929

Core artifacts

Artifact

Component

Path

Bootstrap tarball Updated

bootstrap-darwin

https://binary.mirantis.com/core/bin/bootstrap-darwin-1.41.28.tgz

bootstrap-linux

https://binary.mirantis.com/core/bin/bootstrap-linux-1.41.28.tgz

Helm charts Updated

admission-controller

https://binary.mirantis.com/core/helm/admission-controller-1.41.28.tgz

agent-controller

https://binary.mirantis.com/core/helm/agent-controller-1.41.28.tgz

ceph-kcc-controller

https://binary.mirantis.com/core/helm/ceph-kcc-controller-1.41.28.tgz

cert-manager

https://binary.mirantis.com/core/helm/cert-manager-1.41.28.tgz

configuration-collector

https://binary.mirantis.com/core/helm/configuration-collector-1.41.28.tgz

credentials-controller Deprecated

https://binary.mirantis.com/core/helm/credentials-controller-1.41.28.tgz

event-controller

https://binary.mirantis.com/core/helm/event-controller-1.41.28.tgz

host-os-modules-controller

https://binary.mirantis.com/core/helm/host-os-modules-controller-1.41.28.tgz

iam-controller

https://binary.mirantis.com/core/helm/iam-controller-1.41.28.tgz

kaas-exporter

https://binary.mirantis.com/core/helm/kaas-exporter-1.41.28.tgz

kaas-public-api

https://binary.mirantis.com/core/helm/kaas-public-api-1.41.28.tgz

kaas-ui

https://binary.mirantis.com/core/helm/kaas-ui-1.41.28.tgz

lcm-controller

https://binary.mirantis.com/core/helm/lcm-controller-1.41.28.tgz

license-controller

https://binary.mirantis.com/core/helm/license-controller-1.41.28.tgz

machinepool-controller

https://binary.mirantis.com/core/helm/machinepool-controller-1.41.28.tgz

mcc-cache

https://binary.mirantis.com/core/helm/mcc-cache-1.41.28.tgz

mcc-cache-warmup

https://binary.mirantis.com/core/helm/mcc-cache-warmup-1.41.28.tgz

openstack-provider Deprecated

https://binary.mirantis.com/core/helm/openstack-provider-1.41.28.tgz

portforward-controller

https://binary.mirantis.com/core/helm/portforward-controller-1.41.28.tgz

rbac-controller

https://binary.mirantis.com/core/helm/rbac-controller-1.41.28.tgz

release-controller

https://binary.mirantis.com/core/helm/release-controller-1.41.28.tgz

scope-controller

https://binary.mirantis.com/core/helm/scope-controller-1.41.28.tgz

secret-controller

https://binary.mirantis.com/core/helm/secret-controller-1.41.28.tgz

user-controller

https://binary.mirantis.com/core/helm/user-controller-1.41.28.tgz

Docker images

admission-controller Updated

mirantis.azurecr.io/core/admission-controller:1.41.28

agent-controller Updated

mirantis.azurecr.io/core/agent-controller:1.41.28

ceph-kcc-controller Updated

mirantis.azurecr.io/core/ceph-kcc-controller:1.41.28

cert-manager-controller

mirantis.azurecr.io/core/external/cert-manager-controller:v1.11.0-9

configuration-collector Updated

mirantis.azurecr.io/core/configuration-collector:1.41.28

credentials-controller Deprecated

mirantis.azurecr.io/core/credentials-controller:1.41.28

event-controller Updated

mirantis.azurecr.io/core/event-controller:1.41.28

frontend Updated

mirantis.azurecr.io/core/frontend:1.41.28

host-os-modules-controller Updated

mirantis.azurecr.io/core/host-os-modules-controller:1.41.28

iam-controller Updated

mirantis.azurecr.io/core/iam-controller:1.41.28

kaas-exporter Updated

mirantis.azurecr.io/core/kaas-exporter:1.41.28

kproxy Updated

mirantis.azurecr.io/core/kproxy:1.41.28

lcm-controller Updated

mirantis.azurecr.io/core/lcm-controller:1.41.28

license-controller Updated

mirantis.azurecr.io/core/license-controller:1.41.28

machinepool-controller Updated

mirantis.azurecr.io/core/machinepool-controller:1.41.28

mcc-cache-warmup Updated

mirantis.azurecr.io/core/mcc-cache-warmup:1.41.28

nginx Updated

mirantis.azurecr.io/core/external/nginx:1.41.28

openstack-cluster-api-controller Deprecated

mirantis.azurecr.io/core/openstack-cluster-api-controller:1.41.28

portforward-controller Updated

mirantis.azurecr.io/core/portforward-controller:1.41.28

rbac-controller Updated

mirantis.azurecr.io/core/rbac-controller:1.41.28

registry

mirantis.azurecr.io/lcm/registry:v2.8.1-14

release-controller Updated

mirantis.azurecr.io/core/release-controller:1.41.28

scope-controller Updated

mirantis.azurecr.io/core/scope-controller:1.41.28

secret-controller Updated

mirantis.azurecr.io/core/secret-controller:1.41.28

user-controller Updated

mirantis.azurecr.io/core/user-controller:1.41.28

IAM artifacts

Artifact

Component

Path

Binaries

hash-generate-darwin

http://binary.mirantis.com/iam/bin/hash-generate-0.0.1-268-3cf7f17-darwin

hash-generate-linux

http://binary.mirantis.com/iam/bin/hash-generate-0.0.1-268-3cf7f17-linux

Helm charts Updated

iam

https://binary.mirantis.com/core/helm/iam-1.41.28.tgz

Docker images

kubectl

mirantis.azurecr.io/general/kubectl:20240926142019

kubernetes-entrypoint

mirantis.azurecr.io/openstack/extra/kubernetes-entrypoint:v1.0.1-ba8ada4-20240405150338

mariadb

mirantis.azurecr.io/general/mariadb:10.6.17-focal-20240909113408

mcc-keycloak

mirantis.azurecr.io/iam/mcc-keycloak:25.0.6-20241114073807

2.28.4

The Container Cloud patch release 2.28.4, which is based on the 2.28.0 major release, provides the following updates:

  • Support for the patch Cluster release 16.3.4.

  • Support for the patch Cluster release 17.3.4 that represents Mirantis OpenStack for Kubernetes (MOSK) patch release 24.3.1.

  • Support for Mirantis Kubernetes Engine 3.7.17 and Mirantis Container Runtime 23.0.15, which includes containerd 1.6.36.

  • Optional migration of container runtime from Docker to containerd.

  • Bare metal: update of Ubuntu mirror from ubuntu-2024-11-18-003900 to ubuntu-2024-12-05-003900 along with update of minor kernel version from 5.15.0-125-generic to 5.15.0-126-generic.

  • Security fixes for CVEs in images.

  • OpenStack provider: suspension of support for cluster deployment and update. For details, see Deprecation notes.

This patch release also supports the latest major Cluster releases 17.3.0 and 16.3.0. It does not support greenfield deployments based on deprecated Cluster releases; use the latest available Cluster release instead.

For main deliverables of the parent Container Cloud release of 2.28.4, refer to 2.28.0.

Update notes

This section describes the specific actions you as a cloud operator need to complete before or after your Container Cloud cluster update to the Cluster releases 17.3.4 or 16.3.4.

Important

For MOSK deployments, although MOSK 24.3.1 is classified as a patch release, as a cloud operator, you will be performing a major update regardless of the upgrade path: whether you are upgrading from patch 24.2.5 or major version 24.3. For details, see MOSK 24.3.1 release notes: Update notes.

Consider the information below as a supplement to the generic update procedures published in MOSK Operations Guide: Automatic upgrade of a management cluster and Update to a patch version.

Post-update actions
Optional migration of container runtime from Docker to containerd

Container Cloud 2.28.4 introduces an optional migration of container runtime from Docker to containerd, which is implemented for existing management and managed bare metal clusters. The use of containerd allows for better Kubernetes performance and component update without pod restart when applying fixes for CVEs. For the migration procedure, refer to MOSK Operations Guide: Migrate container runtime from Docker to containerd.

Note

Container runtime migration becomes mandatory in the scope of Container Cloud 2.29.x. Otherwise, the management cluster update to Container Cloud 2.30.0 will be blocked.

Note

In the Container Cloud 2.28.x series, the default container runtime remains Docker for greenfield deployments. Support for greenfield deployments based on containerd will be announced in one of the following releases.

Important

Container runtime migration involves machine cordoning and draining.

Note

If you have not upgraded the operating system distribution on your machines to Jammy yet, Mirantis recommends migrating machines from Docker to containerd on managed clusters together with distribution upgrade to minimize the maintenance window.

In this case, ensure that all cluster machines are updated at once during the same maintenance window to prevent machines from running different container runtimes.

Back up custom Grafana dashboards

In Container Cloud 2.29.0, Grafana will be updated to version 11 where the following deprecated Angular-based plugins will be automatically migrated to the React-based ones:

  • Graph (old) -> Time Series

  • Singlestat -> Stat

  • Stat (old) -> Stat

  • Table (old) -> Table

  • Worldmap -> Geomap

This migration may corrupt custom Grafana dashboards that have Angular-based panels. Therefore, if you have such dashboards, back them up and manually upgrade Angular-based panels during the course of Container Cloud 2.28.x (Cluster releases 17.3.x and 16.3.x) to prevent custom appearance issues after plugin migration in Container Cloud 2.29.0 (Cluster releases 17.4.0 and 16.4.0).

Note

All Grafana dashboards provided by StackLight are also migrated to React automatically. For the list of default dashboards, see MOSK Operations Guide: View Grafana dashboards.

Warning

For management clusters that are updated automatically, it is important to prepare the backup before Container Cloud 2.29.0 is released. Otherwise, custom dashboards using Angular-based plugins may be corrupted.

For managed clusters, you can perform the backup after the Container Cloud 2.29.0 release date but before updating them to the Cluster release 17.4.0.

Security notes

In total, since Container Cloud 2.28.3, 158 Common Vulnerabilities and Exposures (CVE) have been fixed in 2.28.4: 10 of critical and 148 of high severity.

The table below includes the total numbers of addressed unique and common CVEs in images by product component since Container Cloud 2.28.3. The common CVEs are issues addressed across several images.

Addressed CVEs - summary

Product component   CVE type   Critical   High   Total

Ceph                Unique     0          3      3
                    Common     0          7      7
Kaas core           Unique     1          18     19
                    Common     4          92     96
StackLight          Unique     3          19     22
                    Common     6          49     55

Mirantis Security Portal

For the detailed list of fixed and existing CVEs across the Mirantis Container Cloud and MOSK products, refer to Mirantis Security Portal.

MOSK CVEs

For the number of fixed CVEs in the MOSK-related components including OpenStack and Tungsten Fabric, refer to MOSK 24.3.1: Security notes.

Addressed issues

The following issues have been addressed in the Container Cloud patch release 2.28.4 along with the patch Cluster releases 16.3.4 and 17.3.4:

  • [30294] [LCM] Fixed the issue that prevented replacement of a manager machine during the calico-node Pod start on a new node that has the same IP address as the node being replaced.

  • [5782] [LCM] Fixed the issue that prevented deployment of a manager machine during node replacement.

  • [5568] [LCM] Fixed the issue that prevented cleaning of resources by the calico-kube-controllers Pod during unsafe or forced deletion of a manager machine.

Known issues

This section lists known issues with workarounds for the Mirantis Container Cloud release 2.28.4 including the Cluster releases 16.3.4 and 17.3.4.

For other issues that can occur while deploying and operating a Container Cloud cluster, see Deployment Guide: Troubleshooting and Operations Guide: Troubleshooting.

Note

This section also outlines still valid known issues from previous Container Cloud releases.

Bare metal
[47202] Inspection error on bare metal hosts after dnsmasq restart

Note

Moving forward, the workaround for this issue will be moved from Release Notes to MOSK Troubleshooting Guide: Inspection error on bare metal hosts after dnsmasq restart.

If the dnsmasq pod is restarted during the bootstrap of newly added nodes, those nodes may fail to undergo inspection. This can result in an inspection error in the corresponding BareMetalHost objects.

The issue can occur when:

  • The dnsmasq pod was moved to another node.

  • DHCP subnets were changed, including addition or removal. In this case, the dhcpd container of the dnsmasq pod is restarted.

    Caution

    If you need to change or add DHCP subnets to bootstrap new nodes, first apply the subnet changes, wait until the dnsmasq pod becomes ready, and only then create the BareMetalHost objects.

To verify whether the nodes are affected:

  1. Verify whether the BareMetalHost objects contain the inspection error:

    kubectl get bmh -n <managed-cluster-namespace-name>
    

    Example of system response:

    NAME            STATE         CONSUMER        ONLINE   ERROR              AGE
    test-master-1   provisioned   test-master-1   true                        9d
    test-master-2   provisioned   test-master-2   true                        9d
    test-master-3   provisioned   test-master-3   true                        9d
    test-worker-1   provisioned   test-worker-1   true                        9d
    test-worker-2   provisioned   test-worker-2   true                        9d
    test-worker-3   inspecting                    true     inspection error   19h
    
  2. Verify whether the dnsmasq pod was in the Ready state when the inspection of the affected bare metal host (test-worker-3 in the example above) started:

    kubectl -n kaas get pod <dnsmasq-pod-name> -oyaml
    

    Example of system response:

    ...
    status:
      conditions:
      - lastProbeTime: null
        lastTransitionTime: "2024-10-10T15:37:34Z"
        status: "True"
        type: Initialized
      - lastProbeTime: null
        lastTransitionTime: "2024-10-11T07:38:54Z"
        status: "True"
        type: Ready
      - lastProbeTime: null
        lastTransitionTime: "2024-10-11T07:38:54Z"
        status: "True"
        type: ContainersReady
      - lastProbeTime: null
        lastTransitionTime: "2024-10-10T15:37:34Z"
        status: "True"
        type: PodScheduled
      containerStatuses:
      - containerID: containerd://6dbcf2fc4b36ce4c549c9191ab01f72d0236c51d42947675302675e4bfaf4cdf
        image: docker-dev-kaas-virtual.artifactory-eu.mcp.mirantis.net/bm/baremetal-dnsmasq:base-2-28-alpine-20240812132650
        imageID: docker-dev-kaas-virtual.artifactory-eu.mcp.mirantis.net/bm/baremetal-dnsmasq@sha256:3dad3e278add18e69b2608e462691c4823942641a0f0e25e6811e703e3c23b3b
        lastState:
          terminated:
            containerID: containerd://816fcf079cd544acd74e312065de5b5ed4dbf1dc6159fefffff4f644b5e45987
            exitCode: 0
            finishedAt: "2024-10-11T07:38:35Z"
            reason: Completed
            startedAt: "2024-10-10T15:37:45Z"
        name: dhcpd
        ready: true
        restartCount: 2
        started: true
        state:
          running:
            startedAt: "2024-10-11T07:38:37Z"
      ...
    

    In the system response above, the dhcpd container was not ready between "2024-10-11T07:38:35Z" and "2024-10-11T07:38:54Z".

  3. Verify the affected baremetal host. For example:

    kubectl get bmh -n managed-ns test-worker-3 -oyaml
    

    Example of system response:

    ...
    status:
      errorCount: 15
      errorMessage: Introspection timeout
      errorType: inspection error
      ...
      operationHistory:
        deprovision:
          end: null
          start: null
        inspect:
          end: null
          start: "2024-10-11T07:38:19Z"
        provision:
          end: null
          start: null
        register:
          end: "2024-10-11T07:38:19Z"
          start: "2024-10-11T07:37:25Z"
    

    In the system response above, inspection was started at "2024-10-11T07:38:19Z", immediately before the period of the dhcpd container downtime. Therefore, this node is most likely affected by the issue.

Workaround

  1. Reboot the node using the IPMI reset or cycle command.

  2. If the node fails to boot, remove the failed BareMetalHost object and create it again:

    1. Remove BareMetalHost object. For example:

      kubectl delete bmh -n managed-ns test-worker-3
      
    2. Verify that the BareMetalHost object is removed:

      kubectl get bmh -n managed-ns test-worker-3
      
    3. Create a BareMetalHost object from the template. For example:

      kubectl create -f bmhc-test-worker-3.yaml
      kubectl create -f bmh-test-worker-3.yaml
      
[42386] A load balancer service does not obtain the external IP address

Due to the MetalLB upstream issue, a load balancer service may not obtain the external IP address.

The issue occurs when two services share the same external IP address and have the same externalTrafficPolicy value. Initially, the services have the external IP address assigned and are accessible. After modifying the externalTrafficPolicy value for both services from Cluster to Local, the service that was changed first is left without an external IP address assigned, while the second service, which was changed later, has the external IP assigned as expected.

To work around the issue, make a dummy change to the service object where external IP is <pending>:

  1. Identify the service that is stuck:

    kubectl get svc -A | grep pending
    

    Example of system response:

    stacklight  iam-proxy-prometheus  LoadBalancer  10.233.28.196  <pending>  443:30430/TCP
    
  2. Add an arbitrary label to the service that is stuck. For example:

    kubectl label svc -n stacklight iam-proxy-prometheus reconcile=1
    

    Example of system response:

    service/iam-proxy-prometheus labeled
    
  3. Verify that the external IP was allocated to the service:

    kubectl get svc -n stacklight iam-proxy-prometheus
    

    Example of system response:

    NAME                  TYPE          CLUSTER-IP     EXTERNAL-IP  PORT(S)        AGE
    iam-proxy-prometheus  LoadBalancer  10.233.28.196  10.0.34.108  443:30430/TCP  12d
    
[24005] Deletion of a node with ironic Pod is stuck in the Terminating state

During deletion of a manager machine running the ironic Pod from a bare metal management cluster, the following problems occur:

  • All Pods are stuck in the Terminating state

  • A new ironic Pod fails to start

  • The related bare metal host is stuck in the deprovisioning state

As a workaround, before deletion of the node running the ironic Pod, cordon and drain the node using the kubectl cordon <nodeName> and kubectl drain <nodeName> commands.


Ceph
[50566] Ceph upgrade is very slow during patch or major cluster update

Due to the upstream Ceph issue 66717, during CVE upgrade of the Ceph daemon image of Ceph Reef 18.2.4, OSDs may start slowly and even fail the startup probe with the following describe output in the rook-ceph-osd-X pod:

 Warning  Unhealthy  57s (x16 over 3m27s)  kubelet  Startup probe failed:
 ceph daemon health check failed with the following output:
> no valid command found; 10 closest matches:
> 0
> 1
> 2
> abort
> assert
> bluefs debug_inject_read_zeros
> bluefs files list
> bluefs stats
> bluestore bluefs device info [<alloc_size:int>]
> config diff
> admin_socket: invalid command

Workaround:

Complete the following steps during every patch or major cluster update of the Cluster releases 17.2.x, 17.3.x, and 17.4.x (until Ceph 18.2.5 becomes supported):

  1. Plan extra time in the maintenance window for the patch cluster update.

    Slow starts will still impact the update procedure, but after completing the following step, the recovery process noticeably shortens without affecting the overall cluster state and data responsiveness.

  2. Select one of the following options:

    • Before the cluster update, set the noout flag:

      ceph osd set noout
      

      Once the Ceph OSDs image upgrade is done, unset the flag:

      ceph osd unset noout
      
    • Monitor the Ceph OSDs image upgrade. If the symptoms of slow start appear, set the noout flag as soon as possible. Once the Ceph OSDs image upgrade is done, unset the flag.

[26441] Cluster update fails with the MountDevice failed for volume warning

Update of a bare metal-based managed cluster with Ceph enabled fails with a PersistentVolumeClaim stuck in the Pending state for the prometheus-server StatefulSet and the MountVolume.MountDevice failed for volume warning in the StackLight event logs.

Workaround:

  1. Verify that the descriptions of the Pods that failed to run contain the FailedMount events:

    kubectl -n <affectedProjectName> describe pod <affectedPodName>
    

    In the command above, replace the following values:

    • <affectedProjectName> is the Container Cloud project name where the Pods failed to run

    • <affectedPodName> is a Pod name that failed to run in the specified project

    In the Pod description, identify the node name where the Pod failed to run.

  2. Verify that the csi-rbdplugin logs of the affected node contain the rbd volume mount failed: <csi-vol-uuid> is being used error. The <csi-vol-uuid> is a unique RBD volume name.

    1. Identify csiPodName of the corresponding csi-rbdplugin:

      kubectl -n rook-ceph get pod -l app=csi-rbdplugin \
      -o jsonpath='{.items[?(@.spec.nodeName == "<nodeName>")].metadata.name}'
      
    2. Output the affected csiPodName logs:

      kubectl -n rook-ceph logs <csiPodName> -c csi-rbdplugin
      
  3. Scale down the affected StatefulSet or Deployment of the failing Pod to 0 replicas.

  4. On every csi-rbdplugin Pod, search for stuck csi-vol:

    for pod in `kubectl -n rook-ceph get pods|grep rbdplugin|grep -v provisioner|awk '{print $1}'`; do
      echo $pod
      kubectl exec -it -n rook-ceph $pod -c csi-rbdplugin -- rbd device list | grep <csi-vol-uuid>
    done
    
  5. Unmap the affected csi-vol:

    rbd unmap -o force /dev/rbd<i>
    

    The /dev/rbd<i> value is a mapped RBD volume that uses csi-vol.

  6. Delete volumeattachment of the affected Pod:

    kubectl get volumeattachments | grep <csi-vol-uuid>
    kubectl delete volumeattachment <id>
    
  7. Scale up the affected StatefulSet or Deployment back to the original number of replicas and wait until its state becomes Running.


LCM
[39437] Failure to replace a master node on a Container Cloud cluster

Fixed in 2.29.0 (17.4.0 and 16.4.0)

During the replacement of a master node on a cluster of any type, the process may get stuck with Kubelet's NodeReady condition is Unknown in the machine status on the remaining master nodes.

As a workaround, log in on the affected node and run the following command:

docker restart ucp-kubelet
[31186,34132] Pods get stuck during MariaDB operations

During MariaDB operations on a management cluster, Pods may get stuck in continuous restarts with the following example error:

[ERROR] WSREP: Corrupt buffer header: \
addr: 0x7faec6f8e518, \
seqno: 3185219421952815104, \
size: 909455917, \
ctx: 0x557094f65038, \
flags: 11577. store: 49, \
type: 49

Workaround:

  1. Create a backup of the /var/lib/mysql directory on the mariadb-server Pod.

  2. Verify that other replicas are up and ready.

  3. Remove the galera.cache file for the affected mariadb-server Pod.

  4. Remove the affected mariadb-server Pod or wait until it is automatically restarted.

After Kubernetes restarts the Pod, the Pod clones the database in 1-2 minutes and restores the quorum.


StackLight
[44193] OpenSearch reaches 85% disk usage watermark affecting the cluster state

Fixed in 2.29.0 (17.4.0 and 16.4.0)

On High Availability (HA) clusters that use Local Volume Provisioner (LVP), Prometheus and OpenSearch from StackLight may share the same pool of storage. In such configuration, OpenSearch may approach the 85% disk usage watermark due to the combined storage allocation and usage patterns set by the Persistent Volume Claim (PVC) size parameters for Prometheus and OpenSearch, which consume storage the most.

When the 85% threshold is reached, the affected node is transitioned to the read-only state, preventing shard allocation and causing the OpenSearch cluster state to transition to Warning (Yellow) or Critical (Red).

Caution

The issue and the provided workaround apply only for clusters on which OpenSearch and Prometheus utilize the same storage pool.

To verify that the cluster is affected:

  1. Verify the result of the following formula:

    0.8 × OpenSearch_PVC_Size_GB + Prometheus_PVC_Size_GB > 0.85 × Total_Storage_Capacity_GB
    

    In the formula, define the following values:

    OpenSearch_PVC_Size_GB

    Derived from .values.elasticsearch.persistentVolumeUsableStorageSizeGB, defaulting to .values.elasticsearch.persistentVolumeClaimSize if unspecified. To obtain the OpenSearch PVC size:

    kubectl -n <namespaceName> get cluster <clusterName> -o yaml |\
    yq '.spec.providerSpec.value.helmReleases[] | select(.name == "stacklight") | .values.elasticsearch.persistentVolumeClaimSize '
    

    Example of system response:

    10000Gi
    
    Prometheus_PVC_Size_GB

    Sourced from .values.prometheusServer.persistentVolumeClaimSize. To obtain the Prometheus PVC size:

    kubectl -n <namespaceName> get cluster <clusterName> -o yaml |\
    yq '.spec.providerSpec.value.helmReleases[] | select(.name == "stacklight") | .values.prometheusServer.persistentVolumeClaimSize '
    

    Example of system response:

    4000Gi
    
    Total_Storage_Capacity_GB

    Total capacity of the OpenSearch PVCs. For LVP, the capacity of the storage pool. To obtain the total capacity:

    kubectl get pvc -n stacklight -l app=opensearch-master \
    -o custom-columns=NAME:.metadata.name,CAPACITY:.status.capacity.storage
    

    The system response contains multiple outputs, one per opensearch-master node. Select the capacity for the affected node.

    Note

    Convert the values to GB if they are set in different units.

    If the formula result is positive, it is an early indication that the cluster is affected.

  2. Verify whether the OpenSearchClusterStatusWarning or OpenSearchClusterStatusCritical alert is firing. And if so, verify the following:

    1. Log in to the OpenSearch web UI.

    2. In Management -> Dev Tools, run the following command:

      GET _cluster/allocation/explain
      

      The following system response indicates that the corresponding node is affected:

      "explanation": "the node is above the low watermark cluster setting \
      [cluster.routing.allocation.disk.watermark.low=85%], using more disk space \
      than the maximum allowed [85.0%], actual free: [xx.xxx%]"
      

      Note

      The system response may contain even higher watermark percent than 85.0%, depending on the case.

Workaround:

Warning

The workaround implies adjustment of the retention threshold for OpenSearch. Depending on the new threshold, some old logs will be deleted.

  1. Adjust or set .values.elasticsearch.persistentVolumeUsableStorageSizeGB to a lower value so that the verification formula above becomes non-positive. For configuration details, see MOSK Operations Guide: StackLight configuration parameters - OpenSearch.

    Mirantis also recommends reserving some space for other PVCs using storage from the pool. Use the following formula to calculate the required space:

    persistentVolumeUsableStorageSizeGB =
    0.84 × ((1 - Reserved_Percentage - Filesystem_Reserve) ×
    Total_Storage_Capacity_GB - Prometheus_PVC_Size_GB) /
    0.8
    

    In the formula, define the following values:

    Reserved_Percentage

    A user-defined variable that specifies what percentage of the total storage capacity should not be used by OpenSearch or Prometheus. This is used to reserve space for other components. It should be expressed as a decimal. For example, for 5% of reservation, Reserved_Percentage is 0.05. Mirantis recommends using 0.05 as a starting point.

    Filesystem_Reserve

    Percentage to deduct for filesystems that may reserve some portion of the available storage, which is marked as occupied. For example, for EXT4, it is 5% by default, so the value must be 0.05.

    Prometheus_PVC_Size_GB

    Sourced from .values.prometheusServer.persistentVolumeClaimSize.

    Total_Storage_Capacity_GB

    Total capacity of the OpenSearch PVCs. For LVP, the capacity of the storage pool. To obtain the total capacity:

    kubectl get pvc -n stacklight -l app=opensearch-master \
    -o custom-columns=NAME:.metadata.name,CAPACITY:.status.capacity.storage
    

    The system response contains multiple outputs, one per opensearch-master node. Select the capacity for the affected node.

    Note

    Convert the values to GB if they are set in different units.

    Calculation of the above formula provides the maximum safe storage to allocate for .values.elasticsearch.persistentVolumeUsableStorageSizeGB. Use this formula as a reference for setting this parameter on a cluster.

  2. Wait up to 15-20 minutes for OpenSearch to perform the cleaning.

  3. Verify that the cluster is not affected anymore using the procedure above.


Container Cloud web UI
[50181] Failure to deploy a compact cluster

A compact MOSK cluster fails to be deployed through the Container Cloud web UI because the web UI does not allow adding labels to the control plane machines or changing the dedicatedControlPlane: false setting.

To work around the issue, manually add the required labels using the CLI. Once done, the cluster deployment resumes.

[50168] Inability to use a new project right after creation

A newly created project does not display all available tabs in the Container Cloud web UI and shows various access denied errors during the first five minutes after creation.

To work around the issue, refresh the browser five minutes after the project creation.

Artifacts

This section lists the artifacts of components included in the Container Cloud patch release 2.28.4. For artifacts of the Cluster releases introduced in 2.28.4, see patch Cluster releases 17.3.4 and 16.3.4.

Note

The components that are newly added, updated, deprecated, or removed as compared to the previous release version, are marked with a corresponding superscript, for example, lcm-ansible Updated.

Bare metal artifacts

Artifact

Component

Path

Binaries

ironic-python-agent.initramfs Updated

https://binary.mirantis.com/bm/bin/ironic/ipa/ubuntu/initramfs-antelope-jammy-debug-20241205172311

ironic-python-agent.kernel Updated

https://binary.mirantis.com/bm/bin/ironic/ipa/ubuntu/kernel-antelope-jammy-debug-20241205172311

provisioning_ansible

https://binary.mirantis.com/bm/bin/ansible/provisioning_ansible-0.1.1-167-e7a55fd.tgz

Helm charts Updated

baremetal-api

https://binary.mirantis.com/core/helm/baremetal-api-1.41.26.tgz

baremetal-operator

https://binary.mirantis.com/core/helm/baremetal-operator-1.41.26.tgz

baremetal-provider

https://binary.mirantis.com/core/helm/baremetal-provider-1.41.26.tgz

baremetal-public-api

https://binary.mirantis.com/core/helm/baremetal-public-api-1.41.26.tgz

kaas-ipam

https://binary.mirantis.com/core/helm/kaas-ipam-1.41.26.tgz

Docker images

ambassador Updated

mirantis.azurecr.io/core/external/nginx:1.41.26

baremetal-dnsmasq

mirantis.azurecr.io/bm/baremetal-dnsmasq:base-2-28-alpine-20241022121257

baremetal-operator Updated

mirantis.azurecr.io/bm/baremetal-operator:base-2-28-alpine-20241217153430

bm-collective Updated

mirantis.azurecr.io/bm/bm-collective:base-2-28-alpine-20241217153957

cluster-api-provider-baremetal Updated

mirantis.azurecr.io/core/cluster-api-provider-baremetal:1.41.26

ironic

mirantis.azurecr.io/openstack/ironic:antelope-jammy-20241128095555

ironic-inspector

mirantis.azurecr.io/openstack/ironic-inspector:antelope-jammy-20241128095555

ironic-prometheus-exporter

mirantis.azurecr.io/stacklight/ironic-prometheus-exporter:0.1-20240913123302

kaas-ipam Updated

mirantis.azurecr.io/bm/kaas-ipam:base-2-28-alpine-20241217153549

kubernetes-entrypoint

mirantis.azurecr.io/openstack/extra/kubernetes-entrypoint:v1.0.1-34a4f54-20240910081335

mariadb

mirantis.azurecr.io/general/mariadb:10.6.17-jammy-20240927170336

syslog-ng

mirantis.azurecr.io/bm/syslog-ng:base-alpine-20241022120929

Core artifacts

Artifact

Component

Path

Bootstrap tarball Updated

bootstrap-darwin

https://binary.mirantis.com/core/bin/bootstrap-darwin-1.41.26.tgz

bootstrap-linux

https://binary.mirantis.com/core/bin/bootstrap-linux-1.41.26.tgz

Helm charts Updated

admission-controller

https://binary.mirantis.com/core/helm/admission-controller-1.41.26.tgz

agent-controller

https://binary.mirantis.com/core/helm/agent-controller-1.41.26.tgz

ceph-kcc-controller

https://binary.mirantis.com/core/helm/ceph-kcc-controller-1.41.26.tgz

cert-manager

https://binary.mirantis.com/core/helm/cert-manager-1.41.26.tgz

configuration-collector

https://binary.mirantis.com/core/helm/configuration-collector-1.41.26.tgz

credentials-controller

https://binary.mirantis.com/core/helm/credentials-controller-1.41.26.tgz

event-controller

https://binary.mirantis.com/core/helm/event-controller-1.41.26.tgz

host-os-modules-controller

https://binary.mirantis.com/core/helm/host-os-modules-controller-1.41.26.tgz

iam-controller

https://binary.mirantis.com/core/helm/iam-controller-1.41.26.tgz

kaas-exporter

https://binary.mirantis.com/core/helm/kaas-exporter-1.41.26.tgz

kaas-public-api

https://binary.mirantis.com/core/helm/kaas-public-api-1.41.26.tgz

kaas-ui

https://binary.mirantis.com/core/helm/kaas-ui-1.41.26.tgz

lcm-controller

https://binary.mirantis.com/core/helm/lcm-controller-1.41.26.tgz

license-controller

https://binary.mirantis.com/core/helm/license-controller-1.41.26.tgz

machinepool-controller

https://binary.mirantis.com/core/helm/machinepool-controller-1.41.26.tgz

mcc-cache

https://binary.mirantis.com/core/helm/mcc-cache-1.41.26.tgz

mcc-cache-warmup

https://binary.mirantis.com/core/helm/mcc-cache-warmup-1.41.26.tgz

openstack-provider Deprecated

https://binary.mirantis.com/core/helm/openstack-provider-1.41.26.tgz

portforward-controller

https://binary.mirantis.com/core/helm/portforward-controller-1.41.26.tgz

rbac-controller

https://binary.mirantis.com/core/helm/rbac-controller-1.41.26.tgz

release-controller

https://binary.mirantis.com/core/helm/release-controller-1.41.26.tgz

scope-controller

https://binary.mirantis.com/core/helm/scope-controller-1.41.26.tgz

secret-controller

https://binary.mirantis.com/core/helm/secret-controller-1.41.26.tgz

squid-proxy

https://binary.mirantis.com/core/helm/squid-proxy-1.41.26.tgz

user-controller

https://binary.mirantis.com/core/helm/user-controller-1.41.26.tgz

Docker images

admission-controller Updated

mirantis.azurecr.io/core/admission-controller:1.41.26

agent-controller Updated

mirantis.azurecr.io/core/agent-controller:1.41.26

ceph-kcc-controller Updated

mirantis.azurecr.io/core/ceph-kcc-controller:1.41.26

cert-manager-controller Updated

mirantis.azurecr.io/core/external/cert-manager-controller:v1.11.0-9

configuration-collector Updated

mirantis.azurecr.io/core/configuration-collector:1.41.26

credentials-controller Updated

mirantis.azurecr.io/core/credentials-controller:1.41.26

event-controller Updated

mirantis.azurecr.io/core/event-controller:1.41.26

frontend Updated

mirantis.azurecr.io/core/frontend:1.41.26

host-os-modules-controller Updated

mirantis.azurecr.io/core/host-os-modules-controller:1.41.26

iam-controller Updated

mirantis.azurecr.io/core/iam-controller:1.41.26

kaas-exporter Updated

mirantis.azurecr.io/core/kaas-exporter:1.41.26

kproxy Updated

mirantis.azurecr.io/core/kproxy:1.41.26

lcm-controller Updated

mirantis.azurecr.io/core/lcm-controller:1.41.26

license-controller Updated

mirantis.azurecr.io/core/license-controller:1.41.26

machinepool-controller Updated

mirantis.azurecr.io/core/machinepool-controller:1.41.26

mcc-cache-warmup Updated

mirantis.azurecr.io/core/mcc-cache-warmup:1.41.26

nginx Updated

mirantis.azurecr.io/core/external/nginx:1.41.26

openstack-cluster-api-controller Deprecated

mirantis.azurecr.io/core/openstack-cluster-api-controller:1.41.26

portforward-controller Updated

mirantis.azurecr.io/core/portforward-controller:1.41.26

rbac-controller Updated

mirantis.azurecr.io/core/rbac-controller:1.41.26

registry

mirantis.azurecr.io/lcm/registry:v2.8.1-14

release-controller Updated

mirantis.azurecr.io/core/release-controller:1.41.26

scope-controller Updated

mirantis.azurecr.io/core/scope-controller:1.41.26

secret-controller Updated

mirantis.azurecr.io/core/secret-controller:1.41.26

user-controller Updated

mirantis.azurecr.io/core/user-controller:1.41.26

IAM artifacts

Artifact

Component

Path

Binaries

hash-generate-darwin

http://binary.mirantis.com/iam/bin/hash-generate-0.0.1-268-3cf7f17-darwin

hash-generate-linux

http://binary.mirantis.com/iam/bin/hash-generate-0.0.1-268-3cf7f17-linux

Helm charts Updated

iam

https://binary.mirantis.com/core/helm/iam-1.41.26.tgz

Docker images

kubectl

mirantis.azurecr.io/general/kubectl:20240926142019

kubernetes-entrypoint

mirantis.azurecr.io/openstack/extra/kubernetes-entrypoint:v1.0.1-ba8ada4-20240405150338

mariadb

mirantis.azurecr.io/general/mariadb:10.6.17-focal-20240909113408

mcc-keycloak

mirantis.azurecr.io/iam/mcc-keycloak:25.0.6-20241114073807

Releases delivered in 2024

This section contains historical information on the unsupported Container Cloud releases delivered in 2024. For the latest supported Container Cloud release, see Container Cloud releases.

Unsupported Container Cloud releases 2024

Version

Release date

Summary

2.28.3

Dec 09, 2024

Container Cloud 2.28.3 is the third patch release of the 2.28.x release series that introduces the following updates:

  • Support for the patch Cluster release 16.3.3.

  • Support for the patch Cluster releases 16.2.7 and 17.2.7 that represent MOSK patch release 24.2.5.

  • Bare metal: update of Ubuntu mirror to ubuntu-2024-11-18-003900 along with update of minor kernel version to 5.15.0-125-generic.

  • Security fixes for CVEs in images.

2.28.2

Nov 18, 2024

Container Cloud 2.28.2 is the second patch release of the 2.28.x release series that introduces the following updates:

  • Support for the patch Cluster release 16.3.2.

  • Support for the patch Cluster releases 16.2.6 and 17.2.6 that represent MOSK patch release 24.2.4.

  • Support for MKE 3.7.16.

  • Bare metal: update of Ubuntu mirror to ubuntu-2024-10-28-012906 along with update of minor kernel version to 5.15.0-124-generic.

  • Security fixes for CVEs in images.

2.28.1

Oct 30, 2024

Container Cloud 2.28.1 is the first patch release of the 2.28.x release series that introduces the following updates:

  • Support for the patch Cluster release 16.3.1.

  • Support for the patch Cluster releases 16.2.5 and 17.2.5 that represent MOSK patch release 24.2.3.

  • Support for MKE 3.7.15.

  • Bare metal: update of Ubuntu mirror to ubuntu-2024-10-14-013948 along with update of minor kernel version to 5.15.0-122-generic.

  • Security fixes for CVEs in images.

2.28.0

Oct 16, 2024

  • General availability for Ubuntu 22.04 on MOSK clusters

  • Improvements in the CIS Benchmark compliance for Ubuntu Linux 22.04 LTS v2.0.0 L1 Server

  • Support for MKE 3.7.12 on clusters following the major update path

  • Support for MCR 23.0.14

  • Update group for controller nodes

  • Reboot of machines using update groups

  • Amendments for the ClusterUpdatePlan object

  • Refactoring of delayed auto-update of a management cluster

  • Self-diagnostics for management and managed clusters

  • Configuration of groups in auditd

  • Container Cloud web UI enhancements for the bare metal provider

  • Day-2 operations for bare metal:

    • Updating modules

    • Configuration enhancements for modules

  • StackLight:

    • Monitoring of LCM issues

    • Refactoring of StackLight expiration alerts

  • Documentation enhancements

2.27.4

Sep 16, 2024

Container Cloud 2.27.4 is the fourth patch release of the 2.27.x release series that introduces the following updates:

  • Support for the patch Cluster releases 16.2.4 and 17.2.4 that represent MOSK patch release 24.2.2.

  • Bare metal: update of Ubuntu mirror to ubuntu-2024-08-21-014714 along with update of minor kernel version to 5.15.0-119-generic.

  • Security fixes for CVEs in images.

2.27.3

Aug 27, 2024

Container Cloud 2.27.3 is the third patch release of the 2.27.x release series that introduces the following updates:

  • Support for the patch Cluster releases 16.2.3 and 17.2.3 that represent MOSK patch release 24.2.1.

  • Support for MKE 3.7.12.

  • Improvements in the MKE benchmark compliance (control ID 5.1.5).

  • Bare metal: update of Ubuntu mirror to ubuntu-2024-08-06-014502 along with update of minor kernel version to 5.15.0-117-generic.

  • VMware vSphere: suspension of support for cluster deployment, update, and attachment.

  • Security fixes for CVEs in images.

2.27.2

Aug 05, 2024

Container Cloud 2.27.2 is the second patch release of the 2.27.x release series that introduces the following updates:

  • Support for the patch Cluster release 16.2.2.

  • Support for the patch Cluster releases 16.1.7 and 17.1.7 that represent MOSK patch release 24.1.7.

  • Support for MKE 3.7.11.

  • Bare metal: update of Ubuntu mirror to ubuntu-2024-07-16-014744 along with update of the minor kernel version to 5.15.0-116-generic (Cluster release 16.2.2).

  • Security fixes for CVEs in images.

2.27.1

Jul 16, 2024

Container Cloud 2.27.1 is the first patch release of the 2.27.x release series that introduces the following updates:

  • Support for the patch Cluster release 16.2.1.

  • Support for the patch Cluster releases 16.1.6 and 17.1.6 that represent MOSK patch release 24.1.6.

  • Support for MKE 3.7.10.

  • Support for docker-ee-cli 23.0.13 in MCR 23.0.11 to fix several CVEs.

  • Bare metal: update of Ubuntu mirror to ubuntu-2024-06-27-095142 along with update of minor kernel version to 5.15.0-113-generic.

  • Security fixes for CVEs in images.

  • Bug fixes.

2.27.0

Jul 02, 2024

  • MKE:

    • MKE 3.7.8 for clusters that follow major update path

    • Improvements in the MKE benchmark compliance

  • Bare metal:

    • General availability for Ubuntu 22.04 on bare metal clusters

    • Improvements in the day-2 management API for bare metal clusters

    • Optimization of strict filtering for devices on bare metal clusters

    • Deprecation of SubnetPool and MetalLBConfigTemplate objects

  • LCM:

    • The ClusterUpdatePlan object for a granular cluster update

    • Update groups for worker machines

    • LCM Agent heartbeats

    • Handling secret leftovers using secret-controller

    • MariaDB backup for bare metal and vSphere providers

  • Ceph:

    • Automatic upgrade from Quincy to Reef

    • Support for Rook v1.13

    • Setting a configuration section for Rook parameters

  • StackLight:

    • Monitoring of I/O errors in kernel logs

    • S.M.A.R.T. metrics for creating alert rules on bare metal clusters

    • Improvements for OpenSearch and OpenSearch Indices Grafana dashboards

    • Removal of grafana-image-renderer

2.26.5

June 18, 2024

Container Cloud 2.26.5 is the fifth patch release of the 2.26.x and MOSK 24.1.x release series that introduces the following updates:

  • Support for the patch Cluster releases 16.1.5 and 17.1.5 that represent MOSK patch release 24.1.5.

  • Bare metal: update of Ubuntu mirror to 20.04~20240517090228 along with update of minor kernel version to 5.15.0-107-generic.

  • Security fixes for CVEs in images.

  • Bug fixes.

2.26.4

May 20, 2024

Container Cloud 2.26.4 is the fourth patch release of the 2.26.x and MOSK 24.1.x release series that introduces the following updates:

  • Support for the patch Cluster releases 16.1.4 and 17.1.4 that represent MOSK patch release 24.1.4.

  • Support for MKE 3.7.8.

  • Bare metal: update of Ubuntu mirror to 20.04~20240502102020 along with update of minor kernel version to 5.15.0-105-generic.

  • Security fixes for CVEs in images.

  • Bug fixes.

2.26.3

Apr 29, 2024

Container Cloud 2.26.3 is the third patch release of the 2.26.x and MOSK 24.1.x release series that introduces the following updates:

  • Support for the patch Cluster releases 16.1.3 and 17.1.3 that represent MOSK patch release 24.1.3.

  • Support for MKE 3.7.7.

  • Bare metal: update of Ubuntu mirror to 20.04~20240411171541 along with update of minor kernel version to 5.15.0-102-generic.

  • Security fixes for CVEs in images.

  • Bug fixes.

2.26.2

Apr 08, 2024

Container Cloud 2.26.2 is the second patch release of the 2.26.x and MOSK 24.1.x release series that introduces the following updates:

  • Support for the patch Cluster releases 16.1.2 and 17.1.2 that represent MOSK patch release 24.1.2.

  • Support for MKE 3.7.6.

  • Support for docker-ee-cli 23.0.10 in MCR 23.0.9 to fix several CVEs.

  • Bare metal: update of Ubuntu mirror to 20.04~20240324172903 along with update of minor kernel version to 5.15.0-101-generic.

  • Security fixes for CVEs in images.

2.26.1

Mar 20, 2024

Container Cloud 2.26.1 is the first patch release of the 2.26.x and MOSK 24.1.x release series that introduces the following updates:

  • Support for the patch Cluster releases 16.1.1 and 17.1.1 that represent MOSK patch release 24.1.1.

  • Support for MKE 3.7.6.

  • Security fixes for CVEs in images.

2.26.0

Mar 04, 2024

  • LCM:

    • Pre-update inspection of pinned product artifacts in a Cluster object

    • Disablement of worker machines on managed clusters

    • Health monitoring of cluster LCM operations

    • Support for MKE 3.7.5 and MCR 23.0.9

  • Security:

    • Support for Kubernetes auditing and profiling on management clusters

    • Policy Controller for validating pod image signatures

    • Configuring trusted certificates for Keycloak

  • Bare metal:

    • Day-2 management API for bare metal clusters

    • Strict filtering for devices on bare metal clusters

    • Dynamic IP allocation for faster host provisioning

    • Cleanup of LVM thin pool volumes during cluster provisioning

    • Wiping a device or partition before a bare metal cluster deployment

    • Container Cloud web UI improvements

  • Ceph:

    • Support for Rook v1.12

    • Support for custom device classes

    • Network policies for Rook Ceph daemons

  • StackLight:

    • Upgraded logging pipeline

    • Support for custom labels during alert injection

  • Documentation enhancements

2.25.4

Jan 10, 2024

Container Cloud 2.25.4 is the fourth patch release of the 2.25.x and MOSK 23.3.x release series that introduces the following updates:

  • Patch Cluster release 17.0.4 for MOSK 23.3.4

  • Patch Cluster release 16.0.4

  • Security fixes for CVEs in images

2.28.3

Important

For MOSK clusters, Container Cloud 2.28.3 is the continuation of the MOSK 24.2.x series using the patch Cluster release 17.2.7. For the update path of the 24.1, 24.2, and 24.3 series, see MOSK documentation: Release Compatibility Matrix - Managed cluster update schema.

The management cluster of a MOSK 24.2.x cluster is automatically updated to the latest patch Cluster release 16.3.3.

The Container Cloud patch release 2.28.3, which is based on the 2.28.0 major release, provides the following updates:

  • Support for the patch Cluster release 16.3.3.

  • Support for the patch Cluster releases 16.2.7 and 17.2.7 that represent Mirantis OpenStack for Kubernetes (MOSK) patch release 24.2.5.

  • Bare metal: update of Ubuntu mirror from ubuntu-2024-10-28-012906 to ubuntu-2024-11-18-003900 along with update of minor kernel version from 5.15.0-124-generic to 5.15.0-125-generic.

  • Security fixes for CVEs in images.

This patch release also supports the latest major Cluster releases 17.3.0 and 16.3.0. It does not support greenfield deployments based on deprecated Cluster releases; use the latest available Cluster release instead.

For main deliverables of the parent Container Cloud release of 2.28.3, refer to 2.28.0.

Security notes

In total, since Container Cloud 2.28.2, 66 Common Vulnerabilities and Exposures (CVEs) have been fixed in 2.28.3: 4 of critical and 62 of high severity.

The table below includes the total numbers of addressed unique and common CVEs in images by product component since Container Cloud 2.28.2. The common CVEs are issues addressed across several images.

Addressed CVEs - summary

Product component

CVE type

Critical

High

Total

Ceph

Unique

0

1

1

Common

0

3

3

Kaas core

Unique

0

4

4

Common

0

7

7

StackLight

Unique

4

21

25

Common

4

52

56

Mirantis Security Portal

For the detailed list of fixed and existing CVEs across the Mirantis Container Cloud and MOSK products, refer to Mirantis Security Portal.

MOSK CVEs

For the number of fixed CVEs in the MOSK-related components including OpenStack and Tungsten Fabric, refer to MOSK 24.2.5: Security notes.

Addressed issues

The following issues have been addressed in the Container Cloud patch release 2.28.3 along with the patch Cluster releases 16.3.3, 16.2.7, and 17.2.7:

  • [47594] [StackLight] Fixed the issue with Patroni pods getting stuck in the CrashLoopBackOff state due to the patroni container being terminated with reason: OOMKilled.

  • [47929] [LCM] Fixed the issue with incorrect permissions set for registry certificate files in /etc/docker/certs.d, which were set to 644 instead of the expected 444.

Known issues

This section lists known issues with workarounds for the Mirantis Container Cloud release 2.28.3 including the Cluster releases 16.2.7, 16.3.3, and 17.2.7.

For other issues that can occur while deploying and operating a Container Cloud cluster, see Deployment Guide: Troubleshooting and Operations Guide: Troubleshooting.

Note

This section also outlines still valid known issues from previous Container Cloud releases.

Bare metal
[47202] Inspection error on bare metal hosts after dnsmasq restart

Note

Moving forward, the workaround for this issue will be moved from Release Notes to MOSK Troubleshooting Guide: Inspection error on bare metal hosts after dnsmasq restart.

If the dnsmasq pod is restarted during the bootstrap of newly added nodes, those nodes may fail to undergo inspection. That can result in inspection error in the corresponding BareMetalHost objects.

The issue can occur when:

  • The dnsmasq pod was moved to another node.

  • DHCP subnets were changed, including addition or removal. In this case, the dhcpd container of the dnsmasq pod is restarted.

    Caution

    If changing or adding DHCP subnets is required to bootstrap new nodes, wait until the dnsmasq pod becomes ready after the change, and only then create BareMetalHost objects. See the readiness check sketch below.
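For convenience, you can wait for the dnsmasq pod readiness from the CLI before creating the BareMetalHost objects. The following is a minimal sketch, assuming that the dnsmasq pod runs in the kaas namespace and carries an app=dnsmasq label; the label selector is an assumption, so verify the actual labels on your cluster first:

# The label selector below is an assumption; verify it with
# 'kubectl -n kaas get pods --show-labels' and adjust as needed.
kubectl -n kaas wait --for=condition=Ready pod -l app=dnsmasq --timeout=10m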

To verify whether the nodes are affected:

  1. Verify whether the BareMetalHost objects contain the inspection error:

    kubectl get bmh -n <managed-cluster-namespace-name>
    

    Example of system response:

    NAME            STATE         CONSUMER        ONLINE   ERROR              AGE
    test-master-1   provisioned   test-master-1   true                        9d
    test-master-2   provisioned   test-master-2   true                        9d
    test-master-3   provisioned   test-master-3   true                        9d
    test-worker-1   provisioned   test-worker-1   true                        9d
    test-worker-2   provisioned   test-worker-2   true                        9d
    test-worker-3   inspecting                    true     inspection error   19h
    
  2. Verify whether the dnsmasq pod was in Ready state when the inspection of the affected baremetal hosts (test-worker-3 in the example above) was started:

    kubectl -n kaas get pod <dnsmasq-pod-name> -oyaml
    

    Example of system response:

    ...
    status:
      conditions:
      - lastProbeTime: null
        lastTransitionTime: "2024-10-10T15:37:34Z"
        status: "True"
        type: Initialized
      - lastProbeTime: null
        lastTransitionTime: "2024-10-11T07:38:54Z"
        status: "True"
        type: Ready
      - lastProbeTime: null
        lastTransitionTime: "2024-10-11T07:38:54Z"
        status: "True"
        type: ContainersReady
      - lastProbeTime: null
        lastTransitionTime: "2024-10-10T15:37:34Z"
        status: "True"
        type: PodScheduled
      containerStatuses:
      - containerID: containerd://6dbcf2fc4b36ce4c549c9191ab01f72d0236c51d42947675302675e4bfaf4cdf
        image: docker-dev-kaas-virtual.artifactory-eu.mcp.mirantis.net/bm/baremetal-dnsmasq:base-2-28-alpine-20240812132650
        imageID: docker-dev-kaas-virtual.artifactory-eu.mcp.mirantis.net/bm/baremetal-dnsmasq@sha256:3dad3e278add18e69b2608e462691c4823942641a0f0e25e6811e703e3c23b3b
        lastState:
          terminated:
            containerID: containerd://816fcf079cd544acd74e312065de5b5ed4dbf1dc6159fefffff4f644b5e45987
            exitCode: 0
            finishedAt: "2024-10-11T07:38:35Z"
            reason: Completed
            startedAt: "2024-10-10T15:37:45Z"
        name: dhcpd
        ready: true
        restartCount: 2
        started: true
        state:
          running:
            startedAt: "2024-10-11T07:38:37Z"
      ...
    

    In the system response above, the dhcpd container was not ready between "2024-10-11T07:38:35Z" and "2024-10-11T07:38:54Z".

  3. Verify the affected baremetal host. For example:

    kubectl get bmh -n managed-ns test-worker-3 -oyaml
    

    Example of system response:

    ...
    status:
      errorCount: 15
      errorMessage: Introspection timeout
      errorType: inspection error
      ...
      operationHistory:
        deprovision:
          end: null
          start: null
        inspect:
          end: null
          start: "2024-10-11T07:38:19Z"
        provision:
          end: null
          start: null
        register:
          end: "2024-10-11T07:38:19Z"
          start: "2024-10-11T07:37:25Z"
    

    In the system response above, inspection was started at "2024-10-11T07:38:19Z", immediately before the period of the dhcpd container downtime. Therefore, this node is most likely affected by the issue.

Workaround

  1. Reboot the node using the IPMI reset or cycle command.

  2. If the node fails to boot, remove the failed BareMetalHost object and create it again:

    1. Remove BareMetalHost object. For example:

      kubectl delete bmh -n managed-ns test-worker-3
      
    2. Verify that the BareMetalHost object is removed:

      kubectl get bmh -n managed-ns test-worker-3
      
    3. Create a BareMetalHost object from the template. For example:

      kubectl create -f bmhc-test-worker-3.yaml
      kubectl create -f bmh-test-worker-3.yaml
      
[42386] A load balancer service does not obtain the external IP address

Due to the MetalLB upstream issue, a load balancer service may not obtain the external IP address.

The issue occurs when two services share the same external IP address and have the same externalTrafficPolicy value. Initially, the services have the external IP address assigned and are accessible. After modifying the externalTrafficPolicy value for both services from Cluster to Local, the first service that has been changed remains with no external IP address assigned. However, the second service, which was changed later, has the external IP assigned as expected.

To work around the issue, make a dummy change to the service object where external IP is <pending>:

  1. Identify the service that is stuck:

    kubectl get svc -A | grep pending
    

    Example of system response:

    stacklight  iam-proxy-prometheus  LoadBalancer  10.233.28.196  <pending>  443:30430/TCP
    
  2. Add an arbitrary label to the service that is stuck. For example:

    kubectl label svc -n stacklight iam-proxy-prometheus reconcile=1
    

    Example of system response:

    service/iam-proxy-prometheus labeled
    
  3. Verify that the external IP was allocated to the service:

    kubectl get svc -n stacklight iam-proxy-prometheus
    

    Example of system response:

    NAME                  TYPE          CLUSTER-IP     EXTERNAL-IP  PORT(S)        AGE
    iam-proxy-prometheus  LoadBalancer  10.233.28.196  10.0.34.108  443:30430/TCP  12d
    
[24005] Deletion of a node with ironic Pod is stuck in the Terminating state

During deletion of a manager machine running the ironic Pod from a bare metal management cluster, the following problems occur:

  • All Pods are stuck in the Terminating state

  • A new ironic Pod fails to start

  • The related bare metal host is stuck in the deprovisioning state

As a workaround, before deletion of the node running the ironic Pod, cordon and drain the node using the kubectl cordon <nodeName> and kubectl drain <nodeName> commands.
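For illustration, a minimal command sketch of the cordon and drain steps; the drain flags shown are commonly required options and may need adjusting to the workloads running on the node:

# Prevent new Pods from being scheduled on the node
kubectl cordon <nodeName>
# Evict the running Pods from the node before deleting it
kubectl drain <nodeName> --ignore-daemonsets --delete-emptydir-data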


Ceph
[50566] Ceph upgrade is very slow during patch or major cluster update

Due to the upstream Ceph issue 66717, during a CVE upgrade of the Ceph daemon image of Ceph Reef 18.2.4, OSDs may start slowly and even fail the startup probe with the following describe output in the rook-ceph-osd-X pod:

 Warning  Unhealthy  57s (x16 over 3m27s)  kubelet  Startup probe failed:
 ceph daemon health check failed with the following output:
> no valid command found; 10 closest matches:
> 0
> 1
> 2
> abort
> assert
> bluefs debug_inject_read_zeros
> bluefs files list
> bluefs stats
> bluestore bluefs device info [<alloc_size:int>]
> config diff
> admin_socket: invalid command

Workaround:

Complete the following steps during every patch or major cluster update of the Cluster releases 17.2.x, 17.3.x, and 17.4.x (until Ceph 18.2.5 becomes supported):

  1. Plan extra time in the maintenance window for the patch cluster update.

    Slow starts will still impact the update procedure, but after completing the following step, the recovery process noticeably shortens without affecting the overall cluster state and data responsiveness.

  2. Select one of the following options:

    • Before the cluster update, set the noout flag:

      ceph osd set noout
      

      Once the Ceph OSDs image upgrade is done, unset the flag:

      ceph osd unset noout
      
    • Monitor the Ceph OSDs image upgrade. If the symptoms of a slow start appear, set the noout flag as soon as possible. Once the Ceph OSDs image upgrade is done, unset the flag. See the verification sketch below.
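To verify whether the noout flag is currently set, you can inspect the cluster flags, for example:

# The 'noout' flag is listed among the cluster flags while it is set
ceph osd dump | grep flags
# Alternatively, the health section of 'ceph -s' reports 'noout flag(s) set'
ceph -s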

[26441] Cluster update fails with the MountDevice failed for volume warning

Update of a managed cluster based on bare metal with Ceph enabled fails with a PersistentVolumeClaim getting stuck in the Pending state for the prometheus-server StatefulSet and the MountVolume.MountDevice failed for volume warning in the StackLight event logs.

Workaround:

  1. Verify that the description of the Pods that failed to run contain the FailedMount events:

    kubectl -n <affectedProjectName> describe pod <affectedPodName>
    

    In the command above, replace the following values:

    • <affectedProjectName> is the Container Cloud project name where the Pods failed to run

    • <affectedPodName> is a Pod name that failed to run in the specified project

    In the Pod description, identify the node name where the Pod failed to run.

  2. Verify that the csi-rbdplugin logs of the affected node contain the rbd volume mount failed: <csi-vol-uuid> is being used error. The <csi-vol-uuid> is a unique RBD volume name.

    1. Identify csiPodName of the corresponding csi-rbdplugin:

      kubectl -n rook-ceph get pod -l app=csi-rbdplugin \
      -o jsonpath='{.items[?(@.spec.nodeName == "<nodeName>")].metadata.name}'
      
    2. Output the affected csiPodName logs:

      kubectl -n rook-ceph logs <csiPodName> -c csi-rbdplugin
      
  3. Scale down the affected StatefulSet or Deployment of the failing Pod to 0 replicas. See the scaling sketch after this procedure.

  4. On every csi-rbdplugin Pod, search for stuck csi-vol:

    for pod in `kubectl -n rook-ceph get pods|grep rbdplugin|grep -v provisioner|awk '{print $1}'`; do
      echo $pod
      kubectl exec -it -n rook-ceph $pod -c csi-rbdplugin -- rbd device list | grep <csi-vol-uuid>
    done
    
  5. Unmap the affected csi-vol:

    rbd unmap -o force /dev/rbd<i>
    

    The /dev/rbd<i> value is a mapped RBD volume that uses csi-vol.

  6. Delete volumeattachment of the affected Pod:

    kubectl get volumeattachments | grep <csi-vol-uuid>
    kubectl delete volumeattachment <id>
    
  7. Scale up the affected StatefulSet or Deployment back to the original number of replicas and wait until its state becomes Running.
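The scale-down and scale-up operations from steps 3 and 7 can be performed with kubectl scale. A minimal sketch, assuming that the affected workload is the prometheus-server StatefulSet in the stacklight namespace mentioned in the issue description:

# Record the current number of replicas before scaling down
kubectl -n stacklight get statefulset prometheus-server
# Step 3: scale the affected workload down to 0 replicas
kubectl -n stacklight scale statefulset prometheus-server --replicas=0
# Step 7: scale it back up to the original number of replicas, for example, 1
kubectl -n stacklight scale statefulset prometheus-server --replicas=1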


LCM
[39437] Failure to replace a master node on a Container Cloud cluster

Fixed in 2.29.0 (17.4.0 and 16.4.0)

During the replacement of a master node on a cluster of any type, the process may get stuck with Kubelet's NodeReady condition is Unknown in the machine status on the remaining master nodes.

As a workaround, log in on the affected node and run the following command:

docker restart ucp-kubelet
[31186,34132] Pods get stuck during MariaDB operations

During MariaDB operations on a management cluster, Pods may get stuck in continuous restarts with the following example error:

[ERROR] WSREP: Corrupt buffer header: \
addr: 0x7faec6f8e518, \
seqno: 3185219421952815104, \
size: 909455917, \
ctx: 0x557094f65038, \
flags: 11577. store: 49, \
type: 49

Workaround:

  1. Create a backup of the /var/lib/mysql directory on the affected mariadb-server Pod. See the command sketch after this procedure.

  2. Verify that other replicas are up and ready.

  3. Remove the galera.cache file for the affected mariadb-server Pod.

  4. Remove the affected mariadb-server Pod or wait until it is automatically restarted.

After Kubernetes restarts the Pod, the Pod clones the database in 1-2 minutes and restores the quorum.
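A minimal command sketch for the steps above, assuming that the affected Pod is mariadb-server-0 in the kaas namespace and that the MariaDB data resides in /var/lib/mysql; the Pod name and namespace are assumptions, so adjust them to your environment:

# Back up the MariaDB data directory from the affected Pod
kubectl cp kaas/mariadb-server-0:/var/lib/mysql ./mysql-backup
# Remove the galera.cache file for the affected replica
kubectl -n kaas exec mariadb-server-0 -- rm -f /var/lib/mysql/galera.cache
# Delete the affected Pod so that Kubernetes recreates it and clones the database
kubectl -n kaas delete pod mariadb-server-0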

[30294] Replacement of a master node is stuck on the calico-node Pod start

Fixed in 2.28.4 (17.3.4 and 16.3.4)

During replacement of a master node on a cluster of any type, the calico-node Pod fails to start on a new node that has the same IP address as the node being replaced.

Workaround:

  1. Log in to any master node.

  2. From a CLI with an MKE client bundle, create a shell alias to start calicoctl using the mirantis/ucp-dsinfo image:

    alias calicoctl="\
    docker run -i --rm \
    --pid host \
    --net host \
    -e constraint:ostype==linux \
    -e ETCD_ENDPOINTS=<etcdEndpoint> \
    -e ETCD_KEY_FILE=/var/lib/docker/volumes/ucp-kv-certs/_data/key.pem \
    -e ETCD_CA_CERT_FILE=/var/lib/docker/volumes/ucp-kv-certs/_data/ca.pem \
    -e ETCD_CERT_FILE=/var/lib/docker/volumes/ucp-kv-certs/_data/cert.pem \
    -v /var/run/calico:/var/run/calico \
    -v /var/lib/docker/volumes/ucp-kv-certs/_data:/var/lib/docker/volumes/ucp-kv-certs/_data:ro \
    mirantis/ucp-dsinfo:<mkeVersion> \
    calicoctl \
    "
    
    alias calicoctl="\
    docker run -i --rm \
    --pid host \
    --net host \
    -e constraint:ostype==linux \
    -e ETCD_ENDPOINTS=<etcdEndpoint> \
    -e ETCD_KEY_FILE=/ucp-node-certs/key.pem \
    -e ETCD_CA_CERT_FILE=/ucp-node-certs/ca.pem \
    -e ETCD_CERT_FILE=/ucp-node-certs/cert.pem \
    -v /var/run/calico:/var/run/calico \
    -v ucp-node-certs:/ucp-node-certs:ro \
    mirantis/ucp-dsinfo:<mkeVersion> \
    calicoctl --allow-version-mismatch \
    "
    

    In the above command, replace the following values with the corresponding settings of the affected cluster:

    • <etcdEndpoint> is the etcd endpoint defined in the Calico configuration file. For example, ETCD_ENDPOINTS=127.0.0.1:12378

    • <mkeVersion> is the MKE version installed on your cluster. For example, mirantis/ucp-dsinfo:3.5.7.

  3. Verify the node list on the cluster:

    kubectl get node
    
  4. Compare this list with the node list in Calico to identify the old node:

    calicoctl get node -o wide
    
  5. Remove the old node from Calico:

    calicoctl delete node kaas-node-<nodeID>
    
[5782] Manager machine fails to be deployed during node replacement

Fixed in 2.28.4 (17.3.4 and 16.3.4)

During replacement of a manager machine, the following problems may occur:

  • The system adds the node to Docker swarm but not to Kubernetes

  • The node Deployment gets stuck with failed RethinkDB health checks

Workaround:

  1. Delete the failed node.

  2. Wait for the MKE cluster to become healthy. To monitor the cluster status:

    1. Log in to the MKE web UI as described in Connect to the Mirantis Kubernetes Engine web UI.

    2. Monitor the cluster status as described in MKE Operations Guide: Monitor an MKE cluster with the MKE web UI.

  3. Deploy a new node.

[5568] The calico-kube-controllers Pod fails to clean up resources

Fixed in 2.28.4 (17.3.4 and 16.3.4)

During the unsafe or forced deletion of a manager machine running the calico-kube-controllers Pod in the kube-system namespace, the following issues occur:

  • The calico-kube-controllers Pod fails to clean up resources associated with the deleted node

  • The calico-node Pod may fail to start up on a newly created node if the machine is provisioned with the same IP address as the deleted machine had

As a workaround, before deletion of the node running the calico-kube-controllers Pod, cordon and drain the node:

kubectl cordon <nodeName>
kubectl drain <nodeName>

StackLight
[44193] OpenSearch reaches 85% disk usage watermark affecting the cluster state

Fixed in 2.29.0 (17.4.0 and 16.4.0)

On High Availability (HA) clusters that use Local Volume Provisioner (LVP), Prometheus and OpenSearch from StackLight may share the same pool of storage. In such a configuration, OpenSearch may approach the 85% disk usage watermark due to the combined storage allocation and usage patterns set by the Persistent Volume Claim (PVC) size parameters for Prometheus and OpenSearch, which consume the most storage.

When the 85% threshold is reached, the affected node is transitioned to the read-only state, preventing shard allocation and causing the OpenSearch cluster state to transition to Warning (Yellow) or Critical (Red).

Caution

The issue and the provided workaround apply only for clusters on which OpenSearch and Prometheus utilize the same storage pool.

To verify that the cluster is affected:

  1. Verify the result of the following formula:

    0.8 × OpenSearch_PVC_Size_GB + Prometheus_PVC_Size_GB > 0.85 × Total_Storage_Capacity_GB
    

    In the formula, define the following values:

    OpenSearch_PVC_Size_GB

    Derived from .values.elasticsearch.persistentVolumeUsableStorageSizeGB, defaulting to .values.elasticsearch.persistentVolumeClaimSize if unspecified. To obtain the OpenSearch PVC size:

    kubectl -n <namespaceName> get cluster <clusterName> -o yaml |\
    yq '.spec.providerSpec.value.helmReleases[] | select(.name == "stacklight") | .values.elasticsearch.persistentVolumeClaimSize '
    

    Example of system response:

    10000Gi
    
    Prometheus_PVC_Size_GB

    Sourced from .values.prometheusServer.persistentVolumeClaimSize. To obtain the Prometheus PVC size:

    kubectl -n <namespaceName> get cluster <clusterName> -o yaml |\
    yq '.spec.providerSpec.value.helmReleases[] | select(.name == "stacklight") | .values.prometheusServer.persistentVolumeClaimSize '
    

    Example of system response:

    4000Gi
    
    Total_Storage_Capacity_GB

    Total capacity of the OpenSearch PVCs. For LVP, the capacity of the storage pool. To obtain the total capacity:

    kubectl get pvc -n stacklight -l app=opensearch-master \
    -o custom-columns=NAME:.metadata.name,CAPACITY:.status.capacity.storage
    

    The system response contains multiple outputs, one per opensearch-master node. Select the capacity for the affected node.

    Note

    Convert the values to GB if they are set in different units.

    If the formula result is positive, it is an early indication that the cluster is affected. See the worked example after this procedure.

  2. Verify whether the OpenSearchClusterStatusWarning or OpenSearchClusterStatusCritical alert is firing. And if so, verify the following:

    1. Log in to the OpenSearch web UI.

    2. In Management -> Dev Tools, run the following command:

      GET _cluster/allocation/explain
      

      The following system response indicates that the corresponding node is affected:

      "explanation": "the node is above the low watermark cluster setting \
      [cluster.routing.allocation.disk.watermark.low=85%], using more disk space \
      than the maximum allowed [85.0%], actual free: [xx.xxx%]"
      

      Note

      The system response may contain a watermark percentage even higher than 85.0%, depending on the case.
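For illustration, a worked example of the verification formula using the PVC sizes from the example outputs above (10000 Gi for OpenSearch and 4000 Gi for Prometheus, treated as GB for simplicity) and a hypothetical total storage capacity of 14000 GB:

0.8 × 10000 + 4000 = 12000
0.85 × 14000 = 11900
12000 > 11900

Because the left side exceeds the right side, the formula result is positive and this example cluster is affected.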

Workaround:

Warning

The workaround implies adjustment of the retention threshold for OpenSearch. Depending on the new threshold, some old logs will be deleted.

  1. Adjust or set .values.elasticsearch.persistentVolumeUsableStorageSizeGB to a lower value so that the verification formula above becomes non-positive. For configuration details, see MOSK Operations Guide: StackLight configuration parameters - OpenSearch.

    Mirantis also recommends reserving some space for other PVCs using storage from the pool. Use the following formula to calculate the required space:

    persistentVolumeUsableStorageSizeGB =
    0.84 × ((1 - Reserved_Percentage - Filesystem_Reserve) ×
    Total_Storage_Capacity_GB - Prometheus_PVC_Size_GB) /
    0.8
    

    In the formula, define the following values:

    Reserved_Percentage

    A user-defined variable that specifies what percentage of the total storage capacity should not be used by OpenSearch or Prometheus. This is used to reserve space for other components. It should be expressed as a decimal. For example, for 5% of reservation, Reserved_Percentage is 0.05. Mirantis recommends using 0.05 as a starting point.

    Filesystem_Reserve

    Percentage to deduct for filesystems that may reserve some portion of the available storage, which is marked as occupied. For example, for EXT4, it is 5% by default, so the value must be 0.05.

    Prometheus_PVC_Size_GB

    Sourced from .values.prometheusServer.persistentVolumeClaimSize.

    Total_Storage_Capacity_GB

    Total capacity of the OpenSearch PVCs. For LVP, the capacity of the storage pool. To obtain the total capacity:

    kubectl get pvc -n stacklight -l app=opensearch-master \
    -o custom-columns=NAME:.metadata.name,CAPACITY:.status.capacity.storage
    

    The system response contains multiple outputs, one per opensearch-master node. Select the capacity for the affected node.

    Note

    Convert the values to GB if they are set in different units.

    Calculating the above formula provides the maximum safe storage to allocate for .values.elasticsearch.persistentVolumeUsableStorageSizeGB. Use this formula as a reference when setting .values.elasticsearch.persistentVolumeUsableStorageSizeGB on a cluster; see the worked example after this procedure.

  2. Wait up to 15-20 minutes for OpenSearch to perform the cleanup.

  3. Verify that the cluster is not affected anymore using the procedure above.
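For illustration, a worked example of the sizing formula using the same hypothetical values as in the verification example above (Prometheus PVC of 4000 GB, total storage capacity of 14000 GB) with Reserved_Percentage and Filesystem_Reserve both set to 0.05:

persistentVolumeUsableStorageSizeGB =
0.84 × ((1 - 0.05 - 0.05) × 14000 - 4000) / 0.8 =
0.84 × (12600 - 4000) / 0.8 =
0.84 × 8600 / 0.8 = 9030

In this example, approximately 9030 GB is the maximum safe value to set for .values.elasticsearch.persistentVolumeUsableStorageSizeGB.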


Container Cloud web UI
[50181] Failure to deploy a compact cluster

A compact MOSK cluster fails to be deployed through the Container Cloud web UI because the web UI does not allow adding labels to the control plane machines or changing the dedicatedControlPlane: false setting.

To work around the issue, manually add the required labels using CLI. Once done, the cluster deployment resumes.

[50168] Inability to use a new project right after creation

A newly created project does not display all available tabs in the Container Cloud web UI and shows various access denied errors during the first five minutes after creation.

To work around the issue, wait five minutes after the project creation and refresh the browser.

Artifacts

This section lists the artifacts of components included in the Container Cloud patch release 2.28.3. For artifacts of the Cluster releases introduced in 2.28.3, see patch Cluster releases 17.2.7, 16.3.3, and 16.2.7.

Note

The components that are newly added, updated, deprecated, or removed as compared to the previous release version, are marked with a corresponding superscript, for example, lcm-ansible Updated.

Bare metal artifacts

Artifact

Component

Path

Binaries

ironic-python-agent.initramfs Updated

https://binary.mirantis.com/bm/bin/ironic/ipa/ubuntu/initramfs-antelope-jammy-debug-20241118155355

ironic-python-agent.kernel Updated

https://binary.mirantis.com/bm/bin/ironic/ipa/ubuntu/kernel-antelope-jammy-debug-20241118155355

provisioning_ansible

https://binary.mirantis.com/bm/bin/ansible/provisioning_ansible-0.1.1-167-e7a55fd.tgz

Helm charts Updated

baremetal-api

https://binary.mirantis.com/core/helm/baremetal-api-1.41.23.tgz

baremetal-operator

https://binary.mirantis.com/core/helm/baremetal-operator-1.41.23.tgz

baremetal-provider

https://binary.mirantis.com/core/helm/baremetal-provider-1.41.23.tgz

baremetal-public-api

https://binary.mirantis.com/core/helm/baremetal-public-api-1.41.23.tgz

kaas-ipam

https://binary.mirantis.com/core/helm/kaas-ipam-1.41.23.tgz

Docker images

ambassador Updated

mirantis.azurecr.io/core/external/nginx:1.41.23

baremetal-dnsmasq

mirantis.azurecr.io/bm/baremetal-dnsmasq:base-2-28-alpine-20241022121257

baremetal-operator Updated

mirantis.azurecr.io/bm/baremetal-operator:base-2-28-alpine-20241111132119

bm-collective

mirantis.azurecr.io/bm/bm-collective:base-2-28-alpine-20241022120001

cluster-api-provider-baremetal Updated

mirantis.azurecr.io/core/cluster-api-provider-baremetal:1.41.23

ironic Updated

mirantis.azurecr.io/openstack/ironic:antelope-jammy-20241128095555

ironic-inspector Updated

mirantis.azurecr.io/openstack/ironic-inspector:antelope-jammy-20241128095555

ironic-prometheus-exporter

mirantis.azurecr.io/stacklight/ironic-prometheus-exporter:0.1-20240913123302

kaas-ipam

mirantis.azurecr.io/bm/kaas-ipam:base-2-28-alpine-20241022122006

kubernetes-entrypoint

mirantis.azurecr.io/openstack/extra/kubernetes-entrypoint:v1.0.1-34a4f54-20240910081335

mariadb

mirantis.azurecr.io/general/mariadb:10.6.17-jammy-20240927170336

syslog-ng

mirantis.azurecr.io/bm/syslog-ng:base-alpine-20241022120929

Core artifacts

Artifact

Component

Path

Bootstrap tarball Updated

bootstrap-darwin

https://binary.mirantis.com/core/bin/bootstrap-darwin-1.41.23.tgz

bootstrap-linux

https://binary.mirantis.com/core/bin/bootstrap-linux-1.41.23.tgz

Helm charts Updated

admission-controller

https://binary.mirantis.com/core/helm/admission-controller-1.41.23.tgz

agent-controller

https://binary.mirantis.com/core/helm/agent-controller-1.41.23.tgz

ceph-kcc-controller

https://binary.mirantis.com/core/helm/ceph-kcc-controller-1.41.23.tgz

cert-manager

https://binary.mirantis.com/core/helm/cert-manager-1.41.23.tgz

configuration-collector

https://binary.mirantis.com/core/helm/configuration-collector-1.41.23.tgz

credentials-controller

https://binary.mirantis.com/core/helm/credentials-controller-1.41.23.tgz

event-controller

https://binary.mirantis.com/core/helm/event-controller-1.41.23.tgz

host-os-modules-controller

https://binary.mirantis.com/core/helm/host-os-modules-controller-1.41.23.tgz

iam-controller

https://binary.mirantis.com/core/helm/iam-controller-1.41.23.tgz

kaas-exporter

https://binary.mirantis.com/core/helm/kaas-exporter-1.41.23.tgz

kaas-public-api

https://binary.mirantis.com/core/helm/kaas-public-api-1.41.23.tgz

kaas-ui

https://binary.mirantis.com/core/helm/kaas-ui-1.41.23.tgz

lcm-controller

https://binary.mirantis.com/core/helm/lcm-controller-1.41.23.tgz

license-controller

https://binary.mirantis.com/core/helm/license-controller-1.41.23.tgz

machinepool-controller

https://binary.mirantis.com/core/helm/machinepool-controller-1.41.23.tgz

mcc-cache

https://binary.mirantis.com/core/helm/mcc-cache-1.41.23.tgz

mcc-cache-warmup

https://binary.mirantis.com/core/helm/mcc-cache-warmup-1.41.23.tgz

openstack-provider

https://binary.mirantis.com/core/helm/openstack-provider-1.41.23.tgz

portforward-controller

https://binary.mirantis.com/core/helm/portforward-controller-1.41.23.tgz

rbac-controller

https://binary.mirantis.com/core/helm/rbac-controller-1.41.23.tgz

release-controller

https://binary.mirantis.com/core/helm/release-controller-1.41.23.tgz

scope-controller

https://binary.mirantis.com/core/helm/scope-controller-1.41.23.tgz

secret-controller

https://binary.mirantis.com/core/helm/secret-controller-1.41.23.tgz

user-controller

https://binary.mirantis.com/core/helm/user-controller-1.41.23.tgz

Docker images

admission-controller Updated

mirantis.azurecr.io/core/admission-controller:1.41.23

agent-controller Updated

mirantis.azurecr.io/core/agent-controller:1.41.23

ceph-kcc-controller Updated

mirantis.azurecr.io/core/ceph-kcc-controller:1.41.23

cert-manager-controller

mirantis.azurecr.io/core/external/cert-manager-controller:v1.11.0-8

configuration-collector Updated

mirantis.azurecr.io/core/configuration-collector:1.41.23

credentials-controller Updated

mirantis.azurecr.io/core/credentials-controller:1.41.23

event-controller Updated

mirantis.azurecr.io/core/event-controller:1.41.23

frontend Updated

mirantis.azurecr.io/core/frontend:1.41.23

host-os-modules-controller Updated

mirantis.azurecr.io/core/host-os-modules-controller:1.41.23

iam-controller Updated

mirantis.azurecr.io/core/iam-controller:1.41.23

kaas-exporter Updated

mirantis.azurecr.io/core/kaas-exporter:1.41.23

kproxy Updated

mirantis.azurecr.io/core/kproxy:1.41.23

lcm-controller Updated

mirantis.azurecr.io/core/lcm-controller:1.41.23

license-controller Updated

mirantis.azurecr.io/core/license-controller:1.41.23

machinepool-controller Updated

mirantis.azurecr.io/core/machinepool-controller:1.41.23

mcc-cache-warmup Updated

mirantis.azurecr.io/core/mcc-cache-warmup:1.41.23

nginx Updated

mirantis.azurecr.io/core/external/nginx:1.41.23

openstack-cluster-api-controller Updated

mirantis.azurecr.io/core/openstack-cluster-api-controller:1.41.23

portforward-controller Updated

mirantis.azurecr.io/core/portforward-controller:1.41.23

rbac-controller Updated

mirantis.azurecr.io/core/rbac-controller:1.41.23

registry

mirantis.azurecr.io/lcm/registry:v2.8.1-14

release-controller Updated

mirantis.azurecr.io/core/release-controller:1.41.23

scope-controller Updated

mirantis.azurecr.io/core/scope-controller:1.41.23

secret-controller Updated

mirantis.azurecr.io/core/secret-controller:1.41.23

user-controller Updated

mirantis.azurecr.io/core/user-controller:1.41.23

IAM artifacts

Artifact

Component

Path

Binaries

hash-generate-darwin

http://binary.mirantis.com/iam/bin/hash-generate-0.0.1-268-3cf7f17-darwin

hash-generate-linux

http://binary.mirantis.com/iam/bin/hash-generate-0.0.1-268-3cf7f17-linux

Helm charts Updated

iam

https://binary.mirantis.com/core/helm/iam-1.41.23.tgz

Docker images

kubectl

mirantis.azurecr.io/general/kubectl:20240926142019

kubernetes-entrypoint

mirantis.azurecr.io/openstack/extra/kubernetes-entrypoint:v1.0.1-ba8ada4-20240405150338

mariadb

mirantis.azurecr.io/general/mariadb:10.6.17-focal-20240909113408

mcc-keycloak Updated

mirantis.azurecr.io/iam/mcc-keycloak:25.0.6-20241114073807

2.28.2

Important

For MOSK clusters, Container Cloud 2.28.2 is the continuation of the MOSK 24.2.x series using the patch Cluster release 17.2.6. For the update path of the 24.1, 24.2, and 24.3 series, see MOSK documentation: Release Compatibility Matrix - Managed cluster update schema.

The management cluster of a MOSK 24.2.x cluster is automatically updated to the latest patch Cluster release 16.3.2.

The Container Cloud patch release 2.28.2, which is based on the 2.28.0 major release, provides the following updates:

  • Support for the patch Cluster release 16.3.2.

  • Support for the patch Cluster releases 16.2.6 and 17.2.6 that represent Mirantis OpenStack for Kubernetes (MOSK) patch release 24.2.4.

  • Support for MKE 3.7.16.

  • Bare metal: update of Ubuntu mirror from ubuntu-2024-10-14-013948 to ubuntu-2024-10-28-012906 along with update of minor kernel version from 5.15.0-122-generic to 5.15.0-124-generic.

  • Security fixes for CVEs in images.

This patch release also supports the latest major Cluster releases 17.3.0 and 16.3.0. It does not support greenfield deployments based on deprecated Cluster releases; use the latest available Cluster release instead.

For main deliverables of the parent Container Cloud release of 2.28.2, refer to 2.28.0.

Security notes

In total, since Container Cloud 2.28.1, 15 Common Vulnerabilities and Exposures (CVEs) of high severity have been fixed in 2.28.2.

The table below includes the total numbers of addressed unique and common CVEs in images by product component since Container Cloud 2.28.1. The common CVEs are issues addressed across several images.

Addressed CVEs - summary

Product component

CVE type

Critical

High

Total

Kaas core

Unique

0

5

5

Common

0

9

9

StackLight

Unique

0

1

1

Common

0

6

6

Mirantis Security Portal

For the detailed list of fixed and existing CVEs across the Mirantis Container Cloud and MOSK products, refer to Mirantis Security Portal.

MOSK CVEs

For the number of fixed CVEs in the MOSK-related components including OpenStack and Tungsten Fabric, refer to MOSK 24.2.4: Security notes.

Addressed issues

The following issues have been addressed in the Container Cloud patch release 2.28.2 along with the patch Cluster releases 16.3.2, 16.2.6, and 17.2.6.

  • [47741] [LCM] Fixed the issue with upgrade to MKE 3.7.15 getting stuck due to the leftover ucp-upgrade-check-images service that is part of MKE 3.7.12.

  • [47304] [StackLight] Fixed the issue with OpenSearch not storing kubelet logs due to the JSON-based format of ucp-kubelet logs.

Known issues

This section lists known issues with workarounds for the Mirantis Container Cloud release 2.28.2 including the Cluster releases 16.2.6, 16.3.2, and 17.2.6.

For other issues that can occur while deploying and operating a Container Cloud cluster, see Deployment Guide: Troubleshooting and Operations Guide: Troubleshooting.

Note

This section also outlines still valid known issues from previous Container Cloud releases.

Bare metal
[47202] Inspection error on bare metal hosts after dnsmasq restart

Note

Moving forward, the workaround for this issue will be moved from Release Notes to MOSK Troubleshooting Guide: Inspection error on bare metal hosts after dnsmasq restart.

If the dnsmasq pod is restarted during the bootstrap of newly added nodes, those nodes may fail to undergo inspection. That can result in inspection error in the corresponding BareMetalHost objects.

The issue can occur when:

  • The dnsmasq pod was moved to another node.

  • DHCP subnets were changed, including addition or removal. In this case, the dhcpd container of the dnsmasq pod is restarted.

    Caution

    If changing or adding DHCP subnets is required to bootstrap new nodes, wait until the dnsmasq pod becomes ready after the change, and only then create BareMetalHost objects.

To verify whether the nodes are affected:

  1. Verify whether the BareMetalHost objects contain the inspection error:

    kubectl get bmh -n <managed-cluster-namespace-name>
    

    Example of system response:

    NAME            STATE         CONSUMER        ONLINE   ERROR              AGE
    test-master-1   provisioned   test-master-1   true                        9d
    test-master-2   provisioned   test-master-2   true                        9d
    test-master-3   provisioned   test-master-3   true                        9d
    test-worker-1   provisioned   test-worker-1   true                        9d
    test-worker-2   provisioned   test-worker-2   true                        9d
    test-worker-3   inspecting                    true     inspection error   19h
    
  2. Verify whether the dnsmasq pod was in Ready state when the inspection of the affected baremetal hosts (test-worker-3 in the example above) was started:

    kubectl -n kaas get pod <dnsmasq-pod-name> -oyaml
    

    Example of system response:

    ...
    status:
      conditions:
      - lastProbeTime: null
        lastTransitionTime: "2024-10-10T15:37:34Z"
        status: "True"
        type: Initialized
      - lastProbeTime: null
        lastTransitionTime: "2024-10-11T07:38:54Z"
        status: "True"
        type: Ready
      - lastProbeTime: null
        lastTransitionTime: "2024-10-11T07:38:54Z"
        status: "True"
        type: ContainersReady
      - lastProbeTime: null
        lastTransitionTime: "2024-10-10T15:37:34Z"
        status: "True"
        type: PodScheduled
      containerStatuses:
      - containerID: containerd://6dbcf2fc4b36ce4c549c9191ab01f72d0236c51d42947675302675e4bfaf4cdf
        image: docker-dev-kaas-virtual.artifactory-eu.mcp.mirantis.net/bm/baremetal-dnsmasq:base-2-28-alpine-20240812132650
        imageID: docker-dev-kaas-virtual.artifactory-eu.mcp.mirantis.net/bm/baremetal-dnsmasq@sha256:3dad3e278add18e69b2608e462691c4823942641a0f0e25e6811e703e3c23b3b
        lastState:
          terminated:
            containerID: containerd://816fcf079cd544acd74e312065de5b5ed4dbf1dc6159fefffff4f644b5e45987
            exitCode: 0
            finishedAt: "2024-10-11T07:38:35Z"
            reason: Completed
            startedAt: "2024-10-10T15:37:45Z"
        name: dhcpd
        ready: true
        restartCount: 2
        started: true
        state:
          running:
            startedAt: "2024-10-11T07:38:37Z"
      ...
    

    In the system response above, the dhcpd container was not ready between "2024-10-11T07:38:35Z" and "2024-10-11T07:38:54Z".

  3. Verify the affected baremetal host. For example:

    kubectl get bmh -n managed-ns test-worker-3 -oyaml
    

    Example of system response:

    ...
    status:
      errorCount: 15
      errorMessage: Introspection timeout
      errorType: inspection error
      ...
      operationHistory:
        deprovision:
          end: null
          start: null
        inspect:
          end: null
          start: "2024-10-11T07:38:19Z"
        provision:
          end: null
          start: null
        register:
          end: "2024-10-11T07:38:19Z"
          start: "2024-10-11T07:37:25Z"
    

    In the system response above, inspection was started at "2024-10-11T07:38:19Z", immediately before the period of the dhcpd container downtime. Therefore, this node is most likely affected by the issue.

Workaround

  1. Reboot the node using the IPMI reset or cycle command.

  2. If the node fails to boot, remove the failed BareMetalHost object and create it again:

    1. Remove BareMetalHost object. For example:

      kubectl delete bmh -n managed-ns test-worker-3
      
    2. Verify that the BareMetalHost object is removed:

      kubectl get bmh -n managed-ns test-worker-3
      
    3. Create a BareMetalHost object from the template. For example:

      kubectl create -f bmhc-test-worker-3.yaml
      kubectl create -f bmh-test-worker-3.yaml
      
[42386] A load balancer service does not obtain the external IP address

Due to the MetalLB upstream issue, a load balancer service may not obtain the external IP address.

The issue occurs when two services share the same external IP address and have the same externalTrafficPolicy value. Initially, the services have the external IP address assigned and are accessible. After modifying the externalTrafficPolicy value for both services from Cluster to Local, the first service that has been changed remains with no external IP address assigned. However, the second service, which was changed later, has the external IP assigned as expected.

To work around the issue, make a dummy change to the service object where external IP is <pending>:

  1. Identify the service that is stuck:

    kubectl get svc -A | grep pending
    

    Example of system response:

    stacklight  iam-proxy-prometheus  LoadBalancer  10.233.28.196  <pending>  443:30430/TCP
    
  2. Add an arbitrary label to the service that is stuck. For example:

    kubectl label svc -n stacklight iam-proxy-prometheus reconcile=1
    

    Example of system response:

    service/iam-proxy-prometheus labeled
    
  3. Verify that the external IP was allocated to the service:

    kubectl get svc -n stacklight iam-proxy-prometheus
    

    Example of system response:

    NAME                  TYPE          CLUSTER-IP     EXTERNAL-IP  PORT(S)        AGE
    iam-proxy-prometheus  LoadBalancer  10.233.28.196  10.0.34.108  443:30430/TCP  12d
    
[24005] Deletion of a node with ironic Pod is stuck in the Terminating state

During deletion of a manager machine running the ironic Pod from a bare metal management cluster, the following problems occur:

  • All Pods are stuck in the Terminating state

  • A new ironic Pod fails to start

  • The related bare metal host is stuck in the deprovisioning state

As a workaround, before deletion of the node running the ironic Pod, cordon and drain the node using the kubectl cordon <nodeName> and kubectl drain <nodeName> commands.


Ceph
[50566] Ceph upgrade is very slow during patch or major cluster update

Due to the upstream Ceph issue 66717, during a CVE upgrade of the Ceph daemon image of Ceph Reef 18.2.4, OSDs may start slowly and even fail the startup probe with the following describe output in the rook-ceph-osd-X pod:

 Warning  Unhealthy  57s (x16 over 3m27s)  kubelet  Startup probe failed:
 ceph daemon health check failed with the following output:
> no valid command found; 10 closest matches:
> 0
> 1
> 2
> abort
> assert
> bluefs debug_inject_read_zeros
> bluefs files list
> bluefs stats
> bluestore bluefs device info [<alloc_size:int>]
> config diff
> admin_socket: invalid command

Workaround:

Complete the following steps during every patch or major cluster update of the Cluster releases 17.2.x, 17.3.x, and 17.4.x (until Ceph 18.2.5 becomes supported):

  1. Plan extra time in the maintenance window for the patch cluster update.

    Slow starts will still impact the update procedure, but after completing the following step, the recovery process noticeably shortens without affecting the overall cluster state and data responsiveness.

  2. Select one of the following options:

    • Before the cluster update, set the noout flag:

      ceph osd set noout
      

      Once the Ceph OSDs image upgrade is done, unset the flag:

      ceph osd unset noout
      
    • Monitor the Ceph OSDs image upgrade. If the symptoms of a slow start appear, set the noout flag as soon as possible. Once the Ceph OSDs image upgrade is done, unset the flag.

[26441] Cluster update fails with the MountDevice failed for volume warning

Update of a managed cluster based on bare metal with Ceph enabled fails with a PersistentVolumeClaim getting stuck in the Pending state for the prometheus-server StatefulSet and the MountVolume.MountDevice failed for volume warning in the StackLight event logs.

Workaround:

  1. Verify that the description of the Pods that failed to run contains the FailedMount events:

    kubectl -n <affectedProjectName> describe pod <affectedPodName>
    

    In the command above, replace the following values:

    • <affectedProjectName> is the Container Cloud project name where the Pods failed to run

    • <affectedPodName> is a Pod name that failed to run in the specified project

    In the Pod description, identify the node name where the Pod failed to run.

  2. Verify that the csi-rbdplugin logs of the affected node contain the rbd volume mount failed: <csi-vol-uuid> is being used error. The <csi-vol-uuid> is a unique RBD volume name.

    1. Identify csiPodName of the corresponding csi-rbdplugin:

      kubectl -n rook-ceph get pod -l app=csi-rbdplugin \
      -o jsonpath='{.items[?(@.spec.nodeName == "<nodeName>")].metadata.name}'
      
    2. Output the affected csiPodName logs:

      kubectl -n rook-ceph logs <csiPodName> -c csi-rbdplugin
      
  3. Scale down the affected StatefulSet or Deployment of the Pod that fails to 0 replicas, as shown in the command sketch after this procedure.

  4. On every csi-rbdplugin Pod, search for stuck csi-vol:

    for pod in `kubectl -n rook-ceph get pods|grep rbdplugin|grep -v provisioner|awk '{print $1}'`; do
      echo $pod
      kubectl exec -it -n rook-ceph $pod -c csi-rbdplugin -- rbd device list | grep <csi-vol-uuid>
    done
    
  5. Unmap the affected csi-vol:

    rbd unmap -o force /dev/rbd<i>
    

    The /dev/rbd<i> value is a mapped RBD volume that uses csi-vol.

  6. Delete volumeattachment of the affected Pod:

    kubectl get volumeattachments | grep <csi-vol-uuid>
    kubectl delete volumeattachment <id>
    
  7. Scale up the affected StatefulSet or Deployment back to the original number of replicas and wait until its state becomes Running.
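A minimal command sketch for steps 3 and 7 above, assuming the affected workload is the prometheus-server StatefulSet in the stacklight namespace; adjust the workload name, namespace, and the original replica count to your cluster:

# Step 3: scale the affected workload down to 0 replicas
kubectl -n stacklight scale statefulset prometheus-server --replicas=0

# Step 7: scale the workload back up and watch until it is ready again
kubectl -n stacklight scale statefulset prometheus-server --replicas=1
kubectl -n stacklight get statefulset prometheus-server -w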


LCM
[39437] Failure to replace a master node on a Container Cloud cluster

Fixed in 2.29.0 (17.4.0 and 16.4.0)

During the replacement of a master node on a cluster of any type, the process may get stuck with the Kubelet's NodeReady condition is Unknown message in the machine status on the remaining master nodes.

As a workaround, log in on the affected node and run the following command:

docker restart ucp-kubelet
[31186,34132] Pods get stuck during MariaDB operations

During MariaDB operations on a management cluster, Pods may get stuck in continuous restarts with the following example error:

[ERROR] WSREP: Corrupt buffer header: \
addr: 0x7faec6f8e518, \
seqno: 3185219421952815104, \
size: 909455917, \
ctx: 0x557094f65038, \
flags: 11577. store: 49, \
type: 49

Workaround:

  1. Create a backup of the /var/lib/mysql directory on the mariadb-server Pod (see the command sketch after this procedure).

  2. Verify that other replicas are up and ready.

  3. Remove the galera.cache file for the affected mariadb-server Pod.

  4. Remove the affected mariadb-server Pod or wait until it is automatically restarted.

After Kubernetes restarts the Pod, the Pod clones the database in 1-2 minutes and restores the quorum.
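A minimal command sketch of the procedure above, assuming the affected Pod is mariadb-server-0 in the kaas namespace; the namespace, Pod name, label selector, and file path are assumptions, and kubectl cp requires tar inside the container. Adjust the values to your management cluster:

# Step 1: copy /var/lib/mysql from the affected Pod to the local machine
kubectl -n kaas cp mariadb-server-0:/var/lib/mysql ./mariadb-backup

# Step 2: verify that the other replicas are up and ready
kubectl -n kaas get pods -l app=mariadb

# Steps 3-4: remove galera.cache and restart the affected Pod
kubectl -n kaas exec mariadb-server-0 -- rm /var/lib/mysql/galera.cache
kubectl -n kaas delete pod mariadb-server-0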

[30294] Replacement of a master node is stuck on the calico-node Pod start

Fixed in 2.28.4 (17.3.4 and 16.3.4)

During replacement of a master node on a cluster of any type, the calico-node Pod fails to start on a new node that has the same IP address as the node being replaced.

Workaround:

  1. Log in to any master node.

  2. From a CLI with an MKE client bundle, create a shell alias to start calicoctl using the mirantis/ucp-dsinfo image:

    alias calicoctl="\
    docker run -i --rm \
    --pid host \
    --net host \
    -e constraint:ostype==linux \
    -e ETCD_ENDPOINTS=<etcdEndpoint> \
    -e ETCD_KEY_FILE=/var/lib/docker/volumes/ucp-kv-certs/_data/key.pem \
    -e ETCD_CA_CERT_FILE=/var/lib/docker/volumes/ucp-kv-certs/_data/ca.pem \
    -e ETCD_CERT_FILE=/var/lib/docker/volumes/ucp-kv-certs/_data/cert.pem \
    -v /var/run/calico:/var/run/calico \
    -v /var/lib/docker/volumes/ucp-kv-certs/_data:/var/lib/docker/volumes/ucp-kv-certs/_data:ro \
    mirantis/ucp-dsinfo:<mkeVersion> \
    calicoctl \
    "
    
    alias calicoctl="\
    docker run -i --rm \
    --pid host \
    --net host \
    -e constraint:ostype==linux \
    -e ETCD_ENDPOINTS=<etcdEndpoint> \
    -e ETCD_KEY_FILE=/ucp-node-certs/key.pem \
    -e ETCD_CA_CERT_FILE=/ucp-node-certs/ca.pem \
    -e ETCD_CERT_FILE=/ucp-node-certs/cert.pem \
    -v /var/run/calico:/var/run/calico \
    -v ucp-node-certs:/ucp-node-certs:ro \
    mirantis/ucp-dsinfo:<mkeVersion> \
    calicoctl --allow-version-mismatch \
    "
    

    In the above command, replace the following values with the corresponding settings of the affected cluster:

    • <etcdEndpoint> is the etcd endpoint defined in the Calico configuration file. For example, ETCD_ENDPOINTS=127.0.0.1:12378

    • <mkeVersion> is the MKE version installed on your cluster. For example, mirantis/ucp-dsinfo:3.5.7.

  3. Verify the node list on the cluster:

    kubectl get node
    
  4. Compare this list with the node list in Calico to identify the old node:

    calicoctl get node -o wide
    
  5. Remove the old node from Calico:

    calicoctl delete node kaas-node-<nodeID>
    
[5782] Manager machine fails to be deployed during node replacement

Fixed in 2.28.4 (17.3.4 and 16.3.4)

During replacement of a manager machine, the following problems may occur:

  • The system adds the node to Docker swarm but not to Kubernetes

  • The node Deployment gets stuck with failed RethinkDB health checks

Workaround:

  1. Delete the failed node.

  2. Wait for the MKE cluster to become healthy. To monitor the cluster status:

    1. Log in to the MKE web UI as described in Connect to the Mirantis Kubernetes Engine web UI.

    2. Monitor the cluster status as described in MKE Operations Guide: Monitor an MKE cluster with the MKE web UI.

  3. Deploy a new node.

[5568] The calico-kube-controllers Pod fails to clean up resources

Fixed in 2.28.4 (17.3.4 and 16.3.4)

During the unsafe or forced deletion of a manager machine running the calico-kube-controllers Pod in the kube-system namespace, the following issues occur:

  • The calico-kube-controllers Pod fails to clean up resources associated with the deleted node

  • The calico-node Pod may fail to start up on a newly created node if the machine is provisioned with the same IP address as the deleted machine had

As a workaround, before deletion of the node running the calico-kube-controllers Pod, cordon and drain the node:

kubectl cordon <nodeName>
kubectl drain <nodeName>

StackLight
[47594] Patroni pods may get stuck in the CrashLoopBackOff state

Fixed in 2.28.3 (17.2.7, 16.2.7, and 16.3.3)

The Patroni pods may get stuck in the CrashLoopBackOff state due to the patroni container being terminated with reason: OOMKilled, which you can see in the pod status. For example:

kubectl get pod/patroni-13-0 -n stacklight -o yaml
...
  - containerID: docker://<ID>
    image: mirantis.azurecr.io/stacklight/spilo:13-2.1p9-20240828023010
    imageID: docker-pullable://mirantis.azurecr.io/stacklight/spilo@sha256:<ID>
    lastState:
      terminated:
        containerID: docker://<ID>
        exitCode: 137
        finishedAt: "2024-10-17T14:26:25Z"
        reason: OOMKilled
        startedAt: "2024-10-17T14:23:25Z"
    name: patroni
...

As a workaround, increase the memory limit for PostgreSQL to 20Gi in the Cluster object:

spec:
  providerSpec:
    value:
      helmReleases:
      - name: stacklight
        values:
          resources:
            postgresql:
              limits:
                memory: "20Gi"

For a detailed procedure of StackLight configuration, see MOSK Operations Guide: Configure StackLight. For description of the resources option, see MOSK Operations Guide: StackLight configuration parameters - Resource limits.
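One way to apply this change is to edit the Cluster object directly and add the memory limit under the stacklight Helm release values shown above, for example:

kubectl -n <namespaceName> edit cluster <clusterName>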

[44193] OpenSearch reaches 85% disk usage watermark affecting the cluster state

Fixed in 2.29.0 (17.4.0 and 16.4.0)

On High Availability (HA) clusters that use Local Volume Provisioner (LVP), Prometheus and OpenSearch from StackLight may share the same pool of storage. In such configuration, OpenSearch may approach the 85% disk usage watermark due to the combined storage allocation and usage patterns set by the Persistent Volume Claim (PVC) size parameters for Prometheus and OpenSearch, which consume storage the most.

When the 85% threshold is reached, the affected node is transitioned to the read-only state, preventing shard allocation and causing the OpenSearch cluster state to transition to Warning (Yellow) or Critical (Red).

Caution

The issue and the provided workaround apply only for clusters on which OpenSearch and Prometheus utilize the same storage pool.

To verify that the cluster is affected:

  1. Verify the result of the following formula:

    0.8 × OpenSearch_PVC_Size_GB + Prometheus_PVC_Size_GB > 0.85 × Total_Storage_Capacity_GB
    

    In the formula, define the following values:

    OpenSearch_PVC_Size_GB

    Derived from .values.elasticsearch.persistentVolumeUsableStorageSizeGB, defaulting to .values.elasticsearch.persistentVolumeClaimSize if unspecified. To obtain the OpenSearch PVC size:

    kubectl -n <namespaceName> get cluster <clusterName> -o yaml |\
    yq '.spec.providerSpec.value.helmReleases[] | select(.name == "stacklight") | .values.elasticsearch.persistentVolumeClaimSize '
    

    Example of system response:

    10000Gi
    
    Prometheus_PVC_Size_GB

    Sourced from .values.prometheusServer.persistentVolumeClaimSize. To obtain the Prometheus PVC size:

    kubectl -n <namespaceName> get cluster <clusterName> -o yaml |\
    yq '.spec.providerSpec.value.helmReleases[] | select(.name == "stacklight") | .values.prometheusServer.persistentVolumeClaimSize '
    

    Example of system response:

    4000Gi
    
    Total_Storage_Capacity_GB

    Total capacity of the OpenSearch PVCs. For LVP, the capacity of the storage pool. To obtain the total capacity:

    kubectl get pvc -n stacklight -l app=opensearch-master \
    -o custom-columns=NAME:.metadata.name,CAPACITY:.status.capacity.storage
    

    The system response contains multiple outputs, one per opensearch-master node. Select the capacity for the affected node.

    Note

    Convert the values to GB if they are set in different units.

    If the inequality holds (the left side exceeds the right side), it is an early indication that the cluster is affected. For example, with OpenSearch_PVC_Size_GB = 10000, Prometheus_PVC_Size_GB = 4000, and Total_Storage_Capacity_GB = 14000, the left side equals 0.8 × 10000 + 4000 = 12000, which exceeds 0.85 × 14000 = 11900, so such a cluster is affected.

  2. Verify whether the OpenSearchClusterStatusWarning or OpenSearchClusterStatusCritical alert is firing. And if so, verify the following:

    1. Log in to the OpenSearch web UI.

    2. In Management -> Dev Tools, run the following command:

      GET _cluster/allocation/explain
      

      The following system response indicates that the corresponding node is affected:

      "explanation": "the node is above the low watermark cluster setting \
      [cluster.routing.allocation.disk.watermark.low=85%], using more disk space \
      than the maximum allowed [85.0%], actual free: [xx.xxx%]"
      

      Note

      The system response may contain even higher watermark percent than 85.0%, depending on the case.

Workaround:

Warning

The workaround implies adjustment of the retention threshold for OpenSearch. Depending on the new threshold, some old logs will be deleted.

  1. Adjust or set .values.elasticsearch.persistentVolumeUsableStorageSizeGB to a lower value so that the verification formula above no longer indicates that the cluster is affected. For configuration details, see MOSK Operations Guide: StackLight configuration parameters - OpenSearch.

    Mirantis also recommends reserving some space for other PVCs using storage from the pool. Use the following formula to calculate the required space:

    persistentVolumeUsableStorageSizeGB =
    0.84 × ((1 - Reserved_Percentage - Filesystem_Reserve) ×
    Total_Storage_Capacity_GB - Prometheus_PVC_Size_GB) /
    0.8
    

    In the formula, define the following values:

    Reserved_Percentage

    A user-defined variable that specifies what percentage of the total storage capacity should not be used by OpenSearch or Prometheus. This is used to reserve space for other components. It should be expressed as a decimal. For example, for 5% of reservation, Reserved_Percentage is 0.05. Mirantis recommends using 0.05 as a starting point.

    Filesystem_Reserve

    Percentage to deduct for filesystems that may reserve some portion of the available storage, which is marked as occupied. For example, for EXT4, it is 5% by default, so the value must be 0.05.

    Prometheus_PVC_Size_GB

    Sourced from .values.prometheusServer.persistentVolumeClaimSize.

    Total_Storage_Capacity_GB

    Total capacity of the OpenSearch PVCs. For LVP, the capacity of the storage pool. To obtain the total capacity:

    kubectl get pvc -n stacklight -l app=opensearch-master \
    -o custom-columns=NAME:.metadata.name,CAPACITY:.status.capacity.storage
    

    The system response contains multiple outputs, one per opensearch-master node. Select the capacity for the affected node.

    Note

    Convert the values to GB if they are set in different units.

    The formula above provides the maximum safe storage to allocate for .values.elasticsearch.persistentVolumeUsableStorageSizeGB. For example, with Reserved_Percentage = 0.05, Filesystem_Reserve = 0.05, Total_Storage_Capacity_GB = 14000, and Prometheus_PVC_Size_GB = 4000, the result is 0.84 × (0.9 × 14000 - 4000) / 0.8 ≈ 9030. Use this formula as a reference for setting .values.elasticsearch.persistentVolumeUsableStorageSizeGB on a cluster.

  2. Wait up to 15-20 minutes for OpenSearch to perform the cleaning.

  3. Verify that the cluster is not affected anymore using the procedure above.


Container Cloud web UI
[50181] Failure to deploy a compact cluster

A compact MOSK cluster fails to be deployed through the Container Cloud web UI because the web UI does not allow adding any label to the control plane machines or changing dedicatedControlPlane: false.

To work around the issue, manually add the required labels using CLI. Once done, the cluster deployment resumes.
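A minimal sketch of adding the labels through the CLI by editing the Machine objects of the control plane machines; the exact fields that carry the labels depend on the provider spec of your Machine objects, so treat the commands below as an outline only:

# List the machines in the cluster project to find the control plane machines
kubectl -n <projectName> get machines

# Edit the affected Machine object and add the required labels to its spec
kubectl -n <projectName> edit machine <controlPlaneMachineName>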

[50168] Inability to use a new project right after creation

A newly created project does not display all available tabs in the Container Cloud web UI and shows various access denied errors during the first five minutes after creation.

To work around the issue, refresh the browser five minutes after the project creation.

Artifacts

This section lists the artifacts of components included in the Container Cloud patch release 2.28.2. For artifacts of the Cluster releases introduced in 2.28.2, see patch Cluster releases 17.2.6, 16.3.2, and 16.2.6.

Note

The components that are newly added, updated, deprecated, or removed as compared to the previous release version, are marked with a corresponding superscript, for example, lcm-ansible Updated.

Bare metal artifacts

Artifact

Component

Path

Binaries Updated

ironic-python-agent.initramfs

https://binary.mirantis.com/bm/bin/ironic/ipa/ubuntu/initramfs-antelope-jammy-debug-20241028161924

ironic-python-agent.kernel

https://binary.mirantis.com/bm/bin/ironic/ipa/ubuntu/kernel-antelope-jammy-debug-20241028161924

Helm charts Updated

baremetal-api

https://binary.mirantis.com/core/helm/baremetal-api-1.41.22.tgz

baremetal-operator

https://binary.mirantis.com/core/helm/baremetal-operator-1.41.22.tgz

baremetal-provider

https://binary.mirantis.com/core/helm/baremetal-provider-1.41.22.tgz

baremetal-public-api

https://binary.mirantis.com/core/helm/baremetal-public-api-1.41.22.tgz

kaas-ipam

https://binary.mirantis.com/core/helm/kaas-ipam-1.41.22.tgz

Docker images

ambasador Updated

mirantis.azurecr.io/core/external/nginx:1.41.22

baremetal-dnsmasq

mirantis.azurecr.io/bm/baremetal-dnsmasq:base-2-28-alpine-20241022121257

baremetal-operator

mirantis.azurecr.io/bm/baremetal-operator:base-2-28-alpine-20241022120949

bm-collective

mirantis.azurecr.io/bm/bm-collective:base-2-28-alpine-20241022120001

cluster-api-provider-baremetal Updated

mirantis.azurecr.io/core/cluster-api-provider-baremetal:1.41.22

ironic Updated

mirantis.azurecr.io/openstack/ironic:antelope-jammy-20241023091304

ironic-inspector Updated

mirantis.azurecr.io/openstack/ironic-inspector:antelope-jammy-20241023091304

ironic-prometheus-exporter Updated

mirantis.azurecr.io/stacklight/ironic-prometheus-exporter:0.1-20240913123302

kaas-ipam

mirantis.azurecr.io/bm/kaas-ipam:base-2-28-alpine-20241022122006

kubernetes-entrypoint

mirantis.azurecr.io/openstack/extra/kubernetes-entrypoint:v1.0.1-34a4f54-20240910081335

mariadb

mirantis.azurecr.io/general/mariadb:10.6.17-jammy-20240927170336

syslog-ng

mirantis.azurecr.io/bm/syslog-ng:base-alpine-20241022120929

Core artifacts

Artifact

Component

Path

Bootstrap tarball Updated

bootstrap-darwin

https://binary.mirantis.com/core/bin/bootstrap-darwin-1.41.22.tgz

bootstrap-linux

https://binary.mirantis.com/core/bin/bootstrap-linux-1.41.22.tgz

Helm charts Updated

admission-controller

https://binary.mirantis.com/core/helm/admission-controller-1.41.22.tgz

agent-controller

https://binary.mirantis.com/core/helm/agent-controller-1.41.22.tgz

ceph-kcc-controller

https://binary.mirantis.com/core/helm/ceph-kcc-controller-1.41.22.tgz

cert-manager

https://binary.mirantis.com/core/helm/cert-manager-1.41.22.tgz

configuration-collector

https://binary.mirantis.com/core/helm/configuration-collector-1.41.22.tgz

credentials-controller

https://binary.mirantis.com/core/helm/credentials-controller-1.41.22.tgz

event-controller

https://binary.mirantis.com/core/helm/event-controller-1.41.22.tgz

host-os-modules-controller

https://binary.mirantis.com/core/helm/host-os-modules-controller-1.41.22.tgz

iam-controller

https://binary.mirantis.com/core/helm/iam-controller-1.41.22.tgz

kaas-exporter

https://binary.mirantis.com/core/helm/kaas-exporter-1.41.22.tgz

kaas-public-api

https://binary.mirantis.com/core/helm/kaas-public-api-1.41.22.tgz

kaas-ui

https://binary.mirantis.com/core/helm/kaas-ui-1.41.22.tgz

lcm-controller

https://binary.mirantis.com/core/helm/lcm-controller-1.41.22.tgz

license-controller

https://binary.mirantis.com/core/helm/license-controller-1.41.22.tgz

machinepool-controller

https://binary.mirantis.com/core/helm/machinepool-controller-1.41.22.tgz

mcc-cache

https://binary.mirantis.com/core/helm/mcc-cache-1.41.22.tgz

mcc-cache-warmup

https://binary.mirantis.com/core/helm/mcc-cache-warmup-1.41.22.tgz

openstack-provider

https://binary.mirantis.com/core/helm/openstack-provider-1.41.22.tgz

portforward-controller

https://binary.mirantis.com/core/helm/portforward-controller-1.41.22.tgz

rbac-controller

https://binary.mirantis.com/core/helm/rbac-controller-1.41.22.tgz

release-controller

https://binary.mirantis.com/core/helm/release-controller-1.41.22.tgz

scope-controller

https://binary.mirantis.com/core/helm/scope-controller-1.41.22.tgz

secret-controller

https://binary.mirantis.com/core/helm/secret-controller-1.41.22.tgz

user-controller

https://binary.mirantis.com/core/helm/user-controller-1.41.22.tgz

Docker images

admission-controller Updated

mirantis.azurecr.io/core/admission-controller:1.41.22

agent-controller Updated

mirantis.azurecr.io/core/agent-controller:1.41.22

ceph-kcc-controller Updated

mirantis.azurecr.io/core/ceph-kcc-controller:1.41.22

cert-manager-controller

mirantis.azurecr.io/core/external/cert-manager-controller:v1.11.0-8

configuration-collector Updated

mirantis.azurecr.io/core/configuration-collector:1.41.22

credentials-controller Updated

mirantis.azurecr.io/core/credentials-controller:1.41.22

event-controller Updated

mirantis.azurecr.io/core/event-controller:1.41.22

frontend Updated

mirantis.azurecr.io/core/frontend:1.41.22

host-os-modules-controller Updated

mirantis.azurecr.io/core/host-os-modules-controller:1.41.22

iam-controller Updated

mirantis.azurecr.io/core/iam-controller:1.41.22

kaas-exporter Updated

mirantis.azurecr.io/core/kaas-exporter:1.41.22

kproxy Updated

mirantis.azurecr.io/core/kproxy:1.41.22

lcm-controller Updated

mirantis.azurecr.io/core/lcm-controller:1.41.22

license-controller Updated

mirantis.azurecr.io/core/license-controller:1.41.22

machinepool-controller Updated

mirantis.azurecr.io/core/machinepool-controller:1.41.22

mcc-cache-warmup Updated

mirantis.azurecr.io/core/mcc-cache-warmup:1.41.22

nginx Updated

mirantis.azurecr.io/core/external/nginx:1.41.22

openstack-cluster-api-controller Updated

mirantis.azurecr.io/core/openstack-cluster-api-controller:1.41.22

portforward-controller Updated

mirantis.azurecr.io/core/portforward-controller:1.41.22

rbac-controller Updated

mirantis.azurecr.io/core/rbac-controller:1.41.22

registry

mirantis.azurecr.io/lcm/registry:v2.8.1-14

release-controller Updated

mirantis.azurecr.io/core/release-controller:1.41.22

scope-controller Updated

mirantis.azurecr.io/core/scope-controller:1.41.22

secret-controller Updated

mirantis.azurecr.io/core/secret-controller:1.41.22

user-controller Updated

mirantis.azurecr.io/core/user-controller:1.41.22

IAM artifacts

Artifact

Component

Path

Binaries

hash-generate-darwin

http://binary.mirantis.com/iam/bin/hash-generate-0.0.1-268-3cf7f17-darwin

hash-generate-linux

http://binary.mirantis.com/iam/bin/hash-generate-0.0.1-268-3cf7f17-linux

Helm charts Updated

iam

https://binary.mirantis.com/core/helm/iam-1.41.22.tgz

Docker images

kubectl

mirantis.azurecr.io/general/kubectl:20240926142019

kubernetes-entrypoint

mirantis.azurecr.io/openstack/extra/kubernetes-entrypoint:v1.0.1-ba8ada4-20240405150338

mariadb

mirantis.azurecr.io/general/mariadb:10.6.17-focal-20240909113408

mcc-keycloak

mirantis.azurecr.io/iam/mcc-keycloak:25.0.6-20240926140203

2.28.1

Important

For MOSK clusters, Container Cloud 2.28.1 is the continuation for MOSK 24.2.x series using the patch Cluster release 17.2.5. For the update path of 24.1, 24.2, and 24.3 series, see MOSK documentation: Release Compatibility Matrix - Managed cluster update schema.

The management cluster of a MOSK 24.2.x cluster is automatically updated to the latest patch Cluster release 16.3.1.

The Container Cloud patch release 2.28.1, which is based on the 2.28.0 major release, provides the following updates:

  • Support for the patch Cluster release 16.3.1.

  • Support for the patch Cluster releases 16.2.5 and 17.2.5 that represents Mirantis OpenStack for Kubernetes (MOSK) patch release 24.2.3.

  • Support for MKE 3.7.15.

  • Bare metal: update of Ubuntu mirror from 2024-09-11-014225 to ubuntu-2024-10-14-013948 along with update of minor kernel version from 5.15.0-119-generic to 5.15.0-122-generic.

  • Security fixes for CVEs in images.

This patch release also supports the latest major Cluster releases 17.3.0 and 16.3.0. It does not support greenfield deployments based on deprecated Cluster releases; use the latest available Cluster release instead.

For main deliverables of the parent Container Cloud release of 2.28.1, refer to 2.28.0.

Security notes

In total, since Container Cloud 2.28.0, 400 Common Vulnerabilities and Exposures (CVE) have been fixed in 2.28.1: 46 of critical and 354 of high severity.

The table below includes the total numbers of addressed unique and common CVEs in images by product component since Container Cloud 2.28.0. The common CVEs are issues addressed across several images.

Addressed CVEs - summary

Product component   CVE type   Critical   High   Total
Ceph                Unique     0          1      1
Ceph                Common     0          4      4
Kaas core           Unique     1          14     15
Kaas core           Common     1          118    119
StackLight          Unique     8          40     48
StackLight          Common     45         232    277

Mirantis Security Portal

For the detailed list of fixed and existing CVEs across the Mirantis Container Cloud and MOSK products, refer to Mirantis Security Portal.

MOSK CVEs

For the number of fixed CVEs in the MOSK-related components including OpenStack and Tungsten Fabric, refer to MOSK 24.2.3: Security notes.

Addressed issues

The following issues have been addressed in the Container Cloud patch release 2.28.1 along with the patch Cluster releases 16.3.1, 16.2.5, and 17.2.5.

  • [46808] [LCM] Fixed the issue with old kernel metapackages remaining on the cluster after kernel upgrade.

Known issues

This section lists known issues with workarounds for the Mirantis Container Cloud release 2.28.1 including the Cluster releases 16.2.5, 16.3.1, and 17.2.5.

For other issues that can occur while deploying and operating a Container Cloud cluster, see Deployment Guide: Troubleshooting and Operations Guide: Troubleshooting.

Note

This section also outlines still valid known issues from previous Container Cloud releases.

Bare metal
[47202] Inspection error on bare metal hosts after dnsmasq restart

Note

Moving forward, the workaround for this issue will be moved from Release Notes to MOSK Troubleshooting Guide: Inspection error on bare metal hosts after dnsmasq restart.

If the dnsmasq pod is restarted during the bootstrap of newly added nodes, those nodes may fail to undergo inspection, which results in an inspection error in the corresponding BareMetalHost objects.

The issue can occur when:

  • The dnsmasq pod was moved to another node.

  • DHCP subnets were changed, including addition or removal. In this case, the dhcpd container of the dnsmasq pod is restarted.

    Caution

    If you need to change or add DHCP subnets to bootstrap new nodes, wait until the dnsmasq pod becomes ready after the change, and only then create BareMetalHost objects.

To verify whether the nodes are affected:

  1. Verify whether the BareMetalHost objects contain the inspection error:

    kubectl get bmh -n <managed-cluster-namespace-name>
    

    Example of system response:

    NAME            STATE         CONSUMER        ONLINE   ERROR              AGE
    test-master-1   provisioned   test-master-1   true                        9d
    test-master-2   provisioned   test-master-2   true                        9d
    test-master-3   provisioned   test-master-3   true                        9d
    test-worker-1   provisioned   test-worker-1   true                        9d
    test-worker-2   provisioned   test-worker-2   true                        9d
    test-worker-3   inspecting                    true     inspection error   19h
    
  2. Verify whether the dnsmasq pod was in the Ready state when the inspection of the affected bare metal hosts (test-worker-3 in the example above) started:

    kubectl -n kaas get pod <dnsmasq-pod-name> -oyaml
    

    Example of system response:

    ...
    status:
      conditions:
      - lastProbeTime: null
        lastTransitionTime: "2024-10-10T15:37:34Z"
        status: "True"
        type: Initialized
      - lastProbeTime: null
        lastTransitionTime: "2024-10-11T07:38:54Z"
        status: "True"
        type: Ready
      - lastProbeTime: null
        lastTransitionTime: "2024-10-11T07:38:54Z"
        status: "True"
        type: ContainersReady
      - lastProbeTime: null
        lastTransitionTime: "2024-10-10T15:37:34Z"
        status: "True"
        type: PodScheduled
      containerStatuses:
      - containerID: containerd://6dbcf2fc4b36ce4c549c9191ab01f72d0236c51d42947675302675e4bfaf4cdf
        image: docker-dev-kaas-virtual.artifactory-eu.mcp.mirantis.net/bm/baremetal-dnsmasq:base-2-28-alpine-20240812132650
        imageID: docker-dev-kaas-virtual.artifactory-eu.mcp.mirantis.net/bm/baremetal-dnsmasq@sha256:3dad3e278add18e69b2608e462691c4823942641a0f0e25e6811e703e3c23b3b
        lastState:
          terminated:
            containerID: containerd://816fcf079cd544acd74e312065de5b5ed4dbf1dc6159fefffff4f644b5e45987
            exitCode: 0
            finishedAt: "2024-10-11T07:38:35Z"
            reason: Completed
            startedAt: "2024-10-10T15:37:45Z"
        name: dhcpd
        ready: true
        restartCount: 2
        started: true
        state:
          running:
            startedAt: "2024-10-11T07:38:37Z"
      ...
    

    In the system response above, the dhcpd container was not ready between "2024-10-11T07:38:35Z" and "2024-10-11T07:38:54Z".

  3. Verify the affected bare metal host. For example:

    kubectl get bmh -n managed-ns test-worker-3 -oyaml
    

    Example of system response:

    ...
    status:
      errorCount: 15
      errorMessage: Introspection timeout
      errorType: inspection error
      ...
      operationHistory:
        deprovision:
          end: null
          start: null
        inspect:
          end: null
          start: "2024-10-11T07:38:19Z"
        provision:
          end: null
          start: null
        register:
          end: "2024-10-11T07:38:19Z"
          start: "2024-10-11T07:37:25Z"
    

    In the system response above, inspection was started at "2024-10-11T07:38:19Z", immediately before the period of the dhcpd container downtime. Therefore, this node is most likely affected by the issue.

Workaround

  1. Reboot the node using the IPMI reset or cycle command.

  2. If the node fails to boot, remove the failed BareMetalHost object and create it again:

    1. Remove BareMetalHost object. For example:

      kubectl delete bmh -n managed-ns test-worker-3
      
    2. Verify that the BareMetalHost object is removed:

      kubectl get bmh -n managed-ns test-worker-3
      
    3. Create a BareMetalHost object from the template. For example:

      kubectl create -f bmhc-test-worker-3.yaml
      kubectl create -f bmh-test-worker-3.yaml
      
[42386] A load balancer service does not obtain the external IP address

Due to the MetalLB upstream issue, a load balancer service may not obtain the external IP address.

The issue occurs when two services share the same external IP address and have the same externalTrafficPolicy value. Initially, both services have the external IP address assigned and are accessible. After the externalTrafficPolicy value is modified for both services from Cluster to Local, the service that was changed first remains without an external IP address, while the service that was changed later has the external IP assigned as expected.

To work around the issue, make a dummy change to the service object where external IP is <pending>:

  1. Identify the service that is stuck:

    kubectl get svc -A | grep pending
    

    Example of system response:

    stacklight  iam-proxy-prometheus  LoadBalancer  10.233.28.196  <pending>  443:30430/TCP
    
  2. Add an arbitrary label to the service that is stuck. For example:

    kubectl label svc -n stacklight iam-proxy-prometheus reconcile=1
    

    Example of system response:

    service/iam-proxy-prometheus labeled
    
  3. Verify that the external IP was allocated to the service:

    kubectl get svc -n stacklight iam-proxy-prometheus
    

    Example of system response:

    NAME                  TYPE          CLUSTER-IP     EXTERNAL-IP  PORT(S)        AGE
    iam-proxy-prometheus  LoadBalancer  10.233.28.196  10.0.34.108  443:30430/TCP  12d
    
[24005] Deletion of a node with ironic Pod is stuck in the Terminating state

During deletion of a manager machine running the ironic Pod from a bare metal management cluster, the following problems occur:

  • All Pods are stuck in the Terminating state

  • A new ironic Pod fails to start

  • The related bare metal host is stuck in the deprovisioning state

As a workaround, before deletion of the node running the ironic Pod, cordon and drain the node using the kubectl cordon <nodeName> and kubectl drain <nodeName> commands.


Ceph
[50566] Ceph upgrade is very slow during patch or major cluster update

Due to the upstream Ceph issue 66717, during a CVE-related upgrade of the Ceph daemon image of Ceph Reef 18.2.4, OSDs may start slowly and even fail the startup probe with the following describe output in the rook-ceph-osd-X pod:

 Warning  Unhealthy  57s (x16 over 3m27s)  kubelet  Startup probe failed:
 ceph daemon health check failed with the following output:
> no valid command found; 10 closest matches:
> 0
> 1
> 2
> abort
> assert
> bluefs debug_inject_read_zeros
> bluefs files list
> bluefs stats
> bluestore bluefs device info [<alloc_size:int>]
> config diff
> admin_socket: invalid command

Workaround:

Complete the following steps during every patch or major cluster update of the Cluster releases 17.2.x, 17.3.x, and 17.4.x (until Ceph 18.2.5 becomes supported):

  1. Plan extra time in the maintenance window for the patch cluster update.

    Slow starts will still impact the update procedure, but after completing the following step, the recovery process noticeably shortens without affecting the overall cluster state and data responsiveness.

  2. Select one of the following options:

    • Before the cluster update, set the noout flag:

      ceph osd set noout
      

      Once the Ceph OSDs image upgrade is done, unset the flag:

      ceph osd unset noout
      
    • Monitor the Ceph OSDs image upgrade. If the symptoms of slow start appear, set the noout flag as soon as possible. Once the Ceph OSDs image upgrade is done, unset the flag.

[26441] Cluster update fails with the MountDevice failed for volume warning

Update of a managed cluster based on bare metal with Ceph enabled fails with a PersistentVolumeClaim getting stuck in the Pending state for the prometheus-server StatefulSet and the MountVolume.MountDevice failed for volume warning in the StackLight event logs.

Workaround:

  1. Verify that the description of the Pods that failed to run contains the FailedMount events:

    kubectl -n <affectedProjectName> describe pod <affectedPodName>
    

    In the command above, replace the following values:

    • <affectedProjectName> is the Container Cloud project name where the Pods failed to run

    • <affectedPodName> is a Pod name that failed to run in the specified project

    In the Pod description, identify the node name where the Pod failed to run.

  2. Verify that the csi-rbdplugin logs of the affected node contain the rbd volume mount failed: <csi-vol-uuid> is being used error. The <csi-vol-uuid> is a unique RBD volume name.

    1. Identify csiPodName of the corresponding csi-rbdplugin:

      kubectl -n rook-ceph get pod -l app=csi-rbdplugin \
      -o jsonpath='{.items[?(@.spec.nodeName == "<nodeName>")].metadata.name}'
      
    2. Output the affected csiPodName logs:

      kubectl -n rook-ceph logs <csiPodName> -c csi-rbdplugin
      
  3. Scale down the affected StatefulSet or Deployment of the Pod that fails to 0 replicas.

  4. On every csi-rbdplugin Pod, search for stuck csi-vol:

    for pod in `kubectl -n rook-ceph get pods|grep rbdplugin|grep -v provisioner|awk '{print $1}'`; do
      echo $pod
      kubectl exec -it -n rook-ceph $pod -c csi-rbdplugin -- rbd device list | grep <csi-vol-uuid>
    done
    
  5. Unmap the affected csi-vol:

    rbd unmap -o force /dev/rbd<i>
    

    The /dev/rbd<i> value is a mapped RBD volume that uses csi-vol.

  6. Delete volumeattachment of the affected Pod:

    kubectl get volumeattachments | grep <csi-vol-uuid>
    kubectl delete volumeattachment <id>
    
  7. Scale up the affected StatefulSet or Deployment back to the original number of replicas and wait until its state becomes Running.


LCM
[47741] Upgrade to MKE 3.7.15 is blocked by ucp-upgrade-check-images

Fixed in 2.28.2 (17.2.6, 16.2.6, and 16.3.2)

Upgrade from MKE 3.7.12 to 3.7.15 may get stuck due to the leftover ucp-upgrade-check-images service that is part of MKE 3.7.12.

As a workaround, on any master node, remove the leftover service using the docker service rm ucp-upgrade-check-images command.
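For example:

# Verify that the leftover service exists, then remove it
docker service ls | grep ucp-upgrade-check-images
docker service rm ucp-upgrade-check-images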

[39437] Failure to replace a master node on a Container Cloud cluster

Fixed in 2.29.0 (17.4.0 and 16.4.0)

During the replacement of a master node on a cluster of any type, the process may get stuck with the Kubelet's NodeReady condition is Unknown message in the machine status on the remaining master nodes.

As a workaround, log in on the affected node and run the following command:

docker restart ucp-kubelet
[31186,34132] Pods get stuck during MariaDB operations

During MariaDB operations on a management cluster, Pods may get stuck in continuous restarts with the following example error:

[ERROR] WSREP: Corrupt buffer header: \
addr: 0x7faec6f8e518, \
seqno: 3185219421952815104, \
size: 909455917, \
ctx: 0x557094f65038, \
flags: 11577. store: 49, \
type: 49

Workaround:

  1. Create a backup of the /var/lib/mysql directory on the mariadb-server Pod.

  2. Verify that other replicas are up and ready.

  3. Remove the galera.cache file for the affected mariadb-server Pod.

  4. Remove the affected mariadb-server Pod or wait until it is automatically restarted.

After Kubernetes restarts the Pod, the Pod clones the database in 1-2 minutes and restores the quorum.

[30294] Replacement of a master node is stuck on the calico-node Pod start

Fixed in 2.28.4 (17.3.4 and 16.3.4)

During replacement of a master node on a cluster of any type, the calico-node Pod fails to start on a new node that has the same IP address as the node being replaced.

Workaround:

  1. Log in to any master node.

  2. From a CLI with an MKE client bundle, create a shell alias to start calicoctl using the mirantis/ucp-dsinfo image:

    alias calicoctl="\
    docker run -i --rm \
    --pid host \
    --net host \
    -e constraint:ostype==linux \
    -e ETCD_ENDPOINTS=<etcdEndpoint> \
    -e ETCD_KEY_FILE=/var/lib/docker/volumes/ucp-kv-certs/_data/key.pem \
    -e ETCD_CA_CERT_FILE=/var/lib/docker/volumes/ucp-kv-certs/_data/ca.pem \
    -e ETCD_CERT_FILE=/var/lib/docker/volumes/ucp-kv-certs/_data/cert.pem \
    -v /var/run/calico:/var/run/calico \
    -v /var/lib/docker/volumes/ucp-kv-certs/_data:/var/lib/docker/volumes/ucp-kv-certs/_data:ro \
    mirantis/ucp-dsinfo:<mkeVersion> \
    calicoctl \
    "
    
    alias calicoctl="\
    docker run -i --rm \
    --pid host \
    --net host \
    -e constraint:ostype==linux \
    -e ETCD_ENDPOINTS=<etcdEndpoint> \
    -e ETCD_KEY_FILE=/ucp-node-certs/key.pem \
    -e ETCD_CA_CERT_FILE=/ucp-node-certs/ca.pem \
    -e ETCD_CERT_FILE=/ucp-node-certs/cert.pem \
    -v /var/run/calico:/var/run/calico \
    -v ucp-node-certs:/ucp-node-certs:ro \
    mirantis/ucp-dsinfo:<mkeVersion> \
    calicoctl --allow-version-mismatch \
    "
    

    In the above command, replace the following values with the corresponding settings of the affected cluster:

    • <etcdEndpoint> is the etcd endpoint defined in the Calico configuration file. For example, ETCD_ENDPOINTS=127.0.0.1:12378

    • <mkeVersion> is the MKE version installed on your cluster. For example, mirantis/ucp-dsinfo:3.5.7.

  3. Verify the node list on the cluster:

    kubectl get node
    
  4. Compare this list with the node list in Calico to identify the old node:

    calicoctl get node -o wide
    
  5. Remove the old node from Calico:

    calicoctl delete node kaas-node-<nodeID>
    
[5782] Manager machine fails to be deployed during node replacement

Fixed in 2.28.4 (17.3.4 and 16.3.4)

During replacement of a manager machine, the following problems may occur:

  • The system adds the node to Docker swarm but not to Kubernetes

  • The node Deployment gets stuck with failed RethinkDB health checks

Workaround:

  1. Delete the failed node.

  2. Wait for the MKE cluster to become healthy. To monitor the cluster status:

    1. Log in to the MKE web UI as described in Connect to the Mirantis Kubernetes Engine web UI.

    2. Monitor the cluster status as described in MKE Operations Guide: Monitor an MKE cluster with the MKE web UI.

  3. Deploy a new node.

[5568] The calico-kube-controllers Pod fails to clean up resources

Fixed in 2.28.4 (17.3.4 and 16.3.4)

During the unsafe or forced deletion of a manager machine running the calico-kube-controllers Pod in the kube-system namespace, the following issues occur:

  • The calico-kube-controllers Pod fails to clean up resources associated with the deleted node

  • The calico-node Pod may fail to start up on a newly created node if the machine is provisioned with the same IP address as the deleted machine had

As a workaround, before deletion of the node running the calico-kube-controllers Pod, cordon and drain the node:

kubectl cordon <nodeName>
kubectl drain <nodeName>

StackLight
[47594] Patroni pods may get stuck in the CrashLoopBackOff state

Fixed in 2.28.3 (17.2.7, 16.2.7, and 16.3.3)

The Patroni pods may get stuck in the CrashLoopBackOff state due to the patroni container being terminated with reason: OOMKilled, which you can see in the pod status. For example:

kubectl get pod/patroni-13-0 -n stacklight -o yaml
...
  - containerID: docker://<ID>
    image: mirantis.azurecr.io/stacklight/spilo:13-2.1p9-20240828023010
    imageID: docker-pullable://mirantis.azurecr.io/stacklight/spilo@sha256:<ID>
    lastState:
      terminated:
        containerID: docker://<ID>
        exitCode: 137
        finishedAt: "2024-10-17T14:26:25Z"
        reason: OOMKilled
        startedAt: "2024-10-17T14:23:25Z"
    name: patroni
...

As a workaround, increase the memory limit for PostgreSQL to 20Gi in the Cluster object:

spec:
  providerSpec:
    value:
      helmReleases:
      - name: stacklight
        values:
          resources:
            postgresql:
              limits:
                memory: "20Gi"

For a detailed procedure of StackLight configuration, see MOSK Operations Guide: Configure StackLight. For description of the resources option, see MOSK Operations Guide: StackLight configuration parameters - Resource limits.

[47304] OpenSearch does not store kubelet logs

Fixed in 2.28.2 (17.2.6, 16.2.6, and 16.3.2)

Due to the JSON-based format of ucp-kubelet logs, OpenSearch does not store kubelet logs. Mirantis is working on the issue and will deliver the resolution in one of the upcoming patch releases.

[44193] OpenSearch reaches 85% disk usage watermark affecting the cluster state

Fixed in 2.29.0 (17.4.0 and 16.4.0)

On High Availability (HA) clusters that use Local Volume Provisioner (LVP), Prometheus and OpenSearch from StackLight may share the same pool of storage. In such configuration, OpenSearch may approach the 85% disk usage watermark due to the combined storage allocation and usage patterns set by the Persistent Volume Claim (PVC) size parameters for Prometheus and OpenSearch, which consume storage the most.

When the 85% threshold is reached, the affected node is transitioned to the read-only state, preventing shard allocation and causing the OpenSearch cluster state to transition to Warning (Yellow) or Critical (Red).

Caution

The issue and the provided workaround apply only for clusters on which OpenSearch and Prometheus utilize the same storage pool.

To verify that the cluster is affected:

  1. Verify the result of the following formula:

    0.8 × OpenSearch_PVC_Size_GB + Prometheus_PVC_Size_GB > 0.85 × Total_Storage_Capacity_GB
    

    In the formula, define the following values:

    OpenSearch_PVC_Size_GB

    Derived from .values.elasticsearch.persistentVolumeUsableStorageSizeGB, defaulting to .values.elasticsearch.persistentVolumeClaimSize if unspecified. To obtain the OpenSearch PVC size:

    kubectl -n <namespaceName> get cluster <clusterName> -o yaml |\
    yq '.spec.providerSpec.value.helmReleases[] | select(.name == "stacklight") | .values.elasticsearch.persistentVolumeClaimSize '
    

    Example of system response:

    10000Gi
    
    Prometheus_PVC_Size_GB

    Sourced from .values.prometheusServer.persistentVolumeClaimSize. To obtain the Prometheus PVC size:

    kubectl -n <namespaceName> get cluster <clusterName> -o yaml |\
    yq '.spec.providerSpec.value.helmReleases[] | select(.name == "stacklight") | .values.prometheusServer.persistentVolumeClaimSize '
    

    Example of system response:

    4000Gi
    
    Total_Storage_Capacity_GB

    Total capacity of the OpenSearch PVCs. For LVP, the capacity of the storage pool. To obtain the total capacity:

    kubectl get pvc -n stacklight -l app=opensearch-master \
    -o custom-columns=NAME:.metadata.name,CAPACITY:.status.capacity.storage
    

    The system response contains multiple outputs, one per opensearch-master node. Select the capacity for the affected node.

    Note

    Convert the values to GB if they are set in different units.

    If the inequality holds (the left side exceeds the right side), it is an early indication that the cluster is affected.

  2. Verify whether the OpenSearchClusterStatusWarning or OpenSearchClusterStatusCritical alert is firing. And if so, verify the following:

    1. Log in to the OpenSearch web UI.

    2. In Management -> Dev Tools, run the following command:

      GET _cluster/allocation/explain
      

      The following system response indicates that the corresponding node is affected:

      "explanation": "the node is above the low watermark cluster setting \
      [cluster.routing.allocation.disk.watermark.low=85%], using more disk space \
      than the maximum allowed [85.0%], actual free: [xx.xxx%]"
      

      Note

      The system response may contain even higher watermark percent than 85.0%, depending on the case.

Workaround:

Warning

The workaround implies adjustment of the retention threshold for OpenSearch. Depending on the new threshold, some old logs will be deleted.

  1. Adjust or set .values.elasticsearch.persistentVolumeUsableStorageSizeGB to a lower value so that the verification formula above no longer indicates that the cluster is affected. For configuration details, see MOSK Operations Guide: StackLight configuration parameters - OpenSearch.

    Mirantis also recommends reserving some space for other PVCs using storage from the pool. Use the following formula to calculate the required space:

    persistentVolumeUsableStorageSizeGB =
    0.84 × ((1 - Reserved_Percentage - Filesystem_Reserve) ×
    Total_Storage_Capacity_GB - Prometheus_PVC_Size_GB) /
    0.8
    

    In the formula, define the following values:

    Reserved_Percentage

    A user-defined variable that specifies what percentage of the total storage capacity should not be used by OpenSearch or Prometheus. This is used to reserve space for other components. It should be expressed as a decimal. For example, for 5% of reservation, Reserved_Percentage is 0.05. Mirantis recommends using 0.05 as a starting point.

    Filesystem_Reserve

    Percentage to deduct for filesystems that may reserve some portion of the available storage, which is marked as occupied. For example, for EXT4, it is 5% by default, so the value must be 0.05.

    Prometheus_PVC_Size_GB

    Sourced from .values.prometheusServer.persistentVolumeClaimSize.

    Total_Storage_Capacity_GB

    Total capacity of the OpenSearch PVCs. For LVP, the capacity of the storage pool. To obtain the total capacity:

    kubectl get pvc -n stacklight -l app=opensearch-master \
    -o custom-columns=NAME:.metadata.name,CAPACITY:.status.capacity.storage
    

    The system response contains multiple outputs, one per opensearch-master node. Select the capacity for the affected node.

    Note

    Convert the values to GB if they are set in different units.

    The formula above provides the maximum safe storage to allocate for .values.elasticsearch.persistentVolumeUsableStorageSizeGB. Use this formula as a reference for setting .values.elasticsearch.persistentVolumeUsableStorageSizeGB on a cluster.

  2. Wait up to 15-20 minutes for OpenSearch to perform the cleaning.

  3. Verify that the cluster is not affected anymore using the procedure above.


Container Cloud web UI
[50181] Failure to deploy a compact cluster

A compact MOSK cluster fails to be deployed through the Container Cloud web UI because the web UI does not allow adding any label to the control plane machines or changing dedicatedControlPlane: false.

To work around the issue, manually add the required labels using CLI. Once done, the cluster deployment resumes.

[50168] Inability to use a new project right after creation

A newly created project does not display all available tabs in the Container Cloud web UI and shows various access denied errors during the first five minutes after creation.

To work around the issue, refresh the browser five minutes after the project creation.

Artifacts

This section lists the artifacts of components included in the Container Cloud patch release 2.28.1. For artifacts of the Cluster releases introduced in 2.28.1, see patch Cluster releases 17.2.5, 16.3.1, and 16.2.5.

Note

The components that are newly added, updated, deprecated, or removed as compared to the previous release version, are marked with a corresponding superscript, for example, lcm-ansible Updated.

Bare metal artifacts

Artifact

Component

Path

Binaries Updated

ironic-python-agent.initramfs

https://binary.mirantis.com/bm/bin/ironic/ipa/ubuntu/initramfs-yoga-focal-debug-20241014163420

ironic-python-agent.kernel

https://binary.mirantis.com/bm/bin/ironic/ipa/ubuntu/kernel-yoga-focal-debug-20241014163420

Helm charts Updated

baremetal-api

https://binary.mirantis.com/core/helm/baremetal-api-1.41.18.tgz

baremetal-operator

https://binary.mirantis.com/core/helm/baremetal-operator-1.41.18.tgz

baremetal-provider

https://binary.mirantis.com/core/helm/baremetal-provider-1.41.18.tgz

baremetal-public-api

https://binary.mirantis.com/core/helm/baremetal-public-api-1.41.18.tgz

kaas-ipam

https://binary.mirantis.com/core/helm/kaas-ipam-1.41.18.tgz

Docker images

ambasador Updated

mirantis.azurecr.io/core/external/nginx:1.41.18

baremetal-dnsmasq Updated

mirantis.azurecr.io/bm/baremetal-dnsmasq:base-2-28-alpine-20241022121257

baremetal-operator Updated

mirantis.azurecr.io/bm/baremetal-operator:base-2-28-alpine-20241022120949

bm-collective Updated

mirantis.azurecr.io/bm/bm-collective:base-2-28-alpine-20241022120001

cluster-api-provider-baremetal Updated

mirantis.azurecr.io/core/cluster-api-provider-baremetal:1.41.18

ironic Updated

mirantis.azurecr.io/openstack/ironic:antelope-jammy-20240927160001

ironic-inspector Updated

mirantis.azurecr.io/openstack/ironic-inspector:antelope-jammy-20240927160001

ironic-prometheus-exporter

mirantis.azurecr.io/stacklight/ironic-prometheus-exporter:0.1-20240819102310

kaas-ipam Updated

mirantis.azurecr.io/bm/kaas-ipam:base-2-28-alpine-20241022122006

kubernetes-entrypoint Updated

mirantis.azurecr.io/openstack/extra/kubernetes-entrypoint:v1.0.1-34a4f54-20240910081335

mariadb Updated

mirantis.azurecr.io/general/mariadb:10.6.17-jammy-20240927170336

syslog-ng Updated

mirantis.azurecr.io/bm/syslog-ng:base-alpine-20241022120929

Core artifacts

Artifact

Component

Path

Bootstrap tarball Updated

bootstrap-darwin

https://binary.mirantis.com/core/bin/bootstrap-darwin-1.41.18.tgz

bootstrap-linux

https://binary.mirantis.com/core/bin/bootstrap-linux-1.41.18.tgz

Helm charts Updated

admission-controller

https://binary.mirantis.com/core/helm/admission-controller-1.41.18.tgz

agent-controller

https://binary.mirantis.com/core/helm/agent-controller-1.41.18.tgz

ceph-kcc-controller

https://binary.mirantis.com/core/helm/ceph-kcc-controller-1.41.18.tgz

cert-manager

https://binary.mirantis.com/core/helm/cert-manager-1.41.18.tgz

configuration-collector

https://binary.mirantis.com/core/helm/configuration-collector-1.41.18.tgz

credentials-controller

https://binary.mirantis.com/core/helm/credentials-controller-1.41.18.tgz

event-controller

https://binary.mirantis.com/core/helm/event-controller-1.41.18.tgz

host-os-modules-controller

https://binary.mirantis.com/core/helm/host-os-modules-controller-1.41.18.tgz

iam-controller

https://binary.mirantis.com/core/helm/iam-controller-1.41.18.tgz

kaas-exporter

https://binary.mirantis.com/core/helm/kaas-exporter-1.41.18.tgz

kaas-public-api

https://binary.mirantis.com/core/helm/kaas-public-api-1.41.18.tgz

kaas-ui

https://binary.mirantis.com/core/helm/kaas-ui-1.41.18.tgz

lcm-controller

https://binary.mirantis.com/core/helm/lcm-controller-1.41.18.tgz

license-controller

https://binary.mirantis.com/core/helm/license-controller-1.41.18.tgz

machinepool-controller

https://binary.mirantis.com/core/helm/machinepool-controller-1.41.18.tgz

mcc-cache

https://binary.mirantis.com/core/helm/mcc-cache-1.41.18.tgz

mcc-cache-warmup

https://binary.mirantis.com/core/helm/mcc-cache-warmup-1.41.18.tgz

openstack-provider

https://binary.mirantis.com/core/helm/openstack-provider-1.41.18.tgz

portforward-controller

https://binary.mirantis.com/core/helm/portforward-controller-1.41.18.tgz

rbac-controller

https://binary.mirantis.com/core/helm/rbac-controller-1.41.18.tgz

release-controller

https://binary.mirantis.com/core/helm/release-controller-1.41.18.tgz

scope-controller

https://binary.mirantis.com/core/helm/scope-controller-1.41.18.tgz

secret-controller

https://binary.mirantis.com/core/helm/secret-controller-1.41.18.tgz

user-controller

https://binary.mirantis.com/core/helm/user-controller-1.41.18.tgz

Docker images

admission-controller Updated

mirantis.azurecr.io/core/admission-controller:1.41.18

agent-controller Updated

mirantis.azurecr.io/core/agent-controller:1.41.18

ceph-kcc-controller Updated

mirantis.azurecr.io/core/ceph-kcc-controller:1.41.18

cert-manager-controller

mirantis.azurecr.io/core/external/cert-manager-controller:v1.11.0-8

configuration-collector Updated

mirantis.azurecr.io/core/configuration-collector:1.41.18

credentials-controller Updated

mirantis.azurecr.io/core/credentials-controller:1.41.18

event-controller Updated

mirantis.azurecr.io/core/event-controller:1.41.18

frontend Updated

mirantis.azurecr.io/core/frontend:1.41.18

host-os-modules-controller Updated

mirantis.azurecr.io/core/host-os-modules-controller:1.41.18

iam-controller Updated

mirantis.azurecr.io/core/iam-controller:1.41.18

kaas-exporter Updated

mirantis.azurecr.io/core/kaas-exporter:1.41.18

kproxy Updated

mirantis.azurecr.io/core/kproxy:1.41.18

lcm-controller Updated

mirantis.azurecr.io/core/lcm-controller:1.41.18

license-controller Updated

mirantis.azurecr.io/core/license-controller:1.41.18

machinepool-controller Updated

mirantis.azurecr.io/core/machinepool-controller:1.41.18

mcc-cache-warmup Updated

mirantis.azurecr.io/core/mcc-cache-warmup:1.41.18

nginx Updated

mirantis.azurecr.io/core/external/nginx:1.41.18

openstack-cluster-api-controller Updated

mirantis.azurecr.io/core/openstack-cluster-api-controller:1.41.18

portforward-controller Updated

mirantis.azurecr.io/core/portforward-controller:1.41.18

rbac-controller Updated

mirantis.azurecr.io/core/rbac-controller:1.41.18

registry Updated

mirantis.azurecr.io/lcm/registry:v2.8.1-14

release-controller Updated

mirantis.azurecr.io/core/release-controller:1.41.18

scope-controller Updated

mirantis.azurecr.io/core/scope-controller:1.41.18

secret-controller Updated

mirantis.azurecr.io/core/secret-controller:1.41.18

user-controller Updated

mirantis.azurecr.io/core/user-controller:1.41.18

IAM artifacts

Artifact

Component

Path

Binaries

hash-generate-darwin

http://binary.mirantis.com/iam/bin/hash-generate-0.0.1-268-3cf7f17-darwin

hash-generate-linux

http://binary.mirantis.com/iam/bin/hash-generate-0.0.1-268-3cf7f17-linux

Helm charts Updated

iam

https://binary.mirantis.com/core/helm/iam-1.41.18.tgz

Docker images

kubectl

mirantis.azurecr.io/general/kubectl:20240926142019

kubernetes-entrypoint

mirantis.azurecr.io/openstack/extra/kubernetes-entrypoint:v1.0.1-ba8ada4-20240405150338

mariadb

mirantis.azurecr.io/general/mariadb:10.6.17-focal-20240909113408

mcc-keycloak

mirantis.azurecr.io/iam/mcc-keycloak:25.0.6-20240926140203

2.28.0

The Mirantis Container Cloud major release 2.28.0:

  • Introduces support for the Cluster release 17.3.0 that is based on the Cluster release 16.3.0 and represents Mirantis OpenStack for Kubernetes (MOSK) 24.3.

  • Introduces support for the Cluster release 16.3.0 that is based on Mirantis Container Runtime (MCR) 23.0.14 and Mirantis Kubernetes Engine (MKE) 3.7.12 with Kubernetes 1.27.

  • Does not support greenfield deployments on deprecated Cluster releases of the 17.2.x and 16.2.x series. Use the latest available Cluster releases of the series instead.

    Caution

    Make sure to update the Cluster release version of your managed cluster before the current Cluster release version becomes unsupported by a new Container Cloud release version. Otherwise, Container Cloud stops auto-upgrade and eventually Container Cloud itself becomes unsupported.

This section outlines release notes for the Container Cloud release 2.28.0.

Enhancements

This section outlines new features and enhancements introduced in the Container Cloud release 2.28.0. For the list of enhancements delivered with the Cluster releases introduced by Container Cloud 2.28.0, see 17.3.0 and 16.3.0.

General availability for Ubuntu 22.04 on MOSK clusters

Implemented full support for Ubuntu 22.04 LTS (Jammy Jellyfish) as the default host operating system in MOSK clusters, including greenfield deployments and update from Ubuntu 20.04 to 22.04 on existing clusters.

Ubuntu 20.04 is deprecated for greenfield deployments and supported during the MOSK 24.3 release cycle only for existing clusters.

Warning

During the course of the Container Cloud 2.28.x series, Mirantis highly recommends upgrading the operating system on all nodes of your managed clusters to Ubuntu 22.04 before the next major Cluster release becomes available.

It is not mandatory to upgrade all machines at once. You can upgrade them one by one or in small batches, for example, if the maintenance window is limited in time.

Otherwise, the Cluster release update of the Ubuntu 20.04-based managed clusters will become impossible as of Container Cloud 2.29.0 with Ubuntu 22.04 as the only supported version.

Management cluster update to Container Cloud 2.29.1 will be blocked if at least one node of any related managed cluster is running Ubuntu 20.04.

Note

In Container Cloud 2.27.0 (Cluster release 16.2.0), existing management clusters were automatically updated to Ubuntu 22.04 during cluster upgrade. Greenfield deployments of management clusters are also based on Ubuntu 22.04.

Day-2 operations for bare metal: updating modules

TechPreview

Implemented the capability to update custom modules using deprecation. Once you create a new custom module, you can use it to deprecate another module by adding the deprecates field to metadata.yaml of the new module. The related HostOSConfiguration and HostOSConfigurationModules objects reflect the deprecation status of new and old modules using the corresponding fields in spec and status sections.

Also, added monitoring of deprecated modules by implementing the StackLight metrics for the Host Operating System Modules Controller along with the Day2ManagementControllerTargetDown and Day2ManagementDeprecatedConfigs alerts to notify the cloud operator about detected deprecations and issues with host-os-modules-controller.

Note

Deprecation is soft, meaning that no actual restrictions are applied to the usage of a deprecated module.

Caution

Deprecating a version automatically deprecates all lower SemVer versions of the specified module.
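
For illustration only, the following minimal sketch shows how such a deprecation might be declared in metadata.yaml of a new module version. The module name and the exact layout of the entry are assumptions; only the deprecates field name comes from the description above.

name: custom-sysctl        # hypothetical module name
version: 1.1.0
deprecates:                # marks the older version below as deprecated
- name: custom-sysctl
  version: 1.0.0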

Day-2 operations for bare metal: configuration enhancements for modules

TechPreview

Introduced the following configuration enhancements for custom modules:

  • Module-specific Ansible configuration

    Updated the Ansible execution mechanism for running any modules. The default ansible.cfg file is now placed in /etc/ansible/mcc.cfg and used for execution of lcm-ansible and day-2 modules. However, if a module has its own ansible.cfg in the module root folder, such configuration is used for the module execution instead of the default one.

  • Configuration of supported operating system distribution

    Added the supportedDistributions field to the metadata section of a module custom resource to define the list of supported operating system distributions for the module, as shown in the sketch after this list. This field is informative and does not block module execution on machines running unsupported distributions, but such execution will most likely fail with an error.

  • Separate flag for machines requiring reboot

    Introduced a separate /run/day2/reboot-required file for day-2 modules to notify that a machine requires a reboot after the module execution and to record the reason for the reboot. The feature allows separating the reboot reason between LCM and day-2 operations.
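
For illustration only, a module metadata declaration with the new field might look as follows. The supportedDistributions field name comes from the description above, while the distribution identifiers are placeholders.

# Hypothetical fragment of a module metadata declaration
supportedDistributions:
- ubuntu/jammy
- ubuntu/focal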

Update group for controller nodes

TechPreview

Implemented the update group for controller nodes using the UpdateGroup resource, which is automatically generated during initial cluster creation with the following settings:

  • Name: <cluster-name>-control

  • Index: 1

  • Concurrent updates: 1

This feature decouples the concurrency settings from the global cluster level and provides update flexibility.

All control plane nodes are automatically assigned to the control update group with no possibility to change it.
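
A minimal sketch of such an auto-generated UpdateGroup object for a hypothetical cluster named demo-cluster. The apiVersion and exact field names are assumptions derived from the settings listed above, not a verbatim copy of the generated object.

apiVersion: kaas.mirantis.com/v1alpha1   # assumed API group and version
kind: UpdateGroup
metadata:
  name: demo-cluster-control             # <cluster-name>-control
  namespace: managed-ns                  # hypothetical project namespace
spec:
  index: 1                               # update order of the group
  concurrentUpdates: 1                   # one controller node at a time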

Note

On existing clusters created before 2.28.0 (Cluster releases 17.2.0, 16.2.0, or earlier), the control update group is created after upgrade of the Container Cloud release to 2.28.0 (Cluster release 16.3.0) on the management cluster.

Reboot of machines using update groups

TechPreview

Implemented the rebootIfUpdateRequires parameter for the UpdateGroup custom resource. The parameter allows for rebooting a set of controller or worker machines added to an update group during a Cluster release update that requires a reboot, for example, when kernel version update is available in the target Cluster release. The feature reduces manual intervention and overall downtime during cluster update.

Note

By default, rebootIfUpdateRequires is set to false on managed clusters and to true on management clusters.
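
Building on the sketch above, enabling the reboot behavior on a managed cluster group could look as follows. The parameter name comes from the description; its placement directly under spec is an assumption.

spec:
  rebootIfUpdateRequires: true   # reboot machines of this group if the target Cluster release requires it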

Self-diagnostics for management and managed clusters

Implemented the Diagnostic Controller, a tool with a set of diagnostic checks that performs self-diagnostics of any Container Cloud cluster and helps the operator easily understand, troubleshoot, and resolve potential issues in the following major subsystems: core, bare metal, Ceph, StackLight, Tungsten Fabric, and OpenStack. The Diagnostic Controller analyzes the configuration of the cluster subsystems and reports the results of checks that contain useful information about cluster health.

Running self-diagnostics on both management and managed clusters is essential to ensure the overall health and optimal performance of your cluster. Mirantis recommends running self-diagnostics before cluster update, node replacement, or any other significant changes in the cluster to prevent potential issues and optimize the maintenance window.

Configuration of groups in auditd

TechPreview

Simplified the default auditd configuration by implementing the preset groups that you can use in presetRules instead of exact names or the virtual group all. The feature allows enabling a limited set of presets using a single keyword (group name).

Also, optimized disk usage by dropping the following Docker rule, which was removed from the Docker CIS Benchmark 1.3.0 because it produces excessive events:

# 1.2.4 Ensure auditing is configured for Docker files and directories - /var/lib/docker
-w /var/lib/docker -k docker
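
For reference, a hedged sketch of enabling a preset group as described above, assuming that the auditd settings are defined in the Cluster object as in previous releases. The preset group name docker is a placeholder.

spec:
  providerSpec:
    value:
      audit:
        auditd:
          enabled: true
          presetRules: "docker"   # a preset group name instead of exact rule names or the virtual group "all"
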
Amendments for the ClusterUpdatePlan object

TechPreview

Enhanced the ClusterUpdatePlan object by adding a separate update step for each UpdateGroup of worker nodes of a managed cluster. The feature allows the operator to granularly control the update process and its impact on workloads, with the option to pause the update after each step.

Also, added several StackLight alerts to notify the operator about the update progress and potential update issues.

Refactoring of delayed auto-update of a management cluster

Refactored the MCCUpgrade object by implementing a new mechanism to delay Container Cloud release updates. You now have the following options for auto-update of a management cluster:

  • Automatically update a cluster on the publish day of a new release (by default).

  • Set specific days and hours for an auto-update, allowing delays of up to one week (see the sketch after this list). For example, if a release becomes available on Monday, you can delay it until Sunday by setting Sunday as the only permitted day for auto-updates.

  • Delay auto-update for a minimum of 20 days for each newly discovered release. The exact number of delay days is set in the release metadata and cannot be changed by the user. It depends on the specifics of each release cycle and on the optional configuration of week days and hours selected for update.

    You can verify the exact date of a scheduled auto-update either in the Status section of the Management Cluster Updates page in the web UI or in the status section of the MCCUpgrade object.

  • Combine auto-update delay with the specific days and hours setting (two previous options).

Also, optimized monitoring of auto-update by implementing several StackLight metrics for the kaas-exporter job along with the MCCUpdateBlocked and MCCUpdateScheduled alerts to notify the cloud operator about new releases as well as other important information about management cluster auto-update.
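
For illustration only, the days-and-hours option might be expressed in the MCCUpgrade object similarly to the following sketch. The field names reflect the pre-existing MCCUpgrade schedule schema and are assumptions here; time zone handling is omitted.

spec:
  schedule:            # allow auto-update only on Sunday between 10:00 and 17:00
  - hours:
      from: 10
      to: 17
    weekdays:
      sunday: true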

Container Cloud web UI enhancements for the bare metal provider

Refactored and improved UX visibility as well as added the following functionality for the bare metal managed clusters in the Container Cloud web UI:

  • Reworked the Create Subnets page:

    • Added the possibility to delete a subnet when it is not used by a cluster

    • Changed the default value of Use whole CIDR from true to false

    • Added storage subnet types: Storage access and Storage replication

  • Added the MetalLB Configs tab with configuration fields for MetalLB on the Networks page

  • Optimized the Create new machine form

  • Replicated the Create Credential form on the Baremetal page for easy access

  • Added the Labels fields to the Create L2 template and Create host profile forms as well as optimized uploading of specification data for these objects

Documentation enhancements

On top of continuous improvements delivered to the existing Container Cloud guides, added documentation on how to run Ceph performance tests using Kubernetes batch or cron jobs that run fio processes according to a predefined KaaSCephOperationRequest CR.

Addressed issues

The following issues have been addressed in the Mirantis Container Cloud release 2.28.0 along with the Cluster releases 17.3.0 and 16.3.0.

Note

This section provides descriptions of issues addressed since the last Container Cloud patch release 2.27.4.

For details on addressed issues in earlier patch releases since 2.27.0, which are also included into the major release 2.28.0, refer to 2.27.x patch releases.

  • [41305] [Bare metal] Fixed the issue with newly added management cluster nodes failing to undergo provisioning if the management cluster nodes were configured with a single L2 segment used for all network traffic (PXE and LCM/management networks).

  • [46245] [Bare metal] Fixed the issue with lack of permissions for serviceuser and users with the global-admin and operator roles to fetch HostOSConfigurationModules and HostOSConfiguration custom resources.

  • [43164] [StackLight] Fixed the issue with the rollover policy not being added to indices created without a policy.

Known issues

This section lists known issues with workarounds for the Mirantis Container Cloud release 2.28.0 including the Cluster releases 17.3.0 and 16.3.0.

For other issues that can occur while deploying and operating a Container Cloud cluster, see Deployment Guide: Troubleshooting and Operations Guide: Troubleshooting.

Note

This section also outlines still valid known issues from previous Container Cloud releases.

Bare metal
[47202] Inspection error on bare metal hosts after dnsmasq restart

Note

Moving forward, the workaround for this issue will be moved from Release Notes to MOSK Troubleshooting Guide: Inspection error on bare metal hosts after dnsmasq restart.

If the dnsmasq pod is restarted during the bootstrap of newly added nodes, those nodes may fail to undergo inspection. That can result in inspection error in the corresponding BareMetalHost objects.

The issue can occur when:

  • The dnsmasq pod was moved to another node.

  • DHCP subnets were changed, including addition or removal. In this case, the dhcpd container of the dnsmasq pod is restarted.

    Caution

    If you need to change or add DHCP subnets to bootstrap new nodes, wait until the dnsmasq pod becomes ready after the change, and only then create the BareMetalHost objects.

To verify whether the nodes are affected:

  1. Verify whether the BareMetalHost objects contain the inspection error:

    kubectl get bmh -n <managed-cluster-namespace-name>
    

    Example of system response:

    NAME            STATE         CONSUMER        ONLINE   ERROR              AGE
    test-master-1   provisioned   test-master-1   true                        9d
    test-master-2   provisioned   test-master-2   true                        9d
    test-master-3   provisioned   test-master-3   true                        9d
    test-worker-1   provisioned   test-worker-1   true                        9d
    test-worker-2   provisioned   test-worker-2   true                        9d
    test-worker-3   inspecting                    true     inspection error   19h
    
  2. Verify whether the dnsmasq pod was in Ready state when the inspection of the affected baremetal hosts (test-worker-3 in the example above) was started:

    kubectl -n kaas get pod <dnsmasq-pod-name> -oyaml
    

    Example of system response:

    ...
    status:
      conditions:
      - lastProbeTime: null
        lastTransitionTime: "2024-10-10T15:37:34Z"
        status: "True"
        type: Initialized
      - lastProbeTime: null
        lastTransitionTime: "2024-10-11T07:38:54Z"
        status: "True"
        type: Ready
      - lastProbeTime: null
        lastTransitionTime: "2024-10-11T07:38:54Z"
        status: "True"
        type: ContainersReady
      - lastProbeTime: null
        lastTransitionTime: "2024-10-10T15:37:34Z"
        status: "True"
        type: PodScheduled
      containerStatuses:
      - containerID: containerd://6dbcf2fc4b36ce4c549c9191ab01f72d0236c51d42947675302675e4bfaf4cdf
        image: docker-dev-kaas-virtual.artifactory-eu.mcp.mirantis.net/bm/baremetal-dnsmasq:base-2-28-alpine-20240812132650
        imageID: docker-dev-kaas-virtual.artifactory-eu.mcp.mirantis.net/bm/baremetal-dnsmasq@sha256:3dad3e278add18e69b2608e462691c4823942641a0f0e25e6811e703e3c23b3b
        lastState:
          terminated:
            containerID: containerd://816fcf079cd544acd74e312065de5b5ed4dbf1dc6159fefffff4f644b5e45987
            exitCode: 0
            finishedAt: "2024-10-11T07:38:35Z"
            reason: Completed
            startedAt: "2024-10-10T15:37:45Z"
        name: dhcpd
        ready: true
        restartCount: 2
        started: true
        state:
          running:
            startedAt: "2024-10-11T07:38:37Z"
      ...
    

    In the system response above, the dhcpd container was not ready between "2024-10-11T07:38:35Z" and "2024-10-11T07:38:54Z".

  3. Verify the affected baremetal host. For example:

    kubectl get bmh -n managed-ns test-worker-3 -oyaml
    

    Example of system response:

    ...
    status:
      errorCount: 15
      errorMessage: Introspection timeout
      errorType: inspection error
      ...
      operationHistory:
        deprovision:
          end: null
          start: null
        inspect:
          end: null
          start: "2024-10-11T07:38:19Z"
        provision:
          end: null
          start: null
        register:
          end: "2024-10-11T07:38:19Z"
          start: "2024-10-11T07:37:25Z"
    

    In the system response above, inspection was started at "2024-10-11T07:38:19Z", immediately before the period of the dhcpd container downtime. Therefore, this node is most likely affected by the issue.

Workaround

  1. Reboot the node using the IPMI reset or cycle command.

  2. If the node fails to boot, remove the failed BareMetalHost object and create it again:

    1. Remove BareMetalHost object. For example:

      kubectl delete bmh -n managed-ns test-worker-3
      
    2. Verify that the BareMetalHost object is removed:

      kubectl get bmh -n managed-ns test-worker-3
      
    3. Create a BareMetalHost object from the template. For example:

      kubectl create -f bmhc-test-worker-3.yaml
      kubectl create -f bmh-test-worker-3.yaml
      
[42386] A load balancer service does not obtain the external IP address

Due to the MetalLB upstream issue, a load balancer service may not obtain the external IP address.

The issue occurs when two services share the same external IP address and have the same externalTrafficPolicy value. Initially, the services have the external IP address assigned and are accessible. After modifying the externalTrafficPolicy value for both services from Cluster to Local, the first service that has been changed remains with no external IP address assigned. However, the second service, which was changed later, has the external IP assigned as expected.

To work around the issue, make a dummy change to the service object where external IP is <pending>:

  1. Identify the service that is stuck:

    kubectl get svc -A | grep pending
    

    Example of system response:

    stacklight  iam-proxy-prometheus  LoadBalancer  10.233.28.196  <pending>  443:30430/TCP
    
  2. Add an arbitrary label to the service that is stuck. For example:

    kubectl label svc -n stacklight iam-proxy-prometheus reconcile=1
    

    Example of system response:

    service/iam-proxy-prometheus labeled
    
  3. Verify that the external IP was allocated to the service:

    kubectl get svc -n stacklight iam-proxy-prometheus
    

    Example of system response:

    NAME                  TYPE          CLUSTER-IP     EXTERNAL-IP  PORT(S)        AGE
    iam-proxy-prometheus  LoadBalancer  10.233.28.196  10.0.34.108  443:30430/TCP  12d
    
[24005] Deletion of a node with ironic Pod is stuck in the Terminating state

During deletion of a manager machine running the ironic Pod from a bare metal management cluster, the following problems occur:

  • All Pods are stuck in the Terminating state

  • A new ironic Pod fails to start

  • The related bare metal host is stuck in the deprovisioning state

As a workaround, before deletion of the node running the ironic Pod, cordon and drain the node using the kubectl cordon <nodeName> and kubectl drain <nodeName> commands.


Ceph
[50566] Ceph upgrade is very slow during patch or major cluster update

Due to the upstream Ceph issue 66717, during a CVE-related upgrade of the Ceph daemon image of Ceph Reef 18.2.4, OSDs may start slowly and even fail the startup probe with the following describe output in the rook-ceph-osd-X pod:

 Warning  Unhealthy  57s (x16 over 3m27s)  kubelet  Startup probe failed:
 ceph daemon health check failed with the following output:
> no valid command found; 10 closest matches:
> 0
> 1
> 2
> abort
> assert
> bluefs debug_inject_read_zeros
> bluefs files list
> bluefs stats
> bluestore bluefs device info [<alloc_size:int>]
> config diff
> admin_socket: invalid command

Workaround:

Complete the following steps during every patch or major cluster update of the Cluster releases 17.2.x, 17.3.x, and 17.4.x (until Ceph 18.2.5 becomes supported):

  1. Plan extra time in the maintenance window for the patch cluster update.

    Slow starts still impact the update procedure, but after you complete the following step, the recovery process becomes noticeably shorter without affecting the overall cluster state and data responsiveness.

  2. Select one of the following options:

    • Before the cluster update, set the noout flag:

      ceph osd set noout
      

      Once the Ceph OSDs image upgrade is done, unset the flag:

      ceph osd unset noout
      
    • Monitor the Ceph OSDs image upgrade. If the symptoms of slow start appear, set the noout flag as soon as possible. Once the Ceph OSDs image upgrade is done, unset the flag.

[26441] Cluster update fails with the MountDevice failed for volume warning

Update of a managed cluster based on bare metal with Ceph enabled fails with a PersistentVolumeClaim getting stuck in the Pending state for the prometheus-server StatefulSet and the MountVolume.MountDevice failed for volume warning in the StackLight event logs.

Workaround:

  1. Verify that the description of the Pods that failed to run contain the FailedMount events:

    kubectl -n <affectedProjectName> describe pod <affectedPodName>
    

    In the command above, replace the following values:

    • <affectedProjectName> is the Container Cloud project name where the Pods failed to run

    • <affectedPodName> is a Pod name that failed to run in the specified project

    In the Pod description, identify the node name where the Pod failed to run.

  2. Verify that the csi-rbdplugin logs of the affected node contain the rbd volume mount failed: <csi-vol-uuid> is being used error. The <csi-vol-uuid> is a unique RBD volume name.

    1. Identify csiPodName of the corresponding csi-rbdplugin:

      kubectl -n rook-ceph get pod -l app=csi-rbdplugin \
      -o jsonpath='{.items[?(@.spec.nodeName == "<nodeName>")].metadata.name}'
      
    2. Output the affected csiPodName logs:

      kubectl -n rook-ceph logs <csiPodName> -c csi-rbdplugin
      
  3. Scale down the affected StatefulSet or Deployment of the Pod that fails to 0 replicas.

  4. On every csi-rbdplugin Pod, search for stuck csi-vol:

    for pod in `kubectl -n rook-ceph get pods|grep rbdplugin|grep -v provisioner|awk '{print $1}'`; do
      echo $pod
      kubectl exec -it -n rook-ceph $pod -c csi-rbdplugin -- rbd device list | grep <csi-vol-uuid>
    done
    
  5. Unmap the affected csi-vol:

    rbd unmap -o force /dev/rbd<i>
    

    The /dev/rbd<i> value is a mapped RBD volume that uses csi-vol.

  6. Delete volumeattachment of the affected Pod:

    kubectl get volumeattachments | grep <csi-vol-uuid>
    kubectl delete volumeattachment <id>
    
  7. Scale up the affected StatefulSet or Deployment back to the original number of replicas and wait until its state becomes Running.


LCM
[46808] Old kernel metapackages are not removed during kernel upgrade

Fixed in 2.28.1 (17.2.5, 16.2.5, and 16.3.1)

After upgrade of the kernel to the latest supported version, old kernel metapackages may remain on the cluster. The issue occurs if the system kernel line is changed from LTS to HWE. This setting is controlled by the upgrade_kernel_version parameter located in the ClusterRelease object under the deploy StateItem. As a result, the operating system has both LTS and HWE kernel packages installed and regularly updated, but only one kernel image is used (loaded into memory). The unused kernel images consume a minimal amount of disk space.

Therefore, you can safely disregard the issue because it does not affect cluster operability. If you still require removing unused kernel metapackages, contact Mirantis support for detailed instructions.
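
If you only want to verify which kernel packages are present on a node, a generic Ubuntu check (not a Mirantis-specific procedure) is sufficient:

# List installed kernel images and metapackages, then show the kernel currently in use
dpkg -l | grep -E 'linux-(image|generic|hwe)' | grep '^ii'
uname -r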

[39437] Failure to replace a master node on a Container Cloud cluster

Fixed in 2.29.0 (17.4.0 and 16.4.0)

During the replacement of a master node on a cluster of any type, the process may get stuck with Kubelet's NodeReady condition is Unknown in the machine status on the remaining master nodes.

As a workaround, log in on the affected node and run the following command:

docker restart ucp-kubelet
[31186,34132] Pods get stuck during MariaDB operations

During MariaDB operations on a management cluster, Pods may get stuck in continuous restarts with the following example error:

[ERROR] WSREP: Corrupt buffer header: \
addr: 0x7faec6f8e518, \
seqno: 3185219421952815104, \
size: 909455917, \
ctx: 0x557094f65038, \
flags: 11577. store: 49, \
type: 49

Workaround:

  1. Create a backup of the /var/lib/mysql directory on the mariadb-server Pod.

  2. Verify that other replicas are up and ready.

  3. Remove the galera.cache file for the affected mariadb-server Pod.

  4. Remove the affected mariadb-server Pod or wait until it is automatically restarted.

After Kubernetes restarts the Pod, the Pod clones the database in 1-2 minutes and restores the quorum.
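
A minimal sketch of steps 3 and 4 above, assuming the affected Pod is mariadb-server-2 in the kaas namespace; both the Pod name and the namespace are assumptions for illustration.

# Remove the galera.cache file on the affected replica
kubectl -n kaas exec mariadb-server-2 -- rm /var/lib/mysql/galera.cache
# Remove the affected Pod so that Kubernetes recreates it
kubectl -n kaas delete pod mariadb-server-2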

[30294] Replacement of a master node is stuck on the calico-node Pod start

Fixed in 2.28.4 (17.3.4 and 16.3.4)

During replacement of a master node on a cluster of any type, the calico-node Pod fails to start on a new node that has the same IP address as the node being replaced.

Workaround:

  1. Log in to any master node.

  2. From a CLI with an MKE client bundle, create a shell alias to start calicoctl using the mirantis/ucp-dsinfo image:

    alias calicoctl="\
    docker run -i --rm \
    --pid host \
    --net host \
    -e constraint:ostype==linux \
    -e ETCD_ENDPOINTS=<etcdEndpoint> \
    -e ETCD_KEY_FILE=/var/lib/docker/volumes/ucp-kv-certs/_data/key.pem \
    -e ETCD_CA_CERT_FILE=/var/lib/docker/volumes/ucp-kv-certs/_data/ca.pem \
    -e ETCD_CERT_FILE=/var/lib/docker/volumes/ucp-kv-certs/_data/cert.pem \
    -v /var/run/calico:/var/run/calico \
    -v /var/lib/docker/volumes/ucp-kv-certs/_data:/var/lib/docker/volumes/ucp-kv-certs/_data:ro \
    mirantis/ucp-dsinfo:<mkeVersion> \
    calicoctl \
    "
    
    alias calicoctl="\
    docker run -i --rm \
    --pid host \
    --net host \
    -e constraint:ostype==linux \
    -e ETCD_ENDPOINTS=<etcdEndpoint> \
    -e ETCD_KEY_FILE=/ucp-node-certs/key.pem \
    -e ETCD_CA_CERT_FILE=/ucp-node-certs/ca.pem \
    -e ETCD_CERT_FILE=/ucp-node-certs/cert.pem \
    -v /var/run/calico:/var/run/calico \
    -v ucp-node-certs:/ucp-node-certs:ro \
    mirantis/ucp-dsinfo:<mkeVersion> \
    calicoctl --allow-version-mismatch \
    "
    

    In the above command, replace the following values with the corresponding settings of the affected cluster:

    • <etcdEndpoint> is the etcd endpoint defined in the Calico configuration file. For example, ETCD_ENDPOINTS=127.0.0.1:12378

    • <mkeVersion> is the MKE version installed on your cluster. For example, mirantis/ucp-dsinfo:3.5.7.

  3. Verify the node list on the cluster:

    kubectl get node
    
  4. Compare this list with the node list in Calico to identify the old node:

    calicoctl get node -o wide
    
  5. Remove the old node from Calico:

    calicoctl delete node kaas-node-<nodeID>
    
[5782] Manager machine fails to be deployed during node replacement

Fixed in 2.28.4 (17.3.4 and 16.3.4)

During replacement of a manager machine, the following problems may occur:

  • The system adds the node to Docker swarm but not to Kubernetes

  • The node Deployment gets stuck with failed RethinkDB health checks

Workaround:

  1. Delete the failed node.

  2. Wait for the MKE cluster to become healthy. To monitor the cluster status:

    1. Log in to the MKE web UI as described in Connect to the Mirantis Kubernetes Engine web UI.

    2. Monitor the cluster status as described in MKE Operations Guide: Monitor an MKE cluster with the MKE web UI.

  3. Deploy a new node.

[5568] The calico-kube-controllers Pod fails to clean up resources

Fixed in 2.28.4 (17.3.4 and 16.3.4)

During the unsafe or forced deletion of a manager machine running the calico-kube-controllers Pod in the kube-system namespace, the following issues occur:

  • The calico-kube-controllers Pod fails to clean up resources associated with the deleted node

  • The calico-node Pod may fail to start up on a newly created node if the machine is provisioned with the same IP address as the deleted machine had

As a workaround, before deletion of the node running the calico-kube-controllers Pod, cordon and drain the node:

kubectl cordon <nodeName>
kubectl drain <nodeName>

StackLight
[47594] Patroni pods may get stuck in the CrashLoopBackOff state

Fixed in 2.28.3 (17.2.7, 16.2.7, and 16.3.3)

The Patroni pods may get stuck in the CrashLoopBackOff state due to the patroni container being terminated with reason: OOMKilled, which you can see in the pod status. For example:

kubectl get pod/patroni-13-0 -n stacklight -o yaml
...
  - containerID: docker://<ID>
    image: mirantis.azurecr.io/stacklight/spilo:13-2.1p9-20240828023010
    imageID: docker-pullable://mirantis.azurecr.io/stacklight/spilo@sha256:<ID>
    lastState:
      terminated:
        containerID: docker://<ID>
        exitCode: 137
        finishedAt: "2024-10-17T14:26:25Z"
        reason: OOMKilled
        startedAt: "2024-10-17T14:23:25Z"
    name: patroni
...

As a workaround, increase the memory limit for PostgreSQL to 20Gi in the Cluster object:

spec:
  providerSpec:
    value:
      helmReleases:
      - name: stacklight
        values:
          resources:
            postgresql:
              limits:
                memory: "20Gi"

For a detailed procedure of StackLight configuration, see MOSK Operations Guide: Configure StackLight. For description of the resources option, see MOSK Operations Guide: StackLight configuration parameters - Resource limits.

[47304] OpenSearch does not store kubelet logs

Fixed in 2.28.2 (17.2.6, 16.2.6, and 16.3.2)

Due to the JSON-based format of ucp-kubelet logs, OpenSearch does not store kubelet logs. Mirantis is working on the issue and will deliver the resolution in one of the nearest patch releases.

[44193] OpenSearch reaches 85% disk usage watermark affecting the cluster state

Fixed in 2.29.0 (17.4.0 and 16.4.0)

On High Availability (HA) clusters that use Local Volume Provisioner (LVP), Prometheus and OpenSearch from StackLight may share the same pool of storage. In such configuration, OpenSearch may approach the 85% disk usage watermark due to the combined storage allocation and usage patterns set by the Persistent Volume Claim (PVC) size parameters for Prometheus and OpenSearch, which consume storage the most.

When the 85% threshold is reached, the affected node is transitioned to the read-only state, preventing shard allocation and causing the OpenSearch cluster state to transition to Warning (Yellow) or Critical (Red).

Caution

The issue and the provided workaround apply only for clusters on which OpenSearch and Prometheus utilize the same storage pool.

To verify that the cluster is affected:

  1. Verify the result of the following formula:

    0.8 × OpenSearch_PVC_Size_GB + Prometheus_PVC_Size_GB > 0.85 × Total_Storage_Capacity_GB
    

    In the formula, define the following values:

    OpenSearch_PVC_Size_GB

    Derived from .values.elasticsearch.persistentVolumeUsableStorageSizeGB, defaulting to .values.elasticsearch.persistentVolumeClaimSize if unspecified. To obtain the OpenSearch PVC size:

    kubectl -n <namespaceName> get cluster <clusterName> -o yaml |\
    yq '.spec.providerSpec.value.helmReleases[] | select(.name == "stacklight") | .values.elasticsearch.persistentVolumeClaimSize '
    

    Example of system response:

    10000Gi
    
    Prometheus_PVC_Size_GB

    Sourced from .values.prometheusServer.persistentVolumeClaimSize. To obtain the Prometheus PVC size:

    kubectl -n <namespaceName> get cluster <clusterName> -o yaml |\
    yq '.spec.providerSpec.value.helmReleases[] | select(.name == "stacklight") | .values.prometheusServer.persistentVolumeClaimSize '
    

    Example of system response:

    4000Gi
    
    Total_Storage_Capacity_GB

    Total capacity of the OpenSearch PVCs. For LVP, the capacity of the storage pool. To obtain the total capacity:

    kubectl get pvc -n stacklight -l app=opensearch-master \
    -o custom-columns=NAME:.metadata.name,CAPACITY:.status.capacity.storage
    

    The system response contains multiple outputs, one per opensearch-master node. Select the capacity for the affected node.

    Note

    Convert the values to GB if they are set in different units.

    If the formula result is positive, it is an early indication that the cluster is affected.

  2. Verify whether the OpenSearchClusterStatusWarning or OpenSearchClusterStatusCritical alert is firing. And if so, verify the following:

    1. Log in to the OpenSearch web UI.

    2. In Management -> Dev Tools, run the following command:

      GET _cluster/allocation/explain
      

      The following system response indicates that the corresponding node is affected:

      "explanation": "the node is above the low watermark cluster setting \
      [cluster.routing.allocation.disk.watermark.low=85%], using more disk space \
      than the maximum allowed [85.0%], actual free: [xx.xxx%]"
      

      Note

      The system response may contain an even higher watermark percentage than 85.0%, depending on the case.

Workaround:

Warning

The workaround implies adjustment of the retention threshold for OpenSearch. Depending on the new threshold, some old logs will be deleted.

  1. Adjust or set .values.elasticsearch.persistentVolumeUsableStorageSizeGB to a lower value so that the verification formula above becomes non-positive. For configuration details, see MOSK Operations Guide: StackLight configuration parameters - OpenSearch.

    Mirantis also recommends reserving some space for other PVCs using storage from the pool. Use the following formula to calculate the required space:

    persistentVolumeUsableStorageSizeGB =
    0.84 × ((1 - Reserved_Percentage - Filesystem_Reserve) ×
    Total_Storage_Capacity_GB - Prometheus_PVC_Size_GB) /
    0.8
    

    In the formula, define the following values:

    Reserved_Percentage

    A user-defined variable that specifies what percentage of the total storage capacity should not be used by OpenSearch or Prometheus. This is used to reserve space for other components. It should be expressed as a decimal. For example, for 5% of reservation, Reserved_Percentage is 0.05. Mirantis recommends using 0.05 as a starting point.

    Filesystem_Reserve

    Percentage to deduct for filesystems that may reserve some portion of the available storage, which is marked as occupied. For example, for EXT4, it is 5% by default, so the value must be 0.05.

    Prometheus_PVC_Size_GB

    Sourced from .values.prometheusServer.persistentVolumeClaimSize.

    Total_Storage_Capacity_GB

    Total capacity of the OpenSearch PVCs. For LVP, the capacity of the storage pool. To obtain the total capacity:

    kubectl get pvc -n stacklight -l app=opensearch-master \
    -o custom-columns=NAME:.metadata.name,CAPACITY:.status.capacity.storage
    

    The system response contains multiple outputs, one per opensearch-master node. Select the capacity for the affected node.

    Note

    Convert the values to GB if they are set in different units.

    The above formula provides the maximum safe storage to allocate for .values.elasticsearch.persistentVolumeUsableStorageSizeGB. Use it as a reference when setting this parameter on a cluster. See also the worked example after this procedure.

  2. Wait up to 15-20 minutes for OpenSearch to perform the cleaning.

  3. Verify that the cluster is not affected anymore using the procedure above.
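
A worked example of the formula from step 1, assuming Total_Storage_Capacity_GB = 10000, Prometheus_PVC_Size_GB = 4000, Reserved_Percentage = 0.05, and Filesystem_Reserve = 0.05. The capacity and PVC values are illustrative only.

persistentVolumeUsableStorageSizeGB =
0.84 × ((1 - 0.05 - 0.05) × 10000 - 4000) / 0.8 =
0.84 × (9000 - 4000) / 0.8 =
0.84 × 6250 = 5250

In this example, setting .values.elasticsearch.persistentVolumeUsableStorageSizeGB to a value not exceeding approximately 5250 GB keeps the verification formula non-positive.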


Container Cloud web UI
[50181] Failure to deploy a compact cluster

A compact MOSK cluster fails to be deployed through the Container Cloud web UI because the web UI does not allow adding labels to the control plane machines or changing the dedicatedControlPlane: false setting.

To work around the issue, manually add the required labels using the CLI. Once done, the cluster deployment resumes.

[50168] Inability to use a new project right after creation

A newly created project does not display all available tabs in the Container Cloud web UI and produces various access denied errors during the first five minutes after creation.

To work around the issue, wait for five minutes after the project creation and refresh the browser.

Components versions

The following table lists the major components and their versions delivered in Container Cloud 2.28.0. The components that are newly added, updated, deprecated, or removed as compared to 2.27.0 are marked with a corresponding superscript, for example, admission-controller Updated.

Component

Application/Service

Version

Bare metal

baremetal-dnsmasq Updated

base-2-28-alpine-20240906160120

baremetal-operator Updated

base-2-28-alpine-20240910093836

baremetal-provider Updated

1.41.14

bm-collective Updated

base-2-28-alpine-20240910093747

cluster-api-provider-baremetal Updated

1.41.14

ironic Updated

antelope-jammy-20240716113922

ironic-inspector Updated

antelope-jammy-20240716113922

ironic-prometheus-exporter

0.1-20240819102310

kaas-ipam Updated

base-2-28-alpine-20240910095249

kubernetes-entrypoint

v1.0.1-4e381cb-20240813170642

mariadb

10.6.17-focal-20240523075821

metallb-controller Updated

v0.14.5-ed177720-amd64

metallb-speaker Updated

v0.14.5-ed177720-amd64

syslog-ng Updated

base-alpine-20240906155734

Container Cloud

admission-controller Updated

1.41.14

agent-controller Updated

1.41.14

byo-cluster-api-controller Updated

1.41.14

byo-credentials-controller Removed

n/a

ceph-kcc-controller Updated

1.41.14

cert-manager-controller Updated

1.11.0-8

configuration-collector Updated

1.41.14

event-controller Updated

1.41.14

frontend Updated

1.41.14

golang

1.22.7

iam-controller Updated

1.41.14

kaas-exporter Updated

1.41.14

kproxy Updated

1.41.14

lcm-controller Updated

1.41.14

license-controller Updated

1.41.14

machinepool-controller Updated

1.41.14

mcc-haproxy Updated

0.26.0-95-g95f0130

nginx Updated

1.41.14

portforward-controller Updated

1.41.14

proxy-controller Updated

1.41.14

rbac-controller Updated

1.41.14

registry Updated

2.8.1-13

release-controller Updated

1.41.14

rhellicense-controller Removed

n/a

scope-controller Updated

1.41.14

secret-controller Updated

1.41.14

storage-discovery Updated

1.41.14

user-controller Updated

1.41.14

IAM Updated

iam

1.41.14

mariadb

10.6.17-focal-20240909113408

mcc-keycloak Updated

25.0.6-20240926140203

OpenStack Updated

host-os-modules-controller

1.41.14

openstack-cluster-api-controller

1.41.14

openstack-provider

1.41.14

os-credentials-controller Removed

n/a

Artifacts

This section lists the artifacts of components included in the Container Cloud release 2.28.0. The components that are newly added, updated, deprecated, or removed as compared to 2.27.0 are marked with a corresponding superscript, for example, admission-controller Updated.

Bare metal artifacts

Artifact

Component

Path

Binaries Updated

ironic-python-agent.initramfs

https://binary.mirantis.com/bm/bin/ironic/ipa/ubuntu/initramfs-yoga-focal-debug-20240911112529

ironic-python-agent.kernel

https://binary.mirantis.com/bm/bin/ironic/ipa/ubuntu/kernel-yoga-focal-debug-20240911112529

Helm charts Updated

baremetal-api

https://binary.mirantis.com/core/helm/baremetal-api-1.41.14.tgz

baremetal-operator

https://binary.mirantis.com/core/helm/baremetal-operator-1.41.14.tgz

baremetal-provider

https://binary.mirantis.com/core/helm/baremetal-provider-1.41.14.tgz

baremetal-public-api

https://binary.mirantis.com/core/helm/baremetal-public-api-1.41.14.tgz

kaas-ipam

https://binary.mirantis.com/core/helm/kaas-ipam-1.41.14.tgz

metallb

https://binary.mirantis.com/core/helm/metallb-1.41.14.tgz

Docker images

ambasador Updated

mirantis.azurecr.io/core/external/nginx:1.41.14

baremetal-dnsmasq Updated

mirantis.azurecr.io/bm/baremetal-dnsmasq:base-2-28-alpine-20240906160120

baremetal-operator Updated

mirantis.azurecr.io/bm/baremetal-operator:base-2-28-alpine-20240910093836

bm-collective Updated

mirantis.azurecr.io/bm/bm-collective:base-2-28-alpine-20240910093747

cluster-api-provider-baremetal Updated

mirantis.azurecr.io/core/cluster-api-provider-baremetal:1.41.14

ironic Updated

mirantis.azurecr.io/openstack/ironic:antelope-jammy-20240716113922

ironic-inspector Updated

mirantis.azurecr.io/openstack/ironic-inspector:antelope-jammy-20240716113922

ironic-prometheus-exporter Updated

mirantis.azurecr.io/stacklight/ironic-prometheus-exporter:0.1-20240819102310

kaas-ipam Updated

mirantis.azurecr.io/bm/kaas-ipam:base-2-28-alpine-20240910095249

kubernetes-entrypoint Updated

mirantis.azurecr.io/openstack/extra/kubernetes-entrypoint:v1.0.1-4e381cb-20240813170642

mariadb

mirantis.azurecr.io/general/mariadb:10.6.17-focal-20240523075821

mcc-keepalived Updated

mirantis.azurecr.io/lcm/mcc-keepalived:v0.26.0-95-g95f0130

metallb-controller Updated

mirantis.azurecr.io/bm/metallb/controller:v0.14.5-ed177720-amd64

metallb-speaker Updated

mirantis.azurecr.io/bm/metallb/speaker:v0.14.5-ed177720-amd64

syslog-ng Updated

mirantis.azurecr.io/bm/syslog-ng:base-alpine-20240906155734

Core artifacts

Artifact

Component

Path

Bootstrap tarball Updated

bootstrap-darwin

https://binary.mirantis.com/core/bin/bootstrap-darwin-1.41.14.tgz

bootstrap-linux

https://binary.mirantis.com/core/bin/bootstrap-linux-1.41.14.tgz

Helm charts Updated

admission-controller

https://binary.mirantis.com/core/helm/admission-controller-1.41.14.tgz

agent-controller

https://binary.mirantis.com/core/helm/agent-controller-1.41.14.tgz

ceph-kcc-controller

https://binary.mirantis.com/core/helm/ceph-kcc-controller-1.41.14.tgz

cert-manager

https://binary.mirantis.com/core/helm/cert-manager-1.41.14.tgz

configuration-collector

https://binary.mirantis.com/core/helm/configuration-collector-1.41.14.tgz

credentials-controller New

https://binary.mirantis.com/core/helm/credentials-controller-1.41.14.tgz

event-controller

https://binary.mirantis.com/core/helm/event-controller-1.41.14.tgz

host-os-modules-controller

https://binary.mirantis.com/core/helm/host-os-modules-controller-1.41.14.tgz

iam-controller

https://binary.mirantis.com/core/helm/iam-controller-1.41.14.tgz

kaas-exporter

https://binary.mirantis.com/core/helm/kaas-exporter-1.41.14.tgz

kaas-public-api

https://binary.mirantis.com/core/helm/kaas-public-api-1.41.14.tgz

kaas-ui

https://binary.mirantis.com/core/helm/kaas-ui-1.41.14.tgz

lcm-controller

https://binary.mirantis.com/core/helm/lcm-controller-1.41.14.tgz

license-controller

https://binary.mirantis.com/core/helm/license-controller-1.41.14.tgz

machinepool-controller

https://binary.mirantis.com/core/helm/machinepool-controller-1.41.14.tgz

mcc-cache

https://binary.mirantis.com/core/helm/mcc-cache-1.41.14.tgz

mcc-cache-warmup

https://binary.mirantis.com/core/helm/mcc-cache-warmup-1.41.14.tgz

openstack-provider

https://binary.mirantis.com/core/helm/openstack-provider-1.41.14.tgz

os-credentials-controller Removed

n/a

portforward-controller

https://binary.mirantis.com/core/helm/portforward-controller-1.41.14.tgz

rbac-controller

https://binary.mirantis.com/core/helm/rbac-controller-1.41.14.tgz

release-controller

https://binary.mirantis.com/core/helm/release-controller-1.41.14.tgz

scope-controller

https://binary.mirantis.com/core/helm/scope-controller-1.41.14.tgz

secret-controller

https://binary.mirantis.com/core/helm/secret-controller-1.41.14.tgz

user-controller

https://binary.mirantis.com/core/helm/user-controller-1.41.14.tgz

Docker images Updated

admission-controller

mirantis.azurecr.io/core/admission-controller:1.41.14

agent-controller

mirantis.azurecr.io/core/agent-controller:1.41.14

ceph-kcc-controller

mirantis.azurecr.io/core/ceph-kcc-controller:1.41.14

cert-manager-controller

mirantis.azurecr.io/core/external/cert-manager-controller:v1.11.0-8

configuration-collector

mirantis.azurecr.io/core/configuration-collector:1.41.14

credentials-controller New

mirantis.azurecr.io/core/credentials-controller:1.41.14

event-controller

mirantis.azurecr.io/core/event-controller:1.41.14

frontend

mirantis.azurecr.io/core/frontend:1.41.14

host-os-modules-controller

mirantis.azurecr.io/core/host-os-modules-controller:1.41.14

iam-controller

mirantis.azurecr.io/core/iam-controller:1.41.14

kaas-exporter

mirantis.azurecr.io/core/kaas-exporter:1.41.14

kproxy

mirantis.azurecr.io/core/kproxy:1.41.14

lcm-controller

mirantis.azurecr.io/core/lcm-controller:1.41.14

license-controller

mirantis.azurecr.io/core/license-controller:1.41.14

machinepool-controller

mirantis.azurecr.io/core/machinepool-controller:1.41.14

mcc-cache-warmup

mirantis.azurecr.io/core/mcc-cache-warmup:1.41.14

mcc-haproxy

mirantis.azurecr.io/lcm/mcc-haproxy:v0.26.0-95-g95f0130

mcc-keepalived

mirantis.azurecr.io/lcm/mcc-keepalived:v0.26.0-95-g95f0130

nginx

mirantis.azurecr.io/core/external/nginx:1.41.14

openstack-cluster-api-controller

mirantis.azurecr.io/core/openstack-cluster-api-controller:1.41.14

os-credentials-controller Removed

n/a

portforward-controller

mirantis.azurecr.io/core/portforward-controller:1.41.14

rbac-controller

mirantis.azurecr.io/core/rbac-controller:1.41.14

registry

mirantis.azurecr.io/lcm/registry:v2.8.1-13

release-controller

mirantis.azurecr.io/core/release-controller:1.41.14

scope-controller

mirantis.azurecr.io/core/scope-controller:1.41.14

secret-controller

mirantis.azurecr.io/core/secret-controller:1.41.14

user-controller

mirantis.azurecr.io/core/user-controller:1.41.14

IAM artifacts

Artifact

Component

Path

Binaries

hash-generate-darwin

http://binary.mirantis.com/iam/bin/hash-generate-0.0.1-268-3cf7f17-darwin

hash-generate-linux

http://binary.mirantis.com/iam/bin/hash-generate-0.0.1-268-3cf7f17-linux

Helm charts Updated

iam

https://binary.mirantis.com/core/helm/iam-1.41.14.tgz

Docker images

kubectl Updated

mirantis.azurecr.io/general/kubectl:20240926142019

kubernetes-entrypoint

mirantis.azurecr.io/openstack/extra/kubernetes-entrypoint:v1.0.1-ba8ada4-20240405150338

mariadb

mirantis.azurecr.io/general/mariadb:10.6.17-focal-20240909113408

mcc-keycloak Updated

mirantis.azurecr.io/iam/mcc-keycloak:25.0.6-20240926140203

Security notes

In total, 2614 Common Vulnerabilities and Exposures (CVE) have been fixed in 2.28.0 since Container Cloud 2.27.0: 299 of critical and 2315 of high severity.

The table below includes the total numbers of addressed unique and common vulnerabilities and exposures (CVE) by product component since the 2.27.4 patch release. The common CVEs are issues addressed across several images.

Addressed CVEs - summary

Product component

CVE type

Critical

High

Total

Ceph

Unique

0

5

5

Common

0

211

211

KaaS core

Unique

4

11

15

Common

10

315

325

StackLight

Unique

1

7

8

Common

1

25

26

Mirantis Security Portal

For the detailed list of fixed and existing CVEs across the Mirantis Container Cloud and MOSK products, refer to Mirantis Security Portal.

MOSK CVEs

For the number of fixed CVEs in the MOSK-related components including OpenStack and Tungsten Fabric, refer to MOSK 24.3: Security notes.

Update notes

This section describes the specific actions you as a cloud operator need to complete before or after your Container Cloud cluster update to the Cluster releases 17.3.0 or 16.3.0.

Consider this information as a supplement to the generic update procedures published in Operations Guide: Automatic upgrade of a management cluster and Update a managed cluster.

Pre-update actions
Change label values in Ceph metrics used in customizations

Note

If you do not use Ceph metrics in any customizations, for example, custom alerts, Grafana dashboards, or queries in custom workloads, skip this section.

In Container Cloud 2.27.0, the performance metric exporter integrated into the Ceph Manager daemon was deprecated in favor of the dedicated Ceph Exporter daemon. Therefore, if you use Ceph metrics in any customizations such as custom alerts, Grafana dashboards, or queries in custom tools, you may need to update the values of several labels in these metrics. These labels are changed in Container Cloud 2.28.0 (Cluster releases 16.3.0 and 17.3.0).

Note

Names of metrics are not changed, and no metrics are removed.

All Ceph metrics collected by the Ceph Exporter daemon changed their job and instance labels because the metrics are now scraped from the new Ceph Exporter daemon instead of the performance metric exporter of Ceph Manager:

  • Values of the job labels are changed from rook-ceph-mgr to prometheus-rook-exporter for all Ceph metrics moved to Ceph Exporter. The full list of moved metrics is presented below.

  • Values of the instance labels are changed from the metric endpoint of Ceph Manager with port 9283 to the metric endpoint of Ceph Exporter with port 9926 for all Ceph metrics moved to Ceph Exporter. The full list of moved metrics is presented below.

  • Values of the instance_id labels of Ceph metrics from the RADOS Gateway (RGW) daemons are changed from the daemon GID to the daemon subname. For example, instead of instance_id="<RGW_PROCESS_GID>", the instance_id="a" (ceph_rgw_qlen{instance_id="a"}) is now used. The list of moved Ceph RGW metrics is presented below.
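
For example, a custom query that selects a metric by the old label values would change as follows. The metric name ceph_osd_op is taken from the list below; the endpoint addresses are placeholders.

# Before Container Cloud 2.28.0: scraped from Ceph Manager on port 9283
ceph_osd_op{job="rook-ceph-mgr", instance="<ceph-mgr-endpoint>:9283"}

# Since Container Cloud 2.28.0: scraped from Ceph Exporter on port 9926
ceph_osd_op{job="prometheus-rook-exporter", instance="<ceph-exporter-endpoint>:9926"}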

List of affected Ceph RGW metrics
  • ceph_rgw_cache_.*

  • ceph_rgw_failed_req

  • ceph_rgw_gc_retire_object

  • ceph_rgw_get.*

  • ceph_rgw_keystone_.*

  • ceph_rgw_lc_.*

  • ceph_rgw_lua_.*

  • ceph_rgw_pubsub_.*

  • ceph_rgw_put.*

  • ceph_rgw_qactive

  • ceph_rgw_qlen

  • ceph_rgw_req

List of all metrics to be collected by Ceph Exporter instead of Ceph Manager
  • ceph_bluefs_.*

  • ceph_bluestore_.*

  • ceph_mds_cache_.*

  • ceph_mds_caps

  • ceph_mds_ceph_.*

  • ceph_mds_dir_.*

  • ceph_mds_exported_inodes

  • ceph_mds_forward

  • ceph_mds_handle_.*

  • ceph_mds_imported_inodes

  • ceph_mds_inodes.*

  • ceph_mds_load_cent

  • ceph_mds_log_.*

  • ceph_mds_mem_.*

  • ceph_mds_openino_dir_fetch

  • ceph_mds_process_request_cap_release

  • ceph_mds_reply_.*

  • ceph_mds_request

  • ceph_mds_root_.*

  • ceph_mds_server_.*

  • ceph_mds_sessions_.*

  • ceph_mds_slow_reply

  • ceph_mds_subtrees

  • ceph_mon_election_.*

  • ceph_mon_num_.*

  • ceph_mon_session_.*

  • ceph_objecter_.*

  • ceph_osd_numpg.*

  • ceph_osd_op.*

  • ceph_osd_recovery_.*

  • ceph_osd_stat_.*

  • ceph_paxos.*

  • ceph_prioritycache.*

  • ceph_purge.*

  • ceph_rgw_cache_.*

  • ceph_rgw_failed_req

  • ceph_rgw_gc_retire_object

  • ceph_rgw_get.*

  • ceph_rgw_keystone_.*

  • ceph_rgw_lc_.*

  • ceph_rgw_lua_.*

  • ceph_rgw_pubsub_.*

  • ceph_rgw_put.*

  • ceph_rgw_qactive

  • ceph_rgw_qlen

  • ceph_rgw_req

  • ceph_rocksdb_.*

Post-update actions
Manually disable collection of performance metrics by Ceph Manager (optional)

Since Container Cloud 2.28.0 (Cluster releases 17.3.0 and 16.3.0), Ceph cluster metrics are collected by the dedicated Ceph Exporter daemon. At the same time, the same metrics remain available for collection by the Ceph Manager daemon. To reduce the load on the Ceph Manager daemon, you can manually disable collection of the performance metrics by Ceph Manager because the Ceph Exporter daemon already collects them.

To disable performance metrics for the Ceph Manager daemon, add the following parameter to the KaaSCephCluster spec in the rookConfig section:

spec:
  cephClusterSpec:
    rookConfig:
      "mgr|mgr/prometheus/exclude_perf_counters": "true"

Once you add this option, Ceph performance metrics are collected by the Ceph Exporter daemon only. For more details, see Official Ceph documentation.

Upgrade to Ubuntu 22.04 on baremetal-based clusters

In Container Cloud 2.29.0, the Cluster release update of the Ubuntu 20.04-based managed clusters will become impossible, and Ubuntu 22.04 will become the only supported version of the operating system. Therefore, ensure that every node of your managed clusters is running Ubuntu 22.04 to unblock managed cluster update in Container Cloud 2.29.0.

Warning

Management cluster update to Container Cloud 2.29.1 will be blocked if at least one node of any related managed cluster is running Ubuntu 20.04.

Therefore, if your existing cluster runs nodes on Ubuntu 20.04, prevent blocking of your cluster update by upgrading all cluster nodes to Ubuntu 22.04 during the course of the Container Cloud 2.28.x series. For the update procedure, refer to Mirantis OpenStack for Kubernetes documentation: Bare metal operations - Upgrade an operating system distribution.

It is not mandatory to upgrade all machines at once. You can upgrade them one by one or in small batches, for example, if the maintenance window is limited in time.

Note

Existing management clusters were automatically updated to Ubuntu 22.04 during cluster upgrade to the Cluster release 16.2.0 in Container Cloud 2.27.0. Greenfield deployments of management clusters are also based on Ubuntu 22.04.

Warning

Usage of third-party software, which is not part of Mirantis-supported configurations, for example, the use of custom DPDK modules, may block upgrade of an operating system distribution. Users are fully responsible for ensuring the compatibility of such custom components with the latest supported Ubuntu version.

2.27.4

Note

For MOSK clusters, Container Cloud 2.27.4 is the second patch release of MOSK 24.2.x series using the patch Cluster release 17.2.4. For the update path of 24.1 and 24.2 series, see MOSK documentation: Cluster update scheme.

The Container Cloud patch release 2.27.4, which is based on the 2.27.0 major release, provides the following updates:

  • Support for the patch Cluster releases 16.2.4 and 17.2.4 that represents Mirantis OpenStack for Kubernetes (MOSK) patch release 24.2.2.

  • Bare metal: update of Ubuntu mirror from ubuntu-2024-08-06-014502 to ubuntu-2024-08-21-014714 along with update of the minor kernel version from 5.15.0-117-generic to 5.15.0-119-generic for Jammy and to 5.15.0-118-generic for Focal.

  • Security fixes for CVEs in images.

This patch release also supports the latest major Cluster releases 17.2.0 and 16.2.0. It does not support greenfield deployments based on deprecated Cluster releases. Use the latest available Cluster release instead.

For main deliverables of the parent Container Cloud release of 2.27.4, refer to 2.27.0.

Security notes

In total, since Container Cloud 2.27.3, 131 Common Vulnerabilities and Exposures (CVE) have been fixed in 2.27.4: 15 of critical and 116 of high severity.

The table below includes the total numbers of addressed unique and common CVEs in images by product component since Container Cloud 2.27.3. The common CVEs are issues addressed across several images.

Addressed CVEs - summary

Product component

CVE type

Critical

High

Total

Ceph

Unique

0

1

1

Common

0

3

3

KaaS core

Unique

3

19

22

Common

14

105

119

StackLight

Unique

1

8

9

Common

1

8

9

Mirantis Security Portal

For the detailed list of fixed and existing CVEs across the Mirantis Container Cloud and MOSK products, refer to Mirantis Security Portal.

MOSK CVEs

For the number of fixed CVEs in the MOSK-related components including OpenStack and Tungsten Fabric, refer to MOSK 24.2.2: Security notes.

Known issues

This section lists known issues with workarounds for the Mirantis Container Cloud release 2.27.4 including the Cluster releases 16.2.4 and 17.2.4.

For other issues that can occur while deploying and operating a Container Cloud cluster, see Deployment Guide: Troubleshooting and Operations Guide: Troubleshooting.

Note

This section also outlines still valid known issues from previous Container Cloud releases.

Bare metal
[47202] Inspection error on bare metal hosts after dnsmasq restart

Note

Moving forward, the workaround for this issue will be moved from Release Notes to MOSK Troubleshooting Guide: Inspection error on bare metal hosts after dnsmasq restart.

If the dnsmasq pod is restarted during the bootstrap of newly added nodes, those nodes may fail to undergo inspection. That can result in inspection error in the corresponding BareMetalHost objects.

The issue can occur when:

  • The dnsmasq pod was moved to another node.

  • DHCP subnets were changed, including addition or removal. In this case, the dhcpd container of the dnsmasq pod is restarted.

    Caution

    If you need to change or add DHCP subnets to bootstrap new nodes, wait until the dnsmasq pod becomes ready after the change, and only then create the BareMetalHost objects.

To verify whether the nodes are affected:

  1. Verify whether the BareMetalHost objects contain the inspection error:

    kubectl get bmh -n <managed-cluster-namespace-name>
    

    Example of system response:

    NAME            STATE         CONSUMER        ONLINE   ERROR              AGE
    test-master-1   provisioned   test-master-1   true                        9d
    test-master-2   provisioned   test-master-2   true                        9d
    test-master-3   provisioned   test-master-3   true                        9d
    test-worker-1   provisioned   test-worker-1   true                        9d
    test-worker-2   provisioned   test-worker-2   true                        9d
    test-worker-3   inspecting                    true     inspection error   19h
    
  2. Verify whether the dnsmasq pod was in Ready state when the inspection of the affected baremetal hosts (test-worker-3 in the example above) was started:

    kubectl -n kaas get pod <dnsmasq-pod-name> -oyaml
    

    Example of system response:

    ...
    status:
      conditions:
      - lastProbeTime: null
        lastTransitionTime: "2024-10-10T15:37:34Z"
        status: "True"
        type: Initialized
      - lastProbeTime: null
        lastTransitionTime: "2024-10-11T07:38:54Z"
        status: "True"
        type: Ready
      - lastProbeTime: null
        lastTransitionTime: "2024-10-11T07:38:54Z"
        status: "True"
        type: ContainersReady
      - lastProbeTime: null
        lastTransitionTime: "2024-10-10T15:37:34Z"
        status: "True"
        type: PodScheduled
      containerStatuses:
      - containerID: containerd://6dbcf2fc4b36ce4c549c9191ab01f72d0236c51d42947675302675e4bfaf4cdf
        image: docker-dev-kaas-virtual.artifactory-eu.mcp.mirantis.net/bm/baremetal-dnsmasq:base-2-28-alpine-20240812132650
        imageID: docker-dev-kaas-virtual.artifactory-eu.mcp.mirantis.net/bm/baremetal-dnsmasq@sha256:3dad3e278add18e69b2608e462691c4823942641a0f0e25e6811e703e3c23b3b
        lastState:
          terminated:
            containerID: containerd://816fcf079cd544acd74e312065de5b5ed4dbf1dc6159fefffff4f644b5e45987
            exitCode: 0
            finishedAt: "2024-10-11T07:38:35Z"
            reason: Completed
            startedAt: "2024-10-10T15:37:45Z"
        name: dhcpd
        ready: true
        restartCount: 2
        started: true
        state:
          running:
            startedAt: "2024-10-11T07:38:37Z"
      ...
    

    In the system response above, the dhcpd container was not ready between "2024-10-11T07:38:35Z" and "2024-10-11T07:38:54Z".

  3. Verify the affected baremetal host. For example:

    kubectl get bmh -n managed-ns test-worker-3 -oyaml
    

    Example of system response:

    ...
    status:
      errorCount: 15
      errorMessage: Introspection timeout
      errorType: inspection error
      ...
      operationHistory:
        deprovision:
          end: null
          start: null
        inspect:
          end: null
          start: "2024-10-11T07:38:19Z"
        provision:
          end: null
          start: null
        register:
          end: "2024-10-11T07:38:19Z"
          start: "2024-10-11T07:37:25Z"
    

    In the system response above, inspection was started at "2024-10-11T07:38:19Z", immediately before the period of the dhcpd container downtime. Therefore, this node is most likely affected by the issue.

Workaround

  1. Reboot the node using the IPMI reset or cycle command, for example, with ipmitool as shown in the sketch after this procedure.

  2. If the node fails to boot, remove the failed BareMetalHost object and create it again:

    1. Remove BareMetalHost object. For example:

      kubectl delete bmh -n managed-ns test-worker-3
      
    2. Verify that the BareMetalHost object is removed:

      kubectl get bmh -n managed-ns test-worker-3
      
    3. Create a BareMetalHost object from the template. For example:

      kubectl create -f bmhc-test-worker-3.yaml
      kubectl create -f bmh-test-worker-3.yaml
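
For step 1 of the workaround, you can issue the IPMI power cycle with any IPMI client, for example, ipmitool. The following is a minimal sketch; the BMC address and credentials are placeholders specific to your environment:

# Power cycle the affected host through its BMC; use "chassis power reset" for a hard reset instead
ipmitool -I lanplus -H <bmc-address> -U <bmc-user> -P <bmc-password> chassis power cycle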
      
[46245] Lack of access permissions for HOC and HOCM objects

Fixed in 2.28.0 (17.3.0 and 16.3.0)

When trying to list the HostOSConfigurationModules and HostOSConfiguration custom resources, the serviceuser account or a user with the global-admin or operator role obtains an access denied error. For example:

kubectl --kubeconfig ~/.kube/mgmt-config get hocm

Error from server (Forbidden): hostosconfigurationmodules.kaas.mirantis.com is forbidden:
User "2d74348b-5669-4c65-af31-6c05dbedac5f" cannot list resource "hostosconfigurationmodules"
in API group "kaas.mirantis.com" at the cluster scope: access denied

Workaround:

  1. Modify the global-admin role by adding a new entry with the following contents to the rules list:

    kubectl edit clusterroles kaas-global-admin
    
    - apiGroups: [kaas.mirantis.com]
      resources: [hostosconfigurationmodules]
      verbs: ['*']
    
  2. For each Container Cloud project, modify the kaas-operator role by adding a new entry with the following contents to the rules list:

    kubectl -n <projectName> edit roles kaas-operator
    
    - apiGroups: [kaas.mirantis.com]
      resources: [hostosconfigurations]
      verbs: ['*']
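
If you prefer a non-interactive alternative to kubectl edit, the same entries can be appended using kubectl patch. The following is a minimal sketch under the same assumptions as the steps above; <projectName> is a placeholder for the Container Cloud project name:

# Append the hostosconfigurationmodules rule to the kaas-global-admin cluster role
kubectl patch clusterrole kaas-global-admin --type=json \
  -p='[{"op":"add","path":"/rules/-","value":{"apiGroups":["kaas.mirantis.com"],"resources":["hostosconfigurationmodules"],"verbs":["*"]}}]'

# Append the hostosconfigurations rule to the kaas-operator role in each project
kubectl -n <projectName> patch role kaas-operator --type=json \
  -p='[{"op":"add","path":"/rules/-","value":{"apiGroups":["kaas.mirantis.com"],"resources":["hostosconfigurations"],"verbs":["*"]}}]'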
    
[42386] A load balancer service does not obtain the external IP address

Due to the MetalLB upstream issue, a load balancer service may not obtain the external IP address.

The issue occurs when two services share the same external IP address and have the same externalTrafficPolicy value. Initially, both services have the external IP address assigned and are accessible. After you modify the externalTrafficPolicy value of both services from Cluster to Local, the service that was changed first loses its external IP address, while the service that was changed later keeps the external IP assigned as expected.

To work around the issue, make a dummy change to the service object whose external IP is <pending>:

  1. Identify the service that is stuck:

    kubectl get svc -A | grep pending
    

    Example of system response:

    stacklight  iam-proxy-prometheus  LoadBalancer  10.233.28.196  <pending>  443:30430/TCP
    
  2. Add an arbitrary label to the service that is stuck. For example:

    kubectl label svc -n stacklight iam-proxy-prometheus reconcile=1
    

    Example of system response:

    service/iam-proxy-prometheus labeled
    
  3. Verify that the external IP was allocated to the service:

    kubectl get svc -n stacklight iam-proxy-prometheus
    

    Example of system response:

    NAME                  TYPE          CLUSTER-IP     EXTERNAL-IP  PORT(S)        AGE
    iam-proxy-prometheus  LoadBalancer  10.233.28.196  10.0.34.108  443:30430/TCP  12d
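
Optionally, once the external IP address is assigned, you can remove the temporary label that was added in step 2. For example:

kubectl label svc -n stacklight iam-proxy-prometheus reconcile-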
    
[41305] DHCP responses are lost between dnsmasq and dhcp-relay pods

Fixed in 2.28.0 (17.3.0 and 16.3.0)

After node maintenance of a management cluster, the newly added nodes may fail to undergo provisioning successfully. The issue relates to new nodes that are in the same L2 domain as the management cluster.

The issue was observed in environments where the management cluster nodes are configured with a single L2 segment that carries all network traffic (PXE and LCM/management networks).

To verify whether the cluster is affected:

Verify whether the dnsmasq and dhcp-relay pods run on the same node in the management cluster:

kubectl -n kaas get pods -o wide| grep -e "dhcp\|dnsmasq"

Example of system response:

dhcp-relay-7d85f75f76-5vdw2   2/2   Running   2 (36h ago)   36h   10.10.0.122     kaas-node-8a24b81c-76d0-4d4c-8421-962bd39df5ad   <none>   <none>
dnsmasq-8f4b484b4-slhbd       5/5   Running   1 (36h ago)   36h   10.233.123.75   kaas-node-8a24b81c-76d0-4d4c-8421-962bd39df5ad   <none>   <none>

If this is the case, proceed to the workaround below.

Workaround:

  1. Log in to a node that contains kubeconfig of the affected management cluster.

  2. Make sure that at least two management cluster nodes are schedulable:

    kubectl get node
    

    Example of a positive system response:

    NAME                                             STATUS   ROLES    AGE   VERSION
    kaas-node-bcedb87b-b3ce-46a4-a4ca-ea3068689e40   Ready    master   37h   v1.27.10-mirantis-1
    kaas-node-8a24b81c-76d0-4d4c-8421-962bd39df5ad   Ready    master   37h   v1.27.10-mirantis-1
    kaas-node-ad5a6f51-b98f-43c3-91d5-55fed3d0ff21   Ready    master   37h   v1.27.10-mirantis-1
    
  3. Delete the dhcp-relay pod:

    kubectl -n kaas delete pod <dhcp-relay-xxxxx>
    
  4. Verify that the dnsmasq and dhcp-relay pods are scheduled into different nodes:

    kubectl -n kaas get pods -o wide| grep -e "dhcp\|dnsmasq"
    

    Example of a positive system response:

    dhcp-relay-7d85f75f76-rkv03   2/2   Running   0             49s   10.10.0.121     kaas-node-bcedb87b-b3ce-46a4-a4ca-ea3068689e40   <none>   <none>
    dnsmasq-8f4b484b4-slhbd       5/5   Running   1 (37h ago)   37h   10.233.123.75   kaas-node-8a24b81c-76d0-4d4c-8421-962bd39df5ad   <none>   <none>
    
[24005] Deletion of a node with ironic Pod is stuck in the Terminating state

During deletion of a manager machine running the ironic Pod from a bare metal management cluster, the following problems occur:

  • All Pods are stuck in the Terminating state

  • A new ironic Pod fails to start

  • The related bare metal host is stuck in the deprovisioning state

As a workaround, before deletion of the node running the ironic Pod, cordon and drain the node using the kubectl cordon <nodeName> and kubectl drain <nodeName> commands.


LCM
[39437] Failure to replace a master node on a Container Cloud cluster

Fixed in 2.29.0 (17.4.0 and 16.4.0)

During the replacement of a master node on a cluster of any type, the process may get stuck with the Kubelet's NodeReady condition is Unknown message in the machine status on the remaining master nodes.

As a workaround, log in on the affected node and run the following command:

docker restart ucp-kubelet
[31186,34132] Pods get stuck during MariaDB operations

During MariaDB operations on a management cluster, Pods may get stuck in continuous restarts with the following example error:

[ERROR] WSREP: Corrupt buffer header: \
addr: 0x7faec6f8e518, \
seqno: 3185219421952815104, \
size: 909455917, \
ctx: 0x557094f65038, \
flags: 11577. store: 49, \
type: 49

Workaround:

  1. Create a backup of the /var/lib/mysql directory on the mariadb-server Pod.

  2. Verify that other replicas are up and ready.

  3. Remove the galera.cache file for the affected mariadb-server Pod.

  4. Remove the affected mariadb-server Pod or wait until it is automatically restarted.

After Kubernetes restarts the Pod, the Pod clones the database in 1-2 minutes and restores the quorum.
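
The following is a minimal sketch of the workaround steps, assuming that the affected replica is the mariadb-server-0 Pod in the kaas namespace of the management cluster; adjust the Pod name, namespace, and backup destination to your environment:

# Step 1: back up the data directory of the affected replica to the local machine
kubectl cp kaas/mariadb-server-0:/var/lib/mysql ./mysql-backup

# Step 2: verify that the other replicas are up and ready
kubectl -n kaas get pods | grep mariadb-server

# Step 3: remove the galera.cache file of the affected replica
kubectl -n kaas exec mariadb-server-0 -- rm -f /var/lib/mysql/galera.cache

# Step 4: remove the affected Pod so that Kubernetes recreates it
kubectl -n kaas delete pod mariadb-server-0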

[30294] Replacement of a master node is stuck on the calico-node Pod start

Fixed in 2.28.4 (17.3.4 and 16.3.4)

During replacement of a master node on a cluster of any type, the calico-node Pod fails to start on a new node that has the same IP address as the node being replaced.

Workaround:

  1. Log in to any master node.

  2. From a CLI with an MKE client bundle, create a shell alias to start calicoctl using the mirantis/ucp-dsinfo image. Two alias variants are provided below; they differ only in where the etcd certificates are mounted (the ucp-kv-certs Docker volume data directory versus the ucp-node-certs volume). Use the variant that matches your cluster configuration:

    alias calicoctl="\
    docker run -i --rm \
    --pid host \
    --net host \
    -e constraint:ostype==linux \
    -e ETCD_ENDPOINTS=<etcdEndpoint> \
    -e ETCD_KEY_FILE=/var/lib/docker/volumes/ucp-kv-certs/_data/key.pem \
    -e ETCD_CA_CERT_FILE=/var/lib/docker/volumes/ucp-kv-certs/_data/ca.pem \
    -e ETCD_CERT_FILE=/var/lib/docker/volumes/ucp-kv-certs/_data/cert.pem \
    -v /var/run/calico:/var/run/calico \
    -v /var/lib/docker/volumes/ucp-kv-certs/_data:/var/lib/docker/volumes/ucp-kv-certs/_data:ro \
    mirantis/ucp-dsinfo:<mkeVersion> \
    calicoctl \
    "
    
    alias calicoctl="\
    docker run -i --rm \
    --pid host \
    --net host \
    -e constraint:ostype==linux \
    -e ETCD_ENDPOINTS=<etcdEndpoint> \
    -e ETCD_KEY_FILE=/ucp-node-certs/key.pem \
    -e ETCD_CA_CERT_FILE=/ucp-node-certs/ca.pem \
    -e ETCD_CERT_FILE=/ucp-node-certs/cert.pem \
    -v /var/run/calico:/var/run/calico \
    -v ucp-node-certs:/ucp-node-certs:ro \
    mirantis/ucp-dsinfo:<mkeVersion> \
    calicoctl --allow-version-mismatch \
    "
    

    In the above command, replace the following values with the corresponding settings of the affected cluster:

    • <etcdEndpoint> is the etcd endpoint defined in the Calico configuration file. For example, ETCD_ENDPOINTS=127.0.0.1:12378

    • <mkeVersion> is the MKE version installed on your cluster. For example, mirantis/ucp-dsinfo:3.5.7.

  3. Verify the node list on the cluster:

    kubectl get node
    
  4. Compare this list with the node list in Calico to identify the old node:

    calicoctl get node -o wide
    
  5. Remove the old node from Calico:

    calicoctl delete node kaas-node-<nodeID>
    
[5782] Manager machine fails to be deployed during node replacement

Fixed in 2.28.4 (17.3.4 and 16.3.4)

During replacement of a manager machine, the following problems may occur:

  • The system adds the node to Docker swarm but not to Kubernetes

  • The node Deployment gets stuck with failed RethinkDB health checks

Workaround:

  1. Delete the failed node.

  2. Wait for the MKE cluster to become healthy. To monitor the cluster status:

    1. Log in to the MKE web UI as described in Connect to the Mirantis Kubernetes Engine web UI.

    2. Monitor the cluster status as described in MKE Operations Guide: Monitor an MKE cluster with the MKE web UI.

  3. Deploy a new node.

[5568] The calico-kube-controllers Pod fails to clean up resources

Fixed in 2.28.4 (17.3.4 and 16.3.4)

During the unsafe or forced deletion of a manager machine running the calico-kube-controllers Pod in the kube-system namespace, the following issues occur:

  • The calico-kube-controllers Pod fails to clean up resources associated with the deleted node

  • The calico-node Pod may fail to start up on a newly created node if the machine is provisioned with the same IP address as the deleted machine had

As a workaround, before deletion of the node running the calico-kube-controllers Pod, cordon and drain the node:

kubectl cordon <nodeName>
kubectl drain <nodeName>

Ceph
[50566] Ceph upgrade is very slow during patch or major cluster update

Due to the upstream Ceph issue 66717, during a CVE-related upgrade of the Ceph daemon image of Ceph Reef 18.2.4, OSDs may start slowly and even fail the startup probe, producing output similar to the following in the describe output of the rook-ceph-osd-X pod:

 Warning  Unhealthy  57s (x16 over 3m27s)  kubelet  Startup probe failed:
 ceph daemon health check failed with the following output:
> no valid command found; 10 closest matches:
> 0
> 1
> 2
> abort
> assert
> bluefs debug_inject_read_zeros
> bluefs files list
> bluefs stats
> bluestore bluefs device info [<alloc_size:int>]
> config diff
> admin_socket: invalid command

Workaround:

Complete the following steps during every patch or major cluster update of the Cluster releases 17.2.x, 17.3.x, and 17.4.x (until Ceph 18.2.5 becomes supported):

  1. Plan extra time in the maintenance window for the patch cluster update.

    Slow starts still impact the update procedure, but after you complete the following step, the recovery process becomes noticeably shorter without affecting the overall cluster state and data responsiveness.

  2. Select one of the following options:

    • Before the cluster update, set the noout flag:

      ceph osd set noout
      

      Once the Ceph OSDs image upgrade is done, unset the flag:

      ceph osd unset noout
      
    • Monitor the Ceph OSDs image upgrade. If the symptoms of slow start appear, set the noout flag as soon as possible. Once the Ceph OSDs image upgrade is done, unset the flag.
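
To confirm whether the noout flag is currently set or cleared, you can inspect the cluster flags. For example:

# The flags line of the OSD map lists noout while the flag is set
ceph osd dump | grep flags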

[26441] Cluster update fails with the MountDevice failed for volume warning

Update of a managed cluster that is based on bare metal and has Ceph enabled fails with a PersistentVolumeClaim stuck in the Pending state for the prometheus-server StatefulSet and the MountVolume.MountDevice failed for volume warning in the StackLight event logs.

Workaround:

  1. Verify that the descriptions of the Pods that failed to run contain the FailedMount events:

    kubectl -n <affectedProjectName> describe pod <affectedPodName>
    

    In the command above, replace the following values:

    • <affectedProjectName> is the Container Cloud project name where the Pods failed to run

    • <affectedPodName> is a Pod name that failed to run in the specified project

    In the Pod description, identify the node name where the Pod failed to run.

  2. Verify that the csi-rbdplugin logs of the affected node contain the rbd volume mount failed: <csi-vol-uuid> is being used error. The <csi-vol-uuid> is a unique RBD volume name.

    1. Identify csiPodName of the corresponding csi-rbdplugin:

      kubectl -n rook-ceph get pod -l app=csi-rbdplugin \
      -o jsonpath='{.items[?(@.spec.nodeName == "<nodeName>")].metadata.name}'
      
    2. Output the affected csiPodName logs:

      kubectl -n rook-ceph logs <csiPodName> -c csi-rbdplugin
      
  3. Scale down the affected StatefulSet or Deployment of the Pod that fails to 0 replicas, as shown in the sketch after this procedure.

  4. On every csi-rbdplugin Pod, search for stuck csi-vol:

    for pod in `kubectl -n rook-ceph get pods|grep rbdplugin|grep -v provisioner|awk '{print $1}'`; do
      echo $pod
      kubectl exec -it -n rook-ceph $pod -c csi-rbdplugin -- rbd device list | grep <csi-vol-uuid>
    done
    
  5. Unmap the affected csi-vol:

    rbd unmap -o force /dev/rbd<i>
    

    The /dev/rbd<i> value is a mapped RBD volume that uses csi-vol.

  6. Delete volumeattachment of the affected Pod:

    kubectl get volumeattachments | grep <csi-vol-uuid>
    kubectl delete volumeattachment <id>
    
  7. Scale up the affected StatefulSet or Deployment back to the original number of replicas and wait until its state becomes Running.
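
For steps 3 and 7, the following is a minimal sketch assuming that the affected object is the prometheus-server StatefulSet in the stacklight namespace and that it originally had one replica; adjust the object kind, name, namespace, and replica count to your environment:

# Step 3: scale the affected workload down to 0 replicas
kubectl -n stacklight scale statefulset prometheus-server --replicas=0

# Perform steps 4-6, then scale back up to the original replica count (step 7)
kubectl -n stacklight scale statefulset prometheus-server --replicas=1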

Container Cloud web UI
[50181] Failure to deploy a compact cluster

A compact MOSK cluster fails to be deployed through the Container Cloud web UI because the web UI does not allow adding labels to the control plane machines or changing the dedicatedControlPlane: false setting.

To work around the issue, manually add the required labels using CLI. Once done, the cluster deployment resumes.

[50168] Inability to use a new project right after creation

A newly created project does not display all available tabs in the Container Cloud web UI and returns various access denied errors during the first five minutes after creation.

To work around the issue, wait five minutes after the project creation and refresh the browser.

Patch cluster update
[49713] Patch update is stuck with some nodes in Prepare state

Patch update from 2.27.3 to 2.27.4 may get stuck with one or more management cluster nodes remaining in the Prepare state and with the following error in the lcm-controller logs on the management cluster:

failed to create cluster updater for cluster default/kaas-mgmt:
machine update group not found for machine default/master-0

To work around the issue, in the LCMMachine objects of the management cluster, set the following annotation:

lcm.mirantis.com/update-group: <mgmt cluster name>-controlplane
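
For example, the following is a minimal sketch of applying the annotation with kubectl, assuming that the management cluster is default/kaas-mgmt and the affected machine is default/master-0, as in the error example above; repeat the annotate command for every LCMMachine object stuck in the Prepare state:

# List the LCMMachine objects of the management cluster
kubectl -n default get lcmmachines

# Apply the update-group annotation to the affected LCMMachine object
kubectl -n default annotate lcmmachine master-0 \
  lcm.mirantis.com/update-group=kaas-mgmt-controlplane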

Once done, patch update of the cluster resumes automatically.

Artifacts

This section lists the artifacts of components included in the Container Cloud patch release 2.27.4. For artifacts of the Cluster releases introduced in 2.27.4, see patch Cluster releases 16.2.4 and 17.2.4.

Note

The components that are newly added, updated, deprecated, or removed as compared to the previous release version, are marked with a corresponding superscript, for example, lcm-ansible Updated.

Bare metal artifacts

Artifact

Component

Path

Binaries Updated

ironic-python-agent.initramfs

https://binary.mirantis.com/bm/bin/ironic/ipa/ubuntu/initramfs-yoga-focal-debug-20240821131059

ironic-python-agent.kernel

https://binary.mirantis.com/bm/bin/ironic/ipa/ubuntu/kernel-yoga-focal-debug-20240821131059

Helm charts Updated

baremetal-api

https://binary.mirantis.com/core/helm/baremetal-api-1.40.23.tgz

baremetal-operator

https://binary.mirantis.com/core/helm/baremetal-operator-1.40.23.tgz

baremetal-provider

https://binary.mirantis.com/core/helm/baremetal-provider-1.40.23.tgz

baremetal-public-api

https://binary.mirantis.com/core/helm/baremetal-public-api-1.40.23.tgz

kaas-ipam

https://binary.mirantis.com/core/helm/kaas-ipam-1.40.23.tgz

metallb

https://binary.mirantis.com/core/helm/metallb-1.40.23.tgz

Docker images

ambasador Updated

mirantis.azurecr.io/core/external/nginx:1.40.23

baremetal-dnsmasq

mirantis.azurecr.io/bm/baremetal-dnsmasq:base-2-27-alpine-20240806125028

baremetal-operator Updated

mirantis.azurecr.io/bm/baremetal-operator:base-2-27-alpine-20240827132225

bm-collective

mirantis.azurecr.io/bm/bm-collective:base-2-27-alpine-20240812135414

cluster-api-provider-baremetal Updated

mirantis.azurecr.io/core/cluster-api-provider-baremetal:1.40.23

ironic

mirantis.azurecr.io/openstack/ironic:antelope-jammy-20240716113922

ironic-inspector

mirantis.azurecr.io/openstack/ironic-inspector:antelope-jammy-20240716113922

ironic-prometheus-exporter Updated

mirantis.azurecr.io/stacklight/ironic-prometheus-exporter:0.1-20240819102310

kaas-ipam

mirantis.azurecr.io/bm/kaas-ipam:base-2-27-alpine-20240812140336

kubernetes-entrypoint Updated

mirantis.azurecr.io/openstack/extra/kubernetes-entrypoint:v1.0.1-4e381cb-20240813170642

mariadb

mirantis.azurecr.io/general/mariadb:10.6.17-focal-20240523075821

mcc-keepalived

mirantis.azurecr.io/lcm/mcc-keepalived:v0.25.0-42-g8710cbe

metallb-controller

mirantis.azurecr.io/bm/metallb/controller:v0.14.5-dfbd1a68-amd64

metallb-speaker

mirantis.azurecr.io/bm/metallb/speaker:v0.14.5-dfbd1a68-amd64

syslog-ng

mirantis.azurecr.io/bm/syslog-ng:base-alpine-20240806124545

Core artifacts

Artifact

Component

Path

Bootstrap tarball Updated

bootstrap-darwin

https://binary.mirantis.com/core/bin/bootstrap-darwin-1.40.23.tgz

bootstrap-linux

https://binary.mirantis.com/core/bin/bootstrap-linux-1.40.23.tgz

Helm charts Updated

admission-controller

https://binary.mirantis.com/core/helm/admission-controller-1.40.23.tgz

agent-controller

https://binary.mirantis.com/core/helm/agent-controller-1.40.23.tgz

ceph-kcc-controller

https://binary.mirantis.com/core/helm/ceph-kcc-controller-1.40.23.tgz

cert-manager

https://binary.mirantis.com/core/helm/cert-manager-1.40.23.tgz

configuration-collector

https://binary.mirantis.com/core/helm/configuration-collector-1.40.23.tgz

event-controller

https://binary.mirantis.com/core/helm/event-controller-1.40.23.tgz

host-os-modules-controller

https://binary.mirantis.com/core/helm/host-os-modules-controller-1.40.23.tgz

iam-controller

https://binary.mirantis.com/core/helm/iam-controller-1.40.23.tgz

kaas-exporter

https://binary.mirantis.com/core/helm/kaas-exporter-1.40.23.tgz

kaas-public-api

https://binary.mirantis.com/core/helm/kaas-public-api-1.40.23.tgz

kaas-ui

https://binary.mirantis.com/core/helm/kaas-ui-1.40.23.tgz

lcm-controller

https://binary.mirantis.com/core/helm/lcm-controller-1.40.23.tgz

license-controller

https://binary.mirantis.com/core/helm/license-controller-1.40.23.tgz

machinepool-controller

https://binary.mirantis.com/core/helm/machinepool-controller-1.40.23.tgz

mcc-cache

https://binary.mirantis.com/core/helm/mcc-cache-1.40.23.tgz

mcc-cache-warmup

https://binary.mirantis.com/core/helm/mcc-cache-warmup-1.40.23.tgz

openstack-provider

https://binary.mirantis.com/core/helm/openstack-provider-1.40.23.tgz

os-credentials-controller

https://binary.mirantis.com/core/helm/os-credentials-controller-1.40.23.tgz

portforward-controller

https://binary.mirantis.com/core/helm/portforward-controller-1.40.23.tgz

rbac-controller

https://binary.mirantis.com/core/helm/rbac-controller-1.40.23.tgz

release-controller

https://binary.mirantis.com/core/helm/release-controller-1.40.23.tgz

scope-controller

https://binary.mirantis.com/core/helm/scope-controller-1.40.23.tgz

secret-controller

https://binary.mirantis.com/core/helm/secret-controller-1.40.23.tgz

storage-discovery

https://binary.mirantis.com/core/helm/storage-discovery-1.40.23.tgz

user-controller

https://binary.mirantis.com/core/helm/user-controller-1.40.23.tgz

Docker images

admission-controller Updated

mirantis.azurecr.io/core/admission-controller:1.40.23

agent-controller Updated

mirantis.azurecr.io/core/agent-controller:1.40.23

byo-cluster-api-controller Updated

mirantis.azurecr.io/core/byo-cluster-api-controller:1.40.23

ceph-kcc-controller Updated

mirantis.azurecr.io/core/ceph-kcc-controller:1.40.23

cert-manager-controller Updated

mirantis.azurecr.io/core/external/cert-manager-controller:v1.11.0-7

configuration-collector Updated

mirantis.azurecr.io/core/configuration-collector:1.40.23

event-controller Updated

mirantis.azurecr.io/core/event-controller:1.40.23

frontend Updated

mirantis.azurecr.io/core/frontend:1.40.23

host-os-modules-controller Updated

mirantis.azurecr.io/core/host-os-modules-controller:1.40.23

iam-controller Updated

mirantis.azurecr.io/core/iam-controller:1.40.23

kaas-exporter Updated

mirantis.azurecr.io/core/kaas-exporter:1.40.23

kproxy Updated

mirantis.azurecr.io/core/kproxy:1.40.23

lcm-controller Updated

mirantis.azurecr.io/core/lcm-controller:1.40.23

license-controller Updated

mirantis.azurecr.io/core/license-controller:1.40.23

machinepool-controller Updated

mirantis.azurecr.io/core/machinepool-controller:1.40.23

mcc-cache-warmup Updated

mirantis.azurecr.io/core/mcc-cache-warmup:1.40.23

mcc-haproxy

mirantis.azurecr.io/lcm/mcc-haproxy:v0.25.0-42-g8710cbe

mcc-keepalived

mirantis.azurecr.io/lcm/mcc-keepalived:v0.25.0-42-g8710cbe

nginx Updated

mirantis.azurecr.io/core/external/nginx:1.40.23

openstack-cluster-api-controller Updated

mirantis.azurecr.io/core/openstack-cluster-api-controller:1.40.23

os-credentials-controller Updated

mirantis.azurecr.io/core/os-credentials-controller:1.40.23

portforward-controller Updated

mirantis.azurecr.io/core/portforward-controller:1.40.23

rbac-controller Updated

mirantis.azurecr.io/core/rbac-controller:1.40.23

registry

mirantis.azurecr.io/lcm/registry:v2.8.1-11

release-controller Updated

mirantis.azurecr.io/core/release-controller:1.40.23

scope-controller Updated

mirantis.azurecr.io/core/scope-controller:1.40.23

secret-controller Updated

mirantis.azurecr.io/core/secret-controller:1.40.23

storage-discovery Updated

mirantis.azurecr.io/core/storage-discovery:1.40.23

user-controller Updated

mirantis.azurecr.io/core/user-controller:1.40.23

IAM artifacts

Artifact

Component

Path

Binaries

hash-generate-darwin

http://binary.mirantis.com/iam/bin/hash-generate-0.0.1-268-3cf7f17-darwin

hash-generate-linux

http://binary.mirantis.com/iam/bin/hash-generate-0.0.1-268-3cf7f17-linux

Helm charts Updated

iam

https://binary.mirantis.com/core/helm/iam-1.40.23.tgz

Docker images

kubectl

mirantis.azurecr.io/general/kubectl:20240711152257

mariadb

mirantis.azurecr.io/general/mariadb:10.6.17-focal-20240523075821

mcc-keycloak

mirantis.azurecr.io/iam/mcc-keycloak:24.0.5-20240802071408

2.27.3

Important

For MOSK clusters, Container Cloud 2.27.3 is the first patch release of the MOSK 24.2.x series and uses the patch Cluster release 17.2.3. For the update path of the 24.1 and 24.2 series, see MOSK documentation: Cluster update scheme.

The Container Cloud patch release 2.27.3, which is based on the 2.27.0 major release, provides the following updates:

  • Support for the patch Cluster releases 16.2.3 and 17.2.3 that represent the Mirantis OpenStack for Kubernetes (MOSK) patch release 24.2.1.

  • MKE:

    • Support for MKE 3.7.12.

    • Improvements in the MKE benchmark compliance (control ID 5.1.5): analyzed and fixed the majority of failed compliance checks for the following components:

      • Container Cloud: iam-keycloak in the kaas namespace and opensearch-dashboards in the stacklight namespace

      • MOSK: opensearch-dashboards in the stacklight namespace

  • Bare metal: update of Ubuntu mirror from ubuntu-2024-07-16-014744 to ubuntu-2024-08-06-014502 along with update of the minor kernel version from 5.15.0-116-generic to 5.15.0-117-generic.

  • VMware vSphere: suspension of support for cluster deployment, update, and attachment. For details, see Deprecation notes.

  • Security fixes for CVEs in images.

This patch release also supports the latest major Cluster releases 17.2.0 and 16.2.0. It does not support greenfield deployments based on deprecated Cluster releases; use the latest available Cluster release instead.

For main deliverables of the parent Container Cloud release of 2.27.3, refer to 2.27.0.

Security notes

In total, since Container Cloud 2.27.2, 1559 Common Vulnerabilities and Exposures (CVE) have been fixed in 2.27.3: 253 of critical and 1306 of high severity.

The table below includes the total numbers of addressed unique and common CVEs in images by product component since Container Cloud 2.27.2. The common CVEs are issues addressed across several images.

Addressed CVEs - summary

Product component   CVE type   Critical   High   Total

Ceph                Unique     3          14     17
Ceph                Common     142        736    878
Kaas core           Unique     4          22     26
Kaas core           Common     99         448    547
StackLight          Unique     7          51     58
StackLight          Common     12         122    134

Mirantis Security Portal

For the detailed list of fixed and existing CVEs across the Mirantis Container Cloud and MOSK products, refer to Mirantis Security Portal.

MOSK CVEs

For the number of fixed CVEs in the MOSK-related components including OpenStack and Tungsten Fabric, refer to MOSK 24.2.1: Security notes.

Known issues

This section lists known issues with workarounds for the Mirantis Container Cloud release 2.27.3 including the Cluster releases 16.2.3 and 17.2.3.

For other issues that can occur while deploying and operating a Container Cloud cluster, see Deployment Guide: Troubleshooting and Operations Guide: Troubleshooting.

Note

This section also outlines still valid known issues from previous Container Cloud releases.

Bare metal
[47202] Inspection error on bare metal hosts after dnsmasq restart

Note

Moving forward, the workaround for this issue will be moved from Release Notes to MOSK Troubleshooting Guide: Inspection error on bare metal hosts after dnsmasq restart.

If the dnsmasq pod is restarted during the bootstrap of newly added nodes, those nodes may fail to undergo inspection. That can result in inspection error in the corresponding BareMetalHost objects.

The issue can occur when:

  • The dnsmasq pod was moved to another node.

  • DHCP subnets were changed, including addition or removal. In this case, the dhcpd container of the dnsmasq pod is restarted.

    Caution

    If you need to change or add DHCP subnets to bootstrap new nodes, first apply the subnet changes, wait until the dnsmasq pod becomes ready, and only then create the BareMetalHost objects.

To verify whether the nodes are affected:

  1. Verify whether the BareMetalHost objects contain the inspection error:

    kubectl get bmh -n <managed-cluster-namespace-name>
    

    Example of system response:

    NAME            STATE         CONSUMER        ONLINE   ERROR              AGE
    test-master-1   provisioned   test-master-1   true                        9d
    test-master-2   provisioned   test-master-2   true                        9d
    test-master-3   provisioned   test-master-3   true                        9d
    test-worker-1   provisioned   test-worker-1   true                        9d
    test-worker-2   provisioned   test-worker-2   true                        9d
    test-worker-3   inspecting                    true     inspection error   19h
    
  2. Verify whether the dnsmasq pod was in Ready state when the inspection of the affected baremetal hosts (test-worker-3 in the example above) was started:

    kubectl -n kaas get pod <dnsmasq-pod-name> -oyaml
    

    Example of system response:

    ...
    status:
      conditions:
      - lastProbeTime: null
        lastTransitionTime: "2024-10-10T15:37:34Z"
        status: "True"
        type: Initialized
      - lastProbeTime: null
        lastTransitionTime: "2024-10-11T07:38:54Z"
        status: "True"
        type: Ready
      - lastProbeTime: null
        lastTransitionTime: "2024-10-11T07:38:54Z"
        status: "True"
        type: ContainersReady
      - lastProbeTime: null
        lastTransitionTime: "2024-10-10T15:37:34Z"
        status: "True"
        type: PodScheduled
      containerStatuses:
      - containerID: containerd://6dbcf2fc4b36ce4c549c9191ab01f72d0236c51d42947675302675e4bfaf4cdf
        image: docker-dev-kaas-virtual.artifactory-eu.mcp.mirantis.net/bm/baremetal-dnsmasq:base-2-28-alpine-20240812132650
        imageID: docker-dev-kaas-virtual.artifactory-eu.mcp.mirantis.net/bm/baremetal-dnsmasq@sha256:3dad3e278add18e69b2608e462691c4823942641a0f0e25e6811e703e3c23b3b
        lastState:
          terminated:
            containerID: containerd://816fcf079cd544acd74e312065de5b5ed4dbf1dc6159fefffff4f644b5e45987
            exitCode: 0
            finishedAt: "2024-10-11T07:38:35Z"
            reason: Completed
            startedAt: "2024-10-10T15:37:45Z"
        name: dhcpd
        ready: true
        restartCount: 2
        started: true
        state:
          running:
            startedAt: "2024-10-11T07:38:37Z"
      ...
    

    In the system response above, the dhcpd container was not ready between "2024-10-11T07:38:35Z" and "2024-10-11T07:38:54Z".

  3. Verify the affected baremetal host. For example:

    kubectl get bmh -n managed-ns test-worker-3 -oyaml
    

    Example of system response:

    ...
    status:
      errorCount: 15
      errorMessage: Introspection timeout
      errorType: inspection error
      ...
      operationHistory:
        deprovision:
          end: null
          start: null
        inspect:
          end: null
          start: "2024-10-11T07:38:19Z"
        provision:
          end: null
          start: null
        register:
          end: "2024-10-11T07:38:19Z"
          start: "2024-10-11T07:37:25Z"
    

    In the system response above, inspection was started at "2024-10-11T07:38:19Z", immediately before the period of the dhcpd container downtime. Therefore, this node is most likely affected by the issue.

Workaround

  1. Reboot the node using the IPMI reset or cycle command.

  2. If the node fails to boot, remove the failed BareMetalHost object and create it again:

    1. Remove BareMetalHost object. For example:

      kubectl delete bmh -n managed-ns test-worker-3
      
    2. Verify that the BareMetalHost object is removed:

      kubectl get bmh -n managed-ns test-worker-3
      
    3. Create a BareMetalHost object from the template. For example:

      kubectl create -f bmhc-test-worker-3.yaml
      kubectl create -f bmh-test-worker-3.yaml
      
[46245] Lack of access permissions for HOC and HOCM objects

Fixed in 2.28.0 (17.3.0 and 16.3.0)

When trying to list the HostOSConfigurationModules and HostOSConfiguration custom resources, the serviceuser account or a user with the global-admin or operator role obtains an access denied error. For example:

kubectl --kubeconfig ~/.kube/mgmt-config get hocm

Error from server (Forbidden): hostosconfigurationmodules.kaas.mirantis.com is forbidden:
User "2d74348b-5669-4c65-af31-6c05dbedac5f" cannot list resource "hostosconfigurationmodules"
in API group "kaas.mirantis.com" at the cluster scope: access denied

Workaround:

  1. Modify the global-admin role by adding a new entry with the following contents to the rules list:

    kubectl edit clusterroles kaas-global-admin
    
    - apiGroups: [kaas.mirantis.com]
      resources: [hostosconfigurationmodules]
      verbs: ['*']
    
  2. For each Container Cloud project, modify the kaas-operator role by adding a new entry with the following contents to the rules list:

    kubectl -n <projectName> edit roles kaas-operator
    
    - apiGroups: [kaas.mirantis.com]
      resources: [hostosconfigurations]
      verbs: ['*']
    
[42386] A load balancer service does not obtain the external IP address

Due to the MetalLB upstream issue, a load balancer service may not obtain the external IP address.

The issue occurs when two services share the same external IP address and have the same externalTrafficPolicy value. Initially, both services have the external IP address assigned and are accessible. After you modify the externalTrafficPolicy value of both services from Cluster to Local, the service that was changed first loses its external IP address, while the service that was changed later keeps the external IP assigned as expected.

To work around the issue, make a dummy change to the service object whose external IP is <pending>:

  1. Identify the service that is stuck:

    kubectl get svc -A | grep pending
    

    Example of system response:

    stacklight  iam-proxy-prometheus  LoadBalancer  10.233.28.196  <pending>  443:30430/TCP
    
  2. Add an arbitrary label to the service that is stuck. For example:

    kubectl label svc -n stacklight iam-proxy-prometheus reconcile=1
    

    Example of system response:

    service/iam-proxy-prometheus labeled
    
  3. Verify that the external IP was allocated to the service:

    kubectl get svc -n stacklight iam-proxy-prometheus
    

    Example of system response:

    NAME                  TYPE          CLUSTER-IP     EXTERNAL-IP  PORT(S)        AGE
    iam-proxy-prometheus  LoadBalancer  10.233.28.196  10.0.34.108  443:30430/TCP  12d
    
[41305] DHCP responses are lost between dnsmasq and dhcp-relay pods

Fixed in 2.28.0 (17.3.0 and 16.3.0)

After node maintenance of a management cluster, the newly added nodes may fail to undergo provisioning successfully. The issue relates to new nodes that are in the same L2 domain as the management cluster.

The issue was observed in environments where the management cluster nodes are configured with a single L2 segment that carries all network traffic (PXE and LCM/management networks).

To verify whether the cluster is affected:

Verify whether the dnsmasq and dhcp-relay pods run on the same node in the management cluster:

kubectl -n kaas get pods -o wide| grep -e "dhcp\|dnsmasq"

Example of system response:

dhcp-relay-7d85f75f76-5vdw2   2/2   Running   2 (36h ago)   36h   10.10.0.122     kaas-node-8a24b81c-76d0-4d4c-8421-962bd39df5ad   <none>   <none>
dnsmasq-8f4b484b4-slhbd       5/5   Running   1 (36h ago)   36h   10.233.123.75   kaas-node-8a24b81c-76d0-4d4c-8421-962bd39df5ad   <none>   <none>

If this is the case, proceed to the workaround below.

Workaround:

  1. Log in to a node that contains kubeconfig of the affected management cluster.

  2. Make sure that at least two management cluster nodes are schedulable:

    kubectl get node
    

    Example of a positive system response:

    NAME                                             STATUS   ROLES    AGE   VERSION
    kaas-node-bcedb87b-b3ce-46a4-a4ca-ea3068689e40   Ready    master   37h   v1.27.10-mirantis-1
    kaas-node-8a24b81c-76d0-4d4c-8421-962bd39df5ad   Ready    master   37h   v1.27.10-mirantis-1
    kaas-node-ad5a6f51-b98f-43c3-91d5-55fed3d0ff21   Ready    master   37h   v1.27.10-mirantis-1
    
  3. Delete the dhcp-relay pod:

    kubectl -n kaas delete pod <dhcp-relay-xxxxx>
    
  4. Verify that the dnsmasq and dhcp-relay pods are scheduled into different nodes:

    kubectl -n kaas get pods -o wide| grep -e "dhcp\|dnsmasq"
    

    Example of a positive system response:

    dhcp-relay-7d85f75f76-rkv03   2/2   Running   0             49s   10.10.0.121     kaas-node-bcedb87b-b3ce-46a4-a4ca-ea3068689e40   <none>   <none>
    dnsmasq-8f4b484b4-slhbd       5/5   Running   1 (37h ago)   37h   10.233.123.75   kaas-node-8a24b81c-76d0-4d4c-8421-962bd39df5ad   <none>   <none>
    
[24005] Deletion of a node with ironic Pod is stuck in the Terminating state

During deletion of a manager machine running the ironic Pod from a bare metal management cluster, the following problems occur:

  • All Pods are stuck in the Terminating state

  • A new ironic Pod fails to start

  • The related bare metal host is stuck in the deprovisioning state

As a workaround, before deletion of the node running the ironic Pod, cordon and drain the node using the kubectl cordon <nodeName> and kubectl drain <nodeName> commands.


LCM
[39437] Failure to replace a master node on a Container Cloud cluster

Fixed in 2.29.0 (17.4.0 and 16.4.0)

During the replacement of a master node on a cluster of any type, the process may get stuck with the Kubelet's NodeReady condition is Unknown message in the machine status on the remaining master nodes.

As a workaround, log in on the affected node and run the following command:

docker restart ucp-kubelet
[31186,34132] Pods get stuck during MariaDB operations

During MariaDB operations on a management cluster, Pods may get stuck in continuous restarts with the following example error:

[ERROR] WSREP: Corrupt buffer header: \
addr: 0x7faec6f8e518, \
seqno: 3185219421952815104, \
size: 909455917, \
ctx: 0x557094f65038, \
flags: 11577. store: 49, \
type: 49

Workaround:

  1. Create a backup of the /var/lib/mysql directory on the mariadb-server Pod.

  2. Verify that other replicas are up and ready.

  3. Remove the galera.cache file for the affected mariadb-server Pod.

  4. Remove the affected mariadb-server Pod or wait until it is automatically restarted.

After Kubernetes restarts the Pod, the Pod clones the database in 1-2 minutes and restores the quorum.

[30294] Replacement of a master node is stuck on the calico-node Pod start

Fixed in 2.28.4 (17.3.4 and 16.3.4)

During replacement of a master node on a cluster of any type, the calico-node Pod fails to start on a new node that has the same IP address as the node being replaced.

Workaround:

  1. Log in to any master node.

  2. From a CLI with an MKE client bundle, create a shell alias to start calicoctl using the mirantis/ucp-dsinfo image. Two alias variants are provided below; they differ only in where the etcd certificates are mounted (the ucp-kv-certs Docker volume data directory versus the ucp-node-certs volume). Use the variant that matches your cluster configuration:

    alias calicoctl="\
    docker run -i --rm \
    --pid host \
    --net host \
    -e constraint:ostype==linux \
    -e ETCD_ENDPOINTS=<etcdEndpoint> \
    -e ETCD_KEY_FILE=/var/lib/docker/volumes/ucp-kv-certs/_data/key.pem \
    -e ETCD_CA_CERT_FILE=/var/lib/docker/volumes/ucp-kv-certs/_data/ca.pem \
    -e ETCD_CERT_FILE=/var/lib/docker/volumes/ucp-kv-certs/_data/cert.pem \
    -v /var/run/calico:/var/run/calico \
    -v /var/lib/docker/volumes/ucp-kv-certs/_data:/var/lib/docker/volumes/ucp-kv-certs/_data:ro \
    mirantis/ucp-dsinfo:<mkeVersion> \
    calicoctl \
    "
    
    alias calicoctl="\
    docker run -i --rm \
    --pid host \
    --net host \
    -e constraint:ostype==linux \
    -e ETCD_ENDPOINTS=<etcdEndpoint> \
    -e ETCD_KEY_FILE=/ucp-node-certs/key.pem \
    -e ETCD_CA_CERT_FILE=/ucp-node-certs/ca.pem \
    -e ETCD_CERT_FILE=/ucp-node-certs/cert.pem \
    -v /var/run/calico:/var/run/calico \
    -v ucp-node-certs:/ucp-node-certs:ro \
    mirantis/ucp-dsinfo:<mkeVersion> \
    calicoctl --allow-version-mismatch \
    "
    

    In the above command, replace the following values with the corresponding settings of the affected cluster:

    • <etcdEndpoint> is the etcd endpoint defined in the Calico configuration file. For example, ETCD_ENDPOINTS=127.0.0.1:12378

    • <mkeVersion> is the MKE version installed on your cluster. For example, mirantis/ucp-dsinfo:3.5.7.

  3. Verify the node list on the cluster:

    kubectl get node
    
  4. Compare this list with the node list in Calico to identify the old node:

    calicoctl get node -o wide
    
  5. Remove the old node from Calico:

    calicoctl delete node kaas-node-<nodeID>
    
[5782] Manager machine fails to be deployed during node replacement

Fixed in 2.28.4 (17.3.4 and 16.3.4)

During replacement of a manager machine, the following problems may occur:

  • The system adds the node to Docker swarm but not to Kubernetes

  • The node Deployment gets stuck with failed RethinkDB health checks

Workaround:

  1. Delete the failed node.

  2. Wait for the MKE cluster to become healthy. To monitor the cluster status:

    1. Log in to the MKE web UI as described in Connect to the Mirantis Kubernetes Engine web UI.

    2. Monitor the cluster status as described in MKE Operations Guide: Monitor an MKE cluster with the MKE web UI.

  3. Deploy a new node.

[5568] The calico-kube-controllers Pod fails to clean up resources

Fixed in 2.28.4 (17.3.4 and 16.3.4)

During the unsafe or forced deletion of a manager machine running the calico-kube-controllers Pod in the kube-system namespace, the following issues occur:

  • The calico-kube-controllers Pod fails to clean up resources associated with the deleted node

  • The calico-node Pod may fail to start up on a newly created node if the machine is provisioned with the same IP address as the deleted machine had

As a workaround, before deletion of the node running the calico-kube-controllers Pod, cordon and drain the node:

kubectl cordon <nodeName>
kubectl drain <nodeName>

Ceph
[50566] Ceph upgrade is very slow during patch or major cluster update

Due to the upstream Ceph issue 66717, during a CVE-related upgrade of the Ceph daemon image of Ceph Reef 18.2.4, OSDs may start slowly and even fail the startup probe, producing output similar to the following in the describe output of the rook-ceph-osd-X pod:

 Warning  Unhealthy  57s (x16 over 3m27s)  kubelet  Startup probe failed:
 ceph daemon health check failed with the following output:
> no valid command found; 10 closest matches:
> 0
> 1
> 2
> abort
> assert
> bluefs debug_inject_read_zeros
> bluefs files list
> bluefs stats
> bluestore bluefs device info [<alloc_size:int>]
> config diff
> admin_socket: invalid command

Workaround:

Complete the following steps during every patch or major cluster update of the Cluster releases 17.2.x, 17.3.x, and 17.4.x (until Ceph 18.2.5 becomes supported):

  1. Plan extra time in the maintenance window for the patch cluster update.

    Slow starts still impact the update procedure, but after you complete the following step, the recovery process becomes noticeably shorter without affecting the overall cluster state and data responsiveness.

  2. Select one of the following options:

    • Before the cluster update, set the noout flag:

      ceph osd set noout
      

      Once the Ceph OSDs image upgrade is done, unset the flag:

      ceph osd unset noout
      
    • Monitor the Ceph OSDs image upgrade. If the symptoms of slow start appear, set the noout flag as soon as possible. Once the Ceph OSDs image upgrade is done, unset the flag.

[26441] Cluster update fails with the MountDevice failed for volume warning

Update of a managed cluster that is based on bare metal and has Ceph enabled fails with a PersistentVolumeClaim stuck in the Pending state for the prometheus-server StatefulSet and the MountVolume.MountDevice failed for volume warning in the StackLight event logs.

Workaround:

  1. Verify that the descriptions of the Pods that failed to run contain the FailedMount events:

    kubectl -n <affectedProjectName> describe pod <affectedPodName>
    

    In the command above, replace the following values:

    • <affectedProjectName> is the Container Cloud project name where the Pods failed to run

    • <affectedPodName> is a Pod name that failed to run in the specified project

    In the Pod description, identify the node name where the Pod failed to run.

  2. Verify that the csi-rbdplugin logs of the affected node contain the rbd volume mount failed: <csi-vol-uuid> is being used error. The <csi-vol-uuid> is a unique RBD volume name.

    1. Identify csiPodName of the corresponding csi-rbdplugin:

      kubectl -n rook-ceph get pod -l app=csi-rbdplugin \
      -o jsonpath='{.items[?(@.spec.nodeName == "<nodeName>")].metadata.name}'
      
    2. Output the affected csiPodName logs:

      kubectl -n rook-ceph logs <csiPodName> -c csi-rbdplugin
      
  3. Scale down the affected StatefulSet or Deployment of the Pod that fails to 0 replicas.

  4. On every csi-rbdplugin Pod, search for stuck csi-vol:

    for pod in `kubectl -n rook-ceph get pods|grep rbdplugin|grep -v provisioner|awk '{print $1}'`; do
      echo $pod
      kubectl exec -it -n rook-ceph $pod -c csi-rbdplugin -- rbd device list | grep <csi-vol-uuid>
    done
    
  5. Unmap the affected csi-vol:

    rbd unmap -o force /dev/rbd<i>
    

    The /dev/rbd<i> value is a mapped RBD volume that uses csi-vol.

  6. Delete volumeattachment of the affected Pod:

    kubectl get volumeattachments | grep <csi-vol-uuid>
    kubectl delete volumeattachment <id>
    
  7. Scale up the affected StatefulSet or Deployment back to the original number of replicas and wait until its state becomes Running.

Container Cloud web UI
[50181] Failure to deploy a compact cluster

A compact MOSK cluster fails to be deployed through the Container Cloud web UI because the web UI does not allow adding labels to the control plane machines or changing the dedicatedControlPlane: false setting.

To work around the issue, manually add the required labels using CLI. Once done, the cluster deployment resumes.

[50168] Inability to use a new project right after creation

A newly created project does not display all available tabs in the Container Cloud web UI and returns various access denied errors during the first five minutes after creation.

To work around the issue, wait five minutes after the project creation and refresh the browser.

Artifacts

This section lists the artifacts of components included in the Container Cloud patch release 2.27.3. For artifacts of the Cluster releases introduced in 2.27.3, see patch Cluster releases 16.2.3 and 17.2.3.

Note

The components that are newly added, updated, deprecated, or removed as compared to the previous release version, are marked with a corresponding superscript, for example, lcm-ansible Updated.

Bare metal artifacts

Artifact

Component

Path

Binaries

ironic-python-agent.initramfs

https://binary.mirantis.com/bm/bin/ironic/ipa/ubuntu/initramfs-yoga-focal-debug-20240716085444

ironic-python-agent.kernel

https://binary.mirantis.com/bm/bin/ironic/ipa/ubuntu/kernel-yoga-focal-debug-20240716085444

Helm charts Updated

baremetal-api

https://binary.mirantis.com/core/helm/baremetal-api-1.40.21.tgz

baremetal-operator

https://binary.mirantis.com/core/helm/baremetal-operator-1.40.21.tgz

baremetal-provider

https://binary.mirantis.com/core/helm/baremetal-provider-1.40.21.tgz

baremetal-public-api

https://binary.mirantis.com/core/helm/baremetal-public-api-1.40.21.tgz

kaas-ipam

https://binary.mirantis.com/core/helm/kaas-ipam-1.40.21.tgz

metallb

https://binary.mirantis.com/core/helm/metallb-1.40.21.tgz

Docker images

ambasador Updated

mirantis.azurecr.io/core/external/nginx:1.40.21

baremetal-dnsmasq Updated

mirantis.azurecr.io/bm/baremetal-dnsmasq:base-2-27-alpine-20240806125028

baremetal-operator Updated

mirantis.azurecr.io/bm/baremetal-operator:base-2-27-alpine-20240812133205

bm-collective Updated

mirantis.azurecr.io/bm/bm-collective:base-2-27-alpine-20240812135414

cluster-api-provider-baremetal Updated

mirantis.azurecr.io/core/cluster-api-provider-baremetal:1.40.21

ironic

mirantis.azurecr.io/openstack/ironic:antelope-jammy-20240716113922

ironic-inspector

mirantis.azurecr.io/openstack/ironic-inspector:antelope-jammy-20240716113922

ironic-prometheus-exporter

mirantis.azurecr.io/stacklight/ironic-prometheus-exporter:0.1-20240117102150

kaas-ipam Updated

mirantis.azurecr.io/bm/kaas-ipam:base-2-27-alpine-20240812140336

kubernetes-entrypoint

mirantis.azurecr.io/openstack/extra/kubernetes-entrypoint:v1.0.1-ba8ada4-20240405150338

mariadb

mirantis.azurecr.io/general/mariadb:10.6.17-focal-20240523075821

mcc-keepalived Updated

mirantis.azurecr.io/lcm/mcc-keepalived:v0.25.0-42-g8710cbe

metallb-controller Updated

mirantis.azurecr.io/bm/metallb/controller:v0.14.5-dfbd1a68-amd64

metallb-speaker Updated

mirantis.azurecr.io/bm/metallb/speaker:v0.14.5-dfbd1a68-amd64

syslog-ng Updated

mirantis.azurecr.io/bm/syslog-ng:base-alpine-20240806124545

Core artifacts

Artifact

Component

Path

Bootstrap tarball Updated

bootstrap-darwin

https://binary.mirantis.com/core/bin/bootstrap-darwin-1.40.21.tgz

bootstrap-linux

https://binary.mirantis.com/core/bin/bootstrap-linux-1.40.21.tgz

Helm charts Updated

admission-controller

https://binary.mirantis.com/core/helm/admission-controller-1.40.21.tgz

agent-controller

https://binary.mirantis.com/core/helm/agent-controller-1.40.21.tgz

byo-provider Unsupported

https://binary.mirantis.com/core/helm/byo-provider-1.40.21.tgz

ceph-kcc-controller

https://binary.mirantis.com/core/helm/ceph-kcc-controller-1.40.21.tgz

cert-manager

https://binary.mirantis.com/core/helm/cert-manager-1.40.21.tgz

configuration-collector

https://binary.mirantis.com/core/helm/configuration-collector-1.40.21.tgz

event-controller

https://binary.mirantis.com/core/helm/event-controller-1.40.21.tgz

host-os-modules-controller

https://binary.mirantis.com/core/helm/host-os-modules-controller-1.40.21.tgz

iam-controller

https://binary.mirantis.com/core/helm/iam-controller-1.40.21.tgz

kaas-exporter

https://binary.mirantis.com/core/helm/kaas-exporter-1.40.21.tgz

kaas-public-api

https://binary.mirantis.com/core/helm/kaas-public-api-1.40.21.tgz

kaas-ui

https://binary.mirantis.com/core/helm/kaas-ui-1.40.21.tgz

lcm-controller

https://binary.mirantis.com/core/helm/lcm-controller-1.40.21.tgz

license-controller

https://binary.mirantis.com/core/helm/license-controller-1.40.21.tgz

machinepool-controller

https://binary.mirantis.com/core/helm/machinepool-controller-1.40.21.tgz

mcc-cache

https://binary.mirantis.com/core/helm/mcc-cache-1.40.21.tgz

mcc-cache-warmup

https://binary.mirantis.com/core/helm/mcc-cache-warmup-1.40.21.tgz

openstack-provider

https://binary.mirantis.com/core/helm/openstack-provider-1.40.21.tgz

os-credentials-controller

https://binary.mirantis.com/core/helm/os-credentials-controller-1.40.21.tgz

portforward-controller

https://binary.mirantis.com/core/helm/portforward-controller-1.40.21.tgz

rbac-controller

https://binary.mirantis.com/core/helm/rbac-controller-1.40.21.tgz

release-controller

https://binary.mirantis.com/core/helm/release-controller-1.40.21.tgz

scope-controller

https://binary.mirantis.com/core/helm/scope-controller-1.40.21.tgz

secret-controller

https://binary.mirantis.com/core/helm/secret-controller-1.40.21.tgz

squid-proxy Unsupported

https://binary.mirantis.com/core/helm/squid-proxy-1.40.21.tgz

storage-discovery

https://binary.mirantis.com/core/helm/storage-discovery-1.40.21.tgz

user-controller

https://binary.mirantis.com/core/helm/user-controller-1.40.21.tgz

vsphere-credentials-controller Unsupported

https://binary.mirantis.com/core/helm/vsphere-credentials-controller-1.40.21.tgz

vsphere-provider Unsupported

https://binary.mirantis.com/core/helm/vsphere-provider-1.40.21.tgz

vsphere-vm-template-controller Unsupported

https://binary.mirantis.com/core/helm/vsphere-vm-template-controller-1.40.21.tgz

Docker images

admission-controller Updated

mirantis.azurecr.io/core/admission-controller:1.40.21

agent-controller Updated

mirantis.azurecr.io/core/agent-controller:1.40.21

byo-cluster-api-controller Unsupported

mirantis.azurecr.io/core/byo-cluster-api-controller:1.40.21

ceph-kcc-controller Updated

mirantis.azurecr.io/core/ceph-kcc-controller:1.40.21

cert-manager-controller

mirantis.azurecr.io/core/external/cert-manager-controller:v1.11.0-6

configuration-collector Updated

mirantis.azurecr.io/core/configuration-collector:1.40.21

event-controller Updated

mirantis.azurecr.io/core/event-controller:1.40.21

frontend Updated

mirantis.azurecr.io/core/frontend:1.40.21

host-os-modules-controller Updated

mirantis.azurecr.io/core/host-os-modules-controller:1.40.21

iam-controller Updated

mirantis.azurecr.io/core/iam-controller:1.40.21

kaas-exporter Updated

mirantis.azurecr.io/core/kaas-exporter:1.40.21

kproxy Updated

mirantis.azurecr.io/core/kproxy:1.40.21

lcm-controller Updated

mirantis.azurecr.io/core/lcm-controller:1.40.21

license-controller Updated

mirantis.azurecr.io/core/license-controller:1.40.21

machinepool-controller Updated

mirantis.azurecr.io/core/machinepool-controller:1.40.21

mcc-cache-warmup Updated

mirantis.azurecr.io/core/mcc-cache-warmup:1.40.21

mcc-haproxy Updated

mirantis.azurecr.io/lcm/mcc-haproxy:v0.25.0-42-g8710cbe

mcc-keepalived Updated

mirantis.azurecr.io/lcm/mcc-keepalived:v0.25.0-42-g8710cbe

nginx Updated

mirantis.azurecr.io/core/external/nginx:1.40.21

openstack-cluster-api-controller Updated

mirantis.azurecr.io/core/openstack-cluster-api-controller:1.40.21

os-credentials-controller Updated

mirantis.azurecr.io/core/os-credentials-controller:1.40.21

portforward-controller Updated

mirantis.azurecr.io/core/portforward-controller:1.40.21

rbac-controller Updated

mirantis.azurecr.io/core/rbac-controller:1.40.21

registry Updated

mirantis.azurecr.io/lcm/registry:v2.8.1-11

release-controller Updated

mirantis.azurecr.io/core/release-controller:1.40.21

scope-controller Updated

mirantis.azurecr.io/core/scope-controller:1.40.21

secret-controller Updated

mirantis.azurecr.io/core/secret-controller:1.40.21

squid-proxy Unsupported

mirantis.azurecr.io/lcm/squid-proxy:0.0.1-10-g24a0d69

storage-discovery Updated

mirantis.azurecr.io/core/storage-discovery:1.40.21

user-controller Updated

mirantis.azurecr.io/core/user-controller:1.40.21

vsphere-cluster-api-controller Unsupported

mirantis.azurecr.io/core/vsphere-cluster-api-controller:1.40.21

vsphere-credentials-controller Unsupported

mirantis.azurecr.io/core/vsphere-credentials-controller:1.40.21

vsphere-vm-template-controller Unsupported

mirantis.azurecr.io/core/vsphere-vm-template-controller:1.40.21

IAM artifacts

Artifact

Component

Path

Binaries

hash-generate-darwin

http://binary.mirantis.com/iam/bin/hash-generate-0.0.1-268-3cf7f17-darwin

hash-generate-linux

http://binary.mirantis.com/iam/bin/hash-generate-0.0.1-268-3cf7f17-linux

Helm charts Updated

iam

https://binary.mirantis.com/core/helm/iam-1.40.21.tgz

Docker images

kubectl

mirantis.azurecr.io/general/kubectl:20240711152257

mariadb

mirantis.azurecr.io/general/mariadb:10.6.17-focal-20240523075821

mcc-keycloak Updated

mirantis.azurecr.io/iam/mcc-keycloak:24.0.5-20240802071408

2.27.2

Important

For MOSK clusters, Container Cloud 2.27.2 is the continuation for MOSK 24.1.x series using the patch Cluster release 17.1.7. For the update path of 24.1 and 24.2 series, see MOSK documentation: Cluster update scheme.

The management cluster of a MOSK 24.1, 24.1.5, or 24.1.6 cluster is automatically updated to the latest patch Cluster release 16.2.2.

The Container Cloud patch release 2.27.2, which is based on the 2.27.0 major release, provides the following updates:

  • Support for the patch Cluster release 16.2.2.

  • Support for the patch Cluster releases 16.1.7 and 17.1.7 that represent Mirantis OpenStack for Kubernetes (MOSK) patch release 24.1.7.

  • Support for MKE 3.7.11.

  • Bare metal: update of Ubuntu mirror from ubuntu-2024-06-27-095142 to ubuntu-2024-07-16-014744 along with update of minor kernel version from 5.15.0-113-generic to 5.15.0-116-generic (Cluster release 16.2.2).

  • Security fixes for CVEs in images.

This patch release also supports the latest major Cluster releases 17.2.0 and 16.2.0. It does not support greenfield deployments based on deprecated Cluster releases; use the latest available Cluster release instead.

For main deliverables of the parent Container Cloud release of 2.27.2, refer to 2.27.0.

Security notes

In total, since Container Cloud 2.27.1, 95 Common Vulnerabilities and Exposures (CVE) have been fixed in 2.27.2: 6 of critical and 89 of high severity.

The table below includes the total numbers of addressed unique and common CVEs in images by product component since Container Cloud 2.27.1. The common CVEs are issues addressed across several images.

Addressed CVEs - summary

Product component    CVE type    Critical    High    Total

Kaas core            Unique      5           26      31
                     Common      6           69      75
StackLight           Unique      0           3       3
                     Common      0           20      20

Mirantis Security Portal

For the detailed list of fixed and existing CVEs across the Mirantis Container Cloud and MOSK products, refer to Mirantis Security Portal.

MOSK CVEs

For the number of fixed CVEs in the MOSK-related components including OpenStack and Tungsten Fabric, refer to MOSK 24.1.7: Security notes.

Known issues

This section lists known issues with workarounds for the Mirantis Container Cloud release 2.27.2 including the Cluster releases 16.2.2, 16.1.7, and 17.1.7.

For other issues that can occur while deploying and operating a Container Cloud cluster, see Deployment Guide: Troubleshooting and Operations Guide: Troubleshooting.

Note

This section also outlines still valid known issues from previous Container Cloud releases.

Bare metal
[47202] Inspection error on bare metal hosts after dnsmasq restart

Note

Moving forward, the workaround for this issue will be moved from Release Notes to MOSK Troubleshooting Guide: Inspection error on bare metal hosts after dnsmasq restart.

If the dnsmasq pod is restarted during the bootstrap of newly added nodes, those nodes may fail to undergo inspection. That can result in inspection error in the corresponding BareMetalHost objects.

The issue can occur when:

  • The dnsmasq pod was moved to another node.

  • DHCP subnets were changed, including addition or removal. In this case, the dhcpd container of the dnsmasq pod is restarted.

    Caution

    If changing or adding DHCP subnets is required to bootstrap new nodes, wait until the dnsmasq pod becomes ready after the change, and only then create the BareMetalHost objects, for example, using the readiness check shown below.
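
    A minimal readiness check before creating new BareMetalHost objects, assuming the dnsmasq deployment runs in the kaas namespace under the name dnsmasq (the name is an assumption based on the pod names shown in the verification steps below):

    # Wait until the dnsmasq deployment reports all replicas as available.
    kubectl -n kaas wait --for=condition=Available deployment/dnsmasq --timeout=300s

    # Alternatively, check the pod status directly.
    kubectl -n kaas get pod | grep dnsmasq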

To verify whether the nodes are affected:

  1. Verify whether the BareMetalHost objects contain the inspection error:

    kubectl get bmh -n <managed-cluster-namespace-name>
    

    Example of system response:

    NAME            STATE         CONSUMER        ONLINE   ERROR              AGE
    test-master-1   provisioned   test-master-1   true                        9d
    test-master-2   provisioned   test-master-2   true                        9d
    test-master-3   provisioned   test-master-3   true                        9d
    test-worker-1   provisioned   test-worker-1   true                        9d
    test-worker-2   provisioned   test-worker-2   true                        9d
    test-worker-3   inspecting                    true     inspection error   19h
    
  2. Verify whether the dnsmasq pod was in the Ready state when the inspection of the affected baremetal hosts (test-worker-3 in the example above) started:

    kubectl -n kaas get pod <dnsmasq-pod-name> -oyaml
    

    Example of system response:

    ...
    status:
      conditions:
      - lastProbeTime: null
        lastTransitionTime: "2024-10-10T15:37:34Z"
        status: "True"
        type: Initialized
      - lastProbeTime: null
        lastTransitionTime: "2024-10-11T07:38:54Z"
        status: "True"
        type: Ready
      - lastProbeTime: null
        lastTransitionTime: "2024-10-11T07:38:54Z"
        status: "True"
        type: ContainersReady
      - lastProbeTime: null
        lastTransitionTime: "2024-10-10T15:37:34Z"
        status: "True"
        type: PodScheduled
      containerStatuses:
      - containerID: containerd://6dbcf2fc4b36ce4c549c9191ab01f72d0236c51d42947675302675e4bfaf4cdf
        image: docker-dev-kaas-virtual.artifactory-eu.mcp.mirantis.net/bm/baremetal-dnsmasq:base-2-28-alpine-20240812132650
        imageID: docker-dev-kaas-virtual.artifactory-eu.mcp.mirantis.net/bm/baremetal-dnsmasq@sha256:3dad3e278add18e69b2608e462691c4823942641a0f0e25e6811e703e3c23b3b
        lastState:
          terminated:
            containerID: containerd://816fcf079cd544acd74e312065de5b5ed4dbf1dc6159fefffff4f644b5e45987
            exitCode: 0
            finishedAt: "2024-10-11T07:38:35Z"
            reason: Completed
            startedAt: "2024-10-10T15:37:45Z"
        name: dhcpd
        ready: true
        restartCount: 2
        started: true
        state:
          running:
            startedAt: "2024-10-11T07:38:37Z"
      ...
    

    In the system response above, the dhcpd container was not ready between "2024-10-11T07:38:35Z" and "2024-10-11T07:38:54Z".

  3. Verify the affected baremetal host. For example:

    kubectl get bmh -n managed-ns test-worker-3 -oyaml
    

    Example of system response:

    ...
    status:
      errorCount: 15
      errorMessage: Introspection timeout
      errorType: inspection error
      ...
      operationHistory:
        deprovision:
          end: null
          start: null
        inspect:
          end: null
          start: "2024-10-11T07:38:19Z"
        provision:
          end: null
          start: null
        register:
          end: "2024-10-11T07:38:19Z"
          start: "2024-10-11T07:37:25Z"
    

    In the system response above, inspection was started at "2024-10-11T07:38:19Z", immediately before the period of the dhcpd container downtime. Therefore, this node is most likely affected by the issue.

Workaround

  1. Reboot the node using the IPMI reset or cycle command.

  2. If the node fails to boot, remove the failed BareMetalHost object and create it again:

    1. Remove the BareMetalHost object. For example:

      kubectl delete bmh -n managed-ns test-worker-3
      
    2. Verify that the BareMetalHost object is removed:

      kubectl get bmh -n managed-ns test-worker-3
      
    3. Create a BareMetalHost object from the template. For example:

      kubectl create -f bmhc-test-worker-3.yaml
      kubectl create -f bmh-test-worker-3.yaml
      
[46245] Lack of access permissions for HOC and HOCM objects

Fixed in 2.28.0 (17.3.0 and 16.3.0)

When trying to list the HostOSConfigurationModules and HostOSConfiguration custom resources, serviceuser or a user with the global-admin or operator role obtains the access denied error. For example:

kubectl --kubeconfig ~/.kube/mgmt-config get hocm

Error from server (Forbidden): hostosconfigurationmodules.kaas.mirantis.com is forbidden:
User "2d74348b-5669-4c65-af31-6c05dbedac5f" cannot list resource "hostosconfigurationmodules"
in API group "kaas.mirantis.com" at the cluster scope: access denied

Workaround:

  1. Modify the global-admin role by adding a new entry with the following contents to the rules list:

    kubectl edit clusterroles kaas-global-admin
    
    - apiGroups: [kaas.mirantis.com]
      resources: [hostosconfigurationmodules]
      verbs: ['*']
    
  2. For each Container Cloud project, modify the kaas-operator role by adding a new entry with the following contents to the rules list:

    kubectl -n <projectName> edit roles kaas-operator
    
    - apiGroups: [kaas.mirantis.com]
      resources: [hostosconfigurations]
      verbs: ['*']
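
    After both roles are modified, re-run the command that previously failed to confirm that access is granted. For example:

    # The command should now return the list of objects instead of the access denied error.
    kubectl --kubeconfig ~/.kube/mgmt-config get hocm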
    
[42386] A load balancer service does not obtain the external IP address

Due to the MetalLB upstream issue, a load balancer service may not obtain the external IP address.

The issue occurs when two services share the same external IP address and have the same externalTrafficPolicy value. Initially, the services have the external IP address assigned and are accessible. After modifying the externalTrafficPolicy value for both services from Cluster to Local, the first service that was changed loses its external IP address, while the second service, which was changed later, has the external IP assigned as expected.

To work around the issue, make a dummy change to the service object where the external IP is <pending>:

  1. Identify the service that is stuck:

    kubectl get svc -A | grep pending
    

    Example of system response:

    stacklight  iam-proxy-prometheus  LoadBalancer  10.233.28.196  <pending>  443:30430/TCP
    
  2. Add an arbitrary label to the service that is stuck. For example:

    kubectl label svc -n stacklight iam-proxy-prometheus reconcile=1
    

    Example of system response:

    service/iam-proxy-prometheus labeled
    
  3. Verify that the external IP was allocated to the service:

    kubectl get svc -n stacklight iam-proxy-prometheus
    

    Example of system response:

    NAME                  TYPE          CLUSTER-IP     EXTERNAL-IP  PORT(S)        AGE
    iam-proxy-prometheus  LoadBalancer  10.233.28.196  10.0.34.108  443:30430/TCP  12d
    
[41305] DHCP responses are lost between dnsmasq and dhcp-relay pods

Fixed in 2.28.0 (17.3.0 and 16.3.0)

After node maintenance of a management cluster, the newly added nodes may fail to undergo provisioning successfully. The issue relates to new nodes that are in the same L2 domain as the management cluster.

The issue was observed on environments having management cluster nodes configured with a single L2 segment used for all network traffic (PXE and LCM/management networks).

To verify whether the cluster is affected:

Verify whether the dnsmasq and dhcp-relay pods run on the same node in the management cluster:

kubectl -n kaas get pods -o wide | grep -e "dhcp\|dnsmasq"

Example of system response:

dhcp-relay-7d85f75f76-5vdw2   2/2   Running   2 (36h ago)   36h   10.10.0.122     kaas-node-8a24b81c-76d0-4d4c-8421-962bd39df5ad   <none>   <none>
dnsmasq-8f4b484b4-slhbd       5/5   Running   1 (36h ago)   36h   10.233.123.75   kaas-node-8a24b81c-76d0-4d4c-8421-962bd39df5ad   <none>   <none>

If this is the case, proceed to the workaround below.

Workaround:

  1. Log in to a node that contains kubeconfig of the affected management cluster.

  2. Make sure that at least two management cluster nodes are schedulable:

    kubectl get node
    

    Example of a positive system response:

    NAME                                             STATUS   ROLES    AGE   VERSION
    kaas-node-bcedb87b-b3ce-46a4-a4ca-ea3068689e40   Ready    master   37h   v1.27.10-mirantis-1
    kaas-node-8a24b81c-76d0-4d4c-8421-962bd39df5ad   Ready    master   37h   v1.27.10-mirantis-1
    kaas-node-ad5a6f51-b98f-43c3-91d5-55fed3d0ff21   Ready    master   37h   v1.27.10-mirantis-1
    
  3. Delete the dhcp-relay pod:

    kubectl -n kaas delete pod <dhcp-relay-xxxxx>
    
  4. Verify that the dnsmasq and dhcp-relay pods are scheduled into different nodes:

    kubectl -n kaas get pods -o wide | grep -e "dhcp\|dnsmasq"
    

    Example of a positive system response:

    dhcp-relay-7d85f75f76-rkv03   2/2   Running   0             49s   10.10.0.121     kaas-node-bcedb87b-b3ce-46a4-a4ca-ea3068689e40   <none>   <none>
    dnsmasq-8f4b484b4-slhbd       5/5   Running   1 (37h ago)   37h   10.233.123.75   kaas-node-8a24b81c-76d0-4d4c-8421-962bd39df5ad   <none>   <none>
    
[24005] Deletion of a node with ironic Pod is stuck in the Terminating state

During deletion of a manager machine running the ironic Pod from a bare metal management cluster, the following problems occur:

  • All Pods are stuck in the Terminating state

  • A new ironic Pod fails to start

  • The related bare metal host is stuck in the deprovisioning state

As a workaround, before deletion of the node running the ironic Pod, cordon and drain the node using the kubectl cordon <nodeName> and kubectl drain <nodeName> commands.
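
For example, assuming <nodeName> is the name of the node that runs the ironic Pod; the drain options below are common additions that may be needed when DaemonSet-managed Pods or Pods with emptyDir volumes are present:

kubectl cordon <nodeName>
kubectl drain <nodeName> --ignore-daemonsets --delete-emptydir-data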


LCM
[39437] Failure to replace a master node on a Container Cloud cluster

Fixed in 2.29.0 (17.4.0 and 16.4.0)

During the replacement of a master node on a cluster of any type, the process may get stuck with Kubelet's NodeReady condition is Unknown in the machine status on the remaining master nodes.

As a workaround, log in on the affected node and run the following command:

docker restart ucp-kubelet
[31186,34132] Pods get stuck during MariaDB operations

During MariaDB operations on a management cluster, Pods may get stuck in continuous restarts with the following example error:

[ERROR] WSREP: Corrupt buffer header: \
addr: 0x7faec6f8e518, \
seqno: 3185219421952815104, \
size: 909455917, \
ctx: 0x557094f65038, \
flags: 11577. store: 49, \
type: 49

Workaround:

  1. Create a backup of the /var/lib/mysql directory on the mariadb-server Pod.

  2. Verify that other replicas are up and ready.

  3. Remove the galera.cache file for the affected mariadb-server Pod.

  4. Remove the affected mariadb-server Pod or wait until it is automatically restarted.

After Kubernetes restarts the Pod, the Pod clones the database in 1-2 minutes and restores the quorum.
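
The following is a minimal sketch of the workaround using kubectl; <namespace>, the mariadb-server-0 Pod name, the app=mariadb label, and the /var/lib/mysql/galera.cache path are assumptions that depend on your environment:

# 1. Back up the /var/lib/mysql directory from the affected Pod to the local machine.
kubectl -n <namespace> cp mariadb-server-0:/var/lib/mysql ./mysql-backup

# 2. Verify that the other replicas are up and ready.
kubectl -n <namespace> get pods -l app=mariadb

# 3. Remove the galera.cache file of the affected Pod.
kubectl -n <namespace> exec mariadb-server-0 -- rm /var/lib/mysql/galera.cache

# 4. Remove the affected Pod or wait until it is automatically restarted.
kubectl -n <namespace> delete pod mariadb-server-0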

[30294] Replacement of a master node is stuck on the calico-node Pod start

Fixed in 2.28.4 (17.3.4 and 16.3.4)

During replacement of a master node on a cluster of any type, the calico-node Pod fails to start on a new node that has the same IP address as the node being replaced.

Workaround:

  1. Log in to any master node.

  2. From a CLI with an MKE client bundle, create a shell alias to start calicoctl using the mirantis/ucp-dsinfo image. Two alias variants follow; they differ only in the location of the etcd TLS certificates, so use the one that matches the certificate layout of your cluster:

    alias calicoctl="\
    docker run -i --rm \
    --pid host \
    --net host \
    -e constraint:ostype==linux \
    -e ETCD_ENDPOINTS=<etcdEndpoint> \
    -e ETCD_KEY_FILE=/var/lib/docker/volumes/ucp-kv-certs/_data/key.pem \
    -e ETCD_CA_CERT_FILE=/var/lib/docker/volumes/ucp-kv-certs/_data/ca.pem \
    -e ETCD_CERT_FILE=/var/lib/docker/volumes/ucp-kv-certs/_data/cert.pem \
    -v /var/run/calico:/var/run/calico \
    -v /var/lib/docker/volumes/ucp-kv-certs/_data:/var/lib/docker/volumes/ucp-kv-certs/_data:ro \
    mirantis/ucp-dsinfo:<mkeVersion> \
    calicoctl \
    "
    
    alias calicoctl="\
    docker run -i --rm \
    --pid host \
    --net host \
    -e constraint:ostype==linux \
    -e ETCD_ENDPOINTS=<etcdEndpoint> \
    -e ETCD_KEY_FILE=/ucp-node-certs/key.pem \
    -e ETCD_CA_CERT_FILE=/ucp-node-certs/ca.pem \
    -e ETCD_CERT_FILE=/ucp-node-certs/cert.pem \
    -v /var/run/calico:/var/run/calico \
    -v ucp-node-certs:/ucp-node-certs:ro \
    mirantis/ucp-dsinfo:<mkeVersion> \
    calicoctl --allow-version-mismatch \
    "
    

    In the above command, replace the following values with the corresponding settings of the affected cluster:

    • <etcdEndpoint> is the etcd endpoint defined in the Calico configuration file. For example, ETCD_ENDPOINTS=127.0.0.1:12378

    • <mkeVersion> is the MKE version installed on your cluster. For example, mirantis/ucp-dsinfo:3.5.7.

  3. Verify the node list on the cluster:

    kubectl get node
    
  4. Compare this list with the node list in Calico to identify the old node:

    calicoctl get node -o wide
    
  5. Remove the old node from Calico:

    calicoctl delete node kaas-node-<nodeID>
    
[5782] Manager machine fails to be deployed during node replacement

Fixed in 2.28.4 (17.3.4 and 16.3.4)

During replacement of a manager machine, the following problems may occur:

  • The system adds the node to Docker swarm but not to Kubernetes

  • The node Deployment gets stuck with failed RethinkDB health checks

Workaround:

  1. Delete the failed node.

  2. Wait for the MKE cluster to become healthy. To monitor the cluster status:

    1. Log in to the MKE web UI as described in Connect to the Mirantis Kubernetes Engine web UI.

    2. Monitor the cluster status as described in MKE Operations Guide: Monitor an MKE cluster with the MKE web UI.

  3. Deploy a new node.

[5568] The calico-kube-controllers Pod fails to clean up resources

Fixed in 2.28.4 (17.3.4 and 16.3.4)

During the unsafe or forced deletion of a manager machine running the calico-kube-controllers Pod in the kube-system namespace, the following issues occur:

  • The calico-kube-controllers Pod fails to clean up resources associated with the deleted node

  • The calico-node Pod may fail to start up on a newly created node if the machine is provisioned with the same IP address as the deleted machine had

As a workaround, before deletion of the node running the calico-kube-controllers Pod, cordon and drain the node:

kubectl cordon <nodeName>
kubectl drain <nodeName>

Ceph
[50566] Ceph upgrade is very slow during patch or major cluster update

Due to the upstream Ceph issue 66717, during CVE upgrade of the Ceph daemon image of Ceph Reef 18.2.4, OSDs may start slowly and even fail the startup probe with the following describe output in the rook-ceph-osd-X pod:

 Warning  Unhealthy  57s (x16 over 3m27s)  kubelet  Startup probe failed:
 ceph daemon health check failed with the following output:
> no valid command found; 10 closest matches:
> 0
> 1
> 2
> abort
> assert
> bluefs debug_inject_read_zeros
> bluefs files list
> bluefs stats
> bluestore bluefs device info [<alloc_size:int>]
> config diff
> admin_socket: invalid command

Workaround:

Complete the following steps during every patch or major cluster update of the Cluster releases 17.2.x, 17.3.x, and 17.4.x (until Ceph 18.2.5 becomes supported):

  1. Plan extra time in the maintenance window for the patch cluster update.

    Slow starts will still impact the update procedure, but after completing the following step, the recovery process noticeably shortens without affecting the overall cluster state and data responsiveness.

  2. Select one of the following options:

    • Before the cluster update, set the noout flag:

      ceph osd set noout
      

      Once the Ceph OSDs image upgrade is done, unset the flag:

      ceph osd unset noout
      
    • Monitor the Ceph OSDs image upgrade, for example, using the commands shown below. If the symptoms of slow start appear, set the noout flag as soon as possible. Once the Ceph OSDs image upgrade is done, unset the flag.
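
An illustrative way to monitor the OSD image upgrade; the app=rook-ceph-osd label is the default Rook label and may differ, and the ceph commands are expected to be run from the Ceph toolbox Pod:

# Watch the rook-ceph-osd Pods restart with the new image.
kubectl -n rook-ceph get pods -l app=rook-ceph-osd -w

# From the Ceph toolbox, verify which daemon versions are currently running.
ceph versions

# If slow starts appear, set the noout flag; unset it once the upgrade completes.
ceph osd set noout
ceph osd unset noout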

[26441] Cluster update fails with the MountDevice failed for volume warning

Update of a managed cluster based on bare metal with Ceph enabled fails with the PersistentVolumeClaim getting stuck in the Pending state for the prometheus-server StatefulSet and the MountVolume.MountDevice failed for volume warning in the StackLight event logs.

Workaround:

  1. Verify that the description of the Pods that failed to run contains the FailedMount events:

    kubectl -n <affectedProjectName> describe pod <affectedPodName>
    

    In the command above, replace the following values:

    • <affectedProjectName> is the Container Cloud project name where the Pods failed to run

    • <affectedPodName> is a Pod name that failed to run in the specified project

    In the Pod description, identify the node name where the Pod failed to run.

  2. Verify that the csi-rbdplugin logs of the affected node contain the rbd volume mount failed: <csi-vol-uuid> is being used error. The <csi-vol-uuid> is a unique RBD volume name.

    1. Identify csiPodName of the corresponding csi-rbdplugin:

      kubectl -n rook-ceph get pod -l app=csi-rbdplugin \
      -o jsonpath='{.items[?(@.spec.nodeName == "<nodeName>")].metadata.name}'
      
    2. Output the affected csiPodName logs:

      kubectl -n rook-ceph logs <csiPodName> -c csi-rbdplugin
      
  3. Scale down the affected StatefulSet or Deployment of the failing Pod to 0 replicas.

  4. On every csi-rbdplugin Pod, search for stuck csi-vol:

    for pod in `kubectl -n rook-ceph get pods|grep rbdplugin|grep -v provisioner|awk '{print $1}'`; do
      echo $pod
      kubectl exec -it -n rook-ceph $pod -c csi-rbdplugin -- rbd device list | grep <csi-vol-uuid>
    done
    
  5. Unmap the affected csi-vol:

    rbd unmap -o force /dev/rbd<i>
    

    The /dev/rbd<i> value is a mapped RBD volume that uses csi-vol.

  6. Delete volumeattachment of the affected Pod:

    kubectl get volumeattachments | grep <csi-vol-uuid>
    kubectl delete volumeattachment <id>
    
  7. Scale up the affected StatefulSet or Deployment back to the original number of replicas and wait until its state becomes Running.

Container Cloud web UI
[50181] Failure to deploy a compact cluster

A compact MOSK cluster fails to be deployed through the Container Cloud web UI because the web UI does not allow adding any label to the control plane machines or changing dedicatedControlPlane: false.

To work around the issue, manually add the required labels using the CLI. Once done, the cluster deployment resumes.
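
A hypothetical illustration of adding such a label through the CLI; the project namespace, the Machine object name, and the exact label key and value are placeholders that depend on your deployment:

# Open the Machine object of the affected control plane machine
# and add the required label to its specification.
kubectl -n <project-namespace> edit machine <control-plane-machine-name>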

[50168] Inability to use a new project right after creation

A newly created project does not display all available tabs in the Container Cloud web UI and shows various access denied errors during the first five minutes after creation.

To work around the issue, refresh the browser five minutes after the project creation.

Artifacts

This section lists the artifacts of components included in the Container Cloud patch release 2.27.2. For artifacts of the Cluster releases introduced in 2.27.2, see patch Cluster releases 16.2.2, 16.1.7, and 17.1.7.

Note

The components that are newly added, updated, deprecated, or removed as compared to the previous release version, are marked with a corresponding superscript, for example, lcm-ansible Updated.

Bare metal artifacts

Artifact

Component

Path

Binaries Updated

ironic-python-agent.initramfs

https://binary.mirantis.com/bm/bin/ironic/ipa/ubuntu/initramfs-yoga-focal-debug-20240716085444

ironic-python-agent.kernel

https://binary.mirantis.com/bm/bin/ironic/ipa/ubuntu/kernel-yoga-focal-debug-20240716085444

Helm charts Updated

baremetal-api

https://binary.mirantis.com/core/helm/baremetal-api-1.40.18.tgz

baremetal-operator

https://binary.mirantis.com/core/helm/baremetal-operator-1.40.18.tgz

baremetal-provider

https://binary.mirantis.com/core/helm/baremetal-provider-1.40.18.tgz

baremetal-public-api

https://binary.mirantis.com/core/helm/baremetal-public-api-1.40.18.tgz

kaas-ipam

https://binary.mirantis.com/core/helm/kaas-ipam-1.40.18.tgz

metallb

https://binary.mirantis.com/core/helm/metallb-1.40.18.tgz

Docker images

ambasador Updated

mirantis.azurecr.io/core/external/nginx:1.40.18

baremetal-dnsmasq

mirantis.azurecr.io/bm/baremetal-dnsmasq:base-2-27-alpine-20240701130209

baremetal-operator Updated

mirantis.azurecr.io/bm/baremetal-operator:base-2-27-alpine-20240711081559

bm-collective

mirantis.azurecr.io/bm/bm-collective:base-2-27-alpine-20240701130719

cluster-api-provider-baremetal Updated

mirantis.azurecr.io/core/cluster-api-provider-baremetal:1.40.18

ironic Updated

mirantis.azurecr.io/openstack/ironic:antelope-jammy-20240716113922

ironic-inspector Updated

mirantis.azurecr.io/openstack/ironic-inspector:antelope-jammy-20240716113922

ironic-prometheus-exporter

mirantis.azurecr.io/stacklight/ironic-prometheus-exporter:0.1-20240117102150

kaas-ipam

mirantis.azurecr.io/bm/kaas-ipam:base-2-27-alpine-20240701133222

kubernetes-entrypoint

mirantis.azurecr.io/openstack/extra/kubernetes-entrypoint:v1.0.1-ba8ada4-20240405150338

mariadb

mirantis.azurecr.io/general/mariadb:10.6.17-focal-20240523075821

mcc-keepalived

mirantis.azurecr.io/lcm/mcc-keepalived:v0.25.0-40-g890ffca

metallb-controller

mirantis.azurecr.io/bm/metallb/controller:v0.14.5-e86184d9-amd64

metallb-speaker

mirantis.azurecr.io/bm/metallb/speaker:v0.14.5-e86184d9-amd64

syslog-ng

mirantis.azurecr.io/bm/syslog-ng:base-alpine-20240701125905

Core artifacts

Artifact

Component

Path

Bootstrap tarball Updated

bootstrap-darwin

https://binary.mirantis.com/core/bin/bootstrap-darwin-1.40.18.tgz

bootstrap-linux

https://binary.mirantis.com/core/bin/bootstrap-linux-1.40.18.tgz

Helm charts Updated

admission-controller

https://binary.mirantis.com/core/helm/admission-controller-1.40.18.tgz

agent-controller

https://binary.mirantis.com/core/helm/agent-controller-1.40.18.tgz

byo-provider

https://binary.mirantis.com/core/helm/byo-provider-1.40.18.tgz

ceph-kcc-controller

https://binary.mirantis.com/core/helm/ceph-kcc-controller-1.40.18.tgz

cert-manager

https://binary.mirantis.com/core/helm/cert-manager-1.40.18.tgz

configuration-collector

https://binary.mirantis.com/core/helm/configuration-collector-1.40.18.tgz

event-controller

https://binary.mirantis.com/core/helm/event-controller-1.40.18.tgz

host-os-modules-controller

https://binary.mirantis.com/core/helm/host-os-modules-controller-1.40.18.tgz

iam-controller

https://binary.mirantis.com/core/helm/iam-controller-1.40.18.tgz

kaas-exporter

https://binary.mirantis.com/core/helm/kaas-exporter-1.40.18.tgz

kaas-public-api

https://binary.mirantis.com/core/helm/kaas-public-api-1.40.18.tgz

kaas-ui

https://binary.mirantis.com/core/helm/kaas-ui-1.40.18.tgz

lcm-controller

https://binary.mirantis.com/core/helm/lcm-controller-1.40.18.tgz

license-controller

https://binary.mirantis.com/core/helm/license-controller-1.40.18.tgz

machinepool-controller

https://binary.mirantis.com/core/helm/machinepool-controller-1.40.18.tgz

mcc-cache

https://binary.mirantis.com/core/helm/mcc-cache-1.40.18.tgz

mcc-cache-warmup

https://binary.mirantis.com/core/helm/mcc-cache-warmup-1.40.18.tgz

openstack-provider

https://binary.mirantis.com/core/helm/openstack-provider-1.40.18.tgz

os-credentials-controller

https://binary.mirantis.com/core/helm/os-credentials-controller-1.40.18.tgz

portforward-controller

https://binary.mirantis.com/core/helm/portforward-controller-1.40.18.tgz

rbac-controller

https://binary.mirantis.com/core/helm/rbac-controller-1.40.18.tgz

release-controller

https://binary.mirantis.com/core/helm/release-controller-1.40.18.tgz

scope-controller

https://binary.mirantis.com/core/helm/scope-controller-1.40.18.tgz

secret-controller

https://binary.mirantis.com/core/helm/secret-controller-1.40.18.tgz

squid-proxy

https://binary.mirantis.com/core/helm/squid-proxy-1.40.18.tgz

storage-discovery

https://binary.mirantis.com/core/helm/storage-discovery-1.40.18.tgz

user-controller

https://binary.mirantis.com/core/helm/user-controller-1.40.18.tgz

vsphere-credentials-controller

https://binary.mirantis.com/core/helm/vsphere-credentials-controller-1.40.18.tgz

vsphere-provider

https://binary.mirantis.com/core/helm/vsphere-provider-1.40.18.tgz

vsphere-vm-template-controller

https://binary.mirantis.com/core/helm/vsphere-vm-template-controller-1.40.18.tgz

Docker images

admission-controller Updated

mirantis.azurecr.io/core/admission-controller:1.40.18

agent-controller Updated

mirantis.azurecr.io/core/agent-controller:1.40.18

byo-cluster-api-controller Updated

mirantis.azurecr.io/core/byo-cluster-api-controller:1.40.18

ceph-kcc-controller Updated

mirantis.azurecr.io/core/ceph-kcc-controller:1.40.18

cert-manager-controller

mirantis.azurecr.io/core/external/cert-manager-controller:v1.11.0-6

configuration-collector Updated

mirantis.azurecr.io/core/configuration-collector:1.40.18

event-controller Updated

mirantis.azurecr.io/core/event-controller:1.40.18

frontend Updated

mirantis.azurecr.io/core/frontend:1.40.18

host-os-modules-controller Updated

mirantis.azurecr.io/core/host-os-modules-controller:1.40.18

iam-controller Updated

mirantis.azurecr.io/core/iam-controller:1.40.18

kaas-exporter Updated

mirantis.azurecr.io/core/kaas-exporter:1.40.18

kproxy Updated

mirantis.azurecr.io/core/kproxy:1.40.18

lcm-controller Updated

mirantis.azurecr.io/core/lcm-controller:1.40.18

license-controller Updated

mirantis.azurecr.io/core/license-controller:1.40.18

machinepool-controller Updated

mirantis.azurecr.io/core/machinepool-controller:1.40.18

mcc-haproxy

mirantis.azurecr.io/lcm/mcc-haproxy:v0.25.0-40-g890ffca

mcc-keepalived

mirantis.azurecr.io/lcm/mcc-keepalived:v0.25.0-40-g890ffca

nginx Updated

mirantis.azurecr.io/core/external/nginx:1.40.18

openstack-cluster-api-controller Updated

mirantis.azurecr.io/core/openstack-cluster-api-controller:1.40.18

os-credentials-controller Updated

mirantis.azurecr.io/core/os-credentials-controller:1.40.18

portforward-controller Updated

mirantis.azurecr.io/core/portforward-controller:1.40.18

rbac-controller Updated

mirantis.azurecr.io/core/rbac-controller:1.40.18

registry

mirantis.azurecr.io/lcm/registry:v2.8.1-10

release-controller Updated

mirantis.azurecr.io/core/release-controller:1.40.18

scope-controller Updated

mirantis.azurecr.io/core/scope-controller:1.40.18

secret-controller Updated

mirantis.azurecr.io/core/secret-controller:1.40.18

squid-proxy

mirantis.azurecr.io/lcm/squid-proxy:0.0.1-10-g24a0d69

storage-discovery Updated

mirantis.azurecr.io/core/storage-discovery:1.40.18

user-controller Updated

mirantis.azurecr.io/core/user-controller:1.40.18

vsphere-cluster-api-controller Updated

mirantis.azurecr.io/core/vsphere-cluster-api-controller:1.40.18

vsphere-credentials-controller Updated

mirantis.azurecr.io/core/vsphere-credentials-controller:1.40.18

vsphere-vm-template-controller Updated

mirantis.azurecr.io/core/vsphere-vm-template-controller:1.40.18

IAM artifacts

Artifact

Component

Path

Binaries

hash-generate-darwin

http://binary.mirantis.com/iam/bin/hash-generate-0.0.1-268-3cf7f17-darwin

hash-generate-linux

http://binary.mirantis.com/iam/bin/hash-generate-0.0.1-268-3cf7f17-linux

Helm charts Updated

iam

https://binary.mirantis.com/core/helm/iam-1.40.18.tgz

Docker images

kubectl Updated

mirantis.azurecr.io/general/kubectl:20240711152257

kubernetes-entrypoint Removed

n/a

mariadb

mirantis.azurecr.io/general/mariadb:10.6.17-focal-20240523075821

mcc-keycloak

mirantis.azurecr.io/iam/mcc-keycloak:24.0.5-20240621131831

2.27.1

Important

For MOSK clusters, Container Cloud 2.27.1 is the continuation for MOSK 24.1.x series using the patch Cluster release 17.1.6. For the update path of 24.1 and 24.2 series, see MOSK documentation: Cluster update scheme.

The management cluster of a MOSK 24.1 or 24.1.5 cluster is automatically updated to the latest patch Cluster release 16.2.1.

The Container Cloud patch release 2.27.1, which is based on the 2.27.0 major release, provides the following updates:

  • Support for the patch Cluster release 16.2.1.

  • Support for the patch Cluster releases 16.1.6 and 17.1.6 that represent Mirantis OpenStack for Kubernetes (MOSK) patch release 24.1.6.

  • Support for MKE 3.7.10.

  • Support for docker-ee-cli 23.0.13 in MCR 23.0.11 to fix several CVEs.

  • Bare metal: update of Ubuntu mirror from ubuntu-2024-05-17-013445 to ubuntu-2024-06-27-095142 along with update of minor kernel version from 5.15.0-107-generic to 5.15.0-113-generic.

  • Security fixes for CVEs in images.

  • Bug fixes.

This patch release also supports the latest major Cluster releases 17.2.0 and 16.2.0. It does not support greenfield deployments based on deprecated Cluster releases; use the latest available Cluster release instead.

For main deliverables of the parent Container Cloud release of 2.27.1, refer to 2.27.0.

Security notes

In total, since Container Cloud 2.27.0, 270 Common Vulnerabilities and Exposures (CVE) of high severity have been fixed in 2.27.1.

The table below includes the total numbers of addressed unique and common CVEs in images by product component since Container Cloud 2.27.0. The common CVEs are issues addressed across several images.

Addressed CVEs - summary

Product component    CVE type    Critical    High    Total

Ceph                 Unique      0           6       6
                     Common      0           29      29
Kaas core            Unique      0           10      10
                     Common      0           178     178
StackLight           Unique      0           14      14
                     Common      0           63      63

Mirantis Security Portal

For the detailed list of fixed and existing CVEs across the Mirantis Container Cloud and MOSK products, refer to Mirantis Security Portal.

MOSK CVEs

For the number of fixed CVEs in the MOSK-related components including OpenStack and Tungsten Fabric, refer to MOSK 24.1.6: Security notes.

Addressed issues

The following issues have been addressed in the Container Cloud patch release 2.27.1 along with the patch Cluster releases 16.2.1, 16.1.6, and 17.1.6.

  • [42304] [StackLight] [Cluster releases 17.1.6, 16.1.6] Fixed the issue with failure of shard relocation in the OpenSearch cluster on large Container Cloud managed clusters.

  • [40020] [StackLight] [Cluster releases 17.1.6, 16.1.6] Fixed the issue with rollover_policy not being applied to the current indices while updating the policy for the current system* and audit* data streams.

Known issues

This section lists known issues with workarounds for the Mirantis Container Cloud release 2.27.1 including the Cluster releases 16.2.1, 16.1.6, and 17.1.6.

For other issues that can occur while deploying and operating a Container Cloud cluster, see Deployment Guide: Troubleshooting and Operations Guide: Troubleshooting.

Note

This section also outlines still valid known issues from previous Container Cloud releases.

Bare metal
[47202] Inspection error on bare metal hosts after dnsmasq restart

Note

Moving forward, the workaround for this issue will be moved from Release Notes to MOSK Troubleshooting Guide: Inspection error on bare metal hosts after dnsmasq restart.

If the dnsmasq pod is restarted during the bootstrap of newly added nodes, those nodes may fail to undergo inspection. That can result in inspection error in the corresponding BareMetalHost objects.

The issue can occur when:

  • The dnsmasq pod was moved to another node.

  • DHCP subnets were changed, including addition or removal. In this case, the dhcpd container of the dnsmasq pod is restarted.

    Caution

    If changing or adding DHCP subnets is required to bootstrap new nodes, wait until the dnsmasq pod becomes ready after the change, and only then create the BareMetalHost objects.

To verify whether the nodes are affected:

  1. Verify whether the BareMetalHost objects contain the inspection error:

    kubectl get bmh -n <managed-cluster-namespace-name>
    

    Example of system response:

    NAME            STATE         CONSUMER        ONLINE   ERROR              AGE
    test-master-1   provisioned   test-master-1   true                        9d
    test-master-2   provisioned   test-master-2   true                        9d
    test-master-3   provisioned   test-master-3   true                        9d
    test-worker-1   provisioned   test-worker-1   true                        9d
    test-worker-2   provisioned   test-worker-2   true                        9d
    test-worker-3   inspecting                    true     inspection error   19h
    
  2. Verify whether the dnsmasq pod was in the Ready state when the inspection of the affected baremetal hosts (test-worker-3 in the example above) started:

    kubectl -n kaas get pod <dnsmasq-pod-name> -oyaml
    

    Example of system response:

    ...
    status:
      conditions:
      - lastProbeTime: null
        lastTransitionTime: "2024-10-10T15:37:34Z"
        status: "True"
        type: Initialized
      - lastProbeTime: null
        lastTransitionTime: "2024-10-11T07:38:54Z"
        status: "True"
        type: Ready
      - lastProbeTime: null
        lastTransitionTime: "2024-10-11T07:38:54Z"
        status: "True"
        type: ContainersReady
      - lastProbeTime: null
        lastTransitionTime: "2024-10-10T15:37:34Z"
        status: "True"
        type: PodScheduled
      containerStatuses:
      - containerID: containerd://6dbcf2fc4b36ce4c549c9191ab01f72d0236c51d42947675302675e4bfaf4cdf
        image: docker-dev-kaas-virtual.artifactory-eu.mcp.mirantis.net/bm/baremetal-dnsmasq:base-2-28-alpine-20240812132650
        imageID: docker-dev-kaas-virtual.artifactory-eu.mcp.mirantis.net/bm/baremetal-dnsmasq@sha256:3dad3e278add18e69b2608e462691c4823942641a0f0e25e6811e703e3c23b3b
        lastState:
          terminated:
            containerID: containerd://816fcf079cd544acd74e312065de5b5ed4dbf1dc6159fefffff4f644b5e45987
            exitCode: 0
            finishedAt: "2024-10-11T07:38:35Z"
            reason: Completed
            startedAt: "2024-10-10T15:37:45Z"
        name: dhcpd
        ready: true
        restartCount: 2
        started: true
        state:
          running:
            startedAt: "2024-10-11T07:38:37Z"
      ...
    

    In the system response above, the dhcpd container was not ready between "2024-10-11T07:38:35Z" and "2024-10-11T07:38:54Z".

  3. Verify the affected baremetal host. For example:

    kubectl get bmh -n managed-ns test-worker-3 -oyaml
    

    Example of system response:

    ...
    status:
      errorCount: 15
      errorMessage: Introspection timeout
      errorType: inspection error
      ...
      operationHistory:
        deprovision:
          end: null
          start: null
        inspect:
          end: null
          start: "2024-10-11T07:38:19Z"
        provision:
          end: null
          start: null
        register:
          end: "2024-10-11T07:38:19Z"
          start: "2024-10-11T07:37:25Z"
    

    In the system response above, inspection was started at "2024-10-11T07:38:19Z", immediately before the period of the dhcpd container downtime. Therefore, this node is most likely affected by the issue.

Workaround

  1. Reboot the node using the IPMI reset or cycle command.

  2. If the node fails to boot, remove the failed BareMetalHost object and create it again:

    1. Remove the BareMetalHost object. For example:

      kubectl delete bmh -n managed-ns test-worker-3
      
    2. Verify that the BareMetalHost object is removed:

      kubectl get bmh -n managed-ns test-worker-3
      
    3. Create a BareMetalHost object from the template. For example:

      kubectl create -f bmhc-test-worker-3.yaml
      kubectl create -f bmh-test-worker-3.yaml
      
[46245] Lack of access permissions for HOC and HOCM objects

Fixed in 2.28.0 (17.3.0 and 16.3.0)

When trying to list the HostOSConfigurationModules and HostOSConfiguration custom resources, serviceuser or a user with the global-admin or operator role obtains the access denied error. For example:

kubectl --kubeconfig ~/.kube/mgmt-config get hocm

Error from server (Forbidden): hostosconfigurationmodules.kaas.mirantis.com is forbidden:
User "2d74348b-5669-4c65-af31-6c05dbedac5f" cannot list resource "hostosconfigurationmodules"
in API group "kaas.mirantis.com" at the cluster scope: access denied

Workaround:

  1. Modify the global-admin role by adding a new entry with the following contents to the rules list:

    kubectl edit clusterroles kaas-global-admin
    
    - apiGroups: [kaas.mirantis.com]
      resources: [hostosconfigurationmodules]
      verbs: ['*']
    
  2. For each Container Cloud project, modify the kaas-operator role by adding a new entry with the following contents to the rules list:

    kubectl -n <projectName> edit roles kaas-operator
    
    - apiGroups: [kaas.mirantis.com]
      resources: [hostosconfigurations]
      verbs: ['*']
    
[42386] A load balancer service does not obtain the external IP address

Due to the MetalLB upstream issue, a load balancer service may not obtain the external IP address.

The issue occurs when two services share the same external IP address and have the same externalTrafficPolicy value. Initially, the services have the external IP address assigned and are accessible. After modifying the externalTrafficPolicy value for both services from Cluster to Local, the first service that was changed loses its external IP address, while the second service, which was changed later, has the external IP assigned as expected.

To work around the issue, make a dummy change to the service object where the external IP is <pending>:

  1. Identify the service that is stuck:

    kubectl get svc -A | grep pending
    

    Example of system response:

    stacklight  iam-proxy-prometheus  LoadBalancer  10.233.28.196  <pending>  443:30430/TCP
    
  2. Add an arbitrary label to the service that is stuck. For example:

    kubectl label svc -n stacklight iam-proxy-prometheus reconcile=1
    

    Example of system response:

    service/iam-proxy-prometheus labeled
    
  3. Verify that the external IP was allocated to the service:

    kubectl get svc -n stacklight iam-proxy-prometheus
    

    Example of system response:

    NAME                  TYPE          CLUSTER-IP     EXTERNAL-IP  PORT(S)        AGE
    iam-proxy-prometheus  LoadBalancer  10.233.28.196  10.0.34.108  443:30430/TCP  12d
    
[41305] DHCP responses are lost between dnsmasq and dhcp-relay pods

Fixed in 2.28.0 (17.3.0 and 16.3.0)

After node maintenance of a management cluster, the newly added nodes may fail to undergo provisioning successfully. The issue relates to new nodes that are in the same L2 domain as the management cluster.

The issue was observed on environments having management cluster nodes configured with a single L2 segment used for all network traffic (PXE and LCM/management networks).

To verify whether the cluster is affected:

Verify whether the dnsmasq and dhcp-relay pods run on the same node in the management cluster:

kubectl -n kaas get pods -o wide | grep -e "dhcp\|dnsmasq"

Example of system response:

dhcp-relay-7d85f75f76-5vdw2   2/2   Running   2 (36h ago)   36h   10.10.0.122     kaas-node-8a24b81c-76d0-4d4c-8421-962bd39df5ad   <none>   <none>
dnsmasq-8f4b484b4-slhbd       5/5   Running   1 (36h ago)   36h   10.233.123.75   kaas-node-8a24b81c-76d0-4d4c-8421-962bd39df5ad   <none>   <none>

If this is the case, proceed to the workaround below.

Workaround:

  1. Log in to a node that contains kubeconfig of the affected management cluster.

  2. Make sure that at least two management cluster nodes are schedulable:

    kubectl get node
    

    Example of a positive system response:

    NAME                                             STATUS   ROLES    AGE   VERSION
    kaas-node-bcedb87b-b3ce-46a4-a4ca-ea3068689e40   Ready    master   37h   v1.27.10-mirantis-1
    kaas-node-8a24b81c-76d0-4d4c-8421-962bd39df5ad   Ready    master   37h   v1.27.10-mirantis-1
    kaas-node-ad5a6f51-b98f-43c3-91d5-55fed3d0ff21   Ready    master   37h   v1.27.10-mirantis-1
    
  3. Delete the dhcp-relay pod:

    kubectl -n kaas delete pod <dhcp-relay-xxxxx>
    
  4. Verify that the dnsmasq and dhcp-relay pods are scheduled into different nodes:

    kubectl -n kaas get pods -o wide | grep -e "dhcp\|dnsmasq"
    

    Example of a positive system response:

    dhcp-relay-7d85f75f76-rkv03   2/2   Running   0             49s   10.10.0.121     kaas-node-bcedb87b-b3ce-46a4-a4ca-ea3068689e40   <none>   <none>
    dnsmasq-8f4b484b4-slhbd       5/5   Running   1 (37h ago)   37h   10.233.123.75   kaas-node-8a24b81c-76d0-4d4c-8421-962bd39df5ad   <none>   <none>
    
[24005] Deletion of a node with ironic Pod is stuck in the Terminating state

During deletion of a manager machine running the ironic Pod from a bare metal management cluster, the following problems occur:

  • All Pods are stuck in the Terminating state

  • A new ironic Pod fails to start

  • The related bare metal host is stuck in the deprovisioning state

As a workaround, before deletion of the node running the ironic Pod, cordon and drain the node using the kubectl cordon <nodeName> and kubectl drain <nodeName> commands.


LCM
[39437] Failure to replace a master node on a Container Cloud cluster

Fixed in 2.29.0 (17.4.0 and 16.4.0)

During the replacement of a master node on a cluster of any type, the process may get stuck with Kubelet's NodeReady condition is Unknown in the machine status on the remaining master nodes.

As a workaround, log in on the affected node and run the following command:

docker restart ucp-kubelet
[31186,34132] Pods get stuck during MariaDB operations

During MariaDB operations on a management cluster, Pods may get stuck in continuous restarts with the following example error:

[ERROR] WSREP: Corrupt buffer header: \
addr: 0x7faec6f8e518, \
seqno: 3185219421952815104, \
size: 909455917, \
ctx: 0x557094f65038, \
flags: 11577. store: 49, \
type: 49

Workaround:

  1. Create a backup of the /var/lib/mysql directory on the mariadb-server Pod.

  2. Verify that other replicas are up and ready.

  3. Remove the galera.cache file for the affected mariadb-server Pod.

  4. Remove the affected mariadb-server Pod or wait until it is automatically restarted.

After Kubernetes restarts the Pod, the Pod clones the database in 1-2 minutes and restores the quorum.

[30294] Replacement of a master node is stuck on the calico-node Pod start

Fixed in 2.28.4 (17.3.4 and 16.3.4)

During replacement of a master node on a cluster of any type, the calico-node Pod fails to start on a new node that has the same IP address as the node being replaced.

Workaround:

  1. Log in to any master node.

  2. From a CLI with an MKE client bundle, create a shell alias to start calicoctl using the mirantis/ucp-dsinfo image. Two alias variants follow; they differ only in the location of the etcd TLS certificates, so use the one that matches the certificate layout of your cluster:

    alias calicoctl="\
    docker run -i --rm \
    --pid host \
    --net host \
    -e constraint:ostype==linux \
    -e ETCD_ENDPOINTS=<etcdEndpoint> \
    -e ETCD_KEY_FILE=/var/lib/docker/volumes/ucp-kv-certs/_data/key.pem \
    -e ETCD_CA_CERT_FILE=/var/lib/docker/volumes/ucp-kv-certs/_data/ca.pem \
    -e ETCD_CERT_FILE=/var/lib/docker/volumes/ucp-kv-certs/_data/cert.pem \
    -v /var/run/calico:/var/run/calico \
    -v /var/lib/docker/volumes/ucp-kv-certs/_data:/var/lib/docker/volumes/ucp-kv-certs/_data:ro \
    mirantis/ucp-dsinfo:<mkeVersion> \
    calicoctl \
    "
    
    alias calicoctl="\
    docker run -i --rm \
    --pid host \
    --net host \
    -e constraint:ostype==linux \
    -e ETCD_ENDPOINTS=<etcdEndpoint> \
    -e ETCD_KEY_FILE=/ucp-node-certs/key.pem \
    -e ETCD_CA_CERT_FILE=/ucp-node-certs/ca.pem \
    -e ETCD_CERT_FILE=/ucp-node-certs/cert.pem \
    -v /var/run/calico:/var/run/calico \
    -v ucp-node-certs:/ucp-node-certs:ro \
    mirantis/ucp-dsinfo:<mkeVersion> \
    calicoctl --allow-version-mismatch \
    "
    

    In the above command, replace the following values with the corresponding settings of the affected cluster:

    • <etcdEndpoint> is the etcd endpoint defined in the Calico configuration file. For example, ETCD_ENDPOINTS=127.0.0.1:12378

    • <mkeVersion> is the MKE version installed on your cluster. For example, mirantis/ucp-dsinfo:3.5.7.

  3. Verify the node list on the cluster:

    kubectl get node
    
  4. Compare this list with the node list in Calico to identify the old node:

    calicoctl get node -o wide
    
  5. Remove the old node from Calico:

    calicoctl delete node kaas-node-<nodeID>
    
[5782] Manager machine fails to be deployed during node replacement

Fixed in 2.28.4 (17.3.4 and 16.3.4)

During replacement of a manager machine, the following problems may occur:

  • The system adds the node to Docker swarm but not to Kubernetes

  • The node Deployment gets stuck with failed RethinkDB health checks

Workaround:

  1. Delete the failed node.

  2. Wait for the MKE cluster to become healthy. To monitor the cluster status:

    1. Log in to the MKE web UI as described in Connect to the Mirantis Kubernetes Engine web UI.

    2. Monitor the cluster status as described in MKE Operations Guide: Monitor an MKE cluster with the MKE web UI.

  3. Deploy a new node.

[5568] The calico-kube-controllers Pod fails to clean up resources

Fixed in 2.28.4 (17.3.4 and 16.3.4)

During the unsafe or forced deletion of a manager machine running the calico-kube-controllers Pod in the kube-system namespace, the following issues occur:

  • The calico-kube-controllers Pod fails to clean up resources associated with the deleted node

  • The calico-node Pod may fail to start up on a newly created node if the machine is provisioned with the same IP address as the deleted machine had

As a workaround, before deletion of the node running the calico-kube-controllers Pod, cordon and drain the node:

kubectl cordon <nodeName>
kubectl drain <nodeName>

Ceph
[50566] Ceph upgrade is very slow during patch or major cluster update

Due to the upstream Ceph issue 66717, during CVE upgrade of the Ceph daemon image of Ceph Reef 18.2.4, OSDs may start slowly and even fail the startup probe with the following describe output in the rook-ceph-osd-X pod:

 Warning  Unhealthy  57s (x16 over 3m27s)  kubelet  Startup probe failed:
 ceph daemon health check failed with the following output:
> no valid command found; 10 closest matches:
> 0
> 1
> 2
> abort
> assert
> bluefs debug_inject_read_zeros
> bluefs files list
> bluefs stats
> bluestore bluefs device info [<alloc_size:int>]
> config diff
> admin_socket: invalid command

Workaround:

Complete the following steps during every patch or major cluster update of the Cluster releases 17.2.x, 17.3.x, and 17.4.x (until Ceph 18.2.5 becomes supported):

  1. Plan extra time in the maintenance window for the patch cluster update.

    Slow starts will still impact the update procedure, but after completing the following step, the recovery process noticeably shortens without affecting the overall cluster state and data responsiveness.

  2. Select one of the following options:

    • Before the cluster update, set the noout flag:

      ceph osd set noout
      

      Once the Ceph OSDs image upgrade is done, unset the flag:

      ceph osd unset noout
      
    • Monitor the Ceph OSDs image upgrade. If the symptoms of slow start appear, set the noout flag as soon as possible. Once the Ceph OSDs image upgrade is done, unset the flag.

[26441] Cluster update fails with the MountDevice failed for volume warning

Update of a managed cluster based on bare metal with Ceph enabled fails with the PersistentVolumeClaim getting stuck in the Pending state for the prometheus-server StatefulSet and the MountVolume.MountDevice failed for volume warning in the StackLight event logs.

Workaround:

  1. Verify that the description of the Pods that failed to run contains the FailedMount events:

    kubectl -n <affectedProjectName> describe pod <affectedPodName>
    

    In the command above, replace the following values:

    • <affectedProjectName> is the Container Cloud project name where the Pods failed to run

    • <affectedPodName> is a Pod name that failed to run in the specified project

    In the Pod description, identify the node name where the Pod failed to run.

  2. Verify that the csi-rbdplugin logs of the affected node contain the rbd volume mount failed: <csi-vol-uuid> is being used error. The <csi-vol-uuid> is a unique RBD volume name.

    1. Identify csiPodName of the corresponding csi-rbdplugin:

      kubectl -n rook-ceph get pod -l app=csi-rbdplugin \
      -o jsonpath='{.items[?(@.spec.nodeName == "<nodeName>")].metadata.name}'
      
    2. Output the affected csiPodName logs:

      kubectl -n rook-ceph logs <csiPodName> -c csi-rbdplugin
      
  3. Scale down the affected StatefulSet or Deployment of the failing Pod to 0 replicas.

  4. On every csi-rbdplugin Pod, search for stuck csi-vol:

    for pod in `kubectl -n rook-ceph get pods|grep rbdplugin|grep -v provisioner|awk '{print $1}'`; do
      echo $pod
      kubectl exec -it -n rook-ceph $pod -c csi-rbdplugin -- rbd device list | grep <csi-vol-uuid>
    done
    
  5. Unmap the affected csi-vol:

    rbd unmap -o force /dev/rbd<i>
    

    The /dev/rbd<i> value is a mapped RBD volume that uses csi-vol.

  6. Delete volumeattachment of the affected Pod:

    kubectl get volumeattachments | grep <csi-vol-uuid>
    kubectl delete volumeattachment <id>
    
  7. Scale up the affected StatefulSet or Deployment back to the original number of replicas and wait until its state becomes Running.

Container Cloud web UI
[50181] Failure to deploy a compact cluster

A compact MOSK cluster fails to be deployed through the Container Cloud web UI because the web UI does not allow adding any label to the control plane machines or changing dedicatedControlPlane: false.

To work around the issue, manually add the required labels using the CLI. Once done, the cluster deployment resumes.

[50168] Inability to use a new project right after creation

A newly created project does not display all available tabs in the Container Cloud web UI and shows various access denied errors during the first five minutes after creation.

To work around the issue, refresh the browser five minutes after the project creation.

Update notes

This section describes the specific actions you as a cloud operator need to complete before or after your Container Cloud cluster update to the Cluster releases 17.1.6, 16.2.1, or 16.1.6.

Consider this information as a supplement to the generic update procedures published in Operations Guide: Automatic upgrade of a management cluster and Update a patch Cluster release of a managed cluster.

Post-update actions
Prepare for changing label values in Ceph metrics used in customizations

Note

If you do not use Ceph metrics in any customizations, for example, custom alerts, Grafana dashboards, or queries in custom workloads, skip this section.

In Container Cloud 2.27.0, the performance metric exporter integrated into the Ceph Manager daemon was deprecated in favor of the dedicated Ceph Exporter daemon. If you use Ceph metrics in any customizations such as custom alerts, Grafana dashboards, or queries in custom tools, prepare to update the values of several labels in these metrics. These labels will be changed in Container Cloud 2.28.0 (Cluster releases 16.3.0 and 17.3.0).

Note

Names of metrics will not be changed, and no metrics will be removed.

All Ceph metrics to be collected by the Ceph Exporter daemon will change their job and instance labels because the metrics will be scraped from the new Ceph Exporter daemon instead of the performance metric exporter of Ceph Manager:

  • Values of the job labels will be changed from rook-ceph-mgr to prometheus-rook-exporter for all Ceph metrics moved to Ceph Exporter. The full list of moved metrics is presented below.

  • Values of the instance labels will be changed from the metric endpoint of Ceph Manager with port 9283 to the metric endpoint of Ceph Exporter with port 9926 for all Ceph metrics moved to Ceph Exporter. The full list of moved metrics is presented below.

  • Values of the instance_id labels of Ceph metrics from the RADOS Gateway (RGW) daemons will be changed from the daemon GID to the daemon subname. For example, instead of instance_id="<RGW_PROCESS_GID>", instance_id="a" (ceph_rgw_qlen{instance_id="a"}) will be used. The list of moved Ceph RGW metrics is presented below, and an example of updating a custom query follows the metric lists.

List of affected Ceph RGW metrics
  • ceph_rgw_cache_.*

  • ceph_rgw_failed_req

  • ceph_rgw_gc_retire_object

  • ceph_rgw_get.*

  • ceph_rgw_keystone_.*

  • ceph_rgw_lc_.*

  • ceph_rgw_lua_.*

  • ceph_rgw_pubsub_.*

  • ceph_rgw_put.*

  • ceph_rgw_qactive

  • ceph_rgw_qlen

  • ceph_rgw_req

List of all metrics to be collected by Ceph Exporter instead of Ceph Manager
  • ceph_bluefs_.*

  • ceph_bluestore_.*

  • ceph_mds_cache_.*

  • ceph_mds_caps

  • ceph_mds_ceph_.*

  • ceph_mds_dir_.*

  • ceph_mds_exported_inodes

  • ceph_mds_forward

  • ceph_mds_handle_.*

  • ceph_mds_imported_inodes

  • ceph_mds_inodes.*

  • ceph_mds_load_cent

  • ceph_mds_log_.*

  • ceph_mds_mem_.*

  • ceph_mds_openino_dir_fetch

  • ceph_mds_process_request_cap_release

  • ceph_mds_reply_.*

  • ceph_mds_request

  • ceph_mds_root_.*

  • ceph_mds_server_.*

  • ceph_mds_sessions_.*

  • ceph_mds_slow_reply

  • ceph_mds_subtrees

  • ceph_mon_election_.*

  • ceph_mon_num_.*

  • ceph_mon_session_.*

  • ceph_objecter_.*

  • ceph_osd_numpg.*

  • ceph_osd_op.*

  • ceph_osd_recovery_.*

  • ceph_osd_stat_.*

  • ceph_paxos.*

  • ceph_prioritycache.*

  • ceph_purge.*

  • ceph_rgw_cache_.*

  • ceph_rgw_failed_req

  • ceph_rgw_gc_retire_object

  • ceph_rgw_get.*

  • ceph_rgw_keystone_.*

  • ceph_rgw_lc_.*

  • ceph_rgw_lua_.*

  • ceph_rgw_pubsub_.*

  • ceph_rgw_put.*

  • ceph_rgw_qactive

  • ceph_rgw_qlen

  • ceph_rgw_req

  • ceph_rocksdb_.*

Artifacts

This section lists the artifacts of components included in the Container Cloud patch release 2.27.1. For artifacts of the Cluster releases introduced in 2.27.1, see patch Cluster releases 16.2.1, 16.1.6, and 17.1.6.

Note

The components that are newly added, updated, deprecated, or removed as compared to the previous release version, are marked with a corresponding superscript, for example, lcm-ansible Updated.

Bare metal artifacts

Artifact

Component

Path

Binaries Updated

ironic-python-agent.initramfs

https://binary.mirantis.com/bm/bin/ironic/ipa/ubuntu/initramfs-yoga-focal-debug-20240627104414

ironic-python-agent.kernel

https://binary.mirantis.com/bm/bin/ironic/ipa/ubuntu/kernel-yoga-focal-debug-20240627104414

Helm charts Updated

baremetal-api

https://binary.mirantis.com/core/helm/baremetal-api-1.40.15.tgz

baremetal-operator

https://binary.mirantis.com/core/helm/baremetal-operator-1.40.15.tgz

baremetal-provider

https://binary.mirantis.com/core/helm/baremetal-provider-1.40.15.tgz

baremetal-public-api

https://binary.mirantis.com/core/helm/baremetal-public-api-1.40.15.tgz

kaas-ipam

https://binary.mirantis.com/core/helm/kaas-ipam-1.40.15.tgz

metallb

https://binary.mirantis.com/core/helm/metallb-1.40.15.tgz

Docker images

ambasador Updated

mirantis.azurecr.io/core/external/nginx:1.40.15

baremetal-dnsmasq Updated

mirantis.azurecr.io/bm/baremetal-dnsmasq:base-2-27-alpine-20240701130209

baremetal-operator Updated

mirantis.azurecr.io/bm/baremetal-operator:base-2-27-alpine-20240701130001

bm-collective Updated

mirantis.azurecr.io/bm/bm-collective:base-2-27-alpine-20240701130719

cluster-api-provider-baremetal Updated

mirantis.azurecr.io/core/cluster-api-provider-baremetal:1.40.15

ironic

mirantis.azurecr.io/openstack/ironic:antelope-jammy-20240522120643

ironic-inspector

mirantis.azurecr.io/openstack/ironic-inspector:antelope-jammy-20240522120643

ironic-prometheus-exporter

mirantis.azurecr.io/stacklight/ironic-prometheus-exporter:0.1-20240117102150

kaas-ipam Updated

mirantis.azurecr.io/bm/kaas-ipam:base-2-27-alpine-20240701133222

kubernetes-entrypoint

mirantis.azurecr.io/openstack/extra/kubernetes-entrypoint:v1.0.1-ba8ada4-20240405150338

mariadb

mirantis.azurecr.io/general/mariadb:10.6.17-focal-20240523075821

mcc-keepalived Updated

mirantis.azurecr.io/lcm/mcc-keepalived:v0.25.0-40-g890ffca

metallb-controller

mirantis.azurecr.io/bm/metallb/controller:v0.14.5-e86184d9-amd64

metallb-speaker

mirantis.azurecr.io/bm/metallb/speaker:v0.14.5-e86184d9-amd64

syslog-ng Updated

mirantis.azurecr.io/bm/syslog-ng:base-alpine-20240701125905

Core artifacts

Artifact

Component

Path

Bootstrap tarball Updated

bootstrap-darwin

https://binary.mirantis.com/core/bin/bootstrap-darwin-1.40.15.tgz

bootstrap-linux

https://binary.mirantis.com/core/bin/bootstrap-linux-1.40.15.tgz

Helm charts Updated

admission-controller

https://binary.mirantis.com/core/helm/admission-controller-1.40.15.tgz

agent-controller

https://binary.mirantis.com/core/helm/agent-controller-1.40.15.tgz

byo-provider

https://binary.mirantis.com/core/helm/byo-provider-1.40.15.tgz

ceph-kcc-controller

https://binary.mirantis.com/core/helm/ceph-kcc-controller-1.40.15.tgz

cert-manager

https://binary.mirantis.com/core/helm/cert-manager-1.40.15.tgz

configuration-collector

https://binary.mirantis.com/core/helm/configuration-collector-1.40.15.tgz

event-controller

https://binary.mirantis.com/core/helm/event-controller-1.40.15.tgz

host-os-modules-controller

https://binary.mirantis.com/core/helm/host-os-modules-controller-1.40.15.tgz

iam-controller

https://binary.mirantis.com/core/helm/iam-controller-1.40.15.tgz

kaas-exporter

https://binary.mirantis.com/core/helm/kaas-exporter-1.40.15.tgz

kaas-public-api

https://binary.mirantis.com/core/helm/kaas-public-api-1.40.15.tgz

kaas-ui

https://binary.mirantis.com/core/helm/kaas-ui-1.40.15.tgz

lcm-controller

https://binary.mirantis.com/core/helm/lcm-controller-1.40.15.tgz

license-controller

https://binary.mirantis.com/core/helm/license-controller-1.40.15.tgz

machinepool-controller

https://binary.mirantis.com/core/helm/machinepool-controller-1.40.15.tgz

mcc-cache

https://binary.mirantis.com/core/helm/mcc-cache-1.40.15.tgz

mcc-cache-warmup

https://binary.mirantis.com/core/helm/mcc-cache-warmup-1.40.15.tgz

openstack-provider

https://binary.mirantis.com/core/helm/openstack-provider-1.40.15.tgz

os-credentials-controller

https://binary.mirantis.com/core/helm/os-credentials-controller-1.40.15.tgz

portforward-controller

https://binary.mirantis.com/core/helm/portforward-controller-1.40.15.tgz

rbac-controller

https://binary.mirantis.com/core/helm/rbac-controller-1.40.15.tgz

release-controller

https://binary.mirantis.com/core/helm/release-controller-1.40.15.tgz

scope-controller

https://binary.mirantis.com/core/helm/scope-controller-1.40.15.tgz

secret-controller

https://binary.mirantis.com/core/helm/secret-controller-1.40.15.tgz

squid-proxy

https://binary.mirantis.com/core/helm/squid-proxy-1.40.15.tgz

storage-discovery

https://binary.mirantis.com/core/helm/storage-discovery-1.40.15.tgz

user-controller

https://binary.mirantis.com/core/helm/user-controller-1.40.15.tgz

vsphere-credentials-controller

https://binary.mirantis.com/core/helm/vsphere-credentials-controller-1.40.15.tgz

vsphere-provider

https://binary.mirantis.com/core/helm/vsphere-provider-1.40.15.tgz

vsphere-vm-template-controller

https://binary.mirantis.com/core/helm/vsphere-vm-template-controller-1.40.15.tgz

Docker images

admission-controller Updated

mirantis.azurecr.io/core/admission-controller:1.40.15

agent-controller Updated

mirantis.azurecr.io/core/agent-controller:1.40.15

byo-cluster-api-controller Updated

mirantis.azurecr.io/core/byo-cluster-api-controller:1.40.15

ceph-kcc-controller Updated

mirantis.azurecr.io/core/ceph-kcc-controller:1.40.15

cert-manager-controller

mirantis.azurecr.io/core/external/cert-manager-controller:v1.11.0-6

configuration-collector Updated

mirantis.azurecr.io/core/configuration-collector:1.40.15

event-controller Updated

mirantis.azurecr.io/core/event-controller:1.40.15

frontend Updated

mirantis.azurecr.io/core/frontend:1.40.15

host-os-modules-controller Updated

mirantis.azurecr.io/core/host-os-modules-controller:1.40.15

iam-controller Updated

mirantis.azurecr.io/core/iam-controller:1.40.15

kaas-exporter Updated

mirantis.azurecr.io/core/kaas-exporter:1.40.15

kproxy Updated

mirantis.azurecr.io/core/kproxy:1.40.15

lcm-controller Updated

mirantis.azurecr.io/core/lcm-controller:1.40.15

license-controller Updated

mirantis.azurecr.io/core/license-controller:1.40.15

machinepool-controller Updated

mirantis.azurecr.io/core/machinepool-controller:1.40.15

mcc-haproxy Updated

mirantis.azurecr.io/lcm/mcc-haproxy:v0.25.0-40-g890ffca

mcc-keepalived Updated

mirantis.azurecr.io/lcm/mcc-keepalived:v0.25.0-40-g890ffca

nginx Updated

mirantis.azurecr.io/core/external/nginx:1.40.15

openstack-cluster-api-controller Updated

mirantis.azurecr.io/core/openstack-cluster-api-controller:1.40.15

os-credentials-controller Updated

mirantis.azurecr.io/core/os-credentials-controller:1.40.15

portforward-controller Updated

mirantis.azurecr.io/core/portforward-controller:1.40.15

rbac-controller Updated

mirantis.azurecr.io/core/rbac-controller:1.40.15

registry Updated

mirantis.azurecr.io/lcm/registry:v2.8.1-10

release-controller Updated

mirantis.azurecr.io/core/release-controller:1.40.15

scope-controller Updated

mirantis.azurecr.io/core/scope-controller:1.40.15

secret-controller Updated

mirantis.azurecr.io/core/secret-controller:1.40.15

squid-proxy

mirantis.azurecr.io/lcm/squid-proxy:0.0.1-10-g24a0d69

storage-discovery Updated

mirantis.azurecr.io/core/storage-discovery:1.40.15

user-controller Updated

mirantis.azurecr.io/core/user-controller:1.40.15

vsphere-cluster-api-controller Updated

mirantis.azurecr.io/core/vsphere-cluster-api-controller:1.40.15

vsphere-credentials-controller Updated

mirantis.azurecr.io/core/vsphere-credentials-controller:1.40.15

vsphere-vm-template-controller Updated

mirantis.azurecr.io/core/vsphere-vm-template-controller:1.40.15

IAM artifacts

Artifact

Component

Path

Binaries

hash-generate-darwin

http://binary.mirantis.com/iam/bin/hash-generate-0.0.1-268-3cf7f17-darwin

hash-generate-linux

http://binary.mirantis.com/iam/bin/hash-generate-0.0.1-268-3cf7f17-linux

Helm charts Updated

iam

https://binary.mirantis.com/core/helm/iam-1.40.15.tgz

Docker images

kubectl

mirantis.azurecr.io/stacklight/kubectl:1.22-20240501023013

kubernetes-entrypoint

mirantis.azurecr.io/openstack/extra/kubernetes-entrypoint:v1.0.1-ba8ada4-20240405150338

mariadb

mirantis.azurecr.io/general/mariadb:10.6.17-focal-20240523075821

mcc-keycloak Updated

mirantis.azurecr.io/iam/mcc-keycloak:24.0.5-20240621131831

2.27.0

The Mirantis Container Cloud major release 2.27.0:

  • Introduces support for the Cluster release 17.2.0 that is based on the Cluster release 16.2.0 and represents Mirantis OpenStack for Kubernetes (MOSK) 24.2.

  • Introduces support for the Cluster release 16.2.0 that is based on Mirantis Container Runtime (MCR) 23.0.11 and Mirantis Kubernetes Engine (MKE) 3.7.8 with Kubernetes 1.27.

  • Does not support greenfield deployments on deprecated Cluster releases of the 17.1.x and 16.1.x series. Use the latest available Cluster releases of the series instead.

    Caution

    Make sure to update the Cluster release version of your managed cluster before the current Cluster release version becomes unsupported by a new Container Cloud release version. Otherwise, Container Cloud stops auto-upgrade and eventually Container Cloud itself becomes unsupported.

This section outlines release notes for the Container Cloud release 2.27.0.

Enhancements

This section outlines new features and enhancements introduced in the Container Cloud release 2.27.0. For the list of enhancements delivered with the Cluster releases introduced by Container Cloud 2.27.0, see 17.2.0 and 16.2.0.

General availability for Ubuntu 22.04 on bare metal clusters

Implemented full support for Ubuntu 22.04 LTS (Jammy Jellyfish) as the default host operating system that now installs on non-MOSK bare metal management and managed clusters.

For MOSK:

  • Existing management clusters are automatically updated to Ubuntu 22.04 during cluster upgrade to Container Cloud 2.27.0 (Cluster release 16.2.0).

  • Greenfield deployments of management clusters are based on Ubuntu 22.04.

  • Existing and greenfield deployments of managed clusters are still based on Ubuntu 20.04. The support for Ubuntu 22.04 on this cluster type will be announced in one of the following releases.

Caution

Upgrading from Ubuntu 20.04 to 22.04 on existing deployments of Container Cloud managed clusters is not supported.

Improvements in the day-2 management API for bare metal clusters

TechPreview

Enhanced the day-2 management API of the bare metal provider with several key improvements:

  • Implemented the sysctl, package, and irqbalance configuration modules, which become available for use after your management cluster upgrade to the Cluster release 16.2.0. These Container Cloud modules use the designated HostOSConfiguration object named mcc-modules to distinguish them from custom modules. For a quick verification example, see the commands after this list.

    Configuration modules allow managing the operating system of a bare metal host granularly without rebuilding the node from scratch. This approach avoids workload evacuation and significantly reduces configuration time.

  • Optimized performance for faster, more efficient operations.

  • Enhanced user experience for easier and more intuitive interactions.

  • Resolved various internal issues to ensure smoother functionality.

  • Added comprehensive documentation, including concepts, guidelines, and recommendations for effective use of day-2 operations.
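For example, after the management cluster upgrade to the Cluster release 16.2.0, you can verify that the modules and the designated configuration objects are in place. The commands below are a minimal sketch: the kubeconfig path and the project name are placeholders, hostosconfigurationmodules is a cluster-scoped resource, and hostosconfigurations is namespaced.

# List the configuration modules available on the management cluster
kubectl --kubeconfig <mgmtKubeconfig> get hostosconfigurationmodules

# List HostOSConfiguration objects, including the designated mcc-modules object
kubectl --kubeconfig <mgmtKubeconfig> -n <projectName> get hostosconfigurations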

Optimization of strict filtering for devices on bare metal clusters

Optimized the BareMetalHostProfile custom resource: the strict byID filtering now targets system disks using the reliable byPath, serialNumber, and wwn device options instead of the unpredictable byName naming format.

The optimization includes changes in admission-controller that now blocks the use of bmhp:spec:devices:by_name in new BareMetalHostProfile objects.
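To identify the reliable device identifiers to reference in a BareMetalHostProfile object, you can inspect the target host directly. The following commands are a generic Linux sketch rather than part of the Container Cloud tooling, and /dev/sda is a placeholder for the disk you plan to target:

# List the stable by-path symlinks of block devices on the host
ls -l /dev/disk/by-path/

# Print the serial number and WWN that udev reports for a specific disk
udevadm info --query=property --name=/dev/sda | grep -E 'ID_SERIAL=|ID_WWN='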

Deprecation of SubnetPool and MetalLBConfigTemplate objects

As part of refactoring of the bare metal provider, deprecated the SubnetPool and MetalLBConfigTemplate objects. The objects will be completely removed from the product in one of the following releases.

Both objects are automatically migrated to the MetalLBConfig object during cluster update to the Cluster release 17.2.0 or 16.2.0.

Learn more

Deprecation notes

The ClusterUpdatePlan object for a granular cluster update

TechPreview

Implemented the ClusterUpdatePlan custom resource to enable a granular step-by-step update of a managed cluster. The operator can control the update process by manually launching update stages using the commence flag. Between the update stages, a cluster remains functional from the perspective of cloud users and workloads.

A ClusterUpdatePlan object is automatically created by the respective Container Cloud provider when a new Cluster release becomes available for your cluster. This object contains a list of predefined self-descriptive update steps that are cluster-specific. These steps are defined in the spec section of the object with information about their impact on the cluster.
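A minimal interaction sketch with this object follows. The resource and object names are placeholders, and the exact layout of the steps is defined by the provider; refer to the granular update procedure in the Operations Guide for the authoritative workflow:

# Review the generated plan and its update steps
kubectl -n <projectName> get clusterupdateplan <planName> -o yaml

# Launch the next update stage by setting the commence flag of that step to true
kubectl -n <projectName> edit clusterupdateplan <planName>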

Update groups for worker machines

Implemented the UpdateGroup custom resource for creating update groups for worker machines on managed clusters. The use of update groups provides enhanced control over the update of worker machines. This feature decouples the concurrency settings from the global cluster level, providing update flexibility based on the workload characteristics of different worker machine sets.

LCM Agent heartbeats

Implemented the same heartbeat model for the LCM Agent as Kubernetes uses for Nodes. This model allows reflecting the actual status of the LCM Agent when it fails. For visual representation, added the corresponding LCM Agent status to the Container Cloud web UI for clusters and machines. It reflects the health status of the LCM Agent along with the status of its update to the version from the current Cluster release.

Handling secret leftovers using secret-controller

Implemented secret-controller that runs on a management cluster and cleans up leftover credential secrets that are not removed automatically after new secrets are created. This controller replaces rhellicense-controller, proxy-controller, and byo-credentials-controller and partially replaces the functionality of license-controller and other credential controllers.

Note

You can change memory limits for secret-controller on a management cluster using the resources:limits parameter in the spec:providerSpec:value:kaas:management:helmReleases: section of the Cluster object.
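A minimal sketch of such an override in the Cluster object follows. The nesting of resources:limits under the values key is an assumption based on the usual Helm release override layout, and the memory value is only an example:

spec:
  providerSpec:
    value:
      kaas:
        management:
          helmReleases:
          - name: secret-controller
            values:
              resources:
                limits:
                  memory: 512Mi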

MariaDB backup for bare metal and vSphere providers

Implemented the capability to back up and restore MariaDB databases on management clusters for bare metal and vSphere providers. Also, added documentation on how to change the storage node for backups on clusters of these provider types.

Addressed issues

The following issues have been addressed in the Mirantis Container Cloud release 2.27.0 along with the Cluster releases 17.2.0 and 16.2.0.

Note

This section provides descriptions of issues addressed since the last Container Cloud patch release 2.26.5.

For details on addressed issues in earlier patch releases since 2.26.0, which are also included into the major release 2.27.0, refer to 2.26.x patch releases.

  • [42304] [StackLight] Fixed the issue with failure of shard relocation in the OpenSearch cluster on large Container Cloud managed clusters.

  • [41890] [StackLight] Fixed the issue with Patroni failing to start because of the short default timeout.

  • [40020] [StackLight] Fixed the issue with rollover_policy not being applied to the current indices while updating the policy for the current system* and audit* data streams.

  • [41819] [Ceph] Fixed the issue with the graceful cluster reboot being blocked by active Ceph ClusterWorkloadLock objects.

  • [28865] [LCM] Fixed the issue with validation of the NTP configuration before cluster deployment. Now, deployment does not start until the NTP configuration is validated.

Known issues

This section lists known issues with workarounds for the Mirantis Container Cloud release 2.27.0 including the Cluster releases 17.2.0 and 16.2.0.

For other issues that can occur while deploying and operating a Container Cloud cluster, see Deployment Guide: Troubleshooting and Operations Guide: Troubleshooting.

Note

This section also outlines still valid known issues from previous Container Cloud releases.

Bare metal
[47202] Inspection error on bare metal hosts after dnsmasq restart

Note

Moving forward, the workaround for this issue will be moved from Release Notes to MOSK Troubleshooting Guide: Inspection error on bare metal hosts after dnsmasq restart.

If the dnsmasq pod is restarted during the bootstrap of newly added nodes, those nodes may fail to undergo inspection. This can result in an inspection error in the corresponding BareMetalHost objects.

The issue can occur when:

  • The dnsmasq pod was moved to another node.

  • DHCP subnets were changed, including addition or removal. In this case, the dhcpd container of the dnsmasq pod is restarted.

    Caution

    If you need to change or add DHCP subnets to bootstrap new nodes, wait until the dnsmasq pod becomes ready after the change, and only then create the BareMetalHost objects.

To verify whether the nodes are affected:

  1. Verify whether the BareMetalHost objects contain the inspection error:

    kubectl get bmh -n <managed-cluster-namespace-name>
    

    Example of system response:

    NAME            STATE         CONSUMER        ONLINE   ERROR              AGE
    test-master-1   provisioned   test-master-1   true                        9d
    test-master-2   provisioned   test-master-2   true                        9d
    test-master-3   provisioned   test-master-3   true                        9d
    test-worker-1   provisioned   test-worker-1   true                        9d
    test-worker-2   provisioned   test-worker-2   true                        9d
    test-worker-3   inspecting                    true     inspection error   19h
    
  2. Verify whether the dnsmasq pod was in the Ready state when the inspection of the affected bare metal hosts (test-worker-3 in the example above) was started:

    kubectl -n kaas get pod <dnsmasq-pod-name> -oyaml
    

    Example of system response:

    ...
    status:
      conditions:
      - lastProbeTime: null
        lastTransitionTime: "2024-10-10T15:37:34Z"
        status: "True"
        type: Initialized
      - lastProbeTime: null
        lastTransitionTime: "2024-10-11T07:38:54Z"
        status: "True"
        type: Ready
      - lastProbeTime: null
        lastTransitionTime: "2024-10-11T07:38:54Z"
        status: "True"
        type: ContainersReady
      - lastProbeTime: null
        lastTransitionTime: "2024-10-10T15:37:34Z"
        status: "True"
        type: PodScheduled
      containerStatuses:
      - containerID: containerd://6dbcf2fc4b36ce4c549c9191ab01f72d0236c51d42947675302675e4bfaf4cdf
        image: docker-dev-kaas-virtual.artifactory-eu.mcp.mirantis.net/bm/baremetal-dnsmasq:base-2-28-alpine-20240812132650
        imageID: docker-dev-kaas-virtual.artifactory-eu.mcp.mirantis.net/bm/baremetal-dnsmasq@sha256:3dad3e278add18e69b2608e462691c4823942641a0f0e25e6811e703e3c23b3b
        lastState:
          terminated:
            containerID: containerd://816fcf079cd544acd74e312065de5b5ed4dbf1dc6159fefffff4f644b5e45987
            exitCode: 0
            finishedAt: "2024-10-11T07:38:35Z"
            reason: Completed
            startedAt: "2024-10-10T15:37:45Z"
        name: dhcpd
        ready: true
        restartCount: 2
        started: true
        state:
          running:
            startedAt: "2024-10-11T07:38:37Z"
      ...
    

    In the system response above, the dhcpd container was not ready between "2024-10-11T07:38:35Z" and "2024-10-11T07:38:54Z".

  3. Verify the affected bare metal host. For example:

    kubectl get bmh -n managed-ns test-worker-3 -oyaml
    

    Example of system response:

    ...
    status:
      errorCount: 15
      errorMessage: Introspection timeout
      errorType: inspection error
      ...
      operationHistory:
        deprovision:
          end: null
          start: null
        inspect:
          end: null
          start: "2024-10-11T07:38:19Z"
        provision:
          end: null
          start: null
        register:
          end: "2024-10-11T07:38:19Z"
          start: "2024-10-11T07:37:25Z"
    

    In the system response above, inspection was started at "2024-10-11T07:38:19Z", immediately before the period of the dhcpd container downtime. Therefore, this node is most likely affected by the issue.

Workaround

  1. Reboot the node using the IPMI reset or cycle command.

  2. If the node fails to boot, remove the failed BareMetalHost object and create it again:

    1. Remove the BareMetalHost object. For example:

      kubectl delete bmh -n managed-ns test-worker-3
      
    2. Verify that the BareMetalHost object is removed:

      kubectl get bmh -n managed-ns test-worker-3
      
    3. Create a BareMetalHost object from the template. For example:

      kubectl create -f bmhc-test-worker-3.yaml
      kubectl create -f bmh-test-worker-3.yaml
      
[46245] Lack of access permissions for HOC and HOCM objects

Fixed in 2.28.0 (17.3.0 and 16.3.0)

When trying to list the HostOSConfigurationModules and HostOSConfiguration custom resources, serviceuser or a user with the global-admin or operator role obtains the access denied error. For example:

kubectl --kubeconfig ~/.kube/mgmt-config get hocm

Error from server (Forbidden): hostosconfigurationmodules.kaas.mirantis.com is forbidden:
User "2d74348b-5669-4c65-af31-6c05dbedac5f" cannot list resource "hostosconfigurationmodules"
in API group "kaas.mirantis.com" at the cluster scope: access denied

Workaround:

  1. Modify the global-admin role by adding a new entry with the following contents to the rules list:

    kubectl edit clusterroles kaas-global-admin
    
    - apiGroups: [kaas.mirantis.com]
      resources: [hostosconfigurationmodules]
      verbs: ['*']
    
  2. For each Container Cloud project, modify the kaas-operator role by adding a new entry with the following contents to the rules list:

    kubectl -n <projectName> edit roles kaas-operator
    
    - apiGroups: [kaas.mirantis.com]
      resources: [hostosconfigurations]
      verbs: ['*']
    
[42386] A load balancer service does not obtain the external IP address

Due to a MetalLB upstream issue, a load balancer service may not obtain the external IP address.

The issue occurs when two services share the same external IP address and have the same externalTrafficPolicy value. Initially, the services have the external IP address assigned and are accessible. After modifying the externalTrafficPolicy value for both services from Cluster to Local, the first service that was changed remains with no external IP address assigned, while the second service, which was changed later, has the external IP assigned as expected.

To work around the issue, make a dummy change to the service object where external IP is <pending>:

  1. Identify the service that is stuck:

    kubectl get svc -A | grep pending
    

    Example of system response:

    stacklight  iam-proxy-prometheus  LoadBalancer  10.233.28.196  <pending>  443:30430/TCP
    
  2. Add an arbitrary label to the service that is stuck. For example:

    kubectl label svc -n stacklight iam-proxy-prometheus reconcile=1
    

    Example of system response:

    service/iam-proxy-prometheus labeled
    
  3. Verify that the external IP was allocated to the service:

    kubectl get svc -n stacklight iam-proxy-prometheus
    

    Example of system response:

    NAME                  TYPE          CLUSTER-IP     EXTERNAL-IP  PORT(S)        AGE
    iam-proxy-prometheus  LoadBalancer  10.233.28.196  10.0.34.108  443:30430/TCP  12d
    
[41305] DHCP responses are lost between dnsmasq and dhcp-relay pods

Fixed in 2.28.0 (17.3.0 and 16.3.0)

After node maintenance of a management cluster, the newly added nodes may fail to undergo provisioning successfully. The issue relates to new nodes that are in the same L2 domain as the management cluster.

The issue was observed on environments where management cluster nodes are configured with a single L2 segment used for all network traffic (PXE and LCM/management networks).

To verify whether the cluster is affected:

Verify whether the dnsmasq and dhcp-relay pods run on the same node in the management cluster:

kubectl -n kaas get pods -o wide | grep -e "dhcp\|dnsmasq"

Example of system response:

dhcp-relay-7d85f75f76-5vdw2   2/2   Running   2 (36h ago)   36h   10.10.0.122     kaas-node-8a24b81c-76d0-4d4c-8421-962bd39df5ad   <none>   <none>
dnsmasq-8f4b484b4-slhbd       5/5   Running   1 (36h ago)   36h   10.233.123.75   kaas-node-8a24b81c-76d0-4d4c-8421-962bd39df5ad   <none>   <none>

If this is the case, proceed to the workaround below.

Workaround:

  1. Log in to a node that contains kubeconfig of the affected management cluster.

  2. Make sure that at least two management cluster nodes are schedulable:

    kubectl get node
    

    Example of a positive system response:

    NAME                                             STATUS   ROLES    AGE   VERSION
    kaas-node-bcedb87b-b3ce-46a4-a4ca-ea3068689e40   Ready    master   37h   v1.27.10-mirantis-1
    kaas-node-8a24b81c-76d0-4d4c-8421-962bd39df5ad   Ready    master   37h   v1.27.10-mirantis-1
    kaas-node-ad5a6f51-b98f-43c3-91d5-55fed3d0ff21   Ready    master   37h   v1.27.10-mirantis-1
    
  3. Delete the dhcp-relay pod:

    kubectl -n kaas delete pod <dhcp-relay-xxxxx>
    
  4. Verify that the dnsmasq and dhcp-relay pods are scheduled into different nodes:

    kubectl -n kaas get pods -o wide | grep -e "dhcp\|dnsmasq"
    

    Example of a positive system response:

    dhcp-relay-7d85f75f76-rkv03   2/2   Running   0             49s   10.10.0.121     kaas-node-bcedb87b-b3ce-46a4-a4ca-ea3068689e40   <none>   <none>
    dnsmasq-8f4b484b4-slhbd       5/5   Running   1 (37h ago)   37h   10.233.123.75   kaas-node-8a24b81c-76d0-4d4c-8421-962bd39df5ad   <none>   <none>
    
[24005] Deletion of a node with ironic Pod is stuck in the Terminating state

During deletion of a manager machine running the ironic Pod from a bare metal management cluster, the following problems occur:

  • All Pods are stuck in the Terminating state

  • A new ironic Pod fails to start

  • The related bare metal host is stuck in the deprovisioning state

As a workaround, before deletion of the node running the ironic Pod, cordon and drain the node using the kubectl cordon <nodeName> and kubectl drain <nodeName> commands.
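For reference, these are the commands mentioned above, where <nodeName> is the name of the node to be deleted:

kubectl cordon <nodeName>
kubectl drain <nodeName>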


LCM
[39437] Failure to replace a master node on a Container Cloud cluster

Fixed in 2.29.0 (17.4.0 and 16.4.0)

During the replacement of a master node on a cluster of any type, the process may get stuck with the Kubelet's NodeReady condition is Unknown message in the machine status on the remaining master nodes.

As a workaround, log in on the affected node and run the following command:

docker restart ucp-kubelet
[31186,34132] Pods get stuck during MariaDB operations

During MariaDB operations on a management cluster, Pods may get stuck in continuous restarts with the following example error:

[ERROR] WSREP: Corrupt buffer header: \
addr: 0x7faec6f8e518, \
seqno: 3185219421952815104, \
size: 909455917, \
ctx: 0x557094f65038, \
flags: 11577. store: 49, \
type: 49

Workaround:

  1. Create a backup of the /var/lib/mysql directory on the mariadb-server Pod.

  2. Verify that other replicas are up and ready.

  3. Remove the galera.cache file for the affected mariadb-server Pod.

  4. Remove the affected mariadb-server Pod or wait until it is automatically restarted.

After Kubernetes restarts the Pod, the Pod clones the database in 1-2 minutes and restores the quorum.
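The steps above can be performed, for example, using the following commands. This is a sketch only: the kaas namespace and the mariadb-server-0 Pod name are assumptions, so adjust them to your environment.

# 1. Back up /var/lib/mysql from the affected Pod to the local machine
kubectl -n kaas cp mariadb-server-0:/var/lib/mysql ./mysql-backup

# 2. Verify that the other replicas are up and ready
kubectl -n kaas get pods | grep mariadb

# 3. Remove the galera.cache file from the affected Pod
kubectl -n kaas exec mariadb-server-0 -- rm /var/lib/mysql/galera.cache

# 4. Remove the affected Pod so that Kubernetes restarts it
kubectl -n kaas delete pod mariadb-server-0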

[30294] Replacement of a master node is stuck on the calico-node Pod start

Fixed in 2.28.4 (17.3.4 and 16.3.4)

During replacement of a master node on a cluster of any type, the calico-node Pod fails to start on a new node that has the same IP address as the node being replaced.

Workaround:

  1. Log in to any master node.

  2. From a CLI with an MKE client bundle, create a shell alias to start calicoctl using the mirantis/ucp-dsinfo image. Depending on the location of the etcd certificates on your cluster nodes, use one of the following alias definitions:

    alias calicoctl="\
    docker run -i --rm \
    --pid host \
    --net host \
    -e constraint:ostype==linux \
    -e ETCD_ENDPOINTS=<etcdEndpoint> \
    -e ETCD_KEY_FILE=/var/lib/docker/volumes/ucp-kv-certs/_data/key.pem \
    -e ETCD_CA_CERT_FILE=/var/lib/docker/volumes/ucp-kv-certs/_data/ca.pem \
    -e ETCD_CERT_FILE=/var/lib/docker/volumes/ucp-kv-certs/_data/cert.pem \
    -v /var/run/calico:/var/run/calico \
    -v /var/lib/docker/volumes/ucp-kv-certs/_data:/var/lib/docker/volumes/ucp-kv-certs/_data:ro \
    mirantis/ucp-dsinfo:<mkeVersion> \
    calicoctl \
    "
    
    alias calicoctl="\
    docker run -i --rm \
    --pid host \
    --net host \
    -e constraint:ostype==linux \
    -e ETCD_ENDPOINTS=<etcdEndpoint> \
    -e ETCD_KEY_FILE=/ucp-node-certs/key.pem \
    -e ETCD_CA_CERT_FILE=/ucp-node-certs/ca.pem \
    -e ETCD_CERT_FILE=/ucp-node-certs/cert.pem \
    -v /var/run/calico:/var/run/calico \
    -v ucp-node-certs:/ucp-node-certs:ro \
    mirantis/ucp-dsinfo:<mkeVersion> \
    calicoctl --allow-version-mismatch \
    "
    

    In the above command, replace the following values with the corresponding settings of the affected cluster:

    • <etcdEndpoint> is the etcd endpoint defined in the Calico configuration file. For example, ETCD_ENDPOINTS=127.0.0.1:12378

    • <mkeVersion> is the MKE version installed on your cluster. For example, mirantis/ucp-dsinfo:3.5.7.

  3. Verify the node list on the cluster:

    kubectl get node
    
  4. Compare this list with the node list in Calico to identify the old node:

    calicoctl get node -o wide
    
  5. Remove the old node from Calico:

    calicoctl delete node kaas-node-<nodeID>
    
[5782] Manager machine fails to be deployed during node replacement

Fixed in 2.28.4 (17.3.4 and 16.3.4)

During replacement of a manager machine, the following problems may occur:

  • The system adds the node to Docker swarm but not to Kubernetes

  • The node Deployment gets stuck with failed RethinkDB health checks

Workaround:

  1. Delete the failed node.

  2. Wait for the MKE cluster to become healthy. To monitor the cluster status:

    1. Log in to the MKE web UI as described in Connect to the Mirantis Kubernetes Engine web UI.

    2. Monitor the cluster status as described in MKE Operations Guide: Monitor an MKE cluster with the MKE web UI.

  3. Deploy a new node.

[5568] The calico-kube-controllers Pod fails to clean up resources

Fixed in 2.28.4 (17.3.4 and 16.3.4)

During the unsafe or forced deletion of a manager machine running the calico-kube-controllers Pod in the kube-system namespace, the following issues occur:

  • The calico-kube-controllers Pod fails to clean up resources associated with the deleted node

  • The calico-node Pod may fail to start up on a newly created node if the machine is provisioned with the same IP address as the deleted machine had

As a workaround, before deletion of the node running the calico-kube-controllers Pod, cordon and drain the node:

kubectl cordon <nodeName>
kubectl drain <nodeName>

Ceph
[50566] Ceph upgrade is very slow during patch or major cluster update

Due to the upstream Ceph issue 66717, during CVE upgrade of the Ceph daemon image of Ceph Reef 18.2.4, OSDs may start slowly and even fail the startup probe with the following describe output for the rook-ceph-osd-X pod:

 Warning  Unhealthy  57s (x16 over 3m27s)  kubelet  Startup probe failed:
 ceph daemon health check failed with the following output:
> no valid command found; 10 closest matches:
> 0
> 1
> 2
> abort
> assert
> bluefs debug_inject_read_zeros
> bluefs files list
> bluefs stats
> bluestore bluefs device info [<alloc_size:int>]
> config diff
> admin_socket: invalid command

Workaround:

Complete the following steps during every patch or major cluster update of the Cluster releases 17.2.x, 17.3.x, and 17.4.x (until Ceph 18.2.5 becomes supported):

  1. Plan extra time in the maintenance window for the patch cluster update.

    Slow starts will still impact the update procedure, but after completing the following step, the recovery process noticeably shortens without affecting the overall cluster state and data responsiveness.

  2. Select one of the following options:

    • Before the cluster update, set the noout flag:

      ceph osd set noout
      

      Once the Ceph OSDs image upgrade is done, unset the flag:

      ceph osd unset noout
      
    • Monitor the Ceph OSDs image upgrade. If the symptoms of slow start appear, set the noout flag as soon as possible. Once the Ceph OSDs image upgrade is done, unset the flag.

[42908] The ceph-exporter pods are present in the Ceph crash list

After a managed cluster update, the ceph-exporter pods are present in the ceph crash ls list while rook-ceph-exporter attempts to obtain the port that is still in use. The issue does not block the managed cluster update. Once the port becomes available, rook-ceph-exporter obtains the port and the issue disappears.

As a workaround, run ceph crash archive-all to remove ceph-exporter pods from the Ceph crash list.
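For example, you can run the command from a pod that has access to the Ceph CLI. The rook-ceph-tools deployment name below is an assumption; adjust it to your environment:

kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph crash archive-all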

[26441] Cluster update fails with the MountDevice failed for volume warning

Update of a managed cluster based on bare metal with Ceph enabled fails with a PersistentVolumeClaim getting stuck in the Pending state for the prometheus-server StatefulSet and the MountVolume.MountDevice failed for volume warning in the StackLight event logs.

Workaround:

  1. Verify that the description of the Pods that failed to run contain the FailedMount events:

    kubectl -n <affectedProjectName> describe pod <affectedPodName>
    

    In the command above, replace the following values:

    • <affectedProjectName> is the Container Cloud project name where the Pods failed to run

    • <affectedPodName> is a Pod name that failed to run in the specified project

    In the Pod description, identify the node name where the Pod failed to run.

  2. Verify that the csi-rbdplugin logs of the affected node contain the rbd volume mount failed: <csi-vol-uuid> is being used error. The <csi-vol-uuid> is a unique RBD volume name.

    1. Identify csiPodName of the corresponding csi-rbdplugin:

      kubectl -n rook-ceph get pod -l app=csi-rbdplugin \
      -o jsonpath='{.items[?(@.spec.nodeName == "<nodeName>")].metadata.name}'
      
    2. Output the affected csiPodName logs:

      kubectl -n rook-ceph logs <csiPodName> -c csi-rbdplugin
      
  3. Scale down the affected StatefulSet or Deployment of the Pod that fails to 0 replicas.

  4. On every csi-rbdplugin Pod, search for stuck csi-vol:

    for pod in `kubectl -n rook-ceph get pods|grep rbdplugin|grep -v provisioner|awk '{print $1}'`; do
      echo $pod
      kubectl exec -it -n rook-ceph $pod -c csi-rbdplugin -- rbd device list | grep <csi-vol-uuid>
    done
    
  5. Unmap the affected csi-vol:

    rbd unmap -o force /dev/rbd<i>
    

    The /dev/rbd<i> value is a mapped RBD volume that uses csi-vol.

  6. Delete volumeattachment of the affected Pod:

    kubectl get volumeattachments | grep <csi-vol-uuid>
    kubectl delete volumeattachment <id>
    
  7. Scale up the affected StatefulSet or Deployment back to the original number of replicas and wait until its state becomes Running.


StackLight
[44193] OpenSearch reaches 85% disk usage watermark affecting the cluster state

Fixed in 2.29.0 (17.4.0 and 16.4.0)

On High Availability (HA) clusters that use Local Volume Provisioner (LVP), Prometheus and OpenSearch from StackLight may share the same pool of storage. In such a configuration, OpenSearch may approach the 85% disk usage watermark due to the combined storage allocation and usage patterns set by the Persistent Volume Claim (PVC) size parameters for Prometheus and OpenSearch, which consume storage the most.

When the 85% threshold is reached, the affected node is transitioned to the read-only state, preventing shard allocation and causing the OpenSearch cluster state to transition to Warning (Yellow) or Critical (Red).

Caution

The issue and the provided workaround apply only for clusters on which OpenSearch and Prometheus utilize the same storage pool.

To verify that the cluster is affected:

  1. Verify the result of the following formula:

    0.8 × OpenSearch_PVC_Size_GB + Prometheus_PVC_Size_GB > 0.85 × Total_Storage_Capacity_GB
    

    In the formula, define the following values:

    OpenSearch_PVC_Size_GB

    Derived from .values.elasticsearch.persistentVolumeUsableStorageSizeGB, defaulting to .values.elasticsearch.persistentVolumeClaimSize if unspecified. To obtain the OpenSearch PVC size:

    kubectl -n <namespaceName> get cluster <clusterName> -o yaml |\
    yq '.spec.providerSpec.value.helmReleases[] | select(.name == "stacklight") | .values.elasticsearch.persistentVolumeClaimSize '
    

    Example of system response:

    10000Gi
    
    Prometheus_PVC_Size_GB

    Sourced from .values.prometheusServer.persistentVolumeClaimSize. To obtain the Prometheus PVC size:

    kubectl -n <namespaceName> get cluster <clusterName> -o yaml |\
    yq '.spec.providerSpec.value.helmReleases[] | select(.name == "stacklight") | .values.prometheusServer.persistentVolumeClaimSize '
    

    Example of system response:

    4000Gi
    
    Total_Storage_Capacity_GB

    Total capacity of the OpenSearch PVCs. For LVP, the capacity of the storage pool. To obtain the total capacity:

    kubectl get pvc -n stacklight -l app=opensearch-master \
    -o custom-columns=NAME:.metadata.name,CAPACITY:.status.capacity.storage
    

    The system response contains multiple outputs, one per opensearch-master node. Select the capacity for the affected node.

    Note

    Convert the values to GB if they are set in different units.

    If the formula result is positive, it is an early indication that the cluster is affected.

  2. Verify whether the OpenSearchClusterStatusWarning or OpenSearchClusterStatusCritical alert is firing. And if so, verify the following:

    1. Log in to the OpenSearch web UI.

    2. In Management -> Dev Tools, run the following command:

      GET _cluster/allocation/explain
      

      The following system response indicates that the corresponding node is affected:

      "explanation": "the node is above the low watermark cluster setting \
      [cluster.routing.allocation.disk.watermark.low=85%], using more disk space \
      than the maximum allowed [85.0%], actual free: [xx.xxx%]"
      

      Note

      The system response may contain an even higher watermark percentage than 85.0%, depending on the case.

Workaround:

Warning

The workaround implies adjustment of the retention threshold for OpenSearch. Depending on the new threshold, some old logs will be deleted.

  1. Adjust or set .values.elasticsearch.persistentVolumeUsableStorageSizeGB to a lower value so that the verification formula above becomes non-positive. For configuration details, see MOSK Operations Guide: StackLight configuration parameters - OpenSearch.

    Mirantis also recommends reserving some space for other PVCs using storage from the pool. Use the following formula to calculate the required space:

    persistentVolumeUsableStorageSizeGB =
    0.84 × ((1 - Reserved_Percentage - Filesystem_Reserve) ×
    Total_Storage_Capacity_GB - Prometheus_PVC_Size_GB) /
    0.8
    

    In the formula, define the following values:

    Reserved_Percentage

    A user-defined variable that specifies what percentage of the total storage capacity should not be used by OpenSearch or Prometheus. This is used to reserve space for other components. It should be expressed as a decimal. For example, for 5% of reservation, Reserved_Percentage is 0.05. Mirantis recommends using 0.05 as a starting point.

    Filesystem_Reserve

    Percentage to deduct for filesystems that may reserve some portion of the available storage, which is marked as occupied. For example, for EXT4, it is 5% by default, so the value must be 0.05.

    Prometheus_PVC_Size_GB

    Sourced from .values.prometheusServer.persistentVolumeClaimSize.

    Total_Storage_Capacity_GB

    Total capacity of the OpenSearch PVCs. For LVP, the capacity of the storage pool. To obtain the total capacity:

    kubectl get pvc -n stacklight -l app=opensearch-master \
    -o custom-columns=NAME:.metadata.name,CAPACITY:.status.capacity.storage
    

    The system response contains multiple outputs, one per opensearch-master node. Select the capacity for the affected node.

    Note

    Convert the values to GB if they are set in different units.

    The calculation above provides the maximum safe storage to allocate for .values.elasticsearch.persistentVolumeUsableStorageSizeGB. Use this formula as a reference for setting .values.elasticsearch.persistentVolumeUsableStorageSizeGB on a cluster. A worked example is provided after this procedure.

  2. Wait up to 15-20 minutes for OpenSearch to perform the cleanup.

  3. Verify that the cluster is not affected anymore using the procedure above.
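A worked example of the calculation from step 1, using assumed values only: Reserved_Percentage = 0.05, Filesystem_Reserve = 0.05 (EXT4), Total_Storage_Capacity_GB = 10000, and Prometheus_PVC_Size_GB = 4000:

persistentVolumeUsableStorageSizeGB =
0.84 × ((1 - 0.05 - 0.05) × 10000 - 4000) / 0.8 =
0.84 × (9000 - 4000) / 0.8 =
0.84 × 5000 / 0.8 = 5250

With these assumed values, setting .values.elasticsearch.persistentVolumeUsableStorageSizeGB to at most 5250 GB keeps OpenSearch below the 85% watermark while preserving the reserved space.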

[43164] Rollover policy is not added to indices created without a policy

Fixed in 2.28.0 (17.3.0 and 16.3.0)

The initial index for the system* and audit* data streams can be created without any policy attached due to a race condition.

One of the indicators that the cluster is most likely affected is the KubeJobFailed alert firing for the elasticsearch-curator job along with one or both of the following errors being present in the elasticsearch-curator pods that remain in the Error status:

2024-05-31 13:16:04,459 ERROR   Failed to complete action: delete_indices.  \
<class 'curator.exceptions.FailedExecution'>: Exception encountered.  \
Rerun with loglevel DEBUG and/or check Elasticsearch logs for more information. \
Exception: RequestError(400, 'illegal_argument_exception', 'index [.ds-system-000001] \
is the write index for data stream [system] and cannot be deleted')

or

2024-05-31 13:16:04,459 ERROR   Failed to complete action: delete_indices.  \
<class 'curator.exceptions.FailedExecution'>: Exception encountered.  \
Rerun with loglevel DEBUG and/or check Elasticsearch logs for more information. \
Exception: RequestError(400, 'illegal_argument_exception', 'index [.ds-audit-000001] \
is the write index for data stream [audit] and cannot be deleted')

If the above-mentioned alert and errors are present, immediate action is required because the corresponding index size has already exceeded the space allocated for the index.

To verify that the cluster is affected:

Caution

Verify and apply the workaround to both index patterns, system and audit, separately.

If one of the indices is affected, the second one is most likely affected as well, although in rare cases only one index may be affected.

  1. Log in to the opensearch-master-0 Pod:

    kubectl exec -it pod/opensearch-master-0 -n stacklight -c opensearch -- bash
    
  2. Verify whether the rollover policy is attached to the index with the 000001 number:

    • system:

      curl localhost:9200/_plugins/_ism/explain/.ds-system-000001
      
    • audit:

      curl localhost:9200/_plugins/_ism/explain/.ds-audit-000001
      

    If the rollover policy is not attached, the cluster is affected. Examples of system responses in an affected cluster:

     {
      ".ds-system-000001": {
        "index.plugins.index_state_management.policy_id": null,
        "index.opendistro.index_state_management.policy_id": null,
        "enabled": null
      },
      "total_managed_indices": 0
    }
    
    {
      ".ds-audit-000001": {
        "index.plugins.index_state_management.policy_id": null,
        "index.opendistro.index_state_management.policy_id": null,
        "enabled": null
      },
      "total_managed_indices": 0
    }
    

Workaround:

  1. Log in to the opensearch-master-0 Pod:

    kubectl exec -it pod/opensearch-master-0 -n stacklight -c opensearch -- bash
    
  2. Add the policy:

    • system:

      curl -XPOST -H "Content-type: application/json" localhost:9200/_plugins/_ism/add/system* -d'{"policy_id":"system_rollover_policy"}'
      
    • audit:

      curl -XPOST -H "Content-type: application/json" localhost:9200/_plugins/_ism/add/audit* -d'{"policy_id":"audit_rollover_policy"}'
      
  3. Repeat the last step of the verification procedure above and make sure that the policy is attached to the index.

Container Cloud web UI
[50181] Failure to deploy a compact cluster

A compact MOSK cluster fails to be deployed through the Container Cloud web UI because the web UI does not allow adding any label to the control plane machines or changing the dedicatedControlPlane: false setting.

To work around the issue, manually add the required labels using the CLI. Once done, the cluster deployment resumes.

[50168] Inability to use a new project right after creation

A newly created project does not display all available tabs in the Container Cloud web UI and shows various access denied errors during the first five minutes after creation.

To work around the issue, refresh the browser five minutes after the project creation.

Components versions

The following table lists the major components and their versions delivered in Container Cloud 2.27.0.

Note

The components that are newly added, updated, deprecated, or removed as compared to the previous release version, are marked with a corresponding superscript, for example, lcm-ansible Updated.

Component

Application/Service

Version

Bare metal

baremetal-dnsmasq Updated

base-2-27-alpine-20240523143049

baremetal-operator Updated

base-2-27-alpine-20240523142757

baremetal-provider Updated

1.40.11

bm-collective Updated

base-2-27-alpine-20240523143803

cluster-api-provider-baremetal Updated

1.40.11

ironic Updated

antelope-jammy-20240522120643

ironic-inspector Updated

antelope-jammy-20240522120643

ironic-prometheus-exporter

0.1-20240117102150

kaas-ipam Updated

base-2-27-alpine-20240531082457

kubernetes-entrypoint

v1.0.1-ba8ada4-20240405150338

mariadb

10.6.17-focal-20240523075821

metallb-controller Updated

v0.14.5-e86184d9-amd64

metallb-speaker Updated

v0.14.5-e86184d9-amd64

syslog-ng

base-alpine-20240129163811

Container Cloud

admission-controller Updated

1.40.11

agent-controller Updated

1.40.11

byo-cluster-api-controller Updated

1.40.11

byo-credentials-controller Removed

n/a

ceph-kcc-controller Updated

1.40.11

cert-manager-controller

1.11.0-6

cinder-csi-plugin

1.27.2-16

client-certificate-controller Updated

1.40.11

configuration-collector Updated

1.40.11

csi-attacher

4.2.0-5

csi-node-driver-registrar

2.7.0-5

csi-provisioner

3.4.1-5

csi-resizer

1.7.0-5

csi-snapshotter

6.2.1-mcc-4

event-controller Updated

1.40.11

frontend Updated

1.40.12

golang

1.21.7-alpine3.18

iam-controller Updated

1.40.11

kaas-exporter Updated

1.40.11

kproxy Updated

1.40.11

lcm-controller Updated

1.40.11

license-controller Updated

1.40.11

livenessprobe Updated

2.9.0-5

machinepool-controller Updated

1.40.11

mcc-haproxy Updated

0.25.0-37-gc15c97d

metrics-server

0.6.3-7

nginx Updated

1.40.11

policy-controller New

1.40.11

portforward-controller Updated

1.40.11

proxy-controller Updated

1.40.11

rbac-controller Updated

1.40.11

registry

2.8.1-9

release-controller Updated

1.40.11

rhellicense-controller Removed

n/a

scope-controller Updated

1.40.11

secret-controller New

1.40.11

storage-discovery Updated

1.40.11

user-controller Updated

1.40.11

IAM

iam Updated

1.40.11

mariadb

10.6.17-focal-20240523075821

mcc-keycloak Updated

24.0.3-20240527150505

OpenStack Updated

host-os-modules-controller Updated

1.40.11

openstack-cloud-controller-manager

v1.27.2-16

openstack-cluster-api-controller

1.40.11

openstack-provider

1.40.11

os-credentials-controller

1.40.11

VMware vSphere

mcc-keepalived Updated

0.25.0-37-gc15c97d

squid-proxy

0.0.1-10-g24a0d69

vsphere-cloud-controller-manager

v1.27.0-6

vsphere-cluster-api-controller Updated

1.40.11

vsphere-credentials-controller Updated

1.40.11

vsphere-csi-driver

v3.0.2-1

vsphere-csi-syncer

v3.0.2-1

vsphere-provider Updated

1.40.11

vsphere-vm-template-controller Updated

1.40.11

Artifacts

This section lists the artifacts of components included in the Container Cloud release 2.27.0.

Note

The components that are newly added, updated, deprecated, or removed as compared to the previous release version, are marked with a corresponding superscript, for example, lcm-ansible Updated.

Bare metal artifacts

Artifact

Component

Path

Binaries

ironic-python-agent.initramfs

https://binary.mirantis.com/bm/bin/ironic/ipa/ubuntu/initramfs-yoga-focal-debug-20240517093708

ironic-python-agent.kernel

https://binary.mirantis.com/bm/bin/ironic/ipa/ubuntu/kernel-yoga-focal-debug-20240517093708

Helm charts Updated

baremetal-api

https://binary.mirantis.com/core/helm/baremetal-api-1.40.11.tgz

baremetal-operator

https://binary.mirantis.com/core/helm/baremetal-operator-1.40.11.tgz

baremetal-provider

https://binary.mirantis.com/core/helm/baremetal-provider-1.40.11.tgz

baremetal-public-api

https://binary.mirantis.com/core/helm/baremetal-public-api-1.40.11.tgz

kaas-ipam

https://binary.mirantis.com/core/helm/kaas-ipam-1.40.11.tgz

local-volume-provisioner

https://binary.mirantis.com/core/helm/local-volume-provisioner-1.40.11.tgz

metallb

https://binary.mirantis.com/core/helm/metallb-1.40.11.tgz

Docker images

ambasador Updated

mirantis.azurecr.io/core/external/nginx:1.40.11

baremetal-dnsmasq Updated

mirantis.azurecr.io/bm/baremetal-dnsmasq:base-2-27-alpine-20240523143049

baremetal-operator Updated

mirantis.azurecr.io/bm/baremetal-operator:base-2-27-alpine-20240523142757

bm-collective Updated

mirantis.azurecr.io/bm/bm-collective:base-2-27-alpine-20240523143803

cluster-api-provider-baremetal Updated

mirantis.azurecr.io/core/cluster-api-provider-baremetal:1.40.11

ironic Updated

mirantis.azurecr.io/openstack/ironic:antelope-jammy-20240522120643

ironic-inspector Updated

mirantis.azurecr.io/openstack/ironic-inspector:antelope-jammy-20240522120643

ironic-prometheus-exporter

mirantis.azurecr.io/stacklight/ironic-prometheus-exporter:0.1-20240117102150

kaas-ipam Updated

mirantis.azurecr.io/bm/kaas-ipam:base-2-27-alpine-20240531082457

kubernetes-entrypoint

mirantis.azurecr.io/openstack/extra/kubernetes-entrypoint:v1.0.1-ba8ada4-20240405150338

mariadb

mirantis.azurecr.io/general/mariadb:10.6.17-focal-20240523075821

mcc-keepalived Updated

mirantis.azurecr.io/lcm/mcc-keepalived:v0.25.0-37-gc15c97d

metallb-controller Updated

mirantis.azurecr.io/bm/metallb/controller:v0.14.5-e86184d9-amd64

metallb-speaker Updated

mirantis.azurecr.io/bm/metallb/speaker:v0.14.5-e86184d9-amd64

syslog-ng

mirantis.azurecr.io/bm/syslog-ng:base-alpine-20240129163811

Core artifacts

Artifact

Component

Path

Bootstrap tarball Updated

bootstrap-darwin

https://binary.mirantis.com/core/bin/bootstrap-darwin-1.40.11.tgz

bootstrap-linux

https://binary.mirantis.com/core/bin/bootstrap-linux-1.40.11.tgz

Helm charts Updated

admission-controller

https://binary.mirantis.com/core/helm/admission-controller-1.40.11.tgz

agent-controller

https://binary.mirantis.com/core/helm/agent-controller-1.40.11.tgz

byo-credentials-controller Removed

n/a

byo-provider

https://binary.mirantis.com/core/helm/byo-provider-1.40.11.tgz

ceph-kcc-controller

https://binary.mirantis.com/core/helm/ceph-kcc-controller-1.40.11.tgz

cert-manager

https://binary.mirantis.com/core/helm/cert-manager-1.40.11.tgz

cinder-csi-plugin

https://binary.mirantis.com/core/helm/cinder-csi-plugin-1.40.11.tgz

client-certificate-controller

https://binary.mirantis.com/core/helm/client-certificate-controller-1.40.11.tgz

configuration-collector

https://binary.mirantis.com/core/helm/configuration-collector-1.40.11.tgz

event-controller

https://binary.mirantis.com/core/helm/event-controller-1.40.11.tgz

host-os-modules-controller

https://binary.mirantis.com/core/helm/host-os-modules-controller-1.40.11.tgz

iam-controller

https://binary.mirantis.com/core/helm/iam-controller-1.40.11.tgz

kaas-exporter

https://binary.mirantis.com/core/helm/kaas-exporter-1.40.11.tgz

kaas-public-api

https://binary.mirantis.com/core/helm/kaas-public-api-1.40.11.tgz

kaas-ui

https://binary.mirantis.com/core/helm/kaas-ui-1.40.12.tgz

lcm-controller

https://binary.mirantis.com/core/helm/lcm-controller-1.40.11.tgz

license-controller

https://binary.mirantis.com/core/helm/license-controller-1.40.11.tgz

machinepool-controller

https://binary.mirantis.com/core/helm/machinepool-controller-1.40.11.tgz

mcc-cache

https://binary.mirantis.com/core/helm/mcc-cache-1.40.11.tgz

mcc-cache-warmup

https://binary.mirantis.com/core/helm/mcc-cache-warmup-1.40.11.tgz

metrics-server

https://binary.mirantis.com/core/helm/metrics-server-1.40.11.tgz

openstack-cloud-controller-manager

https://binary.mirantis.com/core/helm/openstack-cloud-controller-manager-1.40.11.tgz

openstack-provider

https://binary.mirantis.com/core/helm/openstack-provider-1.40.11.tgz

os-credentials-controller

https://binary.mirantis.com/core/helm/os-credentials-controller-1.40.11.tgz

policy-controller

https://binary.mirantis.com/core/helm/policy-controller-1.40.11.tgz

portforward-controller

https://binary.mirantis.com/core/helm/portforward-controller-1.40.11.tgz

proxy-controller Removed

n/a

rbac-controller

https://binary.mirantis.com/core/helm/rbac-controller-1.40.11.tgz

release-controller

https://binary.mirantis.com/core/helm/release-controller-1.40.11.tgz

rhellicense-controller Removed

n/a

scope-controller

https://binary.mirantis.com/core/helm/scope-controller-1.40.11.tgz

secret-controller New

https://binary.mirantis.com/core/helm/secret-controller-1.40.11.tgz

squid-proxy

https://binary.mirantis.com/core/helm/squid-proxy-1.40.11.tgz

storage-discovery

https://binary.mirantis.com/core/helm/storage-discovery-1.40.11.tgz

user-controller

https://binary.mirantis.com/core/helm/user-controller-1.40.11.tgz

vsphere-cloud-controller-manager

https://binary.mirantis.com/core/helm/vsphere-cloud-controller-manager-1.40.11.tgz

vsphere-credentials-controller

https://binary.mirantis.com/core/helm/vsphere-credentials-controller-1.40.11.tgz

vsphere-csi-plugin

https://binary.mirantis.com/core/helm/vsphere-csi-plugin-1.40.11.tgz

vsphere-provider

https://binary.mirantis.com/core/helm/vsphere-provider-1.40.11.tgz

vsphere-vm-template-controller

https://binary.mirantis.com/core/helm/vsphere-vm-template-controller-1.40.11.tgz

Docker images

admission-controller Updated

mirantis.azurecr.io/core/admission-controller:1.40.11

agent-controller Updated

mirantis.azurecr.io/core/agent-controller:1.40.11

byo-cluster-api-controller Updated

mirantis.azurecr.io/core/byo-cluster-api-controller:1.40.11

byo-credentials-controller Removed

n/a

ceph-kcc-controller Updated

mirantis.azurecr.io/core/ceph-kcc-controller:1.40.11

cert-manager-controller

mirantis.azurecr.io/core/external/cert-manager-controller:v1.11.0-6

cinder-csi-plugin

mirantis.azurecr.io/lcm/kubernetes/cinder-csi-plugin:v1.27.2-16

client-certificate-controller Updated

mirantis.azurecr.io/core/client-certificate-controller:1.40.11

configuration-collector Updated

mirantis.azurecr.io/core/configuration-collector:1.40.11

csi-attacher

mirantis.azurecr.io/lcm/k8scsi/csi-attacher:v4.2.0-5

csi-node-driver-registrar

mirantis.azurecr.io/lcm/k8scsi/csi-node-driver-registrar:v2.7.0-5

csi-provisioner

mirantis.azurecr.io/lcm/k8scsi/csi-provisioner:v3.4.1-5

csi-resizer

mirantis.azurecr.io/lcm/k8scsi/csi-resizer:v1.7.0-5

csi-snapshotter

mirantis.azurecr.io/lcm/k8scsi/csi-snapshotter:v6.2.1-mcc-4

event-controller Updated

mirantis.azurecr.io/core/event-controller:1.40.11

frontend Updated

mirantis.azurecr.io/core/frontend:1.40.12

host-os-modules-controller Updated

mirantis.azurecr.io/core/host-os-modules-controller:1.40.11

iam-controller Updated

mirantis.azurecr.io/core/iam-controller:1.40.11

kaas-exporter Updated

mirantis.azurecr.io/core/kaas-exporter:1.40.11

kproxy Updated

mirantis.azurecr.io/core/kproxy:1.40.11

lcm-controller Updated

mirantis.azurecr.io/core/lcm-controller:1.40.11

license-controller Updated

mirantis.azurecr.io/core/license-controller:1.40.11

livenessprobe

mirantis.azurecr.io/lcm/k8scsi/livenessprobe:v2.9.0-5

machinepool-controller Updated

mirantis.azurecr.io/core/machinepool-controller:1.40.11

mcc-haproxy Updated

mirantis.azurecr.io/lcm/mcc-haproxy:v0.25.0-37-gc15c97d

mcc-keepalived Updated

mirantis.azurecr.io/lcm/mcc-keepalived:v0.25.0-37-gc15c97d

metrics-server

mirantis.azurecr.io/lcm/metrics-server-amd64:v0.6.3-7

nginx Updated

mirantis.azurecr.io/core/external/nginx:1.40.11

openstack-cloud-controller-manager

mirantis.azurecr.io/lcm/kubernetes/openstack-cloud-controller-manager:v1.27.2-16

openstack-cluster-api-controller Updated

mirantis.azurecr.io/core/openstack-cluster-api-controller:1.40.11

os-credentials-controller Updated

mirantis.azurecr.io/core/os-credentials-controller:1.40.11

policy-controller Updated

mirantis.azurecr.io/core/policy-controller:1.40.11

portforward-controller Updated

mirantis.azurecr.io/core/portforward-controller:1.40.11

proxy-controller Removed

n/a

rbac-controller Updated

mirantis.azurecr.io/core/rbac-controller:1.40.11

registry

mirantis.azurecr.io/lcm/registry:v2.8.1-9

release-controller Updated

mirantis.azurecr.io/core/release-controller:1.40.11

rhellicense-controller Removed

n/a

scope-controller Updated

mirantis.azurecr.io/core/scope-controller:1.40.11

secret-controller New

mirantis.azurecr.io/core/secret-controller:1.40.11

squid-proxy

mirantis.azurecr.io/lcm/squid-proxy:0.0.1-10-g24a0d69

storage-discovery Updated

mirantis.azurecr.io/core/storage-discovery:1.40.11

user-controller Updated

mirantis.azurecr.io/core/user-controller:1.40.11

vsphere-cloud-controller-manager

mirantis.azurecr.io/lcm/kubernetes/vsphere-cloud-controller-manager:v1.27.0-6

vsphere-cluster-api-controller Updated

mirantis.azurecr.io/core/vsphere-cluster-api-controller:1.40.11

vsphere-credentials-controller Updated

mirantis.azurecr.io/core/vsphere-credentials-controller:1.40.11

vsphere-csi-driver

mirantis.azurecr.io/lcm/kubernetes/vsphere-csi-driver:v3.0.2-1

vsphere-csi-syncer

mirantis.azurecr.io/lcm/kubernetes/vsphere-csi-syncer:v3.0.2-1

vsphere-vm-template-controller Updated

mirantis.azurecr.io/core/vsphere-vm-template-controller:1.40.11

IAM artifacts

Artifact

Component

Path

Binaries

hash-generate-darwin

http://binary.mirantis.com/iam/bin/hash-generate-0.0.1-268-3cf7f17-darwin

hash-generate-linux

http://binary.mirantis.com/iam/bin/hash-generate-0.0.1-268-3cf7f17-linux

Helm charts Updated

iam

https://binary.mirantis.com/core/helm/iam-1.40.11.tgz

Docker images

kubectl

mirantis.azurecr.io/stacklight/kubectl:1.22-20240501023013

kubernetes-entrypoint

mirantis.azurecr.io/openstack/extra/kubernetes-entrypoint:v1.0.1-ba8ada4-20240405150338

mariadb

mirantis.azurecr.io/general/mariadb:10.6.17-focal-20240523075821

mcc-keycloak Updated

mirantis.azurecr.io/iam/mcc-keycloak:24.0.3-20240527150505

Security notes

In total, since Container Cloud 2.26.0, in 2.27.0, 408 Common Vulnerabilities and Exposures (CVE) have been fixed: 26 of critical and 382 of high severity.

The table below includes the total numbers of addressed unique and common vulnerabilities and exposures (CVE) by product component since the 2.26.5 patch release. The common CVEs are issues addressed across several images.

Addressed CVEs - summary

Product component

CVE type

Critical

High

Total

Kaas core

Unique

0

7

7

Common

0

13

13

StackLight

Unique

4

14

18

Common

4

25

29

Mirantis Security Portal

For the detailed list of fixed and existing CVEs across the Mirantis Container Cloud and MOSK products, refer to Mirantis Security Portal.

MOSK CVEs

For the number of fixed CVEs in the MOSK-related components including OpenStack and Tungsten Fabric, refer to MOSK 24.2: Security notes.

Update notes

This section describes the specific actions you as a cloud operator need to complete before or after your Container Cloud cluster update to the Cluster releases 17.2.0 or 16.2.0.

Consider this information as a supplement to the generic update procedures published in Operations Guide: Automatic upgrade of a management cluster and Update a managed cluster.

Updated scheme for patch Cluster releases

Starting from Container Cloud 2.26.5, Mirantis introduces a new update scheme allowing for the update path flexibility. For details, see Patch update schemes before and since 2.26.5. For details on MOSK update scheme, refer to MOSK documentation: Update notes.

For clusters that update only between major versions, the update scheme remains unchanged.

Caution

In Container Cloud patch releases 2.27.1 and 2.27.2, only the 16.2.x patch Cluster releases will be delivered with an automatic update of management clusters and the possibility to update non-MOSK managed clusters.

In parallel, 2.27.1 and 2.27.2 will include new 16.1.x and 17.1.x patches for MOSK 24.1.x, and the first 17.2.x patch Cluster release for MOSK 24.2.x will be delivered in 2.27.3. For details, see MOSK documentation: Update path for 24.1 and 24.2 series.

Pre-update actions
Update bird configuration on BGP-enabled bare metal clusters

Note

If you have already completed the below procedure after updating your clusters to Container Cloud 2.26.0 (Cluster releases 17.1.0 or 16.1.0), skip this subsection.

Container Cloud 2.26.0 introduced the bird daemon update from v1.6.8 to v2.0.7 on master nodes if BGP is used for announcement of the cluster API load balancer address.

Configuration files for bird v1.x are not fully compatible with those for bird v2.x. Therefore, if you used BGP announcement of the cluster API LB address on a deployment based on Cluster releases 17.0.0 or 16.0.0, update the bird configuration files to fit bird v2.x using the configuration examples provided in the API Reference: MultiRackCluster section.
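For illustration only, the following minimal fragments show the main structural difference between the two versions: in bird v2.x, the import and export rules must be placed inside an explicit channel block such as ipv4. The protocol name, AS numbers, and neighbor address below are placeholders; use the configuration examples in the API Reference as the authoritative source.

# bird v1.x style: import/export directly in the protocol body
protocol bgp lbannounce {
  local as 65001;
  neighbor 10.0.0.1 as 65000;
  import none;
  export all;
}

# bird v2.x style: the same rules wrapped in an ipv4 channel block
protocol bgp lbannounce {
  local as 65001;
  neighbor 10.0.0.1 as 65000;
  ipv4 {
    import none;
    export all;
  };
}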

Review and adjust the storage parameters for OpenSearch

Note

If you have already completed the below procedure after updating your clusters to Container Cloud 2.26.0 (Cluster releases 17.1.0 or 16.1.0), skip this subsection.

To prevent underused or overused storage space, review your storage space parameters for OpenSearch on the StackLight cluster:

  1. Review the value of elasticsearch.persistentVolumeClaimSize and the real storage available on volumes.

  2. Decide whether you have to additionally set elasticsearch.persistentVolumeUsableStorageSizeGB.

For the description of both parameters, see MOSK Operations Guide: StackLight configuration parameters - OpenSearch.
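For reference, the following minimal sketch shows how these parameters typically appear in the StackLight values of the Cluster object. The size values are placeholders, and the exact nesting may differ depending on how StackLight is configured on your cluster; follow the referenced guide for the authoritative structure.

spec:
  providerSpec:
    value:
      helmReleases:
      - name: stacklight
        values:
          elasticsearch:
            # Requested PVC size for OpenSearch data volumes (placeholder value)
            persistentVolumeClaimSize: 30Gi
            # Set only if the actually usable storage differs from the PVC size
            persistentVolumeUsableStorageSizeGB: 27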

Post-update actions
Prepare for changing label values in Ceph metrics used in customizations

Note

If you do not use Ceph metrics in any customizations, for example, custom alerts, Grafana dashboards, or queries in custom workloads, skip this section.

In Container Cloud 2.27.0, the performance metric exporter integrated into the Ceph Manager daemon was deprecated in favor of the dedicated Ceph Exporter daemon. If you use Ceph metrics in any customizations such as custom alerts, Grafana dashboards, or queries in custom tools, prepare to update the values of several labels in these metrics. These labels will be changed in Container Cloud 2.28.0 (Cluster releases 16.3.0 and 17.3.0).

Note

Names of metrics will not change, and no metrics will be removed.

All Ceph metrics to be collected by the Ceph Exporter daemon will change their job and instance labels because the metrics will be scraped from the new Ceph Exporter daemon instead of the performance metric exporter of Ceph Manager:

  • Values of the job labels will be changed from rook-ceph-mgr to prometheus-rook-exporter for all Ceph metrics moved to Ceph Exporter. The full list of moved metrics is presented below.

  • Values of the instance labels will be changed from the metric endpoint of Ceph Manager with port 9283 to the metric endpoint of Ceph Exporter with port 9926 for all Ceph metrics moved to Ceph Exporter. The full list of moved metrics is presented below.

  • Values of the instance_id labels of Ceph metrics from the RADOS Gateway (RGW) daemons will be changed from the daemon GID to the daemon subname. For example, instead of instance_id="<RGW_PROCESS_GID>", the instance_id="a" (ceph_rgw_qlen{instance_id="a"}) will be used. The list of moved Ceph RGW metrics is presented below.
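For example, a custom alert or Grafana query that selects Ceph metrics by the old label values requires an adjustment similar to the following. The endpoint address and the RGW daemon identifiers are illustrative placeholders:

# Before the change (metrics scraped from the Ceph Manager exporter)
ceph_rgw_qlen{job="rook-ceph-mgr", instance="10.0.0.10:9283", instance_id="4501234"}

# After the change (metrics scraped from the Ceph Exporter daemon)
ceph_rgw_qlen{job="prometheus-rook-exporter", instance="10.0.0.10:9926", instance_id="a"}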

List of affected Ceph RGW metrics
  • ceph_rgw_cache_.*

  • ceph_rgw_failed_req

  • ceph_rgw_gc_retire_object

  • ceph_rgw_get.*

  • ceph_rgw_keystone_.*

  • ceph_rgw_lc_.*

  • ceph_rgw_lua_.*

  • ceph_rgw_pubsub_.*

  • ceph_rgw_put.*

  • ceph_rgw_qactive

  • ceph_rgw_qlen

  • ceph_rgw_req

List of all metrics to be collected by Ceph Exporter instead of Ceph Manager
  • ceph_bluefs_.*

  • ceph_bluestore_.*

  • ceph_mds_cache_.*

  • ceph_mds_caps

  • ceph_mds_ceph_.*

  • ceph_mds_dir_.*

  • ceph_mds_exported_inodes

  • ceph_mds_forward

  • ceph_mds_handle_.*

  • ceph_mds_imported_inodes

  • ceph_mds_inodes.*

  • ceph_mds_load_cent

  • ceph_mds_log_.*

  • ceph_mds_mem_.*

  • ceph_mds_openino_dir_fetch

  • ceph_mds_process_request_cap_release

  • ceph_mds_reply_.*

  • ceph_mds_request

  • ceph_mds_root_.*

  • ceph_mds_server_.*

  • ceph_mds_sessions_.*

  • ceph_mds_slow_reply

  • ceph_mds_subtrees

  • ceph_mon_election_.*

  • ceph_mon_num_.*

  • ceph_mon_session_.*

  • ceph_objecter_.*

  • ceph_osd_numpg.*

  • ceph_osd_op.*

  • ceph_osd_recovery_.*

  • ceph_osd_stat_.*

  • ceph_paxos.*

  • ceph_prioritycache.*

  • ceph_purge.*

  • ceph_rgw_cache_.*

  • ceph_rgw_failed_req

  • ceph_rgw_gc_retire_object

  • ceph_rgw_get.*

  • ceph_rgw_keystone_.*

  • ceph_rgw_lc_.*

  • ceph_rgw_lua_.*

  • ceph_rgw_pubsub_.*

  • ceph_rgw_put.*

  • ceph_rgw_qactive

  • ceph_rgw_qlen

  • ceph_rgw_req

  • ceph_rocksdb_.*

2.26.5

The Container Cloud patch release 2.26.5, which is based on the 2.26.0 major release, provides the following updates:

  • Support for the patch Cluster releases 16.1.5 and 17.1.5 that represents Mirantis OpenStack for Kubernetes (MOSK) patch release 24.1.5.

  • Bare metal: update of Ubuntu mirror from 20.04~20240502102020 to 20.04~20240517090228 along with update of minor kernel version from 5.15.0-105-generic to 5.15.0-107-generic.

  • Security fixes for CVEs in images.

  • Bug fixes.

This patch release also supports the latest major Cluster releases 17.1.0 and 16.1.0. And it does not support greenfield deployments based on deprecated Cluster releases. Use the latest available Cluster release instead.

For main deliverables of the parent Container Cloud release of 2.26.5, refer to 2.26.0.

Security notes

The table below includes the total numbers of addressed unique and common CVEs in images by product component since the Container Cloud 2.26.4 patch release. The common CVEs are issues addressed across several images.

Addressed CVEs - summary

Product component

CVE type

Critical

High

Total

Ceph

Unique

0

1

1

Common

0

3

3

Kaas core

Unique

0

5

5

Common

0

12

12

StackLight

Unique

1

3

4

Common

2

6

8

Mirantis Security Portal

For the detailed list of fixed and existing CVEs across the Mirantis Container Cloud and MOSK products, refer to Mirantis Security Portal.

MOSK CVEs

For the number of fixed CVEs in the MOSK-related components including OpenStack and Tungsten Fabric, refer to MOSK 24.1.5: Security notes.

Addressed issues

The following issues have been addressed in the Container Cloud patch release 2.26.5 along with the patch Cluster releases 17.1.5 and 16.1.5.

  • [42408] [bare metal] Fixed the issue with old versions of system packages, including kernel, remaining on the manager nodes after cluster update.

  • [41540] [LCM] Fixed the issue with lcm-agent failing to grab storage information on a host and leaving lcmmachine.status.hostinfo.hardware empty due to issues with managing physical NVME devices.

Known issues

This section lists known issues with workarounds for the Mirantis Container Cloud release 2.26.5 including the Cluster releases 17.1.5 and 16.1.5.

For other issues that can occur while deploying and operating a Container Cloud cluster, see Deployment Guide: Troubleshooting and Operations Guide: Troubleshooting.

Note

This section also outlines still valid known issues from previous Container Cloud releases.

Bare metal
[46245] Lack of access permissions for HOC and HOCM objects

Fixed in 2.28.0 (17.3.0 and 16.3.0)

When trying to list the HostOSConfigurationModules and HostOSConfiguration custom resources, serviceuser or a user with the global-admin or operator role obtains the access denied error. For example:

kubectl --kubeconfig ~/.kube/mgmt-config get hocm

Error from server (Forbidden): hostosconfigurationmodules.kaas.mirantis.com is forbidden:
User "2d74348b-5669-4c65-af31-6c05dbedac5f" cannot list resource "hostosconfigurationmodules"
in API group "kaas.mirantis.com" at the cluster scope: access denied

Workaround:

  1. Modify the global-admin role by adding a new entry with the following contents to the rules list:

    kubectl edit clusterroles kaas-global-admin
    
    - apiGroups: [kaas.mirantis.com]
      resources: [hostosconfigurationmodules]
      verbs: ['*']
    
  2. For each Container Cloud project, modify the kaas-operator role by adding a new entry with the following contents to the rules list:

    kubectl -n <projectName> edit roles kaas-operator
    
    - apiGroups: [kaas.mirantis.com]
      resources: [hostosconfigurations]
      verbs: ['*']
    
[42386] A load balancer service does not obtain the external IP address

Due to the MetalLB upstream issue, a load balancer service may not obtain the external IP address.

The issue occurs when two services share the same external IP address and have the same externalTrafficPolicy value. Initially, the services have the external IP address assigned and are accessible. After modifying the externalTrafficPolicy value for both services from Cluster to Local, the first service that has been changed remains with no external IP address assigned. However, the second service, which was changed later, has the external IP assigned as expected.

To work around the issue, make a dummy change to the service object where external IP is <pending>:

  1. Identify the service that is stuck:

    kubectl get svc -A | grep pending
    

    Example of system response:

    stacklight  iam-proxy-prometheus  LoadBalancer  10.233.28.196  <pending>  443:30430/TCP
    
  2. Add an arbitrary label to the service that is stuck. For example:

    kubectl label svc -n stacklight iam-proxy-prometheus reconcile=1
    

    Example of system response:

    service/iam-proxy-prometheus labeled
    
  3. Verify that the external IP was allocated to the service:

    kubectl get svc -n stacklight iam-proxy-prometheus
    

    Example of system response:

    NAME                  TYPE          CLUSTER-IP     EXTERNAL-IP  PORT(S)        AGE
    iam-proxy-prometheus  LoadBalancer  10.233.28.196  10.0.34.108  443:30430/TCP  12d
    
[41305] DHCP responses are lost between dnsmasq and dhcp-relay pods

Fixed in 2.28.0 (17.3.0 and 16.3.0)

After node maintenance of a management cluster, the newly added nodes may fail to undergo provisioning successfully. The issue relates to new nodes that are in the same L2 domain as the management cluster.

The issue was observed on environments having management cluster nodes configured with a single L2 segment used for all network traffic (PXE and LCM/management networks).

To verify whether the cluster is affected:

Verify whether the dnsmasq and dhcp-relay pods run on the same node in the management cluster:

kubectl -n kaas get pods -o wide| grep -e "dhcp\|dnsmasq"

Example of system response:

dhcp-relay-7d85f75f76-5vdw2   2/2   Running   2 (36h ago)   36h   10.10.0.122     kaas-node-8a24b81c-76d0-4d4c-8421-962bd39df5ad   <none>   <none>
dnsmasq-8f4b484b4-slhbd       5/5   Running   1 (36h ago)   36h   10.233.123.75   kaas-node-8a24b81c-76d0-4d4c-8421-962bd39df5ad   <none>   <none>

If this is the case, proceed to the workaround below.

Workaround:

  1. Log in to a node that contains kubeconfig of the affected management cluster.

  2. Make sure that at least two management cluster nodes are schedulable:

    kubectl get node
    

    Example of a positive system response:

    NAME                                             STATUS   ROLES    AGE   VERSION
    kaas-node-bcedb87b-b3ce-46a4-a4ca-ea3068689e40   Ready    master   37h   v1.27.10-mirantis-1
    kaas-node-8a24b81c-76d0-4d4c-8421-962bd39df5ad   Ready    master   37h   v1.27.10-mirantis-1
    kaas-node-ad5a6f51-b98f-43c3-91d5-55fed3d0ff21   Ready    master   37h   v1.27.10-mirantis-1
    
  3. Delete the dhcp-relay pod:

    kubectl -n kaas delete pod <dhcp-relay-xxxxx>
    
  4. Verify that the dnsmasq and dhcp-relay pods are scheduled into different nodes:

    kubectl -n kaas get pods -o wide| grep -e "dhcp\|dnsmasq"
    

    Example of a positive system response:

    dhcp-relay-7d85f75f76-rkv03   2/2   Running   0             49s   10.10.0.121     kaas-node-bcedb87b-b3ce-46a4-a4ca-ea3068689e40   <none>   <none>
    dnsmasq-8f4b484b4-slhbd       5/5   Running   1 (37h ago)   37h   10.233.123.75   kaas-node-8a24b81c-76d0-4d4c-8421-962bd39df5ad   <none>   <none>
    
[24005] Deletion of a node with ironic Pod is stuck in the Terminating state

During deletion of a manager machine running the ironic Pod from a bare metal management cluster, the following problems occur:

  • All Pods are stuck in the Terminating state

  • A new ironic Pod fails to start

  • The related bare metal host is stuck in the deprovisioning state

As a workaround, before deletion of the node running the ironic Pod, cordon and drain the node using the kubectl cordon <nodeName> and kubectl drain <nodeName> commands.


LCM
[39437] Failure to replace a master node on a Container Cloud cluster

Fixed in 2.29.0 (17.4.0 and 16.4.0)

During the replacement of a master node on a cluster of any type, the process may get stuck with Kubelet's NodeReady condition is Unknown in the machine status on the remaining master nodes.

As a workaround, log in on the affected node and run the following command:

docker restart ucp-kubelet
[31186,34132] Pods get stuck during MariaDB operations

During MariaDB operations on a management cluster, Pods may get stuck in continuous restarts with the following example error:

[ERROR] WSREP: Corrupt buffer header: \
addr: 0x7faec6f8e518, \
seqno: 3185219421952815104, \
size: 909455917, \
ctx: 0x557094f65038, \
flags: 11577. store: 49, \
type: 49

Workaround:

  1. Create a backup of the /var/lib/mysql directory on the mariadb-server Pod.

  2. Verify that other replicas are up and ready.

  3. Remove the galera.cache file for the affected mariadb-server Pod.

  4. Remove the affected mariadb-server Pod or wait until it is automatically restarted.

After Kubernetes restarts the Pod, the Pod clones the database in 1-2 minutes and restores the quorum.
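The following kubectl sketch illustrates one possible way to perform these steps. The namespace and Pod names are placeholders and must be replaced with the values of the affected management cluster:

# Step 1: back up the MariaDB data directory from the affected Pod to the local machine
kubectl cp <mariadbNamespace>/<mariadb-server-pod>:/var/lib/mysql ./mysql-backup
# Step 3: remove the galera.cache file inside the affected Pod
kubectl -n <mariadbNamespace> exec <mariadb-server-pod> -- rm -f /var/lib/mysql/galera.cache
# Step 4: remove the affected Pod so that Kubernetes recreates it
kubectl -n <mariadbNamespace> delete pod <mariadb-server-pod>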

[30294] Replacement of a master node is stuck on the calico-node Pod start

Fixed in 2.28.4 (17.3.4 and 16.3.4)

During replacement of a master node on a cluster of any type, the calico-node Pod fails to start on a new node that has the same IP address as the node being replaced.

Workaround:

  1. Log in to any master node.

  2. From a CLI with an MKE client bundle, create a shell alias to start calicoctl using the mirantis/ucp-dsinfo image:

    alias calicoctl="\
    docker run -i --rm \
    --pid host \
    --net host \
    -e constraint:ostype==linux \
    -e ETCD_ENDPOINTS=<etcdEndpoint> \
    -e ETCD_KEY_FILE=/var/lib/docker/volumes/ucp-kv-certs/_data/key.pem \
    -e ETCD_CA_CERT_FILE=/var/lib/docker/volumes/ucp-kv-certs/_data/ca.pem \
    -e ETCD_CERT_FILE=/var/lib/docker/volumes/ucp-kv-certs/_data/cert.pem \
    -v /var/run/calico:/var/run/calico \
    -v /var/lib/docker/volumes/ucp-kv-certs/_data:/var/lib/docker/volumes/ucp-kv-certs/_data:ro \
    mirantis/ucp-dsinfo:<mkeVersion> \
    calicoctl \
    "
    
    alias calicoctl="\
    docker run -i --rm \
    --pid host \
    --net host \
    -e constraint:ostype==linux \
    -e ETCD_ENDPOINTS=<etcdEndpoint> \
    -e ETCD_KEY_FILE=/ucp-node-certs/key.pem \
    -e ETCD_CA_CERT_FILE=/ucp-node-certs/ca.pem \
    -e ETCD_CERT_FILE=/ucp-node-certs/cert.pem \
    -v /var/run/calico:/var/run/calico \
    -v ucp-node-certs:/ucp-node-certs:ro \
    mirantis/ucp-dsinfo:<mkeVersion> \
    calicoctl --allow-version-mismatch \
    "
    

    In the above command, replace the following values with the corresponding settings of the affected cluster:

    • <etcdEndpoint> is the etcd endpoint defined in the Calico configuration file. For example, ETCD_ENDPOINTS=127.0.0.1:12378

    • <mkeVersion> is the MKE version installed on your cluster. For example, mirantis/ucp-dsinfo:3.5.7.

  3. Verify the node list on the cluster:

    kubectl get node
    
  4. Compare this list with the node list in Calico to identify the old node:

    calicoctl get node -o wide
    
  5. Remove the old node from Calico:

    calicoctl delete node kaas-node-<nodeID>
    
[5782] Manager machine fails to be deployed during node replacement

Fixed in 2.28.4 (17.3.4 and 16.3.4)

During replacement of a manager machine, the following problems may occur:

  • The system adds the node to Docker swarm but not to Kubernetes

  • The node Deployment gets stuck with failed RethinkDB health checks

Workaround:

  1. Delete the failed node.

  2. Wait for the MKE cluster to become healthy. To monitor the cluster status:

    1. Log in to the MKE web UI as described in Connect to the Mirantis Kubernetes Engine web UI.

    2. Monitor the cluster status as described in MKE Operations Guide: Monitor an MKE cluster with the MKE web UI.

  3. Deploy a new node.

[5568] The calico-kube-controllers Pod fails to clean up resources

Fixed in 2.28.4 (17.3.4 and 16.3.4)

During the unsafe or forced deletion of a manager machine running the calico-kube-controllers Pod in the kube-system namespace, the following issues occur:

  • The calico-kube-controllers Pod fails to clean up resources associated with the deleted node

  • The calico-node Pod may fail to start up on a newly created node if the machine is provisioned with the same IP address as the deleted machine had

As a workaround, before deletion of the node running the calico-kube-controllers Pod, cordon and drain the node:

kubectl cordon <nodeName>
kubectl drain <nodeName>

Ceph
[41819] Graceful cluster reboot is blocked by the Ceph ClusterWorkloadLocks

Fixed in 2.27.0 (17.2.0 and 16.2.0)

During graceful reboot of a cluster with Ceph enabled, the reboot is blocked with the following message in the MiraCephMaintenance object status:

message: ClusterMaintenanceRequest found, Ceph Cluster is not ready to upgrade,
 delaying cluster maintenance

As a workaround, add the following snippet to the cephFS section under metadataServer in the spec section of <kcc-name>.yaml in the Ceph cluster:

cephClusterSpec:
  sharedFilesystem:
    cephFS:
    - name: cephfs-store
      metadataServer:
        activeCount: 1
        healthCheck:
          livenessProbe:
            probe:
              failureThreshold: 5
              initialDelaySeconds: 30
              periodSeconds: 30
              successThreshold: 1
              timeoutSeconds: 5
[26441] Cluster update fails with the MountDevice failed for volume warning

Update of a managed cluster based on bare metal with Ceph enabled fails with PersistentVolumeClaim getting stuck in the Pending state for the prometheus-server StatefulSet and the MountVolume.MountDevice failed for volume warning in the StackLight event logs.

Workaround:

  1. Verify that the descriptions of the Pods that failed to run contain the FailedMount events:

    kubectl -n <affectedProjectName> describe pod <affectedPodName>
    

    In the command above, replace the following values:

    • <affectedProjectName> is the Container Cloud project name where the Pods failed to run

    • <affectedPodName> is a Pod name that failed to run in the specified project

    In the Pod description, identify the node name where the Pod failed to run.

  2. Verify that the csi-rbdplugin logs of the affected node contain the rbd volume mount failed: <csi-vol-uuid> is being used error. The <csi-vol-uuid> is a unique RBD volume name.

    1. Identify csiPodName of the corresponding csi-rbdplugin:

      kubectl -n rook-ceph get pod -l app=csi-rbdplugin \
      -o jsonpath='{.items[?(@.spec.nodeName == "<nodeName>")].metadata.name}'
      
    2. Output the affected csiPodName logs:

      kubectl -n rook-ceph logs <csiPodName> -c csi-rbdplugin
      
  3. Scale down the affected StatefulSet or Deployment of the Pod that fails to 0 replicas.

  4. On every csi-rbdplugin Pod, search for stuck csi-vol:

    for pod in `kubectl -n rook-ceph get pods|grep rbdplugin|grep -v provisioner|awk '{print $1}'`; do
      echo $pod
      kubectl exec -it -n rook-ceph $pod -c csi-rbdplugin -- rbd device list | grep <csi-vol-uuid>
    done
    
  5. Unmap the affected csi-vol:

    rbd unmap -o force /dev/rbd<i>
    

    The /dev/rbd<i> value is a mapped RBD volume that uses csi-vol.

  6. Delete volumeattachment of the affected Pod:

    kubectl get volumeattachments | grep <csi-vol-uuid>
    kubectl delete volumeattachment <id>
    
  7. Scale up the affected StatefulSet or Deployment back to the original number of replicas and wait until its state becomes Running.


StackLight
[42304] Failure of shard relocation in the OpenSearch cluster

Fixed in 17.2.0, 16.2.0, 17.1.6, 16.1.6

On large managed clusters, shard relocation may fail in the OpenSearch cluster, resulting in the yellow or red OpenSearch cluster status. The characteristic symptom of the issue is that in the stacklight namespace, the statefulset.apps/opensearch-master containers are experiencing throttling with the KubeContainersCPUThrottlingHigh alert firing for the following set of labels:

{created_by_kind="StatefulSet",created_by_name="opensearch-master",namespace="stacklight"}

Caution

The throttling that OpenSearch is experiencing may be a temporary situation related, for example, to a peak load and the ongoing shard initialization as part of disaster recovery or after a node restart. In this case, Mirantis recommends waiting until the initialization of all shards is finished. After that, verify the cluster state and whether throttling still exists. Apply the workaround below only if the throttling does not disappear.

To verify that the initialization of shards is ongoing:

kubectl exec -it pod/opensearch-master-0 -n stacklight -c opensearch -- bash

curl "http://localhost:9200/_cat/shards" | grep INITIALIZING

Example of system response:

.ds-system-000072    2 r INITIALIZING    10.232.182.135 opensearch-master-1
.ds-system-000073    1 r INITIALIZING    10.232.7.145   opensearch-master-2
.ds-system-000073    2 r INITIALIZING    10.232.182.135 opensearch-master-1
.ds-audit-000001     2 r INITIALIZING    10.232.7.145   opensearch-master-2

The system response above indicates that shards from the .ds-system-000072, .ds-system-000073, and .ds-audit-000001 indices are in the INITIALIZING state. In this case, Mirantis recommends waiting until this process is finished and considering a limit change only after that.

You can additionally analyze the exact level of throttling and the current CPU usage on the Kubernetes Containers dashboard in Grafana.

Workaround:

  1. Verify the currently configured CPU requests and limits for the opensearch containers:

    kubectl -n stacklight get statefulset.apps/opensearch-master -o jsonpath="{.spec.template.spec.containers[?(@.name=='opensearch')].resources}"
    

    Example of system response:

    {"limits":{"cpu":"600m","memory":"8Gi"},"requests":{"cpu":"500m","memory":"6Gi"}}
    

    In the example above, the CPU request is 500m and the CPU limit is 600m.

  2. Increase the CPU limit to a reasonably high number.

    For example, the default CPU limit for the clusters with the clusterSize:large parameter set was increased from 8000m to 12000m for StackLight in Container Cloud 2.27.0 (Cluster releases 17.2.0 and 16.2.0).

    Note

    For details on the clusterSize parameter, see MOSK Operations Guide: StackLight configuration parameters - Cluster size.

    If the defaults are already overridden on the affected cluster using the resourcesPerClusterSize or resources parameters as described in MOSK Operations Guide: StackLight configuration parameters - Resource limits, then the exact recommended number depends on the currently set limit.

    Mirantis recommends increasing the limit by 50%. If it does not resolve the issue, another increase iteration will be required.

  3. When you select the required CPU limit, increase it as described in MOSK Operations Guide: StackLight configuration parameters - Resource limits.

    If the CPU limit for the opensearch component is already set in the Cluster object, increase it there for the opensearch parameter. Otherwise, the default StackLight limit is used; in this case, increase the CPU limit for the opensearch component using the resources parameter (see the example override after this procedure).

  4. Wait until all opensearch-master pods are recreated with the new CPU limits and become running and ready.

    To verify the current CPU limit for every opensearch container in every opensearch-master pod separately:

    kubectl -n stacklight get pod/opensearch-master-<podSuffixNumber> -o jsonpath="{.spec.containers[?(@.name=='opensearch')].resources}"
    

    In the command above, replace <podSuffixNumber> with the name of the pod suffix. For example, pod/opensearch-master-0 or pod/opensearch-master-2.

    Example of system response:

    {"limits":{"cpu":"900m","memory":"8Gi"},"requests":{"cpu":"500m","memory":"6Gi"}}
    

    The waiting time may take up to 20 minutes depending on the cluster size.

If the issue is fixed, the KubeContainersCPUThrottlingHigh alert stops firing immediately, while OpenSearchClusterStatusWarning or OpenSearchClusterStatusCritical can still be firing for some time during shard relocation.

If the KubeContainersCPUThrottlingHigh alert is still firing, proceed with another iteration of the CPU limit increase.
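For reference, the following sketch shows a CPU limit override for the opensearch component in the StackLight values, as mentioned in step 3 of the workaround above. The limit value is a placeholder; verify the exact parameter structure against MOSK Operations Guide: StackLight configuration parameters - Resource limits before applying it:

resources:
  opensearch:
    limits:
      cpu: "12000m"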

[40020] Rollover policy update is not applied to the current index

Fixed in 17.2.0, 16.2.0, 17.1.6, 16.1.6

When rollover_policy is updated for the current system* and audit* data streams, the update is not applied to indices.

One of the indicators that the cluster is most likely affected is the KubeJobFailed alert firing for the elasticsearch-curator job and one or both of the following errors being present in the elasticsearch-curator pods that remain in the Error status:

2024-05-31 13:16:04,459 ERROR   Failed to complete action: delete_indices.  <class 'curator.exceptions.FailedExecution'>: Exception encountered.  Rerun with loglevel DEBUG and/or check Elasticsearch logs for more information. Exception: RequestError(400, 'illegal_argument_exception', 'index [.ds-audit-000001] is the write index for data stream [audit] and cannot be deleted')

or

2024-05-31 13:16:04,459 ERROR   Failed to complete action: delete_indices.  <class 'curator.exceptions.FailedExecution'>: Exception encountered.  Rerun with loglevel DEBUG and/or check Elasticsearch logs for more information. Exception: RequestError(400, 'illegal_argument_exception', 'index [.ds-system-000001] is the write index for data stream [system] and cannot be deleted')

Note

Instead of .ds-audit-000001 or .ds-system-000001 index names, similar names can be present with the same prefix but different suffix numbers.

If the above-mentioned alert and errors are present, immediate action is required because the corresponding index size has already exceeded the space allocated for the index.

To verify that the cluster is affected:

Caution

Verify and apply the workaround to both index patterns, system and audit, separately.

If one of the indices is affected, the second one is most likely affected as well, although in rare cases only one index may be affected.

  1. Log in to the opensearch-master-0 Pod:

    kubectl exec -it pod/opensearch-master-0 -n stacklight -c opensearch -- bash
    
  2. Verify that the rollover policy is present:

    • system:

      curl localhost:9200/_plugins/_ism/policies/system_rollover_policy
      
    • audit:

      curl localhost:9200/_plugins/_ism/policies/audit_rollover_policy
      

    The cluster is affected if the rollover policy is missing. Otherwise, proceed to the following step.

  3. Verify the system response from the previous step. For example:

    {"_id":"system_rollover_policy","_version":7229,"_seq_no":42362,"_primary_term":28,"policy":{"policy_id":"system_rollover_policy","description":"system index rollover policy.","last_updated_time":1708505222430,"schema_version":19,"error_notification":null,"default_state":"rollover","states":[{"name":"rollover","actions":[{"retry":{"count":3,"backoff":"exponential","delay":"1m"},"rollover":{"min_size":"14746mb","copy_alias":false}}],"transitions":[]}],"ism_template":[{"index_patterns":["system*"],"priority":200,"last_updated_time":1708505222430}]}}
    

    Verify and capture the following items separately for every policy:

    • The _seq_no and _primary_term values

    • The rollover policy threshold, which is defined in policy.states[0].actions[0].rollover.min_size

  4. List indices:

    • system:

      curl localhost:9200/_cat/indices | grep system
      

      Example of system response:

      [...]
      green open .ds-system-000001   FjglnZlcTKKfKNbosaE9Aw 2 1 1998295  0   1gb 507.9mb
      
    • audit:

      curl localhost:9200/_cat/indices | grep audit
      

      Example of system response:

      [...]
      green open .ds-audit-000001   FjglnZlcTKKfKNbosaE9Aw 2 1 1998295  0   1gb 507.9mb
      
  5. Select the index with the highest number and verify the rollover policy attached to the index:

    • system:

      curl localhost:9200/_plugins/_ism/explain/.ds-system-000001
      
    • audit:

      curl localhost:9200/_plugins/_ism/explain/.ds-audit-000001
      
    • If the rollover policy is not attached, the cluster is affected.

    • If the rollover policy is attached but _seq_no and _primary_term numbers do not match the previously captured ones, the cluster is affected.

    • If the index size drastically exceeds the defined threshold of the rollover policy (which is the previously captured min_size), the cluster is most probably affected.

Workaround:

  1. Log in to the opensearch-master-0 Pod:

    kubectl exec -it pod/opensearch-master-0 -n stacklight -c opensearch -- bash
    
  2. If the policy is attached to the index but has different _seq_no and _primary_term, remove the policy from the index:

    Note

    Use the index with the highest number in the name, which was captured during the verification procedure.

    • system:

      curl -XPOST localhost:9200/_plugins/_ism/remove/.ds-system-000001
      
    • audit:

      curl -XPOST localhost:9200/_plugins/_ism/remove/.ds-audit-000001
      
  3. Re-add the policy:

    • system:

      curl -XPOST -H "Content-type: application/json" localhost:9200/_plugins/_ism/add/system* -d'{"policy_id":"system_rollover_policy"}'
      
    • audit:

      curl -XPOST -H "Content-type: application/json" localhost:9200/_plugins/_ism/add/audit* -d'{"policy_id":"audit_rollover_policy"}'
      
  4. Repeat the last step of the cluster verification procedure provided above and make sure that the policy is attached to the index and has the same _seq_no and _primary_term.

    If the index size drastically exceeds the defined threshold of the rollover policy (which is the previously captured min_size), wait up to 15 minutes and verify that an additional index is created with the consecutive number in the index name. For example:

    • system: if you applied changes to .ds-system-000001, wait until .ds-system-000002 is created.

    • audit: if you applied changes to .ds-audit-000001, wait until .ds-audit-000002 is created.

    If such an index is not created, escalate the issue to Mirantis support.

Update notes

This section describes the specific actions you as a cloud operator need to complete before or after your Container Cloud cluster update to the Cluster releases 17.1.5 or 16.1.5.

Consider this information as a supplement to the generic update procedures published in Operations Guide: Automatic upgrade of a management cluster and Update a managed cluster.

Update scheme for patch Cluster releases

To improve user update experience and make the update path more flexible, Container Cloud is introducing a new scheme of updating between patch Cluster releases. More specifically, Container Cloud intends to ultimately provide the possibility to update to any newer patch version within a single series at any point in time. Patch version downgrade is not supported.

However, in some cases, Mirantis may require an update to a specific patch version in the series before you can update to the next major series. This may be necessary due to the specifics of the technical content already released or planned for the release. For possible update paths in MOSK in the 24.1 and 24.2 series, see MOSK documentation: Cluster update scheme.

The exact number of patch releases for the 16.1.x and 17.1.x series is yet to be confirmed, but the current target is 7 releases.

Note

The management cluster update scheme remains the same. A management cluster obtains the new product version automatically after release.

Post-update actions
Delete ‘HostOSConfiguration’ objects on baremetal-based clusters

If you use the HostOSConfiguration and HostOSConfigurationModules custom resources for the bare metal provider, which are available in the Technology Preview scope in Container Cloud 2.26.x, delete all HostOSConfiguration objects right after update of your managed cluster to the Cluster release 17.1.5 or 16.1.5, before automatic upgrade of the management cluster to Container Cloud 2.27.0 (Cluster release 16.2.0). After the upgrade, you can recreate the required objects using the updated parameters.

This precautionary step prevents re-processing and re-applying of existing configuration, which is defined in HostOSConfiguration objects, during management cluster upgrade to 2.27.0. Such behavior is caused by changes in the HostOSConfiguration API introduced in 2.27.0.
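For example, assuming the kubeconfig of the management cluster, the following command removes all HostOSConfiguration objects across all projects. Adjust it to your environment and make sure you have saved the definitions of the objects that you plan to recreate with the updated parameters:

kubectl --kubeconfig <mgmtKubeconfig> delete hostosconfigurations --all --all-namespaces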

Configure Kubernetes auditing and profiling for log rotation

Note

Skip this procedure if you have already completed it after updating your managed cluster to Container Cloud 2.26.4 (Cluster release 17.1.4 or 16.1.4).

After the MKE update to 3.7.8, if you are going to enable or have already enabled Kubernetes auditing and profiling on your managed or management cluster, keep in mind that enabling audit log rotation requires an additional step. Set the following options in the MKE configuration file after enabling auditing and profiling:

[cluster_config]
  kube_api_server_audit_log_maxage=30
  kube_api_server_audit_log_maxbackup=10
  kube_api_server_audit_log_maxsize=10

For the configuration procedure, see MKE documentation: Configure an existing MKE cluster.

While using this procedure, replace the command to upload the newly edited MKE configuration file with the following one:

curl --silent --insecure -X PUT -H "X-UCP-Allow-Restricted-API: i-solemnly-swear-i-am-up-to-no-good" -H "accept: application/toml" -H "Authorization: Bearer $AUTHTOKEN" --upload-file 'mke-config.toml' https://$MKE_HOST/api/ucp/config-toml

  • The value for MKE_HOST has the <loadBalancerHost>:6443 format, where loadBalancerHost is the corresponding field in the cluster status.

  • The value for MKE_PASSWORD is taken from the ucp-admin-password-<clusterName> secret in the cluster namespace of the management cluster.

  • The value for MKE_USERNAME is always admin.
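The following sketch shows one way to populate these variables and obtain the authorization token used in the command above. The secret data key and the jq-based parsing are assumptions and may differ in your environment:

MKE_USERNAME=admin
MKE_PASSWORD=$(kubectl -n <clusterNamespace> get secret ucp-admin-password-<clusterName> \
  -o jsonpath='{.data.password}' | base64 -d)   # the "password" data key is an assumption
MKE_HOST=<loadBalancerHost>:6443
AUTHTOKEN=$(curl --silent --insecure \
  --data '{"username":"'"$MKE_USERNAME"'","password":"'"$MKE_PASSWORD"'"}' \
  https://$MKE_HOST/auth/login | jq -r .auth_token)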

Artifacts

This section lists the artifacts of components included in the Container Cloud patch release 2.26.5. For artifacts of the Cluster releases introduced in 2.26.5, see patch Cluster releases 17.1.5 and 16.1.5.

Note

The components that are newly added, updated, deprecated, or removed as compared to the previous release version, are marked with a corresponding superscript, for example, lcm-ansible Updated.

Bare metal artifacts

Artifact

Component

Path

Binaries Updated

ironic-python-agent.initramfs

https://binary.mirantis.com/bm/bin/ironic/ipa/ubuntu/initramfs-yoga-focal-debug-20240517093708

ironic-python-agent.kernel

https://binary.mirantis.com/bm/bin/ironic/ipa/ubuntu/kernel-yoga-focal-debug-20240517093708

Helm charts Updated

baremetal-api

https://binary.mirantis.com/core/helm/baremetal-api-1.39.28.tgz

baremetal-operator

https://binary.mirantis.com/core/helm/baremetal-operator-1.39.28.tgz

baremetal-provider

https://binary.mirantis.com/core/helm/baremetal-provider-1.39.28.tgz

baremetal-public-api

https://binary.mirantis.com/core/helm/baremetal-public-api-1.39.28.tgz

kaas-ipam

https://binary.mirantis.com/core/helm/kaas-ipam-1.39.28.tgz

local-volume-provisioner

https://binary.mirantis.com/core/helm/local-volume-provisioner-1.39.28.tgz

metallb

https://binary.mirantis.com/core/helm/metallb-1.39.28.tgz

Docker images

ambasador Updated

mirantis.azurecr.io/core/external/nginx:1.39.28

baremetal-dnsmasq Updated

mirantis.azurecr.io/bm/baremetal-dnsmasq:base-2-26-alpine-20240523095922

baremetal-operator Updated

mirantis.azurecr.io/bm/baremetal-operator:base-2-26-alpine-20240523095601

bm-collective

mirantis.azurecr.io/bm/bm-collective:base-2-26-alpine-20240408142218

cluster-api-provider-baremetal Updated

mirantis.azurecr.io/core/cluster-api-provider-baremetal:1.39.28

ironic Updated

mirantis.azurecr.io/openstack/ironic:yoga-jammy-20240522120640

ironic-inspector Updated

mirantis.azurecr.io/openstack/ironic-inspector:yoga-jammy-20240522120640

ironic-prometheus-exporter

mirantis.azurecr.io/stacklight/ironic-prometheus-exporter:0.1-20240117102150

kaas-ipam

mirantis.azurecr.io/bm/kaas-ipam:base-2-26-alpine-20240408150853

kubernetes-entrypoint

mirantis.azurecr.io/openstack/extra/kubernetes-entrypoint:v1.0.1-ba8ada4-20240405150338

mariadb Updated

mirantis.azurecr.io/general/mariadb:10.6.17-focal-20240523075821

mcc-keepalived

mirantis.azurecr.io/lcm/mcc-keepalived:v0.24.0-47-gf77368e

metallb-controller

mirantis.azurecr.io/bm/metallb/controller:v0.13.12-ef4c9453-amd64

metallb-speaker

mirantis.azurecr.io/bm/metallb/speaker:v0.13.12-ef4c9453-amd64

syslog-ng

mirantis.azurecr.io/bm/syslog-ng:base-alpine-20240129163811

Core artifacts

Artifact

Component

Path

Bootstrap tarball Updated

bootstrap-darwin

https://binary.mirantis.com/core/bin/bootstrap-darwin-1.39.28.tgz

bootstrap-linux

https://binary.mirantis.com/core/bin/bootstrap-linux-1.39.28.tgz

Helm charts Updated

admission-controller

https://binary.mirantis.com/core/helm/admission-controller-1.39.28.tgz

agent-controller

https://binary.mirantis.com/core/helm/agent-controller-1.39.28.tgz

byo-credentials-controller

https://binary.mirantis.com/core/helm/byo-credentials-controller-1.39.28.tgz

byo-provider

https://binary.mirantis.com/core/helm/byo-provider-1.39.28.tgz

ceph-kcc-controller

https://binary.mirantis.com/core/helm/ceph-kcc-controller-1.39.28.tgz

cert-manager

https://binary.mirantis.com/core/helm/cert-manager-1.39.28.tgz

cinder-csi-plugin

https://binary.mirantis.com/core/helm/cinder-csi-plugin-1.39.28.tgz

client-certificate-controller

https://binary.mirantis.com/core/helm/client-certificate-controller-1.39.28.tgz

configuration-collector

https://binary.mirantis.com/core/helm/configuration-collector-1.39.28.tgz

event-controller

https://binary.mirantis.com/core/helm/event-controller-1.39.28.tgz

host-os-modules-controller

https://binary.mirantis.com/core/helm/host-os-modules-controller-1.39.28.tgz

iam-controller

https://binary.mirantis.com/core/helm/iam-controller-1.39.28.tgz

kaas-exporter

https://binary.mirantis.com/core/helm/kaas-exporter-1.39.28.tgz

kaas-public-api

https://binary.mirantis.com/core/helm/kaas-public-api-1.39.28.tgz

kaas-ui

https://binary.mirantis.com/core/helm/kaas-ui-1.39.28.tgz

lcm-controller

https://binary.mirantis.com/core/helm/lcm-controller-1.39.28.tgz

license-controller

https://binary.mirantis.com/core/helm/license-controller-1.39.28.tgz

machinepool-controller

https://binary.mirantis.com/core/helm/machinepool-controller-1.39.28.tgz

mcc-cache

https://binary.mirantis.com/core/helm/mcc-cache-1.39.28.tgz

mcc-cache-warmup

https://binary.mirantis.com/core/helm/mcc-cache-warmup-1.39.28.tgz

metrics-server

https://binary.mirantis.com/core/helm/metrics-server-1.39.28.tgz

openstack-cloud-controller-manager

https://binary.mirantis.com/core/helm/openstack-cloud-controller-manager-1.39.28.tgz

openstack-provider

https://binary.mirantis.com/core/helm/openstack-provider-1.39.28.tgz

os-credentials-controller

https://binary.mirantis.com/core/helm/os-credentials-controller-1.39.28.tgz

policy-controller

https://binary.mirantis.com/core/helm/policy-controller-1.39.28.tgz

portforward-controller

https://binary.mirantis.com/core/helm/portforward-controller-1.39.28.tgz

proxy-controller

https://binary.mirantis.com/core/helm/proxy-controller-1.39.28.tgz

rbac-controller

https://binary.mirantis.com/core/helm/rbac-controller-1.39.28.tgz

release-controller

https://binary.mirantis.com/core/helm/release-controller-1.39.28.tgz

rhellicense-controller

https://binary.mirantis.com/core/helm/rhellicense-controller-1.39.28.tgz

scope-controller

https://binary.mirantis.com/core/helm/scope-controller-1.39.28.tgz

squid-proxy

https://binary.mirantis.com/core/helm/squid-proxy-1.39.28.tgz

storage-discovery

https://binary.mirantis.com/core/helm/storage-discovery-1.39.28.tgz

user-controller

https://binary.mirantis.com/core/helm/user-controller-1.39.28.tgz

vsphere-cloud-controller-manager

https://binary.mirantis.com/core/helm/vsphere-cloud-controller-manager-1.39.28.tgz

vsphere-credentials-controller

https://binary.mirantis.com/core/helm/vsphere-credentials-controller-1.39.28.tgz

vsphere-csi-plugin

https://binary.mirantis.com/core/helm/vsphere-csi-plugin-1.39.28.tgz

vsphere-provider

https://binary.mirantis.com/core/helm/vsphere-provider-1.39.28.tgz

vsphere-vm-template-controller

https://binary.mirantis.com/core/helm/vsphere-vm-template-controller-1.39.28.tgz

Docker images

admission-controller Updated

mirantis.azurecr.io/core/admission-controller:1.39.28

agent-controller Updated

mirantis.azurecr.io/core/agent-controller:1.39.28

byo-cluster-api-controller Updated

mirantis.azurecr.io/core/byo-cluster-api-controller:1.39.28

byo-credentials-controller Updated

mirantis.azurecr.io/core/byo-credentials-controller:1.39.28

ceph-kcc-controller Updated

mirantis.azurecr.io/core/ceph-kcc-controller:1.39.28

cert-manager-controller

mirantis.azurecr.io/core/external/cert-manager-controller:v1.11.0-6

cinder-csi-plugin

mirantis.azurecr.io/lcm/kubernetes/cinder-csi-plugin:v1.27.2-16

client-certificate-controller Updated

mirantis.azurecr.io/core/client-certificate-controller:1.39.28

configuration-collector Updated

mirantis.azurecr.io/core/configuration-collector:1.39.28

csi-attacher

mirantis.azurecr.io/lcm/k8scsi/csi-attacher:v4.2.0-5

csi-node-driver-registrar

mirantis.azurecr.io/lcm/k8scsi/csi-node-driver-registrar:v2.7.0-5

csi-provisioner

mirantis.azurecr.io/lcm/k8scsi/csi-provisioner:v3.4.1-5

csi-resizer

mirantis.azurecr.io/lcm/k8scsi/csi-resizer:v1.7.0-5

csi-snapshotter

mirantis.azurecr.io/lcm/k8scsi/csi-snapshotter:v6.2.1-mcc-4

event-controller Updated

mirantis.azurecr.io/core/event-controller:1.39.28

frontend Updated

mirantis.azurecr.io/core/frontend:1.39.28

host-os-modules-controller Updated

mirantis.azurecr.io/core/host-os-modules-controller:1.39.28

iam-controller Updated

mirantis.azurecr.io/core/iam-controller:1.39.28

kaas-exporter Updated

mirantis.azurecr.io/core/kaas-exporter:1.39.28

kproxy Updated

mirantis.azurecr.io/core/kproxy:1.39.28

lcm-controller Updated

mirantis.azurecr.io/core/lcm-controller:1.39.28

license-controller Updated

mirantis.azurecr.io/core/license-controller:1.39.28

livenessprobe

mirantis.azurecr.io/lcm/k8scsi/livenessprobe:v2.9.0-5

machinepool-controller Updated

mirantis.azurecr.io/core/machinepool-controller:1.39.28

mcc-haproxy

mirantis.azurecr.io/lcm/mcc-haproxy:v0.24.0-47-gf77368e

mcc-keepalived

mirantis.azurecr.io/lcm/mcc-keepalived:v0.24.0-47-gf77368e

metrics-server

mirantis.azurecr.io/lcm/metrics-server-amd64:v0.6.3-7

nginx Updated

mirantis.azurecr.io/core/external/nginx:1.39.28

openstack-cloud-controller-manager

mirantis.azurecr.io/lcm/kubernetes/openstack-cloud-controller-manager:v1.27.2-16

openstack-cluster-api-controller Updated

mirantis.azurecr.io/core/openstack-cluster-api-controller:1.39.28

os-credentials-controller Updated

mirantis.azurecr.io/core/os-credentials-controller:1.39.28

policy-controller Updated

mirantis.azurecr.io/core/policy-controller:1.39.28

portforward-controller Updated

mirantis.azurecr.io/core/portforward-controller:1.39.28

proxy-controller Updated

mirantis.azurecr.io/core/proxy-controller:1.39.28

rbac-controller Updated

mirantis.azurecr.io/core/rbac-controller:1.39.28

registry

mirantis.azurecr.io/lcm/registry:v2.8.1-9

release-controller Updated

mirantis.azurecr.io/core/release-controller:1.39.28

rhellicense-controller Updated

mirantis.azurecr.io/core/rhellicense-controller:1.39.28

scope-controller Updated

mirantis.azurecr.io/core/scope-controller:1.39.28

squid-proxy

mirantis.azurecr.io/lcm/squid-proxy:0.0.1-10-g24a0d69

storage-discovery Updated

mirantis.azurecr.io/core/storage-discovery:1.39.28

user-controller Updated

mirantis.azurecr.io/core/user-controller:1.39.28

vsphere-cloud-controller-manager

mirantis.azurecr.io/lcm/kubernetes/vsphere-cloud-controller-manager:v1.27.0-6

vsphere-cluster-api-controller Updated

mirantis.azurecr.io/core/vsphere-cluster-api-controller:1.39.28

vsphere-credentials-controller Updated

mirantis.azurecr.io/core/vsphere-credentials-controller:1.39.28

vsphere-csi-driver

mirantis.azurecr.io/lcm/kubernetes/vsphere-csi-driver:v3.0.2-1

vsphere-csi-syncer

mirantis.azurecr.io/lcm/kubernetes/vsphere-csi-syncer:v3.0.2-1

vsphere-vm-template-controller Updated

mirantis.azurecr.io/core/vsphere-vm-template-controller:1.39.28

IAM artifacts

Artifact

Component

Path

Binaries

hash-generate-darwin

http://binary.mirantis.com/iam/bin/hash-generate-0.0.1-268-3cf7f17-darwin

hash-generate-linux

http://binary.mirantis.com/iam/bin/hash-generate-0.0.1-268-3cf7f17-linux

Helm charts

iam Updated

https://binary.mirantis.com/core/helm/iam-1.39.28.tgz

Docker images

kubectl

mirantis.azurecr.io/stacklight/kubectl:1.22-20240501023013

kubernetes-entrypoint

mirantis.azurecr.io/openstack/extra/kubernetes-entrypoint:v1.0.1-ba8ada4-20240405150338

mariadb Updated

mirantis.azurecr.io/general/mariadb:10.6.17-focal-20240523075821

mcc-keycloak

mirantis.azurecr.io/iam/mcc-keycloak:23.0.6-20240216125244

See also

Patch releases

2.26.4

The Container Cloud patch release 2.26.4, which is based on the 2.26.0 major release, provides the following updates:

  • Support for the patch Cluster releases 16.1.4 and 17.1.4 that represents Mirantis OpenStack for Kubernetes (MOSK) patch release 24.1.4.

  • Support for MKE 3.7.8.

  • Bare metal: update of Ubuntu mirror from 20.04~20240411171541 to 20.04~20240502102020 along with update of minor kernel version from 5.15.0-102-generic to 5.15.0-105-generic.

  • Security fixes for CVEs in images.

  • Bug fixes.

This patch release also supports the latest major Cluster releases 17.1.0 and 16.1.0. And it does not support greenfield deployments based on deprecated Cluster releases. Use the latest available Cluster release instead.

For main deliverables of the parent Container Cloud release of 2.26.4, refer to 2.26.0.

Security notes

The table below includes the total numbers of addressed unique and common CVEs in images by product component since the Container Cloud 2.26.3 patch release. The common CVEs are issues addressed across several images.

Addressed CVEs - summary

Product component

CVE type

Critical

High

Total

Ceph

Unique

0

1

1

Common

0

3

3

StackLight

Unique

2

8

10

Common

6

9

15

Mirantis Security Portal

For the detailed list of fixed and existing CVEs across the Mirantis Container Cloud and MOSK products, refer to Mirantis Security Portal.

MOSK CVEs

For the number of fixed CVEs in the MOSK-related components including OpenStack and Tungsten Fabric, refer to MOSK 24.1.4: Security notes.

Addressed issues

The following issues have been addressed in the Container Cloud patch release 2.26.4 along with the patch Cluster releases 17.1.4 and 16.1.4.

  • [41806] [Container Cloud web UI] Fixed the issue with failure to configure management cluster using the Configure cluster web UI menu without updating the Keycloak Truststore settings.

Known issues

This section lists known issues with workarounds for the Mirantis Container Cloud release 2.26.4 including the Cluster releases 17.1.4 and 16.1.4.

For other issues that can occur while deploying and operating a Container Cloud cluster, see Deployment Guide: Troubleshooting and Operations Guide: Troubleshooting.

Note

This section also outlines still valid known issues from previous Container Cloud releases.

Bare metal
[46245] Lack of access permissions for HOC and HOCM objects

Fixed in 2.28.0 (17.3.0 and 16.3.0)

When trying to list the HostOSConfigurationModules and HostOSConfiguration custom resources, serviceuser or a user with the global-admin or operator role obtains the access denied error. For example:

kubectl --kubeconfig ~/.kube/mgmt-config get hocm

Error from server (Forbidden): hostosconfigurationmodules.kaas.mirantis.com is forbidden:
User "2d74348b-5669-4c65-af31-6c05dbedac5f" cannot list resource "hostosconfigurationmodules"
in API group "kaas.mirantis.com" at the cluster scope: access denied

Workaround:

  1. Modify the global-admin role by adding a new entry with the following contents to the rules list:

    kubectl edit clusterroles kaas-global-admin
    
    - apiGroups: [kaas.mirantis.com]
      resources: [hostosconfigurationmodules]
      verbs: ['*']
    
  2. For each Container Cloud project, modify the kaas-operator role by adding a new entry with the following contents to the rules list:

    kubectl -n <projectName> edit roles kaas-operator
    
    - apiGroups: [kaas.mirantis.com]
      resources: [hostosconfigurations]
      verbs: ['*']
    
[42408] Kernel is not updated on manager nodes after cluster update

Fixed in 17.1.5 and 16.1.5

After a managed cluster update, old versions of system packages, including the kernel, may remain on the manager nodes. This issue occurs because the task responsible for updating packages fails to run after updating Ubuntu mirrors.

As a workaround, manually run apt-get upgrade on every manager node after the cluster update but before rebooting the node.
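For example, a typical sequence on each manager node is as follows; refreshing the package lists first is optional if they are already current, and review the list of packages to be upgraded before confirming:

sudo apt-get update
sudo apt-get upgrade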

[42386] A load balancer service does not obtain the external IP address

Due to the MetalLB upstream issue, a load balancer service may not obtain the external IP address.

The issue occurs when two services share the same external IP address and have the same externalTrafficPolicy value. Initially, the services have the external IP address assigned and are accessible. After modifying the externalTrafficPolicy value for both services from Cluster to Local, the first service that has been changed remains with no external IP address assigned. However, the second service, which was changed later, has the external IP assigned as expected.

To work around the issue, make a dummy change to the service object where external IP is <pending>:

  1. Identify the service that is stuck:

    kubectl get svc -A | grep pending
    

    Example of system response:

    stacklight  iam-proxy-prometheus  LoadBalancer  10.233.28.196  <pending>  443:30430/TCP
    
  2. Add an arbitrary label to the service that is stuck. For example:

    kubectl label svc -n stacklight iam-proxy-prometheus reconcile=1
    

    Example of system response:

    service/iam-proxy-prometheus labeled
    
  3. Verify that the external IP was allocated to the service:

    kubectl get svc -n stacklight iam-proxy-prometheus
    

    Example of system response:

    NAME                  TYPE          CLUSTER-IP     EXTERNAL-IP  PORT(S)        AGE
    iam-proxy-prometheus  LoadBalancer  10.233.28.196  10.0.34.108  443:30430/TCP  12d
    
[41305] DHCP responses are lost between dnsmasq and dhcp-relay pods

Fixed in 2.28.0 (17.3.0 and 16.3.0)

After node maintenance of a management cluster, the newly added nodes may fail to undergo provisioning successfully. The issue relates to new nodes that are in the same L2 domain as the management cluster.

The issue was observed in environments where management cluster nodes are configured with a single L2 segment used for all network traffic (PXE and LCM/management networks).

To verify whether the cluster is affected:

Verify whether the dnsmasq and dhcp-relay pods run on the same node in the management cluster:

kubectl -n kaas get pods -o wide | grep -e "dhcp\|dnsmasq"

Example of system response:

dhcp-relay-7d85f75f76-5vdw2   2/2   Running   2 (36h ago)   36h   10.10.0.122     kaas-node-8a24b81c-76d0-4d4c-8421-962bd39df5ad   <none>   <none>
dnsmasq-8f4b484b4-slhbd       5/5   Running   1 (36h ago)   36h   10.233.123.75   kaas-node-8a24b81c-76d0-4d4c-8421-962bd39df5ad   <none>   <none>

If this is the case, proceed to the workaround below.

Workaround:

  1. Log in to a node that contains kubeconfig of the affected management cluster.

  2. Make sure that at least two management cluster nodes are schedulable:

    kubectl get node
    

    Example of a positive system response:

    NAME                                             STATUS   ROLES    AGE   VERSION
    kaas-node-bcedb87b-b3ce-46a4-a4ca-ea3068689e40   Ready    master   37h   v1.27.10-mirantis-1
    kaas-node-8a24b81c-76d0-4d4c-8421-962bd39df5ad   Ready    master   37h   v1.27.10-mirantis-1
    kaas-node-ad5a6f51-b98f-43c3-91d5-55fed3d0ff21   Ready    master   37h   v1.27.10-mirantis-1
    
  3. Delete the dhcp-relay pod:

    kubectl -n kaas delete pod <dhcp-relay-xxxxx>
    
  4. Verify that the dnsmasq and dhcp-relay pods are scheduled into different nodes:

    kubectl -n kaas get pods -o wide | grep -e "dhcp\|dnsmasq"
    

    Example of a positive system response:

    dhcp-relay-7d85f75f76-rkv03   2/2   Running   0             49s   10.10.0.121     kaas-node-bcedb87b-b3ce-46a4-a4ca-ea3068689e40   <none>   <none>
    dnsmasq-8f4b484b4-slhbd       5/5   Running   1 (37h ago)   37h   10.233.123.75   kaas-node-8a24b81c-76d0-4d4c-8421-962bd39df5ad   <none>   <none>
    
[24005] Deletion of a node with ironic Pod is stuck in the Terminating state

During deletion of a manager machine running the ironic Pod from a bare metal management cluster, the following problems occur:

  • All Pods are stuck in the Terminating state

  • A new ironic Pod fails to start

  • The related bare metal host is stuck in the deprovisioning state

As a workaround, before deletion of the node running the ironic Pod, cordon and drain the node using the kubectl cordon <nodeName> and kubectl drain <nodeName> commands.
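
For example, assuming <nodeName> is the manager node that runs the ironic Pod, the sequence may look as follows; the drain options shown are common kubectl flags and may require adjustment for your workloads:

kubectl cordon <nodeName>
kubectl drain <nodeName> --ignore-daemonsets --delete-emptydir-data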


LCM
[41540] LCM Agent cannot grab storage information on a host

Fixed in 17.1.5 and 16.1.5

Due to issues with managing physical NVME devices, lcm-agent cannot grab storage information on a host. As a result, lcmmachine.status.hostinfo.hardware is empty and the following example error is present in logs:

{"level":"error","ts":"2024-05-02T12:26:10Z","logger":"agent", \
"msg":"get hardware details", \
"host":"kaas-node-548b2861-aed0-41c9-8ff2-10c5476b000b", \
"error":"new storage info: get disk info \"nvme0c0n1\": \
invoke command: exit status 1","errorVerbose":"exit status 1

As a workaround, on the affected node, create a symlink for any device indicated in lcm-agent logs. For example:

ln -sfn /dev/nvme0n1 /dev/nvme0c0n1

[39437] Failure to replace a master node on a Container Cloud cluster

Fixed in 2.29.0 (17.4.0 and 16.4.0)

During the replacement of a master node on a cluster of any type, the process may get stuck with Kubelet's NodeReady condition is Unknown in the machine status on the remaining master nodes.

As a workaround, log in on the affected node and run the following command:

docker restart ucp-kubelet
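
After the restart, you can verify that the condition clears, for example, by checking the node status on the affected cluster and the machine status in the corresponding project of the management cluster; <projectName> is a placeholder:

kubectl get nodes
kubectl -n <projectName> get machines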

[31186,34132] Pods get stuck during MariaDB operations

During MariaDB operations on a management cluster, Pods may get stuck in continuous restarts with the following example error:

[ERROR] WSREP: Corrupt buffer header: \
addr: 0x7faec6f8e518, \
seqno: 3185219421952815104, \
size: 909455917, \
ctx: 0x557094f65038, \
flags: 11577. store: 49, \
type: 49

Workaround:

  1. Create a backup of the /var/lib/mysql directory on the mariadb-server Pod.

  2. Verify that other replicas are up and ready.

  3. Remove the galera.cache file for the affected mariadb-server Pod.

  4. Remove the affected mariadb-server Pod or wait until it is automatically restarted.

After Kubernetes restarts the Pod, the Pod clones the database in 1-2 minutes and restores the quorum.
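
The steps above may, for example, map to the following commands; the namespace and Pod names are placeholders and must be adjusted to the affected cluster:

# Step 1: back up the data directory of the affected Pod to the local machine
kubectl cp <namespace>/<mariadb-server-pod>:/var/lib/mysql ./mysql-backup
# Step 2: verify that the other replicas are up and ready
kubectl -n <namespace> get pods | grep mariadb-server
# Step 3: remove the galera.cache file on the affected Pod
kubectl -n <namespace> exec <mariadb-server-pod> -- rm /var/lib/mysql/galera.cache
# Step 4: delete the affected Pod so that Kubernetes recreates it
kubectl -n <namespace> delete pod <mariadb-server-pod>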

[30294] Replacement of a master node is stuck on the calico-node Pod start

Fixed in 2.28.4 (17.3.4 and 16.3.4)

During replacement of a master node on a cluster of any type, the calico-node Pod fails to start on a new node that has the same IP address as the node being replaced.

Workaround:

  1. Log in to any master node.

  2. From a CLI with an MKE client bundle, create a shell alias to start calicoctl using the mirantis/ucp-dsinfo image. Use one of the following two options depending on where the etcd certificates reside on your cluster, as shown in the volume mounts of each alias:

    alias calicoctl="\
    docker run -i --rm \
    --pid host \
    --net host \
    -e constraint:ostype==linux \
    -e ETCD_ENDPOINTS=<etcdEndpoint> \
    -e ETCD_KEY_FILE=/var/lib/docker/volumes/ucp-kv-certs/_data/key.pem \
    -e ETCD_CA_CERT_FILE=/var/lib/docker/volumes/ucp-kv-certs/_data/ca.pem \
    -e ETCD_CERT_FILE=/var/lib/docker/volumes/ucp-kv-certs/_data/cert.pem \
    -v /var/run/calico:/var/run/calico \
    -v /var/lib/docker/volumes/ucp-kv-certs/_data:/var/lib/docker/volumes/ucp-kv-certs/_data:ro \
    mirantis/ucp-dsinfo:<mkeVersion> \
    calicoctl \
    "
    
    alias calicoctl="\
    docker run -i --rm \
    --pid host \
    --net host \
    -e constraint:ostype==linux \
    -e ETCD_ENDPOINTS=<etcdEndpoint> \
    -e ETCD_KEY_FILE=/ucp-node-certs/key.pem \
    -e ETCD_CA_CERT_FILE=/ucp-node-certs/ca.pem \
    -e ETCD_CERT_FILE=/ucp-node-certs/cert.pem \
    -v /var/run/calico:/var/run/calico \
    -v ucp-node-certs:/ucp-node-certs:ro \
    mirantis/ucp-dsinfo:<mkeVersion> \
    calicoctl --allow-version-mismatch \
    "
    

    In the above command, replace the following values with the corresponding settings of the affected cluster:

    • <etcdEndpoint> is the etcd endpoint defined in the Calico configuration file. For example, ETCD_ENDPOINTS=127.0.0.1:12378

    • <mkeVersion> is the MKE version installed on your cluster. For example, mirantis/ucp-dsinfo:3.5.7.

  3. Verify the node list on the cluster:

    kubectl get node
    
  4. Compare this list with the node list in Calico to identify the old node:

    calicoctl get node -o wide
    
  5. Remove the old node from Calico:

    calicoctl delete node kaas-node-<nodeID>
    
[5782] Manager machine fails to be deployed during node replacement

Fixed in 2.28.4 (17.3.4 and 16.3.4)

During replacement of a manager machine, the following problems may occur:

  • The system adds the node to Docker swarm but not to Kubernetes

  • The node Deployment gets stuck with failed RethinkDB health checks

Workaround:

  1. Delete the failed node.

  2. Wait for the MKE cluster to become healthy. To monitor the cluster status:

    1. Log in to the MKE web UI as described in Connect to the Mirantis Kubernetes Engine web UI.

    2. Monitor the cluster status as described in MKE Operations Guide: Monitor an MKE cluster with the MKE web UI.

  3. Deploy a new node.

[5568] The calico-kube-controllers Pod fails to clean up resources

Fixed in 2.28.4 (17.3.4 and 16.3.4)

During the unsafe or forced deletion of a manager machine running the calico-kube-controllers Pod in the kube-system namespace, the following issues occur:

  • The calico-kube-controllers Pod fails to clean up resources associated with the deleted node

  • The calico-node Pod may fail to start up on a newly created node if the machine is provisioned with the same IP address as the deleted machine had

As a workaround, before deletion of the node running the calico-kube-controllers Pod, cordon and drain the node:

kubectl cordon <nodeName>
kubectl drain <nodeName>

Ceph
[41819] Graceful cluster reboot is blocked by the Ceph ClusterWorkloadLocks

Fixed in 2.27.0 (17.2.0 and 16.2.0)

During graceful reboot of a cluster with Ceph enabled, the reboot is blocked with the following message in the MiraCephMaintenance object status:

message: ClusterMaintenanceRequest found, Ceph Cluster is not ready to upgrade,
 delaying cluster maintenance

As a workaround, add the following snippet to the cephFS section under metadataServer in the spec section of <kcc-name>.yaml in the Ceph cluster:

cephClusterSpec:
  sharedFilesystem:
    cephFS:
    - name: cephfs-store
      metadataServer:
        activeCount: 1
        healthCheck:
          livenessProbe:
            probe:
              failureThreshold: 5
              initialDelaySeconds: 30
              periodSeconds: 30
              successThreshold: 1
              timeoutSeconds: 5
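
For example, assuming the Ceph cluster is defined through a KaaSCephCluster object named <kcc-name>, you can add the snippet by editing the object directly; the project and object names below are placeholders:

kubectl -n <projectName> edit kaascephcluster <kcc-name>
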
[26441] Cluster update fails with the MountDevice failed for volume warning

Update of a managed cluster based on bare metal with Ceph enabled fails with the PersistentVolumeClaim stuck in the Pending state for the prometheus-server StatefulSet and the MountVolume.MountDevice failed for volume warning in the StackLight event logs.

Workaround:

  1. Verify that the description of the Pods that failed to run contains the FailedMount events:

    kubectl -n <affectedProjectName> describe pod <affectedPodName>
    

    In the command above, replace the following values:

    • <affectedProjectName> is the Container Cloud project name where the Pods failed to run

    • <affectedPodName> is a Pod name that failed to run in the specified project

    In the Pod description, identify the node name where the Pod failed to run.

  2. Verify that the csi-rbdplugin logs of the affected node contain the rbd volume mount failed: <csi-vol-uuid> is being used error. The <csi-vol-uuid> is a unique RBD volume name.

    1. Identify csiPodName of the corresponding csi-rbdplugin:

      kubectl -n rook-ceph get pod -l app=csi-rbdplugin \
      -o jsonpath='{.items[?(@.spec.nodeName == "<nodeName>")].metadata.name}'
      
    2. Output the affected csiPodName logs:

      kubectl -n rook-ceph logs <csiPodName> -c csi-rbdplugin
      
  3. Scale down the affected StatefulSet or Deployment of the Pod that fails to 0 replicas.

  4. On every csi-rbdplugin Pod, search for stuck csi-vol:

    for pod in `kubectl -n rook-ceph get pods|grep rbdplugin|grep -v provisioner|awk '{print $1}'`; do
      echo $pod
      kubectl exec -it -n rook-ceph $pod -c csi-rbdplugin -- rbd device list | grep <csi-vol-uuid>
    done
    
  5. Unmap the affected csi-vol:

    rbd unmap -o force /dev/rbd<i>
    

    The /dev/rbd<i> value is a mapped RBD volume that uses csi-vol.

  6. Delete volumeattachment of the affected Pod:

    kubectl get volumeattachments | grep <csi-vol-uuid>
    kubectl delete volumeattachment <id>
    
  7. Scale up the affected StatefulSet or Deployment back to the original number of replicas and wait until its state becomes Running.


StackLight
[42304] Failure of shard relocation in the OpenSearch cluster

Fixed in 17.2.0, 16.2.0, 17.1.6, 16.1.6

On large managed clusters, shard relocation may fail in the OpenSearch cluster with the yellow or red status of the OpenSearch cluster. The characteristic symptom of the issue is that in the stacklight namespace, the statefulset.apps/opensearch-master containers are experiencing throttling with the KubeContainersCPUThrottlingHigh alert firing for the following set of labels:

{created_by_kind="StatefulSet",created_by_name="opensearch-master",namespace="stacklight"}

Caution

The throttling that OpenSearch is experiencing may be a temporary situation related, for example, to a peak load or to the ongoing shard initialization as part of disaster recovery or after a node restart. In this case, Mirantis recommends waiting until the initialization of all shards is finished. After that, verify the cluster state and whether the throttling still exists. Apply the workaround below only if the throttling does not disappear.

To verify that the initialization of shards is ongoing:

kubectl exec -it pod/opensearch-master-0 -n stacklight -c opensearch -- bash

curl "http://localhost:9200/_cat/shards" | grep INITIALIZING

Example of system response:

.ds-system-000072    2 r INITIALIZING    10.232.182.135 opensearch-master-1
.ds-system-000073    1 r INITIALIZING    10.232.7.145   opensearch-master-2
.ds-system-000073    2 r INITIALIZING    10.232.182.135 opensearch-master-1
.ds-audit-000001     2 r INITIALIZING    10.232.7.145   opensearch-master-2

The system response above indicates that shards from the .ds-system-000072, .ds-system-000073, and .ds-audit-000001 indices are in the INITIALIZING state. In this case, Mirantis recommends waiting until this process is finished and only then considering a change of the limit.

You can additionally analyze the exact level of throttling and the current CPU usage on the Kubernetes Containers dashboard in Grafana.
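
You can also check the overall OpenSearch cluster status and the number of initializing shards directly, for example, from inside the opensearch container used in the commands above:

curl "http://localhost:9200/_cluster/health?pretty"

The response includes the status field (green, yellow, or red) and the initializing_shards counter.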

Workaround:

  1. Verify the currently configured CPU requests and limits for the opensearch containers:

    kubectl -n stacklight get statefulset.apps/opensearch-master -o jsonpath="{.spec.template.spec.containers[?(@.name=='opensearch')].resources}"
    

    Example of system response:

    {"limits":{"cpu":"600m","memory":"8Gi"},"requests":{"cpu":"500m","memory":"6Gi"}}
    

    In the example above, the CPU request is 500m and the CPU limit is 600m.

  2. Increase the CPU limit to a reasonably high number.

    For example, the default CPU limit for the clusters with the clusterSize:large parameter set was increased from 8000m to 12000m for StackLight in Container Cloud 2.27.0 (Cluster releases 17.2.0 and 16.2.0).

    Note

    For details on the clusterSize parameter, see MOSK Operations Guide: StackLight configuration parameters - Cluster size.

    If the defaults are already overridden on the affected cluster using the resourcesPerClusterSize or resources parameters as described in MOSK Operations Guide: StackLight configuration parameters - Resource limits, then the exact recommended number depends on the currently set limit.

    Mirantis recommends increasing the limit by 50%. If it does not resolve the issue, another increase iteration will be required.

  3. When you select the required CPU limit, increase it as described in MOSK Operations Guide: StackLight configuration parameters - Resource limits.

    If the CPU limit for the opensearch component is already set, increase it in the Cluster object for the opensearch parameter. Otherwise, the default StackLight limit is used. In this case, increase the CPU limit for the opensearch component using the resources parameter.

  4. Wait until all opensearch-master pods are recreated with the new CPU limits and become running and ready.

    To verify the current CPU limit for every opensearch container in every opensearch-master pod separately:

    kubectl -n stacklight get pod/opensearch-master-<podSuffixNumber> -o jsonpath="{.spec.containers[?(@.name=='opensearch')].resources}"
    

    In the command above, replace <podSuffixNumber> with the name of the pod suffix. For example, pod/opensearch-master-0 or pod/opensearch-master-2.

    Example of system response:

    {"limits":{"cpu":"900m","memory":"8Gi"},"requests":{"cpu":"500m","memory":"6Gi"}}
    

    The waiting time may take up to 20 minutes depending on the cluster size.

If the issue is fixed, the KubeContainersCPUThrottlingHigh alert stops firing immediately, while OpenSearchClusterStatusWarning or OpenSearchClusterStatusCritical can still be firing for some time during shard relocation.

If the KubeContainersCPUThrottlingHigh alert is still firing, proceed with another iteration of the CPU limit increase.
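
As an illustration of the configuration described above, a CPU limit override for the opensearch component may look similar to the following snippet in the Cluster object. The exact location of the StackLight values and the parameter structure are assumptions here and must be verified against MOSK Operations Guide: StackLight configuration parameters - Resource limits:

spec:
  providerSpec:
    value:
      helmReleases:
      - name: stacklight
        values:
          resources:
            opensearch:
              limits:
                cpu: "12000m"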

[40020] Rollover policy update is not applied to the current index

Fixed in 17.2.0, 16.2.0, 17.1.6, 16.1.6

When you update rollover_policy for the current system* and audit* data streams, the update is not applied to the existing indices.

One of the indicators that the cluster is most likely affected is the KubeJobFailed alert firing for the elasticsearch-curator job, with one or both of the following errors present in the elasticsearch-curator pods that remain in the Error status:

2024-05-31 13:16:04,459 ERROR   Failed to complete action: delete_indices.  <class 'curator.exceptions.FailedExecution'>: Exception encountered.  Rerun with loglevel DEBUG and/or check Elasticsearch logs for more information. Exception: RequestError(400, 'illegal_argument_exception', 'index [.ds-audit-000001] is the write index for data stream [audit] and cannot be deleted')

or

2024-05-31 13:16:04,459 ERROR   Failed to complete action: delete_indices.  <class 'curator.exceptions.FailedExecution'>: Exception encountered.  Rerun with loglevel DEBUG and/or check Elasticsearch logs for more information. Exception: RequestError(400, 'illegal_argument_exception', 'index [.ds-system-000001] is the write index for data stream [system] and cannot be deleted')

Note

Instead of .ds-audit-000001 or .ds-system-000001 index names, similar names can be present with the same prefix but different suffix numbers.

If the above-mentioned alert and errors are present, immediate action is required because the corresponding index size has already exceeded the space allocated for the index.

To verify that the cluster is affected:

Caution

Verify and apply the workaround to both index patterns, system and audit, separately.

If one of the indices is affected, the second one is most likely affected as well, although in rare cases only one index may be affected.

  1. Log in to the opensearch-master-0 Pod:

    kubectl exec -it pod/opensearch-master-0 -n stacklight -c opensearch -- bash
    
  2. Verify that the rollover policy is present:

    • system:

      curl localhost:9200/_plugins/_ism/policies/system_rollover_policy
      
    • audit:

      curl localhost:9200/_plugins/_ism/policies/audit_rollover_policy
      

    The cluster is affected if the rollover policy is missing. Otherwise, proceed to the following step.

  3. Verify the system response from the previous step. For example:

    {"_id":"system_rollover_policy","_version":7229,"_seq_no":42362,"_primary_term":28,"policy":{"policy_id":"system_rollover_policy","description":"system index rollover policy.","last_updated_time":1708505222430,"schema_version":19,"error_notification":null,"default_state":"rollover","states":[{"name":"rollover","actions":[{"retry":{"count":3,"backoff":"exponential","delay":"1m"},"rollover":{"min_size":"14746mb","copy_alias":false}}],"transitions":[]}],"ism_template":[{"index_patterns":["system*"],"priority":200,"last_updated_time":1708505222430}]}}
    

    Verify and capture the following items separately for every policy:

    • The _seq_no and _primary_term values

    • The rollover policy threshold, which is defined in policy.states[0].actions[0].rollover.min_size

  4. List indices:

    • system:

      curl localhost:9200/_cat/indices | grep system
      

      Example of system response:

      [...]
      green open .ds-system-000001   FjglnZlcTKKfKNbosaE9Aw 2 1 1998295  0   1gb 507.9mb
      
    • audit:

      curl localhost:9200/_cat/indices | grep audit
      

      Example of system response:

      [...]
      green open .ds-audit-000001   FjglnZlcTKKfKNbosaE9Aw 2 1 1998295  0   1gb 507.9mb
      
  5. Select the index with the highest number and verify the rollover policy attached to the index:

    • system:

      curl localhost:9200/_plugins/_ism/explain/.ds-system-000001
      
    • audit:

      curl localhost:9200/_plugins/_ism/explain/.ds-audit-000001
      
    • If the rollover policy is not attached, the cluster is affected.

    • If the rollover policy is attached but _seq_no and _primary_term numbers do not match the previously captured ones, the cluster is affected.

    • If the index size drastically exceeds the defined threshold of the rollover policy (which is the previously captured min_size), the cluster is most probably affected.

Workaround:

  1. Log in to the opensearch-master-0 Pod:

    kubectl exec -it pod/opensearch-master-0 -n stacklight -c opensearch -- bash
    
  2. If the policy is attached to the index but has different _seq_no and _primary_term, remove the policy from the index:

    Note

    Use the index with the highest number in the name, which was captured during verification procedure.

    • system:

      curl -XPOST localhost:9200/_plugins/_ism/remove/.ds-system-000001
      
    • audit:

      curl -XPOST localhost:9200/_plugins/_ism/remove/.ds-audit-000001
      
  3. Re-add the policy:

    • system:

      curl -XPOST -H "Content-type: application/json" localhost:9200/_plugins/_ism/add/system* -d'{"policy_id":"system_rollover_policy"}'
      
    • audit:

      curl -XPOST -H "Content-type: application/json" localhost:9200/_plugins/_ism/add/audit* -d'{"policy_id":"audit_rollover_policy"}'
      
  4. Perform again the last step of the cluster verification procedure provided above and make sure that the policy is attached to the index and has the same _seq_no and _primary_term.

    If the index size drastically exceeds the defined threshold of the rollover policy (which is the previously captured min_size), wait up to 15 minutes and verify that the additional index is created with the consecutive number in the index name. For example:

    • system: if you applied changes to .ds-system-000001, wait until .ds-system-000002 is created.

    • audit: if you applied changes to .ds-audit-000001, wait until .ds-audit-000002 is created.

    If such an index is not created, escalate the issue to Mirantis support.

Update notes

This section describes the specific actions that you, as a cloud operator, need to complete before or after your Container Cloud cluster update to the Cluster releases 17.1.4 or 16.1.4.

Consider this information as a supplement to the generic update procedures published in Operations Guide: Automatic upgrade of a management cluster and Update a patch Cluster release of a managed cluster.

Post-update actions
Configure Kubernetes auditing and profiling for log rotation

After the MKE update to 3.7.8, if you plan to enable or have already enabled Kubernetes auditing and profiling on your managed or management cluster, keep in mind that enabling audit log rotation requires an additional step: set the following options in the MKE configuration file after enabling auditing and profiling:

[cluster_config]
  kube_api_server_audit_log_maxage=30
  kube_api_server_audit_log_maxbackup=10
  kube_api_server_audit_log_maxsize=10

For the configuration procedure, see MKE documentation: Configure an existing MKE cluster.

While using this procedure, replace the command to upload the newly edited MKE configuration file with the following one:

curl --silent --insecure -X PUT -H "X-UCP-Allow-Restricted-API: i-solemnly-swear-i-am-up-to-no-good" -H "accept: application/toml" -H "Authorization: Bearer $AUTHTOKEN" --upload-file 'mke-config.toml' https://$MKE_HOST/api/ucp/config-toml

In this command:

  • The value for MKE_HOST has the <loadBalancerHost>:6443 format, where loadBalancerHost is the corresponding field in the cluster status.

  • The value for MKE_PASSWORD is taken from the ucp-admin-password-<clusterName> secret in the cluster namespace of the management cluster.

  • The value for MKE_USERNAME is always admin.
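
For reference, the AUTHTOKEN variable used in the command above is typically obtained from the MKE authentication endpoint using the MKE_USERNAME, MKE_PASSWORD, and MKE_HOST values. The following is a sketch that assumes jq is installed on the host:

AUTHTOKEN=$(curl --silent --insecure --data '{"username":"'"$MKE_USERNAME"'","password":"'"$MKE_PASSWORD"'"}' https://$MKE_HOST/auth/login | jq --raw-output .auth_token)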

Artifacts

This section lists the artifacts of components included in the Container Cloud patch release 2.26.4. For artifacts of the Cluster releases introduced in 2.26.4, see patch Cluster releases 17.1.4 and 16.1.4.

Note

The components that are newly added, updated, deprecated, or removed as compared to the previous release version are marked with a corresponding superscript, for example, lcm-ansible Updated.

Bare metal artifacts

Artifact

Component

Path

Binaries Updated

ironic-python-agent.initramfs

https://binary.mirantis.com/bm/bin/ironic/ipa/ubuntu/initramfs-yoga-focal-debug-20240502103738

ironic-python-agent.kernel

https://binary.mirantis.com/bm/bin/ironic/ipa/ubuntu/kernel-yoga-focal-debug-20240502103738

Helm charts Updated

baremetal-api

https://binary.mirantis.com/core/helm/baremetal-api-1.39.26.tgz

baremetal-operator

https://binary.mirantis.com/core/helm/baremetal-operator-1.39.26.tgz

baremetal-provider

https://binary.mirantis.com/core/helm/baremetal-provider-1.39.26.tgz

baremetal-public-api

https://binary.mirantis.com/core/helm/baremetal-public-api-1.39.26.tgz

kaas-ipam

https://binary.mirantis.com/core/helm/kaas-ipam-1.39.26.tgz

local-volume-provisioner

https://binary.mirantis.com/core/helm/local-volume-provisioner-1.39.26.tgz

metallb

https://binary.mirantis.com/core/helm/metallb-1.39.26.tgz

Docker images

ambasador Updated

mirantis.azurecr.io/core/external/nginx:1.39.26

baremetal-dnsmasq

mirantis.azurecr.io/bm/baremetal-dnsmasq:base-2-26-alpine-20240408141922

baremetal-operator Updated

mirantis.azurecr.io/bm/baremetal-operator:base-2-26-alpine-20240415095355

bm-collective

mirantis.azurecr.io/bm/bm-collective:base-2-26-alpine-20240408142218

cluster-api-provider-baremetal Updated

mirantis.azurecr.io/core/cluster-api-provider-baremetal:1.39.26

ironic Updated

mirantis.azurecr.io/openstack/ironic:yoga-jammy-20240510100941

ironic-inspector Updated

mirantis.azurecr.io/openstack/ironic-inspector:yoga-jammy-20240510100941

ironic-prometheus-exporter

mirantis.azurecr.io/stacklight/ironic-prometheus-exporter:0.1-20240117102150

kaas-ipam

mirantis.azurecr.io/bm/kaas-ipam:base-2-26-alpine-20240408150853

kubernetes-entrypoint Updated

mirantis.azurecr.io/openstack/extra/kubernetes-entrypoint:v1.0.1-ba8ada4-20240405150338

mariadb

mirantis.azurecr.io/general/mariadb:10.6.14-focal-20240311120505

mcc-keepalived

mirantis.azurecr.io/lcm/mcc-keepalived:v0.24.0-47-gf77368e

metallb-controller

mirantis.azurecr.io/bm/metallb/controller:v0.13.12-ef4c9453-amd64

metallb-speaker

mirantis.azurecr.io/bm/metallb/speaker:v0.13.12-ef4c9453-amd64

syslog-ng

mirantis.azurecr.io/bm/syslog-ng:base-alpine-20240129163811

Core artifacts

Artifact

Component

Path

Bootstrap tarball Updated

bootstrap-darwin

https://binary.mirantis.com/core/bin/bootstrap-darwin-1.39.26.tgz

bootstrap-linux

https://binary.mirantis.com/core/bin/bootstrap-linux-1.39.26.tgz

Helm charts Updated

admission-controller

https://binary.mirantis.com/core/helm/admission-controller-1.39.26.tgz

agent-controller

https://binary.mirantis.com/core/helm/agent-controller-1.39.26.tgz

byo-credentials-controller

https://binary.mirantis.com/core/helm/byo-credentials-controller-1.39.26.tgz

byo-provider

https://binary.mirantis.com/core/helm/byo-provider-1.39.26.tgz

ceph-kcc-controller

https://binary.mirantis.com/core/helm/ceph-kcc-controller-1.39.26.tgz

cert-manager

https://binary.mirantis.com/core/helm/cert-manager-1.39.26.tgz

cinder-csi-plugin

https://binary.mirantis.com/core/helm/cinder-csi-plugin-1.39.26.tgz

client-certificate-controller

https://binary.mirantis.com/core/helm/client-certificate-controller-1.39.26.tgz

configuration-collector

https://binary.mirantis.com/core/helm/configuration-collector-1.39.26.tgz

event-controller

https://binary.mirantis.com/core/helm/event-controller-1.39.26.tgz

host-os-modules-controller

https://binary.mirantis.com/core/helm/host-os-modules-controller-1.39.26.tgz

iam-controller

https://binary.mirantis.com/core/helm/iam-controller-1.39.26.tgz

kaas-exporter

https://binary.mirantis.com/core/helm/kaas-exporter-1.39.26.tgz

kaas-public-api

https://binary.mirantis.com/core/helm/kaas-public-api-1.39.26.tgz

kaas-ui

https://binary.mirantis.com/core/helm/kaas-ui-1.39.26.tgz

lcm-controller

https://binary.mirantis.com/core/helm/lcm-controller-1.39.26.tgz

license-controller

https://binary.mirantis.com/core/helm/license-controller-1.39.26.tgz

machinepool-controller

https://binary.mirantis.com/core/helm/machinepool-controller-1.39.26.tgz

mcc-cache

https://binary.mirantis.com/core/helm/mcc-cache-1.39.26.tgz

mcc-cache-warmup

https://binary.mirantis.com/core/helm/mcc-cache-warmup-1.39.26.tgz

metrics-server

https://binary.mirantis.com/core/helm/metrics-server-1.39.26.tgz

openstack-cloud-controller-manager

https://binary.mirantis.com/core/helm/openstack-cloud-controller-manager-1.39.26.tgz

openstack-provider

https://binary.mirantis.com/core/helm/openstack-provider-1.39.26.tgz

os-credentials-controller

https://binary.mirantis.com/core/helm/os-credentials-controller-1.39.26.tgz

policy-controller

https://binary.mirantis.com/core/helm/policy-controller-1.39.26.tgz

portforward-controller

https://binary.mirantis.com/core/helm/portforward-controller-1.39.26.tgz

proxy-controller

https://binary.mirantis.com/core/helm/proxy-controller-1.39.26.tgz

rbac-controller

https://binary.mirantis.com/core/helm/rbac-controller-1.39.26.tgz

release-controller

https://binary.mirantis.com/core/helm/release-controller-1.39.26.tgz

rhellicense-controller

https://binary.mirantis.com/core/helm/rhellicense-controller-1.39.26.tgz

scope-controller

https://binary.mirantis.com/core/helm/scope-controller-1.39.26.tgz

squid-proxy

https://binary.mirantis.com/core/helm/squid-proxy-1.39.26.tgz

storage-discovery

https://binary.mirantis.com/core/helm/storage-discovery-1.39.26.tgz

user-controller

https://binary.mirantis.com/core/helm/user-controller-1.39.26.tgz

vsphere-cloud-controller-manager

https://binary.mirantis.com/core/helm/vsphere-cloud-controller-manager-1.39.26.tgz

vsphere-credentials-controller

https://binary.mirantis.com/core/helm/vsphere-credentials-controller-1.39.26.tgz

vsphere-csi-plugin

https://binary.mirantis.com/core/helm/vsphere-csi-plugin-1.39.26.tgz

vsphere-provider

https://binary.mirantis.com/core/helm/vsphere-provider-1.39.26.tgz

vsphere-vm-template-controller

https://binary.mirantis.com/core/helm/vsphere-vm-template-controller-1.39.26.tgz

Docker images

admission-controller Updated

mirantis.azurecr.io/core/admission-controller:1.39.26

agent-controller Updated

mirantis.azurecr.io/core/agent-controller:1.39.26

byo-cluster-api-controller Updated

mirantis.azurecr.io/core/byo-cluster-api-controller:1.39.26

byo-credentials-controller Updated

mirantis.azurecr.io/core/byo-credentials-controller:1.39.26

ceph-kcc-controller Updated

mirantis.azurecr.io/core/ceph-kcc-controller:1.39.26

cert-manager-controller

mirantis.azurecr.io/core/external/cert-manager-controller:v1.11.0-6

cinder-csi-plugin Updated

mirantis.azurecr.io/lcm/kubernetes/cinder-csi-plugin:v1.27.2-16

client-certificate-controller Updated

mirantis.azurecr.io/core/client-certificate-controller:1.39.26

configuration-collector Updated

mirantis.azurecr.io/core/configuration-collector:1.39.26

csi-attacher

mirantis.azurecr.io/lcm/k8scsi/csi-attacher:v4.2.0-5

csi-node-driver-registrar

mirantis.azurecr.io/lcm/k8scsi/csi-node-driver-registrar:v2.7.0-5

csi-provisioner

mirantis.azurecr.io/lcm/k8scsi/csi-provisioner:v3.4.1-5

csi-resizer

mirantis.azurecr.io/lcm/k8scsi/csi-resizer:v1.7.0-5

csi-snapshotter

mirantis.azurecr.io/lcm/k8scsi/csi-snapshotter:v6.2.1-mcc-4

event-controller Updated

mirantis.azurecr.io/core/event-controller:1.39.26

frontend Updated

mirantis.azurecr.io/core/frontend:1.39.26

host-os-modules-controller Updated

mirantis.azurecr.io/core/host-os-modules-controller:1.39.26

iam-controller Updated

mirantis.azurecr.io/core/iam-controller:1.39.26

kaas-exporter Updated

mirantis.azurecr.io/core/kaas-exporter:1.39.26

kproxy Updated

mirantis.azurecr.io/core/kproxy:1.39.26

lcm-controller Updated

mirantis.azurecr.io/core/lcm-controller:1.39.26

license-controller Updated

mirantis.azurecr.io/core/license-controller:1.39.26

livenessprobe

mirantis.azurecr.io/lcm/k8scsi/livenessprobe:v2.9.0-5

machinepool-controller Updated

mirantis.azurecr.io/core/machinepool-controller:1.39.26

mcc-haproxy

mirantis.azurecr.io/lcm/mcc-haproxy:v0.24.0-47-gf77368e

mcc-keepalived

mirantis.azurecr.io/lcm/mcc-keepalived:v0.24.0-47-gf77368e

metrics-server

mirantis.azurecr.io/lcm/metrics-server-amd64:v0.6.3-7

nginx Updated

mirantis.azurecr.io/core/external/nginx:1.39.26

openstack-cloud-controller-manager Updated

mirantis.azurecr.io/lcm/kubernetes/openstack-cloud-controller-manager:v1.27.2-16

openstack-cluster-api-controller Updated

mirantis.azurecr.io/core/openstack-cluster-api-controller:1.39.26

os-credentials-controller Updated

mirantis.azurecr.io/core/os-credentials-controller:1.39.26

policy-controller Updated

mirantis.azurecr.io/core/policy-controller:1.39.26

portforward-controller Updated

mirantis.azurecr.io/core/portforward-controller:1.39.26

proxy-controller Updated

mirantis.azurecr.io/core/proxy-controller:1.39.26

rbac-controller Updated

mirantis.azurecr.io/core/rbac-controller:1.39.26

registry

mirantis.azurecr.io/lcm/registry:v2.8.1-9

release-controller Updated

mirantis.azurecr.io/core/release-controller:1.39.26

rhellicense-controller Updated

mirantis.azurecr.io/core/rhellicense-controller:1.39.26

scope-controller Updated

mirantis.azurecr.io/core/scope-controller:1.39.26

squid-proxy

mirantis.azurecr.io/lcm/squid-proxy:0.0.1-10-g24a0d69

storage-discovery Updated

mirantis.azurecr.io/core/storage-discovery:1.39.26

user-controller Updated

mirantis.azurecr.io/core/user-controller:1.39.26

vsphere-cloud-controller-manager

mirantis.azurecr.io/lcm/kubernetes/vsphere-cloud-controller-manager:v1.27.0-6

vsphere-cluster-api-controller Updated

mirantis.azurecr.io/core/vsphere-cluster-api-controller:1.39.26

vsphere-credentials-controller Updated

mirantis.azurecr.io/core/vsphere-credentials-controller:1.39.26

vsphere-csi-driver

mirantis.azurecr.io/lcm/kubernetes/vsphere-csi-driver:v3.0.2-1

vsphere-csi-syncer

mirantis.azurecr.io/lcm/kubernetes/vsphere-csi-syncer:v3.0.2-1

vsphere-vm-template-controller Updated

mirantis.azurecr.io/core/vsphere-vm-template-controller:1.39.26

IAM artifacts

Artifact

Component

Path

Binaries

hash-generate-darwin

http://binary.mirantis.com/iam/bin/hash-generate-0.0.1-268-3cf7f17-darwin

hash-generate-linux

http://binary.mirantis.com/iam/bin/hash-generate-0.0.1-268-3cf7f17-linux

Helm charts

iam Updated

https://binary.mirantis.com/core/helm/iam-1.39.26.tgz

Docker images

kubectl Updated

mirantis.azurecr.io/stacklight/kubectl:1.22-20240501023013

kubernetes-entrypoint Updated

mirantis.azurecr.io/openstack/extra/kubernetes-entrypoint:v1.0.1-ba8ada4-20240405150338

mariadb Updated

mirantis.azurecr.io/general/mariadb:10.6.17-focal-20240327104027

mcc-keycloak

mirantis.azurecr.io/iam/mcc-keycloak:23.0.6-20240216125244

See also

Patch releases

2.26.3

The Container Cloud patch release 2.26.3, which is based on the 2.26.0 major release, provides the following updates:

  • Support for the patch Cluster releases 16.1.3 and 17.1.3 that represents Mirantis OpenStack for Kubernetes (MOSK) patch release 24.1.3.

  • Support for MKE 3.7.7.

  • Bare metal: update of Ubuntu mirror from 20.04~20240324172903 to 20.04~20240411171541 along with update of minor kernel version from 5.15.0-101-generic to 5.15.0-102-generic.

  • Security fixes for CVEs in images.

  • Bug fixes.

This patch release also supports the latest major Cluster releases 17.1.0 and 16.1.0. It does not support greenfield deployments based on deprecated Cluster releases; use the latest available Cluster release instead.

For main deliverables of the parent Container Cloud release of 2.26.3, refer to 2.26.0.

Security notes

The table below includes the total numbers of addressed unique and common CVEs in images by product component since the Container Cloud 2.26.2 patch release. The common CVEs are issues addressed across several images.

Addressed CVEs - summary

Product component    CVE type    Critical    High    Total

Ceph                 Unique      0           1       1
                     Common      0           10      10

Core                 Unique      0           4       4
                     Common      0           105     105

StackLight           Unique      1           4       5
                     Common      1           24      25

Mirantis Security Portal

For the detailed list of fixed and existing CVEs across the Mirantis Container Cloud and MOSK products, refer to Mirantis Security Portal.

MOSK CVEs

For the number of fixed CVEs in the MOSK-related components including OpenStack and Tungsten Fabric, refer to MOSK 24.1.3: Security notes.

Addressed issues

The following issues have been addressed in the Container Cloud patch release 2.26.3 along with the patch Cluster releases 17.1.3 and 16.1.3.

  • [40811] [LCM] Fixed the issue with the DaemonSet Pod remaining on the deleted node in the Terminating state during machine deletion.

Known issues

This section lists known issues with workarounds for the Mirantis Container Cloud release 2.26.3 including the Cluster releases 17.1.3 and 16.1.3.

For other issues that can occur while deploying and operating a Container Cloud cluster, see Deployment Guide: Troubleshooting and Operations Guide: Troubleshooting.

Note

This section also outlines still valid known issues from previous Container Cloud releases.

Bare metal
[46245] Lack of access permissions for HOC and HOCM objects

Fixed in 2.28.0 (17.3.0 and 16.3.0)

When trying to list the HostOSConfigurationModules and HostOSConfiguration custom resources, a serviceuser or a user with the global-admin or operator role obtains the access denied error. For example:

kubectl --kubeconfig ~/.kube/mgmt-config get hocm

Error from server (Forbidden): hostosconfigurationmodules.kaas.mirantis.com is forbidden:
User "2d74348b-5669-4c65-af31-6c05dbedac5f" cannot list resource "hostosconfigurationmodules"
in API group "kaas.mirantis.com" at the cluster scope: access denied

Workaround:

  1. Modify the global-admin role by adding a new entry with the following contents to the rules list:

    kubectl edit clusterroles kaas-global-admin
    
    - apiGroups: [kaas.mirantis.com]
      resources: [hostosconfigurationmodules]
      verbs: ['*']
    
  2. For each Container Cloud project, modify the kaas-operator role by adding a new entry with the following contents to the rules list:

    kubectl -n <projectName> edit roles kaas-operator
    
    - apiGroups: [kaas.mirantis.com]
      resources: [hostosconfigurations]
      verbs: ['*']
    
[42386] A load balancer service does not obtain the external IP address

Due to the MetalLB upstream issue, a load balancer service may not obtain the external IP address.

The issue occurs when two services share the same external IP address and have the same externalTrafficPolicy value. Initially, the services have the external IP address assigned and are accessible. After modifying the externalTrafficPolicy value for both services from Cluster to Local, the first service that was changed is left without an external IP address. However, the second service, which was changed later, has the external IP address assigned as expected.

To work around the issue, make a dummy change to the service object where external IP is <pending>:

  1. Identify the service that is stuck:

    kubectl get svc -A | grep pending
    

    Example of system response:

    stacklight  iam-proxy-prometheus  LoadBalancer  10.233.28.196  <pending>  443:30430/TCP
    
  2. Add an arbitrary label to the service that is stuck. For example:

    kubectl label svc -n stacklight iam-proxy-prometheus reconcile=1
    

    Example of system response:

    service/iam-proxy-prometheus labeled
    
  3. Verify that the external IP was allocated to the service:

    kubectl get svc -n stacklight iam-proxy-prometheus
    

    Example of system response:

    NAME                  TYPE          CLUSTER-IP     EXTERNAL-IP  PORT(S)        AGE
    iam-proxy-prometheus  LoadBalancer  10.233.28.196  10.0.34.108  443:30430/TCP  12d
    
[41305] DHCP responses are lost between dnsmasq and dhcp-relay pods

Fixed in 2.28.0 (17.3.0 and 16.3.0)

After node maintenance of a management cluster, the newly added nodes may fail to undergo provisioning successfully. The issue relates to new nodes that are in the same L2 domain as the management cluster.

The issue was observed in environments where management cluster nodes are configured with a single L2 segment used for all network traffic (PXE and LCM/management networks).

To verify whether the cluster is affected:

Verify whether the dnsmasq and dhcp-relay pods run on the same node in the management cluster:

kubectl -n kaas get pods -o wide | grep -e "dhcp\|dnsmasq"

Example of system response:

dhcp-relay-7d85f75f76-5vdw2   2/2   Running   2 (36h ago)   36h   10.10.0.122     kaas-node-8a24b81c-76d0-4d4c-8421-962bd39df5ad   <none>   <none>
dnsmasq-8f4b484b4-slhbd       5/5   Running   1 (36h ago)   36h   10.233.123.75   kaas-node-8a24b81c-76d0-4d4c-8421-962bd39df5ad   <none>   <none>

If this is the case, proceed to the workaround below.

Workaround:

  1. Log in to a node that contains kubeconfig of the affected management cluster.

  2. Make sure that at least two management cluster nodes are schedulable:

    kubectl get node
    

    Example of a positive system response:

    NAME                                             STATUS   ROLES    AGE   VERSION
    kaas-node-bcedb87b-b3ce-46a4-a4ca-ea3068689e40   Ready    master   37h   v1.27.10-mirantis-1
    kaas-node-8a24b81c-76d0-4d4c-8421-962bd39df5ad   Ready    master   37h   v1.27.10-mirantis-1
    kaas-node-ad5a6f51-b98f-43c3-91d5-55fed3d0ff21   Ready    master   37h   v1.27.10-mirantis-1
    
  3. Delete the dhcp-relay pod:

    kubectl -n kaas delete pod <dhcp-relay-xxxxx>
    
  4. Verify that the dnsmasq and dhcp-relay pods are scheduled into different nodes:

    kubectl -n kaas get pods -o wide | grep -e "dhcp\|dnsmasq"
    

    Example of a positive system response:

    dhcp-relay-7d85f75f76-rkv03   2/2   Running   0             49s   10.10.0.121     kaas-node-bcedb87b-b3ce-46a4-a4ca-ea3068689e40   <none>   <none>
    dnsmasq-8f4b484b4-slhbd       5/5   Running   1 (37h ago)   37h   10.233.123.75   kaas-node-8a24b81c-76d0-4d4c-8421-962bd39df5ad   <none>   <none>
    
[24005] Deletion of a node with ironic Pod is stuck in the Terminating state

During deletion of a manager machine running the ironic Pod from a bare metal management cluster, the following problems occur:

  • All Pods are stuck in the Terminating state

  • A new ironic Pod fails to start

  • The related bare metal host is stuck in the deprovisioning state

As a workaround, before deletion of the node running the ironic Pod, cordon and drain the node using the kubectl cordon <nodeName> and kubectl drain <nodeName> commands.


LCM
[41540] LCM Agent cannot grab storage information on a host

Fixed in 17.1.5 and 16.1.5

Due to issues with managing physical NVME devices, lcm-agent cannot grab storage information on a host. As a result, lcmmachine.status.hostinfo.hardware is empty and the following example error is present in logs:

{"level":"error","ts":"2024-05-02T12:26:10Z","logger":"agent", \
"msg":"get hardware details", \
"host":"kaas-node-548b2861-aed0-41c9-8ff2-10c5476b000b", \
"error":"new storage info: get disk info \"nvme0c0n1\": \
invoke command: exit status 1","errorVerbose":"exit status 1

As a workaround, on the affected node, create a symlink for any device indicated in lcm-agent logs. For example:

ln -sfn /dev/nvme0n1 /dev/nvme0c0n1

[39437] Failure to replace a master node on a Container Cloud cluster

Fixed in 2.29.0 (17.4.0 and 16.4.0)

During the replacement of a master node on a cluster of any type, the process may get stuck with Kubelet's NodeReady condition is Unknown in the machine status on the remaining master nodes.

As a workaround, log in on the affected node and run the following command:

docker restart ucp-kubelet

[31186,34132] Pods get stuck during MariaDB operations

During MariaDB operations on a management cluster, Pods may get stuck in continuous restarts with the following example error:

[ERROR] WSREP: Corrupt buffer header: \
addr: 0x7faec6f8e518, \
seqno: 3185219421952815104, \
size: 909455917, \
ctx: 0x557094f65038, \
flags: 11577. store: 49, \
type: 49

Workaround:

  1. Create a backup of the /var/lib/mysql directory on the mariadb-server Pod.

  2. Verify that other replicas are up and ready.

  3. Remove the galera.cache file for the affected mariadb-server Pod.

  4. Remove the affected mariadb-server Pod or wait until it is automatically restarted.

After Kubernetes restarts the Pod, the Pod clones the database in 1-2 minutes and restores the quorum.

[30294] Replacement of a master node is stuck on the calico-node Pod start

Fixed in 2.28.4 (17.3.4 and 16.3.4)

During replacement of a master node on a cluster of any type, the calico-node Pod fails to start on a new node that has the same IP address as the node being replaced.

Workaround:

  1. Log in to any master node.

  2. From a CLI with an MKE client bundle, create a shell alias to start calicoctl using the mirantis/ucp-dsinfo image. Use one of the following two options depending on where the etcd certificates reside on your cluster, as shown in the volume mounts of each alias:

    alias calicoctl="\
    docker run -i --rm \
    --pid host \
    --net host \
    -e constraint:ostype==linux \
    -e ETCD_ENDPOINTS=<etcdEndpoint> \
    -e ETCD_KEY_FILE=/var/lib/docker/volumes/ucp-kv-certs/_data/key.pem \
    -e ETCD_CA_CERT_FILE=/var/lib/docker/volumes/ucp-kv-certs/_data/ca.pem \
    -e ETCD_CERT_FILE=/var/lib/docker/volumes/ucp-kv-certs/_data/cert.pem \
    -v /var/run/calico:/var/run/calico \
    -v /var/lib/docker/volumes/ucp-kv-certs/_data:/var/lib/docker/volumes/ucp-kv-certs/_data:ro \
    mirantis/ucp-dsinfo:<mkeVersion> \
    calicoctl \
    "
    
    alias calicoctl="\
    docker run -i --rm \
    --pid host \
    --net host \
    -e constraint:ostype==linux \
    -e ETCD_ENDPOINTS=<etcdEndpoint> \
    -e ETCD_KEY_FILE=/ucp-node-certs/key.pem \
    -e ETCD_CA_CERT_FILE=/ucp-node-certs/ca.pem \
    -e ETCD_CERT_FILE=/ucp-node-certs/cert.pem \
    -v /var/run/calico:/var/run/calico \
    -v ucp-node-certs:/ucp-node-certs:ro \
    mirantis/ucp-dsinfo:<mkeVersion> \
    calicoctl --allow-version-mismatch \
    "
    

    In the above command, replace the following values with the corresponding settings of the affected cluster:

    • <etcdEndpoint> is the etcd endpoint defined in the Calico configuration file. For example, ETCD_ENDPOINTS=127.0.0.1:12378

    • <mkeVersion> is the MKE version installed on your cluster. For example, mirantis/ucp-dsinfo:3.5.7.

  3. Verify the node list on the cluster:

    kubectl get node
    
  4. Compare this list with the node list in Calico to identify the old node:

    calicoctl get node -o wide
    
  5. Remove the old node from Calico:

    calicoctl delete node kaas-node-<nodeID>
    
[5782] Manager machine fails to be deployed during node replacement

Fixed in 2.28.4 (17.3.4 and 16.3.4)

During replacement of a manager machine, the following problems may occur:

  • The system adds the node to Docker swarm but not to Kubernetes

  • The node Deployment gets stuck with failed RethinkDB health checks

Workaround:

  1. Delete the failed node.

  2. Wait for the MKE cluster to become healthy. To monitor the cluster status:

    1. Log in to the MKE web UI as described in Connect to the Mirantis Kubernetes Engine web UI.

    2. Monitor the cluster status as described in MKE Operations Guide: Monitor an MKE cluster with the MKE web UI.

  3. Deploy a new node.

[5568] The calico-kube-controllers Pod fails to clean up resources

Fixed in 2.28.4 (17.3.4 and 16.3.4)

During the unsafe or forced deletion of a manager machine running the calico-kube-controllers Pod in the kube-system namespace, the following issues occur:

  • The calico-kube-controllers Pod fails to clean up resources associated with the deleted node

  • The calico-node Pod may fail to start up on a newly created node if the machine is provisioned with the same IP address as the deleted machine had

As a workaround, before deletion of the node running the calico-kube-controllers Pod, cordon and drain the node:

kubectl cordon <nodeName>
kubectl drain <nodeName>

Ceph
[41819] Graceful cluster reboot is blocked by the Ceph ClusterWorkloadLocks

Fixed in 2.27.0 (17.2.0 and 16.2.0)

During graceful reboot of a cluster with Ceph enabled, the reboot is blocked with the following message in the MiraCephMaintenance object status:

message: ClusterMaintenanceRequest found, Ceph Cluster is not ready to upgrade,
 delaying cluster maintenance

As a workaround, add the following snippet to the cephFS section under metadataServer in the spec section of <kcc-name>.yaml in the Ceph cluster:

cephClusterSpec:
  sharedFilesystem:
    cephFS:
    - name: cephfs-store
      metadataServer:
        activeCount: 1
        healthCheck:
          livenessProbe:
            probe:
              failureThreshold: 5
              initialDelaySeconds: 30
              periodSeconds: 30
              successThreshold: 1
              timeoutSeconds: 5

[26441] Cluster update fails with the MountDevice failed for volume warning

Update of a managed cluster based on bare metal with Ceph enabled fails with the PersistentVolumeClaim stuck in the Pending state for the prometheus-server StatefulSet and the MountVolume.MountDevice failed for volume warning in the StackLight event logs.

Workaround:

  1. Verify that the description of the Pods that failed to run contains the FailedMount events:

    kubectl -n <affectedProjectName> describe pod <affectedPodName>
    

    In the command above, replace the following values:

    • <affectedProjectName> is the Container Cloud project name where the Pods failed to run

    • <affectedPodName> is a Pod name that failed to run in the specified project

    In the Pod description, identify the node name where the Pod failed to run.

  2. Verify that the csi-rbdplugin logs of the affected node contain the rbd volume mount failed: <csi-vol-uuid> is being used error. The <csi-vol-uuid> is a unique RBD volume name.

    1. Identify csiPodName of the corresponding csi-rbdplugin:

      kubectl -n rook-ceph get pod -l app=csi-rbdplugin \
      -o jsonpath='{.items[?(@.spec.nodeName == "<nodeName>")].metadata.name}'
      
    2. Output the affected csiPodName logs:

      kubectl -n rook-ceph logs <csiPodName> -c csi-rbdplugin
      
  3. Scale down the affected StatefulSet or Deployment of the Pod that fails to 0 replicas.

  4. On every csi-rbdplugin Pod, search for stuck csi-vol:

    for pod in `kubectl -n rook-ceph get pods|grep rbdplugin|grep -v provisioner|awk '{print $1}'`; do
      echo $pod
      kubectl exec -it -n rook-ceph $pod -c csi-rbdplugin -- rbd device list | grep <csi-vol-uuid>
    done
    
  5. Unmap the affected csi-vol:

    rbd unmap -o force /dev/rbd<i>
    

    The /dev/rbd<i> value is a mapped RBD volume that uses csi-vol.

  6. Delete volumeattachment of the affected Pod:

    kubectl get volumeattachments | grep <csi-vol-uuid>
    kubectl delete volumeattachment <id>
    
  7. Scale up the affected StatefulSet or Deployment back to the original number of replicas and wait until its state becomes Running.


StackLight
[42304] Failure of shard relocation in the OpenSearch cluster

Fixed in 17.2.0, 16.2.0, 17.1.6, 16.1.6

On large managed clusters, shard relocation may fail in the OpenSearch cluster with the yellow or red status of the OpenSearch cluster. The characteristic symptom of the issue is that in the stacklight namespace, the statefulset.apps/opensearch-master containers are experiencing throttling with the KubeContainersCPUThrottlingHigh alert firing for the following set of labels:

{created_by_kind="StatefulSet",created_by_name="opensearch-master",namespace="stacklight"}

Caution

The throttling that OpenSearch is experiencing may be a temporary situation related, for example, to a peak load or to the ongoing shard initialization as part of disaster recovery or after a node restart. In this case, Mirantis recommends waiting until the initialization of all shards is finished. After that, verify the cluster state and whether the throttling still exists. Apply the workaround below only if the throttling does not disappear.

To verify that the initialization of shards is ongoing:

kubectl exec -it pod/opensearch-master-0 -n stacklight -c opensearch -- bash

curl "http://localhost:9200/_cat/shards" | grep INITIALIZING

Example of system response:

.ds-system-000072    2 r INITIALIZING    10.232.182.135 opensearch-master-1
.ds-system-000073    1 r INITIALIZING    10.232.7.145   opensearch-master-2
.ds-system-000073    2 r INITIALIZING    10.232.182.135 opensearch-master-1
.ds-audit-000001     2 r INITIALIZING    10.232.7.145   opensearch-master-2

The system response above indicates that shards from the .ds-system-000072, .ds-system-000073, and .ds-audit-000001 indices are in the INITIALIZING state. In this case, Mirantis recommends waiting until this process is finished and only then considering a change of the limit.

You can additionally analyze the exact level of throttling and the current CPU usage on the Kubernetes Containers dashboard in Grafana.

Workaround:

  1. Verify the currently configured CPU requests and limits for the opensearch containers:

    kubectl -n stacklight get statefulset.apps/opensearch-master -o jsonpath="{.spec.template.spec.containers[?(@.name=='opensearch')].resources}"
    

    Example of system response:

    {"limits":{"cpu":"600m","memory":"8Gi"},"requests":{"cpu":"500m","memory":"6Gi"}}
    

    In the example above, the CPU request is 500m and the CPU limit is 600m.

  2. Increase the CPU limit to a reasonably high number.

    For example, the default CPU limit for the clusters with the clusterSize:large parameter set was increased from 8000m to 12000m for StackLight in Container Cloud 2.27.0 (Cluster releases 17.2.0 and 16.2.0).

    Note

    For details on the clusterSize parameter, see MOSK Operations Guide: StackLight configuration parameters - Cluster size.

    If the defaults are already overridden on the affected cluster using the resourcesPerClusterSize or resources parameters as described in MOSK Operations Guide: StackLight configuration parameters - Resource limits, then the exact recommended number depends on the currently set limit.

    Mirantis recommends increasing the limit by 50%. If it does not resolve the issue, another increase iteration will be required.

  3. When you select the required CPU limit, increase it as described in MOSK Operations Guide: StackLight configuration parameters - Resource limits.

    If the CPU limit for the opensearch component is already set, increase it in the Cluster object for the opensearch parameter. Otherwise, the default StackLight limit is used. In this case, increase the CPU limit for the opensearch component using the resources parameter.

  4. Wait until all opensearch-master pods are recreated with the new CPU limits and become running and ready.

    To verify the current CPU limit for every opensearch container in every opensearch-master pod separately:

    kubectl -n stacklight get pod/opensearch-master-<podSuffixNumber> -o jsonpath="{.spec.containers[?(@.name=='opensearch')].resources}"
    

    In the command above, replace <podSuffixNumber> with the name of the pod suffix. For example, pod/opensearch-master-0 or pod/opensearch-master-2.

    Example of system response:

    {"limits":{"cpu":"900m","memory":"8Gi"},"requests":{"cpu":"500m","memory":"6Gi"}}
    

    The waiting time may take up to 20 minutes depending on the cluster size.

If the issue is fixed, the KubeContainersCPUThrottlingHigh alert stops firing immediately, while OpenSearchClusterStatusWarning or OpenSearchClusterStatusCritical can still be firing for some time during shard relocation.

If the KubeContainersCPUThrottlingHigh alert is still firing, proceed with another iteration of the CPU limit increase.

[40020] Rollover policy update is not applied to the current index

Fixed in 17.2.0, 16.2.0, 17.1.6, 16.1.6

When you update rollover_policy for the current system* and audit* data streams, the update is not applied to the existing indices.

One of the indicators that the cluster is most likely affected is the KubeJobFailed alert firing for the elasticsearch-curator job, with one or both of the following errors present in the elasticsearch-curator pods that remain in the Error status:

2024-05-31 13:16:04,459 ERROR   Failed to complete action: delete_indices.  <class 'curator.exceptions.FailedExecution'>: Exception encountered.  Rerun with loglevel DEBUG and/or check Elasticsearch logs for more information. Exception: RequestError(400, 'illegal_argument_exception', 'index [.ds-audit-000001] is the write index for data stream [audit] and cannot be deleted')

or

2024-05-31 13:16:04,459 ERROR   Failed to complete action: delete_indices.  <class 'curator.exceptions.FailedExecution'>: Exception encountered.  Rerun with loglevel DEBUG and/or check Elasticsearch logs for more information. Exception: RequestError(400, 'illegal_argument_exception', 'index [.ds-system-000001] is the write index for data stream [system] and cannot be deleted')

Note

Instead of .ds-audit-000001 or .ds-system-000001 index names, similar names can be present with the same prefix but different suffix numbers.

If the above-mentioned alert and errors are present, immediate action is required because the corresponding index size has already exceeded the space allocated for the index.

To verify that the cluster is affected:

Caution

Verify and apply the workaround to both index patterns, system and audit, separately.

If one of the indices is affected, the second one is most likely affected as well, although in rare cases only one index may be affected.

  1. Log in to the opensearch-master-0 Pod:

    kubectl exec -it pod/opensearch-master-0 -n stacklight -c opensearch -- bash
    
  2. Verify that the rollover policy is present:

    • system:

      curl localhost:9200/_plugins/_ism/policies/system_rollover_policy
      
    • audit:

      curl localhost:9200/_plugins/_ism/policies/audit_rollover_policy
      

    The cluster is affected if the rollover policy is missing. Otherwise, proceed to the following step.

  3. Verify the system response from the previous step. For example:

    {"_id":"system_rollover_policy","_version":7229,"_seq_no":42362,"_primary_term":28,"policy":{"policy_id":"system_rollover_policy","description":"system index rollover policy.","last_updated_time":1708505222430,"schema_version":19,"error_notification":null,"default_state":"rollover","states":[{"name":"rollover","actions":[{"retry":{"count":3,"backoff":"exponential","delay":"1m"},"rollover":{"min_size":"14746mb","copy_alias":false}}],"transitions":[]}],"ism_template":[{"index_patterns":["system*"],"priority":200,"last_updated_time":1708505222430}]}}
    

    Verify and capture the following items separately for every policy (a filtering example is provided after this procedure):

    • The _seq_no and _primary_term values

    • The rollover policy threshold, which is defined in policy.states[0].actions[0].rollover.min_size

  4. List indices:

    • system:

      curl localhost:9200/_cat/indices | grep system
      

      Example of system response:

      [...]
      green open .ds-system-000001   FjglnZlcTKKfKNbosaE9Aw 2 1 1998295  0   1gb 507.9mb
      
    • audit:

      curl localhost:9200/_cat/indices | grep audit
      

      Example of system response:

      [...]
      green open .ds-audit-000001   FjglnZlcTKKfKNbosaE9Aw 2 1 1998295  0   1gb 507.9mb
      
  5. Select the index with the highest number and verify the rollover policy attached to the index:

    • system:

      curl localhost:9200/_plugins/_ism/explain/.ds-system-000001
      
    • audit:

      curl localhost:9200/_plugins/_ism/explain/.ds-audit-000001
      
    • If the rollover policy is not attached, the cluster is affected.

    • If the rollover policy is attached but _seq_no and _primary_term numbers do not match the previously captured ones, the cluster is affected.

    • If the index size drastically exceeds the defined threshold of the rollover policy (which is the previously captured min_size), the cluster is most probably affected.
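
To simplify capturing the _seq_no, _primary_term, and min_size values mentioned in step 3, you can filter the policy output. The example below assumes only that the pretty query parameter is honored by the endpoint and that grep is available in the opensearch container, as in the commands above; run the same command with audit_rollover_policy for the audit policy:

curl -s "localhost:9200/_plugins/_ism/policies/system_rollover_policy?pretty" | grep -e _seq_no -e _primary_term -e min_size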

Workaround:

  1. Log in to the opensearch-master-0 Pod:

    kubectl exec -it pod/opensearch-master-0 -n stacklight -c opensearch -- bash
    
  2. If the policy is attached to the index but has different _seq_no and _primary_term, remove the policy from the index:

    Note

    Use the index with the highest number in the name, which was captured during the verification procedure.

    • system:

      curl -XPOST localhost:9200/_plugins/_ism/remove/.ds-system-000001
      
    • audit:

      curl -XPOST localhost:9200/_plugins/_ism/remove/.ds-audit-000001
      
  3. Re-add the policy:

    • system:

      curl -XPOST -H "Content-type: application/json" localhost:9200/_plugins/_ism/add/system* -d'{"policy_id":"system_rollover_policy"}'
      
    • audit:

      curl -XPOST -H "Content-type: application/json" localhost:9200/_plugins/_ism/add/audit* -d'{"policy_id":"audit_rollover_policy"}'
      
  4. Repeat the last step of the cluster verification procedure provided above and make sure that the policy is attached to the index and has the same _seq_no and _primary_term.

    If the index size drastically exceeds the defined threshold of the rollover policy (which is the previously captured min_size), wait up to 15 minutes and verify that the additional index is created with the consecutive number in the index name. For example:

    • system: if you applied changes to .ds-system-000001, wait until .ds-system-000002 is created.

    • audit: if you applied changes to .ds-audit-000001, wait until .ds-audit-000002 is created.

    If such an index is not created, escalate the issue to Mirantis support.


Container Cloud web UI
[41806] Configuration of a management cluster fails without Keycloak settings

Fixed in 17.1.4 and 16.1.4

During configuration of management cluster settings using the Configure cluster web UI menu, updating the Keycloak Truststore settings is erroneously required, although these settings must remain optional.

As a workaround, update the management cluster using the API or CLI.
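
For example, you can edit the management cluster configuration directly in the Cluster object. In the command below, the kubeconfig path, project namespace, and cluster name are placeholders:

kubectl --kubeconfig <mgmtClusterKubeconfig> -n <projectNamespace> edit cluster <mgmtClusterName>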

Artifacts

This section lists the artifacts of components included in the Container Cloud patch release 2.26.3. For artifacts of the Cluster releases introduced in 2.26.3, see patch Cluster releases 17.1.3 and 16.1.3.

Note

The components that are newly added, updated, deprecated, or removed as compared to the previous release version, are marked with a corresponding superscript, for example, lcm-ansible Updated.

Bare metal artifacts

Artifact

Component

Path

Binaries Updated

ironic-python-agent.initramfs

https://binary.mirantis.com/bm/bin/ironic/ipa/ubuntu/initramfs-yoga-focal-debug-20240411174919

ironic-python-agent.kernel

https://binary.mirantis.com/bm/bin/ironic/ipa/ubuntu/kernel-yoga-focal-debug-20240411174919

Helm charts Updated

baremetal-api

https://binary.mirantis.com/core/helm/baremetal-api-1.39.23.tgz

baremetal-operator

https://binary.mirantis.com/core/helm/baremetal-operator-1.39.23.tgz

baremetal-provider

https://binary.mirantis.com/core/helm/baremetal-provider-1.39.23.tgz

baremetal-public-api

https://binary.mirantis.com/core/helm/baremetal-public-api-1.39.23.tgz

kaas-ipam

https://binary.mirantis.com/core/helm/kaas-ipam-1.39.23.tgz

local-volume-provisioner

https://binary.mirantis.com/core/helm/local-volume-provisioner-1.39.23.tgz

metallb

https://binary.mirantis.com/core/helm/metallb-1.39.23.tgz

Docker images

ambasador Updated

mirantis.azurecr.io/core/external/nginx:1.39.23

baremetal-dnsmasq Updated

mirantis.azurecr.io/bm/baremetal-dnsmasq:base-2-26-alpine-20240408141922

baremetal-operator Updated

mirantis.azurecr.io/bm/baremetal-operator:base-2-26-alpine-20240408141703

bm-collective Updated

mirantis.azurecr.io/bm/bm-collective:base-2-26-alpine-20240408142218

cluster-api-provider-baremetal Updated

mirantis.azurecr.io/core/cluster-api-provider-baremetal:1.39.23

ironic

mirantis.azurecr.io/openstack/ironic:yoga-jammy-20240226060024

ironic-inspector

mirantis.azurecr.io/openstack/ironic-inspector:yoga-jammy-20240226060024

ironic-prometheus-exporter

mirantis.azurecr.io/stacklight/ironic-prometheus-exporter:0.1-20240117102150

kaas-ipam Updated

mirantis.azurecr.io/bm/kaas-ipam:base-2-26-alpine-20240408150853

kubernetes-entrypoint

mirantis.azurecr.io/openstack/extra/kubernetes-entrypoint:v1.0.1-55b02f7-20231019172556

mariadb

mirantis.azurecr.io/general/mariadb:10.6.14-focal-20240311120505

mcc-keepalived

mirantis.azurecr.io/lcm/mcc-keepalived:v0.24.0-47-gf77368e

metallb-controller Updated

mirantis.azurecr.io/bm/metallb/controller:v0.13.12-ef4c9453-amd64

metallb-speaker Updated

mirantis.azurecr.io/bm/metallb/speaker:v0.13.12-ef4c9453-amd64

syslog-ng

mirantis.azurecr.io/bm/syslog-ng:base-alpine-20240129163811

Core artifacts

Artifact

Component

Path

Bootstrap tarball Updated

bootstrap-darwin

https://binary.mirantis.com/core/bin/bootstrap-darwin-1.39.23.tgz

bootstrap-linux

https://binary.mirantis.com/core/bin/bootstrap-linux-1.39.23.tgz

Helm charts Updated

admission-controller

https://binary.mirantis.com/core/helm/admission-controller-1.39.23.tgz

agent-controller

https://binary.mirantis.com/core/helm/agent-controller-1.39.23.tgz

byo-credentials-controller

https://binary.mirantis.com/core/helm/byo-credentials-controller-1.39.23.tgz

byo-provider

https://binary.mirantis.com/core/helm/byo-provider-1.39.23.tgz

ceph-kcc-controller

https://binary.mirantis.com/core/helm/ceph-kcc-controller-1.39.23.tgz

cert-manager

https://binary.mirantis.com/core/helm/cert-manager-1.39.23.tgz

cinder-csi-plugin

https://binary.mirantis.com/core/helm/cinder-csi-plugin-1.39.23.tgz

client-certificate-controller

https://binary.mirantis.com/core/helm/client-certificate-controller-1.39.23.tgz

configuration-collector

https://binary.mirantis.com/core/helm/configuration-collector-1.39.23.tgz

event-controller

https://binary.mirantis.com/core/helm/event-controller-1.39.23.tgz

host-os-modules-controller

https://binary.mirantis.com/core/helm/host-os-modules-controller-1.39.23.tgz

iam-controller

https://binary.mirantis.com/core/helm/iam-controller-1.39.23.tgz

kaas-exporter

https://binary.mirantis.com/core/helm/kaas-exporter-1.39.23.tgz

kaas-public-api

https://binary.mirantis.com/core/helm/kaas-public-api-1.39.23.tgz

kaas-ui

https://binary.mirantis.com/core/helm/kaas-ui-1.39.23.tgz

lcm-controller

https://binary.mirantis.com/core/helm/lcm-controller-1.39.23.tgz

license-controller

https://binary.mirantis.com/core/helm/license-controller-1.39.23.tgz

machinepool-controller

https://binary.mirantis.com/core/helm/machinepool-controller-1.39.23.tgz

mcc-cache

https://binary.mirantis.com/core/helm/mcc-cache-1.39.23.tgz

mcc-cache-warmup

https://binary.mirantis.com/core/helm/mcc-cache-warmup-1.39.23.tgz

metrics-server

https://binary.mirantis.com/core/helm/metrics-server-1.39.23.tgz

openstack-cloud-controller-manager

https://binary.mirantis.com/core/helm/openstack-cloud-controller-manager-1.39.23.tgz

openstack-provider

https://binary.mirantis.com/core/helm/openstack-provider-1.39.23.tgz

os-credentials-controller

https://binary.mirantis.com/core/helm/os-credentials-controller-1.39.23.tgz

policy-controller

https://binary.mirantis.com/core/helm/policy-controller-1.39.23.tgz

portforward-controller

https://binary.mirantis.com/core/helm/portforward-controller-1.39.23.tgz

proxy-controller

https://binary.mirantis.com/core/helm/proxy-controller-1.39.23.tgz

rbac-controller

https://binary.mirantis.com/core/helm/rbac-controller-1.39.23.tgz

release-controller

https://binary.mirantis.com/core/helm/release-controller-1.39.23.tgz

rhellicense-controller

https://binary.mirantis.com/core/helm/rhellicense-controller-1.39.23.tgz

scope-controller

https://binary.mirantis.com/core/helm/scope-controller-1.39.23.tgz

squid-proxy

https://binary.mirantis.com/core/helm/squid-proxy-1.39.23.tgz

storage-discovery

https://binary.mirantis.com/core/helm/storage-discovery-1.39.23.tgz

user-controller

https://binary.mirantis.com/core/helm/user-controller-1.39.23.tgz

vsphere-cloud-controller-manager

https://binary.mirantis.com/core/helm/vsphere-cloud-controller-manager-1.39.23.tgz

vsphere-credentials-controller

https://binary.mirantis.com/core/helm/vsphere-credentials-controller-1.39.23.tgz

vsphere-csi-plugin

https://binary.mirantis.com/core/helm/vsphere-csi-plugin-1.39.23.tgz

vsphere-provider

https://binary.mirantis.com/core/helm/vsphere-provider-1.39.23.tgz

vsphere-vm-template-controller

https://binary.mirantis.com/core/helm/vsphere-vm-template-controller-1.39.23.tgz

Docker images

admission-controller Updated

mirantis.azurecr.io/core/admission-controller:1.39.23

agent-controller Updated

mirantis.azurecr.io/core/agent-controller:1.39.23

byo-cluster-api-controller Updated

mirantis.azurecr.io/core/byo-cluster-api-controller:1.39.23

byo-credentials-controller Updated

mirantis.azurecr.io/core/byo-credentials-controller:1.39.23

ceph-kcc-controller Updated

mirantis.azurecr.io/core/ceph-kcc-controller:1.39.23

cert-manager-controller Updated

mirantis.azurecr.io/core/external/cert-manager-controller:v1.11.0-6

cinder-csi-plugin Updated

mirantis.azurecr.io/lcm/kubernetes/cinder-csi-plugin:v1.27.2-14

client-certificate-controller Updated

mirantis.azurecr.io/core/client-certificate-controller:1.39.23

configuration-collector Updated

mirantis.azurecr.io/core/configuration-collector:1.39.23

csi-attacher Updated

mirantis.azurecr.io/lcm/k8scsi/csi-attacher:v4.2.0-5

csi-node-driver-registrar Updated

mirantis.azurecr.io/lcm/k8scsi/csi-node-driver-registrar:v2.7.0-5

csi-provisioner Updated

mirantis.azurecr.io/lcm/k8scsi/csi-provisioner:v3.4.1-5

csi-resizer Updated

mirantis.azurecr.io/lcm/k8scsi/csi-resizer:v1.7.0-5

csi-snapshotter Updated

mirantis.azurecr.io/lcm/k8scsi/csi-snapshotter:v6.2.1-mcc-4

event-controller Updated

mirantis.azurecr.io/core/event-controller:1.39.23

frontend Updated

mirantis.azurecr.io/core/frontend:1.39.23

host-os-modules-controller Updated

mirantis.azurecr.io/core/host-os-modules-controller:1.39.23

iam-controller Updated

mirantis.azurecr.io/core/iam-controller:1.39.23

kaas-exporter Updated

mirantis.azurecr.io/core/kaas-exporter:1.39.23

kproxy Updated

mirantis.azurecr.io/core/kproxy:1.39.23

lcm-controller Updated

mirantis.azurecr.io/core/lcm-controller:1.39.23

license-controller Updated

mirantis.azurecr.io/core/license-controller:1.39.23

livenessprobe Updated

mirantis.azurecr.io/lcm/k8scsi/livenessprobe:v2.9.0-5

machinepool-controller Updated

mirantis.azurecr.io/core/machinepool-controller:1.39.23

mcc-haproxy

mirantis.azurecr.io/lcm/mcc-haproxy:v0.24.0-47-gf77368e

mcc-keepalived

mirantis.azurecr.io/lcm/mcc-keepalived:v0.24.0-47-gf77368e

metrics-server Updated

mirantis.azurecr.io/lcm/metrics-server-amd64:v0.6.3-7

nginx Updated

mirantis.azurecr.io/core/external/nginx:1.39.23

openstack-cloud-controller-manager Updated

mirantis.azurecr.io/lcm/kubernetes/openstack-cloud-controller-manager:v1.27.2-14

openstack-cluster-api-controller Updated

mirantis.azurecr.io/core/openstack-cluster-api-controller:1.39.23

os-credentials-controller Updated

mirantis.azurecr.io/core/os-credentials-controller:1.39.23

policy-controller Updated

mirantis.azurecr.io/core/policy-controller:1.39.23

portforward-controller Updated

mirantis.azurecr.io/core/portforward-controller:1.39.23

proxy-controller Updated

mirantis.azurecr.io/core/proxy-controller:1.39.23

rbac-controller Updated

mirantis.azurecr.io/core/rbac-controller:1.39.23

registry

mirantis.azurecr.io/lcm/registry:v2.8.1-9

release-controller Updated

mirantis.azurecr.io/core/release-controller:1.39.23

rhellicense-controller Updated

mirantis.azurecr.io/core/rhellicense-controller:1.39.23

scope-controller Updated

mirantis.azurecr.io/core/scope-controller:1.39.23

squid-proxy

mirantis.azurecr.io/lcm/squid-proxy:0.0.1-10-g24a0d69

storage-discovery Updated

mirantis.azurecr.io/core/storage-discovery:1.39.23

user-controller Updated

mirantis.azurecr.io/core/user-controller:1.39.23

vsphere-cloud-controller-manager Updated

mirantis.azurecr.io/lcm/kubernetes/vsphere-cloud-controller-manager:v1.27.0-6

vsphere-cluster-api-controller Updated

mirantis.azurecr.io/core/vsphere-cluster-api-controller:1.39.23

vsphere-credentials-controller Updated

mirantis.azurecr.io/core/vsphere-credentials-controller:1.39.23

vsphere-csi-driver

mirantis.azurecr.io/lcm/kubernetes/vsphere-csi-driver:v3.0.2-1

vsphere-csi-syncer

mirantis.azurecr.io/lcm/kubernetes/vsphere-csi-syncer:v3.0.2-1

vsphere-vm-template-controller Updated

mirantis.azurecr.io/core/vsphere-vm-template-controller:1.39.23

IAM artifacts

Artifact

Component

Path

Binaries

hash-generate-darwin

http://binary.mirantis.com/iam/bin/hash-generate-0.0.1-268-3cf7f17-darwin

hash-generate-linux

http://binary.mirantis.com/iam/bin/hash-generate-0.0.1-268-3cf7f17-linux

Helm charts

iam Updated

https://binary.mirantis.com/core/helm/iam-1.39.23.tgz

Docker images

kubectl

mirantis.azurecr.io/stacklight/kubectl:1.22-20240221023016

kubernetes-entrypoint

mirantis.azurecr.io/openstack/extra/kubernetes-entrypoint:v1.0.1-55b02f7-20231019172556

mariadb

mirantis.azurecr.io/general/mariadb:10.6.14-focal-20240311120505

mcc-keycloak

mirantis.azurecr.io/iam/mcc-keycloak:23.0.6-20240216125244

See also

Patch releases

2.26.2

The Container Cloud patch release 2.26.2, which is based on the 2.26.0 major release, provides the following updates:

  • Support for the patch Cluster releases 16.1.2 and 17.1.2 that represent Mirantis OpenStack for Kubernetes (MOSK) patch release 24.1.2.

  • Support for MKE 3.7.6.

  • Support for docker-ee-cli 23.0.10 in MCR 23.0.9 to fix several CVEs.

  • Bare metal: update of Ubuntu mirror from 20.04~20240302175618 to 20.04~20240324172903 along with update of minor kernel version from 5.15.0-97-generic to 5.15.0-101-generic.

  • Security fixes for CVEs in images.

This patch release also supports the latest major Cluster releases 17.1.0 and 16.1.0. It does not support greenfield deployments based on deprecated Cluster releases; use the latest available Cluster release instead.

For main deliverables of the parent Container Cloud release of 2.26.2, refer to 2.26.0.

Security notes

The table below includes the total numbers of addressed unique and common CVEs in images by product component since the Container Cloud 2.26.1 patch release. The common CVEs are issues addressed across several images.

Addressed CVEs - summary

Product component   CVE type   Critical   High   Total

Ceph                Unique     0          3      3
Ceph                Common     0          12     12
Kaas core           Unique     1          6      7
Kaas core           Common     1          11     12
StackLight          Unique     0          1      1
StackLight          Common     0          10     10

Mirantis Security Portal

For the detailed list of fixed and existing CVEs across the Mirantis Container Cloud and MOSK products, refer to Mirantis Security Portal.

MOSK CVEs

For the number of fixed CVEs in the MOSK-related components including OpenStack and Tungsten Fabric, refer to MOSK 24.1.2: Security notes.

Known issues

This section lists known issues with workarounds for the Mirantis Container Cloud release 2.26.2 including the Cluster releases 17.1.2 and 16.1.2.

For other issues that can occur while deploying and operating a Container Cloud cluster, see Deployment Guide: Troubleshooting and Operations Guide: Troubleshooting.

Note

This section also outlines still valid known issues from previous Container Cloud releases.

Bare metal
[46245] Lack of access permissions for HOC and HOCM objects

Fixed in 2.28.0 (17.3.0 and 16.3.0)

When trying to list the HostOSConfigurationModules and HostOSConfiguration custom resources, serviceuser or a user with the global-admin or operator role obtains the access denied error. For example:

kubectl --kubeconfig ~/.kube/mgmt-config get hocm

Error from server (Forbidden): hostosconfigurationmodules.kaas.mirantis.com is forbidden:
User "2d74348b-5669-4c65-af31-6c05dbedac5f" cannot list resource "hostosconfigurationmodules"
in API group "kaas.mirantis.com" at the cluster scope: access denied

Workaround:

  1. Modify the global-admin role by adding a new entry with the following contents to the rules list:

    kubectl edit clusterroles kaas-global-admin
    
    - apiGroups: [kaas.mirantis.com]
      resources: [hostosconfigurationmodules]
      verbs: ['*']
    
  2. For each Container Cloud project, modify the kaas-operator role by adding a new entry with the following contents to the rules list:

    kubectl -n <projectName> edit roles kaas-operator
    
    - apiGroups: [kaas.mirantis.com]
      resources: [hostosconfigurations]
      verbs: ['*']
    
[42386] A load balancer service does not obtain the external IP address

Due to the MetalLB upstream issue, a load balancer service may not obtain the external IP address.

The issue occurs when two services share the same external IP address and have the same externalTrafficPolicy value. Initially, the services have the external IP address assigned and are accessible. After modifying the externalTrafficPolicy value for both services from Cluster to Local, the first service that was changed is left without an external IP address assigned, while the second service, which was changed later, has the external IP assigned as expected.

To work around the issue, make a dummy change to the service object where external IP is <pending>:

  1. Identify the service that is stuck:

    kubectl get svc -A | grep pending
    

    Example of system response:

    stacklight  iam-proxy-prometheus  LoadBalancer  10.233.28.196  <pending>  443:30430/TCP
    
  2. Add an arbitrary label to the service that is stuck. For example:

    kubectl label svc -n stacklight iam-proxy-prometheus reconcile=1
    

    Example of system response:

    service/iam-proxy-prometheus labeled
    
  3. Verify that the external IP was allocated to the service:

    kubectl get svc -n stacklight iam-proxy-prometheus
    

    Example of system response:

    NAME                  TYPE          CLUSTER-IP     EXTERNAL-IP  PORT(S)        AGE
    iam-proxy-prometheus  LoadBalancer  10.233.28.196  10.0.34.108  443:30430/TCP  12d
    
[41305] DHCP responses are lost between dnsmasq and dhcp-relay pods

Fixed in 2.28.0 (17.3.0 and 16.3.0)

After node maintenance of a management cluster, the newly added nodes may fail to undergo provisioning successfully. The issue relates to new nodes that are in the same L2 domain as the management cluster.

The issue was observed in environments where the management cluster nodes are configured with a single L2 segment used for all network traffic (PXE and LCM/management networks).

To verify whether the cluster is affected:

Verify whether the dnsmasq and dhcp-relay pods run on the same node in the management cluster:

kubectl -n kaas get pods -o wide | grep -e "dhcp\|dnsmasq"

Example of system response:

dhcp-relay-7d85f75f76-5vdw2   2/2   Running   2 (36h ago)   36h   10.10.0.122     kaas-node-8a24b81c-76d0-4d4c-8421-962bd39df5ad   <none>   <none>
dnsmasq-8f4b484b4-slhbd       5/5   Running   1 (36h ago)   36h   10.233.123.75   kaas-node-8a24b81c-76d0-4d4c-8421-962bd39df5ad   <none>   <none>

If this is the case, proceed to the workaround below.

Workaround:

  1. Log in to a node that contains kubeconfig of the affected management cluster.

  2. Make sure that at least two management cluster nodes are schedulable:

    kubectl get node
    

    Example of a positive system response:

    NAME                                             STATUS   ROLES    AGE   VERSION
    kaas-node-bcedb87b-b3ce-46a4-a4ca-ea3068689e40   Ready    master   37h   v1.27.10-mirantis-1
    kaas-node-8a24b81c-76d0-4d4c-8421-962bd39df5ad   Ready    master   37h   v1.27.10-mirantis-1
    kaas-node-ad5a6f51-b98f-43c3-91d5-55fed3d0ff21   Ready    master   37h   v1.27.10-mirantis-1
    
  3. Delete the dhcp-relay pod:

    kubectl -n kaas delete pod <dhcp-relay-xxxxx>
    
  4. Verify that the dnsmasq and dhcp-relay pods are scheduled into different nodes:

    kubectl -n kaas get pods -o wide | grep -e "dhcp\|dnsmasq"
    

    Example of a positive system response:

    dhcp-relay-7d85f75f76-rkv03   2/2   Running   0             49s   10.10.0.121     kaas-node-bcedb87b-b3ce-46a4-a4ca-ea3068689e40   <none>   <none>
    dnsmasq-8f4b484b4-slhbd       5/5   Running   1 (37h ago)   37h   10.233.123.75   kaas-node-8a24b81c-76d0-4d4c-8421-962bd39df5ad   <none>   <none>
    
[24005] Deletion of a node with ironic Pod is stuck in the Terminating state

During deletion of a manager machine running the ironic Pod from a bare metal management cluster, the following problems occur:

  • All Pods are stuck in the Terminating state

  • A new ironic Pod fails to start

  • The related bare metal host is stuck in the deprovisioning state

As a workaround, before deletion of the node running the ironic Pod, cordon and drain the node using the kubectl cordon <nodeName> and kubectl drain <nodeName> commands.
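
For example, where <nodeName> is the node that runs the ironic Pod; the --ignore-daemonsets flag is a standard kubectl drain option that is typically required on nodes running DaemonSet-managed Pods:

kubectl cordon <nodeName>
kubectl drain <nodeName> --ignore-daemonsets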


LCM
[41540] LCM Agent cannot grab storage information on a host

Fixed in 17.1.5 and 16.1.5

Due to issues with managing physical NVME devices, lcm-agent cannot grab storage information on a host. As a result, lcmmachine.status.hostinfo.hardware is empty and the following example error is present in logs:

{"level":"error","ts":"2024-05-02T12:26:10Z","logger":"agent", \
"msg":"get hardware details", \
"host":"kaas-node-548b2861-aed0-41c9-8ff2-10c5476b000b", \
"error":"new storage info: get disk info \"nvme0c0n1\": \
invoke command: exit status 1","errorVerbose":"exit status 1

As a workaround, on the affected node, create a symlink for any device indicated in lcm-agent logs. For example:

ln -sfn /dev/nvme0n1 /dev/nvme0c0n1
[40811] Pod is stuck in the Terminating state on the deleted node

Fixed in 17.1.3 and 16.1.3

During deletion of a machine, the related DaemonSet Pod can remain on the deleted node in the Terminating state. As a workaround, manually delete the Pod:

kubectl delete pod -n <podNamespace> <podName>
[39437] Failure to replace a master node on a Container Cloud cluster

Fixed in 2.29.0 (17.4.0 and 16.4.0)

During the replacement of a master node on a cluster of any type, the process may get stuck with the Kubelet's NodeReady condition is Unknown message in the machine status on the remaining master nodes.

As a workaround, log in on the affected node and run the following command:

docker restart ucp-kubelet
[31186,34132] Pods get stuck during MariaDB operations

During MariaDB operations on a management cluster, Pods may get stuck in continuous restarts with the following example error:

[ERROR] WSREP: Corrupt buffer header: \
addr: 0x7faec6f8e518, \
seqno: 3185219421952815104, \
size: 909455917, \
ctx: 0x557094f65038, \
flags: 11577. store: 49, \
type: 49

Workaround:

  1. Create a backup of the /var/lib/mysql directory on the mariadb-server Pod.

  2. Verify that other replicas are up and ready.

  3. Remove the galera.cache file for the affected mariadb-server Pod.

  4. Remove the affected mariadb-server Pod or wait until it is automatically restarted.

After Kubernetes restarts the Pod, the Pod clones the database in 1-2 minutes and restores the quorum.
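
The following commands outline one possible way to perform these steps from the CLI. They are an illustrative sketch only: the namespace, Pod name, and backup destination are placeholders that you must adjust to the affected mariadb-server replica.

# Back up the MariaDB data directory of the affected replica
kubectl -n <namespace> cp <mariadb-server-pod>:/var/lib/mysql ./mariadb-backup

# Verify that the other replicas are up and ready
kubectl -n <namespace> get pods | grep mariadb

# Remove the galera.cache file from the affected replica
kubectl -n <namespace> exec <mariadb-server-pod> -- rm /var/lib/mysql/galera.cache

# Remove the affected replica or wait until it is automatically restarted
kubectl -n <namespace> delete pod <mariadb-server-pod>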

[30294] Replacement of a master node is stuck on the calico-node Pod start

Fixed in 2.28.4 (17.3.4 and 16.3.4)

During replacement of a master node on a cluster of any type, the calico-node Pod fails to start on a new node that has the same IP address as the node being replaced.

Workaround:

  1. Log in to any master node.

  2. From a CLI with an MKE client bundle, create a shell alias to start calicoctl using the mirantis/ucp-dsinfo image:

    alias calicoctl="\
    docker run -i --rm \
    --pid host \
    --net host \
    -e constraint:ostype==linux \
    -e ETCD_ENDPOINTS=<etcdEndpoint> \
    -e ETCD_KEY_FILE=/var/lib/docker/volumes/ucp-kv-certs/_data/key.pem \
    -e ETCD_CA_CERT_FILE=/var/lib/docker/volumes/ucp-kv-certs/_data/ca.pem \
    -e ETCD_CERT_FILE=/var/lib/docker/volumes/ucp-kv-certs/_data/cert.pem \
    -v /var/run/calico:/var/run/calico \
    -v /var/lib/docker/volumes/ucp-kv-certs/_data:/var/lib/docker/volumes/ucp-kv-certs/_data:ro \
    mirantis/ucp-dsinfo:<mkeVersion> \
    calicoctl \
    "
    
    alias calicoctl="\
    docker run -i --rm \
    --pid host \
    --net host \
    -e constraint:ostype==linux \
    -e ETCD_ENDPOINTS=<etcdEndpoint> \
    -e ETCD_KEY_FILE=/ucp-node-certs/key.pem \
    -e ETCD_CA_CERT_FILE=/ucp-node-certs/ca.pem \
    -e ETCD_CERT_FILE=/ucp-node-certs/cert.pem \
    -v /var/run/calico:/var/run/calico \
    -v ucp-node-certs:/ucp-node-certs:ro \
    mirantis/ucp-dsinfo:<mkeVersion> \
    calicoctl --allow-version-mismatch \
    "
    

    In the above commands, replace the following values with the corresponding settings of the affected cluster:

    • <etcdEndpoint> is the etcd endpoint defined in the Calico configuration file. For example, ETCD_ENDPOINTS=127.0.0.1:12378

    • <mkeVersion> is the MKE version installed on your cluster. For example, mirantis/ucp-dsinfo:3.5.7.

  3. Verify the node list on the cluster:

    kubectl get node
    
  4. Compare this list with the node list in Calico to identify the old node:

    calicoctl get node -o wide
    
  5. Remove the old node from Calico:

    calicoctl delete node kaas-node-<nodeID>
    
[5782] Manager machine fails to be deployed during node replacement

Fixed in 2.28.4 (17.3.4 and 16.3.4)

During replacement of a manager machine, the following problems may occur:

  • The system adds the node to Docker swarm but not to Kubernetes

  • The node Deployment gets stuck with failed RethinkDB health checks

Workaround:

  1. Delete the failed node.

  2. Wait for the MKE cluster to become healthy. To monitor the cluster status:

    1. Log in to the MKE web UI as described in Connect to the Mirantis Kubernetes Engine web UI.

    2. Monitor the cluster status as described in MKE Operations Guide: Monitor an MKE cluster with the MKE web UI.

  3. Deploy a new node.

[5568] The calico-kube-controllers Pod fails to clean up resources

Fixed in 2.28.4 (17.3.4 and 16.3.4)

During the unsafe or forced deletion of a manager machine running the calico-kube-controllers Pod in the kube-system namespace, the following issues occur:

  • The calico-kube-controllers Pod fails to clean up resources associated with the deleted node

  • The calico-node Pod may fail to start up on a newly created node if the machine is provisioned with the same IP address as the deleted machine had

As a workaround, before deletion of the node running the calico-kube-controllers Pod, cordon and drain the node:

kubectl cordon <nodeName>
kubectl drain <nodeName>

Ceph
[41819] Graceful cluster reboot is blocked by the Ceph ClusterWorkloadLocks

Fixed in 2.27.0 (17.2.0 and 16.2.0)

During graceful reboot of a cluster with Ceph enabled, the reboot is blocked with the following message in the MiraCephMaintenance object status:

message: ClusterMaintenanceRequest found, Ceph Cluster is not ready to upgrade,
 delaying cluster maintenance

As a workaround, add the following snippet with the healthCheck configuration under metadataServer in the cephFS section of the spec in <kcc-name>.yaml of the Ceph cluster:

cephClusterSpec:
  sharedFilesystem:
    cephFS:
    - name: cephfs-store
      metadataServer:
        activeCount: 1
        healthCheck:
          livenessProbe:
            probe:
              failureThreshold: 5
              initialDelaySeconds: 30
              periodSeconds: 30
              successThreshold: 1
              timeoutSeconds: 5
[26441] Cluster update fails with the MountDevice failed for volume warning

Update of a managed cluster based on bare metal with Ceph enabled fails with a PersistentVolumeClaim getting stuck in the Pending state for the prometheus-server StatefulSet and the MountVolume.MountDevice failed for volume warning in the StackLight event logs.

Workaround:

  1. Verify that the descriptions of the Pods that failed to run contain the FailedMount events:

    kubectl -n <affectedProjectName> describe pod <affectedPodName>
    

    In the command above, replace the following values:

    • <affectedProjectName> is the Container Cloud project name where the Pods failed to run

    • <affectedPodName> is a Pod name that failed to run in the specified project

    In the Pod description, identify the node name where the Pod failed to run.

  2. Verify that the csi-rbdplugin logs of the affected node contain the rbd volume mount failed: <csi-vol-uuid> is being used error. The <csi-vol-uuid> is a unique RBD volume name.

    1. Identify csiPodName of the corresponding csi-rbdplugin:

      kubectl -n rook-ceph get pod -l app=csi-rbdplugin \
      -o jsonpath='{.items[?(@.spec.nodeName == "<nodeName>")].metadata.name}'
      
    2. Output the affected csiPodName logs:

      kubectl -n rook-ceph logs <csiPodName> -c csi-rbdplugin
      
  3. Scale down the affected StatefulSet or Deployment of the Pod that fails to 0 replicas.

  4. On every csi-rbdplugin Pod, search for stuck csi-vol:

    for pod in `kubectl -n rook-ceph get pods|grep rbdplugin|grep -v provisioner|awk '{print $1}'`; do
      echo $pod
      kubectl exec -it -n rook-ceph $pod -c csi-rbdplugin -- rbd device list | grep <csi-vol-uuid>
    done
    
  5. Unmap the affected csi-vol:

    rbd unmap -o force /dev/rbd<i>
    

    The /dev/rbd<i> value is a mapped RBD volume that uses csi-vol.

  6. Delete volumeattachment of the affected Pod:

    kubectl get volumeattachments | grep <csi-vol-uuid>
    kubectl delete volumeattachment <id>
    
  7. Scale up the affected StatefulSet or Deployment back to the original number of replicas and wait until its state becomes Running.


StackLight
[42304] Failure of shard relocation in the OpenSearch cluster

Fixed in 17.2.0, 16.2.0, 17.1.6, 16.1.6

On large managed clusters, shard relocation may fail in the OpenSearch cluster with the yellow or red status of the OpenSearch cluster. The characteristic symptom of the issue is that in the stacklight namespace, the statefulset.apps/opensearch-master containers are experiencing throttling with the KubeContainersCPUThrottlingHigh alert firing for the following set of labels:

{created_by_kind="StatefulSet",created_by_name="opensearch-master",namespace="stacklight"}

Caution

The throttling that OpenSearch is experiencing may be a temporary situation, related, for example, to a peak load and ongoing shard initialization as part of disaster recovery or after a node restart. In this case, Mirantis recommends waiting until the initialization of all shards is finished. After that, verify the cluster state and whether throttling still exists. Apply the workaround below only if throttling does not disappear.

To verify that the initialization of shards is ongoing:

kubectl exec -it pod/opensearch-master-0 -n stacklight -c opensearch -- bash

curl "http://localhost:9200/_cat/shards" | grep INITIALIZING

Example of system response:

.ds-system-000072    2 r INITIALIZING    10.232.182.135 opensearch-master-1
.ds-system-000073    1 r INITIALIZING    10.232.7.145   opensearch-master-2
.ds-system-000073    2 r INITIALIZING    10.232.182.135 opensearch-master-1
.ds-audit-000001     2 r INITIALIZING    10.232.7.145   opensearch-master-2

The system response above indicates that shards from the .ds-system-000072, .ds-system-000073, and .ds-audit-000001 indices are in the INITIALIZING state. In this case, Mirantis recommends waiting until this process is finished and only then considering a change of the limit.

You can additionally analyze the exact level of throttling and the current CPU usage on the Kubernetes Containers dashboard in Grafana.

Workaround:

  1. Verify the currently configured CPU requests and limits for the opensearch containers:

    kubectl -n stacklight get statefulset.apps/opensearch-master -o jsonpath="{.spec.template.spec.containers[?(@.name=='opensearch')].resources}"
    

    Example of system response:

    {"limits":{"cpu":"600m","memory":"8Gi"},"requests":{"cpu":"500m","memory":"6Gi"}}
    

    In the example above, the CPU request is 500m and the CPU limit is 600m.

  2. Increase the CPU limit to a reasonably high number.

    For example, the default CPU limit for the clusters with the clusterSize:large parameter set was increased from 8000m to 12000m for StackLight in Container Cloud 2.27.0 (Cluster releases 17.2.0 and 16.2.0).

    Note

    For details on the clusterSize parameter, see MOSK Operations Guide: StackLight configuration parameters - Cluster size.

    If the defaults are already overridden on the affected cluster using the resourcesPerClusterSize or resources parameters as described in MOSK Operations Guide: StackLight configuration parameters - Resource limits, then the exact recommended number depends on the currently set limit.

    Mirantis recommends increasing the limit by 50%. If it does not resolve the issue, another increase iteration will be required.

  3. After you select the required CPU limit, increase it as described in MOSK Operations Guide: StackLight configuration parameters - Resource limits.

    If the CPU limit for the opensearch component is already set in the Cluster object, increase the value of the opensearch parameter. Otherwise, the default StackLight limit is in effect; in this case, set the CPU limit for the opensearch component using the resources parameter. A configuration sketch is provided after this procedure.

  4. Wait until all opensearch-master pods are recreated with the new CPU limits and become running and ready.

    To verify the current CPU limit for every opensearch container in every opensearch-master pod separately:

    kubectl -n stacklight get pod/opensearch-master-<podSuffixNumber> -o jsonpath="{.spec.containers[?(@.name=='opensearch')].resources}"
    

    In the command above, replace <podSuffixNumber> with the pod suffix number. For example, pod/opensearch-master-0 or pod/opensearch-master-2.

    Example of system response:

    {"limits":{"cpu":"900m","memory":"8Gi"},"requests":{"cpu":"500m","memory":"6Gi"}}
    

    The waiting time may take up to 20 minutes depending on the cluster size.
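
For reference, the following snippet is a minimal sketch of such an override in the Cluster object using the resources parameter mentioned in step 3. The exact nesting of the StackLight values may differ between releases, so treat it as an illustration only and follow MOSK Operations Guide: StackLight configuration parameters - Resource limits for the authoritative format:

spec:
  providerSpec:
    value:
      helmReleases:
      - name: stacklight
        values:
          resources:
            opensearch:
              limits:
                cpu: "12000m" # Example value, select it following the 50% increase guidance above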

If the issue is fixed, the KubeContainersCPUThrottlingHigh alert stops firing immediately, while OpenSearchClusterStatusWarning or OpenSearchClusterStatusCritical can still be firing for some time during shard relocation.

If the KubeContainersCPUThrottlingHigh alert is still firing, proceed with another iteration of the CPU limit increase.

[40020] Rollover policy update is not applied to the current index

Fixed in 17.2.0, 16.2.0, 17.1.6, 16.1.6

While updating rollover_policy for the current system* and audit* data streams, the update is not applied to indices.

One of the indicators that the cluster is most likely affected is the KubeJobFailed alert firing for the elasticsearch-curator job along with one or both of the following errors in the elasticsearch-curator pods that remain in the Error status:

2024-05-31 13:16:04,459 ERROR   Failed to complete action: delete_indices.  <class 'curator.exceptions.FailedExecution'>: Exception encountered.  Rerun with loglevel DEBUG and/or check Elasticsearch logs for more information. Exception: RequestError(400, 'illegal_argument_exception', 'index [.ds-audit-000001] is the write index for data stream [audit] and cannot be deleted')

or

2024-05-31 13:16:04,459 ERROR   Failed to complete action: delete_indices.  <class 'curator.exceptions.FailedExecution'>: Exception encountered.  Rerun with loglevel DEBUG and/or check Elasticsearch logs for more information. Exception: RequestError(400, 'illegal_argument_exception', 'index [.ds-system-000001] is the write index for data stream [system] and cannot be deleted')

Note

Instead of .ds-audit-000001 or .ds-system-000001 index names, similar names can be present with the same prefix but different suffix numbers.

If the above-mentioned alert and errors are present, immediate action is required because the corresponding index size has already exceeded the space allocated for the index.

To verify that the cluster is affected:

Caution

Verify and apply the workaround to both index patterns, system and audit, separately.

If one of the indices is affected, the second one is most likely affected as well, although in rare cases only one index may be affected.

  1. Log in to the opensearch-master-0 Pod:

    kubectl exec -it pod/opensearch-master-0 -n stacklight -c opensearch -- bash
    
  2. Verify that the rollover policy is present:

    • system:

      curl localhost:9200/_plugins/_ism/policies/system_rollover_policy
      
    • audit:

      curl localhost:9200/_plugins/_ism/policies/audit_rollover_policy
      

    The cluster is affected if the rollover policy is missing. Otherwise, proceed to the following step.

  3. Verify the system response from the previous step. For example:

    {"_id":"system_rollover_policy","_version":7229,"_seq_no":42362,"_primary_term":28,"policy":{"policy_id":"system_rollover_policy","description":"system index rollover policy.","last_updated_time":1708505222430,"schema_version":19,"error_notification":null,"default_state":"rollover","states":[{"name":"rollover","actions":[{"retry":{"count":3,"backoff":"exponential","delay":"1m"},"rollover":{"min_size":"14746mb","copy_alias":false}}],"transitions":[]}],"ism_template":[{"index_patterns":["system*"],"priority":200,"last_updated_time":1708505222430}]}}
    

    Verify and capture the following items separately for every policy (a filtering example is provided after this procedure):

    • The _seq_no and _primary_term values

    • The rollover policy threshold, which is defined in policy.states[0].actions[0].rollover.min_size

  4. List indices:

    • system:

      curl localhost:9200/_cat/indices | grep system
      

      Example of system response:

      [...]
      green open .ds-system-000001   FjglnZlcTKKfKNbosaE9Aw 2 1 1998295  0   1gb 507.9mb
      
    • audit:

      curl localhost:9200/_cat/indices | grep audit
      

      Example of system response:

      [...]
      green open .ds-audit-000001   FjglnZlcTKKfKNbosaE9Aw 2 1 1998295  0   1gb 507.9mb
      
  5. Select the index with the highest number and verify the rollover policy attached to the index:

    • system:

      curl localhost:9200/_plugins/_ism/explain/.ds-system-000001
      
    • audit:

      curl localhost:9200/_plugins/_ism/explain/.ds-audit-000001
      
    • If the rollover policy is not attached, the cluster is affected.

    • If the rollover policy is attached but _seq_no and _primary_term numbers do not match the previously captured ones, the cluster is affected.

    • If the index size drastically exceeds the defined threshold of the rollover policy (which is the previously captured min_size), the cluster is most probably affected.
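
To simplify capturing the _seq_no, _primary_term, and min_size values mentioned in step 3, you can filter the policy output. The example below assumes only that the pretty query parameter is honored by the endpoint and that grep is available in the opensearch container, as in the commands above; run the same command with audit_rollover_policy for the audit policy:

curl -s "localhost:9200/_plugins/_ism/policies/system_rollover_policy?pretty" | grep -e _seq_no -e _primary_term -e min_size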

Workaround:

  1. Log in to the opensearch-master-0 Pod:

    kubectl exec -it pod/opensearch-master-0 -n stacklight -c opensearch -- bash
    
  2. If the policy is attached to the index but has different _seq_no and _primary_term, remove the policy from the index:

    Note

    Use the index with the highest number in the name, which was captured during the verification procedure.

    • system:

      curl -XPOST localhost:9200/_plugins/_ism/remove/.ds-system-000001
      
    • audit:

      curl -XPOST localhost:9200/_plugins/_ism/remove/.ds-audit-000001
      
  3. Re-add the policy:

    • system:

      curl -XPOST -H "Content-type: application/json" localhost:9200/_plugins/_ism/add/system* -d'{"policy_id":"system_rollover_policy"}'
      
    • audit:

      curl -XPOST -H "Content-type: application/json" localhost:9200/_plugins/_ism/add/audit* -d'{"policy_id":"audit_rollover_policy"}'
      
  4. Repeat the last step of the cluster verification procedure provided above and make sure that the policy is attached to the index and has the same _seq_no and _primary_term.

    If the index size drastically exceeds the defined threshold of the rollover policy (which is the previously captured min_size), wait up to 15 minutes and verify that the additional index is created with the consecutive number in the index name. For example:

    • system: if you applied changes to .ds-system-000001, wait until .ds-system-000002 is created.

    • audit: if you applied changes to .ds-audit-000001, wait until .ds-audit-000002 is created.

    If such an index is not created, escalate the issue to Mirantis support.


Container Cloud web UI
[41806] Configuration of a management cluster fails without Keycloak settings

Fixed in 17.1.4 and 16.1.4

During configuration of management cluster settings using the Configure cluster web UI menu, updating the Keycloak Truststore settings is erroneously required, although these settings must remain optional.

As a workaround, update the management cluster using the API or CLI.
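
For example, you can edit the management cluster configuration directly in the Cluster object. In the command below, the kubeconfig path, project namespace, and cluster name are placeholders:

kubectl --kubeconfig <mgmtClusterKubeconfig> -n <projectNamespace> edit cluster <mgmtClusterName>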

Artifacts

This section lists the artifacts of components included in the Container Cloud patch release 2.26.2. For artifacts of the Cluster releases introduced in 2.26.2, see patch Cluster releases 17.1.2 and 16.1.2.

Note

The components that are newly added, updated, deprecated, or removed as compared to the previous release version, are marked with a corresponding superscript, for example, lcm-ansible Updated.

Bare metal artifacts

Artifact

Component

Path

Binaries Updated

ironic-python-agent.initramfs

https://binary.mirantis.com/bm/bin/ironic/ipa/ubuntu/initramfs-yoga-focal-debug-20240324195604

ironic-python-agent.kernel

https://binary.mirantis.com/bm/bin/ironic/ipa/ubuntu/kernel-yoga-focal-debug-20240324195604

Helm charts Updated

baremetal-api

https://binary.mirantis.com/core/helm/baremetal-api-1.39.19.tgz

baremetal-operator

https://binary.mirantis.com/core/helm/baremetal-operator-1.39.19.tgz

baremetal-provider

https://binary.mirantis.com/core/helm/baremetal-provider-1.39.19.tgz

baremetal-public-api

https://binary.mirantis.com/core/helm/baremetal-public-api-1.39.19.tgz

kaas-ipam

https://binary.mirantis.com/core/helm/kaas-ipam-1.39.19.tgz

local-volume-provisioner

https://binary.mirantis.com/core/helm/local-volume-provisioner-1.39.19.tgz

metallb

https://binary.mirantis.com/core/helm/metallb-1.39.19.tgz

Docker images

ambasador Updated

mirantis.azurecr.io/core/external/nginx:1.39.19

baremetal-dnsmasq Updated

mirantis.azurecr.io/bm/baremetal-dnsmasq:base-2-26-alpine-20240325100252

baremetal-operator Updated

mirantis.azurecr.io/bm/baremetal-operator:base-2-26-alpine-20240325093002

bm-collective

mirantis.azurecr.io/bm/bm-collective:base-2-26-alpine-20240129155244

cluster-api-provider-baremetal Updated

mirantis.azurecr.io/core/cluster-api-provider-baremetal:1.39.19

ironic

mirantis.azurecr.io/openstack/ironic:yoga-jammy-20240226060024

ironic-inspector

mirantis.azurecr.io/openstack/ironic-inspector:yoga-jammy-20240226060024

ironic-prometheus-exporter

mirantis.azurecr.io/stacklight/ironic-prometheus-exporter:0.1-20240117102150

kaas-ipam

mirantis.azurecr.io/bm/kaas-ipam:base-2-26-alpine-20240129213142

kubernetes-entrypoint

mirantis.azurecr.io/openstack/extra/kubernetes-entrypoint:v1.0.1-55b02f7-20231019172556

mariadb Updated

mirantis.azurecr.io/general/mariadb:10.6.14-focal-20240311120505

mcc-keepalived

mirantis.azurecr.io/lcm/mcc-keepalived:v0.24.0-47-gf77368e

metallb-controller

mirantis.azurecr.io/bm/metallb/controller:v0.13.12-31212f9e-amd64

metallb-speaker

mirantis.azurecr.io/bm/metallb/speaker:v0.13.12-31212f9e-amd64

syslog-ng

mirantis.azurecr.io/bm/syslog-ng:base-alpine-20240129163811

Core artifacts

Artifact

Component

Path

Bootstrap tarball Updated

bootstrap-darwin

https://binary.mirantis.com/core/bin/bootstrap-darwin-1.39.19.tgz

bootstrap-linux

https://binary.mirantis.com/core/bin/bootstrap-linux-1.39.19.tgz

Helm charts Updated

admission-controller

https://binary.mirantis.com/core/helm/admission-controller-1.39.19.tgz

agent-controller

https://binary.mirantis.com/core/helm/agent-controller-1.39.19.tgz

byo-credentials-controller

https://binary.mirantis.com/core/helm/byo-credentials-controller-1.39.19.tgz

byo-provider

https://binary.mirantis.com/core/helm/byo-provider-1.39.19.tgz

ceph-kcc-controller

https://binary.mirantis.com/core/helm/ceph-kcc-controller-1.39.19.tgz

cert-manager

https://binary.mirantis.com/core/helm/cert-manager-1.39.19.tgz

cinder-csi-plugin

https://binary.mirantis.com/core/helm/cinder-csi-plugin-1.39.19.tgz

client-certificate-controller

https://binary.mirantis.com/core/helm/client-certificate-controller-1.39.19.tgz

configuration-collector

https://binary.mirantis.com/core/helm/configuration-collector-1.39.19.tgz

event-controller

https://binary.mirantis.com/core/helm/event-controller-1.39.19.tgz

host-os-modules-controller

https://binary.mirantis.com/core/helm/host-os-modules-controller-1.39.19.tgz

iam-controller

https://binary.mirantis.com/core/helm/iam-controller-1.39.19.tgz

kaas-exporter

https://binary.mirantis.com/core/helm/kaas-exporter-1.39.19.tgz

kaas-public-api

https://binary.mirantis.com/core/helm/kaas-public-api-1.39.19.tgz

kaas-ui

https://binary.mirantis.com/core/helm/kaas-ui-1.39.19.tgz

lcm-controller

https://binary.mirantis.com/core/helm/lcm-controller-1.39.19.tgz

license-controller

https://binary.mirantis.com/core/helm/license-controller-1.39.19.tgz

machinepool-controller

https://binary.mirantis.com/core/helm/machinepool-controller-1.39.19.tgz

mcc-cache

https://binary.mirantis.com/core/helm/mcc-cache-1.39.19.tgz

mcc-cache-warmup

https://binary.mirantis.com/core/helm/mcc-cache-warmup-1.39.19.tgz

metrics-server

https://binary.mirantis.com/core/helm/metrics-server-1.39.19.tgz

openstack-cloud-controller-manager

https://binary.mirantis.com/core/helm/openstack-cloud-controller-manager-1.39.19.tgz

openstack-provider

https://binary.mirantis.com/core/helm/openstack-provider-1.39.19.tgz

os-credentials-controller

https://binary.mirantis.com/core/helm/os-credentials-controller-1.39.19.tgz

policy-controller

https://binary.mirantis.com/core/helm/policy-controller-1.39.19.tgz

portforward-controller

https://binary.mirantis.com/core/helm/portforward-controller-1.39.19.tgz

proxy-controller

https://binary.mirantis.com/core/helm/proxy-controller-1.39.19.tgz

rbac-controller

https://binary.mirantis.com/core/helm/rbac-controller-1.39.19.tgz

release-controller

https://binary.mirantis.com/core/helm/release-controller-1.39.19.tgz

rhellicense-controller

https://binary.mirantis.com/core/helm/rhellicense-controller-1.39.19.tgz

scope-controller

https://binary.mirantis.com/core/helm/scope-controller-1.39.19.tgz

squid-proxy

https://binary.mirantis.com/core/helm/squid-proxy-1.39.19.tgz

storage-discovery

https://binary.mirantis.com/core/helm/storage-discovery-1.39.19.tgz

user-controller

https://binary.mirantis.com/core/helm/user-controller-1.39.19.tgz

vsphere-cloud-controller-manager

https://binary.mirantis.com/core/helm/vsphere-cloud-controller-manager-1.39.19.tgz

vsphere-credentials-controller

https://binary.mirantis.com/core/helm/vsphere-credentials-controller-1.39.19.tgz

vsphere-csi-plugin

https://binary.mirantis.com/core/helm/vsphere-csi-plugin-1.39.19.tgz

vsphere-provider

https://binary.mirantis.com/core/helm/vsphere-provider-1.39.19.tgz

vsphere-vm-template-controller

https://binary.mirantis.com/core/helm/vsphere-vm-template-controller-1.39.19.tgz

Docker images

admission-controller Updated

mirantis.azurecr.io/core/admission-controller:1.39.19

agent-controller Updated

mirantis.azurecr.io/core/agent-controller:1.39.19

byo-cluster-api-controller Updated

mirantis.azurecr.io/core/byo-cluster-api-controller:1.39.19

byo-credentials-controller Updated

mirantis.azurecr.io/core/byo-credentials-controller:1.39.19

ceph-kcc-controller Updated

mirantis.azurecr.io/core/ceph-kcc-controller:1.39.19

cert-manager-controller

mirantis.azurecr.io/core/external/cert-manager-controller:v1.11.0-5

cinder-csi-plugin

mirantis.azurecr.io/lcm/kubernetes/cinder-csi-plugin:v1.27.2-13

client-certificate-controller Updated

mirantis.azurecr.io/core/client-certificate-controller:1.39.19

configuration-collector Updated

mirantis.azurecr.io/core/configuration-collector:1.39.19

csi-attacher

mirantis.azurecr.io/lcm/k8scsi/csi-attacher:v4.2.0-4

csi-node-driver-registrar

mirantis.azurecr.io/lcm/k8scsi/csi-node-driver-registrar:v2.7.0-4

csi-provisioner

mirantis.azurecr.io/lcm/k8scsi/csi-provisioner:v3.4.1-4

csi-resizer

mirantis.azurecr.io/lcm/k8scsi/csi-resizer:v1.7.0-4

csi-snapshotter

mirantis.azurecr.io/lcm/k8scsi/csi-snapshotter:v6.2.1-mcc-3

event-controller Updated

mirantis.azurecr.io/core/event-controller:1.39.19

frontend Updated

mirantis.azurecr.io/core/frontend:1.39.19

host-os-modules-controller Updated

mirantis.azurecr.io/core/host-os-modules-controller:1.39.19

iam-controller Updated

mirantis.azurecr.io/core/iam-controller:1.39.19

kaas-exporter Updated

mirantis.azurecr.io/core/kaas-exporter:1.39.19

kproxy Updated

mirantis.azurecr.io/core/kproxy:1.39.19

lcm-controller Updated

mirantis.azurecr.io/core/lcm-controller:1.39.19

license-controller Updated

mirantis.azurecr.io/core/license-controller:1.39.19

livenessprobe

mirantis.azurecr.io/lcm/k8scsi/livenessprobe:v2.9.0-4

machinepool-controller Updated

mirantis.azurecr.io/core/machinepool-controller:1.39.19

mcc-haproxy

mirantis.azurecr.io/lcm/mcc-haproxy:v0.24.0-47-gf77368e

mcc-keepalived

mirantis.azurecr.io/lcm/mcc-keepalived:v0.24.0-47-gf77368e

metrics-server

mirantis.azurecr.io/lcm/metrics-server-amd64:v0.6.3-6

nginx Updated

mirantis.azurecr.io/core/external/nginx:1.39.19

openstack-cloud-controller-manager Updated

mirantis.azurecr.io/lcm/kubernetes/openstack-cloud-controller-manager:v1.27.2-13

openstack-cluster-api-controller Updated

mirantis.azurecr.io/core/openstack-cluster-api-controller:1.39.19

os-credentials-controller Updated

mirantis.azurecr.io/core/os-credentials-controller:1.39.19

policy-controller Updated

mirantis.azurecr.io/core/policy-controller:1.39.19

portforward-controller Updated

mirantis.azurecr.io/core/portforward-controller:1.39.19

proxy-controller Updated

mirantis.azurecr.io/core/proxy-controller:1.39.19

rbac-controller Updated

mirantis.azurecr.io/core/rbac-controller:1.39.19

registry

mirantis.azurecr.io/lcm/registry:v2.8.1-9

release-controller Updated

mirantis.azurecr.io/core/release-controller:1.39.19

rhellicense-controller Updated

mirantis.azurecr.io/core/rhellicense-controller:1.39.19

scope-controller Updated

mirantis.azurecr.io/core/scope-controller:1.39.19

squid-proxy

mirantis.azurecr.io/lcm/squid-proxy:0.0.1-10-g24a0d69

storage-discovery Updated

mirantis.azurecr.io/core/storage-discovery:1.39.19

user-controller Updated

mirantis.azurecr.io/core/user-controller:1.39.19

vsphere-cloud-controller-manager

mirantis.azurecr.io/lcm/kubernetes/vsphere-cloud-controller-manager:v1.27.0-5

vsphere-cluster-api-controller Updated

mirantis.azurecr.io/core/vsphere-cluster-api-controller:1.39.19

vsphere-credentials-controller Updated

mirantis.azurecr.io/core/vsphere-credentials-controller:1.39.19

vsphere-csi-driver

mirantis.azurecr.io/lcm/kubernetes/vsphere-csi-driver:v3.0.2-1

vsphere-csi-syncer

mirantis.azurecr.io/lcm/kubernetes/vsphere-csi-syncer:v3.0.2-1

vsphere-vm-template-controller Updated

mirantis.azurecr.io/core/vsphere-vm-template-controller:1.39.19

IAM artifacts

Artifact

Component

Path

Binaries

hash-generate-darwin

http://binary.mirantis.com/iam/bin/hash-generate-0.0.1-268-3cf7f17-darwin

hash-generate-linux

http://binary.mirantis.com/iam/bin/hash-generate-0.0.1-268-3cf7f17-linux

Helm charts Updated

iam

https://binary.mirantis.com/core/helm/iam-1.39.19.tgz

Docker images

kubectl Updated

mirantis.azurecr.io/stacklight/kubectl:1.22-20240221023016

kubernetes-entrypoint

mirantis.azurecr.io/openstack/extra/kubernetes-entrypoint:v1.0.1-55b02f7-20231019172556

mariadb

mirantis.azurecr.io/general/mariadb:10.6.14-focal-20231127070342

mcc-keycloak Updated

mirantis.azurecr.io/iam/mcc-keycloak:23.0.6-20240216125244

See also

Patch releases

2.26.1

The Container Cloud patch release 2.26.1, which is based on the 2.26.0 major release, provides the following updates:

  • Support for the patch Cluster releases 16.1.1 and 17.1.1 that represent Mirantis OpenStack for Kubernetes (MOSK) patch release 24.1.1.

  • Delivery mechanism for CVE fixes on Ubuntu in bare metal clusters that includes update of Ubuntu kernel minor version. For details, see Enhancements.

  • Security fixes for CVEs in images.

This patch release also supports the latest major Cluster releases 17.1.0 and 16.1.0. It does not support greenfield deployments based on deprecated Cluster releases; use the latest available Cluster release instead.

For main deliverables of the parent Container Cloud release of 2.26.1, refer to 2.26.0.

Enhancements

This section outlines new features and enhancements introduced in the Container Cloud patch release 2.26.1 along with Cluster releases 17.1.1 and 16.1.1.

Delivery mechanism for CVE fixes on Ubuntu in bare metal clusters

Introduced the ability to update Ubuntu packages, including the kernel minor version when available in a Cluster release, for both management and managed bare metal clusters to address CVE issues in the host operating system.

  • On management clusters, the update of Ubuntu mirror along with the update of minor kernel version occurs automatically with cordon-drain and reboot of machines.

  • On managed clusters, the update of Ubuntu mirror along with the update of minor kernel version applies during a manual cluster update without automatic cordon-drain and reboot of machines. After a managed cluster update, all cluster machines have the reboot is required notification. You can manually handle the reboot of machines during a convenient maintenance window using GracefulRebootRequest.
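
For illustration, a minimal GracefulRebootRequest object might look as follows. This is a sketch that assumes the kaas.mirantis.com/v1alpha1 API version; refer to the Operations Guide for the authoritative specification and procedure:

apiVersion: kaas.mirantis.com/v1alpha1
kind: GracefulRebootRequest
metadata:
  name: <clusterName>      # assumed to match the name of the cluster to reboot
  namespace: <projectName> # project namespace of the cluster
spec:
  machines:                # machines to reboot; an empty list is assumed to target all cluster machines
  - <machineName>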

Artifacts

This section lists the artifacts of components included in the Container Cloud patch release 2.26.1. For artifacts of the Cluster releases introduced in 2.26.1, see patch Cluster releases 17.1.1 and 16.1.1.

Note

The components that are newly added, updated, deprecated, or removed as compared to the previous release version are marked with a corresponding superscript, for example, lcm-ansible Updated.

Bare metal artifacts
Bare metal artifacts

Artifact

Component

Path

Binaries Updated

ironic-python-agent.initramfs

https://binary.mirantis.com/bm/bin/ironic/ipa/ubuntu/initramfs-yoga-focal-debug-20240302181430

ironic-python-agent.kernel

https://binary.mirantis.com/bm/bin/ironic/ipa/ubuntu/kernel-yoga-focal-debug-20240302181430

provisioning_ansible

https://binary.mirantis.com/bm/bin/ansible/provisioning_ansible-0.1.1-155-1882779.tgz

Helm charts Updated

baremetal-api

https://binary.mirantis.com/core/helm/baremetal-api-1.39.15.tgz

baremetal-operator

https://binary.mirantis.com/core/helm/baremetal-operator-1.39.15.tgz

baremetal-provider

https://binary.mirantis.com/core/helm/baremetal-provider-1.39.15.tgz

baremetal-public-api

https://binary.mirantis.com/core/helm/baremetal-public-api-1.39.15.tgz

kaas-ipam

https://binary.mirantis.com/core/helm/kaas-ipam-1.39.15.tgz

local-volume-provisioner

https://binary.mirantis.com/core/helm/local-volume-provisioner-1.39.15.tgz

metallb

https://binary.mirantis.com/core/helm/metallb-1.39.15.tgz

Docker images

ambasador Updated

mirantis.azurecr.io/core/external/nginx:1.39.15

baremetal-dnsmasq Updated

mirantis.azurecr.io/bm/baremetal-dnsmasq:base-2-26-alpine-20240226130438

baremetal-operator Updated

mirantis.azurecr.io/bm/baremetal-operator:base-2-26-alpine-20240226130310

bm-collective

mirantis.azurecr.io/bm/bm-collective:base-2-26-alpine-20240129155244

cluster-api-provider-baremetal Updated

mirantis.azurecr.io/core/cluster-api-provider-baremetal:1.39.15

ironic Updated

mirantis.azurecr.io/openstack/ironic:yoga-jammy-20240226060024

ironic-inspector Updated

mirantis.azurecr.io/openstack/ironic-inspector:yoga-jammy-20240226060024

ironic-prometheus-exporter

mirantis.azurecr.io/stacklight/ironic-prometheus-exporter:0.1-20240117102150

kaas-ipam

mirantis.azurecr.io/bm/kaas-ipam:base-2-26-alpine-20240129213142

kubernetes-entrypoint

mirantis.azurecr.io/openstack/extra/kubernetes-entrypoint:v1.0.1-55b02f7-20231019172556

mariadb

mirantis.azurecr.io/general/mariadb:10.6.14-focal-20231127070342

mcc-keepalived Updated

mirantis.azurecr.io/lcm/mcc-keepalived:v0.24.0-47-gf77368e

metallb-controller

mirantis.azurecr.io/bm/metallb/controller:v0.13.12-31212f9e-amd64

metallb-speaker

mirantis.azurecr.io/bm/metallb/speaker:v0.13.12-31212f9e-amd64

syslog-ng

mirantis.azurecr.io/bm/syslog-ng:base-alpine-20240129163811

Core artifacts
Core artifacts

Artifact

Component

Path

Bootstrap tarball Updated

bootstrap-darwin

https://binary.mirantis.com/core/bin/bootstrap-darwin-1.39.15.tgz

bootstrap-linux

https://binary.mirantis.com/core/bin/bootstrap-linux-1.39.15.tgz

Helm charts Updated

admission-controller

https://binary.mirantis.com/core/helm/admission-controller-1.39.15.tgz

agent-controller

https://binary.mirantis.com/core/helm/agent-controller-1.39.15.tgz

byo-credentials-controller

https://binary.mirantis.com/core/helm/byo-credentials-controller-1.39.15.tgz

byo-provider

https://binary.mirantis.com/core/helm/byo-provider-1.39.15.tgz

ceph-kcc-controller

https://binary.mirantis.com/core/helm/ceph-kcc-controller-1.39.15.tgz

cert-manager

https://binary.mirantis.com/core/helm/cert-manager-1.39.15.tgz

cinder-csi-plugin

https://binary.mirantis.com/core/helm/cinder-csi-plugin-1.39.15.tgz

client-certificate-controller

https://binary.mirantis.com/core/helm/client-certificate-controller-1.39.15.tgz

configuration-collector

https://binary.mirantis.com/core/helm/configuration-collector-1.39.15.tgz

event-controller

https://binary.mirantis.com/core/helm/event-controller-1.39.15.tgz

host-os-modules-controller

https://binary.mirantis.com/core/helm/host-os-modules-controller-1.39.15.tgz

iam-controller

https://binary.mirantis.com/core/helm/iam-controller-1.39.15.tgz

kaas-exporter

https://binary.mirantis.com/core/helm/kaas-exporter-1.39.15.tgz

kaas-public-api

https://binary.mirantis.com/core/helm/kaas-public-api-1.39.15.tgz

kaas-ui

https://binary.mirantis.com/core/helm/kaas-ui-1.39.15.tgz

lcm-controller

https://binary.mirantis.com/core/helm/lcm-controller-1.39.15.tgz

license-controller

https://binary.mirantis.com/core/helm/license-controller-1.39.15.tgz

machinepool-controller

https://binary.mirantis.com/core/helm/machinepool-controller-1.39.15.tgz

mcc-cache

https://binary.mirantis.com/core/helm/mcc-cache-1.39.15.tgz

mcc-cache-warmup

https://binary.mirantis.com/core/helm/mcc-cache-warmup-1.39.15.tgz

metrics-server

https://binary.mirantis.com/core/helm/metrics-server-1.39.15.tgz

openstack-cloud-controller-manager

https://binary.mirantis.com/core/helm/openstack-cloud-controller-manager-1.39.15.tgz

openstack-provider

https://binary.mirantis.com/core/helm/openstack-provider-1.39.15.tgz

os-credentials-controller

https://binary.mirantis.com/core/helm/os-credentials-controller-1.39.15.tgz

policy-controller

https://binary.mirantis.com/core/helm/policy-controller-1.39.15.tgz

portforward-controller

https://binary.mirantis.com/core/helm/portforward-controller-1.39.15.tgz

proxy-controller

https://binary.mirantis.com/core/helm/proxy-controller-1.39.15.tgz

rbac-controller

https://binary.mirantis.com/core/helm/rbac-controller-1.39.15.tgz

release-controller

https://binary.mirantis.com/core/helm/release-controller-1.39.15.tgz

rhellicense-controller

https://binary.mirantis.com/core/helm/rhellicense-controller-1.39.15.tgz

scope-controller

https://binary.mirantis.com/core/helm/scope-controller-1.39.15.tgz

squid-proxy

https://binary.mirantis.com/core/helm/squid-proxy-1.39.15.tgz

storage-discovery

https://binary.mirantis.com/core/helm/storage-discovery-1.39.15.tgz

user-controller

https://binary.mirantis.com/core/helm/user-controller-1.39.15.tgz

vsphere-cloud-controller-manager

https://binary.mirantis.com/core/helm/vsphere-cloud-controller-manager-1.39.15.tgz

vsphere-credentials-controller

https://binary.mirantis.com/core/helm/vsphere-credentials-controller-1.39.15.tgz

vsphere-csi-plugin

https://binary.mirantis.com/core/helm/vsphere-csi-plugin-1.39.15.tgz

vsphere-provider

https://binary.mirantis.com/core/helm/vsphere-provider-1.39.15.tgz

vsphere-vm-template-controller

https://binary.mirantis.com/core/helm/vsphere-vm-template-controller-1.39.15.tgz

Docker images

admission-controller Updated

mirantis.azurecr.io/core/admission-controller:1.39.15

agent-controller Updated

mirantis.azurecr.io/core/agent-controller:1.39.15

byo-cluster-api-controller Updated

mirantis.azurecr.io/core/byo-cluster-api-controller:1.39.15

byo-credentials-controller Updated

mirantis.azurecr.io/core/byo-credentials-controller:1.39.15

ceph-kcc-controller Updated

mirantis.azurecr.io/core/ceph-kcc-controller:1.39.15

cert-manager-controller

mirantis.azurecr.io/core/external/cert-manager-controller:v1.11.0-5

cinder-csi-plugin Updated

mirantis.azurecr.io/lcm/kubernetes/cinder-csi-plugin:v1.27.2-13

client-certificate-controller Updated

mirantis.azurecr.io/core/client-certificate-controller:1.39.15

configuration-collector Updated

mirantis.azurecr.io/core/configuration-collector:1.39.15

csi-attacher

mirantis.azurecr.io/lcm/k8scsi/csi-attacher:v4.2.0-4

csi-node-driver-registrar

mirantis.azurecr.io/lcm/k8scsi/csi-node-driver-registrar:v2.7.0-4

csi-provisioner

mirantis.azurecr.io/lcm/k8scsi/csi-provisioner:v3.4.1-4

csi-resizer

mirantis.azurecr.io/lcm/k8scsi/csi-resizer:v1.7.0-4

csi-snapshotter

mirantis.azurecr.io/lcm/k8scsi/csi-snapshotter:v6.2.1-mcc-3

event-controller Updated

mirantis.azurecr.io/core/event-controller:1.39.15

frontend Updated

mirantis.azurecr.io/core/frontend:1.39.15

host-os-modules-controller Updated

mirantis.azurecr.io/core/host-os-modules-controller:1.39.15

iam-controller Updated

mirantis.azurecr.io/core/iam-controller:1.39.15

kaas-exporter Updated

mirantis.azurecr.io/core/kaas-exporter:1.39.15

kproxy Updated

mirantis.azurecr.io/core/kproxy:1.39.15

lcm-controller Updated

mirantis.azurecr.io/core/lcm-controller:1.39.15

license-controller Updated

mirantis.azurecr.io/core/license-controller:1.39.15

livenessprobe

mirantis.azurecr.io/lcm/k8scsi/livenessprobe:v2.9.0-4

machinepool-controller Updated

mirantis.azurecr.io/core/machinepool-controller:1.39.15

mcc-haproxy Updated

mirantis.azurecr.io/lcm/mcc-haproxy:v0.24.0-47-gf77368e

mcc-keepalived Updated

mirantis.azurecr.io/lcm/mcc-keepalived:v0.24.0-47-gf77368e

metrics-server

mirantis.azurecr.io/lcm/metrics-server-amd64:v0.6.3-6

nginx Updated

mirantis.azurecr.io/core/external/nginx:1.39.15

openstack-cloud-controller-manager Updated

mirantis.azurecr.io/lcm/kubernetes/openstack-cloud-controller-manager:v1.27.2-13

openstack-cluster-api-controller Updated

mirantis.azurecr.io/core/openstack-cluster-api-controller:1.39.15

os-credentials-controller Updated

mirantis.azurecr.io/core/os-credentials-controller:1.39.15

policy-controller Updated

mirantis.azurecr.io/core/policy-controller:1.39.15

portforward-controller Updated

mirantis.azurecr.io/core/portforward-controller:1.39.15

proxy-controller Updated

mirantis.azurecr.io/core/proxy-controller:1.39.15

rbac-controller Updated

mirantis.azurecr.io/core/rbac-controller:1.39.15

registry

mirantis.azurecr.io/lcm/registry:v2.8.1-9

release-controller Updated

mirantis.azurecr.io/core/release-controller:1.39.15

rhellicense-controller Updated

mirantis.azurecr.io/core/rhellicense-controller:1.39.15

scope-controller Updated

mirantis.azurecr.io/core/scope-controller:1.39.15

squid-proxy

mirantis.azurecr.io/lcm/squid-proxy:0.0.1-10-g24a0d69

storage-discovery Updated

mirantis.azurecr.io/core/storage-discovery:1.39.15

user-controller Updated

mirantis.azurecr.io/core/user-controller:1.39.15

vsphere-cloud-controller-manager

mirantis.azurecr.io/lcm/kubernetes/vsphere-cloud-controller-manager:v1.27.0-5

vsphere-cluster-api-controller Updated

mirantis.azurecr.io/core/vsphere-cluster-api-controller:1.39.15

vsphere-credentials-controller Updated

mirantis.azurecr.io/core/vsphere-credentials-controller:1.39.15

vsphere-csi-driver

mirantis.azurecr.io/lcm/kubernetes/vsphere-csi-driver:v3.0.2-1

vsphere-csi-syncer

mirantis.azurecr.io/lcm/kubernetes/vsphere-csi-syncer:v3.0.2-1

vsphere-vm-template-controller Updated

mirantis.azurecr.io/core/vsphere-vm-template-controller:1.39.15

IAM artifacts

Artifact

Component

Path

Binaries

hash-generate-darwin

http://binary.mirantis.com/iam/bin/hash-generate-0.0.1-268-3cf7f17-darwin

hash-generate-linux

http://binary.mirantis.com/iam/bin/hash-generate-0.0.1-268-3cf7f17-linux

Helm charts

iam

https://binary.mirantis.com/core/helm/iam-1.39.15.tgz

Docker images

kubectl

mirantis.azurecr.io/stacklight/kubectl:1.22-20240105023016

kubernetes-entrypoint

mirantis.azurecr.io/openstack/extra/kubernetes-entrypoint:v1.0.1-55b02f7-20231019172556

mariadb

mirantis.azurecr.io/general/mariadb:10.6.14-focal-20231127070342

mcc-keycloak

mirantis.azurecr.io/iam/mcc-keycloak:23.0.3-1

Security notes

The table below includes the total numbers of addressed unique and common CVEs in images by product component since the Container Cloud 2.26.0 major release. The common CVEs are issues addressed across several images.

Addressed CVEs - summary

Product component   CVE type   Critical   High   Total

Ceph                Unique     0          1      1
Ceph                Common     0          3      3
Kaas core           Unique     0          6      6
Kaas core           Common     0          27     27
StackLight          Unique     0          15     15
StackLight          Common     0          51     51

Mirantis Security Portal

For the detailed list of fixed and existing CVEs across the Mirantis Container Cloud and MOSK products, refer to Mirantis Security Portal.

MOSK CVEs

For the number of fixed CVEs in the MOSK-related components including OpenStack and Tungsten Fabric, refer to MOSK 24.1.1: Security notes.

Addressed issues

The following issues have been addressed in the Container Cloud patch release 2.26.1 along with the patch Cluster releases 17.1.1 and 16.1.1.

  • [39330] [StackLight] Fixed the issue with the OpenSearch cluster being stuck due to initializing replica shards.

  • [39220] [StackLight] Fixed the issue with Patroni failure due to no limit configuration for the max_timelines_history parameter.

  • [39080] [StackLight] Fixed the issue with the OpenSearchClusterStatusWarning alert firing during cluster upgrade if StackLight is deployed in the HA mode.

  • [38970] [StackLight] Fixed the issue with the Logs dashboard in the OpenSearch Dashboards web UI not working for the system index.

  • [38937] [StackLight] Fixed the issue with the View logs in OpenSearch Dashboards link not working in the Grafana web UI.

  • [40747] [vSphere] Fixed the issue with the unsupported Cluster release being available for greenfield vSphere-based managed cluster deployments in the drop-down menu of the cluster creation window in the Container Cloud web UI.

  • [40036] [LCM] Fixed the issue causing nodes to remain in the Kubernetes cluster when the corresponding Machine object is disabled during cluster update.

Known issues

This section lists known issues with workarounds for the Mirantis Container Cloud release 2.26.1 including the Cluster releases 17.1.1 and 16.1.1.

For other issues that can occur while deploying and operating a Container Cloud cluster, see Deployment Guide: Troubleshooting and Operations Guide: Troubleshooting.

Note

This section also outlines still valid known issues from previous Container Cloud releases.

Bare metal
[46245] Lack of access permissions for HOC and HOCM objects

Fixed in 2.28.0 (17.3.0 and 16.3.0)

When trying to list the HostOSConfigurationModules and HostOSConfiguration custom resources, serviceuser or a user with the global-admin or operator role obtains the access denied error. For example:

kubectl --kubeconfig ~/.kube/mgmt-config get hocm

Error from server (Forbidden): hostosconfigurationmodules.kaas.mirantis.com is forbidden:
User "2d74348b-5669-4c65-af31-6c05dbedac5f" cannot list resource "hostosconfigurationmodules"
in API group "kaas.mirantis.com" at the cluster scope: access denied

Workaround:

  1. Modify the global-admin role by adding a new entry with the following contents to the rules list:

    kubectl edit clusterroles kaas-global-admin
    
    - apiGroups: [kaas.mirantis.com]
      resources: [hostosconfigurationmodules]
      verbs: ['*']
    
  2. For each Container Cloud project, modify the kaas-operator role by adding a new entry with the following contents to the rules list:

    kubectl -n <projectName> edit roles kaas-operator
    
    - apiGroups: [kaas.mirantis.com]
      resources: [hostosconfigurations]
      verbs: ['*']
    
[42386] A load balancer service does not obtain the external IP address

Due to the MetalLB upstream issue, a load balancer service may not obtain the external IP address.

The issue occurs when two services share the same external IP address and have the same externalTrafficPolicy value. Initially, the services have the external IP address assigned and are accessible. After modifying the externalTrafficPolicy value for both services from Cluster to Local, the first service that has been changed remains with no external IP address assigned. However, the second service, which was changed later, has the external IP assigned as expected.

To work around the issue, make a dummy change to the service object where external IP is <pending>:

  1. Identify the service that is stuck:

    kubectl get svc -A | grep pending
    

    Example of system response:

    stacklight  iam-proxy-prometheus  LoadBalancer  10.233.28.196  <pending>  443:30430/TCP
    
  2. Add an arbitrary label to the service that is stuck. For example:

    kubectl label svc -n stacklight iam-proxy-prometheus reconcile=1
    

    Example of system response:

    service/iam-proxy-prometheus labeled
    
  3. Verify that the external IP was allocated to the service:

    kubectl get svc -n stacklight iam-proxy-prometheus
    

    Example of system response:

    NAME                  TYPE          CLUSTER-IP     EXTERNAL-IP  PORT(S)        AGE
    iam-proxy-prometheus  LoadBalancer  10.233.28.196  10.0.34.108  443:30430/TCP  12d
    
[41305] DHCP responses are lost between dnsmasq and dhcp-relay pods

Fixed in 2.28.0 (17.3.0 and 16.3.0)

After node maintenance of a management cluster, the newly added nodes may fail to undergo provisioning successfully. The issue relates to new nodes that are in the same L2 domain as the management cluster.

The issue was observed on environments having management cluster nodes configured with a single L2 segment used for all network traffic (PXE and LCM/management networks).

To verify whether the cluster is affected:

Verify whether the dnsmasq and dhcp-relay pods run on the same node in the management cluster:

kubectl -n kaas get pods -o wide | grep -e "dhcp\|dnsmasq"

Example of system response:

dhcp-relay-7d85f75f76-5vdw2   2/2   Running   2 (36h ago)   36h   10.10.0.122     kaas-node-8a24b81c-76d0-4d4c-8421-962bd39df5ad   <none>   <none>
dnsmasq-8f4b484b4-slhbd       5/5   Running   1 (36h ago)   36h   10.233.123.75   kaas-node-8a24b81c-76d0-4d4c-8421-962bd39df5ad   <none>   <none>

If this is the case, proceed to the workaround below.

Workaround:

  1. Log in to a node that contains kubeconfig of the affected management cluster.

  2. Make sure that at least two management cluster nodes are schedulable:

    kubectl get node
    

    Example of a positive system response:

    NAME                                             STATUS   ROLES    AGE   VERSION
    kaas-node-bcedb87b-b3ce-46a4-a4ca-ea3068689e40   Ready    master   37h   v1.27.10-mirantis-1
    kaas-node-8a24b81c-76d0-4d4c-8421-962bd39df5ad   Ready    master   37h   v1.27.10-mirantis-1
    kaas-node-ad5a6f51-b98f-43c3-91d5-55fed3d0ff21   Ready    master   37h   v1.27.10-mirantis-1
    
  3. Delete the dhcp-relay pod:

    kubectl -n kaas delete pod <dhcp-relay-xxxxx>
    
  4. Verify that the dnsmasq and dhcp-relay pods are scheduled into different nodes:

    kubectl -n kaas get pods -o wide | grep -e "dhcp\|dnsmasq"
    

    Example of a positive system response:

    dhcp-relay-7d85f75f76-rkv03   2/2   Running   0             49s   10.10.0.121     kaas-node-bcedb87b-b3ce-46a4-a4ca-ea3068689e40   <none>   <none>
    dnsmasq-8f4b484b4-slhbd       5/5   Running   1 (37h ago)   37h   10.233.123.75   kaas-node-8a24b81c-76d0-4d4c-8421-962bd39df5ad   <none>   <none>
    
[24005] Deletion of a node with ironic Pod is stuck in the Terminating state

During deletion of a manager machine running the ironic Pod from a bare metal management cluster, the following problems occur:

  • All Pods are stuck in the Terminating state

  • A new ironic Pod fails to start

  • The related bare metal host is stuck in the deprovisioning state

As a workaround, before deletion of the node running the ironic Pod, cordon and drain the node using the kubectl cordon <nodeName> and kubectl drain <nodeName> commands.
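
For example, a typical command sequence looks as follows; the drain options shown are common kubectl flags rather than product-specific requirements:

kubectl cordon <nodeName>
kubectl drain <nodeName> --ignore-daemonsets --delete-emptydir-data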


LCM
[41540] LCM Agent cannot grab storage information on a host

Fixed in 17.1.5 and 16.1.5

Due to issues with managing physical NVME devices, lcm-agent cannot grab storage information on a host. As a result, lcmmachine.status.hostinfo.hardware is empty and the following example error is present in logs:

{"level":"error","ts":"2024-05-02T12:26:10Z","logger":"agent", \
"msg":"get hardware details", \
"host":"kaas-node-548b2861-aed0-41c9-8ff2-10c5476b000b", \
"error":"new storage info: get disk info \"nvme0c0n1\": \
invoke command: exit status 1","errorVerbose":"exit status 1

As a workaround, on the affected node, create a symlink for any device indicated in lcm-agent logs. For example:

ln -sfn /dev/nvme0n1 /dev/nvme0c0n1
[40811] Pod is stuck in the Terminating state on the deleted node

Fixed in 17.1.3 and 16.1.3

During deletion of a machine, the related DaemonSet Pod can remain on the deleted node in the Terminating state. As a workaround, manually delete the Pod:

kubectl delete pod -n <podNamespace> <podName>
[39437] Failure to replace a master node on a Container Cloud cluster

Fixed in 2.29.0 (17.4.0 and 16.4.0)

During the replacement of a master node on a cluster of any type, the process may get stuck with Kubelet's NodeReady condition is Unknown in the machine status on the remaining master nodes.

As a workaround, log in on the affected node and run the following command:

docker restart ucp-kubelet
[31186,34132] Pods get stuck during MariaDB operations

During MariaDB operations on a management cluster, Pods may get stuck in continuous restarts with the following example error:

[ERROR] WSREP: Corrupt buffer header: \
addr: 0x7faec6f8e518, \
seqno: 3185219421952815104, \
size: 909455917, \
ctx: 0x557094f65038, \
flags: 11577. store: 49, \
type: 49

Workaround:

  1. Create a backup of the /var/lib/mysql directory on the mariadb-server Pod.

  2. Verify that other replicas are up and ready.

  3. Remove the galera.cache file for the affected mariadb-server Pod.

  4. Remove the affected mariadb-server Pod or wait until it is automatically restarted.

After Kubernetes restarts the Pod, the Pod clones the database in 1-2 minutes and restores the quorum.
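
The steps above can be performed with commands similar to the following hedged sketch, where <namespace> and <mariadbServerPod> are placeholders for the namespace and name of the affected mariadb-server Pod:

# 1. Back up /var/lib/mysql on the affected Pod and copy the archive off the Pod
kubectl -n <namespace> exec <mariadbServerPod> -- tar -czf /tmp/mysql-backup.tar.gz /var/lib/mysql
kubectl -n <namespace> cp <mariadbServerPod>:/tmp/mysql-backup.tar.gz ./mysql-backup.tar.gz
# 2. Verify that the other replicas are up and ready
kubectl -n <namespace> get pods | grep mariadb
# 3. Remove the galera.cache file from the affected replica
kubectl -n <namespace> exec <mariadbServerPod> -- rm /var/lib/mysql/galera.cache
# 4. Remove the affected Pod so that Kubernetes recreates it
kubectl -n <namespace> delete pod <mariadbServerPod>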

[30294] Replacement of a master node is stuck on the calico-node Pod start

Fixed in 2.28.4 (17.3.4 and 16.3.4)

During replacement of a master node on a cluster of any type, the calico-node Pod fails to start on a new node that has the same IP address as the node being replaced.

Workaround:

  1. Log in to any master node.

  2. From a CLI with an MKE client bundle, create a shell alias to start calicoctl using the mirantis/ucp-dsinfo image. Use one of the following two options depending on the location of the etcd certificates on your cluster:

    alias calicoctl="\
    docker run -i --rm \
    --pid host \
    --net host \
    -e constraint:ostype==linux \
    -e ETCD_ENDPOINTS=<etcdEndpoint> \
    -e ETCD_KEY_FILE=/var/lib/docker/volumes/ucp-kv-certs/_data/key.pem \
    -e ETCD_CA_CERT_FILE=/var/lib/docker/volumes/ucp-kv-certs/_data/ca.pem \
    -e ETCD_CERT_FILE=/var/lib/docker/volumes/ucp-kv-certs/_data/cert.pem \
    -v /var/run/calico:/var/run/calico \
    -v /var/lib/docker/volumes/ucp-kv-certs/_data:/var/lib/docker/volumes/ucp-kv-certs/_data:ro \
    mirantis/ucp-dsinfo:<mkeVersion> \
    calicoctl \
    "
    
    alias calicoctl="\
    docker run -i --rm \
    --pid host \
    --net host \
    -e constraint:ostype==linux \
    -e ETCD_ENDPOINTS=<etcdEndpoint> \
    -e ETCD_KEY_FILE=/ucp-node-certs/key.pem \
    -e ETCD_CA_CERT_FILE=/ucp-node-certs/ca.pem \
    -e ETCD_CERT_FILE=/ucp-node-certs/cert.pem \
    -v /var/run/calico:/var/run/calico \
    -v ucp-node-certs:/ucp-node-certs:ro \
    mirantis/ucp-dsinfo:<mkeVersion> \
    calicoctl --allow-version-mismatch \
    "
    

    In the above command, replace the following values with the corresponding settings of the affected cluster:

    • <etcdEndpoint> is the etcd endpoint defined in the Calico configuration file. For example, ETCD_ENDPOINTS=127.0.0.1:12378

    • <mkeVersion> is the MKE version installed on your cluster. For example, mirantis/ucp-dsinfo:3.5.7.

  3. Verify the node list on the cluster:

    kubectl get node
    
  4. Compare this list with the node list in Calico to identify the old node:

    calicoctl get node -o wide
    
  5. Remove the old node from Calico:

    calicoctl delete node kaas-node-<nodeID>
    
[5782] Manager machine fails to be deployed during node replacement

Fixed in 2.28.4 (17.3.4 and 16.3.4)

During replacement of a manager machine, the following problems may occur:

  • The system adds the node to Docker swarm but not to Kubernetes

  • The node Deployment gets stuck with failed RethinkDB health checks

Workaround:

  1. Delete the failed node.

  2. Wait for the MKE cluster to become healthy. To monitor the cluster status:

    1. Log in to the MKE web UI as described in Connect to the Mirantis Kubernetes Engine web UI.

    2. Monitor the cluster status as described in MKE Operations Guide: Monitor an MKE cluster with the MKE web UI.

  3. Deploy a new node.

[5568] The calico-kube-controllers Pod fails to clean up resources

Fixed in 2.28.4 (17.3.4 and 16.3.4)

During the unsafe or forced deletion of a manager machine running the calico-kube-controllers Pod in the kube-system namespace, the following issues occur:

  • The calico-kube-controllers Pod fails to clean up resources associated with the deleted node

  • The calico-node Pod may fail to start up on a newly created node if the machine is provisioned with the same IP address as the deleted machine had

As a workaround, before deletion of the node running the calico-kube-controllers Pod, cordon and drain the node:

kubectl cordon <nodeName>
kubectl drain <nodeName>

Ceph
[41819] Graceful cluster reboot is blocked by the Ceph ClusterWorkloadLocks

Fixed in 2.27.0 (17.2.0 and 16.2.0)

During graceful reboot of a cluster with Ceph enabled, the reboot is blocked with the following message in the MiraCephMaintenance object status:

message: ClusterMaintenanceRequest found, Ceph Cluster is not ready to upgrade,
 delaying cluster maintenance

As a workaround, add the following snippet to the cephFS section under metadataServer in the spec section of <kcc-name>.yaml in the Ceph cluster:

cephClusterSpec:
  sharedFilesystem:
    cephFS:
    - name: cephfs-store
      metadataServer:
        activeCount: 1
        healthCheck:
          livenessProbe:
            probe:
              failureThreshold: 5
              initialDelaySeconds: 30
              periodSeconds: 30
              successThreshold: 1
              timeoutSeconds: 5
[26441] Cluster update fails with the MountDevice failed for volume warning

Update of a bare metal based managed cluster with Ceph enabled fails, with a PersistentVolumeClaim getting stuck in the Pending state for the prometheus-server StatefulSet and the MountVolume.MountDevice failed for volume warning in the StackLight event logs.

Workaround:

  1. Verify that the description of the Pods that failed to run contain the FailedMount events:

    kubectl -n <affectedProjectName> describe pod <affectedPodName>
    

    In the command above, replace the following values:

    • <affectedProjectName> is the Container Cloud project name where the Pods failed to run

    • <affectedPodName> is a Pod name that failed to run in the specified project

    In the Pod description, identify the node name where the Pod failed to run.

  2. Verify that the csi-rbdplugin logs of the affected node contain the rbd volume mount failed: <csi-vol-uuid> is being used error. The <csi-vol-uuid> is a unique RBD volume name.

    1. Identify csiPodName of the corresponding csi-rbdplugin:

      kubectl -n rook-ceph get pod -l app=csi-rbdplugin \
      -o jsonpath='{.items[?(@.spec.nodeName == "<nodeName>")].metadata.name}'
      
    2. Output the affected csiPodName logs:

      kubectl -n rook-ceph logs <csiPodName> -c csi-rbdplugin
      
  3. Scale down the affected StatefulSet or Deployment of the Pod that fails to 0 replicas.

  4. On every csi-rbdplugin Pod, search for stuck csi-vol:

    for pod in `kubectl -n rook-ceph get pods|grep rbdplugin|grep -v provisioner|awk '{print $1}'`; do
      echo $pod
      kubectl exec -it -n rook-ceph $pod -c csi-rbdplugin -- rbd device list | grep <csi-vol-uuid>
    done
    
  5. Unmap the affected csi-vol:

    rbd unmap -o force /dev/rbd<i>
    

    The /dev/rbd<i> value is a mapped RBD volume that uses csi-vol.

  6. Delete volumeattachment of the affected Pod:

    kubectl get volumeattachments | grep <csi-vol-uuid>
    kubectl delete volumeattachment <id>
    
  7. Scale up the affected StatefulSet or Deployment back to the original number of replicas and wait until its state becomes Running.


StackLight
[42304] Failure of shard relocation in the OpenSearch cluster

Fixed in 17.2.0, 16.2.0, 17.1.6, 16.1.6

On large managed clusters, shard relocation may fail in the OpenSearch cluster, with the OpenSearch cluster status turning yellow or red. The characteristic symptom of the issue is that in the stacklight namespace, the statefulset.apps/opensearch-master containers are experiencing throttling with the KubeContainersCPUThrottlingHigh alert firing for the following set of labels:

{created_by_kind="StatefulSet",created_by_name="opensearch-master",namespace="stacklight"}

Caution

The throttling that OpenSearch is experiencing may be temporary and related, for example, to a peak in load or to the ongoing initialization of shards as part of disaster recovery or after a node restart. In this case, Mirantis recommends waiting until the initialization of all shards is finished. After that, verify the cluster state and whether throttling still exists. Apply the workaround below only if the throttling does not disappear.

To verify that the initialization of shards is ongoing:

kubectl exec -it pod/opensearch-master-0 -n stacklight -c opensearch -- bash

curl "http://localhost:9200/_cat/shards" | grep INITIALIZING

Example of system response:

.ds-system-000072    2 r INITIALIZING    10.232.182.135 opensearch-master-1
.ds-system-000073    1 r INITIALIZING    10.232.7.145   opensearch-master-2
.ds-system-000073    2 r INITIALIZING    10.232.182.135 opensearch-master-1
.ds-audit-000001     2 r INITIALIZING    10.232.7.145   opensearch-master-2

The system response above indicates that shards from the .ds-system-000072, .ds-system-000073, and .ds-audit-000001 indices are in the INITIALIZING state. In this case, Mirantis recommends waiting until this process is finished and only then considering a change of the limit.

You can additionally analyze the exact level of throttling and the current CPU usage on the Kubernetes Containers dashboard in Grafana.

Workaround:

  1. Verify the currently configured CPU requests and limits for the opensearch containers:

    kubectl -n stacklight get statefulset.apps/opensearch-master -o jsonpath="{.spec.template.spec.containers[?(@.name=='opensearch')].resources}"
    

    Example of system response:

    {"limits":{"cpu":"600m","memory":"8Gi"},"requests":{"cpu":"500m","memory":"6Gi"}}
    

    In the example above, the CPU request is 500m and the CPU limit is 600m.

  2. Increase the CPU limit to a reasonably high number.

    For example, the default CPU limit for the clusters with the clusterSize:large parameter set was increased from 8000m to 12000m for StackLight in Container Cloud 2.27.0 (Cluster releases 17.2.0 and 16.2.0).

    Note

    For details on the clusterSize parameter, see MOSK Operations Guide: StackLight configuration parameters - Cluster size.

    If the defaults are already overridden on the affected cluster using the resourcesPerClusterSize or resources parameters as described in MOSK Operations Guide: StackLight configuration parameters - Resource limits, then the exact recommended number depends on the currently set limit.

    Mirantis recommends increasing the limit by 50%. If it does not resolve the issue, another increase iteration will be required.

  3. When you select the required CPU limit, increase it as described in MOSK Operations Guide: StackLight configuration parameters - Resource limits.

    If the CPU limit for the opensearch component is already set, increase it in the Cluster object for the opensearch parameter. Otherwise, the default StackLight limit is used; in this case, set the CPU limit for the opensearch component using the resources parameter. A hedged example of such an override is provided after this procedure.

  4. Wait until all opensearch-master pods are recreated with the new CPU limits and become running and ready.

    To verify the current CPU limit for every opensearch container in every opensearch-master pod separately:

    kubectl -n stacklight get pod/opensearch-master-<podSuffixNumber> -o jsonpath="{.spec.containers[?(@.name=='opensearch')].resources}"
    

    In the command above, replace <podSuffixNumber> with the name of the pod suffix. For example, pod/opensearch-master-0 or pod/opensearch-master-2.

    Example of system response:

    {"limits":{"cpu":"900m","memory":"8Gi"},"requests":{"cpu":"500m","memory":"6Gi"}}
    

    The waiting time may take up to 20 minutes depending on the cluster size.

If the issue is fixed, the KubeContainersCPUThrottlingHigh alert stops firing immediately, while OpenSearchClusterStatusWarning or OpenSearchClusterStatusCritical can still be firing for some time during shard relocation.

If the KubeContainersCPUThrottlingHigh alert is still firing, proceed with another iteration of the CPU limit increase.
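
As referenced in step 3 of the workaround above, the override of the OpenSearch CPU limit is set through the StackLight values in the Cluster object. The following is a hedged sketch only; the exact nesting of the StackLight values and of the resources parameter may differ, so follow the referenced MOSK Operations Guide for the authoritative layout:

spec:
  providerSpec:
    value:
      helmReleases:
      - name: stacklight
        values:
          resources:
            opensearch:
              limits:
                cpu: "900m"   # increase the current limit by roughly 50%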

[40020] Rollover policy update is not applied to the current index

Fixed in 17.2.0, 16.2.0, 17.1.6, 16.1.6

After you update rollover_policy for the current system* and audit* data streams, the update is not applied to indices.

One of the indicators that the cluster is most likely affected is the KubeJobFailed alert firing for the elasticsearch-curator job together with one or both of the following errors being present in the elasticsearch-curator pods that remain in the Error status:

2024-05-31 13:16:04,459 ERROR   Failed to complete action: delete_indices.  <class 'curator.exceptions.FailedExecution'>: Exception encountered.  Rerun with loglevel DEBUG and/or check Elasticsearch logs for more information. Exception: RequestError(400, 'illegal_argument_exception', 'index [.ds-audit-000001] is the write index for data stream [audit] and cannot be deleted')

or

2024-05-31 13:16:04,459 ERROR   Failed to complete action: delete_indices.  <class 'curator.exceptions.FailedExecution'>: Exception encountered.  Rerun with loglevel DEBUG and/or check Elasticsearch logs for more information. Exception: RequestError(400, 'illegal_argument_exception', 'index [.ds-system-000001] is the write index for data stream [system] and cannot be deleted')

Note

Instead of .ds-audit-000001 or .ds-system-000001 index names, similar names can be present with the same prefix but different suffix numbers.

If the above-mentioned alert and errors are present, immediate action is required because they indicate that the corresponding index size has already exceeded the space allocated for the index.
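
To check whether the elasticsearch-curator job is failing and to view its errors, commands similar to the following can be used (the stacklight namespace is assumed):

kubectl -n stacklight get pods | grep elasticsearch-curator
kubectl -n stacklight logs <elasticsearchCuratorPodName>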

To verify that the cluster is affected:

Caution

Verify and apply the workaround to both index patterns, system and audit, separately.

If one of the indices is affected, the second one is most likely affected as well, although in rare cases only one index may be affected.

  1. Log in to the opensearch-master-0 Pod:

    kubectl exec -it pod/opensearch-master-0 -n stacklight -c opensearch -- bash
    
  2. Verify that the rollover policy is present:

    • system:

      curl localhost:9200/_plugins/_ism/policies/system_rollover_policy
      
    • audit:

      curl localhost:9200/_plugins/_ism/policies/audit_rollover_policy
      

    The cluster is affected if the rollover policy is missing. Otherwise, proceed to the following step.

  3. Verify the system response from the previous step. For example:

    {"_id":"system_rollover_policy","_version":7229,"_seq_no":42362,"_primary_term":28,"policy":{"policy_id":"system_rollover_policy","description":"system index rollover policy.","last_updated_time":1708505222430,"schema_version":19,"error_notification":null,"default_state":"rollover","states":[{"name":"rollover","actions":[{"retry":{"count":3,"backoff":"exponential","delay":"1m"},"rollover":{"min_size":"14746mb","copy_alias":false}}],"transitions":[]}],"ism_template":[{"index_patterns":["system*"],"priority":200,"last_updated_time":1708505222430}]}}
    

    Verify and capture the following items separately for every policy:

    • The _seq_no and _primary_term values

    • The rollover policy threshold, which is defined in policy.states[0].actions[0].rollover.min_size

  4. List indices:

    • system:

      curl localhost:9200/_cat/indices | grep system
      

      Example of system response:

      [...]
      green open .ds-system-000001   FjglnZlcTKKfKNbosaE9Aw 2 1 1998295  0   1gb 507.9mb
      
    • audit:

      curl localhost:9200/_cat/indices | grep audit
      

      Example of system response:

      [...]
      green open .ds-audit-000001   FjglnZlcTKKfKNbosaE9Aw 2 1 1998295  0   1gb 507.9mb
      
  5. Select the index with the highest number and verify the rollover policy attached to the index:

    • system:

      curl localhost:9200/_plugins/_ism/explain/.ds-system-000001
      
    • audit:

      curl localhost:9200/_plugins/_ism/explain/.ds-audit-000001
      
    • If the rollover policy is not attached, the cluster is affected.

    • If the rollover policy is attached but _seq_no and _primary_term numbers do not match the previously captured ones, the cluster is affected.

    • If the index size drastically exceeds the defined threshold of the rollover policy (which is the previously captured min_size), the cluster is most probably affected.

Workaround:

  1. Log in to the opensearch-master-0 Pod:

    kubectl exec -it pod/opensearch-master-0 -n stacklight -c opensearch -- bash
    
  2. If the policy is attached to the index but has different _seq_no and _primary_term, remove the policy from the index:

    Note

    Use the index with the highest number in the name, which was captured during the verification procedure.

    • system:

      curl -XPOST localhost:9200/_plugins/_ism/remove/.ds-system-000001
      
    • audit:

      curl -XPOST localhost:9200/_plugins/_ism/remove/.ds-audit-000001
      
  3. Re-add the policy:

    • system:

      curl -XPOST -H "Content-type: application/json" localhost:9200/_plugins/_ism/add/system* -d'{"policy_id":"system_rollover_policy"}'
      
    • audit:

      curl -XPOST -H "Content-type: application/json" localhost:9200/_plugins/_ism/add/audit* -d'{"policy_id":"audit_rollover_policy"}'
      
  4. Perform again the last step of the cluster verification procedure provided above and make sure that the policy is attached to the index and has the same _seq_no and _primary_term.

    If the index size drastically exceeds the defined threshold of the rollover policy (which is the previously captured min_size), wait up to 15 minutes and verify that the additional index is created with the consecutive number in the index name. For example:

    • system: if you applied changes to .ds-system-000001, wait until .ds-system-000002 is created.

    • audit: if you applied changes to .ds-audit-000001, wait until .ds-audit-000002 is created.

    If such an index is not created, escalate the issue to Mirantis support.


Container Cloud web UI
[41806] Configuration of a management cluster fails without Keycloak settings

Fixed in 17.1.4 and 16.1.4

During configuration of management cluster settings using the Configure cluster web UI menu, the web UI incorrectly requires updating the Keycloak Truststore settings, although they should be optional.

As a workaround, update the management cluster using the API or CLI.

See also

Patch releases

2.26.0

The Mirantis Container Cloud major release 2.26.0:

  • Introduces support for the Cluster release 17.1.0 that is based on the Cluster release 16.1.0 and represents Mirantis OpenStack for Kubernetes (MOSK) 24.1.

  • Introduces support for the Cluster release 16.1.0 that is based on Mirantis Container Runtime (MCR) 23.0.9 and Mirantis Kubernetes Engine (MKE) 3.7.5 with Kubernetes 1.27.

  • Does not support greenfield deployments on deprecated Cluster releases of the 17.0.x and 16.0.x series. Use the latest available Cluster releases of the series instead.

    Caution

    Make sure to update the Cluster release version of your managed cluster before the current Cluster release version becomes unsupported by a new Container Cloud release version. Otherwise, Container Cloud stops auto-upgrade and eventually Container Cloud itself becomes unsupported.

This section outlines release notes for the Container Cloud release 2.26.0.

Enhancements

This section outlines new features and enhancements introduced in the Container Cloud release 2.26.0. For the list of enhancements delivered with the Cluster releases introduced by Container Cloud 2.26.0, see 17.1.0 and 16.1.0.

Pre-update inspection of pinned product artifacts in a ‘Cluster’ object

To ensure that Container Cloud clusters remain consistently updated with the latest security fixes and product improvements, the Admission Controller has been enhanced. It now prevents the use of pinned custom artifacts for Container Cloud components. Specifically, it blocks a management or managed cluster release update, or any cluster configuration update (for example, adding public keys or a proxy), if a Cluster object contains any custom Container Cloud artifacts with global or image-related values overwritten in the helm-releases section, until these values are removed. An illustrative sketch of such an override is provided after the notes below.

Normally, the Container Cloud clusters do not contain pinned artifacts, which eliminates the need for any pre-update actions in most deployments. However, if the update of your cluster is blocked with the invalid HelmReleases configuration error, refer to Update notes: Pre-update actions for details.

Note

In rare cases, if the image-related or global values should be changed, you can use the ClusterRelease or KaaSRelease objects instead. But make sure to update these values manually after every major and patch update.

Note

The pre-update inspection applies only to images delivered by Container Cloud that are overwritten. Any custom images unrelated to the product components are not verified and do not block cluster update.
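
For illustration only, a pinned custom artifact that triggers this check is an image-related value overwritten under the helm-releases section of a Cluster object. The snippet below is a hedged sketch with placeholder names, not an exact schema reference:

spec:
  providerSpec:
    value:
      helmReleases:
      - name: <containerCloudChartName>   # placeholder chart name
        values:
          image:
            tag: <pinnedCustomTag>         # custom pinned value that must be removed before the update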

Disablement of worker machines on managed clusters

TechPreview

Implemented the machine disabling API that allows you to seamlessly remove a worker machine from the LCM control of a managed cluster. This action isolates the affected node without impacting other machines in the cluster, effectively eliminating it from the Kubernetes cluster. This functionality proves invaluable in scenarios where a malfunctioning machine impedes cluster updates.

Day-2 management API for bare metal clusters

TechPreview

Added initial Technology Preview support for the HostOSConfiguration and HostOSConfigurationModules custom resources in the bare metal provider. These resources introduce configuration modules that allow managing the operating system of a bare metal host granularly without rebuilding the node from scratch. This approach avoids the need for workload evacuation and significantly reduces configuration time.

Configuration modules manage various settings of the operating system using Ansible playbooks, adhering to specific schemas and metadata requirements. For description of module format, schemas, and rules, contact Mirantis support.

Warning

For security reasons and to ensure safe and reliable cluster operability, contact Mirantis support to start using these custom resources.

Caution

While the feature is still in the development stage, Mirantis highly recommends deleting all HostOSConfiguration objects, if any, before the automatic upgrade of the management cluster to Container Cloud 2.27.0 (Cluster release 16.2.0). After the upgrade, you can recreate the required objects using the updated parameters.

This precautionary step prevents re-processing and re-applying of existing configuration, which is defined in HostOSConfiguration objects, during management cluster upgrade to 2.27.0. Such behavior is caused by changes in the HostOSConfiguration API introduced in 2.27.0.

Strict filtering for devices on bare metal clusters

Implemented the strict byID filtering for targeting system disks using specific device options: byPath, serialNumber, and wwn. These options offer a more reliable alternative to the unpredictable byName naming format.

Mirantis recommends adopting these new device naming options when adding new nodes and redeploying existing ones to ensure a predictable and stable device naming schema.
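
For illustration, a device entry that uses the new options instead of byName may look similar to the following hedged sketch in a bare metal host profile; only the byPath, serialNumber, and wwn option names come from this release, while the surrounding layout is an assumption:

devices:
- device:
    byPath: /dev/disk/by-path/pci-0000:00:1f.2-ata-1   # stable by-path identifier
    # Alternatively, target the disk by its hardware identifiers:
    # serialNumber: <diskSerialNumber>
    # wwn: <diskWWN>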

Dynamic IP allocation for faster host provisioning

Introduced a mechanism in the Container Cloud dnsmasq server to dynamically allocate IP addresses for bare metal hosts during provisioning. This new mechanism replaces sequential IP allocation that includes the ping check with dynamic IP allocation without the ping check. Such behavior significantly increases the number of bare metal servers that you can provision in parallel, which allows you to streamline the process of setting up a large managed cluster.

Support for Kubernetes auditing and profiling on management clusters

Added support for enabling and configuring Kubernetes auditing and profiling on management clusters. The auditing option is enabled by default. You can configure both options using the Cluster object of the management cluster.

Note

For managed clusters, you can also configure Kubernetes auditing along with profiling using the Cluster object of a managed cluster.

Cleanup of LVM thin pool volumes during cluster provisioning

Implemented automatic cleanup of LVM thin pool volumes during the provisioning stage to prevent issues with logical volume detection before removal, which could cause node cleanup failure during cluster redeployment.

Wiping a device or partition before a bare metal cluster deployment

Implemented the capability to erase existing data from hardware devices to be used for a bare metal management or managed cluster deployment. Using the new wipeDevice structure, you can either erase an existing partition or remove all existing partitions from a physical device. For these purposes, use the eraseMetadata or eraseDevice option that configures cleanup behavior during configuration of a custom bare metal host profile.

Note

The wipeDevice option replaces the deprecated wipe option that will be removed in one of the following releases. For backward compatibility, any existing wipe: true option is automatically converted to the following structure:

wipeDevice:
  eraseMetadata:
    enabled: True
Policy Controller for validating pod image signatures

Technology Preview

Introduced initial Technology Preview support for the Policy Controller that validates signatures of pod images. The Policy Controller verifies that images used by the Container Cloud and Mirantis OpenStack for Kubernetes controllers are signed by a trusted authority. The Policy Controller inspects defined image policies that list Docker registries and authorities for signature validation.

Configuring trusted certificates for Keycloak

Added support for configuring Keycloak truststore using the Container Cloud web UI to allow for a proper validation of client self-signed certificates. The truststore is used to ensure secured connection to identity brokers, LDAP identity providers, and others.

Health monitoring of cluster LCM operations

Added the LCM Operation condition to monitor the health of all LCM operations on a cluster and its machines, which is useful during cluster update. You can monitor the status of LCM operations using the Container Cloud web UI in the status hover menus of a cluster and machine.

Container Cloud web UI improvements for bare metal

Reorganized the Container Cloud web UI to optimize the baremetal-based managed cluster deployment and management:

  • Moved the L2 Templates and Subnets tabs from the Clusters menu to the separate Networks tab on the left sidebar.

  • Improved the Create Subnet menu by adding configuration for different subnet types.

  • Reorganized the Baremetal tab in the left sidebar that now contains Hosts, Hosts Profiles, and Credentials tabs.

  • Implemented the ability to add bare metal host profiles using the web UI.

  • Moved description of a baremetal host to Host info located in a baremetal host kebab menu on the Hosts page of the Baremetal tab.

  • Moved description of baremetal host credentials to Credential info located in a credential kebab menu on the Credentials page of the Baremetal tab.

Documentation enhancements

On top of continuous improvements delivered to the existing Container Cloud guides, added the documentation on how to export logs from OpenSearch dashboards to CSV.

Addressed issues

The following issues have been addressed in the Mirantis Container Cloud release 2.26.0 along with the Cluster releases 17.1.0 and 16.1.0.

Note

This section provides descriptions of issues addressed since the last Container Cloud patch release 2.25.4.

For details on addressed issues in earlier patch releases since 2.25.0, which are also included into the major release 2.26.0, refer to 2.25.x patch releases.

  • [32761] [LCM] Fixed the issue with node cleanup failing on MOSK clusters due to the Ansible provisioner hanging in a loop while trying to remove LVM thin pool logical volumes, which occurred due to issues with volume detection before removal during cluster redeployment. The issue resolution comprises implementation of automatic cleanup of LVM thin pool volumes during the provisioning stage.

  • [36924] [LCM] Fixed the issue with Ansible starting to run on nodes of a managed cluster after the mcc-cache certificate is applied on a management cluster.

  • [37268] [LCM] Fixed the issue with Container Cloud cluster being blocked by a node stuck in the Prepare or Deploy state with error processing package openssh-server. The issue was caused by customizations in /etc/ssh/sshd_config, such as additional Match statements.

  • [34820] [Ceph] Fixed the issue with the Ceph rook-operator failing to connect to Ceph RADOS Gateway pods on clusters with the Federal Information Processing Standard mode enabled.

  • [38340] [StackLight] Fixed the issue with Telegraf Docker Swarm timing out while collecting data by increasing its timeout from 10 to 25 seconds.

Known issues

This section lists known issues with workarounds for the Mirantis Container Cloud release 2.26.0 including the Cluster releases 17.1.0 and 16.1.0.

For other issues that can occur while deploying and operating a Container Cloud cluster, see Deployment Guide: Troubleshooting and Operations Guide: Troubleshooting.

Note

This section also outlines still valid known issues from previous Container Cloud releases.

Bare metal
[46245] Lack of access permissions for HOC and HOCM objects

Fixed in 2.28.0 (17.3.0 and 16.3.0)

When trying to list the HostOSConfigurationModules and HostOSConfiguration custom resources, serviceuser or a user with the global-admin or operator role obtains the access denied error. For example:

kubectl --kubeconfig ~/.kube/mgmt-config get hocm

Error from server (Forbidden): hostosconfigurationmodules.kaas.mirantis.com is forbidden:
User "2d74348b-5669-4c65-af31-6c05dbedac5f" cannot list resource "hostosconfigurationmodules"
in API group "kaas.mirantis.com" at the cluster scope: access denied

Workaround:

  1. Modify the global-admin role by adding a new entry with the following contents to the rules list:

    kubectl edit clusterroles kaas-global-admin
    
    - apiGroups: [kaas.mirantis.com]
      resources: [hostosconfigurationmodules]
      verbs: ['*']
    
  2. For each Container Cloud project, modify the kaas-operator role by adding a new entry with the following contents to the rules list:

    kubectl -n <projectName> edit roles kaas-operator
    
    - apiGroups: [kaas.mirantis.com]
      resources: [hostosconfigurations]
      verbs: ['*']
    
[42386] A load balancer service does not obtain the external IP address

Due to the MetalLB upstream issue, a load balancer service may not obtain the external IP address.

The issue occurs when two services share the same external IP address and have the same externalTrafficPolicy value. Initially, the services have the external IP address assigned and are accessible. After modifying the externalTrafficPolicy value for both services from Cluster to Local, the first service that has been changed remains with no external IP address assigned. However, the second service, which was changed later, has the external IP assigned as expected.

To work around the issue, make a dummy change to the service object where external IP is <pending>:

  1. Identify the service that is stuck:

    kubectl get svc -A | grep pending
    

    Example of system response:

    stacklight  iam-proxy-prometheus  LoadBalancer  10.233.28.196  <pending>  443:30430/TCP
    
  2. Add an arbitrary label to the service that is stuck. For example:

    kubectl label svc -n stacklight iam-proxy-prometheus reconcile=1
    

    Example of system response:

    service/iam-proxy-prometheus labeled
    
  3. Verify that the external IP was allocated to the service:

    kubectl get svc -n stacklight iam-proxy-prometheus
    

    Example of system response:

    NAME                  TYPE          CLUSTER-IP     EXTERNAL-IP  PORT(S)        AGE
    iam-proxy-prometheus  LoadBalancer  10.233.28.196  10.0.34.108  443:30430/TCP  12d
    
[41305] DHCP responses are lost between dnsmasq and dhcp-relay pods

Fixed in 2.28.0 (17.3.0 and 16.3.0)

After node maintenance of a management cluster, the newly added nodes may fail to undergo provisioning successfully. The issue relates to new nodes that are in the same L2 domain as the management cluster.

The issue was observed on environments having management cluster nodes configured with a single L2 segment used for all network traffic (PXE and LCM/management networks).

To verify whether the cluster is affected:

Verify whether the dnsmasq and dhcp-relay pods run on the same node in the management cluster:

kubectl -n kaas get pods -o wide | grep -e "dhcp\|dnsmasq"

Example of system response:

dhcp-relay-7d85f75f76-5vdw2   2/2   Running   2 (36h ago)   36h   10.10.0.122     kaas-node-8a24b81c-76d0-4d4c-8421-962bd39df5ad   <none>   <none>
dnsmasq-8f4b484b4-slhbd       5/5   Running   1 (36h ago)   36h   10.233.123.75   kaas-node-8a24b81c-76d0-4d4c-8421-962bd39df5ad   <none>   <none>

If this is the case, proceed to the workaround below.

Workaround:

  1. Log in to a node that contains kubeconfig of the affected management cluster.

  2. Make sure that at least two management cluster nodes are schedulable:

    kubectl get node
    

    Example of a positive system response:

    NAME                                             STATUS   ROLES    AGE   VERSION
    kaas-node-bcedb87b-b3ce-46a4-a4ca-ea3068689e40   Ready    master   37h   v1.27.10-mirantis-1
    kaas-node-8a24b81c-76d0-4d4c-8421-962bd39df5ad   Ready    master   37h   v1.27.10-mirantis-1
    kaas-node-ad5a6f51-b98f-43c3-91d5-55fed3d0ff21   Ready    master   37h   v1.27.10-mirantis-1
    
  3. Delete the dhcp-relay pod:

    kubectl -n kaas delete pod <dhcp-relay-xxxxx>
    
  4. Verify that the dnsmasq and dhcp-relay pods are scheduled into different nodes:

    kubectl -n kaas get pods -o wide | grep -e "dhcp\|dnsmasq"
    

    Example of a positive system response:

    dhcp-relay-7d85f75f76-rkv03   2/2   Running   0             49s   10.10.0.121     kaas-node-bcedb87b-b3ce-46a4-a4ca-ea3068689e40   <none>   <none>
    dnsmasq-8f4b484b4-slhbd       5/5   Running   1 (37h ago)   37h   10.233.123.75   kaas-node-8a24b81c-76d0-4d4c-8421-962bd39df5ad   <none>   <none>
    
[24005] Deletion of a node with ironic Pod is stuck in the Terminating state

During deletion of a manager machine running the ironic Pod from a bare metal management cluster, the following problems occur:

  • All Pods are stuck in the Terminating state

  • A new ironic Pod fails to start

  • The related bare metal host is stuck in the deprovisioning state

As a workaround, before deletion of the node running the ironic Pod, cordon and drain the node using the kubectl cordon <nodeName> and kubectl drain <nodeName> commands.


vSphere
[40747] Unsupported Cluster release is available for managed cluster deployment

Fixed in 2.26.1

The Cluster release 16.0.0, which is not supported for greenfield vSphere-based deployments, is still available in the drop-down menu of the cluster creation window in the Container Cloud web UI.

Do not select this Cluster release to prevent deployment failures. Use the latest supported version instead.


LCM
[41540] LCM Agent cannot grab storage information on a host

Fixed in 17.1.5 and 16.1.5

Due to issues with managing physical NVME devices, lcm-agent cannot grab storage information on a host. As a result, lcmmachine.status.hostinfo.hardware is empty and the following example error is present in logs:

{"level":"error","ts":"2024-05-02T12:26:10Z","logger":"agent", \
"msg":"get hardware details", \
"host":"kaas-node-548b2861-aed0-41c9-8ff2-10c5476b000b", \
"error":"new storage info: get disk info \"nvme0c0n1\": \
invoke command: exit status 1","errorVerbose":"exit status 1

As a workaround, on the affected node, create a symlink for any device indicated in lcm-agent logs. For example:

ln -sfn /dev/nvme0n1 /dev/nvme0c0n1
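After creating the symlink, you can check whether the storage information gets collected again; a minimal sketch, where <projectName> and <lcmMachineName> are hypothetical placeholders for the affected project and LCMMachine object:

kubectl -n <projectName> get lcmmachine <lcmMachineName> -o jsonpath='{.status.hostinfo.hardware}'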
[40036] Node is not removed from a cluster when its Machine is disabled

Fixed in 2.26.1 (17.1.1 and 16.1.1)

During the ClusterRelease update of a MOSK cluster, a node cannot be removed from the Kubernetes cluster if the related Machine object is disabled.

As a workaround, remove the finalizer from the affected Node object.
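For example, a minimal sketch that clears the finalizers of the affected Node object; inspect the current finalizers first and adjust the patch if some of them must be preserved:

kubectl get node <nodeName> -o jsonpath='{.metadata.finalizers}'
kubectl patch node <nodeName> --type=merge -p '{"metadata":{"finalizers":null}}'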

[39437] Failure to replace a master node on a Container Cloud cluster

Fixed in 2.29.0 (17.4.0 and 16.4.0)

During the replacement of a master node on a cluster of any type, the process may get stuck with the Kubelet's NodeReady condition is Unknown message in the machine status on the remaining master nodes.

As a workaround, log in on the affected node and run the following command:

docker restart ucp-kubelet
[31186,34132] Pods get stuck during MariaDB operations

During MariaDB operations on a management cluster, Pods may get stuck in continuous restarts with the following example error:

[ERROR] WSREP: Corrupt buffer header: \
addr: 0x7faec6f8e518, \
seqno: 3185219421952815104, \
size: 909455917, \
ctx: 0x557094f65038, \
flags: 11577. store: 49, \
type: 49

Workaround:

  1. Create a backup of the /var/lib/mysql directory on the mariadb-server Pod.

  2. Verify that other replicas are up and ready.

  3. Remove the galera.cache file for the affected mariadb-server Pod.

  4. Remove the affected mariadb-server Pod or wait until it is automatically restarted.

After Kubernetes restarts the Pod, the Pod clones the database in 1-2 minutes and restores the quorum.
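A minimal command sketch of the workaround above; <namespaceName> and <mariadbPodName> are placeholders for the namespace and the affected mariadb-server Pod, and you may need to add -c <containerName> if the Pod runs several containers:

# Back up the MariaDB data directory from the affected Pod.
kubectl -n <namespaceName> cp <mariadbPodName>:/var/lib/mysql ./mysql-backup

# Remove the galera.cache file of the affected replica.
kubectl -n <namespaceName> exec <mariadbPodName> -- rm /var/lib/mysql/galera.cache

# Delete the Pod so that Kubernetes recreates it and it re-clones the database.
kubectl -n <namespaceName> delete pod <mariadbPodName>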

[30294] Replacement of a master node is stuck on the calico-node Pod start

Fixed in 2.28.4 (17.3.4 and 16.3.4)

During replacement of a master node on a cluster of any type, the calico-node Pod fails to start on a new node that has the same IP address as the node being replaced.

Workaround:

  1. Log in to any master node.

  2. From a CLI with an MKE client bundle, create a shell alias to start calicoctl using the mirantis/ucp-dsinfo image. Use one of the following definitions, depending on whether the etcd certificates on the node are exposed through the host path or through the ucp-node-certs volume:

    alias calicoctl="\
    docker run -i --rm \
    --pid host \
    --net host \
    -e constraint:ostype==linux \
    -e ETCD_ENDPOINTS=<etcdEndpoint> \
    -e ETCD_KEY_FILE=/var/lib/docker/volumes/ucp-kv-certs/_data/key.pem \
    -e ETCD_CA_CERT_FILE=/var/lib/docker/volumes/ucp-kv-certs/_data/ca.pem \
    -e ETCD_CERT_FILE=/var/lib/docker/volumes/ucp-kv-certs/_data/cert.pem \
    -v /var/run/calico:/var/run/calico \
    -v /var/lib/docker/volumes/ucp-kv-certs/_data:/var/lib/docker/volumes/ucp-kv-certs/_data:ro \
    mirantis/ucp-dsinfo:<mkeVersion> \
    calicoctl \
    "
    
    alias calicoctl="\
    docker run -i --rm \
    --pid host \
    --net host \
    -e constraint:ostype==linux \
    -e ETCD_ENDPOINTS=<etcdEndpoint> \
    -e ETCD_KEY_FILE=/ucp-node-certs/key.pem \
    -e ETCD_CA_CERT_FILE=/ucp-node-certs/ca.pem \
    -e ETCD_CERT_FILE=/ucp-node-certs/cert.pem \
    -v /var/run/calico:/var/run/calico \
    -v ucp-node-certs:/ucp-node-certs:ro \
    mirantis/ucp-dsinfo:<mkeVersion> \
    calicoctl --allow-version-mismatch \
    "
    

    In the above command, replace the following values with the corresponding settings of the affected cluster:

    • <etcdEndpoint> is the etcd endpoint defined in the Calico configuration file. For example, ETCD_ENDPOINTS=127.0.0.1:12378

    • <mkeVersion> is the MKE version installed on your cluster. For example, mirantis/ucp-dsinfo:3.5.7.

  3. Verify the node list on the cluster:

    kubectl get node
    
  4. Compare this list with the node list in Calico to identify the old node:

    calicoctl get node -o wide
    
  5. Remove the old node from Calico:

    calicoctl delete node kaas-node-<nodeID>
    
[5782] Manager machine fails to be deployed during node replacement

Fixed in 2.28.4 (17.3.4 and 16.3.4)

During replacement of a manager machine, the following problems may occur:

  • The system adds the node to Docker swarm but not to Kubernetes

  • The node Deployment gets stuck with failed RethinkDB health checks

Workaround:

  1. Delete the failed node.

  2. Wait for the MKE cluster to become healthy. To monitor the cluster status:

    1. Log in to the MKE web UI as described in Connect to the Mirantis Kubernetes Engine web UI.

    2. Monitor the cluster status as described in MKE Operations Guide: Monitor an MKE cluster with the MKE web UI.

  3. Deploy a new node.

[5568] The calico-kube-controllers Pod fails to clean up resources

Fixed in 2.28.4 (17.3.4 and 16.3.4)

During the unsafe or forced deletion of a manager machine running the calico-kube-controllers Pod in the kube-system namespace, the following issues occur:

  • The calico-kube-controllers Pod fails to clean up resources associated with the deleted node

  • The calico-node Pod may fail to start up on a newly created node if the machine is provisioned with the same IP address as the deleted machine had

As a workaround, before deletion of the node running the calico-kube-controllers Pod, cordon and drain the node:

kubectl cordon <nodeName>
kubectl drain <nodeName>

Ceph
[41819] Graceful cluster reboot is blocked by the Ceph ClusterWorkloadLocks

Fixed in 2.27.0 (17.2.0 and 16.2.0)

During graceful reboot of a cluster with Ceph enabled, the reboot is blocked with the following message in the MiraCephMaintenance object status:

message: ClusterMaintenanceRequest found, Ceph Cluster is not ready to upgrade,
 delaying cluster maintenance

As a workaround, add the healthCheck snippet below under metadataServer in the cephFS section of the spec in <kcc-name>.yaml of the Ceph cluster:

cephClusterSpec:
  sharedFilesystem:
    cephFS:
    - name: cephfs-store
      metadataServer:
        activeCount: 1
        healthCheck:
          livenessProbe:
            probe:
              failureThreshold: 5
              initialDelaySeconds: 30
              periodSeconds: 30
              successThreshold: 1
              timeoutSeconds: 5
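One way to apply this change is to edit the KaaSCephCluster object directly on the management cluster; a minimal sketch, assuming <namespaceName> is the project that contains the KaaSCephCluster object <kcc-name>:

kubectl -n <namespaceName> edit kaascephcluster <kcc-name>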
[26441] Cluster update fails with the MountDevice failed for volume warning

Update of a managed cluster that is based on bare metal and has Ceph enabled fails with the PersistentVolumeClaim getting stuck in the Pending state for the prometheus-server StatefulSet and the MountVolume.MountDevice failed for volume warning in the StackLight event logs.

Workaround:

  1. Verify that the description of the Pods that failed to run contains FailedMount events:

    kubectl -n <affectedProjectName> describe pod <affectedPodName>
    

    In the command above, replace the following values:

    • <affectedProjectName> is the Container Cloud project name where the Pods failed to run

    • <affectedPodName> is a Pod name that failed to run in the specified project

    In the Pod description, identify the node name where the Pod failed to run.

  2. Verify that the csi-rbdplugin logs of the affected node contain the rbd volume mount failed: <csi-vol-uuid> is being used error. The <csi-vol-uuid> is a unique RBD volume name.

    1. Identify csiPodName of the corresponding csi-rbdplugin:

      kubectl -n rook-ceph get pod -l app=csi-rbdplugin \
      -o jsonpath='{.items[?(@.spec.nodeName == "<nodeName>")].metadata.name}'
      
    2. Output the affected csiPodName logs:

      kubectl -n rook-ceph logs <csiPodName> -c csi-rbdplugin
      
  3. Scale down the affected StatefulSet or Deployment of the Pod that fails to 0 replicas, as shown in the command sketch after this procedure.

  4. On every csi-rbdplugin Pod, search for stuck csi-vol:

    for pod in `kubectl -n rook-ceph get pods|grep rbdplugin|grep -v provisioner|awk '{print $1}'`; do
      echo $pod
      kubectl exec -it -n rook-ceph $pod -c csi-rbdplugin -- rbd device list | grep <csi-vol-uuid>
    done
    
  5. Unmap the affected csi-vol:

    rbd unmap -o force /dev/rbd<i>
    

    The /dev/rbd<i> value is a mapped RBD volume that uses csi-vol.

  6. Delete volumeattachment of the affected Pod:

    kubectl get volumeattachments | grep <csi-vol-uuid>
    kubectl delete volumeattachment <id>
    
  7. Scale up the affected StatefulSet or Deployment back to the original number of replicas and wait until its state becomes Running.
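For reference, a minimal sketch of the scale-down and scale-up steps, assuming the affected workload is the prometheus-server StatefulSet in the stacklight namespace and that it originally ran <N> replicas:

# Step 3: scale the affected StatefulSet down to zero replicas.
kubectl -n stacklight scale statefulset prometheus-server --replicas=0

# Step 7: after unmapping the volume and deleting the volumeattachment,
# scale it back up to the original number of replicas.
kubectl -n stacklight scale statefulset prometheus-server --replicas=<N>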


StackLight
[44193] OpenSearch reaches 85% disk usage watermark affecting the cluster state

Fixed in 2.29.0 (17.4.0 and 16.4.0)

On High Availability (HA) clusters that use Local Volume Provisioner (LVP), Prometheus and OpenSearch from StackLight may share the same pool of storage. In such a configuration, OpenSearch may approach the 85% disk usage watermark due to the combined storage allocation and usage patterns set by the Persistent Volume Claim (PVC) size parameters for Prometheus and OpenSearch, which consume storage the most.

When the 85% threshold is reached, the affected node is transitioned to the read-only state, preventing shard allocation and causing the OpenSearch cluster state to transition to Warning (Yellow) or Critical (Red).

Caution

The issue and the provided workaround apply only for clusters on which OpenSearch and Prometheus utilize the same storage pool.

To verify that the cluster is affected:

  1. Verify the result of the following formula:

    0.8 × OpenSearch_PVC_Size_GB + Prometheus_PVC_Size_GB > 0.85 × Total_Storage_Capacity_GB
    

    In the formula, define the following values:

    OpenSearch_PVC_Size_GB

    Derived from .values.elasticsearch.persistentVolumeUsableStorageSizeGB, defaulting to .values.elasticsearch.persistentVolumeClaimSize if unspecified. To obtain the OpenSearch PVC size:

    kubectl -n <namespaceName> get cluster <clusterName> -o yaml |\
    yq '.spec.providerSpec.value.helmReleases[] | select(.name == "stacklight") | .values.elasticsearch.persistentVolumeClaimSize '
    

    Example of system response:

    10000Gi
    
    Prometheus_PVC_Size_GB

    Sourced from .values.prometheusServer.persistentVolumeClaimSize. To obtain the Prometheus PVC size:

    kubectl -n <namespaceName> get cluster <clusterName> -o yaml |\
    yq '.spec.providerSpec.value.helmReleases[] | select(.name == "stacklight") | .values.prometheusServer.persistentVolumeClaimSize '
    

    Example of system response:

    4000Gi
    
    Total_Storage_Capacity_GB

    Total capacity of the OpenSearch PVCs. For LVP, the capacity of the storage pool. To obtain the total capacity:

    kubectl get pvc -n stacklight -l app=opensearch-master \
    -o custom-columns=NAME:.metadata.name,CAPACITY:.status.capacity.storage
    

    The system response contains multiple outputs, one per opensearch-master node. Select the capacity for the affected node.

    Note

    Convert the values to GB if they are set in different units.

    If the formula result is positive, it is an early indication that the cluster is affected.
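    For example, taking the example PVC sizes above (10000 GB for OpenSearch and 4000 GB for Prometheus) and a hypothetical total storage capacity of 13000 GB:

    0.8 × 10000 + 4000 = 12000
    0.85 × 13000 = 11050

    Since 12000 > 11050, the formula result is positive and the cluster is affected.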

  2. Verify whether the OpenSearchClusterStatusWarning or OpenSearchClusterStatusCritical alert is firing. If so, verify the following:

    1. Log in to the OpenSearch web UI.

    2. In Management -> Dev Tools, run the following command:

      GET _cluster/allocation/explain
      

      The following system response indicates that the corresponding node is affected:

      "explanation": "the node is above the low watermark cluster setting \
      [cluster.routing.allocation.disk.watermark.low=85%], using more disk space \
      than the maximum allowed [85.0%], actual free: [xx.xxx%]"
      

      Note

      The system response may contain an even higher watermark percentage than 85.0%, depending on the case.

Workaround:

Warning

The workaround implies adjustment of the retention threshold for OpenSearch. Depending on the new threshold, some old logs will be deleted.

  1. Adjust or set .values.elasticsearch.persistentVolumeUsableStorageSizeGB to a value low enough for the verification formula above to become non-positive. For configuration details, see MOSK Operations Guide: StackLight configuration parameters - OpenSearch.

    Mirantis also recommends reserving some space for other PVCs using storage from the pool. Use the following formula to calculate the required space:

    persistentVolumeUsableStorageSizeGB =
    0.84 × ((1 - Reserved_Percentage - Filesystem_Reserve) ×
    Total_Storage_Capacity_GB - Prometheus_PVC_Size_GB) /
    0.8
    

    In the formula, define the following values:

    Reserved_Percentage

    A user-defined variable that specifies what percentage of the total storage capacity should not be used by OpenSearch or Prometheus. This is used to reserve space for other components. It should be expressed as a decimal. For example, for 5% of reservation, Reserved_Percentage is 0.05. Mirantis recommends using 0.05 as a starting point.

    Filesystem_Reserve

    Percentage to deduct for filesystems that may reserve some portion of the available storage, which is marked as occupied. For example, for EXT4, it is 5% by default, so the value must be 0.05.

    Prometheus_PVC_Size_GB

    Sourced from .values.prometheusServer.persistentVolumeClaimSize.

    Total_Storage_Capacity_GB

    Total capacity of the OpenSearch PVCs. For LVP, the capacity of the storage pool. To obtain the total capacity:

    kubectl get pvc -n stacklight -l app=opensearch-master \
    -o custom-columns=NAME:.metadata.name,CAPACITY:.status.capacity.storage
    

    The system response contains multiple outputs, one per opensearch-master node. Select the capacity for the affected node.

    Note

    Convert the values to GB if they are set in different units.

    The formula above provides the maximum safe storage to allocate for .values.elasticsearch.persistentVolumeUsableStorageSizeGB. Use it as a reference when setting this parameter on a cluster.
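    For example, with hypothetical values of Reserved_Percentage = 0.05, Filesystem_Reserve = 0.05, Total_Storage_Capacity_GB = 13000, and Prometheus_PVC_Size_GB = 4000:

    persistentVolumeUsableStorageSizeGB =
    0.84 × ((1 - 0.05 - 0.05) × 13000 - 4000) / 0.8 =
    0.84 × 7700 / 0.8 ≈ 8085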

  2. Wait 15-20 minutes for OpenSearch to perform the cleanup.

  3. Verify that the cluster is no longer affected using the verification procedure above.

[42304] Failure of shard relocation in the OpenSearch cluster

Fixed in 17.2.0, 16.2.0, 17.1.6, 16.1.6

On large managed clusters, shard relocation may fail in the OpenSearch cluster with the yellow or red status of the OpenSearch cluster. The characteristic symptom of the issue is that in the stacklight namespace, the statefulset.apps/opensearch-master containers are experiencing throttling with the KubeContainersCPUThrottlingHigh alert firing for the following set of labels:

{created_by_kind="StatefulSet",created_by_name="opensearch-master",namespace="stacklight"}

Caution

The throttling that OpenSearch experiences may be temporary and related, for example, to a peak load or to the ongoing shard initialization as part of disaster recovery or after a node restart. In this case, Mirantis recommends waiting until the initialization of all shards finishes, then verifying the cluster state and whether throttling still occurs. Apply the workaround below only if throttling does not disappear.

To verify that the initialization of shards is ongoing:

kubectl exec -it pod/opensearch-master-0 -n stacklight -c opensearch -- bash

curl "http://localhost:9200/_cat/shards" | grep INITIALIZING

Example of system response:

.ds-system-000072    2 r INITIALIZING    10.232.182.135 opensearch-master-1
.ds-system-000073    1 r INITIALIZING    10.232.7.145   opensearch-master-2
.ds-system-000073    2 r INITIALIZING    10.232.182.135 opensearch-master-1
.ds-audit-000001     2 r INITIALIZING    10.232.7.145   opensearch-master-2

The system response above indicates that shards from the .ds-system-000072, .ds-system-000073, and .ds-audit-000001 indices are in the INITIALIZING state. In this case, Mirantis recommends waiting until this process finishes and only then considering a change of the limit.

You can additionally analyze the exact level of throttling and the current CPU usage on the Kubernetes Containers dashboard in Grafana.

Workaround:

  1. Verify the currently configured CPU requests and limits for the opensearch containers:

    kubectl -n stacklight get statefulset.apps/opensearch-master -o jsonpath="{.spec.template.spec.containers[?(@.name=='opensearch')].resources}"
    

    Example of system response:

    {"limits":{"cpu":"600m","memory":"8Gi"},"requests":{"cpu":"500m","memory":"6Gi"}}
    

    In the example above, the CPU request is 500m and the CPU limit is 600m.

  2. Increase the CPU limit to a reasonably high number.

    For example, the default CPU limit for the clusters with the clusterSize:large parameter set was increased from 8000m to 12000m for StackLight in Container Cloud 2.27.0 (Cluster releases 17.2.0 and 16.2.0).

    Note

    For details on the clusterSize parameter, see MOSK Operations Guide: StackLight configuration parameters - Cluster size.

    If the defaults are already overridden on the affected cluster using the resourcesPerClusterSize or resources parameters as described in MOSK Operations Guide: StackLight configuration parameters - Resource limits, then the exact recommended number depends on the currently set limit.

    Mirantis recommends increasing the limit by 50%. If it does not resolve the issue, another increase iteration will be required.

  3. Once you have selected the required CPU limit, increase it as described in MOSK Operations Guide: StackLight configuration parameters - Resource limits.

    If the CPU limit for the opensearch component is already set in the Cluster object, increase it there for the opensearch parameter. Otherwise, the default StackLight limit applies; in this case, set the CPU limit for the opensearch component using the resources parameter.

  4. Wait until all opensearch-master pods are recreated with the new CPU limits and become running and ready.

    To verify the current CPU limit for every opensearch container in every opensearch-master pod separately:

    kubectl -n stacklight get pod/opensearch-master-<podSuffixNumber> -o jsonpath="{.spec.containers[?(@.name=='opensearch')].resources}"
    

    In the command above, replace <podSuffixNumber> with the pod suffix number. For example, pod/opensearch-master-0 or pod/opensearch-master-2.

    Example of system response:

    {"limits":{"cpu":"900m","memory":"8Gi"},"requests":{"cpu":"500m","memory":"6Gi"}}
    

    The waiting time may take up to 20 minutes depending on the cluster size.

If the issue is fixed, the KubeContainersCPUThrottlingHigh alert stops firing immediately, while OpenSearchClusterStatusWarning or OpenSearchClusterStatusCritical can still be firing for some time during shard relocation.

If the KubeContainersCPUThrottlingHigh alert is still firing, proceed with another iteration of the CPU limit increase.

[40020] Rollover policy update is not applied to the current index

Fixed in 17.2.0, 16.2.0, 17.1.6, 16.1.6

When you update rollover_policy for the current system* and audit* data streams, the update is not applied to the indices.

One of the indicators that the cluster is most likely affected is the KubeJobFailed alert firing for the elasticsearch-curator job and one or both of the following errors being present in the elasticsearch-curator pods that remain in the Error status:

2024-05-31 13:16:04,459 ERROR   Failed to complete action: delete_indices.  <class 'curator.exceptions.FailedExecution'>: Exception encountered.  Rerun with loglevel DEBUG and/or check Elasticsearch logs for more information. Exception: RequestError(400, 'illegal_argument_exception', 'index [.ds-audit-000001] is the write index for data stream [audit] and cannot be deleted')

or

2024-05-31 13:16:04,459 ERROR   Failed to complete action: delete_indices.  <class 'curator.exceptions.FailedExecution'>: Exception encountered.  Rerun with loglevel DEBUG and/or check Elasticsearch logs for more information. Exception: RequestError(400, 'illegal_argument_exception', 'index [.ds-system-000001] is the write index for data stream [system] and cannot be deleted')

Note

Instead of .ds-audit-000001 or .ds-system-000001 index names, similar names can be present with the same prefix but different suffix numbers.

If the above-mentioned alert and errors are present, immediate action is required because they indicate that the corresponding index size has already exceeded the space allocated for the index.

To verify that the cluster is affected:

Caution

Verify and apply the workaround to both index patterns, system and audit, separately.

If one of the indices is affected, the second one is most likely affected as well, although in rare cases only one index may be affected.

  1. Log in to the opensearch-master-0 Pod:

    kubectl exec -it pod/opensearch-master-0 -n stacklight -c opensearch -- bash
    
  2. Verify that the rollover policy is present:

    • system:

      curl localhost:9200/_plugins/_ism/policies/system_rollover_policy
      
    • audit:

      curl localhost:9200/_plugins/_ism/policies/audit_rollover_policy
      

    The cluster is affected if the rollover policy is missing. Otherwise, proceed to the following step.

  3. Verify the system response from the previous step. For example:

    {"_id":"system_rollover_policy","_version":7229,"_seq_no":42362,"_primary_term":28,"policy":{"policy_id":"system_rollover_policy","description":"system index rollover policy.","last_updated_time":1708505222430,"schema_version":19,"error_notification":null,"default_state":"rollover","states":[{"name":"rollover","actions":[{"retry":{"count":3,"backoff":"exponential","delay":"1m"},"rollover":{"min_size":"14746mb","copy_alias":false}}],"transitions":[]}],"ism_template":[{"index_patterns":["system*"],"priority":200,"last_updated_time":1708505222430}]}}
    

    Verify and capture the following items separately for every policy:

    • The _seq_no and _primary_term values

    • The rollover policy threshold, which is defined in policy.states[0].actions[0].rollover.min_size

  4. List indices:

    • system:

      curl localhost:9200/_cat/indices | grep system
      

      Example of system response:

      [...]
      green open .ds-system-000001   FjglnZlcTKKfKNbosaE9Aw 2 1 1998295  0   1gb 507.9mb
      
    • audit:

      curl localhost:9200/_cat/indices | grep audit
      

      Example of system response:

      [...]
      green open .ds-audit-000001   FjglnZlcTKKfKNbosaE9Aw 2 1 1998295  0   1gb 507.9mb
      
  5. Select the index with the highest number and verify the rollover policy attached to the index:

    • system:

      curl localhost:9200/_plugins/_ism/explain/.ds-system-000001
      
    • audit:

      curl localhost:9200/_plugins/_ism/explain/.ds-audit-000001
      
    • If the rollover policy is not attached, the cluster is affected.

    • If the rollover policy is attached but _seq_no and _primary_term numbers do not match the previously captured ones, the cluster is affected.

    • If the index size drastically exceeds the defined threshold of the rollover policy (which is the previously captured min_size), the cluster is most probably affected.

Workaround:

  1. Log in to the opensearch-master-0 Pod:

    kubectl exec -it pod/opensearch-master-0 -n stacklight -c opensearch -- bash
    
  2. If the policy is attached to the index but has different _seq_no and _primary_term, remove the policy from the index:

    Note

    Use the index with the highest number in the name, which was captured during the verification procedure.

    • system:

      curl -XPOST localhost:9200/_plugins/_ism/remove/.ds-system-000001
      
    • audit:

      curl -XPOST localhost:9200/_plugins/_ism/remove/.ds-audit-000001
      
  3. Re-add the policy:

    • system:

      curl -XPOST -H "Content-type: application/json" localhost:9200/_plugins/_ism/add/system* -d'{"policy_id":"system_rollover_policy"}'
      
    • audit:

      curl -XPOST -H "Content-type: application/json" localhost:9200/_plugins/_ism/add/audit* -d'{"policy_id":"audit_rollover_policy"}'
      
  4. Repeat the last step of the cluster verification procedure above and make sure that the policy is attached to the index and has the same _seq_no and _primary_term.

    If the index size drastically exceeds the defined threshold of the rollover policy (the previously captured min_size), wait up to 15 minutes and verify that an additional index with the next consecutive number in its name is created. For example:

    • system: if you applied changes to .ds-system-000001, wait until .ds-system-000002 is created.

    • audit: if you applied changes to .ds-audit-000001, wait until .ds-audit-000002 is created.

    If such an index is not created, escalate the issue to Mirantis support.


Container Cloud web UI
[41806] Configuration of a management cluster fails without Keycloak settings

Fixed in 17.1.4 and 16.1.4

During configuration of management cluster settings using the Configure cluster web UI menu, updating the Keycloak Truststore settings is mandatory, although it must be optional.

As a workaround, update the management cluster using the API or CLI.
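For example, a minimal sketch of editing the management cluster through the CLI; <namespaceName> and <clusterName> are placeholders for the management cluster project and name:

kubectl -n <namespaceName> edit cluster <clusterName>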

Components versions

The following table lists the major components and their versions delivered in Container Cloud 2.26.0.

Note

The components that are newly added, updated, deprecated, or removed as compared to the previous release version are marked with a corresponding superscript, for example, lcm-ansible Updated.

Container Cloud release components versions

Component

Application/Service

Version

Bare metal Updated

ambasador

1.39.13

baremetal-dnsmasq

base-2-26-alpine-20240129134230

baremetal-operator

base-2-26-alpine-20240129135007

baremetal-provider

1.39.13

bm-collective

base-2-26-alpine-20240129155244

cluster-api-provider-baremetal

1.39.13

ironic

yoga-jammy-20240108060019

ironic-inspector

yoga-jammy-20240108060019

ironic-prometheus-exporter

0.1-20240117102150

kaas-ipam

base-2-26-alpine-20240129213142

kubernetes-entrypoint

1.0.1-55b02f7-20231019172556

mariadb

10.6.14-focal-20231127070342

metallb-controller

0.13.12-31212f9e-amd64

metallb-speaker

0.13.12-31212f9e-amd64

syslog-ng

base-alpine-20240129163811

Container Cloud

admission-controller Updated

1.39.13

agent-controller Updated

1.39.13

byo-cluster-api-controller New

1.39.13

byo-credentials-controller New

1.39.13

ceph-kcc-controller Updated

1.39.13

cert-manager-controller

1.11.0-5

cinder-csi-plugin Updated

1.27.2-11

client-certificate-controller Updated

1.39.13

configuration-collector Updated

1.39.13

csi-attacher Updated

4.2.0-4

csi-node-driver-registrar Updated

2.7.0-4

csi-provisioner Updated

3.4.1-4

csi-resizer Updated

1.7.0-4

csi-snapshotter Updated

6.2.1-mcc-3

event-controller Updated

1.39.13

frontend Updated

1.39.13

golang

1.20.4-alpine3.17

iam-controller Updated

1.39.13

kaas-exporter Updated

1.39.13

kproxy Updated

1.39.13

lcm-controller Updated

1.39.13

license-controller Updated

1.39.13

livenessprobe Updated

2.9.0-4

machinepool-controller Updated

1.38.17

mcc-haproxy Updated

0.24.0-46-gdaf7dbc

metrics-server Updated

0.6.3-6

nginx Updated

1.39.13

policy-controller New

1.39.13

portforward-controller Updated

1.39.13

proxy-controller Updated

1.39.13

rbac-controller Updated

1.39.13

registry Updated

2.8.1-9

release-controller Updated

1.39.13

rhellicense-controller Updated

1.39.13

scope-controller Updated

1.39.13

storage-discovery Updated

1.39.13

user-controller Updated

1.39.13

IAM

iam Updated

1.39.13

iam-controller Updated

1.39.13

keycloak Removed

n/a

mcc-keycloak New

23.0.3-1

OpenStack Updated

host-os-modules-controller New

1.39.13

openstack-cloud-controller-manager

v1.27.2-12

openstack-cluster-api-controller

1.39.13

openstack-provider

1.39.13

os-credentials-controller

1.39.13

VMware vSphere

mcc-keepalived Updated

0.24.0-46-gdaf7dbc

squid-proxy

0.0.1-10-g24a0d69

vsphere-cloud-controller-manager New

v1.27.0-5

vsphere-cluster-api-controller Updated

1.39.13

vsphere-credentials-controller Updated

1.39.13

vsphere-csi-driver New

v3.0.2-1

vsphere-csi-syncer New

v3.0.2-1

vsphere-provider Updated

1.39.13

vsphere-vm-template-controller Updated

1.39.13

Artifacts

This section lists the artifacts of components included in the Container Cloud release 2.26.0.

Note

The components that are newly added, updated, deprecated, or removed as compared to the previous release version are marked with a corresponding superscript, for example, lcm-ansible Updated.

Bare metal artifacts
Bare metal artifacts

Artifact

Component

Path

Binaries Updated

ironic-python-agent.initramfs

https://binary.mirantis.com/bm/bin/ironic/ipa/ubuntu/initramfs-yoga-focal-debug-20240201183421

ironic-python-agent.kernel

https://binary.mirantis.com/bm/bin/ironic/ipa/ubuntu/kernel-yoga-focal-debug-20240201183421

provisioning_ansible

https://binary.mirantis.com/bm/bin/ansible/provisioning_ansible-0.1.1-146-1bd8e71.tgz

Helm charts Updated

baremetal-api

https://binary.mirantis.com/core/helm/baremetal-api-1.39.13.tgz

baremetal-operator

https://binary.mirantis.com/core/helm/baremetal-operator-1.39.13.tgz

baremetal-provider

https://binary.mirantis.com/core/helm/baremetal-provider-1.39.13.tgz

baremetal-public-api

https://binary.mirantis.com/core/helm/baremetal-public-api-1.39.13.tgz

kaas-ipam

https://binary.mirantis.com/core/helm/kaas-ipam-1.39.13.tgz

local-volume-provisioner

https://binary.mirantis.com/core/helm/local-volume-provisioner-1.39.13.tgz

metallb

https://binary.mirantis.com/core/helm/metallb-1.39.13.tgz

Docker images Updated

ambasador

mirantis.azurecr.io/core/external/nginx:1.39.13

baremetal-dnsmasq

mirantis.azurecr.io/bm/baremetal-dnsmasq:base-2-26-alpine-20240129134230

baremetal-operator

mirantis.azurecr.io/bm/baremetal-operator:base-2-26-alpine-20240129135007

bm-collective

mirantis.azurecr.io/bm/bm-collective:base-2-26-alpine-20240129155244

cluster-api-provider-baremetal

mirantis.azurecr.io/core/cluster-api-provider-baremetal:1.39.13

ironic

mirantis.azurecr.io/openstack/ironic:yoga-jammy-20240108060019

ironic-inspector

mirantis.azurecr.io/openstack/ironic-inspector:yoga-jammy-20240108060019

ironic-prometheus-exporter

mirantis.azurecr.io/stacklight/ironic-prometheus-exporter:0.1-20240117102150

kaas-ipam

mirantis.azurecr.io/bm/kaas-ipam:base-2-26-alpine-20240129213142

kubernetes-entrypoint

mirantis.azurecr.io/openstack/extra/kubernetes-entrypoint:v1.0.1-55b02f7-20231019172556

mariadb

mirantis.azurecr.io/general/mariadb:10.6.14-focal-20231127070342

mcc-keepalived

mirantis.azurecr.io/lcm/mcc-keepalived:v0.24.0-46-gdaf7dbc

metallb-controller

mirantis.azurecr.io/bm/metallb/controller:v0.13.12-31212f9e-amd64

metallb-speaker

mirantis.azurecr.io/bm/metallb/speaker:v0.13.12-31212f9e-amd64

syslog-ng

mirantis.azurecr.io/bm/syslog-ng:base-alpine-20240129163811

Core artifacts
Core artifacts

Artifact

Component

Path

Bootstrap tarball Updated

bootstrap-darwin

https://binary.mirantis.com/core/bin/bootstrap-darwin-1.39.13.tgz

bootstrap-linux

https://binary.mirantis.com/core/bin/bootstrap-linux-1.39.13.tgz

Helm charts

admission-controller Updated

https://binary.mirantis.com/core/helm/admission-controller-1.39.13.tgz

agent-controller Updated

https://binary.mirantis.com/core/helm/agent-controller-1.39.13.tgz

byo-credentials-controller New

https://binary.mirantis.com/core/helm/byo-credentials-controller-1.39.13.tgz

byo-provider New

https://binary.mirantis.com/core/helm/byo-provider-1.39.13.tgz

ceph-kcc-controller Updated

https://binary.mirantis.com/core/helm/ceph-kcc-controller-1.39.13.tgz

cert-manager Updated

https://binary.mirantis.com/core/helm/cert-manager-1.39.13.tgz

cinder-csi-plugin Updated

https://binary.mirantis.com/core/helm/cinder-csi-plugin-1.39.13.tgz

client-certificate-controller Updated

https://binary.mirantis.com/core/helm/client-certificate-controller-1.39.13.tgz

configuration-collector Updated

https://binary.mirantis.com/core/helm/configuration-collector-1.39.13.tgz

event-controller Updated

https://binary.mirantis.com/core/helm/event-controller-1.39.13.tgz

host-os-modules-controller New

https://binary.mirantis.com/core/helm/host-os-modules-controller-1.39.13.tgz

iam-controller Updated

https://binary.mirantis.com/core/helm/iam-controller-1.39.13.tgz

kaas-exporter Updated

https://binary.mirantis.com/core/helm/kaas-exporter-1.39.13.tgz

kaas-public-api Updated

https://binary.mirantis.com/core/helm/kaas-public-api-1.39.13.tgz

kaas-ui Updated

https://binary.mirantis.com/core/helm/kaas-ui-1.39.13.tgz

lcm-controller Updated

https://binary.mirantis.com/core/helm/lcm-controller-1.39.13.tgz

license-controller Updated

https://binary.mirantis.com/core/helm/license-controller-1.39.13.tgz

machinepool-controller Updated

https://binary.mirantis.com/core/helm/machinepool-controller-1.39.13.tgz

mcc-cache Updated

https://binary.mirantis.com/core/helm/mcc-cache-1.39.13.tgz

mcc-cache-warmup Updated

https://binary.mirantis.com/core/helm/mcc-cache-warmup-1.39.13.tgz

metrics-server Updated

https://binary.mirantis.com/core/helm/metrics-server-1.39.13.tgz

openstack-cloud-controller-manager Updated

https://binary.mirantis.com/core/helm/openstack-cloud-controller-manager-1.39.13.tgz

openstack-provider Updated

https://binary.mirantis.com/core/helm/openstack-provider-1.39.13.tgz

os-credentials-controller Updated

https://binary.mirantis.com/core/helm/os-credentials-controller-1.39.13.tgz

policy-controller New

https://binary.mirantis.com/core/helm/policy-controller-1.39.13.tgz

portforward-controller Updated

https://binary.mirantis.com/core/helm/portforward-controller-1.39.13.tgz

proxy-controller Updated

https://binary.mirantis.com/core/helm/proxy-controller-1.39.13.tgz

rbac-controller Updated

https://binary.mirantis.com/core/helm/rbac-controller-1.39.13.tgz

release-controller Updated

https://binary.mirantis.com/core/helm/release-controller-1.39.13.tgz

rhellicense-controller Updated

https://binary.mirantis.com/core/helm/rhellicense-controller-1.39.13.tgz

scope-controller Updated

https://binary.mirantis.com/core/helm/scope-controller-1.39.13.tgz

squid-proxy Updated

https://binary.mirantis.com/core/helm/squid-proxy-1.39.13.tgz

storage-discovery Updated

https://binary.mirantis.com/core/helm/storage-discovery-1.39.13.tgz

user-controller Updated

https://binary.mirantis.com/core/helm/user-controller-1.39.13.tgz

vsphere-cloud-controller-manager New

https://binary.mirantis.com/core/helm/vsphere-cloud-controller-manager-1.39.13.tgz

vsphere-credentials-controller Updated

https://binary.mirantis.com/core/helm/vsphere-credentials-controller-1.39.13.tgz

vsphere-csi-plugin New

https://binary.mirantis.com/core/helm/vsphere-csi-plugin-1.39.13.tgz

vsphere-provider Updated

https://binary.mirantis.com/core/helm/vsphere-provider-1.39.13.tgz

vsphere-vm-template-controller Updated

https://binary.mirantis.com/core/helm/vsphere-vm-template-controller-1.39.13.tgz

Docker images

admission-controller Updated

mirantis.azurecr.io/core/admission-controller:1.39.13

agent-controller Updated

mirantis.azurecr.io/core/agent-controller:1.39.13

byo-cluster-api-controller New

mirantis.azurecr.io/core/byo-cluster-api-controller:1.39.13

byo-credentials-controller New

mirantis.azurecr.io/core/byo-credentials-controller:1.39.13

ceph-kcc-controller Updated

mirantis.azurecr.io/core/ceph-kcc-controller:1.39.13

cert-manager-controller Updated

mirantis.azurecr.io/core/external/cert-manager-controller:v1.11.0-5

cinder-csi-plugin Updated

mirantis.azurecr.io/lcm/kubernetes/cinder-csi-plugin:v1.27.2-11

client-certificate-controller Updated

mirantis.azurecr.io/core/client-certificate-controller:1.39.13

configuration-collector Updated

mirantis.azurecr.io/core/configuration-collector:1.39.13

csi-attacher Updated

mirantis.azurecr.io/lcm/k8scsi/csi-attacher:v4.2.0-4

csi-node-driver-registrar Updated

mirantis.azurecr.io/lcm/k8scsi/csi-node-driver-registrar:v2.7.0-4

csi-provisioner Updated

mirantis.azurecr.io/lcm/k8scsi/csi-provisioner:v3.4.1-4

csi-resizer Updated

mirantis.azurecr.io/lcm/k8scsi/csi-resizer:v1.7.0-4

csi-snapshotter Updated

mirantis.azurecr.io/lcm/k8scsi/csi-snapshotter:v6.2.1-mcc-3

event-controller Updated

mirantis.azurecr.io/core/event-controller:1.39.13

frontend Updated

mirantis.azurecr.io/core/frontend:1.39.13

host-os-modules-controller New

mirantis.azurecr.io/core/host-os-modules-controller:1.39.13

iam-controller Updated

mirantis.azurecr.io/core/iam-controller:1.39.13

kaas-exporter Updated

mirantis.azurecr.io/core/kaas-exporter:1.39.13

kproxy Updated

mirantis.azurecr.io/core/kproxy:1.39.13

lcm-controller Updated

mirantis.azurecr.io/core/lcm-controller:1.39.13

license-controller Updated

mirantis.azurecr.io/core/license-controller:1.39.13

livenessprobe Updated

mirantis.azurecr.io/lcm/k8scsi/livenessprobe:v2.9.0-4

machinepool-controller Updated

mirantis.azurecr.io/core/machinepool-controller:1.39.13

mcc-haproxy Updated

mirantis.azurecr.io/lcm/mcc-haproxy:v0.24.0-46-gdaf7dbc

mcc-keepalived Updated

mirantis.azurecr.io/lcm/mcc-keepalived:v0.24.0-46-gdaf7dbc

metrics-server Updated

mirantis.azurecr.io/lcm/metrics-server-amd64:v0.6.3-6

nginx Updated

mirantis.azurecr.io/core/external/nginx:1.39.13

openstack-cloud-controller-manager Updated

mirantis.azurecr.io/lcm/kubernetes/openstack-cloud-controller-manager:v1.27.2-12

openstack-cluster-api-controller Updated

mirantis.azurecr.io/core/openstack-cluster-api-controller:1.39.13

os-credentials-controller Updated

mirantis.azurecr.io/core/os-credentials-controller:1.39.13

policy-controller New

mirantis.azurecr.io/core/policy-controller:1.39.13

portforward-controller Updated

mirantis.azurecr.io/core/portforward-controller:1.39.13

proxy-controller Updated

mirantis.azurecr.io/core/proxy-controller:1.39.13

rbac-controller Updated

mirantis.azurecr.io/core/rbac-controller:1.39.13

registry Updated

mirantis.azurecr.io/lcm/registry:v2.8.1-9

release-controller Updated

mirantis.azurecr.io/core/release-controller:1.39.13

rhellicense-controller Updated

mirantis.azurecr.io/core/rhellicense-controller:1.39.13

scope-controller Updated

mirantis.azurecr.io/core/scope-controller:1.39.13

squid-proxy

mirantis.azurecr.io/lcm/squid-proxy:0.0.1-10-g24a0d69

storage-discovery Updated

mirantis.azurecr.io/core/storage-discovery:1.39.13

user-controller Updated

mirantis.azurecr.io/core/user-controller:1.39.13

vsphere-cloud-controller-manager New

mirantis.azurecr.io/lcm/kubernetes/vsphere-cloud-controller-manager:v1.27.0-5

vsphere-cluster-api-controller Updated

mirantis.azurecr.io/core/vsphere-cluster-api-controller:1.39.13

vsphere-credentials-controller Updated

mirantis.azurecr.io/core/vsphere-credentials-controller:1.39.13

vsphere-csi-driver New

mirantis.azurecr.io/lcm/kubernetes/vsphere-csi-driver:v3.0.2-1

vsphere-csi-syncer New

mirantis.azurecr.io/lcm/kubernetes/vsphere-csi-syncer:v3.0.2-1

vsphere-vm-template-controller Updated

mirantis.azurecr.io/core/vsphere-vm-template-controller:1.39.13

IAM artifacts

Artifact

Component

Path

Binaries

hash-generate-darwin

http://binary.mirantis.com/iam/bin/hash-generate-0.0.1-268-3cf7f17-darwin

hash-generate-linux

http://binary.mirantis.com/iam/bin/hash-generate-0.0.1-268-3cf7f17-linux

Helm charts Updated

iam

https://binary.mirantis.com/core/helm/iam-1.39.13.tgz

Docker images

keycloak Removed

n/a

kubectl New

mirantis.azurecr.io/stacklight/kubectl:1.22-20240105023016

kubernetes-entrypoint Updated

mirantis.azurecr.io/openstack/extra/kubernetes-entrypoint:v1.0.1-55b02f7-20231019172556

mariadb Updated

mirantis.azurecr.io/general/mariadb:10.6.14-focal-20231127070342

mcc-keycloak New

mirantis.azurecr.io/iam/mcc-keycloak:23.0.3-1

Security notes

The table below includes the total numbers of addressed unique and common CVEs (Common Vulnerabilities and Exposures) by product component since the 2.25.4 patch release. The common CVEs are issues addressed across several images.

Addressed CVEs - summary

Product component

CVE type

Critical

High

Total

Ceph

Unique

0

2

2

Common

0

6

6

Kaas core

Unique

0

7

7

Common

0

8

8

StackLight

Unique

3

7

10

Common

5

19

24

Mirantis Security Portal

For the detailed list of fixed and existing CVEs across the Mirantis Container Cloud and MOSK products, refer to Mirantis Security Portal.

MOSK CVEs

For the number of fixed CVEs in the MOSK-related components including OpenStack and Tungsten Fabric, refer to MOSK 24.1: Security notes.

Update notes

This section describes the specific actions you as a cloud operator need to complete before or after your Container Cloud cluster update to the Cluster releases 17.1.0 or 16.1.0.

Consider this information as a supplement to the generic update procedures published in Operations Guide: Automatic upgrade of a management cluster and Update a managed cluster.

Pre-update actions
Unblock cluster update by removing any pinned product artifacts

If any pinned product artifacts are present in the Cluster object of a management or managed cluster, the update will be blocked by the Admission Controller with the invalid HelmReleases configuration error until such artifacts are removed. The update process does not start and any changes in the Cluster object are blocked by the Admission Controller except the removal of fields with pinned product artifacts.

Therefore, verify that the following sections of the Cluster objects do not contain any image-related values (tag, name, pullPolicy, repository) or global values inside Helm releases:

  • .spec.providerSpec.value.helmReleases

  • .spec.providerSpec.value.kaas.management.helmReleases

  • .spec.providerSpec.value.regionalHelmReleases

  • .spec.providerSpec.value.regional

For example, a cluster configuration that contains an image tag override or global values, as in the following snippets, will be blocked until you remove them:

- name: kaas-ipam
  values:
    kaas_ipam:
      image:
        tag: base-focal-20230127092754
      exampleKey: exampleValue

- name: kaas-ipam
  values:
    global:
      anyKey: anyValue
    kaas_ipam:
      image:
        tag: base-focal-20230127092754
      exampleKey: exampleValue

The custom pinned product artifacts are inspected and blocked by the Admission Controller to ensure that Container Cloud clusters remain consistently updated with the latest security fixes and product improvements.

Note

The pre-update inspection applies only to images delivered by Container Cloud that are overwritten. Any custom images unrelated to the product components are not verified and do not block cluster update.
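To check for such overrides before the update, you can inspect the Helm releases defined in the Cluster object; a minimal sketch using yq, where <namespaceName> and <clusterName> are placeholders for the cluster to be updated, and the same check applies to the other sections listed above:

kubectl -n <namespaceName> get cluster <clusterName> -o yaml |\
yq '.spec.providerSpec.value.helmReleases'

Review the output for image-related values (tag, name, pullPolicy, repository) or global values and remove them before starting the update.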

Update queries for custom log-based metrics in StackLight

Container Cloud 2.26.0 introduces a reorganized and significantly improved StackLight logging pipeline. It involves changes in queries implemented in the scope of the logging.metricQueries feature designed for creation of custom log-based metrics. For the procedure, see StackLight operations: Create logs-based metrics.

If you already have some custom log-based metrics:

  1. Before the cluster update, save existing queries.

  2. After the cluster update, update the queries according to the changes implemented in the scope of the logging.metricQueries feature.

These steps prevent failures of queries containing fields that are renamed or removed in Container Cloud 2.26.0.

Post-update actions
Update bird configuration on BGP-enabled bare metal clusters

Container Cloud 2.26.0 introduces the bird daemon update from v1.6.8 to v2.0.7 on master nodes if BGP is used for announcement of the cluster API load balancer address.

Configuration files for bird v1.x are not fully compatible with those for bird v2.x. Therefore, if you used BGP announcement of the cluster API LB address on a deployment based on Cluster releases 17.0.0 or 16.0.0, update the bird configuration files to fit bird v2.x using the configuration examples provided in the API Reference: MultiRackCluster section.

Review and adjust the storage parameters for OpenSearch

To prevent underused or overused storage space, review your storage space parameters for OpenSearch on the StackLight cluster:

  1. Review the value of elasticsearch.persistentVolumeClaimSize and the real storage available on volumes.

  2. Decide whether you have to additionally set elasticsearch.persistentVolumeUsableStorageSizeGB.

For the description of both parameters, see MOSK Operations Guide: StackLight configuration parameters - OpenSearch.
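For example, to review the currently configured OpenSearch PVC size in the Cluster object; a minimal sketch using yq, where <namespaceName> and <clusterName> are placeholders:

kubectl -n <namespaceName> get cluster <clusterName> -o yaml |\
yq '.spec.providerSpec.value.helmReleases[] | select(.name == "stacklight") | .values.elasticsearch.persistentVolumeClaimSize'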

2.25.4

The Container Cloud patch release 2.25.4, which is based on the 2.25.0 major release, provides the following updates:

  • Support for the patch Cluster releases 16.0.4 and 17.0.4 that represent Mirantis OpenStack for Kubernetes (MOSK) patch release 23.3.4.

  • Security fixes for CVEs in images.

This patch release also supports the latest major Cluster releases 17.0.0 and 16.0.0. It does not support greenfield deployments based on deprecated Cluster releases. Use the latest available Cluster release instead.

For main deliverables of the parent Container Cloud release of 2.25.4, refer to 2.25.0.

Artifacts

This section lists the artifacts of components included in the Container Cloud patch release 2.25.4. For artifacts of the Cluster releases introduced in 2.25.4, see patch Cluster releases 17.0.4 and 16.0.4.

Note

The components that are newly added, updated, deprecated, or removed as compared to the previous release version are marked with a corresponding superscript, for example, lcm-ansible Updated.

Bare metal artifacts

Artifact

Component

Path

Binaries

ironic-python-agent.initramfs

https://binary.mirantis.com/bm/bin/ironic/ipa/ubuntu/initramfs-yoga-focal-debug-20231012141354

ironic-python-agent.kernel

https://binary.mirantis.com/bm/bin/ironic/ipa/ubuntu/kernel-yoga-focal-debug-20231012141354

provisioning_ansible

https://binary.mirantis.com/bm/bin/ansible/provisioning_ansible-0.1.1-113-4f8b843.tgz

Helm charts Updated

baremetal-api

https://binary.mirantis.com/core/helm/baremetal-api-1.38.33.tgz

baremetal-operator

https://binary.mirantis.com/core/helm/baremetal-operator-1.38.33.tgz

baremetal-provider

https://binary.mirantis.com/core/helm/baremetal-provider-1.38.33.tgz

baremetal-public-api

https://binary.mirantis.com/core/helm/baremetal-public-api-1.38.33.tgz

kaas-ipam

https://binary.mirantis.com/core/helm/kaas-ipam-1.38.33.tgz

local-volume-provisioner

https://binary.mirantis.com/core/helm/local-volume-provisioner-1.38.33.tgz

metallb

https://binary.mirantis.com/core/helm/metallb-1.38.33.tgz

Docker images

ambasador Updated

mirantis.azurecr.io/core/external/nginx:1.38.33

baremetal-dnsmasq

mirantis.azurecr.io/bm/baremetal-dnsmasq:base-2-25-alpine-20231128145936

baremetal-operator

mirantis.azurecr.io/bm/baremetal-operator:base-2-25-alpine-20231204121500

bm-collective

mirantis.azurecr.io/bm/bm-collective:base-2-25-alpine-20231121115652

cluster-api-provider-baremetal Updated

mirantis.azurecr.io/core/cluster-api-provider-baremetal:1.38.33

ironic

mirantis.azurecr.io/openstack/ironic:yoga-jammy-20231204153029

ironic-inspector

mirantis.azurecr.io/openstack/ironic-inspector:yoga-jammy-20231204153029

ironic-prometheus-exporter

mirantis.azurecr.io/stacklight/ironic-prometheus-exporter:0.1-20231204142028

kaas-ipam

mirantis.azurecr.io/bm/kaas-ipam:base-2-25-alpine-20231121164200

kubernetes-entrypoint

mirantis.azurecr.io/openstack/extra/kubernetes-entrypoint:v1.0.1-55b02f7-20231019172556

mariadb

mirantis.azurecr.io/general/mariadb:10.6.14-focal-20231127070342

mcc-keepalived Updated

mirantis.azurecr.io/lcm/mcc-keepalived:v0.23.0-88-g35be0fc

metallb-controller

mirantis.azurecr.io/bm/metallb/controller:v0.13.9-ef4faae9-amd64

metallb-speaker

mirantis.azurecr.io/bm/metallb/speaker:v0.13.9-ef4faae9-amd64

syslog-ng

mirantis.azurecr.io/bm/syslog-ng:base-alpine-20231121121917

Core artifacts

Artifact

Component

Path

Bootstrap tarball Updated

bootstrap-darwin

https://binary.mirantis.com/core/bin/bootstrap-darwin-1.38.33.tgz

bootstrap-linux

https://binary.mirantis.com/core/bin/bootstrap-linux-1.38.33.tgz

Helm charts Updated

admission-controller

https://binary.mirantis.com/core/helm/admission-controller-1.38.33.tgz

agent-controller

https://binary.mirantis.com/core/helm/agent-controller-1.38.33.tgz

byo-credentials-controller

https://binary.mirantis.com/core/helm/byo-credentials-controller-1.38.33.tgz

byo-provider

https://binary.mirantis.com/core/helm/byo-provider-1.38.33.tgz

ceph-kcc-controller

https://binary.mirantis.com/core/helm/ceph-kcc-controller-1.38.33.tgz

cert-manager

https://binary.mirantis.com/core/helm/cert-manager-1.38.33.tgz

cinder-csi-plugin

https://binary.mirantis.com/core/helm/cinder-csi-plugin-1.38.33.tgz

client-certificate-controller

https://binary.mirantis.com/core/helm/client-certificate-controller-1.38.33.tgz

configuration-collector

https://binary.mirantis.com/core/helm/configuration-collector-1.38.33.tgz

event-controller

https://binary.mirantis.com/core/helm/event-controller-1.38.33.tgz

iam-controller

https://binary.mirantis.com/core/helm/iam-controller-1.38.33.tgz

kaas-exporter

https://binary.mirantis.com/core/helm/kaas-exporter-1.38.33.tgz

kaas-public-api

https://binary.mirantis.com/core/helm/kaas-public-api-1.38.33.tgz

kaas-ui

https://binary.mirantis.com/core/helm/kaas-ui-1.38.33.tgz

lcm-controller

https://binary.mirantis.com/core/helm/lcm-controller-1.38.33.tgz

license-controller

https://binary.mirantis.com/core/helm/license-controller-1.38.33.tgz

machinepool-controller

https://binary.mirantis.com/core/helm/machinepool-controller-1.38.33.tgz

mcc-cache

https://binary.mirantis.com/core/helm/mcc-cache-1.38.33.tgz

mcc-cache-warmup

https://binary.mirantis.com/core/helm/mcc-cache-warmup-1.38.33.tgz

metrics-server

https://binary.mirantis.com/core/helm/metrics-server-1.38.33.tgz

openstack-cloud-controller-manager

https://binary.mirantis.com/core/helm/openstack-cloud-controller-manager-1.38.33.tgz

openstack-provider

https://binary.mirantis.com/core/helm/openstack-provider-1.38.33.tgz

os-credentials-controller

https://binary.mirantis.com/core/helm/os-credentials-controller-1.38.33.tgz

portforward-controller

https://binary.mirantis.com/core/helm/portforward-controller-1.38.33.tgz

proxy-controller

https://binary.mirantis.com/core/helm/proxy-controller-1.38.33.tgz

rbac-controller

https://binary.mirantis.com/core/helm/rbac-controller-1.38.33.tgz

release-controller

https://binary.mirantis.com/core/helm/release-controller-1.38.33.tgz

rhellicense-controller

https://binary.mirantis.com/core/helm/rhellicense-controller-1.38.33.tgz

scope-controller

https://binary.mirantis.com/core/helm/scope-controller-1.38.33.tgz

squid-proxy

https://binary.mirantis.com/core/helm/squid-proxy-1.38.33.tgz

storage-discovery

https://binary.mirantis.com/core/helm/storage-discovery-1.38.33.tgz

user-controller

https://binary.mirantis.com/core/helm/user-controller-1.38.33.tgz

vsphere-cloud-controller-manager

https://binary.mirantis.com/core/helm/vsphere-cloud-controller-manager-1.38.33.tgz

vsphere-credentials-controller

https://binary.mirantis.com/core/helm/vsphere-credentials-controller-1.38.33.tgz

vsphere-csi-plugin

https://binary.mirantis.com/core/helm/vsphere-csi-plugin-1.38.33.tgz

vsphere-provider

https://binary.mirantis.com/core/helm/vsphere-provider-1.38.33.tgz

vsphere-vm-template-controller

https://binary.mirantis.com/core/helm/vsphere-vm-template-controller-1.38.33.tgz

Docker images

admission-controller Updated

mirantis.azurecr.io/core/admission-controller:1.38.33

agent-controller Updated

mirantis.azurecr.io/core/agent-controller:1.38.33

byo-cluster-api-controller Updated

mirantis.azurecr.io/core/byo-cluster-api-controller:1.38.33

byo-credentials-controller Updated

mirantis.azurecr.io/core/byo-credentials-controller:1.38.33

ceph-kcc-controller Updated

mirantis.azurecr.io/core/ceph-kcc-controller:1.38.33

cert-manager-controller

mirantis.azurecr.io/core/external/cert-manager-controller:v1.11.0-5

cinder-csi-plugin

mirantis.azurecr.io/lcm/kubernetes/cinder-csi-plugin:v1.27.2-11

client-certificate-controller Updated

mirantis.azurecr.io/core/client-certificate-controller:1.38.33

configuration-collector Updated

mirantis.azurecr.io/core/configuration-collector:1.38.33

csi-attacher

mirantis.azurecr.io/lcm/k8scsi/csi-attacher:v4.2.0-4

csi-node-driver-registrar

mirantis.azurecr.io/lcm/k8scsi/csi-node-driver-registrar:v2.7.0-4

csi-provisioner

mirantis.azurecr.io/lcm/k8scsi/csi-provisioner:v3.4.1-4

csi-resizer

mirantis.azurecr.io/lcm/k8scsi/csi-resizer:v1.7.0-4

csi-snapshotter

mirantis.azurecr.io/lcm/k8scsi/csi-snapshotter:v6.2.1-mcc-3

event-controller Updated

mirantis.azurecr.io/core/event-controller:1.38.33

frontend Updated

mirantis.azurecr.io/core/frontend:1.38.33

iam-controller Updated

mirantis.azurecr.io/core/iam-controller:1.38.33

kaas-exporter Updated

mirantis.azurecr.io/core/kaas-exporter:1.38.33

kproxy Updated

mirantis.azurecr.io/core/kproxy:1.38.33

lcm-controller Updated

mirantis.azurecr.io/core/lcm-controller:1.38.33

license-controller Updated

mirantis.azurecr.io/core/license-controller:1.38.33

livenessprobe

mirantis.azurecr.io/lcm/k8scsi/livenessprobe:v2.9.0-4

machinepool-controller Updated

mirantis.azurecr.io/core/machinepool-controller:1.38.33

mcc-haproxy Updated

mirantis.azurecr.io/lcm/mcc-haproxy:v0.23.0-88-g35be0fc

mcc-keepalived Updated

mirantis.azurecr.io/lcm/mcc-keepalived:v0.23.0-88-g35be0fc

metrics-server

mirantis.azurecr.io/lcm/metrics-server-amd64:v0.6.3-6

nginx Updated

mirantis.azurecr.io/core/external/nginx:1.38.33

openstack-cloud-controller-manager

mirantis.azurecr.io/lcm/kubernetes/openstack-cloud-controller-manager:v1.27.2-12

openstack-cluster-api-controller Updated

mirantis.azurecr.io/core/openstack-cluster-api-controller:1.38.33

os-credentials-controller Updated

mirantis.azurecr.io/core/os-credentials-controller:1.38.33

portforward-controller Updated

mirantis.azurecr.io/core/portforward-controller:1.38.33

proxy-controller Updated

mirantis.azurecr.io/core/proxy-controller:1.38.33

rbac-controller Updated

mirantis.azurecr.io/core/rbac-controller:1.38.33

registry

mirantis.azurecr.io/lcm/registry:v2.8.1-7

release-controller Updated

mirantis.azurecr.io/core/release-controller:1.38.33

rhellicense-controller Updated

mirantis.azurecr.io/core/rhellicense-controller:1.38.33

scope-controller Updated

mirantis.azurecr.io/core/scope-controller:1.38.33

squid-proxy

mirantis.azurecr.io/lcm/squid-proxy:0.0.1-10-g24a0d69

storage-discovery Updated

mirantis.azurecr.io/core/storage-discovery:1.38.33

user-controller Updated

mirantis.azurecr.io/core/user-controller:1.38.33

vsphere-cloud-controller-manager

mirantis.azurecr.io/lcm/kubernetes/vsphere-cloud-controller-manager:v1.27.0-5

vsphere-cluster-api-controller Updated

mirantis.azurecr.io/core/vsphere-cluster-api-controller:1.38.33

vsphere-credentials-controller Updated

mirantis.azurecr.io/core/vsphere-credentials-controller:1.38.33

vsphere-csi-driver

mirantis.azurecr.io/lcm/kubernetes/vsphere-csi-driver:v3.0.2-1

vsphere-csi-syncer

mirantis.azurecr.io/lcm/kubernetes/vsphere-csi-syncer:v3.0.2-1

vsphere-vm-template-controller Updated

mirantis.azurecr.io/core/vsphere-vm-template-controller:1.38.33

IAM artifacts

Artifact

Component

Path

Binaries

hash-generate-darwin

http://binary.mirantis.com/iam/bin/hash-generate-0.0.1-268-3cf7f17-darwin

hash-generate-linux

http://binary.mirantis.com/iam/bin/hash-generate-0.0.1-268-3cf7f17-linux

Helm charts Updated

iam

https://binary.mirantis.com/iam/helm/iam-2.6.4.tgz

Docker images

kubectl Updated

mirantis.azurecr.io/stacklight/kubectl:1.22-20231208023019

kubernetes-entrypoint

mirantis.azurecr.io/openstack/extra/kubernetes-entrypoint:v1.0.1-55b02f7-20231019172556

mariadb

mirantis.azurecr.io/general/mariadb:10.6.14-focal-20231127070342

mcc-keycloak

mirantis.azurecr.io/iam/mcc-keycloak:22.0.5-1

Security notes

The table below includes the total numbers of addressed unique and common CVEs in images by product component since the Container Cloud 2.25.3 patch release. The common CVEs are issues addressed across several images.

Addressed CVEs - summary

Product component

CVE type

Critical

High

Total

Ceph

Unique

0

1

1

Common

0

5

5

Kaas core

Unique

0

1

1

Common

0

1

1

StackLight

Unique

0

3

3

Common

0

9

9

Mirantis Security Portal

For the detailed list of fixed and existing CVEs across the Mirantis Container Cloud and MOSK products, refer to Mirantis Security Portal.

MOSK CVEs

For the number of fixed CVEs in the MOSK-related components including OpenStack and Tungsten Fabric, refer to MOSK 23.3.4: Security notes.

Addressed issues

The following issues have been addressed in the Container Cloud patch release 2.25.4 along with the patch Cluster releases 17.0.4 and 16.0.4.

  • [38259] Fixed the issue causing the failure to attach an existing MKE cluster to a Container Cloud management cluster. The issue was related to byo-provider and prevented the attachment of MKE clusters having fewer than three manager nodes and two worker nodes.

  • [38399] Fixed the issue causing the failure to deploy a management cluster in the offline mode due to an issue in the setup script.

See also

Patch releases

Releases delivered in 2023

This section contains historical information on the unsupported Container Cloud releases delivered in 2023. For the latest supported Container Cloud release, see Container Cloud releases.

Unsupported Container Cloud releases 2023

Version

Release date

Summary

2.25.3

Dec 18, 2023

Container Cloud 2.25.3 is the third patch release of the 2.25.x and MOSK 23.3.x release series that introduces the following updates:

  • Support for MKE 3.7.3

  • Patch Cluster release 17.0.3 for MOSK 23.3.3

  • Patch Cluster release 16.0.3

  • Security fixes for CVEs in images

2.25.2

Dec 05, 2023

Container Cloud 2.25.2 is the second patch release of the 2.25.x and MOSK 23.3.x release series that introduces the following updates:

  • Support for attachment of non-Container Cloud-based MKE clusters to vSphere-based management clusters

  • Patch Cluster release 17.0.2 for MOSK 23.3.2

  • Patch Cluster release 16.0.2

  • Security fixes for CVEs in images

2.25.1

Nov 27, 2023

Container Cloud 2.25.1 is the first patch release of the 2.25.x and MOSK 23.3.x release series that introduces the following updates:

  • MKE:

    • Support for MKE 3.7.2

    • Amendments for MKE configuration managed by Container Cloud

  • vSphere:

    • Switch to an external vSphere cloud controller manager

    • Mandatory MKE upgrade from 3.6 to 3.7

  • StackLight:

    • Kubernetes Network Policies

    • MKE benchmark compliance

  • Patch Cluster release 17.0.1 for MOSK 23.3.1

  • Patch Cluster release 16.0.1

  • Security fixes for CVEs in images

2.25.0

Nov 06, 2023

  • Container Cloud Bootstrap v2

  • Support for MKE 3.7.1 and MCR 23.0.7

  • General availability for RHEL 8.7 on vSphere-based clusters

  • Automatic cleanup of old Ubuntu kernel packages

  • Configuration of a custom OIDC provider for MKE on managed clusters

  • General availability for graceful machine deletion

  • Bare metal provider:

    • General availability for MetalLBConfigTemplate and MetalLBConfig objects

    • Manual IP address allocation for bare metal hosts during PXE provisioning

  • Ceph:

    • Addressing storage devices using by-id identifiers

    • Verbose Ceph cluster status in the KaaSCephCluster.status specification

    • Detailed view of a Ceph cluster summary in web UI

  • StackLight:

    • Fluentd log forwarding to Splunk

    • Ceph monitoring improvements

    • Optimization of StackLight NodeDown alerts

    • OpenSearch performance optimization

    • Documentation: Export data from Table panels of Grafana dashboards to CSV

  • Container Cloud web UI:

    • Status of infrastructure health for bare metal and OpenStack providers

    • Parallel update of worker nodes

    • Graceful machine deletion

2.24.5

Sep 26, 2023

Container Cloud 2.24.5 is the third patch release of the 2.24.x and MOSK 23.2.x release series that introduces the following updates:

  • Patch Cluster release 15.0.4 for MOSK 23.2.3

  • Patch Cluster release 14.0.4

  • Security fixes for CVEs of Critical and High severity

2.24.4

Sep 14, 2023

Container Cloud 2.24.4 is the second patch release of the 2.24.x and MOSK 23.2.x release series that introduces the following updates:

  • Patch Cluster release 15.0.3 for MOSK 23.2.2

  • Patch Cluster release 14.0.3

  • Multi-rack topology for bare metal managed clusters

  • Configuration of the etcd storage quota

  • Security fixes for CVEs of Critical and High severity

2.24.3

Aug 29, 2023

Container Cloud 2.24.3 is the first patch release of the 2.24.x and MOSK 23.2.x release series that introduces the following updates:

  • Patch Cluster release 15.0.2 for MOSK 23.2.1

  • Patch Cluster release 14.0.2

  • Support for MKE 3.6.6 and updated docker-ee-cli 20.10.18 for MCR 20.10.17

  • GA for TLS certificates configuration

  • Security fixes for CVEs of High severity

  • End of support for new deployments on deprecated major or patch Cluster releases

For details, see Patch releases.

2.24.2

Aug 21, 2023

Based on 2.24.1, Container Cloud 2.24.2:

  • Introduces the major Cluster release 15.0.1 that is based on 14.0.1 and supports Mirantis OpenStack for Kubernetes (MOSK) 23.2.

  • Supports the Cluster release 14.0.1. The deprecated Cluster release 14.0.0 as well as the 12.7.x and 11.7.x series are not supported for new deployments.

  • Contains features and amendments of the parent releases 2.24.0 and 2.24.1.

2.24.1

Jul 27, 2023

Patch release containing hot fixes for the major Container Cloud release 2.24.0.

2.24.0

Jul 20, 2023

  • Support for MKE 3.6.5 and MCR 20.10.17

  • Bare metal:

    • Automated upgrade of operating system on management and regional clusters

    • Support for WireGuard

    • Configuration of MTU size for Calico

    • MetalLB configuration changes

  • vSphere:

    • Support for RHEL 8.7

    • MetalLB configuration changes

  • OpenStack:

    • Custom flavors for Octavia

    • Deletion of persistent volumes during a cluster deletion

  • IAM:

    • Support for Keycloak Quarkus

    • The admin role for management cluster

  • Security:

    • Support for auditd

    • General availability for TLS certificates configuration

  • LCM:

    • Custom host names for cluster machines

    • Cache warm-up for managed clusters

  • Ceph:

    • Automatic upgrade of Ceph from Pacific to Quincy

    • Ceph non-admin client for a shared Ceph cluster

    • Dropping of redundant components from management and regional clusters

    • Documentation enhancements for Ceph OSDs

  • StackLight:

    • Major version update of OpenSearch and OpenSearch Dashboards from 1.3.7 to 2.7.0

    • Monitoring of network connectivity between Ceph nodes

    • Improvements to StackLight alerting

    • Performance tuning of Grafana dashboards

    • Dropped and white-listed metrics

  • Container Cloud web UI:

    • Graceful cluster reboot

    • Creation and deletion of bare metal host credentials

    • Node labeling improvements

2.23.5

June 05, 2023

Container Cloud 2.23.5 is the fourth patch release of the 2.23.0 and 2.23.1 major releases that:

  • Contains security fixes for critical and high CVEs

  • Introduces the patch Cluster release 12.7.4 for MOSK 23.1.4

  • Introduces the patch Cluster release 11.7.4

  • Supports all major Cluster releases introduced in previous 2.23.x releases

  • Does not support new deployments on deprecated major or patch Cluster releases

For details, see Patch releases.

2.23.4

May 22, 2023

Container Cloud 2.23.4 is the third patch release of the 2.23.0 and 2.23.1 major releases that:

  • Contains several addressed issues and security fixes for critical and high CVEs

  • Introduces the patch Cluster release 12.7.3 for MOSK 23.1.3

  • Introduces the patch Cluster release 11.7.3

  • Supports all major Cluster releases introduced in previous 2.23.x releases

  • Does not support new deployments on deprecated major or patch Cluster releases

For details, see Patch releases.

2.23.3

May 04, 2023

Container Cloud 2.23.3 is the second patch release of the 2.23.0 and 2.23.1 major releases that:

  • Contains security fixes for critical and high CVEs

  • Introduces the patch Cluster release 12.7.2 for MOSK 23.1.2

  • Introduces the patch Cluster release 11.7.2

  • Supports all major Cluster releases introduced in previous 2.23.x releases

  • Does not support new deployments on deprecated major or patch Cluster releases

For details, see Patch releases.

2.23.2

Apr 20, 2023

Container Cloud 2.23.2 is the first patch release of the 2.23.0 and 2.23.1 major releases that:

  • Contains security fixes for critical and high CVEs

  • Introduces support for the patch Cluster releases 12.7.1 and 11.7.1

  • Supports all major Cluster releases introduced and supported in the previous 2.23.x releases

For details, see Patch releases.

2.23.1

Apr 04, 2023

Based on 2.23.0, Container Cloud 2.23.1:

  • Introduces the Cluster release 12.7.0 that is based on 11.7.0 and supports Mirantis OpenStack for Kubernetes (MOSK) 23.1.

  • Supports the Cluster release 11.7.0. The deprecated Cluster releases 12.5.0 and 11.6.0 are not supported for new deployments.

  • Contains features and amendments of the parent releases 2.23.0 and 2.22.0.

2.23.0

Mar 07, 2023

  • MKE patch release update from 3.5.5 to 3.5.7

  • Automatic upgrade of Ceph from Octopus 15.2.17 to Pacific 16.2.11

  • Graceful cluster reboot using the GracefulRebootRequest CR

  • Readiness fields for Machine and Cluster objects

  • Deletion of persistent volumes during an OpenStack-based cluster deletion

  • Option to disable time sync management

  • Upgrade button for easy cluster update through the web UI

  • Deployment of an Equinix Metal regional cluster with private networking on top of a public management cluster

  • StackLight:

    • HA setup for iam-proxy in StackLight

    • Log forwarding to third-party systems using Fluentd plugins

    • MCC Applications Performance Grafana dashboard

    • PVC configuration for Reference Application

2.22.0

Jan 31, 2023

  • Custom network configuration for Equinix Metal managed clusters

  • Custom TLS certificates for the StackLight iam-proxy endpoints

  • Notification of a required reboot in the status of a bare metal machine

  • Cluster deployment and update history objects

  • Extended logging format for essential management cluster components

  • StackLight:

    • Bond interfaces monitoring

    • Calculation of storage retention time

    • Deployment of cAdvisor as a StackLight component

    • Container Cloud web UI support for Reference Application

  • Ceph:

    • Two Ceph Managers by default for HA

    • General availability of Ceph Shared File System

    • Sharing Ceph between managed clusters or with an attached MKE cluster

2.25.3

The Container Cloud patch release 2.25.3, which is based on the 2.25.0 major release, provides the following updates:

  • Support for MKE 3.7.3.

  • Support for the patch Cluster releases 16.0.3 and 17.0.3, the latter corresponding to the Mirantis OpenStack for Kubernetes (MOSK) patch release 23.3.3.

  • Security fixes for CVEs in images.

This patch release also supports the latest major Cluster releases 17.0.0 and 16.0.0. It does not support greenfield deployments based on deprecated Cluster releases. Use the latest available Cluster release instead.

For main deliverables of the parent Container Cloud release of 2.25.3, refer to 2.25.0.

Artifacts

This section lists the artifacts of components included in the Container Cloud patch release 2.25.3. For artifacts of the Cluster releases introduced in 2.25.3, see patch Cluster releases 17.0.3 and 16.0.3.

Note

The components that are newly added, updated, deprecated, or removed as compared to the previous release version are marked with a corresponding superscript, for example, lcm-ansible Updated.

Bare metal artifacts

Artifact

Component

Path

Binaries

ironic-python-agent.initramfs

https://binary.mirantis.com/bm/bin/ironic/ipa/ubuntu/initramfs-yoga-focal-debug-20231012141354

ironic-python-agent.kernel

https://binary.mirantis.com/bm/bin/ironic/ipa/ubuntu/kernel-yoga-focal-debug-20231012141354

provisioning_ansible

https://binary.mirantis.com/bm/bin/ansible/provisioning_ansible-0.1.1-113-4f8b843.tgz

Helm charts Updated

baremetal-api

https://binary.mirantis.com/core/helm/baremetal-api-1.38.31.tgz

baremetal-operator

https://binary.mirantis.com/core/helm/baremetal-operator-1.38.31.tgz

baremetal-provider

https://binary.mirantis.com/core/helm/baremetal-provider-1.38.31.tgz

baremetal-public-api

https://binary.mirantis.com/core/helm/baremetal-public-api-1.38.31.tgz

kaas-ipam

https://binary.mirantis.com/core/helm/kaas-ipam-1.38.31.tgz

local-volume-provisioner

https://binary.mirantis.com/core/helm/local-volume-provisioner-1.38.31.tgz

metallb

https://binary.mirantis.com/core/helm/metallb-1.38.31.tgz

Docker images

ambasador Updated

mirantis.azurecr.io/core/external/nginx:1.38.31

baremetal-dnsmasq Updated

mirantis.azurecr.io/bm/baremetal-dnsmasq:base-2-25-alpine-20231128145936

baremetal-operator Updated

mirantis.azurecr.io/bm/baremetal-operator:base-2-25-alpine-20231204121500

bm-collective

mirantis.azurecr.io/bm/bm-collective:base-2-25-alpine-20231121115652

cluster-api-provider-baremetal Updated

mirantis.azurecr.io/core/cluster-api-provider-baremetal:1.38.31

ironic Updated

mirantis.azurecr.io/openstack/ironic:yoga-jammy-20231204153029

ironic-inspector Updated

mirantis.azurecr.io/openstack/ironic-inspector:yoga-jammy-20231204153029

ironic-prometheus-exporter Updated

mirantis.azurecr.io/stacklight/ironic-prometheus-exporter:0.1-20231204142028

kaas-ipam

mirantis.azurecr.io/bm/kaas-ipam:base-2-25-alpine-20231121164200

kubernetes-entrypoint

mirantis.azurecr.io/openstack/extra/kubernetes-entrypoint:v1.0.1-55b02f7-20231019172556

mariadb Updated

mirantis.azurecr.io/general/mariadb:10.6.14-focal-20231127070342

mcc-keepalived Updated

mirantis.azurecr.io/lcm/mcc-keepalived:v0.23.0-87-gc9d7d3b

metallb-controller Updated

mirantis.azurecr.io/bm/metallb/controller:v0.13.9-ef4faae9-amd64

metallb-speaker Updated

mirantis.azurecr.io/bm/metallb/speaker:v0.13.9-ef4faae9-amd64

syslog-ng Updated

mirantis.azurecr.io/bm/syslog-ng:base-alpine-20231121121917

Core artifacts

Artifact

Component

Paths

Bootstrap tarball Updated

bootstrap-darwin

https://binary.mirantis.com/core/bin/bootstrap-darwin-1.38.31.tgz

bootstrap-linux

https://binary.mirantis.com/core/bin/bootstrap-linux-1.38.31.tgz

Helm charts Updated

admission-controller

https://binary.mirantis.com/core/helm/admission-controller-1.38.31.tgz

agent-controller

https://binary.mirantis.com/core/helm/agent-controller-1.38.31.tgz

byo-credentials-controller

https://binary.mirantis.com/core/helm/byo-credentials-controller-1.38.31.tgz

byo-provider

https://binary.mirantis.com/core/helm/byo-provider-1.38.31.tgz

ceph-kcc-controller

https://binary.mirantis.com/core/helm/ceph-kcc-controller-1.38.31.tgz

cert-manager

https://binary.mirantis.com/core/helm/cert-manager-1.38.31.tgz

cinder-csi-plugin

https://binary.mirantis.com/core/helm/cinder-csi-plugin-1.38.31.tgz

client-certificate-controller

https://binary.mirantis.com/core/helm/client-certificate-controller-1.38.31.tgz

configuration-collector

https://binary.mirantis.com/core/helm/configuration-collector-1.38.31.tgz

event-controller

https://binary.mirantis.com/core/helm/event-controller-1.38.31.tgz

iam-controller

https://binary.mirantis.com/core/helm/iam-controller-1.38.31.tgz

kaas-exporter

https://binary.mirantis.com/core/helm/kaas-exporter-1.38.31.tgz

kaas-public-api

https://binary.mirantis.com/core/helm/kaas-public-api-1.38.31.tgz

kaas-ui

https://binary.mirantis.com/core/helm/kaas-ui-1.38.31.tgz

lcm-controller

https://binary.mirantis.com/core/helm/lcm-controller-1.38.31.tgz

license-controller

https://binary.mirantis.com/core/helm/license-controller-1.38.31.tgz

machinepool-controller

https://binary.mirantis.com/core/helm/machinepool-controller-1.38.31.tgz

mcc-cache

https://binary.mirantis.com/core/helm/mcc-cache-1.38.31.tgz

mcc-cache-warmup

https://binary.mirantis.com/core/helm/mcc-cache-warmup-1.38.31.tgz

metrics-server

https://binary.mirantis.com/core/helm/metrics-server-1.38.31.tgz

openstack-cloud-controller-manager

https://binary.mirantis.com/core/helm/openstack-cloud-controller-manager-1.38.31.tgz

openstack-provider

https://binary.mirantis.com/core/helm/openstack-provider-1.38.31.tgz

os-credentials-controller

https://binary.mirantis.com/core/helm/os-credentials-controller-1.38.31.tgz

portforward-controller

https://binary.mirantis.com/core/helm/portforward-controller-1.38.31.tgz

proxy-controller

https://binary.mirantis.com/core/helm/proxy-controller-1.38.31.tgz

rbac-controller

https://binary.mirantis.com/core/helm/rbac-controller-1.38.31.tgz

release-controller

https://binary.mirantis.com/core/helm/release-controller-1.38.31.tgz

rhellicense-controller

https://binary.mirantis.com/core/helm/rhellicense-controller-1.38.31.tgz

scope-controller

https://binary.mirantis.com/core/helm/scope-controller-1.38.31.tgz

squid-proxy

https://binary.mirantis.com/core/helm/squid-proxy-1.38.31.tgz

storage-discovery

https://binary.mirantis.com/core/helm/storage-discovery-1.38.31.tgz

user-controller

https://binary.mirantis.com/core/helm/user-controller-1.38.31.tgz

vsphere-cloud-controller-manager

https://binary.mirantis.com/core/helm/vsphere-cloud-controller-manager-1.38.31.tgz

vsphere-credentials-controller

https://binary.mirantis.com/core/helm/vsphere-credentials-controller-1.38.31.tgz

vsphere-csi-plugin

https://binary.mirantis.com/core/helm/vsphere-csi-plugin-1.38.31.tgz

vsphere-provider

https://binary.mirantis.com/core/helm/vsphere-provider-1.38.31.tgz

vsphere-vm-template-controller

https://binary.mirantis.com/core/helm/vsphere-vm-template-controller-1.38.31.tgz

Docker images

admission-controller Updated

mirantis.azurecr.io/core/admission-controller:1.38.31

agent-controller Updated

mirantis.azurecr.io/core/agent-controller:1.38.31

byo-cluster-api-controller Updated

mirantis.azurecr.io/core/byo-cluster-api-controller:1.38.31

byo-credentials-controller Updated

mirantis.azurecr.io/core/byo-credentials-controller:1.38.31

ceph-kcc-controller Updated

mirantis.azurecr.io/core/ceph-kcc-controller:1.38.31

cert-manager-controller

mirantis.azurecr.io/core/external/cert-manager-controller:v1.11.0-5

cinder-csi-plugin

mirantis.azurecr.io/lcm/kubernetes/cinder-csi-plugin:v1.27.2-11

client-certificate-controller Updated

mirantis.azurecr.io/core/client-certificate-controller:1.38.31

configuration-collector Updated

mirantis.azurecr.io/core/configuration-collector:1.38.31

csi-attacher

mirantis.azurecr.io/lcm/k8scsi/csi-attacher:v4.2.0-4

csi-node-driver-registrar

mirantis.azurecr.io/lcm/k8scsi/csi-node-driver-registrar:v2.7.0-4

csi-provisioner

mirantis.azurecr.io/lcm/k8scsi/csi-provisioner:v3.4.1-4

csi-resizer

mirantis.azurecr.io/lcm/k8scsi/csi-resizer:v1.7.0-4

csi-snapshotter

mirantis.azurecr.io/lcm/k8scsi/csi-snapshotter:v6.2.1-mcc-3

event-controller Updated

mirantis.azurecr.io/core/event-controller:1.38.31

frontend Updated

mirantis.azurecr.io/core/frontend:1.38.31

iam-controller Updated

mirantis.azurecr.io/core/iam-controller:1.38.31

kaas-exporter Updated

mirantis.azurecr.io/core/kaas-exporter:1.38.31

kproxy Updated

mirantis.azurecr.io/core/kproxy:1.38.31

lcm-controller Updated

mirantis.azurecr.io/core/lcm-controller:1.38.31

license-controller Updated

mirantis.azurecr.io/core/license-controller:1.38.31

livenessprobe

mirantis.azurecr.io/lcm/k8scsi/livenessprobe:v2.9.0-4

machinepool-controller Updated

mirantis.azurecr.io/core/machinepool-controller:1.38.31

mcc-haproxy Updated

mirantis.azurecr.io/lcm/mcc-haproxy:v0.23.0-87-gc9d7d3b

mcc-keepalived Updated

mirantis.azurecr.io/lcm/mcc-keepalived:v0.23.0-87-gc9d7d3b

metrics-server

mirantis.azurecr.io/lcm/metrics-server-amd64:v0.6.3-6

nginx Updated

mirantis.azurecr.io/core/external/nginx:1.38.31

openstack-cloud-controller-manager

mirantis.azurecr.io/lcm/kubernetes/openstack-cloud-controller-manager:v1.27.2-12

openstack-cluster-api-controller Updated

mirantis.azurecr.io/core/openstack-cluster-api-controller:1.38.31

os-credentials-controller Updated

mirantis.azurecr.io/core/os-credentials-controller:1.38.31

portforward-controller Updated

mirantis.azurecr.io/core/portforward-controller:1.38.31

proxy-controller Updated

mirantis.azurecr.io/core/proxy-controller:1.38.31

rbac-controller Updated

mirantis.azurecr.io/core/rbac-controller:1.38.31

registry

mirantis.azurecr.io/lcm/registry:v2.8.1-7

release-controller Updated

mirantis.azurecr.io/core/release-controller:1.38.31

rhellicense-controller Updated

mirantis.azurecr.io/core/rhellicense-controller:1.38.31

scope-controller Updated

mirantis.azurecr.io/core/scope-controller:1.38.31

squid-proxy

mirantis.azurecr.io/lcm/squid-proxy:0.0.1-10-g24a0d69

storage-discovery Updated

mirantis.azurecr.io/core/storage-discovery:1.38.31

user-controller Updated

mirantis.azurecr.io/core/user-controller:1.38.31

vsphere-cloud-controller-manager

mirantis.azurecr.io/lcm/kubernetes/vsphere-cloud-controller-manager:v1.27.0-5

vsphere-cluster-api-controller Updated

mirantis.azurecr.io/core/vsphere-cluster-api-controller:1.38.31

vsphere-credentials-controller Updated

mirantis.azurecr.io/core/vsphere-credentials-controller:1.38.31

vsphere-csi-driver

mirantis.azurecr.io/lcm/kubernetes/vsphere-csi-driver:v3.0.2-1

vsphere-csi-syncer

mirantis.azurecr.io/lcm/kubernetes/vsphere-csi-syncer:v3.0.2-1

vsphere-vm-template-controller Updated

mirantis.azurecr.io/core/vsphere-vm-template-controller:1.38.31

IAM artifacts

Artifact

Component

Path

Binaries

hash-generate-darwin

http://binary.mirantis.com/iam/bin/hash-generate-0.0.1-268-3cf7f17-darwin

hash-generate-linux

http://binary.mirantis.com/iam/bin/hash-generate-0.0.1-268-3cf7f17-linux

Helm charts Updated

iam

https://binary.mirantis.com/iam/helm/iam-2.6.3.tgz

Docker images

keycloak

n/a (replaced with mcc-keycloak)

kubectl New

mirantis.azurecr.io/stacklight/kubectl:1.22-20231201023019

kubernetes-entrypoint

mirantis.azurecr.io/openstack/extra/kubernetes-entrypoint:v1.0.1-55b02f7-20231019172556

mariadb Updated

mirantis.azurecr.io/general/mariadb:10.6.14-focal-20231127070342

mcc-keycloak New

mirantis.azurecr.io/iam/mcc-keycloak:22.0.5-1

Security notes

The table below includes the total numbers of addressed unique and common CVEs in images by product component since the Container Cloud 2.25.2 patch release. The common CVEs are issues addressed across several images.

Addressed CVEs - summary

Product component

CVE type

Critical

High

Total

Ceph

Unique

0

1

1

Common

0

3

3

KaaS core

Unique

2

9

11

Common

3

18

21

StackLight

Unique

1

18

19

Common

1

52

53

Mirantis Security Portal

For the detailed list of fixed and existing CVEs across the Mirantis Container Cloud and MOSK products, refer to Mirantis Security Portal.

MOSK CVEs

For the number of fixed CVEs in the MOSK-related components including OpenStack and Tungsten Fabric, refer to MOSK 23.3.3: Security notes.

Addressed issues

The following issues have been addressed in the Container Cloud patch release 2.25.3 along with the patch Cluster releases 17.0.3 and 16.0.3.

  • [37634][OpenStack] Fixed the issue with a management or managed cluster deployment or upgrade being blocked because all pods were stuck in the Pending state due to incorrect secrets being used to initialize the OpenStack external Cloud Provider Interface.

  • [37766][IAM] Fixed the issue with sign-in to the MKE web UI of the management cluster using the Sign in with External Provider option, which failed with the invalid parameter: redirect_uri error.

See also

Patch releases

2.25.2

The Container Cloud patch release 2.25.2, which is based on the 2.25.0 major release, provides the following updates:

  • Renewed support for attachment of MKE clusters that were not originally deployed by Container Cloud to vSphere-based management clusters.

  • Support for the patch Cluster releases 16.0.2 and 17.0.2, the latter corresponding to the Mirantis OpenStack for Kubernetes (MOSK) patch release 23.3.2.

  • Security fixes for CVEs in images.

This patch release also supports the latest major Cluster releases 17.0.0 and 16.0.0. It does not support greenfield deployments based on the deprecated Cluster releases 14.0.1, 15.0.1, 16.0.1, and 17.0.1. Use the latest available Cluster releases instead.

For main deliverables of the parent Container Cloud release of 2.25.2, refer to 2.25.0.

Artifacts

This section lists the artifacts of components included in the Container Cloud patch release 2.25.2. For artifacts of the Cluster releases introduced in 2.25.2, see patch Cluster releases 17.0.2 and 16.0.2.

Note

The components that are newly added, updated, deprecated, or removed as compared to the previous release version are marked with a corresponding superscript, for example, lcm-ansible Updated.

Bare metal artifacts

Artifact

Component

Path

Binaries

ironic-python-agent.initramfs

https://binary.mirantis.com/bm/bin/ironic/ipa/ubuntu/initramfs-yoga-focal-debug-20231012141354

ironic-python-agent.kernel

https://binary.mirantis.com/bm/bin/ironic/ipa/ubuntu/kernel-yoga-focal-debug-20231012141354

provisioning_ansible

https://binary.mirantis.com/bm/bin/ansible/provisioning_ansible-0.1.1-113-4f8b843.tgz

Helm charts Updated

baremetal-api

https://binary.mirantis.com/core/helm/baremetal-api-1.38.29.tgz

baremetal-operator

https://binary.mirantis.com/core/helm/baremetal-operator-1.38.29.tgz

baremetal-provider

https://binary.mirantis.com/core/helm/baremetal-provider-1.38.29.tgz

baremetal-public-api

https://binary.mirantis.com/core/helm/baremetal-public-api-1.38.29.tgz

kaas-ipam

https://binary.mirantis.com/core/helm/kaas-ipam-1.38.29.tgz

local-volume-provisioner

https://binary.mirantis.com/core/helm/local-volume-provisioner-1.38.29.tgz

metallb

https://binary.mirantis.com/core/helm/metallb-1.38.29.tgz

Docker images

ambasador Updated

mirantis.azurecr.io/core/external/nginx:1.38.29

baremetal-dnsmasq Updated

mirantis.azurecr.io/bm/baremetal-dnsmasq:base-2-25-alpine-20231121112823

baremetal-operator Updated

mirantis.azurecr.io/bm/baremetal-operator:base-2-25-alpine-20231121112816

bm-collective Updated

mirantis.azurecr.io/bm/bm-collective:base-2-25-alpine-20231121115652

cluster-api-provider-baremetal Updated

mirantis.azurecr.io/core/cluster-api-provider-baremetal:1.38.29

ironic Updated

mirantis.azurecr.io/openstack/ironic:yoga-jammy-20231120060019

ironic-inspector

mirantis.azurecr.io/openstack/ironic-inspector:yoga-jammy-20231030060018

ironic-prometheus-exporter

mirantis.azurecr.io/stacklight/ironic-prometheus-exporter:0.1-20230912104602

kaas-ipam Updated

mirantis.azurecr.io/bm/kaas-ipam:base-2-25-alpine-20231121164200

kubernetes-entrypoint

mirantis.azurecr.io/openstack/extra/kubernetes-entrypoint:v1.0.1-55b02f7-20231019172556

mariadb

mirantis.azurecr.io/general/mariadb:10.6.14-focal-20231024091216

mcc-keepalived

mirantis.azurecr.io/lcm/mcc-keepalived:v0.23.0-84-g8d74d7c

metallb-controller Updated

mirantis.azurecr.io/bm/metallb/controller:v0.13.9-ef4faae9-amd64

metallb-speaker Updated

mirantis.azurecr.io/bm/metallb/speaker:v0.13.9-ef4faae9-amd64

syslog-ng Updated

mirantis.azurecr.io/bm/syslog-ng:base-alpine-20231121121917

Core artifacts

Artifact

Component

Paths

Bootstrap tarball Updated

bootstrap-darwin

https://binary.mirantis.com/core/bin/bootstrap-darwin-1.38.29.tgz

bootstrap-linux

https://binary.mirantis.com/core/bin/bootstrap-linux-1.38.29.tgz

Helm charts Updated

admission-controller

https://binary.mirantis.com/core/helm/admission-controller-1.38.29.tgz

agent-controller

https://binary.mirantis.com/core/helm/agent-controller-1.38.29.tgz

byo-credentials-controller New

https://binary.mirantis.com/core/helm/byo-credentials-controller-1.38.29.tgz

byo-provider New

https://binary.mirantis.com/core/helm/byo-provider-1.38.29.tgz

ceph-kcc-controller

https://binary.mirantis.com/core/helm/ceph-kcc-controller-1.38.29.tgz

cert-manager

https://binary.mirantis.com/core/helm/cert-manager-1.38.29.tgz

cinder-csi-plugin

https://binary.mirantis.com/core/helm/cinder-csi-plugin-1.38.29.tgz

client-certificate-controller

https://binary.mirantis.com/core/helm/client-certificate-controller-1.38.29.tgz

configuration-collector

https://binary.mirantis.com/core/helm/configuration-collector-1.38.29.tgz

event-controller

https://binary.mirantis.com/core/helm/event-controller-1.38.29.tgz

iam-controller

https://binary.mirantis.com/core/helm/iam-controller-1.38.29.tgz

kaas-exporter

https://binary.mirantis.com/core/helm/kaas-exporter-1.38.29.tgz

kaas-public-api

https://binary.mirantis.com/core/helm/kaas-public-api-1.38.29.tgz

kaas-ui

https://binary.mirantis.com/core/helm/kaas-ui-1.38.29.tgz

lcm-controller

https://binary.mirantis.com/core/helm/lcm-controller-1.38.29.tgz

license-controller

https://binary.mirantis.com/core/helm/license-controller-1.38.29.tgz

machinepool-controller

https://binary.mirantis.com/core/helm/machinepool-controller-1.38.29.tgz

mcc-cache

https://binary.mirantis.com/core/helm/mcc-cache-1.38.29.tgz

mcc-cache-warmup

https://binary.mirantis.com/core/helm/mcc-cache-warmup-1.38.29.tgz

metrics-server

https://binary.mirantis.com/core/helm/metrics-server-1.38.29.tgz

openstack-cloud-controller-manager

https://binary.mirantis.com/core/helm/openstack-cloud-controller-manager-1.38.29.tgz

openstack-provider

https://binary.mirantis.com/core/helm/openstack-provider-1.38.29.tgz

os-credentials-controller

https://binary.mirantis.com/core/helm/os-credentials-controller-1.38.29.tgz

portforward-controller

https://binary.mirantis.com/core/helm/portforward-controller-1.38.29.tgz

proxy-controller

https://binary.mirantis.com/core/helm/proxy-controller-1.38.29.tgz

rbac-controller

https://binary.mirantis.com/core/helm/rbac-controller-1.38.29.tgz

release-controller

https://binary.mirantis.com/core/helm/release-controller-1.38.29.tgz

rhellicense-controller

https://binary.mirantis.com/core/helm/rhellicense-controller-1.38.29.tgz

scope-controller

https://binary.mirantis.com/core/helm/scope-controller-1.38.29.tgz

squid-proxy

https://binary.mirantis.com/core/helm/squid-proxy-1.38.29.tgz

storage-discovery

https://binary.mirantis.com/core/helm/storage-discovery-1.38.29.tgz

user-controller

https://binary.mirantis.com/core/helm/user-controller-1.38.29.tgz

vsphere-cloud-controller-manager

https://binary.mirantis.com/core/helm/vsphere-cloud-controller-manager-1.38.29.tgz

vsphere-credentials-controller

https://binary.mirantis.com/core/helm/vsphere-credentials-controller-1.38.29.tgz

vsphere-csi-plugin

https://binary.mirantis.com/core/helm/vsphere-csi-plugin-1.38.29.tgz

vsphere-provider

https://binary.mirantis.com/core/helm/vsphere-provider-1.38.29.tgz

vsphere-vm-template-controller

https://binary.mirantis.com/core/helm/vsphere-vm-template-controller-1.38.29.tgz

Docker images

admission-controller Updated

mirantis.azurecr.io/core/admission-controller:1.38.29

agent-controller Updated

mirantis.azurecr.io/core/agent-controller:1.38.29

byo-credentials-controller New

mirantis.azurecr.io/core/byo-credentials-controller:1.38.29

byo-provider New

mirantis.azurecr.io/core/byo-provider:1.38.29

ceph-kcc-controller Updated

mirantis.azurecr.io/core/ceph-kcc-controller:1.38.29

cert-manager-controller Updated

mirantis.azurecr.io/core/external/cert-manager-controller:v1.11.0-5

cinder-csi-plugin

mirantis.azurecr.io/lcm/kubernetes/cinder-csi-plugin:v1.27.2-11

client-certificate-controller Updated

mirantis.azurecr.io/core/client-certificate-controller:1.38.29

configuration-collector Updated

mirantis.azurecr.io/core/configuration-collector:1.38.29

csi-attacher

mirantis.azurecr.io/lcm/k8scsi/csi-attacher:v4.2.0-4

csi-node-driver-registrar

mirantis.azurecr.io/lcm/k8scsi/csi-node-driver-registrar:v2.7.0-4

csi-provisioner

mirantis.azurecr.io/lcm/k8scsi/csi-provisioner:v3.4.1-4

csi-resizer

mirantis.azurecr.io/lcm/k8scsi/csi-resizer:v1.7.0-4

csi-snapshotter

mirantis.azurecr.io/lcm/k8scsi/csi-snapshotter:v6.2.1-mcc-3

event-controller Updated

mirantis.azurecr.io/core/event-controller:1.38.29

frontend Updated

mirantis.azurecr.io/core/frontend:1.38.29

iam-controller Updated

mirantis.azurecr.io/core/iam-controller:1.38.29

kaas-exporter Updated

mirantis.azurecr.io/core/kaas-exporter:1.38.29

kproxy Updated

mirantis.azurecr.io/core/kproxy:1.38.29

lcm-controller Updated

mirantis.azurecr.io/core/lcm-controller:1.38.29

license-controller Updated

mirantis.azurecr.io/core/license-controller:1.38.29

livenessprobe

mirantis.azurecr.io/lcm/k8scsi/livenessprobe:v2.9.0-4

machinepool-controller Updated

mirantis.azurecr.io/core/machinepool-controller:1.38.29

mcc-haproxy

mirantis.azurecr.io/lcm/mcc-haproxy:v0.23.0-84-g8d74d7c

mcc-keepalived

mirantis.azurecr.io/lcm/mcc-keepalived:v0.23.0-84-g8d74d7c

metrics-server Updated

mirantis.azurecr.io/lcm/metrics-server-amd64:v0.6.3-6

nginx Updated

mirantis.azurecr.io/core/external/nginx:1.38.29

openstack-cloud-controller-manager Updated

mirantis.azurecr.io/lcm/kubernetes/openstack-cloud-controller-manager:v1.27.2-12

openstack-cluster-api-controller Updated

mirantis.azurecr.io/core/openstack-cluster-api-controller:1.38.29

os-credentials-controller Updated

mirantis.azurecr.io/core/os-credentials-controller:1.38.29

portforward-controller Updated

mirantis.azurecr.io/core/portforward-controller:1.38.29

proxy-controller Updated

mirantis.azurecr.io/core/proxy-controller:1.38.29

rbac-controller Updated

mirantis.azurecr.io/core/rbac-controller:1.38.29

registry

mirantis.azurecr.io/lcm/registry:v2.8.1-7

release-controller Updated

mirantis.azurecr.io/core/release-controller:1.38.29

rhellicense-controller Updated

mirantis.azurecr.io/core/rhellicense-controller:1.38.29

scope-controller Updated

mirantis.azurecr.io/core/scope-controller:1.38.29

squid-proxy

mirantis.azurecr.io/lcm/squid-proxy:0.0.1-10-g24a0d69

storage-discovery Updated

mirantis.azurecr.io/core/storage-discovery:1.38.29

user-controller Updated

mirantis.azurecr.io/core/user-controller:1.38.29

vsphere-cloud-controller-manager Updated

mirantis.azurecr.io/lcm/kubernetes/vsphere-cloud-controller-manager:v1.27.0-5

vsphere-cluster-api-controller Updated

mirantis.azurecr.io/core/vsphere-cluster-api-controller:1.38.29

vsphere-credentials-controller Updated

mirantis.azurecr.io/core/vsphere-credentials-controller:1.38.29

vsphere-csi-driver Updated

mirantis.azurecr.io/lcm/kubernetes/vsphere-csi-driver:v3.0.2-1

vsphere-csi-syncer Updated

mirantis.azurecr.io/lcm/kubernetes/vsphere-csi-syncer:v3.0.2-1

vsphere-vm-template-controller Updated

mirantis.azurecr.io/core/vsphere-vm-template-controller:1.38.29

IAM artifacts

Artifact

Component

Path

Binaries

hash-generate-darwin

http://binary.mirantis.com/iam/bin/hash-generate-0.0.1-268-3cf7f17-darwin

hash-generate-linux

http://binary.mirantis.com/iam/bin/hash-generate-0.0.1-268-3cf7f17-linux

Helm charts

iam

https://binary.mirantis.com/iam/helm/iam-2.5.10.tgz

Docker images

keycloak

mirantis.azurecr.io/iam/keycloak:0.6.0-1

kubernetes-entrypoint

mirantis.azurecr.io/openstack/extra/kubernetes-entrypoint:v1.0.1-55b02f7-20231019172556

mariadb

mirantis.azurecr.io/general/mariadb:10.6.14-focal-20231024091216

Security notes

The table below includes the total numbers of addressed unique and common CVEs in images by product component since the Container Cloud 2.25.1 patch release. The common CVEs are issues addressed across several images.

Addressed CVEs - summary

Product component

CVE type

Critical

High

Total

KaaS core

Unique

0

6

6

Common

0

20

20

Ceph

Unique

0

2

2

Common

0

6

6

StackLight

Unique

0

16

16

Common

0

70

70

Mirantis Security Portal

For the detailed list of fixed and existing CVEs across the Mirantis Container Cloud and MOSK products, refer to Mirantis Security Portal.

MOSK CVEs

For the number of fixed CVEs in the MOSK-related components including OpenStack and Tungsten Fabric, refer to MOSK 23.3.2: Security notes.

See also

Patch releases

2.25.1

The Container Cloud patch release 2.25.1, which is based on the 2.25.0 major release, provides the following updates:

  • Support for the patch Cluster releases 16.0.1 and 17.0.1, the latter corresponding to the Mirantis OpenStack for Kubernetes (MOSK) patch release 23.3.1.

  • Several product improvements. For details, see Enhancements.

  • Security fixes for CVEs in images.

This patch release also supports the latest major Cluster releases 17.0.0 and 16.0.0. It does not support greenfield deployments based on the deprecated Cluster releases 14.1.0, 14.0.1, and 15.0.1. Use the latest available Cluster releases instead.

For main deliverables of the parent Container Cloud release of 2.25.1, refer to 2.25.0.

Enhancements

This section outlines new features and enhancements introduced in the Container Cloud patch release 2.25.1 along with Cluster releases 17.0.1 and 16.0.1.

Support for MKE 3.7.2

Introduced support for Mirantis Kubernetes Engine (MKE) 3.7.2 on Container Cloud management and managed clusters. On existing managed clusters, MKE is updated to the latest supported version when you update your cluster to the patch Cluster release 17.0.1 or 16.0.1.

MKE options managed by Container Cloud

To simplify MKE configuration through the API, moved management of the MKE parameters controlled by Container Cloud from lcm-ansible to lcm-controller. Now, Container Cloud overrides only the set of MKE configuration parameters that it manages automatically.

Improvements in the MKE benchmark compliance for StackLight

Analyzed and fixed the majority of failed MKE benchmark compliance checks for StackLight. The following controls were analyzed:

Control ID

Control description

Analyzed item

5.2.7

Minimize the admission of containers with the NET_RAW capability

Containers with NET_RAW capability

5.2.6

Minimize the admission of root containers

  • Containers permitting root

  • Containers with the RunAsUser root or root not set

  • Containers with the SYS_ADMIN capability

  • Container UID is a range of hosts

Kubernetes network policies in StackLight

Introduced Kubernetes network policies for all StackLight components. The feature is implemented using the networkPolicies parameter that is enabled by default.

The Kubernetes NetworkPolicy resource allows controlling network connections to and from Pods within a cluster. This enhances security by restricting communication from compromised Pod applications and provides transparency into how applications communicate with each other.
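
To quickly confirm that the policies are in place, the following minimal sketch lists the NetworkPolicy objects created for StackLight components; it assumes that StackLight runs in the stacklight namespace, which may differ in your deployment:

    # List the NetworkPolicy objects created for StackLight components
    kubectl -n stacklight get networkpolicies

    # Inspect the ingress and egress rules of a specific policy
    kubectl -n stacklight describe networkpolicy <policy-name>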

External vSphere CCM with CSI supporting vSphere 6.7 on Kubernetes 1.27

Switched to the external vSphere cloud controller manager (CCM) that uses vSphere Container Storage Plug-in 3.0 for volume attachment. The feature implementation implies automatic migration of PersistentVolume and PersistentVolumeClaim objects.

The external vSphere CCM supports vSphere 6.7 on Kubernetes 1.27 as compared to the in-tree vSphere CCM that does not support vSphere 6.7 since Kubernetes 1.25.
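
As a minimal verification sketch after the switch, you can check which CSI drivers are registered and which provisioner created the existing persistent volumes. The driver name csi.vsphere.vmware.com mentioned in the comment is the upstream default and is an assumption here:

    # Check that the external vSphere CSI driver (for example, csi.vsphere.vmware.com) is registered
    kubectl get csidrivers

    # Show which provisioner created each persistent volume
    kubectl get pv -o custom-columns='NAME:.metadata.name,PROVISIONER:.metadata.annotations.pv\.kubernetes\.io/provisioned-by'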

Important

The major Cluster release 14.1.0 is the last Cluster release for the vSphere provider based on MCR 20.10 and MKE 3.6.6 with Kubernetes 1.24. Therefore, Mirantis highly recommends updating your existing vSphere-based managed clusters to the Cluster release 16.0.1, which contains newer versions of MCR and MKE with Kubernetes. Otherwise, your management cluster upgrade to Container Cloud 2.25.2 will be blocked.

For the update procedure, refer to Operations Guide: Update a patch Cluster release of a managed cluster.

Since Container Cloud 2.25.1, the major Cluster release 14.1.0 is deprecated. Greenfield vSphere-based deployments on this Cluster release are not supported. Use the patch Cluster release 16.0.1 for new deployments instead.

Artifacts

This section lists the artifacts of components included in the Container Cloud patch release 2.25.1. For artifacts of the Cluster releases introduced in 2.25.1, see patch Cluster releases 17.0.1 and 16.0.1.

Note

The components that are newly added, updated, deprecated, or removed as compared to the previous release version are marked with a corresponding superscript, for example, lcm-ansible Updated.

Bare metal artifacts

Artifact

Component

Path

Binaries Updated

ironic-python-agent.initramfs

https://binary.mirantis.com/bm/bin/ironic/ipa/ubuntu/initramfs-yoga-focal-debug-20231012141354

ironic-python-agent.kernel

https://binary.mirantis.com/bm/bin/ironic/ipa/ubuntu/kernel-yoga-focal-debug-20231012141354

provisioning_ansible

https://binary.mirantis.com/bm/bin/ansible/provisioning_ansible-0.1.1-113-4f8b843.tgz

Helm charts Updated

baremetal-api

https://binary.mirantis.com/core/helm/baremetal-api-1.38.22.tgz

baremetal-operator

https://binary.mirantis.com/core/helm/baremetal-operator-1.38.22.tgz

baremetal-provider

https://binary.mirantis.com/core/helm/baremetal-provider-1.38.22.tgz

baremetal-public-api

https://binary.mirantis.com/core/helm/baremetal-public-api-1.38.22.tgz

kaas-ipam

https://binary.mirantis.com/core/helm/kaas-ipam-1.38.22.tgz

local-volume-provisioner

https://binary.mirantis.com/core/helm/local-volume-provisioner-1.38.22.tgz

metallb

https://binary.mirantis.com/core/helm/metallb-1.38.22.tgz

Docker images Updated

ambasador

mirantis.azurecr.io/core/external/nginx:1.38.22

baremetal-dnsmasq

mirantis.azurecr.io/bm/baremetal-dnsmasq:base-alpine-20231030180650

baremetal-operator

mirantis.azurecr.io/bm/baremetal-operator:base-alpine-20231101201729

bm-collective

mirantis.azurecr.io/bm/bm-collective:base-alpine-20231027135748

cluster-api-provider-baremetal

mirantis.azurecr.io/core/cluster-api-provider-baremetal:1.38.22

ironic

mirantis.azurecr.io/openstack/ironic:yoga-jammy-20231030060018

ironic-inspector

mirantis.azurecr.io/openstack/ironic-inspector:yoga-jammy-20231030060018

ironic-prometheus-exporter

mirantis.azurecr.io/stacklight/ironic-prometheus-exporter:0.1-20230912104602

kaas-ipam

mirantis.azurecr.io/bm/kaas-ipam:base-alpine-20231027151726

kubernetes-entrypoint

mirantis.azurecr.io/openstack/extra/kubernetes-entrypoint:v1.0.1-55b02f7-20231019172556

mariadb

mirantis.azurecr.io/general/mariadb:10.6.14-focal-20231024091216

mcc-keepalived

mirantis.azurecr.io/lcm/mcc-keepalived:v0.23.0-84-g8d74d7c

metallb-controller

mirantis.azurecr.io/bm/metallb/controller:v0.13.9-fd3b03b0-amd64

metallb-speaker

mirantis.azurecr.io/bm/metallb/speaker:v0.13.9-fd3b03b0-amd64

syslog-ng

mirantis.azurecr.io/bm/syslog-ng:base-apline-20231030181839

Core artifacts

Artifact

Component

Paths

Bootstrap tarball Updated

bootstrap-darwin

https://binary.mirantis.com/core/bin/bootstrap-darwin-1.38.22.tgz

bootstrap-linux

https://binary.mirantis.com/core/bin/bootstrap-linux-1.38.22.tgz

Helm charts Updated

admission-controller

https://binary.mirantis.com/core/helm/admission-controller-1.38.22.tgz

agent-controller

https://binary.mirantis.com/core/helm/agent-controller-1.38.22.tgz

ceph-kcc-controller

https://binary.mirantis.com/core/helm/ceph-kcc-controller-1.38.22.tgz

cert-manager

https://binary.mirantis.com/core/helm/cert-manager-1.38.22.tgz

cinder-csi-plugin

https://binary.mirantis.com/core/helm/cinder-csi-plugin-1.38.22.tgz

client-certificate-controller

https://binary.mirantis.com/core/helm/client-certificate-controller-1.38.22.tgz

configuration-collector

https://binary.mirantis.com/core/helm/configuration-collector-1.38.22.tgz

event-controller

https://binary.mirantis.com/core/helm/event-controller-1.38.22.tgz

iam-controller

https://binary.mirantis.com/core/helm/iam-controller-1.38.22.tgz

kaas-exporter

https://binary.mirantis.com/core/helm/kaas-exporter-1.38.22.tgz

kaas-public-api

https://binary.mirantis.com/core/helm/kaas-public-api-1.38.22.tgz

kaas-ui

https://binary.mirantis.com/core/helm/kaas-ui-1.38.22.tgz

lcm-controller

https://binary.mirantis.com/core/helm/lcm-controller-1.38.22.tgz

license-controller

https://binary.mirantis.com/core/helm/license-controller-1.38.22.tgz

machinepool-controller

https://binary.mirantis.com/core/helm/machinepool-controller-1.38.22.tgz

mcc-cache

https://binary.mirantis.com/core/helm/mcc-cache-1.38.22.tgz

mcc-cache-warmup

https://binary.mirantis.com/core/helm/mcc-cache-warmup-1.38.22.tgz

metrics-server

https://binary.mirantis.com/core/helm/metrics-server-1.38.22.tgz

openstack-cloud-controller-manager

https://binary.mirantis.com/core/helm/openstack-cloud-controller-manager-1.38.22.tgz

openstack-provider

https://binary.mirantis.com/core/helm/openstack-provider-1.38.22.tgz

os-credentials-controller

https://binary.mirantis.com/core/helm/os-credentials-controller-1.38.22.tgz

portforward-controller

https://binary.mirantis.com/core/helm/portforward-controller-1.38.22.tgz

proxy-controller

https://binary.mirantis.com/core/helm/proxy-controller-1.38.22.tgz

rbac-controller

https://binary.mirantis.com/core/helm/rbac-controller-1.38.22.tgz

release-controller

https://binary.mirantis.com/core/helm/release-controller-1.38.22.tgz

rhellicense-controller

https://binary.mirantis.com/core/helm/rhellicense-controller-1.38.22.tgz

scope-controller

https://binary.mirantis.com/core/helm/scope-controller-1.38.22.tgz

squid-proxy

https://binary.mirantis.com/core/helm/squid-proxy-1.38.22.tgz

storage-discovery

https://binary.mirantis.com/core/helm/storage-discovery-1.38.22.tgz

user-controller

https://binary.mirantis.com/core/helm/user-controller-1.38.22.tgz

vsphere-cloud-controller-manager New

https://binary.mirantis.com/core/helm/vsphere-cloud-controller-manager-1.38.22.tgz

vsphere-credentials-controller

https://binary.mirantis.com/core/helm/vsphere-credentials-controller-1.38.22.tgz

vsphere-csi-plugin New

https://binary.mirantis.com/core/helm/vsphere-csi-plugin-1.38.22.tgz

vsphere-provider

https://binary.mirantis.com/core/helm/vsphere-provider-1.38.22.tgz

vsphere-vm-template-controller

https://binary.mirantis.com/core/helm/vsphere-vm-template-controller-1.38.22.tgz

Docker images

admission-controller Updated

mirantis.azurecr.io/core/admission-controller:1.38.22

agent-controller Updated

mirantis.azurecr.io/core/agent-controller:1.38.22

ceph-kcc-controller Updated

mirantis.azurecr.io/core/ceph-kcc-controller:1.38.22

cert-manager-controller Updated

mirantis.azurecr.io/core/external/cert-manager-controller:v1.11.0-4

cinder-csi-plugin Updated

mirantis.azurecr.io/lcm/kubernetes/cinder-csi-plugin:v1.27.2-11

client-certificate-controller Updated

mirantis.azurecr.io/core/client-certificate-controller:1.38.22

configuration-collector Updated

mirantis.azurecr.io/core/configuration-collector:1.38.22

csi-attacher Updated

mirantis.azurecr.io/lcm/k8scsi/csi-attacher:v4.2.0-4

csi-node-driver-registrar Updated

mirantis.azurecr.io/lcm/k8scsi/csi-node-driver-registrar:v2.7.0-4

csi-provisioner Updated

mirantis.azurecr.io/lcm/k8scsi/csi-provisioner:v3.4.1-4

csi-resizer Updated

mirantis.azurecr.io/lcm/k8scsi/csi-resizer:v1.7.0-4

csi-snapshotter Updated

mirantis.azurecr.io/lcm/k8scsi/csi-snapshotter:v6.2.1-mcc-3

event-controller Updated

mirantis.azurecr.io/core/event-controller:1.38.22

frontend Updated

mirantis.azurecr.io/core/frontend:1.38.22

iam-controller Updated

mirantis.azurecr.io/core/iam-controller:1.38.22

kaas-exporter Updated

mirantis.azurecr.io/core/kaas-exporter:1.38.22

kproxy Updated

mirantis.azurecr.io/core/kproxy:1.38.22

lcm-controller Updated

mirantis.azurecr.io/core/lcm-controller:1.38.22

license-controller Updated

mirantis.azurecr.io/core/license-controller:1.38.22

livenessprobe Updated

mirantis.azurecr.io/lcm/k8scsi/livenessprobe:v2.9.0-4

machinepool-controller Updated

mirantis.azurecr.io/core/machinepool-controller:1.38.22

mcc-haproxy Updated

mirantis.azurecr.io/lcm/mcc-haproxy:v0.23.0-84-g8d74d7c

mcc-keepalived Updated

mirantis.azurecr.io/lcm/mcc-keepalived:v0.23.0-84-g8d74d7c

metrics-server Updated

mirantis.azurecr.io/lcm/metrics-server-amd64:v0.6.3-4

nginx Updated

mirantis.azurecr.io/core/external/nginx:1.38.22

openstack-cloud-controller-manager Updated

mirantis.azurecr.io/lcm/kubernetes/openstack-cloud-controller-manager:v1.27.2-11

openstack-cluster-api-controller Updated

mirantis.azurecr.io/core/openstack-cluster-api-controller:1.38.22

os-credentials-controller Updated

mirantis.azurecr.io/core/os-credentials-controller:1.38.22

portforward-controller Updated

mirantis.azurecr.io/core/portforward-controller:1.38.22

proxy-controller Updated

mirantis.azurecr.io/core/proxy-controller:1.38.22

rbac-controller Updated

mirantis.azurecr.io/core/rbac-controller:1.38.22

registry Updated

mirantis.azurecr.io/lcm/registry:v2.8.1-7

release-controller Updated

mirantis.azurecr.io/core/release-controller:1.38.22

rhellicense-controller Updated

mirantis.azurecr.io/core/rhellicense-controller:1.38.22

scope-controller Updated

mirantis.azurecr.io/core/scope-controller:1.38.22

squid-proxy

mirantis.azurecr.io/lcm/squid-proxy:0.0.1-10-g24a0d69

storage-discovery Updated

mirantis.azurecr.io/core/storage-discovery:1.38.22

user-controller Updated

mirantis.azurecr.io/core/user-controller:1.38.22

vsphere-cloud-controller-manager New

mirantis.azurecr.io/lcm/kubernetes/vsphere-cloud-controller-manager:v1.27.0-4

vsphere-cluster-api-controller Updated

mirantis.azurecr.io/core/vsphere-cluster-api-controller:1.38.22

vsphere-credentials-controller Updated

mirantis.azurecr.io/core/vsphere-credentials-controller:1.38.22

vsphere-csi-driver New

mirantis.azurecr.io/core/external/vsphere-csi-driver:v3.0.2

vsphere-csi-syncer New

mirantis.azurecr.io/core/external/vsphere-csi-syncer:v3.0.2

vsphere-vm-template-controller Updated

mirantis.azurecr.io/core/vsphere-vm-template-controller:1.38.22

IAM artifacts

Artifact

Component

Path

Binaries

hash-generate-darwin

http://binary.mirantis.com/iam/bin/hash-generate-0.0.1-268-3cf7f17-darwin

hash-generate-linux

http://binary.mirantis.com/iam/bin/hash-generate-0.0.1-268-3cf7f17-linux

Helm charts Updated

iam

https://binary.mirantis.com/iam/helm/iam-2.5.10.tgz

Docker images Updated

keycloak

mirantis.azurecr.io/iam/keycloak:0.6.0-1

kubernetes-entrypoint

mirantis.azurecr.io/openstack/extra/kubernetes-entrypoint:v1.0.1-55b02f7-20231019172556

mariadb

mirantis.azurecr.io/general/mariadb:10.6.14-focal-20231024091216

Security notes

The table below includes the total numbers of addressed unique and common CVEs in images by product component since the Container Cloud 2.25.0 major release. The common CVEs are issues addressed across several images.

Addressed CVEs - summary

Product component

CVE type

Critical

High

Total

KaaS core

Unique

0

12

12

Common

0

280

280

Ceph

Unique

0

8

8

Common

0

41

41

StackLight

Unique

4

33

37

Common

18

130

148

Mirantis Security Portal

For the detailed list of fixed and existing CVEs across the Mirantis Container Cloud and MOSK products, refer to Mirantis Security Portal.

MOSK CVEs

For the number of fixed CVEs in the MOSK-related components including OpenStack and Tungsten Fabric, refer to MOSK 23.3.1: Security notes.

Addressed issues

The following issues have been addressed in the Container Cloud patch release 2.25.1 along with the patch Cluster releases 17.0.1 and 16.0.1.

  • [35426] [StackLight] Fixed the issue with the prometheus-libvirt-exporter Pod failing to reconnect to libvirt after the libvirt Pod recovery from a failure.

  • [35339] [LCM] Fixed the issue with the LCM Ansible task of copying kubectl from the ucp-hyperkube image failing if kubectl exec is in use, for example, during a management cluster upgrade.

  • [35089] [bare metal, Calico] Fixed the issue with arbitrary Kubernetes pods getting stuck in an error loop due to a failed Calico networking setup for that pod.

  • [33936] [bare metal, Calico] Fixed the issue with deletion failure of a controller node during machine replacement due to the upstream Calico issue.

See also

Patch releases

2.25.0

The Mirantis Container Cloud major release 2.25.0:

  • Introduces support for the Cluster release 17.0.0 that is based on the Cluster release 16.0.0 and represents Mirantis OpenStack for Kubernetes (MOSK) 23.3.

  • Introduces support for the Cluster release 16.0.0 that is based on Mirantis Container Runtime (MCR) 23.0.7 and Mirantis Kubernetes Engine (MKE) 3.7.1 with Kubernetes 1.27.

  • Introduces support for the Cluster release 14.1.0 that is dedicated for the vSphere provider only. This is the last Cluster release for the vSphere provider based on MKE 3.6.6 with Kubernetes 1.24.

  • Does not support greenfield deployments on deprecated Cluster releases of the 15.x and 14.x series. Use the latest available Cluster releases of the series instead.

    Caution

    Make sure to update the Cluster release version of your managed cluster before the current Cluster release version becomes unsupported by a new Container Cloud release version. Otherwise, Container Cloud stops auto-upgrade and eventually Container Cloud itself becomes unsupported.

This section outlines release notes for the Container Cloud release 2.25.0.

Enhancements

This section outlines new features and enhancements introduced in the Container Cloud release 2.25.0. For the list of enhancements delivered with the Cluster releases introduced by Container Cloud 2.25.0, see 17.0.0, 16.0.0, and 14.1.0.

Container Cloud Bootstrap v2

Implemented Container Cloud Bootstrap v2 that provides an exceptional user experience to set up Container Cloud. With Bootstrap v2, you also gain access to a comprehensive and user-friendly web UI for the OpenStack and vSphere providers.

Bootstrap v2 empowers you to effortlessly provision management clusters before deployment, while benefiting from a streamlined process that isolates each step. This approach not only simplifies the bootstrap process but also enhances troubleshooting capabilities for addressing any potential intermediate failures.

Note

The Bootstrap web UI support for the bare metal provider will be added in one of the following Container Cloud releases.

General availability for ‘MetalLBConfigTemplate’ and ‘MetalLBConfig’ objects

Completed development of the MetalLB configuration related to address allocation and announcement for load-balanced services using the MetalLBConfigTemplate object for bare metal and the MetalLBConfig object for vSphere. Container Cloud uses these objects in default templates as recommended during creation of a management or managed cluster.

At the same time, removed the possibility to use the deprecated options, such as the configInline value of the MetalLB chart and the use of Subnet objects without the new MetalLBConfigTemplate and MetalLBConfig objects.

The automated migration that was applied to these deprecated options during creation of clusters of any type or during cluster update to Container Cloud 2.24.x is removed during your management cluster upgrade to Container Cloud 2.25.0. After that, any changes in the MetalLB configuration related to address allocation and announcement for load-balanced services are applied using the MetalLBConfig, MetalLBConfigTemplate, and Subnet objects only.
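
To review the resulting configuration on a cluster, you can inspect these objects directly. The following is a minimal sketch; the lowercase plural resource names are assumed to match the object kinds listed above:

    # Review the MetalLB address allocation and announcement configuration
    kubectl get metallbconfigs,metallbconfigtemplates,subnets --all-namespaces

    # Inspect a specific MetalLBConfig object
    kubectl -n <project-namespace> get metallbconfig <name> -o yaml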

Manual IP address allocation for bare metal hosts during PXE provisioning

Technology Preview

Implemented the following annotations for bare metal hosts that enable manual allocation of IP addresses during PXE provisioning on managed clusters:

  • host.dnsmasqs.metal3.io/address - assigns a specific IP address to a host

  • baremetalhost.metal3.io/detached - pauses automatic host management

These annotations are helpful if you have a limited amount of free and unused IP addresses for server provisioning. Using these annotations, you can manually create bare metal hosts one by one and provision servers in small, manually managed chunks.
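
For illustration only, the following sketch shows how such annotations could be set with kubectl; the namespace, host name, and IP address are placeholders, and the exact annotation values must follow the product procedure for manual host provisioning:

    # Pause automatic management of the bare metal host
    kubectl -n <project-namespace> annotate baremetalhost <host-name> baremetalhost.metal3.io/detached=""

    # Assign a specific IP address to the host for PXE provisioning
    kubectl -n <project-namespace> annotate baremetalhost <host-name> host.dnsmasqs.metal3.io/address=<ip-address>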

Status of infrastructure health for bare metal and OpenStack providers

Implemented the Infrastructure Status condition to monitor infrastructure readiness in the Container Cloud web UI during cluster deployment for bare metal and OpenStack providers. Readiness of the following components is monitored:

  • Bare metal: the MetalLBConfig object along with MetalLB and DHCP subnets

  • OpenStack: cluster network, routers, load balancers, and Bastion along with their ports and floating IPs

For the bare metal provider, also implemented the Infrastructure Status condition for machines to monitor readiness of the IPAMHost, L2Template, BareMetalHost, and BareMetalHostProfile objects associated with the machine.
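
Outside the web UI, these conditions can also be inspected with generic kubectl commands. The sketch below is an assumption about where the condition surfaces in the object status and may need adjustment for your environment:

    # Inspect cluster-level conditions, including the Infrastructure Status condition
    kubectl -n <project-namespace> describe cluster <cluster-name> | grep -A 3 -i infrastructure

    # Inspect machine-level conditions for the bare metal provider
    kubectl -n <project-namespace> describe machine <machine-name> | grep -A 3 -i infrastructure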

General availability for RHEL 8.7 on vSphere-based clusters

Introduced general availability support for RHEL 8.7 on VMware vSphere-based clusters. You can install this operating system on any type of Container Cloud cluster, including the bootstrap node.

Note

RHEL 7.9 is not supported as the operating system for the bootstrap node.

Caution

A Container Cloud cluster based on mixed RHEL versions, such as RHEL 7.9 and 8.7, is not supported.

Automatic cleanup of old Ubuntu kernel packages

Implemented automatic cleanup of old Ubuntu kernel and other unnecessary system packages. During cleanup, Container Cloud keeps the two most recent kernel versions, which is the default behavior of the Ubuntu apt autoremove command.

Mirantis recommends keeping two kernel versions, with the previous version serving as a fallback in case the current kernel becomes unstable. However, if you must keep only the latest version of the kernel packages, you can use the cleanup-kernel-packages script after considering all possible risks.
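
As an illustration of the default behavior, you can preview on a node which kernel packages apt would remove before the cleanup runs. This uses standard Ubuntu tooling rather than a Container Cloud command:

    # Dry run: show which old kernel packages would be removed automatically
    sudo apt-get -s autoremove | grep linux-image

    # List the kernel images currently installed on the node
    dpkg --list 'linux-image*' | grep ^ii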

Configuration of a custom OIDC provider for MKE on managed clusters

Implemented the ability to configure a custom OpenID Connect (OIDC) provider for MKE on managed clusters using the ClusterOIDCConfiguration custom resource. Using this resource, you can add your own OIDC provider configuration to authenticate user requests to Kubernetes.

Note

For OpenStack and StackLight, Container Cloud supports only Keycloak, which is configured on the management cluster, as the OIDC provider.

The admin role for management cluster

Implemented the management-admin OIDC role to grant full admin access specifically to a management cluster. This role enables the user to manage Pods and all other resources of the cluster, for example, for debugging purposes.

General availability for graceful machine deletion

Introduced general availability support for graceful machine deletion with a safe cleanup of node resources:

  • Changed the default deletion policy from unsafe to graceful for machine deletion using the Container Cloud API.

    Using the deletionPolicy: graceful parameter in the providerSpec.value section of the Machine object (see the example after this list), the cloud provider controller prepares a machine for deletion by cordoning, draining, and removing the related node from Docker Swarm. If required, you can abort a machine deletion when using deletionPolicy: graceful, but only before the related node is removed from Docker Swarm.

  • Implemented the following machine deletion methods in the Container Cloud web UI: Graceful, Unsafe, Forced.

  • Added support for deletion of manager machines, which is intended only for replacement or recovery of failed nodes, for MOSK-based clusters using either of the deletion policies mentioned above.
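
The following Machine object fragment illustrates where the deletion policy resides. This is a minimal sketch based on the field path described above, not a complete Machine specification:

spec:
  providerSpec:
    value:
      deletionPolicy: graceful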

General availability for parallel update of worker nodes

Completed development of the parallel update of worker nodes during cluster update by implementing the ability to configure the required options using the Container Cloud web UI. Parallelizing node update operations significantly optimizes the update efficiency of large clusters.

The following options are added to the Create Cluster window:

  • Parallel Upgrade Of Worker Machines that sets the maximum number of worker nodes to update simultaneously

  • Parallel Preparation For Upgrade Of Worker Machines that sets the maximum number of worker nodes for which new artifacts are downloaded at a given moment of time

Addressed issues

The following issues have been addressed in the Mirantis Container Cloud release 2.25.0 along with the Cluster releases 17.0.0, 16.0.0, and 14.1.0.

Note

This section provides descriptions of issues addressed since the last Container Cloud patch release 2.24.5.

For details on addressed issues in earlier patch releases since 2.24.0, which are also included into the major release 2.25.0, refer to 2.24.x patch releases.

  • [34462] [BM] Fixed the issue with incorrect handling of the DHCP egress traffic by reconfiguring the external traffic policy for the dhcp-lb Kubernetes Service. For details about the issue, refer to the Kubernetes upstream bug.

    On existing clusters with multiple L2 segments using DHCP relays on the border switches, manually point the DHCP relays on your network infrastructure to the new IP address of the dhcp-lb Service of the Container Cloud cluster in order to successfully provision new nodes or reprovision existing ones.

    To obtain the new IP address:

    kubectl -n kaas get service dhcp-lb
    
  • [35429] [BM] Fixed the issue with the WireGuard interface not having the IPv4 address assigned. The fix implies automatic restart of the calico-node Pod to allocate the IPv4 address on the WireGuard interface.

  • [36131] [BM] Fixed the issue with IpamHost object changes not being propagated to LCMMachine during netplan configuration after cluster deployment.

  • [34657] [LCM] Fixed the issue with iam-keycloak Pods not starting after powering up master nodes and starting the Container Cloud upgrade right after.

  • [34750] [LCM] Fixed the issue with journald generating a lot of log messages that already exist in the auditd log due to enabled systemd-journald-audit.socket.

  • [35738] [StackLight] Fixed the issue with ucp-node-exporter being unable to bind port 9100 and failing to start due to a conflict with the StackLight node-exporter binding the same port.

    The resolution of the issue involves an automatic change of the port for the StackLight node-exporter from 9100 to 19100. No manual port update is required.

    If your cluster uses a firewall, add an additional firewall rule that grants the same permissions to port 19100 as those currently assigned to port 9100 on all cluster nodes.

  • [34296] [StackLight] Fixed the issue with the CPU over-consumption by helm-controller leading to the KubeContainersCPUThrottlingHigh alert firing.

Known issues

This section lists known issues with workarounds for the Mirantis Container Cloud release 2.25.0 including the Cluster releases 17.0.0, 16.0.0, and 14.1.0.

For other issues that can occur while deploying and operating a Container Cloud cluster, see Deployment Guide: Troubleshooting and Operations Guide: Troubleshooting.

Note

This section also outlines still valid known issues from previous Container Cloud releases.

Bare metal
[42386] A load balancer service does not obtain the external IP address

Due to the MetalLB upstream issue, a load balancer service may not obtain the external IP address.

The issue occurs when two services share the same external IP address and have the same externalTrafficPolicy value. Initially, the services have the external IP address assigned and are accessible. After modifying the externalTrafficPolicy value for both services from Cluster to Local, the first service that was changed remains with no external IP address assigned, while the second service, which was changed later, has the external IP assigned as expected.

To work around the issue, make a dummy change to the service object where external IP is <pending>:

  1. Identify the service that is stuck:

    kubectl get svc -A | grep pending
    

    Example of system response:

    stacklight  iam-proxy-prometheus  LoadBalancer  10.233.28.196  <pending>  443:30430/TCP
    
  2. Add an arbitrary label to the service that is stuck. For example:

    kubectl label svc -n stacklight iam-proxy-prometheus reconcile=1
    

    Example of system response:

    service/iam-proxy-prometheus labeled
    
  3. Verify that the external IP was allocated to the service:

    kubectl get svc -n stacklight iam-proxy-prometheus
    

    Example of system response:

    NAME                  TYPE          CLUSTER-IP     EXTERNAL-IP  PORT(S)        AGE
    iam-proxy-prometheus  LoadBalancer  10.233.28.196  10.0.34.108  443:30430/TCP  12d
    
[35089] Calico does not set up networking for a pod

Fixed in 17.0.1 and 16.0.1 for MKE 3.7.2

An arbitrary Kubernetes pod may get stuck in an error loop due to a failed Calico networking setup for that pod. The pod cannot access any network resources. The issue occurs more often during cluster upgrade or node replacement, but it can sometimes happen during a new deployment as well.

You may find the following log for the failed pod IP (for example, 10.233.121.132) in calico-node logs:

felix/route_table.go 898: Syncing routes: found unexpected route; ignoring due to grace period. dest=10.233.121.132/32 ifaceName="cali9731b965838" ifaceRegex="^cali." ipVersion=0x4 tableIndex=254
felix/route_table.go 898: Syncing routes: found unexpected route; ignoring due to grace period. dest=10.233.121.132/32 ifaceName="cali9731b965838" ifaceRegex="^cali." ipVersion=0x4 tableIndex=254
...
felix/route_table.go 902: Remove old route dest=10.233.121.132/32 ifaceName="cali9731b965838" ifaceRegex="^cali.*" ipVersion=0x4 routeProblems=[]string{"unexpected route"} tableIndex=254
felix/conntrack.go 90: Removing conntrack flows ip=10.233.121.132

The workaround is to manually restart the affected pod:

kubectl delete pod <failedPodID>
[33936] Deletion failure of a controller node during machine replacement

Fixed in 17.0.1 and 16.0.1 for MKE 3.7.2

Due to the upstream Calico issue, a controller node cannot be deleted if the calico-node Pod is stuck blocking node deletion. One of the symptoms is the following warning in the baremetal-operator logs:

Resolving dependency Service dhcp-lb in namespace kaas failed: \
the server was unable to return a response in the time allotted,\
but may still be processing the request (get endpoints dhcp-lb).

As a workaround, delete the Pod that is stuck to retrigger the node deletion.

[24005] Deletion of a node with ironic Pod is stuck in the Terminating state

During deletion of a manager machine running the ironic Pod from a bare metal management cluster, the following problems occur:

  • All Pods are stuck in the Terminating state

  • A new ironic Pod fails to start

  • The related bare metal host is stuck in the deprovisioning state

As a workaround, before deletion of the node running the ironic Pod, cordon and drain the node using the kubectl cordon <nodeName> and kubectl drain <nodeName> commands.


OpenStack
[37634] Cluster deployment or upgrade is blocked by all pods in ‘Pending’ state

Fixed in 17.0.3 and 16.0.3

When using OpenStackCredential with a custom CACert, a management or managed cluster deployment or upgrade is blocked by all pods being stuck in the Pending state. The issue is caused by incorrect secrets being used to initialize the OpenStack external Cloud Provider Interface.

As a workaround, copy CACert from the OpenStackCredential object to openstack-ca-secret:

kubectl --kubeconfig <pathToFailedClusterKubeconfig> patch secret -n kube-system openstack-ca-secret -p '{"data":{"ca.pem":"'$(kubectl --kubeconfig <pathToManagementClusterKubeconfig> -n <affectedProjectName> get openstackcredentials <credentialsName> -o go-template="{{.spec.CACert}}")'"}}'

If the CACert from the OpenStackCredential is not base64-encoded:

kubectl --kubeconfig <pathToFailedClusterKubeconfig> patch secret -n kube-system openstack-ca-secret -p '{"data":{"ca.pem":"'$(kubectl --kubeconfig <pathToManagementClusterKubeconfig> -n <affectedProjectName> get openstackcredentials <credentialsName> -o go-template="{{.spec.CACert}}" | base64)'"}}'

In either command above, replace the following values:

  • <pathToFailedClusterKubeconfig> is the file path to the affected managed or management cluster kubeconfig.

  • <pathToManagementClusterKubeconfig> is the file path to the Container Cloud management cluster kubeconfig.

  • <affectedProjectName> is the Container Cloud project name containing the cluster with stuck pods. For a management cluster, the value is default.

  • <credentialsName> is the OpenStackCredential name used for the deployment.


IAM
[37766] Sign-in to the MKE web UI fails with ‘invalid parameter: redirect_uri’

Fixed in 17.0.3 and 16.0.3

A sign-in to the MKE web UI of the management cluster using the Sign in with External Provider option can fail with the invalid parameter: redirect_uri error.

Workaround:

  1. Log in to the Keycloak admin console.

  2. In the sidebar menu, switch to the IAM realm.

  3. Navigate to Clients > kaas.

  4. On the page, navigate to Settings > Access settings > Valid redirect URIs.

  5. Add https://<mgmt mke ip>:6443/* to the list of valid redirect URIs and click Save.

  6. Refresh the browser window with the sign-in URI.


LCM
[31186,34132] Pods get stuck during MariaDB operations

During MariaDB operations on a management cluster, Pods may get stuck in continuous restarts with the following example error:

[ERROR] WSREP: Corrupt buffer header: \
addr: 0x7faec6f8e518, \
seqno: 3185219421952815104, \
size: 909455917, \
ctx: 0x557094f65038, \
flags: 11577. store: 49, \
type: 49

Workaround:

  1. Create a backup of the /var/lib/mysql directory on the mariadb-server Pod.

  2. Verify that other replicas are up and ready.

  3. Remove the galera.cache file for the affected mariadb-server Pod.

  4. Remove the affected mariadb-server Pod or wait until it is automatically restarted.

After Kubernetes restarts the Pod, the Pod clones the database in 1-2 minutes and restores the quorum.
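
The steps above can be performed with kubectl. The following sketch assumes that the mariadb-server Pods run in the kaas namespace of the management cluster and uses an example Pod name:

# Back up the MySQL data directory of the affected Pod (Pod name is an example)
kubectl -n kaas exec mariadb-server-0 -- tar czf /tmp/mysql-backup.tgz /var/lib/mysql
kubectl -n kaas cp mariadb-server-0:/tmp/mysql-backup.tgz ./mysql-backup.tgz
# Verify that the other replicas are up and ready
kubectl -n kaas get pods | grep mariadb-server
# Remove the galera.cache file on the affected Pod
kubectl -n kaas exec mariadb-server-0 -- rm /var/lib/mysql/galera.cache
# Delete the affected Pod so that Kubernetes recreates it
kubectl -n kaas delete pod mariadb-server-0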

[32761] Node cleanup fails due to remaining devices

Fixed in 17.1.0 and 16.1.0

On MOSK clusters, the Ansible provisioner may hang in a loop while trying to remove LVM thin pool logical volumes (LVs) due to issues with volume detection before removal. The Ansible provisioner cannot remove LVM thin pool LVs correctly, so it consistently detects the same volumes whenever it scans disks, leading to a repetitive cleanup process.

The following symptoms mean that a cluster can be affected:

  • A node was configured to use thin pool LVs. For example, it had the OpenStack Cinder role in the past.

  • A bare metal node deployment flaps between provisioning and deprovisioning states.

  • In the Ansible provisioner logs, the following example warnings are growing:

    88621.log:7389:2023-06-22 16:30:45.109 88621 ERROR ansible.plugins.callback.ironic_log
    [-] Ansible task clean : fail failed on node 14eb0dbc-c73a-4298-8912-4bb12340ff49:
    {'msg': 'There are more devices to clean', '_ansible_no_log': None, 'changed': False}
    

    Important

    There are more devices to clean is a regular warning indicating some in-progress tasks. However, if the number of such warnings is growing along with the node flapping between the provisioning and deprovisioning states, the cluster is highly likely affected by the issue.

As a workaround, erase disks manually using any preferred tool.
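
For example, the standard LVM and wipefs tooling can be used on the affected node from a rescue or live environment. The volume group, thin pool, and device names below are placeholders:

# Remove the leftover thin pool logical volumes and the volume group
lvremove -y <vgName>/<thinPoolName>
vgremove -y <vgName>
pvremove -y /dev/<deviceName>
# Wipe the remaining filesystem and LVM signatures from the device
wipefs --all /dev/<deviceName>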

[30294] Replacement of a master node is stuck on the calico-node Pod start

Fixed in 2.28.4 (17.3.4 and 16.3.4)

During replacement of a master node on a cluster of any type, the calico-node Pod fails to start on a new node that has the same IP address as the node being replaced.

Workaround:

  1. Log in to any master node.

  2. From a CLI with an MKE client bundle, create a shell alias to start calicoctl using the mirantis/ucp-dsinfo image. Use one of the following options depending on the location of the etcd certificates on your cluster:

    alias calicoctl="\
    docker run -i --rm \
    --pid host \
    --net host \
    -e constraint:ostype==linux \
    -e ETCD_ENDPOINTS=<etcdEndpoint> \
    -e ETCD_KEY_FILE=/var/lib/docker/volumes/ucp-kv-certs/_data/key.pem \
    -e ETCD_CA_CERT_FILE=/var/lib/docker/volumes/ucp-kv-certs/_data/ca.pem \
    -e ETCD_CERT_FILE=/var/lib/docker/volumes/ucp-kv-certs/_data/cert.pem \
    -v /var/run/calico:/var/run/calico \
    -v /var/lib/docker/volumes/ucp-kv-certs/_data:/var/lib/docker/volumes/ucp-kv-certs/_data:ro \
    mirantis/ucp-dsinfo:<mkeVersion> \
    calicoctl \
    "
    
    alias calicoctl="\
    docker run -i --rm \
    --pid host \
    --net host \
    -e constraint:ostype==linux \
    -e ETCD_ENDPOINTS=<etcdEndpoint> \
    -e ETCD_KEY_FILE=/ucp-node-certs/key.pem \
    -e ETCD_CA_CERT_FILE=/ucp-node-certs/ca.pem \
    -e ETCD_CERT_FILE=/ucp-node-certs/cert.pem \
    -v /var/run/calico:/var/run/calico \
    -v ucp-node-certs:/ucp-node-certs:ro \
    mirantis/ucp-dsinfo:<mkeVersion> \
    calicoctl --allow-version-mismatch \
    "
    

    In the above command, replace the following values with the corresponding settings of the affected cluster:

    • <etcdEndpoint> is the etcd endpoint defined in the Calico configuration file. For example, ETCD_ENDPOINTS=127.0.0.1:12378

    • <mkeVersion> is the MKE version installed on your cluster. For example, mirantis/ucp-dsinfo:3.5.7.

  3. Verify the node list on the cluster:

    kubectl get node
    
  4. Compare this list with the node list in Calico to identify the old node:

    calicoctl get node -o wide
    
  5. Remove the old node from Calico:

    calicoctl delete node kaas-node-<nodeID>
    
[5782] Manager machine fails to be deployed during node replacement

Fixed in 2.28.4 (17.3.4 and 16.3.4)

During replacement of a manager machine, the following problems may occur:

  • The system adds the node to Docker swarm but not to Kubernetes

  • The node Deployment gets stuck with failed RethinkDB health checks

Workaround:

  1. Delete the failed node.

  2. Wait for the MKE cluster to become healthy. To monitor the cluster status:

    1. Log in to the MKE web UI as described in Connect to the Mirantis Kubernetes Engine web UI.

    2. Monitor the cluster status as described in MKE Operations Guide: Monitor an MKE cluster with the MKE web UI.

  3. Deploy a new node.

[5568] The calico-kube-controllers Pod fails to clean up resources

Fixed in 2.28.4 (17.3.4 and 16.3.4)

During the unsafe or forced deletion of a manager machine running the calico-kube-controllers Pod in the kube-system namespace, the following issues occur:

  • The calico-kube-controllers Pod fails to clean up resources associated with the deleted node

  • The calico-node Pod may fail to start up on a newly created node if the machine is provisioned with the same IP address as the deleted machine had

As a workaround, before deletion of the node running the calico-kube-controllers Pod, cordon and drain the node:

kubectl cordon <nodeName>
kubectl drain <nodeName>

Ceph
[34820] The Ceph ‘rook-operator’ fails to connect to RGW on FIPS nodes

Fixed in 17.1.0 and 16.1.0

Due to the upstream Ceph issue, on clusters with the Federal Information Processing Standard (FIPS) mode enabled, the Ceph rook-operator fails to connect to Ceph RADOS Gateway (RGW) pods.

As a workaround, do not place Ceph RGW pods on nodes where FIPS mode is enabled.
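
To check whether FIPS mode is enabled on a particular node before scheduling Ceph RGW pods, you can inspect the kernel flag on that node (a generic Linux check, not a Container Cloud-specific command):

# 1 means that FIPS mode is enabled on the node
cat /proc/sys/crypto/fips_enabled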

[26441] Cluster update fails with the MountDevice failed for volume warning

Update of a managed cluster based on bare metal with Ceph enabled fails with a PersistentVolumeClaim getting stuck in the Pending state for the prometheus-server StatefulSet and the MountVolume.MountDevice failed for volume warning in the StackLight event logs.

Workaround:

  1. Verify that the description of the Pods that failed to run contains the FailedMount events:

    kubectl -n <affectedProjectName> describe pod <affectedPodName>
    

    In the command above, replace the following values:

    • <affectedProjectName> is the Container Cloud project name where the Pods failed to run

    • <affectedPodName> is a Pod name that failed to run in the specified project

    In the Pod description, identify the node name where the Pod failed to run.

  2. Verify that the csi-rbdplugin logs of the affected node contain the rbd volume mount failed: <csi-vol-uuid> is being used error. The <csi-vol-uuid> is a unique RBD volume name.

    1. Identify csiPodName of the corresponding csi-rbdplugin:

      kubectl -n rook-ceph get pod -l app=csi-rbdplugin \
      -o jsonpath='{.items[?(@.spec.nodeName == "<nodeName>")].metadata.name}'
      
    2. Output the affected csiPodName logs:

      kubectl -n rook-ceph logs <csiPodName> -c csi-rbdplugin
      
  3. Scale down the affected StatefulSet or Deployment of the Pod that fails to 0 replicas.

  4. On every csi-rbdplugin Pod, search for stuck csi-vol:

    for pod in `kubectl -n rook-ceph get pods|grep rbdplugin|grep -v provisioner|awk '{print $1}'`; do
      echo $pod
      kubectl exec -it -n rook-ceph $pod -c csi-rbdplugin -- rbd device list | grep <csi-vol-uuid>
    done
    
  5. Unmap the affected csi-vol:

    rbd unmap -o force /dev/rbd<i>
    

    The /dev/rbd<i> value is a mapped RBD volume that uses csi-vol.

  6. Delete volumeattachment of the affected Pod:

    kubectl get volumeattachments | grep <csi-vol-uuid>
    kubectl delete volumeattachment <id>
    
  7. Scale up the affected StatefulSet or Deployment back to the original number of replicas and wait until its state becomes Running.

Update
[37268] Container Cloud upgrade is blocked by a node in ‘Prepare’ or ‘Deploy’ state

Fixed in 17.1.0 and 16.1.0

Container Cloud upgrade may be blocked by a node being stuck in the Prepare or Deploy state with the error processing package openssh-server error message. The issue is caused by customizations in /etc/ssh/sshd_config, such as additional Match statements. This file is managed by Container Cloud and must not be altered manually.

As a workaround, move customizations from sshd_config to a new file in the /etc/ssh/sshd_config.d/ directory.
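
For example, assuming a custom Match block was added to /etc/ssh/sshd_config, the customization can be moved into a drop-in file as follows. The file name and the Match content are placeholders; Ubuntu 20.04 includes /etc/ssh/sshd_config.d/*.conf by default:

# Move the custom settings into a drop-in file
sudo tee /etc/ssh/sshd_config.d/99-custom.conf <<'EOF'
Match Address 10.0.0.0/24
    PasswordAuthentication no
EOF
# Validate the configuration and reload the SSH daemon
sudo sshd -t
sudo systemctl reload ssh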

[36928] The helm-controller Deployment is stuck during cluster update

During a cluster update, a Kubernetes helm-controller Deployment may get stuck in a restarting Pod loop with Terminating and Running states flapping. Other Deployment types may also be affected.

As a workaround, restart the Deployment that got stuck:

kubectl -n <affectedProjectName> get deploy <affectedDeployName> -o yaml

kubectl -n <affectedProjectName> scale deploy <affectedDeployName> --replicas 0

kubectl -n <affectedProjectName> scale deploy <affectedDeployName> --replicas <replicasNumber>

In the command above, replace the following values:

  • <affectedProjectName> is the Container Cloud project name containing the cluster with stuck Pods

  • <affectedDeployName> is the Deployment name that failed to run Pods in the specified project

  • <replicasNumber> is the original number of replicas for the Deployment that you can obtain using the get deploy command

[33438] ‘CalicoDataplaneFailuresHigh’ alert is firing during cluster update

During cluster update of a managed bare metal cluster, the false positive CalicoDataplaneFailuresHigh alert may be firing. Disregard this alert, which will disappear once cluster update succeeds.

The observed behavior is typical for calico-node during upgrades, as workload changes occur frequently. Consequently, there is a possibility of temporary desynchronization in the Calico dataplane. This can occasionally result in throttling when applying workload changes to the Calico dataplane.

Components versions

The following table lists the major components and their versions delivered in the Container Cloud 2.25.0.

Note

The components that are newly added, updated, deprecated, or removed as compared to the previous release version, are marked with a corresponding superscript, for example, lcm-ansible Updated.

Container Cloud release components versions

Component

Application/Service

Version

Bare metal Updated

ambasador

1.38.17

baremetal-dnsmasq

base-alpine-20231013162346

baremetal-operator

base-alpine-20231101201729

baremetal-provider

1.38.17

bm-collective

base-alpine-20230929115341

cluster-api-provider-baremetal

1.38.17

ironic

yoga-jammy-20230914091512

ironic-inspector

yoga-jammy-20230914091512

ironic-prometheus-exporter

0.1-20230912104602

kaas-ipam

base-alpine-20230911165405

kubernetes-entrypoint

1.0.1-27d64fb-20230421151539

mariadb

10.6.14-focal-20230912121635

metallb-controller

0.13.9-0d8e8043-amd64

metallb-speaker

0.13.9-0d8e8043-amd64

syslog-ng

base-apline-20230914091214

IAM

iam Updated

2.5.8

iam-controller Updated

1.38.17

keycloak

21.1.1

Container Cloud

admission-controller Updated

1.38.17

agent-controller Updated

1.38.17

ceph-kcc-controller Updated

1.38.17

cert-manager-controller

1.11.0-2

cinder-csi-plugin New

1.27.2-8

client-certificate-controller Updated

1.38.17

configuration-collector New

1.38.17

csi-attacher New

4.2.0-2

csi-node-driver-registrar New

2.7.0-2

csi-provisioner New

3.4.1-2

csi-resizer New

1.7.0-2

csi-snapshotter New

6.2.1-mcc-1

event-controller Updated

1.38.17

frontend Updated

1.38.17

golang

1.20.4-alpine3.17

iam-controller Updated

1.38.17

kaas-exporter Updated

1.38.17

kproxy Updated

1.38.17

lcm-controller Updated

1.38.17

license-controller Updated

1.38.17

livenessprobe New

2.9.0-2

machinepool-controller Updated

1.38.17

mcc-haproxy

0.23.0-73-g01aa9b3

metrics-server

0.6.3-2

nginx Updated

1.38.17

portforward-controller Updated

1.38.17

proxy-controller Updated

1.38.17

rbac-controller Updated

1.38.17

registry

2.8.1-5

release-controller Updated

1.38.17

rhellicense-controller Updated

1.38.17

scope-controller Updated

1.38.17

storage-discovery Updated

1.38.17

user-controller Updated

1.38.17

OpenStack Updated

openstack-cloud-controller-manager

1.27.2-8

openstack-cluster-api-controller

1.38.17

openstack-provider

1.38.17

os-credentials-controller

1.38.17

VMware vSphere

mcc-keepalived Updated

0.23.0-73-g01aa9b3

squid-proxy

0.0.1-10-g24a0d69

vsphere-cluster-api-controller Updated

1.38.17

vsphere-credentials-controller Updated

1.38.17

vsphere-provider Updated

1.38.17

vsphere-vm-template-controller Updated

1.38.17

Artifacts

This section lists the artifacts of components included in the Container Cloud release 2.25.0.

Note

The components that are newly added, updated, deprecated, or removed as compared to the previous release version, are marked with a corresponding superscript, for example, lcm-ansible Updated.

Bare metal artifacts

Artifact

Component

Path

Binaries Updated

ironic-python-agent.initramfs

https://binary.mirantis.com/bm/bin/ironic/ipa/ubuntu/initramfs-yoga-focal-debug-20231012141354

ironic-python-agent.kernel

https://binary.mirantis.com/bm/bin/ironic/ipa/ubuntu/kernel-yoga-focal-debug-20231012141354

provisioning_ansible

https://binary.mirantis.com/bm/bin/ansible/provisioning_ansible-0.1.1-113-4f8b843.tgz

Helm charts Updated

baremetal-api

https://binary.mirantis.com/core/helm/baremetal-api-1.38.17.tgz

baremetal-operator

https://binary.mirantis.com/core/helm/baremetal-operator-1.38.17.tgz

baremetal-provider

https://binary.mirantis.com/core/helm/baremetal-provider-1.38.17.tgz

baremetal-public-api

https://binary.mirantis.com/core/helm/baremetal-public-api-1.38.17.tgz

kaas-ipam

https://binary.mirantis.com/core/helm/kaas-ipam-1.38.17.tgz

local-volume-provisioner

https://binary.mirantis.com/core/helm/local-volume-provisioner-1.38.17.tgz

metallb

https://binary.mirantis.com/core/helm/metallb-1.38.17.tgz

Docker images Updated

ambasador

mirantis.azurecr.io/core/external/nginx:1.38.17

baremetal-dnsmasq

mirantis.azurecr.io/bm/baremetal-dnsmasq:base-alpine-20231013162346

baremetal-operator

mirantis.azurecr.io/bm/baremetal-operator:base-alpine-20231101201729

bm-collective

mirantis.azurecr.io/bm/bm-collective:base-alpine-20230929115341

cluster-api-provider-baremetal

mirantis.azurecr.io/core/cluster-api-provider-baremetal:1.38.17

ironic

mirantis.azurecr.io/openstack/ironic:yoga-jammy-20230914091512

ironic-inspector

mirantis.azurecr.io/openstack/ironic-inspector:yoga-jammy-20230914091512

ironic-prometheus-exporter

mirantis.azurecr.io/stacklight/ironic-prometheus-exporter:0.1-20230912104602

kaas-ipam

mirantis.azurecr.io/bm/kaas-ipam:base-alpine-20230911165405

kubernetes-entrypoint

mirantis.azurecr.io/openstack/extra/kubernetes-entrypoint:v1.0.1-27d64fb-20230421151539

mariadb

mirantis.azurecr.io/general/mariadb:10.6.14-focal-20230912121635

mcc-keepalived

mirantis.azurecr.io/lcm/mcc-keepalived:v0.23.0-73-g01aa9b3

metallb-controller

mirantis.azurecr.io/bm/metallb/controller:v0.13.9-0d8e8043-amd64

metallb-speaker

mirantis.azurecr.io/bm/metallb/speaker:v0.13.9-0d8e8043-amd64

syslog-ng

mirantis.azurecr.io/bm/syslog-ng:base-apline-20230914091214

Core artifacts

Artifact

Component

Paths

Bootstrap tarball Updated

bootstrap-darwin

https://binary.mirantis.com/core/bin/bootstrap-darwin-1.38.17.tgz

bootstrap-linux

https://binary.mirantis.com/core/bin/bootstrap-linux-1.38.17.tgz

Helm charts Updated

admission-controller

https://binary.mirantis.com/core/helm/admission-controller-1.38.17.tgz

agent-controller

https://binary.mirantis.com/core/helm/agent-controller-1.38.17.tgz

ceph-kcc-controller

https://binary.mirantis.com/core/helm/ceph-kcc-controller-1.38.17.tgz

cert-manager

https://binary.mirantis.com/core/helm/cert-manager-1.38.17.tgz

cinder-csi-plugin New

https://binary.mirantis.com/core/helm/cinder-csi-plugin-1.38.17.tgz

client-certificate-controller

https://binary.mirantis.com/core/helm/client-certificate-controller-1.38.17.tgz

configuration-collector New

https://binary.mirantis.com/core/helm/configuration-collector-1.38.17.tgz

event-controller

https://binary.mirantis.com/core/helm/event-controller-1.38.17.tgz

iam-controller

https://binary.mirantis.com/core/helm/iam-controller-1.38.17.tgz

kaas-exporter

https://binary.mirantis.com/core/helm/kaas-exporter-1.38.17.tgz

kaas-public-api

https://binary.mirantis.com/core/helm/kaas-public-api-1.38.17.tgz

kaas-ui

https://binary.mirantis.com/core/helm/kaas-ui-1.38.17.tgz

lcm-controller

https://binary.mirantis.com/core/helm/lcm-controller-1.38.17.tgz

license-controller

https://binary.mirantis.com/core/helm/license-controller-1.38.17.tgz

machinepool-controller

https://binary.mirantis.com/core/helm/machinepool-controller-1.38.17.tgz

mcc-cache

https://binary.mirantis.com/core/helm/mcc-cache-1.38.17.tgz

mcc-cache-warmup

https://binary.mirantis.com/core/helm/mcc-cache-warmup-1.38.17.tgz

metrics-server

https://binary.mirantis.com/core/helm/metrics-server-1.38.17.tgz

openstack-cloud-controller-manager New

https://binary.mirantis.com/core/helm/openstack-cloud-controller-manager-1.38.17.tgz

openstack-provider

https://binary.mirantis.com/core/helm/openstack-provider-1.38.17.tgz

os-credentials-controller

https://binary.mirantis.com/core/helm/os-credentials-controller-1.38.17.tgz

portforward-controller

https://binary.mirantis.com/core/helm/portforward-controller-1.38.17.tgz

proxy-controller

https://binary.mirantis.com/core/helm/proxy-controller-1.38.17.tgz

rbac-controller

https://binary.mirantis.com/core/helm/rbac-controller-1.38.17.tgz

release-controller

https://binary.mirantis.com/core/helm/release-controller-1.38.17.tgz

rhellicense-controller

https://binary.mirantis.com/core/helm/rhellicense-controller-1.38.17.tgz

scope-controller

https://binary.mirantis.com/core/helm/scope-controller-1.38.17.tgz

squid-proxy

https://binary.mirantis.com/core/helm/squid-proxy-1.38.17.tgz

storage-discovery

https://binary.mirantis.com/core/helm/storage-discovery-1.38.17.tgz

user-controller

https://binary.mirantis.com/core/helm/user-controller-1.38.17.tgz

vsphere-credentials-controller

https://binary.mirantis.com/core/helm/vsphere-credentials-controller-1.38.17.tgz

vsphere-provider

https://binary.mirantis.com/core/helm/vsphere-provider-1.38.17.tgz

vsphere-vm-template-controller

https://binary.mirantis.com/core/helm/vsphere-vm-template-controller-1.38.17.tgz

Docker images

admission-controller Updated

mirantis.azurecr.io/core/admission-controller:1.38.17

agent-controller Updated

mirantis.azurecr.io/core/agent-controller:1.38.17

ceph-kcc-controller Updated

mirantis.azurecr.io/core/ceph-kcc-controller:1.38.17

cert-manager-controller Updated

mirantis.azurecr.io/core/external/cert-manager-controller:v1.11.0-2

cinder-csi-plugin New

mirantis.azurecr.io/lcm/kubernetes/cinder-csi-plugin:v1.27.2-8

client-certificate-controller Updated

mirantis.azurecr.io/core/client-certificate-controller:1.38.17

configuration-collector New

mirantis.azurecr.io/core/configuration-collector:1.38.17

csi-attacher New

mirantis.azurecr.io/lcm/k8scsi/csi-attacher:v4.2.0-2

csi-node-driver-registrar New

mirantis.azurecr.io/lcm/k8scsi/csi-node-driver-registrar:v2.7.0-2

csi-provisioner New

mirantis.azurecr.io/lcm/k8scsi/csi-provisioner:v3.4.1-2

csi-resizer New

mirantis.azurecr.io/lcm/k8scsi/csi-resizer:v1.7.0-2

csi-snapshotter New

mirantis.azurecr.io/lcm/k8scsi/csi-snapshotter:v6.2.1-mcc-1

event-controller Updated

mirantis.azurecr.io/core/event-controller:1.38.17

frontend Updated

mirantis.azurecr.io/core/frontend:1.38.17

iam-controller Updated

mirantis.azurecr.io/core/iam-controller:1.38.17

kaas-exporter Updated

mirantis.azurecr.io/core/kaas-exporter:1.38.17

kproxy Updated

mirantis.azurecr.io/core/kproxy:1.38.17

lcm-controller Updated

mirantis.azurecr.io/core/lcm-controller:1.38.17

license-controller Updated

mirantis.azurecr.io/core/license-controller:1.38.17

livenessprobe New

mirantis.azurecr.io/lcm/k8scsi/livenessprobe:v2.9.0-2

machinepool-controller Updated

mirantis.azurecr.io/core/machinepool-controller:1.38.17

mcc-haproxy Updated

mirantis.azurecr.io/lcm/mcc-haproxy:v0.23.0-73-g01aa9b3

mcc-keepalived Updated

mirantis.azurecr.io/lcm/mcc-keepalived:v0.23.0-73-g01aa9b3

metrics-server

mirantis.azurecr.io/lcm/metrics-server-amd64:v0.6.3-2

nginx Updated

mirantis.azurecr.io/core/external/nginx:1.38.17

openstack-cloud-controller-manager Updated

mirantis.azurecr.io/lcm/kubernetes/openstack-cloud-controller-manager:v1.27.2-8

openstack-cluster-api-controller Updated

mirantis.azurecr.io/core/openstack-cluster-api-controller:1.38.17

os-credentials-controller Updated

mirantis.azurecr.io/core/os-credentials-controller:1.38.17

portforward-controller Updated

mirantis.azurecr.io/core/portforward-controller:1.38.17

proxy-controller Updated

mirantis.azurecr.io/core/proxy-controller:1.38.17

rbac-controller Updated

mirantis.azurecr.io/core/rbac-controller:1.38.17

registry Updated

mirantis.azurecr.io/lcm/registry:v2.8.1-6

release-controller Updated

mirantis.azurecr.io/core/release-controller:1.38.17

rhellicense-controller Updated

mirantis.azurecr.io/core/rhellicense-controller:1.38.17

scope-controller Updated

mirantis.azurecr.io/core/scope-controller:1.38.17

squid-proxy

mirantis.azurecr.io/lcm/squid-proxy:0.0.1-10-g24a0d69

storage-discovery Updated

mirantis.azurecr.io/core/storage-discovery:1.38.17

user-controller Updated

mirantis.azurecr.io/core/user-controller:1.38.17

vsphere-cluster-api-controller Updated

mirantis.azurecr.io/core/vsphere-cluster-api-controller:1.38.17

vsphere-credentials-controller Updated

mirantis.azurecr.io/core/vsphere-credentials-controller:1.38.17

vsphere-vm-template-controller Updated

mirantis.azurecr.io/core/vsphere-vm-template-controller:1.38.17

IAM artifacts

Artifact

Component

Path

Binaries

hash-generate-darwin

http://binary.mirantis.com/iam/bin/hash-generate-0.0.1-268-3cf7f17-darwin

hash-generate-linux

http://binary.mirantis.com/iam/bin/hash-generate-0.0.1-268-3cf7f17-linux

Helm charts Updated

iam

https://binary.mirantis.com/iam/helm/iam-2.5.8.tgz

Docker images

keycloak

mirantis.azurecr.io/iam/keycloak:0.6.0

kubernetes-entrypoint

mirantis.azurecr.io/openstack/extra/kubernetes-entrypoint:v1.0.1-27d64fb-20230421151539

mariadb Updated

mirantis.azurecr.io/general/mariadb:10.6.14-focal-20230730124341

Security notes

The table below includes the total numbers of addressed unique and common CVEs by product component since the 2.24.5 patch release. The common CVEs are issues addressed across several images.

Addressed CVEs - summary

Container Cloud component

CVE type

Critical

High

Total

Kaas core

Unique

7

39

46

Common

54

305

359

Ceph

Unique

0

1

1

Common

0

1

1

StackLight

Unique

0

5

5

Common

0

13

13

Mirantis Security Portal

For the detailed list of fixed and existing CVEs across the Mirantis Container Cloud and MOSK products, refer to Mirantis Security Portal.

MOSK CVEs

For the number of fixed CVEs in the MOSK-related components including OpenStack and Tungsten Fabric, refer to MOSK 23.3: Security notes.

Update notes

This section describes the specific actions you as a cloud operator need to complete before or after your Container Cloud cluster update to the Cluster releases 17.0.0, 16.0.0, or 14.1.0.

Consider this information as a supplement to the generic update procedures published in Operations Guide: Automatic upgrade of a management cluster and Update a managed cluster.

Pre-update actions
Upgrade to Ubuntu 20.04 on baremetal-based clusters

The Cluster release series 14.x and 15.x are the last ones where Ubuntu 18.04 is supported on existing clusters. A Cluster release update to 17.0.0 or 16.0.0 is impossible for a cluster running on Ubuntu 18.04.

Therefore, if your cluster update is blocked, make sure that the operating system on all cluster nodes is upgraded to Ubuntu 20.04 as described in Operations Guide: Upgrade an operating system distribution.
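
To verify the distribution version currently running on a node, a standard check on that node is sufficient:

# Prints the release description, for example, Ubuntu 20.04.6 LTS
lsb_release -ds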

Configure managed clusters with the etcd storage quota set

If your cluster has custom etcd storage quota set as described in Increase storage quota for etcd, before the management cluster upgrade to 2.25.0, configure LCMMachine resources:

  1. Manually set the ucp_etcd_storage_quota parameter in LCMMachine resources of the cluster controller nodes:

    spec:
      stateItemsOverwrites:
        deploy:
          ucp_etcd_storage_quota: "<custom_etcd_storage_quota_value>"
    

    If the stateItemsOverwrites.deploy section is already set, append ucp_etcd_storage_quota to the existing parameters.

    To obtain the list of the cluster LCMMachine resources:

    kubectl -n <cluster_namespace> get lcmmachine
    

    To patch the cluster LCMMachine resources of the Type control:

    kubectl -n <cluster_namespace> edit lcmmachine <control_lcmmachine_name>
    
  2. After the management cluster is upgraded to 2.25.0, update your managed cluster to the Cluster release 17.0.0 or 16.0.0.

  3. Manually remove the ucp_etcd_storage_quota parameter from the stateItemsOverwrites.deploy section.

Allow the TCP port 12392 for management cluster nodes

The Cluster release 16.x and 17.x series are shipped with MKE 3.7.x. To ensure cluster operability after the update, verify that the TCP port 12392 is allowed in your network for the Container Cloud management cluster nodes.
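
For example, if the cluster nodes are protected with iptables directly, a rule similar to the following allows the port. This is a sketch only; adapt it to the firewall tooling used in your environment:

# Allow incoming TCP traffic on port 12392 used by MKE 3.7.x
sudo iptables -A INPUT -p tcp --dport 12392 -j ACCEPT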

For the full list of the required ports for MKE, refer to MKE Documentation: Open ports to incoming traffic.

Post-update actions
Migrate Ceph cluster to address storage devices using by-id

Container Cloud uses the device by-id identifier as the default method of addressing the underlying devices of Ceph OSDs. This is the only persistent device identifier for a Ceph cluster that remains stable after cluster upgrade or any other cluster maintenance.

Therefore, if your existing Ceph clusters are still utilizing the device names or device by-path symlinks, migrate them to the by-id format as described in Migrate Ceph cluster to address storage devices using by-id.
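
To identify the by-id symlink of a device on a storage node, you can list the udev-provided identifiers on that node (a generic Linux check; the device name is an example):

# Show the persistent by-id symlinks and the devices they point to
ls -l /dev/disk/by-id/
# Filter the links that point to a specific device, for example, sdb
ls -l /dev/disk/by-id/ | grep sdb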

Point DHCP relays on routers to the new dhcp-lb IP address

If your managed cluster has multiple L2 segments using DHCP relays on the border switches, then after the related management cluster automatically upgrades to Container Cloud 2.25.0, manually point the DHCP relays on your network infrastructure to the new IP address of the dhcp-lb service of the Container Cloud managed cluster in order to successfully provision new nodes or reprovision existing ones.

To obtain the new IP address:

kubectl -n kaas get service dhcp-lb

This change is required because the product now includes the resolution of the issue related to incorrect handling of DHCP egress traffic. The fix involves reconfiguring the external traffic policy for the dhcp-lb Kubernetes Service. For details about the issue, refer to the Kubernetes upstream bug.

2.24.5

The Container Cloud patch release 2.24.5, which is based on the 2.24.2 major release, provides the following updates:

  • Support for the patch Cluster releases 14.0.4 and 15.0.4 that represents Mirantis OpenStack for Kubernetes (MOSK) patch release 23.2.3.

  • Security fixes for CVEs of Critical and High severity

This patch release also supports the latest major Cluster releases 14.0.1 and 15.0.1. It does not support greenfield deployments based on the deprecated Cluster releases 15.0.3, 15.0.2, 14.0.3, and 14.0.2, as well as the 12.7.x and 11.7.x series. Use the latest available Cluster releases for new deployments instead.

For main deliverables of the parent Container Cloud releases of 2.24.5, refer to 2.24.0 and 2.24.1.

Artifacts

This section lists the components artifacts of the Container Cloud patch release 2.24.5. For artifacts of the Cluster releases introduced in 2.24.5, see patch Cluster releases 15.0.4 and 14.0.4.

Note

The components that are newly added, updated, deprecated, or removed as compared to the previous release version, are marked with a corresponding superscript, for example, lcm-ansible Updated.

Bare metal artifacts

Artifact

Component

Path

Binaries

ironic-python-agent.initramfs

https://binary.mirantis.com/bm/bin/ironic/ipa/ubuntu/initramfs-yoga-focal-debug-20230606121129

ironic-python-agent.kernel

https://binary.mirantis.com/bm/bin/ironic/ipa/ubuntu/kernel-yoga-focal-debug-20230606121129

provisioning_ansible

https://binary.mirantis.com/bm/bin/ansible/provisioning_ansible-0.1.1-104-6e2e82c.tgz

Helm charts Updated

baremetal-api

https://binary.mirantis.com/core/helm/baremetal-api-1.37.25.tgz

baremetal-operator

https://binary.mirantis.com/core/helm/baremetal-operator-1.37.25.tgz

baremetal-provider

https://binary.mirantis.com/core/helm/baremetal-provider-1.37.25.tgz

baremetal-public-api

https://binary.mirantis.com/core/helm/baremetal-public-api-1.37.25.tgz

kaas-ipam

https://binary.mirantis.com/core/helm/kaas-ipam-1.37.25.tgz

local-volume-provisioner

https://binary.mirantis.com/core/helm/local-volume-provisioner-1.37.25.tgz

metallb

https://binary.mirantis.com/core/helm/metallb-1.37.25.tgz

Docker images

ambasador Updated

mirantis.azurecr.io/core/external/nginx:1.37.25

baremetal-dnsmasq

mirantis.azurecr.io/bm/baremetal-dnsmasq:base-alpine-20230810152159

baremetal-operator

mirantis.azurecr.io/bm/baremetal-operator:base-alpine-20230803175048

bm-collective

mirantis.azurecr.io/bm/bm-collective:base-alpine-20230829084517

cluster-api-provider-baremetal Updated

mirantis.azurecr.io/core/cluster-api-provider-baremetal:1.37.25

ironic

mirantis.azurecr.io/openstack/ironic:yoga-focal-20230810113432

ironic-inspector

mirantis.azurecr.io/openstack/ironic-inspector:yoga-focal-20230810113432

ironic-prometheus-exporter Updated

mirantis.azurecr.io/stacklight/ironic-prometheus-exporter:0.1-20230912104602

kaas-ipam

mirantis.azurecr.io/bm/kaas-ipam:base-alpine-20230810155639

kubernetes-entrypoint

mirantis.azurecr.io/openstack/extra/kubernetes-entrypoint:v1.0.1-5359171-20230810125608

mariadb

mirantis.azurecr.io/general/mariadb:10.6.14-focal-20230730124341

mcc-keepalived

mirantis.azurecr.io/lcm/mcc-keepalived:v0.22.0-75-g08569a8

metallb-controller

mirantis.azurecr.io/bm/metallb/controller:v0.13.9-53df4a9c-amd64

metallb-speaker

mirantis.azurecr.io/bm/metallb/speaker:v0.13.9-53df4a9c-amd64

syslog-ng

mirantis.azurecr.io/bm/syslog-ng:base-apline-20230814110635

Core artifacts

Artifact

Component

Path

Bootstrap tarball Updated

bootstrap-darwin

https://binary.mirantis.com/core/bin/bootstrap-darwin-1.37.25.tgz

bootstrap-linux

https://binary.mirantis.com/core/bin/bootstrap-linux-1.37.25.tgz

Helm charts Updated

admission-controller

https://binary.mirantis.com/core/helm/admission-controller-1.37.25.tgz

agent-controller

https://binary.mirantis.com/core/helm/agent-controller-1.37.25.tgz

ceph-kcc-controller

https://binary.mirantis.com/core/helm/ceph-kcc-controller-1.37.25.tgz

cert-manager

https://binary.mirantis.com/core/helm/cert-manager-1.37.25.tgz

client-certificate-controller

https://binary.mirantis.com/core/helm/client-certificate-controller-1.37.25.tgz

event-controller

https://binary.mirantis.com/core/helm/event-controller-1.37.25.tgz

iam-controller

https://binary.mirantis.com/core/helm/iam-controller-1.37.25.tgz

kaas-exporter

https://binary.mirantis.com/core/helm/kaas-exporter-1.37.25.tgz

kaas-public-api

https://binary.mirantis.com/core/helm/kaas-public-api-1.37.25.tgz

kaas-ui

https://binary.mirantis.com/core/helm/kaas-ui-1.37.25.tgz

lcm-controller

https://binary.mirantis.com/core/helm/lcm-controller-1.37.25.tgz

license-controller

https://binary.mirantis.com/core/helm/license-controller-1.37.25.tgz

machinepool-controller

https://binary.mirantis.com/core/helm/machinepool-controller-1.37.25.tgz

mcc-cache

https://binary.mirantis.com/core/helm/mcc-cache-1.37.25.tgz

mcc-cache-warmup

https://binary.mirantis.com/core/helm/mcc-cache-warmup-1.37.25.tgz

metrics-server

https://binary.mirantis.com/core/helm/metrics-server-1.37.25.tgz

openstack-provider

https://binary.mirantis.com/core/helm/openstack-provider-1.37.25.tgz

os-credentials-controller

https://binary.mirantis.com/core/helm/os-credentials-controller-1.37.25.tgz

portforward-controller

https://binary.mirantis.com/core/helm/portforward-controller-1.37.25.tgz

proxy-controller

https://binary.mirantis.com/core/helm/proxy-controller-1.37.25.tgz

rbac-controller

https://binary.mirantis.com/core/helm/rbac-controller-1.37.25.tgz

release-controller

https://binary.mirantis.com/core/helm/release-controller-1.37.25.tgz

rhellicense-controller

https://binary.mirantis.com/core/helm/rhellicense-controller-1.37.25.tgz

scope-controller

https://binary.mirantis.com/core/helm/scope-controller-1.37.25.tgz

squid-proxy

https://binary.mirantis.com/core/helm/squid-proxy-1.37.25.tgz

storage-discovery

https://binary.mirantis.com/core/helm/storage-discovery-1.37.25.tgz

user-controller

https://binary.mirantis.com/core/helm/user-controller-1.37.25.tgz

vsphere-credentials-controller

https://binary.mirantis.com/core/helm/vsphere-credentials-controller-1.37.25.tgz

vsphere-provider

https://binary.mirantis.com/core/helm/vsphere-provider-1.37.25.tgz

vsphere-vm-template-controller

https://binary.mirantis.com/core/helm/vsphere-vm-template-controller-1.37.25.tgz

Docker images

admission-controller Updated

mirantis.azurecr.io/core/admission-controller:1.37.25

agent-controller Updated

mirantis.azurecr.io/core/agent-controller:1.37.25

ceph-kcc-controller Updated

mirantis.azurecr.io/core/ceph-kcc-controller:1.37.25

cert-manager-controller

mirantis.azurecr.io/core/external/cert-manager-controller:v1.11.0-2

client-certificate-controller Updated

mirantis.azurecr.io/core/client-certificate-controller:1.37.25

event-controller Updated

mirantis.azurecr.io/core/event-controller:1.37.25

frontend Updated

mirantis.azurecr.io/core/frontend:1.37.25

iam-controller Updated

mirantis.azurecr.io/core/iam-controller:1.37.25

kaas-exporter Updated

mirantis.azurecr.io/core/kaas-exporter:1.37.25

kproxy Updated

mirantis.azurecr.io/core/kproxy:1.37.25

lcm-controller Updated

mirantis.azurecr.io/core/lcm-controller:1.37.25

license-controller Updated

mirantis.azurecr.io/core/license-controller:1.37.25

machinepool-controller Updated

mirantis.azurecr.io/core/machinepool-controller:1.37.25

mcc-haproxy

mirantis.azurecr.io/lcm/mcc-haproxy:v0.22.0-75-g08569a8

mcc-keepalived

mirantis.azurecr.io/lcm/mcc-keepalived:v0.22.0-75-g08569a8

metrics-server

mirantis.azurecr.io/lcm/metrics-server-amd64:v0.6.3-2

nginx Updated

mirantis.azurecr.io/core/external/nginx:1.37.25

openstack-cloud-controller-manager

mirantis.azurecr.io/lcm/kubernetes/openstack-cloud-controller-manager-amd64:v1.24.5-13

openstack-cluster-api-controller Updated

mirantis.azurecr.io/core/openstack-cluster-api-controller:1.37.25

os-credentials-controller Updated

mirantis.azurecr.io/core/os-credentials-controller:1.37.25

portforward-controller Updated

mirantis.azurecr.io/core/portforward-controller:1.37.25

proxy-controller Updated

mirantis.azurecr.io/core/proxy-controller:1.37.25

rbac-controller Updated

mirantis.azurecr.io/core/rbac-controller:1.37.25

registry Updated

mirantis.azurecr.io/lcm/registry:v2.8.1-5

release-controller Updated

mirantis.azurecr.io/core/release-controller:1.37.25

rhellicense-controller Updated

mirantis.azurecr.io/core/rhellicense-controller:1.37.25

scope-controller Updated

mirantis.azurecr.io/core/scope-controller:1.37.25

squid-proxy

mirantis.azurecr.io/lcm/squid-proxy:0.0.1-10-g24a0d69

storage-discovery Updated

mirantis.azurecr.io/core/storage-discovery:1.37.25

user-controller Updated

mirantis.azurecr.io/core/user-controller:1.37.25

vsphere-cluster-api-controller Updated

mirantis.azurecr.io/core/vsphere-cluster-api-controller:1.37.25

vsphere-credentials-controller Updated

mirantis.azurecr.io/core/vsphere-credentials-controller:1.37.25

vsphere-vm-template-controller Updated

mirantis.azurecr.io/core/vsphere-vm-template-controller:1.37.25

IAM artifacts

Artifact

Component

Path

Binaries

hash-generate-darwin

http://binary.mirantis.com/iam/bin/hash-generate-0.0.1-268-3cf7f17-darwin

hash-generate-linux

http://binary.mirantis.com/iam/bin/hash-generate-0.0.1-268-3cf7f17-linux

Helm charts

iam

https://binary.mirantis.com/iam/helm/iam-2.5.4.tgz

Docker images

keycloak

mirantis.azurecr.io/iam/keycloak:0.6.0

kubernetes-entrypoint

mirantis.azurecr.io/openstack/extra/kubernetes-entrypoint:v1.0.1-27d64fb-20230421151539

mariadb

mirantis.azurecr.io/general/mariadb:10.6.12-focal-20230423170220

Security notes

In total, since Container Cloud 2.24.4, in 2.24.5, 21 Common Vulnerabilities and Exposures (CVE) have been fixed: 18 of critical and 3 of high severity.

The summary table contains the total number of unique CVEs along with the total number of issues fixed across the images.

The full list of the CVEs present in the current Container Cloud release is available at the Mirantis Security Portal.

Addressed CVEs - summary

Severity

Critical

High

Total

Unique CVEs

1

1

2

Total issues across images

18

3

21

Addressed CVEs - detailed

Image

Component name

CVE

core/external/nginx

libwebp

CVE-2023-4863 (High)

core/frontend

libwebp

CVE-2023-4863 (High)

lcm/kubernetes/openstack-cloud-controller-manager-amd64

busybox

CVE-2022-48174 (Critical)

busybox-binsh

CVE-2022-48174 (Critical)

ssl_client

CVE-2022-48174 (Critical)

lcm/registry

busybox

CVE-2022-48174 (Critical)

busybox-binsh

CVE-2022-48174 (Critical)

ssl_client

CVE-2022-48174 (Critical)

scale/curl-jq

busybox

CVE-2022-48174 (Critical)

busybox-binsh

CVE-2022-48174 (Critical)

ssl_client

CVE-2022-48174 (Critical)

stacklight/alertmanager-webhook-servicenow

busybox

CVE-2022-48174 (Critical)

busybox-binsh

CVE-2022-48174 (Critical)

ssl_client

CVE-2022-48174 (Critical)

stacklight/grafana-image-renderer

libwebp

CVE-2023-4863 (High)

stacklight/ironic-prometheus-exporter

busybox

CVE-2022-48174 (Critical)

busybox-binsh

CVE-2022-48174 (Critical)

ssl_client

CVE-2022-48174 (Critical)

stacklight/sf-reporter

busybox

CVE-2022-48174 (Critical)

busybox-binsh

CVE-2022-48174 (Critical)

ssl_client

CVE-2022-48174 (Critical)

2.24.4

The Container Cloud patch release 2.24.4, which is based on the 2.24.2 major release, provides the following updates:

  • Support for the patch Cluster releases 14.0.3 and 15.0.3 that represents Mirantis OpenStack for Kubernetes (MOSK) patch release 23.2.2.

  • Support for the multi-rack topology on bare metal managed clusters

  • Support for configuration of the etcd storage quota

  • Security fixes for CVEs of Critical and High severity

This patch release also supports the latest major Cluster releases 14.0.1 and 15.0.1. It does not support greenfield deployments based on the deprecated Cluster releases 15.0.2 and 14.0.2, as well as the 12.7.x and 11.7.x series. Use the latest available Cluster releases for new deployments instead.

For main deliverables of the parent Container Cloud releases of 2.24.4, refer to 2.24.0 and 2.24.1.

Enhancements

This section outlines new features and enhancements introduced in the Container Cloud patch release 2.24.4.

Configuration of the etcd storage quota

Added the capability to configure the etcd storage quota, which is 2 GB by default. You may need to increase the default etcd storage quota if etcd runs out of space and there is no other way to clean up the storage on your management or managed cluster.

Multi-rack topology for bare metal managed clusters

TechPreview

Added support for the multi-rack topology on bare metal managed clusters. Implementation of the multi-rack topology implies the use of Rack and MultiRackCluster objects that support configuration of BGP announcement of the cluster API load balancer address.

You can now create a managed cluster where cluster nodes including Kubernetes masters are distributed across multiple racks without L2 layer extension between them, and use BGP for announcement of the cluster API load balancer address and external addresses of Kubernetes load-balanced services.

Artifacts

This section lists the components artifacts of the Container Cloud patch release 2.24.4. For artifacts of the Cluster releases introduced in 2.24.4, see patch Cluster releases 15.0.3 and 14.0.3.

Note

The components that are newly added, updated, deprecated, or removed as compared to the previous release version, are marked with a corresponding superscript, for example, lcm-ansible Updated.

Bare metal artifacts

Artifact

Component

Path

Binaries

ironic-python-agent.initramfs

https://binary.mirantis.com/bm/bin/ironic/ipa/ubuntu/initramfs-yoga-focal-debug-20230606121129

ironic-python-agent.kernel

https://binary.mirantis.com/bm/bin/ironic/ipa/ubuntu/kernel-yoga-focal-debug-20230606121129

provisioning_ansible

https://binary.mirantis.com/bm/bin/ansible/provisioning_ansible-0.1.1-104-6e2e82c.tgz

Helm charts Updated

baremetal-api

https://binary.mirantis.com/core/helm/baremetal-api-1.37.24.tgz

baremetal-operator

https://binary.mirantis.com/core/helm/baremetal-operator-1.37.24.tgz

baremetal-provider

https://binary.mirantis.com/core/helm/baremetal-provider-1.37.24.tgz

baremetal-public-api

https://binary.mirantis.com/core/helm/baremetal-public-api-1.37.24.tgz

kaas-ipam

https://binary.mirantis.com/core/helm/kaas-ipam-1.37.24.tgz

local-volume-provisioner

https://binary.mirantis.com/core/helm/local-volume-provisioner-1.37.24.tgz

metallb

https://binary.mirantis.com/core/helm/metallb-1.37.24.tgz

Docker images

ambasador Updated

mirantis.azurecr.io/core/external/nginx:1.37.24

baremetal-dnsmasq

mirantis.azurecr.io/bm/baremetal-dnsmasq:base-alpine-20230810152159

baremetal-operator

mirantis.azurecr.io/bm/baremetal-operator:base-alpine-20230803175048

bm-collective Updated

mirantis.azurecr.io/bm/bm-collective:base-alpine-20230829084517

cluster-api-provider-baremetal Updated

mirantis.azurecr.io/core/cluster-api-provider-baremetal:1.37.24

ironic

mirantis.azurecr.io/openstack/ironic:yoga-focal-20230810113432

ironic-inspector

mirantis.azurecr.io/openstack/ironic-inspector:yoga-focal-20230810113432

ironic-prometheus-exporter

mirantis.azurecr.io/stacklight/ironic-prometheus-exporter:0.1-20230531081117

kaas-ipam

mirantis.azurecr.io/bm/kaas-ipam:base-alpine-20230810155639

kubernetes-entrypoint

mirantis.azurecr.io/openstack/extra/kubernetes-entrypoint:v1.0.1-5359171-20230810125608

mariadb

mirantis.azurecr.io/general/mariadb:10.6.14-focal-20230730124341

mcc-keepalived Updated

mirantis.azurecr.io/lcm/mcc-keepalived:v0.22.0-66-ga855169

metallb-controller

mirantis.azurecr.io/bm/metallb/controller:v0.13.9-53df4a9c-amd64

metallb-speaker

mirantis.azurecr.io/bm/metallb/speaker:v0.13.9-53df4a9c-amd64

syslog-ng

mirantis.azurecr.io/bm/syslog-ng:base-apline-20230814110635

Core artifacts

Artifact

Component

Paths

Bootstrap tarball Updated

bootstrap-darwin

https://binary.mirantis.com/core/bin/bootstrap-darwin-1.37.24.tgz

bootstrap-linux

https://binary.mirantis.com/core/bin/bootstrap-linux-1.37.24.tgz

Helm charts Updated

admission-controller

https://binary.mirantis.com/core/helm/admission-controller-1.37.24.tgz

agent-controller

https://binary.mirantis.com/core/helm/agent-controller-1.37.24.tgz

ceph-kcc-controller

https://binary.mirantis.com/core/helm/ceph-kcc-controller-1.37.24.tgz

cert-manager

https://binary.mirantis.com/core/helm/cert-manager-1.37.24.tgz

client-certificate-controller

https://binary.mirantis.com/core/helm/client-certificate-controller-1.37.24.tgz

event-controller

https://binary.mirantis.com/core/helm/event-controller-1.37.24.tgz

iam-controller

https://binary.mirantis.com/core/helm/iam-controller-1.37.24.tgz

kaas-exporter

https://binary.mirantis.com/core/helm/kaas-exporter-1.37.24.tgz

kaas-public-api

https://binary.mirantis.com/core/helm/kaas-public-api-1.37.24.tgz

kaas-ui

https://binary.mirantis.com/core/helm/kaas-ui-1.37.24.tgz

lcm-controller

https://binary.mirantis.com/core/helm/lcm-controller-1.37.24.tgz

license-controller

https://binary.mirantis.com/core/helm/license-controller-1.37.24.tgz

machinepool-controller

https://binary.mirantis.com/core/helm/machinepool-controller-1.37.24.tgz

mcc-cache

https://binary.mirantis.com/core/helm/mcc-cache-1.37.24.tgz

mcc-cache-warmup

https://binary.mirantis.com/core/helm/mcc-cache-warmup-1.37.24.tgz

metrics-server

https://binary.mirantis.com/core/helm/metrics-server-1.37.24.tgz

openstack-provider

https://binary.mirantis.com/core/helm/openstack-provider-1.37.24.tgz

os-credentials-controller

https://binary.mirantis.com/core/helm/os-credentials-controller-1.37.24.tgz

portforward-controller

https://binary.mirantis.com/core/helm/portforward-controller-1.37.24.tgz

proxy-controller

https://binary.mirantis.com/core/helm/proxy-controller-1.37.24.tgz

rbac-controller

https://binary.mirantis.com/core/helm/rbac-controller-1.37.24.tgz

release-controller

https://binary.mirantis.com/core/helm/release-controller-1.37.24.tgz

rhellicense-controller

https://binary.mirantis.com/core/helm/rhellicense-controller-1.37.24.tgz

scope-controller

https://binary.mirantis.com/core/helm/scope-controller-1.37.24.tgz

squid-proxy

https://binary.mirantis.com/core/helm/squid-proxy-1.37.24.tgz

storage-discovery

https://binary.mirantis.com/core/helm/storage-discovery-1.37.24.tgz

user-controller

https://binary.mirantis.com/core/helm/user-controller-1.37.24.tgz

vsphere-credentials-controller

https://binary.mirantis.com/core/helm/vsphere-credentials-controller-1.37.24.tgz

vsphere-provider

https://binary.mirantis.com/core/helm/vsphere-provider-1.37.24.tgz

vsphere-vm-template-controller

https://binary.mirantis.com/core/helm/vsphere-vm-template-controller-1.37.24.tgz

Docker images

admission-controller Updated

mirantis.azurecr.io/core/admission-controller:1.37.24

agent-controller Updated

mirantis.azurecr.io/core/agent-controller:1.37.24

ceph-kcc-controller Updated

mirantis.azurecr.io/core/ceph-kcc-controller:1.37.24

cert-manager-controller

mirantis.azurecr.io/core/external/cert-manager-controller:v1.11.0-2

client-certificate-controller Updated

mirantis.azurecr.io/core/client-certificate-controller:1.37.24

event-controller Updated

mirantis.azurecr.io/core/event-controller:1.37.24

frontend Updated

mirantis.azurecr.io/core/frontend:1.37.24

iam-controller Updated

mirantis.azurecr.io/core/iam-controller:1.37.24

kaas-exporter Updated

mirantis.azurecr.io/core/kaas-exporter:1.37.24

kproxy Updated

mirantis.azurecr.io/core/kproxy:1.37.24

lcm-controller Updated

mirantis.azurecr.io/core/lcm-controller:1.37.24

license-controller Updated

mirantis.azurecr.io/core/license-controller:1.37.24

machinepool-controller Updated

mirantis.azurecr.io/core/machinepool-controller:1.37.24

mcc-haproxy Updated

mirantis.azurecr.io/lcm/mcc-haproxy:v0.22.0-66-ga855169

mcc-keepalived Updated

mirantis.azurecr.io/lcm/mcc-keepalived:v0.22.0-66-ga855169

metrics-server

mirantis.azurecr.io/lcm/metrics-server-amd64:v0.6.3-2

nginx Updated

mirantis.azurecr.io/core/external/nginx:1.37.24

openstack-cloud-controller-manager

mirantis.azurecr.io/lcm/kubernetes/openstack-cloud-controller-manager-amd64:v1.24.5-10-g93314b86

openstack-cluster-api-controller Updated

mirantis.azurecr.io/core/openstack-cluster-api-controller:1.37.24

os-credentials-controller Updated

mirantis.azurecr.io/core/os-credentials-controller:1.37.24

portforward-controller Updated

mirantis.azurecr.io/core/portforward-controller:1.37.24

proxy-controller Updated

mirantis.azurecr.io/core/proxy-controller:1.37.24

rbac-controller Updated

mirantis.azurecr.io/core/rbac-controller:1.37.24

registry

mirantis.azurecr.io/lcm/registry:v2.8.1-4

release-controller Updated

mirantis.azurecr.io/core/release-controller:1.37.24

rhellicense-controller Updated

mirantis.azurecr.io/core/rhellicense-controller:1.37.24

scope-controller Updated

mirantis.azurecr.io/core/scope-controller:1.37.24

squid-proxy

mirantis.azurecr.io/lcm/squid-proxy:0.0.1-10-g24a0d69

storage-discovery Updated

mirantis.azurecr.io/core/storage-discovery:1.37.24

user-controller Updated

mirantis.azurecr.io/core/user-controller:1.37.24

vsphere-cluster-api-controller Updated

mirantis.azurecr.io/core/vsphere-cluster-api-controller:1.37.24

vsphere-credentials-controller Updated

mirantis.azurecr.io/core/vsphere-credentials-controller:1.37.24

vsphere-vm-template-controller Updated

mirantis.azurecr.io/core/vsphere-vm-template-controller:1.37.24

IAM artifacts

Artifact

Component

Path

Binaries

hash-generate-darwin

http://binary.mirantis.com/iam/bin/hash-generate-0.0.1-268-3cf7f17-darwin

hash-generate-linux

http://binary.mirantis.com/iam/bin/hash-generate-0.0.1-268-3cf7f17-linux

Helm charts

iam

https://binary.mirantis.com/iam/helm/iam-2.5.4.tgz

Docker images

keycloak

mirantis.azurecr.io/iam/keycloak:0.6.0

kubernetes-entrypoint

mirantis.azurecr.io/openstack/extra/kubernetes-entrypoint:v1.0.1-27d64fb-20230421151539

mariadb

mirantis.azurecr.io/general/mariadb:10.6.12-focal-20230423170220

Security notes

In total, 18 Common Vulnerabilities and Exposures (CVE) have been fixed in 2.24.4 since Container Cloud 2.24.3: 3 of critical and 15 of high severity.

The summary table contains the total number of unique CVEs along with the total number of issues fixed across the images.

The full list of the CVEs present in the current Container Cloud release is available at the Mirantis Security Portal.

Addressed CVEs - summary

Severity

Critical

High

Total

Unique CVEs

1

10

11

Total issues across images

3

15

18

Addressed CVEs - detailed

Image

Component name

CVE

iam/keycloak-gatekeeper

golang.org/x/crypto

CVE-2021-43565 (High)

CVE-2022-27191 (High)

CVE-2020-29652 (High)

golang.org/x/net

CVE-2022-27664 (High)

CVE-2021-33194 (High)

golang.org/x/text

CVE-2021-38561 (High)

CVE-2022-32149 (High)

github.com/prometheus/client_golang

CVE-2022-21698 (High)

scale/psql-client

busybox

CVE-2022-48174 (Critical)

busybox-binsh

CVE-2022-48174 (Critical)

ssl_client

CVE-2022-48174 (Critical)

libpq

CVE-2023-39417 (High)

postgresql13-client

CVE-2023-39417 (High)

stacklight/alerta-web

grpcio

CVE-2023-33953 (High)

libpq

CVE-2023-39417 (High)

postgresql15-client

CVE-2023-39417 (High)

stacklight/pgbouncer

libpq

CVE-2023-39417 (High)

postgresql-client

CVE-2023-39417 (High)

Addressed issues

The following issues have been addressed in the Container Cloud patch release 2.24.4 along with the patch Cluster releases 14.0.3 and 15.0.3.

  • [34200][Ceph] Fixed the watch command missing in the rook-ceph-tools Pod.

  • [34836][Ceph] Fixed ceph-disk-daemon spawning a lot of zombie processes.

2.24.3

The Container Cloud patch release 2.24.3, which is based on the 2.24.2 major release, provides the following updates:

  • Introduces support for the patch Cluster releases 14.0.2 and 15.0.2.

  • Supports the latest major Cluster releases 14.0.1 and 15.0.1.

  • Does not support greenfield deployments based on the deprecated Cluster release 14.0.0 along with the 12.7.x and 11.7.x series. Use the latest available Cluster releases instead.

For main deliverables of the parent Container Cloud releases of 2.24.3, refer to 2.24.0 and 2.24.1.

Artifacts

This section lists the components artifacts of the Container Cloud patch release 2.24.3. For artifacts of the Cluster releases introduced in 2.24.3, see Cluster releases 15.0.2 and 14.0.2.

Note

The components that are newly added, updated, deprecated, or removed as compared to the previous release version are marked with a corresponding superscript, for example, lcm-ansible Updated.

Bare metal artifacts

Artifact

Component

Path

Binaries

ironic-python-agent.initramfs

https://binary.mirantis.com/bm/bin/ironic/ipa/ubuntu/initramfs-yoga-focal-debug-20230606121129

ironic-python-agent.kernel

https://binary.mirantis.com/bm/bin/ironic/ipa/ubuntu/kernel-yoga-focal-debug-20230606121129

provisioning_ansible

https://binary.mirantis.com/bm/bin/ansible/provisioning_ansible-0.1.1-104-6e2e82c.tgz

Helm charts Updated

baremetal-api

https://binary.mirantis.com/core/helm/baremetal-api-1.37.23.tgz

baremetal-operator

https://binary.mirantis.com/core/helm/baremetal-operator-1.37.23.tgz

baremetal-provider

https://binary.mirantis.com/core/helm/baremetal-provider-1.37.23.tgz

baremetal-public-api

https://binary.mirantis.com/core/helm/baremetal-public-api-1.37.23.tgz

kaas-ipam

https://binary.mirantis.com/core/helm/kaas-ipam-1.37.23.tgz

local-volume-provisioner

https://binary.mirantis.com/core/helm/local-volume-provisioner-1.37.23.tgz

metallb

https://binary.mirantis.com/core/helm/metallb-1.37.23.tgz

Docker images

ambasador Updated

mirantis.azurecr.io/core/external/nginx:1.37.23

baremetal-dnsmasq Updated

mirantis.azurecr.io/bm/baremetal-dnsmasq:base-alpine-20230810152159

baremetal-operator Updated

mirantis.azurecr.io/bm/baremetal-operator:base-alpine-20230803175048

bm-collective Updated

mirantis.azurecr.io/bm/bm-collective:base-alpine-20230810134945

cluster-api-provider-baremetal Updated

mirantis.azurecr.io/core/cluster-api-provider-baremetal:1.37.23

ironic Updated

mirantis.azurecr.io/openstack/ironic:yoga-focal-20230810113432

ironic-inspector Updated

mirantis.azurecr.io/openstack/ironic-inspector:yoga-focal-20230810113432

ironic-prometheus-exporter

mirantis.azurecr.io/stacklight/ironic-prometheus-exporter:0.1-20230531081117

kaas-ipam Updated

mirantis.azurecr.io/bm/kaas-ipam:base-alpine-20230810155639

kubernetes-entrypoint Updated

mirantis.azurecr.io/openstack/extra/kubernetes-entrypoint:v1.0.1-5359171-20230810125608

mariadb Updated

mirantis.azurecr.io/general/mariadb:10.6.14-focal-20230730124341

mcc-keepalived Updated

mirantis.azurecr.io/lcm/mcc-keepalived:v0.22.0-63-g8f4f248

metallb-controller Updated

mirantis.azurecr.io/bm/metallb/controller:v0.13.9-53df4a9c-amd64

metallb-speaker Updated

mirantis.azurecr.io/bm/metallb/speaker:v0.13.9-53df4a9c-amd64

syslog-ng Updated

mirantis.azurecr.io/bm/syslog-ng:base-apline-20230814110635

Core artifacts

Artifact

Component

Path

Bootstrap tarball Updated

bootstrap-darwin

https://binary.mirantis.com/core/bin/bootstrap-darwin-1.37.23.tgz

bootstrap-linux

https://binary.mirantis.com/core/bin/bootstrap-linux-1.37.23.tgz

Helm charts

admission-controller Updated

https://binary.mirantis.com/core/helm/admission-controller-1.37.23.tgz

agent-controller Updated

https://binary.mirantis.com/core/helm/agent-controller-1.37.23.tgz

ceph-kcc-controller Updated

https://binary.mirantis.com/core/helm/ceph-kcc-controller-1.37.23.tgz

cert-manager Updated

https://binary.mirantis.com/core/helm/cert-manager-1.37.23.tgz

client-certificate-controller Updated

https://binary.mirantis.com/core/helm/client-certificate-controller-1.37.23.tgz

event-controller Updated

https://binary.mirantis.com/core/helm/event-controller-1.37.23.tgz

iam-controller Updated

https://binary.mirantis.com/core/helm/iam-controller-1.37.23.tgz

kaas-exporter Updated

https://binary.mirantis.com/core/helm/kaas-exporter-1.37.23.tgz

kaas-public-api Updated

https://binary.mirantis.com/core/helm/kaas-public-api-1.37.23.tgz

kaas-ui Updated

https://binary.mirantis.com/core/helm/kaas-ui-1.37.23.tgz

lcm-controller Updated

https://binary.mirantis.com/core/helm/lcm-controller-1.37.23.tgz

license-controller Updated

https://binary.mirantis.com/core/helm/license-controller-1.37.23.tgz

machinepool-controller Updated

https://binary.mirantis.com/core/helm/machinepool-controller-1.37.23.tgz

mcc-cache Updated

https://binary.mirantis.com/core/helm/mcc-cache-1.37.23.tgz

mcc-cache-warmup Updated

https://binary.mirantis.com/core/helm/mcc-cache-warmup-1.37.23.tgz

metrics-server Updated

https://binary.mirantis.com/core/helm/metrics-server-1.37.23.tgz

openstack-provider Updated

https://binary.mirantis.com/core/helm/openstack-provider-1.37.23.tgz

os-credentials-controller Updated

https://binary.mirantis.com/core/helm/os-credentials-controller-1.37.23.tgz

portforward-controller Updated

https://binary.mirantis.com/core/helm/portforward-controller-1.37.23.tgz

proxy-controller Updated

https://binary.mirantis.com/core/helm/proxy-controller-1.37.23.tgz

rbac-controller Updated

https://binary.mirantis.com/core/helm/rbac-controller-1.37.23.tgz

release-controller Updated

https://binary.mirantis.com/core/helm/release-controller-1.37.23.tgz

rhellicense-controller Updated

https://binary.mirantis.com/core/helm/rhellicense-controller-1.37.23.tgz

scope-controller Updated

https://binary.mirantis.com/core/helm/scope-controller-1.37.23.tgz

squid-proxy Updated

https://binary.mirantis.com/core/helm/squid-proxy-1.37.23.tgz

storage-discovery Updated

https://binary.mirantis.com/core/helm/storage-discovery-1.37.23.tgz

user-controller Updated

https://binary.mirantis.com/core/helm/user-controller-1.37.23.tgz

vsphere-credentials-controller Updated

https://binary.mirantis.com/core/helm/vsphere-credentials-controller-1.37.23.tgz

vsphere-provider Updated

https://binary.mirantis.com/core/helm/vsphere-provider-1.37.23.tgz

vsphere-vm-template-controller Updated

https://binary.mirantis.com/core/helm/vsphere-vm-template-controller-1.37.23.tgz

Docker images

admission-controller Updated

mirantis.azurecr.io/core/admission-controller:1.37.23

agent-controller Updated

mirantis.azurecr.io/core/agent-controller:1.37.23

ceph-kcc-controller Updated

mirantis.azurecr.io/core/ceph-kcc-controller:1.37.23

cert-manager-controller Updated

mirantis.azurecr.io/core/external/cert-manager-controller:v1.11.0-2

client-certificate-controller Updated

mirantis.azurecr.io/core/client-certificate-controller:1.37.23

event-controller Updated

mirantis.azurecr.io/core/event-controller:1.37.23

frontend Updated

mirantis.azurecr.io/core/frontend:1.37.23

iam-controller Updated

mirantis.azurecr.io/core/iam-controller:1.37.23

kaas-exporter Updated

mirantis.azurecr.io/core/kaas-exporter:1.37.23

kproxy Updated

mirantis.azurecr.io/core/kproxy:1.37.23

lcm-controller Updated

mirantis.azurecr.io/core/lcm-controller:1.37.23

license-controller Updated

mirantis.azurecr.io/core/license-controller:1.37.23

machinepool-controller Updated

mirantis.azurecr.io/core/machinepool-controller:1.37.23

mcc-haproxy Updated

mirantis.azurecr.io/lcm/mcc-haproxy:v0.22.0-63-g8f4f248

mcc-keepalived Updated

mirantis.azurecr.io/lcm/mcc-keepalived:v0.22.0-63-g8f4f248

metrics-server

mirantis.azurecr.io/lcm/metrics-server-amd64:v0.6.3-2

nginx Updated

mirantis.azurecr.io/core/external/nginx:1.37.23

openstack-cloud-controller-manager

mirantis.azurecr.io/lcm/kubernetes/openstack-cloud-controller-manager-amd64:v1.24.5-10-g93314b86

openstack-cluster-api-controller Updated

mirantis.azurecr.io/core/openstack-cluster-api-controller:1.37.23

os-credentials-controller Updated

mirantis.azurecr.io/core/os-credentials-controller:1.37.23

portforward-controller Updated

mirantis.azurecr.io/core/portforward-controller:1.37.23

proxy-controller Updated

mirantis.azurecr.io/core/proxy-controller:1.37.23

rbac-controller Updated

mirantis.azurecr.io/core/rbac-controller:1.37.23

registry

mirantis.azurecr.io/lcm/registry:v2.8.1-4

release-controller Updated

mirantis.azurecr.io/core/release-controller:1.37.23

rhellicense-controller Updated

mirantis.azurecr.io/core/rhellicense-controller:1.37.23

scope-controller Updated

mirantis.azurecr.io/core/scope-controller:1.37.23

squid-proxy

mirantis.azurecr.io/lcm/squid-proxy:0.0.1-10-g24a0d69

storage-discovery Updated

mirantis.azurecr.io/core/storage-discovery:1.37.23

user-controller Updated

mirantis.azurecr.io/core/user-controller:1.37.23

vsphere-cluster-api-controller Updated

mirantis.azurecr.io/core/vsphere-cluster-api-controller:1.37.23

vsphere-credentials-controller Updated

mirantis.azurecr.io/core/vsphere-credentials-controller:1.37.23

vsphere-vm-template-controller Updated

mirantis.azurecr.io/core/vsphere-vm-template-controller:1.37.23

IAM artifacts

Artifact

Component

Path

Binaries

hash-generate-darwin

http://binary.mirantis.com/iam/bin/hash-generate-0.0.1-268-3cf7f17-darwin

hash-generate-linux

http://binary.mirantis.com/iam/bin/hash-generate-0.0.1-268-3cf7f17-linux

Helm charts

iam Updated

https://binary.mirantis.com/iam/helm/iam-2.5.4.tgz

Docker images

keycloak

mirantis.azurecr.io/iam/keycloak:0.6.0

kubernetes-entrypoint

mirantis.azurecr.io/openstack/extra/kubernetes-entrypoint:v1.0.1-27d64fb-20230421151539

mariadb

mirantis.azurecr.io/general/mariadb:10.6.12-focal-20230423170220

Security notes

In total, 63 Common Vulnerabilities and Exposures (CVE) of high severity have been fixed in 2.24.3 since Container Cloud 2.24.1.

The summary table contains the total number of unique CVEs along with the total number of issues fixed across the images.

The full list of the CVEs present in the current Container Cloud release is available at the Mirantis Security Portal.

Addressed CVEs - summary

Severity

Critical

High

Total

Unique CVEs

0

15

15

Total issues across images

0

63

63

Addressed CVEs - detailed

Image

Component name

CVE

bm/external/metallb/controller

libcrypto3

CVE-2023-0464 (High)

CVE-2023-2650 (High)

libssl3

CVE-2023-2650 (High)

CVE-2023-0464 (High)

golang.org/x/net

CVE-2022-41723 (High)

bm/external/metallb/speaker

libcrypto3

CVE-2023-2650 (High)

CVE-2023-0464 (High)

libssl3

CVE-2023-0464 (High)

CVE-2023-2650 (High)

golang.org/x/net

CVE-2022-41723 (High)

core/external/cert-manager-cainjector

golang.org/x/net

CVE-2022-41723 (High)

core/external/cert-manager-controller

golang.org/x/net

CVE-2022-41723 (High)

core/external/cert-manager-webhook

golang.org/x/net

CVE-2022-41723 (High)

core/external/nginx

nghttp2-libs

CVE-2023-35945 (High)

core/frontend

nghttp2-libs

CVE-2023-35945 (High)

lcm/external/csi-attacher

github.com/prometheus/client_golang

CVE-2022-21698 (High)

golang.org/x/net

CVE-2022-27664 (High)

golang.org/x/text

CVE-2022-32149 (High)

gopkg.in/yaml.v3

CVE-2022-28948 (High)

lcm/external/csi-node-driver-registrar

github.com/prometheus/client_golang

CVE-2022-21698 (High)

golang.org/x/net

CVE-2022-27664 (High)

golang.org/x/text

CVE-2022-32149 (High)

lcm/external/csi-provisioner

golang.org/x/crypto

CVE-2021-43565 (High)

CVE-2022-27191 (High)

github.com/prometheus/client_golang

CVE-2022-21698 (High)

golang.org/x/net

CVE-2022-27664 (High)

golang.org/x/text

CVE-2022-32149 (High)

gopkg.in/yaml.v3

CVE-2022-28948 (High)

lcm/external/csi-resizer

github.com/prometheus/client_golang

CVE-2022-21698 (High)

golang.org/x/net

CVE-2022-27664 (High)

golang.org/x/text

CVE-2022-32149 (High)

gopkg.in/yaml.v3

CVE-2022-28948 (High)

lcm/external/csi-snapshotter

github.com/prometheus/client_golang

CVE-2022-21698 (High)

golang.org/x/net

CVE-2022-27664 (High)

golang.org/x/text

CVE-2022-32149 (High)

gopkg.in/yaml.v3

CVE-2022-28948 (High)

lcm/external/livenessprobe

golang.org/x/text

CVE-2021-38561 (High)

CVE-2022-32149 (High)

github.com/prometheus/client_golang

CVE-2022-21698 (High)

golang.org/x/net

CVE-2022-27664 (High)

lcm/kubernetes/cinder-csi-plugin-amd64

libpython3.7-minimal

CVE-2021-3737 (High)

CVE-2020-10735 (High)

CVE-2022-45061 (High)

CVE-2015-20107 (High)

libpython3.7-stdlib

CVE-2021-3737 (High)

CVE-2020-10735 (High)

CVE-2022-45061 (High)

CVE-2015-20107 (High)

python3.7

CVE-2021-3737 (High)

CVE-2020-10735 (High)

CVE-2022-45061 (High)

CVE-2015-20107 (High)

python3.7-minimal

CVE-2021-3737 (High)

CVE-2020-10735 (High)

CVE-2022-45061 (High)

CVE-2015-20107 (High)

libssl1.1

CVE-2023-2650 (High)

CVE-2023-0464 (High)

openssl

CVE-2023-2650 (High)

CVE-2023-0464 (High)

lcm/mcc-haproxy

nghttp2-libs

CVE-2023-35945 (High)

openstack/ironic

cryptography

CVE-2023-2650 (High)

openstack/ironic-inspector

cryptography

CVE-2023-2650 (High)

Addressed issues

The following issues have been addressed in the Container Cloud patch release 2.24.3 along with the patch Cluster releases 14.0.2 and 15.0.2.

  • [34638][BM] Fixed the issue with failure to delete a management cluster due to the issue with secrets during machine deletion.

  • [34220][BM] Fixed the issue with ownerReferences being lost for HardwareData after pivoting during a management cluster bootstrap.

  • [34280][LCM] Fixed the issue with no cluster reconciles being generated if a cluster is stuck waiting for the agents upgrade.

  • [33439][TLS] Fixed the issue with client-certificate-controller silently replacing the user-provided key if the PEM header and the key format do not match.

  • [33686][audit] Fixed the issue with rules provided by the docker auditd preset not covering the Sysdig Docker CIS benchmark.

  • [34080][StackLight] Fixed the issue with missing events in OpenSearch that have lastTimestamp set to null and eventTime set to a non-null value.

2.24.2

The Container Cloud major release 2.24.2, which is based on 2.24.0 and 2.24.1, provides the following:

  • Introduces support for the major Cluster release 15.0.1 that is based on the Cluster release 14.0.1 and represents Mirantis OpenStack for Kubernetes (MOSK) 23.2. This Cluster release is based on the updated version of Mirantis Kubernetes Engine 3.6.5 with Kubernetes 1.24 and Mirantis Container Runtime 20.10.17.

  • Supports the latest Cluster release 14.0.1.

  • Does not support greenfield deployments based on the deprecated Cluster release 14.0.0 along with the 12.7.x and 11.7.x series. Use the latest available Cluster releases of the series instead.

For main deliverables of the Container Cloud release 2.24.2, refer to its parent release 2.24.0.

Caution

Make sure to update the Cluster release version of your managed cluster before the current Cluster release version becomes unsupported by a new Container Cloud release version. Otherwise, Container Cloud stops auto-upgrade and eventually Container Cloud itself becomes unsupported.

2.24.1

The Container Cloud patch release 2.24.1 based on 2.24.0 includes updated baremetal-operator, admission-controller, and iam artifacts and provides hot fixes for the following issues:

  • [34218] Fixed the issue with the iam-keycloak Pod being stuck in the Pending state during Keycloak upgrade to version 21.1.1.

  • [34247] Fixed the issue with MKE backup failing during cluster update due to wrong permissions in the etcd backup directory. If the issue still persists, which may occur on clusters that were originally deployed using early Container Cloud releases delivered in 2020-2021, follow the workaround steps described in Known issues: LCM.

Note

Container Cloud patch release 2.24.1 does not introduce new Cluster releases.

For main deliverables of the Container Cloud release 2.24.1, refer to its parent release 2.24.0.

Caution

Make sure to update the Cluster release version of your managed cluster before the current Cluster release version becomes unsupported by a new Container Cloud release version. Otherwise, Container Cloud stops auto-upgrade and eventually Container Cloud itself becomes unsupported.

2.24.0

Important

Container Cloud 2.24.0 has been successfully applied to a certain number of clusters. The 2.24.0-related documentation content fully applies to these clusters.

If your cluster started to update but was reverted to the previous product version or the update is stuck, you automatically receive the 2.24.1 patch release with the bug fixes to unblock the update to the 2.24 series.

There is no impact on the cluster workloads. For details on the patch release, see 2.24.1.

The Mirantis Container Cloud GA release 2.24.0:

  • Introduces support for the Cluster release 14.0.0 that is based on Mirantis Container Runtime 20.10.17 and Mirantis Kubernetes Engine 3.6.5 with Kubernetes 1.24.

  • Supports the latest major and patch Cluster releases of the 12.7.x series that supports Mirantis OpenStack for Kubernetes (MOSK) 23.1 series.

  • Does not support greenfield deployments on deprecated Cluster releases 12.7.3, 11.7.4, or earlier patch releases, 12.5.0, or 11.7.0. Use the latest available Cluster releases of the series instead.

    Caution

    Make sure to update the Cluster release version of your managed cluster before the current Cluster release version becomes unsupported by a new Container Cloud release version. Otherwise, Container Cloud stops auto-upgrade and eventually Container Cloud itself becomes unsupported.

This section outlines release notes for the Container Cloud release 2.24.0.

Enhancements

This section outlines new features and enhancements introduced in the Mirantis Container Cloud release 2.24.0. For the list of enhancements in the Cluster release 14.0.0 that is introduced by the Container Cloud release 2.24.0, see the Cluster release 14.0.0.

Automated upgrade of operating system on bare metal clusters

Support status of the feature

  • Since MOSK 23.2, the feature is generally available for MOSK clusters.

  • Since Container Cloud 2.24.2, the feature is generally available for any type of bare metal clusters.

  • Since Container Cloud 2.24.0, the feature is available as Technology Preview for management and regional clusters only.

Implemented automatic in-place upgrade of an operating system (OS) distribution on bare metal clusters. The OS upgrade occurs as part of a cluster update that requires a machine reboot. The OS upgrade workflow is as follows:

  1. The distribution ID value is taken from the id field of the distribution from the allowedDistributions list in the spec of the ClusterRelease object.

  2. The distribution that has the default: true value is used during update. This distribution ID is set in the spec:providerSpec:value:distribution field of the Machine object during cluster update.

On management and regional clusters, the operating system upgrades automatically during cluster update. For managed clusters, an in-place OS distribution upgrade should be performed between cluster updates. This scenario implies machine cordoning, draining, and reboot.
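
The following is a minimal sketch of a Machine object fragment with the distribution field set. The apiVersion and the ubuntu/focal distribution ID are assumptions used for illustration only; the actual IDs come from the allowedDistributions list of the ClusterRelease object in your environment.

    apiVersion: cluster.k8s.io/v1alpha1
    kind: Machine
    metadata:
      name: master-0
    spec:
      providerSpec:
        value:
          # Set automatically during cluster update to the distribution
          # that has default: true in the ClusterRelease object
          distribution: ubuntu/focal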

Warning

During the course of the Container Cloud 2.28.x series, Mirantis highly recommends upgrading the operating system on all nodes of your managed clusters to Ubuntu 22.04 before the next major Cluster release becomes available.

It is not mandatory to upgrade all machines at once. You can upgrade them one by one or in small batches, for example, if the maintenance window is limited in time.

Otherwise, the Cluster release update of the Ubuntu 20.04-based managed clusters will become impossible as of Container Cloud 2.29.0 with Ubuntu 22.04 as the only supported version.

Management cluster update to Container Cloud 2.29.1 will be blocked if at least one node of any related managed cluster is running Ubuntu 20.04.

Support for WireGuard on bare metal clusters

TechPreview

Added initial Technology Preview support for WireGuard that enables traffic encryption on the Kubernetes workloads network. Set secureOverlay: true in the Cluster object during deployment of management, regional, or managed bare metal clusters to enable WireGuard encryption.

Also, added the possibility to configure the maximum transmission unit (MTU) size for Calico that is required for the WireGuard functionality and allows maximizing network performance.
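
The following is a minimal sketch of enabling WireGuard in the Cluster object. The placement of secureOverlay under spec:providerSpec:value is an assumption based on other provider settings; the Calico MTU configuration is not shown.

    spec:
      providerSpec:
        value:
          # Enables WireGuard encryption for the Kubernetes workloads network
          secureOverlay: true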

Note

For MOSK-based deployments, the feature support is available since MOSK 23.2.

MetalLB configuration changes for bare metal and vSphere

For management and regional clusters

Caution

For managed clusters, this object is available as Technology Preview and will become generally available in one of the following Container Cloud releases.

Introduced the following MetalLB configuration changes and objects related to address allocation and announcement of load balancer (LB) services for the bare metal and vSphere providers:

  • Introduced the MetalLBConfigTemplate object for bare metal and the MetalLBConfig object for vSphere as the default and recommended configuration method.

  • For vSphere, a separate MetalLBConfig object is now created during creation of clusters of any type instead of the corresponding settings in the Cluster object.

  • The use of either Subnet objects without the new MetalLB objects or the configInline MetalLB value of the Cluster object is deprecated and will be removed in one of the following releases.

  • If the MetalLBConfig object is not used for the MetalLB configuration related to address allocation and announcement of LB services, automated migration applies during creation of clusters of any type or during cluster update to Container Cloud 2.24.0.

    During automated migration, the MetalLBConfig and MetalLBConfigTemplate objects for bare metal, or the MetalLBConfig object for vSphere, are created, and the contents of the MetalLB chart configInline value are converted to the parameters of the MetalLBConfigTemplate object for bare metal or of the MetalLBConfig object for vSphere.

The following changes apply to the bare metal bootstrap procedure:

  • Moved the following environment variables from cluster.yaml.template to the dedicated ipam-objects.yaml.template:

    • BOOTSTRAP_METALLB_ADDRESS_POOL

    • KAAS_BM_BM_DHCP_RANGE

    • SET_METALLB_ADDR_POOL

    • SET_LB_HOST

  • Modified the default network configuration. Now it includes a bond interface and separated PXE and management networks. Mirantis recommends using separate PXE and management networks for management and regional clusters.

Support for RHEL 8.7 on the vSphere provider

TechPreview

Added support for RHEL 8.7 on the vSphere-based management, regional, and managed clusters.

Custom flavors for Octavia on OpenStack-based clusters

Implemented the possibility to use custom Octavia Amphora flavors that you can enable in the spec:providerSpec section of the Cluster object using serviceAnnotations:loadbalancer.openstack.org/flavor-id during management or regional cluster deployment.
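
The following sketch illustrates the corresponding Cluster object fragment. The flavor UUID is a placeholder, and the placement of serviceAnnotations under spec:providerSpec:value is an assumption based on other provider settings.

    spec:
      providerSpec:
        value:
          serviceAnnotations:
            # UUID of a precreated custom Octavia Amphora flavor
            loadbalancer.openstack.org/flavor-id: <octavia-flavor-uuid>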

Note

For managed clusters, you can enable the feature through the Container Cloud API. The web UI functionality will be added in one of the following Container Cloud releases.

Deletion of persistent volumes during an OpenStack-based cluster deletion

Completed the development of persistent volumes deletion during an OpenStack-based managed cluster deletion by implementing the Delete all volumes in the cluster check box in the cluster deletion menu of the Container Cloud web UI.

Support for Keycloak Quarkus

Upgraded the Keycloak major version from 18.0.0 to 21.1.1. For the list of new features and enhancements, see Keycloak Release Notes.

The upgrade path is fully automated. No data migration or custom LCM changes are required.

Important

After the Keycloak upgrade, access the Keycloak Admin Console using the new URL format: https://<keycloak.ip>/auth instead of https://<keycloak.ip>. Otherwise, the Resource not found error displays in a browser.

Custom host names for cluster machines

TechPreview

Added initial Technology Preview support for custom host names of machines on any supported provider and any cluster type. When enabled, any machine host name in a particular region matches the related Machine object name. For example, instead of the default kaas-node-<UID>, a machine host name will be master-0. The custom naming format is more convenient and easier to operate with.

You can enable the feature before or after management or regional cluster deployment. If enabled after deployment, custom host names will apply to all newly deployed machines in the region. Existing host names will remain the same.

Parallel update of worker nodes

TechPreview

Added initial Technology Preview support for parallelizing node update operations, which significantly improves the efficiency of cluster updates. To configure the parallel node update, use the following parameters located under spec.providerSpec of the Cluster object, as illustrated in the sketch after this list:

  • maxWorkerUpgradeCount - maximum number of worker nodes for simultaneous update to limit machine draining during update

  • maxWorkerPrepareCount - maximum number of workers for artifacts downloading to limit network load during update
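
The following is a minimal sketch of these parameters in the Cluster object. The values are illustrative, and the placement under spec:providerSpec:value is an assumption based on other provider settings.

    spec:
      providerSpec:
        value:
          # Limits machine draining during update
          maxWorkerUpgradeCount: 5
          # Limits network load caused by artifacts downloading during update
          maxWorkerPrepareCount: 10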

Note

For MOSK clusters, you can start using this feature during cluster update from 23.1 to 23.2. For details, see MOSK documentation: Parallelizing node update operations.

Cache warm-up for managed clusters

Implemented the CacheWarmupRequest resource to predownload, aka warm up, a list of artifacts included in a given set of Cluster releases into the mcc-cache service only once per release. The feature facilitates and speeds up deployment and update of managed clusters.

After a successful cache warm-up, the object of the CacheWarmupRequest resource is automatically deleted from the cluster, and the cache remains available for managed cluster deployment or update until the next Container Cloud auto-upgrade of the management or regional cluster.
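
The following is a hypothetical sketch of a CacheWarmupRequest object. The apiVersion, the spec field names such as clusterReleases, and the release names are assumptions for illustration only; refer to the Operations Guide for the exact resource schema.

    apiVersion: kaas.mirantis.com/v1alpha1
    kind: CacheWarmupRequest
    metadata:
      name: example-warmup
      namespace: default
    spec:
      # Cluster releases whose artifacts are predownloaded into mcc-cache
      clusterReleases:
      - mke-14-0-0
      - mosk-15-0-1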

Caution

If the disk space for cache runs out, the cache for the oldest object is evicted. To avoid running out of space in the cache, verify and adjust its size before each cache warm-up.

Note

For MOSK-based deployments, the feature support is available since MOSK 23.2.

Support for auditd

TechPreview

Added initial Technology Preview support for the Linux Audit daemon auditd to monitor the activity of cluster processes on any type of Container Cloud cluster. The feature is an essential requirement of many security guides because it enables auditing of any cluster process to detect potentially malicious activity.

You can enable and configure auditd either during or after cluster deployment using the Cluster object.
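
The following sketch illustrates enabling auditd through the Cluster object. The audit:auditd section layout and the presetRules value are assumptions for illustration only; refer to the Operations Guide for the supported fields and preset names.

    spec:
      providerSpec:
        value:
          audit:
            auditd:
              enabled: true
              # Example preset; the docker preset is one of the rule sets
              # referenced in this release
              presetRules: docker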

Note

For MOSK-based deployments, the feature support is available since MOSK 23.2.

Enhancements for TLS certificates configuration

TechPreview

Enhanced TLS certificates configuration for cluster applications:

  • Added support for configuring TLS certificates for MKE on management and regional clusters, in addition to the existing support on managed clusters.

  • Implemented the ability to configure TLS certificates using the Container Cloud web UI through the Security section located in the More > Configure cluster menu.

Graceful cluster reboot using web UI

Expanded the capability to perform a graceful reboot on a management, regional, or managed cluster for all supported providers by adding the Reboot machines option to the cluster menu in the Container Cloud web UI. The feature allows for a rolling reboot of all cluster machines without workloads interruption. The reboot occurs in the order defined by the cluster upgrade policy.

Note

For MOSK-based deployments, the feature support is available since MOSK 23.2.

Creation and deletion of bare metal host credentials using web UI

Improved management of bare metal host credentials using the Container Cloud web UI:

  • Added the Add Credential menu to the Credentials tab. The feature facilitates association of credentials with bare metal hosts created using the BM Hosts tab.

  • Implemented automatic deletion of credentials during deletion of bare metal hosts after deletion of managed cluster.

Node labeling improvements in web UI

Improved the Node Labels menu in the Container Cloud web UI by making it more intuitive. Replaced the greyed-out (disabled) label names with the No labels have been assigned to this machine message and the Add a node label button.

Also, added the possibility to configure node labels for machine pools after deployment using the More > Configure Pool option.

Documentation enhancements

On top of continuous improvements delivered to the existing Container Cloud guides, added the documentation on managing Ceph OSDs with a separate metadata device.

Addressed issues

The following issues have been addressed in the Mirantis Container Cloud release 2.24.0 along with the Cluster release 14.0.0. For the list of hot fixes delivered in the 2.24.1 patch release, see 2.24.1.

  • [5981] Fixed the issue with upgrade of a cluster containing more than 120 nodes getting stuck on one node with errors about IP address exhaustion in the Docker logs. On existing clusters, after updating to the Cluster release 14.0.0 or later, you can optionally remove the abandoned mke-overlay network using docker network rm mke-overlay.

  • [29604] Fixed the issue with the false positive failed to get kubeconfig error occurring on the Waiting for TLS settings to be applied stage during TLS configuration.

  • [29762] Fixed the issue with a wrong IP address being assigned after the MetalLB controller restart.

  • [30635] Fixed the issue with the pg_autoscaler module of Ceph Manager failing with the pool <poolNumber> has overlapping roots error if a Ceph cluster contains a mix of pools with deviceClass either explicitly specified or not specified.

  • [30857] Fixed the issue with irrelevant error message displaying in the osd-prepare Pod during the deployment of Ceph OSDs on removable devices on AMD nodes. Now, the error message clearly states that removable devices (with hotplug enabled) are not supported for deploying Ceph OSDs. This issue has been addressed since the Cluster release 14.0.0.

  • [30781] Fixed the issue with cAdvisor failing to collect metrics on CentOS-based deployments. Missing metrics affected the KubeContainersCPUThrottlingHigh alert and the following Grafana dashboards: Kubernetes Containers, Kubernetes Pods, and Kubernetes Namespaces.

  • [31288] Fixed the issue with the Fluentd agent failing and the fluentd-logs Pods reporting the maximum open shards limit error, thus preventing OpenSearch from accepting new logs. The fix enables the possibility to increase the limit for maximum open shards using cluster.max_shards_per_node. For details, see Tune StackLight for long-term log retention.

  • [31485] Fixed the issue with Elasticsearch Curator not deleting indices according to the configured retention period on any type of Container Cloud clusters.

Known issues

This section lists known issues with workarounds for the Mirantis Container Cloud releases 2.24.0 and 2.24.1, including the Cluster release 14.0.0.

For other issues that can occur while deploying and operating a Container Cloud cluster, see Deployment Guide: Troubleshooting and