This documentation provides information on how to deploy and operate a
Mirantis OpenStack for Kubernetes (MOSK) environment.
The documentation is intended to help operators understand the core
concepts of the product and provides sufficient information to deploy
and operate the solution.
The information provided in this documentation set is constantly being
improved and amended based on the feedback and requests from the
consumers of MOSK.
The following table lists the guides included in the documentation set you
are reading:
This documentation is intended for engineers who have basic knowledge of
Linux, virtualization and containerization technologies, the Kubernetes API and
CLI, Helm and Helm charts, Mirantis Kubernetes Engine (MKE), and OpenStack.
GUI elements that include any part of interactive user interface and
menu navigation
Superscript
Some extra, brief information
Note
The Note block
Messages of a generic meaning that may be useful for the user
Caution
The Caution block
Information that prevents a user from making mistakes and experiencing
undesirable consequences when following the procedures
Warning
The Warning block
Messages that include details that can be easily missed, but should not
be ignored by the user and are valuable before proceeding
See also
The See also block
List of references that may be helpful for understanding some related
tools, concepts, and so on
Learn more
The Learn more block
Used in the Release Notes to wrap a list of internal references to
the reference architecture, deployment and operation procedures specific
to a newly implemented product feature
Mirantis OpenStack for Kubernetes (MOSK) combines the power of
Mirantis Container Cloud for delivering and managing Kubernetes clusters, with
the industry standard OpenStack APIs, enabling you to build your own cloud
infrastructure.
The advantages of running all of the OpenStack components as a Kubernetes
application are manifold and include the following:
Zero downtime, non-disruptive updates
Fully automated Day-2 operations
Full-stack management from bare metal through the operating system and
all the necessary components
The list of the most common use cases includes:
Software-defined data center
The traditional data center requires multiple requests and interactions
to deploy new services. By abstracting the data center functionality
behind a standardized set of APIs, services can be deployed faster and
more efficiently. MOSK enables you to define all your
data center resources behind the industry-standard OpenStack APIs, allowing you
to automate the deployment of applications or simply request resources
through the UI to quickly and efficiently provision virtual machines,
storage, networking, and other resources.
Virtual Network Functions (VNFs)
VNFs require high performance systems that can be accessed on demand in
a standardized way, with assurances that they will have access to the
necessary resources and performance guarantees when needed.
MOSK provides extensive support for VNF workloads, enabling
easy access to functionality
such as Intel EPA (NUMA, CPU pinning, Huge Pages) as well as the consumption
of specialized network interface cards to support SR-IOV and DPDK.
The centralized management model of MOSK and Mirantis
Container Cloud also enables the easy management of multiple
MOSK deployments with full lifecycle management.
Legacy workload migration
With the industry moving toward cloud-native technologies, many older or
legacy applications cannot be moved easily, and often it does not
make financial sense to transform them into cloud-native
applications. MOSK provides a stable cloud platform that
can cost-effectively host legacy applications whilst still providing the
expected levels of control, customization, and uptime.
Mirantis OpenStack for Kubernetes (MOSK) is a virtualization
platform that provides an infrastructure for cloud-ready applications,
in combination with reliability and full control over the data.
MOSK combines OpenStack, an open-source cloud
infrastructure software, with application management techniques used
in the Kubernetes ecosystem that include container isolation, state
enforcement, declarative definition of deployments, and others.
MOSK integrates with Mirantis Container Cloud to rely
on its capabilities for bare-metal infrastructure provisioning, Kubernetes
cluster management, and continuous delivery of the stack components.
MOSK simplifies the work of a cloud operator by
automating all major cloud life cycle management routines including
cluster updates and upgrades.
A Mirantis OpenStack for Kubernetes (MOSK) deployment profile is a
thoroughly tested and officially supported reference architecture that is
guaranteed to work at a specific scale and is tailored to the demands of
a specific business case, such as generic IaaS cloud, Network Function
Virtualization infrastructure, Edge Computing, and others.
A deployment profile is defined as a combination of:
Services and features the cloud offers to its users.
Non-functional characteristics that users and operators should expect when
running the profile on top of a reference hardware configuration. Including,
but not limited to:
Performance characteristics, such as an average network throughput between
VMs in the same virtual network.
Reliability characteristics, such as the cloud API error response rate when
recovering a failed controller node.
Scalability characteristics, such as the total number of virtual routers
tenants can run simultaneously.
Hardware requirements - the specification of physical servers and
networking equipment required to run the profile in production.
Deployment parameters that a cloud operator can tweak within a
certain range without being afraid of breaking the cloud or losing support.
In addition, the following items may be included in a definition:
Compliance-driven technical requirements, such as TLS encryption of all
external API endpoints.
Foundation-level software components, such as Tungsten Fabric or
Open vSwitch as a backend for the networking service.
Note
Mirantis reserves the right to revise the technical implementation of any
profile at will while preserving its definition - the functional
and non-functional characteristics that operators and users are known
to rely on.
MOSK supports multiple deployment profiles
to address a wide variety of business tasks. The table below includes the
profiles for the most common use cases.
Note
Some components of a MOSK cluster are mandatory and
are installed
during the managed cluster deployment by Container Cloud regardless of the
deployment profile in use. StackLight is one of the cluster components that
are enabled by default. See Container Cloud Operations Guide
for details.
Provides the core set of services an IaaS vendor would need,
including some extra functionality. The profile is designed to
support 50-70 compute nodes and a reasonable number of
storage nodes.
The core set of services provided by the profile includes:
Compute (Nova)
Images (Glance)
Networking (Neutron with Open vSwitch as a backend)
The HelmBundle Operator is the realization of the Kubernetes Operator
pattern that provides a Kubernetes custom resource of the HelmBundle
kind and code running inside a pod in Kubernetes. This code handles changes,
such as creation, update, and deletion, in the Kubernetes resources of this
kind by deploying, updating, and deleting groups of Helm releases from
specified Helm charts with specified values.
The OpenStack platform manages virtual infrastructure resources, including
virtual servers, storage devices, networks, and networking services, such as
load balancers, as well as provides management functions to the tenant users.
Various OpenStack services are running as pods in Kubernetes and are
represented as appropriate native Kubernetes resources, such as
Deployments, StatefulSets, and DaemonSets.
For a simple, resilient, and flexible deployment of OpenStack and related
services on top of a Kubernetes cluster, MOSK uses
OpenStack-Helm that provides a required collection of the Helm charts.
Also, MOSK uses OpenStack Controller (Rockoon) as the
realization of the Kubernetes Operator pattern. Rockoon provides a custom
Kubernetes resource of the OpenStackDeployment kind and code running
inside a pod in Kubernetes. This code handles changes such as creation,
update, and deletion in the Kubernetes resources of this kind by
deploying, updating, and deleting groups of the Helm releases.
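For illustration, the spec section of an OpenStackDeployment object that Rockoon consumes has, in broad strokes, the following shape. This is a sketch only: the values are illustrative and the full schema is described in the reference for your MOSK version.

spec:
  openstack_version: antelope          # illustrative OpenStack release
  region_name: RegionOne               # see the Identity service section for details
  features:
    neutron:
      tunnel_interface: bond0          # illustrative NIC name
    nova:
      live_migration_interface: bond0
      vcpu_type: host-model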
Ceph is a distributed storage platform that provides storage resources,
such as objects and virtual block devices, to virtual and physical
infrastructure.
MOSK uses Rook as the implementation of the
Kubernetes Operator pattern that manages resources of the CephCluster
kind to deploy and
manage Ceph services as pods on top of Kubernetes to provide Ceph-based
storage to the consumers, which include OpenStack services, such as Volume
and Image services, and underlying Kubernetes through Ceph CSI (Container
Storage Interface).
The Ceph Controller is the implementation of the Kubernetes Operator
pattern, that manages resources of the MiraCeph kind to simplify
management of the Rook-based Ceph clusters.
The StackLight component is responsible for collection, analysis, and
visualization of critical monitoring data from physical and virtual
infrastructure, as well as alerting and error notifications through
a configured communication system, such as email. StackLight includes
the following key sub-components:
This section provides hardware requirements for the Mirantis Container
Cloud management cluster with a managed Mirantis OpenStack for Kubernetes
(MOSK) cluster.
For installing MOSK, the Mirantis Container Cloud management
cluster and managed cluster must be deployed with the bare metal provider.
Important
A MOSK cluster is to be used for a
deployment of an OpenStack cluster and its components. Deployment of
third-party workloads on a MOSK cluster is neither
allowed nor supported.
Note
One of the industry best practices is to verify every new update or
configuration change in a non-customer-facing environment before
applying it to production. Therefore, Mirantis recommends
having a staging cloud, deployed and maintained along with the production
clouds. The recommendation is especially applicable to the environments
that:
Receive updates often and use continuous delivery. For example,
any non-isolated deployment of Mirantis Container Cloud.
Have significant deviations from the reference architecture or
third party extensions installed.
Are managed under the Mirantis OpsCare program.
Run business-critical workloads where even the slightest application
downtime is unacceptable.
A typical staging cloud is a complete copy of the production environment
including the hardware and software configurations, but with a bare minimum
of compute and storage capacity.
The table below describes the node types the MOSK reference
architecture includes.
The Container Cloud management cluster architecture on bare metal
requires three physical servers for manager nodes. On these hosts,
we deploy a Kubernetes cluster with services that provide Container
Cloud control plane functions.
OpenStack control plane node and StackLight node
Host the OpenStack control plane services such as the database, messaging, API,
schedulers, conductors, and L3 and L2 agents, as well as the StackLight
components.
Note
MOSK enables the cloud operator to
collocate the OpenStack control plane with the managed cluster master
nodes on OpenStack deployments of a small size. This capability
is available as technical preview. Use such a configuration for testing
and evaluation purposes only.
Tenant gateway node
Optional. Hosts the OpenStack gateway services, including the L2, L3, and DHCP
agents. The tenant gateway nodes can be combined with the OpenStack control
plane nodes. The strict requirement is a dedicated physical network
(bond) for tenant network traffic.
Tungsten Fabric control plane node
Required only if Tungsten Fabric is enabled as a backend for the
OpenStack networking. These nodes host the TF control plane services
such as Cassandra database, messaging, API, control, and configuration
services.
Tungsten Fabric analytics node
Required only if Tungsten Fabric is enabled as a backend for the
OpenStack networking. These nodes host the TF analytics services
such as Cassandra, ZooKeeper, and collector.
Compute node
Hosts the OpenStack Compute services such as QEMU, L2 agents, and
others.
Infrastructure nodes
Run the underlying Kubernetes cluster management services.
The MOSK reference configuration requires a minimum of
three infrastructure nodes.
The table below specifies the hardware resources the MOSK
reference architecture recommends for each node type.
The exact hardware specifications and number of the control plane
and gateway nodes depend on a cloud configuration and scaling needs.
For example, for the clouds with more than 12,000 Neutron ports, Mirantis
recommends increasing the number of gateway nodes.
TF control plane and analytics nodes can be combined on the same hardware
hosts with a respective addition of RAM, CPU, and disk space. However,
Mirantis does not recommend such a configuration for production environments
because it increases the risk of cluster downtime if one of the nodes
unexpectedly fails.
A Ceph cluster with 3 Ceph nodes does not provide hardware fault
tolerance and is not eligible for recovery operations,
such as a disk or an entire node replacement. Therefore, a minimum of
5 Ceph nodes is recommended for production use.
A Ceph cluster uses the replication factor that equals 3.
If the number of Ceph OSDs is less than 3, a Ceph cluster moves
to the degraded state with the write operations restriction until
the number of alive Ceph OSDs equals the replication factor again.
If you would like to evaluate the MOSK
capabilities and do not have much hardware at your disposal,
you can deploy it in a virtual environment, for example, on
top of another OpenStack cloud using the sample Heat templates.
Note that the tooling is provided for reference only and is not
part of the product itself. Mirantis does not guarantee its
interoperability with any MOSK version.
The management cluster requires a minimum of two storage devices per node.
Each device is used for a different type of storage:
One storage device for boot partitions and the root file system.
An SSD is recommended. A RAID device is not supported.
One storage device per server is reserved for local persistent
volumes. These volumes are served by the Local Storage Static Provisioner
(local-volume-provisioner) and used by many services of Mirantis
Container Cloud.
The seed node is only necessary to deploy the management cluster.
When the bootstrap is complete, the bootstrap node can be
discarded and added back to the MOSK cluster as a node of
any type.
The minimum reference system requirements for a baremetal-based bootstrap
seed node are as follows:
Basic Ubuntu 18.04 server with the following configuration:
Kernel of version 4.15.0-76.86 or later
8 GB of RAM
4 CPU
10 GB of free disk space for the bootstrap cluster cache
No DHCP or TFTP servers on any NIC networks
Routable access to the IPMI network of the hardware servers
Internet access for downloading all required artifacts
If you use a firewall or proxy, make sure that the bootstrap and management
clusters have access to the following IP ranges and domain names:
IP ranges:
Microsoft Azure
(only IP addresses for MicrosoftContainerRegistry)
Amazon AWS
(only IP addresses for "service":"CLOUDFRONT")
MOSK uses Kubernetes labels to place components onto hosts.
For the default locations of components, see MOSK cluster hardware requirements. Additionally,
MOSK supports component collocation. This is mostly useful
for OpenStack compute and Ceph nodes. For component collocation, consider
the following recommendations:
When calculating hardware requirements for nodes, consider the requirements
for all collocated components.
When performing maintenance on a node with collocated components, execute the
maintenance plan for all of them.
When combining other services with the OpenStack compute host, verify that
the reserved_host_* settings are increased according to the needs of the
collocated components by using node-specific overrides for the Compute
service, as sketched below.
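For example, a node-specific override of the reserved host resources might be shaped as follows. This is a sketch only: the exact location of the override in the OpenStackDeployment custom resource and the node label are assumptions for illustration, while reserved_host_memory_mb and reserved_host_cpus are standard Nova options.

spec:
  nodes:
    openstack-compute-node::enabled:      # label of the collocated nodes, illustrative
      services:
        compute:
          nova:
            values:
              conf:
                nova:
                  DEFAULT:
                    reserved_host_memory_mb: 16384
                    reserved_host_cpus: 4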
MetalLB exposes external IP addresses of cluster services to access
applications in a Kubernetes cluster.
DNS
The Kubernetes Ingress NGINX controller is used to expose OpenStack
services outside of a Kubernetes deployment. Access to the Ingress
services is allowed only by their FQDNs. Therefore, DNS is a mandatory
infrastructure service for an OpenStack on Kubernetes deployment.
To keep the operating system on a bare metal host up to date with the latest
security updates, the operating system requires periodic software
package upgrades that may or may not require a host reboot.
Mirantis Container Cloud uses life cycle management tools to update
the operating system packages on the bare metal hosts.
In a management cluster, software package upgrade and host restart are
applied automatically when a new Container Cloud version
with available kernel or software packages upgrade is released.
In a managed cluster, package upgrade and host restart are applied as part of
usual cluster update, when applicable. To start planning the maintenance
window and proceed with the managed cluster update, see Cluster update.
Operating system upgrade and host restart are applied to cluster
nodes one by one. If Ceph is installed in the cluster, the Container
Cloud orchestration securely pauses the Ceph OSDs on the node before
restart. This allows avoiding degradation of the storage service.
Each section below is dedicated to a particular service provided by
MOSK. They contain configuration details and usage
samples of supported capabilities provided through the custom resources.
Mirantis OpenStack for Kubernetes (MOSK) provides instances management
capability through the Compute service (OpenStack Nova). The Compute service
interacts with other OpenStack components of an OpenStack environment to
provide life-cycle management of the virtual machine instances.
The Compute service (OpenStack Nova) enables you to spawn instances that can
collectively consume more resources than what is physically available on a
compute node through resource oversubscription, also known as overcommit
or allocation ratio.
Resources available for oversubscription on a compute node include the number
of CPUs, amount of RAM, and amount of available disk space. When making a
scheduling decision, the scheduler of the Compute service takes into account
the actual amount of resources multiplied by the allocation ratio. Thereby,
the service allocates resources based on the assumption that not all instances
will be using their full allocation of resources at the same time.
Oversubscription enables you to increase the density of workloads and compute
resource utilization and, thus, achieve better Return on Investment (ROI) on
compute hardware. In addition, oversubscription can also help avoid the need
to create too many fine-grained flavors, which is commonly known as
flavor explosion.
There are two ways to control the oversubscription values for compute
nodes:
The legacy approach entails utilizing the
{cpu,disk,ram}_allocation_ratio configuration options offered by the
Compute service. A drawback of this method is that restarting the Compute
service is mandatory to apply the new configuration. This introduces the
risk of possible interruptions of cloud user operations, for example,
instance build failures.
The modern and recommended approach, adopted in MOSK
23.1, involves using the initial_{cpu,disk,ram}_allocation_ratio
configuration options, which are employed exclusively during the initial
provisioning of a compute node. This may occur during the initial deployment
of the cluster or when new compute nodes are added subsequently. Any further
alterations can be performed dynamically using the OpenStack Placement
service API without necessitating the restart of the service.
There is no definitive method for selecting optimal oversubscription values.
As a cloud operator, you should continuously monitor your workloads, ideally
have a comprehensive understanding of their nature, and experimentally
determine the maximum values that do not impact performance. This approach
ensures maximum workload density and cloud resource utilization.
To configure the initial compute resource oversubscription in
MOSK, specify the spec:features:nova:allocation_ratios
parameter in the OpenStackDeployment custom resource as explained in the
table below.
Changing the resource oversubscription configuration through the
OpenStackDeployment resource after cloud deployment will only
affect the newly added compute nodes and will not change
oversubscription for already existing compute nodes.
To change oversubscription for already existing compute nodes, use the
placement service API as described in Change oversubscription settings for existing compute nodes.
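For example, a configuration that sets cluster-wide defaults and a less intense oversubscription for a group of nodes might look as follows. This is a sketch: the ratio values are illustrative, and the key names under allocation_ratios mirror the {cpu,disk,ram} options named above, so verify them against the reference for your MOSK version.

spec:
  features:
    nova:
      allocation_ratios:
        cpu: 8.0
        ram: 1.0
        disk: 1.6
  nodes:
    compute-type::hi-perf:
      features:
        nova:
          allocation_ratios:
            cpu: 2.0       # less intense CPU oversubscription
            disk: 1.0      # no oversubscription on disk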
In the example configuration above, the compute nodes labeled with the
compute-type=hi-perf label will use less intense oversubscription
on CPU and no oversubscription on disk.
When using oversubscription, it is important to conduct thorough cloud
management and monitoring to avoid system overloading and performance
degradation. If many or all instances on a compute node start using all
allocated resources at once and, thereby, overconsume physical resources,
failure scenarios depend on the resource being exhausted.
CPU
Workloads get slower as they actively compete for physical CPU
usage. A useful indicator is the steal time as reported inside the
workload, which is the percentage of time the operating system in the
workload is waiting for an actual physical CPU core to become available
to run instructions.
To verify the steal time in the Linux-based workload, use the
top command:
top -bn1 | head | grep st$ | awk -F',' '{print $NF}'
Generally, steal times of >10 for 20-30 minutes are considered
alarming.
RAM
The operating system on the compute node starts to aggressively use physical
swap space, which significantly slows the workloads down. Sometimes, when
the swap is also exhausted, the operating system of a compute node can
outright OOM-kill the most offending processes, which can cause major
disruptions to workloads or the compute node itself.
Warning
While it may seem like a good idea to make the most of
available resources, oversubscribing RAM can lead to various issues and
is generally not recommended due to potential performance degradation,
reduced stability, and security risks for the workloads.
Mirantis strongly advises against oversubscribing RAM, by any amount.
Disk space
Depends on the physical layout of storage. Virtual root and ephemeral
storage devices that are hosted on the compute node itself are put into
read-only mode, negatively affecting workloads. Additionally,
the file system used by the operating system on the compute node may
become read-only too, blocking the compute node operability.
There are workload types that are not suitable for running in an oversubscribed
environment, especially those with high performance, latency-sensitive, or
real-time requirements. Such workloads are better suited for compute nodes
with dedicated CPUs, ensuring that only processes of a single instance run
on each CPU core.
MOSK provides the capability to configure virtual CPU types
for OpenStack instances through the OpenStackDeployment custom resource.
This feature enables cloud operators to tailor performance and resource
allocation within their OpenStack environment to meet specific workload
demands effectively.
Parameter
spec:features:nova:vcpu_type
Usage
Configures the type of virtual CPU that Nova will use when creating
instances.
The list of supported CPU models includes host-model (default),
host-passthrough, and custom models.
The host-model CPU model (default) mimics the host CPU and provides for
decent performance, good security, and moderate compatibility with live
migrations.
With this mode, libvirt finds an available predefined CPU model that best
matches the host CPU, and then explicitly adds the missing CPU feature
flags to closely match the host CPU features. To mitigate known security
flaws, libvirt automatically adds critical CPU flags, supported by
installed libvirt, QEMU, kernel, and CPU microcode versions.
This is a safe choice if your OpenStack compute node CPUs are of the same
generation. If your OpenStack compute node CPUs are sufficiently different,
for example, span multiple CPU generations, Mirantis strongly recommends
setting explicit CPU models supported by all of your OpenStack compute node
CPUs or organizing your OpenStack compute nodes into host aggregates and
availability zones that have largely identical CPUs.
Note
The host-model model does not guarantee two-way live migrations
between nodes.
When migrating instances, the libvirt domain XML is first copied as is to
the destination OpenStack compute node. Once the instance is hard rebooted
or shut down and started again, the domain XML is re-generated. If the
versions of libvirt, kernel, CPU microcode, or BIOS firmware differ from
what they were on the source compute node where the instance was originally
started, libvirt may pick up additional CPU feature flags, making it
impossible to live-migrate the instance back to the original compute node.
The host-passthrough CPU model provides maximum performance, especially
when nested virtualization is required or if live migration support is not
a concern for workloads. Live migration requires exactly the same CPU
on all OpenStack compute nodes, including the CPU microcode and kernel
versions. Therefore, for live migrations support, organize your compute
nodes into host aggregates and availability zones. For workload migration
between non-identical OpenStack compute nodes, contact Mirantis support.
For example, to set the host-passthrough CPU model for all OpenStack
compute nodes:
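For example, a minimal snippet of the OpenStackDeployment custom resource:

spec:
  features:
    nova:
      vcpu_type: host-passthrough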
MOSK enables you to specify a comma-separated list of
exact QEMU CPU models to create and emulate. Specify the common and less
advanced CPU models first. All explicit CPU models provided must be compatible
with the OpenStack compute node CPUs.
To specify an exact CPU model, review the available CPU models and their
features. List and inspect the /usr/share/libvirt/cpu_map/*.xml files in
the libvirt containers of the pods of the libvirt DaemonSet, or multiple
DaemonSets if you are using node-specific settings.
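For illustration, a configuration pinning explicit CPU models might look like the following sketch. The model names are illustrative; list the common and less advanced models first and make sure all of them are supported by every compute node CPU.

spec:
  features:
    nova:
      vcpu_type: Nehalem,SandyBridge    # illustrative comma-separated list of QEMU models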
OpenStack supports the following types of instance migrations:
Cold migration (also referred to simply as migration)
The process involves shutting down the instance, copying its definition and
disk, if necessary, to another host, and then starting the instance again
on the new host.
This method disrupts the workload running inside the instance but allows
for more reliability and works for most types of instances and consumed
resources.
Live migration
The process involves copying the instance definition, memory, and disk,
if necessary, to another host while the instance continues running,
without shutting it down. The instance then momentarily switches to run
on the new host.
While generally less disruptive to workloads, this method is less reliable
and imposes more restrictions on the instance and target host properties
to succeed.
As a cloud operator, you can configure live migration through the
OpenStackDeployment custom resource. The following table provides
the details on available configuration.
Parameter
Usage
features:nova:live_migration_interface
Specifies the name of the NIC device on the actual host that will be
used by Nova for the live migration of instances.
Mirantis recommends setting up your Kubernetes hosts in such a way
that networking is configured identically on all of them,
and names of the interfaces serving the same purpose or plugged into
the same network are consistent across all physical nodes.
Also, set the option to vhost0 in the following cases:
The Neutron service uses Tungsten Fabric.
Nova migrates instances through the interface specified by
the Neutron tunnel_interface parameter.
features:nova:libvirt:tls
Available since MOSK 23.2.
If set to true, enables the live migration over TLS:
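A minimal sketch of enabling this option, assuming the flag is a boolean at the path given above:

spec:
  features:
    nova:
      libvirt:
        tls: true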
Allowing non-administrative users to migrate instances
Available since MOSK 24.3
MOSK provides the following distinct sets of policies that
govern access to cold and live migrations:
os_compute_api:os-migrate-server:migrate and
os_compute_api:os-migrate-server:migrate_live define the ability to
initiate migrations without specifying the target host. In this case,
the OpenStack Compute scheduler selects the best suited target host
automatically.
os_compute_api:os-migrate-server:migrate:host
and os_compute_api:os-migrate-server:migrate_live:host define the ability
to initiate migration together with specifying the target host.
Depending on the API microversion used to start the migration, the host is
either validated by the scheduler (recommended) or forced regardless of
other considerations. The latter option is not recommended as it may lead
to inconsistencies in the internal state of the Compute service.
Since MOSK 24.3, the default policies for migrations without
the target host specification are set to rule:project_member_or_admin.
This means that migration is available to both cloud administrators and
project users with the member role.
The migration to a specific host requires administrative privileges.
If the default policy does not suit your deployment, you can require
administrative access for all instance migrations by setting these policy
values to rule:context_is_admin, or any other value appropriate for your
use case.
If you use the default policies and want to revert to the old defaults, ensure
that the following snippet is present in your OpenStackDeployment custom
resource:
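A sketch of such a snippet, assuming policy overrides are defined under features:policies; verify the exact location against the reference for your MOSK version:

spec:
  features:
    policies:
      nova:
        "os_compute_api:os-migrate-server:migrate": "rule:context_is_admin"
        "os_compute_api:os-migrate-server:migrate_live": "rule:context_is_admin"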
Defines the type of storage for Nova to use on the compute hosts for
the images that back the instances.
The list of supported options includes:
local (deprecated)
The option is deprecated and replaced by qcow2.
qcow2
The local storage is used. The backend disk image format is
qcow2. The pros include faster operation and failure domain
independence from the external storage. The cons include local
space consumption and less performant and robust live migration
with block migration.
raw (available since MOSK 24.2)
The local storage is used. The backend disk image format is raw.
Raw images are simple binary dumps of disk data, including empty
space, resulting in larger file sizes. They provide superior
performance because they do not incur overhead from features such as
compression or copy-on-write, which are present in the qcow2 disk
images.
ceph
Instance images are stored in a Ceph pool shared across all
Nova hypervisors. The pros include faster image start as well as faster and
more robust live migration. The cons include considerably slower
IO performance and a direct dependency of workload operations on Ceph
cluster availability and performance.
lvm (TechPreview)
Instance images and ephemeral images are stored on a local Logical
Volume. If specified, features:nova:images:lvm:volume_group must
be set to an available LVM Volume Group, by default, nova-vol.
For details, see Enable LVM ephemeral storage.
MOSK provides a number of different methods to interact
with OpenStack virtual machines including VNC (default) and SPICE remote
consoles. This section outlines how you can configure these different
console services through the OpenStackDeployment custom resource.
The noVNC client provides remote control or remote desktop access to guest
virtual machines through the Virtual Network Computing (VNC) system.
The MOSK Compute service users can access their
instances using the noVNC clients through the noVNC proxy server.
The VNC remote console is enabled by default in MOSK.
To disable VNC remote console through the OpenStackDeployment custom
resource, set spec:features:nova:console:novnc to false:
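For example (a sketch; depending on the MOSK version, the flag may be a nested enabled field as for the SPICE console below):

spec:
  features:
    nova:
      console:
        novnc: false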
MOSK uses TLS to secure public-facing VNC access
on networks between a noVNC client and noVNC proxy server.
The features:nova:console:novnc:tls:enabled parameter ensures that the data
transferred between the instance and the noVNC proxy server is encrypted.
Both servers use the VeNCrypt authentication scheme for the data
encryption.
To enable the encrypted data transfer for noVNC, use the following
structure in the OpenStackDeployment custom resource:
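For example, using the parameter named above:

spec:
  features:
    nova:
      console:
        novnc:
          tls:
            enabled: true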
The VNC protocol has its limitations, such as the lack of support for multiple
monitors, bi-directional audio, reliable cut-and-paste, video streaming,
and others. The SPICE protocol aims to overcome these limitations and
deliver a robust remote desktop support.
The SPICE remote console is disabled by default in MOSK.
To enable SPICE remote console through the OpenStackDeployment custom
resource, set spec:features:nova:console:spice:enabled to true:
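For example:

spec:
  features:
    nova:
      console:
        spice:
          enabled: true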
MOSK provides GPU virtualization capabilities to its users
through the NVIDIA vGPU and Multi-Instance GPU (MIG) technologies.
GPU virtualization is a capability offered by modern datacenter-grade GPUs,
enabling the partitioning of a single physical GPU into
smaller virtual devices that can then be attached to individual
virtual machines.
In contrast to the Peripheral Component Interconnect (PCI) passthrough
feature, leveraging GPU virtualization enables
concurrent utilization of the same physical GPU device by multiple virtual
machines. This enhances hardware utilization and fosters a more elastic
consumption of expensive hardware resources.
When using GPU virtualization, the physical device and its drivers manage
computing resource partitioning and isolation.
The use case for GPU virtualization aligns with any application necessitating
or benefiting from accelerated parallel floating-point calculations, such as
graphics-intensive desktop workloads, for example, 3D modeling and rendering,
as well as computationally intensive tasks, for example, artificial
intelligence, specifically, machine learning training and classification.
At its core, GPU virtualization builds on the single-root input/output
virtualization (SR-IOV) framework, which is already widely used by
datacenter-grade network adapters, and on the mediated devices framework
of the Linux kernel.
Typically, using GPU virtualization requires the installation of specific
physical GPU drivers on the host system. For detailed instructions on obtaining
and installing the required drivers, refer to official documentation from the
vendor of your GPU.
You can automate the configuration of drivers by adding a custom post-install
script to the BareMetalHostProfile object of your
MOSK cluster. See Configure GPU virtualization for details.
Certain NVIDIA GPUs, for example, the Ampere GPU architecture and later,
support GPU virtualization in two modes: time-sliced (vGPU) or
Multi-Instance GPU (MIG).
Older architectures support only the time-sliced mode.
The distinction between these modes lies in resource isolation, dedicated
performance levels, and partitioning flexibility.
Typically, there is no fixed rule dictating which mode should be used, as it
depends on the intended workloads for the virtual GPUs and the level of
experience and assurances the cloud operator aims to offer users. Below,
there is a brief overview of the differences between these two modes.
In time-sliced vGPU mode, each virtual GPU is allocated dedicated slices of
the physical GPU memory while sharing the physical GPU engines. Only one vGPU
operates at a time, with full access to all physical GPU engines. The resource
scheduler within the physical GPU regulates the timing of each vGPU execution,
ensuring fair allocation of resources.
Therefore, this setup may encounter issues with noisy neighbors, where the
performance of one vGPU is affected by resource contention from others.
However, when not all available vGPU slots are occupied, the active ones
can fully utilize the power of their physical GPU.
Advantages:
Potential ability to fully utilize the compute power of physical GPU,
even if not all possible vGPUs have yet been created on that physical GPU.
Easier configuration.
Disadvantages:
Only a single vGPU type (size of the vGPU) can be created on any given
physical GPU. The cloud operator must decide beforehand what type of vGPU
each physical GPU will be providing.
Less strict resource isolation. Noisy neighbors and unpredictable level of
performance for every single guest vGPU.
In Multi-Instance GPUs (MIG) mode, each virtual GPU is allocated
dedicated physical GPU engines, exclusively utilized by that specific virtual
GPU. Virtual GPUs run in parallel, each on its own engines according to their
type.
Advantages:
Ability to partition a single physical GPU into various types of virtual
GPUs. This approach provides cloud operators with enhanced flexibility in
determining the available vGPU types for cloud users. However, the cloud
operator has to decide beforehand what types of virtual GPU each physical
GPU will be providing and partition each GPU accordingly.
Better resource isolation and guaranteed resource access with predictable
performance levels for every virtual GPU.
Disadvantages:
Under-utilization of physical GPU when not all possible virtual GPU slots are
occupied.
Comparatively complicated configuration, especially in heterogeneous hardware
environments.
Some of these restrictions may be lifted in future releases of
MOSK.
Cloud users will face the following limitations when working with
GPU virtualization in MOSK:
Inability to create several instances with virtual GPUs in one request
if there is no physical GPU available that can fit all of them at once.
For NVIDIA MIG, this effectively means that you cannot create
several instances with virtual GPUs in one request.
Inability to create an instance with several virtual GPUs.
Inability to attach virtual GPU to or detach virtual GPU
from a running instance.
Inability to live-migrate instances with virtual GPU attached.
Cloud operators will face the following limitations when configuring GPU
virtualization in MOSK:
Partitioning of physical GPUs into virtual GPUs is static and not on-demand.
You need to decide beforehand what types of virtual GPUs
each physical GPU will get partitioned into. Changing the partitioning
requires removing all instances using virtual GPUs from the compute node
before initiating the repartitioning process.
Repartitioning may require additional manual steps to eliminate orphan
resource providers in the placement service, and thus, avoid resource
over-reporting and instance scheduling problems.
Configuration of multiple virtual GPU types per node may be very verbose
since configuration depends on particular PCI addresses of physical GPUs
on each node.
Management of compute node reboots is an important Day 2 operation. Before
shutting down a host, guest instances must either be migrated to other
compute nodes or gracefully powered off. This ensures the integrity of
disk filesystems and prevents damage to running applications.
MOSK provides the capability to automatically power off
the instances during the compute node shutdown or reboot through the ACPI
power event.
Graceful instance shutdown is managed using the systemd inhibit
tool. When the nova-compute service starts, it creates inhibitor locks,
which you can list on the host with the systemd-inhibit --list command.
The process runs in the nova-compute-inhibit-lock container within
the nova-compute pod. It intercepts the systemd power event and starts a
graceful guest shutdown. When all guest instances are powered off,
the inhibit lock is released.
To initiate a proper shutdown, use the
systemctl poweroff or systemctl reboot command.
Mirantis OpenStack for Kubernetes (MOSK) Networking service
(OpenStack Neutron) provides cloud applications with
Connectivity-as-a-Service enabling instances to communicate with each
other and the outside world.
The API provided by the service abstracts all the nuances of implementing
a virtual network infrastructure on top of your own physical network
infrastructure. The service allows cloud users to create advanced virtual
network topologies that may include load balancing, virtual private
networking, traffic filtering, and other services.
MOSK Networking service supports Open vSwitch and
Tungsten Fabric SDN technologies as backends.
MOSK offers various networking backends. Selecting
the appropriate backend option for the Networking service is essential
for building a robust and efficient cloud networking infrastructure.
Whether you choose Open vSwitch (OVS), Open Virtual Network (OVN),
or Tungsten Fabric, understanding their features, capabilities, and
suitability for your specific use case is crucial for achieving optimal
performance and scalability in your OpenStack environment.
Open vSwitch is a production-quality, multilayer virtual switch licensed
under the open source Apache 2.0 license. It is designed to enable massive
network automation through programmatic extension, while supporting standard
management interfaces and protocols.
Open vSwitch is suitable for general-purpose networking requirements in
OpenStack deployments. It provides flexibility and scalability for various
network topologies.
Key characteristics of Open vSwitch:
Depends on RabbitMQ and RPC communication
Uses keepalived to set up HA routers
Uses namespace and Veth routing to provide its capabilities
Locates metadata in router or DHCP namespaces
Centralizes the DHCP service, which is running in a separate namespace
Available since MOSK 25.1 as GA (Caracal). Available since MOSK 24.2 as
TechPreview (Antelope).
Open Virtual Network (OVN) is a solution built on Open vSwitch that provides
native virtual networking support for Open vSwitch environments. It provides
enhanced scalability and performance compared to traditional Open vSwitch
deployments.
Key characteristics of Open Virtual Network:
Uses the OVSDB protocol for communication
Is distributed by design
Handles all traffic with OpenFlow
Runs metadata on all nodes
Provides DHCP through local Open vSwitch instances
Caution
There are numerous limitations related to VLAN/Flat tenant
networks in Open Virtual Network with distributed floating IPs for
bare metal SR-IOV and Octavia VIP ports. For more information about
Open Virtual Network limitations, see relevant upstream documentation.
Tungsten Fabric is an open-source SDN based on Juniper Contrail. Its design
allows for simplified creation and management of virtual networks in cloud
environments. Tungsten Fabric supports advanced networking scenarios, such as
BGP integration and scalability.
Key characteristics of Tungsten Fabric:
Uses highly scalable protocols, such as BGP/MPLS, to set up tunnels
MOSK offers the Networking service as a part of its
core setup. You can configure the service through the
spec:features:neutron section of the OpenStackDeployment custom
resource.
Defines the name of the NIC device on the actual host that will be
used for Neutron.
Mirantis recommends setting up your Kubernetes hosts in such a way
that networking is configured identically on all of them,
and names of the interfaces serving the same purpose or plugged into
the same network are consistent across all physical nodes.
If enabled, must contain the data structure defining the floating IP
network that will be created for Neutron to provide external access to
your Nova instances.
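For illustration, the spec:features:neutron section might be shaped as follows. This is a sketch only: the tunnel_interface and floating_network names follow the parameters referenced in this guide, while the nested keys and values are illustrative assumptions.

spec:
  features:
    neutron:
      tunnel_interface: bond0          # NIC on the host used for tenant traffic
      floating_network:
        enabled: true
        physnet: physnet1              # illustrative physical network name
        subnet:
          range: 10.11.12.0/24
          gateway: 10.11.12.1
          pool_start: 10.11.12.100
          pool_end: 10.11.12.200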
The BGP dynamic routing extension to the Networking service (OpenStack Neutron)
is particularly useful for the MOSK clouds where private
networks managed by cloud users need to be transparently integrated into the
networking of the data center.
For example, the BGP dynamic routing is a common requirement for IPv6-enabled
environments, where clients need to seamlessly access cloud workloads using
dedicated IP addresses with no address translation involved in between the
cloud and the external network.
BGP dynamic routing changes the way self-service (private) network prefixes
are communicated to BGP-compatible physical network devices, such as routers,
present in the data center. It eliminates the traditional reliance on static
routes or ICMP-based advertising by enabling the direct passing of private
network prefix information to router devices.
Note
To effectively use the BGP dynamic routing feature, Mirantis
recommends acquiring a good understanding of OpenStack address scopes
and how they work.
The components of the OpenStack BGP dynamic routing are:
Service plugin
An extension to the Networking service (OpenStack Neutron) that implements
the logic for orchestration of BGP-related entities and provides the cloud
user-facing API. A cloud administrator creates and configures a BGP speaker
using the CLI or API and manually schedules it to one or more hosts running
the agent.
Agent
Manages BGP peering sessions. In MOSK, the BGP agent
runs on nodes labeled with openstack-gateway=enabled.
Prefix advertisement depends on the binding of external networks to a BGP
speaker and the address scope of external and internal IP address ranges or
subnets.
BGP dynamic routing advertises prefixes for self-service networks and host
routes for floating IP addresses.
To successfully advertise a self-service network, you need to fulfill
the following conditions:
External and self-service networks reside in the same address scope.
The router contains an interface on the self-service subnet and a gateway
on the external network.
The BGP speaker associates with the external network that provides
a gateway on the router.
The BGP speaker has the advertise_tenant_networks attribute set
to True.
To successfully advertise a floating IP address, you need to fulfill
the following conditions:
The router with the floating IP address binding contains a gateway on
an external network with the BGP speaker association.
The BGP speaker has the advertise_floating_ip_host_routes attribute
set to true.
The diagram below is an example of the BGP dynamic routing in the non-DVR mode
with self-service networks and the following advertisements:
B>*192.168.0.0/25[200/0] through 10.11.12.1
B>*192.168.0.128/25[200/0] through 10.11.12.2
B>*10.11.12.234/32[200/0] through 10.11.12.1
Operation in the Distributed Virtual Router (DVR) mode
For both floating IP and IPv4 fixed IP addresses, the BGP speaker advertises
the gateway of the floating IP agent on the corresponding compute node as
the next-hop IP address. When using IPv6 fixed IP addresses, the BGP speaker
advertises the DVR SNAT node as the next-hop IP address.
The diagram below is an example of the BGP dynamic routing in the DVR mode
with self-service networks and the following advertisements:
DVR incompatibility with ARP announcements and VRRP
Due to the known issue
#1774459 in the upstream
implementation, Mirantis does not recommend using Distributed Virtual Routing
(DVR) routers in the same networks as load balancers or other applications
that utilize the Virtual Router Redundancy Protocol (VRRP) such as Keepalived.
The issue prevents the DVR functionality from working correctly with network
protocols that rely on the Address Resolution Protocol (ARP) announcements
such as VRRP.
The issue occurs when updating permanent ARP entries for
allowed_address_pair IP addresses in DVR routers because DVR performs
the ARP table update through the control plane and does not allow any
ARP entry to leave the node to prevent the router IP/MAC from
contaminating the network.
This results in various network failover mechanisms not functioning in virtual
networks that have a distributed virtual router plugged in. For instance, the
default backend for MOSK Load Balancing service,
represented by OpenStack Octavia with the OpenStack Amphora backend when
deployed in the HA mode in a DVR-connected network, is not able to redirect
the traffic from a failed active service instance to a standby one without
interruption.
In MOSK, Cinder backup is enabled and uses the Ceph backend
for Cinder by default. The backup configuration is stored
in the spec:features:cinder:backup structure in the
OpenStackDeployment custom resource. If necessary, you can disable
the backup feature in Cinder as follows:
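A sketch of the corresponding snippet, assuming an enabled flag under the backup structure named above:

spec:
  features:
    cinder:
      backup:
        enabled: false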
Using this structure, you can also configure another backup driver supported
by MOSK for Cinder as described below. At any given time,
only one backend can be enabled.
MOSK supports NFS Unix authentication exclusively.
To use an NFS driver with MOSK, ensure you have
a preconfigured NFS server with an NFS share accessible to a Unix
Cinder user. This user must be the owner of the exported NFS folder,
and the folder must have the permission value set to 775.
All Cinder services run with the same user by default.
To obtain the Unix user ID:
You can specify the backup_share parameter in the following formats:
hostname:path, ipv4addr:path, or [ipv6addr]:path.
For example: 1.2.3.4:/cinder_backup.
The Block Storage service (OpenStack Cinder) supports volume encryption using a
key stored in the Key Manager service (OpenStack Barbican). Such configuration
uses Linux Unified Key Setup (LUKS) to create an encrypted volume type and
attach it to the Compute service (OpenStack Nova) instances.
Nova retrieves the symmetric key from Barbican and stores it
on the OpenStack compute node as a libvirt key to encrypt the volume
locally or on the backend and only after that transfers it to Cinder.
Note
To create an encrypted volume under a non-admin user, the
creator role must be assigned to the user.
When planning your cloud, consider that encryption may impact CPU performance.
The MOSK Block Storage service (OpenStack Cinder) uses Ceph
as the default backend for Cinder Volume. Also, MOSK enables
its clients to define their own volume backends using the
OpenStackDeployment custom resource. This section provides all the details
required to properly configure a custom Cinder Volume backend as a StatefulSet
or a DaemonSet.
When disabling the Ceph backend for Cinder Volume, you must explicitly specify
the new default_volume_type parameter. Refer to the sections below to learn
how you can configure it.
Considerations for configuring a custom Cinder Volume backend
Before you start deploying your custom Cinder Volume backend, decide on key
backend parameters and understand how they affect other services:
Note
Make sure to navigate to the documentation for the specific
OpenStack version used to deploy your environment when referring
to the official OpenStack documentation.
Considerations for configuring a custom Cinder Volume backend
Configuration option
Details
StatefulSet or DaemonSet
If the Cinder volume backend you prefer must run on all nodes with a
specific label and scale automatically as nodes are added or removed,
use a DaemonSet.
This type of backend typically requires that its data remains on
the same node where its pod is running. A common example of such a
backend is the LVM backend.
Otherwise, Mirantis recommends using a StatefulSet,
which offers more flexibility than a DaemonSet.
Support for Active/Active High Availability
If the driver does not support Active/Active High Availability, ensure
that only a single copy of the backend runs and that the cluster
parameter is left empty in the cinder.conf file for this backend.
When deploying the backend using a StatefulSet, set
pod.replicas.volume to 1 for this backend configuration.
Additionally, enable hostNetwork to ensure that the service
endpoint’s IP address remains stable when the backend pod restarts.
Support for Multi-Attach
If the driver supports Multi-Attach, it allows multiple connections to
the same volume. This capability is important for certain services,
such as Glance. If the driver does not support Multi-Attach, the backend
cannot be used for services that require this functionality.
Support for iSCSI and access to the /run directory
Some drivers require access to the /run directory on the host system
for storing their PID or lock files. Additionally, they may need access
to iSCSI and multipath services on the host. To enable this capability,
set the conf:enable_iscsi parameter to true. In some cases,
you might also need to run the backend container as privileged.
Privileged access for the container
For security reasons, Mirantis recommends running the Cinder Volume
backend container with the minimum required privileges. However,
if the drivers require privileged access, you can enable it for the
StatefulSet by setting the parameter
pod:security_context:cinder_volume:container:cinder_volume:privileged.
Access to the host network namespace
If the driver requires access to the host network namespace, or if you
need to ensure that the Cinder Volume backend’s IP address remains
unchanged after pod recreation or restart, set hostNetwork
to true using the following parameters:
For a DaemonSet, use pod:useHostNetwork:volume_daemonset. This
parameter is set to true by default.
For a StatefulSet, use pod:useHostNetwork:volume. Mirantis
recommends avoiding using StatefulSets with hostNetwork as it
may cause issues. StatefulSet pods are not tied to a specific
node, and multiple pods can run on the same node.
Access to the host IPC namespace
If the driver requires access to the host’s IPC namespace, set hostIPC
to true using the following parameters:
For a DaemonSet, use pod:useHostIPC:volume_daemonset. This
parameter is set to true by default.
For a StatefulSet, use pod:useHostIPC:volume.
Access to host PID namespace
If the driver requires access to the host’s PID namespace, set hostPID
to true using the following parameters:
For a DaemonSet, use pod:useHostPID:volume_daemonset.
MOSK enables its clients to define volume backends as
a StatefulSet.
To configure a custom StatefulSet backend for the MOSK
Block Storage service (OpenStack Cinder), use the
spec:features:cinder:volume:backends structure in the
OpenStackDeployment custom resource:
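The general shape of this structure is sketched below; the backend name is illustrative, and the keys that can be overridden under values are listed after the sketch:

spec:
  features:
    cinder:
      volume:
        backends:
          my-backend:                  # illustrative backend name
            enabled: true
            create_volume_type: true
            type: statefulset
            values: {}                 # overrides for the Cinder chart values.yaml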
The enabled and create_volume_type parameters are optional. With
create_volume_type set to true (default), the new backend
will be added to the Cinder bootstrap job. Once this job is completed,
the volume type for the custom backend will be created in OpenStack.
The supported value for type is statefulset.
The list of keys you can override in the values.yaml file of the Cinder
chart includes conf, images, labels, and pod.
When you define the custom backend for the Block Storage service,
MOSK deploys individual pods for it. These pods have
separate Secrets for configuration files and ConfigMaps for scripts.
Example of configuration of a custom StatefulSet backend for Cinder:
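A sketch matching the description below, assuming the OpenStack-Helm Cinder chart value layout for the backend definition, node selection, and replica count; the exact chart keys may differ between releases:

spec:
  features:
    cinder:
      volume:
        backends:
          nfs-backend:                                 # illustrative backend name
            enabled: true
            create_volume_type: true
            type: statefulset
            values:
              conf:
                backends:
                  nfs-backend:
                    volume_driver: cinder.volume.drivers.nfs.NfsDriver
                    nfs_shares_config: /etc/cinder/nfs_shares
                    volume_backend_name: nfs-backend
              labels:
                volume:
                  node_selector_key: kubernetes.io/hostname
                  node_selector_value: service-node
              pod:
                replicas:
                  volume: 1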
The configuration example deploys a StatefulSet for the Cinder
volume backend that uses the NFS driver, running a single replica on node
labeled kubernetes.io/hostname:service-node. Privilege escalation for
the Cinder volume pod is driver-specific.
MOSK enables its clients to define volume backends as
a DaemonSet, LVM in particular.
To configure a custom DaemonSet backend for the MOSK
Block Storage service (OpenStack Cinder), use the spec:nodes structure
in the OpenStackDeployment custom resource:
Example of configuration of a custom DaemonSet backend for Cinder:
The configuration example deploys a DaemonSet for the Cinder volume backend
that uses the LVM driver and runs on nodes with the
openstack-compute-node=enabled label:
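A sketch of such a definition; the node-specific override syntax under spec:nodes, the backend name, and the key layout are assumptions for illustration:

spec:
  nodes:
    openstack-compute-node::enabled:            # nodes selected by this label
      features:
        cinder:
          volume:
            backends:
              lvm-backend:                      # illustrative backend name
                lvm:
                  volume_group: cinder-vol      # LVM group expected on the nodes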
Caution
For data storage, this backend uses the LVM cinder-vol
group that must be present on nodes before the new backend is applied.
For the procedure on how to deploy an LVM backend, refer to
Enable LVM block storage.
MOSK provides the cinder-service-cleaner CronJob
by default. This CronJob periodically checks whether all Cinder services in
OpenStack are up to date and removes any stale ones.
This CronJob is tested only with backends supported by MOSK.
If cinder-service-cleaner does not work properly with your custom Cinder
volume backend, you can disable it at the service level in the
OpenStackDeployment custom resource:
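One possible shape of such an override, assuming chart-value overrides under spec:services; both the path and the manifest key are assumptions, so consult the reference for your MOSK version:

spec:
  services:
    block-storage:
      cinder:
        values:
          manifests:
            cron_service_cleaner: false         # hypothetical key name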
Mirantis OpenStack for Kubernetes (MOSK) provides authentication,
service discovery, and distributed multi-tenant authorization through the
OpenStack Identity service, aka Keystone.
MOSK integrates with Mirantis Container Cloud Identity and
Access Management (IAM) subsystem to allow centralized management of users and
their permissions across multiple clouds.
The core component of Container Cloud IAM is Keycloak, the open-source identity
and access management software. Its primary function is to perform secure
authentication of cloud users against its built-in or various external
identity databases, such as LDAP directories, OpenID Connect or SAML
compatible identity providers.
By default, every MOSK cluster is integrated with the
Keycloak running in the Container Cloud management cluster. The integration
automatically provisions the necessary configuration on the
MOSK and Container Cloud IAM sides, such as the os
client object in Keycloak. However, for the federated users to get proper
permissions after logging in, the cloud operator needs to define the role
mapping rules specific to each MOSK environment.
MOSK enables you to connect external identity
provider to Keystone directly through the following structure in the
OpenStackDeployment custom resource:
The oidc_auth_type parameter specifies the Apache module to use:
oauth20 or oauth2. The oauth20 functionality is deprecated
and superseded by the newer oauth2 module. You can configure two or more
identity providers only with the oauth2 module.
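A sketch of such a structure, assuming the spec:features:keystone:federation:openid nesting and illustrative provider keys; only oidc_auth_type is taken verbatim from the description above:

spec:
  features:
    keystone:
      federation:
        openid:
          enabled: true
          oidc_auth_type: oauth2
          providers:
            my-idp:                                    # illustrative provider name
              issuer: https://idp.example.com/realms/cloud
              metadata:
                client:
                  client_id: os
                  client_secret: <CLIENT-SECRET>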
A region in MOSK represents a complete OpenStack cluster
that has a dedicated control plane and set of API endpoints. It is not uncommon
for operators of large clouds to offer their users several OpenStack regions,
which differ by their geographical location or purpose. In order to easily
navigate in a multi-region environment, cloud users need a way to distinguish
clusters by their names.
The region_name parameter of an OpenStackDeployment custom resource
specifies the name of the region that will be configured in all the OpenStack
services comprising the MOSK cluster upon the initial
deployment.
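For example, assuming region_name is set at the top level of spec:

spec:
  region_name: dc-eu-west-1   # illustrative name derived from the data center location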
Important
Once the cluster is up and running, the cloud operator cannot
set or change the name of the region. Therefore, Mirantis recommends
selecting a meaningful name for the new region before the deployment starts.
For example, the region name can be based on the name of the data center the
cluster is located in.
Application credentials is a mechanism in the MOSK
Identity service that enables application automation tools, such as shell
scripts, Terraform modules, Python programs, and others, to securely
perform various actions in the cloud API in order to deploy and manage
application components.
Application credentials is a modern alternative to the legacy approach where
every application owner had to request several technical user accounts
to ensure their tools could authenticate in the cloud.
For the details on how to create and authenticate with application credentials,
refer to Manage application credentials.
Application credentials must be explicitly enabled for federated users
By default, cloud users logging in to the cloud through the Mirantis Container
Cloud IAM or any external identity provider cannot use the application
credentials mechanism.
An application credential is heavily tied to the account of the cloud user
owning it. An application automation tool that is a consumer of the credential
acts on behalf of the human user who created the credential. Each action that
the application automation tool performs gets authorized against the
permissions, including roles and groups, the user currently has.
The source of truth about a federated user's permissions is the identity
provider. This information gets temporarily transferred to the cloud's
Identity service inside a token once the user authenticates. By default,
if such a user creates an application credential and passes it to the
automation tool, there is no data to validate the tool's actions on
the user's behalf.
However, a cloud operator can configure the authorization_ttl parameter
for an identity provider object to enable caching of its users' authorization
data. The parameter defines for how long, in minutes, the information about
user permissions is preserved in the database after the user successfully
logs in to the cloud.
Warning
Authorization data caching has security implications. If a
federated user account is revoked or the user's permissions change in the
identity provider, the cloud Identity service will still allow performing
actions on the user's behalf until the cached data expires or the user
re-authenticates in the cloud.
To set authorization_ttl to, for example, 60 minutes for the keycloak
identity provider in Keystone:
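A possible way to do this is through the OpenStack CLI, assuming your client version supports the --authorization-ttl option for identity providers:

openstack identity provider set keycloak --authorization-ttl 60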
Defines the domain-specific configuration and is useful for integration
with LDAP. The following example of an OsDpl resource with LDAP integration
creates a separate domain.with.ldap domain and configures it to use LDAP as
the identity driver:
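A sketch of such a configuration, assuming the spec:features:keystone:domain_specific_configuration nesting; the LDAP connection values are placeholders:

spec:
  features:
    keystone:
      domain_specific_configuration:
        enabled: true
        domains:
          domain.with.ldap:
            enabled: true
            config:
              identity:
                driver: ldap
              ldap:
                url: ldap://ldap.example.com
                user: uid=openstack,ou=people,dc=example,dc=com
                password: <LDAP-PASSWORD>
                suffix: dc=example,dc=com
                user_tree_dn: ou=people,dc=example,dc=com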
Mirantis OpenStack for Kubernetes (MOSK) provides the image management
capability through the OpenStack Image service, aka Glance.
The Image service enables you to discover, register, and retrieve virtual
machine images. Using the Glance API, you can query virtual machine image
metadata and retrieve actual images.
MOSK deployment profiles include the Image service in the
core set of services. You can configure the Image service through the
spec:features definition in the OpenStackDeployment custom resource.
MOSK can automatically verify the cryptographic signatures
associated with images to ensure the integrity of their data. A signed image
has a few additional properties set in its metadata that include
img_signature, img_signature_hash_method, img_signature_key_type,
and img_signature_certificate_uuid. You can find more information about
these properties and their values in the upstream OpenStack documentation.
MOSK performs image signature verification during the
following operations:
A cloud user or a service creates an image in the store and starts
to upload its data. If the signature metadata properties are set
on the image, its content gets verified against the signature.
The Image service accepts non-signed image uploads.
A cloud user spawns a new instance from an image. The Compute service
ensures that the data it downloads from the image storage matches
the image signature. If the signature is missing or does not match the
data, the operation fails. Limitations apply, see
Known limitations.
A cloud user boots an instance from a volume, or creates a new volume from
an image. If the image is signed, the Block Storage service compares the
downloaded image data against the signature. If there is a mismatch, the
operation fails.
The service will accept a non-signed image as a source for a volume.
Limitations apply, see Known limitations.
Every MOSK cloud is pre-provisioned with a baseline set of
images containing the most popular operating systems, such as Ubuntu, Fedora,
and CirrOS.
In addition, a few services in MOSK rely on the creation
of service instances to provide their functions, namely the Load Balancer
service and the Bare Metal service, and require corresponding images to exist
in the image store.
When image signature verification is enabled during the cloud deployment,
all these images get automatically signed with a pre-generated self-signed
certificate. Enabling the feature in an already existing cloud requires manual
signing of all of the images stored in it. Consult the OpenStack documentation
for an example of the image signing procedure.
The image signature verification is supported for LVM and local backends for
ephemeral storage.
The functionality is not compatible with Ceph-backed ephemeral storage
combined with RAW formatted images. The Ceph copy-on-write mechanism enables
the user to create instance virtual disks without downloading the image to
a compute node, the data is handled completely on the side of a Ceph cluster.
This enables you to spin up instances almost momentarily but makes it
impossible to verify the image data before creating an instance from it.
The Image service does not enforce the presence of a signature in
the metadata when the user creates a new image. The service will accept the
non-signed image uploads.
The Image service does not verify the correctness of an image signature
upon update of the image metadata.
MOSK does not validate if the certificate used to sign an
image is trusted,
it only ensures the correctness of the signature itself. Cloud users are
allowed to use self-signed certificates.
The Compute service does not verify image signature for Ceph backend when
the RAW image format is used as described in
Supported storage backends.
The Compute service does not verify image signature if the image is already
cached on the target compute node.
The Instance HA service may experience issues when auto-evacuating instances
created from signed images if it does not have access to the corresponding
secrets in the Key Manager service.
The Block Storage service does not perform image signature verification
when a Ceph backend is used and the images are in the RAW format.
The Block Storage service does not enforce the presence of a signature on
the images.
Instead of Swift, such configuration uses an S3 client to upload server-side
encrypted objects. Using server-side encryption, the data is sent over a secure
HTTPS connection in an unencrypted form and the Ceph Object Gateway stores that
data in the Ceph cluster in an encrypted form.
Defines the list of custom OpenStack Dashboard themes.
The content of the archive file with a theme depends on the level of
customization and can include static files, Django templates, and other
artifacts. For details, refer to the official OpenStack documentation:
Customizing Horizon Themes.
spec:
  features:
    horizon:
      themes:
        - name: theme_name
          description: The brand new theme
          url: https://<path to .tgz file with the contents of custom theme>
          sha256summ: <SHA256 checksum of the archive above>
MOSK enables a cloud operator to configure
Message of the Day (MOTD) for the MOSK Dashboard
(OpenStack Horizon). These short messages inform users about
current infrastructure issues, upcoming maintenance, and other events,
helping them plan their work with minimal service disruption.
Cloud operators can configure messages to appear before or after users
log in to Horizon, or both. Messages can also be visually distinguished
based on severity and support minimal HTML formatting, including links.
To define the MOTD, populate the following structure in the
OpenStackDeployment custom resource:
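A sketch of what such a structure could look like; the key names under motd are hypothetical and given only to illustrate the severity, placement, and HTML formatting capabilities described above:

spec:
  features:
    horizon:
      motd:
        - name: planned-maintenance       # hypothetical keys, for illustration only
          level: warning
          placement: pre-login
          message: >-
            Scheduled maintenance on Saturday 02:00-04:00 UTC.
            See <a href="https://status.example.com">the status page</a> for details.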
The Bare Metal service (Ironic) is an extra OpenStack service that can be
deployed by the OpenStack Controller (Rockoon). This section provides the
baremetal-specific configuration options of the OpenStackDeployment
resource.
To provision a user image onto a bare metal server, Ironic boots a node with
a ramdisk image. Depending on the node’s deploy interface and hardware, the
ramdisk may require different drivers (agents). MOSK
provides tinyIPA-based ramdisk images and uses the direct deploy interface
with the ipmitool power interface.
Since the bare metal node hardware may require additional drivers,
you may need to build a deploy ramdisk for your particular hardware. For more
information, see Ironic Python Agent Builder.
Be sure to create a ramdisk image with the version of Ironic Python Agent
appropriate for your OpenStack release.
Ironic supports the flat and multitenancy networking modes.
The flat networking mode assumes that all bare metal nodes are
pre-connected to a single network that cannot be changed during the
virtual machine provisioning. This network, with bridged interfaces
for Ironic, should be spread across all nodes, including compute nodes,
to allow plugging regular virtual machines into the Ironic network.
In its turn, the interface defined as provisioning_interface should
be spread across gateway nodes. The cloud operator can perform
all this underlying configuration through the L2 templates.
Example of the OsDpl resource illustrating the configuration for the flat
network mode:
spec:
  features:
    services:
      - baremetal
    neutron:
      external_networks:
        - bridge: ironic-pxe
          interface: <baremetal-interface>
          network_types:
            - flat
          physnet: ironic
          vlan_ranges: null
    ironic:
      # The name of neutron network used for provisioning/cleaning.
      baremetal_network_name: ironic-provisioning
      networks:
        # Neutron baremetal network definition.
        baremetal:
          physnet: ironic
          name: ironic-provisioning
          network_type: flat
          external: true
          shared: true
          subnets:
            - name: baremetal-subnet
              range: 10.13.0.0/24
              pool_start: 10.13.0.100
              pool_end: 10.13.0.254
              gateway: 10.13.0.11
      # The name of interface where provision services like tftp and ironic-conductor
      # are bound.
      provisioning_interface: br-baremetal
The multitenancy network mode uses the neutron Ironic network
interface to share physical connection information with Neutron. This
information is handled by Neutron ML2 drivers when plugging a Neutron port
to a specific network. MOSK supports the
networking-generic-switch Neutron ML2 driver out of the box.
Example of the OsDpl resource illustrating the configuration for the
multitenancy network mode:
spec:
  features:
    services:
      - baremetal
    neutron:
      tunnel_interface: ens3
      external_networks:
        - physnet: physnet1
          interface: <physnet1-interface>
          bridge: br-ex
          network_types:
            - flat
          vlan_ranges: null
          mtu: null
        - physnet: ironic
          interface: <physnet-ironic-interface>
          bridge: ironic-pxe
          network_types:
            - vlan
          vlan_ranges: 1000:1099
    ironic:
      # The name of interface where provision services like tftp and ironic-conductor
      # are bound.
      provisioning_interface: <baremetal-interface>
      baremetal_network_name: ironic-provisioning
      networks:
        baremetal:
          physnet: ironic
          name: ironic-provisioning
          network_type: vlan
          segmentation_id: 1000
          external: true
          shared: false
          subnets:
            - name: baremetal-subnet
              range: 10.13.0.0/24
              pool_start: 10.13.0.100
              pool_end: 10.13.0.254
              gateway: 10.13.0.11
The supported backend for Designate is PowerDNS. If required, you can specify
an external IP address and the protocol (UDP, TCP, or TCP + UDP) for the
Kubernetes LoadBalancer service that exposes PowerDNS.
To configure LoadBalancer for PowerDNS, use the spec:features:designate
definition in the OpenStackDeployment custom resource.
The list of supported options includes the following (see the configuration sketch after the list):
external_ip - Optional. An IP address for the LoadBalancer service. If
not defined, LoadBalancer allocates the IP address.
protocol - A protocol for the Designate backend in Kubernetes. Can only
be udp, tcp, or tcp+udp.
type - The type of the backend for Designate. Can only be powerdns.
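A configuration sketch with the options above, assuming they are nested under a backend key of spec:features:designate; the IP address is illustrative:

spec:
  features:
    designate:
      backend:
        type: powerdns
        protocol: udp
        external_ip: 10.172.1.101   # illustrative address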
Due to an issue in the dnspython library, full zone transfer (AXFR)
requests do not work, which makes it impossible to set up a secondary DNS zone.
The issue affects OpenStack Victoria and is fixed in the Yoga release.
MOSK Key Manager service (OpenStack Barbican) provides
secure storage, provisioning, and management of cloud application secret data,
such as Symmetric Keys, Asymmetric Keys, Certificates, and raw binary data.
Instance High Availability service (OpenStack Masakari) enables cloud users
to ensure that their instances get automatically evacuated from a failed
hypervisor.
The service consists of the following components:
API receives requests from users and events from monitors, and sends
them to the engine
Engine executes the recovery workflow
Monitors detect failures and notify the API. MOSK uses
monitors of the following types:
Instance monitor performs liveness checks of instance processes
Introspective instance monitor enhances instance high availability
within OpenStack environments by monitoring and identifying system-level
failures through the QEMU Guest Agent
Host monitor performs liveness checks of a compute host and runs as part of
the Node controller from the OpenStack Controller (Rockoon)
Note
The Processes monitor is not present in MOSK
because HA for the compute processes is handled by Kubernetes.
This section describes how to enable various components of the
Instance High Availability service for your MOSK
deployment:
The Instance HA service is not included into the core set of services and needs
to be explicitly enabled in the OpenStackDeployment custom resource.
Parameter
features:services:instance-ha
Usage
Enables Masakari, the OpenStack service that ensures high availability
of instances running on a host. To enable the service, add
instance-ha to the service list:
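For example, mirroring the spec:features:services list format used elsewhere in this document:

spec:
  features:
    services:
      - instance-ha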
The introspective instance monitor in the Instance High Availability
service enhances the reliability of the cloud environment by monitoring
virtual machines for failure events, including operating system crashes,
kernel panics, and unresponsive states. Upon detecting such events in real
time, the monitor initiates automated recovery actions, such as rebooting
the affected instance. This allows for reduced downtime and maintains high
availability of an OpenStack environment.
As a cloud operator, you can enable and configure the instance introspection
through the spec:features:masakari:monitors:introspective definition in the
OpenStackDeployment custom resource. The list of supported options includes the following (see the configuration sketch after the list):
enabled (boolean)
Enables or disables the introspection monitor. Default: false.
guest_monitoring_interval (integer)
Defines the time interval (in seconds) for monitoring the status of the
guest virtual machine. Default: 10.
guest_monitoring_timeout (integer)
Sets the timeout (in seconds) for detecting a non-responsive guest VM before
marking it as failed. Default: 2.
guest_monitoring_failure_threshold (integer)
Defines the number of consecutive failures required before a notification is
sent or recovery action is initiated. Default: 3.
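For example, the following sketch enables the monitor and sets the options above to their documented defaults:

spec:
  features:
    masakari:
      monitors:
        introspective:
          enabled: true
          guest_monitoring_interval: 10
          guest_monitoring_timeout: 2
          guest_monitoring_failure_threshold: 3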
The introspective instance monitor relies on the QEMU Guest Agent being
installed within the guest virtual machine. This agent enables communication
between the host and guest operating systems, ensuring precise monitoring
of the virtual machine health. Without the QEMU Guest Agent, the introspection
monitor cannot accurately assess the state of the virtual machine, which may
prevent the initiation of necessary recovery actions. To start monitoring,
refer to Configure the introspective instance monitor.
MOSK Shared Filesystems service (OpenStack Manila) provides
Shared Filesystems as a service. The Shared Filesystems service enables you to
create and manage shared filesystems in your multi-project cloud environments.
Note
MOSK does not support the Shared Filesystems
service for the clusters with Tungsten Fabric as a networking backend.
The Shared FileSystems service (OpenStack Manila) consists of manila-api,
manila-scheduler, and manila-share services. All these services
communicate with each other through the AMQP protocol and store their data
in the MySQL database:
manila-api
Provides a stable RESTful API, authenticates and routes requests
throughout the Shared Filesystem service
manila-scheduler
Responsible for scheduling and routing requests to the appropriate
manila-share service by determining which backend should
serve as the destination for a share creation request
manila-share
Responsible for managing Shared Filesystems service devices, specifically
the backend ones
The diagram below illustrates how the Shared FileSystems service components
communicate with each other.
MOSK ensures support for different kinds of equipment and
shared filesystems by means of special drivers that are part of the
manila-share service. These drivers also determine the ability to restrict
access to data stored on a shared filesystem, the list of operations with
Manila volumes, and the types of connections to the client network.
Driver Handles Share Servers (DHSS) is one of the main parameters that
define the Manila workflow including the way the Manila driver makes clients
access shared filesystems. Some drivers support only one DHSS mode,
for example, the LVM share driver. Others support both modes, for example,
the Generic driver. If the DHSS is set to False in the driver
configuration, the driver does not prepare the share server that provides
access to the share filesystems and the server and network setup should be
performed by the administrator. In this case, the Shared Filesystems service
only manages the server in its own configuration.
If the driver configuration includes DHSS=True, the driver creates a
service virtual machine that provides access to shared filesystems.
Also, when DHSS=True, the Shared Filesystems service performs a network
setup to provide client’s access to the created service virtual machine.
For working with the service virtual machine, the Shared Filesystems service
requires a separate service network that must be included in the driver’s
configuration as well.
The following are descriptions of drivers supported by the
MOSK Shared Filesystems service.
The generic driver is an example for the DHSS=True case.
There are two network topologies for connecting the client's network to the
service virtual machine, which depend on the
connect_share_server_to_tenant_network parameter.
If the connect_share_server_to_tenant_network parameter is set to
False, which is the default, the client must create a shared network connected
to a public router. IP addresses from this network will be granted access to
the created shared filesystem. The Shared Filesystems service creates a subnet
in its service network to which the network port of the new service virtual
machine and the network port of the client's router are connected. When a
new shared filesystem is created, the client's machine is granted access to it
through the router.
If the connect_share_server_to_tenant_network parameter is set to True,
the Shared Filesystems service creates the service virtual machines with two
network interfaces. One of them is connected to the service network while the
other one is connected to the client’s network.
The CephFS driver is a DHSS=False driver. The CephFS driver can be
configured to use the Ceph protocol to provide shares. However,
MOSK does not support the NFS Ganesha protocol.
The main advantages of using a direct connection to CephFS through the Ceph
protocol over using the NFS protocol include:
Simplified setup
No third-party services are required between the client and CephFS, whereas
an NFS layer can introduce an additional point of failure.
No additional load balancing
Making NFS highly available requires setting up additional load balancers,
which is unnecessary with direct CephFS access.
Enhanced access control
CephFS shares can be restricted using cephx authentication, whereas NFS
only allows access restrictions based on IP addresses.
For the CephFS driver to function, the manila-share service must have
access to the Storage Access network. To mount created shares, the client must
have access to the Storage Access network, the share URL, and credentials.
The URLs and credentials for created shares are exposed to clients through the
Manila API.
Note
Due to the existing limitation for Ceph clusters, Ceph Monitor
services are only accessible on the MOSK LCM network.
Therefore, both the manila-share service and clients require access
to the MOSK LCM network. By default, manila-share
already has access to this network. However, to enable access for external
clients, for example, client VMs, routing must be configured between the
client VM and the MOSK LCM network.
The risks of direct connection of client VMs to the Storage Access Network
include:
A malicious host on the same network may attempt to attack or scan other
clients or the Ceph cluster
A malicious host may intercept and manipulate communication, acting on
behalf of a valid client or Ceph cluster (a man-in-the-middle attack)
The following measures can help reduce these risks:
Ensure that port security is enabled on client VM ports connected to Ceph
networks, which is enabled by default on OpenStack networks
Ensure that the Ceph cluster and client use the msgr2 protocol with CRC and
secure modes enabled, which are enabled by default for
MOSK deployments
Configure OpenStack security groups for client VM ports to allow traffic only
from trusted hosts
The Shared Filesystems service is not included into the core set of services
and needs to be explicitly enabled in the OpenStackDeployment custom
resource.
To install the OpenStack Manila services, add the shared-file-system
keyword to the spec:features:services list:
spec:
  features:
    services:
      - shared-file-system
The above configuration installs the Shared Filesystems service with
the generic driver configured.
Enabling CephFS driver for Shared Filesystems service
Available since MOSK 25.1 TechPreview
Caution
MOSK does not support enabling both the generic
driver and CephFS driver in the same environment. If the CephFS driver is
enabled in an environment where the generic driver was previously enabled,
the CephFS driver will replace the generic one.
The CephFS driver is not enabled by default in the Shared Filesystems service.
To enable the CephFS driver:
In a cloud environment where resources are shared across all workloads,
those resources often become a point of contention.
For example, it is not uncommon for an oversubscribed compute node to
experience the noisy neighbor problem, when one of the instances starts
consuming a lot more resources than usual, negatively affecting the
performance of other instances running on the same node.
In such cases, an intervention is required from the cloud operators
to manually re-distribute workloads in the cluster to achieve more equal
utilization of resources.
The Dynamic Resource Balancer (DRB) service continuously measures resource
usage on hypervisors and redistributes workloads to achieve an optimum
target, thereby eliminating the need for manual interventions from cloud
operators.
The DRB service is implemented as a Kubernetes operator, controlled by
the custom resource of kind:DRBConfig. Unless at least one resource
of this kind is present, the service does not perform any operations. Cloud
operators who want to enable the DRB service for their MOSK
clouds need to create the resource with proper configuration.
The DRB controller consists of the following components interacting with each
other:
collector
Collects the statistics of resource consumption in the cluster
scheduler
Based on the data from the collector, makes decisions whether
cloud resources need to be relocated to achieve the optimum
actuator
Executes the resource relocation decisions made by scheduler
Out of the box, these service components implement a very simple logic, which,
however, can be individually enhanced according to the needs of a specific
cloud environment by utilizing their pluggable architecture. The plugins
need to be written in the Python programming language and injected as modules
into the DRB service by building a custom drb-controller container image.
Default plugins as well as custom plugins are configured through the
corresponding sections of DRBConfig custom resources.
Also, it is possible to limit the scope of DRB decisions and actions
to only a subset of hosts. This way, you can model the node grouping
schema that is configured in OpenStack, for example, compute node
aggregates and availability zones, to avoid DRB service attempting resource
placement changes that cannot be fulfilled by MOSK Compute
service (OpenStack Nova).
The spec section of the configuration consists of the following main parts (see the example DRBConfig sketch after the list):
collector
Specifies and configures the collector plugin to collect the metrics on
which decisions are based. At a minimum, the name of the plugin must
be provided.
scheduler
Specifies and configures the scheduler plugin that will make decisions
based on the collected metrics. At a minimum, the name of the plugin
must be provided.
actuator
Specifies and configures the actuator plugin that will move resources
around. At a minimum, the name of the plugin must be provided.
reconcileInterval
Defines time in seconds between reconciliation cycles. Should be large
enough for the metrics to settle after resources are moved around.
For the default stacklight collector plugin, this value must equal
at least 300.
hosts
Specifies the list of cluster hosts to which this given instance of
DRBConfig applies. This means that only metrics from these hosts
will be used for making decisions, only resources belonging to these
hosts will be considered for re-distribution, and only these hosts
will be considered as possible targets for re-distribution.
You can create multiple DRBConfig resources that watch over
non-overlapping sets of hosts.
The default for this setting is an empty list, which implies all hosts.
migrateAny
A boolean flag that the scheduler plugin can consider when making
decisions, allowing cloud operators and users to opt certain workloads
in or out of redistribution.
For the default vm-optimize scheduler plugin:
migrateAny:true (default) - any instance can be migrated, except for
instances tagged with lcm.mirantis.com:no-drb, explicitly opting out
of the DRB functionality
migrateAny:false - only instances tagged with
lcm.mirantis.com:drb are migrated by the DRB service,
explicitly opting in to the DRB functionality
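A sketch of a DRBConfig resource that puts these parts together using the default stacklight collector and vm-optimize scheduler plugins described below; the apiVersion, namespace, options nesting, and actuator plugin name are assumptions rather than authoritative values:

apiVersion: lcm.mirantis.com/v1alpha1   # hypothetical API group and version
kind: DRBConfig
metadata:
  name: drb-compute
  namespace: osh-system                 # hypothetical namespace
spec:
  reconcileInterval: 300
  migrateAny: true
  hosts: []                             # empty list implies all hosts
  collector:
    name: stacklight
  scheduler:
    name: vm-optimize
    options:                            # assumed nesting for plugin options
      load_threshold: 80
      min_improvement: 0
  actuator:
    name: <actuator-plugin-name>        # the live-migration actuator described below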
Collects node_load5, machine_cpu_cores, and
libvirt_domain_info_cpu_time_seconds:rate5m metrics
from the StackLight service running in the MOSK cluster.
Does not have options available.
Requires the reconcileInterval set to at least 300 (5 minutes),
as both the collected node and instance CPU usage metrics are effectively
averaged over a 5-minute sliding window.
Attempts to minimize the standard deviation of node load. The node load is
normalized per CPU core, so heterogeneous compute hosts can be compared.
Available options:
load_threshold
The value in percent of the compute host load after which the host will be
considered overloaded and attempts will be made to migrate instances
from it. Defaults to 80.
min_improvement
Minimal improvement of the optimization metric in percent.
While making decisions, the scheduler attempts to predict the resulting load
distribution to determine if moving resources is beneficial. If the total
improvement after all necessary decisions is calculated to be less than
min_improvement, no decisions will be executed.
Defaults to 0, meaning that any potential improvement is acted upon.
Setting this to a higher value should allow avoiding instance migrations
that provide negligible improvements.
Warning
The current version of this plugin takes into account only basic
resource classes when making scheduling decisions. These include only RAM,
disk, and vCPU count from the instance flavor. It does not take into account
any other information including specific image or aggregate metadata, custom
resource classes, PCI devices, NUMA, hugepages, and so on. Moving around
instances that consume such resources will more likely fail as the current
implementation of the scheduler plugin cannot reliably predict if such
instances fit onto the selected target host.
Live migrates instances to specific hosts. Assumes any migration is possible.
Refer to the hosts and migrateAny options above to learn how to control
which instances are migrated to which locations.
Available options:
max_parallel_migrations
Defines the number of instances to migrate in parallel.
Defaults to 10.
This value applies to all decisions being processed, so it may involve
instances from different hosts. Meanwhile, the nova-compute service may
have its own limits on how many live migrations a given host can handle
in parallel.
migration_polling_interval
Defines the interval in seconds for checking the instance status while
the latter is being migrated
Defaults to 5.
migration_timeout
Defines the interval in seconds after which an unfinished migration is
considered failed.
Only logs the decisions that were scheduled for execution.
Useful for debugging and dry-runs.
Note
The list of the services and their supported features included in
this section is not exhaustive and is constantly amended based on the
complexity of the architecture and the use of a particular service.
OpenStack and auxiliary services are running as containers in the kind:Pod
Kubernetes resources. All long-running services are governed by one of
the ReplicationController-enabled Kubernetes resources, which include
either kind:Deployment, kind:StatefulSet, or kind:DaemonSet.
The placement of the services is mostly governed by the Kubernetes node labels.
The labels affecting the OpenStack services include:
openstack-control-plane=enabled - the node hosting most of the OpenStack
control plane services.
openstack-compute-node=enabled - the node serving as a hypervisor for
Nova. The virtual machines with tenant workloads are created there.
openvswitch=enabled - the node hosting Neutron L2 agents and Open vSwitch
pods that manage the L2 connectivity of the OpenStack networks.
openstack-gateway=enabled - the node hosting Neutron L3, Metadata and
DHCP agents, Octavia Health Manager, Worker and Housekeeping components.
Note
OpenStack is an infrastructure management platform. Mirantis OpenStack
for Kubernetes (MOSK) uses Kubernetes mostly for
orchestration and dependency isolation. As a result, multiple OpenStack
services are running as privileged containers with host PIDs and Host
Networking enabled. You must ensure that at least the user with the
credentials used by Helm/Tiller (administrator) is capable of creating
such Pods.
While the underlying Kubernetes cluster is configured to use Ceph CSI
for providing persistent storage for container workloads, for some
types of workloads such networked storage is suboptimal due to latency.
This is why the separate local-volume-provisioner CSI is
deployed and configured as an additional storage class.
Local Volume Provisioner is deployed as kind:DaemonSet.
Database
A single WSREP (Galera) cluster of MariaDB is deployed as the SQL
database to be used by all OpenStack services. It uses the storage class
provided by Local Volume Provisioner to store the actual database files.
The service is deployed as kind:StatefulSet of a given size, which
is no less than 3, on any openstack-control-plane node. For details,
see OpenStack database architecture.
Messaging
RabbitMQ is used as a messaging bus between the components of the
OpenStack services.
A separate instance of RabbitMQ is deployed for each OpenStack service
that needs a messaging bus for intercommunication between its
components.
An additional, separate RabbitMQ instance is deployed to serve as
a notification messages bus for OpenStack services to post their own
and listen to notifications from other services.
StackLight also uses this message bus to collect notifications for
monitoring purposes.
Each RabbitMQ instance is a single node and is deployed as
kind:StatefulSet.
Caching
A single multi-instance Memcached service is deployed to be used
by all OpenStack services that need caching, which are mostly HTTP API
services.
Coordination
A separate instance of etcd is deployed to be used by Cinder,
which requires Distributed Lock Management for coordination between its
components.
Ingress
Is deployed as kind:DaemonSet.
Image pre-caching
A special kind:DaemonSet is deployed and updated each time the
kind:OpenStackDeployment resource is created or updated.
Its purpose is to pre-cache container images on Kubernetes nodes, and
thus, to minimize possible downtime when updating container images.
This is especially useful for containers used in kind:DaemonSet
resources, as during the image update Kubernetes starts to pull the
new image only after the container with the old image is shut down.
keystoneclient - a separate kind:Deployment with a pod that
has the OpenStack CLI client as well as relevant plugins installed,
and OpenStack admin credentials mounted. It can be used by
administrators to manually interact with OpenStack APIs from within the
cluster.
Image (Glance)
Supported backend is RBD (Ceph is required).
Volume (Cinder)
Supported backend is RBD (Ceph is required).
Network (Neutron)
Supported backends are Open vSwitch, Open Virtual Network, and Tungsten
Fabric.
Placement
Compute (Nova)
Supported hypervisor is Qemu/KVM through libvirt library.
Dashboard (Horizon)
DNS (Designate)
Supported backend is PowerDNS.
Load Balancer (Octavia)
Ceph Object Gateway (SWIFT)
Provides the object storage and a Ceph Object Gateway Swift API that is
compatible with the OpenStack Swift API. You can manually enable the
service in the OpenStackDeployment CR as described in
Deploy an OpenStack cluster.
Instance HA (Masakari)
An OpenStack service that ensures high availability of instances running
on a host. You can manually enable Masakari in the
OpenStackDeployment CR as described in Deploy an OpenStack cluster.
Orchestration (Heat)
Key Manager (Barbican)
The supported backends include:
The built-in Simple Crypto, which is used by default
Vault
Vault by HashiCorp is a third-party system and is not
installed by MOSK. Hence,
the Vault storage backend should be
available elsewhere on the user environment and accessible from
the MOSK deployment.
If the Vault backend is used, you can configure Vault in the
OpenStackDeployment CR as described in
Deploy an OpenStack cluster.
Tempest
Runs tests against a deployed OpenStack cloud. You can manually enable
Tempest in the OpenStackDeployment CR as described in
Deploy an OpenStack cluster.
Shared Filesystems (OpenStack Manila)
Provides Shared Filesystems as a service that enables you to create and
manage shared filesystems in multi-project cloud environments.
For details, refer to Shared Filesystems service.
A complete setup of a MariaDB Galera cluster for OpenStack is illustrated
in the following image:
MariaDB server pods run a Galera multi-master cluster. Client
requests are forwarded by the Kubernetes mariadb service to the
mariadb-server pod that has the primary label. Other pods from
the mariadb-server StatefulSet have the backup label. Labels are
managed by the mariadb-controller pod.
The MariaDB Controller periodically checks the readiness of the
mariadb-server pods and sets the primary label to a pod if the following
requirements are met:
The primary label has not already been set on the pod.
The pod is in the ready state.
The pod is not being terminated.
The pod name has the lowest integer suffix among other ready pods in
the StatefulSet. For example, between mariadb-server-1 and
mariadb-server-2, the pod with the mariadb-server-1 name is
preferred.
Otherwise, the MariaDB Controller sets the backup label. This means that
all SQL requests are passed only to one node while the other two nodes are in
the backup state and replicate the state from the primary node.
MariaDB clients connect to the mariadb service.
The OpenStack Controller (Rockoon) runs in a set of containers in a pod in
Kubernetes. Rockoon is deployed as a Deployment with 1 replica only.
The failover is provided by Kubernetes that automatically restarts the
failed containers in a pod.
However, given the recommendation to use a separate Kubernetes cluster
for each OpenStack deployment, the controller, in its envisioned mode of
operation and deployment, will only manage a single OpenStackDeployment
resource, making proper HA much less of an issue.
Rockoon is written in Python using Kopf, as a Python framework to build
Kubernetes operators, and Pykube, as a Kubernetes API client.
Using Kubernetes API, the controller subscribes to changes to resources of
kind:OpenStackDeployment, and then reacts to these changes by creating,
updating, or deleting appropriate resources in Kubernetes.
The basic child resources managed by the controller are Helm releases.
They are rendered from templates taking into account
an appropriate values set from the main and features fields in the
OpenStackDeployment resource.
Then, the common fields are merged to resulting data structures.
Lastly, the services fields are merged providing the final and precise override
for any value in any Helm release to be deployed or upgraded.
The constructed values are then used by Rockoon during a Helm release
installation.
The core container that handles changes in the osdpl object.
helmbundle
The container that watches the helmbundle objects
and reports their statuses to the osdpl object in
status:children. See OpenStackDeploymentStatus custom resource for details.
health
The container that watches all Kubernetes native
resources, such as Deployments, Daemonsets, Statefulsets,
and reports their statuses to the osdpl object in
status:health. See OpenStackDeploymentStatus custom resource for details.
secrets
The container that provides data exchange between different
components such as Ceph.
The CustomResourceDefinition resource in Kubernetes uses the
OpenAPI Specification version 2 to specify the schema of the resource
defined. The Kubernetes API outright rejects the resources that do not
pass this schema validation.
The language of the schema, however, is not expressive enough to define a
specific validation logic that may be needed for a given resource. For this
purpose, Kubernetes enables the extension of its API with
Dynamic Admission Control.
For the OpenStackDeployment (OsDpl) CR the ValidatingAdmissionWebhook
is a natural choice. It is deployed as part of OpenStack Controller (Rockoon)
by default and performs specific extended validations when an OsDpl CR is
created or updated.
The inexhaustive list of additional validations includes:
Deny the OpenStack version downgrade
Deny the OpenStack version skip-level upgrade
Deny the OpenStack master version deployment
Deny upgrade to the OpenStack master version
Deny upgrade if any part of an OsDpl CR specification
changes along with the OpenStack version
Under specific circumstances, it may be viable to disable the Admission
Controller, for example, when you attempt to deploy or upgrade to the master
version of OpenStack.
Warning
Mirantis does not support MOSK deployments
performed without the OpenStackDeployment Admission Controller enabled.
Disabling of the OpenStackDeployment Admission Controller is only
allowed in staging non-production environments.
To disable the Admission Controller, ensure that the following structures and
values are present in the rockoon HelmBundle resource:
The OpenStack Exporter collects metrics from the OpenStack services and exposes
them to Prometheus for integration with StackLight. The Exporter interacts with
the REST APIs of various OpenStack services to gather data about the
infrastructure state and performance for visualization, alerting, and analysis
within the monitoring system.
To retrieve metrics from the OpenStack Exporter:
Locate the Exporter pod. The OpenStack Exporter runs in the osh-system
namespace:
kubectl -n osh-system get pods | grep exporter
Query the metrics by executing the curl request inside the exporter
container:
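For example, a sketch of such a request; the metrics port and the container layout of the exporter pod are assumptions to verify in your deployment:

kubectl -n osh-system exec -it <exporter-pod-name> -- curl -s http://localhost:<metrics-port>/metrics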
MOSK provides configuration capabilities through a
number of custom resources. This section provides a detailed
overview of these custom resources and their possible configuration.
The OpenStackDeployment custom resource enables you to securely store
sensitive fields in Kubernetes secrets. To do that, verify that the
reference secret is present in the same namespace as the
OpenStackDeployment object and the
openstack.lcm.mirantis.com/osdpl_secret label is set to true.
The list of fields that can be hidden from OpenStackDeployment is limited
and defined by the OpenStackDeployment schema.
For example, to hide spec:features:ssl:public_endpoints:api_cert, use the
following structure:
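A sketch of such a structure, assuming a secret reference in the value_from:secret_key_ref form; the secret name and key are illustrative:

spec:
  features:
    ssl:
      public_endpoints:
        api_cert:
          value_from:
            secret_key_ref:
              name: osdpl-ssl-certs   # illustrative secret name
              key: api_cert           # illustrative key within the secret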
Main elements of OpenStackDeployment custom resource
Element
Sub-element
Description
apiVersion
n/a
Specifies the version of the Kubernetes API that is used to create
this object
kind
n/a
Specifies the kind of the object
metadata
name
Specifies the name of metadata. Should be set in compliance with the
Kubernetes resource naming limitations
namespace
Specifies the metadata namespace. While technically it is possible to
deploy OpenStack on top of Kubernetes in other than openstack
namespace, such configuration is not included in the
MOSK system integration test plans. Therefore,
Mirantis does not recommend such scenario.
Warning
Both OpenStack and Kubernetes platforms provide resources
to applications. When OpenStack is running on top of Kubernetes,
Kubernetes is completely unaware of OpenStack-native workloads,
such as virtual machines, for example.
For better results and stability, Mirantis recommends using a
dedicated Kubernetes cluster for OpenStack, so that OpenStack and
auxiliary services, Ceph, and StackLight are the only Kubernetes
applications running in the cluster.
spec
openstack_version
Specifies the OpenStack release to deploy
preset
String that specifies the name of the preset, a predefined
configuration for the OpenStack cluster. A preset includes:
A set of enabled services that includes virtualization, bare
metal management, secret management, and others
Major features provided by the services, such as VXLAN encapsulation
of the tenant traffic
Integration of services
Every supported deployment profile incorporates an OpenStack preset.
Refer to Deployment profiles for the list of possible values.
size
String that specifies the size category for the OpenStack cluster.
The size category defines the internal configuration of the cluster,
such as the number of replicas for service workers, timeouts, and so on.
The list of supported sizes includes:
tiny - for approximately 10 OpenStack compute nodes
small - for approximately 50 OpenStack compute nodes
medium - for approximately 100 OpenStack compute nodes
public_domain_name
Specifies the public DNS name for OpenStack services. This is a base
DNS name that must be accessible and resolvable by API clients of your
OpenStack cloud. It will be present in the OpenStack endpoints as
presented by the OpenStack Identity service catalog.
The TLS certificates used by the OpenStack services (see below) must
also be issued to this DNS name.
persistent_volume_storage_class
Specifies the Kubernetes storage class name used for services to create
persistent volumes. For example, backups of MariaDB. If not specified,
the storage class marked as default will be used.
features
Contains the top-level collections of settings for the OpenStack
deployment that potentially target several OpenStack services. The
section where the customizations should take place.
The features:services element contains a list of extra OpenStack
services to deploy. Extra OpenStack services are services that are not
included in the preset.
The list of services available for configuration includes Cinder, Nova,
Designate, Keystone, Glance, Neutron, Heat, Octavia, Barbican, Placement,
Ironic, Aodh, Gnocchi, and Masakari.
Mirantis is not responsible for cloud operability in case
of default policies modifications but provides API to pass the required
configuration to the core OpenStack services.
Enables a tested set of policies that limits the global admin role to
only the user with the admin role in the admin project or the user with the
service role. The latter should be used only for service users utilized
for communication between OpenStack services.
A low-level section that defines values that will be passed to all
OpenStack (spec:common:openstack) or auxiliary
(spec:common:infra) services Helm charts.
A section of the lowest level, enables the definition of
specific values to pass to specific Helm charts on a one-by-one basis:
Warning
Mirantis does not recommend changing the default settings for
spec:artifacts, spec:common, and spec:services elements.
Customizations can compromise the OpenStack deployment update and upgrade
processes.
However, you may need to edit the spec:services section to limit
hardware resources in case of a hyperconverged architecture as described in
Limit HW resources for hyperconverged OpenStack compute nodes.
Specifies the standard logging levels for OpenStack services that
include the following, at increasing severity: TRACE, DEBUG,
INFO, AUDIT, WARNING, ERROR, and CRITICAL.
Depending on the use case, you may need to configure the same application
components differently on different hosts. MOSK enables
you to easily perform the required configuration through node-specific
overrides at the OpenStack Controller side.
The limitation of using the node-specific overrides is that they override
only the configuration settings, while other components, such as startup
scripts, should be reconfigured separately.
Caution
The overrides have been implemented in a similar way to the
OpenStack node and node label specific DaemonSet configurations.
However, the OpenStack Controller node-specific settings conflict
with the upstream OpenStack node and node label specific DaemonSet
configurations. Therefore, Mirantis does not recommend configuring node and
node label overrides.
The list of allowed node labels is located in the Cluster object status
providerStatus.releaseRef.current.allowedNodeLabels field.
If the value field is not defined in allowedNodeLabels, a label can
have any value.
Before or after a machine deployment, add the required label from the allowed
node labels list with the corresponding value to
spec.providerSpec.value.nodeLabels in machine.yaml. For example:
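A sketch of such an entry, assuming the key/value list format for nodeLabels and using the openstack-compute-node label mentioned later in this section:

spec:
  providerSpec:
    value:
      nodeLabels:
        - key: openstack-compute-node
          value: enabled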
The addition of a node label that is not available in the list of allowed node
labels is restricted.
The node-specific settings are activated through the spec:nodes
section of the OsDpl CR. The spec:nodes section contains the following
subsections:
features - implements overrides for a limited subset of fields and is
constructed similarly to spec:features
services - similarly to spec:services, enables you to override
settings in general for the components running as DaemonSets.
Example configuration:
spec:
  nodes:
    <NODE-LABEL>::<NODE-LABEL-VALUE>:
      features:
        # Detailed information about features might be found at
        # openstack_controller/admission/validators/nodes/schema.yaml
      services:
        <service>:
          <chart>:
            <chart_daemonset_name>:
              values:
                # Any value from specific helm chart
The resource of kind OpenStackDeploymentStatus is a custom resource that
describes the status of an OpenStack deployment. To obtain detailed information
about the schema of an OpenStackDeploymentStatus custom resource:
OPENSTACKVERSION displays the actual OpenStack version of the
deployment
CONTROLLERVERSION indicates the version of the OpenStack Controller
(Rockoon) responsible for the deployment
STATE reflects the current status of life-cycle management. The list
of possible values includes:
APPLYING indicates that some Kubernetes objects for applications
are in the process of being applied
APPLIED indicates that all Kubernetes objects for applications
have been applied to the latest state
LCMPROGRESS reflects the current progress of STATE in the format
X/Y, where X denotes the number of applications with Kubernetes objects
applied and in the actual state, and Y represents the total number of
applications managed by the OpenStack Controller (Rockoon)
HEALTH provides an overview of the current health status of the
OpenStack deployment in the format X/Y, where X represents the number
of applications with notReady pods, and Y is the total number of
applications managed by the OpenStack Controller (Rockoon)
MOSKRELEASE displays the current product release of the OpenStack
deployment
The services subsection provides detailed information about the LCM
operations performed on a specific service. This is a dictionary where keys
are service names, for example, baremetal or compute, and values are
dictionaries with the following items.
Since MOSK 25.1, the OpenStack Controller has been open-sourced under the
name Rockoon and is maintained as an independent open-source project
going forward.
As part of this transition, all openstack-controller pods are named
rockoon pods across the MOSK documentation and deployments. This change
does not affect functionality, but users should apply the new naming to
pods and other related artifacts accordingly.
The OpenStack Controller (Rockoon) enables you to modify its configuration at
runtime without restarting. MOSK stores the controller
configuration in the rockoon-config ConfigMap in the osh-system
namespace of your cluster.
To retrieve the Rockoon configuration ConfigMap, run:
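For example, using the ConfigMap name and namespace mentioned above:

kubectl -n osh-system get configmap rockoon-config -o yaml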
The number of seconds to wait for all application components
to become ready.
wait_application_ready_delay
10
The number of seconds before going to the sleep mode between attempts
to verify if the application is ready.
node_not_ready_flapping_timeout
120
The amount of time to wait for the flapping node.
[helmbundle]
manifest_enable_timeout
600
The number of seconds to wait until the values set in the manifest
are propagated to the dependent objects.
manifest_enable_delay
10
The number of seconds between attempts to verify if the values
were applied.
manifest_disable_timeout
600
The number of seconds to wait until the values are removed from
the manifest and propagated to the child objects.
manifest_disable_delay
10
The number of seconds between attempts to verify if the values were
removed from the release.
manifest_purge_timeout
600
The number of seconds to wait until the Kubernetes object is removed.
manifest_purge_delay
10
The number of seconds between attempts to verify if the Kubernetes
object is removed.
manifest_apply_delay
10
The number of seconds to pause for the Helm bundle changes.
[maintenance]
instance_migrate_concurrency
1
The number of instances to migrate concurrently.
nwl_parallel_max_compute
30
The maximum number of compute nodes allowed for a parallel update.
nwl_parallel_max_gateway
1
The maximum number of gateway nodes allowed for a parallel update.
respect_nova_az
true
Respect Nova availability zone (AZ). The true value allows
the parallel update only for the compute nodes in the same AZ.
ndr_skip_instance_check
false
The flag to skip the instance verification on a host before proceeding
with the node removal. The false value blocks the node removal
until at least one instance exists on the host.
ndr_skip_volume_check
false
The flag to skip the volume verification on a host before proceeding
with the node removal. The false value blocks the node removal
until at least one volume exists on the host. A volume is tied to
a specific host only for the LVM backend.
The OpenStack Controller enables you to use customized images in your OpenStack
deployments. To start using such images, create a ConfigMap in the
openstack namespace with the following content, replacing
<OPENSTACKDEPLOYMENT-NAME> with the name of your OpenStackDeployment
custom resource:
MOSK relies on the MariaDB Galera cluster to provide
its OpenStack components with a reliable storage of persistent data.
For successful long-term operations of a MOSK cloud, it
is crucial to ensure the healthy state of the OpenStack database as well as the
safety of the data stored in it. To help you with that, MOSK
provides built-in automated procedures for OpenStack database maintenance,
backup, and restoration. The hereby chapter describes the internal mechanisms
and configuration details for the provided tools.
Overview of the OpenStack database backup and restoration
MOSK relies on the MariaDB Galera cluster to provide
its OpenStack components with a reliable storage for persistent data.
Mirantis recommends backing up your OpenStack databases daily to ensure
the safety of your cloud data. Also, you should always create an instant
backup before updating your cloud or performing any kind of potentially
disruptive experiment.
MOSK has a built-in automated backup routine that can be
triggered manually or by schedule. For detailed information about the process
of MariaDB Galera cluster backup, refer to Workflows of the OpenStack database backup and restoration.
Backup and restoration can only be performed against the OpenStack database
as a whole. Granular per-service or per-table procedures are not supported
by MOSK.
By default, periodic backups are turned off. However, a cloud operator can
easily enable this capability by adding the following structure to the
OpenStackDeployment custom resource:
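A minimal sketch of such a structure, assuming the backup settings live under spec:features:database:backup:

spec:
  features:
    database:
      backup:
        enabled: true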
By default, MOSK backup routine stores the OpenStack
database data into the Mirantis Ceph cluster, which is a part of the same
cloud. This is sufficient for the vast majority of clouds. However, you may
want to have the backup data stored off the cloud to comply with specific
enterprise practices for infrastructure recovery and data safety.
The size of a backup storage volume depends directly on the size of the
MOSK cluster, which can be determined through the
size parameter in the OpenStackDeployment CR.
The list of the recommended sizes for a minimal backup volume includes:
20 GB for the tiny cluster size
40 GB for the small cluster size
80 GB for the medium cluster size
If required, you can change the default size of a database backup volume.
However, make sure that you configure the volume size before OpenStack
deployment is complete. This is because there is no automatic way to
resize the backup volume once the cloud is deployed. Also, only the local
backup storage (Ceph) supports the configuration of the volume size.
To change the default size of the backup volume, use the following structure
in the OpenStackDeployment CR:
To store the backup data to a local Mirantis Ceph, the MOSK
underlying Kubernetes cluster needs to have a preconfigured storage class for
Kubernetes persistent volumes with the Ceph cluster as a storage backend.
When restoring the OpenStack database from a local Ceph storage, the cron job
restores the state on each MariaDB node sequentially. It is not possible to
perform parallel restoration because Ceph Kubernetes volumes do not support
concurrent mounting from multiple places.
MOSK provides you with a capability to store the OpenStack
database data outside of the cloud, on an external storage device that supports
common data access protocols, such as third-party NAS appliances.
Security compliance may require storing backups of databases in an encrypted
format. MOSK enables encryption of database backups, both
local and remote, using the OpenSSL aes-256-cbc encryption.
To encrypt database backups, add the following configuration to the
OpenStackDeployment custom resource:
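A minimal sketch, assuming an encryption subsection under the backup settings (the exact field names may differ in your product version):
spec:
  features:
    database:
      backup:
        encryption:
          enabled: true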
Workflows of the OpenStack database backup and restoration¶
This section provides technical details about the internal implementation
of automated backup and restoration routines built into
MOSK. The information below is helpful for troubleshooting
issues related to the process and for understanding the
impact these procedures have on a running cloud.
If enabled, synchronizing the local backup storage with the
remote S3 storage
During the first backup phase, the following actions take place:
The mariadb-phy-backup pod starts on the node where the
mariadb-server replica with the highest number in its name runs.
For example, if the MariaDB server pods are named mariadb-server-0,
mariadb-server-1, and mariadb-server-2, the
mariadb-phy-backup pod starts on the same node as
mariadb-server-2.
The backup process verifies the hash sums of existing backup files
based on ConfigMap information:
If the verification fails and synchronization with the remote
S3 storage is enabled, the process checks the hash sums of
remote backups as well. If the remote backups are valid, they
are downloaded.
If the hash sums are incorrect for both local and remote backups,
the backup job fails.
If no ConfigMap exists, these hash sum checks are skipped.
Sanity check: verification of the Kubernetes status and wsrep status of
each MariaDB pod. If some pods have wrong statuses, the backup job
fails unless the --allow-unsafe-backup parameter is passed to
the main script in the Kubernetes backup job.
Note
Since MOSK 22.4, the --allow-unsafe-backup
functionality is removed from the product for security and backup
procedure simplification purposes.
Mirantis does not recommend setting the --allow-unsafe-backup
parameter unless it is absolutely required. To ensure the consistency
of a backup, verify that the MariaDB Galera cluster is in a working
state before you proceed with the backup.
Desynchronize the replica from the Galera cluster. The script connects to
the target replica and sets the wsrep_desync variable to ON.
Then, the replica stops receiving write-sets and receives the wsrep
status Donor/Desynced. The Kubernetes health check of that
mariadb-server pod fails and the Kubernetes status of that pod
becomes NotReady. If the pod has the primary label, the MariaDB
Controller sets the backup label to it and the pod is removed from
the endpoints list of the MariaDB service.
Verify that there is enough space in the /var/backup folder to
perform the backup. The amount of available space in the folder
should exceed <DB-SIZE>*<MARIADB-BACKUP-REQUIRED-SPACE-RATIO>
in KB.
The mariadb-phy-backup pod performs the backup using the
mariabackup tool.
The script puts the backed up replica back to sync with the Galera
cluster by setting wsrep_desync to OFF and waits for
the replica to become Ready in Kubernetes.
The script calculates hash sums for backup files and stores them in a
special ConfigMap.
If the number of existing backups exceeds the value of the
MARIADB_BACKUPS_TO_KEEP job parameter, the script removes
the oldest backups to maintain the allowed limit.
If enabled, the script synchronizes the local backup storage with the
remote S3 storage.
The mariadb-phy-restore job launches the mariadb-phy-restore pod.
This pod starts with the mariadb-server PVC with the highest number
in its name. This PVC is mounted to the /var/lib/mysql folder and the
backup PVC (or local filesystem if the hostpath backend is configured)
is mounted to /var/backup.
The mariadb-phy-restore pod contains the main restore script, which is
responsible for:
Scaling the mariadb-server StatefulSet
Verifying the statuses of mariadb-server pods
Managing the openstack-mariadb-phy-restore-runner pods
During the restoration, the database is not available to
OpenStack services, which means a complete outage of all OpenStack
services.
During the first phase, the following actions take place:
The restoration process verifies the hash sums of existing backup files
based on ConfigMap information:
If the verification fails and synchronization with the remote
S3 storage is enabled, the process checks the hash sums of
remote backups as well. If the remote backups are valid, they
are downloaded.
If the hash sums are incorrect for both local and remote backups,
the backup job fails.
Save the list of mariadb-server persistent volume claims (PVC).
Scale the mariadb-server StatefulSet to 0 replicas.
At this point, the database becomes unavailable for OpenStack services.
By design, when deleting a cloud resource, for example, an instance, volume,
or router, an OpenStack service does not immediately delete its data but
marks it as removed so that it can later be picked up by the garbage
collector.
Given that an OpenStack resource is often represented by more than one record
in the database, deletion of all of them right away could affect the overall
responsiveness of the cloud API. On the other hand, an OpenStack database
being severely clogged with stale data is one of the most typical reasons for
the cloud slowness.
To keep the OpenStack database small and fast,
MOSK is pre-configured to automatically clean up the removed
database records older than 30 days. By default, the cleanup is performed for
the following MOSK services every Monday according to the
schedule:
The default database cleanup schedule by OpenStack service¶
Service                             Service identifier    Cleanup time
Block Storage (OpenStack Cinder)    cinder                12:01 a.m.
Compute (OpenStack Nova)            nova                  01:01 a.m.
Image (OpenStack Glance)            glance                02:01 a.m.
Instance HA (OpenStack Masakari)    masakari              03:01 a.m.
Key Manager (OpenStack Barbican)    barbican              04:01 a.m.
Orchestration (OpenStack Heat)      heat                  05:01 a.m.
If required, you can adjust the cleanup schedule for the OpenStack database by
adding the features:database:cleanup setting to the OpenStackDeployment
CR following the example below. The schedule parameter must contain a
valid cron expression. The age parameter specifies the number of days after
which a stale record gets cleaned up.
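For example, a sketch that cleans up stale Compute service records older than 30 days every Monday at 01:01 a.m.; the per-service nesting is an assumption based on the parameters described above:
spec:
  features:
    database:
      cleanup:
        nova:
          enabled: true
          schedule: "1 1 * * 1"
          age: 30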
MOSK uses the Mariabackup utility to back up the MariaDB
Galera cluster data where the OpenStack data is stored. Mariabackup is
launched periodically as part of a Kubernetes CronJob, which is included
in any MOSK deployment and is suspended by default.
Note
If you are using the default backend to store the backup data,
which is Ceph, you can increase the default size of a backup volume.
However, make sure to configure the volume size before you deploy
OpenStack.
MOSK enables you to configure the periodic backup of the
OpenStack database through the OpenStackDeployment object. To enable the
backup, use the following structure:
spec:
  features:
    database:
      backup:
        enabled: true
TechPreview
To enhance cloud security, you can enable encryption of OpenStack
database backups using the OpenSSL aes-256-cbc encryption
through the OpenStackDeployment custom resource. Refer to
Backup encryption for configuration details.
By default, the backup job:
Runs backup on a daily basis at 01:00 AM
Creates incremental backups daily and full backups weekly
Keeps 10 latest full backups
Stores backups in the mariadb-phy-backup-data PVC
Has the backup timeout of 3600 seconds
Has the incremental backup type
To verify the configuration of the mariadb-phy-backup CronJob
object, run:
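For example, assuming the default openstack namespace used throughout this guide:
kubectl -n openstack get cronjob mariadb-phy-backup -o yaml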
Type of a backup. The list of possible values includes:
incremental
If the newest full backup is older than the value of
the full_backup_cycle parameter, the system performs a full
backup. Otherwise, the system performs an incremental backup of
the newest full backup.
Number of seconds that defines a period between 2 full backups.
During this period, incremental backups are performed. The parameter
is taken into account only if backup_type is set to
incremental. Otherwise, it is ignored.
For example, with full_backup_cycle set to 604800 seconds,
a full backup is taken weekly and, if the cron schedule is set to 0 0 * * *,
an incremental backup is performed on a daily basis.
Multiplier for the database size to predict the space required to
create a backup, either full or incremental, and perform a
restoration keeping the uncompressed backup files on the same file
system as the compressed ones.
To estimate the size of MARIADB_BACKUP_REQUIRED_SPACE_RATIO, use
the following formula: size of (1 uncompressed full backup + all
related incremental uncompressed backups + 1 full compressed backup)
in KB <= (DB_SIZE * MARIADB_BACKUP_REQUIRED_SPACE_RATIO) in
KB.
The DB_SIZE is the disk space allocated in the MySQL data
directory, which is /var/lib/mysql, for databases data excluding
galera.cache and ib_logfile* files. This parameter prevents
the backup PVC from being full in the middle of the restoration and
backup procedures. If the current available space is lower than
DB_SIZE * MARIADB_BACKUP_REQUIRED_SPACE_RATIO, the backup
script fails before the system starts the actual backup and the
overall status of the backup job is failed.
For example, to perform full backups monthly and incremental backups
daily at 02:30 AM and keep the backups for the last six months,
configure the database backup in your OpenStackDeployment object
as follows:
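A possible sketch of such a configuration is provided below; the parameter names correspond to the options described above but should be verified against your product version:
spec:
  features:
    database:
      backup:
        enabled: true
        backup_type: incremental
        full_backup_cycle: 2592000   # 30 days between full backups
        schedule_time: "30 2 * * *"  # daily at 02:30 AM
        backups_to_keep: 6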
By default, MOSK stores the OpenStack database backups
locally in the Mirantis Ceph cluster, which is a part of the same cloud.
Alternatively, MOSK provides you with a capability to create
remote backups using an external storage. This section contains configuration
details for a remote backend to be used for the OpenStack data backup.
In general, the built-in automated backup routine saves the data to the
mariadb-phy-backup-data PersistentVolumeClaim (PVC), which is provisioned
from the StorageClass specified in the spec.persistent_volume_storage_class
parameter of the OpenstackDeployment custom resource (CR).
Remote NFS storage for OpenStack database backups¶
A preconfigured NFS server with an NFS share that a Unix backup and
restore user has access to. By default, it is the same user that runs
the MySQL server in a MariaDB image.
Removal of the NFS persistent volume does not automatically remove the data.
No validation of mount options. If mount options are specified incorrectly in
the OpenStackDeployment CR, the mount command fails upon the
creation of a backup runner pod.
To enhance cloud security, you can enable encryption of OpenStack
database backups using the OpenSSL aes-256-cbc encryption
through the OpenStackDeployment custom resource. Refer to
Backup encryption for configuration details.
Optionally, MOSK enables you to set the required mount
options for the NFS mount command. You can set as many options
of mount as you need. For example:
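A minimal sketch, assuming the NFS backend and its mount options are configured under the backup section; the exact field names are assumptions and may differ in your product version:
spec:
  features:
    database:
      backup:
        enabled: true
        backend: pv_nfs
        pv_nfs:
          server: <IP-OR-FQDN-OF-NFS-SERVER>
          path: <PATH-TO-NFS-SHARE>
        # assumed location for the mount options of the NFS mount command
        mount_options:
          - "nfsvers=4"
          - "timeo=900"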
Synchronization of local MariaDB backups with a remote S3 storage¶
Available since MOSK 25.1TechPreview
MOSK provides the capability to synchronize local MariaDB
backups with a remote S3 storage. Distributing backups across multiple
locations increases their safety. Optionally, backup archives stored in S3
can be encrypted on the server side.
To enable synchronization, you need to have a preconfigured S3 storage and
a user account for access.
Enable synchronization by adding the following structure to the
OpenStackDeployment custom resource. For example, to use Ceph RadosGW
as the S3 storage provider and enable server-side encryption for stored
archives:
spec:
  features:
    database:
      backup:
        enabled: true
        sync_remote:
          enabled: true
          remotes:
            <REMOTE-NAME>:
              conf:
                type: s3
                provider: Ceph
                endpoint: <URL-TO-S3-STORAGE>
                path: <BUCKET-NAME-FOR-BACKUPS-ON-S3-STORAGE>
                server_side_encryption: "aws:kms"
                access_key_id:
                  value_from:
                    secret_key_ref:
                      key: access_key
                      name: mariadb-backup-s3-hidden
                secret_access_key:
                  value_from:
                    secret_key_ref:
                      key: secret_key
                      name: mariadb-backup-s3-hidden
                sse_kms_key_id:
                  value_from:
                    secret_key_ref:
                      key: sse_kms_key_id
                      name: mariadb-backup-s3-hidden
Alternatively, you can set the provider parameter to AWS
if you prefer using AWS as a provider for S3 storage and omit the
server_side_encryption and sse_kms_key_id parameters if
encryption is not required.
The internal components of Mirantis OpenStack for Kubernetes (MOSK)
coordinate their operations and exchange status information using the
cluster’s message bus (RabbitMQ).
MOSK enables you to configure OpenStack services to emit
notification messages to the MOSK cluster messaging bus
(RabbitMQ) every time an OpenStack resource, for example, an instance, image,
and so on, changes its state due to a cloud user action or through its
lifecycle. For example, MOSK Compute service (OpenStack
Nova) can publish the instance.create.end notification once a newly created
instance is up and running.
Note
In certain cases, RabbitMQ notifications may prove unreliable, such
as when the RabbitMQ server undergoes a restart or when communication
between the server and the client reading the notifications breaks down.
To optimize reliability, Mirantis suggests using multiple channels to store
notification events, encompassing:
OpenStack notification messages can be consumed and processed by various
corporate systems to integrate MOSK clouds into the
company infrastructure and business processes.
The list of the most common use cases includes:
Using notification history for retrospective security audit
Using the real-time aggregation of notification messages to gather
statistics on cloud resource consumption for further capacity planning
Cloud billing considerations
Notifications alone should not be considered a source of data for any
kind of financial reporting. The delivery of the messages cannot be
guaranteed due to various technical reasons. For example, messages can
be lost if an external consumer is not fetching them from the queue fast
enough.
Mirantis strongly recommends that your cloud billing solutions rely on the
combination of the following data sources:
Periodic polling of the OpenStack API as a reliable source of information
about allocated resources
Subscription to notifications to receive timely updates about the resource
status change
A cloud administrator can securely expose part of a MOSK
cluster message bus to the outside world. This enables an external consumer
to subscribe to the notification messages emitted by the cluster services.
Important
The latest OpenStack release available in MOSK supports
notifications from the following services:
Block storage (OpenStack Cinder)
DNS (OpenStack Designate)
Image (OpenStack Glance)
Orchestration (OpenStack Heat)
Bare Metal (OpenStack Ironic)
Identity (OpenStack Keystone)
Shared Filesystems (OpenStack Manila)
Instance High Availability (OpenStack Masakari)
Networking (OpenStack Neutron)
Compute (OpenStack Nova)
To enable the external notification endpoint, add the following structure
to the OpenStackDeployment custom resource. For example:
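A minimal sketch, assuming the external notifications are configured under the messaging feature section; the topic name is illustrative:
spec:
  features:
    messaging:
      notifications:
        external:
          enabled: true
          topics:
            - external-consumer-a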
For each topic name specified in the topics field, MOSK
creates a topic exchange in its RabbitMQ cluster together with a set of queues
bound to this topic. All enabled MOSK services will
publish their notification messages to all configured topics so that
multiple consumers can receive the same messages in parallel.
A topic name must follow the Kubernetes standard format for object names and
IDs, that is, contain only lowercase alphanumeric characters, hyphens (-), or
periods (.). The topic name notifications is reserved for internal use.
MOSK supports the connection to message bus (RabbitMQ)
through an encrypted or non-encrypted endpoint. Once connected, it supports
authentication through either a plain text user name and password or mutual
TLS authentication using encrypted X.509 client certificates.
Each topic exchange is protected by automatically generated authentication
credentials and certificates for secure connection that are stored as a secret
in the openstack-external namespace of a MOSK underlying
Kubernetes cluster. A secret is identified by the name of the topic. The list
of attributes for the secret object includes:
hosts
The IP addresses on which the external notification endpoint is available
port_amqp, port_amqp-tls
The TCP ports on which the external notification endpoint is available
vhost
The name of the RabbitMQ virtual host on which the topic queues are created
username, password
Authentication data
ca_cert
The client CA certificate
client_cert
The client certificate
client_key
The client private key
For the configuration example above, the following objects will be created:
Tungsten Fabric provides basic L2/L3 networking to an OpenStack environment
running on the MKE cluster and includes the IP address management, security
groups, floating IP addresses, and routing policies functionality.
Tungsten Fabric is based on overlay networking, where all virtual machines are
connected to a virtual network with encapsulation (MPLSoGRE, MPLSoUDP, VXLAN).
This enables you to separate the underlay Kubernetes management network. A
workload requires an external gateway, such as a hardware EdgeRouter or a
simple gateway to route the outgoing traffic.
The Tungsten Fabric vRouter uses different gateways for the control and data
planes.
All services of Tungsten Fabric are delivered as separate containers, which
are deployed by the Tungsten Fabric Operator (TFO). Each container has an
INI-based configuration file that is available on the host system. The
configuration file is generated automatically upon the container start and is
based on environment variables provided by the TFO through Kubernetes
ConfigMaps.
The main Tungsten Fabric containers run with the host network as
DaemonSets, without using the Kubernetes networking layer. The services
listen directly on the host network interface.
The following diagram describes the minimum production installation of
Tungsten Fabric with a Mirantis OpenStack for Kubernetes (MOSK)
deployment.
For the details about the Tungsten Fabric services included in
MOSK deployments and the types of traffic and traffic
flow directions, see the subsections below.
This section describes the Tungsten Fabric services and their distribution
across the Mirantis OpenStack for Kubernetes (MOSK) deployment.
The Tungsten Fabric services run mostly as DaemonSets in separate containers
for each service. The deployment and update processes are managed by the
Tungsten Fabric Operator. However, Kubernetes manages the probe checks and
restart of broken containers.
All configuration and control services run on the Tungsten Fabric Controller
nodes.
Service name
Service description
config-api
Exposes a REST-based interface for the Tungsten Fabric API.
config-provisioner
Provisions the node for execution of configuration services.
control
Communicates with the cluster gateways using BGP and with the vRouter
agents using XMPP, as well as redistributes appropriate networking
information.
control-provisioner
Provisions the node for execution of control services.
device-manager
Manages physical networking devices using netconf or ovsdb.
In multi-node deployments, it operates in the active-backup mode.
dns
Using the named service, provides the DNS service to the VMs spawned
on different compute nodes. Each vRouter node connects to two
Tungsten Fabric Controller containers that run the dns process.
named
The customized Berkeley Internet Name Domain (BIND) daemon of
Tungsten Fabric that manages DNS zones for the dns service.
schema
Listens to configuration changes performed by a user and generates
corresponding system configuration objects. In multi-node deployments,
it works in the active-backup mode.
svc-monitor
Listens to configuration changes of service-template and
service-instance, as well as spawns and monitors virtual machines
for the firewall, analyzer services, and so on. In multi-node
deployments, it works in the active-backup mode.
webui
Consists of the webserver and jobserver services. Provides
the Tungsten Fabric web UI.
All analytics services run on Tungsten Fabric analytics nodes.
Service name
Service description
alarm-gen
Evaluates and manages the alarms rules.
analytics-api
Provides a REST API to interact with the Cassandra analytics
database.
analytics-nodemgr
Collects all Tungsten Fabric analytics process data and sends
this information to the Tungsten Fabric collector.
analytics-database-nodemgr
Provisions the init model if needed. Collects data of the database
process and sends it to the Tungsten Fabric collector.
collector
Collects and analyzes data from all Tungsten Fabric services.
query-engine
Handles the queries to access data from the Cassandra database.
snmp-collector
Receives the authorization and configuration of the physical routers
from the config-nodemgr service, polls the physical routers using
the Simple Network Management Protocol (SNMP), and uploads the data to
the Tungsten Fabric collector.
topology
Reads the SNMP information from the physical router user-visible
entities (UVEs), creates a neighbor list, and writes the neighbor
information to the physical router UVEs. The Tungsten Fabric web UI uses
the neighbor list to display the physical topology.
The Tungsten Fabric vRouter provides data forwarding to an OpenStack tenant
instance and reports statistics to the Tungsten Fabric analytics service. The
Tungsten Fabric vRouter is installed on all OpenStack compute nodes.
Mirantis OpenStack for Kubernetes (MOSK) supports the kernel-based
deployment of the Tungsten Fabric vRouter.
vrouter-agent
Connects to the Tungsten Fabric Controller container and the Tungsten
Fabric DNS system using the Extensible Messaging and Presence Protocol
(XMPP). The vRouter Agent acts as a local control plane. Each Tungsten
Fabric vRouter Agent is connected to at least two Tungsten Fabric
controllers in an active-active redundancy mode.
The Tungsten Fabric vRouter Agent is responsible for all
networking-related functions including routing instances, routes,
and others.
The Tungsten Fabric vRouter uses different gateways for the control
and data planes. For example, the Linux system gateway is located
on the management network, and the Tungsten Fabric gateway is located
on the data plane network.
vrouter-provisioner
Provisions the node for the vRouter agent execution.
The following diagram illustrates the Tungsten Fabric kernel vRouter set up by
the TF operator:
On the diagram above, the following types of network interfaces are used:
eth0 - for the management (PXE) network (eth1 and eth2 are the
slave interfaces of Bond0)
cassandra
On the Tungsten Fabric control plane nodes, maintains the
configuration data of the Tungsten Fabric cluster.
On the Tungsten Fabric analytics nodes, stores the collector
service data.
cassandra-operator
The Kubernetes operator that enables the Cassandra clusters creation
and management.
kafka
Handles the messaging bus and generates alarms across the Tungsten
Fabric analytics containers.
kafka-operator
The Kubernetes operator that enables Kafka clusters creation and
management.
redis
Stores the physical router UVE storage and serves as a messaging bus
for event notifications.
redis-operator
The Kubernetes operator that enables Redis clusters creation and
management.
zookeeper
Holds the active-backup status for the device-manager,
svc-monitor, and the schema-transformer services. This service
is also used for mapping of the Tungsten Fabric resource names to
UUIDs.
zookeeper-operator
The Kubernetes operator that enables ZooKeeper clusters creation and
management.
rabbitmq
Exchanges messages between API servers and original request senders.
rabbitmq-operator
The Kubernetes operator that enables RabbitMQ clusters creation and
management.
Along with the Tungsten Fabric services, MOSK deploys and
updates special image precaching DaemonSets when the kind TFOperator
resource is created or image references in it get updated.
These DaemonSets precache container images on Kubernetes nodes minimizing
possible downtime when updating container images. The cloud operator can
disable image precaching through the TFOperator resource.
The following diagram illustrates all types of UI
and API traffic in a Mirantis OpenStack for Kubernetes
cluster, including the monitoring and OpenStack API traffic. The OpenStack
Dashboard pod hosts Horizon and acts as a proxy for all other types of
traffic. TLS termination is also performed for this type of traffic.
SDN or Tungsten Fabric traffic goes through the overlay Data network and
processes east-west and north-south traffic for applications that run in a
MOSK cluster. This network segment typically contains
tenant networks as separate MPLS-over-GRE and MPLS-over-UDP tunnels.
The traffic load depends on the workload.
The control traffic between the Tungsten Fabric controllers, edge routers, and
vRouters uses the XMPP with TLS and iBGP protocols. Both protocols produce low
traffic that does not affect MPLS over GRE and MPLS over UDP traffic.
However, this traffic is critical and must be reliably delivered. Mirantis
recommends configuring higher QoS for this type of traffic.
The following diagram displays both MPLS over GRE/MPLS over UDP and iBGP and
XMPP traffic examples in a MOSK cluster:
Mirantis OpenStack for Kubernetes (MOSK) provides the Tungsten Fabric
lifecycle management including pre-deployment custom configurations, updates,
data backup and restoration, as well as handling partial failure scenarios,
by means of the Tungsten Fabric operator.
This section is intended for the cloud operators who want to gain insight into
the capabilities provided by the Tungsten Fabric operator along with the
understanding of how its architecture allows for easy management while
addressing the concerns of users of Tungsten Fabric-based
MOSK clusters.
The Tungsten Fabric Operator (TFO) is based on the Kubernetes operator
SDK project. The Kubernetes operator SDK is a framework that uses the
controller-runtime library to make writing operators easier by providing
the following:
High-level APIs and abstractions to write the operational logic more
intuitively.
Tools for scaffolding and code generation to bootstrap a new project fast.
Extensions to cover common operator use cases.
The TFO deploys the following sub-operators. Each sub-operator handles a
separate part of a TF deployment:
Since MOSK 24.3, Provisioner is a separate
component for the vRouter, deployed as the tf-vrouter-provisioner
DaemonSet. The NodeManager service is no longer deployed in TF
setups.
Besides the sub-operators that deploy TF services, TFO uses operators to deploy
and maintain third-party services, such as different types of storage, cache,
message system, and so on. The following table describes all third-party
operators:
The resource of kind TFOperator is a custom resource defined by a resource
of kind CustomResourceDefinition.
The CustomResourceDefinition resource in Kubernetes uses the OpenAPI
Specification version 2 to specify the schema of the defined resource.
The Kubernetes API outright rejects the resources that do not pass this schema
validation. Along with schema validation, TFOperator uses
ValidatingAdmissionWebhook for extended validations when a custom resource
is created or updated.
Important
Since 24.1, MOSK introduces the technical
preview support for the API v2 for the Tungsten Fabric Operator. This
version of the Tungsten Fabric Operator API aligns with the OpenStack
Controller API and provides a better interface for advanced configurations.
Refer to Key differences between TFOperator API v1alpha1 and v2 for details.
Tungsten Fabric Operator uses ValidatingAdmissionWebhook to validate
environment variables set to Tungsten Fabric components upon the TFOperator
object creation or update. The following validations are performed:
Environment variables passed to the Tungsten Fabric components containers
Mapping between tfVersion and tfImageTag, if defined
Schedule for dbBackup
Data capacity format
Feature variable values
Availability of the dataStorageClass class
If required, you can disable ValidatingAdmissionWebhook through the
TFOperator HelmBundle resource:
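A possible sketch of such an override through the TFOperator HelmBundle values; the release name and the admission key are assumptions and may differ in your product version:
spec:
  releases:
    - name: tungstenfabric-operator
      values:
        admission:
          enabled: false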
Environment variables for Tungsten Fabric components¶
API v2 Available since MOSK 23.1
Warning
The features section of the TFOperator specification
allows for easy configuration of all Tungsten Fabric features. Mirantis
recommends using the features section instead of updating the environment
variables through envSettings directly.
Allowed environment variables for Tungsten Fabric components¶
Key differences between TFOperator API v1alpha1 and v2¶
This section outlines the main differences between the v1alpha1 and v2 versions
of the TFOperator API:
Introduction of the features section:
All non-default Tungsten Fabric and Tungsten Fabric Operator features
can now be set in the features section.
Setting environment variables is no longer necessary but can still be
done using the envSetting field in each Tungsten Fabric service
section.
Relocation of CustomSpec from the vRouter agent specification
to the nodes section.
Reorganization of the controllers section:
The controllers section has been integrated into the services
section.
The services section is now divided into groups: analytics,
config, control, vRouter, and webUI.
Configuration of third-party services can be performed through the
analytics or config sections.
Configuration of the logging levels can be performed using the logging
field, which is a separate field in each Tungsten Fabric services
configuration.
Movement of the dataStorageClass and tfVersion fields to the upper
level of the specification.
Introduction of the devOptions section enabling the setup of experimental
development-related options.
Mirantis OpenStack for Kubernetes (MOSK) allows you to easily adapt
your Tungsten Fabric deployment to the needs of your environment through the
TFOperator custom resource.
This section includes custom configuration details available to you.
Important
Since 24.1, MOSK introduces the technical
preview support for the API v2 for the Tungsten Fabric Operator. This
version of the Tungsten Fabric Operator API aligns with the OpenStack
Controller API and provides a better interface for advanced configurations.
In MOSK 24.1, the API v2 is available only for the
new product deployments with Tungsten Fabric.
Since 24.2, the API v2 becomes default for new product deployments and
includes the ability to convert existing v1alpha1 TFOperator to v2
during update.
During the update to the 24.3 series, the old Tungsten Fabric cluster
configuration API v1alpha1 is automatically converted and replaced
with the v2 version. Therefore, since MOSK 24.3,
start using the v2 TFOperator custom resource for any updates.
The v1alpha1 TFOperator custom resource remains in the cluster
but is no longer reconciled and will be automatically removed in
MOSK 25.1.
By default, Tungsten Fabric Operator sets up the following resource limits for
Cassandra analytics and configuration StatefulSets:
Limits:
  cpu: 8
  memory: 32Gi
Requests:
  cpu: 1
  memory: 16Gi
This is a verified configuration suitable for most cases. However, if nodes
are under a heavy load, the KubeContainerCPUThrottlingHigh StackLight alert
may raise for Tungsten Fabric Pods of the tf-cassandra-analytics and
tf-cassandra-config StatefulSets. If such alerts appear constantly, you can
increase the limits through the TFOperator custom resource. For example:
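A minimal sketch for the API v2, assuming the resources are set under the Cassandra analytics service; the exact nesting is an assumption and differs between API versions:
spec:
  services:
    analytics:
      cassandra:
        resources:
          limits:
            cpu: "12"
            memory: 48Gi
          requests:
            cpu: "1"
            memory: 16Gi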
To specify custom configurations for Cassandra clusters, use the
configOptions settings in the TFOperator custom resource.
For example, you may need to increase the file cache size in case
of a heavy load on the nodes labeled with tfanalyticsdb=enabled
or tfconfigdb=enabled:
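A minimal sketch, assuming configOptions accepts native Cassandra settings such as file_cache_size_in_mb; the nesting under the analytics service is an assumption:
spec:
  services:
    analytics:
      cassandra:
        configOptions:
          file_cache_size_in_mb: 1024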
Depending on the Tungsten Fabric Operator API version in use, proceed with
one of the following options:
API v2
Available since MOSK 24.1
To specify custom settings for the Tungsten Fabric vRouter nodes, for
example, to change the name of the tunnel network interface or enable
debug level logging on some subset of nodes, use the nodes settings in
the TFOperator custom resource.
For example, to enable debug level logging on a specific node or multiple
nodes:
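A minimal sketch of the nodes section; the node group name, label selector, and field names below the nodes key are assumptions intended only to illustrate the approach:
spec:
  nodes:
    vrouter-debug:                      # arbitrary node group name
      labels:
        - key: <NODE-LABEL-NAME>
          value: <NODE-LABEL-VALUE>
      vRouter:
        agent:
          envSettings:
            - name: LOG_LEVEL
              value: SYS_DEBUG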
To specify custom settings for the Tungsten Fabric vRouter nodes, for
example, to change the name of the tunnel network interface or enable
debug level logging on some subset of nodes, use the customSpecs
settings in the TFOperator custom resource.
For example, to enable debug level logging on a specific node or multiple
nodes:
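A minimal sketch of the customSpecs settings; the label selector and container name are illustrative, while the logging override to SYS_DEBUG matches the explanation that follows:
spec:
  controllers:
    tf-vrouter:
      agent:
        customSpecs:
          - name: vrouter-debug            # must be a valid DNS subdomain name
            label:
              name: <NODE-LABEL-NAME>
              value: <NODE-LABEL-VALUE>
            containers:
              - name: agent
                env:
                  - name: LOG_LEVEL
                    value: SYS_DEBUG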
The customSpecs:name value must follow the RFC 1123
international format. Verify that the name of a DaemonSet object
is a valid DNS subdomain name.
The customSpecs parameter inherits all settings for the tf-vrouter
containers that are set on the spec:controllers:agent level and
overrides or adds additional parameters. The example configuration above
overrides the logging level from SYS_INFO, which is the default logging
level, to SYS_DEBUG.
For clusters with a multi-rack architecture, you may need to redefine the
gateway IP for the Tungsten Fabric vRouter nodes using the
VROUTER_GATEWAY parameter. For details, see Multi-rack architecture.
By default, the TF control service uses the management interface for
the BGP and XMPP traffic. You can change the control service interface
using the controlInterface parameter in the TFOperator custom resource,
for example, to combine the BGP and XMPP traffic with the data (tenant)
traffic:
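A minimal sketch, assuming the parameter is set in the control features section of the TFOperator custom resource; the nesting and the interface name are assumptions:
spec:
  features:
    control:
      controlInterface: <TENANT-INTERFACE-NAME>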
Tungsten Fabric implements cloud tenants’ virtual networks as Layer 3 overlays.
Tenant traffic gets encapsulated into one of the supported protocols and is
carried over the infrastructure network between 2 compute nodes or a compute
node and an edge router device.
In addition, Tungsten Fabric is capable of exchanging encapsulated traffic with
external systems in order to build advanced virtual networking topologies,
for example, BGP VPN connectivity between 2 MOSK clouds or a
MOSK cloud and a cloud tenant premises.
MOSK supports the following encapsulation protocols:
MPLS over Generic Routing Encapsulation (GRE)
A traditional encapsulation method supported by several router vendors,
including Cisco and Juniper. The feature is applicable when other
encapsulation methods are not available. For example, an SDN gateway
runs software that does not support MPLS over UDP.
MPLS over User Datagram Protocol (UDP)
A variation of the MPLS over GRE mechanism. It is the default and the most
frequently used option in MOSK. MPLS over UDP replaces
headers in UDP packets. In this case, a UDP port stores a hash of
the packet payload (entropy). It provides a significant benefit for
equal-cost multi-path (ECMP) routing load balancing. MPLS over UDP and MPLS
over GRE transfer Layer 3 traffic only.
Virtual Extensible LAN (VXLAN) TechPreview
The combination of VXLAN and EVPN technologies is often used for creating
advanced cloud networking topologies. For example, it can provide
transparent Layer 2 interconnections between Virtual Network Functions
running on top of the cloud and physical traffic generator appliances hosted
somewhere else.
The ENCAP_PRIORITY parameter defines the priority in which the
encapsulation protocols are attempted to be used when setting the BGP VPN
connectivity between the cloud and external systems.
By default, the encapsulation order is set to MPLSoUDP,MPLSoGRE,VXLAN.
The cloud operator can change it depending on their needs in the TFOperator
custom resource, as illustrated in Configuring encapsulation.
The list of supported encapsulated methods along with their order is shared
between BGP peers as part of the capabilities information exchange when
establishing a BGP session. Both parties must support the same encapsulation
methods to build a tunnel for the network traffic.
For example, if the cloud operator wants to set up a Layer 2 VPN between the
cloud and their network infrastructure, they configure the cloud’s virtual
networks with VXLAN identifiers (VNIs) and do the same on the other side,
for example, on a network switch. Also, VXLAN must be set in the first position
in encapsulation priority order. Otherwise, VXLAN tunnels will not get
established between endpoints, even though both endpoints may support the VXLAN
protocol.
However, setting VXLAN first in the encapsulation priority order will not
enforce VXLAN encapsulation between compute nodes or between compute nodes and
gateway routers that use Layer 3 VPNs for communication.
The TFOperator custom resource allows you to define encapsulation settings
for your Tungsten Fabric cluster.
Important
The TFOperator custom resource must be the only place to
configure the cluster encapsulation. Performing these configurations through
the Tungsten Fabric web UI, CLI, or API does not provide the configuration
persistency, and the settings defined this way may get reset to defaults
during the cluster services restart or update.
Note
Defining the default values for encapsulation parameters in
the TFOperator custom resource is unnecessary.
Depending on the Tungsten Fabric operator API version in use, proceed with
one of the following options:
In the routing fabric of a data center, a MOSK cluster
with Tungsten Fabric enabled can be represented either by a separate
Autonomous System (AS)
or as part of a bigger autonomous system. In either case, Tungsten Fabric
needs to participate in the BGP peering, exchanging routes with external
devices and within the cloud.
The Tungsten Fabric Controller acts as an internal (iBGP) route reflector for
the cloud AS by populating /32 routes pointing to VMs across all compute
nodes as well as the cloud’s edge gateway devices in case they belong to the
same AS. Apart from being an iBGP route reflector for the cloud AS, the
Tungsten Fabric Controller can act as a BGP peer for autonomous systems
external to the cloud, for example, for the AS configured across the
leaf-spine fabric of the data center.
The Autonomous System Number (ASN) setting contains the unique identifier
of the autonomous system that the MOSK cluster with
Tungsten Fabric belongs to. The ASN number does not affect the internal
iBGP communication between vRouters running on the compute nodes. Such
communication will work regardless of the ASN number settings. However,
any network appliance that is not managed by the Tungsten Fabric control plane
will have BGP configured manually. Therefore, the ASN settings should be
configured accordingly on both sides. Otherwise, it would result in the
inability to establish BGP sessions, regardless of whether the external device
peers with Tungsten Fabric over iBGP or eBGP.
The TFOperator custom resource enables you to define ASN settings for
your Tungsten Fabric cluster.
Important
The TFOperator CR must be the only place to configure
the cluster ASN. Performing these configurations through the Tungsten
Fabric web UI, CLI, or API does not provide the configuration persistency,
and the settings defined this way may get reset to defaults during
the cluster services restart or update.
Note
Defining the default values for ASN parameters in the Tungsten Fabric
Operator custom resource is unnecessary.
Depending on the Tungsten Fabric Operator API version in use, proceed with
one of the following options:
By default, the Tungsten Fabric tf-control-dns-external service is created
to expose the Tungsten Fabric control DNS service. You can disable creation of this
service through the enableDNSExternal parameter in the TFOperator
custom resource. For example:
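A minimal sketch, assuming the parameter lives in the control features section of the TFOperator custom resource; the nesting is an assumption:
spec:
  features:
    control:
      enableDNSExternal: false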
If an edge router is accessible from the data plane through a gateway, define
the vRouter gateway in the TFOperator custom resource. Otherwise,
the default system gateway is used.
Depending on the Tungsten Fabric Operator API version in use, proceed with
one of the following configurations:
API v2
Available since MOSK 24.1
Define the vRouterGateway parameter in the features section of
the TFOperator custom resource:
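A minimal sketch, assuming the gateway is defined for the vRouter feature set; the IP address is illustrative:
spec:
  features:
    vRouter:
      vRouterGateway: 10.32.0.1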
By default, MOSK deploys image precaching DaemonSets
to minimize possible downtime when updating container images. You can disable
creation of these DaemonSets by setting the imagePreCaching parameter in
the TFOperator custom resource to false:
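A minimal sketch, assuming imagePreCaching is a top-level parameter in the features section:
spec:
  features:
    imagePreCaching: false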
Available since MOSK 23.2
for Tungsten Fabric 21.4 only. TechPreview
Graceful restart and long-lived graceful restart are vital mechanisms
within BGP (Border Gateway Protocol) routing, designed to optimize
the routing tables convergence in scenarios where a BGP router restarts or
a networking failure is experienced, leading to interruptions of router
peering.
During a graceful restart, a router can signal its BGP peers about its
impending restart, requesting them to retain the routes it had previously
advertised as active. This allows for seamless network operation and minimal
disruption to data forwarding during the router downtime.
The long-lived aspect of the long-lived graceful restart extends
the graceful restart effectiveness beyond the usual restart duration.
This extension provides an additional layer of resilience and stability
to BGP routing updates, bolstering the network's ability to manage
unforeseen disruptions.
Caution
Mirantis does not generally recommend using the graceful restart
and long-lived graceful restart features with the Tungsten Fabric XMPP
helper, unless the configuration is done by proficient operators with
at-scale expertise in networking domain and exclusively to address specific
corner cases.
Configuring graceful restart and long-lived graceful restart¶
Tungsten Fabric Operator allows for easy enablement and configuration
of the graceful restart and long-lived graceful restart features through
the TFOperator custom resource:
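A minimal sketch that combines the parameters from the table below; the placement under the control features section is an assumption:
spec:
  features:
    control:
      gracefulRestart:
        enabled: true
        bgpHelperEnabled: false
        xmppHelperEnabled: false
        restartTime: 300
        llgrRestartTime: 300
        endOfRibTimeout: 300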
Graceful restart and long-lived graceful restart settings¶
Parameter
Default value
Description
enabled
false
Enables or disables graceful restart and long-lived graceful
restart features.
bgpHelperEnabled
false
Enables the Tungsten Fabric control services to act as a graceful
restart helper to the edge router or any other BGP peer by retaining
the routes learned from this peer and advertising them to the rest of
the network as applicable.
Note
BGP peer should support and be configured with graceful
restart for all of the address families used.
xmppHelperEnabled
false
Enables the datapath agent to retain the last route path from the
Tungsten Fabric Controller when an XMPP-based connection is lost.
restartTime
300
Configures a non-zero restart time in seconds to advertise the graceful
restart capability to peers.
llgrRestartTime
300
Specifies the amount of time in seconds the vRouter datapath should keep
advertised routes from the Tungsten Fabric control services, when
an XMPP connection between the control and vRouter agent services is lost.
Note
When graceful restart and long-lived graceful restart
are both configured, the duration of the long-lived graceful
restart timer is the sum of both timers.
endOfRibTimeout
300
Specifies the amount of time in seconds a control node waits to remove
stale routes from a vRouter agent Routing Information Base (RIB).
Configuring the protocol for connecting to Cassandra clusters¶
To streamline and improve the efficiency of communication between clients and
the database, Cassandra is transitioning away from the Thrift protocol in
favor of the Cassandra Query Language (CQL) protocol starting with
MOSK 24.1. Since MOSK 24.2, Cassandra
uses the CQL protocol by default.
CQL provides a more user-friendly and SQL-like interface for interacting with
the database. With the move towards CQL, the Thrift-based client drivers are
no longer actively supported, which encourages users to migrate to CQL-based
client drivers to take advantage of new features and improvements in Cassandra.
If your cluster is running MOSK 24.1.x, you can enable the
CQL protocol proceeding with one of the options below depending on the Tungsten
Fabric Operator API version in use.
During update to MOSK 24.2, switching from Thrift to CQL
is performed automatically. While it is possible to switch back to Thrift,
Mirantis does not recommend it. If you choose to do so, specify thrift
instead of cql in the configuration examples below.
API v2 Available since MOSK 24.1
Define the cassandraDriver parameter in the devOptions section of
the TFOperator custom resource:
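A minimal sketch based on the description above:
spec:
  devOptions:
    cassandraDriver: cql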
MOSK provides the capability to enable SR-IOV Spoof Check
control with the Neutron Tungsten Fabric backend.
The capability can be useful for certain network configurations. For example,
you might need to allow traffic from a virtual function interface even when
its MAC address does not match the MAC address inside the virtual machine.
In this scenario, known as MAC spoofing, disabling spoof check enables
the traffic to pass through regardless of the MAC address mismatch.
Caution
Certain NICs and drivers may not handle the spoofchk setting.
For example, the Intel 82599ES NIC paired with the ixgbe driver disregards
the spoofchk setting when VLAN tagging is enabled. Therefore, ensure
compatibility with your hardware configuration regarding spoofchk
handling before proceeding.
To enable SR-IOV Spoof Check control for Tungsten Fabric, enable SR-IOV
interfaces handling by Nova os-vif plugin in the OpenStackDeployment
custom resource:
The Tungsten Fabric Operator provides a capability to configure
the netns_availability_zone parameter of the Tungsten Fabric
svc-monitor service through the netnsAZ parameter. This configuration
enables MOSK users to specify an availability zone for
Tungsten Fabric instances, such as HAProxy (load balancer instances) or SNAT
routers.
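A minimal sketch, assuming the parameter is set in the config features section of the TFOperator custom resource; the nesting and the availability zone name are assumptions:
spec:
  features:
    config:
      netnsAZ: nova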
Tungsten Fabric (TF) uses Cassandra and ZooKeeper to store its data.
Cassandra is a fault-tolerant and horizontally scalable database that provides
persistent storage of configuration and analytics data. ZooKeeper is used by
TF for allocation of unique object identifiers and transactions implementation.
To prevent data loss, Mirantis recommends that you simultaneously back up
the ZooKeeper database dedicated to configuration services and the Cassandra
database.
The database backup must be consistent across all systems
because the state of the Tungsten Fabric databases is associated with
other system databases, such as OpenStack databases.
MOSK enables you to perform the automatic TF
data backup in the JSON format using the tf-dbbackup-job cron job.
By default, it is disabled. To back up the TF databases, enable
tf-dbBackup in the TF Operator custom resource:
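A minimal sketch, assuming the dbBackup section is enabled under the features section of the TFOperator custom resource:
spec:
  features:
    dbBackup:
      enabled: true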
By default, the tf-dbbackup-job job is scheduled for weekly execution,
allocating PVC of 5 Gi size for storing backups and keeping 5 previous
backups. To configure the backup parameters according to the needs of your
cluster, use the following structure:
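A possible sketch; the parameter names for the schedule, storage size, and retention are assumptions based on the defaults listed above:
spec:
  features:
    dbBackup:
      enabled: true
      schedule: "0 0 * * 0"       # weekly, Sunday at midnight
      storageSize: 10Gi           # assumed parameter name
      backupsToKeep: 10           # assumed parameter name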
This section explains the specifics of the Tungsten Fabric services provided by
Mirantis OpenStack for Kubernetes (MOSK). The list of the services
and their supported features included in this section is not exhaustive and is
being constantly amended based on the complexity of the architecture and the
use of a particular service.
MOSK ensures the integration between Octavia and Tungsten Fabric
through the OpenStack Octavia driver that uses Tungsten Fabric HAProxy as a backend.
The Tungsten Fabric-based MOSK deployment supports
creation, update, and deletion operations with the following standard
load balancing API entities:
Load balancers
Note
For a load balancer creation operation, the driver supports
only the vip-subnet-id argument; the vip-network-id argument is
not supported.
Listeners
Pools
Health monitors
The Tungsten Fabric-based MOSK deployment does not support
the following load balancing capabilities:
L7 load balancing capabilities, such as L7 policies, L7 rules, and others
Setting specific availability zones for load balancers and their resources
Use of the UDP protocol
Operations with Octavia quotas
Operations with Octavia flavors
Warning
The Tungsten Fabric-based MOSK deployment
enables you to manage the load balancer resources by means of the OpenStack
CLI or OpenStack Horizon. Do not perform any manipulations with the load
balancer resources through the Tungsten Fabric web UI because in this case
the changes will not be reflected on the OpenStack API side.
Octavia Amphora (Amphora v2) load balancing provides a scalable and flexible
solution for load balancing in cloud environments. MOSK
deploys Amphora load balancer on each node of the OpenStack environment
ensuring that load balancing services are easily accessible, highly scalable,
and highly reliable.
Compared to the Octavia Tungsten Fabric driver for LBaaS v2 solution, Amphora
offers several advanced features including:
Full compatibility with the Octavia API, which provides a standardized
interface for load balancing in MOSK OpenStack
environments. This makes it easier to manage and integrate with other
OpenStack services.
Layer 7 policies and rules, which allow for more granular control over
traffic routing and load balancing decisions. This enables users to
optimize their application performance and improve the user experience.
Support for the UDP protocol, which is commonly used for real-time
communications and other high-performance applications. This enables
users to deploy a wider range of applications with the same load
balancing infrastructure.
By default, MOSK uses the Octavia Tungsten Fabric load
balancing. Once Octavia Amphora load balancing is enabled, the existing Octavia
Tungsten Fabric driver load balancers will continue to function normally.
However, you cannot migrate your load balancer workloads from the old LBaaS
v2 solution to Amphora.
Note
As long as MOSK provides Octavia Amphora load
balancing as a technology preview feature, Mirantis
cannot guarantee the stability of this solution and does not provide
a migration path from Tungsten Fabric load balancing (HAProxy), which
is used by default.
To enable Octavia Amphora load balancing:
Assign openstack-gateway:enabled labels to the compute nodes in either
order.
Caution
Assigning the openstack-gateway:enabled labels on compute
nodes is crucial for the effective operation of Octavia Amphora load
balancing within an OpenStack environment. Double-check the labels
assignment to guarantee proper configuration.
To make Amphora the default provider, specify it in the
OpenStackDeployment custom resource:
spec:
  features:
    octavia:
      default_provider: amphorav2
Verify that the OpenStack Controller (Rockoon) has scheduled new Octavia
pods that include health manager, worker, and housekeeping pods.
kubectl get pods -n openstack -l 'application=octavia,component in (worker, health_manager, housekeeping)'
Example of output for an environment with two compute nodes:
The workflow for creating new load balancers with Amphora is identical
to the workflow for creating load balancers with Octavia Tungsten Fabric
driver for LBaaS v2. You can do it either through the OpenStack Horizon
UI or OpenStack CLI.
If you have not defined amphorav2 as default provider in the
OpenStackDeployment custom resource, you can specify it explicitly
when creating a load balancer using the provider argument:
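For example, using the OpenStack CLI; the load balancer name and the subnet are illustrative:
openstack loadbalancer create --name lb-amphora-demo \
  --vip-subnet-id <SUBNET-ID> --provider amphorav2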
This section contains a summary of the Tungsten Fabric upstream features
and use cases not supported in MOSK,
features and use cases offered as Technology Preview
in the current product release if any, and known limitations of Tungsten
Fabric in integration with other product components.
Feature or use case
Status
Description
Tungsten Fabric web UI
Provided as is
MOSK provides the TF web UI as is and does not
include this service in the support Service Level Agreement
Automatic generation of network port records in DNSaaS
(Designate)
Not supported
As a workaround, you can use the Tungsten Fabric
built-in DNS service that enables virtual machines to resolve
each other's names
Secret management (Barbican)
Not supported
It is not possible to use the certificates stored in Barbican
to terminate HTTPs on a load balancer in a Tungsten Fabric deployment
Role Based Access Control (RBAC) for Neutron objects
Not supported
Advanced Tungsten Fabric features
Provided as is
MOSK provides the following advanced Tungsten Fabric
features as is and does not include them in the support Service
Level Agreement:
Service Function Chaining
Production ready multi-site SDN
Layer 3 multihoming
Long-Lived Graceful Restart (LLGR)
Technical Preview
DPDK
Tungsten Fabric and OpenStack Octavia Amphora integration
Technical Preview
Due to Tungsten Fabric Simple Virtual Gateway restriction, each virtual
network can have only one VGW interface. As a result,
MOSK should be limited to a single compute node
with the openstack-gateway=enabled label. This limitation prevents
OpenStack Octavia Amphora from functioning in a multi-rack deployment.
The integration between the OpenStack and TF controllers is
implemented through the shared Kubernetes openstack-tf-shared namespace.
Both controllers have access to this namespace to read and write the Kubernetes
kind:Secret objects.
The OpenStack Controller (Rockoon) posts the data into the
openstack-tf-shared namespace required by the TF services. The TF
controller watches this namespace. Once an appropriate secret is created,
the TF controller obtains it into the internal data structures for further
processing.
The OpenStack Controller includes the following data for the TF Controller:
tunnel_interface
Name of the network interface for the TF data plane. This interface
is used by TF for the encapsulated traffic for overlay networks.
Keystone authorization information
Keystone Administrator credentials and an up-and-running IAM service
are required for the TF Controller to initiate the deployment process.
Nova metadata information
Required for the TF vRouter agent service.
Also, the OpenStack Controller watches the openstack-tf-shared namespace
for the vrouter_port parameter that defines the vRouter port number and
passes it to the nova-compute pod.
The list of the OpenStack services that are integrated with TF through their
API includes:
neutron-server - integration is provided by the
contrail-neutron-plugin component that is used by the neutron-server
service for transformation of the API calls into TF API-compatible
requests.
nova-compute - integration is provided by the
contrail-nova-vif-driver and contrail-vrouter-api packages used
by the nova-compute service for interaction with the TF vRouter when
managing network ports.
octavia-api - integration is provided by the Octavia TF Driver that
enables you to use OpenStack CLI and Horizon for operations with load
balancers. See Tungsten Fabric load balancing (HAProxy) for details.
Warning
TF is not integrated with the following OpenStack services:
Tungsten Fabric allows running IPv6-enabled OpenStack tenant networks on top
of the IPv4 underlay. You can create an IPv6 virtual network through the
Tungsten Fabric web UI or OpenStack CLI in the same way as an IPv4 virtual
network. The IPv6 functionality is enabled out of the box and does not require
major changes in the cloud configuration. This section lists the IPv6
capabilities supported by MOSK, as well as those available
and unavailable in the upstream OpenContrail software.
The following IPv6 features are
supported and verified in MOSK:
Virtual machines with IPv6 and IPv4 interfaces
Virtual machines with IPv6-only interfaces
DHCPv6 and neighbor discovery
Policy and security groups
IPv6 flow set up, tear down, and aging
Flow set up and tear down based on a TCP state machine
Fat flow
Allowed address pair configuration with IPv6 addresses
Equal Cost Multi-Path (ECMP)
Additionally, the following IPv6 features are
available in upstream OpenContrail according to its official
documentation:
Protocol-based flow aging
IPv6 service chaining
Connectivity with gateway (MX Series device)
Virtual Domain Name Services (vDNS), name-to-IPv6 address resolution
The following IPv6 features are not available in upstream
OpenContrail:
Depending on the size of an OpenStack environment and the components
that you use, you may want to have a single or multiple network interfaces,
as well as run different types of traffic on a single or multiple VLANs.
This section provides the recommendations for planning the network
configuration and optimizing the cloud performance.
Mirantis OpenStack for Kubernetes (MOSK) cluster networking is
complex and defined by the security requirements and performance
considerations. It is based on the Kubernetes cluster networking provided by
Mirantis Container Cloud and expanded to facilitate the demands of the
OpenStack virtualization platform.
A Container Cloud Kubernetes cluster provides a platform for
MOSK and is considered a part of its control plane. All
networks that serve Kubernetes and related traffic are considered control
plane networks. The Kubernetes cluster networking is typically focused on
connecting pods of different nodes as well as exposing the Kubernetes API and
services running in pods into an external network.
The OpenStack networking connects virtual machines to each other and the
outside world. Most of the OpenStack-related networks are considered a part of
the data plane in an OpenStack cluster. Ceph networks are considered data
plane networks for the purpose of this reference architecture.
When planning your OpenStack environment, consider the types of traffic that
your workloads generate and design your network accordingly. If you
anticipate that certain types of traffic, such as storage replication,
will likely consume a significant amount of network bandwidth, you may
want to move that traffic to a dedicated network interface to avoid
performance degradation.
The following diagram provides a simplified overview of the underlay
networking in a MOSK environment:
This page summarizes the recommended networking architecture of a Mirantis
Container Cloud management cluster for a Mirantis OpenStack for Kubernetes
(MOSK) cluster.
We recommend deploying the management cluster with a dedicated interface
for the provisioning (PXE) network. The separation of the provisioning network
from the management network ensures additional security and resilience of
the solution.
MOSK end users typically should have access to the Keycloak
service in the management cluster for authentication to the Horizon web UI.
Therefore, we recommend that you connect the management network of the
management cluster to an external network through an IP router. The default
route on the management cluster nodes must be configured with the default
gateway in the management network.
If you deploy the multi-rack configuration, ensure that the provisioning
network of the management cluster is connected to an IP router that connects
it to the provisioning networks of all racks.
Provisioning (PXE) network
Facilitates the iPXE boot of all bare metal machines in a
MOSK cluster and provisioning of the operating system
to machines.
This network is only used during provisioning of the host. It must not
be configured on an operational MOSK node.
Life-cycle management (LCM) network
Connects LCM Agents running on the hosts to the Container Cloud LCM API.
The LCM API is provided by the management cluster.
The LCM network is also used for communication between kubelet
and the Kubernetes API server inside a Kubernetes cluster. The MKE
components use this network for communication inside a swarm cluster.
The LCM subnet(s) provides IP addresses that are statically allocated
by the IPAM service to bare metal hosts. This network must be connected
to the Kubernetes API endpoint of the management cluster through an
IP router. LCM Agents running on MOSK clusters will
connect to the management cluster API through this router. LCM subnets
may be different per MOSK cluster as long as this
connection requirement is satisfied.
You can use more than one LCM network segment in a MOSK
cluster. In this case, separated L2 segments and interconnected L3
subnets are still used to serve LCM and API traffic.
All IP subnets in the LCM networks must be connected to each other
by IP routes. These routes must be configured on the hosts through
L2 templates.
All IP subnets in the LCM network must be connected to the Kubernetes
API endpoints of the management cluster through an IP router.
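Such routes are typically rendered into the host network configuration from
L2 templates. As an illustration only, a netplan-style fragment could look
as follows; the interface name, VLAN ID, and all addresses are assumptions:

  vlans:
    lcm-vlan:                 # example VLAN interface carrying LCM traffic
      id: 403                 # example VLAN ID
      link: bond0
      addresses:
      - 10.100.2.15/24        # LCM address of this host (example)
      routes:
      - to: 10.100.0.0/20     # aggregated route to all LCM subnets (example)
        via: 10.100.2.1       # LCM gateway on the rack ToR switch (example)
      - to: 10.10.0.0/24      # management subnet of the management cluster (example)
        via: 10.100.2.1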
You can manually select the load balancer IP address for external
access to the cluster API and specify it in the Cluster object
configuration. Alternatively, you can allocate a dedicated IP range
for a virtual IP of the cluster API load balancer by adding a Subnet
object with a special annotation. Mirantis recommends that this subnet
stays unique per MOSK cluster.
For details, see Create subnets.
Note
When using the ARP announcement of the IP address for the
cluster API load balancer, the following limitations apply:
Only one of the LCM networks can contain the API endpoint.
This network is called API/LCM throughout this documentation.
It consists of a VLAN segment stretched between all Kubernetes
master nodes in the cluster and the IP subnet that provides
IP addresses allocated to these nodes.
The load balancer IP address must be allocated from the same
subnet CIDR address that the LCM subnet uses.
When using the BGP announcement of the IP address for the cluster API
load balancer, which is available as Technology Preview since
MOSK 23.2.2, no segment stretching is required
between Kubernetes master nodes. Also, in this scenario, the load
balancer IP address is not required to match the LCM subnet CIDR address.
Kubernetes workloads network
Serves as an underlay network for traffic between pods in
the MOSK cluster. Do not share this network between
clusters.
There might be more than one Kubernetes workloads network segment in the
cluster. In this case, the segments must be connected through an IP router.
The Kubernetes workloads network does not need external access.
The Kubernetes workloads subnet(s) provides IP addresses that
are statically allocated by the IPAM service to all nodes and that
are used by Calico for cross-node communication inside a cluster.
By default, VXLAN overlay is used for Calico cross-node communication.
Kubernetes external network
Serves for access to the OpenStack endpoints in a MOSK
cluster.
When using the ARP announcement of the external endpoints of
load-balanced services, the network must contain a VLAN segment
extended to all MOSK nodes connected to this network.
When using the BGP announcement of the external endpoints of
load-balanced services, which is available as Technology Preview since
MOSK 23.2.2, there is no requirement of having
a single VLAN segment extended to all MOSK nodes
connected to this network.
A typical MOSK cluster only has one external network.
The external network must include at least two IP address ranges
defined by separate Subnet objects in Container Cloud API:
MOSK services address range
Provides IP addresses for externally available
load-balanced services, including OpenStack API endpoints.
External address range
Provides IP addresses to be assigned to network interfaces
on all cluster nodes that are connected to this network.
MetalLB speakers must run on the nodes that are connected to this network.
For details, see Configure the MetalLB speaker node selector.
This is required for external traffic to return to the originating
client. The default route on the MOSK nodes that
are connected to the external network must be configured with the
default gateway in the external network.
Storage access network
Serves for the storage access traffic from and to Ceph OSD services.
A MOSK cluster may have more than one VLAN segment
and IP subnet in the storage access network. All IP subnets of this
network in a single cluster must be connected by an IP router.
The storage access network does not require external access unless
you want to directly expose Ceph to the clients outside of a
MOSK cluster.
Note
A direct access to Ceph by the clients outside of a
MOSK cluster is technically possible but not
supported by Mirantis. Use at your own risk.
The IP addresses from subnets in this network are statically allocated
by the IPAM service to Ceph nodes. The Ceph OSD services bind to these
addresses on their respective nodes.
Storage replication network
Serves for the storage replication traffic between Ceph OSD services.
A MOSK cluster may have more than one VLAN segment
and IP subnet in this network as long as the subnets are connected
by an IP router.
This network does not require external access.
The IP addresses from subnets in this network are statically allocated
by the IPAM service to Ceph nodes.
The Ceph OSD services bind to these addresses on their respective nodes.
The following diagram illustrates the networking schema of the Container Cloud
deployment on bare metal with a MOSK cluster using ARP
announcements:
Since 23.2.2, MOSK supports full L3 networking
topology in the Technology Preview scope.
The following diagram illustrates the networking schema of the Container Cloud
deployment on bare metal with a MOSK cluster using BGP
announcements:
This section describes network types for Layer 3 networks used for Kubernetes
and Mirantis OpenStack for Kubernetes (MOSK) clusters along with
requirements for each network type.
Note
Only IPv4 is currently supported by Container Cloud and IPAM
for infrastructure networks. Both IPv4 and IPv6 are supported
for OpenStack workloads.
The following diagram provides an overview of the underlay networks in a
MOSK environment:
If BGP announcement is configured for the MOSK cluster API LB address, the
API/LCM network is not required. Announcement of the cluster API LB address
is done using the LCM network.
If you configure ARP announcement of the load-balancer IP address for the
MOSK cluster API, the API/LCM network must be configured on the Kubernetes
manager nodes of the cluster. This network contains the Kubernetes API
endpoint with the VRRP virtual IP address.
LCM network
Enables communication between the MKE cluster nodes. Multiple VLAN
segments and IP subnets can be created for a multi-rack architecture. Each
server must be connected to one of the LCM segments and have an IP from
the corresponding subnet.
External network
Used to expose the OpenStack, StackLight, and other services of the
MOSK cluster.
Kubernetes workloads network
Used for communication between containers in Kubernetes.
Storage access network (Ceph)
Used for accessing the Ceph storage. In Ceph terms, this is the public
network. We recommend placing it on a dedicated hardware interface.
Storage replication network (Ceph)
Used for Ceph storage replication. In Ceph terms, this is the cluster
network. To ensure low latency and fast access, place the network on a
dedicated hardware interface.
Typically, a routable network used to provide the external access to
OpenStack instances (a floating network). Can be used by the OpenStack
services, such as Ironic, Manila, and others, to connect their
management resources. Bridge name: pr-floating.
Overlay networks (virtual networks) - Networking service
The network used to provide isolated, secure tenant networks with the
help of a tunneling mechanism (VLAN/GRE/VXLAN). If VXLAN or GRE
encapsulation takes place, IP address assignment is required on
interfaces at the node level. Bridge name: neutron-tunnel.
Live migration network - Compute service
The network used by the OpenStack compute service (Nova) to transfer
data during live migration. Depending on the cloud needs, it can be
placed on a dedicated physical network to avoid affecting other networks
during live migration. IP address assignment is required on
interfaces at the node level. Bridge name: lm-vlan.
How the logical networks described above map to physical networks and
interfaces on nodes depends on the cloud size and configuration.
We recommend placing OpenStack networks on a dedicated physical interface
(bond) that is not shared with the storage and Kubernetes management networks
to minimize their influence on each other.
The bridge interface with this name is mandatory if you need to separate
Kubernetes workloads traffic. You can configure this bridge over the VLAN or
directly over the bonded or single interface.
Routing to all IP subnets of the Storage access network
Routing to all IP subnets of the Storage replication network
Note
When selecting externally routable subnets, ensure that the subnet
ranges do not overlap with the internal subnets ranges. Otherwise, internal
resources of users will not be available from the MOSK
cluster.
Mirantis OpenStack for Kubernetes (MOSK) enables you to deploy a
cluster with a multi-rack architecture, where every data center cabinet
(a rack) incorporates its own Layer 2 network infrastructure that does not
extend beyond its top-of-rack switch. The architecture allows a
MOSK cloud to integrate natively with the Layer 3-centric
networking topologies such as Spine-Leaf
that are commonly seen in modern data centers.
The architecture eliminates the need to stretch and manage VLANs across
parts of a single data center, or to build VPN tunnels between the segments of
a geographically distributed cloud.
The set of networks present in each rack depends on the backend used by the
OpenStack networking service.
In the Mirantis Container Cloud and MOSK multi-rack
reference architecture, every rack has its own L2 segment (VLAN) to bootstrap
and install servers.
Segmentation of the provisioning network requires additional configuration
of the underlay networking infrastructure and certain Container Cloud API
objects. You need to configure a DHCP Relay agent on the border of each VLAN
in the provisioning network. The agent handles broadcast DHCP requests coming
from the bare metal servers in the rack and forwards them as unicast packets
across L3 fabric of the data center to a Container Cloud management cluster.
From the standpoint of Container Cloud API, you need to configure per-rack DHCP
ranges by adding Subnet resources in Container Cloud as described in
Configure multiple DHCP address ranges.
The DHCP server of Container Cloud automatically leases a temporary IP address
from the DHCP range to the requester host depending on the address of the DHCP
agent that relays the request.
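As an illustration only, a per-rack DHCP range could be defined by a Subnet
object similar to the following sketch; the name, namespace, addresses, and
the label that marks the subnet as a DHCP range are assumptions, so refer to
the linked procedure for the exact format required by your Container Cloud
version:

  apiVersion: ipam.mirantis.com/v1alpha1
  kind: Subnet
  metadata:
    name: dhcp-rack-1               # example name
    namespace: default              # example namespace
    labels:
      ipam/SVC-dhcp-range: "1"      # assumption: label marking a DHCP range subnet
  spec:
    cidr: 10.20.1.0/24              # provisioning subnet of rack 1 (example)
    gateway: 10.20.1.1              # ToR gateway acting as the DHCP relay (example)
    includeRanges:
    - 10.20.1.100-10.20.1.200       # addresses leased during provisioning (example)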
To deploy a MOSK cluster with multi-rack reference
architecture, you need to create a dedicated set of subnets and L2 templates
for every rack in your cluster.
Every specific host type in the rack, which is defined by the role in the
MOSK cluster and network-related hardware configuration,
may require a specific L2 template.
For MOSK 23.1 and older versions, due to the Container
Cloud limitations, you need to configure the following networks to have L2
segments (VLANs) stretch across racks to all hosts of certain types in
a multi-rack environment:
LCM/API network
Must be configured on the Kubernetes manager nodes of the
MOSK cluster. Contains a Kubernetes API endpoint with a
VRRP virtual IP address. Enables MKE cluster nodes to communicate
with each other.
External network
Exposes OpenStack, StackLight, and other services of the
MOSK cluster to external clients.
When planning space allocation for IP addresses in your cluster, pick large
IP ranges for each type of network. Then you will split these ranges into
per-rack subnets.
For example, if you allocate a /20 address block for LCM network,
then you can create up to 16 Subnet objects with the /24 address block
each for up to 16 racks. This way you can simplify routing on your hosts using
the large /20 IP subnet as an aggregated route destination. For details,
see Underlay networking: routing configuration.
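For example, an illustrative /20 block reserved for the LCM network could be
split as follows; all addresses are examples:

  10.100.0.0/20   - aggregated LCM block, a single route destination on hosts
  10.100.0.0/24   - LCM subnet of rack 1
  10.100.1.0/24   - LCM subnet of rack 2
  ...
  10.100.15.0/24  - LCM subnet of rack 16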
A typical medium-sized or larger MOSK cloud consists of three
or more racks that can generally be divided into the following major
categories:
Compute/Storage racks that contain the hypervisors and instances running on
top of them. Additionally, they contain nodes that store cloud applications’
block, ephemeral, and object data as part of the Ceph cluster.
Control plane racks that incorporate all the components needed by the cloud
operator to manage its life cycle. Also, they include the services through
which the cloud users interact with the cloud to deploy their applications,
such as cloud APIs and web UI.
A control plane rack may also contain additional compute and storage nodes.
The diagram below will help you to plan the networking layout of a multi-rack
MOSK cloud with Tungsten Fabric.
For MOSK 23.1 and older versions, Kubernetes masters
(3 nodes) either need to be placed into a single rack or, if distributed
across multiple racks for better availability, require stretching of the
L2 segment of the management network across these racks.
This requirement is caused by the Mirantis Kubernetes Engine underlay for
MOSK relying on the Layer 2 VRRP protocol to ensure high
availability of the Kubernetes API endpoint.
The table below provides a mapping between the racks and the network types
participating in a multi-rack MOSK cluster with the
Tungsten Fabric backend.
Networks and VLANs for a multi-rack MOSK
cluster with TF
This section summarizes the requirements for the physical layout of underlay
network and VLANs configuration for the multi-rack architecture of
Mirantis OpenStack for Kubernetes (MOSK).
Physical networking of a Container Cloud management cluster
Due to limitations of virtual IP address for Kubernetes API and of MetalLB
load balancing in Container Cloud, the management cluster nodes must share
VLAN segments in the provisioning and management networks.
In the multi-rack architecture, the management cluster nodes may be placed in
a single rack or spread across three racks. In either case, the provisioning
and management network VLANs must be stretched across the ToR switches of the
racks.
The following diagram illustrates physical and L2 connections of
the Container Cloud management cluster.
If you configure BGP announcement for IP addresses of load-balanced services
of a MOSK cluster, the external network can consist of multiple VLAN segments
connected to all nodes of a MOSK cluster where MetalLB speaker components are
configured to announce IP addresses for Kubernetes load-balanced services.
Mirantis recommends that you use OpenStack controller nodes for this purpose.
If you configure ARP announcement for IP addresses of load-balanced services
of a MOSK cluster, the external network must consist of a single VLAN
stretched to the ToR switches of all the racks where MOSK nodes connected to
the external network are located. Those are the nodes where MetalLB speaker
components are configured to announce IP addresses for Kubernetes load-balanced
services. Mirantis recommends that you use OpenStack controller nodes for this
purpose.
If BGP announcement is configured for MOSK
cluster API LB address, Kubernetes manager
nodes have no requirement to share the single stretched VLAN segment in the
API/LCM network. All VLANs may be configured per rack.
If ARP announcement is configured for
MOSK cluster API LB address, Kubernetes manager
nodes must share the VLAN segment in the API/LCM network.
In the multi-rack architecture, Kubernetes manager nodes may be spread
across three racks. The API/LCM network VLAN must be stretched to the ToR
switches of the racks. All other VLANs may be configured per rack.
This requirement is caused by the Mirantis Kubernetes Engine underlay for
MOSK relying on the Layer 2 VRRP protocol to ensure high
availability of the Kubernetes API endpoint.
The following diagram illustrates physical and L2 network connections
of the Kubernetes manager nodes in a MOSK cluster.
Caution
Such configuration does not apply to a compact control plane
MOSK installation. See Create a MOSK cluster.
This section describes requirements for the configuration of the underlay
network for a MOSK cluster in a multi-rack
reference configuration. The infrastructure operator must configure the
underlay network according to these guidelines. Mirantis Container Cloud will
not configure routing on the network devices.
In the multi-rack reference architecture, every server rack has its own
layer-2 segment (VLAN) for network bootstrap and installation of physical
servers.
You need to configure top-of-rack (ToR) switches in each rack with the default
gateway for the provisioning network VLAN. This gateway must also serve as a
DHCP Relay Agent on the border of the VLAN. The agent handles broadcast
DHCP requests coming from the bare metal servers in the rack and
forwards them as unicast packets across the data center L3 fabric to the
provisioning network of a Container Cloud management cluster.
Therefore, each ToR gateway must have an IP route to the IP subnet of the
provisioning network of the management cluster. The provisioning network
gateway, in turn, must have routes to all IP subnets of all racks.
The hosts of the management cluster must have routes to all IP subnets
in the provisioning network through the gateway in the provisioning network
of the management cluster.
All hosts in the management cluster must have IP addresses from the same
IP subnet of the provisioning network. Even if the hosts
of the management cluster are mounted to different racks, they must share
a single provisioning VLAN segment.
All hosts of a management cluster must have IP addresses from the same subnet
of the management network. Even if hosts of a management cluster are mounted
to different racks, they must share a single management VLAN segment.
The gateway in this network is used as the default route on the nodes
in a Container Cloud management cluster. This gateway must connect to
external Internet networks directly or through a proxy server.
If the Internet is accessible through a proxy server, you must configure
Container Cloud bootstrap to use it as well. For details, see
Deploy a management cluster.
This network connects a Container Cloud management cluster to Kubernetes
API endpoints of MOSK clusters. It also connects LCM agents
of MOSK nodes to the Kubernetes API endpoint of the
management cluster.
The network gateway must have routes to all API/LCM subnets of all
MOSK clusters.
This network may include multiple VLANs, typically, one VLAN per rack.
Each VLAN may have one or more IP subnets with gateways configured on
ToR switches.
Each ToR gateway must provide routes to all other IP subnets in all
other VLANs in the LCM network to enable communication between nodes
in the cluster.
If you configure BGP announcement of the load-balancer IP address for a
MOSK cluster API:
All nodes of a MOSK cluster must be connected to the
LCM network. Each host connected to this network must have routes to all
IP subnets in the LCM network and to the management subnet of the
management cluster, through the ToR gateway for the rack of this host.
It is not required to configure a separate API/LCM network.
Announcement of the IP address of the load balancer is done using the LCM
network.
If you configure ARP announcement of the load-balancer IP address for a
MOSK cluster API:
All nodes of a MOSK cluster excluding manager nodes
must be connected to the LCM network. Each host connected to this network
must have routes to all IP subnets in the LCM network, including the
API/LCM network of this MOSK cluster and to the
Management subnet of the management cluster, through the ToR gateway for
the rack of this host.
It is required to configure a separate API/LCM network. All manager nodes
of a MOSK cluster must be connected to the API/LCM
network. IP address announcement for load balancing is done using the
API/LCM network.
If BGP announcement is configured for the MOSK cluster API LB address, the
API/LCM network is not required. Announcement of the cluster API LB address
is done using the LCM network.
If you configure ARP announcement of the load-balancer IP address for the
MOSK cluster API, the API/LCM network must be configured on the Kubernetes
manager nodes of the cluster. This network contains the Kubernetes API
endpoint with the VRRP virtual IP address.
This network consists of a single VLAN shared between all
MOSK manager nodes in a MOSK cluster,
even if the nodes are spread across multiple racks.
All manager nodes of a MOSK cluster must be connected to
this network and have IP addresses from the same subnet in this network.
The gateway in the API/LCM network for a MOSK cluster
must have a route to the Management subnet of the management cluster.
This is required to ensure symmetric traffic flow between the management
and MOSK clusters.
The gateway in this network must also have routes to all IP subnets
in the LCM network of this MOSK cluster.
The load-balancer IP address for cluster API must be allocated from the same
CIDR address that the API/LCM subnet uses.
If you configure BGP announcement for IP addresses of load-balanced services
of a MOSK cluster, the external network can consist of multiple VLAN segments
connected to all nodes of a MOSK cluster where MetalLB speaker components are
configured to announce IP addresses for Kubernetes load-balanced services.
Mirantis recommends that you use OpenStack controller nodes for this purpose.
If you configure ARP announcement for IP addresses of load-balanced services
of a MOSK cluster, the external network must consist of a single VLAN
stretched to the ToR switches of all the racks where MOSK nodes connected to
the external network are located. Those are the nodes where MetalLB speaker
components are configured to announce IP addresses for Kubernetes load-balanced
services. Mirantis recommends that you use OpenStack controller nodes for this
purpose.
The IP gateway in this network is used as the default route on all nodes in the
MOSK cluster that are connected to this network.
This allows external users to connect to the OpenStack endpoints exposed as
Kubernetes load-balanced services.
Dedicated IP ranges from this network must be configured as address pools
for the MetalLB service. MetalLB allocates addresses from these address pools
to Kubernetes load-balanced services.
This network may include multiple VLANs and IP subnets, typically,
one VLAN and IP subnet per rack. All IP subnets in this network must
be connected by IP routes on the ToR switches.
Typically, every node in a MOSK cluster is connected
to this network and has routes to all IP subnets of this network through
its rack IP gateway.
This network is not connected to the external networks.
This network may include multiple VLANs and IP subnets, typically,
one VLAN and IP subnet per rack. All IP subnets in this network must
be connected by IP routes on the ToR switches.
Every Ceph OSD node in a MOSK cluster must be connected
to this network and have routes to all IP subnets from this network
through its rack IP gateway.
This network is not connected to the external networks.
This network may include multiple VLANs and IP subnets, typically,
one VLAN and IP subnet per rack. All IP subnets in this network must
be connected by IP routes on the ToR switches.
All nodes in a MOSK cluster must be connected
to this network and have routes to all IP subnets from this network
through its rack IP gateway.
This network is not connected to the external networks.
To improve the goodput, we recommend that you enable jumbo frames where
possible. Jumbo frames have to be enabled along the whole path that the
packets traverse. If one of the network components cannot handle jumbo frames,
the network path uses the smallest MTU.
To protect against the failure of a single NIC, we recommend using link
aggregation, such as bonding. Link aggregation is useful for linear
scaling of bandwidth, load balancing, and fault protection. Depending
on the hardware equipment, different types of bonds might be supported.
Use the multi-chassis link aggregation as it provides fault tolerance
at the device level. For example, MLAG on Arista equipment or vPC on
Cisco equipment.
The Linux kernel supports the following bonding modes:
active-backup
balance-xor
802.3ad (LACP)
balance-tlb
balance-alb
Since LACP is the IEEE standard 802.3ad supported by the majority of
network platforms, we recommend using this bonding mode.
Use the Link Aggregation Control Protocol (LACP) bonding mode
with MC-LAG domains configured on ToR switches. This corresponds to
the 802.3ad bond mode on hosts.
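For illustration, an 802.3ad bond could be described in the netplan-style host
configuration (for example, in the npTemplate section of an L2 template)
similar to the following fragment; the port names are assumptions:

  bonds:
    bond0:
      interfaces:
      - enp9s0f0              # port on the first NIC (example name)
      - enp10s0f0             # port on the second NIC (example name)
      parameters:
        mode: 802.3ad         # LACP, matching MC-LAG/MLAG on the ToR switches
        lacp-rate: fast
        transmit-hash-policy: layer3+4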
Additionally, follow these recommendations in regards to bond interfaces:
Use ports from different multi-port NICs when creating bonds. This makes
network connections redundant if failure of a single NIC occurs.
Configure the ports that connect servers to the PXE network with the PXE VLAN
as native or untagged. On these ports, configure LACP fallback to ensure
that the servers can reach the DHCP server and boot over the network.
Configure Spanning Tree Protocol (STP) settings on the network switch ports to
ensure that the ports start forwarding packets as soon as the link comes up.
It helps avoid iPXE timeout issues and ensures reliable boot over network.
A MOSK cluster uses Ceph as a distributed storage system
for file, block, and object storage. This section provides an overview of a
Ceph cluster deployed by Container Cloud.
Mirantis Container Cloud deploys Ceph on MOSK using Helm
charts with the following components:
Rook Ceph Operator
A storage orchestrator that deploys Ceph on top of a Kubernetes cluster. Also
known as Rook or RookOperator. Rook operations include:
Deploying and managing a Ceph cluster based on provided Rook CRs such as
CephCluster, CephBlockPool, CephObjectStore, and so on.
Orchestrating the state of the Ceph cluster and all its daemons.
KaaSCephCluster custom resource (CR)
Represents the customization of a Kubernetes installation and allows you to
define the required Ceph configuration through the Container Cloud web UI
before deployment. For example, you can define the failure domain, Ceph pools,
Ceph node roles, number of Ceph components such as Ceph OSDs, and so on.
The ceph-kcc-controller controller on the Container Cloud management
cluster manages the KaaSCephCluster CR.
Ceph Controller
A Kubernetes controller that obtains the parameters from Container Cloud
through a CR, creates CRs for Rook and updates its CR status based on the Ceph
cluster deployment progress. It creates users, pools, and keys for OpenStack
and Kubernetes and provides Ceph configurations and keys to access them. Also,
Ceph Controller eventually obtains the data from the OpenStack Controller
(Rockoon) for the Keystone integration and updates the Ceph Object Gateway
services configurations to use Kubernetes for user authentication.
The Ceph Controller operations include:
Transforming user parameters from the Container Cloud Ceph CR into Rook CRs
and deploying a Ceph cluster using Rook.
Providing integration of the Ceph cluster with Kubernetes.
Providing data for OpenStack to integrate with the deployed Ceph cluster.
Ceph Status Controller
A Kubernetes controller that collects all valuable parameters from the current
Ceph cluster, its daemons, and entities and exposes them into the
KaaSCephCluster status. Ceph Status Controller operations include:
Collecting all statuses from a Ceph cluster and corresponding Rook CRs.
Collecting additional information on the health of Ceph daemons.
Providing information to the status section of the KaaSCephCluster
CR.
Ceph Request Controller
A Kubernetes controller that obtains the parameters from Container Cloud
through a CR and manages Ceph OSD lifecycle management (LCM) operations. It
allows for a safe Ceph OSD removal from the Ceph cluster. Ceph Request
Controller operations include:
Providing an ability to perform Ceph OSD LCM operations.
Obtaining specific CRs to remove Ceph OSDs and executing them.
Pausing the regular Ceph Controller reconciliation until all requests are
completed.
A typical Ceph cluster consists of the following components:
Ceph Monitors - three or, in rare cases, five Ceph Monitors.
Ceph Managers - one Ceph Manager in a regular cluster.
Ceph Object Gateway (radosgw) - Mirantis recommends having three or more
radosgw instances for HA.
Ceph OSDs - the number of Ceph OSDs may vary according to the deployment
needs.
Warning
A Ceph cluster with 3 Ceph nodes does not provide
hardware fault tolerance and is not eligible
for recovery operations,
such as a disk or an entire Ceph node replacement.
A Ceph cluster uses the replication factor that equals 3.
If the number of Ceph OSDs is less than 3, a Ceph cluster
moves to the degraded state with the write operations
restriction until the number of alive Ceph OSDs
equals the replication factor again.
The placement of Ceph Monitors and Ceph Managers is defined in the
KaaSCephCluster CR.
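As a minimal illustrative sketch only, such placement could be expressed by
assigning the mon and mgr roles to specific machines in the KaaSCephCluster
specification; the machine and device names below are assumptions, and the
exact field layout should be validated against the KaaSCephCluster reference:

  spec:
    cephClusterSpec:
      nodes:
        storage-worker-01:          # example machine name
          roles:
          - mon
          - mgr
          storageDevices:
          - name: sdb               # example data device
            config:
              deviceClass: hdd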
The following diagram illustrates the way a Ceph cluster is deployed in
Container Cloud:
The following diagram illustrates the processes within a deployed Ceph cluster:
A Ceph cluster configuration in MOSK has the following
limitations, including but not limited to:
Only one Ceph Controller per MOSK cluster
and only one Ceph cluster per Ceph Controller are supported.
The replication size for any Ceph pool must be set to more than 1.
Only one CRUSH tree per cluster. The separation of devices per Ceph pool is
supported through device classes
with only one pool of each type for a device class.
All CRUSH rules must have the same failure_domain.
Only the following types of CRUSH buckets are supported:
topology.kubernetes.io/region
topology.kubernetes.io/zone
topology.rook.io/datacenter
topology.rook.io/room
topology.rook.io/pod
topology.rook.io/pdu
topology.rook.io/row
topology.rook.io/rack
topology.rook.io/chassis
RBD mirroring is not supported.
Consuming an existing Ceph cluster is not supported.
Lifted since MOSK 23.1
CephFS is not supported. Multiple CephFS instances are supported since
MOSK 25.1.
Only IPv4 is supported.
If two or more Ceph OSDs are located on the same device, there must be no
dedicated WAL or DB for this class.
Only a full collocation or dedicated WAL and DB configurations are supported.
The minimum size of any defined Ceph OSD device is 5 GB.
Ceph OSDs support only raw disks as data devices meaning that no dm or
lvm devices are allowed.
Lifted since MOSK 23.3
Ceph cluster does not support removable devices (with hotplug enabled) for
deploying Ceph OSDs.
When adding a Ceph node with the Ceph Monitor role, if any issues occur with
the Ceph Monitor, rook-ceph removes it and adds a new Ceph Monitor instead,
named using the next alphabetic character in order. Therefore, the Ceph Monitor
names may not follow the alphabetical order. For example, a, b, d,
instead of a, b, c.
Reducing the number of Ceph Monitors is not supported and causes the Ceph
Monitor daemons removal from random nodes.
Removal of the mgr role in the nodes section of the
KaaSCephCluster CR does not remove Ceph Managers. To remove a Ceph
Manager from a node, remove it from the nodes spec and manually delete
the mgr pod in the Rook namespace.
Lifted since MOSK 24.1
Ceph does not support allocation of Ceph RGW pods on nodes where the Federal
Information Processing Standard (FIPS) mode is enabled.
The integration between Ceph and OpenStack (Rockoon) Controllers is implemented
through the shared Kubernetes openstack-ceph-shared namespace. Both
controllers have access to this namespace to read and write the Kubernetes
kind:Secret objects.
As Ceph is the required and only supported backend for several OpenStack
services, all necessary Ceph pools must be specified in the configuration
of the kind:MiraCeph custom resource as part of the deployment.
Once the Ceph cluster is deployed, the Ceph Controller posts the
information required by the OpenStack services to be properly configured
as a kind:Secret object into the openstack-ceph-shared namespace.
The OpenStack Controller watches this namespace. Once the corresponding
secret is created, the OpenStack Controller transforms this secret to the
data structures expected by the OpenStack-Helm charts. Even if an OpenStack
installation is triggered at the same time as a Ceph cluster deployment, the
OpenStack Controller halts the deployment of the OpenStack services that
depend on Ceph availability until the secret in the shared namespace is
created by the Ceph Controller.
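For example, to check that the shared data has been published, you can list
the objects in the shared namespace; the names of the secrets are
deployment-specific:

  kubectl -n openstack-ceph-shared get secrets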
For the configuration of Ceph Object Gateway as an OpenStack Object
Storage, the reverse process takes place. The OpenStack Controller waits
for the OpenStack-Helm to create a secret with OpenStack Identity
(Keystone) credentials that Ceph Object Gateway must use to validate the
OpenStack Identity tokens, and posts it back to the same
openstack-ceph-shared namespace in the format suitable for
consumption by the Ceph Controller. The Ceph Controller then reads this
secret and reconfigures Ceph Object Gateway accordingly.
StackLight is the logging, monitoring, and alerting solution that provides a
single pane of glass for cloud maintenance and day-to-day operations as well
as offers critical insights into cloud health including operational
information about the components deployed with Mirantis OpenStack for
Kubernetes (MOSK). StackLight is based on Prometheus, an
open-source monitoring solution and a time series database, and OpenSearch, the
logs and notifications storage.
Mirantis OpenStack for Kubernetes (MOSK) deploys the StackLight stack
as a release of a Helm chart that contains the helm-controller and HelmBundle
custom resources. The StackLight HelmBundle consists of a set of Helm charts
describing the StackLight components. Apart from the OpenStack-specific
components below, StackLight also includes the components described in
Mirantis Container Cloud Reference Architecture: Deployment architecture.
By default, StackLight logging stack is disabled.
During the StackLight configuration when deploying a MOSK
cluster, you can define the HA or non-HA StackLight architecture type.
Non-HA StackLight requires a backend storage provider, for example, a Ceph
cluster. For details, see Mirantis Container Cloud Reference Architecture:
StackLight database modes.
StackLight measures, analyzes, and reports in a timely manner about failures
that may occur in the following Mirantis OpenStack for Kubernetes
(MOSK)
components and their sub-components. Apart from the components below,
StackLight also monitors the components listed in
Mirantis Container Cloud Reference Architecture: Monitored components.
Calculations in this document are based on numbers from a
real-scale test cluster with 34 nodes. The exact space required for metrics
and logs must be calculated depending on the ongoing cluster operations.
Some operations force the generation of additional metrics and logs. The
values below are approximate. Use them only as recommendations.
During the deployment of a new cluster, you must specify the OpenSearch
retention time and Persistent Volume Claim (PVC) size, as well as the
Prometheus PVC size, retention time, and retention size.
When configuring an existing cluster, you can only set OpenSearch
retention time, Prometheus retention time, and retention size.
The following table describes the recommendations for both OpenSearch
and Prometheus retention size and PVC size for a cluster with 34 nodes.
Retention time depends on the space allocated for the data. To calculate
the required retention time, use the following formula:
retention_time = retention_size / amount_of_data_per_day
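For example, if StackLight in HA mode generates approximately 500 GB of logs
per day for the entire cluster (see the table below) and the OpenSearch
retention size is 1500 GB, the retention time is 1500 / 500 = 3 days.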
OpenSearch
Required space per day:
StackLight in non-HA mode: 202 - 253 GB for the entire cluster,
~6 - 7.5 GB for a single node
StackLight in HA mode: 404 - 506 GB for the entire cluster,
~12 - 15 GB for a single node
When setting Persistent Volume Claim Size for OpenSearch
during the cluster creation, take into account that it defines the PVC
size for a single instance of the OpenSearch cluster. StackLight in HA
mode has 3 OpenSearch instances. Therefore, for a total OpenSearch
capacity, multiply the PVC size by 3.
Prometheus
Required space per day: 11 GB for the entire cluster,
~400 MB for a single node
Every Prometheus instance stores the entire database. Multiple replicas
store multiple copies of the same data. Therefore, treat the Prometheus
PVC size as the capacity of Prometheus in the cluster. Do not sum them
up.
Prometheus has built-in retention mechanisms based on the database size
and time series duration stored in the database. Therefore, if you
miscalculate the PVC size, retention size set to ~1 GB less than the PVC
size will prevent disk overfilling.
StackLight integration with OpenStack includes automatic discovery of RabbitMQ
credentials for notifications and OpenStack credentials for OpenStack API
metrics. For details, see the
openstack.rabbitmq.credentialsConfig and
openstack.telegraf.credentialsConfig parameters description in
StackLight configuration parameters.
Lifecycle management operations of a MOSK cluster may
affect its workloads and, specifically, may cause network
connectivity interruptions for instances running in OpenStack.
To make sure that the downtime caused on the cloud applications still
fits into Service Level Agreements (SLAs), MOSK
provides the tooling to measure the network availability of instances.
Additionally, continuous monitoring of the network connectivity in the cluster
is essential for early detection of infrastructure problems.
MOSK enables cloud operators to oversee the availability
of workloads hosted in their OpenStack infrastructure on several levels:
Monitoring of floating IP addresses through the Cloudprober service
Monitoring of network ports availability through the Portprober service
Floating IP address availability monitoring (Cloudprober)
Available since MOSK 23.2. Technology Preview.
The floating IP address availability monitoring service (Cloudprober) is a
special probing agent that starts on controller nodes and periodically pings
selected floating IP addresses. As of today, the agent supports
only Internet Control Message Protocol (ICMP) to determine the IP address
availability.
To monitor the availability of floating IP addresses, your
MOSK cluster and workloads need to meet the
following requirements:
There must be layer-3 connectivity between the cluster's floating
IP networks and the nodes running the OpenStack control plane.
The guest operating system of the monitored OpenStack instances must allow
the ICMP ingress and egress traffic.
OpenStack security groups used by the monitored instances must allow the ICMP
ingress and egress traffic.
To enable the floating IP address availability monitoring service, use the
following OpenStackDeployment definition:
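As a minimal sketch, assuming that Cloudprober is enabled through the list of
optional services in the OpenStackDeployment custom resource, the relevant
fragment could look as follows:

  spec:
    features:
      services:
      - cloudprober             # assumption: Cloudprober is toggled as an optional service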
Network port availability monitoring (Portprober)
Available since MOSK 24.2. Technology Preview.
The network port availability monitoring service (Portprober) is implemented
as an extension to the OpenStack Networking service (Neutron) that gets
enabled automatically together with the Cloudprober service described above.
Also, you can enable Portprober explicitly, regardless of whether Cloudprober
is enabled or not. To do so, specify the following structure in the
OpenStackDeployment custom resource:
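As a minimal sketch, assuming that Portprober is toggled as a Neutron
extension in the OpenStackDeployment custom resource, the relevant fragment
could look as follows:

  spec:
    features:
      neutron:
        extensions:
          portprober:
            enabled: true       # assumption: Portprober is enabled as a Neutron extension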
The Portprober service is supported only for the following cloud
configurations:
OpenStack version is Antelope or newer
Neutron OVS backend for networking (Tungsten Fabric and OVN backends are
not supported)
The Portprober agent automatically connects to all OpenStack virtual networks
and probes all the ports that are plugged in there and are in the bound
state, meaning they are associated with an instance or a network service.
The service makes no difference between private and external networks and also
reports the availability of the ports that belong to virtual routers.
The service relies on the ARP protocol to determine port availability and
does not require any security groups to be assigned to monitored instances,
as opposed to the Floating IP address monitoring service (Cloudprober).
Among the known limitations of the network port availability monitoring
service is the lack of support for IPv6. The service ignores the ports that
do not have IPv4 addresses associated with them.
StackLight logging indices are managed by OpenSearch data streams, which were
introduced in OpenSearch 2.6. Data streams are a convenient way to manage
insert-only pipelines such as log message collection. The solution consists
of the following elements:
Data stream objects that can be referred to as alias:
Audit - dedicated for Container Cloud, MKE, and host audit logs, ensuring
data integrity and security.
System - replaces Logstash for system logs, provides a streamlined
approach to log management.
Write index - current index where ingestion can be performed without
removing a data stream.
Read indices - indices created after the rollover mechanism is applied.
Rollover policy - creates a new write index for a data stream based on
the size of shards.
This section contains a collection of Mirantis OpenStack for Kubernetes
(MOSK) architecture blueprints that include common cluster
topology and configuration patterns that can be referred to when building a
MOSK cloud. Every blueprint is validated by Mirantis and
is known to work. You can use these blueprints alone or in combination,
although the interoperability of all possible combinations cannot be
guaranteed.
The section provides information on the target use cases, pros and cons of
every blueprint and outlines the extents of its applicability. However, do
not hesitate to reach out to Mirantis if you have any questions or doubts
on whether a specific blueprint can be applied when designing your cloud.
Although a classic cloud approach allows resources to be distributed across
multiple regions, it still needs powerful data centers to host control planes
and compute clusters. Such regional centralization poses challenges when the
number of data consumers grows. It becomes hard to access the resources hosted
in the cloud even though the resources are located in the same geographic
region. The solution would be to bring the data closer to the consumer.
And this is exactly what edge computing provides.
Edge computing is a paradigm that brings computation and data storage closer to
the sources of data or the consumer. It is designed to improve response time
and save bandwidth.
A few examples of use cases for edge computing include:
Hosting a video stream processing application on premises of a large stadium
during the Super Bowl match
Placing the inventory or augmented reality services directly in the
industrial facilities, such as a storage site, power plant, shipyard, and so on
A small compute node deployed in a remote village supermarket to
host an application for store automation and accounting
These and many other use cases could be solved by deploying multiple edge
clusters managed from a single central place. The idea of centralized
management plays a significant role for the business efficiency of the edge
cloud environment:
Cloud operators obtain a single management console for the cloud that
simplifies the Day-1 provisioning of new edge sites and Day-2 operations
across multiple geographically distributed points of presence
Cloud users get the ability to transparently connect their edge applications
with central databases or business logic components hosted in data centers
or public clouds
Depending on the size, location, and target use case, the points of presence
comprising an edge cloud environment can be divided into five major categories.
Mirantis OpenStack powered by Mirantis Container Cloud offers reference
architectures to address the centralized management in core and regional data
centers as well as edge sites.
Remote compute nodes is one of the approaches to the implementation of the
edge computing concept offered by MOSK. The topology
consists of a MOSK cluster residing in a data center,
which is extended with multiple small groups of compute nodes deployed in
geographically distanced remote sites. Remote compute nodes are integrated
into the MOSK cluster just like the nodes in the central
site with their configuration and life cycle managed through the same means.
Along with compute nodes, remote sites need to incorporate network gateway
components that allow application users to consume edge services directly
without looping the traffic through the central site.
Deployment of an edge cluster managed from a single central place starts with
a proper planning. This section provides recommendations on how to approach
the deployment design.
Compute nodes aggregation into availability zones
Mirantis recommends organizing nodes in each remote site into separate
Availability Zones in the MOSK Compute (OpenStack Nova),
Networking (OpenStack Neutron), and Block Storage (OpenStack Cinder)
services. This enables the cloud users to be aware of the failure domain
represented by a remote site and distribute the parts of their applications
accordingly.
Typically, high latency between the central control plane and remote sites
makes it infeasible to rely on Ceph as a storage for the instance
root/ephemeral and block data.
Mirantis recommends that you configure the remote sites to use the following
backends:
Local storage (LVM or QCOW2) as a storage backend for the
MOSK Compute service. See Image storage backend
for the configuration details.
LVM on iSCSI backend for the MOSK Block Storage service.
See Enable LVM block storage for the enablement procedure.
To maintain the small size of a remote site, the compute nodes need to be
hyper-converged and combine the compute and block storage functions.
There is no limitation on the number of the remote sites and their size.
However, when planning the cluster, ensure consistency between the total number
of nodes managed by a single control plane and the value of the size
parameter set in the OpenStackDeployment custom resource. For the list of
supported sizes, refer to Main elements.
Additionally, the sizing of the remote site needs to take into account the
characteristics of the networking channel with the main site.
Typically, an edge site consists of 3-7 compute nodes installed in a single,
usually rented, rack.
Mirantis recommends keeping the network latency between the main and remote
sites as low as possible. For stable interoperability of cluster components,
the latency needs to be around 30-70 milliseconds. Though, depending on the
cluster configuration and dynamism of the workloads running in the remote site,
the stability of the cluster can be preserved with the latency of up to 190
milliseconds.
The bandwidth of the communication channel between the main and remote sites
needs to be sufficient to run the following traffic:
The control plane and management traffic, such as OpenStack messaging,
database access, MOSK underlay Kubernetes cluster control
plane, and so on. A single remote compute node in the idle state requires at
minimum 1.5 Mbit/s of bandwidth to perform the non-data plane communications.
The data plane traffic, such as OpenStack image operations, instances VNC
console traffic, and so on, that heavily depend on the profile of the
workloads and other aspects of the cloud usage.
In general, Mirantis recommends having a minimum of 100 MBit/s bandwidth
between the main and remote sites.
MOSK remote compute nodes architecture is designed to
tolerate a temporary loss of connectivity between the main cluster and
the remote sites. In case of a disconnection, the instances running
on remote compute nodes will keep running normally preserving their
ability to read and write ephemeral and block storage data presuming it
is located in the same site, as well as connectivity to their neighbours
and edge application users. However, the instances will not have access
to any cloud services or applications located outside of their remote site.
Since the MOSK control plane communicates with remote
compute nodes through the same network channel, cloud users will not be able
to perform any manipulations, for example, instance creation, deletion,
snapshotting, and so on, over their edge applications until the
connectivity gets restored. MOSK services providing high
availability to cloud applications, such as the Instance HA service and Network
service, need to be connected to the remote compute nodes to perform a failover
of application components running in the remote site.
Once the connectivity between the main and the remote site restores, all
functions become available again. The period during which an edge application
can sustain normal function after a connectivity loss is determined by multiple
factors including the selected networking backend for the
MOSK cluster. Mirantis recommends that a cloud operator
performs a set of test manipulations over the cloud resources hosted in the
remote site to ensure that it has been fully restored.
When configured in Tungsten Fabric-powered clouds, the Graceful restart and long-lived graceful restart
feature significantly improves the MOSK ability to sustain
the connectivity of workloads running at remote sites in situations when
a site experiences a loss of connection to the central hosting location of
the control plane.
Extensive testing has demonstrated that remote sites can effectively withstand
a 72-hour control plane disconnection with zero impact on the running
applications.
Given that a remote site communicates with its main MOSK
cluster across a wide area network (WAN), it becomes important to protect
sensitive data from being intercepted and viewed by a third party.
Specifically, you should ensure the protection of the data belonging to the
following cloud components:
Bare metal servers provisioning and control, Kubernetes cluster deployment
and management, Mirantis StackLight telemetry
MOSK control plane
Communication between the components of OpenStack, Tungsten Fabric, and
Mirantis Ceph
MOSK data plane
Cloud application traffic
The most reliable way to protect the data is to configure the network equipment
in the data center and the remote site to encapsulate all the bypassing
remote-to-main communications into an encrypted VPN tunnel. Alternatively,
Mirantis Container Cloud and MOSK can be configured to force
encryption of specific types of network traffic, such as:
Kubernetes networking for MOSK underlying Kubernetes
cluster that handles the vast majority of in-MOSK
communications
OpenStack tenant networking that carries all the cloud application traffic
The ability to enforce traffic encryption depends on the specific version of
the Mirantis Container Cloud and MOSK in use, as well as
the selected SDN backend for OpenStack.
In MOSK, the main cloud that controls remote computes can be
the regional site that hosts the regional cluster and the
MOSK control plane. Additionally, it can contain local
storage and compute nodes.
The remote computes implementation in MOSK considers
Tungsten Fabric as an SDN solution.
The bare metal servers of remote compute nodes are configured as Kubernetes
workers hosting the deployments for:
The architecture validation is performed by means of simultaneous creation of
multiple OpenStack resources of various types and execution of functional tests
against each resource. The number of resources hosted in the cluster at the
moment when a certain threshold of non-operational resources starts being
observed is referred to below as the cluster capacity limit.
Note
A successfully created resource has the Active status in the API
and passes the functional tests, for example, its floating IP address is
accessible. The MOSK cluster is considered to be able to
handle the created resources if it successfully performs the LCM operations
including the OpenStack services restart, both on the control and data
plane.
Note
The key limiting factor for creating more OpenStack objects in this
illustrative setup is hardware resources (vCPU and RAM) available on the
compute nodes.
Persistent storage is a key component of any MOSK
deployment. Out of the box, MOSK includes an open-source
software-defined storage solution (Ceph), which hosts various kinds of
cloud application data, such as root and ephemeral disks for virtual machines,
virtual machine images, attachable virtual block storage, and object data.
In addition, a Ceph cluster usually acts as a storage for the internal
MOSK components, such as Kubernetes, OpenStack, StackLight,
and so on.
Being distributed and redundant by design, Ceph requires a certain minimum
amount of servers, also known as OSD or storage nodes, to work.
A production-grade Ceph cluster typically consists of at least nine storage
nodes, while a development and test environment may include four to six
servers. For details, refer to MOSK cluster hardware requirements.
It is possible to reduce the overall footprint of a MOSK
cluster by collocating the Ceph components with hypervisors on the same
physical servers; this is also known as hyper-converged design. However,
this architecture still may not satisfy the requirements of certain use cases
for the cloud.
Standalone telco-edge MOSK clouds typically consist of
three to seven servers hosted in a single rack, where every piece of CPU,
memory, and disk resources is strictly accounted for and is better dedicated
to the cloud workloads rather than the control plane. For such clouds,
where the cluster footprint is more important than the resiliency of
the application data storage, it makes sense either not to have a Ceph
cluster at all or to replace it with some primitive non-redundant solution.
Enterprise virtualization infrastructure with third-party storage is
not a rare strategy among large companies that rely on proprietary storage
appliances, provided by NetApp, Dell, HPE, Pure Storage, and other major
players in the data storage sector. These industry leaders offer a variety
of storage solutions meticulously designed to suit various enterprise demands.
Many companies, having already invested substantially in proprietary storage
infrastructure, prefer integrating MOSK with their existing
storage systems. This approach allows them to leverage this investment rather
than incurring new costs and logistical complexities associated with
migrating to Ceph.
The MOSK standard LVM+iSCSI backend for the Block
Storage service. This aligns seamlessly with the hyper-converged
design, wherein the LVM volumes are collocated on the compute nodes.
Local file system of one of the MOSK controller
nodes. By default, database backups are stored on the local file
system on the node where the MariaDB service is running. This imposes
a risk to cloud security and resiliency. For enterprise environments,
it is a common requirement to store all the backup data externally.
Alternatively, you can disable the database backup functionality.
Results of functional testing
OpenStack Tempest
Local file system of MOSK controller nodes.
The openstack-tempest-run-tests job responsible for running
the Tempest suite stores the results of its execution in a volume
requested through the pvc-tempest PersistentVolumeClaim
(PVC). The subject volume can be created by the local volume provisioner
on the same Kubernetes worker node, where the job runs. Usually, it is
a MOSK controller node.
You can configure the Block Storage service (OpenStack Cinder)
to be used as a storage backend for images and snapshots.
In this case, each image is represented as a volume.
Important
Representing images as volumes implies a hard
requirement for the selected block storage backend to support
the multi-attach capability, that is, concurrent reads and writes to
and from a single volume.
External S3, Swift, or any other third-party storage solutions
compatible with object access protocols.
Note
An external object storage solution will not be integrated
into the MOSK identity service (OpenStack
Keystone); the cloud applications will need to manage
access to their object data themselves.
If no Ceph is deployed as part of a cluster, the MOSK
built-in Object Storage service API endpoints are disabled automatically.
StackLight must be deployed in the HA mode, in which all its data is
stored on the local file system of the nodes running StackLight
services. In this mode, StackLight components are configured
to handle the data replication themselves.
The determination of whether a MOSK cloud will
include Ceph or not should take place during its planning and design
phase. Once the deployment is complete, reconfiguring the cloud
to switch between Ceph and non-Ceph architectures becomes impossible.
Mirantis recommends avoiding substitution of Ceph-backed persistent volumes
in the MOSK underlying Kubernetes cluster with local
volumes (local volume provisioner) for production environments.
MOSK does not support such configuration unless
the components that rely on these volumes can replicate
their data themselves, for example, StackLight. Volumes provided by
the local volume provisioner are not redundant, as they are bound
to just a single node and can only be mounted from the Kubernetes
pods running on the same nodes.
This section describes internal implementation of the node maintenance API
and how OpenStack and Tungsten Fabric controllers communicate with LCM and
each other during a managed cluster update.
The WorkloadLock objects are created by each Application Controller.
These objects prevent LCM from performing any changes on the cluster or node
level while the lock is in the active state. The inactive state of the lock
means that the Application Controller has finished its work and the LCM can
proceed with the node or cluster maintenance.
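For illustration, a minimal sketch of such a lock object is shown below. The API group, field names, and the node name are assumptions based on the description above, not an exact manifest from the product:
apiVersion: lcm.mirantis.com/v1alpha1   # assumed API group
kind: NodeWorkloadLock
metadata:
  name: openstack-example-node          # hypothetical name
spec:
  nodeName: example-node                # hypothetical node name
  controllerName: openstack
status:
  state: active                         # becomes inactive once the controller finishes its work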
The MaintenanceRequest objects are created by LCM. These objects notify
Application Controllers about the upcoming maintenance of a cluster or
a specific node.
ClusterMaintenanceRequest object example configuration
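A minimal hedged sketch of such an object, assuming the lcm.mirantis.com/v1alpha1 API group; the object name is a placeholder, and the scope values are explained below:
apiVersion: lcm.mirantis.com/v1alpha1   # assumed API group
kind: ClusterMaintenanceRequest
metadata:
  name: example-cluster                 # hypothetical name
spec:
  scope: os                             # drain or os, see below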
The scope parameter in the object specification defines the impact on
the managed cluster or node. The list of available options includes:
drain
A regular managed cluster update. Each node in the cluster
goes through a drain procedure. No node reboot takes place; the maximum
impact is a restart of services on the node, including Docker, which causes
the restart of all containers present in the cluster.
os
A node might be rebooted during the update. Triggers the workload
evacuation by the OpenStack Controller (Rockoon).
When the MaintenanceRequest object is created, an Application Controller
executes a handler to prepare workloads for maintenance and put appropriate
WorkloadLock objects into the inactive state.
When maintenance is over, LCM removes the MaintenanceRequest objects,
and the Application Controllers move their WorkloadLock objects into
the active state.
When LCM creates the ClusterMaintenanceRequest object, the OpenStack
Controller (Rockoon) ensures that all OpenStack components are in the
Healthy state, which means that the pods are up and running, and the
readiness probes are passing.
When LCM creates the NodeMaintenanceRequest, the OpenStack Controller:
Prepares components on the node for maintenance by removing
nova-compute from scheduling.
If the reboot of a node is possible, the instance migration workflow is
triggered. The Operator can configure the instance migration flow
through the Kubernetes node annotation and should define the required option
before the managed cluster update. For configuration details, refer to
Instance migration configuration for hosts.
Also, since MOSK 25.1, cloud users can mark their
instances for LCM to handle them individually during host maintenance
operations. This allows for greater flexibility during cluster updates,
especially for workloads that are sensitive to live migration. For
details, refer to Configure per-instance migration mode.
If the OpenStack Controller cannot migrate instances due to errors, the
node update is suspended until all instances are migrated manually or
the openstack.lcm.mirantis.com/instance_migration_mode annotation
is set to skip.
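For example, a hedged illustration of setting this annotation on a Kubernetes node with kubectl; the node name is a placeholder:
kubectl annotate node <node-name> openstack.lcm.mirantis.com/instance_migration_mode=skip --overwrite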
When the node maintenance is over, LCM removes the NodeMaintenanceRequest
object and the OpenStack Controller:
Verifies that the Kubernetes Node becomes Ready.
Verifies that all OpenStack components on a given node are Healthy,
which means that the pods are up and running, and the readiness probes
are passing.
Ensures that the OpenStack components are connected to RabbitMQ.
For example, the Neutron Agents become alive on the node, and compute
instances are in the UP state.
Note
The OpenStack Controller allows only one
NodeWorkloadLock object at a time to be in the inactive state. Therefore,
the update process for nodes is sequential.
The Tungsten Fabric (TF) Controller creates and uses both types of
workloadlocks that include ClusterWorkloadLock and NodeWorkloadLock.
When the ClusterMaintenanceRequest object is created, the TF Controller
verifies the TF cluster health status and proceeds as follows:
If the cluster is Ready, the TF Controller moves the
ClusterWorkloadLock object to the inactive state.
Otherwise, the TF Controller keeps the ClusterWorkloadLock object
in the active state.
When the NodeMaintenanceRequest object is created, the TF Controller
verifies the vRouter pod state on the corresponding node and proceeds as
follows:
If all containers are Ready, the TF Controller moves the
NodeWorkloadLock object to the inactive state.
Otherwise, the TF Controller keeps the NodeWorkloadLock in the active
state.
Note
If there is a NodeWorkloadLock object in the inactive state
present in the cluster, the TF Controller does not process the
NodeMaintenanceRequest object for other nodes until this inactive
NodeWorkloadLock object becomes active.
When the cluster LCM removes the MaintenanceRequest object, the TF
Controller waits for the vRouter pods to become ready and proceeds as follows:
If all containers are in the Ready state, the TF Controller moves
the NodeWorkloadLock object to the active state.
Otherwise, the TF Controller keeps the NodeWorkloadLock object in the
inactive state.
This section describes the MOSK cluster update
flow for the product releases that contain major changes and require a node
reboot, for example, support for a new Linux kernel.
The diagram below illustrates the sequence of operations controlled by
LCM that take place during the update under the hood. We assume that the
ClusterWorkloadLock and NodeWorkloadLock objects present in the cluster
are in the active state before the cloud operator triggers the update.
Cluster update flow
See also
For details about the Application Controllers flow during different
maintenance stages, refer to:
Since MOSK 25.1, the OpenStack Controller has been open-sourced under the
name Rockoon and is maintained as an independent open-source project
going forward.
As part of this transition, the openstack-controller pods are named
rockoon pods across the MOSK documentation and deployments. This change
does not affect functionality, but users should adopt the new naming for
pods and other related artifacts.
MOSK enables you to parallelize node update operations,
significantly improving the efficiency of your deployment. This capability
applies to any operation that utilizes the Node Maintenance API, such as
cluster updates or graceful node reboots.
The core implementation of parallel updates is handled by the LCM Controller
ensuring seamless execution of parallel operations. LCM starts performing an
operation on the node only when all NodeWorkloadLock objects for the node
are marked as inactive. By default, the LCM Controller creates one
NodeMaintenanceRequest at a time.
Each application controller, including Ceph, OpenStack, and Tungsten Fabric
Controllers, manages parallel NodeMaintenanceRequest objects independently.
The controllers determine how to handle and execute parallel node maintenance
requests based on specific requirements of their respective applications.
To understand the workflow of the Node Maintenance API, refer to
WorkloadLock objects.
You can optimize parallel updates by setting the order in which nodes are
updated. You can accomplish this by configuring upgradeIndex of
the Machine object. For the procedure, refer to
Change the upgrade order of a machine.
Increase parallelism.
Boost parallelism by adjusting the maximum number of worker node updates
that are allowed during LCM operations using the
spec.providerSpec.value.maxWorkerUpgradeCount configuration parameter,
which is set to 1 by default.
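For illustration, a minimal fragment of the Cluster object that sets this parameter; the value 3 is an arbitrary example:
spec:
  providerSpec:
    value:
      maxWorkerUpgradeCount: 3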
By default, the OpenStack Controller handles the NodeMaintenanceRequest
objects as follows:
Updates the OpenStack controller nodes sequentially (one by one).
Updates the gateway nodes sequentially. Technically, you can increase
the number of gateway node updates allowed in parallel using the
nwl_parallel_max_gateway parameter, but Mirantis does not recommend
doing so.
Updates the compute nodes in parallel. The default number of allowed
parallel updates is 30. You can adjust this value through
the nwl_parallel_max_compute parameter.
Parallelism considerations for compute nodes
When considering parallelism for compute nodes, take into account that
during certain pod restarts, for example, the openvswitch-vswitchd
pods, a brief instance downtime may occur. Select a suitable level
of parallelism to minimize the impact on workloads and prevent excessive
load on the control plane nodes.
If your cloud environment is distributed across failure domains, which are
represented by Nova availability zones, you can limit the parallel updates
of nodes to only those within the same availability zone. This behavior is
controlled by the respect_nova_az option in the OpenStack Controller.
The OpenStack Controller configuration is stored in the
rockoon-config configMap of the osh-system namespace.
The options are picked up automatically after update. To learn more about
the OpenStack Controller (Rockoon) configuration parameters,
refer to OpenStack Controller configuration.
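As a hedged illustration only, you can inspect and edit this ConfigMap with kubectl; the data layout below (key and section names) is an assumption, so refer to OpenStack Controller configuration for the authoritative format:
kubectl -n osh-system edit configmap rockoon-config

# Illustrative fragment of the settings discussed above (layout is an assumption):
# [maintenance]
# nwl_parallel_max_compute = 10
# respect_nova_az = false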
By default, the Ceph Controller handles the NodeMaintenanceRequest
objects as follows:
Updates the non-storage nodes sequentially. Non-storage nodes include all
nodes that have mon, mgr, rgw, or mds roles.
Updates storage nodes in parallel. The default number of allowed
parallel updates is calculated automatically based on the minimal
failure domain in a Ceph cluster.
Parallelism calculations for storage nodes
The Ceph Controller automatically calculates the parallelism number
in the following way:
Finds the minimal failure domain for a Ceph cluster. For example,
the minimal failure domain is rack.
Filters all currently requested nodes by the minimal failure domain.
For example, parallelism equals 5, and LCM requests 3 nodes from
the rack1 rack and 2 nodes from the rack2 rack.
Handles each filtered node group one by one. For example, the controller
handles in parallel all nodes from rack1 before processing nodes
from rack2.
The Ceph Controller handles non-storage nodes before the storage
ones. If there are node requests for both node types, the Ceph Controller
handles sequentially the non-storage nodes first. Therefore, Mirantis
recommends setting the upgrade index of a higher priority for the non-storage
nodes to decrease the total upgrade time.
If the minimal failure domain is host, the Ceph Controller updates only
one storage node per failure domain unit. This results in updating all Ceph
nodes sequentially, despite the potential for increased parallelism.
By default, the Tungsten Fabric Controller handles the
NodeMaintenanceRequest objects as follows:
Updates the Tungsten Fabric Controller and gateway nodes sequentially.
Updates the vRouter nodes in parallel. The Tungsten Fabric Controller
allows updating up to 30 vRouter nodes in parallel.
Maximum number of vRouter nodes in maintenance
While the Tungsten Fabric Controller has the capability to process up
to 30 NodeMaintenanceRequest objects targeted at vRouter nodes,
the actual number may be lower. This is due to a check that ensures
OpenStack readiness to unlock the relevant nodes for maintenance.
If OpenStack allows for maintenance, the Tungsten Fabric Controller
verifies the vRouter pods. Upon successful verification,
the NodeWorkloadLock object is switched to the maintenance mode.
Mirantis OpenStack for Kubernetes (MOSK) enables the operator to
create, scale, update, and upgrade OpenStack deployments on Kubernetes through
a declarative API.
The Kubernetes built-in features, such as flexibility, scalability, and
declarative resource definition make MOSK a robust solution.
The detailed plan of any Mirantis OpenStack for Kubernetes (MOSK)
deployment is determined on a per-cloud basis. For the MOSK
reference architecture and design overview, see Reference Architecture.
One of the industry best practices is to verify every new update or
configuration change in a non-customer-facing environment before
applying it to production. Therefore, Mirantis recommends
having a staging cloud, deployed and maintained along with the production
clouds. The recommendation is especially applicable to the environments
that:
Receive updates often and use continuous delivery. For example,
any non-isolated deployment of Mirantis Container Cloud.
Have significant deviations from the reference architecture or
third party extensions installed.
Are managed under the Mirantis OpsCare program.
Run business-critical workloads where even the slightest application
downtime is unacceptable.
A typical staging cloud is a complete copy of the production environment
including the hardware and software configurations, but with a bare minimum
of compute and storage capacity.
The bare metal management system enables the Infrastructure Operator to
deploy a Container Cloud management cluster on a set of bare metal servers.
It also enables Container Cloud to deploy MOSK clusters on
bare metal servers without a pre-provisioned operating system.
This section instructs you on how to provision and deploy a Container Cloud
management cluster.
Mirantis Container Cloud Bootstrap v2 provides the best user experience for
setting up Container Cloud. Using Bootstrap v2, you can provision and operate
management clusters using the required objects through the Container Cloud API.
Basic concepts and components of Bootstrap v2 include:
Bootstrap cluster
Bootstrap cluster is any kind-based Kubernetes cluster that contains a
minimal set of Container Cloud bootstrap components allowing the user to
prepare the configuration for management cluster deployment and start the
deployment. The list of these components includes:
Bootstrap Controller
Controller that is responsible for:
Configuration of a bootstrap cluster with provider charts through the
bootstrap Helm bundle.
Configuration and deployment of a management cluster and
its related objects.
Helm Controller
Operator that manages Helm chart releases. It installs the Container
Cloud bootstrap and provider charts configured in the bootstrap Helm
bundle.
Public API charts
Helm charts that contain custom resource definitions for Container Cloud
resources.
Admission Controller
Controller that performs mutations and validations for the Container
Cloud resources including cluster and machines configuration.
Currently, one bootstrap cluster can be used to deploy only one
management cluster. To add a new management cluster with
different settings, a new bootstrap cluster must be created from scratch.
Bootstrap region
BootstrapRegion is the first object to create in the bootstrap cluster
for the Bootstrap Controller to identify and install provider components
onto the bootstrap cluster. Afterward, the user can prepare and deploy a
management cluster with related resources.
The bootstrap region is a starting point for the cluster deployment. The
user needs to approve the BootstrapRegion object. Otherwise, the
Bootstrap Controller will not be triggered for the cluster deployment.
Bootstrap Helm bundle
Helm bundle that contains charts configuration for the bootstrap cluster.
This object is managed by the Bootstrap Controller that updates the provider
bundle in the BootstrapRegion object. The Bootstrap Controller always
configures provider charts listed in the regional section of the
Container Cloud release for the provider. Depending on the cluster
configuration, the Bootstrap Controller may update or reconfigure this
bundle even after the cluster deployment starts. For example, the Bootstrap
Controller enables the provider in the bootstrap cluster only after the
bootstrap region is approved for the deployment.
Management cluster deployment consists of several sequential stages.
Each stage finishes when a specific condition is met or specific configuration
applies to a cluster or its machines.
In case of issues at any deployment stage, you can identify the problem
and fix it on the fly. The cluster deployment does not abort until all
stages complete, thanks to the infinite-timeout option enabled
by default in Bootstrap v2.
Infinite timeout prevents the bootstrap failure due to timeout. This option
is useful in the following cases:
The network speed is slow for artifacts downloading
The infrastructure configuration does not allow fast booting
The inspection of a bare metal node presupposes more than two HDD/SATA disks
attached to a machine
You can track the status of each stage in the bootstrapStatus section of
the Cluster object that is updated by the Bootstrap Controller.
The Bootstrap Controller starts deploying the cluster after you approve the
BootstrapRegion configuration.
The following table describes deployment states of a management cluster that
apply in the strict order.
Verifies proxy configuration in the Cluster object.
If the bootstrap cluster was created without a proxy, no actions are
applied to the cluster.
2
ClusterSSHConfigured
Verifies SSH configuration for the cluster and machines.
You can provide any number of SSH public keys, which are added to
cluster machines. But the Bootstrap Controller always adds the
bootstrap-key SSH public key to the cluster configuration. The
Bootstrap Controller uses this SSH key to manage the lcm-agent
configuration on cluster machines.
The bootstrap-key SSH key is copied to a
bootstrap-key-<clusterName> object containing the cluster name in
its name.
3
ProviderUpdatedInBootstrap
Synchronizes the provider and settings of its components between the
Cluster object and bootstrap Helm bundle. Settings provided in
the cluster configuration have higher priority than the default
settings of the bootstrap cluster, except CDN.
4
ProviderEnabledInBootstrap
Enables the provider and its components if any were disabled by the
Bootstrap Controller during preparation of the bootstrap region.
A cluster and machines deployment starts after the provider enablement.
5
Nodes readiness
Waits for the provider to complete nodes deployment that comprises VMs
creation and MKE installation.
6
ObjectsCreated
Creates required namespaces and IAM secrets.
7
ProviderConfigured
Verifies the provider configuration in the provisioned cluster.
8
HelmBundleReady
Verifies the Helm bundle readiness for the provisioned cluster.
9
ControllersDisabledBeforePivot
Collects the list of deployment controllers and disables them to
prepare for pivot.
10
PivotDone
Moves all cluster-related objects from the bootstrap cluster to the
provisioned cluster. The copies of Cluster and Machine objects
remain in the bootstrap cluster to provide the status information to the
user. About every minute, the Bootstrap Controller reconciles the status
of the Cluster and Machine objects of the provisioned cluster
to the bootstrap cluster.
11
ControllersEnabledAfterPivot
Enables controllers in the provisioned cluster.
12
MachinesLCMAgentUpdated
Updates the lcm-agent configuration on machines to target LCM
agents to the provisioned cluster.
13
HelmControllerDisabledBeforeConfig
Disables the Helm Controller before reconfiguration.
14
HelmControllerConfigUpdated
Updates the Helm Controller configuration for the provisioned cluster.
15
Cluster readiness
Contains information about the global cluster status. The Bootstrap
Controller verifies that OIDC, Helm releases, and all Deployments are
ready. Once the cluster is ready, the Bootstrap Controller stops
managing the cluster.
The setup of a bootstrap cluster comprises preparation of the seed node,
configuration of environment variables, acquisition of the Container Cloud
license file, and execution of the bootstrap script.
Install basic Ubuntu 22.04 server using standard installation images
of the operating system on the bare metal seed node.
Log in to the seed node that is running Ubuntu 22.04.
Configure the operating system and network:
Operating system and network configuration
Establish a virtual bridge using an IP address of the PXE network on the
seed node. Use the following netplan-based configuration file
as an example:
# cat /etc/netplan/config.yaml
network:
  version: 2
  renderer: networkd
  ethernets:
    ens3:
      dhcp4: false
      dhcp6: false
  bridges:
    br0:
      addresses:
        # Replace with IP address from PXE network to create a virtual bridge
        - 10.0.0.15/24
      dhcp4: false
      dhcp6: false
      # Adjust for your environment
      gateway4: 10.0.0.1
      interfaces:
        # Interface name may be different in your environment
        - ens3
      nameservers:
        addresses:
          # Adjust for your environment
          - 8.8.8.8
      parameters:
        forward-delay: 4
        stp: false
Apply the new network configuration using netplan:
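For example, run the standard netplan command (sudo may be required depending on your user):
sudo netplan apply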
If you require all Internet access to go through a proxy server
for security and audit purposes, configure Docker proxy settings
as described in the official
Docker documentation.
To verify that Docker is configured correctly and has access to Container
Cloud CDN:
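One possible check, assuming the binary.mirantis.com CDN endpoint, is to pull and run a small image that queries the CDN:
docker run --rm alpine sh -c "apk add --no-cache curl && curl -sSI https://binary.mirantis.com"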
Verify that the seed node has direct access to the Baseboard
Management Controller (BMC) of each bare metal host. All target
hardware nodes must be in the poweroff state.
KAAS_BM_PXE_IP
The provisioning IP address in the PXE network. This address will be
assigned on the seed node to the interface defined by the
KAAS_BM_PXE_BRIDGE parameter described below. The PXE service
of the bootstrap cluster uses this address to network boot
bare metal hosts.
172.16.59.5
KAAS_BM_PXE_MASK
The PXE network address prefix length to be used with the
KAAS_BM_PXE_IP address when assigning it to the seed node
interface.
24
KAAS_BM_PXE_BRIDGE
The PXE network bridge name that must match the name of the bridge
created on the seed node during preparation of the system and
network configuration described earlier in this procedure.
br0
Optional. Configure proxy settings to bootstrap the cluster using proxy:
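A hedged example of the conventional proxy environment variables to export on the seed node before running the bootstrap script; the proxy URL and exclusions are placeholders:
export HTTP_PROXY=http://proxy.example.com:3128
export HTTPS_PROXY=http://proxy.example.com:3128
export NO_PROXY=10.0.0.0/24,localhost,127.0.0.1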
After the bootstrap cluster is set up, the bootstrap-proxy object is
created with the provided proxy settings. You can use this object later for
the Cluster object configuration.
Deploy the bootstrap cluster:
./bootstrap.sh bootstrapv2
Make sure that port 80 is open for localhost on the seed node and is not
blocked by security restrictions.
This section contains an overview of the cluster-related objects along with
the configuration procedure of these objects during deployment of a
management cluster using Bootstrap v2 through the Container Cloud API.
Overview of the cluster-related objects in the Container Cloud API/CLI
The following cluster-related objects are available through the Container
Cloud API. Use these objects to deploy a management cluster using the
Container Cloud API.
BootstrapRegion
Region and provider names for a management cluster and all related
objects. First object to create in the bootstrap cluster. For
the bootstrap region definition, see Introduction.
SSHKey
Optional. SSH configuration with any number of SSH public keys to be
added to cluster machines.
By default, any bootstrap cluster has a pregenerated bootstrap-key
object to use for the cluster configuration. This is the service SSH key
used by the Bootstrap Controller to access machines for their
deployment. The private part of bootstrap-key is always saved to
kaas-bootstrap/ssh_key.
Proxy
Proxy configuration. Mandatory for offline environments with no direct
access to the Internet. Such configuration usually contains proxy for
the bootstrap cluster and already has the bootstrap-proxy object
to use in the cluster configuration by default.
Machine
Machine configuration that must fit the following requirements:
Role - only manager
Number - odd for the management cluster HA
Mandatory labels - provider and cluster-name
ServiceUser
Service user is the initial user to create in Keycloak for
access to a newly deployed management cluster. By default, it has the
global-admin, operator (namespaced), and bm-pool-operator
(namespaced) roles.
You can delete serviceuser after setting up other required users with
specific roles or after any integration with an external identity provider,
such as LDAP.
BareMetalHost: Private API since MCC 2.29.0 (16.4.0)
Before update of the management cluster to Container Cloud 2.29.0
(Cluster release 16.4.0), instead of BareMetalHostInventory, use the
BareMetalHost object. For details, see Container Cloud API Reference:
BareMetalHost resource.
Caution
While the Cluster release of the management cluster is 16.4.0,
BareMetalHostInventory operations are allowed to
m:kaas@management-admin only. This limitation is lifted once the
management cluster is updated to the Cluster release 16.4.1 or later.
L2Template
Advanced host networking configuration for clusters, which enables, for
example, creation of bond interfaces on top of physical interfaces on the
host or the use of multiple subnets to separate different types of network
traffic. For details, see Container Cloud API Reference: L2Template.
MetalLBConfigTemplate: Unsupported since MCC 2.28.0 (16.3.0)
Deprecated in Container Cloud 2.27.0 (Cluster releases 17.2.0 and 16.2.0)
and unsupported since Container Cloud 2.28.0 (Cluster releases 17.3.0 and
16.3.0). Before Container Cloud 2.27.0, the default object for the MetalLB
configuration, which enables the use of Subnet objects to define MetalLB
IP address pools. For details, see Container Cloud API Reference: MetalLBConfigTemplate.
The following procedure describes how to prepare and deploy a management
cluster using Bootstrap v2 by operating YAML templates available in the
kaas-bootstrap/templates/ folder.
The kubectl apply command automatically saves the
applied data as plain text into the
kubectl.kubernetes.io/last-applied-configuration annotation of the
corresponding object. This may result in revealing sensitive data in this
annotation when creating or modifying objects containing credentials.
Such Container Cloud objects include:
BareMetalHostCredential
ClusterOIDCConfiguration
License
Proxy
ServiceUser
TLSConfig
Therefore, do not use kubectl apply on these objects.
Use kubectl create, kubectl patch, or
kubectl edit instead.
If you used kubectl apply on these objects, you
can remove the kubectl.kubernetes.io/last-applied-configuration
annotation from the objects using kubectl edit.
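For instance, a hedged illustration using a file generated from the templates; the file name is an example:
kubectl create -f serviceusers.yaml
# If kubectl apply was used earlier, remove the annotation interactively:
kubectl edit -f serviceusers.yaml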
Create the BootstrapRegion object by modifying
bootstrapregion.yaml.template.
Configuration of bootstrapregion.yaml.template
Set provider: baremetal and use the default <regionName>,
which is region-one.
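A minimal sketch of the resulting object, assuming the kaas.mirantis.com/v1alpha1 API group used by other Container Cloud resources:
apiVersion: kaas.mirantis.com/v1alpha1   # assumed API group
kind: BootstrapRegion
metadata:
  name: region-one
  namespace: default
spec:
  provider: baremetal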
Create the ServiceUser object by modifying
serviceusers.yaml.template.
Configuration of serviceusers.yaml.template
Service user is the initial user to create in Keycloak for
access to a newly deployed management cluster. By default, it has the
global-admin, operator (namespaced), and bm-pool-operator
(namespaced) roles.
You can delete serviceuser after setting up other required users with
specific roles or after any integration with an external identity provider,
such as LDAP.
Inspect the default bare metal host profile definition in
baremetalhostprofiles.yaml.template and adjust it to
fit your hardware configuration. For details,
see Customize the default bare metal host profile.
Warning
All data will be wiped during cluster deployment on devices
defined directly or indirectly in the fileSystems list of
BareMetalHostProfile. For example:
A raw device partition with a file system on it
A device partition in a volume group with a logical volume that has a
file system on it
An mdadm RAID device with a file system on it
An LVM RAID device with a file system on it
The wipe field is always considered true for these devices.
The false value is ignored.
Therefore, to prevent data loss, move the necessary data from these file
systems to another server beforehand, if required.
In baremetalhostinventory.yaml.template, update the
bare metal host definitions according to your environment
configuration. Use the reference table below to manually set all
parameters that start with SET_.
Mandatory parameters for a bare metal host template
The MAC address of the first master node in the PXE network.
ac:1f:6b:02:84:71
SET_MACHINE_0_BMC_ADDRESS
The IP address of the BMC endpoint for the first master node in
the cluster. Must be an address from the OOB network
that is accessible through the management network gateway.
The MAC address of the second master node in the PXE network.
ac:1f:6b:02:84:72
SET_MACHINE_1_BMC_ADDRESS
The IP address of the BMC endpoint for the second master node
in the cluster. Must be an address from the OOB network that is
accessible through the management network gateway.
The MAC address of the third master node in the PXE network.
ac:1f:6b:02:84:73
SET_MACHINE_2_BMC_ADDRESS
The IP address of the BMC endpoint for the third master node in
the cluster. Must be an address from the OOB network
that is accessible through the management network gateway.
The parameter requires a user name and password in plain
text.
Configure cluster network:
Important
Bootstrap V2 supports only separated PXE and LCM networks.
Update the network object definition in ipam-objects.yaml.template
according to the environment configuration. By default, this template
implies the use of separate PXE and life-cycle management (LCM) networks.
Manually set all parameters that start with SET_.
To ensure successful bootstrap, enable asymmetric routing on the interfaces
of the management cluster nodes. This is required because the seed node relies
on one network by default, which can potentially cause traffic asymmetry.
In the kernelParameters section of baremetalhostprofiles.yaml.template,
set rp_filter to 2. This enables loose mode as defined in
RFC3704.
Example configuration of asymmetric routing
...
kernelParameters:
  ...
  sysctl:
    # Enables the "Loose mode" for the "k8s-lcm" interface (management network)
    net.ipv4.conf.k8s-lcm.rp_filter: "2"
    # Enables the "Loose mode" for the "bond0" interface (PXE network)
    net.ipv4.conf.bond0.rp_filter: "2"
...
Note
More complicated solutions that are not described in this manual
include getting rid of traffic asymmetry, for example:
Configure source routing on management cluster nodes.
Plug the seed node into the same networks as the management cluster nodes,
which requires custom configuration of the seed node.
For configuration details of bond network interface for the PXE and
management network, see Configure NIC bonding.
Example of the default L2 template snippet for a management
cluster
In this example, the following configuration applies:
A bond of two NIC interfaces
A static address in the PXE network set on the bond
An isolated L2 segment for the LCM network is configured using
the k8s-lcm VLAN with the static address in the LCM network
The default gateway address is in the LCM network
For general concepts of configuring separate PXE and LCM networks for
a management cluster, see Separate PXE and management networks. For current object
templates and variable names to use, see the following tables.
Network parameters mapping overview
Deployment file name
Parameters list to update manually
ipam-objects.yaml.template
SET_LB_HOST
SET_MGMT_ADDR_RANGE
SET_MGMT_CIDR
SET_MGMT_DNS
SET_MGMT_NW_GW
SET_MGMT_SVC_POOL
SET_PXE_ADDR_POOL
SET_PXE_ADDR_RANGE
SET_PXE_CIDR
SET_PXE_SVC_POOL
SET_VLAN_ID
bootstrap.env
KAAS_BM_PXE_IP
KAAS_BM_PXE_MASK
KAAS_BM_PXE_BRIDGE
Mandatory network parameters of the IPAM object template
The following table contains examples of mandatory parameter values to
set in ipam-objects.yaml.template for the network scheme that has the
following networks:
172.16.59.0/24 - PXE network
172.16.61.0/25 - LCM network
Parameter
Description
Example value
SET_PXE_CIDR
The IP address of the PXE network in the CIDR notation. The minimum
recommended network size is 256 addresses (/24 prefix length).
172.16.59.0/24
SET_PXE_SVC_POOL
The IP address range to use for endpoints of load balancers in the PXE
network for the Container Cloud services: Ironic-API, DHCP server,
HTTP server, and caching server. The minimum required range size is
5 addresses.
172.16.59.6-172.16.59.15
SET_PXE_ADDR_POOL
The IP address range in the PXE network to use for dynamic address
allocation for hosts during inspection and provisioning.
The minimum recommended range size is 30 addresses for management
cluster nodes if it is located in a separate PXE network segment.
Otherwise, it depends on the number of managed cluster nodes to
deploy in the same PXE network segment as the management cluster nodes.
172.16.59.51-172.16.59.200
SET_PXE_ADDR_RANGE
The IP address range in the PXE network to use for static address
allocation on each management cluster node. The minimum recommended
range size is 6 addresses.
172.16.59.41-172.16.59.50
SET_MGMT_CIDR
The IP address of the LCM network for the management cluster
in the CIDR notation.
If managed clusters will have their separate LCM networks, those
networks must be routable to the LCM network. The minimum
recommended network size is 128 addresses (/25 prefix length).
172.16.61.0/25
SET_MGMT_NW_GW
The default gateway address in the LCM network. This gateway
must provide access to the OOB network of the Container Cloud cluster
and to the Internet to download the Mirantis artifacts.
172.16.61.1
SET_LB_HOST
The IP address of the externally accessible MKE API endpoint
of the cluster in the CIDR notation. This address must be within
the management SET_MGMT_CIDR network but must NOT overlap
with any other addresses or address ranges within this network.
External load balancers are not supported.
172.16.61.5/32
SET_MGMT_DNS
An external (non-Kubernetes) DNS server accessible from the
LCM network.
8.8.8.8
SET_MGMT_ADDR_RANGE
The IP address range that includes addresses to be allocated to
bare metal hosts in the LCM network for the management cluster.
When this network is shared with managed clusters, the size of this
range limits the number of hosts that can be deployed in all clusters
sharing this network.
When this network is solely used by a management cluster, the range
must include at least 6 addresses for bare metal hosts of the
management cluster.
172.16.61.30-172.16.61.40
SET_MGMT_SVC_POOL
The IP address range to use for the externally accessible endpoints
of load balancers in the LCM network for the Container Cloud
services, such as Keycloak, web UI, and so on. The minimum required
range size is 19 addresses.
172.16.61.10-172.16.61.29
SET_VLAN_ID
The VLAN ID used for isolation of LCM network. The
bootstrap.sh process and the seed node must have routable
access to the network in this VLAN.
3975
While using separate PXE and LCM networks, the management cluster
services are exposed in different networks using two separate MetalLB
address pools:
Services exposed through the PXE network are as follows:
Ironic API as a bare metal provisioning server
HTTP server that provides images for network boot and server
provisioning
Caching server for accessing the Container Cloud artifacts
deployed on hosts
Services exposed through the LCM network are all other
Container Cloud services, such as Keycloak, web UI, and so on.
The default MetalLB configuration described in the MetalLBConfig
object template of metallbconfig.yaml.template uses two separate MetalLB
address pools. Also, it uses the interfaces selector in its
l2Advertisements template.
Caution
When you change the L2Template object template in
ipam-objects.yaml.template, ensure that interfaces
listed in the interfaces field of the
MetalLBConfig.spec.l2Advertisements section match those used in
your L2Template. For details about the interfaces selector,
see Container Cloud API Reference: MetalLBConfig spec.
Update the cluster-related settings to fit your deployment.
Optional. Technology Preview. Deprecated since Container Cloud 2.29.0
(Cluster release 16.4.0). Enable WireGuard for traffic encryption on
the Kubernetes workloads network.
WireGuard configuration
Ensure that the Calico MTU size is at least 60 bytes smaller than the
interface MTU size of the workload network. IPv4 WireGuard uses a
60-byte header. For details, see Set the MTU size for Calico.
In cluster.yaml.template, enable WireGuard by adding
the secureOverlay parameter:
spec:
  ...
  providerSpec:
    value:
      ...
      secureOverlay: true
Caution
Changing this parameter on a running cluster causes a
downtime that can vary depending on the cluster size.
Adjust spec and labels sections of each entry according to your
deployment.
Adjust the spec.providerSpec.value.hostSelector values to match
BareMetalHostInventory corresponding to each machine. For details,
see Container Cloud API Reference: Machine spec.
Monitor the inspecting process of the baremetal hosts and wait until all
hosts are in the available state:
kubectl get bmh -o go-template='{{- range .items -}} {{.status.provisioning.state}}{{"\n"}} {{- end -}}'
Example of system response:
available
available
available
Monitor the BootstrapRegion object status and wait until it is ready.
For a more user-friendly system response, consider using dedicated tools
such as jq or yq and adjust the -o flag to output
in the json or yaml format accordingly.
Change the directory to /kaas-bootstrap/.
Approve the BootstrapRegion object to start the cluster deployment:
./container-cloud bootstrap approve all
Caution
Once you approve the BootstrapRegion object, no cluster or
machine modification is allowed.
Warning
Do not manually restart or power off any of the bare metal
hosts during the bootstrap process.
Not all Swarm and MCR addresses are usually in use. One Swarm Ingress
network is created by default and occupies the 10.0.0.0/24 address
block. Also, three MCR networks are created by default and occupy
three address blocks: 10.99.0.0/20, 10.99.16.0/20,
10.99.32.0/20.
To verify the actual networks state and addresses in use, run:
docker network ls
docker network inspect <networkName>
Optional. If you plan to use multiple L2 segments for provisioning of
managed cluster nodes, consider the requirements specified in
Configure multiple DHCP address ranges.
Before update of the management cluster to Container Cloud 2.29.0
(Cluster release 16.4.0), instead of BareMetalHostInventory, use the
BareMetalHost object. For details, see Container Cloud API Reference:
BareMetalHost resource.
Caution
While the Cluster release of the management cluster is 16.4.0,
BareMetalHostInventory operations are allowed to
m:kaas@management-admin only. This limitation is lifted once the
management cluster is updated to the Cluster release 16.4.1 or later.
Before adding new BareMetalHostInventory objects, configure hardware hosts
to correctly boot them over the PXE network.
Important
Consider the following common requirements for hardware hosts
configuration:
Update firmware for BIOS and Baseboard Management Controller (BMC) to the
latest available version, especially if you are going to apply the UEFI
configuration.
Container Cloud uses the ipxe.efi binary loader that might not be
compatible with old firmware and might have vendor-related issues with UEFI
booting. For example, the Supermicro issue.
In this case, we recommend using the legacy booting format.
Configure all NICs, or at least the PXE NIC, on the switches.
If the hardware host has more than one PXE NIC to boot, we strongly
recommend setting up only one in the boot order. It speeds up the
provisioning phase significantly.
Some hardware vendors require a host to be rebooted during BIOS
configuration changes from legacy to UEFI or vice versa for the
extra option with NIC settings to appear in the menu.
Connect only one Ethernet port on a host to the PXE network at any given
time. Collect the physical address (MAC) of this interface and use it to
configure the BareMetalHostInventory object describing the host.
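As a hedged illustration, the collected MAC address typically ends up in the boot MAC field of the host definition; the field path and names below are assumptions, so verify them against baremetalhostinventory.yaml.template:
kind: BareMetalHostInventory
metadata:
  name: master-0                        # hypothetical host name
spec:
  bootMACAddress: ac:1f:6b:02:84:71     # MAC of the NIC connected to the PXE network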
To configure BIOS on a bare metal host:
Legacy hardware host configuration
Enable the global BIOS mode using
BIOS > Boot > boot mode select > legacy. Reboot the host
if required.
Enable the LAN-PXE-OPROM support using the following menus:
This section provides description of the bare metal host profile settings and
provides instructions on how to configure this profile before deploying
Mirantis Container Cloud on physical servers.
The bare metal host profile is a Kubernetes custom resource. It allows the
infrastructure operator to define how the storage devices and the operating
system are provisioned and configured.
The bootstrap templates for a bare metal deployment include the template for
the default BareMetalHostProfile object in the following file that defines
the default bare metal host profile:
templates/bm/baremetalhostprofiles.yaml.template
Note
Using BareMetalHostProfile, you can configure LVM or mdadm-based
software RAID support during a management or managed cluster creation. For
details, see Configure RAID support.
Warning
All data will be wiped during cluster deployment on devices
defined directly or indirectly in the fileSystems list of
BareMetalHostProfile. For example:
A raw device partition with a file system on it
A device partition in a volume group with a logical volume that has a
file system on it
An mdadm RAID device with a file system on it
An LVM RAID device with a file system on it
The wipe field is always considered true for these devices.
The false value is ignored.
Therefore, to prevent data loss, move the necessary data from these file
systems to another server beforehand, if required.
The customization procedure of BareMetalHostProfile is almost the same for
the management and managed clusters, with the following differences:
For a management cluster, the customization automatically applies to machines
during bootstrap. And for a managed cluster, you apply the changes using
kubectl before creating a managed cluster.
For a management cluster, you edit the default
baremetalhostprofiles.yaml.template. And for a managed cluster, you
create a new BareMetalHostProfile with the necessary configuration.
For the procedure details, see Create a custom bare metal host profile.
Use this procedure for both types of clusters considering the differences
described above.
You can configure L2 templates for the management cluster to set up a bond
network interface for the PXE and management network.
This configuration must be applied to the bootstrap templates, before you run
the bootstrap script to deploy the management cluster.
Configuration requirements for NIC bonding
Add at least two physical interfaces to each host in your management
cluster.
Connect at least two interfaces per host to an Ethernet switch that supports
Link Aggregation Control Protocol (LACP) port groups and LACP fallback.
Configure an LACP group on the ports connected to the NICs of a host.
Configure the LACP fallback on the port group to ensure that the host can
boot over the PXE network before the bond interface is set up on the host
operating system.
Configure server BIOS for both NICs of a bond to be PXE-enabled.
If the server does not support booting from multiple NICs, configure the
port of the LACP group that is connected to the PXE-enabled NIC of a server
to be the primary port. With this setting, the port becomes active in the
fallback mode.
Configure the ports that connect servers to the PXE network with the PXE
VLAN as native or untagged.
To configure a bond interface that aggregates two interfaces
for the PXE and management network:
In kaas-bootstrap/templates/bm/ipam-objects.yaml.template:
Verify that only the following parameters for the declaration of
{{ nic 0 }} and {{ nic 1 }} are set, as shown in the example below:
dhcp4
dhcp6
match
set-name
Remove other parameters.
Verify that the declaration of the bond interface bond0 has the
interfaces parameter listing both Ethernet interfaces.
Verify that the node address in the PXE network (ip "bond0:mgmt-pxe"
in the example below) is bound to the bond interface or to the virtual
bridge interface tied to that bond.
Caution
No VLAN ID must be configured for the PXE network from the
host side.
Configure bonding options using the parameters field. The only
mandatory option is mode. See the example below for details.
Note
You can set any mode supported by
netplan
and your hardware.
Important
Bond monitoring is disabled in Ubuntu by default. However,
Mirantis highly recommends enabling it using the Media Independent Interface
(MII) monitoring by setting the mii-monitor-interval parameter to a
non-zero value. For details, see Linux documentation: bond monitoring.
Verify your configuration using the following example:
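A minimal npTemplate fragment for such a bond, assuming the 802.3ad (LACP) mode; adjust the mode, monitoring interval, and addresses for your environment:
bonds:
  bond0:
    interfaces:
      - {{ nic 0 }}
      - {{ nic 1 }}
    parameters:
      mode: 802.3ad
      mii-monitor-interval: 100
    dhcp4: false
    dhcp6: false
    addresses:
      # static address in the PXE/management network
      - {{ ip "bond0:mgmt-pxe" }}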
This section describes how to configure a dedicated PXE network for a
management bare metal cluster.
A separate PXE network allows isolating sensitive bare metal provisioning
process from the end users. The users still have access to Container Cloud
services, such as Keycloak, to authenticate workloads in managed clusters,
such as Horizon in a Mirantis OpenStack for Kubernetes cluster.
Note
This additional configuration procedure must be completed as part
of the main Deploy a management cluster using CLI procedure. It substitutes or appends
some configuration parameters and templates that are used in the main
procedure for the management cluster to use two networks, PXE and
management, instead of one PXE/management network. Mirantis recommends
considering the main procedure first.
The following table describes the overall network mapping scheme with all
L2/L3 parameters, for example, for two networks, PXE (CIDR 10.0.0.0/24)
and management (CIDR 10.0.11.0/24):
When using separate PXE and management networks, the management cluster
services are exposed in different networks using two separate MetalLB
address pools:
Services exposed through the PXE network are as follows:
Ironic API as a bare metal provisioning server
HTTP server that provides images for network boot and server provisioning
Caching server for accessing the Container Cloud artifacts deployed on
hosts
Services exposed through the management network are all other Container Cloud
services, such as Keycloak, web UI, and so on.
To configure separate PXE and management networks:
To ensure successful bootstrap, enable asymmetric routing on the interfaces
of the management cluster nodes. This is required because the seed node relies
on one network by default, which can potentially cause traffic asymmetry.
In the kernelParameters section of baremetalhostprofiles.yaml.template,
set rp_filter to 2. This enables loose mode as defined in
RFC3704.
Example configuration of asymmetric routing
...
kernelParameters:
  ...
  sysctl:
    # Enables the "Loose mode" for the "k8s-lcm" interface (management network)
    net.ipv4.conf.k8s-lcm.rp_filter: "2"
    # Enables the "Loose mode" for the "bond0" interface (PXE network)
    net.ipv4.conf.bond0.rp_filter: "2"
...
Note
More complicated solutions that are not described in this manual
include getting rid of traffic asymmetry, for example:
Configure source routing on management cluster nodes.
Plug the seed node into the same networks as the management cluster nodes,
which requires custom configuration of the seed node.
In kaas-bootstrap/templates/bm/ipam-objects.yaml.template:
Substitute all Subnet object templates with the new ones as described
in the example template below
Update the L2 template spec.l3Layout and spec.npTemplate fields
as described in the example template below
Example of the Subnet object templates
# Subnet object that provides IP addresses for bare metal hosts of
# management cluster in the PXE network.
apiVersion: "ipam.mirantis.com/v1alpha1"
kind: Subnet
metadata:
  name: mgmt-pxe
  namespace: default
  labels:
    kaas.mirantis.com/provider: baremetal
    kaas-mgmt-pxe-subnet: ""
spec:
  cidr: SET_IPAM_CIDR
  gateway: SET_PXE_NW_GW
  nameservers:
    - SET_PXE_NW_DNS
  includeRanges:
    - SET_IPAM_POOL_RANGE
  excludeRanges:
    - SET_METALLB_PXE_ADDR_POOL
---
# Subnet object that provides IP addresses for bare metal hosts of
# management cluster in the management network.
apiVersion: "ipam.mirantis.com/v1alpha1"
kind: Subnet
metadata:
  name: mgmt-lcm
  namespace: default
  labels:
    kaas.mirantis.com/provider: baremetal
    kaas-mgmt-lcm-subnet: ""
    ipam/SVC-k8s-lcm: "1"
    ipam/SVC-ceph-cluster: "1"
    ipam/SVC-ceph-public: "1"
    cluster.sigs.k8s.io/cluster-name: CLUSTER_NAME
spec:
  cidr: {{SET_LCM_CIDR}}
  includeRanges:
    - {{SET_LCM_RANGE}}
  excludeRanges:
    - SET_LB_HOST
    - SET_METALLB_ADDR_POOL
---
# Deprecated since 2.27.0. Subnet object that provides configuration
# for "services-pxe" MetalLB address pool that will be used to expose
# services LB endpoints in the PXE network.
apiVersion: "ipam.mirantis.com/v1alpha1"
kind: Subnet
metadata:
  name: mgmt-pxe-lb
  namespace: default
  labels:
    kaas.mirantis.com/provider: baremetal
    metallb/address-pool-name: services-pxe
    metallb/address-pool-protocol: layer2
    metallb/address-pool-auto-assign: "false"
    cluster.sigs.k8s.io/cluster-name: CLUSTER_NAME
spec:
  cidr: SET_IPAM_CIDR
  includeRanges:
    - SET_METALLB_PXE_ADDR_POOL
Deprecated since Container Cloud 2.27.0 (Cluster releases 17.2.0 and
16.2.0): the last Subnet template named mgmt-pxe-lb in the example
above will be used to configure the MetalLB address pool in the PXE network.
The bare metal provider will automatically configure MetalLB
with address pools using the Subnet objects identified by specific
labels.
Warning
The bm-pxe address must have a separate interface
with only one address on this interface.
Verify the current MetalLB configuration that is stored in MetalLB
objects:
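A hedged way to verify it, assuming the standard metallb-system namespace and MetalLB CRD names:
kubectl -n metallb-system get ipaddresspools,l2advertisements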
The auto-assign parameter will be set to false for all address
pools except the default one. So, a particular service will get an
address from such an address pool only if the Service object has a
special metallb.universe.tf/address-pool annotation that points to
the specific address pool name.
Note
It is expected that every Container Cloud service on a management
cluster will be assigned to one of the address pools. Current
consideration is to have two MetalLB address pools:
services-pxe is a reserved address pool name to use for the
Container Cloud services in the PXE network (Ironic API, HTTP server,
caching server).
The bootstrap cluster also uses the services-pxe address pool for
its provision services for management cluster nodes to be provisioned
from the bootstrap cluster. After the management cluster is deployed,
the bootstrap cluster is deleted and that address pool is solely used
by the newly deployed cluster.
default is an address pool to use for all other Container Cloud
services in the management network. No annotation is required on the
Service objects in this case.
In addition to the network parameters defined in Deploy a management cluster using CLI,
configure the following ones by replacing them in
templates/bm/ipam-objects.yaml.template:
SET_LCM_CIDR
Address of a management network for the management cluster
in the CIDR notation. You can later share this network with managed
clusters where it will act as the LCM network.
If managed clusters have their separate LCM networks,
those networks must be routable to the management network.
10.0.11.0/24
SET_LCM_RANGE
Address range that includes addresses to be allocated to
bare metal hosts in the management network for the management
cluster. When this network is shared with managed clusters,
the size of this range limits the number of hosts that can be
deployed in all clusters that share this network.
When this network is solely used by a management cluster,
the range should include at least 3 IP addresses
for bare metal hosts of the management cluster.
10.0.11.100-10.0.11.109
SET_METALLB_PXE_ADDR_POOL
Address range to be used for LB endpoints of the Container Cloud
services: Ironic-API, HTTP server, and caching server.
This range must be within the PXE network.
The minimum required range is 5 IP addresses.
10.0.0.61-10.0.0.70
The following parameters will now be tied to the management network
while their meaning remains the same as described in
Deploy a management cluster using CLI:
Subnet template parameters migrated to management network
Parameter
Description
Example value
SET_LB_HOST
IP address of the externally accessible API endpoint
of the management cluster. This address must NOT be
within the SET_METALLB_ADDR_POOL range but within the
management network. External load balancers are not supported.
10.0.11.90
SET_METALLB_ADDR_POOL
The address range to be used for the externally accessible LB
endpoints of the Container Cloud services, such as Keycloak, web UI,
and so on. This range must be within the management network.
The minimum required range is 19 IP addresses.
To facilitate multi-rack and other types of distributed bare metal datacenter
topologies, the dnsmasq DHCP server used for host provisioning in Container
Cloud supports working with multiple L2 segments through network routers that
support DHCP relay.
Container Cloud has its own DHCP relay running on one of the management
cluster nodes. That DHCP relay serves for proxying DHCP requests in the
same L2 domain where the management cluster nodes are located.
Caution
Networks used for hosts provisioning of a managed cluster must
have routes to the PXE network of the management cluster. This configuration
enables hosts to have access to the management cluster services that are
used during host provisioning.
Management cluster nodes must have routes through the PXE network to PXE
network segments used on a managed cluster. The following example contains
L2 template fragments for a management cluster node:
Configuration example extract
l3Layout:
  # PXE/static subnet for a management cluster
  - scope: namespace
    subnetName: kaas-mgmt-pxe
    labelSelector:
      kaas-mgmt-pxe-subnet: "1"
  # management (LCM) subnet for a management cluster
  - scope: namespace
    subnetName: kaas-mgmt-lcm
    labelSelector:
      kaas-mgmt-lcm-subnet: "1"
  # PXE/dhcp subnets for a managed cluster
  - scope: namespace
    subnetName: managed-dhcp-rack-1
  - scope: namespace
    subnetName: managed-dhcp-rack-2
  - scope: namespace
    subnetName: managed-dhcp-rack-3
  ...
npTemplate: |
  ...
  bonds:
    bond0:
      interfaces:
        - {{ nic 0 }}
        - {{ nic 1 }}
      parameters:
        mode: active-backup
        primary: {{ nic 0 }}
        mii-monitor-interval: 100
      dhcp4: false
      dhcp6: false
      addresses:
        # static address on management node in the PXE network
        - {{ ip "bond0:kaas-mgmt-pxe" }}
      routes:
        # routes to managed PXE network segments
        - to: {{ cidr_from_subnet "managed-dhcp-rack-1" }}
          via: {{ gateway_from_subnet "kaas-mgmt-pxe" }}
        - to: {{ cidr_from_subnet "managed-dhcp-rack-2" }}
          via: {{ gateway_from_subnet "kaas-mgmt-pxe" }}
        - to: {{ cidr_from_subnet "managed-dhcp-rack-3" }}
          via: {{ gateway_from_subnet "kaas-mgmt-pxe" }}
  ...
To configure DHCP ranges for dnsmasq, create the Subnet objects
tagged with the ipam/SVC-dhcp-range label while setting up subnets
for a managed cluster using CLI.
Caution
Support of multiple DHCP ranges has the following limitations:
Using custom DNS server addresses for servers that boot over PXE
is not supported.
The Subnet objects for DHCP ranges cannot be associated with any
specific cluster, as the DHCP server configuration is only applicable to the
management cluster where the DHCP server is running.
The cluster.sigs.k8s.io/cluster-name label will be ignored.
Create the Subnet objects tagged with the ipam/SVC-dhcp-range label.
Caution
For cluster-specific subnets, create Subnet objects in the
same namespace as the related Cluster object project. For shared
subnets, create Subnet objects in the default namespace.
Setting of custom nameservers in the DHCP subnet is not supported.
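The following extract is a minimal sketch of such a Subnet object, assuming the
ipam.mirantis.com/v1alpha1 Subnet API with the cidr, gateway, and
includeRanges fields; replace the name, namespace, and addresses with values
that match your PXE segment and verify the exact schema against the Container
Cloud API Reference:
apiVersion: ipam.mirantis.com/v1alpha1
kind: Subnet
metadata:
  name: dhcp-rack-1
  namespace: default
  labels:
    ipam/SVC-dhcp-range: "1"
    kaas.mirantis.com/provider: baremetal
spec:
  cidr: 10.20.30.0/24
  gateway: 10.20.30.1
  includeRanges:
    - 10.20.30.100-10.20.30.200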
After you create the above Subnet object, the provided data is used to
render the Dnsmasq object that configures the dnsmasq deployment.
You do not have to manually edit the Dnsmasq object.
Verify that the changes are applied to the Dnsmasq object:
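For example, assuming the Dnsmasq object resides in the kaas namespace of the
management cluster (a sketch; the object name and namespace may differ in your
deployment):
kubectl -n kaas get dnsmasq -o yaml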
For servers to access the DHCP server across the L2 segment boundaries, for
example, from another rack with a different VLAN for PXE network, you must
configure DHCP relay (agent) service on the border switch of the segment. For
example, on a top-of-rack (ToR) or leaf (distribution) switch, depending on the
data center network topology.
Warning
To ensure predictable routing for the relay of DHCP packets,
Mirantis strongly advises against the use of chained DHCP relay
configurations. This precaution limits the number of hops for DHCP packets,
with an optimal scenario being a single hop.
This approach is justified by the unpredictable nature of chained relay
configurations and potential incompatibilities between software and
hardware relay implementations.
The dnsmasq server listens on the PXE network of the management
cluster by using the dhcp-lb Kubernetes Service.
To configure the DHCP relay service, specify the external address of the
dhcp-lb Kubernetes Service as an upstream address for the relayed DHCP
requests, which is the IP helper address for DHCP. The dnsmasq
deployment behind this service accepts only relayed DHCP requests.
To obtain the actual IP address issued to the dhcp-lb Kubernetes Service:
kubectl -n kaas get service dhcp-lb
Migration of DHCP configuration for existing management clusters¶
Note
This section applies only to existing management clusters that
were created before Container Cloud 2.24.0 (Cluster release 14.0.0).
Caution
Since Container Cloud 2.24.0, you can only remove the deprecated
dnsmasq.dhcp_range, dnsmasq.dhcp_ranges, dnsmasq.dhcp_routers,
and dnsmasq.dhcp_dns_servers values from the cluster spec.
The Admission Controller does not accept any other changes in these values.
This configuration is completely superseded by the Subnet object.
The DHCP configuration was automatically migrated from the cluster spec to
Subnet objects during the cluster upgrade to Container Cloud 2.21.0 (Cluster
release 11.5.0).
To remove the deprecated dnsmasq parameters from the cluster spec:
Open the management cluster spec for editing.
In the baremetal-operator release values, remove the
dnsmasq.dhcp_range, dnsmasq.dhcp_ranges, dnsmasq.dhcp_routers,
and dnsmasq.dhcp_dns_servers parameters. For example:
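The extract below is a sketch of the baremetal-operator release values with
the deprecated keys that must be deleted; keep any other dnsmasq values
intact, and note that the exact location of the helmReleases list within your
Cluster spec may differ:
helmReleases:
  - name: baremetal-operator
    values:
      dnsmasq:
        # Delete the following deprecated keys:
        dhcp_range: <...>
        dhcp_ranges: <...>
        dhcp_routers: <...>
        dhcp_dns_servers: <...>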
The dnsmasq.dhcp_<name> parameters of the
baremetal-operator Helm chart values in the Cluster spec are
deprecated since the Cluster release 11.5.0 and removed in the Cluster
release 14.0.0.
Ensure that the required DHCP ranges and options are set in the Subnet
objects. For configuration details, see Configure DHCP ranges for dnsmasq.
The dnsmasq configuration options dhcp-option=3 and dhcp-option=6
are absent in the default configuration. So, by default, dnsmasq
will send the DNS server and default route to DHCP clients as defined in the
dnsmasq official documentation:
The netmask and broadcast address are the same as on the host running
dnsmasq.
The DNS server and default route are set to the address of the host running
dnsmasq.
If the domain name option is set, this name is sent to DHCP clients.
Available since MCC 2.26.0 (Cluster release 16.1.0)
This section instructs you on how to enable the dynamic IP allocation feature
to increase the number of bare metal hosts that can be provisioned in parallel
on managed clusters.
Using this feature, you can effortlessly deploy a large managed cluster by
provisioning up to 100 hosts simultaneously. In addition to dynamic
IP allocation, this feature disables the ping check in the DHCP server.
Therefore, if you plan to deploy large managed clusters, enable this feature
during the management cluster bootstrap.
Set a custom external IP address for the DHCP service¶
Available since MCC 2.25.0 (Cluster release 16.0.0)
This section instructs you on how to set a custom external IP address for
the dhcp-lb service so that it remains the same during management cluster
upgrades and other LCM operations.
A change of the dhcp-lb service address may require changing the
configuration of DHCP relays on ToR switches.
The procedure described below allows you to avoid such unwanted changes.
This configuration makes sense when you use multiple DHCP address ranges
on your deployment. See Configure multiple DHCP address ranges for details.
To set a custom external IP address for the dhcp-lb service:
In the Cluster object of the management cluster, modify the
configuration of the baremetal-operator release by setting
dnsmasq.dedicated_udp_service_address_pool to true:
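For example, a sketch of the baremetal-operator release values (the location
of the helmReleases list within the Cluster spec may differ in your
deployment):
helmReleases:
  - name: baremetal-operator
    values:
      dnsmasq:
        dedicated_udp_service_address_pool: true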
In the MetalLBConfig object of the management cluster, modify the
ipAddressPools object list by adding the dhcp-lb object and the
serviceAllocation parameters for the default object:
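The extract below is a sketch only: the address ranges are placeholders and
the serviceAllocation selector for the default pool is a hypothetical example;
verify the exact MetalLBConfig schema against the Container Cloud API
Reference:
spec:
  ipAddressPools:
    - name: default
      spec:
        addresses:
          - 10.0.11.61-10.0.11.80
        autoAssign: true
        serviceAllocation:
          priority: 100
          serviceSelectors:
            - matchLabels:
                # hypothetical selector; restricts which Services may
                # receive addresses from the default pool
                app.kubernetes.io/managed-by: <selector-placeholder>
    - name: dhcp-lb
      spec:
        addresses:
          - 10.0.0.71/32
        autoAssign: false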
Select non-overlapping IP addresses for all the ipAddressPools that
you use: default, services-pxe, and dhcp-lb.
In the MetalLBConfig object of the management cluster, modify the
l2Advertisements object list by adding dhcp-lb to the
ipAddressPools section in the pxe object spec:
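A sketch of the corresponding l2Advertisements extract; the L2Advertisement
object name and the interface list are deployment-specific:
spec:
  l2Advertisements:
    - name: pxe
      spec:
        ipAddressPools:
          - services-pxe
          - dhcp-lb
        interfaces:
          # interface connected to the PXE network; placeholder
          - <pxe-interface-name>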
Note
A cluster may have a different L2Advertisement object name
instead of pxe.
Consider this section as part of the Bootstrap v2
CLI procedure.
During creation of a management cluster, you can configure optional cluster
settings using the Container Cloud API by modifying cluster.yaml.template.
To configure optional cluster settings:
Technology Preview. Enable custom host names for cluster machines.
When enabled, any machine host name in a particular region matches the related
Machine object name. For example, instead of the default
kaas-node-<UID>, a machine host name will be master-0. The custom
naming format is more convenient and easier to operate with.
Configuration for custom host names on the management cluster and its future
managed clusters
In cluster.yaml.template, find the
spec.providerSpec.value.kaas.regional.helmReleases.name:baremetal-provider section.
Under values.config, add customHostnamesEnabled:true:
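For example, a sketch of the relevant extract; the surrounding structure of
the kaas.regional list may differ slightly in your template:
spec:
  providerSpec:
    value:
      kaas:
        regional:
          - provider: baremetal
            helmReleases:
              - name: baremetal-provider
                values:
                  config:
                    customHostnamesEnabled: true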
Optional. Configure the Linux audit daemon (auditd) using the following
parameters:
enabled
Boolean, default - false. Enables the auditd role to install the
auditd packages and configure rules. CIS rules: 4.1.1.1, 4.1.1.2.
enabledAtBoot
Boolean, default - false. Configures grub to audit processes that can
be audited even if they start up prior to auditd startup. CIS rule:
4.1.1.3.
backlogLimit
Integer, default - none. Configures the backlog to hold records. If during
boot audit=1 is configured, the backlog holds 64 records. If more than
64 records are created during boot, auditd records will be lost with a
potential malicious activity being undetected. CIS rule: 4.1.1.4.
maxLogFile
Integer, default - none. Configures the maximum size of the audit log file.
Once the log reaches the maximum size, it is rotated and a new log file is
created. CIS rule: 4.1.2.1.
maxLogFileAction
String, default - none. Defines handling of the audit log file reaching the
maximum file size. Allowed values:
keep_logs - rotate logs but never delete them
rotate - add a cron job to compress rotated log files and keep
maximum 5 compressed files.
compress - compress log files and keep them under the
/var/log/auditd/ directory. Requires
auditd_max_log_file_keep to be enabled.
CIS rule: 4.1.2.2.
maxLogFileKeep
Integer, default - 5. Defines the number of compressed log files to
keep under the /var/log/auditd/ directory. Requires
auditd_max_log_file_action=compress. CIS rules - none.
mayHaltSystem
Boolean, default - false. Halts the system when the audit logs are
full. Applies the following configuration:
space_left_action=email
action_mail_acct=root
admin_space_left_action=halt
CIS rule: 4.1.2.3.
customRules
String, default - none. Base64-encoded content of the 60-custom.rules
file for any architecture. CIS rules - none.
customRulesX32
String, default - none. Base64-encoded content of the 60-custom.rules
file for the i386 architecture. CIS rules - none.
customRulesX64
String, default - none. Base64-encoded content of the 60-custom.rules
file for the x86_64 architecture. CIS rules - none.
presetRules
String, default - none. Comma-separated list of the following built-in
preset rules:
access
actions
delete
docker
identity
immutable
logins
mac-policy
modules
mounts
perm-mod
privileged
scope
session
system-locale
time-change
Since Container Cloud 2.28.0 (Cluster releases 17.3.0 and 16.3.0) in the
Technology Preview scope, you can collect some of the preset rules
indicated above as groups and use them in presetRules:
ubuntu-cis-rules - this group contains rules to comply with the
Ubuntu CIS Benchmark recommendations, including the following CIS Ubuntu
20.04 v2.0.1 rules:
scope - 5.2.3.1
actions - same as 5.2.3.2
time-change - 5.2.3.4
system-locale - 5.2.3.5
privileged - 5.2.3.6
access - 5.2.3.7
identity - 5.2.3.8
perm-mod - 5.2.3.9
mounts - 5.2.3.10
session - 5.2.3.11
logins - 5.2.3.12
delete - 5.2.3.13
mac-policy - 5.2.3.14
modules - 5.2.3.19
docker-cis-rules - this group contains rules to comply with
the Docker CIS Benchmark recommendations, including the docker rule
that covers the Docker CIS v1.6.0 rules 1.1.3 - 1.1.18.
You can also use two additional keywords inside presetRules:
none - select no built-in rules.
all - select all built-in rules. When using this keyword, you can add
the ! prefix to a rule name to exclude some rules. You can use the
! prefix for rules only if you add the all keyword as the
first rule. Place a rule with the ! prefix only after
the all keyword.
Example configurations:
presetRules:none - disable all preset rules
presetRules:docker - enable only the docker rules
presetRules:access,actions,logins - enable only the
access, actions, and logins rules
presetRules:ubuntu-cis-rules - enable all rules from the
ubuntu-cis-rules group
presetRules:docker-cis-rules,actions - enable all rules from
the docker-cis-rules group and the actions rule
presetRules:all - enable all preset rules
presetRules:all,!immutable,!session - enable all preset
rules except immutable and session
Configure the NTP server. NTP is enabled by default; you can disable it,
which stops Container Cloud from managing the chrony configuration so that
you can use your own system for chrony management. Otherwise, configure the
regional NTP server parameters as described below.
NTP configuration
Configure the regional NTP server parameters to be applied to all machines
of managed clusters.
In cluster.yaml.template or the Cluster object, add the
ntp:servers section with the list of required server names:
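For example, a sketch assuming the ntp section resides under
spec:providerSpec:value (the server names are placeholders):
spec:
  providerSpec:
    value:
      ntp:
        servers:
          - 0.pool.ntp.org
          - 1.pool.ntp.org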
Applies since Container Cloud 2.26.0 (Cluster release 16.1.0). If you plan
to deploy large managed clusters, enable dynamic IP allocation to increase
the number of bare metal hosts provisioned in parallel.
For details, see Enable dynamic IP allocation.
Now, proceed with completing the bootstrap process using the Container Cloud
Bootstrap API as described in Deploy a management cluster.
Now, you can proceed with operating your management cluster through the
Container Cloud web UI and deploying MOSK clusters as
described in Operations Guide.
Required. Comma-separated list of roles to assign to the user.
If you run the command without the --namespace flag,
you can assign the following roles:
global-admin - read and write access for global role bindings
writer - read and write access
reader - view access
operator - create and manage access to the BareMetalHost
and BareMetalHostInventory (since Container Cloud 2.29.1,
Cluster release 16.4.1) objects
management-admin - full access to the management cluster,
available since Container Cloud 2.25.0 (Cluster release 16.0.0)
If you run the command for a specific project using the
--namespace flag, you can assign the following roles:
operator or writer - read and write access
user or reader - view access
member - read and write access (excluding IAM objects)
bm-pool-operator - create and manage access to the
BareMetalHost and BareMetalHostInventory (since Container
Cloud 2.29.1, Cluster release 16.4.1) objects
--kubeconfig
Required. Path to the management cluster kubeconfig generated during
the management cluster bootstrap.
--namespace
Optional. Name of the Container Cloud project where the user will be
created. If not set, a global user will be created for all Container
Cloud projects with the corresponding role access to view or manage
all public objects.
--password-stdin
Optional. Flag to provide the user password through stdin:
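For example, a hypothetical invocation; the container-cloud binary name and
the bootstrap user add subcommand are assumptions, so use the user-management
command provided with your bootstrap tooling:
echo '<user-password>' | ./container-cloud bootstrap user add \
  --username <user-name> \
  --roles writer \
  --kubeconfig <path-to-management-cluster-kubeconfig> \
  --password-stdin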
For MOSK clusters, the feature is generally
available since MOSK 23.1.
While bootstrapping a Container Cloud management cluster using proxy, you may
require Internet access to go through a man-in-the-middle (MITM) proxy. Such
configuration requires that you enable streaming and install a CA certificate
on a bootstrap node.
This section describes how to configure authentication for a management
cluster depending on the external identity provider type integrated into
your deployment.
If you integrate LDAP for IAM to Mirantis OpenStack for Kubernetes, add the required LDAP
configuration to cluster.yaml.template during the management cluster
bootstrap.
Note
The example below defines the recommended non-anonymous
authentication type. If you require anonymous authentication, replace the
following parameters with authType: "none":
authType:"simple"bindCredential:""bindDn:""
To configure LDAP for IAM:
Open templates/bm/cluster.yaml.template.
Configure the keycloak:userFederation:providers:
and keycloak:userFederation:mappers: sections as required:
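The extract below is a structural sketch only, not a drop-in configuration:
the provider and mapper fields shown are standard Keycloak LDAP federation
settings, but the exact set of keys required by your deployment and LDAP
system may differ:
keycloak:
  userFederation:
    providers:
      - displayName: "<LDAP_NAME>"
        providerName: "ldap"
        priority: 1
        config:
          connectionUrl: "ldap://<LDAP_SERVER>:389"
          usersDn: "ou=users,dc=example,dc=com"
          usernameLDAPAttribute: "uid"
          rdnLDAPAttribute: "uid"
          uuidLDAPAttribute: "uid"
          userObjectClasses: "inetOrgPerson,organizationalPerson"
          editMode: "READ_ONLY"
          authType: "simple"
          bindDn: "<BIND_DN>"
          bindCredential: "<BIND_PASSWORD>"
    mappers:
      - name: "username"
        federationMapperType: "user-attribute-ldap-mapper"
        federationProviderDisplayName: "<LDAP_NAME>"
        config:
          ldap.attribute: "uid"
          user.model.attribute: "username"
          is.mandatory.in.ldap: "true"
          read.only: "true"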
Verify that the userFederation section is located on the same level as
the initUsers section.
Verify that all attributes set in the mappers section are defined for
users in the specified LDAP system. Missing attributes may cause
authorization issues.
Now, return to the bootstrap instruction for your management cluster.
The instruction below applies to the DNS-based management
clusters. If you bootstrap a non-DNS-based management cluster, configure
Google OAuth IdP for Keycloak after bootstrap using the
official Keycloak documentation.
If you integrate Google OAuth external identity provider for IAM to
Mirantis OpenStack for Kubernetes, create the authorization credentials for IAM in your
Google OAuth account and configure cluster.yaml.template during the
bootstrap of the management cluster.
In the APIs Credentials menu, select
OAuth client ID.
In the window that opens:
In the Application type menu, select
Web application.
In the Authorized redirect URIs field, type in
<keycloak-url>/auth/realms/iam/broker/google/endpoint,
where <keycloak-url> is the corresponding DNS address.
Press Enter to add the URI.
Click Create.
A page with your client ID and client secret opens. Save these
credentials for further usage.
Log in to the bootstrap node.
Open templates/bm/cluster.yaml.template.
In the keycloak:externalIdP: section, add the following snippet with
your credentials created in previous steps:
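A sketch of such a snippet; the google key structure under
keycloak:externalIdP: is an assumption based on the provider alias used in
the redirect URI, and the client ID and secret are the values you saved
earlier:
keycloak:
  externalIdP:
    google:
      enabled: true
      config:
        clientId: "<your-client-id>.apps.googleusercontent.com"
        clientSecret: "<your-client-secret>"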
After bootstrapping your baremetal-based Mirantis Container Cloud
management cluster, you can create a baremetal-based managed cluster
to deploy Mirantis OpenStack for Kubernetes using the Container Cloud API.
The procedure below applies only to the Container Cloud web UI
users with the m:kaas@global-admin or m:kaas@writer access role
assigned by the infrastructure operator.
The default project (Kubernetes namespace) is dedicated for management
clusters only. MOSK clusters require a separate project.
You can create as many projects as required by your company infrastructure.
To create a project for MOSK clusters:
Log in to the Container Cloud web UI as m:kaas@global-admin or
m:kaas@writer.
In the Projects tab, click Create.
Type the new project name.
Click Create.
Note
Due to the known issue 50168, access to the
newly created project becomes available in five minutes after project
creation.
Before creating a bare metal managed cluster, add the required number
of bare metal hosts either using the Container Cloud web UI for a default
configuration or using CLI for an advanced configuration.
You can view the created profiles in the
BM Host Profiles tab of the Container Cloud web UI.
Log in to the Container Cloud web UI with the m:kaas@operator or
m:kaas:namespace@bm-pool-operator permissions.
Switch to the required non-default project using the
Switch Project action icon located on top of the main left-side
navigation panel.
Caution
Do not create a MOSK cluster in the default project
(Kubernetes namespace), which is dedicated for the management cluster only.
If no projects are defined, first create a new mosk project as described
in Create a project for MOSK clusters.
Optional. Available since Container Cloud 2.24.0 (Cluster releases 15.0.1
and 14.0.1). In the Credentials tab, click
Add Credential and add the IPMI user name and password of the
bare metal host to access the Baseboard Management Controller (BMC).
Select one of the following options:
Since Container Cloud 2.26.0 (17.1.0 and 16.1.0)
In the Baremetal tab, click Create Host.
Fill out the Create baremetal host form as required:
Name
Specify the name of the new bare metal host.
Boot Mode
Specify the BIOS boot mode. Available options: Legacy,
UEFI, or UEFISecureBoot.
MAC Address
Specify the MAC address of the PXE network interface.
Baseboard Management Controller (BMC)
Specify the following BMC details:
IP Address
Specify the IP address to access the BMC.
Credential Name
Specify the name of the previously added bare metal host
credentials to associate with the current host.
Cert Validation
Enable validation of the BMC API certificate. Applies only to the
redfish+http BMC protocol. Disabled by default.
Power off host after creation
Experimental. Select to power off the bare metal host after
creation.
Caution
This option is experimental and intended only for
testing and evaluation purposes. Do not use it for
production deployments.
Before MCC 2.26.0 (17.0.0, 16.0.0, or earlier)
In the Baremetal tab, click Add BM host.
Fill out the Add new BM host form as required:
Baremetal host name
Specify the name of the new bare metal host.
Provider Credential
Optional. Available since Container Cloud 2.24.0 (Cluster releases
15.0.1 and 14.0.1). Specify the name of the previously added bare
metal host credentials to associate with the current host.
Add New Credential
Optional. Available since Container Cloud 2.24.0 (Cluster releases
15.0.1 and 14.0.1). Applies if you did not add bare metal host
credentials using the Credentials tab. Add the bare metal
host credentials:
Username
Specify the name of the IPMI user to access the BMC.
Password
Specify the IPMI password of the user to access the BMC.
Boot MAC address
Specify the MAC address of the PXE network interface.
IP Address
Specify the IP address to access the BMC.
Label
Assign the machine label to the new host that defines which type of
machine may be deployed on this bare metal host. Only one label can
be assigned to a host. The supported labels include:
Manager
This label is selected and set by default. Assign this label to
the bare metal hosts that can be used to deploy machines with the
manager type. These hosts must match the CPU and RAM
requirements described in MOSK cluster hardware requirements.
Worker
The host with this label may be used to deploy the worker
machine type. Assign this label to the bare metal hosts that have
sufficient CPU and RAM resources, as described in
MOSK cluster hardware requirements.
Storage
Assign this label to the bare metal hosts that have sufficient
storage devices to match MOSK cluster hardware requirements. Hosts with this
label will be used to deploy machines with the storage type that
run Ceph OSDs.
Click Create.
While adding the bare metal host, Container Cloud discovers and inspects
the hardware of the bare metal host and adds it to BareMetalHost.status
for future references.
During provisioning, baremetal-operator inspects the bare metal host
and moves it to the Preparing state. The host becomes ready to be linked
to a bare metal machine.
Verify the results of the hardware inspection to avoid unexpected errors
during the host usage:
Select one of the following options:
Since MCC 2.26.0 (17.1.0 and 16.1.0)
In the left sidebar, click Baremetal. The
Hosts page opens.
Before MCC 2.26.0 (17.0.0, 16.0.0, or earlier)
In the left sidebar, click BM Hosts.
Verify that the bare metal host is registered and switched to one of the
following statuses:
Preparing for a newly added host
Ready for a previously used host or for a host that is
already linked to a machine
Select one of the following options:
Since MCC 2.26.0 (17.1.0 and 16.1.0)
On the Hosts page, click the host kebab menu and select
Host info.
Before MCC 2.26.0 (17.0.0, 16.0.0, or earlier)
On the BM Hosts page, click the name of the newly added
bare metal host.
In the window with the host details, scroll down to the
Hardware section.
Review the section and make sure that the number and models
of disks, network interface cards, and CPUs match the hardware
specification of the server.
If the hardware details are consistent with the physical server
specifications for all your hosts, proceed to
Create a MOSK cluster.
If you find any discrepancies in the hardware inspection results,
it might indicate that the server has hardware issues or
is not compatible with Container Cloud.
In the metadata section, add a unique credentials name and the
name of the non-default project (namespace) dedicated for the
managed cluster being created.
In the spec section, add the IPMI user name and password in plain
text to access the Baseboard Management Controller (BMC). The password
will not be stored in the BareMetalHostCredential object but will
be erased and saved in an underlying Secret object.
The kaas.mirantis.com/region label is removed from all
Container Cloud and MOSK objects in 24.1.
Therefore, do not add the label starting with these releases. On existing
clusters updated to these releases, or if added manually, Container Cloud
ignores this label.
MOSK 22.4 and earlier
Create a secret YAML file that describes
the unique credentials of the new bare metal host.
Example of the bare metal host secret:
In the data section, add the IPMI user name and password in the
base64 encoding to access the BMC. To obtain the base64-encoded
credentials, you can use the following command in your Linux console:
echo -n <username|password> | base64
Caution
Each bare metal host must have a unique Secret.
In the metadata section, add the unique name of credentials and
the name of the non-default project (namespace) dedicated for
the managed cluster being created.
Apply this secret YAML file to your deployment:
Warning
The kubectl apply command automatically saves the
applied data as plain text into the
kubectl.kubernetes.io/last-applied-configuration annotation of the
corresponding object. This may result in revealing sensitive data in this
annotation when creating or modifying the object.
Therefore, do not use kubectl apply on this object.
Use kubectl create, kubectl patch, or
kubectl edit instead.
If you used kubectl apply on this object, you
can remove the kubectl.kubernetes.io/last-applied-configuration
annotation from the object using kubectl edit.
Create a YAML file that contains a description of the new bare metal host:
Since the management cluster update to 16.4.0 (MCC 2.29.0)
Caution
While the Cluster release of the management cluster is 16.4.0,
BareMetalHostInventory operations are allowed to
m:kaas@management-admin only. This limitation is lifted once the
management cluster is updated to the Cluster release 16.4.1 or later.
apiVersion: kaas.mirantis.com/v1alpha1
kind: BareMetalHostInventory
metadata:
  annotations:
    inspect.metal3.io/hardwaredetails-storage-sort-term: hctl ASC, wwn ASC, by_id ASC, name ASC
  labels:
    kaas.mirantis.com/baremetalhost-id: <unique-bare-metal-host-hardware-node-id>
    kaas.mirantis.com/provider: baremetal
  name: <bare-metal-host-unique-name>
  namespace: <managed-cluster-project-name>
spec:
  bmc:
    address: <ip-address-for-bmc-access>
    bmhCredentialsName: <bare-metal-host-credential-unique-name>
  bootMACAddress: <bare-metal-host-boot-mac-address>
  online: true
Note
If you have a limited amount of free and unused IP addresses
for server provisioning, you can add the
baremetalhost.metal3.io/detached annotation that pauses automatic
host management to manually allocate an IP address for the host. For
details, see Manually allocate IP addresses for bare metal hosts.
Before the management cluster update to 16.4.0 (MCC 2.29.0)
The kaas.mirantis.com/region label is removed from all
Container Cloud and MOSK objects in 24.1.
Therefore, do not add the label starting with these releases. On existing
clusters updated to these releases, or if added manually, Container Cloud
ignores this label.
Note
If you have a limited amount of free and unused IP addresses
for server provisioning, you can add the
baremetalhost.metal3.io/detached annotation that pauses automatic
host management to manually allocate an IP address for the host. For
details, see Manually allocate IP addresses for bare metal hosts.
During provisioning, baremetal-operator inspects the bare metal host
and moves it to the Preparing state. The host becomes ready to be linked
to a bare metal machine.
Caution
If you need to change or add DHCP subnets to bootstrap new nodes,
wait until the dnsmasq pod becomes ready after the change, and only
then create the bare metal host objects as described above.
During provisioning, the status changes as follows:
registering
inspecting
preparing
After the bare metal host object switches to the preparing stage, the
inspecting phase finishes and you can verify that hardware information
is available in the object status and matches the MOSK cluster hardware requirements.
For example:
The bare metal host profile is a Kubernetes custom resource. It enables
the operator to define how the storage devices and the operating system are
provisioned and configured.
This section describes the bare metal host profile default settings and
configuration of custom profiles for managed clusters using Container Cloud
API. The section also applies to a management cluster with a few differences
described in Customize the default bare metal host profile.
The default host profile requires three storage devices
in the following strict order:
Boot device and operating system storage
This device contains boot data and operating system data. It
is partitioned using the GUID Partition Table (GPT) labels.
The root file system is an ext4 file system
created on top of an LVM logical volume.
For a detailed layout, refer to the table below.
Local volumes device
This device contains an ext4 file system with directories mounted
as persistent volumes to Kubernetes. These volumes are used by
the Mirantis Container Cloud services to store its data,
including monitoring and identity databases.
Ceph storage device
This device is used as a Ceph datastore or Ceph OSD on managed clusters.
The following table summarizes the default configuration of the host system
storage set up by the Container Cloud bare metal management.
Default configuration of the bare metal host storage¶
Device/partition
Name/Mount point
Recommended size
Description
/dev/sda1
bios_grub
4 MiB
The mandatory GRUB boot partition required for non-UEFI systems.
/dev/sda2
UEFI -> /boot/efi
0.2 GiB
The boot partition required for the UEFI boot mode.
/dev/sda3
config-2
64 MiB
The mandatory partition for the cloud-init configuration.
Used during the first host boot for initial configuration.
/dev/sda4
lvm_root_part
100% of the remaining free space in the LVM volume group
The main LVM physical volume that is used to create the root file system.
/dev/sdb
lvm_lvp_part -> /mnt/local-volumes
100% of the remaining free space in the LVM volume group
The LVM physical volume that is used to create the file system
for LocalVolumeProvisioner.
/dev/sdc
-
100% of the remaining free space in the LVM volume group
Clean raw disk that will be used for the Ceph storage backend on
managed clusters.
If required, you can customize the default host storage configuration.
For details, see Create MOSK host profiles.
Before deploying a cluster, you may need to erase existing data from hardware
devices to be used for deployment. You can either erase an existing partition
or remove all existing partitions from a physical device. For this purpose,
use the wipeDevice structure that configures cleanup behavior during
configuration of a custom bare metal host profile described in
Create MOSK host profiles.
The wipeDevice structure contains the following options:
When you enable the eraseMetadata option, which is disabled by default,
the Ansible provisioner attempts to clean up the existing metadata from
the target device. Examples of metadata include:
Existing file system
Logical Volume Manager (LVM) or Redundant Array of Independent Disks (RAID)
configuration
The behavior of metadata erasure varies depending on the target device:
If a device is part of other logical devices, for example, a partition,
logical volume, or MD RAID volume, such logical device is disassembled and
its file system metadata is erased. On the final erasure step,
the file system metadata of the target device is erased as well.
If a device is a physical disk, then all its nested partitions along with
their nested logical devices, if any, are erased and disassembled.
On the final erasure step, all partitions and metadata of the target device
are removed.
Caution
None of the eraseMetadata actions include overwriting the
target device with data patterns. For this purpose, use the eraseDevice
option as described in Erase a device.
To enable the eraseMetadata option, use the wipeDevice field in the
spec:devices section of the BareMetalHostProfile object. For a
detailed description of the option, see Container Cloud API Reference:
BareMetalHostProfile.
If you require not only disassembling of existing logical volumes but also
removing of all data ever written to the target device, configure the
eraseDevice option, which is disabled by default. This option is not
applicable to partitions, LVM, or MD RAID logical volumes because such volumes
may use caching that prevents a physical device from being erased properly.
Important
The eraseDevice option does not replace the secure erase.
To configure the eraseDevice option, use the wipeDevice field in the
spec:devices section of the BareMetalHostProfile object. For a
detailed description of the option, see Container Cloud API Reference:
BareMetalHostProfile.
Different types of MOSK nodes require differently
configured host storage. This section describes how to create custom
host profiles for different types of MOSK nodes.
You can create custom profiles for managed clusters using Container
Cloud API.
Note
The procedure below also applies to management clusters.
You can use flexible size units throughout bare metal host profiles.
For example, you can now use either sizeGiB:0.1 or size:100Mi
when specifying a device size.
Mirantis recommends using only one parameter name type and units throughout
the configuration files. If both sizeGiB and size are used,
sizeGiB is ignored during deployment and the suffix is adjusted
accordingly. For example, 1.5Gi will be serialized as 1536Mi. The size
without units is counted in bytes. For example, size:120 means 120 bytes.
Warning
All data will be wiped during cluster deployment on devices
defined directly or indirectly in the fileSystems list of
BareMetalHostProfile. For example:
A raw device partition with a file system on it
A device partition in a volume group with a logical volume that has a
file system on it
An mdadm RAID device with a file system on it
An LVM RAID device with a file system on it
The wipe field is always considered true for these devices.
The false value is ignored.
Therefore, to prevent data loss, move the necessary data from these file
systems to another server beforehand, if required.
To create MOSK bare metal host profiles:
Select from the following options:
For a management cluster, log in to the bare metal seed node that will be
used to bootstrap the management cluster.
For a managed cluster, log in to the local machine where your management
cluster kubeconfig is located and where kubectl is installed.
Note
The management cluster kubeconfig is created automatically
during the last stage of the management cluster bootstrap.
Select from the following options:
For a management cluster, open
templates/bm/baremetalhostprofiles.yaml.template for editing.
For a managed cluster, create a new bare metal host profile for
MOSK compute nodes in a YAML file under the
templates/bm/ directory.
Edit the host profile using the example template below to meet
your hardware configuration requirements:
apiVersion: metal3.io/v1alpha1
kind: BareMetalHostProfile
metadata:
  name: <PROFILE_NAME>
  namespace: <PROJECT_NAME>
spec:
  devices:
    # From the HW node, obtain the first device, which size is at least 60Gib
    - device:
        workBy: "by_id,by_wwn,by_path,by_name"
        minSize: 60Gi
        type: ssd
        wipe: true
      partitions:
        - name: bios_grub
          partflags:
            - bios_grub
          size: 4Mi
          wipe: true
        - name: uefi
          partflags:
            - esp
          size: 200Mi
          wipe: true
        - name: config-2
          size: 64Mi
          wipe: true
        # This partition is only required on compute nodes if you plan to
        # use LVM ephemeral storage.
        - name: lvm_nova_part
          wipe: true
          size: 100Gi
        - name: lvm_root_part
          size: 0
          wipe: true
    # From the HW node, obtain the second device, which size is at least 60Gib
    # If a device exists but does not fit the size,
    # the BareMetalHostProfile will not be applied to the node
    - device:
        workBy: "by_id,by_wwn,by_path,by_name"
        minSize: 60Gi
        type: ssd
        wipe: true
    # From the HW node, obtain the disk device with the exact name
    - device:
        workBy: "by_id,by_wwn,by_path,by_name"
        minSize: 60Gi
        wipe: true
      partitions:
        - name: lvm_lvp_part
          size: 0
          wipe: true
    # Example of wiping a device w\o partitioning it.
    # Mandatory for the case when a disk is supposed to be used for Ceph backend
    # later
    - device:
        workBy: "by_id,by_wwn,by_path,by_name"
        wipe: true
  fileSystems:
    - fileSystem: vfat
      partition: config-2
    - fileSystem: vfat
      mountPoint: /boot/efi
      partition: uefi
    - fileSystem: ext4
      logicalVolume: root
      mountPoint: /
    - fileSystem: ext4
      logicalVolume: lvp
      mountPoint: /mnt/local-volumes/
  logicalVolumes:
    - name: root
      size: 0
      vg: lvm_root
    - name: lvp
      size: 0
      vg: lvm_lvp
  postDeployScript: |
    #!/bin/bash -ex
    echo $(date) 'post_deploy_script done' >> /root/post_deploy_done
  preDeployScript: |
    #!/bin/bash -ex
    echo $(date) 'pre_deploy_script done' >> /root/pre_deploy_done
  volumeGroups:
    - devices:
        - partition: lvm_root_part
      name: lvm_root
    - devices:
        - partition: lvm_lvp_part
      name: lvm_lvp
  grubConfig:
    defaultGrubOptions:
      - GRUB_DISABLE_RECOVERY="true"
      - GRUB_PRELOAD_MODULES=lvm
      - GRUB_TIMEOUT=20
  kernelParameters:
    sysctl:
      # For the list of options prohibited to change, refer to
      # https://docs.mirantis.com/mke/3.7/install/predeployment/set-up-kernel-default-protections.html
      kernel.dmesg_restrict: "1"
      kernel.core_uses_pid: "1"
      fs.file-max: "9223372036854775807"
      fs.aio-max-nr: "1048576"
      fs.inotify.max_user_instances: "4096"
      vm.max_map_count: "262144"
If asymmetric traffic is expected on some of the managed cluster
nodes, enable the loose mode for the corresponding interfaces on those
nodes by setting the net.ipv4.conf.<interface-name>.rp_filter
parameter to "2" in the kernelParameters.sysctl section.
For example:
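A minimal sketch, where bond0 stands for the affected interface name:
kernelParameters:
  sysctl:
    net.ipv4.conf.bond0.rp_filter: "2"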
Optional. Configure wiping of the target device or partition to be used
for cluster deployment as described in Wipe a device or partition.
Optional. Configure multiple devices for LVM volume using the example
template extract below for reference.
Caution
The following template extract contains only sections relevant
to LVM configuration with multiple PVs. Expand the main template
described in the previous step with the configuration below if required.
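A sketch of such an extract that builds one volume group from partitions on
two physical devices; the device selectors, sizes, and names are placeholders:
devices:
  - device:
      workBy: "by_id,by_wwn,by_path,by_name"
      minSize: 200Gi
      wipe: true
    partitions:
      - name: lvm_lvp_part1
        size: 0
        wipe: true
  - device:
      workBy: "by_id,by_wwn,by_path,by_name"
      minSize: 200Gi
      wipe: true
    partitions:
      - name: lvm_lvp_part2
        size: 0
        wipe: true
volumeGroups:
  - name: lvm_lvp
    devices:
      - partition: lvm_lvp_part1
      - partition: lvm_lvp_part2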
Optional. Technology Preview. Configure support of the Redundant Array of
Independent Disks (RAID) that allows, for example, installing a cluster
operating system on a RAID device. For details, refer to Configure RAID support.
Optional. Configure the RX/TX buffer size for physical network interfaces
and txqueuelen for any network interfaces.
This configuration can greatly benefit high-load and high-performance
network interfaces. You can configure these parameters using the udev
rules. For example:
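For example, a sketch of udev rules that could be placed under
/etc/udev/rules.d/, for instance by the postDeployScript of the host profile;
the interface match patterns and values are examples only:
# /etc/udev/rules.d/70-net-tuning.rules
# Increase RX/TX ring buffers on physical interfaces
ACTION=="add", SUBSYSTEM=="net", KERNEL=="eno*", RUN+="/usr/sbin/ethtool -G %k rx 4096 tx 4096"
# Increase txqueuelen on bond interfaces
ACTION=="add", SUBSYSTEM=="net", KERNEL=="bond*", RUN+="/usr/sbin/ip link set %k txqueuelen 10000"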
For a management cluster, proceed with the cluster bootstrap procedure
as described in Deploy a management cluster.
For a managed cluster, select from the following options:
Using the Container Cloud web UI
Available since MCC 2.26.0 (17.1.0 and 16.1.0)
Log in to the Container Cloud web UI with the operator
permissions.
Switch to the required non-default project using the
Switch Project action icon located on top of the main left-side
navigation panel.
Caution
Do not create a MOSK cluster in the default project
(Kubernetes namespace), which is dedicated for the management cluster only.
If no projects are defined, first create a new mosk project as described
in Create a project for MOSK clusters.
In the left sidebar, navigate to Baremetal and click
the Host Profiles tab.
Click Create Host Profile.
Fill out the Create host profile form:
Name
Name of the bare metal host profile.
Specification
BareMetalHostProfile object specification in the YAML format
that you have previously created. Click Edit to edit
the BareMetalHostProfile object if required.
Note
Before Container Cloud 2.28.0 (Cluster releases 17.3.0
and 16.3.0), the field name is YAML file, and you
can upload the required YAML file instead of inserting and
editing it.
Labels
Available since Container Cloud 2.28.0 (Cluster releases 17.3.0
and 16.3.0). Key-value pairs attached to BareMetalHostProfile.
Using the Container Cloud API
Add the bare metal host profile to your management cluster:
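For example, assuming the profile is saved as
templates/bm/compute-hostprofile.yaml (a hypothetical file name):
kubectl --kubeconfig <path-to-management-cluster-kubeconfig> create -f templates/bm/compute-hostprofile.yaml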
Create a volume group on top of the defined partition and create the
required number of logical volumes (LVs) on top of the created volume
group (VG). Add one logical volume per one Ceph OSD on the node.
Example snippet of an LVM configuration for a Ceph metadata disk:
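A sketch with two metadata LVs on a dedicated volume group; the device
selector, sizes, and the bluedb volume group name are placeholders chosen to
match the /dev/bluedb/meta_* paths mentioned below:
devices:
  - device:
      workBy: "by_id,by_wwn,by_path,by_name"
      minSize: 30Gi
      wipe: true
    partitions:
      - name: ceph_meta_part
        size: 0
        wipe: true
volumeGroups:
  - name: bluedb
    devices:
      - partition: ceph_meta_part
logicalVolumes:
  # one LV per Ceph OSD that uses this disk as a metadata device
  - name: meta_1
    size: 4Gi
    vg: bluedb
  - name: meta_2
    size: 4Gi
    vg: bluedb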
Plan LVs of a separate metadata device thoroughly.
Any logical volume misconfiguration causes redeployment of all
Ceph OSDs that use this disk as metadata devices.
Note
The general Ceph recommendation is to size a metadata device at
1% to 4% of the Ceph OSD data size. Mirantis highly
recommends at least 4% of the Ceph OSD data size.
If you plan to use a disk as a separate metadata device for 10 Ceph
OSDs, define the size of an LV for each Ceph OSD as 1% to
4% of the corresponding Ceph OSD data size. If RADOS Gateway is
enabled, the minimum metadata size must be 4%. For details, see
Ceph documentation: Bluestore config reference.
For example, if the total data size of 10 Ceph OSDs equals 1 TB
with 100 GB each, assign a metadata disk of at least 10 GB with
1 GB per each LV. The recommended size is 40 GB with 4 GB
per each LV.
After applying BareMetalHostProfile, the bare metal provider
creates an LVM partitioning for the metadata disk and places
these volumes as /dev paths, for example, /dev/bluedb/meta_1
or /dev/bluedb/meta_3.
Example template of a host profile configuration for Ceph
The BareMetalHostProfile API allows configuring a host to use the
huge pages feature of the Linux kernel on managed clusters. The procedure
included in this section applies to both new and existing cluster
deployments.
Note
Huge pages is a mode of operation of the Linux kernel. With huge
pages enabled, the kernel allocates the RAM in bigger chunks, or pages.
This allows kernel-based virtual machines and virtual machines running
on it to use the host RAM more efficiently and improves the performance
of the virtual machines.
To enable huge pages in a custom bare metal host profile for a managed
cluster:
Log in to the local machine where your management
cluster kubeconfig is located and where kubectl is installed.
Note
The management cluster kubeconfig is created automatically
during the last stage of the management cluster bootstrap.
Open for editing or create a new bare metal host profile
under the templates/bm/ directory.
Edit the grubConfig section of the host profile spec using
the example below to configure the kernel boot parameters and
enable huge pages:
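A sketch of such a grubConfig extract; the GRUB_CMDLINE_LINUX option is an
assumed place for the kernel boot parameters, and <N> is the number of 1 GB
pages to allocate:
grubConfig:
  defaultGrubOptions:
    - GRUB_DISABLE_RECOVERY="true"
    - GRUB_PRELOAD_MODULES=lvm
    - GRUB_TIMEOUT=20
    - GRUB_CMDLINE_LINUX="$GRUB_CMDLINE_LINUX hugepagesz=1G hugepages=<N>"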
The example configuration above allocates <N> huge pages of 1 GB each
at server boot. The last hugepagesz parameter value is used as the default
unless default_hugepagesz is defined. For details about possible values, see
official Linux kernel documentation.
Add the bare metal host profile to your management cluster:
During a management or MOSK cluster creation, you can
configure the support of the software-based Redundant Array of Independent
Disks (RAID) using BareMetalHostProfile to set up an LVM-based RAID level 1
(raid1) or an mdadm-based RAID level 0, 1, or 10 (raid0, raid1, or
raid10).
If required, you can further configure RAID in the same profile, for example,
to install a cluster operating system onto a RAID device.
Caution
RAID configuration on already provisioned bare metal machines
or on an existing cluster is not supported.
To start using any kind of RAID, you must reprovision the machines
with a new BareMetalHostProfile.
Mirantis supports the raid1 type of RAID devices both
for LVM and mdadm.
Mirantis supports the raid0 type for the mdadm RAID
to be on par with the LVM linear type.
Mirantis recommends having at least two physical disks
for raid0 and raid1 devices to prevent unnecessary
complexity.
Mirantis supports the raid10 type for mdadm RAID. At least
four physical disks are required for this type of RAID.
Only an even number of disks can be used for a raid1 or
raid10 device.
The EFI system partition partflags: ['esp'] must be
a physical partition in the main partition table of the disk, not under
LVM or mdadm software RAID.
During configuration of your custom bare metal host profile,
you can create an LVM-based software RAID device raid1 by adding
type: raid1 to the logicalVolume spec in BaremetalHostProfile.
The logicalVolume spec of the raid1 type requires at least
two devices (partitions) in volumeGroup where you build a logical
volume. For an LVM of the linear type, one device is enough.
You can use flexible size units throughout bare metal host profiles.
For example, you can now use either sizeGiB:0.1 or size:100Mi
when specifying a device size.
Mirantis recommends using only one parameter name type and units throughout
the configuration files. If both sizeGiB and size are used,
sizeGiB is ignored during deployment and the suffix is adjusted
accordingly. For example, 1.5Gi will be serialized as 1536Mi. The size
without units is counted in bytes. For example, size:120 means 120 bytes.
Note
The LVM raid1 requires additional space to store the raid1
metadata on a volume group, roughly 4 MB for each partition.
Therefore, you cannot create a logical volume of exactly the same
size as the partitions it works on.
For example, if you have two partitions of 10 GiB, the corresponding
raid1 logical volume size will be less than 10 GiB. For that
reason, you can either set size: 0 to use all available
space on the volume group, or set a smaller size than the partition
size. For example, use size: 9.9Gi instead of
size: 10Gi for the logical volume.
The following example illustrates an extract of BaremetalHostProfile
with / on the LVM raid1.
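A sketch of the relevant extract; the two lvm_root_part* partitions are
assumed to be defined on different physical disks in the devices section of
the same profile:
logicalVolumes:
  - name: root
    type: raid1
    size: 0
    vg: lvm_root
volumeGroups:
  - name: lvm_root
    devices:
      - partition: lvm_root_part1
      - partition: lvm_root_part2
fileSystems:
  - fileSystem: ext4
    logicalVolume: root
    mountPoint: /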
Warning
All data will be wiped during cluster deployment on devices
defined directly or indirectly in the fileSystems list of
BareMetalHostProfile. For example:
A raw device partition with a file system on it
A device partition in a volume group with a logical volume that has a
file system on it
An mdadm RAID device with a file system on it
An LVM RAID device with a file system on it
The wipe field is always considered true for these devices.
The false value is ignored.
Therefore, to prevent data loss, move the necessary data from these file
systems to another server beforehand, if required.
You can configure an LVM volume group on top of mdadm-based RAID devices as
physical volumes using the BareMetalHostProfile resource. List the
required RAID devices in a separate field of the volumeGroups definition
within the storage configuration of BareMetalHostProfile.
You can use flexible size units throughout bare metal host profiles.
For example, you can now use either sizeGiB:0.1 or size:100Mi
when specifying a device size.
Mirantis recommends using only one parameter name type and units throughout
the configuration files. If both sizeGiB and size are used,
sizeGiB is ignored during deployment and the suffix is adjusted
accordingly. For example, 1.5Gi will be serialized as 1536Mi. The size
without units is counted in bytes. For example, size:120 means 120 bytes.
Warning
All data will be wiped during cluster deployment on devices
defined directly or indirectly in the fileSystems list of
BareMetalHostProfile. For example:
A raw device partition with a file system on it
A device partition in a volume group with a logical volume that has a
file system on it
An mdadm RAID device with a file system on it
An LVM RAID device with a file system on it
The wipe field is always considered true for these devices.
The false value is ignored.
Therefore, to prevent data loss, move the necessary data from these file
systems to another server beforehand, if required.
The following example illustrates an extract of BaremetalHostProfile with
a volume group named lvm_nova to be created on top of an mdadm-based
RAID device raid1:
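A sketch only; the softRaidDevice reference inside volumeGroups and the
/dev/md0 device name are assumptions, so verify the exact field names against
the Container Cloud API Reference: BareMetalHostProfile:
softRaidDevices:
  - name: /dev/md0
    level: raid1
    devices:
      - partition: md_nova_part1
      - partition: md_nova_part2
volumeGroups:
  - name: lvm_nova
    devices:
      # assumed field name for referencing an mdadm device
      - softRaidDevice: /dev/md0
logicalVolumes:
  - name: nova
    size: 0
    vg: lvm_nova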
Create an mdadm software RAID (raid0, raid1, raid10)¶
TechPreview
Warning
The EFI system partition partflags: ['esp'] must be
a physical partition in the main partition table of the disk, not under
LVM or mdadm software RAID.
During configuration of your custom bare metal host profile as described in
Create a custom bare metal host profile, you can create an mdadm-based software RAID
device of the raid0 or raid1 type by describing the mdadm devices under the
softRaidDevices field in BaremetalHostProfile. For example:
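A minimal sketch; the partition names are placeholders, and the partitions
themselves must be defined on separate disks in the devices section of the
same profile:
softRaidDevices:
  - name: /dev/md0
    level: raid1
    devices:
      - partition: md_root_part1
      - partition: md_root_part2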
You can also use the raid10 type for the mdadm-based software RAID devices.
This type requires at least four storage devices available on your servers,
and the total number of member devices must be even. For example:
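A minimal raid10 sketch with four member partitions, one per physical disk
(names are placeholders):
softRaidDevices:
  - name: /dev/md0
    level: raid10
    devices:
      - partition: md_part1
      - partition: md_part2
      - partition: md_part3
      - partition: md_part4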
The following fields in softRaidDevices describe RAID devices:
name
Name of the RAID device to refer to throughout the baremetalhostprofile.
level
Type or level of RAID used to create a device, defaults to raid1.
Set to raid0 or raid10 to create a device of the corresponding type.
devices
List of physical devices or partitions used to build a software RAID device.
It must include at least two partitions or devices to build raid0 and
raid1 devices, and at least four for raid10.
The mdadm RAID devices cannot be created on top of LVM devices.
You can use flexible size units throughout bare metal host profiles.
For example, you can now use either sizeGiB:0.1 or size:100Mi
when specifying a device size.
Mirantis recommends using only one parameter name type and units throughout
the configuration files. If both sizeGiB and size are used,
sizeGiB is ignored during deployment and the suffix is adjusted
accordingly. For example, 1.5Gi will be serialized as 1536Mi. The size
without units is counted in bytes. For example, size:120 means 120 bytes.
Warning
All data will be wiped during cluster deployment on devices
defined directly or indirectly in the fileSystems list of
BareMetalHostProfile. For example:
A raw device partition with a file system on it
A device partition in a volume group with a logical volume that has a
file system on it
An mdadm RAID device with a file system on it
An LVM RAID device with a file system on it
The wipe field is always considered true for these devices.
The false value is ignored.
Therefore, to prevent data loss, move the necessary data from these file
systems to another server beforehand, if required.
The following example illustrates an extract of BaremetalHostProfile
with / on the mdadm raid1 and some data storage on raid0:
Example with / on the mdadm raid1 and data storage on raid0
The EFI system partition partflags: ['esp'] must be
a physical partition in the main partition table of the disk, not under
LVM or mdadm software RAID.
You can deploy MOSK on local software-based Redundant Array
of Independent Disks (RAID) devices to withstand failure of one device at a
time.
Using a custom bare metal host profile, you can configure and create
an mdadm-based software RAID device of type raid10 if you have
an even number of devices available on your servers. At least four
storage devices are required for such RAID device.
During configuration of your custom bare metal host profile as described in
Create a custom bare metal host profile, create an mdadm-based software RAID device
raid10 by describing the mdadm devices under the softRaidDevices
field. For example:
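A sketch with the root file system placed on a raid10 device built from four
partitions, one per physical disk; the names are placeholders, and the
softRaidDevice reference in fileSystems is an assumed field name to verify
against the BareMetalHostProfile API Reference:
softRaidDevices:
  - name: /dev/md0
    level: raid10
    devices:
      - partition: md_root_part1
      - partition: md_root_part2
      - partition: md_root_part3
      - partition: md_root_part4
fileSystems:
  - fileSystem: ext4
    # assumed field name for referencing an mdadm device
    softRaidDevice: /dev/md0
    mountPoint: /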
The following fields in softRaidDevices describe RAID devices:
name
Name of the RAID device to refer to throughout the baremetalhostprofile.
devices
List of physical devices or partitions used to build a software RAID device.
It must include at least four partitions or devices to build a raid10
device.
level
Type or level of RAID used to create a device. Set to raid10 or raid1
to create a device of the corresponding type.
When building the raid10 array on top of device partitions,
make sure that only one partition per device is used for a given array.
Although having two partitions located on the same physical device as array
members is technically possible, it may lead to data loss if
mdadm selects both partitions of the same drive to be mirrored.
In such case, redundancy against entire drive failure is lost.
Warning
All data will be wiped during cluster deployment on devices
defined directly or indirectly in the fileSystems list of
BareMetalHostProfile. For example:
A raw device partition with a file system on it
A device partition in a volume group with a logical volume that has a
file system on it
An mdadm RAID device with a file system on it
An LVM RAID device with a file system on it
The wipe field is always considered true for these devices.
The false value is ignored.
Therefore, to prevent data loss, move the necessary data from these file
systems to another server beforehand, if required.
With L2 networking templates, you can create MOSK
clusters with advanced host networking configurations. For example,
you can create bond interfaces on top of physical interfaces on the
host or use multiple subnets to separate different types of network
traffic.
You can use several host-specific L2 templates per one cluster
to support different hardware configurations. For example, you can create
L2 templates with a different number and layout of NICs to be applied
to specific machines of one cluster.
You can also use multiple L2 templates to support different roles
for nodes in a MOSK installation. You can create L2
templates with different logical interfaces and assign them to individual
machines based on their roles in a MOSK cluster.
Caution
Modification of L2 templates in use is only allowed with a
mandatory validation step from the infrastructure operator to prevent
accidental cluster failures due to unsafe changes. The list of risks posed
by modifying L2 templates includes:
Services running on hosts cannot reconfigure automatically to switch to
the new IP addresses and/or interfaces.
Connections between services are interrupted unexpectedly, which can cause
data loss.
Incorrect configurations on hosts can lead to irrevocable loss of
connectivity between services and unexpected cluster partition or
disassembly.
Since MOSK 23.2.2, in the Technology Preview scope, you can
create a MOSK cluster with the multi-rack topology,
where cluster nodes including Kubernetes masters are distributed across
multiple racks without L2 layer extension between them, and use BGP for
announcement of the cluster API load balancer address and external addresses
of Kubernetes load-balanced services.
Implementation of the multi-rack topology implies the use of Rack and
MultiRackCluster objects that support configuration of BGP announcement
of the cluster API load balancer address. For the configuration procedure,
refer to Configure BGP announcement for cluster API LB address. For configuring the BGP announcement of
external addresses of Kubernetes load-balanced services, refer to
Configure MetalLB.
Follow the procedures described in the below subsections to configure initial
settings and advanced network objects for your managed clusters.
This section instructs you on how to configure and deploy a managed cluster
that is based on the baremetal-based management cluster
through the Mirantis Container Cloud web UI.
Note
Due to the known issue 50181, creation of a
compact managed cluster or addition of any labels to the control plane nodes
is not available through the Container Cloud web UI.
To create a managed cluster on bare metal:
Available since the Cluster release 16.1.0 on the management cluster.
If you plan to deploy a large managed cluster, enable dynamic IP allocation
to increase the number of bare metal hosts that can be provisioned in parallel.
For details, see Container Cloud Deployment Guide: Enable dynamic IP
allocation.
Available since Container Cloud 2.24.0 (Cluster release 14.0.0). Optional.
Technology Preview. Enable custom host names for cluster machines.
When enabled, any machine host name in a particular region matches the related
Machine object name. For example, instead of the default
kaas-node-<UID>, a machine host name will be master-0. The custom
naming format is more convenient and easier to operate with.
Skip this step if you enabled this feature during management cluster bootstrap,
because custom host names will be automatically enabled on the related managed
cluster as well.
Log in to the Container Cloud web UI with the writer permissions.
Switch to the required non-default project using the
Switch Project action icon located on top of the main left-side
navigation panel.
Caution
Do not create a MOSK cluster in the default project
(Kubernetes namespace), which is dedicated for the management cluster only.
If no projects are defined, first create a new mosk project as described
in Create a project for MOSK clusters.
In the SSH keys tab, click Add SSH Key
to upload the public SSH key that will be used for the SSH access to VMs.
Optional. In the Proxies tab, enable proxy access
to the managed cluster:
Click Add Proxy.
In the Add New Proxy wizard, fill out the form
with the following parameters:
Optional. Technology Preview. Deprecated since Container Cloud 2.29.0
(Cluster releases 17.4.0 and 16.4.0). Available since Container Cloud
2.24.0 (Cluster release 14.0.0). Enable WireGuard for traffic encryption
on the Kubernetes workloads network.
WireGuard configuration
Ensure that the Calico MTU size is at least 60 bytes smaller than
the interface MTU size of the workload network. IPv4 WireGuard uses
a 60-byte header. For details, see Set the MTU size for Calico.
Enable WireGuard by selecting the Enable WireGuard
check box.
Caution
Changing this parameter on a running cluster causes a
downtime that can vary depending on the cluster size.
Note
This parameter was renamed from
Enable Secure Overlay to Enable WireGuard
in Container Cloud 2.25.0 (Cluster releases 17.0.0 and 16.0.0).
Parallel Upgrade Of Worker Machines
Optional. Available since Container Cloud 2.25.0 (Cluster releases
17.0.0 and 16.0.0).
The maximum number of worker nodes to update simultaneously. It serves as
an upper limit on the number of machines that are drained at a given moment
of time. Defaults to 1.
You can also configure this option after deployment before
the cluster update.
Parallel Preparation For Upgrade Of Worker Machines
Optional. Available since Container Cloud 2.25.0 (Cluster releases
17.0.0 and 16.0.0)
The maximum number of worker nodes being prepared at a given moment of time,
which includes downloading of new artifacts. It serves as a limit for the
network load that can occur when downloading the files to the nodes.
Defaults to 50.
You can also configure this option after deployment before
the cluster update.
Provider
LB host IP
The IP address of the load balancer endpoint that will be used to
access the Kubernetes API of the new cluster. This IP address
must be in the LCM network if a separate LCM network is in use and
if L2 (ARP) announcement of cluster API load balancer IP is in use.
LB address range Removed in 24.3
The range of IP addresses that can be assigned to load balancers
for Kubernetes Services by MetalLB. For a more flexible MetalLB
configuration, refer to Configure MetalLB.
Note
Since MOSK 24.3, MetalLB configuration
must be added after cluster creation.
Kubernetes
Services CIDR blocks
The Kubernetes Services CIDR blocks.
For example, 10.233.0.0/18.
Pods CIDR blocks
The Kubernetes pods CIDR blocks.
For example, 10.233.64.0/18.
Note
The network subnet size of Kubernetes pods influences
the number of nodes that can be deployed in the cluster.
The default subnet size /18 is enough to create a cluster with
up to 256 nodes. Each node uses a /26 address block
(64 addresses); at least one address block is allocated per node.
These addresses are used by the Kubernetes pods with
hostNetwork:false. The cluster size may be limited further when
some nodes use more than one address block.
Configure StackLight:
Note
If StackLight is enabled in non-HA mode but Ceph is not
deployed yet, StackLight will not be installed and will be stuck in
the Yellow state waiting for a successful Ceph installation. Once
the Ceph cluster is deployed, the StackLight installation resumes.
To deploy a Ceph cluster, refer to Add a Ceph cluster.
StackLight configuration
Section
Parameter name
Description
StackLight
Enable Monitoring
Selected by default. Deselect to skip StackLight deployment.
Enable Logging
Select to deploy the StackLight logging stack. For details about the
logging components, see Deployment architecture.
Note
The logging mechanism performance depends on the cluster log
load. In case of a high load, you may need to increase the default
resource requests and limits for fluentdLogs. For details, see
StackLight resource limits.
HA Mode
Select to enable StackLight monitoring in High Availability (HA) mode.
For differences between HA and non-HA modes, see Deployment architecture.
If disabled, StackLight requires a Ceph cluster. To deploy a Ceph cluster,
refer to Add a Ceph cluster.
StackLight Default Logs Severity Level
Log severity (verbosity) level for all StackLight components.
The default value for this parameter is Default, which respects
the original log level defaults of each StackLight component.
For details about severity levels, see StackLight log verbosity.
StackLight Component Logs Severity Level
The severity level of logs for a specific StackLight component that
overrides the value of the
StackLight Default Logs Severity Level parameter.
For details about severity levels, see StackLight log verbosity.
Expand the drop-down menu for a specific component to display
its list of available log levels.
OpenSearch
Logstash Retention Time Removed in MOSK 24.1
Available if you select Enable Logging. Specifies the
logstash-* index retention time.
Events Retention Time
Available if you select Enable Logging. Specifies the
kubernetes_events-* index retention time.
Notifications Retention Time
Available if you select Enable Logging. Specifies the
notification-* index retention time.
Persistent Volume Claim Size
Available if you select Enable Logging.
The OpenSearch persistent volume claim size.
Collected Logs Severity Level
Available if you select Enable Logging.
The minimum severity of all Container Cloud components logs
collected in OpenSearch.
For details about severity levels, see StackLight logging.
Prometheus
Retention Time
The Prometheus database retention period.
Retention Size
The Prometheus database retention size.
Persistent Volume Claim Size
The Prometheus persistent volume claim size.
Enable Watchdog Alert
Select to enable the Watchdog alert that fires
as long as the entire alerting pipeline is functional.
Custom Alerts
Specify alerting rules for new custom alerts or upload a YAML file
in the following exemplary format:
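The sketch below assumes a standard Prometheus-style alerting rule; the alert name, expression, threshold, and labels are illustrative only.

- alert: ExampleAlert
  annotations:
    description: Alert description
    summary: Alert summary
  expr: example_metric > 0   # illustrative expression; replace with your own metric query
  for: 5m
  labels:
    severity: warning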
Available since Container Cloud 2.24.0 (Cluster releases 14.0.0 and 15.0.1).
Optional. Technology Preview. Enable the Linux Audit daemon auditd
to monitor activity of cluster processes and prevent potential malicious
activity.
Configuration for auditd
In the Cluster object or cluster.yaml.template, add the auditd
parameters:
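The placement below is a sketch only; the auditd parameters are expected to be nested under the bare metal provider value of the Cluster spec, but verify the exact path against the Container Cloud API Reference for your release.

spec:
  providerSpec:
    value:
      audit:                       # assumed placement; confirm for your release
        auditd:
          enabled: true
          enabledAtBoot: true
          maxLogFile: 30           # illustrative value, in MB
          maxLogFileAction: rotate
          presetRules: docker,identity,logins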
enabled
Boolean, default - false. Enables the auditd role to install the
auditd packages and configure rules. CIS rules: 4.1.1.1, 4.1.1.2.
enabledAtBoot
Boolean, default - false. Configures grub to audit processes that can
be audited even if they start up prior to auditd startup. CIS rule:
4.1.1.3.
backlogLimit
Integer, default - none. Configures the backlog to hold records. If during
boot audit=1 is configured, the backlog holds 64 records. If more than
64 records are created during boot, auditd records will be lost with a
potential malicious activity being undetected. CIS rule: 4.1.1.4.
maxLogFile
Integer, default - none. Configures the maximum size of the audit log file.
Once the log reaches the maximum size, it is rotated and a new log file is
created. CIS rule: 4.1.2.1.
maxLogFileAction
String, default - none. Defines handling of the audit log file reaching the
maximum file size. Allowed values:
keep_logs - rotate logs but never delete them
rotate - add a cron job to compress rotated log files and keep
maximum 5 compressed files.
compress - compress log files and keep them under the
/var/log/auditd/ directory. Requires
auditd_max_log_file_keep to be enabled.
CIS rule: 4.1.2.2.
maxLogFileKeep
Integer, default - 5. Defines the number of compressed log files to
keep under the /var/log/auditd/ directory. Requires
auditd_max_log_file_action=compress. CIS rules - none.
mayHaltSystem
Boolean, default - false. Halts the system when the audit logs are
full. Applies the following configuration:
space_left_action=email
action_mail_acct=root
admin_space_left_action=halt
CIS rule: 4.1.2.3.
customRules
String, default - none. Base64-encoded content of the 60-custom.rules
file for any architecture. CIS rules - none.
customRulesX32
String, default - none. Base64-encoded content of the 60-custom.rules
file for the i386 architecture. CIS rules - none.
customRulesX64
String, default - none. Base64-encoded content of the 60-custom.rules
file for the x86_64 architecture. CIS rules - none.
presetRules
String, default - none. Comma-separated list of the following built-in
preset rules:
access
actions
delete
docker
identity
immutable
logins
mac-policy
modules
mounts
perm-mod
privileged
scope
session
system-locale
time-change
Since Container Cloud 2.28.0 (Cluster releases 17.3.0 and 16.3.0) in the
Technology Preview scope, you can collect some of the preset rules
indicated above as groups and use them in presetRules:
ubuntu-cis-rules - this group contains rules to comply with the
Ubuntu CIS Benchmark recommendations, including the following CIS Ubuntu
20.04 v2.0.1 rules:
scope - 5.2.3.1
actions - same as 5.2.3.2
time-change - 5.2.3.4
system-locale - 5.2.3.5
privileged - 5.2.3.6
access - 5.2.3.7
identity - 5.2.3.8
perm-mod - 5.2.3.9
mounts - 5.2.3.10
session - 5.2.3.11
logins - 5.2.3.12
delete - 5.2.3.13
mac-policy - 5.2.3.14
modules - 5.2.3.19
docker-cis-rules - this group contains rules to comply with
Docker CIS Benchmark recommendations, including the Docker CIS
v1.6.0 rules 1.1.3 - 1.1.18.
You can also use two additional keywords inside presetRules:
none - select no built-in rules.
all - select all built-in rules. When using this keyword, you can add
the ! prefix to a rule name to exclude some rules. You can use the
! prefix for rules only if you add the all keyword as the
first rule. Place a rule with the ! prefix only after
the all keyword.
Example configurations:
presetRules:none - disable all preset rules
presetRules:docker - enable only the docker rules
presetRules:access,actions,logins - enable only the
access, actions, and logins rules
presetRules:ubuntu-cis-rules - enable all rules from the
ubuntu-cis-rules group
presetRules:docker-cis-rules,actions - enable all rules from
the docker-cis-rules group and the actions rule
presetRules:all - enable all preset rules
presetRules:all,!immutable,!session - enable all preset
rules except immutable and session
CIS controls
4.1.3 (time-change)
4.1.4 (identity)
4.1.5 (system-locale)
4.1.6 (mac-policy)
4.1.7 (logins)
4.1.8 (session)
4.1.9 (perm-mod)
4.1.10 (access)
4.1.11 (privileged)
4.1.12 (mounts)
4.1.13 (delete)
4.1.14 (scope)
4.1.15 (actions)
4.1.16 (modules)
4.1.17 (immutable)
Docker CIS controls
1.1.4
1.1.8
1.1.10
1.1.12
1.1.13
1.1.15
1.1.16
1.1.17
1.1.18
1.2.3
1.2.4
1.2.5
1.2.6
1.2.7
1.2.10
1.2.11
Optional. Colocate the OpenStack control plane with the managed cluster
Kubernetes manager nodes by adding the following field to the Cluster
object spec:
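The sketch below shows the expected shape of this setting; the field name is an assumption and may differ between releases, so verify it against the Cluster object reference before use.

spec:
  providerSpec:
    value:
      dedicatedControlPlane: false   # assumption: field name may differ per release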
This feature is available as technical preview. Use such
configuration for testing and evaluation purposes only.
Optional. Customize MetalLB speakers that are deployed on all Kubernetes
nodes except master nodes by default. For details, see
Configure the MetalLB speaker node selector.
Configure the MetalLB parameters related to IP address allocation and
announcement for load-balanced cluster services. For details, see
Configure MetalLB.
This section describes how to set up and verify MetalLB parameters before
configuring subnets for a MOSK cluster.
Caution
This section also applies to the bootstrap procedure of a
management cluster with the following differences:
Instead of the Cluster object, configure
templates/bm/cluster.yaml.template.
Instead of the MetalLBConfig object, configure
templates/bm/metallbconfig.yaml.template.
Instead of creating specific IPAM objects such as Subnet and
L2Template (as well as Rack and MultiRackCluster when using
BGP configuration), add their settings to
templates/bm/ipam-objects.yaml.template.
Configuration rules for the MetalLBConfig object
Caution
The use of the MetalLBConfig object is mandatory after your
management cluster upgrade to the Cluster release 16.0.0.
The following rules and requirements apply to configuration of the
MetalLBConfig object:
Define one MetalLBConfig object per cluster.
Define the following mandatory labels:
cluster.sigs.k8s.io/cluster-name
Specifies the cluster name where the MetalLB address pool is used.
kaas.mirantis.com/region
Specifies the region name of the cluster where the MetalLB address pool is
used.
kaas.mirantis.com/provider
Specifies the provider of the cluster where the MetalLB address pool is used.
Note
The kaas.mirantis.com/region label is removed from all
Container Cloud and MOSK objects in 24.1.
Therefore, do not add the label starting with these releases. On existing
clusters updated to these releases, or if added manually, Container Cloud
ignores this label.
Intersection of IP address ranges within any single MetalLB address pool
is not allowed.
At least one MetalLB address pool must have the auto-assign policy enabled
so that unannotated services can have load balancer IP addresses allocated
for them.
When configuring multiple address pools with the auto-assign policy enabled,
keep in mind that it is not determined in advance which pool of those
multiple address pools is used to allocate an IP address for a particular
unannotated service.
Note
You can optimize address announcement for load-balanced services
using the interfaces selector for the l2Advertisements object. This
selector allows for address announcement only on selected host interfaces.
For details, see Container Cloud API Reference: MetalLBConfig spec.
Configuration rules for MetalLBConfigTemplate (obsolete since 24.2)
Caution
The MetalLBConfigTemplate object is deprecated since
MOSK 24.2 and unsupported since
MOSK 24.3. For details, see
Deprecation notes.
All rules described above for MetalLBConfig also apply to
MetalLBConfigTemplate.
Optional. Define one MetalLBConfigTemplate object per cluster.
The use of this object without MetalLBConfig is not allowed.
When using MetalLBConfigTemplate:
MetalLBConfig must reference MetalLBConfigTemplate by name:
spec:
  templateName: <managed-metallb-template>
You can use Subnet objects for defining MetalLB address pools.
Refer to MetalLB configuration guidelines for subnets for guidelines on configuring
MetalLB address pools using Subnet objects.
You can optimize address announcement for load-balanced services using
the interfaces selector for the l2Advertisements object. This
selector allows for address announcement only on selected host
interfaces. For details, see Container Cloud API MetalLBConfigTemplate
spec.
The BGP configuration is not yet supported in the Container Cloud
web UI. In the meantime, use the CLI for this purpose. For details, see
Configure and verify MetalLB using the CLI.
Optional. Configure parameters related to the MetalLB component life cycle,
such as deployment and update, using the metallb Helm chart values in
the Cluster spec section.
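For example, assuming the standard helmReleases layout of the Cluster spec (the speaker values shown are illustrative only):

spec:
  providerSpec:
    value:
      helmReleases:
      - name: metallb
        values:
          speaker:
            tolerations:                                   # illustrative values
            - key: node-role.kubernetes.io/control-plane
              effect: NoSchedule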
Log in to the Container Cloud web UI with the writer permissions.
Switch to the required non-default project using the
Switch Project action icon located on top of the main left-side
navigation panel.
Caution
Do not create a MOSK cluster in the default project
(Kubernetes namespace), which is dedicated for the management cluster only.
If no projects are defined, first create a new mosk project as described
in Create a project for MOSK clusters.
In the Networks section, click the MetalLB Configs
tab.
Click Create MetalLB Config.
Fill out the Create MetalLB Config form as required:
Name
Name of the MetalLB object being created.
Cluster
Name of the cluster that the MetalLB object is being created
for
IP Address Pools
List of MetalLB IP address pool descriptions that will be used to create
the MetalLB IPAddressPool objects. Click the + button on
the right side of the section to add more objects.
Name
IP address pool name.
Addresses
Comma-separated ranges of the IP addresses included into the address
pool.
Auto Assign
Enable auto-assign policy for unannotated services to have load
balancer IP addresses allocated to them. At least one MetalLB address
pool must have the auto-assign policy enabled.
Service Allocation
IP address pool allocation to services. Click Edit to
insert a service allocation object with required label selectors for
services in the YAML format. For example:
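The sketch below uses the upstream MetalLB serviceAllocation format; the priority value and service selector labels are illustrative only.

serviceAllocation:
  priority: 100                          # illustrative priority
  serviceSelectors:
  - matchLabels:
      app.kubernetes.io/name: example-app   # illustrative label selector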
L2 Advertisements
List of L2 advertisement descriptions that will be used to create the
MetalLB L2Advertisement objects.
The l2Advertisements object allows defining interfaces to optimize
the announcement. When you use the interfaces selector, LB addresses
are announced only on selected host interfaces.
Mirantis recommends using the interfaces selector if nodes use separate
host networks for different types of traffic. The pros of such configuration
are as follows: less spam on other interfaces and networks and limited chances
to reach IP addresses of load-balanced services from irrelevant interfaces and
networks.
Caution
Interface names in the interfaces list must match those
on the corresponding nodes.
Add the following parameters:
Name
Name of the l2Advertisements object.
Interfaces
Optional. Comma-separated list of interface names that must match
the ones on the corresponding nodes. These names are defined in
L2 templates that are linked to the selected cluster.
IP Address Pools
Select the IP address pool to use for the l2Advertisements
object.
Node Selectors
Optional. Match labels and values for the Kubernetes node selector
to limit the nodes announced as next hops for the LoadBalancer
IP. If you do not provide any labels, all nodes are announced as
next hops.
In Networks > MetalLB Configs, verify the status of the created
MetalLB object:
Ready - object is operational.
Error - object is non-operational. Hover over the status
to obtain details of the issue.
Note
To verify the object details, in
Networks > MetalLB Configs, click the More action
icon in the last column of the required object section and select
MetalLB Config info.
Proceed to creating cluster subnets as described in
Create subnets.
Optional. Configure parameters related to the MetalLB component life cycle,
such as deployment and update, using the metallb Helm chart values in
the Cluster spec section, as shown in the example above.
In the Technology Preview scope, you can use BGP for announcement of
external addresses of Kubernetes load-balanced services for a
MOSK cluster. To configure the BGP announcement mode
for MetalLB, use the MetalLBConfig object.
The use of BGP is required to announce IP addresses for load-balanced
services when using MetalLB on nodes that are distributed across
multiple racks. In this case, you must set rack-id labels on nodes;
these labels are used in node selectors for the BGPPeer,
BGPAdvertisement, or both MetalLB objects to properly configure
BGP connections from each node.
Configuration example of the Machine object for the
BGP announcement mode
apiVersion: cluster.k8s.io/v1alpha1
kind: Machine
metadata:
  name: test-cluster-compute-1
  namespace: mosk-ns
  labels:
    cluster.sigs.k8s.io/cluster-name: test-cluster
    ipam/RackRef: rack-1                    # reference to the "rack-1" Rack
    kaas.mirantis.com/provider: baremetal
spec:
  providerSpec:
    value:
      ...
      nodeLabels:
      - key: rack-id    # node label can be used in "nodeSelectors" inside
        value: rack-1   # "BGPPeer" and/or "BGPAdvertisement" MetalLB objects
      ...
Configuration example of the MetalLBConfig
object for the BGP announcement mode
apiVersion: ipam.mirantis.com/v1alpha1
kind: MetalLBConfig
metadata:
  name: test-cluster-metallb-config
  namespace: mosk-ns
  labels:
    cluster.sigs.k8s.io/cluster-name: test-cluster
    kaas.mirantis.com/provider: baremetal
spec:
  ...
  bgpPeers:
  - name: svc-peer-1
    spec:
      holdTime: 0s
      keepaliveTime: 0s
      peerAddress: 10.77.42.1
      peerASN: 65100
      myASN: 65101
      nodeSelectors:
      - matchLabels:
          rack-id: rack-1   # references the nodes having
                            # the "rack-id=rack-1" label
  bgpAdvertisements:
  - name: services
    spec:
      aggregationLength: 32
      aggregationLengthV6: 128
      ipAddressPools:
      - services
      peers:
      - svc-peer-1
  ...
The bgpPeers and bgpAdvertisements fields are used to
configure BGP announcement instead of l2Advertisements.
The use of BGP for announcement also allows for better balancing
of service traffic between cluster nodes as well as gives more
configuration control and flexibility for infrastructure administrators.
For configuration examples, refer to Examples of MetalLBConfig.
For configuration procedure, refer to Configure BGP announcement for cluster API LB address.
Since MOSK 23.2
Select from the following options:
Deprecated since MOSK 24.2 and unsupported since
MOSK 24.3.
Mandatory after a management cluster upgrade to the Cluster release
16.0.0. Recommended and default since MOSK 23.2 in
the Technology Preview scope.
Create MetalLBConfig and MetalLBConfigTemplate objects.
This method allows using the Subnet object to define MetalLB
address pools.
Since MOSK 23.2.2, in the Technology Preview scope,
you can use BGP for announcement of external addresses of Kubernetes
load-balanced services for a MOSK cluster.
To configure the BGP announcement mode for MetalLB, use
MetalLBConfig and MetalLBConfigTemplate objects.
The use of BGP is required to announce IP addresses for load-balanced
services when using MetalLB on nodes that are distributed across
multiple racks. In this case, you must set rack-id labels on nodes;
these labels are used in node selectors for the BGPPeer,
BGPAdvertisement, or both MetalLB objects to properly configure
BGP connections from each node.
Configuration example of the Machine object for the
BGP announcement mode
apiVersion: cluster.k8s.io/v1alpha1
kind: Machine
metadata:
  name: test-cluster-compute-1
  namespace: mosk-ns
  labels:
    cluster.sigs.k8s.io/cluster-name: test-cluster
    ipam/RackRef: rack-1                    # reference to the "rack-1" Rack
    kaas.mirantis.com/provider: baremetal
    kaas.mirantis.com/region: region-one
spec:
  providerSpec:
    value:
      ...
      nodeLabels:
      - key: rack-id    # node label can be used in "nodeSelectors" inside
        value: rack-1   # "BGPPeer" and/or "BGPAdvertisement" MetalLB objects
      ...
Note
The kaas.mirantis.com/region label is removed from all
Container Cloud and MOSK objects in 24.1.
Therefore, do not add the label starting with these releases. On existing
clusters updated to these releases, or if added manually, Container Cloud
ignores this label.
Configuration example of the MetalLBConfigTemplate
object for the BGP announcement mode
apiVersion: ipam.mirantis.com/v1alpha1
kind: MetalLBConfigTemplate
metadata:
  name: test-cluster-metallb-config-template
  namespace: mosk-ns
  labels:
    cluster.sigs.k8s.io/cluster-name: test-cluster
    kaas.mirantis.com/provider: baremetal
    kaas.mirantis.com/region: region-one
spec:
  templates:
    ...
    bgpPeers: |
      - name: svc-peer-1
        spec:
          peerAddress: 10.77.42.1
          peerASN: 65100
          myASN: 65101
          nodeSelectors:
          - matchLabels:
              rack-id: rack-1   # references the nodes having
                                # the "rack-id=rack-1" label
    bgpAdvertisements: |
      - name: services
        spec:
          ipAddressPools:
          - services
          peers:
          - svc-peer-1
    ...
Note
The kaas.mirantis.com/region label is removed from all
Container Cloud and MOSK objects in 24.1.
Therefore, do not add the label starting with these releases. On existing
clusters updated to these releases, or if added manually, Container Cloud
ignores this label.
The bgpPeers and bgpAdvertisements fields are used to
configure BGP announcement instead of l2Advertisements.
The use of BGP for announcement also allows for better balancing
of service traffic between cluster nodes as well as gives more
configuration control and flexibility for infrastructure administrators.
For configuration examples, refer to Examples of MetalLBConfigTemplate.
For configuration procedure, refer to Configure BGP announcement for cluster API LB address.
Not recommended. Configure the configInline value in the MetalLB
chart of the Cluster object.
Warning
This option is deprecated since MOSK
23.2 and is removed during the management cluster upgrade to the
Cluster release 16.0.0, which is introduced in Container Cloud 2.25.0.
Therefore, this option becomes unavailable on MOSK
23.2 clusters after the parent management cluster upgrade to 2.25.0.
Not recommended. Configure the Subnet objects without
MetalLBConfigTemplate.
Warning
This option is deprecated since MOSK
23.2 and is removed during the management cluster upgrade to the
Cluster release 16.0.0, which is introduced in Container Cloud 2.25.0.
Therefore, this option becomes unavailable on MOSK
23.2 clusters after the parent management cluster upgrade to 2.25.0.
Caution
If the MetalLBConfig object is not used for MetalLB
configuration related to address allocation and announcement for
load-balanced services, then automated migration applies during
cluster creation or update to MOSK 23.2.
During automated migration, the MetalLBConfig and
MetalLBConfigTemplate objects are created, and the contents of the MetalLB
chart configInline value are converted to the parameters of the
MetalLBConfigTemplate object.
Any change to the configInline value made on a
MOSK 23.2 cluster will be reflected in the
MetalLBConfigTemplate object.
This automated migration is removed during your management cluster
upgrade to the Cluster release 16.0.0, which is introduced in Container
Cloud 2.25.0, together with the possibility to use the configInline
value of the MetalLB chart. After that, any changes in MetalLB
configuration related to address allocation and announcement for
load-balanced services are applied using the MetalLBConfig,
MetalLBConfigTemplate, and Subnet objects only.
Configure the configInline value for the MetalLB chart in the
Cluster object.
Configure both the configInline value for the MetalLB chart and
Subnet objects.
The resulting MetalLB address pools configuration will contain address
ranges from both cluster specification and Subnet objects.
All address ranges for L2 address pools will be aggregated into a single
L2 address pool and sorted as strings.
Changes to be applied since MOSK 23.2
The configuration options above become deprecated since 23.2, and
automated migration of MetalLB parameters applies during cluster creation
or update to MOSK 23.2.
During automated migration, the MetalLBConfig and
MetalLBConfigTemplate objects are created, and the contents of the
MetalLB chart configInline value are converted to the parameters of
the MetalLBConfigTemplate object.
Any change to the configInline value made on a
MOSK 23.2 cluster will be reflected in the
MetalLBConfigTemplate object.
This automated migration is removed during your management cluster
upgrade to Container Cloud 2.25.0 together with the possibility to use
the configInline value of the MetalLB chart. After that, any
changes in MetalLB configuration related to address allocation and
announcement for load-balanced services will be applied using the
MetalLBConfigTemplate and Subnet objects only.
Verify the current MetalLB configuration:
Since MOSK 22.5
Verify the MetalLB configuration that is stored in MetalLB objects:
The auto-assign parameter will be set to false for all address
pools except the default one. So, a particular service will get an
address from such an address pool only if the Service object has a
special metallb.universe.tf/address-pool annotation that points to
the specific address pool name.
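For example, a Service that requests an address from a specific pool carries the annotation as in the following sketch (the service and pool names are illustrative):

apiVersion: v1
kind: Service
metadata:
  name: example-ui                                  # illustrative service name
  namespace: example-ns
  annotations:
    metallb.universe.tf/address-pool: services-pxe  # name of the target address pool
spec:
  type: LoadBalancer
  selector:
    app: example-ui
  ports:
  - port: 443
    targetPort: 8443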
Note
It is expected that every Kubernetes service on a management
cluster will be assigned to one of the address pools. Current
consideration is to have two MetalLB address pools:
services-pxe is a reserved address pool name to use for the
Kubernetes services in the PXE network (Ironic API, HTTP server,
caching server).
default is an address pool to use for all other Kubernetes services
in the management network. No annotation is required on the Service
objects in this case.
Proceed to creating cluster subnets as described in
Create subnets.
By default, MetalLB speakers are deployed on all Kubernetes nodes except
master nodes.
You can configure MetalLB to run its speakers on a particular set of nodes.
This decreases the number of nodes that must be connected to the external
network. In this scenario, only a few nodes are exposed for ingress
traffic from the outside world.
To customize the MetalLB speaker node selector:
Using kubeconfig of the Container Cloud management cluster, open the
MOSK Cluster object for editing:
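After opening the Cluster object (for example, with kubectl edit), set the speaker node selector through the metallb Helm chart values. The layout below is a sketch; the exact values structure may differ between releases.

spec:
  providerSpec:
    value:
      helmReleases:
      - name: metallb
        values:
          speaker:
            nodeSelector:
              metallbSpeakerEnabled: "true"   # label expected on nodes that must run MetalLB speakers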
The metallbSpeakerEnabled:"true" parameter in this example is the
label on Kubernetes nodes where MetalLB speakers will be deployed.
It can be an already existing node label or a new one.
You can add user-defined labels to nodes using the nodeLabels field.
This field contains the list of node labels to be attached to a node for the
user to run certain components on separate cluster nodes. The list of allowed
node labels is located in the Cluster object status
providerStatus.releaseRef.current.allowedNodeLabels field.
If the value field is not defined in allowedNodeLabels, a label can
have any value. For example:
Before or after a machine deployment, add the required label from the allowed
node labels list with the corresponding value to
spec.providerSpec.value.nodeLabels in machine.yaml. For example:
nodeLabels:
- key: stacklight
  value: enabled
Adding a node label that is not available in the list of allowed node
labels is restricted.
Configure BGP announcement for cluster API LB address
Available since MOSK 23.2.2. Technology Preview.
When you create a MOSK cluster with the multi-rack topology,
where Kubernetes masters are distributed across multiple racks
without an L2 layer extension between them, you must configure
BGP announcement of the cluster API load balancer address.
For clusters where Kubernetes masters are in the same rack or with an L2 layer
extension between masters, you can configure either BGP or L2 (ARP)
announcement of the cluster API load balancer address.
The L2 (ARP) announcement is used by default and its configuration is covered
in Create a managed bare metal cluster.
Caution
Create Rack and MultiRackCluster objects, which are
described in the procedure below, before initiating the provisioning
of master nodes to ensure that both BGP and netplan configurations
are applied simultaneously during the provisioning process.
To enable the use of BGP announcement for the cluster API LB address:
In the Cluster object, set the useBGPAnnouncement parameter
to true:
spec:
  providerSpec:
    value:
      useBGPAnnouncement: true
Create the MultiRackCluster object that is mandatory when configuring
BGP announcement for the cluster API LB address. This object enables you
to set cluster-wide parameters for configuration of BGP announcement.
In this scenario, the MultiRackCluster object must be bound to the
corresponding Cluster object using the
cluster.sigs.k8s.io/cluster-name label.
Container Cloud uses the bird BGP daemon for announcement of the cluster
API LB address. For this reason, set the corresponding
bgpdConfigFileName and bgpdConfigFilePath parameters in the
MultiRackCluster object, so that bird can locate the configuration
file. For details, see the configuration example below.
The bgpdConfigTemplate object contains the default configuration file
template for the bird BGP daemon, which you can override in Rack
objects.
The defaultPeer parameter contains default parameters of the BGP
connection from master nodes to infrastructure BGP peers, which you can
override in Rack objects.
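Configuration example for MultiRackCluster (a sketch only; the file name, path, and ASN values are illustrative and must be adapted, so verify the field names against the Container Cloud API Reference):

apiVersion: ipam.mirantis.com/v1alpha1
kind: MultiRackCluster
metadata:
  name: multirack-test-cluster
  namespace: mosk-ns
  labels:
    cluster.sigs.k8s.io/cluster-name: test-cluster
    kaas.mirantis.com/provider: baremetal
spec:
  bgpdConfigFileName: bird.conf   # illustrative: file name that bird reads its configuration from
  bgpdConfigFilePath: /etc/bird   # illustrative: path where the configuration file is placed
  bgpdConfigTemplate: |           # default template, can be overridden in Rack objects
    ...
  defaultPeer:                    # default BGP peer parameters, can be overridden in Rack objects
    localASN: 65101
    neighborASN: 65100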
Note
The kaas.mirantis.com/region label is removed from all
Container Cloud and MOSK objects in 24.1.
Therefore, do not add the label starting with these releases. On existing
clusters updated to these releases, or if added manually, Container Cloud
ignores this label.
Create the Rack object(s). This object is mandatory when configuring
BGP announcement for the cluster API LB address and it allows you
to configure BGP announcement parameters for each rack.
In this scenario, Rack objects must be bound to Machine objects
corresponding to master nodes of the cluster.
Each Rack object describes the configuration for the bird BGP
daemon used to announce the cluster API LB address from a particular
master node or from several master nodes in the same rack.
Set a reference to the Rack object used to configure the bird BGP
daemon for a particular master node to announce the cluster API LB IP:
Since MOSK 25.1
In the Machine objects for all master nodes, set the ipam/RackRef
label with the value equal to the name of the corresponding Rack
object. For example:
apiVersion: cluster.k8s.io/v1alpha1
kind: Machine
metadata:
  labels:
    ipam/RackRef: rack-master-1   # reference to the "rack-master-1" Rack
...
Before MOSK 25.1 (deprecated)
In the BareMetalHost objects for all cluster nodes, set the
ipam.mirantis.com/rack-ref annotation with the value equal to the name
of the corresponding Rack object. For example:
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  annotations:
    ipam.mirantis.com/rack-ref: rack-master-1   # reference to the "rack-master-1" Rack
...
Optional. Using the Machine object, define the rack-id node label
that is not used for BGP announcement of the cluster API LB IP but
can be used for MetalLB.
The rack-id node label is required for MetalLB node selectors when
MetalLB is used to announce LB IP addresses on nodes that are distributed
across multiple racks. In this scenario, the L2 (ARP) announcement mode
cannot be used for MetalLB because master nodes are in different L2
segments. So, the BGP announcement mode must be used for MetalLB, and node
selectors are required to properly configure BGP connections from each node.
See Configure MetalLB for details.
The L2Template object includes the lo interface configuration
to set the IP address for the bird BGP daemon that will be advertised
as the cluster API LB address. The {{ cluster_api_lb_ip }}
function is used in npTemplate to obtain the cluster API LB address
value.
Configuration example for Rack
apiVersion: ipam.mirantis.com/v1alpha1
kind: Rack
metadata:
  name: rack-master-1
  namespace: mosk-ns
  labels:
    cluster.sigs.k8s.io/cluster-name: test-cluster
    kaas.mirantis.com/provider: baremetal
    kaas.mirantis.com/region: region-one
spec:
  bgpdConfigTemplate: |   # optional
    ...
  peeringMap:
    lcm-rack-control-1:
      peers:
      - neighborIP: 10.77.31.2   # "localASN" & "neighborASN" are taken from
      - neighborIP: 10.77.31.3   # "MultiRackCluster.spec.defaultPeer" if
                                 # not set here
Note
The kaas.mirantis.com/region label is removed from all
Container Cloud and MOSK objects in 24.1.
Therefore, do not add the label starting with these releases. On existing
clusters updated to these releases, or if added manually, Container Cloud
ignores this label.
Configuration example for Machine
apiVersion: cluster.k8s.io/v1alpha1
kind: Machine
metadata:
  name: test-cluster-master-1
  namespace: mosk-ns
  annotations:
    metal3.io/BareMetalHost: mosk-ns/test-cluster-master-1
  labels:
    cluster.sigs.k8s.io/cluster-name: test-cluster
    cluster.sigs.k8s.io/control-plane: controlplane
    hostlabel.bm.kaas.mirantis.com/controlplane: controlplane
    ipam/RackRef: rack-master-1   # reference to the "rack-master-1" Rack
    kaas.mirantis.com/provider: baremetal
    kaas.mirantis.com/region: region-one
spec:
  providerSpec:
    value:
      kind: BareMetalMachineProviderSpec
      apiVersion: baremetal.k8s.io/v1alpha1
      hostSelector:
        matchLabels:
          kaas.mirantis.com/baremetalhost-id: test-cluster-master-1
      l2TemplateSelector:
        name: test-cluster-master-1
      nodeLabels:               # optional. it is not used for BGP announcement
      - key: rack-id            # of the cluster API LB IP but it can be used
        value: rack-master-1    # for MetalLB if "nodeSelectors" are required
...
Note
The kaas.mirantis.com/region label is removed from all
Container Cloud and MOSK objects in 24.1.
Therefore, do not add the label starting with these releases. On existing
clusters updated to these releases, or if added manually, Container Cloud
ignores this label.
Configuration example for L2Template
apiVersion: ipam.mirantis.com/v1alpha1
kind: L2Template
metadata:
  labels:
    cluster.sigs.k8s.io/cluster-name: test-cluster
    kaas.mirantis.com/provider: baremetal
    kaas.mirantis.com/region: region-one
  name: test-cluster-master-1
  namespace: mosk-ns
spec:
  ...
  l3Layout:
  - subnetName: lcm-rack-control-1   # this network is referenced
    scope: namespace                 # in the "rack-master-1" Rack
  - subnetName: ext-rack-control-1   # optional. this network is used
    scope: namespace                 # for k8s services traffic and
                                     # MetalLB BGP connections
  ...
  npTemplate: |
    ...
    ethernets:
      lo:
        addresses:
        - {{ cluster_api_lb_ip }}   # function for cluster API LB IP
        dhcp4: false
        dhcp6: false
    ...
Note
The kaas.mirantis.com/region label is removed from all
Container Cloud and MOSK objects in 24.1.
Therefore, do not add the label starting with these releases. On existing
clusters updated to these releases, or if added manually, Container Cloud
ignores this label.
This section also applies to the bootstrap procedure of a
management cluster with the following difference: instead of creating the
Subnet object, add its configuration to
ipam-objects.yaml.template located in kaas-bootstrap/templates/bm/.
The Kubernetes Subnet object is created for a management cluster from
templates during bootstrap.
Each Subnet object can define either a MetalLB address range or MetalLB
address pool. A MetalLB address pool may contain one or several address
ranges. The following rules apply to creation of address ranges or pools:
To designate a subnet as a MetalLB address pool or range, use
the ipam/SVC-MetalLB label key. Set the label value to "1".
The object must contain the cluster.sigs.k8s.io/cluster-name label to
reference the name of the target cluster where the MetalLB address pool
is used.
You may create multiple subnets with the ipam/SVC-MetalLB label to
define multiple IP address ranges or multiple address pools for MetalLB in
the cluster.
The IP addresses of the MetalLB address pool are not assigned to the
interfaces on hosts. This subnet is virtual. Do not include such subnets
in the L2 template definitions for your cluster.
If a Subnet object defines a MetalLB address range, no additional
object properties are required.
You can use any number of Subnet objects that define a single MetalLB
address range. In this case, all address ranges are aggregated into
a single MetalLB L2 address pool named services having the auto-assign
policy enabled.
Intersection of IP address ranges within any single MetalLB address pool
is not allowed.
The bare metal provider verifies intersection of IP address ranges.
If it detects intersection, the MetalLB configuration is blocked and
the provider logs contain corresponding error messages.
Use the following labels to identify the Subnet object as a MetalLB
address pool and configure the name and protocol for that address pool.
All labels below are mandatory for the Subnet object that configures
a MetalLB address pool.
Mandatory Subnet labels for a MetalLB address pool
Label
Description
Labels to link Subnet to the target MOSK
clusters within a management cluster.
cluster.sigs.k8s.io/cluster-name
Specifies the cluster name where the MetalLB address pool is used.
kaas.mirantis.com/region
Specifies the region name of the cluster where the MetalLB address pool is
used.
kaas.mirantis.com/provider
Specifies the provider of the cluster where the MetalLB address pool is used.
Note
The kaas.mirantis.com/region label is removed from all
Container Cloud and MOSK objects in 24.1.
Therefore, do not add the label starting with these releases. On existing
clusters updated to these releases, or if added manually, Container Cloud
ignores this label.
ipam/SVC-MetalLB
Defines that the Subnet object will be used to provide
a new address pool or range for MetalLB.
metallb/address-pool-name
Every address pool must have a distinct name.
The services-pxe address pool is mandatory when configuring
a dedicated PXE network in the management cluster. This name will be
used in annotations for services exposed through the PXE network.
A bootstrap cluster also uses the services-pxe address pool
for its provision services so that management cluster nodes can be
provisioned from the bootstrap cluster. After a management cluster is
deployed, the bootstrap cluster is deleted and that address pool is
solely used by the newly deployed cluster.
metallb/address-pool-auto-assign
Configures the auto-assign policy of an address pool. Boolean.
Caution
For the address pools defined using the MetalLB Helm chart
values in the Cluster spec section, the auto-assign policy is
set to true and is not configurable.
For any service that does not have a specific MetalLB annotation
configured, MetalLB allocates external IPs from arbitrary address
pools that have the auto-assign policy set to true.
MetalLB allocates external IPs from an address pool that has the
auto-assign policy set to false only for the services that have the
specific MetalLB annotation with that address pool name.
metallb/address-pool-protocol
Sets the address pool protocol.
The only supported value is layer2 (default).
Caution
Do not set the same address pool name for two or more
Subnet objects. Otherwise, the corresponding MetalLB address pool
configuration fails with a warning message in the bare metal provider log.
Caution
For the auto-assign policy, the following configuration
rules apply:
At least one MetalLB address pool must have the auto-assign
policy enabled so that unannotated services can have load balancer IPs
allocated for them. To satisfy this requirement, either configure one
of address pools using the Subnet object with
metallb/address-pool-auto-assign:"true" or configure address
range(s) using the Subnet object(s) without
metallb/address-pool-* labels.
When configuring multiple address pools with the auto-assign policy
enabled, keep in mind that it is not determined in advance which pool of
those multiple address pools is used to allocate an IP for a particular
unannotated service.
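For example, a Subnet object that defines a MetalLB address pool with the labels described above may look like the following sketch (the object name and address range are illustrative):

apiVersion: ipam.mirantis.com/v1alpha1
kind: Subnet
metadata:
  name: metallb-svc-pool                    # illustrative name
  namespace: mosk-ns
  labels:
    cluster.sigs.k8s.io/cluster-name: test-cluster
    kaas.mirantis.com/provider: baremetal
    ipam/SVC-MetalLB: "1"
    metallb/address-pool-name: services
    metallb/address-pool-auto-assign: "true"
    metallb/address-pool-protocol: layer2
spec:
  cidr: 10.0.34.0/24                        # illustrative addresses
  includeRanges:
  - 10.0.34.101-10.0.34.120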
To simplify operations with L2 templates, before you start creating
them, inspect the general workflow of a network interface name gathering
and processing.
Network interface naming workflow:
The Operator creates a BareMetalHostInventory object.
Note
Before update of the management cluster to Container Cloud 2.29.0
(Cluster release 16.4.0), instead of BareMetalHostInventory, use the
BareMetalHost object. For details, see Container Cloud API Reference:
BareMetalHost resource.
Caution
While the Cluster release of the management cluster is 16.4.0,
BareMetalHostInventory operations are allowed to
m:kaas@management-admin only. This limitation is lifted once the
management cluster is updated to the Cluster release 16.4.1 or later.
The BareMetalHostInventory object executes the introspection stage
and becomes ready.
The Operator collects information about NIC count, naming, and so on
for further changes in the mapping logic.
At this stage, the order of NICs in the object may change randomly
during each introspection, but the NIC names always stay the same.
For more details, see Predictable Network Interface Names.
For example:
# Example commands:
# kubectl -n managed-ns get bmh baremetalhost1 -o custom-columns='NAME:.metadata.name,STATUS:.status.provisioning.state'
# NAME            STATE
# baremetalhost1  ready
# kubectl -n managed-ns get bmh baremetalhost1 -o yaml
# Example output:
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
...
status:
  ...
  nics:
  - ip: fe80::ec4:7aff:fe6a:fb1f%eno2
    mac: 0c:c4:7a:6a:fb:1f
    model: 0x8086 0x1521
    name: eno2
    pxe: false
  - ip: fe80::ec4:7aff:fe1e:a2fc%ens1f0
    mac: 0c:c4:7a:1e:a2:fc
    model: 0x8086 0x10fb
    name: ens1f0
    pxe: false
  - ip: fe80::ec4:7aff:fe1e:a2fd%ens1f1
    mac: 0c:c4:7a:1e:a2:fd
    model: 0x8086 0x10fb
    name: ens1f1
    pxe: false
  - ip: 192.168.1.151   # Temp. PXE network address
    mac: 0c:c4:7a:6a:fb:1e
    model: 0x8086 0x1521
    name: eno1
    pxe: true
...
The Operator selects from the following options:
Create an l2template object with the ifMapping configuration.
For details, see Create L2 templates.
The baremetal-provider service links the Machine object
to the BareMetalHostInventory object.
The kaas-ipam and baremetal-provider services collect hardware
information from the BareMetalHostInventory object and use it to
configure host networking and services.
The kaas-ipam service:
Spawns the IpamHost object.
Renders the l2template object.
Spawns the ipaddr object.
Updates the IpamHost object status with all rendered
and linked information.
The baremetal-provider service collects the rendered networking
information from the IpamHost object.
The baremetal-provider service proceeds with the IpamHost object
provisioning.
After creating the MetalLB configuration as described in Configure MetalLB
and before creating L2 templates, ensure that you have the required subnets
that can be used in the L2 template to allocate IP addresses for the
MOSK cluster nodes.
Where required, create a number of subnets for a particular project
using the Subnet CR. A subnet has the following logical scopes:
Each subnet used in an L2 template has its logical scope that is set using the
scope parameter in the corresponding L2Template.spec.l3Layout section.
One of the following logical scopes is used for each subnet referenced in an
L2 template:
global - CR uses the default namespace.
A subnet can be used for any cluster located in any project.
namespaced - CR uses the namespace that corresponds to a particular project
where MOSK clusters are located. A subnet can be used
for any cluster located in the same project.
cluster - Unsupported since Container Cloud 2.28.0 (Cluster releases 17.3.0
and 16.3.0). CR uses the namespace where the referenced cluster is located.
A subnet is only accessible to the cluster that
L2Template.metadata.labels:cluster.sigs.k8s.io/cluster-name (mandatory
since MOSK 23.3) or L2Template.spec.clusterRef
(deprecated since MOSK 23.3) refers to. The Subnet
objects with the cluster scope will be created for every new cluster.
Note
The use of the ipam/SVC-MetalLB label in Subnet objects
is unsupported as part of the MetalLBConfigTemplate object deprecation
since Container Cloud 2.27.0 (Cluster releases 17.2.0 and 16.2.0). No
actions are required for existing objects. A Subnet object containing
this label will be ignored by baremetal-provider after cluster update
to the mentioned Cluster releases.
You can have subnets with the same name in different projects.
In this case, the subnet that is in the same project
as the cluster will be used. One L2 template may reference several
subnets; in this case, those subnets may have different scopes.
The IP address objects (IPaddr CR) that are allocated from subnets
always have the same project as their corresponding IpamHost objects,
regardless of the subnet scope.
You can create subnets using either the Container Cloud web UI or CLI.
Any Subnet object may contain ipam/SVC-<serviceName> labels.
All IP addresses allocated from the Subnet object that has service labels
defined inherit those labels.
When a particular IpamHost uses IP addresses allocated from such labeled
Subnet objects, the ServiceMap field in IpamHost.Status
contains information about which IPs and interfaces correspond to which service
labels (that have been set in the Subnet objects). Using ServiceMap,
you can understand what IPs and interfaces of a particular host are used
for network traffic of a given service.
Container Cloud uses the following service labels that allow using of the
specific subnets for particular Container Cloud services:
ipam/SVC-k8s-lcm
ipam/SVC-ceph-cluster
ipam/SVC-ceph-public
ipam/SVC-dhcp-range
ipam/SVC-MetalLB (unsupported since 24.3)
ipam/SVC-LBhost
Caution
The use of the ipam/SVC-k8s-lcm label is mandatory for every
cluster.
Important
A label value is not mandatory and can be empty, but it must
match the value in the related L2Template object in which the
corresponding subnet is used. Otherwise, the network configuration for
the related hosts will not be rendered because the referenced subnets
cannot be found.
You can also add custom service labels to the Subnet objects the same way
you add Container Cloud service labels. The mapping of IPs and interfaces to
the defined services is displayed in IpamHost.Status.ServiceMap.
You can assign multiple service labels to one network. You can also assign the
ceph-* and dhcp-range services to multiple networks. In the latter
case, the system sorts the IP addresses in the ascending order:
You can add service labels during creation of subnets as described in
Create subnets.
Create subnets for a managed cluster using web UI
After creating the MetalLB configuration as described in Configure MetalLB
and before creating an L2 template, create the required subnets to use in the
L2 template to allocate IP addresses for the managed cluster nodes.
To create subnets for a managed cluster using web UI:
Log in to the Container Cloud web UI with the operator permissions.
Switch to the required non-default project using the
Switch Project action icon located on top of the main left-side
navigation panel.
Caution
Do not create a MOSK cluster in the default project
(Kubernetes namespace), which is dedicated for the management cluster only.
If no projects are defined, first create a new mosk project as described
in Create a project for MOSK clusters.
Storage access
Available in the web UI since Container Cloud 2.28.0 (17.3.0 and 16.3.0).
Storage access subnet.
Storage replication
Available in the web UI since Container Cloud 2.28.0 (17.3.0 and 16.3.0).
Storage replication subnet.
Custom
Custom subnet. For example, external or Kubernetes workloads.
MetalLB
Services subnet(s).
Warning
Since Container Cloud 2.28.0 (Cluster releases 17.3.0
and 16.3.0), disregard this parameter during subnet creation.
Configure MetalLB separately as described in
Configure MetalLB.
This parameter is removed from the Container Cloud web UI in
Container Cloud 2.29.0 (Cluster releases 17.4.0 and 16.4.0).
Cluster
Cluster name that the subnet is being created for. Required for all
subnet types except DHCP.
CIDR
A valid IPv4 address of the subnet in the CIDR notation, for example,
10.11.0.0/24.
Include Ranges (optional)
A comma-separated list of IP address ranges within the given CIDR that should
be used in the allocation of IPs for nodes. The gateway, network, broadcast,
and DNS addresses will be excluded (protected) automatically if they intersect
with one of the ranges. The IPs outside the given ranges will not be used in
the allocation. Each element of the list can be either an interval
10.11.0.5-10.11.0.70 or a single address 10.11.0.77.
Warning
Do not use values that are out of the given CIDR.
Exclude Ranges (optional)
A comma-separated list of IP address ranges within the given CIDR that should
not be used in the allocation of IPs for nodes. The IPs within the given CIDR
but outside the given ranges will be used in the allocation.
The gateway, network, broadcast, and DNS addresses will be excluded
(protected) automatically if they are included in the CIDR.
Each element of the list can be either an interval 10.11.0.5-10.11.0.70
or a single address 10.11.0.77.
Warning
Do not use values that are out of the given CIDR.
Gateway (optional)
A valid IPv4 gateway address, for example, 10.11.0.9. Does not apply
to the MetalLB subnet.
Nameservers
IP addresses of nameservers separated by a comma. Does not apply
to the DHCP and MetalLB subnet types.
Use whole CIDR
Optional. Select to use the whole IPv4 address range that is set in
the CIDR field. Useful when defining single IP address (/32),
for example, in the Cluster API load balancer (LB) subnet.
If not set, the network address and broadcast address in the IP
subnet are excluded from the address allocation.
Labels
Key-value pairs attached to the selected subnet:
Caution
The values of the created subnet labels must match the
ones in the spec.l3Layout section of the corresponding
L2Template object.
Click Add a label and assign the first custom label
with the required name and value. To assign consecutive labels,
use the + button located in the right side of the
Labels section.
MetalLB:
Warning
Since Container Cloud 2.28.0 (Cluster releases
17.3.0 and 16.3.0), disregard this label during subnet
creation. Configure MetalLB separately as described in
Configure MetalLB.
The label will be removed from the Container Cloud web UI in
one of the following releases.
metallb/address-pool-name
Name of the subnet address pool. Exemplary values:
services, default, external, services-pxe.
In the Networks tab, verify the status of the created
subnet:
Ready - object is operational.
Error - object is non-operational. Hover over the status
to obtain details of the issue.
Note
To verify subnet details, in the Networks tab,
click the More action icon in the last column of the
required subnet and select Subnet info.
Before Container Cloud 2.26.0 (Cluster releases 17.0.0, 16.0.0, or earlier)
In the Clusters tab, click the required cluster and scroll
down to the Subnets section.
Click Add Subnet.
Fill out the Add new subnet form as required:
Subnet Name
Subnet name.
CIDR
A valid IPv4 CIDR, for example, 10.11.0.0/24.
Include Ranges (optional)
A comma-separated list of IP address ranges within the given CIDR that should
be used in the allocation of IPs for nodes. The gateway, network, broadcast,
and DNS addresses will be excluded (protected) automatically if they intersect
with one of the ranges. The IPs outside the given ranges will not be used in
the allocation. Each element of the list can be either an interval
10.11.0.5-10.11.0.70 or a single address 10.11.0.77.
Warning
Do not use values that are out of the given CIDR.
Exclude Ranges (optional)
A comma-separated list of IP address ranges within the given CIDR that should
not be used in the allocation of IPs for nodes. The IPs within the given CIDR
but outside the given ranges will be used in the allocation.
The gateway, network, broadcast, and DNS addresses will be excluded
(protected) automatically if they are included in the CIDR.
Each element of the list can be either an interval 10.11.0.5-10.11.0.70
or a single address 10.11.0.77.
After creating the MetalLB configuration as described in Configure MetalLB
and before creating an L2 template, create the required subnets to use in the
L2 template to allocate IP addresses for the managed cluster nodes.
To create subnets for a cluster using CLI:
Create a subnet using one of the following options:
Note
The kaas.mirantis.com/region label is removed from all
Container Cloud and MOSK objects in 24.1.
Therefore, do not add the label starting with these releases. On existing
clusters updated to these releases, or if added manually, Container Cloud
ignores this label.
includeRanges (list)
A comma-separated list of IP address ranges within the given CIDR that should
be used in the allocation of IPs for nodes. The gateway, network, broadcast,
and DNS addresses will be excluded (protected) automatically if they intersect
with one of the ranges. The IPs outside the given ranges will not be used in
the allocation. Each element of the list can be either an interval
10.11.0.5-10.11.0.70 or a single address 10.11.0.77.
Warning
Do not use values that are out of the given CIDR.
excludeRanges (list)
A comma-separated list of IP address ranges within the given CIDR that should
not be used in the allocation of IPs for nodes. The IPs within the given CIDR
but outside the given ranges will be used in the allocation.
The gateway, network, broadcast, and DNS addresses will be excluded
(protected) automatically if they are included in the CIDR.
Each element of the list can be either an interval 10.11.0.5-10.11.0.70
or a single address 10.11.0.77.
Warning
Do not use values that are out of the given CIDR.
useWholeCidr (boolean)
If set to true, the subnet address (10.11.0.0 in the example
above) and the broadcast address (10.11.0.255 in the example above)
are included into the address allocation for nodes. Otherwise,
(false by default), the subnet address and broadcast address
are excluded from the address allocation.
gateway (singular)
A valid gateway address, for example, 10.11.0.9.
nameservers (list)
A list of the IP addresses of name servers. Each element of the list
is a single address, for example, 172.18.176.6.
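Putting these fields together, a Subnet object may look like the following sketch (the object name, addresses, and the LCM service label are illustrative):

apiVersion: ipam.mirantis.com/v1alpha1
kind: Subnet
metadata:
  name: lcm-subnet                  # illustrative name
  namespace: mosk-ns
  labels:
    cluster.sigs.k8s.io/cluster-name: test-cluster
    kaas.mirantis.com/provider: baremetal
    ipam/SVC-k8s-lcm: "1"           # marks the subnet as an LCM network subnet
spec:
  cidr: 10.11.0.0/24
  includeRanges:
  - 10.11.0.5-10.11.0.70
  excludeRanges:
  - 10.11.0.17
  gateway: 10.11.0.9
  nameservers:
  - 172.18.176.6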
Note
You may use different subnets to allocate IP addresses to
different Container Cloud components in your cluster. Add a label with
the ipam/SVC- prefix to each subnet that is used to configure a
Container Cloud service. For details, see Service labels and their life cycle and the
optional steps below.
Caution
Use of a dedicated network for Kubernetes pods traffic,
for external connection to the Kubernetes services exposed
by the cluster, and for the Ceph cluster access and replication
traffic is available as Technology Preview. Use such
configurations for testing and evaluation purposes only.
For the Technology Preview feature definition,
refer to Technology Preview features.
Add one or more subnets for the LCM network:
Set the ipam/SVC-k8s-lcm label with the value "1" to create
a subnet that will be used to assign IP addresses in the LCM network.
Optional. Set the cluster.sigs.k8s.io/cluster-name label to the name
of the target cluster during the subnet creation.
Use this subnet in the L2 template for cluster nodes.
Using the L2 template, assign this subnet to the interface connected to
your LCM network.
Precautions for the LCM network usage
Each cluster must use at least one subnet for its LCM network.
Every node must have the address allocated in the LCM network
using such subnet(s).
Each node of every cluster must have one and only one IP address in the LCM
network that is allocated from one of the Subnet
objects having the ipam/SVC-k8s-lcm label defined. Therefore, all
Subnet objects used for LCM networks must have the ipam/SVC-k8s-lcm
label defined.
You can use any interface name for the LCM network traffic.
The Subnet objects for the LCM network must have the
ipam/SVC-k8s-lcm label. For details, see Service labels and their life cycle.
Optional. Technology Preview. Add a subnet for the externally accessible
API endpoint of the MOSK cluster.
Make sure that loadBalancerHost is set to "" (empty string)
in the Cluster spec.
Create a subnet with the ipam/SVC-LBhost label having the "1"
value to make the baremetal-provider use this subnet for allocation
of addresses for cluster API endpoints.
One IP address will be allocated for each cluster to serve its
Kubernetes/MKE API endpoint.
Caution
Make sure that master nodes have host IP addresses
in the same subnet as the cluster API endpoint address. These host IP
addresses will be used for VRRP traffic. The cluster API endpoint address
will be assigned to the same interface on one of the master nodes where
these host IPs are assigned.
Note
Mirantis highly recommends that you assign the cluster API
endpoint address from the LCM or external network. For details on cluster
network types, refer to MOSK cluster networking.
To add an address allocation scope for API endpoints, create a subnet in the
corresponding namespace with a reference to the target cluster using the
cluster.sigs.k8s.io/cluster-name label. For example:
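The sketch below assumes a single /32 address used as the cluster API endpoint; the object name and address are illustrative.

apiVersion: ipam.mirantis.com/v1alpha1
kind: Subnet
metadata:
  name: test-cluster-api-lb          # illustrative name
  namespace: mosk-ns
  labels:
    cluster.sigs.k8s.io/cluster-name: test-cluster
    kaas.mirantis.com/provider: baremetal
    ipam/SVC-LBhost: "1"             # makes baremetal-provider use this subnet for API endpoint allocation
spec:
  cidr: 10.11.0.110/32               # illustrative address
  useWholeCidr: true                 # allows allocating the single address defined by the /32 CIDR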
Optional. Add a subnet(s) for the storage access network.
Set the ipam/SVC-ceph-public label with the value "1" to create
a subnet that will be used to configure the Ceph public network.
Set the cluster.sigs.k8s.io/cluster-name label to the name of the
target cluster during the subnet creation.
Use this subnet in the L2 template for all cluster nodes except Kubernetes
manager nodes.
Assign this subnet to the interface connected to your Storage access
network.
Ceph will automatically use this subnet for its external connections.
A Ceph OSD will look for and bind to an address from this subnet
when it is started on a machine.
Optional. Add a subnet(s) for the storage replication network.
Set the ipam/SVC-ceph-cluster label with the value "1" to create
a subnet that will be used to configure the Ceph cluster network.
Set the cluster.sigs.k8s.io/cluster-name label to the name
of the target cluster during the subnet creation.
Use this subnet in the L2 template for storage nodes.
Assign this subnet to the interface connected to your storage replication
network.
Ceph will automatically use this subnet for its internal replication
traffic.
Optional. Add a subnet for the Kubernetes Pods traffic.
Use this subnet in the L2 template for all nodes in the cluster.
Assign this subnet to the interface connected to your Kubernetes
workloads network.
Use the npTemplate.bridges.k8s-pods bridge name in the L2 template.
This bridge name is reserved for the Kubernetes workloads network.
When the k8s-pods bridge is defined in an L2 template,
Calico CNI uses that network for routing the Pods traffic between nodes.
Optional. Add a subnet for the MOSK overlay network:
Use this subnet in the L2 template for the compute and gateway
(controller) nodes in the MOSK cluster.
Assign this subnet to the interface connected to your
MOSK overlay network.
This network is used to provide isolated and secure tenant networks with the
help of tunneling mechanisms (VLAN/GRE/VXLAN). If VXLAN or GRE
encapsulation is used, the IP address assignment is required on
interfaces at the node level. On the Tungsten Fabric deployments,
this network is used for MPLS over UDP+GRE traffic.
Optional. Add a subnet for the MOSK live migration
network:
Use this subnet in the L2 template for compute nodes in the
MOSK cluster.
Assign this subnet to the interface connected to your
MOSK live migration network.
This subnet is used by the Compute service (OpenStack Nova) to transfer
data during live migration. Depending on the cloud needs, you can place
it on a dedicated physical network not to affect other networks during
live migration. The IP address assignment is required on interfaces at the
node level.
Contains a short state description and a more detailed one if
applicable. The short status values are as follows:
OK - object is operational.
ERR - object is non-operational. This status has a detailed
description in the messages list.
TERM - object was deleted and is terminating.
messages (since MOSK 23.1)
Contains error or warning messages if the object state is ERR.
For example, ERR: Wrong includeRange for CIDR….
statusMessage
Deprecated since MOSK 23.1 and will be removed in
one of the following releases in favor of state and messages.
Since MOSK 23.2, this field is not set for the
objects of newly created clusters.
cidr
Reflects the actual CIDR, has the same meaning as spec.cidr.
gateway
Reflects the actual gateway, has the same meaning as spec.gateway.
nameservers
Reflects the actual name servers, has the same meaning as spec.nameservers.
ranges
Specifies the address ranges that are calculated using the following fields from
spec: cidr, includeRanges, excludeRanges, gateway, useWholeCidr.
These ranges are directly used for node IP allocation.
allocatable
Includes the number of currently available IP addresses that can be allocated
for nodes from the subnet.
allocatedIPs
Specifies the list of IPv4 addresses with the corresponding IPaddr object IDs
that were already allocated from the subnet.
capacity
Contains the total number of IP addresses held by ranges, which equals
the sum of the allocatable and allocatedIPs values.
objCreated
Date, time, and IPAM version of the Subnet CR creation.
objStatusUpdated
Date, time, and IPAM version of the last update of the status
field in the Subnet CR.
objUpdated
Date, time, and IPAM version of the last Subnet CR update
by kaas-ipam.
Example of a successfully created subnet:
apiVersion: ipam.mirantis.com/v1alpha1
kind: Subnet
metadata:
  labels:
    ipam/UID: 6039758f-23ee-40ba-8c0f-61c01b0ac863
    kaas.mirantis.com/provider: baremetal
    kaas.mirantis.com/region: region-one
    ipam/SVC-k8s-lcm: "1"
  name: kaas-mgmt
  namespace: default
spec:
  cidr: 10.0.0.0/24
  excludeRanges:
  - 10.0.0.100
  - 10.0.0.101-10.0.0.120
  gateway: 10.0.0.1
  includeRanges:
  - 10.0.0.50-10.0.0.90
  nameservers:
  - 172.18.176.6
status:
  allocatable: 38
  allocatedIPs:
  - 10.0.0.50:0b50774f-ffed-11ea-84c7-0242c0a85b02
  - 10.0.0.51:1422e651-ffed-11ea-84c7-0242c0a85b02
  - 10.0.0.52:1d19912c-ffed-11ea-84c7-0242c0a85b02
  capacity: 41
  cidr: 10.0.0.0/24
  gateway: 10.0.0.1
  objCreated: 2021-10-21T19:09:32Z by v5.1.0-20210930-121522-f5b2af8
  objStatusUpdated: 2021-10-21T19:14:18.748114886Z by v5.1.0-20210930-121522-f5b2af8
  objUpdated: 2021-10-21T19:09:32.606968024Z by v5.1.0-20210930-121522-f5b2af8
  nameservers:
  - 172.18.176.6
  ranges:
  - 10.0.0.50-10.0.0.90
Note
The kaas.mirantis.com/region label is removed from all
Container Cloud and MOSK objects in 24.1.
Therefore, do not add the label starting with these releases. On existing
clusters updated to these releases, or if added manually, Container Cloud
ignores this label.
According to the MOSK reference architecture,
you should create the following subnets.
Note
The kaas.mirantis.com/region label is removed from all
Container Cloud and MOSK objects in 24.1.
Therefore, do not add the label starting with these releases. On existing
clusters updated to these releases, or if added manually, Container Cloud
ignores this label.
The addresses from this subnet are not allocated to interfaces, but used
as a MetalLB address pool to expose MOSK API endpoints
as Kubernetes cluster services.
The addresses from this subnet are assigned to interfaces connected
to the Kubernetes workloads network and used by Calico CNI as underlay
for traffic between the pods in the Kubernetes cluster.
The network is used by the Compute service (OpenStack Nova) to transfer data
during live migration. Depending on the cloud needs, you can place it on a
dedicated physical network not to affect other networks during live migration.
When planning your installation in advance, you need to prepare a set
of subnets and L2 templates for every rack in your cluster. For details, see
Multi-rack architecture.
Using the Subnet object examples for a multi-rack cluster that are
described in the following sections, create subnets for the target cluster.
Note
Subnet labels such as rack-x-lcm, rack-api-lcm, and so on are
optional. You can use them in L2 templates to select Subnet objects by
label.
Note
Before the Cluster release 16.1.0, the Subnet object contains
the kaas.mirantis.com/region label that specifies the region
where the Subnet object will be applied.
Configure DHCP relay agents on the edges of the broadcast domains in the
provisioning network, as needed.
Make sure to assign the IP address ranges you want to allocate
to the hosts using DHCP for discovery and inspection. Create subnets
using these IP parameters. Specify the IP address of your DHCP relay
as the default gateway in the corresponding Subnet object.
Caution
Support of multiple DHCP ranges has the following limitations:
Using custom DNS server addresses for servers that boot over PXE
is not supported.
The Subnet objects for DHCP ranges cannot be associated with any
specific cluster, as the DHCP server configuration is only applicable to
the management cluster where the DHCP server is running.
The cluster.sigs.k8s.io/cluster-name label will be ignored.
Example mos-racks-dhcp-subnets.yaml
apiVersion: ipam.mirantis.com/v1alpha1
kind: Subnet
metadata:
  name: rack-1-dhcp
  namespace: default
  labels:
    ipam/SVC-dhcp-range: "1"
    kaas.mirantis.com/provider: baremetal
spec:
  cidr: 10.20.101.0/24
  gateway: 10.20.101.1
  includeRanges:
  - 10.20.101.16-10.20.101.127
---
apiVersion: ipam.mirantis.com/v1alpha1
kind: Subnet
metadata:
  name: rack-2-dhcp
  namespace: default
  labels:
    ipam/SVC-dhcp-range: "1"
    kaas.mirantis.com/provider: baremetal
spec:
  cidr: 10.20.102.0/24
  gateway: 10.20.102.1
  includeRanges:
  - 10.20.102.16-10.20.102.127
---
apiVersion: ipam.mirantis.com/v1alpha1
kind: Subnet
metadata:
  name: rack-3-dhcp
  namespace: default
  labels:
    ipam/SVC-dhcp-range: "1"
    kaas.mirantis.com/provider: baremetal
spec:
  cidr: 10.20.103.0/24
  gateway: 10.20.103.1
  includeRanges:
  - 10.20.103.16-10.20.103.127
---
# Add more Subnet object templates as required using the above example
# (one subnet per rack)
This is the IP address space that Container Cloud uses to ensure communication
between the LCM agents and the management API. These addresses are also used
by Kubernetes nodes for communication. The addresses from the subnets
are assigned to all MOSK cluster nodes.
Example mosk-racks-lcm-subnets.yaml
apiVersion:ipam.mirantis.com/v1alpha1kind:Subnetmetadata:name:rack-1-lcmnamespace:mosk-namespace-namelabels:ipam/SVC-k8s-lcm:"1"kaas.mirantis.com/provider:baremetalcluster.sigs.k8s.io/cluster-name:mosk-cluster-namerack-1-lcm:"true"spec:cidr:10.20.111.0/24gateway:10.20.111.1includeRanges:-10.20.111.16-10.20.111.255nameservers:-8.8.8.8---apiVersion:ipam.mirantis.com/v1alpha1kind:Subnetmetadata:name:rack-2-lcmnamespace:mosk-namespace-namelabels:ipam/SVC-k8s-lcm:"1"kaas.mirantis.com/provider:baremetalcluster.sigs.k8s.io/cluster-name:mosk-cluster-namerack-2-lcm:"true"spec:cidr:10.20.112.0/24gateway:10.20.112.1includeRanges:-10.20.112.16-10.20.112.255nameservers:-8.8.8.8---apiVersion:ipam.mirantis.com/v1alpha1kind:Subnetmetadata:name:rack-3-lcmnamespace:mosk-namespace-namelabels:ipam/SVC-k8s-lcm:"1"kaas.mirantis.com/provider:baremetalcluster.sigs.k8s.io/cluster-name:mosk-cluster-namerack-3-lcm:"true"spec:cidr:10.20.113.0/24gateway:10.20.113.1includeRanges:-10.20.113.16-10.20.113.255nameservers:-8.8.8.8---# Add more subnet object templates as required using the above example# (one subnet per rack)
If BGP announcement is configured for the MOSK cluster API LB address, the
API/LCM network is not required. Announcement of the cluster API LB address
is done using the LCM network.
If you configure ARP announcement of the load-balancer IP address for the
MOSK cluster API, the API/LCM network must be configured on the Kubernetes
manager nodes of the cluster. This network contains the Kubernetes API
endpoint with the VRRP virtual IP address.
This is the IP address space that Container Cloud uses to ensure communication
between the LCM agents and the management API. These addresses are also used by
Kubernetes nodes for communication. The addresses from the subnet are assigned
to all Kubernetes manager nodes of the MOSK cluster.
If you configure BGP announcement for IP addresses of load-balanced services
of a MOSK cluster, the external network can consist of multiple VLAN segments
connected to all nodes of a MOSK cluster where MetalLB speaker components are
configured to announce IP addresses for Kubernetes load-balanced services.
Mirantis recommends that you use OpenStack controller nodes for this purpose.
If you configure ARP announcement for IP addresses of load-balanced services
of a MOSK cluster, the external network must consist of a single VLAN
stretched to the ToR switches of all the racks where MOSK nodes connected to
the external network are located. Those are the nodes where MetalLB speaker
components are configured to announce IP addresses for Kubernetes load-balanced
services. Mirantis recommends that you use OpenStack controller nodes for this
purpose.
The subnets are used to assign addresses to the external interfaces of the
MOSK controller nodes and will be used to assign the default
gateway to these hosts. The default gateway for other hosts of the
MOSK cluster is assigned using the LCM and optionally
API/LCM subnets.
Example mosk-racks-external-subnets.yaml
Example of a subnet where a single VLAN segment is stretched to all
MOSK controller nodes:
apiVersion: ipam.mirantis.com/v1alpha1
kind: Subnet
metadata:
  name: k8s-external
  namespace: mosk-namespace-name
  labels:
    kaas.mirantis.com/provider: baremetal
    cluster.sigs.k8s.io/cluster-name: mosk-cluster-name
    k8s-external: "true"
spec:
  cidr: 10.20.120.0/24
  gateway: 10.20.120.1  # This will be the default gateway on hosts
  includeRanges:
  - 10.20.120.16-10.20.120.20
  nameservers:
  - 8.8.8.8
Example of subnets where separate VLAN segments per rack are used:
apiVersion:ipam.mirantis.com/v1alpha1kind:Subnetmetadata:name:rack-1-k8s-extnamespace:mosk-namespace-namelabels:kaas.mirantis.com/provider:baremetalcluster.sigs.k8s.io/cluster-name:mosk-cluster-namerack-1-k8s-ext:truespec:cidr:10.20.121.0/24gateway:10.20.121.1# This will be the default gateway on hostsincludeRanges:-10.20.121.16-10.20.121.20nameservers:-8.8.8.8---apiVersion:ipam.mirantis.com/v1alpha1kind:Subnetmetadata:name:rack-2-k8s-extnamespace:mosk-namespace-namelabels:kaas.mirantis.com/provider:baremetalcluster.sigs.k8s.io/cluster-name:mosk-cluster-namerack-2-k8s-ext:truespec:cidr:10.20.122.0/24gateway:10.20.122.1# This will be the default gateway on hostsincludeRanges:-10.20.122.16-10.20.122.20nameservers:-8.8.8.8---apiVersion:ipam.mirantis.com/v1alpha1kind:Subnetmetadata:name:rack-3-k8s-extnamespace:mosk-namespace-namelabels:kaas.mirantis.com/provider:baremetalcluster.sigs.k8s.io/cluster-name:mosk-cluster-namerack-3-k8s-ext:truespec:cidr:10.20.123.0/24gateway:10.20.123.1# This will be the default gateway on hostsincludeRanges:-10.20.123.16-10.20.123.20nameservers:-8.8.8.8
This network may have per-rack VLANs and IP subnets. The addresses from the
subnets are assigned to all MOSK cluster nodes except
Kubernetes manager nodes.
Example mosk-racks-ceph-public-subnets.yaml
apiVersion:ipam.mirantis.com/v1alpha1kind:Subnetmetadata:name:rack-1-ceph-publicnamespace:mosk-namespace-namelabels:ipam/SVC-ceph-public:"1"kaas.mirantis.com/provider:baremetalcluster.sigs.k8s.io/cluster-name:mosk-cluster-namerack-1-ceph-public:truespec:cidr:10.20.131.0/24gateway:10.20.131.1includeRanges:-10.20.131.16-10.20.131.255---apiVersion:ipam.mirantis.com/v1alpha1kind:Subnetmetadata:name:rack-2-ceph-publicnamespace:mosk-namespace-namelabels:ipam/SVC-ceph-public:"1"kaas.mirantis.com/provider:baremetalcluster.sigs.k8s.io/cluster-name:mosk-cluster-namerack-2-ceph-public:truespec:cidr:10.20.132.0/24gateway:10.20.132.1includeRanges:-10.20.132.16-10.20.132.255---apiVersion:ipam.mirantis.com/v1alpha1kind:Subnetmetadata:name:rack-3-ceph-publicnamespace:mosk-namespace-namelabels:ipam/SVC-ceph-public:"1"kaas.mirantis.com/provider:baremetalcluster.sigs.k8s.io/cluster-name:mosk-cluster-namerack-3-ceph-public:truespec:cidr:10.20.133.0/24gateway:10.20.133.1includeRanges:-10.20.133.16-10.20.133.255---# Add more Subnet object templates as required using the above example# (one subnet per rack)
This network may have per-rack VLANs and IP subnets. The addresses from the
subnets are assigned to storage nodes in the MOSK cluster.
Example mosk-racks-ceph-cluster-subnets.yaml
apiVersion:ipam.mirantis.com/v1alpha1kind:Subnetmetadata:name:rack-1-ceph-clusternamespace:mosk-namespace-namelabels:ipam/SVC-ceph-cluster:"1"kaas.mirantis.com/provider:baremetalcluster.sigs.k8s.io/cluster-name:mosk-cluster-namerack-1-ceph-cluster:truespec:cidr:10.20.141.0/24gateway:10.20.141.1includeRanges:-10.20.141.16-10.20.141.255---apiVersion:ipam.mirantis.com/v1alpha1kind:Subnetmetadata:name:rack-2-ceph-clusternamespace:mosk-namespace-namelabels:ipam/SVC-ceph-cluster:"1"kaas.mirantis.com/provider:baremetalcluster.sigs.k8s.io/cluster-name:mosk-cluster-namerack-2-ceph-cluster:truespec:cidr:10.20.142.0/24gateway:10.20.142.1includeRanges:-10.20.142.16-10.20.142.255---apiVersion:ipam.mirantis.com/v1alpha1kind:Subnetmetadata:name:rack-3-ceph-clusternamespace:mosk-namespace-namelabels:ipam/SVC-ceph-cluster:"1"kaas.mirantis.com/provider:baremetalcluster.sigs.k8s.io/cluster-name:mosk-cluster-namerack-3-ceph-cluster:truespec:cidr:10.20.143.0/24gateway:10.20.143.1includeRanges:-10.20.143.16-10.20.143.255---# Add more Subnet object templates as required using the above example# (one subnet per rack)
This network may include multiple per-rack VLANs and IP subnets. The addresses
from the subnets are assigned to all MOSK cluster nodes.
For details, see Network types.
Example mosk-racks-k8s-pods.yaml
apiVersion:ipam.mirantis.com/v1alpha1kind:Subnetmetadata:name:rack-1-k8s-podsnamespace:mosk-namespace-namelabels:kaas.mirantis.com/provider:baremetalcluster.sigs.k8s.io/cluster-name:mosk-cluster-namerack-1-k8s-pods:truespec:cidr:10.20.151.0/24gateway:10.20.151.1includeRanges:-10.20.151.16-10.20.151.255---apiVersion:ipam.mirantis.com/v1alpha1kind:Subnetmetadata:name:rack-2-k8s-podsnamespace:mosk-namespace-namelabels:kaas.mirantis.com/provider:baremetalcluster.sigs.k8s.io/cluster-name:mosk-cluster-namerack-2-k8s-pods:truespec:cidr:10.20.152.0/24gateway:10.20.152.1includeRanges:-10.20.152.16-10.20.152.255---apiVersion:ipam.mirantis.com/v1alpha1kind:Subnetmetadata:name:rack-3-k8s-podsnamespace:mosk-namespace-namelabels:kaas.mirantis.com/provider:baremetalcluster.sigs.k8s.io/cluster-name:mosk-cluster-namerack-3-k8s-pods:truespec:cidr:10.20.153.0/24gateway:10.20.153.1includeRanges:-10.20.153.16-10.20.153.255---# Add more Subnet object templates as required using the above example# (one subnet per rack)
The underlay network for VXLAN tunnels for the MOSK tenants
traffic. If deployed with Tungsten Fabric, it is used for MPLS over UDP+GRE
traffic.
Example mosk-racks-tenant-tunnel.yaml
apiVersion:ipam.mirantis.com/v1alpha1kind:Subnetmetadata:name:rack-1-tenant-tunnelnamespace:mosk-namespace-namelabels:kaas.mirantis.com/provider:baremetalcluster.sigs.k8s.io/cluster-name:mosk-cluster-namerack-1-tenant-tunnel:truespec:cidr:10.20.161.0/24gateway:10.20.161.1includeRanges:-10.20.161.16-10.20.161.255---apiVersion:ipam.mirantis.com/v1alpha1kind:Subnetmetadata:name:rack-2-tenant-tunnelnamespace:mosk-namespace-namelabels:kaas.mirantis.com/provider:baremetalcluster.sigs.k8s.io/cluster-name:mosk-cluster-namerack-2-tenant-tunnel:truespec:cidr:10.20.162.0/24gateway:10.20.162.1includeRanges:-10.20.162.16-10.20.162.255---apiVersion:ipam.mirantis.com/v1alpha1kind:Subnetmetadata:name:rack-3-tenant-tunnelnamespace:mosk-namespace-namelabels:kaas.mirantis.com/provider:baremetalcluster.sigs.k8s.io/cluster-name:mosk-cluster-namerack-3-tenant-tunnel:truespec:cidr:10.20.163.0/24gateway:10.20.163.1includeRanges:-10.20.163.16-10.20.163.255---# Add more Subnet object templates as required using the above example# (one subnet per rack)
The network is used by the Compute service (OpenStack Nova) to transfer data
during live migration. Depending on the cloud needs, it can be placed on a
dedicated physical network not to affect other networks during live migration.
Example mosk-racks-live-migration.yaml
apiVersion:ipam.mirantis.com/v1alpha1kind:Subnetmetadata:name:rack-1-live-migrationnamespace:mosk-namespace-namelabels:kaas.mirantis.com/provider:baremetalcluster.sigs.k8s.io/cluster-name:mosk-cluster-namerack-1-live-migration:truespec:cidr:10.20.171.0/24gateway:10.20.171.1includeRanges:-10.20.171.16-10.20.171.255---apiVersion:ipam.mirantis.com/v1alpha1kind:Subnetmetadata:name:rack-2-live-migrationnamespace:mosk-namespace-namelabels:kaas.mirantis.com/provider:baremetalcluster.sigs.k8s.io/cluster-name:mosk-cluster-namerack-2-live-migration:truespec:cidr:10.20.172.0/24gateway:10.20.172.1includeRanges:-10.20.172.16-10.20.172.255---apiVersion:ipam.mirantis.com/v1alpha1kind:Subnetmetadata:name:rack-3-live-migrationnamespace:mosk-namespace-namelabels:kaas.mirantis.com/provider:baremetalcluster.sigs.k8s.io/cluster-name:mosk-cluster-namerack-3-live-migration:truespec:cidr:10.20.173.0/24gateway:10.20.173.1includeRanges:-10.20.173.16-10.20.173.255---# Add more Subnet object templates as required using the above example# (one subnet per rack)
After you create subnets for the MOSK cluster
as described in Create subnets, follow the procedure below to create L2
templates for different types of OpenStack nodes in the cluster.
See the following subsections for templates that implement the
MOSK Reference Architecture: Networking.
You may adjust the templates according to the requirements of
your architecture using the last two subsections of this section.
They explain mandatory parameters of the templates and supported
configuration options.
After you create subnets for one or more MOSK clusters or
projects as described in Create subnets, follow the procedure below to
create L2 templates for a MOSK cluster.
L2 templates are used directly during provisioning. This way, a hardware node
obtains and applies a complete network configuration during the first system
boot.
Create L2 templates before adding any machines to your new
MOSK cluster.
Log in to a local machine where your management cluster kubeconfig
is located and where kubectl is installed.
Note
The management cluster kubeconfig is created during the last
stage of the management cluster bootstrap.
Create a set of L2Template YAML files specific to your deployment using
exemplary templates provided in Create L2 templates.
Note
You can create several L2 templates with different configurations
to be applied to different nodes of the same cluster. See
Assign L2 templates to machines for details.
Add or edit the mandatory labels and parameters in the new L2 template.
For description of mandatory labels and parameters, see
Container Cloud API Reference: L2Template.
Optional. To designate an L2 template as default, assign the
ipam/DefaultForCluster label to it. Only one L2 template in a cluster
can have this label. It will be used for machines that do not have an L2
template explicitly assigned to them.
Note
You may skip this step and add the default label along with
other custom labels using the Container Cloud web UI, as described below
in this procedure.
To assign the default template to the cluster:
Since MCC 2.25.0 (17.0.0 and 16.0.0)
Use the mandatory cluster.sigs.k8s.io/cluster-name label in the L2
template metadata section.
Before MCC 2.25.0 (15.x, 14.x, or earlier)
Use the cluster.sigs.k8s.io/cluster-name label or the clusterRef
parameter in the L2 template spec section. During cluster update to
2.25.0, this deprecated parameter is automatically migrated to the
cluster.sigs.k8s.io/cluster-name label.
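For illustration, the metadata of an L2 template that is designated as default
for a cluster could look as follows. The object and cluster names are
placeholders, and the "1" value for ipam/DefaultForCluster follows the same
convention as the other ipam/ labels in this guide:
apiVersion: ipam.mirantis.com/v1alpha1
kind: L2Template
metadata:
  name: default-l2template           # placeholder name
  namespace: mosk-namespace-name     # placeholder project (namespace)
  labels:
    ipam/DefaultForCluster: "1"
    kaas.mirantis.com/provider: baremetal
    cluster.sigs.k8s.io/cluster-name: mosk-cluster-name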
Optional. Add custom labels to the L2 template. You can refer to these
labels to assign the L2 template to machines.
Add the L2 template to your management cluster. Select one of the following
options:
Using the Container Cloud web UI
Since MCC 2.26.0 (17.1.0 and 16.1.0)
Log in to the Container Cloud web UI with the m:kaas:namespace@operator or
m:kaas:namespace@writer permissions.
Switch to the required non-default project using the
Switch Project action icon located on top of the main left-side
navigation panel.
Caution
Do not create a MOSK cluster in the default project
(Kubernetes namespace), which is dedicated for the management cluster only.
If no projects are defined, first create a new mosk project as described
in Create a project for MOSK clusters.
In the left sidebar, navigate to Networks and click
the L2 Templates tab.
Click Create L2 Template.
Fill out the Create L2 Template form as required:
Name
L2 template name.
Cluster
Cluster name that the L2 template is being added for. To set the
L2 template as default for all machines, also select
Set default for the cluster.
Specification
L2 specification in the YAML format that you have previously
created. Click Edit to edit the L2 template if
required.
Note
Before Container Cloud 2.28.0 (Cluster releases 17.3.0
and 16.3.0), the field name is YAML file, and you
can upload the required YAML file instead of inserting and
editing it.
Modification of L2 templates in use is only allowed with a
mandatory validation step from the infrastructure operator to prevent
accidental cluster failures due to unsafe changes. The list of risks posed
by modifying L2 templates includes:
Services running on hosts cannot reconfigure automatically to switch to
the new IP addresses and/or interfaces.
Connections between services are interrupted unexpectedly, which can cause
data loss.
Incorrect configurations on hosts can lead to irrevocable loss of
connectivity between services and unexpected cluster partition or
disassembly.
Proceed with Add a machine. The resulting L2 template will be used to
render the netplan configuration for the MOSK cluster
machines.
Workflow of the netplan configuration using an L2 template
The kaas-ipam service uses the data from BareMetalHost,
L2Template, and Subnet objects to generate the netplan
configuration for every cluster machine.
Note
Before update of the management cluster to Container Cloud 2.29.0
(Cluster release 16.4.0), instead of BareMetalHostInventory, use the
BareMetalHost object. For details, see Container Cloud API Reference:
BareMetalHost resource.
Caution
While the Cluster release of the management cluster is 16.4.0,
BareMetalHostInventory operations are allowed to
m:kaas@management-admin only. This limitation is lifted once the
management cluster is updated to the Cluster release 16.4.1 or later.
The generated netplan configuration is saved in the
status.netconfigFiles section of the IpamHost object. If the
status.netconfigFilesState field of the IpamHost object is OK,
the configuration was rendered in the IpamHost object successfully.
Otherwise, the status contains an error message.
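For example, you can review the rendered configuration and its status fields by
inspecting the IpamHost object with kubectl; the namespace and object name below
are placeholders:
kubectl -n mosk-namespace-name get ipamhost example-host -o yaml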
Caution
The following fields of the ipamHost status are renamed since
MOSK 23.1 in the scope of the L2Template and IpamHost objects
refactoring:
netconfigV2 to netconfigCandidate
netconfigV2state to netconfigCandidateState
netconfigFilesState to netconfigFilesStates (per file)
No user actions are required after renaming.
The format of netconfigFilesState changed after renaming. The
netconfigFilesStates field contains a dictionary of statuses of network
configuration files stored in netconfigFiles. The dictionary contains
the keys that are file paths and values that have the same meaning for each
file that netconfigFilesState had:
For a successfully rendered configuration file:
OK:<timestamp><sha256-hash-of-rendered-file>, where a timestamp
is in the RFC 3339 format.
For a failed rendering: ERR:<error-message>.
The baremetal-provider service copies data from
status.netconfigFiles of the IpamHost object to the
Spec.StateItemsOverwrites['deploy']['bm_ipam_netconfigv2'] parameter
of LCMMachine.
The lcm-agent service on every host synchronizes the LCMMachine
data to its host. The lcm-agent service runs a playbook to update the
netplan configuration on the host during the pre-download and deploy
phases.
Create L2 templates for a multi-rack MOSK cluster
For a multi-rack MOSK cluster, you need to create
one L2 template for each type of server in each rack. This may result
in a large number of L2 templates in your configuration.
For example, if you have a three-rack deployment of MOSK
with 4 types of nodes evenly distributed across three racks, you have to create
at least the following L2 templates:
rack-1-k8s-manager, rack-2-k8s-manager, rack-3-k8s-manager
for Kubernetes control plane nodes, unless you use the compact control
plane option.
rack-1-mosk-control, rack-2-mosk-control, rack-3-mosk-control
for OpenStack controller nodes in each rack.
rack-1-mosk-compute, rack-2-mosk-compute, rack-3-mosk-compute
for OpenStack compute nodes in each rack.
rack-1-mosk-storage, rack-2-mosk-storage, rack-3-mosk-storage
for OpenStack storage nodes in each rack.
In total, twelve L2 templates are required for this relatively simple cluster.
In the following sections, the examples cover only one rack, but can be
easily expanded to more racks.
Note
Three servers are required for Kubernetes control plane and for the
OpenStack control plane. So, you might not need more L2 templates for these
roles when expanding beyond three racks.
Create an L2 template for a Kubernetes manager node
Caution
Modification of L2 templates in use is only allowed with a
mandatory validation step from the infrastructure operator to prevent
accidental cluster failures due to unsafe changes. The list of risks posed
by modifying L2 templates includes:
Services running on hosts cannot reconfigure automatically to switch to
the new IP addresses and/or interfaces.
Connections between services are interrupted unexpectedly, which can cause
data loss.
Incorrect configurations on hosts can lead to irrevocable loss of
connectivity between services and unexpected cluster partition or
disassembly.
To create L2 templates for Kubernetes manager nodes:
Create or open the mosk-l2templates.yml file that contains
the L2 templates you are preparing.
Add L2 templates using the following example. Adjust the values of
specific parameters according to the specifications of your environment,
specifically the name of your project (namespace) and cluster,
the IP address ranges and networks, and the subnet names.
The kaas.mirantis.com/region label is removed from all
Container Cloud and MOSK objects in 24.1.
Therefore, do not add the label starting with these releases. On existing
clusters updated to these releases, or if added manually, Container Cloud
ignores this label.
Note
Before MOSK 23.3, an L2 template requires
clusterRef:<clusterName> in the spec section. Since MOSK 23.3,
this parameter is deprecated and automatically migrated to the
cluster.sigs.k8s.io/cluster-name:<clusterName> label.
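The following is a simplified sketch of such an L2 template for a Kubernetes
manager node in rack 1, not the full product example: it covers only the
provisioning NIC and the LCM bridge, assumes the mgmt-lcm and rack-1-lcm subnets
created earlier, and omits the external and Kubernetes workloads networks that a
manager node typically also uses. All names are placeholders.
apiVersion: ipam.mirantis.com/v1alpha1
kind: L2Template
metadata:
  name: rack-1-k8s-manager           # placeholder name
  namespace: mosk-namespace-name     # placeholder project (namespace)
  labels:
    kaas.mirantis.com/provider: baremetal
    cluster.sigs.k8s.io/cluster-name: mosk-cluster-name
    rack-1-k8s-manager: "true"       # optional custom label for machine assignment
spec:
  autoIfMappingPrio:
  - provision
  - eno
  - ens
  - enp
  l3Layout:
  - subnetName: mgmt-lcm
    scope: global
  - subnetName: rack-1-lcm
    scope: namespace
  npTemplate: |-
    version: 2
    ethernets:
      {{nic 0}}:
        dhcp4: false
        dhcp6: false
        match:
          macaddress: {{mac 0}}
        set-name: {{nic 0}}
    bridges:
      k8s-lcm:
        interfaces: [{{nic 0}}]
        addresses:
        - {{ ip "k8s-lcm:rack-1-lcm" }}
        gateway4: {{ gateway_from_subnet "rack-1-lcm" }}
        nameservers:
          addresses: {{ nameservers_from_subnet "rack-1-lcm" }}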
To create L2 templates for other racks, change the rack
identifier in the names and labels above.
Modification of L2 templates in use is only allowed with a
mandatory validation step from the infrastructure operator to prevent
accidental cluster failures due to unsafe changes. The list of risks posed
by modifying L2 templates includes:
Services running on hosts cannot reconfigure automatically to switch to
the new IP addresses and/or interfaces.
Connections between services are interrupted unexpectedly, which can cause
data loss.
Incorrect configurations on hosts can lead to irrevocable loss of
connectivity between services and unexpected cluster partition or
disassembly.
According to the reference architecture, MOSK controller
nodes must be connected to the following networks:
PXE network
LCM network
Kubernetes workloads network
Storage access network (if deploying with Ceph as a backend for ephemeral
storage)
Floating IP and provider networks. Not required for deployment with
Tungsten Fabric.
Tenant underlay networks, if deploying with VXLAN networking or with
Tungsten Fabric. In the latter case, the BGP service is configured over
this network.
To create L2 templates for MOSK controller nodes:
Create or open the mosk-l2template.yml file that contains
the L2 templates.
Add L2 templates using the following example. Adjust the values of
specific parameters according to the specification of your environment.
Example of an L2 template for a MOSK
controller node
apiVersion: ipam.mirantis.com/v1alpha1
kind: L2Template
metadata:
  labels:
    kaas.mirantis.com/provider: baremetal
    kaas.mirantis.com/region: region-one
    cluster.sigs.k8s.io/cluster-name: mosk-cluster-name
    rack1-mosk-controller: "true"
  name: rack1-mosk-controller
  namespace: mosk-namespace-name
spec:
  autoIfMappingPrio:
  - provision
  - eno
  - ens
  - enp
  l3Layout:
  - subnetName: mgmt-lcm
    scope: global
  - subnetName: rack1-k8s-lcm
    scope: namespace
  - subnetName: k8s-external
    scope: namespace
  - subnetName: rack1-k8s-pods
    scope: namespace
  - subnetName: rack1-ceph-public
    scope: namespace
  - subnetName: rack1-tenant-tunnel
    scope: namespace
  npTemplate: |-
    version: 2
    ethernets:
      {{nic 0}}:
        dhcp4: false
        dhcp6: false
        match:
          macaddress: {{mac 0}}
        set-name: {{nic 0}}
        mtu: 9000
      {{nic 1}}:
        dhcp4: false
        dhcp6: false
        match:
          macaddress: {{mac 1}}
        set-name: {{nic 1}}
        mtu: 9000
      {{nic 2}}:
        dhcp4: false
        dhcp6: false
        match:
          macaddress: {{mac 2}}
        set-name: {{nic 2}}
        mtu: 9000
      {{nic 3}}:
        dhcp4: false
        dhcp6: false
        match:
          macaddress: {{mac 3}}
        set-name: {{nic 3}}
        mtu: 9000
    bonds:
      bond0:
        mtu: 9000
        parameters:
          mode: 802.3ad
          mii-monitor-interval: 100
        interfaces:
        - {{nic 0}}
        - {{nic 1}}
      bond1:
        mtu: 9000
        parameters:
          mode: 802.3ad
          mii-monitor-interval: 100
        interfaces:
        - {{nic 2}}
        - {{nic 3}}
    vlans:
      k8s-lcm-v:
        id: 403
        link: bond0
        mtu: 9000
      k8s-ext-v:
        id: 409
        link: bond0
        mtu: 9000
      k8s-pods-v:
        id: 408
        link: bond0
        mtu: 9000
      pr-floating:
        id: 407
        link: bond1
        mtu: 9000
      stor-frontend:
        id: 404
        link: bond0
        addresses:
        - {{ip "stor-frontend:rack1-ceph-public"}}
        mtu: 9000
        routes:
        - to: 10.199.16.0/22 # aggregated address space for Ceph public network
          via: {{ gateway_from_subnet "rack1-ceph-public" }}
      tenant-tunnel:
        id: 406
        link: bond1
        addresses:
        - {{ip "tenant-tunnel:rack1-tenant-tunnel"}}
        mtu: 9000
        routes:
        - to: 10.195.0.0/22 # aggregated address space for tenant networks
          via: {{ gateway_from_subnet "rack1-tenant-tunnel" }}
    bridges:
      k8s-lcm:
        interfaces: [k8s-lcm-v]
        addresses:
        - {{ ip "k8s-lcm:rack1-k8s-lcm" }}
        nameservers:
          addresses: {{nameservers_from_subnet "rack1-k8s-lcm"}}
        routes:
        - to: 10.197.0.0/21 # aggregated address space for LCM and API/LCM networks
          via: {{ gateway_from_subnet "rack1-k8s-lcm" }}
        - to: {{ cidr_from_subnet "mgmt-lcm" }}
          via: {{ gateway_from_subnet "rack1-k8s-lcm" }}
      k8s-ext:
        interfaces: [k8s-ext-v]
        addresses:
        - {{ip "k8s-ext:k8s-external"}}
        nameservers:
          addresses: {{nameservers_from_subnet "k8s-external"}}
        gateway4: {{ gateway_from_subnet "k8s-external" }}
        mtu: 9000
      k8s-pods:
        interfaces: [k8s-pods-v]
        addresses:
        - {{ip "k8s-pods:rack1-k8s-pods"}}
        mtu: 9000
        routes:
        - to: 10.199.0.0/22 # aggregated address space for Kubernetes workloads
          via: {{ gateway_from_subnet "rack1-k8s-pods" }}
Note
The kaas.mirantis.com/region label is removed from all
Container Cloud and MOSK objects in 24.1.
Therefore, do not add the label starting with these releases. On existing
clusters updated to these releases, or if added manually, Container Cloud
ignores this label.
Note
Before MOSK 23.3, an L2 template requires
clusterRef:<clusterName> in the spec section. Since MOSK 23.3,
this parameter is deprecated and automatically migrated to the
cluster.sigs.k8s.io/cluster-name:<clusterName> label.
Caution
If you plan to deploy a MOSK cluster with
the compact control plane option and configure ARP announcement of the
load-balancer IP address for the MOSK cluster API, the
API/LCM network will be used for MOSK controller nodes.
Therefore, change the rack1-k8s-lcm subnet to the api-lcm one in
the corresponding L2Template object:
spec:
  ...
  l3Layout:
    ...
    - subnetName: api-lcm
      scope: namespace
    ...
  npTemplate: |-
    ...
    bridges:
      k8s-lcm:
        interfaces: [k8s-lcm-v]
        addresses:
        - {{ ip "k8s-lcm:api-lcm" }}
        nameservers:
          addresses: {{ nameservers_from_subnet "api-lcm" }}
        routes:
        - to: 10.197.0.0/21 # aggregated address space for LCM and API/LCM networks
          via: {{ gateway_from_subnet "api-lcm" }}
        - to: {{ cidr_from_subnet "mgmt-lcm" }}
          via: {{ gateway_from_subnet "api-lcm" }}
    ...
Modification of L2 templates in use is only allowed with a
mandatory validation step from the infrastructure operator to prevent
accidental cluster failures due to unsafe changes. The list of risks posed
by modifying L2 templates includes:
Services running on hosts cannot reconfigure automatically to switch to
the new IP addresses and/or interfaces.
Connections between services are interrupted unexpectedly, which can cause
data loss.
Incorrect configurations on hosts can lead to irrevocable loss of
connectivity between services and unexpected cluster partition or
disassembly.
According to the reference architecture, MOSK compute nodes
must be connected to the following networks:
PXE network
LCM network
Kubernetes workloads network
Storage access network (if deploying with Ceph as a backend for ephemeral
storage)
Floating IP and provider networks (if deploying OpenStack with DVR)
Tenant underlay networks
To create L2 templates for MOSK compute nodes:
Add L2 templates to the mosk-l2templates.yml file using the following
example. Adjust the values of parameters according to the specification
of your environment.
Example of an L2 template for a MOSK
compute node
The kaas.mirantis.com/region label is removed from all
Container Cloud and MOSK objects in 24.1.
Therefore, do not add the label starting with these releases. On existing
clusters updated to these releases, or if added manually, Container Cloud
ignores this label.
Note
Before MOSK 23.3, an L2 template requires
clusterRef:<clusterName> in the spec section. Since MOSK 23.3,
this parameter is deprecated and automatically migrated to the
cluster.sigs.k8s.io/cluster-name:<clusterName> label.
Modification of L2 templates in use is only allowed with a
mandatory validation step from the infrastructure operator to prevent
accidental cluster failures due to unsafe changes. The list of risks posed
by modifying L2 templates includes:
Services running on hosts cannot reconfigure automatically to switch to
the new IP addresses and/or interfaces.
Connections between services are interrupted unexpectedly, which can cause
data loss.
Incorrect configurations on hosts can lead to irrevocable loss of
connectivity between services and unexpected cluster partition or
disassembly.
According to the reference architecture, MOSK storage nodes
in the MOSK cluster must be connected to the
following networks:
PXE network
LCM network
Kubernetes workloads network
Storage access network
Storage replication network
To create L2 templates for MOSK storage nodes:
Add L2 templates to the mosk-l2templates.yml file using the following
example. Adjust the values of parameters according to the specification
of your environment.
Example of an L2 template for a MOSK storage node
The kaas.mirantis.com/region label is removed from all
Container Cloud and MOSK objects in 24.1.
Therefore, do not add the label starting with these releases. On existing
clusters updated to these releases, or if added manually, Container Cloud
ignores this label.
Note
Before MOSK 23.3, an L2 template requires
clusterRef:<clusterName> in the spec section. Since MOSK 23.3,
this parameter is deprecated and automatically migrated to the
cluster.sigs.k8s.io/cluster-name:<clusterName> label.
This section contains an exemplary L2 template that demonstrates how to
set up bonds and bridges on hosts for your managed clusters.
Caution
Use of a dedicated network for Kubernetes pods traffic,
for external connection to the Kubernetes services exposed
by the cluster, and for the Ceph cluster access and replication
traffic is available as Technology Preview. Use such
configurations for testing and evaluation purposes only.
For the Technology Preview feature definition,
refer to Technology Preview features.
Configure bonding options using the parameters field. The only
mandatory option is mode. See the example below for details.
Note
You can set any mode supported by
netplan
and your hardware.
Important
Bond monitoring is disabled in Ubuntu by default. However,
Mirantis highly recommends enabling it using Media Independent Interface
(MII) monitoring by setting the mii-monitor-interval parameter to a
non-zero value. For details, see Linux documentation: bond monitoring.
The Kubernetes LCM network connects LCM Agents running on nodes to the LCM API
of the management cluster. It is also used for communication between
kubelet and Kubernetes API server inside a Kubernetes cluster.
The MKE components use this network for communication inside a swarm
cluster.
To configure each node with an IP address that will be used for LCM traffic,
use the npTemplate.bridges.k8s-lcm bridge in the L2 template, as
demonstrated in the example below.
Each node of every cluster must have one and only one IP address in the LCM
network that is allocated from one of the Subnet
objects having the ipam/SVC-k8s-lcm label defined. Therefore, all
Subnet objects used for LCM networks must have the ipam/SVC-k8s-lcm
label defined.
You can use any interface name for the LCM network traffic.
The Subnet objects for the LCM network must have the
ipam/SVC-k8s-lcm label. For details, see Service labels and their life cycle.
Dedicated network for the Kubernetes pods traffic
If you want to use a dedicated network for Kubernetes pods traffic,
configure each node with an IPv4
address that will be used to route the pods traffic between nodes.
To accomplish that, use the npTemplate.bridges.k8s-pods bridge
in the L2 template, as demonstrated in the example below.
As defined in Container Cloud Reference Architecture: Host networking,
this bridge name is reserved for the Kubernetes pods network. When the
k8s-pods bridge is defined in an L2 template, Calico CNI uses that network
for routing the pods traffic between nodes.
Dedicated network for the Kubernetes services traffic (MetalLB)
You can use a dedicated network for external connection to the Kubernetes
services exposed by the cluster.
If enabled, MetalLB will listen and respond on the dedicated virtual bridge.
To accomplish that, configure each node where metallb-speaker is deployed
with an IPv4 address. For details on selecting nodes for metallb-speaker,
see Configure the MetalLB speaker node selector. Both the MetalLB IP address ranges and the
IP addresses configured on those nodes must fit in the same CIDR.
The default route on the MOSK nodes that are connected to
the external network must be configured with the default gateway in the
external network.
Caution
The IP address ranges of the corresponding subnet used in
L2Template for the dedicated virtual bridge must be excluded from the
MetalLB address ranges.
Dedicated networks for the Ceph distributed storage traffic
You can configure dedicated networks for the Ceph cluster access and
replication traffic. Set labels on the Subnet CRs for the corresponding
networks, as described in Create subnets.
Container Cloud automatically configures Ceph to use the addresses from these
subnets. Ensure that the addresses are assigned to the storage nodes.
The Subnet objects used to assign IP addresses to these networks
must have corresponding labels ipam/SVC-ceph-public for the
Ceph public (storage access) network and ipam/SVC-ceph-cluster for the
Ceph cluster (storage replication) network.
Example of an L2 template with interfaces bonding
Before MOSK 23.3, an L2 template requires
clusterRef:<clusterName> in the spec section. Since MOSK 23.3,
this parameter is deprecated and automatically migrated to the
cluster.sigs.k8s.io/cluster-name:<clusterName> label.
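Since the full example is maintained in the product documentation, the snippet
below is only a sketch of the npTemplate part that such a template typically
contains: two NICs aggregated into an 802.3ad bond with MII monitoring enabled
and an LCM bridge on top. It assumes an lcm-subnet Subnet object; all names are
placeholders.
  npTemplate: |-
    version: 2
    ethernets:
      {{nic 0}}:
        dhcp4: false
        dhcp6: false
        match:
          macaddress: {{mac 0}}
        set-name: {{nic 0}}
      {{nic 1}}:
        dhcp4: false
        dhcp6: false
        match:
          macaddress: {{mac 1}}
        set-name: {{nic 1}}
    bonds:
      bond0:
        parameters:
          mode: 802.3ad              # the only mandatory bonding option
          mii-monitor-interval: 100  # enable MII bond monitoring
        interfaces:
        - {{nic 0}}
        - {{nic 1}}
    bridges:
      k8s-lcm:
        interfaces: [bond0]
        addresses:
        - {{ ip "k8s-lcm:lcm-subnet" }}
        gateway4: {{ gateway_from_subnet "lcm-subnet" }}
        nameservers:
          addresses: {{ nameservers_from_subnet "lcm-subnet" }}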
L2 template example for automatic multiple subnet creation
This section contains an exemplary L2 template for automatic multiple
subnet creation as described in Automate multiple subnet creation using SubnetPool. This template
also contains the L3Layout section that allows defining the Subnet
scopes and enables auto-creation of the Subnet objects from the
SubnetPool objects. For details about auto-creation of the Subnet
objects see Automate multiple subnet creation using SubnetPool.
Do not assign an IP address to the PXE nic0 NIC explicitly
to prevent the IP duplication during updates. The IP address is
automatically assigned by the bootstrapping engine.
Example of an L2 template for multiple subnets:
apiVersion:ipam.mirantis.com/v1alpha1kind:L2Templatemetadata:name:test-managednamespace:managed-nslabels:kaas.mirantis.com/provider:baremetalkaas.mirantis.com/region:region-onecluster.sigs.k8s.io/cluster-name:my-clusterspec:autoIfMappingPrio:-provision-eno-ens-enpl3Layout:-subnetName:lcm-subnetscope:namespace-subnetName:subnet-1subnetPool:kaas-mgmtscope:namespace-subnetName:subnet-2subnetPool:kaas-mgmtscope:clusternpTemplate:|version: 2ethernets:onboard1gbe0:dhcp4: falsedhcp6: falsematch:macaddress: {{mac 0}}set-name: {{nic 0}}# IMPORTANT: do not assign an IP address here explicitly# to prevent IP duplication issues. The IP will be assigned# automatically by the bootstrapping engine.# addresses: []onboard1gbe1:dhcp4: falsedhcp6: falsematch:macaddress: {{mac 1}}set-name: {{nic 1}}ten10gbe0s0:dhcp4: falsedhcp6: falsematch:macaddress: {{mac 2}}set-name: {{nic 2}}addresses:- {{ip "2:subnet-1"}}ten10gbe0s1:dhcp4: falsedhcp6: falsematch:macaddress: {{mac 3}}set-name: {{nic 3}}addresses:- {{ip "3:subnet-2"}}bridges:k8s-lcm:interfaces: [onboard1gbe0]addresses:- {{ip "k8s-lcm:lcm-subnet"}}gateway4: {{gateway_from_subnet "lcm-subnet"}}nameservers:addresses: {{nameservers_from_subnet "lcm-subnet"}}
Note
The kaas.mirantis.com/region label is removed from all
Container Cloud and MOSK objects in 24.1.
Therefore, do not add the label starting with these releases. On existing
clusters updated to these releases, or if added manually, Container Cloud
ignores this label.
In the template above, the following networks are defined
in the l3Layout section:
lcm-subnet - the subnet name to use for the LCM network in
npTemplate. This subnet is shared between the project clusters because
it has the namespaced scope.
Since a subnet pool is not in use, manually create the corresponding Subnet
object before machines are attached to the cluster. For details, see
Create subnets for a managed cluster using CLI.
Mark this Subnet with the ipam/SVC-k8s-lcm label.
The L2 template must contain the definition of the virtual Linux bridge
(k8s-lcm in the L2 template example) that is used to set up the LCM
network interface. IP addresses for the defined bridge must be assigned
from the LCM subnet, which is marked with the ipam/SVC-k8s-lcm label.
Each node of every cluster must have one and only one IP address in the LCM
network that is allocated from one of the Subnet
objects having the ipam/SVC-k8s-lcm label defined. Therefore, all
Subnet objects used for LCM networks must have the ipam/SVC-k8s-lcm
label defined.
You can use any interface name for the LCM network traffic.
The Subnet objects for the LCM network must have the
ipam/SVC-k8s-lcm label. For details, see Service labels and their life cycle.
subnet-1 - unless already created, this subnet will be created
from the kaas-mgmt subnet pool. The subnet name must be unique within
the project. This subnet is shared between the project clusters.
subnet-2 - will be created from the kaas-mgmt subnet pool.
This subnet has the cluster scope. Therefore, the real name of the
Subnet CR object consists of the subnet name defined in l3Layout
and the cluster UID.
But the npTemplate section of the L2 template must contain only
the subnet name defined in l3Layout.
The subnets of the cluster scope are not shared between clusters.
Caution
Using the l3Layout section, define all subnets that are used
in the npTemplate section.
Defining only a subset of the subnets is not allowed.
If labelSelector is used in l3Layout, use any custom
label name that differs from system names. This allows for easier
cluster scaling in case of adding new subnets as described in
Expand IP addresses capacity in an existing cluster.
Mirantis recommends using a unique label prefix such as
user-defined/.
Example of a complete template configuration for cluster creation
The following example contains all required objects of an advanced network
and host configuration for a managed cluster.
The procedure below contains:
Various .yaml objects to be applied with a managed cluster
kubeconfig
Useful comments inside the .yaml example files
Example hardware and configuration data, such as network, disk, and
auth, that must be updated to fit your cluster configuration
Example templates, such as l2template and baremetalhostprofile,
that illustrate how to implement a specific configuration
Caution
The exemplary configuration described below is not production
ready and is provided for illustration purposes only.
For illustration purposes, all files provided in this exemplary procedure
are named by the Kubernetes object types:
Note
Before update of the management cluster to Container Cloud 2.29.0
(Cluster release 16.4.0), instead of BareMetalHostInventory, use the
BareMetalHost object. For details, see Container Cloud API Reference:
BareMetalHost resource.
Caution
While the Cluster release of the management cluster is 16.4.0,
BareMetalHostInventory operations are allowed to
m:kaas@management-admin only. This limitation is lifted once the
management cluster is updated to the Cluster release 16.4.1 or later.
Create an empty .yaml file with the namespace object:
apiVersion: v1
kind: Namespace
metadata:
  name: managed-ns
Select from the following options:
Since MCC 2.21.0 (11.5.0, 7.11.0)
Create the required number of .yaml files with the
BareMetalHostCredential objects for each bmh node with the
unique name and authentication data. The following example
contains one BareMetalHostCredential object:
Note
The kaas.mirantis.com/region label is removed from all
Container Cloud and MOSK objects in 24.1.
Therefore, do not add the label starting with these releases. On existing
clusters updated to these releases, or if added manually, Container Cloud
ignores this label.
Before MCC 2.21.0 (11.4.0, 8.10.0, 7.10.0, or earlier)
Create the required number of .yaml files with the Secret
objects for each bmh node with the unique name and
authentication data. The following example contains one Secret
object:
apiVersion:kaas.mirantis.com/v1alpha1kind:BareMetalHostInventorymetadata:annotations:inspect.metal3.io/hardwaredetails-storage-sort-term:hctl ASC, wwn ASC, by_id ASC, name ASClabels:cluster.sigs.k8s.io/cluster-name:managed-cluster# we will use those label, to link machine to exact bmh nodekaas.mirantis.com/baremetalhost-id:cz7700kaas.mirantis.com/provider:baremetalname:cz7700-managed-cluster-control-noefinamespace:managed-nsspec:bmc:address:192.168.1.12bmhCredentialsName:'cz7740-cred'bootMACAddress:0c:c4:7a:34:52:04bootMode:legacyonline:true
apiVersion:metal3.io/v1alpha1kind:BareMetalHostmetadata:labels:cluster.sigs.k8s.io/cluster-name:managed-clusterhostlabel.bm.kaas.mirantis.com/controlplane:controlplane# we will use those label, to link machine to exact bmh nodekaas.mirantis.com/baremetalhost-id:cz7700kaas.mirantis.com/provider:baremetalkaas.mirantis.com/region:region-oneannotations:kaas.mirantis.com/baremetalhost-credentials-name:cz7700-credname:cz7700-managed-cluster-control-noefinamespace:managed-nsspec:bmc:address:192.168.1.12# credentialsName is updated automatically during cluster deploymentcredentialsName:''bootMACAddress:0c:c4:7a:34:52:04bootMode:legacyonline:true
apiVersion:metal3.io/v1alpha1kind:BareMetalHostmetadata:labels:cluster.sigs.k8s.io/cluster-name:managed-clusterhostlabel.bm.kaas.mirantis.com/controlplane:controlplane# we will use those label, to link machine to exact bmh nodekaas.mirantis.com/baremetalhost-id:cz7700kaas.mirantis.com/provider:baremetalkaas.mirantis.com/region:region-onename:cz7700-managed-cluster-control-noefinamespace:managed-nsspec:bmc:address:192.168.1.12# The secret for credentials requires the username and password# keys in the Base64 encoding.credentialsName:cz7700-credbootMACAddress:0c:c4:7a:34:52:04bootMode:legacyonline:true
apiVersion:metal3.io/v1alpha1kind:BareMetalHostProfilemetadata:labels:cluster.sigs.k8s.io/cluster-name:managed-cluster# This label indicates that this profile will be default in# namespaces, so machines w\o exact profile selecting will use# this templatekaas.mirantis.com/defaultBMHProfile:'true'kaas.mirantis.com/provider:baremetalkaas.mirantis.com/region:region-onename:bmhp-cluster-defaultnamespace:managed-nsspec:devices:-device:byPath:/dev/disk/by-path/pci-0000:00:1f.2-ata-1minSize:120Giwipe:truepartitions:-name:bios_grubpartflags:-bios_grubsize:4Miwipe:true-name:uefipartflags:-espsize:200Miwipe:true-name:config-2size:64Miwipe:true-name:lvm_dummy_partsize:1Giwipe:true-name:lvm_root_partsize:0wipe:true-device:byPath:/dev/disk/by-path/pci-0000:00:1f.2-ata-2minSize:30Giwipe:true-device:byPath:/dev/disk/by-path/pci-0000:00:1f.2-ata-3minSize:30Giwipe:truepartitions:-name:lvm_lvp_partsize:0wipe:true-device:byPath:/dev/disk/by-path/pci-0000:00:1f.2-ata-4wipe:truefileSystems:-fileSystem:vfatpartition:config-2-fileSystem:vfatmountPoint:/boot/efipartition:uefi-fileSystem:ext4logicalVolume:rootmountPoint:/-fileSystem:ext4logicalVolume:lvpmountPoint:/mnt/local-volumes/grubConfig:defaultGrubOptions:-GRUB_DISABLE_RECOVERY="true"-GRUB_PRELOAD_MODULES=lvm-GRUB_TIMEOUT=30kernelParameters:modules:-content:'optionskvm_intelnested=1'filename:kvm_intel.confsysctl:# For the list of options prohibited to change, refer to# https://docs.mirantis.com/mke/3.7/install/predeployment/set-up-kernel-default-protections.htmlfs.aio-max-nr:'1048576'fs.file-max:'9223372036854775807'fs.inotify.max_user_instances:'4096'kernel.core_uses_pid:'1'kernel.dmesg_restrict:'1'net.ipv4.conf.all.rp_filter:'0'net.ipv4.conf.default.rp_filter:'0'net.ipv4.conf.k8s-ext.rp_filter:'0'net.ipv4.conf.k8s-ext.rp_filter:'0'net.ipv4.conf.m-pub.rp_filter:'0'vm.max_map_count:'262144'logicalVolumes:-name:rootsize:0vg:lvm_root-name:lvpsize:0vg:lvm_lvppostDeployScript:|#!/bin/bash -ex# used for test-debug only!echo "root:r00tme" | sudo chpasswdecho 'ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="deadline"' > /etc/udev/rules.d/60-ssd-scheduler.rulesecho $(date) 'post_deploy_script done' >> /root/post_deploy_donepreDeployScript:|#!/bin/bash -execho "$(date) pre_deploy_script done" >> /root/pre_deploy_donevolumeGroups:-devices:-partition:lvm_root_partname:lvm_root-devices:-partition:lvm_lvp_partname:lvm_lvp-devices:-partition:lvm_dummy_part# here we can create lvm, but do not mount or format it somewherename:lvm_forawesomeapp
apiVersion:metal3.io/v1alpha1kind:BareMetalHostProfilemetadata:labels:cluster.sigs.k8s.io/cluster-name:managed-clusterkaas.mirantis.com/provider:baremetalkaas.mirantis.com/region:region-onename:worker-storage1namespace:managed-nsspec:devices:-device:minSize:120Giwipe:truepartitions:-name:bios_grubpartflags:-bios_grubsize:4Miwipe:true-name:uefipartflags:-espsize:200Miwipe:true-name:config-2size:64Miwipe:true# Create dummy partition w\o mounting-name:lvm_dummy_partsize:1Giwipe:true-name:lvm_root_partsize:0wipe:true-device:# Will be used for Ceph, so required to be wipedbyPath:/dev/disk/by-path/pci-0000:00:1f.2-ata-1minSize:30Giwipe:true-device:byPath:/dev/disk/by-path/pci-0000:00:1f.2-ata-2minSize:30Giwipe:truepartitions:-name:lvm_lvp_partsize:0wipe:true-device:byPath:/dev/disk/by-path/pci-0000:00:1f.2-ata-3wipe:true-device:byPath:/dev/disk/by-path/pci-0000:00:1f.2-ata-4minSize:30Giwipe:truepartitions:-name:lvm_lvp_part_sdfwipe:truesize:0fileSystems:-fileSystem:vfatpartition:config-2-fileSystem:vfatmountPoint:/boot/efipartition:uefi-fileSystem:ext4logicalVolume:rootmountPoint:/-fileSystem:ext4logicalVolume:lvpmountPoint:/mnt/local-volumes/grubConfig:defaultGrubOptions:-GRUB_DISABLE_RECOVERY="true"-GRUB_PRELOAD_MODULES=lvm-GRUB_TIMEOUT=30kernelParameters:modules:-content:'optionskvm_intelnested=1'filename:kvm_intel.confsysctl:# For the list of options prohibited to change, refer to# https://docs.mirantis.com/mke/3.6/install/predeployment/set-up-kernel-default-protections.htmlfs.aio-max-nr:'1048576'fs.file-max:'9223372036854775807'fs.inotify.max_user_instances:'4096'kernel.core_uses_pid:'1'kernel.dmesg_restrict:'1'net.ipv4.conf.all.rp_filter:'0'net.ipv4.conf.default.rp_filter:'0'net.ipv4.conf.k8s-ext.rp_filter:'0'net.ipv4.conf.k8s-ext.rp_filter:'0'net.ipv4.conf.m-pub.rp_filter:'0'vm.max_map_count:'262144'logicalVolumes:-name:rootsize:0vg:lvm_root-name:lvpsize:0vg:lvm_lvppostDeployScript:|#!/bin/bash -ex# used for test-debug only! That would allow operator to logic via TTY.echo "root:r00tme" | sudo chpasswd# Just an example for enforcing "ssd" disks to be switched to use "deadline" i\o scheduler.echo 'ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="deadline"' > /etc/udev/ rules.d/60-ssd-scheduler.rulesecho $(date) 'post_deploy_script done' >> /root/post_deploy_donepreDeployScript:|#!/bin/bash -execho "$(date) pre_deploy_script done" >> /root/pre_deploy_donevolumeGroups:-devices:-partition:lvm_root_partname:lvm_root-devices:-partition:lvm_lvp_part-partition:lvm_lvp_part_sdfname:lvm_lvp-devices:-partition:lvm_dummy_partname:lvm_forawesomeapp
Applies since Container Cloud 2.21.0 and 2.21.1 for
MOSK as TechPreview and since 2.24.0 as GA for
management clusters. For managed clusters, it is generally available
since Container Cloud 2.25.0.
The MetalLBConfigTemplate object is available as
Technology Preview since Container Cloud 2.24.0 (Cluster release
14.0.0) and is generally available since Container Cloud 2.25.0
(Cluster releases 17.0.0 and 16.0.0).
apiVersion:kaas.mirantis.com/v1alpha1kind:KaaSCephClustermetadata:name:ceph-cluster-managed-clusternamespace:managed-nsspec:cephClusterSpec:nodes:# Add the exact ``nodes`` names.# Obtain the name from "get bmh -o wide" ``consumer`` field.cz812-managed-cluster-storage-worker-noefi-58spl:roles:-mgr-mon# All disk configuration must be reflected in ``baremetalhostprofile``storageDevices:-config:deviceClass:ssdfullPath:/dev/disk/by-id/scsi-1ATA_WDC_WDS100T2B0A-00SM50_200231434939cz813-managed-cluster-storage-worker-noefi-lr4k4:roles:-mgr-monstorageDevices:-config:deviceClass:ssdfullPath:/dev/disk/by-id/scsi-1ATA_WDC_WDS100T2B0A-00SM50_200231440912cz814-managed-cluster-storage-worker-noefi-z2m67:roles:-mgr-monstorageDevices:-config:deviceClass:ssdfullPath:/dev/disk/by-id/scsi-1ATA_WDC_WDS100T2B0A-00SM50_200231443409pools:-default:truedeviceClass:ssdname:kubernetesreplicated:size:3role:kubernetesk8sCluster:name:managed-clusternamespace:managed-ns
Note
The storageDevices[].fullPath field is available since
Container Cloud 2.25.0 (Cluster releases 17.0.0 and 16.0.0). For the
clusters running earlier product versions, define the /dev/disk/by-id
symlinks using storageDevices[].name instead.
Obtain kubeconfig of the newly created managed cluster:
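One way to do this from the management cluster is sketched below, under the assumption that the kubeconfig is stored in a <clusterName>-kubeconfig secret with an admin.conf key in the cluster project; the secret name and key may differ in your environment, and you can also download the kubeconfig from the Container Cloud web UI:
kubectl -n <managedClusterProject> get secret <clusterName>-kubeconfig \
  -o jsonpath='{.data.admin\.conf}' | base64 -d > kubeconfig-<clusterName>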
Operators of Mirantis Container Cloud for on-demand self-service
Kubernetes deployments will want their users to create networks without
extensive knowledge about network topology or IP addresses. For
that purpose, the Operator can prepare L2 network templates in advance for
users to assign these templates to machines in their clusters.
The Operator can ensure that the users’ clusters have separate
IP address spaces using the SubnetPool resource.
SubnetPool allows for automatic creation of Subnet objects
that will consume blocks from the parent SubnetPool CIDR IP address
range. The SubnetPool blockSize setting defines the IP address
block size to allocate to each child Subnet. SubnetPool has a global
scope, so any SubnetPool can be used to create the Subnet objects
for any namespace and for any cluster.
You can use the SubnetPool resource in the L2Template resources to
automatically allocate IP addresses from an appropriate IP range that
corresponds to a specific cluster, or create a Subnet resource
if it does not exist yet. This way, every cluster will use subnets
that do not overlap with other clusters.
To automate multiple subnet creation using SubnetPool:
Log in to a local machine where your management cluster kubeconfig
is located and where kubectl is installed.
Note
The management cluster kubeconfig is created
during the last stage of the management cluster bootstrap.
Create the subnetpool.yaml file with a number of subnet pools:
Note
You can define either or both subnets and subnet pools,
depending on the use case. A single L2 template can use
either or both subnets and subnet pools.
The kaas.mirantis.com/region label is removed from all
Container Cloud and MOSK objects in 24.1.
Therefore, do not add the label starting with these releases. On existing
clusters updated to these releases, or if added manually, Container Cloud
ignores this label.
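A minimal sketch of a SubnetPool definition is shown below. The API group ipam.mirantis.com/v1alpha1, the object name, and the CIDR and block size values are assumptions to adjust to your environment:
apiVersion: ipam.mirantis.com/v1alpha1
kind: SubnetPool
metadata:
  name: subnet-pool-1
  labels:
    kaas.mirantis.com/provider: baremetal
spec:
  # Parent IP address range from which child Subnet objects consume blocks
  cidr: 10.10.0.0/16
  # IP address block size allocated to each child Subnet
  blockSize: /25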
Verify that the subnet pool is successfully created:
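For example, using the pool name from the sketch above (add -n <namespace> if the object was created in a project namespace):
kubectl get subnetpool subnet-pool-1 -o yaml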
Proceed to creating an L2 template for one or multiple managed clusters
as described in Create L2 templates. In this procedure, select
the exemplary L2 template for multiple subnets.
Caution
Using the l3Layout section, define all subnets that are used
in the npTemplate section.
Defining only a subset of the subnets is not allowed.
If labelSelector is used in l3Layout, use any custom
label name that differs from system names. This allows for easier
cluster scaling in case of adding new subnets as described in
Expand IP addresses capacity in an existing cluster.
Mirantis recommends using a unique label prefix such as
user-defined/.
Log in to the host where your management cluster kubeconfig is
located and where kubectl is installed.
Create a new text file mosk-cluster-machines.yaml and create the
YAML definitions of the Machine resources. Use this as an example,
and see the descriptions of the fields below:
The kaas.mirantis.com/region label is removed from all
Container Cloud and MOSK objects in 24.1.
Therefore, do not add the label starting with these releases. On existing
clusters updated to these releases, or if added manually, Container Cloud
ignores this label.
Add the top level fields:
apiVersion
API version of the object that is cluster.k8s.io/v1alpha1.
kind
Object type that is Machine.
metadata
This section will contain the metadata of the object.
spec
This section will contain the configuration of the object.
Add mandatory fields to the metadata section of the Machine
object definition.
name
The name of the Machine object.
namespace
The name of the Project where the Machine will be created.
labels
This section contains additional metadata of the machine. Set
the following mandatory labels for the Machine object.
kaas.mirantis.com/provider
Set to "baremetal".
kaas.mirantis.com/region
Region name that matches the region name in the Cluster
object.
cluster.sigs.k8s.io/cluster-name
The name of the cluster to add the machine to.
Note
The kaas.mirantis.com/region label is removed from all
Container Cloud and MOSK objects in 24.1.
Therefore, do not add the label starting with these releases. On existing
clusters updated to these releases, or if added manually, Container Cloud
ignores this label.
Configure the mandatory parameters of the Machine object in
spec field. Add providerSpec field that contains parameters
for deployment on bare metal in a form of Kubernetes subresource.
In the providerSpec section, add the following mandatory configuration
parameters:
apiVersion
API version of the subresource that is baremetal.k8s.io/v1alpha1.
kind
Object type that is BareMetalMachineProviderSpec.
bareMetalHostProfile
Reference to a configuration profile of a bare metal host. It
helps to pick bare metal host with suitable configuration for
the machine. This section includes two parameters:
name
Name of a bare metal host profile
namespace
Project in which the bare metal host profile is created.
l2TemplateSelector
If specified, contains the name (first priority) or label
of the L2 template that will be applied during a machine creation.
Note that changing this field after Machine object is created
will not affect the host network configuration of the machine.
You should assign one of the templates you defined in
Create L2 templates to the machine. If there are no suitable
templates, create one as described in Create L2 templates.
hostSelector
This parameter defines matching criteria for picking a bare metal
host for the machine by label.
Any custom label that is assigned to one or more bare metal hosts
using API can be used as a host selector. If the bare metal host
objects with the specified label are missing, the Machine object
will not be deployed until at least one bare metal host with the
specified label is available.
l2TemplateIfMappingOverride
This parameter contains a list of names of network interfaces of
the host. It allows you to override the default naming and ordering
of network interfaces defined in the L2 template referenced by
the l2TemplateSelector. This ordering informs the L2
templates how to generate the host network configuration.
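A minimal sketch of a Machine definition that combines the fields described above; the object name, namespace, host profile reference, selector labels, and their values are placeholders to replace with the ones used in your cluster:
apiVersion: cluster.k8s.io/v1alpha1
kind: Machine
metadata:
  name: worker-0
  namespace: managed-ns
  labels:
    kaas.mirantis.com/provider: baremetal
    cluster.sigs.k8s.io/cluster-name: managed-cluster
spec:
  providerSpec:
    value:
      apiVersion: baremetal.k8s.io/v1alpha1
      kind: BareMetalMachineProviderSpec
      bareMetalHostProfile:
        name: worker-storage1
        namespace: managed-ns
      # Name (first priority) or label of the L2 template to apply
      l2TemplateSelector:
        label: l2template-compute
      # Custom label that must be present on the target bare metal host
      hostSelector:
        matchLabels:
          kaas.mirantis.com/baremetalhost-id: host-worker-0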
Depending on the role of the machine in the MOSK cluster,
add labels to the nodeLabels field.
This field contains the list of node labels to be attached to a node for the
user to run certain components on separate cluster nodes. The list of allowed
node labels is located in the Cluster object status
providerStatus.releaseRef.current.allowedNodeLabels field.
If the value field is not defined in allowedNodeLabels, a label can
have any value. For example:
Before or after a machine deployment, add the required label from the allowed
node labels list with the corresponding value to
spec.providerSpec.value.nodeLabels in machine.yaml. For example:
nodeLabels:
- key: stacklight
  value: enabled
Adding a node label that is not available in the list of allowed node
labels is restricted.
If you are NOT deploying MOSK with the compact
control plane, add 3 dedicated Kubernetes manager nodes.
Add 3 Machine objects for Kubernetes manager nodes using
the following label:
If you are deploying MOSK with compact control plane,
add Machine objects for 3 combined control plane nodes using the
following labels and parameters to the nodeLabels field:
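A hedged sketch of the two variants follows. The label names below are assumptions to verify against the product release notes for your version:
# Dedicated Kubernetes manager node (no compact control plane)
metadata:
  labels:
    cluster.sigs.k8s.io/control-plane: "true"   # assumed manager-node label

# Combined control plane node (compact control plane)
metadata:
  labels:
    cluster.sigs.k8s.io/control-plane: "true"   # assumed manager-node label
spec:
  providerSpec:
    value:
      nodeLabels:
      - key: openstack-control-plane             # assumed MOSK control plane label
        value: enabled
      - key: openstack-gateway                   # assumed
        value: enabled
      - key: openvswitch                         # assumed
        value: enabled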
After you add bare metal hosts and create a managed cluster as described in
Create a MOSK cluster, proceed with associating Kubernetes machines
of your cluster with the previously added bare metal hosts
using the Container Cloud web UI.
To add a Kubernetes machine to a MOSK cluster:
Log in to the Container Cloud web UI with the m:kaas:namespace@operator or
m:kaas:namespace@writer permissions.
Switch to the required project using the Switch Project
action icon located on top of the main left-side navigation panel.
In the Clusters tab, click the required cluster name.
The cluster page with the Machines list opens.
Click the Create Machine button.
Fill out the Create New Machine form as required:
Since MCC 2.28.0 (Cluster releases 17.3.0 and 16.3.0)
Name
New machine name. If empty, a name is automatically generated in the
<clusterName>-<machineType>-<uniqueSuffix> format.
Type
Machine type. Select Manager or Worker to
create a Kubernetes manager or worker node.
Caution
The required minimum number of machines:
3 manager nodes for HA
3 worker storage nodes for a minimal Ceph cluster
L2 Template
From the drop-down list, select the previously created L2 template,
if any. For details, see Create L2 templates.
Otherwise, leave the default selection to use the default L2 template
of the cluster.
Note
Before Container Cloud 2.26.0 (Cluster releases 17.1.0 and
16.1.0), if you leave the default selection in the drop-down list,
a preinstalled L2 template is used. Preinstalled templates are
removed in the above-mentioned releases.
Distribution
Operating system to provision the machine. From the drop-down list,
select Ubuntu 22.04 Jammy as the machine distribution.
Warning
Do not use obsolete Ubuntu 20.04 distribution on
greenfield deployments but only on existing clusters based on
Ubuntu 20.04, which reaches end-of-life in April 2025. MOSK 24.3
release series is the last one to support Ubuntu 20.04 as the
host operating system.
Update of management or MOSK clusters running
Ubuntu 20.04 to the following major product release, where Ubuntu
22.04 is the only supported version, is not possible.
Upgrade Index
Optional. A positive numeral value that defines the order of machine
upgrade during a cluster update.
The first machine to upgrade is always one of the control plane machines
with the lowest upgradeIndex. Other control plane machines are upgraded
one by one according to their upgrade indexes.
If the Cluster spec dedicatedControlPlane field is false, worker
machines are upgraded only after the upgrade of all control plane machines
finishes. Otherwise, they are upgraded after the first control plane
machine, concurrently with other control plane machines.
If several machines have the same upgrade index, they have the same priority
during upgrade.
If the value is not set, the machine is automatically assigned a value
of the upgrade index.
Host Configuration
Configuration settings of the bare metal host to be used for the
machine:
Host
From the drop-down list, select the previously created custom bare
metal host to be used for the new machine.
Host Profile
From the drop-down list, select the previously created custom bare
metal host profile, if any. For details, see
Create a custom bare metal host profile. Otherwise, leave the default
selection.
Labels
Add the required node labels for the worker machine to run certain
components on a specific node. For example, for the StackLight nodes
that run OpenSearch and require more resources than a standard node,
add the StackLight label. The list of available node
labels is obtained from allowedNodeLabels of your current
Cluster release.
If the value field is not defined in allowedNodeLabels, from
the drop-down list, select the required label and define an
appropriate custom value for this label to be set to the node. For
example, the node-type label can have the storage-ssd value
to meet the service scheduling logic on a particular machine.
Note
Due to the known issue 23002
fixed in Container Cloud 2.21.0 (Cluster releases 7.11.0 and
11.5.0), a custom value for a predefined node label cannot be set
using the Container Cloud web UI. For a workaround, refer to the
issue description.
Caution
If you deploy StackLight in the HA mode (recommended):
Add the StackLight label to minimum three worker
nodes. Otherwise, StackLight will not be deployed until
the required number of worker nodes is configured with
the StackLight label.
Removal of the StackLight label from worker nodes
along with removal of worker nodes with StackLight
label can cause the StackLight components to become
inaccessible. It is important to correctly maintain the worker
nodes where the StackLight local volumes were provisioned.
For details, see Delete a cluster machine.
If you move the StackLight label to a new worker machine
on an existing cluster, manually deschedule all StackLight components
from the old worker machine, which you remove the StackLight
label from. For details, see Deschedule StackLight Pods from a worker machine.
Note
To add node labels after deploying a worker machine,
navigate to the Machines page, click the
More action icon in the last column of the required
machine field, and select Configure machine.
Before MCC 2.28.0 (Cluster releases 17.2.0, 16.2.0, or earlier)
Count
Specify the number of machines to create. If you create a machine
pool, specify the replicas count of the pool.
Manager
Select Manager or Worker to create a Kubernetes
manager or worker node.
Caution
The required minimum number of machines:
3 manager nodes for HA
3 worker storage nodes for a minimal Ceph cluster
BareMetal Host Label
Assign the role to the new machine(s) to link the machine
to a previously created bare metal host with the corresponding label.
You can assign one role type per machine. The supported labels include:
Manager
This node hosts the manager services of a managed cluster.
For reliability reasons, Container Cloud does not permit
running end user workloads on the manager nodes or using them
as storage nodes.
Worker
The default role for any node in a managed cluster.
Only the kubelet service is running on the machines of this type.
Upgrade Index
Optional. A positive numeral value that defines the order of machine upgrade
during a cluster update.
The first machine to upgrade is always one of the control plane machines
with the lowest upgradeIndex. Other control plane machines are upgraded
one by one according to their upgrade indexes.
If the Cluster spec dedicatedControlPlane field is false, worker
machines are upgraded only after the upgrade of all control plane machines
finishes. Otherwise, they are upgraded after the first control plane
machine, concurrently with other control plane machines.
If several machines have the same upgrade index, they have the same priority
during upgrade.
If the value is not set, the machine is automatically assigned a value
of the upgrade index.
Distribution
Operating system to provision the machine. From the drop-down list,
select the required Ubuntu distribution.
L2 Template
From the drop-down list, select the previously created L2 template,
if any. For details, see Create L2 templates.
Otherwise, leave the default selection to use the default L2 template
of the cluster.
Note
Before Container Cloud 2.26.0 (Cluster releases 17.1.0 and
16.1.0), if you leave the default selection in the drop-down list,
a preinstalled L2 template is used. Preinstalled templates are
removed in the above-mentioned releases.
BM Host Profile
From the drop-down list, select the previously created custom bare metal
host profile, if any. For details, see Create a custom bare metal host profile.
Otherwise, leave the default selection.
Node Labels
Add the required node labels for the worker machine to run certain
components on a specific node. For example, for the StackLight nodes
that run OpenSearch and require more resources than a standard node,
add the StackLight label. The list of available node
labels is obtained from allowedNodeLabels of your current
Cluster release.
If the value field is not defined in allowedNodeLabels, from
the drop-down list, select the required label and define an
appropriate custom value for this label to be set to the node. For
example, the node-type label can have the storage-ssd value
to meet the service scheduling logic on a particular machine.
Note
Due to the known issue 23002
fixed in Container Cloud 2.21.0 (Cluster releases 7.11.0 and
11.5.0), a custom value for a predefined node label cannot be set
using the Container Cloud web UI. For a workaround, refer to the
issue description.
Caution
If you deploy StackLight in the HA mode (recommended):
Add the StackLight label to minimum three worker
nodes. Otherwise, StackLight will not be deployed until
the required number of worker nodes is configured with
the StackLight label.
Removal of the StackLight label from worker nodes
along with removal of worker nodes with StackLight
label can cause the StackLight components to become
inaccessible. It is important to correctly maintain the worker
nodes where the StackLight local volumes were provisioned.
For details, see Delete a cluster machine.
If you move the StackLight label to a new worker machine
on an existing cluster, manually deschedule all StackLight components
from the old worker machine, which you remove the StackLight
label from. For details, see Deschedule StackLight Pods from a worker machine.
Note
To add node labels after deploying a worker machine,
navigate to the Machines page, click the
More action icon in the last column of the required
machine field, and select Configure machine.
Click Create.
At this point, Container Cloud adds the new Machine object to the specified
cluster, and the Bare Metal Operator Controller creates the relation to the
bare metal host with the labels matching the roles.
Provisioning of the newly created machine starts when the machine object is
created and includes the following stages:
Creation of partitions on the local disks as required by the operating
system and the Container Cloud architecture.
Configuration of the network interfaces on the host as required by the
operating system and the Container Cloud architecture.
Installation and configuration of the Container Cloud LCM Agent.
Repeat the steps above for the remaining machines.
To install MOSK on bare metal with Container Cloud,
you must create L2 templates for each node type in the MOSK
cluster. Additionally, you may have to create separate templates for nodes
of the same type when they have different configuration.
To assign specific L2 templates to machines in a cluster:
Select from the following options to assign the templates to the cluster:
Since MOSK 23.3, use the
cluster.sigs.k8s.io/cluster-name label in the labels section.
Before MOSK 23.3, use the clusterRef parameter in
the spec section.
Add a unique identifier label to every L2 template.
Typically, that would be the name of the MOSK node role,
for example l2template-compute, or l2template-compute-5nics.
Assign an L2 template to a machine. Set the l2TemplateSelector
field in the machine spec to the name of the label added in the previous
step. IPAM Controller uses this field to select a specific L2 template
for the corresponding machine.
Alternatively, you may set the l2TemplateSelector field to
the name of the L2 template.
Consider the following examples of an L2 template assignment to a machine.
The kaas.mirantis.com/region label is removed from all
Container Cloud and MOSK objects in 24.1.
Therefore, do not add the label starting with these releases. On existing
clusters updated to these releases, or if added manually, Container Cloud
ignores this label.
Note
Before MOSK 23.3, an L2 template requires
clusterRef:<clusterName> in the spec section. Since MOSK 23.3,
this parameter is deprecated and automatically migrated to the
cluster.sigs.k8s.io/cluster-name:<clusterName> label.
Example of a Machine resource with the label-based L2 template selector
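A minimal sketch of such a Machine resource; the names and the l2template-compute label value are illustrative and must match the label you assigned to the L2 template:
apiVersion: cluster.k8s.io/v1alpha1
kind: Machine
metadata:
  name: compute-0
  namespace: managed-ns
  labels:
    kaas.mirantis.com/provider: baremetal
    cluster.sigs.k8s.io/cluster-name: managed-cluster
spec:
  providerSpec:
    value:
      apiVersion: baremetal.k8s.io/v1alpha1
      kind: BareMetalMachineProviderSpec
      # Label-based selector: matches the unique identifier label of the L2 template
      l2TemplateSelector:
        label: l2template-compute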
A machine in a MOSK cluster requires a dedicated bare metal
host for deployment. In the Mirantis Container Cloud management API, bare metal
hosts are represented by the BareMetalHost objects that are automatically
generated by the related BareMetalHostInventory objects.
Note
The BareMetalHostInventory resource is available since the update
of the management cluster to the Cluster release 16.4.0 (Container Cloud
2.29.0). Before this release, the BareMetalHost object is used.
Since the above-mentioned release, BareMetalHost is only used for
internal purposes of the Container Cloud private API. All configuration
changes must be applied using the BareMetalHostInventory objects.
For any existing BareMetalHost object, a BareMetalHostInventory
object is created automatically during cluster update.
Caution
While the Cluster release of the management cluster is 16.4.0,
BareMetalHostInventory operations are allowed to
m:kaas@management-admin only. This limitation is lifted once the
management cluster is updated to the Cluster release 16.4.1 or later.
All BareMetalHostInventory objects must be labeled upon creation with a
label that allows identifying the host and assigning it to a machine.
The labels may be unique or applied to a group of hosts, based on
similarities in their capacity, capabilities, and hardware configuration,
on their location, suitable role, or a combination thereof.
In some cases, you may need to deploy a machine to a specific
bare metal host. This is especially useful when some of your bare metal
hosts have different hardware configuration than the rest.
To deploy a machine to a specific bare metal host:
Log in to the host where your management cluster kubeconfig is located
and where kubectl is installed.
Identify the bare metal host that you want to associate with the specific
machine. For example, host host-1.
Since the management cluster update to 16.4.0 (MCC 2.29.0)
kubectl get baremetalhostinventory host-1 -o yaml
Before the management cluster update to 16.4.0 (MCC 2.29.0)
kubectl get baremetalhost host-1 -o yaml
Add a label that will uniquely identify this host, for example, by the
name of the host and machine that you want to deploy on it.
Since the management cluster update to 16.4.0 (MCC 2.29.0)
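For example, a sketch using kubectl label; the project namespace, the label key, and its value are illustrative and only need to uniquely link this host to the target machine:
kubectl -n managed-ns label baremetalhostinventory host-1 kaas.mirantis.com/baremetalhost-id=host-1-machine-1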
An L2 template contains the ifMapping field that allows you to
identify Ethernet interfaces for the template. The Machine object
API enables the Operator to override the mapping from the L2 template
by enforcing a specific order of names of the interfaces when applied
to the template.
The l2TemplateIfMappingOverride field in the spec of the Machine
object contains a list of interface names. The order of the interface
names in the list is important because the L2Template object will
be rendered with NICs ordered as per this list.
Note
Changes in the l2TemplateIfMappingOverride field will apply
only once when the Machine and corresponding IpamHost objects
are created. Further changes to l2TemplateIfMappingOverride
will not reset the interfaces assignment and configuration.
Caution
The l2TemplateIfMappingOverride field must contain the names
of all interfaces of the bare metal host.
The following example illustrates how to include the override field in the
Machine object. In this example, we configure the interface eno1,
which is the second on-board interface of the server, to precede the first
on-board interface eno0.
The kaas.mirantis.com/region label is removed from all
Container Cloud and MOSK objects in 24.1.
Therefore, do not add the label starting with these releases. On existing
clusters updated to these releases, or if added manually, Container Cloud
ignores this label.
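A minimal sketch of the override, assuming the field resides under the providerSpec value of the Machine object as the other provider-specific parameters do:
spec:
  providerSpec:
    value:
      l2TemplateIfMappingOverride:
      - eno1   # second on-board interface, mapped first
      - eno0   # first on-board interface, mapped second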
As a result of the configuration above, when used with the example
L2 template for bonds and bridges described in Create L2 templates,
the enp0s1 and enp0s2 interfaces will be in predictable
ordered state. This state will be used to create subinterfaces for
Kubernetes networks (k8s-pods) and for Kubernetes external
network (k8s-ext).
Also, you can use the non-case-sensitive list of NIC MAC addresses
instead of the list of NIC names. For example:
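A sketch with hypothetical MAC addresses substituted for the interface names:
spec:
  providerSpec:
    value:
      l2TemplateIfMappingOverride:
      - "0c:c4:7a:aa:bb:01"   # hypothetical MAC of the interface to map first
      - "0c:c4:7a:aa:bb:02"   # hypothetical MAC of the interface to map second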
Manually allocate IP addresses for bare metal hosts
Available since MCC 2.26.0 (16.1.0 and 17.1.0)
You can force the DHCP server to assign a particular IP address for a bare
metal host during PXE provisioning by adding the
host.dnsmasqs.metal3.io/address annotation with the desired IP address
value to the required bare metal host.
If you have a limited amount of free and unused IP addresses for a server
provisioning, you can manually create bare metal hosts one by one and
provision servers in small, manually managed batches.
For batching in small chunks, you can use the
host.dnsmasqs.metal3.io/address annotation to manually allocate IP
addresses along with the baremetalhost.metal3.io/detached annotation to
pause automatic host management by the bare metal Operator.
To pause bare metal hosts for a manual IP allocation during provisioning:
Set the baremetalhost.metal3.io/detached annotation for all
bare metal hosts to pause host management.
Note
If the host provisioning has already started or completed, addition
of this annotation deletes the information about the host from Ironic without
triggering deprovisioning. The bare metal Operator recreates the host
in Ironic once you remove the annotation. For details, see
Metal3 documentation.
Add the host.dnsmasqs.metal3.io/address annotation with corresponding
IP address values to a batch of bare metal hosts.
Remove the baremetalhost.metal3.io/detached annotation from the batch
used in the previous step.
Repeat steps 2 and 3 until all hosts are provisioned.
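The following sketch illustrates the procedure above with kubectl annotate; the host name, namespace, and IP address are illustrative:
# Pause host management (step 1)
kubectl -n managed-ns annotate baremetalhost host-1 baremetalhost.metal3.io/detached=""
# Assign a fixed IP address for PXE provisioning (step 2)
kubectl -n managed-ns annotate baremetalhost host-1 host.dnsmasqs.metal3.io/address=10.0.0.101
# Resume host management for the batch (step 3)
kubectl -n managed-ns annotate baremetalhost host-1 baremetalhost.metal3.io/detached-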
The procedure below enables you to create a Ceph cluster with minimum three
Ceph nodes that provides persistent volumes to the Kubernetes workloads
in the managed cluster.
Substitute <managedClusterProject> and <clusterName> with
the corresponding managed cluster namespace and name.
Example of system response:
status:
  providerStatus:
    ready: true
    conditions:
    - message: Helm charts are successfully installed(upgraded).
      ready: true
      type: Helm
    - message: Kubernetes objects are fully up.
      ready: true
      type: Kubernetes
    - message: All requested nodes are ready.
      ready: true
      type: Nodes
    - message: Maintenance state of the cluster is false
      ready: true
      type: Maintenance
    - message: TLS configuration settings are applied
      ready: true
      type: TLS
    - message: Kubelet is Ready on all nodes belonging to the cluster
      ready: true
      type: Kubelet
    - message: Swarm is Ready on all nodes belonging to the cluster
      ready: true
      type: Swarm
    - message: All provider instances of the cluster are Ready
      ready: true
      type: ProviderInstance
    - message: LCM agents have the latest version
      ready: true
      type: LCMAgent
    - message: StackLight is fully up.
      ready: true
      type: StackLight
    - message: OIDC configuration has been applied.
      ready: true
      type: OIDC
    - message: Load balancer 10.100.91.150 for kubernetes API has status HEALTHY
      ready: true
      type: LoadBalancer
Create a YAML file with the Ceph cluster specification:
<publicNet> is a CIDR definition or comma-separated list of
CIDR definitions (if the managed cluster uses multiple networks) of
public network for the Ceph data. The values should match the
corresponding values of the cluster Subnet object.
<clusterNet> is a CIDR definition or comma-separated list of
CIDR definitions (if the managed cluster uses multiple networks) of
replication network for the Ceph data. The values should match
the corresponding values of the cluster Subnet object.
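For example, a sketch of the corresponding fragment of the Ceph cluster specification, assuming the network section of the KaaSCephCluster spec and illustrative CIDR values:
spec:
  cephClusterSpec:
    network:
      publicNet: 10.10.0.0/24    # <publicNet>
      clusterNet: 10.11.0.0/24   # <clusterNet>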
Configure Subnet objects for the Storage access network by setting
ipam/SVC-ceph-public:"1" and ipam/SVC-ceph-cluster:"1" labels
to the corresponding Subnet objects. For more details, refer to
Create subnets for a managed cluster using CLI, Step 5.
Configure Ceph Manager and Ceph Monitor roles to select nodes that must
place Ceph Monitor and Ceph Manager daemons:
Obtain the names of machines to place Ceph Monitor and Ceph Manager
daemons at:
kubectl -n <managedClusterProject> get machine
Add the nodes section with mon and mgr roles defined:
Substitute <mgr-node-X> with the corresponding Machine object
names and <role-X> with the corresponding roles of daemon placement,
for example, mon or mgr.
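For example, a sketch of the nodes section using the placeholders above:
spec:
  cephClusterSpec:
    nodes:
      <mgr-node-1>:
        roles:
        - <role-1>
        - <role-2>
      <mgr-node-2>:
        roles:
        - <role-1>
        - <role-2>
      <mgr-node-3>:
        roles:
        - <role-1>
        - <role-2>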
Configure Ceph OSD daemons for Ceph cluster data storage:
Note
This step involves the deployment of Ceph Monitor and Ceph Manager
daemons on nodes that are different from the ones hosting Ceph cluster
OSDs. However, you can also colocate Ceph OSDs, Ceph Monitor, and
Ceph Manager daemons on the same nodes by configuring the roles and
storageDevices sections accordingly. This kind of configuration
flexibility is particularly useful in scenarios such as hyper-converged
clusters.
Warning
The minimal production cluster requires at least three nodes
for Ceph Monitor daemons and three nodes for Ceph OSDs.
Obtain the names of machines with disks intended for storing Ceph data:
kubectl -n <managedClusterProject> get machine
For each machine, use status.providerStatus.hardware.storage
to obtain information about node disks:
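For example, one way to filter the output (the jsonpath expression is just one possible option):
kubectl -n <managedClusterProject> get machine <machineName> -o jsonpath='{.status.providerStatus.hardware.storage}'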
Select by-id symlinks on the disks to be used in the Ceph cluster.
The symlinks must meet the following requirements:
A by-id symlink must contain
status.providerStatus.hardware.storage.serialNumber
A by-id symlink must not contain wwn
For the example above, to use the sdc disk to store Ceph data on it,
select the /dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_2e52abb48862dbdc
symlink. It is persistent and will not be affected by node reboot.
Specify the selected by-id symlinks in the
spec.cephClusterSpec.nodes.storageDevices.fullPath field
along with the
spec.cephClusterSpec.nodes.storageDevices.config.deviceClass
field:
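For example, a sketch of the fullPath variant; the machine name and the by-id symlink are illustrative:
spec:
  cephClusterSpec:
    nodes:
      <machineName>:
        storageDevices:
        - fullPath: /dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_2e52abb48862dbdc
          config:
            deviceClass: ssd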
Specify the selected by-id symlinks in the
spec.cephClusterSpec.nodes.storageDevices.name field
along with the
spec.cephClusterSpec.nodes.storageDevices.config.deviceClass
field:
Each Ceph pool, depending on its role, has the default targetSizeRatio
value that defines the expected consumption of the total Ceph cluster
capacity. The default ratio values for MOSK pools are
as follows:
20.0% for a Ceph pool with the role volumes
40.0% for a Ceph pool with the role vms
10.0% for a Ceph pool with the role images
10.0% for a Ceph pool with the role backup
Optional. Configure Ceph Block Pools to use RBD. For the detailed
configuration, refer to Pool parameters.
Once all pools are created, verify that an appropriate secret required for
a successful deployment of the OpenStack services that rely on Ceph is
created in the openstack-ceph-shared namespace:
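For example (the shared secret is typically named openstack-ceph-keys; verify the actual name in your environment):
kubectl -n openstack-ceph-shared get secrets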
Mirantis highly recommends adding a Ceph cluster using the CLI
instead of the web UI.
The web UI capabilities for adding a Ceph cluster are limited and lack
flexibility in defining Ceph cluster specifications.
For example, if an error occurs while adding a Ceph cluster using the
web UI, usually you can address it only through the CLI.
The web UI functionality for managing Ceph clusters will be
deprecated in one of the following releases.
Log in to the Container Cloud web UI with the m:kaas:namespace@operator or
m:kaas:namespace@writer permissions.
Switch to the required project using the Switch Project
action icon located on top of the main left-side navigation panel.
In the Clusters tab, click the required cluster name.
The Cluster page with the Machines and
Ceph clusters lists opens.
In the Ceph Clusters block, click Create Cluster.
Configure the Ceph cluster in the Create New Ceph Cluster
wizard that opens:
Cluster Network
Replication network for Ceph OSDs. Must contain the CIDR definition
and match the corresponding values of the cluster L2Template
object or the environment network values.
Public Network
Public network for Ceph data. Must contain the CIDR definition and
match the corresponding values of the cluster L2Template object
or the environment network values.
Enable OSDs LCM
Select to enable LCM for Ceph OSDs.
Machines / Machine #1-3
Select machine
Select the name of the Kubernetes machine that will host
the corresponding Ceph node in the Ceph cluster.
Manager, Monitor
Select the required Ceph services to install on the Ceph node.
Devices
Select the disk that Ceph will use.
Warning
Do not select the device for system services,
for example, sda.
Warning
A Ceph cluster does not support removable devices, that is, devices
with the hotplug functionality enabled. To use devices as Ceph OSD
data devices, make them non-removable or disable the hotplug
functionality in the BIOS settings for the disks that are configured
to be used as Ceph OSD data devices.
Enable Object Storage
Select to enable the single-instance RGW Object Storage.
To add more Ceph nodes to the new Ceph cluster, click +
next to any Ceph Machine title in the Machines tab.
Configure a Ceph node as required.
Warning
Do not add more than 3 Manager and/or Monitor
services to the Ceph cluster.
After you add and configure all nodes in your Ceph cluster, click
Create.
Each Ceph pool, depending on its role, has a default targetSizeRatio
value that defines the expected consumption of the total Ceph cluster
capacity. The default ratio values for MOSK pools are
as follows:
20.0% for a Ceph pool with role volumes
40.0% for a Ceph pool with role vms
10.0% for a Ceph pool with role images
10.0% for a Ceph pool with role backup
Once all pools are created, verify that an appropriate secret required for
a successful deployment of the OpenStack services that rely on Ceph is
created in the openstack-ceph-shared namespace:
This section instructs you on how to deploy OpenStack on top of Kubernetes
as well as how to troubleshoot the deployment and access your OpenStack
environment after deployment.
This section instructs you on how to deploy OpenStack on top of Kubernetes
using the OpenStack Controller (Rockoon) and
openstackdeployments.lcm.mirantis.com (OsDpl) CR.
To deploy an OpenStack cluster:
Verify that you have pre-configured the networking according to
Networking.
Verify that the TLS certificates that will be required for the OpenStack
cluster deployment have been pre-generated.
Note
The Transport Layer Security (TLS) protocol is mandatory on public
endpoints.
Caution
To avoid certificates renewal with subsequent OpenStack
updates during which additional services with new public endpoints may
appear, we recommend using wildcard SSL certificates for public
endpoints. For example, *.it.just.works, where it.just.works is
a cluster public domain.
The sample code block below illustrates how to generate a self-signed
certificate for the it.just.works domain. The procedure presumes
the cfssl and cfssljson tools are installed on the
machine.
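A sketch of such a procedure with cfssl and cfssljson; the file names and CSR fields are illustrative, so adjust the subject and key parameters to your security requirements:
# CA certificate signing request (illustrative subject)
cat > ca-csr.json <<EOF
{"CN": "it.just.works CA", "key": {"algo": "rsa", "size": 2048}}
EOF
cfssl gencert -initca ca-csr.json | cfssljson -bare ca

# Wildcard server certificate for *.it.just.works
cat > server-csr.json <<EOF
{"CN": "*.it.just.works", "hosts": ["*.it.just.works"], "key": {"algo": "rsa", "size": 2048}}
EOF
cfssl gencert -ca=ca.pem -ca-key=ca-key.pem server-csr.json | cfssljson -bare server
# Results: ca.pem, server.pem, server-key.pem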
Create the openstackdeployment.yaml file that will include the
OpenStack cluster deployment configuration. For the configuration details,
refer to OpenStack configuration and API Reference.
Note
The resource of kind OpenStackDeployment (OsDpl) is a custom
resource defined by a resource of kind
CustomResourceDefinition. The resource is validated with
the help of the OpenAPI v3 schema.
Configure the OsDpl resource depending on the needs of your deployment.
For the configuration details, refer to OpenStack configuration.
Example of an OpenStackDeployment CR of minimum configuration
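A minimal sketch is shown below; the object name, OpenStack version, preset, size, and domain values are assumptions to replace with the values valid for your deployment:
apiVersion: lcm.mirantis.com/v1alpha1
kind: OpenStackDeployment
metadata:
  name: osh-dev
  namespace: openstack
spec:
  openstack_version: yoga          # illustrative; use the version supported by your release
  preset: compute
  size: tiny
  internal_domain_name: cluster.local
  public_domain_name: it.just.works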
To the openstackdeployment object, add information about the TLS
certificates:
ssl:public_endpoints:ca_cert - CA certificate content (ca.pem)
ssl:public_endpoints:api_cert - server certificate content
(server.pem)
ssl:public_endpoints:api_key - server private key (server-key.pem)
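For example, a sketch of the corresponding fragment, assuming the parameters reside under spec:features:ssl; the certificate bodies are truncated placeholders:
spec:
  features:
    ssl:
      public_endpoints:
        ca_cert: |
          -----BEGIN CERTIFICATE-----
          ...
          -----END CERTIFICATE-----
        api_cert: |
          -----BEGIN CERTIFICATE-----
          ...
          -----END CERTIFICATE-----
        api_key: |
          -----BEGIN RSA PRIVATE KEY-----
          ...
          -----END RSA PRIVATE KEY-----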
Verify that the Load Balancer network does not overlap your corporate
or internal Kubernetes networks, for example, Calico IP pools. Also,
verify that the pool of Load Balancer network is big enough to provide
IP addresses for all Amphora VMs (loadbalancers).
If required, reconfigure the Octavia network settings using the
following sample structure:
If you are using the default backend to store OpenStack database backups,
which is Ceph, you may want to increase the default size of the allocated
storage since there is no automatic way to resize the backup volume once
the cloud is deployed.
This section includes configuration information for available advanced
Mirantis OpenStack for Kubernetes features that include DPDK with the Neutron
OVS backend, huge pages, CPU pinning, and other Enhanced Platform Awareness
(EPA) capabilities.
This section instructs you on how to configure LVM as backend for the VM disks
and ephemeral storage.
You can use flexible size units throughout bare metal host profiles.
For example, you can now use either sizeGiB: 0.1 or size: 100Mi
when specifying a device size.
Mirantis recommends using only one parameter name type and units throughout
the configuration files. If both sizeGiB and size are used,
sizeGiB is ignored during deployment and the suffix is adjusted
accordingly. For example, 1.5Gi will be serialized as 1536Mi. The size
without units is counted in bytes. For example, size: 120 means 120 bytes.
Warning
All data will be wiped during cluster deployment on devices
defined directly or indirectly in the fileSystems list of
BareMetalHostProfile. For example:
A raw device partition with a file system on it
A device partition in a volume group with a logical volume that has a
file system on it
An mdadm RAID device with a file system on it
An LVM RAID device with a file system on it
The wipe field is always considered true for these devices.
The false value is ignored.
Therefore, to prevent data loss, move the necessary data from these file
systems to another server beforehand, if required.
Warning
Usage of more than one nonvolatile memory express (NVMe) drive per
node may cause update issues and is therefore not supported.
To enable LVM ephemeral storage:
In BareMetalHostProfile in the spec:volumeGroups section, add
the following configuration for the OpenStack compute nodes:
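A hedged sketch of such a configuration; the partition and volume group names below are hypothetical and must match the names your Compute service configuration expects:
spec:
  ...
  devices:
  ...
  - device:
      byName: /dev/sdb             # hypothetical data disk for ephemeral storage
      wipe: true
    partitions:
    - name: lvm_nova_part          # hypothetical partition name
      size: 0
      wipe: true
  volumeGroups:
  ...
  - devices:
    - partition: lvm_nova_part
    name: nova-vol                 # hypothetical volume group name for ephemeral storage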
This section instructs you on how to configure LVM as a backend for the
OpenStack Block Storage service.
You can use flexible size units throughout bare metal host profiles.
For example, you can now use either sizeGiB: 0.1 or size: 100Mi
when specifying a device size.
Mirantis recommends using only one parameter name type and units throughout
the configuration files. If both sizeGiB and size are used,
sizeGiB is ignored during deployment and the suffix is adjusted
accordingly. For example, 1.5Gi will be serialized as 1536Mi. The size
without units is counted in bytes. For example, size: 120 means 120 bytes.
Warning
All data will be wiped during cluster deployment on devices
defined directly or indirectly in the fileSystems list of
BareMetalHostProfile. For example:
A raw device partition with a file system on it
A device partition in a volume group with a logical volume that has a
file system on it
An mdadm RAID device with a file system on it
An LVM RAID device with a file system on it
The wipe field is always considered true for these devices.
The false value is ignored.
Therefore, to prevent data loss, move the necessary data from these file
systems to another server beforehand, if required.
To enable LVM block storage:
Open BareMetalHostProfile for editing.
In the spec:volumeGroups section, specify the following data
for the OpenStack compute nodes. In the following example, we deploy
a Cinder volume with LVM on compute nodes.
However, you can use dedicated nodes for this purpose.
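A hedged sketch of such a configuration; the partition and volume group names are hypothetical and must match the names expected by your Block Storage configuration:
spec:
  ...
  devices:
  ...
  - device:
      byName: /dev/sdc              # hypothetical data disk for Cinder LVM volumes
      wipe: true
    partitions:
    - name: lvm_cinder_part         # hypothetical partition name
      size: 0
      wipe: true
  volumeGroups:
  ...
  - devices:
    - partition: lvm_cinder_part
    name: cinder-vol                # hypothetical volume group name for Cinder LVM volumes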
Since MOSK 23.1, the open-iscsi
package is not installed by default on bare metal hosts. Install
it manually during cluster deployment in BareMetalHostProfile
in the spec:postDeployScript section:
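A minimal sketch of such a script; it assumes the host has access to an APT repository providing the package:
spec:
  ...
  postDeployScript: |
    #!/bin/bash -ex
    apt-get update
    apt-get install -y open-iscsi
    echo $(date) 'post_deploy_script done' >> /root/post_deploy_done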
This section instructs you on how to enable DPDK with the Neutron OVS back
end.
Warning
Usage of third-party software, which is not part of
Mirantis-supported configurations, for example, the use of custom DPDK
modules, may block upgrade of an operating system distribution. Users are
fully responsible for ensuring the compatibility of such custom components
with the latest supported Ubuntu version.
To enable DPDK with OVS:
Verify that your deployment meets the following requirements:
The required drivers have been installed on the host operating system.
This section instructs you on how to enable SR-IOV with the Neutron OVS back
end.
To enable SR-IOV with OVS:
Verify that your deployment meets the following requirements:
NICs with the SR-IOV support are installed
SR-IOV and VT-d are enabled in BIOS
Enable IOMMU in the kernel by configuring intel_iommu=on in the GRUB
configuration file. Specify the parameter for compute nodes in
BareMetalHostProfile in the grubConfig section:
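For example, a sketch that appends the kernel parameter to the default GRUB options:
spec:
  grubConfig:
    defaultGrubOptions:
    - GRUB_DISABLE_RECOVERY="true"
    - GRUB_PRELOAD_MODULES=lvm
    - GRUB_TIMEOUT=30
    - GRUB_CMDLINE_LINUX="$GRUB_CMDLINE_LINUX intel_iommu=on"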
The BGP VPN service is an extra OpenStack Neutron plugin that enables
connection of OpenStack Virtual Private Networks with external VPN
sites through either BGP/MPLS IP VPNs or E-VPN.
To enable the BGP VPN service:
Enable BGP VPN in the OsDpl custom resource through the
node-specific overrides settings. For example:
spec:
  features:
    neutron:
      bgpvpn:
        enabled: true
        route_reflector:
          # Enable deploying the FRR route reflector
          enabled: true
          # Local AS number
          as_number: 64512
          # List of subnets allowed to connect to
          # the route reflector BGP
          neighbor_subnets:
          - 10.0.0.0/8
          - 172.16.0.0/16
  nodes:
    rockoon-openstack-compute-node::enabled:
      features:
        neutron:
          bgpvpn:
            enabled: true
When the service is enabled, a route reflector is scheduled on nodes with
the openstack-frrouting:enabled label. Mirantis recommends collocating
the route reflector nodes with the OpenStack controller nodes. By default, two
replicas are deployed.
MOSK allows configuring Internet Protocol Security
(IPSec) encryption for the east-west tenant traffic between the OpenStack
compute nodes and gateways. The feature uses the
strongSwan open source IPSec solution.
Authentication is accomplished through a pre-shared key (PSK). However, other
authentication methods are upcoming.
To encrypt the east-west tenant traffic, enable ipsec in the
spec:features:neutron settings of the OpenStackDeployment CR:
spec:
  features:
    neutron:
      ipsec:
        enabled: true
Caution
Enabling IPSec adds extra headers to the tenant traffic. The
header size varies depending on IPSec configuration.
This section instructs you on how to configure the Cinder backend
for images (Glance) through the OpenStackDeployment CR.
Note
This feature depends heavily on Cinder multi-attach, which
enables you to simultaneously attach volumes to multiple instances.
Therefore, only the block storage backends that support multi-attach
can be used.
To configure a Cinder backend for Glance, define the backend identity
in the OpenStackDeployment CR. This identity will be used as a name for
the backend section in the Glance configuration file.
When defining the backend:
Configure one of the backends as default.
Configure each backend to use a specific Cinder volume type.
Note
You can use the cinder_volume_type parameter instead of
backend_name. If so, you have to create this volume type beforehand
and take into account that the bootstrap script does not manage such
volume types.
The blockstore identity definition example:
spec:
  features:
    glance:
      backends:
        cinder:
          blockstore:
            default: true
            backend_name: <volume_type:volume_name>   # for example, backend_name: lvm:lvm_store

spec:
  features:
    glance:
      backends:
        cinder:
          blockstore:
            default: true
            cinder_volume_type: netapp
This section instructs you on how to enable Cinder volume encryption
through the OpenStackDeployment CR using Linux Unified Key Setup (LUKS)
and store the encryption keys in Barbican. For details, see
Volume encryption.
To enable Cinder volume encryption:
In the OpenStackDeployment CR, specify the LUKS volume type and
configure the required encryption parameters for the storage system to
encrypt or decrypt the volume.
To create an encrypted volume as a non-admin user and store keys in the
Barbican storage, assign the creator role to the user since the default
Barbican policy allows only the admin or creator role:
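For example, using the OpenStack CLI; the user and project names are placeholders:
openstack role add --user <user-name> --project <project-name> creator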
This section describes how to perform advanced configuration for the OpenStack
compute nodes. Such configuration can be required in some specific use cases,
such as the usage of DPDK, SR-IOV, or huge pages.
Configuration recommendations for compute node types
This section contains recommendations for configuration of an
OpenStackDeployment custom resource for the compute nodes
of the following types:
Compute nodes with the default configuration, without local NVMe storage and
SR-IOV network interface cards (NICs)
Compute nodes with the NVMe local storage
Compute nodes with the SR-IOV NICs
Compute nodes with both the NVMe local storage and SR-IOV NICs
Note
If the local NVMe storage is enabled, Mirantis recommends using it
and enabling SR-IOV if possible.
Caution
Before using the NVMe local storage and mount point, define them
in BareMetalHostProfile. For example:
apiVersion: metal3.io/v1alpha1
kind: BareMetalHostProfile
...
spec:
  devices:
  ...
  - device:
      byName: /dev/nvme0n1
      minSizeGiB: 30
      wipe: true
    partitions:
    - name: local-volumes-partition
      sizeGiB: 0
      wipe: true
  ...
  fileSystems:
  ...
  - fileSystem: ext4
    partition: local-volumes-partition
    # mountpoint for Nova images
    mountPoint: /var/lib/nova
To control the storage type (local NVMe or Ceph) for virtual
machines, place a node into the OpenStack aggregate. For details, see
OpenStack documentation: Host aggregates.
As defined in Node-specific configuration, each node with a non-default configuration
must be configured separately. Each Machine object must have a
configuration-specific label. For example, for a compute node with the local
NVMe storage:
In the examples above, compute-sriov, compute-nvme-sriov, and
compute-nvme are human-readable string identifiers. You can use any unique
string value for each type of compute node.
In the OpenStackDeployment object of each node group, define its own
section that starts with <NODE-LABEL>::<NODE-LABEL-VALUE>::
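For example, a hedged sketch for a group of nodes labeled for local NVMe storage; the label name, its value, and the nova images backend option are assumptions to verify against the OpenStack configuration reference for your release:
spec:
  nodes:
    compute-nvme::enabled:          # <NODE-LABEL>::<NODE-LABEL-VALUE>, illustrative
      features:
        nova:
          images:
            backend: local          # assumed option for local ephemeral storage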
Mirantis OpenStack for Kubernetes (MOSK) enables you to configure the
vCPU model for all instances managed by the OpenStack Compute service (Nova)
using the following osdpl definition:
spec:
  features:
    nova:
      vcpu_type: host-model
For the supported values and configuration examples, see Virtual CPU.
The instruction provided in this section applies to both
OpenStack with OVS and OpenStack with Tungsten Fabric topologies.
The huge pages OpenStack feature provides essential performance improvements
for applications that are highly memory IO-bound. Huge pages should be enabled
on a per compute node basis. By default, NUMATopologyFilter is enabled.
To activate the feature, you need to enable huge pages on the dedicated
bare metal host as described in Enable huge pages in a host profile during
the predeployment bare metal configuration.
Note
The multi-size huge pages are not fully supported by Kubernetes
versions before 1.19. Therefore, define only one size in kernel parameters.
The below procedure applies only to deployments based on
deprecated Ubuntu 20.04. For Ubuntu 22.04 that supports cgroup v2,
use the cpushield module.
For the procedure details, see Day-2 operations.
CPU isolation is a way to force the system scheduler to use only some logical
CPU cores for processes. For compute hosts, you should typically isolate system
processes and virtual guests on different cores through the cpusets
mechanism in Linux kernel.
The Linux kernel and cpuset provide a mechanism to run tasks by limiting the
resources defined by a cpuset. The tasks can be moved from one cpuset to
another to use the resources defined in other cpusets. The cset Python tool
is a command-line interface to work with cpusets.
To configure CPU isolation using cpusets:
Configure core isolation:
Note
You can also automate this step during deployment by using the
postDeploy script as described in Create MOSK host profiles.
cat<<-"EOF">/usr/bin/setup-cgroups.sh
#!/bin/bashset-x
UNSHIELDED_CPUS=${UNSHIELDED_CPUS:-"0-3"}UNSHIELDED_MEM_NUMAS=${UNSHIELDED_MEM_NUMAS:-0}SHIELD_CPUS=${SHIELD_CPUS:-"4-15"}SHIELD_MODE=${SHIELD_MODE:-"cpuset"}# One of isolcpu or cpusetDOCKER_CPUS=${DOCKER_CPUS:-$UNSHIELDED_CPUS}DOCKER_MEM_NUMAS=${DOCKER_MEM_NUMAS:-$UNSHIELDED_MEM_NUMAS}KUBERNETES_CPUS=${KUBERNETES_CPUS:-$UNSHIELDED_CPUS}KUBERNETES_MEM_NUMAS=${KUBERNETES_MEM_NUMAS:-$UNSHIELDED_MEM_NUMAS}CSET_CMD=${CSET_CMD:-"python3 /usr/bin/cset"}if[[${SHIELD_MODE}=="cpuset"]];then${CSET_CMD}set-c${UNSHIELDED_CPUS}-m${UNSHIELDED_MEM_NUMAS}-ssystem
${CSET_CMD}proc-m-froot-tsystem
${CSET_CMD}proc-k-froot-tsystem
fi${CSET_CMD}set--cpu=${DOCKER_CPUS}--mem=${DOCKER_MEM_NUMAS}--set=docker
${CSET_CMD}set--cpu=${KUBERNETES_CPUS}--mem=${KUBERNETES_MEM_NUMAS}--set=kubepods
${CSET_CMD}set--cpu=${DOCKER_CPUS}--mem=${DOCKER_MEM_NUMAS}--set=com.docker.ucp
EOF
chmod+x/usr/bin/setup-cgroups.sh
cat<<-"EOF">/etc/systemd/system/shield-cpus.service
[Unit]Description=ShieldCPUs
DefaultDependencies=no
After=systemd-udev-settle.service
Before=lvm2-activation-early.service
Wants=systemd-udev-settle.service
[Service]ExecStart=/usr/bin/setup-cgroups.sh
RemainAfterExit=trueType=oneshot
Restart=on-failure#Service should restart on failureRestartSec=5s#Restart each five seconds until success[Install]WantedBy=basic.target
EOF
systemctlenableshield-cpus
reboot
As root user, verify that isolation has been applied:
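For example, list the configured cpusets and the CPUs assigned to each of them, using the same cset invocation as in the script above:
python3 /usr/bin/cset set -l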
The majority of CPU topologies features are activated by NUMATopologyFilter
that is enabled by default. Such features do not require any further service
configuration and can be used directly on a vanilla MOSK
deployment. The list of the CPU topologies features includes, for example:
NUMA placement policies
CPU pinning policies
CPU thread pinning policies
CPU topologies
To enable libvirt CPU pinning through the node-specific overrides in the
OpenStackDeployment custom resource, use the following sample
configuration structure:
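A hedged sketch follows; the nested Helm values path (conf.nova) and the CPU ranges are assumptions, and the dedicated set matches the cores shielded in the CPU isolation example above:
spec:
  nodes:
    <NODE-LABEL>::<NODE-LABEL-VALUE>:
      services:
        compute:
          nova:
            values:
              conf:
                nova:
                  compute:
                    cpu_dedicated_set: 4-15   # cores reserved for pinned instance vCPUs
                    cpu_shared_set: 0-3       # cores shared by unpinned instances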
The Peripheral Component Interconnect (PCI) passthrough feature in OpenStack
allows full access and direct control over physical PCI devices in guests.
The mechanism applies to any kind of PCI devices including a Network
Interface Card (NIC), Graphics Processing Unit (GPU), and any other device
that can be attached to a PCI bus. The only requirement for the guest
to properly use the device is to correctly install the driver.
To enable PCI passthrough in a MOSK deployment:
For Linux X86 compute nodes, verify that the following
features are enabled on the host:
Configure the nova-api service that is scheduled
on OpenStack controllers nodes. To generate the alias for PCI
in nova.conf, add the alias configuration through the
OpenStackDeployment CR.
Note
When configuring PCI with SR-IOV on the same host, the values
specified in alias take precedence. Therefore, add the SR-IOV
devices to passthrough_whitelist explicitly.
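For example, a sketch of the alias configuration; the services override path and the vendor and product IDs are assumptions to replace with the values of your PCI device:
spec:
  services:
    compute:
      nova:
        values:
          conf:
            nova:
              pci:
                alias: '{"vendor_id": "8086", "product_id": "154d", "device_type": "type-PF", "name": "a1"}'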
Configure the nova-compute service that is scheduled
on OpenStack compute nodes. To enable Nova to pass PCI devices
to virtual machines, configure the passthrough_whitelist
section in nova.conf through the node-specific overrides in the
OpenStackDeployment CR. For example:
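A sketch of the node-specific override; the label, its value, and the device IDs are illustrative:
spec:
  nodes:
    <NODE-LABEL>::<NODE-LABEL-VALUE>:
      services:
        compute:
          nova:
            values:
              conf:
                nova:
                  pci:
                    passthrough_whitelist: '{"vendor_id": "8086", "product_id": "154d"}'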
MOSK enables you to configure initial oversubscription
through the OpenStackDeployment custom resource. For configuration details
and oversubscription considerations, refer to
Configuring initial resource oversubscription.
By default, the following values are applied:
8.0 for the number of CPUs
1.6 for the disk space
1.0 for the amount of RAM
Note
In MOSK 22.5 and earlier, the effective default
value of RAM allocation ratio is 1.1.
Changing oversubscription configuration after deployment will only affect the
newly added compute nodes and will not change oversubscription for already
existing compute nodes. You can change oversubscription for existing compute
nodes through the placement API as described in Change oversubscription settings for existing compute nodes.
Hyperconverged architecture combines OpenStack compute nodes along with Ceph
nodes. To avoid nodes overloading, which can cause Ceph performance degradation
and outage, limit the hardware resources consumption by the OpenStack compute
services.
cpu_allocation_ratio - in case of a hyperconverged architecture, the
value depends on the number of vCPU used for non-workload related operations,
total number of vCPUs of a hyperconverged node, and on workload vCPU
consumption:
In this case, if there are 40 vCPUs per hyperconverged node, 28 vCPUs are
required for non-workload related calculations, and a workload consumes 50%
of the allocated CPU time:
cpu_allocation_ratio=(40-28)/40/0.5=0.6.
reserved_host_memory_mb - a dedicated variable in the OpenStack Nova
configuration, to reserve memory for non-OpenStack related VM activities:
For example for 10 Ceph OSDs per hyperconverged node:
reserved_host_memory_mb=13GB*10=130GB=133120
ram_allocation_ratio - the allocation ratio of virtual RAM to physical
RAM. To completely exclude the possibility of memory overcommitting, set to
1.
To limit HW resources for hyperconverged OpenStack compute nodes:
In the OpenStackDeployment CR, specify the cpu_allocation_ratio,
ram_allocation_ratio, and reserved_host_memory_mb parameters as
required using the calculations described above.
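For example, a sketch using the values calculated above; the services override path is an assumption to verify against the OpenStack configuration reference for your release:
spec:
  services:
    compute:
      nova:
        values:
          conf:
            nova:
              DEFAULT:
                cpu_allocation_ratio: 0.6
                ram_allocation_ratio: 1.0
                reserved_host_memory_mb: 133120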
This section delves into virtual GPU configuration. It is specifically
tailored for NVIDIA physical GPUs, with a focus on the A100 40GB GPU and NVIDIA
AIE 4.1 drivers.
While setup procedures may vary among different cards and vendors,
MOSK can generally ensure compatibility between the
MOSK Compute service (Nova) and vGPU functionality,
as long as the drivers for the physical GPU expose a VFIO mdev-compatible
interface to the Linux host.
For configuration specifics of other physical GPUs, refer to the official
documentation provided by the vendor.
To install the acquired drivers within your cluster, add a custom
postDeployScript script to the custom BareMetalHostProfile object
used for the compute nodes with GPUs.
Virtual GPU types are similar to compute flavors as they determine the
resources allocated to each virtual GPU. This allows for efficient allocation
and optimization of GPU resources in virtualized environments.
Each physical GPU has a maximum number of virtual GPUs of a specific type that
can be created on it, with no possibility for overallocation. In the
time-sliced vGPU configuration, each particular physical GPU can only
instantiate vGPUs of the same selected type. In the Multi-Instance GPU (MIG),
a single physical GPU may be partitioned into several differently sized
virtual GPUs.
Either way, prior to accepting workloads, Mirantis recommends determining the
virtual GPU types that each of your physical GPU will provide. Altering
these settings afterward necessitates terminating every virtual machine
currently running on the physical GPU intended for reconfiguration or
repurposing for another virtual GPU type.
This section outlines the process for partitioning physical GPUs into
Multi-Instance GPUs (MIG) using the nvidia-smi tool provided
by the NVIDIA Host GPU driver.
To create seven MIG vGPUs of the smallest size, which is the maximum
possible number of instances according to the system response above:
nvidia-smi mig -cgi 19,19,19,19,19,19,19
To create three differently sized vGPUs of 4g.20gb, 2g.10gb, and
1g.5gb sizes:
nvidia-smi mig -cgi 5,14,19
Caution
Keep in mind that not all combinations of differently sized vGPU
instances are supported. Additionally, the order in which you create vGPUs
is important.
To determine the mdev class supported by a specific virtual GPU type
listed by a PCI device address, verify the output of the
mdevctl types command executed on the compute node that has a
physical GPU available on it:
The Name field from the example system output above corresponds to the
supported virtual GPU type, linking the GPU instance profile with the mdev class
supported by your physical GPU.
In the example above, the MIG1g.5gb GPU instance profile corresponds
to the GRIDA100-1-5C vGPU type as per NVIDIA documentation, and according
to the mdevctl types output, it corresponds to the nvidia-474
mdev class.
Note
Notice that Available instances is zero for vGPU types that are
not actually supported by the given card and configuration. For MIGs,
Available instances will be non-zero only for the virtual GPU types
for which the MIG virtual GPU instances have already been created. See
Partition to Multi-Instance GPUs.
The parameters you need to define for the nova-compute service on each
compute node with physical GPUs you want to expose as virtual GPUs include:
[devices]enabled_mdev_types
Required. List of the mdev classes, see the previous step for details.
[devices]cleanup_mdev_devices
Optional. By default, the Compute service does not delete created mdev
devices but reuses them instead. While this speeds up processes, it may
pose challenges when reconfiguring the enabled_mdev_types parameter.
Set cleanup_mdev_devices to True for the Compute service to
auto-delete created mdev devices upon instance deletion.
If you plan to use only time-sliced vGPUs and provide a single virtual GPU type
across the entire cloud, you only need to configure the options mentioned
above once globally for all compute nodes through the spec.services section
of the OpenStackDeployment custom resource.
With the configuration below, the Compute service will auto-detect all PCI
devices that provide this mdev type and automatically create required resource
providers in the placement service with the resource class VGPU.
Example configuration for the nvidia-474 mdev type:
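A hedged sketch assuming the standard services override structure of the OpenStackDeployment object; the nested Helm values path (conf.nova) is an assumption to verify against your product version:
spec:
  services:
    compute:
      nova:
        values:
          conf:
            nova:
              devices:
                enabled_mdev_types: nvidia-474
                cleanup_mdev_devices: true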
If you plan to provide multiple time-sliced vGPU types, simplify the
configuration by grouping the nodes based on a node label (not necessarily
aggregates). Ensure that each group exposes only one mdev type using the
Node-specific configuration settings. Additionally, use custom resource classes to
facilitate flavor creation, ensuring consistent use of the CUSTOM_ prefix
for custom mdev_class.
For example, if you want to provide the nvidia-474 and nvidia-475 mdev
types, label your nodes with the vgpu-type=nvidia-474 and
vgpu-type=nvidia-475 labels and use the following node-specific settings:
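As an illustration of what such node-specific settings should produce, the
effective nova-compute options for each node group would resemble the
following sketch; the custom resource class names repeat the examples used
throughout this section:

# Nodes labeled vgpu-type=nvidia-474
[devices]
enabled_mdev_types = nvidia-474

[mdev_nvidia-474]
mdev_class = CUSTOM_VGPU_A100_1_5C

# Nodes labeled vgpu-type=nvidia-475
[devices]
enabled_mdev_types = nvidia-475

[mdev_nvidia-475]
mdev_class = CUSTOM_VGPU_A100_2_10C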
The configuration above creates corresponding resource providers in the
placement service that provide CUSTOM_VGPU_A100_1_5C or
CUSTOM_VGPU_A100_2_10C resources.
You can use these resources during the definition of flavors for instances
with corresponding vGPU types.
In some cases, you may need to provide different vGPU types from a single
compute node, for example, if the compute node has 2 physical GPUs and
you want to create two different types of vGPU on them. For such scenarios,
you should provide explicit PCI device addresses of these physical GPUs in
the settings. This makes such configuration verbose in heterogeneous hardware
environments where physical GPUs have different PCI addresses on each node.
For example, when targeting node-specific settings by node name:
In the SR-IOV mode, the driver typically creates more virtual functions than
the maximum capacity of the physical GPU, even for the smallest virtual GPU
type. Each virtual function can hold only one single virtual GPU. This leads
to resource over-reporting to the placement service.
Therefore, to ensure efficient resource allocation and utilization within a
homogeneous hardware environment, assuming that each compute node in it has
the same PCI address for the physical GPU and the physical GPU has been
partitioned to the MIG GPU instances identically:
Identify the number of instances created of each MIG profile.
Select random but not overlapping sets of PCI addresses from the list of
virtual functions of the physical GPU. The amount of addresses in each set
must correspond to the number of instances created of each MIG profile.
Assign the mdev type to the selected devices.
For example, for the environment with the following configuration:
3 MIG instances of MIG 1g.5gb and 2 MIG instances of MIG 2g.10gb
16 virtual functions created for the physical GPU with the PCI address range
from 0000:42:00.0 to 0000:42:01.7
Pick 3 and 2 random PCI addresses from that pool and assign them to
CUSTOM_VGPU_A100_1_5C and CUSTOM_VGPU_A100_2_10C mdev classes
respectively:
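For illustration, the resulting nova-compute options could look as follows;
the selected virtual function addresses are arbitrary picks from the example
range above and must be adjusted to your environment:

[devices]
enabled_mdev_types = nvidia-474,nvidia-475

[mdev_nvidia-474]
device_addresses = 0000:42:00.0,0000:42:00.1,0000:42:00.2
mdev_class = CUSTOM_VGPU_A100_1_5C

[mdev_nvidia-475]
device_addresses = 0000:42:00.3,0000:42:00.4
mdev_class = CUSTOM_VGPU_A100_2_10C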
In a heterogeneous hardware environment, use node-specific settings to group
nodes with the same PCI addresses and intended vGPU configuration, or target
the node-specific settings explicitly at each individual node, one node at a
time if needed.
This section provides guidelines for verifying that virtual GPUs are correctly
accounted for in the OpenStack Placement service, ensuring proper scheduling
of instances that utilize virtual GPUs.
Firstly, verify that resource providers have been created with accurate
inventories. For each PCI device associated with a virtual GPU, including
virtual instances in the case of MIG/SR-IOV, there should be a nested resource
provider under the resource provider of the corresponding compute node.
The name of this nested resource provider should follow the format
<node-name>_pci_<pci-address-with-underscores>:
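For example, you can list the nested resource providers with the OpenStack
client, assuming the osc-placement plugin is installed:

openstack resource provider list | grep _pci_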
Also, examine the inventory of each resource provider. It should exclusively
consist of the VGPU resource or any custom resource name configured in
the Compute service settings. The total capacity of the resource should match
the capacity reported by the mdevctl types output, reflecting the
capabilities of the PCI device for the specified mdev class. In the case
of MIG, this total capacity will always be 1.
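For example, to display the inventory of a particular nested resource
provider, pass its UUID obtained from the previous listing; the command
requires the osc-placement plugin for the OpenStack client:

openstack resource provider inventory list <RESOURCE-PROVIDER-UUID>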
This section provides instructions for creating a flavor that requests
a specific virtual GPU resource, using the mdev classes configured in the
Compute service and registered in the placement service.
To create the flavor, use the openstack flavor create command.
Ensure that the flavor properties match the configured mdev classes in the
Compute service. For example, to request one vGPU of type nvidia-474 using
the resource class from the previous examples:
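The command below is an illustrative sketch; the flavor name, RAM, disk, and
vCPU values are placeholders, while the custom resource class repeats the
earlier examples:

openstack flavor create --ram 4096 --disk 20 --vcpus 2 \
  --property resources:CUSTOM_VGPU_A100_1_5C=1 vgpu-a100-1-5c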
Mirantis OpenStack for Kubernetes (MOSK) enables you to perform image
signature verification when booting an OpenStack instance, uploading
a Glance image with signature metadata fields set, and creating a volume
from an image.
To enable signature verification, use the following osdpl definition:
spec:
  features:
    glance:
      signature:
        enabled: true
When enabled during initial deployment, all internal images such as
Amphora, Ironic, and test (CirrOS, Fedora, Ubuntu) images, will be signed by
a self-signed certificate.
Mirantis OpenStack for Kubernetes (MOSK) allows configuring
LoadBalancer for the Designate PowerDNS backend. For example, you can expose a
TCP port for zone transferring using the following exemplary osdpl
definition:
DNS is a mandatory component for a MOSK deployment, and all
records must be created on the customer DNS server. The OpenStack services
are exposed through the Ingress NGINX controller.
Warning
This document describes how to temporarily configure DNS.
The workflow contains non-permanent changes that will be rolled back
during a managed cluster update or reconciliation loop.
Therefore, proceed at your own risk.
To configure DNS to access your OpenStack environment:
Obtain the external IP address of the Ingress service:
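For example, assuming the Ingress service is named ingress and resides in
the openstack namespace:

kubectl -n openstack get services ingress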
Deploy a standalone CoreDNS server by including the following
configuration into coredns.yaml:
apiVersion: lcm.mirantis.com/v1alpha1
kind: HelmBundle
metadata:
  name: coredns
  namespace: osh-system
spec:
  repositories:
  - name: hub_stable
    url: https://charts.helm.sh/stable
  releases:
  - name: coredns
    chart: hub_stable/coredns
    version: 1.8.1
    namespace: coredns
    values:
      image:
        repository: mirantis.azurecr.io/openstack/extra/coredns
        tag: "1.6.9"
      isClusterService: false
      servers:
      - zones:
        - zone: .
          scheme: dns://
          use_tcp: false
        port: 53
        plugins:
        - name: cache
          parameters: 30
        - name: errors
        # Serves a /health endpoint on :8080, required for livenessProbe
        - name: health
        # Serves a /ready endpoint on :8181, required for readinessProbe
        - name: ready
        # Required to query kubernetes API for data
        - name: kubernetes
          parameters: cluster.local
        - name: loadbalance
          parameters: round_robin
        # Serves a /metrics endpoint on :9153, required for serviceMonitor
        - name: prometheus
          parameters: 0.0.0.0:9153
        - name: forward
          parameters: . /etc/resolv.conf
        - name: file
          parameters: /etc/coredns/it.just.works.db it.just.works
      serviceType: LoadBalancer
      zoneFiles:
      - filename: it.just.works.db
        domain: it.just.works
        contents: |
          it.just.works.      IN  SOA  sns.dns.icann.org. noc.dns.icann.org. 2015082541 7200 3600 1209600 3600
          it.just.works.      IN  NS   b.iana-servers.net.
          it.just.works.      IN  NS   a.iana-servers.net.
          it.just.works.      IN  A    1.2.3.4
          *.it.just.works.    IN  A    1.2.3.4
Update the public IP address of the Ingress service:
sed -i 's/1.2.3.4/10.172.1.101/' coredns.yaml
kubectl apply -f coredns.yaml
Point your machine to use the correct DNS. It is 10.172.1.102
in the example system response above.
If you plan to launch Tempest tests or use the OpenStack client from
a keystone-client-XXX pod, verify that the Kubernetes built-in
DNS service is configured to resolve your public FQDN records by
adding your public domain to Corefile. For example,
to add the it.just.works domain:
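The following is a sketch of such an extra server block appended to the
Corefile of the cluster CoreDNS configuration; it assumes 10.172.1.102 is
the address of the standalone CoreDNS service deployed above:

it.just.works:53 {
    errors
    cache 30
    forward . 10.172.1.102
}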
This section explains how to access your OpenStack environment as admin user.
Before you proceed, make sure that you can access the Kubernetes API and have
privileges to read secrets from the openstack-external namespace in
Kubernetes or you are able to exec to the pods in the openstack
namespace.
Access OpenStack using the Kubernetes built-in admin CLI¶
You can use the built-in admin CLI client and execute the openstack
commands from a dedicated pod deployed in the openstack namespace:
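For example, assuming the client pod is managed by the keystone-client
deployment:

kubectl -n openstack exec -it deployment/keystone-client -- openstack server list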
This pod has python-openstackclient and all required plugins already
installed. The python-openstackclient command-line client is configured
to use the admin user credentials. You can view the detailed configuration
for the openstack command in /etc/openstack/clouds.yaml file in
the pod.
Access Horizon through your browser using its public service.
For example, https://horizon.it.just.works.
To log in, specify the user name and domain name obtained in the previous step
from the <ADMIN_USER_NAME> and <ADMIN_USER_DOMAIN> values.
If the OpenStack Identity service has been deployed with the OpenID Connect
integration:
From the Authenticate using drop-down menu, select
OpenID Connect.
Click Connect. You will be redirected to your identity
provider to proceed with the authentication.
Note
If OpenStack has been deployed with self-signed TLS certificates
for public endpoints, you may get a warning about an untrusted
certificate. To proceed, allow the connection.
Access OpenStack through CLI from your local machine¶
To be able to access your OpenStack environment through the CLI,
you need to configure the openstack client environment
using either an openstackrc environment file or clouds.yaml file.
If OpenStack was deployed with self-signed TLS certificates
for public endpoints, you may need to use the openstack
command-line client with certificate validation disabled. For example:
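A sketch using the --insecure option of the OpenStack client; the subcommand
is illustrative:

openstack --insecure endpoint list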
This section provides the general debugging instructions for your OpenStack on
Kubernetes deployment. Start your troubleshooting with the determination of
the failing component that can include the OpenStack Controller (Rockoon),
Helm, a particular pod, or service.
Since MOSK 25.1, the OpenStack Controller has been open-sourced under the
name Rockoon and is maintained as an independent open-source project
going forward.
As part of this transition, all openstack-controller pods are named
rockoon pods across the MOSK documentation and deployments. This change
does not affect functionality; it only serves as a reminder to use the new
naming for pods and other related artifacts.
The OpenStack Controller (Rockoon) is running in several containers in the
rockoon-xxxx pod in the osh-system namespace. For the full list
of containers and their roles, refer to OpenStack Controller (Rockoon).
To verify the status of the OpenStack Controller, run:
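For example, the pod names can be filtered directly; the exact pod labels
may vary between releases, so a simple name filter is used here:

kubectl -n osh-system get pods | grep rockoon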
This section includes the ways to mitigate the most common issues with the
OsDpl CR. We assume that you have already debugged the Helm releases and
OpenStack Controller to rule out possible failures with these components
as described in Debugging the Helm releases and Debugging the OpenStack Controller.
Possible root cause: MOSK uses the Kubernetes entrypoint
init container to resolve dependencies between objects. If the pod is stuck
in Init:0/X, this pod may be waiting for its dependencies.
Possible root cause: some OpenStack services depend on Ceph. These services
include OpenStack Image, OpenStack Compute, and OpenStack Block Storage.
If the Helm releases for these services are not present, the
openstack-ceph-keys secret may be missing in the openstack-ceph-shared
namespace.
To debug the issue:
Verify that the Ceph Controller has created the openstack-ceph-keys
secret in the openstack-ceph-shared namespace:
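For example:

kubectl -n openstack-ceph-shared get secret openstack-ceph-keys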
Since MOSK 25.1, the OpenStack Controller has been open-sourced under the
name Rockoon and is maintained as an independent open-source project
going forward.
As part of this transition, all openstack-controller pods are named
rockoon pods across the MOSK documentation and deployments. This change
does not affect functionality; it only serves as a reminder to use the new
naming for pods and other related artifacts.
Support dump described in this section specifically targets OpenStack
components, providing valuable insights for troubleshooting
OpenStack-related problems.
To generate a support dump for your MOSK environment,
use the osctl sos report tool present within the rockoon
image.
This section focuses only on the essential capabilities of the tool.
For all available parameters, consult osctl sos report --help.
The support dump is modular. Each module is responsible for specific
functionality. To enable or disable specific modules during support
dump creation, use the --collector option. If not specified,
all collectors are used.
elastic
Collects logs from StackLight by connecting to the OpenSearch API.
k8s
Collects data about objects from Kubernetes.
nova
Collects metadata associated with the Compute service (OpenStack Nova)
from the OpenStack nodes. This encompasses a wide range of data,
including instance details, general libvirt information, and so on.
neutron
Collects metadata associated with the Networking service (OpenStack
Neutron) from the OpenStack nodes. This encompasses a wide range of
data, including Open vSwitch statistics, list of namespaces, IP address
statistics in namespaces, Open vSwitch flows, and so on.
Given the substantial amount of information, you can manage the components
included in a support dump using the mutually exclusive --component
or --all-components options. Within the elastic collector component,
you can specify which loggers to gather logs for. For example,
the --component nova option restricts log collection to pods related
to Nova, whose names start with nova-* and libvirt-*.
Another filtering criterion involves specifying the host for which you intend
to collect support information. This can be accomplished through the use of
mutually exclusive --host or --all-hosts options. This feature is
particularly valuable for limiting the volume of data included in
the support dump.
For older MOSK versions, to start generating support
dumps, execute the osctl sos commands from a manually started
Docker container on any node of your cluster. For example, to create a generic
report for the Neutron component:
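A sketch of such an invocation; the image reference is a placeholder, and
additional options or volume mounts may be required in your environment to
reach host data and to persist the report outside the container:

docker run --rm -it <ROCKOON-IMAGE> osctl sos report --component neutron --all-hosts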
Before you proceed with the actual Tungsten Fabric (TF) deployment, verify
that your deployment meets the following prerequisites:
Your MOSK OpenStack cluster is deployed as described in
Deploy an OpenStack cluster with the Tungsten Fabric backend
enabled for Neutron using the following structure:
spec:
  features:
    neutron:
      backend: tungstenfabric
Your MOSK OpenStack cluster uses the correct value of
features:neutron:tunnel_interface in the openstackdeployment object.
The TF Operator will consume this value through the shared secret and use
it as a network interface from the underlay network to create encapsulated
tunnels with the tenant networks.
Considerations for tunnel_interface
Plan this interface as a dedicated physical interface for TF overlay
networks. TF uses features:neutron:tunnel_interface to create the
vhost0 virtual interface and transfers the IP configuration from
the tunnel_interface to the virtual one.
Do not use bridges from L2 templates as tunnel_interface.
Such usage might lead to networking performance degradation and data
plane downtime.
The Kubernetes nodes are labeled according to the TF node roles:
Deployment of Tungsten Fabric is managed by the tungstenfabric-operator
Helm resource in a respective ClusterRelease.
To deploy Tungsten Fabric:
Optional. Configure the ASN and encapsulation settings if you need custom
values for these parameters. For configuration details, see Autonomous System Number (ASN).
Configure the TFOperator custom resource according to the needs of your
deployment. For the configuration details, refer to TFOperator custom resource
and API Reference.
Trigger the Tungsten Fabric deployment:
kubectl apply -f tungstenfabric.yaml
Verify that Tungsten Fabric has been successfully deployed:
kubectl get pods -n tf
The successfully deployed TF services should appear in the Running status
in the system response.
If you have enabled StackLight, enable Tungsten Fabric monitoring by setting
tungstenFabricMonitoring.enabled to true as described in
StackLight configuration procedure.
Since MOSK 23.1,
tungstenFabricMonitoring.enabled is enabled by default during
the Tungsten Fabric deployment. Therefore, skip this step.
This section includes configuration information for available advanced
Mirantis OpenStack for Kubernetes features that include SR-IOV and DPDK
with the Neutron Tungsten Fabric backend.
Enable huge pages for OpenStack with Tungsten Fabric¶
Note
The instruction provided in this section applies to both
OpenStack with OVS and OpenStack with Tungsten Fabric topologies.
The huge pages OpenStack feature provides essential performance improvements
for applications that are highly memory IO-bound. Huge pages should be enabled
on a per compute node basis. By default, NUMATopologyFilter is enabled.
To activate the feature, you need to enable huge pages on the dedicated
bare metal host as described in Enable huge pages in a host profile during
the predeployment bare metal configuration.
Note
The multi-size huge pages are not fully supported by Kubernetes
versions before 1.19. Therefore, define only one size in kernel parameters.
Verify that DPDK NICs are not used on the host operating system.
Note
For use in the Linux user space, DPDK NICs are bound to
specific Linux drivers required by PMDs. As a result,
the bound NICs are not available for use by standard Linux network
utilities. Therefore, allocate a dedicated NIC(s) for the vRouter
deployment in DPDK mode.
This section instructs you on how to enable SR-IOV with the Neutron Tungsten
Fabric (TF) backend.
To enable SR-IOV for TF:
Verify that your deployment meets the following requirements:
NICs with the SR-IOV support are installed
SR-IOV and VT-d are enabled in BIOS
Enable IOMMU in the kernel by configuring intel_iommu=on in the GRUB
configuration file. Specify the parameter for compute nodes in
BareMetalHostProfile in the grubConfig section:
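A sketch of the relevant BareMetalHostProfile fragment; the surrounding
profile fields are omitted, and the exact grubConfig layout should be
checked against the Container Cloud documentation for your version:

spec:
  grubConfig:
    defaultGrubOptions:
      - GRUB_CMDLINE_LINUX_DEFAULT="$GRUB_CMDLINE_LINUX_DEFAULT intel_iommu=on"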
After the OpenStackDeployment CR modification,
the TF Operator generates a separate vRouter DaemonSet with specified
settings. The tf-vrouter-agent-<XXXXX> pods will be automatically
restarted on the affected nodes causing the network services
interruption on virtual machines running on these hosts.
Optional. To modify a vRouter DaemonSet according to
the SR-IOV definition in the OpenStackDeployment
CR, add vRouter custom specs to the TF Operator CR with
the node label specified in the OpenStackDeployment CR.
For example:
Tungsten Fabric MOSK deployments use six workers of the
contrail-api service by default. This section instructs you on how to
change the default configuration if needed.
To configure the number of Contrail API workers on a TF deployment:
Specify the required number of workers in the TFOperator custom
resource:
Verify that the ps output lists one API process with PID "1"
and the number of workers set in the TFOperator custom resource.
In /etc/contrail/, verify that the number of configuration
files contrail-api-X.conf matches the number of workers set in the
TFOperator custom resource.
By default, analytics services are part of basic
setups for Tungsten Fabric deployments. To obtain a more lightweight setup,
you can disable these services through the custom resource of the Tungsten
Fabric Operator.
Warning
Disabling of the Tungsten Fabric analytics services requires
restart of the data plane services for existing environments and must be
planned in advance. While calculating the maintenance window for this
operation, take into account the deletion of the analytics DaemonSets
and automatic restart of the tf-config, tf-control, and
tf-webui pods.
Clean up the Kubernetes resources. To free up the space that has been used
by Cassandra, ZooKeeper, and Kafka analytics storage, manually delete the
related PVC:
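For example, list the PVCs in the Tungsten Fabric namespace and delete the
analytics-related ones; the PVC names are environment-specific placeholders:

kubectl -n tf get pvc
kubectl -n tf delete pvc <ANALYTICS-PVC-NAME>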
Delete terminated nodes from the Tungsten Fabric configuration through
the Tungsten Fabric web UI:
Caution
With disabled Tungsten Fabric analytics, the Tungsten Fabric
web UI may not work properly.
Log in to the Tungsten Fabric web UI.
On Configure > Infrastructure > Nodes > Analytics Nodes,
delete all terminated analytics nodes.
On Configure > Infrastructure > Nodes > Database Analytics
Nodes, delete all terminated database analytics nodes.
Depending on the MOSK version, proceed accordingly:
MOSK 24.1
Disable monitoring of the Tungsten Fabric analytics services in
StackLight by setting the following parameter in StackLight values
of the Cluster object to false:
tungstenFabricMonitoring:
  analyticsEnabled: false
When done, the monitoring of the Tungsten Fabric analytics components
will become disabled and Kafka alerts along with the Kafka dashboard
will disappear from StackLight.
Since MOSK 24.2
The tungstenFabricMonitoring.analyticsEnabled setting is
automatically configured based on the state of the Tungsten Fabric
analytics services, which are enabled or disabled.
However, you can still override this setting. If set manually,
the configuration overrides the default behavior and does not reflect
the actual state of Tungsten Fabric analytics.
Now, with the Tungsten Fabric analytics services successfully disabled, you
have optimized resource utilization and system performance. While these
services are deactivated, related alerts may still be present in StackLight.
However, do not consider such alerts as indicative of the actual status
of the analytics services.
After the TF custom resource modification, the pods
related to the affected services will be restarted. This rule does not
apply to the tf-vrouter-agent-<XXXXX> pods as their update strategy
differs. Therefore, if you enable the debug logging for the services in
a tf-vrouter-agent-<XXXXX> pod, restart this pod manually after you
modify the custom resource.
Troubleshoot access to the Tungsten Fabric web UI¶
If you cannot access the Tungsten Fabric (TF) web UI service, verify that
the FQDN of the TF web UI is resolvable on your PC by running one of the
following commands:
host tf-webui.it.just.works
# or
ping tf-webui.it.just.works
# or
dig host tf-webui.it.just.works
All commands above should resolve the web UI domain name to the IP address
that should match the EXTERNAL-IPs subnet dedicated to Kubernetes.
If the TF web UI domain name has not been resolved to the IP address, your PC
is using a different DNS or the DNS does not contain the record for the TF web
UI service. To resolve the issue, define the IP address of the Ingress service
from the openstack namespace of Kubernetes in the hosts file of your
machine. To obtain the Ingress IP address:
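For example, assuming the service is named ingress:

kubectl -n openstack get svc ingress -o jsonpath='{.status.loadBalancer.ingress[0].ip}'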
In the following cases, a TCP-based service may not work on VMs:
If the setup has nested VMs.
If VMs are running in the ESXi hypervisor.
If the Network Interface Cards (NICs) do not support the IP checksum
calculation and generate an incorrect checksum. For example, the Broadcom
Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe NIC cards.
To resolve the issue, disable the transmit (TX) offloading on all OpenStack
compute nodes for the affected NIC used by the vRouter as described below.
To identify the issue:
Verify whether ping is working between VMs on different
hypervisor hosts and the TCP services are working.
Run the following command for the vRouter Agent and verify whether the
output includes the number of Checksum errors:
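A sketch of such a check; it assumes the dropstats utility is available
inside the vRouter agent container of your release:

kubectl -n tf exec -it tf-vrouter-agent-<XXXXX> -- dropstats | grep -i checksum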
Once you modify the TFOperator CR, the
tf-vrouter-agent-<XXXXX> pods will not restart automatically because
they use the OnDelete update strategy. Restart such pods manually,
considering that the vRouter pods restart causes network services
interruption for the VMs hosted on the affected nodes.
To disable TX offloading on a specific subset of nodes, use custom
vRouter settings. For details, see Custom vRouter settings.
Warning
Once you add a new CustomSpec, a new daemon set will be
generated and the tf-vrouter-agent-<XXXXX> pods will be automatically
restarted. The vRouter pods restart causes network services interruption
for VMs hosted on the affected node. Therefore, plan this procedure
accordingly.
This guide outlines the post-deployment Day-2 operations for a Mirantis
OpenStack for Kubernetes environment. It describes how to configure and manage
the MOSK components, perform different types of cloud
verification, and enable additional features depending on your cloud needs.
The guide also contains day-to-day maintenance procedures such as how to back
up and restore, update and upgrade, or troubleshoot your
MOSK cluster.
Updating a MOSK cluster ensures that the system remains
secure, efficient, and up-to-date with the latest features and performance
improvements, as well as receives fixes for the known CVEs. This section
provides comprehensive details and step-by-step procedures to guide you
through the process of updating your cluster.
This section describes the workflow you as a cloud operator need to follow
to correctly update your Mirantis OpenStack for Kubernetes (MOSK)
cluster to a major release version.
Note
This guide applies to the clusters running
MOSK 23.1 and above. If you have an
older version and are looking to update, contact Mirantis support
to get instructions valid for your cluster.
The instructions below are generic and apply to any MOSK
cluster regardless of its configuration specifics. However, every major
release may have its own update peculiarities. Therefore, to accurately plan
and successfully perform an update, in addition to this document,
read the update-related section in the
Release Notes of the target MOSK version.
Depending on the payload of a target release, the update mechanism
can perform the changes on different levels of the stack, from the
configuration of the host operating system to the code of OpenStack itself.
The update mechanism is designed to avoid the impact on the workloads and cloud
users as much as possible. The life-cycle management logic minimizes
the downtime for the cloud API by means of smart management of the
cluster components under the hood and only requests your involvement when a
human decision is required to proceed.
Though the update mechanism may change the internal components of the cluster,
it will always preserve the major versions of OpenStack, that is, the APIs that
cloud users and workloads deal with. After the cluster is successfully updated,
you can initiate a separate upgrade procedure to obtain the latest supported
OpenStack version.
Before starting an update, we recommend that you closely peruse the Release
Compatibility Matrix document and Release notes of the target release, as well
as thoroughly plan maintenance windows for each update phase depending on the
configuration of your cluster.
Current Mirantis Container Cloud software version and the need to first
update to the latest cluster release version
Update notes provided in the Release notes for the target
MOSK version
New product features that will get enabled in your cloud by default
New product features that may have already been configured in your cloud
as customizations and now need to be properly re-enabled to be eligible for
further support
Any changes in the behavior of the product features enabled in your cloud
List of the addressed and known issues in the target MOSK
version
Warning
If your cloud configuration is known to have any custom
configuration that was not explicitly approved by Mirantis,
make sure to bring this up with your dedicated Mirantis
representative before proceeding with the update. Mirantis
cannot guarantee the safe updating of a customized cloud.
Depending on the payload brought by a particular target release, a generic
cluster update includes from three to six major phases.
The first three phases are present in any update. They focus on the
containerized components of the software stack and have minimal impact on the
cloud users and workloads.
The remaining phases are only present if any changes need to be made to the
foundation layers: the underlay Kubernetes cluster and host
operating system. For the changes to take effect, you may need to reboot the
cluster nodes. This procedure imposes a severe impact on
cloud workloads and, therefore, needs to be thoroughly planned across
several sequential maintenance windows.
Important
To effectively plan a cluster update, keep in mind the
architecture of your specific cloud. Depending on the selected design, the
components of a MOSK cluster may have different
distribution across the nodes (physical servers) comprising the underlay
bare metal Kubernetes cluster. The more components are collocated on a
single node, the harder the impact on the functions of the cloud when the
changes are applied.
The tables below will help you to plan your cluster update and include the
following information for each mandatory and additional update phase:
What happens during the phase
Includes the phase milestones. The nature of changes that are going to be
applied is important to understand in order to estimate the exact impact
the update is going to have on your cluster.
Consult the Update notes section of the target MOSK
release for the detailed information about the changes it brings and the
impact these changes are going to imply when getting applied to your
cluster.
Impact
Describes possible impact on cloud users and workloads.
The provided information about the impact represents the worst-case scenario
in the cluster architectures that imply a combination of several roles on
the same physical servers, such as hyper-converged compute nodes and
clusters with a compact control plane.
The impact estimation presumes that your cluster uses one of the standard
architectures provided by the product and follows Mirantis design
guidelines.
Time to complete
Provides a rough estimation of the time required to complete the phase.
The estimates for a phase timeline presume that your cluster uses one
of the standard architectures provided by the product and follows
Mirantis design guidelines.
Warning
During the update, try to prevent users from performing write
operations on the cloud resources. Any intensive manipulations may lead to
workload corruption.
Phase 1: Life-cycle management modules update
Important
This phase is mandatory. It is always present in the update flow
regardless of the contents of the target release.
New versions of OpenStack and Tungsten Fabric container images
downloaded, services restarted sequentially.
Impact
Some of the running cloud operations may fail over
the course of the phase due to minor unavailability of the cloud
API.
Workloads may experience temporary loss of the North-South
connectivity in the clusters with Open vSwitch networking backend.
The downtime depends on the type of virtual routers in use.
Time to complete
20 minutes per network gateway node (Open vSwitch)
5 minutes for a Tungsten Fabric cluster
15 minutes per compute node
Phase 3: Ceph cluster update and upgrade
Important
This phase is mandatory. It is always present in the update flow
regardless of the contents of the target release.
New versions of Ceph components downloaded, services restarted.
If applicable, Ceph switched to the latest major version.
Impact
Workloads may experience IO performance degradation for the virtual
storage devices backed by Ceph.
Time to complete
The update of a Ceph cluster with 30 storage nodes can take up to 35
minutes. Additionally, 15 minutes are required for the major Ceph
version upgrade, if any.
Phase 4a: Host operating system update on Kubernetes master nodes
Important
This phase is optional. The presence of this phase
in the update flow depends on the contents of
the target release.
Host operating system update on Kubernetes master nodes¶
What happens during the phase
New system packages downloaded and installed on the host operating
system, other major changes get applied.
Impact
None
Time to complete
The nodes are updated sequentially. Up to 15 minutes per node.
Phase 4b: Kubernetes components update on Kubernetes master nodes
Important
This phase is optional. The presence of this phase
in the update flow depends on the contents of
the target release.
Kubernetes cluster update on Kubernetes master nodes¶
What happens during the phase
New versions of Kubernetes control plane components downloaded and
installed.
Impact
For clusters with the compact control plane, some of the running
cloud operations may fail over the course of the phase due to minor
unavailability of the cloud API.
For the compact control plane with gateway nodes collocated
(Open vSwitch networking backend), workloads can experience
temporary loss of the North-South connectivity. The downtime
depends on the type of virtual routers in use.
Time to complete
Up to 40 minutes total
Phases 5a and 5b: Host operating system and Kubernetes cluster
update on Kubernetes worker nodes
Important
Both phases, 5a and 5b, are applied together, either node
by node (default) or to several nodes in parallel. The parallel updating
is available since 23.1.
Take this into consideration when estimating the impact and planning the
maintenance window.
Loss of connectivity to the volumes for the nodes hosting LVM
with iSCSI volumes.
For dedicated control plane nodes, some of the
running cloud operations may fail over the course of the phase
due to minor unavailability of the cloud API.
For dedicated gateway nodes (Open vSwitch), workloads can
experience minor loss of the North-South connectivity.
For compute nodes, there can be up to 5 minute downtime on the
network connectivity for the workloads running on the node,
due to the restart of the containers hosting the components
of the cloud data plane.
For clusters running MOSK 24.1.2 and above, the
downtime is up to 2 minutes per node.
Time to complete
By default, the nodes are updated sequentially as follows:
For the host operating system update, up to 15 minutes per node.
For the Kubernetes cluster update, up to 40 minutes per node.
For MOSK 23.1 to 23.2 and newer releases, you can
reduce update time by enabling parallel node update. The procedure
is described further in the Enable parallel update of Kubernetes worker nodes
subsection.
Phase 6: Cluster nodes reboot
Important
This phase is optional. The presence of this phase
in the update flow depends on the contents of
the target release.
Important
An update to a newer MOSK version may require reboot
of the cluster nodes for changes to take effect. Although you can decide
when to restart each particular node, an update cannot be
considered complete until all of the nodes get restarted.
Optional. You configure an instance migration policy.
You initiate the node reboot.
The node is gracefully restarted with automatic or manual
migration of cloud workloads running on it.
Impact
For the storage nodes:
No impact on the nodes hosting the Ceph cluster data
Loss of connectivity to the volumes for the nodes hosting LVM
with iSCSI volumes
For the control plane nodes, some of the
running cloud operations may fail over the course of the phase
due to minor unavailability of the cloud API.
For the network gateway nodes (Open vSwitch), workloads can
experience minor loss of the North-South connectivity depending on
the type of virtual routers in use.
Step 1. Verify that the Container Cloud management cluster is up-to-date¶
MOSK relies on Mirantis Container Cloud to manage the
underlying software stack for a cluster, as well as to deliver updates
for all the components.
Since every MOSK release is tightly coupled with a
Container Cloud release, a MOSK cluster update becomes
possible once the management cluster is known to run the latest Container Cloud
version. The management cluster periodically verifies public Mirantis
repositories and updates itself automatically when a newer version becomes
available. Having any of the managed clusters, including
MOSK, running outdated Container Cloud version will
prevent the management cluster from automatic self-update.
To identify the current version of the Container Cloud software your management
cluster is running, refer to the Container Cloud web UI. You can also verify
your management cluster status using CLI as described in
Verify the management cluster status before MOSK update.
During an update of a MOSK cluster, numerous alerts may be
seen in StackLight. This is expected behavior. Therefore, ignore or temporarily
mute the alerts as described in Silence alerts.
Caution
During update, the false positive CalicoDataplaneFailuresHigh
alert may be firing. Disregard this alert, which will disappear once the
update succeeds.
The observed behavior is typical for calico-node during upgrades,
as workload changes occur frequently. Consequently, there is a possibility
of temporary desynchronization in the Calico dataplane. This can
occasionally result in throttling when applying workload changes to the
Calico dataplane.
If you update MOSK to 23.1, verify that the
KaaSCephCluster custom resource does not contain the following entries.
If they exist, remove them.
In the spec.cephClusterSpec section, the external section.
In the spec.cephClusterSpec.rookConfig section, the ms_crc_data or
ms_crc_header configuration key. After you remove the key, wait for
rook-ceph-mon pods to restart on the MOSK
cluster.
Enable parallel update of Kubernetes worker nodes¶
Optional. Starting from MOSK 23.1 to 23.2 update, you can
enable and configure parallel node update to reduce update time and minimize
downtime:
To enable parallel update of Kubernetes worker nodes, set the
spec.providerSpec.value.maxWorkerUpgradeCount configuration parameter
in the Mirantis Container Cloud management cluster as described in
conf-upd-count.
Consider the specifics of handling of parallel node updates by OpenStack,
Ceph, and Tungsten Fabric Controllers to properly plan the maintenance
window. For handling details and possible configuration, refer to
Parallelizing node update operations.
Optional. Starting from MOSK 24.3, you can enable automatic
node reboot of an update group, which contains a set of controller or worker
machines. This option applies when a Cluster release update requires node
reboot, for example, when kernel version update is available in the target
Cluster release. The option reduces manual intervention and overall downtime
during cluster update.
Log in to the Container Cloud web UI with the m:kaas:namespace@operator or
m:kaas:namespace@writer permissions.
Switch to the required project using the Switch Project
action icon located on top of the main left-side navigation panel.
In the Clusters tab, find the managed MOSK
cluster.
Click the More action icon to see whether a new release is
available. If that is the case, click Update cluster.
In the Release Update window, select the required Cluster
release to update your managed cluster to.
The Description section contains the list of components
versions to be installed with a new Cluster release.
Click Update.
Before the cluster update starts, Container Cloud performs
a backup of MKE and Docker Swarm. The backup directory is located
under:
/srv/backup/swarm on every Container Cloud node for Docker Swarm
/srv/backup/ucp on one of the controller nodes for MKE
To view the update status through the Container Cloud web UI, navigate to
the Clusters page. Once the orange blinking dot next to the
cluster name disappears, the cluster update is complete.
Also, you can see the general status of each node during the update on the
Container Cloud cluster view page.
The whole update process is controlled by lcm-controller, which runs in
the kaas namespace of the Container Cloud management cluster. Follow
its logs to watch the progress of the update, discover, and debug any issues.
Watch the state of the cluster and nodes update through the CLI¶
The lcmclusterstate and lcmmachines objects in the mos namespace
of the Container Cloud management cluster provide detailed information about
the current phase of the update process in the context of the managed cluster
overall as well as specific nodes.
The lcmmachine object being in the Ready state indicates that a node
has been successfully updated.
To display the detailed view of the cluster update state, run:
Step 4. Reboot the nodes with optional instance migration¶
Depending on the target release content, you may need to reboot the cluster
nodes for the changes to take effect. Running a MOSK cluster
in a semi-updated state for an extended period may result in unpredictable
behavior of the cloud and impact users and workloads. Therefore, when it is
required, you need to reboot the cluster nodes as soon as possible to avoid
potential risks.
Note
If you enabled rebootIfUpdateRequires as described in
Enable automatic node reboot in update groups, nodes will be automatically rebooted in update
groups during a Cluster release update that requires a reboot, for example,
when kernel version update is available in the target Cluster release.
For a distribution upgrade, continue reading the following subsections.
Verify the YAML definitions of the LCMMachine and Machine objects.
The node must be rebooted if the rebootRequired flag is set to true.
In addition, objects explicitly specify the reason for rebooting. For example:
The LCMMachine object of the node that requires rebooting:
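For example, you can quickly inspect the reboot-related fields of the
LCMMachine object from the management cluster; the namespace and machine
name are placeholders:

kubectl -n <PROJECT-NAMESPACE> get lcmmachine <MACHINE-NAME> -o yaml | grep -i -A 3 reboot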
Since MOSK 23.1, you can also use the Mirantis
Container Cloud web UI to identify the nodes requiring reboot:
In the Clusters tab, click the required cluster name. The
page with Machines opens.
Hover over the status of every machine. A machine to reboot contains
the Reboot > The machine requires a reboot notification in
the Status tooltip.
Configure instance migration policy for cluster nodes¶
Restarting the cluster causes downtime of the cloud services running on the
nodes. While the MOSK control plane is built for high
availability and can tolerate temporary loss of at least 1/3 of services
without a significant impact on user experience, rebooting nodes that host the
elements of cloud data plane, such as network gateway nodes and compute nodes,
has a detrimental effect on the cloud workloads, if not performed gracefully.
To configure the instance migration policy:
Edit the target compute node resource. For example:
To mitigate the potential impact on the cloud workloads, define
the migration mode and the number of attempts the OpenStack Controller
should make to migrate a single instance running on it:
instance_migration_mode
Defines the instance migration mode for the host.
The list of available options includes:
live: the OpenStack Controller live-migrates instances
automatically. The update mechanism tries to move the memory
and local storage of all instances on the node to another node
without interruption before applying any changes to the node.
By default, the update mechanism makes three attempts to migrate
each instance before falling back to the manual mode.
manual: the OpenStack Controller waits for the Operator to
migrate instances from the host. When it is time to update the host,
the update mechanism asks you to manually migrate the instances and
proceeds only once you confirm the node is safe to update.
skip: the OpenStack Controller skips the instance check on the
node and reboots it.
instance_migration_attempts
Defaults to 3.
Defines the number of times the OpenStack Controller attempts
to live-migrate a single instance before falling back to the
manual mode.
Success of live migration depends on many factors including
the selected vCPU type and model, the amount of data that needs
to be transferred, the intensity of the disk IO and memory writes,
the type of the local storage, and others. Instances using
the following product features are known to have issues with
live migration:
LVM-based ephemeral storage with and without encryption
For the clouds relying on the converged LVM with iSCSI block
storage that offer persistent volumes in a remote edge sub-region,
it is important to keep in mind that applying a major change to a
compute node may impact not only the instances running on this node
but also the instances attached to the LVM devices hosted there.
Mirantis recommends that in such environments you perform the update
procedure in the manual mode with mitigation measures taken by
the Operator for each compute node. Otherwise, all the instances that
have LVM with iSCSI volumes attached would need reboot to restore
connectivity.
Configuration example that sets the instance migration mode to live
and the number of attempts to live-migrate to 5:
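The sketch below assumes the OpenStack Controller reads these settings from
the openstack.lcm.mirantis.com annotations on the Kubernetes Node object;
the node name is a placeholder:

apiVersion: v1
kind: Node
metadata:
  name: <COMPUTE-NODE-NAME>
  annotations:
    openstack.lcm.mirantis.com/instance_migration_mode: live
    openstack.lcm.mirantis.com/instance_migration_attempts: "5"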
If needed, as a cloud user, mark the instances that require individual
handling during instance migration using the
openstack.lcm.mirantis.com:maintenance_action=<ACTION-TAG> server tag.
For details, refer to Configure per-instance migration mode.
Since MOSK 23.1, you can reboot several cluster nodes
in one go by using the Graceful reboot mechanism
provided by Mirantis Container Cloud. The mechanism restarts the selected nodes
one by one, honoring the instance migration policies.
For older versions of MOSK, you need to reboot each node
manually as follows:
When a node that has a manual instance migration policy is ready to be
restarted, the life-cycle management mechanism notifies you about that by
creating a NodeMaintenanceRequest object for the node and setting the
active status attribute for the corresponding NodeWorkloadLock object.
Note
Verify the status:errorMessage attribute before proceeding.
To view the NodeWorkloadLock objects details for a specific node, run:
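For example, list the objects first and then display the one related to the
node; the object name is a placeholder, and the commands assume the
NodeWorkloadLock CRD is available on the cluster you are connected to:

kubectl get nodeworkloadlocks
kubectl get nodeworkloadlocks <NODEWORKLOADLOCK-NAME> -o yaml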
If an update phase takes significantly longer than expected according
to the tables included in Plan the cluster update, you should consider
the update process hung.
If you observe errors that are not described explicitly in the documentation,
immediately contact Mirantis support.
To see any issues that might have occurred during the update, verify the logs
of the lcm-controller pods in the kaas namespace of the Container Cloud
management cluster.
Patch releases aim to significantly shorten the cycle of CVE fixes delivery
onto your MOSK deployments to help you avoid cyber
threats and data breaches.
Your management bare-metal cluster obtains patch releases automatically
the same way as major releases. A new patch MOSK release
version becomes available through the Container Cloud web UI after the
automatic upgrade of the management cluster.
It is not possible to update between the patch releases that belong to
different release series in one go. For example, you can update from
MOSK 23.1.1 to 23.1.2, but you cannot immediately update
from MOSK 23.1.x to 23.2.x because you need to update to
the major MOSK 23.2 release first.
Caution
If you delay the Container Cloud upgrade and schedule it at a
later time as described in Schedule Mirantis Container Cloud updates, make sure to
schedule a longer maintenance window as the upgrade queue can include
several patch releases along with the major release upgrade.
Read the Update notes part of the target MOSK
release notes to understand the changes it brings and the impact these
changes are going to have on your cloud users and workloads.
The application of the patch releases may not require the cluster nodes
reboot. However, your cluster can contain nodes that require reboot after
the last update to a major release, and this requirement will remain after
update to any of the following patch releases. Therefore, Mirantis strongly
recommends that you determine if there are such nodes in your cluster
before you update to the next patch release and reboot them if any, as
described in Step 4. Reboot the nodes with optional instance migration.
For some MOSK versions, applying a patch release may require
restart of the containers that host the elements of the cloud data plane. In
case of Open vSwitch-based clusters, this may result in up to 5 minute downtime
of workload network connectivity for each compute node.
For MOSK prior to 24.1 series, you can determine whether
applying a patch release is going to require the restart of the data plane by
consulting the Release artifacts part of the release notes of the current
and target MOSK releases.
The data plane restart will only happen if there are new versions of the
container images related to the data plane.
It is possible to avoid the downtime for the cloud data by
explicitly pinning the image versions of the following components:
Open vSwitch
Kubernetes entrypoint
However, pinning these images will result in the cloud data plane not receiving
any security or bugfixes during the update.
To pin the images:
Depending on the proxy configuration, the image base URL differs.
To obtain the list of currently used images on the cluster, run:
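For example, to extract the unique image references used by pods in the
openstack namespace and filter the relevant ones:

kubectl -n openstack get pods -o jsonpath='{.items[*].spec.containers[*].image}' | \
  tr ' ' '\n' | sort -u | grep -E 'openvswitch|kubernetes-entrypoint'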
Add the openvswitch and kubernetes-entrypoint images used on your
cluster:
Since MOSK 25.1
Create a ConfigMap in the openstack namespace with the following
content, replacing <OPENSTACKDEPLOYMENT-NAME> with the name of your
OpenStackDeployment custom resource:
Since Container Cloud 2.26.1 (patch Cluster releases 17.1.1 and
16.1.1), the update of Ubuntu packages with kernel minor version update
may apply in certain releases.
In this case, cordon-drain and reboot of machines do not apply
automatically, and all machines have the Reboot is required
notification after the cluster update. You can manually handle the reboot
of machines during a convenient maintenance window as described in
Perform a graceful reboot of a cluster.
Compare the output obtained in the previous step with the output from the
first step. The Cluster releases must match. If this is not the case,
contact Mirantis support for further details.
You can define the upgrade sequence for existing machines to allow prioritized
machines to be upgraded first during a cluster update.
Consider the following upgrade index specifics:
The first machine to upgrade is always one of the control plane machines
with the lowest upgradeIndex. Other control plane machines are upgraded
one by one according to their upgrade indexes.
If the Cluster spec dedicatedControlPlane field is false, worker
machines are upgraded only after the upgrade of all control plane machines
finishes. Otherwise, they are upgraded after the first control plane
machine, concurrently with other control plane machines.
If several machines have the same upgrade index, they have the same priority
during upgrade.
If the value is not set, the machine is automatically assigned a value
of the upgrade index.
To define the upgrade order of an existing machine:
Log in to the Container Cloud web UI with the m:kaas:namespace@operator or
m:kaas:namespace@writer permissions.
Switch to the required project using the Switch Project
action icon located on top of the main left-side navigation panel.
In the Clusters tab, click the required cluster name.
The cluster page with the Machines list opens.
In one of the Unassigned machines settings menu, select
Change upgrade index.
In the Configure Upgrade Priority window that opens, use the
Up and Down arrows in the Upgrade Index
field to configure the upgrade sequence of a machine.
Click Update to apply changes.
Using the Pool info or Machine info options in the
machine settings menu, verify that the Upgrade Priority Index
contains the updated value.
By default, worker machines are upgraded sequentially, which includes node
draining, software upgrade, services restart, and so on. However,
MOSK enables you to parallelize node upgrade operations,
significantly improving the efficiency of your deployment, especially on
large clusters.
Configure the parallel update of worker nodes using web UI¶
Available since MCC 2.25.0 (17.0.0 and 16.0.0)
Log in to the Container Cloud web UI with the m:kaas:namespace@operator or
m:kaas:namespace@writer permissions.
Switch to the required project using the Switch Project
action icon located on top of the main left-side navigation panel.
In the Clusters tab, click the required cluster name.
The cluster page with the Machines list opens.
On the Clusters page, click the More
action icon in the last column of the required cluster and
select Configure cluster.
In General Settings of the Configure cluster window,
define the following parameters:
Parallel Upgrade Of Worker Machines
The maximum number of the worker nodes to update simultaneously. It serves as
an upper limit on the number of machines that are drained at a given moment
of time. Defaults to 1.
You can configure this option after deployment before the cluster update.
Parallel Preparation For Upgrade Of Worker Machines
The maximum number of worker nodes being prepared at a given moment of time,
which includes downloading of new artifacts. It serves as a limit for the
network load that can occur when downloading the files to the nodes.
Defaults to 50.
Configure the parallel update of worker nodes using CLI¶
spec.providerSpec.maxWorkerUpgradeCount
Default: 1
The maximum number of the worker nodes to update simultaneously.
It serves as an upper limit on the number of machines that are
drained at a given moment of time.
Caution
Since Container Cloud 2.27.0 (Cluster releases 17.2.0 and
16.2.0), maxWorkerUpgradeCount is deprecated and will be removed
in one of the following releases. Use the concurrentUpdates
parameter in the UpdateGroup object instead. For details, see
Create update groups for worker machines.
spec.providerSpec.maxWorkerPrepareCount
Default: 50
The maximum number of workers being prepared at a given moment of time,
which includes downloading of new artifacts.
It serves as a limit for the network load that can occur when downloading
the files to the nodes.
The use of update groups provides enhanced control over update of worker
machines by allowing granular concurrency settings for specific machine groups.
This feature uses the UpdateGroup object to decouple the concurrency
settings from the global cluster level, providing flexibility based on the
workload characteristics of different machine sets.
The UpdateGroup objects are processed sequentially based on their indexes.
Update groups with the same indexes are processed concurrently. The control
update group is always processed first.
Note
The update order of a machine within the same group is determined by
the upgrade index of a specific machine. For details, see
Change the upgrade order of a machine.
The maxWorkerUpgradeCount parameter of the Cluster object is inherited
by the default update group. Changing maxWorkerUpgradeCount leads to
changing the concurrentUpdates parameter of the default update group.
Note
The maxWorkerUpgradeCount parameter of the Cluster object is
deprecated and will be removed in one of the following Container Cloud
releases. You can still use this parameter to change the
concurrentUpdates value of the default update group. However, Mirantis
recommends changing this value directly in the UpdateGroup object.
Available since MCC 2.28.0 (17.3.0 and 16.3.0) TechPreview
The update group for controller nodes is automatically generated during initial
cluster creation with the following settings:
name: <cluster-name>-control
index: 1
concurrentUpdates: 1
rebootIfUpdateRequires: false
Caution
During a distribution upgrade, machines are always rebooted,
overriding rebootIfUpdateRequires: false.
All control plane machines are automatically assigned to the update group for
controller nodes with no possibility to change it.
Note
On existing clusters created before Container Cloud 2.28.0 (Cluster
releases 17.2.0, 16.2.0, or earlier), the update group for controller nodes
is created after Container Cloud upgrade to 2.28.0 (Cluster release 16.3.0)
on the management cluster.
Caution
The index and concurrentUpdates parameters of the update
group for controller nodes are hardcoded and cannot be changed.
The default update group is automatically created during initial cluster
creation with the following settings:
name: <cluster-name>-default
index: 1
rebootIfUpdateRequires: false
concurrentUpdates: inherited from the maxWorkerUpgradeCount parameter
set in the Cluster object
Caution
During a distribution upgrade, machines are always rebooted,
overriding rebootIfUpdateRequires: false.
Note
On existing clusters created before Container Cloud 2.27.0 (Cluster
releases 17.1.0, 16.1.0, or earlier), the default update group is created
after Container Cloud upgrade to 2.27.0 (Cluster release 16.2.0) on the
management cluster.
To change the update group of a machine, update the
kaas.mirantis.com/update-group label of the machine with the new update
group name. Removing this label from a machine automatically assigns such
machine to the default update group.
Note
After creation of a custom UpdateGroup object, if you plan to
add a new machine that requires a non-default update group, manually add
the corresponding label to the machine as described above. Otherwise, the
default update group is applied to such machine.
Note
Before removing the UpdateGroup object, reassign all machines to
another update group.
Granularly update a managed cluster using the ClusterUpdatePlan object¶
Available since MCC 2.27.0 (17.2.0) TechPreview
You can control the process of a managed cluster update by manually launching
update stages using the ClusterUpdatePlan custom resource. Between the
update stages, a cluster remains functional from the perspective of cloud
users and workloads.
A ClusterUpdatePlan object provides the following functionality:
The object is automatically created by the bare metal provider when a new
Cluster release becomes available for your cluster.
The object is created in the management cluster for the same namespace that
the corresponding managed cluster refers to.
The object contains a list of self-descriptive update steps that
are cluster-specific. These steps are defined in the spec section of the
object with information about their impact on the cluster.
The object starts cluster update when the operator manually changes the
commence field of the first update step to true. All steps have the
commence flag initially set to false so that the operator can decide
when to pause or resume the update process.
The object has the following naming convention:
<managedClusterName>-<targetClusterReleaseVersion>.
Since Container Cloud 2.28.0 (Cluster release 17.3.0), the object contains
several StackLight alerts to notify the operator about the update progress
and potential update issues. For details, see
StackLight alerts: Container Cloud.
Optional. Available since Container Cloud 2.29.0 (Cluster release 17.4.0)
as Technology Preview. Enable update auto-pause to be triggered by specific
StackLight alerts. For details, see Configure update auto-pause.
Open the ClusterUpdatePlan object for editing.
Start cluster update by changing the spec:steps:commence field of the
first update step to true.
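As a sketch, assuming the object name follows the <managedClusterName>-<targetClusterReleaseVersion> convention described above and that the custom resource is addressable by its lowercase kind (verify the resource name in your cluster), the first step can be commenced as follows:
kubectl -n <cluster-namespace> edit clusterupdateplan <managedClusterName>-<targetClusterReleaseVersion>
# In the editor, set spec.steps[0].commence to true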
Once done, the following actions are applied to the cluster:
The Cluster release in the corresponding Cluster spec is changed
to the target Cluster version defined in the ClusterUpdatePlan spec.
The cluster update starts and pauses before the next update step with
commence: false set in the ClusterUpdatePlan spec.
Caution
Cancelling an already started update step is not supported.
The following example illustrates the ClusterUpdatePlan object of a
MOSK cluster update that has completed:
Example of a completed ClusterUpdatePlan object
apiVersion: kaas.mirantis.com/v1alpha1
kind: ClusterUpdatePlan
metadata:
  creationTimestamp: "2025-02-06T16:53:51Z"
  generation: 11
  name: mosk-17.4.0
  namespace: child
  resourceVersion: "6072567"
  uid: 82c072be-1dc5-43dd-b8cf-bc643206d563
spec:
  cluster: mosk
  releaseNotes: https://docs.mirantis.com/mosk/latest/25.1-series.html
  source: mosk-17-3-0-24-3
  steps:
  - commence: true
    description:
    - install new version of OpenStack and Tungsten Fabric life cycle management modules
    - OpenStack and Tungsten Fabric container images pre-cached
    - OpenStack and Tungsten Fabric control plane components restarted in parallel
    duration:
      estimated: 1h30m0s
      info:
      - 15 minutes to cache the images and update the life cycle management modules
      - 1h to restart the components
    granularity: cluster
    id: openstack
    impact:
      info:
      - some of the running cloud operations may fail due to restart of API services and schedulers
      - DNS might be affected
      users: minor
      workloads: minor
    name: Update OpenStack and Tungsten Fabric
  - commence: true
    description:
    - Ceph version update
    - restart Ceph monitor, manager, object gateway (radosgw), and metadata services
    - restart OSD services node-by-node, or rack-by-rack depending on the cluster configuration
    duration:
      estimated: 8m30s
      info:
      - 15 minutes for the Ceph version update
      - around 40 minutes to update Ceph cluster of 30 nodes
    granularity: cluster
    id: ceph
    impact:
      info:
      - 'minor unavailability of object storage APIs: S3/Swift'
      - workloads may experience IO performance degradation for the virtual storage devices backed by Ceph
      users: minor
      workloads: minor
    name: Update Ceph
  - commence: true
    description:
    - new host OS kernel and packages get installed
    - host OS configuration re-applied
    - container runtime version gets bumped
    - new versions of Kubernetes components installed
    duration:
      estimated: 1h40m0s
      info:
      - about 20 minutes to update host OS per a Kubernetes controller, nodes updated one-by-one
      - Kubernetes components update takes about 40 minutes, all nodes in parallel
    granularity: cluster
    id: k8s-controllers
    impact:
      users: none
      workloads: none
    name: Update host OS and Kubernetes components on master nodes
  - commence: true
    description:
    - new host OS kernel and packages get installed
    - host OS configuration re-applied
    - container runtime version gets bumped
    - new versions of Kubernetes components installed
    - data plane components (Open vSwitch and Neutron L3 agents, TF agents and vrouter) restarted on gateway and compute nodes
    - storage nodes put to "no-out" mode to prevent rebalancing
    - by default, nodes are updated one-by-one, a node group can be configured to update several nodes in parallel
    duration:
      estimated: 8h0m0s
      info:
      - host OS update - up to 15 minutes per node (not including host OS configuration modules)
      - Kubernetes components update - up to 15 minutes per node
      - OpenStack controllers and gateways updated one-by-one
      - nodes hosting Ceph OSD, monitor, manager, metadata, object gateway (radosgw) services updated one-by-one
    granularity: machine
    id: k8s-workers-vdrok-child-default
    impact:
      info:
      - 'OpenStack controller nodes: some running OpenStack operations might not complete due to restart of components'
      - 'OpenStack compute nodes: minor loss of the East-West connectivity with the Open vSwitch networking backend that causes approximately 5 min of downtime'
      - 'OpenStack gateway nodes: minor loss of the North-South connectivity with the Open vSwitch networking backend: a non-distributed HA virtual router needs up to 1 minute to failover; a non-distributed and non-HA virtual router failover time depends on many factors and may take up to 10 minutes'
      users: major
      workloads: major
    name: Update host OS and Kubernetes components on worker nodes, group vdrok-child-default
  - commence: true
    description:
    - restart of StackLight, MetalLB services
    - restart of auxiliary controllers and charts
    duration:
      estimated: 1h30m0s
    granularity: cluster
    id: mcc-components
    impact:
      info:
      - minor cloud API downtime due restart of MetalLB components
      users: minor
      workloads: none
    name: Auxiliary components update
  target: mosk-17-4-0-25-1
status:
  completedAt: "2025-02-07T19:24:51Z"
  startedAt: "2025-02-07T17:07:02Z"
  status: Completed
  steps:
  - duration: 26m36.355605528s
    id: openstack
    message: Ready
    name: Update OpenStack and Tungsten Fabric
    startedAt: "2025-02-07T17:07:02Z"
    status: Completed
  - duration: 6m1.124356485s
    id: ceph
    message: Ready
    name: Update Ceph
    startedAt: "2025-02-07T17:33:38Z"
    status: Completed
  - duration: 24m3.151554465s
    id: k8s-controllers
    message: Ready
    name: Update host OS and Kubernetes components on master nodes
    startedAt: "2025-02-07T17:39:39Z"
    status: Completed
  - duration: 1h19m9.359184228s
    id: k8s-workers-vdrok-child-default
    message: Ready
    name: Update host OS and Kubernetes components on worker nodes, group vdrok-child-default
    startedAt: "2025-02-07T18:03:42Z"
    status: Completed
  - duration: 2m0.772243006s
    id: mcc-components
    message: Ready
    name: Auxiliary components update
    startedAt: "2025-02-07T19:22:51Z"
    status: Completed
Monitor the message and status fields of the first step.
The message field contains information about the progress of the current
step. The status field can have the following values:
NotStarted
Scheduled (since MCC 2.28.0 (17.3.0))
InProgress
AutoPaused (TechPreview since MCC 2.29.0 (17.4.0))
Stuck
Completed
The Scheduled status indicates that a step is already triggered but
its execution has not started yet.
The AutoPaused status indicates that the update process is paused by a
firing StackLight alert defined in the UpdateAutoPause object. For
details, see Configure update auto-pause.
The Stuck status indicates an issue and that the step cannot fit
into the ETA defined in the duration field for this step. The ETA for
each step is defined statically and does not change depending on the
cluster.
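For example, the step names and statuses can be listed directly from the management cluster; the namespace and plan name placeholders are illustrative:
kubectl -n <cluster-namespace> get clusterupdateplan <plan-name> -o jsonpath='{range .status.steps[*]}{.name}{": "}{.status}{"\n"}{end}'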
Caution
The status is not populated for the ClusterUpdatePlan
objects that have not been started by adding the commence:true flag
to the first object step. Therefore, always start updating the object
from the first step.
Optional. Available since Container Cloud 2.28.0 (Cluster releases 17.3.0
and 16.3.0). Add or remove update groups of worker nodes on the fly, unless
the update of the group being removed has already been scheduled, or the
newly added group has an index lower than or equal to that of a group that
is already scheduled. These changes are reflected in ClusterUpdatePlan.
You can also reassign a machine to a different update group while the
cluster is being updated, but only if the new update group has an index
higher than the index of the last scheduled worker update group.
Disabled machines are considered as updated immediately.
Note
Depending on the number of update groups for worker nodes present
in the cluster, the number of steps in spec differs. Each update
group for worker nodes that has at least one machine will be represented
by a step with the ID k8s-workers-<UpdateGroupName>.
Proceed with changing the commence flag of the following update steps
granularly depending on the cluster update requirements.
Caution
Launch the update steps sequentially. A consecutive step is
not started until the previous step is completed.
Optional. Available since Container Cloud 2.29.0 (Cluster release 17.4.0)
as Technology Preview. Enable update auto-pause to be triggered by specific
StackLight alerts. For details, see Configure update auto-pause.
Log in to the Container Cloud web UI with the m:kaas:namespace@operator or
m:kaas:namespace@writer permissions.
Switch to the required project using the Switch Project
action icon located on top of the main left-side navigation panel.
On the Clusters page, in the Updates column of the
required cluster, click the Available link. The
Updates tab opens.
Note
If the Updates column is absent, it indicates that
the cluster is up-to-date.
Note
For your convenience, the Cluster updates menu is
also available in the right-side kebab menu of the cluster on the
Clusters page.
On the Updates page, click the required version in the
Target column to open update details, including the list of
update steps, current and target cluster versions, and estimated update
time.
In the Target version section of the Cluster update
window, click Release notes and carefully read updates about
target release, including the Update notes section that contains
important pre-update and post-update steps.
Expand each step to verify information about update impact and other useful
details.
Select one of the following options:
Enable Auto-commence all at the top-right of the first update
step section and click Start Update to launch update and start
each step automatically.
Click Start Update to only launch the first update step.
Note
This option allows you to auto-commence consecutive steps while
the current step is in progress. Enable the Auto-commence
toggle for required steps and click Save to launch the
selected steps automatically. You will only be prompted to confirm the
consecutive step; all remaining steps will be launched without manual
confirmation.
Before launching the update, you will be prompted to manually type in the
target Cluster release name and confirm that you have read release notes
about target release.
Caution
Cancelling an already started update step is not supported.
Monitor the status of each step by hovering over the In Progress
icon at the top-right of the step window. While the step is in progress,
its current status is updated every minute.
Once the required step is completed, the Waiting for input
status at the top of the update window is displayed requiring you to confirm
the next step.
The update history is retained in the Updates tab with the
completion timestamp. The update plans that were not started and can no longer
be used are cleaned up automatically.
Using the UpdateAutoPause object, the operator can define specific
StackLight alerts that trigger auto-pause of an update phase execution in
a MOSK cluster. The feature enhances update management of
MOSK clusters by preventing harmful changes from being propagated
across the entire cloud.
Note
The feature is not available for management clusters.
When an update auto-pause is configured on a cluster, the following workflow
applies:
During cluster updates, the system continuously monitors for the alerts
defined in the UpdateAutoPause object
If any configured alert fires:
The update process automatically pauses
The commence field is removed from all steps that have not started
The commence field is removed from the steps related to Update host OS and
Kubernetes components on worker nodes even if the step is in
progress, and the step is paused
The ClusterUpdatePlan status changes to AutoPaused
The firing alerts are recorded in the UpdateAutoPause status
A condition is added to the Cluster object indicating the pause state
Verify that StackLight is enabled on the MOSK cluster.
Create an UpdateAutoPause object with the name that matches your cluster
name within the cluster namespace. For example:
apiVersion: kaas.mirantis.com/v1alpha1
kind: UpdateAutoPause
metadata:
  name: managed-cluster-example  # Must match cluster name
  namespace: managed-cluster-ns  # Must match cluster namespace
spec:
  alerts:
  - AlertName1
  - AlertName2
The list of alerts can include standard and
custom StackLight alerts previously configured for
the cluster.
Calculate a maintenance window duration for update (Deprecated)¶
Deprecation notice
The maintenance window duration calculator is deprecated. Starting from
MOSK 25.1, cloud operators should use the
ClusterUpdatePlan API instead. For details, refer to
ClusterUpdatePlan resource.
This section provides an online calculator for quick calculation
of the approximate time required to update your MOSK
cluster that uses Open vSwitch as a networking backend.
Additionally, for a more accurate calculation, consider any cluster-specific
factors that can have a large impact on the update time in some edge cases,
such as number of routers, frequency of CPU, and so on.
Number of virtual machines per compute node:
Number of OpenStack compute nodes:
Number of OpenStack gateway nodes:
Number of Kubernetes control plane nodes:
Number of Kubernetes worker nodes except Kubernetes control plane nodes under the Kubernetes worker role:
This section contains instructions on how to get access to different systems
of a MOSK cluster.
To obtain endpoints of the MKE web UI and StackLight web UIs such as
Prometheus, Alertmanager, Alerta, OpenSearch Dashboards, and Grafana, in the
Clusters tab of the Container Cloud web UI, navigate to
More > Cluster info.
Note
The Alertmanager web UI displays alerts received by all
configured receivers, which can be mistaken for duplicates. To only display
the alerts received by a particular receiver, use the Receivers
filter.
Generate a kubeconfig for a MOSK cluster using API¶
This section describes how to generate a MOSK cluster
kubeconfig using the Container Cloud API. You can also download a
MOSK cluster kubeconfig using the
Download Kubeconfig option in the Container Cloud web UI. For
details, see Connect to a MOSK cluster.
The kubeconfig of your <username> that you can download through
the Container Cloud web UI using Download Kubeconfig located
under your <username> on the top-left of the page.
Obtain the <cluster> object of the <cluster_name>
MOSK cluster:
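For example, from the management cluster (the project namespace placeholder is illustrative):
kubectl -n <project-name> get cluster <cluster_name> -o yaml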
Generate the MOSK cluster kubeconfig using the data
from <cluster.status> and <token> obtained in the previous steps.
Use the following template as an example:
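The following is a generic kubeconfig sketch only; the certificate, API endpoint, and user name placeholders must be filled in from the <cluster.status> data and the <token> value, and the exact status field paths must be taken from your cluster object:
apiVersion: v1
kind: Config
clusters:
- name: <cluster_name>
  cluster:
    certificate-authority-data: <CA-certificate-from-cluster.status>
    server: https://<API-endpoint-from-cluster.status>
contexts:
- name: <cluster_name>-<username>
  context:
    cluster: <cluster_name>
    user: <username>
current-context: <cluster_name>-<username>
users:
- name: <username>
  user:
    token: <token>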
The Container Cloud web UI communicates with Keycloak to authenticate
users. Keycloak is exposed using HTTPS with self-signed TLS certificates
that are not trusted by web browsers.
After you deploy a MOSK management or managed cluster,
connect to the cluster to verify the availability and status of the nodes as
described below.
To connect to a MOSK cluster:
Log in to the Container Cloud web UI with the m:kaas:namespace@operator or
m:kaas:namespace@writer permissions.
Switch to the required project using the Switch Project
action icon located on top of the main left-side navigation panel.
In the Clusters tab, click the required cluster name.
The cluster page with the Machines list opens.
Verify the status of the manager nodes.
Once the first manager node is deployed and
has the Ready status, the Download Kubeconfig
option for the cluster being deployed becomes active.
Open the Clusters tab.
Click the More action icon in the last column of the required
cluster and select Download Kubeconfig:
Enter your user password.
Not recommended. Select Offline Token to generate an offline
IAM token. Otherwise, for security reasons, the kubeconfig token
expires every 30 minutes of the Container Cloud API idle time
and you have to download kubeconfig again with a newly generated
token.
Click Download.
Verify the availability of the managed cluster machines:
Export the kubeconfig parameters to your local machine with
access to kubectl. For example:
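A minimal sketch, assuming the downloaded file is saved as kubeconfig-<cluster_name>.yml:
export KUBECONFIG=~/Downloads/kubeconfig-<cluster_name>.yml
kubectl get nodes -o wide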
Using the Keycloak Admin Console, you can create or delete a user as well as
grant or revoke roles to or from a user. The Keycloak administrator is
responsible for assigning roles to users depending on the level of access they
need in a cluster.
The system response contains the URL to access the Keycloak Admin Console.
The user name is keycloak by default. The password is located in
passwords.yaml generated during bootstrap.
You can also obtain the password from the iam-api-secrets secret
in the kaas namespace of the management cluster and
decode the content of the keycloak_password key:
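A minimal sketch of the decoding command, run against the management cluster:
kubectl -n kaas get secret iam-api-secrets -o jsonpath='{.data.keycloak_password}' | base64 -d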
The Tungsten Fabric (TF) web UI allows for easy and fast TF resources
configuration, monitoring, and debugging. You can access
the TF web UI through either the Ingress service or the Kubernetes Service
directly. TLS termination for the https protocol is performed through the
Ingress service.
Note
Mirantis OpenStack for Kubernetes provides the TF web UI as is and
does not include this service in the support Service Level Agreement.
To access the TF web UI through Ingress:
Log in to a local machine where kubectl is installed.
For quick and easy inspection and monitoring, you can add a
MOSK cluster to Lens using the Container Cloud web UI.
The following options are available in the More action icon
menu of each cluster:
Add cluster to Lens
Open cluster in Lens
Before you can start monitoring your clusters in Lens, install the Container
Cloud Lens extension as described below.
Verify that your OpenStack cloud is running on the latest
MOSK release. See Release Compatibility Matrix for the
release matrix and supported upgrade paths.
Just before the upgrade, back up your OpenStack databases.
See the following documentation for details:
Verify that OpenStack is healthy and operational. All OpenStack components
in the health group in the OpenStackDeploymentStatus CR should be
in the Ready state. See OpenStackDeploymentStatus custom resource for details.
Verify the workability of your OpenStack deployment by running Tempest
against the OpenStack cluster as described in Run Tempest tests.
Verification of the testing pass rate before upgrading will help you
measure your cloud quality before and after upgrade.
Read carefully through the Release Notes of your
MOSK version paying attention to the Known issues
section and the
OpenStack upstream release notes
for the target OpenStack version.
When upgrading to OpenStack Yoga, remove the Panko service from the cloud
by removing the event entry from the spec:features:services
structure in the OpenStackDeployment resource as described in
Remove an OpenStack service.
Note
The OpenStack Panko service has been removed from the product and
is no longer maintained in the upstream OpenStack. See the project
repository page for details.
To start the OpenStack upgrade, change the value of the
spec:openstack_version parameter in the OpenStackDeployment object
to the target OpenStack release.
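For example, a minimal sketch of the relevant fragment of the OpenStackDeployment object, where the target release name is a placeholder:
spec:
  openstack_version: <target-openstack-release>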
After you change the value of the spec:openstack_version parameter,
the OpenStack Controller initializes the upgrade process.
To verify the upgrade status, use:
Logs from the osdpl container in the OpenStack Controller (Rockoon)
pod.
The OpenStackDeploymentStatus object.
When upgrade starts, the OPENSTACKVERSION field content changes
to the target OpenStack version, and STATE displays APPLYING:
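A sketch of the check, assuming the OpenStackDeploymentStatus resource can be queried by the osdplst short name (verify the resource name in your cluster):
kubectl -n openstack get osdplst
# The OPENSTACK VERSION column shows the target release and STATE shows APPLYING
# while the upgrade is in progress.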
Verify that OpenStack is healthy and operational. All OpenStack components
in the health group in the OpenStackDeploymentStatus CR should be
in the Ready state. See OpenStackDeploymentStatus custom resource for details.
Verify the workability of your OpenStack deployment by running Tempest
against the OpenStack cluster as described in Run Tempest tests.
Before upgrading, verify that you have completed
the Prerequisites and removed the domains from federation
mappings as described below.
Warning
If your MOSK cluster is running version
24.3 and includes the Instance High Availability service (OpenStack
Masakari), the OpenStack upgrade will fail due to an incorrect migration
of the Masakari database from legacy SQLAlchemy Migrate to Alembic caused
by a misconfigured alembic_table. To avoid this issue, follow the
workaround steps outlined in [47603] Masakari fails during the OpenStack upgrade to Caracal before proceeding
with the upgrade.
MOSK enables you to upgrade directly from Antelope to
Caracal without the need to upgrade to the intermediate Bobcat release.
To upgrade the cloud, complete the upgrade steps
instruction changing the value of the spec:openstack_version parameter
in the OpenStackDeployment object from antelope to caracal.
Perform the domains removal from the federation
mappings if your MOSK cluster configuration
includes federated identity management system, such as IAM or
any other supported identity provider.
Before Caracal, Keystone does not properly handle domain specifications for
users in mappings. Even though domains are specified for users, Keystone always
creates users in the domain associated with the identity provider the user
logs in from.
Starting with Caracal, Keystone honors the domains specified for users
in mappings. Many example mappings, including the previous default mapping
in MOSK, use domain specifications. After upgrading
to Caracal, the new users logging in through federation may be assigned to
a different Keystone domain, while existing users will retain their
current domain. This behavior may negatively impact monitoring,
compliance, and overall cluster operations.
To maintain the same functionality after the upgrade, remove the domain
element from both the local.user element and local element, which
sets default domain values for user and group elements, from the previous
default mappings.
You can use the openstack mapping commands to manage mappings:
To list available mappings: openstack mapping list
To display the mapping rules: openstack mapping show <name>
To modify the mapping rules:
openstack mapping set <name> --rules <rules>
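For illustration only, a hedged sketch of the kind of edit involved; this is not the MOSK default mapping, and the group ID and remote attribute are placeholders:
# Before: the user is pinned to a domain in the mapping
{"local": [{"user": {"name": "{0}", "domain": {"name": "Default"}}, "group": {"id": "<group-id>"}}], "remote": [{"type": "<remote-attribute>"}]}
# After: the domain element is removed so the identity provider domain keeps being used
{"local": [{"user": {"name": "{0}"}, "group": {"id": "<group-id>"}}], "remote": [{"type": "<remote-attribute>"}]}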
MOSK enables you to upgrade directly from Yoga to Antelope
without the need to upgrade to the intermediate Zed release.
Before upgrading, verify that you have completed the
Prerequisites.
Important
There are several known issues affecting MOSK clusters running
OpenStack Antelope that can disrupt the network connectivity of the cloud
workloads.
If your cluster is still running OpenStack Yoga, update to the MOSK 24.2.1
patch release first and only then upgrade to OpenStack Antelope. If you
have not been applying patch releases previously and would prefer to switch
back to major releases-only mode, you will be able to do this when MOSK 24.3
is released.
If you have updated your cluster to OpenStack Antelope, apply the
workarounds described in Release notes: OpenStack known issues for the following issues:
[45879] [Antelope] Incorrect packet handling between instance and
its gateway
[44813] Traffic disruption observed on trunk ports
To upgrade the cloud, complete the upgrade steps
instruction changing the value of the spec:openstack_version parameter
in the OpenStackDeployment object from yoga to antelope.
If your cluster is running on top of the MOSK 23.1.2
patch version, the OpenStack upgrade to Yoga may fail due to the delay
in the Cinder start. For the workaround, see 23.1.2 known issues:
OpenStack upgrade failure.
Before upgrading, verify that you have completed the
Prerequisites.
If your cloud runs on top of the OpenStack Victoria release, you must first
upgrade to the technical OpenStack releases Wallaby and Xena before upgrading
to Yoga.
Caution
The Wallaby and Xena releases are not recommended for a long-run
production usage. These versions are transitional, so-called technical
releases with limited testing scopes. For the OpenStack versions support
cycle, refer to OpenStack support cycle.
To upgrade the cloud, complete the upgrade steps
for each release version in line in the following strict order:
Mirantis OpenStack for Kubernetes (MOSK) relies on the MariaDB Galera
cluster to provide its OpenStack components with reliable storage of persistent
data. Mirantis recommends backing up your OpenStack databases daily to ensure
the safety of your cloud data. Also, you should always create an instant
backup before updating your cloud or performing any kind of potentially
disruptive experiment.
MOSK has a built-in automated backup routine that can be
triggered manually or by schedule. Periodic backups are suspended by default
but you can easily enable them through the OpenStackDeployment custom
resource. For the details about enablement and configuration of the periodic
backups, refer to Periodic OpenStack database backups in the Reference Architecture.
This section includes more intricate procedures that involve additional steps
beyond editing the OpenStackDeployment custom resource, such as restoring
the OpenStack database from a backup or configuring a remote storage for
backups.
By default, MOSK stores the OpenStack database backups
locally in the Mirantis Ceph cluster, which is a part of the same cloud.
Alternatively, MOSK enables you to save the backup data
to an external storage. This section contains the details on how you, as a
cloud operator, can configure a remote storage backend for OpenStack
database backups.
In general, the built-in automated backup mechanism saves the data to the
mariadb-phy-backup-data PersistentVolumeClaim (PVC), which is provisioned
from StorageClass specified in the spec.persistent_volume_storage_class
parameter of the OpenstackDeployment custom resource (CR).
Configure a remote NFS storage for OpenStack backups¶
If your MOSK cluster was originally deployed with the default
backup storage, proceed with this step. Otherwise, skip it.
Copy the already existing backup data to a storage different from the
mariadb-phy-backup-data PVC.
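As a rough sketch of the configuration intent, the remote NFS backend is defined in the OpenStackDeployment object; the field names below are given from memory and must be verified against the Periodic OpenStack database backups reference before use:
spec:
  features:
    database:
      backup:
        backend: pv_nfs
        pv_nfs:
          server: <NFS-server-IP-or-FQDN>
          path: <path-to-exported-share>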
During the OpenStack database restoration, the MariaDB cluster is
unavailable due to the MariaDB StatefulSet being scaled down to 0 replicas.
Therefore, to safely restore the state of the OpenStack database,
plan the maintenance window thoroughly and in accordance with the
database size.
The duration of the maintenance window may depend on the following:
Network throughput
Performance of the storage where backups are kept, which is Mirantis Ceph
by default
Local disks performance of the nodes where MariaDB data resides
If you want to restore the full backup, the name from the
example above is 2020-09-09_11-35-48. To restore a specific
incremental backup, the name from the example above is
2020-09-09_11-35-48/2020-09-12_01-01-54.
In the example above, the backups will be restored in the following strict
order:
2020-09-09_11-35-48 - full backup,
path /var/backup/base/2020-09-09_11-35-48
Pass the following parameters to the mariadb_resque.py script from
the OsDpl object:
--backup-name (String)
Name of a folder with backup in <BASE_BACKUP> or
<BASE_BACKUP>/<INCREMENTAL_BACKUP>.
--replica-restore-timeout (Integer, default: 3600)
Timeout in seconds for 1 replica data to be restored to the
mysql data directory. Also includes time for spawning a
rescue runner pod in Kubernetes and extracting data from a
backup archive.
kubectl -n openstack get jobs mariadb-phy-restore -o jsonpath='{.status}'
The mariadb-phy-restore job is an immutable object. Therefore,
remove the job after each successful execution. To correctly remove the job,
clean up all the settings from the OpenStackDeployment object
that you have configured during step 7 of this procedure.
This will remove all related pods as well.
Important
If mariadb-phy-restore fails, the MariaDB Pods do not start automatically.
For example, the failure may occur due to discrepancy between the current
and backup versions of MariaDB, broken backup archive, and so on.
Assess the mariadb-phy-restore job log to identify the issue:
The base directory contains full backups. Each directory in the incr
folder contains incremental backups related to a certain full backup in
the base folder. All incremental backups always have the base backup
name as parent folder.
When adding the bare metal host YAML file, specify the following OpenStack
control plane node labels for the OpenStack control plane services
such as database, messaging, API, schedulers, conductors, L3 and L2 agents:
openstack-control-plane=enabled
openstack-gateway=enabled
openvswitch=enabled
Create a Kubernetes machine in your cluster as described in
Add a machine.
When adding the machine, verify that OpenStack control plane node has
the following labels:
openstack-control-plane=enabled
openstack-gateway=enabled
openvswitch=enabled
Note
Depending on the applications that were colocated on the failed
controller node, you may need to specify some additional labels,
for example, ceph_role_mgr=true and ceph_role_mon=true. To
successfully replace a failed mon and mgr node, refer to
Ceph operations.
Verify that the node is in the Ready state through the Kubernetes API:
kubectl get node <NODE-NAME> -o wide | grep Ready
Verify that the node has all required labels described in the previous
steps:
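For example:
kubectl get node <NODE-NAME> --show-labels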
This section describes how to replace a failed control plane node in your
MOSK deployment. The procedure applies to the control plane
nodes that are, for example, permanently failed due to a hardware failure and
appear in the NotReady state:
For MOSK 23.3 series or earlier,
reschedule stateful applications pods to healthy controller nodes as
described in Reschedule stateful applications. For the newer
versions, MOSK performs the rescheduling of stateful
applications automatically.
If the failed controller node had the StackLight label,
fix the StackLight volume node affinity conflict as described in
Delete a cluster machine.
Remove the OpenStack port related to the Octavia health manager pod of the
failed node:
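A generic sketch of the cleanup, assuming the port can be identified by the failed node name in its name or description (run from the keystone-client pod; the exact port naming depends on your deployment):
openstack port list --long | grep <failed-node-name>
openstack port delete <port-id>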
OpenStack control plane
Hosts the OpenStack control plane services such as database,
messaging, API, schedulers, conductors, L3 and L2 agents.
Labels: openstack-control-plane=enabled, openstack-gateway=enabled,
openvswitch=enabled
Node count: 3
OpenStack compute
Hosts the OpenStack compute services such as libvirt and L2 agents.
Labels: openstack-compute-node=enabled, openvswitch=enabled (for a
deployment with Open vSwitch as a backend for networking)
Node count: varies
If required, configure the compute host to enable DPDK, huge pages, SR-IOV,
and other advanced features in your MOSK deployment.
See Advanced OpenStack configuration (optional) for details.
Once the node is available in Kubernetes and when the nova-compute and
neutron pods are running on the node, verify that the compute
service and Neutron Agents are healthy in OpenStack API.
Verify that the compute service is mapped to cell.
The OpenStack Controller triggers the nova-cell-setup job once it
detects a new compute pod in the Ready state. This job sets mapping
for new compute services to cells.
In the nova-api-osapi pod, run:
nova-manage cell_v2 list_hosts | grep <cmp_host_name>
Change oversubscription settings for existing compute nodes¶
Available since MOSK 23.1
MOSK enables you to control the oversubscription of compute
node resources through the placement service API.
To manage the oversubscription through the placement API:
Obtain the host name of the hypervisor in question:
Update the allocation ratio for the required resource class in the resource
provider and inspect the system response to verify that the change has been
applied:
To ensure accurate resource updates, it is crucial to specify
the --amend argument when making requests. Failure to do so will
require the inclusion of values for all fields associated with the
resource provider.
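A hedged sketch of the workflow using the osc-placement plugin; the resource class and allocation ratio values are illustrative:
# Find the resource provider that corresponds to the hypervisor
openstack hypervisor list
openstack resource provider list --name <hypervisor-hostname>
# Set a new CPU allocation ratio, preserving the other inventory fields
openstack resource provider inventory set <resource-provider-uuid> \
  --resource VCPU:allocation_ratio=8.0 --amend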
Since MOSK 23.2, the OpenStack-related metadata is
automatically removed during the graceful machine deletion through
the Mirantis Container Cloud web UI. For the procedure, refer
to Delete a cluster machine.
During the graceful machine deletion, the OpenStack Controller (Rockoon)
performs the following operations:
Disables the OpenStack Compute and Block Storage services on the node
to prevent further scheduling of workloads to it.
Verifies if any resources are present on the node, for example, instances
and volumes. By default, the OpenStack Controller blocks the removal process
until the resources are removed by the user. To adjust this behavior
to the needs of your cluster, refer to OpenStack Controller configuration.
Removes OpenStack services metadata including compute services, Neutron
agents, and volume services.
Caution
You cannot collocate the OpenStack compute node with other cluster
components, such as Ceph. If done so, refer to the removal steps
of the collocated components when planning the maintenance window.
If your cluster runs MOSK 23.1 or an older version, perform
the following steps before you remove the node from the cluster through
the web UI to correctly remove the OpenStack-related metadata from it:
Disable the compute service to prevent spawning of new instances.
In the keystone-client pod, run:
openstack compute service set --disable <cmp_host_name> nova-compute --disable-reason "Compute is going to be removed."
The procedure applies to the MOSK clusters running
MOSK 23.3 series or earlier versions. Starting from
24.1, MOSK performs the rescheduling of stateful
applications automatically.
The rescheduling of stateful applications may be required when replacing
a permanently failed node, decommissioning a node, migrating applications
to nodes with a more suitable set of hardware, and in several other use cases.
MOSK deployment profiles include the following stateful
applications:
OpenStack database (MariaDB)
OpenStack coordination (etcd)
OpenStack Time Series Database backend (Redis)
Each stateful application from the list above has a persistent volume claim
(PVC) based on a local persistent volume per pod. Each of control plane nodes
has a set of local volumes available. To migrate an application pod to another
node, recreate a PVC with the persistent volume from the target node.
Caution
A stateful application pod can only be migrated to a node
that does not contain other pods of this application.
Caution
When a PVC is removed, all data present in the related persistent
volume is removed from the node as well.
Perform the pods rescheduling if you have to move a PVC to
another node and the current node is still present in the cluster. If the
current node has been removed already, MOSK reschedules pods automatically
when a node with required labels is present in the cluster.
When the rescheduling is finalized, the <STATEFULSET-NAME>-<NUMBER>
pod rejoins the Galera cluster with a clean MySQL data directory
and requests the Galera state transfer from the available nodes.
Perform the pods rescheduling if you have to move a PVC to
another node and the current node is still present in the cluster. If the
current node has been removed already, MOSK reschedules pods automatically
when a node with required labels is present in the cluster.
During the reschedule procedure of the etcd LCM, a short
cluster downtime is expected.
Before MOSK 23.1:
Identify the etcd replica ID that is a numeric suffix in a pod name.
For example, the ID of the etcd-etcd-0 pod is 0.
This ID is required during the reschedule procedure.
<STORAGE-SIZE>, <STORAGE-CLASS>, and
<NAMESPACE> should correspond to the storage,
storageClassName, and namespace values from the
<OLD-PVC>.yaml file with the old PVC configuration.
The OpenStack Integration Test Suite (Tempest) is a set of integration tests
to be run against a live OpenStack environment. This section instructs you
on how to verify the workability of your OpenStack deployment using Tempest.
To verify an OpenStack deployment using Tempest:
Configure the Tempest run parameters using the features:services:tempest
structure in the OpenStackDeployment custom resource.
Note
To perform the smoke testing of your deployment, no additional
configuration is required.
Configuration examples:
To perform the full Tempest testing:
spec:
  services:
    tempest:
      tempest:
        values:
          conf:
            script: |
              tempest run --config-file /etc/tempest/tempest.conf --concurrency 4 --blacklist-file /etc/tempest/test-blacklist --regex test
Run Tempest. The OpenStack Tempest is deployed like other OpenStack services
in a dedicated openstack-tempest Helm release by adding tempest to
spec:features:services in the OpenStackDeployment custom resource:
spec:
  features:
    services:
    - tempest
Wait until Tempest is ready. The Tempest tests are launched by the
openstack-tempest-run-tests job. To keep track of the tests execution,
run:
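For example, a minimal way to follow the test run, using the job name mentioned above:
kubectl -n openstack get job openstack-tempest-run-tests
kubectl -n openstack logs -f job/openstack-tempest-run-tests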
This section instructs you on how to remove an OpenStack cluster, deployed on
top of Kubernetes, by deleting the openstackdeployments.lcm.mirantis.com
(OsDpl) CR.
To remove an OpenStack cluster:
Verify that the OsDpl object is present:
kubectl get osdpl -n openstack
Delete the OsDpl object:
kubectl delete osdpl osh-dev -n openstack
The deletion may take a certain amount of time.
Verify that all pods and jobs have been deleted and no objects are present
in the command output:
kubectl get pods,jobs -n openstack
Delete Persistent Volume Claims (PVCs) using the following snippet. Deletion
of PVCs causes data deletion on Persistent Volumes. The volumes themselves
will become available for further operations.
Caution
Before deleting PVCs, save valuable data in a safe place.
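A minimal sketch of such a snippet:
for pvc in $(kubectl -n openstack get pvc -o name); do
  kubectl -n openstack delete "${pvc}"
done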
Since MOSK 25.1, the OpenStack Controller has been open-sourced under the
name Rockoon and is maintained as an independent open-source project
going forward.
As part of this transition, all openstack-controller pods are named
rockoon pods across the MOSK documentation and deployments. This change
does not affect functionality; it is a reminder for users to apply the new
naming to pods and other related artifacts.
This section instructs you on how to remove an OpenStack service deployed on
top of Kubernetes. A service is typically removed by deleting a corresponding
entry in the spec.features.services section of the
openstackdeployments.lcm.mirantis.com (OsDpl) CR.
Caution
You cannot remove the default services built into the preset
section.
Clean up OpenStack database leftovers after the service removal¶
Caution
The procedure below will permanently destroy the data of
the removed service.
Log in to the mariadb-server pod shell:
kubectl -n openstack exec -it mariadb-server-0 -- bash
Remove the service database user and its permissions:
Note
Use the user name for the service database obtained during
the Remove a service procedure to substitute <SERVICE-DB-USERNAME>:
mysql -u root -p${MYSQL_DBADMIN_PASSWORD} -e "REVOKE ALL PRIVILEGES, GRANT OPTION FROM '<SERVICE-DB-USERNAME>'@'%';"
mysql -u root -p${MYSQL_DBADMIN_PASSWORD} -e "DROP USER '<SERVICE-DB-USERNAME>'@'%';"
Enable uploading of an image through Horizon with untrusted SSL certificates¶
By default, the OpenStack Dashboard (Horizon) is configured to load images
directly into Glance. However, if a MOSK cluster
is deployed using untrusted certificates for public API endpoints and
Horizon, uploading of images to Glance through the Horizon web UI may fail.
When accessing the Horizon web UI of such MOSK deployment
for the first time, a warning informs you that the site is insecure and you
must force trust the certificate of this site. However, when trying to upload
an image directly from a web browser, the certificate of the Glance API is
still not considered by the web browser as a trusted one since host:port
of the site is different. In this case, you must explicitly trust the
certificate of the Glance API.
To enable uploading of an image through Horizon with untrusted SSL
certificates:
Navigate to the Horizon web UI.
Configure your web browser to trust the Horizon certificate if you have not
done so yet:
In Google Chrome or Chromium, click
Advanced > Proceed to <URL> (unsafe).
In Mozilla Firefox, navigate to Advanced > Add Exception,
enter the URL in the Location field, and click
Confirm Security Exception.
Note
For other web browsers, the steps may vary slightly.
Navigate to Project > API Access.
Copy the Service Endpoint URL of the Image service.
Open this URL in a new window or tab of the same web browser.
Configure your web browser to trust the certificate of this site as
described in the step 2.
As a result, the version discovery document should appear with contents
that vary depending on the OpenStack version. For example, for OpenStack
Victoria:
Since MOSK 25.1, the OpenStack Controller has been open-sourced under the
name Rockoon and is maintained as an independent open-source project
going forward.
As part of this transition, all openstack-controller pods are named
rockoon pods across the MOSK documentation and deployments. This change
does not affect functionality; it is a reminder for users to apply the new
naming to pods and other related artifacts.
The credential rotation procedure is designed to minimize the impact on service
availability and workload downtime. It depends on the credential type and
is based on the following principles:
Credentials for OpenStack admin database and messaging are immediately
changed during one rotation cycle, without a transition period.
Credentials for OpenStack admin identity are rotated with a transition
period of one extra rotation cycle. This means that the credentials
become invalid after two rotations. MOSK exposes
the latest valid credentials to the openstack-external namespace.
For details, refer to Access OpenStack through CLI from your local machine.
Credentials for OpenStack service users, including those for messaging,
identity, and database, undergo a transition period of one extra rotation
cycle during rotation.
Note
If immediate inactivation of credentials is required, initiate
the rotation procedure twice.
Verify that there are no other LCM operations running on the OpenStack
cluster.
Thoroughly plan the maintenance window taking into account the following
considerations:
All OpenStack control plane services, components of the Networking service
(OpenStack Neutron) responsible for the data plane and messaging services
are restarted during service credentials rotation.
OpenStack database and OpenStack messaging services are restarted during
administrator credentials rotation, as well as some of the OpenStack
control plane services, including the Instance High Availability service
(OpenStack Masakari), Dashboard (OpenStack Horizon), and Identity service
(OpenStack Keystone).
Where the <credentials-type> value is either admin or service.
Note
Mirantis recommends rotating both admin and service credentials
simultaneously to decrease the duration of the maintenance window and
number of service restarts. You can do this by passing the --type
argument twice:
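A hedged sketch of such an invocation; the osctl credentials rotate command name is an assumption based on the --type and --wait flags referenced in this section and must be checked against your MOSK version:
osctl credentials rotate --osdpl <osdpl-name> --type admin --type service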
Wait until the OpenStackDeploymentStatus object has state APPLIED
and all OpenStack components in the health group in the
OpenStackDeploymentStatus custom resource are in the Ready state.
Alternatively, you can launch the rotation command with the --wait flag.
Now, the latest admin password for your OpenStack environment is available in
the openstack-identity-credentials secret in the openstack-external
namespace.
This section provides instructions on how to customize the functionality of
your MOSK OpenStack services by installing custom system
or Python packages into their container images.
The MOSK services are running in Ubuntu-based
containers, which can be extended to meet specific requirements or implement
specific use cases, for example:
Enabling third-party storage driver for OpenStack Cinder
Implementing a custom scheduler for OpenStack Nova
Adding a custom dashboard to OpenStack Horizon
Building your own image importing workflow for OpenStack Glance
Warning
Mirantis cannot be held responsible for any consequences arising
from using customized container images. Mirantis does not provide support
for such actions, and any modifications to production systems are made at
your own risk.
Note
Custom images are pinned in the OpenStackDeployment custom
resource. These images do not undergo automatic updates or upgrades.
The cloud administrator is responsible for updating such images during
OpenStack updates and upgrades.
Specify the location for the base image in the Dockerfile.
A custom image can be derived from any OpenStack image shipped with
MOSK. For locations of the images comprising a
specific MOSK release, refer to a corresponding
release artifacts page in the Release Notes.
Presuming that the custom image needs to be rebuilt for every new
MOSK release, Mirantis recommends parametrizing the
location of its base by introducing the $FROM argument to the
Dockerfile.
Instruct the Dockerfile to install additional system packages:
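A minimal Dockerfile sketch combining both recommendations; the package name is a placeholder:
ARG FROM
FROM ${FROM}

# Switch to root first if the base image defines a non-root user
# USER root

# Install additional system packages required by the customization
RUN apt-get update \
    && apt-get install -y --no-install-recommends <system-package> \
    && rm -rf /var/lib/apt/lists/*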
Honor upper constraints that MOSK defines for its
OpenStack packages prerequisites
OpenStack components in every MOSK release are shipped
together with their requirements packaged as Python wheels and constraints
file. Download and extract these artifacts from the corresponding
requirements container image, so that they can be used for building
your packages as well. Use the requirements image with the same tag
as the base image that you plan to customize:
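A hedged sketch of extracting the constraints and wheels from the requirements image; the image reference and the in-container path are assumptions and should be taken from the release artifacts page:
docker create --name os-requirements <requirements-image>:<tag>
docker cp os-requirements:<path-to-wheels-and-upper-constraints> ./requirements
docker rm os-requirements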
When selecting the name for your image, Mirantis recommends following the
common practice across major public Docker registries such as Docker Hub.
The image name should be <user-name>/<repo-name>, where <user-name>
is a unique identifier of the user who authored it and <repo-name>
is the name of the software shipped.
Specify the current directory as the build context. Also, use the --tag
option to assign the tag to your image. Assigning a tag :<tag>
enables you to add multiple versions of the same image to the repository.
Unless you assign a tag, it defaults to latest.
If you are adding Python packages, you can minimize the size of the custom
image by building it with the --squash flag. It merges all the image
layers into one and instructs the system not to store the cache layers of
the wheel packages.
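For example, noting that --squash requires the Docker daemon experimental mode to be enabled:
docker build . --tag <user-name>/<repo-name>:<tag> --squash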
Verify that the image has been built and is present on your system:
docker image ls
Publish the image to the designated registry by its name and tag:
Note
Before pushing the image, make sure that you have authenticated
with the registry using the docker login command.
docker push <user-name>/<repo-name>:<tag>
Attach a private Docker registry to MOSK Kubernetes underlay¶
To ensure that the Kubernetes worker nodes in your MOSK
cluster can locate and download the custom image, it should be published
to a container image registry that the cluster is configured to use.
To configure the MOSK Kubernetes underlay to use your
private registry, you need to create a ContainerRegistry resource in the
Mirantis Container Cloud API with the registry domain and CA certificate in it,
and specify the resource in the Cluster object that corresponds to
MOSK.
To inject a customized OpenStack container into your MOSK
cluster:
Since MOSK 25.1
Create a ConfigMap in the openstack namespace with the following
content, replacing <OPENSTACKDEPLOYMENT-NAME> with the name of your
OpenStackDeployment custom resource:
To help you better understand the process, this section provides a few examples
illustrating how to add various plugins to MOSK services.
Warning
Mirantis cannot be held responsible for any consequences arising
from using storage drivers, plugins, or features that are not explicitly
tested or documented with MOSK. Mirantis does not provide
support for such configurations as a part of standard product subscription.
Pure Storage driver for OpenStack Cinder
Although the PureStorage driver itself is already included in the cinder
system package, you need to install additional dependencies to make it work:
System packages: nfs-common
Python packages: purestorage
The base image is the MOSK Cinder image
cinder:yoga-focal-20230227093206.
Procedure:
Download and extract the requirements from the requirements container
image that corresponds to the base image that you plan to customize:
Build wheels. This step will be performed automatically because the
Trillio repository provides Python source packages that build wheel
binaries on installation.
Since MOSK 25.1, the OpenStack Controller has been open-sourced under the
name Rockoon and is maintained as an independent open-source project
going forward.
As part of this transition, all openstack-controller pods are named
rockoon pods across the MOSK documentation and deployments. This change
does not affect functionality; it is a reminder for users to apply the new
naming to pods and other related artifacts.
Orphaned resource allocations are entries in the Placement database that
track resource consumption, but the corresponding consumer (instance)
no longer exists on the compute nodes. As a result, the Nova scheduler
mistakenly believes that compute nodes have more resources allocated
than they actually have.
For example, orphaned resource allocations may occur when an instance
is evacuated from a hypervisor while the related nova-compute service
is down.
This section provides instructions on how to resolve orphaned resource
allocations in Nova if they are detected on compute nodes.
Orphaned allocations are detected by the nova-placement-audit CronJob that
runs every four hours.
The osdpl-exporter service processes the nova-placement-audit CronJob
output and exports current number of orphaned allocations to StackLight as an
osdpl_nova_audit_orphaned_allocations metric. If the value of this metric
is greater than 0, StackLight raises a major alert
NovaOrphanedAllocationsDetected.
Analyze the list of the nova-compute services obtained during
the previous step:
For the nova-compute services in the down state, most probably
there were evacuations of instances from the corresponding nodes when
the services were down. If this is the case, proceed directly to
Remove orphaned allocations. Otherwise, proceed with collecting
the logs.
For the nova-compute services in the UP state, proceed with
collecting the logs.
Collect the following logs from the environment:
Caution
The log data can be significant in size. Ensure that there is
sufficient space available in the /tmp/ directory of the OpenStack
Controller (Rockoon) pod. Create a separate report for each node.
Logs from compute nodes for a 3-day period around the time of the alert:
From the node with the orphaned allocation
From the node with the actual allocation (where the instance exists,
if any)
For example, if the alert was raised on 2024-08-12, set
<REPORT_PERIOD_TIMESTAMPS> to 2024-08-11,2024-08-13.
Logs from the nova-scheduler, nova-api, nova-conductor,
placement-api pods for a 3-day period around the time of the alert:
ctl_nodes=$(kubectl get nodes -l openstack-control-plane=enabled -o name)
kubectl -n osh-system exec -it deployment/rockoon -- bash
# for each node in ctl_nodes execute:
osctl sos --between <REPORT_PERIOD_TIMESTAMPS> \
  --host <CTL_HOSTNAME> \
  --component nova \
  --component placement \
  --collector elastic \
  --workspace /tmp/report
kubectl -n openstack exec -it deployment/keystone-client -- bash
openstack server migration list
openstack compute service list --long
openstack resource provider list
# Get the server event list for each orphaned consumer id
openstack server event list <SERVER_ID>
Note
SERVER_ID is the orphaned consumer ID from the
nova-placement-audit logs.
Create a support case and attach the obtained information.
The MOSK deployments with Tungsten Fabric do
not support IP address capacity monitoring.
Monitoring IP address capacity helps cloud operators allocate routable
IP addresses efficiently for dynamic workloads in the clouds. This capability
provides insights for predicting future needs for IP addresses, ensuring
seamless communication between workloads, users, and services while optimizing
IP address usage.
By monitoring IP address capacity, cloud operators can:
Predict when to add new IP address blocks to prevent service disruptions.
Identify networks or subnets nearing capacity to prevent issues.
Optimize the allocation of costly external IP address pools.
To start monitoring IP address capacity in your cloud:
Verify that all required networks and subnets are monitored.
By default, MOSK monitors IP address capacity for the
external networks that have the router:external=External attribute and
segmentation type of vlan or flat.
To include additional networks and subnets in the monitoring:
Tag the network with the openstack.lcm.mirantis.com:prometheus tag.
When a network is tagged, all its subnets are automatically included in
the monitoring:
Tag individual subnets with the openstack.lcm.mirantis.com:prometheus tag.
This includes the subnet in the monitoring regardless of the network tagging:
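For example, assuming the network and subnet are identified by name:
openstack network set --tag openstack.lcm.mirantis.com:prometheus <network-name>
openstack subnet set --tag openstack.lcm.mirantis.com:prometheus <subnet-name>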
This section covers post-deployment configuration of OpenStack services and
is intended for cloud operators responsible for maintaining a functional
cloud infrastructure for end users. It focuses on more complex procedures
that require additional steps beyond simply editing the OpenStackDeployment
custom resource.
For an overview of the capabilities provided by MOSK
OpenStack services and instructions on enabling and configuring them
at the OpenStackDeployment level, refer to Cloud services.
The Instance High Availability service, or Masakari, is an OpenStack project
designed to ensure high availability of instances and compute processes
running on hosts.
Before the end user can start enjoying the benefits of Masakari, the cloud
operator has to configure the service properly. This section includes
instructions on how to create segments and hosts through the Masakari API
as well as provides the list of additional settings that can be useful
in certain use cases.
The segment object is a logical grouping of compute nodes into zones
also known as availability zones. The segment object enables
the cloud operator to list, create, show details for, update,
and delete segments.
To create a segment named allcomputes with service_type=compute,
and recovery_method=auto, run:
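A sketch using the openstack CLI with the Masakari (instance-ha) plugin, assuming the positional argument order <name> <recovery_method> <service_type>; verify against your client version:
openstack segment create allcomputes auto compute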
The host object represents compute service hypervisors. A host belongs
to a segment. The host can be any kind of virtual machine that has
compute service running on it. The host object enables the operator
to list, create, show details for, update, and delete hosts.
The alerting API is used by Masakari monitors to notify about a failure
of either a host, process, or instance. The notification object enables
the operator to list, create, and show details of notifications.
The list of useful tunings for the Masakari service includes:
[host_failure]\evacuate_all_instances
Enables the operator to decide whether to evacuate all instances or only the
instances that have [host_failure]\ha_enabled_instance_metadata_key
set to True. By default, the parameter is set to False.
[host_failure]\ha_enabled_instance_metadata_key
Enables the operator to decide on the instance metadata key naming that
affects the per instance behavior of
[host_failure]\evacuate_all_instances. The default is the same for
both failure types, which include host and instance, but
the value can be overridden to make the metadata key different per
failure type.
[host_failure]\ignore_instances_in_error_state
Enables the operator to decide whether error instances should be allowed
for evacuation from a failed source compute node or not. If set to True,
it will ignore error instances from evacuation from a failed source compute
node. Otherwise, it will evacuate error instances along with other
instances from a failed source compute node.
Available since MOSK 24.2
[host_failure]\ha_enabled_project_tag
By default, instances belonging to any project are evacuated. However, if
the operator needs to restrict this functionality to specific projects,
they can tag these projects with a designated tag and pass this tag as
the value for this Masakari option. Consequently, instances from projects
that do not have the specified tag are not considered for evacuation, even
if they have the corresponding metadata key and value set.
[instance_failure]\process_all_instances
Enables the operator to decide whether all instances or only the
ones that have
[instance_failure]\ha_enabled_instance_metadata_key set to True
should be recovered from instance failure events. If set to True,
it will execute instance failure recovery actions for an instance
irrespective of whether that particular instance has
[instance_failure]\ha_enabled_instance_metadata_key set to True or
not. Otherwise, it will only execute instance failure recovery actions
for an instance which has
[instance_failure]\ha_enabled_instance_metadata_key set to True.
[instance_failure]\ha_enabled_instance_metadata_key
Enables the operator to decide on the instance metadata key naming that
affects the per-instance behavior of
[instance_failure]\process_all_instances. The default is the same for
both failure types, which include host and instance, but you can
override the value to make the metadata key different per failure type.
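The options above belong to the Masakari configuration. In a MOSK cloud,
service configuration is typically passed through the OpenStackDeployment
custom resource; the following minimal sketch illustrates that approach. The
instance-ha group name and the exact key path are assumptions to verify
against the OpenStackDeployment reference before applying:
spec:
  services:
    instance-ha:
      masakari:
        values:
          conf:
            masakari:
              host_failure:
                evacuate_all_instances: false
              instance_failure:
                process_all_instances: false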
Configure monitoring of cloud workload availability
MOSK enables cloud operators to oversee the availability of
workloads hosted in their OpenStack infrastructure through the monitoring of
floating IP address availability (Cloudprober) and network port availability
(Portprober).
For the feature description and usage requirements, refer to
Workload monitoring.
Configure floating IP address availability monitoring
Available since MOSK 23.2 as TechPreview
MOSK allows you to monitor the floating IP address
availability through the Cloudprober service. This section explains the details
of the service configuration.
By default, for outgoing traffic, the IP address for the Cloudprober Pod
is translated to the node IP address. In this procedure, we assume no
further translation of that node IP address on the path between the node
and floating network.
Identify the node IP address used for traffic destined to floating
network by selecting the IP address from the floating network and
running the following command on each OpenStack control plane node:
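One way to identify this address is to query the kernel routing decision for
an address from the floating network, for example (the address below is a
placeholder):
ip route get 10.11.12.10
The src field in the output shows the node IP address that will be used for
traffic destined to the floating network.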
Verify that the instances have been added successfully.
Cloudprober uses auto-discovery of instances on a periodic basis. Therefore,
wait for the discovery interval to pass (defaults to 600 seconds) and
execute the following command inside the keystone-client Pod:
You can adjust the instance auto-discovery interval in
the OpenStackDeployment object. However, Mirantis does not
recommend setting it to overly low values to avoid high load on
the OpenStack API:
spec:
  features:
    cloudprober:
      discovery:
        interval: 300
Now, you can start seeing the availability of instance floating IP
addresses per OpenStack compute node and project, as well as
viewing the probe statistics for individual instance floating IP addresses
through the OpenStack Instances Availability dashboard
in Grafana.
MOSK allows you to monitor the network port availability
through the Portprober service.
The Portprober service is enabled by default when the Cloudprober service
is enabled as described above, on clouds running OpenStack Antelope or a newer
version and using the Neutron OVS backend for networking.
This section outlines Ceph LCM operations such as adding Ceph Monitor,
Ceph nodes, and RADOS Gateway nodes to an existing Ceph cluster or removing
them, as well as removing or replacing Ceph OSDs. The section also includes
OpenStack-specific operations for Ceph.
The following sections describe the Ceph cluster configuration options:
Ceph Controller provides the capability to specify configuration options for
the Ceph cluster through the spec.cephClusterSpec.rookConfig key-value
parameter of the KaaSCephCluster resource as if they were set in a usual
ceph.conf file. For details, see Ceph advanced configuration.
However, if rookConfig is empty, Ceph Controller still specifies the
following default configuration options for each Ceph cluster:
Required network parameters that you can change through the
spec.cephClusterSpec.network section:
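For illustration, a minimal sketch of the network section with placeholder
CIDRs:
spec:
  cephClusterSpec:
    network:
      clusterNet: 10.10.10.0/24
      publicNet: 10.10.11.0/24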
General default configuration options that you can override using the
rookConfig parameter:
mon target pg per osd = 200
mon max pg per osd = 600
# Workaround configuration option to avoid the
# https://github.com/rook/rook/issues/7573 issue
# when updating to Rook 1.6.x versions:
rgw_data_log_backing = omap
If rookConfig is empty but the spec.cephClusterSpec.objectStore.rgw
section is defined, Ceph Controller specifies the following OpenStack-related
default configuration options for each Ceph cluster:
Ceph Object Gateway options, which you can override using the rookConfig
parameter:
This section describes how to configure a Ceph cluster through the
KaaSCephCluster (kaascephclusters.kaas.mirantis.com) CR during or
after the deployment of a MOSK cluster.
The KaaSCephCluster CR spec has two sections, cephClusterSpec and
k8sCluster and specifies the nodes to deploy as Ceph components. Based on
the roles definitions in the KaaSCephCluster CR, Ceph Controller
automatically labels nodes for Ceph Monitors and Managers. Ceph OSDs are
deployed based on the storageDevices parameter defined for each Ceph node.
cephClusterSpec
Describes a Ceph cluster in the MOSK cluster. For details
on cephClusterSpec parameters, see the tables below.
k8sCluster
Defines the cluster on which the KaaSCephCluster
depends. Use the k8sCluster parameter if the name or namespace
of the corresponding MOSK cluster differs from the default
one:
clusterNet - specifies a Classless Inter-Domain Routing (CIDR)
for the Ceph OSD replication network.
Warning
To avoid ambiguous behavior of Ceph daemons, do not specify
0.0.0.0/0 in clusterNet. Otherwise, Ceph daemons can select
an incorrect public interface that can cause the Ceph cluster to
become unavailable. The bare metal provider automatically translates
the 0.0.0.0/0 network range to the default LCM IPAM subnet
if it exists.
Note
The clusterNet and publicNet parameters support
multiple IP networks. For details, see Enable Ceph multinetwork.
publicNet - specifies a CIDR for communication between
the service and operator.
Warning
To avoid ambiguous behavior of Ceph daemons,
do not specify 0.0.0.0/0 in publicNet.
Otherwise, Ceph daemons can select an incorrect
public interface that can cause the Ceph cluster to
become unavailable. The bare metal provider
automatically translates the 0.0.0.0/0 network
range to the default LCM IPAM subnet if it exists.
Note
The clusterNet and publicNet parameters support
multiple IP networks. For details, see Enable Ceph multinetwork.
nodes
Specifies the list of Ceph nodes. For details, see
Node parameters. The nodes parameter is a map with
machine names as keys and Ceph node specifications as values, for
example:
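A minimal sketch with illustrative machine names:
nodes:
  master-0:
    roles:
    - mon
    - mgr
  worker-storage-0:
    storageDevices:
    - fullPath: /dev/disk/by-id/scsi-SATA_HGST_HUS724040AL_PN1334PEHN18ZS
      config:
        deviceClass: hdd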
nodeGroups
Specifies the list of Ceph nodes grouped by node lists or node
labels. For details, see NodeGroups parameters. The nodeGroups
parameter is a map with group names as keys and Ceph node
specifications for defined nodes or node labels as values.
For example:
The <nodeLabelExpression> must be a valid Kubernetes label
selector expression.
pools
Specifies the list of Ceph pools. For details, see
Pool parameters.
objectStorage
Specifies the parameters for Object Storage, such as RADOS Gateway,
the Ceph Object Storage. Also specifies the RADOS Gateway
Multisite configuration. For details, see RADOS Gateway parameters and
Multisite parameters.
rookConfig
Optional. String key-value parameter that allows overriding
Ceph configuration options.
Since MOSK 24.2, use the | delimiter to specify
the section where a parameter must be placed, for example, mon or
osd. If required, use the . delimiter to target a specific
Ceph OSD or Ceph Monitor by its number, for example, osd.14, and
override the configuration of the corresponding section.
Using a section restarts only the daemons related
to that section. If you do not specify the section,
a parameter is set in the global section, which implies a restart
of all Ceph daemons except Ceph OSDs.
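A hedged sketch of rookConfig keys using these delimiters; the option names
and values are illustrative only:
rookConfig:
  # global section, restarts all Ceph daemons except Ceph OSDs
  mon_max_pg_per_osd: "600"
  # osd section only
  "osd|osd_max_backfills": "2"
  # a specific Ceph Monitor only
  "mon.a|mon_osd_cache_size": "500"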
extraOpts
Available since MOSK 23.3. Enables specification of
extra options for a setup, includes the deviceLabels parameter. For details,
see ExtraOpts parameters.
ingress
In MOSK 25.1, this section is automatically replaced with ingressConfig.
Enables a custom ingress rule for public access on Ceph services, for
example, Ceph RADOS Gateway. For details, see Configure Ceph Object Gateway TLS.
ingressConfig
Available since MOSK 25.1 to automatically replace the
ingress section.
Enables a custom ingress rule for public access on Ceph services, for
example, Ceph RADOS Gateway. For details, see Configure Ceph Object Gateway TLS.
rbdMirror
Enables pools mirroring between two interconnected clusters. For
details, see Enable Ceph RBD mirroring.
Disables autogeneration of shared Ceph values for OpenStack
deployments. Set to false by default.
mgr
Contains the mgrModules parameter that should list the
following keys:
name - Ceph Manager module name
enabled - flag that defines whether the Ceph Manager module
is enabled
settings.balancerMode - available since MOSK 25.1.
Allows defining balancer mode for the Ceph Manager balancer module.
Possible values are crush-compat or upmap.
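A hedged sketch of the mgr section; the module list is illustrative:
mgr:
  mgrModules:
  - name: pg_autoscaler
    enabled: true
  - name: balancer
    enabled: true
    settings:
      balancerMode: upmap # available since MOSK 25.1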
roles
Specifies the mon, mgr, or rgw daemon to be installed on
a Ceph node. You can place the daemons on any nodes upon your
decision. Consider the following recommendations:
The recommended number of Ceph Monitors in a Ceph cluster is 3.
Therefore, at least 3 Ceph nodes must contain the mon item in
the roles parameter.
The number of Ceph Monitors must be odd.
Do not add more than 2 Ceph Monitors at a time and wait until the
Ceph cluster is Ready before adding more daemons.
For better HA and fault tolerance, the number of mgr roles
must equal the number of mon roles. Therefore, we recommend
labeling at least 3 Ceph nodes with the mgr role.
If rgw roles are not specified, all rgw daemons will spawn
on the same nodes as the mon daemons.
If a Ceph node contains a mon role, the Ceph Monitor Pod
deploys on this node.
If a Ceph node contains a mgr role, it informs the Ceph
Controller that a Ceph Manager can be deployed on the node.
Rook Operator selects the first available node to deploy the
Ceph Manager on it:
Before MOSK 23.1, only one Ceph Manager is
deployed on a cluster.
Since MOSK 23.1, two Ceph Managers, active and
stand-by, are deployed on a cluster.
If you assign the mgr role to three recommended Ceph nodes,
one back-up Ceph node is available to redeploy a failed Ceph Manager
in case of a server outage.
storageDevices
Specifies the list of devices to use for Ceph OSD deployment.
Includes the following parameters:
Note
Since MOSK 23.3, Mirantis recommends migrating
all storageDevices items to by-id symlinks as persistent
device identifiers.
fullPath - a storage device symlink. Accepts the
following values:
Since MOSK 23.3, the device by-id symlink
that contains the serial number of the physical device and does
not contain wwn. For example,
/dev/disk/by-id/nvme-SAMSUNG_MZ1LB3T8HMLA-00007_S46FNY0R394543.
The by-id symlink should be equal to one of the items in the
status.providerStatus.hardware.storage.byIDs list of the Machine status.
Mirantis recommends using this field for defining by-id
symlinks.
The device by-path symlink. For example,
/dev/disk/by-path/pci-0000:00:11.4-ata-3. Since
MOSK 23.3, Mirantis does not recommend specifying storage
devices with device by-path symlinks because such identifiers
are not persistent and can change at node boot.
This parameter is mutually exclusive with name.
name - a storage device name. Accepts the following values:
The device name, for example, sdc. Since MOSK
23.3, Mirantis does not recommend specifying storage devices
with device names because such identifiers are not persistent
and can change at node boot.
The device by-id symlink that contains the serial
number of the physical device and does not contain wwn.
For example,
/dev/disk/by-id/nvme-SAMSUNG_MZ1LB3T8HMLA-00007_S46FNY0R394543.
The by-id symlink should be equal to one of the items in the
status.providerStatus.hardware.storage.byIDs list of the Machine status.
Since MOSK 23.3, Mirantis recommends using
the fullPath field for defining by-id symlinks instead.
This parameter is mutually exclusive with fullPath.
config - a map of device configurations that must contain a
mandatory deviceClass parameter set to hdd, ssd, or
nvme. The device class must be defined in a pool. The map can
optionally contain a metadata device, for example:
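A hedged sketch of a storageDevices item with such a config; the device paths
reuse the examples from this section:
storageDevices:
- fullPath: /dev/disk/by-id/nvme-SAMSUNG_MZ1LB3T8HMLA-00007_S46FNY0R394543
  config:
    deviceClass: ssd
    metadataDevice: /dev/bluedb/meta_1
    osdsPerDevice: "2"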
The underlying storage format to use for Ceph OSDs is BlueStore.
The metadataDevice parameter accepts a device name or logical
volume path for the BlueStore device. Mirantis recommends using
logical volume paths created on nvme devices. For devices
partitioning on logical volumes, see Create a custom bare metal host profile.
The osdsPerDevice parameter accepts natural numbers as strings
and allows splitting one device into several Ceph OSD
daemons. Mirantis recommends using this parameter only for ssd
or nvme disks.
Optional. Available since MOSK 25.1. Specifies
the custom monitor endpoint for the node on which the monitor is placed.
The custom monitor endpoint can be equal, for example, to an IP address
from the Ceph public network range.
Specifies a string with a valid label selector expression to
select machines to which the node spec must be applied. Mutually
exclusive with nodes parameter. For example:
name
Mandatory. Specifies the pool name as a prefix for each Ceph block pool. The
resulting Ceph block pool name will be <name>-<deviceClass>.
useAsFullName
Optional. Enables Ceph block pool to use only the name value as a name.
The resulting Ceph block pool name will be <name> without the
deviceClass suffix.
role
Mandatory. Specifies the pool role and is used mostly for MOSK pools.
default
Mandatory. Defines if the pool and dependent StorageClass should be set as
default. Must be enabled only for one pool.
deviceClass
Mandatory. Specifies the device class for the defined pool. Possible values are
hdd, ssd, and nvme.
replicated
Mandatory, mutually exclusive with erasureCoded. Includes the following parameters:
size - the number of pool replicas.
targetSizeRatio - Optional. A float percentage
from 0.0 to 1.0, which specifies the expected consumption
of the total Ceph cluster capacity. The default values are as follows:
The default ratio of the Ceph Object Storage dataPool is
10.0%.
failureDomain
Mandatory. The failure domain across which the replicas or chunks of data will
be spread. Set to host by default. The list of possible
recommended values includes: host, rack, room, and
datacenter.
Caution
Mirantis does not recommend using the following
intermediate topology keys: pdu, row, chassis. Consider
the rack topology instead. The osd failure domain is
prohibited.
mirroring
Optional. Enables the mirroring feature for the defined pool.
Includes the mode parameter that can be set to pool or
image. For details, see Enable Ceph RBD mirroring.
A Kubernetes cluster only supports increase of storage
size.
rbdDeviceMapOptions
Optional. Not updatable as it applies only once. Specifies custom
rbd device map options to use with the StorageClass of a
corresponding pool. Allows customizing the Kubernetes CSI driver
interaction with Ceph RBD for the defined StorageClass. For the
available options, see Ceph documentation: Kernel RBD (KRBD) options.
parameters
Optional. Available since MOSK 23.1. Specifies the
key-value map for the parameters of the Ceph pool. For details,
see Ceph documentation: Set Pool values.
reclaimPolicy
Optional. Available since MOSK 23.3. Specifies reclaim
policy for the underlying StorageClass of the pool.
Accepts Retain and Delete values. Default is Delete
if not set.
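Putting the parameters above together, a hedged sketch of a replicated pool
entry with illustrative names and values:
pools:
- name: volumes
  role: volumes
  deviceClass: hdd
  default: false
  replicated:
    size: 3
    targetSizeRatio: 0.4
  failureDomain: host
  reclaimPolicy: Delete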
To configure additional required pools for MOSK,
see Add a Ceph cluster.
Caution
Since Ceph Pacific, Ceph CSI driver does not propagate the
777 permission on the mount point of persistent volumes based on any
StorageClass of the Ceph pool.
dataPool
Mutually exclusive with the zone parameter. Object storage data pool
spec that should only contain replicated or erasureCoded and
failureDomain parameters. The failureDomain parameter may be
set to osd or host, defining the failure domain across which
the data will be spread. For dataPool, Mirantis recommends using an
erasureCoded pool. For details, see
Rook documentation: Erasure coding.
For example:
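A minimal sketch of such a dataPool definition, assuming the erasure-coding
fields follow the upstream Rook layout (dataChunks and codingChunks):
dataPool:
  failureDomain: host
  erasureCoded:
    dataChunks: 2
    codingChunks: 1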
metadataPool
Mutually exclusive with the zone parameter. Object storage metadata
pool spec that should only contain replicated and failureDomain
parameters. The failureDomain parameter may be set to osd or
host, defining the failure domain across which the data will be
spread. Can use only replicated settings. For example:
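A minimal sketch of such a metadataPool definition:
metadataPool:
  failureDomain: host
  replicated:
    size: 3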
where replicated.size is the number of full copies of data on
multiple nodes.
Warning
When using the non-recommended Ceph pools replicated.size of
less than 3, Ceph OSD removal cannot be performed. The minimal replica
size equals a rounded up half of the specified replicated.size.
For example, if replicated.size is 2, the minimal replica size is
1, and if replicated.size is 3, then the minimal replica size
is 2. The replica size of 1 allows Ceph having PGs with only one
Ceph OSD in the acting state, which may cause a PG_TOO_DEGRADED
health warning that blocks Ceph OSD removal. Mirantis recommends setting
replicated.size to 3 for each Ceph pool.
gateway
The gateway settings corresponding to the rgw daemon settings.
Includes the following parameters:
port - the port on which the Ceph RGW service listens for HTTP
connections.
securePort - the port on which the Ceph RGW service listens for HTTPS
connections.
instances - the number of pods in the Ceph RGW ReplicaSet. If
allNodes is set to true, a DaemonSet is created instead.
Note
Mirantis recommends using 2 instances for Ceph Object Storage.
allNodes - defines whether to start the Ceph RGW pods as a
DaemonSet on all nodes. The instances parameter is ignored if
allNodes is set to true.
Defines whether to delete the data and metadata pools in the rgw
section if the object storage is deleted. Set this parameter to true
if you need to store data even if the object storage is deleted.
However, Mirantis recommends setting this parameter to false.
objectUsers and buckets
Optional. To create new Ceph RGW resources, such as buckets or users,
specify the following keys. Ceph Controller will automatically create
the specified object storage users and buckets in the Ceph cluster.
objectUsers - a list of user specifications to create for object
storage. Contains the following fields:
name - a user name to create.
displayName - the Ceph user name to display.
capabilities - user capabilities:
user - admin capabilities to read/write Ceph Object Store
users.
bucket - admin capabilities to read/write Ceph Object Store
buckets.
metadata - admin capabilities to read/write Ceph Object Store
metadata.
usage - admin capabilities to read/write Ceph Object Store
usage.
zone - admin capabilities to read/write Ceph Object Store
zones.
users - a list of strings that contain user names to create for
object storage.
Note
This field is deprecated. Use objectUsers
instead. If users is specified, it will be automatically
transformed to the objectUsers section.
buckets - a list of strings that contain bucket names to create
for object storage.
zone
Optional. Mutually exclusive with metadataPool and dataPool.
Defines the Ceph Multisite zone where the object storage must be placed.
Includes the name parameter that must be set to one of the zones
items. For details, see Enable multisite for Ceph RGW Object Storage.
Optional. Available since MOSK 25.1. Flag to determine
that a TLS certificate for accessing the Ceph RGW endpoint is used but
not exposed in spec. For example:
The operator must manually provide TLS configuration using the
rgw-ssl-certificate secret in the rook-ceph namespace of the
managed cluster. The secret object must have the following structure:
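A hedged sketch of such a secret, based on the cert field described below;
the base64 payload is a placeholder:
apiVersion: v1
kind: Secret
metadata:
  name: rgw-ssl-certificate
  namespace: rook-ceph
data:
  cert: <base64-encoded bundle with the server TLS key, TLS cert, and cacert>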
When removing an already existing SSLCert block, no additional actions
are required, because this block uses the same rgw-ssl-certificate secret
in the rook-ceph namespace.
When adding a new secret directly without exposing it in spec, the following
rules apply:
cert - base64 representation of a file with the server TLS key,
server TLS cert, and cacert.
deviceLabels
Available since MOSK 23.3.
A key-value setting used to assign a specification label to any
available device on a specific node. These labels can then be
utilized within nodeGroups or node definitions to eliminate
the need to specify different devices for each node individually.
Additionally, it helps in avoiding the use of device names,
facilitating the grouping of nodes with similar labels.
customDeviceClasses
Available since MOSK 23.3 as TechPreview.
A list of custom device class names to use in the specification.
Enables you to specify the custom names different from the default
ones, which include ssd, hdd, and nvme, and use
them in nodes and pools definitions.
realms
List of realms to use, represents the realm namespaces. Includes the
following parameters:
name - the realm name.
pullEndpoint - optional, required only when the master zone is in
a different storage cluster. The endpoint, access key, and system key
of the system user from the realm to pull from. Includes the
following parameters:
endpoint - the endpoint of the master zone in the master zone
group.
accessKey - the access key of the system user from the realm to
pull from.
secretKey - the system key of the system user from the realm to
pull from.
zoneGroups
Technical Preview
The list of zone groups for realms. Includes the following parameters:
name - the zone group name.
realmName - the realm namespace name to which the zone group
belongs.
zones
Technical Preview
The list of zones used within one zone group. Includes the following
parameters:
name - the zone name.
metadataPool - the settings used to create the Object Storage
metadata pools. Must use replication. For details, see
Pool parameters.
dataPool - the settings to create the Object Storage data pool.
Can use replication or erasure coding. For details, see
Pool parameters.
zoneGroupName - the zone group name.
endpointsForZone - available since MOSK 24.2.
The list of all endpoints in the zone group.
If you use an ingress proxy for RGW, the list of endpoints must contain
the FQDN or IP address used to access RGW. By default, if no ingress proxy
is used, the list of endpoints is set to the IP address of the RGW
external service. Endpoints must follow the HTTP URL format.
Specifies health check settings for Ceph daemons. Contains the
following parameters:
status - configures health check settings for Ceph health
mon - configures health check settings for Ceph Monitors
osd - configures health check settings for Ceph OSDs
Each parameter allows defining the following settings:
disabled - a flag that disables the health check.
interval - an interval in seconds or minutes for the health
check to run. For example, 60s for 60 seconds.
timeout - a timeout for the health check in seconds or minutes.
For example, 60s for 60 seconds.
livenessProbe
Key-value parameter with liveness probe settings for the defined
daemon types. Can be one of the following: mgr, mon, osd,
or mds. Includes the disabled flag and the probe
parameter. The probe parameter accepts the following options:
initialDelaySeconds - the number of seconds after the container
has started before the liveness probes are initiated. Integer.
timeoutSeconds - the number of seconds after which the probe
times out. Integer.
periodSeconds - the frequency (in seconds) to perform the
probe. Integer.
successThreshold - the minimum consecutive successful probes
for the probe to be considered successful after a failure. Integer.
failureThreshold - the minimum consecutive failures for the
probe to be considered failed after having succeeded. Integer.
Note
Ceph Controller specifies the following livenessProbe
defaults for mon, mgr, osd, and mds (if CephFS
is enabled):
5 for timeoutSeconds
5 for failureThreshold
startupProbe
Key-value parameter with startup probe settings for the defined
daemon types. Can be one of the following: mgr, mon, osd,
or mds. Includes the disabled flag and the probe
parameter. The probe parameter accepts the following options:
timeoutSeconds - the number of seconds after which the probe
times out. Integer.
periodSeconds - the frequency (in seconds) to perform the
probe. Integer.
successThreshold - the minimum consecutive successful probes
for the probe to be considered successful after a failure. Integer.
failureThreshold - the minimum consecutive failures for the
probe to be considered failed after having succeeded. Integer.
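For illustration, a hedged sketch of these settings, assuming the layout
mirrors the upstream Rook healthCheck schema; the section names and values
are assumptions to verify against the product reference:
healthCheck:
  daemonHealth:
    mon:
      disabled: false
      interval: 45s
    osd:
      interval: 60s
    status:
      interval: 60s
  livenessProbe:
    mon:
      disabled: false
      probe:
        timeoutSeconds: 5
        periodSeconds: 3
        failureThreshold: 5
  startupProbe:
    osd:
      probe:
        timeoutSeconds: 5
        failureThreshold: 30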
Once you enable Ceph Object Gateway (radosgw) as described in
Enable Ceph RGW Object Storage, you can configure the Transport Layer Security (TLS)
protocol for a Ceph Object Gateway public endpoint using the following options:
Using MOSK TLS, if it is enabled and exposes its
certificates and domain for Ceph.
In this case, Ceph Object Gateway will automatically create an ingress rule
with MOSK certificates and domain to access the Ceph
Object Gateway public endpoint.
Therefore, you only need to reach the Ceph Object Gateway public and internal
endpoints and set the CA certificates for a trusted TLS connection.
Using custom ingress specified in the KaaSCephCluster CR. In this
case, Ceph Object Gateway public endpoint will use the public domain
specified using the ingress parameters.
Caution
External Ceph Object Gateway service is not supported and will
be deleted during update. If your system already uses endpoints of an
external Ceph Object Gateway service, reconfigure them to the ingress
endpoints.
Caution
When using a custom or OpenStack ingress, make sure to configure
the DNS name for RGW to target an external IP address of that ingress.
If there is no OpenStack or custom ingress available, point the DNS to
an external load balancer of RGW.
Note
Since MOSK 23.3, if the cluster has
tls-proxy enabled, TLS certificates specified in ingress objects,
including those configured in the KaaSCephCluster specification,
are disregarded. Instead, common certificates are applied to all ingresses
from the OpenStackDeployment object. This implies that tlsCert and
other ingress certificates specified in KaaSCephCluster are ignored,
and the common certificate from the OpenStackDeployment object is used.
This section also describes how to specify a custom public endpoint for the
Object Storage service.
To configure Ceph Object Gateway TLS:
Verify whether MOSK TLS is enabled. The
spec.features.ssl.public_endpoints section should be specified in the
OpenStackDeployment CR.
To generate an SSL certificate for internal usage, verify that the
gateway securePort parameter is specified in the KaasCephCluster CR.
For details, see Enable Ceph RGW Object Storage.
Select from the following options:
Since MOSK 25.1
Configure TLS for Ceph Object Gateway using a custom ingressConfig:
TLS configuration for ingress including certificates. Contains the following
parameters:
cacert
The Certificate Authority (CA) certificate, used for the ingress rule
TLS support.
tlsCert
The TLS certificate, used for the ingress rule TLS support.
tlsKey
The TLS private key, used for the ingress rule TLS support.
publicDomain
Mandatory. The domain name to use for public endpoints.
Caution
The default ingress controller does not support publicDomain
values different from the OpenStack ingress public domain. Therefore,
if you intend to use the default OpenStack Ingress Controller for your
Ceph Object Storage public endpoint, plan to use the same public domain
as your OpenStack endpoints.
hostname
Custom name to override the Ceph Object Storage RGW name for public RGW access.
Public RGW endpoint has the https://<hostname>.<publicDomain> format.
tlsSecretRefName
Optional. Secret name with TLS certs on the managed cluster in the
rook-ceph namespace prepared by the operator. Allows avoiding exposure
of certs directly in spec. Must contain the following format:
When using tlsSecretRefName, remove the following
fields: cacert, tlsCert, and tlsKey.
Description of optional parameters in the ingressConfig section
controllerClassName
Name of the custom Ingress Controller. By default, the
openstack-ingress-nginx class name is specified and Ceph uses the
OpenStack Ingress Controller based on NGINX.
annotations
Extra annotations for the ingress proxy that are a key-value mapping of
strings to add or override ingress rule annotations. For details, see
NGINX Ingress Controller: Annotations.
By default, the following annotations are set:
nginx.ingress.kubernetes.io/rewrite-target is set to /
nginx.ingress.kubernetes.io/upstream-vhost is set to
<rgwName>.rook-ceph.svc
The value for <rgwName> is located in
spec.cephClusterSpec.objectStorage.rgw.name.
Optional annotations:
nginx.ingress.kubernetes.io/proxy-request-buffering: "off"
that disables buffering for ingress to prevent the
413 (Request Entity Too Large) error when uploading large
files using radosgw.
nginx.ingress.kubernetes.io/proxy-body-size: <size> that
increases the default uploading size limit to prevent the
413 (Request Entity Too Large) error when uploading large
files using radosgw. Set the value in MB (m) or KB
(k). For example, 100m.
Note
By default, an ingress rule is created with an internal
Ceph Object Gateway service endpoint as a backend. Also,
rgw_dns_name is specified in the Ceph configuration and is set
to <rgwName>.rook-ceph.svc by default.
You can override rgw_dns_name using the
spec.cephClusterSpec.rookConfig key-value parameter.
In this case, also change the corresponding ingress annotation.
Configuration example with the rgw_dns_name override
For clouds with the publicDomain parameter specified, align the
upstream-vhost ingress annotation with the name of the Ceph
Object Storage and the specified public domain.
Ceph Object Storage requires the upstream-vhost and
rgw_dns_name parameters to be equal. Therefore, override the
default rgw_dns_name with the corresponding ingress annotation
value.
Before MOSK 25.1
Configure Ceph Object Gateway TLS using a custom ingress:
Warning
The rgw section is deprecated and the ingress
parameters are moved under cephClusterSpec.ingress. If you continue
using rgw.ingress, it will be automatically translated into
cephClusterSpec.ingress during the MOSK cluster
release update.
Open the KaasCephCluster CR for editing.
Specify the ingress parameters:
publicDomain - domain name to use for the external service.
Caution
Since MOSK 23.3, the default
ingress controller does not support publicDomain values
different from the OpenStack ingress public domain. Therefore,
if you intend to use the default OpenStack ingress controller
for your Ceph Object Storage public endpoint, plan to use the
same public domain as your OpenStack endpoints.
cacert - Certificate Authority (CA) certificate, used for the
ingress rule TLS support.
tlsCert - TLS certificate, used for the ingress rule TLS support.
tlsKey - TLS private key, used for the ingress rule TLS support.
customIngress
Optional. Includes the following custom Ingress Controller parameters:
className - the custom Ingress Controller class name. If not
specified, the openstack-ingress-nginx class name is used by
default.
nginx.ingress.kubernetes.io/rewrite-target is set to /
nginx.ingress.kubernetes.io/upstream-vhost is set to
<rgwName>.rook-ceph.svc.
The value for <rgwName> is
spec.cephClusterSpec.objectStorage.rgw.name.
Optional annotations:
nginx.ingress.kubernetes.io/proxy-request-buffering: "off"
that disables buffering for ingress to prevent the
413 (Request Entity Too Large) error when uploading large
files using radosgw.
nginx.ingress.kubernetes.io/proxy-body-size: <size> that
increases the default uploading size limit to prevent the
413 (Request Entity Too Large) error when uploading large
files using radosgw. Set the value in MB (m) or KB
(k). For example, 100m.
An ingress rule is by default created with an internal
Ceph Object Gateway service endpoint as a backend. Also,
rgw_dns_name is specified in the Ceph configuration and is set
to <rgwName>.rook-ceph.svc by default. You can override this
option using the spec.cephClusterSpec.rookConfig key-value
parameter. In this case, also change the corresponding ingress
annotation.
For clouds with the publicDomain parameter specified, align
the upstream-vhost ingress annotation with the name of the
Ceph Object Storage and the specified public domain.
Ceph Object Storage requires the upstream-vhost and
rgw_dns_name parameters to be equal. Therefore, override the
default rgw_dns_name with the corresponding ingress annotation
value.
If MOSK TLS is enabled
Obtain the MOSK CA certificate for a trusted connection:
Obtain the internal endpoint name for Ceph Object Gateway:
kubectl -n rook-ceph get svc -l app=rook-ceph-rgw
The internal endpoint for Ceph Object Gateway has the
https://<internal-svc-name>.rook-ceph.svc:<rgw-secure-port>/
format, where <rgw-secure-port> is
spec.rgw.gateway.securePort specified in the
KaaSCephCluster CR.
Substitute <objectStorageName> with the Ceph Object Storage name and
<customPublicEndpoint> with the public endpoint with a custom public
domain.
If one or both endpoints are omitted in the list, add the missing
endpoints to the hostnames list in the zonegroup.json file and
update Ceph Object Gateway zonegroup configuration:
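A hedged sequence of commands for this step, run from the Ceph tools pod; the
zone group name is a placeholder, verify the flags with radosgw-admin help:
radosgw-admin zonegroup get --rgw-zonegroup=<zonegroupName> > zonegroup.json
# edit zonegroup.json and add the missing endpoints to the "hostnames" list
radosgw-admin zonegroup set --rgw-zonegroup=<zonegroupName> --infile zonegroup.json
radosgw-admin period update --commit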
Once done, Ceph Object Gateway becomes available by the custom public endpoint
with an S3 API client, OpenStack Swift CLI, and OpenStack Horizon Containers
plugin.
When you use Ceph Object Gateway server-side encryption (SSE),
unencrypted data sent over HTTPS is stored encrypted by the Ceph Object Gateway
in the Ceph cluster. The current implementation integrates Barbican as a key
management service.
The object storage SSE feature is enabled by default in MOSK
deployments with Barbican. To use object storage SSE, use the AWS CLI S3
client.
In the output, capture the first value as the <user-id>,
which is c63b70134e0845a2b13c3f947880f66a in the above
example.
Specify the ceph-rgw user in the Barbican ACL:
openstack acl user add --user <user-id> <secret-href>
Substitute <user-id> with the corresponding value from the output of
the previous command and <secret-href> with the corresponding value
obtained in step 3.
This section explains how to create an Amazon Simple Storage Service
(Amazon S3 or S3) bucket and set an S3 bucket policy between two Ceph Object
Storage users.
Set a bucket policy for a Ceph Object Storage user
Available since 2.23.1 (Cluster release 12.7.0)
Amazon S3 is an object storage service with different access policies. A bucket
policy is a resource-based policy that grants permissions to a bucket and
objects in it. For more details, see Amazon S3 documentation: Using bucket
policies.
The following procedure illustrates the process of setting a bucket policy for
a bucket (test01) stored in a Ceph Object Storage. The bucket policy
requires at least two users: a bucket owner (user-a) and a bucket user
(user-t). The bucket owner creates the bucket and sets the policy that
regulates access for the bucket user.
Caution
For the user name, use the UUID format without capital letters.
To configure an Amazon S3 bucket policy:
Note
s3cmd is a free command-line tool and client for
uploading, retrieving, and managing data in Amazon S3 and other cloud
storage service providers that use the S3 protocol. You can download the
s3cmd CLI tool from
Amazon S3 tools: Download s3cmd.
Configure the s3cmd client with the user-a credentials:
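A hedged example of a non-interactive configuration; the endpoint and keys
are placeholders:
s3cmd --configure \
  --access_key=<user-a-access-key> \
  --secret_key=<user-a-secret-key> \
  --host=<rgw-public-endpoint> \
  --host-bucket=<rgw-public-endpoint>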
The following procedure illustrates the process of setting a bucket policy for
a bucket between two OpenStack users.
Due to specifics of the Ceph integration with OpenStack projects, you should
configure the bucket policy for OpenStack users indirectly through setting
permissions for corresponding OpenStack projects.
For illustration purposes, we use the following names in the procedure:
test01 for the bucket
user-a, user-t for the OpenStack users
project-a, project-t for the OpenStack projects
To configure an Amazon S3 bucket policy for OpenStack users:
Specify the rookConfig parameter in the cephClusterSpec section of
the KaaSCephCluster custom resource:
Obtain the values from the access and secret fields to connect with
Ceph Object Storage through the s3cmd tool.
Note
s3cmd is a free command-line tool for uploading, retrieving,
and managing data in Amazon S3 and other cloud storage service providers
that use the S3 protocol. You can download the s3cmd tool from
Amazon S3 tools: Download s3cmd.
Create bucket users and configure a bucket policy for the project-t
OpenStack project similarly to the procedure described in
Set a bucket policy for a Ceph Object Storage user.
Ceph integration does not allow providing permissions for OpenStack users
directly. Therefore, you need to set permissions for the project that
corresponds to the user:
Ceph pool target ratio defines for the Placement Group (PG) autoscaler the
amount of data the pools are expected to acquire over time in relation to each
other. You can set initial PG values for each Ceph pool. Otherwise, the
autoscaler starts with the minimum value and scales up, causing a lot of data
to move in the background.
You can allocate several pools to use the same device class, which is a solid
block of available capacity in Ceph. For example, if three pools
(kubernetes-hdd, images-hdd, and volumes-hdd) are set to use the
same device class hdd, you can set the target ratio for Ceph pools to
provide 80% of capacity to the volumes-hdd pool and distribute the
remaining capacity evenly between the two other pools. This way, Ceph pool
target ratio instructs Ceph on when to warn that a pool is running out of free
space and, at the same time, instructs Ceph on how many placement groups Ceph
should allocate/autoscale for a pool for better data distribution.
Ceph pool target ratio is not a constant value and you can change it according
to new capacity plans. Once you specify target ratio, if the PG number of a
pool scales, other pools with specified target ratio will automatically scale
accordingly.
For illustration purposes, the procedure below uses raw capacity of 185 TB
or 189440 GB.
Design Ceph pools with the considered device class upper bounds of the
possible capacity. For example, consider the hdd device class that
contains the following pools:
The kubernetes-hdd pool will contain not more than 2048 GB.
The images-hdd pool will contain not more than 2048 GB.
The volumes-hdd pool will contain 50 GB per VM. The upper bound of
used VMs on the cloud is 204, the pool replicated size is 3.
Therefore, calculate the upper bounds for volumes-hdd:
50 GB per VM * 204 VMs * 3 replicas = 30600 GB
The backup-hdd pool can be calculated as a relative of
volumes-hdd. For example, 1 volumes-hdd storage unit per 5
backup-hdd units.
The vms-hdd is a pool for ephemeral storage Copy on Write (CoW). We
recommend designing the amount of ephemeral data it should store. For
example purposes, we use 500 GB. However, in reality, despite the CoW data
reduction, this value is very optimistic.
Note
If dataPool is replicated and Ceph Object Store is planned for
intensive use, also calculate upper bounds for dataPool.
Calculate target ratio for each considered pool. For example:
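A worked sketch of such a calculation, assuming the backup-hdd pool is
planned at five times the volumes-hdd capacity and using the 189440 GB raw
capacity from above (ratios are rounded):
kubernetes-hdd:  2048 GB / 189440 GB ≈ 0.0108
images-hdd:      2048 GB / 189440 GB ≈ 0.0108
volumes-hdd:    30600 GB / 189440 GB ≈ 0.1615
backup-hdd:    153000 GB / 189440 GB ≈ 0.8076
vms-hdd:          500 GB / 189440 GB ≈ 0.0026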
If required, calculate the target ratio for erasure-coded pools.
Due to erasure-coded pools splitting each object into K data parts
and M coding parts, the total used storage for each object is less
than that in replicated pools. Indeed, M is equal to the number of
OSDs that can be missing from the cluster without the cluster experiencing
data loss. This means that planned data is stored with an overhead
factor of (K+M)/K on the Ceph cluster. For example, if the planned capacity
of an erasure-coded data pool with K=2, M=2 is 200 GB, then the total used
capacity is 200*(2+2)/2, which is 400 GB.
Open the KaasCephCluster CR of a managed cluster for editing:
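A typical way to open the resource for editing, with the placeholder used
throughout this guide:
kubectl -n <managedClusterProjectName> edit kaascephcluster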
In the spec.cephClusterSpec.pools section, specify the calculated
ratios as parameters.target_size_ratio for each considered
erasure-coded pool. For example:
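A hedged sketch of a pool item carrying the calculated ratio; the pool name,
role, and value are illustrative, and the erasure-coding fields assume the
upstream Rook layout:
pools:
- name: ec-data
  role: data
  deviceClass: hdd
  failureDomain: host
  erasureCoded:
    dataChunks: 2
    codingChunks: 1
  parameters:
    target_size_ratio: "0.2"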
Note
The parameters section is a key-value mapping where
the value is of the string type and should be quoted.
After the Ceph pool becomes available, it is automatically specified as an
additional Cinder backend and registered as a new volume type, which you can
use to create Cinder volumes.
The following sections describe how to configure, manage, and verify
specific aspects of a Ceph cluster.
Caution
Before you proceed with any reading or writing operation, first
verify the cluster status using the ceph tool as described in
Verify the Ceph core services.
The Ceph LCM automated operations such as Ceph OSD or Ceph node removal are
performed by creating a corresponding KaaSCephOperationRequest CR that
creates separate CephOsdRemoveRequest requests. It allows for automated
removal of healthy or non-healthy Ceph OSDs from a Ceph cluster and covers the
following scenarios:
Reducing hardware - all Ceph OSDs are up/in but you want to decrease the
number of Ceph OSDs by reducing the number of disks or hosts.
Hardware issues. For example, if a host unexpectedly goes down and will not
be restored, or if a disk on a host goes down and requires replacement.
This section describes the KaaSCephOperationRequest CR creation workflow,
specification, and request status.
If KaaSCephOperationRequest contains information about Ceph OSDs to
remove in a proper format, the information will be validated to eliminate
human error and avoid removing a wrong Ceph OSD.
If the osdRemove.nodes section of KaaSCephOperationRequest is
empty, the Ceph Request Controller will automatically detect Ceph OSDs for
removal, if any. Auto-detection is based not only on the information
provided in the KaaSCephCluster but also on the information from the
Ceph cluster itself.
Once the validation or auto-detection completes, the entire information
about the Ceph OSDs to remove appears in the KaaSCephOperationRequest
object: hosts they belong to, OSD IDs, disks, partitions, and so on. The
request then moves to the ApproveWaiting phase until the Operator
manually specifies the approve flag in the spec.
Manually adding an affirmative approve flag in the
KaaSCephOperationRequest spec. Once done, the Ceph Status Controller
reconciliation pauses until the request is handled and executes the
following:
Stops regular Ceph Controller reconciliation
Removes Ceph OSDs
Runs batch jobs to clean up the device, if possible
Removes host information from the Ceph cluster if the entire Ceph node is
removed
Marks the request with an appropriate result and a description of
any issues that occurred
Note
If the request completes successfully, Ceph Controller
reconciliation resumes. Otherwise, it remains paused until the issue is
resolved.
Device cleanup jobs are not removed automatically and are kept in
the ceph-lcm-mirantis namespace along with pods containing
information about the executed actions. The jobs have the following
labels:
Additionally, jobs are labeled with disk names that will be cleaned up,
such as vdb=true. You can remove a single job or a group of jobs
using any label described above, such as host, disk, and so on.
This section describes the KaaSCephOperationRequest CR specification used
to automatically create a CephOsdRemoveRequest request. For the procedure
workflow, see Creating a Ceph OSD removal request.
osdRemove
Describes the definition for the CephOsdRemoveRequest spec. For
details on the osdRemove parameters, see the tables below.
kaasCephCluster
Defines the KaaSCephCluster resource on which the KaaSCephOperationRequest
depends. Use the kaasCephCluster parameter if the name or
project of the corresponding Container Cloud cluster differs from the
default one:
approve
Flag that indicates whether a request is ready to execute removal. Can
only be manually enabled by the Operator. For example:
spec:
  osdRemove:
    approve: true
keepOnFail
Flag used to keep requests in handling and not to proceed to the next
request if the Validating or Processing phases failed. The
request will remain in the InputWaiting state until the flag or the
request itself is removed or the request spec is updated.
If the Validation phase fails, you can update the
spec.osdRemove.nodes section in KaaSCephOperationRequest to avoid issues
and re-run the validation. If the Processing phase fails, you can
resolve issues without resuming the Ceph Controller reconciliation and
proceeding to the next request and apply the required actions to keep
cluster data.
For example:
spec:
  osdRemove:
    keepOnFail: true
resolved
Optional. Flag that marks a finished request, even if it failed, to keep
it in history. It allows resuming the Ceph Controller reconciliation
without removing the failed request. The flag is used only by Ceph
Controller and has no effect on request processing. Can only be manually
specified. For example:
spec:
  osdRemove:
    resolved: true
resumeFailed
Optional. Flag used to resume a failed request and proceed with Ceph OSD
removal if the KeepOnFail is set and the request status is
InputWaiting. For example:
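A sketch following the same pattern as the other request flags:
spec:
  osdRemove:
    resumeFailed: true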
completeCleanUp
Flag used to clean up an entire node and drop it from the CRUSH map.
Mutually exclusive with cleanupByDevice and cleanupByOsdId.
cleanupByDevice
List that describes devices to clean up by name or device path as they
were specified in KaaSCephCluster. Mutually exclusive with
completeCleanUp and cleanupByOsdId. Includes the following
parameters:
name - name of the device to remove from the Ceph cluster.
Mutually exclusive with path.
path - the by-path symlink of the device to remove from the Ceph cluster.
Mutually exclusive with name. Also supports device removal by the
by-id symlink.
Warning
Since MOSK 23.3, Mirantis does not recommend setting device
name or device by-path symlink in the cleanupByDevice field
as these identifiers are not persistent and can change at node boot. Remove
Ceph OSDs with by-id symlinks specified in the path field or use
cleanupByOsdId instead.
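For illustration, consider the following hedged sketch of the osdRemove.nodes
section that combines the cleanup methods described above; the field layout is
inferred from those descriptions and the device identifiers match the example
explained below:
spec:
  osdRemove:
    nodes:
      node-a:
        completeCleanUp: true
      node-b:
        cleanupByOsdId:
        - 1
        - 15
        - 25
      node-c:
        cleanupByDevice:
        - name: sdb
        - path: /dev/disk/by-path/pci-0000:00:1c.5
        - path: /dev/disk/by-id/scsi-SATA_HGST_HUS724040AL_PN1334PEHN18ZS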
For node-a, full cleanup, including all OSDs on the node, node drop from
the CRUSH map, and cleanup of all disks used for Ceph OSDs on this node.
For node-b, cleanup of Ceph OSDs with IDs 1, 15, and 25
along with the related disk information.
For node-c, cleanup of the device with name sdb, the device with
the by-path symlink /dev/disk/by-path/pci-0000:00:1c.5, and the device with
the by-id symlink /dev/disk/by-id/scsi-SATA_HGST_HUS724040AL_PN1334PEHN18ZS,
along with dropping of the OSDs running on these devices.
This section describes the status.osdRemoveStatus.removeInfo fields of the
KaaSCephOperationRequest CR that you can use to review a Ceph OSD or node
removal phases. The following diagram represents the phases flow:
phase
Describes the current request phase that can be one of:
Pending - the request is created and placed in the request queue.
Validation - the request is taken from the queue and the provided
information is being validated.
ApproveWaiting - the request passed the validation phase, is ready
to execute, and is waiting for user confirmation through the approve
flag.
Processing - the request is executing following the next phases:
Pending - marking the current Ceph OSD for removal.
Rebalancing - the Ceph OSD is moved out, waiting until it is
rebalanced. If the current Ceph OSD is down or already out, the next
phase takes place.
Removing - purging the Ceph OSD and its authorization key.
Removed - the Ceph OSD has been successfully removed.
Failed - the Ceph OSD failed to remove.
Completed - the request executed with no issues.
CompletedWithWarnings - the request executed with non-critical
issues. Review the output, action may be required.
InputWaiting - during the Validation or Processing phases,
critical issues occurred that require attention. If issues occurred
during validation, update osdRemove information, if present, and
re-run validation. If issues occurred during processing, review the
reported issues and manually resolve them.
Failed - the request failed during the Validation or
Processing phases.
removeInfo
The overall information about the Ceph OSDs to remove: final removal
map, issues, and warnings. Once the Processing phase succeeds,
removeInfo will be extended with the removal status for each node
and Ceph OSD. In case of an entire node removal, the status will contain
the status itself and an error message, if any.
The removeInfo.osdMapping field contains information about:
Ceph OSDs removal status.
Batch job reference for the device cleanup: its name, status, and
error, if any. The batch job status for the device cleanup will be
either Failed, Completed, or Skipped. The Skipped
status is used when a host is down, a disk has failed, or an error
occurred while obtaining the ceph-volume information.
Ceph OSD deployment removal status and the related Ceph OSD name. The
status will be either Failed or Removed.
messages
Informational messages describing the reason for the request transition
to the next phase.
conditions
History of spec updates for the request.
Example of status.osdRemoveStatus.removeInfo after
successful Validation
The example above is based on the example spec provided in
KaaSCephOperationRequest OSD removal specification.
During the Validation phase, the provided information was validated and
reflects the final map of the Ceph OSDs to remove:
For node-a, Ceph OSDs with IDs 2, 6, and 11 will be removed
with the related disk and its information: all block devices, names, paths,
and disk class.
For node-b, the Ceph OSDs with IDs 1, 15, and 25 will be
removed with the related disk information.
For node-c, the Ceph OSD with ID 8 will be removed, which is placed
on the specified sdb device. The related partition on the sdf disk,
which is used as the BlueStore metadata device, will be cleaned up keeping
the disk itself untouched. Other partitions on that device will not be
touched.
Example of removeInfo with removeStatus failed by timeout
removeInfo:
  cleanUpMap:
    "node-a":
      completeCleanUp: true
      osdMapping:
        "2":
          removeStatus:
            osdRemoveStatus:
              errorReason: Timeout (30m0s) reached for waiting pg rebalance for osd 2
              status: Failed
          deviceMapping:
            "sdb":
              path: "/dev/disk/by-path/pci-0000:00:0a.0"
              partition: "/dev/ceph-a-vg_sdb/osd-block-b-lv_sdb"
              type: "block"
              class: "hdd"
              zapDisk: true
Note
In case of failures similar to the examples above, review the
ceph-request-controller logs and the Ceph cluster status. Such failures
may simply indicate timeout and retry issues. If no other issues were found,
re-create the request with a new name and skip adding successfully removed
Ceph OSDs or Ceph nodes.
Mirantis Ceph Controller simplifies Ceph cluster management by automating
LCM operations. This section describes how to add, remove, or reconfigure Ceph
nodes.
Note
When adding a Ceph node with the Ceph Monitor role, if any issues occur with
the Ceph Monitor, rook-ceph removes it and adds a new Ceph Monitor instead,
named using the next alphabetic character in order. Therefore, the Ceph Monitor
names may not follow the alphabetical order. For example, a, b, d,
instead of a, b, c.
Prepare a new machine for the required managed cluster as described in
Add a machine. During machine preparation, update the settings of the
related bare metal host profile for the Ceph node being replaced with the
required machine devices as described in Create a custom bare metal host profile.
Open the KaasCephCluster CR of a managed cluster for editing:
To use a new Ceph node for a Ceph Monitor or Ceph Manager deployment,
also specify the roles parameter.
Reducing the number of Ceph Monitors is not supported and causes
removal of Ceph Monitor daemons from random nodes.
Removal of the mgr role in the nodes section of the
KaaSCephCluster CR does not remove Ceph Managers. To remove a Ceph
Manager from a node, remove it from the nodes spec and manually
delete the mgr pod in the Rook namespace.
Verify that all new Ceph daemons for the specified node have been
successfully deployed in the Ceph cluster. The fullClusterInfo section
should not contain any issues.
status:
  fullClusterInfo:
    daemonsStatus:
      mgr:
        running: a is active mgr
        status: Ok
      mon:
        running: '3/3 mons running: [a b c] in quorum'
        status: Ok
      osd:
        running: '3/3 running: 3 up, 3 in'
        status: Ok
To remove a Ceph node with a mon role, first move the Ceph
Monitor to another node and remove the mon role from the Ceph node as
described in Move a Ceph Monitor daemon to another node.
Open the KaasCephCluster CR of a managed cluster for editing:
Mirantis Ceph Controller simplifies Ceph cluster management by automating LCM
operations. This section describes how to add, remove, or reconfigure Ceph
OSDs.
Manually prepare the required machine devices with LVM2 on the existing
node because BareMetalHostProfile does not support in-place changes.
To add a Ceph OSD to an existing or hot-plugged raw device
If you want to add a Ceph OSD on top of a raw device that already exists
on a node or is hot-plugged, add the required device using the following
guidelines:
You can add a raw device to a node during node deployment.
If a node supports adding devices without node reboot, you can hot plug
a raw device to a node.
If a node does not support adding devices without node reboot, you can
hot plug a raw device during node shutdown. In this case, complete the
following steps:
Enable maintenance mode on the managed cluster.
Turn off the required node.
Attach the required raw device to the node.
Turn on the required node.
Disable maintenance mode on the managed cluster.
Open the KaasCephCluster CR of a managed cluster for editing:
Substitute <managedClusterProjectName> with the corresponding value.
In the nodes.<machineName>.storageDevices section, specify the
parameters for a Ceph OSD as required. For the parameters description, see
Node parameters.
The example configuration of the nodes section with the new node:
Since MOSK 23.3
nodes:
  kaas-node-5bgk6:
    roles:
    - mon
    - mgr
    storageDevices:
    - config: # existing item
        deviceClass: hdd
      fullPath: /dev/disk/by-id/scsi-SATA_HGST_HUS724040AL_PN1334PEHN18ZS
    - config: # new item
        deviceClass: hdd
      fullPath: /dev/disk/by-id/scsi-0ATA_HGST_HUS724040AL_PN1334PEHN1VBC
Before MOSK 23.3
nodes:
  kaas-node-5bgk6:
    roles:
    - mon
    - mgr
    storageDevices:
    - config: # existing item
        deviceClass: hdd
      name: sdb
    - config: # new item
        deviceClass: hdd
      name: sdc
Warning
Since MOSK 23.3, Mirantis highly recommends
using the non-wwn by-id symlinks to specify storage devices in the
storageDevices list.
When using the non-recommended Ceph pools replicated.size of
less than 3, Ceph OSD removal cannot be performed. The minimal replica
size equals a rounded up half of the specified replicated.size.
For example, if replicated.size is 2, the minimal replica size is
1, and if replicated.size is 3, then the minimal replica size
is 2. The replica size of 1 allows Ceph having PGs with only one
Ceph OSD in the acting state, which may cause a PG_TOO_DEGRADED
health warning that blocks Ceph OSD removal. Mirantis recommends setting
replicated.size to 3 for each Ceph pool.
Open the KaasCephCluster CR of a managed cluster for editing:
Substitute <managedClusterProjectName> with the corresponding value.
Remove the required Ceph OSD specification from the
spec.cephClusterSpec.nodes.<machineName>.storageDevices list:
The example configuration of the nodes section with the new node:
Since MOSK 23.3
nodes:
  kaas-node-5bgk6:
    roles:
    - mon
    - mgr
    storageDevices:
    - config:
        deviceClass: hdd
      fullPath: /dev/disk/by-id/scsi-SATA_HGST_HUS724040AL_PN1334PEHN18ZS
    - config: # remove the entire item entry from storageDevices list
        deviceClass: hdd
      fullPath: /dev/disk/by-id/scsi-0ATA_HGST_HUS724040AL_PN1334PEHN1VBC
Before MOSK 23.3
nodes:
  kaas-node-5bgk6:
    roles:
    - mon
    - mgr
    storageDevices:
    - config:
        deviceClass: hdd
      name: sdb
    - config: # remove the entire item entry from storageDevices list
        deviceClass: hdd
      name: sdc
Create a YAML template for the KaaSCephOperationRequest CR. Select from
the following options:
Remove Ceph OSD by device name, by-path symlink, or by-id symlink:
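A hedged template; the apiVersion and field layout are assumptions to check
against the product reference, and the placeholders match the substitution
note below:
apiVersion: kaas.mirantis.com/v1alpha1
kind: KaaSCephOperationRequest
metadata:
  name: remove-osd-request
  namespace: <managedClusterProjectName>
spec:
  kaasCephCluster:
    name: <kaasCephClusterName>
    namespace: <managedClusterProjectName>
  osdRemove:
    nodes:
      <machineName>:
        cleanupByDevice:
        - path: /dev/disk/by-id/<deviceByID>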
Substitute <managedClusterProjectName> with the corresponding cluster
namespace and <kaasCephClusterName> with the corresponding
KaaSCephCluster name.
Warning
Since MOSK 23.3, Mirantis does not recommend setting device
name or device by-path symlink in the cleanupByDevice field
as these identifiers are not persistent and can change at node boot. Remove
Ceph OSDs with by-id symlinks specified in the path field or use
cleanupByOsdId instead.
Since MOSK 23.1, cleanupByDevice is not
supported if a device was physically removed from a node. Therefore,
use cleanupByOsdId instead. For details, see
Remove a failed Ceph OSD by Ceph OSD ID.
Before MOSK
23.1, if the storageDevice item was specified with by-id,
specify the path parameter in the cleanupByDevice section
instead of name.
If the storageDevice item was specified with a by-path device
path, specify the path parameter in the cleanupByDevice
section instead of name.
Add, remove, or reconfigure Ceph OSDs with metadata devices
Mirantis Ceph Controller simplifies Ceph cluster management by automating LCM
operations. This section describes how to add, remove, or reconfigure Ceph
OSDs with a separate metadata device.
From the Ceph disks defined in the BareMetalHostProfile object that was
configured using the Configure Ceph disks in a host profile procedure, select one disk for
data and one logical volume for metadata of a Ceph OSD to be added to the
Ceph cluster.
Note
If you add a new disk after machine provisioning, manually
prepare the required machine devices using Logical Volume Manager (LVM) 2
on the existing node because BareMetalHostProfile does not support
in-place changes.
To add a Ceph OSD to an existing or hot-plugged raw device
If you want to add a Ceph OSD on top of a raw device that already exists
on a node or is hot-plugged, add the required device using the following
guidelines:
You can add a raw device to a node during node deployment.
If a node supports adding devices without node reboot, you can hot plug
a raw device to a node.
If a node does not support adding devices without node reboot, you can
hot plug a raw device during node shutdown. In this case, complete the
following steps:
Substitute <managedClusterProjectName> with the corresponding value.
In the nodes.<machineName>.storageDevices section, specify the
parameters for a Ceph OSD as required. For the parameters description, see
Node parameters.
The example configuration of the nodes section with the new node:
Since MOSK 23.3
nodes:
  kaas-node-5bgk6:
    roles:
    - mon
    - mgr
    storageDevices:
    - config: # existing item
        deviceClass: hdd
      fullPath: /dev/disk/by-id/scsi-SATA_HGST_HUS724040AL_PN1334PEHN18ZS
    - config: # new item
        deviceClass: hdd
        metadataDevice: /dev/bluedb/meta_1
      fullPath: /dev/disk/by-id/scsi-0ATA_HGST_HUS724040AL_PN1334PEHN1VBC
Before MOSK 23.3
nodes:
  kaas-node-5bgk6:
    roles:
    - mon
    - mgr
    storageDevices:
    - config: # existing item
        deviceClass: hdd
      name: sdb
    - config: # new item
        deviceClass: hdd
        metadataDevice: /dev/bluedb/meta_1
      name: sdc
Warning
Since MOSK 23.3, Mirantis highly recommends
using the non-wwn by-id symlinks to specify storage devices in the
storageDevices list.
Ceph OSD removal implies the usage of the
KaaSCephOperationRequest custom resource (CR). For workflow overview,
spec and phases description, see High-level workflow of Ceph OSD or node removal.
Warning
When using the non-recommended Ceph pools replicated.size of
less than 3, Ceph OSD removal cannot be performed. The minimal replica
size equals a rounded up half of the specified replicated.size.
For example, if replicated.size is 2, the minimal replica size is
1, and if replicated.size is 3, then the minimal replica size
is 2. A replica size of 1 allows Ceph to have PGs with only one
Ceph OSD in the acting state, which may cause a PG_TOO_DEGRADED
health warning that blocks Ceph OSD removal. Mirantis recommends setting
replicated.size to 3 for each Ceph pool.
Open the KaasCephCluster object of the managed cluster for editing:
Substitute <managedClusterProjectName> with the corresponding value.
Remove the required Ceph OSD specification from the
spec.cephClusterSpec.nodes.<machineName>.storageDevices list:
The example configuration of the nodes section with the new node:
Since MOSK 23.3
nodes:
  kaas-node-5bgk6:
    roles:
    - mon
    - mgr
    storageDevices:
    - config:
        deviceClass: hdd
      fullPath: /dev/disk/by-id/scsi-SATA_HGST_HUS724040AL_PN1334PEHN18ZS
    - config: # remove the entire item entry from storageDevices list
        deviceClass: hdd
        metadataDevice: /dev/bluedb/meta_1
      fullPath: /dev/disk/by-id/scsi-0ATA_HGST_HUS724040AL_PN1334PEHN1VBC
Before MOSK 23.3
nodes:
  kaas-node-5bgk6:
    roles:
    - mon
    - mgr
    storageDevices:
    - config:
        deviceClass: hdd
      name: sdb
    - config: # remove the entire item entry from storageDevices list
        deviceClass: hdd
        metadataDevice: /dev/bluedb/meta_1
      name: sdc
Create a YAML template for the KaaSCephOperationRequest CR. For example:
Substitute <managedClusterProjectName> with the corresponding cluster
namespace and <kaasCephClusterName> with the corresponding
KaaSCephCluster name.
Warning
Since MOSK 23.3, Mirantis does not recommend setting device
name or device by-path symlink in the cleanupByDevice field
as these identifiers are not persistent and can change at node boot. Remove
Ceph OSDs with by-id symlinks specified in the path field or use
cleanupByOsdId instead.
Since MOSK 23.1,
cleanupByDevice is not supported if a device was physically
removed from a node. Therefore, use cleanupByOsdId instead. For
details, see Remove a failed Ceph OSD by Ceph OSD ID.
Before MOSK 23.1,
if the storageDevice item was specified with by-id, specify
the path parameter in the cleanupByDevice section instead of
name.
If the storageDevice item was specified with a by-path device
path, specify the path parameter in the cleanupByDevice section
instead of name.
Apply the template on the management cluster in the corresponding namespace:
kubectl apply -f remove-osd-<machineName>-sdb.yaml
Verify that the corresponding request has been created:
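For example, assuming the request was created in the managed cluster namespace on the management cluster:
kubectl -n <managedClusterProjectName> get kaascephoperationrequest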
Reconfigure a partition of a Ceph OSD metadata device¶
There is no hot reconfiguration procedure for existing Ceph OSDs. To
reconfigure an existing Ceph node, remove and re-add a Ceph OSD with a
metadata device using the following options:
Since Container Cloud 2.24.0, if metadata device partitions are specified
in the BareMetalHostProfile object as described in Configure Ceph disks in a host profile,
the metadata device definition is an LVM path in metadataDevice of the
KaaSCephCluster object.
Therefore, automated LCM will clean up the logical volume without removal
and it can be reused. For this reason, to reconfigure a partition of a Ceph
OSD metadata device:
Before MOSK 23.2 or if metadata device partitions are not
specified in the BareMetalHostProfile object as described in
Configure Ceph disks in a host profile, the most common definition of a metadata device is a
full device name (by-path or by-id) in metadataDevice of the
KaaSCephCluster object for Ceph OSD. For example,
metadataDevice:/dev/nvme0n1. In this case, to reconfigure a partition
of a Ceph OSD metadata device:
Remove a Ceph OSD from the Ceph cluster as described in
Remove a Ceph OSD with a metadata device. Automated LCM will clean
up the data device and will remove the metadata device partition for the
required Ceph OSD.
Reconfigure the metadata device partition manually to use it during
addition of a new Ceph OSD.
Manual reconfiguration of a metadata device partition
Log in to the Ceph node running a Ceph OSD to reconfigure.
Find the required metadata device used for Ceph OSDs that should
have LVM partitions with the osd--db substring:
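For example, you can list such logical volumes with lsblk or lvs (a sketch; the exact filtering depends on your device naming):
lsblk -o NAME,SIZE,TYPE | grep "osd--db"
lvs -o lv_name,vg_name,lv_size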
Capture the volume group UUID and logical volume sizes. In the
example above, the volume group UUID is
ceph--7831901d--398e--415d--8941--e78486f3b019 and the size
is 16G.
Capture the volume group with the name that matches the prefix of
LVM partitions of the metadata device. In the example above, the
required volume group is
ceph-7831901d-398e-415d-8941-e78486f3b019.
Make a manual LVM partitioning for the new Ceph OSD. Create a new
logical volume in the obtained volume group:
lvcreate -L <lvSize> -n <lvName> <vgName>
Substitute the following parameters:
<lvSize> with the previously obtained logical volume size.
In the example above, it is 16G.
<lvName> with a new logical volume name. For example,
meta_1.
<vgName> with the previously obtained volume group name.
In the example above, it is
ceph-7831901d-398e-415d-8941-e78486f3b019.
Note
Manually created partitions can be removed only
manually, or during a complete metadata disk removal, or during
the Machine object removal or re-provisioning.
Add the same Ceph OSD but with a modified configuration and manually
created logical volume of the metadata device as described in
Add a Ceph OSD with a metadata device.
For example, instead of metadataDevice:/dev/bluedb/meta_1 define
metadataDevice:/dev/ceph-7831901d-398e-415d-8941-e78486f3b019/meta_1
that was manually created in the previous step.
After a physical disk replacement, you can use Ceph LCM API to redeploy
a failed Ceph OSD. The common flow of replacing a failed Ceph OSD is as
follows:
Remove the obsolete Ceph OSD from the Ceph cluster by device name, by Ceph
OSD ID, or by path.
Add a new Ceph OSD on the new disk to the Ceph cluster.
Remove a failed Ceph OSD by device name, path, or ID¶
Warning
The procedure below presupposes that the operator knows the exact
device name, by-path, or by-id of the replaced device, as well as on
which node the replacement occurred.
Warning
Since Container Cloud 2.23.1 (Cluster release 12.7.0),
a Ceph OSD removal using by-path, by-id, or device name is not
supported if a device was physically removed from a node. Therefore, use
cleanupByOsdId instead. For details, see
Remove a failed Ceph OSD by Ceph OSD ID.
Warning
Since MOSK 23.3, Mirantis does not recommend setting device
name or device by-path symlink in the cleanupByDevice field
as these identifiers are not persistent and can change at node boot. Remove
Ceph OSDs with by-id symlinks specified in the path field or use
cleanupByOsdId instead.
Substitute <managedClusterProjectName> with the corresponding value.
In the nodes section, remove the required device:
spec:
  cephClusterSpec:
    nodes:
      <machineName>:
        storageDevices:
        - name: <deviceName> # remove the entire item from storageDevices list
          # fullPath: <deviceByPath> if device is specified with symlink instead of name
          config:
            deviceClass: hdd
Substitute <machineName> with the machine name of the node where the
device <deviceName> or <deviceByPath> is going to be replaced.
Save KaaSCephCluster and close the editor.
Create a KaaSCephOperationRequest CR template and save it as
replace-failed-osd-<machineName>-<deviceName>-request.yaml:
apiVersion: kaas.mirantis.com/v1alpha1
kind: KaaSCephOperationRequest
metadata:
  name: replace-failed-osd-<machineName>-<deviceName>
  namespace: <managedClusterProjectName>
spec:
  osdRemove:
    nodes:
      <machineName>:
        cleanupByDevice:
        - name: <deviceName>
          # If a device is specified with by-path or by-id instead of
          # name, path: <deviceByPath> or <deviceById>.
  kaasCephCluster:
    name: <kaasCephClusterName>
    namespace: <managedClusterProjectName>
Substitute <kaasCephClusterName> with the corresponding
KaaSCephCluster resource from the <managedClusterProjectName>
namespace.
Deploy a new device after removal of a failed one¶
Note
You can spawn Ceph OSD on a raw device, but it must be clean and
without any data or partitions. If you want to add a device that was in use,
also ensure it is raw and clean. To clean up all data and partitions from a
device, refer to official Rook documentation.
If you want to add a Ceph OSD on top of a raw device that already exists
on a node or is hot-plugged, add the required device using the following
guidelines:
You can add a raw device to a node during node deployment.
If a node supports adding devices without node reboot, you can hot plug
a raw device to a node.
If a node does not support adding devices without node reboot, you can
hot plug a raw device during node shutdown. In this case, complete the
following steps:
Enable maintenance mode on the managed cluster.
Turn off the required node.
Attach the required raw device to the node.
Turn on the required node.
Disable maintenance mode on the managed cluster.
Open the KaasCephCluster CR of a managed cluster for editing:
Substitute <managedClusterProjectName> with the corresponding value.
In the nodes section, add a new device:
spec:
  cephClusterSpec:
    nodes:
      <machineName>:
        storageDevices:
        - fullPath: <deviceByID> # Since 2.25.0 (17.0.0) if device is supposed to be added with by-id
          # name: <deviceByID> # Prior MCC 2.25.0 if device is supposed to be added with by-id
          # fullPath: <deviceByPath> # if device is supposed to be added with by-path
          config:
            deviceClass: hdd
Substitute <machineName> with the machine name of the node where device
<deviceName> or <deviceByPath> is going to be added as a Ceph OSD.
Verify that the new Ceph OSD has appeared in the Ceph cluster and is in
and up. The fullClusterInfo section should not contain any issues.
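For example, you can verify the Ceph OSD state from the rook-ceph-tools Pod of the managed cluster (a sketch; the exact Pod or deployment name may differ in your environment):
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph osd tree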
The document describes various scenarios of a Ceph OSD outage and recovery or
replacement. More specifically, this section describes how to replace a failed
Ceph OSD with a metadata device:
If the metadata device is specified as a logical volume in the
BareMetalHostProfile object and defined in the KaaSCephCluster
object as a logical volume path
If the metadata device is specified in the KaaSCephCluster object as a
device name
Note
Ceph OSD replacement implies the usage of the
KaaSCephOperationRequest custom resource (CR). For workflow overview,
spec and phases description, see High-level workflow of Ceph OSD or node removal.
Replace a failed Ceph OSD with a metadata device as a logical volume path¶
You can apply the below procedure in the following cases:
A Ceph OSD failed without data or metadata device outage. In this case,
first remove a failed Ceph OSD and clean up all corresponding disks and
partitions. Then add a new Ceph OSD to the same data and metadata paths.
A Ceph OSD failed with data or metadata device outage. In this case, you
also first remove a failed Ceph OSD and clean up all corresponding disks and
partitions. Then add a new Ceph OSD to a newly replaced data device with the
same metadata path.
Note
The below procedure also applies to manually created metadata
partitions.
Remove a failed Ceph OSD by ID with a defined metadata device¶
Identify the ID of Ceph OSD related to a failed device. For example, use
the Ceph CLI in the rook-ceph-tools Pod:
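For example (a sketch; the exact commands depend on how the failed device is identified in your environment):
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph device ls | grep <nodeName>
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph osd metadata <osdId> | grep -E 'devices|hostname'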
Substitute <managedClusterProjectName> with the corresponding value.
In the nodes section:
Find and capture the metadataDevice path to reuse it during
re-creation of the Ceph OSD.
Remove the required device:
Example configuration snippet:
spec:
  cephClusterSpec:
    nodes:
      <machineName>:
        storageDevices:
        - name: <deviceName> # remove the entire item from the storageDevices list
          # fullPath: <deviceByPath> if device is specified using by-path instead of name
          config:
            deviceClass: hdd
            metadataDevice: /dev/bluedb/meta_1
In the example above, <machineName> is the name of machine that relates
to the node on which the device <deviceName> or <deviceByPath> must
be replaced.
Create a KaaSCephOperationRequest CR template and save it as
replace-failed-osd-<machineName>-<osdID>-request.yaml:
<machineName> - name of the machine on which the device is being
replaced, for example, worker-1
<nodeName> - underlying node name of the machine, for example,
kaas-node-5a74b669-7e53-4535-aabd-5b509ec844af
<osdId> - Ceph OSD ID for the device being replaced, for example,
1
<dataDeviceByPath> - by-path of the device placed on the node,
for example, /dev/disk/by-path/pci-0000:00:1t.9
<dataDevice> - name of the device placed on the node, for example,
/dev/vde
<metadataDevice> - metadata name of the device placed on the node,
for example, /dev/vdf
<metadataDeviceByPath> - metadata by-path of the device placed
on the node, for example, /dev/disk/by-path/pci-0000:00:12.0
Note
The partitions that are manually created or configured using the
BareMetalHostProfile object can be removed only manually, or during a
complete metadata disk removal, or during the Machine object removal
or re-provisioning.
Verify that the cleanUpMap section matches the required removal and
wait for the ApproveWaiting phase to appear in status:
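For example, assuming <requestName> is the name defined in the template created earlier in this procedure:
kubectl -n <managedClusterProjectName> get kaascephoperationrequest <requestName> -o yaml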
Review the following status fields of the KaaSCephOperationRequest
CR request processing:
status.phase - current state of request processing
status.messages - description of the current phase
status.conditions - full history of request processing before the
current phase
status.removeInfo.issues and status.removeInfo.warnings - error
and warning messages occurred during request processing, if any
Verify that the KaaSCephOperationRequest has been completed.
For example:
status:
  phase: Completed # or CompletedWithWarnings if there are non-critical issues
Re-create a Ceph OSD with the same metadata partition¶
Note
You can spawn Ceph OSD on a raw device, but it must be clean and
without any data or partitions. If you want to add a device that was in use,
also ensure it is raw and clean. To clean up all data and partitions from a
device, refer to official Rook documentation.
If you want to add a Ceph OSD on top of a raw device that already exists
on a node or is hot-plugged, add the required device using the following
guidelines:
You can add a raw device to a node during node deployment.
If a node supports adding devices without node reboot, you can hot plug
a raw device to a node.
If a node does not support adding devices without node reboot, you can
hot plug a raw device during node shutdown. In this case, complete the
following steps:
Substitute <managedClusterProjectName> with the corresponding value.
In the nodes section, add the replaced device with the same
metadataDevice path as on the removed Ceph OSD. For example:
spec:
  cephClusterSpec:
    nodes:
      <machineName>:
        storageDevices:
        - name: <deviceByID> # Recommended. Add a new device by ID, for example, /dev/disk/by-id/...
          #fullPath: <deviceByPath> # Add a new device by path, for example, /dev/disk/by-path/...
          config:
            deviceClass: hdd
            metadataDevice: /dev/bluedb/meta_1 # Must match the value of the previously removed OSD
Substitute <machineName> with the machine name of the node where the
new device <deviceByID> or <deviceByPath> must be added.
Wait for the replaced disk to apply to the Ceph cluster as a new Ceph OSD.
You can monitor the application state using either the status section
of the KaaSCephCluster CR or in the rook-ceph-tools Pod:
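For example (a sketch; the first command inspects the status section on the management cluster, the second runs in the rook-ceph-tools Pod of the managed cluster):
kubectl -n <managedClusterProjectName> get kcc -o yaml
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph osd df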
Replace a failed Ceph OSD disk with a metadata device as a device name¶
You can apply the below procedure if a Ceph OSD failed with data disk outage
and the metadata partition is not specified in the BareMetalHostProfile
custom resource (CR). This scenario implies that the Ceph cluster
automatically creates a required metadata logical volume on a desired device.
Remove a Ceph OSD with a metadata device as a device name¶
While editing KaasCephCluster in the nodes section, capture the
metadataDevice path to reuse it during re-creation of the Ceph OSD.
Example of the spec.nodes section:
spec:
  cephClusterSpec:
    nodes:
      <machineName>:
        storageDevices:
        - name: <deviceName> # remove the entire item from the storageDevices list
          # fullPath: <deviceByPath> if device is specified using by-path instead of name
          config:
            deviceClass: hdd
            metadataDevice: /dev/nvme0n1
In the example above, save the metadataDevice device name
/dev/nvme0n1.
During verification of removeInfo, capture the usedPartition value
of the metadata device located in the deviceMapping.<metadataDevice>
section.
In the example above, capture the following values from the
<metadataDevice> section:
ceph-b0c70c72-8570-4c9d-93e9-51c3ab4dd9f9 - name of the volume group
that contains all metadata partitions on the <metadataDevice> disk
osd-db-ecf64b20-1e07-42ac-a8ee-32ba3c0b7e2f - name of the logical
volume that relates to a failed Ceph OSD
Re-create the metadata partition on the existing metadata disk¶
After you remove the Ceph OSD disk, manually create a separate logical volume
for the metadata partition in an existing volume group on the metadata device:
lvcreate -l 100%FREE -n meta_1 <vgName>
Substitute <vgName> with the name of the volume group captured in the
usedPartition parameter.
Note
If you removed more than one OSD, replace 100%FREE with the
corresponding partition size. For example:
lvcreate -l <partitionSize> -n meta_1 <vgName>
Substitute <partitionSize> with the corresponding value that matches the
size of other partitions placed on the affected metadata drive. To obtain
<partitionSize>, use the output of the lvs command. For example:
16G.
During execution of the lvcreate command, the system asks you to
wipe the found bluestore label on a metadata device. For example:
Re-create the Ceph OSD with the re-created metadata partition¶
Note
You can spawn Ceph OSD on a raw device, but it must be clean and
without any data or partitions. If you want to add a device that was in use,
also ensure it is raw and clean. To clean up all data and partitions from a
device, refer to official Rook documentation.
If you want to add a Ceph OSD on top of a raw device that already exists
on a node or is hot-plugged, add the required device using the following
guidelines:
You can add a raw device to a node during node deployment.
If a node supports adding devices without node reboot, you can hot plug
a raw device to a node.
If a node does not support adding devices without node reboot, you can
hot plug a raw device during node shutdown. In this case, complete the
following steps:
Substitute <managedClusterProjectName> with the corresponding value.
In the nodes section, add the replaced device with the same
metadataDevice path as in the previous Ceph OSD:
spec:
  cephClusterSpec:
    nodes:
      <machineName>:
        storageDevices:
        - fullPath: <deviceByID> # Recommended since MCC 2.25.0 (17.0.0). Add a new device by-id symlink, for example, /dev/disk/by-id/...
          #name: <deviceByID> # Add a new device by ID, for example, /dev/disk/by-id/...
          #fullPath: <deviceByPath> # Add a new device by path, for example, /dev/disk/by-path/...
          config:
            deviceClass: hdd
            metadataDevice: /dev/<vgName>/meta_1
Substitute <machineName> with the machine name of the node where the
new device <deviceByID> or <deviceByPath> must be added.
Also specify metadataDevice with the path to the logical volume created
during the Re-create the metadata partition on the existing metadata disk procedure.
Wait for the replaced disk to apply to the Ceph cluster as a new Ceph OSD.
You can monitor the application state using either the status section
of the KaaSCephCluster CR or in the rook-ceph-tools Pod:
This section describes the scenario when an underlying metadata device fails
with all related Ceph OSDs. In this case, the only solution is to remove all
Ceph OSDs related to the failed metadata device, then attach a device that
will be used as a new metadata device, and re-create all affected Ceph OSDs.
Caution
If you used BareMetalHostProfile to automatically partition
the failed device, you must create a manual partition of the new device
because BareMetalHostProfile does not support hot-load changes and
creates an automatic device partition only during node provisioning.
Remove failed Ceph OSDs with the affected metadata device¶
Save the KaaSCephCluster specification of all Ceph OSDs affected by the
failed metadata device to re-use this specification during re-creation of
Ceph OSDs after disk replacement.
Identify Ceph OSD IDs related to the failed metadata device, for example,
using Ceph CLI in the rook-ceph-tools Pod:
Substitute <managedClusterProjectName> with the corresponding value.
In the nodes section, remove all storageDevices items that relate
to the failed metadata device. For example:
spec:
  cephClusterSpec:
    nodes:
      <machineName>:
        storageDevices:
        - name: <deviceName1> # remove the entire item from the storageDevices list
          # fullPath: <deviceByPath> if device is specified using symlink instead of name
          config:
            deviceClass: hdd
            metadataDevice: <metadataDevice>
        - name: <deviceName2> # remove the entire item from the storageDevices list
          config:
            deviceClass: hdd
            metadataDevice: <metadataDevice>
        - name: <deviceName3> # remove the entire item from the storageDevices list
          config:
            deviceClass: hdd
            metadataDevice: <metadataDevice>
        ...
In the example above, <machineName> is the machine name of the node
where the metadata device <metadataDevice> must be replaced.
Create a KaaSCephOperationRequest CR template and save it as
replace-failed-meta-<machineName>-<metadataDevice>-request.yaml:
Partition the replaced metadata device by N logical volumes (LVs), where N
is the number of Ceph OSDs previously located on a failed metadata device.
Calculate the new metadata LV percentage of used volume group capacity
using the 100/N formula.
Log in to the node with the replaced metadata disk.
Create an LVM physical volume atop the replaced metadata device:
pvcreate <metadataDisk>
Substitute <metadataDisk> with the replaced metadata device.
Create an LVM volume group atop of the physical volume:
vgcreate bluedb <metadataDisk>
Substitute <metadataDisk> with the replaced metadata device.
Create N LVM logical volumes with the calculated capacity per each volume:
lvcreate -l <X>%VG -n meta_<i> bluedb
Substitute <X> with the result of the 100/N formula
and <i> with the current number of metadata partitions.
As a result, the replaced metadata device will have N LVM paths, for example,
/dev/bluedb/meta_1.
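For example, for three Ceph OSDs previously located on the failed metadata device (N=3, so each logical volume gets roughly 33% of the volume group), the sequence could look as follows; the device name is illustrative only:
pvcreate /dev/nvme0n1
vgcreate bluedb /dev/nvme0n1
lvcreate -l 33%VG -n meta_1 bluedb
lvcreate -l 33%VG -n meta_2 bluedb
lvcreate -l 33%VG -n meta_3 bluedb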
Re-create a Ceph OSD on the replaced metadata device¶
Note
You can spawn Ceph OSD on a raw device, but it must be clean and
without any data or partitions. If you want to add a device that was in use,
also ensure it is raw and clean. To clean up all data and partitions from a
device, refer to official Rook documentation.
Substitute <managedClusterProjectName> with the corresponding value.
In the nodes section, add the cleaned Ceph OSD device with the replaced
LVM paths of the metadata device from previous steps. For example:
spec:
  cephClusterSpec:
    nodes:
      <machineName>:
        storageDevices:
        - name: <deviceByID-1> # Recommended. Add the new device by ID /dev/disk/by-id/...
          #fullPath: <deviceByPath-1> # Add a new device by path /dev/disk/by-path/...
          config:
            deviceClass: hdd
            metadataDevice: /dev/<vgName>/<lvName-1>
        - name: <deviceByID-2> # Recommended. Add the new device by ID /dev/disk/by-id/...
          #fullPath: <deviceByPath-2> # Add a new device by path /dev/disk/by-path/...
          config:
            deviceClass: hdd
            metadataDevice: /dev/<vgName>/<lvName-2>
        - name: <deviceByID-3> # Recommended. Add the new device by ID /dev/disk/by-id/...
          #fullPath: <deviceByPath-3> # Add a new device by path /dev/disk/by-path/...
          config:
            deviceClass: hdd
            metadataDevice: /dev/<vgName>/<lvName-3>
Substitute <machineName> with the machine name of the node where the
metadata device has been replaced.
Add all data devices for re-created Ceph OSDs and specify
metadataDevice that is the path to the previously created logical
volume. Substitute <vgName> with a volume group name that contains N
logical volumes <lvName-i>.
Wait for the re-created Ceph OSDs to apply to the Ceph cluster.
You can monitor the application state using either the status section
of the KaaSCephCluster CR or in the rook-ceph-tools Pod:
After a physical node replacement, you can use the Ceph LCM API to redeploy
failed Ceph nodes. The common flow of replacing a failed Ceph node is as
follows:
Remove the obsolete Ceph node from the Ceph cluster.
Add a new Ceph node with the same configuration to the Ceph cluster.
Deploy a new Ceph node after removal of a failed one¶
Note
You can spawn Ceph OSD on a raw device, but it must be clean and
without any data or partitions. If you want to add a device that was in use,
also ensure it is raw and clean. To clean up all data and partitions from a
device, refer to official Rook documentation.
Open the KaasCephCluster CR of a managed cluster for editing:
Substitute <managedClusterProjectName> with the corresponding value.
In the nodes section, add a new device:
spec:
  cephClusterSpec:
    nodes:
      <machineName>: # add new configuration for replaced Ceph node
        storageDevices:
        - fullPath: <deviceByID> # Recommended since MCC 2.25.0 (17.0.0), non-wwn by-id symlink
          # name: <deviceByID> # Prior MCC 2.25.0, non-wwn by-id symlink
          # fullPath: <deviceByPath> # if device is supposed to be added with by-path
          config:
            deviceClass: hdd
        ...
Substitute <machineName> with the machine name of the replaced node and
configure it as required.
Warning
Since MCC 2.25.0 (17.0.0), Mirantis highly recommends using
non-wwn by-id symlinks only to specify storage devices in
the storageDevices list.
Verify that all Ceph daemons from the replaced node have appeared on the
Ceph cluster and are in and up. The fullClusterInfo section
should not contain any issues.
status:
  fullClusterInfo:
    clusterStatus:
      ceph:
        health: HEALTH_OK
      ...
    daemonStatus:
      mgr:
        running: a is active mgr
        status: Ok
      mon:
        running: '3/3 mons running: [a b c] in quorum'
        status: Ok
      osd:
        running: '3/3 running: 3 up, 3 in'
        status: Ok
You may need to manually remove a Ceph OSD, for example, in the following
cases:
If you have removed a device or node from the KaaSCephCluster spec.cephClusterSpec.nodes or spec.cephClusterSpec.nodeGroups section
with manageOsds set to false.
If you do not want to rely on Ceph LCM operations and want to manage the Ceph
OSDs life cycle manually.
To safely remove one or multiple Ceph OSDs from a Ceph cluster, perform the
following procedure for each Ceph OSD one by one.
Warning
The procedure presupposes the Ceph OSD disk or logical volumes
partition cleanup.
To remove a Ceph OSD manually:
Edit the KaaSCephCluster resource on a management cluster:
Substitute <mgmtKubeconfig> with the management cluster kubeconfig
and <managedClusterProjectName> with the project name of the managed
cluster.
In the spec.cephClusterSpec.nodes section, remove the required
storageDevices item of the corresponding node spec. If after removal
storageDevices becomes empty and the node spec has no roles specified,
also remove the node spec.
Obtain kubeconfig of the managed cluster and provide it as an
environment variable:
export KUBECONFIG=<pathToManagedKubeconfig>
Verify that all Ceph OSDs are up and in, the Ceph cluster is
healthy, and no rebalance or recovery is in progress:
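For example, from the rook-ceph-tools Pod of the managed cluster, verify that the health is HEALTH_OK and no recovery or rebalance operations are listed:
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph -s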
If you are using multiple Ceph OSDs per device or metadata
device, make sure that you can clean up the entire disk. Otherwise,
clean up only the logical volume partitions of the corresponding volume group
by running lvremove <lvpartition_uuid> from within any Ceph OSD pod that
belongs to the same host as the removed Ceph OSD.
Delete the rook-ceph/rook-ceph-osd-<ID> deployment previously scaled to
0 replicas:
kubectl -n rook-ceph delete deploy rook-ceph-osd-<ID>
Substitute <ID> with the number of the removed Ceph OSD.
Scale the rook-ceph/rook-ceph-operator deployment to 1 replica and
wait for the orchestration to complete:
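For example:
kubectl -n rook-ceph scale deploy rook-ceph-operator --replicas=1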
Migrate Ceph cluster to address storage devices using by-id¶
The by-id identifier is the only persistent device identifier for a Ceph
cluster that remains stable after the cluster upgrade or any other maintenance.
Therefore, Mirantis recommends using device by-id symlinks rather than
device names or by-path symlinks.
Container Cloud uses the device by-id identifier as the default method
of addressing the underlying devices of Ceph OSDs. Thus, you should migrate
all existing Ceph clusters, which are still utilizing the device names or
device by-path symlinks, to the by-id format.
This section explains how to configure the KaaSCephCluster specification
to use the by-id symlinks instead of disk names and by-path
identifiers as the default method of addressing storage devices.
Note
Mirantis recommends avoiding the use of wwn symlinks as
by-id identifiers because they are not persistent and may be discovered
inconsistently during node boot.
Besides migrating to by-id, consider using the fullPath field for the
by-id symlinks configuration, instead of the name field in the
spec.cephClusterSpec.nodes.storageDevices section. This approach allows for
clear understanding of field namings and their use cases.
Note
MOSK enables you to use fullPath for the
by-id symlinks since MCC 2.25.0 (Cluster release 17.0.0). For earlier
product versions, use the name field instead.
Migrate the Ceph nodes section to by-id identifiers¶
Available since MCC 2.25.0 (Cluster release 17.0.0)
Make sure that your managed cluster is not currently running an upgrade or
any other maintenance process.
Obtain the list of all KaasCephCluster storage devices that use disk
names or disk by-path as identifiers of Ceph node storage devices:
kubectl -n <managedClusterProject> get kcc -o yaml
Substitute <managedClusterProject> with the corresponding managed
cluster namespace.
Verify the items from the storageDevices sections to be moved to
the by-id symlinks. The list of the items to migrate includes:
A disk name in the name field. For example, sdc, nvme3n1,
and so on.
A disk /dev/disk/by-path symlink in the fullPath field.
For example, /dev/disk/by-path/pci-0000:00:05.0-scsi-0:0:0:2.
A disk /dev/disk/by-id symlink in the name field.
Note
This condition applies since MCC 2.25.0 (Cluster release
17.0.0).
A disk /dev/disk/by-id/wwn symlink, which is programmatically
calculated at boot.
For example, /dev/disk/by-id/wwn-0x26d546263bd312b8.
For the example above, we have to migrate both items of
managed-worker-1, both items of managed-worker-2, and the first
item of managed-worker-3. The second item of managed-worker-3
has already been configured in the required format, therefore, we are
leaving it as is.
To migrate all affected storageDevices items to by-id symlinks,
open the KaaSCephCluster custom resource for editing:
kubectl -n <managedClusterProject> edit kcc
For each affected node from the spec.cephClusterSpec.nodes section,
obtain a corresponding status.providerStatus.hardware.storage section
from the Machine custom resource:
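For example (a sketch; the Machine resource resides in the managed cluster namespace on the management cluster):
kubectl -n <managedClusterProject> get machine <machineName> -o yaml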
For each affected storageDevices item from the considered Machine,
obtain a correct by-id symlink from
status.providerStatus.hardware.storage.byIDs. Such by-id symlink
must contain status.providerStatus.hardware.storage.serialNumber and
must not contain wwn.
For managed-worker-1, according to the example output above, we can use
the following by-id symlinks:
Replace the first item of storageDevices that contains name: sdc with
fullPath: /dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_2e52abb48862dbdc.
Replace the second item of storageDevices that contains
fullPath: /dev/disk/by-path/pci-0000:00:05.0-scsi-0:0:0:2 with
fullPath: /dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_26d546263bd312b8.
Replace all affected storageDevices items in KaaSCephCluster
with the obtained ones.
Note
Prior to MCC 2.25.0 (Cluster release 17.0.0), place the by-id
symlinks in the name field instead of the fullPath field.
The resulting example of the storage device identifier migration:
Migrate the Ceph nodeGroups section to by-id identifiers¶
Available since MCC 2.25.0 (Cluster release 17.0.0)
Besides the nodes section, your cluster may contain the nodeGroups
section specified with disk names instead of by-id symlinks. Unlike the
in-place replacement of storage device identifiers in the nodes section,
nodeGroups requires a different approach because the same spec section
is reused for different nodes.
To migrate nodeGroups storage devices, use the deviceLabels section to
assign the same labels to the corresponding disks on different nodes and then
reference these labels in the node groups. For the deviceLabels section
specification, refer to Ceph advanced configuration: extraOpts.
The following procedure describes how to keep the nodeGroups section
but use unique by-id identifiers instead of disk names.
To migrate the Ceph nodeGroups section to by-id identifiers:
Make sure that your managed cluster is not currently running an upgrade or
any other maintenance process.
Obtain the list of all KaasCephCluster storage devices that use disk
names or disk by-path as identifiers of Ceph node group storage
devices:
kubectl -n <managedClusterProject> get kcc -o yaml
Substitute <managedClusterProject> with the corresponding managed
cluster namespace.
Output example of the KaaSCephCluster nodeGroups section with disk
names used as identifiers:
Verify the items from the storageDevices sections to be moved to
by-id symlinks. The list of the items to migrate includes:
A disk name in the name field. For example, sdc, nvme3n1,
and so on.
A disk /dev/disk/by-path symlink in the fullPath field.
For example, /dev/disk/by-path/pci-0000:00:05.0-scsi-0:0:0:2.
A disk /dev/disk/by-id symlink in the name field.
Note
This condition applies since MCC 2.25.0 (Cluster release
17.0.0).
A disk /dev/disk/by-id/wwn symlink, which is programmatically
calculated at boot.
For example, /dev/disk/by-id/wwn-0x26d546263bd312b8.
All storageDevice sections in the example above contain disk names
in the name field. Therefore, you need to replace them with by-id
symlinks.
Open the KaaSCephCluster custom resource for editing to start
migration of all affected storageDevices items to by-id symlinks:
kubectl -n <managedClusterProject> edit kcc
For each impacted Ceph node group in the nodeGroups section, add disk
labels to the deviceLabels section for every affected storage device
of the nodes listed in the nodes field of that node group.
Verify that these disk labels equal the by-id symlinks of the
corresponding disks.
For example, if the node group rack-1 contains two nodes node-1 and
node-2 and spec contains three items with name, you need
to obtain proper by-id symlinks for disk names from both nodes and
write them down with the same disk labels. The following example contains
the labels for by-id symlinks of nvme0n1, nvme1n1, and
nvme2n1 disks from node-1 and node-2 correspondingly:
Keep device labels repeatable for all nodes from the node group.
This allows for specifying unified spec for different by-id
symlinks of different nodes.
Example of the full deviceLabels section for the nodeGroups section:
For each affected node group in the nodeGroups section, replace
the field that contains the insufficient disk identifier with the devLabel
field that references the disk label from the deviceLabels section.
For the example above, the updated nodeGroups section looks as follows:
You can start using a storage device only after a corresponding Machine
becomes ready and accessible. Thus, KaaSCephCluster can be created only
after all machines receive the status.providerStatus.hardware.storage
configuration containing all required device by-id symlinks.
Obtain the item from the byIDs list from the
status.providerStatus.hardware.storage section that
contains serialNumber and does not contain wwn as a bus ID.
In the example above, for the disk with the /dev/sdc name, you can use
the /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_2e52abb48862dbdc symlink as
a persistent identifier of the storage device because it contains
the 2e52abb48862dbdc serial number and does not contain wwn.
Note
Do not rely on the byID field only. This field may contain
a /dev/disk/by-id/wwn symlink that cannot be considered
a persistent identifier of a storage device.
This section describes how to increase the overall storage size for all Ceph
pools of the same device class: hdd, ssd, or nvme.
The procedure presupposes adding a new Ceph OSD. The overall storage size for
the required device class automatically increases once the Ceph OSD becomes
available in the Ceph cluster.
To increase the overall storage size for a device class:
Identify the current storage size for the required device class:
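For example, from the rook-ceph-tools Pod, ceph df reports the total, used, and available capacity per device class:
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph df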
This document describes how to migrate a Ceph Monitor daemon from one node to
another without changing the general number of Ceph Monitors in the cluster.
In the Ceph Controller concept, migration of a Ceph Monitor means manually
removing it from one node and adding it to another.
Consider the following exemplary placement scheme of Ceph Monitors in the
nodes spec of the KaaSCephCluster CR:
nodes:
  node-1:
    roles:
    - mon
    - mgr
  node-2:
    roles:
    - mgr
Using the example above, if you want to move the Ceph Monitor from node-1
to node-2 without changing the number of Ceph Monitors, the roles table
of the nodes spec must result as follows:
nodes:
  node-1:
    roles:
    - mgr
  node-2:
    roles:
    - mgr
    - mon
However, due to the Rook limitation related to Kubernetes architecture, once
you move the Ceph Monitor through the KaaSCephCluster CR, changes will not
apply automatically. This is caused by the following Rook behavior:
Rook creates Ceph Monitor resources as deployments with nodeSelector,
which binds Ceph Monitor pods to a requested node.
Rook does not recreate new Ceph Monitors with the new node placement if the
current mon quorum works.
Therefore, to move a Ceph Monitor to another node, you must also manually apply
the new Ceph Monitors placement to the Ceph cluster as described below.
Substitute <managedClusterProjectName> with the corresponding value.
In the nodes spec of the KaaSCephCluster CR, change the mon
roles placement without changing the total number of mon roles. For
details, see the example above. Note the nodes on which the mon roles
have been removed.
Wait until the corresponding MiraCeph resource is updated with the new
nodes spec:
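For example (a sketch; the MiraCeph resource resides on the managed cluster):
kubectl --kubeconfig <kubeconfig> get miraceph --all-namespaces -o yaml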
Substitute <kubeconfig> with the Container Cloud cluster kubeconfig
that hosts the required Ceph cluster.
In the MiraCeph resource, determine which node has been changed in the
nodes spec. Save the name value of the node where the mon role
has been removed for further usage.
Once done, Rook removes the obsolete Ceph Monitor from the node and creates
a new one on the specified node with a new letter. For example, if the a,
b, and c Ceph Monitors were in quorum and mon-c was obsolete, Rook
removes mon-c and creates mon-d. In this case, the new quorum includes
the a, b, and d Ceph Monitors.
Migrate a Ceph Monitor before machine replacement¶
Available since MCC 2.25.0 (Cluster release 17.0.0)
This document describes how to migrate a Ceph Monitor to another machine
on baremetal-based clusters before node replacement as described in
Delete a cluster machine using web UI.
Warning
Remove the Ceph Monitor role before the machine removal.
Make sure that the Ceph cluster always has an odd number of
Ceph Monitors.
The procedure of a Ceph Monitor migration assumes that you temporarily move
the Ceph Manager/Monitor to a worker machine. After a node replacement, we
recommend migrating the Ceph Manager/Monitor to the new manager machine.
Ceph Controller enables you to deploy RADOS Gateway (RGW) Object Storage
instances and automatically manage its resources such as users and buckets.
Ceph Object Storage has an integration with OpenStack Object Storage (Swift)
in MOSK.
To enable the RGW Object Storage:
Open the KaasCephCluster CR of a managed cluster for editing:
Substitute <managedClusterProjectName> with a corresponding value.
Using the following table, update the cephClusterSpec.objectStorage.rgw
section specification as required:
Caution
Since MCC 2.24.0 (Cluster releases 15.0.1 and 14.0.1),
explicitly specify the deviceClass parameter for dataPool and
metadataPool.
Warning
Since Container Cloud 2.6.0, the spec.rgw section
is deprecated and its parameters are moved under objectStorage.rgw.
If you continue using spec.rgw, it is automatically translated
into objectStorage.rgw during the Container Cloud update to 2.6.0.
We strongly recommend changing spec.rgw to objectStorage.rgw
in all KaaSCephCluster CRs before spec.rgw becomes unsupported
and is deleted.
Mutually exclusive with the zone parameter. Object storage data pool
spec that should only contain replicated or erasureCoded and
failureDomain parameters. The failureDomain parameter may be
set to osd or host, defining the failure domain across which
the data will be spread. For dataPool, Mirantis recommends using an
erasureCoded pool. For details, see
Rook documentation: Erasure coding.
For example:
Mutually exclusive with the zone parameter. Object storage metadata
pool spec that should only contain replicated and failureDomain
parameters. The failureDomain parameter may be set to osd or
host, defining the failure domain across which the data will be
spread. Can use only replicated settings. For example:
where replicated.size is the number of full copies of data on
multiple nodes.
Warning
When using the non-recommended Ceph pools replicated.size of
less than 3, Ceph OSD removal cannot be performed. The minimal replica
size equals a rounded up half of the specified replicated.size.
For example, if replicated.size is 2, the minimal replica size is
1, and if replicated.size is 3, then the minimal replica size
is 2. A replica size of 1 allows Ceph to have PGs with only one
Ceph OSD in the acting state, which may cause a PG_TOO_DEGRADED
health warning that blocks Ceph OSD removal. Mirantis recommends setting
replicated.size to 3 for each Ceph pool.
gateway
The gateway settings corresponding to the rgw daemon settings.
Includes the following parameters:
port - the port on which the Ceph RGW service will be listening on
HTTP.
securePort - the port on which the Ceph RGW service will be
listening on HTTPS.
instances - the number of pods in the Ceph RGW ReplicaSet. If
allNodes is set to true, a DaemonSet is created instead.
Note
Mirantis recommends using 2 instances for Ceph Object Storage.
allNodes - defines whether to start the Ceph RGW pods as a
DaemonSet on all nodes. The instances parameter is ignored if
allNodes is set to true.
Defines whether to delete the data and metadata pools in the rgw
section if the object storage is deleted. Set this parameter to true
if you need to store data even if the object storage is deleted.
However, Mirantis recommends setting this parameter to false.
objectUsers and buckets
Optional. To create new Ceph RGW resources, such as buckets or users,
specify the following keys. Ceph Controller will automatically create
the specified object storage users and buckets in the Ceph cluster.
objectUsers - a list of user specifications to create for object
storage. Contains the following fields:
name - a user name to create.
displayName - the Ceph user name to display.
capabilities - user capabilities:
user - admin capabilities to read/write Ceph Object Store
users.
bucket - admin capabilities to read/write Ceph Object Store
buckets.
metadata - admin capabilities to read/write Ceph Object Store
metadata.
usage - admin capabilities to read/write Ceph Object Store
usage.
zone - admin capabilities to read/write Ceph Object Store
zones.
users - a list of strings that contain user names to create for
object storage.
Note
This field is deprecated. Use objectUsers
instead. If users is specified, it will be automatically
transformed to the objectUsers section.
buckets - a list of strings that contain bucket names to create
for object storage.
zone
Optional. Mutually exclusive with metadataPool and dataPool.
Defines the Ceph Multisite zone where the object storage must be placed.
Includes the name parameter that must be set to one of the zones
items. For details, see Enable multisite for Ceph RGW Object Storage.
Optional. Available since MOSK 25.1. Flag to determine
that a TLS certificate for accessing the Ceph RGW endpoint is used but
not exposed in spec. For example:
The operator must manually provide TLS configuration using the
rgw-ssl-certificate secret in the rook-ceph namespace of the
managed cluster. The secret object must have the following structure:
When removing an already existing SSLCert block, no additional actions
are required, because this block uses the same rgw-ssl-certificate secret
in the rook-ceph namespace.
When adding a new secret directly without exposing it in spec, the following
rules apply:
cert - base64 representation of a file with the server TLS key,
server TLS cert, and cacert.
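For example, assuming the server TLS key, certificate, and CA certificate are concatenated into a single PEM file, the secret can be created on the managed cluster as follows (a sketch, not the authoritative format):
kubectl -n rook-ceph create secret generic rgw-ssl-certificate --from-file=cert=<combined-pem-file>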
The Ceph multisite feature allows object storage to replicate its data over
multiple Ceph clusters. Using multisite, such object storage is independent and
isolated from another object storage in the cluster. Only the multi-zone
multisite setup is currently supported. For more details, see
Ceph documentation: Multisite.
List of realms to use, represents the realm namespaces. Includes the
following parameters:
name - the realm name.
pullEndpoint - optional, required only when the master zone is in
a different storage cluster. The endpoint, access key, and system key
of the system user from the realm to pull from. Includes the
following parameters:
endpoint - the endpoint of the master zone in the master zone
group.
accessKey - the access key of the system user from the realm to
pull from.
secretKey - the system key of the system user from the realm to
pull from.
zoneGroups (Technical Preview)
The list of zone groups for realms. Includes the following parameters:
name - the zone group name.
realmName - the realm namespace name to which the zone group
belongs to.
zones (Technical Preview)
The list of zones used within one zone group. Includes the following
parameters:
name - the zone name.
metadataPool - the settings used to create the Object Storage
metadata pools. Must use replication. For details, see
Pool parameters.
dataPool - the settings to create the Object Storage data pool.
Can use replication or erasure coding. For details, see
Pool parameters.
zoneGroupName - the zone group name.
endpointsForZone - available since MOSK 24.2.
The list of all endpoints in the zone group.
If you use ingress proxy for RGW, the list of endpoints must contain
that FQDN/IP address to access RGW. By default, if no ingress proxy
is used, the list of endpoints is set to the IP address of the RGW
external service. Endpoints must follow the HTTP URL format.
Caution
The multisite configuration requires master and secondary zones
to be reachable from each other.
Select from the following options:
If you do not need to replicate data from a different storage cluster,
and the current cluster represents the master zone, modify the current
objectStorage section to use the multisite mode:
Configure the zone RADOS Gateway (RGW) parameter by setting it to
the RGW Object Storage name.
Note
Leave dataPool and metadataPool empty. These
parameters are ignored because the zone block in the multisite
configuration specifies the pools parameters. Other RGW parameters
do not require changes.
Create the multiSite section where the names of realm, zone group,
and zone must match the current RGW name.
Since MCC 2.27.0 (Cluster release 17.2.0), specify the
endpointsForZone parameter according to your configuration:
If you use ingress proxy, which is defined in the
spec.cephClusterSpec.ingress section, add the FQDN endpoint.
If you do not use any ingress proxy and access the RGW API using the
default RGW external service, add the IP address of the external
service or leave this parameter empty.
The following example illustrates a complete objectStorage section:
If you use a different storage cluster, and its object storage data must
be replicated, specify the realm and zone group names along with the
pullEndpoint parameter. Additionally, specify the endpoint, access
key, and system keys of the system user of the realm from which you need
to replicate data. For details, see the step 2 of this procedure.
To obtain the endpoint of the cluster zone that must be replicated, run
the following command by specifying the zone group name of the required
master zone on the master zone side:
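A sketch of such a command, assuming radosgw-admin is run from the ceph-tools Pod on the master zone side:
radosgw-admin zonegroup get --rgw-zonegroup=<zoneGroupName>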
Mirantis recommends using the same metadataPool and
dataPool settings as you use in the master zone.
Configure the zone RGW parameter and leave dataPool
and metadataPool empty. These parameters are ignored because
the zone section in the multisite configuration specifies the pools
parameters.
Also, you can split the RGW daemons into daemons serving clients and daemons
running synchronization. To enable this option, specify
splitDaemonForMultisiteTrafficSync in the gateway section.
On the ceph-tools pod, verify the multisite status:
radosgw-admin sync status
Once done, ceph-operator will create the required resources and Rook will
handle the multisite configuration. For details, see: Rook documentation:
Object Multisite.
Rook does not handle multisite configuration changes and cleanup.
Therefore, once you enable multisite for Ceph RGW Object Storage, perform
these operations manually in the ceph-tools pod. For details, see
Rook documentation: Multisite cleanup.
If automatic update of zone group hostnames is disabled, manually specify all
required hostnames and update the zone group. In the ceph-tools pod, run
the following script:
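A possible sketch of such a script, based on standard radosgw-admin zone group operations; this is an assumption, adjust it to your environment:
radosgw-admin zonegroup get --rgw-zonegroup=<zoneGroupName> > zonegroup.json
# edit the hostnames list in zonegroup.json as required
radosgw-admin zonegroup set --rgw-zonegroup=<zoneGroupName> --infile=zonegroup.json
radosgw-admin period update --commit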
The section describes how to create, access, and remove Ceph RADOS Block
Device (RBD) or Ceph File System (CephFS) clients and RADOS Gateway (RGW)
users.
The KaaSCephCluster resource allows managing custom Ceph RADOS Block
Device (RBD) or Ceph File System (CephFS) clients. This section describes
how to create, access, and remove Ceph RBD or CephFS clients.
For all supported parameters of Ceph clients, refer to Clients parameters.
where mon_host is the comma-separated list of IP addresses of the current
Ceph Monitors with port 6789. For example,
10.10.0.145:6789,10.10.0.153:6789,10.10.0.235:6789.
/etc/ceph/ceph.client.<clientName>.keyring:
[client.<clientName>]
key = <cephClientCredentials>
<clientName> is a client name set in
spec.cephClusterSpec.clients the KaaSCephCluster resource,
for example, rbd-client
<cephClientCredentials> are the client credentials obtained in the
previous steps. For example,
AQAGHDNjxWYXJhAAjafCn3EtC6KgzgI1x4XDlg==
If the client caps parameters contain mon: allow r, verify the
client access using the following command:
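For example, assuming the ceph.conf and keyring files created above are in place on the client host:
ceph --id <clientName> -s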
The KaaSCephCluster resource allows managing custom Ceph Object Storage
users. This section describes how to create, access, and remove Ceph Object
Storage users.
Substitute <objstoreName> with a Ceph Object Storage name and
<username> with a Ceph Object Storage user name.
Use secretName and secretNamespace to access the Ceph Object
Storage user credentials from a managed cluster. The secret contains Amazon
S3 access and secret keys.
This section describes how to verify the components of a Ceph cluster after
deployment. For troubleshooting, verify Ceph Controller and Rook logs as
described in Verify Ceph Controller and Rook.
To confirm that all Ceph components including mon, mgr, osd, and
rgw have joined your cluster properly, analyze the logs for each pod and
verify the Ceph status:
To ensure that rook-discover is running properly, verify if the
local-device configmap has been created for each Ceph node specified
in the cluster configuration:
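For example (a sketch; the exact ConfigMap naming may vary between releases):
kubectl -n rook-ceph get configmap | grep local-device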
Verifying Ceph cluster state is an entry point for issues investigation.
This section describes how to verify Ceph state using the KaaSCephCluster,
MiraCeph, and MiraCephLog resources.
Note
Before MOSK 25.1, use MiraCephLog
instead of MiraCephHealth.
To verify the state of a Ceph cluster, Ceph Controller provides special
sections in KaaSCephCluster.status. The resource contains information about
the state of the Ceph cluster components, their health, and potentially
problematic components.
To verify the Ceph cluster state from a managed cluster:
Obtain kubeconfig of a managed cluster and provide it as an environment
variable:
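For example:
export KUBECONFIG=<pathToManagedKubeconfig>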
KaaSCephCluster.status allows you to learn the current health of a Ceph
cluster and identify potentially problematic components. This section describes
KaaSCephCluster.status and its fields. To view KaaSCephCluster.status,
perform the steps described in Verify Ceph cluster state through CLI.
Available since MCC 2.25.0 (Cluster release 17.0.0).
Describes the current state of KaasCephCluster and reflects any
errors during object reconciliation, including spec generation,
object creation on a managed cluster, and status retrieval.
miraCephInfo
Describes the current phase of Ceph spec reconciliation and spec
validation result. The miraCephInfo section contains information
about the current validation and reconcile of the KaaSCephCluster
and MiraCeph resources. It helps to understand whether the specified
configuration is valid to create a Ceph cluster and informs about the
current phase of applying this configuration. For miraCephInfo
fields description, see KaaSCephCluster.status miraCephInfo specification.
shortClusterInfo
Represents a short version of fullClusterInfo and contains a summary
of the Ceph cluster state collecting process and potential issues. It
helps to quickly verify whether fullClusterInfo is up to date and whether
any errors occurred while collecting the information. For
shortClusterInfo fields description, see KaaSCephCluster.status shortClusterInfo specification.
fullClusterInfo
Contains a complete Ceph cluster information including cluster, Ceph
resources, and daemons health. It helps to reveal the potentially
problematic components. For fullClusterInfo fields description, see
KaaSCephCluster.status fullClusterInfo specification.
miraCephSecretsInfo
Contains information about secrets of the managed cluster that are used
in the Ceph cluster, such as keyrings, Ceph clients, RADOS Gateway user
credentials, and so on. For miraCephSecretsInfo fields description, see
KaaSCephCluster.status miraCephSecretsInfo specification.
The following tables describe all sections of KaaSCephCluster.status.
Contains the current phase of handling the applied Ceph cluster spec.
Possible values: Creating, Deploying, Validation, Ready,
Deleting, or Failed.
message
Contains a detailed description of the current phase or an error message
if the phase is Failed.
validation
Contains the KaaSCephCluster/MiraCeph spec validation result
(Succeed or Failed) with a list of messages, if any. The
validation section includes the following fields:
validation:
  result: Succeed or Failed
  messages: ["error", "messages", "list"]
General information from Rook about the Ceph cluster health and current
state. The clusterStatus field contains the following fields:
clusterStatus:
  state: <rook ceph cluster common status>
  phase: <rook ceph cluster spec reconcile phase>
  message: <rook ceph cluster phase details>
  conditions: <history of rook ceph cluster reconcile steps>
  ceph: <ceph cluster health>
  storage:
    deviceClasses: <list of used device classes in ceph cluster>
  version:
    image: <ceph image used in ceph cluster>
    version: <ceph version of ceph cluster>
operatorStatus
Status of the Rook Ceph Operator pod: Ok or Not running.
daemonsStatus
Map of statuses for each Ceph cluster daemon type. Indicates the
expected and actual number of Ceph daemons on the cluster. Available
daemon types are: mgr, mon, osd, and rgw. The
daemonsStatus field contains the following fields:
daemonsStatus:
  <daemonType>:
    status: <daemons status>
    running: <number of running daemons with details>
For example:
daemonsStatus:
  mgr:
    running: a is active mgr ([] standBy)
    status: Ok
  mon:
    running: '3/3 mons running: [a c d] in quorum'
    status: Ok
  osd:
    running: '4/4 running: 4 up, 4 in'
    status: Ok
  rgw:
    running: 2/2 running ([openstack.store.a openstack.store.b])
    status: Ok
blockStorageStatus
State of the Ceph cluster block storage resources. Includes the
following fields:
pools - status map for each CephBlockPool resource. The map
includes the following fields:
pools:
  <cephBlockPoolName>:
    present: <flag whether desired pool is present in ceph cluster>
    status: <rook ceph block pool resource status>
clients - status map for each Ceph client resource. The map
includes the following fields:
Verbose details of the Ceph cluster state. cephDetails includes the
following fields:
diskUsage - the used, available, and total storage size for each
deviceClass and pool.
cephDetails:
  diskUsage:
    deviceClass:
      <deviceClass>:
        # The amount of raw storage consumed by user data (excluding bluestore database).
        bytesUsed: "<number>"
        # The amount of free space available in the cluster.
        bytesAvailable: "<number>"
        # The amount of storage capacity managed by the cluster.
        bytesTotal: "<number>"
    pools:
      <poolName>:
        # The space allocated for a pool over all OSDs. This includes replication,
        # allocation granularity, and erasure-coding overhead. Compression savings
        # and object content gaps are also taken into account. BlueStore database
        # is not included in this amount.
        bytesUsed: "<number>"
        # The notional percentage of storage used per pool.
        usedPercentage: "<number>"
        # Number calculated with the formula: bytesTotal - bytesUsed.
        bytesAvailable: "<number>"
        # An estimate of the notional amount of data that can be written to this pool.
        bytesTotal: "<number>"
cephDeviceMapping - a key-value mapping of which node contains
which Ceph OSD and which Ceph OSD uses which disk.
In MCC 2.24.2 (Cluster release 15.0.1), cephDeviceMapping
is removed because its large size can potentially exceed the Kubernetes
1.5 MB quota.
cephCSIPluginDaemonsStatus
Contains information, similar to the daemonsStatus format, for each
Ceph CSI plugin deployed in the Ceph cluster: rbd and, if enabled,
cephfs.
The cephCSIPluginDaemonsStatus field contains the following fields:
cephCSIPluginDaemonsStatus:
  <csiPlugin>:
    running: <number of running daemons with details>
    status: <csi plugin status>
For example:
cephCSIPluginDaemonsStatus:
  csi-rbdplugin:
    running: 1/3 running
    status: Some csi-rbdplugin daemons are not ready
  csi-cephfsplugin:
    running: 3/3 running
    status: Ok
KaaSCephCluster.status miraCephSecretsInfo specification
Available since MCC 2.23.1 (Cluster release 12.7.0)
Field
Description
state
Current state of the secret collector on the Ceph cluster:
Ready - secrets information is collected successfully
Failed - secrets information fails to be collected
lastSecretCheck
DateTime when the Ceph cluster secrets were verified last time.
lastSecretUpdate
DateTime when the Ceph cluster secrets were updated last time.
secretsInfo
List of secrets for Ceph clients and RADOS Gateway users:
clientSecrets - details on secrets for Ceph clients
rgwUserSecrets - details on secrets for Ceph RADOS Gateway users
The web UI capabilities for adding and managing a Ceph cluster are limited
and lack flexibility in defining Ceph cluster specifications.
For example, if an error occurs while adding a Ceph cluster using the
web UI, usually you can address it only through the CLI.
The web UI functionality for managing Ceph clusters will be deprecated
in one of the following releases.
Verifying the Ceph cluster state is the entry point for issue investigation.
Through the Ceph Clusters page of the Container Cloud web UI, you
can view a detailed summary on all Ceph clusters deployed, including the
cluster name and ID, health status, number of Ceph OSDs, and so on.
To view Ceph cluster summary:
Log in to the Container Cloud web UI with the m:kaas:namespace@operator
or m:kaas:namespace@writer permissions.
Switch to the required project using the Switch Project action
icon located on top of the main left-side navigation panel.
In the Clusters tab, click the required cluster name. The page
with cluster details opens.
In the Ceph Clusters tab, verify the overall cluster health
and rebalancing statuses.
Available since MCC 2.25.0 (Cluster release 17.0.0).
Click Cluster Details:
The Machines tab contains the list of deployed Ceph machines
with the following details:
Status - deployment status
Role - role assigned to a machine, manager or monitor
Storage devices - number of storage devices assigned to a
machine
UP OSDs and IN OSDs - number of up and
in Ceph OSDs belonging to a machine
Note
To obtain details about a specific machine used for Ceph
deployment, in the Clusters > <clusterName> > Machines tab,
click the required machine name containing the storage label.
The OSDs tab contains the list of Ceph OSDs comprising the
Ceph cluster with the following details:
OSD - Ceph OSD ID
Storage Device ID - storage device ID assigned to a Ceph OSD
Type - type of storage device assigned to a Ceph OSD
Partition - partition name where Ceph OSD is located
The starting point for Ceph troubleshooting is the ceph-controller and
rook-operator logs. Once you locate the component that causes issues,
verify the logs of the related pod. This section describes how to verify the
Ceph Controller and Rook objects of a Ceph cluster.
To verify Ceph Controller and Rook:
Verify the Ceph cluster status:
Verify that the status of each pod in the ceph-lcm-mirantis and
rook-ceph namespaces is Running:
For ceph-lcm-mirantis:
kubectl get pod -n ceph-lcm-mirantis
For rook-ceph:
kubectl get pod -n rook-ceph
Verify Ceph Controller. Ceph Controller prepares the configuration that Rook
uses to deploy the Ceph cluster, managed using the KaasCephCluster
resource. If Rook cannot finish the deployment, verify the Rook Operator
logs as described in step 4.
On the managed cluster, verify the MiraCeph subresource:
kubectl get miraceph -n ceph-lcm-mirantis -o yaml
Verify the Rook Operator logs. Rook deploys a Ceph cluster based on custom
resources created by the Ceph Controller, such as pools, clients,
cephcluster, and so on. Rook logs contain details about components
orchestration. For details about the Ceph cluster status and to get access
to CLI tools, connect to the ceph-tools pod as described in step 5.
Verify the Rook Operator logs:
kubectl -n rook-ceph logs -l app=rook-ceph-operator
Verify the CephCluster configuration:
Note
The Ceph Controller manages the CephCluster CR.
Open the CephCluster CR only for verification and do not modify it
manually.
Verify that CLI commands can run on the ceph-tools pod:
ceph -s
Verify hardware:
Through the ceph-tools pod, obtain the required device in your
cluster:
ceph osd tree
Enter all Ceph OSD pods in the rook-ceph namespace one by one:
kubectl exec -it -n rook-ceph <osd-pod-name> bash
Verify that the ceph-volume tool is available on all pods running on
the target node:
ceph-volume lvm list
Verify data access. Ceph volumes can be consumed directly by Kubernetes
workloads and internally, for example, by OpenStack services. To verify the
Kubernetes storage:
Verify the available storage classes. The storage classes that are
automatically managed by Ceph Controller use the
rook-ceph.rbd.csi.ceph.com provisioner.
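For example, one way to list the storage classes and their provisioners:
kubectl get storageclass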
This document does not provide any specific recommendations on
requests and limits for Ceph resources and assumes the native
Ceph resource configuration for any cluster with MOSK.
You can configure Ceph Controller to manage Ceph resources by specifying their
requirements and constraints. To configure the resources consumption for the
Ceph nodes, consider the following options that are based on different Helm
release configuration values:
Configuring tolerations for taint nodes for the Ceph Monitor, Ceph Manager,
and Ceph OSD daemons. For details, see
Taints and Tolerations.
Configuring nodes resources requests or limits for the Ceph daemons and for
each Ceph OSD device class such as HDD, SSD, or NVMe. For details, see
Managing Resources for Containers.
To enable Ceph tolerations and resources management:
To avoid Ceph cluster health issues while changing the daemons configuration,
set the Ceph noout, nobackfill, norebalance, and norecover
flags through the ceph-tools pod before editing Ceph tolerations
and resources:
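For example, using the standard Ceph CLI inside the ceph-tools pod:
ceph osd set noout
ceph osd set nobackfill
ceph osd set norebalance
ceph osd set norecover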
Specifies resources requests or limits. The parameter is a map with
the daemon type as a key and the following structure as a value:
hyperconverge:
  resources:
    <daemonType>:
      requests: <kubernetes valid spec of daemon resource requests>
      limits: <kubernetes valid spec of daemon resource limits>
Possible values for <daemonType> are mon, mgr, osd,
osd-hdd, osd-ssd, osd-nvme, prepareosd, rgw, and
mds.
The osd-hdd, osd-ssd, and osd-nvme resource requirements
handle only the Ceph OSDs with a corresponding device class.
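For example, a minimal sketch of the hyperconverge.resources section in the
KaaSCephCluster spec; the request and limit values below are illustrative only:
spec:
  cephClusterSpec:
    hyperconverge:
      resources:
        mon:
          requests:
            memory: 1Gi
            cpu: "1"
          limits:
            memory: 2Gi
            cpu: "2"
        osd-ssd:
          requests:
            memory: 2Gi
          limits:
            memory: 4Gi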
Save the reconfigured KaaSCephCluster resource and wait for
ceph-controller to apply the updated Ceph configuration. It will
recreate Ceph Monitors, Ceph Managers, or Ceph OSDs according to the
specified hyperconverge configuration.
If you have specified any osd tolerations, additionally specify
tolerations for the rook instances:
Open the Cluster resource of the required Ceph cluster on a
management cluster:
kubectl -n <ClusterProjectName> edit cluster
Substitute <ClusterProjectName> with the project name of the required
cluster.
Specify the parameters in the ceph-controller section of
spec.providerSpec.value.helmReleases:
Specify the hyperconverge.tolerations.rook parameter as required:
In <yamlFormattedKubernetesTolerations>, specify YAML-formatted
tolerations from
cephClusterSpec.hyperconverge.tolerations.osd.rules of the
KaaSCephCluster spec. For example:
In controllers.cephController.replicas,
controllers.cephRequest.replicas, and
controllers.cephStatus.replicas, specify the replicas count. The
default is 3 replicas. For example:
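For example, a possible sketch of these values in the ceph-controller Helm
release. The toleration rule is illustrative and must mirror
cephClusterSpec.hyperconverge.tolerations.osd.rules of your KaaSCephCluster
spec; whether the rook key accepts an inline list or a YAML string block is an
assumption here:
spec:
  providerSpec:
    value:
      helmReleases:
      - name: ceph-controller
        values:
          hyperconverge:
            tolerations:
              rook: |
                - key: ceph-storage-node
                  operator: Exists
                  effect: NoSchedule
          controllers:
            cephController:
              replicas: 3
            cephRequest:
              replicas: 3
            cephStatus:
              replicas: 3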
Save the reconfigured Cluster resource and wait for the
ceph-controller Helm release update. It will recreate Ceph CSI and
discover pods according to the specified
hyperconverge.tolerations.rook configuration.
Specify tolerations for different Rook resources using the following
chart-based options:
hyperconverge.tolerations.rook - general toleration rules for each
Rook service if no exact rules specified
hyperconverge.tolerations.csiplugin - for tolerations of the
ceph-csi plugins DaemonSets
hyperconverge.tolerations.csiprovisioner - for the ceph-csi
provisioner deployment tolerations
hyperconverge.nodeAffinity.csiprovisioner - provides the ceph-csi
provisioner node affinity with a value section
After a successful Ceph reconfiguration, unset the flags set in step 1
through the ceph-tools pod:
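For example:
ceph osd unset noout
ceph osd unset nobackfill
ceph osd unset norebalance
ceph osd unset norecover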
After you enable Ceph resources management as described in
Enable Ceph tolerations and resources management, perform the steps below to verify that the
configured tolerations, requests, or limits have been successfully specified in
the Ceph cluster.
To verify Ceph tolerations and resources management:
To verify that the required tolerations are specified in the Ceph cluster,
inspect the output of the following commands:
To verify that the required resources requests or limits are specified for
the Ceph mon, mgr, or osd daemons, inspect the output of the
following command:
To verify that the required resources requests or limits are specified for
the Ceph OSDs hdd, ssd, or nvme device classes, perform the
following steps:
Identify which Ceph OSDs belong to the <deviceClass> device class in
question:
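For example, a possible way to list the Ceph OSDs of a device class through
the ceph-tools pod:
ceph osd crush class ls-osd <deviceClass>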
Ceph allows establishing multiple IP networks and subnet masks for clusters
with configured L3 network rules. In MOSK, you can configure
multinetwork through the network section of the KaaSCephCluster CR.
Ceph Controller uses this section to specify the Ceph networks for external
access and internal daemon communication. The parameters in the network
section use the CIDR notation, for example, 10.0.0.0/24.
Before enabling multiple networks for a Ceph cluster, consider the following
requirements:
Do not confuse the IP addresses you define with the public-facing IP
addresses the network clients may use to access the services.
If you define more than one IP address and subnet mask for the public or
cluster network, ensure that the subnets within the network can route to
each other.
Add each IP address or subnet defined in the network section to IP tables
and open ports for them as necessary.
The pods of the Ceph OSD and RadosGW daemons use cross-pod health checkers
to verify that the entire Ceph cluster is healthy. Therefore, each CIDR must
be accessible inside Ceph pods.
Avoid using the 0.0.0.0/0 CIDR in the network section. With a zero
range in publicNet and/or clusterNet, the Ceph daemons behavior
is unpredictable.
To enable multinetwork for Ceph:
Select from the following options:
If the Ceph cluster is not deployed on a managed cluster yet, edit the
deployment KaaSCephCluster YAML template.
If the Ceph cluster is already deployed on a managed cluster, open
KaaSCephCluster for editing:
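For example, a typical command, assuming that kubectl targets the management
cluster and that the kaascephcluster resource name is available:
kubectl -n <managedClusterProjectName> edit kaascephcluster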
If you are creating a managed cluster, save the updated
KaaSCephCluster template to the corresponding file and proceed with
the managed cluster creation.
If you are configuring KaaSCephCluster of an existing managed cluster,
exiting the text editor will apply the changes.
Once done, the specified network CIDRs will be passed to the Ceph daemons pods
through the rook-config-override ConfigMap.
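For reference, a minimal sketch of the network section in the KaaSCephCluster
spec; the CIDR values and the comma-separated form for multiple subnets are
assumptions for illustration:
spec:
  cephClusterSpec:
    network:
      publicNet: 10.0.0.0/24,10.0.1.0/24
      clusterNet: 10.10.0.0/24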
This section describes how to configure and use RADOS Block Device (RBD)
mirroring for Ceph pools using the rbdMirror section in the
KaaSCephCluster CR. The feature may be useful if, for example, you have
two interconnected managed clusters. Once you enable RBD mirroring, the
images in the specified pools will be replicated and if a cluster becomes
unreachable, the second one will provide users with instant access to all
images. For details, see Ceph Documentation: RBD Mirroring.
Note
Ceph Controller only supports bidirectional mirroring.
To enable Ceph RBD mirroring, follow the procedure below and use the following
rbdMirror parameters description:
daemonsCount
Count of rbd-mirror daemons to spawn. Mirantis recommends using one
instance of the rbd-mirror daemon.
peers
Optional. List of mirroring peers of an external cluster to connect to.
Only a single peer is supported. The peer section includes the
following parameters:
site - the label of a remote Ceph cluster associated with the
token.
token - the token that will be used by one site (Ceph cluster) to
pull images from the other site. To obtain the token, use the
rbd mirror pool peer bootstrap create command.
pools - optional, a list of pool names to mirror.
To enable Ceph RBD mirroring:
In KaaSCephCluster CRs of both Ceph clusters where you want to enable
mirroring, specify positive daemonsCount in the
spec.cephClusterSpec.rbdMirror section:
spec:
  cephClusterSpec:
    rbdMirror:
      daemonsCount: 1
On both Ceph clusters where you want to enable mirroring, wait for the Ceph
RBD Mirror daemons to start running:
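For example, a possible check, assuming the standard Rook label for the RBD
Mirror daemon pods:
kubectl -n rook-ceph get pod -l app=rook-ceph-rbd-mirror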
In KaaSCephCluster of both Ceph clusters where you want to enable
mirroring, specify the spec.cephClusterSpec.pools.mirroring.mode
parameter for all pools that must be mirrored.
Mirroring mode recommendations
Mirantis recommends using the pool mode for mirroring. For the
pool mode, explicitly enable journaling for each image.
To use the image mirroring mode, explicitly enable mirroring as
described in step 8.
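For example, a minimal sketch of enabling mirroring for a pool:
spec:
  cephClusterSpec:
    pools:
    - name: <mirroringPoolName>
      ...
      mirroring:
        mode: pool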
Substitute <mirroringPoolName> with the name of a pool to be mirrored.
In <siteName>, assign a label for the external Ceph cluster that will be
used along with mirroring.
Substitute <siteName> with the label assigned to the external Ceph
cluster, <bootstrapPeer> with the token obtained in the previous step,
and <mirroringPoolName> with names of pools that have the
mirroring.mode parameter defined.
Substitute <poolName> with the name of a pool with the image
mirroring mode, <imageName> with the name of an image stored in the
specified pool. Substitute <imageMirroringMode> with one of:
journal - for mirroring to use the RBD journaling image feature to
replicate the image contents. If the RBD journaling image feature is not
yet enabled on the image, it will be enabled automatically.
snapshot - for mirroring to use RBD image mirror-snapshots to
replicate the image contents. Once enabled, an initial mirror-snapshot
will automatically be created. To create additional RBD image
mirror-snapshots, use the rbd command.
Since Ceph Pacific, Ceph CSI driver does not propagate the
777 permission on the mount point of persistent volumes based on any
StorageClass of the CephFS data pool.
The Ceph Shared File System, or CephFS, provides the capability to create
read/write shared file system Persistent Volumes (PVs). These PVs support the
ReadWriteMany access mode for the FileSystem volume mode.
CephFS deploys its own daemons called MetaData Servers or Ceph MDS. For
details, see Ceph Documentation: Ceph File System.
Note
By design, CephFS data pool and metadata pool must be replicated
only.
Limitations
CephFS is supported as a Kubernetes CSI plugin that only supports creating
Kubernetes Persistent Volumes based on the FileSystem volume mode.
For a complete modes support matrix, see Ceph CSI: Support Matrix.
Before MOSK 25.1, Ceph Controller supports only one
CephFS installation per Ceph cluster.
Re-creating the CephFS instance in a cluster requires a
different value for the name parameter.
A list of CephFS data pool specifications. Each spec contains the
name, replicated or erasureCoded, deviceClass, and
failureDomain parameters. The first pool in the list is treated
as the default data pool for CephFS and must always be
replicated. The failureDomain parameter may be set to osd
or host, defining the failure domain across which the data will
be spread. The number of data pools is unlimited, but the default
pool must always be present. For example:
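A possible sketch of a dataPools list with a default replicated pool followed
by an erasure-coded pool; the pool names and device classes are illustrative:
dataPools:
- name: default-pool
  deviceClass: ssd
  replicated:
    size: 3
  failureDomain: host
- name: ec-pool
  deviceClass: hdd
  erasureCoded:
    dataChunks: 2
    codingChunks: 1
  failureDomain: host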
Where replicated.size is the number of full copies of data on
multiple nodes.
Warning
When using the non-recommended Ceph pools replicated.size of
less than 3, Ceph OSD removal cannot be performed. The minimal replica
size equals a rounded up half of the specified replicated.size.
For example, if replicated.size is 2, the minimal replica size is
1, and if replicated.size is 3, then the minimal replica size
is 2. The replica size of 1 allows Ceph to have PGs with only one
Ceph OSD in the acting state, which may cause a PG_TOO_DEGRADED
health warning that blocks Ceph OSD removal. Mirantis recommends setting
replicated.size to 3 for each Ceph pool.
Warning
Modifying dataPools on a deployed CephFS has no
effect. You can manually adjust pool settings through the Ceph
CLI. However, for any changes in dataPools, Mirantis
recommends re-creating CephFS.
metadataPool
CephFS metadata pool spec that should only contain replicated,
deviceClass, and failureDomain parameters. The
failureDomain parameter may be set to osd or host,
defining the failure domain across which the data will be spread.
Can use only replicated settings. For example:
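A possible sketch of a metadataPool definition; the device class is
illustrative:
metadataPool:
  deviceClass: nvme
  replicated:
    size: 3
  failureDomain: host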
where replicated.size is the number of full copies of data on
multiple nodes.
Warning
Modifying metadataPool on a deployed CephFS has
no effect. You can manually adjust pool settings through the
Ceph CLI. However, for any changes in metadataPool,
Mirantis recommends re-creating CephFS.
preserveFilesystemOnDelete
Defines whether to preserve the data and metadata pools if CephFS is
deleted. Set to true to avoid accidental data loss in case of
human error. However, for security reasons, Mirantis recommends
setting preserveFilesystemOnDelete to false.
metadataServer
Metadata Server settings correspond to the Ceph MDS daemon settings.
Contains the following fields:
activeCount - the number of active Ceph MDS instances. As load
increases, CephFS will automatically partition the file system
across the Ceph MDS instances. Rook will create twice the number
of Ceph MDS instances requested by activeCount. The extra
instances will be in standby mode for failover. Mirantis
recommends setting this parameter to 1 and increasing the
MDS daemons count only in case of high load.
activeStandby - defines whether the extra Ceph MDS instances
will be in active standby mode and will keep a warm cache of the
file system metadata for faster failover. The instances will be
assigned by CephFS in failover pairs. If false, the extra
Ceph MDS instances will all be in passive standby mode and will
not maintain a warm cache of the metadata. The default value is
false.
resources - represents Kubernetes resource requirements for
Ceph MDS pods.
For example:
cephClusterSpec:
  sharedFilesystem:
    cephFS:
    - name: cephfs-store
      metadataServer:
        activeCount: 1
        activeStandby: false
        resources: # example, non-prod values
          requests:
            memory: 1Gi
            cpu: 1
          limits:
            memory: 2Gi
            cpu: 2
Optional. Override the CSI CephFS gRPC and liveness metrics port. For
example, if an application is already using the default CephFS ports
9092 and 9082, which may cause conflicts on the node.
Open the Cluster CR of a managed cluster for editing:
kubectl -n <managedClusterProjectName> edit cluster
Substitute <managedClusterProjectName> with the corresponding value.
In the spec.providerSpec.helmReleases section, configure
csiCephFsGPCMetricsPort and csiCephFsLivenessMetricsPort as
required. For example:
spec:
  providerSpec:
    helmReleases:
    ...
    - name: ceph-controller
      ...
      values:
        ...
        rookExtraConfig:
          csiCephFsEnabled: true
          csiCephFsGPCMetricsPort: "9092" # should be a string
          csiCephFsLivenessMetricsPort: "9082" # should be a string
Rook will enable the CephFS CSI plugin and provisioner.
Open the KaasCephCluster CR of a managed cluster for editing:
Define the mds role for the corresponding nodes where Ceph MDS daemons
should be deployed. Mirantis recommends labeling only one node with the
mds role. For example:
Once CephFS is specified in the KaaSCephCluster CR, Ceph Controller will
validate it and request Rook to create CephFS. Then Ceph Controller will create
a Kubernetes StorageClass, required to start provisioning the storage,
which will operate the CephFS CSI driver to create Kubernetes PVs.
Available since MCC 2.23.1 (Cluster release 12.7.0). TechPreview
This section describes how to share a Ceph cluster with another managed
cluster of the same management cluster and how to manage such Ceph cluster.
A shared Ceph cluster allows connecting a consumer cluster to a producer
cluster. The consumer cluster uses the Ceph cluster deployed on the producer
to store the necessary data. In other words, the producer cluster contains the
Ceph cluster with mon, mgr, osd, and mds daemons. And the
consumer cluster contains clients that require access to the Ceph storage.
For example, an NGINX application that runs in a cluster without storage
requires a persistent volume to store data. In this case, such a cluster can
connect to a Ceph cluster and use it as a block or file storage.
Limitations
Before MCC 2.24.2 (Cluster release 15.0.1), connection to a shared Ceph
cluster is possible only through the client.admin user.
The producer and consumer clusters must be located in the same
management cluster.
The LCM network of the producer cluster must be available in the
consumer cluster.
Ceph requires a non-admin client to share the producer cluster resources with
the consumer cluster. To connect the consumer cluster with the producer
cluster, the Ceph client requires the following caps (permissions):
Read-write access to Ceph Managers
Read and role-definer access to Ceph Monitors
Read-write access to Ceph Metadata servers if CephFS pools must be shared
Profile access to the shared RBD/CephFS pools for Ceph OSDs
To create a Ceph non-admin client, add the following snippet to the
clients section of the KaaSCephCluster object:
spec:
  cephClusterSpec:
    clients:
    - name: <nonAdminClientName>
      caps:
        mgr: "allow rw"
        mon: "allow r, profile role-definer"
        mds: "allow rw" # if CephFS must be shared
        osd: <poolsProfileCaps>
Substitute <nonAdminClientName> with a Ceph non-admin client name and
<poolsProfileCaps> with a comma-separated profile list of RBD and CephFS
pools in the following format:
profile rbd pool=<rbdPoolName> for each RBD pool
allow rw tag cephfs data=<cephFsName> for each CephFS pool
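For example, a minimal sketch of a non-admin client definition, assuming
hypothetical client, pool, and CephFS names (share-client, kubernetes-hdd,
and cephfs-store):
spec:
  cephClusterSpec:
    clients:
    - name: share-client
      caps:
        mgr: "allow rw"
        mon: "allow r, profile role-definer"
        mds: "allow rw"
        osd: "profile rbd pool=kubernetes-hdd, allow rw tag cephfs data=cephfs-store"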
For backward compatibility, the Ceph client.admin client is
available as <clientName>. However, Mirantis does not recommend
using client.admin for security reasons.
Connect to the producer cluster and generate connectionString.
Proceed according to the MCC version used:
Since MCC 2.25.0 (Cluster release 17.0.0)
Create a KaaSCephOperationRequest resource in a managed cluster
namespace of the management cluster:
apiVersion: kaas.mirantis.com/v1alpha1
kind: KaaSCephOperationRequest
metadata:
  name: test-share-request
  namespace: <managedClusterProject>
spec:
  k8sCluster:
    name: <managedClusterName>
    namespace: <managedClusterProject>
  kaasCephCluster:
    name: <managedKaaSCephClusterName>
    namespace: <managedClusterProject>
  share:
    clientName: <clientName>
    clusterID: <namespace/name>
    opts:
      cephFS: true # if the consumer cluster will use the CephFS storage
After KaaSCephOperationRequest is applied, wait until the
Prepared state displays in the status.shareStatus section.
Obtain connectionString from the status.shareStatus section.
The example of the status section:
<consumerClusterProjectName> is the project name of the consumer
managed cluster on the management cluster.
<clusterName> is the consumer managed cluster name.
<generatedConnectionString> is the connection string generated in
the previous step.
<clusterNetCIDR> and <publicNetCIDR> are values that must match
the same values in the producer KaaSCephCluster object.
Note
The spec.cephClusterSpec.network and
spec.cephClusterSpec.nodes parameters are mandatory.
The connectionString parameter is specified in the
spec.cephClusterSpec.external section of the KaaSCephCluster CR.
The parameter contains an encrypted string with all the configurations
needed to connect the consumer cluster to the shared Ceph cluster.
Apply consumer-kcc.yaml on the management cluster:
kubectl apply -f consumer-kcc.yaml
Once the Ceph cluster is specified in the KaaSCephCluster CR of the
consumer cluster, Ceph Controller validates it and requests Rook to connect
the consumer and producer.
Substitute <managedClusterProjectName> with the corresponding value.
In the spec.cephClusterSpec.pools, specify pools from the producer
cluster to be used by the consumer cluster. For example:
Caution
Each name in the pools section must match the
corresponding full pool name of the producer cluster. You can find
full pool names in the KaaSCephCluster CR by the following path:
status.fullClusterInfo.blockStorageStatus.poolsStatus.
After specifying pools in the consumer KaaSCephCluster CR, Ceph Controller
creates a corresponding StorageClass for each specified pool, which can be
used for creating ReadWriteOnce persistent volumes (PVs) in the consumer
cluster.
Substitute <managedClusterProjectName> with the corresponding value.
In the sharedFilesystem section of the consumer cluster, specify
the dataPools to share.
Note
Sharing CephFS also requires specifying the metadataPool
and metadataServer sections similarly to the corresponding sections
of the producer cluster. For details, see CephFS specification.
After specifying CephFS in the KaaSCephCluster CR of the consumer
cluster, Ceph Controller creates a corresponding StorageClass that allows
creating ReadWriteMany (RWX) PVs in the consumer cluster.
If you need to configure the placement of Rook daemons on nodes, you can add
extra values in the Cluster providerSpec section of the
ceph-controller Helm release.
The procedures in this section describe how to specify the placement of
rook-ceph-operator, rook-discover, and csi-rbdplugin.
To specify rook-ceph-operator placement:
On the management cluster, edit the Cluster resource of the target
managed cluster:
kubectl -n <managedClusterProjectName> edit cluster
Add the following parameters to the ceph-controller Helm release values:
Substitute <labelSelectorX> with a valid Kubernetes label selector
expression to place the rook-discover and csi-rbdplugin DaemonSet
pods. For example, "role=storage-node;discover=true".
Wait for some time and verify on the managed cluster that the changes have
applied:
Migrate Ceph pools from one failure domain to another
The document describes how to change the failure domain of an already deployed
Ceph cluster.
Note
This document focuses on changing the failure domain from a smaller
to a wider one, for example, from host to rack. Using the same
instruction, you can also move the failure domain from a wider to a smaller one.
Caution
Data movement implies the Ceph cluster rebalancing that may impact
cluster performance, depending on the cluster size.
High-level overview of the procedure includes the following steps:
Set correct labels on the nodes.
Create the new bucket hierarchy.
Move nodes to new buckets.
Modify the CRUSH rules.
Add the manual changes to the KaaSCephCluster spec.
Verify that the Ceph cluster has enough space for multiple copies of data
to migrate. Mirantis highly recommends that the Ceph cluster has a minimum
of 25% of free space for the procedure to succeed.
Note
The migration procedure implies data movement and optional
modification of CRUSH rules that cause a large amount of data (depending
on the cluster size) to be first copied to a new location in the Ceph
cluster before data removal.
Create a backup of the current KaaSCephCluster object from the managed
namespace of the management cluster:
This procedure contains an example of moving failure domains of all pools from
host to rack. Using the same instruction, you can migrate pools from
other types of failure domains, migrate pools separately, and so on.
To migrate Ceph pools from one failure domain to another:
Set the required CRUSH topology in the KaaSCephCluster object for each
defined node. For details on the crush parameter, see Node parameters.
Setting the CRUSH topology to each node causes the Ceph Controller
to set proper Kubernetes labels on the nodes.
Example of adding the rack CRUSH topology key for each
node in the nodes section
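A possible sketch, assuming that the nodes section of the KaaSCephCluster spec
accepts a crush map with a rack topology key; the node and rack names are
illustrative:
spec:
  cephClusterSpec:
    nodes:
      kaas-node-1:
        crush:
          rack: rack-1
      kaas-node-2:
        crush:
          rack: rack-2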
Erasure-coded pools require a different number of buckets to
store data. Instead of the number of replicas in replicated
pools, erasure-coded pools require the coding chunks + data chunks
number of buckets to exist in the Ceph cluster. For example, if an
erasure-coded pool has 2 coding chunks and 2 data chunks configured,
then the pool requires 4 different buckets, for example, 4 racks,
to store data.
Obtain the current parameters of the erasure-coded profile:
ceph osd erasure-code-profile get <ecProfile>
In the profile, add the new bucket type as the failure domain using
the crush-failure-domain parameter:
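For example, a possible command, assuming a new profile with 2 data and
2 coding chunks:
ceph osd erasure-code-profile set <newEcProfile> k=2 m=2 crush-failure-domain=rack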
Erasure-coded profiles cannot be renamed, so the old profiles will
not be removed automatically during pools cleanup. Remove them
manually, if needed.
Exit the ceph-tools pod.
In the management cluster, update the KaaSCephCluster object by setting
the failureDomain:rack parameter for each pool. The configuration from
the Rook perspective must match the manually created configuration.
For example:
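A possible sketch of a pool definition with the updated failure domain; the
pool name and device class are illustrative:
spec:
  cephClusterSpec:
    pools:
    - name: kubernetes
      deviceClass: hdd
      replicated:
        size: 3
      failureDomain: rack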
Performance testing affects the overall Ceph cluster performance.
Do not run it unless you are sure that user load will not be affected.
This section describes how to configure periodic Ceph performance testing using
Kubernetes batch or cron jobs that execute a
fio process in a
separate container with a connection to the Ceph cluster. The test results can
then be stored in a persistent volume attached to the container.
Ceph performance testing is managed by the KaaSCephOperationRequest CR that
creates separate CephPerfTestRequest requests to handle the test run. Once
you configure the perfTest section of the KaaSCephOperationRequest
spec, it propagates to CephPerfTestRequest on the managed cluster in the
ceph-lcm-mirantis namespace. You can create a performance test for a single
run or for scheduled runs.
Performance testing affects the overall Ceph cluster performance.
Do not run it unless you are sure that user load will not be affected.
This section describes how to create a Ceph performance test request through
the KaaSCephOperationRequest CR.
To create a Ceph performance test request:
Create an RBD image with the required parameters. For example, run the
following command in ceph-tools-container to allow execution of the
perftest example below on a managed cluster:
Substitute <ceph-tools-pod> with the ceph-tools Pod ID,
<pool_name> and <image_name> with pool and image names, and
specify the size. In the example below, mirablock-k8s-block-hdd is used
as pool name and tests as image name:
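A possible command, assuming a 10 GiB image size:
kubectl -n rook-ceph exec -it <ceph-tools-pod> -- rbd create mirablock-k8s-block-hdd/tests --size 10G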
Substitute <managedKubeconfig> with the managed cluster kubeconfig
and <name> with the KaaSCephOperationRequest metadata.name, for
example, test-perf-req.
Optional. Remove the KaaSCephOperationRequest. Removal of
KaaSCephOperationRequest also removes the CephPerfTestRequest CR
propagated to the managed cluster.
This section describes the KaaSCephOperationRequest CR specification used
to automatically create a CephPerfTestRequest request. For the procedure
workflow, see Enable periodic Ceph performance testing.
Spec of the KaaSCephOperationRequest perfTest high-level parameters
Parameter
Description
perfTest
Describes the definition for the CephPerfTestRequest spec. For
details on the perfTest parameters, see the tables below.
kaasCephCluster
Defines the KaaSCephCluster resource on which the KaaSCephOperationRequest
depends. Use the kaasCephCluster parameter if the name or project
of the corresponding Container Cloud cluster differs from the default
one:
Defines the cluster on which the KaaSCephOperationRequest depends.
Use the k8sCluster parameter if the name or project of the
corresponding Container Cloud cluster differs from the default one:
spec:
  k8sCluster:
    name: kaas-mgmt
    namespace: default
If you omit this parameter, ceph-kcc-controller will set it
automatically.
A list of command arguments for a performance test execution. For all
available parameters, see fio documentation.
Note
Performance test results will be saved on a PVC if the test
run parameters contain an argument to save to a file. Otherwise, test
results will be saved only as Pod logs. For example, for the default
fio image, use the --output=/results/<fileName> option to
redirect to a file that will be saved on the attached PVC.
Configuring a mount point is not supported.
command
Optional. Entrypoint command to run performance test in the container.
If the performance image is updated, you may also update the command.
By default, equals the image entry point.
image
Container image to use for jobs. By default, vineethac/fio_image.
Mirantis recommends using the default fio image as it supports
a multitude of I/O engines.
For details, see the fio man page.
periodic
Configuration of the performance test runs as periodic jobs. Leave empty
if a single run is required. For details, see
Ceph performance periodic parameters.
saveResultOnPvc
Option that enables saving of the performance test results on a PVC.
Contains the following fields:
pvcName - PVC name to use. If not specified, a PVC name for the
performance test will be created automatically. Namespace is static
and equals rook-ceph.
pvcStorageClass - StorageClass to use for PVC. If not specified,
the default storage class is used.
pvcSize - PVC size, defaults to 10Gi.
preserveOnDelete - PVC preservation after removal of the performance
test.
This section describes the status.perfTestStatus fields of the
KaaSCephOperationRequest CR that you can use to check the status of a Ceph
performance test request.
Note
Performance test results will be saved on PVC if the test run
parameters contain the saveResultOnPvc option. Otherwise, test
results will be saved only as Pod logs. For details, see
Ceph performance test parameters.
Status of the KaaSCephOperationRequest high-level parameters
Status of the KaaSCephOperationRequest perfTestStatus parameters
Parameter
Description
phase
Describes the current request phase:
Pending - the request is created and placed in the request queue.
Scheduling - the performance test is handled, waiting for a Pod to
be scheduled for the run.
WaitingNextRun - the performance test is waiting for the next run
of the periodic job.
Running - the performance test is executing.
Finished - the performance test executed successfully.
Suspended - the performance test is suspended. Only for periodic
jobs.
Failed - the performance test failed.
LastStartTime
The last start time of the performance test execution.
LastDurationTime
The duration of the last successful performance test.
LastJobStatus
The execution status of the last performance test.
messages
Issues or warnings found during the performance test run.
results
Location of the performance test result. Contains the following fields:
perftestReference - reference to the job or cron job with the
performance test run.
referenceNamespace - namespace of the job or cron job with the
performance test run.
storedOnPvc - location of the performance test results on a PVC with
pvcName in pvcNamespace if the test run parameters contain the
saveResultOnPvc option.
statusHistory
History of statuses and timings for cron jobs:
StartTime - start time of the previous performance test
JobStatus - last status of the performance test
DurationTime - last duration of the performance test
Messages - issues that occurred during the previous performance test
This section describes how to configure StackLight in your Mirantis OpenStack
for Kubernetes deployment and includes the description of StackLight parameters
and their verification.
This section describes the StackLight configuration keys that you can specify
in the values section to change StackLight settings as required. Prior to
making any changes to StackLight configuration, perform the steps described in
StackLight configuration procedure.
After changing StackLight configuration, verify the changes as described in
Verify StackLight after configuration.
Important
Some parameters are marked as mandatory. Failure to specify
values for such parameters causes the Admission Controller to reject cluster
creation.
This section describes the OpenStack-related StackLight configuration keys.
For MOSK cluster configuration keys, see
MOSK cluster configuration parameters.
External FQDN used to communicate with OpenStack services for
certificates monitoring. The option is deprecated, use
openstack.externalFQDNs.enabled instead.
https://os.ssl.mirantis.net/
openstack.externalFQDNs.enabled (bool)
External FQDN used to communicate with OpenStack services. Used for
certificates monitoring. Set to false by default.
true or false
openstack.insecure (string)
Defines whether to verify the trust chain of the OpenStack endpoint SSL
certificates during monitoring.
Specifies the interval of metrics gathering from the OpenStack API. Set
to 1m by default.
1m, 3m
openstack.telegraf.insecure (bool)
Enables or disables the server certificate chain and host name
verification. Set to true by default.
true or false
openstack.telegraf.skipPublicEndpoints (bool)
Enables or disables HTTP probes for public endpoints from the OpenStack
service catalog. Set to false by default, meaning that Telegraf
verifies all endpoints from the OpenStack service catalog, including
the public, admin, and internal endpoints.
Available since MOSK 23.3. Defines the timeout of
the tungstenfabric-exporter client requests. Set to 5s by default.
tungstenFabricMonitoring:
  exportersTimeout: "5s"
tungstenFabricMonitoring.analyticsEnabled (bool)
Available since MOSK 24.1. Enables or disables
monitoring of the Tungsten Fabric analytics services.
In MOSK 24.1, defaults to true.
Since MOSK 24.2, the default value is set
automatically based on the real state of the Tungsten Fabric analytics
services (enabled or disabled) in the Tungsten Fabric cluster.
Defines custom alerts. Also, modifies or disables existing alert
configurations. For the list of predefined alerts, see StackLight alerts.
While adding or modifying alerts, follow the Alerting rules.
customAlerts:
# To add a new alert:
- alert: ExampleAlert
  annotations:
    description: Alert description
    summary: Alert summary
  expr: example_metric > 0
  for: 5m
  labels:
    severity: warning
# To modify an existing alert expression:
- alert: AlertmanagerFailedReload
  expr: alertmanager_config_last_reload_successful == 5
# To disable an existing alert:
- alert: TargetDown
  enabled: false
An optional field enabled is accepted in the alert body to disable
an existing alert by setting to false. All fields specified using
the customAlerts definition override the default predefined
definitions in the charts’ values.
On the managed clusters with limited Internet access, proxy is required for
StackLight components that use HTTP and HTTPS and are disabled by default but
need external access if enabled, for example, for the Salesforce integration
and Alertmanager notifications external rules.
Disables or enables alert inhibition rules. If enabled, Alertmanager
decreases alert noise by suppressing dependent alerts notifications to
provide a clearer view on the cloud status and simplify troubleshooting.
Enabled by default. For details, see Alert dependencies. For
details on inhibition rules, see Prometheus documentation.
On the managed clusters with limited Internet access, proxy is required for
StackLight components that use HTTP and HTTPS and are disabled by default but
need external access if enabled. The Microsoft Teams integration depends on the
Internet access through HTTPS.
Key
Description
Example values
alertmanagerSimpleConfig.msteams.enabled (bool)
Enables or disables Alertmanager integration with Microsoft Teams.
Requires a set up Microsoft Teams channel and a channel connector. Set
to false by default.
true or false
alertmanagerSimpleConfig.msteams.url (string)
Defines the URL of an Incoming Webhook connector of a
Microsoft Teams channel. For details about channel connectors, see
Microsoft documentation.
On the managed clusters with limited Internet access, proxy is required for
StackLight components that use HTTP and HTTPS and are disabled by default but
need external access if enabled. The Salesforce integration depends on the
Internet access through HTTPS.
Key
Description
Example values
clusterId (string)
Unique cluster identifier
clusterId="<ClusterProject>/<ClusterName>/<UID>",
generated for each cluster using Cluster Project,
Cluster Name, and cluster UID, separated by a slash. Used
for both sf-notifier and sf-reporter services.
The clusterId is automatically defined for each cluster.
Do not set or modify it manually.
Prior to configuring the integration with ServiceNow, perform the
following prerequisite steps using the ServiceNow documentation of the
required version.
In a new or existing Incident table, add the Alert ID field
as described in Add fields to a table.
To avoid alerts duplication, select Unique.
Create an Access Control List (ACL) with read/write permissions for the
Incident table as described in Securing table
records.
Enables or disables Alertmanager integration with ServiceNow. Set to
false by default. Requires a set up ServiceNow account and
compliance with the Incident table requirements above.
true or false
alertmanagerSimpleConfig.serviceNow (map)
Defines the ServiceNow parameters and credentials for integration with
Alertmanager:
incident_table - name of the table created in ServiceNow. Do not
confuse with the table label.
api_version - version of the ServiceNow HTTP API. By default,
v1.
alert_id_field - name of the unique string field configured in
ServiceNow to hold Prometheus alert IDs. Do not confuse with the table
label.
auth.instance - URL of the instance.
auth.username - name of the ServiceNow user account with access to
Incident table.
auth.password - password of the ServiceNow user account.
On the managed clusters with limited Internet access, proxy is required for
StackLight components that use HTTP and HTTPS and are disabled by default but
need external access if enabled. The Slack integration depends on the Internet
access through HTTPS.
Enables or disables the Watchdog alert that constantly fires as
long as the entire alerting pipeline is functional. You can use this
alert to verify that Alertmanager notifications properly flow to the
Alertmanager receivers. Set to true by default.
Specifies the approximate expected cluster size. Set to small by
default. Other possible values include medium and large.
Depending on the choice, appropriate resource limits are passed
according to the resources or resourcesPerClusterSize parameter.
Caution
Since Container Cloud 2.28.0 (Cluster releases 17.3.0 and
16.3.0), resourcesPerClusterSize is deprecated and is overridden
by the resources parameter. Therefore, use the resources
parameter instead.
The values differ by the OpenSearch and Prometheus resource limits:
small (default) - 2 CPU, 6 Gi RAM for OpenSearch, 1 CPU,
8 Gi RAM for Prometheus. Use small only for testing and evaluation
purposes with no workloads expected.
medium - 4 CPU, 16 Gi RAM for OpenSearch, 3 CPU, 16 Gi RAM
for Prometheus.
large - 8 CPU, 32 Gi RAM for OpenSearch, 6 CPU, 32 Gi RAM
for Prometheus. Set to large only in case of lack of resources for
OpenSearch and Prometheus.
Removed in Container Cloud 2.27.0 (Cluster releases 17.2.0 and 16.2.0).
Disables Grafana Image Renderer. For example, for resource-limited
environments. Enabled by default.
true or false
grafana.homeDashboard (string)
Defines the home dashboard. Set to kubernetes-cluster by default.
You can define any of the available dashboards.
Available since MCC 2.25.1 (Cluster releases 17.0.1 and 16.0.1)
Key
Description
Example values
networkPolicies.enabled (bool)
Enables or disables the Kubernetes Network Policy resource that allows
controlling network connections to and from Pods deployed in the
stackLight namespace. Enabled by default.
Adds extra namespaces to collect Kubernetes Pod logs from. Requires
logging.enabled and logging.namespaceFiltering.logs.enabled
set to true. Defines a YAML-formatted list of namespaces,
which is empty by default.
Limits the number of namespaces for Kubernetes events collection.
Disabled by default because the sysdig scanner is present on some
MOSK clusters and because cluster-scoped objects
produce events to the default namespace by default, which is not
passed to the StackLight configuration. Requires
logging.enabled set to true.
Adds extra namespaces to collect Kubernetes events from. Requires
logging.enabled and logging.namespaceFiltering.events.enabled
set to true. Defines a YAML-formatted list of namespaces,
which is empty by default.
Defines the log verbosity level for all StackLight components if not
defined using component. To use the component default log verbosity
level, leave the string empty.
trace - most verbose log messages, generates large amounts of data
debug - messages typically of use only for debugging purposes
info - informational messages describing common processes such as
service starting or stopping; can be ignored during normal system
operation but may provide additional input for investigation
warn - messages about conditions that may require attention
error - messages on error conditions that prevent normal system
operation and require action
crit - messages on critical conditions indicating that a service
is not working, working incorrectly or is unusable, requiring
immediate attention
Since Container Cloud 2.25.0 (Cluster releases 17.0.0 and 16.0.0),
the NO_SEVERITY severity label is automatically added to a log with
no severity label in the message. This enables greater control over determining
which logs Fluentd processes and which ones are skipped by mistake.
stacklightLogLevels.component (map)
Defines (overrides the default value) the log verbosity level for
any StackLight component separately. To use the component default log
verbosity, leave the string empty.
Removed in Container Cloud 2.26.0 (Cluster releases 17.1.0 and 16.1.0).
Sets the least important level of log messages to send to OpenSearch.
Requires logging.enabled set to true.
The default logging level is INFO, meaning that StackLight will
drop log messages for the lower DEBUG and TRACE levels. Levels
from WARNING to EMERGENCY require attention.
Note
The FLUENTD_ERROR logs are of special type and cannot be
dropped.
TRACE - the most verbose logs. Such level generates large amounts
of data.
DEBUG- messages typically of use only for debugging purposes.
INFO - informational messages describing common processes such as
service starting or stopping. Can be ignored during normal system
operation but may provide additional input for investigation.
NOTICE - normal but significant conditions that may require
special handling.
WARNING - messages on unexpected conditions that may require
attention.
ERROR - messages on error conditions that prevent normal system
operation and require action.
CRITICAL - messages on critical conditions indicating that a
service is not working or working incorrectly.
ALERT - messages on severe events indicating that action is needed
immediately.
EMERGENCY - messages indicating that a service is unusable.
logging.metricQueries (map)
Allows configuring OpenSearch queries for the data present in
OpenSearch. Prometheus Elasticsearch Exporter then queries the
OpenSearch database and exposes such metrics in the
Prometheus format. For details, see Create logs-based metrics.
Includes the following parameters:
indices - specifies the index pattern
interval and timeout - specify in seconds how often to send the
query to OpenSearch and how long it can last before timing out
onError and onMissing - modify the prometheus-es-exporter
behavior on query error and missing index. For details,
see Prometheus Elasticsearch Exporter.
Available since MCC 2.25.0 (Cluster releases 17.0.0 and 16.0.0)
Key
Description
Example values
logging.enforceOopsCompression
Enforces 32 GB of heap size, unless the defined memory limit allows using
50 GB of heap. Requires logging.enabled set to true.
Enabled by default. When disabled, StackLight computes heap as ⅘ of
the set memory limit for any resulting heap value. For more details,
see Tune OpenSearch performance.
Available since MCC 2.23.0 (Cluster release 11.7.0)
Key
Description
Example values
logging.externalOutputs (map)
Specifies external Elasticsearch, OpenSearch, and syslog destinations
as fluentd-logs outputs. Requires logging.enabled:true. For
configuration procedure, see Enable log forwarding to external destinations.
Available since MCC 2.23.0 (Cluster release 11.7.0)
Key
Description
Example values
logging.externalOutputSecretMounts (map)
Specifies authentication secret mounts for external log destinations.
Requires logging.externalOutputs to be enabled and a Kubernetes
secret to be created under the stacklight namespace. Contains the
following values:
secretName
Mandatory. Kubernetes secret name.
mountPath
Mandatory. Mount path of the Kubernetes secret defined in secretName.
defaultMode
Optional. Decimal number defining secret permissions, 420 by default.
Deprecated since MCC 2.23.0 (Cluster release 11.7.0)
Note
Since Container Cloud 2.23.0 (Cluster release 11.7.0),
logging.syslog is deprecated for the sake of
logging.externalOutputs. For details, see
Logging to external outputs.
Key
Description
Example values
logging.syslog.enabled (bool)
Enables or disables remote logging to syslog. Disabled by default.
Requires logging.enabled set to true. For details and
configuration example, see Enable remote logging to syslog.
true or false
logging.syslog.host (string)
Specifies the remote syslog host.
remote-syslog.svc
logging.syslog.level (string)
Removed in Container Cloud 2.26.0 (Cluster releases 17.1.0 and 16.1.0).
Specifies logging level for the syslog output.
INFO
logging.syslog.port (string)
Specifies the remote syslog port.
514
logging.syslog.packetSize (string)
Defines the packet size in bytes for the syslog logging output. Set to
1024 by default. May be useful for syslog setups allowing packet
size larger than 1 kB. Mirantis recommends that you tune this parameter
to allow sending full log lines.
1024
logging.syslog.protocol (bool)
Specifies the remote syslog protocol. Set to udp by default.
tcp or udp
logging.syslog.tls.enabled (bool)
Optional. Disabled by default. Enables or disables TLS. Use TLS only
for the TCP protocol. TLS will not be enabled if you set a protocol
other than TCP.
true or false
logging.syslog.tls.verify_mode (int)
Optional. Configures TLS verification.
0 for OpenSSL::SSL::VERIFY_NONE
1 for OpenSSL::SSL::VERIFY_PEER
2 for OpenSSL::SSL::VERIFY_FAIL_IF_NO_PEER_CERT
4 for OpenSSL::SSL::VERIFY_CLIENT_ONCE
logging.syslog.tls.certificate (string)
Defines how to pass the certificate. secret takes precedence over
hostPath.
secret - specifies the name of the secret holding the certificate.
hostPath - specifies an absolute host path to the PEM certificate.
Optional. Overrides tag_include. Sets logs by tags to exclude from the
destination output. For example, to exclude all logs with the test tag,
set tag_exclude:'/.*test.*/'.
How to obtain tags for logs
Select from the following options:
In the main OpenSearch output, use the logger field that equals the
tag.
Use logs of a particular Pod or container by following the below order,
with the first match winning:
The value of the app Pod label. For example, for
app=opensearch-master, use opensearch-master as the log tag.
The value of the k8s-app Pod label.
The value of the app.kubernetes.io/name Pod label.
If a release_group Pod label exists and the component Pod label
starts with app, use the value of the component label as the tag.
Otherwise, the tag is the application label joined to the component
label with a -.
The name of the container from which the log is taken.
The values for tag_exclude and tag_include are placed into
<match> directives of Fluentd and only accept regex types that are
supported by the <match> directive of Fluentd. For details, refer to the
Fluentd official documentation.
'{fluentd-logs,systemd}'
tag_include (string)
Since MCC 2.23.0 (11.7.0)
Optional. Is overridden by tag_exclude. Sets logs by tags to include to
the destination output. For example, to include all logs with the auth
tag, set tag_include:'/.*auth.*/'.
Enables or disables HTTP endpoints monitoring. If enabled, the
monitoring tool performs the probes against the defined endpoints every
15 seconds. Set to false by default.
Defines the directory path with external endpoints certificates on host.
/etc/ssl/certs/
externalEndpointMonitoring.domains (slice)
Defines the list of HTTP endpoints to monitor. The endpoints must
successfully respond to a liveness probe. For success, a request to a
specific endpoint must result in a 2xx HTTP response code.
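For example, a minimal sketch; the enabled key name and the endpoint URL are
assumptions for illustration:
externalEndpointMonitoring:
  enabled: true
  domains:
  - https://prometheus.io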
Enables or disables StackLight to monitor and alert on the expiration
date of the TLS certificate of an HTTPS endpoint. If enabled, the
monitoring tool performs the probes against the defined endpoints every
hour. Set to false by default.
true or false
sslCertificateMonitoring.domains (slice)
Defines the list of HTTPS endpoints to monitor the certificates from.
On the clusters that run large-scale workloads, workload monitoring
generates a big amount of resource-consuming metrics. To prevent
generation of excessive metrics, you can disable workload monitoring in
the StackLight metrics and monitor only the infrastructure.
The metricFilter parameter enables the cAdvisor (Container
Advisor) and kubeStateMetrics metric ingestion filters for
Prometheus. Set to false by default. If set to true, you can
define the namespaces to which the filter will apply. The parameter is
designed for managed clusters.
Defines the NodeSelector to use for the most of StackLight pods
(except some pods that refer to DaemonSets) if the NodeSelector
of a component is not defined.
default:
  role: stacklight
nodeSelector.component (map)
Defines the NodeSelector to use for particular StackLight component
pods. Overrides nodeSelector.default.
Removed in Container Cloud 2.26.0 (Cluster releases 17.1.0 and 16.1.0).
Specifies the retention time per index. Includes the following parameters:
logstash - specifies the logstash-* index retention time.
events - specifies the kubernetes_events-* index retention
time.
notifications - specifies the notification-* index retention
time.
The allowed values include integers (days) and numbers with suffixes:
y, m, w, d, h, including capital letters.
By default, values set in elasticsearch.logstashRetentionTime are
used. However, the elasticsearch.retentionTime parameters, if
defined, take precedence over elasticsearch.logstashRetentionTime.
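For example, a possible sketch of per-index retention settings using suffixed
values; the values are illustrative only:
elasticsearch:
  retentionTime:
    logstash: 5d
    events: 1w
    notifications: 1m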
Removed in Container Cloud 2.26.0 (Cluster releases 17.1.0 and 16.1.0).
Defines the OpenSearch (Elasticsearch) logstash-* index retention
time in days. The logstash-* index stores all logs gathered from
all nodes and containers. Set to 1 by default.
Note
Due to the known issue 27732-2,
a custom setting for this parameter is dismissed during cluster deployment
and changes to one day (default). Refer to the known issue description
for the affected Cluster releases and available workaround.
Specifies the OpenSearch (Elasticsearch) PVC(s) size. The number of PVCs
depends on the StackLight database mode. For HA, three PVCs will be
created, each of the size specified in this parameter. For non-HA, one
PVC of the specified size.
Important
You cannot modify this parameter after cluster creation.
Note
Due to the known issue 27732-1,
that is fixed in Container Cloud 2.22.0 (Cluster releases 11.6.0 and 12.7.0),
the OpenSearch PVC size configuration is dismissed during a cluster
deployment. Refer to the known issue description for affected
Cluster releases and available workarounds.
Available since Container Cloud 2.26.0 (Cluster releases 17.1.0 and 16.1.0).
Optional. Specifies the number of gigabytes that is exclusively available
for the OpenSearch data.
Since Container Cloud 2.29.0 (Cluster releases 17.4.0 and 16.4.0),
defines the ceiling for storage-based retention, though only a portion of
this storage is available for indices, depending on the total size
and cluster configuration.
Before Container Cloud 2.29.0 (Cluster releases 17.3.0, 16.3.0, or earlier),
defines the ceiling for storage-based retention where 80% of the defined value
is assumed as available disk space for normal OpenSearch node functioning.
If not set (by default), the number of gigabytes from
elasticsearch.persistentVolumeClaimSize is used.
This parameter is useful in the following cases:
The real storage behind the volume is shared between multiple consumers.
As a result, OpenSearch cannot use all elasticsearch.persistentVolumeClaimSize.
The real volume size is bigger than elasticsearch.persistentVolumeClaimSize.
As a result, OpenSearch can use more than elasticsearch.persistentVolumeClaimSize.
Additional configuration for opensearch.yml that allows setting
various OpenSearch parameters, including logging settings, node watermarks,
and other cluster-level configurations.
Since Container Cloud 2.29.0 and MOSK 25.1, by default,
StackLight manages watermarks efficiently (low/high/flood: 150/100/50 GB).
If .extraConfig sets any watermark, StackLight stops managing them.
In this case, explicitly set all watermarks using absolute values instead of
percentages to prevent issues. While percentages are accepted, they may cause
unexpected behavior, especially in clusters that use LVP as a storage
provisioner, where OpenSearch shares storage with other components.
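For example, a sketch that sets all three watermarks explicitly with absolute values,
mirroring the default 150/100/50 GB thresholds mentioned above; the elasticsearch parent
key for .extraConfig is assumed:
elasticsearch:
  extraConfig:
    cluster.routing.allocation.disk.watermark.low: 150gb
    cluster.routing.allocation.disk.watermark.high: 100gb
    cluster.routing.allocation.disk.watermark.flood_stage: 50gb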
Defines the minimum amount of time for Prometheus to wait before
resending an alert to Alertmanager. Passed to the
--rules.alert.resend-delay flag. Set to 2m by default.
2m, 90s
prometheusServer.alertsCommonLabels (dict)
Available since Container Cloud 2.26.0 (Cluster releases 17.1.0 and 16.1.0).
Defines the list of labels to be injected to firing alerts while
they are sent to Alertmanager. Empty by default.
The following labels are reserved for internal purposes and cannot
be overridden: cluster_id, service, severity.
Caution
When new labels are injected, Prometheus sends alert updates
with a new set of labels, which can potentially cause Alertmanager
to have duplicated alerts for a short period of time if the cluster
currently has firing alerts.
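For example, to attach a custom label to every firing alert (the label name and value
are hypothetical; do not use the reserved labels listed above):
prometheusServer:
  alertsCommonLabels:
    environment: staging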
Specifies the Prometheus PVC(s) size. The number of PVCs depends on the
StackLight database mode. For HA, three PVCs will be created,
each of the size specified in this parameter. For non-HA, one PVC of the
specified size.
Important
You cannot modify this parameter after cluster creation.
prometheusServer:
  persistentVolumeClaimSize: 16Gi
prometheusServer.queryConcurrency (string)
Available since Container Cloud 2.24.0 (Cluster release 14.0.0).
Defines the number of concurrent queries limit. Passed to the
--query.max-concurrency flag. Set to 20 by default.
25
prometheusServer.retentionSize (string)
Defines the Prometheus database retention size. Passed to the
--storage.tsdb.retention.size flag. Set to 15GB by default.
15GB, 512MB
prometheusServer.retentionTime (string)
Defines the Prometheus database retention period. Passed to the
--storage.tsdb.retention.time flag. Set to 15d by default.
Specifies a set of custom Blackbox Exporter modules. For details, see
Blackbox Exporter configuration: module.
The http_2xx, http_2xx_verify, http_openstack,
http_openstack_insecure, tls, tls_verify names are reserved
for internal usage and any overrides will be discarded.
Specifies the offset to subtract from timeout in seconds
(--timeout-offset), upper bounded by 5.0 to comply with the built-in
StackLight functionality. If nothing is specified, the Blackbox
Exporter default value is used. For example, for Blackbox Exporter
v0.19.0, the default value is 0.5.
Defines custom Prometheus scrape configurations. For details, see
Prometheus documentation: scrape_config.
The names of default StackLight scrape configurations, which you can
view in the Status -> Targets tab of the Prometheus web UI,
are reserved for internal usage and any overrides will be discarded.
Therefore, provide unique names to avoid overrides.
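A sketch of a custom scrape configuration, assuming customScrapeConfigs accepts a map
keyed by a unique scrape job name; the job name and target address are hypothetical:
prometheusServer:
  customScrapeConfigs:
    my-exporter:
      metrics_path: /metrics
      static_configs:
        - targets:
            - my-exporter.example.svc:9100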
Available since Container Cloud 2.24.0 (Cluster release 14.0.0)
Key
Description
Example values
metricsFiltering.enabled (bool)
Configuration for managing Prometheus metrics filtering. When enabled
(default), only actively used and explicitly white-listed metrics get
scraped by Prometheus.
prometheusServer:
  metricsFiltering:
    enabled: true
metricsFiltering.extraMetricsInclude (map)
List of extra metrics to whitelist, which are dropped by default.
Contains the following parameters:
<jobname> - scraping job name as a key for extra white-listed
metrics to add under the key. For the list of job names, see
White list of Prometheus scrape jobs.
If a job name is not present in this list, its target metrics are not
dropped and are collected by Prometheus by default.
You can also use group key names to add metrics to more than one job
using _group-<keyname>.
The following list combines jobs by groups:
The prometheus-coredns job from the
go-collector-metrics and process-collector-metrics groups
is removed in Container Cloud 2.25.0 (Cluster releases 17.0.0 and 16.0.0).
<listofmetricstocollect> - extra metrics of <jobname>
to be white-listed.
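For example, to whitelist an extra metric for one of the scrape jobs; the job name must
be present in the white list referenced above, and the metric shown is illustrative only:
prometheusServer:
  metricsFiltering:
    extraMetricsInclude:
      kubernetes-nodes-cadvisor:
        - container_network_receive_errors_total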
Excludes RegExp-specified network devices from monitoring. The number of
network interface-related metrics is significant and may cause extended
Prometheus RAM usage in big clusters. Therefore, Prometheus Node
Exporter collects information only about a basic set of interfaces (both
host and container) and excludes the following interfaces from monitoring:
veth/cali - the host-side part of the container-host Ethernet
tunnel
o-hm0 - the OpenStack Octavia management interface for
communication with the amphora machine
tap, qg-, qr-, ha- - the Open vSwitch virtual bridge
ports
br-(ex|int|tun) - the Open vSwitch virtual bridges
docker0, br- - the Docker bridge (master for the veth
interfaces)
ovs-system - the Open vSwitch interface (mapping interfaces to
bridges)
To enable the collection of information for the interfaces above, edit the list
of excluded devices as needed.
Enables Node Exporter collectors. For a list of available collectors,
see Node Exporter Collectors. The
following collectors are enabled by default in StackLight:
Prometheus Relay is set up as an endpoint in the Prometheus
datasource in Grafana. Therefore, all requests from Grafana are sent to
Prometheus through Prometheus Relay. If Prometheus Relay reports request
timeouts or exceeds the response size limits, you can configure the
parameters below. In this case, Prometheus Relay resource limits may also
require tuning.
Key
Description
Example values
prometheusRelay.clientTimeout (string)
Specifies the client timeout in seconds. If empty, defaults to a value
determined by the cluster size: 10 for small, 30 for medium,
60 for large.
Note
The cluster size parameters are available since Container
Cloud 2.24.0 (Cluster release 14.0.0).
10
prometheusRelay.responseLimitBytes (string)
Specifies the response size limit in bytes. If empty, defaults to a
value determined by the cluster size: 6291456 for small,
18874368 for medium, 37748736 for large.
Note
The cluster size parameters are available since Container
Cloud 2.24.0 (Cluster release 14.0.0).
Skip this step if your remote server does not have authorization.
Defines additional mounts for remoteWrites secrets. Secret objects
with credentials needed to access the remote endpoint must be
precreated in the stacklight namespace. For details, see
Kubernetes Secrets.
Note
To create more than one file for the same remote write
endpoint, for example, to configure TLS connections,
use a single secret object with multiple keys in the data field.
Using the following example configuration, two files will be created,
cert_file and key_file:
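A sketch of such a configuration, assuming each remoteWriteSecretMounts entry takes a
secretName and a mountPath; the secret name, mount path, and certificate data are
placeholders:
prometheusServer:
  remoteWriteSecretMounts:
    - secretName: remote-write-tls
      mountPath: /etc/config/remote-write
apiVersion: v1
kind: Secret
metadata:
  name: remote-write-tls
  namespace: stacklight
data:
  cert_file: <base64-encoded certificate>
  key_file: <base64-encoded private key>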
Defines the configuration of a custom remote_write
endpoint for sending Prometheus samples.
Note
If the remote server uses authorization, first create
secret(s) in the stacklight namespace and mount them to
Prometheus through prometheusServer.remoteWriteSecretMounts. Then
define the created secret in the authorization field.
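For illustration, a possible remote_write definition using standard Prometheus
remote_write fields and the files mounted in the previous example; the endpoint URL
is hypothetical:
prometheusServer:
  remoteWrites:
    - url: https://metrics-receiver.example.com/api/v1/write
      tls_config:
        cert_file: /etc/config/remote-write/cert_file
        key_file: /etc/config/remote-write/key_file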
Provides the capability to override the default resource requests or
limits for any StackLight component for the predefined cluster sizes.
Caution
Since Container Cloud 2.28.0 (Cluster releases 17.3.0 and
16.3.0), resourcesPerClusterSize is deprecated and is overridden
by the resources parameter. Therefore, use the resources
parameter instead.
StackLight components for resource limits customization
Note
The below list has the
componentName:<podNamePrefix>/<containerName> format.
alerta: alerta/alerta
alertmanager: prometheus-alertmanager/prometheus-alertmanager
alertmanagerWebhookServicenow: alertmanager-webhook-servicenow/alertmanager-webhook-servicenow
blackboxExporter: prometheus-blackbox-exporter/blackbox-exporter
elasticsearch: opensearch-master/opensearch  # Deprecated
elasticsearchCurator: elasticsearch-curator/elasticsearch-curator
elasticsearchExporter: elasticsearch-exporter/elasticsearch-exporter
fluentdElasticsearch: fluentd-logs/fluentd-logs  # Deprecated
fluentdLogs: fluentd-logs/fluentd-logs
fluentdNotifications: fluentd-notifications/fluentd
grafana: grafana/grafana
grafanaRenderer: grafana/grafana-renderer  # Removed in MCC 2.27.0 (17.2.0 and 16.2.0)
iamProxy: iam-proxy/iam-proxy  # Deprecated
iamProxyAlerta: iam-proxy-alerta/iam-proxy
iamProxyAlertmanager: iam-proxy-alertmanager/iam-proxy
iamProxyGrafana: iam-proxy-grafana/iam-proxy
iamProxyKibana: iam-proxy-kibana/iam-proxy  # Deprecated
iamProxyOpenSearchDashboards: iam-proxy-kibana/iam-proxy
iamProxyPrometheus: iam-proxy-prometheus/iam-proxy
kibana: opensearch-dashboards/opensearch-dashboards  # Deprecated
kubeStateMetrics: prometheus-kube-state-metrics/prometheus-kube-state-metrics
libvirtExporter: prometheus-libvirt-exporter/prometheus-libvirt-exporter
metricCollector: metric-collector/metric-collector
metricbeat: metricbeat/metricbeat
nodeExporter: prometheus-node-exporter/prometheus-node-exporter
opensearch: opensearch-master/opensearch
opensearchDashboards: opensearch-dashboards/opensearch-dashboards
patroniExporter: patroni/patroni-patroni-exporter
pgsqlExporter: patroni/patroni-pgsql-exporter
postgresql: patroni/patroni
prometheusEsExporter: prometheus-es-exporter/prometheus-es-exporter
prometheusMsTeams: prometheus-msteams/prometheus-msteams
prometheusRelay: prometheus-relay/prometheus-relay
prometheusServer: prometheus-server/prometheus-server
sfNotifier: sf-notifier/sf-notifier
sfReporter: sf-reporter/sf-reporter
stacklightHelmControllerController: stacklight-helm-controller/controller
telegrafDockerSwarm: telegraf-docker-swarm/telegraf-docker-swarm
telegrafDs: telegraf-ds-smart/telegraf-ds-smart  # Deprecated
telegrafDsSmart: telegraf-ds-smart/telegraf-ds-smart
telegrafOpenstack: telegraf-openstack/telegraf-openstack  # replaced with osdpl-exporter in 24.1
telegrafS: telegraf-docker-swarm/telegraf-docker-swarm  # Deprecated
telemeterClient: telemeter-client/telemeter-client
telemeterServer: telemeter-server/telemeter-server
telemeterServerAuthServer: telemeter-server/telemeter-server-authorization-server
tfControllerExporter: prometheus-tf-controller-exporter/prometheus-tungstenfabric-exporter
tfVrouterExporter: prometheus-tf-vrouter-exporter/prometheus-tungstenfabric-exporter
Provides the capability to override the containers resource requests or
limits for any StackLight component.
StackLight components for resource limits customization
Note
The below list has the
componentName:<podNamePrefix>/<containerName> format.
alerta: alerta/alerta
alertmanager: prometheus-alertmanager/prometheus-alertmanager
alertmanagerWebhookServicenow: alertmanager-webhook-servicenow/alertmanager-webhook-servicenow
blackboxExporter: prometheus-blackbox-exporter/blackbox-exporter
elasticsearch: opensearch-master/opensearch  # Deprecated
elasticsearchCurator: elasticsearch-curator/elasticsearch-curator
elasticsearchExporter: elasticsearch-exporter/elasticsearch-exporter
fluentdElasticsearch: fluentd-logs/fluentd-logs  # Deprecated
fluentdLogs: fluentd-logs/fluentd-logs
fluentdNotifications: fluentd-notifications/fluentd
grafana: grafana/grafana
grafanaRenderer: grafana/grafana-renderer  # Removed in MCC 2.27.0 (17.2.0 and 16.2.0)
iamProxy: iam-proxy/iam-proxy  # Deprecated
iamProxyAlerta: iam-proxy-alerta/iam-proxy
iamProxyAlertmanager: iam-proxy-alertmanager/iam-proxy
iamProxyGrafana: iam-proxy-grafana/iam-proxy
iamProxyKibana: iam-proxy-kibana/iam-proxy  # Deprecated
iamProxyOpenSearchDashboards: iam-proxy-kibana/iam-proxy
iamProxyPrometheus: iam-proxy-prometheus/iam-proxy
kibana: opensearch-dashboards/opensearch-dashboards  # Deprecated
kubeStateMetrics: prometheus-kube-state-metrics/prometheus-kube-state-metrics
libvirtExporter: prometheus-libvirt-exporter/prometheus-libvirt-exporter
metricCollector: metric-collector/metric-collector
metricbeat: metricbeat/metricbeat
nodeExporter: prometheus-node-exporter/prometheus-node-exporter
opensearch: opensearch-master/opensearch
opensearchDashboards: opensearch-dashboards/opensearch-dashboards
patroniExporter: patroni/patroni-patroni-exporter
pgsqlExporter: patroni/patroni-pgsql-exporter
postgresql: patroni/patroni
prometheusEsExporter: prometheus-es-exporter/prometheus-es-exporter
prometheusMsTeams: prometheus-msteams/prometheus-msteams
prometheusRelay: prometheus-relay/prometheus-relay
prometheusServer: prometheus-server/prometheus-server
sfNotifier: sf-notifier/sf-notifier
sfReporter: sf-reporter/sf-reporter
stacklightHelmControllerController: stacklight-helm-controller/controller
telegrafDockerSwarm: telegraf-docker-swarm/telegraf-docker-swarm
telegrafDs: telegraf-ds-smart/telegraf-ds-smart  # Deprecated
telegrafDsSmart: telegraf-ds-smart/telegraf-ds-smart
telegrafOpenstack: telegraf-openstack/telegraf-openstack  # replaced with osdpl-exporter in 24.1
telegrafS: telegraf-docker-swarm/telegraf-docker-swarm  # Deprecated
telemeterClient: telemeter-client/telemeter-client
telemeterServer: telemeter-server/telemeter-server
telemeterServerAuthServer: telemeter-server/telemeter-server-authorization-server
tfControllerExporter: prometheus-tf-controller-exporter/prometheus-tungstenfabric-exporter
tfVrouterExporter: prometheus-tf-vrouter-exporter/prometheus-tungstenfabric-exporter
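A sketch of such an override for alerta, using the componentName keys from the list above
and standard Kubernetes requests/limits notation (the values are discussed below):
resources:
  alerta:
    requests:
      cpu: "50m"
      memory: "200Mi"
    limits:
      memory: "500Mi"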
Using the example above, each pod in the alerta service will be
requesting 50 millicores of CPU and 200 MiB of memory, while being
hard-limited to 500 MiB of memory usage. Each configuration key is
optional.
Note
The logging mechanism performance depends on the cluster log
load. If the cluster components send an excessive amount of logs, the
default resource requests and limits for fluentdLogs (or
fluentdElasticsearch) may be insufficient, which may cause its
pods to be OOMKilled and trigger the KubePodCrashLooping alert.
In such a case, increase the default resource requests and limits for
fluentdLogs. For example:
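A sketch with illustrative values only; size the requests and limits according to the
actual log load of your cluster:
resources:
  fluentdLogs:
    requests:
      cpu: "250m"
      memory: "1Gi"
    limits:
      memory: "2Gi"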
On managed clusters with limited Internet access, a proxy is required for the
StackLight components that use HTTP or HTTPS, are disabled by default, and need
external access when enabled. The Salesforce reporter depends on Internet access
through HTTPS.
Key
Description
Example values
clusterId (string)
Unique cluster identifier
clusterId="<ClusterProject>/<ClusterName>/<UID>",
generated for each cluster using Cluster Project,
Cluster Name, and cluster UID, separated by a slash. Used
for both sf-reporter and sf-notifier services.
The clusterId key is automatically defined for each cluster.
Do not set or modify it manually.
In an HA StackLight setup, when highAvailabilityEnabled is set to true,
all StackLight Persistent Volumes (PVs) use the Local Volume Provisioner (LVP)
storage class so that they do not rely on dynamic provisioners such as Ceph, which
are not available in every deployment. In a non-HA StackLight setup, when no storage
class is specified, PVs use the default storage class of the cluster.
Key
Description
Example values
storage.defaultStorageClass (string)
Defines the StorageClass to use for all StackLight Persistent Volume
Claims (PVCs) if a component StorageClass is not defined using the
componentStorageClasses. To use the default storage class,
leave the string empty.
lvp, standard
storage.componentStorageClasses (map)
Defines (overrides the defaultStorageClass value) the storage class
for any StackLight component separately. To use the default storage class,
leave the string empty.
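For example, a sketch that overrides the storage class for two components; the component
keys and class names are illustrative and must match the components and storage classes
available in your cluster:
storage:
  defaultStorageClass: ""
  componentStorageClasses:
    elasticsearch: lvp
    prometheusServer: standard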
Verify StackLight configuration of an OpenStack cluster
Key
Verification procedure
externalFQDNs.enabled
openstack.insecure
In the Prometheus web UI, navigate to Status > Targets.
Verify that the blackbox-external-endpoint target contains the
configured domains (URLs).
openstack.enabled
openstack.namespace
In the Grafana web UI, verify that the OpenStack dashboards are
present and not empty.
In the Prometheus web UI, click Alerts and verify that
the OpenStack alerts are present in the list of alerts.
openstack.gnocchi.enabled
In the Grafana web UI, verify that the Gnocchi dashboard
is present and not empty. Alternatively, verify that the Gnocchi
dashboard ConfigMap is present:
Verify that authentication to ServiceNow was successful. The output
should include ServiceNow authentication successful. In case of an
authentication failure, the ServiceNowAuthFailure alert is raised.
In your ServiceNow instance, verify that the Watchdog
alert appears in the Incident table. Once the incident is
created, the pod logs should include a line similar to
Created Incident: bef260671bdb2010d7b540c6cc4bcbed.
In case of any failure:
Verify that your ServiceNow instance is not in hibernation.
Verify that the service user credentials, table name, and
alert_id_field are correct.
Verify that the ServiceNow user has access to the table with
permission to read, create, and update records.
alertmanagerSimpleConfig.slack.enabled
alertmanagerSimpleConfig.slack.api_url
alertmanagerSimpleConfig.slack.channel
alertmanagerSimpleConfig.slack.route
In the Alertmanager web UI, navigate to Status and verify
that the Config section contains the HTTP-slack receiver
and route.
blackboxExporter.customModules
Verify that your module is present in the list of modules. It can
take up to 10 minutes for the module to appear in the ConfigMap.
Review the configmap-reload container logs to verify that the
reload happened successfully. It can take up to 1 minute for the reload
to happen after the module appears in the ConfigMap.
For example, for blackboxExporter.timeoutOffset set to 0.1, the
output should include
["--config.file=/config/blackbox.yaml","--timeout-offset=0.1"].
It can take up to 10 minutes for the parameter to be populated.
ceph.enabled
In the Grafana web UI, verify that Ceph dashboards are present in the
list of dashboards and are populated with data.
In the Prometheus web UI, click Alerts and verify that
the list of alerts contains Ceph* alerts.
clusterSize
resourcesPerClusterSize (deprecated)
resources
Obtain the list of pods:
kubectl get po -n stacklight
Verify that the desired resource limits or requests are set in the
resources section of every container in the pod:
kubectl get po <pod_name> -n stacklight -o yaml
elasticsearch.logstashRetentionTime
Removed in MCC 2.26.0 (17.1.0, 16.1.0)
Verify that the unit_count parameter contains the desired number of
days:
Verify that OpenSearch, Fluentd, and OpenSearch Dashboards are present
in the list of StackLight resources. An empty output indicates that the
StackLight logging stack is disabled.
[...] 2023-07-25 09:39:33 +0000 [error]: config error file="/etc/fluentd/fluent.conf" error_class=Fluent::ConfigError error="host or host_with_port is required"
In the Prometheus web UI, navigate to
Status > Configuration.
Verify that the following fields in the metric_relabel_configs
section for the kubernetes-nodes-cadvisor and
prometheus-kube-state-metrics scrape jobs have the required
configuration:
action is set to keep or drop
regex contains a regular expression with configured namespaces
delimited by |
source_labels is set to [namespace]
mke.dockerdDataRoot
In the Prometheus web UI, navigate to Alerts and verify that
the MKEAPIDown is not false-positively firing due to the
certificate absence.
mke.enabled
In the Grafana web UI, verify that the MKE Cluster and
MKE Containers dashboards are present and not empty.
In the Prometheus web UI, navigate to Alerts and verify
that the MKE* alerts are present in the list of alerts.
nodeExporter.extraCollectorsEnabled
In the Prometheus web UI, run the following PromQL queries. The result
should not be empty.
In the Prometheus web UI, navigate to Status > Configuration.
Verify that the alerting.alert_relabel_configs section contains
the customization for common labels that you added in
prometheusServer.alertsCommonLabels during StackLight configuration.
prometheusServer.customAlerts
In the Prometheus web UI, navigate to Alerts and verify that
the list of alerts has changed according to your customization.
prometheusServer.customRecordingRules
In the Prometheus web UI, navigate to Status > Rules.
Verify that the list of Prometheus recording rules has changed
according to your customization.
prometheusServer.customScrapeConfigs
In the Prometheus web UI, navigate to Status > Targets.
Verify that the required target has appeared in the list of targets.
It may take up to 10 minutes for the change to apply.
prometheusServer.persistentVolumeClaimSize
Verify that the PVC(s) capacity equals or is higher (in case of
statically provisioned volumes) than specified:
After cron job execution (by default, at midnight server time),
obtain the Salesforce reporter pod name. The output should include
the Salesforce reporter pod name and STATUS must be
Completed.
kubectl get pods -n stacklight
Verify that Salesforce reporter successfully authenticates to
Salesforce and creates records. The output must include the
Salesforce authentication successful, Created record or
Duplicate record and Updated record lines.
kubectl logs -n stacklight <sf-reporter-pod-name>
sslCertificateMonitoring.domains
sslCertificateMonitoring.enabled
In the Prometheus web UI, navigate to Status -> Targets.
Verify that the blackbox target contains the configured domains
(URLs).
storage.componentStorageClasses
storage.defaultStorageClass
Verify that the appropriate components PVCs have been created according
to the configured StorageClass:
The following hardware recommendations and software settings apply for better
OpenSearch performance in a MOSK cluster.
To tune OpenSearch performance:
Depending on your cluster size, set the required disk and CPU size along
with memory limit and heap size.
Heap size is calculated in StackLight as ⅘ of the specified memory limit.
If the calculated heap size slightly exceeds the 32 GB threshold, memory is
wasted significantly due to the loss of Ordinary Object Pointers (OOPS)
compression, which allows storing 64-bit pointers in 32 bits.
Since Container Cloud 2.25.0 (Cluster releases 17.0.0 and 16.0.0),
to prevent this behavior, for the memory limit in the 31-50 GB range, the
heap size is set to fixed 31 GB using the enforceOopsCompression
parameter, which is enabled by default. For details, see
Logging: Enforce OOPS compression. Exceeding the range causes loss of benefit
of OOPS compression, so the ⅘ formula applies again.
OpenSearch is write-heavy, so SSD is preferable as a disk type.
Increase the vm.max_map_count kernel parameter, which limits the number of
memory map areas available to a process. Extended retention periods, which
depend on open shards, require increasing this value significantly, for
example, to 262144.
Configure swappiness because swapping significantly degrades performance. Lower
swappiness to 1, or to 0 to disable swap. For details, refer to the
Create MOSK host profiles procedure.
Example configuration:
kernelParameters:
  sysctl:
    vm.swappiness: "<value>"
Configure the kernel I/O scheduler to improve timing of disk writing
operations. Change it to one of the following options:
none - applies the FIFO queue.
mq-deadline - applies three queues: FIFO read, FIFO write, and
sorted.
Changing I/O scheduling is also possible through BareMetalHostProfile.
However, the specific implementation highly depends on the disk type used:
This section describes how to export logs from the OpenSearch Dashboards
navigation panel to the CSV format.
Caution
The log limit is set to 10 000 rows and does not take into
account the resulting file size.
Note
The following instruction describes how to export all logs from the
opensearch-master-0 node of an OpenSearch cluster.
To export logs from the OpenSearch Dashboards navigation panel to CSV:
Log in to the OpenSearch Dashboards web UI as described in
Getting access.
Navigate to the Discover page.
In the left navigation panel, select the required log index pattern from
the top drop-down menu. For example, system* for system logs
and audit* for audit logs.
In the middle top menu, click Add filter and add the required
filters. For example:
event.provider matches the opensearch-master logger
orchestrator.pod matches the opensearch-master-0 node
name
In Search field names, search for required fields to be present
in the resulting CSV file. For example:
orchestrator.pod for opensearch-master-0
message for the log message
In the right top menu:
Click Save to save the filter after naming it.
Click Reporting > Generate CSV.
When the report generation completes, download the file depending on your
browser settings.
OpenSearch Dashboards is part of the StackLight logging stack. Using the
OpenSearch Dashboards web UI, you can view the visual representation of your
OpenStack deployment notifications, logs, Kubernetes events, and other cluster
notifications related to your deployment.
Log in to the OpenSearch Dashboards web UI as described in
Getting access.
Click the required dashboard to inspect the visualizations or perform a
search:
Dashboard
Description
Notifications
Provides visualizations on the number of notifications over time per
source and severity, host, and breakdowns. The dashboard includes search.
K8s events
Provides visualizations on the number of Kubernetes events per type,
and top event-producing resources and namespaces by reason and event
type. Includes search.
System Logs
Available for clusters created since Container Cloud 2.26.0
(Cluster releases 17.1.x, 16.1.x, or later).
Provides visualizations on the number of log messages per severity,
source, and top log-producing host, namespaces, containers, and
applications. Includes search.
Caution
Due to a known issue, this dashboard does not exist in
Container Cloud 2.26.0 (Cluster releases 17.1.0 and 16.1.0).
The issue is addressed in Container Cloud 2.26.1 (Cluster releases
17.1.1 and 16.1.1). To work around the issue in 2.26.0, you can map
the fields of the logstash index to the system one and view
logs in the deprecated Logs dashboard.
For mapping details, see System index fields mapped to Logstash index fields.
Logs (deprecated in 2.26.0, 17.1.0 and 16.1.0)
Available only for clusters created before Container Cloud 2.26.0
(Cluster releases 17.0.x, 16.0.x, or earlier).
Analogous to System Logs but contains logs generated only
for the mentioned Cluster releases.
OpenSearch Dashboards provide the following search tools:
Filters
Queries
Full-text search
Filters enable you to organize the output information using the interface
tools. You can search for information by a set of indexed fields using
a variety of logical operators.
Queries enable you to construct search commands using OpenSearch query
domain-specific language (DSL) expressions. These expressions allow you to
search by the fields not included in the index.
In addition to filters and queries, you can use the Search input
field for full-text search.
In the dialog that opens, select the field of search in the
Field drop-down menu.
Select the logical operator in the Operator drop-down menu.
Type or select the filter value from the Value drop-down menu.
Create a filter using the ‘flat object’ field type
Available since MCC 2.23.0 (12.7.0 and 11.7.0)
For the orchestrator.labels field of the system and audit log indices,
you can use the flat_object field type to apply the filtering using
value or valueAndPath. For example:
Using value: to obtain all logs produced by iam-proxy, add
the following filters:
orchestrator.type that matches kubernetes
orchestrator.labels._value that matches iam-proxy
Using valueAndPath: to obtain all logs produced by the OpenSearch
cluster, add the following filters:
orchestrator.type that matches kubernetes
orchestrator.labels._valueAndPath that matches
orchestrator.labels.app=opensearch-master
Using the Grafana web UI, you can view the visual representation of the metric
graphs based on the time series databases.
Most Grafana dashboards include a
View logs in OpenSearch Dashboards link to immediately view
relevant logs in the OpenSearch Dashboards web UI. The OpenSearch Dashboards
web UI displays logs filtered using the Grafana dashboard variables, such as
the drop-downs. Once you amend the variables, wait for Grafana to generate a
new URL.
Note
Due to the known issue, the
View logs in OpenSearch Dashboards link does not work in
Container Cloud 2.26.0 (Cluster releases 17.1.0 and 16.1.0). The issue is
addressed in Container Cloud 2.26.1 (Cluster releases 17.1.1 and 16.1.1).
Caution
The Grafana dashboards that contain drop-down lists are limited
to 1000 lines. Therefore, if you require data on a specific item, use the
filter by name instead.
Note
Grafana dashboards that present node data have an additional
Node identifier drop-down menu. By default, it is set to
machine to display short names for Kubernetes nodes. To display
Kubernetes node name labels, change this option to node.
To view the Grafana dashboards:
Log in to the Grafana web UI as described in Getting access.
From the drop-down list, select the required dashboard to inspect
the status and statistics of the corresponding service in your
management or MOSK cluster:
Component
Dashboard
Description
Ceph cluster
Ceph Cluster
Provides the overall health status of the Ceph cluster, capacity,
latency, and recovery metrics.
Ceph Nodes
Provides an overview of the host-related metrics, such as the number
of Ceph Monitors, Ceph OSD hosts, average usage of resources across
the cluster, network and hosts load.
This dashboard is deprecated since Container Cloud 2.25.0 (Cluster
releases 17.0.0 and 16.0.0) and is removed in Container Cloud
2.26.0 (Cluster releases 17.1.0 and 16.1.0).
Therefore, Mirantis recommends switching to the following dashboards
in the current release:
For Ceph stats, use the Ceph Cluster dashboard.
For resource utilization, use the System dashboard,
which includes filtering by Ceph node labels, such as
ceph_role_osd, ceph_role_mon, and ceph_role_mgr.
Ceph OSDs
Provides metrics for Ceph OSDs, including the Ceph OSD read and write
latencies, distribution of PGs per Ceph OSD, Ceph OSDs and physical
device performance.
Ceph Pools
Provides metrics for Ceph pools, including the client IOPS and
throughput by pool and pools capacity usage.
Ironic
Ironic BM
Provides graphs on Ironic health, HTTP API availability, provisioned
nodes by state and installed ironic-conductor backend drivers.
Container Cloud
Clusters Overview
Represents the main cluster capacity statistics for all clusters
of a Container Cloud deployment where StackLight is installed.
Note
Due to the known issue, the
Prometheus Targets Unavailable panel of the
Clusters Overview dashboard does not display data for
managed clusters of the 11.7.0, 11.7.4, 12.5.0, and 12.7.x series
Cluster releases after update to Container Cloud 2.24.0.
Etcd
Available since Container Cloud 2.21.0 (Cluster release 11.5.0). Provides
graphs on database size, leader elections, requests duration, incoming and
outgoing traffic.
MCC Applications Performance
Available since Container Cloud 2.23.0 (Cluster release 11.7.0). Provides
information on the Container Cloud internals work based on Golang,
controller runtime, and custom metrics. You can use it to verify
performance of applications and for troubleshooting purposes.
Kubernetes resources
Kubernetes Calico
Provides metrics of the entire Calico cluster usage, including the
cluster status, host status, and Felix resources.
Kubernetes Cluster
Provides metrics for the entire Kubernetes cluster, including the
cluster status, host status, and resources consumption.
Kubernetes Containers
Provides charts showing resource consumption per deployed Pod
containers running on Kubernetes nodes.
Kubernetes Deployments
Provides information on the desired and current state of all
service replicas deployed on a Container Cloud cluster.
Kubernetes Namespaces
Provides the Pods state summary and the CPU, MEM, network, and IOPS
resources consumption per namespace.
Kubernetes Nodes
Provides charts showing resources consumption per Container Cloud
cluster node.
Kubernetes Pods
Provides charts showing resources consumption per deployed Pod.
NGINX
NGINX
Provides the overall status of the NGINX cluster and information
about NGINX requests and connections.
OpenStack
OpenStack - Overview
Provides general information on OpenStack services resources
consumption, API errors, deployed OpenStack compute nodes and block
storage usage.
OpenStack Ingress controller
Available since MOSK 23.3. Monitors the number
of requests, response times and statuses, as well as the number of
Ingress SSL certificates including expiration time and resources usage.
OpenStack Instances Availability
Available since MOSK 23.2.
Provides information about the availability of instance floating IPs
per OpenStack compute node and project. Also, enables monitoring
of probe statistics for individual instance floating IPs.
OpenStack Network IP Capacity
Available since MOSK 25.1.
Provides information about the statistics of IP address allocation for
external networks and subnets on non-Tungsten Fabric based
MOSK clusters. For configuration details, see
Start monitoring IP address capacity.
OpenStack PortProber
Available since MOSK 24.2.
Provides information about the availability of Neutron ports
per OpenStack compute node, project, and port owner.
OpenStack PortProber [Deprecated]
Available since MOSK 25.1.
Provides information about the availability of Neutron ports
per OpenStack compute node, project, and port owner. Deprecated in
favor of the OpenStack PortProber dashboard.
Use this deprecated dashboard only to access old data collected before
MOSK 25.1.
OpenStack PowerDNS
Available since MOSK 24.3.
Provides various statistics about OpenStack PowerDNS servers, such as
connections, resources, queries, rings, and errors.
OpenStack Usage Efficiency
Available since MOSK 23.3.
Provides information about requested (allocated) CPU and memory
usage efficiency on a per-project and per-flavor basis. Aims to
identify flavors that specific projects are not effectively using,
with allocations significantly exceeding actual usage. Also,
evaluates per-instance underuse for specific projects.
KPI - Provisioning
Provides provisioning statistics for OpenStack compute instances,
including graphs on VM creation results by day.
Cinder
Provides graphs on the OpenStack Block Storage service health, HTTP
API availability, pool capacity and utilization, number of created
volumes and snapshots.
Glance
Provides graphs on the OpenStack Image service health, HTTP API
availability, number of created images and snapshots.
Gnocchi
Provides panels and graphs on the Gnocchi health and HTTP API
availability.
Heat
Provides graphs on the OpenStack Orchestration service health, HTTP
API availability and usage.
Ironic OpenStack
Provides graphs on the OpenStack Bare Metal Provisioning service
health, HTTP API availability, provisioned nodes by state and
installed ironic-conductor backend drivers.
Keystone
Provides graphs on the OpenStack Identity service health, HTTP API
availability, number of tenants and users by state.
Neutron
Provides graphs on the OpenStack networking service health, HTTP API
availability, agents status and usage of Neutron L2 and L3 resources.
NGINX Ingress controller
Not recommended. Deprecated since MOSK 23.3 and
is removed in MOSK 24.1. Use
OpenStack Ingress controller instead.
Monitors the number of requests, response times and statuses, as
well as the number of Ingress SSL certificates including expiration
time and resources usage.
Nova - Availability Zones
Provides detailed graphs on the OpenStack availability zones and
hypervisor usage.
Nova - Hypervisor Overview
Provides a set of single-stat panels presenting resources usage by
host.
Nova - Instances
Provides graphs on libvirt Prometheus exporter health and resources
usage. Monitors the number of running instances and tasks and allows
sorting the metrics by top instances.
Nova - Overview
Provides graphs on the OpenStack compute services
(nova-scheduler, nova-conductor, and nova-compute)
health, as well as HTTP API availability.
Nova - Tenants
Provides graphs on CPU, RAM, disk throughput, IOPS, and space usage
and allocation and allows sorting the metrics by top tenants.
Nova - Users
Provides graphs on CPU, RAM, disk throughput, IOPS, and space usage
and allocation and allows sorting the metrics by top users.
Nova - Utilization
Provides detailed graphs on Nova hypervisor resources capacity and
consumption.
Memcached
Memcached Prometheus exporter dashboard. Monitors Kubernetes
Memcached pods and displays memory usage, hit rate, evicts and
reclaims rate, items in cache, network statistics, and commands rate.
MySQL
MySQL Prometheus exporter dashboard. Monitors Kubernetes MySQL pods,
resources usage and provides details on current connections and
database performance.
RabbitMQ [Deprecated]
Not recommended. Deprecated since MOSK 25.1.
RabbitMQ Prometheus exporter dashboard. Monitors Kubernetes RabbitMQ
pods, resources usage and provides details on cluster utilization and
performance.
Caution
This dashboard is renamed from RabbitMQ to
RabbitMQ [Deprecated] in MOSK 25.1
and will be removed in one of the following releases in favor
of the RabbitMQ Overview and RabbitMQ Erlang
dashboards.
RabbitMQ Erlang
Available since MOSK 25.1. Monitors RabbitMQ BEAM
performance, memory details, load and distribution metrics using native
Prometheus plugin metrics.
RabbitMQ Overview
Available since MOSK 25.1. Monitors RabbitMQ node
performance, resource usage, message queue, channel, and connection
statistics using native Prometheus plugin metrics.
Cassandra
Provides graphs on Cassandra clusters’ health, ongoing operations,
and resource consumption.
Kafka
Provides graphs on Kafka clusters’ and broker health, as well as
broker and topic usage.
Redis
Provides graphs on Redis clusters’ and pods’ health, connections,
command calls, and resource consumption.
Tungsten Fabric
Tungsten Fabric Controller
Provides graphs on the overall Tungsten Fabric Controller cluster
processes and usage.
Tungsten Fabric vRouter
Provides graphs on the overall Tungsten Fabric vRouter cluster
processes and usage.
ZooKeeper
Provides graphs on ZooKeeper clusters’ quorum health and resource
consumption.
StackLight
Alertmanager
Provides performance metrics on the overall health status of the
Prometheus Alertmanager service, the number of firing and resolved
alerts received for various periods, the rate of successful and
failed notifications, and the resources consumption.
OpenSearch
Provides information about the overall health status of the
OpenSearch cluster, including the resources consumption,
number of operations and their performance.
OpenSearch Indices
Provides detailed information about the state of indices,
including their size, the number and the size of segments.
Grafana
Provides performance metrics for the Grafana service, including the
total number of Grafana entities, CPU and memory consumption.
PostgreSQL
Provides PostgreSQL statistics, including read (DQL) and write (DML)
row operations, transaction and lock, replication lag and conflict,
and checkpoint statistics, as well as PostgreSQL performance metrics.
Prometheus
Provides the availability and performance behavior of the Prometheus
servers, the sample ingestion rate, and system usage statistics per
server. Also, provides statistics about the overall status and uptime
of the Prometheus service, the chunks number of the local storage
memory, target scrapes, and queries duration.
Prometheus Relay
Provides service status and resources consumption metrics.
Telemeter Server
Provides statistics and the overall health status of the Telemeter
service.
Note
Due to the known issue, the
Telemeter Client Status panel of the
Telemeter Server dashboard does not display data for
managed clusters of the 11.7.0, 11.7.4, 12.5.0, and 12.7.x series
Cluster releases after update to Container Cloud 2.24.0.
System
System
Provides a detailed resource consumption and operating system
information per Container Cloud cluster node.
Mirantis Kubernetes Engine (MKE)
MKE Cluster
Provides a global overview of an MKE cluster: statistics about
the number of the worker and manager nodes, containers, images,
Swarm services.
MKE Containers
Provides per container resources consumption metrics
for the MKE containers such as CPU, RAM, network.
Export data from Table panels of Grafana dashboards to CSV
This section describes how to export data from Table panels of
Grafana dashboards to .csv files.
Note
Grafana performs data exports for individual panels on a dashboard,
not the entire dashboard.
To export data from Table panels of Grafana dashboards to CSV:
Log in to the Grafana web UI as described in Getting access.
In the right top corner of the required Table panel, click
the kebab menu icon and select Inspect > Data.
In Data options of the Data tab, configure export
options:
This section provides an overview of the available predefined StackLight
alerts, including OpenStack, Tungsten Fabric, Container Cloud, Ceph,
StackLight, MKE, and other alerts that can contain information about both
OpenStack and MOSK clusters.
To view the alerts, use the Prometheus web UI. To view the firing alerts, use
Alertmanager or Alerta web UI.
The {{ $labels.namespace }}/{{ $labels.pod }}
MariaDB node in the {{ $labels.cluster }} cluster has
high table lock waits of {{ $value }} percentage
(more than 30).
An average of {{ $value }} evictions occurred in the Memcached
database cluster {{ $labels.cluster }} in the
{{ $labels.namespace }} namespace during the last minute.
This section describes the alerts for the OpenStack SSL certificates.
By default, these alerts are disabled. To enable them, set
openstack.externalFQDNs.enabled to true. For details, see
Configuration options for SSL certificates.
SSL certificate for an OpenStack service expires on {{ $value | humanizeTimestamp }}
Description
The SSL certificate for the OpenStack {{ $labels.namespace }}/{{ $labels.service_name }}
service endpoints expires on {{ $value | humanizeTimestamp }},
less than 10 days are left.
SSL certificate for an OpenStack service expires on {{ $value | humanizeTimestamp }}
Description
The SSL certificate for the OpenStack {{ $labels.namespace }}/{{ $labels.service_name }}
service endpoints expires on {{ $value | humanizeTimestamp }},
less than 30 days are left.
{{ $labels.service_name }} RabbitMQ Exporter Prometheus target is
down.
Description
Prometheus fails to scrape metrics from the
{{ $labels.pod }} Pod of the
{{ $labels.namespace }}/{{ $labels.service_name }}
on the {{ $labels.node }} node.
{{ $labels.pod }} RabbitMQ Prometheus target is down.
Description
Prometheus fails to scrape metrics from the {{ $labels.pod }} Pod of
the {{ $labels.namespace }}/{{ $labels.service_name }} on the
{{ $labels.node }} node.
DNS probe failure for {{ $labels.target_name }} {{ $labels.target_type }}
Description
The DNS probe failed at least 3 times for the DNS {{ $labels.target_type }} {{ $labels.target_name }} using the {{ $labels.protocol }} protocol
in the last 20 minutes.
DNS probe target experienced outage for {{ $labels.target_name }} {{ $labels.target_type }}
Description
Prometheus failed to probe the DNS {{ $labels.target_type }} {{ $labels.target_name }}
3 times using the {{ $labels.protocol }} protocol in the last 20 minutes.
High DNS query duration for {{ $labels.target_name }} {{ $labels.target_type }}
Description
The DNS query duration for the DNS {{ $labels.target_type }} {{ $labels.target_name }} using the {{ $labels.protocol }} protocol
exceeded 3 seconds at least 3 times in the last 20 minutes.
The {{ $labels.namespace }}/{{ $labels.pod }} Cassandra Pod in
the {{ $labels.cassandra_cluster }} cluster reports an increased
number of authentication failures.
The average hit rate for the {{ $labels.cache }} cache in the
{{ $labels.namespace }}/{{ $labels.pod }} Cassandra Pod in the
{{ $labels.cassandra_cluster }} cluster is below 85%.
The {{ $labels.namespace }}/{{ $labels.pod }} Cassandra Pod in
the {{ $labels.cassandra_cluster }} cluster reports an increased
number of {{ $labels.operation }} operation failures. A failure is a
non-timeout exception.
Cassandra client {{ $labels.operation }} request is unavailable.
Description
The {{ $labels.namespace }}/{{ $labels.pod }} Cassandra Pod in
the {{ $labels.cassandra_cluster }} cluster reports an increased
number of {{ $labels.operation }} operations ending with
UnavailableException. There are not enough replicas alive to perform
the {{ $labels.operation }} query with the requested consistency
level.
The {{ $labels.namespace }}/{{ $labels.pod }} Cassandra Pod in
the {{ $labels.cassandra_cluster }} cluster reports that
{{ $value }} compaction executor tasks are blocked.
The pending compaction tasks in the
{{ $labels.namespace }}/{{ $labels.pod }} Cassandra Pod in the
{{ $labels.cassandra_cluster }} cluster reached the threshold of 100
on average as measured over 30 minutes. This may occur due to a too low
cluster I/O capacity.
The {{ $labels.namespace }}/{{ $labels.pod }} Cassandra Pod in
the {{ $labels.cassandra_cluster }} cluster reports an increased
number of connection timeouts between nodes.
The {{ $labels.namespace }}/{{ $labels.pod }} Cassandra Pod in
the {{ $labels.cassandra_cluster }} cluster reports that
{{ $value }} flush writer tasks are blocked.
The {{ $labels.namespace }}/{{ $labels.pod }} Cassandra Pod in
the {{ $labels.cassandra_cluster }} cluster reports an increased
number of hints. Replica nodes are not available to accept mutation due
to a failure or maintenance.
The {{ $labels.namespace }}/{{ $labels.pod }} Cassandra Pod in
the {{ $labels.cassandra_cluster }} cluster reports that
{{ $value }} repair tasks are blocked.
The {{ $labels.namespace }}/{{ $labels.pod }} Cassandra Pod in
the {{ $labels.cassandra_cluster }} cluster reports an increased
number of storage exceptions.
The {{ $labels.namespace }}/{{ $labels.pod }} Cassandra Pod in
the {{ $labels.cassandra_cluster }} cluster scanned {{ $value }}
tombstones in 99% of read queries.
The {{ $labels.namespace }}/{{ $labels.pod }} Cassandra Pod in
the {{ $labels.cassandra_cluster }} cluster scanned {{ $value }}
tombstones in 99% of read queries.
The {{ $labels.namespace }}/{{ $labels.pod }} Cassandra Pod in
the {{ $labels.cassandra_cluster }} cluster scanned {{ $value }}
tombstones in 99% of read queries.
The {{ $labels.namespace }}/{{ $labels.pod }} Cassandra Pod in
the {{ $labels.cassandra_cluster }} cluster reports over 1-second
view/write latency for 99% of requests.
Unclean Kafka broker was elected as cluster leader.
Description
A Kafka broker that has not finished the replication state has been
elected as leader in {{ $labels.cluster }} within the
{{ $labels.namespace }} namespace.
The {{ $labels.cluster }} Redis cluster in the
{{ $labels.namespace }} namespace is not replicating to all
replicas. Consider verifying the Redis replication status.
{{ printf"%0.0f"$value }}% throttling of CPU for container(s) in
Pod(s) of
{{ $labels.created_by_name }} {{ $labels.created_by_kind }} in
the {{ $labels.namespace }} namespace.
This alert ensures that the entire alerting pipeline is functional.
This alert should always be firing in Alertmanager against a receiver.
Some integrations with various notification mechanisms can send a
notification when this alert is not firing. For example, the
DeadMansSnitch integration in PagerDuty.
Patroni cluster member is experiencing data page corruption.
Description
The {{ $labels.namespace }}/{{ $labels.pod }} Patroni Pod in the
{{ $labels.cluster }} cluster fails to calculate the data page
checksum due to a possible hardware fault.
The transactions submitted to the {{ $labels.datname }} database in
the {{ $labels.cluster }} Patroni cluster in the
{{ $labels.namespace }} namespace are experiencing deadlocks.
The query data does not fit into working memory of the
{{ $labels.pod }} Pod in the {{ $labels.cluster }} Patroni
cluster in the {{ $labels.namespace }} namespace.
The {{ $labels.namespace }}/{{ $labels.pod }}
Prometheus Pod has write-ahead log (WAL) corruptions in the time
series database (TSDB) for the last 5 minutes.
The {{ $labels.namespace }}/{{ $labels.pod }}
Prometheus Pod has failed evaluations for recording rules. Verify the
rules state in the Status/Rules section of the Prometheus
web UI.
Available since MCC 2.28.0 (17.3.0 and 16.3.0). TechPreview
This section lists alerts for the host-os-modules-controller service,
including alerts for HostOSConfiguration and HostOSConfigurationModules
custom resources. For details about these resources, refer to Container Cloud
API Reference:
HostOSConfiguration and HostOSConfigurationModules.
Deprecated module {{ $labels.module_name }} version
{{ $labels.module_version }} is used by {{ $value }} HostOSConfiguration object(s). It is deprecated by the module
{{ $labels.deprecated_by_module_name }} version
{{ $labels.deprecated_by_module_version }}.
Pod of {{ $labels.created_by_name }} {{ $labels.created_by_kind }} in crash loop.
Description
At least one Pod container of
{{ $labels.created_by_name }} {{ $labels.created_by_kind }} in the
{{ $labels.namespace }} namespace was restarted
more than twice during the last 20 minutes.
Pods of {{ $labels.created_by_name }} {{ $labels.created_by_kind }} in non-ready state.
Description
{{ $labels.created_by_name }} {{ $labels.created_by_kind }} in the
{{ $labels.namespace }} namespace has Pods in
non-Ready state for longer than 12 minutes.
{{ $labels.created_by_name }} {{ $labels.created_by_kind }} Pod
restarted regularly.
Description
The Pod of {{ $labels.created_by_name }} {{ $labels.created_by_kind }} in the {{ $labels.namespace }}
namespace has a container that was restarted at least once a day during
the last 2 days.
Deployment {{ $labels.deployment }} generation does not match the
metadata.
Description
The {{ $labels.namespace }}/{{ $labels.deployment }} Deployment
generation does not match the metadata, indicating that the Deployment
has failed but has not been rolled back.
StatefulSet {{ $labels.statefulset }} generation does not match the
metadata.
Description
The {{ $labels.namespace }}/{{ $labels.statefulset }}
StatefulSet generation does not match the metadata, indicating that the
StatefulSet has failed but has not been rolled back.
Due to the upstream bug in Kubernetes,
metrics for the KubePersistentVolumeUsageCritical and
KubePersistentVolumeFullInFourDays alerts that are collected for
persistent volumes provisioned by cinder-csi-plugin are not available.
PersistentVolume {{ $labels.persistentvolumeclaim }} is expected to
fill up in 4 days.
Description
The PersistentVolume claimed by {{ $labels.persistentvolumeclaim }}
in the {{ $labels.namespace }} namespace is expected to fill up
within four days. Currently, {{ printf"%0.2f"$value }}% of free
space is available.
The {{ $labels.release_namespace }}/{{ $labels.release_name }}
release of the {{ $labels.namespace }}/{{ $labels.name }}
HelmBundle reconciled by the {{ $labels.controller_namespace }}/
{{ $labels.controller_name }} Controller is not in the deployed
status for the last 15 minutes.
Prometheus fails to scrape metrics from the
{{ $labels.controller_pod }} of the
{{ $labels.controller_namespace }}/{{ $labels.controller_name }}
on the {{ $labels.node }} node.
This section describes the alerts for Mirantis Container Cloud. These alerts
are based on metrics from the Mirantis Container Cloud Metric Exporter
(MCC Exporter) service.
Available since MCC 2.28.0 (17.3.0 and 16.3.0). TechPreview
Note
Before Container Cloud 2.29.2 (Cluster releases 17.3.7, 16.4.2, and
16.3.7), this alert was named ClusterUpdateStepInProggress, which
contained a typo.
Severity
Informational
Summary
Step {{ $labels.step_id }} of the Container Cloud cluster update
is in progress
Description
Step {{ $labels.step_id }} of the {{ $labels.cluster_namespace }}/{{ $labels.cluster_name }}
({{ $labels.cluster_uid }}) cluster update to {{ $labels.target }}
is in progress.
The Container Cloud update from {{ $labels.active_kaasrelease_version }}
to {{ $labels.pending_kaasrelease_version }} is available but blocked.
For details, see Troubleshoot Mirantis Container Cloud Exporter alerts.
The Container Cloud update from {{ $labels.active_kaasrelease_version }} to
{{ $labels.pending_kaasrelease_version }} is available and scheduled for
{{ $value|humanizeTimestamp }}. For details, see Schedule Mirantis Container Cloud updates.
SSL certificate for a Mirantis Container Cloud service expires on
{{ $value | humanizeTimestamp }}.
Description
The SSL certificate for the Mirantis Container Cloud
{{ $labels.namespace }}/{{ $labels.service_name }} service endpoints
expires on {{ $value | humanizeTimestamp }}, less than 10 days are left.
SSL certificate for a Mirantis Container Cloud service expires in less
than 30 days.
Description
The SSL certificate for the Mirantis Container Cloud
{{ $labels.namespace }}/{{ $labels.service_name }} service endpoints
expires on {{ $value | humanizeTimestamp }}, less than 30 days are left.
The qLen size and NetMsg showed unexpected output
for the last 10 minutes. Verify the NetworkDbStats output
for the qLen size and NetMsg using
journalctl -d docker.
Note
For the DockerNetworkUnhealthy alert, StackLight collects
metrics from logs. Therefore, this alert is available only if logging
is enabled.
The {{ $labels.node }} node is down. During the last
2 minutes Kubernetes treated the node as NotReady or Unknown
and kubelet was not accessible from Prometheus.
{{ $value|printf"%.2f" }} packets transmitted by the
{{ $labels.device }} interface on the {{ $labels.node }} node
were dropped during the last minute.
Using alert inhibition rules, Alertmanager decreases alert noise by suppressing
dependent alerts notifications to provide a clearer view on the cloud status
and simplify troubleshooting. Alert inhibition rules are enabled by default.
The following tables describe the dependencies between the OpenStack-related
and MOSK cluster alerts.
Once an alert from the Alert column is raised, the alert from the
Inhibits and rules column is suppressed with the
Inhibited status in the Alertmanager web UI.
The Inhibits and rules column lists the labels and conditions, if
any, for the inhibition to apply.
CephOSDDown with the same rook_cluster label
Before MCC 2.25.0 (17.0.0 and 16.0.0)
CephOSDDiskUnavailable
CephOSDDown with the same rook_cluster label
Before MCC 2.25.0 (17.0.0 and 16.0.0)
CephOSDNodeDown Since MCC 2.25.0 (17.0.0 and 16.0.0)
With the same node label:
CephOSDDiskNotResponding
CephOSDDiskUnavailable
CephOSDPgNumTooHighCritical
CephOSDPgNumTooHighWarning
DockerSwarmServiceReplicasFlapping
DockerSwarmServiceReplicasDown with the same service_id,
service_mode, and service_name labels
DockerSwarmServiceReplicasOutage
DockerSwarmServiceReplicasDown with the same service_id,
service_mode, and service_name labels
etcdDbSizeCritical
etcdDbSizeMajor with the same job and instance labels
etcdHighNumberOfFailedGRPCRequestsCritical
etcdHighNumberOfFailedGRPCRequestsWarning with the same
grpc_method, grpc_service, job, and instance labels
ExternalEndpointDown
ExternalEndpointTCPFailure with the same instance and job
labels
FileDescriptorUsageMajor
FileDescriptorUsageWarning with the same node label
FluentdTargetsOutage
FluentdTargetDown
KubeAPICertExpirationHigh
KubeAPICertExpirationMedium
KubeAPIErrorsHighMajor
KubeAPIErrorsHighWarning with the same instance label
KubeAPIOutage
KubeAPIDown
KubeAPIResourceErrorsHighMajor
KubeAPIResourceErrorsHighWarning with the same instance,
resource, and subresource labels
KubeClientCertificateExpirationInOneDay Removed in MCC 2.28.0 (17.3.0 and 16.3.0)
KubeClientCertificateExpirationInSevenDays with the same
instance label
KubeDaemonSetOutage
CalicoTargetsOutage
KubeDaemonSetRolloutStuck with the same daemonset and
namespace labels
FluentdTargetsOutage
NodeExporterTargetsOutage
TelegrafSMARTTargetsOutage
KubeDeploymentOutage
KubeDeploymentReplicasMismatch with the same deployment and
namespace labels
GrafanaTargetDown
KubeDNSTargetsOutage
Removed in MCC 2.25.0 (17.0.0 and 16.0.0)
KubernetesMasterAPITargetsOutage
KubeStateMetricsTargetDown
PrometheusEsExporterTargetDown
PrometheusMsTeamsTargetDown
PrometheusRelayTargetDown
ServiceNowWebhookReceiverTargetDown
SfNotifierTargetDown
TelegrafDockerSwarmTargetDown
TelegrafOpenstackTargetDown
KubeJobFailed
KubePodsNotReady for created_by_kind=Job and with the same
created_by_name label (removed in Container Cloud 2.25.0, Cluster releases 17.0.0 and 16.0.0)
KubeletTargetsOutage
KubeletTargetDown
KubePersistentVolumeUsageCritical
With the same namespace and persistentvolumeclaim labels:
KubePersistentVolumeFullInFourDays
OpenSearchStorageUsageCritical
Since MCC 2.26.0 (17.1.0 and 16.1.0)
OpenSearchStorageUsageMajor
Since MCC 2.26.0 (17.1.0 and 16.1.0)
KubePodsCrashLooping
KubePodsRegularLongTermRestarts with the same created_by_name,
created_by_kind, and namespace labels
KubeStatefulSetOutage
Alerts with the same namespace and statefulset labels:
KubeStatefulSetUpdateNotRolledOut
KubeStatefulSetReplicasMismatch
AlertmanagerTargetDown
Since MCC 2.25.0 (17.0.0 and 16.0.0)
AlertmanagerClusterTargetDown
Before MCC 2.25.0 (17.0.0 and 16.0.0)
ElasticsearchExporterTargetDown
FluentdTargetsOutage
OpenSearchClusterStatusCritical
PostgresqlReplicaDown
PostgresqlTargetDown
Since MCC 2.25.0 (17.0.0 and 16.0.0)
PostgresqlTargetsOutage
Before MCC 2.25.0 (17.0.0 and 16.0.0)
PrometheusEsExporterTargetDown
PrometheusServerTargetDown
Since MCC 2.25.0 (17.0.0 and 16.0.0)
PrometheusServerTargetsOutage
Before MCC 2.25.0 (17.0.0 and 16.0.0)
MCCLicenseExpirationHigh
MCCLicenseExpirationMedium
MCCSSLCertExpirationHigh
MCCSSLCertExpirationMedium with the same namespace and
service_name labels
MCCSSLProbesServiceTargetOutage
MCCSSLProbesEndpointTargetOutage with the same namespace and
service_name labels
MKEAPICertExpirationHigh
MKEAPICertExpirationMedium
MKEAPIOutage
MKEAPIDown
MKEMetricsEngineTargetsOutage
MKEMetricsEngineTargetDown
MKENodeDiskFullCritical
MKENodeDiskFullWarning with the same node label
NodeDown
KubeDaemonSetMisScheduled for the following DaemonSets
(removed in Container Cloud 2.27.0, Cluster releases 17.2.0 and 16.2.0):
cadvisor
csi-cephfsplugin
csi-cinder-nodeplugin
csi-rbdplugin
fluentd-logs
local-volume-provisioner
metallb-speaker
openstack-ccm
prometheus-libvirt-exporter
prometheus-node-exporter
rook-discover
telegraf-ds-smart
ucp-metrics
KubeDaemonSetRolloutStuck for the calico-node,
ucp-node-feature-discovery (since Container Cloud 2.29.0, Cluster
releases 17.4.0 and 16.4.0), and ucp-nvidia-device-plugin DaemonSets
For resource=nodes:
KubeAPIResourceErrorsHighMajor
KubeAPIResourceErrorsHighWarning
Alerts with the same node label:
cAdvisorTargetDown
CalicoTargetDown
FluentdTargetDown
KubeletDown
KubeletTargetDown
KubeNodeNotReady
LibvirtExporterTargetDown
MKEMetricsEngineTargetDown
MKENodeDown
NodeExporterTargetDown
TelegrafSMARTTargetDown
Since MCC 2.25.0 (Cluster releases 17.0.0 and 16.0.0):
AlertmanagerTargetDown
CephClusterTargetDown
etcdTargetDown
GrafanaTargetDown
HelmControllerTargetDown
KubeAPIDown
MCCCacheTargetDown
MCCControllerTargetDown
MCCProviderTargetDown
MKEAPIDown
PostgresqlTargetDown
PrometheusMsTeamsTargetDown
PrometheusRelayTargetDown
PrometheusServerTargetDown
ServiceNowWebhookReceiverTargetDown
SfNotifierTargetDown
TelegrafDockerSwarmTargetDown
TelemeterClientTargetDown
TelemeterServerFederationTargetDown
TelemeterServerTargetDown
NodeExporterTargetsOutage
NodeExporterTargetDown
OpenSearchClusterStatusCritical
OpenSearchClusterStatusWarning and
OpenSearchNumberOfUnassignedShards (removed in Container Cloud 2.27.0,
Cluster releases 17.2.0 and 16.2.0) with the same cluster label
For created_by_name=~"elasticsearch-curator-.*":
KubeJobFailed
KubePodsNotReady (removed in Container Cloud 2.25.0, Cluster releases
17.0.0 and 16.0.0)
OpenSearchClusterStatusWarning
Since MCC 2.26.0 (17.1.0 and 16.1.0)
OpenSearchNumberOfUnassignedShards with the same cluster label
(removed in Container Cloud 2.27.0, Cluster releases 17.2.0 and 16.2.0)
OpenSearchHeapUsageCritical
OpenSearchHeapUsageWarning with the same cluster and name
labels
OpenSearchStorageUsageCritical
Since MCC 2.26.0 (17.1.0 and 16.1.0)
KubePersistentVolumeFullInFourDays and OpenSearchStorageUsageMajor
with the same namespace and persistentvolumeclaim labels
OpenSearchStorageUsageMajor
Since MCC 2.26.0 (17.1.0 and 16.1.0)
KubePersistentVolumeFullInFourDays with the same namespace
and persistentvolumeclaim labels
PostgresqlPatroniClusterUnlocked
With the same cluster and namespace labels:
PostgresqlReplicationNonStreamingReplicas
PostgresqlReplicationPaused
PostgresqlReplicaDown
Alerts with the same cluster and namespace labels:
PostgresqlReplicationNonStreamingReplicas
PostgresqlReplicationPaused
PostgresqlReplicationSlowWalApplication
PostgresqlReplicationSlowWalDownload
PostgresqlReplicationWalArchiveWriteFailing
PrometheusErrorSendingAlertsMajor
PrometheusErrorSendingAlertsWarning with the same alertmanager
and pod labels
SystemDiskFullMajor
SystemDiskFullWarning with the same device, mountpoint, and
node labels
SystemDiskInodesFullMajor
SystemDiskInodesFullWarning with the same device,
mountpoint, and node labels
SystemLoadTooHighCritical
SystemLoadTooHighWarning with the same node label
SystemMemoryFullMajor
SystemMemoryFullWarning with the same node label
SSLCertExpirationHigh
SSLCertExpirationMedium with the same instance label
Due to a known Alertmanager issue, silences with
regexp matchers do not mute all notifications for all alerts matched by the
specified regular expression.
If you need to mute multiple alerts, for example, for maintenance or before
cluster update, Mirantis recommends using a set of fixed-matcher silences
instead. As an example, this section describes how to silence all alerts for a
specified period through the Alertmanager web UI or CLI without using the
regexp matchers. You can also manually force silence expiration before the
specified period ends.
To silence all alerts:
Silence alerts through the Alertmanager web UI:
Log in to the Alertmanager web UI as described in Getting access.
Click New Silence.
Create four Prometheus Alertmanager silences. In Matchers,
set Name to severity and Value to
warning, minor, major, and
critical, one for each silence.
Note
To silence the Watchdog alert, create an additional silence
with Name set to severity and Value set to
informational.
Silence alerts through CLI:
Log in to the host where your management cluster kubeconfig is located
and where kubectl is installed.
Run the following command setting the required duration:
kubectl exec -it -n stacklight prometheus-alertmanager-1 -c prometheus-alertmanager -- sh -c 'rm -f /tmp/all_silences; \
touch /tmp/all_silences; \
for severity in warning minor major critical; do \
echo $severity; \
amtool silence add severity=${severity} \
--alertmanager.url=http://prometheus-alertmanager \
--comment="silence them all" \
--duration="2h" | tee -a /tmp/all_silences; \
done'
Note
To silence the Watchdog alert, add informational to the
list of severities.
To expire alert silences:
To expire alert silences through the Alertmanager web UI, click
Expire next to each silence.
To expire alert silences through CLI, run the following command:
kubectl exec -it -n stacklight prometheus-alertmanager-1 -c prometheus-alertmanager -- sh -c 'for silence in $(cat /tmp/all_silences); do \
echo $silence; \
amtool silence expire $silence \
--alertmanager.url=http://prometheus-alertmanager; \
done'
Available since Cluster releases 17.0.1 and 16.0.1
The Kubernetes NetworkPolicy resource allows controlling network connections
to and from Pods within a cluster. This enhances security by restricting
communication from compromised Pod applications and provides transparency
into how applications communicate with each other.
Network Policies are enabled by default in StackLight using the
networkPolicies parameter. For configuration details, see
Kubernetes network policies.
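For reference, a Kubernetes NetworkPolicy resource of the kind StackLight applies has the following general shape. The sketch is illustrative only: the names, labels, and port do not reproduce the exact policies that StackLight deploys.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-prometheus-scrape     # illustrative name
  namespace: stacklight
spec:
  # Applies to Pods matching this selector
  podSelector:
    matchLabels:
      app: example-exporter         # illustrative label
  policyTypes:
    - Ingress
  ingress:
    # Allow ingress only from Prometheus server Pods on the metrics port
    - from:
        - podSelector:
            matchLabels:
              app: prometheus       # illustrative label
      ports:
        - protocol: TCP
          port: 9090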
The following table contains general network policy rules applied to
StackLight components:
Obsolete since MCC 2.26.0 (17.1.0, 16.1.0) for OpenSearch
Caution
In Container Cloud 2.26.0 (Cluster releases 17.1.0 and 16.1.0),
the storage-based log retention together with the updated proportion of
available disk space replaces the estimated storage retention management
in OpenSearch. For details, see Storage-based log retention strategy.
The logging.retentionTime parameter is removed from the StackLight
configuration. While the Estimated Retention panel of the
OpenSearch dashboard in Grafana can provide some information,
it does not provide any guarantees. The panel is removed in Container Cloud
2.26.1 (Cluster releases 17.1.1 and 16.1.1). Therefore, consider this section
as obsolete for OpenSearch.
Using the following panels in the OpenSearch and Prometheus dashboards,
you can view details about the storage usage on managed clusters. These
details allow you to calculate the possible retention time based on
provisioned storage and its average usage:
OpenSearch dashboard:
Shards > Estimated Retention
Resources > Disk
Resources > File System Used Space by Percentage
Resources > Stored Indices Disk Usage
Resources > Age of Logs
Prometheus dashboard:
General > Estimated Retention
Resources > Storage
Resources > Storage by Percentage
To calculate the storage retention time:
Log in to the Grafana web UI. For details, see Getting access.
Assess the OpenSearch and Prometheus dashboards.
For details on Grafana dashboards, see View Grafana dashboards.
On each dashboard, select the required period for calculation.
Tip
Mirantis recommends analyzing at least one day of data collected
in the respective component to benefit from results presented on the
Estimated Retention panels.
Assess the Cluster > Estimated Retention panel of each
dashboard.
The panel displays the maximum possible retention in days, while other
panels provide details on utilized and available storage.
If persistent volumes of some StackLight components share storage,
partition the storage logically to separate components before estimating
the retention threshold. This is required since the
Estimated Retention panel uses the entire provisioned storage
as the calculation base.
For example, if StackLight is deployed in the default HA mode, it uses
the Local Volume Provisioner, which provides shared storage unless two
separate partitions are configured on each cluster node for the exclusive
use of Prometheus and OpenSearch.
The two main storage consumers are OpenSearch and Prometheus. The level of
storage usage by other StackLight components is relatively low.
For example, you can share storage logically as follows:
35% for Prometheus
35% for OpenSearch
30% for other components
In this case, take 35% of the calculated maximum retention value
and set it as the threshold.
In the Prometheus dashboard, navigate to
Resources (Row) > Storage (Panel) > total provisioned disk
per pod (Metric) to verify the retention size for the Prometheus storage.
If both the retention time and size are set, Prometheus applies retention
based on whichever threshold is reached first.
Caution
Mirantis does not recommend setting the retention size to
0 and relying on the retention time only.
You can change the retention settings through either the web UI or the API, as shown in the sketch after this list:
Using the Container Cloud web UI, navigate to the
Configure cluster menu and use the StackLight tab
Using the Container Cloud API:
For OpenSearch, use the logging.retentionTime parameter
For Prometheus, use the prometheusServer.retentionTime and
prometheusServer.retentionSize parameters
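For the API option, these parameters live in the stacklight.values section of the Cluster manifest. The following is a minimal sketch; the values are illustrative examples, not recommendations.

stacklight:
  values:
    prometheusServer:
      retentionTime: 15d     # illustrative value
      retentionSize: 15GB    # illustrative value
    logging:
      retentionTime: 3       # illustrative value; parameter removed in MCC 2.26.0 (17.1.0, 16.1.0)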
Available since MCC 2.24.0 (Cluster release 14.0.0)
If you plan to switch to a long log retention period (months), tune StackLight
by increasing the cluster.max_shards_per_node limit. This configuration
enables OpenSearch to successfully accept new logs and prevents the
maximum open shards error.
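For context, cluster.max_shards_per_node is a standard OpenSearch cluster setting. As a sketch only, you can inspect it and, if needed, raise it directly through the OpenSearch API in Dev Tools; the value below is illustrative, and a change made this way may be reverted by StackLight, so prefer the StackLight-managed configuration for a persistent change.

# Inspect the current limit (defaults are included in the response)
GET _cluster/settings?include_defaults=true&filter_path=*.cluster.max_shards_per_node

# Raise the limit; 3000 is an illustrative value
PUT _cluster/settings
{
  "persistent": {
    "cluster.max_shards_per_node": 3000
  }
}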
Available since MCC 2.23.0 (Cluster release 11.7.0)
By default, StackLight sends logs to OpenSearch. However, you can
configure StackLight to add external Elasticsearch, OpenSearch, and syslog
destinations as the fluentd-logs output. In this case, StackLight
sends logs both to the external server(s) and to OpenSearch.
Since Container Cloud 2.25.0 (Cluster releases 17.0.0 and 16.0.0), you can also
enable sending of Container Cloud service logs to Splunk using the syslog
external output configuration. The feature is available in the
Technology Preview scope.
Warning
Sending logs to Splunk implies that the target Splunk instance is
available from the MOSK cluster. If proxy is enabled, the
feature is not supported.
Prior to enabling the functionality, complete the following prerequisites:
Enable StackLight logging
Deploy an external server outside MOSK
Make sure that Container Cloud proxy is not enabled since it only supports
the HTTP(S) traffic
For Splunk, configure the server to accept logs:
Create an index and set its type to Event
Configure data input:
Open the required port
Configure the required protocol (TCP/UDP)
Configure connection to the created index
To enable log forwarding to external destinations:
In the stacklight.values section of the opened manifest, configure the
logging.externalOutputs parameters using the following table.
Key
Description
Example values
disabled (bool)
Optional. Disables the output destination using disabled: true.
If not set, defaults to disabled: false.
true or false
type (string)
Required. Specifies the type of log destination. The following values
are accepted: elasticsearch, opensearch, remote_syslog,
and opensearch_data_stream (since Container Cloud 2.26.0,
Cluster releases 17.1.0 and 16.1.0).
remote_syslog
level (string)
Removed in MCC 2.26.0 (17.1.0, 16.1.0)
Optional. Sets the least important level of log messages to send. For
the available values, which are defined using the severity_label field,
see the logging.level description in Logging.
warning
plugin_log_level (string)
Optional. Defaults to info. Sets the value of @log_level of the
output plugin for a particular backend. For other available values,
refer to the logging.level description in Logging.
notice
tag_exclude (string)
Optional. Overrides tag_include. Specifies, by tag, the logs to exclude from the
destination output. For example, to exclude all logs with the test tag,
set tag_exclude: '/.*test.*/'.
How to obtain tags for logs
Select from the following options:
In the main OpenSearch output, use the logger field that equals the
tag.
For logs of a particular Pod or container, derive the tag in the following
order, with the first match winning:
The value of the app Pod label. For example, for
app=opensearch-master, use opensearch-master as the log tag.
The value of the k8s-app Pod label.
The value of the app.kubernetes.io/name Pod label.
If a release_group Pod label exists and the component Pod label
starts with app, use the value of the component label as the tag.
Otherwise, the tag is the application label joined to the component
label with a -.
The name of the container from which the log is taken.
The values for tag_exclude and tag_include are placed into
<match> directives of Fluentd and only accept regex types that are
supported by the <match> directive of Fluentd. For details, refer to the
Fluentd official documentation.
'{fluentd-logs,systemd}'
tag_include (string)
Optional. Overridden by tag_exclude. Specifies, by tag, the logs to include in
the destination output. For example, to include all logs with the auth
tag, set tag_include: '/.*auth.*/'.
'/.*auth.*/'
<pluginConfigOptions> (map)
Configures plugin settings. Has a hierarchical structure. The
first-level configuration parameters are dynamic except type,
id, and log_level that are reserved by StackLight. For
available options, refer to the required plugin documentation.
Mirantis does not set any default values for plugin configuration
settings except the reserved ones.
The second-level configuration options are predefined and limited to
buffer (for any type of log destination) and format (for
remote_syslog only). Inside the second-level configuration, the
parameters are dynamic.
For available configuration options, refer to the following
documentation:
Configures buffering of events using the second-level configuration
options. Applies to any type of log destination. Parameters are
dynamic except the following mandatory ones, which should not be
modified:
type: file, which sets the default buffer type
path: <pathToBufferFile>, which sets the path to the buffer
destination file
overflow_action: block, which prevents Fluentd from crashing if
the output destination is down
For details about other mandatory and optional buffer
parameters, see the Fluentd: Output Plugins
documentation.
Note
To disable buffer without deleting it, use
buffer.disabled:true.
output_kind (string)
Since MCC 2.26.0 (17.1.0, 16.1.0)
Configures the type of logs to forward. If set to audit,
only audit logs are forwarded. If unset, only system logs
are forwarded.
opensearch:
  output_kind: audit
Example configuration for logging.externalOutputs
logging:
  externalOutputs:
    elasticsearch:
      # disabled: false
      type: elasticsearch
      level: info  # Removed in MCC 2.26.0 (17.1.0, 16.1.0)
      plugin_log_level: info
      tag_exclude: '{fluentd-logs,systemd}'
      host: elasticsearch-host
      port: 9200
      logstash_date_format: '%Y.%m.%d'
      logstash_format: true
      logstash_prefix: logstash
      ...
      buffer:
        # disabled: false
        chunk_limit_size: 16m
        flush_interval: 15s
        flush_mode: interval
        overflow_action: block
        ...
    opensearch:
      disabled: true
      type: opensearch
      level: info  # Removed in MCC 2.26.0 (17.1.0, 16.1.0)
      plugin_log_level: info
      tag_include: '/.*auth.*/'
      host: opensearch-host
      port: 9200
      logstash_date_format: '%Y.%m.%d'
      logstash_format: true
      logstash_prefix: logstash
      output_kind: audit  # Since MCC 2.26.0 (17.1.0, 16.1.0)
      ...
      buffer:
        chunk_limit_size: 16m
        flush_interval: 15s
        flush_mode: interval
        overflow_action: block
        ...
    syslog:
      type: remote_syslog
      plugin_log_level: info
      level: info  # Removed in MCC 2.26.0 (17.1.0, 16.1.0)
      tag_include: '{iam-proxy,systemd}'
      host: remote-syslog.svc
      port: 514
      hostname: example-hostname
      packet_size: 1024
      protocol: udp
      tls: false
      buffer:
        disabled: true
      format:
        "@type": single_value
        message_key: message
      ...
    splunk_syslog_output:
      type: remote_syslog
      host: remote-splunk-syslog.svc
      port: 514
      protocol: tcp
      tls: true
      ca_file: /etc/ssl/certs/splunk-syslog.pem
      verify_mode: 0
      buffer:
        chunk_limit: 16MB
        total_limit: 128MB
  externalOutputSecretMounts:
  - secretName: syslog-pem
    mountPath: /etc/ssl/certs/splunk-syslog.pem
Note
Mirantis recommends that you tune the packet_size parameter
value to allow sending full log lines.
This parameter defines the packet size in bytes for the syslog logging
output. It is useful for syslog setups that allow a packet size larger than
1 kB.
Optional. Mount authentication secrets for the required external
destination to Fluentd using logging.externalOutputSecretMounts. For
the parameter options, see Logging to external outputs: secrets.
If Fluentd cannot flush logs and the buffer of the external output
starts to fill, then, depending on the resources and configuration of the
external Elasticsearch or OpenSearch server, the
Data too large, circuit_breaking_exception error may occur even after
you resolve the external output issues.
This error indicates that the output destination cannot accept log data sent
in bulk because of its size. To mitigate the issue, select from the
following options, as shown in the sketch after this list:
Set bulk_message_request_threshold to 10MB or lower. It is
unlimited by default. For details, see the Fluentd plugin documentation
for Elasticsearch.
Adjust output destinations to accept a large amount of data at once. For
details, refer to the official documentation of the required external
system.
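Because first-level keys of an external output are passed through to the Fluentd output plugin, the threshold can be set directly in the output definition. The following is a minimal sketch with an illustrative output name and host.

logging:
  externalOutputs:
    elasticsearch:                         # illustrative output name
      type: elasticsearch
      host: elasticsearch-host             # illustrative host
      port: 9200
      # Passed through to the Fluentd Elasticsearch output plugin
      bulk_message_request_threshold: 10MB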
Deprecated since MCC 2.23.0 (Cluster release 11.7.0)
Caution
Since Container Cloud 2.23.0 (Cluster release 11.7.0), this
procedure and the logging.syslog parameter are deprecated. For a new
configuration of remote logging to syslog, follow the
Enable log forwarding to external destinations procedure instead.
By default, StackLight sends logs to OpenSearch. However, you can
configure StackLight to forward all logs to an external syslog server. In this
case, StackLight will send logs both to the syslog server and to OpenSearch.
Prior to enabling the functionality, consider the following requirements:
StackLight logging must be enabled
A remote syslog server must be deployed outside MOSK
Container Cloud proxy must not be enabled since it only supports the
HTTP(S) traffic
Mirantis recommends that you tune the packetSize parameter value
to allow sending full log lines.
The hostname field in the remote syslog database will be set based
on clusterId specified in the StackLight chart values. For example,
if clusterId is ns/cluster/example-uid, the hostname will
transform to ns_cluster_example-uid. For details, see clusterId
in StackLight configuration parameters.
StackLight provides a vast variety of metrics for MOSK
components. However, you may need to create a custom log-based metric to use it
for alert notifications, for example, in the following cases:
If a component producing logs does not expose scraping targets, in which
case component-specific metrics may be missing.
If a scraping target lacks information that can be collected by aggregating
the log messages.
If alerting reasons are more explicitly presented in log messages.
For example, you want to receive alert notifications when more than 10 cases
are created in Salesforce within an hour. The sf-notifier scraping
endpoint does not expose such information. However, sf-notifier logs are
stored in OpenSearch and using prometheus-es-exporter you can perform the
following:
Configure a query using Query DSL (Domain Specific Language) and test it in
Dev Tools in OpenSearch Dashboards.
Configure Prometheus Elasticsearch Exporter to expose the result as a
Prometheus metric showing the total amount of Salesforce cases created
daily, for example, salesforce_cases_daily_total_value.
Configure StackLight to send a notification once the value of this metric
increases by 10 or more within an hour.
Caution
StackLight logging must be enabled and functional.
Prometheus-es-exporter uses OpenSearch Search API. Therefore,
configured queries must be tuned for this specific API and must include:
The query part to filter documents
The aggregation part to combine filtered documents into a
metric-oriented result
In the manifest that opens, verify that StackLight logging is enabled:
logging:
  enabled: true
Create a query using Query DSL:
Select one of the following options:
Since Container Cloud 2.26.0 (Cluster releases 17.1.0 and
16.1.0)
In the OpenSearch Dashboards web UI, select an index to query.
StackLight stores logs in hourly OpenSearch indices.
Note
Optimize the query time by limiting the number of results.
For example, we will use the OpenSearch event.provider field
set to sf-notifier to limit the number of logs to search.
For example:
GET system/_search
{
  "query": {
    "bool": {
      "filter": [
        {"term": {"event.provider": {"value": "sf-notifier"}}},
        {"range": {"@timestamp": {"gte": "now/d"}}}
      ]
    }
  }
}
Before Container Cloud 2.26.0 (Cluster releases 17.1.0 and
16.1.0)
In the OpenSearch Dashboards web UI, select an index to query.
StackLight stores logs in hourly OpenSearch indices. To select all
indices for a day, use the <logstash-{now/d}*> index pattern,
which stands for %3Clogstash-%7Bnow%2Fd%7D*%3E when URL-encoded.
Note
Optimize the query time by limiting the number of results.
For example, we will use the OpenSearch logger field set to
sf-notifier to limit the number of logs to search.
For example:
GET /%3Clogstash-%7Bnow%2Fd%7D*%3E/_search
{
  "query": {
    "bool": {
      "must": {"term": {"logger": {"value": "sf-notifier"}}}
    }
  }
}
Test the query in Dev Tools in OpenSearch Dashboards.
Select the log lines that include information about Salesforce cases
creation. For the info logging level, to indicate case creation,
sf-notifier produces log messages similar to the following one:
[2021-07-02 12:35:28,596] INFO in client: Created case: OrderedDict([('id', '5007h000007iqmKAAQ'), ('success', True), ('errors', [])]).
Such log messages include the Created case phrase. Use it in the query
to filter log messages for created cases:
Combine the query result to a single value that
prometheus-es-exporter will expose as a metric. Use the
value_count aggregation:
Since Container Cloud 2.26.0 (Cluster releases 17.1.0 and
16.1.0)
GET system/_search
{
  "query": {
    "bool": {
      "filter": [
        {"term": {"event.provider": {"value": "sf-notifier"}}},
        {"range": {"@timestamp": {"gte": "now/d"}}},
        {"match_phrase_prefix": {"message": "Created case"}}
      ]
    }
  },
  "aggs": {
    "daily_total": {
      "value_count": {"field": "event.provider"}
    }
  }
}
Before Container Cloud 2.26.0 (Cluster releases 17.1.0 and
16.1.0)
GET /%3Clogstash-%7Bnow%2Fd%7D*%3E/_search
{
  "query": {
    "bool": {
      "must": {"term": {"logger": {"value": "sf-notifier"}}},
      "filter": {"match_phrase_prefix": {"message": "Created case"}}
    }
  },
  "aggs": {
    "daily_total": {
      "value_count": {"field": "logger"}
    }
  }
}
The aggregation result in Dev Tools should look as follows:
"aggregations":{"daily_total":{"value":19}}
Note
The metric name is suffixed with the aggregation name and
the result field name: salesforce_cases_daily_total_value.
In the example below, salesforce_cases is the query name. The final
metric name can be generalized using the
<query_name>_<aggregation_name>_<aggregation_result_field_name>
template.
Since Container Cloud 2.26.0 (Cluster releases 17.1.0 and
16.1.0)
prometheusServer:
  customAlerts:
  - alert: SalesforceCasesDailyWarning
    annotations:
      description: The number of cases created today in Salesforce increased by 10 within the last hour.
      summary: Too many cases in Salesforce
    expr: increase(salesforce_cases_daily_total_value[1h]) >= 10
    labels:
      severity: warning
      service: custom
StackLight can scrape metrics from any service that exposes Prometheus metrics
and is running on the Kubernetes cluster. Such metrics appear in Prometheus
under the
{job="stacklight-generic",service="<service_name>",namespace="<service_namespace>"}
set of labels. If the Kubernetes service is backed by Kubernetes pods, the set
of labels also includes {pod="<pod_name>"}.
To enable the functionality, define at least one of the following annotations
in the service metadata, as shown in the example after this list:
"generic.stacklight.mirantis.com/scrape-path" - the HTTP endpoint path,
related to the Prometheus scrape_config.metrics_path option. By default,
/metrics.
"generic.stacklight.mirantis.com/scrape-scheme" - the HTTP endpoint
scheme between HTTP and HTTPS, related to the Prometheus
scrape_config.scheme option. By default, http.
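As a sketch, a Service annotated for generic scraping may look as follows; the service name, namespace, selector, and port are illustrative.

apiVersion: v1
kind: Service
metadata:
  name: example-app-metrics              # illustrative name
  namespace: example-namespace           # illustrative namespace
  annotations:
    generic.stacklight.mirantis.com/scrape-path: "/metrics"
    generic.stacklight.mirantis.com/scrape-scheme: "http"
spec:
  selector:
    app: example-app                     # illustrative selector
  ports:
    - name: metrics
      port: 8080
      targetPort: 8080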
Available since MCC 2.24.0 (Cluster release 14.0.0)
By default, StackLight drops unused metrics to increase Prometheus performance,
providing better resource utilization and faster query response.
The following list contains white-listed scrape jobs grouped by the job name.
Prometheus collects metrics from this list by default.
White list of Prometheus scrape jobs
_group-blackbox-metrics:-probe_dns_lookup_time_seconds-probe_duration_seconds-probe_http_content_length-probe_http_duration_seconds-probe_http_ssl-probe_http_uncompressed_body_length-probe_ssl_earliest_cert_expiry-probe_success_group-controller-runtime-metrics:-workqueue_adds_total-workqueue_depth-workqueue_queue_duration_seconds_count-workqueue_queue_duration_seconds_sum-workqueue_retries_total-workqueue_work_duration_seconds_count-workqueue_work_duration_seconds_sum_group-etcd-metrics:-etcd_cluster_version-etcd_debugging_snap_save_total_duration_seconds_sum-etcd_disk_backend_commit_duration_seconds_bucket-etcd_disk_backend_commit_duration_seconds_count-etcd_disk_backend_commit_duration_seconds_sum-etcd_disk_backend_snapshot_duration_seconds_count-etcd_disk_backend_snapshot_duration_seconds_sum-etcd_disk_wal_fsync_duration_seconds_bucket-etcd_disk_wal_fsync_duration_seconds_count-etcd_disk_wal_fsync_duration_seconds_sum-etcd_mvcc_db_total_size_in_bytes-etcd_network_client_grpc_received_bytes_total-etcd_network_client_grpc_sent_bytes_total-etcd_network_peer_received_bytes_total-etcd_network_peer_sent_bytes_total-etcd_server_go_version-etcd_server_has_leader-etcd_server_leader_changes_seen_total-etcd_server_proposals_applied_total-etcd_server_proposals_committed_total-etcd_server_proposals_failed_total-etcd_server_proposals_pending-etcd_server_quota_backend_bytes-etcd_server_version-grpc_server_handled_total-grpc_server_started_total_group-go-collector-metrics:-go_gc_duration_seconds-go_gc_duration_seconds_count-go_gc_duration_seconds_sum-go_goroutines-go_info-go_memstats_alloc_bytes-go_memstats_alloc_bytes_total-go_memstats_buck_hash_sys_bytes-go_memstats_frees_total-go_memstats_gc_sys_bytes-go_memstats_heap_alloc_bytes-go_memstats_heap_idle_bytes-go_memstats_heap_inuse_bytes-go_memstats_heap_released_bytes-go_memstats_heap_sys_bytes-go_memstats_lookups_total-go_memstats_mallocs_total-go_memstats_mcache_inuse_bytes-go_memstats_mcache_sys_bytes-go_memstats_mspan_inuse_bytes-go_memstats_mspan_sys_bytes-go_memstats_next_gc_bytes-go_memstats_other_sys_bytes-go_memstats_stack_inuse_bytes-go_memstats_stack_sys_bytes-go_memstats_sys_bytes-go_threads_group-process-collector-metrics:-process_cpu_seconds_total-process_max_fds-process_open_fds-process_resident_memory_bytes-process_start_time_seconds-process_virtual_memory_bytes_group-rest-client-metrics:-rest_client_request_latency_seconds_count-rest_client_request_latency_seconds_sum_group-service-handler-metrics:-service_handler_count-service_handler_sum_group-service-http-metrics:-service_http_count-service_http_sum_group-service-reconciler-metrics:-service_reconciler_count-service_reconciler_sumalertmanager-webhook-servicenow:-servicenow_auth_okblackbox:[]blackbox-external-endpoint:[]cadvisor:-cadvisor_version_info-container_cpu_cfs_periods_total-container_cpu_cfs_throttled_periods_total-container_cpu_usage_seconds_total-container_fs_reads_bytes_total-container_fs_reads_total-container_fs_writes_bytes_total-container_fs_writes_total-container_memory_usage_bytes-container_memory_working_set_bytes-container_network_receive_bytes_total-container_network_transmit_bytes_total-container_scrape_error-machine_cpu_corescalico:-felix_active_local_endpoints-felix_active_local_policies-felix_active_local_selectors-felix_active_local_tags-felix_cluster_num_host_endpoints-felix_cluster_num_hosts-felix_cluster_num_workload_endpoints-felix_host-felix_int_dataplane_addr_msg_batch_size_count-felix_int_dataplane_addr_msg_batch_size_sum-felix_int_dataplane_failures-f
elix_int_dataplane_iface_msg_batch_size_count-felix_int_dataplane_iface_msg_batch_size_sum-felix_ipset_errors-felix_ipsets_calico-felix_iptables_chains-felix_iptables_restore_errors-felix_iptables_save_errors-felix_resyncs_startedetcd-server:[]fluentd:-apache_http_request_duration_seconds_bucket-apache_http_request_duration_seconds_count-docker_networkdb_stats_netmsg-docker_networkdb_stats_qlen-kernel_io_errors_total# Since MCC 2.27.0 (17.2.0 and 16.2.0)helm-controller:-helmbundle_reconcile_up-helmbundle_release_ready-helmbundle_release_status-helmbundle_release_success-rest_client_requests_totalhost-os-modules-controller:-hostos_module_deprecation_info# Since MCC 2.28.0 (17.3.0 and 16.3.0)-hostos_module_usage# Since MCC 2.28.0 (17.3.0 and 16.3.0)ironic:-ironic_driver_metadata-ironic_drivers_total-ironic_nodes-ironic_upkaas-exporter:-kaas_cluster_info-kaas_cluster_lcm_healthy# Since MCC 2.28.0 (17.3.0 and 16.3.0)-kaas_cluster_ready# Since MCC 2.28.0 (17.3.0 and 16.3.0)-kaas_cluster_updating-kaas_clusters-kaas_info-kaas_license_expiry-kaas_machine_ready-kaas_machines_ready-kaas_machines_requested-mcc_cluster_pending_update_schedule_time# Since MCC 2.28.0 (17.3.0 and 16.3.0)-mcc_cluster_pending_update_status# Since MCC 2.28.0 (17.3.0 and 16.3.0)-mcc_cluster_update_plan_status# Since MCC 2.28.0 (17.3.0 and 16.3.0) as TechPreview-mcc_cluster_update_plan_step_status# Since MCC 2.28.0 (17.3.0 and 16.3.0) as TechPreview-rest_client_requests_totalkubelet:-kubelet_running_containers-kubelet_running_pods-kubelet_volume_stats_available_bytes-kubelet_volume_stats_capacity_bytes-kubelet_volume_stats_used_bytes# Since MCC 2.26.0 (17.1.0 and 16.1.0)-kubernetes_build_info-rest_client_requests_totalkubernetes-apiservers:-apiserver_client_certificate_expiration_seconds_bucket-apiserver_client_certificate_expiration_seconds_count-apiserver_request_total-kubernetes_build_info-rest_client_requests_totalkubernetes-master-api:[]mcc-blackbox:[]mcc-cache:[]mcc-controllers:-rest_client_requests_totalmcc-providers:-rest_client_requests_totalmke-manager-api:[]mke-metrics-controller:-ucp_controller_services-ucp_engine_node_healthmke-metrics-engine:-ucp_engine_container_cpu_percent-ucp_engine_container_cpu_total_time_nanoseconds-ucp_engine_container_health-ucp_engine_container_memory_usage_bytes-ucp_engine_container_network_rx_bytes_total-ucp_engine_container_network_tx_bytes_total-ucp_engine_container_unhealth-ucp_engine_containers-ucp_engine_disk_free_bytes-ucp_engine_disk_total_bytes-ucp_engine_images-ucp_engine_memory_total_bytes-ucp_engine_num_cpu_coresmsr-api:[]openstack-blackbox-ext:[]openstack-cloudprober:# Since MOSK 24.2-cloudprober_success-cloudprober_totalopenstack-dns-probes:# Since MOSK 24.3-probe_dns_duration_secondsopenstack-ingress-controller:-nginx_ingress_controller_build_info-nginx_ingress_controller_config_hash-nginx_ingress_controller_config_last_reload_successful-nginx_ingress_controller_nginx_process_connections-nginx_ingress_controller_nginx_process_cpu_seconds_total-nginx_ingress_controller_nginx_process_resident_memory_bytes-nginx_ingress_controller_request_duration_seconds_bucket-nginx_ingress_controller_request_size_sum-nginx_ingress_controller_requests-nginx_ingress_controller_response_size_sum-nginx_ingress_controller_ssl_expire_time_seconds-nginx_ingress_controller_successopenstack-portprober:-portprober_arping_target_success-portprober_arping_target_totalopenstack-powerdns:# Since MOSK 
24.3-pdns_auth_backend_latency-pdns_auth_backend_queries-pdns_auth_cache_latency-pdns_auth_corrupt_packets-pdns_auth_cpu_iowait-pdns_auth_cpu_steal-pdns_auth_dnsupdate_answers-pdns_auth_dnsupdate_queries-pdns_auth_dnsupdate_refused-pdns_auth_fd_usage-pdns_auth_incoming_notifications-pdns_auth_open_tcp_connections-pdns_auth_qsize_q-pdns_auth_receive_latency-pdns_auth_ring_logmessages_capacity-pdns_auth_ring_logmessages_size-pdns_auth_ring_noerror_queries_capacity-pdns_auth_ring_noerror_queries_size-pdns_auth_ring_nxdomain_queries_capacity-pdns_auth_ring_nxdomain_queries_size-pdns_auth_ring_queries_capacity-pdns_auth_ring_queries_size-pdns_auth_ring_remotes_capacity-pdns_auth_ring_remotes_size-pdns_auth_ring_remotes_unauth_capacity-pdns_auth_ring_remotes_unauth_size-pdns_auth_ring_servfail_queries_capacity-pdns_auth_ring_servfail_queries_size-pdns_auth_ring_unauth_queries_capacity-pdns_auth_ring_unauth_queries_size-pdns_auth_signatures-pdns_auth_sys_msec-pdns_auth_tcp4_answers-pdns_auth_tcp4_answers_bytes-pdns_auth_tcp4_queries-pdns_auth_tcp6_answers-pdns_auth_tcp6_answers_bytes-pdns_auth_tcp6_queries-pdns_auth_timedout_packets-pdns_auth_udp4_answers-pdns_auth_udp4_answers_bytes-pdns_auth_udp4_queries-pdns_auth_udp6_answers-pdns_auth_udp6_answers_bytes-pdns_auth_udp6_queries-pdns_auth_udp_in_csum_errors-pdns_auth_udp_in_errors-pdns_auth_udp_noport_errors-pdns_auth_udp_recvbuf_errors-pdns_auth_udp_sndbuf_errors-pdns_auth_uptime-pdns_auth_user_msecosdpl-exporter:# Removed in MOSK 24.1-osdpl_aodh_alarms-osdpl_certificate_expiry-osdpl_cinder_zone_volumes-osdpl_neutron_availability_zone_info-osdpl_neutron_zone_routers-osdpl_nova_aggregate_hosts-osdpl_nova_availability_zone_info-osdpl_nova_availability_zone_instances-osdpl_nova_availability_zone_hosts-osdpl_version_infopatroni:-patroni_patroni_cluster_unlocked-patroni_patroni_info-patroni_postgresql_info-patroni_replication_info-patroni_xlog_location-patroni_xlog_paused-patroni_xlog_received_location-patroni_xlog_replayed_location-python_infopostgresql:-pg_database_size-pg_locks_count-pg_stat_activity_count-pg_stat_activity_max_tx_duration-pg_stat_archiver_failed_count-pg_stat_bgwriter_buffers_alloc_total-pg_stat_bgwriter_buffers_backend_fsync_total-pg_stat_bgwriter_buffers_backend_total-pg_stat_bgwriter_buffers_checkpoint_total-pg_stat_bgwriter_buffers_clean_total-pg_stat_bgwriter_checkpoint_sync_time_total-pg_stat_bgwriter_checkpoint_write_time_total-pg_stat_database_blks_hit-pg_stat_database_blks_read-pg_stat_database_checksum_failures-pg_stat_database_conflicts-pg_stat_database_conflicts_confl_bufferpin-pg_stat_database_conflicts_confl_deadlock-pg_stat_database_conflicts_confl_lock-pg_stat_database_conflicts_confl_snapshot-pg_stat_database_conflicts_confl_tablespace-pg_stat_database_deadlocks-pg_stat_database_temp_bytes-pg_stat_database_tup_deleted-pg_stat_database_tup_fetched-pg_stat_database_tup_inserted-pg_stat_database_tup_returned-pg_stat_database_tup_updated-pg_stat_database_xact_commit-pg_stat_database_xact_rollback-postgres_exporter_build_infoprometheus-alertmanager:-alertmanager_active_alerts-alertmanager_active_silences-alertmanager_alerts-alertmanager_alerts_invalid_total-alertmanager_alerts_received_total-alertmanager_build_info-alertmanager_cluster_failed_peers-alertmanager_cluster_health_score-alertmanager_cluster_members-alertmanager_cluster_messages_pruned_total-alertmanager_cluster_messages_queued-alertmanager_cluster_messages_received_size_total-alertmanager_cluster_messages_received_total-alertmanager_cluster_messages_sent_si
ze_total-alertmanager_cluster_messages_sent_total-alertmanager_cluster_peer_info-alertmanager_cluster_peers_joined_total-alertmanager_cluster_peers_left_total-alertmanager_cluster_reconnections_failed_total-alertmanager_cluster_reconnections_total-alertmanager_config_last_reload_success_timestamp_seconds-alertmanager_config_last_reload_successful-alertmanager_nflog_gc_duration_seconds_count-alertmanager_nflog_gc_duration_seconds_sum-alertmanager_nflog_gossip_messages_propagated_total-alertmanager_nflog_queries_total-alertmanager_nflog_query_duration_seconds_bucket-alertmanager_nflog_query_errors_total-alertmanager_nflog_snapshot_duration_seconds_count-alertmanager_nflog_snapshot_duration_seconds_sum-alertmanager_nflog_snapshot_size_bytes-alertmanager_notification_latency_seconds_bucket-alertmanager_notifications_failed_total-alertmanager_notifications_total-alertmanager_oversize_gossip_message_duration_seconds_bucket-alertmanager_oversized_gossip_message_dropped_total-alertmanager_oversized_gossip_message_failure_total-alertmanager_oversized_gossip_message_sent_total-alertmanager_partial_state_merges_failed_total-alertmanager_partial_state_merges_total-alertmanager_silences-alertmanager_silences_gc_duration_seconds_count-alertmanager_silences_gc_duration_seconds_sum-alertmanager_silences_gossip_messages_propagated_total-alertmanager_silences_queries_total-alertmanager_silences_query_duration_seconds_bucket-alertmanager_silences_query_errors_total-alertmanager_silences_snapshot_duration_seconds_count-alertmanager_silences_snapshot_duration_seconds_sum-alertmanager_silences_snapshot_size_bytes-alertmanager_state_replication_failed_total-alertmanager_state_replication_totalprometheus-elasticsearch-exporter:-elasticsearch_breakers_estimated_size_bytes-elasticsearch_breakers_limit_size_bytes-elasticsearch_breakers_tripped-elasticsearch_cluster_health_active_primary_shards-elasticsearch_cluster_health_active_shards-elasticsearch_cluster_health_delayed_unassigned_shards-elasticsearch_cluster_health_initializing_shards-elasticsearch_cluster_health_number_of_data_nodes-elasticsearch_cluster_health_number_of_nodes-elasticsearch_cluster_health_number_of_pending_tasks-elasticsearch_cluster_health_relocating_shards-elasticsearch_cluster_health_status-elasticsearch_cluster_health_unassigned_shards-elasticsearch_exporter_build_info-elasticsearch_indices_docs-elasticsearch_indices_docs_deleted-elasticsearch_indices_docs_primary-elasticsearch_indices_fielddata_evictions-elasticsearch_indices_fielddata_memory_size_bytes-elasticsearch_indices_filter_cache_evictions-elasticsearch_indices_flush_time_seconds-elasticsearch_indices_flush_total-elasticsearch_indices_get_exists_time_seconds-elasticsearch_indices_get_exists_total-elasticsearch_indices_get_missing_time_seconds-elasticsearch_indices_get_missing_total-elasticsearch_indices_get_time_seconds-elasticsearch_indices_get_total-elasticsearch_indices_indexing_delete_time_seconds_total-elasticsearch_indices_indexing_delete_total-elasticsearch_indices_indexing_index_time_seconds_total-elasticsearch_indices_indexing_index_total-elasticsearch_indices_merges_docs_total-elasticsearch_indices_merges_total-elasticsearch_indices_merges_total_size_bytes_total-elasticsearch_indices_merges_total_time_seconds_total-elasticsearch_indices_query_cache_evictions-elasticsearch_indices_query_cache_memory_size_bytes-elasticsearch_indices_refresh_time_seconds_total-elasticsearch_indices_refresh_total-elasticsearch_indices_search_fetch_time_seconds-elasticsearch_indices_search_fetch
_total-elasticsearch_indices_search_query_time_seconds-elasticsearch_indices_search_query_total-elasticsearch_indices_segment_count_primary-elasticsearch_indices_segment_count_total-elasticsearch_indices_segment_memory_bytes_primary-elasticsearch_indices_segment_memory_bytes_total-elasticsearch_indices_segments_count-elasticsearch_indices_segments_memory_bytes-elasticsearch_indices_store_size_bytes-elasticsearch_indices_store_size_bytes_primary-elasticsearch_indices_store_size_bytes_total-elasticsearch_indices_store_throttle_time_seconds_total-elasticsearch_indices_translog_operations-elasticsearch_indices_translog_size_in_bytes-elasticsearch_jvm_gc_collection_seconds_count-elasticsearch_jvm_gc_collection_seconds_sum-elasticsearch_jvm_memory_committed_bytes-elasticsearch_jvm_memory_max_bytes-elasticsearch_jvm_memory_pool_peak_used_bytes-elasticsearch_jvm_memory_used_bytes-elasticsearch_os_load1-elasticsearch_os_load15-elasticsearch_os_load5-elasticsearch_process_cpu_percent-elasticsearch_process_cpu_seconds_total-elasticsearch_process_cpu_time_seconds_sum-elasticsearch_process_open_files_count-elasticsearch_thread_pool_active_count-elasticsearch_thread_pool_completed_count-elasticsearch_thread_pool_queue_count-elasticsearch_thread_pool_rejected_count-elasticsearch_transport_rx_size_bytes_total-elasticsearch_transport_tx_size_bytes_totalprometheus-grafana:-grafana_api_dashboard_get_milliseconds-grafana_api_dashboard_get_milliseconds_count-grafana_api_dashboard_get_milliseconds_sum-grafana_api_dashboard_save_milliseconds-grafana_api_dashboard_save_milliseconds_count-grafana_api_dashboard_save_milliseconds_sum-grafana_api_dashboard_search_milliseconds-grafana_api_dashboard_search_milliseconds_count-grafana_api_dashboard_search_milliseconds_sum-grafana_api_dataproxy_request_all_milliseconds-grafana_api_dataproxy_request_all_milliseconds_count-grafana_api_dataproxy_request_all_milliseconds_sum-grafana_api_login_oauth_total-grafana_api_login_post_total-grafana_api_response_status_total-grafana_build_info-grafana_feature_toggles_info-grafana_http_request_duration_seconds_count-grafana_page_response_status_total-grafana_plugin_build_info-grafana_proxy_response_status_total-grafana_stat_total_orgs-grafana_stat_total_users-grafana_stat_totals_dashboardprometheus-kube-state-metrics:-kube_cronjob_next_schedule_time-kube_daemonset_created-kube_daemonset_status_current_number_scheduled-kube_daemonset_status_desired_number_scheduled-kube_daemonset_status_number_available-kube_daemonset_status_number_misscheduled-kube_daemonset_status_number_ready-kube_daemonset_status_number_unavailable-kube_daemonset_status_observed_generation-kube_daemonset_status_updated_number_scheduled-kube_deployment_created-kube_deployment_metadata_generation-kube_deployment_spec_replicas-kube_deployment_status_observed_generation-kube_deployment_status_replicas-kube_deployment_status_replicas_available-kube_deployment_status_replicas_unavailable-kube_deployment_status_replicas_updated-kube_endpoint_address# Since MOSK 25.1-kube_endpoint_address_available# Deprecated since MOSK 
25.1-kube_job_status_active-kube_job_status_failed-kube_job_status_succeeded-kube_namespace_created-kube_namespace_status_phase-kube_node_info-kube_node_labels-kube_node_role-kube_node_spec_taint-kube_node_spec_unschedulable-kube_node_status_allocatable-kube_node_status_capacity-kube_node_status_condition-kube_persistentvolume_capacity_bytes-kube_persistentvolume_status_phase-kube_persistentvolumeclaim_resource_requests_storage_bytes-kube_pod_container_info-kube_pod_container_resource_limits-kube_pod_container_resource_requests-kube_pod_container_status_restarts_total-kube_pod_container_status_running-kube_pod_container_status_terminated-kube_pod_container_status_waiting-kube_pod_info-kube_pod_init_container_status_running-kube_pod_status_phase-kube_service_status_load_balancer_ingress-kube_statefulset_created-kube_statefulset_metadata_generation-kube_statefulset_replicas-kube_statefulset_status_current_revision-kube_statefulset_status_observed_generation-kube_statefulset_status_replicas-kube_statefulset_status_replicas_available-kube_statefulset_status_replicas_current-kube_statefulset_status_replicas_ready-kube_statefulset_status_replicas_updated-kube_statefulset_status_update_revisionprometheus-libvirt-exporter:-libvirt_domain_block_stats_allocation-libvirt_domain_block_stats_capacity-libvirt_domain_block_stats_physical-libvirt_domain_block_stats_read_bytes_total-libvirt_domain_block_stats_read_requests_total-libvirt_domain_block_stats_write_bytes_total-libvirt_domain_block_stats_write_requests_total-libvirt_domain_info_cpu_time_seconds_total-libvirt_domain_info_maximum_memory_bytes-libvirt_domain_info_memory_usage_bytes-libvirt_domain_info_state-libvirt_domain_info_virtual_cpus-libvirt_domain_interface_stats_receive_bytes_total-libvirt_domain_interface_stats_receive_drops_total-libvirt_domain_interface_stats_receive_errors_total-libvirt_domain_interface_stats_receive_packets_total-libvirt_domain_interface_stats_transmit_bytes_total-libvirt_domain_interface_stats_transmit_drops_total-libvirt_domain_interface_stats_transmit_errors_total-libvirt_domain_interface_stats_transmit_packets_total-libvirt_domain_memory_actual_balloon_bytes-libvirt_domain_memory_available_bytes-libvirt_domain_memory_rss_bytes-libvirt_domain_memory_unused_bytes-libvirt_domain_memory_usable_bytes-libvirt_upprometheus-memcached-exporter:-memcached_commands_total-memcached_current_bytes-memcached_current_connections-memcached_current_items-memcached_exporter_build_info-memcached_items_evicted_total-memcached_items_reclaimed_total-memcached_limit_bytes-memcached_read_bytes_total-memcached_up-memcached_version-memcached_written_bytes_totalprometheus-msteams:[]prometheus-mysql-exporter:-mysql_global_status_aborted_clients-mysql_global_status_aborted_connects-mysql_global_status_buffer_pool_pages-mysql_global_status_bytes_received-mysql_global_status_bytes_sent-mysql_global_status_commands_total-mysql_global_status_created_tmp_disk_tables-mysql_global_status_created_tmp_files-mysql_global_status_created_tmp_tables-mysql_global_status_handlers_total-mysql_global_status_innodb_log_waits-mysql_global_status_innodb_num_open_files-mysql_global_status_innodb_page_size-mysql_global_status_max_used_connections-mysql_global_status_open_files-mysql_global_status_open_table_definitions-mysql_global_status_open_tables-mysql_global_status_opened_files-mysql_global_status_opened_table_definitions-mysql_global_status_opened_tables-mysql_global_status_qcache_free_memory-mysql_global_status_qcache_hits-mysql_global_status_qcache_inserts-m
ysql_global_status_qcache_lowmem_prunes-mysql_global_status_qcache_not_cached-mysql_global_status_qcache_queries_in_cache-mysql_global_status_queries-mysql_global_status_questions-mysql_global_status_select_full_join-mysql_global_status_select_full_range_join-mysql_global_status_select_range-mysql_global_status_select_range_check-mysql_global_status_select_scan-mysql_global_status_slow_queries-mysql_global_status_sort_merge_passes-mysql_global_status_sort_range-mysql_global_status_sort_rows-mysql_global_status_sort_scan-mysql_global_status_table_locks_immediate-mysql_global_status_table_locks_waited-mysql_global_status_threads_cached-mysql_global_status_threads_connected-mysql_global_status_threads_created-mysql_global_status_threads_running-mysql_global_status_wsrep_flow_control_paused-mysql_global_status_wsrep_local_recv_queue-mysql_global_status_wsrep_local_state-mysql_global_status_wsrep_ready-mysql_global_variables_innodb_buffer_pool_size-mysql_global_variables_innodb_log_buffer_size-mysql_global_variables_key_buffer_size-mysql_global_variables_max_connections-mysql_global_variables_open_files_limit-mysql_global_variables_query_cache_size-mysql_global_variables_table_definition_cache-mysql_global_variables_table_open_cache-mysql_global_variables_thread_cache_size-mysql_global_variables_wsrep_desync-mysql_up-mysql_version_info-mysqld_exporter_build_infoprometheus-node-exporter:-node_arp_entries-node_bonding_active-node_bonding_slaves-node_boot_time_seconds-node_context_switches_total-node_cpu_seconds_total-node_disk_io_now-node_disk_io_time_seconds_total-node_disk_io_time_weighted_seconds_total-node_disk_read_bytes_total-node_disk_read_time_seconds_total-node_disk_reads_completed_total-node_disk_reads_merged_total-node_disk_write_time_seconds_total-node_disk_writes_completed_total-node_disk_writes_merged_total-node_disk_written_bytes_total-node_entropy_available_bits-node_exporter_build_info-node_filefd_allocated-node_filefd_maximum-node_filesystem_avail_bytes-node_filesystem_files-node_filesystem_files_free-node_filesystem_free_bytes-node_filesystem_readonly-node_filesystem_size_bytes-node_forks_total-node_hwmon_temp_celsius-node_hwmon_temp_crit_alarm_celsius-node_hwmon_temp_crit_celsius-node_hwmon_temp_crit_hyst_celsius-node_hwmon_temp_max_celsius-node_intr_total-node_load1-node_load15-node_load5-node_memory_Active_anon_bytes-node_memory_Active_bytes-node_memory_Active_file_bytes-node_memory_AnonHugePages_bytes-node_memory_AnonPages_bytes-node_memory_Bounce_bytes-node_memory_Buffers_bytes-node_memory_Cached_bytes-node_memory_CommitLimit_bytes-node_memory_Committed_AS_bytes-node_memory_DirectMap1G-node_memory_DirectMap2M_bytes-node_memory_DirectMap4k_bytes-node_memory_Dirty_bytes-node_memory_HardwareCorrupted_bytes-node_memory_HugePages_Free-node_memory_HugePages_Rsvd-node_memory_HugePages_Surp-node_memory_HugePages_Total-node_memory_Hugepagesize_bytes-node_memory_Inactive_anon_bytes-node_memory_Inactive_bytes-node_memory_Inactive_file_bytes-node_memory_KernelStack_bytes-node_memory_Mapped_bytes-node_memory_MemAvailable_bytes-node_memory_MemFree_bytes-node_memory_MemTotal_bytes-node_memory_Mlocked_bytes-node_memory_NFS_Unstable_bytes-node_memory_PageTables_bytes-node_memory_SReclaimable_bytes-node_memory_SUnreclaim_bytes-node_memory_Shmem_bytes-node_memory_Slab_bytes-node_memory_SwapCached_bytes-node_memory_SwapFree_bytes-node_memory_SwapTotal_bytes-node_memory_Unevictable_bytes-node_memory_VmallocChunk_bytes-node_memory_VmallocTotal_bytes-node_memory_VmallocUsed_bytes-node_memory_Writ
ebackTmp_bytes-node_memory_Writeback_bytes-node_netstat_TcpExt_TCPSynRetrans-node_netstat_Tcp_ActiveOpens-node_netstat_Tcp_AttemptFails-node_netstat_Tcp_CurrEstab-node_netstat_Tcp_EstabResets-node_netstat_Tcp_InCsumErrors-node_netstat_Tcp_InErrs-node_netstat_Tcp_InSegs-node_netstat_Tcp_MaxConn-node_netstat_Tcp_OutRsts-node_netstat_Tcp_OutSegs-node_netstat_Tcp_PassiveOpens-node_netstat_Tcp_RetransSegs-node_netstat_Udp_InCsumErrors-node_netstat_Udp_InDatagrams-node_netstat_Udp_InErrors-node_netstat_Udp_NoPorts-node_netstat_Udp_OutDatagrams-node_netstat_Udp_RcvbufErrors-node_netstat_Udp_SndbufErrors-node_network_mtu_bytes-node_network_receive_bytes_total-node_network_receive_compressed_total-node_network_receive_drop_total-node_network_receive_errs_total-node_network_receive_fifo_total-node_network_receive_frame_total-node_network_receive_multicast_total-node_network_receive_packets_total-node_network_transmit_bytes_total-node_network_transmit_carrier_total-node_network_transmit_colls_total-node_network_transmit_compressed_total-node_network_transmit_drop_total-node_network_transmit_errs_total-node_network_transmit_fifo_total-node_network_transmit_packets_total-node_network_up-node_nf_conntrack_entries-node_nf_conntrack_entries_limit-node_procs_blocked-node_procs_running-node_scrape_collector_duration_seconds-node_scrape_collector_success-node_sockstat_FRAG_inuse-node_sockstat_FRAG_memory-node_sockstat_RAW_inuse-node_sockstat_TCP_alloc-node_sockstat_TCP_inuse-node_sockstat_TCP_mem-node_sockstat_TCP_mem_bytes-node_sockstat_TCP_orphan-node_sockstat_TCP_tw-node_sockstat_UDPLITE_inuse-node_sockstat_UDP_inuse-node_sockstat_UDP_mem-node_sockstat_UDP_mem_bytes-node_sockstat_sockets_used-node_textfile_scrape_error-node_time_seconds-node_timex_estimated_error_seconds-node_timex_frequency_adjustment_ratio-node_timex_maxerror_seconds-node_timex_offset_seconds-node_timex_sync_status-node_uname_infoprometheus-rabbitmq-exporter:# Deprecated since MOSK 25.1, use rabbitmq-prometheus-plugin 
instead-rabbitmq_channels-rabbitmq_connections-rabbitmq_consumers-rabbitmq_exchanges-rabbitmq_exporter_build_info-rabbitmq_fd_available-rabbitmq_fd_used-rabbitmq_node_disk_free-rabbitmq_node_disk_free_alarm-rabbitmq_node_mem_alarm-rabbitmq_node_mem_used-rabbitmq_partitions-rabbitmq_queue_messages_global-rabbitmq_queue_messages_ready_global-rabbitmq_queue_messages_unacknowledged_global-rabbitmq_queues-rabbitmq_sockets_available-rabbitmq_sockets_used-rabbitmq_up-rabbitmq_uptime-rabbitmq_version_infoprometheus-relay:[]prometheus-server:-prometheus_build_info-prometheus_config_last_reload_success_timestamp_seconds-prometheus_config_last_reload_successful-prometheus_engine_query_duration_seconds-prometheus_engine_query_duration_seconds_sum-prometheus_http_request_duration_seconds_count-prometheus_notifications_alertmanagers_discovered-prometheus_notifications_errors_total-prometheus_notifications_queue_capacity-prometheus_notifications_queue_length-prometheus_notifications_sent_total-prometheus_rule_evaluation_failures_total-prometheus_target_interval_length_seconds-prometheus_target_interval_length_seconds_count-prometheus_target_scrapes_sample_duplicate_timestamp_total-prometheus_tsdb_blocks_loaded-prometheus_tsdb_compaction_chunk_range_seconds_count-prometheus_tsdb_compaction_chunk_range_seconds_sum-prometheus_tsdb_compaction_chunk_samples_count-prometheus_tsdb_compaction_chunk_samples_sum-prometheus_tsdb_compaction_chunk_size_bytes_sum-prometheus_tsdb_compaction_duration_seconds_bucket-prometheus_tsdb_compaction_duration_seconds_count-prometheus_tsdb_compaction_duration_seconds_sum-prometheus_tsdb_compactions_failed_total-prometheus_tsdb_compactions_total-prometheus_tsdb_compactions_triggered_total-prometheus_tsdb_head_active_appenders-prometheus_tsdb_head_chunks-prometheus_tsdb_head_chunks_created_total-prometheus_tsdb_head_chunks_removed_total-prometheus_tsdb_head_gc_duration_seconds_sum-prometheus_tsdb_head_samples_appended_total-prometheus_tsdb_head_series-prometheus_tsdb_head_series_created_total-prometheus_tsdb_head_series_removed_total-prometheus_tsdb_reloads_failures_total-prometheus_tsdb_reloads_total-prometheus_tsdb_storage_blocks_bytes-prometheus_tsdb_wal_corruptions_total-prometheus_tsdb_wal_fsync_duration_seconds_count-prometheus_tsdb_wal_fsync_duration_seconds_sum-prometheus_tsdb_wal_truncations_failed_total-prometheus_tsdb_wal_truncations_totalrabbitmq-operator-metrics:-rest_client_requests_totalrabbitmq-prometheus-plugin:# Since MOSK 25.1 to replace 
prometheus-rabbitmq-exporter-erlang_vm_allocators-erlang_vm_dist_node_queue_size_bytes-erlang_vm_dist_node_state-erlang_vm_dist_recv_bytes-erlang_vm_dist_recv_cnt-erlang_vm_dist_send_bytes-erlang_vm_dist_send_cnt-erlang_vm_ets_limit-erlang_vm_memory_bytes_total-erlang_vm_memory_dets_tables-erlang_vm_memory_ets_tables-erlang_vm_memory_system_bytes_total-erlang_vm_port_count-erlang_vm_port_limit-erlang_vm_process_count-erlang_vm_process_limit-erlang_vm_statistics_bytes_output_total-erlang_vm_statistics_bytes_received_total-erlang_vm_statistics_context_switches-erlang_vm_statistics_dirty_cpu_run_queue_length-erlang_vm_statistics_dirty_io_run_queue_length-erlang_vm_statistics_garbage_collection_bytes_reclaimed-erlang_vm_statistics_garbage_collection_number_of_gcs-erlang_vm_statistics_reductions_total-erlang_vm_statistics_run_queues_length-erlang_vm_statistics_run_queues_length_total-erlang_vm_statistics_runtime_milliseconds-rabbitmq_alarms_free_disk_space_watermark-rabbitmq_alarms_memory_used_watermark-rabbitmq_build_info-rabbitmq_channels-rabbitmq_channels_closed_total-rabbitmq_channels_opened_total-rabbitmq_connections-rabbitmq_connections_closed_total-rabbitmq_connections_opened_total-rabbitmq_consumers-rabbitmq_disk_space_available_bytes-rabbitmq_erlang_uptime_seconds-rabbitmq_global_messages_acknowledged_total-rabbitmq_global_messages_confirmed_total-rabbitmq_global_messages_delivered_consume_auto_ack_total-rabbitmq_global_messages_delivered_consume_manual_ack_total-rabbitmq_global_messages_delivered_get_auto_ack_total-rabbitmq_global_messages_delivered_get_manual_ack_total-rabbitmq_global_messages_get_empty_total-rabbitmq_global_messages_received_confirm_total-rabbitmq_global_messages_received_total-rabbitmq_global_messages_redelivered_total-rabbitmq_global_messages_routed_total-rabbitmq_global_messages_unroutable_dropped_total-rabbitmq_global_messages_unroutable_returned_total-rabbitmq_global_publishers-rabbitmq_identity_info-rabbitmq_process_max_fds-rabbitmq_process_max_tcp_sockets-rabbitmq_process_open_fds-rabbitmq_process_open_tcp_sockets-rabbitmq_process_resident_memory_bytes-rabbitmq_queue_messages-rabbitmq_queue_messages_ready-rabbitmq_queue_messages_unacked-rabbitmq_queues-rabbitmq_queues_created_total-rabbitmq_queues_declared_total-rabbitmq_queues_deleted_total-rabbitmq_resident_memory_limit_bytes-rabbitmq_unreachable_cluster_peers_countsf-notifier:-sf_auth_ok-sf_error_count_created-sf_error_count_total-sf_request_count_created-sf_request_count_totaltelegraf-docker-swarm:-docker_n_containers-docker_n_containers_paused-docker_n_containers_running-docker_n_containers_stopped-docker_swarm_node_ready# Removed in MOSK 
25.1-docker_swarm_tasks_desired-docker_swarm_tasks_running-internal_agent_gather_errorstelemeter-client:-federate_errors-federate_filtered_samples-federate_samplestelemeter-server:-telemeter_cleanups_total-telemeter_partitions-telemeter_samples_totaltf-cassandra-jmx-exporter:-cassandra_cache_entries-cassandra_cache_estimated_size_bytes-cassandra_cache_hits_total-cassandra_cache_requests_total-cassandra_client_authentication_failures_total-cassandra_client_native_connections-cassandra_client_request_failures_total-cassandra_client_request_latency_seconds_count-cassandra_client_request_latency_seconds_sum-cassandra_client_request_timeouts_total-cassandra_client_request_unavailable_exceptions_total-cassandra_client_request_view_write_latency_seconds-cassandra_commit_log_pending_tasks-cassandra_compaction_bytes_compacted_total-cassandra_compaction_completed_total-cassandra_dropped_messages_total-cassandra_endpoint_connection_timeouts_total-cassandra_storage_exceptions_total-cassandra_storage_hints_total-cassandra_storage_load_bytes-cassandra_table_estimated_pending_compactions-cassandra_table_repaired_ratio-cassandra_table_sstables_per_read_count-cassandra_table_tombstones_scanned-cassandra_thread_pool_active_tasks-cassandra_thread_pool_blocked_taskstf-control:-tf_controller_sessions-tf_controller_uptf-kafka-jmx:-jmx_exporter_build_info-kafka_controller_controllerstats_count-kafka_controller_controllerstats_oneminuterate-kafka_controller_kafkacontroller_value-kafka_log_log_value-kafka_network_processor_value-kafka_network_requestmetrics_99thpercentile-kafka_network_requestmetrics_mean-kafka_network_requestmetrics_oneminuterate-kafka_network_socketserver_value-kafka_server_brokertopicmetrics_count-kafka_server_brokertopicmetrics_oneminuterate-kafka_server_delayedoperationpurgatory_value-kafka_server_kafkarequesthandlerpool_oneminuterate-kafka_server_replicamanager_oneminuterate-kafka_server_replicamanager_valuetf-operator:-tf_operator_info# Since MOSK 
23.3tf-redis:-redis_commands_duration_seconds_total-redis_commands_processed_total-redis_commands_total-redis_connected_clients-redis_connected_slaves-redis_db_keys-redis_db_keys_expiring-redis_evicted_keys_total-redis_expired_keys_total-redis_exporter_build_info-redis_instance_info-redis_keyspace_hits_total-redis_keyspace_misses_total-redis_memory_max_bytes-redis_memory_used_bytes-redis_net_input_bytes_total-redis_net_output_bytes_total-redis_rejected_connections_total-redis_slave_info-redis_up-redis_uptime_in_secondstf-vrouter:-tf_vrouter_ds_discard-tf_vrouter_ds_flow_action_drop-tf_vrouter_ds_flow_queue_limit_exceeded-tf_vrouter_ds_flow_table_full-tf_vrouter_ds_frag_err-tf_vrouter_ds_invalid_if-tf_vrouter_ds_invalid_label-tf_vrouter_ds_invalid_nh-tf_vrouter_flow_active-tf_vrouter_flow_aged-tf_vrouter_flow_created-tf_vrouter_lls_session_info-tf_vrouter_up-tf_vrouter_xmpp_connection_statetf-zookeeper:-approximate_data_size-bytes_received_count-commit_count-connection_drop_count-connection_rejected-connection_request_count-dead_watchers_cleaner_latency_sum-dead_watchers_cleared-dead_watchers_queued-digest_mismatches_count-election_time_sum-ephemerals_count-follower_sync_time_count-follower_sync_time_sum-fsynctime_sum-global_sessions-jvm_classes_loaded-jvm_gc_collection_seconds_sum-jvm_info-jvm_memory_pool_bytes_used-jvm_threads_current-jvm_threads_deadlocked-jvm_threads_state-leader_uptime-learner_commit_received_count-learner_proposal_received_count-learners-local_sessions-max_file_descriptor_count-node_changed_watch_count_sum-node_children_watch_count_sum-node_created_watch_count_sum-node_deleted_watch_count_sum-num_alive_connections-om_commit_process_time_ms_sum-om_proposal_process_time_ms_sum-open_file_descriptor_count-outstanding_requests-packets_received-packets_sent-pending_syncs-proposal_count-quorum_size-response_packet_cache_hits-response_packet_cache_misses-response_packet_get_children_cache_hits-response_packet_get_children_cache_misses-revalidate_count-snapshottime_sum-stale_sessions_expired-synced_followers-synced_non_voting_followers-synced_observers-unrecoverable_error_count-uptime-watch_count-znode_countucp-kv:[]
Note
The following Prometheus metrics are removed from the list of
white-listed scrape jobs in Container Cloud 2.25.0 (Cluster releases
17.0.0 and 16.0.0):
The prometheus-coredns job from the go-collector-metrics
and process-collector-metrics groups
You can add necessary metrics that are dropped to this white list as described
below. It is also possible to disable the filtering feature. However, Mirantis
does not recommend disabling the feature to prevent direct impact on the
Prometheus index size, which affects query speed. For clusters with extended
retention period, performance degradation will be the most noticeable.
You can expand the default white list of Prometheus
metrics using the prometheusServer.metricsFiltering.extraMetricsInclude
parameter to enable metrics that are dropped by default. For the
parameter description, see Prometheus metrics filtering. For configuration
steps, see Configure StackLight.
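For illustration, a values override along the following lines re-enables a dropped metric. Both the exact structure of extraMetricsInclude and the metric grouping are described in Prometheus metrics filtering, so treat the layout and names below as placeholders:
prometheusServer:
  metricsFiltering:
    extraMetricsInclude:
      prometheus-coredns:
        - go_goroutines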
Mirantis does not recommend disabling metrics filtering to prevent direct
impact on the Prometheus index size, which affects query speed. In clusters
with an extended retention period, performance degradation will be the most
noticeable. Therefore, the best option is to keep the feature enabled and add
the required dropped metrics to the white list as described in
Add dropped metrics to the white list.
If disabling of metrics filtering is absolutely necessary, set the
prometheusServer.metricsFiltering.enabled parameter to false:
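A minimal sketch of the corresponding StackLight values override, following the parameter path given above:
prometheusServer:
  metricsFiltering:
    enabled: false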
Available since MCC 2.27.0 (Cluster releases 17.2.0 and 16.2.0)
The StackLight telegraf-ds-smart exporter uses the
S.M.A.R.T. plugin to
obtain detailed disk information and export it as metrics on a
MOSK cluster. S.M.A.R.T. is a system commonly used across vendors that
exposes drive health and performance data as attributes, though attribute
names can differ between vendors. Each attribute contains the following
values:
Raw value
The actual value of the attribute at the moment of reporting. Units may
differ across vendors.
Current value
Health valuation where values can range from 1 to 253 (1
represents the worst case and 253 represents the best one). Depending
on the manufacturer, a value of 100 or 200 will often be selected
as the normal value.
Worst value
The worst value ever observed as a current one for a particular device.
Threshold value
Lower threshold for the current value. If the current value drops below the
lower threshold, it requires attention.
The following table provides examples for alert rules based on S.M.A.R.T.
metrics. These examples may not work for all clusters depending on vendor or
disk types.
Caution
Before creating alert rules, manually test these expressions to
verify whether they are valid for the cluster. You can also implement any
other alerts based on S.M.A.R.T. metrics.
To create custom alert rules in StackLight, use the customAlerts
parameter described in Alert configuration; see the example after the
following table.
Expression
Description
expr:smart_device_exit_status>0
Alerts when a device exit status signals potential issues.
expr:smart_device_health_ok==0
Indicates disk health failure.
expr:smart_attribute_threshold>=smart_attribute
Targets any S.M.A.R.T. attribute reaching its predefined threshold,
indicating a potential risk or imminent failure of the disk. Utilizing
this alert might eliminate the need for more specific attribute alerts
by relying on the vendor’s established thresholds, streamlining the
monitoring process. Implementing inhibition rules may be necessary to
manage overlaps with other alerts effectively.
expr:smart_device_temp_c>60
Is triggered when disk temperature exceeds 60°C, indicating potential
overheating issues.
expr:increase(smart_device_udma_crc_errors[2m])>0
Identifies an increase in UDMA CRC errors, indicating data transmission
issues between the disk and controller.
expr:increase(smart_device_read_error_rate[2m])>0
Is triggered during a noticeable increase in the rate of read errors on the
disk. This is a strong indicator of issues with the disk surface or
read/write heads that can affect data integrity and accessibility.
expr:increase(smart_device_spin_retry_count[2m])>0
Is triggered when the disk experiences an increase in attempts to spin up
to its operational speed, indicating potential issues with the disk motor,
bearings, or power supply, which can lead to drive failure.
expr:increase(smart_device_uncorrectable_sector_count[2m])>0
Is triggered during an increase in the number of disk sectors that cannot
be corrected by the error correction algorithms of the drive, pointing
towards serious disk surface or read/write head issues.
expr:increase(smart_device_pending_sector_count[2m])>0
Is triggered on a rise in sectors that are marked as pending for remapping
due to read errors. Persistent increases can indicate deteriorating
disk health and impending failure.
expr:increase(smart_device_end_to_end_error[2m])>0
Detects an upsurge in errors during the process of data transmission
from the host to the disk and vice versa, highlighting potential issues
in data integrity during transfer operations.
expr:increase(smart_device_reallocated_sectors_count[2m])>0
Is triggered during an increase in sectors that have been reallocated due to
being deemed defective. A rising count is a critical sign of ongoing wear
and tear, or damage to the disk surface.
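For instance, one of the expressions from the table above can be wrapped into a custom alert similar to the following sketch. The alert name, duration, and labels are illustrative, and the exact customAlerts structure is described in Alert configuration:
customAlerts:
- alert: SmartDeviceHealthFailed
  expr: smart_device_health_ok == 0
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: S.M.A.R.T. reports a failed disk health check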
The following table describes S.M.A.R.T. metrics provided by StackLight that
you can use for creating alert rules depending on your cluster requirements:
Metric
Description
smart_attribute
Reports current S.M.A.R.T. attribute values with labels for detailed context.
smart_attribute_exit_status
Indicates the fetching status of individual attributes. A non-zero code
indicates monitoring issues.
smart_attribute_raw_value
Reports raw S.M.A.R.T. attribute values with labels for detailed context.
smart_attribute_threshold
Reports S.M.A.R.T. attribute threshold values with labels for detailed context.
smart_attribute_worst
Reports the worst recorded values of S.M.A.R.T. attributes with labels
for detailed context.
smart_device_command_timeout
Counts timeouts when a drive fails to respond to a command, indicating
responsiveness issues.
smart_device_exit_status
Reflects the overall device status post-checks, where values other than
0 indicate issues.
smart_device_health_ok
Indicates overall device health, where values other than 1 indicate
issues. Relates to the --health attribute of the smartctl
tool.
The following table describes metrics derived from various S.M.A.R.T.
attributes that are also exposed through the smart_attribute* metrics above,
but with a different value representation, such as unified units or counter
semantics. While vendors may name attributes differently, the following
metrics are standardized across vendors. Depending on the disk or vendor type,
a cluster may miss some of the following metrics or have extra ones.
Metric
Description
smart_device_end_to_end_error
Monitors data transmission errors, where an increase suggests potential
transfer issues.
smart_device_pending_sector_count
Counts sectors awaiting remapping due to unrecoverable errors, with decreases
over time indicating successful remapping.
smart_device_read_error_rate
Tracks errors occurring during disk data reads.
smart_device_reallocated_sectors_count
Counts defective sectors that have been remapped, with increases indicating
drive degradation.
smart_device_seek_error_rate
Measures the error frequency of the drive positioning mechanism, with
high values indicating mechanical issues.
smart_device_spin_retry_count
Tracks the drive attempts to spin up to operational speed, with increases
indicating mechanical issues.
smart_device_temp_c
Reports the drive temperature in Celsius.
smart_device_udma_crc_errors
Counts errors in data communication between the drive and host.
smart_device_uncorrectable_errors
Records total uncorrectable read/write errors.
smart_device_uncorrectable_sector_count
Counts sectors that cannot be corrected indicating potentially damaged sectors.
On an existing managed cluster, when you add a worker machine to replace the
one that carries the StackLight node label, you must migrate the label to the
new machine and manually remove the StackLight Pods from the old machine from
which you remove the label.
Caution
In this procedure, replace <machine-name> with the name of
the machine from which you remove the StackLight node label.
To deschedule StackLight Pods from a worker machine:
Remove the stacklight=enabled node label from the spec section of
the target Machine object.
Connect to the required cluster using its kubeconfig.
Verify that the stacklight=enabled label was removed successfully:
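For example, using standard kubectl syntax against the cluster kubeconfig:
kubectl get nodes -l stacklight=enabled
The output should no longer include the node of the machine from which you removed the label.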
The Tungsten Fabric cluster update is performed during the
MOSK cluster release update.
The control plane update is performed automatically. To complete
the data plane update, you will need to manually remove the vRouter pods.
See Cluster update for details.
MOSK enables you to perform the automatic TF
data backup using the tf-dbbackup-job cron job. Also, you can configure
a remote NFS storage for TF data backups. For configuration details,
refer to the Tungsten Fabric database section in Reference Architecture.
This section provides instructions on how to back up the TF data manually
if needed.
This section describes how to restore the Cassandra and ZooKeeper databases
from the backups created either automatically or manually as described in
Back up TF databases.
Caution
The data backup must be consistent across all systems
because the state of the Tungsten Fabric databases is associated with
other system databases, such as OpenStack databases.
When restoring the data, MOSK stops the
Tungsten Fabric services and recreates the database backends that
include Cassandra, Kafka, and ZooKeeper.
Note
The automated restoration process relies on automated database
backups configured by the Tungsten Fabric Operator. The Tungsten Fabric
data is restored from the backup type specified in the tf-dbBackup
section of the Tungsten Fabric Operator custom resource, or the default
pvc type if not specified. For the configuration details, refer to
Periodic Tungsten Fabric database backups.
Optional. Specify the name of the backup to be used for the
dbDumpName parameter. By default, the latest db-dump is used.
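For illustration only, assuming the v2 TFOperator layout used elsewhere in this guide (spec.features), a restore request with an explicit dump name might look similar to the following. Verify the exact field names against Periodic Tungsten Fabric database backups before applying it:
spec:
  features:
    dbRestoreMode:
      enabled: true
      dbDumpName: <DB-DUMP-NAME>
After the restoration is triggered, you can track its progress in the Status and Events fields of the tf-dbrestore object, as in the following example output: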
...
Status:
  Health:  Ready
Events:
  Type    Reason                       Age                From          Message
  ----    ------                       ----               ----          -------
  Normal  TfDaemonSetsDeleted          18m (x4 over 18m)  tf-dbrestore  TF DaemonSets were deleted
  Normal  zookeeperOperatorScaledDown  18m                tf-dbrestore  zookeeper operator scaled to 0
  Normal  zookeeperStsScaledDown       18m                tf-dbrestore  tf-zookeeper statefulset scaled to 0
  Normal  cassandraOperatorScaledDown  17m                tf-dbrestore  cassandra operator scaled to 0
  Normal  cassandraStsScaledDown       17m                tf-dbrestore  tf-cassandra-config-dc1-rack1 statefulset scaled to 0
  Normal  cassandraStsPodsDeleted      16m                tf-dbrestore  tf-cassandra-config-dc1-rack1 statefulset pods deleted
  Normal  cassandraPVCDeleted          16m                tf-dbrestore  tf-cassandra-config-dc1-rack1 PVC deleted
  Normal  zookeeperStsPodsDeleted      16m                tf-dbrestore  tf-zookeeper statefulset pods deleted
  Normal  zookeeperPVCDeleted          16m                tf-dbrestore  tf-zookeeper PVC deleted
  Normal  kafkaOperatorScaledDown      16m                tf-dbrestore  kafka operator scaled to 0
  Normal  kafkaStsScaledDown           16m                tf-dbrestore  tf-kafka statefulset scaled to 0
  Normal  kafkaStsPodsDeleted          16m                tf-dbrestore  tf-kafka statefulset pods deleted
  Normal  AllOperatorsStopped          16m                tf-dbrestore  All 3rd party operator's stopped
  Normal  CassandraOperatorScaledUP    16m                tf-dbrestore  CassandraOperator scaled to 1
  Normal  CassandraStsScaledUP         16m                tf-dbrestore  Cassandra statefulset scaled to 3
  Normal  CassandraPodsActive          12m                tf-dbrestore  Cassandra pods active
  Normal  ZookeeperOperatorScaledUP    12m                tf-dbrestore  Zookeeper Operator scaled to 1
  Normal  ZookeeperStsScaledUP         12m                tf-dbrestore  Zookeeper Operator scaled to 3
  Normal  ZookeeperPodsActive          12m                tf-dbrestore  Zookeeper pods active
  Normal  DBRestoreFinished            12m                tf-dbrestore  TF db restore finished
  Normal  TFRestoreDisabled            12m                tf-dbrestore  TF Restore disabled
Note
If the restoration was completed several hours ago, events may not be
shown with kubectl describe. If so, verify the Status
field and get events using the following command:
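A command along the following lines can list the restore-related events. The object name below is a placeholder that depends on your deployment:
kubectl -n tf get events --field-selector involvedObject.name=<TFDBRESTORE-OBJECT-NAME>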
After the job completes, it can take around 15 minutes to stabilize
tf-control services. If some pods are still in the CrashLoopBackOff
status, restart these pods manually one by one:
List the tf-control pods:
kubectl -n tf get pods -l app=tf-control
Verify that the new pods are successfully spawned.
Verify that no vRouters are connected only to the tf-control
pod that will be restarted.
Restart the tf-control pods sequentially:
kubectl -n tf delete pod tf-control-<hash>
When the restoration completes, MOSK automatically sets
dbRestoreMode to false in the Tungsten Fabric Operator custom
resource.
Delete the tfdbrestore object from the cluster to be able to perform
the next restoration:
Terminate the configuration and analytics services and stop the database
changes associated with northbound APIs on all systems.
Note
The Tungsten Fabric Operator watches related resources and keeps
them updated and healthy. If any resource is deleted or changed, the
Tungsten Fabric Operator automatically runs reconciling to create
a resource or change the configuration back to the required state.
Therefore, the Tungsten Fabric Operator must not be running during
the data restoration.
Scale the tungstenfabric-operator deployment to 0 replicas:
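For example, assuming the operator runs as a Deployment in the tf namespace:
kubectl -n tf scale deployment tungstenfabric-operator --replicas=0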
Do not use the Tungsten Fabric API container used for the backup
file creation. In this case, a session with the Cassandra and ZooKeeper
databases is created once the Tungsten Fabric API service starts but
the Tungsten Fabric configuration services are stopped. The tools for
the data backup and restore are available only in the Tungsten Fabric
configuration API container. Using the steps below, start a blind
container based on the config-api image.
Deploy a pod using the configuration API image obtained in the first
step:
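A minimal sketch of such a pod definition. The pod name is hypothetical, the image is the one obtained in the first step, and the command simply keeps the container running so that you can exec into it:
apiVersion: v1
kind: Pod
metadata:
  name: tf-config-api-tools
  namespace: tf
spec:
  containers:
  - name: config-api
    image: <TF_CONFIG_API_IMAGE>
    command: ["sleep", "infinity"]
    env:
    - name: CONFIGDB_CASSANDRA_DRIVER
      value: <DRIVER>   # set to cql if your deployment uses the cql driver, see the note below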
Note
Since MOSK 24.1, if your deployment uses the
cql Cassandra driver, update the value of the
CONFIGDB_CASSANDRA_DRIVER environment variable to cql.
To avoid network downtime, do not restart all pods
simultaneously.
List the tf-control pods:
kubectl -n tf get pods -l app=tf-control
Restart the tf-control pods one by one.
Caution
Before restarting the tf-control pods:
Verify that the new pods are successfully spawned.
Verify that no vRouters are connected only to the tf-control
pod that will be restarted.
kubectl -n tf delete pod tf-control-<hash>
Convert v1alpha1 TFOperator custom resource to v2
Available since MOSK 24.2
In 24.1, MOSK introduces the API v2 for Tungsten Fabric.
Since 24.2, Tungsten Fabric API v2 becomes default for new deployments and
includes the ability to convert the existing v1alpha1 TFOperator to v2.
During the update to the 24.3 series, the old Tungsten Fabric cluster
configuration API v1alpha1 is automatically converted and replaced with the v2
version. Since MOSK 25.1, Tungsten Fabric API v1alpha1 is no longer present in
the product.
During cluster update to MOSK 24.3, the automatic
conversion of the TFOperator v1alpha1 to the v2 version takes place.
Therefore, there is no need to perform any manual conversion.
Warning
Since MOSK 24.3, start using the v2
TFOperator custom resource for any updates.
The v1alpha1 TFOperator custom resource remains in the cluster
but is no longer reconciled and will be automatically removed in
MOSK 25.1.
MOSK 24.2
Caution
Conversion of TFOperator causes recreation of the Tungsten
Fabric service pods. Therefore, Mirantis recommends performing
the conversion during a maintenance window.
Update the tungstenfabric-operator Helm release values in
the corresponding ClusterRelease resource:
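Based on the convertToV2 flag shown later in this section for the reverse operation, the values update is expected to look similar to the following:
values:
  operator:
    convertToV2: true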
When the chart changes apply, the tungstenfabric-operator-convert-to-v2
job performs the following:
Saves the existing v1alpha1 TFOperator specification to the
tfoperator-v1alpha1-copy ConfigMap
Creates the v2 TFOperator custom resource
Removes the redundant v1alpha1 TFOperator custom resource
While the conversion is being performed, monitor the recreation of the
Tungsten Fabric service pods. Verify that TFOperator v2 has been
created successfully:
kubectl -n tf describe tf.mirantis.com openstack-tf
Reverse the conversion of v1alpha1 TFOperator to v2
MOSK 24.3
Caution
During the reverse conversion, the Tungsten Fabric service
pods will get updated. Therefore, Mirantis recommends performing
the procedure during the maintenance window.
Caution
Reverse conversion is not possible since
MOSK 25.1 because v1alpha1 TFOperator is removed
from the product.
Update Helm release values in the corresponding ClusterRelease
resource:
values:
  operator:
    convertToV2: false
When the controller starts, it should use the v1alpha1 TFOperator
custom resource for reconciliation.
MOSK 24.2
Caution
During the reverse conversion, the Tungsten Fabric service
pods will get recreated. Therefore, Mirantis recommends performing
the conversion during a maintenance window.
Update the TFOperator HelmBundle:
values:
  operator:
    convertToV2: false
Manually delete the v2 TFOperator custom resource:
kubectl -n tf delete tf.mirantis.com openstack-tf
Manually create the v1alpha1 TFOperator custom resource using data from
the tfoperator-v1alpha1-copy ConfigMap.
If one of the Tungsten Fabric (TF) controller nodes has failed, follow this
procedure to replace it with a new node.
To replace a TF controller node:
Note
Pods that belong to the failed node can stay in the Terminating
state.
If a failed node has tfconfigdb=enabled or tfanalyticsdb=enabled,
or both labels assigned to it, get and note down the IP addresses of
the Cassandra pods that run on the node to be replaced:
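For example, you can list the pods on that node with a simple filter; adjust the filter if your Cassandra pods are named differently:
kubectl -n tf get pods -o wide | grep cassandra | grep <NODE-NAME>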
Delete the failed TF controller node from the Kubernetes cluster using
the Mirantis Container Cloud web UI or CLI. For the procedure, refer
to Delete a cluster machine.
Note
Once the failed node has been removed from the cluster, all pods
that hanged in the Terminating state should be removed.
Remove the control (with the BGP router), config, and config-db
nodes from the TF configuration database:
Obtain the list of config nodes:
curl -s -H "X-Auth-Token: $(openstack token issue | awk '/ id / {print $4}')" http://tf-config-api.tf.svc:8082/config-nodes | jq
Remove the failed config node:
curl -s -H "X-Auth-Token: $(openstack token issue | awk '/ id / {print $4}')" -X "DELETE" <LINK_FROM_HREF_WITH_NODE_UUID>
Obtain the list of config-database and control nodes:
curl -s -H "X-Auth-Token: $(openstack token issue | awk '/ id / {print $4}')" http://tf-config-api.tf.svc:8082/config-database-nodes | jq
curl -s -H "X-Auth-Token: $(openstack token issue | awk '/ id / {print $4}')" http://tf-config-api.tf.svc:8082/control-nodes | jq
Identify the config-database and control nodes to be deleted
using the href field from the system output from the previous step.
Delete the nodes as required:
curl -s -H "X-Auth-Token: $(openstack token issue | awk '/ id / {print $4}')" -X "DELETE" <LINK_FROM_HREF_WITH_NODE_UUID>
Hosts the TF control plane services such as database,
messaging, api, svc, config.
tfconfig=enabled
tfcontrol=enabled
tfwebui=enabled
tfconfigdb=enabled
3
TF analytics
Hosts the TF analytics services.
tfanalytics=enabled
tfanalyticsdb=enabled
3
TF vRouter
Hosts the TF vRouter module and vRouter Agent.
tfvrouter=enabled
Varies
TF vRouter DPDK Technical Preview
Hosts the TF vRouter Agent in DPDK mode.
tfvrouter-dpdk=enabled
Varies
Note
TF supports only Kubernetes OpenStack workloads.
Therefore, you should label OpenStack compute nodes with
the tfvrouter=enabled label.
Note
Do not specify the openvswitch=enabled label for the
OpenStack deployments with TF as a networking backend.
Once you label the new Kubernetes node, new pods start scheduling on the
node. However, pods that use Persistent Volume Claims are stuck in the
Pending state because their volume claims stay bound to the local volumes
from the deleted node. To resolve the issue:
Delete the PersistentVolumeClaim (PVC) bound to the local volume
from the failed node:
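For example, assuming the PVC resides in the tf namespace:
kubectl -n tf delete pvc <PVC-NAME>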
Clustered services that use PVC, such as Cassandra, Kafka,
and ZooKeeper, start the replication process when new pods move
to the Ready state.
Check the PersistentVolumes (PVs) claimed by the deleted PVCs.
If a PV is stuck in the Released state, delete it manually:
kubectl -n tf delete pv <PV>
Delete the pod that is using the removed PVC:
kubectl -n tf delete pod <POD-NAME>
Verify that the pods have successfully started on the replaced controller
node and stay in the Ready state.
If the failed controller node had tfconfigdb=enabled or
tfanalyticsdb=enabled, or both labels assigned to it, remove old
Cassandra hosts from the config and analytics cluster configuration:
Get the host ID of the removed Cassandra host using the pod IP addresses
saved during Step 1:
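One way to obtain it is to run nodetool status inside one of the remaining Cassandra pods and match the saved IP addresses; the pod name below is a placeholder:
kubectl -n tf exec -it <CASSANDRA-POD> -- nodetool status | grep <POD-IP>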
Tungsten Fabric vRouter collocates with the OpenStack compute node. Therefore,
to delete the vRouter node from your cluster, follow the node deletion
procedure in Delete a compute node.
Additionally, you need to remove the vhost0 OpenStack port and
the Node object of the deleted node from the Tungsten Fabric database:
Log in to the keystone-client pod in the openstack namespace through
the command line.
Obtain the OpenStack token required to authenticate with the Tungsten Fabric
API service:
TOKEN=$(openstack token issue | awk '/ id / {print $4}')
Obtain the list of vRouter nodes to retrieve the link for the deleted node:
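Following the pattern of the curl calls earlier in this guide, the list can presumably be retrieved from the Tungsten Fabric configuration API. Treat the resource path as an assumption and adjust it if it differs in your environment:
curl -s -H "X-Auth-Token: ${TOKEN}" http://tf-config-api.tf.svc:8082/virtual-routers | jq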
This section explains how to use the tungsten-pytest test set to
verify your Tungsten Fabric (TF) deployment. The tungsten-pytest test
set is part of the TF operator and allows for prompt verification of the
Kubernetes objects related to TF and basic verification of the TF services.
To verify the TF deployment using tungsten-pytest:
Enable the tf-test controller in the TF Operator resource for the
Operator to start the pod with the test set:
This section describes a simple load balancing configuration. As an example, we
use a topology for balancing the traffic between two HTTP servers listening on
port 80. The example topology includes the following parameters:
Backend servers 10.10.0.4 and 10.10.0.3 in the private-subnet
subnet run an HTTP application that listens on the TCP port 80.
The public-subnet subnet is a shared external subnet created by the cloud
operator and accessible from the Internet.
The created load balancer is accessible through an IP address from the public
subnet that will distribute web requests between the backend servers.
By default, MOSK uses the Octavia Tungsten
Fabric load balancing. Since 23.1, you can explicitly specify amphorav2
as a provider when creating a load balancer using the provider
argument:
openstack loadbalancer create --provider amphorav2
Octavia Amphora load balancing is available as a Technology Preview
feature. For details, refer to Octavia Amphora load balancing.
MOSK enables you to activate automatic Tungsten Fabric
database repairs using the tf-dbrepair-job CronJob. Running this repair
job is essential for maintaining the health and consistency of a Cassandra
cluster.
Below are scenarios where running the repair job is recommended:
Node recovery
If a node has been down for an extended period or replaced, run a repair
after bringing it back online to ensure it has all the latest data
Major changes
After significant changes, such as cluster update or schema modifications,
run a repair to ensure data consistency
Cluster modifications
When nodes are added or removed, data may not immediately be consistent
across replicas. Run a repair to reconcile the data across the cluster
To enable the repair job:
Edit the TFOperator custom resource to enable the database repair job:
spec:
  features:
    dbRepair:
      enabled: true
Optional. Specify the job schedule. By default, the job will run weekly.
For example, to schedule the job to run daily:
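A sketch under the assumption that the schedule is expressed as a standard cron string next to the enabled flag; verify the exact field name against the TFOperator custom resource reference:
spec:
  features:
    dbRepair:
      enabled: true
      schedule: "0 0 * * *"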
The contrail-tools container provides a centralized location for all
available Tungsten Fabric tools and CLI commands. The container includes such
utilities as vif, flow, nh, and other tools
to debug network issues. MOSK deploys contrail-tools
using the Tungsten Fabric Operator through the TFOperator custom resource.
To enable the Tungsten Fabric contrail-tools Deployment:
Enable the tools Deployment in the TFOperator resource for the
operator to start the Pods with utilities to debug Tungsten Fabric on
nodes with the tfvrouter=enabled label:
Use the labels section to specify target nodes for
the contrail-tools Deployment. If the labels section
is not specified, the tf-tool-ctools-<xxxxx> Pods
will be scheduled on all available nodes in the current Deployment.
Wait until the tf-tool-ctools-<xxxxx> Pods are ready in the tf
namespace.
Note
The <xxxxx> string in a Pod name consists of random
alpha-numeric symbols generated by Kubernetes to differentiate the
tf-tool-ctools Pods.
Use the interactive shell in the tf-tool-ctools-<xxxxx> Pod to debug
the current Deployment or run commands through kubectl, for example:
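For instance, to list vRouter interfaces from one of the tool Pods (the Pod name suffix is a placeholder):
kubectl -n tf exec -it tf-tool-ctools-<xxxxx> -- vif --list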
The tf-api-cli container provides access to the Tungsten Fabric
API through the command-line interface (CLI). See the
contrail-api-cli documentation
for details.
Note
The tf-api-cli tool was initially called contrail-api-cli.
To enable the Tungsten Fabric API CLI Deployment:
Enable the tf-cli Deployment in the TFOperator custom resource
to start the Pod with utilities to access the Tungsten Fabric API CLI:
Usage of third-party software, which is not part of
Mirantis-supported configurations, for example, the use of custom DPDK
modules, may block upgrade of an operating system distribution. Users are
fully responsible for ensuring the compatibility of such custom components
with the latest supported Ubuntu version.
Distribution upgrade of an operating system (OS) is implemented for management
and MOSK clusters.
For management clusters, an OS distribution upgrade occurs automatically
since Container Cloud 2.24.0 (Cluster release 14.0.0) as part of cluster update
and requires machines reboot. The upgrade workflow is as follows:
The distribution ID value is taken from the id field of the
distribution from the allowedDistributions list in the spec of the
ClusterRelease object.
The distribution that has the default:true value is used during
update. This distribution ID is set in the
spec:providerSpec:value:distribution field of the Machine object
during cluster update.
For MOSK clusters, an in-place OS distribution upgrade
should be performed between cluster updates. This scenario implies a machine
cordoning, draining, and reboot.
The table below illustrates the correlation between the cluster updates and
upgrade to Ubuntu 22.04 to help you effectively plan and perform the upgrade.
Correlation between cluster updates and upgrade to Ubuntu 22.04

Management cluster version: 2.28.5
MOSK cluster version: 24.3.0, 24.3.1, 24.3.2
Default Ubuntu version: 22.04
Key impact: No impact. Management cluster nodes are automatically upgraded to
Ubuntu 22.04 during cluster upgrade to Container Cloud 2.27.0 (Cluster release
16.2.0).
Action required: Strongly recommended to upgrade Ubuntu on all MOSK
cluster nodes. [0]

Management cluster version: 2.29
MOSK cluster version: 25.1
Default Ubuntu version: 22.04
Key impact: The MOSK cluster requires Ubuntu 22.04. Upgrade to 25.1
is blocked unless all cluster nodes are running on Ubuntu 22.04.
Action required: Upgrade all MOSK cluster nodes to Ubuntu 22.04
to unblock the MOSK cluster update to 25.1. [0]

Management cluster version: 2.29.1
MOSK cluster version: 24.3.3
Default Ubuntu version: 22.04
Key impact: Management cluster update to 2.29.1 is blocked unless all nodes
present in your deployment are running on Ubuntu 22.04.
Action required: Upgrade all nodes to Ubuntu 22.04 to unblock your management
cluster update. [0]

[0] Upgrading all nodes at once is not mandatory. You can upgrade them
individually or in small batches, depending on time constraints in
the maintenance window.
Caution
After the major cluster update, make sure to change the
postponeDistributionUpdate parameter back to false unless you want
to postpone new OS distribution upgrades.
Note
If you want to migrate container runtime on cluster machines from
Docker to containerd and have not upgraded the OS distribution to Jammy yet,
Mirantis recommends combining both procedures to minimize the maintenance
window. In this case, ensure that all cluster machines are updated during
one maintenance window to prevent machines from running different
container runtimes.
The machine reboot occurs automatically after completion of deployment
phases.
Once the distribution upgrade completes, verify that currentDistribution
matches the distribution value previously set in the object spec.
For description of the status fields, see Container Cloud documentation:
API Reference.
Repeat the procedure with the remaining machines.
Optional. Available since Container Cloud 2.28.4 (Cluster releases 17.3.4
and 16.3.4). Upgrade container runtime from Docker to containerd together
with distribution upgrade as described in Migrate container runtime from Docker to containerd
to minimize the size of maintenance window.
Note
Container runtime migration becomes mandatory in the scope of
Container Cloud 2.29.x. Otherwise, the management cluster update to
Container Cloud 2.30.0 will be blocked.
During a management or managed cluster update with Ubuntu package updates,
MOSK automatically removes unnecessary kernel and system
packages.
During cleanup, MOSK keeps a number of kernel versions
following the default behavior of the Ubuntu apt autoremove command:
Booted kernel
The currently booted kernel is always kept.
Latest kernel
If there are any kernel packages with versions higher than the booted
kernel version, then the kernel package with the highest version is
also kept.
Previous kernel
If there are any installed kernel packages with versions lower than the one
of the latest kernel that equals the booted kernel, then the kernel with
version previous to the latest kernel is kept.
Note
Previous kernel does not equal the previously booted kernel.
Previous kernel is an N-1 kernel in a sorted list of all kernels
installed in the system where N is the kernel with the highest version.
Caution
If a kernel package is a dependency of another package, it will
not be automatically removed. The rules above do not apply to such a case.
The number of kernel packages may be more than two if the
apt autoremove command has never been used or if the cluster is
affected by the known issue 46808.
Mirantis recommends keeping previous kernel version for fallback in case the
current kernel becomes unstable. However, if you absolutely require leaving
only the booted version of kernel packages, you can use the script described
below after considering all possible risks.
To remove all kernel packages of the previous version:
Verify that the cluster is successfully updated and is in the Ready
state.
Log in as root to the required node using SSH.
Run the following script that calls an Ansible module targeted at local
host. The module outputs a list of packages to remove, if any, without
actually removing them.
cleanup-kernel-packages
The script workflow includes the following tasks:
Task order
Task name
Description
1
Get kernels to cleanup
Collect installed kernel packages and detect the candidates for removal.
2
Get kernels to cleanup (LOG)
Print the log from the first task.
3
Kernel packages to remove
Print the list of packages collected by the first task.
4
Remove kernel packages
Remove packages that are detected as candidates for removal if the
following conditions are met:
The script detects at least one candidate for removal
You add the --cleanup flag to the
cleanup-kernel-packages command
If the system outputs any packages to remove, carefully assess the list
from the output of the Kernel packages to remove task.
Caution
The script removes all detected packages. There is no
possibility to modify the list of candidates for removal.
Example of system response when no packages are found for removal
PLAY [localhost]
TASK [Get kernels to cleanup]
ok: [localhost]
TASK [Get kernels to cleanup (LOG)]
ok: [localhost] => {"cleanup_kernels.log": ["2023-09-27 12:49:31,925 [INFO] Logging enabled",
"2023-09-27 12:49:31,937 [DEBUG] Found kernel package linux-headers-5.15.0-83-generic, version 5.15.0.post83-generic",
"2023-09-27 12:49:31,938 [DEBUG] Found kernel package linux-image-5.15.0-83-generic, version 5.15.0.post83-generic",
"2023-09-27 12:49:31,938 [DEBUG] Found kernel package linux-modules-5.15.0-83-generic, version 5.15.0.post83-generic",
"2023-09-27 12:49:31,938 [DEBUG] Found kernel package linux-modules-extra-5.15.0-83-generic, version 5.15.0.post83-generic",
"2023-09-27 12:49:31,944 [DEBUG] Current kernel is 5.15.0.post83-generic",
"2023-09-27 12:49:31,944 [INFO] No kernel packages prior version '5.15.0.post83' found, nothing to remove.",
"2023-09-27 12:49:31,944 [INFO] Exiting successfully"]}
TASK [Kernel packages to remove]
ok: [localhost] => {"cleanup_kernels.packages": []}
TASK [Remove kernel packages]
skipping: [localhost]
Example of system response with several packages to remove
TASK [Get kernels to cleanup]
ok: [localhost]
TASK [Get kernels to cleanup (LOG)]
ok: [localhost] => {"cleanup_kernels.log": ["2023-09-28 10:08:42,849 [INFO] Logging enabled",
"2023-09-28 10:08:42,865 [DEBUG] Found kernel package linux-headers-5.15.0-79-generic, version 5.15.0.post79-generic",
"2023-09-28 10:08:42,865 [DEBUG] Found kernel package linux-headers-5.15.0-83-generic, version 5.15.0.post83-generic",
"2023-09-28 10:08:42,865 [DEBUG] Found kernel package linux-hwe-5.15-headers-5.15.0-79, version 5.15.0.post79",
"2023-09-28 10:08:42,865 [DEBUG] Found kernel package linux-hwe-5.15-headers-5.15.0-83, version 5.15.0.post83",
"2023-09-28 10:08:42,866 [DEBUG] Found kernel package linux-image-5.15.0-79-generic, version 5.15.0.post79-generic",
"2023-09-28 10:08:42,866 [DEBUG] Found kernel package linux-image-5.15.0-83-generic, version 5.15.0.post83-generic",
"2023-09-28 10:08:42,866 [DEBUG] Found kernel package linux-modules-5.15.0-79-generic, version 5.15.0.post79-generic",
"2023-09-28 10:08:42,866 [DEBUG] Found kernel package linux-modules-5.15.0-83-generic, version 5.15.0.post83-generic",
"2023-09-28 10:08:42,866 [DEBUG] Found kernel package linux-modules-extra-5.15.0-79-generic, version 5.15.0.post79-generic",
"2023-09-28 10:08:42,866 [DEBUG] Found kernel package linux-modules-extra-5.15.0-83-generic, version 5.15.0.post83-generic",
"2023-09-28 10:08:42,871 [DEBUG] Current kernel is 5.15.0.post83-generic",
"2023-09-28 10:08:42,871 [INFO] Kernel package version prior '5.15.0.post83': 5.15.0.post79",
"2023-09-28 10:08:42,872 [INFO] No kernel packages after version '5.15.0.post83' found.",
"2023-09-28 10:08:42,872 [INFO] Kernel package versions to remove: 5.15.0.post79",
"2023-09-28 10:08:42,872 [DEBUG] The following packages are candidates for autoremoval: linux-headers-5.15.0-79-generic, linux-hwe-5.15-headers-5.15.0-79,linux-image-5.15.0-79-generic, linux-modules-5.15.0-79-generic, linux-modules-extra-5.15.0-79-generic",
"2023-09-28 10:08:45,338 [DEBUG] The following packages are resolved reverse dependencies for autoremove candidates: linux-modules-5.15.0-79-generic, linux-modules-extra-5.15.0-79-generic, linux-hwe-5.15-headers-5.15.0-79, linux-headers-5.15.0-79-generic, linux-image-5.15.0-79-generic",
"2023-09-28 10:08:45,338 [INFO] No protected packages found",
"2023-09-28 10:08:45,339 [INFO] Exiting successfully"]}
TASK [Kernel packages to remove]
ok: [localhost] => {"cleanup_kernels.packages": ["linux-headers-5.15.0-79-generic",
"linux-hwe-5.15-headers-5.15.0-79",
"linux-image-5.15.0-79-generic",
"linux-modules-5.15.0-79-generic",
"linux-modules-extra-5.15.0-79-generic"]}
TASK [Remove kernel packages] ****************
skipping: [localhost]
If you decide to proceed with removal of package candidates, rerun
the script with the --cleanup flag:
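For example, combining the script name and flag mentioned above:
cleanup-kernel-packages --cleanup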
This section describes operations required during configuration of the
operating system installed on bare metal hosts of a MOSK
cluster.
Caution
Due to the known issue 49678
addressed in MOSK 25.1, the HostOSConfiguration object may not work
as expected after migration to containerd. For details, see the issue
description.
Available since MCC 2.26.0 (17.1.0 and 16.1.0) TechPreview
Important
The cloud operator takes all risks and responsibility for module
execution on cluster machines. For any questions, contact Mirantis support.
Caution
Due to the known issue 49678
addressed in MOSK 25.1, the HostOSConfiguration object may not work
as expected after migration to containerd. For details, see the issue
description.
The day-2 operations API extends configuration management of baremetal-based
clusters and machines after initial deployment. The feature allows managing
the operating system of a bare metal host granularly using modules without
rebuilding the node from scratch. Such approach prevents workload evacuation
and significantly reduces configuration time.
The day-2 operations API does not limit the cloud operator’s ability to
configure machines in any way, making the operator responsible for day-2
adjustments.
This section provides guidelines for Container Cloud or custom modules that
are used by the HostOSConfiguration and HostOSConfigurationModules
custom resources designed for baremetal-based management and managed clusters.
Add the configuration of the Container Cloud or custom module to
an existing HostOSConfiguration (hoc) object or create a new
hoc object with the following details:
Add the required configuration details of the module.
Set the selector for machines to apply the configuration.
Publish the module in a repository from which the cloud
operator can fetch the module.
Share the module details with the cloud operator.
The following diagram illustrates the high-level overview of the day-2
operations API:
Global recommendations for implementation of custom modules
The following global recommendations are intended to help creators of modules
and cloud operators to work with the day-2 operations API for module
implementation and execution, in order to keep the cluster and machines
healthy and ensure safe and reliable cluster operability.
Module functionality is limited only by Ansible itself and by the playbook
rules of a particular Ansible version. However, Mirantis highly recommends
paying special attention to critical components of Container Cloud, some of
which are mentioned below, and not managing them by means of day-2 modules.
Important
The cloud operator takes all risks and responsibility for module
execution on cluster machines. For any questions, contact Mirantis support.
Do not restart Docker, containerd, and Kubernetes-related services.
Do not configure Docker and Kubernetes node labels.
Do not reconfigure or upgrade MKE.
Do not change the MKE bundle.
Do not reboot nodes using a day-2 module.
Do not change network configuration, especially on critical LCM and external
networks, so that they remain consistent with kaas-ipam objects.
Do not change iptables, especially for Docker, Kubernetes, and Calico rules.
Do not change partitions on the fly, especially the / and
/var/lib/docker ones.
Since Container Cloud 2.27.0 (Cluster releases 17.2.0 and 16.2.0), the
following Ansible versions are supported for Ubuntu 20.04 and 22.04:
Ansible 2.12.10 and Ansible 5.10.0-collection. Therefore, your custom modules
must be compatible with the corresponding Ansible versions provided for a
specific Cluster release, on which your cluster is based.
To verify the Ansible version in a specific Cluster release, refer to
Container Cloud Release notes: Cluster releases.
Use the Artifacts > System and MCR artifacts section of the corresponding
Cluster release. For example, for 17.3.0.
By default, any Ansible execution in Container Cloud uses
/etc/ansible/mcc.cfg as Ansible configuration. A custom module may require
a specific Ansible configuration that you can add using ansible.cfg in the
root folder of the module and package it in the module archive. Such
configuration has higher priority than the default one.
Treat a day-2 module as an Ansible module to control a limited set of system
resources related to one component, for example, a service or driver, so that
a module contains a very limited amount of tasks to set up that component.
For example, if you need to configure a service on a host, the module must
manage only package installation, related configuration files, and service
enablement. Do not implement the module so that it manages all tasks
required for the day-2 configuration of a host. Instead, split such
functionality into tasks (modules), each responsible for the management of a
single component. This makes it possible to re-apply (re-run) every module
separately in case of any changes.
Mirantis highly recommends using the following key principles during module
implementation:
Idempotency
Any module re-run with the same configuration values must lead to the same
result.
Granularity
The module must manage only one specific component on a host.
Reset action
The module must be able to revert changes introduced by the module, or
at least the module must be able to disable the component controller.
The Container Cloud LCM does not provide a way to revert a day-2 change due to
unpredictability of potential functionality of any module. Therefore, the
reset action must be implemented on the module level. For example,
the package or file state can be present or absent, a service can be
enabled or disabled. And these states must be controlled by the
configuration values.
Mirantis highly recommends verifying any Container Cloud or custom module on
one machine before applying it to all target machines. For the testing
procedure, see Test a custom or MOSK module after creation.
A custom module may require node reboot after execution.
Implement a custom module using the following options, so that it can notify
lcm-agent and Container Cloud controllers about the required reboot:
If a module installs a package that requires a host reboot, then the
/run/reboot-required and /var/run/reboot-required.pkgs files
are created automatically by the package manager. LCM Agent detects these
files and places information about the reboot reason in the LCMMachine
status.
A module can create the /run/reboot-required file on the node. You can
add the reason for reboot in one of the following files as plain text:
/run/day2/reboot-required (since Container Cloud 2.28.0, Cluster
releases 17.3.0 and 16.3.0)
/run/lcm/reboot-required (deprecated since Container Cloud 2.28.0)
This text is passed to the reboot reason in the LCMMachine status.
If the name field is absent, then the deprecation logic is applied to the
module with the same name, meaning that the example above effectively equals
to the following one:
Archive the file with the module package in the GZIP format.
Implement all playbooks for Ansible version used by a specific Cluster
release of your Container Cloud cluster. For example, in Cluster releases
16.2.0 and 17.2.0, Ansible collection 5.10.0 and Ansible core 2.12.10
are used.
To verify the Ansible version in a specific Cluster release, refer to
Container Cloud Release notes: Cluster releases.
Use the Artifacts > System and MCR artifacts section of the corresponding
Cluster release. For example, for 17.3.0.
Note
Mirantis recommends implementing each module in modular approach
avoiding a single module for everything. This ensures maintainability and
readability, as well as improves testing and debugging. For details, refer
to Global recommendations for implementation of custom modules.
The common structure of metadata.yaml is as follows:
name
Required. Name of the module.
version
Required. Version of the module.
docURL
Optional. URL to the module documentation.
description
Optional. Brief summary of the module, useful if the complete documentation
is too detailed.
playbook
Required. Path to the module playbook. Path must be related to the archive
root that is directory/playbook.yaml if directory is a directory in
the root of the archive.
valuesJsonSchema
Optional. Path to the JSON-validation schema of the module. The path must be
relative to the archive root, that is, directory/schema.json if
directory is a directory in the root of the archive.
deprecates
Optional. Available since Container Cloud 2.28.0 (Cluster releases 17.3.0
and 16.3.0). List of modules that are deprecated by the module.
For details, see Module deprecation.
supportedDistributions
Optional. Available since Container Cloud 2.28.0 (Cluster releases 17.3.0
and 16.3.0). List of operating system distributions that are supported by
the current module. An empty list means support of any distribution by the
current module.
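For illustration, a hypothetical metadata.yaml combining the required and optional fields above might look as follows; all names and paths are placeholders:
name: sample-module
version: 1.0.0
description: Configures a sample service on a host
docURL: https://example.com/sample-module
playbook: sample-module/playbook.yaml
valuesJsonSchema: sample-module/schema.json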
An archive of a module does not require an inventory because the inventory
is generated by lcm-controller while processing configurations.
The format of the generated inventory is as follows:
all:
  hosts:
    localhost:
      ansible_connection: local
  vars:
    values:
      {{- range $key, $value := .Values }}
      {{ $key }}: {{ $value }}
      {{- end }}
Available since MCC 2.27.0 (17.2.0 and 16.2.0) TechPreview
MOSK provides several configuration modules that use the
designated hocm object named mcc-modules. All other hocm objects
contain custom modules. For configuration modules provided by Mirantis, refer
to host-os-modules documentation.
Warning
Do not modify the mcc-modules object that contains only
Mirantis-provided modules. Any changes to this object will be overwritten
with data from an external source.
HostOSConfiguration and HostOSConfigurationModules concepts
Available since MCC 2.26.0 (17.1.0 and 16.1.0) TechPreview
This section outlines fundamental concepts of the HostOSConfiguration, aka
hoc, and HostOSConfigurationModules, aka hocm, custom resources
as well as provides usage guidelines for these resources. For detailed
descriptions of these resources, see Container Cloud API Reference: Bare metal
resources.
MOSK provides modules, which are described in
host-os-modules documentation, using the designated hocm object named
mcc-modules. All other hocm objects contain custom modules.
Warning
Do not modify the mcc-modules object that contains only
Mirantis-provided modules. Any changes to this object will be overwritten
with data from an external source.
When the value of the machineSelector field in a hoc object is empty
(by default), no machines are selected. Therefore, no actions are triggered
until you provide a non-empty machineSelector.
This approach differs from the default behavior of Kubernetes selectors
to ensure that none of configurations are applied to all machines in a cluster
accidentally.
It is crucial to ensure that the namespace of a hoc object is the same as
the namespace of the associated Machine objects defined in the
machineSelector field.
For example, the following machines are located in two separate namespaces,
default and other-ns, and the hoc object is located in
other-ns:
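A schematic of such a layout, using hypothetical machine names consistent with the explanation below:
# Namespace default: worker-3, worker-4, both labeled example-label: "1"
# Namespace other-ns: worker-0, worker-1, worker-2, all labeled example-label: "1",
# plus the hoc object with machineSelector matching example-label: "1"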
Although machineSelector in the hoc object contains
example-label="1", which is set for machines in both namespaces, only
worker-0, worker-1, and worker-2 will be selected because the hoc
object is located in the other-ns namespace.
You may use arbitrary types for primitive (non-nested) values. But for optimal
compatibility and clarity, Mirantis recommends using string values for
primitives in the values section of a hoc object. This practice helps
maintain consistency and simplifies the interpretation of configurations.
Under the hood, all primitive values are converted to strings.
You can pass the values of any day-2 module to the HostOSConfiguration
object using both the values and secretValues fields simultaneously.
But if a key is present in both fields, the value from secretValues
is applied.
The values field supports the YAML format for values with any nesting
level. The HostOSConfiguration controller and provider use the YAML parser
underneath to manage the values. The following examples illustrate simple and
nested configuration formats:
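For example, both of the following forms are valid; all parameter names below are hypothetical:
# Simple (flat) values
values:
  maxConnections: "1024"
  enableFeature: "true"
# Nested values
values:
  sysctl:
    vm:
      swappiness: "10"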
The secretValues field is a reference (namespace and name) to the
Secret object.
Warning
The referenced Secret object must contain only
primitive non-nested values. Otherwise, the values will not be applied
correctly. Therefore, implement your custom modules in a way that secret
parameters are on the top level and not used within nested module
parameters.
You can create a Secret object in the YAML format. For example:
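A minimal sketch with a hypothetical name and key and a base64-encoded value:
apiVersion: v1
kind: Secret
metadata:
  name: day2-secret-values
  namespace: default
type: Opaque
data:
  password: c2VjcmV0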
Manually encode secret values using the base64 format and ensure
that the value does not contain trailing whitespaces or newline characters
such as the \n symbol. For example:
echo -n "secret" | base64
You can also create the Secret object using the kubectl command.
This way, the secret values are automatically base64-encoded:
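For example, using standard kubectl syntax with placeholder names:
kubectl -n default create secret generic day2-secret-values --from-literal=password=secret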
Available since MCC 2.26.0 (17.1.0 and 16.1.0) TechPreview
This section describes integrations between the HostOSConfiguration
custom resource, aka hoc, the HostOSConfigurationModules custom resource,
aka hocm, LCMCluster, and LCMMachine.
The implementation of the internal API used by day-2 operations utilizes
the current approach of StateItems, including the way how they are
processed and passed to lcm-agent.
The workflow of the internal API implementation is as follows:
Create a set of StateItem entries in LCMCluster taking into account
all hoc objects in the namespace of LCMCluster.
Fill out StateItems for each LCMMachine that was selected by the
machineSelector field value of a hoc object.
Pass StateItems to lcm-agent that is responsible for their execution
on nodes.
The machineSelector field selects Machine objects, but they map to
LCMMachine objects in 1-1 relation. This way, each selected Machine
exactly maps to a relevant LCMMachine object.
LCMCluster utilizes empty StateItem to establish
a baseline connection between the hoc, LCMMachine objects
and lcm-agent on nodes. These empty items have no parameters and
serve as placeholders, providing a template for further processing.
To identify items added from hoc objects, these StateItems along with
other state items of an LCMCluster object are located in the
.spec.machinesTypes.control and .spec.machinesTypes.worker blocks
with the following fields in an LCMCluster object:
params is absent
phase is reconfigure as the only supported value
version is v1 as the only supported value
runner can be either downloader or ansible:
downloader downloads the
package of a module of the provided version
into machine.
ansible executes the module on the machine with provided values.
name has the following patterns:
host-os-<hocObjectName>-<moduleName>-<moduleVersion>-<modulePhase>
if the runner field has the ansible value set
host-os-download-<hocObjectName>-<moduleName>-<moduleVersion>-<modulePhase> if the runner field has the downloader value set.
The following example of an LCMCluster object illustrates empty
StateItems for the following configuration:
Machine type - worker
hoc object name - test with a single entry in the configs field
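A sketch of how such empty items might look, assembled from the field descriptions above; sample-module and its version are placeholders, and <modulePhase> is left as is:
spec:
  machinesTypes:
    worker:
    - name: host-os-download-test-sample-module-1.0.0-<modulePhase>
      runner: downloader
      phase: reconfigure
      version: v1
    - name: host-os-test-sample-module-1.0.0-<modulePhase>
      runner: ansible
      phase: reconfigure
      version: v1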
To properly execute the StateItem list according to given configurations
from a hoc object, the implementation utilizes the
.spec.stateItemsOverwrites field in an LCMMachine object.
For each state item that corresponds to a hoc object selected for current
machine, each entry of the stateItemsOverwrites field dictionary is filled
in with key-value pairs:
Key is a StateItem name
Value is a set of parameters from the module configuration values that will
be passed as parameters to StateItem.
After the stateItemsOverwrites field is updated, the corresponding
StateItem entries are filled out with values from the
stateItemsOverwrites.
Once the StateItem list is updated, it is passed to lcm-agent to be
finally applied on nodes.
The following example of an LCMMachine object illustrates the
stateItemsOverwrites field having a hoc object with a single entry
in the configs field, configuring a module named sample-module with
version 1.0.0:
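A sketch based on the description above; the parameter names under the item key are hypothetical module values:
spec:
  stateItemsOverwrites:
    host-os-test-sample-module-1.0.0-<modulePhase>:
      param1: "10"
      param2: "enabled"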
HostOSConfiguration processing by baremetal-provider
While processing the hoc object, baremetal-provider verifies the
hoc resource for both controlled LCMCluster and LCMMachine
resources.
Each change to a hoc object immediately triggers its resources if
host-os-modules-controller has successfully validated changes.
This behavior enables updates to existing LCMCluster and LCMMachine
objects described in the sections above. Thus, all empty StateItems,
overwrites, and filled out StateItems appear almost instantly.
This behavior also applies when removing a hoc object, thereby cleaning
everything related to the object. The object deletion is suspended until the
corresponding StateItems of a particular LCMMachine object are cleaned
up from the object status field.
Warning
A configuration that is already applied using the deleted hoc
object will not be reverted from nodes, because the feature does not provide
rollback mechanism. For module implementation details, refer to
Global recommendations for implementation of custom modules.
Do not modify the mcc-modules object that contains only
Mirantis-provided modules. Any changes to this object will be overwritten
with data from an external source.
To add a custom module to a MOSK deployment:
If you use a proxy on the management and/or managed cluster, ensure that
the custom module can be downloaded through that proxy, or that the domain
address of the module URL is added to the NO_PROXY value of the related
Proxy objects.
This way, the HostOSConfiguration Controller can download and verify
the module and its input parameters on the management cluster. After that,
the LCM Agent can download the module to any cluster machines for execution.
In the hocm object, set the name and version fields to the same
values as the corresponding fields in metadata.yaml of the module
archive. For details, see Metadata file format.
After you add a custom module to a Container Cloud deployment, the process of
fetching a module archive involves the following automatic steps:
Retrieve the .tgz archive of the module and unpack it into a temporary
directory.
Retrieve the metadata.yaml file and
validate its contents. Once done, the status of the module in the hocm
object reflects whether the archive fetching and validating succeeded or
failed.
The validation process includes the following verifications:
Validate that the SHA256 hash sum of the archive equals the value defined
in the sha256sum field.
Validate that the playbook key is present.
Validate that the file defined in the playbook key value exists in the
archive and has a non-zero length.
Validate that the name and version values from metadata.yaml
equal the corresponding fields in the hocm object.
If the valuesJsonSchema key is defined, validate that the file from the
key value exists in the archive and has a non-zero length.
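For reference, a hypothetical metadata.yaml that satisfies the above verifications might look as follows; all names and file paths are illustrative:

name: sample-module
version: 1.0.0
playbook: main.yaml            # must exist in the archive and have a non-zero length
valuesJsonSchema: schema.json  # optional; if defined, must exist in the archive and have a non-zero length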
Available since MCC 2.26.0 (17.1.0 and 16.1.0). TechPreview
Important
The cloud operator takes all risks and responsibility for module
execution on cluster machines. For any questions, contact Mirantis support.
After you create a custom module or configure an existing MOSK module,
verify it on one machine before applying it to all target machines.
This approach ensures safe and reliable cluster operability.
Verify that the status field of modules execution is healthy, validate
logs, and verify that the machine is in the ready state.
If the execution result meets your expectations, continue applying
HostOSConfiguration on other machines using one of the following
options:
Use the same HostOSConfiguration object:
Change the matchLabels value in the machineSelector field to
match all target machines.
Assign the labels from the matchLabels value to other target
machines.
Create a new HostOSConfiguration object.
Note
Mirantis highly recommends using specific custom labels on machines
and in the HostOSConfiguration selector, so that HostOSConfiguration
is applied only to the machines with the specific custom label.
There is no API to reexecute the same successfully applied module configuration
upon user request. Once executed, the same configuration is never executed again
until one of the following actions is taken on the hoc object:
Change the module-related values of the configs field list
Change the data of the Secret object referenced by the module-related
secretValues of the configs field list
To retrigger exactly the same configuration for a module, select one of the
following options:
Reapply machineSelector:
Save the current selector value.
Update the selector to match no machines (empty value) or those machines
where configuration should not be reapplied.
Update the selector to the previously saved value.
Re-create the hoc object:
Dump the whole hoc object.
Remove the hoc object.
Reapply the hoc object from the dump.
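A possible sequence of commands for the re-creation option, assuming the HostOSConfiguration resource is exposed as hostosconfigurations and <hoc-name> is the object name:

kubectl get hostosconfigurations <hoc-name> -o yaml > hoc-dump.yaml
kubectl delete hostosconfigurations <hoc-name>
kubectl create -f hoc-dump.yaml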
Caution
The above steps retrigger all configuration from the configs
field of the hoc object. To avoid such behavior, Mirantis recommends
the following procedure:
Copy a particular module configuration to a new hoc object and remove
the previous machineSelector field.
Remove this configuration from the original hoc object.
Add the required values to the machineSelector field in the new
object.
This section describes possible issues you may encounter while working with
day-2 operations, as well as approaches to addressing them.
Troubleshoot the HostOSConfigurationModules object¶
In .status.modules, verify whether all modules have been loaded and
verified successfully. Each module must have the available value in the
state field. If not, the error field contains the reason for the issue.
Example of different erroneous states in a hocm object:
status:
  modules:
  # error state: hashes mismatched
  - error: 'hashes are not the same: got ''d78352e51792bbe64e573b841d12f54af089923c73bc185bac2dc5d0e6be84cd''
      want ''c726ab9dfbfae1d1ed651bdedd0f8b99af589e35cb6c07167ce0ac6c970129ac'''
    name: sysctl
    sha256sum: d78352e51792bbe64e573b841d12f54af089923c73bc185bac2dc5d0e6be84cd
    state: error
    url: <url-to-package>
    version: 1.0.0
  # error state: an archive is not available because of misconfigured proxy
  - error: 'failed to perform request to fetch the module archive: Get "<url-to-package>": Forbidden'
    name: custom-module
    state: error
    url: <url-to-package>
    version: 0.0.1
  # successfully loaded and verified module
  - description: Module for package installation
    docURL: https://docs.mirantis.com
    name: package
    playbookName: main.yaml
    sha256sum: 2c7c91206ce7a81a90e0068cd4ce7ca05eab36c4da1893555824b5ab82c7cc0e
    state: available
    url: <url-to-package>
    valuesValidationSchema: <gzip+base64 encoded data>
    version: 1.0.0
If a module is in the error state, it might affect the corresponding
hoc object that contains the module configuration.
Example of erroneous status in a hoc object:
status:
  configs:
  - moduleName: sysctl
    moduleVersion: 1.0.0
    modulesReference: mcc-modules
    error: module is not found or not verified in any HostOSConfigurationModules object
To resolve an issue described in the error field:
Address the root cause. For example, ensure that a package has the correct
hash sum, or adjust the proxy configuration to fetch the package, and so on.
Recreate the hocm object with correct settings.
Setting syncPeriod for debug sessions
During test or debug sessions where errors are inevitable, you can set a
reasonable sync period for host-os-modules-controller to avoid manual
recreation of hocm objects.
To enable the option, set the syncPeriod parameter in the
spec:providerSpec:value:kaas:regional:helmReleases: section of the
management Cluster object:
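A sketch of such a configuration; the exact nesting under the regional section and the duration format of syncPeriod are assumptions, so verify them against your management Cluster object:

spec:
  providerSpec:
    value:
      kaas:
        regional:
        - provider: baremetal            # assumption: the bare metal provider entry
          helmReleases:
          - name: host-os-modules-controller
            values:
              syncPeriod: 10m            # illustrative value for a debug session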
Verify that the corresponding LCMCluster object has all related
StateItems.
Verify that all selected LCMMachines have the
.spec.stateItemsOverwrites field,
in which all StateItems from the previous step are present.
Verify that all StateItems from the previous step have been successfully
processed by lcm-agent. Otherwise, a manual intervention is required.
To address an issue with a specific StateItem for which the lcm-agent
is reporting an error, log in to the corresponding node and
inspect Ansible execution logs:
ssh -i <path-to-ssh-key> mcc-user@<ip-addr-of-the-node>
sudo -i
cd /var/log/lcm/runners/
# from 2 directories, select the one
# with subdirectories having 'host-os-' prefix
cd <selected-dir>/<name-of-the-erroneous-state-item>
less <logs-file>
After the inspection, either resolve the issue manually or escalate the issue
to Mirantis support.
The day-2 operations API allows enabling debug-level logging, which is
integrated into the baremetal-provider controller and
host-os-modules-controller. Both may be helpful during debug sessions.
To enable log debugging in host-os-modules-controller, add the following
snippet to the Cluster object:
Migrate container runtime from Docker to containerd¶
Available since 2.28.4 (Cluster releases 17.3.4 and 16.3.4)
Caution
Due to the known issue 49678,
the HostOSConfiguration object may not work as expected after migration
to containerd. For details, see the issue description.
Migration of container runtime from Docker to containerd is implemented for
existing management and managed clusters. The use of containerd allows for
better Kubernetes performance and component update without pod restart when
applying fixes for CVEs.
Note
On greenfield deployments, containerd is the default container
runtime since Container Cloud 2.29.0 and MOSK 25.1.
Before that, Docker remains the default option.
Before the container runtime migration, consider the following precautions:
The migration involves machine cordoning and draining.
Cluster update is not allowed during migration to prevent machines from
running different container runtimes. However, you can still scale clusters
and replace nodes as required.
The migration is mandatory during the scope of Container Cloud 2.29.x.
Otherwise, the management cluster update to Container Cloud 2.30.0 will be
blocked.
Note
If you have not upgraded the operating system distribution on your
machines to Jammy yet, Mirantis recommends migrating machines from Docker
to containerd on managed clusters together with distribution upgrade to
minimize the maintenance window.
In this case, ensure that all cluster machines are updated at once during
the same maintenance window to prevent machines from running different
container runtimes.
You can schedule more than one machine for migration at the same
time. In this case, the process is automatically orchestrated without
service interruption.
In the metadata.annotations section, add the following annotation to
trigger migration to containerd runtime:
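A sketch of the annotation, assuming it is added to the Machine object of the target node; the annotation key and values are taken from the rollback note below:

metadata:
  annotations:
    kaas.mirantis.com/preferred-container-runtime: containerd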
If an emergency related to containerd occurs on workloads before
migration is complete on all machines, you can temporarily roll back from
containerd to Docker. Use the procedure above, changing the
kaas.mirantis.com/preferred-container-runtime annotation from
containerd to docker.
Change a user name and password for a bare metal host¶
This section describes how to change a user name and password of a bare metal
host using an existing BareMetalHostCredential object.
To change a user name and password for a bare metal host:
Open the BareMetalHostCredential object of the required bare metal
host for editing.
In the spec section:
Update the username field
Replace password.name:<secretName> with
password.value:<hostPasswordInPlainText>
Adding a password value is mandatory for a user name change.
You can either create a new password value or copy the existing one
from the related Secret object.
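A minimal sketch of the resulting spec section, with placeholder values:

spec:
  username: <newUserName>
  password:
    value: <hostPasswordInPlainText>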
Caution
Changing a user name in the related Secret object does
not automatically update the BareMetalHostCredential object.
Therefore, Mirantis recommends updating credentials only using
the BareMetalHostCredential object.
Warning
The kubectl apply command automatically saves the
applied data as plain text into the
kubectl.kubernetes.io/last-applied-configuration annotation of the
corresponding object. This may result in revealing sensitive data in this
annotation when creating or modifying the object.
Therefore, do not use kubectl apply on this object.
Use kubectl create, kubectl patch, or
kubectl edit instead.
If you used kubectl apply on this object, you
can remove the kubectl.kubernetes.io/last-applied-configuration
annotation from the object using kubectl edit.
You can use the Container Cloud API to restart an inspection of a bare metal
host in MOSK clusters. For example, this procedure is useful
when hardware was changed. This works for bare metal hosts that were not
provisioned yet or were successfully deprovisioned.
The workflow of the reinspection procedure described above is as follows:
Ensure that the BareMetalHostInventory object is not bound to any
Machine object and it is in the available state.
Edit the BareMetalHostInventory object to initiate an inspection of the
bare metal server that hosts the node.
Note
Before update of the management cluster to Container Cloud 2.29.0
(Cluster release 16.4.0), instead of BareMetalHostInventory, use the
BareMetalHost object. For details, see Container Cloud API Reference:
BareMetalHost resource.
Caution
While the Cluster release of the management cluster is 16.4.0,
BareMetalHostInventory operations are allowed to
m:kaas@management-admin only. This limitation is lifted once the
management cluster is updated to the Cluster release 16.4.1 or later.
To restart the inspection of a bare metal host:
Using kubeconfig of the management cluster, access the Container Cloud
API and inspect the BareMetalHost object:
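For example, a command similar to the following (assuming the baremetalhosts resource in the cluster project namespace) returns output like the sample below:

kubectl -n <project-name> get baremetalhosts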
NAME                              STATE       CONSUMER                          ONLINE   ERROR   AGE
...
managed-worker-a-storage-worker   preparing   managed-worker-a-storage-worker   false            5d3h
managed-worker-b-storage-worker   preparing   managed-worker-b-storage-worker   false            5d3h
managed-worker-c-storage-worker   available                                     false            5d3h
In the system response above, the managed-worker-c-storage-worker bare
metal host is in the available state and has no consumer (not bound
to any Machine). Therefore, you can reinspect it.
Open the required bare metal host object for editing:
Since the management cluster update to 16.4.0 (MCC 2.29.0)
You can use the Container Cloud API to restart a bare metal host in
Mirantis OpenStack for Kubernetes clusters. The workflow of the host restart
is as follows:
Set the maintenance mode on the cluster that contains the target node.
Set the maintenance mode on the target node for OpenStack and Container
Cloud to drain it from workloads. No new workloads will be provisioned to a
host in the maintenance mode.
Use the bare metal host object to initiate a hard reboot of the bare metal
server that hosts the node.
To restart a bare metal host:
Using kubeconfig of the Container Cloud management cluster, access the
Container Cloud API and open the Cluster object for editing:
kubectl -n <project-name> edit cluster <cluster-name>
Add the following field to the spec section to set the maintenance mode
on the cluster:
spec:
  providerSpec:
    value:
      maintenance: true
Verify that the Cluster object status for Maintenance is
ready:true:
While the Cluster release of the management cluster is 16.4.0,
BareMetalHostInventory operations are allowed to
m:kaas@management-admin only. This limitation is lifted once the
management cluster is updated to the Cluster release 16.4.1 or later.
Before the management cluster update to 16.4.0 (MCC 2.29.0)
This section describes how to replace a failed manager node in your
MOSK deployment. The procedure applies to the manager
nodes that are, for example, permanently failed due to a hardware failure and
remain in the NotReady state.
Modify network configuration on an existing machine¶
TechPreview
Caution
Modification of L2 templates in use is only allowed with a
mandatory validation step from the infrastructure operator to prevent
accidental cluster failures due to unsafe changes. The list of risks posed
by modifying L2 templates includes:
Services running on hosts cannot reconfigure automatically to switch to
the new IP addresses and/or interfaces.
Connections between services are interrupted unexpectedly, which can cause
data loss.
Incorrect configurations on hosts can lead to irrevocable loss of
connectivity between services and unexpected cluster partition or
disassembly.
Warning
Netplan does not handle arbitrary configuration changes. For
details, see Netplan documentation.
To modify network configuration of an existing machine, you need to
create a new L2 template and change the assignment of the template for that
particular machine.
Warning
When a new network configuration is applied on nodes, the corresponding
nodes are drained sequentially and LCM is re-run on them, in the same way
as during a cluster update.
The following fields of the ipamHost status are renamed since
MOSK 23.1 in the scope of the L2Template and IpamHost objects
refactoring:
netconfigV2 to netconfigCandidate
netconfigV2state to netconfigCandidateState
netconfigFilesState to netconfigFilesStates (per file)
No user actions are required after renaming.
The format of netconfigFilesState changed after renaming. The
netconfigFilesStates field contains a dictionary of statuses of network
configuration files stored in netconfigFiles. The dictionary contains
the keys that are file paths and values that have the same meaning for each
file that netconfigFilesState had:
For a successfully rendered configuration file:
OK:<timestamp><sha256-hash-of-rendered-file>, where a timestamp
is in the RFC 3339 format.
For a failed rendering: ERR:<error-message>.
If the configuration is valid:
The netconfigCandidate field contains the Netplan configuration
file candidate rendered using the modified objects
The netconfigCandidateState and netconfigFilesStates fields
have the OK status
The netconfigFilesStates field contains the old date and checksum
meaning that the effective Netplan configuration is still based on the
previous versions of the modified objects
The messages field may contain some warnings but no errors
If the L2 template rendering fails, the candidate for Netplan
configuration is empty and its netconfigCandidateState status contains
an error message. A broken candidate for Netplan configuration cannot be
approved and become the effective Netplan configuration.
Warning
Do not proceed to the next step until you make sure that the
netconfigCandidate field contains the valid configuration and this
configuration meets your expectations.
Approve the new network configuration for the related IpamHost objects:
Once applied, the new configuration is copied to the netconfigFiles
field of the effective Netplan configuration, then copied to the
corresponding LCMMachine objects.
Verify the statuses of the updated IpamHost objects:
The new configuration is copied to the effective Netplan configuration and
both configurations are valid when:
The netconfigCandidateState and netconfigFilesStates fields have
the OK status and the same checksum
In the output of the above command, hash sums contained in the
bm_ipam_netconfig_files values must match those in the
IpamHost.status.netconfigFilesStates output. If so, the new
configuration is copied to LCMMachine objects.
Monitor the update operations that start on nodes. For details, see
Verify machine status.
Create Subnet objects for the following networks: LCM, workload,
tenant, and Ceph (where applicable).
Create a new L2 template that nodes in a new rack will use.
In this template, configure the external network to be either stretched
between racks or connected to the first rack only.
Caution
API/LCM network is the first rack LCM network in our example,
since a single-rack MOSK cluster was deployed first.
Therefore, only the first rack can contain Kubernetes master nodes that
provide access to Kubernetes API.
In the L2 template for the first rack, add IP routes pointing to
the networks in the new rack.
The following examples contain:
The modified L2 template for the first rack. Routes added to the second rack
are highlighted.
The new L2 template for the second rack with external network that is
stretched between racks. The IP gateway in the external network is used
as the default route on the nodes of the second rack.
Example of a modified L2 template for the first rack with routes
to the second rack
l3Layout:
  - subnetName: kaas-mgmt
    scope: global
    labelSelector:
      kaas.mirantis.com/provider: baremetal
      kaas-mgmt-subnet: ""
  - subnetName: k8s-lcm
    scope: namespace
  - subnetName: k8s-ext-ipam
    scope: namespace
  - subnetName: tenant
    scope: namespace
  - subnetName: k8s-pods
    scope: namespace
  - subnetName: ceph-front
    scope: namespace
  - subnetName: ceph-back
    scope: namespace
  - subnetName: k8s-lcm-rack2
    scope: namespace
  - subnetName: tenant-rack2
    scope: namespace
  - subnetName: k8s-pods-rack2
    scope: namespace
  - subnetName: ceph-front-rack2
    scope: namespace
  - subnetName: ceph-back-rack2
    scope: namespace
npTemplate: |-
  version: 2
  ethernets:
    {{nic 0}}:
      dhcp4: false
      dhcp6: false
      match:
        macaddress: {{mac 0}}
      set-name: {{nic 0}}
      mtu: 1500
    {{nic 1}}:
      dhcp4: false
      dhcp6: false
      match:
        macaddress: {{mac 1}}
      set-name: {{nic 1}}
      mtu: 1500
    {{nic 2}}:
      dhcp4: false
      dhcp6: false
      match:
        macaddress: {{mac 2}}
      set-name: {{nic 2}}
      mtu: 9050
    {{nic 3}}:
      dhcp4: false
      dhcp6: false
      match:
        macaddress: {{mac 3}}
      set-name: {{nic 3}}
      mtu: 9050
  bonds:
    bond0:
      interfaces:
        - {{nic 0}}
        - {{nic 1}}
      parameters:
        mode: 802.3ad
        transmit-hash-policy: layer3+4
      mtu: 1500
    bond1:
      interfaces:
        - {{nic 2}}
        - {{nic 3}}
      parameters:
        mode: 802.3ad
        transmit-hash-policy: layer3+4
      mtu: 9050
  vlans:
    k8s-lcm-v:
      id: 738
      link: bond0
    k8s-pod-v:
      id: 731
      link: bond1
      mtu: 9000
    k8s-ext-v:
      id: 736
      link: bond1
      mtu: 9000
    tenant-vlan:
      id: 732
      link: bond1
      addresses:
        - {{ip "tenant-vlan:tenant"}}
      routes:
        # to 2nd rack of MOSK cluster
        - to: {{cidr_from_subnet "tenant-rack2"}}
          via: {{gateway_from_subnet "tenant"}}
      mtu: 9050
    ceph-front-v:
      id: 733
      link: bond1
      addresses:
        - {{ip "ceph-front-v:ceph-front"}}
      routes:
        # to 2nd rack of MOSK cluster
        - to: {{cidr_from_subnet "ceph-front-rack2"}}
          via: {{gateway_from_subnet "ceph-front"}}
      mtu: 9000
    ceph-back-v:
      id: 734
      link: bond1
      addresses:
        - {{ip "ceph-back-v:ceph-back"}}
      routes:
        # to 2nd rack of MOSK cluster
        - to: {{cidr_from_subnet "ceph-back-rack2"}}
          via: {{gateway_from_subnet "ceph-back"}}
      mtu: 9000
  bridges:
    k8s-lcm:
      interfaces: [k8s-lcm-v]
      addresses:
        - {{ip "k8s-lcm:k8s-lcm"}}
      nameservers:
        addresses: {{nameservers_from_subnet "k8s-lcm"}}
      routes:
        # to management network of Container Cloud cluster
        - to: {{cidr_from_subnet "kaas-mgmt"}}
          via: {{gateway_from_subnet "k8s-lcm"}}
          table: 101
        # fips network
        - to: 10.159.156.0/22
          via: {{gateway_from_subnet "k8s-lcm"}}
          table: 101
        # to 2nd rack of MOSK cluster
        - to: {{cidr_from_subnet "k8s-lcm-rack2"}}
          via: {{gateway_from_subnet "k8s-lcm"}}
          table: 101
      routing-policy:
        - from: {{cidr_from_subnet "k8s-lcm"}}
          table: 101
    k8s-pods:
      interfaces: [k8s-pod-v]
      addresses:
        - {{ip "k8s-pods:k8s-pods"}}
      routes:
        # to 2nd rack of MOSK cluster
        - to: {{cidr_from_subnet "k8s-pods-rack2"}}
          via: {{gateway_from_subnet "k8s-pods"}}
      mtu: 9000
    k8s-ext:
      interfaces: [k8s-ext-v]
      addresses:
        - {{ip "k8s-ext:k8s-ext-ipam"}}
      gateway4: {{gateway_from_subnet "k8s-ext-ipam"}}
      nameservers:
        addresses: {{nameservers_from_subnet "k8s-ext-ipam"}}
      mtu: 9000
    ## FIP Bridge
    br-fip:
      interfaces: [bond1]
      mtu: 9050
Example of a new L2 template for the second rack with external
network
l3Layout:
  - subnetName: kaas-mgmt
    scope: global
    labelSelector:
      kaas.mirantis.com/provider: baremetal
      kaas-mgmt-subnet: ""
  - subnetName: k8s-lcm
    scope: namespace
  - subnetName: k8s-ext-ipam
    scope: namespace
  - subnetName: tenant
    scope: namespace
  - subnetName: k8s-pods
    scope: namespace
  - subnetName: ceph-front
    scope: namespace
  - subnetName: ceph-back
    scope: namespace
  - subnetName: k8s-lcm-rack2
    scope: namespace
  - subnetName: tenant-rack2
    scope: namespace
  - subnetName: k8s-pods-rack2
    scope: namespace
  - subnetName: ceph-front-rack2
    scope: namespace
  - subnetName: ceph-back-rack2
    scope: namespace
npTemplate: |-
  version: 2
  ethernets:
    {{nic 0}}:
      dhcp4: false
      dhcp6: false
      match:
        macaddress: {{mac 0}}
      set-name: {{nic 0}}
      mtu: 1500
    {{nic 1}}:
      dhcp4: false
      dhcp6: false
      match:
        macaddress: {{mac 1}}
      set-name: {{nic 1}}
      mtu: 1500
    {{nic 2}}:
      dhcp4: false
      dhcp6: false
      match:
        macaddress: {{mac 2}}
      set-name: {{nic 2}}
      mtu: 9050
    {{nic 3}}:
      dhcp4: false
      dhcp6: false
      match:
        macaddress: {{mac 3}}
      set-name: {{nic 3}}
      mtu: 9050
  bonds:
    bond0:
      interfaces:
        - {{nic 0}}
        - {{nic 1}}
      parameters:
        mode: 802.3ad
        transmit-hash-policy: layer3+4
      mtu: 1500
    bond1:
      interfaces:
        - {{nic 2}}
        - {{nic 3}}
      parameters:
        mode: 802.3ad
        transmit-hash-policy: layer3+4
      mtu: 9050
  vlans:
    k8s-lcm-v:
      id: 738
      link: bond0
    k8s-pod-v:
      id: 731
      link: bond1
      mtu: 9000
    k8s-ext-v:
      id: 736
      link: bond1
      mtu: 9000
    tenant-vlan:
      id: 732
      link: bond1
      addresses:
        - {{ip "tenant-vlan:tenant-rack2"}}
      routes:
        # to 1st rack of MOSK cluster
        - to: {{cidr_from_subnet "tenant"}}
          via: {{gateway_from_subnet "tenant-rack2"}}
      mtu: 9050
    ceph-front-v:
      id: 733
      link: bond1
      addresses:
        - {{ip "ceph-front-v:ceph-front-rack2"}}
      routes:
        # to 1st rack of MOSK cluster
        - to: {{cidr_from_subnet "ceph-front"}}
          via: {{gateway_from_subnet "ceph-front-rack2"}}
      mtu: 9000
    ceph-back-v:
      id: 734
      link: bond1
      addresses:
        - {{ip "ceph-back-v:ceph-back-rack2"}}
      routes:
        # to 1st rack of MOSK cluster
        - to: {{cidr_from_subnet "ceph-back"}}
          via: {{gateway_from_subnet "ceph-back-rack2"}}
      mtu: 9000
  bridges:
    k8s-lcm:
      interfaces: [k8s-lcm-v]
      addresses:
        - {{ip "k8s-lcm:k8s-lcm-rack2"}}
      nameservers:
        addresses: {{nameservers_from_subnet "k8s-lcm-rack2"}}
      routes:
        # to management network of Container Cloud cluster
        - to: {{cidr_from_subnet "kaas-mgmt"}}
          via: {{gateway_from_subnet "k8s-lcm-rack2"}}
          table: 101
        # fips network
        - to: 10.159.156.0/22
          via: {{gateway_from_subnet "k8s-lcm-rack2"}}
          table: 101
        # to API/LCM network of MOSK cluster
        - to: {{cidr_from_subnet "k8s-lcm"}}
          via: {{gateway_from_subnet "k8s-lcm-rack2"}}
          table: 101
      routing-policy:
        - from: {{cidr_from_subnet "k8s-lcm-rack2"}}
          table: 101
    k8s-pods:
      interfaces: [k8s-pod-v]
      addresses:
        - {{ip "k8s-pods:k8s-pods-rack2"}}
      routes:
        # to 1st rack of MOSK cluster
        - to: {{cidr_from_subnet "k8s-pods"}}
          via: {{gateway_from_subnet "k8s-pods-rack2"}}
      mtu: 9000
    k8s-ext:
      interfaces: [k8s-ext-v]
      addresses:
        - {{ip "k8s-ext:k8s-ext-ipam"}}
      gateway4: {{gateway_from_subnet "k8s-ext-ipam"}}
      nameservers:
        addresses: {{nameservers_from_subnet "k8s-ext-ipam"}}
      mtu: 9000
    ## FIP Bridge
    br-fip:
      interfaces: [bond1]
      mtu: 9050
Expand IP addresses capacity in an existing cluster¶
If the subnet capacity on your existing cluster is not enough to add new
machines, use the l2TemplateSelector feature to expand the IP addresses
capacity:
Create new Subnet object(s) to define additional address ranges for new
machines.
Set up routing between the existing and new subnets.
Create new L2 template(s) with the new subnet(s) being used in l3Layout.
Set up l2TemplateSelector in the Machine objects for new machines.
To expand IP addresses capacity for an existing cluster:
Verify the capacity of the subnet(s) currently associated with
the L2 template(s) used for cluster deployment:
If labelSelector is not used for the given subnet, use the
namespace value of the L2 template and the subnetName value
from the l3Layout section:
kubectlgetsubnet-n<namespace><subnetName>
If labelSelector is used for the given subnet, use the namespace
value of the L2 template and comma-separated key-value pairs from the
labelSelector section for the given subnet in the l3Layout
section:
The kaas.mirantis.com/region label is removed from all
Container Cloud and MOSK objects in 24.1.
Therefore, do not add the label starting with these releases. On existing
clusters updated to these releases, or if added manually, Container Cloud
ignores this label.
Note
Before MOSK 23.3, an L2 template requires
clusterRef:<clusterName> in the spec section. Since MOSK 23.3,
this parameter is deprecated and automatically migrated to the
cluster.sigs.k8s.io/cluster-name:<clusterName> label.
Create new objects:
Subnet with the user-defined/purpose:lcm-additional label.
L2Template with the alternative-template:"1" label.
The L2 template should reference the new Subnet object using the
user-defined/purpose:lcm-additional label in the labelSelector
field, as shown in the sketch after the note below.
Note
The label name user-defined/purpose is used for illustration
purposes. Use any custom label name that differs from system names.
Use of a unique prefix such as user-defined/ is recommended.
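A sketch of the two new objects under the assumptions above; the apiVersion, metadata names, and CIDR value are illustrative, while new-lcm-network matches the subnet name referenced later in this procedure:

apiVersion: ipam.mirantis.com/v1alpha1   # assumption: IPAM API group of Subnet and L2Template
kind: Subnet
metadata:
  name: new-lcm-network
  namespace: <projectName>
  labels:
    user-defined/purpose: lcm-additional
spec:
  cidr: 10.100.1.0/24                    # illustrative additional LCM range
---
apiVersion: ipam.mirantis.com/v1alpha1
kind: L2Template
metadata:
  name: alternative-template
  namespace: <projectName>
  labels:
    alternative-template: "1"
    cluster.sigs.k8s.io/cluster-name: <clusterName>
spec:
  l3Layout:
    - subnetName: new-lcm-network
      scope: namespace
      labelSelector:
        user-defined/purpose: lcm-additional
  npTemplate: |-
    # netplan template that uses the new subnet, similar to the examples above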
The kaas.mirantis.com/region label is removed from all
Container Cloud and MOSK objects in 24.1.
Therefore, do not add the label starting with these releases. On existing
clusters updated to these releases, or if added manually, Container Cloud
ignores this label.
Note
Before MOSK 23.3, an L2 template requires
clusterRef:<clusterName> in the spec section. Since MOSK 23.3,
this parameter is deprecated and automatically migrated to the
cluster.sigs.k8s.io/cluster-name:<clusterName> label.
You can also reference the new Subnet object by using its name
in the l3Layout section of the alternative-template L2 template.
The kaas.mirantis.com/region label is removed from all
Container Cloud and MOSK objects in 24.1.
Therefore, do not add the label starting with these releases. On existing
clusters updated to these releases, or if added manually, Container Cloud
ignores this label.
After creation, the new machine will use the alternative
L2 template that uses the new-lcm-network subnet linked by L3Layout.
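A sketch of how a new Machine object might select the alternative template; the l2TemplateSelector field structure shown here is an assumption, so verify it against the Machine object reference:

spec:
  providerSpec:
    value:
      l2TemplateSelector:
        label: alternative-template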
Optional. Configure an additional IP address pool for MetalLB:
Since MOSK 24.2
Configure the additional extension IP address pool for the
metallb load balancer service.
Open the MetalLBConfig object of the management cluster for
editing:
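For illustration, the edited section could look similar to the following sketch; the pool name and field layout are assumptions based on the parameters described below:

spec:
  ipAddressPools:
  - name: extension-pool            # hypothetical name of the additional pool
    spec:
      addresses:
      - <pool_start_ip>-<pool_end_ip>
      autoAssign: true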
In the snippet above, replace the following parameters:
<pool_start_ip> - first IP address in the required range
<pool_end_ip> - last IP address in the range
Add the extension IP address pool name to the L2Advertisements
definition. You can add it to the same L2 advertisement as the
default IP address pool, or create a new L2 advertisement
if required.
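A sketch of the corresponding L2Advertisements entry, reusing the hypothetical extension-pool name from the previous sketch; the advertisement name and structure are assumptions:

spec:
  l2Advertisements:
  - name: default
    spec:
      ipAddressPools:
      - default
      - extension-pool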
Define additional address ranges for MetalLB. For details, see the
optional step for the MetalLB service in Create subnets.
You can create one or several Subnet objects to extend the MetalLB
address pool with additional ranges. When the MetalLB traffic is routed
through the default gateway, you can add the MetalLB address ranges that
belong to different CIDR subnet addresses.
The kaas.mirantis.com/region label is removed from all
Container Cloud and MOSK objects in 24.1.
Therefore, do not add the label starting with these releases. On existing
clusters updated to these releases, or if added manually, Container Cloud
ignores this label.
MOSK provides cloud operators with a unified tool to perform
automatic self-diagnostic checks on both management and managed clusters. This
capability allows for easier troubleshooting and preventing potential issues.
For instance, self-diagnostic checks can notify you of deprecated features
that, if left unresolved, may block upgrades to subsequent versions.
Examples of self-diagnostic checks include:
An SSL/TLS certificate is not set explicitly as plain text
Deprecated OpenStackDeploymentSecret does not exist
Running self-diagnostics is essential to ensure the overall health and optimal
performance of your cluster. Mirantis recommends running self-diagnostics
before cluster update, node replacement, or any other significant changes in
the cluster to optimize maintenance window.
The Diagnostic Controller is a tool with a set of diagnostic checks to
automatically perform self-diagnostics of any cluster and help the operator to
easily understand, troubleshoot, and resolve potential issues against the
following major subsystems: core, bare metal, Ceph, StackLight,
Tungsten Fabric, and OpenStack. For illustration of diagnostic checks, refer
to the subsection describing the bare metal provider checks.
The Diagnostic Controller analyzes the configuration of the cluster subsystems
and reports results of checks that contain useful information about cluster
health. These reports may include references to documentation on known issues
related to results of checks, along with ticket numbers for tracking the
resolution progress of related issues.
The Diagnostic Controller watches for the Diagnostic objects and runs a set
of diagnostic checks depending on the cluster version and type, which are
identified by the cluster name defined in the spec.cluster section of the
Diagnostic object.
Trigger self-diagnostics for a management or managed cluster¶
Available since MCC 2.28.0 (17.3.0 and 16.3.0)
To run self-diagnostics for a cluster, the operator must create a
Diagnostic object. The creation of this object triggers
diagnostic-controller to start all available checks for the target cluster
defined in the spec.cluster section of the object.
After a successful completion of the required set of diagnostic checks,
diagnostics is never retriggered. To retrigger diagnostics for the same
cluster, the operator must create a new Diagnostic object.
The objects of the Diagnostic kind are not removed automatically so that
you can assess the result of each diagnostics later.
To trigger self-diagnostics for a cluster:
Log in to the host where kubeconfig of your management cluster is
located and where kubectl is installed.
Create the Diagnostic object in the namespace where the target cluster
is located. For example:
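A minimal example, mirroring the object structure shown later in this section; the names are illustrative:

apiVersion: diagnostic.mirantis.com/v1alpha1
kind: Diagnostic
metadata:
  name: test-diagnostic
  namespace: test-namespace
spec:
  cluster: test-cluster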
Verify the status section of the Diagnostic object:
If diagnostics is finished successfully, its result is displayed in the
result map containing key-value pairs describing results of the
corresponding diagnostic checks.
If diagnostics is finished unsuccessfully, or the Diagnostic Controller
version is outdated, diagnostic-controller saves the issue description
to the status.error field.
If the Diagnostic Controller version is outdated, ensure that
release-controller is running and a new DiagnosticRelease has been
created. Also, verify logs of the bare metal provider and
release-controller for issues.
If the status section is empty, diagnostic-controller has not run
any diagnostics yet.
The Diagnostic Controller is upgraded outside the Container Cloud release
cycle. Once the new version of the Diagnostic Controller is released, it is
automatically installed on the management cluster.
The Diagnostic Controller does not run any diagnostics until it is upgraded to
the latest version. If diagnostics is triggered before the Diagnostic
Controller is fully upgraded, the status field of the Diagnostic object
contains the corresponding error. For example:
apiVersion: diagnostic.mirantis.com/v1alpha1
kind: Diagnostic
metadata:
  name: test-diagnostic
  namespace: test-namespace
spec:
  cluster: test-cluster
status:
  error: The controller has outdated version v1.40.1 (the latest version is
    v1.40.2). Wait until the controller is updated to the latest version. Ensure
    that the release controller is running and the new DiagnosticRelease has
    been created. Check the release controller and the provider logs for issues.
  controllerVersion: v1.40.1
The bm_address_capacity check verifies that the available capacities
of IP addresses in the Subnet and MetalLBConfig objects are sufficient.
This check verifies the Subnet objects only with the following labels:
ipam/SVC-k8s-lcm
ipam/SVC-pxe-nics
For the MetalLBConfig objects, the check uses only the IP addresses
defined in .spec.ipAddressPools and verifies only the MetalLBConfig
objects with the following configuration for the IP address pool:
.spec.autoAssign is set to true
.spec.serviceAllocation.serviceSelectors is not set
The minimum thresholds for IP address capacity are as follows:
Subnet — 5
MetalLBConfig — 10
Capacity below these thresholds is reported as insufficient.
If thresholds are met, then the output status is INFO. Otherwise,
the status is WARNING.
The check reports the number of available IP addresses for each matching
Subnet object and for each matching IP address pool of a matching
MetalLBConfig object.
The bm_artifacts_overrides check verifies that no undesirable overrides are
present in the baremetal-operator release, including
but not limited to the values.init_bootstrap.provisioning_files.artifacts
path.
The bm_objects_statuses check verifies that no errors or undesired states
are present in the status of the following objects: IPAMHost,
MetalLBConfig, and LCMMachine.
The bm_objects_statuses check applies the following verifications per object:
IPAMHost
.status.state is set to OK
.status.netconfigCandidate equals the configuration set in
/etc/netplan/60-kaas-lcm-netplan.yaml on a corresponding machine
LCMMachine
.status.hostInfo.hardware is present and contains values
.status.stateItemStatuses has no errors for each StateItem
MetalLBConfig
.status.objects equals .spec
.status.updateResult.success and .status.propagateResult.success
are set to true
The Container Cloud web UI communicates with Keycloak to authenticate
users. Keycloak is exposed using HTTPS with self-signed TLS certificates
that are not trusted by web browsers.
User management for the Mirantis OpenStack for Kubernetes m:os roles is not
yet available through API or web UI. Therefore, continue managing these
roles using Keycloak.
You can use the following objects depending on the way you want the role
to be assigned to the user:
IAMGlobalRoleBinding for global role bindings
Any IAM role can be used in IAMGlobalRoleBinding and will be applied
globally, not limited to a specific project or cluster. For example,
the global-admin role.
IAMRoleBinding for project role bindings
Any role except the global-admin one applies. For example, using the
operator and user IAM roles in IAMRoleBinding of the example
project corresponds to assigning m:kaas:example@operator/user
in Keycloak. You can also use these IAM roles in IAMGlobalRoleBinding.
In this case, the roles corresponding to every project will be assigned
to a user in Keycloak.
IAMClusterRoleBinding for cluster role bindings
Only the cluster-admin and stacklight-admin roles apply to
IAMClusterRoleBinding. Creation of such objects corresponds to the
assignment of m:k8s:namespace:cluster@cluster-admin/stacklight-admin
in Keycloak. You can also bind these roles to either
IAMGlobalRoleBinding or IAMRoleBinding. In this case, the roles
corresponding to all clusters and in all projects or one particular project
will be assigned to a user.
This section describes available IAM roles with use cases and the Container
Cloud API IAM*RoleBinding mapping with Keycloak.
The following table illustrates possible role use cases for a better
understanding on which roles should be assigned to users who perform
particular operations in a MOSK cluster:
Infrastructure operator with the global-admin role who performs
the following operations:
Can manage all types of role bindings for all users
Performs CRUD operations on namespaces to effectively manage
Container Cloud projects (Kubernetes namespaces)
Creates a new project when onboarding a new team to MOSK
Assigns the operator role to users who are going to create
Kubernetes clusters in a project
Can assign the user or operator role for themselves to
monitor cluster state in a specific namespace or manage Container
Cloud API objects in that namespace respectively.
Available since Container Cloud 2.25.0 (Cluster releases 17.0.0 and
16.0.0).
Infrastructure operator with the management-admin role who has
full access to the management cluster, for example, to debug
MOSK issues.
Infrastructure operator with the operator role who performs
the following operations:
Can manage Container Cloud API and Ceph-related objects in a
particular namespace, create clusters and machines, have full access
to Kubernetes clusters and StackLight APIs deployed by anyone in
this namespace
Can manage role bindings in the current namespace for users who
require the bm-pool-operator, operator, or user role,
or who should manage a particular Kubernetes cluster in this namespace
Is responsible for upgrading Kubernetes clusters in the defined
project when an update is available
Infrastructure support operator with the user role who performs
the following operations:
Is responsible for the infrastructure of a particular project
Has access to live statuses of the project cluster machines to
identify unhealthy ones and perform maintenance on the infrastructure
level with the possibility to adjust operating system if required
Has access to IAM objects such as IAMUser, IAMRole
User with the stacklight-admin role who performs
the following operations:
Has the admin-level access to the StackLight components of a
particular Kubernetes cluster deployed in a particular namespace
to monitor the cluster health.
Mapping of Keycloak roles to IAM*RoleBinding objects¶
Starting from Container Cloud 2.14.0 (Cluster releases 7.4.0, 6.20.0, and
5.21.0), MOSK role naming has changed. The old role names
logic has been reworked and new role names are introduced.
Old-style role mappings are reflected in the Container Cloud API with the new
roles and the legacy:true and legacyRole:"<oldRoleName>" fields set.
If you remove the legacy flag, user-controller automatically performs
the following update in Keycloak:
Grants the new-style role
Removes the old-style role mapping
Note
You can assign the old-style roles using Keycloak only. These roles will be
synced into the Container Cloud API as the corresponding IAM*RoleBinding
objects with the external:true, legacy:true, and
legacyRole:"<oldRoleName>" fields set.
If you assign new-style roles using Keycloak, they will be synced into the
Container Cloud API with the external:true field set.
Mapping of new-style Keycloak roles to IAM*RoleBinding objects¶
The following table describes how the IAM*RoleBinding objects in the
Container Cloud API map to roles in Keycloak.
Mapping of old-style Keycloak roles to IAM*RoleBinding objects¶
The following table describes how the role names available before
Container Cloud 2.14.0 (Cluster releases 7.4.0, 6.20.0, and 5.21.0) map to
the current IAM*RoleBinding objects in the Container Cloud API:
Examples of mapping between Keycloak roles and IAM*RoleBinding objects¶
The following tables contain several examples of role assignment either
through Keycloak or the Container Cloud IAM objects with the corresponding
role mappings for each use case.
For example, if you have two namespaces (ns1, ns2) and
two clusters in each namespace, the following roles are created
in Keycloak:
m:k8s:ns1:cluster1@cluster-admin
m:k8s:ns1:cluster2@cluster-admin
m:k8s:ns2:cluster3@cluster-admin
m:k8s:ns2:cluster4@cluster-admin
If you create a new cluster5 in ns2, the user is automatically
assigned a new role in Keycloak: m:k8s:ns2:cluster5@cluster-admin.
The following table provides the new-style and old-style examples on how
a role assigned to a user through Keycloak will be translated into IAM objects.
Creation of this role through Keycloak triggers creation of two
IAMGlobalRoleBindings: global-admin and operator.
To migrate the old-style m:kaas@writer role to the new-style roles,
remove the legacy:true flag in two API objects.
For example, if you have two namespaces (ns1 and ns2) and remove
the legacy:true flag from both IAMGlobalRoleBindings mentioned
above, the old-style m:kaas@writer role will be
substituted by the following roles in Keycloak:
m:kaas@global-admin
m:kaas:ns1@operator
m:kaas:ns2@operator
If you create a new ns3, user1 is automatically assigned
a new role m:kaas:ns3@operator.
If you do not remove the legacy flag from IAMGlobalRoleBindings,
only one role remains in Keycloak - m:kaas@writer.
Manage user roles through the Container Cloud web UI¶
If you are assigned the global-admin role, you can manage the
IAM*RoleBinding objects through the Container Cloud web UI. The possibility
to manage project role bindings using the operator role will become
available in one of the following Container Cloud releases.
To add or remove a role binding using the Container Cloud web UI:
Log in to the Container Cloud web UI as global-admin.
In the left-side navigation panel, click Users to open the
active users list and view the number and types of bindings for each
user. Click on a user name to open the details page with the user
Role Bindings.
Select from the following options:
To add a new binding:
Click Create Role Binding.
In the window that opens, configure the following fields:
Parameter
Description
Role
global-admin
Manage all types of role bindings for all users
management-admin. Since MCC 2.25.0 (17.0.0 and 16.0.0)
Have full access to the management cluster
bm-pool-operator
Manage bare metal hosts of a particular namespace
operator
Manage Container Cloud API and Ceph-related objects in a
particular project, create clusters and machines,
have full access to Kubernetes clusters and StackLight APIs
deployed by anyone in this project
Manage role bindings in the current namespace for users
who require the bm-pool-operator, operator,
or user role
user
Manage infrastructure of a particular project with access
to live statuses of the project cluster machines to monitor
cluster health
cluster-admin
Have admin access to Kubernetes clusters and StackLight
components of a particular cluster and project
stacklight-admin
Have admin access to the StackLight components of a
particular Kubernetes cluster deployed in a particular project
to monitor the cluster health.
Binding type
Global
Bind a role globally, not limited to a specific project
or cluster. By default, global-admin has the global
binding type.
You can bind any role globally. For example,
you can change the default project binding of the operator
role to apply this role globally, to all existing and new
projects.
Project
Bind a role to a specific project. If selected, also define
the Project name that the binding is assigned to.
By default, the following IAM roles have the project
binding type: bm-pool-operator, operator, and user.
You can bind any role to a project except the global-admin one.
Cluster
Bind a role to a specific cluster. If selected, also define
the Project and Cluster name that
the binding is assigned to. You can bind only the
cluster-admin and stacklight-admin roles to a cluster.
To remove a binding, click the Delete action icon located
in the last column of the required role binding.
Bindings that have the external flag set to true will be synced
back from Keycloak during the next user-controller reconciliation.
Therefore, manage such bindings through Keycloak.
Mirantis Container Cloud creates the IAM roles in scopes.
For each application type, such as kaas, k8s, or sl,
Container Cloud creates a set of roles such as @admin, @cluster-admin,
@reader, @writer, @operator.
Depending on the role, you can perform specific operations in a cluster. For
example:
With the m:kaas@writer role, you can create a project
using the Container Cloud web UI. The corresponding project-specific roles
will be automatically created in Keycloak by iam-controller.
With the m:kaas* roles, you can download the kubeconfig of the
management cluster.
The semantic structure of role naming in MOSK is as follows:
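Based on the role examples used throughout this section, the general pattern can be summarized as follows; this is a reconstruction for illustration, not an exhaustive definition:

m:<application>[:<namespaceName>[:<clusterName>]]@<roleName>

where <application> is kaas, k8s, or sl, the optional namespace and cluster parts narrow the scope, and <roleName> is a role such as admin, cluster-admin, reader, writer, or operator.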
Since Container Cloud 2.14.0 (Cluster releases 7.4.0, 6.20.0, 5.21.0),
new-style roles were introduced. They can be assigned to users through Keycloak
directly as well as by using IAM API objects. Mirantis recommends using IAM API
for roles assignment.
Users with the m:kaas@global-admin role can create MOSK
projects, which are Kubernetes namespaces in a management cluster, and all
IAM API objects that manage users access to MOSK.
Users with the m:kaas@management-admin role have full access to the
management cluster. This role is available since Container Cloud 2.25.0
(Cluster releases 17.0.0 and 16.0.0).
After project creation, iam-controller creates the following roles in
Keycloak:
m:kaas:<namespaceName>@operator
Provides the same permissions as m:kaas:<namespaceName>@writer
m:kaas:<namespaceName>@bm-pool-operator
Provides the same permissions as m:kaas@operator but restricted to a
single namespace
m:kaas:<namespaceName>@user
Provides the same permissions as m:kaas:<namespaceName>@reader
m:kaas:<namespaceName>@member
Provides the same permissions as m:kaas:<namespaceName>@operator except
for IAM API access
The old-style m:k8s:<namespaceName>:<clusterName>@cluster-admin role is
unchanged in the new-style format and is recommended for usage.
When a managed cluster is created, a new role
m:sl:<namespaceName>:<clusterName>@stacklight-admin for the sl
application is created. This role provides the same access to the StackLight
resources in the managed cluster as
m:sl:<namespaceName>:<clusterName>@admin and is included into the
corresponding m:k8s:<namespaceName>:<clusterName>@cluster-admin role.
Users with the m:kaas@writer role are considered global
MOSK administrators. They can create MOSK
projects that are Kubernetes namespaces in the management cluster. After a
project is created, the m:kaas:<namespaceName>@writer and
m:kaas:<namespaceName>@reader roles are created in Keycloak by
iam-controller. These roles are automatically included into the
corresponding global roles, such as m:kaas@writer, so that users with the
global-scoped role also obtain the rights provided by the namespace-scoped
roles. The global role m:kaas@operator provides full access to bare metal
objects.
When a managed cluster is created, roles for the sl and k8s
applications are created:
m:k8s:<namespaceName>:<clusterName>@cluster-admin (also applies to
new-style roles, recommended)
m:sl:<namespaceName>:<clusterName>@admin
These roles provide access to the corresponding resources in a managed cluster
and are included into the corresponding m:kaas:<namespaceName>@writer role.
Available since Container Cloud 2.25.0 (Cluster releases 17.0.0 and 16.0.0).
Have full access to the management cluster.
m:kaas:<namespaceName>
reader
m:kaas:<namespaceName>@reader
List the API resources within the specified Container Cloud project.
writer
m:kaas:<namespaceName>@writer
Create, update, or delete the API resources within the specified
Container Cloud project.
user
m:kaas:<namespaceName>@user
List the API resources within the specified Container Cloud project.
operator
m:kaas:<namespaceName>@operator
Create, update, or delete the API resources within the specified
Container Cloud project.
bm-pool-operator
m:kaas:<namespaceName>@bm-pool-operator
Add or delete a bare metal host and, since Container Cloud 2.29.1 (Cluster
release 16.4.1), bare metal inventory within the specified Container Cloud project.
member
m:kaas:<namespaceName>@member
Create, update, or delete the API resources within the specified
Container Cloud project, except IAM API.
This section illustrates possible use cases for a better understanding on which
roles should be assigned to users who perform particular operations in a
MOSK cluster:
Role
Use case
m:kaas@operator
Member of a dedicated infrastructure team who only manages bare metal hosts
and, since Container Cloud 2.29.1 (Cluster release 16.4.1), bare metal inventories
in MOSK
m:kaas@writer
Infrastructure Operator who performs the following operations:
Performs CRUD operations on namespaces to effectively manage MOSK
projects (Kubernetes namespaces)
Creates a new project when a new team is being onboarded to MOSK
Manages API objects in all namespaces, creates clusters and machines
Using kubeconfig downloaded through the Container Cloud web UI, has full access
to the Kubernetes clusters and StackLight APIs deployed by anyone in
MOSK except the management cluster
Has the Container Cloud API access in the management cluster using
the management cluster kubeconfig downloaded through the Container Cloud web UI
Note
To have full access to the management cluster, a kubeconfig
generated during the management cluster bootstrap is required.
m:kaas@reader
Member of a dedicated infrastructure support team responsible for the
MOSK infrastructure who performs the following operations:
Monitors the cluster and machine live statuses to control the underlying
cluster infrastructure health status
Performs maintenance on the infrastructure level
Performs adjustments on the operating system level
m:kaas:<namespaceName>@writer
User who administers a particular project:
Has full access to Kubernetes clusters and StackLight APIs deployed
by anyone in this project
Has full access to Container Cloud API in this project
Upgrades Kubernetes clusters in the project when an update is available
m:kaas:<namespaceName>@reader
Member of a dedicated infrastructure support team in a particular project.
For use cases, see the m:kaas@reader role described above.
m:k8s:<namespaceName>:<clusterName>@cluster-admin
User who has admin access to a Kubernetes cluster deployed in a particular
project.
m:sl:<namespaceName>:<clusterName>@admin
User who has full access to the StackLight components of a particular
Kubernetes cluster deployed in a particular project
to monitor the cluster health status.
Log in to the Keycloak web UI using the following link form with the default
keycloak admin user and the Keycloak credentials obtained in the
previous steps:
Navigate to Users > User list that contains all users in the
IAM realm.
Click the required user name. The page with user settings opens.
Open Credentials tab.
Using the Reset password form, update the password as required.
Note
To change the password permanently, toggle the
Temporary switch to the OFF position. Otherwise,
the user will be prompted to change the password after the next login.
IAM DB credentials:
  MYSQL_DBADMIN_PASSWORD: foobar
  MYSQL_DBSST_PASSWORD: barbaz
Caution
Credentials provided in the system response allow operating
MariaDB with the root user inside a container. Therefore, use them with
caution.
Manage Keycloak truststore using the Container Cloud web UI¶
Available since MCC 2.26.0 (17.1.0 and 16.1.0)
While communicating with external services, Keycloak must validate the
certificate of the remote server to ensure secured connection.
By default, the standard Java Truststore configuration is used for validating
outgoing requests. In order to properly validate client self-signed
certificates, the truststore configuration must be added. The truststore is
used to ensure secured connection to identity brokers, LDAP identity
providers, and so on.
If a custom truststore is set, only certificates from that truststore are used.
If trusted public CA certificates are also required, they must be included
in the custom truststore.
To add a custom truststore for Keycloak using the Container Cloud web UI:
Log in to the Container Cloud web UI with the m:kaas:namespace@operator or
m:kaas:namespace@writer permissions.
Switch to the default project using the Switch Project
action icon located on top of the main left-side navigation panel.
In the Clusters tab, click the More action icon
in the last column of the management cluster and select
Configure cluster.
In the window that opens, click Keycloak and select
Configure trusted certificates.
Note
The Configure trusted certificates check box
is available since Container Cloud 2.26.4 (Cluster releases 17.1.4 and
16.1.4).
In the Truststore section that opens, fill out and save the form
with the following parameters:
Parameter
Description
Data
Content of the truststore file. Click Upload to select
the required file.
Password
Password of the truststore. Mandatory.
Type
Supported truststore types: jks, pkcs12,
or bcfks.
Hostname verification policy
Optional verification of the host name of the server certificate:
The default WILDCARD value allows wildcards in
subdomain names.
The STRICT value requires the Common Name (CN)
to match the host name.
Click Update.
Once a custom truststore for Keycloak is applied, the following configuration
is added to the Cluster object:
Use the same web UI menu to customize an existing truststore or
reset it to default settings, which is available since Container Cloud
2.26.4 (Cluster releases 17.1.4 and 16.1.4).
The Container Cloud web UI communicates with Keycloak to authenticate
users. Keycloak is exposed using HTTPS with self-signed TLS certificates
that are not trusted by web browsers.
Regional clusters are unsupported since Container Cloud 2.25.0
(Cluster releases 17.0.0 and 16.0.0). Mirantis does not perform functional
integration testing of the feature and the related code is removed in
Container Cloud 2.26.0 (Cluster releases 17.1.0 and 16.1.0). If you still
require this feature, contact Mirantis support for further information.
This section covers the management aspects of the bare metal management
cluster that MOSK is based on.
The Mirantis Container Cloud web UI enables you to perform the following
operations with management clusters:
View the cluster details (such as cluster ID, creation date, node count,
and so on) as well as obtain a list of cluster endpoints including the
StackLight components, depending on your deployment configuration.
To view generic cluster details, in the Clusters tab, click the
More action icon in the last column of the required cluster and
select Cluster info.
Note
Adding more than 3 nodes to a management cluster is not supported.
Removing a management cluster using the Container Cloud web UI is not
supported. Use the dedicated cleanup script instead. For details, see
Remove a management cluster.
Verify the current release version of the cluster including the list of
installed components with their versions and the cluster release change log.
To view details of a Cluster release version, in the Clusters
tab, click the version in the Release column next to the name
of the required cluster.
This section outlines the operations that you can perform with a management
cluster.
The Container Cloud APIs are implemented using the Kubernetes
CustomResourceDefinitions (CRDs) that enable you to expand the Kubernetes
API. For details, see Container Cloud documentation: API Reference.
You can operate a cluster using the kubectl command-line tool that
is based on the Kubernetes API. For the kubectl reference, see the
official
Kubernetes documentation.
Workflow and configuration of management cluster upgrade
This section describes specifics of automatic upgrade workflow of a management
cluster as well as provides configuration procedures that you may apply before
and after automatic upgrade.
A management cluster upgrade to a newer version is performed automatically
once a new Container Cloud version is released. For more details about the
Container Cloud release upgrade mechanism, see: Container Cloud Reference
Architecture: Release Controller.
The Operator can delay the Container Cloud automatic upgrade procedure for a
limited amount of time or schedule upgrade to run at desired hours or weekdays.
For details, see Schedule Mirantis Container Cloud updates.
Container Cloud remains operational during the management cluster upgrade.
Managed clusters are not affected during this upgrade. For the list of
components that are updated during the Container Cloud upgrade, see the
Components versions section of the corresponding major Container Cloud
release in Container Cloud Release Notes: Container Cloud releases.
When Mirantis announces support of the newest versions of
Mirantis Container Runtime (MCR) and Mirantis Kubernetes Engine
(MKE), Container Cloud automatically upgrades these components as well.
For the maintenance window best practices before upgrade of these
components, see
MKE Documentation.
Since Container Cloud 2.23.2 (Cluster releases 12.7.1 and 11.7.1), the release
update train includes patch release updates being delivered between major
releases. Patch release updates also involve automatic upgrade of a management
cluster. For details on the currently available patch releases, see
Container Cloud Release Notes: Latest supported patch releases.
Update the bootstrap tarball after automatic cluster upgrade
Once the management cluster is upgraded to the latest version, update the
original bootstrap tarball for successful cluster management, such as
collecting logs and so on.
Select from the following options:
For clusters deployed using Container Cloud 2.11.0 (Cluster releases 7.1.0
and 6.18.0) or later:
For clusters deployed using the Container Cloud release earlier than 2.11.0
(7.0.0, 6.16.0, or earlier), or if you deleted the kaas-bootstrap folder,
download and run the Container Cloud bootstrap script:
By default, Container Cloud automatically updates to the latest version,
once available. An Operator can delay or reschedule Container Cloud automatic
update process using CLI or web UI. The scheduling feature allows:
Limiting hours and weekdays when Container Cloud update can run. For example,
if a release becomes available on Monday, you can delay it until Sunday by
setting Sunday as the only permitted day for auto-updates.
Available since Container Cloud 2.28.0 (Cluster release 16.3.0):
Blocking Container Cloud auto-update immediately on the release date.
The delay period is a minimum of 20 days for each newly discovered release.
The exact number of delay days is set in the release metadata and cannot be
changed by the user. It depends on the specifics of each release cycle and
on the optional configuration of weekdays and hours selected for update.
You can verify the exact date of a scheduled auto-update either in the
Status section of the Management Cluster Updates
page in the web UI or in the status section of the MCCUpgrade
object.
Deprecated since Container Cloud 2.28.0 (Cluster release 16.3.0) in the CLI
and removed in the web UI. Blocking Container Cloud update process for up to
7 days from the current date and up to 30 days from the latest
Container Cloud release
Caution
Since Container Cloud 2.23.2 (Cluster release 11.7.1), the release
update train includes patch release updates being delivered between major
releases. The new approach increases the frequency of the release updates.
Therefore, schedule a longer maintenance window for the Container Cloud
update as there can be more than one scheduled update in the queue.
Schedule update of a management cluster using the web UI
Since Container Cloud 2.28.0 (Cluster release 16.3.0)
Log in to the Container Cloud web UI as m:kaas@global-admin or
m:kaas@writer.
In the left-side navigation panel, click Admin > Updates.
On the Management Cluster Updates page, verify the status
of the next release in the Status section.
If the management cluster update is delayed, the section contains the
following information about the new release: version, publish date,
link to release notes, scheduled date and time of update.
If the management cluster contains managed clusters running unsupported
Cluster versions, a tooltip with a notification about blocked update
is displayed.
If the cluster is updated to the latest version, the corresponding
message is displayed.
On the left side of the page, click Settings.
On the Configure updates schedule page, select
Auto-delay cluster updates to delay every new consecutive
release for minimum 20 days from the release publish date.
Note
Changing the number of delay days is unsupported. The exact
number of delay days depends on the specifics of each release cycle and
on the optional configuration of weekdays and hours selected for update.
Optional. Select Apply updates only within specific hours:
From the Time Zone list, select the required time zone or
type in the required location.
In Allowed Time for update, set the time intervals and week
days allowed for update. To set additional update hours, use the
+ button on the right side of the window.
Note
You can use this option with or without the auto-delay option.
When both options are enabled, the next available update starts after
the 20-day interval, at the earliest hour and weekday allowed by the
defined time window.
Before Container Cloud 2.28.0 (Cluster release 16.2.0 or earlier)
Log in to the Container Cloud web UI as m:kaas@global-admin or
m:kaas@writer.
In the left-side navigation panel, click Upgrade Schedule in
the Admin section.
Click Configure Schedule.
Select the time zone from the Time Zone list. You can also
type the necessary location to find it in the list.
Optional. In Delay Upgrade, configure the update delay.
You can set no delay or select the exact day, hour, and
minute. You can delay the update up to 7 days, but not more than
30 days from the latest release date. For example, the current time is
10:00 March 28, and the latest release was on March 1. In this case, the
maximum delay you can set is 10:00 March 31. Regardless of your local time
zone, configure the time in accordance with the time zone selected in the
previous step.
Optional. In Allowed Time for Upgrade, set the time intervals
when to allow update. Select the update hours in the From and
To time input fields. Select days of the week in the
corresponding check boxes. Click + to set additional update
hours.
Schedule update of a management cluster using CLI
You can delay or reschedule Container Cloud automatic update by editing the
MCCUpgrade object named mcc-upgrade in Kubernetes API.
Caution
Only the management cluster admin and users with the operator
(or writer in old-style Keycloak roles) permissions can edit the
MCCUpgrade object. For object editing, use kubeconfig generated
during the management cluster bootstrap or kubeconfig generated with the
operator (or writer) permissions.
To edit the current configuration, run the following command in the command
line:
kubectl edit mccupgrade mcc-upgrade
In the system response, the editor displays the current state of the
MCCUpgrade object in the YAML format. The spec section contains
the current update schedule configuration, for example:
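The following is an illustrative sketch of a schedule configuration; treat
the field names as assumptions and rely on the object present in your cluster
and the Container Cloud API Reference:

spec:
  blockUntil: "2024-07-01T00:00:00Z"   # assumed field: postpone updates until this date
  timeZone: CET                        # time zone applied to the schedule below
  schedule:                            # allowed update windows
  - hours:
      from: 10
      to: 17
    weekdays:
      monday: true
      friday: true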
On every update step, the Release Controller verifies if the current time is
allowed by the schedule and does not start or proceed with the update if it
is not.
When your Mirantis Container Cloud license expires, contact your account
manager to request a new license by submitting a ticket through the
Mirantis CloudCare Portal.
If your trial license has expired, contact
Mirantis support for further
information.
Once you obtain a new mirantis.lic file, update Container Cloud along
with MKE clusters using the instructions below.
Important
Once your Container Cloud license expires, all API
operations with new and existing clusters are blocked until license
renewal. Existing workloads are not affected.
Additionally, since Container Cloud 2.25.0 (Cluster releases 17.0.0 and
16.0.0), you cannot perform the following operations on your cluster with
an expired license:
Create new clusters and machines
Automatically upgrade the management cluster
Update managed clusters
To update the Container Cloud and MKE licenses:
Log in to the Container Cloud web UI with the m:kaas@global-admin role.
Navigate to Admin > License.
Click Update License and upload your new license.
Click Update.
Caution
Machines are not cordoned and drained, user workloads are not
interrupted, and the MKE license is updated automatically for all clusters.
If you did not add the NTP server parameters during the management cluster
bootstrap, configure them on the existing management cluster as required.
These parameters are applied to all machines of managed clusters deployed
within the configured management cluster.
Caution
The procedure below applies only if ntpEnabled=true (default)
was set during a management cluster bootstrap. Enabling or disabling NTP
after bootstrap is not supported.
Warning
The procedure below triggers an upgrade of all clusters in a
specific management cluster, which may lead to workload disruption during
nodes cordoning and draining.
To configure an NTP server for managed clusters:
Download your management cluster kubeconfig:
Log in to the Container Cloud web UI with the m:kaas:namespace@operator or
m:kaas:namespace@writer permissions.
Switch to the required project using the Switch Project
action icon located on top of the main left-side navigation panel.
Expand the menu of the tab with your user name.
Click Download kubeconfig to download kubeconfig
of your management cluster.
Log in to any local machine with kubectl installed.
Copy the downloaded kubeconfig to this machine.
Use the downloaded kubeconfig to edit the management cluster:
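For example, assuming the NTP servers are listed under the ntp section of the
provider spec (verify the exact path for your release):

kubectl --kubeconfig <pathToMgmtClusterKubeconfig> edit cluster <mgmtClusterName>

# add or update the following section in the opened object
spec:
  providerSpec:
    value:
      ntp:
        servers:            # assumed location of the NTP server list
        - 0.pool.ntp.org
        - 1.pool.ntp.org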
Automatically propagate Salesforce configuration to all clusters
You can enable automatic propagation of the Salesforce configuration of your
management cluster to the related managed clusters using the
autoSyncSalesForceConfig=true flag added to the Cluster object of the
management cluster. This option allows for automatic update and sync of the
Salesforce settings on all your clusters after you update your management
cluster configuration.
You can also set custom settings for managed clusters that always override
automatically propagated Salesforce values.
Enable propagation of Salesforce configuration using web UI
Log in to the Container Cloud web UI as m:kaas@global-admin or
m:kaas@writer.
In the Clusters tab, click the More action icon
in the last column of the required management cluster and select
Configure.
In the Configure cluster window, navigate to
StackLight > Salesforce and select
Salesforce Configuration Propagation To Managed Clusters.
Click Update.
Once the automatic propagation applies, the Events section of
the corresponding managed cluster displays the following message:
Propagated Cluster Salesforce Config From Management
<clusterName> Cluster uses SalesForce configuration from management
cluster.
Enable propagation of Salesforce configuration using CLI
Download your management cluster kubeconfig:
Log in to the Container Cloud web UI with the m:kaas:namespace@operator or
m:kaas:namespace@writer permissions.
Switch to the required project using the Switch Project
action icon located on top of the main left-side navigation panel.
Expand the menu of the tab with your user name.
Click Download kubeconfig to download kubeconfig
of your management cluster.
Log in to any local machine with kubectl installed.
Copy the downloaded kubeconfig to this machine.
In the Cluster objects of the required managed cluster, remove all
Salesforce settings that you want to automatically sync with
the same settings of the management cluster:
Optional. Set custom Salesforce settings for your managed cluster to
override the related management cluster settings. Add the
required custom settings to the StackLight values section of the
corresponding Cluster object of your managed cluster:
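A minimal structural sketch of where such overrides live, assuming the
StackLight Helm release is named stacklight; the actual Salesforce parameter
names depend on the StackLight configuration reference for your release:

spec:
  providerSpec:
    value:
      helmReleases:
      - name: stacklight
        values:
          # add the required custom Salesforce parameters here, for example,
          # dedicated credentials or environment identifiers for this cluster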
Custom settings are not overridden if you update the
management cluster settings for Salesforce.
Update the Keycloak IP address on bare metal clusters
The following instruction describes how to update the IP address of the
Keycloak service on management clusters.
Note
The commands below contain the default kaas-mgmt name of the
management cluster. If you changed the default name, replace it accordingly.
To verify the cluster name, run kubectl get clusters.
To update the Keycloak IP address on a management cluster:
Log in to a node that contains kubeconfig of the required
management cluster.
Make sure that the configuration file is in your .kube directory.
Otherwise, set the KUBECONFIG environment variable with a full path to
the configuration file.
Configure the additional external IP address pool for the metallb
load balancer service.
The Keycloak service requires one IP address. Therefore, the external
IP address pool must contain at least one IP address.
Since Container Cloud 2.27.0 (Cluster release 16.2.0)
Open the MetalLBConfig object of the management cluster for editing:
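A hedged sketch of the resulting change, assuming an ipAddressPools and
l2Advertisements layout of the MetalLBConfig object; verify the exact schema
against the object in your cluster:

spec:
  ipAddressPools:
  - name: external                    # additional pool for the Keycloak service
    spec:
      addresses:
      - <pool_start_ip>-<pool_end_ip>
      autoAssign: false
  l2Advertisements:
  - name: default
    spec:
      ipAddressPools:
      - default
      - external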
In the snippet above, replace the following parameters:
<pool_start_ip> - first IP address in the required range
<pool_end_ip> - last IP address in the range
Add the external IP address pool name to the L2Advertisements
definition. You can add it to the same L2 advertisement as the
default IP address pool, or create a new L2 advertisement
if required.
The kaas.mirantis.com/region label is removed from all
Container Cloud and MOSK objects in 24.1.
Therefore, do not add the label starting with these releases. On existing
clusters updated to these releases, or if added manually, Container Cloud
ignores this label.
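A hedged example of such a Subnet template, assuming that MetalLB address
pool parameters are passed through object labels; adjust the name and labels
to the conventions used in your deployment:

apiVersion: ipam.mirantis.com/v1alpha1
kind: Subnet
metadata:
  name: keycloak-external-pool                 # assumed name, use any suitable one
  namespace: default
  labels:
    kaas.mirantis.com/provider: baremetal
    ipam/SVC-MetalLB: "1"                      # assumed label marking a MetalLB service subnet
    metallb/address-pool-name: external
    metallb/address-pool-protocol: layer2
    metallb/address-pool-auto-assign: "false"
spec:
  cidr: <pool_cidr>
  includeRanges:
  - <pool_start_ip>-<pool_end_ip>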
In the template above, replace the following parameters:
<pool_start_ip> - first IP address in the desired range.
<pool_end_ip> - last IP address in the range.
<pool_cidr> - corresponding CIDR address. The only requirement
for this CIDR address is that the address range mentioned above
must fit into it. The CIDR address is not used by MetalLB; it is
only formally required for Subnet objects.
Note
If required, use a different IP address pool name.
Apply the Subnet template created in the previous step:
kubectl create -f <subnetTemplateName>
Open the MetalLBConfigTemplate object of the management cluster
for editing:
kubectl edit <MetalLBConfigTemplateName>
Add the external IP address pool name to the L2Advertisements
definition. You can add it to the same L2 advertisement as the
default IP address pool, or create a new L2 advertisement
if required.
Before Container Cloud 2.24.0 (Cluster release 11.7.0 or earlier)
Open the Cluster object for editing:
kubectl edit cluster <clusterName>
Add the following highlighted lines by replacing <pool_start_ip>
with the first IP address in the desired range and <pool_end_ip>
with the last IP address in the range:
spec:
  providerSpec:
    value:
      helmReleases:
      - name: metallb
        values:
          configInline:
            address-pools:
            - name: default
              protocol: layer2
              addresses:
              - 10.0.0.100-10.0.0.120  # example values
            - name: external
              protocol: layer2
              auto-assign: false
              addresses:
              - <pool_start_ip>-<pool_end_ip>
Note
If required, use a different IP address pool name.
Save and exit the object to apply changes.
Obtain the current Keycloak IP address for reference:
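For example, list the Keycloak service in the kaas namespace and note its
external IP; the exact service name may differ between releases:

kubectl -n kaas get svc | grep -i keycloak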
Available since MCC 2.24.0 (Cluster release 14.0.1). TechPreview
You can enable custom host names for cluster machines so that any machine host
name in a particular management cluster and its managed clusters matches
the related Machine object name.
For example, instead of the default kaas-node-<UID>, a machine host name
will be master-0. The custom naming format is more convenient and easier
to operate with.
Note
After you enable custom host names on an existing management
cluster, names of all newly deployed machines in this cluster and its
managed clusters will match machine host names. Existing host names will
remain the same.
If you are going to clean up a management cluster with this feature enabled
after cluster deployment, make sure to manually delete machines with existing
non-custom host names before cluster cleanup to prevent cleanup failure.
For details, see Remove a management cluster.
You can enable custom host names during the initial cluster configuration
of the management cluster bootstrap. For details, see Deploy a management cluster.
To enable the feature on an existing cluster, see the procedure below.
To enable custom host names on an existing management cluster:
Open the Cluster object of the management cluster for editing:
kubectl edit cluster <mgmtClusterName>
In the spec.providerSpec.value.kaas.regional section of the required
region, find the required provider name under helmReleases and add
customHostnamesEnabled:true under values.config. For example:
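An illustrative sketch for the bare metal provider, assuming the provider
Helm release is named baremetal-provider:

spec:
  providerSpec:
    value:
      kaas:
        regional:
        - provider: baremetal
          helmReleases:
          - name: baremetal-provider        # assumed provider Helm release name
            values:
              config:
                customHostnamesEnabled: true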
Available since MCC 2.27.0 (Cluster release 16.2.0)
MOSK uses a MariaDB database to store data generated by
the Container Cloud components. Mirantis recommends backing up your databases
to ensure the integrity of your data. Also, you should create an instant backup
before upgrading your database to restore it if required.
The Kubernetes cron job responsible for the MariaDB backup is enabled
by default to create daily backups. You can modify the default configuration
before or after the management cluster deployment.
Warning
A local volume of only one node of a management cluster is
selected when the backup is created for the first time. This volume is used
for all subsequent backups.
If the node containing backup data must be redeployed, first move the MySQL
backup to another node and update the PVC binding along with the MariaDB
backup job to use another node as described in Change the storage node for MariaDB.
After the management cluster deployment, the cluster configuration includes
the MariaDB backup functionality. The Kubernetes cron job responsible for the
MariaDB backup is enabled by default. For the MariaDB backup workflow, see
Workflows of the OpenStack database backup and restoration.
Warning
A local volume of only one node of a management cluster is
selected when the backup is created for the first time. This volume is used
for all subsequent backups.
If the node containing backup data must be redeployed, first move the MySQL
backup to another node and update the PVC binding along with the MariaDB
backup job to use another node as described in Change the storage node for MariaDB.
If the object is missing, make sure that your management cluster is
successfully upgraded to the latest version.
Select from the following options:
If the management cluster is not bootstrapped yet, modify
cluster.yaml.template using the steps below.
If the management cluster is already deployed, modify the configuration
using kubectl edit cluster <mgmtClusterName> and the steps below.
By default, the management cluster name is kaas-mgmt.
If the newest full backup is older than the value of
the full_backup_cycle parameter, the system performs a full
backup. Otherwise, the system performs an incremental backup of
the newest full backup.
full_backup_cycle
Number of seconds that defines a period between 2 full backups.
During this period, incremental backups are performed. The parameter
is taken into account only if backup_type is set to
incremental. Otherwise, it is ignored.
For example, with full_backup_cycle set to 604800 seconds,
a full backup is performed weekly and, if cron is set to 0 0 * * *,
an incremental backup is performed daily.
MARIADB_BACKUP_REQUIRED_SPACE_RATIO
Multiplier for the database size to predict the space required to
create a backup, either full or incremental, and perform a
restoration keeping the uncompressed backup files on the same file
system as the compressed ones.
To estimate the value of MARIADB_BACKUP_REQUIRED_SPACE_RATIO, use
the following formula: size of (1 uncompressed full backup + all
related incremental uncompressed backups + 1 full compressed backup)
in KB <= (DB_SIZE * MARIADB_BACKUP_REQUIRED_SPACE_RATIO) in KB.
The DB_SIZE is the disk space allocated in the MySQL data
directory, which is /var/lib/mysql, for database data excluding
the galera.cache and ib_logfile* files. This parameter prevents
the backup PVC from being full in the middle of the restoration and
backup procedures. If the current available space is lower than
DB_SIZE * MARIADB_BACKUP_REQUIRED_SPACE_RATIO, the backup
script fails before the system starts the actual backup and the
overall status of the backup job is failed.
To perform full backups monthly and incremental backups daily at 02:30 AM and
keep the backups for the last six months, configure the database backup in your
Cluster object as follows:
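A hedged sketch of such settings; the surrounding nesting inside the Cluster
object is omitted, the retention parameter name is an assumption, and
backup_type, full_backup_cycle, and cron are the parameters described above:

backup_type: incremental
full_backup_cycle: 2592000     # ~30 days in seconds, so a full backup runs monthly
cron: "30 2 * * *"             # incremental backups daily at 02:30 AM
backups_to_keep: 6             # assumed retention parameter name, keeps about six months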
Create the check_pod.yaml file to create the helper pod required
to view the backup volume content.
Configuration example:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: check-backup-helper
  namespace: kaas
---
apiVersion: v1
kind: Pod
metadata:
  name: check-backup-helper
  namespace: kaas
  labels:
    application: check-backup-helper
spec:
  containers:
  - name: helper
    securityContext:
      allowPrivilegeEscalation: false
      runAsUser: 0
      readOnlyRootFilesystem: true
    command:
    - sleep
    - infinity
    # using image from mariadb sts
    image: <<insert_image_of_mariadb_container_here>>
    imagePullPolicy: IfNotPresent
    volumeMounts:
    - name: pod-tmp
      mountPath: /tmp
    - mountPath: /var/backup
      name: mysql-backup
  restartPolicy: Never
  serviceAccount: check-backup-helper
  serviceAccountName: check-backup-helper
  volumes:
  - name: pod-tmp
    emptyDir: {}
  - name: mariadb-secrets
    secret:
      secretName: mariadb-secrets
      defaultMode: 0444
  - name: mariadb-bin
    configMap:
      name: mariadb-bin
      defaultMode: 0555
  - name: mysql-backup
    persistentVolumeClaim:
      claimName: mariadb-phy-backup-data
Apply the helper service account and pod resources:
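For example:

kubectl apply -f check_pod.yaml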
The base directory contains full backups. Each directory in the incr
folder contains incremental backups related to a certain full backup in
the base folder. All incremental backups always have the base backup
name as the parent folder.
During the restore procedure, the MariaDB service will be unavailable
because the MariaDB StatefulSet scales down to 0 replicas. Therefore, plan
the maintenance window according to the database size.
The restore speed depends on the following:
Network throughput
Storage performance where backups are kept
Local disks performance of nodes with MariaDB local volumes
If you want to restore the full backup, the name from the
example above is 2021-09-09_11-35-48. To restore a specific
incremental backup, the name from the example above is
2021-09-09_11-35-48/2021-09-12_01-01-54.
In the example above, the backups will be restored in the following strict
order:
2021-09-09_11-35-48 - full backup,
path /var/backup/base/2021-09-09_11-35-48
Name of a folder with backup in <baseBackup> or
<baseBackup>/<incrementalBackup>.
replica-restore-timeout
Integer
3600
Timeout in seconds for 1 replica data to be restored to the
mysql data directory. Also, includes time for spawning a
rescue runner pod in Kubernetes and extracting data from a
backup archive.
Wait until the mariadb-phy-restore job succeeds:
kubectl -n kaas get jobs mariadb-phy-restore -o jsonpath='{.status}'
The mariadb-phy-restore job is an immutable object. Therefore,
remove the job after each execution. To correctly remove the job,
clean up all settings from the Cluster object
that you have configured during step 7 of this procedure.
This will remove all related pods as well.
Note
If you create a new user after creating the MariaDB backup file,
this user will not exist in the database after restoring MariaDB.
However, Keycloak may still contain a cached entry for this user. Therefore,
when this user attempts to log in, the Container Cloud web UI may start the
authentication procedure that fails with the following error: Data loading
failed: Failed to log in: Failed to get token. Reason: “User not found”.
To clear the cache in Keycloak, refer to the official Keycloak documentation.
The default storage class cannot be used on a management cluster, so a
specially created one is used for this purpose. For storage, this class uses
local volumes, which are managed by local-volume-provisioner.
Each node of a management cluster contains a local volume, and the volume bound
with a PVC is selected when the backup gets created for the first time.
This volume is used for all subsequent backups. Therefore, to ensure reliable
backup storage, consider creating a regular backup copy of this volume in a
separate location.
If the node that contains backup data must be redeployed, first move the MySQL
backup data to another node and update the PVC binding along with the MariaDB
backup job to use another node as described below.
The order of nodes that contain Secondary local volume
is random.
Capture details of the node containing the primary local volume for further
usage. For example, you can use the InternalIP value to SSH to the
required node and copy the backup data located under Volume path to a
separate location.
Change the default storage node for MariaDB backups
Capture details of the local volume and node containing backups
as described in Identify a node where backup data is stored. Also, capture
details of the Secondary local volume that you select to move backup data to.
Using InternalIP of the Primary local volume, SSH to the corresponding
node and create a backup tarball:
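For example, assuming the backup data is located under the Volume path
captured earlier (<volumePath> is an illustrative placeholder):

tar -czf /tmp/mariadb-backup.tar.gz -C <volumePath> .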
Note
In the command below, replace <newVolumePath> with the value
of the Volume path field of the selected Secondary local volume.
Using InternalIP of the Secondary local volume, SSH to the
corresponding node and copy the created backup mariadb-backup.tar.gz
using a dedicated utility such as scp, rsync, or
other.
Restore mariadb-backup.tar.gz under the selected Volume path:
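For example, assuming the tarball was copied to the new node:

tar -xzf mariadb-backup.tar.gz -C <newVolumePath>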
This section describes how to remove a management cluster. You can also use the
following instruction to remove unsupported regional clusters, if any.
To remove a management or regional cluster:
Verify that you have successfully removed all managed clusters that
run on top of the management or regional cluster to be removed. For details,
see Delete a managed cluster.
If you enabled custom host names on an existing management or regional
cluster as described in Configure host names for cluster machines, and the cluster contains hosts
with non-custom names, manually delete such hosts to prevent cleanup
failure.
Log in to a local machine where your management cluster kubeconfig
is located and where kubectl is installed.
Note
The management cluster kubeconfig is created
during the last stage of the management cluster bootstrap.
Note
To remove a regional cluster, you also need access to the regional
cluster kubeconfig that was created during the last stage of the
regional cluster bootstrap.
Verify that the bootstrap directory is updated.
Select from the following options:
For clusters deployed using Container Cloud 2.11.0 (Cluster releases 7.1.0
and 6.18.0) or later:
For clusters deployed using the Container Cloud release earlier than 2.11.0
(7.0.0, 6.16.0, or earlier), or if you deleted the kaas-bootstrap folder,
download and run the Container Cloud bootstrap script:
Available since MCC 2.24.0 (14.0.1 and 15.0.1). TechPreview
This section describes how to speed up deployment and update process of
managed clusters, which usually do not have access to the Internet and
consume artifacts from a management cluster using the mcc-cache service.
By default, after auto-upgrade of a management cluster,
before each managed cluster deployment or update, mcc-cache downloads the
required list of images, thus slowing down the process.
Using the CacheWarmupRequest resource, you can predownload (warm up)
a list of images included in a given set of Cluster releases into the
mcc-cache service only once per release for further usage on all managed
clusters.
After a successful cache warm-up, the object of the CacheWarmupRequest
resource is automatically deleted from the cluster and cache remains for
managed clusters deployment or update until next Container Cloud auto-upgrade
of the management cluster.
Caution
If the disk space for cache runs out, the cache for the oldest
object is evicted. To avoid running out of space in the cache, verify and
adjust its size before each cache warm-up.
Cache warm-up requires a lot of disk storage and may take up to 100% of disk
space. Therefore, make sure to have enough space for storing cached objects
on each node of the management cluster before creating the
CacheWarmupRequest resource. The following example contains minimal
required values for the cache size for the management cluster:
After you calculate the disk size for warming up cache depending on your
cluster settings and minimal cache warm-up requirements, configure the size of
cache in the Cluster object of your cluster.
In the spec:providerSpec:value:kaas:regionalHelmReleases: section of the
management Cluster object, add the following snippet to the mcc-cache
entry with the required size value in GiB:
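An illustrative sketch, assuming the cache size is exposed through the nginx
values of the mcc-cache release; verify the exact key against your release:

spec:
  providerSpec:
    value:
      kaas:
        regionalHelmReleases:
        - name: mcc-cache
          values:
            nginx:
              cacheSize: 100        # assumed key, size in GiB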
After you increase the size of cache on the cluster as described in
Increase cache size for mcc-cache, create the CacheWarmupRequest object in the
Kubernetes API.
Caution
Create CacheWarmupRequest objects only on the management
cluster.
To warm up cache using CLI:
Identify the latest available Cluster releases to use for deployment of
new clusters and update of existing clusters:
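For example, you can list the available Cluster releases with kubectl and
then create a CacheWarmupRequest similar to the hedged sketch below; the
apiVersion and field names are assumptions to verify against the Container
Cloud API Reference:

kubectl get clusterreleases

apiVersion: kaas.mirantis.com/v1alpha1
kind: CacheWarmupRequest
metadata:
  name: <mgmtClusterName>           # assumed: created on the management cluster
  namespace: default
spec:
  clusterReleases:                  # Cluster releases whose artifacts to predownload
  - <clusterReleaseName1>
  - <clusterReleaseName2>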
Once done, during deployment and update of managed clusters, Container
Cloud uses cached artifacts from the mcc-cache service to facilitate
and speed up the procedure.
When a new Container Cloud release becomes available and the management
cluster auto-upgrades to a new Container Cloud release, repeat the
above steps to predownload a new set of artifacts for managed clusters.
After deploying a managed cluster, you can configure a few cluster settings
using the Container Cloud web UI as described below.
To change a cluster configuration:
Log in to the Container Cloud web UI with the m:kaas:namespace@operator or
m:kaas:namespace@writer permissions.
Select the required project.
On the Clusters page, click the More action icon in
the last column of the required cluster and select
Configure cluster.
In the Configure cluster window:
In the General Settings tab, you can:
Add or update proxy for a cluster by selecting the name of previously
created proxy settings from the Proxy drop-down menu. To
add or update proxy parameters:
In the Proxies tab, configure proxy:
Click Add Proxy.
In the Add New Proxy wizard, fill out the form with the
following parameters:
If your proxy requires a trusted CA certificate, select the
CA Certificate check box and paste a CA certificate for a MITM
proxy to the corresponding field or upload a certificate using
Upload Certificate.
Using the SSH Keys drop-down menu, select the required
previously created SSH key to add it to the running cluster.
If required, you can add several keys or remove unused ones, if any.
Note
To delete an SSH key, use the SSH Keys tab
of the main menu.
Applies since MCC 2.21.x (Cluster releases 12.5.0 and 11.5.0).
Using the Container Registry drop-down menu, select
the previously created Docker container registry name to add
it to the running cluster.
Applies since MCC 2.25.0 (Cluster releases 17.0.0 and 16.0.0).
Using the following options, define the maximum number of worker
machines to be upgraded in parallel during cluster update:
Parallel Upgrade Of Worker Machines
The maximum number of the worker nodes to update simultaneously. It serves as
an upper limit on the number of machines that are drained at a given moment
of time. Defaults to 1.
Parallel Preparation For Upgrade Of Worker Machines
The maximum number of worker nodes being prepared at a given moment of time,
which includes downloading of new artifacts. It serves as a limit for the
network load that can occur when downloading the files to the nodes.
Defaults to 50.
In the Stacklight tab, select or deselect StackLight
and configure its parameters if enabled.
You can also update the default log level severity for all StackLight
components as well as set a custom log level severity for specific
StackLight components. For details about severity levels,
see Log verbosity.
Before performing node maintenance operations that are not managed by
MOSK, such as operating system configuration or node reboot,
enable maintenance mode on the cluster and required machines using the
Container Cloud web UI or CLI to prepare workloads for maintenance.
Enable maintenance mode on a cluster and machine using web UI
You can use the instructions below to enable maintenance mode on a cluster and
machine using the Container Cloud web UI. To enable maintenance mode using the
Container Cloud API, refer to Enable maintenance mode on a cluster and machine using CLI.
Caution
To enable maintenance mode on a machine, first enable maintenance mode
on the related cluster.
To disable maintenance mode on a cluster, first disable maintenance mode
on all machines of the cluster.
Warning
During cluster and machine maintenance:
Cluster upgrades and configuration changes (except for the SSH key
settings) are unavailable. Make sure you disable maintenance mode on the
cluster after maintenance is complete.
Data load balancing is disabled while Ceph is in maintenance mode.
Log in to the Container Cloud web UI with the m:kaas:namespace@operator or
m:kaas:namespace@writer permissions.
Enable maintenance mode on the cluster:
In the Clusters tab, click the More action icon
in the last column of the cluster you want to put into maintenance
mode and select Enter maintenance. Confirm your selection.
Wait until the Status of the cluster switches to
Maintenance.
Now, you can switch cluster machines to maintenance mode.
In the Clusters tab, click the required cluster name to
open the list of machines running on it.
In the Maintenance column of the machine you want to put into
maintenance mode, enable the toggle switch.
Wait until the machine Status switches to
Maintenance.
Once done, the node of the selected machine is cordoned, drained, and
prepared for maintenance operations.
Important
Proceed with the node maintenance only after the machine
Status switches to Maintenance.
Disable maintenance mode on a cluster and machine
Log in to the Container Cloud web UI with the m:kaas:namespace@operator or
m:kaas:namespace@writer permissions.
In the Clusters tab, click the required cluster name to open
its machines list.
In the Maintenance column of the machine you want to disable
maintenance mode, disable the toggle switch.
Wait until the machine Status does not display
Maintenance, Pending maintenance, or the progress
indicator.
Repeat the above steps for all machines that are in maintenance mode.
Disable maintenance mode on the related cluster:
In the Clusters tab, click the More action icon
in the last column of the cluster where you want to disable maintenance
mode and select Exit maintenance.
Wait until the cluster Status does not display
Maintenance, Pending maintenance, or the progress
indicator.
Enable maintenance mode on a cluster and machine using CLI
You can use the instructions below to enable maintenance mode on a cluster and
machine using the Container Cloud API. To enable maintenance mode using the
Container Cloud web UI, refer to Enable maintenance mode on a cluster and machine using web UI.
Caution
To enable maintenance mode on a machine, first enable maintenance mode
on the related cluster.
To disable maintenance mode on a cluster, first disable maintenance mode
on all machines of the cluster.
Warning
During cluster and machine maintenance:
Cluster upgrades and configuration changes (except for the SSH key
settings) are unavailable. Make sure you disable maintenance mode on the
cluster after maintenance is complete.
Data load balancing is disabled while Ceph is in maintenance mode.
Available since MCC 2.25.0 (17.0.0 and 16.0.0) for workers
on managed clusters. TechPreview
You can use the machine disabling API to seamlessly remove a worker machine
from the LCM control of a managed cluster. This action isolates the affected
node without impacting other machines in the cluster, effectively eliminating
it from the Kubernetes cluster. This functionality proves invaluable in
scenarios where a malfunctioning machine impedes cluster updates.
Note
The Technology Preview support of the machine disabling feature
also applies during cluster update to the Cluster release 17.1.0 or 16.1.0.
Before disabling a cluster machine, carefully read the following essential
information for a successful machine disablement:
MOSK supports machine disablement of worker machines only.
If an issue occurs on the control plane, which is updated before worker
machines, fix the issue or replace the affected
control machine as soon as possible to prevent issues with workloads.
For reference, see Troubleshooting Guide and Delete a cluster machine.
Disabling a machine can break high availability (HA) of components such as
StackLight. Therefore, Mirantis recommends adding a new machine as soon
as possible to provide sufficient node number for components HA.
Note
It is expected that the cluster status contains degraded replicas
of some components during or after cluster update with a disabled machine.
These replicas become available as soon as you replace the disabled
machine.
When a machine is disabled, some services may switch to the NodeReady
state and may require additional actions to unblock LCM tasks.
A disabled machine is removed from the overall cluster status and is labeled
as Disabled. The requested node number for the cluster remains
the same, but an additional disabled field is displayed with the number
of disabled nodes.
A disabled machine is not taken into account for any calculations,
for example, when the number of StackLight nodes is required for some
restriction check.
MOSK removes the node running the disabled machine from
the Kubernetes cluster.
Deletion of the disabled machine with the graceful deletion policy is
not allowed. Use the unsafe deletion policy instead.
For details, see Delete a cluster machine.
For a major cluster update, the Cluster release of a disabled machine must
match the Cluster release of other cluster machines.
If a machine is disabled during the major Cluster release update, then the
upgrade should be completed if all other requirements are met. However,
cluster update to the next available major Cluster release will be blocked
until you re-enable or replace the disabled machine.
Patch updates do not have such limitation on different patch Cluster
releases. You can update a cluster with a disabled machine to several patch
Cluster releases in the scope of one major Cluster release.
After enabling the machine, it will be updated to match the Cluster release
of the corresponding cluster, including all related components.
For Ceph machines, you need to perform additional disablement steps.
Disable a machine using the Container Cloud web UI
Carefully read the precautions for
machine disablement.
Power off the underlying host of a machine to be disabled.
Warning
If the underlying host of a machine is not powered off, the
cluster may still contain the disabled machine in the list of available
nodes with kubelet attempting to start the corresponding containers
on the disabled machine.
Therefore, Mirantis strongly recommends powering off the underlying host
to prevent manual removal of the related Kubernetes node from the Docker
Swarm cluster using the MKE web UI.
In the Clusters tab, click the required cluster name to
open the list of machines running on it.
Click the More action icon in the last column of the required
machine and click Disable.
Wait until the machine Status switches to Disabled.
If the disabled machine contains StackLight or Ceph, migrate these services
to a healthy machine:
Verify that the required disabled and healthy machines are not currently
added to GracefulRebootRequest:
Note
Machine configuration changes, such as reassigning Ceph and
StackLight labels from a disabled machine to a healthy one, which
are described in the following steps, are not allowed during graceful
reboot. For details, see Perform a graceful reboot of a cluster.
Verify that the More > Reboot machines option is not
disabled. If the option is active, skip the following sub-step and
proceed to the next step. If the option is disabled, proceed to the
following sub-step.
Using the Container Cloud CLI, verify that the new machine, which you
are going to use for StackLight or Ceph services migration, is not
included in the list of the GracefulRebootRequest resource.
Otherwise, remove GracefulRebootRequest before proceeding.
For details, see Disable a machine using the Container Cloud CLI.
Note
Since Container Cloud 2.27.0 (Cluster releases 17.2.0 and
16.2.0), reboot of the disabled machine is automatically skipped in
GracefulRebootRequest.
If StackLight is deployed on the machine, unblock LCM tasks by moving the
stacklight=enabled label to another healthy machine with a sufficient
amount of resources and manually remove StackLight Pods along with related
local persistent volumes from the disabled machine. For details, see
Deschedule StackLight Pods from a worker machine.
If Ceph is deployed on the machine:
Disable a Ceph machine
Select one of the following options to open the Ceph cluster spec:
Web UI
In the CephClusters tab, click the required Ceph cluster
name to open its spec.
For mgr, rgw, or mds, move the role to another node
located in the node section. Such node must meet the resource
requirements to run the corresponding daemon type and must not have the
respective role assigned yet.
For mon, refer to Move a Ceph Monitor daemon to another node for further instructions.
Mirantis recommends considering nodes with sufficient resources to run
the moved monitor daemon.
Carefully read the precautions for
machine disablement.
Power off the underlying host of a machine to be disabled.
Warning
If the underlying host of a machine is not powered off, the
cluster may still contain the disabled machine in the list of available
nodes with kubelet attempting to start the corresponding containers
on the disabled machine.
Therefore, Mirantis strongly recommends powering off the underlying host
to prevent manual removal of the related Kubernetes node from the Docker
Swarm cluster using the MKE web UI.
Open the required Machine object for editing.
In the providerSpec:value section, set disable to true:
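For example:

spec:
  providerSpec:
    value:
      disable: true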
If the disabled machine contains StackLight or Ceph, migrate these services
to a healthy machine:
Verify that the required disabled and healthy machines are not currently
added to GracefulRebootRequest:
Note
Machine configuration changes, such as reassigning Ceph and
StackLight labels from a disabled machine to a healthy one, which
are described in the following steps, are not allowed during graceful
reboot. For details, see Perform a graceful reboot of a cluster.
Since Container Cloud 2.27.0 (Cluster releases 17.2.0 and
16.2.0), reboot of the disabled machine is automatically skipped in
GracefulRebootRequest.
If StackLight is deployed on the machine, unblock LCM tasks by moving the
stacklight=enabled label to another healthy machine with a sufficient
amount of resources and manually remove StackLight Pods along with related
local persistent volumes from the disabled machine. For details, see
Deschedule StackLight Pods from a worker machine.
If Ceph is deployed on the machine:
Disable a Ceph machine
Select one of the following options to open the Ceph cluster spec:
Web UI
In the CephClusters tab, click the required Ceph cluster
name to open its spec.
For mgr, rgw, or mds, move the role to another node
located in the node section. Such node must meet the resource
requirements to run the corresponding daemon type and must not have the
respective role assigned yet.
For mon, refer to Move a Ceph Monitor daemon to another node for further instructions.
Mirantis recommends considering nodes with sufficient resources to run
the moved monitor daemon.
Available since MCC 2.23.x (Cluster releases 12.7.0 and 11.7.0)
You can perform a graceful reboot on a management or managed cluster. Use
the procedure below to cordon, drain, and reboot the required cluster machines
using a rolling reboot without interrupting workloads. The procedure is also
useful for a bulk reboot of machines, for example, on large clusters.
The reboot occurs in the order of cluster upgrade policy that you can change
for managed clusters as described in Change the upgrade order of a machine.
Caution
The cluster and machines must have the Ready status to
perform a graceful reboot.
Perform a rolling reboot of a cluster using web UI
Available since MCC 2.24.x (Cluster release 14.0.1 and 15.0.1)
Log in to the Container Cloud web UI with the m:kaas:namespace@operator or
m:kaas:namespace@writer permissions.
Switch to the required project using the Switch Project
action icon located on top of the main left-side navigation panel.
On the Clusters page, verify that the status of the required
cluster is Ready. Otherwise, the Reboot machines
option is disabled.
Click the More action icon in the last column of the required
cluster and select Reboot machines. Confirm the selection.
Note
While a graceful reboot is in progress, the
Reboot machines option is disabled.
Machine configuration changes are forbidden during graceful
reboot. Therefore, either wait until reboot is completed or cancel it using
CLI, as described in the following section.
In spec:machines, add the machine list or leave it empty to reboot all
cluster machines.
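A hedged example of the object, assuming the kaas.mirantis.com API group and
that the object name matches the cluster name:

apiVersion: kaas.mirantis.com/v1alpha1
kind: GracefulRebootRequest
metadata:
  name: <clusterName>
  namespace: <projectName>
spec:
  machines:              # leave the list empty to reboot all cluster machines
  - <machineName1>
  - <machineName2>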
Wait until all specified machines are rebooted. You can monitor the reboot
status of the cluster and machines using the Conditions:GracefulReboot
fields of the corresponding Cluster and Machine objects.
The GracefulRebootRequest object is automatically deleted once the
reboot on all target machines completes.
Machine configuration changes are forbidden during graceful
reboot.
In emergency cases, for example, to migrate StackLight or Ceph services
from a disabled machine that fails during graceful reboot and blocks the
process, cancel the reboot by deleting the GracefulRebootRequest object:
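For example, assuming the object is named after the cluster:

kubectl -n <projectName> delete gracefulrebootrequest <clusterName>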
Once you migrate StackLight or Ceph services to another machine and disable it,
re-create the GracefulRebootRequest object for the remaining machines
that require reboot.
Before deleting a cluster machine, carefully read the following essential
information for a successful machine deletion:
Mirantis recommends deleting cluster machines using the Container Cloud web UI
or API instead of using the provider tools directly. Otherwise, the cluster
deletion or detachment may hang and additional manual steps will be required
to clean up machine resources.
An operational managed cluster must contain a minimum of 3 Kubernetes manager
machines to meet the etcd quorum and 2 Kubernetes worker machines.
The deployment of the cluster does not start until the minimum
number of machines is created.
A machine with the manager role is automatically deleted during the cluster
deletion. Manual deletion of manager machines is allowed only for the purpose
of node replacement or recovery.
Consider the following precautions before deleting manager machines:
Create a new manager machine to replace the deleted one as soon as
possible. This is necessary because after machine removal, the cluster
has limited capabilities to tolerate faults. Deletion of manager machines
is intended only for replacement or recovery of failed nodes.
You can delete a manager machine only if your cluster has at least
two manager machines in the Ready state.
Do not delete more than one manager machine at once to prevent cluster
failure and data loss.
Before replacing a failed manager machine, make sure that all
Deployments with replicas configured to 1 are ready.
Ensure that the machine to delete is not a Ceph Monitor. Otherwise, migrate
the Ceph Monitor to keep the odd number quorum of Ceph Monitors after the
machine deletion. For details, see Migrate a Ceph Monitor before machine replacement.
If StackLight in HA mode is enabled and you are going to delete a
machine with the StackLight label:
Make sure that at least 3 machines with the StackLight label
remain after the deletion. Otherwise, add an additional machine with
such label before the deletion. After the deletion, perform the additional
steps described in the deletion procedure, if required.
Do not delete more than 1 machine with the StackLight label.
Since StackLight in HA mode uses local volumes bound to machines, the data
from these volumes on the deleted machine will be purged but its replicas
remain on other machines. Removal of more than 1 machine can cause
data loss.
If you move the StackLight label to a new worker machine on an
existing cluster, manually deschedule all StackLight components from the old
worker machine, which you remove the StackLight label from. For
details, see Operations Guide: Deschedule StackLight Pods from a worker
machine.
If the machine being deleted has a prioritized upgrade index and you want to
preserve the same upgrade order, manually set the required index to the new
node that replaces the deleted one. Otherwise, the new node is automatically
assigned the greatest upgrade index, which is prioritized last. To set the
upgrade index, refer to Change the upgrade order of a machine.
Ensure that the machine being deleted is not a Ceph Monitor. If it is,
migrate the Ceph Monitor to keep the odd number quorum of Ceph Monitors
after the machine deletion. For details, see
Migrate a Ceph Monitor before machine replacement.
Log in to the Container Cloud web UI with the m:kaas:namespace@operator or
m:kaas:namespace@writer permissions.
Click the More action icon in the last column of the machine
you want to delete and select Delete.
Select the machine deletion method:
Graceful
Recommended. The machine will be prepared for deletion with all
workloads safely evacuated. Using this option, you can cancel the
deletion before the corresponding node is removed from Docker Swarm.
Unsafe
Not recommended. The machine will be deleted without any preparation.
Forced
Not recommended. The machine will be deleted with no guarantee of
resources cleanup. Therefore, Mirantis recommends trying
Graceful or Unsafe option first.
If machine deletion fails, you can reduce the deletion policy restrictions
and try another method but in the following order only:
Graceful > Unsafe > Forced.
This section instructs you on how to scale down an existing management
or managed cluster through the Container Cloud API. To delete a
machine using the Container Cloud web UI, see Delete a cluster machine using web UI.
Using the Container Cloud API, you can delete a cluster machine using the
following methods:
Recommended. Enable the delete field in the providerSpec section of
the required Machine object. It allows aborting graceful machine
deletion before the node is removed from Docker Swarm.
Not recommended. Apply the delete request to the Machine object.
You can control machine deletion steps by following a specific machine
deletion policy.
The deletion policy of the Machine resource used in the Container Cloud
API defines specific steps occurring before a machine deletion.
The Container Cloud API contains the following types of deletion policies:
graceful, unsafe, forced. By default, the graceful deletion policy is used.
You can change the deletion policy before the machine deletion. If the
deletion process has already started, you can reduce the deletion policy
restrictions in the following order only: graceful > unsafe > forced.
During a forced machine deletion, the provider and LCM controllers perform
the following steps:
Send the delete request to the corresponding Machine resource.
Remove the provider resources such as the VM instance, network, volume,
and so on. Remove the related Kubernetes resources.
Remove the finalizer from the Machine resource. This step completes
the machine deletion from Kubernetes resources.
This policy type allows deleting a Machine resource even if the provider or
LCM controller gets stuck at some step. But this policy may require a manual
cleanup of machine resources in case of a controller failure. For details, see
Delete a machine from a cluster using CLI.
Caution
Consider the following precautions applied to the forced
machine deletion policy:
Use the forced machine deletion only if either graceful or unsafe machine
deletion fails.
If the forced machine deletion fails at any step, the LCM Controller
removes the finalizer anyway.
Before starting the forced machine deletion, back up the related
Machine resource:
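For example, where <machineName> is the machine to be deleted:

kubectl -n <projectName> get machine <machineName> -o yaml > <machineName>-backup.yaml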
Log in to the host where your management cluster kubeconfig is located
and where kubectl is installed.
For the bare metal provider, ensure that the machine being deleted is not
a Ceph Monitor. If it is, migrate the Ceph Monitor to keep the odd number
quorum of Ceph Monitors after the machine deletion. For details, see
Migrate a Ceph Monitor before machine replacement.
Select from the following options:
Recommended. In the providerSpec.value section of the Machine
object, set delete to true:
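For example:

spec:
  providerSpec:
    value:
      delete: true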
Since the management cluster update to 16.4.0 (MCC 2.29.0)
Caution
While the Cluster release of the management cluster is 16.4.0,
BareMetalHostInventory operations are allowed to
m:kaas@management-admin only. This limitation is lifted once the
management cluster is updated to the Cluster release 16.4.1 or later.
Due to a development limitation in baremetal-operator,
deletion of a managed cluster requires preliminary deletion
of the worker machines running on the cluster.
Warning
Mirantis recommends deleting cluster machines using the Container Cloud web UI
or API instead of using the provider tools directly. Otherwise, the cluster
deletion or detachment may hang and additional manual steps will be required
to clean up machine resources.
Using the Container Cloud web UI, first delete worker
machines one by one until you hit the minimum of 2 workers
for an operational cluster. After that, you can delete the cluster
with the remaining workers and managers.
To delete a baremetal-based managed cluster:
Log in to the Container Cloud web UI with the m:kaas:namespace@operator or
m:kaas:namespace@writer permissions.
Switch to the required project using the Switch Project
action icon located on top of the main left-side navigation panel.
In the Clusters tab, click the required cluster name
to open the list of machines running on it.
Click the More action icon in the last column of the
worker machine you want to delete and select Delete.
Confirm the deletion.
Repeat the step above until you have 2 workers left.
In the Clusters tab, click the More action icon
in the last column of the required cluster and select Delete.
Verify the list of machines to be removed. Confirm the deletion.
If the cluster deletion hangs and the Deleting status message
does not disappear after a while, refer to Cluster deletion or detachment freezes
to fix the issue.
Optional. Available since MOSK 23.2. If you do not plan
to reuse the bare metal hosts of the deleted cluster, delete them:
In the BM Hosts tab, click the Delete action
icon next to the name of the host to be deleted.
Confirm the deletion.
Caution
Credentials associated with the deleted bare metal host,
if any, are deleted automatically.
Optional. Applies before MOSK 23.2. If you do not need
credentials associated with the bare metal hosts of the deleted cluster,
delete them manually:
Deleting a cluster automatically frees up the resources allocated
for this cluster, for example, instances, load balancers, networks,
floating IPs, and so on.
This section instructs you on how to verify MOSK cluster
status using the Container Cloud web UI during cluster deployment or day-2
operations such as cluster update, maintenance, and so on.
To monitor the cluster readiness, hover over the status icon of a specific
cluster in the Status column of the Clusters page.
Once the orange blinking status icon becomes green and Ready,
the cluster deployment or update is complete.
You can monitor live deployment status of the following cluster components:
Component
Description
Helm
Installation or upgrade status of all Helm releases
Kubelet
Readiness of the node in a Kubernetes cluster, as reported by kubelet
Kubernetes
Readiness of all requested Kubernetes objects
Nodes
Equality of the requested nodes number in the cluster to the number
of nodes having the Ready LCM status
OIDC
Readiness of the cluster OIDC configuration
StackLight
Health of all StackLight-related objects in a Kubernetes cluster
Swarm
Readiness of all nodes in a Docker Swarm cluster
LoadBalancer
Readiness of the Kubernetes API load balancer
ProviderInstance
Readiness of all machines in the underlying infrastructure
Graceful Reboot
Readiness of a cluster during a scheduled graceful reboot,
available since Container Cloud 2.24.0 (Cluster releases 15.0.1 and 14.0.0).
Infrastructure Status
Available since Container Cloud 2.25.0 (Cluster releases 17.0.0 and 16.0.0).
Readiness of the MetalLBConfig object along with MetalLB and DHCP
subnets.
LCM Operation
Available since Container Cloud 2.26.0 (Cluster releases 17.1.0 and
16.1.0). Health of all LCM operations on the cluster and its machines.
LCM Agent
Available since Container Cloud 2.27.0 (Cluster releases 17.2.0 and
16.2.0). Health of all LCM agents on cluster machines and the status of
LCM agents update to the version from the current Cluster release.
This section instructs you on how to verify machine status of a
MOSK cluster using the Container Cloud web UI during cluster
deployment or day-2 operations such as cluster update, maintenance, and so on.
The machine creation starts with the Provision status. During provisioning,
the machine is not expected to be accessible since its infrastructure (VM,
network, and so on) is being created.
Other machine statuses are the same as the LCMMachine object states:
Uninitialized - the machine is not yet assigned to an LCMCluster.
Pending - the agent reports a node IP address and host name.
Prepare - the machine executes StateItems that correspond
to the prepare phase. This phase usually involves downloading
the necessary archives and packages.
Deploy - the machine executes StateItems that correspond
to the deploy phase, that is, the machine is becoming a Mirantis Kubernetes
Engine (MKE) node.
Ready - the machine is deployed.
Upgrade - the machine is being upgraded to the new MKE version.
Reconfigure - the machine executes StateItems that correspond
to the reconfigure phase. The machine configuration is being updated
without affecting workloads running on the machine.
Once the status changes to Ready, the deployment of the cluster
components on this machine is complete.
To monitor the deploy or update live status of the machine, use one of the
following options:
Quick status
On the Clusters page, in the Managers or
Workers column. The green status icon indicates that the machine
is Ready, the orange status icon indicates that the machine is
Updating.
Detailed status
In the Machines section of a particular cluster page, in the
Status column. Hover over a particular machine status icon to
verify the deploy or update status of a specific machine component.
You can monitor the status of the following machine components:
Component
Description
Kubelet
Readiness of a node in a Kubernetes cluster.
Swarm
Health and readiness of a node in a Docker Swarm cluster.
LCM
LCM readiness status of a node.
ProviderInstance
Readiness of a node in the underlying bare metal infrastructure.
Graceful Reboot
Readiness of a machine during a scheduled graceful reboot of a cluster,
available since Container Cloud 2.24.x (Cluster releases 15.0.1 and
14.0.0).
Infrastructure Status
Available since Container Cloud 2.25.0 (Cluster releases 17.0.0 and
16.0.0). Readiness of the IPAMHost, L2Template,
BareMetalHost, and BareMetalHostProfile objects associated with
the machine.
LCM Operation
Available since Container Cloud 2.26.0 (Cluster releases 17.1.0 and
16.1.0). Health of all LCM operations on the machine.
LCM Agent
Available since Container Cloud 2.27.0 (Cluster releases 17.2.0 and
16.2.0). Health of the LCM Agent on the machine and the status of the
LCM Agent update to the version from the current Cluster release.
Available since MCC 2.24.0 (14.0.0, 14.0.1, 15.0.1). TechPreview
You can set the maximum transmission unit (MTU) size for Calico in the
Cluster object using the calico.mtu parameter. By default, the MTU
size for Calico is 1450 bytes. You can change it regardless of the host
operating system.
The following configuration example of the Cluster object covers a use
case where the interface MTU size of the workload network, which is the
smallest value across cluster nodes, is set to 9000 and the use of
WireGuard is not expected:
spec:
  ...
  providerSpec:
    value:
      ...
      calico:
        mtu: 8950
Note
Since Container Cloud 2.29.0 (Cluster releases 17.4.0 and 16.4.0),
WireGuard is deprecated. If you still require the feature, contact
Mirantis support for further information.
Caution
If you do not expect to use WireGuard encryption, ensure that
the MTU size for Calico is at least 50 bytes smaller than the interface MTU
size of the workload network. IPv4 VXLAN uses a 50-byte header.
Warning
Mirantis does not recommend changing this parameter on a running
cluster. Such a change leads to sequential draining of nodes and
re-installation of packages, as during a cluster upgrade.
Add or update a CA certificate for a MITM proxy using API¶
Note
For managed clusters, this feature is available since
MOSK 23.1.
When you enable a man-in-the-middle (MITM) proxy access to a managed cluster,
your proxy requires a trusted CA certificate. This section describes how to
manually add the caCertificate field to the spec section
of the Proxy object. You can also use this instruction to update an
expired certificate on an existing cluster.
You can also add a CA certificate for a MITM proxy using the Container Cloud
web UI through the Proxies tab. For details, refer to the cluster
creation procedure as described in Create a managed bare metal cluster.
Warning
Any modification to the Proxy object, for example, changing
the proxy URL, NO_PROXY values, or certificate, leads to cordon-drain
and Docker restart on the cluster machines.
To add or update a CA certificate for a MITM proxy using API:
Encode your proxy CA certificate. For example:
cat ~/.mitmproxy/mitmproxy-ca-cert.cer | base64 -w0
Replace ~/.mitmproxy/mitmproxy-ca-cert.cer with the path to your CA
certificate file.
Open the existing Proxy object for editing:
Warning
The kubectl apply command automatically saves the
applied data as plain text into the
kubectl.kubernetes.io/last-applied-configuration annotation of the
corresponding object. This may result in revealing sensitive data in this
annotation when creating or modifying the object.
Therefore, do not use kubectl apply on this object.
Use kubectl create, kubectl patch, or
kubectl edit instead.
If you used kubectl apply on this object, you
can remove the kubectl.kubernetes.io/last-applied-configuration
annotation from the object using kubectl edit.
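As a hedged illustration of this step, you can open the object with kubectl edit and set the caCertificate field in its spec to the value encoded above. The kubectl resource name, object name, and project namespace below are placeholders, not values from this guide:
kubectl -n <projectName> edit proxy <proxyName>
spec:
  ...
  caCertificate: <base64-encoded CA certificate>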
Save the Proxy object and proceed with the managed cluster creation.
If you update an expired certificate on an existing managed cluster,
wait until the machines switch from the Reconfigure to Ready state
to apply changes.
Configure TLS certificates for cluster applications¶
TechPreview
The Container Cloud web UI and StackLight endpoints are available
through Transport Layer Security (TLS) with self-signed certificates
generated by the Container Cloud provider.
Caution
The Container Cloud endpoints are available only through HTTPS.
Supported applications for TLS certificates configuration¶
Application name
Cluster Type
Comment
Container Cloud web UI
Management
iam-proxy
Management and managed
Available since Container Cloud 2.22.0 (Cluster release 11.6.0).
Keycloak
Management
mcc-cache
Management
Caution
The organization administrator must ensure that the application
host name is resolvable within and outside the cluster.
Caution
Custom TLS certificates for Keycloak are supported for new and
existing clusters originally deployed using MOS 21.3 or later.
Workflow of custom MKE certificates configuration¶
Available since 2.24.0 (Cluster releases 14.0.0, 14.0.1)Applies to management clusters only
When you add custom MKE certificates on a management cluster, the following
workflow applies:
LCM agents are notified to connect to the management cluster using a
different certificate.
After all agents confirm that they are ready to support both current and
custom authentication, new MKE certificates apply.
LCM agents switch to the new configuration as soon as it becomes valid.
The next cluster reconciliation reconfigures helm-controller for each
managed cluster created within the configured management cluster.
If MKE certificates are applied to the management cluster, the Container Cloud
web UI is reconfigured.
Caution
The Container Cloud web UI requires up to 10 minutes to update the
MKE certificate configuration for communication with the management cluster.
During this time, requests to the management cluster fail with the following
example error:
This error is expected and disappears once new certificates apply.
Warning
During certificates application, LCM agents from every node must
confirm that they have a new configuration prepared. If managed clusters
contain a large number of nodes and some are stuck or orphaned, the whole
process gets stuck. Therefore, before applying new certificates, make sure
that all nodes are ready.
Warning
If you apply MKE certificates to the management cluster with
proxy enabled, all nodes and pods of this cluster and its managed clusters
are triggered for reconfiguration and restart, which may cause the API and
workload outage.
Obtain the DNS name to use for the application. For example,
container-cloud-auth.example.com.
Buy or generate a certificate from a certification authority (CA)
that contains the following items:
A full CA bundle including the root and all intermediate CA certificates.
Your server certificate issued for the
container-cloud-auth.example.com DNS name.
Your secret key that was used to sign the certificate signing request.
For example, cert.key.
Select the root CA certificate from your CA bundle and add it to
root_ca.crt.
Combine all certificates including the root CA, intermediate CA from the
CA bundle, and your server certificate into one file. For example,
full_chain_cert.crt.
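For illustration, a minimal shell sketch of the last step, assuming example file names server.crt, intermediate.crt, and root_ca.crt that are not taken from this guide:
# The server certificate must be first in the chain
cat server.crt intermediate.crt root_ca.crt > full_chain_cert.crt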
Configure TLS certificates using the Container Cloud web UI¶
Available since MCC 2.24.0 (14.0.0, 14.0.1)
Log in to the Container Cloud web UI with the m:kaas:namespace@operator or
m:kaas:namespace@writer permissions.
Switch to the required project using the Switch Project
action icon located on top of the main left-side navigation panel.
In the Clusters tab, click the More action icon
in the last column of the required cluster and select
Configure cluster.
In the Security > TLS Certificates section, click
Add certificate.
In the wizard that opens, fill out and save the form:
Parameter
Description
Server name
Host name of the application.
Applications
Drop-down list of available applications for TLS certificates configuration.
Server certificate
Certificate to authenticate the identity of the server to a client.
You can also add a valid certificate bundle. The server certificate
must be on the top of the chain.
Private key
Private key for the server that must correspond to the public key used
in the server certificate.
CA Certificate
CA certificate that issued the server certificate. Required when
configuring Keycloak or mcc-cache. Use the top-most
intermediate certificate if the CA certificate is unavailable.
The Security section displays the expiration date and the
readiness status for every application with user-defined certificates.
Optional. Edit the certificate using the Edit action icon
located to the right of the application status and edit the form filled out
in the previous step.
Note
To revoke a certificate, use the Delete action icon
located to the right of the application status.
Configure TLS certificates using the Container Cloud API¶
For clusters originally deployed using a MOS release earlier than 21.3,
download the latest version of the bootstrap script on the management
cluster:
The self-signed certificates generated and managed by the Container Cloud
provider are stored in *-tls-certs secrets in the kaas and
stacklight namespaces.
MOSK provides automatic renewal of certificates for internal
Container Cloud services. Custom certificates require manual renewal.
If you have permissions to view the default project in the Container Cloud
web UI, you may see the Certificate Is Expiring Soon warning for
custom certificates. The warning appears on top of the Container Cloud web UI.
It displays the certificate with the least number of days before expiration.
Click See Details and get more information about other expiring
certificates. You can also find the details about the expiring certificates in
the Status column’s Certificate Issues tooltip on the
Clusters page.
The Certificate Issues status may include the following messages:
Some certificates require manual renewal
A custom certificate is expiring in less than seven days. Renew the
certificate manually using the same container-cloud binary as
for the certificate configuration. For details, see
Configure TLS certificates using the Container Cloud API.
Some certificates were not renewed automatically
An automatic certificate renewal issue. Unexpected error, contact Mirantis
support.
Define a custom CA certificate for a private Docker registry¶
This section instructs you on how to define a custom CA certificate for Docker
registry connections on your management or managed cluster using
the Container Cloud web UI or CLI.
Caution
A Docker registry that is being used by a cluster cannot be
deleted.
Define a custom CA certificate for a Docker registry using CLI¶
In the providerSpec section of the Cluster object, set the
containerRegistries field with the names list of created
ContainerRegistry resource objects:
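A minimal sketch of this setting, assuming a previously created ContainerRegistry object with the example name demo-registry:
spec:
  providerSpec:
    value:
      containerRegistries:
      - demo-registry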
Define a custom CA certificate for a Docker registry using web UI¶
Log in to the Container Cloud web UI with the m:kaas:namespace@operator or
m:kaas:namespace@writer permissions.
In the Container Registries tab, click
Add Container Registry.
In the Add new Container Registry window, define the following
parameters:
Container Registry Name
Name of the Docker registry to select during cluster creation or
post-deployment configuration.
Domain
Host name and optional port of the registry. For example,
demohost:5000.
CA Certificate
SSL CA certificate of the registry to upload or insert in plain text.
Click Create.
You can add the created Docker registry configuration to a new or existing
managed cluster as well as to an existing management cluster:
For a new managed cluster, in the Create new cluster wizard,
select the required registry name from the drop-down menu of the
Container Registry option. For details on a new cluster creation,
see Create a managed bare metal cluster.
For an existing cluster of any type, in the More menu of the
cluster, select the required registry name from the drop-down menu of the
Configure cluster > General Settings > Container Registry
option. For details on an existing managed cluster configuration,
see Change a cluster configuration.
When any MOSK component reaches its memory usage limit,
the affected pod may be killed by the OOM killer to prevent memory
leaks and further destabilization of resource distribution.
Periodic recreation of a pod killed by the OOM killer, about once a day or
week, is normal. However, if the alert frequency increases or pods cannot
start and move to the CrashLoopBackOff state, adjust the default memory
limits to fit your cluster needs and prevent interruption of critical
workloads.
When any MOSK component reaches its CPU usage limit,
StackLight raises the CPUThrottlingHigh alerts. CPU limits for
MOSK components (except the StackLight ones) were removed in
Container Cloud 2.24.0 (Cluster releases 14.0.0, 14.0.1, and 15.0.1). For
earlier versions, use the resources:limits:cpu parameter located in the
same section as the resources:limits:memory parameter of the corresponding
component.
The location of the limits key in the Cluster object differs depending on
the component. Different cluster types have different sets of components that
you can adjust limits for.
The following sections describe components that relate to a specific cluster
type with corresponding limits key location provided in configuration
examples. Limit values in the examples correspond to default values used since
Container Cloud 2.24.0 (Cluster releases 15.0.1, 14.0.1, and 14.0.0).
spec:
  providerSpec:
    value:
      helmReleases:
      - name: metallb
        values:
          controller:
            resources:
              limits:
                memory: 200Mi  # no CPU limit and 200Mi of memory limit since MCC 2.24.0 (15.0.1, 14.0.0)
                               # 200m CPU and 200Mi of memory limit since MCC 2.23.0 (11.7.0)
          speaker:
            resources:
              limits:
                memory: 500Mi  # no CPU limit and 500Mi of memory limit since MCC 2.24.0 (15.0.1, 14.0.0)
                               # 500m CPU and 500Mi of memory limit since MCC 2.23.0 (11.7.0)
The memory limits for the following components can be increased on a
management cluster in the
spec:providerSpec:value:kaas:management:helmReleases: section:
admission-controller
credentials-controller (since MCC 2.28, Cluster releases 17.3.0 and 16.3.0)
Available since MCC 2.24.4 (Cluster releases 15.0.3 and 14.0.3)
You may need to increase the default etcd storage quota that is 2 GB
if etcd runs out of space and there is no other way to clean up the storage
on your management or managed cluster.
To increase storage quota for etcd:
In the spec:providerSpec:value: section of cluster.yaml, edit the
etcd:storageQuota value:
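A minimal sketch of this change, assuming an example quota of 4Gi in the Kubernetes quantity format:
spec:
  providerSpec:
    value:
      etcd:
        storageQuota: 4Gi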
The kaas.mirantis.com/region label is removed from all
Container Cloud and MOSK objects in MOSK 24.1.
Therefore, do not add the label starting with this release. On existing
clusters updated to this release, or if the label was added manually,
Container Cloud ignores it.
Applies only to the following Cluster releases:
15.0.3 (MOSK 23.2.2) or 14.0.3
15.0.4 (MOSK 23.2.3) or 14.0.4 if you scheduled a
delayed management cluster upgrade
Available since MCC 2.24.3 (Cluster releases 15.0.2 and 14.0.2)
This section instructs you on how to enable and configure Kubernetes auditing
and profiling options for MKE using the Cluster object of your
MOSK managed or management cluster. These options enable
auditing and profiling of MKE performance with specialized debugging endpoints.
Note
You can also enable audit_log_configuration using the MKE API
with no MOSK overrides. However, if you enable the option
using the Cluster object, use the same object to disable the option.
Otherwise, if you disable the option using the MKE API, it will be
overridden by MOSK and enabled again.
You can configure the following parameters that are also defined in
the MKE configuration file:
Note
The names of the corresponding MKE options are marked with
[] in the definitions below.
level
Defines the value of [audit_log_configuration]level. Valid values
are request and metadata.
Note
For management clusters, the metadata value is set by
default since Container Cloud 2.26.0 (Cluster release 16.1.0).
includeInSupportDump
Defines the value of
[audit_log_configuration]support_dump_include_audit_logs. Boolean.
apiServer:enabled
Defines the value of [cluster_config]kube_api_server_auditing.
Boolean. If set to true but with no level set, the
[audit_log_configuration]level MKE option is set to metadata.
Note
For management clusters, this option is enabled by default
since Container Cloud 2.26.0 (Cluster release 16.1.0).
maxAge
Available since Container Cloud 2.27.0 (Cluster releases 17.2.0 and
16.2.0). Defines the value of kube_api_server_audit_log_maxage.
Integer. If not set, defaults to 30.
maxBackup
Available since Container Cloud 2.27.0 (Cluster releases 17.2.0 and
16.2.0). Defines the value of kube_api_server_audit_log_maxbackup.
Integer. If not set, defaults to 10.
maxSize
Available since Container Cloud 2.27.0 (Cluster releases 17.2.0 and
16.2.0). Defines the value of kube_api_server_audit_log_maxsize.
Integer. If not set, defaults to 10.
Since Container Cloud 2.26.4 (Cluster releases 17.1.4 and 16.1.4), manually
enable audit log rotation in the MKE configuration file:
Note
Since Container Cloud 2.27.0 (Cluster releases 17.2.0 and 16.2.0),
the below parameters are automatically enabled with default values along
with the auditing feature. Therefore, skip this step.
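As a hedged sketch of this step, the rotation options could look as follows in the MKE configuration file, assuming the TOML layout of the [cluster_config] section named above; the values shown are the defaults listed earlier:
[cluster_config]
  kube_api_server_audit_log_maxage = 30
  kube_api_server_audit_log_maxbackup = 10
  kube_api_server_audit_log_maxsize = 10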
The section covers the limitations of Mirantis OpenStack for Kubernetes
(MOSK).
[3544] Due to a community issue,
Kubernetes pods may occasionally not be rescheduled on the nodes that are in
the NotReady state. As a workaround, manually reschedule the pods from
the node in the NotReady state using the
kubectl drain --ignore-daemonsets --force <node-uuid> command.
While operating your management or managed cluster, you may require
collecting and inspecting the cluster logs to analyze cluster events or
troubleshoot issues. For bootstrap logs, see Collect the bootstrap logs.
To collect cluster logs:
Verify that the bootstrap directory is updated.
Select from the following options:
For clusters deployed using Container Cloud 2.11.0 or later:
For clusters deployed using the Container Cloud release earlier than 2.11.0
or if you deleted the kaas-bootstrap folder, download and run
the Container Cloud bootstrap script:
Obtain kubeconfig of the required cluster. The management cluster
kubeconfig file is created during the last stage of the management
cluster bootstrap. To obtain a managed cluster kubeconfig, see
Connect to a MOSK cluster.
Obtain the private SSH key of the required cluster:
For a managed cluster, this is an SSH key added in the Container Cloud
web UI before the managed cluster creation.
For a management cluster, ssh_key is created in the same directory
as the bootstrap script during cluster bootstrap.
Note
If the initial version of your management cluster was earlier than
2.6.0, ssh_key is named openstack_tmp and is located at ~/.ssh/.
Depending on the cluster type that you require logs from, run the
corresponding command:
Substitute the parameters enclosed in angle brackets with the corresponding
values of your cluster.
Optional flags:
--output-dir
Directory path to save logs. The default value is logs/.
For example, logs/<clusterName>/events.log.
--extended
Output the extended version of logs that contains system and MKE logs,
logs from LCM Ansible and LCM Agent along with cluster events and
Kubernetes resources description and logs.
Without the --extended flag, the basic version of logs is collected, which
is sufficient for most use cases. The basic version of logs contains all
events, Kubernetes custom resources, and logs from all cluster components.
This version does not require passing --key-file.
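For illustration only, a hedged sketch of such an invocation; the collect logs subcommand form and the kubeconfig flag are assumptions, while --key-file, --output-dir, and --extended are the flags described above:
./container-cloud collect logs \
  --kubeconfig <pathToClusterKubeconfig> \
  --key-file <pathToPrivateSSHKey> \
  --output-dir logs/ \
  --extended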
If you require logs of a cluster update, inspect the following folders
on the control plane nodes:
/objects/namespaced/<namespaceName>/core/pods/lcm-lcm-controller-<controllerID>/ for the lcm-controller logs.
/objects/namespaced/<namespaceName>/core/pods/<providerName-ID>/
for logs of the provider controller. For example,
baremetal-provider-5b96fb4fd6-bhl7g.
/system/mke/<controllerMachineName>/ (or
/system/<controllerMachineName>/mke/) for the MKE support dump.
The dsinfo/dsinfo.txt file contains Docker and system information
of the MKE configuration set before and after update.
events.log for cluster events logs.
Technology Preview. Assess the Ironic pod logs:
Extract the content of the 'message' fields from every log message:
The syslog container collects logs generated by Ansible during the node
deployment and cleanup and outputs them in the JSON format.
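A hedged sketch of such extraction, assuming the Ironic pod runs in the kaas namespace and the logs are read from its syslog container; adjust the names to your deployment:
kubectl -n kaas logs <ironicPodName> -c syslog | jq -r '.message'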
Compress the collected log files and send the archive to the Mirantis
support team.
Inspect the history of a cluster and machine deployment or update¶
Available since MCC 2.22.0 (11.6.0)
Using the ClusterDeploymentStatus, ClusterUpgradeStatus,
MachineDeploymentStatus, and MachineUpgradeStatus objects, you
can inspect historical data of cluster and machine deployment or update
stages, their time stamps, statuses, and failure messages, if any.
Caution
The order of cluster and machine update stages may not always
be sorted by a time stamp but have an approximate logical order due to
several components running simultaneously.
Log in to the Container Cloud web UI with the m:kaas:namespace@operator or
m:kaas:namespace@writer permissions.
Switch to the required project using the Switch Project
action icon on top of the main left-side navigation panel.
In the Clusters tab, click the More action icon
in the last column of the required cluster area and select
History to display details of the ClusterDeploymentStatus
or ClusterUpgradeStatus object, if any.
In the window that opens, click the required object to display the
object stages, their time stamps, and statuses.
Object names match the initial and/or target Cluster release versions
and MKE versions of the cluster at a specific date and time. For example,
11.6.0+3.5.5 (initial version) or
11.5.0+3.5.5 -> 11.6.0+3.5.5.
If any stage fails, hover over the Failure status field to
display the failure message.
Optional. Inspect the deployment and update status of the cluster machines:
In the Clusters tab, click the required cluster name. The
cluster page with Machines list opens.
Click More action icon in the last column of the required
machine area and select History.
kind: ClusterDeploymentStatus
metadata:
  annotations:
    lcm.mirantis.com/new-history: "true"
  creationTimestamp: "2022-12-13T15:25:49Z"
  name: test-managed
  namespace: default
  ownerReferences:
  - apiVersion: cluster.k8s.io/v1alpha1
    kind: Cluster
    name: test-managed
release: 11.6.0+3.5.5
stages:
- message: ""
  name: Network prepared
  status: Success
  success: true
  timestamp: "2022-12-13T15:27:19Z"
- message: ""
  name: Load balancers created
  status: Success
  success: true
  timestamp: "2022-12-13T15:27:56Z"
- message: ""
  name: IAM objects created
  status: Success
  success: true
  timestamp: "2022-12-13T15:27:21Z"
- message: ""
  name: Kubernetes API started
  status: Success
  success: true
  timestamp: "2022-12-13T15:57:05Z"
- message: ""
  name: Helm-controller deployed
  status: Success
  success: true
  timestamp: "2022-12-13T15:57:13Z"
- message: ""
  name: HelmBundle created
  status: Success
  success: true
  timestamp: "2022-12-13T15:57:15Z"
- message: ""
  name: Certificates configured
  status: Success
  success: true
  timestamp: "2022-12-13T15:58:29Z"
- message: ""
  name: All machines of the cluster are ready
  status: Success
  success: true
  timestamp: "2022-12-13T16:04:49Z"
- message: ""
  name: OIDC configured
  status: Success
  success: true
  timestamp: "2022-12-13T16:04:07Z"
- message: ""
  name: Cluster is ready
  status: Success
  success: true
  timestamp: "2022-12-13T16:14:04Z"
Object names match the initial and/or target Cluster release versions
and MKE versions of the cluster. For example,
11.5.0+3.5.5 (initial version) or
11.5.0+3.5.5 -> 11.6.0+3.5.5. Each object displays
the update stages, their time stamps, and statuses. If any stage fails,
the success field contains a failure message.
If you delete managed cluster nodes not using the Container Cloud web UI
or API, the cluster deletion or detachment may hang with the Deleting
message remaining in the cluster status.
To apply the issue resolution:
Expand the menu of the tab with your username.
Click Download kubeconfig to download kubeconfig
of your management cluster.
Log in to any local machine with kubectl installed.
The ‘database space exceeded’ error on large clusters¶
Occasionally, cluster update may get stuck on large clusters running 500+
nodes along with 15k+ pods due to the etcd database overflow. The following
error occurs every time when accessing the Kubernetes API server:
etcdserver: mvcc: database space exceeded
Normally, kube-apiserver actively compacts the etcd database. In
rare cases, it is required to manually compact the etcd database as described
below, for example, during rapid creation of numerous Kubernetes objects.
Once done, Mirantis recommends that you identify the root cause of the issue
and clean up unnecessary resources to prevent manual etcd compacting and
defragmentation in future.
To apply the issue resolution:
Since Container Cloud 2.24.0 (Cluster release 14.0.0)
Open an SSH connection to any controller node.
Execute the following script to compact and defragment the etcd database:
sudo -i
compact_etcd.sh
defrag_etcd.sh
Before Container Cloud 2.24.0 (Cluster release 14.0.0)
If auditd contains a lot of rules, it may generate a lot of events and overrun
the buffer. Therefore, verify and update your preset and custom rules. Preset
rules are defined as presetRules, custom rules are defined as follows:
customRules
customRulesX32
customRulesX64
To verify and update the rules:
In the Cluster object of the affected cluster, verify that the
presetRules string does not start with the ! symbol.
Verify all audit rules:
Log in through SSH or directly using the console to the node having the
buffer overrun symptoms.
Run the following command:
auditctl -l
In the system response, identify the rules to exclude.
In /etc/audit/rules.d, find the files containing the rules to
exclude.
If the file is named 60-custom.rules, remove the rules from
any of the following parameters located in the Cluster object:
customRules
customRulesX32
customRulesX64
If the file is named 50-<NAME>.rules, and you want to exclude
all rules from that file, exclude the preset named <NAME>
from the list of allowed presets defined under presetRules in the
Cluster object.
If the file is named 50-<NAME>.rules, and you want to exclude only
several rules from that file:
Copy the rules you want to keep to one of the following parameters
located in the Cluster object:
customRules
customRulesX32
customRulesX64
Exclude the preset named <NAME> from the list of allowed
presets.
By default, the backlog buffer size is set to 8192, which is enough for most
use cases. To prevent buffer overrun, you can adjust the default value to fit
your needs. But keep in mind that increasing this value leads to higher memory
requirements because the buffer uses RAM.
To estimate RAM requirements for the buffer, you can use the following
calculation:
A buffer of 8192 audit records uses ~70 MiB of RAM
A buffer of 15000 audit records uses ~128 MiB of RAM
To change the backlog buffer size, adjust the backlogLimit value in the
Cluster object of the affected cluster.
You may also want to change the size directly on the system and verify
the result at once. But to change the size permanently, use the Cluster
object.
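As a hedged sketch only, the setting could look similar to the following in the Cluster object; the exact nesting of the auditd parameters under providerSpec.value is an assumption, so verify it against your cluster specification:
spec:
  providerSpec:
    value:
      audit:
        auditd:
          backlogLimit: 15000  # example value; the default is 8192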
To adjust the size of the backlog buffer on a node:
Log in to the affected node through SSH or directly through the console.
If enabledAtBoot is enabled, adjust the audit_backlog_limit value
in kernel options:
List grub configuration files where GRUB_CMDLINE_LINUX is defined:
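For example, a simple sketch of such a search; the GRUB configuration paths below are typical Ubuntu locations and may differ on your nodes:
grep -r GRUB_CMDLINE_LINUX /etc/default/grub /etc/default/grub.d/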
In each file obtained in the previous step, edit the
GRUB_CMDLINE_LINUX string by changing the integer value after
audit_backlog_limit= to the desired value.
In /etc/audit/rules.d/audit.rules, adjust the buffer size by editing
the integer value after -b.
Select from the following options:
If the auditd configuration is not immutable, restart the auditd
service:
systemctl restart auditd.service
If the auditd configuration is immutable, reboot the node. The auditd
configuration is immutable if any of the following conditions are met:
In the auditctl -s output, the enabled parameter is set
to 2
The -e2 flag is defined explicitly in parameters of any custom
rule
The immutable preset is defined explicitly
The virtual preset all is enabled and the immutable preset
is not excluded explicitly
Caution
Arrange the time to reboot the node according to your
maintenance schedule. For the exact reboot procedure, use your
maintenance policies.
If the backlog limit exceeded message disappears, adjust the size
permanently using the backlogLimit value in the Cluster object.
The issue may occur because the default Docker network address
172.17.0.0/16 and/or the kind Docker network used by kind
overlap with your cloud address or other addresses of the network
configuration.
To apply the issue resolution:
Log in to your local machine.
Verify routing to the IP addresses of the target cloud endpoints:
Obtain the IP address of your target cloud. For example:
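A hedged sketch of this check, using an example endpoint host name that is not taken from this guide:
# Resolve the target cloud endpoint
nslookup cloud.example.com
# Verify which route and source interface are used to reach the resolved IP
ip route get <cloudEndpointIP>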
If the routing is incorrect, change the IP address of the default Docker
bridge:
Create or edit /etc/docker/daemon.json by adding the "bip"
option:
{"bip":"192.168.91.1/24"}
Restart the Docker daemon:
sudo systemctl restart docker
If required, customize addresses for your kind Docker network or any
other additional Docker networks:
Remove the kind network:
docker network rm 'kind'
Select one of the following options:
Configure /etc/docker/daemon.json:
Note
The following steps are applied to customize addresses
for the kind Docker network. Use these steps as an example for
any other additional Docker networks.
Add the following section to /etc/docker/daemon.json:
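As a hedged illustration only, such a section could define custom address pools for user-defined networks; the default-address-pools option and the address range below are assumptions rather than values from this guide:
{
  "default-address-pools": [
    { "base": "192.168.92.0/24", "size": 24 }
  ]
}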
Docker pruning removes user-defined networks,
including 'kind'. Therefore, after running the Docker
pruning commands, re-create the 'kind' network using the
command above.
If the BootstrapRegion object is in the Error state, find the error
type in the Status field of the object for the following components to
resolve the issue:
Field name
Troubleshooting steps
Helm
If the bootstrap HelmBundle is not ready for a long time, for example,
during 15 minutes in case of an average network bandwidth, verify
statuses of non-ready releases and resolve the issue depending
on the error message of a particular release:
The deployment statuses of a Machine object are the same as the
LCMMachine object states:
Uninitialized - the machine is not yet assigned to an LCMCluster.
Pending - the agent reports a node IP address and host name.
Prepare - the machine executes StateItems that correspond
to the prepare phase. This phase usually involves downloading
the necessary archives and packages.
Deploy - the machine executes StateItems that correspond
to the deploy phase, that is, the machine is becoming a Mirantis Kubernetes
Engine (MKE) node.
Ready - the machine is deployed.
Upgrade - the machine is being upgraded to the new MKE version.
Reconfigure - the machine executes StateItems that correspond
to the reconfigure phase. The machine configuration is being updated
without affecting workloads running on the machine.
If the system response is empty, approve the BootstrapRegion object using
the Container Cloud CLI:
./container-cloud bootstrap approve all
If the system response is not empty and the status remains the same for a
while, the issue may relate to machine misconfiguration. Therefore, verify
and adjust the parameters of the affected Machine object.
If the cluster deployment is stuck on the same stage for a long time, it may
be related to configuration issues in the Machine or other deployment
objects.
To troubleshoot cluster deployment:
Identify the current deployment stage that got stuck:
The syslog container collects logs generated by Ansible during the node
deployment and cleanup and outputs them in the JSON format.
Note
Add COLLECT_EXTENDED_LOGS=true before the
collect_logs command to output the extended version of logs
that contains system and MKE logs, logs from LCM Ansible and LCM Agent
along with cluster events and Kubernetes resources description and logs.
Without the --extended flag, the basic version of logs is collected, which
is sufficient for most use cases. The basic version of logs contains all
events, Kubernetes custom resources, and logs from all cluster components.
This version does not require passing --key-file.
The logs are collected in the directory where the bootstrap script
is located.
The Container Cloud logs structure in <output_dir>/<cluster_name>/
is as follows:
/events.log
Human-readable table that contains information about the cluster events.
/system
System logs.
/system/mke (or /system/MachineName/mke)
Mirantis Kubernetes Engine (MKE) logs.
/objects/cluster
Logs of the non-namespaced Kubernetes objects.
/objects/namespaced
Logs of the namespaced Kubernetes objects.
/objects/namespaced/<namespaceName>/core/pods
Logs of the pods from a specific Kubernetes namespace. For example, logs
of the pods from the kaas namespace contain logs of Container Cloud
controllers, including bootstrap-cluster-controller
since Container Cloud 2.25.0 (Cluster releases 17.0.0 and 16.0.0).
Logs collected by the syslog container during the bootstrap phase
are not transferred to the management cluster during pivoting. These logs
are located in /volume/log/ironic/ansible_conductor.log inside the
Ironic pod.
Each log entry of the management cluster logs contains a request ID that
identifies chronology of actions performed on a cluster or machine.
The format of the log entry is as follows:
<process ID>.[<subprocess ID>...<subprocess ID N>].req:<requestID>: <logMessage>
For example, bm.machine.req:28 contains information about the task 28
applied to a bare metal machine.
Since Container Cloud 2.22.0 (Cluster release 11.6.0), the logging format has
the following extended structure for the admission-controller,
storage-discovery, and all baremetal-provider services of a management
cluster:
level
Informational level. Possible values: debug, info, warn,
error, panic.
ts
Time stamp in the <YYYY-MM-DDTHH:mm:ssZ> format. For example:
2022-11-14T21:37:23Z.
logger
Details on the process ID being logged:
<processID>
Primary process identifier. The list of possible values includes bm,
iam, license, and bootstrap.
Note
The iam and license values are available since
Container Cloud 2.23.0 (Cluster release 11.7.0). The bootstrap
value is available since Container Cloud 2.25.0 (Cluster release
16.0.0).
<subProcessID(s)>
One or more secondary process identifiers. The list of possible values
includes cluster, machine, controller, and cluster-ctrl.
Note
The controller value is available since Container Cloud
2.23.0 (Cluster release 11.7.0).
The cluster-ctrl value is available since Container Cloud
2.25.0 (Cluster release 16.0.0) for the bootstrap process
identifier.
req
Request ID number that increases when a service performs the following
actions:
Receives a request from Kubernetes about creating, updating, or
deleting an object
Receives an HTTP request
Runs a background process
The request ID allows combining all operations performed with an object
within one request. For example, the result of a Machine object
creation, update of its statuses, and so on has the same request ID.
caller
Code line used to apply the corresponding action to an object.
msg
Description of a deployment or update phase. If empty, it contains the
"error" key with a message followed by the "stacktrace" key with
stack trace details. For example:
"msg"="" "error"="Cluster nodes are not yet ready" "stacktrace": "<stack-trace-info>"
The log format of the following Container Cloud components does
not contain the "stacktrace" key for easier log handling:
baremetal-provider, bootstrap-provider, and
host-os-modules-controller.
Note
Logs may also include a number of informational key-value pairs
containing additional cluster details. For example,
"name":"object-name","foobar":"baz".
Depending on the type of issue found in logs, apply the corresponding fixes.
For example, if you detect the LoadBalancer ERROR state errors
during the bootstrap of an OpenStack-based management cluster,
contact your system administrator to fix the issue.
This section provides solutions to the issues that may occur while managing
your bare metal infrastructure.
Log in to the IPA virtual console for hardware troubleshooting¶
Container Cloud uses kernel and initramfs files with the
pre-installed Ironic Python Agent
(IPA) for inspection of server hardware. The IPA image initramfs is based
on Ubuntu Server.
If you need to troubleshoot hardware during inspection, you can use the IPA
virtual console to assess hardware logs and image configuration.
To log in to the IPA virtual console of a bare metal host:
Create the bare metal host object for the required bare metal host as
described in Add a bare metal host using CLI and wait for inspection to complete.
Caution
Meantime, do not create the Machine object for the
bare metal host being inspected to prevent automatic provisioning.
Using the pwgen utility, recover the dynamically calculated
password of the IPA image:
Remotely log in to the IPA console of the bare metal host using the
devuser user name and the password obtained in the previous step.
For example, use IPMItool, Integrated Lights-Out, or the iDRAC web UI.
Note
To assess the IPA logs, use the
journalctl -u ironic-python-agent.service command.
Bare metal hosts in ‘provisioned registration error’ state after update¶
After update of a management or managed cluster created using the Container
Cloud release earlier than 2.6.0, a bare metal host state is
Provisioned in the Container Cloud web UI while having the error state
in logs with the following message:
The issue is caused by the image URL pointing to an unavailable resource
due to the URI IP change during update. To apply the issue resolution, update
URLs for the bare metal host status and spec with the correct values
that use a stable DNS record as a host.
To apply the issue resolution:
Note
In the commands below, we update master-2 as an example.
Replace it with the corresponding value to fit your deployment.
Exit Lens.
In a new terminal, configure access to the affected cluster.
Close the terminal to quit kube-proxy and resume Lens.
Inspection error on bare metal hosts after dnsmasq restart¶
If the dnsmasq pod is restarted during the bootstrap of newly added
nodes, those nodes may fail to undergo inspection. That can result in
the inspection error state in the corresponding BareMetalHost objects.
The issue can occur when:
The dnsmasq pod was moved to another node.
DHCP subnets were changed, including addition or removal. In this case, the
dhcpd container of the dnsmasq pod is restarted.
Caution
If changing or adding of DHCP subnets is required to bootstrap
new nodes, wait after changing or adding DHCP subnets until the
dnsmasq pod becomes ready, then create BareMetalHost objects.
To verify whether the nodes are affected:
Verify whether the BareMetalHost objects contain the
inspection error:
Verify whether the dnsmasq pod was in Ready state when the
inspection of the affected baremetal hosts (test-worker-3 in the example
above) was started:
In the system response above, inspection was started at
"2024-10-11T07:38:19Z", immediately before the period of the dhcpd
container downtime. Therefore, this node is most likely affected by the
issue.
To apply the issue resolution:
Reboot the node using the IPMI reset or cycle
command.
If the node fails to boot, remove the failed BareMetalHost object and
create it again:
Remove BareMetalHost object. For example:
kubectl delete bmh -n managed-ns test-worker-3
Verify that the BareMetalHost object is removed:
kubectl get bmh -n managed-ns test-worker-3
Create a BareMetalHost object from the template. For example:
Troubleshoot an operating system upgrade with host restart¶
Mandatory host restart for the operating system (OS) upgrade is designed to be
safe and takes certain precautions to protect the user data and the cluster
integrity. However, sometimes it may result in a host-level failure and block
the cluster update. Use this section to troubleshoot such issues.
Warning
The OS upgrade cannot be rolled back on a host or cluster level.
If the OS upgrade fails, recover or remove the faulty host before you can
complete the cluster upgrade.
Caution
Depending on the cluster configuration, applying security
updates and host restart can increase the update time for each node to up to
1 hour.
Cluster nodes are updated one by one. Therefore, for large clusters,
the update may take several days to complete.
If the cluster upgrade does not start, verify whether the
ceph-clusterworkloadlock object is present in the Container Cloud
Management API:
kubectl get clusterworkloadlocks
Example of system response:
NAME                       AGE
ceph-clusterworkloadlock   7h37m
This object indicates that LCM operations that require hosts restart cannot
start on the cluster. The Ceph Controller verifies that Ceph services are
prepared for restart. Once the Ceph Controller completes verification, it
removes the ceph-clusterworkloadlock object and the cluster upgrade starts.
If this object is still present after the upgrade is initiated, assess the
logs of the ceph-controller pod to identify and fix errors:
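A hedged example of assessing these logs, assuming the ceph-controller pods run in the ceph-lcm-mirantis namespace with an app=ceph-controller label; verify the actual namespace and label on your cluster:
kubectl -n ceph-lcm-mirantis logs -l app=ceph-controller --tail=100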
If the host cannot boot after upgrade, verify the following possible issues:
Invalid boot order configuration in the host BIOS settings
Inspect the host settings using the IPMI console. If you see a message
about an invalid boot device, verify and correct the boot order in the host
BIOS settings. Set the first boot device to a network card and the second
device to a local disk (legacy or UEFI).
The host is stuck in the GRUB rescue mode
If you see the following message, you are likely affected by the
Ubuntu known issue
in the Ubuntu grub-installer:
Entering rescue mode...
grub rescue>
In this case, redeploy the host with a correctly defined
BareMetalHostProfile. You will have to delete the corresponding
Machine resource and create a new Machine with the corresponding
BareMetalHostProfile. For details, see Create MOSK host profiles.
Container Cloud relies on iPXE to remotely bootstrap bare metal machines
before provisioning them to Kubernetes clusters. The remote bootstrap with
iPXE depends on the state of the underlay network. Incorrect or suboptimal
configuration of the underlay network can cause the process to fail.
The following error may mean that network configuration is incorrect:
The reason for this error may be a network switch that does not forward
packets for a prolonged period after the server brings up a link to a switch
port. This may happen because the switch waits for the Spanning Tree Protocol
(STP) configuration on the port.
To avoid this issue, configure the ports connecting the servers in
STP portfast mode. See details in the vendor documentation for your
particular network switch, for example:
Provisioning failure due to device naming issues in a bare metal host profile¶
During a bare metal host provisioning, transition to each stage implies the
host reboot. This may cause device name issues if a device is configured
using the by_name device identifier.
In Linux, assignment of device names, for example, /dev/sda,
to physical disks can change, especially in systems with multiple
disks or when hardware configuration changes. For example:
If you add or remove a hard drive or change the boot order, the device names
can shift.
If the system uses hardware with additional disk array controllers, such as
RaidControllers in the JBOD mode, device names can shift during
reboot. This can lead to unintended consequences and potential data loss if
your file systems are not mounted correctly.
The /dev/sda partition on the first boot may become /dev/sdb on the
second boot. Consequently, your file system may not be provisioned as
expected, leading to errors during disk formatting and assembling.
Common Linux practice recommends using unique identifiers (UUIDs) or labels
for device identification in /etc/fstab. These identifiers are more stable and
ensure that the defined devices are mounted regardless of naming changes.
Therefore, to prevent device naming issues during a bare metal host
provisioning, instead of the by_name identifier, Mirantis recommends using
the workBy parameter along with device labels or filters such as
minSize and maxSize. These device settings ensure a successful
bare metal host provisioning with /dev/disk/by-uuid/<UUID> or
/dev/disk/by-label/<label> in /etc/fstab. For details on workBy,
see BareMetalHostProfile spec.
Overview of the device naming logic in a bare metal host profile¶
To manage physical devices, the bare metal provider uses the following
entities:
The status.hardware.storage fields of the BareMetalHost object
Initial description of physical disks that is discovered only once during
a bare metal host inspection.
The status.hostInfo.storage fields of the LCMMachine object
Current state of physical disks during life cycle of Machine and
LCMMachine objects.
The default device naming workflow during management of BareMetalHost and
BareMetalHostProfile objects is as follows:
An operator creates the BareMetalHostInventory and
BareMetalHostCredential objects.
Note
Before update of the management cluster to Container Cloud 2.29.0
(Cluster release 16.4.0), instead of BareMetalHostInventory, use the
BareMetalHost object. For details, see Container Cloud API Reference:
BareMetalHost resource.
Caution
While the Cluster release of the management cluster is 16.4.0,
BareMetalHostInventory operations are allowed to
m:kaas@management-admin only. This limitation is lifted once the
management cluster is updated to the Cluster release 16.4.1 or later.
The baremetal-operator service inspects the objects.
The operator creates or reviews an existing BareMetalHostProfile object
using the status.hardware.storage fields of the BareMetalHost
object associated with the created BareMetalHostInventory object. For
details, see Create a custom bare metal host profile.
The baremetal-provider service starts processing
BareMetalHostProfile and searching for suitable hardware disks to build
the internal AnsibleExtra object configuration. During the building
process:
The first suitable disk for an item in the bmhp.spec.devices list
is selected.
The cleanup and provisioning stage of BareMetalHost starts:
During provisioning, the selection order described in bmhp.workBy
applies. For details, see Create MOSK host profiles.
This logic ensures that an exact by_id name is taken from the
discovery stage, as opposed to by_name, which can change during the
transition from the inspection to the provisioning stage.
After provisioning finishes, the target system /etc/fstab is
generated using UUIDs.
This section describes how to recover a failed or accidentally removed Ceph
cluster in the following cases:
If Ceph Controller underlying a running Rook Ceph cluster has failed and
you want to install a new Ceph Controller Helm release and recover the failed
Ceph cluster onto the new Ceph Controller.
To migrate the data of an existing Ceph cluster to a new deployment in case
downtime can be tolerated.
Consider the common state of a failed or removed Ceph cluster:
The rook-ceph namespace does not contain pods or they are in the
Terminating state.
The rook-ceph or/and ceph-lcm-mirantis namespaces are in the
Terminating state.
The ceph-operator is in the FAILED state:
Management cluster: the state of the ceph-operator Helm release in the
management HelmBundle, such as default/kaas-mgmt, has switched from
DEPLOYED to FAILED.
Managed cluster: the state of the osh-system/ceph-operator
HelmBundle, or a related namespace, has switched from DEPLOYED to
FAILED.
The Rook CephCluster, CephBlockPool, CephObjectStore CRs in the
rook-ceph namespace cannot be found or have the deletionTimestamp
parameter in the metadata section.
Note
Prior to recovering the Ceph cluster, verify that your deployment
meets the following prerequisites:
The Ceph cluster fsid exists.
The Ceph cluster Monitor keyrings exist.
The Ceph cluster devices exist and include the data previously handled by
Ceph OSDs.
Back up the remaining resources. Skip the commands for the resources that
have already been removed:
kubectl -n rook-ceph get cephcluster <clusterName> -o yaml > backup/cephcluster.yaml
# perform this for each cephblockpool
kubectl -n rook-ceph get cephblockpool <cephBlockPool-i> -o yaml > backup/<cephBlockPool-i>.yaml
# perform this for each client
kubectl -n rook-ceph get cephclient <cephclient-i> -o yaml > backup/<cephclient-i>.yaml
kubectl -n rook-ceph get cephobjectstore <cephObjectStoreName> -o yaml > backup/<cephObjectStoreName>.yaml
# perform this for each secret
kubectl -n rook-ceph get secret <secret-i> -o yaml > backup/<secret-i>.yaml
# perform this for each configMap
kubectl -n rook-ceph get cm <cm-i> -o yaml > backup/<cm-i>.yaml
SSH to each node where the Ceph Monitors or Ceph OSDs were placed before the
failure and back up the valuable data:
Before MOSK 25.1, use MiraCephLog
instead of MiraCephHealth as the object name and in the command
above.
Edit the CephCluster, CephBlockPool, CephClient, and
CephObjectStore CRs of the rook-ceph namespace and remove the
finalizer parameter from the metadata section:
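For example, a hedged sketch of this step for the CephCluster resource; repeat similarly for the other CRs:
kubectl -n rook-ceph edit cephcluster <clusterName>
# In the editor, delete the metadata.finalizers list, then save and exit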
Once you clean up every single resource related to the Ceph release,
open the Cluster CR for editing:
kubectl -n <projectName> edit cluster <clusterName>
Substitute <projectName> with default for the management cluster
or with a related project name for the managed cluster.
Remove the ceph-controller Helm release item from the
spec.providerSpec.value.helmReleases array and save the Cluster
CR:
- name: ceph-controller
  values: {}
Verify that ceph-controller has disappeared from the corresponding
HelmBundle:
kubectl -n <projectName> get helmbundle -o yaml
Open the KaaSCephCluster CR of the related management or managed cluster
for editing:
kubectl -n <projectName> edit kaascephcluster
Substitute <projectName> with default for the management cluster or
with a related project name for the managed cluster.
Edit the roles of nodes. The entire nodes spec must contain only one
mon role. Save KaaSCephCluster after editing.
Open the Cluster CR for editing:
kubectl -n <projectName> edit cluster <clusterName>
Substitute <projectName> with default for the management cluster or
with a related project name for the managed cluster.
Add ceph-controller to spec.providerSpec.value.helmReleases to
restore the ceph-controller Helm release. Save Cluster after
editing.
- name: ceph-controller
  values: {}
Verify that the ceph-controller Helm release is deployed:
Inspect the Rook Operator logs and wait until the orchestration has
settled:
kubectl -n rook-ceph logs -l app=rook-ceph-operator
Verify that in the rook-ceph namespace, the rook-ceph-mon-a,
rook-ceph-mgr-a, and all auxiliary pods are up and running, and that
no rook-ceph-osd-ID-xxxxxx pods are running:
kubectl -n rook-ceph get pod
Verify the Ceph state. The output must indicate that one mon and one
mgr are running, all Ceph OSDs are down, and all PGs are in the
Unknown state.
Rook should not start any Ceph OSD daemon because all devices
belong to the old cluster that has a different fsid. To verify the
Ceph OSD daemons, inspect the osd-prepare pods logs:
kubectl -n rook-ceph logs -l app=rook-ceph-osd-prepare
Connect to the terminal of the rook-ceph-mon-a pod:
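A hedged example of this step; the deployment-based target below is an assumption, and the exact pod name on your cluster differs:
kubectl -n rook-ceph exec -it deploy/rook-ceph-mon-a -- bash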
Inside the container, create /etc/ceph/ceph.conf for a stable
operation of ceph-mon:
touch /etc/ceph/ceph.conf
Change the directory to /var/lib/rook and edit monmap by
replacing the existing mon hosts with the new mon-a endpoints:
cd /var/lib/rook
rm /var/lib/rook/mon-a/data/store.db/LOCK  # make sure the quorum lock file does not exist
ceph-mon --extract-monmap monmap --mon-data ./mon-a/data  # Extract monmap from old ceph-mon db and save as monmap
monmaptool --print monmap  # Print the monmap content, which reflects the old cluster ceph-mon configuration.
monmaptool --rm a monmap  # Delete `a` from monmap.
monmaptool --rm b monmap  # Repeat, and delete `b` from monmap.
monmaptool --rm c monmap  # Repeat this pattern until all the old ceph-mons are removed and monmap won't be empty
monmaptool --addv a [v2:<nodeIP>:3300,v1:<nodeIP>:6789] monmap  # Replace it with the rook-ceph-mon-a address you got from previous command.
ceph-mon --inject-monmap monmap --mon-data ./mon-a/data  # Replace monmap in ceph-mon db with our modified version.
rm monmap
exit
Substitute <nodeIP> with the IP address of the current <nodeName>
node.
Close the SSH connection.
Change fsid to the original one to run Rook as an old cluster:
kubectl -n rook-ceph edit secret/rook-ceph-mon
Note
The fsid is base64 encoded and must not contain a trailing
carriage return. For example:
echo -n a811f99a-d865-46b7-8f2c-f94c064e4356 | base64  # Replace with the fsid from the old cluster.
Scale the ceph-lcm-mirantis/ceph-controller deployment replicas to
0:
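A minimal sketch of this step, assuming the deployment is named ceph-controller in the ceph-lcm-mirantis namespace referenced above:
kubectl -n ceph-lcm-mirantis scale deployment ceph-controller --replicas 0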
Inspect the Rook Operator logs and wait until the orchestration has settled:
kubectl -n rook-ceph logs -l app=rook-ceph-operator
Verify that in the rook-ceph namespace, the rook-ceph-mon-a,
rook-ceph-mgr-a, and all auxiliary pods are up and running, and that
the number of running rook-ceph-osd-ID-xxxxxx pods is greater than
zero:
kubectl -n rook-ceph get pod
Verify the Ceph state. The output must indicate that one mon, one
mgr, and all Ceph OSDs are up and running and all PGs are either in the
Active or Degraded state:
Inspect the Rook Operator logs and wait until the orchestration has settled:
kubectl -n rook-ceph logs -l app=rook-ceph-operator
Verify that in the rook-ceph namespace, the rook-ceph-mon-a,
rook-ceph-mgr-a, and all auxiliary pods are up and running, and that
the number of running rook-ceph-osd-ID-xxxxxx pods is greater than
zero:
kubectl -n rook-ceph get pod
Verify the Ceph state. The output must indicate that one mon, one
mgr, and all Ceph OSDs are up and running and the overall stored data
size equals the old cluster data size.
Edit the MiraCeph CR and add two more mon and mgr roles to the
corresponding nodes:
kubectl -n ceph-lcm-mirantis edit miraceph
Inspect the Rook namespace and wait until all Ceph Monitors are in the
Running state:
kubectl -n rook-ceph get pod -l app=rook-ceph-mon
Verify the Ceph state. The output must indicate that three mon (three in
quorum), one mgr, and all Ceph OSDs are up and running, and the overall
stored data size equals the old cluster data size.
Identify the Ceph Monitor pods in the Error or CrashLoopBackOff
state:
kubectl -n rook-ceph get pod -l 'app in (rook-ceph-mon,rook-ceph-mon-canary)'
Verify that the affected pods contain the failure logs described above:
kubectl -n rook-ceph logs <failedMonPodName>
Substitute <failedMonPodName> with the Ceph Monitor pod name. For
example, rook-ceph-mon-g-845d44b9c6-fjc5d.
Save the identifying letters of failed Ceph Monitors for further usage.
For example, f, e, and so on.
Delete all corresponding deployments of these pods:
Identify the affected Ceph Monitor pod deployments:
kubectl -n rook-ceph get deploy -l 'app in (rook-ceph-mon,rook-ceph-mon-canary)'
Delete the affected Ceph Monitor pod deployments. For example, if the
Ceph cluster has the rook-ceph-mon-c-845d44b9c6-fjc5d pod in
the CrashLoopBackOff state, remove the corresponding
rook-ceph-mon-c:
kubectl -n rook-ceph delete deploy rook-ceph-mon-c
Canary mon deployments have the suffix -canary.
Remove all corresponding entries of Ceph Monitors from the MON map:
Inspect the current MON map and save the IP addresses of the failed Ceph
monitors for further usage:
ceph mon dump
Remove all entries of failed Ceph Monitors using the previously saved
letters:
ceph mon rm <monLetter>
Substitute <monLetter> with the corresponding letter of a failed Ceph
Monitor.
Exit the ceph-tools pod.
Remove all failed Ceph Monitors entries from the Rook mon endpoints
ConfigMap:
Open the rook-ceph/rook-ceph-mon-endpoints ConfigMap for editing:
kubectl -n rook-ceph edit cm rook-ceph-mon-endpoints
Remove all entries of failed Ceph Monitors from the ConfigMap data and
update the maxMonId value with the current number of Running
Ceph Monitors. For example, rook-ceph-mon-endpoints has the
following data:
KaaSCephOperationRequest failure with a timeout during rebalance¶
Ceph OSD removal procedure includes the Ceph OSD out action that starts
the Ceph PGs rebalancing process. The total time for rebalancing depends on a
cluster hardware configuration: network bandwidth, Ceph PGs placement, number
of Ceph OSDs, and so on. The default rebalance timeout is limited to 30
minutes, which suits standard cluster configurations.
If the rebalance takes more than 30 minutes, the KaaSCephOperationRequest
resources created for removing Ceph OSDs or nodes fail with the following
example message:
status:
  removeStatus:
    osdRemoveStatus:
      errorReason: Timeout (30m0s) reached for waiting pg rebalance for osd 2
      status: Failed
To apply the issue resolution, increase the timeout for all future
KaaSCephOperationRequest resources:
On the management cluster, open the Cluster resource of the affected
managed cluster for editing:
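For example, a minimal sketch of the edit command, assuming <project> and <clusterName> are the namespace and name of the affected managed cluster:
kubectl -n <project> edit cluster <clusterName>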
The MON_DISK_LOW Ceph Cluster health message indicates that the
store.db size of the Ceph Monitor is rapidly growing and the compaction
procedure is not working. In most cases, store.db starts storing a
number of logm keys that are buffered due to Ceph OSD shadow errors.
Once prompted Continue?, first verify that rebalancing has finished
for the Ceph cluster, the Ceph OSD is up and in, and all PGs have
returned to their original state:
After some of the affected Ceph OSDs restart, Ceph Monitors will start
decreasing the store.db size to the original 100-300 MB. However,
complete the restart of all Ceph OSDs.
Replaced Ceph OSD fails to start on authorization¶
In rare cases, when the replaced Ceph OSD has the same ID as the previous Ceph
OSD and starts on a device with the same name as the previous Ceph OSD, Rook
fails to update the keyring value, which is stored on a node in the
corresponding host path. Thereby, Ceph OSD cannot start and fails with the
following example log output:
The cluster is affected if keyrings of the failed Ceph OSD of the host
path and Ceph cluster differ. If so, proceed to fixing them and unblock
the failed Ceph OSD.
To fix different keyrings and unblock the Ceph OSD authorization:
Obtain the keyring value of the host path for this Ceph OSD:
SSH on a node hosting the required Ceph OSD.
In /var/lib/rook/rook-ceph, search for a directory containing
the keyring and whoami files that have the number of the
required Ceph OSD. For example:
Export the current Ceph OSD keyring stored in the Ceph cluster:
ceph auth get osd.<ID> -o /tmp/key
Replace the exported key with the value from keyring.
For example:
vi /tmp/key
# Replace the key with the one from the keyring file:
[osd.3]
        key = AQD2k/BlcE+YJxAA/QsD/fIAL1qPrh3hjQ7AKQ==
        caps mgr = "allow profile osd"
        caps mon = "allow profile osd"
        caps osd = "allow *"
Import the replaced Ceph OSD keyring to the Ceph cluster:
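For example, a minimal sketch using the exported file from the previous step:
ceph auth import -i /tmp/key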
The ceph-exporter pods are present in the Ceph crash list¶
After a managed cluster update, the ceph-exporter pods may be present in
the ceph crash ls or ceph health detail list while
rook-ceph-exporter attempts to obtain the port that is still in use.
For example:
PostgreSQL replication in a Patroni cluster is based on the Write-Ahead Log
(WAL) syncing between the cluster leader and replica. Occasionally, this
mechanism may lag due to networking issues, missing WAL segments (on rotation
or recycle), increased Patroni Pods CPU usage, or due to a hardware failure.
In StackLight, the PostgresqlReplicationSlowWalDownload alert indicates
that the Patroni cluster Replica is out of sync. This alert has the
Warning severity because under such conditions the Patroni cluster is still
operational and the issue may disappear without intervention. However, a
persisting replication lag may impact the cluster availability if another Pod
in the cluster fails, leaving the leader alone to serve requests. In this case,
the Patroni leader will become read-only and unable to serve write requests,
which can cause outage of Alerta backed by Patroni. Grafana, which also uses
Patroni, will still be operational but any dashboard changes will not be saved.
Therefore, if PostgresqlReplicationSlowWalDownload fires, observe the
cluster and fix it if the issue persists or if the lag grows.
In the Alertmanager or Alerta web UI, verify that no new alerts are firing
for Patroni. The PostgresqlInsufficientWorkingMemory alert may become
pending during the operation but should not fire.
Alertmanager does not send resolve notifications for custom alerts¶
Due to the Alertmanager issue, Alertmanager loses
the in-memory alerts during restart. As a result, StackLight does not send
notifications for custom alerts in the following case:
Adding a custom alert.
Then removing the custom alert and at the same time changing the
Alertmanager configuration such as adding or removing a receiver.
For a removed custom alert, Alertmanager does not send a resolve notification
to any of the configured receivers. Therefore, until after the time set in
repeat_interval (3 hours by default), the alert will be visible in all
receivers but not in the Prometheus and Alertmanager web UIs.
When the alert is re-added, Alertmanager does not send a firing notification
for it until after the time set in repeat_interval, but the alert will be
visible in the Prometheus and Alertmanager web UIs.
The alert is based on the metric container_cpu_cfs_throttled_periods_total
over container_cpu_cfs_periods_total and means the percentage of
CPU periods where the container ran but was throttled (stopped from
running the whole CPU period).
Investigation
The alert usually fires when a Pod starts, often during brief
intervals. It may solve automatically once the Pod CPU usage stabilizes.
If the issue persists:
Obtain the created_by_name label from the alert.
List the affected Pods using the created_by_name label:
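For example, a hypothetical sketch that matches Pods by their owner name in a given namespace; <created_by_name> and <namespace> come from the alert labels, and jq is assumed to be available:
kubectl -n <namespace> get pods -o json \
  | jq -r '.items[] | select(.metadata.ownerReferences[]?.name == "<created_by_name>") | .metadata.name'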
Helm Controller release status differs from deployed. Broken
HelmBundle configurations or missing Helm chart artifacts may cause this
when applying the HelmBundle update.
Investigation
Inspect logs of every Helm Controller Pod for error or warning
messages:
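For example, a sketch assuming the Helm Controller Pods carry the app=helm-controller label; adjust the namespace and label selector to your cluster:
kubectl -n <namespace> logs -l app=helm-controller --all-containers --tail=100 | grep -iE 'error|warn'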
In case of an error to fetch the chart, review the chartURL
fields of the HelmBundle object to verify that the chart URL does not
have typographical errors:
Verify that the chart artifact is accessible from your cluster.
Mitigation
If the chart artifact is not accessible from your cluster, investigate
the network-related alerts, if any, and verify that the file is
available in the repository.
Prometheus fails in at least 10% of Helm Controller metrics scrapes. The
following two components can cause the alert to fire:
Helm Controller Pod(s):
If the Pod is down.
If the Pod target endpoint is at least partially unresponsive. For
example, in case of CPU throttling, application error preventing a
restart, or container flapping.
Prometheus server if it cannot reach the helm-controller endpoint(s).
For each object, identify the deprecated modules being used
and the list of modules that replace the deprecated ones:
kubectl get hoc <hoc_name> -n <hoc_namespace> \
  -o go-template="{{range .status.configs}}\
{{if .moduleDeprecatedBy}}\
deprecated: {{.moduleName}}-{{.moduleVersion}};\
update to: {{range .moduleDeprecatedBy}}{{.name}}-{{.version}}; {{end}}\
{{\"\n\"}}\
{{end}}\
{{end}}"
Read through the documentation or README of the new module,
and manually update all affected HostOSConfiguration objects
migrating from the deprecated version.
Mitigation
Use up-to-date versions of modules during configuration.
Available since MCC 2.27.0 (Cluster releases 17.2.0 and 16.2.0)
Root cause
Kernel logs generated IO error logs, potentially indicating disk issues.
IO errors may occur due to various reasons and are often unpredictable.
Investigation
Inspect kernel logs on affected nodes for IO errors to pinpoint the issue,
identify the affected disk, if any, and assess its condition. Most major
Linux distributions store kernel logs in /var/log/dmesg and occasionally
in /var/log/kern.log.
If the issue is not related to a faulty disk, further inspect errors in logs
to identify the root cause.
Mitigation
Mitigation steps depend on the identified issue. If the issue is caused by
a faulty disk, replace the affected disk. Additionally, consider the following
measures to prevent such issues in the future:
Implement proactive monitoring of disk health to detect early signs of failure
and initiate replacements preemptively.
Utilize tools such as smartctl or nvme for routine
collection of disk metrics, enabling prediction of failures and early
identification of underperforming disks to prevent major disruptions.
Related inhibited alert: KubePodsRegularLongTermRestarts.
Root cause
Termination of containers in Pods having .spec.restartPolicy set to
Always causes Kubernetes to bring them back. If the container exits
again, kubelet exponentially increases the back-off delay between next
restarts until it reaches 5 minutes. The Pods stuck in a restart
loop get the CrashLoopBackOff status. Because of the underlying
metric inertia, StackLight measures restarts in an extended 20-minute
time window.
Investigation
Note
Verify if there are more alerts firing in the MOSK
cluster to obtain more information on the cluster state and simplify
issue investigation and mitigation.
Also examine the relation of the affected application with other
applications (dependencies) and Kubernetes resources it relies on.
During investigation, the affected Pod will likely be in the
CrashLoopBackOff or Error state.
List the unhealthy Pods of a particular application. Use the label
selector, if possible.
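For example, a generic sketch that filters out healthy Pods; the namespace and label selector are placeholders:
kubectl -n <namespace> get pods -l <label_selector> --field-selector=status.phase!=Running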
Collect logs from one of the unhealthy Pods and inspect them for
errors and stack traces:
kubectl logs -n <pod_namespace> <pod_name>
Inspect Kubernetes events or the termination reason and exit code of
the Pod:
kubectl describe pods -n <pod_namespace> <pod_name>
Alternatively, inspect K8S Events in the OpenSearch
Dashboards web UI.
In the Kubernetes Pods Grafana dashboard, monitor the Pod
resources usage.
Important
Performing the following step requires understanding
of Kubernetes workloads.
In some scenarios, observing Pods failing in real time may provide
more insight on the issue. To investigate the application this way,
restart (never with the --force flag) one of the failing Pods and
inspect the following in the Kubernetes Pods Grafana
dashboard, events and logs:
Define whether the issue reproduces.
Verify when the issue reproduces during the Pod uptime: during the
initialization or after some time.
Verify that the application requirements for Kubernetes resources
and external dependencies are satisfied.
Define whether there is an issue with passing the readiness or
liveness tests.
Define how the Pod container terminates and whether it is
OOMKilled.
Note
While investigating, monitor the application health and
verify the resource limits. Most issues can be solved by fixing
the dependent application or tuning, such as providing additional
flags, changing resource limits, and so on.
Mitigation
Fixes typically fall into one of the following categories:
Fix the dependent service. For example, fixing opensearch-master
makes fluentd-logs Pods start successfully.
Fix the configuration if it causes container failure.
Tune the application by providing flags changing its behavior.
Tune the CPU or MEM limits if the system terminates a container upon
hitting the memory limits (OOMKilled) or stops responding because
of CPU throttling.
The Pod could not start successfully for the last 15 minutes, meaning
that its status phase is one of the following:
Pending - at least one Pod container was not created. The Pod
waits for the Kubernetes cluster to satisfy its requirement. For
example, in case of failure to pull the Docker image or create a
persistent volume.
Failed - the Pod terminated in the Error state and was not
restarted. At least one container exited with a non-zero status code
or was terminated by the system, for example, OOMKilled.
Unknown - kubelet communication issues.
Investigation
Note
Verify if there are more alerts firing in the MOSK
cluster to obtain more information on the cluster state and simplify
issue investigation and mitigation.
Also examine the relation of the affected application with other
applications (dependencies) and Kubernetes resources it relies on.
List the unhealthy Pods of the affected application. Use the label
selector, if possible.
For one of the unhealthy Pods, verify Kubernetes events, termination
reason, and exit code (for Failed only) of the Pod:
kubectl describe pods -n <pod_namespace> <pod_name>
Alternatively, inspect K8S Events in the OpenSearch
Dashboards web UI.
For Failed Pods, collect logs and inspect them for errors and
stack traces:
kubectl logs -n <pod_namespace> <pod_name>
In the Kubernetes Pods Grafana dashboard, monitor the Pod
resources usage.
Mitigation
For Pending, investigate and fix the root cause of the missing Pod
requirements. For example, dependent application failure, unavailable
Docker registry, unresponsive storage provided, and so on.
It is a long-term version of the KubePodsCrashLooping alert, aiming
to catch Pod container restarts in wider time windows. The alert raises
when the Pod container restarts once a day in a 2-day time frame. It
may indicate that a pattern in the application lifecycle needs
investigation, such as deadlocks, memory leaks, and so on.
Investigation
While investigating, the affected Pod will likely be in the Running
state.
List the Pods of the application, which containers were restarted at
least twice. Use the label selector, if possible.
Collect logs for one of the affected Pods and inspect them for errors
and stack traces:
kubectl logs -n <pod_namespace> <pod_name>
In the OpenSearch Dashboards web UI, inspect the
K8S events dashboard. Filter the Pod using the
kubernetes.event.involved_object.name key.
In the Kubernetes Pods Grafana dashboard, monitor the Pod
resources usage. Filter the affected Pod and find the point in time
when the container was restarted. Observations may take several days.
Mitigation
Refer to the KubePodsCrashLoopingMitigation section.
Fixing this issue may require more effort than simple application
tuning. You may need to upgrade the application, upgrade its dependency
libraries, or apply a fix in the application code.
Deployment generation, or version, occupies 2 fields in the object:
.metadata.generation - the desired Deployment generation, updated upon
kubectl apply execution (a change triggers a new ReplicaSet rollout).
.status.observedGeneration - the generation observed by the Deployment
controller.
When the Deployment controller fails to observe a new Deployment
version, these 2 fields differ. The mismatch lasting for more than 15
minutes triggers the alert.
Investigation and mitigation
The alert indicates failure of a Kubernetes built-in Deployment
controller and requires debugging on the control plane level. See
Troubleshooting Guide for details on collecting cluster state and
mitigating known issues.
The number of available Deployment replicas did not match the desired
state set in the .spec.replicas field for the last 30 minutes,
meaning that at least one Deployment Pod is down.
The number of running StatefulSet replicas did not match the desired
state set in the .spec.replicas field for the last 30 minutes,
meaning that at least one StatefulSet Pod is down.
StatefulSet generation, or version, occupies 2 fields in the
object:
.metadata.generation - the desired StatefulSet generation, updated upon
kubectl apply execution (a change triggers a new rollout).
.status.observedGeneration - the generation observed by the StatefulSet
controller.
When the StatefulSet controller fails to observe a new
StatefulSet version, these 2 fields differ. The mismatch lasting for
more than 15 minutes triggers the alert.
Investigation and mitigation
The alert indicates failure of a Kubernetes built-in StatefulSet
controller and requires debugging on the control plane level. See
Troubleshooting Guide for details on collecting cluster state and mitigating
known issues.
Related inhibited alerts: KubeStatefulSetReplicasMismatch and
KubeStatefulSetUpdateNotRolledOut.
Root cause
StatefulSet workloads are typically distributed across Kubernetes
nodes. Therefore, losing more than one replica indicates either a
serious application failure or issues on the Kubernetes cluster level.
The application likely experiences severe performance degradation and
availability issues.
Investigation
Verify the StatefulSet status:
kubectl get sts -n <sts_namespace> <sts_name>
Inspect the related Kubernetes events for error messages and probe
failures:
kubectl describe sts -n <sts_namespace> <sts_name>
If events are unavailable, inspect K8S Events in the
OpenSearch Dashboards web UI.
List the StatefulSet Pods and verify them one by one. Use the
label selectors, if possible.
Refer to KubePodsCrashLooping. If after fixing the root cause
on the Pod level the affected Pods are still non-Running, contact
Mirantis support. StatefulSets must be treated with special caution
as they store data and their internal state.
The StatefulSet update did not finish in 30 minutes, which was
detected in the mismatch of the .spec.replicas and
.status.updatedReplicas fields. Such issue may occur during a
rolling update if a newly created Pod fails to pass the readiness test
and blocks the update.
The output includes the number of updated Pods. In Container Cloud,
StatefulSets use the RollingUpdate strategy for upgrades and the
Pod management policy does not affect updates. Therefore,
investigation requires verifying the failing Pods only.
List the non-Running Pods of the StatefulSet and inspect them one
by one for error messages and probe failures. Use the label
selectors, if possible.
See KubePodsCrashLooping. Pay special attention to the
information about the application cluster issues, as clusters in
Container Cloud are deployed as StatefulSets.
If none of these alerts apply and the new Pod is stuck failing to
pass postStartHook (Pod is in the PodInitializing state) or
the readiness probe (Pod in the Running state, but not fully
ready, for example, 0/1) it may be caused by Pod inability to
join the application cluster. Investigating such issue requires
understanding how the application cluster initializes and how nodes
join the cluster. The PodInitializing state may be especially
problematic as the kubectl logs command does not collect
logs from such Pod.
Warning
Perform the following step with caution and remember to
perform a rollback afterward.
In some StatefulSets, disabling postStartHook unlocks the Pod
to the Running state and allows for logs collection.
If after fixing the root cause on the Pod level the affected Pods are
still non-Running, contact Mirantis support. Treat StatefulSets with
special caution as they store data and their internal state. Improper
handling may result in a broken application cluster state and data loss.
For the last 30 minutes, DaemonSet has at least one Pod (not necessarily
the same one), which is not ready after being correctly scheduled. It
may be caused by missing Pod requirements on the node or unexpected Pod
termination.
Can relate to: KubeCPUOvercommitPods, KubeMemOvercommitPods.
Root cause
At least one Pod of the DaemonSet was not scheduled to a target node.
This may happen if resource requests for the Pod cannot be satisfied
by the node or if the node lacks other resources that the Pod requires,
such as PV of a specific storage class.
Investigation
Identify the number of available and desired Pods of the DaemonSet:
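For example, a sketch that prints the desired and available Pod counts; substitute the namespace and DaemonSet name:
kubectl -n <ds_namespace> get ds <ds_name> \
  -o custom-columns='NAME:.metadata.name,DESIRED:.status.desiredNumberScheduled,AVAILABLE:.status.numberAvailable'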
At least one node where the DaemonSet Pods were deployed got a
NoSchedule taint added afterward. Taints are respected during the
scheduling stage only, and the Pod is currently considered unschedulable
to such nodes.
Related inhibiting alert: KubeDaemonSetRolloutStuck.
Root cause
Although the DaemonSet was not scaled down to zero, there are zero
healthy Pods. As each DaemonSet Pod is deployed to a separate Kubernetes
node, such situation is rare and typically caused by a broken
configuration (ConfigMaps or Secrets) or wrongly tuned resource limits.
If the output is true, Kubernetes will not allow new Pods to run
until the current one terminates. In this case, investigate and fix
the issue on the application level.
In case of events similar to Cannot determine if job needs to be
started. Too many missed start time (> 100). Set or decrease
.spec.startingDeadlineSeconds or check clock skew.:
Verify if the ClockSkewDetected alert is firing for the
affected cluster.
Investigate and fix the root cause of missing Pod requirements,
such as failing dependency application, Docker registry
unavailability, unresponsive storage provided, and so on.
The sum of Kubernetes Pods CPU requests is higher than the average
capacity of the cluster without one node or 80% of total nodes CPU
capacity, depending on what is higher. It is a common issue of a cluster
with too many resources deployed.
Investigation
Select one of the following options to verify nodes CPU requests:
Inspect the Allocated resources section in the output of the
following command:
kubectl describe nodes
Inspect the Cluster CPU Capacity panel of the
Kubernetes Cluster Grafana dashboard.
Mitigation
Increase the node(s) CPU capacity or add a worker node(s).
The sum of Kubernetes Pods RAM requests is higher than the average
capacity of the cluster without one node or 80% of total nodes RAM
capacity, depending on what is higher. It is a common issue of a cluster
with too many resources deployed.
Investigation
Select one of the following options to verify nodes RAM requests:
Inspect the Allocated resources section in the output of the
following command:
kubectl describe nodes
Inspect the Cluster Mem Capacity panel of the
Kubernetes Cluster Grafana dashboard.
Mitigation
Increase the node(s) RAM capacity or add a worker node(s).
Verify the configured application retention period.
Optional. Review the data stored on the PV, including the
application data, logs, and so on, to verify the space consumption
and eliminate potential overuse:
kubectl get pv -o json | jq -r '.items[] | select(.status.phase=="Pending" or .status.phase=="Failed") | .metadata.name'
For the PVs in the Failed or Pending state:
kubectl describe pv <pv_name>
Inspect Kubernetes events, if available. Otherwise:
In the Discover section of the OpenSearch Dashboards
web UI, change the index pattern to
kubernetes_events-*.
Expand the time range and filter the results by
kubernetes.event.involved_object.name, which equals to
the <pv_name> from the previous step. In the matched results,
find the kubernetes.event.message field.
If the PV is in the Pending state, it waits to be provisioned.
Verify the PV storage class name:
kubectl get pv <pv_name> -o=json | jq -r '.spec.storageClassName'
Verify the provisioner name specified for the storage class:
kubectl get sc <sc_name> -o=json | jq -r '.provisioner'
If the provisioner is deployed as a workload in the affected
Kubernetes cluster, verify if it experiences availability or health
issues. Further investigation and mitigation depends on the
provisioner. The Failed state can be caused by a custom recycler
error when a deprecated Recycle reclaim policy is used.
Mitigation
Fix the PV in Pending state according to the investigation
outcome.
Warning
Deleting a PV causes data loss. Removing PVCs causes
deletion of a PV with the Delete reclaim policy.
Fix the PV in the Failed state:
Investigate the recycler Pod by verifying the
kube-controller-manager configuration. Search for the PV in the
Pod logs.
Delete the Pod and mounted PVC if it is still in the
Terminating state.
If the distribution is extremely odd, investigate custom taints in
underloaded nodes. If some of the custom taints are blocking Pods
from being scheduled - consider adding tolerations or scaling the
MOSK cluster out by adding worker nodes.
If no custom taints exist, add worker nodes.
Delete Pods that can be moved (preferably, multi-node Deployments).
Prometheus scraping of the kube-state-metrics service is unreliable,
resulting in a success rate below 90%. It indicates either a failure
of the kube-state-metrics Pod or (in rare scenarios) network issues
causing scrape requests to time out.
Related alert: KubeDeploymentOutage{deployment=prometheus-kube-state-metrics}
(inhibiting).
Investigation
In the Prometheus web UI, search for firing alerts that relate to
networking issues in the Container Cloud cluster and fix them.
If the cluster network is healthy, refer to the
Investigation section of the
KubePodsCrashLooping alert troubleshooting description
to collect information about the kube-state-metrics Pods.
Mitigation
Based on the investigation results, select from the following options:
Fix the networking issues
Apply solutions from Mitigation
section of the KubePodsCrashLooping alert troubleshooting
description
If the issue still persists, collect the investigation output and
contact Mirantis support for further information.
The Prometheus Blackbox Exporter target that probes the /healthz endpoints
of the Kubernetes API server nodes is not reliably available and
Prometheus metric scrapes fail. It indicates either a
prometheus-blackbox-exporter Pod failure or (in rare cases)
network issues causing scrape requests to time out.
Related alert: KubeDeploymentOutage{deployment=prometheus-kube-blackbox-exporter}
(inhibiting).
Investigation
In the Prometheus web UI, search for firing alerts that relate to
networking issues in the Container Cloud cluster and fix them.
If the cluster network is healthy, refer to the
Investigation section of the
KubePodsCrashLooping alert troubleshooting description
to collect information about prometheus-blackbox-exporter Pods.
Mitigation
Based on the investigation results, select from the following options:
Fix the networking issues
Apply solutions from Mitigation
section of the KubePodsCrashLooping alert troubleshooting
description
If the issue still persists, collect the investigation output and
contact Mirantis support for further information.
The OpenSearch volume has reached the default flood_stage
disk allocation watermark of 95% disk usage. At this stage, all shards
are in read-only mode.
The OpenSearch volume has reached the default value for the high disk
allocation watermark of 90% disk usage. At this point, OpenSearch
attempts to reassign shards to other nodes if these nodes are still
under 90% of used disk space.
Investigation and mitigation
Verify that the user does not create indices that are not managed
by StackLight, which may also cause unexpected storage usage.
StackLight deletes old data only for its managed indices.
If an OpenSearch volume uses shared storage, such as LVP, disk usage
may still exceed expected limits even if rotation works as expected.
In this case, consider the following solutions:
Increase disk space
Delete old indices
Lower retention thresholds for components that use shared storage.
To reduce OpenSearch space usage, consider adjusting the
elasticsearch.persistentVolumeUsableStorageSizeGB parameter.
By default, elasticsearch-curator deletes old logs when
disk usage exceeds 80%. If it fails to delete old logs, inspect
the known issues described in the product Release Notes.
There are no Release Controller replicas scheduled in the cluster.
By default, 3 replicas should be scheduled. The controller was either
deleted or downscaled to 0.
Investigation
Verify the status of the release-controller-release-controller
deployment:
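For example, a sketch assuming the deployment runs in the kaas namespace of the management cluster:
kubectl -n kaas get deployment release-controller-release-controller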
If the Release Controller deployment has been downscaled to 0, set the
replicas back to 3 in the release-controller Helm release in the
.spec.replicas section of the Deployment object on the
management cluster:
The Telemeter client fails to federate data from Prometheus or to send
data to the Telemeter server due to a very long incoming data sample.
The limit-bytes parameter in the StackLight Helm release is too low.
Investigation
Verify whether the logs of telemeter-client contain alerts
similar to msg="unable to forward results" err="the incoming sample data is too long":
kubectl -n stacklight logs telemeter-client-<podID>
Verify the current length limit established by Helm release:
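For example, a sketch that greps the current flag value from the Deployment; the deployment name telemeter-client is an assumption based on the Pod name above:
kubectl -n stacklight get deployment telemeter-client -o yaml | grep -- --limit-bytes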
Add the following parameter to the StackLight Helm release values of
the corresponding Cluster object:
telemetry:
  telemeterClient:
    limitBytes: 4194304
Wait for the telemeter-client-<podID> Pod to be recreated
and the byte limit to be changed from --limit-bytes=1048576 to
--limit-bytes=4194304.
OpenSearchPVCMismatch alert raises due to the OpenSearch PVC size mismatch¶
Caution
The below issue resolution applies since Container Cloud 2.22.0
(Cluster release 11.6.0) to existing clusters with insufficient resources.
Before Container Cloud 2.22.0 (Cluster release 11.6.0), use the workaround
described in the StackLight known issue 27732-1.
New clusters deployed on top of Container Cloud 2.22.0 are not affected.
The OpenSearch elasticsearch.persistentVolumeClaimSize custom setting can
be overwritten by logging.persistentVolumeClaimSize during deployment of a
Container Cloud cluster of any type and is set to the default 30Gi. This
issue raises the OpenSearchPVCMismatch alert. Since
elasticsearch.persistentVolumeClaim is immutable, you cannot update
the value by editing the Cluster object.
Note
This issue does not affect cluster operability if the current volume
capacity is enough for the cluster needs.
To apply the issue resolution, select from the following use cases:
StackLight with an expandable StorageClass for OpenSearch PVCs
Verify that the StorageClass provisioner has enough space to satisfy
the new size:
StackLight with a non-expandable StorageClass for OpenSearch PVCs
If StackLight is operating in HA mode, the local volume provisioner
(LVP) has a non-expandable StorageClass used for OpenSearch PVCs
provisioning. Thus, the affected PV nodes have insufficient disk space.
If StackLight is operating in non-HA mode, the default non-expandable
storage provisioner is used.
Warning
After applying this issue resolution, the existing OpenSearch
data will be lost. If data loss is acceptable, proceed with the steps
below.
Move the existing log data to a new PV if required.
Verify that the provisioner has enough space to satisfy the new size:
OpenSearch cluster deadlock due to the corrupted index¶
Due to instability issues in a cluster, for example, after disaster recovery,
networking issues, or low resources, some OpenSearch master pods may remain
in the PostStartHookError state due to the corrupted .opendistro-ism-config
index.
To verify that the cluster is affected:
The cluster is affected only when both conditions are met:
One or two opensearch-master pods are stuck in the
PostStartHookError state.
This internal index will be recreated on the next PostStartHook
execution of any affected replica.
Wait up to 30 minutes, assuming that during this time at least one
attempt of PostStartHook execution occurs, and verify that the
internal index was recreated:
Wait up to 30 minutes and verify whether the affected pods started
normally.
Before 2.27.0 (Cluster releases 17.2.0 and 16.2.0), verify that the
cluster is not affected by the issue 40020.
If it is affected, proceed to the corresponding workaround.
Failure of shard relocation in the OpenSearch cluster¶
On large managed clusters, shard relocation may fail in the OpenSearch cluster
with the yellow or red status of the OpenSearch cluster.
The characteristic symptom of the issue is that in the stacklight
namespace, the statefulset.apps/opensearch-master containers are
experiencing throttling with the KubeContainersCPUThrottlingHigh alert
firing for the following set of labels:
The throttling that OpenSearch is experiencing may be a temporary
situation, which may be related, for example, to a peak load and the
ongoing shards initialization as part of disaster recovery or after node
restart. In this case, Mirantis recommends waiting until initialization
of all shards is finished. After that, verify the cluster state and whether
throttling still exists. And only if throttling does not disappear, apply
the workaround below.
To verify that the initialization of shards is ongoing:
The system response above indicates that shards from the
.ds-system-000072, .ds-system-000073, and .ds-audit-000001
indices are in the INITIALIZING state. In this case, Mirantis
recommends waiting until this process is finished, and only then consider
changing the limit.
You can additionally analyze the exact level of throttling and the current
CPU usage on the Kubernetes Containers dashboard in Grafana.
To apply the issue resolution:
Verify the currently configured CPU requests and limits for the
opensearch containers:
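For example, a sketch that prints the configured resources of the opensearch container; the container name and the example output are illustrative assumptions:
kubectl -n stacklight get sts opensearch-master \
  -o jsonpath='{.spec.template.spec.containers[?(@.name=="opensearch")].resources}'
# Illustrative output:
# {"limits":{"cpu":"600m","memory":"8Gi"},"requests":{"cpu":"500m","memory":"4Gi"}}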
In the example above, the CPU request is 500m and the CPU limit is
600m.
Increase the CPU limit to a reasonably high number.
For example, the default CPU limit for the clusters with the
clusterSize:large parameter set was increased from
8000m to 12000m for StackLight in Container Cloud 2.27.0
(Cluster releases 17.2.0 and 16.2.0).
If the CPU limit for the opensearch component is already set, increase
it in the Cluster object for the opensearch parameter. Otherwise,
the default StackLight limit is used. In this case, increase the CPU limit
for the opensearch component using the resources parameter.
Wait until all opensearch-master pods are recreated with the new CPU
limits and become running and ready.
To verify the current CPU limit for every opensearch container in every
opensearch-master pod separately:
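For example, a sketch using jsonpath; the app=opensearch-master label selector is an assumption and may differ in your cluster:
kubectl -n stacklight get pods -l app=opensearch-master \
  -o jsonpath='{range .items[*]}{.metadata.name}{": "}{.spec.containers[?(@.name=="opensearch")].resources.limits.cpu}{"\n"}{end}'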
The waiting time may take up to 20 minutes depending on the cluster size.
If the issue is fixed, the KubeContainersCPUThrottlingHigh alert stops
firing immediately, while OpenSearchClusterStatusWarning or
OpenSearchClusterStatusCritical can still be firing for some time during
shard relocation.
If the KubeContainersCPUThrottlingHigh alert is still firing, proceed with
another iteration of the CPU limit increase.
StackLight pods get stuck with the ‘NodeAffinity failed’ error¶
On a managed cluster, the StackLight Pods may get stuck with the
Pod predicate NodeAffinity failed error in the Pod status. The issue may
occur if the StackLight node label was added to one machine and
then removed from another one.
The issue does not affect the StackLight services, all required StackLight
Pods migrate successfully except extra Pods that are created and stuck during
Pod migration.
To apply the issue resolution, remove the stuck Pods:
After enabling log forwarding to Splunk as described in
Enable log forwarding to external destinations, you may see no specific errors but logs are not
being sent to Splunk. In this case, debug the issue using the procedure below.
To debug the issue:
Temporarily set the debug logging level for the syslog output plugin:
In the following example output, the error indicates that the specified
Splunk host name cannot be resolved. Therefore, verify and update the
host name accordingly.
Example output
2023-07-25 09:57:29 +0000 [info]: adding match in @splunk_syslog_output-external pattern="**" type="remote_syslog"
<label @splunk_syslog_output-external>
  @id splunk_syslog_output-external
  path "/var/log/fluentd-buffers/splunk_syslog_output-external.system.buffer"
</label>
2023-07-25 09:57:30 +0000 [debug]: [splunk_syslog_output-external] restoring buffer file: path=/var/log/fluentd-buffers/splunk_syslog_output-external.system.buffer/buffer.q6014c3643b68e68c03c6217052e1af55.log
2023-07-25 09:57:30 +0000 [debug]: [splunk_syslog_output-external] restoring buffer file: path=/var/log/fluentd-buffers/splunk_syslog_output-external.system.buffer/buffer.q6014c36877047570ab3b892f6bd5afe8.log
2023-07-25 09:57:30 +0000 [debug]: [splunk_syslog_output-external] restoring buffer file: path=/var/log/fluentd-buffers/splunk_syslog_output-external.system.buffer/buffer.b6014c36d40fcc16ea630fa86c9315638.log
2023-07-25 09:57:30 +0000 [debug]: [splunk_syslog_output-external] buffer started instance=61140 stage_size=17628134 queue_size=5026605
2023-07-25 09:57:30 +0000 [debug]: [splunk_syslog_output-external] flush_thread actually running
2023-07-25 09:57:30 +0000 [debug]: [splunk_syslog_output-external] enqueue_thread actually running
2023-07-25 09:57:33 +0000 [debug]: [splunk_syslog_output-external] taking back chunk for errors. chunk="6014c3643b68e68c03c6217052e1af55"
2023-07-25 09:57:33 +0000 [warn]: [splunk_syslog_output-external] failed to flush the buffer. retry_times=0 next_retry_time=2023-07-25 09:57:35 +0000 chunk="6014c3643b68e68c03c6217052e1af55" error_class=SocketError error="getaddrinfo: Name or service not known"
Mirantis OpenStack for Kubernetes (MOSK) provides a cloud operator
with a declarative interface to describe the desired configuration of
the cloud.
The program modules responsible for life cycle management of
MOSK components
extend the API of the underlying Kubernetes cluster with Custom Resources
(CRs).
These data structures define the services the cloud will provide and
specifics of its behavior.
CRs used in MOSK contain dozens of tunable parameters,
and the number is constantly growing with every new release as new capabilities
get added to the product. Also, each parameter value must be in a specific
format and within a range of valid values.
The purpose of the reference documents below is to provide cloud operators
with an up-to-date and comprehensive definition of the language they need
to use to communicate with MOSK:
This guide provides the usage instructions for a Mirantis OpenStack for Kubernetes
(MOSK) environment and is intended for the
cloud end user to successfully perform the OpenStack lifecycle management.
The section provides instructions on how to verify whether the Masakari service
has been correctly configured by the cloud operator and will recover an
instance from the process and compute node failures.
Depending on the Masakari service configuration,
you may need to mark instances with the HA_Enabled tag.
For more information about service configuration, refer to
Configure high availability with Masakari.
This section explains how to set up the introspective instance monitor to
enhance the reliability of virtual machines in your OpenStack environment.
To configure the introspective instance monitoring:
Verify that the introspective instance monitor is enabled in the
OpenStackDeployment custom resource as described in
Enabling introspective instance monitor.
Create a high availability (HA) segment to group relevant compute hosts that
will participate in monitoring and recovery as described in
Group compute nodes into segments.
Ensure the virtual machine image supports the QEMU Guest Agent for
monitoring by updating the image with the hw_qemu_guest_agent=yes
property:
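For example, a minimal sketch; substitute <image> with the name or ID of your image:
openstack image set --property hw_qemu_guest_agent=yes <image>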
Install the QEMU Guest Agent to enable communication between the host and
guest operating systems and ensure precise monitoring of the instance
health.
Linux-based virtual machines
Install the QEMU Guest Agent using the system package manager.
For example:
sudo apt install qemu-guest-agent
Verify that the agent is up and running:
systemctl status qemu-guest-agent
Windows-based virtual machines
Note
This procedure uses a generic approach to adding drivers
to Windows. The exact steps may vary depending on the Windows
version. Refer to the installation documentation specific to your
Windows version for detailed instructions.
Download the VirtIO driver ISO file (virtio-win.iso).
Install the VirtIO Serial Driver:
Attach virtio-win.iso to your Windows virtual machine.
Log in to the virtual machine.
Open the Windows Device Manager.
Locate PCI Simple Communications Controller
in the device list.
Right-click on it and select Update Driver.
Browse to the mounted ISO directory DRIVE:\vioserial\<OSVERSION>\,
where <OSVERSION> corresponds to the Windows version of your
Windows virtual machine.
Click Next to install the driver.
Reboot the virtual machine to complete the driver installation.
Install the QEMU Guest Agent:
Log in to the virtual machine.
Use the File Explorer to navigate to
the guest-agent directory in the virtio-win CD drive.
Run the installer: qemu-ga-x86_64.msi (64-bit) or
qemu-ga-i386.msi (32-bit).
Verify that the qemu-guest-agent is up and running.
For example, in the PowerShell:
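A minimal sketch, assuming the agent registers under the default service name QEMU-GA used by the upstream installer:
Get-Service QEMU-GA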
Application credentials is a mechanism in the MOSK
Identity service (Keystone) that enables application automation tools, such as
shell scripts, Terraform modules, Python programs, and others, to securely
perform various actions in the cloud API in order to deploy and manage
application components.
Historically, dedicated technical user accounts were created to be used by
application automation tools. The application credentials mechanism has
significant advantages over the legacy approach in terms of the following:
Self-service: Cloud users manage application credential objects
completely on their own, without having to reach out to cloud operators.
Note
Application credentials are owned by the cloud user who created
them, not the project, domain, or system that they have access to.
Non-admin users only have access to application credentials that they
have created themselves.
Security: Cloud users creating application credentials have control
over the actions that automation tools will be allowed to perform on their
behalf by the following:
Specifying the cloud API endpoints the tool may access.
Delegating to the tool just a subset of the owner’s roles.
Restricting the tool from creating new application credential objects or
trusts.
Defining the validity period for a credential.
Simplicity: In case a credential is compromised, the automation tools
using it can be easily switched to a new object.
For security reasons, a cloud user who logs in to the cloud through the
Mirantis Container Cloud IAM or an external identity provider cannot
use the application credentials mechanism by default. To enable the
functionality, contact your cloud operator.
MOSK Object Storage service does not support application
credentials authentication to access S3 API. To authenticate in S3 API, use
the EC2 credentials mechanism.
MOSK Object Storage service has limited support for
application credentials when accessing Swift API. The service does not
accept application credentials with restrictions to allowed API endpoints.
You can create an application credential using OpenStack CLI or Horizon.
To create an application credential using CLI, use the
openstack application credential create command.
If you do not provide the application credential secret, one will be generated
automatically.
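For example, a minimal sketch of the command; the credential name and options are illustrative:
openstack application credential create \
  --description "CI automation" \
  --expiration 2026-01-01T00:00:00 \
  my-cli-credential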
Warning
The application credential secret displays only once upon creation.
It cannot be recovered from the Identity service. Therefore, capture the
secret string from the command output and keep it in a safe place for future
usage.
When creating application credentials, you can limit their capabilities
depending on the security requirements of your deployment:
Define expiration time.
Limit the roles of an application credential to only a subset of roles that
the user creating the credential has.
Pass a list of allowed API paths and actions, also known as access rules, that the
application credential will have access to. For the comprehensive list of
possible options when creating credentials, consult the upstream OpenStack
documentation.
Restrict an application credential from creating another application
credential or a trust.
Note
This is the default behavior, but depending on what the
application credential is used for, you may need to loosen this restriction.
An application credential will be created with access to the scope of your
current session. For example, if your current credential is scoped to a
specific project, domain, or system, the created application credential will
have access to the same scope.
To authenticate in a MOSK cloud using an application
credential, you need to know the ID and secret of the application credential.
When using the human-readable name of an application credential instead of its
ID, you also have to supply the user ID or the user name with the user domain
ID or name. These details are required for the Identity service (Keystone) to
resolve your application credential, since different users may have
application credentials with the same name.
The following example illustrates a snippet from an RC file with required
environment variables using the application credential name:
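A sketch of such a snippet, assuming the standard keystoneauth environment variables; substitute the placeholders with your values:
export OS_AUTH_URL=<keystone_auth_url>
export OS_AUTH_TYPE=v3applicationcredential
export OS_APPLICATION_CREDENTIAL_NAME=<app_cred_name>
export OS_APPLICATION_CREDENTIAL_SECRET=<app_cred_secret>
export OS_USERNAME=<user_name>
export OS_USER_DOMAIN_NAME=<user_domain_name>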
The token is located in the x-subject-token header of the response, and
the response body contains information about the user, scope, effective roles,
and the service catalog.
In case an application credential becomes invalid due to expiry or the
owner leaving the team, or becomes compromised because its secret gets exposed,
Mirantis recommends rotating the credential immediately as follows:
Create a new application credential with the same permissions.
Adjust the automation tooling configuration to use the new object.
Delete the old object. This can be performed by the owner-user or cloud
operator.
Authenticate in OpenStack API as a federated OIDC user¶
This section offers an example workflow of federated user authentication in
OpenStack using an external identity provider and OpenID Connect (OIDC)
protocol. The example illustrates a typical HTTP-based interchange of
authentication data happening underneath between the client software,
identity provider, and MOSK Identity service (OpenStack
Keystone) when a cloud user logs in to a cloud using their corporate or
social ID, depending on cloud configuration.
The instructions below can be handy for cloud operators who want to delve
into how federated authentication operates in OpenStack and troubleshoot any
related issues, as well as advanced cloud users keen on crafting their own
basic automation tools for cloud interactions.
The instructions are provided for educational purposes. Mirantis encourages
the majority of cloud users to rely on existing mature tools and libraries,
such as openstacksdk, keystoneauth, python-openstackclient,
or gophercloud to communicate with OpenStack APIs in a programmable
manner.
Warning
Mirantis advises cloud users not to rely on federated
authentication when managing their cloud resources using command line and
cloud automation tool. Instead, consider using OpenStack built-in
application credentials mechanism to ensure secure and reliable access
to OpenStack APIs of your MOSK cloud.
See Manage application credentials for details.
Verify cloud configuration with the cloud administrator¶
For cloud users to be able to log in to an OpenStack cloud using their
federated identity, the cloud administrator should configure the cloud
to integrate with an external OIDC-compatible identity provider, such as
Mirantis Container Cloud IAM (Keycloak) with all the necessary resources
pre-created in the OpenStack Keystone API.
To authenticate in OpenStack using your federated identity, an
OIDC-compatible identity provider protocol, and the v3oidcpassword
authentication method, get yourself acquainted with the authentication
parameters listed below. You can extract the required information from the
OpenStack RC file that is available for download from OpenStack Dashboard
(Horizon) once you log in to it with your federated identity, or the
clouds.yaml file.
OS_DISCOVERY_ENDPOINT
The URL pointing to the OIDC discovery document served by the identity
provider, usually ending with .well-known/openid-configuration.
OS_CLIENT_ID
The identifier of the OIDC client to use.
OS_CLIENT_SECRET
The secret for the OIDC client. Many OIDC providers require this
parameter. Some providers, including Mirantis Container Cloud IAM
(Keycloak), allow so-called public clients, where secrets are not used
and can be any string.
OS_OPENID_SCOPE
The scope requested when using OIDC authentication. This is at least
openid, but your identity provider may allow returning additional
scopes.
OS_IDENTITY_PROVIDER
The name of the corresponding identity provider object as created in
the Keystone API.
OS_PROTOCOL
The name of the protocol object as created in the Keystone API.
OS_USERNAME
Your user name.
OS_PASSWORD
Your password in the identity provider, such as Mirantis Container Cloud
IAM (Keycloak).
Note
Additionally, to obtain a scoped token, you need information about
the target scope, such as, OS_PROJECT_DOMAIN_NAME and
OS_PROJECT_NAME.
Below is an example of an RC file that sets the environment variables used in
the further code examples for the project scope authentication:
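A sketch of such an RC file; the identity provider, protocol, and endpoint values are illustrative placeholders and must match your cloud configuration:
export OS_AUTH_TYPE=v3oidcpassword
export OS_AUTH_URL=https://<keystone_endpoint>/v3
export OS_DISCOVERY_ENDPOINT=https://<idp_endpoint>/.well-known/openid-configuration
export OS_CLIENT_ID=<oidc_client_id>
export OS_CLIENT_SECRET=<oidc_client_secret>
export OS_OPENID_SCOPE=openid
export OS_IDENTITY_PROVIDER=<identity_provider_name>
export OS_PROTOCOL=<protocol_name>
export OS_USERNAME=<your_user_name>
export OS_PASSWORD=<your_password>
export OS_PROJECT_DOMAIN_NAME=<project_domain_name>
export OS_PROJECT_NAME=<project_name>
export OS_IDENTITY_API_VERSION=3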
Obtain the access token from the identity provider by sending the POST
request to the token endpoint that will return the access token in exchange
to the login credentials:
As per OpenID Connect RFC,
the request to the endpoint must use the Form serialization.
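For example, a sketch of such a request using the resource owner password grant; the token endpoint URL comes from the OIDC discovery document, and jq is assumed to be available:
ACCESS_TOKEN=$(curl -s -X POST "<token_endpoint>" \
  -H "Content-Type: application/x-www-form-urlencoded" \
  --data-urlencode "grant_type=password" \
  --data-urlencode "client_id=${OS_CLIENT_ID}" \
  --data-urlencode "client_secret=${OS_CLIENT_SECRET}" \
  --data-urlencode "username=${OS_USERNAME}" \
  --data-urlencode "password=${OS_PASSWORD}" \
  --data-urlencode "scope=${OS_OPENID_SCOPE}" \
  | jq -r .access_token)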
Now, you can exchange the OIDC access token for an unscoped token from
OpenStack Keystone.
Obtain the unscoped token from OpenStack Keystone¶
The Keystone token is included in the response header. However, the response
body in the JSON format often contains additional data that may prove useful
for certain applications. The following example excludes the body by using
the -I flag:
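A sketch of such a request; the endpoint path follows the standard Keystone OS-FEDERATION API, and the identity provider and protocol names come from the RC file above:
OS_UNSCOPED_TOKEN=$(curl -sI -X POST \
  -H "Authorization: Bearer ${ACCESS_TOKEN}" \
  "${OS_AUTH_URL}/OS-FEDERATION/identity_providers/${OS_IDENTITY_PROVIDER}/protocols/${OS_PROTOCOL}/auth" \
  | grep -i x-subject-token | awk '{print $2}' | tr -d '\r')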
The tr -d '\r' part trims the carriage return characters from the
grep and awk outputs so that the extracted data
can be properly inserted into the JSON authentication request later.
Now, you can generate a scoped token from OpenStack Keystone using the unscoped
token and specifying the project scope.
In case you do not know beforehand the OpenStack authorization scope you want
to log in to, use the unscoped token to get a list of the available scopes:
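For example, a sketch that lists the projects available to the token owner through the standard Keystone API:
curl -s -H "X-Auth-Token: ${OS_UNSCOPED_TOKEN}" \
  "${OS_AUTH_URL}/auth/projects" | jq .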
Replace https://glance.it.just.works with the endpoint of the
MOSK Image service (OpenStack Glance) that you can obtain
from the Access & Security dashboard of OpenStack Horizon.
This section provides the necessary steps to prepare your environment for
working with the MOSK Shared Filesystems service
(OpenStack Manila), create a share network, configure share file systems,
and grant access to clients using various supported share drivers.
Set up Shared Filesystems service with generic driver¶
This section provides the necessary steps to prepare your environment for
working with the MOSK Shared Filesystems service
(OpenStack Manila), create a share network, configure share file systems,
and grant access to clients. By following these instructions, you will
ensure that your system is properly set up to manage and mount shared
resources using Manila with the generic driver.
Set up Shared Filesystems service with CephFS driver¶
This section provides the necessary steps to prepare your environment for
working with the MOSK Shared Filesystems service
(OpenStack Manila), create a share network, configure share file systems,
and grant access to clients. By following these instructions, you will
ensure that your system is properly set up to manage and mount shared
resources using Manila with the CephFS driver.
Create a share file system and grant access to it¶
Deploy your first cloud application using automation¶
This section aims to help you build your first cloud application and onboard
it to a MOSK cloud. It will guide you through the process of
deploying and managing a sample application using automation, and showcase the
powerful capabilities of OpenStack.
The sample application offered by Mirantis is a typical web-based application
consisting of a front end that provides a RESTful API situated behind
the cloud load balancer (OpenStack Octavia) and a backend database that
stores data in the cloud block storage (OpenStack Cinder volumes).
Mirantis RefApp
You can extend the sample application to make use of advanced features offered
by MOSK, for example:
An HTTPS-terminating load balancer that stores its certificate in the Key
Manager service (OpenStack Barbican)
A public endpoint accessible by the domain name with the help of the DNS
service (OpenStack Designate)
The sample application intends to showcase how deployment automation
can enable the DevOps engineers to streamline the process of installing,
updating, and managing their workloads in the cloud providing an efficient
and scalable approach to building and running cloud-based applications.
The sample application offers example templates for the most common tools
that include:
OpenStack Heat, an OpenStack service used to orchestrate composite cloud
applications with a declarative template format through the
OpenStack-native REST API
Terraform, an Infrastructure-as-code tool from HashiCorp, designed to build,
change, and version cloud and on-prem resources using a declarative
configuration language
You can easily customize and extend the templates for similar workloads.
Note
The sample source code and automation templates reside in the
OpenStack RefApp GitHub
repository.
Environment
The sample cloud application deployment has been verified in the following
environment:
OpenStack command-line client v5.8.1
Terraform v1.3.x
OpenStack Yoga
Ubuntu 18.04 LTS (Bionic Beaver) as guest operating system
Navigate to the project where you want to deploy the application.
Use the top-right user menu to download the OpenStack RC File
to your local machine.
Note
As an example, you will be using your own user credential
to deploy the sample application. However, in the future, Mirantis
strongly recommends creating dedicated application credentials
for your workloads.
Verify the default deployment configuration in the top.yaml template.
Modify parameters as required. For example, you may want to change the
image, flavor, or network parameters.
Create the stack using the provided template with the public key generated
above:
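For example, a sketch of the Heat stack creation command; the stack name and the key parameter name are illustrative and may differ in the actual template:
openstack stack create -t top.yaml \
  --parameter cluster_public_key="$(cat ~/.ssh/id_rsa.pub)" \
  refapp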
Run the curl tool against the URL of the application public endpoint
to make sure that all components of the application have been
deployed correctly and it is responding to user requests:
$ curl http://<APP_URL>/
{"host": "host name of API instance that replied", "app": "openstack-refapp"}
The sample application provides a RESTful API, which you can use for advanced
database queries.
Deploy your first cloud application using cloud web UI¶
This section aims to help you build your first cloud application and onboard
it to a MOSK cloud. It will guide you through the process of
deploying a simple application using the cloud web UI (OpenStack Horizon).
The section will also introduce you to the fundamental OpenStack primitives
that are commonly used to create virtual infrastructures for cloud
applications.
The sample application in the context of this tutorial is a typical web-based
application consisting of Wordpress, a popular
web content management system, and a database that stores Wordpress data in the
cloud block storage (OpenStack Cinder volume).
You can extend the sample application to make use of advanced features offered
by MOSK, for example:
Add an HTTPS-terminating load balancer that stores its certificate in the Key
Manager service (OpenStack Barbican)
Make the public endpoint accessible by the domain name with the help of the
DNS service (OpenStack Designate)
You can run the sample application on any OpenStack cloud. It can be a
private MOSK cluster of your company, a public OpenStack
cloud, or even your own tiny TryMOSK instance spun up in an AWS tenant
as described in the Try Mirantis OpenStack for Kubernetes on AWS
article.
To deploy the sample application, you need:
Access to your cloud web UI with the credentials obtained from your cloud
administrator:
The URL of the cloud web UI (OpenStack Horizon)
The login, password, and, optionally, authentication method that
you need to use to log in to the MOSK cloud
The name of the OpenStack project with enough resources available
Connectivity from the MOSK cluster to the Internet
to be able to download the components of the sample application.
If needed, consult with your cloud administrator.
A local machine with the SSH client installed and connectivity to the
cloud public address (floating IP) space.
Environment
The sample cloud application deployment has been verified in the following
environment:
OpenStack Yoga
Ubuntu 18.04 LTS (Bionic Beaver) as guest operating system
Open your favorite web browser and navigate to the URL of the cloud
web UI.
Use the access credentials to log in.
Select the appropriate project from the drop-down menu at the top left.
Create a dedicated private network for your application:
Note
Virtual networks in OpenStack are used for isolated communication
between instances (virtual machines) within the cloud. Instances get
plugged into networks to communicate with any virtual networking
entities, such as virtual routers and load balancers as well as
the outside world.
Navigate to Network > Networks.
Click Create Network. The Create Network dialog
box opens.
In the Network tab, specify a name for the new network.
Select the Enable Admin State and Create Subnet
check boxes.
In the Subnet tab, specify a name for the
subnet and network address, for example, 192.168.1.0/24.
In the Subnet Details tab, keep the preset
configuration.
Click Create.
Create and connect a network router.
Note
A virtual router, just like its physical counterpart, is used
to pass the layer 3 network traffic between two or more networks.
Also, a router performs the network address translation (NAT) for
instances to be able to communicate with the outside world through
the external networks.
To create the network router:
Navigate to Network > Routers.
Click Create Router. The Create Router dialog
box opens.
Specify a meaningful name for the new router and select the external
network it needs to be plugged into. If you do not know which external
network to select, consult your cloud administrator.
Now that the router is up and running, you need to plug it into the
application private network, so it can forward the packets between your
local machine and the instance, which you will create later.
To connect the network router:
Navigate to Network > Routers.
Find the router you have just created and click on its name.
Open the Interfaces tab and click Add Interface.
The Add Interface dialog box opens.
Select the subnetwork that you provided in the first step.
Click Add Interface.
Create an instance:
Note
A virtual machine, or an instance in the OpenStack terminology,
is the machine where your application processes will be effectively
running.
Navigate to Compute > Instances.
Click Launch Instance. The Launch Instance dialog
box opens.
In the Details tab, specify a meaningful name for the new
instance, so that you can easily identify it among others later.
In the Source tab, select the Image boot source.
MOSK comes with a few prebuilt images; for the sample
application, we will use the Ubuntu Bionic Server image.
In the Flavor tab, pick the m1.small size for
the instance, which provides just enough resources for the application
to run.
In the Networks tab, select the previously created private
network to plug the instance into.
In the Security Groups tab, verify that the default security
group is selected and it allows the ingress HTTP, HTTPS, and SSH traffic.
In the Key Pair tab, create or import a new key pair
to be able to log in to the instance securely through the SSH protocol.
Make sure to have a copy of the private key on your local machine to pass
to the SSH client.
Now that all the required settings are in place, click
Launch Instance.
Wait for the new instance to appear in the Active state in
the Instances dashboard.
Attach a volume to the instance.
Note
Volumes in OpenStack provide persistent storage for applications,
allowing the data placed on them to persist independently of
the instances, as opposed to the data written to ephemeral storage,
which gets deleted together with its instance.
To create the volume:
Navigate to Volumes > Volumes.
Click Create Volume.
The Create Volume dialog box opens.
Specify a meaningful name for the new volume.
Leave all fields with the default values. A 1 GiB volume
will be enough for the sample application.
Once the volume is allocated, it shows up in the same Volumes
dashboard. Now, you can attach the new volume to your running instance:
In the Volumes dashboard, select the volume to add to
the instance.
Click Manage Attachments.
The Manage Volume Attachments dialog box opens.
Select the required instance.
Click Attach Volume.
Now, the Attached To column in the Volumes
dashboard will display your volume device name as the volume
attached to your instance. Also, you can view the status of a volume
that can be either Available or In-Use.
Expose the instance outside:
Note
A floating IP address in OpenStack is an abstraction over
a publicly routable IP address that allows an instance to be accessed
from outside the cloud. Floating IPv4 addresses are typically scarce
and expensive resources, so they need to be explicitly allocated
and assigned to selected instances.
Navigate to Compute > Instances.
From the Actions drop-down list next to your instance,
select Associate Floating IP.
The Manage Floating IP Associations dialog box opens.
Allocate a new floating IP address using the + button.
The Port to be associated should already be filled in
with the instance private port.
Click Associate.
The Compute > Instances dashboard will display the instance
floating IP address along with the private one. Write down the floating IP
address as you will need it in the next step.
Access the instance through SSH.
On your local machine, use the SSH client to log in to the instance by its
floating IP address. Ensure that the private key file has the 0600 permissions.
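For example, a minimal sketch, assuming the private key was saved as refapp-key.pem and the guest uses the default ubuntu user of Ubuntu cloud images (both names are assumptions, adjust to your setup):
chmod 600 ~/refapp-key.pem
ssh -i ~/refapp-key.pem ubuntu@<instance-floating-IP-address>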
We will run the application components as Docker containers to simplify
their provisioning and configuration and isolate them from each other.
The common Ubuntu cloud image does not have Docker engine preinstalled, so
you need to install it manually:
sudo apt-get update && sudo apt-get install -y docker.io
Our sample application consists of two Docker containers:
MySQL database server
WordPress instance
First, we create a new Docker network so that both containers can
communicate with each other:
docker network create samplenet
Now, let’s spin up the MySQL database. We will place all its data in
a separate directory on the mounted volume. You can find more information
about the parameters of the MySQL image on its Docker Hub page.
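The commands below are a minimal sketch of running both containers. They assume that the attached volume has been formatted and mounted at /mnt/data and use placeholder passwords; adjust the paths, credentials, and image tags to your needs:
# MySQL database container, storing its data on the mounted volume
sudo docker run -d --name db --network samplenet \
  -v /mnt/data/mysql:/var/lib/mysql \
  -e MYSQL_ROOT_PASSWORD=<DB_ROOT_PASSWORD> \
  -e MYSQL_DATABASE=wordpress \
  -e MYSQL_USER=wordpress \
  -e MYSQL_PASSWORD=<DB_PASSWORD> \
  mysql:5.7
# WordPress container, published on port 80 and pointed at the database container
sudo docker run -d --name wordpress --network samplenet -p 80:80 \
  -e WORDPRESS_DB_HOST=db \
  -e WORDPRESS_DB_NAME=wordpress \
  -e WORDPRESS_DB_USER=wordpress \
  -e WORDPRESS_DB_PASSWORD=<DB_PASSWORD> \
  wordpress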
Use the web browser on your local machine to navigate to the application
endpoint http://<instance-floating-IP-address>. If you have followed
all the steps accurately, your browser should now display the WordPress
Getting Started dialog.
Feel free to provide the necessary parameters and proceed with the
initialization. Once it finishes, you can proceed with building your own
cloud-hosted website and start serving users. Congratulations!
Use Heat to create and manage Tungsten Fabric objects¶
Utilizing OpenStack Heat templates is a common practice to orchestrate Tungsten
Fabric resources. Heat allows for the definition of templates, which can depict
the relationships between resources such as networks, and enforce policies
accordingly. Through these templates, OpenStack REST APIs are
invoked to create the necessary infrastructure in the correct order required to
launch applications.
Managing Tungsten Fabric resources through OpenStack Heat represents
a structured and automated approach compared to using the Tungsten Fabric UI
or API directly. The Heat templates provide a declarative mechanism to define
and manage infrastructure, ensuring repeatability and consistency across
deployments. This contrasts with the manual and potentially error-prone process
of managing resources through the Tungsten Fabric UI and API.
To orchestrate Tungsten Fabric objects through a Heat template:
Define the template with the Tungsten Fabric objects as required.
Note
You can view the full list of Heat resources available in
your environment from either OpenStack Horizon dashboard, the
Project > Orchestration > Resource Types page,
or the OpenStack CLI:
openstack orchestration resource type list
Also, you can obtain the specification of the Tungsten Fabric
configuration API by accessing
http://TF_API_ADDRESS:8082/documentation/contrail_openapi.html
on your environment.
Below is an example template showcasing a Heat topology that illustrates
the creation sequence of the following Tungsten Fabric resources: instance,
port, network, router, and external network.
Example Heat topology with Tungsten Fabric resources
heat_template_version: 2015-04-30
description: HOT template to create a Instance connected to external network
parameters:
  stack_prefix:
    type: string
    description: Prefix name for stack resources.
    default: "net-logical-router"
  project:
    type: string
    description: project for the Server
  public_network_id:
    type: string
  floating_ip_pool:
    type: string
  subnet_ip_prefix:
    type: string
    default: '192.168.96.0'
  subnet_ip_prefix_len:
    type: string
    default: '24'
  server_image:
    type: string
    description: Name of image to use for server.
    default: 'Cirros-6.0'
  availability_zone:
    type: string
    default: 'nova'
resources:
  ipam:
    type: OS::ContrailV2::NetworkIpam
    properties:
      name: { list_join: ['_', [get_param: stack_prefix, "ipam"]] }
      project: { get_param: project }
  private_network:
    type: OS::ContrailV2::VirtualNetwork
    properties:
      name: { list_join: ['_', [get_param: stack_prefix, "network"]] }
      project: { get_param: project }
      network_ipam_refs: [{ get_resource: ipam }]
      network_ipam_refs_data:
        [{ network_ipam_refs_data_ipam_subnets:
             [{ network_ipam_refs_data_ipam_subnets_subnet_name: { list_join: ['_', [get_param: stack_prefix, "subnet"]] },
                network_ipam_refs_data_ipam_subnets_subnet: {
                  network_ipam_refs_data_ipam_subnets_subnet_ip_prefix: '192.168.96.0',
                  network_ipam_refs_data_ipam_subnets_subnet_ip_prefix_len: '24' },
                network_ipam_refs_data_ipam_subnets_allocation_pools: [{
                  network_ipam_refs_data_ipam_subnets_allocation_pools_start: '192.168.96.10',
                  network_ipam_refs_data_ipam_subnets_allocation_pools_end: '192.168.96.100' }],
                network_ipam_refs_data_ipam_subnets_default_gateway: '192.168.96.1',
                network_ipam_refs_data_ipam_subnets_enable_dhcp: 'true' }] }]
  private_network_interface:
    type: OS::ContrailV2::VirtualMachineInterface
    properties:
      name: { list_join: ['_', [get_param: stack_prefix, "interface"]] }
      project: { get_param: project }
      virtual_machine_interface_device_owner: 'network:router_interface'
      virtual_machine_interface_bindings:
        { virtual_machine_interface_bindings_key_value_pair: [{
            virtual_machine_interface_bindings_key_value_pair_key: 'vnic_type',
            virtual_machine_interface_bindings_key_value_pair_value: 'normal' }] }
      virtual_network_refs: [{ get_resource: private_network }]
  instance_ip:
    type: OS::ContrailV2::InstanceIp
    properties:
      name: { list_join: ['_', [get_param: stack_prefix, "instance_ip"]] }
      fq_name: { list_join: ['_', ["fq_name", get_param: stack_prefix]] }
      virtual_network_refs: [{ get_resource: private_network }]
      virtual_machine_interface_refs: [{ get_resource: private_network_interface }]
  router:
    type: OS::ContrailV2::LogicalRouter
    properties:
      name: { list_join: ['_', [get_param: stack_prefix, "router"]] }
      project: { get_param: project }
      virtual_machine_interface_refs: [{ get_resource: private_network_interface }]
      virtual_network_refs: [{ get_param: public_network_id }]
      virtual_network_refs_data:
        [{ virtual_network_refs_data_logical_router_virtual_network_type: 'ExternalGateway' }]
  security_group:
    type: OS::ContrailV2::SecurityGroup
    properties:
      # description: SG with allowed ssh/icmp traffic
      name: { list_join: ['_', [get_param: stack_prefix, "sg"]] }
      project: { get_param: project }
      security_group_entries:
        { security_group_entries_policy_rule: [
          { security_group_entries_policy_rule_direction: '>',
            security_group_entries_policy_rule_protocol: 'any',
            security_group_entries_policy_rule_ethertype: 'IPv4',
            security_group_entries_policy_rule_src_addresses: [{
              security_group_entries_policy_rule_src_addresses_security_group: 'local' }],
            security_group_entries_policy_rule_dst_addresses: [{
              security_group_entries_policy_rule_dst_addresses_subnet: {
                security_group_entries_policy_rule_dst_addresses_subnet_ip_prefix: '0.0.0.0',
                security_group_entries_policy_rule_dst_addresses_subnet_ip_prefix_len: '0' } }] },
          { security_group_entries_policy_rule_direction: '>',
            security_group_entries_policy_rule_protocol: 'any',
            security_group_entries_policy_rule_ethertype: 'IPv6',
            security_group_entries_policy_rule_src_addresses: [{
              security_group_entries_policy_rule_src_addresses_security_group: 'local' }],
            security_group_entries_policy_rule_dst_addresses: [{
              security_group_entries_policy_rule_dst_addresses_subnet: {
                security_group_entries_policy_rule_dst_addresses_subnet_ip_prefix: '::',
                security_group_entries_policy_rule_dst_addresses_subnet_ip_prefix_len: '0' } }] },
          { security_group_entries_policy_rule_direction: '>',
            security_group_entries_policy_rule_protocol: 'icmp',
            security_group_entries_policy_rule_ethertype: 'IPv4',
            security_group_entries_policy_rule_src_addresses: [{
              security_group_entries_policy_rule_src_addresses_subnet: {
                security_group_entries_policy_rule_src_addresses_subnet_ip_prefix: '0.0.0.0',
                security_group_entries_policy_rule_src_addresses_subnet_ip_prefix_len: '0' } }],
            security_group_entries_policy_rule_dst_addresses: [{
              security_group_entries_policy_rule_dst_addresses_security_group: 'local' }] },
          { security_group_entries_policy_rule_direction: '>',
            security_group_entries_policy_rule_protocol: 'tcp',
            security_group_entries_policy_rule_ethertype: 'IPv4',
            security_group_entries_policy_rule_src_addresses: [{
              security_group_entries_policy_rule_src_addresses_subnet: {
                security_group_entries_policy_rule_src_addresses_subnet_ip_prefix: '0.0.0.0',
                security_group_entries_policy_rule_src_addresses_subnet_ip_prefix_len: '0' } }],
            security_group_entries_policy_rule_dst_addresses: [{
              security_group_entries_policy_rule_dst_addresses_security_group: 'local' }],
            security_group_entries_policy_rule_dst_ports: [{
              security_group_entries_policy_rule_dst_ports_start_port: '22',
              security_group_entries_policy_rule_dst_ports_end_port: '22' }] } ] }
  flavor:
    type: OS::Nova::Flavor
    properties:
      disk: 3
      name: { list_join: ['_', [get_param: stack_prefix, "flavor"]] }
      ram: 1024
      vcpus: 2
  server_port:
    type: OS::Neutron::Port
    properties:
      network_id: { get_resource: private_network }
      binding:vnic_type: 'normal'
      security_groups: [{ get_resource: security_group }]
  server:
    type: OS::Nova::Server
    properties:
      name: { list_join: ['_', [get_param: stack_prefix, "server"]] }
      image: { get_param: server_image }
      flavor: { get_resource: flavor }
      availability_zone: { get_param: availability_zone }
      networks:
        - port: { get_resource: server_port }
  server_fip:
    type: OS::ContrailV2::FloatingIp
    properties:
      floating_ip_pool: { get_param: floating_ip_pool }
      virtual_machine_interface_refs: [{ get_resource: server_port }]
outputs:
  server_fip:
    description: Floating IP address of server in public network
    value: { get_attr: [server_fip, floating_ip_address] }
Create an environment file to define values for the parameters
in the template file:
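For example, a minimal environment file for the template above may look as follows; all values are placeholders that must be replaced with identifiers from your cloud:
parameters:
  project: <PROJECT_ID>
  public_network_id: <EXTERNAL_NETWORK_ID>
  floating_ip_pool: <FLOATING_IP_POOL_ID>
You can then create the stack, for example, with openstack stack create -t tf-demo.yaml -e tf-demo-env.yaml tf-demo, where the file and stack names are illustrative.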
Before you start using the S3 API, ensure you have the necessary prerequisites
in place. This includes having access to an OpenStack deployment with the
Object Storage service enabled and authenticated credentials.
Verify the presence of the object-store service within the OpenStack
Identity service catalog. If the service is present, the following command
returns endpoints related to the object-store service:
openstack catalog show object-store
If the object-store service is not present in the OpenStack Identity
service catalog, consult your cloud operator to confirm that the Object
Store service is enabled in the kind:OpenStackDeployment resource
controlling your OpenStack installation. The following element must be
present in the configuration:
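The snippet below is a sketch of what such an element typically looks like in the OpenStackDeployment custom resource; verify the exact structure against your product version before applying it:
spec:
  features:
    services:
      - object-storage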
The S3 API utilizes the AWS authorization protocol, which is not directly
compatible with the OpenStack Identity service (Keystone) by default.
To access the MOSK Object Storage service using the S3
API, you should create EC2 credentials within the OpenStack Identity service:
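For example:
openstack ec2 credentials create
The command output contains the access and secret fields referenced below.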
When accessing the Object Storage service through the S3 API, take note of the
access and secret fields. These values serve as respective equivalents
for the access_key and secret_access_key options, or similarly named
parameters, within the S3-specific tools.
To interact seamlessly with OpenStack Object Storage through the S3 API,
familiarize yourself with essential S3-specific tools, such as
s3cmd, the AWS Command Line Interface (CLI), and Boto3 SDK for
Python.
This section provides concise yet comprehensive configuration examples for
utilizing these S3-specific tools, allowing users to interact with
Amazon S3 and other cloud storage providers that employ the S3 protocol.
S3cmd is a free command-line client
designed for uploading, retrieving, and managing data across various cloud
storage service providers that utilize the S3 protocol, including Amazon S3.
Example of a minimal s3cfg configuration:
[default]
# use 'access' value from "openstack ec2 credentials create"
access_key = a354a74e0fa3434e8039d0425f7a0b59
# use 'secret' value from "openstack ec2 credentials create"
secret_key = d7c2ca9488dd4c8ab3cff2f1aad1c683
# use hostname of the "openstack-store" service, without protocol
host_base = openstack-store.it.just.works
# important, leave empty
host_bucket =
When configured, you can use s3cmd as usual:
s3cmd -c s3cfg ls                                         # list buckets
s3cmd -c s3cfg mb s3://my-bucket                          # create a bucket
s3cmd -c s3cfg put myfile.txt s3://my-bucket              # upload file to bucket
s3cmd -c s3cfg get s3://my-bucket/myfile.txt myfile2.txt  # download file
s3cmd -c s3cfg rm s3://my-bucket/myfile.txt               # delete file from bucket
s3cmd -c s3cfg rb s3://my-bucket                          # delete bucket
The AWS CLI stands as the
official and powerful command-line interface provided by Amazon Web Services
(AWS). It serves as a versatile tool that enables users to interact with
AWS services directly from the command line. Offering a wide range of
functionalities, the AWS CLI facilitates diverse operations, including but
not limited to resource provisioning, configuration management, deployment,
and monitoring across various AWS services.
To start using the AWS CLI:
Set the authorization values as shell variables:
# use "access" field from created ec2 credentialsexportAWS_ACCESS_KEY_ID=a354a74e0fa3434e8039d0425f7a0b59
# use "secret" field from created ec2 credentialsexportAWS_SECRET_ACCESS_KEY=a354a74e0fa3434e8039d0425f7a0b59
Explicitly provide the --endpoint-url set to the endpoint
of the openstack-store service to every aws CLI command:
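For example, reusing the host name from the s3cfg example above (replace it with the endpoint of your own cloud):
aws --endpoint-url https://openstack-store.it.just.works s3api list-buckets
aws --endpoint-url https://openstack-store.it.just.works s3 cp myfile.txt s3://my-bucket/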
Boto3
is the official Python3 SDK (Software Development Kit) specifically designed
for Amazon Web Services (AWS), providing comprehensive support for various
AWS services, including the S3 API for object storage. It offers extensive
functionality and tools for developers to interact programmatically with
AWS services, facilitating tasks such as managing, accessing, and manipulating
data stored in Amazon S3 buckets.
Presuming that you have configured the environment with the same environment
variables as in the example for AWS CLI, you can create an S3 client
in Python as follows:
import boto3, os

# high-level "resource" interface
s3 = boto3.resource("s3", endpoint_url=os.getenv("S3_API_URL"))
for bucket in s3.buckets.all():
    # returns rich objects
    print(bucket.name)

# low-level "client" interface
s3 = boto3.client("s3", endpoint_url=os.getenv("S3_API_URL"))
buckets = s3.list_buckets()  # returns raw JSON-like dictionaries
MOSK enables users to configure and run Windows guests
on OpenStack, which allows for optimization of cloud infrastructure for
diverse workloads. This section delves into the nuances of achieving seamless
integration between the Windows operating system and MOSK
clouds.
Also, you have the option to set up Windows guests in a way that supports
UEFI Secure Boot and includes an emulated virtual Trusted Platform Module
(TPM). This configuration enhances security features for your Windows
virtual machines within the OpenStack environment.
Note
Windows 11 imposes a security system requirement, necessitating
the activation of UEFI Secure Boot and ensuring that TPM version 2.0
is enabled.
Configuration example for the image with Windows 11:
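The following is an illustrative sketch that sets the commonly used UEFI, Secure Boot, and vTPM image properties through the openstack CLI; the image name is a placeholder, and your deployment may require a different set of properties:
openstack image set <WINDOWS11_IMAGE> \
  --property hw_firmware_type=uefi \
  --property os_secure_boot=required \
  --property hw_tpm_version=2.0 \
  --property hw_tpm_model=tpm-crb \
  --property hw_machine_type=q35 \
  --property os_type=windows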
You can configure the UEFI Secure Boot support through flavor extra specs or
image metadata properties. For x86_64 hosts, enabling secure boot also
necessitates configuring the use of the Q35 machine type.
MOSK enables you to configure this on a per-guest basis
using the hw_machine_type image metadata property.
Configuration example for the image that meets both requirements:
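For example, an illustrative image configuration enabling Secure Boot together with the Q35 machine type (the image name is a placeholder):
openstack image set <IMAGE> \
  --property hw_firmware_type=uefi \
  --property os_secure_boot=required \
  --property hw_machine_type=q35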
A vTPM can be requested for a server through either flavor extra specs or
image metadata properties. There are two supported TPM versions: 1.2 and 2.0,
along with two models: TPM Interface Specification (TIS) and Command-Response
Buffer (CRB). Notably, the CRB model is only supported with version 2.0.
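For example, a sketch of requesting a TPM 2.0 device with the CRB model through flavor extra specs (the flavor name is a placeholder):
openstack flavor set <FLAVOR> \
  --property hw:tpm_version=2.0 \
  --property hw:tpm_model=tpm-crb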
The Dynamic Resource Balancer (DRB) service automatically moves OpenStack
instances around to achieve more optimal resource usage in a
MOSK cluster.
Consult your cloud administrator to determine if this service is enabled
in your cloud and which mode is used. The DRB service relies on the OpenStack
live migration mechanism to ensure that instances can be seamlessly moved
to another hypervisor of the service's choice.
Note
The live migration mode supports local block storage. The live
migration mechanism automatically determines whether Nova should
migrate using a local block storage or a shared storage.
Depending on the nature of your workload and configuration of the
MOSK cluster, you can explicitly configure the
DRB service in two ways:
Opt out of automatic instance migration if your instances are sensitive
to live migration. For example, if the instances rely on special local
resources such as SR-IOV-based virtual NICs or cannot tolerate CPU
throttling.
Opt in to have your instance placement optimized at any time. If the
DRB service is configured not to move instances by default,
this allows your applications to be relocated away from noisy neighbors
that consume excessive shared resources on a hypervisor.
If the DRB service in your MOSK cluster is configured to
auto-migrate all instances by default, you, as the owner of the instance,
can opt out of such automated migrations.
To achieve this, tag your instances with lcm.mirantis.com:no-drb.
For example, using the openstack CLI:
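# Illustrative only: replace <SERVER_NAME_OR_ID> with the name or ID of your instance
openstack server set --tag lcm.mirantis.com:no-drb <SERVER_NAME_OR_ID>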
Successful execution of the command above produces no output. To revert to
the default behavior, use the openstack server unset command in
a similar way.
Note
OpenStack instance tags are distinct from metadata (key-value pairs).
Therefore, use instance tagging explicitly for this purpose.
If the DRB service in your MOSK cloud is configured to
move only specific instances, in order for the placement of your instances
to get automatically optimized, you need to explicitly tag each instance with
lcm.mirantis.com:drb. For example, using the openstack CLI:
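# Illustrative only: replace <SERVER_NAME_OR_ID> with the name or ID of your instance
openstack server set --tag lcm.mirantis.com:drb <SERVER_NAME_OR_ID>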
Successful execution of the command above produces no output. To revert
to the default behavior, use the openstack server unset command in
a similar way.
Note
OpenStack instance tags are distinct from metadata (key-value pairs).
Therefore, use instance tagging explicitly for this purpose.
MOSK provides the capability to perform instance migrations
for the non-administrative users of the OpenStack cloud.
Consult your cloud administrator to ensure this functionality is available
to you. If it is, you may have access to cold migration, live migration,
or both. Refer to Instance migration to learn more about these migration
types in MOSK.
If migration is available to you as a non-administrative user, it is,
by default, a completely scheduler-controlled type of migration. As a user,
you do not have the option to select the target host for your instance.
Instead, the Compute service scheduler automatically selects the best-suited
target host, if one is available.
To perform migrations, you can use any preferred method, including direct
API interactions, CLI tools (the openstack client), or
the OpenStack Dashboard service.
This tutorial provides step-by-step instructions on how to use the Neutron
Trunk extension in your project infrastructure. By following this guide,
you will learn how to configure trunk ports in OpenStack Neutron, enabling
efficient network segmentation and traffic management.
The Neutron Trunk extension allows a single virtual machine (VM) to connect
to multiple networks using a single port. This is achieved by designating
one port as the parent port, which handles untagged IP packets, while
additional subports receive tagged packets through the IEEE 802.1Q VLAN
protocol.
The Neutron Trunk extension is enabled by default.
We create trunk_subport using the same MAC address as
its parent trunk_port. Neutron developers recommend this approach
to avoid issues with ARP spoof protection and the native OVS firewall driver.
By following this tutorial, you have successfully configured a trunk port in
OpenStack Neutron. VM3 can now communicate with both net_A and net_B
through a single interface using VLAN segmentation. This setup enables
efficient network management and reduces the number of required ports,
simplifying your infrastructure.
For further customization, refer to the official OpenStack Neutron
documentation on trunk port configurations.
MOSK enables cloud users to mark their instances for LCM to
handle them individually during host maintenance operations, such as host
reboots or data plane restarts. This can be useful if some instances in your
environment are sensitive to live migration, allowing you, as a cloud user,
to communicate your requirements effectively to the cloud operator and
streamline cluster maintenance.
To mark the instances that require individual handling during host
maintenance, assign the
openstack.lcm.mirantis.com:maintenance_action=<ACTION-TAG>
tag to them using the Nova API:
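# Illustrative only: replace the placeholders with an actual action tag and instance name or ID
openstack server set --tag openstack.lcm.mirantis.com:maintenance_action=<ACTION-TAG> <SERVER_NAME_OR_ID>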
Below is the table that describes the supported tag values.
Maintenance action tags for instance migration handling¶
Tag value
Description
poweroff
The instance can be gracefully powered off during a host reboot.
This is equivalent to the skip mode in
openstack.lcm.mirantis.com/instance_migration_mode for instance
migration configuration for hosts.
live_migrate
The instance can be live-migrated during maintenance.
This is equivalent to the live mode in
openstack.lcm.mirantis.com/instance_migration_mode for instance
migration configuration for hosts.
notify
The user must be explicitly notified about planned host maintenance.
This is equivalent to the manual mode in
openstack.lcm.mirantis.com/instance_migration_mode for instance
migration configuration for hosts.
This guide provides recommendations on how to effectively use product
capabilities to harden the security of a Mirantis OpenStack for Kubernetes
(MOSK) deployment.
Note
The guide is under development and will be updated
with new sections in future releases of the product documentation.
MOSK services can emit notifications in the Cloud Auditing
Data Federation (CADF) format, which is
a standardized format for event data. The information contained in such
notifications describes every action users perform in the cloud and is
commonly used by organizations to perform security audits and intrusion
detection.
Currently, the following MOSK services support the emission
of CADF notifications:
Compute service (OpenStack Nova)
Block Storage service (OpenStack Cinder)
Images service (OpenStack Glance)
Networking service (OpenStack Neutron)
Orchestration service (OpenStack Heat)
DNS service (OpenStack Designate)
Bare Metal service (OpenStack Ironic)
Load Balancing service (OpenStack Octavia)
CADF notifications are enabled in the features:logging:cadf section of
the OpenStackDeployment custom resource. For example:
spec:
  features:
    logging:
      cadf:
        enabled: true
The way the notification messages get delivered to the consumers is
controlled by the notification driver setting. The following options are
supported:
messagingv2 - Default
Messages get posted to the notifications.info queue in
the MOSK message bus, which is RabbitMQ
log
Messages get posted to a standard log output and then collected
by Mirantis StackLight
MOSK generates all credentials used internally, including
two types of credentials generated during the OpenStack deployment:
Credentials for admin users provide unlimited access and enable the
initial configuration of cloud entities. Three sets of such credentials
are generated for accessing the following services:
OpenStack database
OpenStack APIs (OpenStack admin identity account)
OpenStack messaging
Credentials for OpenStack service users are generated for each deployed
OpenStack service. To operate successfully, OpenStack services require
three sets of credentials for accessing the following services:
OpenStack database
OpenStack APIs (OpenStack service identity account)
OpenStack messaging
To enhance the information security level, Mirantis recommends changing
the passwords of internally used credentials periodically. We suggest
changing the credentials every month. MOSK
includes an automated routine for changing credentials, which must
be triggered manually.
Restarting OpenStack services is necessary to apply new credentials.
Therefore, it is crucial to have a smooth transition period to minimize
the downtime for the OpenStack control plane. To achieve this, perform
the credential rotation as described in Rotate OpenStack credentials.
Ceph monitors use their node host networks to interact with Ceph daemons.
Ceph daemons communicate with each other over a specified cluster network
and provide endpoints over the public network.
The messenger V2 (msgr2) or earlier V1 (msgr) protocols are used for
communication between Ceph daemons.
Ceph daemon
Network
Protocol
Port
Description
Consumers
Manager (mgr)
Cluster network
msgr/msgr2
6800,
9283
Listens on the first available port of the 6800-7300 range.
Uses 9283 port for exporting metrics.
csi-rbdplugin,
csi-rbdprovisioner,
rook-ceph-mon
Metadata server (mds)
Cluster network
msgr/msgr2
6800
Listens on the first available port of the 6800-7300 range
csi-cephfsplugin,
csi-cephfsprovisioner
Monitor (mon)
LCM host network
msgr/msgr2
msgr:6789,
msgr2:3300
Monitor has separate ports for msgr and msgr2
Ceph clients
rook-ceph-osd,
rook-ceph-rgw
Ceph OSD (osd)
Cluster network
msgr/msgr2
6800-7300
Binds to the first available port from the 6800-7300 range
Ceph Controller uses the NetworkPolicy objects for each Ceph daemon.
Each NetworkPolicy is applied to a pod with defined labels in the
rook-ceph namespace. It only allows the use of the ports specified in the
NetworkPolicy spec. Any other port is prohibited.
Ceph daemon
Pod label
Allowed ports
Manager (mgr)
app=rook-ceph-mgr
6800-7300,
9283
Monitor (mon)
app=rook-ceph-mon
3300,
6789
Ceph OSD (osd)
app=rook-ceph-osd
6800-7300
Metadata server (mds)
app=rook-ceph-mds
6800-7300
Ceph Object Storage (rgw)
app=rook-ceph-rgw
Value from spec.cephClusterSpec.objectStorage.rgw.gateway.port,
Value from spec.cephClusterSpec.objectStorage.rgw.gateway.securePort
Communications between Mirantis OpenStack for Kubernetes (MOSK)
components are provided by the Calico networking. All internal communications
occur through the Calico tunnel through the VXLAN or WireGuard protocols.
Note
Since Container Cloud 2.29.0 (Cluster releases 17.4.0 and 16.4.0),
WireGuard is deprecated. If you still require the feature, contact
Mirantis support for further information.
Caution
These ports are only used for in-cluster communications. Open them
only to a trusted network and never at a perimeter firewall.
Component
Protocol
Port
Description
Calico VXLAN
UDP
4792
Calico networking with VXLAN enabled
Calico WireGuard
UDP
51820
Calico networking with IPv4 WireGuard enabled
In-cluster communications between MetalLB speaker components are done using
the LCM network. MetalLB components also provide metrics to be collected
by StackLight.
Caution
These ports are only used for in-cluster communications. Open
them only to a trusted network and never at a perimeter firewall.
OpenStack provides operators with fine-grained control over access
to API endpoints and actions through access policies. These policies
allow cloud administrators to restrict or grant access based on roles
and the current request context, such as the project, domain, or system.
OpenStack services come with a set of default policy rules that are
generally sufficient for most users. However, for specific use cases,
these policies may need to be modified.
MOSK enables you to define custom policies through
the OpenStackDeployment custom resource. For configuration details,
refer to features:policies.
With the legacy default policies, only the admin role has a dedicated
meaning. Granting this role to a user in any context provides global
administrative access to the service APIs.
Any other role within the project grants the user standard access, enabling
them to create resources as well.
The new default policies are based on the enhanced capabilities of updated
OpenStack Keystone. They incorporate the hierarchical default roles, such
as reader, member, and admin, as well as system scope.
If a policy rule is explicitly defined by the deployment or by the cloud
operator through the OpenStackDeployment custom resource, only that
rule is enforced.
If no explicit API access rule is set, MOSK applies both
the legacy and new policy sets simultaneously. Each API access request is
checked by both sets, and access is granted if either of the policy sets
allows it. This behavior is controlled by the
[oslo_policy] enforce_new_defaults configuration option, which is set
individually for each OpenStack service. Setting this option to True
ensures that API access to this service is evaluated only against
the new default policies.
Caution
Mirantis does not recommend enforcing the new default policies.
Our test results indicate that these policies are not yet consistently
reliable across all services. Additionally, as of the OpenStack Antelope
release, the new default policies have not undergone extensive testing
in the upstream development.
Enforcing or using the new default policies may lead to unexpected
consequences potentially affecting LCM operations such as running
Tempest tests, performing automatic live migrations during node
maintenance, and so on.
MOSK deploys OpenStack with certain upstream policies
customized and additional fine-grained policies that are not present in
upstream. The following list provides details on these policies.
A user with this role can decode any Barbican secret in any project.
This role is specifically granted to the service user performing
automatic instance live migrations during node maintenance. Granting this role to the service user
enables them to live migrate instances that use encrypted volumes.
By default, upstream policies restrict secret decryption to either
the user who created the secret or the administrator of the corresponding
project.
A user that created an order can also delete that order¶
Available since MOSK 23.1
A user can automatically clean up orders, preventing them from accumulating
and causing the Barbican database to grow uncontrollably.
By default, upstream policies allow a user with the creator role to create
orders. However, they restrict order deletion to the project administrator.
By default, upstream policies restrict live migration to administrative users
only, without the ability to distinguish between different types of live
migration.
A cloud operator can define flexible rules to control assignment and removal
of specific server tags to and from OpenStack instances. These rules allow
the operator to restrict tag assignment and removal based on their value.
Per-tag server tag policies include the following:
os_compute_api:os-server-tags:update:{tag_name}
Restricts access to the APIs for creating instances, adding tags, and
replacing tags
os_compute_api:os-server-tags:delete:{tag_name}
Restricts access to the APIs for deleting a single tag, deleting all tags,
and replacing existing tags
For example, to ensure that only administrators can exclude specific
instances from migration by the DRB service,
the operator can configure the following policies through the
OpenStackDeployment custom resource:
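The snippet below is an illustrative sketch of such a configuration; the exact structure under features:policies and the rule expressions may differ in your product version, so treat the keys and values as assumptions to verify against the reference documentation:
spec:
  features:
    policies:
      nova:
        "os_compute_api:os-server-tags:update:lcm.mirantis.com:no-drb": "rule:context_is_admin"
        "os_compute_api:os-server-tags:delete:lcm.mirantis.com:no-drb": "rule:context_is_admin"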
MOSK offers various mechanisms to ensure data integrity
and confidentiality. This section provides an overview of the data protection
capabilities available in MOSK.
This section provides an overview of the data protection capabilities
available in MOSK, focusing primarily on data
encryption. You will gain insights into different data encryption
features of MOSK, understand the type of data they
protect, where encryption occurs concerning cloud boundaries, and
whether these mechanisms are available by default or require explicit
enablement by the cloud operator or cloud user.
Live migration enables the seamless movement of a running instance to another
node within the cluster, ensuring uninterrupted access to the virtual workload.
In MOSK, the native TLS encryption feature is available
for QEMU and libvirt, securing all data transports, including disks not on
shared storage. Additionally, the libvirt daemon exclusively listens to
TLS connections.
To establish a TLS environment, encompassing CA, server, and client
certificates, the relevant compute nodes automatically generate these
components. By default, these certificates are encrypted with a 2048-bit
RSA private key and are valid for 3650 days.
You can easily enable live migration over TLS by configuring the
features:nova:libvirt:tls parameter in the OpenStackDeployment custom
resource. For reference, see Configuring live migration.
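An illustrative sketch of the corresponding OpenStackDeployment fragment; verify the exact parameter path against the referenced configuration procedure:
spec:
  features:
    nova:
      libvirt:
        tls:
          enabled: true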
Caution
Instances started before enabling secure live migration will not
support live migration.
The issue arises due to the SSL certificates for live migration with QEMU
native TLS being generated during the service update. Thus, these
certificates do not exist in the libvirt container when existing instances
were started. Consequently, QEMU processes of those instances lack
the required SSL certificate information, leading to migration failures
with an internal error:
internal error: unable to execute QEMU command ‘object-add’: Unable to access credentials /etc/pki/qemu/ca-cert.pem: No such file or directory
As a workaround, stop and then start the instances that failed to live
migrate. This process will create new QEMU processes within the libvirt
container, ensuring the availability of TLS certificate details.
In a cloud infrastructure, the components comprising the cloud control plane
exchange messages that may contain sensitive information, such as cloud
configuration details, application and cloud user credentials, and other
essential data that an attacker can use to hijack the cloud.
Encrypting the control plane traffic is crucial for data confidentiality
and overall security of the cloud.
MOSK offers the ability to encrypt its control plane
communication by means of encapsulating the in-cluster traffic of the
underlying Kubernetes into a WireGuard mesh network built across its nodes.
Note
Since Container Cloud 2.29.0 (Cluster releases 17.4.0 and 16.4.0),
WireGuard is deprecated. If you still require the feature, contact
Mirantis support for further information.
When an attacker is able to intercept the traffic between the nodes of
a MOSK cluster but does not have access to the nodes
themselves, WireGuard ensures the following:
Data confidentiality
Any intercepted traffic remains unreadable, especially the traffic of those
components of the MOSK control plane that do not enable
SSL/TLS encryption on the application level and rather rely on the
underlying networking layer.
Data integrity
Alterations in traffic are detectable, ensuring that no tampering has
occurred during transit.
Authentication
Only machines with valid cryptographic credentials can join the network
and exchange data.
The following control plane components can have their communications protected
with the WireGuard encryption:
OpenStack database (MariaDB)
OpenStack message bus (RabbitMQ)
OpenStack internal API
OpenStack services interacting with auxiliary components, such as memcached,
RedisDB, and PowerDNS
Interaction between StackLight internal components, including collection
of metrics from OpenStack, Ceph, and other subsystems
Tungsten Fabric auxiliary components that include ZooKeeper, Kafka, Cassandra
database, Redis database, and RabbitMQ
Communications not protected by WireGuard encryption¶
All components of the cloud control plane that require explicit firewall rule
configuration, as per the MOSK firewall configuration guide, utilize the
Kubernetes host network mode for their pods and, therefore, cannot be
protected by WireGuard.
By default, the WireGuard encryption of the control plane
communications is not enabled in MOSK. However, it is
possible to enable the encryption upon initial deployment or later.
When enabling WireGuard, make sure to configure the Calico MTU size correctly.
It must be at least 60 bytes smaller than the interface MTU size of the
workload network.
WireGuard uses public-private ECDH key pairs for secure handshake between
the nodes of the cluster. Each node obtains its unique pair, with the
public key shared across other nodes. A key pair persists indefinitely
unless the node is reprovisioned and re-added to the cluster.
The handshake procedure establishes symmetric keys used for traffic
encryption and automatically re-occurs every few minutes to ensure
data security.
While WireGuard is designed for efficiency, enabling encryption introduces
some overhead.
Caution
The impact can vary depending on the cloud scale and usage
profile.
You may experience the following:
A slight increase in CPU utilization on the MOSK
cluster nodes.
Less than 30% loss of network throughput, which, given the cluster is
designed according to Mirantis recommendations, does not impact control
plane communications of an average cloud.
This section provides insights into the standards and regulatory requirements
that MOSK adheres to, ensuring a secure and compliant
environment that you can trust.
Federal Information Processing Standard Publication (FIPS) outlines security
requirements for cryptographic modules used by the US government and
its contractors to protect sensitive and valuable information. It categorizes
the level of security provided by these modules, ranging from level 1 to level
4, with each level having progressively stringent security measures.
The FIPS mode within OpenStack verifies that its cryptographic algorithms and
modules strictly conform to approved standards. This is crucial for several
reasons:
Regulatory compliance
Many government agencies and industries dealing with sensitive data,
such as finance and healthcare, require FIPS-140 compliance as a regulatory
mandate. Ensuring compliance enables organizations to operate within legal
boundaries and meet industry standards.
Data security
FIPS-140 compliance ensures a higher level of security for cryptographic
functions, protecting sensitive information from unauthorized access and
manipulation. FIPS-compliant environments have a high level of security
for data encryption, digital signatures, and the integrity of communication
channels.
Interoperability
FIPS-140 compliance can enhance interoperability by ensuring that systems
and cryptographic modules across different platforms or vendors meet
a standard set of security requirements. This is essential, especially
in multi-cloud or interconnected environments.
MOSK ensures that the user-to-cloud communications are
always protected in compliance with FIPS 140-2. The capability is implemented
as an SSL/TLS proxy injected into the MOSK underlying Kubernetes
ingress networking that performs the data encryption using a FIPS-validated
cryptographic module.
Container Cloud uses policy-controller for signature validation of
pod images. It verifies that images used by the Container Cloud and
Mirantis OpenStack for Kubernetes (MOSK) controllers are signed by a
trusted authority. The policy-controller inspects defined image policies
that list image registries and authorities for signature validation.
The policy-controller validates only pods with image references from
the Container Cloud content delivery network (CDN). Other registries are
ignored by the controller.
The policy-controller supports two modes of image policy validation for
Container Cloud and MOSK images:
warn
Default. Allows controllers to use untrusted images, but a warning message
is logged in the policy-controller logs and sent as an admission
response.
enforce
Experimental. Blocks pod creation and update operations if a pod image
does not have a valid Mirantis signature. If a pod creation or update is
blocked in the enforce mode, send the untrusted artifact to
Mirantis support for further
inspection. To unblock pod operations, switch to the warn mode.
Warning
The enforce mode is still under development and is available
as an experimental option. Mirantis does not recommend enabling this option
for production deployments. The full support for this option will be
announced separately in one of the following Container Cloud releases.
In case of unstable connections from the policy-controller to Container
Cloud CDN that disrupt pod creation and update operations, you can disable
the controller by setting enabled: false in the configuration.
The policy-controller configuration is located in the Cluster object:
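A heavily simplified sketch of disabling the controller, assuming it is configured as a Helm release entry in the Cluster object; the exact location of the policy-controller values in your Cluster object may differ:
spec:
  providerSpec:
    value:
      helmReleases:
        - name: policy-controller
          values:
            enabled: false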
This section provides answers to common questions about
MOSK, and it is designed to help you quickly find
the information you need. We have included answers to the most common
questions and uncertainties that our users encounter, along with helpful
tips and references to step-by-step instructions where required.
The questions with answers in this section are organized by topic.
If you cannot find the information you are looking for in this section,
search in the whole documentation set. Also, do not hesitate to contact us
through the Feedback button. We are always available to answer
your questions and provide you with the assistance you need to use our product
effectively.
What is the difference between patch and major release versions?¶
Both major and patch release versions incorporate solutions for security
vulnerabilities and known product issues. The primary distinction between
these two release types lies in the fact that major release versions
introduce new functionalities, whereas patch release versions predominantly
offer minor product enhancements.
Patch releases strive to considerably reduce the timeframe for delivering
CVE resolutions in images to your deployments, aiding in the mitigation
of cyber threats and data breaches.
Content
Major release
Patch release
Version update and upgrade of the major product components including
but not limited to OpenStack, Tungsten Fabric, Kubernetes, Ceph, and
StackLight
Container runtime changes including Mirantis Container Runtime and
containerd updates
Changes in public API
Changes in the Container Cloud and MOSK lifecycle management including
but not limited to machines, clusters, Ceph OSDs
Host machine changes including host operating system and kernel updates
Patch version bumps of MKE and Kubernetes
Fixes for Common Vulnerabilities and Exposures (CVE) in images
A product release series is a series of consecutive releases that starts with
a major release and includes a number of patch releases built on top of
the major release.
For example, the 23.1 series includes the 23.1 major release and 23.1.1,
23.1.2, 23.1.3, and 23.1.4 patch releases.
What is the difference between the new and old update schemes¶
Apply patch updates only if you want to receive security fixes as soon as they
become available and you are prepared to update your cluster often,
approximately once in three weeks.
Otherwise, you can skip patch releases and update only between major releases.
Each subsequent major release includes patch release updates of the previous
major release.
When planning the update path for your cluster, take into account the release
support status included in Release Compatibility Matrix.
Can I skip patch releases within a single series since 24.1 series?¶
Yes.
Can I skip patch releases within a single series before 24.1 series?¶
Yes.
Additionally, before the MOSK 24.1 series, it is technically
not possible to update to any intermediate release version if the newer patch
version has been released. You can update only to the latest available patch
version in the series, which contains the updates from all the preceding
versions. For example, if your cluster is running MOSK
23.1 and the latest available patch version is MOSK 23.1.2,
you must update to 23.1.2 receiving the product updates from 23.1.1 and 23.1.2
at one go.
Moreover, if between the two major releases you apply at
least one patch version belonging to the N series, you have to obtain
the last patch release in the series to be able to update to the N+1
major release version.
How do I update to a patch version within the same series?¶
When updating between the patches of the same series, follow the
Update to a patch version procedure.
When updating between the series, follow the Cluster update procedure.
How do I update to the next major version
if I started receiving patches of the previous series?¶
Caution
This answer applies only to MOSK 24.1 series.
Starting from MOSK 24.1.5, Mirantis introduces a new
update scheme allowing for the update path flexibility.
Firstly, if you started receiving patch updates from the previous release
series, update your cluster to the latest patch release in that series as
described in the Update to a patch version procedure.
After, follow the Cluster update procedure to update from the latest
patch release in the series to the next major release. It is technically
impossible to receive a major release while on any patch release in the
previous series other than the last one.
This document provides a high-level overview of new features, known issues,
and bug fixes included in the latest MOSK release.
It also includes lists of release artifacts and fixed Common Vulnerabilities
and Exposures (CVEs). The content is intended to help product users, operators,
and administrators stay informed about key changes and improvements to the
platform.
In addition to release highlights, the document includes update notes for
each MOSK release. These notes support a smooth and
informed cluster update process by outlining the update impact, critical
pre- and post-update steps, and other relevant information.
Introduced IP address capacity monitoring, enabling cloud operators to
better manage routable IP addresses. By providing insights into capacity usage,
this monitoring capability helps predict future cloud needs, prevent service
disruptions, and optimize the allocation of external IP address pools.
Synchronization of local MariaDB backups with remote S3 storage¶
TechPreview
Implemented the capability to synchronize local MariaDB backups with a remote
S3 storage ensuring data safety through secure authentication and server-side
encryption for stored archives.
Implemented support for introspective instance monitor in the Instance High
Availability (HA) service to improve the reliability and availability of
OpenStack environments by continuously monitoring virtual machines
for critical failure events. These include operating system crashes,
kernel panics, unresponsive states, and so on.
Restricting tag assignments on OpenStack instances¶
Implemented the capability that enables cloud operators to define flexible
rules to control assignment and removal of specific tags to and from
OpenStack instances. The per-tag server tag policies allow the operator
to restrict tag assignment and removal based on tag values.
Implemented the capability that enables the cloud users to mark instances
that should be handled individually during host maintenance operations,
such as host reboots or data plane restarts. This provides greater
flexibility during cluster updates, especially for workloads that are
sensitive to live migration.
To mark the instances that require individual handling during host
maintenance, one of the following values for the
openstack.lcm.mirantis.com:maintenance_action=<ACTION-TAG> server tag
can be used: poweroff, live_migrate, or notify.
Enabled cloud operators to configure Message of the Day (MOTD) in the
MOSK Dashboard (OpenStack Horizon). This feature allows
cloud operators to communicate critical information, such as infrastructure
issues, scheduled maintenance, and other important events, directly to users.
Added the capability for cloud users to specify the type of the volume to be
created when launching instances using Image (with
Create New Volume selected) as a boot source through the
MOSK Dashboard (OpenStack Horizon). The default selection
is the default volume type as returned by the Cinder API.
This enhancement provides greater control and an improved user experience
for instance configuration through the web UI.
Enhanced cloud security by providing the capability to enable encryption
of OpenStack database backups, both local and remote, using the OpenSSL
aes-256-cbc encryption through the OpenStackDeployment custom
resource.
The OpenStack Controller, which is the central component of
MOSK and is responsible for the life cycle management of
OpenStack services running in Kubernetes containers, has been open-sourced
under the new name Rockoon and will be maintained as an independent
open-source project going forward.
As part of this transition, all openstack-controller pods are now
named rockoon across the MOSK documentation and
deployments. This change does not affect functionality, but users should
update any references to the previous pod names accordingly.
Introduced automatic Cassandra database repairs for Tungsten Fabric through
the tf-dbrepair-job CronJob. This enhancement allows users to enable
scheduled repairs, ensuring the health and consistency of their Cassandra
clusters with minimal manual intervention.
Reworked the following agent-related and service-related alerts from the
cluster-wide to the host-wide scope, including the corresponding changes in the
inhibition rules:
CinderServiceDown
NeutronAgentDown
NovaServiceDown
This enhancement allows the operator to better operate environments on a large
scale.
Reworked monitoring of RabbitMQ by implementing the following changes:
Switched from the obsolete prometheus-rabbitmq-exporter job to the
rabbitmq-prometheus-plugin one, which is based on the native
RabbitMQ Prometheus plugin, ensuring reliable and direct metric collection.
Introduced the RabbitMQ Overview Grafana dashboard and reworked
all alert rules to utilize metrics from the RabbitMQ Prometheus plugin. This
dashboard replaces the deprecated RabbitMQ dashboard, which will
be removed in one of the following releases.
Introduced the RabbitMQ Erlang Grafana dashboard to further
enhance RabbitMQ monitoring capabilities.
Reworked RabbitMQ alerts:
Added the RabbitMQTargetDown alert.
Renamed RabbitMQNetworkPartitionsDetected to
RabbitMQUnreachablePeersDetected.
Deprecated RabbitMQDown and RabbitMQExporterTargetDown. They will
be removed in one of the following releases.
Warning
If you use deprecated RabbitMQ metrics in customizations such as
alerts and dashboards, switch to the new metrics and dashboards within the
course of the MOSK 25.1 series to prevent issues once the
deprecated metrics and dashboard are removed.
Hiding sensitive ingress data of Ceph public endpoints¶
Introduced the ability to securely store ingress Transport Layer Security (TLS)
certificates for Ceph Object Gateway public endpoints in a secret object. This
feature leverages the tlsSecretRefName field in the Ceph cluster spec,
enhancing security by preventing the exposure of sensitive data associated with
Ceph public endpoints.
On existing clusters, Mirantis recommends updating the Ceph cluster spec
by replacing fields containing TLS certificates with tlsSecretRefName as
described in Hide sensitive ingress data for Ceph public endpoints.
Note
Since MOSK 25.1, the ingress field of the
Ceph cluster spec is automatically replaced with the ingressConfig
field.
Added support for Rook 1.14.10 along with support for Ceph CSI v3.11.0.
The updated Rook version contains the following new features included
in the Ceph Controller API:
Introduced the ability to define a custom monitor endpoint using the
monitorIP field located in the nodes section of the
KaasCephCluster CR. This field allows defining the monitor IP address
from the Ceph public network range. For example:
roles: ["mon", "mgr"]
monitorIP: "196.168.13.1"
Added support for balancer mode for the Ceph Manager balancer module
using the settings.balancerMode field in the KaasCephCluster CR.
For example:
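A minimal sketch, assuming that the settings section resides under cephClusterSpec and using one of the standard Ceph balancer modes (upmap):
spec:
  cephClusterSpec:
    settings:
      balancerMode: upmap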
To allow the operator to use the GitOps approach, implemented the
BareMetalHostInventory resource, which must be used instead of
BareMetalHost for adding and modifying the configuration of bare metal servers.
The BareMetalHostInventory resource monitors and manages the state of a
bare metal server and is created for each Machine with all information
about the machine hardware configuration.
Each BareMetalHostInventory object is synchronized with an automatically
created BareMetalHost object, which is now used for internal purposes of
the Container Cloud private API.
Caution
Any change in the BareMetalHost object will be overwritten by
BareMetalHostInventory.
For any existing BareMetalHost object, a BareMetalHostInventory object
is created automatically during cluster update.
Caution
While the Cluster release of the management cluster is 16.4.0,
BareMetalHostInventory operations are allowed only for
m:kaas@management-admin. This limitation is lifted once the
management cluster is updated to the Cluster release 16.4.1 or later.
Introduced automatic pausing of a MOSK cluster update using
the UpdateAutoPause object. The operator can now define specific StackLight
alerts that trigger auto-pause of an update phase execution. The feature
enhances update management of MOSK clusters by preventing
harmful changes from propagating across the entire cloud.
Granular cluster update through the Container Cloud web UI¶
Implemented the ability to granularly update a MOSK cluster
in the Container Cloud web UI using the ClusterUpdatePlan object. The
feature introduces a convenient way to perform and control every step of a
MOSK cluster update.
MOSK 25.1 introduces switching of the default container
runtime for the underlying Kubernetes cluster from Docker to containerd on
greenfield deployments. The use of containerd allows for better Kubernetes
performance and component updates without pod restarts when applying fixes for
CVEs.
On existing deployments, perform the mandatory migration from Docker to
containerd in the scope of MOSK 25.1.x. Otherwise, the
management cluster update to Container Cloud 2.30.0 will be blocked.
Important
Container runtime migration involves machine cordoning and
draining.
This section describes the MOSK known issues with available
workarounds. For the known issues in the related Container Cloud release, refer
to Mirantis Container Cloud: Release Notes.
OpenStack¶
[31186,34132] Pods get stuck during MariaDB operations¶
During MariaDB operations on a management cluster, Pods may get stuck
in continuous restarts with the following example error:
Create a backup of the /var/lib/mysql directory on the
mariadb-server Pod.
Verify that other replicas are up and ready.
Remove the galera.cache file for the affected mariadb-server Pod.
Remove the affected mariadb-server Pod or wait until it is automatically
restarted.
After Kubernetes restarts the Pod, the Pod clones the database in 1-2 minutes
and restores the quorum.
[42386] A load balancer service does not obtain the external IP address¶
Due to the MetalLB upstream issue,
a load balancer service may not obtain the external IP address.
The issue occurs when two services share the same external IP address and have
the same externalTrafficPolicy value. Initially, the services have the
external IP address assigned and are accessible. After modifying the
externalTrafficPolicy value for both services from Cluster to
Local, the first service that has been changed remains with no external IP
address assigned. However, the second service, which was changed later, has the
external IP assigned as expected.
To work around the issue, make a dummy change to the service object where
external IP is <pending>:
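For example, a dummy change can be an arbitrary annotation added to the affected service; the namespace, service name, and annotation key below are placeholders:
kubectl -n <namespace> annotate service <service-name> force-reconcile="$(date +%s)" --overwrite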
Tungsten Fabric¶
[13755] TF pods switch to CrashLoopBackOff after a simultaneous reboot¶
Rebooting all Cassandra cluster TFConfig or TFAnalytics nodes, maintenance, or
other circumstances that cause the Cassandra pods to start simultaneously may
cause a broken Cassandra TFConfig and/or TFAnalytics cluster.
In this case, Cassandra nodes do not join the ring and do not update the IPs of
the neighbor nodes. As a result, the TF services cannot operate Cassandra
cluster(s).
To verify that a Cassandra cluster is affected:
Run the nodetool status command specifying the config or
analytics cluster and the replica number:
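For example, assuming the default Tungsten Fabric namespace and Cassandra pod naming (both are placeholders here):
kubectl -n tf exec -it <tf-cassandra-config-or-analytics-pod> -c cassandra -- nodetool status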
[40032] tf-rabbitmq fails to start after rolling reboot¶
Occasionally, RabbitMQ instances in tf-rabbitmq pods fail to enable
the tracking_records_in_ets during the initialization process.
To work around the problem, restart the affected pods manually.
[42896] Cassandra cluster contains extra node
with outdated IP after replacement of TF control node¶
After replacing a failed Tungsten Fabric controller node as described in
Replace a failed TF controller node, the first restart of the Cassandra
pod on this node may cause an issue if the Cassandra node with the outdated
IP address has not been removed from the cluster. Subsequent Cassandra pod
restarts should not trigger this problem.
To verify if your Cassandra cluster is affected, run the
nodetool status command specifying the config or analytics cluster
and the replica number:
An extra node will appear in the cluster with an outdated IP address
(the IP of the terminated Cassandra pod) in the Down state.
To work around the issue, after replacing the Tungsten Fabric
controller node, delete the Cassandra pod on the replaced node and remove
the outdated node from the Cassandra cluster using nodetool:
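A minimal sketch of these steps, where the namespace, pod names, and the Host ID of the outdated node are placeholders:
kubectl -n tf delete pod <cassandra-pod-on-replaced-node>
kubectl -n tf exec -it <healthy-cassandra-pod> -c cassandra -- nodetool status          # note the Host ID of the Down node with the outdated IP
kubectl -n tf exec -it <healthy-cassandra-pod> -c cassandra -- nodetool removenode <host-id-of-outdated-node>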
To work around the issue, manually adjust the affected dashboards to
restore their custom appearance.
[51524] sf-notifier creates a large number of relogins to Salesforce¶
The incompatibility between the newly implemented session refresh in the
upstream simple-salesforce library and the MOSK implementation of session
refresh in sf-notifier results in uncontrolled growth of new logins and a lack
of session reuse. The issue applies to both MOSK and management clusters.
Workaround:
The workaround is to change the sf-notifier image tag directly
in the Deployment object. This change is not persistent because the direct
change in the Deployment object will be reverted or overridden by:
Container Cloud version update (for management clusters)
Cluster release version update (for MOSK cluster)
Any sf-notifier-related operation (for all clusters):
Disable and enable
Credentials change
IDs change
Any configuration change for resources, node selector, tolerations, and
log level
Once applied, this workaround must be re-applied whenever one of the
above operations is performed in the cluster.
Compare the sf-notifier image tag with the list of affected tags.
If the image is affected, it has to be replaced. Otherwise, your cluster
is not affected.
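For example, to obtain the current image string, assuming the Deployment and container are both named sf-notifier:
kubectl -n stacklight get deployment sf-notifier -o jsonpath='{.spec.template.spec.containers[0].image}'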
In the resulting string, replace only the tag of the affected image with
the desired v0.4-20240828023015 tag. Keep the registry the same as
in the original Deployment object.
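For example, assuming the container in the Deployment is named sf-notifier and using the original registry as a placeholder:
kubectl -n stacklight set image deployment/sf-notifier sf-notifier=<original-registry>/sf-notifier:v0.4-20240828023015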
Wait until the pod with the updated image is created, and check the logs.
Verify that there are no errors in the logs:
kubectl logs pod/<sf-notifier pod> -n stacklight
As this change is not persistent and can be reverted by the cluster update
operation or any operation related to sf-notifier, periodically check all
clusters and if the change has been reverted, re-apply the workaround.
Optionally, you can add a custom alert that will monitor the current tag of
the sf-notifier image and will fire the alert if the tag is present in
the list of affected tags. For the custom alert configuration details,
refer to alert-configuration.
Example of a custom alert to monitor the current tag of the sf-notifier
image:
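A minimal sketch of such an alert rule based on the kube_pod_container_info metric from kube-state-metrics; the alert name and the affected tags are placeholders:
- alert: SfNotifierAffectedImageTag
  expr: kube_pod_container_info{namespace="stacklight", container="sf-notifier", image=~".*sf-notifier:(<affected-tag-1>|<affected-tag-2>)"} > 0
  for: 15m
  labels:
    severity: warning
  annotations:
    summary: sf-notifier is running an image tag affected by the Salesforce relogin issue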
Container Cloud web UI¶
[50181] Failure to deploy a compact cluster using the Container Cloud web UI¶
A compact MOSK cluster fails to be deployed through the Container Cloud web UI
because the web UI does not allow adding any label to the control plane
machines or changing dedicatedControlPlane:false.
To work around the issue, manually add the required labels using CLI. Once
done, the cluster deployment resumes.
[50168] Inability to use a new project through the Container Cloud web UI¶
A newly created project does not display all available tabs and contains
different access denied errors during the first five minutes after
creation.
To work around the issue, refresh the browser in five minutes after the
project creation.
Update known issues¶
[42449] Rolling reboot failure on a Tungsten Fabric cluster¶
During cluster update, the rolling reboot fails on the Tungsten Fabric cluster.
To work around the issue, restart the RabbitMQ pods in the Tungsten
Fabric cluster.
[46671] Cluster update fails with the tf-config pods crashed¶
When updating to the MOSK 24.3 series, tf-config pods from the Tungsten
Fabric namespace may enter the CrashLoopBackOff state. For example:
To troubleshoot the issue, check the logs inside the tf-config API
container and the tf-cassandra pods. The following example logs
indicate that Cassandra services failed to peer with each other and
are operating independently:
Logs from the tf-config API container:
NoHostAvailable: ('Unable to complete the operation against any hosts', {<Host: 192.168.200.23:9042 dc1>: Unavailable('Error from server: code=1000 [Unavailable exception] message="Cannot achieve consistency level QUORUM" info={\'required_replicas\': 2, \'alive_replicas\': 1, \'consistency\': \'QUORUM\'}',)})
Logs from the tf-cassandra pods:
INFO [OptionalTasks:1] 2024-09-09 08:59:36,231 CassandraRoleManager.java:419 - Setup task failed with error, rescheduling
WARN [OptionalTasks:1] 2024-09-09 08:59:46,231 CassandraRoleManager.java:379 - CassandraRoleManager skipped default role setup: some nodes were not ready
To work around the issue, restart the Cassandra services in the Tungsten
Fabric namespace by deleting the affected pods sequentially to establish
the connection between them:
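For example, assuming the default Tungsten Fabric namespace and Cassandra pod naming (placeholders), delete the pods one by one and wait for each pod to become Ready before proceeding to the next replica:
kubectl -n tf delete pod <tf-cassandra-config-pod-0>
kubectl -n tf get pod <tf-cassandra-config-pod-0>        # wait until the pod is Running and Ready
kubectl -n tf delete pod <tf-cassandra-config-pod-1>
# repeat for the remaining replicas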
Now, all other services in the Tungsten Fabric namespace should be in
the Active state.
[49705] Cluster update is stuck due to unhealthy tf-vrouter-agent-dpdk pods¶
During a MOSK cluster update, the tf-vrouter-agent-dpdk pods may become
unhealthy due to a failed LivenessProbe, causing the update process to get
stuck. The issue may only affect major updates when the cluster dataplane
components are restarted.
To work around the issue, manually remove the tf-vrouter-agent-dpdk
pods.
[43058] Resolved the issue with the CronJob for MariaDB that prevented
the OpenStackDeployment custom resource from transitioning to the
APPLYING state after changes.
[47269] Resolved the issue that prevented instances from live-migrating.
[48890] Resolved the issue that caused an extremely high load on the
gateway nodes.
[49678] Resolved the issue that caused the flapping
status (Configure → Ready → Configure → Ready) of machines
where any HostOSConfiguration object was targeted and migration to
containerd was applied.
[49340] Resolved the issue that caused failure
of tag-based log filtering using the tag_include parameter for
logging.externalOutputs when output_kind:audit is selected.
[45215] Resolved the performance issue in the
OpenStack PortProber Grafana dashboard that occurred when handling large
amounts of metrics over time ranges exceeding one hour.
Implemented recording rules and updated the dashboard to leverage them,
resulting in significant performance improvements. Note that
the updated dashboard displays only data collected after the cluster update.
To access older data, use the OpenStack PortProber [Deprecated]
dashboard that will be removed in one of the following releases due to being
unreliable when querying extended time ranges in high-load clusters.
[42660] Resolved the issue that caused the
Nova - Hypervisor Overview Grafana dashboard to display the load
average (per vCPU), allocated memory, and allocated disk (allocated by VMs)
instead of real CPU, memory, and disk utilization with data collected from
node-exporter and OpenStack Nova.
[39368] Resolved the issue that caused the
DockerSwarmNodeFlapping alert to fire during cluster update.
It is still expected to see the DockerSwarmNodeFlapping and
DockerSwarmServiceReplicasFlapping alerts firing during the cluster update to
Container Cloud 2.29.0, but only before the StackLight component is updated.
[39077] Resolved the issue that caused the
TelegrafGatherErrors alert for telegraf-docker-swarm to fire during
cluster update.
Reworked the TelegrafGatherErrors alert and replaced it with the
TelegrafSMARTGatherErrors and TelegrafDockerSwarmGatherErrors alerts.
It is still expected to see the TelegrafGatherErrors alert firing during
the cluster update to Container Cloud 2.29.0, but only before the StackLight
component is updated.
This section describes the specific actions you as a Cloud Operator need to
complete to accurately plan and successfully perform your
Mirantis OpenStack for Kubernetes (MOSK) cluster update to the
version 25.1. Consider this information as a supplement to the generic
update procedure published in Operations Guide: Update a
MOSK cluster.
You can update to the 25.1 version from the following
cluster versions:
24.3 (released on October 16, 2024)
24.3.2 (released on February 03, 2025)
Important
Be advised that updating to version 25.1 will not be possible
from at least the upcoming 24.3.3 and 24.3.4 patches. For the detailed
cluster update schema, refer to Managed cluster update schema.
~1% of read operations on cloud API resources may fail
~8% of create and update operations on cloud API resources may fail
Open vSwitch networking - interruption of the North-South
connectivity, depending on the type of virtual routers used by
a workload:
Distributed (DVR) routers - no interruption
Non-distributed routers, High Availability (HA) mode - interruption up
to 1 minute, usually less than 5 seconds
Non-distributed routers, non-HA mode - interruption up to 10 minutes
Tungsten Fabric networking - no impact
Ceph
~1% of read operations on object storage API may fail
IO performance degradation for Ceph-backed virtual storage devices.
Pay special attention to the known issue
50566
that may affect the maintenance window.
Host OS components
No impact
Instance network connectivity interruption up to 5 minutes
Host OS kernel
No impact
Restart of instances due to the hypervisor reboot [0]
[0] The host operating system needs to be rebooted for the kernel update
to be applied. Configure live migration of workloads to avoid the impact on the
instances running on a host.
Before updating the cluster, be sure to review the potential issues that
may arise during the process and the recommended solutions to address
them, as outlined in Update known issues.
Since Ubuntu 20.04 reaches end-of-life in April 2025, MOSK
25.1 does not support the Cluster release update of the Ubuntu 20.04-based
clusters, and Ubuntu 22.04 becomes the only supported version of the host
operating system.
Therefore, ensure that all your MOSK clusters are running
Ubuntu 22.04 to unblock update of the management cluster to the Cluster release
16.4.1. For the Ubuntu upgrade procedure, refer to Upgrade an operating system distribution.
Caution
Usage of third-party software, which is not part of
Mirantis-supported configurations, for example, the use of custom DPDK
modules, may block upgrade of an operating system distribution. Users are
fully responsible for ensuring the compatibility of such custom components
with the latest supported Ubuntu version.
In MOSK 25.1 and Container Cloud 2.29.0, Grafana is updated
to version 11 where the following deprecated Angular-based plugins are
automatically migrated to the React-based ones:
Graph (old) -> Time Series
Singlestat -> Stat
Stat (old) -> Stat
Table (old) -> Table
Worldmap -> Geomap
This migration may corrupt custom Grafana dashboards that have Angular-based
panels. Therefore, if you have such dashboards, back them up and manually
upgrade Angular-based panels before updating to MOSK 25.1
to prevent custom appearance issues after plugin migration.
Note
All Grafana dashboards provided by StackLight are also migrated to
React automatically. For the list of default dashboards, see
View Grafana dashboards.
Warning
For management clusters that are updated automatically, it is
important to prepare the backup before Container Cloud 2.29.0 is released.
Otherwise, custom dashboards using Angular-based plugins may be corrupted.
For managed clusters, you can perform the backup after the Container Cloud
2.29.0 release date but before updating them to MOSK
25.1.
Post-update actions¶
Hide sensitive ingress data for Ceph public endpoints¶
Since MOSK 25.1, you can hide ingress TLS certificates for
Ceph Object Gateway public endpoint in a secret object and use
tlsSecretRefName in the Ceph cluster spec. This configuration prevents
exposing sensitive data of Ceph public endpoints.
On existing clusters, Mirantis recommends updating the Ceph cluster spec
by replacing fields containing TLS certificates with tlsSecretRefName:
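For example, a minimal sketch of such a change; the exact nesting of the ingress configuration in the Ceph cluster spec may differ in your cluster, and the secret name is a placeholder:
spec:
  cephClusterSpec:
    ingressConfig:
      tlsSecretRefName: rgw-ingress-tls-secret
      # previously, the TLS certificate and key were specified inline in the ingress configuration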
Update the Alertmanager API v1 integrations to v2¶
Note
This step applies if you use the Alertmanager API v1 in your
integrations and configurations. Otherwise, skip this step.
In MOSK 25.1 and Container Cloud 2.29.0, the Alertmanager
API v1 is deprecated and will be removed in one of the upcoming
MOSK and Container Cloud releases. For details, see
Deprecation Notes.
Therefore, if you use API v1, update your integrations and configurations to
use the API v2 ensuring compatibility with new versions of Alertmanager.
This step applies if log forwarding to external destinations is
enabled. Otherwise, skip this step.
In the following major MOSK and Container Cloud releases,
the Fluentd plugin out_elasticsearch will be updated to the version that
no longer supports external output to opensearch.
Therefore, if you use opensearch as an external destination for logging and
used the elasticsearch value for the logging.externalOutputs[].type
parameter, change it to opensearch in the scope of Container Cloud 2.29.x
and MOSK 25.1.x release series. For the configuration
procedure, see Enable log forwarding to external destinations.
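For example, a minimal sketch of the corresponding StackLight configuration change; all fields except type are placeholders and may differ from your actual externalOutputs definition:
logging:
  externalOutputs:
    - name: my-external-logs
      type: opensearch        # previously: elasticsearch
      host: opensearch.example.com
      port: 9200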
MOSK 25.1 introduces several enhancements for monitoring of
RabbitMQ by StackLight, which include deprecation of some RabbitMQ metrics,
alerts, and dashboard. For details, see RabbitMQ monitoring rework.
If you use deprecated RabbitMQ metrics in customizations such as alerts and
dashboards, switch to the new metrics and dashboards within the course of the
MOSK 25.1 series to prevent issues once the deprecated
metrics and dashboard are removed.
Start using BareMetalHostInventory instead of BareMetalHost¶
MOSK 25.1 introduces the BareMetalHostInventory resource
that must be used instead of BareMetalHost for adding and modifying
configuration of bare metal servers. Therefore, if you need to modify an
existing or create a new configuration of a bare metal host, use
BareMetalHostInventory.
Each BareMetalHostInventory object is synchronized with an automatically
created BareMetalHost object, which is now used for internal purposes of
the Container Cloud private API.
Caution
Any change in the BareMetalHost object will be overwritten by
BareMetalHostInventory.
For any existing BareMetalHost object, a BareMetalHostInventory object
is created automatically during cluster update.
Caution
While the Cluster release of the management cluster is 16.4.0,
BareMetalHostInventory operations are allowed only for
m:kaas@management-admin. This limitation is lifted once the
management cluster is updated to the Cluster release 16.4.1 or later.
Migrate container runtime from Docker to containerd¶
MOSK 25.1 introduces switching of the default container
runtime for the underlying Kubernetes cluster from Docker to containerd on
greenfield deployments.
On existing deployments, perform the mandatory migration from Docker to
containerd in the scope of MOSK 25.1.x. Otherwise, the
management cluster update to Container Cloud 2.30.0 will be blocked.
Important
Container runtime migration involves machine cordoning and
draining.
In total, since the MOSK 24.3 major release, 285 Common
Vulnerabilities and Exposures (CVEs) have been fixed in 25.1:
10 of critical and 275 of high severity.
The table below includes the total number of addressed unique and common
CVEs by MOSK-specific component since
MOSK 24.3.2 patch. The common CVEs are issues addressed
across several images.
For the detailed list of fixed and present CVEs across the Mirantis
Container Cloud and MOSK products, refer to
Mirantis Security Portal.
Mirantis Container Cloud CVEs
For the number of fixed CVEs in the Mirantis Container Cloud-related
components including kaas core, bare metal, Ceph, and StackLight, refer to
Container Cloud 2.29.0: Security notes.
Implemented full support for OpenStack Caracal with Open vSwitch and Tungsten
Fabric networking backends for greenfield deployments and for an upgrade from
OpenStack Antelope. To upgrade an existing cloud from OpenStack Antelope
to Caracal, follow the Upgrade OpenStack procedure.
Highlights from upstream OpenStack supported by
MOSK deployed on Caracal
Horizon:
Horizon added TOTP authentication support, allowing users to enhance their
security by authenticating with Time-based One-Time Passwords.
Manila:
Manila shares and access rules can now be locked against deletion.
A generic resource locks framework has been introduced to facilitate this.
Users can also hide sensitive fields of access rules with this feature.
Neutron:
Limit the rate at which instances can query the metadata service in order
to protect the OpenStack deployment from DoS or misbehaved instances.
A new API allows defining a set of security group rules to be used
automatically in every new default and/or custom security group created
for any project.
Nova:
It is now possible to define different authorization policies for
migration with and without a target host.
Since the OpenStack Caracal release, the Tungsten Fabric Horizon plugin
has been deprecated and removed. This change impacts the
Networking panel in OpenStack Horizon, which previously
allowed for managing Network IPAMs and Network policies. With the removal
of this plugin, Horizon no longer supports these features.
As a result, cloud operators may encounter Tungsten Fabric service networks
with snat-si in their names. These networks will be visible in
the network tabs and during the creation of ports or instances. Mirantis
advises cloud operators not to interact with these networks, as doing so
may cause system malfunctions.
Implemented full support for Ubuntu 22.04 LTS (Jammy Jellyfish) as the default
host operating system in MOSK clusters, including greenfield
deployments and update from Ubuntu 20.04 to 22.04 on existing clusters.
Ubuntu 20.04 is unsupported for greenfield deployments and is considered
deprecated during the MOSK 24.3 release cycle for existing
clusters.
Note
Since Container Cloud 2.27.0 (Cluster release 16.2.0), existing
MOSK management clusters were automatically updated to
Ubuntu 22.04 during cluster upgrade. Greenfield deployments of management
clusters are also based on Ubuntu 22.04.
Instance migration for non-administrative OpenStack users¶
Implemented the capability for non-administrative OpenStack users to migrate
instances, including both live and cold migrations. This functionality is
useful when performing different maintenance tasks including cloud updates,
handling noisy neighbors, and other operational needs.
Implemented the capability to connect external OpenID Connect (OIDC)
identity providers to MOSK Identity service (OpenStack
Keystone) directly through the OpenStackDeployment custom resource.
Introduced the ability to configure custom volume backends for
MOSK Block Storage service (OpenStack Cinder), enhancing
flexibility in storage management. Users can now define and deploy their own
backend configurations through the OpenStackDeployment custom resource.
Implemented the capability to automatically power off the guest instances
during the compute node shutdown or reboot through the ACPI power event.
This ensures the integrity of disk filesystems and prevents damage to running
applications during cluster updates.
Introduced general availability support for the MOSK
Shared Filesystems service (OpenStack Manila), allowing cloud users
to create and manage virtual file shares. This enables applications
to store data using common network file-sharing protocols such as CIFS,
NFS, and more.
Implemented the Diagnostic Controller that performs cluster self-diagnostics
to help the operator to easily understand, troubleshoot, and resolve potential
issues against the major cluster components, including OpenStack, Tungsten
Fabric, Ceph, and StackLight.
Running self-diagnostics is essential to ensure the overall health and optimal
performance of a cluster. Mirantis recommends running self-diagnostics
before cluster update, node replacement, or any other significant changes
in the cluster to prevent potential issues and optimize the maintenance window.
Implemented etcd as a backend for TaskFlow within Octavia, offering a scalable,
consistent, and fault-tolerant solution for persisting and managing task
states. This ensures that Octavia reliably handles distributed load balancing
tasks in a Kubernetes cluster.
Separated the vRouter provisioner from other Tungsten Fabric components.
Now, the vRouter provisioner is deployed as a separate DaemonSet
tf-vrouter-provisioner to allow for better control over
the vRouter components.
Automatic conversion to Tungsten Fabric Operator API v2¶
Implemented the automatic conversion of the Tungsten Fabric cluster
configuration API (TFOperator) v1alpha1 to the v2 version during update
to MOSK 24.3.
Since MOSK 24.3, the v2 TFOperator custom resource
should be used for any updates. The v1alpha1 TFOperator custom resource
will remain in the cluster but will no longer be reconciled and will be
automatically removed with the next cluster update.
Implemented monitoring of orphaned allocations in the MOSK
Compute service (OpenStack Nova). This feature simplifies the detection and
troubleshooting of orphaned resource allocations, ensuring that resources
are correctly assigned and utilized within the cloud infrastructure.
Implemented health monitoring of the PowerDNS backend for
MOSK DNS service (OpenStack Designate) using StackLight that
allows detecting and preventing PowerDNS issues. Started scraping a set of
metrics to monitor PowerDNS networking and detect server errors, failures, and
outages. Based on these metrics, added the dedicated
OpenStack PowerDNS Grafana dashboard and several alerts to notify
the operator of any detected issues.
Mirantis has tested MOSK against a very specific
configuration and can guarantee a predictable behavior of the product only
in the exact same environments. The table below includes the major
MOSK components with the exact versions against which
testing has been performed.
This section describes the MOSK known issues with available
workarounds. For the known issues in the related Container Cloud release, refer
to Mirantis Container Cloud: Release Notes.
OpenStack¶
[31186,34132] Pods get stuck during MariaDB operations¶
During MariaDB operations on a management cluster, Pods may get stuck
in continuous restarts with the following example error:
Create a backup of the /var/lib/mysql directory on the
mariadb-server Pod.
Verify that other replicas are up and ready.
Remove the galera.cache file for the affected mariadb-server Pod.
Remove the affected mariadb-server Pod or wait until it is automatically
restarted.
After Kubernetes restarts the Pod, the Pod clones the database in 1-2 minutes
and restores the quorum.
[42386] A load balancer service does not obtain the external IP address¶
Due to the MetalLB upstream issue,
a load balancer service may not obtain the external IP address.
The issue occurs when two services share the same external IP address and have
the same externalTrafficPolicy value. Initially, the services have the
external IP address assigned and are accessible. After modifying the
externalTrafficPolicy value for both services from Cluster to
Local, the first service that has been changed remains with no external IP
address assigned. However, the second service, which was changed later, has the
external IP assigned as expected.
To work around the issue, make a dummy change to the service object where
external IP is <pending>:
During the upgrade to OpenStack Caracal, the masakari-db-sync Kubernetes
Job fails, preventing the Masakari API pods from initializing. The failure
occurs during the migration of the Masakari database from the legacy SQLAlchemy
Migrate to Alembic due to a misconfigured alembic_table.
Workaround:
Before upgrading to Caracal, pin the masakari_db_sync image to
the updated Caracal image by adding the following content
to the OpenStackDeployment custom resource:
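A minimal sketch of such an override, assuming the per-service image override structure of the OpenStackDeployment resource; the exact path and image reference are placeholders:
spec:
  services:
    instance-ha:
      masakari:
        values:
          images:
            tags:
              masakari_db_sync: <registry>/<updated-caracal-masakari-image>:<tag>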
[13755] TF pods switch to CrashLoopBackOff after a simultaneous reboot¶
Rebooting all Cassandra cluster TFConfig or TFAnalytics nodes, maintenance, or
other circumstances that cause the Cassandra pods to start simultaneously may
cause a broken Cassandra TFConfig and/or TFAnalytics cluster.
In this case, Cassandra nodes do not join the ring and do not update the IPs of
the neighbor nodes. As a result, the TF services cannot operate Cassandra
cluster(s).
To verify that a Cassandra cluster is affected:
Run the nodetool status command specifying the config or
analytics cluster and the replica number:
[40032] tf-rabbitmq fails to start after rolling reboot¶
Occasionally, RabbitMQ instances in tf-rabbitmq pods fail to enable
the tracking_records_in_ets during the initialization process.
To work around the problem, restart the affected pods manually.
[42896] Cassandra cluster contains extra node
with outdated IP after replacement of TF control node¶
After replacing a failed Tungsten Fabric controller node as described in
Replace a failed TF controller node, the first restart of the Cassandra
pod on this node may cause an issue if the Cassandra node with the outdated
IP address has not been removed from the cluster. Subsequent Cassandra pod
restarts should not trigger this problem.
To verify if your Cassandra cluster is affected, run the
nodetool status command specifying the config or analytics cluster
and the replica number:
An extra node will appear in the cluster with an outdated IP address
(the IP of the terminated Cassandra pod) in the Down state.
To work around the issue, after replacing the Tungsten Fabric
controller node, delete the Cassandra pod on the replaced node and remove
the outdated node from the Cassandra cluster using nodetool:
Tag-based filtering of logs using the tag_include parameter does not work
for the logging.externalOutputs feature when output_kind:audit is
selected.
For example, if the user wants to send only logs from the sudo
program and sets tag_include:sudo, none of the logs will be sent to an
external destination.
To work around the issue, allow forwarding of all audit logs in addition to
sudo, which include logs from sshd,
systemd-logind, and su. Instead of tag_include:sudo,
specify tag_include:'{sudo,systemd-audit}'.
Once the fix is applied in MOSK 25.1, filtering starts working automatically.
Update known issues¶
[42449] Rolling reboot failure on a Tungsten Fabric cluster¶
During cluster update, the rolling reboot fails on the Tungsten Fabric cluster.
To work around the issue, restart the RabbitMQ pods in the Tungsten
Fabric cluster.
[46671] Cluster update fails with the tf-config pods crashed¶
When updating to the MOSK 24.3 series, tf-config pods from the Tungsten
Fabric namespace may enter the CrashLoopBackOff state. For example:
To troubleshoot the issue, check the logs inside the tf-config API
container and the tf-cassandra pods. The following example logs
indicate that Cassandra services failed to peer with each other and
are operating independently:
Logs from the tf-config API container:
NoHostAvailable: ('Unable to complete the operation against any hosts', {<Host: 192.168.200.23:9042 dc1>: Unavailable('Error from server: code=1000 [Unavailable exception] message="Cannot achieve consistency level QUORUM" info={\'required_replicas\': 2, \'alive_replicas\': 1, \'consistency\': \'QUORUM\'}',)})
Logs from the tf-cassandra pods:
INFO [OptionalTasks:1] 2024-09-09 08:59:36,231 CassandraRoleManager.java:419 - Setup task failed with error, rescheduling
WARN [OptionalTasks:1] 2024-09-09 08:59:46,231 CassandraRoleManager.java:379 - CassandraRoleManager skipped default role setup: some nodes were not ready
To work around the issue, restart the Cassandra services in the Tungsten
Fabric namespace by deleting the affected pods sequentially to establish
the connection between them:
The designate-zone-setup Kubernetes job in the openstack namespace
fails during update to MOSK 24.3 with the following error present in the
logs of the job pod:
The issue occurs when the DNS service (OpenStack Designate) has any TLDs
created, but test is not among them. Because DNS service monitoring
was added in MOSK 24.3, it attempts to create a test zone test-zone.test
in the Designate service, which fails if the test TLD is missing.
To work around the issue, verify whether any TLDs are present
in the DNS service:
openstack tld list -f value -c name
If there are TLDs present and test is not one of them, create it:
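For example, using the OpenStack CLI:
openstack tld create --name test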
Warning
Do not create the test TLD if no TLDs were present
in the DNS service initially. In this case, the issue is caused by
a different factor, and creating the test TLD when none existed
before may disrupt users of both the DNS and Networking services.
[49705] Cluster update is stuck due to unhealthy tf-vrouter-agent-dpdk pods¶
During a MOSK cluster update, the tf-vrouter-agent-dpdk pods may become
unhealthy due to a failed LivenessProbe, causing the update process to get
stuck. The issue may only affect major updates when the cluster dataplane
components are restarted.
To work around the issue, manually remove the tf-vrouter-agent-dpdk
pods.
Container Cloud web UI¶
[50181] Failure to deploy a compact cluster using the Container Cloud web UI¶
A compact MOSK cluster fails to be deployed through the Container Cloud web UI
because the web UI does not allow adding any label to the control plane
machines or changing dedicatedControlPlane:false.
To work around the issue, manually add the required labels using CLI. Once
done, the cluster deployment resumes.
[50168] Inability to use a new project through the Container Cloud web UI¶
A newly created project does not display all available tabs and contains
different access denied errors during the first five minutes after
creation.
To work around the issue, refresh the browser in five minutes after the
project creation.
The following issues have been addressed in the MOSK
24.3 release:
[43966][Antelope] Improved parallel image downloading when Glance
is configured with Cinder backend.
[44813][Antelope] Resolved the issue that caused disruption
on trunk ports.
[40900][Tungsten Fabric] Resolved the issue that caused
Cassandra database to enter an infinite table creation or changing
state.
[46220][Tungsten Fabric] Resolved the issue that caused
subsequent cluster maintenance requests to get stuck on clusters running
Tungsten Fabric with API v2, after updating from MOSK 24.2 to 24.2.1.
This section describes the specific actions you as a Cloud Operator need to
complete to accurately plan and successfully perform your
Mirantis OpenStack for Kubernetes (MOSK) cluster update to the
version 24.3. Consider this information as a supplement to the generic
update procedure published in Operations Guide: Update a
MOSK cluster.
According to the new cluster update schema introduced in the product in the
MOSK 24.2 series, you can update to the 24.3 version from
the following cluster versions:
~1% of read operations on cloud API resources may fail
~8% of create and update operations on cloud API resources may fail
Open vSwitch networking - interruption of the North-South
connectivity, depending on the type of virtual routers used by
a workload:
Distributed (DVR) routers - no interruption
Non-distributed routers, High Availability (HA) mode - interruption up
to 1 minute, usually less than 5 seconds
Non-distributed routers, non-HA mode - interruption up to 10 minutes
Tungsten Fabric networking - no impact
Ceph
~1% of read operations on object storage API may fail
IO performance degradation for Ceph-backed virtual storage devices.
Pay special attention to the known issue
50566
that may affect the maintenance window.
Host OS components
No impact
Instance network connectivity interruption up to 5 minutes
Host OS kernel
No impact
Restart of instances due to the hypervisor reboot [0]
[0] The host operating system needs to be rebooted for the kernel update
to be applied. Configure live migration of workloads to avoid the impact on the
instances running on a host.
To properly plan the update maintenance window, use the following
documentation:
Before updating the cluster, be sure to review the potential issues that
may arise during the process and the recommended solutions to address
them, as outlined in Update known issues.
Pay special attention to [47602] Failed designate-zone-setup job blocks cluster update. Before performing the cluster
update, verify the DNS service (OpenStack Designate) for any created Top-Level
Domains (TLDs). If TLDs are present but the test TLD is missing, create
test according to the known issue description.
MOSK 24.3 release series is the last one to support
Ubuntu 20.04 as the host operating system. Ubuntu 20.04 reaches
end-of-life in April 2025. Therefore, Mirantis encourages all
MOSK users to upgrade their clusters to Ubuntu 22.04 as soon
as possible after getting to MOSK 24.3.
A host operating system upgrade requires reboot of the servers and can be
performed in small batches. For the detailed procedure of the Ubuntu upgrade,
refer to Upgrade an operating system distribution.
Warning
Update of management or MOSK clusters running
Ubuntu 20.04 will not be possible in the following major product version.
Caution
Usage of third-party software, which is not part of
Mirantis-supported configurations, for example, the use of custom DPDK
modules, may block upgrade of an operating system distribution. Users are
fully responsible for ensuring the compatibility of such custom components
with the latest supported Ubuntu version.
Start using new API for Tungsten Fabric configuration (TFOperator v2)¶
Since MOSK 24.3, the v2 TFOperator custom resource
becomes the default and the only way to manage the configuration of
Tungsten Fabric cluster. During update to MOSK 24.3,
the old v1alpha1 TFOperator custom resource will get automatically
converted to version v2.
Note
The v1alpha1 TFOperator custom resource remains in the cluster
but is no longer reconciled and will be automatically removed with
the next major cluster update.
In MOSK 25.1 and Container Cloud 2.29.0, Grafana will be
updated to version 11 where the following deprecated Angular-based plugins will
be automatically migrated to the React-based ones:
Graph (old) -> Time Series
Singlestat -> Stat
Stat (old) -> Stat
Table (old) -> Table
Worldmap -> Geomap
This migration may corrupt custom Grafana dashboards that have Angular-based
panels. Therefore, if you have such dashboards, back them up and manually
upgrade Angular-based panels during the course of MOSK 24.3
and Container Cloud 2.28.x (Cluster releases 17.3.x and 16.3.x) to prevent
custom appearance issues after plugin migration in Container Cloud 2.29.0 and
MOSK 25.1.
Note
All Grafana dashboards provided by StackLight are also migrated to
React automatically. For the list of default dashboards, see
View Grafana dashboards.
Warning
For management clusters that are updated automatically, it is
important to prepare the backup before Container Cloud 2.29.0 is released.
Otherwise, custom dashboards using Angular-based plugins may be corrupted.
For managed clusters, you can perform the backup after the Container Cloud
2.29.0 release date but before updating them to MOSK
25.1.
In total, since the MOSK 24.2 major release, 1071 Common
Vulnerabilities and Exposures (CVEs) have been fixed in 24.3:
79 of critical and 992 of high severity.
The table below includes the total number of addressed unique and common
CVEs by MOSK-specific component since
MOSK 24.2.2 patch. The common CVEs are issues addressed
across several images.
For the detailed list of fixed and present CVEs across the Mirantis
Container Cloud and MOSK products, refer to
Mirantis Security Portal.
Mirantis Container Cloud CVEs
For the number of fixed CVEs in the Mirantis Container Cloud-related
components including kaas core, bare metal, Ceph, and StackLight, refer to
Container Cloud 2.28.0: Security notes.
The MOSK 24.3.1 patch includes the following updates:
Update of Mirantis Kubernetes Engine (MKE) to 3.7.17.
Update of Mirantis Container Runtime (MCR) to 23.0.15
(with containerd 1.6.36).
Important
As a result of the MCR version update, downtimes during
cluster updates are expected to be similar to those experienced
during a major version update. To accurately plan the cluster update,
refer to Update notes.
The table below contains the total number of addressed unique and common
CVEs by MOSK-specific component compared to the previous
release version. The common CVEs are issues addressed across several images.
For the detailed list of fixed and present CVEs across the Mirantis
Container Cloud and MOSK products, refer to
Mirantis Security Portal.
Mirantis Container Cloud CVEs
For the number of fixed CVEs in the Mirantis Container Cloud-related
components including kaas core, bare metal, Ceph, and StackLight, refer to
Container Cloud 2.28.4: Security notes.
The following issues have been addressed in the MOSK
24.3.1 release:
[47602][Update] Resolved the issue with
the designate-zone-setup job that blocked cluster update.
[47603][OpenStack] Resolved the issue that caused Masakari failure
during the OpenStack upgrade to Caracal.
[48160][OpenStack] Resolved the issue that caused instances
to fail booting when using a VFAT-formatted config drive.
[47174][Tungsten Fabric] Adjusted generation of affinity rules for
Redis for the clusters where analytics services are disabled.
[47717][Tungsten Fabric] Resolved the issue with the invalid
BgpAsn setting in tungstenfabric-operator.
[48153][Tungsten Fabric] Resolved the issue with OpenStack
generating duplicate floating IP addresses for Tungsten Fabric within
the same floating IP network, assigning them different IDs.
This section lists MOSK known issues with workarounds for
the MOSK release 24.3.1. For the known issues in the related
Container Cloud release, refer to Mirantis Container Cloud: Release Notes.
OpenStack¶
[31186,34132] Pods get stuck during MariaDB operations¶
During MariaDB operations on a management cluster, Pods may get stuck
in continuous restarts with the following example error:
Create a backup of the /var/lib/mysql directory on the
mariadb-server Pod.
Verify that other replicas are up and ready.
Remove the galera.cache file for the affected mariadb-server Pod.
Remove the affected mariadb-server Pod or wait until it is automatically
restarted.
After Kubernetes restarts the Pod, the Pod clones the database in 1-2 minutes
and restores the quorum.
[42386] A load balancer service does not obtain the external IP address¶
Due to the MetalLB upstream issue,
a load balancer service may not obtain the external IP address.
The issue occurs when two services share the same external IP address and have
the same externalTrafficPolicy value. Initially, the services have the
external IP address assigned and are accessible. After modifying the
externalTrafficPolicy value for both services from Cluster to
Local, the first service that has been changed remains with no external IP
address assigned. However, the second service, which was changed later, has the
external IP assigned as expected.
To work around the issue, make a dummy change to the service object where
external IP is <pending>:
Sometimes, after changing the OpenStackDeployment custom resource,
it does not transition to the APPLYING state as expected.
To work around the issue, restart the rockoon pod in the osh-system
namespace.
Tungsten Fabric¶
[13755] TF pods switch to CrashLoopBackOff after a simultaneous reboot¶
Rebooting all Cassandra cluster TFConfig or TFAnalytics nodes, maintenance, or
other circumstances that cause the Cassandra pods to start simultaneously may
cause a broken Cassandra TFConfig and/or TFAnalytics cluster.
In this case, Cassandra nodes do not join the ring and do not update the IPs of
the neighbor nodes. As a result, the TF services cannot operate Cassandra
cluster(s).
To verify that a Cassandra cluster is affected:
Run the nodetool status command specifying the config or
analytics cluster and the replica number:
[40032] tf-rabbitmq fails to start after rolling reboot¶
Occasionally, RabbitMQ instances in tf-rabbitmq pods fail to enable
the tracking_records_in_ets during the initialization process.
To work around the problem, restart the affected pods manually.
[42896] Cassandra cluster contains extra node
with outdated IP after replacement of TF control node¶
After replacing a failed Tungsten Fabric controller node as described in
Replace a failed TF controller node, the first restart of the Cassandra
pod on this node may cause an issue if the Cassandra node with the outdated
IP address has not been removed from the cluster. Subsequent Cassandra pod
restarts should not trigger this problem.
To verify if your Cassandra cluster is affected, run the
nodetool status command specifying the config or analytics cluster
and the replica number:
An extra node will appear in the cluster with an outdated IP address
(the IP of the terminated Cassandra pod) in the Down state.
To work around the issue, after replacing the Tungsten Fabric
controller node, delete the Cassandra pod on the replaced node and remove
the outdated node from the Cassandra cluster using nodetool:
Tag-based filtering of logs using the tag_include parameter does not work
for the logging.externalOutputs feature when output_kind:audit is
selected.
For example, if the user wants to send only logs from the sudo
program and sets tag_include:sudo, none of the logs will be sent to an
external destination.
To work around the issue, allow forwarding of all audit logs in addition to
sudo, which include logs from sshd,
systemd-logind, and su. Instead of tag_include:sudo,
specify tag_include:'{sudo,systemd-audit}'.
Once the fix is applied in MOSK 25.1, filtering starts working automatically.
[51524] sf-notifier creates a large number of relogins to Salesforce¶
The incompatibility between the newly implemented session refresh in the
upstream simple-salesforce library and the MOSK implementation of session
refresh in sf-notifier results in uncontrolled growth of new logins and a lack
of session reuse. The issue applies to both MOSK and management clusters.
Workaround:
The workaround is to change the sf-notifier image tag directly
in the Deployment object. This change is not persistent because the direct
change in the Deployment object will be reverted or overridden by:
Container Cloud version update (for management clusters)
Cluster release version update (for MOSK cluster)
Any sf-notifier-related operation (for all clusters):
Disable and enable
Credentials change
IDs change
Any configuration change for resources, node selector, tolerations, and
log level
Once applied, this workaround must be re-applied whenever one of the
above operations is performed in the cluster.
Compare the sf-notifier image tag with the list of affected tags.
If the image is affected, it has to be replaced. Otherwise, your cluster
is not affected.
In the resulting string, replace only the tag of the affected image with
the desired v0.4-20240828023015 tag. Keep the registry the same as
in the original Deployment object.
Wait until the pod with the updated image is created, and check the logs.
Verify that there are no errors in the logs:
kubectl logs pod/<sf-notifier pod> -n stacklight
As this change is not persistent and can be reverted by the cluster update
operation or any operation related to sf-notifier, periodically check all
clusters and if the change has been reverted, re-apply the workaround.
Optionally, you can add a custom alert that will monitor the current tag of
the sf-notifier image and will fire the alert if the tag is present in
the list of affected tags. For the custom alert configuration details,
refer to alert-configuration.
Example of a custom alert to monitor the current tag of the sf-notifier
image:
Update known issues¶
[42449] Rolling reboot failure on a Tungsten Fabric cluster¶
During cluster update, the rolling reboot fails on the Tungsten Fabric cluster.
To work around the issue, restart the RabbitMQ pods in the Tungsten
Fabric cluster.
[46671] Cluster update fails with the tf-config pods crashed¶
When updating to the MOSK 24.3 series, tf-config pods from the Tungsten
Fabric namespace may enter the CrashLoopBackOff state. For example:
To troubleshoot the issue, check the logs inside the tf-config API
container and the tf-cassandra pods. The following example logs
indicate that Cassandra services failed to peer with each other and
are operating independently:
Logs from the tf-config API container:
NoHostAvailable: ('Unable to complete the operation against any hosts', {<Host: 192.168.200.23:9042 dc1>: Unavailable('Error from server: code=1000 [Unavailable exception] message="Cannot achieve consistency level QUORUM" info={\'required_replicas\': 2, \'alive_replicas\': 1, \'consistency\': \'QUORUM\'}',)})
Logs from the tf-cassandra pods:
INFO [OptionalTasks:1] 2024-09-09 08:59:36,231 CassandraRoleManager.java:419 - Setup task failed with error, rescheduling
WARN [OptionalTasks:1] 2024-09-09 08:59:46,231 CassandraRoleManager.java:379 - CassandraRoleManager skipped default role setup: some nodes were not ready
To work around the issue, restart the Cassandra services in the Tungsten
Fabric namespace by deleting the affected pods sequentially to establish
the connection between them:
The cluster is affected if orphaned containers with the k8s_ prefix are
present on the affected nodes:
docker ps -a --format '{{ .Names }}' | grep '^k8s_'
Workaround:
Inspect recent Ansible logs at /var/log/lcm/* and make sure that the
only failed task during migration is Delete running pods. If so, proceed
to the next step. Otherwise, contact Mirantis support for further
information.
Stop and remove orphaned containers with the k8s_ prefix.
Note
This action has no impact on the cluster because the nodes are
already cordoned and drained as part of the maintenance window.
[49678] The Machine status is flapping after migration to containerd¶
On cluster machines where any HostOSConfiguration object is targeted and
migration to containerd is applied, the machine status may be flapping
(Configure → Ready → Configure → Ready) with the
HostOSConfiguration-related Ansible tasks constantly restarting. This
occurs due to the HostOSConfiguration object state items being constantly
added and then removed from related LCMMachine objects.
To work around the issue, temporarily disable all HostOSConfiguration
objects until the issue is resolved. The expected Container Cloud release with
the issue resolution is targeted to Container Cloud 2.29.0, after the
management cluster update to the Cluster release 16.4.0.
To disable HostOSConfiguration objects:
In the machineSelector:matchLabels section of every
HostOSConfiguration object, remove the corresponding label selectors for
cluster machines.
Wait for each HostOSConfiguration object status to be updated
and the machinesStates field to be absent:
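For example, assuming the project namespace and object name below are placeholders; the output should be empty once the field is absent:
kubectl -n <project-namespace> get hostosconfiguration <object-name> -o yaml | grep machinesStates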
Once the issue is resolved in the target release, re-enable all objects using
the same procedure.
Container Cloud web UI¶
[50181] Failure to deploy a compact cluster using the Container Cloud web UI¶
A compact MOSK cluster fails to be deployed through the Container Cloud web UI
because the web UI does not allow adding any label to the control plane
machines or changing dedicatedControlPlane:false.
To work around the issue, manually add the required labels using CLI. Once
done, the cluster deployment resumes.
[50168] Inability to use a new project through the Container Cloud web UI¶
A newly created project does not display all available tabs and contains
different access denied errors during the first five minutes after
creation.
To work around the issue, refresh the browser in five minutes after the
project creation.
Although MOSK 24.3.1 is classified as a patch release, as a
cloud operator, you will be performing a major update regardless of the upgrade
path: whether you are upgrading from patch 24.2.5 or major version 24.3. This
is because of the Mirantis Container Runtime update to 23.0.15.
This section describes the specific actions you need to complete to accurately
plan and successfully perform the update. Consider this information as
a supplement to the generic update procedure published in
Operations Guide: Update a MOSK cluster.
Optional migration of container runtime from Docker to containerd¶
MOSK 24.3.1 enables optional migration of container runtime
from Docker to containerd. Usage of containerd introduces a major enhancement
for cloud workloads running on top of MOSK, as it helps
minimize the network connectivity downtime caused by underlying infrastructure
updates.
To minimize the number of required maintenance windows, Mirantis recommends
that cloud operators switch to the containerd runtime on the nodes
simultaneously with their upgrade to Ubuntu 22.04 if the Ubuntu upgrade has
not been completed before.
Note
Migration from Docker to containerd will become mandatory in
MOSK 25.1.
~1% of read operations on cloud API resources may fail
~8% of create and update operations on cloud API resources may fail
Open vSwitch networking - interruption of the North-South
connectivity, depending on the type of virtual routers used by
a workload:
Distributed (DVR) routers - no interruption
Non-distributed routers, High Availability (HA) mode - interruption up
to 1 minute, usually less than 5 seconds
Non-distributed routers, non-HA mode - interruption up to 10 minutes
Tungsten Fabric networking - no impact
Ceph
~1% of read operations on object storage API may fail
IO performance degradation for Ceph-backed virtual storage devices.
Pay special attention to the known issue
50566
that may affect the maintenance window.
Host OS components
No impact
Instance network connectivity interruption up to 5 minutes
Host OS kernel
No impact
Restart of instances due to the hypervisor reboot [0]
[0] The host operating system needs to be rebooted for the kernel update
to be applied. Configure live migration of workloads to avoid the impact on the
instances running on a host.
To properly plan the update maintenance window, use the following
documentation:
Before updating the cluster, be sure to review the potential issues that
may arise during the process and the recommended solutions to address
them, as outlined in Update known issues.
Post-update actions¶
Upgrade Ubuntu to 22.04 and migrate to containerd runtime¶
MOSK 24.3 release series is the last release series
to support Ubuntu 20.04 as the host operating system. Ubuntu 20.04 reaches
end-of-life in April 2025. Therefore, Mirantis encourages all
MOSK users to upgrade their clusters to Ubuntu 22.04
as soon as possible after getting to the 24.3 series. A host operating
system upgrade requires reboot of the nodes and can be performed in small
batches.
Also, MOSK 24.3.1 introduces the new container runtime for
the underlying Kubernetes cluster - containerd. The migration from Docker to
containerd in 24.3.1 is optional and requires node cordoning and draining.
Therefore, if you decide to start using containerd and have not yet upgraded
to Ubuntu 22.04, Mirantis highly recommends applying these two changes
simultaneously to every node to minimize downtime for cloud workloads and
users.
In MOSK 24.3.x, the default container runtime
remains Docker for greenfield deployments. Support for greenfield
deployments based on containerd will be announced in one of the following
releases.
Warning
Update of management or MOSK clusters running
Ubuntu 20.04 or Docker runtime will not be possible in the following product
series.
Caution
Usage of third-party software, which is not part of
Mirantis-supported configurations, for example, the use of custom DPDK
modules, may block upgrade of an operating system distribution. Users are
fully responsible for ensuring the compatibility of such custom components
with the latest supported Ubuntu version.
In MOSK 25.1 and Container Cloud 2.29.0, Grafana will be
updated to version 11 where the following deprecated Angular-based plugins will
be automatically migrated to the React-based ones:
Graph (old) -> Time Series
Singlestat -> Stat
Stat (old) -> Stat
Table (old) -> Table
Worldmap -> Geomap
This migration may corrupt custom Grafana dashboards that have Angular-based
panels. Therefore, if you have such dashboards, back them up and manually
upgrade Angular-based panels during the course of MOSK 24.3
and Container Cloud 2.28.x (Cluster releases 17.3.x and 16.3.x) to prevent
custom appearance issues after plugin migration in Container Cloud 2.29.0 and
MOSK 25.1.
Note
All Grafana dashboards provided by StackLight are also migrated to
React automatically. For the list of default dashboards, see
View Grafana dashboards.
Warning
For management clusters that are updated automatically, it is
important to prepare the backup before Container Cloud 2.29.0 is released.
Otherwise, custom dashboards using Angular-based plugins may be corrupted.
For managed clusters, you can perform the backup after the Container Cloud
2.29.0 release date but before updating them to MOSK
25.1.
The table below contains the total number of addressed unique and common
CVEs by MOSK-specific component compared to the previous
release version. The common CVEs are issues addressed across several images.
For the detailed list of fixed and present CVEs across the Mirantis
Container Cloud and MOSK products, refer to
Mirantis Security Portal.
Mirantis Container Cloud CVEs
For the number of fixed CVEs in the Mirantis Container Cloud-related
components including kaas core, bare metal, Ceph, and StackLight, refer to
Container Cloud 2.28.5: Security notes.
The following issues have been addressed in the MOSK
24.3.2 release:
[47115][OpenStack] Resolved the issue that prevented the virtual
machine with a floating IP address assigned to it from reaching the gateway
node located on the same hypervisor.
[48274][OpenStack] Resolved the issue that caused the failure
of live migrations if multiple networks existed on an instance.
[48571][OpenStack] Resolved the issue that caused Keystone and
DNS downtimes during the OpenStack controller replacement.
[48614][OpenStack] Resolved the issue that prevented the PXE boot
from downloading the iPXE file due to restrictive permissions on the
tftpboot directory.
This section lists MOSK known issues with workarounds for
the MOSK release 24.3.2. For the known issues in the related
Container Cloud release, refer to Mirantis Container Cloud: Release Notes.
OpenStack¶[31186,34132] Pods get stuck during MariaDB operations¶
During MariaDB operations on a management cluster, Pods may get stuck
in continuous restarts with the following example error:
Create a backup of the /var/lib/mysql directory on the
mariadb-server Pod.
Verify that other replicas are up and ready.
Remove the galera.cache file for the affected mariadb-server Pod.
Remove the affected mariadb-server Pod or wait until it is automatically
restarted.
After Kubernetes restarts the Pod, the Pod clones the database in 1-2 minutes
and restores the quorum.
[42386] A load balancer service does not obtain the external IP address¶
Due to the MetalLB upstream issue,
a load balancer service may not obtain the external IP address.
The issue occurs when two services share the same external IP address and have
the same externalTrafficPolicy value. Initially, the services have the
external IP address assigned and are accessible. After modifying the
externalTrafficPolicy value for both services from Cluster to
Local, the first service that has been changed remains with no external IP
address assigned, whereas the second service, which was changed later, has the
external IP assigned as expected.
To work around the issue, make a dummy change to the service object where
external IP is <pending>:
Sometimes, after changing the OpenStackDeployment custom resource,
it does not transition to the APPLYING state as expected.
To work around the issue, restart the rockoon pod in the osh-system
namespace.
Tungsten Fabric¶[13755] TF pods switch to CrashLoopBackOff after a simultaneous reboot¶
Rebooting all Cassandra cluster TFConfig or TFAnalytics nodes, maintenance, or
other circumstances that cause the Cassandra pods to start simultaneously may
cause a broken Cassandra TFConfig and/or TFAnalytics cluster.
In this case, Cassandra nodes do not join the ring and do not update the IPs of
the neighbor nodes. As a result, the TF services cannot operate Cassandra
cluster(s).
To verify that a Cassandra cluster is affected:
Run the nodetool status command specifying the config or
analytics cluster and the replica number:
[40032] tf-rabbitmq fails to start after rolling reboot¶
Occasionally, RabbitMQ instances in tf-rabbitmq pods fail to enable
the tracking_records_in_ets during the initialization process.
To work around the problem, restart the affected pods manually.
[42896] Cassandra cluster contains extra node
with outdated IP after replacement of TF control node¶
After replacing a failed Tungsten Fabric controller node as described in
Replace a failed TF controller node, the first restart of the Cassandra
pod on this node may cause an issue if the Cassandra node with the outdated
IP address has not been removed from the cluster. Subsequent Cassandra pod
restarts should not trigger this problem.
To verify if your Cassandra cluster is affected, run the
nodetool status command specifying the config or analytics cluster
and the replica number:
An extra node will appear in the cluster with an outdated IP address
(the IP of the terminated Cassandra pod) in the Down state.
To work around the issue, after replacing the Tungsten Fabric
controller node, delete the Cassandra pod on the replaced node and remove
the outdated node from the Cassandra cluster using nodetool:
Tag-based filtering of logs using the tag_include parameter does not work
for the logging.externalOutputs feature when output_kind:audit is
selected.
For example, if the user wants to send only logs from the sudo
program and sets tag_include:sudo, none of the logs will be sent to an
external destination.
To work around the issue, allow forwarding of all audit logs, which, in
addition to sudo, include logs from sshd, systemd-logind, and su.
Instead of tag_include:sudo, specify tag_include:'{sudo,systemd-audit}'.
Once the fix is delivered in MOSK 25.1, filtering will start working
automatically.
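For illustration, a minimal sketch of the corresponding StackLight
logging.externalOutputs snippet; the output name and the destination-specific
parameters are placeholders, and only output_kind and tag_include reflect the
workaround above:
logging:
  externalOutputs:
    audit-to-siem:                          # example output name
      output_kind: audit
      # tag_include: sudo                   # affected: nothing gets forwarded
      tag_include: '{sudo,systemd-audit}'   # workaround: forward all audit logs
      # ... destination-specific parameters (type, host, port, and so on)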
[51524] sf-notifier creates a large number of relogins to Salesforce¶
The incompatibility between the newly implemented session refresh in the
upstream simple-salesforce and the MOSK implementation of session refresh
in sf-notifier results in the uncontrolled growth of new logins and lack
of session reuse. The issue applies to both MOSK and management clusters.
Workaround:
The workaround is to change the sf-notifier image tag directly in the
Deployment object. This change is not persistent because it will be reverted
or overridden by:
Container Cloud version update (for management clusters)
Cluster release version update (for MOSK cluster)
Any sf-notifier-related operation (for all clusters):
Disable and enable
Credentials change
IDs change
Any configuration change for resources, node selector, tolerations, and
log level
Once applied, this workaround must be re-applied whenever one of the
above operations is performed in the cluster.
Compare the sf-notifier image tag with the list of affected tags.
If the image is affected, it has to be replaced. Otherwise, your cluster
is not affected.
In the resulting string, replace only the tag of the affected image with
the desired v0.4-20240828023015 tag. Keep the registry the same as
in the original Deployment object.
Wait until the pod with the updated image is created, and check the logs.
Verify that there are no errors in the logs:
kubectl logs pod/<sf-notifier pod> -n stacklight
As this change is not persistent and can be reverted by the cluster update
operation or any operation related to sf-notifier, periodically check all
clusters and if the change has been reverted, re-apply the workaround.
Optionally, you can add a custom alert that will monitor the current tag of
the sf-notifier image and will fire the alert if the tag is present in
the list of affected tags. For the custom alert configuration details,
refer to alert-configuration.
Example of a custom alert to monitor the current tag of the sf-notifier
image:
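The following is a minimal illustrative sketch of such an alerting rule,
assuming that kube-state-metrics exposes the kube_pod_container_info metric
and that the container is named sf-notifier; the alert name, the affected-tag
placeholders, and the severity are examples only, and the exact place to
define the rule is described in alert-configuration:
- alert: SfNotifierAffectedImageTag
  annotations:
    summary: sf-notifier is running an affected image tag
    description: Replace the sf-notifier image tag as described in the known issue 51524.
  expr: >-
    kube_pod_container_info{namespace="stacklight",container="sf-notifier",
    image=~".*:(<affected-tag-1>|<affected-tag-2>)$"} > 0
  for: 10m
  labels:
    severity: warning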
Update known issues¶[42449] Rolling reboot failure on a Tungsten Fabric cluster¶
During cluster update, the rolling reboot fails on the Tungsten Fabric cluster.
To work around the issue, restart the RabbitMQ pods in the Tungsten
Fabric cluster.
[46671] Cluster update fails with the tf-config pods crashed¶
When updating to the MOSK 24.3 series, tf-config pods from the Tungsten
Fabric namespace may enter the CrashLoopBackOff state. For example:
To troubleshoot the issue, check the logs inside the tf-config API
container and the tf-cassandra pods. The following example logs
indicate that Cassandra services failed to peer with each other and
are operating independently:
Logs from the tf-config API container:
NoHostAvailable: ('Unable to complete the operation against any hosts', {<Host: 192.168.200.23:9042 dc1>: Unavailable('Error from server: code=1000 [Unavailable exception] message="Cannot achieve consistency level QUORUM" info={\'required_replicas\': 2, \'alive_replicas\': 1, \'consistency\': \'QUORUM\'}',)})
Logs from the tf-cassandra pods:
INFO [OptionalTasks:1] 2024-09-09 08:59:36,231 CassandraRoleManager.java:419 - Setup task failed with error, rescheduling
WARN [OptionalTasks:1] 2024-09-09 08:59:46,231 CassandraRoleManager.java:379 - CassandraRoleManager skipped default role setup: some nodes were not ready
To work around the issue, restart the Cassandra services in the Tungsten
Fabric namespace by deleting the affected pods sequentially to establish
the connection between them:
The cluster is affected if orphaned containers with the k8s_ prefix are
present on the affected nodes:
docker ps -a --format '{{ .Names }}' | grep '^k8s_'
Workaround:
Inspect recent Ansible logs at /var/log/lcm/* and make sure that the
only failed task during migration is Delete running pods. If so, proceed
to the next step. Otherwise, contact Mirantis support for further
information.
Stop and remove orphaned containers with the k8s_ prefix.
Note
This action has no impact on the cluster because the nodes are
already cordoned and drained as part of the maintenance window.
[49678] The Machine status is flapping after migration to containerd¶
On cluster machines where any HostOSConfiguration object is targeted and
migration to containerd is applied, the machine status may be flapping
(Configure → Ready → Configure → Ready) with the
HostOSConfiguration-related Ansible tasks constantly restarting. This
occurs due to the HostOSConfiguration object state items being constantly
added and then removed from related LCMMachine objects.
To work around the issue, temporarily disable all HostOSConfiguration
objects until the issue is resolved. The resolution is expected in
Container Cloud 2.29.0, after the management cluster update to the Cluster
release 16.4.0.
To disable HostOSConfiguration objects:
In the machineSelector:matchLabels section of every
HostOSConfiguration object, remove the corresponding label selectors for
cluster machines. See the example after this procedure.
Wait for each HostOSConfiguration object status to be updated
and the machinesStates field to be absent:
Once the issue is resolved in the target release, re-enable all objects using
the same procedure.
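For illustration, a hypothetical HostOSConfiguration fragment showing step 1
of the procedure above; the API version, object name, and label are examples
and must be taken from your actual objects:
apiVersion: kaas.mirantis.com/v1alpha1   # assumed API group for HostOSConfiguration
kind: HostOSConfiguration
metadata:
  name: example-host-tuning              # example object name
spec:
  machineSelector:
    matchLabels: {}                      # label selectors removed to disable the object
    # matchLabels:                       # original selector, restore it to re-enable:
    #   hostos-config: enabled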
[49705] Cluster update is stuck due to unhealthy tf-vrouter-agent-dpdk pods¶
During a MOSK cluster update, the tf-vrouter-agent-dpdk pods may become
unhealthy due to a failed LivenessProbe, causing the update process to get
stuck. The issue may only affect major updates when the cluster dataplane
components are restarted.
To work around the issue, manually remove the tf-vrouter-agent-dpdk
pods.
Container Cloud web UI¶[50181] Failure to deploy a compact cluster using the Container Cloud web UI¶
A compact MOSK cluster fails to be deployed through the Container Cloud web UI
due to the inability to add any label to the control plane machines and to
change dedicatedControlPlane:false using the web UI.
To work around the issue, manually add the required labels using the CLI. Once
done, the cluster deployment resumes.
[50168] Inability to use a new project through the Container Cloud web UI¶
A newly created project does not display all available tabs and shows
various access denied errors during the first five minutes after
creation.
To work around the issue, refresh the browser five minutes after the
project creation.
This section describes the specific actions you need to complete to accurately
plan and successfully perform the update. Consider this information as
a supplement to the generic update procedures published in
Operations Guide: Cluster update.
~1% of read operations on cloud API resources may fail
~8% of create and update operations on cloud API resources may fail
Open vSwitch networking - interruption of North-South connectivity,
depending on the type of virtual routers used by a workload:
Distributed (DVR) routers - no interruption
Non-distributed routers, High Availability (HA) mode - interruption up
to 1 minute, usually less than 5 seconds
Non-distributed routers, non-HA mode - interruption up to 10 minutes
Tungsten Fabric networking - no impact
Ceph
~1% of read operations on object storage API may fail
IO performance degradation for Ceph-backed virtual storage devices.
Pay special attention to the known issue
50566
that may affect the maintenance window.
You can bypass updating components of the cloud data plane to avoid
the network downtime during Update to a patch version. By using
this technique, you accept the risk that some security fixes may
not be applied.
Major update impact and maintenance windows planning¶
The following table provides details on the update impact on
a MOSK cluster.
Host operating system needs to be rebooted for the kernel update
to be applied. Configure live-migration of workloads to avoid the impact on the instances running
on a host.
To properly plan the update maintenance window, use the following
documentation:
Before updating the cluster, be sure to review the potential issues that
may arise during the process and the recommended solutions to address
them, as outlined in Update known issues.
Post-update actions¶Upgrade Ubuntu to 22.04 and migrate to containerd runtime¶
MOSK 24.3 release series is the last release series
to support Ubuntu 20.04 as the host operating system. Ubuntu 20.04 reaches
end-of-life in April 2025. Therefore, Mirantis encourages all
MOSK users to upgrade their clusters to Ubuntu 22.04
as soon as possible after getting to the 24.3 series. A host operating
system upgrade requires reboot of the nodes and can be performed in small
batches.
Also, MOSK 24.3.1 introduces the new container runtime for
the underlying Kubernetes cluster - containerd. The migration from Docker to
containerd in 24.3.2 is still optional and requires node cordoning and
draining. Therefore, if you decide to start using containerd and have not
yet upgraded to Ubuntu 22.04, Mirantis highly recommends that these two
changes be applied simultaneously to every node to minimize downtime for
cloud workloads and users.
In MOSK 24.3.x, the default container runtime
remains Docker for greenfield deployments. Support for greenfield
deployments based on containerd will be announced in one of the following
releases.
Warning
Update of management or MOSK clusters running
Ubuntu 20.04 or Docker runtime will not be possible in the following product
series.
Usage of third-party software, which is not part of Mirantis-supported
configurations, for example, the use of custom DPDK modules, may block
upgrade of an operating system distribution. Users are fully responsible
for ensuring the compatibility of such custom components with the latest
supported Ubuntu version.
If you have not upgraded the operating system distribution on your machines
to 22.04 yet, Mirantis recommends migrating machines from Docker to containerd
on managed clusters together with distribution upgrade to minimize the
maintenance window. In this case, ensure that all cluster machines are
updated at once during the same maintenance window to prevent machines
from running different container runtimes.
In MOSK 25.1 and Container Cloud 2.29.0, Grafana will be
updated to version 11 where the following deprecated Angular-based plugins will
be automatically migrated to the React-based ones:
Graph (old) -> Time Series
Singlestat -> Stat
Stat (old) -> Stat
Table (old) -> Table
Worldmap -> Geomap
This migration may corrupt custom Grafana dashboards that have Angular-based
panels. Therefore, if you have such dashboards, back them up and manually
upgrade Angular-based panels during the course of MOSK 24.3
and Container Cloud 2.28.x (Cluster releases 17.3.x and 16.3.x) to prevent
custom appearance issues after plugin migration in Container Cloud 2.29.0 and
MOSK 25.1.
Note
All Grafana dashboards provided by StackLight are also migrated to
React automatically. For the list of default dashboards, see
View Grafana dashboards.
Warning
For management clusters that are updated automatically, it is
important to prepare the backup before Container Cloud 2.29.0 is released.
Otherwise, custom dashboards using Angular-based plugins may be corrupted.
For managed clusters, you can perform the backup after the Container Cloud
2.29.0 release date but before updating them to MOSK
25.1.
The table below contains the total number of addressed unique and common
CVEs by MOSK-specific component compared to the previous
release version. The common CVEs are issues addressed across several images.
For the detailed list of fixed and present CVEs across the Mirantis
Container Cloud and MOSK products, refer to
Mirantis Security Portal.
Mirantis Container Cloud CVEs
For the number of fixed CVEs in the Mirantis Container Cloud-related
components including kaas core, bare metal, Ceph, and StackLight, refer to
Container Cloud 2.29.1: Security notes.
This section lists MOSK known issues with workarounds for
the MOSK release 24.3.3. For the known issues in the related
Container Cloud release, refer to Mirantis Container Cloud: Release Notes.
Update known issues¶[42449] Rolling reboot failure on a Tungsten Fabric cluster¶
During cluster update, the rolling reboot fails on the Tungsten Fabric cluster.
To work around the issue, restart the RabbitMQ pods in the Tungsten
Fabric cluster.
[46671] Cluster update fails with the tf-config pods crashed¶
When updating to the MOSK 24.3 series, tf-config pods from the Tungsten
Fabric namespace may enter the CrashLoopBackOff state. For example:
To troubleshoot the issue, check the logs inside the tf-config API
container and the tf-cassandra pods. The following example logs
indicate that Cassandra services failed to peer with each other and
are operating independently:
Logs from the tf-config API container:
NoHostAvailable: ('Unable to complete the operation against any hosts', {<Host: 192.168.200.23:9042 dc1>: Unavailable('Error from server: code=1000 [Unavailable exception] message="Cannot achieve consistency level QUORUM" info={\'required_replicas\': 2, \'alive_replicas\': 1, \'consistency\': \'QUORUM\'}',)})
Logs from the tf-cassandra pods:
INFO [OptionalTasks:1] 2024-09-09 08:59:36,231 CassandraRoleManager.java:419 - Setup task failed with error, rescheduling
WARN [OptionalTasks:1] 2024-09-09 08:59:46,231 CassandraRoleManager.java:379 - CassandraRoleManager skipped default role setup: some nodes were not ready
To work around the issue, restart the Cassandra services in the Tungsten
Fabric namespace by deleting the affected pods sequentially to establish
the connection between them:
The cluster is affected if orphaned containers with the k8s_ prefix are
present on the affected nodes:
docker ps -a --format '{{ .Names }}' | grep '^k8s_'
Workaround:
Inspect recent Ansible logs at /var/log/lcm/* and make sure that the
only failed task during migration is Delete running pods. If so, proceed
to the next step. Otherwise, contact Mirantis support for further
information.
Stop and remove orphaned containers with the k8s_ prefix.
Note
This action has no impact on the cluster because the nodes are
already cordoned and drained as part of the maintenance window.
[49678] The Machine status is flapping after migration to containerd¶
On cluster machines where any HostOSConfiguration object is targeted and
migration to containerd is applied, the machine status may be flapping
(Configure → Ready → Configure → Ready) with the
HostOSConfiguration-related Ansible tasks constantly restarting. This
occurs due to the HostOSConfiguration object state items being constantly
added and then removed from related LCMMachine objects.
To work around the issue, temporarily disable all HostOSConfiguration
objects until the issue is resolved. The resolution is expected in
Container Cloud 2.29.0, after the management cluster update to the Cluster
release 16.4.0.
To disable HostOSConfiguration objects:
In the machineSelector:matchLabels section of every
HostOSConfiguration object, remove the corresponding label selectors for
cluster machines.
Wait for each HostOSConfiguration object status to be updated
and the machinesStates field to be absent:
OpenStack¶[31186,34132] Pods get stuck during MariaDB operations¶
During MariaDB operations on a management cluster, Pods may get stuck
in continuous restarts.
Workaround:
Create a backup of the /var/lib/mysql directory on the
mariadb-server Pod.
Verify that other replicas are up and ready.
Remove the galera.cache file for the affected mariadb-server Pod.
Remove the affected mariadb-server Pod or wait until it is automatically
restarted.
After Kubernetes restarts the Pod, the Pod clones the database in 1-2 minutes
and restores the quorum.
[42386] A load balancer service does not obtain the external IP address¶
Due to the MetalLB upstream issue,
a load balancer service may not obtain the external IP address.
The issue occurs when two services share the same external IP address and have
the same externalTrafficPolicy value. Initially, the services have the
external IP address assigned and are accessible. After modifying the
externalTrafficPolicy value for both services from Cluster to
Local, the first service that has been changed remains with no external IP
address assigned, whereas the second service, which was changed later, has the
external IP assigned as expected.
To work around the issue, make a dummy change to the service object where
external IP is <pending>:
Sometimes, after changing the OpenStackDeployment custom resource,
it does not transition to the APPLYING state as expected.
To work around the issue, restart the rockoon pod in the osh-system
namespace.
Tungsten Fabric¶[13755] TF pods switch to CrashLoopBackOff after a simultaneous reboot¶
Rebooting all Cassandra cluster TFConfig or TFAnalytics nodes, maintenance, or
other circumstances that cause the Cassandra pods to start simultaneously may
cause a broken Cassandra TFConfig and/or TFAnalytics cluster.
In this case, Cassandra nodes do not join the ring and do not update the IPs of
the neighbor nodes. As a result, the TF services cannot operate Cassandra
cluster(s).
To verify that a Cassandra cluster is affected:
Run the nodetool status command specifying the config or
analytics cluster and the replica number:
[40032] tf-rabbitmq fails to start after rolling reboot¶
Occasionally, RabbitMQ instances in tf-rabbitmq pods fail to enable
the tracking_records_in_ets during the initialization process.
To work around the problem, restart the affected pods manually.
[42896] Cassandra cluster contains extra node
with outdated IP after replacement of TF control node¶
After replacing a failed Tungsten Fabric controller node as described in
Replace a failed TF controller node, the first restart of the Cassandra
pod on this node may cause an issue if the Cassandra node with the outdated
IP address has not been removed from the cluster. Subsequent Cassandra pod
restarts should not trigger this problem.
To verify if your Cassandra cluster is affected, run the
nodetool status command specifying the config or analytics cluster
and the replica number:
An extra node will appear in the cluster with an outdated IP address
(the IP of the terminated Cassandra pod) in the Down state.
To work around the issue, after replacing the Tungsten Fabric
controller node, delete the Cassandra pod on the replaced node and remove
the outdated node from the Cassandra cluster using nodetool:
Tag-based filtering of logs using the tag_include parameter does not work
for the logging.externalOutputs feature when output_kind:audit is
selected.
For example, if the user wants to send only logs from the sudo
program and sets tag_include:sudo, none of the logs will be sent to an
external destination.
To work around the issue, allow forwarding of all audit logs, which, in
addition to sudo, include logs from sshd, systemd-logind, and su.
Instead of tag_include:sudo, specify tag_include:'{sudo,systemd-audit}'.
Once the fix is delivered in MOSK 25.1, filtering will start working
automatically.
[51524] sf-notifier creates a large number of relogins to Salesforce¶
The incompatibility between the newly implemented session refresh in the
upstream simple-salesforce and the MOSK implementation of session refresh
in sf-notifier results in the uncontrolled growth of new logins and lack
of session reuse. The issue applies to both MOSK and management clusters.
Workaround:
The workaround is to change the sf-notifier image tag directly in the
Deployment object. This change is not persistent because it will be reverted
or overridden by:
Container Cloud version update (for management clusters)
Cluster release version update (for MOSK cluster)
Any sf-notifier-related operation (for all clusters):
Disable and enable
Credentials change
IDs change
Any configuration change for resources, node selector, tolerations, and
log level
Once applied, this workaround must be re-applied whenever one of the
above operations is performed in the cluster.
Compare the sf-notifier image tag with the list of affected tags.
If the image is affected, it has to be replaced. Otherwise, your cluster
is not affected.
In the resulting string, replace only the tag of the affected image with
the desired v0.4-20240828023015 tag. Keep the registry the same as
in the original Deployment object.
Wait until the pod with the updated image is created, and check the logs.
Verify that there are no errors in the logs:
kubectl logs pod/<sf-notifier pod> -n stacklight
As this change is not persistent and can be reverted by the cluster update
operation or any operation related to sf-notifier, periodically check all
clusters and if the change has been reverted, re-apply the workaround.
Optionally, you can add a custom alert that will monitor the current tag of
the sf-notifier image and will fire the alert if the tag is present in
the list of affected tags. For the custom alert configuration details,
refer to alert-configuration.
Example of a custom alert to monitor the current tag of the sf-notifier
image:
Container Cloud web UI¶[50168] Inability to use a new project through the Container Cloud web UI¶
A newly created project does not display all available tabs and shows
various access denied errors during the first five minutes after
creation.
To work around the issue, refresh the browser five minutes after the
project creation.
[50181] Failure to deploy a compact cluster using the Container Cloud web UI¶
A compact MOSK cluster fails to be deployed through the Container Cloud web UI
due to the inability to add any label to the control plane machines and to
change dedicatedControlPlane:false using the web UI.
To work around the issue, manually add the required labels using the CLI. Once
done, the cluster deployment resumes.
This section describes the specific actions you need to complete to accurately
plan and successfully perform the update. Consider this information as
a supplement to the generic update procedures published in
Operations Guide: Cluster update.
~1% of read operations on cloud API resources may fail
~8% of create and update operations on cloud API resources may fail
Open vSwitch networking - interruption of North-South connectivity,
depending on the type of virtual routers used by a workload:
Distributed (DVR) routers - no interruption
Non-distributed routers, High Availability (HA) mode - interruption up
to 1 minute, usually less than 5 seconds
Non-distributed routers, non-HA mode - interruption up to 10 minutes
Tungsten Fabric networking - no impact
Ceph
~1% of read operations on object storage API may fail
IO performance degradation for Ceph-backed virtual storage devices.
Pay special attention to the known issue
50566
that may affect the maintenance window.
You can bypass updating components of the cloud data plane to avoid
the network downtime during Update to a patch version. By using
this technique, you accept the risk that some security fixes may
not be applied.
Major update impact and maintenance windows planning¶
The following table provides details on the update impact on
a MOSK cluster.
Host operating system needs to be rebooted for the kernel update
to be applied. Configure live-migration of workloads to avoid the impact on the instances running
on a host.
Before updating the cluster, be sure to review the potential issues that
may arise during the process and the recommended solutions to address
them, as outlined in Update known issues.
Pre-update actions¶Update MOSK clusters to Ubuntu 22.04¶
Management cluster update to Container Cloud 2.29.1 will be blocked if at least
one node of any related MOSK cluster is running Ubuntu
20.04.
Therefore, ensure that every node of your MOSK clusters
is running Ubuntu 22.04 to unblock the management cluster update to Container
Cloud 2.29.1 and the MOSK cluster update to 24.3.3.
Usage of third-party software, which is not part of
Mirantis-supported configurations, for example, the use of custom DPDK
modules, may block upgrade of an operating system distribution. Users
are fully responsible for ensuring the compatibility of such custom
components with the latest supported Ubuntu version.
Migrate container runtime from Docker to containerd¶
Since 24.3.1, MOSK introduces the new container runtime for
the underlying Kubernetes cluster - containerd. The migration from Docker to
containerd in 24.3.3 is still optional and requires node cordoning and
draining.
If you decide to start using containerd and have not yet upgraded to Ubuntu
22.04, Mirantis highly recommends that these two changes be applied
simultaneously to every node to minimize downtime for cloud workloads
and users. In this case, ensure that all cluster machines are
updated at once during the same maintenance window to prevent machines
from running different container runtimes.
In MOSK 24.3.x, the default container runtime
remains Docker for greenfield deployments. Support for greenfield
deployments based on containerd is added in Container Cloud 2.29.0
(Cluster release 16.4.0) for management clusters and in
MOSK 25.1 for MOSK clusters.
In MOSK 25.1 and Container Cloud 2.29.0, Grafana will be
updated to version 11 where the following deprecated Angular-based plugins will
be automatically migrated to the React-based ones:
Graph (old) -> Time Series
Singlestat -> Stat
Stat (old) -> Stat
Table (old) -> Table
Worldmap -> Geomap
This migration may corrupt custom Grafana dashboards that have Angular-based
panels. Therefore, if you have such dashboards, back them up and manually
upgrade Angular-based panels during the course of MOSK 24.3
and Container Cloud 2.28.x (Cluster releases 17.3.x and 16.3.x) to prevent
custom appearance issues after plugin migration in Container Cloud 2.29.0 and
MOSK 25.1.
Note
All Grafana dashboards provided by StackLight are also migrated to
React automatically. For the list of default dashboards, see
View Grafana dashboards.
Warning
For management clusters that are updated automatically, it is
important to prepare the backup before Container Cloud 2.29.0 is released.
Otherwise, custom dashboards using Angular-based plugins may be corrupted.
For MOSK clusters, you can perform the backup after the
Container Cloud 2.29.0 release date but before updating them to
MOSK 25.1.
The table below contains the total number of addressed unique and common
CVEs by MOSK-specific component compared to the previous
release version. The common CVEs are issues addressed across several images.
For the detailed list of fixed and present CVEs across the Mirantis
Container Cloud and MOSK products, refer to
Mirantis Security Portal.
Mirantis Container Cloud CVEs
For the number of fixed CVEs in the Mirantis Container Cloud-related
components including kaas core, bare metal, Ceph, and StackLight, refer to
Container Cloud 2.29.2: Security notes.
This section lists MOSK known issues with workarounds for
the MOSK release 24.3.4. For the known issues in the related
Container Cloud release, refer to Mirantis Container Cloud: Release Notes.
Update known issues¶[42449] Rolling reboot failure on a Tungsten Fabric cluster¶
During cluster update, the rolling reboot fails on the Tungsten Fabric cluster.
To work around the issue, restart the RabbitMQ pods in the Tungsten
Fabric cluster.
[46671] Cluster update fails with the tf-config pods crashed¶
When updating to the MOSK 24.3 series, tf-config pods from the Tungsten
Fabric namespace may enter the CrashLoopBackOff state. For example:
To troubleshoot the issue, check the logs inside the tf-config API
container and the tf-cassandra pods. The following example logs
indicate that Cassandra services failed to peer with each other and
are operating independently:
Logs from the tf-config API container:
NoHostAvailable: ('Unable to complete the operation against any hosts', {<Host: 192.168.200.23:9042 dc1>: Unavailable('Error from server: code=1000 [Unavailable exception] message="Cannot achieve consistency level QUORUM" info={\'required_replicas\': 2, \'alive_replicas\': 1, \'consistency\': \'QUORUM\'}',)})
Logs from the tf-cassandra pods:
INFO [OptionalTasks:1] 2024-09-09 08:59:36,231 CassandraRoleManager.java:419 - Setup task failed with error, rescheduling
WARN [OptionalTasks:1] 2024-09-09 08:59:46,231 CassandraRoleManager.java:379 - CassandraRoleManager skipped default role setup: some nodes were not ready
To work around the issue, restart the Cassandra services in the Tungsten
Fabric namespace by deleting the affected pods sequentially to establish
the connection between them:
The cluster is affected if orphaned containers with the k8s_ prefix are
present on the affected nodes:
docker ps -a --format '{{ .Names }}' | grep '^k8s_'
Workaround:
Inspect recent Ansible logs at /var/log/lcm/* and make sure that the
only failed task during migration is Delete running pods. If so, proceed
to the next step. Otherwise, contact Mirantis support for further
information.
Stop and remove orphaned containers with the k8s_ prefix.
Note
This action has no impact on the cluster because the nodes are
already cordoned and drained as part of the maintenance window.
[49678] The Machine status is flapping after migration to containerd¶
On cluster machines where any HostOSConfiguration object is targeted and
migration to containerd is applied, the machine status may be flapping
(Configure → Ready → Configure → Ready) with the
HostOSConfiguration-related Ansible tasks constantly restarting. This
occurs due to the HostOSConfiguration object state items being constantly
added and then removed from related LCMMachine objects.
To work around the issue, temporarily disable all HostOSConfiguration
objects until the issue is resolved. The resolution is expected in
Container Cloud 2.29.0, after the management cluster update to the Cluster
release 16.4.0.
To disable HostOSConfiguration objects:
In the machineSelector:matchLabels section of every
HostOSConfiguration object, remove the corresponding label selectors for
cluster machines.
Wait for each HostOSConfiguration object status to be updated
and the machinesStates field to be absent:
OpenStack¶[31186,34132] Pods get stuck during MariaDB operations¶
During MariaDB operations on a management cluster, Pods may get stuck
in continuous restarts.
Workaround:
Create a backup of the /var/lib/mysql directory on the
mariadb-server Pod.
Verify that other replicas are up and ready.
Remove the galera.cache file for the affected mariadb-server Pod.
Remove the affected mariadb-server Pod or wait until it is automatically
restarted.
After Kubernetes restarts the Pod, the Pod clones the database in 1-2 minutes
and restores the quorum.
[42386] A load balancer service does not obtain the external IP address¶
Due to the MetalLB upstream issue,
a load balancer service may not obtain the external IP address.
The issue occurs when two services share the same external IP address and have
the same externalTrafficPolicy value. Initially, the services have the
external IP address assigned and are accessible. After modifying the
externalTrafficPolicy value for both services from Cluster to
Local, the first service that has been changed remains with no external IP
address assigned, whereas the second service, which was changed later, has the
external IP assigned as expected.
To work around the issue, make a dummy change to the service object where
external IP is <pending>:
Sometimes, after changing the OpenStackDeployment custom resource,
it does not transition to the APPLYING state as expected.
To work around the issue, restart the rockoon pod in the osh-system
namespace.
Tungsten Fabric¶[13755] TF pods switch to CrashLoopBackOff after a simultaneous reboot¶
Rebooting all Cassandra cluster TFConfig or TFAnalytics nodes, maintenance, or
other circumstances that cause the Cassandra pods to start simultaneously may
cause a broken Cassandra TFConfig and/or TFAnalytics cluster.
In this case, Cassandra nodes do not join the ring and do not update the IPs of
the neighbor nodes. As a result, the TF services cannot operate Cassandra
cluster(s).
To verify that a Cassandra cluster is affected:
Run the nodetool status command specifying the config or
analytics cluster and the replica number:
[40032] tf-rabbitmq fails to start after rolling reboot¶
Occasionally, RabbitMQ instances in tf-rabbitmq pods fail to enable
the tracking_records_in_ets during the initialization process.
To work around the problem, restart the affected pods manually.
[42896] Cassandra cluster contains extra node
with outdated IP after replacement of TF control node¶
After replacing a failed Tungsten Fabric controller node as described in
Replace a failed TF controller node, the first restart of the Cassandra
pod on this node may cause an issue if the Cassandra node with the outdated
IP address has not been removed from the cluster. Subsequent Cassandra pod
restarts should not trigger this problem.
To verify if your Cassandra cluster is affected, run the
nodetool status command specifying the config or analytics cluster
and the replica number:
An extra node will appear in the cluster with an outdated IP address
(the IP of the terminated Cassandra pod) in the Down state.
To work around the issue, after replacing the Tungsten Fabric
controller node, delete the Cassandra pod on the replaced node and remove
the outdated node from the Cassandra cluster using nodetool:
Tag-based filtering of logs using the tag_include parameter does not work
for the logging.externalOutputs feature when output_kind:audit is
selected.
For example, if the user wants to send only logs from the sudo
program and sets tag_include:sudo, none of the logs will be sent to an
external destination.
To work around the issue, allow forwarding of all audit logs, which, in
addition to sudo, include logs from sshd, systemd-logind, and su.
Instead of tag_include:sudo, specify tag_include:'{sudo,systemd-audit}'.
Once the fix is delivered in MOSK 25.1, filtering will start working
automatically.
[51524] sf-notifier creates a large number of relogins to Salesforce¶
The incompatibility between the newly implemented session refresh in the
upstream simple-salesforce and the MOSK implementation of session refresh
in sf-notifier results in the uncontrolled growth of new logins and lack
of session reuse. The issue applies to both MOSK and management clusters.
Workaround:
The workaround is to change the sf-notifier image tag directly in the
Deployment object. This change is not persistent because it will be reverted
or overridden by:
Container Cloud version update (for management clusters)
Cluster release version update (for MOSK cluster)
Any sf-notifier-related operation (for all clusters):
Disable and enable
Credentials change
IDs change
Any configuration change for resources, node selector, tolerations, and
log level
Once applied, this workaround must be re-applied whenever one of the
above operations is performed in the cluster.
Compare the sf-notifier image tag with the list of affected tags.
If the image is affected, it has to be replaced. Otherwise, your cluster
is not affected.
In the resulting string, replace only the tag of the affected image with
the desired v0.4-20240828023015 tag. Keep the registry the same as
in the original Deployment object.
Wait until the pod with the updated image is created, and check the logs.
Verify that there are no errors in the logs:
kubectl logs pod/<sf-notifier pod> -n stacklight
As this change is not persistent and can be reverted by the cluster update
operation or any operation related to sf-notifier, periodically check all
clusters and if the change has been reverted, re-apply the workaround.
Optionally, you can add a custom alert that will monitor the current tag of
the sf-notifier image and will fire the alert if the tag is present in
the list of affected tags. For the custom alert configuration details,
refer to alert-configuration.
Example of a custom alert to monitor the current tag of the sf-notifier
image:
Container Cloud web UI¶[50168] Inability to use a new project through the Container Cloud web UI¶
A newly created project does not display all available tabs and shows
various access denied errors during the first five minutes after
creation.
To work around the issue, refresh the browser five minutes after the
project creation.
[50181] Failure to deploy a compact cluster using the Container Cloud web UI¶
A compact MOSK cluster fails to be deployed through the Container Cloud web UI
due to the inability to add any label to the control plane machines and to
change dedicatedControlPlane:false using the web UI.
To work around the issue, manually add the required labels using the CLI. Once
done, the cluster deployment resumes.
This section describes the specific actions you need to complete to accurately
plan and successfully perform the update. Consider this information as
a supplement to the generic update procedures published in
Operations Guide: Cluster update.
~1% of read operations on cloud API resources may fail
~8% of create and update operations on cloud API resources may fail
Open vSwitch networking - interruption of North-South connectivity,
depending on the type of virtual routers used by a workload:
Distributed (DVR) routers - no interruption
Non-distributed routers, High Availability (HA) mode - interruption up
to 1 minute, usually less than 5 seconds
Non-distributed routers, non-HA mode - interruption up to 10 minutes
Tungsten Fabric networking - no impact
Ceph
~1% of read operations on object storage API may fail
IO performance degradation for Ceph-backed virtual storage devices.
Pay special attention to the known issue
50566
that may affect the maintenance window.
You can bypass updating components of the cloud data plane to avoid
the network downtime during Update to a patch version. By using
this technique, you accept the risk that some security fixes may
not be applied.
Major update impact and maintenance windows planning¶
The following table provides details on the update impact on
a MOSK cluster.
Host operating system needs to be rebooted for the kernel update
to be applied. Configure live-migration of workloads to avoid the impact on the instances running
on a host.
Before updating the cluster, be sure to review the potential issues that
may arise during the process and the recommended solutions to address
them, as outlined in Update known issues.
Post-update actions¶Mandatory migration of container runtime from Docker to containerd¶
Migration of container runtime from Docker to containerd, which is implemented
for existing management and managed clusters, becomes mandatory in the scope
of Container Cloud 2.29.x. Otherwise, the management cluster update to 2.30.0
will be blocked.
The use of containerd allows for better Kubernetes performance and component
update without pod restart when applying fixes for CVEs. For the migration
procedure, refer to Migrate container runtime from Docker to containerd.
Important
Container runtime migration involves machine cordoning and
draining.
Implemented the technical preview support for OpenStack Caracal for
greenfield deployments.
To start experimenting with the new functionality, set openstack_version to
caracal in the OpenStackDeployment custom resource during the cloud
deployment.
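For example, a minimal OpenStackDeployment fragment that enables the preview
(all other mandatory fields of the custom resource are omitted):
spec:
  openstack_version: caracal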
Completely removed the OpenStackDeploymentSecret custom resource, which was
previously used to aggregate cloud confidential settings.
Sensitive information within the OpenStackDeployment object can be hidden
using the value_from directive. This enhancement was introduced in
MOSK 23.1 and allows for better management of confidential
data without the need for a separate custom resource.
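As an illustration, a sketch of hiding a certificate value through
value_from, assuming the secret_key_ref form described in the MOSK
configuration reference; the feature path, Secret name, and key below are
examples only:
spec:
  features:
    ssl:
      public_endpoints:
        api_cert:
          value_from:
            secret_key_ref:
              name: osh-dev-hidden   # example Secret in the OpenStack namespace
              key: api_cert          # example key inside the Secret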
Implemented the possibility to specify the raw option for the image storage
backend through the spec.features section of the OpenStackDeployment
custom resource.
Network port availability monitoring (Portprober)¶
TechPreview
Added support for the network port availability monitoring service
(Portprober). The service is implemented as an extension to the
OpenStack Neutron service and gets enabled automatically together with
the floating IP address availability monitoring service (Cloudprober).
Portprober is available for the MOSK clusters
running OpenStack Antelope or a newer version and using the Neutron
OVS backend for networking.
Implemented the technical preview support for Dynamic Resource Balancer
(DRB) service that enables cloud operators to continuously ensure optimal
placement of their workloads.
Introduced full support for Tungsten Fabric Operator API v2. All greenfield
deployments now use v2 by default. After the update, existing deployments
can convert their existing v1alpha1 TFOperator objects to v2.
The new API version aligns with the OpenStack Controller API and provides
a better interface for advanced configurations. The Tungsten Fabric
configuration documentation provides configuration examples for both API
v1alpha1 and API v2.
Implemented the capability to disable spoof checking on the SR-IOV enabled
ports of some virtual network functions. This enhancement streamlines
the control over SR-IOV spoof check within Tungsten Fabric, offering
cloud operators a more seamless experience.
Implemented the capability to specify the target nodes for hosting Tungsten
Fabric SNAT and load balancer namespaces. This feature helps preserve
resources on compute nodes running highly sensitive workloads.
Upgraded Ceph major version from Quincy 17.2.7 to Reef 18.2.3 with an automatic
upgrade of Ceph components during the Cluster version update.
Ceph Reef delivers a new version of RocksDB, which provides better IO
performance. This version also supports RGW multisite re-sharding and
contains overall security improvements.
Mirantis has tested MOSK against a very specific
configuration and can guarantee a predictable behavior of the product only
in the exact same environments. The table below includes the major
MOSK components with the exact versions against which
testing has been performed.
This section describes the MOSK known issues with available
workarounds. For the known issues in the related version of
Mirantis Container Cloud, refer to Mirantis Container Cloud: Release Notes.
OpenStack¶[31186,34132] Pods get stuck during MariaDB operations¶
During MariaDB operations on a management cluster, Pods may get stuck
in continuous restarts.
Workaround:
Create a backup of the /var/lib/mysql directory on the
mariadb-server Pod.
Verify that other replicas are up and ready.
Remove the galera.cache file for the affected mariadb-server Pod.
Remove the affected mariadb-server Pod or wait until it is automatically
restarted.
After Kubernetes restarts the Pod, the Pod clones the database in 1-2 minutes
and restores the quorum.
[42386] A load balancer service does not obtain the external IP address¶
Due to the MetalLB upstream issue,
a load balancer service may not obtain the external IP address.
The issue occurs when two services share the same external IP address and have
the same externalTrafficPolicy value. Initially, the services have the
external IP address assigned and are accessible. After modifying the
externalTrafficPolicy value for both services from Cluster to
Local, the first service that has been changed remains with no external IP
address assigned, whereas the second service, which was changed later, has the
external IP assigned as expected.
To work around the issue, make a dummy change to the service object where
external IP is <pending>:
Occasionally, after the credential rotation, OpenStack Controller Exporter
fails to scrub the metrics. To work around the issue, restart the pod
with openstack-controller-exporter.
[43058] [Antelope] Cronjob for MariaDB is not created¶
After upgrading to OpenStack Antelope, clusters with configured trunk ports
experience traffic flow disruptions that block the cluster updates.
To work around the issue, pin the MOSK Networking service
(OpenStack Neutron) container image by adding the following content
to the OpenStackDeployment custom resource:
After upgrade to OpenStack Antelope, the virtual machines experience
connectivity disruptions when sending data over the virtual networks.
Network packets with full MTU are dropped.
The issue affects the MOSK clusters with Open vSwitch as the networking
backend and with the following specific MTU settings:
The MTU configured on the tunnel interface of compute nodes is equal
to the value of the
spec:services:networking:neutron:values:conf:neutron:DEFAULT:global_physnet_mtu
parameter of the OpenStackDeployment custom resource (if not specified,
default is 1500 bytes).
If the MTU of the tunnel interface is higher by at least 4 bytes, the cluster
is not affected by the issue.
The cluster contains virtual machines where the MTU of the network
interfaces of the guest operating system is larger than the value of
the global_physnet_mtu parameter above minus 50 bytes.
To work around the issue, pin the MOSK Networking
service (OpenStack Neutron) container image by adding the respective image
override to the OpenStackDeployment custom resource.
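A hypothetical sketch of the general shape of such an override follows; the
image tag keys and the image reference are placeholders, and the actual
values for the affected release are provided by Mirantis:
spec:
  services:
    networking:
      neutron:
        values:
          images:
            tags:
              # Placeholder keys and image reference only
              neutron_server: <pinned-neutron-image>
              neutron_l3: <pinned-neutron-image>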
[13755] TF pods switch to CrashLoopBackOff after a simultaneous reboot¶
Rebooting all Cassandra cluster TFConfig or TFAnalytics nodes, maintenance, or
other circumstances that cause the Cassandra pods to start simultaneously may
cause a broken Cassandra TFConfig and/or TFAnalytics cluster.
In this case, Cassandra nodes do not join the ring and do not update the IPs of
the neighbor nodes. As a result, the TF services cannot operate Cassandra
cluster(s).
To verify that a Cassandra cluster is affected:
Run the nodetool status command specifying the config or
analytics cluster and the replica number:
During initial deployment of a Tungsten Fabric cluster, there is a known issue
where the Cassandra database may enter an infinite table creation or changing
state. This results in the Tungsten Fabric configuration pods failing to reach
the Ready state.
The root cause of this issue is a schema mismatch within Cassandra.
To verify whether the cluster is affected:
The symptoms of this issue can be observed by verifying the Tungsten Fabric
configuration pods:
The events above indicate that the configuration services remain in the
initializing state after deployment due to inability to connect to
the database. As a result, liveness and readiness probes fail, and
the pods continuously restart.
Additionally, each node of Cassandra configuration database logs similar
errors:
After the pod is restarted, monitor the status of other Tungsten Fabric pods.
If they become Ready within two minutes, the issue is resolved. Otherwise,
inspect the latest Cassandra logs in other pods and restart any other pods
exhibiting the same pattern of errors:
Repeat the process until all affected pods become Ready.
[42896] Cassandra cluster contains extra node
with outdated IP after replacement of TF control node¶
After replacing a failed Tungsten Fabric controller node as described in
Replace a failed TF controller node, the first restart of the Cassandra
pod on this node may cause an issue if the Cassandra node with the outdated
IP address has not been removed from the cluster. Subsequent Cassandra pod
restarts should not trigger this problem.
To verify if your Cassandra cluster is affected, run the
nodetool status command specifying the config or analytics cluster
and the replica number:
An extra node will appear in the cluster with an outdated IP address
(the IP of the terminated Cassandra pod) in the Down state.
To work around the issue, after replacing the Tungsten Fabric
controller node, delete the Cassandra pod on the replaced node and remove
the outdated node from the Cassandra cluster using nodetool:
This section lists the update known issues with workarounds for the
MOSK release 24.2.
[42449] Rolling reboot failure on a Tungsten Fabric cluster¶
During cluster update, the rolling reboot fails on the Tungsten Fabric cluster.
To work around the issue, restart the RabbitMQ pods in the Tungsten
Fabric cluster.
[46671] Cluster update fails with the tf-config pods crashed¶
When updating to the MOSK 24.3 series, tf-config pods from the Tungsten
Fabric namespace may enter the CrashLoopBackOff state. For example:
To troubleshoot the issue, check the logs inside the tf-config API
container and the tf-cassandra pods. The following example logs
indicate that Cassandra services failed to peer with each other and
are operating independently:
Logs from the tf-config API container:
NoHostAvailable: ('Unable to complete the operation against any hosts', {<Host: 192.168.200.23:9042 dc1>: Unavailable('Error from server: code=1000 [Unavailable exception] message="Cannot achieve consistency level QUORUM" info={\'required_replicas\': 2, \'alive_replicas\': 1, \'consistency\': \'QUORUM\'}',)})
Logs from the tf-cassandra pods:
INFO [OptionalTasks:1] 2024-09-09 08:59:36,231 CassandraRoleManager.java:419 - Setup task failed with error, rescheduling
WARN [OptionalTasks:1] 2024-09-09 08:59:46,231 CassandraRoleManager.java:379 - CassandraRoleManager skipped default role setup: some nodes were not ready
To work around the issue, restart the Cassandra services in the Tungsten
Fabric namespace by deleting the affected pods sequentially to establish
the connection between them:
This section lists the Container Cloud web UI known issues with workarounds
for the MOSK release 24.2.
[50181] Failure to deploy a compact cluster using the Container Cloud web UI¶
A compact MOSK cluster fails to be deployed through the Container Cloud web UI
due to the inability to add any label to the control plane machines and to
change dedicatedControlPlane:false using the web UI.
To work around the issue, manually add the required labels using the CLI. Once
done, the cluster deployment resumes.
[50168] Inability to use a new project through the Container Cloud web UI¶
A newly created project does not display all available tabs and shows
various access denied errors during the first five minutes after
creation.
To work around the issue, refresh the browser five minutes after the
project creation.
The following issues have been addressed in the MOSK
24.2 release:
[OpenStack][36524] Resolved the issue causing etcd to enter
a panic state after replacement of the controller node.
[OpenStack][39768] Resolved the issue that caused the OpenStack
controller exporter to fail to initialize within the default timeout on
large (500+ compute nodes) clusters.
[OpenStack][42390] Resolved the issue that caused the absence of
caching for PowerDNS.
[Ceph][42903] Resolved the issue that prevented ceph-controller
from correct handling of missing pools.
[Update][41810] Resolved the cluster update issue caused by the
OpenStack controller flooding.
This section describes the specific actions you as a Cloud Operator need to
complete to accurately plan and successfully perform your
Mirantis OpenStack for Kubernetes (MOSK) cluster update to the
version 24.2.
Consider this information as a supplement to the generic update procedure
published in Operations Guide: Update a MOSK cluster.
~1% of read operations on cloud API resources may fail
~8% of create and update operations on cloud API resources may fail
Open vSwitch networking - interruption of the North-South
connectivity, depending on the type of virtual routers used by
a workload:
Distributed (DVR) routers - no interruption
Non-distributed routers, High Availability (HA) mode - interruption up
to 1 minute, usually less than 5 seconds
Non-distributed routers, non-HA mode - interruption up to 10 minutes
Tungsten Fabric networking - no impact
Ceph
~1% of read operations on object storage API may fail
IO performance degradation for Ceph-backed virtual storage devices.
Pay special attention to the known issue
50566
that may affect the maintenance window.
Host OS components
No impact
Instance network connectivity interruption up to 5 minutes
Host OS kernel
No impact
Restart of instances due to the hypervisor reboot
Host operating system needs to be rebooted for the kernel update
to be applied. Configure live-migration of workloads to avoid the impact on the instances running
on a host.
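For example, a workload can be moved off a host before the reboot with a live migration; the command below is a generic OpenStack CLI sketch with a placeholder instance ID, not a MOSK-specific procedure:
openstack server migrate --live-migration <instance-ID>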
To properly plan the update maintenance window, use the following
documentation:
Before updating the cluster, be sure to review the potential issues that
may arise during the process and the recommended solutions to address
them, as outlined in Known issues.
Pre-update actions¶Verify that OpenStackDeploymentSecret is not in use¶
MOSK 24.2 includes the updated patch version of
the Cassandra database. With the cluster update, Cassandra is updated
from 3.11.10 to 3.11.17.
Additionally, the connectivity method between the Tungsten Fabric
services and Cassandra database clusters changes from Thrift to
Cassandra Query Language (CQL) protocol.
Therefore, Mirantis highly recommends backing up your Cassandra database
before updating a MOSK cluster with Tungsten Fabric
to 24.2. For the procedure, refer to Back up TF databases.
Post-update actions¶Convert v1alpha1 TFOperator custom resource to v2¶
In MOSK 24.2, the Tungsten Fabric API v2 becomes default
for new deployments and includes the ability to convert existing v1alpha1
TFOperator to v2.
Conversion of TFOperator causes recreation of the Tungsten Fabric service
pods. Therefore, Mirantis recommends performing the conversion within a
maintenance window during or after the update. The conversion is optional
in MOSK 24.2.
If your cluster runs Tungsten Fabric analytics services and you want to obtain
a more lightweight setup, you can disable these services through the custom
resource of the Tungsten Fabric Operator. For the details, refer to the
Tungsten Fabric analytics services deprecation notice.
With the MOSK 24.2 series, the OpenStack Yoga version
is being deprecated. Therefore, Mirantis encourages you to upgrade
to Antelope to start benefitting from the enhanced functionality and
new features of the newer OpenStack release.
MOSK allows for direct upgrade from Yoga to
Antelope, without the need to upgrade to the intermediate Zed release.
To upgrade the cloud, complete the Upgrade OpenStack procedure.
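At a high level, the upgrade comes down to switching the OpenStack version in the OpenStackDeployment custom resource, for example to antelope; the snippet below is only an illustrative sketch of that change, so follow the linked procedure for the authoritative steps and prerequisites:
spec:
  openstack_version: antelope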
Important
There are several known issues affecting MOSK clusters running
OpenStack Antelope that can disrupt the network connectivity of the cloud
workloads.
If your cluster is still running OpenStack Yoga, update to the MOSK 24.2.1
patch release first and only then upgrade to OpenStack Antelope. If you
have not been applying patch releases previously and would prefer to switch
back to major releases-only mode, you will be able to do this when MOSK 24.3
is released.
If you have updated your cluster to OpenStack Antelope, apply the
workarounds described in Release notes: OpenStack known issues for the following issues:
[45879] [Antelope] Incorrect packet handling between instance and
its gateway
[44813] Traffic disruption observed on trunk ports
In total, since the MOSK 24.1 major release, 771 Common Vulnerabilities
and Exposures (CVEs) have been fixed in 24.2:
39 of critical and 732 of high severity.
The table below includes the total number of addressed unique and common
CVEs by MOSK-specific component since
MOSK 24.1.5 patch. The common CVEs are issues addressed
across several images.
For the detailed list of fixed and present CVEs across the Mirantis
Container Cloud and MOSK products, refer to
Mirantis Security Portal.
Mirantis Container Cloud CVEs
For the number of fixed CVEs in the Mirantis Container Cloud-related
components including kaas core, bare metal, Ceph, and StackLight, refer to
Container Cloud 2.27.0: Security notes.
The patch release notes contain the description of product enhancements,
the list of updated artifacts and Common Vulnerabilities and Exposures
(CVE) fixes as well as description of the addressed product issues
for the MOSK 24.2.1 patch:
The table below contains the total number of addressed unique and common
CVEs by MOSK-specific component compared to the previous
patch release version. The common CVEs are issues addressed across several
images.
For the detailed list of fixed and present CVEs across the Mirantis
Container Cloud and MOSK products, refer to
Mirantis Security Portal.
Mirantis Container Cloud CVEs
For the number of fixed CVEs in the Mirantis Container Cloud-related
components including kaas core, bare metal, Ceph, and StackLight, refer to
Container Cloud 2.27.3: Security notes.
Create a backup of the /var/lib/mysql directory on the
mariadb-server Pod.
Verify that other replicas are up and ready.
Remove the galera.cache file for the affected mariadb-server Pod.
Remove the affected mariadb-server Pod or wait until it is automatically
restarted.
After Kubernetes restarts the Pod, the Pod clones the database in 1-2 minutes
and restores the quorum.
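A hedged sketch of these steps, assuming the openstack namespace, an affected mariadb-server-2 Pod, and the application=mariadb label (adjust names and labels to your deployment; kubectl cp also requires tar inside the container):
kubectl -n openstack cp mariadb-server-2:/var/lib/mysql ./mariadb-server-2-mysql-backup
kubectl -n openstack get pods -l application=mariadb
kubectl -n openstack exec mariadb-server-2 -- rm -f /var/lib/mysql/galera.cache
kubectl -n openstack delete pod mariadb-server-2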
[42386] A load balancer service does not obtain the external IP address¶
Due to the MetalLB upstream issue,
a load balancer service may not obtain the external IP address.
The issue occurs when two services share the same external IP address and have
the same externalTrafficPolicy value. Initially, the services have the
external IP address assigned and are accessible. After modifying the
externalTrafficPolicy value for both services from Cluster to
Local, the first service that has been changed remains with no external IP
address assigned. However, the second service, which was changed later, has the
external IP assigned as expected.
To work around the issue, make a dummy change to the service object where
external IP is <pending>:
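For example, a dummy change can be as simple as adding or refreshing a throwaway annotation on the affected Service; the namespace and service name below are placeholders:
kubectl -n openstack annotate service <service-name> dummy-update="$(date +%s)" --overwrite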
Sometimes, after changing the OpenStackDeployment custom resource,
it does not transition to the APPLYING state as expected.
To work around the issue, restart the rockoon pod in the osh-system
namespace.
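Assuming the pod name starts with rockoon (verify it with kubectl get pods), the restart is a plain pod deletion:
kubectl -n osh-system get pods | grep rockoon
kubectl -n osh-system delete pod <rockoon-pod-name>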
Tungsten Fabric¶[13755] TF pods switch to CrashLoopBackOff after a simultaneous reboot¶
Rebooting all Cassandra cluster TFConfig or TFAnalytics nodes, maintenance, or
other circumstances that cause the Cassandra pods to start simultaneously may
cause a broken Cassandra TFConfig and/or TFAnalytics cluster.
In this case, Cassandra nodes do not join the ring and do not update the IPs of
the neighbor nodes. As a result, the TF services cannot operate Cassandra
cluster(s).
To verify that a Cassandra cluster is affected:
Run the nodetool status command specifying the config or
analytics cluster and the replica number:
[40032] tf-rabbitmq fails to start after rolling reboot¶
Occasionally, RabbitMQ instances in tf-rabbitmq pods fail to enable
the tracking_records_in_ets during the initialization process.
To work around the problem, restart the affected pods manually.
[42896] Cassandra cluster contains extra node
with outdated IP after replacement of TF control node¶
After replacing a failed Tungsten Fabric controller node as described in
Replace a failed TF controller node, the first restart of the Cassandra
pod on this node may cause an issue if the Cassandra node with the outdated
IP address has not been removed from the cluster. Subsequent Cassandra pod
restarts should not trigger this problem.
To verify if your Cassandra cluster is affected, run the
nodetool status command specifying the config or analytics cluster
and the replica number:
An extra node will appear in the cluster with an outdated IP address
(the IP of the terminated Cassandra pod) in the Down state.
To work around the issue, after replacing the Tungsten Fabric
controller node, delete the Cassandra pod on the replaced node and remove
the outdated node from the Cassandra cluster using nodetool:
On clusters running Tungsten Fabric with API v2, after updating from MOSK 24.2
to 24.2.1, subsequent cluster maintenance requests may get stuck. The root cause
of the issue is a version mismatch within the internal structures of
the Tungsten Fabric Operator.
Output similar to the one below indicates that the Tungsten Fabric
ClusterWorkloadLock remains in the active state indefinitely preventing
further LCM operations with other components:
apiVersion: lcm.mirantis.com/v1alpha1
kind: ClusterWorkloadLock
metadata:
  creationTimestamp: "2024-08-30T13:50:33Z"
  generation: 1
  name: tf-openstack-tf
  resourceVersion: "4414649"
  uid: 582fc558-c343-4e96-a445-a2d1818dcdb2
spec:
  controllerName: tungstenfabric
status:
  errorMessage: cluster is not in ready state
  release: 17.2.4+24.2.2
  state: active
Additionally, the LCM controller logs may contain errors similar to:
{"level":"info","ts":"2024-09-02T16:22:16Z","logger":"entrypoint.lcmcluster-controller.req:5520","caller":"lcmcluster/maintenance.go:178","msg":"ClusterWorkloadLock is inactive cwl {{ClusterWorkloadLock lcm.mirantis.com/v1alpha1} {ceph-clusterworkloadlock a45eca91-cd7b-4d68-9a8e-4d656b4308af 3383288 1 2024-08-30 13:15:14 +0000 UTC <nil> <nil> map[] map[miraceph-ready:true] [{v1 Namespace ceph-lcm-mirantis 43853f67-9058-44ed-8287-f650dbeac5d7 <nil> <nil>}][] [{ceph-controller Update lcm.mirantis.com/v1alpha1 2024-08-30 13:25:53 +0000 UTC FieldsV1 {\"f:metadata\":{\"f:annotations\":{\".\":{},\"f:miraceph-ready\":{}},\"f:ownerReferences\":{\".\":{},\"k:{\\\"uid\\\":\\\"43853f67-9058-44ed-8287-f650dbeac5d7\\\"}\":{}}},\"f:spec\":{\".\":{},\"f:controllerName\":{}}} } {ceph-controller Update lcm.mirantis.com/v1alpha1 2024-09-02 10:48:27 +0000 UTC FieldsV1 {\"f:status\":{\".\":{},\"f:release\":{},\"f:state\":{}}} status}]} {ceph} {inactive 17.2.4+24.2.2}}","ns":"child-ns-tf","name":"child-cl"}{"level":"info","ts":"2024-09-02T16:22:16Z","logger":"entrypoint.lcmcluster-controller.req:5520","caller":"lcmcluster/maintenance.go:178","msg":"ClusterWorkloadLock is inactive cwl {{ClusterWorkloadLock lcm.mirantis.com/v1alpha1} {openstack-osh-dev 7de2b86f-d247-4cee-be8d-dcbcf5e1e11b 3382535 1 2024-08-30 13:50:54 +0000 UTC <nil> <nil> map[] map[] [] [] [{pykube-ng Update lcm.mirantis.com/v1alpha1 2024-08-30 13:50:54 +0000 UTC FieldsV1 {\"f:spec\":{\".\":{},\"f:controllerName\":{}}} } {pykube-ng Update lcm.mirantis.com/v1alpha1 2024-09-02 10:47:29 +0000 UTC FieldsV1 {\"f:status\":{\".\":{},\"f:release\":{},\"f:state\":{}}} status}]} {openstack} {inactive 17.2.4+24.2.2}}","ns":"child-ns-tf","name":"child-cl"}{"level":"info","ts":"2024-09-02T16:22:16Z","logger":"entrypoint.lcmcluster-controller.req:5520","caller":"lcmcluster/maintenance.go:173","msg":"ClusterWorkloadLock is still active cwl {{ClusterWorkloadLock lcm.mirantis.com/v1alpha1} {tf-openstack-tf 582fc558-c343-4e96-a445-a2d1818dcdb2 3382495 1 2024-08-30 13:50:33 +0000 UTC <nil> <nil> map[] map[] [] [] [{maintenance-ctl Update lcm.mirantis.com/v1alpha1 2024-08-30 13:50:33 +0000 UTC FieldsV1 {\"f:spec\":{\".\":{},\"f:controllerName\":{}}} } {maintenance-ctl Update lcm.mirantis.com/v1alpha1 2024-09-02 10:47:25 +0000 UTC FieldsV1 {\"f:status\":{\".\":{},\"f:errorMessage\":{},\"f:release\":{},\"f:state\":{}}} status}]} {tungstenfabric} {active cluster is not in ready state 17.2.4+24.2.2}}","ns":"child-ns-tf","name":"child-cl"}{"level":"error","ts":"2024-09-02T16:22:16Z","logger":"entrypoint.lcmcluster-controller.req:5520","caller":"lcmcluster/lcmcluster_controller.go:388","msg":"","ns":"child-ns-tf","name":"child-cl","error":"following ClusterWorkloadLocks in cluster child-ns-tf/child-cl are still active - tf-openstack-tf: InProgress not all ClusterWorkloadLocks are inactive 
yet","stacktrace":"sigs.k8s.io/cluster-api-provider-openstack/pkg/lcm/controller/lcmcluster.(*ReconcileLCMCluster).updateCluster\n\t/go/src/sigs.k8s.io/cluster-api-provider-openstack/pkg/lcm/controller/lcmcluster/lcmcluster_controller.go:388\nsigs.k8s.io/cluster-api-provider-openstack/pkg/lcm/controller/lcmcluster.(*ReconcileLCMCluster).Reconcile\n\t/go/src/sigs.k8s.io/cluster-api-provider-openstack/pkg/lcm/controller/lcmcluster/lcmcluster_controller.go:223\nsigs.k8s.io/cluster-api-provider-openstack/pkg/service.(*reconcilePanicCatcher).Reconcile\n\t/go/src/sigs.k8s.io/cluster-api-provider-openstack/pkg/service/reconcile.go:98\nsigs.k8s.io/cluster-api-provider-openstack/pkg/service.(*reconcileContextEnricher).Reconcile\n\t/go/src/sigs.k8s.io/cluster-api-provider-openstack/pkg/service/reconcile.go:78\nsigs.k8s.io/cluster-api-provider-openstack/pkg/service.(*reconcileMetrics).Reconcile\n\t/go/src/sigs.k8s.io/cluster-api-provider-openstack/pkg/service/reconcile.go:136\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.3/pkg/internal/controller/controller.go:118\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.3/pkg/internal/controller/controller.go:31:
To work around the issue, set the actual version of
Tungsten Fabric Operator in the TFOperator custom resource:
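A minimal sketch, assuming the TFOperator object resides in the tf namespace; the resource name and the exact field that stores the operator version depend on your TFOperator API version, so treat them as placeholders and align the value with the version of the running Tungsten Fabric Operator:
kubectl -n tf get tfoperators
kubectl -n tf edit tfoperator <name>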
Update known issues¶[42449] Rolling reboot failure on a Tungsten Fabric cluster¶
During cluster update, the rolling reboot fails on the Tungsten Fabric cluster.
To work around the issue, restart the RabbitMQ pods in the Tungsten
Fabric cluster.
Container Cloud web UI¶[50181] Failure to deploy a compact cluster using the Container Cloud web UI¶
A compact MOSK cluster fails to be deployed through the Container Cloud web UI
because the web UI does not allow adding labels to the control plane machines
or changing dedicatedControlPlane:false.
To work around the issue, manually add the required labels using the CLI. Once
done, the cluster deployment resumes.
[50168] Inability to use a new project through the Container Cloud web UI¶
A newly created project does not display all available tabs and shows
various access denied errors during the first five minutes after creation.
To work around the issue, refresh the browser five minutes after the
project creation.
To improve user update experience and make the update path more flexible,
MOSK is introducing a new scheme of updating between
cluster releases. For the details and possible update paths, refer to
24.1.5 update notes: Cluster update scheme.
You can update to 24.2.1 either from 24.1.7 (major update) or 24.2 (patch
update).
The update from the 24.1.7 version is a major update between the series.
Therefore, the expected impact and maintenance window for the major update
to the 24.2 series apply. For details, refer to the 24.2 update notes.
Expected impact when updating within the 24.2 series¶
The following table provides details on the impact of a MOSK
cluster update to a patch release within the 24.2 series.
~1% of read operations on cloud API resources may fail
~8% of create and update operations on cloud API resources may fail
Open vSwitch networking - interruption of North-South connectivity,
depending on the type of virtual routers used by a workload:
Distributed (DVR) routers - no interruption
Non-distributed routers, High Availability (HA) mode - interruption up
to 1 minute, usually less than 5 seconds
Non-distributed routers, non-HA mode - interruption up to 10 minutes
Tungsten Fabric networking - no impact
Ceph
~1% of read operations on object storage API may fail
IO performance degradation for Ceph-backed virtual storage devices.
Pay special attention to the known issue
50566
that may affect the maintenance window.
You can bypass updating components of the cloud data plane to avoid
the network downtime during Update to a patch version. By using
this technique, you accept the risk that some security fixes may
not be applied.
The patch release notes contain the description of product enhancements,
the list of updated artifacts and Common Vulnerabilities and Exposures
(CVE) fixes as well as description of the addressed product issues
for the MOSK 24.2.2 patch.
The table below contains the total number of addressed unique and common
CVEs by MOSK-specific component compared to the previous
patch release version. The common CVEs are issues addressed across several
images.
For the detailed list of fixed and present CVEs across the Mirantis
Container Cloud and MOSK products, refer to
Mirantis Security Portal.
Mirantis Container Cloud CVEs
For the number of fixed CVEs in the Mirantis Container Cloud-related
components including kaas core, bare metal, Ceph, and StackLight, refer to
Container Cloud 2.27.4: Security notes.
Create a backup of the /var/lib/mysql directory on the
mariadb-server Pod.
Verify that other replicas are up and ready.
Remove the galera.cache file for the affected mariadb-server Pod.
Remove the affected mariadb-server Pod or wait until it is automatically
restarted.
After Kubernetes restarts the Pod, the Pod clones the database in 1-2 minutes
and restores the quorum.
[42386] A load balancer service does not obtain the external IP address¶
Due to the MetalLB upstream issue,
a load balancer service may not obtain the external IP address.
The issue occurs when two services share the same external IP address and have
the same externalTrafficPolicy value. Initially, the services have the
external IP address assigned and are accessible. After modifying the
externalTrafficPolicy value for both services from Cluster to
Local, the first service that has been changed remains with no external IP
address assigned. However, the second service, which was changed later, has the
external IP assigned as expected.
To work around the issue, make a dummy change to the service object where
external IP is <pending>:
Sometimes, after changing the OpenStackDeployment custom resource,
it does not transition to the APPLYING state as expected.
To work around the issue, restart the rockoon pod in the osh-system
namespace.
Tungsten Fabric¶[13755] TF pods switch to CrashLoopBackOff after a simultaneous reboot¶
Rebooting all Cassandra cluster TFConfig or TFAnalytics nodes, maintenance, or
other circumstances that cause the Cassandra pods to start simultaneously may
cause a broken Cassandra TFConfig and/or TFAnalytics cluster.
In this case, Cassandra nodes do not join the ring and do not update the IPs of
the neighbor nodes. As a result, the TF services cannot operate Cassandra
cluster(s).
To verify that a Cassandra cluster is affected:
Run the nodetool status command specifying the config or
analytics cluster and the replica number:
[40032] tf-rabbitmq fails to start after rolling reboot¶
Occasionally, RabbitMQ instances in tf-rabbitmq pods fail to enable
the tracking_records_in_ets during the initialization process.
To work around the problem, restart the affected pods manually.
[42896] Cassandra cluster contains extra node
with outdated IP after replacement of TF control node¶
After replacing a failed Tungsten Fabric controller node as described in
Replace a failed TF controller node, the first restart of the Cassandra
pod on this node may cause an issue if the Cassandra node with the outdated
IP address has not been removed from the cluster. Subsequent Cassandra pod
restarts should not trigger this problem.
To verify if your Cassandra cluster is affected, run the
nodetool status command specifying the config or analytics cluster
and the replica number:
An extra node will appear in the cluster with an outdated IP address
(the IP of the terminated Cassandra pod) in the Down state.
To work around the issue, after replacing the Tungsten Fabric
controller node, delete the Cassandra pod on the replaced node and remove
the outdated node from the Cassandra cluster using nodetool:
On clusters running Tungsten Fabric with API v2, after updating from MOSK 24.2
to 24.2.1, subsequent cluster maintenance requests may get stuck. The root cause
of the issue is a version mismatch within the internal structures of
the Tungsten Fabric Operator.
Output similar to the one below indicates that the Tungsten Fabric
ClusterWorkloadLock remains in the active state indefinitely preventing
further LCM operations with other components:
apiVersion: lcm.mirantis.com/v1alpha1
kind: ClusterWorkloadLock
metadata:
  creationTimestamp: "2024-08-30T13:50:33Z"
  generation: 1
  name: tf-openstack-tf
  resourceVersion: "4414649"
  uid: 582fc558-c343-4e96-a445-a2d1818dcdb2
spec:
  controllerName: tungstenfabric
status:
  errorMessage: cluster is not in ready state
  release: 17.2.4+24.2.2
  state: active
Additionally, the LCM controller logs may contain errors similar to:
{"level":"info","ts":"2024-09-02T16:22:16Z","logger":"entrypoint.lcmcluster-controller.req:5520","caller":"lcmcluster/maintenance.go:178","msg":"ClusterWorkloadLock is inactive cwl {{ClusterWorkloadLock lcm.mirantis.com/v1alpha1} {ceph-clusterworkloadlock a45eca91-cd7b-4d68-9a8e-4d656b4308af 3383288 1 2024-08-30 13:15:14 +0000 UTC <nil> <nil> map[] map[miraceph-ready:true] [{v1 Namespace ceph-lcm-mirantis 43853f67-9058-44ed-8287-f650dbeac5d7 <nil> <nil>}][] [{ceph-controller Update lcm.mirantis.com/v1alpha1 2024-08-30 13:25:53 +0000 UTC FieldsV1 {\"f:metadata\":{\"f:annotations\":{\".\":{},\"f:miraceph-ready\":{}},\"f:ownerReferences\":{\".\":{},\"k:{\\\"uid\\\":\\\"43853f67-9058-44ed-8287-f650dbeac5d7\\\"}\":{}}},\"f:spec\":{\".\":{},\"f:controllerName\":{}}} } {ceph-controller Update lcm.mirantis.com/v1alpha1 2024-09-02 10:48:27 +0000 UTC FieldsV1 {\"f:status\":{\".\":{},\"f:release\":{},\"f:state\":{}}} status}]} {ceph} {inactive 17.2.4+24.2.2}}","ns":"child-ns-tf","name":"child-cl"}{"level":"info","ts":"2024-09-02T16:22:16Z","logger":"entrypoint.lcmcluster-controller.req:5520","caller":"lcmcluster/maintenance.go:178","msg":"ClusterWorkloadLock is inactive cwl {{ClusterWorkloadLock lcm.mirantis.com/v1alpha1} {openstack-osh-dev 7de2b86f-d247-4cee-be8d-dcbcf5e1e11b 3382535 1 2024-08-30 13:50:54 +0000 UTC <nil> <nil> map[] map[] [] [] [{pykube-ng Update lcm.mirantis.com/v1alpha1 2024-08-30 13:50:54 +0000 UTC FieldsV1 {\"f:spec\":{\".\":{},\"f:controllerName\":{}}} } {pykube-ng Update lcm.mirantis.com/v1alpha1 2024-09-02 10:47:29 +0000 UTC FieldsV1 {\"f:status\":{\".\":{},\"f:release\":{},\"f:state\":{}}} status}]} {openstack} {inactive 17.2.4+24.2.2}}","ns":"child-ns-tf","name":"child-cl"}{"level":"info","ts":"2024-09-02T16:22:16Z","logger":"entrypoint.lcmcluster-controller.req:5520","caller":"lcmcluster/maintenance.go:173","msg":"ClusterWorkloadLock is still active cwl {{ClusterWorkloadLock lcm.mirantis.com/v1alpha1} {tf-openstack-tf 582fc558-c343-4e96-a445-a2d1818dcdb2 3382495 1 2024-08-30 13:50:33 +0000 UTC <nil> <nil> map[] map[] [] [] [{maintenance-ctl Update lcm.mirantis.com/v1alpha1 2024-08-30 13:50:33 +0000 UTC FieldsV1 {\"f:spec\":{\".\":{},\"f:controllerName\":{}}} } {maintenance-ctl Update lcm.mirantis.com/v1alpha1 2024-09-02 10:47:25 +0000 UTC FieldsV1 {\"f:status\":{\".\":{},\"f:errorMessage\":{},\"f:release\":{},\"f:state\":{}}} status}]} {tungstenfabric} {active cluster is not in ready state 17.2.4+24.2.2}}","ns":"child-ns-tf","name":"child-cl"}{"level":"error","ts":"2024-09-02T16:22:16Z","logger":"entrypoint.lcmcluster-controller.req:5520","caller":"lcmcluster/lcmcluster_controller.go:388","msg":"","ns":"child-ns-tf","name":"child-cl","error":"following ClusterWorkloadLocks in cluster child-ns-tf/child-cl are still active - tf-openstack-tf: InProgress not all ClusterWorkloadLocks are inactive 
yet","stacktrace":"sigs.k8s.io/cluster-api-provider-openstack/pkg/lcm/controller/lcmcluster.(*ReconcileLCMCluster).updateCluster\n\t/go/src/sigs.k8s.io/cluster-api-provider-openstack/pkg/lcm/controller/lcmcluster/lcmcluster_controller.go:388\nsigs.k8s.io/cluster-api-provider-openstack/pkg/lcm/controller/lcmcluster.(*ReconcileLCMCluster).Reconcile\n\t/go/src/sigs.k8s.io/cluster-api-provider-openstack/pkg/lcm/controller/lcmcluster/lcmcluster_controller.go:223\nsigs.k8s.io/cluster-api-provider-openstack/pkg/service.(*reconcilePanicCatcher).Reconcile\n\t/go/src/sigs.k8s.io/cluster-api-provider-openstack/pkg/service/reconcile.go:98\nsigs.k8s.io/cluster-api-provider-openstack/pkg/service.(*reconcileContextEnricher).Reconcile\n\t/go/src/sigs.k8s.io/cluster-api-provider-openstack/pkg/service/reconcile.go:78\nsigs.k8s.io/cluster-api-provider-openstack/pkg/service.(*reconcileMetrics).Reconcile\n\t/go/src/sigs.k8s.io/cluster-api-provider-openstack/pkg/service/reconcile.go:136\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.3/pkg/internal/controller/controller.go:118\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.3/pkg/internal/controller/controller.go:31:
To work around the issue, set the actual version of
Tungsten Fabric Operator in the TFOperator custom resource:
Update known issues¶[42449] Rolling reboot failure on a Tungsten Fabric cluster¶
During cluster update, the rolling reboot fails on the Tungsten Fabric cluster.
To work around the issue, restart the RabbitMQ pods in the Tungsten
Fabric cluster.
[46671] Cluster update fails with the tf-config pods crashed¶
When updating to the MOSK 24.3 series, tf-config pods from the Tungsten
Fabric namespace may enter the CrashLoopBackOff state. For example:
To troubleshoot the issue, check the logs inside the tf-config API
container and the tf-cassandra pods. The following example logs
indicate that Cassandra services failed to peer with each other and
are operating independently:
Logs from the tf-config API container:
NoHostAvailable: ('Unable to complete the operation against any hosts', {<Host: 192.168.200.23:9042 dc1>: Unavailable('Error from server: code=1000 [Unavailable exception] message="Cannot achieve consistency level QUORUM" info={\'required_replicas\': 2, \'alive_replicas\': 1, \'consistency\': \'QUORUM\'}',)})
Logs from the tf-cassandra pods:
INFO [OptionalTasks:1] 2024-09-09 08:59:36,231 CassandraRoleManager.java:419 - Setup task failed with error, rescheduling
WARN [OptionalTasks:1] 2024-09-09 08:59:46,231 CassandraRoleManager.java:379 - CassandraRoleManager skipped default role setup: some nodes were not ready
To work around the issue, restart the Cassandra services in the Tungsten
Fabric namespace by deleting the affected pods sequentially to establish
the connection between them:
Now, all other services in the Tungsten Fabric namespace should be in
the Active state.
Container Cloud web UI¶[50181] Failure to deploy a compact cluster using the Container Cloud web UI¶
A compact MOSK cluster fails to be deployed through the Container Cloud web UI
because the web UI does not allow adding labels to the control plane machines
or changing dedicatedControlPlane:false.
To work around the issue, manually add the required labels using the CLI. Once
done, the cluster deployment resumes.
[50168] Inability to use a new project through the Container Cloud web UI¶
A newly created project does not display all available tabs and shows
various access denied errors during the first five minutes after creation.
To work around the issue, refresh the browser five minutes after the
project creation.
To improve user update experience and make the update path more flexible,
MOSK is introducing a new scheme of updating between
cluster releases. For the details and possible update paths, refer to
24.1.5 update notes: Cluster update scheme.
Expected impact when updating within the 24.2 series¶
The following table provides details on the impact of a MOSK
cluster update to a patch release within the 24.2 series.
~1% of read operations on cloud API resources may fail
~8% of create and update operations on cloud API resources may fail
Open vSwitch networking - interruption of North-South connectivity,
depending on the type of virtual routers used by a workload:
Distributed (DVR) routers - no interruption
Non-distributed routers, High Availability (HA) mode - interruption up
to 1 minute, usually less than 5 seconds
Non-distributed routers, non-HA mode - interruption up to 10 minutes
Tungsten Fabric networking - no impact
Ceph
~1% of read operations on object storage API may fail
IO performance degradation for Ceph-backed virtual storage devices.
Pay special attention to the known issue
50566
that may affect the maintenance window.
You can bypass updating components of the cloud data plane to avoid
the network downtime during Update to a patch version. By using
this technique, you accept the risk that some security fixes may
not be applied.
The patch release notes contain the description of product enhancements,
the list of updated artifacts and Common Vulnerabilities and Exposures
(CVE) fixes as well as description of the addressed product issues
for the MOSK 24.2.3 patch, if any.
The table below contains the total number of addressed unique and common
CVEs by MOSK-specific component compared to the previous
patch release version. The common CVEs are issues addressed across several
images.
For the detailed list of fixed and present CVEs across the Mirantis
Container Cloud and MOSK products, refer to
Mirantis Security Portal.
Mirantis Container Cloud CVEs
For the number of fixed CVEs in the Mirantis Container Cloud-related
components including kaas core, bare metal, Ceph, and StackLight, refer to
Container Cloud 2.28.1: Security notes.
The following issues have been addressed in the MOSK
24.2.3 release:
[46220][Tungsten Fabric] Resolved the issue that caused
subsequent cluster maintenance requests to get stuck on clusters running
Tungsten Fabric with API v2, after updating from MOSK 24.2 to 24.2.1.
Create a backup of the /var/lib/mysql directory on the
mariadb-server Pod.
Verify that other replicas are up and ready.
Remove the galera.cache file for the affected mariadb-server Pod.
Remove the affected mariadb-server Pod or wait until it is automatically
restarted.
After Kubernetes restarts the Pod, the Pod clones the database in 1-2 minutes
and restores the quorum.
[42386] A load balancer service does not obtain the external IP address¶
Due to the MetalLB upstream issue,
a load balancer service may not obtain the external IP address.
The issue occurs when two services share the same external IP address and have
the same externalTrafficPolicy value. Initially, the services have the
external IP address assigned and are accessible. After modifying the
externalTrafficPolicy value for both services from Cluster to
Local, the first service that has been changed remains with no external IP
address assigned. However, the second service, which was changed later, has the
external IP assigned as expected.
To work around the issue, make a dummy change to the service object where
external IP is <pending>:
Sometimes, after changing the OpenStackDeployment custom resource,
it does not transition to the APPLYING state as expected.
To work around the issue, restart the rockoon pod in the osh-system
namespace.
Tungsten Fabric¶[13755] TF pods switch to CrashLoopBackOff after a simultaneous reboot¶
Rebooting all Cassandra cluster TFConfig or TFAnalytics nodes, maintenance, or
other circumstances that cause the Cassandra pods to start simultaneously may
cause a broken Cassandra TFConfig and/or TFAnalytics cluster.
In this case, Cassandra nodes do not join the ring and do not update the IPs of
the neighbor nodes. As a result, the TF services cannot operate Cassandra
cluster(s).
To verify that a Cassandra cluster is affected:
Run the nodetool status command specifying the config or
analytics cluster and the replica number:
[40032] tf-rabbitmq fails to start after rolling reboot¶
Occasionally, RabbitMQ instances in tf-rabbitmq pods fail to enable
the tracking_records_in_ets during the initialization process.
To work around the problem, restart the affected pods manually.
[42896] Cassandra cluster contains extra node
with outdated IP after replacement of TF control node¶
After replacing a failed Tungsten Fabric controller node as described in
Replace a failed TF controller node, the first restart of the Cassandra
pod on this node may cause an issue if the Cassandra node with the outdated
IP address has not been removed from the cluster. Subsequent Cassandra pod
restarts should not trigger this problem.
To verify if your Cassandra cluster is affected, run the
nodetool status command specifying the config or analytics cluster
and the replica number:
An extra node will appear in the cluster with an outdated IP address
(the IP of the terminated Cassandra pod) in the Down state.
To work around the issue, after replacing the Tungsten Fabric
controller node, delete the Cassandra pod on the replaced node and remove
the outdated node from the Cassandra cluster using nodetool:
Update known issues¶[42449] Rolling reboot failure on a Tungsten Fabric cluster¶
During cluster update, the rolling reboot fails on the Tungsten Fabric cluster.
To work around the issue, restart the RabbitMQ pods in the Tungsten
Fabric cluster.
[46671] Cluster update fails with the tf-config pods crashed¶
When updating to the MOSK 24.3 series, tf-config pods from the Tungsten
Fabric namespace may enter the CrashLoopBackOff state. For example:
To troubleshoot the issue, check the logs inside the tf-config API
container and the tf-cassandra pods. The following example logs
indicate that Cassandra services failed to peer with each other and
are operating independently:
Logs from the tf-config API container:
NoHostAvailable: ('Unable to complete the operation against any hosts', {<Host: 192.168.200.23:9042 dc1>: Unavailable('Error from server: code=1000 [Unavailable exception] message="Cannot achieve consistency level QUORUM" info={\'required_replicas\': 2, \'alive_replicas\': 1, \'consistency\': \'QUORUM\'}',)})
Logs from the tf-cassandra pods:
INFO [OptionalTasks:1] 2024-09-09 08:59:36,231 CassandraRoleManager.java:419 - Setup task failed with error, rescheduling
WARN [OptionalTasks:1] 2024-09-09 08:59:46,231 CassandraRoleManager.java:379 - CassandraRoleManager skipped default role setup: some nodes were not ready
To work around the issue, restart the Cassandra services in the Tungsten
Fabric namespace by deleting the affected pods sequentially to establish
the connection between them:
The designate-zone-setup Kubernetes job in the openstack namespace
fails during update to MOSK 24.3 with the following error present in the
logs of the job pod:
The issue occurs when the DNS service (OpenStack Designate) has any TLDs
created, but test is not among them. Since DNS service monitoring
was added to MOSK 24.3, it attempts to create a test zone test-zone.test
in the Designate service, which fails if the test TLD is missing.
To work around the issue, verify that there are created TLDs present
in the DNS service:
openstack tld list -f value -c name
If there are TLDs present and test is not one of them, create it:
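For example, with admin credentials sourced, the missing TLD can be created as follows:
openstack tld create --name test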
Warning
Do not create the test TLD if no TLDs were present
in the DNS service initially. In this case, the issue is caused by
a different factor, and creating the test TLD when none existed
before may disrupt users of both the DNS and Networking services.
StackLight¶[51524] sf-notifier creates a large number of relogins to Salesforce¶
The incompatibility between the newly implemented session refresh in the
upstream simple-salesforce library and the MOSK implementation of session
refresh in sf-notifier results in uncontrolled growth of new logins and lack
of session reuse. The issue applies to both MOSK and management clusters.
Workaround:
The workaround is to change the sf-notifier image tag directly in the
Deployment object. This change is not persistent because it will be
reverted or overridden by:
Container Cloud version update (for management clusters)
Cluster release version update (for MOSK cluster)
Any sf-notifier-related operation (for all clusters):
Disable and enable
Credentials change
IDs change
Any configuration change for resources, node selector, tolerations, and
log level
Once applied, this workaround must be re-applied whenever one of the
above operations is performed in the cluster.
Compare the sf-notifier image tag with the list of affected tags.
If the image is affected, it has to be replaced. Otherwise, your cluster
is not affected.
In the resulting string, replace only the tag of the affected image with
the desired v0.4-20240828023015 tag. Keep the registry the same as
in the original Deployment object.
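One way to apply the change, assuming the Deployment and its container are both named sf-notifier in the stacklight namespace and <registry> is the registry already used by the Deployment:
kubectl -n stacklight set image deployment/sf-notifier sf-notifier=<registry>/sf-notifier:v0.4-20240828023015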
Wait until the pod with the updated image is created, and check the logs.
Verify that there are no errors in the logs:
kubectl logs pod/<sf-notifier pod> -n stacklight
As this change is not persistent and can be reverted by the cluster update
operation or any operation related to sf-notifier, periodically check all
clusters and if the change has been reverted, re-apply the workaround.
Optionally, you can add a custom alert that will monitor the current tag of
the sf-notifier image and will fire the alert if the tag is present in
the list of affected tags. For the custom alert configuration details,
refer to alert-configuration.
Example of a custom alert to monitor the current tag of the sf-notifier
image:
Container Cloud web UI¶[50181] Failure to deploy a compact cluster using the Container Cloud web UI¶
A compact MOSK cluster fails to be deployed through the Container Cloud web UI
because the web UI does not allow adding labels to the control plane machines
or changing dedicatedControlPlane:false.
To work around the issue, manually add the required labels using the CLI. Once
done, the cluster deployment resumes.
[50168] Inability to use a new project through the Container Cloud web UI¶
A newly created project does not display all available tabs and shows
various access denied errors during the first five minutes after creation.
To work around the issue, refresh the browser five minutes after the
project creation.
~1% of read operations on cloud API resources may fail
~8% of create and update operations on cloud API resources may fail
Open vSwitch networking - interruption of North-South connectivity,
depending on the type of virtual routers used by a workload:
Distributed (DVR) routers - no interruption
Non-distributed routers, High Availability (HA) mode - interruption up
to 1 minute, usually less than 5 seconds
Non-distributed routers, non-HA mode - interruption up to 10 minutes
Tungsten Fabric networking - no impact
Ceph
~1% of read operations on object storage API may fail
IO performance degradation for Ceph-backed virtual storage devices.
Pay special attention to the known issue
50566
that may affect the maintenance window.
You can bypass updating components of the cloud data plane to avoid
the network downtime during Update to a patch version. By using
this technique, you accept the risk that some security fixes may
not be applied.
The patch release notes contain the description of product enhancements,
the list of updated artifacts and Common Vulnerabilities and Exposures
(CVE) fixes as well as description of the addressed product issues
for the MOSK 24.2.4 patch, if any.
The table below contains the total number of addressed unique and common
CVEs by MOSK-specific component compared to the previous
patch release version. The common CVEs are issues addressed across several
images.
For the detailed list of fixed and present CVEs across the Mirantis
Container Cloud and MOSK products, refer to
Mirantis Security Portal.
Mirantis Container Cloud CVEs
For the number of fixed CVEs in the Mirantis Container Cloud-related
components including kaas core, bare metal, Ceph, and StackLight, refer to
Container Cloud 2.28.2: Security notes.
Create a backup of the /var/lib/mysql directory on the
mariadb-server Pod.
Verify that other replicas are up and ready.
Remove the galera.cache file for the affected mariadb-server Pod.
Remove the affected mariadb-server Pod or wait until it is automatically
restarted.
After Kubernetes restarts the Pod, the Pod clones the database in 1-2 minutes
and restores the quorum.
[42386] A load balancer service does not obtain the external IP address¶
Due to the MetalLB upstream issue,
a load balancer service may not obtain the external IP address.
The issue occurs when two services share the same external IP address and have
the same externalTrafficPolicy value. Initially, the services have the
external IP address assigned and are accessible. After modifying the
externalTrafficPolicy value for both services from Cluster to
Local, the first service that has been changed remains with no external IP
address assigned. However, the second service, which was changed later, has the
external IP assigned as expected.
To work around the issue, make a dummy change to the service object where
external IP is <pending>:
Sometimes, after changing the OpenStackDeployment custom resource,
it does not transition to the APPLYING state as expected.
To work around the issue, restart the rockoon pod in the osh-system
namespace.
Tungsten Fabric¶[13755] TF pods switch to CrashLoopBackOff after a simultaneous reboot¶
Rebooting all Cassandra cluster TFConfig or TFAnalytics nodes, maintenance, or
other circumstances that cause the Cassandra pods to start simultaneously may
cause a broken Cassandra TFConfig and/or TFAnalytics cluster.
In this case, Cassandra nodes do not join the ring and do not update the IPs of
the neighbor nodes. As a result, the TF services cannot operate Cassandra
cluster(s).
To verify that a Cassandra cluster is affected:
Run the nodetool status command specifying the config or
analytics cluster and the replica number:
[40032] tf-rabbitmq fails to start after rolling reboot¶
Occasionally, RabbitMQ instances in tf-rabbitmq pods fail to enable
the tracking_records_in_ets during the initialization process.
To work around the problem, restart the affected pods manually.
[42896] Cassandra cluster contains extra node
with outdated IP after replacement of TF control node¶
After replacing a failed Tungsten Fabric controller node as described in
Replace a failed TF controller node, the first restart of the Cassandra
pod on this node may cause an issue if the Cassandra node with the outdated
IP address has not been removed from the cluster. Subsequent Cassandra pod
restarts should not trigger this problem.
To verify if your Cassandra cluster is affected, run the
nodetool status command specifying the config or analytics cluster
and the replica number:
An extra node will appear in the cluster with an outdated IP address
(the IP of the terminated Cassandra pod) in the Down state.
To work around the issue, after replacing the Tungsten Fabric
controller node, delete the Cassandra pod on the replaced node and remove
the outdated node from the Cassandra cluster using nodetool:
Update known issues¶[42449] Rolling reboot failure on a Tungsten Fabric cluster¶
During cluster update, the rolling reboot fails on the Tungsten Fabric cluster.
To work around the issue, restart the RabbitMQ pods in the Tungsten
Fabric cluster.
[46671] Cluster update fails with the tf-config pods crashed¶
When updating to the MOSK 24.3 series, tf-config pods from the Tungsten
Fabric namespace may enter the CrashLoopBackOff state. For example:
To troubleshoot the issue, check the logs inside the tf-config API
container and the tf-cassandra pods. The following example logs
indicate that Cassandra services failed to peer with each other and
are operating independently:
Logs from the tf-config API container:
NoHostAvailable: ('Unable to complete the operation against any hosts', {<Host: 192.168.200.23:9042 dc1>: Unavailable('Error from server: code=1000 [Unavailable exception] message="Cannot achieve consistency level QUORUM" info={\'required_replicas\': 2, \'alive_replicas\': 1, \'consistency\': \'QUORUM\'}',)})
Logs from the tf-cassandra pods:
INFO [OptionalTasks:1] 2024-09-09 08:59:36,231 CassandraRoleManager.java:419 - Setup task failed with error, rescheduling
WARN [OptionalTasks:1] 2024-09-09 08:59:46,231 CassandraRoleManager.java:379 - CassandraRoleManager skipped default role setup: some nodes were not ready
To work around the issue, restart the Cassandra services in the Tungsten
Fabric namespace by deleting the affected pods sequentially to establish
the connection between them:
The designate-zone-setup Kubernetes job in the openstack namespace
fails during update to MOSK 24.3 with the following error present in the
logs of the job pod:
The issue occurs when the DNS service (OpenStack Designate) has any TLDs
created, but test is not among them. Since DNS service monitoring
was added to MOSK 24.3, it attempts to create a test zone test-zone.test
in the Designate service, which fails if the test TLD is missing.
To work around the issue, verify that there are created TLDs present
in the DNS service:
openstack tld list -f value -c name
If there are TLDs present and test is not one of them, create it:
Warning
Do not create the test TLD if no TLDs were present
in the DNS service initially. In this case, the issue is caused by
a different factor, and creating the test TLD when none existed
before may disrupt users of both the DNS and Networking services.
StackLight¶[51524] sf-notifier creates a large number of relogins to Salesforce¶
The incompatibility between the newly implemented session refresh in the
upstream simple-salesforce library and the MOSK implementation of session
refresh in sf-notifier results in uncontrolled growth of new logins and lack
of session reuse. The issue applies to both MOSK and management clusters.
Workaround:
The workaround is to change the sf-notifier image tag directly in the
Deployment object. This change is not persistent because it will be
reverted or overridden by:
Container Cloud version update (for management clusters)
Cluster release version update (for MOSK cluster)
Any sf-notifier-related operation (for all clusters):
Disable and enable
Credentials change
IDs change
Any configuration change for resources, node selector, tolerations, and
log level
Once applied, this workaround must be re-applied whenever one of the
above operations is performed in the cluster.
Compare the sf-notifier image tag with the list of affected tags.
If the image is affected, it has to be replaced. Otherwise, your cluster
is not affected.
In the resulting string, replace only the tag of the affected image with
the desired v0.4-20240828023015 tag. Keep the registry the same as
in the original Deployment object.
Wait until the pod with the updated image is created, and check the logs.
Verify that there are no errors in the logs:
kubectl logs pod/<sf-notifier pod> -n stacklight
As this change is not persistent and can be reverted by the cluster update
operation or any operation related to sf-notifier, periodically check all
clusters and if the change has been reverted, re-apply the workaround.
Optionally, you can add a custom alert that will monitor the current tag of
the sf-notifier image and will fire the alert if the tag is present in
the list of affected tags. For the custom alert configuration details,
refer to alert-configuration.
Example of a custom alert to monitor the current tag of the sf-notifier
image:
Container Cloud web UI¶[50181] Failure to deploy a compact cluster using the Container Cloud web UI¶
A compact MOSK cluster fails to be deployed through the Container Cloud web UI
because the web UI does not allow adding labels to the control plane machines
or changing dedicatedControlPlane:false.
To work around the issue, manually add the required labels using the CLI. Once
done, the cluster deployment resumes.
[50168] Inability to use a new project through the Container Cloud web UI¶
A newly created project does not display all available tabs and shows
various access denied errors during the first five minutes after creation.
To work around the issue, refresh the browser five minutes after the
project creation.
~1% of read operations on cloud API resources may fail
~8% of create and update operations on cloud API resources may fail
Open vSwitch networking - interruption of North-South connectivity,
depending on the type of virtual routers used by a workload:
Distributed (DVR) routers - no interruption
Non-distributed routers, High Availability (HA) mode - interruption up
to 1 minute, usually less than 5 seconds
Non-distributed routers, non-HA mode - interruption up to 10 minutes
Tungsten Fabric networking - no impact
Ceph
~1% of read operations on object storage API may fail
IO performance degradation for Ceph-backed virtual storage devices.
Pay special attention to the known issue
50566
that may affect the maintenance window.
You can bypass updating components of the cloud data plane to avoid
the network downtime during Update to a patch version. By using
this technique, you accept the risk that some security fixes may
not be applied.
The patch release notes contain the description of product enhancements,
the list of updated artifacts and Common Vulnerabilities and Exposures
(CVE) fixes as well as description of the addressed product issues
for the MOSK 24.2.5 patch, if any.
The table below contains the total number of addressed unique and common
CVEs by MOSK-specific component compared to the previous
patch release version. The common CVEs are issues addressed across several
images.
For the detailed list of fixed and present CVEs across the Mirantis
Container Cloud and MOSK products, refer to
Mirantis Security Portal.
Mirantis Container Cloud CVEs
For the number of fixed CVEs in the Mirantis Container Cloud-related
components including kaas core, bare metal, Ceph, and StackLight, refer to
Container Cloud 2.28.3: Security notes.
Create a backup of the /var/lib/mysql directory on the
mariadb-server Pod.
Verify that other replicas are up and ready.
Remove the galera.cache file for the affected mariadb-server Pod.
Remove the affected mariadb-server Pod or wait until it is automatically
restarted.
After Kubernetes restarts the Pod, the Pod clones the database in 1-2 minutes
and restores the quorum.
[42386] A load balancer service does not obtain the external IP address¶
Due to the MetalLB upstream issue,
a load balancer service may not obtain the external IP address.
The issue occurs when two services share the same external IP address and have
the same externalTrafficPolicy value. Initially, the services have the
external IP address assigned and are accessible. After modifying the
externalTrafficPolicy value for both services from Cluster to
Local, the first service that has been changed remains with no external IP
address assigned. However, the second service, which was changed later, has the
external IP assigned as expected.
To work around the issue, make a dummy change to the service object where
external IP is <pending>:
Sometimes, after changing the OpenStackDeployment custom resource,
it does not transition to the APPLYING state as expected.
To work around the issue, restart the rockoon pod in the osh-system
namespace.
Tungsten Fabric¶[13755] TF pods switch to CrashLoopBackOff after a simultaneous reboot¶
Rebooting all Cassandra cluster TFConfig or TFAnalytics nodes, maintenance, or
other circumstances that cause the Cassandra pods to start simultaneously may
cause a broken Cassandra TFConfig and/or TFAnalytics cluster.
In this case, Cassandra nodes do not join the ring and do not update the IPs of
the neighbor nodes. As a result, the TF services cannot operate Cassandra
cluster(s).
To verify that a Cassandra cluster is affected:
Run the nodetool status command specifying the config or
analytics cluster and the replica number:
[40032] tf-rabbitmq fails to start after rolling reboot¶
Occasionally, RabbitMQ instances in tf-rabbitmq pods fail to enable
the tracking_records_in_ets during the initialization process.
To work around the problem, restart the affected pods manually.
[42896] Cassandra cluster contains extra node
with outdated IP after replacement of TF control node¶
After replacing a failed Tungsten Fabric controller node as described in
Replace a failed TF controller node, the first restart of the Cassandra
pod on this node may cause an issue if the Cassandra node with the outdated
IP address has not been removed from the cluster. Subsequent Cassandra pod
restarts should not trigger this problem.
To verify if your Cassandra cluster is affected, run the
nodetool status command specifying the config or analytics cluster
and the replica number:
An extra node will appear in the cluster with an outdated IP address
(the IP of the terminated Cassandra pod) in the Down state.
To work around the issue, after replacing the Tungsten Fabric
controller node, delete the Cassandra pod on the replaced node and remove
the outdated node from the Cassandra cluster using nodetool:
Update known issues¶[42449] Rolling reboot failure on a Tungsten Fabric cluster¶
During cluster update, the rolling reboot fails on the Tungsten Fabric cluster.
To work around the issue, restart the RabbitMQ pods in the Tungsten
Fabric cluster.
[46671] Cluster update fails with the tf-config pods crashed¶
When updating to the MOSK 24.3 series, tf-config pods from the Tungsten
Fabric namespace may enter the CrashLoopBackOff state. For example:
To troubleshoot the issue, check the logs inside the tf-config API
container and the tf-cassandra pods. The following example logs
indicate that Cassandra services failed to peer with each other and
are operating independently:
Logs from the tf-config API container:
NoHostAvailable: ('Unable to complete the operation against any hosts', {<Host: 192.168.200.23:9042 dc1>: Unavailable('Error from server: code=1000 [Unavailable exception] message="Cannot achieve consistency level QUORUM" info={\'required_replicas\': 2, \'alive_replicas\': 1, \'consistency\': \'QUORUM\'}',)})
Logs from the tf-cassandra pods:
INFO [OptionalTasks:1] 2024-09-09 08:59:36,231 CassandraRoleManager.java:419 - Setup task failed with error, rescheduling
WARN [OptionalTasks:1] 2024-09-09 08:59:46,231 CassandraRoleManager.java:379 - CassandraRoleManager skipped default role setup: some nodes were not ready
To work around the issue, restart the Cassandra services in the Tungsten
Fabric namespace by deleting the affected pods sequentially to establish
the connection between them:
The designate-zone-setup Kubernetes job in the openstack namespace
fails during update to MOSK 24.3 with the following error present in the
logs of the job pod:
The issue occurs when the DNS service (OpenStack Designate) has any TLDs
created, but test is not among them. Since DNS service monitoring
was added to MOSK 24.3, it attempts to create a test zone test-zone.test
in the Designate service, which fails if the test TLD is missing.
To work around the issue, verify that there are created TLDs present
in the DNS service:
openstack tld list -f value -c name
If there are TLDs present and test is not one of them, create it:
Warning
Do not create the test TLD if no TLDs were present
in the DNS service initially. In this case, the issue is caused by
a different factor, and creating the test TLD when none existed
before may disrupt users of both the DNS and Networking services.
StackLight¶[51524] sf-notifier creates a large number of relogins to Salesforce¶
The incompatibility between the newly implemented session refresh in the
upstream simple-salesforce library and the MOSK implementation of session
refresh in sf-notifier results in uncontrolled growth of new logins and lack
of session reuse. The issue applies to both MOSK and management clusters.
Workaround:
The workaround is to change the sf-notifier image tag directly in the
Deployment object. This change is not persistent because it will be
reverted or overridden by:
Container Cloud version update (for management clusters)
Cluster release version update (for MOSK cluster)
Any sf-notifier-related operation (for all clusters):
Disable and enable
Credentials change
IDs change
Any configuration change for resources, node selector, tolerations, and
log level
Once applied, this workaround must be re-applied whenever one of the
above operations is performed in the cluster.
Compare the sf-notifier image tag with the list of affected tags.
If the image tag is in the list, replace it. Otherwise, your cluster
is not affected.
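For example, assuming the sf-notifier Deployment runs in the stacklight
namespace and sf-notifier is its only container, you can print the currently
used image as follows:
kubectl -n stacklight get deployment sf-notifier -o jsonpath='{.spec.template.spec.containers[0].image}'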
In the resulting string, replace only the tag of the affected image with
the desired v0.4-20240828023015 tag. Keep the registry the same as
in the original Deployment object.
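A sketch of the replacement, assuming the container inside the Deployment is
also named sf-notifier; keep the original registry and repository path and
change only the tag:
kubectl -n stacklight set image deployment/sf-notifier sf-notifier=<original-image-repository>:v0.4-20240828023015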
Wait until the pod with the updated image is created, and check the logs.
Verify that there are no errors in the logs:
kubectl logs pod/<sf-notifier pod> -n stacklight
As this change is not persistent and can be reverted by the cluster update
operation or any operation related to sf-notifier, periodically check all
clusters and if the change has been reverted, re-apply the workaround.
Optionally, you can add a custom alert that will monitor the current tag of
the sf-notifier image and will fire the alert if the tag is present in
the list of affected tags. For the custom alert configuration details,
refer to alert-configuration.
Example of a custom alert to monitor the current tag of the sf-notifier
image:
Container Cloud web UI¶[50181] Failure to deploy a compact cluster using the Container Cloud web UI¶
A compact MOSK cluster fails to be deployed through the Container Cloud web UI
due to the inability to add any label to the control plane machines along with
the inability to change dedicatedControlPlane:false using the web UI.
To work around the issue, manually add the required labels using CLI. Once
done, the cluster deployment resumes.
[50168] Inability to use a new project through the Container Cloud web UI¶
A newly created project does not display all available tabs and shows
various access denied errors during the first five minutes after
creation.
To work around the issue, refresh the browser five minutes after the
project creation.
~1% of read operations on cloud API resources may fail
~8% of create and update operations on cloud API resources may fail
Open vSwitch networking - interruption of North-South connectivity,
depending on the type of virtual routers used by a workload:
Distributed (DVR) routers - no interruption
Non-distributed routers, High Availability (HA) mode - interruption up
to 1 minute, usually less than 5 seconds
Non-distributed routers, non-HA mode - interruption up to 10 minutes
Tungsten Fabric networking - no impact
Ceph
~1% of read operations on object storage API may fail
IO performance degradation for Ceph-backed virtual storage devices.
Pay special attention to the known issue 50566 that may affect the
maintenance window.
You can bypass updating components of the cloud data plane to avoid
the network downtime during Update to a patch version. By using
this technique, you accept the risk that some security fixes may
not be applied.
The primary distinction between major and patch product versions lies in
the fact that major release versions introduce new functionalities,
whereas patch release versions predominantly offer minor product
enhancements, mostly CVE resolutions for your clusters.
Depending on your deployment needs, you can either update only between
major releases or apply patch updates between major releases. Choosing
the latter option, which includes patch updates, ensures you receive
security fixes as soon as they become available. However, be prepared
to update your cluster frequently, approximately once every three weeks.
Refer to the following documentation for the details about the release
content, update schemes, and so on:
Pre-update inspection of pinned product artifacts in a Cluster object¶
To ensure that Container Cloud clusters remain consistently updated with the
latest security fixes and product improvements, the Admission Controller
has been enhanced. Now, it actively prevents the utilization of pinned
custom artifacts for Container Cloud components. Specifically, it blocks
a management or managed cluster release update, or any cluster configuration
update, for example, adding public keys or proxy, if a Cluster object
contains any custom Container Cloud artifacts with global or image-related
values overwritten in the helm-releases section, until these values are
removed.
Normally, the Container Cloud clusters do not contain pinned artifacts,
which eliminates the need for any pre-update actions in most deployments.
However, if the update of your cluster is blocked with the
invalid HelmReleases configuration error, refer to
Update notes: Pre-update actions for details.
Note
In rare cases, if the image-related or global values should be
changed, you can use the ClusterRelease or KaaSRelease objects
instead. But make sure to update these values manually after every major
and patch update.
Note
The pre-update inspection applies only to images delivered by
Container Cloud that are overwritten. Any custom images unrelated to the
product components are not verified and do not block cluster update.
Added full support for OpenStack Antelope with Open vSwitch and Tungsten Fabric
21.4 networking backends.
Starting from 24.1, MOSK deploys all new clouds with
OpenStack Antelope by default. To upgrade an existing cloud from OpenStack Yoga
to Antelope, follow the Upgrade OpenStack procedure.
Highlights from upstream OpenStack supported by
MOSK deployed on Antelope
Designate:
Ability to share Designate zones across multiple projects. This not only
allows two or more projects to manage recordsets in the zone but also
enables “Classless IN-ADDR.ARPA delegation” (RFC 2317) in Designate.
“Classless IN-ADDR.ARPA delegation” permits IP address DNS PTR record
assignment in smaller blocks without creating a DNS zone for each address.
Manila:
Feature parity between the native client and OSC.
Capability for users to specify metadata when creating their share
snapshots. The behavior should be similar to Manila shares, allowing
users to query snapshots filtering them by metadata, and update or delete
the metadata of the given resources.
Neutron:
Capability for managing network traffic based on packet rate by
implementing the QoS (Quality of Service) rule type “packet per second”
(pps).
Nova:
Improved behavior for Windows guests by adding new Hyper-V enlightenments on
all libvirt guests by default.
Ability to unshelve an instance to a specific host (admin only).
With microversion 2.92, the capability to only import a public key and not
generate a keypair. Also, the capability to use an extended name pattern.
Octavia:
Support for notifications about major events of the life cycle of a load
balancer. Only loadbalancer.[create|update|delete].end events are
emitted.
Implemented the capability to enable SPICE remote console through the
OpenStackDeployment custom resource as a method to interact with
OpenStack virtual machines through the CLI and desktop client as well
as MOSK Dashboard (OpenStack Horizon).
The usage of the SPICE remote console is an alternative to using the
noVNC-based VNC remote console.
Implemented the capability to configure and run Windows guests on OpenStack,
which allows for optimization of cloud infrastructure for diverse workloads.
Introduced support for the Virtual Graphics Processing Unit (vGPU) feature
that allows for leveraging the power of virtualized GPU resources to enhance
performance and scalability of cloud deployments.
Introduced the technical preview support for the API v2 for the Tungsten
Fabric Operator. This API version aligns with the OpenStack Controller
API and provides a better interface for advanced configurations.
In MOSK 24.1, the API v2 is available only for the
greenfield product deployments with Tungsten Fabric. The Tungsten Fabric
configuration documentation provides configuration examples for both
API v1alpha1 and API v2.
Removed from support Tungsten Fabric analytics services, which were primarily
designed for collecting various metrics from the Tungsten Fabric services.
Despite the initial implementation, user demand for this feature has been
minimal. As a result, Tungsten Fabric analytics services are no longer supported
in the product.
All greenfield deployments starting from MOSK 24.1
do not include Tungsten Fabric analytics services and use StackLight capabilities
instead by default. The existing deployments updated to 24.1 and newer versions
still include Tungsten Fabric analytics services as well as the ability to
disable them.
Removal of the StackLight telegraf-openstack plugin¶
Removed StackLight telegraf-openstack plugin and replaced it with
osdpl-exporter.
All valuable Telegraf metrics used by StackLight components have been
reimplemented in osdpl-exporter, and all dependent StackLight alerts
and dashboards now use the new metrics.
Implemented more restrictive network policies for Kubernetes pods running
OpenStack services.
As part of the enhancement, added NetworkPolicy objects for all types of
Ceph daemons. These policies allow only specified ports to be used by the
corresponding Ceph daemon pods.
Mirantis has tested MOSK against a very specific
configuration and can guarantee a predictable behavior of the product only
in the exact same environments. The table below includes the major
MOSK components with the exact versions against which
testing has been performed.
This section describes the MOSK known issues with available
workarounds. For the known issues in the related version of
Mirantis Container Cloud, refer to Mirantis Container Cloud: Release Notes.
After provisioning the controller node, the etcd pod initiates before the
Kubernetes networking is fully operational. As a result, the pod encounters
difficulties resolving DNS and establishing connections with other members,
ultimately leading to a panic state for the etcd service.
Workaround:
Delete the PVC related to the replaced controller node:
kubectl -n openstack delete pvc <PVC-NAME>
Delete pods related to the crashing etcd service on the replaced controller
node:
kubectl -n openstack delete pods <ETCD-POD-NAME>
[39768] OpenStack Controller exporter fails to start¶
[42386] A load balancer service does not obtain the external IP address¶
Due to the MetalLB upstream issue,
a load balancer service may not obtain the external IP address.
The issue occurs when two services share the same external IP address and have
the same externalTrafficPolicy value. Initially, the services have the
external IP address assigned and are accessible. After modifying the
externalTrafficPolicy value for both services from Cluster to
Local, the first service that has been changed remains with no external IP
address assigned. However, the second service, which was changed later, has the
external IP assigned as expected.
To work around the issue, make a dummy change to the service object where
external IP is <pending>:
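A minimal sketch of such a dummy change, applied as a label update that makes
MetalLB re-evaluate the Service object (the namespace, service name, and label
are placeholders):
kubectl -n <namespace> label service <SERVICE-NAME> metallb-workaround=reapplied --overwrite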
After upgrading to OpenStack Antelope, clusters with configured trunk ports
experience traffic flow disruptions that block the cluster updates.
To work around the issue, pin the MOSK Networking service
(OpenStack Neutron) container image by adding the following content
to the OpenStackDeployment custom resource:
After upgrade to OpenStack Antelope, the virtual machines experience
connectivity disruptions when sending data over the virtual networks.
Network packets with full MTU are dropped.
The issue affects the MOSK clusters with Open vSwitch as the networking
backend and with the following specific MTU settings:
The MTU configured on the tunnel interface of compute nodes is equal
to the value of the
spec:services:networking:neutron:values:conf:neutron:DEFAULT:global_physnet_mtu
parameter of the OpenStackDeployment custom resource (if not specified,
default is 1500 bytes).
If the MTU of the tunnel interface is higher by at least 4 bytes, the cluster
is not affected by the issue.
The cluster contains virtual machines in which the MTU of the guest operating
system network interfaces is larger than the value of
the global_physnet_mtu parameter above minus 50 bytes.
To work around the issue, pin the MOSK Networking
service (OpenStack Neutron) container image by adding the following content
to the OpenStackDeployment custom resource:
This section lists the Tungsten Fabric (TF) known issues with
workarounds for the Mirantis OpenStack for Kubernetes release
24.1. For TF limitations, see Tungsten Fabric known limitations.
[40032] tf-rabbitmq fails to start after rolling reboot¶
Occasionally, RabbitMQ instances in tf-rabbitmq pods fail to enable
the tracking_records_in_ets during the initialization process.
To work around the problem, restart the affected pods manually.
[13755] TF pods switch to CrashLoopBackOff after a simultaneous reboot¶
Rebooting all Cassandra cluster TFConfig or TFAnalytics nodes, maintenance, or
other circumstances that cause the Cassandra pods to start simultaneously may
cause a broken Cassandra TFConfig and/or TFAnalytics cluster.
In this case, Cassandra nodes do not join the ring and do not update the IPs of
the neighbor nodes. As a result, the TF services cannot operate Cassandra
cluster(s).
To verify that a Cassandra cluster is affected:
Run the nodetool status command specifying the config or
analytics cluster and the replica number:
The following issues have been addressed in the MOSK
24.1 release:
[OpenStack][Antelope][37678] Resolved the issue that
prevented instance live-migration due to CPU incompatibility.
[OpenStack][38629] Optimized resource allocation to enable
designate-api to scale up its operation.
[OpenStack][38792] Resolved the issue that prevented
MOSK from creating instances and volumes from
images stored in Pure Storage.
[OpenStack][39069] Resolved the issue that caused the
logging of false alerts about 401 responses from OpenStack endpoints.
[StackLight][36211] Resolved the issue that caused the
deprecated dashboards NGINX Ingress controller and
Ceph Nodes to be displayed in Grafana. These dashboards are
now removed. Therefore, Mirantis recommends switching to the following
dashboards:
OpenStack Ingress controller instead of
NGINX Ingress controller
For Ceph:
Ceph Cluster dashboard for Ceph stats
System dashboard for resource utilization, which includes
filtering by Ceph node labels, such as ceph_role_osd,
ceph_role_mon, and ceph_role_mgr
This section describes the specific actions you as a Cloud Operator need to
complete to accurately plan and successfully perform your
Mirantis OpenStack for Kubernetes (MOSK) cluster update to the
version 24.1.
Consider this information as a supplement to the generic update procedure
published in Operations Guide: Update a MOSK cluster.
The host operating system needs to be rebooted for the kernel update
to be applied. Configure live migration of workloads to avoid the impact on the instances running
on a host.
To properly plan the update maintenance window, use the following
documentation:
Before updating the cluster, be sure to review the potential issues that
may arise during the process and the recommended solutions to address
them, as outlined in Update known issues.
Pre-update actions¶Unblock cluster update by removing any pinned product artifacts¶
If any pinned product artifacts are present in the Cluster object of a
management or managed cluster, the update will be blocked by the Admission
Controller with the invalid HelmReleases configuration error until such
artifacts are removed. The update process does not start and any changes in
the Cluster object are blocked by the Admission Controller except the
removal of fields with pinned product artifacts.
Therefore, verify that the following sections of the Cluster objects
do not contain any image-related (tag, name, pullPolicy,
repository) and global values inside Helm releases:
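A sketch of the inspection, assuming the Helm releases of the Cluster object
are defined under spec:providerSpec:value:helmReleases (the exact path may
differ depending on your provider and release):
kubectl -n <project-namespace> get cluster <cluster-name> -o yaml | grep -A 20 helmReleases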
The custom pinned product artifacts are inspected and blocked by the
Admission Controller to ensure that Container Cloud clusters remain
consistently updated with the latest security fixes and product improvements.
Note
The pre-update inspection applies only to images delivered by
Container Cloud that are overwritten. Any custom images unrelated to the
product components are not verified and do not block cluster update.
Post-update actions¶Upgrade OpenStack to Antelope¶
With 24.1, MOSK is rolling out OpenStack Antelope support
for both Open vSwitch and Tungsten Fabric-based deployments.
Mirantis encourages you to upgrade to Antelope to start benefitting from the
enhanced functionality and new features of this OpenStack release.
MOSK allows for direct upgrade from Yoga to
Antelope, without the need to upgrade to the intermediate Zed release.
To upgrade the cloud, complete the Upgrade OpenStack procedure.
Important
There are several known issues affecting MOSK clusters running
OpenStack Antelope that can disrupt the network connectivity of the cloud
workloads.
If your cluster is still running OpenStack Yoga, update to the MOSK 24.2.1
patch release first and only then upgrade to OpenStack Antelope. If you
have not been applying patch releases previously and would prefer to switch
back to major releases-only mode, you will be able to do this when MOSK 24.3
is released.
If you have updated your cluster to OpenStack Antelope, apply the
workarounds described in Release notes: OpenStack known issues for the following issues:
[45879] [Antelope] Incorrect packet handling between instance and
its gateway
[44813] Traffic disruption observed on trunk ports
If your cluster runs Tungsten Fabric analytics services and you want to obtain
a more lightweight setup, you can disable these services through the custom
resource of the Tungsten Fabric Operator. For the details, refer to the
Tungsten Fabric analytics services deprecation notice.
In total, since the MOSK 23.3 major release, 327 Common Vulnerabilities and
Exposures (CVE) have been fixed in 24.1:
15 of critical and 312 of high severity.
The table below includes the total number of addressed unique and common
CVEs by MOSK-specific component since
MOSK 23.3.4. The common CVEs are issues addressed
across several images.
For the detailed list of fixed and present CVEs across the Mirantis
Container Cloud and MOSK products, refer to
Mirantis Security Portal.
Mirantis Container Cloud CVEs
For the number of fixed CVEs in the Mirantis Container Cloud-related
components including kaas core, bare metal, Ceph, and StackLight, refer to
Container Cloud 2.26.0: Security notes.
The patch release notes contain the description of product enhancements,
the list of updated artifacts and Common Vulnerabilities and Exposures
(CVE) fixes as well as description of the addressed product issues
for the MOSK 24.1.1 patch.
For the list of enhancements and bug fixes that relate to Mirantis Container
Cloud, refer to the Mirantis Container Cloud Release notes.
Introduced the ability to update Ubuntu packages including kernel minor
version update, when available in a product release, to address CVE issues
on a host operating system.
On management clusters, the update of Ubuntu mirror along with the update
of minor kernel version occurs automatically with cordon-drain and reboot
of machines.
On MOSK clusters, the update of Ubuntu mirror along with the update of
minor kernel version applies during a manual cluster update without automatic
cordon-drain and reboot of machines. After a managed cluster update, all
cluster machines have the reboot is required notification.
The kernel update is not obligatory on MOSK clusters.
Though, if you prefer obtaining the latest CVE fixes for Ubuntu, update the
kernel by manually rebooting machines during a convenient maintenance window
using GracefulRebootRequest.
In MOSK 24.1.1, the kernel version has been updated to
5.15.0-97-generic.
For the detailed list of fixed and present CVEs across the Mirantis
Container Cloud and MOSK products, refer to
Mirantis Security Portal.
Mirantis Container Cloud CVEs
For the number of fixed CVEs in the Mirantis Container Cloud-related
components including kaas core, bare metal, Ceph, and StackLight, refer to
Container Cloud 2.26.1: Security notes.
The following issues have been addressed in the MOSK
24.1.1 release:
[40036] Resolved the issue causing nodes to remain in the Kubernetes
cluster when the corresponding machine is marked as disabled during cluster
update.
After provisioning the controller node, the etcd pod initiates before the
Kubernetes networking is fully operational. As a result, the pod encounters
difficulties resolving DNS and establishing connections with other members,
ultimately leading to a panic state for the etcd service.
Workaround:
Delete the PVC related to the replaced controller node:
kubectl -n openstack delete pvc <PVC-NAME>
Delete pods related to the crashing etcd service on the replaced controller
node:
kubectl -n openstack delete pods <ETCD-POD-NAME>
[39768] OpenStack Controller exporter fails to start¶
[42386] A load balancer service does not obtain the external IP address¶
Due to the MetalLB upstream issue,
a load balancer service may not obtain the external IP address.
The issue occurs when two services share the same external IP address and have
the same externalTrafficPolicy value. Initially, the services have the
external IP address assigned and are accessible. After modifying the
externalTrafficPolicy value for both services from Cluster to
Local, the first service that has been changed remains with no external IP
address assigned. However, the second service, which was changed later, has the
external IP assigned as expected.
To work around the issue, make a dummy change to the service object where
external IP is <pending>:
After upgrading to OpenStack Antelope, clusters with configured trunk ports
experience traffic flow disruptions that block the cluster updates.
To work around the issue, pin the MOSK Networking service
(OpenStack Neutron) container image by adding the following content
to the OpenStackDeployment custom resource:
After upgrade to OpenStack Antelope, the virtual machines experience
connectivity disruptions when sending data over the virtual networks.
Network packets with full MTU are dropped.
The issue affects the MOSK clusters with Open vSwitch as the networking
backend and with the following specific MTU settings:
The MTU configured on the tunnel interface of compute nodes is equal
to the value of the
spec:services:networking:neutron:values:conf:neutron:DEFAULT:global_physnet_mtu
parameter of the OpenStackDeployment custom resource (if not specified,
default is 1500 bytes).
If the MTU of the tunnel interface is higher by at least 4 bytes, the cluster
is not affected by the issue.
The cluster contains virtual machines in which the MTU of the guest operating
system network interfaces is larger than the value of
the global_physnet_mtu parameter above minus 50 bytes.
To work around the issue, pin the MOSK Networking
service (OpenStack Neutron) container image by adding the following content
to the OpenStackDeployment custom resource:
Remove the pinning after updating to MOSK 24.2.1 or
later patch or major release.
Tungsten Fabric¶[40032] tf-rabbitmq fails to start after rolling reboot¶
Occasionally, RabbitMQ instances in tf-rabbitmq pods fail to enable
the tracking_records_in_ets during the initialization process.
To work around the problem, restart the affected pods manually.
[13755] TF pods switch to CrashLoopBackOff after a simultaneous reboot¶
Rebooting all Cassandra cluster TFConfig or TFAnalytics nodes, maintenance, or
other circumstances that cause the Cassandra pods to start simultaneously may
cause a broken Cassandra TFConfig and/or TFAnalytics cluster.
In this case, Cassandra nodes do not join the ring and do not update the IPs of
the neighbor nodes. As a result, the TF services cannot operate Cassandra
cluster(s).
To verify that a Cassandra cluster is affected:
Run the nodetool status command specifying the config or
analytics cluster and the replica number:
You can bypass updating components of the cloud data plane to avoid
the network downtime during Update to a patch version. By using
this technique, you accept the risk that some security fixes may
not be applied.
The patch release notes contain the description of product enhancements,
the list of updated artifacts and Common Vulnerabilities and Exposures
(CVE) fixes as well as description of the addressed product issues
for the MOSK 24.1.2 patch.
The table below includes the total number of addressed unique and common
CVEs by MOSK-specific component since
MOSK 24.1.1. The common CVEs are issues addressed
across several images.
For the detailed list of fixed and present CVEs across the Mirantis
Container Cloud and MOSK products, refer to
Mirantis Security Portal.
Mirantis Container Cloud CVEs
For the number of fixed CVEs in the Mirantis Container Cloud-related
components including kaas core, bare metal, Ceph, and StackLight, refer to
Container Cloud 2.26.2: Security notes.
The following issues have been addressed in the MOSK
24.1.2 release:
[39663] Resolved the issue in OpenStack notifications for the Keystone
component where the "event_type":"identity.user.deleted" field lacked
the ID of the request initiator (req_initiator) in the CADF payload.
[39768] Resolved the issue that caused the OpenStack controller exporter
to fail to initialize within the default timeout on large (500+ compute
nodes) clusters.
[40712] Resolved the issue that caused nova-manage image_property
to show traces.
[40740] Resolved the issue that caused the OpenStack sos report logs
collection failure.
After provisioning the controller node, the etcd pod initiates before the
Kubernetes networking is fully operational. As a result, the pod encounters
difficulties resolving DNS and establishing connections with other members,
ultimately leading to a panic state for the etcd service.
Workaround:
Delete the PVC related to the replaced controller node:
kubectl -n openstack delete pvc <PVC-NAME>
Delete pods related to the crashing etcd service on the replaced controller
node:
kubectl -n openstack delete pods <ETCD-POD-NAME>
[41810] Cluster update is stuck due to the OpenStack Controller flooding¶
[42386] A load balancer service does not obtain the external IP address¶
Due to the MetalLB upstream issue,
a load balancer service may not obtain the external IP address.
The issue occurs when two services share the same external IP address and have
the same externalTrafficPolicy value. Initially, the services have the
external IP address assigned and are accessible. After modifying the
externalTrafficPolicy value for both services from Cluster to
Local, the first service that has been changed remains with no external IP
address assigned. However, the second service, which was changed later, has the
external IP assigned as expected.
To work around the issue, make a dummy change to the service object where
external IP is <pending>:
After upgrading to OpenStack Antelope, clusters with configured trunk ports
experience traffic flow disruptions that block the cluster updates.
To work around the issue, pin the MOSK Networking service
(OpenStack Neutron) container image by adding the following content
to the OpenStackDeployment custom resource:
After upgrade to OpenStack Antelope, the virtual machines experience
connectivity disruptions when sending data over the virtual networks.
Network packets with full MTU are dropped.
The issue affects the MOSK clusters with Open vSwitch as the networking
backend and with the following specific MTU settings:
The MTU configured on the tunnel interface of compute nodes is equal
to the value of the
spec:services:networking:neutron:values:conf:neutron:DEFAULT:global_physnet_mtu
parameter of the OpenStackDeployment custom resource (if not specified,
default is 1500 bytes).
If the MTU of the tunnel interface is higher by at least 4 bytes, the cluster
is not affected by the issue.
The cluster contains virtual machines in which the MTU of the guest operating
system network interfaces is larger than the value of
the global_physnet_mtu parameter above minus 50 bytes.
To work around the issue, pin the MOSK Networking
service (OpenStack Neutron) container image by adding the following content
to the OpenStackDeployment custom resource:
Remove the pinning after updating to MOSK 24.2.1 or
later patch or major release.
Tungsten Fabric¶[40032] tf-rabbitmq fails to start after rolling reboot¶
Occasionally, RabbitMQ instances in tf-rabbitmq pods fail to enable
the tracking_records_in_ets during the initialization process.
To work around the problem, restart the affected pods manually.
[13755] TF pods switch to CrashLoopBackOff after a simultaneous reboot¶
Rebooting all Cassandra cluster TFConfig or TFAnalytics nodes, maintenance, or
other circumstances that cause the Cassandra pods to start simultaneously may
cause a broken Cassandra TFConfig and/or TFAnalytics cluster.
In this case, Cassandra nodes do not join the ring and do not update the IPs of
the neighbor nodes. As a result, the TF services cannot operate Cassandra
cluster(s).
To verify that a Cassandra cluster is affected:
Run the nodetool status command specifying the config or
analytics cluster and the replica number:
You can bypass updating components of the cloud data plane to avoid
the network downtime during Update to a patch version. By using
this technique, you accept the risk that some security fixes may
not be applied.
For the list of enhancements and bug fixes that relate to Mirantis Container
Cloud, refer to the Mirantis Container Cloud Release notes.
The patch release notes contain the description of product enhancements,
the list of updated artifacts and Common Vulnerabilities and Exposures
(CVE) fixes as well as description of the addressed product issues
for the MOSK 24.1.3 patch.
The table below includes the total number of addressed unique and common
CVEs by MOSK-specific component since
MOSK 24.1.2. The common CVEs are issues addressed
across several images.
For the detailed list of fixed and present CVEs across the Mirantis
Container Cloud and MOSK products, refer to
Mirantis Security Portal.
Mirantis Container Cloud CVEs
For the number of fixed CVEs in the Mirantis Container Cloud-related
components including kaas core, bare metal, Ceph, and StackLight, refer to
Container Cloud 2.26.3: Security notes.
After provisioning the controller node, the etcd pod initiates before the
Kubernetes networking is fully operational. As a result, the pod encounters
difficulties resolving DNS and establishing connections with other members,
ultimately leading to a panic state for the etcd service.
Workaround:
Delete the PVC related to the replaced controller node:
kubectl -n openstack delete pvc <PVC-NAME>
Delete pods related to the crashing etcd service on the replaced controller
node:
kubectl -n openstack delete pods <ETCD-POD-NAME>
[42386] A load balancer service does not obtain the external IP address¶
Due to the MetalLB upstream issue,
a load balancer service may not obtain the external IP address.
The issue occurs when two services share the same external IP address and have
the same externalTrafficPolicy value. Initially, the services have the
external IP address assigned and are accessible. After modifying the
externalTrafficPolicy value for both services from Cluster to
Local, the first service that has been changed remains with no external IP
address assigned. However, the second service, which was changed later, has the
external IP assigned as expected.
To work around the issue, make a dummy change to the service object where
external IP is <pending>:
After upgrading to OpenStack Antelope, clusters with configured trunk ports
experience traffic flow disruptions that block the cluster updates.
To work around the issue, pin the MOSK Networking service
(OpenStack Neutron) container image by adding the following content
to the OpenStackDeployment custom resource:
After upgrade to OpenStack Antelope, the virtual machines experience
connectivity disruptions when sending data over the virtual networks.
Network packets with full MTU are dropped.
The issue affects the MOSK clusters with Open vSwitch as the networking
backend and with the following specific MTU settings:
The MTU configured on the tunnel interface of compute nodes is equal
to the value of the
spec:services:networking:neutron:values:conf:neutron:DEFAULT:global_physnet_mtu
parameter of the OpenStackDeployment custom resource (if not specified,
default is 1500 bytes).
If the MTU of the tunnel interface is higher by at least 4 bytes, the cluster
is not affected by the issue.
The cluster contains virtual machines in which the MTU of the guest operating
system network interfaces is larger than the value of
the global_physnet_mtu parameter above minus 50 bytes.
To work around the issue, pin the MOSK Networking
service (OpenStack Neutron) container image by adding the following content
to the OpenStackDeployment custom resource:
Remove the pinning after updating to MOSK 24.2.1 or
later patch or major release.
Tungsten Fabric¶[40032] tf-rabbitmq fails to start after rolling reboot¶
Occasionally, RabbitMQ instances in tf-rabbitmq pods fail to enable
the tracking_records_in_ets during the initialization process.
To work around the problem, restart the affected pods manually.
[13755] TF pods switch to CrashLoopBackOff after a simultaneous reboot¶
Rebooting all Cassandra cluster TFConfig or TFAnalytics nodes, maintenance, or
other circumstances that cause the Cassandra pods to start simultaneously may
cause a broken Cassandra TFConfig and/or TFAnalytics cluster.
In this case, Cassandra nodes do not join the ring and do not update the IPs of
the neighbor nodes. As a result, the TF services cannot operate Cassandra
cluster(s).
To verify that a Cassandra cluster is affected:
Run the nodetool status command specifying the config or
analytics cluster and the replica number:
You can bypass updating components of the cloud data plane to avoid
the network downtime during Update to a patch version. By using
this technique, you accept the risk that some security fixes may
not be applied.
For the list of enhancements and bug fixes that relate to Mirantis Container
Cloud, refer to the Mirantis Container Cloud Release notes.
The patch release notes contain the description of product enhancements,
the list of updated artifacts and Common Vulnerabilities and Exposures
(CVE) fixes as well as description of the addressed product issues
for the MOSK 24.1.4 patch.
The table below includes the total number of addressed unique and common
CVEs by MOSK-specific component compared to the previous
patch release version. The common CVEs are issues addressed across several
images.
For the detailed list of fixed and present CVEs across the Mirantis
Container Cloud and MOSK products, refer to
Mirantis Security Portal.
Mirantis Container Cloud CVEs
For the number of fixed CVEs in the Mirantis Container Cloud-related
components including kaas core, bare metal, Ceph, and StackLight, refer to
Container Cloud 2.26.4: Security notes.
The following issues have been addressed in the MOSK
24.1.4 release:
[40897][Tungsten Fabric]
Resolved the issue caused by the absence of a check for error 500
while checking the Tungsten Fabric services.
[41613][Tungsten Fabric]
Resolved the issue caused by inconsistencies in the Database
Management Tool (the db_manage.py script) that checks, heals, and
cleans up inconsistent database entries.
[41784][OpenStack]
Resolved the issue caused by the missing dependency between
the FIP association and the addition of a router interface to the subnet with a server.
After provisioning the controller node, the etcd pod initiates before the
Kubernetes networking is fully operational. As a result, the pod encounters
difficulties resolving DNS and establishing connections with other members,
ultimately leading to a panic state for the etcd service.
Workaround:
Delete the PVC related to the replaced controller node:
kubectl -n openstack delete pvc <PVC-NAME>
Delete pods related to the crashing etcd service on the replaced controller
node:
kubectl -n openstack delete pods <ETCD-POD-NAME>
[42386] A load balancer service does not obtain the external IP address¶
Due to the MetalLB upstream issue,
a load balancer service may not obtain the external IP address.
The issue occurs when two services share the same external IP address and have
the same externalTrafficPolicy value. Initially, the services have the
external IP address assigned and are accessible. After modifying the
externalTrafficPolicy value for both services from Cluster to
Local, the first service that has been changed remains with no external IP
address assigned. However, the second service, which was changed later, has the
external IP assigned as expected.
To work around the issue, make a dummy change to the service object where
external IP is <pending>:
After upgrading to OpenStack Antelope, clusters with configured trunk ports
experience traffic flow disruptions that block the cluster updates.
To work around the issue, pin the MOSK Networking service
(OpenStack Neutron) container image by adding the following content
to the OpenStackDeployment custom resource:
After upgrade to OpenStack Antelope, the virtual machines experience
connectivity disruptions when sending data over the virtual networks.
Network packets with full MTU are dropped.
The issue affects the MOSK clusters with Open vSwitch as the networking
backend and with the following specific MTU settings:
The MTU configured on the tunnel interface of compute nodes is equal
to the value of the
spec:services:networking:neutron:values:conf:neutron:DEFAULT:global_physnet_mtu
parameter of the OpenStackDeployment custom resource (if not specified,
default is 1500 bytes).
If the MTU of the tunnel interface is higher by at least 4 bytes, the cluster
is not affected by the issue.
The cluster contains virtual machines in which the MTU of the guest operating
system network interfaces is larger than the value of
the global_physnet_mtu parameter above minus 50 bytes.
To work around the issue, pin the MOSK Networking
service (OpenStack Neutron) container image by adding the following content
to the OpenStackDeployment custom resource:
Remove the pinning after updating to MOSK 24.2.1 or
later patch or major release.
Tungsten Fabric¶[40032] tf-rabbitmq fails to start after rolling reboot¶
Occasionally, RabbitMQ instances in tf-rabbitmq pods fail to enable
the tracking_records_in_ets during the initialization process.
To work around the problem, restart the affected pods manually.
[13755] TF pods switch to CrashLoopBackOff after a simultaneous reboot¶
Rebooting all Cassandra cluster TFConfig or TFAnalytics nodes, maintenance, or
other circumstances that cause the Cassandra pods to start simultaneously may
cause a broken Cassandra TFConfig and/or TFAnalytics cluster.
In this case, Cassandra nodes do not join the ring and do not update the IPs of
the neighbor nodes. As a result, the TF services cannot operate Cassandra
cluster(s).
To verify that a Cassandra cluster is affected:
Run the nodetool status command specifying the config or
analytics cluster and the replica number:
You can bypass updating components of the cloud data plane to avoid
the network downtime during Update to a patch version. By using
this technique, you accept the risk that some security fixes may
not be applied.
The MOSK 24.1.4 patch provides the following updates:
Support for MKE 3.7.8
Update of minor kernel version from 5.15.0-102-generic to
5.15.0-105-generic
Security fixes for CVEs in images
Resolved product issues
For the list of enhancements and bug fixes that relate to Mirantis Container
Cloud, refer to the Mirantis Container Cloud Release notes.
The patch release notes contain the description of product enhancements,
the list of updated artifacts and Common Vulnerabilities and Exposures
(CVE) fixes as well as description of the addressed product issues
for the MOSK 24.1.5 patch:
The table below includes the total number of addressed unique and common
CVEs by MOSK-specific component compared to the previous
patch release version. The common CVEs are issues addressed across several
images.
For the detailed list of fixed and present CVEs across the Mirantis
Container Cloud and MOSK products, refer to
Mirantis Security Portal.
Mirantis Container Cloud CVEs
For the number of fixed CVEs in the Mirantis Container Cloud-related
components including kaas core, bare metal, Ceph, and StackLight, refer to
Container Cloud 2.26.5: Security notes.
The following issues have been addressed in the MOSK
24.1.5 release:
[42375][OpenStack] Resolved the issue with the OpenStack Controller
releasing more NodeWorkloadLock objects than allowed during creation of
concurrent NodeMaintenanceRequest objects.
After provisioning the controller node, the etcd pod initiates before the
Kubernetes networking is fully operational. As a result, the pod encounters
difficulties resolving DNS and establishing connections with other members,
ultimately leading to a panic state for the etcd service.
Workaround:
Delete the PVC related to the replaced controller node:
kubectl -n openstack delete pvc <PVC-NAME>
Delete pods related to the crashing etcd service on the replaced controller
node:
kubectl -n openstack delete pods <ETCD-POD-NAME>
[42386] A load balancer service does not obtain the external IP address¶
Due to the MetalLB upstream issue,
a load balancer service may not obtain the external IP address.
The issue occurs when two services share the same external IP address and have
the same externalTrafficPolicy value. Initially, the services have the
external IP address assigned and are accessible. After modifying the
externalTrafficPolicy value for both services from Cluster to
Local, the first service that has been changed remains with no external IP
address assigned. However, the second service, which was changed later, has the
external IP assigned as expected.
To work around the issue, make a dummy change to the service object where
external IP is <pending>:
After upgrading to OpenStack Antelope, clusters with configured trunk ports
experience traffic flow disruptions that block the cluster updates.
To work around the issue, pin the MOSK Networking service
(OpenStack Neutron) container image by adding the following content
to the OpenStackDeployment custom resource:
After upgrade to OpenStack Antelope, the virtual machines experience
connectivity disruptions when sending data over the virtual networks.
Network packets with full MTU are dropped.
The issue affects the MOSK clusters with Open vSwitch as the networking
backend and with the following specific MTU settings:
The MTU configured on the tunnel interface of compute nodes is equal
to the value of the
spec:services:networking:neutron:values:conf:neutron:DEFAULT:global_physnet_mtu
parameter of the OpenStackDeployment custom resource (if not specified,
default is 1500 bytes).
If the MTU of the tunnel interface is higher by at least 4 bytes, the cluster
is not affected by the issue.
The cluster contains virtual machines in which the MTU of the guest operating
system network interfaces is larger than the value of
the global_physnet_mtu parameter above minus 50 bytes.
To work around the issue, pin the MOSK Networking
service (OpenStack Neutron) container image by adding the following content
to the OpenStackDeployment custom resource:
Remove the pinning after updating to MOSK 24.2.1 or
later patch or major release.
Tungsten Fabric¶[13755] TF pods switch to CrashLoopBackOff after a simultaneous reboot¶
Rebooting all Cassandra cluster TFConfig or TFAnalytics nodes, maintenance, or
other circumstances that cause the Cassandra pods to start simultaneously may
cause a broken Cassandra TFConfig and/or TFAnalytics cluster.
In this case, Cassandra nodes do not join the ring and do not update the IPs of
the neighbor nodes. As a result, the TF services cannot operate Cassandra
cluster(s).
To verify that a Cassandra cluster is affected:
Run the nodetool status command specifying the config or
analytics cluster and the replica number:
[40032] tf-rabbitmq fails to start after rolling reboot¶
Occasionally, RabbitMQ instances in tf-rabbitmq pods fail to enable
the tracking_records_in_ets during the initialization process.
To work around the problem, restart the affected pods manually.
[42896] Cassandra cluster contains extra node with outdated IP after replacement of TF control node¶
After replacing a failed Tungsten Fabric controller node as described in
Replace a failed TF controller node, the first restart of the Cassandra
pod on this node may cause an issue if the Cassandra node with the outdated
IP address has not been removed from the cluster. Subsequent Cassandra pod
restarts should not trigger this problem.
To verify if your Cassandra cluster is affected, run the
nodetool status command specifying the config or analytics cluster
and the replica number:
An extra node will appear in the cluster with an outdated IP address
(the IP of the terminated Cassandra pod) in the Down state.
To work around the issue, after replacing the Tungsten Fabric
controller node, delete the Cassandra pod on the replaced node and remove
the outdated node from the Cassandra cluster using nodetool:
In rare cases, when ceph-controller cannot confirm the existence of
MOSK pools, instead of denying action and raising errors,
it proceeds to recreate the Cinder Ceph client. Such behavior may
potentially cause issues with OpenStack workloads.
Workaround:
In spec.cephClusterSpec of the KaaSCephCluster custom resource,
remove the external section.
Wait for the Not all mgrs are running: 1/2 message to disappear from the
KaaSCephCluster status.
Verify that the nova Ceph client that is integrated to
MOSK has the same keyring as in the Ceph cluster.
Keyring verification for the Ceph nova client
Compare the keyring used in the nova-compute and libvirt pods
with the one from the Ceph cluster:
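A sketch of the comparison, assuming the keyring is mounted into the
OpenStack pods at /etc/ceph/ceph.client.nova.keyring (the path and pod name
are assumptions, adjust them to your deployment):
kubectl -n openstack exec -it <libvirt-or-nova-compute-pod> -- cat /etc/ceph/ceph.client.nova.keyring
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph auth get-key client.nova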
If the keyring differs, change the one stored in Ceph cluster with
the key from the OpenStack pods:
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash
ceph auth get client.nova -o /tmp/nova.key
vi /tmp/nova.key
# in the editor, change the "key" value to the key obtained from the OpenStack pods
# then save and exit editing
ceph auth import -i /tmp/nova.key
Verify that the client.nova keyring of the Ceph cluster matches the
one obtained from the OpenStack pods:
If the keyring differs, change the one stored in Ceph cluster with
the key from the OpenStack pods:
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash
ceph auth get client.cinder -o /tmp/cinder.key
vi /tmp/cinder.key
# in the editor, change the "key" value to the key obtained from the OpenStack pods
# then save and exit editing
ceph auth import -i /tmp/cinder.key
Verify that the client.cinder keyring of the Ceph cluster matches
the one obtained from the OpenStack pods:
If the keyring differs, change the one stored in Ceph cluster with
the key from the OpenStack pods:
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash
ceph auth get client.glance -o /tmp/glance.key
vi /tmp/glance.key
# in the editor, change the "key" value to the key obtained from the OpenStack pods
# then save and exit editing
ceph auth import -i /tmp/glance.key
Verify that the client.glance keyring of the Ceph cluster matches
the one obtained from the OpenStack pods:
You can bypass updating components of the cloud data plane to avoid
the network downtime during Update to a patch version. By using
this technique, you accept the risk that some security fixes may
not be applied.
To improve the user update experience and make the update path more flexible,
MOSK is introducing a new scheme of updating between
cluster releases. More specifically, MOSK intends to
ultimately provide the possibility to update to any newer patch version within
a single series at any point in time. The patch version downgrade is not
supported.
However, in some cases, Mirantis may request an update to a specific
patch version in the series to be able to update to the next major series.
This may be necessary due to the specifics of the technical content already
released or planned for the release.
Note
The management cluster update scheme remains the same.
A management cluster obtains the new product version automatically
after release.
The patch release notes contain the description of product enhancements,
the list of updated artifacts and Common Vulnerabilities and Exposures
(CVE) fixes as well as description of the addressed product issues
for the MOSK 24.1.6 patch:
The table below includes the total number of addressed unique and common
CVEs by MOSK-specific component compared to the previous
patch release version. The common CVEs are issues addressed across several
images.
For the detailed list of fixed and present CVEs across the Mirantis
Container Cloud and MOSK products, refer to
Mirantis Security Portal.
Mirantis Container Cloud CVEs
For the number of fixed CVEs in the Mirantis Container Cloud-related
components including kaas core, bare metal, Ceph, and StackLight, refer to
Container Cloud 2.27.1: Security notes.
After provisioning the controller node, the etcd pod initiates before the
Kubernetes networking is fully operational. As a result, the pod encounters
difficulties resolving DNS and establishing connections with other members,
ultimately leading to a panic state for the etcd service.
Workaround:
Delete the PVC related to the replaced controller node:
kubectl -n openstack delete pvc <PVC-NAME>
Delete pods related to the crashing etcd service on the replaced controller
node:
kubectl -n openstack delete pods <ETCD-POD-NAME>
[42386] A load balancer service does not obtain the external IP address¶
Due to the MetalLB upstream issue,
a load balancer service may not obtain the external IP address.
The issue occurs when two services share the same external IP address and have
the same externalTrafficPolicy value. Initially, the services have the
external IP address assigned and are accessible. After modifying the
externalTrafficPolicy value for both services from Cluster to
Local, the first service that has been changed remains with no external IP
address assigned. However, the second service, which was changed later, has the
external IP assigned as expected.
To work around the issue, make a dummy change to the service object where
external IP is <pending>:
After upgrading to OpenStack Antelope, clusters with configured trunk ports
experience traffic flow disruptions that block the cluster updates.
To work around the issue, pin the MOSK Networking service
(OpenStack Neutron) container image by adding the following content
to the OpenStackDeployment custom resource:
After upgrade to OpenStack Antelope, the virtual machines experience
connectivity disruptions when sending data over the virtual networks.
Network packets with full MTU are dropped.
The issue affects the MOSK clusters with Open vSwitch as the networking
backend and with the following specific MTU settings:
The MTU configured on the tunnel interface of compute nodes is equal
to the value of the
spec:services:networking:neutron:values:conf:neutron:DEFAULT:global_physnet_mtu
parameter of the OpenStackDeployment custom resource (if not specified,
default is 1500 bytes).
If the MTU of the tunnel interface is higher by at least 4 bytes, the cluster
is not affected by the issue.
The cluster contains virtual machines in which the MTU of the guest operating
system network interfaces is larger than the value of
the global_physnet_mtu parameter above minus 50 bytes.
To work around the issue, pin the MOSK Networking
service (OpenStack Neutron) container image by adding the following content
to the OpenStackDeployment custom resource:
Remove the pinning after updating to MOSK 24.2.1 or
later patch or major release.
Tungsten Fabric¶[13755] TF pods switch to CrashLoopBackOff after a simultaneous reboot¶
Rebooting all Cassandra cluster TFConfig or TFAnalytics nodes, maintenance, or
other circumstances that cause the Cassandra pods to start simultaneously may
cause a broken Cassandra TFConfig and/or TFAnalytics cluster.
In this case, Cassandra nodes do not join the ring and do not update the IPs of
the neighbor nodes. As a result, the TF services cannot operate Cassandra
cluster(s).
To verify that a Cassandra cluster is affected:
Run the nodetool status command specifying the config or
analytics cluster and the replica number:
[40032] tf-rabbitmq fails to start after rolling reboot¶
Occasionally, RabbitMQ instances in tf-rabbitmq pods fail to enable
the tracking_records_in_ets during the initialization process.
To work around the problem, restart the affected pods manually.
[42896] Cassandra cluster contains extra node with outdated IP after replacement of TF control node¶
After replacing a failed Tungsten Fabric controller node as described in
Replace a failed TF controller node, the first restart of the Cassandra
pod on this node may cause an issue if the Cassandra node with the outdated
IP address has not been removed from the cluster. Subsequent Cassandra pod
restarts should not trigger this problem.
To verify if your Cassandra cluster is affected, run the
nodetool status command specifying the config or analytics cluster
and the replica number:
An extra node will appear in the cluster with an outdated IP address
(the IP of the terminated Cassandra pod) in the Down state.
To work around the issue, after replacing the Tungsten Fabric
controller node, delete the Cassandra pod on the replaced node and remove
the outdated node from the Cassandra cluster using nodetool:
In rare cases, when ceph-controller cannot confirm the existence of
MOSK pools, instead of denying action and raising errors,
it proceeds to recreate the Cinder Ceph client. Such behavior may
potentially cause issues with OpenStack workloads.
Workaround:
In spec.cephClusterSpec of the KaaSCephCluster custom resource,
remove the external section.
Wait for the Not all mgrs are running: 1/2 message to disappear from the
KaaSCephCluster status.
Verify that the nova Ceph client that is integrated to
MOSK has the same keyring as in the Ceph cluster.
Keyring verification for the Ceph nova client
Compare the keyring used in the nova-compute and libvirt pods
with the one from the Ceph cluster:
If the keyring differs, change the one stored in Ceph cluster with
the key from the OpenStack pods:
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash
ceph auth get client.nova -o /tmp/nova.key
vi /tmp/nova.key
# in the editor, change the "key" value to the key obtained from the OpenStack pods
# then save and exit editing
ceph auth import -i /tmp/nova.key
Verify that the client.nova keyring of the Ceph cluster matches the
one obtained from the OpenStack pods:
If the keyring differs, change the one stored in Ceph cluster with
the key from the OpenStack pods:
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash
ceph auth get client.cinder -o /tmp/cinder.key
vi /tmp/cinder.key
# in the editor, change the "key" value to the key obtained from the OpenStack pods
# then save and exit editing
ceph auth import -i /tmp/cinder.key
Verify that the client.cinder keyring of the Ceph cluster matches
the one obtained from the OpenStack pods:
If the keyring differs, change the one stored in Ceph cluster with
the key from the OpenStack pods:
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash
ceph auth get client.glance -o /tmp/glance.key
vi /tmp/glance.key
# in the editor, change the "key" value to the key obtained from the OpenStack pods
# then save and exit editing
ceph auth import -i /tmp/glance.key
Verify that the client.glance keyring of the Ceph cluster matches
the one obtained from the OpenStack pods:
To improve the user update experience and make the update path more flexible,
MOSK is introducing a new scheme of updating between
cluster releases. More specifically, MOSK intends to
ultimately provide the possibility to update to any newer patch version within
a single series at any point in time. The patch version downgrade is not
supported.
However, in some cases, Mirantis may request an update to a specific
patch version in the series to be able to update to the next major series.
This may be necessary due to the specifics of the technical content already
released or planned for the release.
Note
The management cluster update scheme remains the same.
A management cluster obtains the new product version automatically
after release.
You can bypass updating components of the cloud data plane to avoid
the network downtime during Update to a patch version. By using
this technique, you accept the risk that some security fixes may
not be applied.
The patch release notes contain the description of product enhancements,
the list of updated artifacts and Common Vulnerabilities and Exposures
(CVE) fixes as well as description of the addressed product issues
for the MOSK 24.1.7 patch:
The table below includes the total number of addressed unique and common
CVEs by MOSK-specific component compared to the previous
patch release version. The common CVEs are issues addressed across several
images.
For the detailed list of fixed and present CVEs across the Mirantis
Container Cloud and MOSK products, refer to
Mirantis Security Portal.
Mirantis Container Cloud CVEs
For the number of fixed CVEs in the Mirantis Container Cloud-related
components including kaas core, bare metal, Ceph, and StackLight, refer to
Container Cloud 2.27.2: Security notes.
After provisioning the controller node, the etcd pod initiates before the
Kubernetes networking is fully operational. As a result, the pod encounters
difficulties resolving DNS and establishing connections with other members,
ultimately leading to a panic state for the etcd service.
Workaround:
Delete the PVC related to the replaced controller node:
kubectl -n openstack delete pvc <PVC-NAME>
Delete pods related to the crashing etcd service on the replaced controller
node:
kubectl -n openstack delete pods <ETCD-POD-NAME>
[42386] A load balancer service does not obtain the external IP address¶
Due to the MetalLB upstream issue,
a load balancer service may not obtain the external IP address.
The issue occurs when two services share the same external IP address and have
the same externalTrafficPolicy value. Initially, the services have the
external IP address assigned and are accessible. After modifying the
externalTrafficPolicy value for both services from Cluster to
Local, the first service that has been changed remains with no external IP
address assigned. However, the second service, which was changed later, has the
external IP assigned as expected.
To work around the issue, make a dummy change to the service object where
external IP is <pending>:
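For example, adding or updating an arbitrary annotation is usually enough to trigger reconciliation; the namespace and annotation name below are placeholders:
kubectl -n openstack annotate service <SERVICE-NAME> force-reconcile="$(date +%s)" --overwrite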
After upgrading to OpenStack Antelope, clusters with configured trunk ports
experience traffic flow disruptions that block the cluster updates.
To work around the issue, pin the MOSK Networking service
(OpenStack Neutron) container image by adding the following content
to the OpenStackDeployment custom resource:
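A sketch of such pinning through the Helm values of the Neutron service; the image keys under values:images:tags follow the OpenStack-Helm chart layout and are an assumption, and the image reference is a placeholder for the one recommended by Mirantis for your release:
spec:
  services:
    networking:
      neutron:
        values:
          images:
            tags:
              neutron_server: <PINNED-NEUTRON-IMAGE>
              neutron_openvswitch_agent: <PINNED-NEUTRON-IMAGE>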
After the upgrade to OpenStack Antelope, virtual machines experience
connectivity disruptions when sending data over the virtual networks.
Network packets with full MTU are dropped.
The issue affects the MOSK clusters with Open vSwitch as the networking
backend and with the following specific MTU settings:
The MTU configured on the tunnel interface of compute nodes is equal
to the value of the
spec:services:networking:neutron:values:conf:neutron:DEFAULT:global_physnet_mtu
parameter of the OpenStackDeployment custom resource (if not specified,
default is 1500 bytes).
If the MTU of the tunnel interface is at least 4 bytes higher, the cluster
is not affected by the issue.
The cluster contains virtual machines whose guest operating system network
interfaces have an MTU larger than the value of the global_physnet_mtu
parameter above minus 50 bytes (for example, larger than 1450 bytes with the
default of 1500 bytes).
To work around the issue, pin the MOSK Networking
service (OpenStack Neutron) container image by adding the following content
to the OpenStackDeployment custom resource:
Remove the pinning after updating to MOSK 24.2.1 or
later patch or major release.
Tungsten Fabric¶
[13755] TF pods switch to CrashLoopBackOff after a simultaneous reboot¶
Rebooting all Cassandra cluster TFConfig or TFAnalytics nodes, maintenance, or
other circumstances that cause the Cassandra pods to start simultaneously may
cause a broken Cassandra TFConfig and/or TFAnalytics cluster.
In this case, Cassandra nodes do not join the ring and do not update the IPs of
the neighbor nodes. As a result, the TF services cannot operate Cassandra
cluster(s).
To verify that a Cassandra cluster is affected:
Run the nodetool status command specifying the config or
analytics cluster and the replica number:
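For example, for the first replica of the config cluster, the check may look as follows; the pod and container names are assumptions. Nodes that have not joined the ring appear in the DN (Down/Normal) state or are missing from the output:
kubectl -n tf exec -it tf-cassandra-config-dc1-rack1-0 -c cassandra -- nodetool status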
[40032] tf-rabbitmq fails to start after rolling reboot¶
Occasionally, RabbitMQ instances in tf-rabbitmq pods fail to enable
the tracking_records_in_ets during the initialization process.
To work around the problem, restart the affected pods manually.
[42896] Cassandra cluster contains extra node
with outdated IP after replacement of TF control node¶
After replacing a failed Tungsten Fabric controller node as described in
Replace a failed TF controller node, the first restart of the Cassandra
pod on this node may cause an issue if the Cassandra node with the outdated
IP address has not been removed from the cluster. Subsequent Cassandra pod
restarts should not trigger this problem.
To verify if your Cassandra cluster is affected, run the
nodetool status command specifying the config or analytics cluster
and the replica number:
An extra node will appear in the cluster with an outdated IP address
(the IP of the terminated Cassandra pod) in the Down state.
To work around the issue, after replacing the Tungsten Fabric
controller node, delete the Cassandra pod on the replaced node and remove
the outdated node from the Cassandra cluster using nodetool:
In rare cases, when ceph-controller cannot confirm the existence of
MOSK pools, instead of denying action and raising errors,
it proceeds to recreate the Cinder Ceph client. Such behavior may
cause issues with OpenStack workloads.
Workaround:
In spec.cephClusterSpec of the KaaSCephCluster custom resource,
remove the external section.
Wait for the Not all mgrs are running: 1/2 message to disappear from the
KaaSCephCluster status.
Verify that the nova Ceph client that is integrated to
MOSK has the same keyring as in the Ceph cluster.
Keyring verification for the Ceph nova client
Compare the keyring used in the nova-compute and libvirt pods
with the one from the Ceph cluster:
If the keyring differs, change the one stored in Ceph cluster with
the key from the OpenStack pods:
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash
ceph auth get client.nova -o /tmp/nova.key
vi /tmp/nova.key
# in the editor, change "key" value to the key obtained from the OpenStack pods
# then save and exit editing
ceph auth import -i /tmp/nova.key
Verify that the client.nova keyring of the Ceph cluster matches the
one obtained from the OpenStack pods:
If the keyring differs, change the one stored in Ceph cluster with
the key from the OpenStack pods:
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash
ceph auth get client.cinder -o /tmp/cinder.key
vi /tmp/cinder.key
# in the editor, change "key" value to the key obtained from the OpenStack pods
# then save and exit editing
ceph auth import -i /tmp/cinder.key
Verify that the client.cinder keyring of the Ceph cluster matches
the one obtained from the OpenStack pods:
If the keyring differs, change the one stored in Ceph cluster with
the key from the OpenStack pods:
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash
ceph auth get client.glance -o /tmp/glance.key
vi /tmp/glance.key
# in the editor, change "key" value to the key obtained from the OpenStack pods
# then save and exit editing
ceph auth import -i /tmp/glance.key
Verify that the client.glance keyring of the Ceph cluster matches
the one obtained from the OpenStack pods:
To improve user update experience and make the update path more flexible,
MOSK is introducing a new scheme of updating between
cluster releases. For the details and possible update paths, refer to
24.1.5 update notes: Cluster update scheme.
You can bypass updating components of the cloud data plane to avoid
the network downtime during Update to a patch version. By using
this technique, you accept the risk that some security fixes may
not be applied.
Provided the technical preview support for OpenStack Antelope with Neutron OVS
and Tungsten Fabric 21.4 for greenfield deployments.
To start experimenting with the new functionality, set openstack_version to
antelope in the OpenStackDeployment custom resource during the cloud
deployment.
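For example, a minimal OpenStackDeployment snippet, assuming the top-level openstack_version field of the spec:
spec:
  openstack_version: antelope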
Implemented the capability to automatically collect logs and generate support
dumps that provide valuable insights for troubleshooting OpenStack-related
problems through the osctl sos report tool present within
the openstack-controller image.
Introduced FIPS-compatible encryption into the API of all
MOSK cloud services, ensuring data security and regulatory
compliance with the FIPS 140-2 standard.
Introduced support for Mirantis Kubernetes Engine (MKE) 3.7 with Kubernetes
1.27. MOSK clusters are updated to the latest supported
MKE version during the cluster update.
Implemented the OpenStack Usage Efficiency dashboard for Grafana
that provides information about requested (allocated) CPU and memory usage
efficiency on a per-project and per-flavor basis.
This dashboard aims to identify flavors that specific projects are not
effectively using, with allocations significantly exceeding actual usage.
Also, it evaluates per-instance underuse for specific projects.
Implemented the following monitoring improvements for Ceph:
Optimized the following Ceph dashboards in Grafana: Ceph Cluster,
Ceph Pools, Ceph OSDs.
Removed the redundant Ceph Nodes Grafana dashboard. You can view
its content using the following dashboards:
Ceph stats through the Ceph Cluster dashboard.
Resource utilization through the System dashboard, which now
includes filtering by Ceph node labels, such as ceph_role_osd,
ceph_role_mon, and ceph_role_mgr.
Removed the rook_cluster alert label.
Removed the redundant CephOSDDown alert.
Renamed the CephNodeDown alert to CephOSDNodeDown.
Implemented an online calculator for quick calculation of the approximate
time required to update your MOSK cluster that uses Open
vSwitch as a networking backend.
Published a procedure that instructs on how to orchestrate Tungsten Fabric
objects through Heat templates to ensure repeatability and consistency
across deployments.
Mirantis has tested MOSK against a very specific
configuration and can guarantee a predictable behavior of the product only
in the exact same environments. The table below includes the major
MOSK components with the exact versions against which
testing has been performed.
This section describes the MOSK known issues with available
workarounds. For the known issues in the related version of
Mirantis Container Cloud, refer to Mirantis Container Cloud: Release Notes.
This section lists the OpenStack known issues with workarounds for the
Mirantis OpenStack for Kubernetes release 23.3.
[31186,34132] Pods get stuck during MariaDB operations¶
Due to the upstream MariaDB issue,
during MariaDB operations on a management cluster, Pods may get stuck
in continuous restarts with the following example error:
Create a backup of the /var/lib/mysql directory on the
mariadb-server Pod.
Verify that other replicas are up and ready.
Remove the galera.cache file for the affected mariadb-server Pod.
Remove the affected mariadb-server Pod or wait until it is automatically
restarted.
After Kubernetes restarts the Pod, the Pod clones the database in 1-2 minutes
and restores the quorum.
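A hedged sketch of these steps; the namespace, pod name mariadb-server-0, container name mariadb, and the label selector are placeholders or assumptions to adjust for your management cluster:
# Back up the data directory of the affected replica outside the Pod
kubectl cp <NAMESPACE>/mariadb-server-0:/var/lib/mysql ./mysql-backup -c mariadb
# Verify that the other replicas are up and ready
kubectl -n <NAMESPACE> get pods -l application=mariadb
# Remove the galera.cache file of the affected replica
kubectl -n <NAMESPACE> exec mariadb-server-0 -c mariadb -- rm /var/lib/mysql/galera.cache
# Delete the affected Pod or wait until it is automatically restarted
kubectl -n <NAMESPACE> delete pod mariadb-server-0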
[42386] A load balancer service does not obtain the external IP address¶
Due to the MetalLB upstream issue,
a load balancer service may not obtain the external IP address.
The issue occurs when two services share the same external IP address and have
the same externalTrafficPolicy value. Initially, the services have the
external IP address assigned and are accessible. After modifying the
externalTrafficPolicy value for both services from Cluster to
Local, the first service that has been changed remains with no external IP
address assigned. However, the second service, which was changed later, has the
external IP assigned as expected.
To work around the issue, make a dummy change to the service object where
external IP is <pending>:
This section lists the Tungsten Fabric (TF) known issues with
workarounds for the Mirantis OpenStack for Kubernetes release
23.3. For TF limitations, see Tungsten Fabric known limitations.
The Cassandra containers of the tf-cassandra-analytics service are
experiencing high CPU and memory utilization. This is happening because
Cassandra Analytics is running out of memory, causing restarts of both
Cassandra and the Tungsten Fabric control plane services.
To work around the issue, use the custom images from the Mirantis public
repository:
Specify the image for config-api in the TFOperator custom resource:
To apply the changes, restart the vRouters manually.
[13755] TF pods switch to CrashLoopBackOff after a simultaneous reboot¶
Rebooting all Cassandra cluster TFConfig or TFAnalytics nodes, maintenance, or
other circumstances that cause the Cassandra pods to start simultaneously may
cause a broken Cassandra TFConfig and/or TFAnalytics cluster.
In this case, Cassandra nodes do not join the ring and do not update the IPs of
the neighbor nodes. As a result, the TF services cannot operate Cassandra
cluster(s).
To verify that a Cassandra cluster is affected:
Run the nodetool status command specifying the config or
analytics cluster and the replica number:
During update, the ingress pods that have not been updated yet adopt
the configuration meant for the updated pods, causing disruptions.
This occurs because the ingress pods are updated sequentially, which can make
the cloud public API inaccessible for unpredictable periods until all ingress
pods are updated.
To mitigate this issue, Mirantis recommends updating ingress pods in larger
batches, preferably half of all pods at a time. This approach minimizes
downtime for the public API.
Workaround:
Before you start updating to MOSK 23.3:
Increase maxUnavailable for the ingress DaemonSet to 50% of replicas
by patching directly the DaemonSet:
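A minimal sketch, assuming the ingress DaemonSet is named ingress and resides in the openstack namespace; adjust the names to your deployment:
kubectl -n openstack patch daemonset ingress --type merge \
  -p '{"spec":{"updateStrategy":{"rollingUpdate":{"maxUnavailable":"50%"}}}}'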
In certain scenarios, the change may trigger an immediate restart of half
of the ingress pods. Therefore, after patching the ingress, wait until
all ingress pods become ready, taking into account that there might be
occasional failures in public API calls.
To verify that the patch has been applied successfully:
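For example, assuming the same DaemonSet name and namespace as above, the command should return 50%:
kubectl -n openstack get daemonset ingress \
  -o jsonpath='{.spec.updateStrategy.rollingUpdate.maxUnavailable}'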
The following issues have been addressed in the MOSK
23.3 release:
[OpenStack][34897] Resolved the issue that caused the unavailability
of machines from the nodes with DPDK after update of OpenStack from Victoria
to Wallaby.
[OpenStack][34411] Resolved the issue with an incorrect port value
for RabbitMQ after update.
[OpenStack][25124] Improved performance while sending data between
instances affected by the Multiprotocol Label Switching over Generic Routing
Encapsulation (MPLSoGRE) throughput limitation.
[TF][30738] Fixed the issue that caused the tf-vrouter-agent
readiness probe failure (No Configuration for self).
[Update][35111] Resolved the issue that caused the
openstack-operator-ensure-resources job getting stuck
in CrashLoopBackOff.
[WireGuard][35147] Resolved the issue that prevented the WireGuard
interface from having the IPv4 address assigned.
[Bare metal][34342] Resolved the issue that caused a failure of the
etcd pods due to the simultaneous deployment of several pods on a single
node. To ensure that etcd pods are always placed on different nodes,
MOSK now deploys etcd with
the requiredDuringSchedulingIgnoredDuringExecution policy.
[StackLight][35738] Resolved the issue with ucp-node-exporter.
It was unable to bind port 9100, causing the ucp-node-exporter start
failure. This issue was due to a conflict with the StackLight
node-exporter, which was also binding the same port.
The resolution of the issue involves an automatic change of the port for the
StackLight node-exporter from 9100 to 19100. No manual port update is
required.
If your cluster uses a firewall, add an additional firewall rule that
grants the same permissions to port 19100 as those currently assigned
to port 9100 on all cluster nodes.
This section describes the specific actions you as a Cloud Operator need to
complete to accurately plan and successfully perform your
Mirantis OpenStack for Kubernetes (MOSK) cluster update to the
version 23.3.
Consider this information as a supplement to the generic update procedure
published in Operations Guide: Update a MOSK cluster.
Before updating the cluster, be sure to review the potential issues that
may arise during the process and the recommended solutions to address
them, as outlined in Update known issues.
In the 23.3 release series, MOSK stops supporting
Ubuntu 18.04. Therefore, upgrade the operating system on your cluster
machines to Ubuntu 20.04 before you update to MOSK
23.3. Otherwise, the Cluster release update for the cluster running
on Ubuntu 18.04 becomes impossible.
It is not mandatory to upgrade all machines at once. You can upgrade them
one by one or in small batches, for example, if the maintenance window is
limited in time.
MOSK supports the OpenStack Victoria version until
September 2023. MOSK 23.2 was the last release version
where OpenStack Victoria packages were updated.
If you have not already upgraded your OpenStack version to Yoga, perform the
upgrade before cluster update.
While updating your cluster, the Instance High Availability service
(OpenStack Masakari) may not work as expected. Therefore, temporarily
disable the service by removing instance-ha from the service list
in the OpenStackDeployment custom resource.
Ensure running one etcd pod per OpenStack controller node¶
During the update, you may encounter the issue that causes a failure of the
etcd pods due to the simultaneous deployment of several pods on a single
node.
Therefore, before starting the update, ensure that each OpenStack controller
node runs only one etcd pod.
In total, since the MOSK 23.2 major release, 466
Common Vulnerabilities and Exposures (CVE) have been fixed in 23.3:
24 of critical and 442 of high severity.
The table below includes the total numbers of addressed unique and common
CVEs by MOSK-specific component since
MOSK 23.2.3. The common CVEs are issues addressed
across several images.
For the detailed list of fixed and present CVEs across the Mirantis
Container Cloud and MOSK products, refer to
Mirantis Security Portal.
Mirantis Container Cloud CVEs
For the number of fixed CVEs in the Mirantis Container Cloud-related
components including kaas core, bare metal, Ceph, and StackLight, refer to
Container Cloud 2.25.0: Security notes.
The patch release notes contain the list of updated artifacts
and Common Vulnerabilities and Exposures (CVE) fixes in images
as well as description of the addressed product issues for the
MOSK 23.3.1 patch.
For the list of enhancements and bug fixes that relate to Mirantis Container
Cloud, refer to the Mirantis Container Cloud Release notes.
In total, since the MOSK 23.3 release, 157
Common Vulnerabilities and Exposures (CVE) have been fixed in 23.3.1:
5 of critical and 152 of high severity.
The table below includes the total numbers of addressed unique and common
CVEs by MOSK-specific component since
MOSK 23.2.3. The common CVEs are issues addressed
across several images.
For the detailed list of fixed and present CVEs across the Mirantis
Container Cloud and MOSK products, refer to
Mirantis Security Portal.
Mirantis Container Cloud CVEs
For the number of fixed CVEs in the Mirantis Container Cloud-related
components including kaas core, bare metal, Ceph, and StackLight, refer to
Container Cloud 2.25.1: Security notes.
The following issues have been addressed in the MOSK
23.3.1 release:
[37012] Resolved the issue that caused the cluster update failure due
to instances evacuation when they were not supposed to be evacuated.
[37083] Resolved the issue that caused Cloudprober to produce warnings
about a large number of targets.
[37185] Resolved the issue that caused the OpenStack Controller to fail
while applying the Manila Helm charts during the attempt to enable Manila
through the OpenStackDeployment custom resource.
The patch release notes contain the list of updated artifacts
and Common Vulnerabilities and Exposures (CVE) fixes in images
for the MOSK 23.3.2 patch.
For the list of enhancements and bug fixes that relate to Mirantis Container
Cloud, refer to the Mirantis Container Cloud Release notes.
The table below includes the total number of addressed unique and common
CVEs by MOSK-specific component since
MOSK 23.3.1. The common CVEs are issues addressed
across several images.
For the detailed list of fixed and present CVEs across the Mirantis
Container Cloud and MOSK products, refer to
Mirantis Security Portal.
Mirantis Container Cloud CVEs
For the number of fixed CVEs in the Mirantis Container Cloud-related
components including kaas core, bare metal, Ceph, and StackLight, refer to
Container Cloud 2.25.2: Security notes.
The patch release notes contain the lists of updated artifacts and
addressed product issues, as well as the details on Common
Vulnerabilities and Exposures (CVE) fixes in images for the
MOSK 23.3.3 patch.
For the list of enhancements and bug fixes that relate to Mirantis Container
Cloud, refer to the Mirantis Container Cloud Release notes.
The table below includes the total number of addressed unique and common
CVEs by MOSK-specific component since
MOSK 23.3.2. The common CVEs are issues addressed
across several images.
For the detailed list of fixed and present CVEs across the Mirantis
Container Cloud and MOSK products, refer to
Mirantis Security Portal.
Mirantis Container Cloud CVEs
For the number of fixed CVEs in the Mirantis Container Cloud-related
components including kaas core, bare metal, Ceph, and StackLight, refer to
Container Cloud 2.25.3: Security notes.
The patch release notes contain the lists of updated artifacts and
addressed product issues, as well as the details on Common
Vulnerabilities and Exposures (CVE) fixes in images for the
MOSK 23.3.4 patch.
For the list of enhancements and bug fixes that relate to Mirantis Container
Cloud, refer to the Mirantis Container Cloud Release notes.
The table below includes the total number of addressed unique and common
CVEs by MOSK-specific component since
MOSK 23.3.3. The common CVEs are issues addressed
across several images.
For the detailed list of fixed and present CVEs across the Mirantis
Container Cloud and MOSK products, refer to
Mirantis Security Portal.
Mirantis Container Cloud CVEs
For the number of fixed CVEs in the Mirantis Container Cloud-related
components including kaas core, bare metal, Ceph, and StackLight, refer to
Container Cloud 2.25.4: Security notes.
Implemented the capability to parallelize OpenStack, Ceph, and Tungsten Fabric
node update operations, significantly improving the efficiency of
MOSK deployments. The parallel node update feature applies
to any operation that utilizes the Node Maintenance API, such as cluster
updates or graceful node reboots.
Implemented the OpenStack workload monitoring feature through
the Cloudprober exporter.
After enablement and proper configuration, the exporter allows for
monitoring the availability of instance floating IP addresses
per OpenStack compute node and project, as well as viewing
the probe statistics for individual instance floating IP addresses
through the Openstack Instances Availability dashboard
in Grafana.
Introduced the Technology Preview support for the BGP dynamic routing
extension to the Networking service (OpenStack Neutron) that will be
particularly useful for the MOSK clouds where private
networks managed by cloud users need to be transparently integrated into the
networking of the data center.
Implemented the TLS encryption feature for QEMU and libvirt to secure all data
transports during live migration, including disks not on shared storage.
Tungsten Fabric graceful restart and long-lived graceful restart¶
Available since MOSK 23.2
for Tungsten Fabric 21.4 only. TechPreview
Added support for graceful restart and long-lived graceful restart
allowing for a more efficient and robust routing experience for
Tungsten Fabric. These features enhance the speed at which routing
tables converge, specifically when dealing with BGP router restarts or
failures.
Introduced support for Mirantis Kubernetes Engine (MKE) 3.6 with Kubernetes
1.24. MOSK clusters are updated to the latest supported
MKE version during the cluster update.
Added initial Technology Preview support for custom host names of cluster
machines. When enabled, any machine host name in a particular region matches
the related Machine object name.
Added initial Technology Preview support for the Linux Audit daemon
auditd to monitor activity of cluster processes that allow for
detection of potential malicious activity.
Added a tutorial to help you build your first cloud application and onboard
it to a MOSK cloud. It will guide you through the process of
deploying a simple application using the cloud web UI (OpenStack Horizon).
Mirantis has tested MOSK against a very specific
configuration and can guarantee a predictable behavior of the product only
in the exact same environments. The table below includes the major
MOSK components with the exact versions against which
testing has been performed.
This section describes the MOSK known issues with available
workarounds. For the known issues in the related version of
Mirantis Container Cloud, refer to Mirantis Container Cloud: Release Notes.
Multiprotocol Label Switching over Generic Routing Encapsulation (MPLSoGRE)
provides limited throughput when sending data between VMs, up to 38 Mbps as
per Mirantis tests.
As a workaround, switch the encapsulation type to VXLAN in the
OpenStackDeployment custom resource:
[31186,34132] Pods get stuck during MariaDB operations¶
Due to the upstream MariaDB issue,
during MariaDB operations on a management cluster, Pods may get stuck
in continuous restarts with the following example error:
Restart the neutron-ovs-agent on the affected nodes.
[42386] A load balancer service does not obtain the external IP address¶
Due to the MetalLB upstream issue,
a load balancer service may not obtain the external IP address.
The issue occurs when two services share the same external IP address and have
the same externalTrafficPolicy value. Initially, the services have the
external IP address assigned and are accessible. After modifying the
externalTrafficPolicy value for both services from Cluster to
Local, the first service that has been changed remains with no external IP
address assigned. However, the second service, which was changed later, has the
external IP assigned as expected.
To work around the issue, make a dummy change to the service object where
external IP is <pending>:
This section lists the Tungsten Fabric (TF) known issues with
workarounds for the Mirantis OpenStack for Kubernetes release
23.2. For TF limitations, see Tungsten Fabric known limitations.
The Cassandra containers of the tf-cassandra-analytics service are
experiencing high CPU and memory utilization. This is happening because
Cassandra Analytics is running out of memory, causing restarts of both
Cassandra and the Tungsten Fabric control plane services.
To work around the issue, use the custom images from the Mirantis public
repository:
Specify the image for config-api in the TFOperator custom resource:
Execution of the TF Heat Tempest test test_template_global_vrouter_config
can result in lost vRouter configuration. This causes the tf-vrouter pod
readiness probe to fail with the following error message:
"Readiness probe failed:vRouter is PRESENT contrail-vrouter-agent: initializing (No Configuration for self)"
As a result, vRouters may have an incomplete routing table making some
services, such as metadata, become unavailable.
Workaround:
Add the tf_heat_tempest_plugin tests with global configuration to the
exclude list in the OpenStackDeployment custom resource:
If you ran test_template_global_vrouter_config and tf-vrouter-agent
pods moved to the error state with the above error, re-create these pods
through deletion:
kubectl -n tf delete pod tf-vrouter-agent-*
[13755] TF pods switch to CrashLoopBackOff after a simultaneous reboot¶
Rebooting all Cassandra cluster TFConfig or TFAnalytics nodes, maintenance, or
other circumstances that cause the Cassandra pods to start simultaneously may
cause a broken Cassandra TFConfig and/or TFAnalytics cluster.
In this case, Cassandra nodes do not join the ring and do not update the IPs of
the neighbor nodes. As a result, the TF services cannot operate Cassandra
cluster(s).
To verify that a Cassandra cluster is affected:
Run the nodetool status command specifying the config or
analytics cluster and the replica number:
Due to the upstream Calico
issue, on clusters
with WireGuard enabled, the WireGuard interface on a node may not have
the IPv4 address assigned. This leads to broken inter-Pod communication
between the affected node and other cluster nodes.
The node is affected if the IP address is missing on the WireGuard interface:
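For example, run the following command on the node; Calico typically names the interface wireguard.cali, which is an assumption to verify on your cluster. An affected interface shows no inet (IPv4) address in the output:
ip address show wireguard.cali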
During MOSK update to either 23.2 major release or
any patch release of the 23.2 release series, the
openstack-operator-ensure-resources job may get stuck in
the CrashLoopBackOff state with the following error:
Traceback (most recent call last):
  File "/usr/local/bin/osctl-ensure-shared-resources", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/openstack_controller/cli/ensure_shared_resources.py", line 61, in main
    obj.update()
  File "/usr/local/lib/python3.8/dist-packages/pykube/objects.py", line 165, in update
    self.patch(self.obj, subresource=subresource)
  File "/usr/local/lib/python3.8/dist-packages/pykube/objects.py", line 157, in patch
    self.api.raise_for_status(r)
  File "/usr/local/lib/python3.8/dist-packages/pykube/http.py", line 444, in raise_for_status
    raise HTTPError(resp.status_code, payload["message"])
pykube.exceptions.HTTPError: CustomResourceDefinition.apiextensions.k8s.io "redisfailovers.databases.spotahome.com" is invalid: spec.preserveUnknownFields: Invalid value: true: must be false in order to use defaults in the schema
As a workaround, delete the redisfailovers.databases.spotahome.com
CRD from your cluster:
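For example:
kubectl delete crd redisfailovers.databases.spotahome.com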
The following issues have been addressed in the MOSK
23.2 release:
[OpenStack][33006] Fixed the issue that prevented communication
between virtual machines on the same network.
[OpenStack][34208] Prevented the Masakari API pods from constant
restart.
[TF][32723] Fixed the issue that prevented a compiled vRouter
kmod from automatic refreshing with the new kernel.
[TF][32326] Fixed the issue that allowed for unauthorized access
to the Tungsten Fabric API.
[Ceph][30635] Fixed the issue with irrelevant error message
displaying in the osd-prepare Pod during the deployment of Ceph OSDs on
removable devices on AMD nodes. Now, the error message clearly states that
removable devices with hotplug enabled are not supported for deploying
Ceph OSDs.
[Ceph][31630] Fixed the issue that caused the Ceph cluster upgrade
to Pacific to be stuck with Rook connection failure.
[Ceph][31555] Fixed the issue with Ceph finding only 1 out of 2
mgr after update.
[Ceph][23292] Fixed the issue that caused the failure of the Ceph
rook-operator with FIPS kernel.
[Update][27797] Fixed the issue that stopped cluster kubeconfig
from working during the MKE minor version update.
[Update][32311] Fixed the issue with the tf-rabbit-exporter
ReplicaSet blocking the cluster update.
[StackLight][30867] Fixed the Instance Info panel
for RabbitMQ in Grafana.
This section describes the specific actions you as a Cloud Operator need to
complete to accurately plan and successfully perform your
Mirantis OpenStack for Kubernetes (MOSK) cluster update to the
version 23.2.
Consider this information as a supplement to the generic update procedure
published in Operations Guide: Update a MOSK cluster.
The update to MOSK 23.2 does not include any
version-specific impact on the cluster. To start planning a maintenance window,
use the Operations Guide: Update a MOSK cluster standard procedure.
Before updating the cluster, be sure to review the potential issues that
may arise during the process and the recommended solutions to address
them, as outlined in Cluster update known issues.
Pre-update actions¶
Disable the Instance High Availability service¶
While updating your cluster, the Instance High Availability service
(OpenStack Masakari) may not work as expected. Therefore, temporarily
disable the service by removing instance-ha from the service list
in the OpenStackDeployment custom resource.
In the next release series, MOSK will stop supporting
Ubuntu 18.04. Therefore, Mirantis highly recommends upgrading the operating
system on your cluster machines to Ubuntu 20.04 during the course of the
MOSK 23.2 series by rebooting cluster nodes.
It is not mandatory to reboot all machines at once. You can reboot them
one by one or in small batches, for example, if the maintenance window is
limited in time.
Otherwise, the Cluster release update for the cluster running on Ubuntu 18.04
will become impossible.
MOSK supports the OpenStack Victoria version until
September 2023. MOSK 23.2 is the last release version
where OpenStack Victoria packages are updated.
If you have not already upgraded your OpenStack version to Yoga, Mirantis
highly recommends doing this during the course of the MOSK
23.2 series.
Make the OpenStack notifications available in StackLight¶
After the update, the notifications from OpenStack become unavailable in
StackLight. On an attempt to establish a TCP connection to the RabbitMQ
server, the connection is refused with the following error:
In total, since the MOSK 23.1 major release, 1611
Common Vulnerabilities and Exposures (CVE) have been fixed in 23.2:
65 of critical and 1546 of high severity.
Among them, 689 CVEs that are listed in
Addressed CVEs - detailed have been fixed since the 23.1.4 patch release:
29 of critical and 660 of high severity. The fixes for the remaining CVEs
were released with the patch releases of the MOSK 23.1
series.
The full list of the CVEs present in the current Mirantis OpenStack for
Kubernetes (MOSK) release is available at the Mirantis Security Portal.
The Addressed CVEs - summary table includes the total number of
unique CVEs along with the total number of issues fixed across images.
Duplicate CVEs for packages in
the Addressed CVEs - detailed table can mean that they were
discovered in container images with the same names but different tags,
for example, openstack/barbican for OpenStack Victoria and Yoga
versions.
The patch release notes contain the list of artifacts and Common
Vulnerabilities and Exposures (CVE) fixes for the MOSK
23.2.1 patch released on August 29, 2023.
For the list of enhancements and bug fixes that relate to Mirantis Container
Cloud, refer to the Mirantis Container Cloud Release notes.
This section lists the cluster update known issues with workarounds for the
Mirantis OpenStack for Kubernetes release 23.2.1.
[35111] openstack-operator-ensure-resources job stuck in CrashLoopBackOff¶
During MOSK update to either 23.2 major release or
any patch release of the 23.2 release series, the
openstack-operator-ensure-resources job may get stuck in
the CrashLoopBackOff state with the following error:
Traceback (most recent call last):
  File "/usr/local/bin/osctl-ensure-shared-resources", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/openstack_controller/cli/ensure_shared_resources.py", line 61, in main
    obj.update()
  File "/usr/local/lib/python3.8/dist-packages/pykube/objects.py", line 165, in update
    self.patch(self.obj, subresource=subresource)
  File "/usr/local/lib/python3.8/dist-packages/pykube/objects.py", line 157, in patch
    self.api.raise_for_status(r)
  File "/usr/local/lib/python3.8/dist-packages/pykube/http.py", line 444, in raise_for_status
    raise HTTPError(resp.status_code, payload["message"])
pykube.exceptions.HTTPError: CustomResourceDefinition.apiextensions.k8s.io "redisfailovers.databases.spotahome.com" is invalid: spec.preserveUnknownFields: Invalid value: true: must be false in order to use defaults in the schema
As a workaround, delete the redisfailovers.databases.spotahome.com
CRD from your cluster:
The patch release notes contain the list of artifacts and Common
Vulnerabilities and Exposures (CVE) fixes for the MOSK
23.2.2 patch released on September 14, 2023.
For the list of enhancements and bug fixes that relate to Mirantis Container
Cloud, refer to the Mirantis Container Cloud Release notes.
The following issues have been addressed in the MOSK
23.2.2 release:
[34342] Resolved the issue that caused a failure of the etcd pods due
to the simultaneous deployment of several pods on a single node. To ensure
that etcd pods are always placed on different nodes, MOSK
now deploys etcd with the requiredDuringSchedulingIgnoredDuringExecution
policy.
[34276] Resolved the issue that caused the presence of stale namespaces
if the agent responsible for hosting the network was modified while
the agent was offline.
During the update, you may encounter the issue that causes a failure of the
etcd pods due to the simultaneous deployment of several pods on a single
node.
The workaround is to remove the PVC for one etcd pod.
[35111] openstack-operator-ensure-resources job stuck in CrashLoopBackOff¶
During MOSK update to either 23.2 major release or
any patch release of the 23.2 release series, the
openstack-operator-ensure-resources job may get stuck in
the CrashLoopBackOff state with the following error:
Traceback (most recent call last):
  File "/usr/local/bin/osctl-ensure-shared-resources", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/openstack_controller/cli/ensure_shared_resources.py", line 61, in main
    obj.update()
  File "/usr/local/lib/python3.8/dist-packages/pykube/objects.py", line 165, in update
    self.patch(self.obj, subresource=subresource)
  File "/usr/local/lib/python3.8/dist-packages/pykube/objects.py", line 157, in patch
    self.api.raise_for_status(r)
  File "/usr/local/lib/python3.8/dist-packages/pykube/http.py", line 444, in raise_for_status
    raise HTTPError(resp.status_code, payload["message"])
pykube.exceptions.HTTPError: CustomResourceDefinition.apiextensions.k8s.io "redisfailovers.databases.spotahome.com" is invalid: spec.preserveUnknownFields: Invalid value: true: must be false in order to use defaults in the schema
As a workaround, delete the redisfailovers.databases.spotahome.com
CRD from your cluster:
The patch release notes contain the list of artifacts and Common
Vulnerabilities and Exposures (CVE) fixes as well as description of the fixed
product issues for the MOSK 23.2.3 patch released on
September 26, 2023.
For the list of enhancements and bug fixes that relate to Mirantis Container
Cloud, refer to the Mirantis Container Cloud Release notes.
During the update, you may encounter the issue that causes a failure of the
etcd pods due to the simultaneous deployment of several pods on a single
node.
The workaround is to remove the PVC for one etcd pod.
[35111] openstack-operator-ensure-resources job stuck in CrashLoopBackOff¶
During MOSK update to either 23.2 major release or
any patch release of the 23.2 release series, the
openstack-operator-ensure-resources job may get stuck in
the CrashLoopBackOff state with the following error:
Traceback (most recent call last):
  File "/usr/local/bin/osctl-ensure-shared-resources", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/openstack_controller/cli/ensure_shared_resources.py", line 61, in main
    obj.update()
  File "/usr/local/lib/python3.8/dist-packages/pykube/objects.py", line 165, in update
    self.patch(self.obj, subresource=subresource)
  File "/usr/local/lib/python3.8/dist-packages/pykube/objects.py", line 157, in patch
    self.api.raise_for_status(r)
  File "/usr/local/lib/python3.8/dist-packages/pykube/http.py", line 444, in raise_for_status
    raise HTTPError(resp.status_code, payload["message"])
pykube.exceptions.HTTPError: CustomResourceDefinition.apiextensions.k8s.io "redisfailovers.databases.spotahome.com" is invalid: spec.preserveUnknownFields: Invalid value: true: must be false in order to use defaults in the schema
As a workaround, delete the redisfailovers.databases.spotahome.com
CRD from your cluster:
Dynamic configuration of resource oversubscription¶
Introduced a new default way to configure the resource oversubscription
in the cloud that enables the cloud operator to dynamically control the
oversubscription through the Compute service (OpenStack Nova) placement API.
The initial configuration is performed through the OpenStackDeployment
custom resource. By default, the following values are applied:
Starting from 23.1, MOSK deploys all new clouds using
Tungsten Fabric 21.4 by default. The existing OpenStack deployments using
Tungsten Fabric as a networking backend will obtain this new version
automatically during the cluster update to MOSK 23.1.
One of the key highlights of the Tungsten Fabric 21.4 release is the support
for configuring Maximum Transmission Unit for virtual networks. This
capability enables you to set the maximum packet size for your virtual
networks, ensuring that your network traffic is optimized for performance and
efficiency.
Enhanced load balancing as a service for Tungsten Fabric-enabled
MOSK clouds by adding support for Amphora instances
on top of the Tungsten Fabric networks.
Compared to the old implementation, which relied on the
Tungsten Fabric-controlled HAProxy, the new approach offers:
Implemented the new panels in the Grafana dashboards for OpenSearch
and Prometheus that provide details on the storage usage and allow
calculating the possible retention time based on provisioned storage and
average usage.
Added the capability to forward logs to external Elasticsearch and OpenSearch
servers as the fluentd-logs output. This enhancement also expands
existing configuration options for log forwarding to syslog.
Implemented the capability to hide sensitive fields from the
OpenStackDeployment object by adding reference to a secret
to this object using the value_from structure.
Implemented the functionality that enables cloud operators to periodically
rotate credentials of OpenStack admin and service users with minimized
impact on service availability and workload downtime.
Ensured better security for the noVNC client by allowing encryption of data
transfer between the instances and the noVNC proxy server using VeNCrypt
authentication scheme. You can enable this feature by defining
features:nova:console:novnc:tls:enabled in the OpenStackDeployment
custom resource.
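For example, following the parameter path named above, the feature can be enabled in the OpenStackDeployment custom resource as follows:
spec:
  features:
    nova:
      console:
        novnc:
          tls:
            enabled: true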
Technology Preview. Reworked the default MOSK access
policies to restrict the permissions of a project administrator role
exclusively to the scope of their project.
Implemented the capability to reboot several cluster nodes in one go by using
the Graceful reboot mechanism
provided by Mirantis Container Cloud. The mechanism restarts the selected
nodes one by one, honoring the instance migration policies.
Implemented the capability to identify the nodes requiring reboot through
both the Mirantis Container Cloud API and web UI:
API: reboot.required.true in
status:providerStatus of a Machine object
Web UI: the One or more machines require a reboot
notification on the Clusters and Machines pages
Published the tutorial to help you build your first cloud application and
onboard it to a MOSK cloud. The dedicated section
in the User Guide will guide you through the process of deploying and
managing a sample application using automation, and showcase the powerful
capabilities of OpenStack.
Published the instructions on how you can customize the functionality of
MOSK OpenStack services by installing custom system
or Python packages into their container images.
Mirantis has tested MOSK against a very specific
configuration and can guarantee a predictable behavior of the product only
in the exact same environments. The table below includes the major
MOSK components with the exact versions against which
testing has been performed.
This section describes the MOSK known issues with available
workarounds. For the known issues in the related version of
Mirantis Container Cloud, refer to Mirantis Container Cloud: Release Notes.
This section lists the OpenStack known issues with workarounds for the
Mirantis OpenStack for Kubernetes release 23.1.
[25124] MPLSoGRE encapsulation has limited throughput¶
Multiprotocol Label Switching over Generic Routing Encapsulation (MPLSoGRE)
provides limited throughput when sending data between VMs, up to 38 Mbps as
per Mirantis tests.
As a workaround, switch the encapsulation type to VXLAN in the
OpenStackDeployment custom resource:
This section lists the Tungsten Fabric (TF) known issues with
workarounds for the Mirantis OpenStack for Kubernetes release
23.1. For TF limitations, see Tungsten Fabric known limitations.
Execution of the TF Heat Tempest test test_template_global_vrouter_config
can result in lost vRouter configuration. This causes the tf-vrouter pod
readiness probe to fail with the following error message:
"Readiness probe failed:vRouter is PRESENT contrail-vrouter-agent: initializing (No Configuration for self)"
As a result, vRouters may have an incomplete routing table making some
services, such as metadata, become unavailable.
Workaround:
Add the tf_heat_tempest_plugin tests with global configuration to the
exclude list in the OpenStackDeployment custom resource:
If you ran test_template_global_vrouter_config and tf-vrouter-agent
pods moved to the error state with the above error, re-create these pods
through deletion:
kubectl -n tf delete pod tf-vrouter-agent-*
[13755] TF pods switch to CrashLoopBackOff after a simultaneous reboot¶
Rebooting all Cassandra cluster TFConfig or TFAnalytics nodes, maintenance, or
other circumstances that cause the Cassandra pods to start simultaneously may
cause a broken Cassandra TFConfig and/or TFAnalytics cluster.
In this case, Cassandra nodes do not join the ring and do not update the IPs of
the neighbor nodes. As a result, the TF services cannot operate Cassandra
cluster(s).
To verify that a Cassandra cluster is affected:
Run the nodetool status command specifying the config or
analytics cluster and the replica number:
The vRouter kernel module remains at
/usr/src/vrouter-<TF-VROUTER-IMAGE-VERSION>, even
if it was initially compiled for an older kernel version. This leads
to the reuse of compiled artifacts without recompilation. Consequently,
after upgrading to Mirantis OpenStack for Kubernetes 23.1, an outdated module gets
loaded onto the new kernel. This mismatch results in a failure that
triggers the CrashLoop state for the vRouter on the affected node.
Workaround:
On the affected node, move the old vRouter kernel module to another
directory. For example:
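For example, on the affected node; the backup directory is arbitrary, and the version placeholder must match the directory actually present on the node:
mkdir -p /root/vrouter-kmod-backup
mv /usr/src/vrouter-<TF-VROUTER-IMAGE-VERSION> /root/vrouter-kmod-backup/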
The deployment of Ceph OSDs fails with the following messages in the status
section of the KaaSCephCluster custom resource:
shortClusterInfo:
  messages:
  - Not all osds are deployed
  - Not all osds are in
  - Not all osds are up
To find out if your cluster is affected, verify if the devices on
the AMD hosts you use for the Ceph OSDs deployment are removable.
For example, if the sdb device name is specified in
spec.cephClusterSpec.nodes.storageDevices of the KaaSCephCluster
custom resource for the affected host, run:
# cat /sys/block/sdb/removable
1
The system output shows that the reason for the above messages in status
is the enabled hotplug functionality on the AMD nodes, which marks all drives
as removable. The hotplug functionality is not supported by Ceph in
MOSK.
As a workaround, disable the hotplug functionality in the BIOS settings
for disks that are configured to be used as Ceph OSD data devices.
[31630] Ceph cluster upgrade to Pacific is stuck with Rook connection failure¶
The KaaSCephCluster custom resource contains the following configuration
option in the rookConfig section:
spec:
  cephClusterSpec:
    rookConfig:
      ms_crc_data: "false" # or 'ms crc data: "false"'
As a workaround, remove the ms_crc_data (or ms crc data) configuration
key from the KaaSCephCluster custom resource and wait for the
rook-ceph-mon pods to restart on the MOSK cluster:
kubectl -n rook-ceph get pod -l app=rook-ceph-mon -w
[31555] Ceph can find only 1 out of 2 ‘mgr’ after update to MOSK 23.1¶
If the keyring differs, change the one stored in Ceph cluster with
the key from the OpenStack pods:
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash
ceph auth get client.nova -o /tmp/nova.key
vi /tmp/nova.key
# in the editor, change "key" value to the key obtained from the OpenStack pods
# then save and exit editing
ceph auth import -i /tmp/nova.key
Verify that the client.nova keyring of the Ceph cluster matches the
one obtained from the OpenStack pods:
If the keyring differs, change the one stored in Ceph cluster with
the key from the OpenStack pods:
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash
ceph auth get client.cinder -o /tmp/cinder.key
vi /tmp/cinder.key
# in the editor, change "key" value to the key obtained from the OpenStack pods
# then save and exit editing
ceph auth import -i /tmp/cinder.key
Verify that the client.cinder keyring of the Ceph cluster matches
the one obtained from the OpenStack pods:
If the keyring differs, change the one stored in Ceph cluster with
the key from the OpenStack pods:
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash
ceph auth get client.glance -o /tmp/glance.key
vi /tmp/glance.key
# in the editor, change "key" value to the key obtained from the OpenStack pods
# then save and exit editing
ceph auth import -i /tmp/glance.key
Verify that the client.glance keyring of the Ceph cluster matches
the one obtained from the OpenStack pods:
During update of a Container Cloud management cluster, if the MKE minor
version is updated from 3.4.x to 3.5.x, access to the cluster using the
existing kubeconfig fails with the You must be logged in to the server
(Unauthorized) error due to OIDC settings being reconfigured.
As a workaround, during the Container Cloud cluster update, use the
admin kubeconfig instead of the existing one. Once the update
completes, you can use the existing cluster kubeconfig again.
On a cluster with Tungsten Fabric enabled, the cluster update is stuck with
the tf-rabbit-exporter deployment having a number of pods in the
Terminating state.
To verify whether your cluster is affected:
kubectl -n tf get pods | grep tf-rabbit-exporter
Example of system response on the affected cluster:
The following issues have been addressed in the MOSK
23.1 release:
[OpenStack][30450] Fixed the issue causing high CPU load of MariaDB.
[OpenStack][29501] Fixed the issue when Cinder periodic database
cleanup resets the state of volumes.
[OpenStack][27168] Fixed the issue that made
openvswitch-openvswitch-vswitchd-default and
neutron-ovs-agent-default pods stuck in the NotReady status
after restart.
[OpenStack][29539] Fixed the issue with missing network traffic for
a trunked port in OpenStack Yoga.
[OpenStack][Yoga][24067] Fixed the issue with inability to set
up a secondary DNS zone in OpenStack Yoga.
Note
The issue still affects OpenStack Victoria.
[TF][10096] Fixed the issue that prevented tf-control from
refreshing IP addresses of Cassandra pods.
[TF][28728] Fixed the issue when
tungstenFabricMonitoring.enabled was not enabled by default during
Tungsten Fabric deployment.
[TF][30449] Fixed the issue that resulted in losing connectivity
after the primary TF Controller node reboot.
[Ceph][28142] Added the ability to specify node affinity for
rook-discover pods through the ceph-operator Helm release.
[Ceph][26820] Fixed the issue when the status section in the
KaaSCephCluster.status custom resource did not reflect issues
during the process of a Ceph cluster deletion.
[StackLight][28372] Fixed the issue causing false-positive liveness
probe failures for fluentd-notifications.
[StackLight][29330] Fixed the issue that prevented tf-rabbitmq
from being monitored.
[Updates][29438] Fixed the issue that caused the cluster update
being stuck during the Tungsten Fabric Operator update.
This section describes the specific actions you as a Cloud Operator need to
complete to accurately plan and successfully perform your
Mirantis OpenStack for Kubernetes (MOSK) cluster update to the
version 23.1.
Consider this information as a supplement to the generic update procedure
published in Operations Guide: Update a MOSK cluster.
As part of the update to MOSK 23.1, Tungsten Fabric will
automatically get updated from version 2011 to version 21.4.
Note
For the compatibility matrix of the most recent MOSK
releases and their major components in conjunction with Container Cloud and
Cluster releases, refer to Release Compatibility Matrix.
The update to MOSK 23.1 does not include any
version-specific impact on the cluster. To start planning a maintenance window,
use the Operations Guide: Update a MOSK cluster standard procedure.
Before updating the cluster, be sure to review the potential issues that
may arise during the process and the recommended solutions to address
them, as outlined in Cluster update known issues.
Pre-update actions¶
Update the baremetal-provider image to 1.37.18¶
If your Container Cloud management cluster has updated to 2.24.1, to avoid
the issue with waiting for the lcm-agent to update
the currentDistribution field during the cluster update to
MOSK 23.1, replace the baremetal-provider image
1.37.15 tag with 1.37.18:
Open the kaasrelease object for editing:
kubectl edit kaasrelease kaas-2-24-1
Replace the 1.37.15 tag with 1.37.18 for the baremetal-provider
image:
Explicitly define the OIDCClaimDelimiter parameter¶
MOSK 23.1 introduces a new default value for the
OIDCClaimDelimiter parameter, which defines the delimiter to use when
setting multi-valued claims in the HTTP headers. See the MOSK 23.1 OpenStack
API Reference
for details.
Previously, the value of the OIDCClaimDelimiter parameter defaulted to
",". This value misaligned with the behavior expected by Keystone.
As a result, when creating federation mappings for Keystone, the cloud operator
was forced to write more complex rules. Therefore, in MOSK
22.4, Mirantis announced the change of the default value for the
OIDCClaimDelimiter parameter.
If your deployment is affected and you have not explicitly defined the
OIDCClaimDelimiter parameter, as Mirantis advised, after update to
MOSK 22.4 or 22.5, now would be a good time to do it.
Otherwise, you may encounter unforeseen consequences after the update to
MOSK 23.1.
Affected deployments
Proceed with the instruction below only if the following conditions are
true:
Keystone is set to use federation through the OpenID Connect protocol,
with Mirantis Container Cloud Keycloak in particular. The following
configuration is present in your OpenStackDeployment custom resource:
The new default value for the OIDCClaimDelimiter parameter
is ";". To find out whether your Keystone mappings will need
adjustment after changing the default value, set the parameter to
";" on your staging environment and verify the rules.
Verify that the KaaSCephCluster custom resource does not contain the
following entries. If they exist, remove them.
In the spec.cephClusterSpec section, the external section.
Caution
If the external section exists in the KaaSCephCluster
spec during the upgrade to MOSK 23.1, it will cause a Ceph
outage that leads to corruption of the Cinder volumes file system and
requires a lot of routine work to fix the affected Cinder volumes
one by one after the Ceph outage is resolved.
Therefore, make sure that the external section is removed from the
KaaSCephCluster spec right before starting cluster upgrade.
In the spec.cephClusterSpec.rookConfig section, the ms_crc_data or
ms crc data configuration key. After you remove the key, wait for
rook-ceph-mon pods to restart on the MOSK
cluster.
Caution
If the ms_crc_data key exists in the rookConfig section
of KaaSCephCluster during the upgrade to MOSK 23.1,
it breaks the connection between the Rook Operator and Ceph Monitors
during the Ceph version upgrade, leading to a stuck upgrade, and requires
that you manually disable the ms_crc_data key for all Ceph Monitors.
Therefore, make sure that the ms_crc_data key is removed from the
KaaSCephCluster spec right before starting cluster upgrade.
To prevent issues during graceful reboot of the OpenStack controller nodes,
temporarily remove Tempest from the OpenStackDeployment object:
spec:
  features:
    services:
    - tempest
Post-update actions¶
Remove sensitive information from cluster configuration¶
The OpenStackDeploymentSecret custom resource has been deprecated in
MOSK 23.1. The fields that store confidential settings
in OpenStackDeploymentSecret and OpenStackDeployment custom resources
need to be migrated to the Kubernetes secrets.
To ensure stability for production workloads, MOSK 23.1
changes the default value of RAM oversubscription on compute nodes to 1.0,
which is no oversubscription. In MOSK 22.5 and earlier,
the effective default value of RAM allocation ratio is 1.1.
This change will be applied only to the compute nodes added to the cloud
after update to MOSK 23.1. The effective RAM
oversubscription value for existing compute nodes will not automatically
change after updating to MOSK 23.1.
Use dynamic configuration for resource oversubscription¶
Since MOSK 23.1, the Compute service (OpenStack Nova)
enables you to control the resource oversubscription dynamically through
the placement API.
However, if your cloud already makes use of custom allocation ratios, the new
functionality will not become immediately available after update. Any compute
node configured with explicit values for the cpu_allocation_ratio,
disk_allocation_ratio, and ram_allocation_ratio configuration options
will continue to enforce those values in the placement service. Therefore, any
changes made through the placement API will be overridden by the values set in
those configuration options in the Compute service. To modify oversubscription,
you should adjust the values of these configuration options in the
OpenStackDeployment custom resource. This procedure should be performed
with caution as modifying these values may result in compute service restarts
and potential disruptions in the instance builds.
To enable the use of the new functionality, Mirantis recommends removing
explicit values for the cpu_allocation_ratio, disk_allocation_ratio,
and ram_allocation_ratio options from the OpenStackDeployment custom
resource. Instead, use the new configuration options as described in
Configuring initial resource oversubscription. Also, keep in mind that
the changes will only impact newly added compute nodes and will not be applied
to the existing ones.
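As an illustration only, the following commands show how oversubscription can
be inspected and adjusted dynamically through the placement API once the
explicit configuration options are removed. They require the osc-placement
CLI plugin; the provider UUID and the ratio value are placeholders:
# List the current inventory of a compute node resource provider
openstack resource provider inventory list <provider-uuid>
# Amend only the VCPU allocation ratio, leaving other inventory fields intact
openstack resource provider inventory set <provider-uuid> --resource VCPU:allocation_ratio=4.0 --amend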
The patch release notes contain the list of artifacts and Common
Vulnerabilities and Exposures (CVE) fixes for the MOSK
23.1.1 patch released on April 20, 2023.
The OpenStack upgrade to Yoga fails due to the delay in the Cinder start.
Workaround:
Follow the openstack-controller logs from the OpenStackDeployment
container. When the controller gets stuck on checking the health of any
OpenStack component, verify the Helm release statuses:
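A possible way to check the statuses, assuming the Helm v3 CLI is available
with access to the MOSK cluster (shown as an illustration, not taken from the
original procedure):
helm list --namespace openstack --all
# Releases stuck in the pending-install or pending-upgrade status point to
# the component that blocks the openstack-controller health check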
The following issues have been addressed in the MOSK
23.1.4 release:
[27031] Fixed the removal of objects marked as Deleted from the
Barbican database during the database cleanup.
[30224] Decreased the default weight for
build_failure_weight_multiplier to 2 to normalize instance spreading
across compute nodes.
[30673] Fixed the issue with duplicate tasks responses from the
ironic-python agent on the ironic-conductor side.
[30888] Adjusted the caching time for PowerDNS to fit Designate
timeouts.
[31021] Fixed the race in openstack-controller that could lead
to setting the default user names and passwords in configuration files
during initial deployment.
[31358] Configured the warning message about the world-readable
directory with fernet keys to be logged only once during startup.
[31711] Started to pass the autogenerated memcache_secret_key
to avoid its regeneration every time the Manila Helm chart gets updated.
This section describes the patch-related known issues with available
workarounds.
[32761] Bare-metal nodes stuck in the cleaning state¶
During the initial deployment of Mirantis Container Cloud, some nodes may
get stuck in the cleaning state. The workaround is to wipe disks manually
before initializing the Mirantis Container Cloud bootstrap.
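One possible way to wipe the disks manually, assuming you can boot the
affected node into a live or rescue environment and that <disk> is the device
to be cleaned (destructive, shown for illustration only):
# Remove all file system, RAID, and partition table signatures from the disk
wipefs --all --force /dev/<disk>
# Optionally also clear the GPT and MBR data structures
sgdisk --zap-all /dev/<disk>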
Added full support for OpenStack Yoga with Open vSwitch and Tungsten Fabric
2011 networking backends.
Starting from 22.5, MOSK deploys all new clouds using
OpenStack Yoga by default. To upgrade an existing cloud from OpenStack Victoria
to Yoga, follow the Upgrade OpenStack procedure.
Highlights from upstream supported by Mirantis OpenStack
deployed on Yoga
[Cinder] Removed the deprecated Block Storage API version 2.0. Instead,
use the Block Storage API version 3.0 that is fully compatible with the
previous version.
[Cinder] Removed the requirement for the request URLs to contain a project
ID in the Block Storage API making it more consistent with other
OpenStack APIs. For backward compatibility, legacy URLs containing
a project ID continue to be recognized.
[Designate] Added support for the CERT resource record type enabling
new use cases such as secure email and publication of certificate
revocation list through DNS.
[Horizon] Added support for the Network QoS Policy creation.
[Glance] Implemented /v2/images/<image-id>/tasks to get tasks
associated with an image.
[Ironic] Changed the default deployment boot mode from legacy BIOS to
UEFI.
[Masakari] Added support for disabling and enabling failover segments.
Now, cloud operators can put whole segments into the maintenance
mode.
[Neutron] Implemented the address-groups resource that can be used to
add groups of IP addresses to security group rules.
[Nova] Added support for the API microversion 2.90. It enables users
to configure the host name exposed through the Nova metadata service
during instance creation or rebuild.
[Octavia] Increased the performance and scalability of load balancers
that use the amphora provider when using amphora images built with
version 2.x of the HAProxy load balancing engine.
[Octavia] Improved the observability of load balancers by adding the
PROMETHEUS listeners that expose a Prometheus exporter endpoint.
The Octavia amphora provider exposes over 150 unique metrics.
Implemented the capability to securely expose part of a
MOSK cluster message bus (RabbitMQ) to the outside world.
This enables external consumers to subscribe to notification messages
emitted by the cluster services and can be helpful in several use cases:
Analysis of notification history for retrospective security audit
Real-time aggregation of notification messages to collect
statistics of cloud resource consumption for capacity planning or
charge-back
The external notification endpoint can be easily enabled and configured through
the OpenStackDeployment custom resource.
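A hedged sketch of what enabling the external notification endpoint may look
like in the OpenStackDeployment custom resource; the field names under
features:messaging and the topic value are assumptions to verify against the
product reference:
spec:
  features:
    messaging:
      notifications:
        external:
          enabled: true
          topics:
          - external-consumer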
Added MOSK support for the Shared Filesystems service
(OpenStack Manila), which enables cloud users to create and manage virtual
file shares, so that applications can store their data using common network
file sharing protocols, such as CIFS, NFS, and so on.
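Assuming the Shared Filesystems service is toggled through the services list
of the OpenStackDeployment custom resource, enabling it may look like the
sketch below; the shared-file-system key name is an assumption to verify
against the reference:
spec:
  features:
    services:
    - shared-file-system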
Implemented the ability to enable the BGP load-balancing mode for
MOSK underlying Kubernetes to allow distribution
of services providing OpenStack APIs across multiple independent
racks that have no L2 segments in common.
Automated configuration of public FQDN for the Object Storage endpoint¶
The fully qualified domain name (FQDN) for the Object Storage service
(Ceph Object gateway) public endpoint is now configurable through just
a single parameter in the KaaSCephCluster custom resource, which is
spec.cephClusterSpec.ingress.publicDomain. Previously, you had to perform
a set of manual steps to define a custom name. If the parameter is not set,
the FQDN settings from the OpenStackDeployment custom resource apply
by default.
The new parameter simplifies configuration of Transport Layer Security of
user-facing endpoints of the Object Storage service.
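For example, the parameter mentioned above can be set as follows; the domain
value is illustrative:
spec:
  cephClusterSpec:
    ingress:
      publicDomain: public.mosk.example.com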
Implemented the following enhancements for etcd monitoring:
Introduced etcd monitoring for OpenStack by implementing the Etcd
Grafana dashboard and by adding OpenStack to the set of existing alerts for
etcd that were used for MKE clusters only in previous releases.
Improved etcd monitoring for MKE on MOSK clusters by
implementing the Etcd dashboard and etcdDbSizeCritical and
etcdDbSizeMajor alerts that inform about the size of the etcd database.
Setting of a custom value for a node label using web UI¶
Implemented the ability to set a custom value for a predefined node label using
the Container Cloud web UI. The list of available node labels is
obtained from allowedNodeLabels of your current Cluster release.
If the value field is not defined in allowedNodeLabels, select the
check box of the required label and define an appropriate custom value for
this label to be set to the node.
Mirantis has tested MOSK against a very specific
configuration and can guarantee a predictable behavior of the product only
in the exact same environments. The table below includes the major
MOSK components with the exact versions against which
testing has been performed.
This section describes the MOSK known issues with available
workarounds. For the known issues in the related version of
Mirantis Container Cloud, refer to Mirantis Container Cloud: Release Notes.
One of the most common symptoms of the high CPU load of MariaDB is slow API
responses. To troubleshoot the issue, verify the CPU consumption of MariaDB
using the General > Kubernetes Pods Grafana dashboard or through
the CLI as follows:
Obtain the resource consumption details for the MariaDB server:
If you are observing a huge difference between the filtered and
r_filtered columns for the query, as in the example of system
response above, analyze the performance of tables by running the
ANALYZE TABLE <TABLE_NAME>; and
ANALYZE TABLE <TABLE_NAME> PERSISTENT FOR ALL; commands:
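For illustration, with a hypothetical table such as nova.instances, the
commands look as follows:
ANALYZE TABLE nova.instances;
ANALYZE TABLE nova.instances PERSISTENT FOR ALL;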
Due to an issue in the database auto-cleanup job for the Block Storage service
(OpenStack Cinder), the state of volumes that are attached to instances gets
reset every time the job runs. The instances can still write and read block
storage data, however, volume objects appear in the OpenStack API as
not attached, causing confusion.
The workaround is to temporarily disable the job until the issue is
fixed and execute the script below to restore the affected instances.
To disable the job, update the OpenStackDeployment custom resource as
follows:
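A hedged sketch of what disabling the Cinder database cleanup job may look
like; the field path under features:database:cleanup is an assumption and
must be verified against the OpenStackDeployment reference:
spec:
  features:
    database:
      cleanup:
        cinder:
          enabled: false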
The provided script does not fix the Cinder database clean-up job
and is only intended to restore the functionality of the affected
instances. Therefore, leave the job disabled.
[25124] MPLSoGRE encapsulation has limited throughput¶
Multiprotocol Label Switching over Generic Routing Encapsulation (MPLSoGRE)
provides limited throughput when sending data between VMs, up to 38 Mbps
according to Mirantis tests.
As a workaround, switch the encapsulation type to VXLAN in the
OpenStackDeployment custom resource:
It is not possible to create an instance that uses a security group shared
through role-based access control (RBAC) by specifying only the network ID
when calling Nova. In such a case, before creating a port in the given network,
Nova verifies whether the given security group exists in Neutron. However, Nova asks
only for the security groups filtered by project_id. Therefore, it will not
get the shared security group back from the Neutron API. For details, see the
OpenStack known issue
#1942615.
Note
The bug affects only OpenStack Victoria and is fixed for OpenStack
Yoga in MOSK 22.5.
This section lists the Tungsten Fabric known issues with
workarounds for the Mirantis OpenStack for Kubernetes release
22.5. For Tungsten Fabric limitations, see Tungsten Fabric known limitations.
[13755] TF pods switch to CrashLoopBackOff after a simultaneous reboot¶
Rebooting all Cassandra cluster TFConfig or TFAnalytics nodes, maintenance, or
other circumstances that cause the Cassandra pods to start simultaneously may
cause a broken Cassandra TFConfig and/or TFAnalytics cluster.
In this case, Cassandra nodes do not join the ring and do not update the IPs of
the neighbor nodes. As a result, the TF services cannot operate Cassandra
cluster(s).
To verify that a Cassandra cluster is affected:
Run the nodetool status command specifying the config or
analytics cluster and the replica number:
The tf-control service resolves the DNS names of Cassandra pods at startup
and does not update them if Cassandra pods got new IP addresses, for example,
in case of a restart. As a workaround, to refresh the IP addresses of
Cassandra pods, restart the tf-control pods one by one:
kubectl -n tf delete pod tf-control-<hash>
Caution
Before restarting the tf-control pods:
Verify that the new pods are successfully spawned.
Verify that no vRouters are connected to only one tf-control
pod that will be restarted.
If the tungstenfabric-operator-metrics service was present on the cluster
in MOSK 22.4, the update to 22.5 can get stuck due to the absence
of correct labels on this service. As a workaround, delete the service
manually:
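For example, assuming the service resides in the tf namespace:
kubectl -n tf delete service tungstenfabric-operator-metrics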
[27797] Cluster ‘kubeconfig’ stops working during MKE minor version update¶
During update of a Container Cloud management cluster, if the MKE minor
version is updated from 3.4.x to 3.5.x, access to the cluster using the
existing kubeconfig fails with the You must be logged in to the server
(Unauthorized) error due to OIDC settings being reconfigured.
As a workaround, during the Container Cloud cluster update, use the
admin kubeconfig instead of the existing one. Once the update
completes, you can use the existing cluster kubeconfig again.
If a cluster does not currently have any ongoing operations that comprise
OpenStack notifications, the fluentd containers in the
fluentd-notifications Pods are frequently restarted due to false-positive
failures of liveness probe and trigger alerts.
Ignore such failures and alerts if the Pods are in the Running state.
To verify the fluentd-notifications Pods:
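One way to check the Pods, assuming they run in the stacklight namespace:
kubectl -n stacklight get pods | grep fluentd-notifications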
The following issues have been addressed in the MOSK
22.5 release:
[26773] Fixed the issue with VM autoscaling failure when using the
CPU-related metrics in Telemetry.
[26534] Fixed the issue with the ironic-conductor Pod getting stuck in
the CrashLoopBackOff state after the Container Cloud management cluster
upgrade from 2.19.0 to 2.20.0. The issue occurred due to the race condition
between the ironic-conductor and ironic-conductor-http containers
of the ironic-conductor Pod that tried to use ca-bundle.pem
simultaneously but from different users.
[25594][Yoga] Fixed the issue with security groups shared through RBAC not
being filtered and used by Nova to create instances due to the OpenStack
known issue #1942615.
Note
The bug still affects OpenStack Victoria and is fixed for
OpenStack Yoga.
[24435] Fixed the issue with MetalLB speaker failing to announce the LB IP
for the Ingress service after the MOSK cluster update.
For existing clusters, you can set externalTrafficPolicy back from
Cluster to Local after updating to 22.5. For details, see
Post-upgrade actions.
This section describes the specific actions you as a Cloud Operator need to
complete to accurately plan and successfully perform your
Mirantis OpenStack for Kubernetes (MOSK) cluster update to the
version 22.5.
Consider this information as a supplement to the generic update procedure
published in Operations Guide: Update a MOSK cluster.
Additionally, read through the Cluster update known issues for the problems
that are known to occur during update with recommended workarounds.
The update to MOSK 22.5 does not include any
version-specific impact on the cluster. To start planning a maintenance window,
use the Operations Guide: Update a MOSK cluster standard procedure.
Before you proceed with updating the cluster, make sure that you perform the
following pre-update actions if applicable:
Due to the [29438] Cluster update gets stuck during the Tungsten Fabric operator update known issue, the MOSK cluster
update from 22.4 to 22.5 can get stuck. Your cluster is affected if it has
been updated from MOSK 22.3 to 22.4, regardless of the
SDN backend in use (Open vSwitch or Tungsten Fabric). The newly deployed
MOSK 22.4 clusters are not affected.
To avoid the issue, manually delete the tungstenfabric-operator-metrics
service from the cluster before update:
Due to the known issue in the database auto-cleanup job for the Block Storage
service (OpenStack Cinder), the state of volumes that are attached to
instances gets reset every time the job runs. The workaround is to
temporarily disable the job until the issue is fixed. For details,
refer to [29501] Cinder periodic database cleanup resets the state of volumes.
Post-update actions¶
Explicitly define the OIDCClaimDelimiter parameter¶
The OIDCClaimDelimiter parameter defines the delimiter to use when setting
multi-valued claims in the HTTP headers. See the MOSK 22.5 OpenStack API
Reference
for details.
The current default value of the OIDCClaimDelimiter parameter is ",".
This value misaligns with the behavior expected by Keystone. As a result, when
creating federation mappings for Keystone, the cloud operator may be forced
to write more complex rules. Therefore, in early 2023, Mirantis will change
the default value for the OIDCClaimDelimiter parameter.
Affected deployments
Proceed with the instruction below only if the following conditions are
true:
Keystone is set to use federation through the OpenID Connect protocol,
with Mirantis Container Cloud Keycloak in particular. The following
configuration is present in your OpenStackDeployment custom resource:
The new default value for the OIDCClaimDelimiter parameter
will be ";". To find out whether your Keystone mappings will need
adjustment after changing the default value, set the parameter to
";" on your staging environment and verify the rules.
Optional. Set externalTrafficPolicy=Local for the OpenStack Ingress service¶
In MOSK 22.4 and older versions, the OpenStack Ingress
service was not accessible through its LB IP address on the environments
having the external network restricted to a few nodes in the
MOSK cluster. For such use cases, Mirantis recommended
setting the externalTrafficPolicy parameter to Cluster as a workaround.
The issue #24435 has been fixed in
MOSK 22.5. Therefore, if the monitoring of source IPs of
the requests to OpenStack services is required, you can set the
externalTrafficPolicy parameter back to Local.
Affected deployments
You are affected if your deployment configuration matches the following
conditions:
The external network is restricted to a few nodes in the
MOSK cluster. In this case, only a limited set of
nodes have IPs in the external network where MetalLB announces LB IPs.
The workaround was applied by setting externalTrafficPolicy=Cluster
for the Ingress service.
To set externalTrafficPolicy back from Cluster to Local:
On the MOSK cluster, add the node selector to the
L2Advertisement MetalLB object so that it matches the nodes in the
MOSK cluster having IPs in the external network,
or a subset of those nodes.
The openstack-control-plane:enabled label selector defines nodes in
the MOSK cluster having IPs in the external network.
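A possible L2Advertisement configuration matching such nodes, assuming the
object resides in the metallb-system namespace; the object name is
illustrative:
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: default
  namespace: metallb-system
spec:
  nodeSelectors:
  - matchLabels:
      openstack-control-plane: enabled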
In the MOSK Cluster object located on the management
cluster, remove or edit node selectors and affinity for MetalLB speaker
in the MetalLB chart values, if required.
Example of the helmReleases section in Cluster.spec after editing
the nodeSelector parameter:
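A hedged example of what the edited values may look like; the
speaker.nodeSelector key layout is an assumption about the MetalLB chart used
by Container Cloud:
spec:
  providerSpec:
    value:
      helmReleases:
      - name: metallb
        values:
          speaker:
            nodeSelector:
              openstack-control-plane: enabled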
The OpenStack Panko service was removed from the product in
MOSK 22.2 for OpenStack Victoria without any user
involvement because the service is no longer maintained in the upstream
OpenStack. See the project repository page for details.
However, in MOSK 22.5, before upgrading to OpenStack Yoga,
verify that the Panko service is removed from the cloud by deleting the
event entry from the spec:features:services structure in the
OpenStackDeployment resource as described in Operations Guide:
Remove an OpenStack service.
Provided the technical preview support for OpenStack Yoga with Neutron OVS and
Tungsten Fabric 21.4.
To start experimenting with the new functionality, set openstack_version to
yoga in the OpenStackDeployment custom resource during the cloud
deployment.
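For example, in the OpenStackDeployment custom resource:
spec:
  openstack_version: yoga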
Provided the technical preview support for Tungsten Fabric 21.4.
The new version of the Tungsten Fabric networking enables support for the EVPN
type 2 routes for graceful restart and long-lived graceful restart features in
MOSK.
Note
Implementation of the Red Hat Universal Base Image 8 (UBI 8) support
for the Tungsten Fabric container images is under development and will
be released in one of the upcoming product versions.
To start experimenting with the new functionality, set tfVersion to
21.4 in the TFOperator custom resource during the cloud deployment.
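For example, in the TFOperator custom resource; the placement directly under
spec is assumed:
spec:
  tfVersion: "21.4"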
Enabled the application credentials mechanism in the Identity service
for application automation tools to securely authenticate against the
cloud’s API.
Enabled the capability of the OpenStack services to emit notifications in
the Cloud Auditing Data Federation (CADF) format. The CADF notifications
configuration is available through the features:logging:cadf section of
the OpenStackDeployment custom resource.
Implemented the capability to store the OpenStack database backup data
externally. Instead of the default Ceph volume, the cloud operator can
now easily configure the NFS storage backend through the
OpenStackDeployment CR.
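A hedged sketch of an NFS backup backend definition; the pv_nfs key names are
assumptions and the server and path values are placeholders:
spec:
  features:
    database:
      backup:
        backend: pv_nfs
        pv_nfs:
          server: <nfs-server-ip>
          path: <exported-path>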
Implemented the post-update restart of the TF vRouter pods. Previously,
the cloud operator had to manually restart the vRouter pods after updating
the deployment to a newer MOSK version. The update
procedure has been amended accordingly.
Mirantis has tested MOSK against a very specific
configuration and can guarantee a predictable behavior of the product only
in the exact same environments. The table below includes the major
MOSK components with the exact versions against which
testing has been performed.
This section describes the MOSK known issues with available
workarounds. For the known issues in the related version of
Mirantis Container Cloud, refer to Mirantis Container Cloud: Release Notes.
One of the most common symptoms of the high CPU load of MariaDB is slow API
responses. To troubleshoot the issue, verify the CPU consumption of MariaDB
using the General > Kubernetes Pods Grafana dashboard or through
the CLI as follows:
Obtain the resource consumption details for the MariaDB server:
If you are observing a huge difference between the filtered and
r_filtered columns for the query, as in the example of system
response above, analyze the performance of tables by running the
ANALYZE TABLE <TABLE_NAME>; and
ANALYZE TABLE <TABLE_NAME> PERSISTENT FOR ALL; commands:
It is not possible to create an instance that uses a security group shared
through role-based access control (RBAC) by specifying only the network ID
when calling Nova. In such a case, before creating a port in the given network,
Nova verifies whether the given security group exists in Neutron. However, Nova asks
only for the security groups filtered by project_id. Therefore, it will not
get the shared security group back from the Neutron API. For details, see the
OpenStack known issue
#1942615.
If security groups shared through RBAC are used, apply them to ports
only, not to instances directly.
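For illustration, a port can be created with the shared security group and
then passed to the instance; the names and IDs are placeholders:
openstack port create --network <network-id> --security-group <shared-sg-id> shared-sg-port
openstack server create --flavor <flavor> --image <image> --port shared-sg-port <server-name>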
[25124] MPLSoGRE encapsulation has limited throughput¶
Multiprotocol Label Switching over Generic Routing Encapsulation (MPLSoGRE)
provides limited throughput when sending data between VMs, up to 38 Mbps
according to Mirantis tests.
As a workaround, switch the encapsulation type to VXLAN in the
OpenStackDeployment custom resource:
This section lists the Tungsten Fabric known issues with
workarounds for the Mirantis OpenStack for Kubernetes release
22.4. For Tungsten Fabric limitations, see Tungsten Fabric known limitations.
[13755] TF pods switch to CrashLoopBackOff after a simultaneous reboot¶
Rebooting all Cassandra cluster TFConfig or TFAnalytics nodes, maintenance, or
other circumstances that cause the Cassandra pods to start simultaneously may
cause a broken Cassandra TFConfig and/or TFAnalytics cluster.
In this case, Cassandra nodes do not join the ring and do not update the IPs of
the neighbor nodes. As a result, the TF services cannot operate Cassandra
cluster(s).
To verify that a Cassandra cluster is affected:
Run the nodetool status command specifying the config or
analytics cluster and the replica number:
The tf-control service resolves the DNS names of Cassandra pods at startup
and does not update them if Cassandra pods got new IP addresses, for example,
in case of a restart. As a workaround, to refresh the IP addresses of
Cassandra pods, restart the tf-control pods one by one:
kubectl -n tf delete pod tf-control-<hash>
Caution
Before restarting the tf-control pods:
Verify that the new pods are successfully spawned.
Verify that no vRouters are connected to only one tf-control
pod that will be restarted.
After the Container Cloud management cluster upgrade from 2.19.0 to 2.20.0,
the ironic-conductor Pod gets stuck in the CrashLoopBackOff state. The
issue occurs due to the race condition between the ironic-conductor and
ironic-conductor-http containers of the ironic-conductor Pod that try
to use ca-bundle.pem simultaneously but from different users.
After updating the MOSK cluster, MetalLB speaker may
fail to announce the Load Balancer (LB) IP address for the OpenStack Ingress
service. As a result, the OpenStack Ingress service is not accessible using
its LB IP address.
The issue may occur if the MetalLB speaker nodeSelector does not select all
of the nodes selected by the nodeSelector of the OpenStack Ingress service.
The issue may arise and disappear when a new MetalLB speaker is being selected
by the MetalLB Controller to announce the LB IP address.
The issue occurs since MOSK 22.2 after
externalTrafficPolicy was set to local for the OpenStack Ingress
service.
Workaround:
Select from the following options:
Set externalTrafficPolicy to cluster for the OpenStack Ingress
service.
This option is preferable in the following cases:
If not all cluster nodes have connection to the external network
If the connection to the external network cannot be established
If network configuration changes are not desired
If network configuration is allowed and if you require the
externalTrafficPolicy:local option:
Wire the external network to all cluster nodes where the OpenStack Ingress
service Pods are running.
Configure IP addresses in the external network on the nodes and change the
default routes on the nodes.
Change nodeSelector of MetalLB speaker to match nodeSelector
of the OpenStack Ingress service.
[28372] False-positive liveness probe failures for ‘fluentd-notifications’¶
If a cluster does not currently have any ongoing operations that comprise
OpenStack notifications, the fluentd containers in the
fluentd-notifications Pods are frequently restarted due to false-positive
failures of liveness probe and trigger alerts.
Ignore such failures and alerts if the Pods are in the Running state.
To verify the fluentd-notifications Pods:
The following issues have been addressed in the MOSK
22.4 release:
[25349][Update] Fixed the issue causing MOSK cluster
update failure after an OpenStack controller node replacement.
[26278][OpenStack] Fixed the issue with l3-agent being stuck in the
Notready state and routers not being initialized properly during
Neutron restart.
[25447][OpenStack] Fixed the issue that caused a Masakari instance evacuation
to fail if an encrypted volume was attached to a node.
[25448][OpenStack] Fixed the issue that caused some Masakari instances to get
stuck in the Rebuild or Error state when being migrated to a new
OpenStack compute node during host evacuation. The issue occurred on
OpenStack compute nodes with a large number of instances.
[22930][OpenStack] Fixed the issue wherein the provisioning status of
Octavia load balancers, and, occasionally, of the listeners or pools
associated with these load balancers, got stuck in the ERROR,
PENDING_UPDATE, PENDING_CREATE, or PENDING_DELETE state.
[25450][OpenStack] Implemented the capability to enable trusted mode for
SR-IOV ports.
[25316][StackLight] Introduced projects filtering by a domain name for the
default domain to fix the issue wherein a wrong project was chosen by
name in case of multiple projects with the same names.
[24376][Ceph] Implemented the capability to parametrize the RADOS Block
Device (RBD) device map to avoid Ceph volumes being unresponsive due to a
disabled cyclic redundancy check (CRC) mode. Now you can use the
rbdDeviceMapOptions field in the Ceph pool parameters of the
KaaSCephCluster CR to specify custom RBD map options to use with
StorageClass of a corresponding Ceph pool. For details, see
Pool parameters.
[28783] [Ceph] Fixed the issue causing the Ceph condition to get stuck
in the absence of the Ceph cluster secrets information. If you applied the
workaround to your MOSK 22.3 cluster
before the update, remove the version parameter definition from
KaaSCephCluster after the managed cluster update because the Ceph cluster
version in MOSK 22.4 updates to 15.2.17.
This section describes the specific actions you as a Cloud Operator need to
complete to accurately plan and successfully perform your
Mirantis OpenStack for Kubernetes (MOSK) cluster update to version 22.4.
Consider this information as a supplement to the generic update procedure
published in Operations Guide: Update a MOSK cluster.
Additionally, read through the Cluster update known issues for the problems
that are known to occur during update with recommended workarounds.
When updating to MOSK 22.4, the Cloud Operator can easily
determine if a node needs to be rebooted by checking for the
restartRequired flag in the machine status. For details,
see Determine if the node needs to be rebooted.
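A simple way to inspect the flag, assuming access to the management cluster
and the project namespace of the MOSK cluster:
kubectl -n <project-name> get machines -o yaml | grep restartRequired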
Post-upgrade actions¶
Explicitly define the OIDCClaimDelimiter parameter¶
The OIDCClaimDelimiter parameter defines the delimiter to use when setting
multi-valued claims in the HTTP headers. See the MOSK 22.4 OpenStack API
Reference
for details.
The current default value of the OIDCClaimDelimiter parameter is ",".
This value misaligns with the behavior expected by Keystone. As a result, when
creating federation mappings for Keystone, the cloud operator may be forced
to write more complex rules. Therefore, in early 2023, Mirantis will change
the default value for the OIDCClaimDelimiter parameter.
Affected deployments
Proceed with the instruction below only if the following conditions are
true:
Keystone is set to use federation through the OpenID Connect protocol,
with Mirantis Container Cloud Keycloak in particular. The following
configuration is present in your OpenStackDeployment custom resource:
The new default value for the OIDCClaimDelimiter parameter
will be ";". To find out whether your Keystone mappings will need
adjustment after changing the default value, set the parameter to
";" on your staging environment and verify the rules.
Ubuntu 20.04 on OpenStack with OVS and Tungsten Fabric greenfield deployments¶
Implemented full support for Ubuntu 20.04 LTS (Focal Fossa) as the default host
operating system on OpenStack with OVS and OpenStack with Tungsten Fabric
greenfield deployments.
MOSK is now confirmed to be able to run up to 10,000 virtual
machines under a single control plane.
Depending on the cloud workload profile and the number of OpenStack objects in
use, the control plane needs to be extended with additional hardware.
Specifically, for the MOSK clouds that use Open vSwitch as
a backend for the Networking service (OpenStack Neutron) and run more than
12,000 network ports, Mirantis recommends deploying extra tenant gateways.
The maximum size of a MOSK cluster is limited to 500 nodes
in total, regardless of their roles.
Introduced the OpenStackDeploymentSecret custom resource to aggregate
the cloud’s confidential settings such as SSL/TLS certificates, access
credentials for external systems, and other secrets. Previously, the secrets
were stored together with the rest of configuration in the
OpenStackDeployment custom resource.
The following fields have been moved out of the OpenStackDeployment
custom resource:
Switched all OpenStack services to use the built-in policies, aka in-code
policies, to control user access to cloud functions. MOSK
keeps the built-in policies up-to-date with the OpenStack development ensuring
safe by default behavior as well as allowing you to override only those access
rules that you actually need through the features:policies structure in
the OpenStackDeployment custom resource.
Sticking to the default policy set as much as possible simplifies the future
enablement of advanced authentication and access control functionality, such
as scoped tokens and scoped access policies.
Added capability to precache containers’ images on Kubernetes nodes
to minimize possible downtime on the components update. The feature is
enabled by default and can be disabled through the TFOperator custom
resource if required.
Implemented support for custom Docker registries configuration. Using the
ContainerRegistry custom resource, you can configure CA certificates on
machines to access private Docker registries.
Mirantis has tested MOSK against a very specific
configuration and can guarantee a predictable behavior of the product only
in the exact same environments. The table below includes the major
MOSK components with the exact versions against which
testing has been performed.
This section describes the MOSK known issues with available
workarounds. For the known issues in the related version of
Mirantis Container Cloud, refer to Mirantis Container Cloud: Release Notes.
During l3-agent restart, routers may not be initialized properly due to
erroneous logic in Neutron code causing l3-agent to get stuck in the
Notready state. The readiness probe reports that one of the routers is
not ready because the keepalived process has not started.
Example output of the
kubectl -n openstack describe pod <neutron-l3 agent pod name>
command:
It is not possible to create an instance that uses a security group shared
through role-based access control (RBAC) by specifying only the network ID
when calling Nova. In such a case, before creating a port in the given network,
Nova verifies whether the given security group exists in Neutron. However, Nova asks
only for the security groups filtered by project_id. Therefore, it will not
get the shared security group back from the Neutron API. For details, see the
OpenStack known issue
#1942615.
Octavia load balancers provisioning_status may get stuck in the
ERROR, PENDING_UPDATE, PENDING_CREATE, or PENDING_DELETE state.
Occasionally, the listeners or pools associated with these load balancers may
also get stuck in the same state.
Workaround:
For administrative users that have access to the
keystone-client pod:
For non-administrative users, access the Octavia API directly and delete the
affected load balancer using the "force":true argument in the delete
request:
This section lists the Tungsten Fabric known issues with
workarounds for the Mirantis OpenStack for Kubernetes release
22.3. For Tungsten Fabric limitations, see Tungsten Fabric known limitations.
[13755] TF pods switch to CrashLoopBackOff after a simultaneous reboot¶
Rebooting all Cassandra cluster TFConfig or TFAnalytics nodes, maintenance, or
other circumstances that cause the Cassandra pods to start simultaneously may
cause a broken Cassandra TFConfig and/or TFAnalytics cluster.
In this case, Cassandra nodes do not join the ring and do not update the IPs of
the neighbor nodes. As a result, the TF services cannot operate Cassandra
cluster(s).
To verify that a Cassandra cluster is affected:
Run the nodetool status command specifying the config or
analytics cluster and the replica number:
The tf-control service resolves the DNS names of Cassandra pods at startup
and does not update them if Cassandra pods got new IP addresses, for example,
in case of a restart. As a workaround, to refresh the IP addresses of
Cassandra pods, restart the tf-control pods one by one:
kubectl -n tf delete pod tf-control-<hash>
Caution
Before restarting the tf-control pods:
Verify that the new pods are successfully spawned.
Verify that no vRouters are connected to only one tf-control
pod that will be restarted.
The Ceph condition gets stuck in the absence of the Ceph cluster secrets
information. This behavior is observed on the MOSK clusters
that have automatically updated their management cluster to Container Cloud
2.21 but are still running the MOSK 22.3 version.
The list of the symptoms includes:
The Cluster object contains the following condition:
Substitute <managedClusterProject> with the corresponding
managed cluster namespace.
Define the version parameter in the KaaSCephCluster spec:
spec:
  cephClusterSpec:
    version: 15.2.13
Note
Starting from MOSK 22.4, the Ceph cluster
version updates to 15.2.17. Therefore, remove the version parameter
definition from KaaSCephCluster after the managed cluster update.
Save the updated KaaSCephCluster spec.
Find the MiraCeph Custom Resource on a managed cluster and copy all
annotations starting with meta.helm.sh:
Substitute <managedClusterKubeconfig> with a corresponding managed
cluster kubeconfig.
Example of a system output:
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  annotations:
    controller-gen.kubebuilder.io/version: v0.6.0
    # save all annotations with "meta.helm.sh" somewhere
    meta.helm.sh/release-name: ceph-controller
    meta.helm.sh/release-namespace: ceph
...
Create the miracephsecretscrd.yaml file and fill it with the following
template:
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  annotations:
    controller-gen.kubebuilder.io/version: v0.6.0
    <insert all "meta.helm.sh" annotations here>
  labels:
    app.kubernetes.io/managed-by: Helm
  name: miracephsecrets.lcm.mirantis.com
spec:
  conversion:
    strategy: None
  group: lcm.mirantis.com
  names:
    kind: MiraCephSecret
    listKind: MiraCephSecretList
    plural: miracephsecrets
    singular: miracephsecret
  scope: Namespaced
  versions:
  - name: v1alpha1
    schema:
      openAPIV3Schema:
        description: MiraCephSecret aggregates secrets created by Ceph
        properties:
          apiVersion:
            type: string
          kind:
            type: string
          metadata:
            type: object
          status:
            properties:
              lastSecretCheck:
                type: string
              lastSecretUpdate:
                type: string
              messages:
                items:
                  type: string
                type: array
              state:
                type: string
            type: object
        type: object
    served: true
    storage: true
Insert the copied meta.helm.sh annotations to the
metadata.annotations section of the template.
Apply miracephsecretscrd.yaml on the managed cluster:
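For example:
kubectl --kubeconfig <managedClusterKubeconfig> apply -f miracephsecretscrd.yaml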
After the Container Cloud management cluster upgrade from 2.19.0 to 2.20.0,
the ironic-conductor Pod gets stuck in the CrashLoopBackOff state. The
issue occurs due to the race condition between the ironic-conductor and
ironic-conductor-http containers of the ironic-conductor Pod that try
to use ca-bundle.pem simultaneously but from different users.
After an OpenStack controller node replacement, the
octavia-create-resources job does not restart and the Octavia Health
Manager Pod on the new node cannot find its port in the Kubernetes secret. As a
result, MOSK cluster update may fail.
Workaround:
After adding the new OpenStack controller node but before the update process
starts, manually restart the octavia-create-resources job:
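A hedged example of restarting the job by deleting it so that it gets
recreated; the openstack namespace and the assumption that the
openstack-controller recreates the job should be verified:
kubectl -n openstack delete job octavia-create-resources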
After updating the MOSK cluster, MetalLB speaker may
fail to announce the Load Balancer (LB) IP address for the OpenStack Ingress
service. As a result, the OpenStack Ingress service is not accessible using
its LB IP address.
The issue may occur if the MetalLB speaker nodeSelector does not select all
of the nodes selected by the nodeSelector of the OpenStack Ingress service.
The issue may arise and disappear when a new MetalLB speaker is being selected
by the MetalLB Controller to announce the LB IP address.
The issue occurs since MOSK 22.2 after
externalTrafficPolicy was set to local for the OpenStack Ingress
service.
Workaround:
Select from the following options:
Set externalTrafficPolicy to cluster for the OpenStack Ingress
service.
This option is preferable in the following cases:
If not all cluster nodes have connection to the external network
If the connection to the external network cannot be established
If network configuration changes are not desired
If network configuration is allowed and if you require the
externalTrafficPolicy:local option:
Wire the external network to all cluster nodes where the OpenStack Ingress
service Pods are running.
Configure IP addresses in the external network on the nodes and change the
default routes on the nodes.
Change nodeSelector of MetalLB speaker to match nodeSelector
of the OpenStack Ingress service.
[23154] Ceph health is in ‘HEALTH_WARN’ state after managed cluster update¶
After updating the MOSK cluster, Ceph health is in
the HEALTH_WARN state with the SLOW_OPS health message.
The workaround is to restart the affected Ceph Monitors.
The following issues have been addressed in the MOSK
22.3 release:
[23771][Update] Fixed the issue that caused connectivity loss due to a wrong
update order of Neutron services.
[23131][Update] Fixed the issue that caused live migration to fail during an
update of a cluster with encrypted storage. Now, you can perform live
migrations using the following command:
[21790][Update] Fixed the issue wherein the Ceph cluster failed to update on
a managed cluster with the Daemonset csi-rbdplugin is not found error
message.
[23985][OpenStack] Fixed the issue that caused the federated authorization
failure on the Keycloak URL update.
[23484][OpenStack] Configured the default timeouts for ovsdb to prevent
ovsdb session disconnections from blocking port processing operations.
[23297][OpenStack] Fixed the issue that caused VMs to be inaccessible
through floating IPs due to missing iptables rules for floating IPs on the
OpenStack compute DVR router.
[23043][OpenStack] Updated Neutron Open vSwitch to version 2.13 to fix the
issue that caused broken communication between VMs in the same network.
[19065][OpenStack] Fixed the issue that caused Octavia load balancers to lose
Amphora VMs after failover.
[23338][Tungsten Fabric] Fixed the issue wherein the Tungsten Fabric (TF)
tools from the contrail-tools container did not work on DPDK nodes.
[22273][Tungsten Fabric] To avoid issues with CassandraCacheHitRateTooLow
StackLight alerts raising for the tf-cassandra-analytics Pods,
implemented the capability to configure file_cache_size_in_mb for the
tf-cassandra-analytics or tf-cassandra-config Cassandra deployment.
By default, this parameter is set to 512. For details, see
Cassandra configuration.
This section describes the specific actions you as a cloud operator need to
complete to accurately plan and successfully perform your
Mirantis OpenStack for Kubernetes (MOSK) cluster update to version 22.3.
Consider this information as a supplement to the generic update procedure
published in Operations Guide: Update a MOSK cluster.
Additionally, read through the Cluster update known issues for the problems
that are known to occur during update with recommended workarounds.
Features¶
Migrating secrets from OpenStackDeployment to OpenStackDeploymentSecret CR¶
The OpenStackDeploymentSecret custom resource replaced the fields in
the OpenStackDeployment custom resource that used to keep the cloud’s
confidential settings. These include:
After the update, migrate the fields mentioned above from OpenStackDeployment
to OpenStackDeploymentSecret custom resource as follows:
Create an OpenStackDeploymentSecret object with the same name as
the OpenStackDeployment object.
Set the fields in the OpenStackDeploymentSecret custom resource as
required.
Remove the related fields from the OpenStackDeployment custom resource.
Switching to built-in policies for OpenStack services¶
Switched all OpenStack components to built-in policies by default. If you have
any custom policies defined through the features:policies structure in
the OpenStackDeployment custom resource, some API calls may not work as
usual. Therefore, after completing the update, revalidate all the custom
access rules configured for your cloud.
Post-update actions¶
Validation of custom OpenStack policies¶
Revalidate all the custom OpenStack access rules configured through the
features:policies structure in the OpenStackDeployment custom
resource.
To complete the update of a cluster with Tungsten Fabric as a backend for
networking, manually restart Tungsten Fabric vRouter agent Pods on all
compute nodes.
Restart of a vRouter agent on a compute node will cause up to 30-60
seconds of networking downtime per instance hosted there. If downtime is
unacceptable for some workloads, we recommend that you migrate them before
restarting the vRouter Pods.
Warning
Under certain rare circumstances, the reload of the vRouter kernel
module triggered by the restart of a vRouter agent can hang due to
the inability to complete the drop_caches operation. Watch the status
and logs of the vRouter agent being restarted and trigger the reboot of
the node, if necessary.
To restart the vRouter Pods:
Remove the vRouter pods one by one manually.
Note
Manual removal is required because vRouter pods use the
OnDelete update strategy. vRouter pod restart causes networking
downtime for workloads on the affected node. If it is not applicable for
some workloads, migrate them before restarting the vRouter pods.
kubectl -n tf delete pod <VROUTER-POD-NAME>
Verify that all tf-vrouter-* pods have been updated:
kubectl -n tf get ds | grep tf-vrouter
The UP-TO-DATE and CURRENT fields must have the same values.
Exposed the IP addresses of the cloud users that consume API of a cloud to all
user-facing cloud services, such as OpenStack, Ceph, and others. Now, the IP
addresses get recorded in the corresponding logs allowing for easy
troubleshooting and security auditing of the cloud.
Implemented the capability to configure CPU isolation using the cpusets
mechanism in Linux kernel. Configuring CPU isolation using the isolcpus
configuration parameter for Linux kernel is considered deprecated.
Validated MOSK against the upstream
OpenStack Security Checklist.
The default configuration of MOSK services that include
Identity, Dashboard, Compute, Block Storage, and Networking services is
compliant with the security recommendations from the OpenStack community.
Encryption of all the internal communications for MOSK
services will become available in one of the nearest product releases.
Implemented the capability to configure the LoadBalancer type for PowerDNS
through the spec:features:designate definition in the
OpenStackDeployment CR, for example, to expose the TCP protocol instead of
the default UDP, or both.
Added the tf-control-dns-external service to the list of the Tungsten
Fabric configuration options. The service is created by default to expose
the TF control DNS. You can disable the creation of this service using the
enableDNSExternal parameter in the TFOperator CR.
Implemented the initial Technology Preview support for MOSK
deployment on local software-based mdadm Redundant Array of Independent Disks
(RAID) devices of level 10 (raid10) to withstand failure of one device
at a time.
The raid10 RAID type requires at least four storage devices on your
servers, and the total number of devices must be even.
To create and configure RAID, use the softRaidDevices field in
BaremetalHostProfile.
Also, added the capability to create LVM volume groups on top of mdadm-based
RAID devices.
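A hypothetical BaremetalHostProfile fragment illustrating the softRaidDevices
field; the nested device layout is an assumption and the partition references
are placeholders:
spec:
  softRaidDevices:
  - name: /dev/md0
    level: raid10
    devices:
    - partition: <partition-on-disk-1>
    - partition: <partition-on-disk-2>
    - partition: <partition-on-disk-3>
    - partition: <partition-on-disk-4>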
Mirantis has tested MOSK against a very specific
configuration and can guarantee a predictable behavior of the product only
in the exact same environments. The table below includes the major
MOSK components with the exact versions against which
testing has been performed.
This section describes the MOSK known issues with available
workarounds. For the known issues in the related version of
Mirantis Container Cloud, refer to Mirantis Container Cloud: Release Notes.
This section lists the Tungsten Fabric known issues with
workarounds for the Mirantis OpenStack for Kubernetes release
22.2. For Tungsten Fabric limitations, see Tungsten Fabric known limitations.
[13755] TF pods switch to CrashLoopBackOff after a simultaneous reboot¶
Rebooting all Cassandra cluster TFConfig or TFAnalytics nodes, maintenance, or
other circumstances that cause the Cassandra pods to start simultaneously may
cause a broken Cassandra TFConfig and/or TFAnalytics cluster.
In this case, Cassandra nodes do not join the ring and do not update the IPs of
the neighbor nodes. As a result, the TF services cannot operate Cassandra
cluster(s).
To verify that a Cassandra cluster is affected:
Run the nodetool status command specifying the config or
analytics cluster and the replica number:
The tf-control service resolves the DNS names of Cassandra pods at startup
and does not update them if Cassandra pods got new IP addresses, for example,
in case of a restart. As a workaround, to refresh the IP addresses of
Cassandra pods, restart the tf-control pods one by one:
kubectl -n tf delete pod tf-control-<hash>
Caution
Before restarting the tf-control pods:
Verify that the new pods are successfully spawned.
Verify that no vRouters are connected to only one tf-control
pod that will be restarted.
During l3-agent restart, routers may not be initialized properly due to
erroneous logic in Neutron code causing l3-agent to get stuck in the
Notready state. The readiness probe reports that one of the routers is
not ready because the keepalived process has not started.
Example output of the
kubectl -n openstack describe pod <neutron-l3 agent pod name>
command:
It is not possible to create an instance that uses a security group shared
through role-based access control (RBAC) by specifying only the network ID
when calling Nova. In such a case, before creating a port in the given network,
Nova verifies whether the given security group exists in Neutron. However, Nova asks
only for the security groups filtered by project_id. Therefore, it will not
get the shared security group back from the Neutron API. For details, see the
OpenStack known issue
#1942615.
After updating the Keycloak URL in the OpenStackDeployment resource
through the spec.features.keystone.keycloak.url or
spec.features.keystone.keycloak.oidc.OIDCProviderMetadataURL fields,
authentication to Keystone through federated OpenID Connect through Keycloak
stops working returning HTTP 403 Unauthorized on authentication attempt.
The failure occurs because such change is not automatically propagated to the
corresponding Keycloak identity provider, which was automatically created in
Keystone during the initial deployment.
The workaround is to manually update the identity provider’s remote_ids
attribute:
Compare the Keycloak URL set in the OpenStackDeployment resource
with the one set in Keystone identity provider:
kubectl -n openstack get osdpl -o jsonpath='{.items[].spec.features.keystone.keycloak}'
# vs
openstack identity provider show keycloak -f value -c remote_ids
If the URLs do not coincide, update the identity provider in OpenStack with
the correct URL keeping the auth/realms/iam part as shown below.
Otherwise, the problem is caused by something else, and you need to proceed
with further debugging.
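For example, assuming the identity provider is named keycloak and
<keycloak-address> is the address set in the OpenStackDeployment resource:
openstack identity provider set keycloak --remote-id https://<keycloak-address>/auth/realms/iam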
Octavia load balancers provisioning_status may get stuck in the
ERROR, PENDING_UPDATE, PENDING_CREATE, or PENDING_DELETE state.
Occasionally, the listeners or pools associated with these load balancers may
also get stuck in the same state.
Workaround:
For administrative users that have access to the
keystone-client pod:
For non-administrative users, access the Octavia API directly and delete the
affected load balancer using the "force":true argument in the delete
request:
If an Amphora VM does not respond or responds too long to heartbeat requests,
the Octavia load balancer automatically initiates a failover process after 60
seconds of unsuccessful attempts. Long responses of an Amphora VM may be caused
by various events, such as a high load on the OpenStack compute node that hosts
the Amphora VM, network issues, system service updates, and so on. After a
failover, the Amphora VMs may be missing in the completed Octavia load
balancer.
Workaround:
If your deployment is already affected, manually restore the work of the load
balancer by recreating the Amphora VM:
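One possible way to recreate the Amphora VM is to trigger a manual failover
of the affected load balancer; the ID is a placeholder:
openstack loadbalancer failover <loadbalancer-id>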
To avoid an automatic failover start that may cause the issue, set the
heartbeat_timeout parameter in the OpenStackDeployment CR to a large
value in seconds. The default is 60 seconds. For example:
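A hedged sketch of such an override; the nesting under spec:services for the
Octavia health manager options is an assumption and the value is illustrative:
spec:
  services:
    load-balancer:
      octavia:
        values:
          conf:
            octavia:
              health_manager:
                heartbeat_timeout: 7200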
[6912] Octavia load balancers may not work properly with DVR¶
Limitation
When Neutron is deployed in the DVR mode, Octavia load balancers may not
work correctly. The symptoms include both failure to properly balance
traffic and failure to perform an amphora failover. For details, see
DVR incompatibility with ARP announcements and VRRP.
[23154] Ceph health is in ‘HEALTH_WARN’ state after managed cluster update¶
After updating the MOSK cluster, Ceph health is in
the HEALTH_WARN state with the SLOW_OPS health message.
The workaround is to restart the affected Ceph Monitors.
[23771] Connectivity loss due to wrong update order of Neutron services¶
After updating the cluster, simultaneous unordered restart of Neutron L2
and L3, DHCP, and Metadata services leads to the state when ports on
br-int are tagged with valid VLAN tags but with trunks:[4095].
After updating the MOSK cluster, MetalLB speaker may
fail to announce the Load Balancer (LB) IP address for the OpenStack Ingress
service. As a result, the OpenStack Ingress service is not accessible using
its LB IP address.
The issue may occur if the MetalLB speaker nodeSelector does not select all
of the nodes selected by the nodeSelector of the OpenStack Ingress service.
The issue may arise and disappear when a new MetalLB speaker is being selected
by the MetalLB Controller to announce the LB IP address.
The issue occurs since MOSK 22.2 after
externalTrafficPolicy was set to local for the OpenStack Ingress
service.
Workaround:
Select from the following options:
Set externalTrafficPolicy to cluster for the OpenStack Ingress
service.
This option is preferable in the following cases:
If not all cluster nodes have connection to the external network
If the connection to the external network cannot be established
If network configuration changes are not desired
If network configuration is allowed and if you require the
externalTrafficPolicy:local option:
Wire the external network to all cluster nodes where the OpenStack Ingress
service Pods are running.
Configure IP addresses in the external network on the nodes and change the
default routes on the nodes.
Change nodeSelector of MetalLB speaker to match nodeSelector
of the OpenStack Ingress service.
The following issues have been addressed in the MOSK 22.2
release:
[22725][Update] Fixed the issue raising upon managed cluster update and
causing live migration to fail for instances with deleted images.
[18871][Update] Fixed the issue causing MySQL to crash during a managed
cluster update or instances live migration.
[16987][Update] Fixed the issue that caused update of a
MOSK cluster to fail with the ceph csi-driver is
not evacuated yet, waiting… error during the Ceph CSI pod eviction.
[22321][OpenStack] Fixed the issue wherein the Neutron backend option in
OsDpl inadvertently changed from ml2 to tungstenfabric upon
managed cluster update.
[21998][OpenStack] Fixed the issue wherein the OpenStack Controller got
stuck during the managed cluster update.
[21838][OpenStack] Fixed the issue wherein the Designate API failed to log
requests.
[21376][OpenStack] Fixed the issue that caused inability to create
non-encrypted volumes from a large image.
[15354][OpenStack] Implemented coordination between instances of the Masakari
API service to prevent creation of multiple evacuation flows for instances.
[1659][OpenStack] Fixed the issue wherein the Neutron Open vSwitch agent did
not clean tunnels upon changes of the tunnel_ip option.
[20192][StackLight] To avoid issues causing false-positive
CassandraTombstonesTooManyMajor StackLight alerts, adjusted the
thresholds for the CassandraTombstonesTooManyMajor and
CassandraTombstonesTooManyWarning alerts and added a new
CassandraTombstonesTooManyCritical alert.
[21064][Ceph] Fixed the issue causing the MOSK managed
cluster to fail with the Error updating release ceph/ceph-controller error
and enabled Helm v3 for Ceph Controller.
This section describes the specific actions you as a cloud operator need to
complete to accurately plan and successfully perform your
Mirantis OpenStack for Kubernetes (MOSK) cluster update to version 22.2.
Consider this information as a supplement to the generic update procedure
published in Operations Guide: Update a MOSK cluster.
Additionally, read through the Cluster update known issues for the problems
that are known to occur during update with recommended workarounds.
Your MOSK cluster will obtain the newly implemented
capabilities automatically with no significant impact on the update procedure.
Update impact and maintenance windows planning¶
Up to 1 minute of downtime for TF data plane¶
During the Kubernetes master nodes update, there is a downtime on Kubernetes
cluster’s internal DNS service. Thus, Tungsten Fabric vRouter pods lose
connection with the control plane resulting in up to 1 minute of downtime
for the Tungsten Fabric data plane nodes and impact on the tenant networking.
Post-update actions¶
Manual restart of TF vRouter Agent pods¶
To complete the update of the cluster with Tungsten Fabric, manually restart
Tungsten Fabric vRouter Agent pods on all compute nodes. The restart of a
vRouter Agent on a compute node will cause up to 30-60 seconds of networking
downtime per instance hosted there. If downtime is unacceptable for some
workloads, we recommend that you migrate them before restarting the vRouter
pods.
Warning
Under certain rare circumstances, the reload of the vRouter kernel
module triggered by the restart of a vRouter Agent can hang due to
the inability to complete the drop_caches operation. Watch the status
and logs of the vRouter Agent being restarted and trigger the reboot of
the node, if necessary.
The list of the major changes in the component versions includes:
Host OS kernel v5.4
RabbitMQ 3.9
Mirantis Kubernetes Engine (MKE) 3.4.6 with Kubernetes 1.20
All the relevant changes are applied to the MOSK cluster
automatically during the cluster update procedure. The host machine’s kernel
update implies node reboot. See the links below for details.
Implemented the capability to configure the CPU model through the
spec:features:nova:vcpu_type definition of the OpenStackDeployment CR.
The default CPU model is now host-model, which replaces the previous
default kvm64 CPU model.
For deployments with CPU model customized through spec:services, remove
this customization after upgrading your managed cluster.
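For reference, a minimal illustration of the spec:features:nova:vcpu_type path described above, set to the new default value:
spec:
  features:
    nova:
      vcpu_type: host-model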
Automatic backup and restoration of Tungsten Fabric data¶
Implemented the capability to automatically back up and restore the Tungsten
Fabric data stored in Cassandra and ZooKeeper.
The user can perform the automatic data backup by enabling the
tf-dbBackup controller through the Tungsten Fabric Operator CR.
By default, the job is scheduled for weekly execution, allocating PVC
with 5Gi size for storing backups, and keeping 5 previous backups.
Also, MOSK allows for automatic data restoration with
the ability to restore from the exact backup if required.
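For illustration only, enabling the backup job through the Tungsten Fabric Operator custom resource may look similar to the following sketch; the exact field layout can differ between product versions:
spec:
  controllers:
    tf-dbBackup:
      enabled: true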
Implemented the Border Gateway Protocol (BGP) and encapsulation settings in the
Tungsten Fabric Operator custom resource. This feature provides persistency of
the BGP and encapsulation parameters.
Also, added technical preview of the VxLAN encapsulation feature.
Implemented object storage encryption integrated with
the OpenStack Key Manager service (Barbican). The feature is enabled by
default in MOSK deployments with Barbican.
Mirantis has tested MOSK against a very specific
configuration and can guarantee a predictable behavior of the product only
in the exact same environments. The table below includes the major
MOSK components with the exact versions against which
testing has been performed.
This section describes the MOSK known issues with available
workarounds. For the known issues in the related version of
Mirantis Container Cloud, refer to Mirantis Container Cloud: Release Notes.
Tungsten Fabric does not provide the following functionality:
Automatic generation of network port records in DNSaaS
(Designate) as Neutron with Tungsten Fabric as a backend
is not integrated with DNSaaS. As a workaround, you can use
the Tungsten Fabric built-in DNS service that enables virtual
machines to resolve each other's names.
Secret management (Barbican). You cannot use the certificates
stored in Barbican to terminate HTTPS in a load balancer.
Role Based Access Control (RBAC) for Neutron objects.
[10096] tf-control does not refresh IP addresses of Cassandra pods¶
The tf-control service resolves the DNS names of Cassandra pods at startup
and does not update them if Cassandra pods got new IP addresses, for example,
in case of a restart. As a workaround, to refresh the IP addresses of
Cassandra pods, restart the tf-control pods one by one:
kubectl -n tf delete pod tf-control-<hash>
Caution
Before restarting the tf-control pods:
Verify that the new pods are successfully spawned.
Verify that no vRouters are connected to only one tf-control
pod that will be restarted.
[13755] TF pods switch to CrashLoopBackOff after a simultaneous reboot¶
Rebooting all Cassandra cluster TFConfig or TFAnalytics nodes, maintenance, or
other circumstances that cause the Cassandra pods to start simultaneously may
cause a broken Cassandra TFConfig and/or TFAnalytics cluster.
In this case, Cassandra nodes do not join the ring and do not update the IPs of
the neighbor nodes. As a result, the TF services cannot operate Cassandra
cluster(s).
To verify that a Cassandra cluster is affected:
Run the nodetool status command specifying the config or
analytics cluster and the replica number:
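A typical invocation runs nodetool inside one of the Cassandra pods; the pod and container names below are placeholders that depend on your deployment:
kubectl -n tf exec -it <tf-cassandra-config-or-analytics-pod> -c cassandra -- nodetool status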
[15684] Pods fail when rolling Tungsten Fabric 2011 back to 5.1¶
Some tf-control and tf-analytics pods may fail during the Tungsten
Fabric rollback from version 2011 to 5.1. In this case, the control
container from the tf-control pod and/or the collector container from
the tf-analytics pod contain SYS_WARN messages such as
… AMQP_QUEUE_DELETE_METHOD caused: PRECONDITION_FAILED - queue
‘<contrail-control/contrail-collector>.<nodename>’ in vhost ‘/’ not empty
….
The workaround is to manually delete the queue that fails to be deleted by
AMQP_QUEUE_DELETE_METHOD:
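For illustration only, the queue can be deleted from inside the Tungsten Fabric RabbitMQ pod; the pod name is a placeholder, and the availability of the delete_queue command depends on the RabbitMQ version in use:
kubectl -n tf exec -it <tf-rabbitmq-pod> -- rabbitmqctl delete_queue '<contrail-control/contrail-collector>.<nodename>'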
It is not possible to create an instance that uses a security group shared
through role-based access control (RBAC) by specifying only the network ID
when calling Nova. In this case, before creating a port in the given network,
Nova verifies whether the given security group exists in Neutron. However, Nova
requests only the security groups filtered by project_id and therefore does not
get the shared security group back from the Neutron API. For details, see the
OpenStack known issue
#1942615.
After updating the Keycloak URL in the OpenStackDeployment resource
through the spec.features.keystone.keycloak.url or
spec.features.keystone.keycloak.oidc.OIDCProviderMetadataURL fields,
authentication to Keystone through federated OpenID Connect with Keycloak
stops working and returns HTTP 403 Unauthorized on authentication attempts.
The failure occurs because such change is not automatically propagated to the
corresponding Keycloak identity provider, which was automatically created in
Keystone during the initial deployment.
The workaround is to manually update the identity provider’s remote_ids
attribute:
Compare the Keycloak URL set in the OpenStackDeployment resource
with the one set in Keystone identity provider:
kubectl -n openstack get osdpl -o jsonpath='{.items[].spec.features.keystone.keycloak}'

# vs

openstack identity provider show keycloak -f value -c remote_ids
If the URLs do not coincide, update the identity provider in OpenStack with
the correct URL keeping the auth/realms/iam part as shown below.
Otherwise, the problem is caused by something else, and you need to proceed
with the debugging.
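A hedged example of such an update using the OpenStack CLI, where <keycloak-url> stands for the new Keycloak URL:
openstack identity provider set keycloak --remote-id <keycloak-url>/auth/realms/iam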
[6912] Octavia load balancers may not work properly with DVR¶
Limitation
When Neutron is deployed in the DVR mode, Octavia load balancers may not
work correctly. The symptoms include both failure to properly balance
traffic and failure to perform an amphora failover. For details, see
DVR incompatibility with ARP announcements and VRRP.
[19065] Octavia load balancers lose Amphora VMs after failover¶
If an Amphora VM does not respond or responds too long to heartbeat requests,
the Octavia load balancer automatically initiates a failover process after 60
seconds of unsuccessful attempts. Long responses of an Amphora VM may be caused
by various events, such as a high load on the OpenStack compute node that hosts
the Amphora VM, network issues, system service updates, and so on. After a
failover, the Amphora VMs may be missing in the completed Octavia load
balancer.
Workaround:
If your deployment is already affected, manually restore the work of the load
balancer by recreating the Amphora VM:
To avoid an automatic failover start that may cause the issue, set the
heartbeat_timeout parameter in the OpenStackDeployment CR to a large
value in seconds. The default is 60 seconds. For example:
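A plausible sketch that passes the option through the Octavia service configuration; the exact nesting under spec:services is an assumption:
spec:
  services:
    load-balancer:
      octavia:
        values:
          conf:
            octavia:
              health_manager:
                heartbeat_timeout: 1000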
During the update of a MOSK cluster to 22.1, live
migration may fail for instances if their images were previously deleted. In
this case, the nova-compute pod contains an error message similar to the
following one:
2022-03-22 23:55:24.468 11816 ERROR nova.compute.manager [instance: 128cf508-f7f7-4a40-b742-392c8c80fc7d] Command: scp -C -r kaas-node-03ab613d-cf79-4830-ac70-ed735453481a:/var/lib/nova/instances/_base/e2b6c1622d45071ec8a88a41d07ef785e4dfdfe8 /var/lib/nova/instances/_base/e2b6c1622d45071ec8a88a41d07ef785e4dfdfe8
2022-03-22 23:55:24.468 11816 ERROR nova.compute.manager [instance: 128cf508-f7f7-4a40-b742-392c8c80fc7d] Exit code: 1
2022-03-22 23:55:24.468 11816 ERROR nova.compute.manager [instance: 128cf508-f7f7-4a40-b742-392c8c80fc7d] Stdout: ''
2022-03-22 23:55:24.468 11816 ERROR nova.compute.manager [instance: 128cf508-f7f7-4a40-b742-392c8c80fc7d] Stderr: 'ssh: Could not resolve hostname kaas-node-03ab613d-cf79-4830-ac70-ed735453481a: Name or service not known\r\n'
Workaround:
If you have not yet started the managed cluster update, change the
nova-compute image by setting the following metadata in the
OpenStackDeployment CR:
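A hypothetical sketch that pins the nova-compute image through the service-level overrides, using the image referenced in the next step; the exact nesting is an assumption:
spec:
  services:
    compute:
      nova:
        values:
          images:
            tags:
              nova_compute: mirantis.azurecr.io/openstack/nova:victoria-bionic-20220324125700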
If you have already started the managed cluster update, manually update the
nova-compute container image in the nova-compute DaemonSet to
mirantis.azurecr.io/openstack/nova:victoria-bionic-20220324125700.
[16987] Cluster update fails at Ceph CSI pod eviction¶
An update of a MOSK cluster may fail with the
ceph csi-driver is not evacuated yet, waiting… error during the Ceph
CSI pod eviction.
Workaround:
Scale the affected StatefulSet of the pod that fails to init down to
0 replicas. If it is a DaemonSet, such as nova-compute, it must
not be scheduled on the affected node.
On every csi-rbdplugin pod, search for stuck csi-vol:
Scale the affected StatefulSet back to the original number of replicas
or until its state is Running. If it is a DaemonSet, run the pod on
the affected node again.
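For illustration, the scaling steps above map to standard kubectl commands; the namespace and resource names are placeholders:
kubectl -n <namespace> scale statefulset <affected-statefulset> --replicas=0
kubectl -n <namespace> scale statefulset <affected-statefulset> --replicas=<original-number>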
[18871] MySQL crashes during managed cluster update or instances live migration¶
MySQL may crash when performing instances live migration or during a managed
cluster update. After the crash, MariaDB cannot connect to the cluster and gets
stuck in the CrashLoopBackOff state.
Workaround:
Verify that other MariaDB replicas are up and running and have joined the
cluster:
Verify that at least 2 pods are running and operational
(2/2 and Running):
kubectl -n openstack get pods | grep maria
Example of system response where the pods mariadb-server-0 and
mariadb-server-2 are operational:
In the openstack-controller logs, retries are exhausted:
2022-02-17 18:41:43,317 [ERROR] kopf._core.engines.peering: Request attempt #8 failed; will retry: PATCH https://10.232.0.1:443/apis/zalando.org/v1/namespaces/openstack/kopfpeerings/openstack-controller.nodemaintenancerequest -> APIServerError('Internal error occurred: unable to unmarshal response in forceLegacy: json: cannot unmarshal number into Go value of type bool', {'kind': 'Status', 'apiVersion': 'v1', 'metadata': {}, 'status': 'Failure', 'message': 'Internal error occurred: unable to unmarshal response in forceLegacy: json: cannot unmarshal number into Go value of type bool', 'reason': 'InternalError', 'details': {'causes': [{'message': 'unable to unmarshal response in forceLegacy: json: cannot unmarshal number into Go value of type bool'}]}, 'code': 500})
2022-02-17 18:42:50,834 [INFO] kopf.objects: Timer 'heartbeat' succeeded.
2022-02-17 18:47:50,848 [INFO] kopf.objects: Timer 'heartbeat' succeeded.
2022-02-17 18:52:50,853 [INFO] kopf.objects: Timer 'heartbeat' succeeded.
2022-02-17 18:57:50,858 [INFO] kopf.objects: Timer 'heartbeat' succeeded.
2022-02-17 19:02:50,862 [INFO] kopf.objects: Timer 'heartbeat' succeeded.
Notification about a successful finish does not exist:
[22321] Neutron backend may change from TF to ML2¶
An update of the MOSK cluster with Tungsten Fabric
may hang due to the changed Neutron backend with the following symptoms:
The libvirt and nova-compute pods fail to start:
Entrypoint WARNING: 2022/03/03 08:49:45 entrypoint.go:72: Resolving dependency Pod on same host with labels map[application:neutron component:neutron-ovs-agent] in namespace openstack failed: Found no pods matching labels: map[application:neutron component:neutron-ovs-agent].
In the OSDPL network section, the ml2 backend is specified instead
of tungstenfabric:
spec:
  features:
    neutron:
      backend: ml2
As a workaround, change the backend option from ml2 to
tungstenfabric:
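For example, edit the OpenStackDeployment object so that the network backend reads as follows:
spec:
  features:
    neutron:
      backend: tungstenfabric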
The following issues have been addressed in the MOSK 22.1
release:
[16495][OpenStack] Fixed the issue with Kubernetes not rescheduling OpenStack
deployment pods after a node recovery.
[18713][OpenStack] Fixed the issue causing inability to remove a Glance image
after an unsuccessful instance spawn attempt when image signature
verification was enabled but the signature on the image was incorrect.
[19274][OpenStack] Fixed the issue causing inability to create a Heat stack
by specifying the Heat template as a URL link due to Horizon container
missing proxy variables.
[19791][OpenStack] Fixed the issue with the DEFAULT volume type being
automatically created, as well as listed as a volume type in Horizon, for
database migration even if default_volume_type was set in
cinder.conf.
[18829][StackLight] Fixed the issue causing the Prometheus exporter for the
Tungsten Fabric Controller pod to fail to start upon the StackLight log level
change.
[18879][Ceph] Fixed the issue with the RADOS Gateway (RGW) pod overriding the
global CA bundle located at /etc/pki/tls/certs with an incorrect
self-signed CA bundle during deployment of a Ceph cluster.
[19195][Tungsten Fabric] Fixed the issue causing the managed cluster status
to flap between the Ready/NotReady states in the Container Cloud
web UI.
This section describes the specific actions you as a cloud operator need to
complete to accurately plan and successfully perform the update of your
Mirantis OpenStack for Kubernetes (MOSK) cluster to version 22.1.
Consider this information as a supplement to the generic update procedure
published in Operations Guide: Update a MOSK cluster.
Additionally, read through the Cluster update known issues for the problems
that are known to occur during update with recommended workarounds.
Starting from MOSK 22.1, the virtual CPU mode is set to
host-model by default, which replaces the previous default kvm64
CPU model.
The new default option provides performance and workload portability, namely
reliable live and cold migration of instances between hosts, and ability
to run modern guest operating systems, such as Windows Server.
For deployments with the virtual CPU mode settings customized through
spec:services, remove this customization in favor of the defaults
after the update.
Update impact and maintenance windows planning¶
Host OS kernel version upgrade to v5.4¶
MOSK 22.1 includes the updated version of the host
machine’s kernel that is v5.4. All nodes in the cluster will get restarted
to apply the relevant changes.
Node group                Sequential restart   Impact on end users and workloads
Kubernetes master nodes   Yes                  No
Control plane nodes       Yes                  No
Storage nodes             Yes                  No
Compute nodes             Yes                  15-20 minutes of downtime for workloads hosted on a compute node, depending on the hardware specifications of the node
During the Kubernetes master nodes update, there is a downtime on Kubernetes
cluster’s internal DNS service. Thus, Tungsten Fabric vRouter pods lose
connection with the control plane resulting in up to 1 minute of downtime
for the Tungsten Fabric data plane nodes and impact on the tenant networking.
Post-update actions¶
Manual restart of TF vRouter Agent pods¶
To complete the update of the cluster with Tungsten Fabric, manually restart
Tungsten Fabric vRouter Agent pods on all compute nodes. The restart of a
vRouter Agent on a compute node will cause up to 30-60 seconds of networking
downtime per instance hosted there. If downtime is unacceptable for some
workloads, we recommend that you migrate them before restarting the vRouter
pods.
Warning
Under certain rare circumstances, the reload of the vRouter kernel
module triggered by the restart of a vRouter Agent is known to hang due to
the inability to complete the drop_caches operation. Watch the status
and logs of the vRouter Agent being restarted and trigger the reboot of
the node if necessary.
Periodic automatic cleanup of OpenStack databases¶
Implemented an automatic cleanup of deleted entries from the databases of
OpenStack services. By default, the deleted rows older than 30 days are sanely
and safely purged from the Barbican, Cinder, Glance, Heat, Masakari, and Nova
databases for all relevant tables.
Implemented the capability to perform image signature verification when
booting an OpenStack instance, uploading a Glance image with signature
metadata fields set, and creating a volume from an image.
Implemented the ability to set the kv_mountpoint and namespace
in spec:features:barbican to specify the mountpoint of a Key-Value
store and the Vault namespace to be used for all requests to Vault
respectively.
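As an illustration only, such settings could look similar to the following sketch; the exact nesting under spec:features:barbican, for example whether the keys sit under a dedicated Vault backend subsection, may differ:
spec:
  features:
    barbican:
      kv_mountpoint: secret
      namespace: <vault-namespace>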
Implemented the capability for the Tungsten Fabric Operator to use
ValidatingAdmissionWebhook to validate environment variables set to
Tungsten Fabric components upon the TFOperator object creation or update.
Tungsten Fabric 2011 is now set as the default version for deployment.
Tungsten Fabric 5.1 is considered deprecated and will be declared unsupported
in one of the upcoming releases. Therefore, Mirantis highly recommends
upgrading from Tungsten Fabric 5.1 to 2011.
Implemented the capability to deploy a MOS cluster with Tungsten Fabric
using a multi-rack architecture to allow for native integration with
Layer 3-centric networking topologies.
Mirantis has tested MOS against a very specific
configuration and can guarantee a predictable behavior of the product only
in the exact same environments. The table below includes the major
MOS components with the exact versions against which
testing has been performed.
This section describes the MOS known issues with available
workarounds. For the known issues in the related version of
Mirantis Container Cloud, refer to Mirantis Container Cloud: Release Notes.
Tungsten Fabric does not provide the following functionality:
Automatic generation of network port records in DNSaaS
(Designate) as Neutron with Tungsten Fabric as a backend
is not integrated with DNSaaS. As a workaround, you can use
the Tungsten Fabric built-in DNS service that enables virtual
machines to resolve each other's names.
Secret management (Barbican). You cannot use the certificates
stored in Barbican to terminate HTTPS in a load balancer.
Role Based Access Control (RBAC) for Neutron objects.
Modification of custom vRouter DaemonSets based on the SR-IOV definition in
the OsDpl CR.
[10096] tf-control does not refresh IP addresses of Cassandra pods¶
The tf-control service resolves the DNS names of Cassandra pods at startup
and does not update them if Cassandra pods got new IP addresses, for example,
in case of a restart. As a workaround, to refresh the IP addresses of
Cassandra pods, restart the tf-control pods one by one:
Caution
Before restarting the tf-control pods:
Verify that the new pods are successfully spawned.
Verify that no vRouters are connected to only one tf-control
pod that will be restarted.
kubectl -n tf delete pod tf-control-<hash>
[13755] TF pods switch to CrashLoopBackOff after a simultaneous reboot¶
Rebooting all Cassandra cluster TFConfig or TFAnalytics nodes, maintenance, or
other circumstances that cause the Cassandra pods to start simultaneously may
cause a broken Cassandra TFConfig and/or TFAnalytics cluster.
In this case, Cassandra nodes do not join the ring and do not update the IPs of
the neighbor nodes. As a result, the TF services cannot operate Cassandra
cluster(s).
To verify that a Cassandra cluster is affected:
Run the nodetool status command specifying the config or
analytics cluster and the replica number:
[15684] Pods fail when rolling Tungsten Fabric 2011 back to 5.1¶
Some tf-control and tf-analytics pods may fail during the Tungsten
Fabric rollback from version 2011 to 5.1. In this case, the control
container from the tf-control pod and/or the collector container from
the tf-analytics pod contain SYS_WARN messages such as
… AMQP_QUEUE_DELETE_METHOD caused: PRECONDITION_FAILED - queue
‘<contrail-control/contrail-collector>.<nodename>’ in vhost ‘/’ not empty
….
The workaround is to manually delete the queue that fails to be deleted by
AMQP_QUEUE_DELETE_METHOD:
During LCM operations such as Tungsten Fabric update or upgrade, the following
parameters defined by the cluster administrator are reset to the following
defaults upon the tf-config pod restart:
BGP_ASN to 64512
ENCAP_PRIORITY to MPLSoUDP,MPLSoGRE,VXLAN
VXLAN_VN_ID_MODE to automatic
As a workaround, manually set up values for the required parameters if they
differ from the defaults:
The status of a managed cluster may be flapping between the Ready and
NotReady states in the Container Cloud web UI. In this case, if the
cluster Status field includes a message about not ready
tf/tf-tool-status-aggregator and/or tf-tool-status-party deployments
with 1/1 replicas, the status flapping may be caused by frequent updates of
these deployments by the Tungsten Fabric Operator.
Workaround:
Verify whether the tf/tf-tool-status-aggregator and
tf-tool-status-party deployments are up and running:
kubectl -n tf get deployments
Safely disable the tf/tf-tool-status-aggregator and
tf-tool-status-party deployments through the TFOperator CR:
It is not possible to create an instance that uses a security group shared
through role-based access control (RBAC) by specifying only the network ID
when calling Nova. In this case, before creating a port in the given network,
Nova verifies whether the given security group exists in Neutron. However, Nova
requests only the security groups filtered by project_id and therefore does not
get the shared security group back from the Neutron API. For details, see the
OpenStack known issue
#1942615.
After updating the Keycloak URL in the OpenStackDeployment resource
through the spec.features.keystone.keycloak.url or
spec.features.keystone.keycloak.oidc.OIDCProviderMetadataURL fields,
authentication to Keystone through federated OpenID Connect with Keycloak
stops working and returns HTTP 403 Unauthorized on authentication attempts.
The failure occurs because such change is not automatically propagated to the
corresponding Keycloak identity provider, which was automatically created in
Keystone during the initial deployment.
The workaround is to manually update the identity provider’s remote_ids
attribute:
Compare the Keycloak URL set in the OpenStackDeployment resource
with the one set in Keystone identity provider:
kubectl -n openstack get osdpl -o jsonpath='{.items[].spec.features.keystone.keycloak}'

# vs

openstack identity provider show keycloak -f value -c remote_ids
If the URLs do not coincide, update the identity provider in OpenStack with
the correct URL keeping the auth/realms/iam part as shown below.
Otherwise, the problem is caused by something else, and you need to proceed
with the debugging.
[6912] Octavia load balancers may not work properly with DVR¶
Limitation
When Neutron is deployed in the DVR mode, Octavia load balancers may not work
correctly. The symptoms include both failure to properly balance traffic and
failure to perform an amphora failover. For details, see DVR incompatibility with ARP announcements and VRRP.
[16495] Failure to reschedule OpenStack deployment pods after a node recovery¶
Kubernetes does not reschedule OpenStack deployment pods after a node recovery.
As a workaround, delete all pods of the deployment.
[18713] Inability to remove a Glance image after an unsuccessful instance spawn¶
When image signature verification is enabled and the signature is incorrect on
the image, it is impossible to remove a Glance image right after an
unsuccessful instance spawn attempt. As a workaround, wait for at least one
minute before trying to remove the image.
[19274] Inability to create a Heat stack by specifying the Heat template as a URL¶
Horizon container is missing proxy variables. As a result, it is not possible
to create a Heat stack by specifying the Heat template as a URL link. As a
workaround, use a different upload method and specify the file from a local
folder.
[19065] Octavia load balancers lose Amphora VMs after failover¶
If an Amphora VM does not respond or responds too long to heartbeat requests,
the Octavia load balancer automatically initiates a failover process after 60
seconds of unsuccessful attempts. Long responses of an Amphora VM may be caused
by various events, such as a high load on the OpenStack compute node that hosts
the Amphora VM, network issues, system service updates, and so on. After a
failover, the Amphora VMs may be missing in the completed Octavia load
balancer.
Workaround:
If your deployment is already affected, manually restore the work of the load
balancer by recreating the Amphora VM:
To avoid an automatic failover start that may cause the issue, set the
heartbeat_timeout parameter in the OpenStackDeployment CR to a large
value in seconds. The default is 60 seconds. For example:
An update of a MOS cluster may fail with the
ceph csi-driver is not evacuated yet, waiting… error during the Ceph
CSI pod eviction.
Workaround:
Scale the affected StatefulSet of the pod that fails to init down to
0 replicas. If it is a DaemonSet, such as nova-compute, it must
not be scheduled on the affected node.
On every csi-rbdplugin pod, search for stuck csi-vol:
Scale the affected StatefulSet back to the original number of replicas
or until its state is Running. If it is a DaemonSet, run the pod on
the affected node again.
[18871] MySQL crashes during managed cluster update or instances live migration¶
MySQL may crash when performing instances live migration or during an update of
a managed cluster running MOS from version 6.19.0 to 6.20.0. After the crash,
MariaDB cannot connect to the cluster and gets stuck in the
CrashLoopBackOff state.
Workaround:
Verify that other MariaDB replicas are up and running and have joined the
cluster:
Verify that at least 2 pods are running and operational
(2/2 and Running):
kubectl -n openstack get pods | grep maria
Example of system response where the pods mariadb-server-0 and
mariadb-server-2 are operational:
During deployment of a Ceph cluster, the RADOS Gateway (RGW) pod overrides
the global CA bundle located at /etc/pki/tls/certs with an incorrect
self-signed CA bundle. The issue affects only clusters with public
certificates.
Workaround:
Open the KaasCephCluster CR of a managed cluster for editing:
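The command is typically similar to the following:
kubectl -n <managedClusterProjectName> edit kaascephcluster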
Substitute <managedClusterProjectName> with a corresponding value.
Note
If the public CA certificates also apply to the OsDpl CR,
edit this resource as well.
Select from the following options:
If you are using the GoDaddy certificates, in the
cephClusterSpec.objectStorage.rgw section, replace the
cacert parameters with your public CA certificate that already
contains both the root CA certificate and intermediate CA certificate:
The following issues have been addressed in the Mirantis OpenStack for
Kubernetes 21.6 release:
[16452][OpenStack] Fixed the issue causing failure to update the Octavia
policy after policies removal from the OsDpl CR. The issue affected OpenStack
Victoria.
[16103][OpenStack] Fixed the issue causing Glance client to return the
HTTPInternalServerError while operating with volume when Glance was
configured with the Cinder backend TechPreview.
[17321][OpenStack] Fixed the issue causing RPC errors in the logs of the
designate-central pods during liveness probes.
[17927][OpenStack] Fixed the issue causing inability to delete volume backups
created from encrypted volumes.
[18029][OpenStack] Fixed the issue with live migration of instances with
SR-IOV macvtap ports occasionally requiring the same virtual functions
(VF) number to be free on the destination compute nodes.
[18744][OpenStack] Fixed the issue with the application_credential
authentication method being disabled in Keystone in case of an enabled
Keycloak integration.
[17246][StackLight] Deprecated the redundant openstack.externalFQDN
(string) parameter and added the new externalFQDNs.enabled (bool)
parameter.
Implemented a machine-readable status for OpenStack deployments. Now, you can
use the OpenStackDeploymentStatus (OsDplSt) custom resource as a single
data structure that describes the OpenStackDeployment (OsDpl) status at a
particular moment.
Implemented the capability to enable Cinder volume encryption through the
OpenStackDeployment CR using Barbican that will store the encryption keys
and Linux Unified Key Setup (LUKS) that will create encrypted Cinder volumes
including bootable ones. If an encrypted volume is bootable, Nova will get a
symmetric encryption key from Barbican.
Implemented full support for Tungsten Fabric 2011. Though Tungsten Fabric 5.1
is deployed by default in MOS 21.5, you can use the
tfVersion parameter to define the 2011 version for deployment.
Implemented support for multiple workers of the contrail-api in Tungsten
Fabric. Starting from the MOS 21.5 release, six workers
of the contrail-api
service are used by default. In the previous MOS releases,
only one worker of this service was used.
Short names for Kubernetes nodes in Grafana dashboards¶
Enhanced the Grafana dashboards to display user-friendly short names for
Kubernetes nodes, for example, master-0, instead of long name labels
such as kaas-node-f736fc1c-3baa-11eb-8262-0242ac110002.
This feature provides for consistency with Kubernetes nodes naming in the
Mirantis Container Cloud web UI.
All Grafana dashboards that present node data now have an additional
Node identifier drop-down menu. By default, it is set to
machine to display short names for Kubernetes nodes. To display
Kubernetes node name labels as previously, change this option to
node.
Implemented the following improvements to StackLight alerting:
Added the OpenstackServiceInternalApiOutage and
OpenstackServicePublicApiOutage alerts that raise in case of an OpenStack
service internal or public API outage.
Enhanced the alert inhibition rules.
Reworked a number of alerts to improve alerting efficiency and reduce alert
flooding.
Removed the inefficient OpenstackServiceApiDown and
OpenstackServiceApiOutage alerts.
Published MOS Release Compatibility Matrix that describes
the cloud configurations that have been supported by the product over the
course of its lifetime and the path a MOS cloud can
take to move from an older configuration to a newer one.
Published the OpenStack Ussuri to Victoria upgrade procedure and Tungsten
Fabric 5.1 to 2011 upgrade procedure that instruct cloud operators on how
to prepare for the upgrade, use the MOS life cycle
management API to perform the upgrade, and verify the cloud operability
after the upgrade.
Mirantis has tested MOS against a very specific
configuration and can guarantee a predictable behavior of the product only in
the exact same environments. The table below includes the major
MOS components with the exact versions against which
testing has been performed.
This section describes the MOS known issues with available
workarounds. For the known issues in the related version of
Mirantis Container Cloud, refer to Mirantis Container Cloud: Release Notes.
Tungsten Fabric does not provide the following functionality:
Automatic generation of network port records in DNSaaS
(Designate) as Neutron with Tungsten Fabric as a backend
is not integrated with DNSaaS. As a workaround, you can use
the Tungsten Fabric built-in DNS service that enables virtual
machines to resolve each other's names.
Secret management (Barbican). You cannot use the certificates
stored in Barbican to terminate HTTPS in a load balancer.
Role Based Access Control (RBAC) for Neutron objects.
Modification of custom vRouter DaemonSets based on the SR-IOV definition in
the OsDpl CR.
[10096] tf-control does not refresh IP addresses of Cassandra pods¶
The tf-control service resolves the DNS names of Cassandra pods at startup
and does not update them if Cassandra pods got new IP addresses, for example,
in case of a restart. As a workaround, to refresh the IP addresses of
Cassandra pods, restart the tf-control pods one by one:
Caution
Before restarting the tf-control pods:
Verify that the new pods are successfully spawned.
Verify that no vRouters are connected to only one tf-control
pod that will be restarted.
kubectl -n tf delete pod tf-control-<hash>
[13755] TF pods switch to CrashLoopBackOff after a simultaneous reboot¶
Rebooting all Cassandra cluster TFConfig or TFAnalytics nodes, maintenance, or
other circumstances that cause the Cassandra pods to start simultaneously may
cause a broken Cassandra TFConfig and/or TFAnalytics cluster.
In this case, Cassandra nodes do not join the ring and do not update the IPs of
the neighbor nodes. As a result, the TF services cannot operate Cassandra
cluster(s).
To verify that a Cassandra cluster is affected:
Run the nodetool status command specifying the config or
analytics cluster and the replica number:
[15684] Pods fail when rolling Tungsten Fabric 2011 back to 5.1¶
Some tf-control and tf-analytics pods may fail during the Tungsten
Fabric rollback from version 2011 to 5.1. In this case, the control
container from the tf-control pod and/or the collector container from
the tf-analytics pod contain SYS_WARN messages such as
… AMQP_QUEUE_DELETE_METHOD caused: PRECONDITION_FAILED - queue
‘<contrail-control/contrail-collector>.<nodename>’ in vhost ‘/’ not empty ….
The workaround is to manually delete the queue that fails to be deleted by
AMQP_QUEUE_DELETE_METHOD:
During LCM operations such as Tungsten Fabric update or upgrade, the following
parameters defined by the cluster administrator are reset to the following
defaults upon the tf-config pod restart:
BGP_ASN to 64512
ENCAP_PRIORITY to MPLSoUDP,MPLSoGRE,VXLAN
VXLAN_VN_ID_MODE to automatic
As a workaround, manually set up values for the required parameters if they
differ from the defaults:
[6912] Octavia load balancers may not work properly with DVR¶
Limitation
When Neutron is deployed in the DVR mode, Octavia load balancers may not work
correctly. The symptoms include both failure to properly balance traffic and
failure to perform an amphora failover. For details, see DVR incompatibility with ARP announcements and VRRP.
[16495] Failure to reschedule OpenStack deployment pods after a node recovery¶
Kubernetes does not reschedule OpenStack deployment pods after a node
recovery.
As a workaround, delete all pods of the deployment:
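A hypothetical example that deletes the pods of one affected deployment by label; the label selector is illustrative:
kubectl -n openstack delete pod -l application=<application-name>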
When Glance is configured with the Cinder backend TechPreview, the
Glance client may return the HTTPInternalServerError error while operating
with volume. In this case, repeat the action again until it succeeds.
[19065] Octavia load balancers lose Amphora VMs after failover¶
If an Amphora VM does not respond or responds too long to heartbeat requests,
the Octavia load balancer automatically initiates a failover process after 60
seconds of unsuccessful attempts. Long responses of an Amphora VM may be caused
by various events, such as a high load on the OpenStack compute node that hosts
the Amphora VM, network issues, system service updates, and so on. After a
failover, the Amphora VMs may be missing in the completed Octavia load
balancer.
Workaround:
If your deployment is already affected, manually restore the work of the load
balancer by recreating the Amphora VM:
To avoid an automatic failover start that may cause the issue, set the
heartbeat_timeout parameter in the OpenStackDeployment CR to a large
value in seconds. The default is 60 seconds. For example:
An update of a MOS cluster may fail with
the ceph csi-driver is not evacuated yet, waiting… error during the
Ceph CSI pod eviction.
Workaround:
Scale the affected StatefulSet of the pod that fails to init down to
0 replicas. If it is a DaemonSet, such as nova-compute, it must
not be scheduled on the affected node.
On every csi-rbdplugin pod, search for stuck csi-vol:
Scale the affected StatefulSet back to the original number of replicas
or until its state is Running. If it is a DaemonSet, run the pod on
the affected node again.
[18879] The RGW pod overrides the global CA bundle with an incorrect mount¶
During deployment of a Ceph cluster, the RADOS Gateway (RGW) pod overrides
the global CA bundle located at /etc/pki/tls/certs with an incorrect
self-signed CA bundle. The issue affects only clusters with public
certificates.
Workaround:
Open the KaasCephCluster CR of a managed cluster for editing:
Substitute <managedClusterProjectName> with a corresponding value.
Note
If the public CA certificates also apply to the OsDpl CR,
edit this resource as well.
Select from the following options:
If you are using the GoDaddy certificates, in the
cephClusterSpec.objectStorage.rgw section, replace the
cacert parameters with your public CA certificate that already
contains both the root CA certificate and intermediate CA certificate:
The following issues have been addressed in the Mirantis OpenStack for
Kubernetes 21.5 release:
[17115][Update] Fixed the issue with the
status.providerStatus.releaseRefs.previous.name field in the Cluster
object for Ceph not changing during the MOS
cluster update. If you have previously applied the workaround as described in
[17115] Cluster update does not change releaseRefs in Cluster object for Ceph, manually add the subresources section back to the
clusterworkloadlock CRD:
kubectl edit crd clusterworkloadlocks.lcm.mirantis.com

# add here the 'subresources' section:
spec:
  versions:
  - name: v1alpha1
    subresources:
      status: {}
[17477][Update] Fixed the issue with StackLight in HA mode placed on
controller nodes being not deployed or cluster update being blocked. Once you
update your MOS cluster from the Cluster release
6.18.0 to 6.19.0, roll back the workaround applied as described in
[17477] StackLight in HA mode is not deployed or cluster update is blocked:
Remove stacklight labels from worker nodes. Wait for the labels to be
removed.
Remove the custom nodeSelector section from the cluster spec.
[16103][OpenStack] Fixed the issue with the Glance client returning the
HTTPInternalServerError error while operating with a volume if Glance
was configured with the Cinder backend TechPreview.
[14678][OpenStack] Fixed the issue with instance being inaccessible through
floating IP upon floating IP quick reuse when using a small floating network.
[16963][OpenStack] Fixed the issue with Ironic failing to provide nodes on
deployments with OpenStack Victoria.
[16241][Tungsten Fabric] Fixed the issue causing failure to update a port, or
security group assigned to the port, through the Horizon web UI.
[17045][StackLight] Fixed the issue causing the fluentd-notifications pod
failing to track the RabbitMQ credentials updates in the Secret object.
[17573][StackLight] Fixed the issue with OpenStack notifications missing in
Elasticsearch and the Kibana notification-* index being empty.
Implemented full support for OpenStack Victoria with OVS or Tungsten Fabric
5.1. However, for new OpenStack Victoria with Tungsten Fabric deployments,
Mirantis recommends that you install Tungsten Fabric 2011, which is shipped as
TechPreview in this release.
Verified the upgrade path from Ussuri with OVS or Tungsten Fabric 5.1 to
Victoria with OVS or Tungsten Fabric 5.1.
OpenStack Ussuri is considered deprecated and will be declared unsupported
in one of the upcoming releases. Therefore, start planning your Ussuri to
Victoria cloud upgrade.
Default policies override for core OpenStack services¶
Implemented the mechanism to define additional policy rules for the core
OpenStack services through the OpenStackDeployment Custom Resource.
Implemented support for Masakari instance evacuation. Now, Masakari host
monitor is deployed by default with Instances High Availability Service for
OpenStack to provide automatic instance evacuation from failed instances.
Implemented the usage of direct Helm 3 communication by the OpenStack operator.
The usage of HelmBundles is dropped and automatic transition from Helm 2 to
Helm 3 is performed during the MOS 21.3 to
MOS 21.4 release update.
Implemented the capability to configure Cinder backend for images through
the OpenStackDeployment Custom Resource. The usage of Cinder backend
for Glance enables the OpenStack clouds relying on third-party appliances
for block storage to have images in one place.
Compact control plane for small Open vSwitch-based clouds¶
TechPreview
Added the capability to collocate the OpenStack control plane with the managed
cluster master nodes through the OpenStackDeployment Custom Resource.
Note
If the StackLight cluster is configured to run in the HA mode
on the same nodes with the control plane services, additional manual
steps are required for an upgrade to MOS 21.4 or for a greenfield
deployment. For details, see known issue 17477.
Added support for large-scale deployments that number up to 200 nodes
out of the box. The use case has been verified for core OpenStack services
with OVS and non-DVR Neutron configuration on a dedicated hardware scale lab.
For a successful deployment, we recommend sticking to the optimal limit of
1500 ports per gateway node. This recommendation was confirmed during testing
and should be taken into account when planning large environments.
Implemented the capability to enable the BGP VPN service to allow for
connection of OpenStack Virtual Private Networks with external VPN
sites through either BGP/MPLS IP VPNs or E-VPN.
Published MOS API Reference to provide cloud operators
with an up-to-date and comprehensive definition of the language they need
to use to communicate with MOS OpenStack and Tungsten
Fabric.
Enhanced StackLight to send all OpenStack notifications to the notification
index. Now, to view the previously called audit notifications, see the
cadfLogger in the Kibana Notifications
dashboard.
Mirantis has tested MOS against a very specific
configuration and can guarantee a predictable behavior of the product only
in the exact same environments. The table below includes the major
MOS components with the exact versions against which
testing has been performed.
This section describes the MOS known issues with available
workarounds. For the known issues in the related version of
Mirantis Container Cloud, refer to Mirantis Container Cloud: Release Notes.
Tungsten Fabric does not provide the following functionality:
Automatic generation of network port records in DNSaaS
(Designate) as Neutron with Tungsten Fabric as a backend
is not integrated with DNSaaS. As a workaround, you can use
the Tungsten Fabric built-in DNS service that enables virtual
machines to resolve each other's names.
Secret management (Barbican). You cannot use the certificates
stored in Barbican to terminate HTTPS in a load balancer.
Role Based Access Control (RBAC) for Neutron objects.
Modification of custom vRouter DaemonSets based on the SR-IOV definition in
the OsDpl CR.
[10096] tf-control does not refresh IP addresses of Cassandra pods¶
The tf-control service resolves the DNS names of Cassandra pods at startup
and does not update them if Cassandra pods got new IP addresses, for example,
in case of a restart. As a workaround, to refresh the IP addresses of
Cassandra pods, restart the tf-control pods one by one:
Caution
Before restarting the tf-control pods:
Verify that the new pods are successfully spawned.
Verify that no vRouters are connected to only one tf-control
pod that will be restarted.
kubectl -n tf delete pod tf-control-<hash>
[13755] TF pods switch to CrashLoopBackOff after a simultaneous reboot¶
Rebooting all Cassandra cluster TFConfig or TFAnalytics nodes, maintenance, or
other circumstances that cause the Cassandra pods to start simultaneously may
cause a broken Cassandra TFConfig and/or TFAnalytics cluster.
In this case, Cassandra nodes do not join the ring and do not update the IPs of
the neighbor nodes. As a result, the TF services cannot operate Cassandra
cluster(s).
To verify that a Cassandra cluster is affected:
Run the nodetool status command specifying the config or
analytics cluster and the replica number:
[6912] Octavia load balancers may not work properly with DVR¶
Limitation
When Neutron is deployed in the DVR mode, Octavia load balancers may not work
correctly. The symptoms include both failure to properly balance traffic and
failure to perform an amphora failover. For details, see DVR incompatibility with ARP announcements and VRRP.
[14678] Instance inaccessible through floating IP upon floating IP quick reuse¶
Fixed in MOS 21.5
When using a small floating network and the floating IP that was previously
allocated to an instance and re-associated with another instance in a short
period of time, the instance may be inaccessible. The Address Resolution
Protocol (ARP) cache timeout on the infrastructure layer is typically set to 5
minutes.
As a workaround, set a shorter ARP cache timeout on the infrastructure
side.
When Glance is configured with the Cinder backend TechPreview, the
Glance client may return the HTTPInternalServerError error while operating
with volume. In this case, repeat the action again until it succeeds.
[17045] fluentd-notifications does not track RabbitMQ credentials updates¶
Fixed in MOS 21.5
The fluentd-notifications pod fails to track the RabbitMQ credentials
updates in the Secret object. In this case, the fluentd-notifications pods
in the StackLight namespace are being restarted too often with the following
error message present in logs:
Authentication with RabbitMQ failed. Please check your connection settings.
Workaround:
Delete the affected fluentd-notifications pod. For example:
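For example, with a placeholder pod name:
kubectl -n stacklight delete pod <fluentd-notifications-pod-name>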
[17573] OpenStack notifications missing in Elasticsearch and Kibana¶
Fixed in MOS 21.5
OpenStack notifications may be missing in Elasticsearch and the Kibana
notification-* index may be empty. In this case, error messages
similar to the following one may be present in the
fluentd-notifications logs:
On the affected managed cluster, obtain the proper user name and password
from the rabbitmq-creds Secret in the openstack-lma-shared namespace
(strip b' prefix and ' suffix):
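A sketch of how the credentials could be read; the data key names (username, password) are assumptions:
kubectl -n openstack-lma-shared get secret rabbitmq-creds -o jsonpath='{.data.username}' | base64 -d
kubectl -n openstack-lma-shared get secret rabbitmq-creds -o jsonpath='{.data.password}' | base64 -d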
[17477] StackLight in HA mode is not deployed or cluster update is blocked¶
Fixed in MOS 21.5
The deployment of new managed clusters using the Cluster release 6.18.0
with StackLight enabled in the HA mode on control plane nodes does not have
StackLight deployed. The update of existing clusters with such StackLight
configuration that were created using the Cluster release 6.16.0 is blocked
with the following error message:
If you faced the issue during a managed cluster deployment, skip
this step.
If you faced the issue during a managed cluster update, wait until all
StackLight component resources are recreated on the target nodes
with updated node selectors.
In the Container Cloud web UI, add a fake StackLight label to any 3 worker
nodes to satisfy the deployment requirement as described in
Mirantis Container Cloud Operations Guide: Create a machine using web UI.
Eventually, StackLight will still be placed on the
target nodes with the forcedRole:stacklight label.
Once done, the StackLight deployment or update proceeds.
[17305] Cluster update fails with the ‘Not ready releases: descheduler’ error¶
Affects only MOS 21.4
An update of a MOS cluster from the Cluster release
6.16.0 to 6.18.0 may fail with the following exemplary error message:
Cluster data status:
  conditions:
  - message: 'Helm charts are not installed(upgraded) yet. Not ready releases: descheduler.'
    ready: false
    type: Helm
The issue may affect the descheduler and metrics-server Helm releases.
As a workaround, run helm uninstall descheduler or
helm uninstall metrics-server and wait for Helm Controller
to recreate the affected release.
[16987] Cluster update fails at Ceph CSI pod eviction¶
Fixed in MOS 22.2
An update of a MOS cluster may fail with the
ceph csi-driver is not evacuated yet, waiting… error during the Ceph
CSI pod eviction.
Workaround:
Scale the affected StatefulSet of the pod that fails to init down to
0 replicas. If it is a DaemonSet, such as nova-compute, it must
not be scheduled on the affected node.
On every csi-rbdplugin pod, search for stuck csi-vol:
Scale the affected StatefulSet back to the original number of replicas
or until its state is Running. If it is a DaemonSet, run the pod on
the affected node again.
[17115] Cluster update does not change releaseRefs in Cluster object for Ceph¶
Fixed in MOS 21.5
During an update of a MOS cluster from the Cluster release 6.16.0 to 6.18.0,
the status.providerStatus.releaseRefs.previous.name field in the Cluster
object does not change.
Workaround:
In the clusterworkloadlock CRD, remove the subresources section:
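The CRD can be opened for editing with the same command that is used elsewhere in these notes to add the section back:
kubectl edit crd clusterworkloadlocks.lcm.mirantis.com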
Create a ceph-cwl.yaml file with Ceph ClusterWorkloadLock:
apiVersion: lcm.mirantis.com/v1alpha1
kind: ClusterWorkloadLock
metadata:
  name: ceph-clusterworkloadlock
spec:
  controllerName: ceph
status:
  state: inactive
  release: <clusterRelease> # from the previous step
Substitute <clusterRelease> with clusterRelease obtained in the
previous step.
[17038] Cluster update may fail with TimeoutError¶
Affects only MOS 21.4
A MOS cluster update from the Cluster release
6.16.0 to 6.18.0 may fail with the Timeout waiting for pods statuses
timeout error. The error means that pod containers are not ready
and often restart with OOMKilled as the restart reason. For example:
The following issues have been addressed in the Mirantis OpenStack for
Kubernetes 21.4 release:
[13273][OpenStack] Fixed the issue with Octavia amphora getting stuck after
the MOS cluster update.
[16849][OpenStack] Fixed the issue causing inability to delete a load
balancer with a number higher than the maximum limit in API.
[16180][Tungsten Fabric] Fixed the issue with inability to schedule
vRouter DPDK on a node with DPDK and 1 GB huge pages enabled. Enhanced
Tungsten Fabric Operator to support 1 GB huge pages for a DPDK-based vRouter.
[16033][Ceph] Fixed the issue with inability to access RADOS Gateway
using S3 authentication. Added rgw_s3_auth_use_keystone=true to the
default RADOS Gateway options.
[16604][StackLight] To avoid issues with defunct processes on the
OpenStack controller nodes, temporarily disabled instances downtime
monitoring and removed the KPI - Downtime Grafana dashboard.
Update for the MOS GA release introducing support
for Hyperconverged
OpenStack compute nodes, SR-IOV and control interface specification for
Tungsten Fabric, and the following Technology Preview features:
Implemented full support for colocation of cluster services on the same host,
for example, Ceph OSD and OpenStack compute.
To avoid nodes overloading, limit the hardware resources consumption by the
OpenStack compute services as described in Deployment Guide: Limit HW
resources for hyperconverged OpenStack compute nodes.
Implemented the capability to encrypt the east-west tenant traffic between the
OpenStack compute nodes and gateways using strongSwan Internet Protocol
Security (IPsec) solution.
Implemented full support for SR-IOV in Tungsten Fabric.
After the OpenStackDeployment CR modification, the TF Operator
now generates a separate vRouter DaemonSet with specified settings.
After the SR-IOV enablement, the tf-vrouter-agent pods
are automatically restarted on the corresponding nodes causing
the network services interruption on virtual machines running on these hosts.
Therefore, plan this procedure accordingly.
Implemented the targetSizeRatio parameter for the replicated
MOS Ceph pools. The targetSizeRatio value specifies
the default ratio for each Ceph pool type to define the expected consumption
of the Ceph cluster capacity.
Added the customIngress parameter to implement the capability to specify
a custom Ingress Controller when configuring the Ceph RGW TLS.
Caution
Starting from MOS 21.3, external Ceph RGW
service is not supported and will be deleted during update. If your
system already uses endpoints of an external RGW service, reconfigure
them to the ingress endpoints.
Mirantis has tested MOS against a very specific
configuration and can guarantee a predictable behavior of the product
only in the exact same environments. The table below includes the major
MOS components with the exact versions against which
testing has been performed.
Tungsten Fabric does not provide the following functionality:
Automatic generation of network port records in DNSaaS
(Designate) as Neutron with Tungsten Fabric as a backend
is not integrated with DNSaaS. As a workaround, you can use
the Tungsten Fabric built-in DNS service that enables virtual
machines to resolve each other's names.
Secret management (Barbican). You cannot use the certificates
stored in Barbican to terminate HTTPS in a load balancer.
Role Based Access Control (RBAC) for Neutron objects.
Modification of custom vRouter DaemonSets based on the SR-IOV definition in
the OsDpl CR.
[10096] tf-control does not refresh IP addresses of Cassandra pods¶
The tf-control service resolves the DNS names of Cassandra pods at startup
and does not update them if Cassandra pods got new IP addresses, for example,
in case of a restart. As a workaround, to refresh the IP addresses of
Cassandra pods, restart the tf-control pods one by one:
Caution
Before restarting the tf-control pods:
Verify that the new pods are successfully spawned.
Verify that no vRouters are connected to only one tf-control
pod that will be restarted.
kubectl -n tf delete pod tf-control-<hash>
[13755] TF pods switch to CrashLoopBackOff after a simultaneous reboot¶
Rebooting all Cassandra cluster TFConfig or TFAnalytics nodes, maintenance, or
other circumstances that cause the Cassandra pods to start simultaneously may
cause a broken Cassandra TFConfig and/or TFAnalytics cluster.
In this case, Cassandra nodes do not join the ring and do not update the IPs of
the neighbor nodes. As a result, the TF services cannot operate Cassandra
cluster(s).
To verify that a Cassandra cluster is affected:
Run the nodetool status command specifying the config or
analytics cluster and the replica number:
[15525] HelmBundle Controller gets stuck during cluster update¶
Affects only MOS 21.3
The HelmBundle Controller that handles OpenStack releases gets stuck during
cluster update and does not apply HelmBundle changes. The issue is caused
by an unlimited release history that increases the amount of RAM consumed by
Tiller. The workaround is to manually limit the release history to 3 entries.
[13273] Octavia amphora may get stuck after cluster update¶
Fixed in MOS 21.4
After the MOS cluster update, Octavia amphora
may get stuck with the
WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [-] Could not
connect to instance. Retrying. error message present in the Octavia worker
logs. The workaround is to manually switch the Octavia amphorae driver from
V2 to V1.
Workaround:
In the OsDpl CR, specify the following configuration:
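A hedged sketch that switches the default Octavia provider driver back to amphorav1 through the service-level overrides; the exact nesting and option placement are assumptions:
spec:
  services:
    load-balancer:
      octavia:
        values:
          conf:
            octavia:
              api_settings:
                default_provider_driver: amphorav1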
[6912] Octavia load balancers may not work properly with DVR¶
Limitation
When Neutron is deployed in the DVR mode, Octavia load balancers may not work
correctly. The symptoms include both failure to properly balance traffic and
failure to perform an amphora failover. For details, see DVR incompatibility with ARP announcements and VRRP.
Copy the last certificate in the chain and save it to a temporary file, for
example, tmp-cacert.crt.
Encode the certificate from tmp-cacert.crt with base64 encoding in one
line:
cat tmp-cacert.crt | base64 -w 0
Create a new cacert key in the rook-ceph/rgw-ssl-certificate secret
and copy the base64-encoded cacert to its value. The following is an
example of the resulting secret data:
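A schematic example only; the other keys shown are placeholders for whatever the secret already contains:
data:
  cert: <existing base64-encoded TLS certificate and key>
  cacert: <base64-encoded cacert from the previous step>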
The following issues have been addressed in the Mirantis OpenStack for
Kubernetes 21.3 release:
[13422][OpenStack] Fixed the issue with some Redis pods remaining in
Pending state and causing failure to update the Cluster release.
[12511][OpenStack] Fixed the issue with Kubernetes nodes getting stuck in the
Prepare state during the MOS cluster update.
[13233][StackLight] Fixed the issue with low memory limits for StackLight
Helm Controller causing update failure.
[12917][StackLight] Fixed the issue with prometheus-tf-vrouter-exporter
pods failing to start on Tungsten Fabric nodes with DPDK. To remove the
nodeSelector definition specified when applying the
workaround:
Remove the nodeSelector.component.tfVrouterExporter definition from
the stacklight Helm release values
(.spec.providerSpec.value.helmReleases) of the Cluster
resource.
Remove the label using the following command. Do not remove the last dash
sign.
kubectl label node <node_name> tfvrouter-fix-
[11961][Tungsten Fabric] Fixed the issue with members failing to join the
RabbitMQ cluster after the tf-control nodes reboot.
Implemented the cache and proxy support for MOS managed
clusters.
By default, during a MOS cluster deployment and
update, the Mirantis artifacts are downloaded through a cache running
on a management or regional cluster. If you have an external application
that requires Internet access, you can now use a proxy with the required
parameters specified for that application.
Implemented the capability to enable Masakari, the OpenStack service that
ensures high availability of instances running on a host. The feature is
disabled by default.
Implemented the capability to specify custom settings for the Tungsten Fabric
vRouter nodes using the customSpecs parameter, such as to change the name
of the tunnel network interface or enable debug level logging.
Improved user experience by moving the rgw.ingress parameters of the
KaasCephCluster CR to a common cephClusterSpec.ingress section. The
rgw section is deprecated. However, if you continue using rgw.ingress,
it will be automatically translated into cephClusterSpec.ingress during
the MOS cluster release update.
Mirantis has tested MOS against a very specific
configuration and can guarantee a predictable behavior of the product only
in the exact same environments. The table below includes the major
MOS components with the exact versions against which
testing has been performed.
Tungsten Fabric does not provide the following functionality:
Automatic generation of network port records in DNSaaS
(Designate) as Neutron with Tungsten Fabric as a backend
is not integrated with DNSaaS. As a workaround, you can use
the Tungsten Fabric built-in DNS service that enables virtual
machines to resolve each other's names.
Secret management (Barbican). You cannot use the certificates
stored in Barbican to terminate HTTPS in a load balancer.
Role Based Access Control (RBAC) for Neutron objects.
Modification of custom vRouter DaemonSets based on the SR-IOV definition in
the OsDpl CR.
[10096] tf-control does not refresh IP addresses of Cassandra pods¶
The tf-control service resolves the DNS names of Cassandra pods at startup
and does not update them if Cassandra pods got new IP addresses, for example,
in case of a restart. As a workaround, to refresh the IP addresses of
Cassandra pods, restart the tf-control pods one by one:
Caution
Before restarting the tf-control pods:
Verify that the new pods are successfully spawned.
Verify that no vRouters are connected to only one tf-control
pod that will be restarted.
kubectl -n tf delete pod tf-control-<hash>
[13755] TF pods switch to CrashLoopBackOff after a simultaneous reboot¶
Rebooting all nodes of the Cassandra TFConfig or TFAnalytics cluster,
maintenance, or other circumstances that cause the Cassandra pods to
start simultaneously may break the Cassandra TFConfig and/or
TFAnalytics cluster. In this case, Cassandra nodes do not join the
ring and do not update the IPs of the neighbor nodes. As a result, the
TF services cannot operate the Cassandra cluster(s).
To verify that a Cassandra cluster is affected:
Run the nodetool status command specifying the config or
analytics cluster and the replica number:
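For example, assuming the default Cassandra pod naming in the tf namespace (the pod name pattern is an assumption and may differ in your deployment):
  kubectl -n tf exec tf-cassandra-<config-or-analytics>-dc1-rack1-<replica_num> -c cassandra -- nodetool status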
During the MOS cluster update to Cluster release
6.14.0, Kubernetes nodes may get stuck in the Prepare state. At the same
time, the LCM Controller logs may contain the following errors:
After the MOS cluster update, Octavia amphora may
get stuck with the
WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [-] Could not connect
to instance. Retrying. error message present in the Octavia worker logs. The
workaround is to manually switch the Octavia amphorae driver from V2 to V1.
Workaround:
In the OsDpl CR, specify the following configuration:
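As with the identical issue described for the previous release, a minimal illustrative OsDpl override follows; the exact path may differ between releases:
  spec:
    services:
      load-balancer:
        octavia:
          values:
            conf:
              octavia:
                api_settings:
                  default_provider_driver: amphorav1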
[6912] Octavia load balancers may not work properly with DVR¶
Limitation
When Neutron is deployed in the DVR mode, Octavia load balancers may not work
correctly. The symptoms include both failure to properly balance traffic and
failure to perform an amphora failover. For details, see DVR incompatibility with ARP announcements and VRRP.
During the MOS cluster update to Cluster release
6.14.0, StackLight Helm Controller containers (controller and/or
tiller) may get OOMKilled and cause failure to update.
As a workaround, manually increase the default resource requests and limits
for stacklightHelmControllerController and
stacklightHelmControllerTiller in the StackLight Helm chart values of the
Cluster release resource:
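An illustrative sketch of the corresponding StackLight values, using the parameter names mentioned above and placeholder memory values that you should adjust to your environment:
  resources:
    stacklightHelmControllerController:
      requests:
        memory: <new-request>
      limits:
        memory: <new-limit>
    stacklightHelmControllerTiller:
      requests:
        memory: <new-request>
      limits:
        memory: <new-limit>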
StackLight deploys the prometheus-tf-vrouter-exporter exporter based on the
node selector matching the tfvrouter:enabled node label. The Tungsten
Fabric nodes with DPDK have the tfvrouter-dpdk:enabled label set instead.
Therefore, the prometheus-tf-vrouter-exporter exporter fails to start on
these nodes.
Workaround:
Add the tfvrouter-fix:enabled label to every node that contains either
the tfvrouter:enabled or the tfvrouter-dpdk:enabled node label.
kubectl label node <node_name> tfvrouter-fix=enabled
In the Cluster release resource, specify the following nodeSelector
definition in the StackLight Helm chart values:
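For example, a sketch of a nodeSelector definition that targets the newly added label, based on the nodeSelector.component.tfVrouterExporter path mentioned in the fixed-issues list above:
  nodeSelector:
    component:
      tfVrouterExporter:
        tfvrouter-fix: enabled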
The following issues have been addressed in the Mirantis OpenStack for
Kubernetes 21.2 release:
[7725][Tungsten Fabric] Fixed the issue with the Neutron service failing to
create a network through Horizon and the OpenStack CLI throwing the
An unknown exception occurred error.
Update for the MOS GA release introducing support
for the PCI passthrough feature and Tungsten Fabric monitoring,
as well as the following Technology Preview features:
OpenStack Victoria support with OVS and Tungsten Fabric 5.1
SR-IOV for OpenStack
Components collocation (OpenStack compute and Ceph nodes)
Added support for the Peripheral Component Interconnect (PCI) passthrough
feature in OpenStack to use, mainly, as a part of the SR-IOV network traffic
acceleration technique. Now, MOS enables the user to
configure Nova on a per-node basis to allow PCI devices to be passed through
from hosts to virtual machines.
Enhanced StackLight to monitor Tungsten Fabric and its components, including
Cassandra, Kafka, Redis, and ZooKeeper. Implemented the Tungsten Fabric alerts
and Grafana dashboards. The feature is disabled by default. You can enable it
manually during or after the Tungsten Fabric deployment.
Implemented alert inhibition rules to provide a clearer view on the cloud
status and simplify troubleshooting. Using alert inhibition rules, Alertmanager
decreases alert noise by suppressing dependent alerts notifications. The
feature is enabled by default. For details, see
Operations Guide: Alert dependencies.
Implemented integration between Grafana and Kibana by adding a
View logs in Kibana link to most Grafana dashboards, which allows
you to immediately view contextually relevant logs through the Kibana web UI.
Added the capability to configure the Transport Layer Security (TLS) protocol
for a Ceph RGW public endpoint using MOS TLS if enabled,
or using a custom ingress specified in the KaaSCephCluster custom
resource.
Mirantis has tested MOS against a very specific
configuration and can guarantee a predictable behavior of the product only
in the exact same environments. The table below includes the major
MOS components with the exact versions against which
testing has been performed.
[6912] Octavia load balancers may not work properly with DVR¶
Limitation
When Neutron is deployed in the DVR mode, Octavia load balancers may not work
correctly. The symptoms include both failure to properly balance traffic and
failure to perform an amphora failover. For details, see DVR incompatibility with ARP announcements and VRRP.
Tungsten Fabric does not provide the following functionality:
Automatic generation of network port records in DNSaaS
(Designate) as Neutron with Tungsten Fabric as a backend
is not integrated with DNSaaS. As a workaround, you can use
the Tungsten Fabric built-in DNS service that enables virtual
machines to resolve each other's names.
Secret management (Barbican). You cannot use the certificates
stored in Barbican to terminate HTTPS in a load balancer.
Role Based Access Control (RBAC) for Neutron objects.
The Neutron service fails to create a network through Horizon, and the
OpenStack CLI throws the An unknown exception occurred error.
The workaround is to restart the tf-config pods:
Obtain the list of the tf-config pods:
kubectl -n tf get pod -l app=tf-config
Delete the tf-config-* pods. For example:
kubectl -n tf delete pod tf-config-2whbb
Verify that the pods have been recreated:
kubectl -n tf get pod -l app=tf-config
[10096] tf-control does not refresh IP addresses of Cassandra pods¶
The tf-control service resolves the DNS names of Cassandra pods at startup
and does not update them if Cassandra pods got new IP addresses, for example,
in case of a restart. As a workaround, to refresh the IP addresses of
Cassandra pods, restart the tf-control pods one by one:
Caution
Before restarting the tf-control pods:
Verify that the new pods are successfully spawned.
Verify that no vRouters are connected to only one tf-control
pod that will be restarted.
The following issues have been addressed in the Mirantis OpenStack for
Kubernetes 21.1 release:
[9809] [Kubernetes] Fixed the issue with the pods getting stuck in
the Pending state during update of a MOSK
cluster by increasing the default kubelet_max_pods setting to 150.
[9589][StackLight] Fixed the issue with the Patroni pod crashing when
scheduled to an OpenStack compute node with huge pages.
[8573] [OpenStack] Fixed the issue with the external authentication
to Horizon failing to log in a different user.
The first update to MOS Ussuri release introducing
support for object storage and a Telco deployment profile, which
includes implementation of baseline Enhanced Platform Awareness
(NUMA awareness, huge pages, CPU pinning) capabilities, and a
technical preview of packet processing acceleration (Data Plane
Development Kit-enabled Tungsten Fabric).
Implemented the capability to easily perform the node-specific configuration
through the OpenStack Controller. More specifically, the node-specific
overrides allow you to:
Implemented the capability to customize the look and feel of Horizon through
the OpenStackDeployment custom resource. Cloud operator is now able
to specify the origin of the theme bundle to be applied to OpenStack Horizon
in features:horizon:themes.
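A hypothetical example of such a configuration, assuming each theme entry takes a name, a description, and a URL pointing to the theme bundle archive; verify the exact schema against the OpenStackDeployment reference:
  spec:
    features:
      horizon:
        themes:
          - name: example-theme
            description: Example corporate theme
            url: https://example.com/themes/example-theme.tar.gz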
Implemented the capability to disable HTTP probes for public endpoints from the
OpenStack service catalog. In this case, Telegraf performs HTTP checks only
for the admin and internal OpenStack endpoints. By default, Telegraf verifies
all endpoints from the OpenStack service catalog.
Implemented the capability to verify the status of Tungsten Fabric services,
including the third-party services such as Cassandra, ZooKeeper, Kafka, Redis,
and RabbitMQ using the Tungsten Fabric Operator tf-status tool.
Mirantis has tested MOS against a very specific
configuration and can guarantee a predictable behavior of the product only
in the exact same environments. The table below includes the major
MOS components with the exact versions against which
testing has been performed.
Due to limitations in the Octavia and MOS integration,
the clusters where Neutron is deployed in the Distributed Virtual Router
(DVR) mode are not stable. Therefore, Mirantis does not recommend such
configuration for production deployments.
[9809] The default max_pods setting does not allow upgrading a cluster¶
Fixed in MOS 21.1
During update of a MOS cluster, the pods may get
stuck in the Pending state with the following example warning:
Warning FailedScheduling <unknown> default-scheduler 0/9 nodes are available: 1 node(s) were unschedulable, 2 Too many pods, 6 node(s) didn't match node selector.
[6912] Octavia load balancers may not work properly with DVR¶
Limitation
When Neutron is deployed in the DVR mode, Octavia load balancers may not work
correctly. The symptoms include both failure to properly balance traffic and
failure to perform an amphora failover. For details, see DVR incompatibility with ARP announcements and VRRP.
[8573] External authentication to Horizon fails to log in a different user¶
Fixed in MOS 21.1
Horizon retains the user's credentials following their initial login using
External Authentication Service, and does not allow logging in with another
user's credentials.
Workaround:
Clear cookies in your browser.
Select External Authentication Service on the Horizon login
page.
Click Sign In. The Keycloak login page opens.
If the following error occurs, refresh the page and try again:
CSRF token missing or incorrect. Cookies may be turned off. Make sure cookies are enabled and try again.
Tungsten Fabric does not provide the following functionality:
Automatic generation of network port records in DNSaaS
(Designate) as Neutron with Tungsten Fabric as a backend
is not integrated with DNSaaS. As a workaround, you can use
the Tungsten Fabric built-in DNS service that enables virtual
machines to resolve each other's names.
Secret management (Barbican). You cannot use the certificates
stored in Barbican to terminate HTTPS in a load balancer.
Role Based Access Control (RBAC) for Neutron objects.
[10096] tf-control service does not refresh IP addresses of Cassandra pods¶
The tf-control service resolves the DNS names of Cassandra pods at startup
and does not update them if Cassandra pods got new IP addresses, for example,
in case of a restart. As a workaround, to refresh the IP addresses of
Cassandra pods, restart the tf-control pods one by one:
Caution
Before restarting the tf-control pods:
Verify that the new pods are successfully spawned.
Verify that no vRouters are connected to only one tf-control
pod that will be restarted.
General availability of the product with OpenStack Ussuri and choice of
Neutron/OVS or Tungsten Fabric 5.1 for networking. Runs on top of a bare
metal Kubernetes cluster managed by Container Cloud.
Mirantis OpenStack for Kubernetes (MOS) represents a frictionless
cloud infrastructure on-premise. MOS Ussuri is integrated
with Container Cloud bare metal with Ceph and StackLight onboard and,
optionally, supports Tungsten Fabric 5.1 as a backend for the OpenStack
networking. In terms of updates, MOS Ussuri fully relies
on the Container Cloud update delivery mechanism.
Mirantis has tested MOS against a very specific
configuration and can guarantee a predictable behavior of the product only
in the exact same environments. The table below includes the major
MOS components with the exact versions against which
testing has been performed.
Due to limitations in the Octavia and MOS integration,
the clusters where Neutron is deployed in the Distributed Virtual Router
(DVR) mode are not stable. Therefore, Mirantis does not recommend such
configuration for production deployments.
[6912] Octavia load balancers may not work properly with DVR¶
Limitation
When Neutron is deployed in the DVR mode, Octavia load balancers may not work
correctly. The symptoms include both failure to properly balance traffic and
failure to perform an amphora failover. For details, see DVR incompatibility with ARP announcements and VRRP.
[8573] External authentication to Horizon fails to log in a different user¶
Target fix version: next MOS update
Horizon retains the user's credentials following their initial login using
External Authentication Service, and does not allow logging in with another
user's credentials.
Workaround:
Clear cookies in your browser.
Select External Authentication Service on the Horizon login
page.
Click Sign In. The Keycloak login page opens.
If the following error occurs, refresh the page and try again:
CSRF token missing or incorrect. Cookies may be turned off. Make sure cookies are enabled and try again.
Tungsten Fabric does not provide the following functionality:
Automatic generation of network port records in DNSaaS
(Designate) as Neutron with Tungsten Fabric as a backend
is not integrated with DNSaaS. As a workaround, you can use
the Tungsten Fabric built-in DNS service that enables virtual
machines to resolve each other's names.
Secret management (Barbican). You cannot use the certificates
stored in Barbican to terminate HTTPS in a load balancer.
Role Based Access Control (RBAC) for Neutron objects.
The HAProxy service, which is used as a backend for load balancers in
Tungsten Fabric, uses non-existent socket files from the log collection
service. This configuration error causes error messages to be logged in
contrail-lbaas-haproxy-stdout.log on attempts to use the loggers.
The issue does not affect the service operability.
[10096] tf-control service does not refresh IP addresses of Cassandra pods¶
The tf-control service resolves the DNS names of Cassandra pods at startup
and does not update them if Cassandra pods got new IP addresses, for example,
in case of a restart. As a workaround, to refresh the IP addresses of
Cassandra pods, restart the tf-control pods one by one:
Caution
Before restarting the tf-control pods:
Verify that the new pods are successfully spawned.
Verify that no vRouters are connected to only one tf-control
pod that will be restarted.
Considering continuous reorganization and enhancement of Mirantis OpenStack
for Kubernetes (MOSK), certain components are deprecated
and eventually removed from the product. This section provides details about
the deprecated and removed functionality that may potentially impact existing
MOSK deployments.
Configuring CPU isolation through the isolcpus configuration
parameter for Linux kernel is considered deprecated.
MOSK 21.5 introduces the capability to configure CPU
isolation using the cpusets mechanism in Linux kernel. For details,
see CPU isolation using cpusets.
Instead of v1alpha1, MOSK introduces support for the
API v2 for the Tungsten Fabric Operator. The new version of the Tungsten
Fabric Operator API aligns with the OpenStack Controller API and
provides a better interface for advanced configurations.
In MOSK 24.1, the API v2 is available only for the
new product deployments with Tungsten Fabric.
Since MOSK 24.2, the API v2 becomes the default for new product
deployments and includes the ability to convert an existing v1alpha1
TFOperator to v2 during update. For details, refer to
Convert v1alpha1 TFOperator custom resource to v2.
During the update to the 24.3 series, the old Tungsten Fabric cluster
configuration API v1alpha1 is automatically converted and replaced
with the v2 version.
Tungsten Fabric analytics services, primarily designed for collecting
various metrics from the Tungsten Fabric services, are being deprecated.
Despite the initial implementation, user demand for this feature has
been minimal. As a result, Tungsten Fabric analytics services will
become unsupported in the product.
All greenfield deployments starting from MOSK 24.1
do not include Tungsten Fabric analytics services. The existing
deployments updated to 24.1 and newer versions will include Tungsten
Fabric analytics services as well as the ability to disable them as
described in Disable Tungsten Fabric analytics services.
Deprecated the SubnetPool resource along with automated subnet creation
using SubnetPool.
Existing configurations that use SubnetPool objects in L2Template
will be automatically migrated to Subnet objects during cluster update
to MOSK 24.2. As a result of migration, existing
Subnet objects will be referenced in L2Template objects
instead of SubnetPool.
Since MOSK 25.1 and Container Cloud 2.29.0, Admission
Controller blocks creation of new SubnetPool objects.
If you still require this feature, contact Mirantis support for further
information.
Disabled creation of the default L2 template for a namespace.
On existing clusters, clusterRef:default is removed during the
migration process. Subsequently, this parameter is not substituted with
the cluster.sigs.k8s.io/cluster-name label, ensuring the application
of the L2 template across the entire Kubernetes namespace. Therefore,
you can continue using existing default L2 templates for
namespaces.
Deprecated the clusterRef parameter located in the L2Template spec. Use the cluster.sigs.k8s.io/cluster-name label instead.
On existing clusters, this parameter is automatically migrated to the
cluster.sigs.k8s.io/cluster-name label since MOSK 23.3
and Container Cloud 2.25.0.
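For example, a minimal sketch of an L2Template that uses the label instead of the deprecated clusterRef parameter; the object and cluster names are placeholders:
  metadata:
    name: example-l2template
    labels:
      cluster.sigs.k8s.io/cluster-name: <cluster-name>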
Deprecated support for the Focal Fossa Ubuntu distribution in favor of
Jammy Jellyfish.
Warning
During the course of the MOSK 24.3 and Container Cloud 2.28.x
series, Mirantis highly recommends upgrading the operating system on your
cluster machines to Ubuntu 22.04 before the following major release becomes
available.
It is not mandatory to upgrade all machines at once. You can upgrade them
one by one or in small batches, for example, if the maintenance window is
limited in time.
The Cluster release update of the Ubuntu 20.04-based MOSK clusters will
become impossible as of Container Cloud 2.29.0, where Ubuntu 22.04 is the
only supported version.
Management cluster update to Container Cloud 2.29.1 will be blocked if
at least one node of any related MOSK cluster is running Ubuntu 20.04.
Deprecated the byName field in the BareMetalHostProfile object.
As a replacement, use a more specific selector, such as byPath,
serialNumber, or wwn. For details, see
Container Cloud API Reference: BareMetalHostProfile.
minSizeGiB and maxSizeGiB in BareMetalHostProfile¶
Deprecated: MOSK 24.1 and Container Cloud 2.26.0
Unsupported: To be decided
Details:
Deprecated the minSizeGiB and maxSizeGiB fields in the
BareMetalHostProfile object.
Instead of floats that define sizes in GiB for *GiB fields, use
the <sizeNumber>Gi text notation such as Ki, Mi, and
so on.
All newly created profiles are automatically migrated to the Gi
syntax. In existing profiles, migrate the syntax manually.
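An illustrative before-and-after sketch, assuming the replacement fields drop the GiB suffix (for example, minSize); verify the exact field names against the BareMetalHostProfile reference:
  # Deprecated float notation
  minSizeGiB: 30
  maxSizeGiB: 120
  # Text notation
  minSize: 30Gi
  maxSize: 120Gi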
Deprecated the wipe field from the spec:devices section of the
BareMetalHostProfile object for the sake of wipeDevice.
For backward compatibility, any existing wipe:true option
is automatically converted to the following structure:
wipeDevice:
  eraseMetadata:
    enabled: True
For new machines, use the wipeDevice structure in the
BareMetalHostProfile object.
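A minimal sketch of a device entry that uses the wipeDevice structure together with the byPath selector mentioned above; the device path is a placeholder and the surrounding structure may vary between releases:
  spec:
    devices:
      - device:
          byPath: /dev/disk/by-path/<device-path>
          wipeDevice:
            eraseMetadata:
              enabled: true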
L2Template without the l3Layout parameters section¶
Deprecated: MOS 21.3 and Container Cloud 2.9.0
Unsupported: MOSK 23.2 and Container Cloud 2.24.0
Details:
Deprecated the use of the L2Template object without the l3Layout
section in spec. The use of the l3Layout section is mandatory
since Container Cloud 2.24.0 and MOSK 23.2.
On existing clusters, the l3Layout section is not added automatically.
Therefore, if you do not have the l3Layout section in L2 templates
of your existing clusters, manually add it and define all subnets
that are used in the npTemplate section of the L2 template.
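A minimal sketch of the l3Layout section, assuming each entry references a subnet by name and defines its scope; the subnet names are placeholders:
  spec:
    l3Layout:
      - subnetName: <lcm-subnet-name>
        scope: namespace
      - subnetName: <storage-subnet-name>
        scope: namespace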
Deprecated the dnsmasq.dhcp_range parameter of the
baremetal-operator Helm chart values in the Cluster spec.
Use the Subnet object configuration for this purpose instead.
Since Container Cloud 2.24.0, admission-controller does not accept
any changes to dnsmasq.dhcp_range except removal. Therefore, manually
remove this parameter from the baremetal-operator release spec section
of the Cluster object as described in Configure multiple DHCP address ranges.
Deprecated the configInline parameter in the metallb Helm chart
values of the Cluster spec. Use the MetalLBConfig,
MetalLBConfigTemplate, and Subnet objects instead of this parameter.
The L2Template and IPaddr parameter status fields¶
Deprecated: MOSK 23.1 and Container Cloud 2.23.0
Unsupported: MOSK 23.3 and Container Cloud 2.25.0
Details:
Deprecated the following status fields for the L2Template and IPaddr objects:
Renamed the following fields of the IpamHost status:
netconfigV2 to netconfigCandidate
netconfigV2state to netconfigCandidateState
netconfigFilesState to netconfigFilesStates (per file)
The format of netconfigFilesState is changed after renaming. The
netconfigFilesStates field contains a dictionary of statuses of network
configuration files stored in netconfigFiles. The dictionary keys are
file paths, and the values have the same meaning per file as the former
netconfigFilesState value:
For a successfully rendered configuration file:
OK: <timestamp> <sha256-hash-of-rendered-file>, where the timestamp
is in the RFC 3339 format.
For a failed rendering: ERR: <error-message>.
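An illustrative fragment of the IpamHost status with the new field; the file path, timestamp, and hash are placeholders:
  netconfigFilesStates:
    <network-config-file-path>: 'OK: <timestamp> <sha256-hash-of-rendered-file>'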
The status.l2RenderResult field of the IpamHost object¶
Deprecated: MOSK 22.4 and Container Cloud 2.19.0
Unsupported: To be decided
Details:
Deprecated the status.l2RenderResult field of the IpamHost
object for the sake of status.netconfigCandidateState.
The status.nicMACmap field of the IpamHost object¶
Deprecated: MOSK 21.6 and Container Cloud 2.14.0
Unsupported: MOSK 22.1 and Container Cloud 2.15.0
Details:
Removed nicMACmap from the IpamHost status. Instead, use
the serviceMap field that contains the actual information about
services, IP addresses, and interfaces.
The ipam/DefaultSubnet label of the Subnet object¶
Deprecated: MOSK 21.6 and Container Cloud 2.14.0
Unsupported: To be decided
Details:
Deprecated the ipam/DefaultSubnet label of the metadata field of
the Subnet object.
Deprecated the following IPAM API resources: created, lastUpdated,
and versionIpam.
These resources will be eventually replaced with objCreated, objUpdated,
and objStatusUpdated.
Suspended: Container Cloud 2.25.0 (Cluster releases 17.0.0 and 16.0.0)
Unsupported: Container Cloud 2.26.0 (Cluster releases 17.1.0 and 16.1.0)
Details:
Suspended support for regional clusters and several regions on a single
management cluster. Simultaneously, ceased performing functional
integration testing of the feature and removed the related code in
Container Cloud 2.26.0. If you still require this feature,
contact Mirantis support for further information.
Deprecated: Container Cloud 2.15.0 (Cluster releases 7.5.0 and 5.22.0)
Unsupported: Container Cloud 2.18.0 (Cluster releases 11.2.0 and 7.8.0)
Details:
Deprecated the iam-api service and IAM CLI (the iamctl command).
The logic of the iam-api service required for Container Cloud
is moved to scope-controller.
Container Cloud 2.9.0 (Cluster releases 6.16.0 and 5.16.0)
Details
Replaced all existing SSH user names, such as ubuntu, with the
universal mcc-user user name. Since Container Cloud 2.9.0,
SSH keys are managed only for mcc-user.
Deprecated: Container Cloud 2.13.0 (Cluster releases 7.3.0 and 5.20.0)
Unsupported: Container Cloud 2.14.0 (Cluster releases 7.4.0 and 5.21.0)
Details:
Removed the DISABLE_OIDC flag required to be set for custom TLS
Keycloak and web UI certificates during a management cluster deployment.
Do not set this parameter anymore in bootstrap.env. To use your
own TLS certificates for Keycloak, refer to Configure TLS certificates for cluster applications.
Deprecated the performance metric exporter that is integrated into the
Ceph Manager daemon for the sake of the dedicated Ceph Exporter daemon.
Names of metrics will not be changed, no metrics will be removed.
All Ceph metrics to be collected by the Ceph Exporter daemon will change
their labels job and instance due to scraping metrics from
new Ceph Exporter daemon instead of the performance metric exporter of
Ceph Manager:
Values of the job labels will be changed from rook-ceph-mgr to
prometheus-rook-exporter for all Ceph metrics moved to Ceph
Exporter. The full list of moved metrics is presented below.
Values of the instance labels will be changed from the metric endpoint
of Ceph Manager with port 9283 to the metric endpoint of Ceph Exporter
with port 9926 for all Ceph metrics moved to Ceph Exporter. The full
list of moved metrics is presented below.
Values of the instance_id labels of Ceph metrics from the RADOS
Gateway (RGW) daemons will be changed from the daemon GID to the daemon
subname. For example, instead of instance_id="<RGW_PROCESS_GID>",
the instance_id="a" (ceph_rgw_qlen{instance_id="a"}) will be
used. The list of moved Ceph RGW metrics is presented below.
Therefore, if Ceph metrics to be collected by the Ceph Exporter daemon
are used in any customizations, for example, custom alerts, Grafana
dashboards, or queries in custom tools, update your customizations
to use new labels since Container Cloud 2.28.0 (Cluster releases 16.3.0
and 17.3.0).
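For example, a custom alert rule expression that uses one of the affected metrics would need an update similar to the following sketch; the threshold is arbitrary and shown for illustration only:
  # Before Container Cloud 2.28.0
  expr: ceph_rgw_qlen{job="rook-ceph-mgr"} > 10
  # Since Container Cloud 2.28.0
  expr: ceph_rgw_qlen{job="prometheus-rook-exporter"} > 10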
List of affected Ceph RGW metrics
ceph_rgw_cache_.*
ceph_rgw_failed_req
ceph_rgw_gc_retire_object
ceph_rgw_get.*
ceph_rgw_keystone_.*
ceph_rgw_lc_.*
ceph_rgw_lua_.*
ceph_rgw_pubsub_.*
ceph_rgw_put.*
ceph_rgw_qactive
ceph_rgw_qlen
ceph_rgw_req
List of all metrics to be collected by Ceph Exporter instead of
Ceph Manager
Removed cephDeviceMapping from the status.fullClusterInfo.cephDetails
section of the KaaSCephCluster object because its large size can
potentially exceed the Kubernetes 1.5 MB quota.
Deprecated: Container Cloud 2.19.0 (Cluster releases 11.3.0 and 7.9.0)
Unsupported: Container Cloud 2.20.0 (Cluster releases 11.4.0 and 7.10.0)
Details:
Removed Ceph cluster deployment from the management and regional
clusters to reduce resource consumption. Ceph is automatically removed
during the Cluster release update to 11.4.0 or 7.10.0.
Deprecated the Alertmanager API v1 in favor of v2. The current Alertmanager
version supports both API versions. However, in one of the upcoming
MOSK and Container Cloud releases, Alertmanager will be
upgraded to the version that supports only the API v2. Therefore, if you use
API v1, update your integrations and configurations to use the API v2 to
ensure compatibility with the upgraded Alertmanager.
Following the upstream deprecation in Prometheus, deprecated the
prometheus-rabbitmq-exporter job in favor of the
rabbitmq-prometheus-plugin one, which is based on the native
RabbitMQ Prometheus plugin ensuring reliable and direct metric collection.
As a result, deprecated and renamed the RabbitMQ Grafana dashboard
to the RabbitMQ [Deprecated] one. As a replacement, use the
RabbitMQ Overview and RabbitMQ Erlang Grafana dashboards.
Warning
If you use deprecated RabbitMQ metrics in customizations such as
alerts and dashboards, switch to the new metrics and dashboards within the
course of the MOSK 25.1 series to prevent issues once the
deprecated metrics and dashboard are removed.
Following the upstream deprecation in Grafana, deprecated the Angular-based
plugins in favor of the React-based ones. In Container Cloud 2.29.0 and
MOSK 25.1, where Grafana is updated from version 10 to 11,
the following Angular plugins are automatically migrated to the React ones:
Graph (old) -> Time Series
Singlestat -> Stat
Stat (old) -> Stat
Table (old) -> Table
Worldmap -> Geomap
All Grafana dashboards provided by StackLight are also migrated to React
automatically. For the list of default dashboards, see
View Grafana dashboards.
Warning
This migration may corrupt custom Grafana dashboards that have
Angular-based panels. Therefore, if you have such dashboards, back them up
and manually upgrade Angular-based panels during the course of
Container Cloud 2.28.x and MOSK 24.3.x to prevent
custom appearance issues after plugin migration.
The StackLight telegraf-openstack plugin is going to be replaced
by osdpl-exporter. As a result, all valuable Telegraf metrics that
are used by StackLight components will be reimplemented in
osdpl-exporter and all dependent StackLight alerts and dashboards
will start using new metrics.
Therefore, if you use any telegraf-openstack metrics in any cluster
customizations, consider reimplementing them with new metrics.
To obtain the list of metrics that are removed and replaced with new
ones, contact Mirantis support.
Deprecated logging.syslog in favor of logging.externalOutputs
that contains a wider range of configuration options.
Services and parameters related to OpenSearch and Kibana¶
Deprecated: MOSK 22.3 and Container Cloud 2.18.0
Removed: To be decided
Details:
Deprecated elasticsearch-master in favor of opensearch-master.
In future releases, the following parameters of the
stacklight.values section will be deprecated and eventually replaced
as follows (see the example after this list):
elasticsearch in favor of logging
elasticsearch.retentionTime in favor of logging.retentionTime
resourcesPerClusterSize.elasticsearch in favor of
resourcesPerClusterSize.opensearch
resourcesPerClusterSize.fluentdElasticsearch in favor of
resourcesPerClusterSize.fluentdLogs
resources.fluentdElasticsearch in favor of
resources.fluentdLogs
resources.elasticsearch in favor of resources.opensearch
resources.iamProxyKibana in favor of
resources.iamProxyOpenSearchDashboards
resources.kibana in favor of resources.opensearchDashboards
nodeSelector.component.elasticsearch in favor of
nodeSelector.component.opensearch
nodeSelector.component.fluentdElasticsearch in favor of
nodeSelector.component.fluentdLogs
nodeSelector.component.kibana in favor of
nodeSelector.component.opensearchDashboards
tolerations.component.elasticsearch in favor of
tolerations.component.opensearch
tolerations.component.fluentdElasticsearch in favor of
tolerations.component.fluentdLogs
tolerations.component.kibana in favor of
tolerations.component.opensearchDashboards
stacklightLogLevels.component.fluentdElasticsearch in favor of
stacklightLogLevels.component.fluentdLogs
stacklightLogLevels.component.elasticsearch in favor of
stacklightLogLevels.component.opensearch
stacklightLogLevels.component.kibana in favor of
stacklightLogLevels.component.opensearchDashboards
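For example, the retention configuration would move as follows; the retention value is a placeholder:
  # Deprecated
  elasticsearch:
    retentionTime: <retention>
  # Replacement
  logging:
    retentionTime: <retention>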
Replaced Elasticsearch with OpenSearch, and Kibana with OpenSearch Dashboards
due to licensing changes for Elasticsearch. OpenSearch is a fork of Elasticsearch
under the open-source Apache License with development led by Amazon Web Services.
For new deployments with the logging stack enabled, OpenSearch is now deployed
by default. For existing deployments, migration to OpenSearch is performed
automatically during the cluster update. However, the entire Elasticsearch
cluster may go down for up to 15 minutes.
Retention Time parameter in the Container Cloud web UI¶
Deprecated: MOSK 22.2 and Container Cloud 2.16.0
Removed: MOSK 22.3 and Container Cloud 2.17.0
Details:
Replaced the Retention Time parameter with the
Logstash Retention Time, Events Retention Time,
and Notifications Retention Time parameters.
logstashRetentionTime parameter for Elasticsearch¶
Deprecated: MOSK 22.2 and Container Cloud 2.16.0
Removed: MOSK 24.1 and Container Cloud 2.26.0
Details:
Deprecated the elasticsearch.logstashRetentionTime parameter in
favor of the elasticsearch.retentionTime.logstash,
elasticsearch.retentionTime.events, and
elasticsearch.retentionTime.notifications parameters.
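For example, using placeholder retention values:
  # Deprecated
  elasticsearch:
    logstashRetentionTime: <days>
  # Replacement
  elasticsearch:
    retentionTime:
      logstash: <days>
      events: <days>
      notifications: <days>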
Mirantis aims to release Mirantis OpenStack for Kubernetes (MOSK)
software regularly and often.
MOSK software includes OpenStack, Tungsten Fabric,
life-cycle management tooling, other supporting software, and dependencies.
Mirantis's goal is to ensure that such updates are easy to install in
a zero-touch, zero-downtime fashion.
MOSK release cadence consists of major, for example,
MOSK 24.1, and patch, for example, MOSK
24.1.1 or 24.1.2, releases. The major release together with the patch
releases based on it is called a release series, for example, the
MOSK 24.1 series.
Both major and patch release versions incorporate solutions for security
vulnerabilities and known product issues. The primary distinction between
these two release types lies in the fact that major release versions
introduce new functionalities, whereas patch release versions predominantly
offer minor product enhancements.
Patch releases strive to considerably reduce the timeframe for delivering
CVE resolutions in images to your deployments, aiding in the mitigation
of cyber threats and data breaches.
Content
Major release
Patch release
Version update and upgrade of the major product components including
but not limited to OpenStack, Tungsten Fabric, Kubernetes, Ceph, and
StackLight
Container runtime changes including Mirantis Container Runtime and
containerd updates
Changes in public API
Changes in the Container Cloud and MOSK lifecycle management including
but not limited to machines, clusters, Ceph OSDs
Host machine changes including host operating system and kernel updates
Patch version bumps of MKE and Kubernetes
Fixes for Common Vulnerabilities and Exposures (CVE) in images
StackLight subcomponents may be updated during patch releases
Most patch release versions involve minor changes that only require restarting
containers on the cluster during updates. However, the product can also deliver
CVE fixes on Ubuntu, which includes updating the minor version of the Ubuntu
kernel. This kernel update is not mandatory, but if you prioritize getting
the latest CVE fixes for Ubuntu, you can manually reboot machines during
a convenient maintenance window to update the kernel.
Each subsequent major release includes patch release updates of the previous
major release.
You may decide to update between only major releases without updating to patch
releases. In this case, you will perform updates from an N to N+1 major
release. However, Mirantis recommends applying security fixes using patch
releases as soon as they become available.
Starting from MOSK 24.1.5, Mirantis introduces a new
update scheme allowing for update path flexibility.
Previously, the user could not update to an intermediate patch version in
the series if a newer patch version had been released.
With the new scheme, the user can update to any patch version in the series
even if a newer patch version has already been released.
If the cluster starts receiving patch releases, the user must apply the
latest patch version in the series to be able to update to the following
major release.
The user can always update to the newer major version from the latest
patch version of the previous series. Additionally, a major update is
possible during the course of the patch series from the patch version
released immediately before the target major version. Refer to
Update path for 24.1, 24.2, 24.3, and 25.1 series for an illustration
of the possible update paths.
Mirantis provides Long Term Support (LTS) for specific versions of OpenStack.
LTS includes scheduled updates with new functionality as well as bug and
security fixes. Mirantis intends to introduce support for a new OpenStack
version once a year.
The LTS duration of an OpenStack version is two years.
The diagram below illustrates the current LTS support cycle for OpenStack.
The upstream versions not mentioned in the diagram are not supported in the
product, nor are the upgrade paths from or to such versions.
Important
MOSK supports the OpenStack Victoria version until
September 2023. MOSK 23.2 is the last release version
where OpenStack Victoria packages are updated.
If you have not already upgraded your OpenStack version to Yoga, Mirantis
highly recommends doing this during the course of the MOSK
23.2 series.
[Diagram: OpenStack LTS support cycle]
Versions of Tungsten Fabric, underlying Kubernetes, Ceph, StackLight, and other
supporting software and dependencies may change at Mirantis's discretion. Follow
Release Compatibility Matrix and product Release Notes for any changes in
product component versions.
A Technology Preview feature provides early access to upcoming product
innovations, allowing customers to experiment with the functionality and
provide feedback.
Technology Preview features may be privately or publicly available but
neither are intended for production use. While Mirantis will provide
assistance with such features through official channels, normal Service
Level Agreements do not apply.
As Mirantis considers making future iterations of Technology Preview features
generally available, we will do our best to resolve any issues that customers
experience when using these features.
During the development of a Technology Preview feature, additional components
may become available to the public for evaluation. Mirantis cannot guarantee
the stability of such features. As a result, if you are using Technology
Preview features, you may not be able to seamlessly update to subsequent
product releases, as well as upgrade or migrate to the functionality that
has not been announced as full support yet.
Mirantis makes no guarantees that Technology Preview features will graduate
to generally available features.
The Release Compatibility Matrix describes the cloud configurations that have
been supported by the product over the course of its lifetime and the path a
MOSK cloud can take to move from an older configuration
to a newer one.
For each MOSK release, the document outlines the versions
of the product major components, the valid combinations of these versions,
and the way every component must be updated or upgraded.
For a more comprehensive list of the product subcomponents and their
respective versions included in each MOSK release, refer to
Release Notes, or use the Releases section in the Container
Cloud UI or API.
The following table outlines the compatibility matrix of the most recent
MOSK releases and their major components in conjunction
with Container Cloud and Cluster releases.
The product support status reflects the freshness of
a MOSK cluster and should be considered when planning
the cluster update path:
Supported
Latest supported product release version to use for a greenfield
cluster deployment and to update to.
Deprecated
Product release version that you should update to the latest supported
product release. You cannot update between two deprecated release
versions.
The deprecated product release version becomes unsupported when newer
product versions are released. Therefore, when planning the update path
for the cluster, consider the dates of the upcoming product releases.
Greenfield deployments based on a deprecated product release are not
supported. Use the latest supported release version for initial
deployments instead.
Unsupported
Product release that blocks automatic upgrade of a management cluster
and must be updated immediately to resume receiving newest product
features and enhancements.
Mirantis Container Cloud will update itself automatically as long as
the release of each managed cluster has either supported or
deprecated status in the new version of Container Cloud.
A deprecated cluster release becomes unsupported in one of
the following Container Cloud releases. Therefore, we strongly
recommend that you update your deprecated MOSK
clusters to the latest supported version as described in
Cluster components update paths.
The kernel version of the host operating system validated by Mirantis
and confirmed to be working for the supported use cases. If you use
custom kernel versions or third-party vendor-provided kernels, such
as FIPS-enabled ones, you assume full responsibility for validating the
compatibility of components in such environments.
Management cluster update is performed automatically as long as
the release of each managed cluster has either supported or
deprecated status in the new version of Container Cloud.
If any of the clusters managed by Container Cloud is about
to obtain the unsupported status as a result of an update, Container
Cloud updates are blocked until that cluster is updated to a later release.
Major cluster update is initiated by a cloud operator through the Container
Cloud UI. The update procedure is automated and covers all the life cycle
management modules of the cluster that include OpenStack, Tungsten Fabric,
Ceph, and StackLight. See Cluster update for details.
Version-specific considerations
Before the MOSK 24.1 series, if between the major
releases you apply at least one patch release belonging to the N series,
you must apply the last patch release in that series to be able
to update to the N+1 major release version.
Patch cluster update is initiated by a cloud operator through the Container
Cloud UI. The update procedure is automated and covers all the life cycle
management modules of the cluster that include OpenStack, Tungsten Fabric,
Ceph, and StackLight. See Update to a patch version for details.