Mirantis OpenStack for Kubernetes Documentation¶
This documentation provides information on how to deploy and operate a Mirantis OpenStack for Kubernetes (MOSK) environment. It is intended to help operators understand the core concepts of the product and provides sufficient information to deploy and operate the solution.
The information provided in this documentation set is constantly improved and amended based on the feedback and requests from MOSK consumers.
The following table lists the guides included in the documentation set you are reading:
| Guide | Purpose |
|---|---|
| Reference Architecture | Learn the fundamentals of the MOSK reference architecture to appropriately plan your deployment |
| Deployment Guide | Deploy a MOSK environment of a preferred configuration using supported deployment profiles tailored to the demands of specific business cases |
| Operations Guide | Operate your MOSK environment |
| Release Notes | Learn about new features and bug fixes in the current MOSK version |
Intended audience¶
This documentation is intended for engineers who have basic knowledge of Linux, virtualization and containerization technologies, the Kubernetes API and CLI, Helm and Helm charts, Mirantis Kubernetes Engine (MKE), and OpenStack.
Documentation history¶
The following table lists the released revisions of the documentation set you are reading.
| Release date | Release name |
|---|---|
| November 05, 2020 | MOSK GA release |
| December 23, 2020 | MOSK GA Update release |
| March 01, 2021 | MOSK 21.1 |
| April 22, 2021 | MOSK 21.2 |
| June 15, 2021 | MOSK 21.3 |
| September 01, 2021 | MOSK 21.4 |
| October 05, 2021 | MOSK 21.5 |
| November 11, 2021 | MOSK 21.6 |
| February 23, 2022 | MOSK 22.1 |
| April 14, 2022 | MOSK 22.2 |
| June 30, 2022 | MOSK 22.3 |
Conventions¶
This documentation set uses the following conventions in the HTML format:
| Convention | Description |
|---|---|
| boldface font | Inline CLI tools and commands, titles of procedures and system response examples, table titles |
| monospace font | File names and paths, Helm chart parameters and their values, names of packages, node names and labels, and so on |
| italic font | Information that distinguishes some concept or term |
| Link | External links and cross-references, footnotes |
| Main menu > menu item | GUI elements that include any part of the interactive user interface and menu navigation |
| Superscript | Some extra, brief information |
| The Note block | Messages of a generic meaning that may be useful for the user |
| The Caution block | Information that prevents a user from mistakes and undesirable consequences when following the procedures |
| The Warning block | Messages that include details that can be easily missed, but should not be ignored by the user and are valuable before proceeding |
| The See also block | A list of references that may be helpful for understanding of some related tools, concepts, and so on |
| The Learn more block | Used in the Release Notes to wrap a list of internal references to the reference architecture, deployment, and operation procedures specific to a newly implemented product feature |
Product Overview¶
Mirantis OpenStack for Kubernetes (MOSK) combines the power of Mirantis Container Cloud for delivering and managing Kubernetes clusters, with the industry standard OpenStack APIs, enabling you to build your own cloud infrastructure.
The advantages of running all of the OpenStack components as a Kubernetes application are multi-fold and include the following:
Zero downtime, non-disruptive updates
Fully automated Day-2 operations
Full-stack management from bare metal through the operating system and all the necessary components
The list of the most common use cases includes:
- Software-defined data center
The traditional data center requires multiple requests and interactions to deploy new services. By abstracting the data center functionality behind a standardised set of APIs, services can be deployed faster and more efficiently. MOSK enables you to define all your data center resources behind the industry-standard OpenStack APIs, allowing you to automate the deployment of applications or simply request resources through the UI to quickly and efficiently provision virtual machines, storage, networking, and other resources.
- Virtual Network Functions (VNFs)
VNFs require high-performance systems that can be accessed on demand in a standardised way, with assurances that they will have access to the necessary resources and performance guarantees when needed. MOSK provides extensive support for VNF workloads, enabling easy access to functionality such as Intel EPA (NUMA, CPU pinning, Huge Pages) as well as the consumption of specialised networking interface cards to support SR-IOV and DPDK. The centralised management model of MOSK and Mirantis Container Cloud also enables the easy management of multiple MOSK deployments with full lifecycle management.
- Legacy workload migration
With the industry moving toward cloud-native technologies, many older or legacy applications cannot be moved easily, and it often does not make financial sense to transform them into cloud-native applications. MOSK provides a stable cloud platform that can cost-effectively host legacy applications whilst still providing the expected levels of control, customization, and uptime.
Reference Architecture¶
Mirantis OpenStack for Kubernetes (MOSK) is a virtualization platform that provides an infrastructure for cloud-ready applications, in combination with reliability and full control over the data.
MOSK combines OpenStack, an open-source cloud infrastructure software, with application management techniques used in the Kubernetes ecosystem that include container isolation, state enforcement, declarative definition of deployments, and others.
MOSK integrates with Mirantis Container Cloud to rely on its capabilities for bare-metal infrastructure provisioning, Kubernetes cluster management, and continuous delivery of the stack components.
MOSK simplifies the work of a cloud operator by automating all major cloud life cycle management routines including cluster updates and upgrades.
Deployment profiles¶
A Mirantis OpenStack for Kubernetes (MOSK) deployment profile is a thoroughly tested and officially supported reference architecture that is guaranteed to work at a specific scale and is tailored to the demands of a specific business case, such as generic IaaS cloud, Network Function Virtualisation infrastructure, Edge Computing, and others.
A deployment profile is defined as a combination of:
Services and features the cloud offers to its users.
Non-functional characteristics that users and operators should expect when running the profile on top of a reference hardware configuration, including, but not limited to:
Performance characteristics, such as an average network throughput between VMs in the same virtual network.
Reliability characteristics, such as the cloud API error response rate when recovering a failed controller node.
Scalability characteristics, such as the total number of virtual routers tenants can run simultaneously.
Hardware requirements - the specification of physical servers and networking equipment required to run the profile in production.
Deployment parameters that a cloud operator can tweak within a certain range without being afraid of breaking the cloud or losing support.
In addition, the following items may be included in a definition:
Compliance-driven technical requirements, such as TLS encryption of all external API endpoints.
Foundation-level software components, such as Tungsten Fabric or Open vSwitch as a back end for the networking service.
Note
Mirantis reserves the right to revise the technical implementation of any profile at will while preserving its definition - the functional and non-functional characteristics that operators and users are known to rely on.
MOSK supports a range of deployment profiles to address a wide variety of business tasks. The table below includes the profiles for the most common use cases.
Note
Some components of a MOSK cluster are mandatory and are installed during the managed cluster deployment by Mirantis Container Cloud regardless of the deployment profile in use. StackLight is one of the cluster components that are enabled by default. See the Mirantis Container Cloud Operations Guide for details.
| Profile | OpenStackDeployment CR Preset | Description |
|---|---|---|
| Cloud Provider Infrastructure (CPI) | | Provides the core set of services an IaaS vendor would need, including some extra functionality. The profile is designed to support up to 50-70 compute nodes 0 and a reasonable number of storage nodes. |
| CPI with Tungsten Fabric | | A variation of the CPI profile 1 with Tungsten Fabric as a back end for networking. |

- 0: The supported node count is approximate and may vary depending on the hardware, cloud configuration, and planned workload.
- 1: Ironic is an optional component for the CPI profile. See Bare metal OsDpl configuration for details.
- 2: Ironic is not supported for the CPI with Tungsten Fabric profile. See Tungsten Fabric known limitations for details.
- 3: Telemetry services are optional components and should be enabled together through the list of services to be deployed in the OpenStackDeployment CR as described in Deploy an OpenStack cluster.
Components overview¶
Mirantis OpenStack for Kubernetes (MOSK) includes the following key design elements.
HelmBundle Operator¶
The HelmBundle Operator is the realization of the Kubernetes Operator
pattern that provides a Kubernetes custom resource of the HelmBundle
kind and code running inside a pod in Kubernetes. This code handles changes,
such as creation, update, and deletion, in the Kubernetes resources of this
kind by deploying, updating, and deleting groups of Helm releases from
specified Helm charts with specified values.
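For example, you can inspect the HelmBundle resources present in a cluster with standard kubectl commands. This is only an illustration; the namespace and release names below repeat the openstack-operator example shown later in this guide and may differ in your environment:
kubectl get helmbundles --all-namespaces
kubectl -n osh-system get helmbundle openstack-operator -o yaml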
OpenStack¶
The OpenStack platform manages virtual infrastructure resources, including virtual servers, storage devices, networks, and networking services, such as load balancers, as well as provides management functions to the tenant users.
Various OpenStack services are running as pods in Kubernetes and are represented as appropriate native Kubernetes resources, such as Deployments, StatefulSets, and DaemonSets.
For a simple, resilient, and flexible deployment of OpenStack and related services on top of a Kubernetes cluster, MOSK uses OpenStack-Helm that provides a required collection of the Helm charts.
Also, MOSK uses OpenStack Operator as the realization
of the Kubernetes Operator pattern. The OpenStack Operator provides a custom
Kubernetes resource of the OpenStackDeployment
kind and code running
inside a pod in Kubernetes. This code handles changes such as creation,
update, and deletion in the Kubernetes resources of this kind by
deploying, updating, and deleting groups of the Helm releases.
Ceph¶
Ceph is a distributed storage platform that provides storage resources, such as objects and virtual block devices, to virtual and physical infrastructure.
MOSK uses Rook as the implementation of the
Kubernetes Operator pattern that manages resources of the CephCluster
kind to deploy and
manage Ceph services as pods on top of Kubernetes to provide Ceph-based
storage to the consumers, which include OpenStack services, such as Volume
and Image services, and underlying Kubernetes through Ceph CSI (Container
Storage Interface).
The Ceph Controller is the implementation of the Kubernetes Operator
pattern that manages resources of the MiraCeph
kind to simplify
management of the Rook-based Ceph clusters.
StackLight Logging, Monitoring, and Alerting¶
The StackLight component is responsible for collection, analysis, and visualization of critical monitoring data from physical and virtual infrastructure, as well as alerting and error notifications through a configured communication system, such as email. StackLight includes the following key sub-components:
Prometheus
OpenSearch
OpenSearch Dashboards
Fluentd
Requirements¶
MOSK cluster hardware requirements¶
This section provides hardware requirements for the Mirantis Container Cloud management cluster with a managed Mirantis OpenStack for Kubernetes (MOSK) cluster.
For installing MOSK, the Mirantis Container Cloud management cluster and managed cluster must be deployed with the bare metal provider.
Note
One of the industry best practices is to verify every new update or configuration change in a non-customer-facing environment before applying it to production. Therefore, Mirantis recommends having a staging cloud, deployed and maintained along with the production clouds. The recommendation is especially applicable to the environments that:
Receive updates often and use continuous delivery. For example, any non-isolated deployment of Mirantis Container Cloud.
Have significant deviations from the reference architecture or third party extensions installed.
Are managed under the Mirantis OpsCare program.
Run business-critical workloads where even the slightest application downtime is unacceptable.
A typical staging cloud is a complete copy of the production environment including the hardware and software configurations, but with a bare minimum of compute and storage capacity.
The table below describes the node types the MOSK reference architecture includes.
| Node type | Description |
|---|---|
| Mirantis Container Cloud management cluster nodes | The Container Cloud management cluster architecture on bare metal requires three physical servers for manager nodes. On these hosts, we deploy a Kubernetes cluster with services that provide Container Cloud control plane functions. |
| OpenStack control plane node and StackLight node | Host the OpenStack control plane services such as database, messaging, API, schedulers, conductors, and L3 and L2 agents, as well as the StackLight components. Note: MOSK enables the cloud administrator to collocate the OpenStack control plane with the managed cluster master nodes on OpenStack deployments of a small size. This capability is available as technical preview; use such a configuration for testing and evaluation purposes only. |
| Tenant gateway node | Optional. Hosts the OpenStack gateway services, including L2, L3, and DHCP agents. The tenant gateway nodes are combined with the OpenStack control plane nodes. The strict requirement is a dedicated physical network (bond) for tenant network traffic. |
| Tungsten Fabric control plane node | Required only if Tungsten Fabric is enabled as a back end for the OpenStack networking. These nodes host the TF control plane services such as the Cassandra database, messaging, API, control, and configuration services. |
| Tungsten Fabric analytics node | Required only if Tungsten Fabric is enabled as a back end for the OpenStack networking. These nodes host the TF analytics services such as Cassandra, ZooKeeper, and collector. |
| Compute node | Hosts the OpenStack Compute services such as QEMU, L2 agents, and others. |
| Infrastructure nodes | Run the underlying Kubernetes cluster management services. The MOSK reference configuration requires a minimum of three infrastructure nodes. |
The table below specifies the hardware resources the MOSK reference architecture recommends for each node type.
| Node type | # of servers | CPU cores # per server | RAM per server, GB | Disk space per server, GB | NICs # per server |
|---|---|---|---|---|---|
| Mirantis Container Cloud management cluster node | 3 0 | 16 | 128 | 1 SSD x 960, 2 SSD x 1900 1 | 3 2 |
| OpenStack control plane, gateway 3, and StackLight nodes | 3 or more | 32 | 128 | 1 SSD x 500, 2 SSD x 1000 6 | 5 |
| Tenant gateway (optional) | 0-3 | 32 | 128 | 1 SSD x 500 | 5 |
| Tungsten Fabric control plane nodes 4 | 3 | 16 | 64 | 1 SSD x 500 | 1 |
| Tungsten Fabric analytics nodes 4 | 3 | 32 | 64 | 1 SSD x 1000 | 1 |
| Compute node | 3 (varies) | 16 | 64 | 1 SSD x 500 7 | 5 |
| Infrastructure node (Kubernetes cluster management) | 3 8 | 16 | 64 | 1 SSD x 500 | 5 |
| Infrastructure node (Ceph) 5 | 3 | 16 | 64 | 1 SSD x 500, 2 HDD x 2000 | 5 |
Note
The exact hardware specifications and number of the control plane and gateway nodes depend on a cloud configuration and scaling needs. For example, for the clouds with more than 12,000 Neutron ports, Mirantis recommends increasing the number of gateway nodes.
- 0: Adding more than 3 nodes to a management or regional cluster is not supported.
- 1: In total, at least 3 disks are required:
  - sda - system storage, minimum 60 GB
  - sdb - Container Cloud services storage, not less than 110 GB. The exact capacity requirements depend on the StackLight data retention period.
  - sdc - persistent storage on Ceph
  See Management cluster storage for details.
- 2: The OOB management (IPMI) port is not included.
- 3: OpenStack gateway services can optionally be moved to separate nodes.
- 4: TF control plane and analytics nodes can be combined on the same hardware hosts with a respective addition of RAM, CPU, and disk space. However, Mirantis does not recommend such a configuration for production environments because it increases the risk of cluster downtime if one of the nodes unexpectedly fails.
- 5: A Ceph cluster with 3 Ceph nodes does not provide hardware fault tolerance and is not eligible for recovery operations, such as a disk or an entire node replacement. A Ceph cluster uses the replication factor that equals 3. If the number of Ceph OSDs is less than 3, a Ceph cluster moves to the degraded state with the write operations restriction until the number of alive Ceph OSDs equals the replication factor again.
- 6: 1 SSD x 500 for the operating system, 1 SSD x 1000 for OpenStack LVP, 1 SSD x 1000 for StackLight LVP.
- 7: When Nova is used with local folders, additional capacity is required depending on the size of the VM images.
- 8: For node hardware requirements, refer to Container Cloud Reference Architecture: Managed cluster hardware configuration.
Note
If you would like to evaluate the MOSK capabilities and do not have much hardware at your disposal, you can deploy it in a virtual environment, for example, on top of another OpenStack cloud using the sample Heat templates.
Note that the tooling is provided for reference only and is not a part of the product itself. Mirantis does not guarantee its interoperability with any MOSK version.
Management cluster storage¶
The management cluster requires a minimum of three storage devices per node. Each device is used for a different type of storage:
One storage device for boot partitions and root file system. SSD is recommended. A RAID device is not supported.
One storage device per server is reserved for local persistent volumes. These volumes are served by the Local Storage Static Provisioner (local-volume-provisioner) and are used by many services of Mirantis Container Cloud.
At least one disk per server must be configured as a device managed by a Ceph OSD. The minimal recommended number of Ceph OSDs for the management cluster is 2 OSDs per node, for a total of 6 OSDs. The recommended replication factor is 3, which ensures that no data is lost if any single node of the management cluster fails.
You can configure host storage devices using BareMetalHostProfile resources.
System requirements for the seed node¶
The seed node is only necessary to deploy the management cluster. When the bootstrap is complete, the bootstrap node can be discarded and added back to the MOSK cluster as a node of any type.
The minimum reference system requirements for a baremetal-based bootstrap seed node are as follows:
Basic Ubuntu 18.04 server with the following configuration:
Kernel of version 4.15.0-76.86 or later
8 GB of RAM
4 CPU
10 GB of free disk space for the bootstrap cluster cache
No DHCP or TFTP servers on any NIC networks
Routable access to the IPMI network of the hardware servers
Internet access for downloading all required artifacts
If you use a firewall or proxy, make sure that the bootstrap, management, and regional clusters have access to the following IP ranges and domain names:
IP ranges:
Domain names:
mirror.mirantis.com and repos.mirantis.com for packages
binary.mirantis.com for binaries and Helm charts
mirantis.azurecr.io for Docker images
mcc-metrics-prod-ns.servicebus.windows.net:9093 for Telemetry (port 443 if proxy is enabled)
mirantis.my.salesforce.com for Salesforce alerts
Note
Access to Salesforce is required from any Container Cloud cluster type.
If any additional Alertmanager notification receiver is enabled, for example, Slack, its endpoint must also be accessible from the cluster.
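For example, before starting the bootstrap, you can verify that these endpoints are reachable from the seed node. The curl invocation below is just one possible way to test connectivity; any HTTP response, even an error code, indicates that the endpoint is reachable:
curl -sI https://mirror.mirantis.com
curl -sI https://binary.mirantis.com
curl -sI https://mirantis.azurecr.io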
Components collocation¶
MOSK uses Kubernetes labels to place components onto hosts. For the default locations of components, see MOSK cluster hardware requirements. Additionally, MOSK supports component collocation. This is mostly useful for OpenStack compute and Ceph nodes. For component collocation, consider the following recommendations:
When calculating hardware requirements for nodes, consider the requirements for all collocated components.
When performing maintenance on a node with collocated components, execute the maintenance plan for all of them.
When combining other services with the OpenStack compute host, verify that the reserved_host_* options are increased according to the needs of the collocated components by using node-specific overrides for the compute service, as shown in the sketch below.
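The following sketch illustrates what such a node-specific override could look like for a hypothetical group of collocated compute and Ceph nodes. The node label, chart, DaemonSet, and option values here are illustrative assumptions only; see the description of node-specific configuration settings later in this guide for the supported structure:
spec:
  nodes:
    <NODE-LABEL>::<NODE-LABEL-VALUE>:
      services:
        compute:
          nova:
            nova_compute:
              values:
                conf:
                  nova:
                    DEFAULT:
                      # Reserve resources for the collocated Ceph OSDs
                      reserved_host_memory_mb: 16384
                      reserved_host_cpus: 4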
Infrastructure requirements¶
This section lists the infrastructure requirements for the Mirantis OpenStack for Kubernetes (MOSK) reference architecture.
| Service | Description |
|---|---|
| MetalLB | MetalLB exposes external IP addresses to access applications in a Kubernetes cluster. |
| DNS | The Kubernetes Ingress NGINX controller is used to expose OpenStack services outside of a Kubernetes deployment. Access to the Ingress services is allowed only by its FQDN. Therefore, DNS is a mandatory infrastructure service for an OpenStack on Kubernetes deployment. |
Automatic upgrade of a host operating system¶
To keep the operating system on a bare metal host up to date with the latest security updates, the operating system requires a periodic upgrade of software packages that may or may not require a host reboot.
Mirantis Container Cloud uses life cycle management tools to update the operating system packages on the bare metal hosts. Container Cloud may also trigger a restart of the bare metal hosts to apply the updates.
In a management cluster, software package upgrade and host restart are applied automatically when a new Container Cloud version with available kernel or software packages upgrade is released.
In a managed cluster, the package upgrade and host restart are applied as part of the usual cluster update, when applicable. To start planning the maintenance window and proceed with the managed cluster update, see Update a MOSK cluster to 22.1 or 22.2.
Operating system upgrade and host restart are applied to cluster nodes one by one. If Ceph is installed in the cluster, the Container Cloud orchestration securely pauses the Ceph OSDs on the node before restart. This allows avoiding degradation of the storage service.
OpenStack¶
OpenStack Operator¶
The OpenStack Operator component is a combination of the following entities:
OpenStack Controller¶
The OpenStack Controller runs in a set of containers in a pod in Kubernetes. The OpenStack Controller is deployed as a Deployment with 1 replica only. The failover is provided by Kubernetes that automatically restarts the failed containers in a pod.
However, given the recommendation to use a separate Kubernetes cluster for each OpenStack deployment, the controller is envisioned to manage only a single OpenStackDeployment resource in operation, making proper HA much less of an issue.
The OpenStack Controller is written in Python using Kopf, as a Python framework to build Kubernetes operators, and Pykube, as a Kubernetes API client.
Using Kubernetes API, the controller subscribes to changes to resources of
kind: OpenStackDeployment
, and then reacts to these changes by creating,
updating, or deleting appropriate resources in Kubernetes.
The basic child resources managed by the controller are Helm releases. They are rendered from templates, taking into account an appropriate values set from the main and features fields in the OpenStackDeployment resource.
Then, the common fields are merged into the resulting data structures. Lastly, the services fields are merged, providing the final and precise override for any value in any Helm release to be deployed or upgraded.
The constructed values are then used by the OpenStack Controller during a Helm release installation.
| Container | Description |
|---|---|
| | The core container that handles changes in the OpenStackDeployment object |
| | The container that watches the |
| | The container that watches all Kubernetes native resources, such as |
| | The container that provides data exchange between different components such as Ceph. |
| | The container that handles the node events. |

OpenStackDeployment Admission Controller¶
The CustomResourceDefinition
resource in Kubernetes uses the
OpenAPI Specification version 2 to specify the schema of the resource
defined. The Kubernetes API outright rejects the resources that do not
pass this schema validation.
The language of the schema, however, is not expressive enough to define a specific validation logic that may be needed for a given resource. For this purpose, Kubernetes enables the extension of its API with Dynamic Admission Control.
For the OpenStackDeployment (OsDpl) CR the ValidatingAdmissionWebhook
is a natural choice. It is deployed as part of OpenStack Controller
by default and performs specific extended validations when an OsDpl CR is
created or updated.
The inexhaustive list of additional validations includes:
Deny the OpenStack version downgrade
Deny the OpenStack version skip-level upgrade
Deny the OpenStack master version deployment
Deny upgrade to the OpenStack master version
Deny upgrade if any part of an OsDpl CR specification changes along with the OpenStack version
Under specific circumstances, it may be viable to disable the Admission Controller, for example, when you attempt to deploy or upgrade to the master version of OpenStack.
Warning
Mirantis does not support MOSK deployments performed without the OpenStackDeployment Admission Controller enabled. Disabling of the OpenStackDeployment Admission Controller is only allowed in staging non-production environments.
To disable the Admission Controller, ensure that the following structures and
values are present in the openstack-controller
HelmBundle resource:
apiVersion: lcm.mirantis.com/v1alpha1
kind: HelmBundle
metadata:
  name: openstack-operator
  namespace: osh-system
spec:
  releases:
  - name: openstack-operator
    values:
      admission:
        enabled: false
At that point, all safeguards except for those expressed by the CR definition are disabled.
OpenStackDeployment custom resource¶
The resource of kind OpenStackDeployment
(OsDpl) is a custom resource
(CR) defined by a resource of kind CustomResourceDefinition
. This section
is intended to provide a detailed overview of the OsDpl configuration including
the definition of its main elements as well as the configuration of extra
OpenStack services that do not belong to standard deployment profiles.
The detailed information about the schema of an OpenStackDeployment (OsDpl) custom resource can be obtained by running:
kubectl get crd openstackdeployments.lcm.mirantis.com -oyaml
The definition of a particular OpenStack deployment can be obtained by running:
kubectl -n openstack get osdpl -oyaml
apiVersion: lcm.mirantis.com/v1alpha1
kind: OpenStackDeployment
metadata:
  name: openstack-cluster
  namespace: openstack
spec:
  openstack_version: victoria
  preset: compute
  size: tiny
  internal_domain_name: cluster.local
  public_domain_name: it.just.works
  features:
    neutron:
      tunnel_interface: ens3
      external_networks:
      - physnet: physnet1
        interface: veth-phy
        bridge: br-ex
        network_types:
        - flat
        vlan_ranges: null
        mtu: null
      floating_network:
        enabled: False
    nova:
      live_migration_interface: ens3
      images:
        backend: local
For the detailed description of the OsDpl main elements, see sections below:
apiVersion
¶Specifies the version of the Kubernetes API that is used to create this object.
kind
¶Specifies the kind of the object.
metadata:name
¶Specifies the name of metadata. Should be set in compliance with the Kubernetes resource naming limitations.
metadata:namespace¶
Specifies the metadata namespace. While technically it is possible to deploy OpenStack on top of Kubernetes in a namespace other than openstack, such a configuration is not included in the MOSK system integration test plans. Therefore, we do not recommend such a scenario.
Warning
Both OpenStack and Kubernetes platforms provide resources to applications. When OpenStack is running on top of Kubernetes, Kubernetes is completely unaware of OpenStack-native workloads, such as virtual machines, for example.
For better results and stability, Mirantis recommends using a dedicated Kubernetes cluster for OpenStack, so that OpenStack and auxiliary services, Ceph, and StackLight are the only Kubernetes applications running in the cluster.
spec
¶Contains the data that defines the OpenStack deployment and configuration. It has both high-level and low-level sections.
The very basic values that must be provided include:
spec:
  openstack_version:
  preset:
  size:
  public_domain_name:
For the detailed description of the spec
subelements, see
Spec OsDpl elements.
openstack_version
¶Specifies the OpenStack release to deploy.
preset
¶String that specifies the name of the preset
, a predefined
configuration for the OpenStack cluster. A preset includes:
A set of enabled services that includes virtualization, bare metal management, secret management, and others
Major features provided by the services, such as VXLAN encapsulation of the tenant traffic
Integration of services
Every supported deployment profile incorporates an OpenStack preset. Refer to Deployment profiles for the list of possible values.
size¶
String that specifies the size category for the OpenStack cluster. The size category defines the internal configuration of the cluster, such as the number of replicas for service workers, timeouts, and so on.
The list of supported sizes includes:
- tiny - for approximately 10 OpenStack compute nodes
- small - for approximately 50 OpenStack compute nodes
- medium - for approximately 100 OpenStack compute nodes
public_domain_name
¶Specifies the public DNS name for OpenStack services. This is a base DNS name that must be accessible and resolvable by API clients of your OpenStack cloud. It will be present in the OpenStack endpoints as presented by the OpenStack Identity service catalog.
The TLS certificates used by the OpenStack services (see below) must also be issued to this DNS name.
persistent_volume_storage_class¶
Specifies the Kubernetes storage class name used by services to create persistent volumes, for example, backups of MariaDB. If not specified, the storage class marked as default is used.
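A minimal sketch, with a placeholder for the storage class name:
spec:
  persistent_volume_storage_class: <storage-class-name>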
features¶
Contains the top-level collections of settings for the OpenStack deployment that potentially target several OpenStack services. This is the section where customizations should take place.
features:services¶
Contains a list of extra OpenStack services to deploy. Extra OpenStack services are services that are not included in the preset.
features:services:object-storage
¶Enables the object storage and provides a RADOS Gateway Swift API that is
compatible with the OpenStack Swift API. To enable the service, add
object-storage
to the service list:
spec:
  features:
    services:
    - object-storage
To create the RADOS Gateway pool in Ceph, see Container Cloud Operations Guide: Enable Ceph RGW Object Storage.
features:services:instance-ha
¶TechPreview
Enables Masakari, the OpenStack service that ensures high availability of
instances running on a host. To enable the service, add instance-ha
to the
service list:
spec:
  features:
    services:
    - instance-ha
features:services:tempest
¶Enables tests against a deployed OpenStack cloud:
spec:
  features:
    services:
    - tempest
features:ssl
¶Deprecated since 22.3
Setting this field in the OpenStackDeployment
custom resource has been
deprecated. Use OpenStackDeploymentSecret custom resource to define the cloud’s secret parameters.
For the deprecation details, refer to OpenStackDeployment CR fields containing cloud secret parameters.
features:neutron:tunnel_interface
¶Defines the name of the NIC device on the actual host that will be used for Neutron.
We recommend setting up your Kubernetes hosts in such a way that networking is configured identically on all of them, and names of the interfaces serving the same purpose or plugged into the same network are consistent across all physical nodes.
features:neutron:dns_servers
¶Defines the list of IPs of DNS servers that are accessible from virtual networks. Used as default DNS servers for VMs.
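A minimal sketch; the resolver addresses below are placeholders for DNS servers reachable from your virtual networks:
spec:
  features:
    neutron:
      dns_servers:
      - 8.8.8.8
      - 8.8.4.4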
features:neutron:external_networks
¶Contains the data structure that defines external (provider) networks on top of which the Neutron networking will be created.
features:neutron:floating_network
¶If enabled, must contain the data structure defining the floating IP network that will be created for Neutron to provide external access to your Nova instances.
features:nova:live_migration_interface
¶Specifies the name of the NIC device on the actual host that will be used by Nova for the live migration of instances.
We recommend setting up your Kubernetes hosts in such a way that networking is configured identically on all of them, and names of the interfaces serving the same purpose or plugged into the same network are consistent across all physical nodes.
Also, set the option to vhost0
in the following cases:
The Neutron service uses Tungsten Fabric.
Nova migrates instances through the interface specified by the Neutron’s
tunnel_interface
parameter.
features:nova:images:backend
¶Defines the type of storage for Nova to use on the compute hosts for the images that back up the instances.
The list of supported options includes:
- local - the local storage is used. The pros include faster operation and failure domain independency from the external storage. The cons include local space consumption and a less performant and robust live migration with block migration.
- ceph - instance images are stored in a Ceph pool shared across all Nova hypervisors. The pros include faster image start and faster and more robust live migration. The cons include considerably slower IO performance and direct dependency of workload operations on Ceph cluster availability and performance.
- lvm TechPreview - instance images and ephemeral images are stored on a local Logical Volume. If specified, features:nova:images:lvm:volume_group must be set to an available LVM Volume Group, nova-vol by default. For details, see Enable LVM ephemeral storage. A configuration sketch for this back end follows this list.
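For example, a sketch that selects the LVM back end with the default volume group mentioned above:
spec:
  features:
    nova:
      images:
        backend: lvm
        lvm:
          volume_group: nova-vol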
features:barbican:backends:vault¶
Specifies the object containing the Vault parameters used to connect Barbican to Vault. The list of supported options includes:
- enabled - boolean parameter indicating that the Vault back end is enabled.
- approle_role_id Deprecated since 22.3 - Vault app role ID. Setting this field in the OpenStackDeployment custom resource has been deprecated. Use the OpenStackDeploymentSecret custom resource to define the cloud's secret parameters. For the deprecation details, refer to OpenStackDeployment CR fields containing cloud secret parameters.
- approle_secret_id Deprecated since 22.3 - secret ID created for the app role. Setting this field in the OpenStackDeployment custom resource has been deprecated. Use the OpenStackDeploymentSecret custom resource to define the cloud's secret parameters. For the deprecation details, refer to OpenStackDeployment CR fields containing cloud secret parameters.
- vault_url - URL of the Vault server.
- use_ssl - enables the SSL encryption. Since MOSK does not currently support the Vault SSL encryption, the use_ssl parameter should be set to false.
- kv_mountpoint TechPreview - optional, specifies the mountpoint of a Key-Value store in Vault to use.
- namespace TechPreview - optional, specifies the Vault namespace to use with all requests to Vault. Note: The Vault namespaces feature is available only in Vault Enterprise. Note: Vault namespaces are supported only starting from the OpenStack Victoria release.
If the Vault back end is used, configure it properly using the following parameters:
spec:
  features:
    barbican:
      backends:
        vault:
          enabled: true
          approle_role_id: <APPROLE_ROLE_ID>
          approle_secret_id: <APPROLE_SECRET_ID>
          vault_url: <VAULT_SERVER_URL>
          use_ssl: false
Note
Since MOSK does not currently support the Vault
SSL encryption, set the use_ssl
parameter to false
.
features:keystone:keycloak
¶Defines parameters to connect to the Keycloak identity provider. For details, see Integration with Identity Access Management (IAM).
features:keystone:domain_specific_configuration
¶Defines the domain-specific configuration and is useful for integration
with LDAP. An example of OsDpl with LDAP integration, which will create
a separate domain.with.ldap
domain and configure it to use LDAP as
an identity driver:
spec:
  features:
    keystone:
      domain_specific_configuration:
        enabled: true
        domains:
          domain.with.ldap:
            enabled: true
            config:
              assignment:
                driver: keystone.assignment.backends.sql.Assignment
              identity:
                driver: ldap
              ldap:
                chase_referrals: false
                group_desc_attribute: description
                group_id_attribute: cn
                group_member_attribute: member
                group_name_attribute: ou
                group_objectclass: groupOfNames
                page_size: 0
                password: XXXXXXXXX
                query_scope: sub
                suffix: dc=mydomain,dc=com
                url: ldap://ldap01.mydomain.com,ldap://ldap02.mydomain.com
                user: uid=openstack,ou=people,o=mydomain,dc=com
                user_enabled_attribute: enabled
                user_enabled_default: false
                user_enabled_invert: true
                user_enabled_mask: 0
                user_id_attribute: uid
                user_mail_attribute: mail
                user_name_attribute: uid
                user_objectclass: inetOrgPerson
features:telemetry:mode¶
The information about Telemetry has been amended and updated and is now published in the Telemetry services section. The feature is set to autoscaling by default.
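A minimal sketch that sets the default mode explicitly; refer to the Telemetry services section for the supported values:
spec:
  features:
    telemetry:
      mode: autoscaling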
features:logging¶
Specifies the standard logging levels for OpenStack services, which include the following, in order of increasing severity: TRACE, DEBUG, INFO, AUDIT, WARNING, ERROR, and CRITICAL.
For example:
spec:
  features:
    logging:
      nova:
        level: DEBUG
features:horizon:themes
¶Defines the list of custom OpenStack Dashboard themes. Content of the archive file with a theme depends on the level of customization and can include static files, Django templates, and other artifacts. For the details, refer to OpenStack official documentation: Customizing Horizon Themes.
spec:
  features:
    horizon:
      themes:
      - name: theme_name
        description: The brand new theme
        url: https://<path to .tgz file with the contents of custom theme>
        sha256summ: <SHA256 checksum of the archive above>
features:policies
¶Defines the list of custom policies for OpenStack services.
Structure example:
spec:
features:
policies:
nova:
custom_policy: custom_value
The list of services available for configuration includes: Cinder, Nova, Designate, Keystone, Glance, Neutron, Heat, Octavia, Barbican, Placement, Ironic, aodh, Panko, Gnocchi, and Masakari.
Caution
Mirantis is not responsible for cloud operability in case of modifications of the default policies but provides an API to pass the required configuration to the core OpenStack services.
features:database:cleanup¶
Available since MOSK 21.6
Defines the cleanup of stale database entries that are marked by OpenStack services as deleted. The scripts run on a periodic basis as cron jobs. By default, the database entries older than 30 days are cleaned each Monday according to the following schedule:
| Service | Server time |
|---|---|
| Cinder | 12:01 a.m. |
| Nova | 01:01 a.m. |
| Glance | 02:01 a.m. |
| Masakari | 03:01 a.m. |
| Barbican | 04:01 a.m. |
| Heat | 05:01 a.m. |
The list of services available for configuration includes: Barbican, Cinder, Glance, Heat, Masakari, and Nova.
Structure example:
spec:
  features:
    database:
      cleanup:
        <os-service>:
          enabled:
          schedule:
          age: 30
          batch: 1000
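For example, a hypothetical configuration that keeps the default retention for Nova but makes the schedule explicit. The cron expression corresponds to 01:01 a.m. every Monday, matching the default schedule above, and the exact field values are illustrative assumptions:
spec:
  features:
    database:
      cleanup:
        nova:
          enabled: true
          schedule: "1 1 * * 1"
          age: 30
          batch: 1000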
artifacts
¶A low-level section that defines the base URI prefixes for images and binary artifacts.
common
¶A low-level section that defines values that will be passed to all
OpenStack (spec:common:openstack
) or auxiliary
(spec:common:infra
) services Helm charts.
Structure example:
spec:
  artifacts:
  common:
    openstack:
      values:
    infra:
      values:
services¶
A section of the lowest level that enables the definition of specific values to pass to specific Helm charts on a one-by-one basis.
Warning
Mirantis does not recommend changing the default settings for
spec:artifacts
, spec:common
, and spec:services
elements.
Customizations can compromise the OpenStack deployment update and upgrade
processes.
However, you may need to edit the spec:services
section to limit
hardware resources in case of a hyperconverged architecture as described in
Limit HW resources for hyperconverged OpenStack compute nodes.
This feature has been removed in MOSK 22.1 in favor of the
OpenStackDeploymentStatus
(OsDplSt) custom resource.
Mirantis Container Cloud uses the Identity and access management (IAM) service
for users and permission management. The IAM integration is enabled by default
on the OpenStack side. On the IAM side, the service creates the os
client
in Keycloak automatically.
The role management and assignment should be configured separately on a particular OpenStack deployment.
The Bare metal (Ironic) service is an extra OpenStack service that can be deployed by the OpenStack Operator. This section provides the baremetal-specific configuration options of the OsDpl resource.
To install bare metal services, add the baremetal
keyword to the
spec:features:services
list:
spec:
  features:
    services:
    - baremetal
Note
All bare metal services are scheduled to the nodes with the
openstack-control-plane: enabled
label.
To provision a user image onto a bare metal server, Ironic boots a node with
a ramdisk image. Depending on the node’s deploy interface and hardware, the
ramdisk may require different drivers (agents). MOSK
provides tinyIPA-based ramdisk images and uses the direct
deploy interface
with the ipmitool
power interface.
Example of agent_images
configuration:
spec:
  features:
    ironic:
      agent_images:
        base_url: https://binary.mirantis.com/openstack/bin/ironic/tinyipa
        initramfs: tinyipa-stable-ussuri-20200617101427.gz
        kernel: tinyipa-stable-ussuri-20200617101427.vmlinuz
Since the bare metal nodes hardware may require additional drivers, you may need to build a deploy ramdisk for particular hardware. For more information, see Ironic Python Agent Builder. Be sure to create a ramdisk image with the version of Ironic Python Agent appropriate for your OpenStack release.
Ironic supports the flat
and multitenancy
networking modes.
The flat networking mode assumes that all bare metal nodes are pre-connected to a single network that cannot be changed during the virtual machine provisioning. This network with bridged interfaces for Ironic should be spread across all nodes, including compute nodes, to allow regular virtual machines to be plugged into the Ironic network. In its turn, the interface defined as provisioning_interface should be spread across the gateway nodes. The cloud administrator can perform all this underlying configuration through the L2 templates.
Example of the OsDpl resource illustrating the configuration for the flat
network mode:
spec:
  features:
    services:
    - baremetal
    neutron:
      external_networks:
      - bridge: ironic-pxe
        interface: <baremetal-interface>
        network_types:
        - flat
        physnet: ironic
        vlan_ranges: null
    ironic:
      # The name of neutron network used for provisioning/cleaning.
      baremetal_network_name: ironic-provisioning
      networks:
        # Neutron baremetal network definition.
        baremetal:
          physnet: ironic
          name: ironic-provisioning
          network_type: flat
          external: true
          shared: true
          subnets:
          - name: baremetal-subnet
            range: 10.13.0.0/24
            pool_start: 10.13.0.100
            pool_end: 10.13.0.254
            gateway: 10.13.0.11
      # The name of interface where provision services like tftp and ironic-conductor
      # are bound.
      provisioning_interface: br-baremetal
The multitenancy
network mode uses the neutron
Ironic network
interface to share physical connection information with Neutron. This
information is handled by Neutron ML2 drivers when plugging a Neutron port
to a specific network. MOSK supports the
networking-generic-switch
Neutron ML2 driver out of the box.
Example of the OsDpl resource illustrating the configuration for the
multitenancy
network mode:
spec:
  features:
    services:
    - baremetal
    neutron:
      tunnel_interface: ens3
      external_networks:
      - physnet: physnet1
        interface: <physnet1-interface>
        bridge: br-ex
        network_types:
        - flat
        vlan_ranges: null
        mtu: null
      - physnet: ironic
        interface: <physnet-ironic-interface>
        bridge: ironic-pxe
        network_types:
        - vlan
        vlan_ranges: 1000:1099
    ironic:
      # The name of interface where provision services like tftp and ironic-conductor
      # are bound.
      provisioning_interface: <baremetal-interface>
      baremetal_network_name: ironic-provisioning
      networks:
        baremetal:
          physnet: ironic
          name: ironic-provisioning
          network_type: vlan
          segmentation_id: 1000
          external: true
          shared: false
          subnets:
          - name: baremetal-subnet
            range: 10.13.0.0/24
            pool_start: 10.13.0.100
            pool_end: 10.13.0.254
            gateway: 10.13.0.11
Depending on the use case, you may need to configure the same application components differently on different hosts. MOSK enables you to easily perform the required configuration through node-specific overrides at the OpenStack Controller side.
The limitation of using the node-specific overrides is that they override only the configuration settings, while other components, such as startup scripts, should be reconfigured as well.
Caution
The overrides have been implemented in a similar way to the OpenStack node and node label specific DaemonSet configurations. However, the OpenStack Controller node-specific settings conflict with the upstream OpenStack node and node label specific DaemonSet configurations. Therefore, we do not recommend configuring node and node label overrides.
The list of allowed node labels is located in the Cluster
object status
providerStatus.releaseRef.current.allowedNodeLabels
field.
Starting from MOSK 22.3, if the value
field is not
defined in allowedNodeLabels
, a label can have any value.
Before or after a machine deployment, add the required label from the allowed
node labels list with the corresponding value to
spec.providerSpec.value.nodeLabels
in machine.yaml
. For example:
nodeLabels:
- key: <NODE-LABEL>
  value: <NODE-LABEL-VALUE>
The addition of a node label that is not available in the list of allowed node labels is restricted.
The node-specific settings are activated through the spec:nodes
section of the OsDpl CR. The spec:nodes
section contains the following
subsections:
- features - implements overrides for a limited subset of fields and is constructed similarly to spec::features
- services - similarly to spec::services, enables you to override settings in general for the components running as DaemonSets.
Example configuration:
spec:
  nodes:
    <NODE-LABEL>::<NODE-LABEL-VALUE>:
      features:
        # Detailed information about features might be found at
        # openstack_controller/admission/validators/nodes/schema.yaml
      services:
        <service>:
          <chart>:
            <chart_daemonset_name>:
              values:
                # Any value from specific helm chart
OpenStackDeploymentSecret custom resource¶
Available since MOSK 22.3
The resource of kind OpenStackDeploymentSecret (OsDplSecret) is a custom resource that is intended to aggregate the cloud's confidential settings such as SSL/TLS certificates, external systems access credentials, and other secrets.
To obtain detailed information about the schema of an OsDplSecret custom resource, run:
kubectl get crd openstackdeploymentsecret.lcm.mirantis.com -oyaml
The resource has a structure similar to that of the OpenStackDeployment custom resource and enables the user to set a limited subset of fields that contain sensitive data.
Important
If you are migrating the related fields from the
OpenStackDeployment
custom resource, refer to
Migrating secrets from OpenStackDeployment to OpenStackDeploymentSecret CR.
Example of an OpenStackDeploymentSecret
custom resource of minimum
configuration:
apiVersion: lcm.mirantis.com/v1alpha1
kind: OpenStackDeploymentSecret
metadata:
  name: osh-dev
  namespace: openstack
spec:
  features:
    ssl:
      public_endpoints:
        ca_cert: |-
          -----BEGIN CERTIFICATE-----
          ...
          -----END CERTIFICATE-----
        api_cert: |-
          -----BEGIN CERTIFICATE-----
          ...
          -----END CERTIFICATE-----
        api_key: |-
          -----BEGIN RSA PRIVATE KEY-----
          ...
          -----END RSA PRIVATE KEY-----
    barbican:
      backends:
        vault:
          approle_role_id: f6f0f775-...-cc00a1b7d0c3
          approle_secret_id: 2b5c4b87-...-9bfc6d796f8c
features:ssl
¶Contains the content of SSL/TLS certificates (server, key, and CA bundle) used to enable secure communication to public OpenStack API services.
These certificates must be issued to the DNS domain specified in the public_domain_name field.
features:barbican:backends:vault
¶Specifies the object containing parameters used to connect to a Hashicorp Vault instance. The list of supported configurations includes:
- approle_role_id - Vault app role ID
- approle_secret_id - Secret ID created for the app role
OpenStackDeploymentStatus custom resource¶
The resource of kind OpenStackDeploymentStatus
(OsDplSt) is a custom
resource that describes the status of an OpenStack deployment.
To obtain detailed information about the schema of an
OpenStackDeploymentStatus
(OsDplSt) custom resource, run:
kubectl get crd openstackdeploymentstatus.lcm.mirantis.com -oyaml
To obtain the status definition for a particular OpenStack deployment, run:
kubectl -n openstack get osdplst -oyaml
Example of an OsDplSt CR:
kind: OpenStackDeploymentStatus
metadata:
  name: osh-dev
  namespace: openstack
spec: {}
status:
  handle:
    lastStatus: update
  health:
    barbican:
      api:
        generation: 2
        status: Ready
    cinder:
      api:
        generation: 2
        status: Ready
      backup:
        generation: 1
        status: Ready
      scheduler:
        generation: 1
        status: Ready
      volume:
        generation: 1
        status: Ready
  osdpl:
    cause: update
    changes: '((''add'', (''status'',), None, {''watched'': {''ceph'': {''secret'':
      {''hash'': ''0fc01c5e2593bc6569562b451b28e300517ec670809f72016ff29b8cbaf3e729''}}}}),)'
    controller_version: 0.5.3.dev12
    fingerprint: a112a4a7d00c0b5b79e69a2c78c3b50b0caca76a15fe7d79a6ad1305b19ee5ec
    openstack_version: ussuri
    state: APPLIED
    timestamp: "2021-09-08 17:01:45.633143"
  services:
    baremetal:
      controller_version: 0.5.3.dev12
      fingerprint: a112a4a7d00c0b5b79e69a2c78c3b50b0caca76a15fe7d79a6ad1305b19ee5ec
      openstack_version: ussuri
      state: APPLIED
      timestamp: "2021-09-08 17:00:54.081353"
    block-storage:
      controller_version: 0.5.3.dev12
      fingerprint: a112a4a7d00c0b5b79e69a2c78c3b50b0caca76a15fe7d79a6ad1305b19ee5ec
      openstack_version: ussuri
      state: APPLIED
      timestamp: "2021-09-08 17:00:57.306669"
    compute:
      controller_version: 0.5.3.dev12
      fingerprint: a112a4a7d00c0b5b79e69a2c78c3b50b0caca76a15fe7d79a6ad1305b19ee5ec
      openstack_version: ussuri
      state: APPLIED
      timestamp: "2021-09-08 17:01:18.853068"
    coordination:
      controller_version: 0.5.3.dev12
      fingerprint: a112a4a7d00c0b5b79e69a2c78c3b50b0caca76a15fe7d79a6ad1305b19ee5ec
      openstack_version: ussuri
      state: APPLIED
      timestamp: "2021-09-08 17:01:00.593719"
    dashboard:
      controller_version: 0.5.3.dev12
      fingerprint: a112a4a7d00c0b5b79e69a2c78c3b50b0caca76a15fe7d79a6ad1305b19ee5ec
      openstack_version: ussuri
      state: APPLIED
      timestamp: "2021-09-08 17:00:57.652145"
    database:
      controller_version: 0.5.3.dev12
      fingerprint: a112a4a7d00c0b5b79e69a2c78c3b50b0caca76a15fe7d79a6ad1305b19ee5ec
      openstack_version: ussuri
      state: APPLIED
      timestamp: "2021-09-08 17:01:00.233777"
    dns:
      controller_version: 0.5.3.dev12
      fingerprint: a112a4a7d00c0b5b79e69a2c78c3b50b0caca76a15fe7d79a6ad1305b19ee5ec
      openstack_version: ussuri
      state: APPLIED
      timestamp: "2021-09-08 17:00:56.540886"
    identity:
      controller_version: 0.5.3.dev12
      fingerprint: a112a4a7d00c0b5b79e69a2c78c3b50b0caca76a15fe7d79a6ad1305b19ee5ec
      openstack_version: ussuri
      state: APPLIED
      timestamp: "2021-09-08 17:01:00.961175"
    image:
      controller_version: 0.5.3.dev12
      fingerprint: a112a4a7d00c0b5b79e69a2c78c3b50b0caca76a15fe7d79a6ad1305b19ee5ec
      openstack_version: ussuri
      state: APPLIED
      timestamp: "2021-09-08 17:00:58.976976"
    ingress:
      controller_version: 0.5.3.dev12
      fingerprint: a112a4a7d00c0b5b79e69a2c78c3b50b0caca76a15fe7d79a6ad1305b19ee5ec
      openstack_version: ussuri
      state: APPLIED
      timestamp: "2021-09-08 17:01:01.440757"
    key-manager:
      controller_version: 0.5.3.dev12
      fingerprint: a112a4a7d00c0b5b79e69a2c78c3b50b0caca76a15fe7d79a6ad1305b19ee5ec
      openstack_version: ussuri
      state: APPLIED
      timestamp: "2021-09-08 17:00:51.822997"
    load-balancer:
      controller_version: 0.5.3.dev12
      fingerprint: a112a4a7d00c0b5b79e69a2c78c3b50b0caca76a15fe7d79a6ad1305b19ee5ec
      openstack_version: ussuri
      state: APPLIED
      timestamp: "2021-09-08 17:01:02.462824"
    memcached:
      controller_version: 0.5.3.dev12
      fingerprint: a112a4a7d00c0b5b79e69a2c78c3b50b0caca76a15fe7d79a6ad1305b19ee5ec
      openstack_version: ussuri
      state: APPLIED
      timestamp: "2021-09-08 17:01:03.165045"
    messaging:
      controller_version: 0.5.3.dev12
      fingerprint: a112a4a7d00c0b5b79e69a2c78c3b50b0caca76a15fe7d79a6ad1305b19ee5ec
      openstack_version: ussuri
      state: APPLIED
      timestamp: "2021-09-08 17:00:58.637506"
    networking:
      controller_version: 0.5.3.dev12
      fingerprint: a112a4a7d00c0b5b79e69a2c78c3b50b0caca76a15fe7d79a6ad1305b19ee5ec
      openstack_version: ussuri
      state: APPLIED
      timestamp: "2021-09-08 17:01:35.553483"
    object-storage:
      controller_version: 0.5.3.dev12
      fingerprint: a112a4a7d00c0b5b79e69a2c78c3b50b0caca76a15fe7d79a6ad1305b19ee5ec
      openstack_version: ussuri
      state: APPLIED
      timestamp: "2021-09-08 17:01:01.828834"
    orchestration:
      controller_version: 0.5.3.dev12
      fingerprint: a112a4a7d00c0b5b79e69a2c78c3b50b0caca76a15fe7d79a6ad1305b19ee5ec
      openstack_version: ussuri
      state: APPLIED
      timestamp: "2021-09-08 17:01:02.846671"
    placement:
      controller_version: 0.5.3.dev12
      fingerprint: a112a4a7d00c0b5b79e69a2c78c3b50b0caca76a15fe7d79a6ad1305b19ee5ec
      openstack_version: ussuri
      state: APPLIED
      timestamp: "2021-09-08 17:00:58.039210"
    redis:
      controller_version: 0.5.3.dev12
      fingerprint: a112a4a7d00c0b5b79e69a2c78c3b50b0caca76a15fe7d79a6ad1305b19ee5ec
      openstack_version: ussuri
      state: APPLIED
      timestamp: "2021-09-08 17:00:36.562673"
For the detailed description of the OsDplSt main elements, see the sections below:
The health
subsection provides a brief output on services health.
The osdpl
subsection describes the overall status of the OpenStack
deployment and consists of the following items:
cause
¶The cause that triggered the LCM action: update
when OsDpl is updated,
resume
when the OpenStack Controller is restarted.
changes
¶A string representation of changes in the OpenstackDeployment
object.
controller_version
¶The version of openstack-controller
that handles the LCM action.
fingerprint
¶The SHA sum of the OpenStackDeployment
object spec
section.
openstack_version
¶The current OpenStack version specified in the osdpl
object.
state¶
The current state of the LCM action. Possible values include:
- APPLYING - not all operations are completed.
- APPLIED - all operations are completed.
timestamp
¶The timestamp of the status:osdpl
section update.
The services subsection provides detailed information about the LCM performed on a specific service. This is a dictionary where keys are service names, for example, baremetal or compute, and values are dictionaries with the following items:
controller_version¶
The version of the openstack-controller that handles the LCM action on a specific service.
fingerprint¶
The SHA sum of the OpenStackDeployment object spec section used when performing the LCM on a specific service.
openstack_version
¶The OpenStack version specified in the osdpl
object used when performing
the LCM action on a specific service.
state¶
The current state of the LCM action performed on a service. Possible values include:
- WAITING - waiting for dependencies.
- APPLYING - not all operations are completed.
- APPLIED - all operations are completed.
timestamp
¶The timestamp of the status:services:<SERVICE-NAME>
section update.
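For example, to quickly check the overall state of the latest LCM action without reading the whole resource, you can query the fields described above with a jsonpath expression. The resource name osh-dev repeats the example above and may differ in your environment:
kubectl -n openstack get osdplst osh-dev -o jsonpath='{.status.osdpl.state}'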
OpenStack on Kubernetes architecture¶
OpenStack and auxiliary services are running as containers in the kind: Pod
Kubernetes resources. All long-running services are governed by one of
the ReplicationController-enabled
Kubernetes resources, which include
either kind: Deployment
, kind: StatefulSet
, or kind: DaemonSet
.
The placement of the services is mostly governed by the Kubernetes node labels. The labels affecting the OpenStack services include:
- openstack-control-plane=enabled - the node hosting most of the OpenStack control plane services.
- openstack-compute-node=enabled - the node serving as a hypervisor for Nova. The virtual machines with tenant workloads are created there.
- openvswitch=enabled - the node hosting Neutron L2 agents and Open vSwitch pods that manage L2 connections of the OpenStack networks.
- openstack-gateway=enabled - the node hosting Neutron L3, Metadata, and DHCP agents, Octavia Health Manager, Worker, and Housekeeping components.

Note
OpenStack is an infrastructure management platform. Mirantis OpenStack for Kubernetes (MOSK) uses Kubernetes mostly for orchestration and dependency isolation. As a result, multiple OpenStack services are running as privileged containers with host PIDs and Host Networking enabled. You must ensure that at least the user with the credentials used by Helm/Tiller (administrator) is capable of creating such Pods.
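The labels above are regular Kubernetes node labels, so you can inspect or, for experimentation, assign them with kubectl. In a MOSK deployment, the labels are normally managed through the Container Cloud machine configuration, so the commands below, with a hypothetical node name, only illustrate the underlying Kubernetes mechanism:
kubectl get nodes -l openstack-compute-node=enabled
kubectl label node kaas-node-worker-1 openvswitch=enabled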
Infrastructure services¶
Service |
Description |
---|---|
Storage |
While the underlying Kubernetes cluster is configured to use Ceph CSI for providing persistent storage for container workloads, for some types of workloads such networked storage is suboptimal due to latency. This is why the separate |
Database |
A single WSREP (Galera) cluster of MariaDB is deployed as the SQL
database to be used by all OpenStack services. It uses the storage class
provided by Local Volume Provisioner to store the actual database files.
The service is deployed as |
Messaging |
RabbitMQ is used as a messaging bus between the components of the OpenStack services. A separate instance of RabbitMQ is deployed for each OpenStack service that needs a messaging bus for intercommunication between its components. An additional, separate RabbitMQ instance is deployed to serve as a notification messages bus for OpenStack services to post their own and listen to notifications from other services. StackLight also uses this message bus to collect notifications for monitoring purposes. Each RabbitMQ instance is a single node and is deployed as
|
Caching |
A single multi-instance of the Memcached service is deployed to be used by all OpenStack services that need caching, which are mostly HTTP API services. |
Coordination |
A separate instance of etcd is deployed to be used by Cinder, which requires Distributed Lock Management for coordination between its components. |
Ingress |
Is deployed as |
Image pre-caching |
A special This is especially useful for containers used in |
OpenStack services¶
Service |
Description |
---|---|
Identity (Keystone) |
Uses MySQL back end by default.
|
Image (Glance) |
Supported back end is RBD (Ceph is required). |
Volume (Cinder) |
Supported back end is RBD (Ceph is required). |
Network (Neutron) |
Supported back ends are Open vSwitch and Tungsten Fabric. |
Placement |
|
Compute (Nova) |
The supported hypervisor is QEMU/KVM through the libvirt library. |
Dashboard (Horizon) |
|
DNS (Designate) |
Supported back end is PowerDNS. |
Load Balancer (Octavia) |
|
RADOS Gateway Object Storage (SWIFT) |
Provides the object storage and a RADOS Gateway Swift API that is
compatible with the OpenStack Swift API. You can manually enable the
service in the |
Instance HA (Masakari) |
An OpenStack service that ensures high availability of instances running
on a host. You can manually enable Masakari in the
|
Orchestration (Heat) |
|
Key Manager (Barbican) |
The supported back ends include:
|
Tempest |
Runs tests against a deployed OpenStack cloud. You can manually enable
Tempest in the |
Telemetry |
Telemetry services include alarming (aodh), event storage (Panko),
metering (Ceilometer), and metric (Gnocchi). All services should be
enabled together through the list of services to be deployed in the
|
OpenStack database architecture¶
A complete setup of a MariaDB Galera cluster for OpenStack is illustrated in the following image:

MariaDB server pods run a Galera multi-master cluster. Client requests are forwarded by the Kubernetes mariadb service to the
mariadb-server
pod that has the primary
label. Other pods from
the mariadb-server
StatefulSet have the backup
label. Labels are
managed by the mariadb-controller
pod.
The MariaDB Controller periodically checks the readiness of the
mariadb-server
pods and sets the primary
label to it if the following
requirements are met:
The primary label has not already been set on the pod.
The pod is in the ready state.
The pod is not being terminated.
The pod name has the lowest integer suffix among other ready pods in the StatefulSet. For example, between mariadb-server-1 and mariadb-server-2, the pod with the mariadb-server-1 name is preferred.
Otherwise, the MariaDB Controller sets the backup label. This means that all SQL requests are passed only to one node, while the other two nodes are in the backup state and replicate the state from the primary node.
The MariaDB clients are connecting to the mariadb
service.
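To see which pod currently holds the primary role, you can list the MariaDB pods together with their labels. A minimal check, assuming the pods carry the application=mariadb label (the selector is an assumption for illustration):
kubectl -n openstack get pods -l application=mariadb --show-labels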
OpenStack and Ceph controllers integration¶
The integration between Ceph and OpenStack controllers is implemented
through the shared Kubernetes openstack-ceph-shared
namespace.
Both controllers have access to this namespace to read and write
the Kubernetes kind: Secret
objects.

As Ceph is the required and only supported back end for several OpenStack
services, all necessary Ceph pools must be specified in the configuration
of the kind: MiraCeph
custom resource as part of the deployment.
Once the Ceph cluster is deployed, the Ceph Controller posts the
information required by the OpenStack services to be properly configured
as a kind: Secret
object into the openstack-ceph-shared
namespace.
The OpenStack Controller watches this namespace. Once the corresponding
secret is created, the OpenStack Controller transforms this secret to the
data structures expected by the OpenStack-Helm charts. Even if an OpenStack
installation is triggered at the same time as a Ceph cluster deployment, the
OpenStack Controller halts the deployment of the OpenStack services that
depend on Ceph availability until the secret in the shared namespace is
created by the Ceph Controller.
For the configuration of Ceph RADOS Gateway as an OpenStack Object
Storage, the reverse process takes place. The OpenStack Controller waits
for the OpenStack-Helm to create a secret with OpenStack Identity
(Keystone) credentials that RADOS Gateway must use to validate the
OpenStack Identity tokens, and posts it back to the same
openstack-ceph-shared
namespace in the format suitable for
consumption by the Ceph Controller. The Ceph Controller then reads this
secret and reconfigures RADOS Gateway accordingly.
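To observe this exchange on a live cluster, you can list the objects in the shared namespace; the namespace name comes from the description above, while the exact secret names vary between deployments:
kubectl get secrets -n openstack-ceph-shared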
OpenStack and StackLight integration¶
StackLight integration with OpenStack includes automatic discovery of RabbitMQ
credentials for notifications and OpenStack credentials for OpenStack API
metrics. For details, see the
openstack.rabbitmq.credentialsConfig
and
openstack.telegraf.credentialsConfig
parameters description in
StackLight configuration parameters.
OpenStack and Tungsten Fabric integration¶
The levels of integration between OpenStack and Tungsten Fabric (TF) include:
Controllers integration¶
The integration between the OpenStack and TF controllers is
implemented through the shared Kubernetes openstack-tf-shared
namespace.
Both controllers have access to this namespace to read and write the Kubernetes
kind: Secret
objects.
The OpenStack Controller posts the data into the openstack-tf-shared
namespace required by the TF services. The TF controller watches this
namespace. Once an appropriate secret is created, the TF controller obtains it
into the internal data structures for further processing.
The OpenStack Controller includes the following data for the TF Controller:
- tunnel_interface
Name of the network interface for the TF data plane. This interface is used by TF for the encapsulated traffic of overlay networks.
- Keystone authorization information
Keystone Administrator credentials and an up-and-running IAM service are required for the TF Controller to initiate the deployment process.
- Nova metadata information
Required for the TF vRouter agent service.
Also, the OpenStack Controller watches the openstack-tf-shared
namespace
for the vrouter_port
parameter that defines the vRouter port number and
passes it to the nova-compute
pod.
Services integration¶
The list of the OpenStack services that are integrated with TF through their API includes:
neutron-server - integration is provided by the contrail-neutron-plugin component that is used by the neutron-server service to transform the API calls into TF API compatible requests.
nova-compute - integration is provided by the contrail-nova-vif-driver and contrail-vrouter-api packages used by the nova-compute service for interaction with the TF vRouter when managing the instance network ports.
octavia-api - integration is provided by the Octavia TF Driver that enables you to use the OpenStack CLI and Horizon for operations with load balancers. See Tungsten Fabric load balancing for details.
Warning
TF is not integrated with the following OpenStack services:
DNS service (Designate)
Key management (Barbican)
Services¶
The section explains specifics of the services provided by Mirantis OpenStack for Kubernetes (MOSK). The list of the services and their supported features included in this section is not exhaustive and is constantly amended based on the complexity of the architecture and use of a particular service.
Compute service¶
Mirantis OpenStack for Kubernetes (MOSK) provides instances management capability through the OpenStack Compute service, or Nova. Nova interacts with other OpenStack components of an OpenStack environment to provide life-cycle management of the virtual machine instances.
vCPU type¶
Available since MOSK 22.1
host-model
is the default CPU model configured for all instances managed
by the OpenStack Compute service (Nova), the same as in Nova for the KVM or
QEMU hypervisor.
To configure the type of vCPU that Nova will create instances with, use the
spec:features:nova:vcpu_type
definition in the OpenStackDeployment
custom resource.
The supported CPU models include:
host-model (default) - mimics the host CPU and provides for decent performance, good security, and moderate compatibility with live migrations.
With this mode, libvirt finds an available predefined CPU model that best matches the host CPU, and then explicitly adds the missing CPU feature flags to closely match the host CPU features. To mitigate known security flaws, libvirt automatically adds critical CPU flags, supported by the installed libvirt, QEMU, kernel, and CPU microcode versions.
This is a safe choice if your OpenStack compute node CPUs are of the same generation. If your OpenStack compute node CPUs are sufficiently different, for example, span multiple CPU generations, Mirantis strongly recommends setting explicit CPU models supported by all of your OpenStack compute node CPUs or organizing your OpenStack compute nodes into host aggregates and availability zones that have largely identical CPUs.
Note
The host-model model does not guarantee two-way live migrations between nodes.
When migrating instances, the libvirt domain XML is first copied as is to the destination OpenStack compute node. Once the instance is hard rebooted or shut down and started again, the domain XML is re-generated. If the versions of libvirt, kernel, CPU microcode, or BIOS firmware on the destination differ from those on the source compute node where the instance was originally started, libvirt may pick up additional CPU feature flags, making it impossible to live-migrate the instance back to the original compute node.
host-passthrough - provides maximum performance, especially when nested virtualization is required or if live migration support is not a concern for workloads. Live migration requires exactly the same CPU on all OpenStack compute nodes, including the CPU microcode and kernel versions. Therefore, for live migration support, organize your compute nodes into host aggregates and availability zones. For workload migration between non-identical OpenStack compute nodes, contact Mirantis support.
A comma-separated list of exact QEMU CPU models to create and emulate. Specify the common and less advanced CPU models first. All explicit CPU models provided must be compatible with the OpenStack compute node CPUs.
To specify an exact CPU model, review the available CPU models and their features. List and inspect the /usr/share/libvirt/cpu_map/*.xml files in the libvirt containers of the pods of the libvirt DaemonSet, or of multiple DaemonSets if you are using node-specific settings.
To review the available CPU models:
Identify the available libvirt DaemonSets:
kubectl -n openstack get ds -l application=libvirt --show-labels
Example of system response:
NAME                      DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                    AGE   LABELS
libvirt-libvirt-default   2         2         2       2            2           openstack-compute-node=enabled   34d   app.kubernetes.io/managed-by=Helm,application=libvirt,component=libvirt,release_group=openstack-libvirt
Identify the pods of libvirt DaemonSets:
kubectl -n openstack get po -l application=libvirt,release_group=openstack-libvirt
Example of system response:
NAME                            READY   STATUS    RESTARTS   AGE
libvirt-libvirt-default-5zs8m   2/2     Running   0          8d
libvirt-libvirt-default-vt8wd   2/2     Running   0          3d14h
List and review the available CPU model definition files. For example:
kubectl -n openstack exec -ti libvirt-libvirt-default-5zs8m -c libvirt -- ls /usr/share/libvirt/cpu_map/*.xml
List and review the content of all CPU model definition files. For example:
kubectl -n openstack exec -ti libvirt-libvirt-default-5zs8m -c libvirt -- bash -c 'for f in `ls /usr/share/libvirt/cpu_map/*.xml`; do echo $f; cat $f; done'
For example, to set the host-passthrough
CPU model for all OpenStack
compute nodes:
spec:
features:
nova:
vcpu_type: host-passthrough
For nodes that are labeled with processor=amd-epyc
, set a custom EPYC
CPU model:
spec:
nodes:
    processor::amd-epyc:
features:
nova:
vcpu_type: EPYC
Networking service¶
Mirantis OpenStack for Kubernetes (MOSK) Networking service, represented by the OpenStack Neutron service, provides workloads with Connectivity-as-a-Service enabling instances to communicate with each other and the outside world.
The API provided by the service abstracts all the nuances of implementing a virtual network infrastructure on top of your own physical network infrastructure. The service allows cloud users to create advanced virtual network topologies that may include load balancing, virtual private networking, traffic filtering, and other services.
MOSK Networking service supports Open vSwitch and Tungsten Fabric SDN technologies as back ends.
MOSK offers Neutron as a part of its core setup. You can
configure the service through the spec:features:neutron
section of the
OpenStackDeployment
custom resource. See features
for details.
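As a hedged illustration only, a spec:features:neutron snippet might look like the following. The exact set of supported parameters is defined in the features reference, and the interface name and DNS server below are assumptions:
spec:
  features:
    neutron:
      tunnel_interface: ens3
      dns_servers:
        - 172.18.176.6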
Networking service known limitations¶
Due to the known issue #1774459 in the upstream implementation, Mirantis does not recommend using Distributed Virtual Routing (DVR) routers in the same networks as load balancers or other applications that utilize the Virtual Router Redundancy Protocol (VRRP) such as Keepalived. The issue prevents the DVR functionality from working correctly with network protocols that rely on the Address Resolution Protocol (ARP) announcements such as VRRP.
The issue occurs when updating permanent ARP entries for
allowed_address_pair
IP addresses in DVR routers since DVR performs
the ARP table update through the control plane and does not allow any
ARP entry to leave the node to prevent the router IP/MAC from
contaminating the network.
This results in various network failover mechanisms not functioning in virtual networks that have a distributed virtual router plugged in. For instance, the default back end for MOSK Load Balancing service, represented by OpenStack Octavia with the OpenStack Amphora back end when deployed in the HA mode in a DVR-connected network, is not able to redirect the traffic from a failed active service instance to a standby one without interruption.
DNS service¶
Mirantis OpenStack for Kubernetes (MOSK) provides DNS records managing capability through the OpenStack DNS service, or Designate.
LoadBalancer type for PowerDNS¶
Available since MOSK 22.2
The supported back end for Designate is PowerDNS. If required, you can specify an external IP address and the protocol (UDP, TCP, or TCP + UDP) for the Kubernetes LoadBalancer service that exposes PowerDNS.
To configure LoadBalancer for PowerDNS, use the spec:features:designate
definition in the OpenStackDeployment
custom resource.
The list of supported options includes:
external_ip - Optional. An IP address for the LoadBalancer service. If not defined, LoadBalancer allocates the IP address.
protocol - A protocol for the Designate back end in Kubernetes. Can only be udp, tcp, or tcp+udp.
type - The type of the back end for Designate. Can only be powerdns.
For example:
spec:
features:
designate:
backend:
external_ip: 10.172.1.101
protocol: udp
type: powerdns
DNS service known limitations¶
Due to an issue in the dnspython library, Asynchronous Transfer Full Range (AXFR) requests do not work, which makes it impossible to set up a secondary DNS zone. The issue affects OpenStack Victoria and will be fixed in the Yoga release.
Instance HA service¶
The Instance High Availability service, or Masakari, is an OpenStack project designed to ensure high availability of instances and compute processes running on hosts.
The service consists of the following microservices:
API - receives requests from users and events from monitors, and sends them to the engine.
Engine - executes the recovery workflow.
Monitors - detect failures and notify the API. MOSK uses monitors of the following types:
Instance monitor - checks the liveness of instance processes.
Host monitor - checks the liveness of a compute host; runs as part of the Node controller from the OpenStack Controller.
Note
The Processes monitor is not present in MOSK because HA for the compute processes is handled by Kubernetes.
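A hedged sketch of enabling the Instance HA service through the OpenStackDeployment custom resource follows; the instance-ha service name in the services list is an assumption used for illustration, so verify it against the deployment documentation for your release:
spec:
  features:
    services:
      - instance-ha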
Block Storage service¶
Volume encryption¶
TechPreview
The OpenStack Block Storage service (Cinder) supports volume encryption using a key stored in the OpenStack Key Manager service (Barbican). Such configuration uses Linux Unified Key Setup (LUKS) to create an encrypted volume type and attach it to the OpenStack Compute (Nova) instances. Nova retrieves the symmetric key from Barbican and stores it on the OpenStack compute node as a libvirt key to encrypt the volume locally or on the back end, and only after that transfers it to Cinder.
Note
To create an encrypted volume under a non-admin user, the creator role must be assigned to the user.
When planning your cloud, consider that encryption may impact CPU performance.
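As an illustration of the underlying OpenStack workflow, an administrator typically defines a LUKS-encrypted volume type similar to the following; the type name and the cipher parameters are examples only and are not mandated by MOSK:
openstack volume type create --encryption-provider luks \
  --encryption-cipher aes-xts-plain64 --encryption-key-size 256 \
  --encryption-control-location front-end LUKS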
Object Storage service¶
RADOS Gateway (RGW) provides Object Storage (Swift) API for end users in
MOSK deployments. For the API compatibility, refer to
Ceph Documentation: Ceph Object Gateway Swift API.
You can manually enable the service in the OpenStackDeployment
CR as
described in Deploy an OpenStack cluster.
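A minimal sketch of enabling the service by adding it to the list of deployed services in the OpenStackDeployment custom resource; refer to Deploy an OpenStack cluster for the authoritative procedure:
spec:
  features:
    services:
      - object-storage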
Object storage server-side encryption¶
Available since MOSK 22.1 TechPreview
RADOS Gateway also provides Amazon S3 compatible API. For details, see Ceph Documentation: Ceph Object Gateway S3 API. Using integration with the OpenStack Key Manager service (Barbican), the objects uploaded through S3 API can be encrypted by RGW according to the AWS Documentation: Protecting data using server-side encryption with customer-provided encryption keys (SSE-C) specification.
Instead of Swift, such configuration uses an S3 client to upload server-side encrypted objects. Using server-side encryption, the data is sent over a secure HTTPS connection in an unencrypted form and the Ceph Object Gateway stores that data in the Ceph cluster in an encrypted form.
Image service¶
Mirantis OpenStack for Kubernetes (MOSK) provides the image management capability through the OpenStack Image service, also known as Glance.
The Image service enables you to discover, register, and retrieve virtual machine images. Using the Glance API, you can query virtual machine image metadata and retrieve actual images.
MOSK deployment profiles include the Image service in the
core set of services. You can configure the Image service through the
spec:features
definition in the OpenStackDeployment
custom resource.
See features for details.
Image signature verification¶
Available since MOSK 21.6 TechPreview
MOSK can automatically verify the cryptographic signatures
associated with images to ensure the integrity of their data. A signed image
has a few additional properties set in its metadata that include
img_signature
, img_signature_hash_method
, img_signature_key_type
,
and img_signature_certificate_uuid
. You can find more information about
these properties and their values in the upstream OpenStack documentation.
MOSK performs image signature verification during the following operations:
A cloud user or a service creates an image in the store and starts to upload its data. If the signature metadata properties are set on the image, its content gets verified against the signature. The Image service accepts non-signed image uploads.
A cloud user spawns a new instance from an image. The Compute service ensures that the data it downloads from the image storage matches the image signature. If the signature is missing or does not match the data, the operation fails. Limitations apply, see Known limitations.
A cloud user boots an instance from a volume, or creates a new volume from an image. If the image is signed, the Block Storage service compares the downloaded image data against the signature. If there is a mismatch, the operation fails. The service will accept a non-signed image as a source for a volume. Limitations apply, see Known limitations.
To enable the image signature verification, define the following structure in the OpenStackDeployment custom resource:
spec:
features:
glance:
signature:
enabled: true
Every MOSK cloud is pre-provisioned with a baseline set of images containing the most popular operating systems, such as Ubuntu, Fedora, and CirrOS.
In addition, a few services in MOSK rely on the creation of service instances to provide their functions, namely the Load Balancer service and the Bare Metal service, and require corresponding images to exist in the image store.
When image signature verification is enabled during the cloud deployment, all these images get automatically signed with a pre-generated self-signed certificate. Enabling the feature in an already existing cloud requires manual signing of all of the images stored in it. Consult the OpenStack documentation for an example of the image signing procedure.
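For orientation only, signing an existing image generally means computing a signature over the image data with a private key, storing the signing certificate in the Key Manager service, and setting the metadata properties listed above on the image. A hedged sketch with placeholder values:
openstack image set \
  --property img_signature="<BASE64_SIGNATURE>" \
  --property img_signature_hash_method=SHA-256 \
  --property img_signature_key_type=RSA-PSS \
  --property img_signature_certificate_uuid=<CERTIFICATE_UUID> \
  <IMAGE_ID>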
Supported storage back ends¶
The image signature verification is supported for the LVM and local back ends for ephemeral storage.
The functionality is not compatible with Ceph-backed ephemeral storage combined with RAW formatted images. The Ceph copy-on-write mechanism enables the user to create instance virtual disks without downloading the image to a compute node; the data is handled completely on the side of a Ceph cluster. This enables you to spin up instances almost momentarily but makes it impossible to verify the image data before creating an instance from it.
Known limitations¶
The Image service does not enforce the presence of a signature in the metadata when the user creates a new image. The service will accept the non-signed image uploads.
The Image service does not verify the correctness of an image signature upon update of the image metadata.
MOSK does not validate if the certificate used to sign an image is trusted, it only ensures the correctness of the signature itself. Cloud users are allowed to use self-signed certificates.
The Compute service does not verify image signature for Ceph back end when the RAW image format is used as described in Supported storage back ends.
The Compute service does not verify image signature if the image is already cached on the target compute node.
The Instance HA service may experience issues when auto-evacuating instances created from signed images if it does not have access to the corresponding secrets in the Key Manager service.
The Block Storage service does not perform image signature verification when a Ceph back end is used and the images are in the RAW format.
The Block Storage service does not enforce the presence of a signature on the images.
Telemetry services¶
The Telemetry services are part of OpenStack services available in Mirantis OpenStack for Kubernetes (MOSK). The Telemetry services monitor OpenStack components, collect and store the telemetry data from them, and perform responsive actions upon this data. See OpenStack on Kubernetes architecture for details about OpenStack services in MOSK.
OpenStack Ceilometer is a service that collects data from various OpenStack
components. The service can also collect and process notifications from
different OpenStack services. Ceilometer stores the data in the Gnocchi
database. The service is specified as metering
in the
OpenStackDeployment
custom resource (CR).
Gnocchi is an open-source time series database. One of the advantages of this
database is the ability to pre-aggregate the telemetry data while storing it.
Gnocchi is specified as metric
in the OpenStackDeployment
CR.
OpenStack Aodh is part of the Telemetry project. Aodh provides a service that
creates alarms based on various metric values or specific events and triggers
response actions. The service uses data collected and stored by Ceilometer
and Gnocchi. Aodh is specified as alarming
in the OpenStackDeployment
CR.
OpenStack Panko is the service that stores the event data generated by other
OpenStack services. The service provides the ability to browse and query the
data. Panko is specified as event
in the OpenStackDeployment
CR.
Note
The OpenStack Panko service has been removed from the product since MOSK 22.2. See Deprecation Notes for details.
Enabling Telemetry services¶
The Telemetry feature in MOSK has a single mode.
The autoscaling mode provides settings for telemetry data collection and storage. The OpenStackDeployment CR must have this mode specified for the OpenStack Telemetry services to work correctly. The autoscaling mode has the following notable configurations:
Gnocchi stores cache and data using the Redis storage driver.
Metric stores data for one hour with a resolution of 1 minute.
The Telemetry services are disabled by default in MOSK.
You have to enable them
in the openstackdeployment.yaml
file (the OpenStackDeployment
CR).
The following code block provides an example of deploying the Telemetry
services as part of MOSK:
kind: OpenStackDeployment
spec:
features:
services:
- alarming
- metering
- metric
telemetry:
mode: autoscaling
Gnocchi is not an OpenStack service, so the settings related to its
functioning should be included in the spec:common:infra
section of the
OpenStackDeployment
CR.
Available since MOSK 22.1
The Ceilometer configuration files contain many list structures. Overriding
list elements in YAML files is context-dependent and error-prone. Therefore,
to override these configuration files, define the spec:services
structure in the OpenStackDeployment
CR.
The spec:services
structure provides the ability to use a complete file as
text and not as YAML data structure.
Overriding through the spec:services
structure is possible for the
following files:
pipeline.yaml
polling.yaml
meters.yaml
gnocchi_resources.yaml
event_pipeline.yaml
event_definitions.yaml
An example of overriding through the OpenStackDeployment
CR
By default, the autoscaling mode collects the data related to CPU, disk, and memory every minute. The autoscaling mode collects the rest of the available metrics every hour.
The following example shows the overriding of the polling.yaml
configuration file through the spec:services
structure of the
OpenStackDeployment
CR.
Get the current configuration file:
kubectl -n openstack get secret ceilometer-etc -ojsonpath="{.data['polling\.yaml']}" | base64 -d
Example of system response:
sources:
- interval: 60
  meters:
  - cpu
  - disk*
  - memory*
  name: ascale_pollsters
- interval: 3600
  meters:
  - '*'
  name: all_pollsters
Add the network parameter to the file.
Copy and paste the edited polling.yaml file content to the spec:services:metering section of the OpenStackDeployment CR:
spec:
  services:
    metering:
      ceilometer:
        conf:
          polling: |
            # Obligatory. The "|" indicator denotes the literal style.
            # See https://yaml.org/spec/1.2-old/spec.html#id2795688 for details.
            sources:
            - interval: 60
              meters:
              - cpu
              - disk*
              - memory*
              - network*
              name: ascale_pollsters
            - interval: 3600
              meters:
              - '*'
              name: all_pollsters
Networking¶
Depending on the size of an OpenStack environment and the components that you use, you may want to have a single or multiple network interfaces, as well as run different types of traffic on a single or multiple VLANs.
This section provides the recommendations for planning the network configuration and optimizing the cloud performance.
Networking overview¶
Mirantis OpenStack for Kubernetes (MOSK) cluster networking is complex and defined by the security requirements and performance considerations. It is based on the Kubernetes cluster networking provided by Mirantis Container Cloud and expanded to facilitate the demands of the OpenStack virtualization platform.
A Container Cloud Kubernetes cluster provides a platform for MOSK and is considered a part of its control plane. All networks that serve Kubernetes and related traffic are considered control plane networks. The Kubernetes cluster networking is typically focused on connecting pods of different nodes as well as exposing the Kubernetes API and services running in pods into an external network.
The OpenStack networking connects virtual machines to each other and the outside world. Most of the OpenStack-related networks are considered a part of the data plane in an OpenStack cluster. Ceph networks are considered data plane networks for the purpose of this reference architecture.
When planning your OpenStack environment, consider the types of traffic that your workloads generate and design your network accordingly. If you anticipate that certain types of traffic, such as storage replication, will likely consume a significant amount of network bandwidth, you may want to move that traffic to a dedicated network interface to avoid performance degradation.
The following diagram provides a simplified overview of the underlay networking in a MOSK environment:
Management cluster networking¶
This page summarizes the recommended networking architecture of a Mirantis Container Cloud management cluster for a Mirantis OpenStack for Kubernetes (MOSK) cluster.
We recommend deploying the management cluster with a dedicated interface for the provisioning (PXE) network. The separation of the provisioning network from the management network ensures additional security and resilience of the solution.
MOSK end users typically should have access to the Keycloak service in the management cluster for authentication to the Horizon web UI. Therefore, we recommend that you connect the management network of the management cluster to an external network through an IP router. The default route on the management cluster nodes must be configured with the default gateway in the management network.
If you deploy the multi-rack configuration, ensure that the provisioning network of the management cluster is connected to an IP router that connects it to the provisioning networks of all racks.
MOSK cluster networking¶
Mirantis OpenStack for Kubernetes (MOSK) clusters managed by Mirantis Container Cloud use the following networks to serve different types of traffic:
Network role |
Description |
---|---|
Provisioning (PXE) network |
Facilitates the iPXE boot of all bare metal machines in a MOSK cluster and provisioning of the operating system to machines. This network is only used during provisioning of the host. It must not be configured on an operational MOSK node. |
Life-cycle management (LCM) and API network |
Connects LCM agents on the hosts to the Container Cloud API
provided by the regional or management cluster.
Used for communication between You can use more than one LCM network segment in a MOSK cluster. In this case, separated L2 segments and interconnected L3 subnets are still used to serve LCM and API traffic. All IP subnets in the LCM networks must be connected to each other by IP routes. These routes must be configured on the hosts through L2 templates. All IP subnets in the LCM network must be connected to the Kubernetes API endpoints of the management or regional cluster through an IP router. You can manually select the VIP address for the Kubernetes API
endpoint from the LCM subnet and specify it in the Note Due to current limitations of the API endpoint failover, only one of the LCM networks can contain the API endpoint. This network is called API/LCM throughout this documentation. It consists of a VLAN segment stretched between all Kubernetes manager nodes in the cluster and the IP subnet that provides IP addresses allocated to these nodes. |
Kubernetes workloads network |
Serves as an underlay network for traffic between pods in the managed cluster. Calico uses this network to build mesh interconnections between nodes in the cluster. This network should not be shared between clusters. There might be more than one Kubernetes pods network in the cluster. In this case, they must be connected through an IP router. Kubernetes workloads network does not need an external access. |
Kubernetes external network |
Serves for an access to the OpenStack endpoints in a MOSK
cluster. Due to the limitations of MetalLB in the A typical MOSK cluster only has one external network. The external network must include at least two IP address ranges
defined by separate
|
Storage access network |
Serves for the storage access traffic from and to Ceph OSD services. A MOSK cluster may have more than one VLAN segment and IP subnet in the storage access network. All IP subnets of this network in a single cluster must be connected by an IP router. The storage access network does not require external access unless you want to directly expose Ceph to the clients outside of a MOSK cluster. Note A direct access to Ceph by the clients outside of a MOSK cluster is technically possible but not supported by Mirantis. Use at your own risk. The IP addresses from subnets in this network are assigned to Ceph nodes. The Ceph OSD services bind to these addresses on their respective nodes. This is a public network in Ceph terms. 1 |
Storage replication network |
Serves for the storage replication traffic between Ceph OSD services. A MOSK cluster may have more than one VLAN segment and IP subnet in this network as long as the subnets are connected by an IP router. This network does not require external access. The IP addresses from subnets in this network are assigned to Ceph nodes. The Ceph OSD services bind to these addresses on their respective nodes. This is a cluster network in Ceph terms. 1 |
Out-of-Band (OOB) network |
Connects Baseboard Management Controllers (BMCs) of the bare metal hosts. Must not be accessible from a MOSK cluster. |
- 1(1,2)
For more details about Ceph networks, see Ceph Network Configuration Reference.
The following diagram illustrates the networking schema of the Container Cloud deployment on bare metal with a MOSK cluster:

Network types¶
This section describes network types for Layer 3 networks used for Kubernetes and Mirantis OpenStack for Kubernetes (MOSK) clusters along with requirements for each network type.
Note
Only IPv4 is currently supported by Container Cloud and IPAM for infrastructure networks. IPv6 is not supported and not used in Container Cloud and MOSK underlay infrastructure networks.
The following diagram provides an overview of the underlay networks in a MOSK environment:

L3 networks for Kubernetes¶
A MOSK deployment typically requires the following types of networks:
- Provisioning network
Used for provisioning of bare metal servers.
- Management network
Used for management of the Container Cloud infrastructure and for communication between containers in Kubernetes.
- LCM/API network
Must be configured on the Kubernetes manager nodes of the cluster. Contains the Kubernetes API endpoint with the VRRP virtual IP address. Enables communication between the MKE cluster nodes.
- LCM network
Enables communication between the MKE cluster nodes. Multiple VLAN segments and IP subnets can be created for a multi-rack architecture. Each server must be connected to one of the LCM segments and have an IP from the corresponding subnet.
- External network
Used to expose the OpenStack, StackLight, and other services of the MOSK cluster.
- Kubernetes workloads network
Used for communication between containers in Kubernetes.
- Storage access network (Ceph)
Used for accessing the Ceph storage. In Ceph terms, this is a public network 0. We recommend placing it on a dedicated hardware interface.
- Storage replication network (Ceph)
Used for Ceph storage replication. In Ceph terms, this is a cluster network 0. To ensure low latency and fast access, place the network on a dedicated hardware interface.
- 0(1,2)
For details about Ceph networks, see Ceph Network Configuration Reference.
L3 networks for MOSK¶
The MOSK deployment additionally requires the following networks.
Service name |
Network |
Description |
VLAN name |
---|---|---|---|
Networking |
Provider networks |
Typically, a routable network used to provide the external access to OpenStack instances (a floating network). Can be used by the OpenStack services such as Ironic, Manila, and others, to connect their management resources. |
|
Networking |
Overlay networks (virtual networks) |
The network used to provide isolated, secure tenant networks with the help of a tunneling mechanism (VLAN/GRE/VXLAN). If VXLAN or GRE encapsulation takes place, the IP address assignment is required on interfaces at the node level. |
|
Compute |
Live migration network |
The network used by the OpenStack compute service (Nova) to transfer data during live migration. Depending on the cloud needs, it can be placed on a dedicated physical network not to affect other networks during live migration. The IP address assignment is required on interfaces at the node level. |
|
The mapping of the logical networks described above to physical networks and interfaces on nodes depends on the cloud size and configuration. We recommend placing the OpenStack networks on a dedicated physical interface (bond) that is not shared with the storage and Kubernetes management networks to minimize their influence on each other.
L3 networks requirements¶
The following tables describe networking requirements for a MOSK cluster, Container Cloud management and Ceph clusters.
Network type |
Provisioning |
Management |
---|---|---|
Suggested interface name |
|
|
Minimum number of VLANs |
1 |
1 |
Minimum number of IP subnets |
3 |
2 |
Minimum recommended IP subnet size |
|
|
External routing |
Not required |
Required, may use proxy server |
Multiple segments/stretch segment |
Stretch segment for management cluster due to MetalLB Layer 2 limitations 1 |
Stretch segment due to VRRP, MetalLB Layer 2 limitations |
Internal routing |
Routing to separate DHCP segments, if in use |
|
- 1
Multiple VLAN segments with IP subnets can be added to the cluster configuration for separate DHCP domains.
Network type |
Provisioning |
LCM/API |
LCM |
External |
Kubernetes workloads |
---|---|---|---|---|---|
Minimum number of VLANs |
1 (optional) |
1 |
1 (optional) |
1 |
1 |
Suggested interface name |
N/A |
|
|
|
|
Minimum number of IP subnets |
1 (optional) |
1 |
1 (optional) |
2 |
1 |
Minimum recommended IP subnet size |
16 IPs (DHCP range) |
|
1 IP per MOSK node (Kubernetes worker) |
|
1 IP per cluster node |
Stretch or multiple segments |
Multiple |
Stretch due to VRRP limitations |
Multiple |
Stretch connected to all MOSK controller nodes |
Multiple |
External routing |
Not required |
Not required |
Not required |
Required, default route |
Not required |
Internal routing |
Routing to the provisioning network of the management cluster |
|
|
Routing to the IP subnet of the Container Cloud Management API |
Routing to all IP subnets of Kubernetes workloads |
- 2
The bridge interface with this name is mandatory if you need to separate Kubernetes workloads traffic. You can configure this bridge over the VLAN or directly over the bonded or single interface.
Network type |
Storage access |
Storage replication |
---|---|---|
Minimum number of VLANs |
1 |
1 |
Suggested interface name |
|
|
Minimum number of IP subnets |
1 |
1 |
Minimum recommended IP subnet size |
1 IP per cluster node |
1 IP per cluster node |
Stretch or multiple segments |
Multiple |
Multiple |
External routing |
Not required |
Not required |
Internal routing |
Routing to all IP subnets of the Storage access network |
Routing to all IP subnets of the Storage replication network |
Note
When selecting externally routable subnets, ensure that the subnet ranges do not overlap with the internal subnets ranges. Otherwise, internal resources of users will not be available from the MOSK cluster.
- 3(1,2)
For details about Ceph networks, see Ceph Network Configuration Reference.
Multi-rack architecture¶
Available since MOSK 21.6 TechPreview
Mirantis OpenStack for Kubernetes (MOSK) enables you to deploy a cluster with a multi-rack architecture, where every data center cabinet (a rack), incorporates its own Layer 2 network infrastructure that does not extend beyond its top-of-rack switch. The architecture allows a MOSK cloud to integrate natively with the Layer 3-centric networking topologies seen in modern data centers, such as Spine-Leaf.
The architecture eliminates the need to stretch and manage VLANs across multiple physical locations in a single data center, or to establish VPN tunnels between the parts of a geographically distributed cloud.
The set of networks present in each rack depends on the type of the OpenStack networking service back end in use.

Bare metal provisioning¶
The multi-rack architecture in Mirantis Container Cloud and
MOSK requires
additional configuration of networking infrastructure. Every Layer 2 domain,
or rack, needs to have a DHCP relay agent configured on its dedicated
segment of the Common/PXE network (lcm-nw
VLAN). The agent
handles all Layer-2 DHCP requests incoming from the bare metal servers living
in the rack and forwards them as Layer-3 packets across the data center fabric
to a Mirantis Container Cloud regional cluster.

You need to configure per-rack DHCP ranges by defining Subnet resources in Mirantis Container Cloud as described in Mirantis Container Cloud documentation: Configure multiple DHCP ranges using Subnet resources.
Based on the address of the DHCP agent that relays a request from a server, Mirantis Container Cloud will automatically allocate an IP address in the corresponding subnet.
For the networks types other than Common/PXE, you need to define subnets using the Mirantis Container Cloud L2 templates. Every rack needs to have a dedicated set of L2 templates, each template representing a specific server role and configuration.
Multi-rack MOSK cluster with Tungsten Fabric¶
For MOSK clusters with the Tungsten Fabric back end, you need to place the servers running the cloud control plane components into a single rack. This limitation is caused by the Layer 2 VRRP protocol used by the Kubernetes load balancer mechanism (MetalLB) to ensure high availability of Mirantis Container Cloud and MOSK API.
Note
In future product versions, Mirantis will implement support for the Layer 3 (BGP) mode of the Kubernetes load balancing mechanism.
The diagram below will help you to plan the networking layout of a multi-rack MOSK cloud with Tungsten Fabric.

The table below provides a mapping between the racks and the network types participating in a multi-rack MOSK cluster with the Tungsten Fabric back end.
Network |
VLAN name |
Rack 1 |
Rack 2 and N |
---|---|---|---|
Common/PXE |
|
Yes |
Yes |
Management |
|
Yes |
Yes |
External (MetalLB) |
|
Yes |
No |
Kubernetes workloads |
|
Yes |
Yes |
Storage access (Ceph) |
|
Yes |
Yes |
Storage replication (Ceph) |
|
Yes |
Yes |
Overlay |
|
Yes |
Yes |
Live migration |
|
Yes |
Yes |
Physical networks layout¶
This section summarizes the requirements for the physical layout of underlay network and VLANs configuration for the multi-rack architecture of Mirantis OpenStack for Kubernetes (MOSK).
Physical networking of a Container Cloud management cluster¶
Due to limitations of virtual IP address for Kubernetes API and of MetalLB load balancing in Container Cloud, the management cluster nodes must share VLAN segments in the provisioning and management networks.
In the multi-rack architecture, the management cluster nodes may be placed to a single rack or spread across three racks. In either case, provisioning and management network VLANs must be stretched across ToR switches of the racks.
The following diagram illustrates physical and L2 connections of the Container Cloud management cluster.

Physical networking of a MOSK cluster¶
Due to limitations of MetalLB load balancing, all MOSK cluster nodes must share the VLAN segment in the external network.
In the multi-rack architecture, the external network VLAN must be stretched to the ToR switches of all racks. All other VLANs may be configured per rack.
Due to limitations of using a virtual IP address for Kubernetes API, the Kubernetes manager nodes must share the VLAN segment in the API/LCM network.
In the multi-rack architecture, Kubernetes manager nodes may be spread across three racks. The API/LCM network VLAN must be stretched to the ToR switches of the racks. All other VLANs may be configured per rack.
The following diagram illustrates physical and L2 network connections of the Kubernetes manager nodes in a MOSK cluster.
Caution
Such configuration does not apply to a compact control plane MOSK installation. See Create a MOSK cluster.

The following diagram illustrates physical and L2 network connections of the control plane nodes in a MOSK cluster.

All VLANs for OpenStack compute nodes may be configured per rack. No VLAN should be stretched across multiple racks.
The following diagram illustrates physical and L2 network connections of the compute nodes in a MOSK cluster.

All VLANs for OpenStack storage nodes may be configured per rack. No VLAN should be stretched across multiple racks.
The following diagram illustrates physical and L2 network connections of the storage nodes in a MOSK cluster.

Performance optimization¶
The following recommendations apply to all types of nodes in the Mirantis OpenStack for Kubernetes (MOSK) clusters.
Jumbo frames¶
To improve the goodput, we recommend that you enable jumbo frames where possible. Jumbo frames must be enabled along the whole path that the packets traverse. If one of the network components cannot handle jumbo frames, the network path uses the smallest MTU.
Bonding¶
To provide fault tolerance against a single NIC failure, we recommend using link aggregation, such as bonding. The link aggregation is useful for linear scaling of bandwidth, load balancing, and fault protection. Depending on the hardware equipment, different types of bonds might be supported. Use the multi-chassis link aggregation as it provides fault tolerance at the device level. For example, MLAG on Arista equipment or vPC on Cisco equipment.
The Linux kernel supports the following bonding modes:
active-backup
balance-xor
802.3ad
(LACP)balance-tlb
balance-alb
Since LACP is the IEEE standard 802.3ad
supported by the majority of
network platforms, we recommend using this bonding mode.
Use the Link Aggregation Control Protocol (LACP) bonding mode
with MC-LAG domains configured on ToR switches. This corresponds to
the 802.3ad
bond mode on hosts.
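As an illustration only, an 802.3ad bond in a netplan-style host network definition, such as the ones used in Container Cloud L2 templates, might look like the following; the interface names are hypothetical:
bonds:
  bond0:
    interfaces:
      - enp9s0f0
      - enp9s0f1
    parameters:
      mode: 802.3ad
      mii-monitor-interval: 100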
Additionally, follow these recommendations in regards to bond interfaces:
Use ports from different multi-port NICs when creating bonds. This makes network connections redundant if failure of a single NIC occurs.
Configure the ports that connect servers to the PXE network with PXE VLAN as native or untagged. On these ports, configure LACP fallback to ensure that the servers can reach DHCP server and boot over network.
Spanning tree portfast mode¶
Configure Spanning Tree Protocol (STP) settings on the network switch ports to ensure that the ports start forwarding packets as soon as the link comes up. It helps avoid iPXE timeout issues and ensures reliable boot over network.
Storage¶
A MOSK cluster uses Ceph as a distributed storage system for file, block, and object storage exposed by the Container Cloud baremetal management cluster. This section provides an overview of a Ceph cluster deployed by Container Cloud.
Ceph overview¶
Mirantis Container Cloud deploys Ceph on the baremetal-based management and managed clusters using Helm charts with the following components:
- Rook Ceph Operator
A storage orchestrator that deploys Ceph on top of a Kubernetes cluster. Also known as
Rook
orRook Operator
. Rook operations include:Deploying and managing a Ceph cluster based on provided Rook CRs such as
CephCluster
,CephBlockPool
,CephObjectStore
, and so on.Orchestrating the state of the Ceph cluster and all its daemons.
KaaSCephCluster
custom resource (CR)Represents the customization of a Kubernetes installation and allows you to define the required Ceph configuration through the Container Cloud web UI before deployment. For example, you can define the failure domain, Ceph pools, Ceph node roles, number of Ceph components such as Ceph OSDs, and so on. The
ceph-kcc-controller
controller on the Container Cloud management cluster manages theKaaSCephCluster
CR.- Ceph Controller
A Kubernetes controller that obtains the parameters from Container Cloud through a CR, creates CRs for Rook and updates its CR status based on the Ceph cluster deployment progress. It creates users, pools, and keys for OpenStack and Kubernetes and provides Ceph configurations and keys to access them. Also, Ceph Controller eventually obtains the data from the OpenStack Controller for the Keystone integration and updates the RADOS Gateway services configurations to use Kubernetes for user authentication. Ceph Controller operations include:
Transforming user parameters from the Container Cloud Ceph CR into Rook CRs and deploying a Ceph cluster using Rook.
Providing integration of the Ceph cluster with Kubernetes.
Providing data for OpenStack to integrate with the deployed Ceph cluster.
- Ceph Status Controller
A Kubernetes controller that collects all valuable parameters from the current Ceph cluster, its daemons, and entities and exposes them into the
KaaSCephCluster
status. Ceph Status Controller operations include:Collecting all statuses from a Ceph cluster and corresponding Rook CRs.
Collecting additional information on the health of Ceph daemons.
Providing information to the
status
section of theKaaSCephCluster
CR.
- Ceph Request Controller
A Kubernetes controller that obtains the parameters from Container Cloud through a CR and manages Ceph OSD lifecycle management (LCM) operations. It allows for a safe Ceph OSD removal from the Ceph cluster. Ceph Request Controller operations include:
Providing an ability to perform Ceph OSD LCM operations.
Obtaining specific CRs to remove Ceph OSDs and executing them.
Pausing the regular Ceph Controller reconciliation until all requests are completed.
A typical Ceph cluster consists of the following components:
Ceph Monitors - three or, in rare cases, five Ceph Monitors.
Ceph Managers - one Ceph Manager in a regular cluster.
RADOS Gateway services - Mirantis recommends having three or more RADOS Gateway instances for HA.
Ceph OSDs - the number of Ceph OSDs may vary according to the deployment needs.
Warning
A Ceph cluster with 3 Ceph nodes does not provide hardware fault tolerance and is not eligible for recovery operations, such as a disk or an entire Ceph node replacement.
A Ceph cluster uses the replication factor that equals 3. If the number of Ceph OSDs is less than 3, a Ceph cluster moves to the degraded state with the write operations restriction until the number of alive Ceph OSDs equals the replication factor again.
The placement of Ceph Monitors and Ceph Managers is defined in the
KaaSCephCluster
CR.
The following diagram illustrates the way a Ceph cluster is deployed in Container Cloud:

The following diagram illustrates the processes within a deployed Ceph cluster:

Ceph limitations¶
A Ceph cluster configuration in Mirantis Container Cloud includes but is not limited to the following limitations:
Only one Ceph Controller per a management, regional, or managed cluster and only one Ceph cluster per Ceph Controller are supported.
The replication size for any Ceph pool must be set to more than 1.
Only one CRUSH tree per cluster. The separation of devices per Ceph pool is supported through device classes with only one pool of each type for a device class.
All CRUSH rules must have the same
failure_domain
.Only the following types of CRUSH buckets are supported:
topology.kubernetes.io/region
topology.kubernetes.io/zone
topology.rook.io/datacenter
topology.rook.io/room
topology.rook.io/pod
topology.rook.io/pdu
topology.rook.io/row
topology.rook.io/rack
topology.rook.io/chassis
RBD mirroring is not supported.
Consuming an existing Ceph cluster is not supported.
CephFS is not supported.
Only IPv4 is supported.
If two or more Ceph OSDs are located on the same device, there must be no dedicated WAL or DB for this class.
Only a full collocation or dedicated WAL and DB configurations are supported.
The minimum size of any defined Ceph OSD device is 5 GB.
Reducing the number of Ceph Monitors is not supported and causes the Ceph Monitor daemons removal from random nodes.
When adding a Ceph node with the Ceph Monitor role, if any issues occur with the Ceph Monitor,
rook-ceph
removes it and adds a new Ceph Monitor instead, named using the next alphabetic character in order. Therefore, the Ceph Monitor names may not follow the alphabetical order. For example,a
,b
,d
, instead ofa
,b
,c
.
StackLight¶
StackLight is the logging, monitoring, and alerting solution that provides a single pane of glass for cloud maintenance and day-to-day operations as well as offers critical insights into cloud health including operational information about the components deployed with Mirantis OpenStack for Kubernetes (MOSK). StackLight is based on Prometheus, an open-source monitoring solution and a time series database, and OpenSearch, the logs and notifications storage.
Deployment architecture¶
Mirantis OpenStack for Kubernetes (MOSK) deploys the StackLight stack as a release of a Helm chart that contains the helm-controller and HelmBundle custom resources. The StackLight HelmBundle consists of a set of Helm charts describing the StackLight components. Apart from the OpenStack-specific components below, StackLight also includes the components described in Mirantis Container Cloud Reference Architecture: Deployment architecture. By default, StackLight logging stack is disabled.
During the StackLight configuration when deploying a MOSK cluster, you can define the HA or non-HA StackLight architecture type. For details, see Mirantis Container Cloud Reference Architecture: StackLight database modes.
StackLight component |
Description |
---|---|
Prometheus native exporters and endpoints |
Export the existing metrics as Prometheus metrics and include:
|
Telegraf OpenStack plugin |
Collects and processes the OpenStack metrics. |
Monitored components¶
StackLight measures, analyzes, and reports in a timely manner about failures that may occur in the following Mirantis OpenStack for Kubernetes (MOSK) components and their sub-components. Apart from the components below, StackLight also monitors the components listed in Mirantis Container Cloud Reference Architecture: Monitored components.
Libvirt
Memcached
MariaDB
NTP
OpenStack (Barbican, Cinder, Designate, Glance, Heat, Horizon, Ironic, Keystone, Neutron, Nova, Octavia)
OpenStack SSL certificates
Open vSwitch
RabbitMQ
Tungsten Fabric (Cassandra, Kafka, Redis, ZooKeeper)
OpenSearch and Prometheus storage sizing¶
Caution
Calculations in this document are based on numbers from a real-scale test cluster with 34 nodes. The exact space required for metrics and logs must be calculated depending on the ongoing cluster operations. Some operations force the generation of additional metrics and logs. The values below are approximate. Use them only as recommendations.
During the deployment of a new cluster, you must specify the OpenSearch retention time and Persistent Volume Claim (PVC) size, Prometheus PVC, retention time, and retention size. When configuring an existing cluster, you can only set OpenSearch retention time, Prometheus retention time, and retention size.
The following table describes the recommendations for both OpenSearch
and Prometheus retention size and PVC size for a cluster with 34 nodes.
Retention time depends on the space allocated for the data. To calculate
the required retention time, use the
{retention time} = {retention size} / {amount of data per day}
formula.
Service |
Required space per day |
Description |
---|---|---|
OpenSearch |
|
When setting Persistent Volume Claim Size for OpenSearch during the cluster creation, take into account that it defines the PVC size for a single instance of the OpenSearch cluster. StackLight in HA mode has 3 OpenSearch instances. Therefore, for a total OpenSearch capacity, multiply the PVC size by 3. |
Prometheus |
|
Every Prometheus instance stores the entire database. Multiple replicas store multiple copies of the same data. Therefore, treat the Prometheus PVC size as the capacity of Prometheus in the cluster. Do not sum them up. Prometheus has built-in retention mechanisms based on the database size and time series duration stored in the database. Therefore, if you miscalculate the PVC size, setting the retention size to ~1 GB less than the PVC size prevents disk overfilling. |
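To illustrate the sizing rules above with purely hypothetical figures: with an OpenSearch PVC size of 2 TiB per instance and StackLight in HA mode, the total OpenSearch capacity consumed in the cluster is 3 x 2 TiB = 6 TiB. For Prometheus, in contrast, a 2 TiB PVC means roughly 2 TiB of effective capacity regardless of the number of replicas.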
See also
Tungsten Fabric¶
Tungsten Fabric provides basic L2/L3 networking to an OpenStack environment running on the MKE cluster and includes the IP address management, security groups, floating IP addresses, and routing policies functionality. Tungsten Fabric is based on overlay networking, where all virtual machines are connected to a virtual network with encapsulation (MPLSoGRE, MPLSoUDP, VXLAN). This enables you to separate the underlay Kubernetes management network. A workload requires an external gateway, such as a hardware EdgeRouter or a simple gateway to route the outgoing traffic.
The Tungsten Fabric vRouter uses different gateways for the control and data planes.
Tungsten Fabric cluster¶
All services of Tungsten Fabric are delivered as separate containers, which are deployed by the Tungsten Fabric Operator (TFO). Each container has an INI-based configuration file that is available on the host system. The configuration file is generated automatically upon the container start and is based on environment variables provided by the TFO through Kubernetes ConfigMaps.
The main Tungsten Fabric containers run with the host network as DaemonSets, without using the Kubernetes networking layer. The services listen directly on the host network interface.
The following diagram describes the minimum production installation of Tungsten Fabric with a Mirantis OpenStack for Kubernetes (MOSK) deployment.

For the details about the Tungsten Fabric services included in MOSK deployments and the types of traffic and traffic flow directions, see the subsections below.
Tungsten Fabric cluster components¶
This section describes the Tungsten Fabric services and their distribution across the Mirantis OpenStack for Kubernetes (MOSK) deployment.
The Tungsten Fabric services run mostly as DaemonSets in separate containers for each service. The deployment and update processes are managed by the Tungsten Fabric operator. However, Kubernetes manages the probe checks and restart of broken containers.
All configuration and control services run on the Tungsten Fabric Controller nodes.
Service name |
Service description |
---|---|
|
Exposes a REST-based interface for the Tungsten Fabric API. |
|
Collects data of the Tungsten Fabric configuration processes and sends
it to the Tungsten Fabric |
|
Communicates with the cluster gateways using BGP and with the vRouter agents using XMPP, as well as redistributes appropriate networking information. |
|
Collects the Tungsten Fabric Controller process data and sends
this information to the Tungsten Fabric |
|
Manages physical networking devices using |
|
Using the |
|
The customized Berkeley Internet Name Domain (BIND) daemon of
Tungsten Fabric that manages DNS zones for the |
|
Listens to configuration changes performed by a user and generates corresponding system configuration objects. In multi-node deployments, it works in the active-backup mode. |
|
Listens to configuration changes of |
|
Consists of the |
All analytics services run on Tungsten Fabric analytics nodes.
Service name |
Service description |
---|---|
|
Evaluates and manages the alarms rules. |
|
Provides a REST API to interact with the Cassandra analytics database. |
|
Collects all Tungsten Fabric analytics process data and sends
this information to the Tungsten Fabric |
|
Provisions the init model if needed. Collects data of the |
|
Collects and analyzes data from all Tungsten Fabric services. |
|
Handles the queries to access data from the Cassandra database. |
|
Receives the authorization and configuration of the physical routers
from the |
|
Reads the SNMP information from the physical router user-visible entities (UVEs), creates a neighbor list, and writes the neighbor information to the physical router UVEs. The Tungsten Fabric web UI uses the neighbor list to display the physical topology. |
The Tungsten Fabric vRouter provides data forwarding to an OpenStack tenant instance and reports statistics to the Tungsten Fabric analytics service. The Tungsten Fabric vRouter is installed on all OpenStack compute nodes. Mirantis OpenStack for Kubernetes (MOSK) supports the kernel-based deployment of the Tungsten Fabric vRouter.
Service name |
Service description |
---|---|
|
Connects to the Tungsten Fabric Controller container and the Tungsten Fabric DNS system using the Extensible Messaging and Presence Protocol (XMPP). The vRouter Agent acts as a local control plane. Each Tungsten Fabric vRouter Agent is connected to at least two Tungsten Fabric controllers in an active-active redundancy mode. The Tungsten Fabric vRouter Agent is responsible for all networking-related functions including routing instances, routes, and others. The Tungsten Fabric vRouter uses different gateways for the control and data planes. For example, the Linux system gateway is located on the management network, and the Tungsten Fabric gateway is located on the data plane network. |
|
Collects the supervisor |
The following diagram illustrates the Tungsten Fabric kernel vRouter set up by the TF operator:

On the diagram above, the following types of network interfaces are used:
eth0 - for the management (PXE) network (eth1 and eth2 are the slave interfaces of Bond0)
Bond0.x - for the MKE control plane network
Bond0.y - for the MKE data plane network
Service name |
Service description |
---|---|
|
|
|
The Kubernetes operator that enables the Cassandra clusters creation and management. |
|
Handles the messaging bus and generates alarms across the Tungsten Fabric analytics containers. |
|
The Kubernetes operator that enables Kafka clusters creation and management. |
|
Stores the physical router UVE storage and serves as a messaging bus for event notifications. |
|
The Kubernetes operator that enables Redis clusters creation and management. |
|
Holds the active-backup status for the |
|
The Kubernetes operator that enables ZooKeeper clusters creation and management. |
|
Exchanges messages between API servers and original request senders. |
|
The Kubernetes operator that enables RabbitMQ clusters creation and management. |
All Tungsten Fabric plugin services are installed on the OpenStack controller nodes.
Service name |
Service description |
---|---|
|
The Neutron server that includes the Tungsten Fabric plugin. |
|
The Octavia API that includes the Tungsten Fabric Octavia driver. |
|
The Heat API that includes the Tungsten Fabric Heat resources and templates. |
Available since MOSK 22.3
Along with the Tungsten Fabric services, MOSK deploys and updates special image precaching DaemonSets when a resource of kind TFOperator is created or the image references in it get updated. These DaemonSets precache container images on Kubernetes nodes, minimizing possible downtime when updating container images. The cloud operator can disable image precaching through the TFOperator resource.
See also
Tungsten Fabric traffic flow¶
This section describes the types of traffic and traffic flow directions in a Mirantis OpenStack for Kubernetes (MOSK) cluster.
The following diagram illustrates all types of UI and API traffic in a Mirantis OpenStack for Kubernetes cluster, including the monitoring and OpenStack API traffic. The OpenStack Dashboard pod hosts Horizon and acts as a proxy for all other types of traffic. TLS termination is also performed for this type of traffic.

SDN or Tungsten Fabric traffic goes through the overlay Data network and processes east-west and north-south traffic for applications that run in a MOSK cluster. This network segment typically contains tenant networks as separate MPLS-over-GRE and MPLS-over-UDP tunnels. The traffic load depends on the workload.
The control traffic between the Tungsten Fabric controllers, edge routers, and vRouters uses the XMPP with TLS and iBGP protocols. Both protocols produce low traffic that does not affect MPLS over GRE and MPLS over UDP traffic. However, this traffic is critical and must be reliably delivered. Mirantis recommends configuring higher QoS for this type of traffic.
The following diagram displays both MPLS over GRE/MPLS over UDP and iBGP and XMPP traffic examples in a MOSK cluster:

Tungsten Fabric lifecycle management¶
Mirantis OpenStack for Kubernetes (MOSK) provides the Tungsten Fabric lifecycle management including pre-deployment custom configurations, updates, data backup and restoration, as well as handling partial failure scenarios, by means of the Tungsten Fabric operator.
This section is intended for the cloud operators who want to gain insight into the capabilities provided by the Tungsten Fabric operator along with the understanding of how its architecture allows for easy management while addressing the concerns of users of Tungsten Fabric-based MOSK clusters.
Tungsten Fabric operator¶
The Tungsten Fabric operator (TFO) is based on the Kubernetes operator SDK project. The Kubernetes operator SDK is a framework that uses the controller-runtime library to make writing operators easier by providing the following:
High-level APIs and abstractions to write the operational logic more intuitively.
Tools for scaffolding and code generation to bootstrap a new project fast.
Extensions to cover common operator use cases.
The TFO deploys the following sub-operators. Each sub-operator handles a separate part of a TF deployment:
Sub-operator |
Description |
---|---|
TFControl |
Deploys the Tungsten Fabric control services, such as:
|
TFConfig |
Deploys the Tungsten Fabric configuration services, such as:
|
TFAnalytics |
Deploys the Tungsten Fabric analytics services, such as:
|
TFVrouter |
Deploys a vRouter on each compute node with the following services:
|
TFWebUI |
Deploys the following web UI services:
|
TFTool |
Deploys the following tools to verify the TF deployment status:
|
TFTest |
An operator to run Tempest tests. |
Besides the sub-operators that deploy TF services, TFO uses operators to deploy and maintain third-party services, such as different types of storage, cache, message system, and so on. The following table describes all third-party operators:
Operator |
Description |
---|---|
cassandra-operator |
An upstream operator that automates the Cassandra HA storage operations for the configuration and analytics data. |
zookeeper-operator |
An upstream operator for deployment and automation of a ZooKeeper cluster. |
kafka-operator |
An operator for the Kafka cluster used by analytics services. |
redis-operator |
An upstream operator that automates the Redis cluster deployment and keeps it healthy. |
rabbitmq-operator |
An operator for the messaging system based on RabbitMQ. |
The following diagram illustrates a simplified TFO workflow:

TFOperator custom resource¶
The resource of kind TFOperator
(TFO) is a custom resource (CR) defined by
a resource of kind CustomResourceDefinition
.
The CustomResourceDefinition
resource in Kubernetes uses the OpenAPI
Specification (OAS) version 2 to specify the schema of the defined resource.
The Kubernetes API outright rejects the resources that do not pass this schema
validation. Along with schema validation, starting from MOSK
21.6, TFOperator
uses ValidatingAdmissionWebhook
for extended
validations when a CR is created or updated.
For the list of configuration options available to a cloud operator, refer to Tungsten Fabric configuration. Also, check out the Tungsten Fabric API Reference document of the MOSK version that your cluster has been deployed with.
Available since MOSK 21.6
Tungsten Fabric Operator uses ValidatingAdmissionWebhook
to validate
environment variables set to Tungsten Fabric components upon the TFOperator
object creation or update. The following validations are performed:
Environment variables passed to TF components containers
Mapping between
tfVersion
andtfImageTag
, if definedSchedule and data capacity format for
tf-dbBackup
If required, you can disable ValidatingAdmissionWebhook
through the
TFOperator
HelmBundle resource:
apiVersion: lcm.mirantis.com/v1alpha1
kind: HelmBundle
metadata:
  name: tungstenfabric-operator
  namespace: tf
spec:
  releases:
  - name: tungstenfabric-operator
    values:
      admission:
        enabled: false
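To check whether the webhook is currently registered in your cluster, you can list the validating webhook configurations with a standard kubectl call; the grep pattern below is an assumption and the actual configuration name may differ between releases:
kubectl get validatingwebhookconfigurations | grep -i tungsten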
Environment variables |
TF components and containers |
---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
See also
Tungsten Fabric configuration¶
Mirantis OpenStack for Kubernetes (MOSK) allows you to easily adapt
your Tungsten Fabric deployment to the needs of your environment through the
TFOperator
custom resource.
This section includes custom configuration details available to you.
Cassandra configuration¶
This section describes the Cassandra configuration through the Tungsten Fabric Operator custom resource.
By default, Tungsten Fabric Operator sets up the following resource limits for Cassandra analytics and configuration StatefulSets:
Limits:
  cpu: 8
  memory: 32Gi
Requests:
  cpu: 1
  memory: 16Gi
This is a verified configuration suitable for most cases. However, if nodes
are under a heavy load, the KubeContainerCPUThrottlingHigh
StackLight alert
may raise for Tungsten Fabric Pods of the tf-cassandra-analytics
and
tf-cassandra-config
StatefulSets. If such alerts appear constantly, you can
increase the limits through the TFOperator
CR. For example:
spec:
  controllers:
    cassandra:
      deployments:
      - name: tf-cassandra-config
        resources:
          limits:
            cpu: "12"
            memory: 32Gi
          requests:
            cpu: "2"
            memory: 16Gi
      - name: tf-cassandra-analytics
        resources:
          limits:
            cpu: "12"
            memory: 32Gi
          requests:
            cpu: "2"
            memory: 16Gi
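A possible way to apply and verify such a change, assuming the TFOperator object resides in the tf namespace; the resource plural and the object name below are assumptions, adjust them to your deployment:
kubectl -n tf edit tfoperator openstack-tf
# After the operator reconciles the change, inspect the resulting StatefulSet resources.
# The container index below is an assumption; adjust it if needed.
kubectl -n tf get statefulset tf-cassandra-config -o jsonpath='{.spec.template.spec.containers[0].resources}'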
Available since MOSK 22.3
To specify custom configurations for Cassandra clusters, use the
configOptions
settings in the TFOperator
CR. For example, you may need
to increase the file cache size in case of a heavy load on the nodes labeled
with tfanalyticsdb=enabled
or tfconfigdb=enabled
:
spec:
  controllers:
    cassandra:
      deployments:
      - name: tf-cassandra-analytics
        configOptions:
          file_cache_size_in_mb: 1024
Custom vRouter settings¶
TechPreview
To specify custom settings for the Tungsten Fabric (TF) vRouter nodes, for
example, to change the name of the tunnel
network interface or enable debug
level logging on some subset of nodes, use the customSpecs
settings in
the TFOperator
CR.
For example, to enable debug level logging on a specific node or multiple nodes:
spec:
  controllers:
    tf-vrouter:
      agent:
        customSpecs:
        - name: debug
          label:
            name: <NODE-LABEL>
            value: <NODE-LABEL-VALUE>
          containers:
          - name: agent
            env:
            - name: LOG_LEVEL
              value: SYS_DEBUG
The customSpecs
parameter inherits all settings for the tf-vrouter
containers that are set on the spec:controllers:agent
level and overrides
or adds additional parameters. The example configuration above overrides the
logging level from SYS_INFO
, which is the default logging level, to
SYS_DEBUG
.
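For illustration, the node label that the customSpecs selector matches can be applied with a standard kubectl command; the label key and value below are the placeholders from the example above:
kubectl label node <node-name> <NODE-LABEL>=<NODE-LABEL-VALUE>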
Starting from MOSK 21.6, for clusters with a multi-rack
architecture, you may need to redefine the gateway IP for the Tungsten Fabric
vRouter nodes using the VROUTER_GATEWAY
parameter. For details,
see Multi-rack architecture.
Control plane traffic interface¶
By default, the TF control
service uses the management
interface for
the BGP and XMPP traffic. You can change the control
service interface
using the controlInterface
parameter in the TFOperator
CR, for example,
to combine the BGP and XMPP traffic with the data (tenant) traffic:
spec:
  settings:
    controlInterface: <tunnel-interface>
Traffic encapsulation¶
Tungsten Fabric implements cloud tenants’ virtual networks as Layer 3 overlays. Tenant traffic gets encapsulated into one of the supported protocols and is carried over the infrastructure network between 2 compute nodes or a compute node and an edge router device.
In addition, Tungsten Fabric is capable of exchanging encapsulated traffic with external systems in order to build advanced virtual networking topologies, for example, BGP VPN connectivity between 2 MOSK clouds or a MOSK cloud and a cloud tenant premises.
MOSK supports the following encapsulation protocols:
- MPLS over Generic Routing Encapsulation (GRE)
A traditional encapsulation method supported by several router vendors, including Cisco and Juniper. The feature is applicable when other encapsulation methods are not available. For example, an SDN gateway runs software that does not support MPLS over UDP.
- MPLS over User Datagram Protocol (UDP)
A variation of the MPLS over GRE mechanism. It is the default and the most frequently used option in MOSK. MPLS over UDP replaces headers in UDP packets. In this case, a UDP port stores a hash of the packet payload (entropy). It provides a significant benefit for equal-cost multi-path (ECMP) routing load balancing. MPLS over UDP and MPLS over GRE transfer Layer 3 traffic only.
- Virtual Extensible LAN (VXLAN)
Available since MOSK 22.1 TechPreview
The combination of VXLAN and EVPN technologies is often used for creating advanced cloud networking topologies. For example, it can provide transparent Layer 2 interconnections between Virtual Network Functions running on top of the cloud and physical traffic generator appliances hosted somewhere else.
The ENCAP_PRIORITY parameter defines the priority in which the encapsulation protocols are attempted to be used when setting the BGP VPN connectivity between the cloud and external systems.
By default, the encapsulation order is set to MPLSoUDP,MPLSoGRE,VXLAN. The cloud operator can change it depending on their needs in the TFOperator custom resource, as illustrated in Configuring encapsulation.
The list of supported encapsulated methods along with their order is shared between BGP peers as part of the capabilities information exchange when establishing a BGP session. Both parties must support the same encapsulation methods to build a tunnel for the network traffic.
For example, if the cloud operator wants to set up a Layer 2 VPN between the cloud and their network infrastructure, they configure the cloud’s virtual networks with VXLAN identifiers (VNIs) and do the same on the other side, for example, on a network switch. Also, VXLAN must be set in the first position in encapsulation priority order. Otherwise, VXLAN tunnels will not get established between endpoints, even though both endpoints may support the VXLAN protocol.
However, setting VXLAN first in the encapsulation priority order will not enforce VXLAN encapsulation between compute nodes or between compute nodes and gateway routers that use Layer 3 VPNs for communication.
The TFOperator
custom resource allows you to define encapsulation settings
for your Tungsten Fabric cluster.
Important
The TFOperator
CR must be the only place to configure
the cluster encapsulation. Performing these configurations through
the TF web UI, CLI, or API does not provide the configuration persistency,
and the settings defined this way may get reset to defaults during the
cluster services restart or update.
Note
Defining the default values for encapsulation parameters in the TF operator CR is unnecessary.
Parameter |
Default value |
Description |
---|---|---|
|
|
Defines the encapsulation priority order. |
|
|
Defines the Virtual Network ID type. The list of possible values includes:
Typically, for a Layer 2 VPN use case, the |
Example configuration:
controllers:
  tf-config:
    provisioner:
      containers:
      - env:
        - name: VXLAN_VN_ID_MODE
          value: automatic
        - name: ENCAP_PRIORITY
          value: VXLAN,MPLSoUDP,MPLSoGRE
        name: provisioner
Autonomous System Number (ASN)¶
In the routing fabric of a data centre, a MOSK cluster with Tungsten Fabric enabled can be represented either by a separate Autonomous System (AS) or as part of a bigger autonomous system. In either case, Tungsten Fabric needs to participate in the BGP peering, exchanging routes with external devices and within the cloud.
The Tungsten Fabric Controller acts as an internal (iBGP) route reflector for
the cloud’s AS by populating /32
routes pointing to VMs across all compute
nodes as well as the cloud’s edge gateway devices in case they belong to the
same AS. Apart from being an iBGP router reflector for the cloud’s AS, the
Tungsten Fabric Controller can act as a BGP peer for autonomous systems
external to the cloud, for example, for the AS configured across the data
center’s leaf-spine fabric.
The Autonomous System Number (ASN) setting contains the unique identifier of the autonomous system that the MOSK cluster with Tungsten Fabric belongs to. The ASN number does not affect the internal iBGP communication between vRouters running on the compute nodes. Such communication will work regardless of the ASN number settings. However, any network appliance that is not managed by the Tungsten Fabric control plane will have BGP configured manually. Therefore, the ASN settings should be configured accordingly on both sides. Otherwise, it would result in the inability to establish BGP sessions, regardless of whether the external device peers with Tungsten Fabric over iBGP or eBGP.
The TFOperator
custom resource enables you to define ASN settings for
your Tungsten Fabric cluster.
Important
The TFOperator
CR must be the only place to configure
the cluster ASN. Performing these configurations through the TF web UI,
CLI, or API does not provide the configuration persistency, and the
settings defined this way may get reset to defaults during the cluster
services restart or update.
Note
Defining the default values for ASN parameters in the TF operator CR is unnecessary.
Parameter |
Default value |
Description |
---|---|---|
|
|
Defines the control node’s Autonomous System Number (ASN). |
|
|
Enables the 4-byte ASN format. |
Example configuration:
controllers:
  tf-config:
    provisioner:
      containers:
      - env:
        - name: BGP_ASN
          value: "64515"
        - name: ENABLE_4BYTE_AS
          value: "true"
        name: provisioner
  tf-control:
    provisioner:
      containers:
      - env:
        - name: BGP_ASN
          value: "64515"
        name: provisioner
Access to external DNS¶
Available since MOSK 22.2
By default, the TF tf-control-dns-external
service is created to expose
TF control dns
. You can disable creation of this service using the
enableDNSExternal
parameter in the TFOperator
CR. For example:
spec:
  controllers:
    tf-control:
      dns:
        enableDNSExternal: false
If such a service was manually created before MOSK 22.2 with a name that differs from tf-control-dns-external, you can optionally delete the old service. If the name was the same, the service will be handled by the TF Operator.
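If you need to clean up such a manually created service, a generic sequence could look as follows; the service name is a placeholder for whatever name was used originally:
kubectl -n tf get svc
kubectl -n tf delete svc <old-dns-external-service-name>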
Gateway for vRouter data plane network¶
If an edge router is accessible from the data plane through a gateway, define
the VROUTER_GATEWAY
parameter in the TFOperator
custom resource.
Otherwise, the default system gateway is used.
spec:
  controllers:
    tf-vrouter:
      agent:
        containers:
        - name: agent
          env:
          - name: VROUTER_GATEWAY
            value: <data-plane-network-gateway>
You can also configure the parameter for Tungsten Fabric vRouter in the DPDK mode:
spec:
  controllers:
    tf-vrouter:
      agent-dpdk:
        enabled: true
        containers:
        - name: agent
          env:
          - name: VROUTER_GATEWAY
            value: <data-plane-network-gateway>
Tungsten Fabric image precaching¶
Available since MOSK 22.3
By default, MOSK deploys image precaching DaemonSets
to minimize possible downtime when updating container images. You can disable
creation of these DaemonSets by setting the imagePreCaching
parameter in
the TFOperator
custom resource to false
:
spec:
  settings:
    imagePreCaching: false
When you disable imagePreCaching
, the Tungsten Fabric Operator does not
automatically remove the image precaching DaemonSets that have already been
created. These DaemonSets do not affect the cluster setup. To remove them
manually:
kubectl -n tf delete daemonsets.apps -l app=tf-image-pre-caching
Tungsten Fabric services¶
This section explains the specifics of the Tungsten Fabric services provided by Mirantis OpenStack for Kubernetes (MOSK). The list of services and their supported features in this section is not exhaustive and is constantly amended based on the complexity of the architecture and the use of a particular service.
Tungsten Fabric load balancing¶
MOSK integrates Octavia with Tungsten Fabric through the OpenStack Octavia driver that uses the Tungsten Fabric HAProxy as a back end.
The Tungsten Fabric-based MOSK deployment supports creation, update, and deletion operations with the following standard load balancing API entities:
Load balancers
Note
For a load balancer creation operation, the driver supports only the vip-subnet-id argument; the vip-network-id argument is not supported.
Listeners
Pools
Health monitors
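For illustration, a minimal OpenStack CLI sequence that exercises these entities might look as follows; all names and IDs are placeholders, and note that the load balancer is created with --vip-subnet-id as required by the driver:
openstack loadbalancer create --name lb-test --vip-subnet-id <subnet-id>
openstack loadbalancer listener create --name listener-http --protocol HTTP --protocol-port 80 lb-test
openstack loadbalancer pool create --name pool-http --lb-algorithm ROUND_ROBIN --listener listener-http --protocol HTTP
openstack loadbalancer healthmonitor create --delay 5 --timeout 3 --max-retries 3 --type HTTP pool-http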
The Tungsten Fabric-based MOSK deployment does not support the following load balancing capabilities:
L7 load balancing capabilities, such as L7 policies, L7 rules, and others
Setting specific availability zones for load balancers and their resources
Use of the UDP protocol
Operations with Octavia quotas
Operations with Octavia flavors
Warning
The Tungsten Fabric-based MOSK deployment enables you to manage the load balancer resources by means of the OpenStack CLI or OpenStack Horizon. Do not perform any manipulations with the load balancer resources through the Tungsten Fabric web UI because in this case the changes will not be reflected on the OpenStack API side.
See also
Tungsten Fabric known limitations¶
This section contains a summary of the Tungsten Fabric upstream features and use cases not supported in MOSK, features and use cases offered as Technology Preview in the current product release if any, and known limitations of Tungsten Fabric in integration with other product components.
Feature or use case |
Status |
Description |
---|---|---|
Tungsten Fabric web UI |
Provided as is |
MOSK provides the TF web UI as is and does not include this service in the support Service Level Agreement |
Automatic generation of network port records in DNSaaS (Designate) |
Not supported |
As a workaround, you can use the Tungsten Fabric built-in DNS service that enables virtual machines to resolve each other's names |
Secret management (Barbican) |
Not supported |
It is not possible to use the certificates stored in Barbican to terminate HTTPS on a load balancer in a Tungsten Fabric deployment |
Role Based Access Control (RBAC) for Neutron objects |
Not supported |
|
Advanced Tungsten Fabric features |
Not supported |
Tungsten Fabric does not support the following upstream advanced features:
|
DPDK |
Technical Preview |
Node maintenance API¶
Available since MOSK 22.1
This section describes internal implementation of the node maintenance API and how OpenStack and Tungsten Fabric controllers communicate with LCM and each other during a managed cluster update.
Node maintenance API objects¶
The node maintenance API consists of the following objects:
Cluster level:
ClusterWorkloadLock
ClusterMaintenanceRequest
Node level:
NodeWorkloadLock
NodeMaintenanceRequest
WorkloadLock objects¶
The WorkloadLock
objects are created by each Application Controller.
These objects prevent LCM from performing any changes on the cluster or node
level while the lock is in the active state. The inactive state of the lock
means that the Application Controller has finished its work and the LCM can
proceed with the node or cluster maintenance.
apiVersion: lcm.mirantis.com/v1alpha1
kind: ClusterWorkloadLock
metadata:
  name: cluster-1-openstack
spec:
  controllerName: openstack
status:
  state: active # inactive;active;failed (default: active)
  errorMessage: ""
  release: "6.16.0+21.3"
apiVersion: lcm.mirantis.com/v1alpha1
kind: NodeWorkloadLock
metadata:
  name: node-1-openstack
spec:
  nodeName: node-1
  controllerName: openstack
status:
  state: active # inactive;active;failed (default: active)
  errorMessage: ""
  release: "6.16.0+21.3"
MaintenanceRequest objects¶
The MaintenanceRequest
objects are created by LCM. These objects notify
Application Controllers about the upcoming maintenance of a cluster or
a specific node.
apiVersion: lcm.mirantis.com/v1alpha1
kind: ClusterMaintenanceRequest
metadata:
  name: cluster-1
spec:
  scope: drain # drain;os
apiVersion: lcm.mirantis.com/v1alpha1
kind: NodeMaintenanceRequest
metadata:
  name: node-1
spec:
  nodeName: node-1
  scope: drain # drain;os
The scope
parameter in the object specification defines the impact on
the managed cluster or node. The list of available options includes:
drain
A regular managed cluster update. Each node in the cluster goes through a drain procedure. No node reboot takes place; the maximum impact is a restart of services on the node, including Docker, which causes the restart of all containers present on the node.
os
A node might be rebooted during the update. Triggers the workload evacuation by the OpenStack Controller.
When the MaintenanceRequest
object is created, an Application Controller
executes a handler to prepare workloads for maintenance and put appropriate
WorkloadLock
objects into the inactive state.
When maintenance is over, LCM removes MaintenanceRequest
objects,
and the Application Controllers move their WorkloadLocks
objects into
the active state.
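During maintenance, the current state of the locks can be inspected with standard kubectl queries; the lowercase resource names below follow the usual Kubernetes plural convention and are an assumption:
kubectl get clusterworkloadlocks -o custom-columns=NAME:.metadata.name,CONTROLLER:.spec.controllerName,STATE:.status.state
kubectl get nodeworkloadlocks -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName,STATE:.status.state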
OpenStack Controller maintenance API¶
When LCM creates the ClusterMaintenanceRequest
object, the OpenStack
Controller ensures that all OpenStack components are in the Healthy
state, which means that the pods are up and running, and the readiness
probes are passing.
The ClusterMaintenanceRequest object creation flow:
When LCM creates the NodeMaintenanceRequest
, the OpenStack Controller:
Prepares components on the node for maintenance by removing nova-compute from scheduling.
If the reboot of a node is possible, the instance migration workflow is triggered. The Operator can configure the instance migration flow through the Kubernetes node annotation and should define the required option before the managed cluster update.
To mitigate the potential impact on the cloud workloads, you can define the instance migration flow for the compute nodes running the most valuable instances.
The list of available options for the instance migration configuration includes:
The openstack.lcm.mirantis.com/instance_migration_mode annotation:
live
Default. The OpenStack controller live migrates instances automatically. The update mechanism tries to move the memory and local storage of all instances on the node to another node without interrupting them before applying any changes to the node. By default, the update mechanism makes three attempts to migrate each instance before falling back to the manual mode.
Note
Success of live migration depends on many factors including the selected vCPU type and model, the amount of data that needs to be transferred, the intensity of the disk IO and memory writes, the type of the local storage, and others. Instances using the following product features are known to have issues with live migration:
LVM-based ephemeral storage with and without encryption
Encrypted block storage volumes
CPU and NUMA node pinning
manual
The OpenStack Controller waits for the Operator to migrate instances from the compute node. When it is time to update the compute node, the update mechanism asks you to manually migrate the instances and proceeds only once you confirm the node is safe to update.
skip
The OpenStack Controller skips the instance check on the node and reboots it.
Note
For the clouds relying on the converged LVM with iSCSI block storage that offer persistent volumes in a remote edge sub-region, it is important to keep in mind that applying a major change to a compute node may impact not only the instances running on this node but also the instances attached to the LVM devices hosted there. We recommend that in such environments you perform the update procedure in the
manual
mode with mitigation measures taken by the Operator for each compute node. Otherwise, all the instances that have LVM with iSCSI volumes attached would need reboot to restore the connectivity.
- The openstack.lcm.mirantis.com/instance_migration_attempts annotation
Defines the number of times the OpenStack Controller attempts to migrate a single instance before giving up. Defaults to 3.
Note
You can also use annotations to control the update of non-compute nodes if they represent critical points of a specific cloud architecture. For example, setting the instance_migration_mode to manual on a controller node with a collocated gateway (Open vSwitch) will allow the Operator to gracefully shut down all the virtual routers hosted on this node.
If the OpenStack Controller cannot migrate instances due to errors, it is suspended unless all instances are migrated manually or the openstack.lcm.mirantis.com/instance_migration_mode annotation is set to skip.
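For example, to pin the migration behavior for a particular compute node before the update, you could set the annotations described above with kubectl; the node name and values are illustrative:
kubectl annotate node <compute-node-name> openstack.lcm.mirantis.com/instance_migration_mode=manual
kubectl annotate node <compute-node-name> openstack.lcm.mirantis.com/instance_migration_attempts=5 --overwrite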
The NodeMaintenanceRequest object creation flow:
When the node maintenance is over, LCM removes the NodeMaintenanceRequest
object and the OpenStack Controller:
Verifies that the Kubernetes Node becomes Ready.
Verifies that all OpenStack components on a given node are Healthy, which means that the pods are up and running, and the readiness probes are passing.
Ensures that the OpenStack components are connected to RabbitMQ. For example, the Neutron Agents become alive on the node, and compute instances are in the UP state.
Note
The OpenStack Controller allows only one NodeWorkloadLock object at a time to be in the inactive state. Therefore, the update process for nodes is sequential.
The NodeMaintenanceRequest object removal flow:
When the cluster maintenance is over, the OpenStack Controller sets the ClusterWorkloadLock object back to active, and the update completes.
The ClusterMaintenanceRequest object removal flow:
Tungsten Fabric Controller maintenance API¶
The Tungsten Fabric (TF) Controller creates and uses both types of
workloadlocks that include ClusterWorkloadLock
and NodeWorkloadLock
.
When the ClusterMaintenanceRequest
object is created, the TF Controller
verifies the TF cluster health status and proceeds as follows:
If the cluster is
Ready
, the TF Controller moves theClusterWorkloadLock
object to the inactive state.Otherwise, the TF Controller keeps the
ClusterWorkloadLock
object in the active state.
When the NodeMaintenanceRequest
object is created, the TF Controller
verifies the vRouter pod state on the corresponding node and proceeds as
follows:
If all containers are
Ready
, the TF Controller moves theNodeWorkloadLock
object to the inactive state.Otherwise, the TF Controller keeps the
NodeWorkloadLock
in the active state.
Note
If there is a NodeWorkloadLock
object in the inactive state
present in the cluster, the TF Controller does not process the
NodeMaintenanceRequest
object for other nodes until this inactive
NodeWorkloadLock
object becomes active.
When the cluster LCM removes the MaintenanceRequest
object, the TF
Controller waits for the vRouter pods to become ready and proceeds as follows:
If all containers are in the
Ready
state, the TF Controller moves theNodeWorkloadLock
object to the active state.Otherwise, the TF Controller keeps the
NodeWorkloadLock
object in the inactive state.
Cluster update flow¶
This section describes the MOSK cluster update flow to the product releases that contain major updates and require a node reboot, such as support for a new Linux kernel.
The diagram below illustrates the sequence of operations controlled by
LCM and taking place during the update under the hood. We assume that the
ClusterWorkloadLock
and NodeWorkloadLock
objects present in the cluster
are in the active state before the Cloud Operator triggers the update.
See also
For details about the Application Controllers flow during different maintenance stages, refer to:
Phase 1: The Operator triggers the update¶
The Operator sets appropriate annotations on nodes and selects suitable migration mode for workloads.
The Operator triggers the managed cluster update through the Mirantis Container Cloud web UI as described in Update the cluster to MOSK 22.1 or above: Step 3. Initiate MOSK cluster update.
LCM creates the
ClusterMaintenanceRequest
object and notifies the application controllers about planned maintenance.
Phase 2: LCM triggers the OpenStack and Ceph update¶
The OpenStack update starts.
Ceph is waiting for the OpenStack ClusterWorkloadLock object to become inactive.
When the OpenStack update is finalized, the OpenStack Controller marks ClusterWorkloadLock as inactive.
The Ceph Controller triggers an update of the Ceph cluster.
When the Ceph update is finalized, Ceph marks the ClusterWorkloadLock object as inactive.
Phase 3: LCM initiates the Kubernetes master nodes update¶
If a master node has collocated roles, LCM creates NodeMaintenanceRequest for the node.
All Application Controllers mark their NodeWorkloadLock objects for this node as inactive.
LCM starts draining the node by gracefully moving out all pods from the node. The DaemonSet pods are not evacuated and are left running.
LCM downloads the new version of the LCM Agent and runs its states.
Note
While running Ansible states, the services on the node may be restarted.
The above flow is applied to all Kubernetes master nodes one by one.
LCM removes NodeMaintenanceRequest.
Phase 4: LCM initiates the Kubernetes worker nodes update¶
LCM creates NodeMaintenanceRequest for the node, specifying the scope.
Application Controllers start preparing the node according to the scope.
LCM waits until all Application Controllers mark their NodeWorkloadLock objects for this node as inactive.
All pods are evacuated from the node by draining it. This does not apply to the DaemonSet pods, which cannot be removed.
LCM downloads the new version of the LCM Agent and runs its states.
Note
While running Ansible states, the services on the node may be restarted.
The above flow is applied to all Kubernetes worker nodes one by one.
LCM removes NodeMaintenanceRequest.
Phase 5: Finalization¶
LCM triggers the update for all other applications present in the cluster, such as StackLight, Tungsten Fabric, and others.
LCM removes
ClusterMaintenanceRequest
.
After a while the cluster update completes and becomes fully operable again.
Deployment Guide¶
Mirantis OpenStack for Kubernetes (MOSK) enables the operator to create, scale, update, and upgrade OpenStack deployments on Kubernetes through a declarative API.
The Kubernetes built-in features, such as flexibility, scalability, and declarative resource definition make MOSK a robust solution.
Plan the deployment¶
The detailed plan of any Mirantis OpenStack for Kubernetes (MOSK) deployment is determined on a per-cloud basis. For the MOSK reference architecture and design overview, see Reference Architecture.
Also, read through Mirantis Container Cloud Reference Architecture: Container Cloud bare metal as a MOSK cluster is deployed on top of a bare metal cluster managed by Mirantis Container Cloud.
Note
One of the industry best practices is to verify every new update or configuration change in a non-customer-facing environment before applying it to production. Therefore, Mirantis recommends having a staging cloud, deployed and maintained along with the production clouds. The recommendation is especially applicable to the environments that:
Receive updates often and use continuous delivery. For example, any non-isolated deployment of Mirantis Container Cloud.
Have significant deviations from the reference architecture or third party extensions installed.
Are managed under the Mirantis OpsCare program.
Run business-critical workloads where even the slightest application downtime is unacceptable.
A typical staging cloud is a complete copy of the production environment including the hardware and software configurations, but with a bare minimum of compute and storage capacity.
Provision a Container Cloud bare metal management cluster¶
The bare metal management system enables the Infrastructure Operator to deploy Container Cloud on a set of bare metal servers. It also enables Container Cloud to deploy MOSK clusters on bare metal servers without a pre-provisioned operating system.
To provision your bare metal management cluster, refer to Mirantis Container Cloud Deployment Guide: Deploy a baremetal-based management cluster
Create a managed cluster¶
After bootstrapping your baremetal-based Mirantis Container Cloud management cluster, you can create a baremetal-based managed cluster to deploy Mirantis OpenStack for Kubernetes using the Container Cloud API.
Add a bare metal host¶
Before creating a bare metal managed cluster, add the required number of bare metal hosts using CLI and YAML files for configuration. This section describes how to add bare metal hosts using the Container Cloud CLI during a managed cluster creation.
To add a bare metal host:
Verify that you configured each bare metal host as follows:
Enable the boot NIC support for UEFI load. Usually, at least the built-in network interfaces support it.
Enable the UEFI-LAN-OPROM support in BIOS -> Advanced -> PCI/PCIe.
Enable the IPv4-PXE stack.
Set the following boot order:
UEFI-DISK
UEFI-PXE
If your PXE network is not configured to use the first network interface, fix the UEFI-PXE boot order to speed up node discovery by selecting only one required network interface.
Power off all bare metal hosts.
Warning
Only one Ethernet port on a host must be connected to the Common/PXE network at any given time. The physical address (MAC) of this interface must be noted and used to configure the
BareMetalHost
object describing the host.Log in to the host where your management cluster
kubeconfig
is located and where kubectl is installed.Create a secret YAML file that describes the unique credentials of the new bare metal host.
Example of the bare metal host secret¶
apiVersion: v1
data:
  password: <credentials-password>
  username: <credentials-user-name>
kind: Secret
metadata:
  labels:
    kaas.mirantis.com/credentials: "true"
    kaas.mirantis.com/provider: baremetal
    kaas.mirantis.com/region: region-one
  name: <credentials-name>
  namespace: <managed-cluster-project-name>
type: Opaque
In the data section, add the IPMI user name and password in the base64 encoding to access the BMC. To obtain the base64-encoded credentials, you can use the following command in your Linux console:
echo -n <username|password> | base64
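For instance, encoding an illustrative user name admin produces the following; use your real BMC credentials instead:
echo -n admin | base64
# YWRtaW4=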
Caution
Each bare metal host must have a unique Secret.
Apply this secret YAML file to your deployment:
kubectl apply -f ${<bmh-cred-file-name>}.yaml
Create a YAML file that contains a description of the new bare metal host.
Example of the bare metal host configuration file with the worker role¶
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  labels:
    kaas.mirantis.com/baremetalhost-id: <unique-bare-metal-host-hardware-node-id>
    hostlabel.bm.kaas.mirantis.com/worker: "true"
    kaas.mirantis.com/provider: baremetal
    kaas.mirantis.com/region: region-one
  name: <bare-metal-host-unique-name>
  namespace: <managed-cluster-project-name>
spec:
  bmc:
    address: <ip_address_for-bmc-access>
    credentialsName: <credentials-name>
  bootMACAddress: <bare-metal-host-boot-mac-address>
  online: true
For a detailed fields description, see Mirantis Container Cloud API Reference: BareMetalHost.
Apply this configuration YAML file to your deployment:
kubectl apply -f ${<bare-metal-host-config-file-name>}.yaml
Verify the new BareMetalHost object status:
kubectl -n <managed-cluster-project-name> get bmh -o wide <bare-metal-host-unique-name>
Example of system response:
NAMESPACE    NAME   STATUS   STATE       CONSUMER   BMC                         BOOTMODE   ONLINE   ERROR   REGION
my-project   bmh1   OK       preparing              ip_address_for-bmc-access   legacy     true             region-one
During provisioning, the status changes as follows:
registering
inspecting
preparing
After BareMetalHost switches to the preparing stage, the inspecting phase finishes and you can verify that hardware information is available in the object status and matches the MOSK cluster hardware requirements.
For example:
Verify the status of hardware NICs:
kubectl -n <managed-cluster-project-name> get bmh <bare-metal-host-unique-name> -o json | jq -r '[.status.hardware.nics]'
Example of system response:
[ [ { "ip": "172.18.171.32", "mac": "ac:1f:6b:02:81:1a", "model": "0x8086 0x1521", "name": "eno1", "pxe": true }, { "ip": "fe80::225:90ff:fe33:d5ac%ens1f0", "mac": "00:25:90:33:d5:ac", "model": "0x8086 0x10fb", "name": "ens1f0" }, ...
Verify the status of RAM:
kubectl -n <managed-cluster-project-name> get bmh <bare-metal-host-unique-name> -o json | jq -r '[.status.hardware.ramMebibytes]'
Example of system response:
[ 98304 ]
Now, proceed with Create a custom bare metal host profile.
Create a custom bare metal host profile¶
The bare metal host profile is a Kubernetes custom resource. It enables the operator to define how the storage devices and the operating system are provisioned and configured.
This section describes the bare metal host profile default settings and configuration of custom profiles for managed clusters using Mirantis Container Cloud API.
Default configuration of the host system storage¶
The default host profile requires three storage devices in the following strict order:
- Boot device and operating system storage
This device contains boot data and operating system data. It is partitioned using the GUID Partition Table (GPT) labels. The root file system is an
ext4
file system created on top of an LVM logical volume. For a detailed layout, refer to the table below.
- Local volumes device
This device contains an
ext4
file system with directories mounted as persistent volumes to Kubernetes. These volumes are used by the Mirantis Container Cloud services to store its data, including monitoring and identity databases.
- Ceph storage device
This device is used as a Ceph datastore or Ceph OSD.
The following table summarizes the default configuration of the host system storage set up by the Container Cloud bare metal management.
Device/partition |
Name/Mount point |
Recommended size, GB |
Description |
---|---|---|---|
|
|
4 MiB |
The mandatory GRUB boot partition required for non-UEFI systems. |
|
|
0.2 GiB |
The boot partition required for the UEFI boot mode. |
|
|
64 MiB |
The mandatory partition for the |
|
|
100% of the remaining free space in the LVM volume group |
The main LVM physical volume that is used to create the root file system. |
|
|
100% of the remaining free space in the LVM volume group |
The LVM physical volume that is used to create the file system
for |
|
|
100% of the remaining free space in the LVM volume group |
Clean raw disk that will be used for the Ceph storage back end. |
Now, proceed to Create MOSK host profiles.
Create MOSK host profiles¶
Different types of MOSK nodes require differently configured host storage. This section describes how to create custom host profiles for different types of MOSK nodes.
You can create custom profiles for managed clusters using Container Cloud API.
To create MOSK bare metal host profiles:
Log in to the local machine where your management cluster
kubeconfig
is located and wherekubectl
is installed.Note
The management cluster
kubeconfig
is created automatically during the last stage of the management cluster bootstrap.
Create a new bare metal host profile for MOSK compute nodes in a YAML file under the
templates/bm/
directory.
Edit the host profile using the example template below to meet your hardware configuration requirements:
apiVersion: metal3.io/v1alpha1
kind: BareMetalHostProfile
metadata:
  name: <PROFILE_NAME>
  namespace: <PROJECT_NAME>
spec:
  devices:
  # From the HW node, obtain the first device, which size is at least 60Gib
  - device:
      workBy: "by_id,by_wwn,by_path,by_name"
      minSizeGiB: 60
      type: ssd
      wipe: true
    partitions:
    - name: bios_grub
      partflags:
      - bios_grub
      sizeGiB: 0.00390625
      wipe: true
    - name: uefi
      partflags:
      - esp
      sizeGiB: 0.2
      wipe: true
    - name: config-2
      sizeGiB: 0.0625
      wipe: true
    # This partition is only required on compute nodes if you plan to
    # use LVM ephemeral storage.
    - name: lvm_nova_part
      wipe: true
      sizeGiB: 100
    - name: lvm_root_part
      sizeGiB: 0
      wipe: true
  # From the HW node, obtain the second device, which size is at least 60Gib
  # If a device exists but does not fit the size,
  # the BareMetalHostProfile will not be applied to the node
  - device:
      workBy: "by_id,by_wwn,by_path,by_name"
      minSizeGiB: 60
      type: ssd
      wipe: true
  # From the HW node, obtain the disk device with the exact name
  - device:
      workBy: "by_id,by_wwn,by_path,by_name"
      minSizeGiB: 60
      wipe: true
    partitions:
    - name: lvm_lvp_part
      sizeGiB: 0
      wipe: true
  # Example of wiping a device w\o partitioning it.
  # Mandatory for the case when a disk is supposed to be used for Ceph back end
  # later
  - device:
      workBy: "by_id,by_wwn,by_path,by_name"
      wipe: true
  fileSystems:
  - fileSystem: vfat
    partition: config-2
  - fileSystem: vfat
    mountPoint: /boot/efi
    partition: uefi
  - fileSystem: ext4
    logicalVolume: root
    mountPoint: /
  - fileSystem: ext4
    logicalVolume: lvp
    mountPoint: /mnt/local-volumes/
  logicalVolumes:
  - name: root
    sizeGiB: 0
    vg: lvm_root
  - name: lvp
    sizeGiB: 0
    vg: lvm_lvp
  postDeployScript: |
    #!/bin/bash -ex
    echo $(date) 'post_deploy_script done' >> /root/post_deploy_done
  preDeployScript: |
    #!/bin/bash -ex
    echo $(date) 'pre_deploy_script done' >> /root/pre_deploy_done
  volumeGroups:
  - devices:
    - partition: lvm_root_part
    name: lvm_root
  - devices:
    - partition: lvm_lvp_part
    name: lvm_lvp
  grubConfig:
    defaultGrubOptions:
    - GRUB_DISABLE_RECOVERY="true"
    - GRUB_PRELOAD_MODULES=lvm
    - GRUB_TIMEOUT=20
  kernelParameters:
    sysctl:
      kernel.panic: "900"
      kernel.dmesg_restrict: "1"
      kernel.core_uses_pid: "1"
      fs.file-max: "9223372036854775807"
      fs.aio-max-nr: "1048576"
      fs.inotify.max_user_instances: "4096"
      vm.max_map_count: "262144"
Add or edit the mandatory parameters in the new
BareMetalHostProfile
object. For the parameters description, see Container Cloud API: BareMetalHostProfile spec.
Add the bare metal host profile to your management cluster:
kubectl --kubeconfig <pathToManagementClusterKubeconfig> -n <projectName> apply -f <pathToBareMetalHostProfileFile>
If required, further modify the host profile:
kubectl --kubeconfig <pathToManagementClusterKubeconfig> -n <projectName> edit baremetalhostprofile <hostProfileName>
Repeat the steps above to create host profiles for other OpenStack node roles such as control plane nodes and storage nodes.
Now, proceed to Enable huge pages in a host profile.
Enable huge pages in a host profile¶
The BareMetalHostProfile
API allows configuring a host to use the
huge pages feature of the Linux kernel on managed clusters.
Note
Huge pages is a mode of operation of the Linux kernel. With huge pages enabled, the kernel allocates the RAM in bigger chunks, or pages. This allows a KVM (kernel-based virtual machine) and VMs running on it to use the host RAM more efficiently and improves the performance of VMs.
To enable huge pages in a custom bare metal host profile for a managed cluster:
Log in to the local machine where your management cluster
kubeconfig
is located and wherekubectl
is installed.Note
The management cluster
kubeconfig
is created automatically during the last stage of the management cluster bootstrap.
Open for editing or create a new bare metal host profile under the
templates/bm/
directory.Edit the
grubConfig
section of the host profilespec
using the example below to configure the kernel boot parameters and enable huge pages:spec: grubConfig: defaultGrubOptions: - GRUB_DISABLE_RECOVERY="true" - GRUB_PRELOAD_MODULES=lvm - GRUB_TIMEOUT=20 - GRUB_CMDLINE_LINUX_DEFAULT="hugepagesz=1G hugepages=N"
The example configuration above will allocate N huge pages of 1 GB each on the server boot. The last hugepagesz parameter value is the default unless default_hugepagesz is defined. For details about possible values, see official Linux kernel documentation.
kubectl --kubeconfig <pathToManagementClusterKubeconfig> -n <projectName> apply -f <pathToBareMetalHostProfileFile>
If required, further modify the host profile:
kubectl --kubeconfig <pathToManagementClusterKubeconfig> -n <projectName> edit baremetalhostprofile <hostProfileName>
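After a machine is provisioned with this profile, you can verify on the host that the pages were reserved at boot; these are generic Linux checks, not MOSK-specific commands:
cat /proc/cmdline                  # should contain hugepagesz=1G hugepages=N
grep HugePages_Total /proc/meminfo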
Proceed to Create a MOSK cluster.
Configure RAID support¶
TechPreview
You can configure the support of the software-based Redundant Array of
Independent Disks (RAID) using BareMetalHostProfile
to set up an LVM-based
to set up an LVM-based
RAID level 1 (raid1
) or an mdadm-based RAID level 0, 1, or 10 (raid0
,
raid1
, or raid10
).
If required, you can further configure RAID in the same profile, for example, to install a cluster operating system onto a RAID device.
Caution
RAID configuration on already provisioned bare metal machines or on an existing cluster is not supported.
To start using any kind of RAIDs, reprovisioning of machines with a new
BaremetalHostProfile
is required.Mirantis supports the
raid1
type of RAID devices both for LVM and mdadm.Mirantis supports the
raid0
type for the mdadm RAID to be on par with the LVMlinear
type.Mirantis recommends having at least two physical disks for
raid0
andraid1
devices to prevent unnecessary complexity.Starting from MOSK 22.2, Mirantis supports the
raid10
type for mdadm RAID. At least four physical disks are required for this type of RAID.Only an even number of disks can be used for a
raid1
orraid10
device.
TechPreview
Warning
The EFI system partition partflags: ['esp']
must be
a physical partition in the main partition table of the disk, not under
LVM or mdadm software RAID.
During configuration of your custom bare metal host profile,
you can create an LVM-based software RAID device raid1
by adding
type: raid1
to the logicalVolume
spec in BaremetalHostProfile
.
For the LVM RAID parameters description, refer to Container Cloud API: BareMetalHostProfile spec.
For a bare metal host profile configuration, refer to Create a custom bare metal host profile.
Caution
The logicalVolume
spec of the raid1
type requires
at least two devices (partitions) in volumeGroup
where you
build a logical volume. For an LVM of the linear
type,
one device is enough.
Note
The LVM raid1
requires additional space to store the raid1
metadata on a volume group, roughly 4 MB for each partition.
Therefore, you cannot create a logical volume of exactly the same
size as the partitions it works on.
For example, if you have two partitions of 10 GiB, the corresponding
raid1
logical volume size will be less than 10 GiB. For that
reason, you can either set sizeGiB: 0
to use all available space
on the volume group, or set a smaller size than the partition size.
For example, use sizeGiB: 9.9
instead of sizeGiB: 10
for the logical volume.
The following example illustrates an extract of BaremetalHostProfile
with /
on the LVM raid1
.
...
devices:
- device:
workBy: "by_id,by_wwn,by_path,by_name"
minSizeGiB: 200
type: hdd
wipe: true
partitions:
- name: root_part1
sizeGiB: 120
partitions:
- name: rest_sda
sizeGiB: 0
- device:
workBy: "by_id,by_wwn,by_path,by_name"
minSizeGiB: 200
type: hdd
wipe: true
partitions:
- name: root_part2
sizeGiB: 120
partitions:
- name: rest_sdb
sizeGiB: 0
volumeGroups:
- name: vg-root
devices:
- partition: root_part1
- partition: root_part2
- name: vg-data
devices:
- partition: rest_sda
- partition: rest_sdb
logicalVolumes:
- name: root
type: raid1 ## <-- LVM raid1
vg: vg-root
sizeGiB: 119.9
- name: data
type: linear
vg: vg-data
sizeGiB: 0
fileSystems:
- fileSystem: ext4
logicalVolume: root
mountPoint: /
mountOpts: "noatime,nodiratime"
- fileSystem: ext4
logicalVolume: data
mountPoint: /mnt/data
TechPreview
Warning
The EFI system partition partflags: ['esp']
must be
a physical partition in the main partition table of the disk, not under
LVM or mdadm software RAID.
During configuration of your custom bare metal host profile as described in
Create a custom bare metal host profile, you can create an mdadm-based software RAID
device raid0
and raid1
by describing the mdadm devices under the
softRaidDevices
field in BaremetalHostProfile
. For example:
...
softRaidDevices:
- name: /dev/md0
devices:
- partition: sda1
- partition: sdb1
- name: raid-name
devices:
- partition: sda2
- partition: sdb2
...
Starting from MOSK 22.2, you can also use the raid10 type for the mdadm-based software RAID devices. This type requires at least four storage devices, and the total number of devices must be even.
For example:
softRaidDevices:
- name: /dev/md0
level: raid10
devices:
- partition: sda1
- partition: sdb1
- partition: sdc1
- partition: sdd1
The following fields in softRaidDevices describe RAID devices:
name
Name of the RAID device to refer to throughout the baremetalhostprofile.
level
Type or level of RAID used to create a device. Defaults to raid1. Set to raid0 or raid10 to create a device of the corresponding type.
devices
List of physical devices or partitions used to build a software RAID device. It must include at least two partitions or devices for raid0 and raid1 devices and at least four for raid10.
For the rest of the mdadm RAID parameters, see Container Cloud API: BareMetalHostProfile spec.
Caution
The mdadm RAID devices cannot be created on top of LVM devices.
The following example illustrates an extract of BaremetalHostProfile with / on the mdadm raid1 and some data storage on raid0:
Example with / on the mdadm raid1 and data storage on raid0
...
devices:
- device:
workBy: "by_id,by_wwn,by_path,by_name"
type: nvme
wipe: true
partitions:
- name: root_part1
sizeGiB: 120
partitions:
- name: rest_sda
sizeGiB: 0
- device:
workBy: "by_id,by_wwn,by_path,by_name"
type: nvme
wipe: true
partitions:
- name: root_part2
sizeGiB: 120
partitions:
- name: rest_sdb
sizeGiB: 0
softRaidDevices:
- name: root
level: raid1 ## <-- mdadm raid1
devices:
- partition: root_part1
- partition: root_part2
- name: data
level: raid0 ## <-- mdadm raid0
devices:
- partition: rest_sda
- partition: rest_sdb
fileSystems:
- fileSystem: ext4
softRaidDevice: root
mountPoint: /
mountOpts: "noatime,nodiratime"
- fileSystem: ext4
softRaidDevice: data
mountPoint: /mnt/data
...
The following example illustrates an extract of BaremetalHostProfile with data storage on a raid10 device:
Example with data storage on the mdadm raid10
...
devices:
- device:
workBy: "by_id,by_wwn,by_path,by_name"
minSizeGiB: 60
type: ssd
wipe: true
partitions:
- name: bios_grub1
partflags:
- bios_grub
sizeGiB: 0.00390625
wipe: true
- name: uefi
partflags:
- esp
sizeGiB: 0.20000000298023224
wipe: true
- name: config-2
sizeGiB: 0.0625
wipe: true
- name: lvm_root
sizeGiB: 0
wipe: true
- device:
workBy: "by_id,by_wwn,by_path,by_name"
minSizeGiB: 60
type: nvme
wipe: true
partitions:
- name: md_part1
partflags:
- raid
sizeGiB: 40
wipe: true
- device:
workBy: "by_id,by_wwn,by_path,by_name"
minSizeGiB: 60
type: nvme
wipe: true
partitions:
- name: md_part2
partflags:
- raid
sizeGiB: 40
wipe: true
- device:
workBy: "by_id,by_wwn,by_path,by_name"
minSizeGiB: 60
type: nvme
wipe: true
partitions:
- name: md_part3
partflags:
- raid
sizeGiB: 40
wipe: true
- device:
workBy: "by_id,by_wwn,by_path,by_name"
minSizeGiB: 60
type: nvme
wipe: true
partitions:
- name: md_part4
partflags:
- raid
sizeGiB: 40
wipe: true
fileSystems:
- fileSystem: vfat
partition: config-2
- fileSystem: vfat
mountPoint: /boot/efi
partition: uefi
- fileSystem: ext4
mountOpts: rw,noatime,nodiratime,lazytime,nobarrier,commit=240,data=ordered
mountPoint: /
partition: lvm_root
- fileSystem: ext4
mountPoint: /var
softRaidDevice: /dev/md0
softRaidDevices:
- devices:
- partition: md_part1
- partition: md_part2
- partition: md_part3
- partition: md_part4
level: raid10
metadata: "1.2"
name: /dev/md0
...
Create LVM volume groups on top of RAID devices¶
Available since MOSK 22.2 TechPreview
You can configure an LVM volume group on top of mdadm-based RAID devices used as physical volumes through the BareMetalHostProfile resource. List the required RAID devices in a separate field of the volumeGroups definition within the storage configuration of BareMetalHostProfile.
The following example illustrates an extract of BaremetalHostProfile with a volume group named lvm_nova to be created on top of an mdadm-based RAID device raid1:
...
devices:
- device:
workBy: "by_id,by_wwn,by_path,by_name"
minSizeGiB: 60
type: ssd
wipe: true
partitions:
- name: bios_grub
partflags:
- bios_grub
sizeGiB: 0.00390625
- name: uefi
partflags:
- esp
sizeGiB: 0.20000000298023224
- name: config-2
sizeGiB: 0.0625
- device:
workBy: "by_id,by_wwn,by_path,by_name"
minSizeGiB: 30
type: ssd
wipe: true
partitions:
- name: md0_part1
- device:
workBy: "by_id,by_wwn,by_path,by_name"
minSizeGiB: 30
type: ssd
wipe: true
partitions:
- name: md0_part2
softRaidDevices:
- devices:
- partition: md0_part1
- partition: md0_part2
level: raid1
metadata: "1.0"
name: /dev/md0
volumeGroups:
- devices:
- softRaidDevice: /dev/md0
name: lvm_nova
...
Create a MOSK cluster¶
With L2 networking templates, you can create MOSK clusters with advanced host networking configurations. For example, you can create bond interfaces on top of physical interfaces on the host or use multiple subnets to separate different types of network traffic.
You can use several host-specific L2 templates per one cluster to support different hardware configurations. For example, you can create L2 templates with a different number and layout of NICs to be applied to specific machines of one cluster.
You can also use multiple L2 templates to support different roles for nodes in a MOSK installation. You can create L2 templates with different logical interfaces and assign them to individual machines based on their roles in a MOSK cluster.
When you create a baremetal-based project in the Container Cloud web UI, the exemplary templates with the ipam/PreInstalledL2Template label are copied to this project. These templates are preinstalled during the management cluster bootstrap.
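For example, to check which exemplary templates were copied into your project, you can list the L2 templates by this label. The label value "1" below is an assumption; adjust the selector to the value used in your environment.
# List the exemplary L2 templates preinstalled into the project
kubectl --kubeconfig <pathToManagementClusterKubeconfig> \
  -n <MOSKClusterNamespace> get l2template \
  -l ipam/PreInstalledL2Template="1"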
Follow the procedures below to create MOSK clusters using L2 templates.
Create a managed bare metal cluster¶
This section instructs you on how to configure and deploy a managed cluster that is based on the baremetal-based management cluster through the Mirantis Container Cloud web UI.
To create a managed cluster on bare metal:
Log in to the Container Cloud web UI with the writer permissions.
Switch to the required project using the Switch Project action icon located on top of the main left-side navigation panel.
Caution
Do not create a new managed cluster for MOSK in the default project (Kubernetes namespace). If no projects are defined, create a new mosk project first.
In the SSH keys tab, click Add SSH Key to upload the public SSH key that will be used for the SSH access to VMs.
Optional. In the Proxies tab, enable proxy access to the managed cluster:
Click Add Proxy.
In the Add New Proxy wizard, fill out the form with the following parameters:
Proxy configuration¶ Parameter
Description
Proxy Name
Name of the proxy server to use during a managed cluster creation.
Region
From the drop-down list, select the required region.
HTTP Proxy
Add the HTTP proxy server domain name in the following format:
http://proxy.example.com:port - for anonymous access
http://user:password@proxy.example.com:port - for restricted access
HTTPS Proxy
Add the HTTPS proxy server domain name in the same format as for HTTP Proxy.
No Proxy
Comma-separated list of IP addresses or domain names.
For the list of Mirantis resources and IP addresses to be accessible from the Container Cloud clusters, see Reference Architecture: Requirements.
In the Clusters tab, click Create Cluster.
Configure the new cluster in the Create New Cluster wizard that opens:
Define general and Kubernetes parameters:
Create new cluster: General, Provider, and Kubernetes¶ Section
Parameter name
Description
General settings
Cluster name
The cluster name.
Provider
Select Baremetal.
Region
From the drop-down list, select Baremetal.
Release version
Select a Container Cloud version with the OpenStack label tag. Otherwise, you will not be able to deploy MOSK on this managed cluster.
Proxy
Optional. From the drop-down list, select the proxy server name that you have previously created.
SSH keys
From the drop-down list, select the SSH key name that you have previously added for SSH access to the bare metal hosts.
Provider
LB host IP
The IP address of the load balancer endpoint that will be used to access the Kubernetes API of the new cluster. This IP address must be on the Combined/PXE network.
LB address range
The range of IP addresses that can be assigned to load balancers for Kubernetes Services by MetalLB.
Kubernetes
Services CIDR blocks
The Kubernetes Services CIDR blocks. For example, 10.233.0.0/18.
Pods CIDR blocks
The Kubernetes pods CIDR blocks. For example, 10.233.64.0/18.
Configure StackLight:
StackLight configuration¶ Section
Parameter name
Description
StackLight
Enable Monitoring
Selected by default. Deselect to skip StackLight deployment.
Note
You can also enable, disable, or configure StackLight parameters after deploying a managed cluster. For details, see Mirantis Container Cloud Operations Guide:
Enable Logging
Select to deploy the StackLight logging stack. For details about the logging components, see Deployment architecture.
Note
The logging mechanism performance depends on the cluster log load. In case of a high load, you may need to increase the default resource requests and limits for fluentdLogs. For details, see Mirantis Container Cloud Operations Guide: StackLight resource limits.
HA Mode
Select to enable StackLight monitoring in the HA mode. For the differences between HA and non-HA modes, see Deployment architecture.
StackLight Default Logs Severity Level
Log severity (verbosity) level for all StackLight components. The default value for this parameter is Default component log level that respects original defaults of each StackLight component. For details about severity levels, see Mirantis Container Cloud Operations Guide: StackLight log verbosity.
StackLight Component Logs Severity Level
The severity level of logs for a specific StackLight component that overrides the value of the StackLight Default Logs Severity Level parameter. For details about severity levels, see Mirantis Container Cloud Operations Guide: StackLight log verbosity.
Expand the drop-down menu for a specific component to display its list of available log levels.
Elasticsearch 0
Logstash Retention Time Available since MOSK 22.3
Available if you select Enable Logging. Specifies the logstash-* index retention time.
Events Retention Time Available since MOSK 22.3
Available if you select Enable Logging. Specifies the kubernetes_events-* index retention time.
Notifications Retention Time Available since MOSK 22.3
Available if you select Enable Logging. Specifies the notification-* index retention time.
Retention Time Removed since MOSK 22.3
Available if you select Enable Logging. The OpenSearch logs retention period.
Persistent Volume Claim Size
Available if you select Enable Logging. The OpenSearch persistent volume claim size.
Collected Logs Severity Level
Available if you select Enable Logging. The minimum severity of all Container Cloud components logs collected in OpenSearch. For details about severity levels, see Mirantis Container Cloud Operations Guide: StackLight logging.
Prometheus
Retention Time
The Prometheus database retention period.
Retention Size
The Prometheus database retention size.
Persistent Volume Claim Size
The Prometheus persistent volume claim size.
Enable Watchdog Alert
Select to enable the Watchdog alert that fires as long as the entire alerting pipeline is functional.
Custom Alerts
Specify alerting rules for new custom alerts or upload a YAML file in the following exemplary format:
- alert: HighErrorRate
  expr: job:request_latency_seconds:mean5m{job="myjob"} > 0.5
  for: 10m
  labels:
    severity: page
  annotations:
    summary: High request latency
For details, see Official Prometheus documentation: Alerting rules. For the list of the predefined StackLight alerts, see Operations Guide: StackLight alerts.
StackLight Email Alerts
Enable Email Alerts
Select to enable the StackLight email alerts.
Send Resolved
Select to enable notifications about resolved StackLight alerts.
Require TLS
Select to enable transmitting emails through TLS.
Email alerts configuration for StackLight
Fill out the following email alerts parameters as required:
To - the email address to send notifications to.
From - the sender address.
SmartHost - the SMTP host through which the emails are sent.
Authentication username - the SMTP user name.
Authentication password - the SMTP password.
Authentication identity - the SMTP identity.
Authentication secret - the SMTP secret.
StackLight Slack Alerts
Enable Slack alerts
Select to enable the StackLight Slack alerts.
Send Resolved
Select to enable notifications about resolved StackLight alerts.
Slack alerts configuration for StackLight
Fill out the following Slack alerts parameters as required:
API URL - The Slack webhook URL.
Channel - The channel to send notifications to, for example, #channel-for-alerts.
- 0
Starting from MOSK 22.2, Elasticsearch has switched to OpenSearch. For details, see Elasticsearch switch to OpenSearch.
Click Create.
To monitor the cluster readiness, hover over the status icon of a specific cluster in the Status column of the Clusters page.
Once the orange blinking status icon is green and Ready, the cluster deployment or update is complete.
You can monitor live deployment status of the following cluster components:
Component
Description
Helm
Installation or upgrade status of all Helm releases
Kubelet
Readiness of the node in a Kubernetes cluster, as reported by kubelet
Kubernetes
Readiness of all requested Kubernetes objects
Nodes
Equality of the requested number of nodes in the cluster to the number of nodes having the Ready LCM status
OIDC
Readiness of the cluster OIDC configuration
StackLight
Health of all StackLight-related objects in a Kubernetes cluster
Swarm
Readiness of all nodes in a Docker Swarm cluster
LoadBalancer
Readiness of the Kubernetes API load balancer
ProviderInstance
Readiness of all machines in the underlying infrastructure (virtual or bare metal, depending on the provider type)
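If you prefer the CLI to the web UI, you can watch roughly the same readiness information by querying the Cluster and Machine objects on the management cluster. The exact status fields may differ between releases, so treat the commands below as a sketch only.
# Inspect the overall cluster status reported by the provider
kubectl --kubeconfig <pathToManagementClusterKubeconfig> \
  -n <MOSKClusterNamespace> get cluster <MOSKClusterName> -o yaml

# Machines report their LCM status in the same project
kubectl --kubeconfig <pathToManagementClusterKubeconfig> \
  -n <MOSKClusterNamespace> get machines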
Optional. Colocate the OpenStack control plane with the managed cluster Kubernetes manager nodes by adding the following field to the Cluster object spec:
spec:
  providerSpec:
    value:
      dedicatedControlPlane: false
Note
This feature is available as technical preview. Use such configuration for testing and evaluation purposes only.
Optional. Customize MetalLB speakers that are deployed on all Kubernetes nodes except master nodes by default. For details, see Configure the MetalLB speaker node selector.
Once you have created a MOSK cluster, some StackLight alerts may be raised as false positives until you deploy the Mirantis OpenStack environment.
Proceed to Workflow of network interface naming.
Workflow of network interface naming¶
To simplify operations with L2 templates, before you start creating them, inspect the general workflow of a network interface name gathering and processing.
Network interface naming workflow:
The Operator creates a baremetalHost object.
The baremetalHost object executes the introspection stage and becomes ready.
The Operator collects information about the NIC count, naming, and so on for further changes in the mapping logic.
At this stage, the NIC order in the object may randomly change during each introspection, but the NIC names are always the same. For more details, see Predictable Network Interface Names.
For example:
# Example commands:
# kubectl -n managed-ns get bmh baremetalhost1 -o custom-columns='NAME:.metadata.name,STATUS:.status.provisioning.state'
# NAME             STATE
# baremetalhost1   ready

# kubectl -n managed-ns get bmh baremetalhost1 -o yaml
# Example output:

apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
...
status:
  ...
  nics:
  - ip: fe80::ec4:7aff:fe6a:fb1f%eno2
    mac: 0c:c4:7a:6a:fb:1f
    model: 0x8086 0x1521
    name: eno2
    pxe: false
  - ip: fe80::ec4:7aff:fe1e:a2fc%ens1f0
    mac: 0c:c4:7a:1e:a2:fc
    model: 0x8086 0x10fb
    name: ens1f0
    pxe: false
  - ip: fe80::ec4:7aff:fe1e:a2fd%ens1f1
    mac: 0c:c4:7a:1e:a2:fd
    model: 0x8086 0x10fb
    name: ens1f1
    pxe: false
  - ip: 192.168.1.151 # Temporary PXE network address
    mac: 0c:c4:7a:6a:fb:1e
    model: 0x8086 0x1521
    name: eno1
    pxe: true
...
The Operator selects from the following options:
Create an l2template object with the ifMapping configuration. For details, see Create L2 templates.
Create a Machine object with the l2TemplateIfMappingOverride configuration. For details, see Override network interfaces naming and order.
The Operator creates a Machine or Subnet object.
The baremetal-provider service links the Machine object to the baremetalHost object.
The kaas-ipam and baremetal-provider services collect hardware information from the baremetalHost object and use it to configure host networking and services.
The kaas-ipam service:
Spawns the IpamHost object.
Renders the l2template object.
Spawns the ipaddr object.
Updates the IpamHost object status with all rendered and linked information.
The baremetal-provider service collects the rendered networking information from the IpamHost object.
The baremetal-provider service proceeds with the IpamHost object provisioning.
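To see the rendered result of this workflow for a particular host, you can inspect the IpamHost object directly. The resource name and the layout of the status fields may vary between releases, so the following commands are a sketch only.
# List IpamHost objects in the cluster project
kubectl --kubeconfig <pathToManagementClusterKubeconfig> \
  -n <MOSKClusterNamespace> get ipamhosts

# Inspect the rendered netplan configuration and linked IP addresses in the status
kubectl --kubeconfig <pathToManagementClusterKubeconfig> \
  -n <MOSKClusterNamespace> get ipamhost <hostName> -o yaml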
Now proceed to Create subnets.
Create subnets¶
Before creating an L2 template, ensure that you have the required subnets
that can be used in the L2 template to allocate IP addresses for the
MOSK cluster nodes.
Where required, create a number of subnets for a particular project using the Subnet CR. A subnet has three logical scopes:
global - the CR uses the default namespace. Such a subnet can be used for any cluster located in any project.
namespaced - the CR uses the namespace that corresponds to a particular project where MOSK clusters are located. Such a subnet can be used for any cluster located in the same project.
cluster - the CR uses the namespace where the referenced cluster is located. Such a subnet is only accessible to the cluster that L2Template.spec.clusterRef refers to. The Subnet objects with the cluster scope will be created for every new cluster.
You can have subnets with the same name in different projects. In this case, the subnet that has the same project as the cluster will be used. One L2 template may reference several subnets, and those subnets may have different scopes.
The IP address objects (IPaddr CR) that are allocated from subnets always have the same project as their corresponding IpamHost objects, regardless of the subnet scope.
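For example, to see which IP addresses have already been allocated for the hosts of a particular project, list the IPaddr objects in that project. The plural resource name below is an assumption; use kubectl api-resources to confirm it in your environment.
kubectl --kubeconfig <pathToManagementClusterKubeconfig> \
  -n <MOSKClusterNamespace> get ipaddrs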
To create subnets for a cluster:
Log in to a local machine where your management cluster kubeconfig is located and where kubectl is installed.
Note
The management cluster kubeconfig is created during the last stage of the management cluster bootstrap.
Create the subnet.yaml file with a number of global or namespaced subnets depending on the configuration of your cluster, and apply it:
kubectl --kubeconfig <pathToManagementClusterKubeconfig> apply -f <SubnetFileName.yaml>
Note
In the command above and in the steps below, substitute the parameters enclosed in angle brackets with the corresponding values.
Example of a subnet.yaml file:
apiVersion: ipam.mirantis.com/v1alpha1
kind: Subnet
metadata:
  name: demo
  namespace: demo-namespace
spec:
  cidr: 10.11.0.0/24
  gateway: 10.11.0.9
  includeRanges:
    - 10.11.0.5-10.11.0.70
  nameservers:
    - 172.18.176.6
Specification fields of the Subnet object¶ Parameter
Description
cidr (singular)
A valid IPv4 CIDR, for example, 10.11.0.0/24.
includeRanges (list)
A list of IP address ranges within the given CIDR that should be used in the allocation of IPs for nodes (excluding the gateway address). The IPs outside the given ranges will not be used in the allocation. Each element of the list can be either an interval 10.11.0.5-10.11.0.70 or a single address 10.11.0.77. In the example above, the addresses 10.11.0.5-10.11.0.70 (excluding the gateway address 10.11.0.9) will be allocated for nodes. The includeRanges parameter is mutually exclusive with excludeRanges.
excludeRanges (list)
A list of IP address ranges within the given CIDR that should not be used in the allocation of IPs for nodes. The IPs within the given CIDR but outside the given ranges will be used in the allocation (excluding the gateway address). Each element of the list can be either an interval 10.11.0.5-10.11.0.70 or a single address 10.11.0.77. The excludeRanges parameter is mutually exclusive with includeRanges.
useWholeCidr (boolean)
If set to true, the subnet address (10.11.0.0 in the example above) and the broadcast address (10.11.0.255 in the example above) are included into the address allocation for nodes. Otherwise (false by default), the subnet address and broadcast address will be excluded from the address allocation.
gateway (singular)
A valid gateway address, for example, 10.11.0.9.
nameservers (list)
A list of the IP addresses of name servers. Each element of the list is a single address, for example, 172.18.176.6.
Caution
The subnet for the PXE network is automatically created during deployment and must contain the ipam/DefaultSubnet: "1" label. Each bare metal region must have only one subnet with this label.
Caution
You may use different subnets to allocate IP addresses to different Container Cloud components in your cluster. See below for the detailed list of available options. Each subnet that is used to configure a Container Cloud service must be labeled with a special service label that starts with the ipam/SVC- prefix. Make sure that no subnet has more than one such label.
Optional. Add a subnet for the MetalLB service in your cluster. To designate a Subnet as a MetalLB address pool, use the ipam/SVC-MetalLB label key and set its value to "1". Set the cluster.sigs.k8s.io/cluster-name label to the name of the cluster where the subnet is used. You may create multiple subnets with the ipam/SVC-MetalLB label to define multiple IP address ranges for MetalLB in the cluster (see the example sketch after the notes below).
Caution
The IP addresses of the MetalLB address pool are not assigned to the interfaces on hosts. This is a purely virtual subnet. Make sure that it is not included in the L2 template definitions for your cluster.
Caution
Intersection of IP address ranges within any single MetalLB address pool is not permitted. Make sure that this requirement is satisfied when configuring MetalLB address pools.
Since MOSK 22.3, intersection of IP address ranges is verified by the bare metal provider. If intersection is identified, the MetalLB configuration will be blocked and the provider logs will contain corresponding error messages.
Note
When MetalLB address ranges are defined in both the cluster specification and specific Subnet objects, the resulting MetalLB address pools configuration will contain address ranges from both the cluster specification and the Subnet objects.
All address ranges for L2 address pools that are defined in both the cluster specification and Subnet objects are aggregated into a single L2 address pool and sorted as strings.
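The following extract is a minimal sketch of such a MetalLB Subnet. The label key casing, the value "1", and the address range are assumptions; align them with the conventions used elsewhere in your deployment.
apiVersion: ipam.mirantis.com/v1alpha1
kind: Subnet
metadata:
  name: mosk-metallb
  namespace: <MOSKClusterNamespace>
  labels:
    kaas.mirantis.com/provider: baremetal
    kaas.mirantis.com/region: region-one
    ipam/SVC-MetalLB: "1"
    cluster.sigs.k8s.io/cluster-name: <MOSKClusterName>
spec:
  cidr: 172.16.45.0/24
  includeRanges:
    - 172.16.45.101-172.16.45.200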
Optional. Technology Preview. Add a subnet for the externally accessible API endpoint of the MOSK cluster.
Make sure that
loadBalancerHost
is set to""
(empty string) in theCluster
spec.spec: providerSpec: value: apiVersion: baremetal.k8s.io/v1alpha1 kind: BaremetalClusterProviderSpec ... loadBalancerHost: ""
Create a subnet with the
ipam/SVC-LBhost
label having the"1"
value to make thebaremetal-provider
use this subnet for allocation of addresses for cluster API endpoints.
One IP address will be allocated for each cluster to serve its Kubernetes/MKE API endpoint.
Caution
Make sure that master nodes have host local-link addresses in the same subnet as the cluster API endpoint address. These host IP addresses will be used for VRRP traffic. The cluster API endpoint address will be assigned to the same interface on one of the master nodes where these host IPs are assigned.
Note
We highly recommend that you assign the cluster API endpoint address from the LCM network. For details on cluster networks types, refer to MOSK cluster networking. See also the Single MOSK cluster use case example in the following table.
You can define the allocation scope of API endpoint addresses using subnets in several ways:
Use case
Example configuration
Several MOSK clusters in a region
Create a subnet in the default namespace with no reference to any cluster:
apiVersion: ipam.mirantis.com/v1alpha1
kind: Subnet
metadata:
  name: lbhost-per-region
  namespace: default
  labels:
    kaas.mirantis.com/provider: baremetal
    kaas.mirantis.com/region: region-one
    ipam/SVC-LBhost: "1"
spec:
  cidr: 191.11.0.0/24
  includeRanges:
    - 191.11.0.6-191.11.0.20
Several MOSK clusters in a project
Create a subnet in a namespace corresponding to your project with no reference to any cluster. Such a subnet has priority over the one described above.
apiVersion: ipam.mirantis.com/v1alpha1
kind: Subnet
metadata:
  name: lbhost-per-namespace
  namespace: my-project
  labels:
    kaas.mirantis.com/provider: baremetal
    kaas.mirantis.com/region: region-one
    ipam/SVC-LBhost: "1"
spec:
  cidr: 191.11.0.0/24
  includeRanges:
    - 191.11.0.6-191.11.0.20
Single MOSK cluster
Create a subnet in a namespace corresponding to your project with a reference to the target cluster using the cluster.sigs.k8s.io/cluster-name label. Such a subnet has priority over the ones described above. In this case, it is not obligatory to use a dedicated subnet for the allocation of API endpoint addresses. You can add the ipam/SVC-LBhost label to the LCM subnet, and one of the addresses from this subnet will be allocated for the API endpoint:
apiVersion: ipam.mirantis.com/v1alpha1
kind: Subnet
metadata:
  name: lbhost-per-namespace
  namespace: my-project
  labels:
    kaas.mirantis.com/provider: baremetal
    kaas.mirantis.com/region: region-one
    ipam/SVC-LBhost: "1"
    ipam/SVC-k8s-lcm: "1"
    cluster.sigs.k8s.io/cluster-name: my-cluster
spec:
  cidr: 10.11.0.0/24
  includeRanges:
    - 10.11.0.6-10.11.0.50
The above options can be used in conjunction. For example, you can define a subnet for a region, a number of subnets within this region defined for particular namespaces, and a number of subnets within the same region and namespaces defined for particular clusters.
Optional. Add a subnet(s) for the storage access network.
Set the ipam/SVC-ceph-public label with the value "1" to create a subnet that will be used to configure the Ceph public network.
Ceph will automatically use this subnet for its external connections.
A Ceph OSD will look for and bind to an address from this subnet when it is started on a machine.
Use this subnet in the L2 template for storage nodes.
Assign this subnet to the interface connected to your storage access network.
When using this label, set the cluster.sigs.k8s.io/cluster-name label to the name of the target cluster during the subnet creation.
Optional. Add a subnet(s) for the storage replication network.
Set the ipam/SVC-ceph-cluster label with the value "1" to create a subnet that will be used to configure the Ceph replication network.
Ceph will automatically use this subnet for its internal replication traffic.
Use this subnet in the L2 template for storage nodes.
When using this label, set the cluster.sigs.k8s.io/cluster-name label to the name of the target cluster during the subnet creation.
Optional. Add a subnet for the Kubernetes pods traffic.
Caution
Use of a dedicated network for Kubernetes pods traffic, for external connection to the Kubernetes services exposed by the cluster, and for the Ceph cluster access and replication traffic is available as Technology Preview. Use such configurations for testing and evaluation purposes only. For the Technology Preview feature definition, refer to Technology Preview features.
The following feature is still under development and will be announced in one of the following Container Cloud releases:
Switching Kubernetes API to listen to the specified IP address on the node
Verify that the subnet is successfully created:
kubectl get subnet kaas-mgmt -oyaml
In the system output, verify the
status
fields of thesubnet.yaml
file using the table below.Status fields of the Subnet object¶ Parameter
Description
statusMessage
Contains a short state description and a more detailed one if applicable. The short status values are as follows:
OK - operational.
ERR - non-operational. This status has a detailed description, for example, ERR: Wrong includeRange for CIDR….
cidr
Reflects the actual CIDR, has the same meaning as spec.cidr.
gateway
Reflects the actual gateway, has the same meaning as spec.gateway.
nameservers
Reflects the actual name servers, has the same meaning as spec.nameservers.
ranges
Specifies the address ranges that are calculated using the fields from spec: cidr, includeRanges, excludeRanges, gateway, useWholeCidr. These ranges are directly used for nodes IP allocation.
lastUpdate
Includes the date and time of the latest update of the Subnet CR.
allocatable
Includes the number of currently available IP addresses that can be allocated for nodes from the subnet.
allocatedIPs
Specifies the list of IPv4 addresses with the corresponding IPaddr object IDs that were already allocated from the subnet.
capacity
Contains the total number of IP addresses being held by ranges, which equals the sum of the allocatable and allocatedIPs parameter values.
versionIpam
Contains the version of the kaas-ipam component that made the latest changes to the Subnet CR.
Example of a successfully created subnet:
apiVersion: ipam.mirantis.com/v1alpha1
kind: Subnet
metadata:
  labels:
    ipam/UID: 6039758f-23ee-40ba-8c0f-61c01b0ac863
    kaas.mirantis.com/provider: baremetal
    kaas.mirantis.com/region: region-one
  name: kaas-mgmt
  namespace: default
spec:
  cidr: 10.0.0.0/24
  excludeRanges:
  - 10.0.0.100
  - 10.0.0.101-10.0.0.120
  gateway: 10.0.0.1
  includeRanges:
  - 10.0.0.50-10.0.0.90
  nameservers:
  - 172.18.176.6
status:
  allocatable: 38
  allocatedIPs:
  - 10.0.0.50:0b50774f-ffed-11ea-84c7-0242c0a85b02
  - 10.0.0.51:1422e651-ffed-11ea-84c7-0242c0a85b02
  - 10.0.0.52:1d19912c-ffed-11ea-84c7-0242c0a85b02
  capacity: 41
  cidr: 10.0.0.0/24
  gateway: 10.0.0.1
  lastUpdate: "2020-09-26T11:40:44Z"
  nameservers:
  - 172.18.176.6
  ranges:
  - 10.0.0.50-10.0.0.90
  statusMessage: OK
  versionIpam: v3.0.999-20200807-130909-44151f8
Now, proceed with creating subnets for your MOSK cluster as described in Create subnets for a MOSK cluster.
Create subnets for a MOSK cluster¶
According to the MOSK reference architecture, you should create the following subnets.
lcm-nw¶
The LCM network of the MOSK cluster. Example of lcm-nw:
apiVersion: ipam.mirantis.com/v1alpha1
kind: Subnet
metadata:
labels:
kaas.mirantis.com/provider: baremetal
kaas.mirantis.com/region: region-one
kaas-mgmt-subnet: ""
name: lcm-nw
namespace: <MOSKClusterNamespace>
spec:
cidr: 172.16.43.0/24
gateway: 172.16.43.1
includeRanges:
- 172.16.43.10-172.16.43.100
k8s-ext-subnet¶
The addresses from this subnet are assigned to interfaces connected to the external network.
Example of k8s-ext-subnet:
apiVersion: ipam.mirantis.com/v1alpha1
kind: Subnet
metadata:
labels:
kaas.mirantis.com/provider: baremetal
kaas.mirantis.com/region: region-one
name: k8s-ext-subnet
namespace: <MOSKClusterNamespace>
spec:
cidr: 172.16.45.0/24
includeRanges:
- 172.16.45.10-172.16.45.100
mosk-metallb-subnet¶
This subnet is not allocated to interfaces but is used as a MetalLB address pool to expose MOSK API endpoints as Kubernetes cluster services.
Example of mosk-metallb-subnet:
apiVersion: ipam.mirantis.com/v1alpha1
kind: Subnet
metadata:
labels:
kaas.mirantis.com/provider: baremetal
kaas.mirantis.com/region: region-one
ipam/SVC-MetalLB: "1"
name: mosk-metallb-subnet
namespace: <MOSKClusterNamespace>
spec:
cidr: 172.16.45.0/24
includeRanges:
- 172.16.45.101-172.16.45.200
k8s-pods-subnet¶
The addresses from this subnet are assigned to interfaces connected to the internal network and are used by Calico as an underlay for traffic between the pods in the Kubernetes cluster.
Example of k8s-pods-subnet:
apiVersion: ipam.mirantis.com/v1alpha1
kind: Subnet
metadata:
labels:
kaas.mirantis.com/provider: baremetal
kaas.mirantis.com/region: region-one
name: k8s-pods-subnet
namespace: <MOSKClusterNamespace>
spec:
cidr: 10.12.3.0/24
includeRanges:
- 10.12.3.10-10.12.3.100
neutron-tunnel-subnet¶
The underlay network for VXLAN tunnels for the MOSK tenant traffic. If deployed with Tungsten Fabric, it is used for MPLS over UDP+GRE traffic.
Example of neutron-tunnel-subnet:
apiVersion: ipam.mirantis.com/v1alpha1
kind: Subnet
metadata:
labels:
kaas.mirantis.com/provider: baremetal
kaas.mirantis.com/region: region-one
name: neutron-tunnel-subnet
namespace: <MOSKClusterNamespace>
spec:
cidr: 10.12.2.0/24
includeRanges:
- 10.12.2.10-10.12.2.100
ceph-public-subnet¶
Example of a Ceph cluster access network:
apiVersion: ipam.mirantis.com/v1alpha1
kind: Subnet
metadata:
labels:
kaas.mirantis.com/provider: baremetal
kaas.mirantis.com/region: region-one
ipam/SVC-ceph-public: "1"
name: ceph-public-subnet
namespace: <MOSKClusterNamespace>
spec:
cidr: 10.12.0.0/24
ceph-cluster-subnet¶
Example of the Ceph replication traffic network:
apiVersion: ipam.mirantis.com/v1alpha1
kind: Subnet
metadata:
labels:
kaas.mirantis.com/provider: baremetal
kaas.mirantis.com/region: region-one
ipam/SVC-ceph-cluster: "1"
name: ceph-cluster-subnet
namespace: <MOSKClusterNamespace>
spec:
cidr: 10.12.1.0/24
Now, proceed with creating an L2 template for one or multiple managed clusters as described in Create L2 templates.
Create L2 templates¶
After you create subnets for the MOSK cluster as described in Create subnets, follow the procedure below to create L2 templates for different types of OpenStack nodes in the cluster.
See the following subsections for templates that implement the MOSK Reference Architecture: Networking. You may adjust the templates according to the requirements of your architecture using the last two subsections of this section. They explain mandatory parameters of the templates and supported configuration options.
Warning
Avoid modifying existing L2 templates and subnets that the deployed machines use, because unsafe changes can cause failures across multiple clusters. The list of risks posed by modifying L2 templates includes:
Services running on hosts cannot reconfigure automatically to switch to the new IP addresses and/or interfaces.
Connections between services are interrupted unexpectedly, which can cause data loss.
Incorrect configurations on hosts can lead to irrevocable loss of connectivity between services and unexpected cluster partition or disassembly.
Note
Starting from MOSK 22.3, modification of L2 templates in use is prohibited in the API to prevent accidental cluster failures due to unsafe changes.
According to the reference architecture, the Kubernetes manager nodes in the MOSK cluster must be connected to the following networks:
PXE network
LCM network
Caution
If you plan to deploy a MOSK cluster with the compact control plane option, skip this section entirely and proceed with Create an L2 template for a MOSK controller node.
To create an L2 template for Kubernetes manager nodes:
Create or open the mosk-l2templates.yml file that contains the L2 templates you are preparing.
Add an L2 template using the following example. Adjust the values of specific parameters according to the specifications of your environment.
L2 template example¶
apiVersion: ipam.mirantis.com/v1alpha1
kind: L2Template
metadata:
  labels:
    kaas.mirantis.com/provider: baremetal
    kaas.mirantis.com/region: region-one
    cluster.sigs.k8s.io/cluster-name: <MOSKClusterName>
  name: k8s-manager
  namespace: <MOSKClusterNamespace>
spec:
  autoIfMappingPrio:
  - provision
  - eno
  - ens
  - enp
  clusterRef: <MOSKClusterName>
  l3Layout:
    - subnetName: lcm-nw
      scope: global
      labelSelector:
        kaas.mirantis.com/provider: baremetal
        kaas-mgmt-subnet: ""
  npTemplate: |-
    version: 2
    ethernets:
      {{nic 0}}:
        dhcp4: false
        dhcp6: false
        match:
          macaddress: {{mac 0}}
        set-name: {{nic 0}}
        mtu: 9000
      {{nic 1}}:
        dhcp4: false
        dhcp6: false
        match:
          macaddress: {{mac 1}}
        set-name: {{nic 1}}
        mtu: 9000
      {{nic 2}}:
        dhcp4: false
        dhcp6: false
        match:
          macaddress: {{mac 2}}
        set-name: {{nic 2}}
        mtu: 9000
      {{nic 3}}:
        dhcp4: false
        dhcp6: false
        match:
          macaddress: {{mac 3}}
        set-name: {{nic 3}}
        mtu: 9000
    bonds:
      bond0:
        mtu: 9000
        parameters:
          mode: 802.3ad
        interfaces:
        - {{nic 0}}
        - {{nic 1}}
    vlans:
      k8s-lcm-v:
        id: 403
        link: bond0
        mtu: 9000
      k8s-ext-v:
        id: 409
        link: bond0
        mtu: 9000
      k8s-pods-v:
        id: 408
        link: bond0
        mtu: 9000
    bridges:
      k8s-lcm:
        interfaces: [k8s-lcm-v]
        nameservers:
          addresses: {{nameservers_from_subnet "lcm-nw"}}
        gateway4: {{ gateway_from_subnet "lcm-nw" }}
        addresses:
        - {{ ip "0:lcm-nw" }}
      k8s-ext:
        interfaces: [k8s-ext-v]
        addresses:
        - {{ip "k8s-ext:k8s-ext-subnet"}}
        mtu: 9000
      k8s-pods:
        interfaces: [k8s-pods-v]
        addresses:
        - {{ip "k8s-pods:k8s-pods-subnet"}}
        mtu: 9000
Proceed with Create an L2 template for a MOSK controller node. The resulting L2 template will be used to render the netplan configuration for the managed cluster machines.
Warning
Avoid modifying existing L2 templates and subnets that the deployed machines use, because unsafe changes can cause failures across multiple clusters. The list of risks posed by modifying L2 templates includes:
Services running on hosts cannot reconfigure automatically to switch to the new IP addresses and/or interfaces.
Connections between services are interrupted unexpectedly, which can cause data loss.
Incorrect configurations on hosts can lead to irrevocable loss of connectivity between services and unexpected cluster partition or disassembly.
Note
Starting from MOSK 22.3, modification of L2 templates in use is prohibited in the API to prevent accidental cluster failures due to unsafe changes.
According to the reference architecture, MOSK controller nodes must be connected to the following networks:
PXE network
LCM network
Kubernetes workloads network
Storage access network
Floating IP and provider networks. Not required for deployment with Tungsten Fabric.
Tenant underlay networks. If deploying with VXLAN networking or with Tungsten Fabric. In the latter case, the BGP service is configured over this network.
To create an L2 template for MOSK controller nodes:
Create or open the mosk-l2template.yml file that contains the L2 templates.
Add an L2 template using the following example. Adjust the values of specific parameters according to the specification of your environment.
Example of an L2 template for MOSK controller nodes¶apiVersion: ipam.mirantis.com/v1alpha1 kind: L2Template metadata: labels: kaas.mirantis.com/provider: baremetal kaas.mirantis.com/region: region-one cluster.sigs.k8s.io/cluster-name: <MOSKClusterName> name: mosk-controller namespace: <MOSKClusterNamespace> spec: autoIfMappingPrio: - provision - eno - ens - enp clusterRef: <MOSKClusterName> l3Layout: - subnetName: lcm-nw scope: global labelSelector: kaas.mirantis.com/provider: baremetal kaas-mgmt-subnet: "" - subnetName: k8s-ext-subnet scope: namespace - subnetName: k8s-pods-subnet scope: namespace - subnetName: ceph-cluster-subnet scope: namespace - subnetName: ceph-public-subnet scope: namespace - subnetName: neutron-tunnel-subnet scope: namespace npTemplate: |- version: 2 ethernets: {{nic 0}}: dhcp4: false dhcp6: false match: macaddress: {{mac 0}} set-name: {{nic 0}} mtu: 9000 {{nic 1}}: dhcp4: false dhcp6: false match: macaddress: {{mac 1}} set-name: {{nic 1}} mtu: 9000 {{nic 2}} dhcp4: false dhcp6: false match: macaddress: {{mac 2}} set-name: {{nic 2}} mtu: 9000 {{nic 3}}: dhcp4: false dhcp6: false match: macaddress: {{mac 3}} set-name: {{nic 3}} mtu: 9000 bonds: bond0: mtu: 9000 parameters: mode: 802.3ad interfaces: - {{nic 0}} - {{nic 1}} bond1: mtu: 9000 parameters: mode: 802.3ad interfaces: - {{nic 2}} - {{nic 3}} vlans: k8s-lcm-v: id: 403 link: bond0 mtu: 9000 k8s-ext-v: id: 409 link: bond0 mtu: 9000 k8s-pods-v: id: 408 link: bond0 mtu: 9000 pr-floating: id: 407 link: bond1 mtu: 9000 stor-frontend: id: 404 link: bond0 mtu: 9000 stor-backend: id: 405 link: bond1 mtu: 9000 neutron-tunnel: id: 406 link: bond1 addresses: - {{ip "neutron-tunnel:neutron-tunnel-subnet"}} mtu: 9000 bridges: k8s-lcm: interfaces: [k8s-lcm-v] nameservers: addresses: {{nameservers_from_subnet "lcm-nw"}} gateway4: {{ gateway_from_subnet "lcm-nw" }} addresses: - {{ ip "0:lcm-nw" }} k8s-ext: interfaces: [k8s-ext-v] addresses: - {{ip "k8s-ext:k8s-ext-subnet"}} mtu: 9000 k8s-pods: interfaces: [k8s-pods-v] addresses: - {{ip "k8s-pods:k8s-pods-subnet"}} mtu: 9000 ceph-public: interfaces: [stor-frontend] addresses: - {{ip "ceph-public:ceph-public-subnet"}} mtu: 9000 ceph-cluster: interfaces: [stor-backend] addresses: - {{ip "ceph-cluster:ceph-cluster-subnet"}} mtu: 9000
Proceed with Create an L2 template for a MOSK compute node.
Warning
Avoid modifying existing L2 templates and subnets that the deployed machines use, because unsafe changes can cause failures across multiple clusters. The list of risks posed by modifying L2 templates includes:
Services running on hosts cannot reconfigure automatically to switch to the new IP addresses and/or interfaces.
Connections between services are interrupted unexpectedly, which can cause data loss.
Incorrect configurations on hosts can lead to irrevocable loss of connectivity between services and unexpected cluster partition or disassembly.
Note
Starting from MOSK 22.3, modification of L2 templates in use is prohibited in the API to prevent accidental cluster failures due to unsafe changes.
According to the reference architecture, MOSK compute nodes must be connected to the following networks:
PXE network
LCM network
Storage public network (if deploying with Ceph as a back-end for ephemeral storage)
Floating IP and provider networks (if deploying OpenStack with DVR)
Tenant underlay networks
To create an L2 template for MOSK compute nodes:
Add L2 template to the
mosk-l2templates.yml
file using the following example. Adjust the values of parameters according to the specification of your environment.Example of an L2 template for MOSK compute nodes¶apiVersion: ipam.mirantis.com/v1alpha1 kind: L2Template metadata: labels: kaas.mirantis.com/provider: baremetal kaas.mirantis.com/region: region-one cluster.sigs.k8s.io/cluster-name: <MOSKClusterName> name: mosk-compute namespace: <MOSKClusterNamespace> spec: autoIfMappingPrio: - provision - eno - ens - enp clusterRef: <MOSKClusterName> l3Layout: - subnetName: lcm-nw scope: global labelSelector: kaas.mirantis.com/provider: baremetal kaas-mgmt-subnet: "" - subnetName: k8s-ext-subnet scope: namespace - subnetName: k8s-pods-subnet scope: namespace - subnetName: ceph-cluster-subnet scope: namespace - subnetName: neutron-tunnel-subnet scope: namespace npTemplate: |- version: 2 ethernets: {{nic 0}}: dhcp4: false dhcp6: false match: macaddress: {{mac 0}} set-name: {{nic 0}} mtu: 9000 {{nic 1}}: dhcp4: false dhcp6: false match: macaddress: {{mac 1}} set-name: {{nic 1}} mtu: 9000 {{nic 2}} dhcp4: false dhcp6: false match: macaddress: {{mac 2}} set-name: {{nic 2}} mtu: 9000 {{nic 3}}: dhcp4: false dhcp6: false match: macaddress: {{mac 3}} set-name: {{nic 3}} mtu: 9000 bonds: bond0: mtu: 9000 parameters: mode: 802.3ad interfaces: - {{nic 0}} - {{nic 1}} bond1: mtu: 9000 parameters: mode: 802.3ad interfaces: - {{nic 2}} - {{nic 3}} vlans: k8s-lcm-v: id: 403 link: bond0 mtu: 9000 k8s-ext-v: id: 409 link: bond0 mtu: 9000 k8s-pods-v: id: 408 link: bond0 mtu: 9000 pr-floating: id: 407 link: bond1 mtu: 9000 stor-frontend: id: 404 link: bond0 mtu: 9000 stor-backend: id: 405 link: bond1 mtu: 9000 neutron-tunnel: id: 406 link: bond1 addresses: - {{ip "neutron-tunnel:neutron-tunnel-subnet"}} mtu: 9000 bridges: k8s-lcm: interfaces: [k8s-lcm-v] nameservers: addresses: {{nameservers_from_subnet "lcm-nw"}} gateway4: {{ gateway_from_subnet "lcm-nw" }} addresses: - {{ ip "0:lcm-nw" }} k8s-ext: interfaces: [k8s-ext-v] addresses: - {{ip "k8s-ext:k8s-ext-subnet"}} mtu: 9000 k8s-pods: interfaces: [k8s-pods-v] addresses: - {{ip "k8s-pods:k8s-pods-subnet"}} mtu: 9000 ceph-public: interfaces: [stor-frontend] addresses: - {{ip "ceph-public:ceph-public-subnet"}} mtu: 9000 ceph-cluster: interfaces: [stor-backend] addresses: - {{ip "ceph-cluster:ceph-cluster-subnet"}} mtu: 9000
Proceed with Create an L2 template for a MOSK storage node.
Warning
Avoid modifying existing L2 templates and subnets that the deployed machines use, because unsafe changes can cause failures across multiple clusters. The list of risks posed by modifying L2 templates includes:
Services running on hosts cannot reconfigure automatically to switch to the new IP addresses and/or interfaces.
Connections between services are interrupted unexpectedly, which can cause data loss.
Incorrect configurations on hosts can lead to irrevocable loss of connectivity between services and unexpected cluster partition or disassembly.
Note
Starting from MOSK 22.3, modification of L2 templates in use is prohibited in the API to prevent accidental cluster failures due to unsafe changes.
According to the reference architecture, MOSK storage nodes in the MOSK cluster must be connected to the following networks:
PXE network
LCM network
Storage access network
Storage replication network
To create an L2 template for MOSK storage nodes:
Add an L2 template to the
mosk-l2templates.yml
file using the following example. Adjust the values of parameters according to the specification of your environment.Example of an L2 template for MOSK storage nodes¶apiVersion: ipam.mirantis.com/v1alpha1 kind: L2Template metadata: labels: kaas.mirantis.com/provider: baremetal kaas.mirantis.com/region: region-one cluster.sigs.k8s.io/cluster-name: <MOSKClusterName> name: mosk-storage namespace: <MOSKClusterNamespace> spec: autoIfMappingPrio: - provision - eno - ens - enp clusterRef: <MOSKClusterName> l3Layout: - subnetName: lcm-nw scope: global labelSelector: kaas.mirantis.com/provider: baremetal kaas-mgmt-subnet: "" - subnetName: k8s-ext-subnet scope: namespace - subnetName: k8s-pods-subnet scope: namespace - subnetName: ceph-cluster-subnet scope: namespace - subnetName: ceph-public-subnet scope: namespace npTemplate: |- version: 2 ethernets: {{nic 0}}: dhcp4: false dhcp6: false match: macaddress: {{mac 0}} set-name: {{nic 0}} mtu: 9000 {{nic 1}}: dhcp4: false dhcp6: false match: macaddress: {{mac 1}} set-name: {{nic 1}} mtu: 9000 {{nic 2}} dhcp4: false dhcp6: false match: macaddress: {{mac 2}} set-name: {{nic 2}} mtu: 9000 {{nic 3}}: dhcp4: false dhcp6: false match: macaddress: {{mac 3}} set-name: {{nic 3}} mtu: 9000 bonds: bond0: mtu: 9000 parameters: mode: 802.3ad interfaces: - {{nic 0}} - {{nic 1}} bond1: mtu: 9000 parameters: mode: 802.3ad interfaces: - {{nic 2}} - {{nic 3}} vlans: k8s-lcm-v: id: 403 link: bond0 mtu: 9000 k8s-ext-v: id: 409 link: bond0 mtu: 9000 k8s-pods-v: id: 408 link: bond0 mtu: 9000 stor-frontend: id: 404 link: bond0 mtu: 9000 stor-backend: id: 405 link: bond1 mtu: 9000 bridges: k8s-lcm: interfaces: [k8s-lcm-v] nameservers: addresses: {{nameservers_from_subnet "lcm-nw"}} gateway4: {{ gateway_from_subnet "lcm-nw" }} addresses: - {{ ip "0:lcm-nw" }} k8s-ext: interfaces: [k8s-ext-v] addresses: - {{ip "k8s-ext:k8s-ext-subnet"}} mtu: 9000 k8s-pods: interfaces: [k8s-pods-v] addresses: - {{ip "k8s-pods:k8s-pods-subnet"}} mtu: 9000 ceph-public: interfaces: [stor-frontend] addresses: - {{ip "ceph-public:ceph-public-subnet"}} mtu: 9000 ceph-cluster: interfaces: [stor-backend] addresses: - {{ip "ceph-cluster:ceph-cluster-subnet"}} mtu: 9000
Proceed with Edit and apply L2 templates.
To add L2 templates to a MOSK cluster:
Log in to a local machine where your management cluster kubeconfig is located and where kubectl is installed.
Note
The management cluster kubeconfig is created during the last stage of the management cluster bootstrap.
Add the L2 template to your management cluster:
kubectl --kubeconfig <pathToManagementClusterKubeconfig> apply -f <pathToL2TemplateYamlFile>
Inspect the existing L2 templates to see if any of them fits your deployment:
kubectl --kubeconfig <pathToManagementClusterKubeconfig> \
  get l2template -n <ProjectNameForNewManagedCluster>
Optional. Further modify the template if required or in case of a configuration mistake. See Mandatory parameters of L2 templates and Netplan template macros for details.
kubectl --kubeconfig <pathToManagementClusterKubeconfig> \
  -n <ProjectNameForNewManagedCluster> edit l2template <L2templateName>
Think of an L2 template as a template for networking configuration for your hosts. You may adjust the parameters according to the actual requirements and hardware setup of your hosts.
Parameter |
Description |
---|---|
|
References the Cluster object that this template is applied to.
The Caution
|
|
|
|
Subnets to be used in the
Caution The Caution Using the If Mirantis recommends using a unique label prefix such as
|
|
A netplan-compatible configuration with special lookup functions
that defines the networking settings for the cluster hosts,
where physical NIC names and details are parameterized.
This configuration will be processed using Go templates.
Instead of specifying IP and MAC addresses, interface names,
and other network details specific to a particular host,
the template supports use of special lookup functions.
These lookup functions, such as Caution All rules and restrictions of the netplan configuration also apply to L2 templates. For details, see the official netplan documentation. Caution We strongly recommend following the below conventions on network interface naming:
We recommend setting interfaces names that do not exceed 13 symbols for both physical and virtual interfaces to avoid corner cases and issues in netplan rendering. |
Parameter |
Description |
---|---|
|
Name of the reference to the subnet that will be used in the
|
|
A dictionary of the labels and values that are used to filter out
and find the |
|
Optional. Default: none. Name of the parent |
|
Logical scope of the
|
The following table describes the main lookup functions, or macros,
that can be used in the npTemplate
field of an L2 template.
Lookup function |
Description |
---|---|
|
Name of a NIC number N. NIC numbers correspond to the interface
mapping list. This macro can be used as a key for the elements
of |
|
MAC address of a NIC number N registered during a host hardware inspection. |
|
IP address and mask for a NIC number N. The address will be allocated automatically from the given subnet, unless an IP address for that interface already exists. The interface is identified by its MAC address. |
|
IP address and mask for a virtual interface, |
|
IPv4 default gateway address from the given subnet. |
|
List of the IP addresses of name servers from the given subnet. |
This section contains an exemplary L2 template that demonstrates how to set up bonds and bridges on hosts for your managed clusters.
If you want to use a dedicated network for Kubernetes pods traffic,
configure each node with an IPv4
address that will be used to route the pods traffic between nodes.
To accomplish that, use the npTemplate.bridges.k8s-pods
bridge
in the L2 template, as demonstrated in the example below.
As defined in Container Cloud Reference Architecture: Host networking,
this bridge name is reserved for the Kubernetes pods network. When the
k8s-pods
bridge is defined in an L2 template, Calico CNI uses that network
for routing the pods traffic between nodes.
You can use a dedicated network for external connection to the Kubernetes
services exposed by the cluster.
If enabled, MetalLB will listen and respond on the dedicated virtual bridge.
To accomplish that, configure each node where metallb-speaker
is deployed
with an IPv4 address. Both, the MetalLB IP address ranges and the IP
addresses configured on those nodes, must fit in the same CIDR.
Use the npTemplate.bridges.k8s-ext
bridge in the L2 template,
as demonstrated in the example below.
This bridge name is reserved for the Kubernetes external network.
The Subnet object that corresponds to the k8s-ext bridge must have explicitly excluded IP address ranges that are in use by MetalLB.
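A minimal sketch of such a Subnet definition, assuming a hypothetical 172.16.45.0/24 external network where MetalLB owns the range 172.16.45.101-172.16.45.200:
# Subnet for host addresses on the k8s-ext bridge; the MetalLB range is
# excluded from node allocation (includeRanges is the mutually exclusive
# alternative to excludeRanges).
apiVersion: ipam.mirantis.com/v1alpha1
kind: Subnet
metadata:
  name: k8s-ext-subnet
  namespace: <MOSKClusterNamespace>
  labels:
    kaas.mirantis.com/provider: baremetal
    kaas.mirantis.com/region: region-one
spec:
  cidr: 172.16.45.0/24
  excludeRanges:
    - 172.16.45.101-172.16.45.200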
Starting from Container Cloud 2.7.0, you can configure dedicated networks
for the Ceph cluster access and replication traffic. Set labels on the
Subnet
CRs for the corresponding networks, as described in
Create subnets.
Container Cloud automatically configures Ceph to use the addresses from these
subnets. Ensure that the addresses are assigned to the storage nodes.
Use the npTemplate.bridges.ceph-cluster
and
npTemplate.bridges.ceph-public
bridges in the L2 template,
as demonstrated in the example below. These names are reserved for the Ceph
cluster access and replication networks.
The Subnet objects used to assign IP addresses to these bridges must have the corresponding labels: ipam/SVC-ceph-public for the ceph-public bridge and ipam/SVC-ceph-cluster for the ceph-cluster bridge.
apiVersion: ipam.mirantis.com/v1alpha1
kind: L2Template
metadata:
name: test-managed
namespace: managed-ns
spec:
clusterRef: managed-cluster
autoIfMappingPrio:
- provision
- eno
- ens
- enp
npTemplate: |
version: 2
ethernets:
ten10gbe0s0:
dhcp4: false
dhcp6: false
match:
macaddress: {{mac 2}}
set-name: {{nic 2}}
ten10gbe0s1:
dhcp4: false
dhcp6: false
match:
macaddress: {{mac 3}}
set-name: {{nic 3}}
bonds:
bond0:
interfaces:
- ten10gbe0s0
- ten10gbe0s1
vlans:
k8s-ext-vlan:
id: 1001
link: bond0
k8s-pods-vlan:
id: 1002
link: bond0
stor-frontend:
id: 1003
link: bond0
stor-backend:
id: 1004
link: bond0
bridges:
k8s-ext:
interfaces: [k8s-ext-vlan]
addresses:
- {{ip "k8s-ext:demo-ext"}}
k8s-pods:
interfaces: [k8s-pods-vlan]
addresses:
- {{ip "k8s-pods:demo-pods"}}
ceph-cluster:
interfaces: [stor-backend]
addresses:
- {{ip "ceph-cluster:demo-ceph-cluster"}}
ceph-public:
interfaces: [stor-frontend]
addresses:
- {{ip "ceph-public:demo-ceph-public"}}
Configure the MetalLB speaker node selector¶
By default, MetalLB speakers are deployed on all Kubernetes nodes except master nodes. You can decrease the number of MetalLB speakers or run them on a particular set of nodes.
To customize the MetalLB speaker node selector:
Using the kubeconfig of the Container Cloud management cluster, open the MOSK Cluster object for editing:
kubectl --kubeconfig <pathToManagementClusterKubeconfig> -n <OSClusterNamespace> edit cluster <OSClusterName>
In the spec:providerSpec:value:helmReleases section, add the speaker.nodeSelector field for metallb:
spec:
  ...
  providerSpec:
    value:
      ...
      helmReleases:
      - name: metallb
        values:
          configInline:
            ...
          speaker:
            nodeSelector:
              metallbSpeakerEnabled: "true"
The metallbSpeakerEnabled: "true" parameter in this example is the label on the Kubernetes nodes where MetalLB speakers will be deployed. It can be an already existing node label or a new one.
Note
Due to the issue with collocation of MetalLB speaker and the OpenStack Ingress service Pods, the use of the MetalLB speaker node selector is limited. For details, see [24435] MetalLB speaker fails to announce the LB IP for the Ingress service.
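If you choose a new label, one way to attach it to the target Kubernetes nodes is with kubectl against the managed MOSK cluster, as sketched below. Whether you label nodes directly or through the machine-level nodeLabels mechanism described next depends on how labels are managed in your environment.
# Label the nodes that should run MetalLB speakers
# (uses the kubeconfig of the managed MOSK cluster, not the management one)
kubectl --kubeconfig <pathToMOSKClusterKubeconfig> \
  label node <nodeName> metallbSpeakerEnabled=true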
You can add user-defined labels to nodes using the nodeLabels field.
This field contains the list of node labels to be attached to a node for the user to run certain components on separate cluster nodes. The list of allowed node labels is located in the Cluster object status, in the providerStatus.releaseRef.current.allowedNodeLabels field.
Starting from MOSK 22.3, if the value field is not defined in allowedNodeLabels, a label can have any value. For example:
allowedNodeLabels:
- displayName: Stacklight
  key: stacklight
Before or after a machine deployment, add the required label from the allowed node labels list with the corresponding value to spec.providerSpec.value.nodeLabels in machine.yaml. For example:
nodeLabels:
- key: stacklight
  value: enabled
Adding a node label that is not available in the list of allowed node labels is restricted.
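To check which labels are currently allowed for your cluster, you can read them from the Cluster object status. The jsonpath below simply follows the field path mentioned above.
kubectl --kubeconfig <pathToManagementClusterKubeconfig> \
  -n <MOSKClusterNamespace> get cluster <MOSKClusterName> \
  -o jsonpath='{.status.providerStatus.releaseRef.current.allowedNodeLabels}'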
Add a machine¶
This section describes how to add a machine to a managed MOSK cluster using CLI for advanced configuration.
Create a machine using CLI¶
This section describes adding machines to a new MOSK cluster using Mirantis Container Cloud CLI.
If you need to add more machines to an existing MOSK cluster, see Add a controller node and Add a compute node.
To add a machine to the MOSK cluster:
Log in to the host where your management cluster kubeconfig is located and where kubectl is installed.
Create a new text file mosk-cluster-machines.yaml and add the YAML definitions of the Machine resources. Use the following example and see the descriptions of the fields below:
apiVersion: cluster.k8s.io/v1alpha1
kind: Machine
metadata:
  name: mosk-node-role-name
  namespace: mosk-project
  labels:
    kaas.mirantis.com/provider: baremetal
    kaas.mirantis.com/region: region-one
    cluster.sigs.k8s.io/cluster-name: mosk-cluster
spec:
  providerSpec:
    value:
      apiVersion: baremetal.k8s.io/v1alpha1
      kind: BareMetalMachineProviderSpec
      bareMetalHostProfile:
        name: mosk-k8s-mgr
        namespace: mosk-project
      l2TemplateSelector:
        name: mosk-k8s-mgr
      hostSelector: {}
      l2TemplateIfMappingOverride: []
Add the top level fields:
apiVersion
API version of the object that is cluster.k8s.io/v1alpha1.
kind
Object type that is Machine.
metadata
This section will contain the metadata of the object.
spec
This section will contain the configuration of the object.
Add mandatory fields to the metadata section of the Machine object definition:
name
The name of the Machine object.
namespace
The name of the Project where the Machine will be created.
labels
This section contains additional metadata of the machine. Set the following mandatory labels for the Machine object:
kaas.mirantis.com/provider
Set to "baremetal".
kaas.mirantis.com/region
Region name that matches the region name in the Cluster object.
cluster.sigs.k8s.io/cluster-name
The name of the cluster to add the machine to.
Configure the mandatory parameters of the Machine object in the spec field. Add the providerSpec field that contains parameters for deployment on bare metal in the form of a Kubernetes subresource.
In the providerSpec section, add the following mandatory configuration parameters:
apiVersion
API version of the subresource that is baremetal.k8s.io/v1alpha1.
kind
Object type that is BareMetalMachineProviderSpec.
bareMetalHostProfile
Reference to a configuration profile of a bare metal host. It helps to pick a bare metal host with a suitable configuration for the machine. This section includes two parameters:
name
Name of a bare metal host profile.
namespace
Project in which the bare metal host profile is created.
l2TemplateSelector
If specified, contains the name (first priority) or label of the L2 template that will be applied during a machine creation. Note that changing this field after the Machine object is created will not affect the host network configuration of the machine.
Assign one of the templates you defined in Create L2 templates to the machine. If there is no suitable template, create one as described in Create L2 templates.
hostSelector
Defines matching criteria for picking a bare metal host for the machine by label.
Any custom label that is assigned to one or more bare metal hosts using the API can be used as a host selector. If the BareMetalHost objects with the specified label are missing, the Machine object will not be deployed until at least one bare metal host with the specified label is available.
See Deploy a machine to a specific bare metal host for details and the hostSelector sketch after this parameter list.
l2TemplateIfMappingOverride
Contains a list of names of the network interfaces of the host. It allows you to override the default naming and ordering of network interfaces defined in the L2 template referenced by l2TemplateSelector. This ordering informs the L2 templates how to generate the host network configuration.
See Override network interfaces naming and order for details.
Depending on the role of the machine in the MOSK cluster, add labels to the nodeLabels field.

This field contains the list of node labels to be attached to a node, allowing the user to run certain components on separate cluster nodes. The list of allowed node labels is located in the Cluster object status, in the providerStatus.releaseRef.current.allowedNodeLabels field.

Starting from MOSK 22.3, if the value field is not defined in allowedNodeLabels, a label can have any value. For example:

allowedNodeLabels:
- displayName: Stacklight
  key: stacklight

Before or after a machine deployment, add the required label from the allowed node labels list with the corresponding value to spec.providerSpec.value.nodeLabels in machine.yaml. For example:

nodeLabels:
- key: stacklight
  value: enabled

Adding a node label that is not in the list of allowed node labels is restricted.
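To review the labels that your cluster currently allows, you can query the Cluster object status directly. The command below is a sketch that assumes the mosk-cluster and mosk-project names from the examples in this section:

kubectl -n mosk-project get cluster mosk-cluster \
  -o jsonpath='{.status.providerStatus.releaseRef.current.allowedNodeLabels}'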
If you are NOT deploying MOSK with the compact control plane, add 3 dedicated Kubernetes manager nodes.
Add 3 Machine objects for Kubernetes manager nodes using the following label:

metadata:
  labels:
    cluster.sigs.k8s.io/control-plane: "true"
Note
The value of the label might be any non-empty string. On a worker node, this label must be omitted entirely.
Add 3 Machine objects for MOSK controller nodes using the following labels:

spec:
  providerSpec:
    value:
      nodeLabels:
        openstack-control-plane: enabled
        openstack-gateway: enabled
If you are deploying MOSK with the compact control plane, add Machine objects for 3 combined control plane nodes using the following labels and parameters in the nodeLabels field:

metadata:
  labels:
    cluster.sigs.k8s.io/control-plane: "true"
spec:
  providerSpec:
    value:
      nodeLabels:
        openstack-control-plane: enabled
        openstack-gateway: enabled
        openvswitch: enabled
Add Machine objects for as many compute nodes as you want to install using the following labels:

spec:
  providerSpec:
    value:
      nodeLabels:
        openstack-compute-node: enabled
        openvswitch: enabled
Save the text file and repeat the process to create the configuration for all machines in your MOSK cluster.

Create the machines in the cluster using the following command:
kubectl create -f mosk-cluster-machines.yaml
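Optionally, verify that the Machine objects have been created and are being provisioned. This is a sketch that assumes the mosk-project namespace from the examples above:

kubectl -n mosk-project get machines -o wide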
Proceed to Add a Ceph cluster.
Assign L2 templates to machines¶
To install MOSK on bare metal with Container Cloud, you must create L2 templates for each node type in the MOSK cluster. Additionally, you may have to create separate templates for nodes of the same type when they have different configurations.
To assign specific L2 templates to machines in a cluster:
Use the clusterRef parameter in the L2 template spec to assign the templates to the cluster.

Add a unique identifier label to every L2 template. Typically, this is the name of the MOSK node role, for example, l2template-compute or l2template-compute-5nics.

Assign an L2 template to a machine. Set the l2TemplateSelector field in the machine spec to the name of the label added in the previous step. The IPAM Controller uses this field to apply a specific L2 template to the corresponding machine.

Alternatively, you can set the l2TemplateSelector field to the name of the L2 template.
Consider the following examples of an L2 template assignment to a machine.
apiVersion: ipam.mirantis.com/v1alpha1
kind: L2Template
metadata:
name: example-node-netconfig
namespace: my-project
labels:
kaas.mirantis.com/provider: baremetal
kaas.mirantis.com/region: region-one
l2template-example-node-netconfig: "1"
...
spec:
clusterRef: my-cluster
...
apiVersion: cluster.k8s.io/v1alpha1
kind: Machine
metadata:
name: machine1
namespace: my-project
...
spec:
providerSpec:
value:
l2TemplateSelector:
label: l2template-example-node-netconfig
...
apiVersion: cluster.k8s.io/v1alpha1
kind: Machine
metadata:
name: machine1
namespace: my-project
...
spec:
providerSpec:
value:
l2TemplateSelector:
name: example-node-netconfig
...
Now, proceed to Deploy a machine to a specific bare metal host.
Deploy a machine to a specific bare metal host¶
A machine in a MOSK cluster requires a dedicated bare metal
host for deployment. The bare metal hosts are represented by the
BareMetalHost objects in the Mirantis Container Cloud management API. All BareMetalHost
objects must be labeled upon creation with a label that allows you to
identify the host and assign it to a machine.

The labels may be unique or applied to a group of hosts, based on similarities in their capacity, capabilities, and hardware configuration, on their location, suitable role, or a combination thereof.
In some cases, you may need to deploy a machine to a specific bare metal host. This is especially useful when some of your bare metal hosts have different hardware configuration than the rest.
To deploy a machine to a specific bare metal host:
Log in to the host where your management cluster kubeconfig is located and where kubectl is installed.

Identify the bare metal host that you want to associate with the specific machine. For example, host host-1:

kubectl get baremetalhost host-1 -o yaml
Add a label that will uniquely identify this host, for example, by the name of the host and machine that you want to deploy on it.
Caution
Do not remove any existing labels from the BareMetalHost resource.

kubectl edit baremetalhost host-1

Configuration example:

kind: BareMetalHost
metadata:
  name: host-1
  namespace: myProjectName
  labels:
    kaas.mirantis.com/baremetalhost-id: host-1-worker-HW11-cad5
  ...
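Alternatively, if the label does not exist on the host yet, you can add it with a single kubectl label command instead of editing the object manually. This is a sketch based on the example values above:

kubectl -n myProjectName label baremetalhost host-1 \
  kaas.mirantis.com/baremetalhost-id=host-1-worker-HW11-cad5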
Open the text file with the YAML definition of the Machine object, created in Create a machine using CLI.

Add a host selector that matches the label you have added to the BareMetalHost object in the previous step.

Example:

kind: Machine
metadata:
  name: worker-HW11-cad5
  namespace: myProjectName
spec:
  ...
  providerSpec:
    value:
      apiVersion: baremetal.k8s.io/v1alpha1
      kind: BareMetalMachineProviderSpec
      ...
      hostSelector:
        matchLabels:
          kaas.mirantis.com/baremetalhost-id: host-1-worker-HW11-cad5
  ...
Once created, this machine will be associated with the specified bare metal host, and you can return to Create a machine using CLI.
Override network interfaces naming and order¶
An L2 template contains the ifMapping
field that allows you to
identify Ethernet interfaces for the template. The Machine
object
API enables the Operator to override the mapping from the L2 template
by enforcing a specific order of names of the interfaces when applied
to the template.
The l2TemplateIfMappingOverride field in the spec of the Machine
object contains a list of interface names. The order of the interface
names in the list is important because the L2Template object will
be rendered with the NICs ordered as per this list.
Note
Changes in the l2TemplateIfMappingOverride
field will apply
only once when the Machine
and corresponding IpamHost
objects
are created. Further changes to l2TemplateIfMappingOverride
will not reset the interfaces assignment and configuration.
Caution
The l2TemplateIfMappingOverride
field must contain the names of
all interfaces of the bare metal host.
The following example illustrates how to include the override field in the
Machine
object. In this example, we configure the interface eno1
,
which is the second on-board interface of the server, to precede the first
on-board interface eno0
.
apiVersion: cluster.k8s.io/v1alpha1
kind: Machine
metadata:
finalizers:
- foregroundDeletion
- machine.cluster.sigs.k8s.io
labels:
cluster.sigs.k8s.io/cluster-name: kaas-mgmt
cluster.sigs.k8s.io/control-plane: "true"
kaas.mirantis.com/provider: baremetal
kaas.mirantis.com/region: region-one
spec:
providerSpec:
value:
apiVersion: baremetal.k8s.io/v1alpha1
hostSelector:
matchLabels:
baremetal: hw-master-0
image: {}
kind: BareMetalMachineProviderSpec
l2TemplateIfMappingOverride:
- eno1
- eno0
- enp0s1
- enp0s2
As a result of the configuration above, when used with the example
L2 template for bonds and bridges described in Create L2 templates,
the enp0s1
and enp0s2
interfaces will be in a predictable,
ordered state. This state will be used to create subinterfaces for
Kubernetes networks (k8s-pods
) and for Kubernetes external
network (k8s-ext
).
Add a Ceph cluster¶
After you add machines to your new bare metal managed cluster as described in Add a machine, create a Ceph cluster on top of this managed cluster using the Mirantis Container Cloud web UI.
For an advanced configuration through the KaaSCephCluster
CR, see
Mirantis Container Cloud Operations Guide: Ceph advanced configuration.
To configure Ceph Controller through Kubernetes templates to manage Ceph nodes
resources, see
Mirantis Container Cloud Operations Guide: Enable Ceph tolerations and
resources management.
The procedure below enables you to create a Ceph cluster with a minimum of three Ceph nodes that provides persistent volumes to the Kubernetes workloads in the managed cluster.
To create a Ceph cluster in the managed cluster:
Log in to the Container Cloud web UI with the m:kaas:namespace@operator or m:kaas:namespace@writer permissions.

Switch to the required project using the Switch Project action icon located on top of the main left-side navigation panel.
In the Clusters tab, click the required cluster name. The Cluster page with the Machines and Ceph clusters lists opens.
In the Ceph Clusters block, click Create Cluster.
Configure the Ceph cluster in the Create New Ceph Cluster wizard that opens:
Create new Ceph cluster¶

| Section | Parameter name | Description |
|---|---|---|
| General settings | Name | The Ceph cluster name. |
| | Cluster Network | Replication network for Ceph OSDs. Must contain the CIDR definition and match the corresponding values of the cluster L2Template object or the environment network values. |
| | Public Network | Public network for Ceph data. Must contain the CIDR definition and match the corresponding values of the cluster L2Template object or the environment network values. |
| | Enable OSDs LCM | Select to enable LCM for Ceph OSDs. |
| Machines / Machine #1-3 | Select machine | Select the name of the Kubernetes machine that will host the corresponding Ceph node in the Ceph cluster. |
| | Manager, Monitor | Select the required Ceph services to install on the Ceph node. |
| | Devices | Select the disk that Ceph will use. Warning: do not select the device used for system services, for example, sda. |
| | Enable Object Storage | Select to enable the single-instance RGW Object Storage. |
To add more Ceph nodes to the new Ceph cluster, click + next to any Ceph Machine title in the Machines tab. Configure a Ceph node as required.
Warning

Do not add more than 3 Manager and/or Monitor services to the Ceph cluster.

After you add and configure all nodes in your Ceph cluster, click Create.
Open the KaaSCephCluster CR for editing as described in Mirantis Container Cloud Operations Guide: Ceph advanced configuration.

Verify that the following snippet is present in the KaaSCephCluster configuration:

network:
  clusterNet: 10.10.10.0/24
  publicNet: 10.10.11.0/24
Configure the pools for Image, Block Storage, and Compute services.
Note
Ceph validates the specified pools. Therefore, do not omit any of the following pools.
spec:
  pools:
    - default: true
      deviceClass: hdd
      name: kubernetes
      replicated:
        size: 3
      role: kubernetes
    - default: false
      deviceClass: hdd
      name: volumes
      replicated:
        size: 3
      role: volumes
    - default: false
      deviceClass: hdd
      name: vms
      replicated:
        size: 3
      role: vms
    - default: false
      deviceClass: hdd
      name: backup
      replicated:
        size: 3
      role: backup
    - default: false
      deviceClass: hdd
      name: images
      replicated:
        size: 3
      role: images
    - default: false
      deviceClass: hdd
      name: other
      replicated:
        size: 3
      role: other
Each Ceph pool, depending on its role, has a default targetSizeRatio value that defines the expected consumption of the total Ceph cluster capacity. The default ratio values for MOSK pools are as follows:

20.0% for a Ceph pool with the volumes role

40.0% for a Ceph pool with the vms role

10.0% for a Ceph pool with the images role

10.0% for a Ceph pool with the backup role

Once all pools are created, verify that an appropriate secret required for a successful deployment of the OpenStack services that rely on Ceph is created in the openstack-ceph-shared namespace:

kubectl -n openstack-ceph-shared get secrets openstack-ceph-keys
Example of a positive system response:
NAME                  TYPE     DATA   AGE
openstack-ceph-keys   Opaque   7      36m
Verify your Ceph cluster as described in Mirantis Container Cloud Operations Guide: Verify Ceph.
Delete a managed cluster¶
Due to a development limitation in the baremetal operator,
deletion of a managed cluster requires preliminary deletion
of the worker machines running on the cluster.
Using the Container Cloud web UI, first delete worker machines one by one until you hit the minimum of 2 workers for an operational cluster. After that, you can delete the cluster with the remaining workers and managers.
To delete a baremetal-based managed cluster:
Log in to the Mirantis Container Cloud web UI with the writer permissions.

Switch to the required project using the Switch Project action icon located on top of the main left-side navigation panel.
In the Clusters tab, click the required cluster name to open the list of machines running on it.
Click the More action icon in the last column of the worker machine you want to delete and select Delete. Confirm the deletion.
Repeat the step above until you have 2 workers left.
In the Clusters tab, click the More action icon in the last column of the required cluster and select Delete.
Verify the list of machines to be removed. Confirm the deletion.
Optional. If you do not plan to reuse the credentials of the deleted cluster, delete them:
In the Credentials tab, click the Delete credential action icon next to the name of the credentials to be deleted.
Confirm the deletion.
Warning
You can delete credentials only after deleting the managed cluster they relate to.
Deleting a cluster automatically frees up the resources allocated for this cluster, for example, instances, load balancers, networks, floating IPs, and so on.
Deploy OpenStack¶
This section instructs you on how to deploy OpenStack on top of Kubernetes as well as how to troubleshoot the deployment and access your OpenStack environment after deployment.
Deploy an OpenStack cluster¶
This section instructs you on how to deploy OpenStack on top of Kubernetes
using the OpenStack Controller and openstackdeployments.lcm.mirantis.com
(OsDpl) CR.
To deploy an OpenStack cluster:
Verify that you have pre-configured the networking according to Networking.
Verify that the TLS certificates that will be required for the OpenStack cluster deployment have been pre-generated.
Note
The Transport Layer Security (TLS) protocol is mandatory on public endpoints.
Caution
To avoid certificate renewal with subsequent OpenStack updates, during which additional services with new public endpoints may appear, we recommend using wildcard SSL certificates for public endpoints. For example, *.it.just.works, where it.just.works is a cluster public domain.

The sample code block below illustrates how to generate a self-signed certificate for the it.just.works domain. The procedure presumes that the cfssl and cfssljson tools are installed on the machine.

mkdir cert && cd cert

tee ca-config.json << EOF
{
  "signing": {
    "default": {
      "expiry": "8760h"
    },
    "profiles": {
      "kubernetes": {
        "usages": [
          "signing",
          "key encipherment",
          "server auth",
          "client auth"
        ],
        "expiry": "8760h"
      }
    }
  }
}
EOF

tee ca-csr.json << EOF
{
  "CN": "kubernetes",
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names":[{
    "C": "<country>",
    "ST": "<state>",
    "L": "<city>",
    "O": "<organization>",
    "OU": "<organization unit>"
  }]
}
EOF

cfssl gencert -initca ca-csr.json | cfssljson -bare ca

tee server-csr.json << EOF
{
  "CN": "*.it.just.works",
  "hosts": [
    "*.it.just.works"
  ],
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [{
    "C": "US",
    "L": "CA",
    "ST": "San Francisco"
  }]
}
EOF

cfssl gencert -ca=ca.pem -ca-key=ca-key.pem --config=ca-config.json -profile=kubernetes server-csr.json | cfssljson -bare server
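Optionally, you can sanity-check the generated certificate before adding it to the OsDpl object. The following commands assume the files produced by the cfssl example above and that openssl is installed:

# Inspect the subject and validity dates of the server certificate
openssl x509 -in server.pem -noout -subject -dates
# Verify that the server certificate chains to the generated CA
openssl verify -CAfile ca.pem server.pem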
Create the openstackdeployment.yaml file that will include the OpenStack cluster deployment configuration.

Note

The resource of kind OpenStackDeployment (OsDpl) is a custom resource defined by a resource of kind CustomResourceDefinition. The resource is validated with the help of the OpenAPI v3 schema.

Configure the OsDpl resource depending on the needs of your deployment. For the configuration details, refer to OpenStackDeployment custom resource.
Note
If you plan to deploy the Telemetry service, you have to specify the Telemetry mode through features:telemetry:mode as described in OpenStackDeployment custom resource. Otherwise, Telemetry will fail to deploy.

Example of an OpenStackDeployment CR of minimum configuration¶

apiVersion: lcm.mirantis.com/v1alpha1
kind: OpenStackDeployment
metadata:
  name: openstack-cluster
  namespace: openstack
spec:
  openstack_version: victoria
  preset: compute
  size: tiny
  internal_domain_name: cluster.local
  public_domain_name: it.just.works
  features:
    neutron:
      tunnel_interface: ens3
      external_networks:
        - physnet: physnet1
          interface: veth-phy
          bridge: br-ex
          network_types:
            - flat
          vlan_ranges: null
          mtu: null
      floating_network:
        enabled: False
    nova:
      live_migration_interface: ens3
      images:
        backend: local
If required, enable DPDK, huge pages, and other supported Telco features as described in Advanced OpenStack configuration (optional).
To the openstackdeployment object, add information about the TLS certificates:

ssl:public_endpoints:ca_cert - CA certificate content (ca.pem)

ssl:public_endpoints:api_cert - server certificate content (server.pem)

ssl:public_endpoints:api_key - server private key (server-key.pem)
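A minimal sketch of how these keys may be arranged in the OsDpl object is shown below. The nesting under spec:features follows the other examples in this guide but is an assumption; verify the exact schema of your MOSK version and replace the placeholders with the contents of ca.pem, server.pem, and server-key.pem:

spec:
  features:
    ssl:
      public_endpoints:
        ca_cert: |-
          # contents of ca.pem
        api_cert: |-
          # contents of server.pem
        api_key: |-
          # contents of server-key.pem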
Verify that the Load Balancer network does not overlap your corporate or internal Kubernetes networks, for example, Calico IP pools. Also, verify that the pool of the Load Balancer network is big enough to provide IP addresses for all Amphora VMs (load balancers).

If required, reconfigure the Octavia network settings using the following sample structure:

spec:
  services:
    load-balancer:
      octavia:
        values:
          octavia:
            settings:
              lbmgmt_cidr: "10.255.0.0/16"
              lbmgmt_subnet_start: "10.255.1.0"
              lbmgmt_subnet_end: "10.255.255.254"
Trigger the OpenStack deployment:
kubectl apply -f openstackdeployment.yaml
Monitor the status of your OpenStack deployment:
kubectl -n openstack get pods
kubectl -n openstack describe osdpl osh-dev
Assess the current status of the OpenStack deployment using the status section output in the OsDpl resource:

Get the OsDpl YAML file:

kubectl -n openstack get osdpl osh-dev -o yaml

Analyze the status output using the detailed description in OpenStackDeployment custom resource.
Verify that the OpenStack cluster has been deployed:
client_pod_name=$(kubectl -n openstack get pods -l application=keystone,component=client | grep keystone-client | head -1 | awk '{print $1}')
kubectl -n openstack exec -it $client_pod_name -- openstack service list
Example of a positive system response:
+----------------------------------+---------------+----------------+
| ID                               | Name          | Type           |
+----------------------------------+---------------+----------------+
| 159f5c7e59784179b589f933bf9fc6b0 | cinderv3      | volumev3       |
| 6ad762f04eb64a31a9567c1c3e5a53b4 | keystone      | identity       |
| 7e265e0f37e34971959ce2dd9eafb5dc | heat          | orchestration  |
| 8bc263babe9944cdb51e3b5981a0096b | nova          | compute        |
| 9571a49d1fdd4a9f9e33972751125f3f | placement     | placement      |
| a3f9b25b7447436b85158946ca1c15e2 | neutron       | network        |
| af20129d67a14cadbe8d33ebe4b147a8 | heat-cfn      | cloudformation |
| b00b5ad18c324ac9b1c83d7eb58c76f5 | radosgw-swift | object-store   |
| b28217da1116498fa70e5b8d1b1457e5 | cinderv2      | volumev2       |
| e601c0749ce5425c8efb789278656dd4 | glance        | image          |
+----------------------------------+---------------+----------------+
Register a record on the customer DNS:
Caution
The DNS component is mandatory to access OpenStack public endpoints.
Obtain the full list of endpoints:
kubectl -n openstack get ingress -ocustom-columns=NAME:.metadata.name,HOSTS:spec.rules[*].host | awk '/namespace-fqdn/ {print $2}'
Example of system response:
barbican.<spec:public_domain_name>
cinder.<spec:public_domain_name>
cloudformation.<spec:public_domain_name>
designate.<spec:public_domain_name>
glance.<spec:public_domain_name>
heat.<spec:public_domain_name>
horizon.<spec:public_domain_name>
keystone.<spec:public_domain_name>
metadata.<spec:public_domain_name>
neutron.<spec:public_domain_name>
nova.<spec:public_domain_name>
novncproxy.<spec:public_domain_name>
octavia.<spec:public_domain_name>
placement.<spec:public_domain_name>
Obtain the public endpoint IP:
kubectl -n openstack get services ingress
In the system response, capture EXTERNAL-IP.

Example of system response:

NAME      TYPE           CLUSTER-IP    EXTERNAL-IP    PORT(S)                                      AGE
ingress   LoadBalancer   10.96.32.97   10.172.1.101   80:34234/TCP,443:34927/TCP,10246:33658/TCP   4h56m

Ask the customer to create records for the public endpoints, obtained earlier in this procedure, pointing to the EXTERNAL-IP of the Ingress service.
Advanced OpenStack configuration (optional)¶
This section includes configuration information for available advanced Mirantis OpenStack for Kubernetes features that include DPDK with the Neutron OVS back end, huge pages, CPU pinning, and other Enhanced Platform Awareness (EPA) capabilities.
Enable LVM ephemeral storage¶
TechPreview
Note
Consider this section as part of Deploy an OpenStack cluster.
This section instructs you on how to configure LVM as back end for the VM disks and ephemeral storage.
Warning
Usage of more than one nonvolatile memory express (NVMe) drive per node may cause update issues and is therefore not supported.
To enable LVM ephemeral storage:
In BareMetalHostProfile, in the spec:volumeGroups section, add the following configuration for the OpenStack compute nodes:

spec:
  devices:
    - device:
        byName: /dev/nvme0n1
        minSizeGiB: 30
        wipe: true
      partitions:
        - name: lvm_nova_vol
          sizeGiB: 0
          wipe: true
  volumeGroups:
    - devices:
        - partition: lvm_nova_vol
      name: nova-vol
  logicalVolumes:
    - name: nova-fake
      vg: nova-vol
      sizeGiB: 0.1
  fileSystems:
    - fileSystem: ext4
      logicalVolume: nova-fake
      mountPoint: /nova-fake
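After the host is provisioned with this profile, you can optionally confirm on the node that the volume group and the stub logical volume exist. These standard LVM commands assume root access to the compute node:

# List the nova-vol volume group and its logical volumes
sudo vgs nova-vol
sudo lvs nova-vol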
Note
Due to a limitation, it is not possible to create volume groups without logical volumes and formatted partitions. Therefore, set the logicalVolumes:name, fileSystems:logicalVolume, and fileSystems:mountPoint parameters to nova-fake.

For details about BareMetalHostProfile, see Mirantis Container Cloud Operations Guide: Create a custom bare metal host profile.

Configure the OpenStackDeployment CR to deploy OpenStack with LVM ephemeral storage. For example:

spec:
  features:
    nova:
      images:
        backend: lvm
        lvm:
          volume_group: "nova-vol"
Optional. Enable encryption for the LVM ephemeral storage by adding the following metadata to the OpenStackDeployment CR:

spec:
  features:
    nova:
      images:
        encryption:
          enabled: true
          cipher: "aes-xts-plain64"
          key_size: 256
Caution
Neither live nor cold migration is supported for such instances.
Enable LVM block storage¶
TechPreview
Note
Consider this section as part of Deploy an OpenStack cluster.
This section instructs you on how to configure LVM as a back end for the OpenStack Block Storage service.
To enable LVM block storage:
Open BareMetalHostProfile for editing.

In the spec:volumeGroups section, specify the following data for the OpenStack compute nodes. In the following example, we deploy a Cinder volume with LVM on compute nodes. However, you can use dedicated nodes for this purpose.

spec:
  devices:
    - device:
        byName: /dev/nvme0n1
        minSizeGiB: 30
        wipe: true
      partitions:
        - name: lvm_cinder_vol
          sizeGiB: 0
          wipe: true
  volumeGroups:
    - devices:
        - partition: lvm_cinder_vol
      name: cinder-vol
  logicalVolumes:
    - name: cinder-fake
      vg: cinder-vol
      sizeGiB: 0.1
  fileSystems:
    - fileSystem: ext4
      logicalVolume: cinder-fake
      mountPoint: /cinder-fake
Note
Due to a limitation, volume groups cannot be created without logical volumes and formatted partitions. Therefore, set the logicalVolumes:name, fileSystems:logicalVolume, and fileSystems:mountPoint parameters to cinder-fake.

For details about BareMetalHostProfile, see Mirantis Container Cloud Operations Guide: Create a custom bare metal host profile.

Configure the OpenStackDeployment CR to deploy OpenStack with LVM block storage. For example:

spec:
  nodes:
    openstack-compute-node::enabled:
      features:
        cinder:
          volume:
            backends:
              lvm:
                lvm:
                  volume_group: "cinder-vol"
Enable DPDK with OVS¶
TechPreview
Note
Consider this section as part of Deploy an OpenStack cluster.
This section instructs you on how to enable DPDK with the Neutron OVS back end.
To enable DPDK with OVS:
Verify that your deployment meets the following requirements:
The required drivers have been installed on the host operating system.
Different Poll Mode Driver (PMD) types may require different kernel drivers to properly work with NIC. For more information about the DPDK drivers, read DPDK official documentation: Linux Drivers and Overview of Networking Drivers.
The DPDK NICs are not used on the host operating system.
The huge pages feature is enabled on the host. See Enable huge pages for OpenStack for details.
Enable DPDK in the OsDpl custom resource through the node-specific overrides settings. For example:

spec:
  nodes:
    <NODE-LABEL>::<NODE-LABEL-VALUE>:
      features:
        neutron:
          dpdk:
            bridges:
              - ip_address: 10.12.2.80/24
                name: br-phy
            driver: igb_uio
            enabled: true
            nics:
              - bridge: br-phy
                name: nic01
                pci_id: "0000:05:00.0"
          tunnel_interface: br-phy
Enable SR-IOV with OVS¶
Note
Consider this section as part of Deploy an OpenStack cluster.
This section instructs you on how to enable SR-IOV with the Neutron OVS back end.
To enable SR-IOV with OVS:
Verify that your deployment meets the following requirements:
NICs with the SR-IOV support are installed
SR-IOV and VT-d are enabled in BIOS
Enable IOMMU in the kernel by configuring intel_iommu=on in the GRUB configuration file. Specify the parameter for compute nodes in BareMetalHostProfile in the grubConfig section:

spec:
  grubConfig:
    defaultGrubOptions:
      - 'GRUB_CMDLINE_LINUX="$GRUB_CMDLINE_LINUX intel_iommu=on"'
Configure the OpenStackDeployment CR to deploy OpenStack with the VLAN tenant network encapsulation.

Caution

To ensure correct application of the configuration changes, configure VLAN segmentation during the initial OpenStack deployment.

Configuration example:

spec:
  features:
    neutron:
      external_networks:
        - bridge: br-ex
          interface: pr-floating
          mtu: null
          network_types:
            - flat
          physnet: physnet1
          vlan_ranges: null
        - bridge: br-tenant
          interface: bond0
          network_types:
            - vlan
          physnet: tenant
          vlan_ranges: 490:499,1420:1459
      tenant_network_types:
        - vlan
Enable SR-IOV in the OpenStackDeployment CR through the node-specific overrides settings. For example:

spec:
  nodes:
    <NODE-LABEL>::<NODE-LABEL-VALUE>:
      features:
        neutron:
          sriov:
            enabled: true
            nics:
              - device: enp10s0f1
                num_vfs: 7
                physnet: tenant
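After the node is redeployed with this configuration, you can optionally check on the host that the virtual functions have been created. The sysfs paths below assume the enp10s0f1 device from the example above:

# Maximum number of VFs supported by the NIC
cat /sys/class/net/enp10s0f1/device/sriov_totalvfs
# Number of VFs currently configured (expected to match num_vfs)
cat /sys/class/net/enp10s0f1/device/sriov_numvfs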
Enable BGP VPN¶
TechPreview
Note
Consider this section as part of Deploy an OpenStack cluster.
The BGP VPN service is an extra OpenStack Neutron plugin that enables connection of OpenStack Virtual Private Networks with external VPN sites through either BGP/MPLS IP VPNs or E-VPN.
To enable the BGP VPN service:
Enable BGP VPN in the OsDpl custom resource through the node-specific overrides settings. For example:
spec:
features:
neutron:
bgpvpn:
enabled: true
route_reflector:
        # Enable deploying the FRR route reflector
enabled: true
# Local AS number
as_number: 64512
# List of subnets we allow to connect to
        # the route reflector BGP
neighbor_subnets:
- 10.0.0.0/8
- 172.16.0.0/16
nodes:
openstack-compute-node::enabled:
features:
neutron:
bgpvpn:
enabled: true
When the service is enabled, a route reflector is scheduled on nodes with
the openstack-frrouting: enabled
label. Mirantis recommends collocating
the route reflector nodes with the OpenStack controller nodes. By default, two
replicas are deployed.
Encrypt the east-west traffic¶
TechPreview
Note
Consider this section as part of Deploy an OpenStack cluster.
Mirantis OpenStack for Kubernetes (MOSK) allows configuring Internet Protocol Security (IPsec) encryption for the east-west tenant traffic between the OpenStack compute nodes and gateways. The feature uses the strongSwan open source IPsec solution. Authentication is accomplished through a pre-shared key (PSK). However, other authentication methods are upcoming.
To encrypt the east-west tenant traffic, enable ipsec
in the
spec:features:neutron
settings of the OpenStackDeployment
CR:
spec:
features:
neutron:
ipsec:
enabled: true
Enable Cinder back end for Glance¶
TechPreview
Note
Consider this section as part of Deploy an OpenStack cluster.
This section instructs you on how to configure a Cinder back end for images through the OpenStackDeployment CR.
Note
This feature depends heavily on Cinder multi-attach, which enables you to simultaneously attach volumes to multiple instances. Therefore, only the block storage back ends that support multi-attach can be used.
To configure a Cinder back end for Glance, define the back end identity in the OpenStackDeployment CR. This identity will be used as a name for the back end section in the Glance configuration file.
When defining the back end:
Configure one of the back ends as default.
Configure each back end to use a specific Cinder volume type.

Note

You can use the volume_type parameter instead of backend_name. If so, you have to create this volume type beforehand and take into account that the bootstrap script does not manage such volume types.
The blockstore
identity definition example:
spec:
features:
glance:
backends:
cinder:
blockstore:
default: true
backend_name: <volume_type:volume_name>
# e.g. backend_name: lvm:lvm_store
spec:
features:
glance:
backends:
cinder:
blockstore:
default: true
volume_type: netapp
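If you choose the volume_type option, the volume type must exist before Glance starts using it. A hedged example of creating such a type with the openstack CLI, assuming a netapp back end name, is shown below:

openstack volume type create netapp
openstack volume type set --property volume_backend_name=netapp netapp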
Enable Cinder volume encryption¶
TechPreview
Note
Consider this section as part of Deploy an OpenStack cluster.
This section instructs you on how to enable Cinder volume encryption
through the OpenStackDeployment
CR using Linux Unified Key Setup (LUKS)
and store the encryption keys in Barbican. For details, see
Volume encryption.
To enable Cinder volume encryption:
In the OpenStackDeployment CR, specify the LUKS volume type and configure the required encryption parameters for the storage system to encrypt or decrypt the volume.

The volume_types definition example:

spec:
  services:
    block-storage:
      cinder:
        values:
          bootstrap:
            volume_types:
              volumes-hdd-luks:
                arguments:
                  encryption-cipher: aes-xts-plain64
                  encryption-control-location: front-end
                  encryption-key-size: 256
                  encryption-provider: luks
                volume_backend_name: volumes-hdd
To create an encrypted volume as a non-admin user and store keys in the Barbican storage, assign the creator role to the user since the default Barbican policy allows only the admin or creator role:

openstack role add --project <PROJECT-ID> --user <USER-ID> creator
Optional. To define the encrypted volume type as the default one, specify volumes-hdd-luks in default_volume_type in the Cinder configuration:

spec:
  services:
    block-storage:
      cinder:
        values:
          conf:
            cinder:
              DEFAULT:
                default_volume_type: volumes-hdd-luks
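To confirm that the configuration works, you can create a test volume of the encrypted type from the built-in admin CLI client (see Access OpenStack after deployment). This is an optional check using standard openstack CLI commands:

openstack volume create --type volumes-hdd-luks --size 1 test-encrypted-volume
openstack volume show test-encrypted-volume -c encrypted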
Advanced configuration for OpenStack compute nodes¶
Note
Consider this section as part of Deploy an OpenStack cluster.
The section describes how to perform advanced configuration for the OpenStack compute nodes. Such configuration can be required in some specific use cases, such as DPDK, SR-IOV, or huge pages features usage.
Configure the vCPU type¶

Available since MOSK 22.1
Note
Consider this section as part of Deploy an OpenStack cluster.
Mirantis OpenStack for Kubernetes (MOSK) enables you to configure the
vCPU model for all instances managed by the OpenStack Compute service (Nova)
using the following osdpl
definition:
spec:
features:
nova:
vcpu_type: host-model
For the supported values and configuration examples, see vCPU type.
Enable huge pages for OpenStack¶

Note
Consider this section as part of Deploy an OpenStack cluster.
Note
The instruction provided in this section applies to both OpenStack with OVS and OpenStack with Tungsten Fabric topologies.
The huge pages OpenStack feature provides essential performance improvements
for applications that are highly memory IO-bound. Huge pages should be enabled
on a per compute node basis. By default, NUMATopologyFilter
is enabled.
To activate the feature, you need to enable huge pages on the dedicated bare metal host as described in Enable huge pages in a host profile during the predeployment bare metal configuration.
Note
The multi-size huge pages are not fully supported by Kubernetes versions before 1.19. Therefore, define only one size in kernel parameters.
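Once the host profile with huge pages is applied and the node is redeployed, you can optionally confirm on the compute host that the pages were allocated. This is a standard Linux check, not specific to MOSK:

grep Huge /proc/meminfo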
Configure CPU isolation¶

Note
Consider this section as part of Deploy an OpenStack cluster.
CPU isolation is a way to force the system scheduler to use only some logical CPU cores for processes. For compute hosts, you should typically isolate system processes and virtual guests on different cores. This section describes two possible options to achieve this:

Through the isolcpus configuration parameter for the Linux kernel. Deprecated since MOSK 22.2

Through the cpusets mechanism in the Linux kernel. Available since MOSK 22.2, TechPreview
For details, see OpenStack official documentation: CPU topologies and Shielding Linux Resources.
Note
Starting from MOSK 22.2, isolcpus is deprecated.
Using the isolcpus
parameter, specific CPUs are removed from the general
kernel symmetrical multiprocessing (SMP) load balancing and scheduling. The
only way to get tasks scheduled onto isolated CPUs is taskset
. The list of
isolcpus
is configured statically at boot time. You can only change it by
rebooting with a different value. In the Linux kernel, the isolcpus
parameter is deprecated in favor of cpusets.
To configure CPU isolation using isolcpus:
Configure isolcpus in the Linux kernel:

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash isolcpus=4-15"
Apply the changes:
update-grub
Isolate cores from scheduling of Docker or Kubernetes workloads:
cat <<-"EOF" > /usr/bin/setup-cgroups.sh
#!/bin/bash
set -x

UNSHIELDED_CPUS=${UNSHIELDED_CPUS:-"0-3"}
SHIELD_CPUS=${SHIELD_CPUS:-"4-15"}
SHIELD_MODE=${SHIELD_MODE:-"isolcpu"} # One of isolcpu or cpuset

DOCKER_CPUS=${DOCKER_CPUS:-$UNSHIELDED_CPUS}
KUBERNETES_CPUS=${KUBERNETES_CPUS:-$UNSHIELDED_CPUS}

CSET_CMD=${CSET_CMD:-"python2 /usr/bin/cset"}

if [[ ${SHIELD_MODE} == "cpuset" ]]; then
    ${CSET_CMD} set -c ${UNSHIELDED_CPUS} -s system
    ${CSET_CMD} proc -m -f root -t system
    ${CSET_CMD} proc -k -f root -t system
fi

${CSET_CMD} set --cpu=${DOCKER_CPUS} --set=docker
${CSET_CMD} set --cpu=${KUBERNETES_CPUS} --set=kubepods
${CSET_CMD} set --cpu=${DOCKER_CPUS} --set=com.docker.ucp
EOF

chmod +x /usr/bin/setup-cgroups.sh

cat <<-"EOF" > /etc/systemd/system/shield-cpus.service
[Unit]
Description=Shield CPUs
DefaultDependencies=no
After=systemd-udev-settle.service
Before=lvm2-activation-early.service
Wants=systemd-udev-settle.service

[Service]
ExecStart=/usr/bin/setup-cgroups.sh
RemainAfterExit=true
Type=oneshot

[Install]
WantedBy=basic.target
EOF

systemctl enable shield-cpus
As root user, reboot the host.

After the reboot, verify that the CPUs have been isolated:

cat /sys/devices/system/cpu/isolated
Example of system response:
4-15
As root user, verify that isolation is active:
cset set -l
Example of system response:
cset:
           Name       CPUs-X    MEMs-X Tasks Subs Path
 -------------- ---------- - ------- - ----- ---- ----------
           root       0-15 y       0 y  1449    3 /
       kubepods        0-3 n       0 n     0    2 /kubepods
         docker        0-3 n       0 n     0    5 /docker
 com.docker.ucp        0-3 n       0 n     0    1 /com.docker.ucp
Available since MOSK 22.2 TechPreview
The Linux kernel and cpuset provide a mechanism to run tasks by limiting the
resources defined by a cpuset. The tasks can be moved from one cpuset to
another to use the resources defined in other cpusets. The cset
Python tool
is a command-line interface to work with cpusets.
To configure CPU isolation using cpusets:
Configure core isolation:
Note
You can also automate this step during deployment by using the postDeploy script as described in Create MOSK host profiles.

cat <<-"EOF" > /usr/bin/setup-cgroups.sh
#!/bin/bash
set -x

UNSHIELDED_CPUS=${UNSHIELDED_CPUS:-"0-3"}
SHIELD_CPUS=${SHIELD_CPUS:-"4-15"}
SHIELD_MODE=${SHIELD_MODE:-"cpuset"} # One of isolcpu or cpuset

DOCKER_CPUS=${DOCKER_CPUS:-$UNSHIELDED_CPUS}
KUBERNETES_CPUS=${KUBERNETES_CPUS:-$UNSHIELDED_CPUS}

CSET_CMD=${CSET_CMD:-"python2 /usr/bin/cset"}

if [[ ${SHIELD_MODE} == "cpuset" ]]; then
    ${CSET_CMD} set -c ${UNSHIELDED_CPUS} -s system
    ${CSET_CMD} proc -m -f root -t system
    ${CSET_CMD} proc -k -f root -t system
fi

${CSET_CMD} set --cpu=${DOCKER_CPUS} --set=docker
${CSET_CMD} set --cpu=${KUBERNETES_CPUS} --set=kubepods
${CSET_CMD} set --cpu=${DOCKER_CPUS} --set=com.docker.ucp
EOF

chmod +x /usr/bin/setup-cgroups.sh

cat <<-"EOF" > /etc/systemd/system/shield-cpus.service
[Unit]
Description=Shield CPUs
DefaultDependencies=no
After=systemd-udev-settle.service
Before=lvm2-activation-early.service
Wants=systemd-udev-settle.service

[Service]
ExecStart=/usr/bin/setup-cgroups.sh
RemainAfterExit=true
Type=oneshot

[Install]
WantedBy=basic.target
EOF

systemctl enable shield-cpus

reboot
As root user, verify that isolation has been applied:
cset set -l
Example of system response:
cset:
           Name       CPUs-X    MEMs-X Tasks Subs Path
 -------------- ---------- - ------- - ----- ---- ----------
           root       0-15 y       0 y   165    4 /
       kubepods        0-3 n       0 n     0    2 /kubepods
         docker        0-3 n       0 n     0    0 /docker
         system        0-3 n       0 n    65    0 /system
 com.docker.ucp        0-3 n       0 n     0    0 /com.docker.ucp
Run the cpustress container:

docker run -it --name cpustress --rm containerstack/cpustress --cpu 4 --timeout 30s --metrics-brief
Verify that isolated cores are not affected:
htop
The system response highlights the load created only on the available Docker cores, while the isolated cores stay idle.
Enable CPU pinning¶

Note
Consider this section as part of Deploy an OpenStack cluster.
The majority of CPU topologies features are activated by NUMATopologyFilter
that is enabled by default. Such features do not require any further service
configuration and can be used directly on a vanilla MOSK
deployment. The list of the CPU topologies features includes, for example:
NUMA placement policies
CPU pinning policies
CPU thread pinning policies
CPU topologies
To enable libvirt CPU pinning through the node-specific overrides in the
OpenStackDeployment
custom resource, use the following sample
configuration structure:
spec:
nodes:
<NODE-LABEL>::<NODE-LABEL-VALUE>:
services:
compute:
nova:
nova_compute:
values:
conf:
nova:
compute:
cpu_dedicated_set: 2-17
cpu_shared_set: 18-47
Enable PCI passthrough¶

Note
Consider this section as part of Deploy an OpenStack cluster.
The Peripheral Component Interconnect (PCI) passthrough feature in OpenStack allows full access and direct control over physical PCI devices in guests. The mechanism applies to any kind of PCI devices including a Network Interface Card (NIC), Graphics Processing Unit (GPU), and any other device that can be attached to a PCI bus. The only requirement for the guest to properly use the device is to correctly install the driver.
To enable PCI passthrough in a MOSK deployment:
For Linux X86 compute nodes, verify that the following features are enabled on the host:
VT-d in BIOS
IOMMU on the host operating system as described in Enable SR-IOV with OVS.
Configure the nova-api service that is scheduled on the OpenStack controller nodes. To generate the alias for PCI in nova.conf, add the alias configuration through the OpenStackDeployment CR.

Note

When configuring PCI with SR-IOV on the same host, the values specified in alias take precedence. Therefore, add the SR-IOV devices to passthrough_whitelist explicitly.

For example:

spec:
  services:
    compute:
      nova:
        values:
          conf:
            nova:
              pci:
                alias: '{ "vendor_id":"8086", "product_id":"154d", "device_type":"type-PF", "name":"a1" }'
Configure the nova-compute service that is scheduled on the OpenStack compute nodes. To enable Nova to pass PCI devices to virtual machines, configure the passthrough_whitelist section in nova.conf through the node-specific overrides in the OpenStackDeployment CR. For example:

spec:
  nodes:
    <NODE-LABEL>::<NODE-LABEL-VALUE>:
      services:
        compute:
          nova:
            nova_compute:
              values:
                conf:
                  nova:
                    pci:
                      alias: '{ "vendor_id":"8086", "product_id":"154d", "device_type":"type-PF", "name":"a1" }'
                      passthrough_whitelist: |
                        [{"devname":"enp216s0f0","physical_network":"sriovnet0"}, { "vendor_id": "8086", "product_id": "154d" }]
Limit HW resources for hyperconverged OpenStack compute nodes¶

Note
Consider this section as part of Deploy an OpenStack cluster.
The hyperconverged architecture combines OpenStack compute nodes with Ceph nodes. To avoid node overloading, which can cause Ceph performance degradation and outages, limit the hardware resource consumption by the OpenStack compute services.
You can reserve hardware resources for non-workload related consumption using
the following nova-compute
parameters. For details, see
OpenStack documentation: Overcommitting CPU and RAM
and OpenStack documentation: Configuration Options.
cpu_allocation_ratio - in case of a hyperconverged architecture, the value depends on the number of vCPUs used for non-workload related operations, the total number of vCPUs of a hyperconverged node, and on the workload vCPU consumption:

cpu_allocation_ratio = (${vCPU_count_on_a_hyperconverged_node} - ${vCPU_used_for_non_OpenStack_related_tasks}) / ${vCPU_count_on_a_hyperconverged_node} / ${workload_vCPU_utilization}
To define the vCPU count used for non-OpenStack related tasks, use the following formula, considering the storage data plane performance tests:
vCPU_used_for_non-OpenStack_related_tasks = 2 * SSDs_per_hyperconverged_node + 1 * Ceph_OSDs_per_hyperconverged_node + 0.8 * Ceph_OSDs_per_hyperconverged_node
Consider the following example with 5 SSD disks for Ceph OSDs per hyperconverged node and 2 Ceph OSDs per disk:
vCPU_used_for_non-OpenStack_related_tasks = 2 * 5 + 1 * 10 + 0.8 * 10 = 28
In this case, if there are 40 vCPUs per hyperconverged node, 28 vCPUs are required for non-workload related calculations, and a workload consumes 50% of the allocated CPU time:
cpu_allocation_ratio = (40-28) / 40 / 0.5 = 0.6
reserved_host_memory_mb - a dedicated variable in the OpenStack Nova configuration to reserve memory for non-OpenStack related activities:

reserved_host_memory_mb = 13 GB * Ceph_OSDs_per_hyperconverged_node
For example for 10 Ceph OSDs per hyperconverged node:
reserved_host_memory_mb = 13 GB * 10 = 130 GB = 133120
ram_allocation_ratio - the allocation ratio of virtual RAM to physical RAM. To completely exclude the possibility of memory overcommitment, set it to
.
To limit HW resources for hyperconverged OpenStack compute nodes:
In the OpenStackDeployment
CR, specify the cpu_allocation_ratio
,
ram_allocation_ratio
, and reserved_host_memory_mb
parameters as
required using the calculations described above.
For example:
apiVersion: lcm.mirantis.com/v1alpha1
kind: OpenStackDeployment
spec:
services:
compute:
nova:
values:
conf:
nova:
DEFAULT:
cpu_allocation_ratio: 0.6
ram_allocation_ratio: 1
reserved_host_memory_mb: 133120
Note
For an existing OpenStack deployment:
Obtain the name of your OpenStackDeployment CR:

kubectl -n openstack get osdpl

Open the OpenStackDeployment CR for editing and specify the parameters as required:

kubectl -n openstack edit osdpl <osdpl name>
Enable image signature verification¶
Available since MOSK 21.6 TechPreview
Note
Consider this section as part of Deploy an OpenStack cluster.
Mirantis OpenStack for Kubernetes (MOSK) enables you to perform image signature verification when booting an OpenStack instance, uploading a Glance image with signature metadata fields set, and creating a volume from an image.
To enable signature verification, use the following osdpl
definition:
spec:
features:
glance:
signature:
enabled: true
When enabled during the initial deployment, all internal images, such as Amphora, Ironic, and test (CirrOS, Fedora, Ubuntu) images, will be signed by a self-signed certificate.
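For images that you upload yourself, the signature metadata must be provided as image properties. The sketch below shows the standard Glance signature properties with hypothetical values; the signing certificate must already be stored in Barbican and its ID passed as img_signature_certificate_uuid:

openstack image create \
  --container-format bare --disk-format qcow2 \
  --property img_signature="<base64-encoded signature>" \
  --property img_signature_hash_method=SHA-256 \
  --property img_signature_key_type=RSA-PSS \
  --property img_signature_certificate_uuid=<barbican-certificate-id> \
  --file cirros.qcow2 cirros-signed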
Enable Telemetry services¶
The Telemetry services monitor OpenStack components, collect and store the telemetry data from them, and perform responsive actions upon this data.
To enable the Telemetry service:
Specify the following definition in the OpenStackDeployment
custom resource (CR):
kind: OpenStackDeployment
spec:
features:
services:
- alarming
- event
- metering
- metric
telemetry:
mode: autoscaling
Configure LoadBalancer for PowerDNS¶
Available since MOSK 22.2
Note
Consider this section as part of Deploy an OpenStack cluster.
Mirantis OpenStack for Kubernetes (MOSK) allows configuring
LoadBalancer for the Designate PowerDNS back end. For example, you can expose a
TCP port for zone transfer using the following example osdpl
definition:
spec:
designate:
backend:
external_ip: 10.172.1.101
protocol: udp
type: powerdns
For the supported values, see LoadBalancer type for PowerDNS.
Access OpenStack after deployment¶
This section contains the guidelines on how to access your MOSK OpenStack environment.
Configure DNS to access OpenStack¶
DNS is a mandatory component for a MOSK deployment; all records must be created on the customer DNS server. The OpenStack services are exposed through the Ingress NGINX controller.
Warning
This document describes how to temporarily configure DNS. The workflow contains non-permanent changes that will be rolled back during a managed cluster update or reconciliation loop. Therefore, proceed at your own risk.
To configure DNS to access your OpenStack environment:
Obtain the external IP address of the Ingress service:
kubectl -n openstack get services ingress
Example of system response:
NAME      TYPE           CLUSTER-IP    EXTERNAL-IP    PORT(S)                                      AGE
ingress   LoadBalancer   10.96.32.97   10.172.1.101   80:34234/TCP,443:34927/TCP,10246:33658/TCP   4h56m
Select from the following options:
If you have a corporate DNS server, update your corporate DNS service and create appropriate DNS records for all OpenStack public endpoints.
To obtain the full list of public endpoints:
kubectl -n openstack get ingress -ocustom-columns=NAME:.metadata.name,HOSTS:spec.rules[*].host | awk '/namespace-fqdn/ {print $2}'
Example of system response:
barbican.it.just.works
cinder.it.just.works
cloudformation.it.just.works
designate.it.just.works
glance.it.just.works
heat.it.just.works
horizon.it.just.works
keystone.it.just.works
neutron.it.just.works
nova.it.just.works
novncproxy.it.just.works
octavia.it.just.works
placement.it.just.works
If you do not have a corporate DNS server, perform one of the following steps:
Add the appropriate records to /etc/hosts locally. For example:

10.172.1.101 barbican.it.just.works
10.172.1.101 cinder.it.just.works
10.172.1.101 cloudformation.it.just.works
10.172.1.101 designate.it.just.works
10.172.1.101 glance.it.just.works
10.172.1.101 heat.it.just.works
10.172.1.101 horizon.it.just.works
10.172.1.101 keystone.it.just.works
10.172.1.101 neutron.it.just.works
10.172.1.101 nova.it.just.works
10.172.1.101 novncproxy.it.just.works
10.172.1.101 octavia.it.just.works
10.172.1.101 placement.it.just.works
Deploy your DNS server on top of Kubernetes:
Deploy a standalone CoreDNS server by including the following configuration into coredns.yaml:

apiVersion: lcm.mirantis.com/v1alpha1
kind: HelmBundle
metadata:
  name: coredns
  namespace: osh-system
spec:
  repositories:
  - name: hub_stable
    url: https://charts.helm.sh/stable
  releases:
  - name: coredns
    chart: hub_stable/coredns
    version: 1.8.1
    namespace: coredns
    values:
      image:
        repository: mirantis.azurecr.io/openstack/extra/coredns
        tag: "1.6.9"
      isClusterService: false
      servers:
      - zones:
        - zone: .
          scheme: dns://
          use_tcp: false
        port: 53
        plugins:
        - name: cache
          parameters: 30
        - name: errors
        # Serves a /health endpoint on :8080, required for livenessProbe
        - name: health
        # Serves a /ready endpoint on :8181, required for readinessProbe
        - name: ready
        # Required to query kubernetes API for data
        - name: kubernetes
          parameters: cluster.local
        - name: loadbalance
          parameters: round_robin
        # Serves a /metrics endpoint on :9153, required for serviceMonitor
        - name: prometheus
          parameters: 0.0.0.0:9153
        - name: forward
          parameters: . /etc/resolv.conf
        - name: file
          parameters: /etc/coredns/it.just.works.db it.just.works
      serviceType: LoadBalancer
      zoneFiles:
      - filename: it.just.works.db
        domain: it.just.works
        contents: |
          it.just.works.   IN SOA sns.dns.icann.org. noc.dns.icann.org. 2015082541 7200 3600 1209600 3600
          it.just.works.   IN NS  b.iana-servers.net.
          it.just.works.   IN NS  a.iana-servers.net.
          it.just.works.   IN A   1.2.3.4
          *.it.just.works. IN A   1.2.3.4
Update the public IP address of the Ingress service:
sed -i 's/1.2.3.4/10.172.1.101/' coredns.yaml
kubectl apply -f coredns.yaml
Verify that the DNS resolution works properly:
Assign an external IP to the service:
kubectl -n coredns patch service coredns-coredns --type='json' -p='[{"op": "replace", "path": "/spec/ports", "value": [{"name": "udp-53", "port": 53, "protocol": "UDP", "targetPort": 53}]}]'
kubectl -n coredns patch service coredns-coredns --type='json' -p='[{"op": "replace", "path": "/spec/type", "value":"LoadBalancer"}]'
Obtain the external IP address of CoreDNS:
kubectl -n coredns get service coredns-coredns
Example of system response:
NAME              TYPE        CLUSTER-IP     EXTERNAL-IP    PORT(S)         AGE
coredns-coredns   ClusterIP   10.96.178.21   10.172.1.102   53/UDP,53/TCP   25h
Point your machine to use the correct DNS. It is 10.172.1.102 in the example system response above.

If you plan to launch Tempest tests or use the OpenStack client from a keystone-client-XXX pod, verify that the Kubernetes built-in DNS service is configured to resolve your public FQDN records by adding your public domain to the Corefile. For example, to add the it.just.works domain:

kubectl -n kube-system get configmap coredns -oyaml
Example of system response:
apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
          pods insecure
          fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        forward . /etc/resolv.conf
        cache 30
        loop
        reload
        loadbalance
    }
    it.just.works:53 {
        errors
        cache 30
        forward . 10.96.178.21
    }
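To add the it.just.works stanza shown above, you can edit the ConfigMap in place with a standard kubectl command and let CoreDNS reload the configuration:

kubectl -n kube-system edit configmap coredns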
Access your OpenStack environment¶
This section explains how to access your OpenStack environment as the Admin user.
Before you proceed, verify that you can access the Kubernetes API and have
privileges to read secrets from the openstack
namespace in Kubernetes or
you are able to exec to the pods in this namespace.
You can use the built-in admin CLI client and execute the openstack
CLI commands from a dedicated pod deployed in the openstack
namespace:
kubectl -n openstack exec \
$(kubectl -n openstack get pod -l application=keystone,component=client -ojsonpath='{.items[*].metadata.name}') \
-ti -- bash
This pod has python-openstackclient
and all required plugins already
installed. Also, this pod has cloud admin credentials stored as appropriate
shell environment variables for the openstack CLI command to
consume.
Configure the external DNS resolution for OpenStack services as described in Configure DNS to access OpenStack.
Obtain the password of the Admin user:
kubectl -n openstack get secret keystone-keystone-admin -ojsonpath='{.data.OS_PASSWORD}' | base64 -d
Access Horizon through your browser using its public service. For example, https://horizon.it.just.works.

To log in, specify the admin user name and default domain. If the OpenStack Identity service has been deployed with the OpenID Connect integration:

From the Authenticate using drop-down menu, select OpenID Connect.

Click Connect. You will be redirected to your identity provider to proceed with the authentication.
Note
If OpenStack has been deployed with self-signed TLS certificates for public endpoints, you may get a warning about an untrusted certificate. To proceed, allow the connection.
To be able to access your OpenStack environment using CLI, you need to set the required environment variables that are stored in an OpenStack RC environment file. You can either download a project-specific file from Horizon, which is the easiest way, or create an environment file.
To access OpenStack through CLI, select from the following options:
Download and source the OpenStack RC file:
Log in to Horizon as described in Access an OpenStack environment through Horizon.
Download the openstackrc or clouds.yaml file from the web interface.

On any shell from which you want to run OpenStack commands, source the environment file for the respective project.
Create and source the OpenStack RC file:
Configure the external DNS resolution for OpenStack services as described in Configure DNS to access OpenStack.
Create a stub of the OpenStack RC file:
cat << EOF > openstackrc
export OS_PASSWORD=$(kubectl -n openstack get secret keystone-keystone-admin -ojsonpath='{.data.OS_PASSWORD}' | base64 -d)
export OS_USERNAME=admin
export OS_USER_DOMAIN_NAME=Default
export OS_PROJECT_NAME=admin
export OS_PROJECT_DOMAIN_NAME=Default
export OS_REGION_NAME=RegionOne
export OS_INTERFACE=public
export OS_IDENTITY_API_VERSION="3"
EOF
Add the Keystone public endpoint to this file as the OS_AUTH_URL variable. For example, for the domain name used throughout this guide:

echo export OS_AUTH_URL=https://keystone.it.just.works >> openstackrc
Source the obtained data into the shell:
source <openstackrc>
Now, you can use the openstack CLI as usual. For example:
openstack user list

+----------------------------------+-----------------+
| ID                               | Name            |
+----------------------------------+-----------------+
| dc23d2d5ee3a4b8fae322e1299f7b3e6 | internal_cinder |
| 8d11133d6ef54349bd014681e2b56c7b | admin           |
+----------------------------------+-----------------+
Note
If OpenStack was deployed with self-signed TLS certificates for public endpoints, you may need to use the openstack CLI client with certificate validation disabled. For example:
openstack --insecure user list
Troubleshoot an OpenStack deployment¶
This section provides general debugging instructions for your OpenStack on Kubernetes deployment. Start troubleshooting by determining the failing component, which can be the OpenStack Operator, Helm, or a particular pod or service.
Note
For Kubernetes cluster debugging and troubleshooting, refer to Kubernetes official documentation: Troubleshoot clusters and Docker Enterprise v3.0 documentation: Monitor and troubleshoot.
Debugging the Helm releases¶
Note
MOSK uses direct communication with Helm 3.
Log in to the openstack-controller pod, where the Helm v3 client is installed, or download the Helm v3 binary locally:

kubectl -n osh-system get pods | grep openstack-controller

Example of a system response:

openstack-controller-5c6947c996-vlrmv              5/5     Running     0          10m
openstack-controller-admission-f946dc8d6-6bgn2     1/1     Running     0          4h3m
Verify the Helm releases statuses:
helm3 --namespace openstack list --all
Example of a system response:
NAME                         NAMESPACE  REVISION  UPDATED                                  STATUS    CHART                       APP VERSION
etcd                         openstack  4         2021-07-09 11:06:25.377538008 +0000 UTC  deployed  etcd-0.1.0-mcp-2735
ingress-openstack            openstack  4         2021-07-09 11:06:24.892822083 +0000 UTC  deployed  ingress-0.1.0-mcp-2735
openstack-barbican           openstack  4         2021-07-09 11:06:25.733684392 +0000 UTC  deployed  barbican-0.1.0-mcp-3890
openstack-ceph-rgw           openstack  4         2021-07-09 11:06:25.045759981 +0000 UTC  deployed  ceph-rgw-0.1.0-mcp-2735
openstack-cinder             openstack  4         2021-07-09 11:06:42.702963544 +0000 UTC  deployed  cinder-0.1.0-mcp-3890
openstack-designate          openstack  4         2021-07-09 11:06:24.400555027 +0000 UTC  deployed  designate-0.1.0-mcp-3890
openstack-glance             openstack  4         2021-07-09 11:06:25.5916904 +0000 UTC    deployed  glance-0.1.0-mcp-3890
openstack-heat               openstack  4         2021-07-09 11:06:25.3998706 +0000 UTC    deployed  heat-0.1.0-mcp-3890
openstack-horizon            openstack  4         2021-07-09 11:06:23.27538297 +0000 UTC   deployed  horizon-0.1.0-mcp-3890
openstack-iscsi              openstack  4         2021-07-09 11:06:37.891858343 +0000 UTC  deployed  iscsi-0.1.0-mcp-2735        v1.0.0
openstack-keystone           openstack  4         2021-07-09 11:06:24.878052272 +0000 UTC  deployed  keystone-0.1.0-mcp-3890
openstack-libvirt            openstack  4         2021-07-09 11:06:38.185312907 +0000 UTC  deployed  libvirt-0.1.0-mcp-2735
openstack-mariadb            openstack  4         2021-07-09 11:06:24.912817378 +0000 UTC  deployed  mariadb-0.1.0-mcp-2735
openstack-memcached          openstack  4         2021-07-09 11:06:24.852840635 +0000 UTC  deployed  memcached-0.1.0-mcp-2735
openstack-neutron            openstack  4         2021-07-09 11:06:58.96398517 +0000 UTC   deployed  neutron-0.1.0-mcp-3890
openstack-neutron-rabbitmq   openstack  4         2021-07-09 11:06:51.454918432 +0000 UTC  deployed  rabbitmq-0.1.0-mcp-2735
openstack-nova               openstack  4         2021-07-09 11:06:44.277976646 +0000 UTC  deployed  nova-0.1.0-mcp-3890
openstack-octavia            openstack  4         2021-07-09 11:06:24.775069513 +0000 UTC  deployed  octavia-0.1.0-mcp-3890
openstack-openvswitch        openstack  4         2021-07-09 11:06:55.271711021 +0000 UTC  deployed  openvswitch-0.1.0-mcp-2735
openstack-placement          openstack  4         2021-07-09 11:06:21.954550107 +0000 UTC  deployed  placement-0.1.0-mcp-3890
openstack-rabbitmq           openstack  4         2021-07-09 11:06:25.431404853 +0000 UTC  deployed  rabbitmq-0.1.0-mcp-2735
openstack-tempest            openstack  2         2021-07-09 11:06:21.330801212 +0000 UTC  deployed  tempest-0.1.0-mcp-3890
If a Helm release is not in the DEPLOYED state, obtain the details from the output of the following command:

helm3 --namespace openstack history <release-name>
To verify the status of a Helm release:
helm3 --namespace openstack status <release-name>
Example of a system response:
NAME: openstack-memcached
LAST DEPLOYED: Fri Jul 9 11:06:24 2021
NAMESPACE: openstack
STATUS: deployed
REVISION: 4
TEST SUITE: None
Debugging the OpenStack Controller¶
The OpenStack Controller is running in several containers in the
openstack-controller-xxxx
pod in the osh-system
namespace.
For the full list of containers and their roles, refer to
OpenStack Controller.
To verify the status of the OpenStack Controller, run:
kubectl -n osh-system get pods
Example of a system response:
NAME READY STATUS RESTARTS AGE
openstack-controller-5c6947c996-vlrmv 5/5 Running 0 17m
openstack-controller-admission-f946dc8d6-6bgn2 1/1 Running 0 4h9m
openstack-operator-ensure-resources-5ls8k 0/1 Completed 0 4h12m
To verify the logs for the osdpl
container, run:
kubectl -n osh-system logs -f <openstack-controller-xxxx> -c osdpl
Debugging the OsDpl CR¶
This section includes the ways to mitigate the most common issues with the OsDpl CR. We assume that you have already debugged the Helm releases and OpenStack Controllers to rule out possible failures with these components as described in Debugging the Helm releases and Debugging the OpenStack Controller.
osdpl has DEPLOYED=false¶

Possible root cause: One or more Helm releases have not been deployed successfully.
To determine if you are affected:
Verify the status of the osdpl
object:
kubectl -n openstack get osdpl osh-dev
Example of a system response:
NAME AGE DEPLOYED DRAFT
osh-dev 22h false false
To debug the issue:
Identify the failed release by assessing the status:children section in the OsDpl resource:

Get the OsDpl YAML file:

kubectl -n openstack get osdpl osh-dev -o yaml

Analyze the status output using the detailed description in Status OsDpl elements.
For further debugging, refer to Debugging the Helm releases.
Pods are stuck in Init¶

Possible root cause: MOSK uses the Kubernetes entrypoint
init container to resolve dependencies between objects. If a pod is stuck
in Init:0/X, this pod may be waiting for its dependencies.
To debug the issue:
Verify the missing dependencies:
kubectl -n openstack logs -f placement-api-84669d79b5-49drw -c init
Example of a system response:
Entrypoint WARNING: 2020/04/21 11:52:50 entrypoint.go:72: Resolving dependency Job placement-ks-user in namespace openstack failed: Job Job placement-ks-user in namespace openstack is not completed yet .
Entrypoint WARNING: 2020/04/21 11:52:52 entrypoint.go:72: Resolving dependency Job placement-ks-endpoints in namespace openstack failed: Job Job placement-ks-endpoints in namespace openstack is not completed yet .
Possible root cause: some OpenStack services depend on Ceph. These services
include OpenStack Image, OpenStack Compute, and OpenStack Block Storage.
If the Helm releases for these services are not present, the
openstack-ceph-keys
secret may be missing in the openstack-ceph-shared
namespace.
To debug the issue:
Verify that the Ceph Controller has created the openstack-ceph-keys
secret in the openstack-ceph-shared
namespace:
kubectl -n openstack-ceph-shared get secrets openstack-ceph-keys
Example of a positive system response: