Mirantis OpenStack for Kubernetes Documentation

This documentation provides information on how to deploy and operate a Mirantis OpenStack for Kubernetes (MOSK) environment. It is intended to help operators understand the core concepts of the product and provides sufficient information to deploy and operate the solution.

The information provided in this documentation set is constantly improved and amended based on the feedback and requests from the consumers of MOSK.

The following list describes the guides included in the documentation set you are reading:

Guide list

  • Reference Architecture - Learn the fundamentals of MOSK reference architecture to appropriately plan your deployment

  • Deployment Guide - Deploy a MOSK environment of a preferred configuration using supported deployment profiles tailored to the demands of specific business cases

  • Operations Guide - Operate your MOSK environment

  • Release Notes - Learn about new features and bug fixes in the current MOSK version

Intended audience

This documentation is intended for engineers who have basic knowledge of Linux, virtualization and containerization technologies, the Kubernetes API and CLI, Helm and Helm charts, Mirantis Kubernetes Engine (MKE), and OpenStack.

Documentation history

The following list contains the released revisions of the documentation set you are reading:

  • November 05, 2020 - MOSK GA release

  • December 23, 2020 - MOSK GA Update release

  • March 01, 2021 - MOSK 21.1

  • April 22, 2021 - MOSK 21.2

  • June 15, 2021 - MOSK 21.3

  • September 01, 2021 - MOSK 21.4

  • October 05, 2021 - MOSK 21.5

  • November 11, 2021 - MOSK 21.6

  • February 23, 2022 - MOSK 22.1

  • April 14, 2022 - MOSK 22.2

  • June 30, 2022 - MOSK 22.3

Conventions

This documentation set uses the following conventions in the HTML format:

Documentation conventions

  • boldface font - Inline CLI tools and commands, titles of procedures and system response examples, table titles

  • monospaced font - File names and paths, Helm chart parameters and their values, names of packages, node names and labels, and so on

  • italic font - Information that distinguishes some concept or term

  • Links - External links and cross-references, footnotes

  • Main menu > menu item - GUI elements that include any part of the interactive user interface and menu navigation

  • Superscript - Some extra, brief information

  • The Note block - Messages of a generic meaning that may be useful for the user

  • The Caution block - Information that prevents a user from mistakes and undesirable consequences when following the procedures

  • The Warning block - Messages that include details that can be easily missed, but should not be ignored by the user and are valuable before proceeding

  • The See also block - A list of references that may be helpful for understanding of some related tools, concepts, and so on

  • The Learn more block - Used in the Release Notes to wrap a list of internal references to the reference architecture, deployment, and operation procedures specific to a newly implemented product feature

Product Overview

Mirantis OpenStack for Kubernetes (MOSK) combines the power of Mirantis Container Cloud for delivering and managing Kubernetes clusters with the industry-standard OpenStack APIs, enabling you to build your own cloud infrastructure.

The advantages of running all of the OpenStack components as a Kubernetes application are multi-fold and include the following:

  • Zero-downtime, non-disruptive updates

  • Fully automated Day-2 operations

  • Full-stack management from bare metal through the operating system and all the necessary components

The list of the most common use cases includes:

Software-defined data center

The traditional data center requires multiple requests and interactions to deploy new services. By abstracting the data center functionality behind a standardised set of APIs, services can be deployed faster and more efficiently. MOSK enables you to define all your data center resources behind the industry-standard OpenStack APIs, allowing you to automate the deployment of applications or simply request resources through the UI to quickly and efficiently provision virtual machines, storage, networking, and other resources.

Virtual Network Functions (VNFs)

VNFs require high-performance systems that can be accessed on demand in a standardised way, with assurances that they will have access to the necessary resources and performance guarantees when needed. MOSK provides extensive support for VNF workloads, enabling easy access to functionality such as Intel EPA (NUMA, CPU pinning, huge pages) as well as the consumption of specialised network interface cards to support SR-IOV and DPDK. The centralised management model of MOSK and Mirantis Container Cloud also enables easy management of multiple MOSK deployments with full lifecycle management.

Legacy workload migration

With the industry moving toward cloud-native technologies, many older or legacy applications cannot be moved easily, and often it does not make financial sense to transform them into cloud-native applications. MOSK provides a stable cloud platform that can cost-effectively host legacy applications whilst still providing the expected levels of control, customization, and uptime.

Reference Architecture

Mirantis OpenStack for Kubernetes (MOSK) is a virtualization platform that provides an infrastructure for cloud-ready applications, in combination with reliability and full control over the data.

MOSK combines OpenStack, an open-source cloud infrastructure software, with application management techniques used in the Kubernetes ecosystem that include container isolation, state enforcement, declarative definition of deployments, and others.

MOSK integrates with Mirantis Container Cloud to rely on its capabilities for bare-metal infrastructure provisioning, Kubernetes cluster management, and continuous delivery of the stack components.

MOSK simplifies the work of a cloud operator by automating all major cloud life cycle management routines including cluster updates and upgrades.

Deployment profiles

A Mirantis OpenStack for Kubernetes (MOSK) deployment profile is a thoroughly tested and officially supported reference architecture that is guaranteed to work at a specific scale and is tailored to the demands of a specific business case, such as generic IaaS cloud, Network Function Virtualisation infrastructure, Edge Computing, and others.

A deployment profile is defined as a combination of:

  • Services and features the cloud offers to its users.

  • Non-functional characteristics that users and operators should expect when running the profile on top of a reference hardware configuration. These include, but are not limited to:

    • Performance characteristics, such as an average network throughput between VMs in the same virtual network.

    • Reliability characteristics, such as the cloud API error response rate when recovering a failed controller node.

    • Scalability characteristics, such as the total number of virtual routers tenants can run simultaneously.

  • Hardware requirements - the specification of physical servers and networking equipment required to run the profile in production.

  • Deployment parameters that a cloud operator can tweak within a certain range without the risk of breaking the cloud or losing support.

In addition, the following items may be included in a definition:

  • Compliance-driven technical requirements, such as TLS encryption of all external API endpoints.

  • Foundation-level software components, such as Tungsten Fabric or Open vSwitch as a back end for the networking service.

Note

Mirantis reserves the right to revise the technical implementation of any profile at will while preserving its definition - the functional and non-functional characteristics that operators and users are known to rely on.

MOSK supports a set of deployment profiles to address a wide variety of business tasks. The profiles for the most common use cases are described below.

Note

Some components of a MOSK cluster are mandatory and are installed during the managed cluster deployment by Mirantis Container Cloud regardless of the deployment profile in use. StackLight is one of the cluster components that are enabled by default. See the Mirantis Container Cloud Operations Guide for details.

Supported deployment profiles

Cloud Provider Infrastructure (CPI)

OpenStackDeployment CR preset: compute

Provides the core set of the services an IaaS vendor would need, including some extra functionality. The profile is designed to support up to 50-70 compute nodes and a reasonable number of storage nodes. 0

The core set of services provided by the profile includes:

  • Compute (Nova)

  • Images (Glance)

  • Networking (Neutron with Open vSwitch as a back end)

  • Identity (Keystone)

  • Block Storage (Cinder)

  • Orchestration (Heat)

  • Load balancing (Octavia)

  • DNS (Designate)

  • Secret Management (Barbican)

  • Web front end (Horizon)

  • Bare metal provisioning (Ironic) 1 2

  • Telemetry (aodh, Panko, Ceilometer, and Gnocchi) 3

CPI with Tungsten Fabric

OpenStackDeployment CR preset: compute-tf

A variation of the CPI profile 1 with Tungsten Fabric as a back end for networking.

0 - The supported node count is approximate and may vary depending on the hardware, cloud configuration, and planned workload.

1 (1,2) - Ironic is an optional component for the CPI profile. See Bare metal OsDpl configuration for details.

2 - Ironic is not supported for the CPI with Tungsten Fabric profile. See Tungsten Fabric known limitations for details.

3 - Telemetry services are optional components and should be enabled together through the list of services to be deployed in the OpenStackDeployment CR as described in Deploy an OpenStack cluster.

Components overview

Mirantis OpenStack for Kubernetes (MOSK) includes the following key design elements.

HelmBundle Operator

The HelmBundle Operator is the realization of the Kubernetes Operator pattern that provides a Kubernetes custom resource of the HelmBundle kind and code running inside a pod in Kubernetes. This code handles changes, such as creation, update, and deletion, in the Kubernetes resources of this kind by deploying, updating, and deleting groups of Helm releases from specified Helm charts with specified values.
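For illustration only, a minimal HelmBundle resource might look as follows. This is a sketch, not a resource shipped with the product: the release name is arbitrary, and the chart and version fields are assumptions about how a chart source is referenced:

apiVersion: lcm.mirantis.com/v1alpha1
kind: HelmBundle
metadata:
  name: example-bundle
  namespace: osh-system
spec:
  releases:
  - name: example-release
    # The chart reference and values below are placeholders
    chart: <URL or name of the Helm chart>
    version: <chart version>
    values: {}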

OpenStack

The OpenStack platform manages virtual infrastructure resources, including virtual servers, storage devices, networks, and networking services, such as load balancers, as well as provides management functions to the tenant users.

Various OpenStack services are running as pods in Kubernetes and are represented as appropriate native Kubernetes resources, such as Deployments, StatefulSets, and DaemonSets.

For a simple, resilient, and flexible deployment of OpenStack and related services on top of a Kubernetes cluster, MOSK uses OpenStack-Helm, which provides the required collection of Helm charts.

Also, MOSK uses OpenStack Operator as the realization of the Kubernetes Operator pattern. The OpenStack Operator provides a custom Kubernetes resource of the OpenStackDeployment kind and code running inside a pod in Kubernetes. This code handles changes such as creation, update, and deletion in the Kubernetes resources of this kind by deploying, updating, and deleting groups of the Helm releases.

Ceph

Ceph is a distributed storage platform that provides storage resources, such as objects and virtual block devices, to virtual and physical infrastructure.

MOSK uses Rook as the implementation of the Kubernetes Operator pattern that manages resources of the CephCluster kind to deploy and manage Ceph services as pods on top of Kubernetes to provide Ceph-based storage to the consumers, which include OpenStack services, such as Volume and Image services, and underlying Kubernetes through Ceph CSI (Container Storage Interface).

The Ceph Controller is the implementation of the Kubernetes Operator pattern, that manages resources of the MiraCeph kind to simplify management of the Rook-based Ceph clusters.
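For example, you can inspect these custom resources with kubectl. The namespaces below are typical for Rook-based Ceph and the Ceph Controller but are assumptions that may differ in your environment:

kubectl -n rook-ceph get cephcluster
kubectl -n ceph-lcm-mirantis get miraceph -oyaml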

StackLight Logging, Monitoring, and Alerting

The StackLight component is responsible for collection, analysis, and visualization of critical monitoring data from physical and virtual infrastructure, as well as alerting and error notifications through a configured communication system, such as email. StackLight includes the following key sub-components:

  • Prometheus

  • OpenSearch

  • OpenSearch Dashboards

  • Fluentd

Requirements

MOSK cluster hardware requirements

This section provides hardware requirements for the Mirantis Container Cloud management cluster with a managed Mirantis OpenStack for Kubernetes (MOSK) cluster.

For installing MOSK, the Mirantis Container Cloud management cluster and the managed cluster must be deployed with the bare metal provider.

Note

One of the industry best practices is to verify every new update or configuration change in a non-customer-facing environment before applying it to production. Therefore, Mirantis recommends having a staging cloud, deployed and maintained along with the production clouds. The recommendation is especially applicable to the environments that:

  • Receive updates often and use continuous delivery. For example, any non-isolated deployment of Mirantis Container Cloud.

  • Have significant deviations from the reference architecture or third party extensions installed.

  • Are managed under the Mirantis OpsCare program.

  • Run business-critical workloads where even the slightest application downtime is unacceptable.

A typical staging cloud is a complete copy of the production environment including the hardware and software configurations, but with a bare minimum of compute and storage capacity.

The table below describes the node types the MOSK reference architecture includes.

MOSK node types

Node type

Description

Mirantis Container Cloud management cluster nodes

The Container Cloud management cluster architecture on bare metal requires three physical servers for manager nodes. On these hosts, we deploy a Kubernetes cluster with services that provide Container Cloud control plane functions.

OpenStack control plane node and StackLight node

Host OpenStack control plane services such as database, messaging, API, schedulers, conductors, and L3 and L2 agents, as well as the StackLight components.

Note

MOSK enables the cloud administrator to collocate the OpenStack control plane with the managed cluster master nodes on small OpenStack deployments. This capability is available as technical preview. Use such a configuration for testing and evaluation purposes only.

Tenant gateway node

Optional. Hosts OpenStack gateway services including L2, L3, and DHCP agents. The tenant gateway nodes are combined with OpenStack control plane nodes. The strict requirement is a dedicated physical network (bond) for tenant network traffic.

Tungsten Fabric control plane node

Required only if Tungsten Fabric is enabled as a back end for the OpenStack networking. These nodes host the TF control plane services such as Cassandra database, messaging, API, control, and configuration services.

Tungsten Fabric analytics node

Required only if Tungsten Fabric is enabled as a back end for the OpenStack networking. These nodes host the TF analytics services such as Cassandra, ZooKeeper, and collector.

Compute node

Hosts the OpenStack Compute services such as QEMU, L2 agents, and others.

Infrastructure nodes

Run the underlying Kubernetes cluster management services. The MOSK reference configuration requires a minimum of three infrastructure nodes.

The table below specifies the hardware resources the MOSK reference architecture recommends for each node type.

Hardware requirements

  • Mirantis Container Cloud management cluster node - 3 servers 0; per server: 16 CPU cores, 128 GB RAM, 1 SSD x 960 GB and 2 SSD x 1900 GB 1, 3 NICs 2

  • OpenStack control plane, gateway 3, and StackLight nodes - 3 or more servers; per server: 32 CPU cores, 128 GB RAM, 1 SSD x 500 GB and 2 SSD x 1000 GB 6, 5 NICs

  • Tenant gateway (optional) - 0-3 servers; per server: 32 CPU cores, 128 GB RAM, 1 SSD x 500 GB, 5 NICs

  • Tungsten Fabric control plane nodes 4 - 3 servers; per server: 16 CPU cores, 64 GB RAM, 1 SSD x 500 GB, 1 NIC

  • Tungsten Fabric analytics nodes 4 - 3 servers; per server: 32 CPU cores, 64 GB RAM, 1 SSD x 1000 GB, 1 NIC

  • Compute node - 3 servers (varies); per server: 16 CPU cores, 64 GB RAM, 1 SSD x 500 GB 7, 5 NICs

  • Infrastructure node (Kubernetes cluster management) - 3 servers 8; per server: 16 CPU cores, 64 GB RAM, 1 SSD x 500 GB, 5 NICs

  • Infrastructure node (Ceph) 5 - 3 servers; per server: 16 CPU cores, 64 GB RAM, 1 SSD x 500 GB and 2 HDDs x 2000 GB, 5 NICs

Note

The exact hardware specifications and number of the control plane and gateway nodes depend on a cloud configuration and scaling needs. For example, for the clouds with more than 12,000 Neutron ports, Mirantis recommends increasing the number of gateway nodes.

0 - Adding more than 3 nodes to a management or regional cluster is not supported.

1 - In total, at least 3 disks are required:

  • sda - system storage, minimum 60 GB

  • sdb - Container Cloud services storage, not less than 110 GB. The exact capacity requirements depend on the StackLight data retention period.

  • sdc - for persistent storage on Ceph

See Management cluster storage for details.

2 - The OOB management (IPMI) port is not included.

3 - OpenStack gateway services can optionally be moved to separate nodes.

4 (1,2) - TF control plane and analytics nodes can be combined on the same hardware hosts with a respective addition of RAM, CPU, and disk space. However, Mirantis does not recommend such a configuration for production environments because it increases the risk of cluster downtime if one of the nodes fails unexpectedly.

5 - Consider the following for the Ceph infrastructure nodes:

  • A Ceph cluster with 3 Ceph nodes does not provide hardware fault tolerance and is not eligible for recovery operations, such as a disk or an entire node replacement.

  • A Ceph cluster uses the replication factor that equals 3. If the number of Ceph OSDs is less than 3, a Ceph cluster moves to the degraded state with the write operations restriction until the number of alive Ceph OSDs equals the replication factor again.

6 - The disk layout is as follows:

  • 1 SSD x 500 GB for the operating system

  • 1 SSD x 1000 GB for OpenStack LVP

  • 1 SSD x 1000 GB for StackLight LVP

7 - When Nova is used with local folders, additional capacity is required depending on the size of the VM images.

8 - For the node hardware requirements, refer to Container Cloud Reference Architecture: Managed cluster hardware configuration.

Note

If you would like to evaluate the MOSK capabilities and do not have much hardware at your disposal, you can deploy it in a virtual environment. For example, on top of another OpenStack cloud using the sample Heat templates.

Note that the tooling is provided for reference only and is not a part of the product itself. Mirantis does not guarantee its interoperability with any MOSK version.

Management cluster storage

The management cluster requires a minimum of three storage devices per node. Each device is used for a different type of storage:

  • One storage device for boot partitions and root file system. SSD is recommended. A RAID device is not supported.

  • One storage device per server is reserved for local persistent volumes. These volumes are served by the Local Storage Static Provisioner (local-volume-provisioner) and used by many services of Mirantis Container Cloud.

  • At least one disk per server must be configured as a device managed by a Ceph OSD.

  • The minimal recommended number of Ceph OSDs for a management cluster is 2 OSDs per node, for a total of 6 OSDs.

  • The recommended replication factor is 3, which ensures that no data is lost if any single node of the management cluster fails.

You can configure host storage devices using BareMetalHostProfile resources.

System requirements for the seed node

The seed node is only necessary to deploy the management cluster. When the bootstrap is complete, the bootstrap node can be discarded and added back to the MOSK cluster as a node of any type.

The minimum reference system requirements for a baremetal-based bootstrap seed node are as follows:

  • Basic Ubuntu 18.04 server with the following configuration:

    • Kernel of version 4.15.0-76.86 or later

    • 8 GB of RAM

    • 4 CPU

    • 10 GB of free disk space for the bootstrap cluster cache

  • No DHCP or TFTP servers on any NIC networks

  • Routable access to the IPMI network for the hardware servers.

  • Internet access for downloading all required artifacts

    If you use a firewall or proxy, make sure that the bootstrap, management, and regional clusters have access to the following IP ranges and domain names:

    • IP ranges:

    • Domain names:

      • mirror.mirantis.com and repos.mirantis.com for packages

      • binary.mirantis.com for binaries and Helm charts

      • mirantis.azurecr.io for Docker images

      • mcc-metrics-prod-ns.servicebus.windows.net:9093 for Telemetry (port 443 if proxy is enabled)

      • mirantis.my.salesforce.com for Salesforce alerts

    Note

    • Access to Salesforce is required from any Container Cloud cluster type.

    • If any additional Alertmanager notification receiver is enabled, for example, Slack, its endpoint must also be accessible from the cluster.

Components collocation

MOSK uses Kubernetes labels to place components onto hosts. For the default locations of components, see MOSK cluster hardware requirements. Additionally, MOSK supports component collocation. This is mostly useful for OpenStack compute and Ceph nodes. For component collocation, consider the following recommendations:

  • When calculating hardware requirements for nodes, consider the requirements for all collocated components.

  • When performing maintenance on a node with collocated components, execute the maintenance plan for all of them.

  • When combining other services with the OpenStack compute host, verify that the reserved_host_* settings are increased according to the needs of the collocated components by using node-specific overrides for the compute service, as shown in the example below.
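For example, on a node that collocates the OpenStack compute and Ceph roles, you could raise the amount of memory reserved for the host through a node-specific override. The following is a sketch only: the node label, the nova chart and nova_compute DaemonSet names, the configuration path, and the value are assumptions to adjust for your environment; see Node-specific settings for the exact override structure.

spec:
  nodes:
    <NODE-LABEL>::<NODE-LABEL-VALUE>:
      services:
        compute:
          nova:
            nova_compute:
              values:
                conf:
                  nova:
                    DEFAULT:
                      # Memory (MB) reserved for the collocated components; the value is illustrative
                      reserved_host_memory_mb: 16384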

Infrastructure requirements

This section lists the infrastructure requirements for the Mirantis OpenStack for Kubernetes (MOSK) reference architecture.

Infrastructure requirements

Service

Description

MetalLB

MetalLB exposes external IP addresses to access applications in a Kubernetes cluster.

DNS

The Kubernetes Ingress NGINX controller is used to expose OpenStack services outside of a Kubernetes deployment. Access to the Ingress services is allowed only by their FQDNs. Therefore, DNS is a mandatory infrastructure service for an OpenStack on Kubernetes deployment.

Automatic upgrade of a host operating system

To keep the operating system on a bare metal host up to date with the latest security updates, the operating system requires periodic upgrades of software packages, which may or may not require a host reboot.

Mirantis Container Cloud uses life cycle management tools to update the operating system packages on the bare metal hosts. Container Cloud may also trigger a restart of bare metal hosts to apply the updates.

In a management cluster, software package upgrades and host restarts are applied automatically when a new Container Cloud version with kernel or software package upgrades is released.

In a managed cluster, package upgrades and host restarts are applied as part of a regular cluster update, when applicable. To start planning the maintenance window and proceed with the managed cluster update, see Update a MOSK cluster to 22.1 or 22.2.

Operating system upgrades and host restarts are applied to cluster nodes one by one. If Ceph is installed in the cluster, the Container Cloud orchestration securely pauses the Ceph OSDs on the node before the restart. This helps to avoid degradation of the storage service.

OpenStack

OpenStack Operator

The OpenStack Operator component is a combination of the following entities:

OpenStack Controller

The OpenStack Controller runs in a set of containers in a pod in Kubernetes. The OpenStack Controller is deployed as a Deployment with 1 replica only. The failover is provided by Kubernetes that automatically restarts the failed containers in a pod.

However, given the recommendation to use a separate Kubernetes cluster for each OpenStack deployment, the controller is designed to manage only a single OpenStackDeployment resource, which makes proper HA much less of an issue.

The OpenStack Controller is written in Python using Kopf, as a Python framework to build Kubernetes operators, and Pykube, as a Kubernetes API client.

Using Kubernetes API, the controller subscribes to changes to resources of kind: OpenStackDeployment, and then reacts to these changes by creating, updating, or deleting appropriate resources in Kubernetes.

The basic child resources managed by the controller are Helm releases. They are rendered from templates taking into account an appropriate values set from the main and features fields in the OpenStackDeployment resource.

Then, the common fields are merged to resulting data structures. Lastly, the services fields are merged providing the final and precise override for any value in any Helm release to be deployed or upgraded.

The constructed values are then used by the OpenStack Controller during a Helm release installation.
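The following sketch illustrates this override precedence, from the high-level features section down to a per-chart override in services. The features value shown is taken from the minimal example later in this guide, while the placeholders in angle brackets are not real chart parameters and the exact nesting under services is an assumption:

spec:
  features:              # high-level settings, rendered into per-chart values first
    nova:
      images:
        backend: local
  common:                 # merged next, applies to all OpenStack or infra Helm charts
    openstack:
      values:
        <common-parameter>: <value>
  services:               # merged last, the most precise override for a single Helm release
    compute:
      nova:
        values:
          <chart-parameter>: <value>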

OpenStack Controller containers

Container

Description

osdpl

The core container that handles changes in the osdpl object.

helmbundle

The container that watches the helmbundle objects and reports their statuses to the osdpl object in status:children. See Status OsDpl elements Removed for details.

health

The container that watches all Kubernetes native resources, such as Deployments, Daemonsets, Statefulsets, and reports their statuses to the osdpl object in status:health. See Status OsDpl elements Removed for details.

secrets

The container that provides data exchange between different components such as Ceph.

node

The container that handles the node events.

OpenStackDeployment Admission Controller

The CustomResourceDefinition resource in Kubernetes uses the OpenAPI Specification version 2 to specify the schema of the resource defined. The Kubernetes API outright rejects the resources that do not pass this schema validation.

The language of the schema, however, is not expressive enough to define a specific validation logic that may be needed for a given resource. For this purpose, Kubernetes enables the extension of its API with Dynamic Admission Control.

For the OpenStackDeployment (OsDpl) CR the ValidatingAdmissionWebhook is a natural choice. It is deployed as part of OpenStack Controller by default and performs specific extended validations when an OsDpl CR is created or updated.

The non-exhaustive list of additional validations includes:

  • Deny the OpenStack version downgrade

  • Deny the OpenStack version skip-level upgrade

  • Deny the OpenStack master version deployment

  • Deny upgrade to the OpenStack master version

  • Deny upgrade if any part of an OsDpl CR specification changes along with the OpenStack version

Under specific circumstances, it may be viable to disable the Admission Controller, for example, when you attempt to deploy or upgrade to the master version of OpenStack.

Warning

Mirantis does not support MOSK deployments performed without the OpenStackDeployment Admission Controller enabled. Disabling of the OpenStackDeployment Admission Controller is only allowed in staging non-production environments.

To disable the Admission Controller, ensure that the following structures and values are present in the openstack-controller HelmBundle resource:

apiVersion: lcm.mirantis.com/v1alpha1
kind: HelmBundle
metadata:
  name: openstack-operator
  namespace: osh-system
spec:
  releases:
  - name: openstack-operator
    values:
      admission:
        enabled: false

At that point, all safeguards except for those expressed by the CR definition are disabled.
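One possible way to apply this change is to edit the HelmBundle resource in place, for example:

kubectl -n osh-system edit helmbundle openstack-operator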

OpenStackDeployment custom resource

The resource of kind OpenStackDeployment (OsDpl) is a custom resource (CR) defined by a resource of kind CustomResourceDefinition. This section provides a detailed overview of the OsDpl configuration including the definition of its main elements as well as the configuration of extra OpenStack services that do not belong to standard deployment profiles.

OsDpl standard configuration

The detailed information about the schema of an OpenStackDeployment (OsDpl) custom resource can be obtained by running:

kubectl get crd openstackdeployments.lcm.mirantis.com -oyaml

The definition of a particular OpenStack deployment can be obtained by running:

kubectl -n openstack get osdpl -oyaml
Example of an OpenStackDeployment CR of minimum configuration
apiVersion: lcm.mirantis.com/v1alpha1
kind: OpenStackDeployment
metadata:
  name: openstack-cluster
  namespace: openstack
spec:
  openstack_version: victoria
  preset: compute
  size: tiny
  internal_domain_name: cluster.local
  public_domain_name: it.just.works
  features:
    neutron:
      tunnel_interface: ens3
      external_networks:
        - physnet: physnet1
          interface: veth-phy
          bridge: br-ex
          network_types:
           - flat
          vlan_ranges: null
          mtu: null
      floating_network:
        enabled: False
    nova:
      live_migration_interface: ens3
      images:
        backend: local

For the detailed description of the OsDpl main elements, see sections below:

Main OsDpl elements
apiVersion

Specifies the version of the Kubernetes API that is used to create this object.

kind

Specifies the kind of the object.

metadata:name

Specifies the name of the object. Should be set in compliance with the Kubernetes resource naming limitations.

metadata:namespace

Specifies the metadata namespace. While technically it is possible to deploy OpenStack on top of Kubernetes in a namespace other than openstack, such a configuration is not included in the MOSK system integration test plans. Therefore, Mirantis does not recommend such a scenario.

Warning

Both OpenStack and Kubernetes platforms provide resources to applications. When OpenStack is running on top of Kubernetes, Kubernetes is completely unaware of OpenStack-native workloads, such as virtual machines, for example.

For better results and stability, Mirantis recommends using a dedicated Kubernetes cluster for OpenStack, so that OpenStack and auxiliary services, Ceph, and StackLight are the only Kubernetes applications running in the cluster.

spec

Contains the data that defines the OpenStack deployment and configuration. It has both high-level and low-level sections.

The very basic values that must be provided include:

spec:
  openstack_version:
  preset:
  size:
  public_domain_name:

For the detailed description of the spec subelements, see Spec OsDpl elements.

Spec OsDpl elements
openstack_version

Specifies the OpenStack release to deploy.

preset

String that specifies the name of the preset, a predefined configuration for the OpenStack cluster. A preset includes:

  • A set of enabled services that includes virtualization, bare metal management, secret management, and others

  • Major features provided by the services, such as VXLAN encapsulation of the tenant traffic

  • Integration of services

Every supported deployment profile incorporates an OpenStack preset. Refer to Deployment profiles for the list of possible values.

size

String that specifies the size category for the OpenStack cluster. The size category defines the internal configuration of the cluster, such as the number of replicas for service workers, timeouts, and so on.

The list of supported sizes includes:

  • tiny - for approximately 10 OpenStack compute nodes

  • small - for approximately 50 OpenStack compute nodes

  • medium - for approximately 100 OpenStack compute nodes

public_domain_name

Specifies the public DNS name for OpenStack services. This is a base DNS name that must be accessible and resolvable by API clients of your OpenStack cloud. It will be present in the OpenStack endpoints as presented by the OpenStack Identity service catalog.

The TLS certificates used by the OpenStack services (see below) must also be issued to this DNS name.

persistent_volume_storage_class

Specifies the Kubernetes storage class name used for services to create persistent volumes. For example, backups of MariaDB. If not specified, the storage class marked as default will be used.
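For example, assuming your cluster provides a storage class named mirablock-k8s-block-hdd (the name here is purely illustrative):

spec:
  persistent_volume_storage_class: mirablock-k8s-block-hdd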

features

Contains the top-level collections of settings for the OpenStack deployment that potentially target several OpenStack services. This is the section where customizations should take place.

features:services

Contains a list of extra OpenStack services to deploy. Extra OpenStack services are services that are not included in the preset.

features:services:object-storage

Enables the object storage and provides a RADOS Gateway Swift API that is compatible with the OpenStack Swift API. To enable the service, add object-storage to the service list:

spec:
  features:
    services:
    - object-storage

To create the RADOS Gateway pool in Ceph, see Container Cloud Operations Guide: Enable Ceph RGW Object Storage.

features:services:instance-ha

TechPreview

Enables Masakari, the OpenStack service that ensures high availability of instances running on a host. To enable the service, add instance-ha to the service list:

spec:
  features:
    services:
    - instance-ha
features:services:tempest

Enables tests against a deployed OpenStack cloud:

spec:
  features:
    services:
    - tempest
features:ssl

Deprecated since 22.3

Setting this field in the OpenStackDeployment custom resource has been deprecated. Use OpenStackDeploymentSecret custom resource to define the cloud’s secret parameters.

For the deprecation details, refer to OpenStackDeployment CR fields containing cloud secret parameters.

features:neutron:tunnel_interface

Defines the name of the NIC device on the actual host that will be used for Neutron.

We recommend setting up your Kubernetes hosts in such a way that networking is configured identically on all of them, and names of the interfaces serving the same purpose or plugged into the same network are consistent across all physical nodes.

features:neutron:dns_servers

Defines the list of IPs of DNS servers that are accessible from virtual networks. Used as default DNS servers for VMs.
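For example, to point VMs at public resolvers (the addresses are illustrative):

spec:
  features:
    neutron:
      dns_servers:
        - 8.8.8.8
        - 8.8.4.4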

features:neutron:external_networks

Contains the data structure that defines external (provider) networks on top of which the Neutron networking will be created.

features:neutron:floating_network

If enabled, must contain the data structure defining the floating IP network that will be created for Neutron to provide external access to your Nova instances.
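A possible configuration sketch is shown below. The enabled flag is the documented switch; the physnet and subnet-related fields are assumptions provided for illustration only:

spec:
  features:
    neutron:
      floating_network:
        enabled: true
        physnet: physnet1
        subnet:
          range: 10.11.12.0/24
          pool_start: 10.11.12.100
          pool_end: 10.11.12.200
          gateway: 10.11.12.1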

features:nova:live_migration_interface

Specifies the name of the NIC device on the actual host that will be used by Nova for the live migration of instances.

We recommend setting up your Kubernetes hosts in such a way that networking is configured identically on all of them, and names of the interfaces serving the same purpose or plugged into the same network are consistent across all physical nodes.

Also, set the option to vhost0 in the following cases:

  • The Neutron service uses Tungsten Fabric.

  • Nova migrates instances through the interface specified by the Neutron’s tunnel_interface parameter.

features:nova:images:backend

Defines the type of storage for Nova to use on the compute hosts for the images that back the instances.

The list of supported options includes:

  • local - the local storage is used. The pros include faster operation and failure domain independence from the external storage. The cons include local space consumption as well as slower and less robust live migration that relies on block migration.

  • ceph - instance images are stored in a Ceph pool shared across all Nova hypervisors. The pros include faster image start as well as faster and more robust live migration. The cons include considerably slower IO performance and the direct dependency of workload operations on Ceph cluster availability and performance.

  • lvm TechPreview - instance images and ephemeral images are stored on a local Logical Volume. If specified, features:nova:images:lvm:volume_group must be set to an available LVM Volume Group, by default, nova-vol. For details, see Enable LVM ephemeral storage.

features:barbican:backends:vault

Specifies the object containing the parameters used by Barbican to connect to Vault. The list of supported options includes:

  • enabled - boolean parameter indicating that the Vault back end is enabled.

  • approle_role_id Deprecated since 22.3 - Vault app role ID.

    Setting this field in the OpenStackDeployment custom resource has been deprecated. Use OpenStackDeploymentSecret custom resource to define the cloud’s secret parameters.

    For the deprecation details, refer to OpenStackDeployment CR fields containing cloud secret parameters.

  • approle_secret_id Deprecated since 22.3 - secret ID created for the app role.

    Setting this field in the OpenStackDeployment custom resource has been deprecated. Use OpenStackDeploymentSecret custom resource to define the cloud’s secret parameters.

    For the deprecation details, refer to OpenStackDeployment CR fields containing cloud secret parameters.

  • vault_url - URL of the Vault server.

  • use_ssl - enables the SSL encryption. Since MOSK does not currently support the Vault SSL encryption, the use_ssl parameter should be set to false.

  • kv_mountpoint TechPreview - optional, specifies the mountpoint of a Key-Value store in Vault to use.

  • namespace TechPreview - optional, specifies the Vault namespace to use with all requests to Vault.

    Note

    The Vault namespaces feature is available only in Vault Enterprise.

    Note

    Vault namespaces are supported only starting from the OpenStack Victoria release.

If the Vault back end is used, configure it properly using the following parameters:

spec:
  features:
    barbican:
      backends:
        vault:
          enabled: true
          approle_role_id: <APPROLE_ROLE_ID>
          approle_secret_id: <APPROLE_SECRET_ID>
          vault_url: <VAULT_SERVER_URL>
          use_ssl: false

Note

Since MOSK does not currently support the Vault SSL encryption, set the use_ssl parameter to false.

features:keystone:keycloak

Defines parameters to connect to the Keycloak identity provider. For details, see Integration with Identity Access Management (IAM).

features:keystone:domain_specific_configuration

Defines the domain-specific configuration and is useful for integration with LDAP. An example of OsDpl with LDAP integration, which will create a separate domain.with.ldap domain and configure it to use LDAP as an identity driver:

spec:
  features:
    keystone:
      domain_specific_configuration:
        enabled: true
        domains:
          domain.with.ldap:
            enabled: true
            config:
              assignment:
                driver: keystone.assignment.backends.sql.Assignment
              identity:
                driver: ldap
              ldap:
                chase_referrals: false
                group_desc_attribute: description
                group_id_attribute: cn
                group_member_attribute: member
                group_name_attribute: ou
                group_objectclass: groupOfNames
                page_size: 0
                password: XXXXXXXXX
                query_scope: sub
                suffix: dc=mydomain,dc=com
                url: ldap://ldap01.mydomain.com,ldap://ldap02.mydomain.com
                user: uid=openstack,ou=people,o=mydomain,dc=com
                user_enabled_attribute: enabled
                user_enabled_default: false
                user_enabled_invert: true
                user_enabled_mask: 0
                user_id_attribute: uid
                user_mail_attribute: mail
                user_name_attribute: uid
                user_objectclass: inetOrgPerson
features:telemetry:mode

The information about Telemetry has been amended and updated and is now published in the Telemetry services section. The feature is set to autoscaling by default.
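For example, to set the mode explicitly (the Telemetry services themselves must be enabled through features:services as noted in Deployment profiles):

spec:
  features:
    telemetry:
      mode: autoscaling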

features:logging

Specifies the standard logging levels for OpenStack services, which include the following, in order of increasing severity: TRACE, DEBUG, INFO, AUDIT, WARNING, ERROR, and CRITICAL. For example:

spec:
  features:
    logging:
      nova:
        level: DEBUG
features:horizon:themes

Defines the list of custom OpenStack Dashboard themes. Content of the archive file with a theme depends on the level of customization and can include static files, Django templates, and other artifacts. For the details, refer to OpenStack official documentation: Customizing Horizon Themes.

spec:
  features:
    horizon:
      themes:
        - name: theme_name
          description: The brand new theme
          url: https://<path to .tgz file with the contents of custom theme>
          sha256summ: <SHA256 checksum of the archive above>
features:policies

Defines the list of custom policies for OpenStack services.

Structure example:

spec:
  features:
    policies:
      nova:
        custom_policy: custom_value

The list of services available for configuration includes: Cinder, Nova, Designate, Keystone, Glance, Neutron, Heat, Octavia, Barbican, Placement, Ironic, aodh, Panko, Gnocchi, and Masakari.

Caution

Mirantis is not responsible for cloud operability in case of modifications of the default policies but provides the API to pass the required configuration to the core OpenStack services.

features:database:cleanup

Available since MOSK 21.6

Defines the cleanup of stale database entries that are marked by OpenStack services as deleted. The cleanup scripts run on a periodic basis as cron jobs. By default, database entries older than 30 days are cleaned every Monday as per the following schedule (server time):

  • Cinder - 12:01 a.m.

  • Nova - 01:01 a.m.

  • Glance - 02:01 a.m.

  • Masakari - 03:01 a.m.

  • Barbican - 04:01 a.m.

  • Heat - 05:01 a.m.

The list of services available for configuration includes: Barbican, Cinder, Glance, Heat, Masakari, and Nova.

Structure example:

spec:
  features:
    database:
      cleanup:
        <os-service>:
          enabled:
          schedule:
          age: 30
          batch: 1000
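For illustration, the following sketch keeps the documented defaults for Nova but makes them explicit. The cron format for the schedule field is an assumption; "1 1 * * 1" would correspond to 01:01 a.m. every Monday:

spec:
  features:
    database:
      cleanup:
        nova:
          enabled: true
          # Assumed cron format: 01:01 a.m. every Monday
          schedule: "1 1 * * 1"
          age: 30
          batch: 1000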
artifacts

A low-level section that defines the base URI prefixes for images and binary artifacts.

common

A low-level section that defines values that will be passed to all OpenStack (spec:common:openstack) or auxiliary (spec:common:infra) services Helm charts.

Structure example:

spec:
  artifacts:
  common:
    openstack:
      values:
    infra:
      values:
services

The lowest-level section that enables the definition of specific values to pass to specific Helm charts on a one-by-one basis.
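A structural sketch is shown below; the service and chart names are placeholders, and the exact nesting under services is an assumption mirroring the node-specific overrides described in Node-specific settings:

spec:
  services:
    <service>:
      <chart>:
        values:
          # Any value from the specific Helm chart
          <chart-parameter>: <value>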

Warning

Mirantis does not recommend changing the default settings for spec:artifacts, spec:common, and spec:services elements. Customizations can compromise the OpenStack deployment update and upgrade processes. However, you may need to edit the spec:services section to limit hardware resources in case of a hyperconverged architecture as described in Limit HW resources for hyperconverged OpenStack compute nodes.

Status OsDpl elements Removed

This feature has been removed in MOSK 22.1 in favor of the OpenStackDeploymentStatus (OsDplSt) custom resource.

Integration with Identity Access Management (IAM)

Mirantis Container Cloud uses the Identity and access management (IAM) service for users and permission management. The IAM integration is enabled by default on the OpenStack side. On the IAM side, the service creates the os client in Keycloak automatically.

The role management and assignment should be configured separately on a particular OpenStack deployment.

Bare metal OsDpl configuration

The Bare metal (Ironic) service is an extra OpenStack service that can be deployed by the OpenStack Operator. This section provides the baremetal-specific configuration options of the OsDpl resource.

To install bare metal services, add the baremetal keyword to the spec:features:services list:

spec:
  features:
    services:
      - baremetal

Note

All bare metal services are scheduled to the nodes with the openstack-control-plane: enabled label.
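For example, to list the nodes that carry this label:

kubectl get nodes -l openstack-control-plane=enabled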

Ironic agent deployment images

To provision a user image onto a bare metal server, Ironic boots a node with a ramdisk image. Depending on the node’s deploy interface and hardware, the ramdisk may require different drivers (agents). MOSK provides tinyIPA-based ramdisk images and uses the direct deploy interface with the ipmitool power interface.

Example of agent_images configuration:

spec:
  features:
    ironic:
       agent_images:
         base_url: https://binary.mirantis.com/openstack/bin/ironic/tinyipa
         initramfs: tinyipa-stable-ussuri-20200617101427.gz
         kernel: tinyipa-stable-ussuri-20200617101427.vmlinuz

Since the bare metal nodes hardware may require additional drivers, you may need to build a deploy ramdisk for particular hardware. For more information, see Ironic Python Agent Builder. Be sure to create a ramdisk image with the version of Ironic Python Agent appropriate for your OpenStack release.

Bare metal networking

Ironic supports the flat and multitenancy networking modes.

The flat networking mode assumes that all bare metal nodes are pre-connected to a single network that cannot be changed during the virtual machine provisioning. This network with bridged interfaces for Ironic should be spread across all nodes, including compute nodes, to allow regular virtual machines to plug into the Ironic network. In its turn, the interface defined as provisioning_interface should be spread across gateway nodes. The cloud administrator can perform all this underlying configuration through the L2 templates.

Example of the OsDpl resource illustrating the configuration for the flat network mode:

spec:
  features:
    services:
      - baremetal
    neutron:
      external_networks:
        - bridge: ironic-pxe
          interface: <baremetal-interface>
          network_types:
            - flat
          physnet: ironic
          vlan_ranges: null
    ironic:
       # The name of neutron network used for provisioning/cleaning.
       baremetal_network_name: ironic-provisioning
       networks:
         # Neutron baremetal network definition.
         baremetal:
           physnet: ironic
           name: ironic-provisioning
           network_type: flat
           external: true
           shared: true
           subnets:
             - name: baremetal-subnet
               range: 10.13.0.0/24
               pool_start: 10.13.0.100
               pool_end: 10.13.0.254
               gateway: 10.13.0.11
       # The name of interface where provision services like tftp and ironic-conductor
       # are bound.
       provisioning_interface: br-baremetal

The multitenancy network mode uses the neutron Ironic network interface to share physical connection information with Neutron. This information is handled by Neutron ML2 drivers when plugging a Neutron port to a specific network. MOSK supports the networking-generic-switch Neutron ML2 driver out of the box.

Example of the OsDpl resource illustrating the configuration for the multitenancy network mode:

spec:
  features:
    services:
      - baremetal
    neutron:
      tunnel_interface: ens3
      external_networks:
        - physnet: physnet1
          interface: <physnet1-interface>
          bridge: br-ex
          network_types:
            - flat
          vlan_ranges: null
          mtu: null
        - physnet: ironic
          interface: <physnet-ironic-interface>
          bridge: ironic-pxe
          network_types:
            - vlan
          vlan_ranges: 1000:1099
    ironic:
      # The name of interface where provision services like tftp and ironic-conductor
      # are bound.
      provisioning_interface: <baremetal-interface>
      baremetal_network_name: ironic-provisioning
      networks:
        baremetal:
          physnet: ironic
          name: ironic-provisioning
          network_type: vlan
          segmentation_id: 1000
          external: true
          shared: false
          subnets:
            - name: baremetal-subnet
              range: 10.13.0.0/24
              pool_start: 10.13.0.100
              pool_end: 10.13.0.254
              gateway: 10.13.0.11
Node-specific settings

Depending on the use case, you may need to configure the same application components differently on different hosts. MOSK enables you to easily perform the required configuration through node-specific overrides at the OpenStack Controller side.

The limitation of using the node-specific overrides is that they override only the configuration settings, while other components, such as startup scripts, may need to be reconfigured as well.

Caution

The overrides have been implemented in a similar way to the OpenStack node and node label specific DaemonSet configurations. However, the OpenStack Controller node-specific settings conflict with the upstream OpenStack node and node label specific DaemonSet configurations. Therefore, we do not recommend configuring node and node label overrides.

The list of allowed node labels is located in the Cluster object status providerStatus.releaseRef.current.allowedNodeLabels field.

Starting from MOSK 22.3, if the value field is not defined in allowedNodeLabels, a label can have any value.
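For example, to inspect the list of allowed node labels for your cluster, where the project namespace and cluster name are placeholders:

kubectl -n <project-namespace> get cluster <cluster-name> -o jsonpath='{.status.providerStatus.releaseRef.current.allowedNodeLabels}'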

Before or after a machine deployment, add the required label from the allowed node labels list with the corresponding value to spec.providerSpec.value.nodeLabels in machine.yaml. For example:

nodeLabels:
- key: <NODE-LABEL>
  value: <NODE-LABEL-VALUE>

The addition of a node label that is not available in the list of allowed node labels is restricted.

The node-specific settings are activated through the spec:nodes section of the OsDpl CR. The spec:nodes section contains the following subsections:

  • features - implements overrides for a limited subset of fields and is constructed similarly to spec:features

  • services - similarly to spec:services, enables you to override settings in general for the components running as DaemonSets

Example configuration:

spec:
  nodes:
    <NODE-LABEL>::<NODE-LABEL-VALUE>:
      features:
        # Detailed information about features might be found at
        # openstack_controller/admission/validators/nodes/schema.yaml
      services:
        <service>:
          <chart>:
            <chart_daemonset_name>:
              values:
                # Any value from specific helm chart
OpenStackDeploymentSecret custom resource

Available since MOSK 22.3

The resource of kind OpenStackDeploymentSecret (OsDplSecret) is a custom resource that is intended to aggregate cloud’s confidential settings such as SSL/TLS certificates, external systems access credentials, and other secrets.

To obtain detailed information about the schema of an OsDplSecret custom resource, run:

kubectl get crd openstackdeploymentsecret.lcm.mirantis.com -oyaml
Usage

The resource has a structure similar to the OpenStackDeployment custom resource and enables the user to set a limited subset of fields that contain sensitive data.

Important

If you are migrating the related fields from the OpenStackDeployment custom resource, refer to Migrating secrets from OpenStackDeployment to OpenStackDeploymentSecret CR.

Example of an OpenStackDeploymentSecret custom resource of minimum configuration:

apiVersion: lcm.mirantis.com/v1alpha1
kind: OpenStackDeploymentSecret
metadata:
  name: osh-dev
  namespace: openstack
spec:
  features:
    ssl:
      public_endpoints:
        ca_cert: |-
          -----BEGIN CERTIFICATE-----
          ...
          -----END CERTIFICATE-----
        api_cert: |-
          -----BEGIN CERTIFICATE-----
          ...
          -----END CERTIFICATE-----
        api_key: |-
          -----BEGIN RSA PRIVATE KEY-----
          ...
          -----END RSA PRIVATE KEY-----
    barbican:
      backends:
        vault:
          approle_role_id: f6f0f775-...-cc00a1b7d0c3
          approle_secret_id: 2b5c4b87-...-9bfc6d796f8c
Public endpoints certificates
features:ssl

Contains the content of SSL/TLS certificates (server, key, and CA bundle) used to enable secure communication to public OpenStack API services.

These certificates must be issued to the DNS domain specified in the public_domain_name field.

Vault back end for Secrets service (OpenStack Barbican)
features:barbican:backends:vault

Specifies the object containing parameters used to connect to a Hashicorp Vault instance. The list of supported configurations includes:

  • approle_role_id – Vault app role ID

  • approle_secret_id – Secret ID created for the app role

OpenStackDeploymentStatus custom resource

The resource of kind OpenStackDeploymentStatus (OsDplSt) is a custom resource that describes the status of an OpenStack deployment.

OpenStackDeploymentStatus overview

To obtain detailed information about the schema of an OpenStackDeploymentStatus (OsDplSt) custom resource, run:

kubectl get crd openstackdeploymentstatus.lcm.mirantis.com -oyaml

To obtain the status definition for a particular OpenStack deployment, run:

kubectl -n openstack get osdplst -oyaml

Example of an OsDplSt CR:

kind: OpenStackDeploymentStatus
metadata:
  name: osh-dev
  namespace: openstack
spec: {}
status:
  handle:
    lastStatus: update
  health:
    barbican:
      api:
        generation: 2
        status: Ready
    cinder:
      api:
        generation: 2
        status: Ready
      backup:
        generation: 1
        status: Ready
      scheduler:
        generation: 1
        status: Ready
      volume:
        generation: 1
        status: Ready
  osdpl:
    cause: update
    changes: '((''add'', (''status'',), None, {''watched'': {''ceph'': {''secret'':
      {''hash'': ''0fc01c5e2593bc6569562b451b28e300517ec670809f72016ff29b8cbaf3e729''}}}}),)'
    controller_version: 0.5.3.dev12
    fingerprint: a112a4a7d00c0b5b79e69a2c78c3b50b0caca76a15fe7d79a6ad1305b19ee5ec
    openstack_version: ussuri
    state: APPLIED
    timestamp: "2021-09-08 17:01:45.633143"
  services:
    baremetal:
      controller_version: 0.5.3.dev12
      fingerprint: a112a4a7d00c0b5b79e69a2c78c3b50b0caca76a15fe7d79a6ad1305b19ee5ec
      openstack_version: ussuri
      state: APPLIED
      timestamp: "2021-09-08 17:00:54.081353"
    block-storage:
      controller_version: 0.5.3.dev12
      fingerprint: a112a4a7d00c0b5b79e69a2c78c3b50b0caca76a15fe7d79a6ad1305b19ee5ec
      openstack_version: ussuri
      state: APPLIED
      timestamp: "2021-09-08 17:00:57.306669"
    compute:
      controller_version: 0.5.3.dev12
      fingerprint: a112a4a7d00c0b5b79e69a2c78c3b50b0caca76a15fe7d79a6ad1305b19ee5ec
      openstack_version: ussuri
      state: APPLIED
      timestamp: "2021-09-08 17:01:18.853068"
    coordination:
      controller_version: 0.5.3.dev12
      fingerprint: a112a4a7d00c0b5b79e69a2c78c3b50b0caca76a15fe7d79a6ad1305b19ee5ec
      openstack_version: ussuri
      state: APPLIED
      timestamp: "2021-09-08 17:01:00.593719"
    dashboard:
      controller_version: 0.5.3.dev12
      fingerprint: a112a4a7d00c0b5b79e69a2c78c3b50b0caca76a15fe7d79a6ad1305b19ee5ec
      openstack_version: ussuri
      state: APPLIED
      timestamp: "2021-09-08 17:00:57.652145"
    database:
      controller_version: 0.5.3.dev12
      fingerprint: a112a4a7d00c0b5b79e69a2c78c3b50b0caca76a15fe7d79a6ad1305b19ee5ec
      openstack_version: ussuri
      state: APPLIED
      timestamp: "2021-09-08 17:01:00.233777"
    dns:
      controller_version: 0.5.3.dev12
      fingerprint: a112a4a7d00c0b5b79e69a2c78c3b50b0caca76a15fe7d79a6ad1305b19ee5ec
      openstack_version: ussuri
      state: APPLIED
      timestamp: "2021-09-08 17:00:56.540886"
    identity:
      controller_version: 0.5.3.dev12
      fingerprint: a112a4a7d00c0b5b79e69a2c78c3b50b0caca76a15fe7d79a6ad1305b19ee5ec
      openstack_version: ussuri
      state: APPLIED
      timestamp: "2021-09-08 17:01:00.961175"
    image:
      controller_version: 0.5.3.dev12
      fingerprint: a112a4a7d00c0b5b79e69a2c78c3b50b0caca76a15fe7d79a6ad1305b19ee5ec
      openstack_version: ussuri
      state: APPLIED
      timestamp: "2021-09-08 17:00:58.976976"
    ingress:
      controller_version: 0.5.3.dev12
      fingerprint: a112a4a7d00c0b5b79e69a2c78c3b50b0caca76a15fe7d79a6ad1305b19ee5ec
      openstack_version: ussuri
      state: APPLIED
      timestamp: "2021-09-08 17:01:01.440757"
    key-manager:
      controller_version: 0.5.3.dev12
      fingerprint: a112a4a7d00c0b5b79e69a2c78c3b50b0caca76a15fe7d79a6ad1305b19ee5ec
      openstack_version: ussuri
      state: APPLIED
      timestamp: "2021-09-08 17:00:51.822997"
    load-balancer:
      controller_version: 0.5.3.dev12
      fingerprint: a112a4a7d00c0b5b79e69a2c78c3b50b0caca76a15fe7d79a6ad1305b19ee5ec
      openstack_version: ussuri
      state: APPLIED
      timestamp: "2021-09-08 17:01:02.462824"
    memcached:
      controller_version: 0.5.3.dev12
      fingerprint: a112a4a7d00c0b5b79e69a2c78c3b50b0caca76a15fe7d79a6ad1305b19ee5ec
      openstack_version: ussuri
      state: APPLIED
      timestamp: "2021-09-08 17:01:03.165045"
    messaging:
      controller_version: 0.5.3.dev12
      fingerprint: a112a4a7d00c0b5b79e69a2c78c3b50b0caca76a15fe7d79a6ad1305b19ee5ec
      openstack_version: ussuri
      state: APPLIED
      timestamp: "2021-09-08 17:00:58.637506"
    networking:
      controller_version: 0.5.3.dev12
      fingerprint: a112a4a7d00c0b5b79e69a2c78c3b50b0caca76a15fe7d79a6ad1305b19ee5ec
      openstack_version: ussuri
      state: APPLIED
      timestamp: "2021-09-08 17:01:35.553483"
    object-storage:
      controller_version: 0.5.3.dev12
      fingerprint: a112a4a7d00c0b5b79e69a2c78c3b50b0caca76a15fe7d79a6ad1305b19ee5ec
      openstack_version: ussuri
      state: APPLIED
      timestamp: "2021-09-08 17:01:01.828834"
    orchestration:
      controller_version: 0.5.3.dev12
      fingerprint: a112a4a7d00c0b5b79e69a2c78c3b50b0caca76a15fe7d79a6ad1305b19ee5ec
      openstack_version: ussuri
      state: APPLIED
      timestamp: "2021-09-08 17:01:02.846671"
    placement:
      controller_version: 0.5.3.dev12
      fingerprint: a112a4a7d00c0b5b79e69a2c78c3b50b0caca76a15fe7d79a6ad1305b19ee5ec
      openstack_version: ussuri
      state: APPLIED
      timestamp: "2021-09-08 17:00:58.039210"
    redis:
      controller_version: 0.5.3.dev12
      fingerprint: a112a4a7d00c0b5b79e69a2c78c3b50b0caca76a15fe7d79a6ad1305b19ee5ec
      openstack_version: ussuri
      state: APPLIED
      timestamp: "2021-09-08 17:00:36.562673"

For the detailed description of the OsDplSt main elements, see the sections below:

Health elements

The health subsection provides a brief summary of the health of the OpenStack services.

OsDpl elements

The osdpl subsection describes the overall status of the OpenStack deployment and consists of the following items:

cause

The cause that triggered the LCM action: update when OsDpl is updated, resume when the OpenStack Controller is restarted.

changes

A string representation of changes in the OpenStackDeployment object.

controller_version

The version of openstack-controller that handles the LCM action.

fingerprint

The SHA sum of the OpenStackDeployment object spec section.

openstack_version

The current OpenStack version specified in the osdpl object.

state

The current state of the LCM action. Possible values include:

  • APPLYING - not all operations are completed.

  • APPLIED - all operations are completed.

timestamp

The timestamp of the status:osdpl section update.

Services elements

The services subsection provides detailed information about the LCM operations performed on a specific service. This is a dictionary where keys are service names, for example, baremetal or compute, and values are dictionaries with the following items:

controller_version

The version of the openstack-controller that handles the LCM action on a specific service.

fingerprint

The SHA sum of the OpenStackDeployment object spec section used when performing the LCM operation on a specific service.

openstack_version

The OpenStack version specified in the osdpl object used when performing the LCM action on a specific service.

state

The current state of the LCM action performed on a service. Possible values include:

  • WAITING - waiting for dependencies.

  • APPLYING - not all operations are completed.

  • APPLIED - all operations are completed.

timestamp

The timestamp of the status:services:<SERVICE-NAME> section update.
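
To check the state of a particular LCM operation without reading the whole object, you can query the corresponding status field directly. The following commands are a minimal sketch that assumes the OpenStackDeploymentStatus object is named osh-dev, as in the example above:

kubectl -n openstack get osdplst osh-dev -o jsonpath='{.status.osdpl.state}'

kubectl -n openstack get osdplst osh-dev -o jsonpath='{.status.services.compute.state}'

Both commands return APPLIED after all corresponding operations are completed.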

OpenStack on Kubernetes architecture

OpenStack and auxiliary services run as containers in kind: Pod Kubernetes resources. All long-running services are governed by one of the ReplicationController-enabled Kubernetes resources: kind: Deployment, kind: StatefulSet, or kind: DaemonSet.

The placement of the services is mostly governed by the Kubernetes node labels. The labels affecting the OpenStack services include:

  • openstack-control-plane=enabled - the node hosting most of the OpenStack control plane services.

  • openstack-compute-node=enabled - the node serving as a hypervisor for Nova. The virtual machines with tenant workloads are created there.

  • openvswitch=enabled - the node hosting Neutron L2 agents and Open vSwitch pods that manage the L2 connectivity of the OpenStack networks.

  • openstack-gateway=enabled - the node hosting the Neutron L3, Metadata, and DHCP agents, as well as the Octavia Health Manager, Worker, and Housekeeping components.
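
For example, to dedicate a node to the roles above, the corresponding labels can be assigned with kubectl. The node name below is hypothetical, and in a MOSK cluster the labels are typically assigned through the Container Cloud machine configuration rather than manually:

kubectl label node kaas-node-01 openstack-compute-node=enabled openvswitch=enabled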

_images/os-k8s-pods-layout.png

Note

OpenStack is an infrastructure management platform. Mirantis OpenStack for Kubernetes (MOSK) uses Kubernetes mostly for orchestration and dependency isolation. As a result, multiple OpenStack services are running as privileged containers with host PIDs and Host Networking enabled. You must ensure that at least the user with the credentials used by Helm/Tiller (administrator) is capable of creating such Pods.

Infrastructure services

Service

Description

Storage

While the underlying Kubernetes cluster is configured to use Ceph CSI for providing persistent storage for container workloads, for some types of workloads such networked storage is suboptimal due to latency.

This is why the separate local-volume-provisioner CSI is deployed and configured as an additional storage class. Local Volume Provisioner is deployed as kind: DaemonSet.

Database

A single WSREP (Galera) cluster of MariaDB is deployed as the SQL database to be used by all OpenStack services. It uses the storage class provided by Local Volume Provisioner to store the actual database files. The service is deployed as kind: StatefulSet of a given size, which is no less than 3, on any openstack-control-plane node. For details, see OpenStack database architecture.

Messaging

RabbitMQ is used as a messaging bus between the components of the OpenStack services.

A separate instance of RabbitMQ is deployed for each OpenStack service that needs a messaging bus for intercommunication between its components.

An additional, separate RabbitMQ instance is deployed to serve as a notification message bus for OpenStack services to post their own notifications and listen to notifications from other services. StackLight also uses this message bus to collect notifications for monitoring purposes.

Each RabbitMQ instance is a single node and is deployed as kind: StatefulSet.

Caching

A single multi-instance Memcached service is deployed to be used by all OpenStack services that need caching, which are mostly HTTP API services.

Coordination

A separate instance of etcd is deployed to be used by Cinder, which requires Distributed Lock Management for coordination between its components.

Ingress

Deployed as kind: DaemonSet.

Image pre-caching

A special kind: DaemonSet is deployed and updated each time the kind: OpenStackDeployment resource is created or updated. Its purpose is to pre-cache container images on Kubernetes nodes, and thus, to minimize possible downtime when updating container images.

This is especially useful for containers used in kind: DaemonSet resources, as during the image update Kubernetes starts to pull the new image only after the container with the old image is shut down.

OpenStack services

Service

Description

Identity (Keystone)

Uses MySQL back end by default.

keystoneclient - a separate kind: Deployment with a pod that has the OpenStack CLI client as well as relevant plugins installed, and OpenStack admin credentials mounted. Can be used by an administrator to manually interact with OpenStack APIs from within a cluster.

Image (Glance)

Supported back end is RBD (Ceph is required).

Volume (Cinder)

Supported back end is RBD (Ceph is required).

Network (Neutron)

Supported back ends are Open vSwitch and Tungsten Fabric.

Placement

Compute (Nova)

Supported hypervisor is QEMU/KVM through the libvirt library.

Dashboard (Horizon)

DNS (Designate)

Supported back end is PowerDNS.

Load Balancer (Octavia)

RADOS Gateway Object Storage (SWIFT)

Provides the object storage and a RADOS Gateway Swift API that is compatible with the OpenStack Swift API. You can manually enable the service in the OpenStackDeployment CR as described in Deploy an OpenStack cluster.

Instance HA (Masakari)

An OpenStack service that ensures high availability of instances running on a host. You can manually enable Masakari in the OpenStackDeployment CR as described in Deploy an OpenStack cluster.

Orchestration (Heat)

Key Manager (Barbican)

The supported back ends include:

  • The built-in Simple Crypto, which is used by default

  • Vault

    Vault by HashiCorp is a third-party system and is not installed by MOSK. Hence, the Vault storage back end should be available elsewhere in the user environment and accessible from the MOSK deployment.

    If the Vault back end is used, you can configure Vault in the OpenStackDeployment CR as described in Deploy an OpenStack cluster.

Tempest

Runs tests against a deployed OpenStack cloud. You can manually enable Tempest in the OpenStackDeployment CR as described in Deploy an OpenStack cluster.

Telemetry

Telemetry services include alarming (aodh), event storage (Panko), metering (Ceilometer), and metric (Gnocchi). All services should be enabled together through the list of services to be deployed in the OpenStackDeployment CR as described in Deploy an OpenStack cluster.

OpenStack database architecture

A complete setup of a MariaDB Galera cluster for OpenStack is illustrated in the following image:

_images/os-k8s-mariadb-galera.png

MariaDB server pods are running a Galera multi-master cluster. Client requests are forwarded by the Kubernetes mariadb service to the mariadb-server pod that has the primary label. Other pods from the mariadb-server StatefulSet have the backup label. Labels are managed by the mariadb-controller pod.

The MariaDB Controller periodically checks the readiness of the mariadb-server pods and sets the primary label on a pod if the following requirements are met:

  • The primary label has not already been set on the pod.

  • The pod is in the ready state.

  • The pod is not being terminated.

  • The pod name has the lowest integer suffix among other ready pods in the StatefulSet. For example, between mariadb-server-1 and mariadb-server-2, the pod with the mariadb-server-1 name is preferred.

Otherwise, the MariaDB Controller sets the backup label. This means that all SQL requests are passed to only one node, while the other two nodes are in the backup state and replicate the state from the primary node. The MariaDB clients connect to the mariadb service.
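
To see which mariadb-server pod currently holds the primary label and which ones hold the backup label, list the pods together with their labels, for example:

kubectl -n openstack get pods --show-labels | grep mariadb-server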

OpenStack and Ceph controllers integration

The integration between Ceph and OpenStack controllers is implemented through the shared Kubernetes openstack-ceph-shared namespace. Both controllers have access to this namespace to read and write the Kubernetes kind: Secret objects.

_images/osctl-ceph-integration.png

As Ceph is the required and only supported back end for several OpenStack services, all necessary Ceph pools must be specified in the configuration of the kind: MiraCeph custom resource as part of the deployment. Once the Ceph cluster is deployed, the Ceph Controller posts the information required by the OpenStack services to be properly configured as a kind: Secret object into the openstack-ceph-shared namespace. The OpenStack Controller watches this namespace. Once the corresponding secret is created, the OpenStack Controller transforms this secret into the data structures expected by the OpenStack-Helm charts. Even if an OpenStack installation is triggered at the same time as a Ceph cluster deployment, the OpenStack Controller halts the deployment of the OpenStack services that depend on Ceph availability until the secret in the shared namespace is created by the Ceph Controller.

For the configuration of Ceph RADOS Gateway as an OpenStack Object Storage, the reverse process takes place. The OpenStack Controller waits for the OpenStack-Helm to create a secret with OpenStack Identity (Keystone) credentials that RADOS Gateway must use to validate the OpenStack Identity tokens, and posts it back to the same openstack-ceph-shared namespace in the format suitable for consumption by the Ceph Controller. The Ceph Controller then reads this secret and reconfigures RADOS Gateway accordingly.
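
To verify that the controllers have exchanged the data, you can inspect the shared namespace. The exact secret names are managed by the controllers and may vary between releases, so the command below only illustrates the idea:

kubectl -n openstack-ceph-shared get secrets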

OpenStack and StackLight integration

StackLight integration with OpenStack includes automatic discovery of RabbitMQ credentials for notifications and OpenStack credentials for OpenStack API metrics. For details, see the openstack.rabbitmq.credentialsConfig and openstack.telegraf.credentialsConfig parameters description in StackLight configuration parameters.

OpenStack and Tungsten Fabric integration

The integration between OpenStack and Tungsten Fabric (TF) takes place on two levels: the controllers integration and the services integration, described below.


Controllers integration

The integration between the OpenStack and TF controllers is implemented through the shared Kubernetes openstack-tf-shared namespace. Both controllers have access to this namespace to read and write the Kubernetes kind: Secret objects.

The OpenStack Controller posts the data required by the TF services into the openstack-tf-shared namespace. The TF Controller watches this namespace. Once an appropriate secret is created, the TF Controller reads it into its internal data structures for further processing.

The OpenStack Controller includes the following data for the TF Controller:

  • tunnel_interface

    Name of the network interface for the TF data plane. This interface is used by TF for the encapsulated traffic for overlay networks.

  • Keystone authorization information

    Keystone Administrator credentials and an up-and-running IAM service are required for the TF Controller to initiate the deployment process.

  • Nova metadata information

    Required for the TF vRouter agent service.

Also, the OpenStack Controller watches the openstack-tf-shared namespace for the vrouter_port parameter that defines the vRouter port number and passes it to the nova-compute pod.

Services integration

The list of the OpenStack services that are integrated with TF through their API includes:

  • neutron-server - integration is provided by the contrail-neutron-plugin component that is used by the neutron-server service for transformation of the API calls to the TF API compatible requests.

  • nova-compute - integration is provided by the contrail-nova-vif-driver and contrail-vrouter-api packages used by the nova-compute service for interaction with the TF vRouter when managing network ports.

  • octavia-api - integration is provided by the Octavia TF Driver that enables you to use OpenStack CLI and Horizon for operations with load balancers. See Tungsten Fabric load balancing for details.

Warning

TF is not integrated with the following OpenStack services:

  • DNS service (Designate)

  • Key management (Barbican)

Services

The section explains the specifics of the services provided by Mirantis OpenStack for Kubernetes (MOSK). The list of the services and their supported features in this section is not exhaustive and is constantly amended based on the complexity of the architecture and the use of a particular service.

Compute service

Mirantis OpenStack for Kubernetes (MOSK) provides instances management capability through the OpenStack Compute service, or Nova. Nova interacts with other OpenStack components of an OpenStack environment to provide life-cycle management of the virtual machine instances.

vCPU type

Available since MOSK 22.1

host-model is the default CPU model configured for all instances managed by the OpenStack Compute service (Nova), the same as in Nova for the KVM or QEMU hypervisor.

To configure the type of vCPU that Nova will create instances with, use the spec:features:nova:vcpu_type definition in the OpenStackDeployment custom resource.

Supported CPU models

The supported CPU models include:

  • host-model (default) - mimics the host CPU and provides for decent performance, good security, and moderate compatibility with live migrations.

    With this mode, libvirt finds an available predefined CPU model that best matches the host CPU, and then explicitly adds the missing CPU feature flags to closely match the host CPU features. To mitigate known security flaws, libvirt automatically adds critical CPU flags, supported by installed libvirt, QEMU, kernel, and CPU microcode versions.

    This is a safe choice if your OpenStack compute node CPUs are of the same generation. If your OpenStack compute node CPUs are sufficiently different, for example, span multiple CPU generations, Mirantis strongly recommends setting explicit CPU models supported by all of your OpenStack compute node CPUs or organizing your OpenStack compute nodes into host aggregates and availability zones that have largely identical CPUs.

    Note

    The host-model model does not guarantee two-way live migrations between nodes.

    When migrating instances, the libvirt domain XML is first copied as is to the destination OpenStack compute node. Once the instance is hard rebooted or shut down and started again, the domain XML is re-generated. If the versions of libvirt, kernel, CPU microcode, or BIOS firmware differ from those on the source compute node where the instance was originally started, libvirt may pick up additional CPU feature flags, making it impossible to live-migrate the instance back to the original compute node.

  • host-passthrough - provides maximum performance, especially when nested virtualization is required or if live migration support is not a concern for workloads. Live migration requires exactly the same CPU on all OpenStack compute nodes, including the CPU microcode and kernel versions. Therefore, for live migrations support, organize your compute nodes into host aggregates and availability zones. For workload migration between non-identical OpenStack compute nodes, contact Mirantis support.

  • A comma-separated list of exact QEMU CPU models to create and emulate. Specify the common and less advanced CPU models first. All explicit CPU models provided must be compatible with the OpenStack compute node CPUs.

    To specify an exact CPU model, review the available CPU models and their features. List and inspect the /usr/share/libvirt/cpu_map/*.xml files in the libvirt containers of the pods of the libvirt DaemonSet, or of multiple DaemonSets if you use node-specific settings.
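
For example, to list the available CPU model definitions, you can execute a command in one of the libvirt pods. The label selector below is an assumption for illustration and may differ in your deployment:

LIBVIRT_POD=$(kubectl -n openstack get pods -l application=libvirt -o name | head -n 1)

kubectl -n openstack exec -it $LIBVIRT_POD -- ls /usr/share/libvirt/cpu_map/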

Configuration examples

For example, to set the host-passthrough CPU model for all OpenStack compute nodes:

spec:
  features:
    nova:
      vcpu_type: host-passthrough

For nodes that are labeled with processor=amd-epyc, set a custom EPYC CPU model:

spec:
  nodes:
    processor::amd-epyc:
      features:
        nova:
          vcpu_type: EPYC
Networking service

Mirantis OpenStack for Kubernetes (MOSK) Networking service, represented by the OpenStack Neutron service, provides workloads with Connectivity-as-a-Service enabling instances to communicate with each other and the outside world.

The API provided by the service abstracts all the nuances of implementing a virtual network infrastructure on top of your own physical network infrastructure. The service allows cloud users to create advanced virtual network topologies that may include load balancing, virtual private networking, traffic filtering, and other services.

MOSK Networking service supports Open vSwitch and Tungsten Fabric SDN technologies as back ends.

MOSK offers Neutron as a part of its core setup. You can configure the service through the spec:features:neutron section of the OpenStackDeployment custom resource. See features for details.

Networking service known limitations
DVR incompatibility with ARP announcements and VRRP

Due to the known issue #1774459 in the upstream implementation, Mirantis does not recommend using Distributed Virtual Routing (DVR) routers in the same networks as load balancers or other applications that utilize the Virtual Router Redundancy Protocol (VRRP) such as Keepalived. The issue prevents the DVR functionality from working correctly with network protocols that rely on the Address Resolution Protocol (ARP) announcements such as VRRP.

The issue occurs when updating permanent ARP entries for allowed_address_pair IP addresses in DVR routers since DVR performs the ARP table update through the control plane and does not allow any ARP entry to leave the node to prevent the router IP/MAC from contaminating the network.

This results in various network failover mechanisms not functioning in virtual networks that have a distributed virtual router plugged in. For instance, the default back end for MOSK Load Balancing service, represented by OpenStack Octavia with the OpenStack Amphora back end when deployed in the HA mode in a DVR-connected network, is not able to redirect the traffic from a failed active service instance to a standby one without interruption.

DNS service

Mirantis OpenStack for Kubernetes (MOSK) provides DNS records managing capability through the OpenStack DNS service, or Designate.

LoadBalancer type for PowerDNS

Available since MOSK 22.2

The supported back end for Designate is PowerDNS. If required, you can specify an external IP address and the protocol (UDP, TCP, or TCP + UDP) for the Kubernetes LoadBalancer service that exposes PowerDNS.

To configure LoadBalancer for PowerDNS, use the spec:features:designate definition in the OpenStackDeployment custom resource.

The list of supported options includes:

  • external_ip - Optional. An IP address for the LoadBalancer service. If not defined, LoadBalancer allocates the IP address.

  • protocol - A protocol for the Designate back end in Kubernetes. Can only be udp, tcp, or tcp+udp.

  • type - The type of the back end for Designate. Can only be powerdns.

For example:

spec:
  features:
    designate:
      backend:
        external_ip: 10.172.1.101
        protocol: udp
        type: powerdns
DNS service known limitations
Inability to set up a secondary DNS zone

Due to an issue in the dnspython library, Asynchronous Transfer Full Range (AXFR) requests do not work, which makes it impossible to set up a secondary DNS zone. The issue affects OpenStack Victoria and will be fixed in the Yoga release.

Instance HA service

The Instance High Availability service, or Masakari, is an OpenStack project designed to ensure high availability of instances and compute processes running on hosts.

The service consists of the following microservices:

  • API receives requests from users and events from monitors, and sends them to the engine

  • Engine executes the recovery workflow

  • Monitors detect failures and notify the API. MOSK uses monitors of the following types:

    • Instance monitor performs liveness checks of instance processes

    • Host monitor performs liveness checks of a compute host and runs as part of the Node controller of the OpenStack Controller

    Note

    The Processes monitor is not present in MOSK because HA for the compute processes is handled by Kubernetes.
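
To enable the service, add it to the list of services to deploy in the OpenStackDeployment custom resource. The snippet below is a minimal sketch that assumes the service is exposed under the instance-ha name; refer to Deploy an OpenStack cluster for the exact procedure:

spec:
  features:
    services:
    - instance-ha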

Block Storage service
Volume encryption

TechPreview

The OpenStack Block Storage service (Cinder) supports volume encryption using a key stored in the OpenStack Key Manager service (Barbican). Such a configuration uses Linux Unified Key Setup (LUKS) to create an encrypted volume type and attach it to the OpenStack Compute (Nova) instances. Nova retrieves the asymmetric key from Barbican and stores it on the OpenStack compute node as a libvirt key to encrypt the volume locally or on the back end and only after that transfers it to Cinder.

Note

  • To create an encrypted volume under a non-admin user, the creator role must be assigned to the user.

  • When planning your cloud, consider that encryption may impact CPU performance.
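
With the feature enabled, an encrypted volume type can be created using the standard OpenStack CLI. The following is a hedged example with commonly used LUKS parameters; the type name and the parameter values are illustrative only:

openstack volume type create \
  --encryption-provider luks \
  --encryption-cipher aes-xts-plain64 \
  --encryption-key-size 256 \
  --encryption-control-location front-end \
  LUKS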

Object Storage service

RADOS Gateway (RGW) provides Object Storage (Swift) API for end users in MOSK deployments. For the API compatibility, refer to Ceph Documentation: Ceph Object Gateway Swift API. You can manually enable the service in the OpenStackDeployment CR as described in Deploy an OpenStack cluster.

Object storage server-side encryption

Available since MOSK 22.1 TechPreview

RADOS Gateway also provides Amazon S3 compatible API. For details, see Ceph Documentation: Ceph Object Gateway S3 API. Using integration with the OpenStack Key Manager service (Barbican), the objects uploaded through S3 API can be encrypted by RGW according to the AWS Documentation: Protecting data using server-side encryption with customer-provided encryption keys (SSE-C) specification.

Such a configuration uses an S3 client instead of Swift to upload server-side encrypted objects. Using server-side encryption, the data is sent over a secure HTTPS connection in an unencrypted form, and the Ceph Object Gateway stores that data in the Ceph cluster in an encrypted form.

Image service

Mirantis OpenStack for Kubernetes (MOSK) provides the image management capability through the OpenStack Image service, or Glance.

The Image service enables you to discover, register, and retrieve virtual machine images. Using the Glance API, you can query virtual machine image metadata and retrieve actual images.

MOSK deployment profiles include the Image service in the core set of services. You can configure the Image service through the spec:features definition in the OpenStackDeployment custom resource. See features for details.

Image signature verification

Available since MOSK 21.6 TechPreview

MOSK can automatically verify the cryptographic signatures associated with images to ensure the integrity of their data. A signed image has a few additional properties set in its metadata that include img_signature, img_signature_hash_method, img_signature_key_type, and img_signature_certificate_uuid. You can find more information about these properties and their values in the upstream OpenStack documentation.

MOSK performs image signature verification during the following operations:

  • A cloud user or a service creates an image in the store and starts to upload its data. If the signature metadata properties are set on the image, its content gets verified against the signature. The Image service accepts non-signed image uploads.

  • A cloud user spawns a new instance from an image. The Compute service ensures that the data it downloads from the image storage matches the image signature. If the signature is missing or does not match the data, the operation fails. Limitations apply, see Known limitations.

  • A cloud user boots an instance from a volume, or creates a new volume from an image. If the image is signed, the Block Storage service compares the downloaded image data against the signature. If there is a mismatch, the operation fails. The service will accept a non-signed image as a source for a volume. Limitations apply, see Known limitations.

Configuration example
spec:
  features:
    glance:
      signature:
        enabled: true
Signing pre-built images

Every MOSK cloud is pre-provisioned with a baseline set of images containing the most popular operating systems, such as Ubuntu, Fedora, and CirrOS.

In addition, a few services in MOSK rely on the creation of service instances to provide their functions, namely the Load Balancer service and the Bare Metal service, and require corresponding images to exist in the image store.

When image signature verification is enabled during the cloud deployment, all these images get automatically signed with a pre-generated self-signed certificate. Enabling the feature in an already existing cloud requires manual signing of all of the images stored in it. Consult the OpenStack documentation for an example of the image signing procedure.
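
As an illustration only, signing an image typically involves generating an RSA-PSS signature of the image data with a private key whose certificate is stored in the Key Manager service (Barbican), and then setting the signature properties on the image. The commands below are a hedged sketch based on the upstream OpenStack procedure; the file names, the image name, and the certificate UUID are placeholders:

# Sign the image data with a private key (RSA-PSS padding, SHA-256 digest)
openssl dgst -sha256 -sign signing_key.pem -sigopt rsa_padding_mode:pss \
  -out cirros.signature cirros.qcow2

# Set the signature metadata properties on the image;
# <CERT-UUID> is the Barbican reference of the signing certificate
openstack image set \
  --property img_signature="$(base64 -w 0 cirros.signature)" \
  --property img_signature_hash_method='SHA-256' \
  --property img_signature_key_type='RSA-PSS' \
  --property img_signature_certificate_uuid=<CERT-UUID> \
  cirros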

Supported storage back ends

The image signature verification is supported for LVM and local back ends for ephemeral storage.

The functionality is not compatible with Ceph-backed ephemeral storage combined with RAW-formatted images. The Ceph copy-on-write mechanism enables the user to create instance virtual disks without downloading the image to a compute node; the data is handled completely on the Ceph cluster side. This enables you to spin up instances almost momentarily but makes it impossible to verify the image data before creating an instance from it.

Known limitations
  • The Image service does not enforce the presence of a signature in the metadata when the user creates a new image. The service accepts non-signed image uploads.

  • The Image service does not verify the correctness of an image signature upon update of the image metadata.

  • MOSK does not validate whether the certificate used to sign an image is trusted; it only ensures the correctness of the signature itself. Cloud users are allowed to use self-signed certificates.

  • The Compute service does not verify image signature for Ceph back end when the RAW image format is used as described in Supported storage back ends.

  • The Compute service does not verify image signature if the image is already cached on the target compute node.

  • The Instance HA service may experience issues when auto-evacuating instances created from signed images if it does not have access to the corresponding secrets in the Key Manager service.

  • The Block Storage service does not perform image signature verification when a Ceph back end is used and the images are in the RAW format.

  • The Block Storage service does not enforce the presence of a signature on the images.

Telemetry services

The Telemetry services are part of OpenStack services available in Mirantis OpenStack for Kubernetes (MOSK). The Telemetry services monitor OpenStack components, collect and store the telemetry data from them, and perform responsive actions upon this data. See OpenStack on Kubernetes architecture for details about OpenStack services in MOSK.

OpenStack Ceilometer is a service that collects data from various OpenStack components. The service can also collect and process notifications from different OpenStack services. Ceilometer stores the data in the Gnocchi database. The service is specified as metering in the OpenStackDeployment custom resource (CR).

Gnocchi is an open-source time series database. One of the advantages of this database is the ability to pre-aggregate the telemetry data while storing it. Gnocchi is specified as metric in the OpenStackDeployment CR.

OpenStack Aodh is part of the Telemetry project. Aodh provides a service that creates alarms based on various metric values or specific events and triggers response actions. The service uses data collected and stored by Ceilometer and Gnocchi. Aodh is specified as alarming in the OpenStackDeployment CR.

OpenStack Panko is the service that stores the event data generated by other OpenStack services. The service provides the ability to browse and query the data. Panko is specified as event in the OpenStackDeployment CR.

Note

The OpenStack Panko service has been removed from the product since MOSK 22.2. See Deprecation Notes for details.

Enabling Telemetry services

The Telemetry feature in MOSK has a single mode, autoscaling, which provides settings for telemetry data collection and storage. The OpenStackDeployment CR must have this mode specified for the OpenStack Telemetry services to work correctly. The autoscaling mode has the following notable configurations:

  • Gnocchi stores cache and data using the Redis storage driver.

  • The metric service (Gnocchi) stores data for one hour with a resolution of 1 minute.

The Telemetry services are disabled by default in MOSK. You have to enable them in the openstackdeployment.yaml file (the OpenStackDeployment CR). The following code block provides an example of deploying the Telemetry services as part of MOSK:

kind: OpenStackDeployment
spec:
  features:
    services:
    - alarming
    - metering
    - metric
  telemetry:
    mode: autoscaling
Advanced configuration
Gnocchi

Gnocchi is not an OpenStack service, so the settings related to its functioning should be included in the spec:common:infra section of the OpenStackDeployment CR.

Ceilometer

Available since MOSK 22.1

The Ceilometer configuration files contain many list structures. Overriding list elements in YAML files is context-dependent and error-prone. Therefore, to override these configuration files, define the spec:services structure in the OpenStackDeployment CR. The spec:services structure provides the ability to pass a complete file as text rather than as a YAML data structure.

Overriding through the spec:services structure is possible for the following files:

  • pipeline.yaml

  • polling.yaml

  • meters.yaml

  • gnocchi_resources.yaml

  • event_pipeline.yaml

  • event_definitions.yaml
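
The structure below is a hypothetical sketch of such an override. The exact path under spec:services (service group, chart name, and values keys) is an assumption for illustration; verify it against the product documentation before use:

spec:
  services:
    metering:
      ceilometer:
        values:
          conf:
            polling: |
              sources:
                - name: pollsters
                  interval: 60
                  meters:
                    - cpu
                    - memory.usage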

Networking

Depending on the size of an OpenStack environment and the components that you use, you may want to have a single or multiple network interfaces, as well as run different types of traffic on a single or multiple VLANs.

This section provides the recommendations for planning the network configuration and optimizing the cloud performance.

Networking overview

Mirantis OpenStack for Kubernetes (MOSK) cluster networking is complex and defined by the security requirements and performance considerations. It is based on the Kubernetes cluster networking provided by Mirantis Container Cloud and expanded to facilitate the demands of the OpenStack virtualization platform.

A Container Cloud Kubernetes cluster provides a platform for MOSK and is considered a part of its control plane. All networks that serve Kubernetes and related traffic are considered control plane networks. The Kubernetes cluster networking is typically focused on connecting pods of different nodes as well as exposing the Kubernetes API and services running in pods into an external network.

The OpenStack networking connects virtual machines to each other and the outside world. Most of the OpenStack-related networks are considered a part of the data plane in an OpenStack cluster. Ceph networks are considered data plane networks for the purpose of this reference architecture.

When planning your OpenStack environment, consider the types of traffic that your workloads generate and design your network accordingly. If you anticipate that certain types of traffic, such as storage replication, will likely consume a significant amount of network bandwidth, you may want to move that traffic to a dedicated network interface to avoid performance degradation.

The following diagram provides a simplified overview of the underlay networking in a MOSK environment:

cluster-networking
Management cluster networking

This page summarizes the recommended networking architecture of a Mirantis Container Cloud management cluster for a Mirantis OpenStack for Kubernetes (MOSK) cluster.

We recommend deploying the management cluster with a dedicated interface for the provisioning (PXE) network. The separation of the provisioning network from the management network ensures additional security and resilience of the solution.

MOSK end users typically should have access to the Keycloak service in the management cluster for authentication to the Horizon web UI. Therefore, we recommend that you connect the management network of the management cluster to an external network through an IP router. The default route on the management cluster nodes must be configured with the default gateway in the management network.

If you deploy the multi-rack configuration, ensure that the provisioning network of the management cluster is connected to an IP router that connects it to the provisioning networks of all racks.

MOSK cluster networking

Mirantis OpenStack for Kubernetes (MOSK) clusters managed by Mirantis Container Cloud use the following networks to serve different types of traffic:

MOSK network types

Network role

Description

Provisioning (PXE) network

Facilitates the iPXE boot of all bare metal machines in a MOSK cluster and provisioning of the operating system to machines.

This network is only used during provisioning of the host. It must not be configured on an operational MOSK node.

Life-cycle management (LCM) and API network

Connects LCM agents on the hosts to the Container Cloud API provided by the regional or management cluster. Used for communication between kubelet and the Kubernetes API server inside a Kubernetes cluster. The MKE components use this network for communication inside a swarm cluster.

You can use more than one LCM network segment in a MOSK cluster. In this case, separated L2 segments and interconnected L3 subnets are still used to serve LCM and API traffic.

All IP subnets in the LCM networks must be connected to each other by IP routes. These routes must be configured on the hosts through L2 templates.

All IP subnets in the LCM network must be connected to the Kubernetes API endpoints of the management or regional cluster through an IP router.

You can manually select the VIP address for the Kubernetes API endpoint from the LCM subnet and specify it in the Cluster object configuration. Alternatively, you can allocate a dedicated IP range for a virtual IP of the API endpoint by adding a Subnet object with a special annotation. For details, see Create subnets.

Note

Due to current limitations of the API endpoint failover, only one of the LCM networks can contain the API endpoint. This network is called API/LCM throughout this documentation. It consists of a VLAN segment stretched between all Kubernetes manager nodes in the cluster and the IP subnet that provides IP addresses allocated to these nodes.

Kubernetes workloads network

Serves as an underlay network for traffic between pods in the managed cluster. Calico uses this network to build mesh interconnections between nodes in the cluster. This network should not be shared between clusters.

There might be more than one Kubernetes pods network in the cluster. In this case, they must be connected through an IP router.

The Kubernetes workloads network does not need external access.

Kubernetes external network

Serves for access to the OpenStack endpoints in a MOSK cluster. Due to the limitations of MetalLB in the layer2 mode, the network must contain a VLAN segment extended to all MOSK controller nodes.

A typical MOSK cluster only has one external network.

The external network must include at least two IP address ranges defined by separate Subnet objects in Container Cloud API:

  • MOSK services range Technology Preview

    Provides IP addresses for externally available load-balanced services, including OpenStack API endpoints. The IP addresses for MetalLB services are assigned from this range.

  • External range

    Provides IP addresses to be assigned to the network interfaces on the cluster nodes:

    • Before MOSK 22.2, on the OpenStack controller nodes

    • Since MOSK 22.2, on all nodes

    This is required for external traffic to return to the originating client. The default route on the MOSK nodes must be configured with the default gateway in the external network.

Storage access network

Serves for the storage access traffic from and to Ceph OSD services.

A MOSK cluster may have more than one VLAN segment and IP subnet in the storage access network. All IP subnets of this network in a single cluster must be connected by an IP router.

The storage access network does not require external access unless you want to directly expose Ceph to the clients outside of a MOSK cluster.

Note

Direct access to Ceph by clients outside of a MOSK cluster is technically possible but not supported by Mirantis. Use it at your own risk.

The IP addresses from subnets in this network are assigned to Ceph nodes. The Ceph OSD services bind to these addresses on their respective nodes.

This is a public network in Ceph terms. 1

Storage replication network

Serves for the storage replication traffic between Ceph OSD services.

A MOSK cluster may have more than one VLAN segment and IP subnet in this network as long as the subnets are connected by an IP router.

This network does not require external access.

The IP addresses from subnets in this network are assigned to Ceph nodes. The Ceph OSD services bind to these addresses on their respective nodes.

This is a cluster network in Ceph terms. 1

Out-of-Band (OOB) network

Connects Baseboard Management Controllers (BMCs) of the bare metal hosts. Must not be accessible from a MOSK cluster.

1

For more details about Ceph networks, see Ceph Network Configuration Reference.

The following diagram illustrates the networking schema of the Container Cloud deployment on bare metal with a MOSK cluster:

_images/network-multirack.png
Network types

This section describes network types for Layer 3 networks used for Kubernetes and Mirantis OpenStack for Kubernetes (MOSK) clusters along with requirements for each network type.

Note

Only IPv4 is currently supported by Container Cloud and IPAM for infrastructure networks. IPv6 is not supported and not used in Container Cloud and MOSK underlay infrastructure networks.

The following diagram provides an overview of the underlay networks in a MOSK environment:

_images/os-cluster-l3-networking.png
L3 networks for Kubernetes

A MOSK deployment typically requires the following types of networks:

  • Provisioning network

    Used for provisioning of bare metal servers.

  • Management network

    Used for management of the Container Cloud infrastructure and for communication between containers in Kubernetes.

  • LCM/API network

    Must be configured on the Kubernetes manager nodes of the cluster. Contains the Kubernetes API endpoint with the VRRP virtual IP address. Enables communication between the MKE cluster nodes.

  • LCM network

    Enables communication between the MKE cluster nodes. Multiple VLAN segments and IP subnets can be created for a multi-rack architecture. Each server must be connected to one of the LCM segments and have an IP from the corresponding subnet.

  • External network

    Used to expose the OpenStack, StackLight, and other services of the MOSK cluster.

  • Kubernetes workloads network

    Used for communication between containers in Kubernetes.

  • Storage access network (Ceph)

    Used for accessing the Ceph storage. In Ceph terms, this is a public network 0. We recommend placing it on a dedicated hardware interface.

  • Storage replication network (Ceph)

    Used for Ceph storage replication. In Ceph terms, this is a cluster network 0. To ensure low latency and fast access, place the network on a dedicated hardware interface.

0

For details about Ceph networks, see Ceph Network Configuration Reference.

L3 networks for MOSK

The MOSK deployment additionally requires the following networks.

L3 networks for MOSK

Service name

Network

Description

VLAN name

Networking

Provider networks

Typically, a routable network used to provide external access to OpenStack instances (a floating network). Can be used by OpenStack services such as Ironic, Manila, and others to connect their management resources.

pr-floating

Networking

Overlay networks (virtual networks)

The network used to provide isolated, secure tenant networks with the help of a tunneling mechanism (VLAN/GRE/VXLAN). If VXLAN or GRE encapsulation takes place, IP address assignment is required on interfaces at the node level.

neutron-tunnel

Compute

Live migration network

The network used by the OpenStack Compute service (Nova) to transfer data during live migration. Depending on the cloud needs, it can be placed on a dedicated physical network so that it does not affect other networks during live migration. IP address assignment is required on interfaces at the node level.

lm-vlan

The way the logical networks described above are mapped to physical networks and interfaces on nodes depends on the cloud size and configuration. We recommend placing the OpenStack networks on a dedicated physical interface (bond) that is not shared with the storage and Kubernetes management networks to minimize their influence on each other.

L3 networks requirements

The following tables describe networking requirements for a MOSK cluster, Container Cloud management and Ceph clusters.

Container Cloud management cluster networking requirements

Network type

Provisioning

Management

Suggested interface name

bm-pxe

lcm-nw

Minimum number of VLANs

1

1

Minimum number of IP subnets

3

2

Minimum recommended IP subnet size

  • 8 IP addresses (Container Cloud management cluster hosts)

  • 8 IP addresses (MetalLB for provisioning services)

  • 16 IP addresses (DHCP range for directly connected servers)

  • 8 IP addresses (Container Cloud management cluster hosts, API VIP)

  • 16 IP addresses (MetalLB for Container Cloud services)

External routing

Not required

Required, may use proxy server

Multiple segments/stretch segment

Stretch segment for management cluster due to MetalLB Layer 2 limitations 1

Stretch segment due to VRRP, MetalLB Layer 2 limitations

Internal routing

Routing to separate DHCP segments, if in use

  • Routing to API endpoints of managed clusters for LCM

  • Routing to MetalLB ranges of managed clusters for StackLight authentication

  • Default route from Container Cloud management cluster hosts

1

Multiple VLAN segments with IP subnets can be added to the cluster configuration for separate DHCP domains.

MOSK cluster networking requirements

Network type

Provisioning

LCM/API

LCM

External

Kubernetes workloads

Minimum number of VLANs

1 (optional)

1

1 (optional)

1

1

Suggested interface name

N/A

lcm-nw

lcm-nw

k8s-ext-v

k8s-pods 2

Minimum number of IP subnets

1 (optional)

1

1 (optional)

2

1

Minimum recommended IP subnet size

16 IPs (DHCP range)

  • 3 IPs for Kubernetes manager nodes

  • 1 IP for the API endpoint VIP

1 IP per MOSK node (Kubernetes worker)

  • 1 IP per cluster node

  • 16 IPs (MetalLB for StackLight, OpenStack services)

1 IP per cluster node

Stretch or multiple segments

Multiple

Stretch due to VRRP limitations

Multiple

Stretch connected to all MOSK controller nodes

Multiple

External routing

Not required

Not required

Not required

Required, default route

Not required

Internal routing

Routing to the provisioning network of the management cluster

  • Routing to the IP subnet of the Container Cloud management network

  • Routing to all LCM IP subnets of the same MOSK cluster, if in use

  • Routing to the IP subnet of the LCM/API network

  • Routing to all IP subnets of the LCM network, if in use

Routing to the IP subnet of the Container Cloud Management API

Routing to all IP subnets of Kubernetes workloads

2

The bridge interface with this name is mandatory if you need to separate Kubernetes workloads traffic. You can configure this bridge over the VLAN or directly over the bonded or single interface.

MOSK Ceph cluster networking requirements

Network type

Storage access

Storage replication

Minimum number of VLANs

1

1

Suggested interface name

stor-public 3

stor-cluster 3

Minimum number of IP subnets

1

1

Minimum recommended IP subnet size

1 IP per cluster node

1 IP per cluster node

Stretch or multiple segments

Multiple

Multiple

External routing

Not required

Not required

Internal routing

Routing to all IP subnets of the Storage access network

Routing to all IP subnets of the Storage replication network

Note

When selecting externally routable subnets, ensure that the subnet ranges do not overlap with the internal subnets ranges. Otherwise, internal resources of users will not be available from the MOSK cluster.

3

For details about Ceph networks, see Ceph Network Configuration Reference.

Multi-rack architecture

Available since MOSK 21.6 TechPreview

Mirantis OpenStack for Kubernetes (MOSK) enables you to deploy a cluster with a multi-rack architecture, where every data center cabinet (a rack) incorporates its own Layer 2 network infrastructure that does not extend beyond its top-of-rack switch. The architecture allows a MOSK cloud to integrate natively with the Layer 3-centric networking topologies seen in modern data centers, such as Spine-Leaf.

The architecture eliminates the need to stretch and manage VLANs across multiple physical locations in a single data center, or to establish VPN tunnels between the parts of a geographically distributed cloud.

The set of networks present in each rack depends on the type of the OpenStack networking service back end in use.

_images/multi-rack.png
Bare metal provisioning

The multi-rack architecture in Mirantis Container Cloud and MOSK requires additional configuration of the networking infrastructure. Every Layer 2 domain, or rack, needs to have a DHCP relay agent configured on its dedicated segment of the Common/PXE network (lcm-nw VLAN). The agent handles all Layer 2 DHCP requests coming from the bare metal servers located in the rack and forwards them as Layer 3 packets across the data center fabric to a Mirantis Container Cloud regional cluster.

_images/multi-rack-bm.png

You need to configure per-rack DHCP ranges by defining Subnet resources in Mirantis Container Cloud as described in Mirantis Container Cloud documentation: Configure multiple DHCP ranges using Subnet resources.

Based on the address of the DHCP agent that relays a request from a server, Mirantis Container Cloud will automatically allocate an IP address in the corresponding subnet.

For the network types other than Common/PXE, you need to define subnets using the Mirantis Container Cloud L2 templates. Every rack needs to have a dedicated set of L2 templates, each template representing a specific server role and configuration.

Multi-rack MOSK cluster with Tungsten Fabric

For MOSK clusters with the Tungsten Fabric back end, you need to place the servers running the cloud control plane components into a single rack. This limitation is caused by the Layer 2 VRRP protocol used by the Kubernetes load balancer mechanism (MetalLB) to ensure high availability of Mirantis Container Cloud and MOSK API.

Note

In future product versions, Mirantis will implement support for the Layer 3 (BGP) mode of the Kubernetes load balancing mechanism.

The diagram below will help you to plan the networking layout of a multi-rack MOSK cloud with Tungsten Fabric.

_images/multi-rack-tf.png

The table below provides a mapping between the racks and the network types participating in a multi-rack MOSK cluster with the Tungsten Fabric back end.

Networks and VLANs for a multi-rack MOSK cluster with TF

Network

VLAN name

Rack 1

Rack 2 and N

Common/PXE

lcm-nw

Yes

Yes

Management

lcm-nw

Yes

Yes

External (MetalLB)

k8s-ext-v

Yes

No

Kubernetes workloads

k8s-pods-v

Yes

Yes

Storage access (Ceph)

stor-frontend

Yes

Yes

Storage replication (Ceph)

stor-backend

Yes

Yes

Overlay

tenant-vlan

Yes

Yes

Live migration

lm-vlan

Yes

Yes

Physical networks layout

This section summarizes the requirements for the physical layout of underlay network and VLANs configuration for the multi-rack architecture of Mirantis OpenStack for Kubernetes (MOSK).

Physical networking of a Container Cloud management cluster

Due to the limitations of the virtual IP address for the Kubernetes API and of MetalLB load balancing in Container Cloud, the management cluster nodes must share VLAN segments in the provisioning and management networks.

In the multi-rack architecture, the management cluster nodes may be placed in a single rack or spread across three racks. In either case, the provisioning and management network VLANs must be stretched across the ToR switches of the racks.

The following diagram illustrates physical and L2 connections of the Container Cloud management cluster.

_images/os-cluster-mgmt-physical.png
Physical networking of a MOSK cluster
External network

Due to limitations of MetalLB load balancing, all MOSK cluster nodes must share the VLAN segment in the external network.

In the multi-rack architecture, the external network VLAN must be stretched to the ToR switches of all racks. All other VLANs may be configured per rack.

Kubernetes manager nodes

Due to the limitations of using a virtual IP address for the Kubernetes API, the Kubernetes manager nodes must share the VLAN segment in the API/LCM network.

In the multi-rack architecture, Kubernetes manager nodes may be spread across three racks. The API/LCM network VLAN must be stretched to the ToR switches of the racks. All other VLANs may be configured per rack.

The following diagram illustrates physical and L2 network connections of the Kubernetes manager nodes in a MOSK cluster.

Caution

Such a configuration does not apply to a compact control plane MOSK installation. See Create a MOSK cluster.

_images/os-cluster-k8s-mgr-physical.png
OpenStack controller nodes

The following diagram illustrates physical and L2 network connections of the control plane nodes in a MOSK cluster.

_images/os-cluster-control-physical.png
OpenStack compute nodes

All VLANs for OpenStack compute nodes may be configured per rack. No VLAN should be stretched across multiple racks.

The following diagram illustrates physical and L2 network connections of the compute nodes in a MOSK cluster.

_images/os-cluster-compute-physical.png
OpenStack storage nodes

All VLANs for OpenStack storage nodes may be configured per rack. No VLAN should be stretched across multiple racks.

The following diagram illustrates physical and L2 network connections of the storage nodes in a MOSK cluster.

_images/os-cluster-storage-physical.png
Performance optimization

The following recommendations apply to all types of nodes in the Mirantis OpenStack for Kubernetes (MOSK) clusters.

Jumbo frames

To improve the goodput, we recommend that you enable jumbo frames where possible. Jumbo frames have to be enabled on the whole path that the packets traverse. If one of the network components cannot handle jumbo frames, the network path uses the smallest MTU.
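
After enabling jumbo frames, you can verify from any node that the full path supports the configured MTU. The interface name and peer address below are placeholders:

# Check the MTU configured on the interface
ip link show dev bond0

# Verify that 9000-byte frames pass end to end without fragmentation
# (8972 bytes of ICMP payload + 28 bytes of headers = 9000)
ping -M do -s 8972 -c 3 <peer-ip>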

Bonding

To provide fault tolerance against a single NIC failure, we recommend using link aggregation, such as bonding. Link aggregation is useful for linear scaling of bandwidth, load balancing, and fault protection. Depending on the hardware equipment, different types of bonds might be supported. Use multi-chassis link aggregation as it provides fault tolerance at the device level, for example, MLAG on Arista equipment or vPC on Cisco equipment.

The Linux kernel supports the following bonding modes:

  • active-backup

  • balance-xor

  • 802.3ad (LACP)

  • balance-tlb

  • balance-alb

Since LACP is the IEEE standard 802.3ad supported by the majority of network platforms, we recommend using this bonding mode. Use the Link Aggregation Control Protocol (LACP) bonding mode with MC-LAG domains configured on ToR switches. This corresponds to the 802.3ad bond mode on hosts.

Additionally, follow these recommendations in regards to bond interfaces:

  • Use ports from different multi-port NICs when creating bonds. This makes network connections redundant if failure of a single NIC occurs.

  • Configure the ports that connect servers to the PXE network with the PXE VLAN as native or untagged. On these ports, configure LACP fallback to ensure that the servers can reach the DHCP server and boot over the network.
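
To verify that a bond negotiated LACP (802.3ad) correctly on a host, you can inspect the bonding state exposed by the kernel. The bond interface name below is a placeholder:

# Show the bonding mode, LACP state, and slave interfaces
cat /proc/net/bonding/bond0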

Spanning tree portfast mode

Configure the Spanning Tree Protocol (STP) settings on the network switch ports to ensure that the ports start forwarding packets as soon as the link comes up. This helps avoid iPXE timeout issues and ensures a reliable boot over the network.

Storage

A MOSK cluster uses Ceph as a distributed storage system for file, block, and object storage exposed by the Container Cloud baremetal management cluster. This section provides an overview of a Ceph cluster deployed by Container Cloud.

Ceph overview

Mirantis Container Cloud deploys Ceph on the baremetal-based management and managed clusters using Helm charts with the following components:

Rook Ceph Operator

A storage orchestrator that deploys Ceph on top of a Kubernetes cluster. Also known as Rook or Rook Operator. Rook operations include:

  • Deploying and managing a Ceph cluster based on provided Rook CRs such as CephCluster, CephBlockPool, CephObjectStore, and so on.

  • Orchestrating the state of the Ceph cluster and all its daemons.

KaaSCephCluster custom resource (CR)

Represents the customization of a Kubernetes installation and allows you to define the required Ceph configuration through the Container Cloud web UI before deployment. For example, you can define the failure domain, Ceph pools, Ceph node roles, number of Ceph components such as Ceph OSDs, and so on. The ceph-kcc-controller controller on the Container Cloud management cluster manages the KaaSCephCluster CR.

Ceph Controller

A Kubernetes controller that obtains the parameters from Container Cloud through a CR, creates CRs for Rook and updates its CR status based on the Ceph cluster deployment progress. It creates users, pools, and keys for OpenStack and Kubernetes and provides Ceph configurations and keys to access them. Also, Ceph Controller eventually obtains the data from the OpenStack Controller for the Keystone integration and updates the RADOS Gateway services configurations to use Kubernetes for user authentication. Ceph Controller operations include:

  • Transforming user parameters from the Container Cloud Ceph CR into Rook CRs and deploying a Ceph cluster using Rook.

  • Providing integration of the Ceph cluster with Kubernetes.

  • Providing data for OpenStack to integrate with the deployed Ceph cluster.

Ceph Status Controller

A Kubernetes controller that collects all valuable parameters from the current Ceph cluster, its daemons, and entities and exposes them into the KaaSCephCluster status. Ceph Status Controller operations include:

  • Collecting all statuses from a Ceph cluster and corresponding Rook CRs.

  • Collecting additional information on the health of Ceph daemons.

  • Providing information to the status section of the KaaSCephCluster CR.
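
For example, assuming the resource is exposed under the kaascephcluster name, a cloud operator can inspect the collected status from the management cluster as follows, where the project and object names are placeholders:

kubectl -n <managed-cluster-project-name> get kaascephcluster <kaascephcluster-name> -o yaml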

Ceph Request Controller

A Kubernetes controller that obtains the parameters from Container Cloud through a CR and handles Ceph OSD lifecycle management (LCM) operations. It allows for a safe Ceph OSD removal from the Ceph cluster. Ceph Request Controller operations include:

  • Providing an ability to perform Ceph OSD LCM operations.

  • Obtaining specific CRs to remove Ceph OSDs and executing them.

  • Pausing the regular Ceph Controller reconciliation until all requests are completed.

A typical Ceph cluster consists of the following components:

  • Ceph Monitors - three or, in rare cases, five Ceph Monitors.

  • Ceph Managers - one Ceph Manager in a regular cluster.

  • RADOS Gateway services - Mirantis recommends having three or more RADOS Gateway instances for HA.

  • Ceph OSDs - the number of Ceph OSDs may vary according to the deployment needs.

    Warning

    • A Ceph cluster with 3 Ceph nodes does not provide hardware fault tolerance and is not eligible for recovery operations, such as a disk or an entire Ceph node replacement.

    • A Ceph cluster uses a replication factor of 3. If the number of Ceph OSDs is less than 3, the Ceph cluster moves to the degraded state with write operations restricted until the number of alive Ceph OSDs equals the replication factor again.

The placement of Ceph Monitors and Ceph Managers is defined in the KaaSCephCluster CR.

The following diagram illustrates the way a Ceph cluster is deployed in Container Cloud:

_images/ceph-deployment.png

The following diagram illustrates the processes within a deployed Ceph cluster:

_images/ceph-data-flow.png
Ceph limitations

A Ceph cluster configuration in Mirantis Container Cloud is subject to the following limitations, among others:

  • Only one Ceph Controller per management, regional, or managed cluster and only one Ceph cluster per Ceph Controller are supported.

  • The replication size for any Ceph pool must be set to more than 1.

  • Only one CRUSH tree per cluster. The separation of devices per Ceph pool is supported through device classes with only one pool of each type for a device class.

  • All CRUSH rules must have the same failure_domain.

  • Only the following types of CRUSH buckets are supported:

    • topology.kubernetes.io/region

    • topology.kubernetes.io/zone

    • topology.rook.io/datacenter

    • topology.rook.io/room

    • topology.rook.io/pod

    • topology.rook.io/pdu

    • topology.rook.io/row

    • topology.rook.io/rack

    • topology.rook.io/chassis

  • RBD mirroring is not supported.

  • Consuming an existing Ceph cluster is not supported.

  • CephFS is not supported.

  • Only IPv4 is supported.

  • If two or more Ceph OSDs are located on the same device, there must be no dedicated WAL or DB for this class.

  • Only a full collocation or dedicated WAL and DB configurations are supported.

  • The minimum size of any defined Ceph OSD device is 5 GB.

  • Reducing the number of Ceph Monitors is not supported and causes the removal of Ceph Monitor daemons from random nodes.

  • When adding a Ceph node with the Ceph Monitor role, if any issues occur with the Ceph Monitor, rook-ceph removes it and adds a new Ceph Monitor instead, named using the next alphabetic character in order. Therefore, the Ceph Monitor names may not follow the alphabetical order. For example, a, b, d, instead of a, b, c.

StackLight

StackLight is the logging, monitoring, and alerting solution that provides a single pane of glass for cloud maintenance and day-to-day operations as well as offers critical insights into cloud health including operational information about the components deployed with Mirantis OpenStack for Kubernetes (MOSK). StackLight is based on Prometheus, an open-source monitoring solution and a time series database, and OpenSearch, the logs and notifications storage.

Deployment architecture

Mirantis OpenStack for Kubernetes (MOSK) deploys the StackLight stack as a release of a Helm chart that contains the helm-controller and HelmBundle custom resources. The StackLight HelmBundle consists of a set of Helm charts describing the StackLight components. Apart from the OpenStack-specific components below, StackLight also includes the components described in Mirantis Container Cloud Reference Architecture: Deployment architecture. By default, StackLight logging stack is disabled.

During the StackLight configuration when deploying a MOSK cluster, you can define the HA or non-HA StackLight architecture type. For details, see Mirantis Container Cloud Reference Architecture: StackLight database modes.

OpenStack-specific StackLight components overview

StackLight component

Description

Prometheus native exporters and endpoints

Export the existing metrics as Prometheus metrics and include:

  • libvirt-exporter

  • memcached-exporter

  • mysql-exporter

  • rabbitmq-exporter

  • tungstenfabric-exporter

Telegraf OpenStack plugin

Collects and processes the OpenStack metrics.

Monitored components

StackLight measures, analyzes, and reports in a timely manner on failures that may occur in the following Mirantis OpenStack for Kubernetes (MOSK) components and their sub-components. Apart from the components below, StackLight also monitors the components listed in Mirantis Container Cloud Reference Architecture: Monitored components.

  • Libvirt

  • Memcached

  • MariaDB

  • NTP

  • OpenStack (Barbican, Cinder, Designate, Glance, Heat, Horizon, Ironic, Keystone, Neutron, Nova, Octavia)

  • OpenStack SSL certificates

  • Open vSwitch

  • RabbitMQ

  • Tungsten Fabric (Cassandra, Kafka, Redis, ZooKeeper)

OpenSearch and Prometheus storage sizing

Caution

Calculations in this document are based on numbers from a real-scale test cluster with 34 nodes. The exact space required for metrics and logs must be calculated depending on the ongoing cluster operations. Some operations force the generation of additional metrics and logs. The values below are approximate. Use them only as recommendations.

During the deployment of a new cluster, you must specify the OpenSearch retention time and Persistent Volume Claim (PVC) size as well as the Prometheus PVC size, retention time, and retention size. When configuring an existing cluster, you can only set the OpenSearch retention time and the Prometheus retention time and retention size.

The following table describes the recommendations for both OpenSearch and Prometheus retention size and PVC size for a cluster with 34 nodes. Retention time depends on the space allocated for the data. To calculate the required retention time, use the {retention time} = {retention size} / {amount of data per day} formula.
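
For example, using the approximate numbers from the table below for a 34-node cluster, an OpenSearch retention size of 1000 GB with ~250 GB of logs per day (StackLight in non-HA mode) results in a retention time of roughly 1000 / 250 = 4 days, and a Prometheus retention size of 55 GB with ~11 GB per day results in roughly 5 days. These figures are illustrative only.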

Service

Required space per day

Description

OpenSearch

StackLight in non-HA mode:
  • 202 - 253 GB for the entire cluster

  • ~6 - 7.5 GB for a single node

StackLight in HA mode:
  • 404 - 506 GB for the entire cluster

  • ~12 - 15 GB for a single node

When setting Persistent Volume Claim Size for OpenSearch during the cluster creation, take into account that it defines the PVC size for a single instance of the OpenSearch cluster. StackLight in HA mode has 3 OpenSearch instances. Therefore, for a total OpenSearch capacity, multiply the PVC size by 3.

Prometheus

  • 11 GB for the entire cluster

  • ~400 MB for a single node

Every Prometheus instance stores the entire database. Multiple replicas store multiple copies of the same data. Therefore, treat the Prometheus PVC size as the capacity of Prometheus in the cluster. Do not sum them up.

Prometheus has built-in retention mechanisms based on the database size and the duration of the time series stored in the database. Therefore, even if you miscalculate the PVC size, setting the retention size to ~1 GB less than the PVC size prevents the disk from overfilling.

Tungsten Fabric

Tungsten Fabric provides basic L2/L3 networking to an OpenStack environment running on the MKE cluster and includes the IP address management, security groups, floating IP addresses, and routing policies functionality. Tungsten Fabric is based on overlay networking, where all virtual machines are connected to a virtual network with encapsulation (MPLSoGRE, MPLSoUDP, VXLAN). This enables you to separate the underlay Kubernetes management network. A workload requires an external gateway, such as a hardware EdgeRouter or a simple gateway to route the outgoing traffic.

The Tungsten Fabric vRouter uses different gateways for the control and data planes.

Tungsten Fabric cluster

All services of Tungsten Fabric are delivered as separate containers, which are deployed by the Tungsten Fabric Operator (TFO). Each container has an INI-based configuration file that is available on the host system. The configuration file is generated automatically upon the container start and is based on environment variables provided by the TFO through Kubernetes ConfigMaps.

The main Tungsten Fabric containers run with the host network, without using the Kubernetes networking layer. The services listen directly on the host network interface.

The following diagram describes the minimum production installation of Tungsten Fabric with a Mirantis OpenStack for Kubernetes (MOSK) deployment.

_images/tf-architecture.png

For the details about the Tungsten Fabric services included in MOSK deployments and the types of traffic and traffic flow directions, see the subsections below.

Tungsten Fabric cluster components

This section describes the Tungsten Fabric services and their distribution across the Mirantis OpenStack for Kubernetes (MOSK) deployment.

The Tungsten Fabric services run mostly as DaemonSets in separate containers for each service. The deployment and update processes are managed by the Tungsten Fabric operator. However, Kubernetes manages the probe checks and restart of broken containers.
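
For example, to see which Tungsten Fabric service Pods run on which nodes, list the Pods in the tf namespace, which is the namespace used in the Tungsten Fabric examples throughout this document:

kubectl -n tf get pods -o wide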

Configuration and control services

All configuration and control services run on the Tungsten Fabric Controller nodes.

Service name

Service description

config-api

Exposes a REST-based interface for the Tungsten Fabric API.

config-nodemgr

Collects data of the Tungsten Fabric configuration processes and sends it to the Tungsten Fabric collector.

control

Communicates with the cluster gateways using BGP and with the vRouter agents using XMPP, as well as redistributes appropriate networking information.

control-nodemgr

Collects the Tungsten Fabric Controller process data and sends this information to the Tungsten Fabric collector.

device-manager

Manages physical networking devices using netconf or ovsdb. In multi-node deployments, it operates in the active-backup mode.

dns

Using the named service, provides the DNS service to the VMs spawned on different compute nodes. Each vRouter node connects to two Tungsten Fabric Controller containers that run the dns process.

named

The customized Berkeley Internet Name Domain (BIND) daemon of Tungsten Fabric that manages DNS zones for the dns service.

schema

Listens to configuration changes performed by a user and generates corresponding system configuration objects. In multi-node deployments, it works in the active-backup mode.

svc-monitor

Listens to configuration changes of service-template and service-instance, as well as spawns and monitors virtual machines for the firewall, analyzer services, and so on. In multi-node deployments, it works in the active-backup mode.

webui

Consists of the webserver and jobserver services. Provides the Tungsten Fabric web UI.

Analytics services

All analytics services run on Tungsten Fabric analytics nodes.

Service name

Service description

alarm-gen

Evaluates and manages the alarms rules.

analytics-api

Provides a REST API to interact with the Cassandra analytics database.

analytics-nodemgr

Collects all Tungsten Fabric analytics process data and sends this information to the Tungsten Fabric collector.

analytics-database-nodemgr

Provisions the init model if needed. Collects data of the database process and sends it to the Tungsten Fabric collector.

collector

Collects and analyzes data from all Tungsten Fabric services.

query-engine

Handles the queries to access data from the Cassandra database.

snmp-collector

Receives the authorization and configuration of the physical routers from the config-nodemgr service, polls the physical routers using the Simple Network Management Protocol (SNMP), and uploads the data to the Tungsten Fabric collector.

topology

Reads the SNMP information from the physical router user-visible entities (UVEs), creates a neighbor list, and writes the neighbor information to the physical router UVEs. The Tungsten Fabric web UI uses the neighbor list to display the physical topology.

vRouter

The Tungsten Fabric vRouter provides data forwarding to an OpenStack tenant instance and reports statistics to the Tungsten Fabric analytics service. The Tungsten Fabric vRouter is installed on all OpenStack compute nodes. Mirantis OpenStack for Kubernetes (MOSK) supports the kernel-based deployment of the Tungsten Fabric vRouter.

vRouter services on the OpenStack compute nodes

Service name

Service description

vrouter-agent

Connects to the Tungsten Fabric Controller container and the Tungsten Fabric DNS system using the Extensible Messaging and Presence Protocol (XMPP). The vRouter Agent acts as a local control plane. Each Tungsten Fabric vRouter Agent is connected to at least two Tungsten Fabric controllers in an active-active redundancy mode.

The Tungsten Fabric vRouter Agent is responsible for all networking-related functions including routing instances, routes, and others.

The Tungsten Fabric vRouter uses different gateways for the control and data planes. For example, the Linux system gateway is located on the management network, and the Tungsten Fabric gateway is located on the data plane network.

vrouter-nodemgr

Collects the supervisor vrouter data and sends it to the Tungsten Fabric collector.

The following diagram illustrates the Tungsten Fabric kernel vRouter set up by the TF operator:

_images/tf_vrouter.png

On the diagram above, the following types of network interfaces are used:

  • eth0 - for the management (PXE) network (eth1 and eth2 are the slave interfaces of Bond0)

  • Bond0.x - for the MKE control plane network

  • Bond0.y - for the MKE data plane network

Third-party services

Service name

Service description

cassandra

  • On the Tungsten Fabric control plane nodes, maintains the configuration data of the Tungsten Fabric cluster.

  • On the Tungsten Fabric analytics nodes, stores the collector service data.

cassandra-operator

The Kubernetes operator that enables the Cassandra clusters creation and management.

kafka

Handles the messaging bus and generates alarms across the Tungsten Fabric analytics containers.

kafka-operator

The Kubernetes operator that enables Kafka clusters creation and management.

redis

Stores the physical router UVE storage and serves as a messaging bus for event notifications.

redis-operator

The Kubernetes operator that enables Redis clusters creation and management.

zookeeper

Holds the active-backup status for the device-manager, svc-monitor, and schema-transformer services. This service is also used for mapping the Tungsten Fabric resource names to UUIDs.

zookeeper-operator

The Kubernetes operator that enables ZooKeeper clusters creation and management.

rabbitmq

Exchanges messages between API servers and original request senders.

rabbitmq-operator

The Kubernetes operator that enables RabbitMQ clusters creation and management.

Plugin services

All Tungsten Fabric plugin services are installed on the OpenStack controller nodes.

Service name

Service description

neutron-server

The Neutron server that includes the Tungsten Fabric plugin.

octavia-api

The Octavia API that includes the Tungsten Fabric Octavia driver.

heat-api

The Heat API that includes the Tungsten Fabric Heat resources and templates.

Image precaching DaemonSets

Available since MOSK 22.3

Along with the Tungsten Fabric services, MOSK deploys and updates special image precaching DaemonSets when a resource of kind TFOperator is created or image references in it get updated. These DaemonSets precache container images on Kubernetes nodes, minimizing possible downtime when updating container images. The cloud operator can disable image precaching through the TFOperator resource.

Tungsten Fabric traffic flow

This section describes the types of traffic and traffic flow directions in a Mirantis OpenStack for Kubernetes (MOSK) cluster.

User interface and API traffic

The following diagram illustrates all types of UI and API traffic in a Mirantis OpenStack for Kubernetes cluster, including the monitoring and OpenStack API traffic. The OpenStack Dashboard pod hosts Horizon and acts as a proxy for all other types of traffic. TLS termination is also performed for this type of traffic.

_images/tf-traffic_flow_ui_api.png
SDN traffic

SDN or Tungsten Fabric traffic goes through the overlay Data network and processes east-west and north-south traffic for applications that run in a MOSK cluster. This network segment typically contains tenant networks as separate MPLS-over-GRE and MPLS-over-UDP tunnels. The traffic load depends on the workload.

The control traffic between the Tungsten Fabric controllers, edge routers, and vRouters uses the XMPP with TLS and iBGP protocols. Both protocols produce low traffic that does not affect MPLS over GRE and MPLS over UDP traffic. However, this traffic is critical and must be reliably delivered. Mirantis recommends configuring higher QoS for this type of traffic.

The following diagram displays both MPLS over GRE/MPLS over UDP and iBGP and XMPP traffic examples in a MOSK cluster:

_images/tf-traffic_flow_sdn.png
Tungsten Fabric lifecycle management

Mirantis OpenStack for Kubernetes (MOSK) provides the Tungsten Fabric lifecycle management including pre-deployment custom configurations, updates, data backup and restoration, as well as handling partial failure scenarios, by means of the Tungsten Fabric operator.

This section is intended for the cloud operators who want to gain insight into the capabilities provided by the Tungsten Fabric operator along with the understanding of how its architecture allows for easy management while addressing the concerns of users of Tungsten Fabric-based MOSK clusters.

Tungsten Fabric operator

The Tungsten Fabric operator (TFO) is based on the Kubernetes operator SDK project. The Kubernetes operator SDK is a framework that uses the controller-runtime library to make writing operators easier by providing the following:

  • High-level APIs and abstractions to write the operational logic more intuitively.

  • Tools for scaffolding and code generation to bootstrap a new project fast.

  • Extensions to cover common operator use cases.

The TFO deploys the following sub-operators. Each sub-operator handles a separate part of a TF deployment:

TFO sub-operators

Sub-operator

Description

TFControl

Deploys the Tungsten Fabric control services, such as:

  • Control

  • DNS

  • Control NodeManager

TFConfig

Deploys the Tungsten Fabric configuration services, such as:

  • API

  • Service monitor

  • Schema transformer

  • Device manager

  • Configuration NodeManager

  • Database NodeManager

TFAnalytics

Deploys the Tungsten Fabric analytics services, such as:

  • API

  • Collector

  • Alarm

  • Alarm-gen

  • SNMP

  • Topology

  • Alarm NodeManager

  • Database NodeManager

  • SNMP NodeManager

TFVrouter

Deploys a vRouter on each compute node with the following services:

  • vRouter Agent

  • NodeManager

TFWebUI

Deploys the following web UI services:

  • Web server

  • Job server

TFTool

Deploys the following tools to verify the TF deployment status:

  • TF-status

  • TF-status aggregator

TFTest

An operator to run Tempest tests.

Besides the sub-operators that deploy TF services, TFO uses operators to deploy and maintain third-party services, such as different types of storage, cache, messaging systems, and so on. The following table describes all third-party operators:

TFO third-party sub-operators

Operator

Description

cassandra-operator

An upstream operator that automates the Cassandra HA storage operations for the configuration and analytics data.

zookeeper-operator

An upstream operator for deployment and automation of a ZooKeeper cluster.

kafka-operator

An operator for the Kafka cluster used by analytics services.

redis-operator

An upstream operator that automates the Redis cluster deployment and keeps it healthy.

rabbitmq-operator

An operator for the messaging system based on RabbitMQ.

The following diagram illustrates a simplified TFO workflow:

_images/tf-operator-workflow.png
TFOperator custom resource

The resource of kind TFOperator (TFO) is a custom resource (CR) defined by a resource of kind CustomResourceDefinition.

The CustomResourceDefinition resource in Kubernetes uses the OpenAPI Specification (OAS) version 2 to specify the schema of the defined resource. The Kubernetes API outright rejects the resources that do not pass this schema validation. Along with schema validation, starting from MOSK 21.6, TFOperator uses ValidatingAdmissionWebhook for extended validations when a CR is created or updated.

For the list of configuration options available to a cloud operator, refer to Tungsten Fabric configuration. Also, check out the Tungsten Fabric API Reference document of the MOSK version that your cluster has been deployed with.

TFOperator custom resource validation

Available since MOSK 21.6

Tungsten Fabric Operator uses ValidatingAdmissionWebhook to validate environment variables set for the Tungsten Fabric components upon the TFOperator object creation or update. The following validations are performed:

  • Environment variables passed to TF components containers

  • Mapping between tfVersion and tfImageTag, if defined

  • Schedule and data capacity format for tf-dbBackup

If required, you can disable ValidatingAdmissionWebhook through the TFOperator HelmBundle resource:

apiVersion: lcm.mirantis.com/v1alpha1
kind: HelmBundle
metadata:
  name: tungstenfabric-operator
  namespace: tf
spec:
  releases:
  - name: tungstenfabric-operator
    values:
      admission:
        enabled: false
Allowed environment variables for TF components

Environment variables

TF components and containers

  • INTROSPECT_LISTEN_ALL

  • LOG_DIR

  • LOG_LEVEL

  • LOG_LOCAL

  • tf-analytics (alarm-gen, api, collector, alarm-nodemgr, db-nodemgr, nodemgr, snmp-nodemgr, query-engine, snmp, topology)

  • tf-config (api, db-nodemgr, nodemgr)

  • tf-control (control, dns, nodemgr)

  • tf-vrouter (agent, dpdk-nodemgr, nodemgr)

  • LOG_DIR

  • LOG_LEVEL

  • LOG_LOCAL

tf-config (config, devicemgr, schema, svc-monitor)

  • PROVISION_DELAY

  • PROVISION_RETRIES

  • BGP_ASN

  • ENCAP_PRIORITY

  • VXLAN_VN_ID_MODE

  • tf-analytics (alarm-provisioner, db-provisioner, provisioner, snmp-provisioner)

  • tf-config (db-provisioner, provisioner)

  • tf-control (provisioner)

  • tf-vrouter (dpdk-provisioner, provisioner)

  • CONFIG_API_LIST_OPTIMIZATION_ENABLED

  • CONFIG_API_WORKER_COUNT

  • CONFIG_API_MAX_REQUESTS

  • FWAAS_ENABLE

  • RABBITMQ_HEARTBEAT_INTERVAL

  • DISABLE_VNC_API_STATS

tf-config (config)

  • DNS_NAMED_MAX_CACHE_SIZE

  • DNS_NAMED_MAX_RETRANSMISSIONS

  • DNS_RETRANSMISSION_INTERVAL

tf-control (dns)

  • WEBUI_LOG_LEVEL

  • WEBUI_STATIC_AUTH_PASSWORD

  • WEBUI_STATIC_AUTH_ROLE

  • WEBUI_STATIC_AUTH_USER

tf-webui (job, web)

  • ANALYTICS_CONFIG_AUDIT_TTL

  • ANALYTICS_DATA_TTL

  • ANALYTICS_FLOW_TTL

  • ANALYTICS_STATISTICS_TTL

  • COLLECTOR_disk_usage_percentage_high_watermark0

  • COLLECTOR_disk_usage_percentage_high_watermark1

  • COLLECTOR_disk_usage_percentage_high_watermark2

  • COLLECTOR_disk_usage_percentage_low_watermark0

  • COLLECTOR_disk_usage_percentage_low_watermark1

  • COLLECTOR_disk_usage_percentage_low_watermark2

  • COLLECTOR_high_watermark0_message_severity_level

  • COLLECTOR_high_watermark1_message_severity_level

  • COLLECTOR_high_watermark2_message_severity_level

  • COLLECTOR_low_watermark0_message_severity_level

  • COLLECTOR_low_watermark1_message_severity_level

  • COLLECTOR_low_watermark2_message_severity_level

  • COLLECTOR_pending_compaction_tasks_high_watermark0

  • COLLECTOR_pending_compaction_tasks_high_watermark1

  • COLLECTOR_pending_compaction_tasks_high_watermark2

  • COLLECTOR_pending_compaction_tasks_low_watermark0

  • COLLECTOR_pending_compaction_tasks_low_watermark1

  • COLLECTOR_pending_compaction_tasks_low_watermark2

  • COLLECTOR_LOG_FILE_COUNT

  • COLLECTOR_LOG_FILE_SIZE

tf-analytics (collector)

  • ANALYTICS_DATA_TTL

  • QUERYENGINE_MAX_SLICE

  • QUERYENGINE_MAX_TASKS

  • QUERYENGINE_START_TIME

tf-analytics (query-engine)

  • SNMPCOLLECTOR_FAST_SCAN_FREQUENCY

  • SNMPCOLLECTOR_SCAN_FREQUENCY

tf-analytics (snmp)

TOPOLOGY_SCAN_FREQUENCY

tf-analytics (topology)

  • DPDK_UIO_DRIVER

  • PHYSICAL_INTERFACE

  • SRIOV_PHYSICAL_INTERFACE

  • SRIOV_PHYSICAL_NETWORK

  • SRIOV_VF

  • TSN_AGENT_MODE

  • TSN_NODES

  • AGENT_MODE

  • FABRIC_SNAT_HASH_TABLE_SIZE

  • PRIORITY_BANDWIDTH

  • PRIORITY_ID

  • PRIORITY_SCHEDULING

  • PRIORITY_TAGGING

  • QOS_DEF_HW_QUEUE

  • QOS_LOGICAL_QUEUES

  • QOS_QUEUE_ID

  • VROUTER_GATEWAY

  • HUGE_PAGES_2MB

  • HUGE_PAGES_1GB

  • DISABLE_TX_OFFLOAD

  • DISABLE_STATS_COLLECTION

tf-vrouter (agent)

  • CPU_CORE_MASK

  • SERVICE_CORE_MASK

  • DPDK_CTRL_THREAD_MASK

  • DPDK_COMMAND_ADDITIONAL_ARGS

  • DPDK_MEM_PER_SOCKET

  • DPDK_UIO_DRIVER

  • HUGE_PAGES

  • HUGE_PAGES_DIR

  • NIC_OFFLOAD_ENABLE

  • DPDK_ENABLE_VLAN_FWRD

tf-vrouter (agent-dpdk)

See also

API Reference

Tungsten Fabric configuration

Mirantis OpenStack for Kubernetes (MOSK) allows you to easily adapt your Tungsten Fabric deployment to the needs of your environment through the TFOperator custom resource.

This section includes custom configuration details available to you.

Cassandra configuration

This section describes the Cassandra configuration through the Tungsten Fabric Operator custom resource.

Cassandra resource limits configuration

By default, Tungsten Fabric Operator sets up the following resource limits for Cassandra analytics and configuration StatefulSets:

Limits:
  cpu:     8
  memory:  32Gi
Requests:
  cpu:     1
  memory:  16Gi

This is a verified configuration suitable for most cases. However, if nodes are under a heavy load, the KubeContainerCPUThrottlingHigh StackLight alert may be raised for the Tungsten Fabric Pods of the tf-cassandra-analytics and tf-cassandra-config StatefulSets. If such alerts appear constantly, you can increase the limits through the TFOperator CR. For example:

spec:
  controllers:
    cassandra:
      deployments:
      - name: tf-cassandra-config
        resources:
          limits:
            cpu: "12"
            memory: 32Gi
          requests:
            cpu: "2"
            memory: 16Gi
      - name: tf-cassandra-analytics
        resources:
          limits:
            cpu: "12"
            memory: 32Gi
          requests:
            cpu: "2"
            memory: 16Gi
Custom configuration

Available since MOSK 22.3

To specify custom configurations for Cassandra clusters, use the configOptions settings in the TFOperator CR. For example, you may need to increase the file cache size in case of a heavy load on the nodes labeled with tfanalyticsdb=enabled or tfconfigdb=enabled:

spec:
  controllers:
    cassandra:
       deployments:
       - name: tf-cassandra-analytics
         configOptions:
             file_cache_size_in_mb: 1024
Custom vRouter settings

TechPreview

To specify custom settings for the Tungsten Fabric (TF) vRouter nodes, for example, to change the name of the tunnel network interface or enable debug level logging on some subset of nodes, use the customSpecs settings in the TFOperator CR.

For example, to enable debug level logging on a specific node or multiple nodes:

spec:
  controllers:
    tf-vrouter:
      agent:
        customSpecs:
        - name: debug
          label:
            name: <NODE-LABEL>
            value: <NODE-LABEL-VALUE>
          containers:
          - name: agent
            env:
            - name: LOG_LEVEL
              value: SYS_DEBUG

The customSpecs parameter inherits all settings for the tf-vrouter containers that are set on the spec:controllers:tf-vrouter:agent level and overrides them or adds additional parameters. The example configuration above overrides the logging level from SYS_INFO, which is the default logging level, to SYS_DEBUG.

Starting from MOSK 21.6, for clusters with a multi-rack architecture, you may need to redefine the gateway IP for the Tungsten Fabric vRouter nodes using the VROUTER_GATEWAY parameter. For details, see Multi-rack architecture.

Control plane traffic interface

By default, the TF control service uses the management interface for the BGP and XMPP traffic. You can change the control service interface using the controlInterface parameter in the TFOperator CR, for example, to combine the BGP and XMPP traffic with the data (tenant) traffic:

spec:
  settings:
    controlInterface: <tunnel-interface>
Traffic encapsulation

Tungsten Fabric implements cloud tenants’ virtual networks as Layer 3 overlays. Tenant traffic gets encapsulated into one of the supported protocols and is carried over the infrastructure network between 2 compute nodes or a compute node and an edge router device.

In addition, Tungsten Fabric is capable of exchanging encapsulated traffic with external systems in order to build advanced virtual networking topologies, for example, BGP VPN connectivity between 2 MOSK clouds or a MOSK cloud and a cloud tenant premises.

MOSK supports the following encapsulation protocols:

  • MPLS over Generic Routing Encapsulation (GRE)

    A traditional encapsulation method supported by several router vendors, including Cisco and Juniper. The feature is applicable when other encapsulation methods are not available. For example, an SDN gateway runs software that does not support MPLS over UDP.

  • MPLS over User Datagram Protocol (UDP)

    A variation of the MPLS over GRE mechanism and the default, most frequently used option in MOSK. With MPLS over UDP, the UDP source port carries a hash of the inner packet payload (entropy), which provides a significant benefit for equal-cost multi-path (ECMP) load balancing. MPLS over UDP and MPLS over GRE transfer Layer 3 traffic only.

  • Virtual Extensible LAN (VXLAN)

    Available since MOSK 22.1 TechPreview

    The combination of VXLAN and EVPN technologies is often used for creating advanced cloud networking topologies. For example, it can provide transparent Layer 2 interconnections between Virtual Network Functions running on top of the cloud and physical traffic generator appliances hosted somewhere else.

Encapsulation priority

The ENCAP_PRIORITY parameter defines the priority order in which the encapsulation protocols are used when setting up the BGP VPN connectivity between the cloud and external systems.

By default, the encapsulation order is set to MPLSoUDP,MPLSoGRE,VXLAN. The cloud operator can change it depending on their needs in the TFOperator custom resource as illustrated in Configuring encapsulation.

The list of supported encapsulated methods along with their order is shared between BGP peers as part of the capabilities information exchange when establishing a BGP session. Both parties must support the same encapsulation methods to build a tunnel for the network traffic.

For example, if the cloud operator wants to set up a Layer 2 VPN between the cloud and their network infrastructure, they configure the cloud’s virtual networks with VXLAN identifiers (VNIs) and do the same on the other side, for example, on a network switch. Also, VXLAN must be set in the first position in encapsulation priority order. Otherwise, VXLAN tunnels will not get established between endpoints, even though both endpoints may support the VXLAN protocol.

However, setting VXLAN first in the encapsulation priority order will not enforce VXLAN encapsulation between compute nodes or between compute nodes and gateway routers that use Layer 3 VPNs for communication.

Configuring encapsulation

The TFOperator custom resource allows you to define encapsulation settings for your Tungsten Fabric cluster.

Important

The TFOperator CR must be the only place to configure the cluster encapsulation. Performing these configurations through the TF web UI, CLI, or API does not provide the configuration persistency, and the settings defined this way may get reset to defaults during the cluster services restart or update.

Note

Defining the default values for encapsulation parameters in the TF operator CR is unnecessary.

Encapsulation settings

Parameter

Default value

Description

ENCAP_PRIORITY

MPLSoUDP,MPLSoGRE,VXLAN

Defines the encapsulation priority order.

VXLAN_VN_ID_MODE

automatic

Defines the Virtual Network ID type. The list of possible values includes:

  • automatic - to assign the VXLAN identifier to virtual networks automatically

  • configured - to make cloud users explicitly provide the VXLAN identifier for the virtual networks.

Typically, for a Layer 2 VPN use case, the VXLAN_VN_ID_MODE parameter is set to configured.

Example configuration:

controllers:
  tf-config:
    provisioner:
      containers:
      - env:
        - name: VXLAN_VN_ID_MODE
          value: automatic
        - name: ENCAP_PRIORITY
          value: VXLAN,MPLSoUDP,MPLSoGRE
        name: provisioner
Autonomous System Number (ASN)

In the routing fabric of a data centre, a MOSK cluster with Tungsten Fabric enabled can be represented either by a separate Autonomous System (AS) or as part of a bigger autonomous system. In either case, Tungsten Fabric needs to participate in the BGP peering, exchanging routes with external devices and within the cloud.

The Tungsten Fabric Controller acts as an internal (iBGP) route reflector for the cloud’s AS by populating /32 routes pointing to VMs across all compute nodes as well as the cloud’s edge gateway devices in case they belong to the same AS. Apart from being an iBGP router reflector for the cloud’s AS, the Tungsten Fabric Controller can act as a BGP peer for autonomous systems external to the cloud, for example, for the AS configured across the data center’s leaf-spine fabric.

The Autonomous System Number (ASN) setting contains the unique identifier of the autonomous system that the MOSK cluster with Tungsten Fabric belongs to. The ASN does not affect the internal iBGP communication between the vRouters running on the compute nodes; such communication works regardless of the ASN settings. However, any network appliance that is not managed by the Tungsten Fabric control plane has BGP configured manually. Therefore, the ASN settings must be configured consistently on both sides. Otherwise, BGP sessions cannot be established, regardless of whether the external device peers with Tungsten Fabric over iBGP or eBGP.

Configuring ASNs

The TFOperator custom resource enables you to define ASN settings for your Tungsten Fabric cluster.

Important

The TFOperator CR must be the only place to configure the cluster ASN. Performing these configurations through the TF web UI, CLI, or API does not provide the configuration persistency, and the settings defined this way may get reset to defaults during the cluster services restart or update.

Note

Defining the default values for ASN parameters in the TF operator CR is unnecessary.

ASN settings

Parameter

Default value

Description

BGP_ASN

64512

Defines the control node’s Autonomous System Number (ASN).

ENABLE_4BYTE_AS

FALSE

Enables the 4-byte ASN format.

Example configuration:

controllers:
  tf-config:
    provisioner:
      containers:
      - env:
        - name: BGP_ASN
          value: "64515"
        - name: ENABLE_4BYTE_AS
          value: "true"
        name: provisioner
  tf-control:
    provisioner:
      containers:
      - env:
        - name: BGP_ASN
          value: "64515"
        name: provisioner
Access to external DNS

Available since MOSK 22.2

By default, the TF tf-control-dns-external service is created to expose TF control dns. You can disable creation of this service using the enableDNSExternal parameter in the TFOperator CR. For example:

spec:
  controllers:
    tf-control:
      dns:
        enableDNSExternal: false

If such a service was manually created before MOSK 22.2 with a name that differs from tf-control-dns-external, you can optionally delete the old service. If the name was the same, the service is handled by the TF Operator.

Gateway for vRouter data plane network

If an edge router is accessible from the data plane through a gateway, define the VROUTER_GATEWAY parameter in the TFOperator custom resource. Otherwise, the default system gateway is used.

spec:
  controllers:
    tf-vrouter:
      agent:
        containers:
        - name: agent
          env:
          - name: VROUTER_GATEWAY
            value: <data-plane-network-gateway>

You can also configure the parameter for Tungsten Fabric vRouter in the DPDK mode:

spec:
  controllers:
    tf-vrouter:
      agent-dpdk:
        enabled: true
        containers:
        - name: agent
          env:
          - name: VROUTER_GATEWAY
            value: <data-plane-network-gateway>
Tungsten Fabric image precaching

Available since MOSK 22.3

By default, MOSK deploys image precaching DaemonSets to minimize possible downtime when updating container images. You can disable creation of these DaemonSets by setting the imagePreCaching parameter in the TFOperator custom resource to false:

spec:
  settings:
     imagePreCaching: false

When you disable imagePreCaching, the Tungsten Fabric Operator does not automatically remove the image precaching DaemonSets that have already been created. These DaemonSets do not affect the cluster setup. To remove them manually:

kubectl -n tf delete daemonsets.apps -l app=tf-image-pre-caching
Tungsten Fabric services

This section explains the specifics of the Tungsten Fabric services provided by Mirantis OpenStack for Kubernetes (MOSK). The list of services and their supported features included in this section is not exhaustive and is constantly amended based on the complexity of the architecture and the use of a particular service.

Tungsten Fabric load balancing

MOSK provides Octavia with Tungsten Fabric integration through the OpenStack Octavia driver that uses Tungsten Fabric HAProxy as a back end.

The Tungsten Fabric-based MOSK deployment supports creation, update, and deletion operations with the following standard load balancing API entities:

  • Load balancers

    Note

    For a load balancer creation operation, the driver supports only the vip-subnet-id argument; the vip-network-id argument is not supported.

  • Listeners

  • Pools

  • Health monitors

The Tungsten Fabric-based MOSK deployment does not support the following load balancing capabilities:

  • L7 load balancing capabilities, such as L7 policies, L7 rules, and others

  • Setting specific availability zones for load balancers and their resources

  • Use of the UDP protocol

  • Operations with Octavia quotas

  • Operations with Octavia flavors

Warning

The Tungsten Fabric-based MOSK deployment enables you to manage load balancer resources by means of the OpenStack CLI or OpenStack Horizon. Do not manipulate load balancer resources through the Tungsten Fabric web UI because such changes are not reflected on the OpenStack API side.
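
As an illustration of the supported operations, a basic HTTP load balancer can be created through the OpenStack CLI roughly as follows. All resource names and the subnet ID are hypothetical.

# Create a load balancer on a tenant subnet (only --vip-subnet-id is supported)
openstack loadbalancer create --name lb-web --vip-subnet-id <tenant-subnet-id>
# Add a listener, a pool, and a health monitor for plain HTTP traffic
openstack loadbalancer listener create --name listener-http --protocol HTTP --protocol-port 80 lb-web
openstack loadbalancer pool create --name pool-http --lb-algorithm ROUND_ROBIN --listener listener-http --protocol HTTP
openstack loadbalancer healthmonitor create --delay 5 --timeout 4 --max-retries 3 --type HTTP pool-http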

Tungsten Fabric known limitations

This section contains a summary of the Tungsten Fabric upstream features and use cases not supported in MOSK, features and use cases offered as Technology Preview in the current product release if any, and known limitations of Tungsten Fabric in integration with other product components.

Feature or use case

Status

Description

Tungsten Fabric web UI

Provided as is

MOSK provides the TF web UI as is and does not include this service in the support Service Level Agreement

Automatic generation of network port records in DNSaaS (Designate)

Not supported

As a workaround, you can use the Tungsten Fabric built-in DNS service that enables virtual machines to resolve each other's names

Secret management (Barbican)

Not supported

It is not possible to use the certificates stored in Barbican to terminate HTTPS on a load balancer in a Tungsten Fabric deployment

Role Based Access Control (RBAC) for Neutron objects

Not supported

Advanced Tungsten Fabric features

Not supported

Tungsten Fabric does not support the following upstream advanced features:

  • Service Function Chaining

  • Production ready multi-site SDN

Technical Preview

DPDK

Node maintenance API

Available since MOSK 22.1

This section describes internal implementation of the node maintenance API and how OpenStack and Tungsten Fabric controllers communicate with LCM and each other during a managed cluster update.

Node maintenance API objects

The node maintenance API consists of the following objects:

  • Cluster level:

    • ClusterWorkloadLock

    • ClusterMaintenanceRequest

  • Node level:

    • NodeWorkloadLock

    • NodeMaintenanceRequest

WorkloadLock objects

The WorkloadLock objects are created by each Application Controller. These objects prevent LCM from performing any changes on the cluster or node level while the lock is in the active state. The inactive state of the lock means that the Application Controller has finished its work and the LCM can proceed with the node or cluster maintenance.

ClusterWorkloadLock object example configuration
apiVersion: lcm.mirantis.com/v1alpha1
kind: ClusterWorkloadLock
metadata:
  name: cluster-1-openstack
spec:
  controllerName: openstack
status:
  state: active # inactive;active;failed (default: active)
  errorMessage: ""
  release: "6.16.0+21.3"
NodeWorkloadLock object example configuration
apiVersion: lcm.mirantis.com/v1alpha1
kind: NodeWorkloadLock
metadata:
  name: node-1-openstack
spec:
  nodeName: node-1
  controllerName: openstack
status:
  state: active # inactive;active;failed (default: active)
  errorMessage: ""
  release: "6.16.0+21.3"
MaintenanceRequest objects

The MaintenanceRequest objects are created by LCM. These objects notify Application Controllers about the upcoming maintenance of a cluster or a specific node.

ClusterMaintenanceRequest object example configuration
apiVersion: lcm.mirantis.com/v1alpha1
kind: ClusterMaintenanceRequest
metadata:
  name: cluster-1
spec:
  scope: drain # drain;os
NodeMaintenanceRequest object example configuration
 apiVersion: lcm.mirantis.com/v1alpha1
 kind: NodeMaintenanceRequest
 metadata:
   name: node-1
 spec:
   nodeName: node-1
   scope: drain # drain;os

The scope parameter in the object specification defines the impact on the managed cluster or node. The list of available options includes:

  • drain

    A regular managed cluster update. Each node in the cluster goes through a drain procedure. No node reboot takes place; the maximum impact includes a restart of services on the node, including Docker, which causes the restart of all containers present in the cluster.

  • os

    A node might be rebooted during the update. Triggers the workload evacuation by the OpenStack Controller.

When the MaintenanceRequest object is created, an Application Controller executes a handler to prepare workloads for maintenance and put appropriate WorkloadLock objects into the inactive state.

When maintenance is over, LCM removes MaintenanceRequest objects, and the Application Controllers move their WorkloadLocks objects into the active state.

OpenStack Controller maintenance API

When LCM creates the ClusterMaintenanceRequest object, the OpenStack Controller ensures that all OpenStack components are in the Healthy state, which means that the pods are up and running, and the readiness probes are passing.

The ClusterMaintenanceRequest object creation flow:

[Diagram: ClusterMaintenanceRequest creation flow]

When LCM creates the NodeMaintenanceRequest, the OpenStack Controller:

  1. Prepares components on the node for maintenance by removing nova-compute from scheduling.

  2. If a node reboot is possible, the instance migration workflow is triggered. The Operator can configure the instance migration flow through Kubernetes node annotations and should define the required option before the managed cluster update (see the example command after this list).

    To mitigate the potential impact on the cloud workloads, you can define the instance migration flow for the compute nodes running the most valuable instances.

    The list of available options for the instance migration configuration includes:

    • The openstack.lcm.mirantis.com/instance_migration_mode annotation:

      • live

        Default. The OpenStack Controller live migrates instances automatically. The update mechanism tries to move the memory and local storage of all instances on the node to another node without interrupting the workloads before applying any changes to the node. By default, the update mechanism makes three attempts to migrate each instance before falling back to the manual mode.

        Note

        Success of live migration depends on many factors including the selected vCPU type and model, the amount of data that needs to be transferred, the intensity of the disk IO and memory writes, the type of the local storage, and others. Instances using the following product features are known to have issues with live migration:

        • LVM-based ephemeral storage with and without encryption

        • Encrypted block storage volumes

        • CPU and NUMA node pinning

      • manual

        The OpenStack Controller waits for the Operator to migrate instances from the compute node. When it is time to update the compute node, the update mechanism asks you to manually migrate the instances and proceeds only once you confirm the node is safe to update.

      • skip

        The OpenStack Controller skips the instance check on the node and reboots it.

        Note

        For the clouds relying on the converged LVM with iSCSI block storage that offers persistent volumes in a remote edge sub-region, it is important to keep in mind that applying a major change to a compute node may impact not only the instances running on this node but also the instances attached to the LVM devices hosted there. We recommend that in such environments you perform the update procedure in the manual mode with mitigation measures taken by the Operator for each compute node. Otherwise, all the instances that have LVM with iSCSI volumes attached would need a reboot to restore connectivity.

    • The openstack.lcm.mirantis.com/instance_migration_attempts annotation

      Defines the number of times the OpenStack Controller attempts to migrate a single instance before giving up. Defaults to 3.

    Note

    You can also use annotations to control the update of non-compute nodes if they represent critical points of a specific cloud architecture. For example, setting the instance_migration_mode to manual on a controller node with a collocated gateway (Open vSwitch) will allow the Operator to gracefully shut down all the virtual routers hosted on this node.

  3. If the OpenStack Controller cannot migrate instances due to errors, the update is suspended until all instances are migrated manually or the openstack.lcm.mirantis.com/instance_migration_mode annotation is set to skip.
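
For illustration, the annotations described in this list can be set on a compute node with kubectl before triggering the update. The node name is a placeholder and the number of attempts is arbitrary.

kubectl annotate node <compute-node-name> \
  openstack.lcm.mirantis.com/instance_migration_mode=manual \
  openstack.lcm.mirantis.com/instance_migration_attempts=5 \
  --overwrite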

The NodeMaintenanceRequest object creation flow:

[Diagram: NodeMaintenanceRequest creation flow]

When the node maintenance is over, LCM removes the NodeMaintenanceRequest object and the OpenStack Controller:

  • Verifies that the Kubernetes Node becomes Ready.

  • Verifies that all OpenStack components on a given node are Healthy, which means that the pods are up and running, and the readiness probes are passing.

  • Ensures that the OpenStack components are connected to RabbitMQ. For example, the Neutron Agents become alive on the node, and compute instances are in the UP state.

Note

The OpenStack Controller allows only one NodeWorkloadLock object at a time to be in the inactive state. Therefore, the update process for nodes is sequential.

The NodeMaintenanceRequest object removal flow:

[Diagram: NodeMaintenanceRequest removal flow]

When the cluster maintenance is over, the OpenStack Controller sets the ClusterWorkloadLock object back to active and the update completes.

The ClusterMaintenanceRequest object removal flow:

[Diagram: ClusterMaintenanceRequest removal flow]
Tungsten Fabric Controller maintenance API

The Tungsten Fabric (TF) Controller creates and uses both types of workloadlocks that include ClusterWorkloadLock and NodeWorkloadLock.

When the ClusterMaintenanceRequest object is created, the TF Controller verifies the TF cluster health status and proceeds as follows:

  • If the cluster is Ready , the TF Controller moves the ClusterWorkloadLock object to the inactive state.

  • Otherwise, the TF Controller keeps the ClusterWorkloadLock object in the active state.

When the NodeMaintenanceRequest object is created, the TF Controller verifies the vRouter pod state on the corresponding node and proceeds as follows:

  • If all containers are Ready, the TF Controller moves the NodeWorkloadLock object to the inactive state.

  • Otherwise, the TF Controller keeps the NodeWorkloadLock in the active state.

Note

If there is a NodeWorkloadLock object in the inactive state present in the cluster, the TF Controller does not process the NodeMaintenanceRequest object for other nodes until this inactive NodeWorkloadLock object becomes active.

When the cluster LCM removes the MaintenanceRequest object, the TF Controller waits for the vRouter pods to become ready and proceeds as follows:

  • If all containers are in the Ready state, the TF Controller moves the NodeWorkloadLock object to the active state.

  • Otherwise, the TF Controller keeps the NodeWorkloadLock object in the inactive state.

Cluster update flow

This section describes the MOSK cluster update flow to the product releases that contain major updates and require node reboot such as support for new Linux kernel, and similar.

The diagram below illustrates the sequence of operations controlled by LCM and taking place during the update under the hood. We assume that the ClusterWorkloadLock and NodeWorkloadLock objects present in the cluster are in the active state before the Cloud Operator triggers the update.

[Diagram: MOSK cluster update flow]

See also

For details about the Application Controllers flow during different maintenance stages, refer to:

  • OpenStack Controller maintenance API

  • Tungsten Fabric Controller maintenance API

Phase 1: The Operator triggers the update
  1. The Operator sets appropriate annotations on nodes and selects suitable migration mode for workloads.

  2. The Operator triggers the managed cluster update through the Mirantis Container Cloud web UI as described in Update the cluster to MOSK 22.1 or above: Step 3. Initiate MOSK cluster update.

  3. LCM creates the ClusterMaintenanceRequest object and notifies the Application Controllers about the planned maintenance.

Phase 2: LCM triggers the OpenStack and Ceph update
  1. The OpenStack update starts.

  2. Ceph is waiting for the OpenStack ClusterWorkloadLock object to become inactive.

  3. When the OpenStack update is finalized, the OpenStack Controller marks ClusterWorkloadLock as inactive.

  4. The Ceph Controller triggers an update of the Ceph cluster.

  5. When the Ceph update is finalized, Ceph marks the ClusterWorkloadLock object as inactive.

Phase 3: LCM initiates the Kubernetes master nodes update
  1. If a master node has collocated roles, LCM creates a NodeMaintenanceRequest for the node.

  2. All Application Controllers mark their NodeWorkloadLock objects for this node as inactive.

  3. LCM starts draining the node by gracefully moving out all pods from the node. The DaemonSet pods are not evacuated and are left running.

  4. LCM downloads the new version of the LCM Agent and runs its states.

    Note

    While running Ansible states, the services on the node may be restarted.

  5. The above flow is applied to all Kubernetes master nodes one by one.

  6. LCM removes the NodeMaintenanceRequest.

Phase 4: LCM initiates the Kubernetes worker nodes update
  1. LCM creates a NodeMaintenanceRequest for the node specifying the scope.

  2. Application Controllers start preparing the node according to the scope.

  3. LCM waits until all Application Controllers mark their NodeWorkloadLock objects for this node as inactive.

  4. All pods are evacuated from the node by draining it. This does not apply to the DaemonSet pods, which cannot be removed.

  5. LCM downloads the new version of the LCM Agent and runs its states.

    Note

    While running Ansible states, the services on the node may be restarted.

  6. The above flow is applied to all Kubernetes worker nodes one by one.

  7. LCM removes the NodeMaintenanceRequest.

Phase 5: Finalization
  1. LCM triggers the update for all other applications present in the cluster, such as StackLight, Tungsten Fabric, and others.

  2. LCM removes ClusterMaintenanceRequest.

After a while, the cluster update completes and the cluster becomes fully operable again.

Deployment Guide

Mirantis OpenStack for Kubernetes (MOSK) enables the operator to create, scale, update, and upgrade OpenStack deployments on Kubernetes through a declarative API.

The Kubernetes built-in features, such as flexibility, scalability, and declarative resource definition make MOSK a robust solution.

Plan the deployment

The detailed plan of any Mirantis OpenStack for Kubernetes (MOSK) deployment is determined on a per-cloud basis. For the MOSK reference architecture and design overview, see Reference Architecture.

Also, read through Mirantis Container Cloud Reference Architecture: Container Cloud bare metal, as a MOSK cluster is deployed on top of a bare metal cluster managed by Mirantis Container Cloud.

Note

One of the industry best practices is to verify every new update or configuration change in a non-customer-facing environment before applying it to production. Therefore, Mirantis recommends having a staging cloud, deployed and maintained along with the production clouds. The recommendation is especially applicable to the environments that:

  • Receive updates often and use continuous delivery. For example, any non-isolated deployment of Mirantis Container Cloud.

  • Have significant deviations from the reference architecture or third party extensions installed.

  • Are managed under the Mirantis OpsCare program.

  • Run business-critical workloads where even the slightest application downtime is unacceptable.

A typical staging cloud is a complete copy of the production environment including the hardware and software configurations, but with a bare minimum of compute and storage capacity.

Provision a Container Cloud bare metal management cluster

The bare metal management system enables the Infrastructure Operator to deploy Container Cloud on a set of bare metal servers. It also enables Container Cloud to deploy MOSK clusters on bare metal servers without a pre-provisioned operating system.

To provision your bare metal management cluster, refer to Mirantis Container Cloud Deployment Guide: Deploy a baremetal-based management cluster.

Create a managed cluster

After bootstrapping your baremetal-based Mirantis Container Cloud management cluster, you can create a baremetal-based managed cluster to deploy Mirantis OpenStack for Kubernetes using the Container Cloud API.

Add a bare metal host

Before creating a bare metal managed cluster, add the required number of bare metal hosts using CLI and YAML files for configuration. This section describes how to add bare metal hosts using the Container Cloud CLI during a managed cluster creation.

To add a bare metal host:

  1. Verify that you configured each bare metal host as follows:

    • Enable the boot NIC support for UEFI load. Usually, at least the built-in network interfaces support it.

    • Enable the UEFI-LAN-OPROM support in BIOS -> Advanced -> PCI/PCIe.

    • Enable the IPv4-PXE stack.

    • Set the following boot order:

      1. UEFI-DISK

      2. UEFI-PXE

    • If your PXE network is not configured to use the first network interface, fix the UEFI-PXE boot order to speed up node discovery by selecting only the required network interface.

    • Power off all bare metal hosts.

    Warning

    Only one Ethernet port on a host must be connected to the Common/PXE network at any given time. The physical address (MAC) of this interface must be noted and used to configure the BareMetalHost object describing the host.

  2. Log in to the host where your management cluster kubeconfig is located and where kubectl is installed.

  3. Create a secret YAML file that describes the unique credentials of the new bare metal host.

    Example of the bare metal host secret
     apiVersion: v1
     data:
       password: <credentials-password>
       username: <credentials-user-name>
     kind: Secret
     metadata:
       labels:
         kaas.mirantis.com/credentials: "true"
         kaas.mirantis.com/provider: baremetal
         kaas.mirantis.com/region: region-one
       name: <credentials-name>
       namespace: <managed-cluster-project-name>
     type: Opaque
    

    In the data section, add the IPMI user name and password in the base64 encoding to access the BMC. To obtain the base64-encoded credentials, you can use the following command in your Linux console:

    echo -n <username|password> | base64
    

    Caution

    Each bare metal host must have a unique Secret.

  4. Apply this secret YAML file to your deployment:

    kubectl apply -f ${<bmh-cred-file-name>}.yaml
    
  5. Create a YAML file that contains a description of the new bare metal host.

    Example of the bare metal host configuration file with the worker role
    apiVersion: metal3.io/v1alpha1
    kind: BareMetalHost
    metadata:
      labels:
        kaas.mirantis.com/baremetalhost-id: <unique-bare-metal-host-hardware-node-id>
        hostlabel.bm.kaas.mirantis.com/worker: "true"
        kaas.mirantis.com/provider: baremetal
        kaas.mirantis.com/region: region-one
      name: <bare-metal-host-unique-name>
      namespace: <managed-cluster-project-name>
    spec:
      bmc:
        address: <ip_address_for-bmc-access>
        credentialsName: <credentials-name>
      bootMACAddress: <bare-metal-host-boot-mac-address>
      online: true
    

    For a detailed fields description, see Mirantis Container Cloud API Reference: BareMetalHost.

  6. Apply this configuration YAML file to your deployment:

    kubectl apply -f ${<bare-metal-host-config-file-name>}.yaml
    
  7. Verify the new BareMetalHost object status:

    kubectl -n <managed-cluster-project-name> get bmh -o wide <bare-metal-host-unique-name>
    

    Example of system response:

    NAMESPACE    NAME   STATUS   STATE      CONSUMER  BMC                        BOOTMODE  ONLINE  ERROR  REGION
    my-project   bmh1   OK       preparing            ip_address_for-bmc-access  legacy    true           region-one
    

    During provisioning, the status changes as follows:

    1. registering

    2. inspecting

    3. preparing

  8. After the BareMetalHost object switches to the preparing stage, the inspecting phase is finished, and you can verify that the hardware information is available in the object status and matches the MOSK cluster hardware requirements.

    For example:

    • Verify the status of hardware NICs:

      kubectl -n <managed-cluster-project-name> get bmh <bare-metal-host-unique-name> -o json | jq -r '[.status.hardware.nics]'
      

      Example of system response:

      [
        [
          {
            "ip": "172.18.171.32",
            "mac": "ac:1f:6b:02:81:1a",
            "model": "0x8086 0x1521",
            "name": "eno1",
            "pxe": true
          },
          {
            "ip": "fe80::225:90ff:fe33:d5ac%ens1f0",
            "mac": "00:25:90:33:d5:ac",
            "model": "0x8086 0x10fb",
            "name": "ens1f0"
          },
       ...
      
    • Verify the status of RAM:

      kubectl -n <managed-cluster-project-name> get bmh <bare-metal-host-unique-name> -o json | jq -r '[.status.hardware.ramMebibytes]'
      

      Example of system response:

      [
        98304
      ]
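    • Optionally, verify the list of storage devices detected during inspection. This is a sketch that assumes the standard storage list in the BareMetalHost hardware status:

      kubectl -n <managed-cluster-project-name> get bmh <bare-metal-host-unique-name> -o json | jq -r '[.status.hardware.storage]'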
      

Now, proceed with Create a custom bare metal host profile.

Create a custom bare metal host profile

The bare metal host profile is a Kubernetes custom resource. It enables the operator to define how the storage devices and the operating system are provisioned and configured.

This section describes the bare metal host profile default settings and configuration of custom profiles for managed clusters using Mirantis Container Cloud API.

Default configuration of the host system storage

The default host profile requires three storage devices in the following strict order:

  1. Boot device and operating system storage

    This device contains boot data and operating system data. It is partitioned using the GUID Partition Table (GPT) labels. The root file system is an ext4 file system created on top of an LVM logical volume. For a detailed layout, refer to the table below.

  2. Local volumes device

    This device contains an ext4 file system with directories mounted as persistent volumes to Kubernetes. These volumes are used by the Mirantis Container Cloud services to store their data, including monitoring and identity databases.

  3. Ceph storage device

    This device is used as a Ceph datastore or Ceph OSD.

The following table summarizes the default configuration of the host system storage set up by the Container Cloud bare metal management.

Default configuration of the bare metal host storage

Device/partition

Name/Mount point

Recommended size

Description

/dev/sda1

bios_grub

4 MiB

The mandatory GRUB boot partition required for non-UEFI systems.

/dev/sda2

UEFI -> /boot/efi

0.2 GiB

The boot partition required for the UEFI boot mode.

/dev/sda3

config-2

64 MiB

The mandatory partition for the cloud-init configuration. Used during the first host boot for initial configuration.

/dev/sda4

lvm_root_part

100% of the remaining free space in the LVM volume group

The main LVM physical volume that is used to create the root file system.

/dev/sdb

lvm_lvp_part -> /mnt/local-volumes

100% of the remaining free space in the LVM volume group

The LVM physical volume that is used to create the file system for LocalVolumeProvisioner.

/dev/sdc

-

100% of the device

Clean raw disk that will be used for the Ceph storage back end.
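On a host provisioned with the default profile, you can inspect the resulting disk layout with standard Linux tools, for example:

  lsblk -o NAME,SIZE,TYPE,FSTYPE,MOUNTPOINT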

Now, proceed to Create MOSK host profiles.

Create MOSK host profiles

Different types of MOSK nodes require differently configured host storage. This section describes how to create custom host profiles for different types of MOSK nodes.

You can create custom profiles for managed clusters using Container Cloud API.

To create MOSK bare metal host profiles:

  1. Log in to the local machine where your management cluster kubeconfig is located and where kubectl is installed.

    Note

    The management cluster kubeconfig is created automatically during the last stage of the management cluster bootstrap.

  2. Create a new bare metal host profile for MOSK compute nodes in a YAML file under the templates/bm/ directory.

  3. Edit the host profile using the example template below to meet your hardware configuration requirements:

    apiVersion: metal3.io/v1alpha1
    kind: BareMetalHostProfile
    metadata:
      name: <PROFILE_NAME>
      namespace: <PROJECT_NAME>
    spec:
      devices:
      # From the HW node, obtain the first device whose size is at least 60 GiB
      - device:
          workBy: "by_id,by_wwn,by_path,by_name"
          minSizeGiB: 60
          type: ssd
          wipe: true
        partitions:
        - name: bios_grub
          partflags:
          - bios_grub
          sizeGiB: 0.00390625
          wipe: true
        - name: uefi
          partflags:
          - esp
          sizeGiB: 0.2
          wipe: true
        - name: config-2
          sizeGiB: 0.0625
          wipe: true
        # This partition is only required on compute nodes if you plan to
        # use LVM ephemeral storage.
        - name: lvm_nova_part
          wipe: true
          sizeGiB: 100
        - name: lvm_root_part
          sizeGiB: 0
          wipe: true
      # From the HW node, obtain the second device whose size is at least 60 GiB
      # If a device exists but does not fit the size,
      # the BareMetalHostProfile will not be applied to the node
      - device:
          workBy: "by_id,by_wwn,by_path,by_name"
          minSizeGiB: 60
          type: ssd
          wipe: true
      # From the HW node, obtain the disk device with the exact name
      - device:
          workBy: "by_id,by_wwn,by_path,by_name"
          minSizeGiB: 60
          wipe: true
        partitions:
        - name: lvm_lvp_part
          sizeGiB: 0
          wipe: true
      # Example of wiping a device without partitioning it.
      # Mandatory for the case when a disk is supposed to be used for Ceph back end
      # later
      - device:
          workBy: "by_id,by_wwn,by_path,by_name"
          wipe: true
      fileSystems:
      - fileSystem: vfat
        partition: config-2
      - fileSystem: vfat
        mountPoint: /boot/efi
        partition: uefi
      - fileSystem: ext4
        logicalVolume: root
        mountPoint: /
      - fileSystem: ext4
        logicalVolume: lvp
        mountPoint: /mnt/local-volumes/
      logicalVolumes:
      - name: root
        sizeGiB: 0
        vg: lvm_root
      - name: lvp
        sizeGiB: 0
        vg: lvm_lvp
      postDeployScript: |
        #!/bin/bash -ex
        echo $(date) 'post_deploy_script done' >> /root/post_deploy_done
      preDeployScript: |
        #!/bin/bash -ex
        echo $(date) 'pre_deploy_script done' >> /root/pre_deploy_done
      volumeGroups:
      - devices:
        - partition: lvm_root_part
        name: lvm_root
      - devices:
        - partition: lvm_lvp_part
        name: lvm_lvp
      grubConfig:
        defaultGrubOptions:
        - GRUB_DISABLE_RECOVERY="true"
        - GRUB_PRELOAD_MODULES=lvm
        - GRUB_TIMEOUT=20
      kernelParameters:
        sysctl:
          kernel.panic: "900"
          kernel.dmesg_restrict: "1"
          kernel.core_uses_pid: "1"
          fs.file-max: "9223372036854775807"
          fs.aio-max-nr: "1048576"
          fs.inotify.max_user_instances: "4096"
          vm.max_map_count: "262144"
    
  4. Add or edit the mandatory parameters in the new BareMetalHostProfile object. For the parameters description, see Container Cloud API: BareMetalHostProfile spec.

  5. Add the bare metal host profile to your management cluster:

    kubectl --kubeconfig <pathToManagementClusterKubeconfig> -n <projectName> apply -f <pathToBareMetalHostProfileFile>
    
  6. If required, further modify the host profile:

    kubectl --kubeconfig <pathToManagementClusterKubeconfig> -n <projectName> edit baremetalhostprofile <hostProfileName>
    
  7. Repeat the steps above to create host profiles for other OpenStack node roles such as control plane nodes and storage nodes.
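To verify that the profiles have been created in the target project, list the BareMetalHostProfile objects:

  kubectl --kubeconfig <pathToManagementClusterKubeconfig> -n <projectName> get baremetalhostprofile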

Now, proceed to Enable huge pages in a host profile.

Enable huge pages in a host profile

The BareMetalHostProfile API allows configuring a host to use the huge pages feature of the Linux kernel on managed clusters.

Note

Huge pages is a memory management feature of the Linux kernel. With huge pages enabled, the kernel allocates RAM in bigger chunks, or pages. This allows a KVM (kernel-based virtual machine) host and the VMs running on it to use the host RAM more efficiently and improves the performance of VMs.

To enable huge pages in a custom bare metal host profile for a managed cluster:

  1. Log in to the local machine where your management cluster kubeconfig is located and where kubectl is installed.

    Note

    The management cluster kubeconfig is created automatically during the last stage of the management cluster bootstrap.

  2. Open for editing or create a new bare metal host profile under the templates/bm/ directory.

  3. Edit the grubConfig section of the host profile spec using the example below to configure the kernel boot parameters and enable huge pages:

    spec:
      grubConfig:
        defaultGrubOptions:
        - GRUB_DISABLE_RECOVERY="true"
        - GRUB_PRELOAD_MODULES=lvm
        - GRUB_TIMEOUT=20
        - GRUB_CMDLINE_LINUX_DEFAULT="hugepagesz=1G hugepages=N"
    

    The example configuration above allocates N huge pages of 1 GB each at server boot. The last hugepagesz parameter value is used as the default unless default_hugepagesz is defined. For details about possible values, see the official Linux kernel documentation. You can verify the resulting allocation on the node after reprovisioning, as shown in the sketch after this procedure.

  4. Add the bare metal host profile to your management cluster:

    kubectl --kubeconfig <pathToManagementClusterKubeconfig> -n <projectName> apply -f <pathToBareMetalHostProfileFile>
    
  5. If required, further modify the host profile:

    kubectl --kubeconfig <pathToManagementClusterKubeconfig> -n <projectName> edit baremetalhostprofile <hostProfileName>
    
  6. Proceed to Create a MOSK cluster.
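After a host is reprovisioned with the updated profile, you can verify the allocation on the node using standard Linux tools; this is a quick sketch, not specific to MOSK:

  grep Huge /proc/meminfo    # HugePages_Total should match the configured number of pages
  cat /proc/cmdline          # should contain the configured hugepagesz and hugepages parameters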

Configure RAID support

TechPreview

You can configure support for software-based Redundant Array of Independent Disks (RAID) using BareMetalHostProfile to set up an LVM-based RAID level 1 (raid1) or an mdadm-based RAID level 0, 1, or 10 (raid0, raid1, or raid10).

If required, you can further configure RAID in the same profile, for example, to install a cluster operating system onto a RAID device.

Caution

  • RAID configuration on already provisioned bare metal machines or on an existing cluster is not supported.

    To start using any kind of RAID, reprovision the machines with a new BaremetalHostProfile.

  • Mirantis supports the raid1 type of RAID devices both for LVM and mdadm.

  • Mirantis supports the raid0 type for the mdadm RAID to be on par with the LVM linear type.

  • Mirantis recommends having at least two physical disks for raid0 and raid1 devices to prevent unnecessary complexity.

  • Starting from MOSK 22.2, Mirantis supports the raid10 type for mdadm RAID. At least four physical disks are required for this type of RAID.

  • Only an even number of disks can be used for a raid1 or raid10 device.

Create an LVM software RAID (raid1)

TechPreview

Warning

The EFI system partition partflags: ['esp'] must be a physical partition in the main partition table of the disk, not under LVM or mdadm software RAID.

During configuration of your custom bare metal host profile, you can create an LVM-based software RAID device raid1 by adding type: raid1 to the logicalVolume spec in BaremetalHostProfile.

Caution

The logicalVolume spec of the raid1 type requires at least two devices (partitions) in volumeGroup where you build a logical volume. For an LVM of the linear type, one device is enough.

Note

The LVM raid1 requires additional space to store the raid1 metadata on a volume group, roughly 4 MB for each partition. Therefore, you cannot create a logical volume of exactly the same size as the partitions it works on.

For example, if you have two partitions of 10 GiB, the corresponding raid1 logical volume size will be less than 10 GiB. For that reason, you can either set sizeGiB: 0 to use all available space on the volume group, or set a smaller size than the partition size. For example, use sizeGiB: 9.9 instead of sizeGiB: 10 for the logical volume.

The following example illustrates an extract of BaremetalHostProfile with / on the LVM raid1.

...
devices:
  - device:
      workBy: "by_id,by_wwn,by_path,by_name"
      minSizeGiB: 200
      type: hdd
      wipe: true
    partitions:
      - name: root_part1
        sizeGiB: 120
      - name: rest_sda
        sizeGiB: 0
  - device:
      workBy: "by_id,by_wwn,by_path,by_name"
      minSizeGiB: 200
      type: hdd
      wipe: true
    partitions:
      - name: root_part2
        sizeGiB: 120
      - name: rest_sdb
        sizeGiB: 0
volumeGroups:
  - name: vg-root
    devices:
      - partition: root_part1
      - partition: root_part2
  - name: vg-data
    devices:
      - partition: rest_sda
      - partition: rest_sdb
logicalVolumes:
  - name: root
    type: raid1  ## <-- LVM raid1
    vg: vg-root
    sizeGiB: 119.9
  - name: data
    type: linear
    vg: vg-data
    sizeGiB: 0
fileSystems:
  - fileSystem: ext4
    logicalVolume: root
    mountPoint: /
    mountOpts: "noatime,nodiratime"
  - fileSystem: ext4
    logicalVolume: data
    mountPoint: /mnt/data
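After a host is provisioned with such a profile, you can verify the RAID state of the logical volumes on the node, for example:

  lvs -a -o name,vg_name,lv_attr,sync_percent,devices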
Create an mdadm software RAID (raid0, raid1, raid10)

TechPreview

Warning

The EFI system partition partflags: ['esp'] must be a physical partition in the main partition table of the disk, not under LVM or mdadm software RAID.

During configuration of your custom bare metal host profile as described in Create a custom bare metal host profile, you can create mdadm-based software RAID devices of the raid0 or raid1 type by describing the mdadm devices under the softRaidDevices field in BaremetalHostProfile. For example:

...
softRaidDevices:
- name: /dev/md0
  devices:
  - partition: sda1
  - partition: sdb1
- name: raid-name
  devices:
  - partition: sda2
  - partition: sdb2
...

Starting from MOSK 22.2, you can also use the raid10 type for the mdadm-based software RAID devices. This type requires at least four storage devices, and the total number of devices must be even. For example:

softRaidDevices:
- name: /dev/md0
  level: raid10
  devices:
    - partition: sda1
    - partition: sdb1
    - partition: sdc1
    - partition: sdd1

The following fields in softRaidDevices describe RAID devices:

  • name

    Name of the RAID device to refer to throughout the baremetalhostprofile.

  • level

    Type or level of RAID used to create a device, defaults to raid1. Set to raid0 or raid10 to create a device of the corresponding type.

  • devices

    List of physical devices or partitions used to build a software RAID device. It must include at least two partitions or devices to build raid0 and raid1 devices and at least four for raid10.

For the rest of the mdadm RAID parameters, see Container Cloud API: BareMetalHostProfile spec.

Caution

The mdadm RAID devices cannot be created on top of LVM devices.

The following example illustrates an extract of BaremetalHostProfile with / on the mdadm raid1 and some data storage on raid0:
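A minimal sketch of such an extract is shown below. It assumes two disks with hypothetical partition names md_root_part1/md_root_part2 and md_data_part1/md_data_part2, and that a file system references an mdadm device through a softRaidDevice field, mirroring the reference style used for volume groups later in this section; verify the exact field names against Container Cloud API: BareMetalHostProfile spec.

...
devices:
  - device:
      workBy: "by_id,by_wwn,by_path,by_name"
      minSizeGiB: 200
      type: ssd
      wipe: true
    partitions:
      - name: bios_grub
        partflags:
          - bios_grub
        sizeGiB: 0.00390625
      - name: uefi
        partflags:
          - esp
        sizeGiB: 0.2
      - name: config-2
        sizeGiB: 0.0625
      - name: md_root_part1   # hypothetical partition name
        sizeGiB: 120
      - name: md_data_part1   # hypothetical partition name
        sizeGiB: 0
  - device:
      workBy: "by_id,by_wwn,by_path,by_name"
      minSizeGiB: 200
      type: ssd
      wipe: true
    partitions:
      - name: md_root_part2   # hypothetical partition name
        sizeGiB: 120
      - name: md_data_part2   # hypothetical partition name
        sizeGiB: 0
softRaidDevices:
  - name: /dev/md0
    level: raid1
    devices:
      - partition: md_root_part1
      - partition: md_root_part2
  - name: /dev/md1
    level: raid0
    devices:
      - partition: md_data_part1
      - partition: md_data_part2
fileSystems:
  - fileSystem: vfat
    partition: config-2
  - fileSystem: vfat
    mountPoint: /boot/efi
    partition: uefi
  - fileSystem: ext4
    softRaidDevice: /dev/md0   # assumed reference field, verify against the API reference
    mountPoint: /
  - fileSystem: ext4
    softRaidDevice: /dev/md1   # assumed reference field, verify against the API reference
    mountPoint: /mnt/data      # hypothetical mount point
...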

The following example illustrates an extract of BaremetalHostProfile with data storage on a raid10 device:
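A comparable sketch for the raid10 case, assuming four dedicated data disks with hypothetical partition names md_data_part1 through md_data_part4 and a hypothetical /mnt/data mount point (the softRaidDevice reference field is an assumption, as above):

...
devices:
  - device:
      workBy: "by_id,by_wwn,by_path,by_name"
      minSizeGiB: 30
      type: ssd
      wipe: true
    partitions:
      - name: md_data_part1   # hypothetical partition name
        sizeGiB: 0
  - device:
      workBy: "by_id,by_wwn,by_path,by_name"
      minSizeGiB: 30
      type: ssd
      wipe: true
    partitions:
      - name: md_data_part2   # hypothetical partition name
        sizeGiB: 0
  - device:
      workBy: "by_id,by_wwn,by_path,by_name"
      minSizeGiB: 30
      type: ssd
      wipe: true
    partitions:
      - name: md_data_part3   # hypothetical partition name
        sizeGiB: 0
  - device:
      workBy: "by_id,by_wwn,by_path,by_name"
      minSizeGiB: 30
      type: ssd
      wipe: true
    partitions:
      - name: md_data_part4   # hypothetical partition name
        sizeGiB: 0
softRaidDevices:
  - name: /dev/md0
    level: raid10
    devices:
      - partition: md_data_part1
      - partition: md_data_part2
      - partition: md_data_part3
      - partition: md_data_part4
fileSystems:
  - fileSystem: ext4
    softRaidDevice: /dev/md0   # assumed reference field, verify against the API reference
    mountPoint: /mnt/data      # hypothetical mount point
...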

Create LVM volume groups on top of RAID devices

Available since MOSK 22.2 TechPreview

You can configure an LVM volume group on top of mdadm-based RAID devices as physical volumes using the BareMetalHostProfile resource. List the required RAID devices in a separate field of the volumeGroups definition within the storage configuration of BareMetalHostProfile.

The following example illustrates an extract of BaremetalHostProfile with a volume group named lvm_nova to be created on top of an mdadm-based RAID device raid1:

...
devices:
  - device:
      workBy: "by_id,by_wwn,by_path,by_name"
      minSizeGiB: 60
      type: ssd
      wipe: true
    partitions:
      - name: bios_grub
        partflags:
          - bios_grub
        sizeGiB: 0.00390625
      - name: uefi
        partflags:
          - esp
        sizeGiB: 0.20000000298023224
      - name: config-2
        sizeGiB: 0.0625
  - device:
      workBy: "by_id,by_wwn,by_path,by_name"
      minSizeGiB: 30
      type: ssd
      wipe: true
    partitions:
      - name: md0_part1
  - device:
      workBy: "by_id,by_wwn,by_path,by_name"
      minSizeGiB: 30
      type: ssd
      wipe: true
    partitions:
      - name: md0_part2
softRaidDevices:
  - devices:
      - partition: md0_part1
      - partition: md0_part2
    level: raid1
    metadata: "1.0"
    name: /dev/md0
volumeGroups:
  - devices:
      - softRaidDevice: /dev/md0
    name: lvm_nova
...
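On the provisioned node, you can confirm that the RAID device and the volume group exist, for example:

  cat /proc/mdstat   # /dev/md0 should be listed as an active raid1 array
  vgs lvm_nova       # the volume group created on top of /dev/md0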
Create a MOSK cluster

With L2 networking templates, you can create MOSK clusters with advanced host networking configurations. For example, you can create bond interfaces on top of physical interfaces on the host or use multiple subnets to separate different types of network traffic.

You can use several host-specific L2 templates per one cluster to support different hardware configurations. For example, you can create L2 templates with a different number and layout of NICs to be applied to specific machines of one cluster.

You can also use multiple L2 templates to support different roles for nodes in a MOSK installation. You can create L2 templates with different logical interfaces and assign them to individual machines based on their roles in a MOSK cluster.

When you create a baremetal-based project in the Container Cloud web UI, the exemplary templates with the ipam/PreInstalledL2Template label are copied to this project. These templates are preinstalled during the management cluster bootstrap.

Follow the procedures below to create MOSK clusters using L2 templates.

Create a managed bare metal cluster

This section instructs you on how to configure and deploy a managed cluster that is based on the baremetal-based management cluster through the Mirantis Container Cloud web UI.

To create a managed cluster on bare metal:

  1. Log in to the Container Cloud web UI with the writer permissions.

  2. Switch to the required project using the Switch Project action icon located on top of the main left-side navigation panel.

    Caution

    Do not create a new managed cluster for MOSK in the default project (Kubernetes namespace). If no projects are defined, create a new mosk project first.

  3. In the SSH keys tab, click Add SSH Key to upload the public SSH key that will be used for the SSH access to VMs.

  4. Optional. In the Proxies tab, enable proxy access to the managed cluster:

    1. Click Add Proxy.

    2. In the Add New Proxy wizard, fill out the form with the following parameters:

      Proxy configuration

      Parameter

      Description

      Proxy Name

      Name of the proxy server to use during a managed cluster creation.

      Region

      From the drop-down list, select the required region.

      HTTP Proxy

      Add the HTTP proxy server domain name in the following format:

      • http://proxy.example.com:port - for anonymous access

      • http://user:password@proxy.example.com:port - for restricted access

      HTTPS Proxy

      Add the HTTPS proxy server domain name in the same format as for HTTP Proxy.

      No Proxy

      Comma-separated list of IP addresses or domain names.

    For the list of Mirantis resources and IP addresses to be accessible from the Container Cloud clusters, see Reference Architecture: Requirements.

  5. In the Clusters tab, click Create Cluster.

  6. Configure the new cluster in the Create New Cluster wizard that opens:

    1. Define general and Kubernetes parameters:

      Create new cluster: General, Provider, and Kubernetes

      Section

      Parameter name

      Description

      General settings

      Cluster name

      The cluster name.

      Provider

      Select Baremetal.

      Region

      From the drop-down list, select Baremetal.

      Release version

      Select a Container Cloud version with the OpenStack label tag. Otherwise, you will not be able to deploy MOSK on this managed cluster.

      Proxy

      Optional. From the drop-down list, select the proxy server name that you have previously created.

      SSH keys

      From the drop-down list, select the SSH key name that you have previously added for SSH access to the bare metal hosts.

      Provider

      LB host IP

      The IP address of the load balancer endpoint that will be used to access the Kubernetes API of the new cluster. This IP address must be on the Combined/PXE network.

      LB address range

      The range of IP addresses that can be assigned to load balancers for Kubernetes Services by MetalLB.

      Kubernetes

      Services CIDR blocks

      The Kubernetes Services CIDR blocks. For example, 10.233.0.0/18.

      Pods CIDR blocks

      The Kubernetes pods CIDR blocks. For example, 10.233.64.0/18.

    2. Configure StackLight:

      StackLight configuration

      Section

      Parameter name

      Description

      StackLight

      Enable Monitoring

      Selected by default. Deselect to skip StackLight deployment.

      Note

      You can also enable, disable, or configure StackLight parameters after deploying a managed cluster. For details, see Mirantis Container Cloud Operations Guide.

      Enable Logging

      Select to deploy the StackLight logging stack. For details about the logging components, see Deployment architecture.

      Note

      The logging mechanism performance depends on the cluster log load. In case of a high load, you may need to increase the default resource requests and limits for fluentdLogs. For details, see Mirantis Container Cloud Operations Guide: StackLight resource limits.

      HA Mode

      Select to enable StackLight monitoring in the HA mode. For the differences between HA and non-HA modes, see Deployment architecture.

      StackLight Default Logs Severity Level

      Log severity (verbosity) level for all StackLight components. The default value for this parameter is Default component log level that respects original defaults of each StackLight component. For details about severity levels, see Mirantis Container Cloud Operations Guide: StackLight log verbosity.

      StackLight Component Logs Severity Level

      The severity level of logs for a specific StackLight component that overrides the value of the StackLight Default Logs Severity Level parameter. For details about severity levels, see Mirantis Container Cloud Operations Guide: StackLight log verbosity.

      Expand the drop-down menu for a specific component to display its list of available log levels.

      Elasticsearch 0

      Logstash Retention Time Available since MOSK 22.3

      Available if you select Enable Logging. Specifies the logstash-* index retention time.

      Events Retention Time Available since MOSK 22.3

      Available if you select Enable Logging. Specifies the kubernetes_events-* index retention time.

      Notifications Retention Time Available since MOSK 22.3

      Available if you select Enable Logging. Specifies the notification-* index retention time.

      Retention Time Removed since MOSK 22.3

      Available if you select Enable Logging. The OpenSearch logs retention period.

      Persistent Volume Claim Size

      Available if you select Enable Logging. The OpenSearch persistent volume claim size.

      Collected Logs Severity Level

      Available if you select Enable Logging. The minimum severity of all Container Cloud components logs collected in OpenSearch. For details about severity levels, see Mirantis Container Cloud Operations Guide: StackLight logging.

      Prometheus

      Retention Time

      The Prometheus database retention period.

      Retention Size

      The Prometheus database retention size.

      Persistent Volume Claim Size

      The Prometheus persistent volume claim size.

      Enable Watchdog Alert

      Select to enable the Watchdog alert that fires as long as the entire alerting pipeline is functional.

      Custom Alerts

      Specify alerting rules for new custom alerts or upload a YAML file in the following exemplary format:

      - alert: HighErrorRate
        expr: job:request_latency_seconds:mean5m{job="myjob"} > 0.5
        for: 10m
        labels:
          severity: page
        annotations:
          summary: High request latency
      

      For details, see Official Prometheus documentation: Alerting rules. For the list of the predefined StackLight alerts, see Operations Guide: StackLight alerts.

      StackLight Email Alerts

      Enable Email Alerts

      Select to enable the StackLight email alerts.

      Send Resolved

      Select to enable notifications about resolved StackLight alerts.

      Require TLS

      Select to enable transmitting emails through TLS.

      Email alerts configuration for StackLight

      Fill out the following email alerts parameters as required:

      • To - the email address to send notifications to.

      • From - the sender address.

      • SmartHost - the SMTP host through which the emails are sent.

      • Authentication username - the SMTP user name.

      • Authentication password - the SMTP password.

      • Authentication identity - the SMTP identity.

      • Authentication secret - the SMTP secret.

      StackLight Slack Alerts

      Enable Slack alerts

      Select to enable the StackLight Slack alerts.

      Send Resolved

      Select to enable notifications about resolved StackLight alerts.

      Slack alerts configuration for StackLight

      Fill out the following Slack alerts parameters as required:

      • API URL - The Slack webhook URL.

      • Channel - The channel to send notifications to, for example, #channel-for-alerts.

      0

      Starting from MOSK 22.2, Elasticsearch has switched to OpenSearch. For details, see Elasticsearch switch to OpenSearch.

  7. Click Create.

    To monitor the cluster readiness, hover over the status icon of a specific cluster in the Status column of the Clusters page.

    Once the orange blinking status icon is green and Ready, the cluster deployment or update is complete.

    You can monitor live deployment status of the following cluster components:

    Component

    Description

    Helm

    Installation or upgrade status of all Helm releases

    Kubelet

    Readiness of the node in a Kubernetes cluster, as reported by kubelet

    Kubernetes

    Readiness of all requested Kubernetes objects

    Nodes

    Equality of the requested nodes number in the cluster to the number of nodes having the Ready LCM status

    OIDC

    Readiness of the cluster OIDC configuration

    StackLight

    Health of all StackLight-related objects in a Kubernetes cluster

    Swarm

    Readiness of all nodes in a Docker Swarm cluster

    LoadBalancer

    Readiness of the Kubernetes API load balancer

    ProviderInstance

    Readiness of all machines in the underlying infrastructure (virtual or bare metal, depending on the provider type)

  8. Optional. Colocate the OpenStack control plane with the managed cluster Kubernetes manager nodes by adding the following field to the Cluster object spec:

    spec:
      providerSpec:
        value:
          dedicatedControlPlane: false
    

    Note

    This feature is available as technical preview. Use such configuration for testing and evaluation purposes only.

  9. Optional. Customize MetalLB speakers that are deployed on all Kubernetes nodes except master nodes by default. For details, see Configure the MetalLB speaker node selector.

  10. Once you have created a MOSK cluster, some StackLight alerts may be raised as false positives until you deploy the Mirantis OpenStack environment.

  11. Proceed to Workflow of network interface naming.

Workflow of network interface naming

To simplify operations with L2 templates, before you start creating them, inspect the general workflow of a network interface name gathering and processing.

Network interface naming workflow:

  1. The Operator creates a baremetalHost object.

  2. The baremetalHost object executes the introspection stage and becomes ready.

  3. The Operator collects information about NIC count, naming, and so on for further changes in the mapping logic.

    At this stage, the order of NICs in the object may change randomly during each introspection, but the NIC names always remain the same. For more details, see Predictable Network Interface Names.

    For example:

    # Example commands:
    # kubectl -n managed-ns get bmh baremetalhost1 -o custom-columns='NAME:.metadata.name,STATE:.status.provisioning.state'
    # NAME            STATE
    # baremetalhost1  ready
    
    # kubectl -n managed-ns get bmh baremetalhost1 -o yaml
    # Example output:
    
    apiVersion: metal3.io/v1alpha1
    kind: BareMetalHost
    ...
    status:
    ...
        nics:
        - ip: fe80::ec4:7aff:fe6a:fb1f%eno2
          mac: 0c:c4:7a:6a:fb:1f
          model: 0x8086 0x1521
          name: eno2
          pxe: false
        - ip: fe80::ec4:7aff:fe1e:a2fc%ens1f0
          mac: 0c:c4:7a:1e:a2:fc
          model: 0x8086 0x10fb
          name: ens1f0
          pxe: false
        - ip: fe80::ec4:7aff:fe1e:a2fd%ens1f1
          mac: 0c:c4:7a:1e:a2:fd
          model: 0x8086 0x10fb
          name: ens1f1
          pxe: false
        - ip: 192.168.1.151 # Temporary PXE network address
          mac: 0c:c4:7a:6a:fb:1e
          model: 0x8086 0x1521
          name: eno1
          pxe: true
     ...
    
  4. The Operator selects from the following options:

  5. The Operator creates a Machine or Subnet object.

  6. The baremetal-provider service links the Machine object to the baremetalHost object.

  7. The kaas-ipam and baremetal-provider services collect hardware information from the baremetalHost object and use it to configure host networking and services.

  8. The kaas-ipam service:

    1. Spawns the IpamHost object.

    2. Renders the l2template object.

    3. Spawns the ipaddr object.

    4. Updates the IpamHost object status with all rendered and linked information.

  9. The baremetal-provider service collects the rendered networking information from the IpamHost object.

  10. The baremetal-provider service proceeds with the IpamHost object provisioning.

Now proceed to Create subnets.

Create subnets

Before creating an L2 template, ensure that you have the required subnets that can be used in the L2 template to allocate IP addresses for the MOSK cluster nodes. Where required, create a number of subnets for a particular project using the Subnet CR. A subnet has three logical scopes:

  • global - CR uses the default namespace. A subnet can be used for any cluster located in any project.

  • namespaced - CR uses the namespace that corresponds to a particular project where MOSK clusters are located. A subnet can be used for any cluster located in the same project.

  • cluster - CR uses the namespace where the referenced cluster is located. A subnet is only accessible to the cluster that L2Template.spec.clusterRef refers to. The Subnet objects with the cluster scope will be created for every new cluster.

You can have subnets with the same name in different projects. In this case, the subnet that has the same project as the cluster will be used. One L2 template often references several subnets; in that case, the subnets may have different scopes.

The IP address objects (IPaddr CR) that are allocated from subnets always have the same project as their corresponding IpamHost objects, regardless of the subnet scope.

To create subnets for a cluster:

  1. Log in to a local machine where your management cluster kubeconfig is located and where kubectl is installed.

    Note

    The management cluster kubeconfig is created during the last stage of the management cluster bootstrap.

  2. Create the subnet.yaml file with a number of global or namespaced subnets depending on the configuration of your cluster, and apply it to the management cluster:

    kubectl --kubeconfig <pathToManagementClusterKubeconfig> apply -f <SubnetFileName.yaml>
    

    Note

    In the command above and in the steps below, substitute the parameters enclosed in angle brackets with the corresponding values.

    Example of a subnet.yaml file:

    apiVersion: ipam.mirantis.com/v1alpha1
    kind: Subnet
    metadata:
      name: demo
      namespace: demo-namespace
    spec:
      cidr: 10.11.0.0/24
      gateway: 10.11.0.9
      includeRanges:
      - 10.11.0.5-10.11.0.70
      nameservers:
      - 172.18.176.6
    
    Specification fields of the Subnet object

    Parameter

    Description

    cidr (singular)

    A valid IPv4 CIDR, for example, 10.11.0.0/24.

    includeRanges (list)

    A list of IP address ranges within the given CIDR that should be used in the allocation of IPs for nodes (excluding the gateway address). The IPs outside the given ranges will not be used in the allocation. Each element of the list can be either an interval 10.11.0.5-10.11.0.70 or a single address 10.11.0.77. In the example above, the addresses 10.11.0.5-10.11.0.70 (excluding the gateway address 10.11.0.9) will be allocated for nodes. The includeRanges parameter is mutually exclusive with excludeRanges.

    excludeRanges (list)

    A list of IP address ranges within the given CIDR that should not be used in the allocation of IPs for nodes. The IPs within the given CIDR but outside the given ranges will be used in the allocation (excluding gateway address). Each element of the list can be either an interval 10.11.0.5-10.11.0.70 or a single address 10.11.0.77. The excludeRanges parameter is mutually exclusive with includeRanges.

    useWholeCidr (boolean)

    If set to true, the subnet address (10.11.0.0 in the example above) and the broadcast address (10.11.0.255 in the example above) are included in the address allocation for nodes. Otherwise (false by default), the subnet address and the broadcast address are excluded from the address allocation.

    gateway (singular)

    A valid gateway address, for example, 10.11.0.9.

    nameservers (list)

    A list of the IP addresses of name servers. Each element of the list is a single address, for example, 172.18.176.6.

    Caution

    The subnet for the PXE network is automatically created during deployment and must contain the ipam/DefaultSubnet: "1" label. Each bare metal region must have only one subnet with this label.

    Caution

    You may use different subnets to allocate IP addresses to different Mirantis Container Cloud components in your cluster. See below for the detailed list of available options. Each subnet that is used to configure a Container Cloud service must be labeled with a special service label that starts with the ipam/SVC- prefix. Make sure that no subnet has more than one such label.

  3. Optional. Add a subnet for the MetalLB service in your cluster. To designate a Subnet as a MetalLB address pool, use the ipam/SVC-MetalLB label key and set its value to "1". Set the cluster.sigs.k8s.io/cluster-name label to the name of the cluster where the subnet is used. You may create multiple subnets with the ipam/SVC-MetalLB label to define multiple IP address ranges for MetalLB in the cluster.

    Caution

    The IP addresses of the MetalLB address pool are not assigned to the interfaces on hosts. This is a purely virtual subnet. Make sure that it is not included in the L2 template definitions for your cluster.

    Caution

    Intersection of IP address ranges within any single MetalLB address pool is not permitted. Make sure that this requirement is satisfied when configuring MetalLB address pools.

    Since MOSK 22.3, the bare metal provider verifies IP address ranges for intersections. If an intersection is identified, the MetalLB configuration is blocked and the provider logs contain the corresponding error messages.

    Note

    • When MetalLB address ranges are defined in both cluster specification and specific Subnet objects, the resulting MetalLB address pools configuration will contain address ranges from both cluster specification and Subnet objects.

    • All address ranges for L2 address pools that are defined in both cluster specification and Subnet objects are aggregated into a single L2 address pool and sorted as strings.

  4. Optional. Technology Preview. Add a subnet for the externally accessible API endpoint of the MOSK cluster.

    • Make sure that loadBalancerHost is set to "" (empty string) in the Cluster spec.

      spec:
        providerSpec:
          value:
            apiVersion: baremetal.k8s.io/v1alpha1
            kind: BaremetalClusterProviderSpec
            ...
            loadBalancerHost: ""
      
    • Create a subnet with the ipam/SVC-LBhost label having the "1" value to make the baremetal-provider use this subnet for allocation of addresses for cluster API endpoints.

    One IP address will be allocated for each cluster to serve its Kubernetes/MKE API endpoint.

    Caution

    Make sure that master nodes have host local-link addresses in the same subnet as the cluster API endpoint address. These host IP addresses will be used for VRRP traffic. The cluster API endpoint address will be assigned to the same interface on one of the master nodes where these host IPs are assigned.

    Note

    Mirantis highly recommends assigning the cluster API endpoint address from the LCM network. For details on cluster network types, refer to MOSK cluster networking. See also the Single MOSK cluster use case example in the following table.

    You can use several options of addresses allocation scope of API endpoints using subnets:

    Use case

    Example configuration

    Several MOSK clusters in a region

    Create a subnet in the default namespace with no reference to any cluster.

    apiVersion: ipam.mirantis.com/v1alpha1
    kind: Subnet
    metadata:
      name: lbhost-per-region
      namespace: default
      labels:
        kaas.mirantis.com/provider: baremetal
        kaas.mirantis.com/region: region-one
        ipam/SVC-LBhost: "1"
    spec:
      cidr: 191.11.0.0/24
      includeRanges:
      - 191.11.0.6-191.11.0.20
    

    Several MOSK clusters in a project

    Create a subnet in a namespace corresponding to your project with no reference to any cluster. Such a subnet has priority over the one described above.

    apiVersion: ipam.mirantis.com/v1alpha1
    kind: Subnet
    metadata:
      name: lbhost-per-namespace
      namespace: my-project
      labels:
        kaas.mirantis.com/provider: baremetal
        kaas.mirantis.com/region: region-one
        ipam/SVC-LBhost: "1"
    spec:
      cidr: 191.11.0.0/24
      includeRanges:
      - 191.11.0.6-191.11.0.20
    

    Single MOSK cluster

    Create a subnet in a namespace corresponding to your project with a reference to the target cluster using the cluster.sigs.k8s.io/cluster-name label. Such a subnet has priority over the ones described above. In this case, you do not have to use a dedicated subnet for the allocation of API endpoint addresses. You can add the ipam/SVC-LBhost label to the LCM subnet, and one of the addresses from this subnet will be allocated for the API endpoint:

    apiVersion: ipam.mirantis.com/v1alpha1
    kind: Subnet
    metadata:
      name: lbhost-per-namespace
      namespace: my-project
      labels:
        kaas.mirantis.com/provider: baremetal
        kaas.mirantis.com/region: region-one
        ipam/SVC-LBhost: "1"
        ipam/SVC-k8s-lcm: "1"
        cluster.sigs.k8s.io/cluster-name: my-cluster
    spec:
      cidr: 10.11.0.0/24
      includeRanges:
      - 10.11.0.6-10.11.0.50
    

    The above options can be used in conjunction. For example, you can define a subnet for a region, a number of subnets within this region defined for particular namespaces, and a number of subnets within the same region and namespaces defined for particular clusters.

  5. Optional. Add a subnet(s) for the storage access network.

    • Set the ipam/SVC-ceph-public label with the value "1" to create a subnet that will be used to configure the Ceph public network.

    • Ceph will automatically use this subnet for its external connections.

    • A Ceph OSD will look for and bind to an address from this subnet when it is started on a machine.

    • Use this subnet in the L2 template for storage nodes.

    • Assign this subnet to the interface connected to your storage access network.

    When using this label, set the cluster.sigs.k8s.io/cluster-name label to the name of the target cluster during the subnet creation.

  6. Optional. Add a subnet(s) for the storage replication network.

    • Set the ipam/SVC-ceph-cluster label with the value "1" to create a subnet that will be used to configure the Ceph replication network.

    • Ceph will automatically use this subnet for its internal replication traffic.

    • Use this subnet in the L2 template for storage nodes.

    When using this label, set the cluster.sigs.k8s.io/cluster-name label to the name of the target cluster during the subnet creation.

  7. Optional. Add a subnet for Kubernetes pods traffic.

    Caution

    Use of a dedicated network for Kubernetes pods traffic, for external connection to the Kubernetes services exposed by the cluster, and for the Ceph cluster access and replication traffic is available as Technology Preview. Use such configurations for testing and evaluation purposes only. For the Technology Preview feature definition, refer to Technology Preview features.

    The following feature is still under development and will be announced in one of the following Container Cloud releases:

    • Switching Kubernetes API to listen to the specified IP address on the node

  8. Verify that the subnet is successfully created:

    kubectl get subnet kaas-mgmt -oyaml
    

    In the system output, verify the status fields of the Subnet object using the table below.

    Status fields of the Subnet object

    Parameter

    Description

    statusMessage

    Contains a short state description and a more detailed one if applicable. The short status values are as follows:

    • OK - operational.

    • ERR - non-operational. This status has a detailed description, for example, ERR: Wrong includeRange for CIDR….

    cidr

    Reflects the actual CIDR, has the same meaning as spec.cidr.

    gateway

    Reflects the actual gateway, has the same meaning as spec.gateway.

    nameservers

    Reflects the actual name servers, has the same meaning as spec.nameservers.

    ranges

    Specifies the address ranges that are calculated using the fields from spec: cidr, includeRanges, excludeRanges, gateway, useWholeCidr. These ranges are directly used for nodes IP allocation.

    lastUpdate

    Includes the date and time of the latest update of the Subnet RC.

    allocatable

    Includes the number of currently available IP addresses that can be allocated for nodes from the subnet.

    allocatedIPs

    Specifies the list of IPv4 addresses with the corresponding IPaddr object IDs that were already allocated from the subnet.

    capacity

    Contains the total number of IP addresses held by ranges, which equals the sum of the allocatable and allocatedIPs parameter values.

    versionIpam

    Contains the version of the kaas-ipam component that made the latest changes to the Subnet RC.

    Example of a successfully created subnet:

    apiVersion: ipam.mirantis.com/v1alpha1
    kind: Subnet
    metadata:
      labels:
        ipam/UID: 6039758f-23ee-40ba-8c0f-61c01b0ac863
        kaas.mirantis.com/provider: baremetal
        kaas.mirantis.com/region: region-one
      name: kaas-mgmt
      namespace: default
    spec:
      cidr: 10.0.0.0/24
      excludeRanges:
      - 10.0.0.100
      - 10.0.0.101-10.0.0.120
      gateway: 10.0.0.1
      includeRanges:
      - 10.0.0.50-10.0.0.90
      nameservers:
      - 172.18.176.6
    status:
      allocatable: 38
      allocatedIPs:
      - 10.0.0.50:0b50774f-ffed-11ea-84c7-0242c0a85b02
      - 10.0.0.51:1422e651-ffed-11ea-84c7-0242c0a85b02
      - 10.0.0.52:1d19912c-ffed-11ea-84c7-0242c0a85b02
      capacity: 41
      cidr: 10.0.0.0/24
      gateway: 10.0.0.1
      lastUpdate: "2020-09-26T11:40:44Z"
      nameservers:
      - 172.18.176.6
      ranges:
      - 10.0.0.50-10.0.0.90
      statusMessage: OK
      versionIpam: v3.0.999-20200807-130909-44151f8
    

Now, proceed with creating subnets for your MOSK cluster as described in Create subnets for a MOSK cluster.

Create subnets for a MOSK cluster

According to the MOSK reference architecture, you should create the following subnets.

lcm-nw

The LCM network of the MOSK cluster. Example of lcm-nw:

apiVersion: ipam.mirantis.com/v1alpha1
kind: Subnet
metadata:
  labels:
    kaas.mirantis.com/provider: baremetal
    kaas.mirantis.com/region: region-one
    kaas-mgmt-subnet: ""
  name: lcm-nw
  namespace: <MOSKClusterNamespace>
spec:
  cidr: 172.16.43.0/24
  gateway: 172.16.43.1
  includeRanges:
  - 172.16.43.10-172.16.43.100
k8s-ext-subnet

The addresses from this subnet are assigned to interfaces connected to the external network.

Example of k8s-ext-subnet:

apiVersion: ipam.mirantis.com/v1alpha1
kind: Subnet
metadata:
  labels:
    kaas.mirantis.com/provider: baremetal
    kaas.mirantis.com/region: region-one
  name: k8s-ext-subnet
  namespace: <MOSKClusterNamespace>
spec:
  cidr: 172.16.45.0/24
  includeRanges:
  - 172.16.45.10-172.16.45.100
mosk-metallb-subnet

This subnet is not allocated to interfaces, but used as a MetalLB address pool to expose MOSK API endpoints as Kubernetes cluster services.

Example of mosk-metallb-subnet:

apiVersion: ipam.mirantis.com/v1alpha1
kind: Subnet
metadata:
  labels:
    kaas.mirantis.com/provider: baremetal
    kaas.mirantis.com/region: region-one
    ipam/SVC-metallb: "1"
  name: mosk-metallb-subnet
  namespace: <MOSKClusterNamespace>
spec:
  cidr: 172.16.45.0/24
  includeRanges:
  - 172.16.45.101-172.16.45.200
k8s-pods-subnet

The addresses from this subnet are assigned to interfaces connected to the internal network and used by Calico as an underlay for traffic between the pods in the Kubernetes cluster.

Example of k8s-pods-subnet:

apiVersion: ipam.mirantis.com/v1alpha1
kind: Subnet
metadata:
  labels:
    kaas.mirantis.com/provider: baremetal
    kaas.mirantis.com/region: region-one
  name: k8s-pods-subnet
  namespace: <MOSKClusterNamespace>
spec:
  cidr: 10.12.3.0/24
  includeRanges:
  - 10.12.3.10-10.12.3.100
neutron-tunnel-subnet

The underlay network for VXLAN tunnels for the MOSK tenants traffic. If deployed with Tungsten Fabric, it is used for MPLS over UDP+GRE traffic.

Example of neutron-tunnel-subnet:

apiVersion: ipam.mirantis.com/v1alpha1
kind: Subnet
metadata:
  labels:
    kaas.mirantis.com/provider: baremetal
    kaas.mirantis.com/region: region-one
  name: neutron-tunnel-subnet
  namespace: <MOSKClusterNamespace>
spec:
  cidr: 10.12.2.0/24
  includeRanges:
  - 10.12.2.10-10.12.2.100
ceph-public-subnet

Example of a Ceph cluster access network:

apiVersion: ipam.mirantis.com/v1alpha1
kind: Subnet
metadata:
  labels:
    kaas.mirantis.com/provider: baremetal
    kaas.mirantis.com/region: region-one
    ipam/SVC-ceph-public: "1"
  name: ceph-public-subnet
  namespace: <MOSKClusterNamespace>
spec:
  cidr: 10.12.0.0/24
ceph-cluster-subnet

Example of the Ceph replication traffic network:

apiVersion: ipam.mirantis.com/v1alpha1
kind: Subnet
metadata:
  labels:
    kaas.mirantis.com/provider: baremetal
    kaas.mirantis.com/region: region-one
    ipam/SVC-ceph-cluster: "1"
  name: ceph-cluster-subnet
  namespace: <MOSKClusterNamespace>
spec:
  cidr: 10.12.1.0/24
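After you apply these Subnet objects, list them in the cluster project and inspect each object as described in Create subnets; the statusMessage in the status of every object must be OK. For example:

  kubectl -n <MOSKClusterNamespace> get subnet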

Now, proceed with creating an L2 template for one or multiple managed clusters as described in Create L2 templates.

Create L2 templates

After you create subnets for the MOSK cluster as described in Create subnets, follow the procedure below to create L2 templates for different types of OpenStack nodes in the cluster.

See the following subsections for templates that implement the MOSK Reference Architecture: Networking. You may adjust the templates according to the requirements of your architecture using the last two subsections of this section. They explain mandatory parameters of the templates and supported configuration options.

Create an L2 template for a Kubernetes manager node

Warning

Avoid modifying existing L2 templates and subnets that the deployed machines use. This prevents failures of multiple clusters caused by unsafe changes. The risks posed by modifying L2 templates include:

  • Services running on hosts cannot reconfigure automatically to switch to the new IP addresses and/or interfaces.

  • Connections between services are interrupted unexpectedly, which can cause data loss.

  • Incorrect configurations on hosts can lead to irrevocable loss of connectivity between services and unexpected cluster partition or disassembly.

Note

Starting from MOSK 22.3, modification of L2 templates in use is prohibited in the API to prevent accidental cluster failures due to unsafe changes.

According to the reference architecture, the Kubernetes manager nodes in the MOSK cluster must be connected to the following networks:

  • PXE network

  • LCM network

Caution

If you plan to deploy a MOSK cluster with the compact control plane option, skip this section entirely and proceed with Create an L2 template for a MOSK controller node.

To create an L2 template for Kubernetes manager nodes:

  1. Create or open the mosk-l2templates.yml file that contains the L2 templates you are preparing.

  2. Add an L2 template using the following example. Adjust the values of specific parameters according to the specifications of your environment.

    L2 template example
    apiVersion: ipam.mirantis.com/v1alpha1
    kind: L2Template
    metadata:
      labels:
        kaas.mirantis.com/provider: baremetal
        kaas.mirantis.com/region: region-one
        cluster.sigs.k8s.io/cluster-name: <MOSKClusterName>
      name: k8s-manager
      namespace: <MOSKClusterNamespace>
    spec:
      autoIfMappingPrio:
      - provision
      - eno
      - ens
      - enp
      clusterRef: <MOSKClusterName>
      l3Layout:
      - subnetName: lcm-nw
        scope: global
        labelSelector:
          kaas.mirantis.com/provider: baremetal
          kaas-mgmt-subnet: ""
      npTemplate: |-
        version: 2
        ethernets:
          {{nic 0}}:
            dhcp4: false
            dhcp6: false
            match:
              macaddress: {{mac 0}}
            set-name: {{nic 0}}
            mtu: 9000
          {{nic 1}}:
            dhcp4: false
            dhcp6: false
            match:
              macaddress: {{mac 1}}
            set-name: {{nic 1}}
            mtu: 9000
          {{nic 2}}:
            dhcp4: false
            dhcp6: false
            match:
              macaddress: {{mac 2}}
            set-name: {{nic 2}}
            mtu: 9000
          {{nic 3}}:
            dhcp4: false
            dhcp6: false
            match:
              macaddress: {{mac 3}}
            set-name: {{nic 3}}
            mtu: 9000
        bonds:
          bond0:
            mtu: 9000
            parameters:
              mode: 802.3ad
            interfaces:
            - {{nic 0}}
            - {{nic 1}}
        vlans:
          k8s-lcm-v:
            id: 403
            link: bond0
            mtu: 9000
          k8s-ext-v:
            id: 409
            link: bond0
            mtu: 9000
          k8s-pods-v:
            id: 408
            link: bond0
            mtu: 9000
        bridges:
          k8s-lcm:
            interfaces: [k8s-lcm-v]
            nameservers:
              addresses: {{nameservers_from_subnet "lcm-nw"}}
            gateway4: {{ gateway_from_subnet "lcm-nw" }}
            addresses:
            - {{ ip "0:lcm-nw" }}
          k8s-ext:
            interfaces: [k8s-ext-v]
            addresses:
            - {{ip "k8s-ext:k8s-ext-subnet"}}
            mtu: 9000
          k8s-pods:
            interfaces: [k8s-pods-v]
            addresses:
            - {{ip "k8s-pods:k8s-pods-subnet"}}
            mtu: 9000
    
  3. Proceed with Create an L2 template for a MOSK controller node. The resulting L2 template will be used to render the netplan configuration for the managed cluster machines.
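Once all required L2 templates are added to the mosk-l2templates.yml file, apply it to the management cluster. This is a sketch that uses the same kubeconfig placeholder as in the previous procedures:

  kubectl --kubeconfig <pathToManagementClusterKubeconfig> apply -f mosk-l2templates.yml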

Create an L2 template for a MOSK controller node

Warning

Avoid modifying existing L2 templates and subnets that the deployed machines use. This prevents failures of multiple clusters caused by unsafe changes. The risks posed by modifying L2 templates include:

  • Services running on hosts cannot reconfigure automatically to switch to the new IP addresses and/or interfaces.

  • Connections between services are interrupted unexpectedly, which can cause data loss.

  • Incorrect configurations on hosts can lead to irrevocable loss of connectivity between services and unexpected cluster partition or disassembly.

Note

Starting from MOSK 22.3, modification of L2 templates in use is prohibited in the API to prevent accidental cluster failures due to unsafe changes.

According to the reference architecture, MOSK controller nodes must be connected to the following networks:

  • PXE network

  • LCM network

  • Kubernetes workloads network

  • Storage access network

  • Floating IP and provider networks. Not required for deployment with Tungsten Fabric.

  • Tenant underlay networks. If deploying with VXLAN networking or with Tungsten Fabric. In the latter case, the BGP service is configured over this network.

To create an L2 template for MOSK controller nodes:

  1. Create or open the mosk-l2template.yml file that contains the L2 templates.

  2. Add an L2 template using the following example. Adjust the values of specific parameters according to the specification of your environment.

    Example of an L2 template for MOSK controller nodes
    apiVersion: ipam.mirantis.com/v1alpha1
    kind: L2Template
    metadata:
      labels:
        kaas.mirantis.com/provider: baremetal
        kaas.mirantis.com/region: region-one
        cluster.sigs.k8s.io/cluster-name: <MOSKClusterName>
      name: mosk-controller
      namespace: <MOSKClusterNamespace>
    spec:
      autoIfMappingPrio:
      - provision
      - eno
      - ens
      - enp
      clusterRef: <MOSKClusterName>
      l3Layout:
      - subnetName: lcm-nw
        scope: global
        labelSelector:
          kaas.mirantis.com/provider: baremetal
          kaas-mgmt-subnet: ""
      - subnetName: k8s-ext-subnet
        scope: namespace
      - subnetName: k8s-pods-subnet
        scope: namespace
      - subnetName: ceph-cluster-subnet
        scope: namespace
      - subnetName: ceph-public-subnet
        scope: namespace
      - subnetName: neutron-tunnel-subnet
        scope: namespace
      npTemplate: |-
        version: 2
        ethernets:
          {{nic 0}}:
            dhcp4: false
            dhcp6: false
            match:
              macaddress: {{mac 0}}
            set-name: {{nic 0}}
            mtu: 9000
          {{nic 1}}:
            dhcp4: false
            dhcp6: false
            match:
              macaddress: {{mac 1}}
            set-name: {{nic 1}}
            mtu: 9000
          {{nic 2}}:
            dhcp4: false
            dhcp6: false
            match:
              macaddress: {{mac 2}}
            set-name: {{nic 2}}
            mtu: 9000
          {{nic 3}}:
            dhcp4: false
            dhcp6: false
            match:
              macaddress: {{mac 3}}
            set-name: {{nic 3}}
            mtu: 9000
        bonds:
          bond0:
            mtu: 9000
            parameters:
              mode: 802.3ad
            interfaces:
            - {{nic 0}}
            - {{nic 1}}
          bond1:
            mtu: 9000
            parameters:
              mode: 802.3ad
            interfaces:
            - {{nic 2}}
            - {{nic 3}}
        vlans:
          k8s-lcm-v:
            id: 403
            link: bond0
            mtu: 9000
          k8s-ext-v:
            id: 409
            link: bond0
            mtu: 9000
          k8s-pods-v:
            id: 408
            link: bond0
            mtu: 9000
          pr-floating:
            id: 407
            link: bond1
            mtu: 9000
          stor-frontend:
            id: 404
            link: bond0
            mtu: 9000
          stor-backend:
            id: 405
            link: bond1
            mtu: 9000
          neutron-tunnel:
            id: 406
            link: bond1
            addresses:
            - {{ip "neutron-tunnel:neutron-tunnel-subnet"}}
            mtu: 9000
        bridges:
          k8s-lcm:
            interfaces: [k8s-lcm-v]
            nameservers:
              addresses: {{nameservers_from_subnet "lcm-nw"}}
            gateway4: {{ gateway_from_subnet "lcm-nw" }}
            addresses:
            - {{ ip "0:lcm-nw" }}
          k8s-ext:
            interfaces: [k8s-ext-v]
            addresses:
            - {{ip "k8s-ext:k8s-ext-subnet"}}
            mtu: 9000
          k8s-pods:
            interfaces: [k8s-pods-v]
            addresses:
            - {{ip "k8s-pods:k8s-pods-subnet"}}
            mtu: 9000
          ceph-public:
            interfaces: [stor-frontend]
            addresses:
            - {{ip "ceph-public:ceph-public-subnet"}}
            mtu: 9000
          ceph-cluster:
            interfaces: [stor-backend]
            addresses:
            - {{ip "ceph-cluster:ceph-cluster-subnet"}}
            mtu: 9000
    
  3. Proceed with Create an L2 template for a MOSK compute node.

Create an L2 template for a MOSK compute node

Warning

Avoid modifying existing L2 templates and subnets that the deployed machines use. This prevents failures of multiple clusters caused by unsafe changes. The list of risks posed by modifying L2 templates includes:

  • Services running on hosts cannot reconfigure automatically to switch to the new IP addresses and/or interfaces.

  • Connections between services are interrupted unexpectedly, which can cause data loss.

  • Incorrect configurations on hosts can lead to irrevocable loss of connectivity between services and unexpected cluster partition or disassembly.

Note

Starting from MOSK 22.3, modification of L2 templates in use is prohibited in the API to prevent accidental cluster failures due to unsafe changes.

According to the reference architecture, MOSK compute nodes must be connected to the following networks:

  • PXE network

  • LCM network

  • Storage public network (if deploying with Ceph as a back-end for ephemeral storage)

  • Floating IP and provider networks (if deploying OpenStack with DVR)

  • Tenant underlay networks

To create an L2 template for MOSK compute nodes:

  1. Add an L2 template to the mosk-l2templates.yml file using the following example. Adjust the values of parameters according to the specification of your environment.

    Example of an L2 template for MOSK compute nodes
    apiVersion: ipam.mirantis.com/v1alpha1
    kind: L2Template
    metadata:
      labels:
        kaas.mirantis.com/provider: baremetal
        kaas.mirantis.com/region: region-one
        cluster.sigs.k8s.io/cluster-name: <MOSKClusterName>
      name: mosk-compute
      namespace: <MOSKClusterNamespace>
    spec:
      autoIfMappingPrio:
      - provision
      - eno
      - ens
      - enp
      clusterRef: <MOSKClusterName>
      l3Layout:
      - subnetName: lcm-nw
        scope: global
        labelSelector:
          kaas.mirantis.com/provider: baremetal
          kaas-mgmt-subnet: ""
      - subnetName: k8s-ext-subnet
        scope: namespace
      - subnetName: k8s-pods-subnet
        scope: namespace
      - subnetName: ceph-cluster-subnet
        scope: namespace
      - subnetName: neutron-tunnel-subnet
        scope: namespace
      npTemplate: |-
        version: 2
        ethernets:
          {{nic 0}}:
            dhcp4: false
            dhcp6: false
            match:
              macaddress: {{mac 0}}
            set-name: {{nic 0}}
            mtu: 9000
          {{nic 1}}:
            dhcp4: false
            dhcp6: false
            match:
              macaddress: {{mac 1}}
            set-name: {{nic 1}}
            mtu: 9000
          {{nic 2}}:
            dhcp4: false
            dhcp6: false
            match:
              macaddress: {{mac 2}}
            set-name: {{nic 2}}
            mtu: 9000
          {{nic 3}}:
            dhcp4: false
            dhcp6: false
            match:
              macaddress: {{mac 3}}
            set-name: {{nic 3}}
            mtu: 9000
        bonds:
          bond0:
            mtu: 9000
            parameters:
              mode: 802.3ad
            interfaces:
            - {{nic 0}}
            - {{nic 1}}
          bond1:
            mtu: 9000
            parameters:
              mode: 802.3ad
            interfaces:
            - {{nic 2}}
            - {{nic 3}}
        vlans:
          k8s-lcm-v:
            id: 403
            link: bond0
            mtu: 9000
          k8s-ext-v:
            id: 409
            link: bond0
            mtu: 9000
          k8s-pods-v:
            id: 408
            link: bond0
            mtu: 9000
          pr-floating:
            id: 407
            link: bond1
            mtu: 9000
          stor-frontend:
            id: 404
            link: bond0
            mtu: 9000
          stor-backend:
            id: 405
            link: bond1
            mtu: 9000
          neutron-tunnel:
            id: 406
            link: bond1
            addresses:
            - {{ip "neutron-tunnel:neutron-tunnel-subnet"}}
            mtu: 9000
        bridges:
          k8s-lcm:
            interfaces: [k8s-lcm-v]
            nameservers:
              addresses: {{nameservers_from_subnet "lcm-nw"}}
            gateway4: {{ gateway_from_subnet "lcm-nw" }}
            addresses:
            - {{ ip "0:lcm-nw" }}
          k8s-ext:
            interfaces: [k8s-ext-v]
            addresses:
            - {{ip "k8s-ext:k8s-ext-subnet"}}
            mtu: 9000
          k8s-pods:
            interfaces: [k8s-pods-v]
            addresses:
            - {{ip "k8s-pods:k8s-pods-subnet"}}
            mtu: 9000
          ceph-public:
            interfaces: [stor-frontend]
            addresses:
            - {{ip "ceph-public:ceph-public-subnet"}}
            mtu: 9000
          ceph-cluster:
            interfaces: [stor-backend]
            addresses:
            - {{ip "ceph-cluster:ceph-cluster-subnet"}}
            mtu: 9000
    
  2. Proceed with Create an L2 template for a MOSK storage node.

Create an L2 template for a MOSK storage node

Warning

Avoid modifying existing L2 templates and subnets that the deployed machines use. This prevents failures of multiple clusters caused by unsafe changes. The list of risks posed by modifying L2 templates includes:

  • Services running on hosts cannot reconfigure automatically to switch to the new IP addresses and/or interfaces.

  • Connections between services are interrupted unexpectedly, which can cause data loss.

  • Incorrect configurations on hosts can lead to irrevocable loss of connectivity between services and unexpected cluster partition or disassembly.

Note

Starting from MOSK 22.3, modification of L2 templates in use is prohibited in the API to prevent accidental cluster failures due to unsafe changes.

According to the reference architecture, MOSK storage nodes in the MOSK cluster must be connected to the following networks:

  • PXE network

  • LCM network

  • Storage access network

  • Storage replication network

To create an L2 template for MOSK storage nodes:

  1. Add an L2 template to the mosk-l2templates.yml file using the following example. Adjust the values of parameters according to the specification of your environment.

    Example of an L2 template for MOSK storage nodes
    apiVersion: ipam.mirantis.com/v1alpha1
    kind: L2Template
    metadata:
      labels:
        kaas.mirantis.com/provider: baremetal
        kaas.mirantis.com/region: region-one
        cluster.sigs.k8s.io/cluster-name: <MOSKClusterName>
      name: mosk-storage
      namespace: <MOSKClusterNamespace>
    spec:
      autoIfMappingPrio:
      - provision
      - eno
      - ens
      - enp
      clusterRef: <MOSKClusterName>
      l3Layout:
      - subnetName: lcm-nw
        scope: global
        labelSelector:
          kaas.mirantis.com/provider: baremetal
          kaas-mgmt-subnet: ""
      - subnetName: k8s-ext-subnet
        scope: namespace
      - subnetName: k8s-pods-subnet
        scope: namespace
      - subnetName: ceph-cluster-subnet
        scope: namespace
      - subnetName: ceph-public-subnet
        scope: namespace
      npTemplate: |-
        version: 2
        ethernets:
          {{nic 0}}:
            dhcp4: false
            dhcp6: false
            match:
              macaddress: {{mac 0}}
            set-name: {{nic 0}}
            mtu: 9000
          {{nic 1}}:
            dhcp4: false
            dhcp6: false
            match:
              macaddress: {{mac 1}}
            set-name: {{nic 1}}
            mtu: 9000
          {{nic 2}}:
            dhcp4: false
            dhcp6: false
            match:
              macaddress: {{mac 2}}
            set-name: {{nic 2}}
            mtu: 9000
          {{nic 3}}:
            dhcp4: false
            dhcp6: false
            match:
              macaddress: {{mac 3}}
            set-name: {{nic 3}}
            mtu: 9000
        bonds:
          bond0:
            mtu: 9000
            parameters:
              mode: 802.3ad
            interfaces:
            - {{nic 0}}
            - {{nic 1}}
          bond1:
            mtu: 9000
            parameters:
              mode: 802.3ad
            interfaces:
            - {{nic 2}}
            - {{nic 3}}
        vlans:
          k8s-lcm-v:
            id: 403
            link: bond0
            mtu: 9000
          k8s-ext-v:
            id: 409
            link: bond0
            mtu: 9000
          k8s-pods-v:
            id: 408
            link: bond0
            mtu: 9000
          stor-frontend:
            id: 404
            link: bond0
            mtu: 9000
          stor-backend:
            id: 405
            link: bond1
            mtu: 9000
        bridges:
          k8s-lcm:
            interfaces: [k8s-lcm-v]
            nameservers:
              addresses: {{nameservers_from_subnet "lcm-nw"}}
            gateway4: {{ gateway_from_subnet "lcm-nw" }}
            addresses:
            - {{ ip "0:lcm-nw" }}
          k8s-ext:
            interfaces: [k8s-ext-v]
            addresses:
            - {{ip "k8s-ext:k8s-ext-subnet"}}
            mtu: 9000
          k8s-pods:
            interfaces: [k8s-pods-v]
            addresses:
            - {{ip "k8s-pods:k8s-pods-subnet"}}
            mtu: 9000
          ceph-public:
            interfaces: [stor-frontend]
            addresses:
            - {{ip "ceph-public:ceph-public-subnet"}}
            mtu: 9000
          ceph-cluster:
            interfaces: [stor-backend]
            addresses:
            - {{ip "ceph-cluster:ceph-cluster-subnet"}}
            mtu: 9000
    
  2. Proceed with Edit and apply L2 templates.

Edit and apply L2 templates

To add L2 templates to a MOSK cluster:

  1. Log in to a local machine where your management cluster kubeconfig is located and where kubectl is installed.

    Note

    The management cluster kubeconfig is created during the last stage of the management cluster bootstrap.

  2. Add the L2 template to your management cluster:

    kubectl --kubeconfig <pathToManagementClusterKubeconfig> apply -f <pathToL2TemplateYamlFile>
    
  3. Inspect the existing L2 templates to see if any of them fits your deployment:

    kubectl --kubeconfig <pathToManagementClusterKubeconfig> \
    get l2template -n <ProjectNameForNewManagedCluster>
    
  4. Optional. Further modify the template if required or to correct a configuration mistake. See Mandatory parameters of L2 templates and Netplan template macros for details.

    kubectl --kubeconfig <pathToManagementClusterKubeconfig> \
    -n <ProjectNameForNewManagedCluster> edit l2template <L2templateName>
    
Mandatory parameters of L2 templates

Think of an L2 template as a template for networking configuration for your hosts. You may adjust the parameters according to the actual requirements and hardware setup of your hosts.

L2 template mandatory parameters

Parameter

Description

clusterRef

References the Cluster object that this template is applied to. The default value is used to apply the given template to all clusters within a particular project, unless an L2 template that references a specific cluster name exists.

Caution

  • An L2 template must have the same namespace as the referenced cluster.

  • A cluster can be associated with many L2 templates. Only one of them can have the ipam/DefaultForCluster label. Every L2 template that does not have the ipam/DefaultForCluster label can be later assigned to a particular machine using l2TemplateSelector.

  • A project (Kubernetes namespace) can have only one default L2 template (L2Template with Spec.clusterRef: default).

ifMapping or autoIfMappingPrio

  • ifMapping is a list of interface names for the template. The interface mapping is defined globally for all bare metal hosts in the cluster but can be overridden at the host level, if required, by editing the IpamHost object for a particular host. The ifMapping parameter is mutually exclusive with autoIfMappingPrio.

  • autoIfMappingPrio is a list of prefixes, such as eno, ens, and so on, to match the interfaces to automatically create a list for the template. If you are not aware of any specific ordering of interfaces on the nodes, use the default ordering from the Predictable Network Interface Names specification for systemd.

    You can also override the default NIC list per host using the IfMappingOverride parameter of the corresponding IpamHost. The provision value corresponds to the network interface that was used to provision a node. Usually, it is the first NIC found on a particular node. It is defined explicitly to ensure that this interface will not be reconfigured accidentally.

    The autoIfMappingPrio parameter is mutually exclusive with ifMapping.

l3Layout

Subnets to be used in the npTemplate section. The field contains a list of subnet definitions with parameters used by template macros.

  • subnetName

    Defines the alias name of the subnet that can be used to reference this subnet from the template macros. This parameter is mandatory for every entry in the l3Layout list.

  • subnetPool

    Optional. Default: none. Defines a name of the parent SubnetPool object that will be used to create a Subnet object with a given subnetName and scope.

    If a corresponding Subnet object already exists, nothing will be created and the existing object will be used. If no SubnetPool is provided, no new Subnet object will be created.

  • scope

    Logical scope of the Subnet object with a corresponding subnetName. Possible values:

    • global - the Subnet object is accessible globally, for any Container Cloud project and cluster in the region, for example, the PXE subnet.

    • namespace - the Subnet object is accessible within the same project and region where the L2 template is defined.

    • cluster - the Subnet object is only accessible to the cluster that L2Template.spec.clusterRef refers to. The Subnet objects with the cluster scope will be created for every new cluster.

  • labelSelector

    Contains a dictionary of labels and their respective values that will be used to find the matching Subnet object for the subnet. If the labelSelector field is omitted, the Subnet object will be selected by name, specified by the subnetName parameter.

Caution

The l3Layout section is mandatory for each L2Template custom resource.

Caution

Using the l3Layout section, define all subnets that are used in the npTemplate section. Defining only a part of the subnets is not allowed.

If labelSelector is used in l3Layout, use any custom label name that differs from system names. This allows for easier cluster scaling in case of adding new subnets as described in Expand IP addresses capacity in an existing cluster.

Mirantis recommends using a unique label prefix such as user-defined/.

npTemplate

A netplan-compatible configuration with special lookup functions that defines the networking settings for the cluster hosts, where physical NIC names and details are parameterized. This configuration will be processed using Go templates. Instead of specifying IP and MAC addresses, interface names, and other network details specific to a particular host, the template supports the use of special lookup functions. These lookup functions, such as nic, mac, ip, and so on, return host-specific network information when the template is rendered for a particular host. For details about netplan, see the official netplan documentation.

Caution

All rules and restrictions of the netplan configuration also apply to L2 templates. For details, see the official netplan documentation.

Caution

We strongly recommend following the below conventions on network interface naming:

  • A physical NIC name set by an L2 template must not exceed 15 symbols. Otherwise, an L2 template creation fails. This limit is set by the Linux kernel.

  • Names of virtual network interfaces such as VLANs, bridges, bonds, veth, and so on must not exceed 15 symbols.

We recommend setting interface names that do not exceed 13 symbols for both physical and virtual interfaces to avoid corner cases and issues in netplan rendering.

l3Layout section parameters

Parameter

Description

subnetName

Name of the reference to the subnet that will be used in the npTemplate section. This name may differ from the name of the actual Subnet resource if the labelSelector field is present and uniquely identifies the resource.

labelSelector

A dictionary of the labels and values that are used to filter out and find the Subnet resource to refer to from the template by the subnetName.

subnetPool

Optional. Default: none. Name of the parent SubnetPool object that will be used to create a Subnet object with a given subnetName and scope. If a corresponding Subnet object already exists, nothing will be created and the existing object will be used. If no SubnetPool is provided, no new Subnet object will be created.

scope

Logical scope of the Subnet object with a corresponding subnetName. Possible values:

  • global - the Subnet object is accessible globally, for any Container Cloud project and cluster in the region, for example, the PXE subnet.

  • namespace - the Subnet object is accessible within the same project and region where the L2 template is defined.

Netplan template macros

The following table describes the main lookup functions, or macros, that can be used in the npTemplate field of an L2 template.

Lookup function

Description

{{nic N}}

Name of a NIC number N. NIC numbers correspond to the interface mapping list. This macro can be used as a key for the elements of the ethernets map, or as the value of the name and set-name parameters of a NIC. It is also used to reference the physical NIC from definitions of virtual interfaces (vlan, bridge).

{{mac N}}

MAC address of a NIC number N registered during a host hardware inspection.

{{ip "N:subnet-a"}}

IP address and mask for a NIC number N. The address will be allocated automatically from the given subnet, unless an IP address for that interface already exists. The interface is identified by its MAC address.

{{ip "br0:subnet-x"}}

IP address and mask for a virtual interface, "br0" in this example. The address will be auto-allocated from the given subnet if the address does not exist yet.

{{gateway_from_subnet "subnet-a"}}

IPv4 default gateway address from the given subnet.

{{nameservers_from_subnet "subnet-a"}}

List of the IP addresses of name servers from the given subnet.
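
For illustration, the following npTemplate fragment combines several of the macros above. The rendered values shown in the comments are hypothetical and assume a host whose first NIC is named enp0s1:

ethernets:
  {{nic 0}}:                  # renders to the host NIC name, for example, enp0s1
    dhcp4: false
    dhcp6: false
    match:
      macaddress: {{mac 0}}   # MAC address registered during hardware inspection
    set-name: {{nic 0}}
bridges:
  k8s-lcm:
    interfaces: [{{nic 0}}]
    addresses:
    - {{ip "0:lcm-nw"}}       # IP address and mask auto-allocated from the lcm-nw subnet
    gateway4: {{gateway_from_subnet "lcm-nw"}}
    nameservers:
      addresses: {{nameservers_from_subnet "lcm-nw"}}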

L2 template example with bonds and bridges

This section contains an exemplary L2 template that demonstrates how to set up bonds and bridges on hosts for your managed clusters.


Dedicated network for the Kubernetes pods traffic

If you want to use a dedicated network for Kubernetes pods traffic, configure each node with an IPv4 address that will be used to route the pods traffic between nodes. To accomplish that, use the npTemplate.bridges.k8s-pods bridge in the L2 template, as demonstrated in the example below. As defined in Container Cloud Reference Architecture: Host networking, this bridge name is reserved for the Kubernetes pods network. When the k8s-pods bridge is defined in an L2 template, Calico CNI uses that network for routing the pods traffic between nodes.

Dedicated network for the Kubernetes services traffic (MetalLB)

You can use a dedicated network for external connection to the Kubernetes services exposed by the cluster. If enabled, MetalLB will listen and respond on the dedicated virtual bridge. To accomplish that, configure each node where metallb-speaker is deployed with an IPv4 address. Both the MetalLB IP address ranges and the IP addresses configured on those nodes must fit in the same CIDR.

Use the npTemplate.bridges.k8s-ext bridge in the L2 template, as demonstrated in the example below. This bridge name is reserved for the Kubernetes external network. The Subnet object that corresponds to the k8s-ext bridge must have explicitly excluded IP address ranges that are in use by MetalLB.
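
The following sketch illustrates such a Subnet object for the k8s-ext-subnet referenced in the L2 templates above. The CIDR and range values are placeholders, and the excludeRanges field name is an assumption to verify against the Subnet resource description in Create subnets:

apiVersion: ipam.mirantis.com/v1alpha1
kind: Subnet
metadata:
  name: k8s-ext-subnet
  namespace: <MOSKClusterNamespace>
  labels:
    kaas.mirantis.com/provider: baremetal
    kaas.mirantis.com/region: region-one
    cluster.sigs.k8s.io/cluster-name: <MOSKClusterName>
spec:
  cidr: 10.13.0.0/24            # placeholder CIDR shared with the MetalLB address ranges
  excludeRanges:
  - 10.13.0.100-10.13.0.120     # placeholder range in use by MetalLB, excluded from node IP allocation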

Dedicated network for the Ceph distributed storage traffic

Starting from Container Cloud 2.7.0, you can configure dedicated networks for the Ceph cluster access and replication traffic. Set labels on the Subnet CRs for the corresponding networks, as described in Create subnets. Container Cloud automatically configures Ceph to use the addresses from these subnets. Ensure that the addresses are assigned to the storage nodes.

Use the npTemplate.bridges.ceph-cluster and npTemplate.bridges.ceph-public bridges in the L2 template, as demonstrated in the example below. These names are reserved for the Ceph cluster access and replication networks.

The Subnet objects used to assign IP addresses to these bridges must have corresponding labels ipam/SVC-ceph-public for the ceph-public bridge and ipam/SVC-ceph-cluster for the ceph-cluster bridge.
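
For example, the Subnet object for the Ceph public network may carry the label as follows. This is a minimal sketch: the label value and the CIDR are assumptions, and the full set of Subnet fields follows the Create subnets procedure:

apiVersion: ipam.mirantis.com/v1alpha1
kind: Subnet
metadata:
  name: ceph-public-subnet
  namespace: <MOSKClusterNamespace>
  labels:
    cluster.sigs.k8s.io/cluster-name: <MOSKClusterName>
    ipam/SVC-ceph-public: "1"   # assumed label value; label the replication subnet with ipam/SVC-ceph-cluster in the same way
spec:
  cidr: 10.12.0.0/24            # placeholder, replace with your storage access network CIDR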

Example of an L2 template with interfaces bonding
apiVersion: ipam.mirantis.com/v1alpha1
kind: L2Template
metadata:
  name: test-managed
  namespace: managed-ns
spec:
  clusterRef: managed-cluster
  autoIfMappingPrio:
    - provision
    - eno
    - ens
    - enp
  npTemplate: |
    version: 2
    ethernets:
      ten10gbe0s0:
        dhcp4: false
        dhcp6: false
        match:
          macaddress: {{mac 2}}
        set-name: {{nic 2}}
      ten10gbe0s1:
        dhcp4: false
        dhcp6: false
        match:
          macaddress: {{mac 3}}
        set-name: {{nic 3}}
    bonds:
      bond0:
        interfaces:
          - ten10gbe0s0
          - ten10gbe0s1
    vlans:
      k8s-ext-vlan:
        id: 1001
        link: bond0
      k8s-pods-vlan:
        id: 1002
        link: bond0
      stor-frontend:
        id: 1003
        link: bond0
      stor-backend:
        id: 1004
        link: bond0
    bridges:
      k8s-ext:
        interfaces: [k8s-ext-vlan]
        addresses:
          - {{ip "k8s-ext:demo-ext"}}
      k8s-pods:
        interfaces: [k8s-pods-vlan]
        addresses:
          - {{ip "k8s-pods:demo-pods"}}
      ceph-cluster:
        interfaces: [stor-backend]
        addresses:
          - {{ip "ceph-cluster:demo-ceph-cluster"}}
      ceph-public:
        interfaces: [stor-frontend]
        addresses:
          - {{ip "ceph-public:demo-ceph-public"}}
Configure the MetalLB speaker node selector

By default, MetalLB speakers are deployed on all Kubernetes nodes except master nodes. You can decrease the number of MetalLB speakers or run them on a particular set of nodes.

To customize the MetalLB speaker node selector:

  1. Using kubeconfig of the Container Cloud management cluster, open the MOSK Cluster object for editing:

    kubectl --kubeconfig <pathToManagementClusterKubeconfig> -n <OSClusterNamespace> edit cluster <OSClusterName>
    
  2. In the spec:providerSpec:value:helmReleases section, add the speaker.nodeSelector field for metallb:

     spec:
       ...
       providerSpec:
         value:
           ...
           helmReleases:
           - name: metallb
             values:
               configInline:
                 ...
               speaker:
                 nodeSelector:
                   metallbSpeakerEnabled: "true"
    

    The metallbSpeakerEnabled: "true" parameter in this example is the label on Kubernetes nodes where MetalLB speakers will be deployed. It can be an already existing node label or a new one.

    Note

    Due to the issue with collocation of MetalLB speaker and the OpenStack Ingress service Pods, the use of the MetalLB speaker node selector is limited. For details, see [24435] MetalLB speaker fails to announce the LB IP for the Ingress service.

    You can add user-defined labels to nodes using the nodeLabels field.

    This field contains the list of node labels to be attached to a node for the user to run certain components on separate cluster nodes. The list of allowed node labels is located in the status.providerStatus.releaseRef.current.allowedNodeLabels field of the Cluster object.

    Starting from MOSK 22.3, if the value field is not defined in allowedNodeLabels, a label can have any value. For example:

    allowedNodeLabels:
    - displayName: Stacklight
      key: stacklight
    

    Before or after a machine deployment, add the required label from the allowed node labels list with the corresponding value to spec.providerSpec.value.nodeLabels in machine.yaml. For example:

    nodeLabels:
    - key: stacklight
      value: enabled
    

    Adding a node label that is not available in the list of allowed node labels is restricted.

Add a machine

This section describes how to add a machine to a managed MOSK cluster using CLI for advanced configuration.

Create a machine using CLI

This section describes adding machines to a new MOSK cluster using Mirantis Container Cloud CLI.

If you need to add more machines to an existing MOSK cluster, see Add a controller node and Add a compute node.

To add a machine to the MOSK cluster:

  1. Log in to the host where your management cluster kubeconfig is located and where kubectl is installed.

  2. Create a new text file mosk-cluster-machines.yaml and add the YAML definitions of the Machine resources to it. Use the following as an example and see the descriptions of the fields below:

    apiVersion: cluster.k8s.io/v1alpha1
    kind: Machine
    metadata:
      name: mosk-node-role-name
      namespace: mosk-project
      labels:
        kaas.mirantis.com/provider: baremetal
        kaas.mirantis.com/region: region-one
        cluster.sigs.k8s.io/cluster-name: mosk-cluster
    spec:
      providerSpec:
        value:
          apiVersion: baremetal.k8s.io/v1alpha1
          kind: BareMetalMachineProviderSpec
          bareMetalHostProfile:
            name: mosk-k8s-mgr
            namespace: mosk-project
          l2TemplateSelector:
            name: mosk-k8s-mgr
          hostSelector: {}
          l2TemplateIfMappingOverride: []
    
  3. Add the top level fields:

    • apiVersion

      API version of the object that is cluster.k8s.io/v1alpha1.

    • kind

      Object type that is Machine.

    • metadata

      This section will contain the metadata of the object.

    • spec

      This section will contain the configuration of the object.

  4. Add mandatory fields to the metadata section of the Machine object definition.

    • name

      The name of the Machine object.

    • namespace

      The name of the Project where the Machine will be created.

    • labels

      This section contains additional metadata of the machine. Set the following mandatory labels for the Machine object.

      • kaas.mirantis.com/provider

        Set to "baremetal".

      • kaas.mirantis.com/region

        Region name that matches the region name in the Cluster object.

      • cluster.sigs.k8s.io/cluster-name

        The name of the cluster to add the machine to.

  5. Configure the mandatory parameters of the Machine object in the spec field. Add the providerSpec field that contains parameters for deployment on bare metal in the form of a Kubernetes subresource.

  6. In the providerSpec section, add the following mandatory configuration parameters:

    • apiVersion

      API version of the subresource that is baremetal.k8s.io/v1alpha1.

    • kind

      Object type that is BareMetalMachineProviderSpec.

    • bareMetalHostProfile

      Reference to a configuration profile of a bare metal host. It helps to pick a bare metal host with a suitable configuration for the machine. This section includes two parameters:

      • name

        Name of a bare metal host profile

      • namespace

        Project in which the bare metal host profile is created.

    • l2TemplateSelector

      If specified, contains the name (first priority) or label of the L2 template that will be applied during a machine creation. Note that changing this field after the Machine object is created will not affect the host network configuration of the machine.

      Assign one of the templates that you defined in Create L2 templates to the machine. If no suitable template exists, create one as described in Create L2 templates.

    • hostSelector

      This parameter defines matching criteria for picking a bare metal host for the machine by label.

      Any custom label that is assigned to one or more bare metal hosts using the API can be used as a host selector. If the BareMetalHost objects with the specified label are missing, the Machine object will not be deployed until at least one bare metal host with the specified label is available.

      See Deploy a machine to a specific bare metal host for details.

    • l2TemplateIfMappingOverride

      This parameter contains a list of names of the host network interfaces. It allows you to override the default naming and ordering of network interfaces defined in the L2 template referenced by l2TemplateSelector. This ordering determines how the L2 template generates the host network configuration.

      See Override network interfaces naming and order for details.

  7. Depending on the role of the machine in the MOSK cluster, add labels to the nodeLabels field.

    This field contains the list of node labels to be attached to a node for the user to run certain components on separate cluster nodes. The list of allowed node labels is located in the status.providerStatus.releaseRef.current.allowedNodeLabels field of the Cluster object.

    Starting from MOSK 22.3, if the value field is not defined in allowedNodeLabels, a label can have any value. For example:

    allowedNodeLabels:
    - displayName: Stacklight
      key: stacklight
    

    Before or after a machine deployment, add the required label from the allowed node labels list with the corresponding value to spec.providerSpec.value.nodeLabels in machine.yaml. For example:

    nodeLabels:
    - key: stacklight
      value: enabled
    

    Adding a node label that is not available in the list of allowed node labels is restricted.

  8. If you are NOT deploying MOSK with the compact control plane, add 3 dedicated Kubernetes manager nodes.

    1. Add 3 Machine objects for Kubernetes manager nodes using the following label:

      metadata:
        labels:
          cluster.sigs.k8s.io/control-plane: "true"
      

      Note

      The value of the label might be any non-empty string. On a worker node, this label must be omitted entirely.

    2. Add 3 Machine objects for MOSK controller nodes using the following labels:

      spec:
        providerSpec:
          value:
            nodeLabels:
              openstack-control-plane: enabled
              openstack-gateway: enabled
      
  9. If you are deploying MOSK with the compact control plane, add Machine objects for 3 combined control plane nodes using the following labels and nodeLabels parameters:

    metadata:
      labels:
        cluster.sigs.k8s.io/control-plane: "true"
    spec:
      providerSpec:
        value:
          nodeLabels:
            openstack-control-plane: enabled
            openstack-gateway: enabled
            openvswitch: enabled
    
  10. Add Machine objects for as many compute nodes as you want to install using the following labels:

    spec:
      providerSpec:
        value:
          nodeLabels:
            openstack-compute-node: enabled
            openvswitch: enabled
    
  11. Save the text file and repeat the process to create configuration for all machines in your MOSK cluster.

  12. Create the machines in the cluster using the following command:

    kubectl create -f mosk-cluster-machines.yaml
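
    You can then verify that the Machine objects have been created, for example, by listing them in the project namespace used in the example above (an illustrative check, not a mandatory step):

    kubectl -n mosk-project get machines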
    

Proceed to Add a Ceph cluster.

Assign L2 templates to machines

To install MOSK on bare metal with Container Cloud, you must create L2 templates for each node type in the MOSK cluster. Additionally, you may have to create separate templates for nodes of the same type when they have different configurations.

To assign specific L2 templates to machines in a cluster:

  1. Use the clusterRef parameter in the L2 template spec to assign the templates to the cluster.

  2. Add a unique identifier label to every L2 template. Typically, that would be the name of the MOSK node role, for example, l2template-compute or l2template-compute-5nics.

  3. Assign an L2 template to a machine. Set the l2TemplateSelector field in the machine spec to the name of the label added in the previous step. The IPAM Controller uses this field to select a specific L2 template for the corresponding machine.

    Alternatively, you may set the l2TemplateSelector field to the name of the L2 template.

Consider the following examples of an L2 template assignment to a machine.

Example of an L2Template resource
apiVersion: ipam.mirantis.com/v1alpha1
kind: L2Template
metadata:
  name: example-node-netconfig
  namespace: my-project
  labels:
    kaas.mirantis.com/provider: baremetal
    kaas.mirantis.com/region: region-one
    l2template-example-node-netconfig: "1"
...
spec:
  clusterRef: my-cluster
...

Example of a Machine resource with the label-based L2 template selector
apiVersion: cluster.k8s.io/v1alpha1
kind: Machine
metadata:
  name: machine1
  namespace: my-project
...
spec:
  providerSpec:
    value:
      l2TemplateSelector:
        label: l2template-example-node-netconfig
...

Example of a Machine resource with the name-based L2 template selector
apiVersion: cluster.k8s.io/v1alpha1
kind: Machine
metadata:
  name: machine1
  namespace: my-project
...
spec:
  providerSpec:
    value:
      l2TemplateSelector:
        name: example-node-netconfig
...

Now, proceed to Deploy a machine to a specific bare metal host.

Deploy a machine to a specific bare metal host

Each machine in a MOSK cluster requires a dedicated bare metal host for deployment. The bare metal hosts are represented by the BareMetalHost objects in the Mirantis Container Cloud management API. All BareMetalHost objects must be labeled upon creation with a label that allows identifying the host and assigning it to a machine.

The labels may be unique or applied to a group of hosts, based on similarities in their capacity, capabilities, and hardware configuration, on their location, suitable role, or a combination thereof.

In some cases, you may need to deploy a machine to a specific bare metal host. This is especially useful when some of your bare metal hosts have different hardware configuration than the rest.

To deploy a machine to a specific bare metal host:

  1. Log in to the host where your management cluster kubeconfig is located and where kubectl is installed.

  2. Identify the bare metal host that you want to associate with the specific machine. For example, host host-1.

    kubectl get baremetalhost host-1 -o yaml
    
  3. Add a label that will uniquely identify this host, for example, by the name of the host and machine that you want to deploy on it.

    Caution

    Do not remove any existing labels from the BareMetalHost resource.

    kubectl edit baremetalhost host-1
    

    Configuration example:

    kind: BareMetalHost
    metadata:
      name: host-1
      namespace: myProjectName
      labels:
        kaas.mirantis.com/baremetalhost-id: host-1-worker-HW11-cad5
        ...
    
  4. Open the text file with the YAML definition of the Machine object, created in Create a machine using CLI.

  5. Add a host selector that matches the label you have added to the BareMetalHost object in the previous step.

    Example:

    kind: Machine
    metadata:
      name: worker-HW11-cad5
      namespace: myProjectName
    spec:
      ...
      providerSpec:
        value:
          apiVersion: baremetal.k8s.io/v1alpha1
          kind: BareMetalMachineProviderSpec
          ...
          hostSelector:
            matchLabels:
              kaas.mirantis.com/baremetalhost-id: host-1-worker-HW11-cad5
      ...
    

Once created, this machine will be associated with the specified bare metal host, and you can return to Create a machine using CLI.
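
To double-check the association, you can inspect the BareMetalHost object after the machine is created. For example, in typical bare metal operator setups, the spec.consumerRef field of a consumed host references the corresponding Machine object (an illustrative check; the exact field set may vary):

kubectl get baremetalhost host-1 -o yaml | grep -A 3 consumerRef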

Override network interfaces naming and order

An L2 template contains the ifMapping field that allows you to identify Ethernet interfaces for the template. The Machine object API enables the Operator to override this mapping by enforcing a specific order of interface names when the template is applied.

The l2TemplateIfMappingOverride field in the spec of the Machine object contains a list of interface names. The order of the interface names in the list is important because the L2Template object will be rendered with NICs ordered as per this list.

Note

Changes in the l2TemplateIfMappingOverride field will apply only once when the Machine and corresponding IpamHost objects are created. Further changes to l2TemplateIfMappingOverride will not reset the interface assignment and configuration.

Caution

The l2TemplateIfMappingOverride field must contain the names of all interfaces of the bare metal host.

The following example illustrates how to include the override field in the Machine object. In this example, we configure the interface eno1, which is the second on-board interface of the server, to precede the first on-board interface eno0.

apiVersion: cluster.k8s.io/v1alpha1
kind: Machine
metadata:
  finalizers:
  - foregroundDeletion
  - machine.cluster.sigs.k8s.io
  labels:
    cluster.sigs.k8s.io/cluster-name: kaas-mgmt
    cluster.sigs.k8s.io/control-plane: "true"
    kaas.mirantis.com/provider: baremetal
    kaas.mirantis.com/region: region-one
spec:
  providerSpec:
    value:
      apiVersion: baremetal.k8s.io/v1alpha1
      hostSelector:
        matchLabels:
          baremetal: hw-master-0
      image: {}
      kind: BareMetalMachineProviderSpec
      l2TemplateIfMappingOverride:
      - eno1
      - eno0
      - enp0s1
      - enp0s2

As a result of the configuration above, when used with the example L2 template for bonds and bridges described in Create L2 templates, the enp0s1 and enp0s2 interfaces will be in a predictable ordered state. This state will be used to create subinterfaces for the Kubernetes pods network (k8s-pods) and the Kubernetes external network (k8s-ext).

Add a Ceph cluster

After you add machines to your new bare metal managed cluster as described in Add a machine, create a Ceph cluster on top of this managed cluster using the Mirantis Container Cloud web UI.

For an advanced configuration through the KaaSCephCluster CR, see Mirantis Container Cloud Operations Guide: Ceph advanced configuration. To configure Ceph Controller through Kubernetes templates to manage Ceph nodes resources, see Mirantis Container Cloud Operations Guide: Enable Ceph tolerations and resources management.

The procedure below enables you to create a Ceph cluster with minimum three Ceph nodes that provides persistent volumes to the Kubernetes workloads in the managed cluster.

To create a Ceph cluster in the managed cluster:

  1. Log in to the Container Cloud web UI with the m:kaas:namespace@operator or m:kaas:namespace@writer permissions.

  2. Switch to the required project using the Switch Project action icon located on top of the main left-side navigation panel.

  3. In the Clusters tab, click the required cluster name. The Cluster page with the Machines and Ceph clusters lists opens.

  4. In the Ceph Clusters block, click Create Cluster.

  5. Configure the Ceph cluster in the Create New Ceph Cluster wizard that opens:

    Create new Ceph cluster

    Section

    Parameter name

    Description

    General settings

    Name

    The Ceph cluster name.

    Cluster Network

    Replication network for Ceph OSDs. Must contain the CIDR definition and match the corresponding values of the cluster L2Template object or the environment network values.

    Public Network

    Public network for Ceph data. Must contain the CIDR definition and match the corresponding values of the cluster L2Template object or the environment network values.

    Enable OSDs LCM

    Select to enable LCM for Ceph OSDs.

    Machines / Machine #1-3

    Select machine

    Select the name of the Kubernetes machine that will host the corresponding Ceph node in the Ceph cluster.

    Manager, Monitor

    Select the required Ceph services to install on the Ceph node.

    Devices

    Select the disk that Ceph will use.

    Warning

    Do not select the device for system services, for example, sda.

    Enable Object Storage

    Select to enable the single-instance RGW Object Storage.

  6. To add more Ceph nodes to the new Ceph cluster, click + next to any Ceph Machine title in the Machines tab. Configure a Ceph node as required.

    Warning

    Do not add more than 3 Manager and/or Monitor services to the Ceph cluster.

  7. After you add and configure all nodes in your Ceph cluster, click Create.

  8. Open the KaaSCephCluster CR for editing as described in Mirantis Container Cloud Operations Guide: Ceph advanced configuration.

  9. Verify that the following snippet is present in the KaaSCephCluster configuration:

    network:
      clusterNet: 10.10.10.0/24
      publicNet: 10.10.11.0/24
    
  10. Configure the pools for Image, Block Storage, and Compute services.

    Note

    Ceph validates the specified pools. Therefore, do not omit any of the following pools.

    spec:
      pools:
        - default: true
          deviceClass: hdd
          name: kubernetes
          replicated:
            size: 3
          role: kubernetes
        - default: false
          deviceClass: hdd
          name: volumes
          replicated:
            size: 3
          role: volumes
        - default: false
          deviceClass: hdd
          name: vms
          replicated:
            size: 3
          role: vms
        - default: false
          deviceClass: hdd
          name: backup
          replicated:
            size: 3
          role: backup
        - default: false
          deviceClass: hdd
          name: images
          replicated:
            size: 3
          role: images
        - default: false
          deviceClass: hdd
          name: other
          replicated:
            size: 3
          role: other
    

    Each Ceph pool, depending on its role, has a default targetSizeRatio value that defines the expected consumption of the total Ceph cluster capacity. The default ratio values for MOSK pools are as follows:

    • 20.0% for a Ceph pool with role volumes

    • 40.0% for a Ceph pool with role vms

    • 10.0% for a Ceph pool with role images

    • 10.0% for a Ceph pool with role backup

  11. Once all pools are created, verify that an appropriate secret required for a successful deployment of the OpenStack services that rely on Ceph is created in the openstack-ceph-shared namespace:

    kubectl -n openstack-ceph-shared get secrets openstack-ceph-keys
    

    Example of a positive system response:

    NAME                  TYPE     DATA   AGE
    openstack-ceph-keys   Opaque   7      36m
    
  12. Verify your Ceph cluster as described in Mirantis Container Cloud Operations Guide: Verify Ceph.

Delete a managed cluster

Due to a development limitation in the baremetal operator, deletion of a managed cluster requires preliminary deletion of the worker machines running on the cluster.

Using the Container Cloud web UI, first delete worker machines one by one until you hit the minimum of 2 workers for an operational cluster. After that, you can delete the cluster with the remaining workers and managers.

To delete a baremetal-based managed cluster:

  1. Log in to the Mirantis Container Cloud web UI with the writer permissions.

  2. Switch to the required project using the Switch Project action icon located on top of the main left-side navigation panel.

  3. In the Clusters tab, click the required cluster name to open the list of machines running on it.

  4. Click the More action icon in the last column of the worker machine you want to delete and select Delete. Confirm the deletion.

  5. Repeat the step above until you have 2 workers left.

  6. In the Clusters tab, click the More action icon in the last column of the required cluster and select Delete.

  7. Verify the list of machines to be removed. Confirm the deletion.

  8. Optional. If you do not plan to reuse the credentials of the deleted cluster, delete them:

    1. In the Credentials tab, click the Delete credential action icon next to the name of the credentials to be deleted.

    2. Confirm the deletion.

    Warning

    You can delete credentials only after deleting the managed cluster they relate to.

Deleting a cluster automatically frees up the resources allocated for this cluster, for example, instances, load balancers, networks, floating IPs, and so on.

Deploy OpenStack

This section instructs you on how to deploy OpenStack on top of Kubernetes as well as how to troubleshoot the deployment and access your OpenStack environment after deployment.

Deploy an OpenStack cluster

This section instructs you on how to deploy OpenStack on top of Kubernetes using the OpenStack Controller and openstackdeployments.lcm.mirantis.com (OsDpl) CR.

To deploy an OpenStack cluster:

  1. Verify that you have pre-configured the networking according to Networking.

  2. Verify that the TLS certificates that will be required for the OpenStack cluster deployment have been pre-generated.

    Note

    The Transport Layer Security (TLS) protocol is mandatory on public endpoints.

    Caution

    To avoid certificates renewal with subsequent OpenStack updates during which additional services with new public endpoints may appear, we recommend using wildcard SSL certificates for public endpoints. For example, *.it.just.works, where it.just.works is a cluster public domain.

    The sample code block below illustrates how to generate a self-signed certificate for the it.just.works domain. The procedure presumes the cfssl and cfssljson tools are installed on the machine.

    mkdir cert && cd cert
    
    tee ca-config.json << EOF
    {
      "signing": {
        "default": {
          "expiry": "8760h"
        },
        "profiles": {
          "kubernetes": {
            "usages": [
              "signing",
              "key encipherment",
              "server auth",
              "client auth"
            ],
            "expiry": "8760h"
          }
        }
      }
    }
    EOF
    
    tee ca-csr.json << EOF
    {
      "CN": "kubernetes",
      "key": {
        "algo": "rsa",
        "size": 2048
      },
      "names":[{
        "C": "<country>",
        "ST": "<state>",
        "L": "<city>",
        "O": "<organization>",
        "OU": "<organization unit>"
      }]
    }
    EOF
    
    cfssl gencert -initca ca-csr.json | cfssljson -bare ca
    
    tee server-csr.json << EOF
    {
        "CN": "*.it.just.works",
        "hosts":     [
            "*.it.just.works"
        ],
        "key":     {
            "algo": "rsa",
            "size": 2048
        },
        "names": [    {
            "C": "US",
            "L": "CA",
            "ST": "San Francisco"
        }]
    }
    EOF
    cfssl gencert -ca=ca.pem -ca-key=ca-key.pem --config=ca-config.json -profile=kubernetes server-csr.json | cfssljson -bare server
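
    Optionally, sanity-check the generated server certificate before using it, for example, with the standard openssl CLI (an illustrative step, not part of the generation procedure):

    openssl x509 -in server.pem -noout -subject -dates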
    
  3. Create the openstackdeployment.yaml file that will include the OpenStack cluster deployment configuration.

    Note

    The resource of kind OpenStackDeployment (OsDpl) is a custom resource defined by a resource of kind CustomResourceDefinition. The resource is validated with the help of the OpenAPI v3 schema.

  4. Configure the OsDpl resource depending on the needs of your deployment. For the configuration details, refer to OpenStackDeployment custom resource.

    Note

    If you plan to deploy the Telemetry service, you have to specify the Telemetry mode through features:telemetry:mode as described in OpenStackDeployment custom resource. Otherwise, Telemetry will fail to deploy.

    Example of an OpenStackDeployment CR of minimum configuration
    apiVersion: lcm.mirantis.com/v1alpha1
    kind: OpenStackDeployment
    metadata:
      name: openstack-cluster
      namespace: openstack
    spec:
      openstack_version: victoria
      preset: compute
      size: tiny
      internal_domain_name: cluster.local
      public_domain_name: it.just.works
      features:
        neutron:
          tunnel_interface: ens3
          external_networks:
            - physnet: physnet1
              interface: veth-phy
              bridge: br-ex
              network_types:
               - flat
              vlan_ranges: null
              mtu: null
          floating_network:
            enabled: False
        nova:
          live_migration_interface: ens3
          images:
            backend: local
    
  5. If required, enable DPDK, huge pages, and other supported Telco features as described in Advanced OpenStack configuration (optional).

  6. To the openstackdeployment object, add information about the TLS certificates, as sketched after the following list:

    • ssl:public_endpoints:ca_cert - CA certificate content (ca.pem)

    • ssl:public_endpoints:api_cert - server certificate content (server.pem)

    • ssl:public_endpoints:api_key - server private key (server-key.pem)
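
    The sketch below shows one possible arrangement of these keys in the OsDpl object. The placement under spec:features:ssl is an assumption to verify against OpenStackDeployment custom resource:

    spec:
      features:
        ssl:
          public_endpoints:
            ca_cert: |-
              # content of ca.pem
            api_cert: |-
              # content of server.pem
            api_key: |-
              # content of server-key.pem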

  7. Verify that the Load Balancer network does not overlap your corporate or internal Kubernetes networks, for example, Calico IP pools. Also, verify that the Load Balancer network pool is big enough to provide IP addresses for all Amphora VMs (load balancers).

    If required, reconfigure the Octavia network settings using the following sample structure:

    spec:
      services:
        load-balancer:
          octavia:
            values:
              octavia:
                settings:
                  lbmgmt_cidr: "10.255.0.0/16"
                  lbmgmt_subnet_start: "10.255.1.0"
                  lbmgmt_subnet_end: "10.255.255.254"
    
  8. Trigger the OpenStack deployment:

    kubectl apply -f openstackdeployment.yaml
    
  9. Monitor the status of your OpenStack deployment:

    kubectl -n openstack get pods
    kubectl -n openstack describe osdpl osh-dev
    
  10. Assess the current status of the OpenStack deployment using the status section output in the OsDpl resource:

    1. Get the OsDpl YAML file:

      kubectl -n openstack get osdpl osh-dev -o yaml
      
    2. Analyze the status output using the detailed description in OpenStackDeployment custom resource.

  11. Verify that the OpenStack cluster has been deployed:

    client_pod_name=$(kubectl -n openstack get pods -l application=keystone,component=client | grep keystone-client | head -1 | awk '{print $1}')
    kubectl -n openstack exec -it $client_pod_name -- openstack service list
    

    Example of a positive system response:

    +----------------------------------+---------------+----------------+
    | ID                               | Name          | Type           |
    +----------------------------------+---------------+----------------+
    | 159f5c7e59784179b589f933bf9fc6b0 | cinderv3      | volumev3       |
    | 6ad762f04eb64a31a9567c1c3e5a53b4 | keystone      | identity       |
    | 7e265e0f37e34971959ce2dd9eafb5dc | heat          | orchestration  |
    | 8bc263babe9944cdb51e3b5981a0096b | nova          | compute        |
    | 9571a49d1fdd4a9f9e33972751125f3f | placement     | placement      |
    | a3f9b25b7447436b85158946ca1c15e2 | neutron       | network        |
    | af20129d67a14cadbe8d33ebe4b147a8 | heat-cfn      | cloudformation |
    | b00b5ad18c324ac9b1c83d7eb58c76f5 | radosgw-swift | object-store   |
    | b28217da1116498fa70e5b8d1b1457e5 | cinderv2      | volumev2       |
    | e601c0749ce5425c8efb789278656dd4 | glance        | image          |
    +----------------------------------+---------------+----------------+
    
  12. Register a record on the customer DNS:

    Caution

    The DNS component is mandatory to access OpenStack public endpoints.

    1. Obtain the full list of endpoints:

      kubectl -n openstack get ingress -ocustom-columns=NAME:.metadata.name,HOSTS:.spec.rules[*].host | awk '/namespace-fqdn/ {print $2}'
      

      Example of system response:

      barbican.<spec:public_domain_name>
      cinder.<spec:public_domain_name>
      cloudformation.<spec:public_domain_name>
      designate.<spec:public_domain_name>
      glance.<spec:public_domain_name>
      heat.<spec:public_domain_name>
      horizon.<spec:public_domain_name>
      keystone.<spec:public_domain_name>
      metadata.<spec:public_domain_name>
      neutron.<spec:public_domain_name>
      nova.<spec:public_domain_name>
      novncproxy.<spec:public_domain_name>
      octavia.<spec:public_domain_name>
      placement.<spec:public_domain_name>
      
    2. Obtain the public endpoint IP:

      kubectl -n openstack get services ingress
      

      In the system response, capture EXTERNAL-IP.

      Example of system response:

      NAME      TYPE           CLUSTER-IP    EXTERNAL-IP    PORT(S)                                      AGE
      ingress   LoadBalancer   10.96.32.97   10.172.1.101   80:34234/TCP,443:34927/TCP,10246:33658/TCP   4h56m
      
    3. Ask the customer to create DNS records that resolve the public endpoints, obtained earlier in this procedure, to the EXTERNAL-IP of the Ingress service.
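
      After the records are created, you can optionally verify name resolution and endpoint availability from any machine that uses the customer DNS. The commands below are a sketch: substitute your actual public domain name, and note that self-signed public endpoints require skipping certificate validation:

      dig +short keystone.<spec:public_domain_name>
      curl -k https://keystone.<spec:public_domain_name>/v3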

See also

Networking

Advanced OpenStack configuration (optional)

This section includes configuration information for available advanced Mirantis OpenStack for Kubernetes features that include DPDK with the Neutron OVS back end, huge pages, CPU pinning, and other Enhanced Platform Awareness (EPA) capabilities.

Enable LVM ephemeral storage

TechPreview

Note

Consider this section as part of Deploy an OpenStack cluster.

This section instructs you on how to configure LVM as back end for the VM disks and ephemeral storage.

Warning

Usage of more than one nonvolatile memory express (NVMe) drive per node may cause update issues and is therefore not supported.

To enable LVM ephemeral storage:

  1. In BareMetalHostProfile in the spec:volumeGroups section, add the following configuration for the OpenStack compute nodes:

    spec:
      devices:
        - device:
            byName: /dev/nvme0n1
            minSizeGiB: 30
            wipe: true
          partitions:
            - name: lvm_nova_vol
              sizeGiB: 0
              wipe: true
      volumeGroups:
        - devices:
          - partition: lvm_nova_vol
          name: nova-vol
      logicalVolumes:
        - name: nova-fake
          vg: nova-vol
          sizeGiB: 0.1
      fileSystems:
        - fileSystem: ext4
          logicalVolume: nova-fake
          mountPoint: /nova-fake
    

    Note

    Due to a limitation, it is not possible to create volume groups without logical volumes and formatted partitions. Therefore, set the logicalVolumes:name, fileSystems:logicalVolume, and fileSystems:mountPoint parameters to nova-fake.

    For details about BareMetalHostProfile, see Mirantis Container Cloud Operations Guide: Create a custom bare metal host profile.

  2. Configure the OpenStackDeployment CR to deploy OpenStack with LVM ephemeral storage. For example:

    spec:
      features:
        nova:
          images:
            backend: lvm
            lvm:
              volume_group: "nova-vol"
    
  3. Optional. Enable encryption for the LVM ephemeral storage by adding the following metadata in the OpenStackDeployment CR:

    spec:
      features:
        nova:
          images:
            encryption:
              enabled: true
              cipher: "aes-xts-plain64"
              key_size: 256
    

    Caution

    Both live and cold migrations are not supported for such instances.
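
After the deployment completes, you can verify on a compute node that Nova uses the configured volume group. The commands below are standard LVM tools run directly on the host:

vgs nova-vol
lvs nova-vol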

Enable LVM block storage

TechPreview

Note

Consider this section as part of Deploy an OpenStack cluster.

This section instructs you on how to configure LVM as a back end for the OpenStack Block Storage service.

To enable LVM block storage:

  1. Open BareMetalHostProfile for editing.

  2. In the spec:volumeGroups section, specify the following data for the OpenStack compute nodes. In the following example, we deploy a Cinder volume with LVM on compute nodes. However, you can use dedicated nodes for this purpose.

    spec:
      devices:
        - device:
            byName: /dev/nvme0n1
            minSizeGiB: 30
            wipe: true
          partitions:
            - name: lvm_cinder_vol
              sizeGiB: 0
              wipe: true
      volumeGroups:
        - devices:
          - partition: lvm_cinder_vol
          name: cinder-vol
      logicalVolumes:
        - name: cinder-fake
          vg: cinder-vol
          sizeGiB: 0.1
      fileSystems:
        - fileSystem: ext4
          logicalVolume: cinder-fake
          mountPoint: /cinder-fake
    

    Note

    Due to a limitation, volume groups cannot be created without logical volumes and formatted partitions. Therefore, set the logicalVolumes:name, fileSystems:logicalVolume, and fileSystems:mountPoint parameters to cinder-fake.

    For details about BareMetalHostProfile, see Mirantis Container Cloud Operations Guide: Create a custom bare metal host profile.

  3. Configure the OpenStackDeployment CR to deploy OpenStack with LVM block storage. For example:

    spec:
      nodes:
        openstack-compute-node::enabled:
          features:
            cinder:
              volume:
                backends:
                  lvm:
                    lvm:
                      volume_group: "cinder-vol"
    
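To verify the back end after the deployment, list the Cinder volume services from the keystone-client pod and check that the cinder-volume service for the LVM back end is up:

openstack volume service list
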
Enable DPDK with OVS

TechPreview

Note

Consider this section as part of Deploy an OpenStack cluster.

This section instructs you on how to enable DPDK with the Neutron OVS back end.

To enable DPDK with OVS:

  1. Verify that your deployment meets the following requirements:

  2. Enable DPDK in the OsDpl custom resource through the node specific overrides settings. For example:

    spec:
      nodes:
        <NODE-LABEL>::<NODE-LABEL-VALUE>:
          features:
            neutron:
              dpdk:
                bridges:
                - ip_address: 10.12.2.80/24
                  name: br-phy
                driver: igb_uio
                enabled: true
                nics:
                - bridge: br-phy
                  name: nic01
                  pci_id: "0000:05:00.0"
              tunnel_interface: br-phy
    
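To verify that Open vSwitch has initialized DPDK on the node, you can query the OVS database from inside the Open vSwitch pod running on that node. The label selector below is an assumption; adjust it to your deployment and, if needed, select the vswitchd container explicitly with -c:

kubectl -n openstack get pods -o wide -l application=openvswitch
kubectl -n openstack exec -it <openvswitch-pod-name> -- ovs-vsctl get Open_vSwitch . dpdk_initialized
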
Enable SR-IOV with OVS

Note

Consider this section as part of Deploy an OpenStack cluster.

This section instructs you on how to enable SR-IOV with the Neutron OVS back end.

To enable SR-IOV with OVS:

  1. Verify that your deployment meets the following requirements:

    • NICs with the SR-IOV support are installed

    • SR-IOV and VT-d are enabled in BIOS

  2. Enable IOMMU in the kernel by configuring intel_iommu=on in the GRUB configuration file. Specify the parameter for compute nodes in BareMetalHostProfile in the grubConfig section:

    spec:
      grubConfig:
          defaultGrubOptions:
            - 'GRUB_CMDLINE_LINUX="$GRUB_CMDLINE_LINUX intel_iommu=on"'
    
  3. Configure the OpenStackDeployment CR to deploy OpenStack with the VLAN tenant network encapsulation.

    Caution

    To ensure that the configuration changes are applied correctly, configure VLAN segmentation during the initial OpenStack deployment.

    Configuration example:

    spec:
      features:
        neutron:
          external_networks:
          - bridge: br-ex
            interface: pr-floating
            mtu: null
            network_types:
            - flat
            physnet: physnet1
            vlan_ranges: null
          - bridge: br-tenant
            interface: bond0
            network_types:
              - vlan
            physnet: tenant
            vlan_ranges: 490:499,1420:1459
          tenant_network_types:
            - vlan
    
  4. Enable SR-IOV in the OpenStackDeployment CR through the node-specific overrides settings. For example:

    spec:
      nodes:
        <NODE-LABEL>::<NODE-LABEL-VALUE>:
          features:
            neutron:
              sriov:
                enabled: true
                nics:
                - device: enp10s0f1
                  num_vfs: 7
                  physnet: tenant
    
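To verify the configuration, check that the expected number of virtual functions has been created on the node and, from the keystone-client pod, create a test port with the direct VNIC type. The network name below is a placeholder:

cat /sys/class/net/enp10s0f1/device/sriov_numvfs
openstack port create --network <tenant-network-name> --vnic-type direct sriov-test-port
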
Enable BGP VPN

TechPreview

Note

Consider this section as part of Deploy an OpenStack cluster.

The BGP VPN service is an extra OpenStack Neutron plugin that enables connection of OpenStack Virtual Private Networks with external VPN sites through either BGP/MPLS IP VPNs or E-VPN.

To enable the BGP VPN service:

Enable BGP VPN in the OsDpl custom resource through the node-specific overrides settings. For example:

spec:
  features:
    neutron:
      bgpvpn:
        enabled: true
        route_reflector:
          # Enable deploying the FRR route reflector
          enabled: true
          # Local AS number
          as_number: 64512
          # List of subnets that are allowed to connect
          # to the route reflector BGP
          neighbor_subnets:
            - 10.0.0.0/8
            - 172.16.0.0/16
  nodes:
    openstack-compute-node::enabled:
      features:
        neutron:
          bgpvpn:
            enabled: true

When the service is enabled, a route reflector is scheduled on nodes with the openstack-frrouting: enabled label. Mirantis recommends collocating the route reflector nodes with the OpenStack controller nodes. By default, two replicas are deployed.
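
For example, to make a node eligible to host the route reflector, label it as follows (the node name is a placeholder):

kubectl label node <node-name> openstack-frrouting=enabled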

Encrypt the east-west traffic

TechPreview

Note

Consider this section as part of Deploy an OpenStack cluster.

Mirantis OpenStack for Kubernetes allows configuring Internet Protocol Security (IPsec) encryption for the east-west tenant traffic between the OpenStack compute nodes and gateways. The feature uses the strongSwan open source IPsec solution. Authentication is accomplished through a pre-shared key (PSK); support for other authentication methods is planned.

To encrypt the east-west tenant traffic, enable ipsec in the spec:features:neutron settings of the OpenStackDeployment CR:

spec:
  features:
    neutron:
      ipsec:
        enabled: true
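
After the OpenStack Controller applies the change, you can quickly confirm that the setting is present in the deployed object. For example, for the osh-dev object used throughout this guide:

kubectl -n openstack get osdpl osh-dev -o jsonpath='{.spec.features.neutron.ipsec}'
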
Enable Cinder back end for Glance

TechPreview

Note

Consider this section as part of Deploy an OpenStack cluster.

This section instructs you on how to configure a Cinder back end for Glance images through the OpenStackDeployment CR.

Note

This feature depends heavily on Cinder multi-attach, which enables you to simultaneously attach volumes to multiple instances. Therefore, only the block storage back ends that support multi-attach can be used.

To configure a Cinder back end for Glance, define the back end identity in the OpenStackDeployment CR. This identity will be used as a name for the back end section in the Glance configuration file.

When defining the back end:

  • Configure one of the back ends as default.

  • Configure each back end to use a specific Cinder volume type.

    Note

    You can use the volume_type parameter instead of backend_name. If so, you have to create this volume type beforehand and take into account that the bootstrap script does not manage such volume types.

The following examples define the blockstore identity using backend_name and volume_type, respectively:

spec:
  features:
    glance:
      backends:
        cinder:
          blockstore:
            default: true
            backend_name: <volume_type:volume_name>
            # e.g. backend_name: lvm:lvm_store

spec:
  features:
    glance:
      backends:
        cinder:
          blockstore:
            default: true
            volume_type: netapp
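
If you opt for the volume_type approach, create the volume type beforehand because the bootstrap script does not manage it. A minimal sketch, assuming the netapp type from the example above (the volume_backend_name value depends on your Cinder configuration):

openstack volume type create netapp
openstack volume type set --property volume_backend_name=<cinder-backend-name> netapp
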
Enable Cinder volume encryption

TechPreview

Note

Consider this section as part of Deploy an OpenStack cluster.

This section instructs you on how to enable Cinder volume encryption through the OpenStackDeployment CR using Linux Unified Key Setup (LUKS) and store the encryption keys in Barbican. For details, see Volume encryption.

To enable Cinder volume encryption:

  1. In the OpenStackDeployment CR, specify the LUKS volume type and configure the required encryption parameters for the storage system to encrypt or decrypt the volume.

    The volume_types definition example:

    spec:
      services:
        block-storage:
          cinder:
            values:
              bootstrap:
                volume_types:
                  volumes-hdd-luks:
                    arguments:
                      encryption-cipher: aes-xts-plain64
                      encryption-control-location: front-end
                      encryption-key-size: 256
                      encryption-provider: luks
                    volume_backend_name: volumes-hdd
    
  2. To create an encrypted volume as a non-admin user and store keys in the Barbican storage, assign the creator role to the user since the default Barbican policy allows only the admin or creator role:

    openstack role add --project <PROJECT-ID> --user <USER-ID> creator
    
  3. Optional. To define an encrypted volume as a default one, specify volumes-hdd-luks in default_volume_type in the Cinder configuration:

    spec:
      services:
        block-storage:
          cinder:
            values:
              conf:
                cinder:
                  DEFAULT:
                    default_volume_type: volumes-hdd-luks
    
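To verify the configuration, you can create a test volume of the LUKS type from the keystone-client pod and check that it is reported as encrypted. For example:

openstack volume create --size 1 --type volumes-hdd-luks test-luks-volume
openstack volume show test-luks-volume -c encrypted
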
Advanced configuration for OpenStack compute nodes

Note

Consider this section as part of Deploy an OpenStack cluster.

This section describes how to perform advanced configuration for the OpenStack compute nodes. Such configuration can be required in specific use cases, such as the usage of the DPDK, SR-IOV, or huge pages features.

Configure the CPU model

Available since MOSK 22.1

Note

Consider this section as part of Deploy an OpenStack cluster.

Mirantis OpenStack for Kubernetes (MOSK) enables you to configure the vCPU model for all instances managed by the OpenStack Compute service (Nova) using the following osdpl definition:

spec:
  features:
    nova:
      vcpu_type: host-model

For the supported values and configuration examples, see vCPU type.

Enable huge pages for OpenStack

Note

Consider this section as part of Deploy an OpenStack cluster.

Note

The instruction provided in this section applies to both OpenStack with OVS and OpenStack with Tungsten Fabric topologies.

The huge pages OpenStack feature provides essential performance improvements for applications that are highly memory IO-bound. Huge pages should be enabled on a per compute node basis. By default, NUMATopologyFilter is enabled.

To activate the feature, you need to enable huge pages on the dedicated bare metal host as described in Enable huge pages in a host profile during the predeployment bare metal configuration.

Note

The multi-size huge pages are not fully supported by Kubernetes versions before 1.19. Therefore, define only one size in kernel parameters.
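
After huge pages are enabled on the host, instances consume them through a flavor extra specification. For example, assuming an existing flavor:

openstack flavor set --property hw:mem_page_size=large <flavor-name>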

Configure CPU isolation for an instance

Note

Consider this section as part of Deploy an OpenStack cluster.

CPU isolation is a way to force the system scheduler to use only some logical CPU cores for processes. For compute hosts, you should typically isolate system processes and virtual guests on different cores. This section describes two possible ways to achieve this:

  • Through the isolcpus configuration parameter for Linux kernel Deprecated since MOSK 22.2

  • Through the cpusets mechanism in Linux kernel Available since MOSK 22.2, TechPreview

For details, see OpenStack official documentation: CPU topologies and Shielding Linux Resources.

Configure CPU isolation using isolcpus

Note

Starting from MOSK 22.2, isolcpus is deprecated.

Using the isolcpus parameter, specific CPUs are removed from the general kernel symmetrical multiprocessing (SMP) load balancing and scheduling. The only way to get tasks scheduled onto isolated CPUs is to use taskset. The list of isolated CPUs is configured statically at boot time, and you can only change it by rebooting with a different value. In the Linux kernel, the isolcpus parameter is deprecated in favor of cpusets.

To configure CPU isolation using isolcpus:

  1. Configure isolcpus in Linux kernel:

    GRUB_CMDLINE_LINUX_DEFAULT="quiet splash isolcpus=4-15"
    
  2. Apply the changes:

    update-grub
    
  3. Isolate cores from scheduling of Docker or Kubernetes workloads:

    cat <<-"EOF" > /usr/bin/setup-cgroups.sh
    #!/bin/bash
    
    set -x
    
    UNSHIELDED_CPUS=${UNSHIELDED_CPUS:-"0-3"}
    SHIELD_CPUS=${SHIELD_CPUS:-"4-15"}
    SHIELD_MODE=${SHIELD_MODE:-"isolcpu"} # One of isolcpu or cpuset
    
    DOCKER_CPUS=${DOCKER_CPUS:-$UNSHIELDED_CPUS}
    KUBERNETES_CPUS=${KUBERNETES_CPUS:-$UNSHIELDED_CPUS}
    CSET_CMD=${CSET_CMD:-"python2 /usr/bin/cset"}
    
    if [[ ${SHIELD_MODE} == "cpuset" ]]; then
        ${CSET_CMD} set -c ${UNSHIELDED_CPUS} -s system
        ${CSET_CMD} proc -m -f root -t system
        ${CSET_CMD} proc -k -f root -t system
    fi
    
    ${CSET_CMD} set --cpu=${DOCKER_CPUS} --set=docker
    ${CSET_CMD} set --cpu=${KUBERNETES_CPUS} --set=kubepods
    ${CSET_CMD} set --cpu=${DOCKER_CPUS} --set=com.docker.ucp
    
    EOF
    chmod +x /usr/bin/setup-cgroups.sh
    
    cat <<-"EOF" > /etc/systemd/system/shield-cpus.service
    [Unit]
    Description=Shield CPUs
    DefaultDependencies=no
    After=systemd-udev-settle.service
    Before=lvm2-activation-early.service
    Wants=systemd-udev-settle.service
    [Service]
    ExecStart=/usr/bin/setup-cgroups.sh
    RemainAfterExit=true
    Type=oneshot
    [Install]
    WantedBy=basic.target
    EOF
    
    systemctl enable shield-cpus
    
  4. Reboot the host, and then, as the root user, verify the list of isolated CPUs:

    cat /sys/devices/system/cpu/isolated
    

    Example of system response:

    4-15
    
  5. As root user, verify that isolation is active:

    cset set -l
    

    Example of system response:

    cset:
         Name       CPUs-X      MEMs-X   Tasks Subs   Path
     ------------ ---------- - ------- - ----- ---- ----------
     root             0-15 y       0 y    1449    3  /
     kubepods         0-3 n        0 n       0    2  /kubepods
     docker           0-3 n        0 n       0    5  /docker
     com.docker.ucp   0-3 n        0 n       0    1  /com.docker.ucp
    
Configure CPU isolation using cpusets

Available since MOSK 22.2 TechPreview

The Linux kernel and cpuset provide a mechanism to run tasks by limiting the resources defined by a cpuset. The tasks can be moved from one cpuset to another to use the resources defined in other cpusets. The cset Python tool is a command-line interface to work with cpusets.

To configure CPU isolation using cpusets:

  1. Configure core isolation:

    Note

    You can also automate this step during deployment by using the postDeploy script as described in Create MOSK host profiles.

    cat <<-"EOF" > /usr/bin/setup-cgroups.sh
    #!/bin/bash
    
    set -x
    
    UNSHIELDED_CPUS=${UNSHIELDED_CPUS:-"0-3"}
    SHIELD_CPUS=${SHIELD_CPUS:-"4-15"}
    SHIELD_MODE=${SHIELD_MODE:-"cpuset"} # One of isolcpu or cpuset
    
    DOCKER_CPUS=${DOCKER_CPUS:-$UNSHIELDED_CPUS}
    KUBERNETES_CPUS=${KUBERNETES_CPUS:-$UNSHIELDED_CPUS}
    CSET_CMD=${CSET_CMD:-"python2 /usr/bin/cset"}
    
    if [[ ${SHIELD_MODE} == "cpuset" ]]; then
        ${CSET_CMD} set -c ${UNSHIELDED_CPUS} -s system
        ${CSET_CMD} proc -m -f root -t system
        ${CSET_CMD} proc -k -f root -t system
    fi
    
    ${CSET_CMD} set --cpu=${DOCKER_CPUS} --set=docker
    ${CSET_CMD} set --cpu=${KUBERNETES_CPUS} --set=kubepods
    ${CSET_CMD} set --cpu=${DOCKER_CPUS} --set=com.docker.ucp
    
    EOF
    chmod +x /usr/bin/setup-cgroups.sh
    
    cat <<-"EOF" > /etc/systemd/system/shield-cpus.service
    [Unit]
    Description=Shield CPUs
    DefaultDependencies=no
    After=systemd-udev-settle.service
    Before=lvm2-activation-early.service
    Wants=systemd-udev-settle.service
    [Service]
    ExecStart=/usr/bin/setup-cgroups.sh
    RemainAfterExit=true
    Type=oneshot
    [Install]
    WantedBy=basic.target
    EOF
    
    systemctl enable shield-cpus
    
    reboot
    
  2. As root user, verify that isolation has been applied:

    cset set -l
    

    Example of system response:

    cset:
          Name       CPUs-X     MEMs-X    Tasks Subs   Path
      ------------ ---------- - ------- - ----- ---- ----------
      root             0-15 y       0 y     165    4  /
      kubepods         0-3 n        0 n       0    2  /kubepods
      docker           0-3 n        0 n       0    0  /docker
      system           0-3 n        0 n      65    0  /system
      com.docker.ucp   0-3 n        0 n       0    0  /com.docker.ucp
    
  3. Run the cpustress container:

    docker run -it --name cpustress --rm containerstack/cpustress --cpu 4 --timeout 30s --metrics-brief
    
  4. Verify that isolated cores are not affected:

    htop
    

    Example of system response highlighting the load created on all available Docker cores:

    [Screenshot: htop output showing the load only on the CPU cores available to Docker]
Configure custom CPU topologies

Note

Consider this section as part of Deploy an OpenStack cluster.

The majority of the CPU topology features are activated by NUMATopologyFilter, which is enabled by default. Such features do not require any further service configuration and can be used directly on a vanilla MOSK deployment. The list of the CPU topology features includes, for example:

  • NUMA placement policies

  • CPU pinning policies

  • CPU thread pinning policies

  • CPU topologies

To enable libvirt CPU pinning through the node-specific overrides in the OpenStackDeployment custom resource, use the following sample configuration structure:

spec:
  nodes:
    <NODE-LABEL>::<NODE-LABEL-VALUE>:
      services:
        compute:
          nova:
            nova_compute:
              values:
                conf:
                  nova:
                    compute:
                      cpu_dedicated_set: 2-17
                      cpu_shared_set: 18-47
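
Instances consume the dedicated CPU set through the CPU pinning policy set on a flavor. A minimal sketch, assuming an existing flavor:

openstack flavor set --property hw:cpu_policy=dedicated <flavor-name>
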
Configure PCI passthrough for guests

Note

Consider this section as part of Deploy an OpenStack cluster.

The Peripheral Component Interconnect (PCI) passthrough feature in OpenStack allows full access and direct control over physical PCI devices in guests. The mechanism applies to any kind of PCI devices including a Network Interface Card (NIC), Graphics Processing Unit (GPU), and any other device that can be attached to a PCI bus. The only requirement for the guest to properly use the device is to correctly install the driver.

To enable PCI passthrough in a MOSK deployment:

  1. For Linux X86 compute nodes, verify that the following features are enabled on the host:

  2. Configure the nova-api service that is scheduled on the OpenStack controller nodes. To generate the alias for PCI in nova.conf, add the alias configuration through the OpenStackDeployment CR.

    Note

    When configuring PCI with SR-IOV on the same host, the values specified in alias take precedence. Therefore, add the SR-IOV devices to passthrough_whitelist explicitly.

    For example:

    spec:
      services:
        compute:
          nova:
            values:
              conf:
                nova:
                  pci:
                    alias: '{ "vendor_id":"8086", "product_id":"154d", "device_type":"type-PF", "name":"a1" }'
    
  3. Configure the nova-compute service that is scheduled on OpenStack compute nodes. To enable Nova to pass PCI devices to virtual machines, configure the passthrough_whitelist section in nova.conf through the node-specific overrides in the OpenStackDeployment CR. For example:

    spec:
      nodes:
        <NODE-LABEL>::<NODE-LABEL-VALUE>:
          services:
            compute:
              nova:
                nova_compute:
                  values:
                    conf:
                      nova:
                        pci:
                          alias: '{ "vendor_id":"8086", "product_id":"154d", "device_type":"type-PF", "name":"a1" }'
                          passthrough_whitelist: |
                            [{"devname":"enp216s0f0","physical_network":"sriovnet0"}, { "vendor_id": "8086", "product_id": "154d" }]
    
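To make use of the passed-through device, request the alias defined above in a flavor extra specification. For example, assuming an existing flavor and the a1 alias from the sample configuration:

openstack flavor set --property "pci_passthrough:alias"="a1:1" <flavor-name>
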
Limit HW resources for hyperconverged OpenStack compute nodes

Note

Consider this section as part of Deploy an OpenStack cluster.

Hyperconverged architecture combines OpenStack compute nodes along with Ceph nodes. To avoid overloading the nodes, which can cause Ceph performance degradation and outages, limit the hardware resources consumed by the OpenStack compute services.

You can reserve hardware resources for non-workload related consumption using the following nova-compute parameters. For details, see OpenStack documentation: Overcommitting CPU and RAM and OpenStack documentation: Configuration Options.

  • cpu_allocation_ratio - in case of a hyperconverged architecture, the value depends on the number of vCPUs used for non-workload related operations, the total number of vCPUs of a hyperconverged node, and the workload vCPU consumption:

    cpu_allocation_ratio = (${vCPU_count_on_a_hyperconverged_node} -
    ${vCPU_used_for_non_OpenStack_related_tasks}) /
    ${vCPU_count_on_a_hyperconverged_node} / ${workload_vCPU_utilization}
    

    To define the vCPU count used for non-OpenStack related tasks, use the following formula, considering the storage data plane performance tests:

    vCPU_used_for_non-OpenStack_related_tasks = 2 * SSDs_per_hyperconverged_node +
    1 * Ceph_OSDs_per_hyperconverged_node + 0.8 * Ceph_OSDs_per_hyperconverged_node
    

    Consider the following example with 5 SSD disks for Ceph OSDs per hyperconverged node and 2 Ceph OSDs per disk:

    vCPU_used_for_non-OpenStack_related_tasks = 2 * 5 + 1 * 10 + 0.8 * 10 = 28
    

    In this case, if there are 40 vCPUs per hyperconverged node, 28 vCPUs are required for non-workload related calculations, and a workload consumes 50% of the allocated CPU time: cpu_allocation_ratio = (40-28) / 40 / 0.5 = 0.6.


  • reserved_host_memory_mb - a dedicated variable in the OpenStack Nova configuration, to reserve memory for non-OpenStack related VM activities:

    reserved_host_memory_mb = 13 GB * Ceph_OSDs_per_hyperconverged_node
    

    For example for 10 Ceph OSDs per hyperconverged node: reserved_host_memory_mb = 13 GB * 10 = 130 GB = 133120


  • ram_allocation_ratio - the allocation ratio of virtual RAM to physical RAM. To completely exclude the possibility of memory overcommitting, set to 1.

To limit HW resources for hyperconverged OpenStack compute nodes:

In the OpenStackDeployment CR, specify the cpu_allocation_ratio, ram_allocation_ratio, and reserved_host_memory_mb parameters as required using the calculations described above.

For example:

apiVersion: lcm.mirantis.com/v1alpha1
kind: OpenStackDeployment
spec:
  services:
    compute:
      nova:
        values:
          conf:
            nova:
              DEFAULT:
                cpu_allocation_ratio: 0.6
                ram_allocation_ratio: 1
                reserved_host_memory_mb: 133120

Note

For an existing OpenStack deployment:

  1. Obtain the name of your OpenStackDeployment CR:

    kubectl -n openstack get osdpl
    
  2. Open the OpenStackDeployment CR for editing and specify the parameters as required.

    kubectl -n openstack edit osdpl <osdpl name>
    
Enable image signature verification

Available since MOSK 21.6 TechPreview

Note

Consider this section as part of Deploy an OpenStack cluster.

Mirantis OpenStack for Kubernetes (MOSK) enables you to perform image signature verification when booting an OpenStack instance, uploading a Glance image with signature metadata fields set, and creating a volume from an image.

To enable signature verification, use the following osdpl definition:

spec:
  features:
    glance:
      signature:
        enabled: true

When enabled during the initial deployment, all internal images, such as the Amphora, Ironic, and test (CirrOS, Fedora, Ubuntu) images, are signed by a self-signed certificate.
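
To check whether an uploaded image carries the signature metadata, inspect its properties from the keystone-client pod. The image name below is a placeholder; the img_signature* properties are the standard Glance signature metadata fields:

openstack image show <image-name> -c properties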

Enable Telemetry services

The Telemetry services monitor OpenStack components, collect and store the telemetry data from them, and perform responsive actions upon this data.

To enable the Telemetry service:

Specify the following definition in the OpenStackDeployment custom resource (CR):

kind: OpenStackDeployment
spec:
  features:
    services:
    - alarming
    - event
    - metering
    - metric
    telemetry:
      mode: autoscaling
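
After the services are deployed, you can check that the corresponding pods appear in the openstack namespace. The service names below assume the usual upstream components behind these features (Aodh for alarming, Panko for event, Ceilometer for metering, Gnocchi for metric):

kubectl -n openstack get pods | grep -E 'aodh|panko|ceilometer|gnocchi'
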
Configure LoadBalancer for PowerDNS

Available since MOSK 22.2

Note

Consider this section as part of Deploy an OpenStack cluster.

Mirantis OpenStack for Kubernetes (MOSK) allows configuring LoadBalancer for the Designate PowerDNS back end. For example, you can expose a TCP port for zone transfer using the following osdpl definition:

spec:
  features:
    designate:
      backend:
        external_ip: 10.172.1.101
        protocol: udp
        type: powerdns

For the supported values, see LoadBalancer type for PowerDNS.

Access OpenStack after deployment

This section contains the guidelines on how to access your MOSK OpenStack environment.

Configure DNS to access OpenStack

DNS is a mandatory component for a MOSK deployment. All records must be created on the customer DNS server. The OpenStack services are exposed through the Ingress NGINX controller.

Warning

This document describes how to temporarily configure DNS. The workflow contains non-permanent changes that will be rolled back during a managed cluster update or reconciliation loop. Therefore, proceed at your own risk.

To configure DNS to access your OpenStack environment:

  1. Obtain the external IP address of the Ingress service:

    kubectl -n openstack get services ingress
    

    Example of system response:

    NAME      TYPE           CLUSTER-IP    EXTERNAL-IP    PORT(S)                                      AGE
    ingress   LoadBalancer   10.96.32.97   10.172.1.101   80:34234/TCP,443:34927/TCP,10246:33658/TCP   4h56m
    
  2. Select from the following options:

    • If you have a corporate DNS server, update your corporate DNS service and create appropriate DNS records for all OpenStack public endpoints.

      To obtain the full list of public endpoints:

      kubectl -n openstack get ingress -ocustom-columns=NAME:.metadata.name,HOSTS:spec.rules[*].host | awk '/namespace-fqdn/ {print $2}'
      

      Example of system response:

      barbican.it.just.works
      cinder.it.just.works
      cloudformation.it.just.works
      designate.it.just.works
      glance.it.just.works
      heat.it.just.works
      horizon.it.just.works
      keystone.it.just.works
      neutron.it.just.works
      nova.it.just.works
      novncproxy.it.just.works
      octavia.it.just.works
      placement.it.just.works
      
    • If you do not have a corporate DNS server, perform one of the following steps:

      • Add the appropriate records to /etc/hosts locally. For example:

        10.172.1.101 barbican.it.just.works
        10.172.1.101 cinder.it.just.works
        10.172.1.101 cloudformation.it.just.works
        10.172.1.101 designate.it.just.works
        10.172.1.101 glance.it.just.works
        10.172.1.101 heat.it.just.works
        10.172.1.101 horizon.it.just.works
        10.172.1.101 keystone.it.just.works
        10.172.1.101 neutron.it.just.works
        10.172.1.101 nova.it.just.works
        10.172.1.101 novncproxy.it.just.works
        10.172.1.101 octavia.it.just.works
        10.172.1.101 placement.it.just.works
        
      • Deploy your DNS server on top of Kubernetes:

        1. Deploy a standalone CoreDNS server by including the following configuration into coredns.yaml:

          apiVersion: lcm.mirantis.com/v1alpha1
          kind: HelmBundle
          metadata:
            name: coredns
            namespace: osh-system
          spec:
            repositories:
            - name: hub_stable
              url: https://charts.helm.sh/stable
            releases:
            - name: coredns
              chart: hub_stable/coredns
              version: 1.8.1
              namespace: coredns
              values:
                image:
                  repository: mirantis.azurecr.io/openstack/extra/coredns
                  tag: "1.6.9"
                isClusterService: false
                servers:
                - zones:
                  - zone: .
                    scheme: dns://
                    use_tcp: false
                  port: 53
                  plugins:
                  - name: cache
                    parameters: 30
                  - name: errors
                  # Serves a /health endpoint on :8080, required for livenessProbe
                  - name: health
                  # Serves a /ready endpoint on :8181, required for readinessProbe
                  - name: ready
                  # Required to query kubernetes API for data
                  - name: kubernetes
                    parameters: cluster.local
                  - name: loadbalance
                    parameters: round_robin
                  # Serves a /metrics endpoint on :9153, required for serviceMonitor
                  - name: prometheus
                    parameters: 0.0.0.0:9153
                  - name: forward
                    parameters: . /etc/resolv.conf
                  - name: file
                    parameters: /etc/coredns/it.just.works.db it.just.works
                serviceType: LoadBalancer
                zoneFiles:
                - filename: it.just.works.db
                  domain: it.just.works
                  contents: |
                    it.just.works.            IN      SOA     sns.dns.icann.org. noc.dns.icann.org. 2015082541 7200 3600 1209600 3600
                    it.just.works.            IN      NS      b.iana-servers.net.
                    it.just.works.            IN      NS      a.iana-servers.net.
                    it.just.works.            IN      A       1.2.3.4
                    *.it.just.works.           IN      A      1.2.3.4
          
        2. Update the public IP address of the Ingress service:

          sed -i 's/1.2.3.4/10.172.1.101/' coredns.yaml
          kubectl apply -f coredns.yaml
          
        3. Verify that the DNS resolution works properly:

          1. Assign an external IP to the service:

            kubectl -n coredns patch service coredns-coredns --type='json' -p='[{"op": "replace", "path": "/spec/ports", "value": [{"name": "udp-53", "port": 53, "protocol": "UDP", "targetPort": 53}]}]'
            kubectl -n coredns patch service coredns-coredns --type='json' -p='[{"op": "replace", "path": "/spec/type", "value":"LoadBalancer"}]'
            
          2. Obtain the external IP address of CoreDNS:

            kubectl -n coredns get service coredns-coredns
            

            Example of system response:

            NAME              TYPE           CLUSTER-IP     EXTERNAL-IP    PORT(S)         AGE
            coredns-coredns   LoadBalancer   10.96.178.21   10.172.1.102   53/UDP,53/TCP   25h
            
        4. Point your machine to use the correct DNS. It is 10.172.1.102 in the example system response above.

        5. If you plan to launch Tempest tests or use the OpenStack client from a keystone-client-XXX pod, verify that the Kubernetes built-in DNS service is configured to resolve your public FQDN records by adding your public domain to Corefile. For example, to add the it.just.works domain:

          kubectl -n kube-system get configmap coredns -oyaml
          

          Example of system response:

          apiVersion: v1
          data:
            Corefile: |
              .:53 {
                  errors
                  health
                  ready
                  kubernetes cluster.local in-addr.arpa ip6.arpa {
                    pods insecure
                    fallthrough in-addr.arpa ip6.arpa
                  }
                  prometheus :9153
                  forward . /etc/resolv.conf
                  cache 30
                  loop
                  reload
                  loadbalance
              }
              it.just.works:53 {
                  errors
                  cache 30
                  forward . 10.96.178.21
              }
          
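          To add such a block, you can edit the CoreDNS ConfigMap directly. Because the Corefile in the example includes the reload plugin, CoreDNS picks up the change automatically after a short delay:

          kubectl -n kube-system edit configmap coredns
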
Access your OpenStack environment

This section explains how to access your OpenStack environment as the Admin user.

Before you proceed, verify that you can access the Kubernetes API and have privileges to read secrets from the openstack namespace in Kubernetes or you are able to exec to the pods in this namespace.

Access OpenStack using the Kubernetes built-in admin CLI

You can use the built-in admin CLI client and execute the openstack CLI commands from a dedicated pod deployed in the openstack namespace:

kubectl -n openstack exec \
  $(kubectl -n openstack get pod -l application=keystone,component=client -ojsonpath='{.items[*].metadata.name}') \
  -ti -- bash

This pod has python-openstackclient and all required plugins already installed. Also, this pod has cloud admin credentials stored as appropriate shell environment variables for the openstack CLI command to consume.
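
For one-off commands, you can also run the client non-interactively. For example:

kubectl -n openstack exec $(kubectl -n openstack get pod -l application=keystone,component=client -ojsonpath='{.items[0].metadata.name}') -- openstack endpoint list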

Access an OpenStack environment through Horizon
  1. Configure the external DNS resolution for OpenStack services as described in Configure DNS to access OpenStack.

  2. Obtain the password of the Admin user:

    kubectl -n openstack get secret keystone-keystone-admin -ojsonpath='{.data.OS_PASSWORD}' | base64 -d
    
  3. Access Horizon through your browser using its public service. For example, https://horizon.it.just.works.

    To log in, specify the admin user name and default domain. If the OpenStack Identity service has been deployed with the OpenID Connect integration:

    1. From the Authenticate using drop-down menu, select OpenID Connect.

    2. Click Connect. You will be redirected to your identity provider to proceed with the authentication.

    Note

    If OpenStack has been deployed with self-signed TLS certificates for public endpoints, you may get a warning about an untrusted certificate. To proceed, allow the connection.

Access OpenStack through CLI from your local machine

To be able to access your OpenStack environment using CLI, you need to set the required environment variables that are stored in an OpenStack RC environment file. You can either download a project-specific file from Horizon, which is the easiest way, or create an environment file.

To access OpenStack through CLI, select from the following options:

  • Download and source the OpenStack RC file:

    1. Log in to Horizon as described in Access an OpenStack environment through Horizon.

    2. Download the openstackrc or clouds.yaml file from the Web interface.

    3. On any shell from which you want to run OpenStack commands, source the environment file for the respective project.

  • Create and source the OpenStack RC file:

    1. Configure the external DNS resolution for OpenStack services as described in Configure DNS to access OpenStack.

    2. Create a stub of the OpenStack RC file:

      cat << EOF > openstackrc
      export OS_PASSWORD=$(kubectl -n openstack get secret keystone-keystone-admin -ojsonpath='{.data.OS_PASSWORD}' | base64 -d)
      export OS_USERNAME=admin
      export OS_USER_DOMAIN_NAME=Default
      export OS_PROJECT_NAME=admin
      export OS_PROJECT_DOMAIN_NAME=Default
      export OS_REGION_NAME=RegionOne
      export OS_INTERFACE=public
      export OS_IDENTITY_API_VERSION="3"
      EOF
      
    3. Add the Keystone public endpoint to this file as the OS_AUTH_URL variable. For example, for the domain name used throughout this guide:

      echo export OS_AUTH_URL=https://keystone.it.just.works >> openstackrc
      
    4. Source the obtained data into the shell:

      source openstackrc
      

      Now, you can use the openstack CLI as usual. For example:

      openstack user list
      +----------------------------------+-----------------+
      | ID                               | Name            |
      +----------------------------------+-----------------+
      | dc23d2d5ee3a4b8fae322e1299f7b3e6 | internal_cinder |
      | 8d11133d6ef54349bd014681e2b56c7b | admin           |
      +----------------------------------+-----------------+
      

      Note

      If OpenStack was deployed with self-signed TLS certificates for public endpoints, you may need to use the openstack CLI client with certificate validation disabled. For example:

      openstack --insecure user list
      
Troubleshoot an OpenStack deployment

This section provides the general debugging instructions for your OpenStack on Kubernetes deployment. Start your troubleshooting by determining the failing component, which can be the OpenStack Operator, Helm, or a particular pod or service.

Debugging the Helm releases

Note

MOSK uses direct communication with Helm 3.

Verify the Helm releases statuses
  1. Log in to the openstack-controller pod, where the Helm v3 client is installed, or download the Helm v3 binary locally:

    kubectl -n osh-system get pods | grep openstack-controller
    

    Example of a system response:

    openstack-controller-5c6947c996-vlrmv            5/5     Running     0          10m
    openstack-controller-admission-f946dc8d6-6bgn2   1/1     Running     0          4h3m
    
  2. Verify the Helm releases statuses:

    helm3 --namespace openstack list --all
    

    Example of a system response:

    NAME                            NAMESPACE       REVISION        UPDATED                                 STATUS          CHART                           APP VERSION
    etcd                            openstack       4               2021-07-09 11:06:25.377538008 +0000 UTC deployed        etcd-0.1.0-mcp-2735
    ingress-openstack               openstack       4               2021-07-09 11:06:24.892822083 +0000 UTC deployed        ingress-0.1.0-mcp-2735
    openstack-barbican              openstack       4               2021-07-09 11:06:25.733684392 +0000 UTC deployed        barbican-0.1.0-mcp-3890
    openstack-ceph-rgw              openstack       4               2021-07-09 11:06:25.045759981 +0000 UTC deployed        ceph-rgw-0.1.0-mcp-2735
    openstack-cinder                openstack       4               2021-07-09 11:06:42.702963544 +0000 UTC deployed        cinder-0.1.0-mcp-3890
    openstack-designate             openstack       4               2021-07-09 11:06:24.400555027 +0000 UTC deployed        designate-0.1.0-mcp-3890
    openstack-glance                openstack       4               2021-07-09 11:06:25.5916904 +0000 UTC deployed        glance-0.1.0-mcp-3890
    openstack-heat                  openstack       4               2021-07-09 11:06:25.3998706 +0000 UTC deployed        heat-0.1.0-mcp-3890
    openstack-horizon               openstack       4               2021-07-09 11:06:23.27538297 +0000 UTC deployed        horizon-0.1.0-mcp-3890
    openstack-iscsi                 openstack       4               2021-07-09 11:06:37.891858343 +0000 UTC deployed        iscsi-0.1.0-mcp-2735            v1.0.0
    openstack-keystone              openstack       4               2021-07-09 11:06:24.878052272 +0000 UTC deployed        keystone-0.1.0-mcp-3890
    openstack-libvirt               openstack       4               2021-07-09 11:06:38.185312907 +0000 UTC deployed        libvirt-0.1.0-mcp-2735
    openstack-mariadb               openstack       4               2021-07-09 11:06:24.912817378 +0000 UTC deployed        mariadb-0.1.0-mcp-2735
    openstack-memcached             openstack       4               2021-07-09 11:06:24.852840635 +0000 UTC deployed        memcached-0.1.0-mcp-2735
    openstack-neutron               openstack       4               2021-07-09 11:06:58.96398517 +0000 UTC deployed        neutron-0.1.0-mcp-3890
    openstack-neutron-rabbitmq      openstack       4               2021-07-09 11:06:51.454918432 +0000 UTC deployed        rabbitmq-0.1.0-mcp-2735
    openstack-nova                  openstack       4               2021-07-09 11:06:44.277976646 +0000 UTC deployed        nova-0.1.0-mcp-3890
    openstack-octavia               openstack       4               2021-07-09 11:06:24.775069513 +0000 UTC deployed        octavia-0.1.0-mcp-3890
    openstack-openvswitch           openstack       4               2021-07-09 11:06:55.271711021 +0000 UTC deployed        openvswitch-0.1.0-mcp-2735
    openstack-placement             openstack       4               2021-07-09 11:06:21.954550107 +0000 UTC deployed        placement-0.1.0-mcp-3890
    openstack-rabbitmq              openstack       4               2021-07-09 11:06:25.431404853 +0000 UTC deployed        rabbitmq-0.1.0-mcp-2735
    openstack-tempest               openstack       2               2021-07-09 11:06:21.330801212 +0000 UTC deployed        tempest-0.1.0-mcp-3890
    

    If a Helm release is not in the DEPLOYED state, obtain the details from the output of the following command:

    helm3 --namespace openstack  history <release-name>
    
Verify the status of a Helm release

To verify the status of a Helm release:

helm3 --namespace openstack status <release-name>

Example of a system response:

NAME: openstack-memcached
LAST DEPLOYED: Fri Jul  9 11:06:24 2021
NAMESPACE: openstack
STATUS: deployed
REVISION: 4
TEST SUITE: None
Debugging the OpenStack Controller

The OpenStack Controller is running in several containers in the openstack-controller-xxxx pod in the osh-system namespace. For the full list of containers and their roles, refer to OpenStack Controller.

To verify the status of the OpenStack Controller, run:

kubectl -n osh-system get pods

Example of a system response:

NAME                                  READY   STATUS    RESTARTS   AGE
openstack-controller-5c6947c996-vlrmv            5/5     Running     0          17m
openstack-controller-admission-f946dc8d6-6bgn2   1/1     Running     0          4h9m
openstack-operator-ensure-resources-5ls8k        0/1     Completed   0          4h12m

To verify the logs for the osdpl container, run:

kubectl -n osh-system logs -f <openstack-controller-xxxx> -c osdpl
Debugging the OsDpl CR

This section includes the ways to mitigate the most common issues with the OsDpl CR. We assume that you have already debugged the Helm releases and the OpenStack Controller to rule out possible failures with these components as described in Debugging the Helm releases and Debugging the OpenStack Controller.

The osdpl has DEPLOYED=false

Possible root cause: One or more Helm releases have not been deployed successfully.

To determine if you are affected:

Verify the status of the osdpl object:

kubectl -n openstack get osdpl osh-dev

Example of a system response:

NAME      AGE   DEPLOYED   DRAFT
osh-dev   22h   false      false

To debug the issue:

  1. Identify the failed release by assessing the status:children section in the OsDpl resource:

    1. Get the OsDpl YAML file:

      kubectl -n openstack get osdpl osh-dev -o yaml
      
    2. Analyze the status output using the detailed description in Status OsDpl elements Removed.

  2. For further debugging, refer to Debugging the Helm releases.

Some pods are stuck in Init

Possible root cause: MOSK uses the Kubernetes entrypoint init container to resolve dependencies between objects. If the pod is stuck in Init:0/X, this pod may be waiting for its dependencies.

To debug the issue:

Verify the missing dependencies:

kubectl -n openstack logs -f placement-api-84669d79b5-49drw -c init

Example of a system response:

Entrypoint WARNING: 2020/04/21 11:52:50 entrypoint.go:72: Resolving dependency Job placement-ks-user in namespace openstack failed: Job Job placement-ks-user in namespace openstack is not completed yet .
Entrypoint WARNING: 2020/04/21 11:52:52 entrypoint.go:72: Resolving dependency Job placement-ks-endpoints in namespace openstack failed: Job Job placement-ks-endpoints in namespace openstack is not completed yet .
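
In this example, the pod waits for the placement-ks-user and placement-ks-endpoints jobs. To continue debugging, inspect the corresponding jobs and their logs:

kubectl -n openstack get jobs | grep placement-ks
kubectl -n openstack logs job/placement-ks-user
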
Some Helm releases are not present

Possible root cause: some OpenStack services depend on Ceph. These services include OpenStack Image, OpenStack Compute, and OpenStack Block Storage. If the Helm releases for these services are not present, the openstack-ceph-keys secret may be missing in the openstack-ceph-shared namespace.

To debug the issue:

Verify that the Ceph Controller has created the openstack-ceph-keys secret in the openstack-ceph-shared namespace:

kubectl -n openstack-ceph-shared get secrets openstack-ceph-keys

Example of a positive system response: