Mirantis OpenStack for Kubernetes Documentation

This documentation provides information on how to deploy and operate a Mirantis OpenStack for Kubernetes (MOS) environment. It is intended to help operators understand the core concepts of the product and provides sufficient information to deploy and operate the solution.

The information provided in this documentation set is continuously improved and amended based on feedback and requests from MOS consumers.

The documentation set you are reading includes the following guides:

Guide list

  • Reference Architecture - Learn the fundamentals of MOS reference architecture to appropriately plan your deployment

  • Deployment Guide - Deploy a MOS environment of a preferred configuration using supported deployment profiles tailored to the demands of specific business cases

  • Operations Guide - Operate your MOS environment

  • Release Notes - Learn about new features and bug fixes in the current MOS version

Intended audience

This documentation is intended for engineers who have basic knowledge of Linux, virtualization and containerization technologies, the Kubernetes API and CLI, Helm and Helm charts, Mirantis Kubernetes Engine (MKE), and OpenStack.

Technology Preview support scope

This documentation set includes descriptions of Technology Preview features. A Technology Preview feature provides early access to upcoming product innovations, allowing customers to experience the functionality and provide feedback during the development process. Technology Preview features may be privately or publicly available, but in either case they are not intended for production use. While Mirantis will provide support for such features through official channels, normal Service Level Agreements do not apply. Customers may be supported by Mirantis Customer Support or Mirantis Field Support.

As Mirantis considers making future iterations of Technology Preview features generally available, we will attempt to resolve any issues that customers experience when using these features.

During the development of a Technology Preview feature, additional components may become available to the public for testing. Because Technology Preview features are still under development, Mirantis cannot guarantee their stability. As a result, if you are using Technology Preview features, you may not be able to seamlessly upgrade to subsequent releases of that feature. Mirantis makes no guarantees that Technology Preview features will be graduated to a generally available product release.

The Mirantis Customer Success Organization may create bug reports based on support cases filed by customers. These bug reports will then be forwarded to the Mirantis Product team for possible inclusion in a future release.

Documentation history

The following list contains the released revisions of the documentation set you are reading:

  • November 05, 2020 - MOS GA release

  • December 23, 2020 - MOS GA Update release

  • March 01, 2021 - MOS 21.1

  • April 22, 2021 - MOS 21.2

  • June 15, 2021 - MOS 21.3

  • September 01, 2021 - MOS 21.4

  • October 05, 2021 - MOS 21.5

Conventions

This documentation set uses the following conventions in the HTML format:

Documentation conventions

  • boldface font - Inline CLI tools and commands, titles of procedures and system response examples, table titles

  • monospaced font - File names and paths, Helm chart parameters and their values, package names, node names and labels, and so on

  • italic font - Information that distinguishes a concept or term

  • Links - External links and cross-references, footnotes

  • Main menu > menu item - GUI elements that include any part of the interactive user interface and menu navigation

  • Superscript - Some extra, brief information

  • The Note block - Messages of a generic meaning that may be useful to the user

  • The Caution block - Information that prevents a user from making mistakes and experiencing undesirable consequences when following the procedures

  • The Warning block - Messages with details that can be easily missed, but should not be ignored by the user and are valuable before proceeding

  • The See also block - Lists of references that may be helpful for understanding related tools, concepts, and so on

  • The Learn more block - Used in the Release Notes to wrap a list of internal references to the reference architecture, deployment, and operation procedures specific to a newly implemented product feature

Product Overview

Mirantis OpenStack for Kubernetes (MOS) combines the power of Mirantis Container Cloud delivered and managed Kubernetes clusters with the industry-standard OpenStack APIs, enabling you to build your own cloud infrastructure.

The advantages of running all of the OpenStack components as a Kubernetes application are multi-fold and include the following:

  • Zero-downtime, non-disruptive updates

  • Fully automated Day-2 operations

  • Full-stack management from bare metal through the operating system and all the necessary components

The list of the most common use cases includes:

Software-defined data center

A traditional data center requires multiple requests and interactions to deploy new services; by abstracting the data center functionality behind a standardised set of APIs, services can be deployed faster and more efficiently. MOS enables you to define all your data center resources behind the industry-standard OpenStack APIs, allowing you to automate the deployment of applications or simply request resources through the UI to quickly and efficiently provision virtual machines, storage, networking, and other resources.

Virtual Network Functions (VNFs)

VNFs require high-performance systems that can be accessed on demand in a standardised way, with assurances that they will have access to the necessary resources and performance guarantees when needed. MOS provides extensive support for VNF workloads, enabling easy access to functionality such as Intel EPA (NUMA, CPU pinning, Huge Pages) as well as the consumption of specialised network interface cards to support SR-IOV and DPDK. The centralised management model of MOS and Mirantis Container Cloud also enables the easy management of multiple MOS deployments with full lifecycle management.

Legacy workload migration

With the industry moving toward cloud-native technologies, many older or legacy applications cannot be moved easily, and it often does not make financial sense to transform them into cloud-native applications. MOS provides a stable cloud platform that can cost-effectively host legacy applications whilst still providing the expected levels of control, customization, and uptime.

Reference Architecture

Mirantis OpenStack for Kubernetes (MOS) is a virtualization platform that provides an infrastructure for cloud-ready applications, in combination with reliability and full control over the data.

MOS combines OpenStack, an open-source cloud infrastructure software, with application management techniques used in the Kubernetes ecosystem, including container isolation, state enforcement, declarative definition of deployments, and others.

MOS integrates with Mirantis Container Cloud and relies on its capabilities for bare-metal infrastructure provisioning, Kubernetes cluster management, and continuous delivery of the stack components.

MOS simplifies the work of a cloud operator by automating all major cloud life cycle management routines including cluster updates and upgrades.

Deployment profiles

A Mirantis OpenStack for Kubernetes (MOS) deployment profile is a thoroughly tested and officially supported reference architecture that is guaranteed to work at a specific scale and is tailored to the demands of a specific business case, such as generic IaaS cloud, Network Function Virtualisation infrastructure, Edge Computing, and others.

A deployment profile is defined as a combination of:

  • Services and features the cloud offers to its users.

  • Non-functional characteristics that users and operators should expect when running the profile on top of a reference hardware configuration, including but not limited to:

    • Performance characteristics, such as an average network throughput between VMs in the same virtual network.

    • Reliability characteristics, such as the cloud API error response rate when recovering a failed controller node.

    • Scalability characteristics, such as the total amount of virtual routers tenants can run simultaneously.

  • Hardware requirements - the specification of physical servers and networking equipment required to run the profile in production.

  • Deployment parameters that a cloud operator can tweak within a certain range without the risk of breaking the cloud or losing support.

In addition, the following items may be included in a definition:

  • Compliance-driven technical requirements, such as TLS encryption of all external API endpoints.

  • Foundation-level software components, such as Tungsten Fabric or Open vSwitch as a back end for the networking service.

Note

Mirantis reserves the right to revise the technical implementation of any profile at will while preserving its definition - the functional and non-functional characteristics that operators and users are known to rely on.

MOS supports a set of deployment profiles to address a wide variety of business tasks. The list below includes the profiles for the most common use cases.

Note

Some components of a MOS cluster are mandatory and are installed during the managed cluster deployment by Mirantis Container Cloud regardless of the deployment profile in use. StackLight is one of the cluster components that are enabled by default. See the Mirantis Container Cloud Operations Guide for details.

Supported deployment profiles

Cloud Provider Infrastructure (CPI)

OpenStackDeployment CR preset: compute

Provides the core set of services an IaaS vendor would need, including some extra functionality. The profile is designed to support up to 50-70 compute nodes and a reasonable number of storage nodes. 0

The core set of services provided by the profile includes:

  • Compute (Nova)

  • Images (Glance)

  • Networking (Neutron with Open vSwitch as a back end)

  • Identity (Keystone)

  • Block Storage (Cinder)

  • Orchestration (Heat)

  • Load balancing (Octavia)

  • DNS (Designate)

  • Secret Management (Barbican)

  • Web front end (Horizon)

  • Bare metal provisioning (Ironic) 1 2

  • Telemetry (aodh, Panko, Ceilometer, and Gnocchi) 3

CPI with Tungsten Fabric

OpenStackDeployment CR preset: compute-tf

A variation of the CPI profile 1 with Tungsten Fabric as a back end for networking.

0 - The supported node count is approximate and may vary depending on the hardware, cloud configuration, and planned workload.

1 - Ironic is an optional component for the CPI profile. See Bare metal OsDpl configuration for details.

2 - Ironic is not supported for the CPI with Tungsten Fabric profile. See Tungsten Fabric known limitations for details.

3 - Telemetry services are optional components and should be enabled together through the list of services to be deployed in the OpenStackDeployment CR as described in Deploy an OpenStack cluster.

Components overview

Mirantis OpenStack for Kubernetes (MOS) includes the following key design elements.

HelmBundle Operator

The HelmBundle Operator is the realization of the Kubernetes Operator pattern that provides a Kubernetes custom resource of the HelmBundle kind and code running inside a pod in Kubernetes. This code handles changes, such as creation, update, and deletion, in the Kubernetes resources of this kind by deploying, updating, and deleting groups of Helm releases from specified Helm charts with specified values.
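For example, you can inspect the HelmBundle resources present in a cluster with standard kubectl commands. The osh-system namespace and the openstack-operator release name below are taken from the admission controller example later in this section and may differ in your deployment:

kubectl -n osh-system get helmbundle

kubectl -n osh-system get helmbundle openstack-operator -oyaml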

OpenStack

The OpenStack platform manages virtual infrastructure resources, including virtual servers, storage devices, networks, and networking services, such as load balancers, as well as provides management functions to the tenant users.

Various OpenStack services are running as pods in Kubernetes and are represented as appropriate native Kubernetes resources, such as Deployments, StatefulSets, and DaemonSets.
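For example, after a MOS deployment completes, you can list the Kubernetes resources that represent the OpenStack services. The openstack namespace below is the default one used throughout this guide:

kubectl -n openstack get deployments,statefulsets,daemonsets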

For a simple, resilient, and flexible deployment of OpenStack and related services on top of a Kubernetes cluster, MOS uses OpenStack-Helm, which provides the required collection of Helm charts.

Also, MOS uses OpenStack Operator as the realization of the Kubernetes Operator pattern. The OpenStack Operator provides a custom Kubernetes resource of the OpenStackDeployment kind and code running inside a pod in Kubernetes. This code handles changes such as creation, update, and deletion in the Kubernetes resources of this kind by deploying, updating, and deleting groups of the Helm releases.

Ceph

Ceph is a distributed storage platform that provides storage resources, such as objects and virtual block devices, to virtual and physical infrastructure.

MOS uses Rook as the implementation of the Kubernetes Operator pattern that manages resources of the CephCluster kind. Rook deploys and manages Ceph services as pods on top of Kubernetes to provide Ceph-based storage to consumers, which include OpenStack services, such as the Volume and Image services, and the underlying Kubernetes cluster through Ceph CSI (Container Storage Interface).

The Ceph controller is the implementation of the Kubernetes Operator pattern that manages resources of the MiraCeph kind to simplify management of Rook-based Ceph clusters.

StackLight Logging, Monitoring, and Alerting

The StackLight component is responsible for collection, analysis, and visualization of critical monitoring data from physical and virtual infrastructure, as well as alerting and error notifications through a configured communication system, such as email. StackLight includes the following key sub-components:

  • Prometheus

  • Elasticsearch

  • Fluentd

  • Kibana

Requirements

MOS cluster hardware requirements

This section provides hardware requirements for the Mirantis Container Cloud management cluster with a managed Mirantis OpenStack for Kubernetes (MOS) cluster.

For installing MOS, the Mirantis Container Cloud management cluster and managed cluster must be deployed with the bare metal provider.

Note

One of the industry best practices is to verify every new update or configuration change in a non-customer-facing environment before applying it to production. Therefore, Mirantis recommends having a staging cloud, deployed and maintained along with the production clouds. The recommendation is especially applicable to the environments that:

  • Receive updates often and use continuous delivery. For example, any non-isolated deployment of Mirantis Container Cloud and Mirantis OpenStack for Kubernetes (MOS).

  • Have significant deviations from the reference architecture or third party extensions installed.

  • Are managed under the Mirantis OpsCare program.

  • Run business-critical workloads where even the slightest application downtime is unacceptable.

A typical staging cloud is a complete copy of the production environment including the hardware and software configurations, but with a bare minimum of compute and storage capacity.

The MOS reference architecture includes the following node types:

  • Mirantis Container Cloud management cluster nodes

    The Container Cloud management cluster architecture on bare metal requires three physical servers for manager nodes. On these hosts, we deploy a Kubernetes cluster with services that provide Container Cloud control plane functions.

  • OpenStack control plane node and StackLight node

    Host the OpenStack control plane services, such as database, messaging, API, schedulers, conductors, and L3 and L2 agents, as well as the StackLight components.

    Note

    As of MOS 21.4, you can collocate the OpenStack control plane with the managed cluster master nodes on the OpenStack deployments of a small size.

    This feature is available as technical preview. Use such configuration for testing and evaluation purposes only.

  • Tenant gateway node

    Optional. Hosts the OpenStack gateway services, including L2, L3, and DHCP agents. The tenant gateway nodes are combined with OpenStack control plane nodes. The strict requirement is a dedicated physical network (bond) for tenant network traffic.

  • Tungsten Fabric control plane node

    Required only if Tungsten Fabric (TF) is enabled as a back end for the OpenStack networking. These nodes host the TF control plane services such as Cassandra database, messaging, API, control, and configuration services.

  • Tungsten Fabric analytics node

    Required only if TF is enabled as a back end for the OpenStack networking. These nodes host the TF analytics services such as Cassandra, ZooKeeper and collector.

  • Compute node

    Hosts OpenStack Compute services such as QEMU, L2 agents, and others.

  • Infrastructure nodes

    Run the underlying Kubernetes cluster management services. The MOS reference configuration requires a minimum of three infrastructure nodes.

The list below specifies the hardware resources that the MOS reference architecture recommends for each node type.

Hardware requirements

For each node type, the figures below list the number of servers, CPU cores per server, RAM per server, disk space per server, and the number of NICs per server.

  • Mirantis Container Cloud management cluster node: 3 servers 0, 16 cores, 128 GB RAM, 1 SSD x 960 GB and 2 SSD x 1900 GB 1, 3 NICs 2

  • OpenStack control plane, gateway 3, and StackLight nodes: 3 servers, 32 cores, 128 GB RAM, 1 SSD x 500 GB and 2 SSD x 1000 GB 6, 5 NICs

  • Tenant gateway (optional): 0-3 servers, 32 cores, 128 GB RAM, 1 SSD x 500 GB, 5 NICs

  • Tungsten Fabric control plane nodes 4: 3 servers, 16 cores, 64 GB RAM, 1 SSD x 500 GB, 1 NIC

  • Tungsten Fabric analytics nodes 4: 3 servers, 32 cores, 64 GB RAM, 1 SSD x 1000 GB, 1 NIC

  • Compute node: 3 servers (varies), 16 cores, 64 GB RAM, 1 SSD x 500 GB 7, 5 NICs

  • Infrastructure node (Kubernetes cluster management): 3 servers 8, 16 cores, 64 GB RAM, 1 SSD x 500 GB, 5 NICs

  • Infrastructure node (Ceph) 5: 3 servers, 16 cores, 64 GB RAM, 1 SSD x 500 GB and 2 HDD x 2000 GB, 5 NICs

Note

The exact hardware specifications and number of nodes depend on a cloud configuration and scaling needs.

0 - Adding more than 3 nodes to a management or regional cluster is not supported.

1 - In total, at least 3 disks are required:

  • sda - system storage, minimum 60 GB

  • sdb - Container Cloud services storage, not less than 110 GB. The exact capacity requirements depend on the StackLight data retention period.

  • sdc - persistent storage for Ceph

See Management cluster storage for details.

2 - The OOB management (IPMI) port is not included.

3 - OpenStack gateway services can optionally be moved to separate nodes.

4 - TF control plane and analytics nodes can be combined on the same hardware hosts with a respective addition of RAM, CPU, and disk space. However, Mirantis does not recommend such a configuration for production environments because it increases the risk of cluster downtime if one of the nodes unexpectedly fails.

5 - A Ceph cluster with 3 Ceph nodes does not provide hardware fault tolerance and is not eligible for recovery operations, such as a disk or an entire node replacement. A Ceph cluster uses a replication factor of 3. If the number of Ceph OSDs is less than 3, the Ceph cluster moves to the degraded state with write operations restricted until the number of alive Ceph OSDs equals the replication factor again.

6 - 1 SSD x 500 GB for the operating system, 1 SSD x 1000 GB for OpenStack LVP, and 1 SSD x 1000 GB for StackLight LVP.

7 - When Nova is used with local folders, additional capacity is required depending on the size of VM images.

8 - For node hardware requirements, refer to Container Cloud Reference Architecture: Managed cluster hardware configuration.

Note

If you are looking to try MOS and do not have much hardware at your disposal, you can deploy it in a virtual environment, for example, on top of another OpenStack cloud using the sample Heat templates.

The tooling is provided for reference only and is not a part of the product itself. Mirantis does not guarantee its interoperability with the latest MOS version.

Management cluster storage

The management cluster requires a minimum of three storage devices per node. Each device is used for a different type of storage:

  • One storage device for boot partitions and root file system. SSD is recommended. A RAID device is not supported.

  • One storage device per server is reserved for local persistent volumes. These volumes are served by the Local Storage Static Provisioner (local-volume-provisioner) and are used by many services of Mirantis Container Cloud.

  • At least one disk per server must be configured as a device managed by a Ceph OSD.

  • The minimal recommended number of Ceph OSDs for the management cluster is 2 OSDs per node, for a total of 6 OSDs.

  • The recommended replication factor is 3, which ensures that no data is lost if any single node of the management cluster fails.

You can configure host storage devices using BareMetalHostProfile resources.

System requirements for the seed node

The seed node is only necessary to deploy the management cluster. When the bootstrap is complete, the bootstrap node can be discarded and added back to the MOS cluster as a node of any type.

The minimum reference system requirements for a baremetal-based bootstrap seed node are as follows:

  • Basic Ubuntu 18.04 server with the following configuration:

    • Kernel of version 4.15.0-76.86 or later

    • 8 GB of RAM

    • 4 CPU

    • 10 GB of free disk space for the bootstrap cluster cache

  • No DHCP or TFTP servers on any NIC networks

  • Routable access to the IPMI network of the hardware servers

  • Internet access for downloading all required artifacts

    If you use a firewall or proxy, make sure that the bootstrap, management, and regional clusters have access to the following IP ranges and domain names:

    • IP ranges:

    • Domain names:

      • mirror.mirantis.com and repos.mirantis.com for packages

      • binary.mirantis.com for binaries and Helm charts

      • mirantis.azurecr.io for Docker images

      • mcc-metrics-prod-ns.servicebus.windows.net:9093 for Telemetry (port 443 if proxy is enabled)

      • mirantis.my.salesforce.com for Salesforce alerts

    Note

    • Access to Salesforce is required from any Container Cloud cluster type.

    • If any additional Alertmanager notification receiver is enabled, for example, Slack, its endpoint must also be accessible from the cluster.

Components collocation

MOS uses Kubernetes labels to place components onto hosts. For the default locations of components, see MOS cluster hardware requirements. Additionally, MOS supports component collocation. This is mostly useful for OpenStack compute and Ceph nodes. For component collocation, consider the following recommendations:

  • When calculating hardware requirements for nodes, consider the requirements for all collocated components.

  • When performing maintenance on a node with collocated components, execute the maintenance plan for all of them.

  • When combining other services with the OpenStack compute host, verify that the reserved_host_* settings are increased according to the needs of the collocated components by using node-specific overrides for the compute service (see the sketch below).
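The following snippet is a minimal sketch of such an override through the spec:nodes section described in Node-specific settings. The node label, the nova_compute DaemonSet name, and the conf:nova:DEFAULT:reserved_host_memory_mb path are assumptions based on the openstack-helm nova chart and may differ in your environment; treat the snippet as an illustration rather than an exact configuration:

spec:
  nodes:
    openstack-compute-node::enabled:
      services:
        compute:
          nova:
            nova_compute:
              values:
                conf:
                  nova:
                    DEFAULT:
                      # Memory (MB) reserved for collocated components (example value)
                      reserved_host_memory_mb: 16384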

Infrastructure requirements

This section lists the infrastructure requirements for the Mirantis OpenStack for Kubernetes (MOS) reference architecture.

Infrastructure requirements

  • MetalLB - MetalLB exposes external IP addresses to access applications in a Kubernetes cluster.

  • DNS - The Kubernetes Ingress NGINX controller is used to expose OpenStack services outside of a Kubernetes deployment. Access to the Ingress services is allowed only by their FQDNs. Therefore, DNS is a mandatory infrastructure service for an OpenStack on Kubernetes deployment.
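As an illustration, if public_domain_name is set to it.just.works as in the OsDpl example later in this guide, a wildcard DNS record for that domain should resolve to the external IP address that MetalLB assigns to the Ingress service. The service name below is an assumption and may differ in your deployment:

# Determine the external IP address assigned to the Ingress service
kubectl -n openstack get svc ingress
# Create a wildcard DNS record resolving to that IP, for example:
# *.it.just.works.  IN  A  <EXTERNAL-IP>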

OpenStack

OpenStack Operator

The OpenStack Operator component is a combination of the following entities:

OpenStack Controller

The OpenStack Controller runs in a set of containers in a pod in Kubernetes. The OpenStack Controller is deployed as a Deployment with 1 replica only. The failover is provided by Kubernetes that automatically restarts the failed containers in a pod.

However, given the recommendation to use a separate Kubernetes cluster for each OpenStack deployment, the controller is envisioned to manage only a single OpenStackDeployment resource, making proper HA much less of an issue.

The OpenStack Controller is written in Python using Kopf, as a Python framework to build Kubernetes operators, and Pykube, as a Kubernetes API client.

Using Kubernetes API, the controller subscribes to changes to resources of kind: OpenStackDeployment, and then reacts to these changes by creating, updating, or deleting appropriate resources in Kubernetes.

The basic child resources managed by the controller are Helm releases. They are rendered from templates taking into account an appropriate values set from the main and features fields in the OpenStackDeployment resource.

Then, the common fields are merged to resulting data structures. Lastly, the services fields are merged providing the final and precise override for any value in any Helm release to be deployed or upgraded.

The constructed values are then used by the OpenStack Controller during a Helm release installation.
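The sketch below illustrates this precedence only; the keys under common and services are illustrative and not a recommendation to override them (see the Warning in the services description). A value set under spec:services takes effect over the same value coming from spec:common, which in turn overrides what the templates derive from spec:features:

spec:
  common:
    openstack:
      values:
        # Passed to all OpenStack Helm charts (illustrative key)
        pod:
          replicas:
            api: 3
  services:
    # Final, per-chart override; the service and chart names are illustrative
    compute:
      nova:
        values:
          pod:
            replicas:
              api: 5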

OpenStack Controller containers

  • osdpl - The core container that handles changes in the osdpl object.

  • helmbundle - The container that watches the helmbundle objects and reports their statuses to the osdpl object in status:children. See Status OsDpl elements for details.

  • health - The container that watches all Kubernetes native resources, such as Deployments, Daemonsets, Statefulsets, and reports their statuses to the osdpl object in status:health. See Status OsDpl elements for details.

  • secrets - The container that provides data exchange between different components such as Ceph.

  • node - The container that handles the node events.

Figure: OpenStack Controller architecture

OpenStackDeployment admission controller

The CustomResourceDefinition resource in Kubernetes uses the OpenAPI Specification version 2 to specify the schema of the resource defined. The Kubernetes API outright rejects the resources that do not pass this schema validation.

The language of the schema, however, is not expressive enough to define a specific validation logic that may be needed for a given resource. For this purpose, Kubernetes enables the extension of its API with Dynamic Admission Control.

For the OpenStackDeployment (OsDpl) CR, the ValidatingAdmissionWebhook is a natural choice. It is deployed as part of the OpenStack Controller by default and performs specific extended validations when an OsDpl CR is created or updated.

A non-exhaustive list of additional validations includes:

  • Deny the OpenStack version downgrade

  • Deny the OpenStack version skip-level upgrade

  • Deny the OpenStack master version deployment

  • Deny upgrade to the OpenStack master version

  • Deny upgrade if any part of an OsDpl CR specification changes along with the OpenStack version

Under specific circumstances, it may be viable to disable the admission controller, for example, when you attempt to deploy or upgrade to the master version of OpenStack.

Warning

Mirantis does not support MOS deployments performed without the OpenStackDeployment admission controller enabled. Disabling of the OpenStackDeployment admission controller is only allowed in staging non-production environments.

To disable the admission controller, ensure that the following structures and values are present in the openstack-controller HelmBundle resource:

apiVersion: lcm.mirantis.com/v1alpha1
kind: HelmBundle
metadata:
  name: openstack-operator
  namespace: osh-system
spec:
  releases:
  - name: openstack-operator
    values:
      admission:
        enabled: false

At that point, all safeguards except for those expressed by the CR definition are disabled.

OpenStackDeployment custom resource

The resource of kind OpenStackDeployment (OsDpl) is a custom resource (CR) defined by a resource of kind CustomResourceDefinition. This section provides a detailed overview of the OsDpl configuration, including the definition of its main elements as well as the configuration of extra OpenStack services that do not belong to standard deployment profiles.

OsDpl standard configuration

The detailed information about the schema of an OpenStackDeployment (OsDpl) custom resource can be obtained by running:

kubectl get crd openstackdeployments.lcm.mirantis.com -oyaml

The definition of a particular OpenStack deployment can be obtained by running:

kubectl -n openstack get osdpl -oyaml

Example of an OsDpl CR of minimum configuration:

apiVersion: lcm.mirantis.com/v1alpha1
kind: OpenStackDeployment
metadata:
  name: openstack-cluster
  namespace: openstack
spec:
  openstack_version: ussuri
  preset: compute
  size: tiny
  internal_domain_name: cluster.local
  public_domain_name: it.just.works
  features:
    ssl:
      public_endpoints:
        api_cert: |-
          The public key certificate of the OpenStack public endpoints followed by
          the certificates of any intermediate certificate authorities which
          establishes a chain of trust up to the root CA certificate.
        api_key: |-
          The private key of the certificate for the OpenStack public endpoints.
          This key must match the public key used in the api_cert.
        ca_cert: |-
          The public key certificate of the root certificate authority.
          If you do not have one, use the top-most intermediate certificate instead.
    neutron:
      tunnel_interface: ens3
      external_networks:
        - physnet: physnet1
          interface: veth-phy
          bridge: br-ex
          network_types:
           - flat
          vlan_ranges: null
          mtu: null
      floating_network:
        enabled: False
    nova:
      live_migration_interface: ens3
      images:
        backend: local

For the detailed description of the OsDpl main elements, see sections below:


Main OsDpl elements
apiVersion

Specifies the version of the Kubernetes API that is used to create this object.

kind

Specifies the kind of the object.

metadata:name

Specifies the name of the resource in metadata. Should be set in compliance with the Kubernetes resource naming limitations.

metadata:namespace

Specifies the metadata namespace. While it is technically possible to deploy OpenStack on top of Kubernetes in a namespace other than openstack, such a configuration is not included in the MOS system integration test plans. Therefore, we do not recommend such a scenario.

Warning

Both OpenStack and Kubernetes platforms provide resources to applications. When OpenStack is running on top of Kubernetes, Kubernetes is completely unaware of OpenStack-native workloads, such as virtual machines, for example.

For better results and stability, Mirantis recommends using a dedicated Kubernetes cluster for OpenStack, so that OpenStack and auxiliary services, Ceph, and StackLight are the only Kubernetes applications running in the cluster.

spec

Contains the data that defines the OpenStack deployment and configuration. It has both high-level and low-level sections.

The very basic values that must be provided include:

spec:
  openstack_version:
  preset:
  size:
  internal_domain_name:
  public_domain_name:

For the detailed description of the spec subelements, see Spec OsDpl elements.


Spec OsDpl elements
openstack_version

Specifies the OpenStack release to deploy.

preset

String that specifies the name of the preset, a predefined configuration for the OpenStack cluster. A preset includes:

  • A set of enabled services that includes virtualization, bare metal management, secret management, and others

  • Major features provided by the services, such as VXLAN encapsulation of the tenant traffic

  • Integration of services

Every supported deployment profile incorporates an OpenStack preset. Refer to Deployment profiles for the list of possible values.

size

String that specifies the size category for the OpenStack cluster. The size category defines the internal configuration of the cluster, such as the number of replicas for service workers, timeouts, and so on.

The list of supported sizes includes:

  • tiny - for approximately 10 OpenStack compute nodes

  • small - for approximately 50 OpenStack compute nodes

  • medium - for approximately 100 OpenStack compute nodes

internal_domain_name

Specifies the internal DNS name used inside the Kubernetes cluster on top of which the OpenStack cloud is deployed.

public_domain_name

Specifies the public DNS name for OpenStack services. This is a base DNS name that must be accessible and resolvable by API clients of your OpenStack cloud. It will be present in the OpenStack endpoints as presented by the OpenStack Identity service catalog.

The TLS certificates used by the OpenStack services (see below) must also be issued to this DNS name.

persistent_volume_storage_class

Specifies the Kubernetes storage class name used for services to create persistent volumes. For example, backups of MariaDB. If not specified, the storage class marked as default will be used.
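Structure example (replace the placeholder with the name of an existing storage class in your cluster):

spec:
  persistent_volume_storage_class: <KUBERNETES-STORAGE-CLASS-NAME>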

features

Contains the top-level collections of settings for the OpenStack deployment that potentially target several OpenStack services. The section where the customizations should take place.

features:services

Contains a list of extra OpenStack services to deploy. Extra OpenStack services are services that are not included in the preset.

features:services:object-storage

Available since MOS Ussuri Update

Enables the object storage and provides a RADOS Gateway Swift API that is compatible with the OpenStack Swift API. To enable the service, add object-storage to the service list:

spec:
  features:
    services:
    - object-storage

To create the RADOS Gateway pool in Ceph, see Container Cloud Operations Guide: Enable Ceph RGW Object Storage.

features:services:instance-ha

Available since MOS 21.2 TechPreview

Enables Masakari, the OpenStack service that ensures high availability of instances running on a host. To enable the service, add instance-ha to the service list:

spec:
  features:
    services:
    - instance-ha
features:services:tempest

Enables tests against a deployed OpenStack cloud:

spec:
  features:
    services:
    - tempest
features:ssl

Contains the content of SSL/TLS certificates (server, key, CA bundle) used to enable a secure communication to public OpenStack API services.

These certificates must be issued to the DNS domain specified in the public_domain_name field.

features:neutron:tunnel_interface

Defines the name of the NIC device on the actual host that will be used for Neutron.

We recommend setting up your Kubernetes hosts in such a way that networking is configured identically on all of them, and names of the interfaces serving the same purpose or plugged into the same network are consistent across all physical nodes.

features:neutron:dns_servers

Defines the list of IPs of DNS servers that are accessible from virtual networks. Used as default DNS servers for VMs.
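Structure example (the addresses below are placeholders for DNS servers reachable from your virtual networks):

spec:
  features:
    neutron:
      dns_servers:
        - 8.8.8.8
        - 8.8.4.4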

features:neutron:external_networks

Contains the data structure that defines external (provider) networks on top of which the Neutron networking will be created.

features:neutron:floating_network

If enabled, must contain the data structure defining the floating IP network that will be created for Neutron to provide external access to your Nova instances.

features:nova:live_migration_interface

Specifies the name of the NIC device on the actual host that will be used by Nova for the live migration of instances.

We recommend setting up your Kubernetes hosts in such a way that networking is configured identically on all of them, and names of the interfaces serving the same purpose or plugged into the same network are consistent across all physical nodes.

features:barbican:backends:vault

Specifies the object containing the Vault parameters to connect to Barbican. The list of supported options includes:

  • enabled - boolean parameter indicating that the Vault back end is enabled.

  • approle_role_id - Vault app role ID.

  • approle_secret_id - secret ID created for the app role.

  • vault_url - URL of the Vault server.

  • use_ssl - enables the SSL encryption. Since MOS does not currently support the Vault SSL encryption, the use_ssl parameter should be set to false.

  • kv_mountpoint TechPreview - optional, specifies the mountpoint of a Key-Value store in Vault to use.

  • namespace TechPreview - optional, specifies the Vault namespace to use with all requests to Vault.

    Note

    The Vault namespaces feature is available only in Vault Enterprise.

    Note

    Vault namespaces are supported only starting from the OpenStack Victoria release.

If the Vault back end is used, configure it properly using the following parameters:

spec:
  features:
    barbican:
      backends:
        vault:
          enabled: true
          approle_role_id: <APPROLE_ROLE_ID>
          approle_secret_id: <APPROLE_SECRET_ID>
          vault_url: <VAULT_SERVER_URL>
          use_ssl: false

Note

Since MOS does not currently support the Vault SSL encryption, set the use_ssl parameter to false.

features:nova:images:backend

Defines the type of storage for Nova to use on the compute hosts for the images that back the instances.

The list of supported options includes:

  • local - the local storage is used. The pros include faster operation and failure domain independence from the external storage. The cons include local space consumption and less performant and robust live migration with block migration.

  • ceph - instance images are stored in a Ceph pool shared across all Nova hypervisors. The pros include faster image start and faster and more robust live migration. The cons include considerably slower IO performance and a direct dependency of workload operations on Ceph cluster availability and performance.

  • lvm Available since MOS 21.2, Technical Preview - instance images and ephemeral images are stored on a local Logical Volume. If specified, features:nova:images:lvm:volume_group must be set to an available LVM Volume Group, by default, nova-vol. For details, see Enable LVM ephemeral storage. A configuration sketch follows this list.
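A minimal configuration sketch for the LVM back end, assuming the default nova-vol Volume Group is available on the compute hosts:

spec:
  features:
    nova:
      images:
        backend: lvm
        lvm:
          volume_group: nova-vol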

features:keystone:keycloak

Defines parameters to connect to the Keycloak identity provider. For details, see Integration with Identity Access Management (IAM).

features:keystone:domain_specific_configuration

Defines the domain-specific configuration and is useful for integration with LDAP. An example of OsDpl with LDAP integration, which will create a separate domain.with.ldap domain and configure it to use LDAP as an identity driver:

spec:
  features:
    keystone:
      domain_specific_configuration:
        enabled: true
        domains:
        - name: domain.with.ldap
          enabled: true
          config:
            assignment:
              driver: keystone.assignment.backends.sql.Assignment
            identity:
              driver: ldap
            ldap:
              chase_referrals: false
              group_desc_attribute: description
              group_id_attribute: cn
              group_member_attribute: member
              group_name_attribute: ou
              group_objectclass: groupOfNames
              page_size: 0
              password: XXXXXXXXX
              query_scope: sub
              suffix: dc=mydomain,dc=com
              url: ldap://ldap01.mydomain.com,ldap://ldap02.mydomain.com
              user: uid=openstack,ou=people,o=mydomain,dc=com
              user_enabled_attribute: enabled
              user_enabled_default: false
              user_enabled_invert: true
              user_enabled_mask: 0
              user_id_attribute: uid
              user_mail_attribute: mail
              user_name_attribute: uid
              user_objectclass: inetOrgPerson
features:telemetry:mode

Specifies the Telemetry mode, which determines the permitted actions for the Telemetry services. The only supported value is autoscaling, which allows for autoscaling of instances with HOT templates according to predefined conditions related to the load of an instance and rules in the alarming service. Support for the accounting mode is under development.

Structure example:

spec:
  features:
    telemetry:
      mode: "autoscaling"

Caution

To enable the Telemetry mode, the corresponding services including the alarming, event, metering, and metric services should be specified in spec:features:services:

spec:
  features:
    services:
    - alarming
    - event
    - metering
    - metric
features:logging

Specifies the standard logging levels for OpenStack services that include the following, at increasing severity: TRACE, DEBUG, INFO, AUDIT, WARNING, ERROR, and CRITICAL. For example:

spec:
  features:
    logging:
      nova:
        level: DEBUG
features:horizon:themes

Available since MOS Ussuri Update

Defines the list of custom OpenStack Dashboard themes. Content of the archive file with a theme depends on the level of customization and can include static files, Django templates, and other artifacts. For the details, refer to OpenStack official documentation: Customizing Horizon Themes.

spec:
  features:
    horizon:
      themes:
        - name: theme_name
          description: The brand new theme
          url: https://<path to .tgz file with the contents of custom theme>
          sha256summ: <SHA256 checksum of the archive above>
features:policies

Available since MOS 21.4

Defines the list of custom policies for OpenStack services.

Structure example:

spec:
  features:
    policies:
      nova:
        custom_policy: custom_value

The list of services available for configuration includes: Cinder, Nova, Designate, Keystone, Glance, Neutron, Heat, Octavia, Barbican, Placement, Ironic, aodh, Panko, Gnocchi, and Masakari.

Caution

Mirantis is not responsible for cloud operability in case of modifications of the default policies but provides an API to pass the required configuration to the core OpenStack services.

features:database:cleanup

Available since MOS 21.6

Defines the cleanup of stale database entries that are marked as deleted by OpenStack services. The cleanup scripts run on a periodic basis as cron jobs. By default, database entries older than 30 days are cleaned up every Monday according to the following schedule (server time):

  • Cinder - 12:01 a.m.

  • Nova - 01:01 a.m.

  • Glance - 02:01 a.m.

  • Masakari - 03:01 a.m.

  • Barbican - 04:01 a.m.

  • Heat - 05:01 a.m.

The list of services available for configuration includes: Barbican, Cinder, Glance, Heat, Masakari, and Nova.

Structure example:

spec:
  features:
    database:
      cleanup:
        <os-service>:
          enabled:
          schedule:
          age: 30
          batch: 1000
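For example, the following sketch explicitly sets the default cleanup behavior for Nova, assuming that schedule accepts a standard cron expression (Mondays at 01:01 a.m. server time, as in the schedule above):

spec:
  features:
    database:
      cleanup:
        nova:
          enabled: true
          # Mondays at 01:01 a.m. server time (standard cron syntax assumed)
          schedule: "1 1 * * 1"
          age: 30
          batch: 1000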
artifacts

A low-level section that defines the base URI prefixes for images and binary artifacts.

common

A low-level section that defines values that will be passed to all OpenStack (spec:common:openstack) or auxiliary (spec:common:infra) services Helm charts.

Structure example:

spec:
  artifacts:
  common:
    openstack:
      values:
    infra:
      values:
services

The lowest-level section, which enables the definition of specific values to pass to specific Helm charts on a one-by-one basis.

Warning

Mirantis does not recommend changing the default settings for spec:artifacts, spec:common, and spec:services elements. Customizations can compromise the OpenStack deployment update and upgrade processes. However, you may need to edit the spec:services section to limit hardware resources in case of a hyperconverged architecture as described in Limit HW resources for hyperconverged OpenStack compute nodes.


Status OsDpl elements
The status element

Contains information about the current status of an OpenStack deployment, which cannot be changed by the user.

status:children

Specifies the current status of Helm releases that are managed by the OpenStack Operator. The possible values include:

  • True - when the Helm chart is in the deployed state.

  • False - when an error occurred during the Helm chart deployment.

An example of children output:

children:
  openstack-block-storage: true
  openstack-compute: true
  openstack-coordination: true
  ...
status:deployed

Shows an overall status of all Helm releases. Shows True when all children are in the deployed state.
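For example, you can query this flag directly with kubectl. The resource name openstack-cluster matches the minimal OsDpl example earlier in this document and may differ in your deployment:

kubectl -n openstack get osdpl openstack-cluster -o jsonpath='{.status.deployed}'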

status:fingerprint

Is the MD5 hash of the body:spec object. It is passed to values of all child Helm releases as part of the lcm.mirantis.com/v1alpha1 structure. Also, status:fingerprint enables detecting the OsDpl resource version used when installing a Helm release.

status:version

Contains the version of the OpenStack Operator that processes the OsDpl resource. Similarly to fingerprint, it enables detecting the version of the OpenStack Operator that processed the child Helm release.

status:health

While status:children shows information about any deployed child Helm releases, status:health shows the actual status of the deployed Kubernetes resources. The resources may be created as different Kubernetes objects, such as Deployments, Statefulsets, or DaemonSets. Possible values include:

  • Ready - all pods from the resource are in the Ready state

  • Unhealthy - not all pods from the resource are in the Ready state

  • Progressing - Kubernetes resource is updating

  • Unknown - other, unrecognized states

An example of the health output:

health:
  barbican:
    api:
      generation: 4
      status: Ready
    rabbitmq:
      generation: 1
      status: Ready
  cinder:
    api:
      generation: 4
      status: Ready
    backup:
      generation: 2
      status: Ready
    rabbitmq:
      generation: 1
      status: Ready
    ...
  ...
status:kopf

Contains the structure that is used by the Kopf library to store its internal data.

Integration with Identity Access Management (IAM)

Mirantis Container Cloud uses the Identity and Access Management (IAM) service for user and permission management. This section describes how you can integrate your OpenStack deployment with Keycloak through OpenID Connect.

To enable integration on the OpenStack side, define the following parameters in your openstackdeployment custom resource:

spec:
  features:
    keystone:
      keycloak:
        enabled: true
        url: <https://my-keycloak-instance>
        # optionally ssl cert validation might be disabled
        oidc:
           OIDCSSLValidateServer: false
           OIDCOAuthSSLValidateServer: false

The configuration above will trigger the creation of the os client in Keycloak. Role management and assignment should be configured separately for a particular deployment.

Bare metal OsDpl configuration

The Bare metal (Ironic) service is an extra OpenStack service that can be deployed by the OpenStack Operator. This section provides the baremetal-specific configuration options of the OsDpl resource.

To install bare metal services, add the baremetal keyword to the spec:features:services list:

spec:
  features:
    services:
      - baremetal

Note

All bare metal services are scheduled to the nodes with the openstack-control-plane: enabled label.

Ironic agent deployment images

To provision a user image onto a bare metal server, Ironic boots a node with a ramdisk image. Depending on the node’s deploy interface and hardware, the ramdisk may require different drivers (agents). MOS provides tinyIPA-based ramdisk images and uses the direct deploy interface with the ipmitool power interface.

Example of agent_images configuration:

spec:
  features:
    ironic:
       agent_images:
         base_url: https://binary.mirantis.com/openstack/bin/ironic/tinyipa
         initramfs: tinyipa-stable-ussuri-20200617101427.gz
         kernel: tinyipa-stable-ussuri-20200617101427.vmlinuz

Since the bare metal nodes hardware may require additional drivers, you may need to build a deploy ramdisk for particular hardware. For more information, see Ironic Python Agent Builder. Be sure to create a ramdisk image with the version of Ironic Python Agent appropriate for your OpenStack release.

Bare metal networking

Ironic supports the flat and multitenancy networking modes.

The flat networking mode assumes that all bare metal nodes are pre-connected to a single network that cannot be changed during the virtual machine provisioning.

Example of the OsDpl resource illustrating the configuration for the flat network mode:

spec:
  features:
    services:
      - baremetal
    neutron:
      external_networks:
        - bridge: ironic-pxe
          interface: <baremetal-interface>
          network_types:
            - flat
          physnet: ironic
          vlan_ranges: null
    ironic:
       # The name of neutron network used for provisioning/cleaning.
       baremetal_network_name: ironic-provisioning
       networks:
         # Neutron baremetal network definition.
         baremetal:
           physnet: ironic
           name: ironic-provisioning
           network_type: flat
           external: true
           shared: true
           subnets:
             - name: baremetal-subnet
               range: 10.13.0.0/24
               pool_start: 10.13.0.100
               pool_end: 10.13.0.254
               gateway: 10.13.0.11
       # The name of interface where provision services like tftp and ironic-conductor
       # are bound.
       provisioning_interface: br-baremetal

The multitenancy network mode uses the neutron Ironic network interface to share physical connection information with Neutron. This information is handled by Neutron ML2 drivers when plugging a Neutron port to a specific network. MOS supports the networking-generic-switch Neutron ML2 driver out of the box.

Example of the OsDpl resource illustrating the configuration for the multitenancy network mode:

spec:
  features:
    services:
      - baremetal
    neutron:
      tunnel_interface: ens3
      external_networks:
        - physnet: physnet1
          interface: <physnet1-interface>
          bridge: br-ex
          network_types:
            - flat
          vlan_ranges: null
          mtu: null
        - physnet: ironic
          interface: <physnet-ironic-interface>
          bridge: ironic-pxe
          network_types:
            - vlan
          vlan_ranges: 1000:1099
    ironic:
      # The name of interface where provision services like tftp and ironic-conductor
      # are bound.
      provisioning_interface: <baremetal-interface>
      baremetal_network_name: ironic-provisioning
      networks:
        baremetal:
          physnet: ironic
          name: ironic-provisioning
          network_type: vlan
          segmentation_id: 1000
          external: true
          shared: false
          subnets:
            - name: baremetal-subnet
              range: 10.13.0.0/24
              pool_start: 10.13.0.100
              pool_end: 10.13.0.254
              gateway: 10.13.0.11
Node-specific settings

Available since MOS Ussuri Update

Depending on the use case, you may need to configure the same application components differently on different hosts. MOS enables you to easily perform the required configuration through node-specific overrides at the OpenStack Controller side.

The limitation of using the node-specific overrides is that they override only the configuration settings, while other components, such as startup scripts, may need to be reconfigured as well.

Caution

The overrides have been implemented in a similar way to the OpenStack node and node label specific DaemonSet configurations. However, the OpenStack Controller node-specific settings conflict with the upstream OpenStack node and node label specific DaemonSet configurations. Therefore, we do not recommend configuring node and node label overrides.

The node-specific settings are activated through the spec:nodes section of the OsDpl CR. The spec:nodes section contains the following subsections:

  • features - implements overrides for a limited subset of fields and is constructed similarly to spec::features

  • services - similarly to spec::services, enables you to override settings in general for the components running as DaemonSets.

Example configuration:

spec:
  nodes:
    <NODE-LABEL>::<NODE-LABEL-VALUE>:
      features:
        # Detailed information about features might be found at
        # openstack_controller/admission/validators/nodes/schema.yaml
      services:
        <service>:
          <chart>:
            <chart_daemonset_name>:
              values:
                # Any value from specific helm chart
OpenStackDeploymentStatus custom resource

Available since MOS 21.5

The resource of kind OpenStackDeploymentStatus (OsDplSt) is a custom resource that describes the status of an OpenStack deployment.

OpenStackDeploymentStatus overview

Available since MOS 21.5

To obtain detailed information about the schema of an OpenStackDeploymentStatus (OsDplSt) custom resource, run:

kubectl get crd openstackdeploymentstatus.lcm.mirantis.com -oyaml

To obtain the status definition for a particular OpenStack deployment, run:

kubectl -n openstack get osdplst -oyaml

Example of an OsDplSt CR:

kind: OpenStackDeploymentStatus
metadata:
  name: osh-dev
  namespace: openstack
spec: {}
status:
  handle:
    lastStatus: update
  health:
    barbican:
      api:
        generation: 2
        status: Ready
    cinder:
      api:
        generation: 2
        status: Ready
      backup:
        generation: 1
        status: Ready
      scheduler:
        generation: 1
        status: Ready
      volume:
        generation: 1
        status: Ready
  osdpl:
    cause: update
    changes: '((''add'', (''status'',), None, {''watched'': {''ceph'': {''secret'':
      {''hash'': ''0fc01c5e2593bc6569562b451b28e300517ec670809f72016ff29b8cbaf3e729''}}}}),)'
    controller_version: 0.5.3.dev12
    fingerprint: a112a4a7d00c0b5b79e69a2c78c3b50b0caca76a15fe7d79a6ad1305b19ee5ec
    openstack_version: ussuri
    state: APPLIED
    timestamp: "2021-09-08 17:01:45.633143"
  services:
    baremetal:
      controller_version: 0.5.3.dev12
      fingerprint: a112a4a7d00c0b5b79e69a2c78c3b50b0caca76a15fe7d79a6ad1305b19ee5ec
      openstack_version: ussuri
      state: APPLIED
      timestamp: "2021-09-08 17:00:54.081353"
    block-storage:
      controller_version: 0.5.3.dev12
      fingerprint: a112a4a7d00c0b5b79e69a2c78c3b50b0caca76a15fe7d79a6ad1305b19ee5ec
      openstack_version: ussuri
      state: APPLIED
      timestamp: "2021-09-08 17:00:57.306669"
    compute:
      controller_version: 0.5.3.dev12
      fingerprint: a112a4a7d00c0b5b79e69a2c78c3b50b0caca76a15fe7d79a6ad1305b19ee5ec
      openstack_version: ussuri
      state: APPLIED
      timestamp: "2021-09-08 17:01:18.853068"
    coordination:
      controller_version: 0.5.3.dev12
      fingerprint: a112a4a7d00c0b5b79e69a2c78c3b50b0caca76a15fe7d79a6ad1305b19ee5ec
      openstack_version: ussuri
      state: APPLIED
      timestamp: "2021-09-08 17:01:00.593719"
    dashboard:
      controller_version: 0.5.3.dev12
      fingerprint: a112a4a7d00c0b5b79e69a2c78c3b50b0caca76a15fe7d79a6ad1305b19ee5ec
      openstack_version: ussuri
      state: APPLIED
      timestamp: "2021-09-08 17:00:57.652145"
    database:
      controller_version: 0.5.3.dev12
      fingerprint: a112a4a7d00c0b5b79e69a2c78c3b50b0caca76a15fe7d79a6ad1305b19ee5ec
      openstack_version: ussuri
      state: APPLIED
      timestamp: "2021-09-08 17:01:00.233777"
    dns:
      controller_version: 0.5.3.dev12
      fingerprint: a112a4a7d00c0b5b79e69a2c78c3b50b0caca76a15fe7d79a6ad1305b19ee5ec
      openstack_version: ussuri
      state: APPLIED
      timestamp: "2021-09-08 17:00:56.540886"
    identity:
      controller_version: 0.5.3.dev12
      fingerprint: a112a4a7d00c0b5b79e69a2c78c3b50b0caca76a15fe7d79a6ad1305b19ee5ec
      openstack_version: ussuri
      state: APPLIED
      timestamp: "2021-09-08 17:01:00.961175"
    image:
      controller_version: 0.5.3.dev12
      fingerprint: a112a4a7d00c0b5b79e69a2c78c3b50b0caca76a15fe7d79a6ad1305b19ee5ec
      openstack_version: ussuri
      state: APPLIED
      timestamp: "2021-09-08 17:00:58.976976"
    ingress:
      controller_version: 0.5.3.dev12
      fingerprint: a112a4a7d00c0b5b79e69a2c78c3b50b0caca76a15fe7d79a6ad1305b19ee5ec
      openstack_version: ussuri
      state: APPLIED
      timestamp: "2021-09-08 17:01:01.440757"
    key-manager:
      controller_version: 0.5.3.dev12
      fingerprint: a112a4a7d00c0b5b79e69a2c78c3b50b0caca76a15fe7d79a6ad1305b19ee5ec
      openstack_version: ussuri
      state: APPLIED
      timestamp: "2021-09-08 17:00:51.822997"
    load-balancer:
      controller_version: 0.5.3.dev12
      fingerprint: a112a4a7d00c0b5b79e69a2c78c3b50b0caca76a15fe7d79a6ad1305b19ee5ec
      openstack_version: ussuri
      state: APPLIED
      timestamp: "2021-09-08 17:01:02.462824"
    memcached:
      controller_version: 0.5.3.dev12
      fingerprint: a112a4a7d00c0b5b79e69a2c78c3b50b0caca76a15fe7d79a6ad1305b19ee5ec
      openstack_version: ussuri
      state: APPLIED
      timestamp: "2021-09-08 17:01:03.165045"
    messaging:
      controller_version: 0.5.3.dev12
      fingerprint: a112a4a7d00c0b5b79e69a2c78c3b50b0caca76a15fe7d79a6ad1305b19ee5ec
      openstack_version: ussuri
      state: APPLIED
      timestamp: "2021-09-08 17:00:58.637506"
    networking:
      controller_version: 0.5.3.dev12
      fingerprint: a112a4a7d00c0b5b79e69a2c78c3b50b0caca76a15fe7d79a6ad1305b19ee5ec
      openstack_version: ussuri
      state: APPLIED
      timestamp: "2021-09-08 17:01:35.553483"
    object-storage:
      controller_version: 0.5.3.dev12
      fingerprint: a112a4a7d00c0b5b79e69a2c78c3b50b0caca76a15fe7d79a6ad1305b19ee5ec
      openstack_version: ussuri
      state: APPLIED
      timestamp: "2021-09-08 17:01:01.828834"
    orchestration:
      controller_version: 0.5.3.dev12
      fingerprint: a112a4a7d00c0b5b79e69a2c78c3b50b0caca76a15fe7d79a6ad1305b19ee5ec
      openstack_version: ussuri
      state: APPLIED
      timestamp: "2021-09-08 17:01:02.846671"
    placement:
      controller_version: 0.5.3.dev12
      fingerprint: a112a4a7d00c0b5b79e69a2c78c3b50b0caca76a15fe7d79a6ad1305b19ee5ec
      openstack_version: ussuri
      state: APPLIED
      timestamp: "2021-09-08 17:00:58.039210"
    redis:
      controller_version: 0.5.3.dev12
      fingerprint: a112a4a7d00c0b5b79e69a2c78c3b50b0caca76a15fe7d79a6ad1305b19ee5ec
      openstack_version: ussuri
      state: APPLIED
      timestamp: "2021-09-08 17:00:36.562673"

For the detailed description of the OsDplSt main elements, see the sections below:


Health elements

The health subsection provides a brief summary of the health of the OpenStack services.

OsDpl elements

The osdpl subsection describes the overall status of the OpenStack deployment and consists of the following items:

cause

The cause that triggered the LCM action: update when OsDpl is updated, resume when the OpenStack controller is restarted.

changes

A string representation of changes in the OpenStackDeployment object.

controller_version

The version of openstack-controller that handles the LCM action.

fingerprint

The SHA sum of the OpenStackDeployment object spec section.

openstack_version

The current OpenStack version specified in the osdpl object.

state

The current state of the LCM action. Possible values include:

  • APPLYING - not all operations are completed.

  • APPLIED - all operations are completed.

timestamp

The timestamp of the status:osdpl section update.

Services elements

The services subsection provides detailed information about the LCM operations performed on a specific service. This is a dictionary where keys are service names, for example, baremetal or compute, and values are dictionaries with the following items:

controller_version

The version of the openstack-controller that handles the LCM action on a specific service.

fingerprint

The SHA sum of the OpenStackDeployment object spec section used when performing the LCM action on a specific service.

openstack_version

The OpenStack version specified in the osdpl object used when performing the LCM action on a specific service.

state

The current state of the LCM action performed on a service. Possible values include:

  • WAITING - waiting for dependencies.

  • APPLYING - not all operations are completed.

  • APPLIED - all operations are completed.

timestamp

The timestamp of the status:services:<SERVICE-NAME> section update.

OpenStack on Kubernetes architecture

OpenStack and auxiliary services are running as containers in the kind: Pod Kubernetes resources. All long-running services are governed by one of the ReplicationController-enabled Kubernetes resources, which include either kind: Deployment, kind: StatefulSet, or kind: DaemonSet.

The placement of the services is mostly governed by the Kubernetes node labels. The labels affecting the OpenStack services include:

  • openstack-control-plane=enabled - the node hosting most of the OpenStack control plane services.

  • openstack-compute-node=enabled - the node serving as a hypervisor for Nova. The virtual machines with tenant workloads are created there.

  • openvswitch=enabled - the node hosting the Neutron L2 agents and Open vSwitch pods that manage L2 connectivity of the OpenStack networks.

  • openstack-gateway=enabled - the node hosting the Neutron L3, Metadata, and DHCP agents, as well as the Octavia Health Manager, Worker, and Housekeeping components.

_images/os-k8s-pods-layout.png
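
For illustration, the following minimal sketch shows how these labels might appear on a Kubernetes Node object. The node name is hypothetical, and in practice the labels are assigned through the deployment tooling rather than edited manually:

apiVersion: v1
kind: Node
metadata:
  name: kaas-node-compute-01   # hypothetical node name
  labels:
    openstack-compute-node: enabled
    openvswitch: enabled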

Note

OpenStack is an infrastructure management platform. Mirantis OpenStack for Kubernetes (MOS) uses Kubernetes mostly for orchestration and dependency isolation. As a result, multiple OpenStack services are running as privileged containers with host PIDs and Host Networking enabled. You must ensure that at least the user with the credentials used by Helm/Tiller (administrator) is capable of creating such Pods.

Infrastructure services

Service

Description

Storage

While the underlying Kubernetes cluster is configured to use Ceph CSI for providing persistent storage for container workloads, for some types of workloads such networked storage is suboptimal due to latency.

This is why the separate local-volume-provisioner CSI is deployed and configured as an additional storage class. Local Volume Provisioner is deployed as kind: DaemonSet.

Database

A single WSREP (Galera) cluster of MariaDB is deployed as the SQL database to be used by all OpenStack services. It uses the storage class provided by Local Volume Provisioner to store the actual database files. The service is deployed as kind: StatefulSet of a given size, which is no less than 3, on any openstack-control-plane node. For details, see OpenStack database architecture.

Messaging

RabbitMQ is used as a messaging bus between the components of the OpenStack services.

A separate instance of RabbitMQ is deployed for each OpenStack service that needs a messaging bus for intercommunication between its components.

An additional, separate RabbitMQ instance is deployed to serve as a notification message bus where OpenStack services post their own notifications and listen to notifications from other services. StackLight also uses this message bus to collect notifications for monitoring purposes.

Each RabbitMQ instance is a single node and is deployed as kind: StatefulSet.

Caching

A single multi-instance Memcached service is deployed to be used by all OpenStack services that need caching, which are mostly HTTP API services.

Coordination

A separate instance of etcd is deployed to be used by Cinder, which requires Distributed Lock Management for coordination between its components.

Ingress

The Ingress controller is deployed as kind: DaemonSet.

Image pre-caching

A special kind: DaemonSet is deployed and updated each time the kind: OpenStackDeployment resource is created or updated. Its purpose is to pre-cache container images on Kubernetes nodes, and thus, to minimize possible downtime when updating container images.

This is especially useful for containers used in kind: DaemonSet resources, as during the image update Kubernetes starts to pull the new image only after the container with the old image is shut down.

OpenStack services

Service

Description

Identity (Keystone)

Uses MySQL back end by default.

keystoneclient - a separate kind: Deployment with a pod that has the OpenStack CLI client and relevant plugins installed, and OpenStack admin credentials mounted. Can be used by the administrator to manually interact with OpenStack APIs from within a cluster.

Image (Glance)

Supported back end is RBD (Ceph is required).

Volume (Cinder)

Supported back end is RBD (Ceph is required).

Network (Neutron)

Supported back ends are Open vSwitch and Tungsten Fabric.

Placement

Compute (Nova)

Supported hypervisor is Qemu/KVM through libvirt library.

Dashboard (Horizon)

DNS (Designate)

Supported back end is PowerDNS.

Load Balancer (Octavia)

RADOS Gateway Object Storage (Swift) Available since MOS Ussuri Update

Provides the object storage and a RADOS Gateway Swift API that is compatible with the OpenStack Swift API. You can manually enable the service in the OpenStackDeployment CR as described in Deploy an OpenStack cluster.

Instance HA (Masakari) Available since MOS 21.2, Technical Preview

An OpenStack service that ensures high availability of instances running on a host. You can manually enable Masakari in the OpenStackDeployment CR as described in Deploy an OpenStack cluster.

Orchestration (Heat)

Key Manager (Barbican)

The supported back ends include:

  • The built-in Simple Crypto, which is used by default

  • Vault

    Vault by HashiCorp is a third-party system and is not installed by MOS. Hence, the Vault storage back end should be available elsewhere on the user environment and accessible from the MOS deployment.

    If the Vault back end is used, you can configure Vault in the OpenStackDeployment CR as described in Deploy an OpenStack cluster.
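
    The following is a hedged sketch of a possible Vault configuration in the OpenStackDeployment CR. The parameter names and layout are assumptions for illustration only and must be verified against Deploy an OpenStack cluster:

    spec:
      features:
        barbican:
          backends:
            vault:
              enabled: true
              vault_url: https://vault.example.com:8200   # hypothetical external Vault endpoint
              approle_role_id: <APPROLE-ROLE-ID>
              approle_secret_id: <APPROLE-SECRET-ID>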

Tempest

Runs tests against a deployed OpenStack cloud. You can manually enable Tempest in the OpenStackDeployment CR as described in Deploy an OpenStack cluster.

Telemetry

Telemetry services include alarming (aodh), event storage (Panko), metering (Ceilometer), and metric (Gnocchi). All services should be enabled together through the list of services to be deployed in the OpenStackDeployment CR as described in Deploy an OpenStack cluster.
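
The following is a hedged sketch of how such a list of services might look in the OpenStackDeployment CR. The exact parameter path and service names are assumptions and must be verified against Deploy an OpenStack cluster:

spec:
  features:
    services:
      - alarming
      - event
      - metering
      - metric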

OpenStack database architecture

A complete setup of a MariaDB Galera cluster for OpenStack is illustrated in the following image:

_images/os-k8s-mariadb-galera.png

MariaDB server pods are running a Galera multi-master cluster. Clients requests are forwarded by the Kubernetes mariadb service to the mariadb-server pod that has the primary label. Other pods from the mariadb-server StatefulSet have the backup label. Labels are managed by the mariadb-controller pod.

The MariaDB controller periodically checks the readiness of the mariadb-server pods and assigns the primary label to a pod if the following requirements are met:

  • The primary label has not already been set on the pod.

  • The pod is in the ready state.

  • The pod is not being terminated.

  • The pod name has the lowest integer suffix among other ready pods in the StatefulSet. For example, between mariadb-server-1 and mariadb-server-2, the pod with the mariadb-server-1 name is preferred.

Otherwise, the MariaDB controller sets the backup label. This means that all SQL requests are passed only to one node, while the other two nodes are in the backup state and replicate the state from the primary node. The MariaDB clients connect to the mariadb service.

OpenStack and Ceph controllers integration

The integration between Ceph and OpenStack controllers is implemented through the shared Kubernetes openstack-ceph-shared namespace. Both controllers have access to this namespace to read and write the Kubernetes kind: Secret objects.

_images/osctl-ceph-integration.png

As Ceph is the required and only supported back end for several OpenStack services, all necessary Ceph pools must be specified in the configuration of the kind: MiraCeph custom resource as part of the deployment. Once the Ceph cluster is deployed, the Ceph controller posts the information required by the OpenStack services to be properly configured as a kind: Secret object into the openstack-ceph-shared namespace. The OpenStack controller watches this namespace. Once the corresponding secret is created, the OpenStack controller transforms this secret into the data structures expected by the OpenStack-Helm charts. Even if an OpenStack installation is triggered at the same time as a Ceph cluster deployment, the OpenStack controller halts the deployment of the OpenStack services that depend on Ceph availability until the Ceph controller creates the secret in the shared namespace.

For the configuration of Ceph RADOS Gateway as an OpenStack Object Storage, the reverse process takes place. The OpenStack controller waits for OpenStack-Helm to create a secret with the OpenStack Identity (Keystone) credentials that RADOS Gateway must use to validate the OpenStack Identity tokens, and posts it back to the same openstack-ceph-shared namespace in the format suitable for consumption by the Ceph controller. The Ceph controller then reads this secret and reconfigures RADOS Gateway accordingly.

OpenStack and StackLight integration

StackLight integration with OpenStack includes automatic discovery of RabbitMQ credentials for notifications and OpenStack credentials for OpenStack API metrics. For details, see the openstack.rabbitmq.credentialsConfig and openstack.telegraf.credentialsConfig parameters description in StackLight configuration parameters.

OpenStack and Tungsten Fabric integration

The levels of integration between OpenStack and Tungsten Fabric (TF) include:


Controllers integration

The integration between the OpenStack and TF controllers is implemented through the shared Kubernetes openstack-tf-shared namespace. Both controllers have access to this namespace to read and write the Kubernetes kind: Secret objects.

The OpenStack controller posts the data required by the TF services into the openstack-tf-shared namespace. The TF controller watches this namespace. Once an appropriate secret is created, the TF controller reads it into its internal data structures for further processing.

The OpenStack controller provides the following data for the TF controller:

  • tunnel_interface

    Name of the network interface for the TF data plane. This interface is used by TF for the encapsulated traffic for overlay networks.

  • Keystone authorization information

    Keystone Administrator credentials and an up-and-running IAM service are required for the TF controller to initiate the deployment process.

  • Nova metadata information

    Required for the TF vRouter agent service.

Also, the OpenStack Controller watches the openstack-tf-shared namespace for the vrouter_port parameter that defines the vRouter port number and passes it to the nova-compute pod.


Services integration

The OpenStack services that are integrated with TF through their API include:

  • neutron-server - the integration is provided by the contrail-neutron-plugin component, which the neutron-server service uses to transform API calls into TF API-compatible requests.

  • nova-compute - the integration is provided by the contrail-nova-vif-driver and contrail-vrouter-api packages, which the nova-compute service uses to interact with the TF vRouter when managing network ports.

  • octavia-api - the integration is provided by the Octavia Tungsten Fabric driver that enables you to use the OpenStack CLI and Horizon for operations with load balancers. See Tungsten Fabric integration with OpenStack Octavia for details.

Warning

TF is not integrated with the following OpenStack services:

  • DNS service (Designate)

  • Key management (Barbican)

Services

This section explains the specifics of the services provided by Mirantis OpenStack for Kubernetes (MOS). The list of services and their supported features in this section is not exhaustive and is constantly amended based on the complexity of the architecture and the use of a particular service.

Instance HA service

The Instance High Availability service, or Masakari, is an OpenStack project designed to ensure high availability of instances and compute processes running on hosts.

The service consists of the following microservices:

  • API receives requests from users and events from monitors, and sends them to the engine

  • Engine executes recovery workflow

  • Monitors detect failures and notify the API. MOS uses monitors of the following types:

    • Instance monitor verifies the liveness of instance processes

    • Host monitor verifies the liveness of a compute host and runs as part of the Node controller within the OpenStack controller

    Note

    The Processes monitor is not present in MOS because HA for the compute processes is handled by Kubernetes.

Block Storage service
Volume encryption

Available since MOS 21.5 TechPreview

The OpenStack Block Storage service (Cinder) supports volume encryption using a key stored in the OpenStack Key Manager service (Barbican). Such a configuration uses Linux Unified Key Setup (LUKS) to create an encrypted volume type and attach encrypted volumes to the OpenStack Compute (Nova) instances. Nova retrieves the asymmetric key from Barbican and stores it on the OpenStack compute node as a libvirt key to encrypt the volume locally or on the back end, and only after that transfers it to Cinder.

Note

  • To create an encrypted volume under a non-admin user, the creator role must be assigned to the user.

  • When planning your cloud, consider that encryption may impact CPU performance.

Image service

Mirantis OpenStack for Kubernetes (MOS) provides the image management capability through the OpenStack Image service, also known as Glance.

The Image service enables you to discover, register, and retrieve virtual machine images. Using the Glance API, you can query virtual machine image metadata and retrieve actual images.

MOS deployment profiles include the Image service in the core set of services. You can configure the Image service through the spec:features definition in the OpenStackDeployment custom resource. See features for details.

Image signature verification

Available since MOS 21.6 TechPreview

MOS can automatically verify the cryptographic signatures associated with images to ensure the integrity of their data. A signed image has a few additional properties set in its metadata that include img_signature, img_signature_hash_method, img_signature_key_type, and img_signature_certificate_uuid. You can find more information about these properties and their values in the upstream OpenStack documentation.
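
For illustration, a signed image might carry metadata properties similar to the following. The values below are hypothetical; the accepted hash methods and key types are listed in the upstream OpenStack documentation:

img_signature: <base64-encoded signature of the image data>
img_signature_hash_method: SHA-256
img_signature_key_type: RSA-PSS
img_signature_certificate_uuid: <UUID of the signing certificate stored in the Key Manager service>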

MOS performs image signature verification during the following operations:

  • A cloud user or a service creates an image in the store and starts to upload its data. If the signature metadata properties are set on the image, its content gets verified against the signature. The Image service accepts non-signed image uploads.

  • A cloud user spawns a new instance from an image. The Compute service ensures that the data it downloads from the image storage matches the image signature. If the signature is missing or does not match the data, the operation fails. Limitations apply, see Known limitations.

  • A cloud user boots an instance from a volume, or creates a new volume from an image. If the image is signed, the Block Storage service compares the downloaded image data against the signature. If there is a mismatch, the operation fails. The service will accept a non-signed image as a source for a volume. Limitations apply, see Known limitations.

Configuration example
spec:
  features:
    glance:
      signature:
        enabled: true
Signing pre-built images

Every MOS cloud is pre-provisioned with a baseline set of images containing the most popular operating systems, such as Ubuntu, Fedora, and CirrOS.

In addition, a few services in MOS rely on the creation of service instances to provide their functions, namely the Load Balancer service and the Bare Metal service, and require corresponding images to exist in the image store.

When image signature verification is enabled during the cloud deployment, all these images get automatically signed with a pre-generated self-signed certificate. Enabling the feature in an already existing cloud requires manual signing of all of the images stored in it. Consult the OpenStack documentation for an example of the image signing procedure.

Supported storage back ends

The image signature verification is supported for LVM and local back ends for ephemeral storage.

The functionality is not compatible with Ceph-backed ephemeral storage combined with RAW-formatted images. The Ceph copy-on-write mechanism enables the user to create instance virtual disks without downloading the image to a compute node; the data is handled entirely on the Ceph cluster side. This enables you to spin up instances almost instantly but makes it impossible to verify the image data before creating an instance from it.

Known limitations
  • The Image service does not enforce the presence of a signature in the metadata when the user creates a new image. The service accepts non-signed image uploads.

  • The Image service does not verify the correctness of an image signature upon update of the image metadata.

  • MOS does not validate whether the certificate used to sign an image is trusted; it only ensures the correctness of the signature itself. Cloud users are allowed to use self-signed certificates.

  • The Compute service does not verify image signatures for the Ceph back end when the RAW image format is used, as described in Supported storage back ends.

  • The Compute service does not verify image signature if the image is already cached on the target compute node.

  • The Instance HA service may experience issues when auto-evacuating instances created from signed images if it does not have access to the corresponding secrets in the Key Manager service.

  • The Block Storage service does not perform image signature verification when a Ceph back end is used and the images are in the RAW format.

  • The Block Storage service does not enforce the presence of a signature on the images.

Networking

Depending on the size of an OpenStack environment and the components that you use, you may want to have a single or multiple network interfaces, as well as run different types of traffic on a single or multiple VLANs.

This section provides the recommendations for planning the network configuration and optimizing the cloud performance.

Physical networks layout

The diagrams below illustrate the recommended physical networks layout for a Mirantis OpenStack for Kubernetes (MOS) deployment with Ceph.

The list of recommendations applicable to all types of nodes includes:

  • Use the Link Aggregation Control Protocol (LACP) bonding mode with MC-LAG domains configured on leaf switches. This corresponds to the 802.3ad bond mode on hosts.

  • Use ports from different multi-port NICs when creating bonds. This keeps the network connection redundant if a single NIC fails.

  • Configure the ports that connect servers to the PXE network with the PXE VLAN as native or untagged. On these ports, configure LACP fallback to ensure that the servers can reach the DHCP server and boot over the network.

Container Cloud Management cluster physical networking

The following diagram illustrates physical and L2 connections of the Container Cloud management cluster.

_images/mos-cluster-mgmt-physical.png
MOS managed cluster physical networking
Kubernetes manager nodes

The following diagram illustrates physical and L2 network connections of the Kubernetes manager nodes in a MOS cluster.

Caution

Such configuration does not apply to a compact control plane MOS installation. See Create a managed cluster.

_images/mos-cluster-k8s-mgr-physical.png
OpenStack controller nodes

The following diagram illustrates physical and L2 network connections of the control plane nodes in a MOS cluster.

_images/mos-cluster-control-physical.png
OpenStack compute nodes

The following diagram illustrates physical and L2 network connections of the compute nodes in a MOS cluster.

_images/mos-cluster-compute-physical.png
OpenStack storage nodes

The following diagram illustrates physical and L2 network connections of the storage nodes in a MOS cluster.

_images/mos-cluster-storage-physical.png
Network types

When planning your OpenStack environment, consider what types of traffic your workloads generate and design your network accordingly. If you anticipate that certain types of traffic, such as storage replication, will likely consume a significant amount of network bandwidth, you may want to move that traffic to a dedicated network interface to avoid performance degradation.

L3 networks for Kubernetes

A Mirantis OpenStack for Kubernetes (MOS) deployment typically requires the following networks.

L3 networks for Kubernetes

Network role

Description

VLAN name

Common/PXE network

The network used for the provisioning of bare metal servers.

lcm-nw

Management network

The network used for the management of bare metal servers.

lcm-nw

Kubernetes workloads network

The network used for communication between containers in Kubernetes.

k8s-pods-v

Storage access network (Ceph)

The network used for accessing the Ceph storage. We recommend placing it on a dedicated hardware interface.

stor-frontend

Storage replication network (Ceph)

The network used for the storage replication (Ceph). To ensure low latency and fast access, place the network on a dedicated hardware interface.

stor-backend

External networks (MetalLB)

The routable network used for external IP addresses of the Kubernetes LoadBalancer services managed by MetalLB.

k8s-ext-v

Note

When selecting subnets, ensure that the subnet ranges do not overlap with the internal subnets’ ranges. Otherwise, the users’ internal resources will not be available from the deployed Container Cloud managed cluster.

L3 networks for MOS

The MOS deployment additionally requires the following networks.

L3 networks for MOS

Service name

Network

Description

VLAN name

Networking

Provider networks

Typically, a routable network used to provide external access to OpenStack instances (a floating network). Can be used by OpenStack services such as Ironic, Manila, and others to connect their management resources.

pr-floating

Networking

Overlay networks (virtual networks)

The network used to provide isolated, secure tenant networks with the help of a tunneling mechanism (VLAN/GRE/VXLAN). If VXLAN or GRE encapsulation is used, IP address assignment is required on interfaces at the node level.

neutron-tunnel

Compute

Live migration network

The network used by the OpenStack Compute service (Nova) to transfer data during live migration. Depending on the cloud needs, it can be placed on a dedicated physical network so as not to affect other networks during live migration. IP address assignment is required on interfaces at the node level.

lm-vlan

The mapping of the logical networks described above to physical networks and node interfaces depends on the cloud size and configuration. We recommend placing the OpenStack networks on a dedicated physical interface (bond) that is not shared with the storage and Kubernetes management networks to minimize their influence on each other.

Performance optimization

To improve the goodput, we recommend enabling jumbo frames where possible. Jumbo frames must be enabled along the entire path that the packets traverse. If one of the network components cannot handle jumbo frames, the network path uses the smallest MTU.

To provide fault tolerance against a single NIC failure, we recommend using link aggregation, such as bonding. Link aggregation is useful for linear scaling of bandwidth, load balancing, and fault protection. Depending on the hardware, different types of bonds might be supported. Use multi-chassis link aggregation, such as MLAG on Arista equipment or vPC on Cisco equipment, as it provides fault tolerance at the device level.

The Linux kernel supports the following bonding modes:

  • active-backup

  • balance-xor

  • 802.3ad (LACP)

  • balance-tlb

  • balance-alb

Since LACP is the IEEE standard 802.3ad supported by the majority of network platforms, we recommend using this bonding mode.
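
For illustration only, the following netplan-style sketch shows an 802.3ad (LACP) bond with jumbo frames enabled. The interface names and values are hypothetical; in a MOS deployment, the host networking is defined through the Mirantis Container Cloud L2 templates rather than configured manually:

network:
  version: 2
  ethernets:
    ens3f0: {}
    ens3f1: {}
  bonds:
    bond0:
      interfaces: [ens3f0, ens3f1]
      mtu: 9000                      # jumbo frames, if supported along the entire path
      parameters:
        mode: 802.3ad                # LACP
        lacp-rate: fast
        transmit-hash-policy: layer3+4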

Multi-rack architecture

Available since MOS 21.6 TechPreview

Mirantis OpenStack for Kubernetes (MOS) enables you to deploy a cluster with a multi-rack architecture, where every data center cabinet (a rack) incorporates its own Layer 2 network infrastructure that does not extend beyond its top-of-rack switch. The architecture allows a MOS cloud to integrate natively with the Layer 3-centric networking topologies seen in modern data centers, such as Spine-Leaf.

The architecture eliminates the need to stretch and manage VLANs across multiple physical locations in a single data center, or to establish VPN tunnels between the parts of a geographically distributed cloud.

The set of networks present in each rack depends on the type of the OpenStack networking service back end in use.

_images/multi-rack.png
Bare metal provisioning

The multi-rack architecture in Mirantis Container Cloud and MOS requires additional configuration of networking infrastructure. Every Layer 2 domain, or rack, needs to have a DHCP relay agent configured on its dedicated segment of the Common/PXE network (lcm-nw VLAN). The agent handles all Layer-2 DHCP requests incoming from the bare metal servers living in the rack and forwards them as Layer-3 packets across the data center fabric to a Mirantis Container Cloud regional cluster.

_images/multi-rack-bm.png

You need to configure per-rack DHCP ranges by defining Subnet resources in Mirantis Container Cloud as described in Mirantis Container Cloud documentation: Configure multiple DHCP ranges using Subnet resources.

Based on the address of the DHCP agent that relays a request from a server, Mirantis Container Cloud will automatically allocate an IP address in the corresponding subnet.
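
The following is a hedged sketch of what a per-rack DHCP Subnet definition might look like. All values are hypothetical, and the exact resource schema, including any DHCP-specific labels, must be taken from the Mirantis Container Cloud documentation referenced above:

apiVersion: ipam.mirantis.com/v1alpha1
kind: Subnet
metadata:
  name: dhcp-rack-2
  namespace: default
spec:
  cidr: 10.20.2.0/24
  gateway: 10.20.2.1
  includeRanges:
    - 10.20.2.100-10.20.2.200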

For the network types other than Common/PXE, you need to define subnets using the Mirantis Container Cloud L2 templates. Every rack needs to have a dedicated set of L2 templates, each template representing a specific server role and configuration.

Multi-rack MOS cluster with Tungsten Fabric

For MOS clusters with the Tungsten Fabric back end, you need to place the servers running the cloud control plane components into a single rack. This limitation is caused by the Layer 2 VRRP protocol used by the Kubernetes load balancer mechanism (MetalLB) to ensure high availability of Mirantis Container Cloud and MOS API.

Note

In future product versions, Mirantis plans to implement support for the Layer 3 BGP mode for the Kubernetes load balancing mechanism.

The diagram below will help you to plan the networking layout of a multi-rack MOS cloud with Tungsten Fabric.

_images/multi-rack-tf.png

The table below provides a mapping between the racks and the network types participating in a multi-rack MOS cluster with the Tungsten Fabric back end.

Networks and VLANs for a multi-rack MOS cluster with TF

Network

VLAN name

Rack 1

Rack 2 and N

Common/PXE

lcm-nw

Yes

Yes

Management

lcm-nw

Yes

Yes

External (MetalLB)

k8s-ext-v

Yes

No

Kubernetes workloads

k8s-pods-v

Yes

Yes

Storage access (Ceph)

stor-frontend

Yes

Yes

Storage replication (Ceph)

stor-backend

Yes

Yes

Overlay

tenant-vlan

Yes

Yes

Live migration

lm-vlan

Yes

Yes

Storage

A MOS cluster uses Ceph as a distributed storage system for file, block, and object storage exposed by the Container Cloud baremetal management cluster. This section provides an overview of a Ceph cluster deployed by Container Cloud.

Ceph overview

Mirantis Container Cloud deploys Ceph on the baremetal-based management and managed clusters using Helm charts with the following components:

  • Ceph controller - a Kubernetes controller that obtains the parameters from Container Cloud through a custom resource (CR), creates CRs for Rook, and updates its CR status based on the Ceph cluster deployment progress. It creates users, pools, and keys for OpenStack and Kubernetes and provides Ceph configurations and keys to access them. Also, the Ceph controller eventually obtains the data from the OpenStack Controller for the Keystone integration and updates the RADOS Gateway services configurations to use Keystone for user authentication.

  • Ceph operator

    • Transforms user parameters from the Container Cloud Ceph CR into Rook objects and deploys a Ceph cluster using Rook.

    • Provides integration of the Ceph cluster with Kubernetes

    • Provides data for OpenStack to integrate with the deployed Ceph cluster

  • Custom resource (CR) - represents the customization of a Kubernetes installation and allows you to define the required Ceph configuration through the Container Cloud web UI before deployment. For example, you can define the failure domain, pools, Ceph node roles, number of Ceph components such as Ceph OSDs, and so on.

  • Rook - a storage orchestrator that deploys Ceph on top of a Kubernetes cluster.

A typical Ceph cluster consists of the following components:

Ceph Monitors

Three or, in rare cases, five Ceph Monitors.

Ceph Managers

Mirantis recommends having three Ceph Managers in every cluster.

RADOS Gateway services

Mirantis recommends having three or more RADOS Gateway services for HA.

Ceph OSDs

The number of Ceph OSDs may vary according to the deployment needs.

Warning

  • A Ceph cluster with 3 Ceph nodes does not provide hardware fault tolerance and is not eligible for recovery operations, such as a disk or an entire Ceph node replacement.

  • A Ceph cluster uses a replication factor of 3. If the number of Ceph OSDs is less than 3, the Ceph cluster moves to the degraded state and restricts write operations until the number of alive Ceph OSDs equals the replication factor again.

The placement of Ceph Monitors and Ceph Managers is defined in the custom resource.

The following diagram illustrates the way a Ceph cluster is deployed in Container Cloud:

_images/ceph-deployment.png

The following diagram illustrates the processes within a deployed Ceph cluster:

_images/ceph-data-flow.png
Ceph limitations

A Ceph cluster configuration in Mirantis Container Cloud has the following limitations, among others:

  • Only one Ceph controller per management, regional, or managed cluster and only one Ceph cluster per Ceph controller are supported.

  • The replication size for any Ceph pool must be set to more than 1.

  • Only one CRUSH tree per cluster. The separation of devices per Ceph pool is supported through device classes with only one pool of each type for a device class.

  • All CRUSH rules must have the same failure_domain.

  • Only the following types of CRUSH buckets are supported:

    • topology.kubernetes.io/region

    • topology.kubernetes.io/zone

    • topology.rook.io/datacenter

    • topology.rook.io/room

    • topology.rook.io/pod

    • topology.rook.io/pdu

    • topology.rook.io/row

    • topology.rook.io/rack

    • topology.rook.io/chassis

  • RBD mirroring is not supported.

  • Consuming an existing Ceph cluster is not supported.

  • CephFS is not supported.

  • Only IPv4 is supported.

  • If two or more Ceph OSDs are located on the same device, there must be no dedicated WAL or DB for this class.

  • Only a full collocation or dedicated WAL and DB configurations are supported.

  • The minimum size of any defined Ceph OSD device is 5 GB.

  • Reducing the number of Ceph Monitors is not supported and causes the removal of Ceph Monitor daemons from random nodes.

  • When adding a Ceph node with the Ceph Monitor role, if any issues occur with the Ceph Monitor, rook-ceph removes it and adds a new Ceph Monitor instead, named using the next alphabetic character in order. Therefore, the Ceph Monitor names may not follow the alphabetical order. For example, a, b, d, instead of a, b, c.

StackLight

StackLight is the logging, monitoring, and alerting solution that provides a single pane of glass for cloud maintenance and day-to-day operations, as well as offers critical insights into cloud health, including operational information about the components deployed with Mirantis OpenStack for Kubernetes (MOS). StackLight is based on Prometheus, an open-source monitoring solution and a time series database, and Elasticsearch, the logs and notifications storage.

Deployment architecture

Mirantis OpenStack for Kubernetes (MOS) deploys the StackLight stack as a release of a Helm chart that contains the helm-controller and HelmBundle custom resources. The StackLight HelmBundle consists of a set of Helm charts describing the StackLight components. Apart from the OpenStack-specific components below, StackLight also includes the components described in Mirantis Container Cloud Reference Architecture: Deployment architecture. By default, StackLight logging stack is disabled.

During the StackLight configuration when deploying a MOS managed cluster, you can define the HA or non-HA StackLight architecture type. For details, see Mirantis Container Cloud Reference Architecture: StackLight database modes.

OpenStack-specific StackLight components overview

StackLight component

Description

Prometheus native exporters and endpoints

Export the existing metrics as Prometheus metrics and include:

  • libvirt-exporter

  • memcached-exporter

  • mysql-exporter

  • rabbitmq-exporter

  • tungstenfabric-exporter Available since MOS 21.1

Telegraf OpenStack plugin

Collects and processes the OpenStack metrics.

Monitored components

StackLight measures, analyzes, and reports in a timely manner about failures that may occur in the following Mirantis OpenStack for Kubernetes (MOS) components and their sub-components. Apart from the components below, StackLight also monitors the components listed in Mirantis Container Cloud Reference Architecture: Monitored components.

  • Libvirt

  • Memcached

  • MariaDB

  • NTP

  • OpenStack (Barbican, Cinder, Designate, Glance, Heat, Horizon, Ironic, Keystone, Neutron, Nova, Octavia)

  • OpenStack SSL certificates

  • Open vSwitch

  • RabbitMQ

  • Tungsten Fabric (Cassandra, Kafka, Redis, ZooKeeper) Available since MOS 21.1

Elasticsearch and Prometheus storage sizing

Caution

Calculations in this document are based on numbers from a real-scale test cluster with 34 nodes. The exact space required for metrics and logs must be calculated depending on the ongoing cluster operations. Some operations force the generation of additional metrics and logs. The values below are approximate. Use them only as recommendations.

During the deployment of a new cluster, you must specify the Elasticsearch retention time and Persistent Volume Claim (PVC) size, as well as the Prometheus PVC size, retention time, and retention size. When configuring an existing cluster, you can only set the Elasticsearch retention time and the Prometheus retention time and retention size.

The following table describes the recommendations for both Elasticsearch and Prometheus retention size and PVC size for a cluster with 34 nodes. Retention time depends on the space allocated for the data. To calculate the required retention time, use the {retention time} = {retention size} / {amount of data per day} formula.
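
For example, if Prometheus consumes approximately 11 GB per day for the entire cluster, as shown in the table below, allocating a retention size of 110 GB results in a retention time of roughly 10 days (110 GB / 11 GB per day).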

Service

Required space per day

Description

Elasticsearch

StackLight in non-HA mode:
  • 202 - 253 GB for the entire cluster

  • ~6 - 7.5 GB for a single node

StackLight in HA mode:
  • 404 - 506 GB for the entire cluster

  • ~12 - 15 GB for a single node

When setting Persistent Volume Claim Size for Elasticsearch during the cluster creation, take into account that it defines the PVC size for a single instance of the Elasticsearch cluster. StackLight in HA mode has 3 Elasticsearch instances. Therefore, for a total Elasticsearch capacity, multiply the PVC size by 3.

Prometheus

  • 11 GB for the entire cluster

  • ~400 MB for a single node

Every Prometheus instance stores the entire database. Multiple replicas store multiple copies of the same data. Therefore, treat the Prometheus PVC size as the capacity of Prometheus in the cluster. Do not sum them up.

Prometheus has built-in retention mechanisms based on the database size and the time series duration stored in the database. Therefore, to protect against a miscalculated PVC size, set the retention size to approximately 1 GB less than the PVC size to prevent the disk from filling up.

Tungsten Fabric

Tungsten Fabric provides basic L2/L3 networking to an OpenStack environment running on the MKE cluster and includes IP address management, security groups, floating IP addresses, and routing policies functionality. Tungsten Fabric is based on overlay networking, where all virtual machines are connected to a virtual network with encapsulation (MPLSoGRE, MPLSoUDP, VXLAN). This enables you to separate the underlay Kubernetes management network. A workload requires an external gateway, such as a hardware EdgeRouter or a simple gateway, to route the outgoing traffic.

The Tungsten Fabric vRouter uses different gateways for the control and data planes.

Tungsten Fabric known limitations

This section contains a summary of the Tungsten Fabric upstream features and use cases not supported in MOS, features and use cases offered as Technology Preview in the current product release if any, and known limitations of Tungsten Fabric in integration with other product components.

Tungsten Fabric known limitations

Feature or use case

Status

Description

Tungsten Fabric web UI

Provided as is

MOS provides the TF web UI as is and does not include this service in the support Service Level Agreement

Automatic generation of network port records in DNSaaS (Designate)

Not supported

As a workaround, you can use the Tungsten Fabric built-in DNS service that enables virtual machines to resolve each other's names

Secret management (Barbican)

Not supported

It is not possible to use the certificates stored in Barbican to terminate HTTPS on a load balancer in a Tungsten Fabric deployment

Role Based Access Control (RBAC) for Neutron objects

Not supported

Advanced Tungsten Fabric features

Not supported

Tungsten Fabric does not support the following upstream advanced features:

  • Service Function Chaining

  • Production ready multi-site SDN

Technical Preview

DPDK Available since MOS Ussuri Update

Tungsten Fabric cluster overview

All services of Tungsten Fabric are delivered as separate containers, which are deployed by the Tungsten Fabric Operator (TFO). Each container has an INI-based configuration file that is available on the host system. The configuration file is generated automatically upon the container start and is based on environment variables provided by the TFO through Kubernetes ConfigMaps.

The main Tungsten Fabric containers run with the host network as DaemonSets, without using the Kubernetes networking layer. The services listen directly on the host network interface.

The following diagram describes the minimum production installation of Tungsten Fabric with a Mirantis OpenStack for Kubernetes (MOS) deployment.

_images/tf-architecture.png
Tungsten Fabric components

This section describes the Tungsten Fabric services and their distribution across the Mirantis OpenStack for Kubernetes (MOS) deployment.

The Tungsten Fabric services run mostly as DaemonSets in a separate container for each service. The deployment and update processes are managed by the Tungsten Fabric operator. However, Kubernetes manages the probe checks and restart of broken containers.

The following tables describe the Tungsten Fabric services:


Configuration and control services in Tungsten Fabric controller containers

Service name

Service description

config-api

Exposes a REST-based interface for the Tungsten Fabric API.

config-nodemgr

Collects data of the Tungsten Fabric configuration processes and sends it to the Tungsten Fabric collector.

control

Communicates with the cluster gateways using BGP and with the vRouter agents using XMPP, as well as redistributes appropriate networking information.

control-nodemgr

Collects the Tungsten Fabric controller process data and sends this information to the Tungsten Fabric collector.

device-manager

Manages physical networking devices using netconf or ovsdb. In multi-node deployments, it operates in the active-backup mode.

dns

Using the named service, provides the DNS service to the VMs spawned on different compute nodes. Each vRouter node connects to two Tungsten Fabric controller containers that run the dns process.

named

The customized Berkeley Internet Name Domain (BIND) daemon of Tungsten Fabric that manages DNS zones for the dns service.

schema

Listens to configuration changes performed by a user and generates corresponding system configuration objects. In multi-node deployments, it works in the active-backup mode.

svc-monitor

Listens to configuration changes of service-template and service-instance, as well as spawns and monitors virtual machines for the firewall, analyzer services, and so on. In multi-node deployments, it works in the active-backup mode.

webui

Consists of the webserver and jobserver services. Provides the Tungsten Fabric web UI.


Analytics services in Tungsten Fabric analytics containers

Service name

Service description

alarm-gen

Evaluates and manages the alarm rules.

analytics-api

Provides a REST API to interact with the Cassandra analytics database.

analytics-nodemgr

Collects all Tungsten Fabric analytics process data and sends this information to the Tungsten Fabric collector.

analytics-database-nodemgr

Provisions the init model if needed. Collects data of the database process and sends it to the Tungsten Fabric collector.

collector

Collects and analyzes data from all Tungsten Fabric services.

query-engine

Handles the queries to access data from the Cassandra database.

snmp-collector

Receives the authorization and configuration of the physical routers from the config-nodemgr service, polls the physical routers using the Simple Network Management Protocol (SNMP), and uploads the data to the Tungsten Fabric collector.

topology

Reads the SNMP information from the physical router user-visible entities (UVEs), creates a neighbor list, and writes the neighbor information to the physical router UVEs. The Tungsten Fabric web UI uses the neighbor list to display the physical topology.


vRouter services on the OpenStack compute nodes

Service name

Service description

vrouter-agent

Connects to the Tungsten Fabric controller container and the Tungsten Fabric DNS system using the Extensible Messaging and Presence Protocol (XMPP).

vrouter-nodemgr

Collects the supervisor vrouter data and sends it to the Tungsten Fabric collector.


Third-party services for Tungsten Fabric

Service name

Service description

cassandra

  • On the Tungsten Fabric control plane nodes, maintains the configuration data of the Tungsten Fabric cluster.

  • On the Tungsten Fabric analytics nodes, stores the collector service data.

cassandra-operator

The Kubernetes operator that enables the Cassandra clusters creation and management.

kafka

Handles the messaging bus and generates alarms across the Tungsten Fabric analytics containers.

kafka-operator

The Kubernetes operator that enables Kafka clusters creation and management.

redis

Stores the physical router UVE storage and serves as a messaging bus for event notifications.

redis-operator

The Kubernetes operator that enables Redis clusters creation and management.

zookeeper

Holds the active-backup status for the device-manager, svc-monitor, and schema-transformer services. This service is also used for mapping the Tungsten Fabric resource names to UUIDs.

zookeeper-operator

The Kubernetes operator that enables ZooKeeper clusters creation and management.

rabbitmq

Exchanges messages between API servers and original request senders.

rabbitmq-operator

The Kubernetes operator that enables RabbitMQ clusters creation and management.


Tungsten Fabric plugin services on the OpenStack controller nodes

Service name

Service description

neutron-server

The Neutron server that includes the Tungsten Fabric plugin.

octavia-api

The Octavia API that includes the Tungsten Fabric Octavia driver.

heat-api

The Heat API that includes the Tungsten Fabric Heat resources and templates.

Tungsten Fabric operator

The Tungsten Fabric operator (TFO) is based on the operator SDK project. The operator SDK is a framework that uses the controller-runtime library to make writing operators easier by providing:

  • High-level APIs and abstractions to write the operational logic more intuitively.

  • Tools for scaffolding and code generation to bootstrap a new project fast.

  • Extensions to cover common operator use cases.

The TFO deploys the following sub-operators. Each sub-operator handles a separate part of a TF deployment:

TFO sub-operators

Sub-operator

Description

TFControl

Deploys the Tungsten Fabric control services, such as:

  • Control

  • DNS

  • Control NodeManager

TFConfig

Deploys the Tungsten Fabric configuration services, such as:

  • API

  • Service monitor

  • Schema transformer

  • Device manager

  • Configuration NodeManager

  • Database NodeManager

TFAnalytics

Deploys the Tungsten Fabric analytics services, such as:

  • API

  • Collector

  • Alarm

  • Alarm-gen

  • SNMP

  • Topology

  • Alarm NodeManager

  • Database NodeManager

  • SNMP NodeManager

TFVrouter

Deploys a vRouter on each compute node with the following services:

  • vRouter agent

  • NodeManager

TFWebUI

Deploys the following web UI services:

  • Web server

  • Job server

TFTool

Deploys the following tools to verify the TF deployment status:

  • TF-status

  • TF-status aggregator

TFTest

An operator to run Tempest tests.

Besides the sub-operators that deploy TF services, TFO uses operators to deploy and maintain third-party services, such as different types of storage, cache, message system, and so on. The following table describes all third-party operators:

TFO third-party sub-operators

Operator

Description

cassandra-operator

An upstream operator that automates the Cassandra HA storage operations for the configuration and analytics data.

zookeeper-operator

An upstream operator for deployment and automation of a ZooKeeper cluster.

kafka-operator

An operator for the Kafka cluster used by analytics services.

redis-operator

An upstream operator that automates the Redis cluster deployment and keeps it healthy.

rabbitmq-operator

An operator for the messaging system based on RabbitMQ.

The following diagram illustrates a simplified TFO workflow:

_images/tf-operator-workflow.png
TFOperator custom resource

The resource of kind TFOperator (TFO) is a custom resource (CR) defined by a resource of kind CustomResourceDefinition.

The CustomResourceDefinition resource in Kubernetes uses the OpenAPI Specification (OAS) version 2 to specify the schema of the defined resource. The Kubernetes API outright rejects the resources that do not pass this schema validation. Along with schema validation, starting from MOS 21.6, TFOperator uses ValidatingAdmissionWebhook for extended validations when a CR is created or updated.

This section describes the TFOperator CR parameters.

TFOperator custom resource validation

Available since MOS 21.6

Tungsten Fabric Operator uses ValidatingAdmissionWebhook to validate environment variables set to Tungsten Fabric components upon the TFOperator object creation or update. The following validations are performed:

  • Environment variables passed to TF component containers

  • Mapping between tfVersion and tfImageTag, if defined

If required, you can disable ValidatingAdmissionWebhook through the TFOperator HelmBundle resource:

apiVersion: lcm.mirantis.com/v1alpha1
kind: HelmBundle
metadata:
  name: tungstenfabric-operator
  namespace: tf
spec:
  releases:
  - name: tungstenfabric-operator
    values:
      admission:
        enabled: false

The following table lists the allowed variables.

Allowed environment variables for TF components

Environment variables

TF components and containers

  • INTROSPECT_LISTEN_ALL

  • LOG_DIR

  • LOG_LEVEL

  • LOG_LOCAL

  • tf-analytics (alarm-gen, api, collector, alarm-nodemgr, db-nodemgr, nodemgr, snmp-nodemgr, query-engine, snmp, topology)

  • tf-config (api, db-nodemgr, nodemgr)

  • tf-control (control, dns, nodemgr)

  • tf-vrouter (agent, dpdk-nodemgr, nodemgr)

  • LOG_DIR

  • LOG_LEVEL

  • LOG_LOCAL

tf-config (config, devicemgr, schema, svc-monitor)

  • PROVISION_DELAY

  • PROVISION_RETRIES

  • BGP_ASN

  • ENCAP_PRIORITY

  • VXLAN_VN_ID_MODE

  • tf-analytics (alarm-provisioner, db-provisioner, provisioner, snmp-provisioner)

  • tf-config (db-provisioner, provisioner)

  • tf-control (provisioner)

  • tf-vrouter (dpdk-provisioner, provisioner)

  • CONFIG_API_LIST_OPTIMIZATION_ENABLED

  • CONFIG_API_WORKER_COUNT

  • CONFIG_API_MAX_REQUESTS

  • FWAAS_ENABLE

  • RABBITMQ_HEARTBEAT_INTERVAL

  • DISABLE_VNC_API_STATS

tf-config (config)

  • DNS_NAMED_MAX_CACHE_SIZE

  • DNS_NAMED_MAX_RETRANSMISSIONS

  • DNS_RETRANSMISSION_INTERVAL

tf-control (dns)

  • WEBUI_LOG_LEVEL

  • WEBUI_STATIC_AUTH_PASSWORD

  • WEBUI_STATIC_AUTH_ROLE

  • WEBUI_STATIC_AUTH_USER

tf-webui (job, web)

  • ANALYTICS_CONFIG_AUDIT_TTL

  • ANALYTICS_DATA_TTL

  • ANALYTICS_FLOW_TTL

  • ANALYTICS_STATISTICS_TTL

  • COLLECTOR_disk_usage_percentage_high_watermark0

  • COLLECTOR_disk_usage_percentage_high_watermark1

  • COLLECTOR_disk_usage_percentage_high_watermark2

  • COLLECTOR_disk_usage_percentage_low_watermark0

  • COLLECTOR_disk_usage_percentage_low_watermark1

  • COLLECTOR_disk_usage_percentage_low_watermark2

  • COLLECTOR_high_watermark0_message_severity_level

  • COLLECTOR_high_watermark1_message_severity_level

  • COLLECTOR_high_watermark2_message_severity_level

  • COLLECTOR_low_watermark0_message_severity_level

  • COLLECTOR_low_watermark1_message_severity_level

  • COLLECTOR_low_watermark2_message_severity_level

  • COLLECTOR_pending_compaction_tasks_high_watermark0

  • COLLECTOR_pending_compaction_tasks_high_watermark1

  • COLLECTOR_pending_compaction_tasks_high_watermark2

  • COLLECTOR_pending_compaction_tasks_low_watermark0

  • COLLECTOR_pending_compaction_tasks_low_watermark1

  • COLLECTOR_pending_compaction_tasks_low_watermark2

  • COLLECTOR_LOG_FILE_COUNT

  • COLLECTOR_LOG_FILE_SIZE

tf-analytics (collector)

  • ANALYTICS_DATA_TTL

  • QUERYENGINE_MAX_SLICE

  • QUERYENGINE_MAX_TASKS

  • QUERYENGINE_START_TIME

tf-analytics (query-engine)

  • SNMPCOLLECTOR_FAST_SCAN_FREQUENCY

  • SNMPCOLLECTOR_SCAN_FREQUENCY

tf-analytics (snmp)

TOPOLOGY_SCAN_FREQUENCY

tf-analytics (topology)

  • DPDK_UIO_DRIVER

  • PHYSICAL_INTERFACE

  • SRIOV_PHYSICAL_INTERFACE

  • SRIOV_PHYSICAL_NETWORK

  • SRIOV_VF

  • TSN_AGENT_MODE

  • TSN_NODES

  • AGENT_MODE

  • FABRIC_SNAT_HASH_TABLE_SIZE

  • PRIORITY_BANDWIDTH

  • PRIORITY_ID

  • PRIORITY_SCHEDULING

  • PRIORITY_TAGGING

  • QOS_DEF_HW_QUEUE

  • QOS_LOGICAL_QUEUES

  • QOS_QUEUE_ID

  • VROUTER_GATEWAY

  • HUGE_PAGES_2MB

  • HUGE_PAGES_1GB

  • DISABLE_TX_OFFLOAD

  • DISABLE_STATS_COLLECTION

tf-vrouter (agent)

  • CPU_CORE_MASK

  • SERVICE_CORE_MASK

  • DPDK_CTRL_THREAD_MASK

  • DPDK_COMMAND_ADDITIONAL_ARGS

  • DPDK_MEM_PER_SOCKET

  • DPDK_UIO_DRIVER

  • HUGE_PAGES

  • HUGE_PAGES_DIR

  • NIC_OFFLOAD_ENABLE

  • DPDK_ENABLE_VLAN_FWRD

tf-vrouter (agent-dpdk)

Control interface specification

Available since MOS 21.3

By default, the TF control service uses the management interface for the BGP and XMPP traffic. You can change the control service interface using the controlInterface parameter in the TFOperator CR, for example, to combine the BGP and XMPP traffic with the data (tenant) traffic:

spec:
  settings:
    controlInterface: <tunnel interface>
Custom vRouter settings

Available since MOS 21.2 TechPreview

To specify custom settings for the Tungsten Fabric (TF) vRouter nodes, for example, to change the name of the tunnel network interface or enable debug level logging on some subset of nodes, use the customSpecs settings in the TFOperator CR.

For example, to enable debug level logging on a specific node or multiple nodes:

spec:
  controllers:
    tf-vrouter:
      agent:
        customSpecs:
        - name: debug
          label:
            name: <NODE-LABEL>
            value: <NODE-LABEL-VALUE>
          containers:
          - name: agent
            env:
            - name: LOG_LEVEL
              value: SYS_DEBUG

The customSpecs parameter inherits all settings for the tf-vrouter containers that are set on the spec:controllers:agent level and overrides or adds additional parameters. The example configuration above overrides the logging level from SYS_INFO, which is the default logging level, to SYS_DEBUG.

Starting from MOS 21.6, for clusters with a multi-rack architecture, you may need to redefine the gateway IP for the Tungsten Fabric vRouter nodes using the VROUTER_GATEWAY parameter. For details, see Multi-rack architecture.
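
For example, the following hedged sketch reuses the customSpecs structure above to redefine the vRouter gateway for the nodes of a particular rack. The node label and the gateway address are hypothetical:

spec:
  controllers:
    tf-vrouter:
      agent:
        customSpecs:
        - name: rack-2-gateway
          label:
            name: <RACK-NODE-LABEL>
            value: <RACK-NODE-LABEL-VALUE>
          containers:
          - name: agent
            env:
            - name: VROUTER_GATEWAY
              value: <RACK-GATEWAY-IP>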

Tungsten Fabric traffic flow

This section describes the types of traffic and traffic flow directions in a Mirantis OpenStack for Kubernetes (MOS) cluster.

User interface and API traffic

The following diagram illustrates all types of UI and API traffic in a MOS cluster, including the monitoring and OpenStack API traffic. The OpenStack Dashboard pod hosts Horizon and acts as a proxy for all other types of traffic. TLS termination is also performed for this type of traffic.

_images/tf-traffic_flow_ui_api.png
SDN traffic

SDN or Tungsten Fabric traffic goes through the overlay Data network and processes east-west and north-south traffic for applications that run in a MOS cluster. This network segment typically contains tenant networks as separate MPLS-over-GRE and MPLS-over-UDP tunnels. The traffic load depends on the workload.

The control traffic between the Tungsten Fabric controllers, edge routers, and vRouters uses the XMPP with TLS and iBGP protocols. Both protocols produce low traffic that does not affect MPLS over GRE and MPLS over UDP traffic. However, this traffic is critical and must be reliably delivered. Mirantis recommends configuring higher QoS for this type of traffic.

The following diagram displays both MPLS over GRE/MPLS over UDP and iBGP and XMPP traffic examples in a MOS cluster:

_images/tf-traffic_flow_sdn.png
Tungsten Fabric vRouter

The Tungsten Fabric vRouter provides data forwarding to an OpenStack tenant instance and reports statistics to the Tungsten Fabric analytics service. The Tungsten Fabric vRouter is installed on all OpenStack compute nodes. Mirantis OpenStack for Kubernetes (MOS) supports the kernel-based deployment of the Tungsten Fabric vRouter.

The vRouter agent acts as a local control plane. Each Tungsten Fabric vRouter agent is connected to at least two Tungsten Fabric controllers in an active-active redundancy mode. The Tungsten Fabric vRouter agent is responsible for all networking-related functions including routing instances, routes, and so on.

The Tungsten Fabric vRouter uses different gateways for the control and data planes. For example, the Linux system gateway is located on the management network, and the Tungsten Fabric gateway is located on the data plane network.

The following diagram illustrates the Tungsten Fabric kernel vRouter setup by the TF operator:

_images/tf_vrouter.png

On the diagram above, the following types of network interfaces are used:

  • eth0 - for the management (PXE) network (eth1 and eth2 are the slave interfaces of Bond0)

  • Bond0.x - for the MKE control plane network

  • Bond0.y - for the MKE data plane network

Tungsten Fabric integration with OpenStack Octavia

MOS integrates Octavia with Tungsten Fabric through the OpenStack Octavia Tungsten Fabric Driver, which uses the Tungsten Fabric HAProxy as a back end.

Octavia Tungsten Fabric Driver supports creation, update, and deletion operations with the following entities:

  • Load balancers

    Note

    For a load balancer creation operation, the driver supports only the vip-subnet-id argument; the vip-network-id argument is not supported. See the example at the end of this section.

  • Listeners

  • Pools

  • Health monitors

Octavia Tungsten Fabric Driver does not support the following functionality:

  • L7 load balancing capabilities, such as L7 policies, L7 rules, and others

  • Setting specific availability zones for load balancers and their resources

  • Use of the UDP protocol

  • Operations with Octavia quotas

  • Operations with Octavia flavors

Warning

Octavia Tungsten Fabric Driver enables you to manage the load balancer resources through the OpenStack CLI or OpenStack Horizon. Do not perform any operations on the load balancer resources through the Tungsten Fabric web UI because such changes will not be reflected on the OpenStack side.
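
For example, creating a load balancer through the OpenStack CLI with the supported vip-subnet-id argument (see the note above) may look as follows; the load balancer name and subnet ID are placeholders:

openstack loadbalancer create --name test-lb --vip-subnet-id <subnet-id>
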

Deployment Guide

Mirantis OpenStack for Kubernetes (MOS) enables the operator to create, scale, update, and upgrade OpenStack deployments on Kubernetes through a declarative API.

The Kubernetes built-in features, such as flexibility, scalability, and declarative resource definition, make MOS a robust solution.

Plan the deployment

The detailed plan of any Mirantis OpenStack for Kubernetes (MOS) deployment is determined on a per-cloud basis. For the MOS reference architecture and design overview, see Reference Architecture.

Also, read through Mirantis Container Cloud Reference Architecture: Container Cloud bare metal, because a MOS managed cluster is deployed on top of a baremetal-based Container Cloud management cluster.

Note

One of the industry best practices is to verify every new update or configuration change in a non-customer-facing environment before applying it to production. Therefore, Mirantis recommends having a staging cloud, deployed and maintained along with the production clouds. The recommendation is especially applicable to the environments that:

  • Receive updates often and use continuous delivery. For example, any non-isolated deployment of Mirantis Container Cloud and Mirantis OpenStack for Kubernetes (MOS).

  • Have significant deviations from the reference architecture or third party extensions installed.

  • Are managed under the Mirantis OpsCare program.

  • Run business-critical workloads where even the slightest application downtime is unacceptable.

A typical staging cloud is a complete copy of the production environment including the hardware and software configurations, but with a bare minimum of compute and storage capacity.

Provision a Container Cloud bare metal management cluster

The bare metal management system enables the Infrastructure Operator to deploy Container Cloud on a set of bare metal servers. It also enables Container Cloud to deploy MOS managed clusters on bare metal servers without a pre-provisioned operating system.

To provision your bare metal management cluster, refer to Mirantis Container Cloud Deployment Guide: Deploy a baremetal-based management cluster.

Create a managed cluster

After bootstrapping your baremetal-based Mirantis Container Cloud management cluster, you can create a baremetal-based managed cluster to deploy Mirantis OpenStack for Kubernetes using the Container Cloud API.

Add a bare metal host

Before creating a bare metal managed cluster, add the required number of bare metal hosts using CLI and YAML files for configuration. This section describes how to add bare metal hosts using the Container Cloud CLI during a managed cluster creation.

To add a bare metal host:

  1. Verify that you configured each bare metal host as follows:

    • Enable the boot NIC support for UEFI load. Usually, at least the built-in network interfaces support it.

    • Enable the UEFI-LAN-OPROM support in BIOS -> Advanced -> PCI/PCIe.

    • Enable the IPv4-PXE stack.

    • Set the following boot order:

      1. UEFI-DISK

      2. UEFI-PXE

    • If your PXE network is not configured to use the first network interface, fix the UEFI-PXE boot order to speed up node discovery by selecting only one required network interface.

    • Power off all bare metal hosts.

    Warning

    Only one Ethernet port on a host must be connected to the Common/PXE network at any given time. The physical address (MAC) of this interface must be noted and used to configure the BareMetalHost object describing the host.

  2. Log in to the host where your management cluster kubeconfig is located and where kubectl is installed.

  3. Create a secret YAML file that describes the unique credentials of the new bare metal host.

    Example of the bare metal host secret
     apiVersion: v1
     data:
       password: <credentials-password>
       username: <credentials-user-name>
     kind: Secret
     metadata:
       labels:
         kaas.mirantis.com/credentials: "true"
         kaas.mirantis.com/provider: baremetal
         kaas.mirantis.com/region: region-one
       name: <credentials-name>
       namespace: <managed-cluster-project-name>
     type: Opaque
    

    In the data section, add the IPMI user name and password in the base64 encoding to access the BMC. To obtain the base64-encoded credentials, you can use the following command in your Linux console:

    echo -n <username|password> | base64
    

    Caution

    Each bare metal host must have a unique Secret.

  4. Apply this secret YAML file to your deployment:

    kubectl apply -f ${<bmh-cred-file-name>}.yaml
    
  5. Create a YAML file that contains a description of the new bare metal host.

    Example of the bare metal host configuration file with the worker role
    apiVersion: metal3.io/v1alpha1
    kind: BareMetalHost
    metadata:
      labels:
        kaas.mirantis.com/baremetalhost-id: <unique-bare-metal-host-hardware-node-id>
        hostlabel.bm.kaas.mirantis.com/worker: "true"
        kaas.mirantis.com/provider: baremetal
        kaas.mirantis.com/region: region-one
      name: <bare-metal-host-unique-name>
      namespace: <managed-cluster-project-name>
    spec:
      bmc:
        address: <ip_address_for-bmc-access>
        credentialsName: <credentials-name>
      bootMACAddress: <bare-metal-host-boot-mac-address>
      online: true
    
  6. Apply this configuration YAML file to your deployment:

    kubectl apply -f ${<bare-metal-host-config-file-name>}.yaml
    
  7. Check the status of the bare metal hosts using the following command:

    kubectl get -n <managed-cluster-project-name> baremetalhosts -o yaml
    
  8. Using the command above, check that the host hardware configuration matches the MOS cluster hardware requirements.
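
    For example, to get a condensed view of the provisioning state of all hosts in the project, you can narrow the output to specific columns. This is a sketch based on the BareMetalHost status fields shown in the introspection example later in this guide; hardware details remain available in the full YAML output from the previous step:

    kubectl get -n <managed-cluster-project-name> baremetalhosts \
      -o custom-columns='NAME:.metadata.name,STATE:.status.provisioning.state'
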

Now, proceed with Create a custom bare metal host profile.

Create a custom bare metal host profile

The bare metal host profile is a Kubernetes custom resource. It enables the operator to define how the storage devices and the operating system are provisioned and configured.

This section describes the bare metal host profile default settings and configuration of custom profiles for managed clusters using Mirantis Container Cloud API.

Default configuration of the host system storage

The default host profile requires three storage devices in the following strict order:

  1. Boot device and operating system storage

    This device contains boot data and operating system data. It is partitioned using the GUID Partition Table (GPT) labels. The root file system is an ext4 file system created on top of an LVM logical volume. For a detailed layout, refer to the table below.

  2. Local volumes device

    This device contains an ext4 file system with directories mounted as persistent volumes to Kubernetes. These volumes are used by the Mirantis Container Cloud services to store their data, including monitoring and identity databases.

  3. Ceph storage device

    This device is used as a Ceph datastore or Ceph OSD.

The following table summarizes the default configuration of the host system storage set up by the Container Cloud bare metal management.

Default configuration of the bare metal host storage

Device/partition

Name/Mount point

Recommended size

Description

/dev/sda1

bios_grub

4 MiB

The mandatory GRUB boot partition required for non-UEFI systems.

/dev/sda2

UEFI -> /boot/efi

0.2 GiB

The boot partition required for the UEFI boot mode.

/dev/sda3

config-2

64 MiB

The mandatory partition for the cloud-init configuration. Used during the first host boot for initial configuration.

/dev/sda4

lvm_root_part

100% of the remaining free space in the LVM volume group

The main LVM physical volume that is used to create the root file system.

/dev/sdb

lvm_lvp_part -> /mnt/local-volumes

100% of the remaining free space in the LVM volume group

The LVM physical volume that is used to create the file system for LocalVolumeProvisioner.

/dev/sdc

-

100% of the remaining free space in the LVM volume group

Clean raw disk that will be used for the Ceph storage back end.

Now, proceed to Create MOS host profiles.

Create MOS host profiles

Different types of MOS nodes require differently configured host storage. This section describes how to create custom host profiles for different types of MOS nodes.

You can create custom profiles for managed clusters using Container Cloud API.

To create MOS bare metal host profiles:

  1. Log in to the local machine where your management cluster kubeconfig is located and where kubectl is installed.

    Note

    The management cluster kubeconfig is created automatically during the last stage of the management cluster bootstrap.

  2. Create a new bare metal host profile for MOS compute nodes in a YAML file under the templates/bm/ directory.

  3. Edit the host profile using the example template below to meet your hardware configuration requirements:

    apiVersion: metal3.io/v1alpha1
    kind: BareMetalHostProfile
    metadata:
      name: <PROFILE_NAME>
      namespace: <PROJECT_NAME>
    spec:
      devices:
      # From the HW node, obtain the first device, whose size is at least 60 GiB
      - device:
          minSizeGiB: 60
          wipe: true
        partitions:
        - name: bios_grub
          partflags:
          - bios_grub
          sizeGiB: 0.00390625
          wipe: true
        - name: uefi
          partflags:
          - esp
          sizeGiB: 0.2
          wipe: true
        - name: config-2
          sizeGiB: 0.0625
          wipe: true
        # This partition is only required on compute nodes if you plan to
        # use LVM ephemeral storage.
        - name: lvm_nova_part
          wipe: true
          sizeGiB: 100
        - name: lvm_root_part
          sizeGiB: 0
          wipe: true
      # From the HW node, obtain the second device, whose size is at least 60 GiB
      # If a device exists but does not fit the size,
      # the BareMetalHostProfile will not be applied to the node
      - device:
          minSizeGiB: 60
          wipe: true
      # From the HW node, obtain the disk device with the exact name
      - device:
          byName: /dev/nvme0n1
          minSizeGiB: 60
          wipe: true
        partitions:
        - name: lvm_lvp_part
          sizeGiB: 0
          wipe: true
      # Example of wiping a device without partitioning it.
      # Mandatory for the case when a disk is supposed to be used for Ceph back end
      # later
      - device:
          byName: /dev/sde
          wipe: true
      fileSystems:
      - fileSystem: vfat
        partition: config-2
      - fileSystem: vfat
        mountPoint: /boot/efi
        partition: uefi
      - fileSystem: ext4
        logicalVolume: root
        mountPoint: /
      - fileSystem: ext4
        logicalVolume: lvp
        mountPoint: /mnt/local-volumes/
      logicalVolumes:
      - name: root
        sizeGiB: 0
        vg: lvm_root
      - name: lvp
        sizeGiB: 0
        vg: lvm_lvp
      postDeployScript: |
        #!/bin/bash -ex
        echo $(date) 'post_deploy_script done' >> /root/post_deploy_done
      preDeployScript: |
        #!/bin/bash -ex
        echo $(date) 'pre_deploy_script done' >> /root/pre_deploy_done
      volumeGroups:
      - devices:
        - partition: lvm_root_part
        name: lvm_root
      - devices:
        - partition: lvm_lvp_part
        name: lvm_lvp
      grubConfig:
        defaultGrubOptions:
        - GRUB_DISABLE_RECOVERY="true"
        - GRUB_PRELOAD_MODULES=lvm
        - GRUB_TIMEOUT=20
      kernelParameters:
        sysctl:
          kernel.panic: "900"
          kernel.dmesg_restrict: "1"
          kernel.core_uses_pid: "1"
          fs.file-max: "9223372036854775807"
          fs.aio-max-nr: "1048576"
          fs.inotify.max_user_instances: "4096"
          vm.max_map_count: "262144"
    
  4. Add or edit the mandatory parameters in the new BareMetalHostProfile object. For the parameters description, see API: BareMetalHostProfile spec.

  5. Add the bare metal host profile to your management cluster:

    kubectl --kubeconfig <pathToManagementClusterKubeconfig> -n <projectName> apply -f <pathToBareMetalHostProfileFile>
    
  6. If required, further modify the host profile:

    kubectl --kubeconfig <pathToManagementClusterKubeconfig> -n <projectName> edit baremetalhostprofile <hostProfileName>
    
  7. Repeat the steps above to create host profiles for other OpenStack node roles such as control plane nodes and storage nodes.

Now, proceed to Enable huge pages in a host profile.

Enable huge pages in a host profile

The BareMetalHostProfile API allows configuring a host to use the huge pages feature of the Linux kernel on managed clusters.

Note

Huge pages is a mode of operation of the Linux kernel. With huge pages enabled, the kernel allocates RAM in bigger chunks, or pages. This allows KVM (kernel-based virtual machine) and the VMs running on it to use the host RAM more efficiently and improves the performance of VMs.

To enable huge pages in a custom bare metal host profile for a managed cluster:

  1. Log in to the local machine where your management cluster kubeconfig is located and where kubectl is installed.

    Note

    The management cluster kubeconfig is created automatically during the last stage of the management cluster bootstrap.

  2. Open for editing or create a new bare metal host profile under the templates/bm/ directory.

  3. Edit the grubConfig section of the host profile spec using the example below to configure the kernel boot parameters and enable huge pages:

    spec:
      grubConfig:
        defaultGrubOptions:
        - GRUB_DISABLE_RECOVERY="true"
        - GRUB_PRELOAD_MODULES=lvm
        - GRUB_TIMEOUT=20
        - GRUB_CMDLINE_LINUX_DEFAULT="hugepagesz=1G hugepages=N"
    

    The example configuration above allocates N huge pages of 1 GB each on the server boot. The last hugepagesz parameter value is used as the default unless default_hugepagesz is defined. For details about possible values, see the official Linux kernel documentation.
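
    For example, the following sketch allocates both 1 GB and 2 MB huge pages and sets the default page size explicitly through default_hugepagesz. The page counts are illustrative and must be adjusted to your hardware and workload:

    spec:
      grubConfig:
        defaultGrubOptions:
        - GRUB_DISABLE_RECOVERY="true"
        - GRUB_PRELOAD_MODULES=lvm
        - GRUB_TIMEOUT=20
        - GRUB_CMDLINE_LINUX_DEFAULT="default_hugepagesz=1G hugepagesz=1G hugepages=16 hugepagesz=2M hugepages=1024"
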

  4. Add the bare metal host profile to your management cluster:

    kubectl --kubeconfig <pathToManagementClusterKubeconfig> -n <projectName> apply -f <pathToBareMetalHostProfileFile>
    
  5. If required, further modify the host profile:

    kubectl --kubeconfig <pathToManagementClusterKubeconfig> -n <projectName> edit baremetalhostprofile <hostProfileName>
    
  6. Proceed to Create a managed cluster.

Create a managed cluster

This section instructs you on how to configure and deploy a managed cluster that is based on the baremetal-based management cluster through the Mirantis Container Cloud web UI.

To create a managed cluster on bare metal:

  1. Log in to the Container Cloud web UI with the writer permissions.

  2. Switch to the required project using the Switch Project action icon located on top of the main left-side navigation panel.

    Caution

    Do not create a new managed cluster for MOS in the default project (Kubernetes namespace). If no projects are defined, create a new mos project first.

  3. In the SSH keys tab, click Add SSH Key to upload the public SSH key that will be used for the SSH access to VMs.

  4. Available since 2.7.0 Optional. In the Proxies tab, enable proxy access to the managed cluster:

    1. Click Add Proxy.

    2. In the Add New Proxy wizard, fill out the form with the following parameters:

      Proxy configuration

      Parameter

      Description

      Proxy Name

      Name of the proxy server to use during a managed cluster creation.

      Region

      From the drop-down list, select the required region.

      HTTP Proxy

      Add the HTTP proxy server domain name in the following format:

      • http://proxy.example.com:port - for anonymous access

      • http://user:password@proxy.example.com:port - for restricted access

      HTTPS Proxy

      Add the HTTPS proxy server domain name in the same format as for HTTP Proxy.

      No Proxy

      Comma-separated list of IP addresses or domain names.

    For the list of Mirantis resources and IP addresses to be accessible from the Container Cloud clusters, see Reference Architecture: Requirements.

  5. In the Clusters tab, click Create Cluster.

  6. Configure the new cluster in the Create New Cluster wizard that opens:

    1. Define general and Kubernetes parameters:

      Create new cluster: General, Provider, and Kubernetes

      Section

      Parameter name

      Description

      General settings

      Cluster name

      The cluster name.

      Provider

      Select Baremetal.

      Region

      From the drop-down list, select Baremetal.

      Release version

      Select a Container Cloud version with the OpenStack label tag. Otherwise, you will not be able to deploy MOS on this managed cluster.

      Proxy Available since 2.7.0

      Optional. From the drop-down list, select the proxy server name that you have previously created.

      SSH keys

      From the drop-down list, select the SSH key name that you have previously added for SSH access to the bare metal hosts.

      Provider

      LB host IP

      The IP address of the load balancer endpoint that will be used to access the Kubernetes API of the new cluster. This IP address must be on the Combined/PXE network.

      LB address range

      The range of IP addresses that can be assigned to load balancers for Kubernetes Services by MetalLB.

      Kubernetes

      Services CIDR blocks

      The Kubernetes Services CIDR blocks. For example, 10.233.0.0/18.

      Pods CIDR blocks

      The Kubernetes pods CIDR blocks. For example, 10.233.64.0/18.

    2. Configure StackLight:

  7. Click Create.

  8. Optional. As of MOS 21.4, you can colocate the OpenStack control plane with the managed cluster Kubernetes manager nodes by adding the following field to the Cluster object spec:

    spec:
      providerSpec:
        value:
          dedicatedControlPlane: false
    

    Note

    This feature is available as technical preview. Use such configuration for testing and evaluation purposes only.

  9. Once you have created a MOS managed cluster, some StackLight alerts may be raised as false positives until you deploy the Mirantis OpenStack environment.

  10. Proceed to Advanced networking configuration.

Advanced networking configuration

With L2 networking templates, you can create advanced host networking configurations for your clusters. For example, you can create bond interfaces on top of physical interfaces on the host or use multiple subnets to separate different types of network traffic.

You can use several host-specific L2 templates per one cluster to support different hardware configurations. For example, you can create L2 templates with a different number and layout of NICs to be applied to specific machines of one cluster.

You can also use multiple L2 templates to support different roles for nodes in a MOS installation. You can create L2 templates with different logical interfaces, and assign them to individual machines based on their roles in a MOS cluster.

When you create a baremetal-based project, the example templates with the ipam/PreInstalledL2Template label are copied to this project. These templates are preinstalled during the management cluster bootstrap.
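
To inspect the preinstalled templates that were copied to your project, you can list the L2 templates filtered by this label key. The following sketch selects on label presence only and does not assume a particular label value:

kubectl --kubeconfig <pathToManagementClusterKubeconfig> \
  -n <ProjectNameForNewManagedCluster> get l2template -l ipam/PreInstalledL2Template
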

Follow the procedures below to create L2 templates for your managed MOS clusters.

Workflow of network interface naming

To simplify operations with L2 templates, before you start creating them, inspect the general workflow of network interface name gathering and processing.

Network interface naming workflow:

  1. The Operator creates a baremetalHost object.

  2. The baremetalHost object executes the introspection stage and becomes ready.

  3. The Operator collects information about NIC count, naming, and so on for further changes in the mapping logic.

    At this stage, the NIC order in the object may change randomly during each introspection, but the NIC names are always the same. For more details, see Predictable Network Interface Names.

    For example:

    # Example commands:
    # kubectl -n managed-ns get bmh baremetalhost1 -o custom-columns='NAME:.metadata.name,STATUS:.status.provisioning.state'
    # NAME            STATUS
    # baremetalhost1  ready
    
    # kubectl -n managed-ns get bmh baremetalhost1 -o yaml
    # Example output:
    
    apiVersion: metal3.io/v1alpha1
    kind: BareMetalHost
    ...
    status:
    ...
        nics:
        - ip: fe80::ec4:7aff:fe6a:fb1f%eno2
          mac: 0c:c4:7a:6a:fb:1f
          model: 0x8086 0x1521
          name: eno2
          pxe: false
        - ip: fe80::ec4:7aff:fe1e:a2fc%ens1f0
          mac: 0c:c4:7a:1e:a2:fc
          model: 0x8086 0x10fb
          name: ens1f0
          pxe: false
        - ip: fe80::ec4:7aff:fe1e:a2fd%ens1f1
          mac: 0c:c4:7a:1e:a2:fd
          model: 0x8086 0x10fb
          name: ens1f1
          pxe: false
        - ip: 192.168.1.151 # Temp. PXE network address
          mac: 0c:c4:7a:6a:fb:1e
          model: 0x8086 0x1521
          name: eno1
          pxe: true
     ...
    
  4. The Operator selects from the following options:

  5. The Operator creates a Machine or Subnet object.

  6. The baremetal-provider service links the Machine object to the baremetalHost object.

  7. The kaas-ipam and baremetal-provider services collect hardware information from the baremetalHost object and use it to configure host networking and services.

  8. The kaas-ipam service:

    1. Spawns the IpamHost object.

    2. Renders the l2template object.

    3. Spawns the ipaddr object.

    4. Updates the IpamHost object status with all rendered and linked information.

  9. The baremetal-provider service collects the rendered networking information from the IpamHost object.

  10. The baremetal-provider service proceeds with the IpamHost object provisioning.

Create subnets

Before creating an L2 template, ensure that you have the required subnets that can be used in the L2 template to allocate IP addresses for the managed cluster nodes. Where required, create a number of subnets for a particular project using the Subnet CR. A subnet has three logical scopes:

  • global - CR uses the default namespace. A subnet can be used for any cluster located in any project.

  • namespaced - CR uses the namespace that corresponds to a particular project where managed clusters are located. A subnet can be used for any cluster located in the same project.

  • cluster - CR uses the namespace where the referenced cluster is located. A subnet is only accessible to the cluster that L2Template.spec.clusterRef refers to. The Subnet objects with the cluster scope will be created for every new cluster.

You can have subnets with the same name in different projects. In this case, the subnet that has the same project as the cluster will be used. One L2 template may reference several subnets, and those subnets may have different scopes.

The IP address objects (IPaddr CR) that are allocated from subnets always have the same project as their corresponding IpamHost objects, regardless of the subnet scope.

To create subnets for a cluster:

  1. Log in to a local machine where your management cluster kubeconfig is located and where kubectl is installed.

    Note

    The management cluster kubeconfig is created during the last stage of the management cluster bootstrap.

  2. Create the subnet.yaml file with a number of global or namespaced subnets depending on the configuration of your cluster:

    kubectl --kubeconfig <pathToManagementClusterKubeconfig> apply -f <SubnetFileName.yaml>
    

    Note

    In the command above and in the steps below, substitute the parameters enclosed in angle brackets with the corresponding values.

    Example of a subnet.yaml file:

    apiVersion: ipam.mirantis.com/v1alpha1
    kind: Subnet
    metadata:
      name: demo
      namespace: demo-namespace
    spec:
      cidr: 10.11.0.0/24
      gateway: 10.11.0.9
      includeRanges:
      - 10.11.0.5-10.11.0.70
      nameservers:
      - 172.18.176.6
    
    Specification fields of the Subnet object

    Parameter

    Description

    cidr (singular)

    A valid IPv4 CIDR, for example, 10.11.0.0/24.

    includeRanges (list)

    A list of IP address ranges within the given CIDR that should be used in the allocation of IPs for nodes (excluding the gateway address). The IPs outside the given ranges will not be used in the allocation. Each element of the list can be either an interval 10.11.0.5-10.11.0.70 or a single address 10.11.0.77. In the example above, the addresses 10.11.0.5-10.11.0.70 (excluding the gateway address 10.11.0.9) will be allocated for nodes. The includeRanges parameter is mutually exclusive with excludeRanges.

    excludeRanges (list)

    A list of IP address ranges within the given CIDR that should not be used in the allocation of IPs for nodes. The IPs within the given CIDR but outside the given ranges will be used in the allocation (excluding gateway address). Each element of the list can be either an interval 10.11.0.5-10.11.0.70 or a single address 10.11.0.77. The excludeRanges parameter is mutually exclusive with includeRanges.

    useWholeCidr (boolean)

    If set to true, the subnet address (10.11.0.0 in the example above) and the broadcast address (10.11.0.255 in the example above) are included into the address allocation for nodes. Otherwise, (false by default), the subnet address and broadcast address will be excluded from the address allocation.

    gateway (singular)

    A valid gateway address, for example, 10.11.0.9.

    nameservers (list)

    A list of the IP addresses of name servers. Each element of the list is a single address, for example, 172.18.176.6.

    Caution

    The subnet for the PXE network is automatically created during deployment and must contain the ipam/DefaultSubnet: "1" label. Each bare metal region must have only one subnet with this label.

    Caution

    You may use different subnets to allocate IP addresses to different components of Mirantis Container Cloud in your cluster. See below for the detailed list of available options. Each subnet that is used to configure a Container Cloud service must be labeled with a special service label that starts with the ipam/SVC- prefix. Make sure that no subnet has more than one such label.

  3. Optional. Add a subnet for the MetalLB service in your cluster. To designate a Subnet as a MetalLB address pool, use the ipam/SVC-MetalLB label key and set its value to "1". Set the cluster.sigs.k8s.io/cluster-name label to the name of the cluster where the subnet is used. You may create multiple subnets with the ipam/SVC-MetalLB label to define multiple IP address ranges for MetalLB in the cluster.

    Note

    The IP addresses of the MetalLB address pool are not assigned to the interfaces on hosts. This is a purely virtual subnet. Make sure that it is not included in the L2 template definitions for your cluster.

    Note

    • When MetalLB address ranges are defined in both cluster specification and specific Subnet objects, the resulting MetalLB address pools configuration will contain address ranges from both cluster specification and Subnet objects.

    • All address ranges for L2 address pools that are defined in both cluster specification and Subnet objects are aggregated into a single L2 address pool and sorted as strings.

  4. Optional. Add Ceph ‘public’ subnet.

    Set the ipam/SVC-ceph-public label with the value "1" to create a subnet that will be used to configure the Ceph public network. Ceph will automatically use this subnet for its external connections. A Ceph OSD will look for and bind to an address from this subnet when it is started on a machine. Use this subnet in the L2 template for storage nodes. Assign this subnet to the interface connected to your storage access network.

    When using this label, set the cluster.sigs.k8s.io/cluster-name label to the name of the target cluster during the subnet creation.

  5. Optional. Add Ceph ‘replication’ subnet.

    Set the ipam/SVC-ceph-cluster label with the value "1" to create a subnet that will be used to configure the Ceph replication network. Ceph will automatically use this subnet for its internal replication traffic. Use this subnet in the L2 template for storage nodes.

    When using this label, set the cluster.sigs.k8s.io/cluster-name label to the name of the target cluster during the subnet creation.

  6. Optional. Add a subnet for the Kubernetes pods traffic.

  7. Verify that the subnet is successfully created:

    kubectl get subnet kaas-mgmt -oyaml
    

    In the system output, verify the status fields of the Subnet object using the table below.

    Status fields of the Subnet object

    Parameter

    Description

    statusMessage

    Contains a short state description and a more detailed one if applicable. The short status values are as follows:

    • OK - operational.

    • ERR - non-operational. This status has a detailed description, for example, ERR: Wrong includeRange for CIDR….

    cidr

    Reflects the actual CIDR, has the same meaning as spec.cidr.

    gateway

    Reflects the actual gateway, has the same meaning as spec.gateway.

    nameservers

    Reflects the actual name servers, has the same meaning as spec.nameservers.

    ranges

    Specifies the address ranges that are calculated using the fields from spec: cidr, includeRanges, excludeRanges, gateway, useWholeCidr. These ranges are directly used for node IP allocation.

    lastUpdate

    Includes the date and time of the latest update of the Subnet RC.

    allocatable

    Includes the number of currently available IP addresses that can be allocated for nodes from the subnet.

    allocatedIPs

    Specifies the list of IPv4 addresses with the corresponding IPaddr object IDs that were already allocated from the subnet.

    capacity

    Contains the total number of IP addresses held by ranges, which equals the sum of the allocatable and allocatedIPs parameter values.

    versionIpam

    Contains the version of the kaas-ipam component that made the latest changes to the Subnet RC.

    Example of a successfully created subnet:

    apiVersion: ipam.mirantis.com/v1alpha1
    kind: Subnet
    metadata:
      labels:
        ipam/UID: 6039758f-23ee-40ba-8c0f-61c01b0ac863
        kaas.mirantis.com/provider: baremetal
        kaas.mirantis.com/region: region-one
      name: kaas-mgmt
      namespace: default
    spec:
      cidr: 10.0.0.0/24
      excludeRanges:
      - 10.0.0.100
      - 10.0.0.101-10.0.0.120
      gateway: 10.0.0.1
      includeRanges:
      - 10.0.0.50-10.0.0.90
      nameservers:
      - 172.18.176.6
    status:
      allocatable: 38
      allocatedIPs:
      - 10.0.0.50:0b50774f-ffed-11ea-84c7-0242c0a85b02
      - 10.0.0.51:1422e651-ffed-11ea-84c7-0242c0a85b02
      - 10.0.0.52:1d19912c-ffed-11ea-84c7-0242c0a85b02
      capacity: 41
      cidr: 10.0.0.0/24
      gateway: 10.0.0.1
      lastUpdate: "2020-09-26T11:40:44Z"
      nameservers:
      - 172.18.176.6
      ranges:
      - 10.0.0.50-10.0.0.90
      statusMessage: OK
      versionIpam: v3.0.999-20200807-130909-44151f8
    

Now, proceed with creating subnets for your MOS cluster as described in Create subnets for MOS cluster.

Create subnets for MOS cluster

According to the MOS reference architecture, you should create the following subnets.

lcm-nw

The LCM network of the MOS cluster. Example of lcm-nw:

apiVersion: ipam.mirantis.com/v1alpha1
kind: Subnet
metadata:
  labels:
    kaas.mirantis.com/provider: baremetal
    kaas.mirantis.com/region: region-one
    kaas-mgmt-subnet: ""
  name: lcm-nw
  namespace: <MOSClusterNamespace>
spec:
  cidr: 172.16.43.0/24
  gateway: 172.16.43.1
  includeRanges:
  - 172.16.43.10-172.16.43.100
k8s-ext-subnet

The addresses from this subnet are assigned to interfaces connected to the external network.

Example of k8s-ext-subnet:

apiVersion: ipam.mirantis.com/v1alpha1
kind: Subnet
metadata:
  labels:
    kaas.mirantis.com/provider: baremetal
    kaas.mirantis.com/region: region-one
  name: k8s-ext-subnet
  namespace: <MOSClusterNamespace>
spec:
  cidr: 172.16.45.0/24
  includeRanges:
  - 172.16.45.10-172.16.45.100
mos-metallb-subnet

This subnet is not allocated to interfaces, but used as a MetalLB address pool to expose MOS API endpoints as Kubernetes cluster services.

Example of mos-metallb-subnet:

apiVersion: ipam.mirantis.com/v1alpha1
kind: Subnet
metadata:
  labels:
    kaas.mirantis.com/provider: baremetal
    kaas.mirantis.com/region: region-one
    ipam/SVC-metallb: "1"
  name: mos-metallb-subnet
  namespace: <MOSClusterNamespace>
spec:
  cidr: 172.16.45.0/24
  includeRanges:
  - 172.16.45.101-172.16.45.200
k8s-pods-subnet

The addresses from this subnet are assigned to interfaces connected to the internal network and used by Calico as the underlay for traffic between the pods in the Kubernetes cluster.

Example of k8s-pods-subnet:

apiVersion: ipam.mirantis.com/v1alpha1
kind: Subnet
metadata:
  labels:
    kaas.mirantis.com/provider: baremetal
    kaas.mirantis.com/region: region-one
  name: k8s-pods-subnet
  namespace: <MOSClusterNamespace>
spec:
  cidr: 10.12.3.0/24
  includeRanges:
  - 10.12.3.10-10.12.3.100
neutron-tunnel-subnet

The underlay network for VXLAN tunnels for the MOS tenant traffic. If deployed with Tungsten Fabric, it is used for MPLS over UDP+GRE traffic.

Example of neutron-tunnel-subnet:

apiVersion: ipam.mirantis.com/v1alpha1
kind: Subnet
metadata:
  labels:
    kaas.mirantis.com/provider: baremetal
    kaas.mirantis.com/region: region-one
  name: neutron-tunnel-subnet
  namespace: <MOSClusterNamespace>
spec:
  cidr: 10.12.2.0/24
  includeRanges:
  - 10.12.2.10-10.12.2.100
ceph-public-subnet

Example of a Ceph cluster access network:

apiVersion: ipam.mirantis.com/v1alpha1
kind: Subnet
metadata:
  labels:
    kaas.mirantis.com/provider: baremetal
    kaas.mirantis.com/region: region-one
    ipam/SVC-ceph-public: "1"
  name: ceph-public-subnet
  namespace: <MOSClusterNamespace>
spec:
  cidr: 10.12.0.0/24
ceph-cluster-subnet

Example of the Ceph replication traffic network:

apiVersion: ipam.mirantis.com/v1alpha1
kind: Subnet
metadata:
  labels:
    kaas.mirantis.com/provider: baremetal
    kaas.mirantis.com/region: region-one
    ipam/SVC-ceph-cluster: "1"
  name: ceph-cluster-subnet
  namespace: <MOSClusterNamespace>
spec:
  cidr: 10.12.1.0/24

Now, proceed with creating an L2 template for one or multiple managed clusters as described in Create L2 templates.

Create L2 templates

After you create subnets for the MOS managed cluster as described in Create subnets, follow the procedure below to create L2 templates for different types of OpenStack nodes in the cluster.

See the following subsections for templates that implement the MOS Reference Architecture: Networking. You may adjust the templates according to the requirements of your architecture using the last two subsections of this section. They explain mandatory parameters of the templates and supported configuration options.

Create Kubernetes manager node L2 template

According to the reference architecture, the Kubernetes manager nodes in the MOS managed cluster must be connected to the following networks:

  • PXE network

  • LCM network

Caution

If you plan to deploy a MOS cluster with the compact control plane option, skip this section entirely and proceed with Create MOS controller node L2 template.

To create an L2 template for Kubernetes manager nodes:

  1. Create or open the mos-l2templates.yml file that contains the L2 templates you are preparing.

  2. Add an L2 template using the following example. Adjust the values of specific parameters according to the specifications of your environment.

    L2 template example
    apiVersion: ipam.mirantis.com/v1alpha1
    kind: L2Template
    metadata:
      labels:
        kaas.mirantis.com/provider: baremetal
        kaas.mirantis.com/region: region-one
        cluster.sigs.k8s.io/cluster-name: <MOSClusterName>
      name: k8s-manager
      namespace: <MOSClusterNamespace>
    spec:
      autoIfMappingPrio:
      - provision
      - eno
      - ens
      - enp
      clusterRef: <MOSClusterName>
      l3Layout:
      - subnetName: lcm-nw
        scope: global
        labelSelector:
          kaas.mirantis.com/provider: baremetal
          kaas-mgmt-subnet: ""
      npTemplate: |-
        version: 2
        ethernets:
          {{nic 0}}:
            dhcp4: false
            dhcp6: false
            match:
              macaddress: {{mac 0}}
            set-name: {{nic 0}}
            mtu: 9000
          {{nic 1}}:
            dhcp4: false
            dhcp6: false
            match:
              macaddress: {{mac 1}}
            set-name: {{nic 1}}
            mtu: 9000
          {{nic 2}}:
            dhcp4: false
            dhcp6: false
            match:
              macaddress: {{mac 2}}
            set-name: {{nic 2}}
            mtu: 9000
          {{nic 3}}:
            dhcp4: false
            dhcp6: false
            match:
              macaddress: {{mac 3}}
            set-name: {{nic 3}}
            mtu: 9000
        bonds:
          bond0:
            mtu: 9000
            parameters:
              mode: 802.3ad
            interfaces:
            - {{nic 0}}
            - {{nic 1}}
        vlans:
          k8s-lcm-v:
            id: 403
            link: bond0
            mtu: 9000
          k8s-ext-v:
            id: 409
            link: bond0
            mtu: 9000
          k8s-pods-v:
            id: 408
            link: bond0
            mtu: 9000
        bridges:
          k8s-lcm:
            interfaces: [k8s-lcm-v]
            nameservers:
              addresses: {{nameservers_from_subnet "lcm-nw"}}
            gateway4: {{ gateway_from_subnet "lcm-nw" }}
            addresses:
            - {{ ip "0:lcm-nw" }}
          k8s-ext:
            interfaces: [k8s-ext-v]
            addresses:
            - {{ip "k8s-ext:k8s-ext-subnet"}}
            mtu: 9000
          k8s-pods:
            interfaces: [k8s-pods-v]
            addresses:
            - {{ip "k8s-pods:k8s-pods-subnet"}}
            mtu: 9000
    
  3. Proceed with Create MOS controller node L2 template. The resulting L2 template will be used to render the netplan configuration for the managed cluster machines.

Create MOS controller node L2 template

According to the reference architecture, MOS controller nodes must be connected to the following networks:

  • PXE network

  • LCM network

  • Kubernetes workloads network

  • Storage public network

  • Floating IP and provider networks. Not required for deployment with Tungsten Fabric.

  • Tenant underlay networks. If deploying with VXLAN networking or with Tungsten Fabric. In the latter case, the BGP service is configured over this network.

To create an L2 template for MOS controller nodes:

  1. Create or open the mos-l2template.yml file that contains the L2 templates.

  2. Add an L2 template using the following example. Adjust the values of specific parameters according to the specification of your environment.

    Example of an L2 template for MOS controller nodes
    apiVersion: ipam.mirantis.com/v1alpha1
    kind: L2Template
    metadata:
      labels:
        kaas.mirantis.com/provider: baremetal
        kaas.mirantis.com/region: region-one
        cluster.sigs.k8s.io/cluster-name: <MOSClusterName>
      name: mos-controller
      namespace: <MOSClusterNamespace>
    spec:
      autoIfMappingPrio:
      - provision
      - eno
      - ens
      - enp
      clusterRef: <MOSClusterName>
      l3Layout:
      - subnetName: lcm-nw
        scope: global
        labelSelector:
          kaas.mirantis.com/provider: baremetal
          kaas-mgmt-subnet: ""
      - subnetName: k8s-ext-subnet
        scope: namespace
      - subnetName: k8s-pods-subnet
        scope: namespace
      - subnetName: ceph-cluster-subnet
        scope: namespace
      - subnetName: ceph-public-subnet
        scope: namespace
      - subnetName: neutron-tunnel-subnet
        scope: namespace
      npTemplate: |-
        version: 2
        ethernets:
          {{nic 0}}:
            dhcp4: false
            dhcp6: false
            match:
              macaddress: {{mac 0}}
            set-name: {{nic 0}}
            mtu: 9000
          {{nic 1}}:
            dhcp4: false
            dhcp6: false
            match:
              macaddress: {{mac 1}}
            set-name: {{nic 1}}
            mtu: 9000
          {{nic 2}}:
            dhcp4: false
            dhcp6: false
            match:
              macaddress: {{mac 2}}
            set-name: {{nic 2}}
            mtu: 9000
          {{nic 3}}:
            dhcp4: false
            dhcp6: false
            match:
              macaddress: {{mac 3}}
            set-name: {{nic 3}}
            mtu: 9000
        bonds:
          bond0:
            mtu: 9000
            parameters:
              mode: 802.3ad
            interfaces:
            - {{nic 0}}
            - {{nic 1}}
          bond1:
            mtu: 9000
            parameters:
              mode: 802.3ad
            interfaces:
            - {{nic 2}}
            - {{nic 3}}
        vlans:
          k8s-lcm-v:
            id: 403
            link: bond0
            mtu: 9000
          k8s-ext-v:
            id: 409
            link: bond0
            mtu: 9000
          k8s-pods-v:
            id: 408
            link: bond0
            mtu: 9000
          pr-floating:
            id: 407
            link: bond1
            mtu: 9000
          stor-frontend:
            id: 404
            link: bond0
            mtu: 9000
          stor-backend:
            id: 405
            link: bond1
            mtu: 9000
          neutron-tunnel:
            id: 406
            link: bond1
            addresses:
            - {{ip "neutron-tunnel:neutron-tunnel-subnet"}}
            mtu: 9000
        bridges:
          k8s-lcm:
            interfaces: [k8s-lcm-v]
            nameservers:
              addresses: {{nameservers_from_subnet "lcm-nw"}}
            gateway4: {{ gateway_from_subnet "lcm-nw" }}
            addresses:
            - {{ ip "0:lcm-nw" }}
          k8s-ext:
            interfaces: [k8s-ext-v]
            addresses:
            - {{ip "k8s-ext:k8s-ext-subnet"}}
            mtu: 9000
          k8s-pods:
            interfaces: [k8s-pods-v]
            addresses:
            - {{ip "k8s-pods:k8s-pods-subnet"}}
            mtu: 9000
          ceph-public:
            interfaces: [stor-frontend]
            addresses:
            - {{ip "ceph-public:ceph-public-subnet"}}
            mtu: 9000
          ceph-cluster:
            interfaces: [stor-backend]
            addresses:
            - {{ip "ceph-cluster:ceph-cluster-subnet"}}
            mtu: 9000
    
  3. Proceed with Create MOS compute node L2 template.

Create MOS compute node L2 template

According to the reference architecture, MOS compute nodes must be connected to the following networks:

  • PXE network

  • LCM network

  • Storage public network (if deploying with Ceph as a back-end for ephemeral storage)

  • Floating IP and provider networks (if deploying OpenStack with DVR)

  • Tenant underlay networks

To create an L2 template for MOS compute nodes:

  1. Add L2 template to the mos-l2templates.yml file using the following example. Adjust the values of parameters according to the specification of your environment.

    Example of an L2 template for MOS compute nodes
    apiVersion: ipam.mirantis.com/v1alpha1
    kind: L2Template
    metadata:
      labels:
        kaas.mirantis.com/provider: baremetal
        kaas.mirantis.com/region: region-one
        cluster.sigs.k8s.io/cluster-name: <MOSClusterName>
      name: mos-compute
      namespace: <MOSClusterNamespace>
    spec:
      autoIfMappingPrio:
      - provision
      - eno
      - ens
      - enp
      clusterRef: <MOSClusterName>
      l3Layout:
      - subnetName: lcm-nw
        scope: global
        labelSelector:
          kaas.mirantis.com/provider: baremetal
          kaas-mgmt-subnet: ""
      - subnetName: k8s-ext-subnet
        scope: namespace
      - subnetName: k8s-pods-subnet
        scope: namespace
      - subnetName: ceph-cluster-subnet
        scope: namespace
      - subnetName: neutron-tunnel-subnet
        scope: namespace
      npTemplate: |-
        version: 2
        ethernets:
          {{nic 0}}:
            dhcp4: false
            dhcp6: false
            match:
              macaddress: {{mac 0}}
            set-name: {{nic 0}}
            mtu: 9000
          {{nic 1}}:
            dhcp4: false
            dhcp6: false
            match:
              macaddress: {{mac 1}}
            set-name: {{nic 1}}
            mtu: 9000
          {{nic 2}}:
            dhcp4: false
            dhcp6: false
            match:
              macaddress: {{mac 2}}
            set-name: {{nic 2}}
            mtu: 9000
          {{nic 3}}:
            dhcp4: false
            dhcp6: false
            match:
              macaddress: {{mac 3}}
            set-name: {{nic 3}}
            mtu: 9000
        bonds:
          bond0:
            mtu: 9000
            parameters:
              mode: 802.3ad
            interfaces:
            - {{nic 0}}
            - {{nic 1}}
          bond1:
            mtu: 9000
            parameters:
              mode: 802.3ad
            interfaces:
            - {{nic 2}}
            - {{nic 3}}
        vlans:
          k8s-lcm-v:
            id: 403
            link: bond0
            mtu: 9000
          k8s-ext-v:
            id: 409
            link: bond0
            mtu: 9000
          k8s-pods-v:
            id: 408
            link: bond0
            mtu: 9000
          pr-floating:
            id: 407
            link: bond1
            mtu: 9000
          stor-frontend:
            id: 404
            link: bond0
            mtu: 9000
          stor-backend:
            id: 405
            link: bond1
            mtu: 9000
          neutron-tunnel:
            id: 406
            link: bond1
            addresses:
            - {{ip "neutron-tunnel:neutron-tunnel-subnet"}}
            mtu: 9000
        bridges:
          k8s-lcm:
            interfaces: [k8s-lcm-v]
            nameservers:
              addresses: {{nameservers_from_subnet "lcm-nw"}}
            gateway4: {{ gateway_from_subnet "lcm-nw" }}
            addresses:
            - {{ ip "0:lcm-nw" }}
          k8s-ext:
            interfaces: [k8s-ext-v]
            addresses:
            - {{ip "k8s-ext:k8s-ext-subnet"}}
            mtu: 9000
          k8s-pods:
            interfaces: [k8s-pods-v]
            addresses:
            - {{ip "k8s-pods:k8s-pods-subnet"}}
            mtu: 9000
          ceph-public:
            interfaces: [stor-frontend]
            addresses:
            - {{ip "ceph-public:ceph-public-subnet"}}
            mtu: 9000
          ceph-cluster:
            interfaces: [stor-backend]
            addresses:
            - {{ip "ceph-cluster:ceph-cluster-subnet"}}
            mtu: 9000
    
  2. Proceed with Create MOS storage node L2 template.

Create MOS storage node L2 template

According to the reference architecture, MOS storage nodes in the MOS managed cluster must be connected to the following networks:

  • PXE network

  • LCM network

  • Storage access network

  • Storage replication network

To create an L2 template for MOS storage nodes:

  1. Add an L2 template to the mos-l2templates.yml file using the following example. Adjust the values of parameters according to the specification of your environment.

    Example of an L2 template for MOS storage nodes
    apiVersion: ipam.mirantis.com/v1alpha1
    kind: L2Template
    metadata:
      labels:
        kaas.mirantis.com/provider: baremetal
        kaas.mirantis.com/region: region-one
        cluster.sigs.k8s.io/cluster-name: <MOSClusterName>
      name: mos-storage
      namespace: <MOSClusterNamespace>
    spec:
      autoIfMappingPrio:
      - provision
      - eno
      - ens
      - enp
      clusterRef: <MOSClusterName>
      l3Layout:
      - subnetName: lcm-nw
        scope: global
        labelSelector:
          kaas.mirantis.com/provider: baremetal
          kaas-mgmt-subnet: ""
      - subnetName: k8s-ext-subnet
        scope: namespace
      - subnetName: k8s-pods-subnet
        scope: namespace
      - subnetName: ceph-cluster-subnet
        scope: namespace
      - subnetName: ceph-public-subnet
        scope: namespace
      npTemplate: |-
        version: 2
        ethernets:
          {{nic 0}}:
            dhcp4: false
            dhcp6: false
            match:
              macaddress: {{mac 0}}
            set-name: {{nic 0}}
            mtu: 9000
          {{nic 1}}:
            dhcp4: false
            dhcp6: false
            match:
              macaddress: {{mac 1}}
            set-name: {{nic 1}}
            mtu: 9000
          {{nic 2}}:
            dhcp4: false
            dhcp6: false
            match:
              macaddress: {{mac 2}}
            set-name: {{nic 2}}
            mtu: 9000
          {{nic 3}}:
            dhcp4: false
            dhcp6: false
            match:
              macaddress: {{mac 3}}
            set-name: {{nic 3}}
            mtu: 9000
        bonds:
          bond0:
            mtu: 9000
            parameters:
              mode: 802.3ad
            interfaces:
            - {{nic 0}}
            - {{nic 1}}
          bond1:
            mtu: 9000
            parameters:
              mode: 802.3ad
            interfaces:
            - {{nic 2}}
            - {{nic 3}}
        vlans:
          k8s-lcm-v:
            id: 403
            link: bond0
            mtu: 9000
          k8s-ext-v:
            id: 409
            link: bond0
            mtu: 9000
          k8s-pods-v:
            id: 408
            link: bond0
            mtu: 9000
          stor-frontend:
            id: 404
            link: bond0
            mtu: 9000
          stor-backend:
            id: 405
            link: bond1
            mtu: 9000
        bridges:
          k8s-lcm:
            interfaces: [k8s-lcm-v]
            nameservers:
              addresses: {{nameservers_from_subnet "lcm-nw"}}
            gateway4: {{ gateway_from_subnet "lcm-nw" }}
            addresses:
            - {{ ip "0:lcm-nw" }}
          k8s-ext:
            interfaces: [k8s-ext-v]
            addresses:
            - {{ip "k8s-ext:k8s-ext-subnet"}}
            mtu: 9000
          k8s-pods:
            interfaces: [k8s-pods-v]
            addresses:
            - {{ip "k8s-pods:k8s-pods-subnet"}}
            mtu: 9000
          ceph-public:
            interfaces: [stor-frontend]
            addresses:
            - {{ip "ceph-public:ceph-public-subnet"}}
            mtu: 9000
          ceph-cluster:
            interfaces: [stor-backend]
            addresses:
            - {{ip "ceph-cluster:ceph-cluster-subnet"}}
            mtu: 9000
    
  2. Proceed with Apply and check L2 templates.

Apply and check L2 templates

To add L2 templates for a MOS cluster:

  1. Log in to a local machine where your management cluster kubeconfig is located and where kubectl is installed.

    Note

    The management cluster kubeconfig is created during the last stage of the management cluster bootstrap.

  2. Add the L2 template to your management cluster:

    kubectl --kubeconfig <pathToManagementClusterKubeconfig> apply -f <pathToL2TemplateYamlFile>
    
  3. Inspect the existing L2 templates to see if any one fits your deployment:

    kubectl --kubeconfig <pathToManagementClusterKubeconfig> \
    get l2template -n <ProjectNameForNewManagedCluster>
    
  4. Optional. Further modify the template if required or in case of a configuration mistake. See Mandatory parameters of L2 templates and Netplan template macros for details:

    kubectl --kubeconfig <pathToManagementClusterKubeconfig> \
    -n <ProjectNameForNewManagedCluster> edit l2template <L2templateName>
    
Mandatory parameters of L2 templates

Think of an L2 template as a template for networking configuration for your hosts. You may adjust the parameters according to the actual requirements and hardware setup of your hosts.

L2 template mandatory parameters

Parameter

Description

clusterRef

References the Cluster object that this template is applied to. The default value is used to apply the given template to all clusters within a particular project, unless an L2 template that references a specific cluster name exists.

ifMapping or autoIfMappingPrio

  • ifMapping is a list of interface names for the template. The interface mapping is defined globally for all bare metal hosts in the cluster but can be overridden at the host level, if required, by editing the IpamHost object for a particular host. The ifMapping parameter is mutually exclusive with autoIfMappingPrio.

  • autoIfMappingPrio is a list of prefixes, such as eno, ens, and so on, to match the interfaces to automatically create a list for the template. If you are not aware of any specific ordering of interfaces on the nodes, use the default ordering from the Predictable Network Interface Names specification for systemd.

    You can also override the default NIC list per host using the IfMappingOverride parameter of the corresponding IpamHost. The provision value corresponds to the network interface that was used to provision a node. Usually, it is the first NIC found on a particular node. It is defined explicitly to ensure that this interface will not be reconfigured accidentally.

    The autoIfMappingPrio parameter is mutually exclusive with ifMapping.

npTemplate

A netplan-compatible configuration with special lookup functions that defines the networking settings for the cluster hosts, where physical NIC names and details are parameterized. This configuration will be processed using Go templates. Instead of specifying IP and MAC addresses, interface names, and other network details specific to a particular host, the template supports use of special lookup functions. These lookup functions, such as nic, mac, ip, and so on, return host-specific network information when the template is rendered for a particular host. For details about netplan, see the official netplan documentation.

Caution

All rules and restrictions of the netplan configuration also apply to L2 templates. For details, see the official netplan documentation.

Caution

We strongly recommend following the conventions below for network interface naming:

  • A physical NIC name set by an L2 template must not exceed 15 symbols. Otherwise, an L2 template creation fails. This limit is set by the Linux kernel.

  • Names of virtual network interfaces such as VLANs, bridges, bonds, veth, and so on must not exceed 15 symbols.

We recommend setting interface names that do not exceed 13 symbols for both physical and virtual interfaces to avoid corner cases and issues in netplan rendering.

l3Layout section parameters

Parameter

Description

subnetName

Name of the reference to the subnet that will be used in the npTemplate section. This name may differ from the name of the actual Subnet resource if the labelSelector field is present and uniquely identifies the resource.

labelSelector

A dictionary of labels and values used to filter and find the Subnet resource that the template refers to by the subnetName.

subnetPool

Optional. Default: none. Name of the parent SubnetPool object that will be used to create a Subnet object with a given subnetName and scope. If a corresponding Subnet object already exists, nothing will be created and the existing object will be used. If no SubnetPool is provided, no new Subnet object will be created.

scope

Logical scope of the Subnet object with a corresponding subnetName. Possible values:

  • global - the Subnet object is accessible globally, for any Container Cloud project and cluster in the region, for example, the PXE subnet.

  • namespace - the Subnet object is accessible within the same project and region where the L2 template is defined.
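
The following snippet is a minimal sketch of an l3Layout section that uses the parameters described above. The subnet names and the service label match the examples used elsewhere in this section and are illustrative only:

spec:
  l3Layout:
    - subnetName: lcm-nw
      scope: namespace
    - subnetName: ceph-public-subnet
      scope: namespace
      labelSelector:
        # Example label that uniquely identifies the Subnet resource
        ipam/SVC-ceph-public: "1"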

Netplan template macros

The following table describes the main lookup functions, or macros, that can be used in the npTemplate field of an L2 template.

Lookup function

Description

{{nic N}}

Name of a NIC number N. NIC numbers correspond to the interface mapping list. This macro can be used as a key for the elements of the ethernets map or as the value of the name and set-name parameters of a NIC. It is also used to reference the physical NIC from definitions of virtual interfaces (vlan, bridge).

{{mac N}}

MAC address of a NIC number N registered during a host hardware inspection.

{{ip "N:subnet-a"}}

IP address and mask for a NIC number N. The address will be allocated automatically from the given subnet, unless an IP address for that interface already exists. The interface is identified by its MAC address.

{{ip "br0:subnet-x"}}

IP address and mask for a virtual interface, "br0" in this example. The address will be auto-allocated from the given subnet if the address does not exist yet.

{{gateway_from_subnet "subnet-a"}}

IPv4 default gateway address from the given subnet.

{{nameservers_from_subnet "subnet-a"}}

List of the IP addresses of name servers from the given subnet.
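
The following npTemplate fragment is a minimal sketch that combines these macros on a single NIC; the lcm-nw subnet name is taken from the example templates in this section:

npTemplate: |
  version: 2
  ethernets:
    {{nic 0}}:
      dhcp4: false
      dhcp6: false
      match:
        macaddress: {{mac 0}}
      set-name: {{nic 0}}
      addresses:
        - {{ip "0:lcm-nw"}}
      gateway4: {{gateway_from_subnet "lcm-nw"}}
      nameservers:
        addresses: {{nameservers_from_subnet "lcm-nw"}}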

L2 template example with bonds and bridges

This section contains an exemplary L2 template that demonstrates how to set up bonds and bridges on hosts for your managed clusters.

Dedicated network for the Kubernetes pods traffic

If you want to use a dedicated network for Kubernetes pods traffic, configure each node with an IPv4 and/or IPv6 address that will be used to route the pods traffic between nodes. To accomplish that, use the npTemplate.bridges.k8s-pods bridge in the L2 template, as demonstrated in the example below. As defined in Reference Architecture: Host networking, this bridge name is reserved for the Kubernetes pods network. When the k8s-pods bridge is defined in an L2 template, Calico CNI uses that network for routing the pods traffic between nodes.

Dedicated network for the Kubernetes services traffic (MetalLB)

You can use a dedicated network for external connection to the Kubernetes services exposed by the cluster. If enabled, MetalLB will listen and respond on the dedicated virtual bridge. To accomplish that, configure each node where metallb-speaker is deployed with an IPv4 or IPv6 address. Both the MetalLB IP address ranges and the IP addresses configured on those nodes must fit in the same CIDR.

Use the npTemplate.bridges.k8s-ext bridge in the L2 template, as demonstrated in the example below. This bridge name is reserved for the Kubernetes external network. The Subnet object that corresponds to the k8s-ext bridge must have explicitly excluded IP address ranges that are in use by MetalLB.
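
A minimal sketch of such a Subnet object is shown below. The name, namespace, CIDR, and the excluded range are illustrative, and the excludeRanges field name is an assumption to verify against your Container Cloud version:

apiVersion: ipam.mirantis.com/v1alpha1
kind: Subnet
metadata:
  name: k8s-ext-subnet
  namespace: managed-ns
spec:
  cidr: 10.100.100.0/24
  gateway: 10.100.100.1
  excludeRanges:
  # Example range reserved for the MetalLB address pools
  - 10.100.100.200-10.100.100.250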

Dedicated network for the Ceph distributed storage traffic

Starting from Container Cloud 2.7.0, you can configure dedicated networks for the Ceph cluster access and replication traffic. Set labels on the Subnet CRs for the corresponding networks, as described in Create subnets. Container Cloud automatically configures Ceph to use the addresses from these subnets. Ensure that the addresses are assigned to the storage nodes.

Use the npTemplate.bridges.ceph-cluster and npTemplate.bridges.ceph-public bridges in the L2 template, as demonstrated in the example below. These names are reserved for the Ceph cluster access and replication networks.

The Subnet objects used to assign IP addresses to these bridges must have corresponding labels ipam/SVC-ceph-public for the ceph-public bridge and ipam/SVC-ceph-cluster for the ceph-cluster bridge.
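
For example, a Subnet object for the Ceph public network could carry the corresponding label as follows. The object name, namespace, and CIDR are illustrative; the label value follows the convention used for other service labels in this guide:

apiVersion: ipam.mirantis.com/v1alpha1
kind: Subnet
metadata:
  name: ceph-public-subnet
  namespace: managed-ns
  labels:
    ipam/SVC-ceph-public: "1"
spec:
  # Example CIDR for the Ceph public (access) network
  cidr: 10.0.10.0/24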

Example of an L2 template with interfaces bonding
apiVersion: ipam.mirantis.com/v1alpha1
kind: L2Template
metadata:
  name: test-managed
  namespace: managed-ns
spec:
  clusterRef: managed-cluster
  autoIfMappingPrio:
    - provision
    - eno
    - ens
    - enp
  npTemplate: |
    version: 2
    ethernets:
      ten10gbe0s0:
        dhcp4: false
        dhcp6: false
        match:
          macaddress: {{mac 2}}
        set-name: {{nic 2}}
      ten10gbe0s1:
        dhcp4: false
        dhcp6: false
        match:
          macaddress: {{mac 3}}
        set-name: {{nic 3}}
    bonds:
      bond0:
        interfaces:
          - ten10gbe0s0
          - ten10gbe0s1
    vlans:
      k8s-ext-vlan:
        id: 1001
        link: bond0
      k8s-pods-vlan:
        id: 1002
        link: bond0
      stor-frontend:
        id: 1003
        link: bond0
      stor-backend:
        id: 1004
        link: bond0
    bridges:
      k8s-ext:
        interfaces: [k8s-ext-vlan]
        addresses:
          - {{ip "k8s-ext:demo-ext"}}
      k8s-pods:
        interfaces: [k8s-pods-vlan]
        addresses:
          - {{ip "k8s-pods:demo-pods"}}
      ceph-cluster:
        interfaces: [stor-backend]
        addresses:
          - {{ip "ceph-cluster:demo-ceph-cluster"}}
      ceph-public:
        interfaces: [stor-frontend]
        addresses:
          - {{ip "ceph-public:demo-ceph-public"}}
Assign L2 templates to machines

To install MOS on bare metal with Container Cloud, you must create L2 templates for each node type in the MOS cluster. Additionally, you may have to create separate templates for nodes of the same type when they have different configurations.

To assign specific L2 templates to machines in a MOS cluster:

  1. Use the clusterRef parameter in the L2 template spec to assign the templates to the cluster.

  2. Add a unique identifier label to every L2 template. Typically, that would be the name of the MOS node role, for example, l2template-compute or l2template-compute-5nics.

  3. Assign an L2 template to a machine. Set the l2TemplateSelector field in the machine spec to the name of the label added in the previous step. The IPAM controller uses this field to select a specific L2 template for the corresponding machine.

    Alternatively, you may set the l2TemplateSelector field to the name of the L2 template.

Consider the following examples of an L2 template assignment to a machine.

Example of an L2Template resource
apiVersion: ipam.mirantis.com/v1alpha1
kind: L2Template
metadata:
  name: example-node-netconfig
  namespace: my-project
  labels:
    kaas.mirantis.com/provider: baremetal
    kaas.mirantis.com/region: region-one
    l2template-example-node-netconfig: "1"
...
spec:
  clusterRef: my-cluster
...

Example of a Machine resource with the label-based L2 template selector
apiVersion: cluster.k8s.io/v1alpha1
kind: Machine
metadata:
  name: machine1
  namespace: my-project
...
spec:
  providerSpec:
    value:
      l2TemplateSelector:
        label: l2template-example-node-netconfig
...

Example of a Machine resource with the name-based L2 template selector
apiVersion: cluster.k8s.io/v1alpha1
kind: Machine
metadata:
  name: machine1
  namespace: my-project
...
spec:
  providerSpec:
    value:
      l2TemplateSelector:
        name: example-node-netconfig
...

Now, proceed to add-machine-to-managed.

Add a machine

This section describes how to add a machine to a managed MOS cluster using CLI for advanced configuration.

Create a machine using CLI

This section describes adding machines to a new MOS cluster using Mirantis Container Cloud CLI.

If you need to add more machines to an existing MOS cluster, see Add a controller node and Add a compute node.

To add a machine to the MOS cluster:

  1. Log in to the host where your management cluster kubeconfig is located and where kubectl is installed.

  2. Create a new text file mos-cluster-machines.yaml with the YAML definitions of the Machine resources. Use the example below and see the descriptions of the fields that follow:

    apiVersion: cluster.k8s.io/v1alpha1
    kind: Machine
    metadata:
      name: mos-node-role-name
      namespace: mos-project
      labels:
        kaas.mirantis.com/provider: baremetal
        kaas.mirantis.com/region: region-one
        cluster.sigs.k8s.io/cluster-name: mos-cluster
    spec:
      providerSpec:
        value:
          apiVersion: baremetal.k8s.io/v1alpha1
          kind: BareMetalMachineProviderSpec
          bareMetalHostProfile:
            name: mos-k8s-mgr
            namespace: mos-project
          l2TemplateSelector:
            name: mos-k8s-mgr
          hostSelector:
          l2TemplateIfMappingOverride: []
    
  3. Add the top level fields:

    • apiVersion

      API version of the object that is cluster.k8s.io/v1alpha1.

    • kind

      Object type that is Machine.

    • metadata

      This section will contain the metadata of the object.

    • spec

      This section will contain the configuration of the object.

  4. Add mandatory fields to the metadata section of the Machine object definition.

    • name

      The name of the Machine object.

    • namespace

      The name of the Project where the Machine will be created.

    • labels

      This section contains additional metadata of the machine. Set the following mandatory labels for the Machine object.

      • kaas.mirantis.com/provider

        Set to "baremetal".

      • kaas.mirantis.com/region

        Region name that matches the region name in the Cluster object.

      • cluster.sigs.k8s.io/cluster-name

        The name of the cluster to add the machine to.

  5. Configure the mandatory parameters of the Machine object in the spec field. Add the providerSpec field that contains the parameters for deployment on bare metal in the form of a Kubernetes subresource.

  6. In the providerSpec section, add the following mandatory configuration parameters:

    • apiVersion

      API version of the subresource that is baremetal.k8s.io/v1alpha1.

    • kind

      Object type that is BareMetalMachineProviderSpec.

    • bareMetalHostProfile

      Reference to a configuration profile of a bare metal host. It helps to pick a bare metal host with a suitable configuration for the machine. This section includes two parameters:

      • name

        Name of a bare metal host profile.

      • namespace

        Project in which the bare metal host profile is created.

    • l2TemplateSelector

      If specified, contains the name (first priority) or label of the L2 template that will be applied during machine creation. Note that changing this field after the Machine object is created does not affect the host network configuration of the machine.

      Assign one of the templates that you defined in Create L2 templates to the machine. If no suitable template exists, create one as described in Create L2 templates.

    • hostSelector

      This parameter defines matching criteria for picking a bare metal host for the machine by label.

      Any custom label that is assigned to one or more bare metal hosts using the API can be used as a host selector. If the BareMetalHost objects with the specified label are missing, the Machine object will not be deployed until at least one bare metal host with the specified label is available.

      See Deploy a machine to a specific bare metal host for details.

    • l2TemplateIfMappingOverride

      This parameter contains a list of names of the host network interfaces. It allows you to override the default naming and ordering of the network interfaces defined in the L2 template referenced by l2TemplateSelector. This ordering informs the L2 template how to generate the host network configuration.

      See Override network interfaces naming and order for details.

  7. Depending on the role of the machine in the MOS cluster, add labels to the nodeLabels field. If you are NOT deploying MOS with compact control plane, you have to add 3 dedicated Kubernetes manager nodes.

    1. Add 3 Machine objects for Kubernetes manager nodes using the following label:

      metadata:
        labels:
          cluster.sigs.k8s.io/control-plane: "true"
      

      Note

      The value of the label might be any non-empty string. On a worker node, this label must be omitted entirely.

    2. Add 3 Machine objects for MOS controller nodes using the following labels:

      spec:
        nodeLabels:
          openstack-control-plane: enabled
          openstack-gateway: enabled
      
  8. If you are deploying MOS with compact control plane, add Machine objects for 3 combined control plane nodes using the following labels and parameters to the nodeLabels field:

    metadata:
      labels:
        cluster.sigs.k8s.io/control-plane: "true"
    spec:
      nodeLabels:
        openstack-control-plane: enabled
        openstack-gateway: enabled
        openvswitch: enabled
    
  9. Add Machine objects for as many compute nodes as you want to install using the following labels:

    spec:
      nodeLabels:
        openstack-compute-node: enabled
        openvswitch: enabled
    
  10. Save the text file and repeat the process to create the configuration for all machines in your MOS cluster.

  11. Create the machines in the cluster using the following command:

    kubectl create -f mos-cluster-machines.yaml
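
    Optionally, verify that the Machine objects have been created. This is a generic kubectl check; the mos-project namespace below is the example project name used above:

    kubectl -n mos-project get machines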
    

Proceed to Add a Ceph cluster.

Deploy a machine to a specific bare metal host

A machine in a MOS cluster requires a dedicated bare metal host for deployment. The bare metal hosts are represented by the BareMetalHost objects in the Mirantis Container Cloud management API. All BareMetalHost objects must be labeled upon creation with a label that allows you to identify the host and assign it to a machine.

The labels may be unique or applied to a group of hosts based on similarities in their capacity, capabilities, and hardware configuration, on their location, suitable role, or a combination thereof.

In some cases, you may need to deploy a machine to a specific bare metal host. This is especially useful when some of your bare metal hosts have a different hardware configuration than the rest.

To deploy a machine to a specific bare metal host:

  1. Log in to the host where your management cluster kubeconfig is located and where kubectl is installed.

  2. Identify the bare metal host that you want to associate with the specific machine. For example, host host-1.

    kubectl get baremetalhost host-1 -o yaml
    
  3. Add a label that will uniquely identify this host, for example, by the name of the host and machine that you want to deploy on it.

    Caution

    Do not remove any existing labels from the BareMetalHost resource.

    kubectl edit baremetalhost host-1
    

    Configuration example:

    kind: BareMetalHost
    metadata:
      name: host-1
      namespace: myProjectName
      labels:
        kaas.mirantis.com/baremetalhost-id: host-1-worker-HW11-cad5
        ...
    
  4. Open the text file with the YAML definition of the Machine object, created in Create a machine using CLI.

  5. Add a host selector that matches the label you have added to the BareMetalHost object in the previous step.

    Example:

    kind: Machine
    metadata:
      name: worker-HW11-cad5
      namespace: myProjectName
    spec:
      ...
      providerSpec:
        value:
          apiVersion: baremetal.k8s.io/v1alpha1
          kind: BareMetalMachineProviderSpec
          ...
          hostSelector:
            matchLabels:
              kaas.mirantis.com/baremetalhost-id: host-1-worker-HW11-cad5
      ...
    

Once created, this machine will be associated with the specified bare metal host, and you can return to Create a machine using CLI.

Override network interfaces naming and order

An L2 template contains the ifMapping field that allows you to identify Ethernet interfaces for the template. The Machine object API enables the Operator to override the mapping from the L2 template by enforcing a specific ordering of interface names when the template is applied.

The l2TemplateIfMappingOverride field in the spec of the Machine object contains a list of interface names. The order of the interface names in the list is important because the L2Template object will be rendered with the NICs ordered as per this list.

Note

Changes in the l2TemplateIfMappingOverride field apply only once, when the Machine and the corresponding IpamHost objects are created. Further changes to l2TemplateIfMappingOverride do not reset the interface assignment and configuration.

Caution

The l2TemplateIfMappingOverride field must contain the names of all interfaces of the bare metal host.

The following example illustrates how to include the override field in the Machine object. In this example, we configure the interface eno1, which is the second on-board interface of the server, to precede the first on-board interface eno0.

apiVersion: cluster.k8s.io/v1alpha1
kind: Machine
metadata:
  finalizers:
  - foregroundDeletion
  - machine.cluster.sigs.k8s.io
  labels:
    cluster.sigs.k8s.io/cluster-name: kaas-mgmt
    cluster.sigs.k8s.io/control-plane: "true"
    kaas.mirantis.com/provider: baremetal
    kaas.mirantis.com/region: region-one
spec:
  providerSpec:
    value:
      apiVersion: baremetal.k8s.io/v1alpha1
      hostSelector:
        matchLabels:
          baremetal: hw-master-0
      image: {}
      kind: BareMetalMachineProviderSpec
      l2TemplateIfMappingOverride:
      - eno1
      - eno0
      - enp0s1
      - enp0s2

As a result of the configuration above, when used with the example L2 template for bonds and bridges described in Create L2 templates, the enp0s1 and enp0s2 interfaces will be in a predictable order. This order is used to create subinterfaces for the Kubernetes pods network (k8s-pods) and the Kubernetes external network (k8s-ext).

Add a Ceph cluster

After you add machines to your new bare metal managed cluster as described in add-machine-to-managed, create a Ceph cluster on top of this managed cluster using the Mirantis Container Cloud web UI.

For an advanced configuration through the KaaSCephCluster CR, see ceph-advanced-config. To configure Ceph controller through Kubernetes templates to manage Ceph nodes resources, see enable-resources-mgmt.

The procedure below enables you to create a Ceph cluster with a minimum of three Ceph nodes that provides persistent volumes to the Kubernetes workloads in the managed cluster.

To create a Ceph cluster in the managed cluster:

  1. Log in to the Container Cloud web UI with the writer permissions.

  2. Switch to the required project using the Switch Project action icon located on top of the main left-side navigation panel.

  3. In the Clusters tab, click the required cluster name. The Cluster page with the Machines and Ceph clusters lists opens.

  4. In the Ceph Clusters block, click Create Cluster.

  5. Configure the Ceph cluster in the Create New Ceph Cluster wizard that opens:

    Create new Ceph cluster

    Section

    Parameter name

    Description

    General settings

    Name

    The Ceph cluster name.

    Cluster Network

    Replication network for Ceph OSDs. Must contain the CIDR definition and match the corresponding values of the cluster L2Template object or the environment network values.

    Public Network

    Public network for Ceph data. Must contain the CIDR definition and match the corresponding values of the cluster L2Template object or the environment network values.

    Enable OSDs LCM

    Select to enable LCM for Ceph OSDs.

    Machines / Machine #1-3

    Select machine

    Select the name of the Kubernetes machine that will host the corresponding Ceph node in the Ceph cluster.

    Manager, Monitor

    Select the required Ceph services to install on the Ceph node.

    Devices

    Select the disk that Ceph will use.

    Warning

    Do not select the device for system services, for example, sda.

    Enable Object Storage

    Select to enable the single-instance RGW Object Storage.

  6. To add more Ceph nodes to the new Ceph cluster, click + next to any Ceph Machine title in the Machines tab. Configure a Ceph node as required.

    Warning

    Do not add more than 3 Manager and/or Monitor services to the Ceph cluster.

  7. After you add and configure all nodes in your Ceph cluster, click Create.

Once done, verify your Ceph cluster as described in verify-ceph.

Delete a managed cluster

Due to a development limitation in the baremetal operator, deletion of a managed cluster requires preliminary deletion of the worker machines running on the cluster.

Using the Container Cloud web UI, first delete worker machines one by one until you hit the minimum of 2 workers for an operational cluster. After that, you can delete the cluster with the remaining workers and managers.

To delete a baremetal-based managed cluster:

  1. Log in to the Mirantis Container Cloud web UI with the writer permissions.

  2. Switch to the required project using the Switch Project action icon located on top of the main left-side navigation panel.

  3. In the Clusters tab, click the required cluster name to open the list of machines running on it.

  4. Click the More action icon in the last column of the worker machine you want to delete and select Delete. Confirm the deletion.

  5. Repeat the step above until you have 2 workers left.

  6. In the Clusters tab, click the More action icon in the last column of the required cluster and select Delete.

  7. Verify the list of machines to be removed. Confirm the deletion.

  8. Optional. If you do not plan to reuse the credentials of the deleted cluster, delete them:

    1. In the Credentials tab, click the Delete credential action icon next to the name of the credentials to be deleted.

    2. Confirm the deletion.

    Warning

    You can delete credentials only after deleting the managed cluster they relate to.

Deleting a cluster automatically frees up the resources allocated for this cluster, for example, instances, load balancers, networks, floating IPs, and so on.

Deploy a Ceph cluster

This section describes how to deploy a Ceph cluster in a MOS managed cluster.

For Ceph cluster limitations, see Mirantis Container Cloud Reference Architecture: Limitations.

To deploy a Ceph cluster:

  1. Deploy Ceph in the same Kubernetes cluster as described in Mirantis Container Cloud Operations Guide: Add a Ceph cluster.

  2. Open the KaaSCephCluster CR for editing as described in Mirantis Container Cloud Operations Guide: Ceph advanced configuration.

  3. Verify that the following snippet is present in the KaaSCephCluster configuration:

    network:
      clusterNet: 10.10.10.0/24
      publicNet: 10.10.11.0/24
    
  4. Configure the pools for Image, Block Storage, and Compute services.

    Note

    Ceph validates the specified pools. Therefore, do not omit any of the following pools.

    spec:
      pools:
        - default: true
          deviceClass: hdd
          name: kubernetes
          replicated:
            size: 3
          role: kubernetes
        - default: false
          deviceClass: hdd
          name: volumes
          replicated:
            size: 3
          role: volumes
        - default: false
          deviceClass: hdd
          name: vms
          replicated:
            size: 3
          role: vms
        - default: false
          deviceClass: hdd
          name: backup
          replicated:
            size: 3
          role: backup
        - default: false
          deviceClass: hdd
          name: images
          replicated:
            size: 3
          role: images
        - default: false
          deviceClass: hdd
          name: other
          replicated:
            size: 3
          role: other
    

    Each Ceph pool, depending on its role, has a default targetSizeRatio value that defines the expected consumption of the total Ceph cluster capacity. The default ratio values for MOS pools are as follows:

    • 20.0% for a Ceph pool with role volumes

    • 40.0% for a Ceph pool with role vms

    • 10.0% for a Ceph pool with role images

    • 10.0% for a Ceph pool with role backup

  5. Once all pools are created, verify that an appropriate secret required for a successful deployment of the OpenStack services that rely on Ceph is created in the openstack-ceph-shared namespace:

    kubectl -n openstack-ceph-shared get secrets openstack-ceph-keys
    

    Example of a positive system response:

    NAME                  TYPE     DATA   AGE
    openstack-ceph-keys   Opaque   7      36m
    

Deploy OpenStack

This section instructs you on how to deploy OpenStack on top of Kubernetes as well as how to troubleshoot the deployment and access your OpenStack environment after deployment.

Deploy an OpenStack cluster

This section instructs you on how to deploy OpenStack on top of Kubernetes using the OpenStack Controller and openstackdeployments.lcm.mirantis.com (OsDpl) CR.

To deploy an OpenStack cluster:

  1. Verify that you have pre-configured the networking according to Networking.

  2. Verify that the TLS certificates that will be required for the OpenStack cluster deployment have been pre-generated.

    Note

    The Transport Layer Security (TLS) protocol is mandatory on public endpoints.

    Caution

    To avoid certificate renewal during subsequent OpenStack updates, when additional services with new public endpoints may appear, we recommend using wildcard SSL certificates for public endpoints. For example, *.it.just.works, where it.just.works is a cluster public domain.

    The sample code block below illustrates how to generate a self-signed certificate for the it.just.works domain. The procedure presumes the cfssl and cfssljson tools are installed on the machine.

    mkdir cert && cd cert
    
    tee ca-config.json << EOF
    {
      "signing": {
        "default": {
          "expiry": "8760h"
        },
        "profiles": {
          "kubernetes": {
            "usages": [
              "signing",
              "key encipherment",
              "server auth",
              "client auth"
            ],
            "expiry": "8760h"
          }
        }
      }
    }
    EOF
    
    tee ca-csr.json << EOF
    {
      "CN": "kubernetes",
      "key": {
        "algo": "rsa",
        "size": 2048
      },
      "names":[{
        "C": "<country>",
        "ST": "<state>",
        "L": "<city>",
        "O": "<organization>",
        "OU": "<organization unit>"
      }]
    }
    EOF
    
    cfssl gencert -initca ca-csr.json | cfssljson -bare ca
    
    tee server-csr.json << EOF
    {
        "CN": "*.it.just.works",
        "hosts":     [
            "*.it.just.works"
        ],
        "key":     {
            "algo": "rsa",
            "size": 2048
        },
        "names": [    {
            "C": "US",
            "L": "CA",
            "ST": "San Francisco"
        }]
    }
    EOF
    cfssl gencert -ca=ca.pem -ca-key=ca-key.pem --config=ca-config.json -profile=kubernetes server-csr.json | cfssljson -bare server
    
  3. Create the openstackdeployment.yaml file that will include the OpenStack cluster deployment configuration.

    Note

    The resource of kind OpenStackDeployment (OsDpl) is a custom resource defined by a resource of kind CustomResourceDefinition. The resource is validated with the help of the OpenAPI v3 schema.

  4. Configure the OsDpl resource depending on the needs of your deployment. For the configuration details, refer to OpenStackDeployment custom resource.

    Note

    If you plan to deploy the Telemetry service, you have to specify the Telemetry mode through features:telemetry:mode as described in OpenStackDeployment custom resource. Otherwise, Telemetry will fail to deploy.

    Example of an OsDpl CR of minimum configuration:

    apiVersion: lcm.mirantis.com/v1alpha1
    kind: OpenStackDeployment
    metadata:
      name: openstack-cluster
      namespace: openstack
    spec:
      openstack_version: ussuri
      preset: compute
      size: tiny
      internal_domain_name: cluster.local
      public_domain_name: it.just.works
      features:
        ssl:
          public_endpoints:
            api_cert: |-
              The public key certificate of the OpenStack public endpoints followed by
              the certificates of any intermediate certificate authorities which
              establishes a chain of trust up to the root CA certificate.
            api_key: |-
              The private key of the certificate for the OpenStack public endpoints.
              This key must match the public key used in the api_cert.
            ca_cert: |-
              The public key certificate of the root certificate authority.
              If you do not have one, use the top-most intermediate certificate instead.
        neutron:
          tunnel_interface: ens3
          external_networks:
            - physnet: physnet1
              interface: veth-phy
              bridge: br-ex
              network_types:
               - flat
              vlan_ranges: null
              mtu: null
          floating_network:
            enabled: False
        nova:
          live_migration_interface: ens3
          images:
            backend: local
    
  5. If required, enable DPDK, huge pages, and other supported Telco features as described in Advanced OpenStack configuration (optional).

  6. Add the information about the TLS certificates to the openstackdeployment object, as shown in the snippet after this list:

    • ssl:public_endpoints:ca_cert - CA certificate content (ca.pem)

    • ssl:public_endpoints:api_cert - server certificate content (server.pem)

    • ssl:public_endpoints:api_key - server private key (server-key.pem)
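
    For example, with the certificates generated by the cfssl commands above, the files map to the OsDpl fields as follows. Replace the placeholders with the actual PEM contents:

    spec:
      features:
        ssl:
          public_endpoints:
            ca_cert: |-
              <contents of ca.pem>
            api_cert: |-
              <contents of server.pem>
            api_key: |-
              <contents of server-key.pem>
    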

  7. Verify that the Load Balancer network does not overlap with your corporate or internal Kubernetes networks, for example, Calico IP pools. Also, verify that the Load Balancer network pool is big enough to provide IP addresses for all Amphora VMs (load balancers).

    If required, reconfigure the Octavia network settings using the following sample structure:

    spec:
      services:
        load-balancer:
          octavia:
            values:
              octavia:
                settings:
                  lbmgmt_cidr: "10.255.0.0/16"
                  lbmgmt_subnet_start: "10.255.1.0"
                  lbmgmt_subnet_end: "10.255.255.254"
    
  8. Trigger the OpenStack deployment:

    kubectl apply -f openstackdeployment.yaml
    
  9. Monitor the status of your OpenStack deployment:

    kubectl -n openstack get pods
    kubectl -n openstack describe osdpl osh-dev
    
  10. Assess the current status of the OpenStack deployment using the status section output in the OsDpl resource:

    1. Get the OsDpl YAML file:

      kubectl -n openstack get osdpl osh-dev -o yaml
      
    2. Analyze the status output using the detailed description in OpenStackDeployment custom resource.

  11. Verify that the OpenStack cluster has been deployed:

    client_pod_name=$(kubectl -n openstack get pods -l application=keystone,component=client  | grep keystone-client | head -1 | awk '{print $1}')
    kubectl -n openstack exec -it $client_pod_name -- openstack service list
    

    Example of a positive system response:

    +----------------------------------+---------------+----------------+
    | ID                               | Name          | Type           |
    +----------------------------------+---------------+----------------+
    | 159f5c7e59784179b589f933bf9fc6b0 | cinderv3      | volumev3       |
    | 6ad762f04eb64a31a9567c1c3e5a53b4 | keystone      | identity       |
    | 7e265e0f37e34971959ce2dd9eafb5dc | heat          | orchestration  |
    | 8bc263babe9944cdb51e3b5981a0096b | nova          | compute        |
    | 9571a49d1fdd4a9f9e33972751125f3f | placement     | placement      |
    | a3f9b25b7447436b85158946ca1c15e2 | neutron       | network        |
    | af20129d67a14cadbe8d33ebe4b147a8 | heat-cfn      | cloudformation |
    | b00b5ad18c324ac9b1c83d7eb58c76f5 | radosgw-swift | object-store   |
    | b28217da1116498fa70e5b8d1b1457e5 | cinderv2      | volumev2       |
    | e601c0749ce5425c8efb789278656dd4 | glance        | image          |
    +----------------------------------+---------------+----------------+
    

See also

Networking

Advanced OpenStack configuration (optional)

This section includes configuration information for available advanced Mirantis OpenStack for Kubernetes features that include DPDK with the Neutron OVS back end, huge pages, CPU pinning, and other Enhanced Platform Awareness (EPA) capabilities.

Enable LVM ephemeral storage

Available since MOS 21.2 TechPreview

Note

Consider this section as part of Deploy an OpenStack cluster.

This section instructs you on how to configure LVM as a back end for the VM disks and ephemeral storage.

To enable LVM ephemeral storage:

  1. In BareMetalHostProfile in the spec:volumeGroups section, add the following configuration for the OpenStack compute nodes:

    spec:
      devices:
        - device:
            byName: /dev/nvme0n1
            minSizeGiB: 30
            wipe: true
          partitions:
            - name: lvm_nova_vol
              sizeGiB: 0
              wipe: true
      volumeGroups:
        - devices:
          - partition: lvm_nova_vol
          name: nova-vol
      logicalVolumes:
        - name: nova-fake
          vg: nova-vol
          sizeGiB: 0.1
      fileSystems:
        - fileSystem: ext4
          logicalVolume: nova-fake
          mountPoint: /nova-fake
    

    Note

    Due to a limitation, it is not possible to create volume groups without logical volumes and formatted partitions. Therefore, set the logicalVolumes:name, fileSystems:logicalVolume, and fileSystems:mountPoint parameters to nova-fake.

    For details about BareMetalHostProfile, see Mirantis Container Cloud Operations Guide: Create a custom bare metal host profile.

  2. Configure the OpenStackDeployment CR to deploy OpenStack with LVM ephemeral storage. For example:

    spec:
      features:
        nova:
          images:
            backend: lvm
            lvm:
              volume_group: "nova-vol"
    
  3. Optional. Enable encryption for the LVM ephemeral storage by adding the following metadata in the OpenStackDeployment CR:

    spec:
      features:
        nova:
          images:
            encryption:
              enabled: true
              cipher: "aes-xts-plain64"
              key_size: 256
    

    Caution

    Both live and cold migrations are not supported for such instances.

Enable LVM block storage

Available since MOS 21.3 TechPreview

Note

Consider this section as part of Deploy an OpenStack cluster.

This section instructs you on how to configure LVM as a back end for the OpenStack Block Storage service.

To enable LVM block storage:

  1. Open BareMetalHostProfile for editing.

  2. In the spec:volumeGroups section, specify the following data for the OpenStack compute nodes. In the following example, we deploy the Cinder LVM volume back end on the compute nodes. However, you can use dedicated nodes for this purpose.

    spec:
      devices:
        - device:
            byName: /dev/nvme0n1
            minSizeGiB: 30
            wipe: true
          partitions:
            - name: lvm_cinder_vol
              sizeGiB: 0
              wipe: true
      volumeGroups:
        - devices:
          - partition: lvm_cinder_vol
          name: cinder-vol
      logicalVolumes:
        - name: cinder-fake
          vg: cinder-vol
          sizeGiB: 0.1
      fileSystems:
        - fileSystem: ext4
          logicalVolume: cinder-fake
          mountPoint: /cinder-fake
    

    Note

    Due to a limitation, volume groups cannot be created without logical volumes and formatted partitions. Therefore, set the logicalVolumes:name, fileSystems:logicalVolume, and fileSystems:mountPoint parameters to cinder-fake.

    For details about BareMetalHostProfile, see Mirantis Container Cloud Operations Guide: Create a custom bare metal host profile.

  3. Configure the OpenStackDeployment CR to deploy OpenStack with LVM block storage. For example:

    spec:
      nodes:
        openstack-compute-node::enabled:
          features:
            cinder:
              volume:
                backends:
                  lvm:
                    lvm:
                      volume_group: "cinder-vol"
    
Enable DPDK with OVS

Available since MOS Ussuri Update TechPreview

Note

Consider this section as part of Deploy an OpenStack cluster.

This section instructs you on how to enable DPDK with the Neutron OVS back end.

To enable DPDK with OVS:

  1. Verify that your deployment meets the following requirements:

  2. Enable DPDK in the OsDpl custom resource through the node-specific overrides settings. For example:

    spec:
      nodes:
        <NODE-LABEL>::<NODE-LABEL-VALUE>:
          features:
            neutron:
              dpdk:
                bridges:
                - ip_address: 10.12.2.80/24
                  name: br-phy
                driver: igb_uio
                enabled: true
                nics:
                - bridge: br-phy
                  name: nic01
                  pci_id: "0000:05:00.0"
              tunnel_interface: br-phy
    
Enable SR-IOV with OVS

Note

Consider this section as part of Deploy an OpenStack cluster.

This section instructs you on how to enable SR-IOV with the Neutron OVS back end.

To enable SR-IOV with OVS:

  1. Verify that your deployment meets the following requirements:

    • NICs with the SR-IOV support are installed

    • SR-IOV and VT-d are enabled in BIOS

  2. Enable IOMMU in the kernel by configuring intel_iommu=on in the GRUB configuration file. Specify the parameter for compute nodes in BareMetalHostProfile in the grubConfig section:

    spec:
      grubConfig:
          defaultGrubOptions:
            - 'GRUB_CMDLINE_LINUX="$GRUB_CMDLINE_LINUX intel_iommu=on"'
    
  3. Configure the OpenStackDeployment CR to deploy OpenStack with the VLAN tenant network encapsulation.

    Caution

    To ensure correct appliance of the configuration changes, configure VLAN segmentation during the initial OpenStack deployment.

    Configuration example:

    spec:
      features:
        neutron:
          external_networks:
          - bridge: br-ex
            interface: pr-floating
            mtu: null
            network_types:
            - flat
            physnet: physnet1
            vlan_ranges: null
          - bridge: br-tenant
            interface: bond0
            network_types:
              - vlan
            physnet: tenant
            vlan_ranges: 490:499,1420:1459
          tenant_network_types:
            - vlan
    
  4. Enable SR-IOV in the OpenStackDeployment CR through the node-specific overrides settings. For example:

    spec:
      nodes:
        <NODE-LABEL>::<NODE-LABEL-VALUE>:
          features:
            neutron:
              sriov:
                enabled: true
                nics:
                - device: enp10s0f1
                  num_vfs: 7
                  physnet: tenant
    
Enable BGP VPN

TechPreview

Note

Consider this section as part of Deploy an OpenStack cluster.

The BGP VPN service is an extra OpenStack Neutron plugin that enables connection of OpenStack Virtual Private Networks with external VPN sites through either BGP/MPLS IP VPNs or E-VPN.

To enable the BGP VPN service:

Enable BGP VPN in the OsDpl custom resource through the node-specific overrides settings. For example:

spec:
  features:
    neutron:
      bgpvpn:
        enabled: true
        route_reflector:
          # Enable deploying the FRR route reflector
          enabled: true
          # Local AS number
          as_number: 64512
          # List of subnets that are allowed to connect
          # to the route reflector BGP
          neighbor_subnets:
            - 10.0.0.0/8
            - 172.16.0.0/16
  nodes:
    openstack-compute-node::enabled:
      features:
        neutron:
          bgpvpn:
            enabled: true

When the service is enabled, a route reflector is scheduled on nodes with the openstack-frrouting: enabled label. Mirantis recommends collocating the route reflector nodes with the OpenStack controller nodes. By default, two replicas are deployed.

Encrypt the east-west traffic

Available since MOS 21.3 TechPreview

Note

Consider this section as part of Deploy an OpenStack cluster.

Mirantis OpenStack for Kubernetes allows you to configure Internet Protocol Security (IPsec) encryption for the east-west tenant traffic between the OpenStack compute nodes and gateways. The feature uses the strongSwan open source IPsec solution. Authentication is accomplished through a pre-shared key (PSK). However, other authentication methods are upcoming.

To encrypt the east-west tenant traffic, enable ipsec in the spec:features:neutron settings of the OpenStackDeployment CR:

spec:
  features:
    neutron:
      ipsec:
        enabled: true
Enable Cinder back end for Glance

Available since MOS 21.4 TechPreview

Note

Consider this section as part of Deploy an OpenStack cluster.

This section instructs you on how to configure a Cinder back end for Glance images through the OpenStackDeployment CR.

Note

This feature depends heavily on Cinder multi-attach, which enables you to simultaneously attach volumes to multiple instances. Therefore, only the block storage back ends that support multi-attach can be used.

To configure a Cinder back end for Glance, define the back end identity in the OpenStackDeployment CR. This identity will be used as a name for the back end section in the Glance configuration file.

When defining the back end:

  • Configure one of the back ends as default.

  • Configure each back end to use a specific Cinder volume type.

    Note

    You can use the volume_type parameter instead of backend_name. If so, you have to create this volume type beforehand and take into account that the bootstrap script does not manage such volume types.

The following examples define the blockstore identity, first using backend_name and then, alternatively, using volume_type:

spec:
  features:
    glance:
      backends:
        cinder:
          blockstore:
            default: true
            backend_name: <volume_type:volume_name>
            # e.g. backend_name: lvm:lvm_store

spec:
  features:
    glance:
      backends:
        cinder:
          blockstore:
            default: true
            volume_type: netapp
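
If you use the volume_type option, create the volume type beforehand with the OpenStack CLI. The following command is a standard client call; the netapp name matches the example above:

openstack volume type create netapp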
Enable Cinder volume encryption

Available since MOS 21.5 TechPreview

Note

Consider this section as part of Deploy an OpenStack cluster.

This section instructs you on how to enable Cinder volume encryption through the OpenStackDeployment CR using Linux Unified Key Setup (LUKS) and store the encryption keys in Barbican. For details, see Volume encryption.

To enable Cinder volume encryption:

  1. In the OpenStackDeployment CR, specify the LUKS volume type and configure the required encryption parameters for the storage system to encrypt or decrypt the volume.

    The volume_types definition example:

    spec:
      services:
        block-storage:
          cinder:
            values:
              bootstrap:
                volume_types:
                  volumes-hdd-luks:
                    arguments:
                      encryption-cipher: aes-xts-plain64
                      encryption-control-location: front-end
                      encryption-key-size: 256
                      encryption-provider: luks
                    volume_backend_name: volumes-hdd
    
  2. To create an encrypted volume as a non-admin user and store keys in the Barbican storage, assign the creator role to the user since the default Barbican policy allows only the admin or creator role:

    openstack role add --project <PROJECT-ID> --user <USER-ID> creator
    
  3. Optional. To define an encrypted volume as a default one, specify volumes-hdd-luks in default_volume_type in the Cinder configuration:

    spec:
      services:
        block-storage:
          cinder:
            values:
              conf:
                cinder:
                  DEFAULT:
                    default_volume_type: volumes-hdd-luks
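
To verify the configuration after the deployment completes, you can create a test volume of the encrypted type. This is a standard OpenStack CLI call; the volume name below is arbitrary:

openstack volume create --size 1 --type volumes-hdd-luks test-encrypted-volume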
    
Advanced configuration for OpenStack compute nodes

Available since MOS Ussuri Update

Note

Consider this section as part of Deploy an OpenStack cluster.

This section describes how to perform advanced configuration of the OpenStack compute nodes. Such configuration can be required in some specific use cases, such as the usage of DPDK, SR-IOV, or huge pages.

Enable huge pages for OpenStack

Note

Consider this section as part of Deploy an OpenStack cluster.

Note

The instruction provided in this section applies to both OpenStack with OVS and OpenStack with Tungsten Fabric topologies.

The huge pages OpenStack feature provides essential performance improvements for applications that are highly memory IO-bound. Huge pages should be enabled on a per compute node basis. By default, NUMATopologyFilter is enabled in a MOS deployment.

To activate the feature, you need to enable huge pages on the dedicated bare metal host as described in enable-hugepages-bm during the predeployment bare metal configuration.

Note

The multi-size huge pages are not fully supported by Kubernetes versions before 1.19. Therefore, define only one size in kernel parameters.
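
As an illustration, huge pages can be set through the grubConfig section of BareMetalHostProfile, similarly to the IOMMU example in Enable SR-IOV with OVS. The page size and count below are placeholder values that you must adjust to your hardware; only one page size is defined, as recommended in the note above:

spec:
  grubConfig:
    defaultGrubOptions:
      # Placeholder values: 32 huge pages of 1G each
      - 'GRUB_CMDLINE_LINUX="$GRUB_CMDLINE_LINUX default_hugepagesz=1G hugepagesz=1G hugepages=32"'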

Configure CPU isolation for an instance

Note

Consider this section as part of Deploy an OpenStack cluster.

CPU isolation allows for better performance of some HPC applications, such as Open vSwitch with DPDK. To configure CPU isolation, add the isolcpus parameter to the GRUB configuration. For example:

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash isolcpus=8-19"

Use the instruction from Mirantis Container Cloud Operations Guide: Enable huge pages in a host profile as an example procedure for GRUB parameters configuration.
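
For illustration, the same isolcpus setting can be passed through the grubConfig section of BareMetalHostProfile; the CPU range below is the placeholder value from the example above:

spec:
  grubConfig:
    defaultGrubOptions:
      - 'GRUB_CMDLINE_LINUX_DEFAULT="quiet splash isolcpus=8-19"'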

Configure custom CPU topologies

Note

Consider this section as part of Deploy an OpenStack cluster.

The majority of CPU topology features are activated by NUMATopologyFilter, which is enabled by default. Such features do not require any further service configuration and can be used directly on a vanilla MOS deployment. The CPU topology features include, for example:

  • NUMA placement policies

  • CPU pinning policies

  • CPU thread pinning policies

  • CPU topologies

To enable libvirt CPU pinning through the node-specific overrides in the OpenStackDeployment custom resource, use the following sample configuration structure:

spec:
  nodes:
    <NODE-LABEL>::<NODE-LABEL-VALUE>:
      services:
        compute:
          nova_compute:
            values:
              conf:
                nova:
                  compute:
                    cpu_dedicated_set: 2-17
                    cpu_shared_set: 18-47
Configure PCI passthrough for guests

Note

Consider this section as part of Deploy an OpenStack cluster.

The Peripheral Component Interconnect (PCI) passthrough feature in OpenStack allows full access and direct control over physical PCI devices in guests. The mechanism applies to any kind of PCI devices including a Network Interface Card (NIC), Graphics Processing Unit (GPU), and any other device that can be attached to a PCI bus. The only requirement for the guest to properly use the device is to correctly install the driver.

To enable PCI passthrough in a MOS deployment:

  1. For Linux X86 compute nodes, verify that the following features are enabled on the host:

  2. Configure the nova-api service that is scheduled on the OpenStack controller nodes. To generate the alias for PCI in nova.conf, add the alias configuration through the OpenStackDeployment CR.

    Note

    When configuring PCI with SR-IOV on the same host, the values specified in alias take precedence. Therefore, add the SR-IOV devices to passthrough_whitelist explicitly.

    For example:

    spec:
      services:
        compute:
          nova:
            values:
              conf:
                nova:
                  pci:
                    alias: '{ "vendor_id":"8086", "product_id":"154d", "device_type":"type-PF", "name":"a1" }'
    
  3. Configure the nova-compute service that is scheduled on OpenStack compute nodes. To enable Nova to pass PCI devices to virtual machines, configure the passthrough_whitelist section in nova.conf through the node-specific overrides in the OpenStackDeployment CR. For example:

    spec:
      nodes:
        <NODE-LABEL>::<NODE-LABEL-VALUE>:
          services:
            compute:
              nova_compute:
                values:
                  conf:
                    nova:
                      pci:
                        alias: '{ "vendor_id":"8086", "product_id":"154d", "device_type":"type-PF", "name":"a1" }'
                        passthrough_whitelist: |
                          [{"devname":"enp216s0f0","physical_network":"sriovnet0"}, { "vendor_id": "8086", "product_id": "154d" }]
    
Limit HW resources for hyperconverged OpenStack compute nodes

Available since MOS 21.3

Note

Consider this section as part of Deploy an OpenStack cluster.

The hyperconverged architecture combines OpenStack compute nodes with Ceph nodes. To avoid node overloading, which can cause Ceph performance degradation and outages, limit the hardware resource consumption by the OpenStack compute services.

You can reserve hardware resources for non-workload related consumption using the following nova-compute parameters. For details, see OpenStack documentation: Overcommitting CPU and RAM and OpenStack documentation: Configuration Options.

  • cpu_allocation_ratio - in case of a hyperconverged architecture, the value depends on the number of vCPUs used for non-workload-related operations, the total number of vCPUs of a hyperconverged node, and the workload vCPU consumption:

    cpu_allocation_ratio = (${vCPU_count_on_a_hyperconverged_node} -
    ${vCPU_used_for_non_OpenStack_related_tasks}) /
    ${vCPU_count_on_a_hyperconverged_node} / ${workload_vCPU_utilization}
    

    To define the vCPU count used for non-OpenStack related tasks, use the following formula, considering the storage data plane performance tests:

    vCPU_used_for_non-OpenStack_related_tasks = 2 * SSDs_per_hyperconverged_node +
    1 * Ceph_OSDs_per_hyperconverged_node + 0.8 * Ceph_OSDs_per_hyperconverged_node
    

    Consider the following example with 5 SSD disks for Ceph OSDs per hyperconverged node and 2 Ceph OSDs per disk:

    vCPU_used_for_non-OpenStack_related_tasks = 2 * 5 + 1 * 10 + 0.8 * 10 = 28
    

    In this case, if there are 40 vCPUs per hyperconverged node, 28 vCPUs are required for non-workload related calculations, and a workload consumes 50% of the allocated CPU time: cpu_allocation_ratio = (40-28) / 40 / 0.5 = 0.6.


  • reserved_host_memory_mb - a dedicated parameter in the OpenStack Nova configuration that reserves host memory for activities not related to the OpenStack workloads:

    reserved_host_memory_mb = 13 GB * Ceph_OSDs_per_hyperconverged_node
    

    For example, for 10 Ceph OSDs per hyperconverged node: reserved_host_memory_mb = 13 GB * 10 = 130 GB = 133120 MB.


  • ram_allocation_ratio - the allocation ratio of virtual RAM to physical RAM. To completely exclude the possibility of memory overcommitting, set to 1.

To limit HW resources for hyperconverged OpenStack compute nodes:

In the OpenStackDeployment CR, specify the cpu_allocation_ratio, ram_allocation_ratio, and reserved_host_memory_mb parameters as required using the calculations described above.

For example:

apiVersion: lcm.mirantis.com/v1alpha1
kind: OpenStackDeployment
spec:
  services:
    compute:
      nova:
        values:
          conf:
            nova:
              DEFAULT:
                cpu_allocation_ratio: 0.6
                ram_allocation_ratio: 1
                reserved_host_memory_mb: 133120

Note

For an existing OpenStack deployment:

  1. Obtain the name of your OpenStackDeployment CR:

    kubectl -n openstack get osdpl
    
  2. Open the OpenStackDeployment CR for editing and specify the parameters as required.

    kubectl -n openstack edit osdpl <osdpl name>
    
Enable image signature verification

Available since MOS 21.6 TechPreview

Note

Consider this section as part of Deploy an OpenStack cluster.

Mirantis OpenStack for Kubernetes (MOS) enables you to perform image signature verification when booting an OpenStack instance, uploading a Glance image with signature metadata fields set, and creating a volume from an image.

To enable signature verification, use the following osdpl definition:

spec:
  features:
    glance:
      signature:
        enabled: true

When enabled during the initial deployment, all internal images, such as Amphora, Ironic, and test (CirrOS, Fedora, Ubuntu) images, are signed by a self-signed certificate.

Access OpenStack after deployment

This section contains the guidelines on how to access your MOS OpenStack environment.

Configure DNS to access OpenStack

The OpenStack services are exposed through the Ingress NGINX controller.

To configure DNS to access your OpenStack environment:

  1. Obtain the external IP address of the Ingress service:

    kubectl -n openstack get services ingress
    

    Example of system response:

    NAME      TYPE           CLUSTER-IP    EXTERNAL-IP    PORT(S)                                      AGE
    ingress   LoadBalancer   10.96.32.97   10.172.1.101   80:34234/TCP,443:34927/TCP,10246:33658/TCP   4h56m
    
  2. Select from the following options:

    • If you have a corporate DNS server, update your corporate DNS service and create appropriate DNS records for all OpenStack public endpoints.

      To obtain the full list of public endpoints:

      kubectl -n openstack get ingress -ocustom-columns=NAME:.metadata.name,HOSTS:spec.rules[*].host | awk '/namespace-fqdn/ {print $2}'
      

      Example of system response:

      barbican.it.just.works
      cinder.it.just.works
      cloudformation.it.just.works
      designate.it.just.works
      glance.it.just.works
      heat.it.just.works
      horizon.it.just.works
      keystone.it.just.works
      neutron.it.just.works
      nova.it.just.works
      novncproxy.it.just.works
      octavia.it.just.works
      placement.it.just.works
      
    • If you do not have a corporate DNS server, perform one of the following steps:

      • Add the appropriate records to /etc/hosts locally (a scripted alternative is provided after this procedure). For example:

        10.172.1.101 barbican.it.just.works
        10.172.1.101 cinder.it.just.works
        10.172.1.101 cloudformation.it.just.works
        10.172.1.101 designate.it.just.works
        10.172.1.101 glance.it.just.works
        10.172.1.101 heat.it.just.works
        10.172.1.101 horizon.it.just.works
        10.172.1.101 keystone.it.just.works
        10.172.1.101 neutron.it.just.works
        10.172.1.101 nova.it.just.works
        10.172.1.101 novncproxy.it.just.works
        10.172.1.101 octavia.it.just.works
        10.172.1.101 placement.it.just.works
        
      • Deploy your DNS server on top of Kubernetes:

        1. Deploy a standalone CoreDNS server by including the following configuration into coredns.yaml:

          apiVersion: lcm.mirantis.com/v1alpha1
          kind: HelmBundle
          metadata:
            name: coredns
            namespace: osh-system
          spec:
            repositories:
            - name: hub_stable
              url: https://charts.helm.sh/stable
            releases:
            - name: coredns
              chart: hub_stable/coredns
              version: 1.8.1
              namespace: coredns
              values:
                image:
                  repository: mirantis.azurecr.io/openstack/extra/coredns
                  tag: "1.6.9"
                isClusterService: false
                servers:
                - zones:
                  - zone: .
                    scheme: dns://
                    use_tcp: false
                  port: 53
                  plugins:
                  - name: cache
                    parameters: 30
                  - name: errors
                  # Serves a /health endpoint on :8080, required for livenessProbe
                  - name: health
                  # Serves a /ready endpoint on :8181, required for readinessProbe
                  - name: ready
                  # Required to query kubernetes API for data
                  - name: kubernetes
                    parameters: cluster.local
                  - name: loadbalance
                    parameters: round_robin
                  # Serves a /metrics endpoint on :9153, required for serviceMonitor
                  - name: prometheus
                    parameters: 0.0.0.0:9153
                  - name: forward
                    parameters: . /etc/resolv.conf
                  - name: file
                    parameters: /etc/coredns/it.just.works.db it.just.works
                serviceType: LoadBalancer
                zoneFiles:
                - filename: it.just.works.db
                  domain: it.just.works
                  contents: |
                    it.just.works.            IN      SOA     sns.dns.icann.org. noc.dns.icann.org. 2015082541 7200 3600 1209600 3600
                    it.just.works.            IN      NS      b.iana-servers.net.
                    it.just.works.            IN      NS      a.iana-servers.net.
                    it.just.works.            IN      A       1.2.3.4
                    *.it.just.works.           IN      A      1.2.3.4
          
        2. Update the public IP address of the Ingress service:

          sed -i 's/1.2.3.4/10.172.1.101/' coredns.yaml
          kubectl apply -f coredns.yaml
          
        3. Verify that the DNS resolution works properly:

          1. Assign an external IP to the service:

            kubectl -n coredns patch service coredns-coredns --type='json' -p='[{"op": "replace", "path": "/spec/ports", "value": [{"name": "udp-53", "port": 53, "protocol": "UDP", "targetPort": 53}]}]'
            kubectl -n coredns patch service coredns-coredns --type='json' -p='[{"op": "replace", "path": "/spec/type", "value":"LoadBalancer"}]'
            
          2. Obtain the external IP address of CoreDNS:

            kubectl -n coredns get service coredns-coredns
            

            Example of system response:

            NAME              TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)         AGE
            coredns-coredns   ClusterIP   10.96.178.21   10.172.1.102      53/UDP,53/TCP   25h
            
        4. Point your machine to use the correct DNS. It is 10.172.1.102 in the example system response above.

        5. If you plan to launch Tempest tests or use the OpenStack client from a keystone-client-XXX pod, verify that the Kubernetes built-in DNS service is configured to resolve your public FQDN records by adding your public domain to the Corefile of the coredns ConfigMap in the kube-system namespace. For example, to add the it.just.works domain:

          kubectl -n kube-system edit configmap coredns
          

          Example of the modified ConfigMap:

          apiVersion: v1
          data:
            Corefile: |
              .:53 {
                  errors
                  health
                  ready
                  kubernetes cluster.local in-addr.arpa ip6.arpa {
                    pods insecure
                    fallthrough in-addr.arpa ip6.arpa
                  }
                  prometheus :9153
                  forward . /etc/resolv.conf
                  cache 30
                  loop
                  reload
                  loadbalance
              }
              it.just.works:53 {
                  errors
                  cache 30
                  forward . 10.96.178.21
              }
          
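As an alternative to typing the /etc/hosts records above manually, you can generate them from the cluster. A minimal sketch that combines the kubectl commands used in this procedure (the it.just.works domain is the example used throughout this guide):

# Print host records for all public OpenStack FQDNs exposed through the Ingress service
INGRESS_IP=$(kubectl -n openstack get services ingress -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
kubectl -n openstack get ingress -o custom-columns=HOSTS:.spec.rules[*].host --no-headers \
  | tr ',' '\n' | grep it.just.works \
  | while read -r fqdn; do echo "${INGRESS_IP} ${fqdn}"; done
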
Access your OpenStack environment

This section explains how to access your OpenStack environment as the Admin user.

Before you proceed, verify that you can access the Kubernetes API and have privileges to read secrets from the openstack namespace in Kubernetes or you are able to exec to the pods in this namespace.

Access OpenStack using the Kubernetes built-in admin CLI

You can use the built-in admin CLI client and execute the openstack CLI commands from a dedicated pod deployed in the openstack namespace:

kubectl -n openstack exec \
  $(kubectl -n openstack get pod -l application=keystone,component=client -ojsonpath='{.items[*].metadata.name}') \
  -ti -- bash

This pod has python-openstackclient and all required plugins already installed. Also, this pod has cloud admin credentials stored as appropriate shell environment variables for the openstack CLI command to consume.
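
Alternatively, you can run a single openstack command non-interactively in the same pod. A minimal sketch, using the same pod selector as above:

# Run a one-off openstack CLI command without opening an interactive shell
kubectl -n openstack exec \
  $(kubectl -n openstack get pod -l application=keystone,component=client -ojsonpath='{.items[*].metadata.name}') \
  -- openstack service list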

Access an OpenStack environment through Horizon
  1. Configure the external DNS resolution for OpenStack services as described in Configure DNS to access OpenStack.

  2. Obtain the password of the Admin user:

    kubectl -n openstack get secret keystone-keystone-admin -ojsonpath='{.data.OS_PASSWORD}' | base64 -d
    
  3. Access Horizon through your browser using its public service. For example, https://horizon.it.just.works.

    To log in, specify the admin user name and default domain. If the OpenStack Identity service has been deployed with the OpenID Connect integration:

    1. From the Authenticate using drop-down menu, select OpenID Connect.

    2. Click Connect. You will be redirected to your identity provider to proceed with the authentication.

    Note

    If OpenStack has been deployed with self-signed TLS certificates for public endpoints, you may get a warning about an untrusted certificate. To proceed, allow the connection.

Access OpenStack through CLI from your local machine

To be able to access your OpenStack environment using CLI, you need to set the required environment variables that are stored in an OpenStack RC environment file. You can either download a project-specific file from Horizon, which is the easiest way, or create an environment file.

To access OpenStack through CLI, select from the following options:

  • Download and source the OpenStack RC file:

    1. Log in to Horizon as described in Access an OpenStack environment through Horizon.

    2. Download the openstackrc or clouds.yaml file from the Web interface.

    3. On any shell from which you want to run OpenStack commands, source the environment file for the respective project.

  • Create and source the OpenStack RC file:

    1. Configure the external DNS resolution for OpenStack services as described in Configure DNS to access OpenStack.

    2. Create a stub of the OpenStack RC file:

      cat << EOF > openstackrc
      export OS_PASSWORD=$(kubectl -n openstack get secret keystone-keystone-admin -ojsonpath='{.data.OS_PASSWORD}' | base64 -d)
      export OS_USERNAME=admin
      export OS_USER_DOMAIN_NAME=Default
      export OS_PROJECT_NAME=admin
      export OS_PROJECT_DOMAIN_NAME=Default
      export OS_REGION_NAME=RegionOne
      export OS_INTERFACE=public
      export OS_IDENTITY_API_VERSION="3"
      EOF
      
    3. Add the Keystone public endpoint to this file as the OS_AUTH_URL variable. For example, for the domain name used throughout this guide:

      echo export OS_AUTH_URL=https://keystone.it.just.works >> openstackrc
      
    4. Source the obtained data into the shell:

      source openstackrc
      

      Now, you can use the openstack CLI as usual. For example:

      openstack user list
      +----------------------------------+-----------------+
      | ID                               | Name            |
      +----------------------------------+-----------------+
      | dc23d2d5ee3a4b8fae322e1299f7b3e6 | internal_cinder |
      | 8d11133d6ef54349bd014681e2b56c7b | admin           |
      +----------------------------------+-----------------+
      

      Note

      If OpenStack was deployed with self-signed TLS certificates for public endpoints, you may need to use the openstack CLI client with certificate validation disabled. For example:

      openstack --insecure user list
      
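If you prefer the clouds.yaml format mentioned above over the RC file, the same credentials can be expressed as a clouds.yaml entry. A minimal sketch, assuming the example domain used throughout this guide and an arbitrary cloud entry name mos:

# ~/.config/openstack/clouds.yaml
clouds:
  mos:
    auth:
      auth_url: https://keystone.it.just.works
      username: admin
      password: <ADMIN-PASSWORD>
      project_name: admin
      user_domain_name: Default
      project_domain_name: Default
    region_name: RegionOne
    interface: public
    identity_api_version: 3

With this file in place, select the cloud through the --os-cloud option or the OS_CLOUD environment variable, for example, openstack --os-cloud mos user list.
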
Troubleshoot an OpenStack deployment

This section provides the general debugging instructions for your OpenStack on Kubernetes deployment. Start troubleshooting by determining the failing component, which can be the OpenStack Controller, Helm, or a particular pod or service.

Debugging the Helm releases

Note

Starting from MOS 21.4, MOS uses direct communication with Helm 3.

Verify the Helm release statuses
  1. Log in to the openstack-controller pod, where the Helm v3 client is installed, or download the Helm v3 binary locally:

    kubectl -n osh-system get pods | grep openstack-controller
    

    Example of a system response:

    openstack-controller-5c6947c996-vlrmv            5/5     Running     0          10m
    openstack-controller-admission-f946dc8d6-6bgn2   1/1     Running     0          4h3m
    
  2. Verify the Helm release statuses:

    helm3 --namespace openstack list --all
    

    Example of a system response:

    NAME                            NAMESPACE       REVISION        UPDATED                                 STATUS          CHART                           APP VERSION
    etcd                            openstack       4               2021-07-09 11:06:25.377538008 +0000 UTC deployed        etcd-0.1.0-mcp-2735
    ingress-openstack               openstack       4               2021-07-09 11:06:24.892822083 +0000 UTC deployed        ingress-0.1.0-mcp-2735
    openstack-barbican              openstack       4               2021-07-09 11:06:25.733684392 +0000 UTC deployed        barbican-0.1.0-mcp-3890
    openstack-ceph-rgw              openstack       4               2021-07-09 11:06:25.045759981 +0000 UTC deployed        ceph-rgw-0.1.0-mcp-2735
    openstack-cinder                openstack       4               2021-07-09 11:06:42.702963544 +0000 UTC deployed        cinder-0.1.0-mcp-3890
    openstack-designate             openstack       4               2021-07-09 11:06:24.400555027 +0000 UTC deployed        designate-0.1.0-mcp-3890
    openstack-glance                openstack       4               2021-07-09 11:06:25.5916904 +0000 UTC deployed        glance-0.1.0-mcp-3890
    openstack-heat                  openstack       4               2021-07-09 11:06:25.3998706 +0000 UTC deployed        heat-0.1.0-mcp-3890
    openstack-horizon               openstack       4               2021-07-09 11:06:23.27538297 +0000 UTC deployed        horizon-0.1.0-mcp-3890
    openstack-iscsi                 openstack       4               2021-07-09 11:06:37.891858343 +0000 UTC deployed        iscsi-0.1.0-mcp-2735            v1.0.0
    openstack-keystone              openstack       4               2021-07-09 11:06:24.878052272 +0000 UTC deployed        keystone-0.1.0-mcp-3890
    openstack-libvirt               openstack       4               2021-07-09 11:06:38.185312907 +0000 UTC deployed        libvirt-0.1.0-mcp-2735
    openstack-mariadb               openstack       4               2021-07-09 11:06:24.912817378 +0000 UTC deployed        mariadb-0.1.0-mcp-2735
    openstack-memcached             openstack       4               2021-07-09 11:06:24.852840635 +0000 UTC deployed        memcached-0.1.0-mcp-2735
    openstack-neutron               openstack       4               2021-07-09 11:06:58.96398517 +0000 UTC deployed        neutron-0.1.0-mcp-3890
    openstack-neutron-rabbitmq      openstack       4               2021-07-09 11:06:51.454918432 +0000 UTC deployed        rabbitmq-0.1.0-mcp-2735
    openstack-nova                  openstack       4               2021-07-09 11:06:44.277976646 +0000 UTC deployed        nova-0.1.0-mcp-3890
    openstack-octavia               openstack       4               2021-07-09 11:06:24.775069513 +0000 UTC deployed        octavia-0.1.0-mcp-3890
    openstack-openvswitch           openstack       4               2021-07-09 11:06:55.271711021 +0000 UTC deployed        openvswitch-0.1.0-mcp-2735
    openstack-placement             openstack       4               2021-07-09 11:06:21.954550107 +0000 UTC deployed        placement-0.1.0-mcp-3890
    openstack-rabbitmq              openstack       4               2021-07-09 11:06:25.431404853 +0000 UTC deployed        rabbitmq-0.1.0-mcp-2735
    openstack-tempest               openstack       2               2021-07-09 11:06:21.330801212 +0000 UTC deployed        tempest-0.1.0-mcp-3890
    

    If a Helm release is not in the deployed state, obtain the details from the output of the following command:

    helm3 --namespace openstack history <release-name>
    
Verify the status of a Helm release

To verify the status of a Helm release:

helm3 --namespace openstack status <release-name>

Example of a system response:

NAME: openstack-memcached
LAST DEPLOYED: Fri Jul  9 11:06:24 2021
NAMESPACE: openstack
STATUS: deployed
REVISION: 4
TEST SUITE: None
Debugging the OpenStack Controller

The OpenStack Controller is running in several containers in the openstack-controller-xxxx pod in the osh-system namespace. For the full list of containers and their roles, refer to OpenStack Controller.

To verify the status of the OpenStack Controller, run:

kubectl -n osh-system get pods

Example of a system response:

NAME                                  READY   STATUS    RESTARTS   AGE
openstack-controller-5c6947c996-vlrmv            5/5     Running     0          17m
openstack-controller-admission-f946dc8d6-6bgn2   1/1     Running     0          4h9m
openstack-operator-ensure-resources-5ls8k        0/1     Completed   0          4h12m

To verify the logs for the osdpl container, run:

kubectl -n osh-system logs -f <openstack-controller-xxxx> -c osdpl
Debugging the OsDpl CR

This section includes the ways to mitigate the most common issues with the OsDpl CR. We assume that you have already debugged the Helm releases and the OpenStack Controller to rule out possible failures with these components as described in Debugging the Helm releases and Debugging the OpenStack Controller.

The osdpl has DEPLOYED=false

Possible root cause: One or more Helm releases have not been deployed successfully.

To determine if you are affected:

Verify the status of the osdpl object:

kubectl -n openstack get osdpl osh-dev

Example of a system response:

NAME      AGE   DEPLOYED   DRAFT
osh-dev   22h   false      false

To debug the issue:

  1. Identify the failed release by assessing the status:children section in the OsDpl resource:

    1. Get the OsDpl YAML file:

      kubectl -n openstack get osdpl osh-dev -o yaml
      
    2. Analyze the status output using the detailed description in Status OsDpl elements.

  2. For further debugging, refer to Debugging the Helm releases.

Some pods are stuck in Init

Possible root cause: MOS uses the Kubernetes entrypoint init container to resolve dependencies between objects. If the pod is stuck in Init:0/X, this pod may be waiting for its dependencies.

To debug the issue:

Verify the missing dependencies:

kubectl -n openstack logs -f placement-api-84669d79b5-49drw -c init

Example of a system response:

Entrypoint WARNING: 2020/04/21 11:52:50 entrypoint.go:72: Resolving dependency Job placement-ks-user in namespace openstack failed: Job Job placement-ks-user in namespace openstack is not completed yet .
Entrypoint WARNING: 2020/04/21 11:52:52 entrypoint.go:72: Resolving dependency Job placement-ks-endpoints in namespace openstack failed: Job Job placement-ks-endpoints in namespace openstack is not completed yet .
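
In the example above, the pod waits for the placement-ks-user and placement-ks-endpoints jobs. A possible follow-up, assuming the job names from this example output, is to inspect the dependency jobs directly:

# Check the state of the dependency jobs reported by the entrypoint container
kubectl -n openstack get jobs placement-ks-user placement-ks-endpoints
# Inspect the logs of a job that has not completed yet
kubectl -n openstack logs job/placement-ks-user
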
Some Helm releases are not present

Possible root cause: some OpenStack services depend on Ceph. These services include OpenStack Image, OpenStack Compute, and OpenStack Block Storage. If the Helm releases for these services are not present, the openstack-ceph-keys secret may be missing in the openstack-ceph-shared namespace.

To debug the issue:

Verify that the Ceph Controller has created the openstack-ceph-keys secret in the openstack-ceph-shared namespace:

kubectl -n openstack-ceph-shared get secrets openstack-ceph-keys

Example of a positive system response:

NAME                  TYPE     DATA   AGE
openstack-ceph-keys   Opaque   7      23h

If the secret is not present, create one manually.

Deploy Tungsten Fabric

This section describes how to deploy Tungsten Fabric as a back end for networking for your MOS environment.

Caution

Before you proceed with the Tungsten Fabric deployment, read through Tungsten Fabric known limitations.

Tungsten Fabric deployment prerequisites

Before you proceed with the actual Tungsten Fabric (TF) deployment, verify that your deployment meets the following prerequisites:

  1. Your MOS OpenStack cluster is deployed as described in Deploy an OpenStack cluster with the Tungsten Fabric back end enabled for Neutron using the following structure:

    spec:
      features:
        neutron:
          backend: tungstenfabric
    
  2. Your MOS OpenStack cluster uses the correct value of features:neutron:tunnel_interface in the openstackdeployment object. The TF Operator will consume this value through the shared secret and use it as a network interface from the underlay network to create encapsulated tunnels with the tenant networks.

    Warning

    TF uses features:neutron:tunnel_interface to create the vhost0 virtual interface and transfers the IP configuration from the tunnel_interface to the virtual one. Therefore, plan this interface as a dedicated physical interface for TF overlay networks.

  3. The Kubernetes nodes are labeled according to the TF node roles:

    Tungsten Fabric (TF) node roles

    Node role

    Description

    Kubernetes labels

    Minimal count

    TF control plane

    Hosts the TF control plane services such as database, messaging, api, svc, config.

    tfconfig=enabled
    tfcontrol=enabled
    tfwebui=enabled
    tfconfigdb=enabled

    3

    TF analytics

    Hosts the TF analytics services.

    tfanalytics=enabled
    tfanalyticsdb=enabled

    3

    TF vRouter

    Hosts the TF vRouter module and vRouter agent.

    tfvrouter=enabled

    Varies

    TF vRouter DPDK Technical Preview

    Hosts the TF vRouter agent in DPDK mode.

    tfvrouter-dpdk=enabled

    Varies

    Note

    In a MOS Kubernetes deployment, TF supports only OpenStack workloads. Therefore, label the OpenStack compute nodes with the tfvrouter=enabled label (see the labeling example after this list).

    Note

    Do not specify the openstack-gateway=enabled and openvswitch=enabled labels for the MOS deployments with TF as a networking back end for OpenStack.
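
The labels from the table above are regular Kubernetes node labels and can be applied with kubectl. A minimal sketch, assuming hypothetical node names ctl-01 and cmp-01 and a layout where the same nodes host the TF control plane and analytics services:

# Label a node for the TF control plane and analytics services
kubectl label node ctl-01 tfconfig=enabled tfcontrol=enabled tfwebui=enabled tfconfigdb=enabled tfanalytics=enabled tfanalyticsdb=enabled
# Label an OpenStack compute node that hosts the vRouter
kubectl label node cmp-01 tfvrouter=enabled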

Deploy Tungsten Fabric

Deployment of Tungsten Fabric (TF) is managed by the tungstenfabric-operator Helm resource in a respective MOS ClusterRelease.

To deploy TF:

  1. Verify that you have completed all prerequisite steps as described in Tungsten Fabric deployment prerequisites.

  2. Create a tungstenfabric.yaml file with the TF resource configuration. For example:

    apiVersion: operator.tf.mirantis.com/v1alpha1
    kind: TFOperator
    metadata:
      name: openstack-tf
      namespace: tf
    spec:
      settings:
        orchestrator: openstack
    

    If you do not specify tfVersion, MOS deploys TF 2011. We do not recommend deploying TF 5.1 as this version is considered deprecated and will be declared unsupported in one of the upcoming releases.

  3. Configure the TFOperator custom resource according to the needs of your deployment. For the configuration details, refer to TFOperator custom resource.

  4. Trigger the TF deployment:

    kubectl apply -f tungstenfabric.yaml
    
  5. Verify that TF has been successfully deployed:

    kubectl get pods -n tf
    

    The pods of successfully deployed TF services should appear in the Running state in the system response.

  6. Starting from MOS 21.1, if you have enabled StackLight, enable Tungsten Fabric monitoring by setting tungstenFabricMonitoring.enabled to true as described in StackLight configuration procedure.
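
The exact location of the tungstenFabricMonitoring parameter is defined by the StackLight configuration procedure referenced above. As an illustration of the key path only, the corresponding StackLight values fragment looks as follows:

# StackLight values fragment (illustration only, apply it through the
# StackLight configuration procedure)
tungstenFabricMonitoring:
  enabled: true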

Advanced Tungsten Fabric configuration (optional)

This section includes configuration information for available advanced Mirantis OpenStack for Kubernetes features that include SR-IOV and DPDK with the Neutron Tungsten Fabric back end.

Enable huge pages for OpenStack with Tungsten Fabric

Note

The instruction provided in this section applies to both OpenStack with OVS and OpenStack with Tungsten Fabric topologies.

The huge pages OpenStack feature provides essential performance improvements for applications that are highly memory IO-bound. Huge pages should be enabled on a per-compute-node basis. By default, NUMATopologyFilter is enabled in a MOS deployment.

To activate the feature, you need to enable huge pages on the dedicated bare metal host as described in enable-hugepages-bm during the predeployment bare metal configuration.

Note

The multi-size huge pages are not fully supported by Kubernetes versions before 1.19. Therefore, define only one size in kernel parameters.
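
As an illustration of single-size kernel parameters, huge pages can be configured through BareMetalHostProfile in the same way as other GRUB options used in this guide. A sketch assuming 2 MB pages (the page size and count are examples only; follow enable-hugepages-bm for the authoritative procedure):

spec:
  grubConfig:
    defaultGrubOptions:
      # Example only: a single huge page size of 2 MB with 1024 pages
      - 'GRUB_CMDLINE_LINUX="$GRUB_CMDLINE_LINUX hugepagesz=2M hugepages=1024"'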

Enable DPDK for Tungsten Fabric

Available since MOS Ussuri Update TechPreview

This section describes how to enable DPDK mode for the Tungsten Fabric (TF) vRouter.

To enable DPDK for TF:

  1. Install the required drivers on the host operating system. The vfio-pci, uio_pci_generic, or mlnx drivers can be used with the TF vRouter agent in DPDK mode. For details about DPDK drivers, see Linux Drivers.

  2. Verify that DPDK NICs are not used on the host operating system.

    Note

    For use in the Linux user space, DPDK NICs are bound to the specific Linux drivers required by PMDs. Such bound NICs are not available to standard Linux network utilities. Therefore, allocate dedicated NICs for the vRouter deployment in DPDK mode.

  3. Enable huge pages on the host as described in enable-hugepages-bm.

  4. Mark the hosts for deployment with DPDK with the tfvrouter-dpdk=enabled label.

  5. Open the TF Operator custom resource for editing:

    kubectl -n tf edit tfoperators.operator.tf.mirantis.com openstack-tf
    
  6. Enable DPDK:

    spec:
      controllers:
        tf-vrouter:
          agent-dpdk:
            enabled: true
    
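After the TF Operator applies the change, you can verify that the vRouter pods in DPDK mode are scheduled on the labeled hosts. A minimal check (the exact pod names may vary between releases):

# List vRouter pods; the DPDK variant should run on the nodes labeled with tfvrouter-dpdk=enabled
kubectl -n tf get pods -o wide | grep vrouter
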
Enable SR-IOV for Tungsten Fabric

Available since MOS 21.2

This section instructs you on how to enable SR-IOV with the Neutron Tungsten Fabric (TF) back end.

To enable SR-IOV for TF:

  1. Verify that your deployment meets the following requirements:

    • NICs with the SR-IOV support are installed

    • SR-IOV and VT-d are enabled in BIOS

  2. Enable IOMMU in the kernel by configuring intel_iommu=on in the GRUB configuration file. Specify the parameter for compute nodes in BareMetalHostProfile in the grubConfig section:

    spec:
      grubConfig:
        defaultGrubOptions:
          - 'GRUB_CMDLINE_LINUX="$GRUB_CMDLINE_LINUX intel_iommu=on"'
    
  3. Enable SR-IOV in the OpenStackDeployment CR through the node-specific overrides settings. For example:

    spec:
      nodes:
        <NODE-LABEL>::<NODE-LABEL-VALUE>:
          features:
            neutron:
              sriov:
                enabled: true
                nics:
                - device: enp10s0f1
                  num_vfs: 7
                  physnet: tenant
    

    Warning

    After the OpenStackDeployment CR modification, the TF Operator generates a separate vRouter DaemonSet with specified settings. The tf-vrouter-agent-<XXXXX> pods will be automatically restarted on the affected nodes causing the network services interruption on virtual machines running on these hosts.
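
To check that the virtual functions have actually been created on the affected compute host, you can read the standard sysfs counters on that host. A minimal sketch, assuming the enp10s0f1 device from the example above:

# Number of VFs currently configured and the maximum supported by the NIC
cat /sys/class/net/enp10s0f1/device/sriov_numvfs
cat /sys/class/net/enp10s0f1/device/sriov_totalvfs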

Configure multiple Contrail API workers

Available since MOS 21.5 TechPreview

Starting from the MOS 21.5 release, six workers of the contrail-api service are used by default on the Tungsten Fabric MOS deployments. In the previous MOS releases, only one worker of this service was used. If required, you can change the default configuration using the instruction below.

To configure the number of Contrail API workers on a MOS TF deployment:

  1. Specify the required number of workers in the CONFIG_API_WORKER_COUNT environment variable in the TFOperator custom resource (CR):

    spec:
      controllers:
        tf-config:
          api:
            containers:
            - env:
              - name: CONFIG_API_WORKER_COUNT
                value: "7"
              name: api
    
  2. Wait until all tf-config-* pods are restarted.

  3. Verify the number of workers inside the running API container:

    kubectl -n tf exec -ti tf-config-rclzq -c api -- ps aux --width 500
    kubectl -n tf exec -ti tf-config-rclzq -c api -- ls /etc/contrail/
    

    Verify that the ps output lists one API process with PID "1" and the number of workers set in the TFOperator CR.

  4. In /etc/contrail/, verify that the number of configuration files contrail-api-X.conf matches the number of workers set in the TFOperator CR.

Access the Tungsten Fabric web UI

The Tungsten Fabric (TF) web UI allows for easy and fast TF resources configuration, monitoring, and debugging. You can access the TF web UI through either the Ingress service or the Kubernetes Service directly. TLS termination for the https protocol is performed through the Ingress service.

Note

Mirantis OpenStack for Kubernetes provides the TF web UI as is and does not include this service in the support Service Level Agreement.

To access the TF web UI through Ingress:

  1. Log in to a local machine running Ubuntu 18.04 where kubectl is installed.

  2. Obtain and export kubeconfig of your managed cluster as described in Mirantis Container Cloud Operations Guide: Connect to a Container Cloud managed cluster.

  3. Obtain the password of the Admin user:

    kubectl -n openstack exec -it $(kubectl -n openstack get pod -l application=keystone,component=client -o jsonpath='{.items[0].metadata.name}') -- env | grep PASS
    
  4. Obtain the external IP address of the Ingress service:

    kubectl -n openstack get services ingress
    

    Example of system response:

    NAME      TYPE           CLUSTER-IP    EXTERNAL-IP    PORT(S)                                      AGE
    ingress   LoadBalancer   10.96.32.97   10.172.1.101   80:34234/TCP,443:34927/TCP,10246:33658/TCP   4h56m
    

    Note

    Do not use the EXTERNAL-IP value to directly access the TF web UI. Instead, use the FQDN from the list below.

  5. Obtain the FQDN of tf-webui:

    Note

    The command below outputs all host names assigned to the TF web UI service. Use one of them.

    kubectl -n tf get ingress tf-webui -o custom-columns=HOSTS:.spec.rules[*].host
    
  6. Configure DNS to access the TF web UI host as described in Configure DNS to access OpenStack.

  7. Use your favorite browser to access the TF web UI at https://<FQDN-WEBUI>.

Troubleshoot the Tungsten Fabric deployment

This section provides the general debugging instructions for your Tungsten Fabric (TF) on Kubernetes deployment.

Enable debug logs for the Tungsten Fabric services

To enable debug logging for the Tungsten Fabric (TF) services:

  1. Open the TF custom resource for modification:

    kubectl -n tf edit tfoperators.operator.tf.mirantis.com openstack-tf
    
  2. Specify the LOG_LEVEL variable with the SYS_DEBUG value for the required TF service. For example, for the config-api service:

    spec:
      controllers:
        tf-config:
          api:
            containers:
            - name: api
              env:
              - name: LOG_LEVEL
                value: SYS_DEBUG
    

Warning

After the TF custom resource modification, the pods related to the affected services will be restarted. This rule does not apply to the tf-vrouter-agent-<XXXXX> pods as their update strategy differs. Therefore, if you enable the debug logging for the services in a tf-vrouter-agent-<XXXXX> pod, restart this pod manually after you modify the custom resource.

Troubleshoot access to the Tungsten Fabric web UI

If you cannot access the Tungsten Fabric (TF) web UI service, verify that the FQDN of the TF web UI is resolvable on your PC by running one of the following commands:

host tf-webui.it.just.works
# or
ping tf-webui.it.just.works
# or
dig host tf-webui.it.just.works

All the commands above should resolve the web UI domain name to an IP address within the EXTERNAL-IP subnet dedicated to Kubernetes.

If the TF web UI domain name cannot be resolved to an IP address, your PC uses a different DNS or the DNS does not contain a record for the TF web UI service. To resolve the issue, add a record that points the TF web UI FQDN to the IP address of the Ingress service from the openstack namespace of Kubernetes to the hosts file on your machine (see the example after the following command). To obtain the Ingress IP address:

kubectl -n openstack get svc ingress -o custom-columns=HOSTS:.status.loadBalancer.ingress[*].ip
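
With the obtained IP address, add a record for the TF web UI FQDN to the hosts file on your machine. An example, assuming the FQDN and the Ingress IP used elsewhere in this guide:

# /etc/hosts
10.172.1.101 tf-webui.it.just.works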

If the web UI domain name is resolvable but you still cannot access the service, verify the connectivity to the cluster.

Disable TX offloading on NICs used by vRouter

Available since MOS 21.3

In the following cases, a TCP-based service may not work on VMs:

  • If the setup has nested VMs.

  • If VMs are running in the ESXi hypervisor.

  • If the Network Interface Cards (NICs) do not support the IP checksum calculation and generate an incorrect checksum. For example, the Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe NIC cards.

To resolve the issue, disable the transmit (TX) offloading on all OpenStack compute nodes for the affected NIC used by the vRouter as described below.

To identify the issue:

  1. Verify whether ping works between VMs on different hypervisor hosts while the TCP-based services do not.

  2. Run the following command for the vRouter agent and verify whether the output includes the number of Checksum errors:

    kubectl -n tf exec tf-vrouter-agent-XXXXX -c agent -- dropstats
    
  3. Run the following command and verify if the output includes the cksum incorrect entries:

    kubectl -n tf exec tf-vrouter-agent-XXXXX -c agent -- tcpdump -i <tunnel interface> -v -nn | grep -i incorrect
    

    Example of system response:

    tcpdump: listening on <tunnel interface>, link-type EN10MB (Ethernet), capture size 262144 bytes
    <src ip.port> > <dst ip.port>: Flags [S.], cksum 0x43bf (incorrect -> 0xb8dc), \
    seq 1901889431, ack 1081063811, win 28960, options [mss 1420,sackOK,\
    TS val 456361578 ecr 41455995,nop,wscale 7], length 0
    <src ip.port> > <dst ip.port>: Flags [S.], cksum 0x43bf (incorrect -> 0xb8dc), \
    seq 1901889183, ack 1081063811, win 28960, options [mss 1420,sackOK,\
    TS val 456361826 ecr 41455995,nop,wscale 7], length 0
    <src ip.port> > <dst ip.port>: Flags [S.], cksum 0x43bf (incorrect -> 0xb8dc), \
    seq 1901888933, ack 1081063811, win 28960, options [mss 1420,sackOK,\
    TS val 456362076 ecr 41455995,nop,wscale 7], length 0
    
  4. Run the following command for the vRouter agent container and verify whether the output includes the information about a drop for an unknown reason:

    kubectl -n tf exec tf-vrouter-agent-XXXXX -c agent -- flow -l
    

To disable the TX offloading on NICs used by vRouter:

  1. Open the TFOperator custom resource (CR) for editing:

    kubectl -n tf edit tfoperators.operator.tf.mirantis.com openstack-tf
    
  2. Specify the DISABLE_TX_OFFLOAD variable with the "YES" value for the vRouter agent container:

    spec:
      controllers:
        tf-vrouter:
          agent:
            containers:
            - name: agent
              env:
              - name: DISABLE_TX_OFFLOAD
                value: "YES"
    

    Warning

    Once you modify the TFOperator CR, the tf-vrouter-agent-<XXXXX> pods will not restart automatically because they use the OnDelete update strategy. Restart such pods manually, considering that the vRouter pods restart causes network services interruption for the VMs hosted on the affected nodes.

  3. To disable TX offloading on a specific subset of nodes, use custom vRouter settings. For details, see Custom vRouter settings.

    Warning

    Once you add a new CustomSpec, a new daemon set will be generated and the tf-vrouter-agent-<XXXXX> pods will be automatically restarted. The vRouter pods restart causes network services interruption for VMs hosted on the affected node. Therefore, plan this procedure accordingly.

Operations Guide

This guide outlines the post-deployment Day-2 operations for a Mirantis OpenStack for Kubernetes environment. It describes how to configure and manage the MOS components, perform different types of cloud verification, and enable additional features depending on your cloud needs. The guide also contains day-to-day maintenance procedures such as how to back up and restore, update and upgrade, or troubleshoot your MOS cluster.

Update a MOS cluster

Once a Mirantis Container Cloud management cluster automatically upgrades to a new available Container Cloud release version, a newer version of a Cluster release becomes available for MOS managed clusters.

This section instructs you on how to update your MOS cluster using the Container Cloud web UI.

Note

The Tungsten Fabric update is part of the MOS managed cluster release update and does not require any additional manual steps.

Caution

Make sure to update the Cluster release version of your managed cluster before the current Cluster release version becomes unsupported by a new Container Cloud release version. Otherwise, Container Cloud stops auto-upgrade and eventually Container Cloud itself becomes unsupported.

To update a MOS cluster:

  1. Set the maintenance flag for Ceph:

    Note

    Starting from the MOS 21.4 to 21.5 update, skip this step since it is automated. The maintenance flag is deprecated and will be removed from KaasCephCluster.

    1. Open the KaasCephCluster CR for editing:

      kubectl edit kaascephcluster
      
    2. Enable the maintenance flag:

      spec:
        cephClusterSpec:
          maintenance: true
      
  2. Log in to the Container Cloud web UI with the writer permissions.

  3. Switch to the required project using the Switch Project action icon located on top of the main left-side navigation panel.

  4. In the Clusters tab, click More action icon in the last column for each cluster and select Update cluster where available.

  5. In the Release Update window, select the required Cluster release to update your managed cluster to.

    The Description section contains the list of components versions to be installed with a new Cluster release.

  6. Click Update.

    Before the cluster update starts, Container Cloud performs a backup of MKE and Docker Swarm. The backup directory is located under:

    • /srv/backup/swarm on every Container Cloud node for Docker Swarm

    • /srv/backup/ucp on one of the controller nodes for MKE

    To view the update status, verify the cluster status on the Clusters page. Once the orange blinking dot near the cluster name disappears, the update is complete.

  7. Disable the maintenance flag for Ceph from the KaasCephCluster CR once the update is complete and all nodes are in the Ready status:

    Note

    Starting from the MOS 21.4 to 21.5 update, skip this step since it is automated. The maintenance flag is deprecated and will be removed from KaasCephCluster.

    spec:
      cephClusterSpec:
        maintenance: false
    

Note

In rare cases, after a managed cluster update, Grafana may stop working due to issues with helm-controller.

The development team is working on the issue, which will be addressed in one of the following releases.

Note

MKE and Kubernetes API may return short-term 50x errors during the update process. Ignore these errors.

OpenStack operations

The section covers the management aspects of an OpenStack cluster deployed on Kubernetes.

Update OpenStack

The update of the OpenStack components is performed during the MOS managed cluster release update as described in Update a MOS cluster.

Upgrade OpenStack

Available since MOS 21.5

This section provides instructions on how to upgrade the OpenStack version on a MOS managed cluster.

Prerequisites
  1. Verify that your OpenStack cloud is running on the latest MOS release. See Release Compatibility Matrix NEW for the release matrix and supported upgrade paths.

  2. Just before the upgrade, back up your OpenStack databases. See Back up and restore a MariaDB Galera database for details.

  3. Verify that OpenStack is healthy and operational. All OpenStack components in the health group in the OpenStackDeploymentStatus CR should be in the Ready state. See OpenStackDeploymentStatus custom resource for details.

  4. Verify the workability of your OpenStack deployment by running Tempest against the OpenStack cluster as described in Run Tempest tests. Verification of the testing pass rate before upgrading will help you measure your cloud quality before and after upgrade.

  5. Read carefully through the MOS release notes of your MOS version paying attention to the Known issues section and the OpenStack upstream release notes for the target OpenStack version.

  6. Calculate the maintenance window using Calculate a maintenance window duration and notify users.

Perform the upgrade

To start the OpenStack upgrade, change the value of the spec:openstack_version parameter in the OpenStackDeployment object to the target OpenStack release.

Caution

Skip-level upgrades and downgrades are not supported.

When you change the value of the spec:openstack_version parameter, the OpenStack controller initializes the upgrade process.

To verify the upgrade status, use:

  • Logs from the osdpl container in the OpenStack controller pod.

  • The OpenStackDeploymentStatus object.

    When upgrade starts, the OPENSTACK VERSION field content changes to the target OpenStack version, and STATE displays APPLYING:

    kubectl -n openstack get osdplst
    NAME      OPENSTACK VERSION   CONTROLLER VERSION   STATE
    osh-dev   victoria            0.5.8.dev15          APPLYING
    

    When upgrade finishes, the STATE field should display APPLIED:

    kubectl -n openstack get osdplst
    NAME      OPENSTACK VERSION   CONTROLLER VERSION   STATE
    osh-dev   victoria            0.5.8.dev15          APPLIED
    
Verify the upgrade
  1. Verify that OpenStack is healthy and operational. All OpenStack components in the health group in the OpenStackDeploymentStatus CR should be in the Ready state. See OpenStackDeploymentStatus custom resource for details.

  2. Verify the workability of your OpenStack deployment by running Tempest against the OpenStack cluster as described in Run Tempest tests.

Add a compute node

This section describes how to add a new compute node to your existing Mirantis OpenStack for Kubernetes deployment.

To add a compute node:

  1. Add a bare metal host to the managed cluster with MOS as described in Add a bare metal host.

  2. Create a Kubernetes machine in your cluster as described in Add a machine.

    When adding the machine, specify the node labels as required for an OpenStack compute node:

    OpenStack node roles

    Node role

    Description

    Kubernetes labels

    Minimal count

    OpenStack control plane

    Hosts the OpenStack control plane services such as database, messaging, API, schedulers, conductors, L3 and L2 agents.

    openstack-control-plane=enabled
    openstack-gateway=enabled
    openvswitch=enabled

    3

    OpenStack compute

    Hosts the OpenStack compute services such as libvirt and L2 agents.

    openstack-compute-node=enabled
    openvswitch=enabled (for a deployment with Open vSwitch as a back end for networking)

    Varies

  3. If required, configure the compute host to enable DPDK, huge pages, SR-IOV, and other advanced features in your MOS deployment. See Advanced OpenStack configuration (optional) for details.

  4. Once the node is available in Kubernetes and when the nova-compute and neutron pods are running on the node, verify that the compute service and Neutron agents are healthy in OpenStack API.

    In the keystone-client pod, run:

    openstack network agent list --host <cmp_host_name>
    
    openstack compute service list --host <cmp_host_name>
    
  5. Verify that the compute service is mapped to cell.

    The OpenStack controller triggers the nova-cell-setup job once it detects a new compute pod in the Ready state. This job maps the new compute services to cells.

    In the nova-api-osapi pod, run:

    nova-manage cell_v2 list_hosts | grep <cmp_host_name>
    
Delete a compute node

This section describes how to delete an OpenStack compute node from your MOS deployment.

To delete a compute node:

Caution

The OpenStack compute node can be collocated with other components, for example, Ceph. Refer to the removal steps of collocated components when planning maintenance.

  1. Disable the compute service to prevent spawning of new instances. In the keystone-client pod, run:

    openstack compute service set --disable <cmp_host_name> nova-compute --disable-reason "Compute is going to be removed."
    
  2. Migrate all workloads from the node. For more information, follow Nova official documentation: Migrate instances.

  3. Ensure that there are no pods running on the node to delete by draining the node as instructed in the Kubernetes official documentation: Safely drain node.

  4. Delete the compute service using OpenStack API. In the keystone-client pod, run:

    openstack compute service delete <service_id>
    

    Note

    To obtain <service_id>, run:

    openstack compute service list --host <cmp_host_name>
    
  5. Delete the Neutron agent service. In the keystone-client pod, run:

    openstack network agent delete <agent_id>
    

    Note

    To obtain <agent_id>, run:

    openstack network agent list --host <cmp_host_name>
    
  6. Delete the node through the Mirantis Container Cloud web UI as described in Mirantis Container Cloud Operations Guide: Delete a machine.

Add a controller node

This section describes how to add a new control plane node to the existing MOS deployment.

To add an OpenStack controller node:

  1. Add a bare metal host to the managed cluster with MOS as described in Add a bare metal host.

    When adding the bare metal host YAML file, specify the following OpenStack control plane node labels for the OpenStack control plane services such as database, messaging, API, schedulers, conductors, L3 and L2 agents:

    • openstack-control-plane=enabled

    • openstack-gateway=enabled

    • openvswitch=enabled

  2. Create a Kubernetes machine in your cluster as described in Add a machine.

    When adding the machine, verify that OpenStack control plane node has the following labels:

    • openstack-control-plane=enabled

    • openstack-gateway=enabled

    • openvswitch=enabled

    Note

    Depending on the applications that were colocated on the failed controller node, you may need to specify some additional labels, for example, ceph_role_mgr=true and ceph_role_mon=true. To successfully replace a failed mon and mgr node, refer to Mirantis Container Cloud Operations Guide: Manage Ceph.

  3. Verify that the node is in the Ready state through the Kubernetes API:

    kubectl get node <NODE-NAME> -o wide | grep Ready
    
  4. Verify that the node has all required labels described in the previous steps:

    kubectl get nodes --show-labels
    
  5. Configure new Octavia health manager resources:

    1. Rerun the octavia-create-resources job:

      kubectl -n osh-system exec -t <OS-CONTROLLER-POD> -c osdpl osctl-job-rerun octavia-create-resources openstack
      
    2. Wait until the Octavia health manager pod on the newly added control plane node appears in the Running state:

      kubectl -n openstack get pods -o wide | grep <NODE_ID> | grep octavia-health-manager
      

      Note

      If the pod is in the crashloopbackoff state, remove it:

      kubectl -n openstack delete pod <OCTAVIA-HEALTH-MANAGER-POD-NAME>
      
    3. Verify that an OpenStack port for the node has been created and the node is in the Active state:

      kubectl -n openstack exec -t <KEYSTONE-CLIENT-POD-NAME> openstack port show octavia-health-manager-listen-port-<NODE-NAME>
      
Replace a failed controller node

This section describes how to replace a failed control plane node in your MOS deployment. The procedure applies to the control plane nodes that are, for example, permanently failed due to a hardware failure and appear in the NotReady state:

kubectl get nodes <CONTAINER-CLOUD-NODE-NAME>

Example of system response:

NAME                         STATUS       ROLES    AGE   VERSION
<CONTAINER-CLOUD-NODE-NAME>    NotReady   <none>   10d   v1.18.8-mirantis-1

To replace a failed controller node:

  1. Remove the Kubernetes labels from the failed node by editing the .metadata.labels section of the node object:

    kubectl edit node <CONTAINER-CLOUD-NODE-NAME>
    
  2. Add the control plane node to your deployment as described in Add a controller node.

  3. Identify all stateful applications present on the failed node:

    node=<CONTAINER-CLOUD-NODE-NAME>
    claims=$(kubectl -n openstack get pv -o jsonpath="{.items[?(@.spec.nodeAffinity.required.nodeSelectorTerms[0].matchExpressions[0].values[0] == '${node}')].spec.claimRef.name}")
    for i in $claims; do echo $i; done
    

    Example of system response:

    mysql-data-mariadb-server-2
    openstack-operator-bind-mounts-rfr-openstack-redis-1
    etcd-data-etcd-etcd-0
    
  4. Reschedule stateful applications pods to healthy controller nodes as described in Reschedule stateful applications.

  5. If the failed controller node had the StackLight label, fix the StackLight volume node affinity conflict as described in Mirantis Container Cloud Operations Guide: Delete a machine.

  6. Remove the OpenStack port related to the Octavia health manager pod of the failed node:

    kubectl -n openstack exec -t <KEYSTONE-CLIENT-POD-NAME> openstack port delete octavia-health-manager-listen-port-<NODE-NAME>
    
Reschedule stateful applications

The rescheduling of stateful applications may be required when replacing a permanently failed node, decommissioning a node, migrating applications to nodes with a more suitable set of hardware, and in several other use cases.

MOS deployment profiles include the following stateful applications:

  • OpenStack database (MariaDB)

  • OpenStack coordination (etcd)

  • OpenStack Time Series Database back end (Redis)

Each stateful application from the list above has a persistent volume claim (PVC) based on a local persistent volume per pod. Each control plane node has a set of local volumes available. To migrate an application pod to another node, recreate a PVC with the persistent volume from the target node.

Caution

A stateful application pod can only be migrated to a node that does not contain other pods of this application.

Caution

When a PVC is removed, all data present in the related persistent volume is removed from the node as well.

Reschedule pods to another control plane node

This section describes how to reschedule pods for MariaDB, etcd, and Redis to another control plane node.

To reschedule pods for MariaDB:

  1. Recreate PVCs as described in Recreate a PVC on another control plane node.

  2. Remove the pod:

    Note

    To remove a pod from a node in the NotReady state, add --grace-period=0 --force to the following command.

    kubectl -n openstack delete pod <STATEFULSET-NAME>-<NUMBER>
    
  3. Wait until the pod appears in the Ready state.

    When the rescheduling is finalized, the <STATEFULSET-NAME>-<NUMBER> pod rejoins the Galera cluster with a clean MySQL data directory and requests the Galera state transfer from the available nodes.

To reschedule pods for Redis:

  1. Recreate PVCs as described in Recreate a PVC on another control plane node.

  2. Remove the pod:

    Note

    To remove a pod from a node in the NotReady state, add --grace-period=0 --force to the following command.

    kubectl -n openstack-redis delete pod <STATEFULSET-NAME>-<NUMBER>
    
  3. Wait until the pod is in the Ready state.

To reschedule pods for etcd:

Warning

During the reschedule procedure of the etcd LCM, a short cluster downtime is expected.

  1. On the failed node, identify the etcd replica ID, which is the numeric suffix in the pod name. For example, the ID of the etcd-etcd-0 pod is 0. This ID is required during the reschedule procedure.

    kubectl -n openstack get pods | grep etcd
    
    etcd-etcd-0                    0/1     Pending                 0          3m52s
    etcd-etcd-1                    1/1     Running                 0          39m
    etcd-etcd-2                    1/1     Running                 0          39m
    
  2. If the replica ID is 1 or higher:

    1. Add the coordination section to the spec.services section of the OsDpl object:

      spec:
        services:
          coordination:
            etcd:
              values:
                conf:
                  etcd:
                    ETCD_INITIAL_CLUSTER_STATE: existing
      
    2. Wait for the etcd StatefulSet to apply the new state parameter:

      kubectl -n openstack get sts etcd-etcd -o jsonpath='{.spec.template.spec.containers[0].env[?(@.name=="ETCD_INITIAL_CLUSTER_STATE")].value}'
      
  3. Scale down the etcd StatefulSet to 0 replicas. Verify that no replicas are running on the failed node.

    kubectl -n openstack scale sts etcd-etcd --replicas=0
    
  4. Recreate PVCs as described in Recreate a PVC on another control plane node.

  5. Scale the etcd StatefulSet to the initial number of replicas:

    kubectl -n openstack scale sts etcd-etcd --replicas=<NUMBER-OF-REPLICAS>
    
  6. Wait until all etcd pods are in the Ready state.

  7. Verify that the etcd cluster is healthy:

    kubectl -n openstack exec -t etcd-etcd-1 -- etcdctl cluster-health
    
  8. If the replica ID is 1 or higher:

    1. Remove the coordination section from the spec.services section of the OsDpl object.

    2. Wait until all etcd pods appear in the Ready state.

    3. Verify that the etcd cluster is healthy:

      kubectl -n openstack exec -t etcd-etcd-1 -- etcdctl cluster-health
      
Recreate a PVC on another control plane node

This section describes how to recreate a PVC of a stateful application on another control plane node.

To recreate a PVC on another control plane node:

  1. Select one of the persistent volumes available on the node:

    Caution

    A stateful application pod can only be migrated to the node that does not contain other pods of this application.

    NODE_NAME=<NODE-NAME>
    STORAGE_CLASS=$(kubectl -n openstack get osdpl <OSDPL_OBJECT_NAME> -o jsonpath='{.spec.local_volume_storage_class}')
    kubectl -n openstack get pv -o json | jq --arg NODE_NAME $NODE_NAME --arg STORAGE_CLASS $STORAGE_CLASS -r '.items[] | select(.spec.nodeAffinity.required.nodeSelectorTerms[0].matchExpressions[0].values[0] == $NODE_NAME and .spec.storageClassName == $STORAGE_CLASS and .status.phase == "Available") | .metadata.name'
    
  2. As the new PVC should contain the same parameters as the deleted one except for volumeName, save the old PVC configuration in YAML:

    kubectl -n <NAMESPACE> get pvc <PVC-NAME> -o yaml > <OLD-PVC>.yaml
    

    Note

    <NAMESPACE> is a Kubernetes namespace where the PVC is created. For Redis, specify openstack-redis, for other applications specify openstack.

  3. Delete the old PVC:

    kubectl -n <NAMESPACE> delete pvc <PVC-NAME>
    

    Note

    If a PVC is stuck in the Terminating state, run kubectl -n <NAMESPACE> edit pvc <PVC-NAME> and remove the finalizers section from the PVC metadata.

  4. Create a PVC with a new persistent volume:

    cat <<EOF | kubectl apply -f -
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: <PVC-NAME>
      namespace: <NAMESPACE>
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: <STORAGE-SIZE>
      storageClassName: <STORAGE-CLASS>
      volumeMode: Filesystem
      volumeName: <PV-NAME>
    EOF
    

    Caution

    <STORAGE-SIZE>, <STORAGE-CLASS>, and <NAMESPACE> should correspond to the storage, storageClassName, and namespace values from the <OLD-PVC>.yaml file with the old PVC configuration.

Back up and restore a MariaDB Galera database

MOS uses a MariaDB Galera database cluster to store data generated by OpenStack components. Mirantis recommends backing up your databases daily to ensure the integrity of your data. Also, you should create an instant backup before upgrading your database or for testing purposes.

MOS uses the Mariabackup utility to back up MariaDB Galera cluster data. Mariabackup is launched on a periodic basis by a Kubernetes cron job, which is part of the MOS installation and is in the suspended state by default. To start running the job, you need to explicitly enable it in the OpenStackDeployment object.

MOS also provides the means to restore a MariaDB Galera cluster from a Mariabackup backup. During restoration, the job restores all MariaDB Galera nodes.
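
For reference, the periodic backup job is enabled in the OpenStackDeployment object. The parameter path below reflects a typical MOS configuration and is provided as an assumption; verify it against the OpenStackDeployment reference for your release:

# OpenStackDeployment fragment (verify the exact path for your MOS release)
spec:
  features:
    database:
      backup:
        enabled: true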

MariaDB backup workflow

The OpenStack database backup workflow includes the following phases:

  1. Backup phase 1:

    The mariadb-phy-backup job launches the mariadb-phy-backup-<TIMESTAMP> pod. This pod contains the main backup script, which is responsible for:

    • Basic sanity checks and choosing the right node for the backup

    • Verifying the wsrep status and changing the wsrep_desync parameter settings

    • Managing the mariadb-phy-backup-runner pod

    During the first backup phase, the following actions take place:

    1. Sanity check: verification of the Kubernetes status and wsrep status of each MariaDB pod. If some pods have wrong statuses, the backup job fails unless the --allow-unsafe-backup parameter is passed to the main script in the Kubernetes backup job.

      Note

      Mirantis does not recommend setting the --allow-unsafe-backup parameter unless it is absolutely required. To ensure the consistency of a backup, verify that the MariaDB Galera cluster is in a working state before you proceed with the backup.

    2. Select the replica to back up. The system selects the replica with the highest number in its name as a target replica. For example, if the MariaDB server pods have the mariadb-server-0, mariadb-server-1, and mariadb-server-2 names, the mariadb-server-2 replica will be backed up.

    3. Desynchronize the replica from the Galera cluster. The script connects the target replica and sets the wsrep_desync variable to ON. Then, the replica stops receiving write-sets and receives the wsrep status Donor/Desynced. The Kubernetes health check of that mariadb-server pod fails and the Kubernetes status of that pod becomes Not ready. If the pod has the primary label, the MariaDB controller sets the backup label to it and the pod is removed from the endpoints list of the MariaDB service.

    Figure: MariaDB backup, phase 1
  2. Backup phase 2:

    1. The main script in the mariadb-phy-backup pod launches the Kubernetes pod mariadb-phy-backup-runner-<TIMESTAMP> on the same node where the target mariadb-server replica is running, which is node X in the example.

    2. The mariadb-phy-backup-runner pod has both the MySQL data directory and the backup directory mounted. The pod performs the following actions:

      1. Verifies that there is enough space in the /var/backup folder to perform the backup. The amount of available space in the folder should be greater than <DB-SIZE> * <MARIADB-BACKUP-REQUIRED-SPACE-RATIO> in KB.

      2. Performs the actual backup using the mariabackup tool.

      3. If the number of current backups is greater than the value of the MARIADB_BACKUPS_TO_KEEP job parameter, the script removes all old backups exceeding the allowed number of backups.

      4. Exits with 0 code.

    3. The script waits until the mariadb-phy-backup-runner pod completes and collects its logs.

    4. The script puts the backed-up replica back in sync with the Galera cluster by setting wsrep_desync to OFF and waits for the replica to become Ready in Kubernetes.

    Figure: MariaDB backup, phase 2
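
A minimal sketch of such a manual check of the replica state, assuming mariadb-server-2 is the target replica and using the same database admin credentials as elsewhere in this guide:

kubectl -n openstack exec -it mariadb-server-2 -- bash
mysql -u root -p${MYSQL_DBADMIN_PASSWORD} -e "SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';"

For the target replica, the value changes to Donor/Desynced for the duration of the backup and returns to Synced afterward.
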
MariaDB restore workflow

The OpenStack database restore workflow includes the following phases:

  1. Restoration phase 1:

    The mariadb-phy-restore job launches the mariadb-phy-restore pod. This pod contains the main restore script, which is responsible for:

    • Scaling the mariadb-server StatefulSet

    • Verifying the mariadb-server pod statuses

    • Managing the openstack-mariadb-phy-restore-runner pods

    Caution

    During the restoration, the database is not available for OpenStack services, which means a complete outage of all OpenStack services.

    During the first phase, the following actions are performed:

    1. Save the list of mariadb-server persistent volume claims (PVC).

    2. Scale the mariadb-server StatefulSet to 0 replicas. At this point, the database becomes unavailable for OpenStack services.

    Figure: MariaDB restoration, phase 1

  2. Restoration phase 2:

    1. The mariadb-phy-restore pod launches openstack-mariadb-phy-restore-runner with the first mariadb-server replica PVC mounted to the /var/lib/mysql folder and the backup PVC mounted to /var/backup. The openstack-mariadb-phy-restore-runner pod performs the following actions:

      1. Unarchives the database backup files to a temporary directory within /var/backup.

      2. Executes mariabackup --prepare on the unarchived data.

      3. Creates the .prepared file in the temporary directory in /var/backup.

      4. Restores the backup to /var/lib/mysql.

      5. Exits with 0.

    2. The script in the mariadb-phy-restore pod collects the logs from the openstack-mariadb-phy-restore-runner pod and removes the pod. Then, the script launches the next openstack-mariadb-phy-restore-runner pod for the next mariadb-server replica PVC. The openstack-mariadb-phy-restore-runner pod restores the backup to /var/lib/mysql and exits with 0.

      Step 2 is repeated for every mariadb-server replica PVC sequentially.

    3. When the data of the last replica is restored, the last openstack-mariadb-phy-restore-runner pod removes the .prepared file and the temporary folder with the unarchived data from /var/backup.

    Figure: MariaDB restoration, phase 2

  3. Restoration phase 3:

    1. The mariadb-phy-restore pod scales the mariadb-server StatefulSet back to the configured number of replicas.

    2. The mariadb-phy-restore pod waits until all mariadb-server replicas are ready.

    Figure: MariaDB restoration, phase 3
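
During the restoration described above, you can monitor the progress, for example, by watching the mariadb-server StatefulSet being scaled down and back up and by following the logs of the main restore script:

kubectl -n openstack get statefulset mariadb-server -w
kubectl -n openstack logs -f job/mariadb-phy-restore
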
Limitations and prerequisites

The list of prerequisites includes:

  • The MOS cluster should contain a preconfigured storage class with Ceph as a storage back end.

  • The default sizes of volumes for backups should be configured as follows:

    • 20 GB for the tiny cluster size

    • 40 GB for the small cluster size

    • 80 GB for the medium cluster size

The list of limitations includes:

  • Backup and restoration of specific databases and tables are not supported. MOS supports only backup and restoration of all databases in the mysql data directory.

  • During the MariaDB Galera cluster restoration, the restore job restores the state on all MariaDB nodes sequentially. You cannot perform parallel restoration because Ceph Kubernetes volumes do not support concurrent mounting from different nodes.
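
To verify that the backup volume matches the default sizes listed above, you can inspect the backup PVC, for example:

kubectl -n openstack get pvc mariadb-phy-backup-data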

Configure a periodic backup for MariaDB databases

After the MOS deployment, the cluster configuration includes the MariaDB backup functionality. By default, the Kubernetes cron job responsible for the MariaDB backup is in the suspended state.

To enable the MariaDB databases backup:

  1. Enable the backup in the OpenStackDeployment object:

    spec:
      features:
        database:
          backup:
            enabled: true
    
  2. Verify that the mariadb-phy-backup CronJob object is present:

    kubectl -n openstack get cronjob mariadb-phy-backup
    

    Example of a positive system response:

    apiVersion: batch/v1beta1
    kind: CronJob
    metadata:
      annotations:
        openstackhelm.openstack.org/release_uuid: ""
      creationTimestamp: "2020-09-08T14:13:48Z"
      managedFields:
      <<<skipped>>>
      name: mariadb-phy-backup
      namespace: openstack
      resourceVersion: "726449"
      selfLink: /apis/batch/v1beta1/namespaces/openstack/cronjobs/mariadb-phy-backup
      uid: 88c9be21-a160-4de1-afcf-0853697dd1a1
    spec:
      concurrencyPolicy: Forbid
      failedJobsHistoryLimit: 1
      jobTemplate:
        metadata:
          creationTimestamp: null
          labels:
            application: mariadb-phy-backup
            component: backup
            release_group: openstack-mariadb
        spec:
          activeDeadlineSeconds: 4200
          backoffLimit: 0
          completions: 1
          parallelism: 1
          template:
            metadata:
              creationTimestamp: null
              labels:
                application: mariadb-phy-backup
                component: backup
                release_group: openstack-mariadb
            spec:
              containers:
              - command:
                - /tmp/mariadb_resque.py
                - backup
                - --backup-timeout
                - "3600"
                - --backup-type
                - incremental
                env:
                - name: MARIADB_BACKUPS_TO_KEEP
                  value: "10"
                - name: MARIADB_BACKUP_PVC_NAME
                  value: mariadb-phy-backup-data
                - name: MARIADB_FULL_BACKUP_CYCLE
                  value: "604800"
                - name: MARIADB_REPLICAS
                  value: "3"
                - name: MARIADB_BACKUP_REQUIRED_SPACE_RATIO
                  value: "1.2"
                - name: MARIADB_RESQUE_RUNNER_IMAGE
                  value: docker-dev-kaas-local.docker.mirantis.net/general/mariadb:10.4.14-bionic-20200812025059
                - name: MARIADB_RESQUE_RUNNER_SERVICE_ACCOUNT
                  value: mariadb-phy-backup-runner
                - name: MARIADB_RESQUE_RUNNER_POD_NAME_PREFIX
                  value: openstack-mariadb
                - name: MARIADB_POD_NAMESPACE
                  valueFrom:
                    fieldRef:
                      apiVersion: v1
                      fieldPath: metadata.namespace
                image: docker-dev-kaas-local.docker.mirantis.net/general/mariadb:10.4.14-bionic-20200812025059
                imagePullPolicy: IfNotPresent
                name: phy-backup
                resources: {}
                securityContext:
                  allowPrivilegeEscalation: false
                  readOnlyRootFilesystem: true
                terminationMessagePath: /dev/termination-log
                terminationMessagePolicy: File
                volumeMounts:
                - mountPath: /tmp
                  name: pod-tmp
                - mountPath: /tmp/mariadb_resque.py
                  name: mariadb-bin
                  readOnly: true
                  subPath: mariadb_resque.py
                - mountPath: /tmp/resque_runner.yaml.j2
                  name: mariadb-bin
                  readOnly: true
                  subPath: resque_runner.yaml.j2
                - mountPath: /etc/mysql/admin_user.cnf
                  name: mariadb-secrets
                  readOnly: true
                  subPath: admin_user.cnf
              dnsPolicy: ClusterFirst
              initContainers:
              - command:
                - kubernetes-entrypoint
                env:
                - name: POD_NAME
                  valueFrom:
                    fieldRef:
                      apiVersion: v1
                      fieldPath: metadata.name
                - name: NAMESPACE
                  valueFrom:
                    fieldRef:
                      apiVersion: v1
                      fieldPath: metadata.namespace
                - name: INTERFACE_NAME
                  value: eth0
                - name: PATH
                  value: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/
                - name: DEPENDENCY_SERVICE
                - name: DEPENDENCY_DAEMONSET
                - name: DEPENDENCY_CONTAINER
                - name: DEPENDENCY_POD_JSON
                - name: DEPENDENCY_CUSTOM_RESOURCE
                image: docker-dev-kaas-local.docker.mirantis.net/openstack/extra/kubernetes-entrypoint:v1.0.0-20200311160233
                imagePullPolicy: IfNotPresent
                name: init
                resources: {}
                securityContext:
                  allowPrivilegeEscalation: false
                  readOnlyRootFilesystem: true
                  runAsUser: 65534
                terminationMessagePath: /dev/termination-log
                terminationMessagePolicy: File
              nodeSelector:
                openstack-control-plane: enabled
              restartPolicy: Never
              schedulerName: default-scheduler
              securityContext:
                runAsUser: 999
              serviceAccount: mariadb-phy-backup
              serviceAccountName: mariadb-phy-backup
              terminationGracePeriodSeconds: 30
              volumes:
              - emptyDir: {}
                name: pod-tmp
              - name: mariadb-secrets
                secret:
                  defaultMode: 292
                  secretName: mariadb-secrets
              - configMap:
                  defaultMode: 365
                  name: mariadb-bin
                name: mariadb-bin
      schedule: 0 1 * * *
      successfulJobsHistoryLimit: 3
      suspend: false
    
  3. If required, modify the default backup configuration. By default, the backup is set up as follows:

    • Runs on a daily basis at 01:00 AM

    • Creates incremental backups daily and full backups weekly

    • Keeps 10 latest full backups

    • Saves backups to the mariadb-phy-backup-data PVC

    • The backup timeout is 3600 seconds

    • The backup type is incremental

    As illustrated in the cron job example above, the mariadb_resque.py script launches backups of the MariaDB Galera cluster. The script accepts settings through parameters and environment variables.

    The following table describes the parameters that you can pass to the cron job and override from the OpenStackDeployment object.

    MariaDB backup: Configuration parameters

    Parameter

    Type

    Default

    Description

    --backup-type

    String

    incremental

    Type of a backup. The list of possible values includes:

    • incremental

      If the newest full backup is older than the value of the full_backup_cycle parameter, the system performs a full backup. Otherwise, the system performs an incremental backup of the newest full backup.

    • full

      Always performs only a full backup.

    Usage example:

    spec:
      features:
        database:
          backup:
            backup_type: incremental
    

    --backup-timeout

    Integer

    21600

    Timeout in seconds for the system to wait for the backup operation to succeed.

    Usage example:

    spec:
      services:
        database:
          mariadb:
            values:
              conf:
                phy_backup:
                  backup_timeout: 30000
    

    --allow-unsafe-backup

    Boolean

    false

    If set to true, enables the MariaDB cluster backup in a not fully operational cluster, where:

    • The current number of ready pods is not equal to MARIADB_REPLICAS.

    • Some replicas do not have healthy wsrep statuses.

    Usage example:

    spec:
      services:
        database:
          mariadb:
            values:
              conf:
                phy_backup:
                  allow_unsafe_backup: true
    

    The following table describes the environment variables that you can pass to the cron job and override from the OpenStackDeployment object.

    MariaDB backup: Environment variables

    Variable

    Type

    Default

    Description

    MARIADB_BACKUPS_TO_KEEP

    Integer

    10

    Number of full backups to keep.

    Usage example:

    spec:
      features:
        database:
          backup:
            backups_to_keep: 3
    

    MARIADB_BACKUP_PVC_NAME

    String

    mariadb-phy-backup-data

    Persistent volume claim used to store backups.

    Usage example:

    spec:
      services:
        database:
          mariadb:
            values:
              conf:
                phy_backup:
                  backup_pvc_name: mariadb-phy-backup-data
    

    MARIADB_FULL_BACKUP_CYCLE

    Integer

    604800

    Number of seconds that defines a period between 2 full backups. During this period, incremental backups are performed. The parameter is taken into account only if backup_type is set to incremental. Otherwise, it is ignored. For example, with full_backup_cycle set to 604800 seconds, a full backup is taken weekly and, if cron is set to 0 0 * * *, an incremental backup is performed on a daily basis.

    Usage example:

    spec:
      features:
        database:
          backup:
            full_backup_cycle: 70000
    

    MARIADB_BACKUP_REQUIRED_SPACE_RATIO

    Floating

    1.2

    Multiplier for the database size to predict the space required to create a backup, either full or incremental, and perform a restoration keeping the uncompressed backup files on the same file system as the compressed ones.

    To estimate the required MARIADB_BACKUP_REQUIRED_SPACE_RATIO, use the following formula: size of (1 uncompressed full backup + all related incremental uncompressed backups + 1 full compressed backup) in KB <= (DB_SIZE * MARIADB_BACKUP_REQUIRED_SPACE_RATIO) in KB.

    The DB_SIZE is the disk space allocated in the MySQL data directory, which is /var/lib/mysql, for databases data excluding galera.cache and ib_logfile* files. This parameter prevents the backup PVC from being full in the middle of the restoration and backup procedures. If the current available space is lower than DB_SIZE * MARIADB_BACKUP_REQUIRED_SPACE_RATIO, the backup script fails before the system starts the actual backup and the overall status of the backup job is failed.

    Usage example:

    spec:
      services:
        database:
          mariadb:
            values:
              conf:
                phy_backup:
                  backup_required_space_ratio: 1.4
    

    For example, to perform full backups monthly and incremental backups daily at 02:30 AM and keep the backups for the last six months, configure the database backup in your OpenStackDeployment object as follows:

    spec:
      features:
        database:
          backup:
            enabled: true
            backups_to_keep: 6
            schedule_time: '30 2 * * *'
            full_backup_cycle: 2628000
    
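    You can apply such a configuration, for example, by patching the OpenStackDeployment object. A minimal sketch, assuming the object is named osh-dev as in the examples elsewhere in this guide:

    kubectl -n openstack patch osdpl osh-dev --type merge \
      -p '{"spec":{"features":{"database":{"backup":{"enabled":true,"backups_to_keep":6,"schedule_time":"30 2 * * *","full_backup_cycle":2628000}}}}}'

    The merge patch updates only the listed keys. Alternatively, edit the object interactively with kubectl -n openstack edit osdpl osh-dev.
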
Restore MariaDB databases

During the restoration procedure, the MariaDB service will be unavailable because the MariaDB StatefulSet is scaled down to 0 replicas. Therefore, plan the maintenance window according to the database size. The speed of the restoration may depend on the following:

  • Network throughput

  • Storage performance where backups are kept (Ceph by default)

  • Performance of the local disks on the nodes where the MariaDB local volumes are present

To restore MariaDB databases:

  1. Obtain an image of the MariaDB container:

    kubectl -n openstack get pods mariadb-server-0 -o jsonpath='{.spec.containers[0].image}'
    
  2. Create the check_pod.yaml file to create the helper pod required to view the backup volume content:

    ---
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: check-backup-helper
      namespace: openstack
    ---
    apiVersion: v1
    kind: Pod
    metadata:
      name: check-backup-helper
      namespace: openstack
      labels:
        application: check-backup-helper
    spec:
      nodeSelector:
        openstack-control-plane: enabled
      containers:
        - name: helper
          securityContext:
            allowPrivilegeEscalation: false
            runAsUser: 0
            readOnlyRootFilesystem: true
          command:
            - sleep
            - infinity
          image: << image of mariadb container >>
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - name: pod-tmp
              mountPath: /tmp
            - mountPath: /var/backup
              name: mysql-backup
      restartPolicy: Never
      serviceAccount: check-backup-helper
      serviceAccountName: check-backup-helper
      volumes:
        - name: pod-tmp
          emptyDir: {}
        - name: mariadb-secrets
          secret:
            secretName: mariadb-secrets
            defaultMode: 0444
        - name: mariadb-bin
          configMap:
            name: mariadb-bin
            defaultMode: 0555
        - name: mysql-backup
          persistentVolumeClaim:
            claimName: mariadb-phy-backup-data
    
  3. Create the helper pod:

    kubectl -n openstack apply -f check_pod.yaml
    
  4. Obtain the name of the backup to restore:

    kubectl -n openstack exec -t check-backup-helper -- tree /var/backup
    

    Example of system response:

    /var/backup
    |-- base
    |   `-- 2020-09-09_11-35-48
    |       |-- backup.stream.gz
    |       |-- backup.successful
    |       |-- grastate.dat
    |       |-- xtrabackup_checkpoints
    |       `-- xtrabackup_info
    |-- incr
    |   `-- 2020-09-09_11-35-48
    |       |-- 2020-09-10_01-02-36
    |       |-- 2020-09-11_01-02-02
    |       |-- 2020-09-12_01-01-54
    |       |-- 2020-09-13_01-01-55
    |       `-- 2020-09-14_01-01-55
    `-- lost+found
    
    10 directories, 5 files
    

    If you want to restore the full backup, the name from the example above is 2020-09-09_11-35-48. To restore a specific incremental backup, the name from the example above is 2020-09-09_11-35-48/2020-09-12_01-01-54.

    In the example above, the backups will be restored in the following strict order:

    1. 2020-09-09_11-35-48 - full backup, path /var/backup/base/2020-09-09_11-35-48

    2. 2020-09-10_01-02-36 - incremental backup, path /var/backup/incr/2020-09-09_11-35-48/2020-09-10_01-02-36

    3. 2020-09-11_01-02-02 - incremental backup, path /var/backup/incr/2020-09-09_11-35-48/2020-09-11_01-02-02

    4. 2020-09-12_01-01-54 - incremental backup, path /var/backup/incr/2020-09-09_11-35-48/2020-09-12_01-01-54

  5. Delete the helper pod:

    kubectl -n openstack delete -f check_pod.yaml
    
  6. Pass the following parameters to the mariadb_resque.py script from the OsDpl object:

    Parameter

    Type

    Default

    Description

    --backup-name

    String

    Name of the folder with the backup, in the <BASE_BACKUP> or <BASE_BACKUP>/<INCREMENTAL_BACKUP> format.

    --replica-restore-timeout

    Integer

    3600

    Timeout in seconds for the data of one replica to be restored to the MySQL data directory. Also includes the time to spawn a rescue runner pod in Kubernetes and extract data from a backup archive.

  7. Edit the OpenStackDeployment object as follows:

    spec:
      services:
        database:
          mariadb:
            values:
              manifests:
                job_mariadb_phy_restore: true
              conf:
                phy_restore:
                  backup_name: "2020-09-09_11-35-48/2020-09-12_01-01-54"
                  replica_restore_timeout: 7200
    
  8. Wait until the mariadb-phy-restore job succeeds:

    kubectl -n openstack get jobs mariadb-phy-restore -o jsonpath='{.status}'
    
  9. The mariadb-phy-restore job is an immutable object. Therefore, remove the job after each execution. To correctly remove the job, clean up all the settings that you configured in the OpenStackDeployment object during step 7 of this procedure. This also removes all related pods.
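
    For example, you can wait for the job to complete and then remove the restoration settings interactively. A sketch, assuming the OpenStackDeployment object is named osh-dev; adjust the timeout to the database size:

    kubectl -n openstack wait --for=condition=complete job/mariadb-phy-restore --timeout=4h
    # Remove the manifests and phy_restore settings added during step 7
    kubectl -n openstack edit osdpl osh-dev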

Verify the operability of the MariaDB backup jobs

To verify the operability of the MariaDB backup jobs:

  1. Verify pods in the openstack namespace. After the backup jobs have succeeded, the pods stay in the Completed state:

    kubectl -n openstack get pods -l application=mariadb-phy-backup
    

    Example of a positive system response:

    NAME                                  READY   STATUS      RESTARTS   AGE
    mariadb-phy-backup-1599613200-n7jqv   0/1     Completed   0          43h
    mariadb-phy-backup-1599699600-d79nc   0/1     Completed   0          30h
    mariadb-phy-backup-1599786000-d5kc7   0/1     Completed   0          6h17m
    

    Note

    By default, the system keeps three latest successful and one latest failed pods.

  2. Obtain an image of the MariaDB container:

    kubectl -n openstack get pods mariadb-server-0 -o jsonpath='{.spec.containers[0].image}'
    
  3. Create the check_pod.yaml file to create the helper pod required to view the backup volume content.

    Configuration example:

    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: check-backup-helper
      namespace: openstack
    ---
    apiVersion: v1
    kind: Pod
    metadata:
      name: check-backup-helper
      namespace: openstack
      labels:
        application: check-backup-helper
    spec:
      nodeSelector:
        openstack-control-plane: enabled
      containers:
        - name: helper
          securityContext:
            allowPrivilegeEscalation: false
            runAsUser: 0
            readOnlyRootFilesystem: true
          command:
            - sleep
            - infinity
          image: << image of mariadb container >>
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - name: pod-tmp
              mountPath: /tmp
            - mountPath: /var/backup
              name: mysql-backup
      restartPolicy: Never
      serviceAccount: check-backup-helper
      serviceAccountName: check-backup-helper
      volumes:
        - name: pod-tmp
          emptyDir: {}
        - name: mariadb-secrets
          secret:
            secretName: mariadb-secrets
            defaultMode: 0444
        - name: mariadb-bin
          configMap:
            name: mariadb-bin
            defaultMode: 0555
        - name: mysql-backup
          persistentVolumeClaim:
            claimName: mariadb-phy-backup-data
    
  4. Apply the helper service account and pod resources:

    kubectl -n openstack apply -f check_pod.yaml
    kubectl -n openstack get pods -l application=check-backup-helper
    

    Example of a positive system response:

    NAME                  READY   STATUS    RESTARTS   AGE
    check-backup-helper   1/1     Running   0          27s
    
  5. Verify the directories structure within the /var/backup directory of the spawned pod:

    kubectl -n openstack exec -t check-backup-helper -- tree /var/backup
    

    Example of a system response:

    /var/backup
    |-- base
    |   `-- 2020-09-09_11-35-48
    |       |-- backup.stream.gz
    |       |-- backup.successful
    |       |-- grastate.dat
    |       |-- xtrabackup_checkpoints
    |       `-- xtrabackup_info
    |-- incr
    |   `-- 2020-09-09_11-35-48
    |       |-- 2020-09-10_01-02-36
    |       |   |-- backup.stream.gz
    |       |   |-- backup.successful
    |       |   |-- grastate.dat
    |       |   |-- xtrabackup_checkpoints
    |       |   `-- xtrabackup_info
    |       `-- 2020-09-11_01-02-02
    |           |-- backup.stream.gz
    |           |-- backup.successful
    |           |-- grastate.dat
    |           |-- xtrabackup_checkpoints
    |           `-- xtrabackup_info
    

    The base directory contains full backups. Each directory in the incr folder contains incremental backups related to a certain full backup in the base folder. All incremental backups always have the base backup name as parent folder.

  6. Delete the helper pod:

    kubectl -n openstack delete -f check_pod.yaml
    
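In addition to the checks above, you can list the backup jobs themselves to confirm that the recent runs completed successfully:

kubectl -n openstack get jobs -l application=mariadb-phy-backup
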
Modify automatic cleanup of OpenStack databases

Available since MOS 21.6

OpenStack services mark removed objects in the database as deleted but store information about them. By default, to keep the databases smaller and faster, stale records older than 30 days are automatically cleaned up using scripts that run as cron jobs. For details about the defaults, see features:database:cleanup.

If required, you can modify the cleanup schedule for the following services: Barbican, Cinder, Glance, Heat, Masakari, and Nova.

To modify the cleanup schedule of OpenStack databases, configure the features:database:cleanup settings in the OpenStackDeployment CR using the following structure example. Set the schedule parameter to the required cron expression.

spec:
  features:
    database:
      cleanup:
        <os-service>:
          enabled: true
          schedule: "1 0 * * 1"
          age: 30
          batch: 1000
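
For example, to make the Nova cleanup run weekly and keep records for 60 days, you can patch the OpenStackDeployment object. A minimal sketch, assuming the object is named osh-dev as in the examples elsewhere in this guide:

kubectl -n openstack patch osdpl osh-dev --type merge \
  -p '{"spec":{"features":{"database":{"cleanup":{"nova":{"enabled":true,"schedule":"0 3 * * 0","age":60}}}}}}'
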
Run Tempest tests

The OpenStack Integration Test Suite (Tempest) is a set of integration tests to be run against a live OpenStack cluster. This section instructs you on how to verify the workability of your OpenStack deployment using Tempest.

To verify an OpenStack deployment using Tempest:

  1. Configure the Tempest run parameters using the following structure in the OsDpl CR.

    Note

    To perform the smoke testing of your deployment, no additional configuration is required.

    For example, with the following configuration in the OsDpl CR, the system performs the full Tempest testing:

    spec:
      services:
        tempest:
          tempest:
            values:
              conf:
                script: |
                  tempest run --config-file /etc/tempest/tempest.conf --concurrency 4 --blacklist-file /etc/tempest/test-blacklist --full
    

    The following example structure from the OsDpl CR will set image:build_timeout to 600 in tempest.conf:

    spec:
      services:
        tempest:
          tempest:
            values:
              conf:
                tempest:
                  image:
                    build_timeout: 600
    
  2. Run Tempest. Tempest is deployed like other OpenStack services, in a dedicated openstack-tempest Helm release, by adding tempest to spec:features:services in the OsDpl resource.

    spec:
      features:
        services:
          - tempest
    
  3. Wait until Tempest is ready. The Tempest tests are launched by the openstack-tempest-run-tests job. To keep track of the tests execution, run:

    kubectl -n openstack logs -l application=tempest,component=run-tests
    
  4. Get the Tempest results. The Tempest results can be stored in a pvc-tempest PersistentVolumeClaim (PVC). To get them from a PVC, use:

    # Run pod and mount pvc to it
    cat <<EOF | kubectl apply -f -
    apiVersion: v1
    kind: Pod
    metadata:
      name: tempest-test-results-pod
      namespace: openstack
    spec:
      nodeSelector:
        openstack-control-plane: enabled
      volumes:
        - name: tempest-pvc-storage
          persistentVolumeClaim:
            claimName: pvc-tempest
      containers:
        - name: tempest-pvc-container
          image: ubuntu
          command: ['sh', '-c', 'sleep infinity']
          volumeMounts:
            - mountPath: "/var/lib/tempest/data"
              name: tempest-pvc-storage
    EOF
    
  5. If required, copy the results locally:

    kubectl -n openstack cp tempest-test-results-pod:/var/lib/tempest/data/report_file.xml .
    
  6. Remove the Tempest test results pod:

    kubectl -n openstack delete pod tempest-test-results-pod
    
  7. To rerun Tempest:

    1. Remove Tempest from the list of enabled services.

    2. Wait until Tempest jobs are removed.

    3. Add Tempest back to the list of the enabled services.
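
    For example, to confirm that the Tempest jobs and pods have been removed before re-adding the service:

    kubectl -n openstack get jobs,pods -l application=tempest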

Calculate a maintenance window duration

This section provides the background information on the approximate time spent on operations for pods of different purposes, possible data plane impact during these operations, and the possibility of a parallel pods update. Such data helps the cloud administrators to correctly estimate maintenance windows and impacts on the workloads for your OpenStack deployment.

Note

The approximate time cost to upgrade an OpenStack cloud with 50 compute nodes is 2 hours.

Note

During the MOS managed cluster update, numerous StackLight alerts may fire. This is an expected behavior. Ignore or temporarily mute them as described in Mirantis Container Cloud Operations Guide: Silence alerts.

Maintenance window calculation

Pod name

Pod description

Kubernetes kind

Readiness time

Data plane impact

Parallel update

[*]-api

Contains API services of OpenStack components. Horizontally well scalable.

Deployment

<30s

NO

YES (batches 10% of overall count)

[*]-conductor

Contains proxy service between OpenStack and database.

Deployment

<30s

NO

YES (batches 10% of overall count)

[*]-scheduler

Spreads OpenStack resources between nodes.

Deployment

<30s

NO

YES (batches 10% of overall count)

[*]-worker
[*]-engine
[*]-volume
[*]-backup
[*]

Process user requests.

Deployment

<30s

NO

YES (batches 10% of overall count)

nova-compute

Processes user requests, interacts with the data plane services.

DaemonSet

<120s

NO

YES (batches 10% of overall count)

neutron-l3-agent

Creates virtual routers (spawns keepalived processes for the HA routers).

DaemonSet

10-15m (for 100 routers)

YES

NO (one by one)

neutron-openvswitch-agent

Configures tunnels between nodes.

DaemonSet

<120s

NO

YES (batches 10% of overall count)

neutron-dhcp-agent

Configures the DHCP server for the networking service.

DaemonSet

<30s

Partially (only if the downtime exceeds the lease timeout)

YES (batches 10% of overall count)

neutron-metadata-agent

Provides metadata information to user workloads (VMs).

DaemonSet

<30s

NO

YES (batches 10% of overall count)

libvirt

Starts the libvirtd communication daemon.

DaemonSet

<30s

NO

YES (batches 10% of overall count)

openvswitch-[*]

Sets up the Open vSwitch datapaths and then operates the switching across each bridge.

DaemonSet

<30s

YES

NO (one by one)

mariadb-[*]

Contains persistent storage (database) for OpenStack deployment.

StatefulSet

<180s

NO

NO (one by one)

memcached-[*]

Contains the memory object caching system.

Deployment

<30s

NO

NO (one by one)

[*]-rabbitmq-[*]

Contains the messaging service for OpenStack.

StatefulSet

<30s

NO

NO (one by one)

Remove an OpenStack cluster

This section instructs you on how to remove an OpenStack cluster, deployed on top of Kubernetes, by deleting the openstackdeployments.lcm.mirantis.com (OsDpl) CR.

To remove an OpenStack cluster:

  1. Verify that the OsDpl object is present:

    kubectl get osdpl -n openstack
    
  2. Delete the OsDpl object:

    kubectl delete osdpl osh-dev -n openstack
    

    The deletion may take a certain amount of time.

  3. Verify that all pods and jobs have been deleted and no objects are present in the command output:

    kubectl get pods,jobs -n openstack
    
  4. Delete Persistent Volume Claims (PVCs) using the following snippet. Deletion of PVCs causes data deletion on Persistent Volumes. The volumes themselves will become available for further operations.

    Caution

    Before deleting PVCs, save valuable data in a safe place.

    #!/bin/bash
    # Collect the namespace, name, and volume of the redis, etcd, and mariadb PVCs
    # (the egrep filter already removes the header line)
    PVCS=$(kubectl get pvc --all-namespaces | egrep "redis|etcd|mariadb" | awk '{print $1" "$2" "$4}' | column -t)
    echo  "$PVCS" | while read line; do
    PVC_NAMESPACE=$(echo "$line" | awk '{print $1}')
    PVC_NAME=$(echo "$line" | awk '{print $2}')
    echo "Deleting PVC ${PVC_NAME}"
    kubectl delete pvc ${PVC_NAME} -n ${PVC_NAMESPACE}
    done
    

    Note

    Deletion of PVCs may get stuck if a resource that uses the PVC is still running. Once the resource is deleted, the PVC deletion process will proceed.

  5. Delete the MariaDB state ConfigMap:

    kubectl delete configmap openstack-mariadb-mariadb-state -n openstack
    
  6. Delete secrets using the following snippet:

    #!/bin/bash
    SECRETS=$(kubectl get secret  -n openstack | awk '{print $1}'| column -t | awk 'NR>1')
    echo  "$SECRETS" | while read line; do
    echo "Deleting Secret ${line}"
    kubectl delete secret ${line} -n openstack
    done
    
  7. Verify that OpenStack ConfigMaps and secrets have been deleted:

    kubectl get configmaps,secrets -n openstack
    
Remove an OpenStack service

This section instructs you on how to remove an OpenStack service deployed on top of Kubernetes. A service is typically removed by deleting a corresponding entry in the spec.features.services section of the openstackdeployments.lcm.mirantis.com (OsDpl) CR.

Caution

You cannot remove the default services built into the preset section.


Remove a service
  1. Verify that the spec.features.services section is present in the OsDpl object:

    kubectl -n openstack get osdpl osh-dev -o jsonpath='{.spec.features.services}'
    

    Example of system output:

    [instance-ha object-storage]
    
  2. Obtain the user name of the service database, which will be required during Clean up OpenStack database leftovers after the service removal to substitute <SERVICE-DB-USERNAME>:

    Note

    For example, the <SERVICE-NAME> for the instance-ha service type is masakari.

    kubectl -n osh-system exec -t <OPENSTACK-CONTROLLER-POD-NAME> -- helm3 -n openstack get values openstack-<SERVICE-NAME> -o json | jq -r .endpoints.oslo_db.auth.<SERVICE-NAME>.username
    
  3. Delete the service from the spec.features.services section of the OsDpl CR:

    kubectl -n openstack edit osdpl osh-dev
    

    The deletion may take a certain amount of time.

  4. Verify that all related objects have been deleted and no objects are present in the output of the following command:

    for i in $(kubectl api-resources --namespaced -o name | grep -v event); do kubectl -n openstack get $i 2>/dev/null | grep <SERVICE-NAME>; done
    
Clean up OpenStack API leftovers after the service removal
  1. Log in to the Keystone client pod shell:

    kubectl -n openstack exec -it <KEYSTONE-CLIENT-POD-NAME> -- bash
    
  2. Remove service endpoints from the Keystone catalog:

    for i in $(openstack endpoint list --service <SERVICE-NAME> -f value -c ID); do openstack endpoint delete $i; done
    
  3. Remove the service user from the Keystone catalog:

    openstack user list --project service | grep <SERVICE-NAME>
    openstack user delete <SERVICE-USER-ID>
    
  4. Remove the service from the catalog:

    openstack service list | grep <SERVICE-NAME>
    openstack service delete <SERVICE-ID>
    
Clean up OpenStack database leftovers after the service removal

Caution

The procedure below will permanently destroy the data of the removed service.

  1. Log in to the mariadb-server pod shell:

    kubectl -n openstack exec -it mariadb-server-0 -- bash
    
  2. Remove the service database user and its permissions:

    Note

    Use the user name of the service database obtained during the Remove a service procedure to substitute <SERVICE-DB-USERNAME>:

    mysql -u root -p${MYSQL_DBADMIN_PASSWORD} -e "REVOKE ALL PRIVILEGES, GRANT OPTION FROM '<SERVICE-DB-USERNAME>'@'%';"
    mysql -u root -p${MYSQL_DBADMIN_PASSWORD} -e "DROP USER '<SERVICE-DB-USERNAME>'@'%';"
    
  3. Remove the service database:

    mysql -u root -p${MYSQL_DBADMIN_PASSWORD} -e "DROP DATABASE <SERVICE-NAME>;"
    

OpenStack services configuration

The section covers OpenStack services post-deployment configuration and is intended for cloud operators who are responsible for providing working cloud infrastructure to the cloud end users.

Configure high availability with Masakari

The Instances High Availability Service, or Masakari, is an OpenStack project designed to ensure high availability of instances and compute processes running on hosts.

Before the end user can start enjoying the benefits of Masakari, the cloud operator has to configure the service properly. This section includes instructions on how to create segments and hosts through the Masakari API and provides a list of additional settings that can be useful in certain use cases.

Group compute nodes into segments

The segment object is a logical grouping of compute nodes into zones, also known as availability zones. The segment object enables the cloud operator to list, create, show details for, update, and delete segments.

To create a segment named allcomputes with service_type = compute, and recovery_method = auto, run:

openstack segment create allcomputes auto compute

Example of a positive system response:

+-----------------+--------------------------------------+
| Field           | Value                                |
+-----------------+--------------------------------------+
| created_at      | 2021-07-06T07:34:23.000000           |
| updated_at      | None                                 |
| uuid            | b8b0d7ca-1088-49db-a1e2-be004522f3d1 |
| name            | allcomputes                          |
| description     | None                                 |
| id              | 2                                    |
| service_type    | compute                              |
| recovery_method | auto                                 |
+-----------------+--------------------------------------+
Create hosts under segments

The host object represents compute service hypervisors. A host belongs to a segment. The host can be any kind of virtual machine that has compute service running on it. The host object enables the operator to list, create, show details for, update, and delete hosts.

To create a host under a given segment:

  1. Obtain the hypervisor hostname:

    openstack hypervisor list
    

    Example of a positive system response:

    +----+-------------------------------------------------------+-----------------+------------+-------+
    | ID | Hypervisor Hostname                                   | Hypervisor Type | Host IP    | State |
    +----+-------------------------------------------------------+-----------------+------------+-------+
    |  2 | vs-ps-vyvsrkrdpusv-1-w2mtagbeyhel-server-cgpejthzbztt | QEMU            | 10.10.0.39 | up    |
    |  5 | vs-ps-vyvsrkrdpusv-0-ukqbpy2pkcuq-server-s4u2thvgxdfi | QEMU            | 10.10.0.14 | up    |
    +----+-------------------------------------------------------+-----------------+------------+-------+
    
  2. Create the host under the previously created segment. For example, with uuid = b8b0d7ca-1088-49db-a1e2-be004522f3d1:

    Caution

    The segment under which you create a host must exist.

    openstack segment host create \
        vs-ps-vyvsrkrdpusv-1-w2mtagbeyhel-server-cgpejthzbztt \
        compute \
        SSH \
        b8b0d7ca-1088-49db-a1e2-be004522f3d1
    

    Positive system response:

    +---------------------+-------------------------------------------------------+
    | Field               | Value                                                 |
    +---------------------+-------------------------------------------------------+
    | created_at          | 2021-07-06T07:37:26.000000                            |
    | updated_at          | None                                                  |
    | uuid                | 6f1bd5aa-0c21-446a-b6dd-c1b4d09759be                  |
    | name                | vs-ps-vyvsrkrdpusv-1-w2mtagbeyhel-server-cgpejthzbztt |
    | type                | compute                                               |
    | control_attributes  | SSH                                                   |
    | reserved            | False                                                 |
    | on_maintenance      | False                                                 |
    | failover_segment_id | b8b0d7ca-1088-49db-a1e2-be004522f3d1                  |
    +---------------------+-------------------------------------------------------+
    
Enable notifications

The alerting API is used by Masakari monitors to notify about a failure of either a host, process, or instance. The notification object enables the operator to list, create, and show details of notifications.
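
Notifications are normally created by the Masakari monitors. To inspect them, or to report a failure manually for testing, you can use the OpenStack CLI. A sketch, assuming the COMPUTE_HOST notification type and the payload format from the upstream Masakari documentation:

openstack notification list
openstack notification create COMPUTE_HOST <hostname> "$(date -u +%Y-%m-%dT%H:%M:%S)" \
    '{"event": "STOPPED", "host_status": "NORMAL", "cluster_status": "OFFLINE"}'
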

Useful tunings

The list of useful tunings for the Masakari service includes:

  • [host_failure]\evacuate_all_instances

    Enables the operator to decide whether to evacuate all instances or only the instances that have [host_failure]\ha_enabled_instance_metadata_key set to True.

  • [host_failure]\ha_enabled_instance_metadata_key

    Enables the operator to decide on the instance metadata key naming that affects the per instance behavior of [host_failure]\evacuate_all_instances. The default is the same for both failure types, which include host and instance, but the value can be overridden to make the metadata key different per failure type.

  • [host_failure]\ignore_instances_in_error_state

    Enables the operator to decide whether error instances should be allowed for evacuation from a failed source compute node or not. If set to True, it will ignore error instances from evacuation from a failed source compute node. Otherwise, it will evacuate error instances along with other instances from a failed source compute node.

  • [instance_failure]\process_all_instances

    Enables the operator to decide whether all instances or only the ones that have [instance_failure]\ha_enabled_instance_metadata_key set to True should be recovered from instance failure events. If set to True, it will execute instance failure recovery actions for an instance irrespective of whether that particular instance has [instance_failure]\ha_enabled_instance_metadata_key set to True or not. Otherwise, it will only execute instance failure recovery actions for an instance which has [instance_failure]\ha_enabled_instance_metadata_key set to True.

  • [instance_failure]\ha_enabled_instance_metadata_key

    Enables the operators to decide on the instance metadata key naming that affects the per-instance behavior of [instance_failure]\process_all_instances. The default is the same for both failure types, which include host and instance, but you can override the value to make the metadata key different per failure type.
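
These options can be set through the OpenStackDeployment object. A minimal sketch that enables evacuation of all instances on host failure; the exact values path is an assumption based on the spec.services override pattern used elsewhere in this guide:

spec:
  services:
    instance-ha:
      masakari:
        values:
          conf:
            masakari:
              host_failure:
                evacuate_all_instances: true
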

Ceph operations

To manage a running Ceph cluster, for example, to add or remove a Ceph OSD, remove a Ceph node, replace a failed physical disk with a Ceph node, or update your Ceph cluster, refer to Mirantis Container Cloud Operations Guide: Manage Ceph.

Before you proceed with any reading or writing operation, first check the cluster status using the ceph tool as described in Mirantis Container Cloud Operations Guide: Verify the Ceph core services.

This section describes the OpenStack-related Ceph operations.

Configure Ceph RGW TLS

Once you enable Ceph RGW as described in Mirantis Container Cloud: Enable Ceph RGW Object Storage, you can configure the Transport Layer Security (TLS) protocol for a Ceph RGW public endpoint using the following options:

  • Using MOS TLS, if it is enabled and exposes its certificates and domain for Ceph. In this case, Ceph RGW will automatically create an ingress rule with MOS certificates and domain to access the Ceph RGW public endpoint. Therefore, you only need to reach the Ceph RGW public and internal endpoints and set the CA certificates for a trusted TLS connection.

  • Using custom ingress specified in the KaaSCephCluster CR. In this case, Ceph RGW public endpoint will use the public domain specified using the ingress parameters.

Caution

  • Starting from MOS 21.3, external Ceph RGW service is not supported and will be deleted during update. If your system already uses endpoints of an external RGW service, reconfigure them to the ingress endpoints.

  • When using a custom or OpenStack ingress, configure the DNS name for RGW to point to the external IP address of that ingress. If you do not have an OpenStack or custom ingress, point the DNS to the external load balancer of RGW.

To configure Ceph RGW TLS:

  1. Verify whether MOS TLS is enabled. The spec.features.ssl.public_endpoints section should be specified in the OpenStackDeployment CR.

  2. To generate an SSL certificate for internal usage, verify that the gateway securePort parameter is specified in the KaaSCephCluster CR. For details, see Mirantis Container Cloud: Enable Ceph RGW Object Storage.

  3. Select from the following options:

    • If MOS TLS is enabled, obtain the MOS CA certificate for a trusted connection:

      kubectl -n openstack-ceph-shared get secret openstack-rgw-creds -o jsonpath="{.data.ca_cert}" | base64 -d
      
    • Configure Ceph RGW TLS using a custom ingress:

      Warning

      Starting from MOS 21.2, the rgw section is deprecated and the ingress parameters are moved under cephClusterSpec.ingress. If you continue using rgw.ingress, it will be automatically translated into cephClusterSpec.ingress during the MOS managed cluster release update.

      1. Open the KaaSCephCluster CR for editing.

      2. Specify the ingress parameters:

        • publicDomain - domain name to use for the external service.

        • cacert - Certificate Authority (CA) certificate, used for the ingress rule TLS support.

        • tlsCert - TLS certificate, used for the ingress rule TLS support.

        • tlsKey - TLS private key, used for the ingress rule TLS support.

        • customIngress - Optional. Available since MOS 21.3. Includes the following custom Ingress Controller parameters:

          • className - the custom Ingress Controller class name. If not specified, the openstack-ingress-nginx class name is used by default.

          • annotations - extra annotations for the ingress proxy. For details, see NGINX Ingress Controller: Annotations. By default, the following annotations are set:

            • nginx.ingress.kubernetes.io/rewrite-target is set to /

            • nginx.ingress.kubernetes.io/upstream-vhost is set to <rgwName>.rook-ceph.svc.

              The value for <rgwName> is spec.cephClusterSpec.objectStorage.rgw.name.

          For example:

          customIngress:
            className: openstack-ingress-nginx
            annotations:
              nginx.ingress.kubernetes.io/rewrite-target: /
              nginx.ingress.kubernetes.io/upstream-vhost: openstack-store.rook-ceph.svc
          

          Note

          Starting from MOS 21.3, an ingress rule is by default created with an internal Ceph RGW service endpoint as a back end. Also, rgw dns name is specified in the Ceph configuration and is set to <rgwName>.rook-ceph.svc by default. You can override this option using the spec.cephClusterSpec.rookConfig key-value parameter. In this case, also change the corresponding ingress annotation.

        For example:

        spec:
          cephClusterSpec:
            objectStorage:
              rgw:
                name: rgw-store
            ingress:
              publicDomain: public.domain.name
              cacert: |
                -----BEGIN CERTIFICATE-----
                ...
                -----END CERTIFICATE-----
              tlsCert: |
                -----BEGIN CERTIFICATE-----
                ...
                -----END CERTIFICATE-----
              tlsKey: |
                -----BEGIN RSA PRIVATE KEY-----
                ...
                -----END RSA PRIVATE KEY-----
              customIngress:
                annotations:
                  "nginx.ingress.kubernetes.io/upstream-vhost": rgw-store.public.domain.name
            rookConfig:
              "rgw dns name": rgw-store.public.domain.name
        

        Warning

        • For clouds with the publicDomain parameter specified, align the upstream-vhost ingress annotation with the name of the Ceph Object Storage and the specified public domain.

        • Ceph Object Storage requires the upstream-vhost and rgw dns name parameters to be equal. Therefore, override the default rgw dns name to the corresponding ingress annotation value.

  4. To access internal and public Ceph RGW endpoints:

    1. Obtain the Ceph RGW public endpoint:

      kubectl -n rook-ceph get ingress
      
    2. To use the Ceph RGW internal endpoint with TLS, configure trusted connection for the required CA certificate:

      kubectl -n rook-ceph get secret <rgwCacertSecretName> -o jsonpath="{.data.cacert}" | base64 -d
      

      Substitute <rgwCacertSecretName> with the following value:

      • Starting from MOS 21.2, rgw-ssl-certificate

      • Prior to MOS 21.2, rgw-ssl-local-certificate

    3. Obtain the internal endpoint name for Ceph RGW:

      kubectl -n rook-ceph get svc -l app=rook-ceph-rgw
      

      The internal endpoint for Ceph RGW has the https://<internal-svc-name>.rook-ceph.svc:<rgw-secure-port>/ format, where <rgw-secure-port> is spec.rgw.gateway.securePort specified in the KaaSCephCluster CR.
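
      For example, to probe the internal endpoint with a trusted connection, you can save the CA certificate obtained above and pass it to curl. A sketch, assuming MOS 21.2 or later for the secret name:

      kubectl -n rook-ceph get secret rgw-ssl-certificate -o jsonpath="{.data.cacert}" | base64 -d > rgw-ca.crt
      curl --cacert rgw-ca.crt https://<internal-svc-name>.rook-ceph.svc:<rgw-secure-port>/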

Ceph default configuration options

Available since MOS 21.3

Ceph controller provides the capability to specify configuration options for the Ceph cluster through the spec.cephClusterSpec.rookConfig key-value parameter of the KaaSCephCluster resource as if they were set in a usual ceph.conf file. For details, see Mirantis Container Cloud Operations Guide: Ceph advanced configuration.

However, if rookConfig is empty but the spec.cephClusterSpec.objectStorage.rgw section is defined, Ceph controller specifies the following OpenStack-related default configuration options for each Ceph cluster. For other default options, see Mirantis Container Cloud Operations Guide: Ceph default configuration options.

  • RADOS Gateway options, which you can override using the rookConfig parameter:

    rgw swift account in url = true
    rgw keystone accepted roles = '_member_, Member, member, swiftoperator'
    rgw keystone accepted admin roles = admin
    rgw keystone implicit tenants = true
    rgw swift versioning enabled = true
    rgw enforce swift acls = true
    rgw_max_attr_name_len = 64
    rgw_max_attrs_num_in_req = 32
    rgw_max_attr_size = 1024
    rgw_bucket_quota_ttl = 0
    rgw_user_quota_bucket_sync_interval = 0
    rgw_user_quota_sync_interval = 0
    rgw s3 auth use keystone = true # Available since MOS 21.4
    
  • Additional parameters for the Keystone integration:

    Warning

    All values with the keystone prefix are programmatically specified for each MOS deployment. Do not modify these parameters manually.

    rgw keystone api version = 3
    rgw keystone url = <keystoneAuthURL>
    rgw keystone admin user = <keystoneUser>
    rgw keystone admin password = <keystonePassword>
    rgw keystone admin domain = <keystoneProjectDomain>
    rgw keystone admin project = <keystoneProjectName>
    
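For example, to override one of the RADOS Gateway defaults listed above through rookConfig in the KaaSCephCluster resource:

spec:
  cephClusterSpec:
    rookConfig:
      "rgw swift versioning enabled": "false"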

StackLight operations

The section covers the StackLight management aspects.

View Grafana dashboards

Using the Grafana web UI, you can view the visual representation of the metric graphs based on the time series databases.

This section describes only the OpenStack-related Grafana dashboards. For other dashboards, including the system, Kubernetes, Ceph, and StackLight dashboards, see Mirantis Container Cloud Operations Guide: View Grafana dashboards.

Note

  • Starting from MOS 21.1, some Grafana dashboards include a View logs in Kibana link to immediately view relevant logs in the Kibana web UI.

  • Starting from MOS 21.5, Grafana dashboards that present node data have an additional Node identifier drop-down menu. By default, it is set to machine to display short names for Kubernetes nodes. To display Kubernetes node name labels, change this option to node.

To view the Grafana dashboards:

  1. Log in to the Grafana web UI as described in Mirantis Container Cloud Operations Guide: Access StackLight web UIs.

  2. From the drop-down list, select the required dashboard to inspect the status and statistics of the corresponding service deployed in Mirantis OpenStack on Kubernetes:

    Dashboard

    Description

    OpenStack - Overview

    Provides general information on OpenStack services resources consumption, API errors, deployed OpenStack compute nodes and block storage usage.

    KPI - Provisioning

    Provides provisioning statistics for OpenStack compute instances, including graphs on VM creation results by day.

    Cinder

    Provides graphs on the OpenStack Block Storage service health, HTTP API availability, pool capacity and utilization, number of created volumes and snapshots.

    Glance

    Provides graphs on the OpenStack Image service health, HTTP API availability, number of created images and snapshots.

    Gnocchi

    Provides panels and graphs on the Gnocchi health and HTTP API availability.

    Heat

    Provides graphs on the OpenStack Orchestration service health, HTTP API availability and usage.

    Ironic

    Provides graphs on the OpenStack Bare Metal Provisioning service health, HTTP API availability, provisioned nodes by state and installed ironic-conductor back-end drivers.

    Keystone

    Provides graphs on the OpenStack Identity service health, HTTP API availability, number of tenants and users by state.

    Neutron

    Provides graphs on the OpenStack networking service health, HTTP API availability, agents status and usage of Neutron L2 and L3 resources.

    NGINX Ingress controller

    Monitors the number of requests, response times and statuses, as well as the number of Ingress SSL certificates including expiration time and resources usage.

    Nova - Availability Zones

    Provides detailed graphs on the OpenStack availability zones and hypervisor usage.

    Nova - Hypervisor Overview

    Provides a set of single-stat panels presenting resources usage by host.

    Nova - Instances

    Provides graphs on libvirt Prometheus exporter health and resources usage. Monitors the number of running instances and tasks and allows sorting the metrics by top instances.

    Nova - Overview

    Provides graphs on the OpenStack compute services (nova-scheduler, nova-conductor, and nova-compute) health, as well as HTTP API availability.

    Nova - Tenants

    Provides graphs on CPU, RAM, disk throughput, IOPS, and space usage and allocation and allows sorting the metrics by top tenants.

    Nova - Users

    Provides graphs on CPU, RAM, disk throughput, IOPS, and space usage and allocation and allows sorting the metrics by top users.

    Nova - Utilization

    Provides detailed graphs on Nova hypervisor resources capacity and consumption.

    Memcached

    Memcached Prometheus exporter dashboard. Monitors Kubernetes Memcached pods and displays memory usage, hit rate, evicts and reclaims rate, items in cache, network statistics, and commands rate.

    MySQL

    MySQL Prometheus exporter dashboard. Monitors Kubernetes MySQL pods, resources usage and provides details on current connections and database performance.

    RabbitMQ

    RabbitMQ Prometheus exporter dashboard. Monitors Kubernetes RabbitMQ pods, resources usage and provides details on cluster utilization and performance.

    Cassandra Available since MOS 21.1

    Provides graphs on Cassandra clusters’ health, ongoing operations, and resource consumption.

    Kafka Clusters Available since MOS 21.1

    Provides graphs on Kafka clusters’ and broker health, as well as broker and topic usage.

    Redis Clusters Available since MOS 21.1

    Provides graphs on Redis clusters’ and pods’ health, connections, command calls, and resource consumption.

    Tungsten Fabric Controller Available since MOS 21.1

    Provides graphs on the overall Tungsten Fabric controller cluster processes and usage.

    Tungsten Fabric vRouter Available since MOS 21.1

    Provides graphs on the overall Tungsten Fabric vRouter cluster processes and usage.

    ZooKeeper Clusters Available since MOS 21.1

    Provides graphs on ZooKeeper clusters’ quorum health and resource consumption.

View Kibana dashboards

Kibana is part of the StackLight logging stack. Using the Kibana web UI, you can view the visual representation of your OpenStack deployment notifications.

The Notifications dashboard provides visualizations on the number of notifications over time per source and severity, host, and breakdowns. The dashboard includes search.

For other dashboards, including the logs and Kubernetes events dashboards, see Mirantis Container Cloud Operations Guide: View Kibana dashboards.

Note

By default, the StackLight logging stack, including Kibana, is disabled. For details, see Mirantis Container Cloud Reference Architecture: Deployment architecture.

To view the Kibana dashboards:

  1. Log in to the Kibana web UI as described in Mirantis Container Cloud Operations Guide: Access StackLight web UIs.

  2. Click the required dashboard to inspect the visualizations or perform a search.

StackLight alerts

This section provides an overview of the available predefined OpenStack-related StackLight alerts. To view the alerts, use the Prometheus web UI. To view the firing alerts, use the Alertmanager or Alerta web UI.

For other alerts, including the node, Kubernetes, Ceph, and StackLight alerts, see Mirantis Container Cloud Operations Guide: Available StackLight alerts.

Core services

This section describes the alerts available for the core services.

Libvirt

This section lists the alerts for the libvirt service.


LibvirtDown

Severity

Critical

Summary

Failure to gather libvirt metrics.

Description

The libvirt metric exporter fails to gather metrics on the {{ $labels.node }} node for 2 minutes.


LibvirtExporterTargetDown

Available since MOS 21.6

Severity

Major

Summary

Libvirt exporter Prometheus target is down.

Description

Prometheus fails to scrape metrics from the libvirt exporter endpoint on the {{ $labels.node }} node (more than 1/10 failed scrapes).


LibvirtExporterTargetsOutage

Available since MOS 21.6

Severity

Critical

Summary

Libvirt exporter Prometheus targets outage.

Description

Prometheus fails to scrape metrics from all libvirt exporter endpoints (more than 1/10 failed scrapes).

MariaDB

This section lists the alerts for the MariaDB service.


MariadbGaleraDonorFallingBehind

Severity

Warning

Summary

MariaDB cluster donor node is falling behind.

Description

The MariaDB cluster node is falling behind (the queue size is {{ $value }}).


MariadbGaleraNotReady

Severity

Major

Summary

MariaDB cluster is not ready.

Description

The MariaDB cluster is not ready to accept queries.


MariadbGaleraOutOfSync

Severity

Minor

Summary

MariaDB cluster node is out of sync.

Description

The MariaDB cluster node is not in sync ({{ $value }} != 4).


MariadbInnodbLogWaits

Severity

Warning

Summary

MariaDB InnoDB log writes are stalling.

Description

The MariaDB InnoDB logs are waiting for the disk at a rate of {{ $value }} per second.


MariadbInnodbReplicationFallenBehind

Severity

Warning

Summary

MariaDB InnoDB replication is lagging.

Description

The MariaDB InnoDB replication has fallen behind and is not recovering.


MariadbTableLockWaitHigh

Severity

Minor

Summary

MariaDB table lock waits are high.

Description

MariaDB has {{ $value }}% of table lock waits.

Memcached

This section lists the alerts for the Memcached service.


MemcachedServiceDown

Severity

Minor

Summary

Memcached service is down.

Description

The Memcached database cluster {{ $labels.cluster }} in the {{ $labels.namespace }} namespace is down.


MemcachedConnectionsNoneMinor

Severity

Minor

Summary

Memcached has no open connections.

Description

The Memcached database cluster {{ $labels.cluster }} in the {{ $labels.namespace }} namespace has no open connections.


MemcachedConnectionsNoneMajor

Severity

Major

Summary

Memcached has no open connections on all nodes.

Description

The Memcached database cluster {{ $labels.cluster }} in the {{ $labels.namespace }} namespace has no open connections on all nodes.


MemcachedEvictionsLimit

Severity

Warning

Summary

10 Memcached evictions.

Description

An average of {{ $value }} evictions occurred in the Memcached database cluster {{ $labels.cluster }} in the {{ $labels.namespace }} namespace during the last minute.

SSL certificates

This section describes the alerts for the OpenStack SSL certificates.


OpenstackSSLCertExpirationMajor

Severity

Major

Summary

SSL certificate for an OpenStack service expires in 10 days.

Description

The SSL certificate for the OpenStack {{ $labels.namespace }}/{{ $labels.service_name }} service endpoints expires in less than 10 days.


OpenstackSSLCertExpirationWarning

Severity

Warning

Summary

SSL certificate for an OpenStack service expires in 30 days.

Description

The SSL certificate for the OpenStack {{ $labels.namespace }}/{{ $labels.service_name }} service endpoints expires in less than 30 days.


OpenstackSSLProbesFailing

Severity

Critical

Summary

SSL certificate probes for an OpenStack service are failing.

Description

The SSL certificate probes for the OpenStack {{ $labels.namespace }}/{{ $labels.service_name }} service endpoints are failing.


OpenstackSSLProbesTargetOutage

Available since MOS 21.6

Severity

Critical

Summary

OpenStack {{ $labels.service_name }} SSL ingress target outage.

Description

Prometheus fails to probe the OpenStack {{ $labels.service_name }} service SSL ingress target (more than 1/10 failed scrapes).

RabbitMQ

This section lists the alerts for the RabbitMQ service.


RabbitMQNetworkPartitionsDetected

Severity

Warning

Summary

RabbitMQ network partitions detected.

Description

The {{ $labels.node }} server of the {{ $labels.cluster }} RabbitMQ cluster in the {{ $labels.namespace }} Namespace detects {{ $value }} network partitions.


RabbitMQDown

Severity

Critical

Summary

RabbitMQ is down.

Description

The {{ $labels.cluster }} RabbitMQ cluster in the {{ $labels.namespace }} Namespace is down for the last 2 minutes.


RabbitMQExporterTargetDown

Available since MOS 21.6

Severity

Major

Summary

{{ $labels.service_name }} RabbitMQ exporter Prometheus target is down.

Description

Prometheus fails to scrape metrics from the {{ $labels.service_name }} RabbitMQ exporter endpoint (more than 1/10 failed scrapes).


RabbitMQOperatorTargetDown

Available since MOS 21.6

Severity

Major

Summary

RabbitMQ operator Prometheus target is down.

Description

Prometheus fails to scrape metrics from the RabbitMQ operator endpoint (more than 1/10 failed scrapes).


RabbitMQFileDescriptorUsageWarning

Severity

Warning

Summary

RabbitMQ file descriptors usage is high for the last 10 minutes.

Description

The {{ $labels.node }} server of the {{ $labels.cluster }} RabbitMQ cluster in the {{ $labels.namespace }} Namespace has high file descriptor usage of {{ $value }}%.


RabbitMQNodeDiskFreeAlarm

Severity

Warning

Summary

RabbitMQ disk space usage is high.

Description

The {{ $labels.node }} server of the {{ $labels.cluster }} RabbitMQ cluster in the {{ $labels.namespace }} Namespace has low free disk space available.


RabbitMQNodeMemoryAlarm

Severity

Minor

Summary

RabbitMQ memory usage is high.

Description

The {{ $labels.node }} server of the {{ $labels.cluster }} RabbitMQ cluster in the {{ $labels.namespace }} Namespace has low free memory.

OpenStack

This section describes the alerts available for the OpenStack services.

OpenStack services API

This section describes the alerts for the OpenStack services API.


OpenstackIngressControllerTargetsOutage

Available since MOS 21.6

Severity

Critical

Summary

OpenStack ingress controller Prometheus targets outage.

Description

Prometheus fails to scrape metrics from all OpenStack ingress controller endpoints (more than 1/10 failed scrapes).


OpenstackAPI401Critical

Available since MOS 21.3

Severity

Critical

Summary

OpenStack API responds with HTTP 401.

Description

The OpenStack API {{ $labels.component }} responds with HTTP 401 for more than 5% of requests for the last 10 minutes.


OpenstackAPI5xxCritical

Available since MOS 21.3

Severity

Critical

Summary

OpenStack API responds with HTTP 5xx.

Description

The OpenStack API {{ $labels.component }} responds with HTTP 5xx for more than 1% of requests for the last 10 minutes.


OpenstackPublicAPI401Critical

Available since MOS 21.3

Severity

Critical

Summary

OpenStack public API responds with HTTP 401.

Description

The OpenStack {{ $labels.ingress }} public ingress responds with HTTP 401 for more than 5% of requests for the last 10 minutes.


OpenstackPublicAPI5xxCritical

Available since MOS 21.3

Severity

Critical

Summary

OpenStack Public API responds with HTTP 5xx.

Description

The OpenStack {{ $labels.ingress }} public ingress responds with HTTP 5xx for more than 1% of requests for the last 10 minutes.


OpenstackServiceInternalApiOutage

Available since MOS 21.5

Severity

Critical

Summary

OpenStack {{ $labels.service_name }} internal API outage.

Description

The OpenStack {{ $labels.service_name }} internal API is not accessible.


OpenstackServicePublicApiOutage

Available since MOS 21.5

Severity

Critical

Summary

OpenStack {{ $labels.service_name }} public API outage.

Description

The OpenStack {{ $labels.service_name }} public API is not accessible.

Cinder

This section lists the alerts for Cinder.


CinderServiceDisabled

Available since MOS 21.4

Severity

Critical

Summary

{{ $labels.binary }} service is disabled.

Description

The {{ $labels.binary }} service is disabled on all hosts.

CinderServiceDown

Severity

Minor

Summary

{{ $labels.binary }} service is down.

Description

The {{ $labels.binary }} service is in the down state on {{ $value }} host(s) where it is enabled.


CinderServiceOutage

Severity

Critical

Summary

{{ $labels.binary }} service outage.

Description

The {{ $labels.binary }} service is down on all hosts where it is enabled.

Ironic

This section lists the alerts for Ironic.


IronicDriverMissing

Severity

Major

Summary

ironic-conductor {{ $labels.driver }} back-end driver missing.

Description

The {{ $labels.driver }} back-end driver of the ironic-conductor container is missing on {{ $value }} node(s).

Neutron

This section lists the alerts for Neutron.


NeutronAgentDisabled

Available since MOS 21.4

Severity

Critical

Summary

{{ $labels.binary }} agent is disabled.

Description

The {{ $labels.binary }} agent is disabled on all hosts.


NeutronAgentDown

Severity

Minor

Summary

{{ $labels.binary }} agent is down.

Description

The {{ $labels.binary }} agent is in the down state on {{ $value }} host(s) where it is enabled.


NeutronAgentOutage

Available since MOS 21.4

Severity

Critical

Summary

{{ $labels.binary }} agent outage.

Description

The {{ $labels.binary }} agent is down on all hosts where it is enabled.

Nova

This section lists the alerts for Nova.


NovaServiceDisabled

Available since MOS 21.4

Severity

Critical

Summary

{{ $labels.binary }} service is disabled.

Description

The {{ $labels.binary }} service is disabled on all hosts.


NovaServiceDown

Severity

Minor

Summary

{{ $labels.binary }} service is down.

Description

The {{ $labels.binary }} service is in the down state on {{ $value }} host(s) where it is enabled.


NovaServiceOutage

Severity

Critical

Summary

{{ $labels.binary }} service outage.

Description

The {{ $labels.binary }} service is down on all hosts where it is enabled.

Tungsten Fabric

Available since MOS 21.1

This section describes the alerts available for the Tungsten Fabric services.

Cassandra

Available since MOS 21.1

This section lists the alerts for Cassandra.


CassandraAuthFailures

Severity

Warning

Summary

Cassandra authentication failures.

Description

The {{ $labels.namespace }}/{{ $labels.pod }} Cassandra Pod in the {{ $labels.cassandra_cluster }} cluster reports an increased number of authentication failures.


CassandraCacheHitRateTooLow

Severity

Major

Summary

Cassandra cache hit rate is too low.

Description

The average hit rate for the {{ $labels.cache }} cache in the {{ $labels.namespace }}/{{ $labels.pod }} Cassandra Pod in the {{ $labels.cassandra_cluster }} cluster is below 85%.


CassandraClientRequestFailure

Severity

Major

Summary

Cassandra client {{ $labels.operation }} request failure.

Description

The {{ $labels.namespace }}/{{ $labels.pod }} Cassandra Pod in the {{ $labels.cassandra_cluster }} cluster reports an increased number of {{ $labels.operation }} operation failures. A failure is a non-timeout exception.


CassandraClientRequestUnavailable

Severity

Critical

Summary

Cassandra client {{ $labels.operation }} request is unavailable.

Description

The {{ $labels.namespace }}/{{ $labels.pod }} Cassandra Pod in the {{ $labels.cassandra_cluster }} cluster reports an increased number of {{ $labels.operation }} operations ending with UnavailableException. There are not enough replicas alive to perform the {{ $labels.operation }} query with the requested consistency level.


CassandraClusterTargetsOutage

Available since MOS 21.6

Severity

Critical

Summary

Cassandra cluster Prometheus targets outage.

Description

Prometheus fails to scrape metrics from 2/3 of the {{ $labels.cluster }} cluster endpoints (more than 1/10 failed scrapes).


CassandraCommitlogTasksPending

Severity

Warning

Summary

Cassandra commitlog has too many pending tasks.

Description

The commitlog in the {{ $labels.namespace }}/{{ $labels.pod }} Cassandra Pod in the {{ $labels.cassandra_cluster }} cluster reached 15 pending tasks.


CassandraCompactionExecutorTasksBlocked

Severity

Warning

Summary

Cassandra compaction executor tasks are blocked.

Description

The {{ $labels.namespace }}/{{ $labels.pod }} Cassandra Pod in the {{ $labels.cassandra_cluster }} cluster reports that {{ $value }} compaction executor tasks are blocked.


CassandraCompactionTasksPending

Severity

Warning

Summary

Cassandra has too many pending compactions.

Description

The pending compaction tasks in the {{ $labels.namespace }}/{{ $labels.pod }} Cassandra Pod in the {{ $labels.cassandra_cluster }} cluster reached the threshold of 100 on average as measured over 30 minutes. This may occur due to insufficient cluster I/O capacity.


CassandraConnectionTimeouts

Severity

Critical

Summary

Cassandra connection timeouts.

Description

The {{ $labels.namespace }}/{{ $labels.pod }} Cassandra Pod in the {{ $labels.cassandra_cluster }} cluster reports an increased number of connection timeouts between nodes.


CassandraFlushWriterTasksBlocked

Severity

Warning

Summary

Cassandra flush writer tasks are blocked.

Description

The {{ $labels.namespace }}/{{ $labels.pod }} Cassandra Pod in the {{ $labels.cassandra_cluster }} cluster reports that {{ $value }} flush writer tasks are blocked.


CassandraHintsTooMany

Severity

Major

Summary

Cassandra has too many hints.

Description

The {{ $labels.namespace }}/{{ $labels.pod }} Cassandra Pod in the {{ $labels.cassandra_cluster }} cluster reports an increased number of hints. Replica nodes are not available to accept mutation due to a failure or maintenance.


CassandraRepairTasksBlocked

Severity

Warning

Summary

Cassandra repair tasks are blocked.

Description

The {{ $labels.namespace }}/{{ $labels.pod }} Cassandra Pod in the {{ $labels.cassandra_cluster }} cluster reports that {{ $value }} repair tasks are blocked.


CassandraStorageExceptions

Severity

Critical

Summary

Cassandra storage exceptions.

Description

The {{ $labels.namespace }}/{{ $labels.pod }} Cassandra Pod in the {{ $labels.cassandra_cluster }} cluster reports an increased number of storage exceptions.


CassandraTombstonesTooManyMajor

Severity

Major

Summary

Cassandra scanned 1000 tombstones.

Description

The {{ $labels.namespace }}/{{ $labels.pod }} Cassandra Pod in the {{ $labels.cassandra_cluster }} cluster scanned {{ $value }} tombstones in 99% of read queries.


CassandraTombstonesTooManyWarning

Severity

Warning

Summary

Cassandra scanned 100 tombstones.

Description

The {{ $labels.namespace }}/{{ $labels.pod }} Cassandra Pod in the {{ $labels.cassandra_cluster }} cluster scanned {{ $value }} tombstones in 99% of read queries.


CassandraViewWriteLatencyTooHigh

Severity

Warning

Summary

Cassandra high view/write latency.

Description

The {{ $labels.namespace }}/{{ $labels.pod }} Cassandra Pod in the {{ $labels.cassandra_cluster }} cluster reports over 1-second view/write latency for 99% of requests.

Kafka

Available since MOS 21.1

This section lists the alerts for Kafka.


KafkaClusterTargetsOutage

Available since MOS 21.6

Severity

Critical

Summary

Kafka cluster Prometheus targets outage.

Description

Prometheus fails to scrape metrics from 2/3 of the {{ $labels.cluster }} cluster endpoints (more than 1/10 failed scrapes).


KafkaInsufficientBrokers

Severity

Critical

Summary

Kafka cluster has missing brokers.

Description

The {{ $labels.cluster }} Kafka cluster in the {{ $labels.namespace }} namespace has missing brokers.


KafkaMissingController

Severity

Critical

Summary

Kafka cluster controller is missing.

Description

The {{ $labels.cluster }} Kafka cluster in the {{ $labels.namespace }} namespace has no controllers.


KafkaOfflinePartitionsDetected

Severity

Critical

Summary

Unavailable partitions in Kafka cluster.

Description

Partitions without a primary replica have been detected in the {{ $labels.cluster }} Kafka cluster in the {{ $labels.namespace }} namespace.


KafkaTooManyControllers

Severity

Critical

Summary

Kafka cluster has too many controllers.

Description

The {{ $labels.cluster }} Kafka cluster in the {{ $labels.namespace }} namespace has too many controllers.


KafkaUncleanLeaderElectionOccured

Severity

Major

Summary

Unclean Kafka broker was elected as cluster leader.

Description

A Kafka broker that has not finished replication has been elected as the leader of the {{ $labels.cluster }} cluster within the {{ $labels.namespace }} namespace.


KafkaUnderReplicatedPartitions

Severity

Warning

Summary

Kafka cluster has underreplicated partitions.

Description

The topics in the {{ $labels.cluster }} Kafka cluster in the {{ $labels.namespace }} namespace have insufficient replica partitions.

Redis

Available since MOS 21.1

This section lists the alerts for Redis.


RedisClusterFlapping

Severity

Major

Summary

Redis cluster is flapping.

Description

Changes have been detected in the replica connections of the {{ $labels.cluster }} Redis cluster within the {{ $labels.namespace }} namespace.


RedisClusterTargetsOutage

Available since MOS 21.6

Severity

Major

Summary

Redis cluster Prometheus targets outage.

Description

Prometheus fails to scrape metrics from 2/3 of the {{ $labels.cluster }} cluster endpoints (more than 1/10 failed scrapes).


RedisDisconnectedReplicas

Severity

Minor

Summary

Redis has disconnected replicas.

Description

The {{ $labels.cluster }} Redis cluster in the {{ $labels.namespace }} namespace is not replicating to all replicas. Consider verifying the Redis replication status.


RedisDown

Severity

Critical

Summary

Redis Pod is down.

Description

The {{ $labels.namespace }}/{{ $labels.pod }} Redis Pod in the {{ $labels.cluster }} cluster is down.


RedisMissingPrimary

Severity

Critical

Summary

Redis cluster has no primary node.

Description

The {{ $labels.cluster }} Redis cluster in the {{ $labels.namespace }} namespace has no node marked as primary.


RedisMultiplePrimaries

Severity

Major

Summary

Redis has multiple primaries.

Description

The {{ $labels.cluster }} Redis cluster in the {{ $labels.namespace }} namespace has {{ $value }} nodes marked as primary.


RedisRejectedConnections

Severity

Major

Summary

Redis cluster has rejected connections.

Description

Some connections to the {{ $labels.namespace }}/{{ $labels.pod }} Redis Pod in the {{ $labels.cluster }} cluster have been rejected.


RedisReplicationBroken

Severity

Major

Summary

Redis replication is broken.

Description

An instance of the {{ $labels.cluster }} Redis cluster in the {{ $labels.namespace }} namespace has lost a replica.

Tungsten Fabric

Available since MOS 21.1

This section lists the alerts for Tungsten Fabric.


TungstenFabricAPI401Critical

Available since MOS 21.4

Severity

Critical

Summary

Tungsten Fabric API responds with HTTP 401.

Description

The Tungsten Fabric API responds with HTTP 401 for more than 5% of requests for the last 10 minutes.


TungstenFabricAPI5xxCritical

Available since MOS 21.4

Severity

Critical

Summary

Tungsten Fabric API responds with HTTP 5xx.

Description

The Tungsten Fabric API responds with HTTP 5xx for more than 1% of requests for the last 10 minutes.


TungstenFabricBGPSessionsDown

Severity

Warning

Summary

Tungsten Fabric BGP sessions are down.

Description

{{ $value }} Tungsten Fabric BGP sessions on the {{ $labels.node }} node are down for 2 minutes.


TungstenFabricBGPSessionsNoActive

Severity

Warning

Summary

No active Tungsten Fabric BGP sessions.

Description

There are no active Tungsten Fabric BGP sessions on the {{ $labels.node }} node for 2 minutes.


TungstenFabricBGPSessionsNoEstablished

Severity

Warning

Summary

No established Tungsten Fabric BGP sessions.

Description

There are no established Tungsten Fabric BGP sessions on the {{ $labels.node }} node for 2 minutes.


TungstenFabricControllerDown

Severity

Minor

Summary

Tungsten Fabric controller is down.

Description

The Tungsten Fabric controller on the {{ $labels.node }} node is down for 2 minutes.


TungstenFabricControllerOutage

Severity

Critical

Summary

All Tungsten Fabric controllers are down.

Description

All Tungsten Fabric controllers are down for 2 minutes.


TungstenFabricControllerTargetsOutage

Available since MOS 21.6

Severity

Critical

Summary

Tungsten Fabric Controller Prometheus targets outage.

Description

Prometheus fails to scrape metrics from 2/3 of the Tungsten Fabric Controller exporter endpoints (more than 1/10 failed scrapes).


TungstenFabricVrouterDown

Severity

Minor

Summary

Tungsten Fabric vRouter is down.

Description

The Tungsten Fabric vRouter on the {{ $labels.node }} node is down for 2 minutes.


TungstenFabricVrouterLLSSessionsChangesTooHigh

Severity

Warning

Summary

Tungsten Fabric vRouter LLS sessions changes reached the limit of 5.

Description

The Tungsten Fabric vRouter LLS sessions on the {{ $labels.node }} node have changed {{ $value }} times.


TungstenFabricVrouterLLSSessionsTooHigh

Severity

Warning

Summary

Tungsten Fabric vRouter LLS sessions reached the limit of 10.

Description

{{ $value }} Tungsten Fabric vRouter LLS sessions are open on the {{ $labels.node }} node for 2 minutes.


TungstenFabricVrouterMetadataCheck

Severity

Critical

Summary

Tungsten Fabric metadata is unavailable.

Description

The Tungsten Fabric metadata on the {{ $labels.node }} node is unavailable for 15 minutes.


TungstenFabricVrouterOutage

Severity

Critical

Summary

All Tungsten Fabric vRouters are down.

Description

All Tungsten Fabric vRouters are down for 2 minutes.


TungstenFabricVrouterTargetDown

Available since MOS 21.6

Severity

Major

Summary

Tungsten Fabric vRouter Prometheus target is down.

Description

Prometheus fails to scrape metrics from the Tungsten Fabric vRouter exporter endpoint on the {{ $labels.node }} node (more than 1/10 failed scrapes).


TungstenFabricVrouterTargetsOutage

Available since MOS 21.6

Severity

Critical

Summary

Tungsten Fabric vRouter Prometheus targets outage.

Description

Prometheus fails to scrape metrics from all Tungsten Fabric vRouter exporter endpoints (more than 1/10 failed scrapes).


TungstenFabricVrouterXMPPSessionsChangesTooHigh

Severity

Warning

Summary

Tungsten Fabric vRouter XMPP sessions changes reached the limit of 5.

Description

The Tungsten Fabric vRouter XMPP sessions on the {{ $labels.node }} node have changed {{ $value }} times.


TungstenFabricVrouterXMPPSessionsTooHigh

Severity

Warning

Summary

Tungsten Fabric vRouter XMPP sessions reached the limit of 10.

Description

{{ $value }} Tungsten Fabric vRouter XMPP sessions are open on the {{ $labels.node }} node for 2 minutes.


TungstenFabricVrouterXMPPSessionsZero

Severity

Warning

Summary

No Tungsten Fabric vRouter XMPP sessions.

Description

There are no Tungsten Fabric vRouter XMPP sessions on the {{ $labels.node }} node for 2 minutes.


TungstenFabricXMPPSessionsChangesTooHigh

Severity

Warning

Summary

Tungsten Fabric XMPP sessions changes reached the limit of 100.

Description

The Tungsten Fabric XMPP sessions on the {{ $labels.node }} node have changed {{ $value }} times.


TungstenFabricXMPPSessionsDown

Severity

Warning

Summary

Tungsten Fabric XMPP sessions are down.

Description

{{ $value }} Tungsten Fabric XMPP sessions on the {{ $labels.node }} node are down for 2 minutes.


TungstenFabricXMPPSessionsMissing

Severity

Warning

Summary

Missing Tungsten Fabric XMPP sessions.

Description

{{ $value }} Tungsten Fabric XMPP sessions are missing on the compute cluster for 2 minutes.


TungstenFabricXMPPSessionsMissingEstablished

Severity

Warning

Summary

Missing established Tungsten Fabric XMPP sessions.

Description

{{ $value }} established Tungsten Fabric XMPP sessions are missing on the compute cluster for 2 minutes.


TungstenFabricXMPPSessionsTooHigh

Severity

Warning

Summary

Tungsten Fabric XMPP sessions reached the limit of 500.

Description

{{ $value }} Tungsten Fabric XMPP sessions on the {{ $labels.node }} node are open for 2 minutes.

ZooKeeper

Available since MOS 21.1

This section lists the alerts for ZooKeeper.


ZooKeeperClusterTargetsOutage

Available since MOS 21.6

Severity

Major

Summary

ZooKeeper cluster Prometheus targets outage.

Description

Prometheus fails to scrape metrics from 2/3 of the {{ $labels.cluster }} cluster endpoints (more than 1/10 failed scrapes).


ZooKeeperMissingFollowers

Severity

Warning

Summary

ZooKeeper cluster has missing followers.

Description

The {{ $labels.cluster }} ZooKeeper cluster in the {{ $labels.namespace }} namespace has missing follower servers.


ZooKeeperRequestOverload

Severity

Warning

Summary

ZooKeeper server request overload.

Description

The {{ $labels.namespace }}/{{ $labels.pod }} ZooKeeper Pod in the {{ $labels.cluster }} cluster is not keeping up with request handling.


ZooKeeperRunningOutOfFileDescriptors

Severity

Warning

Summary

ZooKeeper server is running out of file descriptors.

Description

The {{ $labels.namespace }}/{{ $labels.pod }} ZooKeeper Pod in the {{ $labels.cluster }} cluster is using at least 85% of available file descriptors.


ZooKeeperSyncOverload

Severity

Warning

Summary

ZooKeeper leader synchronization overload.

Description

The ZooKeeper leader in the {{ $labels.cluster }} cluster in the {{ $labels.namespace }} namespace is not keeping up with synchronization.

Alert dependencies

Available since MOS 21.1

Using alert inhibition rules, Alertmanager decreases alert noise by suppressing notifications for dependent alerts to provide a clearer view of the cloud status and simplify troubleshooting. Alert inhibition rules are enabled by default. The following table describes the dependencies between the OpenStack-related alerts. For other alerts, see Mirantis Container Cloud Operations Guide: Alert dependencies.

Once an alert from the Alert column is raised, the alert from the Inhibits column is suppressed and displays the Inhibited status in the Alertmanager web UI. A sample inhibition rule is provided after the table.

Alert

Inhibits

CassandraTombstonesTooManyMajor

CassandraTombstonesTooManyWarning

CinderServiceOutage

CinderServiceDown

KafkaInsufficientBrokers

TargetDownjob

MemcachedConnectionsNoneMajor

MemcachedConnectionsNoneMinor

NeutronAgentOutage

NeutronAgentDown

NovaServiceOutage

NovaServiceDown

OpenstackServiceApiOutage

OpenstackServiceApiDown

TelegrafGatherErrors

OpenstackSSLCertExpirationMajor

OpenstackSSLCertExpirationWarning

TungstenFabricControllerOutage

TungstenFabricControllerDown

TungstenFabricVrouterOutage

TungstenFabricVrouterDown

TungstenFabricVrouterTargetsOutage

TungstenFabricVrouterTargetDown
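
For illustration, the following snippet is a minimal sketch of an Alertmanager inhibition rule that implements one of the dependencies above, with CinderServiceOutage suppressing CinderServiceDown. The actual rules shipped with StackLight may differ, and the binary label used in the equal clause is an assumption based on the alert descriptions:

inhibit_rules:
  # When CinderServiceOutage fires, suppress CinderServiceDown
  # notifications for alerts that carry the same binary label.
  - source_match:
      alertname: CinderServiceOutage
    target_match:
      alertname: CinderServiceDown
    equal:
      - binary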

Configure StackLight

This section describes how to configure StackLight in your Mirantis OpenStack for Kubernetes deployment and includes the description of OpenStack-related StackLight parameters and their verification. For other available configuration keys and their configuration verification, see Mirantis Container Cloud Operations Guide: Manage StackLight.

StackLight configuration procedure

This section describes the StackLight configuration workflow.

To configure StackLight:

  1. Obtain kubeconfig of the Mirantis Container Cloud management cluster and open the Cluster object manifest of the managed cluster for editing as described in steps 1-2 of Mirantis Container Cloud Operations Guide: StackLight configuration procedure.

  2. In the following section of the opened manifest, configure the StackLight parameters as required (a worked example is provided after this procedure):

    spec:
      providerSpec:
        value:
          helmReleases:
          - name: stacklight
            values:
    
  3. Verify the StackLight configuration depending on the modified parameters as described in Verify StackLight after configuration.
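
The following snippet is a hypothetical worked example of what the section from step 2 may look like after explicitly setting the base OpenStack monitoring keys. The keys under values are described in StackLight configuration parameters:

spec:
  providerSpec:
    value:
      helmReleases:
      - name: stacklight
        values:
          openstack:
            # OpenStack monitoring is enabled by default; shown here
            # only to illustrate where the keys are placed.
            enabled: true
            namespace: openstack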

StackLight configuration parameters

This section describes the OpenStack-related StackLight configuration keys that you can specify in the values section to change StackLight settings as required. For other available configuration keys, see Mirantis Container Cloud Operations Guide: StackLight configuration parameters.

Prior to making any changes to StackLight configuration, perform the steps described in StackLight configuration procedure. After changing StackLight configuration, verify the changes as described in Verify StackLight after configuration.


OpenStack

Key

Description

Example values

openstack.enabled (bool)

Enables OpenStack monitoring. Set to true by default.

true or false

openstack.namespace (string)

Defines the namespace within which the OpenStack virtualized control plane is installed. Set to openstack by default.

openstack


Gnocchi

Key

Description

Example values

openstack.gnocchi.enabled (bool)

Enables Gnocchi monitoring. Set to false by default.

true or false


Ironic

Key

Description

Example values

openstack.ironic.enabled (bool)

Enables Ironic monitoring. Set to false by default.

true or false
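
Both Gnocchi and Ironic monitoring are disabled by default. The following minimal sketch shows how to enable the toggles from the two tables above in the values section of the stacklight Helm release:

values:
  openstack:
    gnocchi:
      # Enable Gnocchi monitoring.
      enabled: true
    ironic:
      # Enable Ironic monitoring.
      enabled: true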


RabbitMQ

Key

Description

Example values

openstack.rabbitmq.credentialsConfig (map)

Defines the RabbitMQ credentials to use if credentials discovery is disabled or some required parameters were not found during the discovery.

credentialsConfig:
  username: "stacklight"
  password: "stacklight"
  host: "rabbitmq.openstack.svc"
  queue: "notifications"
  vhost: "openstack"

openstack.rabbitmq.credentialsDiscovery (map)

Enables the credentials discovery to obtain the username and password from the secret object.

credentialsDiscovery:
  enabled: true
  namespace: openstack
  secretName: os-rabbitmq-user-credentials
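
The examples above show only the parameter bodies. In the Cluster object, they nest under the openstack.rabbitmq key of the stacklight Helm release values, for example, a sketch using the values from the table:

values:
  openstack:
    rabbitmq:
      # Discover the RabbitMQ username and password from the secret object.
      credentialsDiscovery:
        enabled: true
        namespace: openstack
        secretName: os-rabbitmq-user-credentials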

Telegraf

Key

Description

Example values

openstack.telegraf.credentialsConfig (map)

Specifies the OpenStack credentials to use if the credentials discovery is disabled or some required parameters were not found during the discovery.

credentialsConfig:
  identityEndpoint: "" # "http://keystone-api.openstack.svc:5000/v3"
  domain: "" # "default"
  password: "" # "workshop"
  project: "" # "admin"
  region: "" # "RegionOne"
  username: "" # "admin"

openstack.telegraf.credentialsDiscovery (map)

Enables the credentials discovery to obtain all required parameters from the secret object.

credentialsDiscovery:
  enabled: true
  namespace: openstack
  secretName: keystone-keystone-admin

openstack.telegraf.interval (string)

Specifies the interval of metrics gathering from the OpenStack API. Set to 1m by default.

1m, 3m

openstack.telegraf.insecure (bool) Available since MOS Ussuri Update

Enables or disables the server certificate chain and host name verification. Set to true by default.

true or false

openstack.telegraf.skipPublicEndpoints (bool) Available since MOS Ussuri Update

Enables or disables HTTP probes for public endpoints from the OpenStack service catalog. Set to false by default, meaning that Telegraf verifies all endpoints from the OpenStack service catalog, including the public, admin, and internal endpoints.

true or false
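
For example, the following sketch increases the metrics gathering interval and disables HTTP probes for public endpoints by combining the keys above under openstack.telegraf:

values:
  openstack:
    telegraf:
      # Gather metrics from the OpenStack API every 3 minutes instead of the default 1m.
      interval: 3m
      # Skip HTTP probes for public endpoints from the OpenStack service catalog.
      skipPublicEndpoints: true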


SSL certificates

Key

Description

Example values

openstack.externalFQDN (string) Deprecated since MOS 21.5

External FQDN used to communicate with OpenStack services. Used for certificates monitoring. Starting from MOS 21.6, use externalFQDNs.enabled instead.

https://os.ssl.mirantis.net/

externa