Mirantis Container Cloud Reference Architecture

Preface

This documentation provides information on how to deploy and operate Mirantis Container Cloud.

About this documentation set

The documentation is intended to help operators understand the core concepts of the product.

The information provided in this documentation set is constantly improved and amended based on the feedback and requests from our software consumers. This documentation set describes the features that are supported within the two latest Container Cloud minor releases, with a corresponding Available since release note.

The following table lists the guides included in the documentation set you are reading:

Guides list

Guide

Purpose

Reference Architecture

Learn the fundamentals of Container Cloud reference architecture to plan your deployment.

Deployment Guide

Deploy Container Cloud in a preferred configuration using supported deployment profiles tailored to the demands of specific business cases.

Operations Guide

Operate your Container Cloud deployment.

Release Compatibility Matrix

Deployment compatibility of the Container Cloud component versions for each product release.

Release Notes

Learn about new features and bug fixes in the current Container Cloud version as well as in the Container Cloud minor releases.

For your convenience, we provide all guides from this documentation set in HTML (default), single-page HTML, PDF, and ePUB formats. To use the preferred format of a guide, select the required option from the Formats menu next to the guide title on the Container Cloud documentation home page.

Intended audience

This documentation assumes that the reader is familiar with network and cloud concepts and is intended for the following users:

  • Infrastructure Operator

    • Is a member of the IT operations team

    • Has working knowledge of Linux, virtualization, Kubernetes API and CLI, and OpenStack to support the application development team

    • Accesses Mirantis Container Cloud and Kubernetes through a local machine or web UI

    • Provides verified artifacts through a central repository to the Tenant DevOps engineers

  • Tenant DevOps engineer

    • Is a member of the application development team and reports to the line of business (LOB)

    • Has working knowledge of Linux, virtualization, Kubernetes API and CLI to support application owners

    • Accesses Container Cloud and Kubernetes through a local machine or web UI

    • Consumes artifacts from a central repository approved by the Infrastructure Operator

Conventions

This documentation set uses the following conventions in the HTML format:

Documentation conventions

Convention

Description

boldface font

Inline CLI tools and commands, titles of the procedures and system response examples, table titles.

monospaced font

File names and paths, Helm chart parameters and their values, package names, node names and labels, and so on.

italic font

Information that distinguishes some concept or term.

Links

External links and cross-references, footnotes.

Main menu > menu item

GUI elements that include any part of the interactive user interface and menu navigation.

Superscript

Some extra, brief information. For example, if a feature is available from a specific release or if a feature is in the Technology Preview development stage.

Note

The Note block

General messages that may be useful to the user.

Caution

The Caution block

Information that helps a user avoid mistakes and undesirable consequences when following the procedures.

Warning

The Warning block

Messages with details that can be easily missed but should not be ignored, as they are valuable before proceeding.

See also

The See also block

A list of references that may be helpful for understanding related tools, concepts, and so on.

Learn more

The Learn more block

Used in the Release Notes to wrap a list of internal references to the reference architecture, deployment and operation procedures specific to a newly implemented product feature.

Technology Preview support scope

This documentation set includes descriptions of Technology Preview features. A Technology Preview feature provides early access to upcoming product innovations, allowing customers to experience the functionality and provide feedback during the development process. Technology Preview features may be privately or publicly available but are not intended for production use. While Mirantis will provide support for such features through official channels, normal Service Level Agreements do not apply. Customers may be supported by Mirantis Customer Support or Mirantis Field Support.

As Mirantis considers making future iterations of Technology Preview features generally available, we will attempt to resolve any issues that customers experience when using these features.

During the development of a Technology Preview feature, additional components may become available to the public for testing. Because Technology Preview features are still under development, Mirantis cannot guarantee the stability of such features. As a result, if you are using Technology Preview features, you may not be able to seamlessly upgrade to subsequent releases of that feature. Mirantis makes no guarantees that Technology Preview features will be graduated to a generally available product release.

The Mirantis Customer Success Organization may create bug reports on behalf of support cases filed by customers. These bug reports will then be forwarded to the Mirantis Product team for possible inclusion in a future release.

Documentation history

The documentation set refers to Mirantis Container Cloud GA as the latest released GA version of the product. For details about the Container Cloud GA minor release dates, refer to Container Cloud releases.

Mirantis Container Cloud overview

Mirantis Container Cloud is a set of microservices that are deployed using Helm charts and run in a Kubernetes cluster. Container Cloud is based on the Kubernetes Cluster API community initiative.

The following diagram illustrates the Container Cloud overview:

_images/cluster-overview.png

All artifacts used by Kubernetes and workloads are stored on the Container Cloud content delivery network (CDN):

  • mirror.mirantis.com (Debian packages including the Ubuntu mirrors)

  • binary.mirantis.com (Helm charts and binary artifacts)

  • mirantis.azurecr.io (Docker image registry)

All Container Cloud components are deployed in the Kubernetes clusters. All Container Cloud APIs are implemented using Kubernetes Custom Resource Definitions (CRDs) that represent custom objects stored in Kubernetes and allow you to extend the Kubernetes API.

The Container Cloud logic is implemented using controllers. A controller handles the changes in custom resources defined in the controller CRD. A custom resource consists of a spec that describes the desired state of a resource provided by a user. During every change, a controller reconciles the external state of a custom resource with the user parameters and stores this external state in the status subresource of its custom resource.
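
The following minimal sketch illustrates this spec and status pattern for a hypothetical Container Cloud custom resource. The kind, API group, and field names are illustrative assumptions and do not reproduce an exact Container Cloud CRD schema:

    apiVersion: example.mirantis.com/v1alpha1  # hypothetical API group and version
    kind: ExampleResource                      # hypothetical kind
    metadata:
      name: demo
      namespace: default
    spec:                      # desired state provided by the user
      replicas: 3
    status:                    # external state stored by the controller during reconciliation
      readyReplicas: 3
      phase: Ready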

The types of the Container Cloud clusters include:

Bootstrap cluster
  • Runs the bootstrap process on a seed node. For the OpenStack-based or AWS-based Container Cloud, it can be an operator desktop computer. For the baremetal-based Container Cloud, this is the first temporary data center node.

  • Requires access to a provider back end, OpenStack, AWS, or bare metal.

  • Contains a minimum set of services to deploy the management and regional clusters.

  • Is destroyed completely after a successful bootstrap.

Management and regional clusters
  • Management cluster:

    • Runs all public APIs and services including the web UIs of Container Cloud.

    • Does not require access to any provider back end.

  • Regional cluster:

    • Is combined with the management cluster by default.

    • Runs the provider-specific services and internal API including LCMMachine and LCMCluster. Also, it runs an LCM controller for orchestrating managed clusters and other controllers for handling different resources.

    • Requires two-way access to a provider back end. The provider connects to a back end to spawn managed cluster nodes, and the agent running on the nodes accesses the regional cluster to obtain the deployment information.

    • Requires access to a management cluster to obtain user parameters.

    • Supports multi-regional deployments. For example, you can deploy an AWS-based management cluster with AWS-based and OpenStack-based regional clusters.

      Supported combinations of provider types for management and regional clusters:

      • Management cluster provider: bare metal, AWS, or OpenStack

      • Regional cluster provider: bare metal, AWS, or OpenStack

Managed cluster
  • A Container Cloud cluster with Mirantis Kubernetes Engine (MKE) that an end user creates using Container Cloud.

  • Requires access to a regional cluster. Each node of a managed cluster runs an LCM agent that connects to the LCM machine of the regional cluster to obtain the deployment details.

  • Starting from MKE 3.3.3, a user can also attach and manage an existing MKE cluster that was not created using Container Cloud. In this case, the nodes of the attached cluster do not run the LCM agent.

All types of the Container Cloud clusters except the bootstrap cluster are based on the MKE and Mirantis Container Runtime (MCR) architecture. For details, see the Docker Enterprise documentation.

The following diagram illustrates the distribution of services between each type of the Container Cloud clusters:

_images/cluster-types.png

Mirantis Container Cloud provider

The Mirantis Container Cloud provider is the central component of Container Cloud that provisions a node of a management, regional, or managed cluster and runs the LCM agent on this node. It runs on the management and regional clusters and requires a connection to a provider back end.

The Container Cloud provider interacts with the following types of public API objects:

Public API object name

Description

Container Cloud release object

Contains the following information about clusters:

  • Version of the supported Cluster release for the management and regional clusters

  • List of supported Cluster releases for the managed clusters and supported upgrade path

  • Description of Helm charts that are installed on the management and regional clusters depending on the selected provider

Cluster release object

  • Provides a specific version of a management, regional, or managed cluster. A Cluster release object, as well as a Container Cloud release object, never changes; only new releases can be added. Any change leads to a new release of a cluster.

  • Contains references to all components and their versions that are used to deploy all cluster types:

    • LCM components:

      • LCM agent

      • Ansible playbooks

      • Scripts

      • Description of steps to execute during a cluster deployment and upgrade

      • Helm controller image references

    • Supported Helm charts description:

      • Helm chart name and version

      • Helm release name

      • Helm values

Cluster object

  • References the Credentials, KaaSRelease and ClusterRelease objects.

  • Is tied to a specific Container Cloud region and provider.

  • Represents all cluster-level resources. For example, for the OpenStack-based clusters, it represents networks, load balancer for the Kubernetes API, and so on. It uses data from the Credentials object to create these resources and data from the KaaSRelease and ClusterRelease objects to ensure that all lower-level cluster objects are created.

Machine object

  • References the Cluster object.

  • Represents one node of a managed cluster, for example, an OpenStack VM, and contains all data to provision it.

Credentials object

  • Contains all information necessary to connect to a provider back end.

  • Is tied to a specific Container Cloud region and provider.

PublicKey object

Is provided to every machine to enable SSH access.
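
To illustrate how these objects reference each other, the following hedged sketch shows a Cluster object and a Machine object side by side. The API group, labels, and providerSpec fields are simplified assumptions for illustration and may differ from the actual Container Cloud schemas:

    apiVersion: cluster.k8s.io/v1alpha1        # Cluster API group, assumed for illustration
    kind: Cluster
    metadata:
      name: demo-cluster
      labels:
        kaas.mirantis.com/provider: openstack  # hypothetical provider label
        kaas.mirantis.com/region: region-one   # hypothetical region label
    spec:
      providerSpec:
        value:
          credentials: demo-credentials        # reference to a Credentials object (illustrative)
          release: demo-cluster-release        # reference to a ClusterRelease object (illustrative)
    ---
    apiVersion: cluster.k8s.io/v1alpha1
    kind: Machine
    metadata:
      name: demo-cluster-node-0
      labels:
        cluster.sigs.k8s.io/cluster-name: demo-cluster  # hypothetical label tying the Machine to its Cluster
    spec:
      providerSpec:
        value:
          flavor: kaas.small                   # hypothetical provider-specific node parameters
          image: ubuntu-18.04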

The following diagram illustrates the Container Cloud provider data flow:

_images/provider-dataflow.png

The Container Cloud provider performs the following operations in Container Cloud:

  • Consumes the following types of data from the management and regional clusters:

    • Credentials to connect to a provider back end

    • Deployment instructions from the KaaSRelease and ClusterRelease objects

    • The cluster-level parameters from the Cluster objects

    • The machine-level parameters from the Machine objects

  • Prepares data for all Container Cloud components:

    • Creates the LCMCluster and LCMMachine custom resources for LCM controller and LCM agent. The LCMMachine custom resources are created empty to be later handled by the LCM controller.

    • Creates the HelmBundle custom resources for the Helm controller using data from the KaaSRelease and ClusterRelease objects.

    • Creates service accounts for these custom resources.

    • Creates a scope in Identity and access management (IAM) for user access to a managed cluster.

  • Provisions nodes for a managed cluster using the cloud-init script that downloads and runs the LCM agent.

Mirantis Container Cloud release controller

The Mirantis Container Cloud release controller is responsible for the following functionality:

  • Monitor and control the KaaSRelease and ClusterRelease objects present in a management cluster. If any release object is used in a cluster, the release controller prevents the deletion of such an object.

  • Sync the KaaSRelease and ClusterRelease objects published at https://binary.mirantis.com/releases/ with an existing management cluster.

  • Trigger the Container Cloud auto-upgrade procedure if a new KaaSRelease object is found:

    1. Search for the managed clusters with old Cluster releases that are not supported by a new Container Cloud release. If any are detected, abort the auto-upgrade and display a corresponding note about an old Cluster release in the Container Cloud web UI for the managed clusters. In this case, a user must update all managed clusters using the Container Cloud web UI. Once all managed clusters are upgraded to the Cluster releases supported by a new Container Cloud release, the Container Cloud auto-upgrade is retriggered by the release controller.

    2. Trigger the Container Cloud release upgrade of all Container Cloud components in a management cluster. The upgrade itself is processed by the Container Cloud provider.

    3. Trigger the Cluster release upgrade of a management cluster to the Cluster release version that is indicated in the upgraded Container Cloud release version.

    4. Verify the regional cluster(s) status. If the regional cluster is ready, trigger the Cluster release upgrade of the regional cluster.

      Once a management cluster is upgraded, an option to update a managed cluster becomes available in the Container Cloud web UI. During a managed cluster update, all cluster components including Kubernetes are automatically upgraded to newer versions if available.

Mirantis Container Cloud web UI

The Mirantis Container Cloud web UI is mainly designed to create and update the managed clusters as well as add or remove machines to or from an existing managed cluster. It also allows attaching existing Mirantis Kubernetes Engine (MKE) clusters.

You can use the Container Cloud web UI to obtain the management cluster details including endpoints, release version, and so on. The management cluster update occurs automatically with a new release change log available through the Container Cloud web UI.

The Container Cloud web UI is a JavaScript application that is based on the React framework. The Container Cloud web UI is designed to work on a client side only. Therefore, it does not require a special back end. It interacts with the Kubernetes and Keycloak APIs directly. The Container Cloud web UI uses a Keycloak token to interact with Container Cloud API and download kubeconfig for the management and managed clusters.

The Container Cloud web UI uses NGINX that runs on a management cluster and handles the Container Cloud web UI static files. NGINX proxies the Kubernetes and Keycloak APIs for the Container Cloud web UI.

Mirantis Container Cloud bare metal

The bare metal service provides for the discovery, deployment, and management of bare metal hosts.

The bare metal management in Mirantis Container Cloud is implemented as a set of modular microservices. Each microservice implements a certain requirement or function within the bare metal management system.

Bare metal components

The bare metal management solution for Mirantis Container Cloud includes the following components:

Bare metal components

Component

Description

OpenStack Ironic

The back-end bare metal manager in a standalone mode with its auxiliary services that include httpd, dnsmasq, and mariadb.

OpenStack Ironic Inspector

Introspects and discovers the bare metal hosts inventory. Includes OpenStack Ironic Python Agent (IPA) that is used as a provision-time agent for managing bare metal hosts.

Ironic Operator

Monitors changes in the external IP addresses of httpd, ironic, and ironic-inspector and automatically reconciles the configuration for dnsmasq, ironic, baremetal-provider, and baremetal-operator.

Bare Metal Operator

Manages bare metal hosts through the Ironic API. The Container Cloud bare-metal operator implementation is based on the Metal³ project.

cluster-api-provider-baremetal

The plugin for the Kubernetes Cluster API integrated with Container Cloud. Container Cloud uses the Metal³ implementation of cluster-api-provider-baremetal for the Cluster API.

LCM agent

Used for physical and logical storage, physical and logical networking, and control over the life cycle of the bare metal machine resources.

Ceph

Distributed shared storage is required by the Container Cloud services to create persistent volumes to store their data.

MetalLB

Load balancer for Kubernetes services on bare metal.

NGINX

Load balancer for external access to the Kubernetes API endpoint.

Keepalived

Monitoring service that ensures availability of the virtual IP for the external load balancer endpoint (NGINX).

IPAM

IP address management services provide consistent IP address space to the machines in bare metal clusters. See details in IP Address Management.

The diagram below summarizes the following components and resource kinds:

  • Metal³-based bare metal management in Container Cloud (white)

  • Internal APIs (yellow)

  • External dependency components (blue)

_images/bm-component-stack.png

IP Address Management

Mirantis Container Cloud on bare metal uses the IP Address Management (IPAM) to keep track of the network addresses allocated to bare metal hosts. This is necessary to avoid IP address conflicts and expiration of address leases to machines through DHCP.

The IPAM functions are provided by the kaas-ipam controller and a set of custom resources. A cluster API extension enables you to define the addresses and associate them with hosts. The addresses are then configured by the Ironic provisioning system using the cloud-init tool.

The kaas-ipam controller provides the following functionality:

  • Link the IPAM objects with the Cluster API objects, such as BareMetalHost or Machine, through the intermediate IpamHost objects.

  • Handle the IP pools and addresses as Kubernetes custom objects defined by CRDs.

  • Control the integration with Container Cloud.

You can apply complex networking configurations to a bare metal host using the L2 templates. The L2 templates imply multihomed host networking and enable you to create a managed cluster with more than one network interface for different types of traffic. Multihoming is required to ensure the security and performance of a managed cluster. By design, this feature should not touch the NIC that is used for PXE boot and LCM.

IPAM uses a single L3 network per management cluster, as defined in Cluster networking, to assign addresses to bare metal hosts.
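
For illustration only, an IP address pool handled by kaas-ipam could be represented by a custom object similar to the following sketch. The API group, kind, and fields are assumptions made for this example and are not the authoritative IPAM schema:

    apiVersion: ipam.mirantis.com/v1alpha1   # assumed API group
    kind: Subnet                             # hypothetical kind representing an IP address pool
    metadata:
      name: lcm-subnet
      namespace: default
    spec:
      cidr: 10.0.0.0/24                      # example address space to allocate from
      gateway: 10.0.0.1
      includeRanges:
        - 10.0.0.100-10.0.0.200
    status:
      allocatedIPs: []                       # filled in by the kaas-ipam controller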

Extended hardware configuration

Mirantis Container Cloud provides APIs that enable you to define hardware configurations that extend the reference architecture.

Typically, operations with the extended hardware configurations are available through the API and CLI, but not the web UI.

Storage

The baremetal-based Mirantis Container Cloud uses Ceph as a distributed storage system for file, block, and object storage. This section provides an overview of a Ceph cluster deployed by Container Cloud.

Overview

Mirantis Container Cloud deploys Ceph on the baremetal-based management and managed clusters using Helm charts with the following components:

  • Ceph controller - a Kubernetes controller that obtains the parameters from Container Cloud through a custom resource (CR), creates CRs for Rook, and updates its CR status based on the Ceph cluster deployment progress. It creates users, pools, and keys for OpenStack and Kubernetes and provides Ceph configurations and keys to access them. Also, Ceph controller eventually obtains the data from the OpenStack Controller for the Keystone integration and updates the RADOS Gateway services configurations to use Kubernetes for user authentication.

  • Ceph operator

    • Transforms user parameters from the Container Cloud web UI into Rook credentials and deploys a Ceph cluster using Rook.

    • Provides integration of the Ceph cluster with Kubernetes

    • Provides data for OpenStack to integrate with the deployed Ceph cluster

  • Custom resource (CR) - represents the customization of a Kubernetes installation and allows you to define the required Ceph configuration through the Container Cloud web UI before deployment. For example, you can define the failure domain, pools, Ceph node roles, number of Ceph components such as Ceph OSDs, and so on.

  • Rook - a storage orchestrator that deploys Ceph on top of a Kubernetes cluster.

A typical Ceph cluster consists of the following components:

Ceph Monitors

Three or, in rare cases, five Ceph Monitors.

Ceph Managers

Mirantis recommends having three Ceph Managers in every cluster.

RADOS Gateway services

Mirantis recommends having three or more RADOS Gateway services for HA.

Ceph OSDs

The number of Ceph OSDs may vary according to the deployment needs.

Warning

A Ceph cluster with 3 Ceph nodes does not provide hardware fault tolerance and is not eligible for recovery operations, such as a disk or an entire Ceph node replacement.

The placement of Ceph Monitors and Ceph Managers is defined in the custom resource.

The following diagram illustrates the way a Ceph cluster is deployed in Container Cloud:

_images/ceph-deployment.png

The following diagram illustrates the processes within a deployed Ceph cluster:

_images/ceph-data-flow.png

Limitations

A Ceph cluster configuration in Mirantis Container Cloud is subject to the following limitations, among others:

  • Only one Ceph controller per management, regional, or managed cluster and only one Ceph cluster per Ceph controller are supported.

  • Only one CRUSH tree per cluster. The separation of devices per Ceph pool is supported through device classes with only one pool of each type for a device class.

  • Only the following types of CRUSH buckets are supported:

    • topology.kubernetes.io/region

    • topology.kubernetes.io/zone

    • topology.rook.io/datacenter

    • topology.rook.io/room

    • topology.rook.io/pod

    • topology.rook.io/pdu

    • topology.rook.io/row

    • topology.rook.io/rack

    • topology.rook.io/chassis

  • RBD mirroring is not supported.

  • Consuming an existing Ceph cluster is not supported.

  • CephFS is not supported.

  • Only IPv4 is supported.

  • If two or more Ceph OSDs are located on the same device, there must be no dedicated WAL or DB for this class.

  • Only full collocation or dedicated WAL and DB configurations are supported.

  • All CRUSH rules must have the same failure_domain.

  • When adding a Ceph node with the Ceph Monitor role, if any issues occur with the Ceph Monitor, rook-ceph removes it and adds a new Ceph Monitor instead, named using the next alphabetic character in order. Therefore, the Ceph Monitor names may not follow the alphabetical order. For example, a, b, d, instead of a, b, c.

See also

Ceph

Kubernetes lifecycle management

The Kubernetes lifecycle management (LCM) engine in Mirantis Container Cloud consists of the following components:

LCM controller

Responsible for all LCM operations. Consumes the LCMCluster object and orchestrates actions through LCM agent.

LCM agent

Relates only to Mirantis Kubernetes Engine (MKE) clusters deployed using Container Cloud, and is not used for attached MKE clusters. Runs on the target host. Executes Ansible playbooks in headless mode.

Helm controller

Responsible for the lifecycle of the Helm charts. It is installed by LCM controller and interacts with Tiller.

The Kubernetes LCM components handle the following custom resources:

  • LCMCluster

  • LCMMachine

  • HelmBundle

The following diagram illustrates handling of the LCM custom resources by the Kubernetes LCM components. On a managed cluster, apiserver handles multiple Kubernetes objects, for example, deployments, nodes, RBAC, and so on.

_images/lcm-components.png

LCM custom resources

The Kubernetes LCM components handle the following custom resources (CRs):

  • LCMMachine

  • LCMCluster

  • HelmBundle

LCMMachine

Describes a machine that is located on a cluster. It contains the machine type (control or worker) and the StateItems that correspond to Ansible playbooks and miscellaneous actions, for example, downloading a file or executing a shell command. LCMMachine reflects the current state of the machine, for example, a node IP address, and each StateItem through its status. Multiple LCMMachine CRs can correspond to a single cluster.

LCMCluster

Describes a managed cluster. In its spec, LCMCluster contains a set of StateItems for each type of LCMMachine, which describe the actions that must be performed to deploy the cluster. LCMCluster is created by the provider, using machineTypes of the Release object. The status field of LCMCluster reflects the status of the cluster, for example, the number of ready or requested nodes.

HelmBundle

Wrapper for Helm charts that is handled by Helm controller. HelmBundle tracks what Helm charts must be installed on a managed cluster.
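
A minimal sketch of an LCMMachine object, assuming the lcm.mirantis.com API group that is also used for HelmBundle; the StateItem names and the exact field layout are hypothetical and serve only to illustrate the structure described above:

    apiVersion: lcm.mirantis.com/v1alpha1    # API group assumed from helmbundles.lcm.mirantis.com
    kind: LCMMachine
    metadata:
      name: demo-cluster-node-0
    spec:
      type: control                          # machine type: control or worker
      stateItems:                            # illustrative StateItems; real names and parameters differ
        - name: download-lcm-artifacts
          phase: prepare
        - name: deploy-mke-node
          phase: deploy
    status:
      hostname: node-0
      addresses:
        - 10.0.0.11                          # node IP address reported by the LCM agent
      state: Ready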

LCM controller

LCM controller runs on the management and regional clusters and orchestrates the LCMMachine objects according to their type and their LCMCluster object.

Once the LCMCluster and LCMMachine objects are created, LCM controller starts monitoring them to modify the spec fields and update the status fields of the LCMMachine objects when required. The status field of LCMMachine is updated by LCM agent running on a node of a management, regional, or managed cluster.

Each LCMMachine has the following lifecycle states:

  1. Uninitialized - the machine is not yet assigned to an LCMCluster.

  2. Pending - the agent reports a node IP address and hostname.

  3. Prepare - the machine executes StateItems that correspond to the prepare phase. This phase usually involves downloading the necessary archives and packages.

  4. Deploy - the machine executes StateItems that correspond to the deploy phase, that is, becoming a Mirantis Kubernetes Engine (MKE) node.

  5. Ready - the machine is deployed.

  6. Reconfigure - the machine is being updated with a new set of manager nodes. Once done, the machine moves to the ready state again.

The templates for StateItems are stored in the machineTypes field of an LCMCluster object, with separate lists for the MKE manager and worker nodes. Each StateItem has the execution phase field for a management, regional, and managed cluster:

  1. The prepare phase is executed for all machines for which it was not executed yet. This phase comprises downloading the files necessary for the cluster deployment, installing the required packages, and so on.

  2. During the deploy phase, a node is added to the cluster. LCM controller applies the deploy phase to the nodes in the following order:

    1. First manager node is deployed.

    2. The remaining manager nodes are deployed one by one and the worker nodes are deployed in batches (by default, up to 50 worker nodes at the same time). After at least one manager and one worker node are in the ready state, helm-controller is installed on the cluster.

LCM controller deploys and upgrades a Mirantis Container Cloud cluster by setting the StateItems of the LCMMachine objects following the corresponding StateItems phases described above. The Container Cloud cluster upgrade process follows the same logic that is used for a new deployment, that is, applying a new set of StateItems to the LCMMachines after updating the LCMCluster object. However, during the upgrade, the following additional actions are performed:

  • If the existing worker node is being upgraded, LCM controller performs draining and cordoning on this node honoring the Pod Disruption Budgets. This operation prevents unexpected disruptions of the workloads.

  • LCM controller verifies that the required version of helm-controller is installed.

LCM agent

LCM agent handles a single machine that belongs to a management, regional, or managed cluster. It runs on the machine operating system but communicates with apiserver of the regional cluster. LCM agent is deployed as a systemd unit using cloud-init. LCM agent has a built-in self-upgrade mechanism.

LCM agent monitors the spec of a particular LCMMachine object to reconcile the machine state with the object StateItems and update the LCMMachine status accordingly. The actions that LCM agent performs while handling the StateItems are as follows:

  • Download configuration files

  • Run shell commands

  • Run Ansible playbooks in headless mode

LCM agent provides the IP address and hostname of the machine for the LCMMachine status parameter.

Helm controller

Helm controller is used by Mirantis Container Cloud to handle the core addons of management, regional, and managed clusters, such as StackLight, and the application addons, such as the OpenStack components.

Helm controller runs in the same pod as the Tiller process. The Tiller gRPC endpoint is not accessible outside the pod. The pod is created using StatefulSet inside a cluster by LCM controller once the cluster contains at least one manager and worker node.

The Helm release information is stored in the KaaSRelease object for the management and regional clusters and in the ClusterRelease object for all types of the Container Cloud clusters. These objects are used by the Container Cloud provider. The Container Cloud provider uses the information from the ClusterRelease object together with the Container Cloud API Cluster spec. In the Cluster spec, the operator can specify the Helm release name and charts to use. By combining the information from the Cluster providerSpec parameter and its ClusterRelease object, the cluster actuator generates the LCMCluster objects. These objects are further handled by LCM controller, and the HelmBundle object is handled by Helm controller. HelmBundle must have the same name as the LCMCluster object for the cluster that it applies to.

Although a cluster actuator can only create a single HelmBundle per cluster, Helm controller can handle multiple HelmBundle objects per cluster.

Helm controller handles the HelmBundle objects and reconciles them with the Tiller state in its cluster. However, full reconciliation against Tiller is not supported yet; Helm controller relies on the status data of the HelmBundle objects instead.

Helm controller can also be used by the management cluster with corresponding HelmBundle objects created as part of the initial management cluster setup.
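
The following hedged sketch shows what a HelmBundle object might look like. The API group below is an assumption based on the helmbundles.lcm.mirantis.com resource name, and the release fields and values are illustrative rather than the exact schema:

    apiVersion: lcm.mirantis.com/v1alpha1    # group from helmbundles.lcm.mirantis.com; version assumed
    kind: HelmBundle
    metadata:
      name: demo-cluster                     # must match the LCMCluster name of the target cluster
    spec:
      releases:                              # illustrative structure
        - name: stacklight                   # Helm release name
          chart: stacklight/stacklight       # Helm chart name (example value)
          version: 0.1.2                     # Helm chart version (example value)
          values:                            # Helm values passed to the release
            highAvailabilityEnabled: true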

Identity and access management

Identity and access management (IAM) provides a central point of user and permission management for the Mirantis Container Cloud cluster resources in a granular and unified manner. Also, IAM provides the infrastructure for a single sign-on user experience across all Container Cloud web portals.

IAM for Container Cloud consists of the following components:

Keycloak
  • Provides the OpenID Connect endpoint

  • Integrates with an external identity provider (IdP), for example, existing LDAP or Google Open Authorization (OAuth)

  • Stores roles mapping for users

IAM controller
  • Provides IAM API with data about Container Cloud projects

  • Handles all role-based access control (RBAC) components in Kubernetes API

IAM API

Provides an abstraction API for creating user scopes and roles

IAM API and CLI

Mirantis IAM exposes a versioned and backward-compatible Google remote procedure call (gRPC) protocol API to interact with the IAM CLI.

IAM API is designed as a user-facing functionality. For this reason, it operates in the context of user authentication and authorization.

In IAM API, an operator can use the following entities:

  • Grants - to grant or revoke user access

  • Scopes - to describe user roles

  • Users - to provide user account information

Mirantis Container Cloud UI interacts with IAM API on behalf of the user. However, the user can directly work with IAM API using IAM CLI. IAM CLI uses the OpenID Connect (OIDC) endpoint to obtain the OIDC token for authentication in IAM API and enable you to perform different API operations.

The following diagram illustrates the interaction between IAM API and CLI:

_images/iam-api-cli.png

External identity provider integration

To be consistent and keep the integrity of a user database and user permissions, IAM in Mirantis Container Cloud stores the user identity information internally. However, in real deployments, an identity provider usually already exists.

Out of the box, in Container Cloud, IAM supports integration with LDAP and Google Open Authorization (OAuth). If LDAP is configured as an external identity provider, IAM performs one-way synchronization, mapping attributes according to the configuration.

In case of the Google Open Authorization (OAuth) integration, the user is automatically registered and their credentials are stored in the internal database according to the user template configuration. The Google OAuth registration workflow is as follows:

  1. The user requests a Container Cloud web UI resource.

  2. The user is redirected to the IAM login page and logs in using the Log in with Google account option.

  3. IAM creates a new user with the default access rights that are defined in the user template configuration.

  4. The user can access the Container Cloud web UI resource.

The following diagram illustrates the external IdP integration to IAM:

_images/iam-ext-idp.png

You can configure simultaneous integration with both external IdPs with the user identity matching feature enabled.

Authentication and authorization

Mirantis IAM uses the OpenID Connect (OIDC) protocol for handling authentication.

Implementation flow

Mirantis IAM acts as an OpenID Connect (OIDC) provider: it issues tokens and exposes discovery endpoints.

The credentials can be handled by IAM itself or delegated to an external identity provider (IdP).

The issued JSON Web Token (JWT) is sufficient to perform operations across Mirantis Container Cloud according to the scope and role defined in it. Mirantis recommends using asymmetric cryptography for token signing (RS256) to minimize the dependency between IAM and managed components.
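
For illustration, the decoded payload of such a token could carry the scope and role information roughly as follows. The claims are shown as YAML for readability, and the claim names and scope format are hypothetical rather than the exact Keycloak token layout:

    # Hypothetical decoded JWT claims
    iss: https://keycloak.example.com/auth/realms/iam   # issuer: the Keycloak realm
    aud: kaas                                           # audience: the Container Cloud API (example value)
    exp: 1700000000                                     # expiration timestamp
    preferred_username: operator
    iam_roles:                                          # roles and scopes that components use for authorization
      - m:kaas:demo-cluster@operator                    # hypothetical scope/role format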

When Container Cloud calls Mirantis Kubernetes Engine (MKE), it passes a JWT issued by Keycloak on behalf of the end user. MKE, in its turn, verifies whether the JWT is issued by Keycloak. If the user retrieved from the token does not exist in the MKE database, the user is automatically created in the MKE database based on the information from the token.

The authorization implementation is out of the scope of IAM in Container Cloud. This functionality is delegated to the component level. IAM interacts with a Container Cloud component using the OIDC token content, which is processed by the component itself to enforce the required authorization. Such an approach enables you to have any underlying authorization that is not dependent on IAM and still provides a unified user experience across all Container Cloud components.

Kubernetes CLI authentication flow

The following diagram illustrates the Kubernetes CLI authentication flow. The authentication flow for Helm and other Kubernetes-oriented CLI utilities is identical to the Kubernetes CLI flow, but JSON Web Tokens (JWT) must be pre-provisioned.

_images/iam-authn-k8s.png

Monitoring

Mirantis Container Cloud uses StackLight, the logging, monitoring, and alerting solution that provides a single pane of glass for cloud maintenance and day-to-day operations as well as offers critical insights into cloud health including operational information about the components deployed in management, regional, and managed clusters. StackLight is based on Prometheus, an open-source monitoring solution and a time series database.

Deployment architecture

Mirantis Container Cloud deploys the StackLight stack as a release of a Helm chart that contains the helm-controller and helmbundles.lcm.mirantis.com (HelmBundle) custom resources. The StackLight HelmBundle consists of a set of Helm charts with the StackLight components that include:

StackLight components overview

StackLight component

Description

Alerta

Receives, consolidates, and deduplicates the alerts sent by Alertmanager and visually represents them through a simple web UI. Using the Alerta web UI, you can view the most recent or watched alerts, and group and filter alerts.

Alertmanager

Handles the alerts sent by client applications such as Prometheus, deduplicates, groups, and routes alerts to receiver integrations. Using the Alertmanager web UI, you can view the most recent fired alerts, silence them, or view the Alertmanager configuration.

Elasticsearch curator

Maintains the data (indexes) in Elasticsearch by performing such operations as creating, closing, or opening an index as well as deleting a snapshot. Also, manages the data retention policy in Elasticsearch.

Elasticsearch exporter

The Prometheus exporter that gathers internal Elasticsearch metrics.

Grafana

Builds and visually represents metric graphs based on time series databases. Grafana supports querying of Prometheus using the PromQL language.

Database back ends

StackLight uses PostgreSQL for Alerta and Grafana. PostgreSQL reduces the data storage fragmentation while enabling high availability. High availability is achieved using Patroni, the PostgreSQL cluster manager that monitors for node failures and manages failover of the primary node. StackLight also uses Patroni to manage major version upgrades of PostgreSQL clusters, which allows leveraging the database engine functionality and improvements as they are introduced upstream in new releases, maintaining functional continuity without version lock-in.

Logging stack

Responsible for collecting, processing, and persisting logs and Kubernetes events. By default, when deploying through the Container Cloud web UI, only the metrics stack is enabled on managed clusters. To enable StackLight to gather managed cluster logs, enable the logging stack during deployment. On management clusters, the logging stack is enabled by default. The logging stack components include:

  • Elasticsearch, which stores logs and notifications.

  • Fluentd-elasticsearch, which collects logs, sends them to Elasticsearch, generates metrics based on analysis of incoming log entries, and exposes these metrics to Prometheus.

  • Kibana, which provides real-time visualization of the data stored in Elasticsearch and enables you to detect issues.

  • Metricbeat, which collects Kubernetes events and sends them to Elasticsearch for storage.

  • Prometheus-es-exporter, which presents the Elasticsearch data as Prometheus metrics by periodically sending configured queries to the Elasticsearch cluster and exposing the results to a scrapable HTTP endpoint like other Prometheus targets.

Metric collector

Collects telemetry data (CPU or memory usage, number of active alerts, and so on) from Prometheus and sends the data to centralized cloud storage for further processing and analysis. Metric collector is enabled by default and runs on the management cluster.

Netchecker

Monitors the network connectivity between the Kubernetes nodes. Netchecker runs on managed clusters.

Prometheus

Gathers metrics. Automatically discovers and monitors the endpoints. Using the Prometheus web UI, you can view simple visualizations and debug. By default, the Prometheus database stores metrics of the past 15 days or up to 15 GB of data depending on the limit that is reached first.

Prometheus-es-exporter

Presents the Elasticsearch data as Prometheus metrics by periodically sending configured queries to the Elasticsearch cluster and exposing the results to a scrapable HTTP endpoint like other Prometheus targets.

Prometheus node exporter

Gathers hardware and operating system metrics exposed by kernel.

Prometheus Relay

Adds a proxy layer to Prometheus to merge the results from underlay Prometheus servers to prevent gaps in case some data is missing on some servers. Is available only in the HA StackLight mode.

Pushgateway

Enables ephemeral and batch jobs to expose their metrics to Prometheus. Since these jobs may not exist long enough to be scraped, they can instead push their metrics to Pushgateway, which then exposes these metrics to Prometheus. Pushgateway is not an aggregator or a distributed counter but rather a metrics cache. The pushed metrics are exactly the same as scraped from a permanently running program.

Telegraf

Collects metrics from the system. Telegraf is plugin-driven and has the concept of two distinct sets of plugins: input plugins collect metrics from the system, services, or third-party APIs; output plugins write and expose metrics to various destinations.

The Telegraf agents used in Container Cloud include:

  • telegraf-ds-smart monitors SMART disks, and runs on both management and managed clusters.

  • telegraf-ironic monitors Ironic on the baremetal-based management clusters. The ironic input plugin collects and processes data from the Ironic HTTP API, while the http_response input plugin checks the Ironic HTTP API availability. To expose the collected data as a Prometheus target, Telegraf uses the prometheus output plugin.

  • telegraf-ucp gathers metrics from the Mirantis Container Runtime API about the Docker nodes, networks, and Swarm services. This is a Docker Telegraf input plugin with the downstream additions.

Telemeter

Enables a multi-cluster view through a Grafana dashboard of the management cluster. Telemeter includes a Prometheus federation push server and clients to enable isolated Prometheus instances, which cannot be scraped from a central Prometheus instance, to push metrics to the central location.

The Telemeter services are distributed as follows:

  • Management cluster hosts the Telemeter server

  • Regional clusters host the Telemeter server and Telemeter client

  • Managed clusters host the Telemeter client

The metrics from managed clusters are aggregated on regional clusters. Then both regional and managed clusters metrics are sent from regional clusters to the management cluster.

Every Helm chart contains a default values.yml file. These default values are partially overridden by custom values defined in the StackLight Helm chart.
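
As a purely illustrative sketch of this layering, the following fragment shows how the StackLight chart could override a few defaults of an underlying chart. The parameter names are invented for this example and are not the actual StackLight value keys, although the sizes match the defaults documented in this section:

    # Hypothetical StackLight values overriding an underlying chart's defaults
    prometheusServer:
      retentionTime: 15d                 # keep metrics for 15 days (default mentioned in this document)
      retentionSize: 15GB
      persistentVolumeClaimSize: 16Gi    # default Prometheus PVC size
    elasticsearch:
      persistentVolumeClaimSize: 30Gi    # default Elasticsearch PVC size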

Before deploying a management or managed cluster, you can select the HA or non-HA StackLight architecture type. The non-HA mode is set by default. The following table lists the differences between the HA and non-HA modes:

StackLight database modes

Non-HA StackLight mode default

HA StackLight mode

  • One Prometheus instance

  • One Elasticsearch instance

  • One PostgreSQL instance

One persistent volume is provided for storing data. In case of a service or node failure, a new pod is redeployed and the volume is reattached to provide the existing data. Such a setup has a reduced hardware footprint but provides less performance.

  • Two Prometheus instances

  • Three Elasticsearch instances

  • Three PostgreSQL instances

Local Volume Provisioner is used to provide local host storage. In case of a service or node failure, the traffic is automatically redirected to any other running Prometheus or Elasticsearch server. For better performance, Mirantis recommends that you deploy StackLight in the HA mode.

Authentication flow

StackLight provides five web UIs including Prometheus, Alertmanager, Alerta, Kibana, and Grafana. Access to StackLight web UIs is protected by Keycloak-based Identity and access management (IAM). All web UIs except Alerta are exposed to IAM through the IAM proxy middleware. The Alerta configuration provides direct integration with IAM.

The following diagram illustrates accessing the IAM-proxied StackLight web UIs, for example, Prometheus web UI:

_images/sl-auth-iam-proxied.png

Authentication flow for the IAM-proxied StackLight web UIs:

  1. A user enters the public IP of a StackLight web UI, for example, Prometheus web UI.

  2. The public IP leads to IAM proxy, deployed as a Kubernetes LoadBalancer, which protects the Prometheus web UI.

  3. LoadBalancer routes the HTTP request to Kubernetes internal IAM proxy service endpoints, specified in the X-Forwarded-Proto or X-Forwarded-Host headers.

  4. The Keycloak login form opens (--discovery-url in the IAM proxy, which points to Keycloak realm) and the user enters the user name and password.

  5. Keycloak validates the user name and password.

  6. The user obtains access to the Prometheus web UI (--upstream-url in the IAM proxy).

Note

  • The discovery URL is the URL of the IAM service.

  • The upstream URL is the hidden endpoint of a web UI (Prometheus web UI in the example above).
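
For illustration only, the following hypothetical container arguments show how these two URLs could be wired into the IAM proxy deployment. The flag names come from the flow described above, while the values and the surrounding structure are assumptions:

    # Hypothetical fragment of the IAM proxy container specification
    containers:
      - name: iam-proxy
        args:
          - --discovery-url=https://keycloak.example.com/auth/realms/iam  # points to the Keycloak realm
          - --upstream-url=http://prometheus-server.stacklight.svc:9090   # hidden Prometheus web UI endpoint
          - --listen=0.0.0.0:443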

The following diagram illustrates accessing the Alerta web UI:

_images/sl-authentication-direct.png

Authentication flow for the Alerta web UI:

  1. A user enters the public IP of the Alerta web UI.

  2. The public IP leads to Alerta, deployed as a Kubernetes LoadBalancer service.

  3. LoadBalancer routes the HTTP request to the Kubernetes internal Alerta service endpoint.

  4. The Keycloak login form opens (Alerta refers to the IAM realm) and the user enters the user name and password.

  5. Keycloak validates the user name and password.

  6. The user obtains access to the Alerta web UI.

Supported features

Using the Mirantis Container Cloud web UI, at the pre-deployment stage of a managed cluster, you can view, enable or disable, or tune the following StackLight features:

  • StackLight HA mode.

  • Database retention size and time for Prometheus.

  • Tunable index retention period for Elasticsearch.

  • Tunable PersistentVolumeClaim (PVC) size for Prometheus and Elasticsearch, set to 16 GB for Prometheus and 30 GB for Elasticsearch by default. The PVC size must be logically aligned with the retention periods or sizes for these components.

  • Email and Slack receivers for the Alertmanager notifications.

  • Predefined set of dashboards.

  • Predefined set of alerts and capability to add new custom alerts for Prometheus in the following exemplary format:

    - alert: HighErrorRate
      expr: job:request_latency_seconds:mean5m{job="myjob"} > 0.5
      for: 10m
      labels:
        severity: page
      annotations:
        summary: High request latency
    

Monitored components

StackLight measures, analyzes, and reports in a timely manner on failures that may occur in the following Mirantis Container Cloud components and their sub-components, if any:

  • Ceph

  • Ironic (Container Cloud bare-metal provider)

  • Kubernetes services:

    • Calico

    • etcd

    • Kubernetes cluster

    • Kubernetes containers

    • Kubernetes deployments

    • Kubernetes nodes

    • Netchecker

  • NGINX

  • Node hardware and operating system

  • PostgreSQL

  • SMART disks

  • StackLight:

    • Alertmanager

    • Elasticsearch

    • Grafana

    • Prometheus

    • Prometheus Relay

    • Pushgateway

    • Telemeter

  • SSL certificates

  • Mirantis Kubernetes Engine (MKE)

    • Docker/Swarm metrics (through Telegraf)

    • Built-in MKE metrics

Outbound cluster metrics

The data collected and transmitted through an encrypted channel back to Mirantis provides our Customer Success Organization with information to better understand the operational usage patterns that our customers are experiencing, as well as product usage statistics that enable our product teams to enhance our products and services for our customers.

The node-level resource data are broken down into three broad categories: Cluster, Node, and Namespace. The telemetry data tracks Allocatable, Capacity, Limits, Requests, and actual Usage of node-level resources.

Terms explanation

Term

Definition

Allocatable

On a Kubernetes Node, the amount of compute resources that are available for pods

Capacity

The total number of available resources regardless of current consumption

Limits

Constraints imposed by Administrators

Requests

The resources that a given container application is requesting

Usage

The actual usage or consumption of a given resource

The full list of the outbound data includes:

  • From all Container Cloud managed clusters:

    • cluster_master_nodes_total

    • cluster_nodes_total

    • cluster_persistentvolumeclaim_requests_storage_bytes

    • cluster_total_alerts_triggered

    • cluster_usage_cpu_cores

    • cluster_usage_per_allocatable_cpu_ratio

    • cluster_worker_nodes_total

    • cluster_working_set_memory_bytes

    • cluster_working_set_per_allocatable_memory_ratio

    • kaas_clusters

    • kaas_machines_ready

    • kaas_machines_requested

  • From Mirantis OpenStack on Kubernetes (MOSK) managed clusters only:

    • openstack_cinder_volumes_total

    • openstack_glance_images_total

    • openstack_glance_snapshots_total

    • openstack_instance_create_end

    • openstack_instance_create_error

    • openstack_instance_create_start

    • openstack_instance_downtime_check_all

    • openstack_instance_downtime_check_failed

    • openstack_keystone_tenants_total

    • openstack_keystone_users_total

    • openstack_kpi_downtime

    • openstack_kpi_provisioning

    • openstack_neutron_lbaas_loadbalancers_total

    • openstack_neutron_networks_total

    • openstack_neutron_ports_total

    • openstack_neutron_routers_total

    • openstack_neutron_subnets_total

    • openstack_nova_disk_total_gb

    • openstack_nova_instances_active_total

    • openstack_nova_ram_total_gb

    • openstack_nova_used_disk_total_gb

    • openstack_nova_used_ram_total_gb

    • openstack_nova_used_vcpus_total

    • openstack_nova_vcpus_total

    • openstack_quota_instances

    • openstack_quota_ram_gb

    • openstack_quota_vcpus

    • openstack_quota_volume_storage_gb

    • openstack_usage_instances

    • openstack_usage_ram_gb

    • openstack_usage_vcpus

    • openstack_usage_volume_storage_gb

Hardware and system requirements

Using Mirantis Container Cloud, you can deploy a Kubernetes cluster on bare metal, OpenStack, or Amazon Web Services (AWS). Each provider requires corresponding resources.

Note

Using the free Mirantis license, you can create up to three Container Cloud managed clusters with three worker nodes on each cluster. Within the same quota, you can also attach existing MKE clusters that are not deployed by Container Cloud. If you need to increase this quota, contact Mirantis support for further details.

Baremetal-based cluster

Reference hardware configuration

The following hardware configuration is used as a reference to deploy Mirantis Container Cloud bare metal clusters with Mirantis Kubernetes Engine.

Reference hardware configuration for Container Cloud clusters on bare metal

Management cluster

  • # of servers: 3 [2]

  • Server model: Supermicro 1U SYS-6018R-TDW

  • CPU model: Intel Xeon E5-2620v4

  • # of CPUs: 1

  • # of vCPUs: 16

  • RAM, GB: 96

  • SSD system disk, GB: 1x 960 [3]

  • SSD/HDD storage disk, GB: 2x 1900 [0]

  • NIC model: Intel X520-DA2

  • # of NICs: 2 [1]

  • # of server ifaces (onboard + NICs): 6

Managed cluster

  • # of servers: 6 [4]

  • Server model: Supermicro 1U SYS-6018R-TDW

  • CPU model: Intel Xeon E5-2620v4

  • # of CPUs: 1

  • # of vCPUs: 16

  • RAM, GB: 96

  • SSD system disk, GB: 1x 960 [3]

  • SSD/HDD storage disk, GB: 2x 1900 [0]

  • NIC model: Intel X520-DA2

  • # of NICs: 2 [1]

  • # of server ifaces (onboard + NICs): 6

Footnotes:

[0] Minimum 3 storage disks are required:

  • sda - minimum 60 GB for system

  • sdb - minimum 60 GB for LocalVolumeProvisioner

  • sdc - for Ceph

For the default storage schema, see Operations Guide: Default host system storage

[1] Only one PXE NIC per node is allowed.

[2] Adding more than 3 nodes to a management or regional cluster is not supported.

[3] A management cluster requires 2 volumes for Container Cloud (total 50 GB) and 5 volumes for StackLight (total 60 GB). A managed cluster requires 5 volumes for StackLight.

[4] Three manager nodes for HA and three worker storage nodes for a minimal Ceph cluster. For more details about Ceph requirements, see Ceph.

System requirements for the seed node

The seed node is necessary only to deploy the management cluster. When the bootstrap is complete, the bootstrap node can be redeployed and its resources can be reused for the managed cluster workloads.

The minimum reference system requirements for a baremetal-based bootstrap seed node are as follows:

  • Basic server on Ubuntu 18.04 with the following configuration:

    • Kernel version 4.15.0-76.86 or later

    • 8 GB of RAM

    • 4 CPU

    • 10 GB of free disk space for the bootstrap cluster cache

  • No DHCP or TFTP servers on any NIC networks

  • Routable access to the IPMI network of the hardware servers. For more details, see Host networking.

  • Internet access for downloading all required artifacts

Host networking

The following network roles are defined for all Mirantis Container Cloud cluster nodes on bare metal, including the bootstrap, management, and managed cluster nodes:

  • Out-of-band (OOB) network

    Connects the Baseboard Management Controllers (BMC) of the hosts in the network to Ironic. This network (or multiple networks, if managed clusters have their own OOB networks) must be accessible from the PXE network through IP routing.

  • Common/PXE network

    Is a general purpose network used to remotely boot servers through the PXE protocol as well as for the Kubernetes API access and Kubernetes pods traffic. This network is shared between the management and managed clusters.

    Warning

    Only one Ethernet port on a host must be connected to the Common/PXE network at any given time. The physical address (MAC) of this interface must be noted and used to configure the BareMetalHost object describing the host.
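
A minimal sketch of a BareMetalHost object carrying this MAC address, based on the upstream Metal³ schema; the exact fields and values used by Container Cloud may differ:

    apiVersion: metal3.io/v1alpha1            # upstream Metal³ API group, assumed to be used as is
    kind: BareMetalHost
    metadata:
      name: worker-0
    spec:
      online: true
      bootMACAddress: "0c:c4:7a:00:00:01"     # MAC of the single NIC connected to the Common/PXE network
      bmc:
        address: ipmi://192.168.100.11        # BMC endpoint reachable over the OOB network
        credentialsName: worker-0-bmc-secret  # Kubernetes Secret with IPMI credentials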

The initially installed bootstrap node or node0 must be connected to the following networks:

  • The OOB network. Ironic must have access to the IPMI/BMC of the managed bare metal hosts. However, Ironic must not be connected to this L2 segment directly. The OOB network must be accessible through Router 1 in the PXE network.

  • The Common/PXE network. The instance of the kaas-bm running on node0 provides DHCP service on this network. This service is required for Ironic to inspect the bare metal hosts and install the operating system. The bootstrap node must be directly connected to the PXE network to ensure the L2 connectivity for DHCP. The default route for node0 must point to the Router 1 in the PXE network.

The Container Cloud bootstrap cluster node has the following networking configuration:

_images/bm-bootstrap-network.png

A management cluster node has the following networking configuration:

_images/bm-mgmt-network.png

A managed cluster node has the following network configuration:

_images/bm-managed-network.png

Cluster networking

The following diagram illustrates the L3 networking schema for the final state of the bare metal deployment as described in Host networking.

_images/bm-cluster-l3-networking.png

Network fabric

The following diagram illustrates the physical and virtual L2 underlay networking schema for the final state of the Mirantis Container Cloud bare metal deployment.

_images/bm-cluster-physical-and-l2-networking.png

The network fabric reference configuration is a spine/leaf with 2 leaf ToR switches and one out-of-band (OOB) switch per rack.

The reference configuration uses the following switches for ToR and OOB:

  • Cisco WS-C3560E-24TD with 24 x 1 GbE ports, used in the OOB network segment.

  • Dell Force10 S4810P with 48 x 1/10 GbE ports, used as ToR in the Common/PXE network segment.

In the reference configuration, all odd interfaces from NIC0 are connected to ToR Switch 1, and all even interfaces from NIC0 are connected to ToR Switch 2. The Baseboard Management Controller (BMC) interfaces of the servers are connected to OOB Switch 1.

Ceph

The management cluster requires a minimum of three storage devices per node for Ceph:

  • The first device is always used by the operating system.

  • At least one disk per server must be configured as a device managed by a Ceph OSD.

  • One disk per server is required for LocalVolumeProvisioner to store the data of persistent volumes served by the Local Storage Static Provisioner (local-volume-provisioner). See the example after this list.

  • The recommended number of Ceph OSDs per management cluster node is two or more. Container Cloud supports up to 22 Ceph OSDs per node.

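For context, the Local Storage Static Provisioner exposes such disks as persistent volumes bound to a storage class that has no dynamic provisioner. The following is a minimal sketch of such a storage class; the class name is hypothetical, and the actual object is created by Container Cloud during deployment:

    kubectl apply -f - <<'EOF'
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: local-volumes                        # hypothetical name
    provisioner: kubernetes.io/no-provisioner    # volumes are created statically, not dynamically
    volumeBindingMode: WaitForFirstConsumer      # delay binding until a pod is scheduled to the node
    EOF
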
OpenStack-based cluster

While planning the deployment of an OpenStack-based Mirantis Container Cloud cluster with Mirantis Kubernetes Engine, consider the following general requirements:

  • Kubernetes on OpenStack requires the Cinder and Octavia APIs to be available.

  • The only supported OpenStack networking is Open vSwitch. Other networking technologies, such as Tungsten Fabric, are not supported.

  • The bootstrap and management clusters must have access to *.mirantis.com to download the release information and artifacts.

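For example, you can verify outbound access to the Mirantis CDN from the bootstrap node before starting the deployment. The exact endpoints under *.mirantis.com depend on the release, so the following check is illustrative only:

    # Illustrative connectivity check; actual *.mirantis.com endpoints may differ per release
    curl -sSI https://binary.mirantis.com | head -n 1
    curl -sSI https://mirror.mirantis.com | head -n 1
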
Note

Container Cloud is developed and tested on OpenStack Queens.

Requirements for an OpenStack-based Container Cloud cluster

Note

The bootstrap cluster is necessary only to deploy the management cluster. When the bootstrap is complete, this cluster can be deleted and its resources can be reused for managed cluster workloads.

# of nodes

  Bootstrap cluster: 1; Management cluster: 3 (HA) + 1 (Bastion); Managed cluster: 5 (6 with StackLight HA)

  • A bootstrap cluster requires access to the OpenStack API.

  • A management cluster requires 3 nodes for the manager nodes HA. Adding more than 3 nodes to a management or regional cluster is not supported.

  • A managed cluster requires 3 nodes for the manager nodes HA and 2 nodes for the Container Cloud workloads. If the multiserver mode is enabled for StackLight, 3 nodes are required for the Container Cloud workloads.

  • A management cluster requires 1 node for the Bastion instance that is created with a public IP address to allow SSH access to instances.

# of vCPUs per node

  Bootstrap cluster: 2; Management cluster: 8; Managed cluster: 8

  • The Bastion node requires 1 vCPU.

  • Refer to the RAM recommendations described below to plan resources for different types of nodes.

RAM in GB per node

  Bootstrap cluster: 4; Management cluster: 16; Managed cluster: 16

  To prevent issues with low RAM, Mirantis recommends the following types of instances for a managed cluster with 50-200 nodes:

  • 16 vCPUs and 32 GB of RAM - manager node

  • 16 vCPUs and 128 GB of RAM - nodes where the StackLight server components run

  The Bastion node requires 1 GB of RAM.

Storage in GB per node

  Bootstrap cluster: 5 (available); Management cluster: 120; Managed cluster: 120

  For the Bastion node, the default amount of storage is enough.

Operating system

  Bootstrap cluster: Ubuntu 16.04 or 18.04; Management cluster: Ubuntu 18.04; Managed cluster: Ubuntu 18.04

  For a management and managed cluster, a base Ubuntu 18.04 image with the default SSH ubuntu user name must be present in Glance. See the upload example after this table.

Docker version

  Bootstrap cluster: 18.09; Management cluster: -; Managed cluster: -

  For a management and managed cluster, Mirantis Container Runtime 19.03.12 is deployed by Container Cloud as a CRI.

OpenStack version

  Bootstrap cluster: -; Management cluster: Queens; Managed cluster: Queens

Obligatory OpenStack components

  Bootstrap cluster: -; Management cluster: Octavia, Cinder, OVS; Managed cluster: Octavia, Cinder, OVS

# of Cinder volumes

  Bootstrap cluster: -; Management cluster: 7 (total 110 GB); Managed cluster: 5 (total 60 GB)

  • A management cluster requires 2 volumes for Container Cloud (total 50 GB) and 5 volumes for StackLight (total 60 GB).

  • A managed cluster requires 5 volumes for StackLight.

# of load balancers

  Bootstrap cluster: -; Management cluster: 10; Managed cluster: 6

  • LBs for a management cluster: 1 for Kubernetes, 4 for Container Cloud, 5 for StackLight.

  • LBs for a managed cluster: 1 for Kubernetes and 5 for StackLight.

# of floating IPs

  Bootstrap cluster: -; Management cluster: 13; Managed cluster: 11

  • FIPs for a management cluster: 1 for Kubernetes, 3 for the manager nodes (one FIP per node), 4 for Container Cloud, 5 for StackLight.

  • FIPs for a managed cluster: 1 for Kubernetes, 3 for the manager nodes, 2 for the worker nodes, 5 for StackLight.

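As referenced in the Operating system requirement above, a base Ubuntu 18.04 cloud image, which ships with the default ubuntu user, can be uploaded to Glance with the standard OpenStack CLI. The image file name and image name below are examples only:

    # Upload an Ubuntu 18.04 (Bionic) cloud image to Glance; file and image names are examples
    openstack image create \
      --disk-format qcow2 \
      --container-format bare \
      --file bionic-server-cloudimg-amd64.img \
      --public \
      ubuntu-18.04
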
AWS-based cluster

While planning the deployment of an AWS-based Mirantis Container Cloud cluster with Mirantis Kubernetes Engine, consider the requirements described below.

Warning

Some of the AWS features required for Container Cloud may not be included in your AWS account quota. Therefore, carefully consider the AWS fees that apply to your account, which may increase for the Container Cloud infrastructure.

Requirements for an AWS-based Container Cloud cluster

Note

The bootstrap cluster is necessary only to deploy the management cluster. When the bootstrap is complete, this cluster can be deleted and its resources can be reused for managed cluster workloads.

# of nodes

  Bootstrap cluster: 1; Management cluster: 3 (HA); Managed cluster: 5 (6 with StackLight HA)

  • A bootstrap cluster requires access to the Mirantis CDN.

  • A management cluster requires 3 nodes for the manager nodes HA. Adding more than 3 nodes to a management or regional cluster is not supported.

  • A managed cluster requires 3 nodes for the manager nodes HA and 2 nodes for the Container Cloud workloads. If the multiserver mode is enabled for StackLight, 3 nodes are required for the Container Cloud workloads.

# of vCPUs per node

  Bootstrap cluster: 2; Management cluster: 8; Managed cluster: 8

RAM in GB per node

  Bootstrap cluster: 4; Management cluster: 16; Managed cluster: 16

Storage in GB per node

  Bootstrap cluster: 5 (available); Management cluster: 120; Managed cluster: 120

Operating system

  Bootstrap cluster: Ubuntu 16.04 or 18.04; Management cluster: Ubuntu 18.04; Managed cluster: Ubuntu 18.04

  For a management and managed cluster, a base Ubuntu 18.04 image with the default SSH ubuntu user name is required.

Docker version

  Bootstrap cluster: 18.09; Management cluster: -; Managed cluster: -

  For a management and managed cluster, Mirantis Container Runtime 19.03.12 is deployed by Container Cloud as a CRI.

Instance type

  Bootstrap cluster: -; Management cluster: c5d.2xlarge; Managed cluster: c5d.2xlarge

  To prevent issues with low RAM, Mirantis recommends the following types of instances for a managed cluster with 50-200 nodes:

  • c5d.4xlarge - manager node

  • r5.4xlarge - nodes where the StackLight server components run

Bastion host instance type

  Bootstrap cluster: -; Management cluster: t2.micro; Managed cluster: t2.micro

  The Bastion instance is created with a public Elastic IP address to allow SSH access to instances.

# of volumes

  Bootstrap cluster: -; Management cluster: 7 (total 110 GB); Managed cluster: 5 (total 60 GB)

  • A management cluster requires 2 volumes for Container Cloud (total 50 GB) and 5 volumes for StackLight (total 60 GB).

  • A managed cluster requires 5 volumes for StackLight.

# of Elastic load balancers to be used

  Bootstrap cluster: -; Management cluster: 10; Managed cluster: 6

  • Elastic LBs for a management cluster: 1 for Kubernetes, 4 for Container Cloud, 5 for StackLight.

  • Elastic LBs for a managed cluster: 1 for Kubernetes and 5 for StackLight.

# of Elastic IP addresses to be used

  Bootstrap cluster: -; Management cluster: 1; Managed cluster: 1

vSphere-based cluster

Caution

This feature is available as Technology Preview. Use such configuration for testing and evaluation purposes only. For details about the Mirantis Technology Preview support scope, see the Preface section of this guide.

Caution

This feature is available starting from the Container Cloud release 2.2.0.

In a Mirantis Container Cloud deployment on VMWare vSphere, the bootstrap and management clusters must have access to *.mirantis.com to download the release information and artifacts.

Note

Container Cloud is developed and tested on VMWare vSphere 7.0.

Requirements for a vSphere-based Container Cloud cluster

Note

The bootstrap cluster is necessary only to deploy the management cluster. When the bootstrap is complete, this cluster can be deleted and its resources can be reused for managed cluster workloads.

# of nodes

  Bootstrap cluster: 1; Management cluster: 3 (HA); Managed cluster: 5 (6 with StackLight HA)

  • A bootstrap cluster requires access to the VMWare vSphere API.

  • A management cluster requires 3 nodes for the manager nodes HA. Adding more than 3 nodes to a management or regional cluster is not supported.

  • A managed cluster requires 3 nodes for the manager nodes HA and 2 nodes for the Container Cloud workloads. If the multiserver mode is enabled for StackLight, 3 nodes are required for the Container Cloud workloads.

# of vCPUs per node

  Bootstrap cluster: 2; Management cluster: 8; Managed cluster: 8

  Refer to the RAM recommendations described below to plan resources for different types of nodes.

RAM in GB per node

  Bootstrap cluster: 4; Management cluster: 16; Managed cluster: 16

  To prevent issues with low RAM, Mirantis recommends the following VM templates for a managed cluster with 50-200 nodes:

  • 16 vCPUs and 32 GB of RAM - manager node

  • 16 vCPUs and 128 GB of RAM - nodes where the StackLight server components run

Storage in GB per node

  Bootstrap cluster: 5 (available); Management cluster: 120; Managed cluster: 120

Operating system

  Bootstrap cluster: Ubuntu 16.04 or 18.04; Management cluster: RHEL 7.8; Managed cluster: RHEL 7.8

  For a management and managed cluster, a base RHEL 7.8 VM template must be present in the VMWare VM templates folder available to Container Cloud. For details about the template, see Deployment Guide: Prerequisites.

RHEL license

  Bootstrap cluster: -; Management cluster: RHEL licenses for Virtual Datacenters; Managed cluster: RHEL licenses for Virtual Datacenters

  This license type allows running unlimited guests inside one hypervisor. The number of licenses must be equal to the number of hypervisors in VMware vCenter Server that will be used to host RHEL-based machines. Container Cloud schedules machines according to the scheduling rules applied to VMware vCenter Server. Therefore, make sure that your Red Hat Customer Portal account has enough licenses for the allowed hypervisors.

Docker version

  Bootstrap cluster: 18.09; Management cluster: -; Managed cluster: -

  For a management and managed cluster, Mirantis Container Runtime 19.03.12 is deployed by Container Cloud as a CRI.

VMWare vSphere version

  Bootstrap cluster: -; Management cluster: 7.0; Managed cluster: 7.0

Obligatory VMWare vSphere capabilities

  Bootstrap cluster: -; Management cluster: DRS (Distributed Resource Scheduler); Managed cluster: DRS

IP subnet size

  Bootstrap cluster: -; Management cluster: /24; Managed cluster: /24

  The VMWare network must have an external DHCP server on the primary cluster network to assign IP addresses to the node VMs.

  IP addresses distribution:

  • Management cluster: 1 for Kubernetes, 3 for the manager nodes (one per node), 4 for Container Cloud, 5 for StackLight

  • Managed cluster: 1 for Kubernetes, 3 for the manager nodes, 2 for the worker nodes, 5 for StackLight

Mirantis Kubernetes Engine API limitations

To ensure Mirantis Container Cloud stability in managing the Container Cloud-based Mirantis Kubernetes Engine (MKE) clusters, the following MKE API functionality is not available for Container Cloud-based MKE clusters, as compared to attached MKE clusters that are not deployed by Container Cloud. Use the Container Cloud web UI or CLI for this functionality instead.

Public APIs limitations in a Container Cloud-based MKE cluster

  • GET /swarm - Swarm Join Tokens are filtered out for all users, including admins.

  • PUT /api/ucp/config-toml - All requests are forbidden (see the request example after this list).

  • POST /nodes/{id}/update - Requests for the following changes are forbidden:

    • Change Role

    • Add or remove the com.docker.ucp.orchestrator.swarm and com.docker.ucp.orchestrator.kubernetes labels

  • DELETE /nodes/{id} - All requests are forbidden.
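
For example, an attempt to change the MKE configuration directly through the API is rejected on a Container Cloud-based MKE cluster. The following sketch assumes a bearer token obtained through the standard MKE authentication endpoint; the host name, credentials, and file name are illustrative:

    # Obtain an MKE API token (host name and credentials are illustrative)
    AUTHTOKEN=$(curl -sk -d '{"username":"admin","password":"<password>"}' \
      https://mke.example.com/auth/login | jq -r .auth_token)

    # On a Container Cloud-based MKE cluster, this request is expected to be rejected
    curl -sk -X PUT -H "Authorization: Bearer $AUTHTOKEN" \
      --data-binary @ucp-config.toml \
      https://mke.example.com/api/ucp/config-toml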