Docker Enterprise Container Cloud Reference Architecture

Preface

This documentation provides information on how to use Mirantis products to deploy cloud environments. The information is for reference purposes and is subject to change.

Intended audience

This documentation assumes that the reader is familiar with network and cloud concepts and is intended for the following users:

  • Infrastructure Operator

    • Is a member of the IT operations team

    • Has working knowledge of Linux, virtualization, Kubernetes API and CLI, and OpenStack to support the application development team

    • Accesses Docker Enterprise (DE) Container Cloud and Kubernetes through a local machine or web UI

    • Provides verified artifacts through a central repository to the Tenant DevOps engineers

  • Tenant DevOps engineer

    • Is a member of the application development team and reports to the line of business (LOB)

    • Has working knowledge of Linux, virtualization, Kubernetes API and CLI to support application owners

    • Accesses DE Container Cloud and Kubernetes through a local machine or web UI

    • Consumes artifacts from a central repository approved by the Infrastructure Operator

Documentation history

The documentation set refers to DE Container Cloud GA as the latest released GA version of the product. For details about the DE Container Cloud GA minor release dates, refer to DE Container Cloud releases.

Introduction

Docker Enterprise (DE) Container Cloud orchestrates DE Clusters with UCP and addresses key challenges with running Kubernetes on-premises, including:

  • Management capabilities of DE Clusters with UCP utilizing Kubernetes Cluster API, with self-service API and web-based UI

  • Controlling and delegating access to DE Clusters with UCP and their namespaces using existing Identity Providers with IAM integration based on Keycloak

  • Backend-agnostic load balancing and storage capabilities for Kubernetes through integration with OpenStack, AWS, and bare metal services

Docker Enterprise Container Cloud overview

Docker Enterprise (DE) Container Cloud is a set of microservices that are deployed using Helm charts and run in a Kubernetes cluster. DE Container Cloud is based on the Kubernetes Cluster API community initiative.

The following diagram illustrates the DE Container Cloud overview:

_images/cluster-overview.png

All artifacts used by Kubernetes and workloads are stored on the DE Container Cloud content delivery network (CDN):

  • mirror.mirantis.com (Debian packages including the Ubuntu mirrors)

  • binary.mirantis.com (Helm charts and binary artifacts)

  • mirantis.azurecr.io (Docker image registry)

All DE Container Cloud components are deployed in the Kubernetes clusters. All DE Container Cloud APIs are implemented using Kubernetes Custom Resource Definitions (CRDs) that represent custom objects stored in Kubernetes and allow you to expand the Kubernetes API.

The DE Container Cloud logic is implemented using controllers. A controller handles the changes in custom resources defined in the controller CRD. A custom resource consists of a spec that describes the desired state of a resource provided by a user. During every change, a controller reconciles the external state of a custom resource with the user parameters and stores this external state in the status subresource of its custom resource.
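
For illustration, the following minimal sketch shows the spec and status pattern that such custom resources follow. The API group, kind, and field names are hypothetical placeholders, not the exact DE Container Cloud CRD schema:

    # Hypothetical custom resource; the group, kind, and fields are illustrative only
    apiVersion: example.mirantis.com/v1alpha1
    kind: ExampleCluster
    metadata:
      name: demo-cluster
      namespace: demo-project
    spec:                  # desired state provided by the user
      provider: openstack
      nodes: 3
    status:                # external state reconciled and stored by the controller
      phase: Deploying
      readyNodes: 1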

The types of the DE Container Cloud clusters include:

Bootstrap cluster
  • Runs the bootstrap process on a seed node. For the OpenStack-based or AWS-based DE Container Cloud, it can be the operator's desktop computer. For the baremetal-based DE Container Cloud, this is the first temporary data center node.

  • Requires access to a provider back end, OpenStack, AWS, or bare metal.

  • Contains a minimum set of services to deploy the management and regional clusters.

  • Is destroyed completely after a successful bootstrap.

Management and regional clusters
  • Management cluster:

    • Runs all public APIs and services including the web UIs of DE Container Cloud.

    • Does not require access to any provider back end.

  • Regional cluster:

    • Is combined with the management cluster by default.

    • Runs the provider-specific services and internal API including LCMMachine and LCMCluster. Also, it runs an LCM controller for orchestrating managed clusters and other controllers for handling different resources.

    • Requires two-way access to a provider back end. The provider connects to a back end to spawn managed cluster nodes, and the agent running on the nodes accesses the regional cluster to obtain the deployment information.

    • Requires access to a management cluster to obtain user parameters.

    • Supports multi-regional deployments. For example, you can deploy an AWS-based management cluster with AWS-based and OpenStack-based regional clusters.

      Supported combinations of provider types for management and regional clusters

      Bare metal regional cluster

      AWS regional cluster

      OpenStack regional cluster

      Bare metal management cluster

      AWS management cluster

      OpenStack management cluster

Managed cluster
  • A DE Cluster with Universal Control Plane (UCP) that an end user creates using DE Container Cloud.

  • Requires access to a regional cluster. Each node of a managed cluster runs an LCM agent that connects to the LCM machine of the regional cluster to obtain the deployment details.

  • Starting from UCP 3.3.3, a user can also attach and manage an existing UCP cluster that is not created using DE Container Cloud. In this case, the nodes of the attached cluster do not contain the LCM agent.

All types of the DE Container Cloud clusters except the bootstrap cluster are based on the Docker Enterprise UCP and Docker Engine - Enterprise architecture. For details, see the Docker Enterprise documentation.

The following diagram illustrates the distribution of services between each type of the DE Container Cloud clusters:

_images/cluster-types.png

Docker Enterprise Container Cloud provider

The Docker Enterprise (DE) Container Cloud provider is the central component of DE Container Cloud that provisions a node of a management, regional, or managed cluster and runs the LCM agent on this node. It runs on the management and regional clusters and requires a connection to a provider back end.

The DE Container Cloud provider interacts with the following types of public API objects:

Public API object name

Description

DE Container Cloud release object

Contains the following information about clusters:

  • Version of the supported Cluster release for the management and regional clusters

  • List of supported Cluster releases for the managed clusters and supported upgrade path

  • Description of Helm charts that are installed on the management and regional clusters depending on the selected provider

Cluster release object

  • Provides a specific version of a management, regional, or managed cluster. A Cluster release object, as well as a DE Container Cloud release object, never changes; only new releases can be added. Any change leads to a new release of a cluster.

  • Contains references to all components and their versions that are used to deploy all cluster types:

    • LCM components:

      • LCM agent

      • Ansible playbooks

      • Scripts

      • Description of steps to execute during a cluster deployment and upgrade

      • Helm controller image references

    • Supported Helm charts description:

      • Helm chart name and version

      • Helm release name

      • Helm values

Cluster object

  • References the Credentials, KaaSRelease and ClusterRelease objects.

  • Is tied to a specific DE Container Cloud region and provider.

  • Represents all cluster-level resources. For example, for the OpenStack-based clusters, it represents networks, load balancer for the Kubernetes API, and so on. It uses data from the Credentials object to create these resources and data from the KaaSRelease and ClusterRelease objects to ensure that all lower-level cluster objects are created.

Machine object

  • References the Cluster object.

  • Represents one node of a managed cluster, for example, an OpenStack VM, and contains all data to provision it.

Credentials object

  • Contains all information necessary to connect to a provider back end.

  • Is tied to a specific DE Container Cloud region and provider.

PublicKey object

Is provided to every machine to enable SSH access.
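
The following minimal sketch illustrates how the Cluster, Machine, Credentials, and PublicKey objects reference each other. The API group and field names are assumptions for illustration and do not reproduce the exact DE Container Cloud API schema:

    # Hypothetical sketch of object relationships; the group and field names are assumptions
    apiVersion: cluster.example.mirantis.com/v1alpha1
    kind: Cluster
    metadata:
      name: demo-managed
      labels:
        kaas.mirantis.com/provider: openstack     # illustrative region/provider binding
    spec:
      credentialsRef:
        name: demo-openstack-credentials          # Credentials object with back-end access data
    ---
    apiVersion: cluster.example.mirantis.com/v1alpha1
    kind: Machine
    metadata:
      name: demo-managed-worker-0
    spec:
      clusterRef:
        name: demo-managed                        # references the Cluster object
      publicKeyRef:
        name: demo-public-key                     # PublicKey object that provides SSH access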

The following diagram illustrates the DE Container Cloud provider data flow:

_images/provider-dataflow.png

The DE Container Cloud provider performs the following operations in DE Container Cloud:

  • Consumes the following types of data from the management and regional clusters:

    • Credentials to connect to a provider back end

    • Deployment instructions from the KaaSRelease and ClusterRelease objects

    • The cluster-level parameters from the Cluster objects

    • The machine-level parameters from the Machine objects

  • Prepares data for all DE Container Cloud components:

    • Creates the LCMCluster and LCMMachine custom resources for LCM controller and LCM agent. The LCMMachine custom resources are created empty to be later handled by the LCM controller.

    • Creates the HelmBundle custom resources for the Helm controller using data from the KaaSRelease and ClusterRelease objects.

    • Creates service accounts for these custom resources.

    • Creates a scope in Identity and access management (IAM) for user access to a managed cluster.

  • Provisions nodes for a managed cluster using the cloud-init script that downloads and runs the LCM agent.
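
The cloud-init provisioning step above can be illustrated with the following #cloud-config sketch. The artifact URL, file paths, and unit definition are placeholders, not the actual DE Container Cloud values:

    #cloud-config
    # Hypothetical cloud-init user data; the URL, paths, and unit name are placeholders
    write_files:
      - path: /etc/systemd/system/lcm-agent.service
        content: |
          [Unit]
          Description=LCM agent (illustrative unit)
          [Service]
          ExecStart=/usr/local/bin/lcm-agent
          Restart=always
          [Install]
          WantedBy=multi-user.target
    runcmd:
      - curl -fsSL -o /usr/local/bin/lcm-agent https://binary.example.com/lcm/lcm-agent
      - chmod +x /usr/local/bin/lcm-agent
      - systemctl daemon-reload
      - systemctl enable --now lcm-agent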

Docker Enterprise Container Cloud release controller

The Docker Enterprise (DE) Container Cloud release controller is responsible for the following functionality:

  • Monitor and control the KaaSRelease and ClusterRelease objects present in a management cluster. If any release object is used in a cluster, the release controller prevents the deletion of such an object.

  • Sync the KaaSRelease and ClusterRelease objects published at https://binary.mirantis.com/releases/ with an existing management cluster.

  • Trigger the DE Container Cloud auto-upgrade procedure if a new KaaSRelease object is found:

    1. Search for the managed clusters with old Cluster releases that are not supported by a new DE Container Cloud release. If any are detected, abort the auto-upgrade and display a corresponding note about an old Cluster release in the DE Container Cloud web UI for the managed clusters. In this case, a user must update all managed clusters using the DE Container Cloud web UI. Once all managed clusters are upgraded to the Cluster releases supported by a new DE Container Cloud release, the DE Container Cloud auto-upgrade is retriggered by the release controller.

    2. Trigger the DE Container Cloud release upgrade of all DE Container Cloud components in a management cluster. The upgrade itself is processed by the DE Container Cloud provider.

    3. Trigger the Cluster release upgrade of a management cluster to the Cluster release version that is indicated in the upgraded DE Container Cloud release version.

    4. Verify the regional cluster(s) status. If the regional cluster is ready, trigger the Cluster release upgrade of the regional cluster.

      Once a management cluster is upgraded, an option to update a managed cluster becomes available in the DE Container Cloud web UI. During a managed cluster update, all cluster components including Kubernetes are automatically upgraded to newer versions if available.

Docker Enterprise Container Cloud web UI

The Docker Enterprise (DE) Container Cloud web UI is mainly designed to create and update the managed clusters as well as add or remove machines to or from an existing managed cluster. It also allows attaching existing UCP clusters.

You can use the DE Container Cloud web UI to obtain the management cluster details including endpoints, release version, and so on. The management cluster update occurs automatically, with the change log of a new release available through the DE Container Cloud web UI.

The DE Container Cloud web UI is a JavaScript application that is based on the React framework. The DE Container Cloud web UI is designed to work on the client side only. Therefore, it does not require a special back end. It interacts with the Kubernetes and Keycloak APIs directly. The DE Container Cloud web UI uses a Keycloak token to interact with the DE Container Cloud API and download kubeconfig for the management and managed clusters.

The DE Container Cloud web UI uses NGINX that runs on a management cluster and handles the DE Container Cloud web UI static files. NGINX proxies the Kubernetes and Keycloak APIs for the DE Container Cloud web UI.

Docker Enterprise Container Cloud bare metal

The bare metal service provides for the discovery, deployment, and management of bare metal hosts.

The bare metal management in Docker Enterprise (DE) Container Cloud is implemented as a set of modular microservices. Each microservice implements a certain requirement or function within the bare metal management system.

Bare metal components

The bare metal management solution for Docker Enterprise (DE) Container Cloud includes the following components:

Bare metal components

Component

Description

OpenStack Ironic

The back-end bare metal manager in a standalone mode with its auxiliary services that include httpd, dnsmasq, and mariadb.

OpenStack Ironic Inspector

Introspects and discovers the bare metal hosts inventory. Includes OpenStack Ironic Python Agent (IPA) that is used as a provision-time agent for managing bare metal hosts.

Ironic Operator

Monitors changes in the external IP addresses of httpd, ironic, and ironic-inspector and automatically reconciles the configuration for dnsmasq, ironic, baremetal-provider, and baremetal-operator.

Bare Metal Operator

Manages bare metal hosts through the Ironic API. The DE Container Cloud bare-metal operator implementation is based on the Metal³ project.

cluster-api-provider-baremetal

The plugin for the Kubernetes Cluster API integrated with DE Container Cloud. DE Container Cloud uses the Metal³ implementation of cluster-api-provider-baremetal for the Cluster API.

LCM agent

Used to manage physical and logical storage, physical and logical networking, and the life cycle of the bare metal machine resources.

Ceph

Distributed shared storage required by the DE Container Cloud services to create persistent volumes for storing their data.

MetalLB

Load balancer for Kubernetes services on bare metal.

NGINX

Load balancer for external access to the Kubernetes API endpoint.

Keepalived

Monitoring service that ensures availability of the virtual IP for the external load balancer endpoint (NGINX).

IPAM

IP address management services provide consistent IP address space to the machines in bare metal clusters. See details in IP Address Management.

The diagram below summarizes the following components and resource kinds:

  • Metal³-based bare metal management in DE Container Cloud (white)

  • Internal APIs (yellow)

  • External dependency components (blue)

_images/bm-component-stack.png
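
Because the bare metal management is based on Metal³, a physical server is typically described to the system as a BareMetalHost object. The following minimal sketch uses the upstream Metal³ v1alpha1 API; the addresses and secret name are placeholders, and DE Container Cloud may require additional provider-specific fields and labels:

    # Minimal BareMetalHost sketch based on the upstream Metal³ v1alpha1 API;
    # the MAC address, BMC address, and secret name are placeholders
    apiVersion: metal3.io/v1alpha1
    kind: BareMetalHost
    metadata:
      name: managed-worker-0
    spec:
      online: true
      bootMACAddress: "0c:c4:7a:aa:bb:01"    # MAC of the NIC connected to the Common/PXE network
      bmc:
        address: ipmi://192.168.100.11       # IPMI endpoint in the OOB network
        credentialsName: managed-worker-0-bmc-secret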

Docker Enterprise Container Cloud limitations on bare metal

The Docker Enterprise (DE) Container Cloud bare metal management system has the following limitations in terms of supported hardware capabilities:

  • Multiple storage devices are required

    To enable Ceph storage on the management cluster nodes, multiple storage devices are required for each node. The first device is always used by the operating system. At least one additional disk per server must be configured as a Ceph OSD device. The recommended number of OSD devices per management cluster node is two or more. DE Container Cloud supports up to 22 OSD devices per node.

  • Only the UEFI boot mode is supported

    All bare metal hosts used to deploy management and managed clusters must be configured to boot in the UEFI mode. Non-UEFI boot is not supported by the Bare Metal Operator component of DE Container Cloud.

IP Address Management

Docker Enterprise (DE) Container Cloud on bare metal uses IP Address Management (IPAM) to keep track of the network addresses allocated to bare metal hosts. This is necessary to avoid IP address conflicts and the expiration of address leases assigned to machines through DHCP.

The IPAM functions are provided by the kaas-ipam controller and a set of custom resources. A cluster API extension enables you to define the addresses and associate them with hosts. The addresses are then configured by the Ironic provisioning system using the cloud-init tool.

The kaas-ipam controller provides the following functionality:

  • Link the IPAM objects with the Cluster API objects, such as BareMetalHost or Machine, through the intermediate IpamHost objects.

  • Handle the IP pools and addresses as Kubernetes custom objects defined by CRDs.

  • Control the integration with DE Container Cloud.

You can apply complex networking configurations to a bare metal host using the L2 templates. The L2 templates imply multihomed host networking and enable you to create a managed cluster with more than one network interface for different types of traffic. Multihoming is required to ensure the security and performance of a managed cluster. By design, this feature should not touch the NIC that is used for PXE boot and LCM.

IPAM uses a single L3 network per management cluster, as defined in Cluster networking, to assign addresses to bare metal hosts.
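
As an illustration of the IPAM data model, the following sketch shows how an address pool for a cluster network might be described. The kind, API group, and field names are assumptions for illustration only and do not reproduce the actual kaas-ipam CRD schema:

    # Hypothetical IPAM subnet definition; the kind, group, and fields are illustrative only
    apiVersion: ipam.example.mirantis.com/v1alpha1
    kind: Subnet
    metadata:
      name: demo-lcm-subnet
    spec:
      cidr: 172.16.40.0/24
      gateway: 172.16.40.1
      includeRanges:
        - 172.16.40.100-172.16.40.200        # addresses available for allocation to hosts
      nameservers:
        - 172.16.40.1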

Extended hardware configuration

Docker Enterprise (DE) Container Cloud provides APIs that enable you to define hardware configurations that extend the reference architecture:

  • Bare Metal Host Profile API

    Enables quick configuration of the host boot and storage devices and assignment of custom configuration profiles to individual machines.

  • IP Address Management API

    Enables quick configuration of the host network interfaces and IP addresses, as well as setting up IP address ranges for automatic allocation.

Typically, operations with the extended hardware configurations are available through the API and CLI, but not the web UI.
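
For example, a custom storage layout could be expressed through the Bare Metal Host Profile API along the following lines. This is a hypothetical sketch: the kind, API group, and field names are illustrative assumptions, not the actual API schema:

    # Hypothetical host profile; the kind, group, and fields are illustrative only
    apiVersion: metal.example.mirantis.com/v1alpha1
    kind: BareMetalHostProfile
    metadata:
      name: storage-worker-profile
    spec:
      devices:
        - device: /dev/sda                   # system disk
          partitions:
            - name: root
              sizeGiB: 60
        - device: /dev/sdb                   # disk for LocalVolumeProvisioner
          partitions:
            - name: local-volumes
              sizeGiB: 60
        - device: /dev/sdc                   # left unpartitioned for a Ceph OSD
          wipe: true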

Kubernetes lifecycle management

The Kubernetes lifecycle management (LCM) engine in Docker Enterprise (DE) Container Cloud consists of the following components:

LCM controller

Responsible for all LCM operations. Consumes the LCMCluster object and orchestrates actions through LCM agent.

LCM agent

Relates only to UCP clusters deployed using DE Container Cloud, and is not used for attached UCP clusters. Runs on the target host. Executes Ansible playbooks in headless mode.

Helm controller

Responsible for the lifecycle of the Helm charts. It is installed by LCM controller and interacts with Tiller.

The Kubernetes LCM components handle the following custom resources:

  • LCMCluster

  • LCMMachine

  • HelmBundle

The following diagram illustrates handling of the LCM custom resources by the Kubernetes LCM components. On a managed cluster, apiserver handles multiple Kubernetes objects, for example, deployments, nodes, RBAC, and so on.

_images/lcm-components.png

LCM custom resources

The Kubernetes LCM components handle the following custom resources (CRs):

  • LCMMachine

  • LCMCluster

  • HelmBundle

LCMMachine

Describes a machine that is located on a cluster. It contains the machine type (control or worker) and the StateItems that correspond to Ansible playbooks or miscellaneous actions, for example, downloading a file or executing a shell command. Through its status, LCMMachine reflects the current state of the machine, for example, a node IP address, and of each StateItem. Multiple LCMMachine CRs can correspond to a single cluster.

LCMCluster

Describes a managed cluster. In its spec, LCMCluster contains a set of StateItems for each type of LCMMachine, which describe the actions that must be performed to deploy the cluster. LCMCluster is created by the provider, using machineTypes of the Release object. The status field of LCMCluster reflects the status of the cluster, for example, the number of ready or requested nodes.

HelmBundle

Wrapper for Helm charts that is handled by Helm controller. HelmBundle tracks what Helm charts must be installed on a managed cluster.
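
The following minimal LCMMachine sketch illustrates the relationship between the machine type, StateItems, and status described above. The field names are illustrative assumptions, not the exact lcm.mirantis.com schema:

    # Hypothetical LCMMachine; the field names are illustrative, not the exact schema
    apiVersion: lcm.mirantis.com/v1alpha1
    kind: LCMMachine
    metadata:
      name: demo-managed-worker-0
      namespace: demo-project
    spec:
      type: worker                           # control or worker
      stateItems:
        - name: prepare-packages             # maps to an Ansible playbook or a script
          phase: prepare
        - name: deploy-ucp-node
          phase: deploy
    status:
      hostInfo:
        ipAddress: 172.16.40.101             # reported by LCM agent
      state: Ready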

LCM controller

LCM controller runs on the management and regional clusters and orchestrates the LCMMachine objects according to their type and their LCMCluster object.

Once the LCMCluster and LCMMachine objects are created, LCM controller starts monitoring them to modify the spec fields and update the status fields of the LCMMachine objects when required. The status field of LCMMachine is updated by LCM agent running on a node of a management, regional, or managed cluster.

Each LCMMachine has the following lifecycle states:

  1. Uninitialized - the machine is not yet assigned to an LCMCluster.

  2. Pending - the agent reports a node IP address and hostname.

  3. Prepare - the machine executes StateItems that correspond to the prepare phase. This phase usually involves downloading the necessary archives and packages.

  4. Deploy - the machine executes StateItems that correspond to the deploy phase, that is, becoming a Universal Control Plane (UCP) node.

  5. Ready - the machine is deployed.

  6. Reconfigure - the machine is being updated with a new set of manager nodes. Once done, the machine moves to the ready state again.

The templates for StateItems are stored in the machineTypes field of an LCMCluster object, with separate lists for the UCP manager and worker nodes. Each StateItem has the execution phase field for a management, regional, and managed cluster:

  1. The prepare phase is executed for all machines for which it was not executed yet. This phase comprises downloading the files necessary for the cluster deployment, installing the required packages, and so on.

  2. During the deploy phase, a node is added to the cluster. LCM controller applies the deploy phase to the nodes in the following order:

    1. First manager node is deployed.

    2. The remaining manager nodes are deployed one by one and the worker nodes are deployed in batches (by default, up to 50 worker nodes at the same time). After at least one manager and one worker node are in the ready state, helm-controller is installed on the cluster.

LCM controller deploys and upgrades a Docker Enterprise (DE) Cluster with UCP by setting the StateItems of the LCMMachine objects following the corresponding StateItems phases described above. The DE Cluster with UCP upgrade process follows the same logic that is used for a new deployment, that is, applying a new set of StateItems to the LCMMachines after updating the LCMCluster object. However, during the upgrade, the following additional actions are performed:

  • If an existing worker node is being upgraded, LCM controller cordons and drains this node honoring the Pod Disruption Budgets. This operation prevents unexpected disruptions of the workloads.

  • LCM controller verifies that the required version of helm-controller is installed.

LCM agent

LCM agent handles a single machine that belongs to a management, regional, or managed cluster. It runs on the machine operating system but communicates with apiserver of the regional cluster. LCM agent is deployed as a systemd unit using cloud-init. LCM agent has a built-in self-upgrade mechanism.

LCM agent monitors the spec of a particular LCMMachine object to reconcile the machine state with the object StateItems and update the LCMMachine status accordingly. The actions that LCM agent performs while handling the StateItems are as follows:

  • Download configuration files

  • Run shell commands

  • Run Ansible playbooks in headless mode

LCM agent provides the IP address and hostname of the machine for the LCMMachine status parameter.

Helm controller

Helm controller is used by Docker Enterprise (DE) Container Cloud to handle the core add-ons of management, regional, and managed clusters, such as StackLight, and the application add-ons, such as the OpenStack components.

Helm controller runs in the same pod as the Tiller process. The Tiller gRPC endpoint is not accessible outside the pod. The pod is created by LCM controller using a StatefulSet inside a cluster once the cluster contains at least one manager node and one worker node.

The Helm release information is stored in the KaaSRelease object for the management and regional clusters and in the ClusterRelease object for all types of the DE Container Cloud clusters. These objects are used by the DE Container Cloud provider. The DE Container Cloud provider uses the information from the ClusterRelease object together with the DE Container Cloud API Cluster spec. In the Cluster spec, the operator can specify the Helm release name and charts to use. By combining the information from the Cluster providerSpec parameter and its ClusterRelease object, the cluster actuator generates the LCMCluster objects, which are further handled by LCM controller, and the HelmBundle object, which is handled by Helm controller. HelmBundle must have the same name as the LCMCluster object for the cluster that HelmBundle applies to.

Although a cluster actuator can only create a single HelmBundle per cluster, Helm controller can handle multiple HelmBundle objects per cluster.

Helm controller handles the HelmBundle objects and reconciles them with the Tiller state in its cluster. However, full reconciliation against Tiller is not supported yet; Helm controller relies on the status data of the HelmBundle objects instead.

Helm controller can also be used by the management cluster with corresponding HelmBundle objects created as part of the initial management cluster setup.
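
The following HelmBundle sketch illustrates how a set of Helm releases can be described for a cluster. The structure of the releases list and the value keys are assumptions for illustration, not the exact helmbundles.lcm.mirantis.com schema:

    # Hypothetical HelmBundle; the releases structure and value keys are illustrative
    apiVersion: lcm.mirantis.com/v1alpha1
    kind: HelmBundle
    metadata:
      name: demo-managed                     # must have the same name as the LCMCluster object
    spec:
      releases:
        - name: stacklight                   # Helm release name
          chartURL: https://binary.example.com/helm/stacklight-0.1.0.tgz   # placeholder chart URL
          version: 0.1.0
          values:
            highAvailabilityEnabled: true    # illustrative value key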

Identity and access management

Identity and access management (IAM) provides a central point for managing users and permissions on the Docker Enterprise (DE) Container Cloud cluster resources in a granular and unified manner. IAM also provides the infrastructure for a single sign-on user experience across all DE Container Cloud web portals.

IAM for DE Container Cloud consists of the following components:

Keycloak
  • Provides the OpenID Connect endpoint

  • Integrates with an external Identity Provider (IdP), for example, existing LDAP or Google Open Authorization (OAuth)

  • Stores roles mapping for users

IAM controller
  • Provides IAM API with data about DE Container Cloud projects

  • Handles all role-based access control (RBAC) components in Kubernetes API

IAM API

Provides an abstraction API for creating user scopes and roles

IAM API and CLI

Mirantis IAM exposes a versioned and backward-compatible Google remote procedure call (gRPC) protocol API that the IAM CLI interacts with.

IAM API is designed as a user-facing functionality. For this reason, it operates in the context of user authentication and authorization.

In IAM API, an operator can use the following entities:

  • Grants - to grant or revoke user access

  • Scopes - to describe user roles

  • Users - to provide user account information

Docker Enterprise (DE) Container Cloud UI interacts with IAM API on behalf of the user. However, the user can directly work with IAM API using IAM CLI. IAM CLI uses the OpenID Connect (OIDC) endpoint to obtain the OIDC token for authentication in IAM API and enable you to perform different API operations.

The following diagram illustrates the interaction between IAM API and CLI:

_images/iam-api-cli.png

External Identity Provider integration

To be consistent and keep the integrity of a user database and user permissions, IAM in Docker Enterprise (DE) Container Cloud stores the user identity information internally. However, in real deployments, an identity provider usually already exists.

Out of the box, IAM in DE Container Cloud supports a post-deployment integration with LDAP and Google Open Authorization (OAuth) using Keycloak. If LDAP is configured as an external identity provider, IAM performs one-way synchronization, mapping attributes according to the configuration.

In the case of the Google Open Authorization (OAuth) integration, the user is automatically registered and their credentials are stored in the internal database according to the user template configuration. The Google OAuth registration workflow is as follows:

  1. The user requests a DE Container Cloud web UI resource.

  2. The user is redirected to the IAM login page and logs in using the Log in with Google account option.

  3. IAM creates a new user with the default access rights that are defined in the user template configuration.

  4. The user can access the DE Container Cloud web UI resource.

The following diagram illustrates the external IdP integration to IAM:

_images/iam-ext-idp.png

You can configure simultaneous integration with both external IdPs with the user identity matching feature enabled.

Authentication and authorization

Mirantis IAM uses the OpenID Connect (OIDC) protocol for handling authentication.

Implementation flow

Mirantis IAM acts as an OpenID Connect (OIDC) provider: it issues tokens and exposes discovery endpoints.

The credentials can be handled by IAM itself or delegated to an external identity provider (IdP).

The issued JSON Web Token (JWT) is sufficient to perform operations across Docker Enterprise (DE) Container Cloud according to the scope and role defined in it. Mirantis recommends using asymmetric cryptography for token signing (RS256) to minimize the dependency between IAM and managed components.

When DE Container Cloud calls UCP, the user in Keycloak is created automatically with a JWT issued by Keycloak on behalf of the end user. UCP, in turn, verifies whether the JWT is issued by Keycloak. If the user retrieved from the token does not exist in the UCP database, the user is automatically created in the UCP database based on the information from the token.

The authorization implementation is out of the scope of IAM in DE Container Cloud. This functionality is delegated to the component level. IAM interacts with a DE Container Cloud component using the OIDC token content; the component itself processes the token and enforces the required authorization. Such an approach enables you to use any underlying authorization that is not dependent on IAM and still provide a unified user experience across all DE Container Cloud components.

Kubernetes CLI authentication flow

The following diagram illustrates the Kubernetes CLI authentication flow. The authentication flow for Helm and other Kubernetes-oriented CLI utilities is identical to the Kubernetes CLI flow, but JSON Web Tokens (JWT) must be pre-provisioned.

_images/iam-authn-k8s.png
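
For example, a kubeconfig that authenticates through Keycloak can rely on the standard Kubernetes OIDC auth-provider mechanism, as in the sketch below. The issuer URL, client ID, and tokens are placeholders, and the kubeconfig generated by DE Container Cloud may differ in detail:

    # Sketch of an OIDC-enabled kubeconfig user entry; all values are placeholders
    apiVersion: v1
    kind: Config
    users:
      - name: demo-user
        user:
          auth-provider:
            name: oidc
            config:
              idp-issuer-url: https://keycloak.example.com/auth/realms/iam
              client-id: kubernetes
              id-token: <JWT issued by Keycloak>
              refresh-token: <refresh token obtained from Keycloak>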

Storage

The baremetal-based Docker Enterprise (DE) Container Cloud uses Ceph as a distributed storage system for file, block, and object storage. This section provides an overview of a Ceph cluster deployed by DE Container Cloud.

Overview

Docker Enterprise (DE) Container Cloud deploys Ceph on the baremetal-based management and managed clusters using Helm charts with the following components:

  • Ceph controller - a Kubernetes controller that obtains the parameters from DE Container Cloud through a custom resource (CR), creates CRs for Rook, and updates its CR status based on the Ceph cluster deployment progress. It creates users, pools, and keys for OpenStack and Kubernetes and provides Ceph configurations and keys to access them. Also, Ceph controller eventually obtains the data from the OpenStack Controller for the Keystone integration and updates the RADOS Gateway services configurations to use Kubernetes for user authentication.

  • Ceph operator

    • Transforms user parameters from the DE Container Cloud web UI into Rook credentials and deploys a Ceph cluster using Rook.

    • Provides integration of the Ceph cluster with Kubernetes

    • Provides data for OpenStack to integrate with the deployed Ceph cluster

  • Custom resource (CR) - represents the customization of a Kubernetes installation and allows you to define the required Ceph configuration through the DE Container Cloud web UI before deployment. For example, you can define the failure domain, pools, node roles, number of Ceph components such as Ceph OSDs, and so on.

  • Rook - a storage orchestrator that deploys Ceph on top of a Kubernetes cluster.

A typical Ceph cluster consists of the following components:

Ceph Monitors

Three or, in rare cases, five Ceph Monitors.

Ceph Managers

Mirantis recommends having three Ceph Managers in every cluster.

RADOS Gateway services

Mirantis recommends having three or more RADOS Gateway services for HA.

Ceph OSDs

The number of OSDs may vary according to the deployment needs.

The placement of Ceph Monitors and Ceph Managers is defined in the custom resource.

The following diagram illustrates the way a Ceph cluster is deployed in DE Container Cloud:

_images/ceph-deployment.png

The following diagram illustrates the processes within a deployed Ceph cluster:

_images/ceph-data-flow.png

Limitations

A Ceph cluster configuration in Docker Enterprise (DE) Container Cloud includes, but is not limited to, the following limitations:

  • Only one Ceph controller per management, regional, or managed cluster and only one Ceph cluster per Ceph controller are supported.

  • Only one CRUSH tree per cluster. The separation of devices per Ceph pool is supported through device classes with only one pool of each type for a device class.

  • Only the following types of CRUSH buckets are supported:

    • topology.kubernetes.io/region

    • topology.kubernetes.io/zone

    • topology.rook.io/datacenter

    • topology.rook.io/room

    • topology.rook.io/pod

    • topology.rook.io/pdu

    • topology.rook.io/row

    • topology.rook.io/rack

    • topology.rook.io/chassis

  • RBD mirroring is not supported.

  • Consuming an existing Ceph cluster is not supported.

  • CephFS is not supported.

  • Only IPv4 is supported.

  • If two or more OSDs are located on the same device, there must be no dedicated WAL or DB for this class.

  • Only full collocation or dedicated WAL and DB configurations are supported.

  • All CRUSH rules must have the same failure_domain.

Monitoring

Docker Enterprise (DE) Container Cloud uses StackLight, the logging, monitoring, and alerting solution that provides a single pane of glass for cloud maintenance and day-to-day operations as well as offers critical insights into cloud health including operational information about the components deployed in management, regional, and managed clusters. StackLight is based on Prometheus, an open-source monitoring solution and a time series database.

Deployment architecture

Docker Enterprise (DE) Container Cloud deploys the StackLight stack as a release of a Helm chart that contains the helm-controller and helmbundles.lcm.mirantis.com (HelmBundle) custom resources. The StackLight HelmBundle consists of a set of Helm charts with the StackLight components that include:

StackLight components overview

StackLight component

Description

Alerta

Receives, consolidates, and deduplicates the alerts sent by Alertmanager and visually represents them through a simple web UI. Using the Alerta web UI, you can view the most recent or watched alerts, and group and filter alerts.

Alertmanager

Handles the alerts sent by client applications such as Prometheus, deduplicates, groups, and routes alerts to receiver integrations. Using the Alertmanager web UI, you can view the most recent fired alerts, silence them, or view the Alertmanager configuration.

Elasticsearch curator

Maintains the data (indexes) in Elasticsearch by performing such operations as creating, closing, or opening an index as well as deleting a snapshot. Also, manages the data retention policy in Elasticsearch.

Elasticsearch exporter

The Prometheus exporter that gathers internal Elasticsearch metrics.

Grafana

Builds and visually represents metric graphs based on time series databases. Grafana supports querying of Prometheus using the PromQL language.

Database back ends

StackLight uses PostgreSQL for Alerta and Grafana. PostgreSQL reduces the data storage fragmentation while enabling high availability. High availability is achieved using Patroni, the PostgreSQL cluster manager that monitors for node failures and manages failover of the primary node. StackLight also uses Patroni to manage major version upgrades of PostgreSQL clusters, which allows leveraging the database engine functionality and improvements as they are introduced upstream in new releases, maintaining functional continuity without version lock-in.

Logging stack

Responsible for collecting, processing, and persisting logs and Kubernetes events. By default, when deploying through the DE Container Cloud web UI, only the metrics stack is enabled on managed clusters. To enable StackLight to gather managed cluster logs, enable the logging stack during deployment. On management clusters, the logging stack is enabled by default. The logging stack components include:

  • Elasticsearch, which stores logs and notifications.

  • Fluentd-elasticsearch, which collects logs, sends them to Elasticsearch, generates metrics based on analysis of incoming log entries, and exposes these metrics to Prometheus.

  • Kibana, which provides real-time visualization of the data stored in Elasticsearch and enables you to detect issues.

  • Metricbeat, which collects Kubernetes events and sends them to Elasticsearch for storage.

  • Prometheus-es-exporter, which presents the Elasticsearch data as Prometheus metrics by periodically sending configured queries to the Elasticsearch cluster and exposing the results to a scrapable HTTP endpoint like other Prometheus targets.

Metric collector

Collects telemetry data (CPU or memory usage, number of active alerts, and so on) from Prometheus and sends the data to centralized cloud storage for further processing and analysis. Metric collector is enabled by default and runs on the management cluster.

Netchecker

Monitors the network connectivity between the Kubernetes nodes. Netchecker runs on managed clusters.

Prometheus

Gathers metrics. Automatically discovers and monitors the endpoints. Using the Prometheus web UI, you can view simple visualizations and debug. By default, the Prometheus database stores metrics of the past 15 days.

Prometheus-es-exporter

Presents the Elasticsearch data as Prometheus metrics by periodically sending configured queries to the Elasticsearch cluster and exposing the results to a scrapable HTTP endpoint like other Prometheus targets.

Prometheus node exporter

Gathers hardware and operating system metrics exposed by the kernel.

Prometheus Relay

Adds a proxy layer to Prometheus to merge the results from underlay Prometheus servers to prevent gaps in case some data is missing on some servers. Is available only in the HA StackLight mode.

Pushgateway

Enables ephemeral and batch jobs to expose their metrics to Prometheus. Since these jobs may not exist long enough to be scraped, they can instead push their metrics to Pushgateway, which then exposes these metrics to Prometheus. Pushgateway is not an aggregator or a distributed counter but rather a metrics cache. The pushed metrics are exactly the same as scraped from a permanently running program.

Telegraf

Collects metrics from the system. Telegraf is plugin-driven and has the concept of two distinct sets of plugins: input plugins collect metrics from the system, services, or third-party APIs; output plugins write and expose metrics to various destinations.

The Telegraf agents used in DE Container Cloud include:

  • telegraf-ds-smart monitors SMART disks, and runs on both management and managed clusters.

  • telegraf-ironic monitors Ironic on the baremetal-based management clusters. The ironic input plugin collects and processes data from the Ironic HTTP API, while the http_response input plugin checks the Ironic HTTP API availability. As an output plugin, Telegraf uses prometheus to expose the collected data as a Prometheus target.

  • telegraf-ucp gathers metrics from the Docker Engine - Enterprise API about the Docker nodes, networks, and Swarm services. This is a Docker Telegraf input plugin with the downstream additions.

Telemeter

Enables a multi-cluster view through a Grafana dashboard of the management cluster. Telemeter includes a Prometheus federation push server and clients to enable isolated Prometheus instances, which cannot be scraped from a central Prometheus instance, to push metrics to the central location. Telemeter server runs on the management cluster and Telemeter clients run on the managed clusters.

Every Helm chart contains a default values.yaml file. These default values are partially overridden by custom values defined in the StackLight Helm chart.
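
For example, an override of the chart defaults might look as follows. This is a hypothetical sketch: the parameter names are illustrative assumptions, not the actual StackLight chart keys:

    # Hypothetical StackLight values override; the keys are illustrative only
    highAvailabilityEnabled: true
    prometheusServer:
      retentionTime: 15d
      persistentVolumeClaimSize: 16Gi
    elasticsearch:
      retentionTime: 1w
      persistentVolumeClaimSize: 30Gi
    alertmanager:
      emailNotifications:
        enabled: true
        to: ops@example.com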

Before deploying a management or managed cluster, you can select the HA or non-HA StackLight architecture type. The non-HA mode is set by default. The following table lists the differences between the HA and non-HA modes:

StackLight database modes

Non-HA StackLight mode (default):

  • One Prometheus instance

  • One Elasticsearch instance

  • One PostgreSQL instance

  One persistent volume is provided for storing data. In case of a service or node failure, a new pod is redeployed and the volume is reattached to provide the existing data. Such a setup has a reduced hardware footprint but provides less performance.

HA StackLight mode:

  • Two Prometheus instances

  • Three Elasticsearch instances

  • Three PostgreSQL instances

  Local Volume Provisioner is used to provide local host storage. In case of a service or node failure, the traffic is automatically redirected to any other running Prometheus or Elasticsearch server. For better performance, Mirantis recommends that you deploy StackLight in the HA mode.

Authentication flow

StackLight provides five web UIs including Prometheus, Alertmanager, Alerta, Kibana, and Grafana. Access to StackLight web UIs is protected by Keycloak-based Identity and access management (IAM). All web UIs except Alerta are exposed to IAM through the IAM proxy middleware. The Alerta configuration provides direct integration with IAM.

The following diagram illustrates accessing the IAM-proxied StackLight web UIs, for example, Prometheus web UI:

_images/sl-auth-iam-proxied.png

Authentication flow for the IAM-proxied StackLight web UIs:

  1. A user enters the public IP of a StackLight web UI, for example, Prometheus web UI.

  2. The public IP leads to IAM proxy, deployed as a Kubernetes LoadBalancer, which protects the Prometheus web UI.

  3. LoadBalancer routes the HTTP request to Kubernetes internal IAM proxy service endpoints, specified in the X-Forwarded-Proto or X-Forwarded-Host headers.

  4. The Keycloak login form opens (--discovery-url in the IAM proxy, which points to Keycloak realm) and the user enters the user name and password.

  5. Keycloak validates the user name and password.

  6. The user obtains access to the Prometheus web UI (--upstream-url in the IAM proxy).

Note

  • The discovery URL is the URL of the IAM service.

  • The upstream URL is the hidden endpoint of a web UI (Prometheus web UI in the example above).

The following diagram illustrates accessing the Alerta web UI:

_images/sl-authentication-direct.png

Authentication flow for the Alerta web UI:

  1. A user enters the public IP of the Alerta web UI.

  2. The public IP leads to Alerta deployed as a Kubernetes service of the LoadBalancer type.

  3. LoadBalancer routes the HTTP request to the Kubernetes internal Alerta service endpoint.

  4. The Keycloak login form opens (Alerta refers to the IAM realm) and the user enters the user name and password.

  5. Keycloak validates the user name and password.

  6. The user obtains access to the Alerta web UI.

Supported features

Using the Docker Enterprise (DE) Container Cloud web UI, on the pre-deployment stage of a managed cluster, you can view, enable, disable, or tune the following StackLight features:

  • StackLight HA mode.

  • Database retention size and time for Prometheus.

  • Tunable index retention period for Elasticsearch.

  • Tunable PersistentVolumeClaim (PVC) size for Prometheus and Elasticsearch set to 16 GB for Prometheus and 30 GB for Elasticsearch by default. The PVC size must be logically aligned with the retention periods or sizes for these components.

  • Email and Slack receivers for the Alertmanager notifications.

  • Predefined set of dashboards.

  • Predefined set of alerts and capability to add new custom alerts for Prometheus in the following exemplary format:

    - alert: HighErrorRate
      expr: job:request_latency_seconds:mean5m{job="myjob"} > 0.5
      for: 10m
      labels:
        severity: page
      annotations:
        summary: High request latency
    

Monitored components

StackLight measures, analyzes, and reports in a timely manner the failures that may occur in the following Docker Enterprise (DE) Container Cloud components and their sub-components, if any:

  • Ceph

  • Ironic (DE Container Cloud bare-metal provider)

  • Kubernetes services:

    • Calico

    • etcd

    • Kubernetes cluster

    • Kubernetes containers

    • Kubernetes deployments

    • Kubernetes nodes

    • Netchecker

  • NGINX

  • Node hardware and operating system

  • PostgreSQL

  • SMART disks

  • StackLight:

    • Alertmanager

    • Elasticsearch

    • Grafana

    • Prometheus

    • Prometheus Relay

    • Pushgateway

    • Telemeter

  • SSL certificates

  • UCP

    • Docker/Swarm metrics (through Telegraf)

    • Built-in UCP metrics

Hardware and system requirements

Using Docker Enterprise (DE) Container Cloud, you can deploy a Kubernetes cluster on bare metal, OpenStack, or Amazon Web Services (AWS). Each provider requires corresponding resources.

Note

Using the free Mirantis license, you can create up to three DE Container Cloud managed clusters with three worker nodes on each cluster. Within the same quota, you can also attach existing UCP clusters that are not deployed by DE Container Cloud. If you need to increase this quota, contact Mirantis support for further details.

Baremetal-based cluster

Reference hardware configuration

The following hardware configuration is used as a reference to deploy Docker Enterprise (DE) Container Cloud with bare metal DE Clusters with UCP.

Reference hardware configuration for DE Container Cloud clusters on bare metal

Management cluster:

  • # of servers: 3

  • Server model: Supermicro 1U SYS-6018R-TDW

  • CPU model: Intel Xeon E5-2620v4

  • # of CPUs: 1

  • # of vCPUs: 16

  • RAM, GB: 96

  • SSD system disk, GB: 1x 960 [2]

  • SSD/HDD storage disk, GB: 2x 1900 [0]

  • NIC model: Intel X520-DA2

  • # of NICs: 2 [1]

  • # of server ifaces (onboard + NICs): 6

Managed cluster:

  • # of servers: 6 [3]

  • Server model: Supermicro 1U SYS-6018R-TDW

  • CPU model: Intel Xeon E5-2620v4

  • # of CPUs: 1

  • # of vCPUs: 16

  • RAM, GB: 96

  • SSD system disk, GB: 1x 960 [2]

  • SSD/HDD storage disk, GB: 2x 1900 [0]

  • NIC model: Intel X520-DA2

  • # of NICs: 2 [1]

  • # of server ifaces (onboard + NICs): 6

[0] Minimum 3 storage disks are required:

  • sda - minimum 60 GB for system

  • sdb - minimum 60 GB for LocalVolumeProvisioner

  • sdc - for Ceph

For the default storage schema, see Operations Guide: Default host system storage

[1] Only one PXE NIC per node is allowed.

[2] A management cluster requires 2 volumes for DE Container Cloud (total 50 GB) and 5 volumes for StackLight (total 60 GB). A managed cluster requires 5 volumes for StackLight.

[3] Three manager nodes for HA and three worker storage nodes for a minimal Ceph cluster.

System requirements for the seed node

The seed node is necessary only to deploy the management cluster. When the bootstrap is complete, the bootstrap node can be redeployed and its resources can be reused for the managed cluster workloads.

The minimum reference system requirements for a baremetal-based bootstrap seed node are as follows:

  • Basic server on Ubuntu 18.04 with the following configuration:

    • Kernel version 4.15.0-76.86 or later

    • 8 GB of RAM

    • 4 CPU

    • 10 GB of free disk space for the bootstrap cluster cache

  • No DHCP or TFTP servers on any NIC network

  • Routable access to the IPMI network of the hardware servers. For more details, see Host networking.

  • Internet access to download all required artifacts

Host networking

The following network roles are defined for all Docker Enterprise (DE) Container Cloud cluster nodes on bare metal, including the bootstrap, management, and managed cluster nodes:

  • Out-of-band (OOB) network

    Connects the Baseboard Management Controllers (BMC) of the hosts to Ironic. This network, or multiple networks if managed clusters have their own OOB networks, must be accessible from the PXE network through IP routing.

  • Common/PXE network

    Is a general purpose network used to remotely boot servers through the PXE protocol as well as for the Kubernetes API access and Kubernetes pods traffic. This network is shared between the management and managed clusters.

    Warning

    Only one Ethernet port on a host must be connected to the Common/PXE network at any given time. The physical address (MAC) of this interface must be noted and used to configure the BareMetalHost object describing the host.

The initially installed bootstrap node or node0 must be connected to the following networks:

  • The OOB network. Ironic must have access to the IPMI/BMC of the managed bare metal hosts. However, Ironic must not be connected to the L2 segment directly. The OOB network must be accessible through the Router 1 in the PXE network.

  • The Common/PXE network. The instance of the kaas-bm running on node0 provides DHCP service on this network. This service is required for Ironic to inspect the bare metal hosts and install the operating system. The bootstrap node must be directly connected to the PXE network to ensure the L2 connectivity for DHCP. The default route for node0 must point to the Router 1 in the PXE network.

The DE Container Cloud bootstrap cluster node has the following networking configuration:

_images/bm-bootstrap-network.png

A management cluster node has the following networking configuration:

_images/bm-mgmt-network.png

A managed cluster node has the following network configuration:

_images/bm-managed-network.png

Cluster networking

The following diagram illustrates the L3 networking schema for the final state of the bare metal deployment as described in Host networking.

_images/bm-cluster-l3-networking.png

Network fabric

The following diagram illustrates the physical and virtual L2 underlay networking schema for the final state of the Docker Enterprise (DE) Container Cloud bare metal deployment.

_images/bm-cluster-physical-and-l2-networking.png

The network fabric reference configuration is a spine/leaf with 2 leaf ToR switches and one out-of-band (OOB) switch per rack.

Reference configuration uses the following switches for ToR and OOB:

  • Cisco WS-C3560E-24TD with 24x 1 GbE ports, used in the OOB network segment.

  • Dell Force10 S4810P with 48x 1/10 GbE ports, used as ToR in the Common/PXE network segment.

In the reference configuration, all odd interfaces from NIC0 are connected to TOR Switch 1, and all even interfaces from NIC0 are connected to TOR Switch 2. The Baseboard Management Controller (BMC) interfaces of the servers are connected to OOB Switch 1.

OpenStack-based cluster

While planning the deployment of an OpenStack-based Docker Enterprise (DE) Cluster with UCP, consider the following general requirements:

  • Kubernetes on OpenStack requires the Cinder and Octavia APIs availability.

  • The only supported OpenStack networking is Open vSwitch. Other networking technologies, such as Tungsten Fabric, are not supported.

  • The bootstrap and management clusters must have access to *.mirantis.com to download the release information and artifacts.

Requirements for an OpenStack-based DE Cluster with UCP

# of nodes

  • Bootstrap cluster [0]: 1

  • Management cluster: 3 (HA) + 1 (Bastion)

  • Managed cluster: 5 (6 with StackLight HA)

  Comments:

  • A bootstrap cluster requires access to the OpenStack API.

  • A management cluster requires 3 nodes for the manager nodes HA.

  • A managed cluster requires 3 nodes for the manager nodes HA and 2 nodes for the Docker Enterprise (DE) Container Cloud workloads. If the multiserver mode is enabled for StackLight, 3 nodes are required for the DE Container Cloud workloads.

  • A management cluster requires 1 node for the Bastion instance that is created with a public IP address to allow SSH access to instances.

# of vCPUs per node

  • Bootstrap cluster: 2

  • Management cluster: 8

  • Managed cluster: 8

  Comments:

  • The Bastion node requires 1 vCPU.

  • Refer to the RAM recommendations described below to plan resources for different types of nodes.

RAM in GB per node

  • Bootstrap cluster: 4

  • Management cluster: 16

  • Managed cluster: 16

  Comments:

  To prevent issues with low RAM, Mirantis recommends the following types of instances for a managed cluster with 50-200 nodes:

  • 16 vCPUs and 32 GB of RAM - manager node

  • 16 vCPUs and 128 GB of RAM - nodes where the StackLight server components run

  The Bastion node requires 1 GB of RAM.

Storage in GB per node

  • Bootstrap cluster: 5 (available)

  • Management cluster: 120

  • Managed cluster: 120

  Comments: For the Bastion node, the default amount of storage is enough.

Operating system

  • Bootstrap cluster: Ubuntu 16.04 or 18.04

  • Management cluster: Ubuntu 18.04

  • Managed cluster: Ubuntu 18.04

  Comments: For a management and managed cluster, a base Ubuntu 18.04 image with the default SSH ubuntu user name must be present in Glance.

Docker version

  • Bootstrap cluster: 18.09

  • Management cluster: -

  • Managed cluster: -

  Comments: For a management and managed cluster, Docker Engine - Enterprise 19.03.12 is deployed by DE Container Cloud as a CRI.

OpenStack version

  • Bootstrap cluster: -

  • Management cluster: Queens

  • Managed cluster: Queens

Obligatory OpenStack components

  • Bootstrap cluster: -

  • Management cluster: Octavia, Cinder, OVS

  • Managed cluster: Octavia, Cinder, OVS

# of Cinder volumes

  • Bootstrap cluster: -

  • Management cluster: 7 (total 110 GB)

  • Managed cluster: 5 (total 60 GB)

  Comments:

  • A management cluster requires 2 volumes for DE Container Cloud (total 50 GB) and 5 volumes for StackLight (total 60 GB)

  • A managed cluster requires 5 volumes for StackLight

# of load balancers

  • Bootstrap cluster: -

  • Management cluster: 10

  • Managed cluster: 6

  Comments:

  • LBs for a management cluster: 1 for Kubernetes, 4 for DE Container Cloud, 5 for StackLight

  • LBs for a managed cluster: 1 for Kubernetes and 5 for StackLight

# of floating IPs

  • Bootstrap cluster: -

  • Management cluster: 13

  • Managed cluster: 11

  Comments:

  • FIPs for a management cluster: 1 for Kubernetes, 3 for the manager nodes (one FIP per node), 4 for DE Container Cloud, 5 for StackLight

  • FIPs for a managed cluster: 1 for Kubernetes, 3 for the manager nodes, 2 for the worker nodes, 5 for StackLight

[0] The bootstrap cluster is necessary only to deploy the management cluster. When the bootstrap is complete, this cluster can be deleted and its resources can be reused for the managed cluster workloads.

AWS-based cluster

While planning the deployment of an AWS-based Docker Enterprise (DE) Cluster with UCP, consider the requirements described below.

Warning

Some of the AWS features required for Docker Enterprise (DE) Container Cloud may not be included in your AWS account quota. Therefore, carefully consider the AWS fees that may apply to your account for the DE Container Cloud infrastructure.

Requirements for an AWS-based DE Cluster with UCP

# of nodes

  • Bootstrap cluster [0]: 1

  • Management cluster: 3 (HA)

  • Managed cluster: 5 (6 with StackLight HA)

  Comments:

  • A bootstrap cluster requires access to the Mirantis CDN.

  • A management cluster requires 3 nodes for the manager nodes HA.

  • A managed cluster requires 3 nodes for the manager nodes HA and 2 nodes for the DE Container Cloud workloads. If the multiserver mode is enabled for StackLight, 3 nodes are required for the DE Container Cloud workloads.

# of vCPUs per node

  • Bootstrap cluster: 2

  • Management cluster: 8

  • Managed cluster: 8

RAM in GB per node

  • Bootstrap cluster: 4

  • Management cluster: 16

  • Managed cluster: 16

Storage in GB per node

  • Bootstrap cluster: 5 (available)

  • Management cluster: 120

  • Managed cluster: 120

Operating system

  • Bootstrap cluster: Ubuntu 16.04 or 18.04

  • Management cluster: Ubuntu 18.04

  • Managed cluster: Ubuntu 18.04

  Comments: For a management and managed cluster, a base Ubuntu 18.04 image with the default SSH ubuntu user name is required.

Docker version

  • Bootstrap cluster: 18.09

  • Management cluster: -

  • Managed cluster: -

  Comments: For a management and managed cluster, Docker Engine - Enterprise 19.03.12 is deployed by DE Container Cloud as a CRI.

Instance type

  • Bootstrap cluster: -

  • Management cluster: c5d.2xlarge

  • Managed cluster: c5d.2xlarge

  Comments:

  To prevent issues with low RAM, Mirantis recommends the following types of instances for a managed cluster with 50-200 nodes:

  • c5d.4xlarge - manager node

  • r5.4xlarge - nodes where the StackLight server components run

Bastion host instance type

  • Bootstrap cluster: -

  • Management cluster: t2.micro

  • Managed cluster: t2.micro

  Comments: The Bastion instance is created with a public Elastic IP address to allow SSH access to instances.

# of volumes

  • Bootstrap cluster: -

  • Management cluster: 7 (total 110 GB)

  • Managed cluster: 5 (total 60 GB)

  Comments:

  • A management cluster requires 2 volumes for DE Container Cloud (total 50 GB) and 5 volumes for StackLight (total 60 GB)

  • A managed cluster requires 5 volumes for StackLight

# of Elastic load balancers to be used

  • Bootstrap cluster: -

  • Management cluster: 10

  • Managed cluster: 6

  Comments:

  • Elastic LBs for a management cluster: 1 for Kubernetes, 4 for DE Container Cloud, 5 for StackLight

  • Elastic LBs for a managed cluster: 1 for Kubernetes and 5 for StackLight

# of Elastic IP addresses to be used

  • Bootstrap cluster: -

  • Management cluster: 1

  • Managed cluster: 1

[0] The bootstrap cluster is necessary only to deploy the management cluster. When the bootstrap is complete, this cluster can be deleted and its resources can be reused for the managed cluster workloads.

UCP API limitations

To ensure the stability of Docker Enterprise (DE) Container Cloud in managing the DE Container Cloud-based UCP clusters, the following UCP API functionality is not available for such clusters as compared to the attached UCP clusters that are not deployed by DE Container Cloud. Use the DE Container Cloud web UI or CLI for this functionality instead.

Public APIs limitations in a DE Container Cloud-based UCP cluster

API endpoint

Limitation

GET /swarm

Swarm Join Tokens are filtered out for all users, including admins.

PUT /api/ucp/config-toml

All requests are forbidden.

POST /nodes/{id}/update

Requests for the following changes are forbidden:

  • Change Role

  • Add or remove the com.docker.ucp.orchestrator.swarm and com.docker.ucp.orchestrator.kubernetes labels.

DELETE /nodes/{id}

All requests are forbidden.