Introduction

This documentation describes how to deploy and operate Mirantis Kubernetes Engine (MKE). It is intended to help operators understand the core concepts of the product and provides the information needed to deploy and operate the solution.

The information in this documentation set is continually improved and amended based on feedback and requests from MKE users.

Product Overview

Mirantis Kubernetes Engine (MKE, formerly Universal Control Plane or UCP) is the industry-leading container orchestration platform for developing and running modern applications at scale, on private clouds, public clouds, and on bare metal.

MKE delivers immediate value to your business by allowing you to adopt modern application development and delivery models that are cloud-first and cloud-ready. With MKE you get a centralized place with a graphical UI to manage and monitor your Kubernetes and/or Swarm cluster instance.

Your business benefits from using MKE as a container orchestration platform, especially in the following use cases:

More than one container orchestrator

Whether your application requirements are complex and require medium to large clusters or simple ones that can be deployed quickly on development environments, MKE gives you a container orchestration choice. Deploy Kubernetes, Swarm, or both types of clusters and manage them on a single MKE instance or centrally manage your instance using Mirantis Container Cloud.

Robust and scalable applications deployment

Where monolithic applications were once the norm, microservices are now the standard way to deploy applications at scale. Delivering applications through an automated CI/CD pipeline can dramatically improve time-to-market and service agility. Adopting microservices becomes much easier when you use Kubernetes and/or Swarm clusters to deploy and test microservice-based applications.

Multi-tenant software offerings

Containerizing existing monolithic SaaS applications enables quicker development cycles and automated continuous integration and deployment. However, such applications must allow multiple users to share a single instance of the software. MKE can operate multi-tenant environments, isolating teams and organizations, separating cluster resources, and so on.

See also

Kubernetes

Reference Architecture

The MKE Reference Architecture provides a technical overview of Mirantis Kubernetes Engine (MKE). It is your source for the product hardware and software specifications, standards, component information, and configuration detail.

Introduction to MKE

Mirantis Kubernetes Engine (MKE) is a containerized application that serves to simplify the deployment, configuration, and monitoring of your applications at scale.

Centralized cluster management

With Docker, you can join thousands of physical or virtual machines together to create a container cluster that allows you to deploy your applications at scale. MKE extends the functionality provided by Docker to make it easier to manage your cluster from a centralized place.

You can manage and monitor your container cluster using a graphical UI.

Deploy, manage, and monitor

With MKE, you can manage all of your available computing resources, such as nodes, volumes, and networks, from a centralized place.

You can also deploy and monitor your applications and services.

Built-in security and access control

MKE has its own built-in authentication mechanism and integrates with LDAP services. It also has role-based access control (RBAC), so that you can control who can access and make changes to your cluster and applications.

MKE integrates with Mirantis Secure Registry (MSR) so that you can keep the Docker images you use for your applications behind your firewall, where they are safe and can’t be tampered with.

You can also enforce security policies and only allow running applications that use Docker images you know and trust.

Use through the Docker CLI client

Because MKE exposes the standard Docker API, you can continue using the tools you already know, including the Docker CLI client, to deploy and manage your applications.

For example, you can use the docker info command to check the status of a cluster that’s managed by MKE:

docker info

This command produces the output that you expect from MKE:

Containers: 38
Running: 23
Paused: 0
Stopped: 15
Images: 17
Server Version: 19.03.13
...
Swarm: active
NodeID: ocpv7el0uz8g9q7dmw8ay4yps
Is Manager: true
ClusterID: tylpv1kxjtgoik2jnrg8pvkg6
Managers: 1
…

Once the MKE instance is deployed, developers and IT operations no longer interact with Mirantis Container Runtime directly, but interact with MKE instead.

MKE leverages the clustering and orchestration functionality provided by Docker.

A swarm is a collection of nodes that are in the same Docker cluster. Nodes in a Docker swarm operate in one of two modes: manager or worker. If nodes are not already running in a swarm when you install MKE, they are configured to run in swarm mode.
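For example, once nodes have joined the swarm, you can list them and confirm their roles by running the following standard Docker CLI command from a manager node:

docker node ls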

When you deploy MKE, it starts running a globally scheduled service called ucp-agent. This service monitors the node where it’s running and starts and stops MKE services, based on whether the node is a manager or a worker node.

If the node is a:

  • Manager: the ucp-agent service automatically starts serving all MKE components, including the MKE web UI and the data stores that MKE uses. The ucp-agent accomplishes this by deploying several containers on the node. By promoting nodes to the manager role, you make MKE highly available and fault tolerant.

  • Worker: the ucp-agent service starts serving a proxy service that ensures only authorized users and other MKE services can run Docker commands on that node. The ucp-agent deploys a subset of containers on worker nodes.

MKE internal components

The core component of MKE is a globally scheduled service called ucp-agent. When you install MKE on a node, or join a node to a swarm that’s being managed by MKE, the ucp-agent service starts running on that node.

Once this service is running, it deploys containers with other MKE components, and it ensures they keep running. The MKE components that are deployed on a node depend on whether the node is a manager or a worker.

Note

Some MKE component names depend on the node operating system. For example, on Windows, the ucp-agent component is named ucp-agent-win.

MKE hardware requirements

Take careful note of the minimum and recommended hardware requirements for MKE manager and worker nodes prior to deployment.

Note

  • High availability (HA) installations require transferring files between hosts.

  • On manager nodes, MKE only supports the workloads it requires to run.

  • Windows container images are typically larger than Linux container images. As such, provision more local storage for Windows nodes and for any MSR repositories that store Windows container images.

Minimum and recommended hardware requirements

Minimum hardware requirements

Manager nodes:

  • 8 GB of RAM

  • 2 vCPUs

  • 10 GB storage for the /var partition

  • 25 GB storage for /var/lib/kubelet/ for upgrade [1]

Worker nodes:

  • 4 GB of RAM

  • 500 MB storage for the /var partition

  • 25 GB storage for /var/lib/kubelet/ for upgrade [1]

Recommended hardware requirements

Manager nodes:

  • 16 GB RAM

  • 4 vCPUs

  • 25 - 100 GB storage, allocated as follows:

    • 25 GB for a single /var/ partition, or

    • Separate partitions with a minimum of:

      • 6 GB for /var/lib/kubelet/ (for installations)

      • 4 GB for /var/lib/containerd/

      • 15 GB for /var/lib/docker/

Worker nodes:

  • Recommendations vary depending on the workloads.

[1] To override, use the --force-minimums flag.

MKE software requirements

Prior to MKE deployment, consider the following software requirements:

  • Run the same MCR version (19.03.8 or higher) on all nodes.

  • Run Linux kernel 3.10 or higher on all nodes.

    For debugging purposes, the host OS kernel versions should match as closely as possible.

  • Use a static IP address for each node in the cluster.
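For example, to confirm the MCR version and kernel version on a node prior to installation, you can run the following standard commands:

docker version --format '{{.Server.Version}}'
uname -r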

Manager nodes

Manager nodes manage a swarm and persist the swarm state. Using several containers per node, the ucp-agent automatically deploys all MKE components on manager nodes, including the MKE web UI and the data stores that MKE uses.

The following table details the MKE services that run on manager nodes:

MKE components on manager nodes

MKE component

Description

k8s_calico-kube-controllers

A cluster-scoped Kubernetes controller used to coordinate Calico networking. Runs on one manager node only.

k8s_calico-node

The Calico node agent, which coordinates networking fabric according to the cluster-wide Calico configuration. Part of the calico-node DaemonSet. Runs on all nodes. Configure the container network interface (CNI) plugin using the --cni-installer-url flag. If this flag is not set, MKE uses Calico as the default CNI plugin.

k8s_install-cni_calico-node

A container in which the Calico CNI plugin binaries are installed and configured on each host. Part of the calico-node DaemonSet. Runs on all nodes.

k8s_POD_calico-node

The Pause containers for the calico-node pod.

k8s_POD_calico-kube-controllers

The Pause containers for the calico-kube-controllers pod.

k8s_POD_compose

The Pause containers for the compose pod.

k8s_POD_kube-dns

The Pause containers for the kube-dns pod.

k8s_ucp-dnsmasq-nanny

A dnsmasq instance used in the Kubernetes DNS Service. Part of the kube-dns deployment. Runs on one manager node only.

k8s_ucp-kube-compose

A custom Kubernetes resource component that translates Compose files into Kubernetes constructs. Part of the compose deployment. Runs on one manager node only.

k8s_ucp-kube-dns

The main Kubernetes DNS Service, used by pods to resolve service names. Part of the kube-dns deployment, a set of three containers deployed through Kubernetes as a single pod. Provides service discovery for Kubernetes services and pods. Runs on one manager node only.

k8s_ucp-kubedns-sidecar

A daemon of the Kubernetes DNS Service responsible for health checking and metrics. Part of the kube-dns deployment. Runs on one manager node only.

ucp-agent

The agent that monitors the node and ensures the right MKE services are running. Named ucp-agent-win on Windows nodes.

ucp-auth-api

The centralized service for identity and authentication used by MKE and MSR.

ucp-auth-store

A container that stores authentication configurations and data for users, organizations, and teams.

ucp-auth-worker

A container that performs scheduled LDAP synchronizations and cleans authentication and authorization data.

ucp-client-root-ca

A certificate authority to sign client bundles.

ucp-cluster-root-ca

A certificate authority used for TLS communication between MKE components.

ucp-controller

The MKE web server.

ucp-dsinfo

A Docker system script for collecting troubleshooting information. Named ucp-dsinfo-win on Windows nodes.

ucp-hardware-info

A container for collecting disk/hardware information about the host.

ucp-interlock

A container that monitors Swarm workloads configured to use Layer 7 routing. Only runs when you enable Layer 7 routing.

ucp-interlock-proxy

A service that provides load balancing and proxying for Swarm workloads. Only runs when you enable Layer 7 routing.

ucp-kube-apiserver

A master component that serves the Kubernetes API. It persists its state in etcd directly, and all other components communicate directly with the API server. The Kubernetes API server is configured to encrypt Secrets using AES-CBC with a 256-bit key. The encryption key is never rotated and is stored on manager nodes, in a file on disk.

ucp-kube-controller-manager

A master component that manages the desired state of controllers and other Kubernetes objects. It monitors the API server and performs background tasks when needed.

ucp-kubelet

The Kubernetes node agent running on every node, which is responsible for running Kubernetes pods, reporting the health of the node, and monitoring resource usage.

ucp-kube-proxy

The networking proxy running on every node, which enables pods to contact Kubernetes services and other pods by way of cluster IP addresses.

ucp-kube-scheduler

A master component that handles pod scheduling. It communicates with the API server only to obtain workloads that need to be scheduled.

ucp-kv

A container used to store the MKE configurations. Do not use it in your applications, as it is for internal use only. Also used by Kubernetes components.

ucp-metrics

A container used to collect and process metrics for a node, such as the disk space available.

ucp-node-feature-discovery

A container that provides node feature discovery labels for Kubernetes nodes.

ucp-nvidia-gpu-device-discovery

A container that provides GPU feature discovery to automatically label nodes with NVIDIA hardware devices.

ucp-nvidia-device-plugin

A container that allows GPU-enabled Kubernetes workloads to run on MKE.

ucp-proxy

A TLS proxy that allows secure access from the local Mirantis Container Runtime to MKE components.

ucp-reconcile

A container that converges the node to its desired state whenever the ucp-agent service detects that the node is not running the correct MKE components. This container should remain in an exited state when the node is healthy.

ucp-swarm-manager

A container used to provide backwards compatibility with Docker Swarm.

Worker nodes

Worker nodes are instances of MCR that participate in a swarm for the purpose of executing containers. Such nodes receive and execute tasks dispatched from manager nodes. Worker nodes do not participate in the Raft distributed state, perform scheduling, or serve the swarm mode HTTP API, and thus every cluster requires at least one manager node.

The following table details the MKE services that run on worker nodes.

MKE components on worker nodes

MKE component

Description

k8s_calico-node

The Calico node agent, which coordinates networking fabric according to the cluster-wide Calico configuration. Part of the calico-node DaemonSet. Runs on all nodes.

k8s_install-cni_calico-node

A container that installs the Calico CNI plugin binaries and configuration on each host. Part of the calico-node DaemonSet. Runs on all nodes.

k8s_POD_calico-node

The Pause containers for the calico-node pod. By default, these containers are hidden, but you can see them by running the following command:

docker ps -a

ucp-agent

A service that monitors the node and ensures that the correct MKE services are running. On worker nodes, the ucp-agent service ensures that only authorized users and other MKE services can run Docker commands on the node. The ucp-agent deploys a subset of containers on worker nodes.

ucp-interlock-extension

A helper service that reconfigures the ucp-interlock-proxy service, based on the Swarm workloads that are running.

ucp-interlock-proxy

A service that provides load balancing and proxying for swarm workloads. Only runs when you enable Layer 7 routing.

ucp-dsinfo

A Docker system script for collecting information that assists with troubleshooting. On Windows nodes the component name is ucp-dsinfo-win.

ucp-hardware-info

A container for collecting disk and hardware information about the host.

ucp-kubelet

The Kubernetes node agent running on every node, which is responsible for running Kubernetes pods, reporting the health of the node, and monitoring resource usage.

ucp-kube-proxy

The networking proxy running on every node, which enables pods to contact Kubernetes services and other pods through cluster IP addresses.

ucp-node-feature-discovery

A container that provides node feature discovery labels for Kubernetes nodes.

ucp-nvidia-gpu-device-discovery

A container that provides GPU feature discovery to automatically label nodes with NVIDIA hardware devices.

ucp-nvidia-device-plugin

A container that allows GPU-enabled Kubernetes workloads to run on MKE.

ucp-reconcile

A container that converges the node to its desired state whenever the ucp-agent service detects that the node is not running the correct MKE components. This container should remain in an exited state when the node is healthy.

ucp-proxy

A TLS proxy that allows secure access from the local Mirantis Container Runtime to MKE components.

See also

Kubernetes

Admission controllers

Admission controllers are plugins that govern and enforce cluster usage. There are two types of admission controllers: default and custom. The tables below list the available admission controllers. For more information, see Kubernetes documentation: Using Admission Controllers.

Note

You cannot enable or disable custom admission controllers.


Default admission controllers

Name

Description

DefaultStorageClass

Adds a default storage class to PersistentVolumeClaim objects that do not request a specific storage class.

DefaultTolerationSeconds

Sets the pod default forgiveness toleration to tolerate the notready:NoExecute and unreachable:NoExecute taints based on the default-not-ready-toleration-seconds and default-unreachable-toleration-seconds Kubernetes API server input parameters if they do not already have toleration for the node.kubernetes.io/not-ready:NoExecute or node.kubernetes.io/unreachable:NoExecute taints. The default value for both input parameters is five minutes.

LimitRanger

Ensures that incoming requests do not violate the constraints in a namespace LimitRange object.

MutatingAdmissionWebhook

Calls any mutating webhooks that match the request.

NamespaceLifecycle

Ensures that users cannot create new objects in namespaces undergoing termination and that MKE rejects requests in nonexistent namespaces. It also prevents users from deleting the reserved default, kube-system, and kube-public namespaces.

NodeRestriction

Limits the Node and Pod objects that a kubelet can modify.

PersistentVolumeLabel (deprecated)

Attaches region or zone labels automatically to PersistentVolumes as defined by the cloud provider.

PodNodeSelector

Limits which node selectors can be used within a namespace by reading a namespace annotation and a global configuration.

PodSecurityPolicy

Determines whether a new or modified pod should be admitted based on the requested security context and the available Pod Security Policies.

ResourceQuota

Observes incoming requests and ensures they do not violate any of the constraints in a namespace ResourceQuota object.

ServiceAccount

Implements automation for ServiceAccount resources.

ValidatingAdmissionWebhook

Calls any validating webhooks that match the request.


Custom admission controllers

Name

Description

UCPAuthorization

  • Annotates Docker Compose-on-Kubernetes Stack resources with the identity of the user performing the request so that the Docker Compose-on-Kubernetes resource controller can manage Stacks with correct user authorization.

  • Detects the deleted ServiceAccount resources to correctly remove them from the scheduling authorization back end of an MKE node.

  • Simplifies creation of the RoleBindings and ClusterRoleBindings resources by automatically converting user, organization, and team Subject names into their corresponding unique identifiers.

  • Prevents users from deleting the built-in cluster-admin, ClusterRole, or ClusterRoleBinding resources.

  • Prevents under-privileged users from creating or updating PersistentVolume resources with host paths.

  • Works in conjunction with the built-in PodSecurityPolicies admission controller to prevent under-privileged users from creating Pods with privileged options.

CheckImageSigning

Enforces the MKE Docker Content Trust policy which, if enabled, requires that all pods use container images that have been digitally signed by trusted and authorized users, that is, by members of one or more teams in MKE.

UCPNodeSelector

Adds a com.docker.ucp.orchestrator.kubernetes:* toleration to pods in the kube-system namespace and removes the com.docker.ucp.orchestrator.kubernetes tolerations from pods in other namespaces. This ensures that user workloads do not run on swarm-only nodes, which MKE taints with com.docker.ucp.orchestrator.kubernetes:NoExecute. It also adds a node affinity to prevent pods from running on manager nodes depending on MKE settings.

Pause containers

Every Kubernetes Pod includes an empty pause container, which bootstraps the pod to establish all of the cgroups, reservations, and namespaces before its individual containers are created. The pause container image is always present, so the pod resource allocation happens instantaneously as containers are created.

Pause containers are hidden by default. To display pause containers:

docker ps -a | grep -i pause

Example of system response:

8c9707885bf6   dockereng/ucp-pause:3.0.0-6d332d3   "/pause"  47 hours ago   Up 47 hours   k8s_POD_calico-kube-controllers-559f6948dc-5c84l_kube-system_d00e5130-1bf4-11e8-b426-0242ac110011_0
258da23abbf5   dockereng/ucp-pause:3.0.0-6d332d3   "/pause"  47 hours ago   Up 47 hours   k8s_POD_kube-dns-6d46d84946-tqpzr_kube-system_d63acec6-1bf4-11e8-b426-0242ac110011_0
2e27b5d31a06   dockereng/ucp-pause:3.0.0-6d332d3   "/pause"  47 hours ago   Up 47 hours   k8s_POD_compose-698cf787f9-dxs29_kube-system_d5866b3c-1bf4-11e8-b426-0242ac110011_0
5d96dff73458   dockereng/ucp-pause:3.0.0-6d332d3   "/pause"  47 hours ago   Up 47 hours   k8s_POD_calico-node-4fjgv_kube-system_d043a0ea-1bf4-11e8-b426-0242ac110011_0

See also

Kubernetes Pods

Volumes

MKE uses named volumes to persist data on all nodes on which it runs.

Volumes used by MKE

Volume name

Contents

ucp-auth-api-certs

Certificate and keys for the authentication and authorization service.

ucp-auth-store-certs

Certificate and keys for the authentication and authorization store.

ucp-auth-store-data

Data of the authentication and authorization store, replicated across managers.

ucp-auth-worker-certs

Certificate and keys for authentication worker.

ucp-auth-worker-data

Data of the authentication worker.

ucp-client-root-ca

Root key material for the MKE root CA that issues client certificates.

ucp-cluster-root-ca

Root key material for the MKE root CA that issues certificates for swarm members.

ucp-controller-client-certs

Certificate and keys that the MKE web server uses to communicate with other MKE components.

ucp-controller-server-certs

Certificate and keys for the MKE web server running in the node.

ucp-kv

MKE configuration data, replicated across managers.

ucp-kv-certs

Certificates and keys for the key-value store.

ucp-metrics-data

Monitoring data that MKE gathers.

ucp-metrics-inventory

Configuration file that the ucp-metrics service uses.

ucp-node-certs

Certificate and keys for node communication.

ucp-backup

Backup artifacts that are created while processing a backup. The artifacts persist on the volume for the duration of the backup and are cleaned up when the backup completes, though the volume itself remains.

You can customize the volume driver for the volumes by creating the volumes prior to installing MKE. During installation, MKE determines which volumes do not yet exist on the node and creates those volumes using the default volume driver.

By default, MKE stores the data for these volumes at /var/lib/docker/volumes/<volume-name>/_data.

Configuration

The table below presents the configuration files in use by MKE:

Configuration files in use by MKE

Configuration file name

Description

com.docker.interlock.extension

Configuration of the Interlock extension service that monitors and configures the proxy service

com.docker.interlock.proxy

Configuration of the service that handles and routes user requests

com.docker.license

MKE license

com.docker.ucp.interlock.conf

Configuration of the core Interlock service

Web UI and CLI

You can interact with MKE either through the web UI or the CLI.

With the MKE web UI you can manage your swarm, grant and revoke user permissions, deploy, configure, manage, and monitor your applications.

In addition, MKE exposes the standard Docker API, so you can continue using existing tools such as the Docker CLI client. Because MKE secures your cluster with RBAC, you must configure your Docker CLI client and other client tools to authenticate your requests using client certificates that you can download from your MKE profile page.
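For example, a typical way to configure the Docker CLI after downloading a client bundle from your MKE profile page is to extract the bundle and source its env.sh script (the bundle file name shown is illustrative):

unzip ucp-bundle-admin.zip -d mke-bundle
cd mke-bundle
eval "$(<env.sh)"
docker info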

MKE limitations

See also

Kubernetes

Installation Guide

The MKE Installation Guide provides everything you need to install and configure Mirantis Kubernetes Engine (MKE). The guide offers detailed information, procedures, and examples that are specifically designed to help DevOps engineers and administrators install and configure the MKE container orchestration platform.

Plan the deployment

Default install directories

The following table details the default MKE install directories:

Path

Description

/var/lib/docker

Docker data root directory

/var/lib/kubelet

kubelet data root directory (created with ftype = 1)

/var/lib/containerd

containerd data root directory (created with ftype = 1)

Host name strategy

Before installing MKE, plan a single host name strategy to use consistently throughout the cluster, keeping in mind that MKE and MCR both use host names.

There are two general strategies for creating host names: short host names and fully qualified domain names (FQDN). Consider the following examples:

  • Short host name: engine01

  • Fully qualified domain name: node01.company.example.com
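Whichever strategy you choose, set the host name on each node before installing MKE. For example, on systemd-based distributions (the host name shown is illustrative):

sudo hostnamectl set-hostname node01.company.example.com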

MCR considerations

Take the following MCR considerations into account when deploying any MKE cluster.

default-address-pools

MCR uses three separate IP ranges for the docker0, docker_gwbridge, and ucp-bridge interfaces. By default, MCR assigns the first available subnet in default-address-pools (172.17.0.0/16) to docker0, the second (172.18.0.0/16) to docker_gwbridge, and the third (172.19.0.0/16) to ucp-bridge.

Note

The ucp-bridge bridge network specifically supports MKE component containers.

You can reassign the docker0, docker_gwbridge, and ucp-bridge subnets in default-address-pools. To do so, replace the relevant values in default-address-pools, making sure that the setting includes at least three IP pools.

By default, default-address-pools contains the following values:

{
  "default-address-pools": [
   {"base":"172.17.0.0/16","size":16}, <-- docker0
   {"base":"172.18.0.0/16","size":16}, <-- docker_gwbridge
   {"base":"172.19.0.0/16","size":16}, <-- ucp-bridge
   {"base":"172.20.0.0/16","size":16},
   {"base":"172.21.0.0/16","size":16},
   {"base":"172.22.0.0/16","size":16},
   {"base":"172.23.0.0/16","size":16},
   {"base":"172.24.0.0/16","size":16},
   {"base":"172.25.0.0/16","size":16},
   {"base":"172.26.0.0/16","size":16},
   {"base":"172.27.0.0/16","size":16},
   {"base":"172.28.0.0/16","size":16},
   {"base":"172.29.0.0/16","size":16},
   {"base":"172.30.0.0/16","size":16},
   {"base":"192.168.0.0/16","size":20}
   ]
 }

The default-address-pools parameters

Parameter

Description

default-address-pools

The list of CIDR ranges used to allocate subnets for local bridge networks.

base

The CIDR range allocated for bridge networks in each IP address pool.

size

The CIDR netmask that determines the subnet size to allocate from the base pool. If the size matches the netmask of the base, then the pool contains one subnet. For example, {"base":"172.17.0.0/16","size":16} creates the subnet: 172.17.0.0/16 (172.17.0.1 - 172.17.255.255).

For example, {"base":"192.168.0.0/16","size":20} allocates /20 subnets from 192.168.0.0/16, including the following subnets for bridge networks:

192.168.0.0/20 (192.168.0.1 - 192.168.15.255)

192.168.16.0/20 (192.168.16.1 - 192.168.31.255)

192.168.32.0/20 (192.168.32.1 - 192.168.47.255)

192.168.48.0/20 (192.168.48.1 - 192.168.63.255)

192.168.64.0/20 (192.168.64.1 - 192.168.79.255)

192.168.240.0/20 (192.168.240.1 - 192.168.255.255)

docker0

MCR creates and configures the host system with the docker0 virtual network interface, an ethernet bridge through which all traffic between MCR and the container moves. MCR uses docker0 to handle all container routing. You can specify an alternative network interface when you start the container.

MCR allocates IP addresses from the docker0 configurable IP range to the containers that connect to docker0. The default IP range, or subnet, for docker0 is 172.17.0.0/16.

You can change the docker0 subnet in daemon.json using the following settings:

Parameter

Description

default-address-pools

Modify the first pool in default-address-pools.

Caution

By default, MCR assigns the second pool to docker_gwbridge. If you modify the first pool such that the size does not match the base netmask, it can affect docker_gwbridge.

{
   "default-address-pools": [
         {"base":"172.17.0.0/16","size":16}, <-- Modify this value
         {"base":"172.18.0.0/16","size":16},
         {"base":"172.19.0.0/16","size":16},
         {"base":"172.20.0.0/16","size":16},
         {"base":"172.21.0.0/16","size":16},
         {"base":"172.22.0.0/16","size":16},
         {"base":"172.23.0.0/16","size":16},
         {"base":"172.24.0.0/16","size":16},
         {"base":"172.25.0.0/16","size":16},
         {"base":"172.26.0.0/16","size":16},
         {"base":"172.27.0.0/16","size":16},
         {"base":"172.28.0.0/16","size":16},
         {"base":"172.29.0.0/16","size":16},
         {"base":"172.30.0.0/16","size":16},
         {"base":"192.168.0.0/16","size":20}
   ]
}

fixed-cidr

Configures a CIDR range.

Customize the subnet for docker0 using standard CIDR notation. The default subnet is 172.17.0.0/16, the network gateway is 172.17.0.1, and MCR allocates IPs 172.17.0.2 - 172.17.255.254 for your containers.

{
  "fixed-cidr": "172.17.0.0/16"
}

bip

Configures a gateway IP address and CIDR netmask of the docker0 network.

Customize the subnet for docker0 using the <gateway IP>/<CIDR netmask> notation. The default subnet is 172.17.0.0/16, the network gateway is 172.17.0.1, and MCR allocates IPs 172.17.0.2 - 172.17.255.254 for your containers.

{
  "bip": "172.17.0.1/16"
}

docker_gwbridge

The docker_gwbridge is a virtual network interface that connects overlay networks (including ingress) to individual MCR container networks. Initializing a Docker swarm or joining a Docker host to a swarm automatically creates docker_gwbridge in the kernel of the Docker host. The default docker_gwbridge subnet (172.18.0.0/16) is the second available subnet in default-address-pools.

To change the docker_gwbridge subnet, open daemon.json and modify the second pool in default-address-pools:

{
    "default-address-pools": [
       {"base":"172.17.0.0/16","size":16},
       {"base":"172.18.0.0/16","size":16}, <-- Modify this value
       {"base":"172.19.0.0/16","size":16},
       {"base":"172.20.0.0/16","size":16},
       {"base":"172.21.0.0/16","size":16},
       {"base":"172.22.0.0/16","size":16},
       {"base":"172.23.0.0/16","size":16},
       {"base":"172.24.0.0/16","size":16},
       {"base":"172.25.0.0/16","size":16},
       {"base":"172.26.0.0/16","size":16},
       {"base":"172.27.0.0/16","size":16},
       {"base":"172.28.0.0/16","size":16},
       {"base":"172.29.0.0/16","size":16},
       {"base":"172.30.0.0/16","size":16},
       {"base":"192.168.0.0/16","size":20}
   ]
}

Caution

  • Modifying the first pool to customize the docker0 subnet can affect the default docker_gwbridge subnet. Refer to docker0 for more information.

  • You can only customize the docker_gwbridge settings before you join the host to the swarm or after temporarily removing it.

Docker swarm

The default address pool that Docker Swarm uses for its overlay network is 10.0.0.0/8. If this pool conflicts with your current network implementation, you must use a custom IP address pool. Prior to installing MKE, specify your custom address pool using the --default-addr-pool option when initializing swarm.
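For example, the following sketch initializes a swarm with a custom address pool before MKE installation (the pool and subnet size shown are illustrative):

docker swarm init --default-addr-pool 10.200.0.0/16 --default-addr-pool-mask-length 24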

Note

The Swarm default-addr-pool and MCR default-address-pools settings define two separate IP address ranges used for different purposes.

Kubernetes

Kubernetes uses two internal IP ranges, either of which can overlap and conflict with the underlying infrastructure, thus requiring custom IP ranges.

The pod network

Calico or the Azure IPAM service assigns each Kubernetes pod an IP address in the default 192.168.0.0/16 range. To customize this range, use the --pod-cidr flag with the ucp install command during MKE installation.

The services network

You can access Kubernetes services with a VIP in the default 10.96.0.0/16 Cluster IP range. To customize this range, during MKE installation, use the --service-cluster-ip-range flag with the ucp install command.
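For example, the following sketch installs MKE with custom ranges for both networks, assuming the mirantis/ucp bootstrapper image (the image tag, host address, and CIDR values are illustrative):

docker container run --rm -it --name ucp \
  -v /var/run/docker.sock:/var/run/docker.sock \
  mirantis/ucp:3.5.5 install \
  --host-address 10.10.0.5 \
  --pod-cidr 10.32.0.0/16 \
  --service-cluster-ip-range 10.96.0.0/16 \
  --interactive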

docker data-root

Docker data root (data-root in /etc/docker/daemon.json) is the storage path for persisted data such as images, volumes, and cluster state.

MKE clusters require that all nodes have the same docker data-root for the Kubernetes network to function correctly. In addition, if you change the data-root on all nodes, you must recreate the Kubernetes network configuration in MKE by running the following commands:

kubectl -n kube-system delete configmap/calico-config
kubectl -n kube-system delete ds/calico-node deploy/calico-kube-controllers

See also

Kubernetes

no-new-privileges

MKE currently does not support no-new-privileges: true in the /etc/docker/daemon.json file, as this causes several MKE components to enter a failed state.

Perform pre-deployment configuration

Configure networking

A well-configured network is essential for the proper functioning of your MKE deployment. Pay particular attention to such key factors as IP address provisioning, port management, and traffic enablement.

IP considerations

Before installing MKE, adopt the following practices when assigning IP addresses:

  • Ensure that your network and nodes support using a static IPv4 address and assign one to every node.

  • Avoid IP range conflicts. The following list details the recommended addresses for each component, to help you avoid IP range conflicts:

    • MCR default-address-pools (CIDR range for interface and bridge networks): 172.17.0.0/16 - 172.30.0.0/16, 192.168.0.0/16

    • Swarm default-addr-pool (CIDR range for Swarm overlay networks): 10.0.0.0/8

    • Kubernetes pod-cidr (CIDR range for Kubernetes pods): 192.168.0.0/16

    • Kubernetes service-cluster-ip-range (CIDR range for Kubernetes services): 10.96.0.0/16

See also

Kubernetes

Open ports to incoming traffic

When installing MKE on a host, you need to open specific ports to incoming traffic. Each port listens for incoming traffic from a particular set of hosts, known as the port scope.

MKE uses the following scopes:

Scope

Description

External

Traffic arrives from outside the cluster through end-user interaction.

Internal

Traffic arrives from other hosts in the same cluster.

Self

Traffic arrives at that port only from processes on the same host.


Open the following ports for incoming traffic on each host type (port, hosts, scope, and purpose):

  • TCP 179 (managers, workers; internal): BGP peers, used for Kubernetes networking

  • TCP 443, configurable (managers; external, internal): MKE web UI and API

  • TCP 2376, configurable (managers; internal): Docker swarm manager, used for backwards compatibility

  • TCP 2377, configurable (managers; internal): Control communication between swarm nodes

  • UDP 4789 (managers, workers; internal): Overlay networking

  • TCP 6443, configurable (managers; external, internal): Kubernetes API server endpoint

  • TCP 6444 (managers, workers; self): Kubernetes API reverse proxy

  • TCP, UDP 7946 (managers, workers; internal): Gossip-based clustering

  • TCP 9099 (managers, workers; self): Calico health check

  • TCP 10250 (managers, workers; internal): Kubelet

  • TCP 12376 (managers, workers; internal): TLS authentication proxy that provides access to MCR

  • TCP 12378 (managers, workers; self): etcd reverse proxy

  • TCP 12379 (managers; internal): etcd Control API

  • TCP 12380 (managers; internal): etcd Peer API

  • TCP 12381 (managers; internal): MKE cluster certificate authority

  • TCP 12382 (managers; internal): MKE client certificate authority

  • TCP 12383 (managers; internal): Authentication storage back end

  • TCP 12384 (managers; internal): Authentication storage back end for replication across managers

  • TCP 12385 (managers; internal): Authentication service API

  • TCP 12386 (managers; internal): Authentication worker

  • TCP 12387 (managers; internal): Prometheus server

  • TCP 12388 (managers; internal): Kubernetes API server

  • TCP 12389 (managers, workers; self): Hardware Discovery API

Cluster and service networking options

Available since MKE 3.5.0

MKE supports the following cluster and service networking options:

  • Kube-proxy with iptables proxier, and either the managed CNI or an unmanaged alternative

  • Kube-proxy with ipvs proxier, and either the managed CNI or an unmanaged alternative

  • eBPF mode with either the managed CNI or an unmanaged alternative

You can configure cluster and service networking options at install time or in existing clusters. For detail on reconfiguring existing clusters, refer to Configure cluster and service networking in an existing cluster in the MKE Operations Guide.

Caution

Swarm workloads that require the use of encrypted overlay networks must use iptables proxier with either the managed CNI or an unmanaged alternative. Be aware that the other networking options detailed here automatically disable Docker Swarm encrypted overlay networks.


To enable kube-proxy with iptables proxier while using the managed CNI:

Kube-proxy with iptables proxier is the default option, equivalent to specifying --kube-proxy-mode=iptables at install time. To verify that the option is operational, confirm the presence of the following line in the ucp-kube-proxy container logs:

I1027 05:35:27.798469        1 server_others.go:212] Using iptables Proxier.

To enable kube-proxy with ipvs proxier while using the managed CNI:

  1. Prior to MKE installation, verify that the following kernel modules are available on all Linux manager and worker nodes:

    • ipvs

    • ip_vs_rr

    • ip_vs_wrr

    • ip_vs_sh

    • nf_conntrack_ipv4

  2. Specify --kube-proxy-mode=ipvs at install time.

  3. Optional. Once installation is complete, configure the following ipvs-related parameters in the MKE configuration file (otherwise, MKE will use the Kubernetes default parameter settings):

    • ipvs_exclude_cidrs = ""

    • ipvs_min_sync_period = ""

    • ipvs_scheduler = ""

    • ipvs_strict_arp = false

    • ipvs_sync_period = ""

    • ipvs_tcp_timeout = ""

    • ipvs_tcpfin_timeout = ""

    • ipvs_udp_timeout = ""

    For more information on using these parameters, refer to kube-proxy in the Kubernetes documentation.

    Note

    The ipvs-related parameters have no install time counterparts and therefore must only be configured once MKE installation is complete.

  4. Verify that kube-proxy with ipvs proxier is operational by confirming the presence of the following lines in the ucp-kube-proxy container logs:

    I1027 05:14:50.868486     1 server_others.go:274] Using ipvs Proxier.
    W1027 05:14:50.868822     1 proxier.go:445] IPVS scheduler not specified, use rr by default
    

To enable eBPF mode while using the managed CNI:

  1. Verify that the prerequisites for eBPF use have been met, including kernel compatibility, for all Linux manager and worker nodes. Refer to the Calico documentation Enable the eBPF dataplane for more information.

  2. Specify --calico-ebpf-enabled at install time.

  3. Verify that eBPF mode is operational by confirming the presence of the following lines in the ucp-kube-proxy container logs:

    KUBE_PROXY_MODE (disabled) CLEANUP_ON_START_DISABLED true
    "Sleeping forever...."
    

To enable kube-proxy with iptables proxier while using an unmanaged CNI:

  1. Specify --unmanaged-cni at install time.

  2. Verify that kube-proxy with iptables proxier is operational by confirming the presence of the following line in the ucp-kube-proxy container logs:

    I1027 05:35:27.798469     1 server_others.go:212] Using iptables Proxier.
    

To enable kube-proxy with ipvs proxier while using an unmanaged CNI:

  1. Specify the following parameters at install time:

    • --unmanaged-cni

    • --kube-proxy-mode=ipvs

  2. Verify that kube-proxy with ipvs proxier is operational by confirming the presence of the following lines in the ucp-kube-proxy container logs:

    I1027 05:14:50.868486     1 server_others.go:274] Using ipvs Proxier.
    W1027 05:14:50.868822     1 proxier.go:445] IPVS scheduler not specified, use rr by default
    

To enable eBPF mode while using an unmanaged CNI:

  1. Verify that the prerequisites for eBPF use have been met, including kernel compatibility, for all Linux manager and worker nodes. Refer to the Calico documentation Enable the eBPF dataplane for more information.

  2. Specify the following parameters at install time:

    • --unmanaged-cni

    • --kube-proxy-mode=disabled

    • --kube-default-drop-masq-bits

  3. Verify that eBPF mode is operational by confirming the presence of the following lines in ucp-kube-proxy container logs:

    KUBE_PROXY_MODE (disabled) CLEANUP_ON_START_DISABLED true
    "Sleeping forever...."
    
Calico networking

The default networking plugin for MKE is Calico, which supports two types of encapsulation: VXLAN and IP-in-IP. The default MKE setting is VXLAN.

Refer to Overlay networking in the Calico documentation for more information.

Enable ESP traffic

To function properly, overlay networks with encryption require that you allow IP protocol 50 Encapsulating Security Payload (ESP) traffic.
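For example, on hosts that use firewalld, you can allow ESP traffic with a rich rule such as the following (a sketch; adjust the zone to match your environment):

firewall-cmd --permanent --add-rich-rule='rule protocol value="esp" accept'
firewall-cmd --reload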

Avoid firewall conflicts

Avoid firewall conflicts in the following Linux distributions:

Linux distribution

Procedure

SUSE Linux Enterprise Server 12 SP2

Installations have the FW_LO_NOTRACK flag turned on by default in the openSUSE firewall. It speeds up packet processing on the loopback interface but breaks certain firewall setups that redirect outgoing packets via custom rules on the local machine.

To turn off the FW_LO_NOTRACK option:

  1. In /etc/sysconfig/SuSEfirewall2, set FW_LO_NOTRACK="no".

  2. Either restart the firewall or reboot the system.

SUSE Linux Enterprise Server 12 SP3

No change is required, as installations have the FW_LO_NOTRACK flag turned off by default.

Red Hat Enterprise Linux (RHEL) 8

Configure the FirewallBackend option:

  1. Verify that firewalld is running.

  2. In /etc/firewalld/firewalld.conf, change FirewallBackend=nftables (the default) to FirewallBackend=iptables.

Alternatively, to allow traffic to enter the default bridge network (docker0), run the following commands:

firewall-cmd --permanent --zone=trusted --add-interface=docker0
firewall-cmd --reload

Preconfigure an SLES installation

Before performing SUSE Linux Enterprise Server (SLES) installations, consider the following prerequisite steps:

  • For SLES 15 installations, disable CLOUD_NETCONFIG_MANAGE prior to installing MKE:

    1. Set CLOUD_NETCONFIG_MANAGE="no" in the /etc/sysconfig/network/ifcfg-eth0 network interface configuration file.

    2. Run the service network restart command.

  • By default, SLES disables connection tracking. To allow Kubernetes controllers in Calico to reach the Kubernetes API server, enable connection tracking on the loopback interface for SLES by running the following commands for each node in the cluster:

    sudo mkdir -p /etc/sysconfig/SuSEfirewall2.d/defaults
    echo FW_LO_NOTRACK=no | sudo tee \
    /etc/sysconfig/SuSEfirewall2.d/defaults/99-docker.cfg
    sudo SuSEfirewall2 start
    

Verify the timeout settings

Confirm that your networks provide MKE components with enough time to communicate.

Default timeout settings

  • Raft consensus between manager nodes: 3000 ms (not configurable)

  • Gossip protocol for overlay networking: 5000 ms (not configurable)

  • etcd: 500 ms (configurable)

  • RethinkDB: 10000 ms (not configurable)

  • Stand-alone cluster: 90000 ms (not configurable)

Configure time synchronization

Configure all containers in an MKE cluster to regularly synchronize with a Network Time Protocol (NTP) server. This ensures consistency between every container in the cluster and avoids unexpected behavior that can lead to poor performance.
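For example, on systemd-based distributions you can verify that a node is synchronized with an NTP source by running:

timedatectl status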

Configure a load balancer

Though MKE does not include a load balancer, you can configure your own to balance user requests across all manager nodes. Before that, decide whether you will add nodes to the load balancer using their IP address or their fully qualified domain name (FQDN), and then use that strategy consistently throughout the cluster. Take note of all IP addresses or FQDNs before you start the installation.

If you plan to deploy both MKE and MSR, your load balancer must be able to differentiate between the two: either by IP address or port number. Because both MKE and MSR use port 443 by default, your options are as follows:

  • Configure your load balancer to expose either MKE or MSR on a port other than 443.

  • Configure your load balancer to listen on port 443 with separate virtual IP addresses for MKE and MSR.

  • Configure separate load balancers for MKE and MSR, both listening on port 443.

If you want to install MKE in a high-availability configuration with a load balancer in front of your MKE controllers, include the appropriate IP address and FQDN for the load balancer VIP. To do so, use one or more --san flags either with the ucp install command or in interactive mode when MKE requests additional SANs.

Configure IPVS

MKE supports setting values for all IPVS-related parameters exposed by kube-proxy.

Kube-proxy runs on each cluster node and load balances traffic destined for services (through cluster IPs and node ports) to the correct back-end pods. Of the modes in which kube-proxy can run, IPVS (IP Virtual Server) offers the widest choice of load balancing algorithms and superior scalability.

Refer to the Calico documentation, Comparing kube-proxy modes: iptables or IPVS? for detailed information on IPVS.

Caution

You can only enable IPVS for MKE at installation, and it persists throughout the life of the cluster. Thus, you cannot switch to iptables at a later stage or switch over existing MKE clusters to use IPVS proxier.

For full parameter details, refer to the Kubernetes documentation for kube-proxy.

Use the kube-proxy-mode parameter at install time to enable IPVS proxier. The two valid values are iptables (default) and ipvs.

You can specify the following ipvs parameters for kube-proxy:

  • ipvs_exclude_cidrs

  • ipvs_min_sync_period

  • ipvs_scheduler

  • ipvs_strict_arp

  • ipvs_sync_period

  • ipvs_tcp_timeout

  • ipvs_tcpfin_timeout

  • ipvs_udp_timeout

To set these values at the time of bootstrap/installation:

  1. Add the required values under [cluster_config] in a TOML file (for example, config.toml), as shown in the sketch that follows this procedure.

  2. Create a config named com.docker.ucp.config from this TOML file:

    docker config create com.docker.ucp.config config.toml
    
  3. Use the --existing-config parameter when installing MKE. You can also change these values post-install using the MKE ucp/config-toml API endpoint.
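The following is a minimal sketch of the config.toml referenced in step 1, assuming the [cluster_config] section and the IPVS parameter names listed above (the values shown are purely illustrative, not recommendations):

[cluster_config]
  ipvs_scheduler = "rr"
  ipvs_strict_arp = false
  ipvs_sync_period = "30s"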

Caution

If you are using MKE 3.3.x with IPVS proxier and plan to upgrade to MKE 3.4.x, you must upgrade to MKE 3.4.3 or later as earlier versions of MKE 3.4.x do not support IPVS proxier.

Use an External Certificate Authority

You can customize MKE to use certificates signed by an External Certificate Authority (ECA). When using your own certificates, include a certificate bundle with the following:

  • ca.pem file with the root CA public certificate.

  • cert.pem file with the server certificate and any intermediate CA public certificates. This certificate should also have Subject Alternative Names (SANs) for all addresses used to reach the MKE manager.

  • key.pem file with a server private key.

You can either use separate certificates for every manager node or one certificate for all managers. If you use separate certificates, you must use a common SAN throughout. For example, MKE permits the following on a three-node cluster:

  • node1.company.example.org with the SAN mke.company.org

  • node2.company.example.org with the SAN mke.company.org

  • node3.company.example.org with the SAN mke.company.org

If you use a single certificate for all manager nodes, MKE automatically copies the certificate files both to new manager nodes and to those promoted to a manager role.
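The following is a hedged sketch of one way to supply your own server certificates at install time, assuming the ucp-controller-server-certs volume described in the Reference Architecture and the --external-server-cert install option (the image tag is illustrative):

docker volume create ucp-controller-server-certs
cp ca.pem cert.pem key.pem /var/lib/docker/volumes/ucp-controller-server-certs/_data/
docker container run --rm -it --name ucp \
  -v /var/run/docker.sock:/var/run/docker.sock \
  mirantis/ucp:3.5.5 install --external-server-cert --interactive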

Customize named volumes

Note

Skip this step if you want to use the default named volumes.

MKE uses named volumes to persist data. If you want to customize the drivers that manage such volumes, create the volumes before installing MKE. During the installation process, the installer will automatically detect the existing volumes and start using them. Otherwise, MKE will create the default named volumes.
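For example, the following sketch pre-creates the ucp-kv volume with the local driver backed by a bind mount, so that the installer detects and reuses it (the driver options and path are illustrative):

docker volume create --driver local \
  --opt type=none --opt o=bind --opt device=/mnt/mke/ucp-kv \
  ucp-kv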

Configure kernel parameters

MKE uses the kernel parameters detailed in the tables below. Organized by parameter prefix, these tables present both the default parameter values and the values as they are set following MKE installation.
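You can inspect the current value of any of these parameters on a node with sysctl, for example:

sysctl net.ipv4.ip_forward
sysctl -a 2>/dev/null | grep '^net.bridge.bridge-nf'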

Note

The values in the MKE value column are not set by MKE itself, but by either MCR or an upstream component.


kernel.*

Parameter

Default

MKE value

Description

*panic

Distribution dependent

1

Sets the number of seconds the kernel waits to reboot following a panic.

*panic_on_oops

Distribution dependent

1

Sets whether the kernel should panic on an oops rather than continuing to attempt operations.

*pty.nr

Dependent on number of logins. Not user-configurable.

1

Sets the number of open PTYs.


net.bridge.bridge-nf-*

Parameter

Default

MKE value

Description

*call-arptables

No default

1

Sets whether arptables rules apply to bridged network traffic. If the bridge module is not loaded, and thus no bridges are present, this key is not present.

*call-ip6tables

No default

1

Sets whether ip6tables rules apply to bridged network traffic. If the bridge module is not loaded, and thus no bridges are present, this key is not present.

*call-iptables

No default

1

Sets whether iptables rules apply to bridged network traffic. If the bridge module is not loaded, and thus no bridges are present, this key is not present.

*filter-pppoe-tagged

No default

0

Sets whether netfilter rules apply to bridged PPPOE network traffic. If the bridge module is not loaded, and thus no bridges are present, this key is not present.

*filter-vlan-tagged

No default

0

Sets whether netfilter rules apply to bridged VLAN network traffic. If the bridge module is not loaded, and thus no bridges are present, this key is not present.

*pass-vlan-input-dev

No default

0

Sets whether netfilter strips the incoming VLAN interface name from bridged traffic. If the bridge module is not loaded, and thus no bridges are present, this key is not present.


net.fan.*

Parameter

Default

MKE value

Description

*vxlan

No default

4

Sets the version of the VXLAN module on older kernels, not present on kernel version 5.x. If the VXLAN module is not loaded this key is not present.


net.ipv4.*

Note

  • The *.vs.* values keep their kernel defaults; they appear here only because MKE loads the ipvs kernel module, which is not loaded by default. Refer to the Linux kernel documentation for complete details.

Parameter

Default

MKE value

Description

*conf.all.accept_redirects

1

0

Sets whether ICMP redirects are permitted. This key affects all interfaces.

*conf.all.forwarding

0

1

Sets whether network traffic is forwarded. This key affects all interfaces.

*conf.all.route_localnet

0

1

Sets 127/8 for local routing. This key affects all interfaces.

*conf.default.forwarding

0

1

Sets whether network traffic is forwarded. This key affects new interfaces.

*conf.lo.forwarding

0

1

Sets forwarding for localhost traffic.

*ip_forward

0

1

Sets whether traffic forwards between interfaces. For Kubernetes to run, this parameter must be set to 1.

*vs.am_droprate

10

10

Sets the always mode drop rate used in mode 3 of the drop_rate defense.

*vs.amemthresh

1024

1024

Sets the available memory threshold in pages, which is used in the automatic modes of defense. When there is not enough available memory, this enables the strategy and the variable is set to 2. Otherwise, the strategy is disabled and the variable is set to 1.

*vs.backup_only

0

0

Sets whether the director function is disabled while the server is in back-up mode, to avoid packet loops for DR/TUN methods.

*vs.cache_bypass

0

0

Sets whether packets forward directly to the original destination when no cache server is available and the destination address is not local (iph->daddr is RTN_UNICAST). This mostly applies to transparent web cache clusters.

*vs.conn_reuse_mode

1

1

Sets how IPVS handles connections detected on port reuse. It is a bitmap with the following values:

  • 0 disables any special handling on port reuse. The new connection is delivered to the same real server that was servicing the previous connection, effectively disabling expire_nodest_conn.

  • bit 1 enables rescheduling of new connections when it is safe. That is, whenever expire_nodest_conn and for TCP sockets, when the connection is in TIME_WAIT state (which is only possible if you use NAT mode).

  • bit 2 is bit 1 plus, for TCP connections, when connections are in FIN_WAIT state, as this is the last state seen by load balancer in Direct Routing mode. This bit helps when adding new real servers to a very busy cluster.

*vs.conntrack

0

0

Sets whether connection-tracking entries are maintained for connections handled by IPVS. Enable if connections handled by IPVS are to be subject to stateful firewall rules. That is, iptables rules that make use of connection tracking. Otherwise, disable this setting to optimize performance. Connections handled by the IPVS FTP application module have connection tracking entries regardless of this setting, which is only available when IPVS is compiled with CONFIG_IP_VS_NFCT enabled.

*vs.drop_entry

0

0

Sets whether entries are randomly dropped in the connection hash table, to collect memory back for new connections. In the current code, the drop_entry procedure can be activated every second, then it randomly scans 1/32 of the whole and drops entries that are in the SYN-RECV/SYNACK state, which should be effective against syn-flooding attack.

The valid values of drop_entry are 0 to 3, where 0 indicates that the strategy is always disabled, 1 and 2 indicate automatic modes (when there is not enough available memory, the strategy is enabled and the variable is automatically set to 2, otherwise the strategy is disabled and the variable is set to 1), and 3 indicates that the strategy is always enabled.

*vs.drop_packet

0

0

Sets whether packets are dropped at a configured rate before being forwarded to real servers. A rate of 1 drops all incoming packets.

The value definition is the same as that for drop_entry. In automatic mode, the following formula determines the rate: rate = amemthresh / (amemthresh - available_memory) when available memory is less than the available memory threshold. When mode 3 is set, the always mode drop rate is controlled by the /proc/sys/net/ipv4/vs/am_droprate.

*vs.expire_nodest_conn

0

0

Sets whether the load balancer silently drops packets when its destination server is not available. This can be useful when the user-space monitoring program deletes the destination server (due to server overload or wrong detection) and later adds the server back, and the connections to the server can continue.

If this feature is enabled, the load balancer terminates the connection immediately whenever a packet arrives and its destination server is not available, after which the client program will be notified that the connection is closed. This is equivalent to the feature that is sometimes required to flush connections when the destination is not available.

*vs.ignore_tunneled

0

0

Sets whether IPVS sets the ipvs_property on all packets of unrecognized protocols. This prevents IPVS from routing tunneled protocols such as IPIP, which is useful in preventing the rescheduling of packets that have been tunneled to the IPVS host (that is, it prevents IPVS routing loops when IPVS is also acting as a real server).

*vs.nat_icmp_send

0

0

Sets whether ICMP error messages (ICMP_DEST_UNREACH) are sent for VS/NAT when the load balancer receives packets from real servers but the connection entries do not exist.

*vs.pmtu_disc

0

0

Sets whether all DF packets that exceed the PMTU are rejected with FRAG_NEEDED, irrespective of the forwarding method. For the TUN method, the flag can be disabled to fragment such packets.

*vs.schedule_icmp

0

0

Sets whether scheduling ICMP packets in IPVS is enabled.

*vs.secure_tcp

0

0

Sets the use of a more complicated TCP state transition table. For VS/NAT, the secure_tcp defense delays entering the TCP ESTABLISHED state until the three-way handshake completes. The value definition is the same as that of drop_entry and drop_packet.

*vs.sloppy_sctp

0

0

Sets whether IPVS is permitted to create a connection state on any packet, rather than an SCTP INIT only.

*vs.sloppy_tcp

0

0

Sets whether IPVS is permitted to create a connection state on any packet, rather than a TCP SYN only.

*vs.snat_reroute

0

1

Sets whether the route of SNATed packets is recalculated from real servers as if they originate from the director. If disabled, SNATed packets are routed as if they have been forwarded by the director.

If policy routing is in effect, then it is possible that the route of a packet originating from a director is routed differently to a packet being forwarded by the director.

If policy routing is not in effect, then the recalculated route will always be the same as the original route. It is an optimization to disable snat_reroute and avoid the recalculation.

*vs.sync_persist_mode

0

0

Sets the synchronization of connections when using persistence. The possible values are defined as follows:

  • 0 means all types of connections are synchronized.

  • 1 attempts to reduce the synchronization traffic depending on the connection type. For persistent services, synchronization is avoided for normal connections and performed only for persistence templates. In such cases, for TCP and SCTP, you may need to enable the sloppy_tcp and sloppy_sctp flags on backup servers. For non-persistent services, this optimization is not applied and mode 0 is assumed.

*vs.sync_ports

1

1

Sets the number of threads that the master and back-up servers can use for sync traffic. Every thread uses a single UDP port, thread 0 uses the default port 8848, and the last thread uses port 8848+sync_ports-1.

*vs.sync_qlen_max

Calculated

Calculated

Sets a hard limit for queued sync messages that are not yet sent. It defaults to 1/32 of the memory pages but actually represents a number of messages. This protects against allocating large amounts of memory when the sending rate is lower than the queuing rate.

*vs.sync_refresh_period

0

0

Sets (in seconds) the difference in the reported connection timer that triggers new sync messages. It can be used to avoid sync messages for the specified period (or half of the connection timeout if it is lower) if the connection state has not changed since last sync.

This is useful for normal connections with high traffic, to reduce the sync rate. Additionally, sync messages are retried sync_retries times with a period of sync_refresh_period/8.

*vs.sync_retries

0

0

Sets the number of sync retries, performed with a period of sync_refresh_period/8. This is useful for protecting against the loss of sync messages. The valid range of sync_retries is 0 to 3.

*vs.sync_sock_size

0

0

Sets the configuration of SNDBUF (master) or RCVBUF (slave) socket limit. Default value is 0 (preserve system defaults).

*vs.sync_threshold

3 50

3 50

Sets the synchronization threshold, which is the minimum number of incoming packets that a connection must receive before the connection is synchronized. A connection is synchronized every time the number of its incoming packets modulo sync_period equals the threshold. The range of the threshold is 0 to sync_period. When sync_period and sync_refresh_period are 0, sync messages are sent only for state changes or only once, when the packet count matches sync_threshold.

*vs.sync_version

1

1

Sets the version of the synchronization protocol to use when sending synchronization messages. The possible values are:

  • 0 selects the original synchronization protocol (version 0). This should be used when sending synchronization messages to a legacy system that only understands the original synchronization protocol.

  • 1 selects the current synchronization protocol (version 1). This should be used whenever possible.

Kernels with this sync_version entry are able to receive messages of both version 0 and version 1 of the synchronization protocol.
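
The parameters in the table above correspond to sysctl keys under net.ipv4.vs. The following is a minimal sketch, not part of the MKE tooling, for inspecting and temporarily adjusting these values on a node where the ip_vs module is loaded; the value shown is illustrative only:

# List the effective IPVS settings on this node
sudo sysctl -a 2>/dev/null | grep '^net\.ipv4\.vs\.'

# Read a single value, for example the sync protocol version
sysctl net.ipv4.vs.sync_version

# Temporarily change a value (not persistent across reboots)
sudo sysctl -w net.ipv4.vs.sync_refresh_period=10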


net.netfilter.nf_conntrack_*

Note

  • The *.nf_conntrack* default values persist; they appear to change only because the nf_conntrack kernel module was not previously loaded. Refer to the Linux kernel documentation for the complete parameter documentation.

Parameter

Default

MKE value

Description

*acct

0

0

Sets whether connection-tracking flow accounting is enabled. Adds 64-bit byte and packet counter per flow.

*buckets

Calculated

Calculated

Sets the size of the hash table. If not specified during module loading, the default size is calculated by dividing total memory by 16384 to determine the number of buckets. The hash table will never have fewer than 1024 and never more than 262144 buckets. This sysctl is only writeable in the initial net namespace.

*checksum

0

0

Sets whether the checksum of incoming packets is verified. Packets with bad checksums are in an invalid state. If this is enabled, such packets are not considered for connection tracking.

*dccp_loose

0

1

Sets whether picking up already established connections for Datagram Congestion Control Protocol (DCCP) is permitted.

*dccp_timeout_closereq

Distribution dependent

64

The parameter description is not yet available in the Linux kernel documentation.

*dccp_timeout_closing

Distribution dependent

64

The parameter description is not yet available in the Linux kernel documentation.

*dccp_timeout_open

Distribution dependent

43200

The parameter description is not yet available in the Linux kernel documentation.

*dccp_timeout_partopen

Distribution dependent

480

The parameter description is not yet available in the Linux kernel documentation.

*dccp_timeout_request

Distribution dependent

240

The parameter description is not yet available in the Linux kernel documentation.

*dccp_timeout_respond

Distribution dependent

480

The parameter description is not yet available in the Linux kernel documentation.

*dccp_timeout_timewait

Distribution dependent

240

The parameter description is not yet available in the Linux kernel documentation.

*events

0

1

Sets whether the connection tracking code provides userspace with connection-tracking events through ctnetlink.

*expect_max

Calculated

1024

Sets the maximum size of the expectation table. The default value is nf_conntrack_buckets / 256. The minimum is 1.

*frag6_high_thresh

Calculated

4194304

Sets the maximum memory used to reassemble IPv6 fragments. When nf_conntrack_frag6_high_thresh bytes of memory is allocated for this purpose, the fragment handler tosses packets until nf_conntrack_frag6_low_thresh is reached. The size of this parameter is calculated based on system memory.

*frag6_low_thresh

Calculated

3145728

See nf_conntrack_frag6_high_thresh. The size of this parameter is calculated based on system memory.

*frag6_timeout

60

60

Sets the time to keep an IPv6 fragment in memory.

*generic_timeout

600

600

Sets the default generic timeout. This applies to unknown and unsupported layer 4 protocols.

*gre_timeout

30

30

Sets the GRE timeout for the conntrack table.

*gre_timeout_stream

180

180

Sets the GRE timeout for streamed connections. This extended timeout is used when a GRE stream is detected.

*helper

0

0

Sets whether the automatic conntrack helper assignment is enabled. If disabled, you must set up iptables rules to assign helpers to connections. See the CT target description in the iptables-extensions(8) man page for more information.

*icmp_timeout

30

30

Sets the default for ICMP timeout.

*icmpv6_timeout

30

30

Sets the default for ICMP6 timeout.

*log_invalid

0

0

Sets whether invalid packets of a type specified by value are logged.

*max

Calculated

131072

Sets the maximum number of allowed connection tracking entries. This value is set to nf_conntrack_buckets by default.

Connection-tracking entries are added to the table twice, once for the original direction and once for the reply direction (that is, with the reversed address). Thus, with default settings a maxed-out table will have an average hash chain length of 2, not 1.

*sctp_timeout_closed

Distribution dependent

10

The parameter description is not yet available in the Linux kernel documentation.

*sctp_timeout_cookie_echoed

Distribution dependent

3

The parameter description is not yet available in the Linux kernel documentation.

*sctp_timeout_cookie_wait

Distribution dependent

3

The parameter description is not yet available in the Linux kernel documentation.

*sctp_timeout_established

Distribution dependent

432000

The parameter description is not yet available in the Linux kernel documentation.

*sctp_timeout_heartbeat_acked

Distribution dependent

210

The parameter description is not yet available in the Linux kernel documentation.

*sctp_timeout_heartbeat_sent

Distribution dependent

30

The parameter description is not yet available in the Linux kernel documentation.

*sctp_timeout_shutdown_ack_sent

Distribution dependent

3

The parameter description is not yet available in the Linux kernel documentation.

*sctp_timeout_shutdown_recd

Distribution dependent

0

The parameter description is not yet available in the Linux kernel documentation.

*sctp_timeout_shutdown_sent

Distribution dependent

0

The parameter description is not yet available in the Linux kernel documentation.

*tcp_be_liberal

0

0

Sets whether only out of window RST segments are marked as INVALID.

*tcp_loose

0

1

Sets whether already established connections are picked up.

*tcp_max_retrans

3

3

Sets the maximum number of packets that can be retransmitted without receiving an acceptable ACK from the destination. If this number is reached, a shorter timer is started.

*tcp_timeout_close

Distribution dependent

10

The parameter description is not yet available in the Linux kernel documentation.

*tcp_timeout_close_wait

Distribution dependent

3600

The parameter description is not yet available in the Linux kernel documentation.

*tcp_timeout_fin_wait

Distribution dependent

120

The parameter description is not yet available in the Linux kernel documentation.

*tcp_timeout_last_ack

Distribution dependent

30

The parameter description is not yet available in the Linux kernel documentation.

*tcp_timeout_max_retrans

Distribution dependent

300

The parameter description is not yet available in the Linux kernel documentation.

*tcp_timeout_syn_recv

Distribution dependent

60

The parameter description is not yet available in the Linux kernel documentation.

*tcp_timeout_syn_sent

Distribution dependent

120

The parameter description is not yet available in the Linux kernel documentation.

*tcp_timeout_time_wait

Distribution dependent

120

The parameter description is not yet available in the Linux kernel documentation.

*tcp_timeout_unacknowledged

Distribution dependent

30

The parameter description is not yet available in the Linux kernel documentation.

*timestamp

0

0

Sets whether connection-tracking flow timestamping is enabled.

*udp_timeout

30

30

Sets the UDP timeout.

*udp_timeout_stream

120

120

Sets the extended timeout that is used whenever a UDP stream is detected.
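
As a rough sketch (not an MKE-specific procedure), you can compare the live connection-tracking table usage on a node against the limits listed above; the paths below are standard Linux procfs and sysctl locations:

# Current number of connection-tracking entries versus the configured maximum
cat /proc/sys/net/netfilter/nf_conntrack_count
cat /proc/sys/net/netfilter/nf_conntrack_max

# Per-protocol timeouts, for example the UDP stream timeout
sysctl net.netfilter.nf_conntrack_udp_timeout_stream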


net.nf_conntrack_*

Note

  • The *.nf_conntrack* default values persist; they appear to change only because the nf_conntrack kernel module was not previously loaded. Refer to the Linux kernel documentation for the complete parameter documentation.

Parameter

Default

MKE value

Description

*max

Calculated

131072

Sets the maximum number of connections to track. The size of this parameter is calculated based on system memory.


vm.overcommit_*

Parameter

Default

MKE value

Description

*memory

Distribution dependent

1

Sets whether the kernel permits memory overcommitment from malloc() calls.
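
To verify the values above on a running node, you can query the corresponding sysctl keys directly; this is a simple check, not an MKE procedure:

# Confirm the connection-tracking limit and memory overcommit settings
sysctl net.nf_conntrack_max
sysctl vm.overcommit_memory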

Install the MKE image

To install MKE:

  1. Log in to the target host using Secure Shell (SSH).

  2. Pull the latest version of MKE:

    docker image pull mirantis/ucp:3.5.0
    
  3. Install MKE:

    docker container run --rm -it --name ucp \
    -v /var/run/docker.sock:/var/run/docker.sock \
    mirantis/ucp:3.5.0 install \
    --host-address <node-ip-address> \
    --interactive
    

    The ucp install command runs in interactive mode, prompting you for the necessary configuration values. For more information about the ucp install command, including how to install MKE on a system with SELinux enabled, refer to the MKE Operations Guide: mirantis/ucp install.
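
If you prefer to avoid interactive prompts, you can pass values as flags instead. The following is a sketch only, with placeholder credentials; verify the exact flag set against the mirantis/ucp install reference for your MKE version:

docker container run --rm -it --name ucp \
-v /var/run/docker.sock:/var/run/docker.sock \
mirantis/ucp:3.5.0 install \
--host-address <node-ip-address> \
--admin-username <admin-username> \
--admin-password <admin-password>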

Note

MKE installs Project Calico for Kubernetes container-to-container communication. However, you may install an alternative CNI plugin, such as Weave or Flannel. For more information, refer to the MKE Operations Guide: Installing an unmanaged CNI plugin.

Obtain the license

After you Install the MKE image, proceed with downloading your MKE license as described below. This section also contains steps to apply your new license using the MKE web UI.

Warning

Users are not authorized to run MKE on production workloads without a valid license. Refer to Mirantis Agreements and Terms for more information.

To download your MKE license:

  1. Open an email from Mirantis Support with the subject Welcome to Mirantis’ CloudCare Portal and follow the instructions for logging in.

    If you did not receive the CloudCare Portal email, it is likely that you have not yet been added as a Designated Contact. To remedy this, contact your Designated Administrator.

  2. In the top navigation bar, click Environments.

  3. Click the Cloud Name associated with the license you want to download.

  4. Scroll down to License Information and click the License File URL. A new tab opens in your browser.

  5. Click View file to download your license file.

To update your license settings in the MKE web UI:

  1. Log in to your MKE instance using an administrator account.

  2. In the left navigation, click Settings.

  3. On the General tab, click Apply new license. A file browser dialog displays.

  4. Navigate to where you saved the license key (.lic) file, select it, and click Open. MKE automatically updates with the new settings.

Note

Though MKE is generally a subscription-only service, Mirantis offers a free trial license by request. Use our contact form to request a free trial license.

Install MKE on AWS

This section describes how to customize your MKE installation on AWS. It is intended for those deploying Kubernetes workloads while leveraging the AWS Kubernetes cloud provider, which provides dynamic volume and load balancer provisioning.

Note

You may skip this topic if you plan to install MKE on AWS with no customizations or if you will only deploy Docker Swarm workloads. Refer to Install the MKE image for the appropriate installation instructions.

Prerequisites

Complete the following prerequisites prior to installing MKE on AWS.

  1. Log in to the AWS Management Console.

  2. Assign your instance a host name using the ip-<private ip>.<region>.compute.internal template. For example, ip-172-31-15-241.us-east-2.compute.internal.

  3. Tag your instance, VPC, and subnets by specifying kubernetes.io/cluster/<unique-cluster-id> in the Key field and <cluster-type> in the Value field. Possible <cluster-type> values are as follows:

    • owned, if the cluster owns and manages the resources that it creates

    • shared, if the cluster shares its resources between multiple clusters

    For example, Key: kubernetes.io/cluster/1729543642a6 and Value: owned.

  4. To enable introspection and resource provisioning, specify an instance profile with appropriate policies for manager nodes. The following is an example of a very permissive instance profile:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [ "ec2:*" ],
          "Resource": [ "*" ]
        },
        {
          "Effect": "Allow",
          "Action": [ "elasticloadbalancing:*" ],
          "Resource": [ "*" ]
        },
        {
          "Effect": "Allow",
          "Action": [ "route53:*" ],
          "Resource": [ "*" ]
        },
        {
          "Effect": "Allow",
          "Action": "s3:*",
          "Resource": [ "arn:aws:s3:::kubernetes-*" ]
        }
      ]
    }
    
  5. To enable access to dynamically provisioned resources, specify an instance profile with appropriate policies for worker nodes. The following is an example of a very permissive instance profile:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": "s3:*",
          "Resource": [ "arn:aws:s3:::kubernetes-*" ]
        },
        {
          "Effect": "Allow",
          "Action": "ec2:Describe*",
          "Resource": "*"
        },
        {
          "Effect": "Allow",
          "Action": "ec2:AttachVolume",
          "Resource": "*"
        },
        {
          "Effect": "Allow",
          "Action": "ec2:DetachVolume",
          "Resource": "*"
        },
        {
          "Effect": "Allow",
          "Action": [ "route53:*" ],
          "Resource": [ "*" ]
        }
      ]
    }
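
The following sketch illustrates steps 2 and 3 of this procedure from the command line; hostnamectl and the AWS CLI are assumed to be available, and the instance ID, cluster ID, and host name values are placeholders:

# Step 2: set the host name to match the EC2 private DNS naming scheme
sudo hostnamectl set-hostname ip-172-31-15-241.us-east-2.compute.internal

# Step 3: tag the instance (repeat for the VPC and subnets)
aws ec2 create-tags \
  --resources <instance-id> \
  --tags Key=kubernetes.io/cluster/<unique-cluster-id>,Value=owned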
    

Install MKE

After you perform the steps described in Prerequisites, run the following command to install MKE on a master node. Substitute <ucp-ip> with the private IP address of the master node.

docker container run --rm -it \
--name ucp \
--volume /var/run/docker.sock:/var/run/docker.sock \
mirantis/ucp:3.5.0 install \
--host-address <ucp-ip> \
--cloud-provider aws \
--interactive

Install MKE on Azure

Mirantis Kubernetes Engine (MKE) closely integrates with Microsoft Azure for its Kubernetes Networking and Persistent Storage feature set. MKE deploys the Calico CNI provider. In Azure, the Calico CNI leverages the Azure networking infrastructure for data path networking and the Azure IPAM for IP address management.

Prerequisites

To avoid significant issues during the installation process, you must meet the following infrastructure prerequisites to successfully deploy MKE on Azure.

  • Deploy all MKE nodes (managers and workers) into the same Azure resource group. You can deploy the Azure networking components (virtual network, subnets, security groups) in a second Azure resource group.

  • Size the Azure virtual network and subnet appropriately for your environment, because addresses from this pool will be consumed by Kubernetes Pods.

  • Attach all MKE worker and manager nodes to the same Azure subnet.

  • Set internal IP addresses for all nodes to Static rather than the Dynamic default.

  • Match the Azure virtual machine object name to the Azure virtual machine computer name and to the node operating system hostname, which is the FQDN of the host (including domain names). All characters in the names must be lowercase.

  • Ensure the presence of an Azure Service Principal with Contributor access to the Azure resource group hosting the MKE nodes. Kubernetes uses this Service Principal to communicate with the Azure API. The Service Principal ID and Secret Key are MKE prerequisites.

    If you are using a separate resource group for the networking components, the same Service Principal must have Network Contributor access to this resource group.

  • Ensure that the Azure network security group (NSG) is open between all IPs on the Azure subnet that is passed into MKE during installation. Kubernetes Pods integrate into the underlying Azure networking stack, from an IPAM and routing perspective, with the Azure CNI IPAM module. As such, Azure NSGs impact pod-to-pod communication. End users may expose containerized services on a range of underlying ports, which results in a manual process of opening an NSG port every time a new containerized service is deployed on the platform. This affects only workloads that are deployed on the Kubernetes orchestrator.

    To limit exposure, restrict the use of the Azure subnet to container host VMs and Kubernetes Pods. Additionally, you can leverage Kubernetes Network Policies to provide microsegmentation for containerized applications and services.

The MKE installation requires the following information:

subscriptionId

Azure Subscription ID in which to deploy the MKE objects

tenantId

Azure Active Directory Tenant ID in which to deploy the MKE objects

aadClientId

Azure Service Principal ID

aadClientSecret

Azure Service Principal Secret Key

Networking

MKE configures the Azure IPAM module for Kubernetes so that it can allocate IP addresses for Kubernetes Pods. Per Azure IPAM module requirements, the configuration of each Azure VM that is part of the Kubernetes cluster must include a pool of IP addresses.

You can use automatic or manual IP provisioning for the Kubernetes cluster on Azure.

  • Automatic provisioning

    Allows for IP pool configuration and maintenance for standalone Azure virtual machines (VMs). This service runs within the calico-node daemonset and provisions 128 IP addresses for each node by default.

    Note

    If you are using a VXLAN data plane, MKE automatically uses Calico IPAM. It is not necessary to do anything specific for Azure IPAM.

    New MKE installations use Calico VXLAN as the default data plane (the MKE configuration calico_vxlan is set to true). MKE does not use Calico VXLAN if the MKE version is lower than 3.3.0 or if you upgrade MKE from lower than 3.3.0 to 3.3.0 or higher.

  • Manual provisioning

    Manual provisioning of additional IP addresses for each Azure VM can be done through the Azure Portal, the Azure CLI command az network nic ip-config create, or an ARM template.
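
    For example, the following Azure CLI sketch adds one additional IP configuration to a NIC; the resource group, NIC, and configuration names are placeholders, and you would repeat the command for as many addresses as you need:

    az network nic ip-config create \
      --resource-group <resource-group> \
      --nic-name <vm-nic-name> \
      --name pod-ipconfig-01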

Azure configuration file

For MKE to integrate with Microsoft Azure, the azure.json configuration file for all Linux MKE manager and Linux MKE worker nodes in your cluster must be identical. Place the file in /etc/kubernetes on each host and, because root owns the configuration file, set its permissions to 0644 to ensure the container user has read access.
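
A minimal sketch of staging the file on one host, assuming you have already prepared azure.json locally, might look as follows:

sudo mkdir -p /etc/kubernetes
sudo cp azure.json /etc/kubernetes/azure.json
sudo chown root:root /etc/kubernetes/azure.json
sudo chmod 0644 /etc/kubernetes/azure.json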

The following is an example template for azure.json.

{
    "cloud":"AzurePublicCloud",
    "tenantId": "<parameter_value>",
    "subscriptionId": "<parameter_value>",
    "aadClientId": "<parameter_value>",
    "aadClientSecret": "<parameter_value>",
    "resourceGroup": "<parameter_value>",
    "location": "<parameter_value>",
    "subnetName": "<parameter_value>",
    "securityGroupName": "<parameter_value>",
    "vnetName": "<parameter_value>",
    "useInstanceMetadata": true
}

Optional parameters are available for Azure deployments:

primaryAvailabilitySetName

Worker nodes availability set

vnetResourceGroup

Virtual network resource group if your Azure network objects live in a separate resource group

routeTableName

Applicable if you have defined multiple route tables within an Azure subnet

Guidelines for IPAM configuration

Warning

To avoid significant issues during the installation process, follow these guidelines to either use an appropriately sized network in Azure or take the necessary actions to fit within the subnet.

Configure the subnet and the virtual network associated with the primary interface of the Azure VMs with an adequate address prefix/range. The number of required IP addresses depends on the workload and the number of nodes in the cluster.

For example, for a cluster of 256 nodes, make sure that the address space of the subnet and the virtual network can allocate at least 128 * 256 IP addresses, in order to run a maximum of 128 pods concurrently on a node. This is in addition to the initial IP allocations to VM network interface cards (NICs) during Azure resource creation.

Accounting for the allocation of IP addresses to NICs that occurs during VM bring-up, set the address space of the subnet and virtual network to 10.0.0.0/16. This ensures that the network can dynamically allocate at least 32768 addresses, plus a buffer for the initial allocation of primary IP addresses.

Note

The Azure IPAM module queries the metadata of an Azure VM to obtain a list of the IP addresses that are assigned to the VM NICs. The IPAM module allocates these IP addresses to Kubernetes pods. You configure the IP addresses as ipConfigurations in the NICs associated with a VM or scale set member, so that Azure IPAM can provide the addresses to Kubernetes on request.

Manually provision IP address pools as part of an Azure VM scale set

Configure IP Pools for each member of the VM scale set during provisioning by associating multiple ipConfigurations with the scale set’s networkInterfaceConfigurations.

The following example networkProfile configuration for an ARM template configures pools of 32 IP addresses for each VM in the VM scale set.

"networkProfile": {
  "networkInterfaceConfigurations": [
    {
      "name": "[variables('nicName')]",
      "properties": {
        "ipConfigurations": [
          {
            "name": "[variables('ipConfigName1')]",
            "properties": {
              "primary": "true",
              "subnet": {
                "id": "[concat('/subscriptions/', subscription().subscriptionId,'/resourceGroups/', resourceGroup().name, '/providers/Microsoft.Network/virtualNetworks/', variables('virtualNetworkName'), '/subnets/', variables('subnetName'))]"
              },
              "loadBalancerBackendAddressPools": [
                {
                  "id": "[concat('/subscriptions/', subscription().subscriptionId,'/resourceGroups/', resourceGroup().name, '/providers/Microsoft.Network/loadBalancers/', variables('loadBalancerName'), '/backendAddressPools/', variables('bePoolName'))]"
                }
              ],
              "loadBalancerInboundNatPools": [
                {
                  "id": "[concat('/subscriptions/', subscription().subscriptionId,'/resourceGroups/', resourceGroup().name, '/providers/Microsoft.Network/loadBalancers/', variables('loadBalancerName'), '/inboundNatPools/', variables('natPoolName'))]"
                }
              ]
            }
          },
          {
            "name": "[variables('ipConfigName2')]",
            "properties": {
              "subnet": {
                "id": "[concat('/subscriptions/', subscription().subscriptionId,'/resourceGroups/', resourceGroup().name, '/providers/Microsoft.Network/virtualNetworks/', variables('virtualNetworkName'), '/subnets/', variables('subnetName'))]"
              }
            }
          }
          .
          .
          .
          {
            "name": "[variables('ipConfigName32')]",
            "properties": {
              "subnet": {
                "id": "[concat('/subscriptions/', subscription().subscriptionId,'/resourceGroups/', resourceGroup().name, '/providers/Microsoft.Network/virtualNetworks/', variables('virtualNetworkName'), '/subnets/', variables('subnetName'))]"
              }
            }
          }
        ],
        "primary": "true"
      }
    }
  ]
}

Adjust the IP count value

During an MKE installation, you can alter the number of Azure IP addresses that MKE automatically provisions for pods.

By default, MKE provisions 128 addresses, from the same Azure subnet as the hosts, for each VM in the cluster. If, however, you have manually attached additional IP addresses to the VMs (by way of an ARM template, the Azure CLI, or the Azure Portal) or you are deploying into a small Azure subnet (smaller than /16), you can use the --azure-ip-count flag at install time.

Note

Do not set the --azure-ip-count variable to a value of less than 6 if you have not manually provisioned additional IP addresses for each VM. The MKE installation needs at least 6 IP addresses to allocate to the core MKE components that run as Kubernetes pods (in addition to the VM’s private IP address).

Below are several example scenarios that require defining the --azure-ip-count variable.

Scenario 1: Manually provisioned addresses

If you have manually provisioned additional IP addresses for each VM and want to disable MKE from dynamically provisioning more IP addresses, you must pass --azure-ip-count 0 into the MKE installation command.
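
For instance, a sketch of the install command for this scenario, using the same placeholders as the installation command shown later in this topic:

docker container run --rm -it \
  --name ucp \
  --volume /var/run/docker.sock:/var/run/docker.sock \
  mirantis/ucp:3.5.0 install \
  --host-address <ucp-ip> \
  --pod-cidr <ip-address-range> \
  --cloud-provider Azure \
  --azure-ip-count 0 \
  --interactive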

Scenario 2: Reducing the number of provisioned addresses

Pass --azure-ip-count <custom_value> into the MKE installation command to reduce the number of IP addresses dynamically allocated from 128 to a custom value due to:

  • Primary use of the Swarm Orchestrator

  • Deployment of MKE on a small Azure subnet (for example, /24)

  • Plans to run a small number of Kubernetes pods on each node

To adjust this value post-installation, refer to the instructions on how to download the MKE configuration file, change the value, and update the configuration via the API.

Note

If you reduce the value post-installation, existing VMs will not reconcile and you will need to manually edit the IP count in Azure.

Run the following command to install MKE on a manager node.

docker container run --rm -it \
  --name ucp \
  --volume /var/run/docker.sock:/var/run/docker.sock \
  mirantis/ucp:3.5.0 install \
  --host-address <ucp-ip> \
  --pod-cidr <ip-address-range> \
  --cloud-provider Azure \
  --interactive
  • The --pod-cidr option maps to the IP address range that you configured for the Azure subnet.

    The pod-cidr range must match the Azure virtual network’s subnet attached to the hosts. For example, if the Azure virtual network had the range 172.0.0.0/16 with VMs provisioned on an Azure subnet of 172.0.1.0/24, then the Pod CIDR should also be 172.0.1.0/24.

    This requirement applies only when MKE does not use the VXLAN data plane. If MKE uses the VXLAN data plane, the pod-cidr range must be different than the node IP subnet.

  • The --host-address option maps to the private IP address of the master node.

  • The --azure-ip-count option adjusts the number of IP addresses provisioned to each VM.

Azure custom roles

You can create your own Azure custom roles for use with MKE. You can assign these roles to users, groups, and service principals at management group (in preview only), subscription, and resource group scopes.
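
For example, after you create one of the custom roles defined below, you might assign it at resource group scope with the Azure CLI; the assignee and resource group values are placeholders:

az role assignment create \
  --assignee <service-principal-or-user-id> \
  --role "Docker Platform All-in-One" \
  --resource-group <resource-group>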

Deploy an MKE cluster into a single resource group

A resource group is a container that holds resources for an Azure solution. These resources are the virtual machines (VMs), networks, and storage accounts that are associated with the swarm.

To create a custom all-in-one role with permissions to deploy an MKE cluster into a single resource group:

  1. Create the role permissions JSON file.

    For example:

    {
      "Name": "Docker Platform All-in-One",
      "IsCustom": true,
      "Description": "Can install and manage Docker platform.",
      "Actions": [
        "Microsoft.Authorization/*/read",
        "Microsoft.Authorization/roleAssignments/write",
        "Microsoft.Compute/availabilitySets/read",
        "Microsoft.Compute/availabilitySets/write",
        "Microsoft.Compute/disks/read",
        "Microsoft.Compute/disks/write",
        "Microsoft.Compute/virtualMachines/extensions/read",
        "Microsoft.Compute/virtualMachines/extensions/write",
        "Microsoft.Compute/virtualMachines/read",
        "Microsoft.Compute/virtualMachines/write",
        "Microsoft.Network/loadBalancers/read",
        "Microsoft.Network/loadBalancers/write",
        "Microsoft.Network/loadBalancers/backendAddressPools/join/action",
        "Microsoft.Network/networkInterfaces/read",
        "Microsoft.Network/networkInterfaces/write",
        "Microsoft.Network/networkInterfaces/join/action",
        "Microsoft.Network/networkSecurityGroups/read",
        "Microsoft.Network/networkSecurityGroups/write",
        "Microsoft.Network/networkSecurityGroups/join/action",
        "Microsoft.Network/networkSecurityGroups/securityRules/read",
        "Microsoft.Network/networkSecurityGroups/securityRules/write",
        "Microsoft.Network/publicIPAddresses/read",
        "Microsoft.Network/publicIPAddresses/write",
        "Microsoft.Network/publicIPAddresses/join/action",
        "Microsoft.Network/virtualNetworks/read",
        "Microsoft.Network/virtualNetworks/write",
        "Microsoft.Network/virtualNetworks/subnets/read",
        "Microsoft.Network/virtualNetworks/subnets/write",
        "Microsoft.Network/virtualNetworks/subnets/join/action",
        "Microsoft.Resources/subscriptions/resourcegroups/read",
        "Microsoft.Resources/subscriptions/resourcegroups/write",
        "Microsoft.Security/advancedThreatProtectionSettings/read",
        "Microsoft.Security/advancedThreatProtectionSettings/write",
        "Microsoft.Storage/*/read",
        "Microsoft.Storage/storageAccounts/listKeys/action",
        "Microsoft.Storage/storageAccounts/write"
      ],
      "NotActions": [],
      "AssignableScopes": [
        "/subscriptions/6096d756-3192-4c1f-ac62-35f1c823085d"
      ]
    }
    
  2. Create the Azure RBAC role.

    az role definition create --role-definition all-in-one-role.json
    
Deploy MKE compute resources

Compute resources act as servers for running containers.

To create a custom role to deploy MKE compute resources only:

  1. Create the role permissions JSON file.

    For example:

    {
      "Name": "Docker Platform",
      "IsCustom": true,
      "Description": "Can install and run Docker platform.",
      "Actions": [
        "Microsoft.Authorization/*/read",
        "Microsoft.Authorization/roleAssignments/write",
        "Microsoft.Compute/availabilitySets/read",
        "Microsoft.Compute/availabilitySets/write",
        "Microsoft.Compute/disks/read",
        "Microsoft.Compute/disks/write",
        "Microsoft.Compute/virtualMachines/extensions/read",
        "Microsoft.Compute/virtualMachines/extensions/write",
        "Microsoft.Compute/virtualMachines/read",
        "Microsoft.Compute/virtualMachines/write",
        "Microsoft.Network/loadBalancers/read",
        "Microsoft.Network/loadBalancers/write",
        "Microsoft.Network/networkInterfaces/read",
        "Microsoft.Network/networkInterfaces/write",
        "Microsoft.Network/networkInterfaces/join/action",
        "Microsoft.Network/publicIPAddresses/read",
        "Microsoft.Network/virtualNetworks/read",
        "Microsoft.Network/virtualNetworks/subnets/read",
        "Microsoft.Network/virtualNetworks/subnets/join/action",
        "Microsoft.Resources/subscriptions/resourcegroups/read",
        "Microsoft.Resources/subscriptions/resourcegroups/write",
        "Microsoft.Security/advancedThreatProtectionSettings/read",
        "Microsoft.Security/advancedThreatProtectionSettings/write",
        "Microsoft.Storage/storageAccounts/read",
        "Microsoft.Storage/storageAccounts/listKeys/action",
        "Microsoft.Storage/storageAccounts/write"
      ],
      "NotActions": [],
      "AssignableScopes": [
        "/subscriptions/6096d756-3192-4c1f-ac62-35f1c823085d"
      ]
    }
    
  2. Create the Docker Platform RBAC role.

    az role definition create --role-definition platform-role.json
    
Deploy MKE network resources

Network resources are services inside your cluster. These resources can include virtual networks, security groups, address pools, and gateways.

To create a custom role to deploy MKE network resources only:

  1. Create the role permissions JSON file.

    For example:

    {
      "Name": "Docker Networking",
      "IsCustom": true,
      "Description": "Can install and manage Docker platform networking.",
      "Actions": [
        "Microsoft.Authorization/*/read",
        "Microsoft.Network/loadBalancers/read",
        "Microsoft.Network/loadBalancers/write",
        "Microsoft.Network/loadBalancers/backendAddressPools/join/action",
        "Microsoft.Network/networkInterfaces/read",
        "Microsoft.Network/networkInterfaces/write",
        "Microsoft.Network/networkInterfaces/join/action",
        "Microsoft.Network/networkSecurityGroups/read",
        "Microsoft.Network/networkSecurityGroups/write",
        "Microsoft.Network/networkSecurityGroups/join/action",
        "Microsoft.Network/networkSecurityGroups/securityRules/read",
        "Microsoft.Network/networkSecurityGroups/securityRules/write",
        "Microsoft.Network/publicIPAddresses/read",
        "Microsoft.Network/publicIPAddresses/write",
        "Microsoft.Network/publicIPAddresses/join/action",
        "Microsoft.Network/virtualNetworks/read",
        "Microsoft.Network/virtualNetworks/write",
        "Microsoft.Network/virtualNetworks/subnets/read",
        "Microsoft.Network/virtualNetworks/subnets/write",
        "Microsoft.Network/virtualNetworks/subnets/join/action",
        "Microsoft.Resources/subscriptions/resourcegroups/read",
        "Microsoft.Resources/subscriptions/resourcegroups/write"
      ],
      "NotActions": [],
      "AssignableScopes": [
        "/subscriptions/6096d756-3192-4c1f-ac62-35f1c823085d"
      ]
    }
    
  2. Create the Docker Networking RBAC role.

    az role definition create --role-definition networking-role.json
    

Install MKE offline

To install MKE on an offline host, you must first use a separate computer with an Internet connection to download a single package with all the images and then copy that package to the host where you will install MKE. Once the package is on the host and loaded, you can install MKE offline as described in Install the MKE image.

Note

During the offline installation, both manager and worker nodes must be offline.

To install MKE offline:

  1. Download the required MKE package.

  2. Copy the MKE package to the host machine:

    scp ucp.tar.gz <user>@<host>:
    
  3. Use SSH to log in to the host where you transferred the package.

  4. Load the MKE images from the .tar.gz file:

    docker load -i ucp.tar.gz
    
  5. Install the MKE image.

Uninstall MKE

This topic describes how to uninstall MKE from your cluster. After uninstalling MKE, your instances of MCR will continue running in swarm mode and your applications will run normally. You will not, however, be able to do the following unless you reinstall MKE:

  • Enforce role-based access control (RBAC) to the cluster.

  • Monitor and manage the cluster from a central place.

  • Join new nodes using docker swarm join.

    Note

    You cannot join new nodes to your cluster after uninstalling MKE because your cluster will be in swarm mode, and swarm mode relies on MKE to provide the CA certificates that allow nodes to communicate with each other. After the certificates expire, the nodes will not be able to communicate at all. Either reinstall MKE before the certificates expire, or disable swarm mode by running docker swarm leave --force on every node.

To uninstall MKE:

  1. Log in to a manager node using SSH.

  2. Run the uninstall-ucp command in interactive mode, thus prompting you for the necessary configuration values:

    docker container run --rm -it \
      -v /var/run/docker.sock:/var/run/docker.sock \
      --name ucp \
      mirantis/ucp:3.5.0 uninstall-ucp --interactive
    

    Note

    The uninstall-ucp command completely removes MKE from every node in the cluster. You do not need to run the command from multiple nodes.

    If the uninstall-ucp command fails, manually uninstall MKE:

    1. On any manager node, remove the remaining MKE services:

      docker service rm $(docker service ls -f name=ucp- -q)
      
    2. On each manager node, remove the remaining MKE containers:

      docker container rm -f $(docker container ps -a -f name=ucp- -f name=k8s_ -q)
      
    3. On each manager node, remove the remaining MKE volumes:

      docker volume rm $(docker volume ls -f name=ucp -q)
      
  3. Optional. Delete the MKE configuration:

    docker container run --rm -it \
      -v /var/run/docker.sock:/var/run/docker.sock \
      --name ucp \
      mirantis/ucp:3.5.0 uninstall-ucp --purge-config
    

    MKE keeps the configuration by default in case you want to reinstall MKE later with the same configuration. For all available uninstall-ucp options, refer to mirantis/ucp uninstall-ucp.

  4. Optional. Restore the host IP tables to their pre-MKE installation values by restarting the node.

    Note

    The Calico network plugin changed the host IP tables from their original values during MKE installation.

Deploy Swarm-only mode

Available since MKE 3.5.0

Swarm-only mode is an MKE configuration that supports only Swarm orchestration. Because it lacks Kubernetes and its operational and health-check dependencies, the resulting installation is smaller and highly stable compared with a typical mixed-orchestration MKE installation.

You can only enable or disable Swarm-only mode at the time of MKE installation. MKE preserves the Swarm-only setting through upgrades, backups, and system restoration. Installing MKE in Swarm-only mode pulls only the images required to run MKE in this configuration. Refer to Swarm-only images for more information.

Note

Installing MKE in Swarm-only mode removes all Kubernetes options from the MKE web UI.

To install MKE in swarm-only mode:

  1. Complete the steps and recommendations in Plan the deployment and Perform pre-deployment configuration.

  2. Add the --swarm-only flag to the install command in Install the MKE image:

    docker container run --rm -it --name ucp \
    -v /var/run/docker.sock:/var/run/docker.sock \
    mirantis/ucp:3.5.0 install \
    --host-address <node-ip-address> \
    --interactive \
    --swarm-only
    

Note

In addition, MKE includes the --swarm-only flag with the bootstrapper images command, which you can use to pull or to check the required images on manager nodes.
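
For example, a sketch of listing the required Swarm-only images on a manager node; verify the exact flags against the mirantis/ucp images reference for your MKE version:

docker container run --rm -it --name ucp \
-v /var/run/docker.sock:/var/run/docker.sock \
mirantis/ucp:3.5.0 images --list --swarm-only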

Swarm-only images

Installing MKE in Swarm-only mode pulls the following set of images, which is smaller than that of a typical MKE installation:

  • ucp-agent (ucp-agent-win on Windows)

  • ucp-auth-store

  • ucp-auth

  • ucp-azure-ip-allocator

  • ucp-cfssl

  • ucp-compose

  • ucp-containerd-shim-process (ucp-containerd-shim-process-win on Windows)

  • ucp-controller

  • ucp-csi-attacher

  • ucp-csi-liveness-probe

  • ucp-csi-node-driver-registrar

  • ucp-csi-provisioner

  • ucp-csi-resizer

  • ucp-csi-snapshotter

  • ucp-dsinfo (ucp-dsinfo-win on Windows)

  • ucp-etcd

  • ucp-interlock-config

  • ucp-interlock-extension

  • ucp-interlock-proxy

  • ucp-interlock

  • ucp-metrics

  • ucp-openstack-ccm

  • ucp-openstack-cinder-csi-plugin

  • ucp-swarm

Prometheus

In Swarm-only mode, MKE runs the Prometheus server and the authenticating proxy in a single container on each manager node. Thus, unlike in conventional MKE installations, you cannot configure Prometheus server placement. Prometheus does not collect Kubernetes metrics in Swarm-only mode, and it requires an additional reserved port on manager nodes: 12387.

Operations Guide

The MKE Operations Guide provides the comprehensive information you need to run the MKE container orchestration platform. The guide is intended for anyone who needs to effectively develop and securely administer applications at scale, on private clouds, public clouds, and on bare metal.

Access an MKE cluster

You can access an MKE cluster in a variety of ways including through the MKE web UI, Docker CLI, and kubectl (the Kubernetes CLI). To use the Docker CLI and kubectl with MKE, first download a client certificate bundle. This topic describes the MKE web UI, how to download and configure the client bundle, and how to configure kubectl with MKE.

Access the MKE web UI

MKE allows you to control your cluster visually using the web UI. Role-based access control (RBAC) gives administrators and non-administrators access to the following web UI features:

  • Administrators:

    • Manage cluster configurations.

    • View and edit all cluster images, networks, volumes, and containers.

    • Manage the permissions of users, teams, and organizations.

    • Grant node-specific task scheduling permissions to users.

  • Non-administrators:

    • View and edit all cluster images, networks, volumes, and containers. This requires that an administrator grant access.

To access the MKE web UI:

  1. Open a browser and navigate to https://<ip-address> (substituting <ip-address> with the IP address of the machine that ran docker run).

  2. Enter the user name and password that you set up when installing the MKE image.

Note

To set up two-factor authentication for logging in to the MKE web UI, see Use two-factor authentication.

Download and configure the client bundle

Download and configure the MKE client certificate bundle to use MKE with Docker CLI and kubectl. The bundle includes:

  • A private and public key pair for authorizing your requests using MKE

  • Utility scripts for configuring Docker CLI and kubectl with your MKE deployment

Note

MKE issues different certificates for each user type:

User certificate bundles

Allow running docker commands only through MKE manager nodes.

Administrator certificate bundles

Allow running docker commands through all node types.

Download the client bundle

This section explains how to download the client certificate bundle using either the MKE web UI or the MKE API.

To download the client certificate bundle using the MKE web UI:

  1. Navigate to My Profile.

  2. Click Client Bundles > New Client Bundle.

To download the client certificate bundle using the MKE API on Linux:

  1. Create an environment variable with the user security token:

    AUTHTOKEN=$(curl -sk -d \
    '{"username":"<username>","password":"<password>"}' \
    https://<mke-ip>/auth/login | jq -r .auth_token)
    
  2. Download the client certificate bundle:

    curl -k -H "Authorization: Bearer $AUTHTOKEN" \
    https://<mke-ip>/api/clientbundle -o bundle.zip
    

To download the client certificate bundle using the MKE API on Windows Server 2016:

  1. Open an elevated PowerShell prompt.

  2. Create an environment variable with the user security token:

    $AUTHTOKEN = ((Invoke-WebRequest -Body '{"username":"<username>","password":"<password>"}' `
    -Uri https://`<mke-ip`>/auth/login -Method POST).Content) |
    ConvertFrom-Json | Select-Object -ExpandProperty auth_token
    
  3. Download the client certificate bundle:

    [io.file]::WriteAllBytes("ucp-bundle.zip", `
    ((Invoke-WebRequest -Uri https://`<mke-ip`>/api/clientbundle `
    -Headers @{"Authorization"="Bearer $AUTHTOKEN"}).Content))
    
Configure the client bundle

This section explains how to configure the client certificate bundle to authenticate your requests with MKE using the Docker CLI and kubectl.

To configure the client certificate bundle:

  1. Extract the client bundle .zip file into a directory, and use the appropriate utility script for your system:

    • For Linux:

      cd client-bundle && eval "$(<env.sh)"
      
    • For Windows (from an elevated PowerShell prompt):

      cd client-bundle; .\env.cmd
      

    The utility scripts do the following:

    • Update DOCKER_HOST to make the client tools communicate with your MKE deployment.

    • Update DOCKER_CERT_PATH to use the certificates included in the client bundle.

    • Configure kubectl with the kubectl config command.

  2. Verify that your client tools communicate with MKE:

    docker version --format '{{.Server.Version}}'
    kubectl config current-context
    

    The expected Docker CLI server version starts with ucp/, and the expected kubectl context name starts with ucp_.

  3. Optional. Change your context directly using the client certificate bundle .zip files. In the directory where you downloaded the user bundle, add the new context:

    cd client-bundle && docker context \
    import myucp ucp-bundle-$USER.zip
    

Note

If you use the client certificate bundle with buildkit, make sure that builds are not accidentally scheduled on manager nodes. For more information, refer to Restrict services to worker nodes.

Configure kubectl with MKE

MKE installations include Kubernetes. Users can deploy, manage, and monitor Kubernetes using either the MKE web UI or kubectl.

To install and use kubectl:

  1. Identify which version of Kubernetes you are running by using the MKE web UI, the MKE API version endpoint, or the Docker CLI docker version command with the client bundle.

    Caution

    Kubernetes requires that kubectl and Kubernetes be within one minor version of each other.

  2. Refer to Kubernetes: Install Tools to download and install the appropriate kubectl binary.

  3. Download the client bundle.

  4. Refer to Configure the client bundle to configure kubectl with MKE using the certificates and keys contained in the client bundle.

  5. Optional. Install Helm, the Kubernetes package manager, and Tiller, the Helm server.

    Caution

    Helm requires MKE 3.1.x or higher.

    To use Helm and Tiller with MKE, grant the default service account within the kube-system namespace the necessary roles:

    kubectl create rolebinding default-view --clusterrole=view \
    --serviceaccount=kube-system:default --namespace=kube-system
    
    kubectl create clusterrolebinding add-on-cluster-admin \
    --clusterrole=cluster-admin --serviceaccount=kube-system:default
    

    Note

    Helm recommends that you specify a Role and RoleBinding to limit the scope of Tiller to a particular namespace. Refer to the official Helm documentation for more information.

See also

Kubernetes

Administer an MKE cluster

Add labels to cluster nodes

With MKE, you can add labels to your nodes. Labels are metadata that describe the node, such as:

  • node role (development, QA, production)

  • node region (US, EU, APAC)

  • disk type (HDD, SSD)

Once you apply a label to a node, you can specify constraints when deploying a service to ensure that the service only runs on nodes that meet particular criteria.

Hint

Use resource sets (MKE collections or Kubernetes namespaces) to organize access to your cluster, rather than creating labels for authorization and permissions to resources.

Apply labels to a node

The following example procedure applies the ssd label to a node.

  1. Log in to the MKE web UI with administrator credentials.

  2. Click Shared Resources in the navigation menu to expand the selections.

  3. Click Nodes. The details pane will display the full list of nodes.

  4. Click the node on the list that you want to attach labels to. The details pane will transition, presenting the Overview information for the selected node.

  5. Click the settings icon in the upper-right corner to open the Edit Node page.

  6. Navigate to the Labels section and click Add Label.

  7. Add a label, entering disk into the Key field and ssd into the Value field.

  8. Click Save to dismiss the Edit Node page and return to the node Overview.

Hint

You can use the CLI to apply a label to a node:

docker node update --label-add <key>=<value> <node-id>

Deploy a service with constraints

The following example procedure deploys a service with a constraint, node.labels.disk == ssd, which ensures that the service runs only on nodes with SSD storage.

  1. Log in to the MKE web UI with administrator credentials.

  2. Click Shared Resources in the navigation menu to expand the selections.

  3. Click Stacks. The details pane will display the full list of stacks.

  4. Click the Create Stack button to open the Create Application page.

  5. Under 1. Configure Application, enter “wordpress” into the Name field.

  6. Under ORCHESTRATOR MODE, select Swarm Services.

  7. Under 2. Add Application File, paste the following stack file in the docker-compose.yml editor:

    version: "3.1"
    
    services:
      db:
        image: mysql:5.7
        deploy:
          placement:
            constraints:
              - node.labels.disk == ssd
          restart_policy:
            condition: on-failure
        networks:
          - wordpress-net
        environment:
          MYSQL_ROOT_PASSWORD: wordpress
          MYSQL_DATABASE: wordpress
          MYSQL_USER: wordpress
          MYSQL_PASSWORD: wordpress
      wordpress:
        depends_on:
          - db
        image: wordpress:latest
        deploy:
          replicas: 1
          placement:
            constraints:
              - node.labels.disk == ssd
          restart_policy:
            condition: on-failure
            max_attempts: 3
        networks:
          - wordpress-net
        ports:
          - "8000:80"
        environment:
          WORDPRESS_DB_HOST: db:3306
          WORDPRESS_DB_PASSWORD: wordpress
    
    networks:
      wordpress-net:
    
  8. Click Create to deploy the stack.

  9. Click Done once the stack deployment completes to return to the stacks list, which now includes your newly created stack.

  10. In the navigation menu, click Nodes. The details pane will display the full list of nodes.

  11. Click the node with the disk label.

  12. In the details pane, click the Inspect Resource drop-down menu and select Containers.

  13. Dismiss the filter and navigate to the Nodes page.

  14. Click any node that does not have the disk label.

  15. In the details pane, click the Inspect Resource drop-down menu and select Containers. Note that no WordPress containers are scheduled on the node, and dismiss the filter.

Add or remove a service constraint using the MKE web UI

You can declare the deployment constraints in your docker-compose.yml file or when you create a stack. Also, you can apply constraints when you create a service.
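
For reference, the CLI equivalent when creating a single service applies the same label-based constraint; the service name and image here are only illustrative:

docker service create --name wordpress-db \
  --constraint 'node.labels.disk == ssd' \
  mysql:5.7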

To add or remove a service constraint:

  1. Verify whether a service has deployment constraints:

    1. Navigate to the Services page and select that service.

    2. In the details pane, click Constraints to list the constraint labels.

  2. Edit the constraints on the service:

    1. Click Configure and select Details to open the Update Service page.

    2. Click Scheduling to view the constraints.

    3. Add or remove deployment constraints.

See also

Kubernetes

Add SANs to cluster certificates

A SAN (Subject Alternative Name) is a structured means for associating various values (such as domain names, IP addresses, email addresses, URIs, and so on) with a security certificate.

MKE always runs with HTTPS enabled. As such, whenever you connect to MKE, you must ensure that the MKE certificates recognize the host name in use. For example, if MKE is behind a load balancer that forwards traffic to your MKE instance, your requests will not be for the MKE host name or IP address but for the host name of the load balancer. Thus, MKE will reject the requests, unless you include the address of the load balancer as a SAN in the MKE certificates.

Note

  • To use your own TLS certificates, confirm first that these certificates have the correct SAN values.

  • To use the self-signed certificate that MKE offers out-of-the-box, you can use the --san argument to set up the SANs during MKE deployment.
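
For example, a sketch of supplying a load balancer address as a SAN at install time; the DNS name is a placeholder:

docker container run --rm -it --name ucp \
-v /var/run/docker.sock:/var/run/docker.sock \
mirantis/ucp:3.5.0 install \
--host-address <node-ip-address> \
--san lb.example.com \
--interactive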

To add new SANs using the MKE web UI:

  1. Log in to the MKE web UI using administrator credentials.

  2. Navigate to the Nodes page.

  3. Click on a manager node to display the details pane for that node.

  4. Click Configure and select Details.

  5. In the SANs section, click Add SAN and enter one or more SANs for the cluster.

  6. Click Save.

  7. Repeat for every existing manager node in the cluster.

    Note

    Thereafter, the SANs are automatically applied to any new manager nodes that join the cluster.

To add new SANs using the MKE CLI:

  1. Get the current set of SANs for the given manager node:

    docker node inspect --format \
      '{{ index .Spec.Labels "com.docker.ucp.SANs" }}' <node-id>
    

    Example of system response:

    default-cs,127.0.0.1,172.17.0.1
    
  2. Append the desired SAN to the list (for example, default-cs,127.0.0.1,172.17.0.1,example.com) and run:

    docker node update --label-add com.docker.ucp.SANs=<SANs-list> <node-id>
    

    Note

    <SANs-list> is the comma-separated list of SANs with your new SAN appended at the end.

  3. Repeat the command sequence for each manager node.

Collect MKE cluster metrics with Prometheus

Prometheus is an open-source systems monitoring and alerting toolkit. You can configure MKE as a Prometheus target.

Prometheus runs as a Kubernetes workload that, by default, is deployed as a DaemonSet running on every manager node. A key benefit of this is that you can set the DaemonSet to not schedule on any nodes, which effectively disables Prometheus if you do not use the MKE web interface.
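
As a purely hypothetical sketch of the kind of change involved, assuming the DaemonSet is named ucp-metrics and lives in the kube-system namespace, you could constrain it to a node label that exists nowhere so that it schedules on no nodes; the label key is a placeholder:

kubectl -n kube-system patch daemonset ucp-metrics \
  --patch '{"spec":{"template":{"spec":{"nodeSelector":{"com.example/unschedulable":"true"}}}}}'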

Along with events and logs, metrics are data sources that provide a view into your cluster, presenting numerical data values that have a time-series component. There are several sources from which you can derive metrics, each providing different meanings for a business and its applications.

As the metrics data is stored locally on disk for each Prometheus server, it does not replicate on new managers or if you schedule Prometheus to run on a new node. The metrics are kept no longer than 24 hours.

MKE metrics types

MKE provides a base set of metrics that gets you into production without having to rely on external or third-party tools. Mirantis strongly encourages, though, the use of additional monitoring to provide more comprehensive visibility into your specific MKE environment.

Metrics types

Metric type

Description

Business

High-level aggregate metrics that typically combine technical, financial, and organizational data to create IT infrastructure information for business leaders. Examples of business metrics include:

  • Company or division-level application downtime

  • Aggregate resource utilization

  • Application resource demand growth

Application

Metrics from the domain of APM tools (such as AppDynamics and DynaTrace) that supply information on the state or performance of the application itself.

  • Service state

  • Container platform

  • Host infrastructure

Service

Metrics on the state of services that are running on the container platform. Such metrics have very low cardinality, meaning the values are typically from a small fixed set of possibilities (commonly binary).

  • Application health

  • Convergence of Kubernetes deployments and Swarm services

  • Cluster load by number of services or containers or pods

Note

Web UI disk usage (including free space) reflects only the MKE managed portion of the file system: /var/lib/docker. To monitor the total space available on each filesystem of an MKE worker or manager, deploy a third-party monitoring solution to oversee the operating system.

See also

Kubernetes

Metrics labels

The metrics that MKE exposes in Prometheus have standardized labels, depending on the target resource.

Container labels

Label name

Value

collection

The collection ID of the collection the container is in, if any.

container

The ID of the container.

image

The name of the container image.

manager

Set to true if the container node is an MKE manager.

name

The container name.

podName

The pod name, if the container is part of a Kubernetes Pod.

podNamespace

The pod namespace, if the container is part of a Kubernetes Pod namespace.

podContainerName

The container name in the pod spec, if the container is part of a Kubernetes pod.

service

The service ID, if the container is part of a Swarm service.

stack

The stack name, if the container is part of a Docker Compose stack.

Container networking labels

Label name

Value

collection

The collection ID of the collection the container is in, if any.

container

The ID of the container.

image

The name of the container image.

manager

Set to true if the container node is an MKE manager.

name

The container name.

network

The ID of the network.

podName

The pod name, if the container is part of a Kubernetes pod.

podNamespace

The pod namespace, if the container is part of a Kubernetes pod namespace.

podContainerName

The container name in the pod spec, if the container is part of a Kubernetes pod.

service

The service ID, if the container is part of a Swarm service.

stack

The stack name, if the container is part of a Docker Compose stack.

Note

The container networking labels are the same as the Container labels, with the addition of network.

Node labels

Label name

Value

manager

Set to true if the node is an MKE manager.

See also

Kubernetes

MKE Metrics exposed by Prometheus

MKE exports metrics on every node and also exports additional metrics from every controller.

Node-sourced MKE metrics

The metrics that MKE exports from nodes are specific to those nodes (for example, the total memory on that node).

The tables below offer detail on the node-sourced metrics that MKE exposes in Prometheus with the ucp_ prefix.
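
For example, once Prometheus is scraping MKE, an illustrative PromQL query using one of the metrics below returns the current memory use of each container running on manager nodes:

sum by (name) (ucp_engine_container_memory_usage_bytes{manager="true"})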

ucp_engine_container_cpu_percent

Units

Percentage

Description

Percentage of CPU time in use by the container

Labels

Container

ucp_engine_container_cpu_total_time_nanoseconds

Units

Nanoseconds

Description

Total CPU time used by the container

Labels

Container

ucp_engine_container_health

Units

0.0 or 1.0

Description

The container health, according to its healthcheck.

The 0 value indicates that the container is not reporting as healthy, which is likely because it either does not have a healthcheck defined or because healthcheck results have not yet been returned

Labels

Container

ucp_engine_container_memory_max_usage_bytes

Units

Bytes

Description

Maximum memory in use by the container in bytes

Labels

Container

ucp_engine_container_memory_usage_bytes

Units

Bytes

Description

Current memory in use by the container in bytes

Labels

Container

ucp_engine_container_memory_usage_percent

Units

Percentage

Description

Percentage of total node memory currently in use by the container

Labels

Container

ucp_engine_container_network_rx_bytes_total

Units

Bytes

Description

Number of bytes received by the container over the network in the last sample

Labels

Container networking

ucp_engine_container_network_rx_dropped_packets_total

Units

Number of packets

Description

Number of packets bound for the container over the network that were dropped in the last sample

Labels

Container networking

ucp_engine_container_network_rx_errors_total

Units

Number of errors

Description

Number of received network errors for the container over the network in the last sample

Labels

Container networking

ucp_engine_container_network_rx_packets_total

Units

Number of packets

Description

Number of packets received by the container over the network in the last sample

Labels

Container networking

ucp_engine_container_network_tx_bytes_total

Units

Bytes

Description

Number of bytes sent by the container over the network in the last sample

Labels

Container networking

ucp_engine_container_network_tx_dropped_packets_total

Units

Number of packets

Description

Number of packets sent from the container over the network that were dropped in the last sample

Labels

Container networking

ucp_engine_container_network_tx_errors_total

Units

Number of errors

Description

Number of sent network errors for the container on the network in the last sample

Labels

Container networking

ucp_engine_container_network_tx_packets_total

Units

Number of packets

Description

Number of sent packets for the container over the network in the last sample

Labels

Container networking

ucp_engine_container_unhealth

Units

0.0 or 1.0

Description

Indicates whether the container is unhealthy, according to its healthcheck.

The 1 value indicates that the container is reporting as unhealthy, while the 0 value indicates that it is not, which may be because its healthcheck is passing, because it does not have a healthcheck defined, or because healthcheck results have not yet been returned

Labels

Container

ucp_engine_containers

Units

Number of containers

Description

Total number of containers on the node

Labels

Node

ucp_engine_cpu_total_time_nanoseconds

Units

Nanoseconds

Description

System CPU time used by the container

Labels

Container

ucp_engine_disk_free_bytes

Units

Bytes

Description

Free disk space on the Docker root directory on the node, in bytes. This metric is not available to Windows nodes

Labels

Node

ucp_engine_disk_total_bytes

Units

Bytes

Description

Total disk space on the Docker root directory on this node in bytes. Note that the ucp_engine_disk_free_bytes metric is not available for Windows nodes

Labels

Node

ucp_engine_images

Units

Number of images

Description

Total number of images on the node

Labels

Node

ucp_engine_memory_total_bytes

Units

Bytes

Description

Total amount of memory on the node

Labels

Node

ucp_engine_networks

Units

Number of networks

Description

Total number of networks on the node

Labels

Node

ucp_engine_num_cpu_cores

Units

Number of cores

Description

Number of CPU cores on the node

Labels

Node

ucp_engine_volumes

Units

Number of volumes

Description

Total number of volumes on the node

Labels

Node

Controller-sourced MKE metrics

The metrics that MKE exports from controllers are cluster-scoped (for example, the total number of Swarm services).

The tables below offer detail on the controller-sourced metrics that MKE exposes in Prometheus with the ucp_ prefix.

ucp_controller_services

Units

Number of services

Description

Total number of Swarm services

Labels

Not applicable

ucp_engine_node_health

Units

0.0 or 1.0

Description

Health status of the node, as determined by MKE

Labels

nodeName: node name, nodeAddr: node IP address

ucp_engine_pod_container_ready

Units

0.0 or 1.0

Description

Readiness of the container in a Kubernetes pod, as determined by its readiness probe

Labels

Pod

ucp_engine_pod_ready

Units

0.0 or 1.0

Description

Readiness of the Kubernetes pod, as determined by its readiness probes

Labels

Pod

See also

Kubernetes Pods

Deploy Prometheus on worker nodes

MKE deploys Prometheus by default on the manager nodes to provide a built-in metrics back end. For cluster sizes over 100 nodes, or if you need to scrape metrics from Prometheus instances, Mirantis recommends that you deploy Prometheus on dedicated worker nodes in the cluster.

To deploy Prometheus on worker nodes:

  1. Source an admin bundle.

  2. Verify that ucp-metrics pods are running on all managers:

    $ kubectl -n kube-system get pods -l k8s-app=ucp-metrics -o wide
    
    NAME               READY  STATUS   RESTARTS  AGE  IP            NODE
    ucp-metrics-hvkr7  3/3    Running  0         4h   192.168.80.66 3a724a-0
    
  3. Add a Kubernetes node label to one or more workers. For example, add a label with the key ucp-metrics and an empty value ("") to a node named 3a724a-1.

    $ kubectl label node 3a724a-1 ucp-metrics=
    
    node "test-3a724a-1" labeled
    

    SELinux Prometheus Deployment

    If you use SELinux, label your ucp-node-certs directories properly on the worker nodes before you move the ucp-metrics workload to them. To run ucp-metrics on a worker node, update the ucp-node-certs label by running:

    sudo chcon -R system_u:object_r:container_file_t:s0 /var/lib/docker/volumes/ucp-node-certs/_data

  4. Patch the ucp-metrics DaemonSet’s nodeSelector with the same key and value in use for the node label. This example shows the key ucp-metrics and the value "".

    $ kubectl -n kube-system patch daemonset ucp-metrics --type json -p '[{"op": "replace", "path": "/spec/template/spec/nodeSelector", "value": {"ucp-metrics": ""}}]'

    daemonset "ucp-metrics" patched
    
  5. Confirm that ucp-metrics pods are running only on the labeled workers.

    $ kubectl -n kube-system get pods -l k8s-app=ucp-metrics -o wide
    
    NAME               READY  STATUS       RESTARTS  AGE IP           NODE
    ucp-metrics-88lzx  3/3    Running      0         12s 192.168.83.1 3a724a-1
    ucp-metrics-hvkr7  3/3    Terminating  0         4h 192.168.80.66 3a724a-0
    

See also

Kubernetes

Configure external Prometheus to scrape metrics from MKE

To configure your external Prometheus server to scrape metrics from Prometheus in MKE:

  1. Source an admin bundle.

  2. Create a Kubernetes secret that contains your bundle TLS material.

    (cd $DOCKER_CERT_PATH && kubectl create secret generic prometheus --from-file=ca.pem --from-file=cert.pem --from-file=key.pem)
    
  3. Create a Prometheus deployment and ClusterIP service using YAML.

    On AWS with the Kubernetes cloud provider configured:

    1. Replace ClusterIP with LoadBalancer in the service YAML.

    2. Access the service through the load balancer.

    3. If you run Prometheus external to MKE, change the domain for the inventory container in the Prometheus deployment from ucp-controller.kube-system.svc.cluster.local to an external domain, to access MKE from the Prometheus node.

    kubectl apply -f - <<EOF
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: prometheus
    data:
      prometheus.yaml: |
        global:
          scrape_interval: 10s
        scrape_configs:
        - job_name: 'ucp'
          tls_config:
            ca_file: /bundle/ca.pem
            cert_file: /bundle/cert.pem
            key_file: /bundle/key.pem
            server_name: proxy.local
          scheme: https
          file_sd_configs:
          - files:
            - /inventory/inventory.json
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: prometheus
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: prometheus
      template:
        metadata:
          labels:
            app: prometheus
        spec:
          containers:
          - name: inventory
            image: alpine
            command: ["sh", "-c"]
            args:
            - apk add --no-cache curl &&
              while :; do
                curl -Ss --cacert /bundle/ca.pem --cert /bundle/cert.pem --key /bundle/key.pem --output /inventory/inventory.json https://ucp-controller.kube-system.svc.cluster.local/metricsdiscovery;
                sleep 15;
              done
            volumeMounts:
            - name: bundle
              mountPath: /bundle
            - name: inventory
              mountPath: /inventory
          - name: prometheus
            image: prom/prometheus
            command: ["/bin/prometheus"]
            args:
            - --config.file=/config/prometheus.yaml
            - --storage.tsdb.path=/prometheus
            - --web.console.libraries=/etc/prometheus/console_libraries
            - --web.console.templates=/etc/prometheus/consoles
            volumeMounts:
            - name: bundle
              mountPath: /bundle
            - name: config
              mountPath: /config
            - name: inventory
              mountPath: /inventory
          volumes:
          - name: bundle
            secret:
              secretName: prometheus
          - name: config
            configMap:
              name: prometheus
          - name: inventory
            emptyDir:
              medium: Memory
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: prometheus
    spec:
      ports:
      - port: 9090
        targetPort: 9090
      selector:
        app: prometheus
      sessionAffinity: ClientIP
    EOF
    
  4. Determine the service ClusterIP:

    $ kubectl get service prometheus
    
    NAME         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
    prometheus   ClusterIP   10.96.254.107   <none>        9090/TCP   1h
    
  5. Forward port 9090 on the local host to the ClusterIP. The tunnel you create does not need to be kept alive as its only purpose is to expose the Prometheus UI.

    ssh -L 9090:10.96.254.107:9090 ANY_NODE
    
  6. Visit http://127.0.0.1:9090 to explore the MKE metrics that Prometheus is collecting.

See also

Kubernetes

See also

Kubernetes

Configure native Kubernetes role-based access control

MKE uses native Kubernetes RBAC, which is active by default for Kubernetes clusters. The YAML files of many ecosystem applications and integrations use Kubernetes RBAC to access service accounts. Also, organizations looking to run MKE both on-premises and in hosted cloud services want to run Kubernetes applications in both environments without having to manually change RBAC in their YAML file.

Note

Kubernetes and Swarm roles have separate views. Using the MKE web UI, you can view all the roles for a particular cluster:

  1. Click Access Control in the navigation menu at the left.

  2. Click Roles.

  3. Select the Kubernetes tab or the Swarm tab to view the specific roles for each.

Create a Kubernetes role

You create Kubernetes roles either through the CLI using Kubernetes kubectl tool or through the MKE web UI.
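
As a minimal CLI sketch, a standard Kubernetes Role manifest can be applied with kubectl; the pod-reader name, default namespace, and rules below are placeholders:

kubectl apply -f - <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: default
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
EOF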

To create a Kubernetes role using the MKE web UI:

  1. Log in to the MKE web UI.

  2. In the navigation menu at the left, click Access Control to display the available options.

  3. Click Roles.

  4. At the top of the details pane, click the Kubernetes tab.

  5. Click Create to open the Create Kubernetes Object page.

  6. Click Namespace to select a namespace for the role from one of the available options.

  7. Provide the YAML file for the role. To do this, either enter it in the Object YAML editor, or upload an existing .yml file using the Click to upload a .yml file selection link at the right.

  8. Click Create to complete role creation.

See also

Create a Kubernetes role grant

Kubernetes provides two types of role grants:

  • ClusterRoleBinding (applies to all namespaces)

  • RoleBinding (applies to a specific namespace)
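
As a minimal CLI sketch, both binding types can also be created with kubectl; the role, user, and namespace names below are placeholders:

kubectl create rolebinding jane-pod-reader --role=pod-reader --user=jane --namespace=default
kubectl create clusterrolebinding jane-cluster-view --clusterrole=view --user=jane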

To create a grant for a Kubernetes role in the MKE web UI:

  1. Log in to the MKE web UI.

  2. In the navigation menu at the left, click Access Control to display the available options.

  3. Click the Grants option.

  4. At the top of the details pane, click the Kubernetes tab. All existing grants to Kubernetes roles are present in the details pane.

  5. Click Create Role Binding to open the Create Role Binding page.

  6. Select the subject type at the top of the 1. Subject section (Users, Organizations, or Service Account).

  7. Create a role binding for the selected subject type:

    • Users: Select a type from the User drop-down list.

    • Organizations: Select a type from the Organization drop-down list. Optionally, you can also select a team using the Team(optional) drop-down list, if any have been established.

    • Service Account: Select a NAMESPACE from the Namespace drop-down list, then a type from the Service Account drop-down list.

  8. Click Next to activate the 2. Resource Set section.

  9. Select a resource set for the subject.

    By default, the default namespace is indicated. To use a different namespace, select the Select Namespace button associated with the desired namespace.

    For ClusterRoleBinding, slide the Apply Role Binding to all namespace (Cluster Role Binding) selector to the right.

  10. Click Next to activate the 3. Role section.

  11. Select the role type.

    • Role

    • Cluster Role

    Note

    Cluster Role type is the only role type available if you enabled Apply Role Binding to all namespace (Cluster Role Binding) in the 2. Resource Set section.

  12. Select the role from the drop-down list.

  13. Click Create to complete grant creation.

See also

Kubernetes

MKE audit logging

Audit logs are a chronological record of security-relevant activities by individual users, administrators, or software components that have had an effect on an MKE system. They focus on external user/agent actions and security, rather than attempting to understand state or events of the system itself.

Audit logs capture all HTTP actions (GET, PUT, POST, PATCH, DELETE) invoked against all MKE API, Swarm API, and Kubernetes API endpoints (with the exception of those on the ignored list), and are sent to Mirantis Container Runtime via stdout.

The benefits that audit logs provide include:

Historical troubleshooting

You can use audit logs to determine a sequence of past events that can help explain why an issue occurred.

Security analysis and auditing

A full record of all user interactions with the container infrastructure can provide your security team with the visibility necessary to root out questionable or unauthorized access attempts.

Chargeback

Use audit log data about resource use to generate chargeback information.

Alerting

By placing a watch on an event stream, or on the notifications that events create, you can build alerting features on top of event tools that generate alerts for ops teams (PagerDuty, OpsGenie, Slack, or custom solutions).

Logging levels

MKE provides three levels of audit logging to administrators:

None

Audit logging is disabled.

Metadata

Includes:
  • Method and API endpoint for the request

  • MKE user who made the request

  • Response status (success or failure)

  • Timestamp of the call

  • Object ID of any created or updated resource (for create or update API calls). We do not include names of created or updated resources.

  • License key

  • Remote address

Request

Includes all fields from the Metadata level, as well as the request payload.

Once you enable MKE audit logging, the audit logs are collected in the container logs of the ucp-controller container on each MKE manager node.

Note

Be sure to configure a logging driver with log rotation set, as audit logging can generate a large amount of data.
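
For example, a minimal log rotation configuration for the default json-file driver, set in /etc/docker/daemon.json on each node, might look like the following (the size and file-count values are illustrative):

{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}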

Enable MKE audit logging

You can enable MKE audit logging using the MKE web UI, the MKE API, or the MKE configuration file.

Enable MKE audit logging using the web UI
  1. Log in to the MKE web user interface.

  2. Click admin to open the navigation menu at the left.

  3. Click Admin Settings.

  4. Click Logs & Audit Logs to open the Logs & Audit Logs details pane.

  5. In the Configure Audit Log Level section, select the relevant logging level.

  6. Click Save.

Enable MKE audit logging using the API
  1. Download the MKE client bundle from the command line, as described in Download the client bundle.

  2. Retrieve the JSON file for current audit log configuration:

    export DOCKER_CERT_PATH=~/ucp-bundle-dir/
    curl --cert ${DOCKER_CERT_PATH}/cert.pem --key ${DOCKER_CERT_PATH}/key.pem --cacert ${DOCKER_CERT_PATH}/ca.pem -k -X GET https://ucp-domain/api/ucp/config/logging > auditlog.json
    
  3. In auditlog.json, edit the auditlevel field to metadata or request:

    {
        "logLevel": "INFO",
        "auditLevel": "metadata",
        "supportDumpIncludeAuditLogs": false
    }
    
  4. Send the JSON request for the audit logging configuration with the same API path, but using the PUT method:

    curl --cert ${DOCKER_CERT_PATH}/cert.pem --key ${DOCKER_CERT_PATH}/key.pem \
      --cacert ${DOCKER_CERT_PATH}/ca.pem -k -H "Content-Type: application/json" \
      -X PUT --data "$(cat auditlog.json)" https://ucp-domain/api/ucp/config/logging
    
Enable MKE audit logging using the configuration file

You can enable MKE audit logging using the MKE configuration file before or after MKE installation.

The section of the MKE configuration file that controls MKE auditing logging is [audit_log_configuration]:

[audit_log_configuration]
  level = "metadata"
  support_dump_include_audit_logs = false

The level setting supports the following variables:

  • ""

  • "metadata"

  • "request"

Caution

The support_dump_include_audit_logs flag specifies whether user identification information from the ucp-controller container logs is included in the support dump. To prevent this information from being sent with the support dump, make sure that support_dump_include_audit_logs is set to false. When disabled, the support dump collection tool filters out any lines from the ucp-controller container logs that contain the substring auditID.

Access audit logs using the docker CLI

The audit logs are exposed through the ucp-controller logs. You can access these logs locally through the Docker CLI.

Note

You can also access MKE audit logs using an external container logging solution, such as ELK.

To access audit logs using the Docker CLI:

  1. Source an MKE client bundle.

  2. Run docker logs to obtain audit logs.

    The following example uses the --tail option to show only the most recent log entry.

    $ docker logs ucp-controller --tail 1
    
    {"audit":{"auditID":"f8ce4684-cb55-4c88-652c-d2ebd2e9365e","kind":"docker-swarm","level":"metadata","metadata":{"creationTimestamp":null},"requestReceivedTimestamp":"2019-01-30T17:21:45.316157Z","requestURI":"/metricsservice/query?query=(%20(sum%20by%20(instance)%20(ucp_engine_container_memory_usage_bytes%7Bmanager%3D%22true%22%7D))%20%2F%20(sum%20by%20(instance)%20(ucp_engine_memory_total_bytes%7Bmanager%3D%22true%22%7D))%20)%20*%20100\u0026time=2019-01-30T17%3A21%3A45.286Z","sourceIPs":["172.31.45.250:48516"],"stage":"RequestReceived","stageTimestamp":null,"timestamp":null,"user":{"extra":{"licenseKey":["FHy6u1SSg_U_Fbo24yYUmtbH-ixRlwrpEQpdO_ntmkoz"],"username":["admin"]},"uid":"4ec3c2fc-312b-4e66-bb4f-b64b8f0ee42a","username":"4ec3c2fc-312b-4e66-bb4f-b64b8f0ee42a"},"verb":"GET"},"level":"info","msg":"audit","time":"2019-01-30T17:21:45Z"}
    

    Sample audit log for a Kubernetes cluster:

    {"audit"; {
          "metadata": {...},
          "level": "Metadata",
          "timestamp": "2018-08-07T22:10:35Z",
          "auditID": "7559d301-fa6b-4ad6-901c-b587fab75277",
          "stage": "RequestReceived",
          "requestURI": "/api/v1/namespaces/default/pods",
          "verb": "list",
          "user": {"username": "alice",...},
          "sourceIPs": ["127.0.0.1"],
          ...,
          "requestReceivedTimestamp": "2018-08-07T22:10:35.428850Z"}}
    

    Sample audit log for a Swarm cluster:

    {"audit"; {
          "metadata": {...},
          "level": "Metadata",
          "timestamp": "2018-08-07T22:10:35Z",
          "auditID": "7559d301-94e7-4ad6-901c-b587fab31512",
          "stage": "RequestReceived",
          "requestURI": "/v1.30/configs/create",
          "verb": "post",
          "user": {"username": "alice",...},
          "sourceIPs": ["127.0.0.1"],
          ...,
          "requestReceivedTimestamp": "2018-08-07T22:10:35.428850Z"}}
    
API endpoints logging constraints

For system security reasons, a number of MKE API endpoints are either ignored by audit logging or have their information redacted.

API endpoints ignored

The following API endpoints are ignored, as they are not considered security events and can create a large number of log entries:

  • /_ping

  • /ca

  • /auth

  • /trustedregistryca

  • /kubeauth

  • /metrics

  • /info

  • /version*

  • /debug

  • /openid_keys

  • /apidocs

  • /kubernetesdocs

  • /manage

API endpoints information redacted

For security purposes, information for the following API endpoints is redacted from the audit logs:

  • /secrets/create (POST)

  • /secrets/{id}/update (POST)

  • /swarm/join (POST)

  • /swarm/update (POST)

  • /auth/login (POST)

  • Kubernetes secrets create/update endpoints

See also

Kubernetes

See also

Kubernetes

Enable MKE telemetry

You can set MKE to automatically record and transmit data to Mirantis through an encrypted channel for monitoring and analysis purposes. The data collected provides the Mirantis Customer Success Organization with information that helps us to better understand the operational use of MKE by our customers. It also provides key feedback in the form of product usage statistics, which enable our product teams to enhance Mirantis products and services.

Specifically, with MKE you can send hourly usage reports, as well as information on API and UI usage.

Caution

To send the telemetry, verify that dockerd and the MKE application container can resolve api.segment.io and create a TCP (HTTPS) connection on port 443.
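
As an illustrative check, you can verify name resolution and HTTPS connectivity from a manager node as follows:

nslookup api.segment.io
curl -v --output /dev/null https://api.segment.io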

To enable telemetry in MKE:

  1. Log in to the MKE web UI as an administrator.

  2. At the top of the navigation menu at the left, click the user name drop-down to display the available options.

  3. Click Admin Settings to display the available options.

  4. Click Usage to open the Usage Reporting screen.

  5. Toggle the Enable API and UI tracking slider to the right.

  6. (Optional) Enter a unique label to identify the cluster in the usage reporting.

  7. Click Save.

Enable and integrate SAML authentication

Security Assertion Markup Language (SAML) is an open standard for exchanging authentication and authorization data between parties. It is commonly supported by enterprise authentication systems. SAML-based single sign-on (SSO) gives you access to MKE through a SAML 2.0-compliant identity provider.

MKE supports the Okta and ADFS identity providers.

The SAML integration process is as follows.

  1. Configure the Identity Provider (IdP).

  2. Enable SAML and configure MKE as the Service Provider under Admin Settings > Authentication and Authorization.

  3. Create (Edit) Teams to link with the Group memberships. This updates team membership information when a user signs in with SAML.

Note

If you enable LDAP integration, you cannot enable SAML for authentication. Note, though, that this does not affect local MKE user account authentication.

Configure SAML integration on identity provider

Identity providers require certain values to successfully integrate with MKE. As these values vary depending on the identity provider, consult your identity provider documentation for instructions on how to best provide the needed information.

Okta integration values

Okta integration requires the following values:

Value

Description

URL for single sign-on (SSO)

URL for MKE, qualified with /enzi/v0/saml/acs. For example, https://111.111.111.111/enzi/v0/saml/acs.

Service provider audience URI

URL for MKE, qualified with /enzi/v0/saml/metadata. For example, https://111.111.111.111/enzi/v0/saml/metadata.

NameID format

Select Unspecified.

Application user name

Email. For example, a custom ${f:substringBefore(user.email, "@")} specifies the user name portion of the email address.

Attribute Statements

  • Name: fullname
    Value: user.displayName

Group Attribute Statement

  • Name: member-of
    Filter: (user defined) to associate group membership.
    The group name is returned with the assertion.
  • Name: is-admin
    Filter: (user defined) for identifying whether the user is an admin.

Okta configuration

When two or more group names are expected to be returned with the assertion, use the regex filter. For example, use the value apple|orange to return the groups apple and orange.

ADFS integration values

To enable ADFS integration:

  1. Add a relying party trust.

  2. Obtain the service provider metadata URI.

    The service provider metadata URI value is the URL for MKE, qualified with /enzi/v0/saml/metadata. For example, https://111.111.111.111/enzi/v0/saml/metadata.

  3. Add claim rules.

    1. Convert values from AD to SAML

      • Display-name : Common Name

      • E-Mail-Addresses : E-Mail Address

      • SAM-Account-Name : Name ID

    2. Create a full name for MKE (custom rule):

      c:[Type == "http://schemas.xmlsoap.org/claims/CommonName"]
        => issue(Type = "fullname", Issuer = c.Issuer, OriginalIssuer = c.OriginalIssuer, Value = c.Value, ValueType = c.ValueType);
      
    3. Transform account name to Name ID:

      • Incoming type: Name ID

      • Incoming format: Unspecified

      • Outgoing claim type: Name ID

      • Outgoing format: Transient ID

    4. Pass admin value to allow admin access based on AD group. Send group membership as claim:

      • Users group: your admin group

      • Outgoing claim type: is-admin

      • Outgoing claim value: 1

    5. Configure group membership for more complex organizations, with multiple groups able to manage access.

      • Send LDAP attributes as claims

      • Attribute store: Active Directory

        • Add two rows with the following information:

          • LDAP attribute = email address; outgoing claim type: email address

          • LDAP attribute = Display-Name; outgoing claim type: common name

      • Mapping:

        • Token-Groups - Unqualified Names : member-of

Note

Once you enable SAML, Service Provider metadata is available at https://<SPHost>/enzi/v0/saml/metadata. The metadata link is also labeled as entityID.

Only POST binding is supported for the Assertion Consumer Service, which is located at https://<SP Host>/enzi/v0/saml/acs.

Configure SAML integration on MKE

SAML configuration requires that you know the metadata URL for your chosen identity provider, as well as the URL for the MKE host that contains the IP address or domain of your MKE installation.

To configure SAML integration on MKE:

  1. Log in to the MKE web UI.

  2. In the navigation menu at the left, click the user name drop-down to display the available options.

  3. Click Admin Settings to display the available options.

  4. Click Authentication & Authorization.

  5. In the Identity Provider section in the details pane, move the slider next to SAML to enable the SAML settings.

  6. In the SAML idP Server subsection, enter the URL for the identity provider metadata in the IdP Metadata URL field.

    Note

    If the metadata URL is publicly certified, you can continue with the default settings:

    • Skip TLS Verification unchecked

    • Root Certificates Bundle blank

    Mirantis recommends TLS verification in production environments. If the metadata URL cannot be certified by the default certificate authority store, you must provide the certificates from the identity provider in the Root Certificates Bundle field.

  7. In the SAML Service Provider subsection, in the MKE Host field, enter the URL that includes the IP address or domain of your MKE installation.

    The port number is optional. The current IP address or domain displays by default.

  8. (Optional) Customize the text of the sign-in button by entering the text for the button in the Customize Sign In Button Text field. By default, the button text is Sign in with SAML.

  9. Copy the SERVICE PROVIDER METADATA URL, the ASSERTION CONSUMER SERVICE (ACS) URL, and the SINGLE LOGOUT (SLO) URL to paste into the identity provider workflow.

  10. Click Save.

Note

  • To configure a service provider, enter the Identity Provider’s metadata URL to obtain its metadata. To access the URL, you may need to provide the CA certificate that can verify the remote server.

  • To link group membership with users, use the Edit or Create team dialog to associate SAML group assertions with MKE teams, so that user team membership is synchronized when the user logs in.

SAML security considerations

From the MKE web UI you can download a client bundle with which you can access MKE using the CLI and the API.

A client bundle is a group of certificates that enable command-line access and API access to the software. It lets you authorize a remote Docker engine to access specific user accounts that are managed in MKE, absorbing all associated RBAC controls in the process. Once you obtain the client bundle, you can execute Docker Swarm commands from your remote machine to take effect on the remote cluster.

Previously-authorized client bundle users can still access MKE, regardless of the newly configured SAML access controls.

Mirantis recommends that you take the following steps to ensure that access from the client bundle is in sync with the identity provider, and to thus prevent any previously-authorized users from accessing MKE through their existing client bundle:

  1. Remove the user account from MKE that grants the client bundle access.

  2. If group membership in the identity provider changes, replicate the change in MKE.

  3. Continue using LDAP to sync group membership.

To download the client bundle:

  1. Log in to the MKE web UI.

  2. In the navigation menu at the left, click the user name drop-down to display the available options.

  3. Click your account name to display the available options.

  4. Click My Profile.

  5. Click the New Client Bundle drop-down in the details pane and select Generate Client Bundle.

  6. (Optional) Enter a name for the bundle into the Label field.

  7. Click Confirm to initiate the bundle download.

Configure an OpenID Connect identity provider

Available since MKE 3.5.0

OpenID Connect (OIDC) allows you to authenticate MKE users with a trusted external identity provider.

Note

Kubernetes users who want client bundles to use OIDC must Download and configure the client bundle and replace the authorization section therein with the parameters presented in the Kubernetes OIDC Authenticator documentation.

For identity providers that require a client redirect URI, use https://<MKE_HOST>/login. For identity providers that do not permit the use of an IP address for the host, use https://<mke-cluster-domain>/login.

The requested scopes for all identity providers are "openid email". Claims are read solely from the ID token that your identity provider returns. MKE does not use the UserInfo URL to obtain user information. The default username claim is sub. To use a different username claim, you must specify that value with the usernameClaim setting in the MKE configuration file.

The following example details the MKE configuration file settings for using an external identity provider.

  • For the signInCriteria array, term is set to hosted domain ("hd") and value is set to the domain from which the user is permitted to sign in.

  • For the adminRoleCriteria array, matchType is set to "contains", in case any administrators are assigned to multiple roles that include admin.

[auth.external_identity_provider]
  wellKnownConfigUrl = "https://example.com/.well-known/openid-configuration"
  clientId = "4dcdace6-4eb4-461d-892f-01aed344ac80"
  clientSecret = "ed89aeddcdb4461ace640"
  usernameClaim = "email"
  caBundle = "----BEGIN CERTIFICATE----\nMIIF...UfTd\n----END CERTIFICATE----\n"

  [[auth.external_identity_provider.signInCriteria]]
    term = "hd"
    value = "myorg.com"
    matchType = "must"

  [[auth.external_identity_provider.adminRoleCriteria]]
    term = "roles"
    value = "admin"
    matchType = "contains"

Note

Using an external identity provider to sign in to the MKE web UI creates a new user session, and thus users who sign in this way will not be signed out when their ID token expires. Instead, the session lifetime is set using the auth.sessions parameters in the MKE configuration file.

Refer to the MKE configuration file auth.external_identity_provider (optional) for the complete reference documentation.

SCIM integration

System for Cross-domain Identity Management (SCIM) provides an LDAP alternative for provisioning and managing users and groups, as well as syncing users and groups with an upstream identity provider. Using the SCIM schema and API, you can utilize single sign-on (SSO) services across various tools.

Prior to MKE 3.2, deactivating a user or changing a user's group membership association in the identity provider was not synchronized with MKE (the service provider). You were required to manually change the status and group membership of the user, and possibly to revoke the client bundle. The SCIM implementation allows proactive synchronization with MKE and eliminates this manual intervention.

Supported identity providers
  • Okta 3.2.0

Typical steps involved in SCIM integration:
  1. Configure SCIM for MKE.

  2. Configure SCIM authentication and access.

  3. Specify user attributes.

Configure SCIM for MKE

Docker’s SCIM implementation utilizes SCIM version 2.0.

Navigate to Admin Settings -> Authentication and Authorization. By default, docker-datacenter is the organization to which the SCIM team belongs. Enter the API token in the UI or have MKE generate a UUID for you.

Configure SCIM authentication and access

The base URL for all SCIM API calls is https://<Host IP>/enzi/v0/scim/v2/. All SCIM methods are accessible API endpoints of this base URL.

Bearer Auth is the API authentication method. When configured, SCIM API endpoints are accessed using the HTTP header Authorization: Bearer <token>.
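
For example, to list users (the token value is a placeholder):

curl -H "Authorization: Bearer <token>" https://<Host IP>/enzi/v0/scim/v2/Users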

Note

  • SCIM API endpoints are not accessible by any other user (or their token), including the MKE administrator and MKE admin Bearer token.

  • As of MKE 3.2.0, an HTTP authentication request header that contains a Bearer token is the only method supported.

Specify user attributes

The following table maps SCIM and SAML attributes to user attribute fields that Docker uses.

MKE

SAML

SCIM

Account name

nameID in response

userName

Account full name

attribute value in fullname assertion

user’s name.formatted

Team group link name

attribute value in member-of assertion

group’s displayName

Team name

N/A

when creating a team, use group’s displayName + _SCIM

Supported SCIM API endpoints
  • User operations

    • Retrieve user information

    • Create a new user

    • Update user information

  • Group operations

    • Create a new user group

    • Retrieve group information

    • Update user group membership (add/replace/remove users)

  • Service provider configuration operations

    • Retrieve service provider resource type metadata

    • Retrieve schema for service provider and SCIM resources

    • Retrieve schema for service provider configuration

User operations

For user GET and POST operations:

  • Filtering is only supported using the userName attribute and eq operator. For example, filter=userName Eq "john".

  • Attribute name and attribute operator are case insensitive. For example, the following two expressions evaluate to the same logical value:

    • filter=userName Eq "john"

    • filter=Username eq "john"

  • Pagination is fully supported.

  • Sorting is not supported.

GET /Users

Returns a list of SCIM users, 200 users per page by default. Use the startIndex and count query parameters to paginate long lists of users.

For example, to retrieve the first 20 users, set startIndex to 1 and count to 20 in the following JSON request:

GET {Host IP}/enzi/v0/scim/v2/Users?startIndex=1&count=20
Host: example.com
Accept: application/scim+json
Authorization: Bearer h480djs93hd8

The response to the previous query returns metadata regarding paging that is similar to the following example:

{
  "totalResults":100,
  "itemsPerPage":20,
  "startIndex":1,
  "schemas":["urn:ietf:params:scim:api:messages:2.0:ListResponse"],
  "Resources":[{
     ...
  }]
}

GET /Users/{id}

Retrieves a single user resource. The value of the {id} should be the user’s ID. You can also use the userName attribute to filter the results.

GET {Host IP}/enzi/v0/scim/v2/Users/{user ID}
Host: example.com
Accept: application/scim+json
Authorization: Bearer h480djs93hd8

POST /Users

Creates a user. Must include the userName attribute and at least one email address.

POST {Host IP}/enzi/v0/scim/v2/Users
Host: example.com
Accept: application/scim+json
Authorization: Bearer h480djs93hd8
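
An illustrative request body, assuming standard SCIM 2.0 core user attributes (the user name and email values are placeholders):

{
  "schemas": ["urn:ietf:params:scim:schemas:core:2.0:User"],
  "userName": "jane.doe",
  "name": {"formatted": "Jane Doe"},
  "emails": [{"value": "jane.doe@example.com", "primary": true}]
}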

PATCH /Users/{id}

Updates a user’s active status. Inactive users can be reactivated by specifying "active": true. Active users can be deactivated by specifying "active": false. The value of the {id} should be the user’s ID.

PATCH {Host IP}/enzi/v0/scim/v2/Users/{user ID}
Host: example.com
Accept: application/scim+json
Authorization: Bearer h480djs93hd8
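
An illustrative request body for deactivating a user, assuming the standard SCIM 2.0 PatchOp syntax:

{
  "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"],
  "Operations": [
    {"op": "replace", "value": {"active": false}}
  ]
}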

PUT /Users/{id}

Updates existing user information. All attribute values are overwritten, including attributes for which empty values or no values were provided. If a previously set attribute value is left blank during a PUT operation, the value is updated with a blank value in accordance with the attribute data type and storage provider. The value of the {id} should be the user’s ID.

Group operations

For group GET and POST operations:

  • Pagination is fully supported.

  • Sorting is not supported.

GET /Groups/{id}

Retrieves information for a single group.

GET {Host IP}/enzi/v0/scim/v2/Groups/{Group ID}
Host: example.com
Accept: application/scim+json
Authorization: Bearer h480djs93hd8

GET /Groups

Returns a paginated list of groups, ten groups per page by default. Use the startIndex and count query parameters to paginate long lists of groups.

GET {Host IP}/enzi/v0/scim/v2/Groups?startIndex=4&count=500
Host: example.com
Accept: application/scim+json
Authorization: Bearer h480djs93hd8

POST /Groups

Creates a new group. Users can be added to the group during group creation by supplying user ID values in the members array.

PATCH /Groups/{id}

Updates an existing group resource, allowing individual (or groups of) users to be added or removed from the group with a single operation. Add is the default operation.

Setting the operation attribute of a member object to delete removes members from a group.

PUT /Groups/{id}

Updates an existing group resource, overwriting all values for a group even if an attribute is empty or not provided. PUT replaces all members of a group with members provided via the members attribute. If a previously set attribute is left blank during a PUT operation, the new value is set to blank in accordance with the data type of the attribute and the storage provider.

Service provider configuration operations

SCIM defines three endpoints to facilitate discovery of SCIM service provider features and schema that can be retrieved using HTTP GET:

GET /ResourceTypes

Discovers the resource types available on a SCIM service provider, for example, Users and Groups. Each resource type defines the endpoints, the core schema URI that defines the resource, and any supported schema extensions.

GET /Schemas

Retrieves information about all supported resource schemas supported by a SCIM service provider.

GET /ServiceProviderConfig

Returns a JSON structure that describes the SCIM specification features available on a service provider using a schemas attribute of urn:ietf:params:scim:schemas:core:2.0:ServiceProviderConfig.

Enable Helm with MKE

To use Helm with MKE, you must define the necessary roles in the kube-system default service account.

Note

For comprehensive information on the use of Helm, refer to the Helm user documentation.

To enable Helm with MKE, enter the following kubectl commands in sequence:

kubectl create rolebinding default-view --clusterrole=view
--serviceaccount=kube-system:default --namespace=kube-system

kubectl create clusterrolebinding add-on-cluster-admin
--clusterrole=cluster-admin --serviceaccount=kube-system:default
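
With the bindings in place, Helm commands issued from a client bundle should work as usual. For example (the chart repository and release name below are illustrative only):

helm repo add bitnami https://charts.bitnami.com/bitnami
helm install my-release bitnami/nginx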

Integrate with an LDAP directory

MKE integrates with LDAP directory services, so that you can manage users and groups from your organization's directory and have that information automatically propagated to MKE and MSR.

If you enable LDAP, MKE uses a remote directory server to create users automatically, and all logins are forwarded to the directory server.

When you switch from built-in authentication to LDAP authentication, all manually created users whose usernames don’t match any LDAP search results are still available.

When you enable LDAP authentication, you can choose whether MKE creates user accounts only when users log in for the first time. Select the Just-In-Time User Provisioning option to ensure that the only LDAP accounts that exist in MKE are those that have had a user log in to MKE.

Note

If you enable SAML integration, you cannot enable LDAP for authentication. This does not affect local MKE user account authentication.

How MKE integrates with LDAP

You control how MKE integrates with LDAP by creating searches for users. You can specify multiple search configurations, and you can specify multiple LDAP servers to integrate with. Searches start with the Base DN, which is the distinguished name of the node in the LDAP directory tree where the search starts looking for users.

Access LDAP settings by navigating to the Authentication & Authorization page in the MKE web interface. There are two sections for controlling LDAP searches and servers.

  • LDAP user search configurations: This is the section of the Authentication & Authorization page where you specify search parameters, like Base DN, scope, filter, the username attribute, and the full name attribute. These searches are stored in a list, and the ordering may be important, depending on your search configuration.

  • LDAP server: This is the section where you specify the URL of an LDAP server, TLS configuration, and credentials for doing the search requests. Also, you provide a domain for all servers but the first one. The first server is considered the default domain server. Any others are associated with the domain that you specify in the page.

Here’s what happens when MKE synchronizes with LDAP:

  1. MKE creates a set of search results by iterating over each of the user search configs, in the order that you specify.

  2. MKE chooses an LDAP server from the list of domain servers by considering the Base DN from the user search config and selecting the domain server that has the longest domain suffix match.

  3. If no domain server has a domain suffix that matches the Base DN from the search config, MKE uses the default domain server.

  4. MKE combines the search results into a list of users and creates MKE accounts for them. If the Just-In-Time User Provisioning option is set, user accounts are created only when users first log in.

The domain server to use is determined by the Base DN in each search config. MKE doesn’t perform search requests against each of the domain servers, only the one which has the longest matching domain suffix, or the default if there’s no match.

Here’s an example. Let’s say we have three LDAP domain servers:

Domain

Server URL

default

ldaps://ldap.example.com

dc=subsidiary1,dc=com

ldaps://ldap.subsidiary1.com

dc=subsidiary2,dc=subsidiary1,dc=com

ldaps://ldap.subsidiary2.com

Here are three user search configs with the following Base DNs:

  • baseDN=ou=people,dc=subsidiary1,dc=com

    For this search config, dc=subsidiary1,dc=com is the only server with a domain which is a suffix, so MKE uses the server ldaps://ldap.subsidiary1.com for the search request.

  • baseDN=ou=product,dc=subsidiary2,dc=subsidiary1,dc=com

    For this search config, two of the domain servers have a domain which is a suffix of this base DN, but dc=subsidiary2,dc=subsidiary1,dc=com is the longer of the two, so MKE uses the server ldaps://ldap.subsidiary2.com for the search request.

  • baseDN=ou=eng,dc=example,dc=com

    For this search config, there is no server with a domain specified which is a suffix of this base DN, so MKE uses the default server, ldaps://ldap.example.com, for the search request.

If there are username collisions for the search results between domains, MKE uses only the first search result, so the ordering of the user search configs may be important. For example, if both the first and third user search configs result in a record with the username jane.doe, the first has higher precedence and the second is ignored. For this reason, it’s important to choose a username attribute that’s unique for your users across all domains.

Because names may collide, it’s a good idea to use something unique to the subsidiary, like the email address for each person. Users can log in with the email address, for example, jane.doe@subsidiary1.com.

Configure the LDAP integration

To configure MKE to create and authenticate users by using an LDAP directory, go to the MKE web interface, navigate to the Admin Settings page, and click Authentication & Authorization to select the method used to create and authenticate users.

In the LDAP Enabled section, click Yes. Now configure your LDAP directory integration.

Default role for all private collections

Use this setting to change the default permissions of new users.

Click the drop-down menu to select the permission level that MKE assigns by default to the private collections of new users. For example, if you change the value to View Only, all users who log in for the first time after the setting is changed have View Only access to their private collections, but permissions remain unchanged for all existing users.

LDAP enabled

Click Yes to enable integrating MKE users and teams with LDAP servers.

LDAP server

Field

Description

LDAP server URL

The URL where the LDAP server can be reached.

Reader DN

The distinguished name of the LDAP account used for searching entries in the LDAP server. As a best practice, this should be an LDAP read-only user.

Reader password

The password of the account used for searching entries in the LDAP server.

Use Start TLS

Whether to authenticate/encrypt the connection after connecting to the LDAP server over TCP. If you set the LDAP Server URL field with ldaps://, this field is ignored.

Skip TLS verification

Whether to verify the LDAP server certificate when using TLS. The connection is still encrypted but vulnerable to man-in-the-middle attacks.

No simple pagination

If your LDAP server doesn’t support pagination.

Just-In-Time User Provisioning

Whether to create user accounts only when users log in for the first time. The default value of true is recommended. If you upgraded from UCP 2.0.x, the default is false.

Note

LDAP connections using certificates created with TLS v1.2 do not currently advertise support for sha512WithRSAEncryption in the TLS handshake which leads to issues establishing connections with some clients. Support for advertising sha512WithRSAEncryption will be added in MKE 3.1.0.

Click Confirm to add your LDAP domain.

To integrate with more LDAP servers, click Add LDAP Domain.

LDAP user search configurations

Field

Description

Base DN

The distinguished name of the node in the directory tree where the search should start looking for users.

Username attribute

The LDAP attribute to use as username on MKE. Only user entries with a valid username will be created. A valid username is no longer than 100 characters and does not contain any unprintable characters, whitespace characters, or any of the following characters: / \ [ ] : ; | = , + * ? < > ' ".

Full name attribute

The LDAP attribute to use as the user’s full name for display purposes. If left empty, MKE will not create new users with a full name value.

Filter

The LDAP search filter used to find users. If you leave this field empty, all directory entries in the search scope with valid username attributes are created as users.

Search subtree instead of just one level

Whether to perform the LDAP search on a single level of the LDAP tree, or search through the full LDAP tree starting at the Base DN.

Match Group Members

Whether to further filter users by selecting those who are also members of a specific group on the directory server. This feature is helpful if the LDAP server does not support memberOf search filters.

Iterate through group members

If Select Group Members is selected, this option searches for users by first iterating over the target group’s membership, making a separate LDAP query for each member, as opposed to first querying for all users which match the above search query and intersecting those with the set of group members. This option can be more efficient in situations where the number of members of the target group is significantly smaller than the number of users which would match the above search filter, or if your directory server does not support simple pagination of search results.

Group DN

If Select Group Members is selected, this specifies the distinguished name of the group from which to select users.

Group Member Attribute

If Select Group Members is selected, the value of this group attribute corresponds to the distinguished names of the members of the group.

To configure more user search queries, click Add LDAP User Search Configuration again. This is useful in cases where users may be found in multiple distinct subtrees of your organization’s directory. Any user entry which matches at least one of the search configurations will be synced as a user.

LDAP test login

Field

Description

Username

An LDAP username for testing authentication to this application. This value corresponds with the Username Attribute specified in the LDAP user search configurations section.

Password

The user’s password used to authenticate (BIND) to the directory server.

Before you save the configuration changes, you should test that the integration is correctly configured. You can do this by providing the credentials of an LDAP user, and clicking the Test button.

LDAP sync configuration

Field

Description

Sync interval

The interval, in hours, to synchronize users between MKE and the LDAP server. When the synchronization job runs, new users found in the LDAP server are created in MKE with the default permission level. MKE users that don’t exist in the LDAP server become inactive.

Enable sync of admin users

This option specifies that system admins should be synced directly with members of a group in your organization’s LDAP directory. The admins will be synced to match the membership of the group. The configured recovery admin user will also remain a system admin.

Once you’ve configured the LDAP integration, MKE synchronizes users based on the interval you’ve defined starting at the top of the hour. When the synchronization runs, MKE stores logs that can help you troubleshoot when something goes wrong.

You can also manually synchronize users by clicking Sync Now.

Revoke user access

When a user is removed from LDAP, the effect on the user’s MKE account depends on the Just-In-Time User Provisioning setting:

  • Just-In-Time User Provisioning is false: Users deleted from LDAP become inactive in MKE after the next LDAP synchronization runs.

  • Just-In-Time User Provisioning is true: Users deleted from LDAP can’t authenticate, but their MKE accounts remain active. This means that they can use their client bundles to run commands. To prevent this, deactivate their MKE user accounts.

Data synced from your organization’s LDAP directory

MKE saves a minimum amount of user data required to operate. This includes the value of the username and full name attributes that you have specified in the configuration as well as the distinguished name of each synced user. MKE does not store any additional data from the directory server.

Sync teams

MKE enables syncing teams with a search query or group in your organization’s LDAP directory.

LDAP Configuration via API

As of MKE 3.1.5, LDAP-specific GET and PUT API endpoints have been added to the Config resource. Note that swarm mode must be enabled before you can use the following endpoints:

  • GET /api/ucp/config/auth/ldap - Returns information on your current system LDAP configuration.

  • PUT /api/ucp/config/auth/ldap - Lets you update your LDAP configuration.
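
The following sketch shows how these endpoints might be called with curl, following the token-based pattern used elsewhere in this guide. The host, credentials, and the ldap.json payload file are placeholders that you supply:

# Obtain an admin session token
AUTHTOKEN=$(curl --silent --insecure \
  --data '{"username":"<mke-username>","password":"<mke-password>"}' \
  https://<mke-host>/auth/login | jq --raw-output .auth_token)

# Read the current LDAP configuration
curl --silent --insecure -H "Authorization: Bearer $AUTHTOKEN" \
  https://<mke-host>/api/ucp/config/auth/ldap

# Update the LDAP configuration from a JSON file that you have prepared
curl --silent --insecure -X PUT -H "Authorization: Bearer $AUTHTOKEN" \
  -H "Content-Type: application/json" --data @ldap.json \
  https://<mke-host>/api/ucp/config/auth/ldap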

Restrict services to worker nodes

You can configure MKE to allow users to deploy and run services only in worker nodes. This ensures all cluster management functionality stays performant, and makes the cluster more secure.

Important

In the event that a user deploys a malicious service capable of affecting the node on which it is running, that service will not be able to affect any other nodes in the cluster or have any impact on cluster management functionality.

Swarm Workloads

To restrict users from deploying to manager nodes, log in with administrator credentials to the MKE web interface, navigate to the Admin Settings page, and choose Scheduler.

You can then choose if user services should be allowed to run on manager nodes or not.

Note

Creating a grant with the Scheduler role against the / collection takes precedence over any other grants with Node Schedule on subcollections.

Kubernetes Workloads

By default, MKE clusters take advantage of Taints and Tolerations to prevent user workloads from being deployed onto MKE manager or MSR nodes.

You can view this taint by running:

$ kubectl get nodes <mkemanager> -o json | jq -r '.spec.taints | .[]'
{
  "effect": "NoSchedule",
  "key": "com.docker.ucp.manager"
}

Note

Workloads deployed by an Administrator in the kube-system namespace do not follow these scheduling constraints. If an Administrator deploys a workload in the kube-system namespace, a toleration is applied to bypass this taint, and the workload is scheduled on all node types.

Allow Administrators to Schedule on Manager / MSR Nodes

To allow Administrators to deploy workloads across all node types, an Administrator can tick the “Allow administrators to deploy containers on MKE managers or nodes running MSR” box in the MKE web interface.

For all new workloads deployed by Administrators after this box has been ticked, MKE will apply a toleration to your workloads to allow the pods to be scheduled on all node types.

For existing workloads, the Administrator will need to edit the Pod specification, through kubectl edit <object> <workload> or the MKE web interface and add the following toleration:

tolerations:
- key: "com.docker.ucp.manager"
  operator: "Exists"

You can check that the toleration has been applied successfully by running:

$ kubectl get <object> <workload> -o json | jq -r '.spec.template.spec.tolerations | .[]'
{
  "key": "com.docker.ucp.manager",
  "operator": "Exists"
}
Allow Users and Service Accounts to Schedule on Manager / MSR Nodes

To allow Kubernetes Users and Service Accounts to deploy workloads across all node types in your cluster, an Administrator will need to tick “Allow all authenticated users, including service accounts, to schedule on all nodes, including MKE managers and MSR nodes.” in the MKE web interface.

For all new workloads deployed by Kubernetes Users after this box has been ticked, MKE will apply a toleration to your workloads to allow the pods to be scheduled on all node types. For existing workloads, the User would need to edit Pod Specification as detailed above in the “Allow Administrators to Schedule on Manager / MSR Nodes” section.

There is a NoSchedule taint on MKE manager and MSR nodes. If scheduling on these nodes is disabled in the MKE scheduler options, no toleration for the taint is applied to deployments, and they will not be scheduled on those nodes. The exception is workloads deployed in the kube-system namespace.

See also

Kubernetes

Run only the images you trust

MKE can require applications to use only Docker images signed by MKE users you trust. Each time a user attempts to deploy an application to the cluster, MKE checks whether the application uses a trusted Docker image and halts the deployment if it does not.

By signing and verifying the Docker images, you ensure that the images being used in your cluster are the ones you trust and haven’t been altered either in the image registry or on their way from the image registry to your MKE cluster.
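
As an illustration, a signer with push access to the MSR repository can sign an image at push time using Docker Content Trust. This is a minimal sketch; it assumes the signer's Docker client has been initialized for content trust against your MSR Notary service, and the repository path is a placeholder:

# Sign the image while pushing it (Docker Content Trust enabled for this session)
export DOCKER_CONTENT_TRUST=1
docker push <msr-url>:<port>/admin/wordpress:latest

# Alternatively, sign a specific tag explicitly
docker trust sign <msr-url>:<port>/admin/wordpress:latest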

Example workflow
  1. A developer makes changes to a service and pushes their changes to a version control system.

  2. A CI system creates a build, runs tests, and pushes an image to MSR with the new changes.

  3. The quality engineering team pulls the image and runs more tests. If everything looks good they sign and push the image.

  4. The IT operations team deploys a service. If the image used for the service was signed by the QA team, MKE deploys it. Otherwise MKE refuses to deploy.

Configure MKE

To configure MKE to only allow running services that use Docker trusted images:

  1. Access the MKE UI and browse to the Admin Settings page.

  2. In the left navigation pane, click Docker Content Trust.

  3. Select the Run only signed images option.

    With this setting, MKE allows deploying any image as long as the image has been signed. It doesn’t matter who signed the image.

    To enforce that the image needs to be signed by specific teams, click Add Team and select those teams from the list.

    If you specify multiple teams, the image needs to be signed by a member of each team, or someone that is a member of all those teams.

  4. Click Save.

    At this point, MKE starts enforcing the policy. Existing services continue running and can be restarted if needed; however, MKE only allows the deployment of new services that use a trusted image.

Set user session properties

MKE enables the setting of various user session properties, such as session timeout and the permitted number of concurrent sessions.

To configure MKE login session properties:

  1. Log in to the MKE web UI.

  2. In the left-side navigation menu, click the user name drop-down to display the available options.

  3. Click Admin Settings > Authentication & Authorization to reveal the MKE login session controls.

The following table offers information on the MKE login session controls:

Field

Description

Lifetime Minutes

The set duration of a login session in minutes, starting from the moment MKE generates the session. MKE invalidates the active session once this period expires and the user must re-authenticate to establish a new session.

  • Default: 60

  • Minimum: 10

Renewal Threshold Minutes

The period in minutes prior to session expiration during which, if the session is active, MKE extends the session by the amount specified in Lifetime Minutes. The threshold value cannot be greater than the Lifetime Minutes value.

To specify that sessions not be extended, set the threshold value to 0. Be aware, though, that this may cause MKE web UI users to be unexpectedly logged out.

  • Default: 20

  • Maximum: 5 minutes less than Lifetime Minutes

Per User Limit

The maximum number of sessions that a user can have active simultaneously. If creating a new session would exceed this limit, MKE deletes the least recently used session. Specifically, every time you use a session token, the server marks it with the current time (lastUsed metadata). When a new session exceeds the per-user limit, MKE deletes the session with the oldest lastUsed time, which is not necessarily the oldest session.

To disable the Per User Limit setting, set the value to 0.

  • Default: 10

  • Minimum: 1 / Maximum: No limit
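
These controls correspond to keys in the MKE configuration file, so you can also review them through the configuration API. A minimal sketch, assuming an admin session token in $AUTHTOKEN, the MKE host in $MKE_HOST, and that your MKE version stores the settings under the [auth.sessions] section:

curl --silent --insecure -X GET "https://$MKE_HOST/api/ucp/config-toml" \
  -H "Authorization: Bearer $AUTHTOKEN" | grep -A 4 '\[auth.sessions\]'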

KMS plugin support for MKE

Mirantis Kubernetes Engine (MKE) 3.2.5 adds support for a Key Management Service (KMS) plugin to allow access to third-party secrets management solutions, such as Vault. This plugin is used by MKE for access from Kubernetes clusters.

Deployment

KMS must be deployed before a machine becomes an MKE manager; otherwise, the manager may be considered unhealthy. MKE will not health check, clean up, or otherwise manage the KMS plugin.

Configuration

KMS plugin configuration should be done through MKE. MKE will maintain ownership of the Kubernetes EncryptionConfig file, where the KMS plugin is configured for Kubernetes. MKE does not currently check this file’s contents after deployment.

MKE adds new configuration options to the cluster configuration table. These options are not exposed through the web UI, but can be configured via the API.

The following table shows the configuration options for the KMS plugin. These options are not required.

Parameter (type)

Description

kms_enabled (bool)

Determines if MKE should configure a KMS plugin.

kms_name (string)

Name of the KMS plugin resource (for example, “vault”).

kms_endpoint (string)

Path of the KMS plugin socket. This path must refer to a UNIX socket on the host (for example, “/tmp/socketfile.sock”). MKE will bind mount this file to make it accessible to the API server.

kms_cachesize (int)

Number of data encryption keys (DEKs) to be cached in the clear.
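
Because these options are not exposed in the web UI, one way to set them is through the /api/ucp/config-toml endpoint, using the same retrieve-edit-upload pattern shown elsewhere in this guide. A sketch, assuming an admin session token in $AUTHTOKEN, the MKE host in $MKE_HOST, and that the KMS keys belong to the cluster_config section of the configuration file:

# Download the current configuration
curl --silent --insecure -X GET "https://$MKE_HOST/api/ucp/config-toml" \
  -H "Authorization: Bearer $AUTHTOKEN" > ucp-config.toml

# Edit ucp-config.toml to add the KMS settings under [cluster_config], for example:
#   kms_enabled = true
#   kms_name = "vault"
#   kms_endpoint = "/tmp/socketfile.sock"
#   kms_cachesize = 1000

# Upload the modified configuration
curl --silent --insecure -X PUT -H "Authorization: Bearer $AUTHTOKEN" \
  --upload-file './ucp-config.toml' "https://$MKE_HOST/api/ucp/config-toml"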

See also

Kubernetes

Use a local node network in a swarm

Mirantis Kubernetes Engine (MKE) can use your local networking drivers to orchestrate your cluster. You can create a config network with a driver such as MAC VLAN and use it like any other named network in MKE. If the network is set up as attachable, you can attach containers to it.

Warning

Encrypting communication between containers on different nodes works only on overlay networks.

Use MKE to create node-specific networks

Always use MKE to create node-specific networks. You can use the MKE web UI or the CLI (with an admin bundle). If you create the networks without MKE, the networks won’t have the right access labels and won’t be available in MKE.

Create a MAC VLAN network
  1. Log in as an administrator.

  2. Navigate to Networks and click Create Network.

  3. Name the network “macvlan”.

  4. In the Driver dropdown, select Macvlan.

  5. In the Macvlan Configure section, select either the Config Only or the Config From option. Create all of the config-only networks before you create the config-from network.

    • Config Only: Prefix the config-only network name with a node hostname prefix, like node1/my-cfg-network, node2/my-cfg-network, etc. This is necessary to ensure that the access labels are applied consistently to all of the back-end config-only networks. MKE routes the config-only network creation to the appropriate node based on the node hostname prefix. All config-only networks with the same name must belong in the same collection, or MKE returns an error. Leaving the access label empty puts the network in the admin’s default collection, which is / in a new MKE installation.

    • Config From: Create the network from a Docker config. Don’t set up an access label for the config-from network. The labels of the network and its collection placement are inherited from the related config-only networks.

  6. Click Create.
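
For reference, the Mirantis Container Runtime primitives behind these options look roughly like the following when run against a single node. The subnet, gateway, and parent interface are placeholders, and when you create the networks through MKE you should apply the hostname-prefix naming convention described above so that the access labels are set correctly:

# Node-local configuration network that only holds the MAC VLAN settings
docker network create --config-only \
  --subnet 192.168.2.0/24 --gateway 192.168.2.1 \
  -o parent=eth0 my-cfg-network

# Swarm-scoped, attachable MAC VLAN network that inherits its settings
# from the config-only network
docker network create -d macvlan --scope swarm --attachable \
  --config-from my-cfg-network macvlan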

Scale your cluster

MKE is designed for scaling horizontally as your applications grow in size and usage. You can add or remove nodes from the MKE cluster to make it scale to your needs.

Since MKE leverages the clustering functionality provided by Docker Engine, you use the docker swarm join command to add more nodes to your cluster. When joining new nodes, the MKE services automatically start running in that node.

When joining a node to a cluster you can specify its role: manager or worker.

  • Manager nodes

    Manager nodes are responsible for cluster management functionality and dispatching tasks to worker nodes. Having multiple manager nodes allows your cluster to be highly-available and tolerate node failures.

    Manager nodes also run all MKE components in a replicated way, so by adding additional manager nodes, you’re also making MKE highly available.

  • Worker nodes

    Worker nodes receive and execute your services and applications. Having multiple worker nodes allows you to scale the computing capacity of your cluster.

    When deploying Mirantis Secure Registry in your cluster, you deploy it to a worker node.

Join nodes to the cluster

To join nodes to the cluster, go to the MKE web UI and navigate to the Nodes page.

Click Add Node to add a new node.

  • Click Manager if you want to add the node as a manager.

  • Check the Use a custom listen address option to specify the IP address of the host that you will be joining to the cluster.

  • Check the Use a custom advertise address option to specify the IP address that is advertised to all members of the cluster for API access.

Copy the displayed command, use ssh to log into the host that you want to join to the cluster, and run the docker swarm join command on the host.

To add a Windows node, click Windows and follow the instructions in Join Windows worker nodes to a cluster.

After you run the join command in the node, you can view the node in the MKE web UI.

Remove nodes from the cluster
  1. If the target node is a manager, you will need to first demote the node into a worker before proceeding with the removal:

    • From the MKE web UI, navigate to the Nodes page. Select the node you wish to remove and switch its role to Worker, wait until the operation completes, and confirm that the node is no longer a manager.

    • From the CLI, perform docker node ls and identify the nodeID or hostname of the target node. Then, run docker node demote <nodeID or hostname>.

  2. If the status of the worker node is Ready, you’ll need to manually force the node to leave the cluster. To do this, connect to the target node through SSH and run docker swarm leave --force directly against the local instance of MCR.

    Loss of quorum

    Do not perform this step if the node is still a manager, as this may cause loss of quorum.

  3. Now that the status of the node is reported as Down, you may remove the node:

    • From the MKE web UI, browse to the Nodes page and select the node. In the details pane, click Actions and select Remove. Click Confirm when you’re prompted.

    • From the CLI, perform docker node rm <nodeID or hostname>.

Pause and drain nodes

Once a node is part of the cluster, you can change its role, turning a manager node into a worker and vice versa. You can also configure the node availability so that it is:

  • Active: the node can receive and execute tasks.

  • Paused: the node continues running existing tasks, but doesn’t receive new ones.

  • Drained: the node won’t receive new tasks. Existing tasks are stopped and replica tasks are launched in active nodes.

In the MKE web UI, browse to the Nodes page and select the node. In the details pane, click Configure to open the Edit Node page.

If you’re load-balancing user requests to MKE across multiple manager nodes, when demoting those nodes into workers, don’t forget to remove them from your load-balancing pool.

Use the CLI to scale your cluster

You can also use the command line to do all of the above operations. To get the join token, run the following command on a manager node:

docker swarm join-token worker

If you want to add a new manager node instead of a worker node, use docker swarm join-token manager instead. If you want to use a custom listen address, add the --listen-addr arg:

docker swarm join \
    --token SWMTKN-1-2o5ra9t7022neymg4u15f3jjfh0qh3yof817nunoioxa9i7lsp-dkmt01ebwp2m0wce1u31h6lmj \
    --listen-addr 234.234.234.234 \
    192.168.99.100:2377
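
Similarly, to control the address that the node advertises to other members of the cluster for API access, add the --advertise-addr argument. The token and addresses shown here are placeholders:

docker swarm join \
    --token SWMTKN-1-<token> \
    --advertise-addr 234.234.234.234 \
    192.168.99.100:2377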

Once your node is added, you can see it by running docker node ls on a manager:

docker node ls

To change the node’s availability, use:

docker node update --availability drain node2

You can set the availability to active, pause, or drain.

To remove the node, use:

docker node rm <node-hostname>

Use your own TLS certificates

All MKE services are exposed using HTTPS, to ensure all communications between clients and MKE are encrypted. By default, this is done using self-signed TLS certificates that are not trusted by client tools like web browsers. So when you try to access MKE, your browser warns that it doesn’t trust MKE or that MKE has an invalid certificate.

The same happens with other client tools.

$ curl https://mke.example.org

SSL certificate problem: Invalid certificate chain

You can configure MKE to use your own TLS certificates, so that it is automatically trusted by your browser and client tools.

To ensure minimal impact to your business, you should plan for this change to happen outside business peak hours. Your applications will continue running normally, but existing MKE client certificates will become invalid, so users will have to download new ones to access MKE from the CLI.

Configure MKE to use your own TLS certificates and keys

To configure MKE to use your own TLS certificates and keys:

  1. Log into the MKE web UI with administrator credentials and navigate to the Admin Settings page.

  2. Click Certificates.

  3. Upload your certificates and keys based on the following table:

    Type

    Description

    Private key

    The unencrypted private key of MKE. This key must correspond to the public key used in the server certificate. Click Upload Key.

    Server certificate

    The public key certificate of MKE followed by the certificates of any intermediate certificate authorities, which together establish a chain of trust up to the root CA certificate. Click Upload Certificate to upload a PEM file.

    CA certificate

    The public key certificate of the root certificate authority that issued the MKE server certificate. If you don’t have one, use the top-most intermediate certificate instead. Click Upload CA Certificate to upload a PEM file.

    Client CA

    This field is available in MKE 3.2. This field may contain one or more Root CA certificates which the MKE Controller will use to verify that client certificates are issued by a trusted entity. MKE is automatically configured to trust its internal CAs which issue client certificates as part of generated client bundles, however, you may supply MKE with additional custom root CA certificates here so that MKE may trust client certificates issued by your corporate or trusted third-party certificate authorities. Note that your custom root certificates will be appended to MKE’s internal root CA certificates. Click Upload CA Certificate to upload a PEM file. Click Download MKE Server CA Certificate to download the certificate as a PEM file.

  4. Click Save.

After replacing the TLS certificates, your users will not be able to authenticate with their old client certificate bundles. Ask your users to access the MKE web UI and download new client certificate bundles.
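
You can confirm from a client machine that MKE is now serving the new certificate chain. A quick check, assuming your MKE domain is mke.example.org:

# Inspect the certificate presented by MKE
openssl s_client -connect mke.example.org:443 -showcerts </dev/null 2>/dev/null | \
  openssl x509 -noout -subject -issuer -dates

# Once the issuing CA is trusted by the client, curl should no longer warn
curl https://mke.example.org/_ping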

If you deployed Mirantis Secure Registry (MSR), you’ll also need to reconfigure it to trust the new MKE TLS certificates.

Manage and deploy private images

Mirantis offers its own image registry, Mirantis Secure Registry (MSR), which you can use to store and manage the images that you deploy to your cluster. This topic illustrates how to push an image to MSR and then deploy that image to your cluster using the Kubernetes orchestrator.

Open the MSR web UI
  1. In the MKE web UI, click Admin Settings.

  2. In the left pane, click Mirantis Secure Registry.

  3. In the Installed MSRs section, note the URL of your cluster’s MSR instance.

  4. In a new browser tab, enter the URL to open the MSR web UI.

Create an image repository
  1. In the MSR web UI, click Repositories.

  2. Click New Repository, and in the Repository Name field, enter “wordpress”.

  3. Click Save to create the repository.

Push an image to MSR

Instead of building an image from scratch, we’ll pull the official WordPress image from Docker Hub, tag it, and push it to MSR. Once that WordPress version is in MSR, only authorized users can change it.

CLI access to a licensed installation is required to push images to MSR.

  1. Pull the public WordPress image from Docker Hub:

    docker pull wordpress
    
  2. Tag the image, using the IP address or DNS name of your MSR instance:

    docker tag wordpress:latest <msr-url>:<port>/admin/wordpress:latest
    
  3. Log in to an MKE manager node.

  4. Push the tagged image to MSR:

    docker image push <msr-url>:<port>/admin/wordpress:latest
    
Confirm the image push

In the MSR web UI, confirm that the wordpress:latest image is stored in your MSR instance.

  1. In the MSR web UI, click Repositories.

  2. Click wordpress to open the repo.

  3. Click Images to view the stored images.

  4. Confirm that the latest tag is present.

You’re ready to deploy the wordpress:latest image into production.

Deploy the private image to MKE

With the WordPress image stored in MSR, you can deploy the image to a Kubernetes cluster with a simple Deployment object:

apiVersion: apps/v1beta2
kind: Deployment
metadata:
  name: wordpress-deployment
spec:
  selector:
    matchLabels:
      app: wordpress
  replicas: 2
  template:
    metadata:
      labels:
        app: wordpress
    spec:
      containers:
      - name: wordpress
        image: <msr-url>:<port>/admin/wordpress:latest
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: wordpress-service
  labels:
    app: wordpress
spec:
  type: NodePort
  ports:
    - port: 80
      nodePort: 30081
  selector:
    app: wordpress

The Deployment object’s YAML specifies your MSR image in the pod template spec: image: <msr-url>:<port>/admin/wordpress:latest. Also, the YAML file defines a NodePort service that exposes the WordPress application, so it’s accessible from outside the cluster.

  1. Open the MKE web UI, and in the left pane, click Kubernetes.

  2. Click Create to open the Create Kubernetes Object page.

  3. In the Namespace dropdown, select default.

  4. In the Object YAML editor, paste the Deployment object’s YAML.

  5. Click Create. When the Kubernetes objects are created, the Load Balancers page opens.

  6. Click wordpress-service, and in the details pane, find the Ports section.

  7. Click the URL to open the default WordPress home page.
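
If you prefer to work from the command line, you can create the same objects with kubectl and your MKE client bundle. This sketch assumes the YAML above has been saved to a file named wordpress.yaml:

# Create the Deployment and the Service in the default namespace
kubectl apply -f wordpress.yaml --namespace default

# Confirm that the pods are running and inspect the NodePort service
kubectl get pods --namespace default --selector app=wordpress
kubectl get service wordpress-service --namespace default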

See also

Kubernetes

Set the orchestrator type for a node

When you add a node to the cluster, the node’s workloads are managed by a default orchestrator, either Docker Swarm or Kubernetes. When you install MKE, new nodes are managed by Docker Swarm, but you can change the default orchestrator to Kubernetes in the administrator settings.

Changing the default orchestrator doesn’t affect existing nodes in the cluster. You can change the orchestrator type for individual nodes in the cluster by navigating to the node’s configuration page in the MKE web UI.

Change the orchestrator for a node

You can change the current orchestrator for any node that is joined to an MKE cluster. The available orchestrator types are Kubernetes, Swarm, and Mixed.

The Mixed type enables workloads to be scheduled by Kubernetes and Swarm both on the same node. Although you can choose to mix orchestrator types on the same node, this isn’t recommended for production deployments because of the likelihood of resource contention.

To change a node’s orchestrator type from the Edit Node page:

  1. Log in to the MKE web UI with an administrator account.

  2. Navigate to the Nodes page, and click the node that you want to assign to a different orchestrator.

  3. In the details pane, click Configure and select Details to open the Edit Node page.

  4. In the Orchestrator Properties section, click the orchestrator type for the node.

  5. Click Save to assign the node to the selected orchestrator.

What happens when you change a node’s orchestrator

When you change the orchestrator type for a node, existing workloads are evicted, and they’re not migrated to the new orchestrator automatically. If you want the workloads to be scheduled by the new orchestrator, you must migrate them manually. For example, if you deploy WordPress on a Swarm node, and you change the node’s orchestrator type to Kubernetes, MKE doesn’t migrate the workload, and WordPress continues running on Swarm. In this case, you must migrate your WordPress deployment to Kubernetes manually.

The following table summarizes the results of changing a node’s orchestrator.

Workload

On orchestrator change

Containers

Container continues running in node

Docker service

Node is drained, and tasks are rescheduled to another node

Pods and other imperative resources

Continue running in node

Deployments and other declarative resources

Might change, but for now, continue running in node

If a node is running containers and you change the node to Kubernetes, those containers continue running and Kubernetes is not aware of them, so you are effectively in the same situation as if the node were running in Mixed mode.

Warning

Be careful when mixing orchestrators on a node.

When you change a node’s orchestrator, you can choose to run the node in a mixed mode, with both Kubernetes and Swarm workloads. The Mixed type is not intended for production use, and it may impact existing workloads on the node.

This is because the two orchestrator types have different views of the node’s resources, and they don’t know about each other’s workloads. One orchestrator can schedule a workload without knowing that the node’s resources are already committed to another workload that was scheduled by the other orchestrator. When this happens, the node could run out of memory or other resources.

For this reason, we recommend not mixing orchestrators on a production node.

Set the default orchestrator type for new nodes

You can set the default orchestrator for new nodes to Kubernetes or Swarm.

To set the orchestrator for new nodes:

  1. Log in to the MKE web UI with an administrator account.

  2. Open the Admin Settings page, and in the left pane, click Scheduler.

  3. Under Set Orchestrator Type for New Nodes, click Swarm or Kubernetes.

  4. Click Save.

From now on, when you join a node to the cluster, new workloads on the node are scheduled by the specified orchestrator type. Existing nodes in the cluster aren’t affected.

Once a node is joined to the cluster, you can change the orchestrator that schedules its workloads.

MSR in mixed mode

The default behavior for MSR nodes is mixed orchestration. If the orchestrator type of an MSR node is changed to Swarm only or Kubernetes only, reconciliation reverts the node back to mixed mode. This is the expected behavior.

Choosing the orchestrator type

The workloads on your cluster can be scheduled by Kubernetes or by Swarm, or the cluster can be mixed, running both orchestrator types. If you choose to run a mixed cluster, be aware that the different orchestrators aren’t aware of each other, and there’s no coordination between them.

We recommend that you make the decision about orchestration when you set up the cluster initially. Commit to Kubernetes or Swarm on all nodes, or assign each node individually to a specific orchestrator. Once you start deploying workloads, avoid changing the orchestrator setting. If you do change the orchestrator for a node, your workloads are evicted, and you must deploy them again through the new orchestrator.

Node demotion and orchestrator type

When you promote a worker node to be a manager, its orchestrator type automatically changes to Mixed. If you demote the same node to be a worker, its orchestrator type remains as Mixed.

Use the CLI to set the orchestrator type

Set the orchestrator on a node by assigning the orchestrator labels, com.docker.ucp.orchestrator.swarm or com.docker.ucp.orchestrator.kubernetes, to true.

To schedule Swarm workloads on a node:

docker node update --label-add com.docker.ucp.orchestrator.swarm=true <node-id>

To schedule Kubernetes workloads on a node:

docker node update --label-add com.docker.ucp.orchestrator.kubernetes=true <node-id>

To schedule Kubernetes and Swarm workloads on a node:

docker node update --label-add com.docker.ucp.orchestrator.swarm=true <node-id>
docker node update --label-add com.docker.ucp.orchestrator.kubernetes=true <node-id>

Warning

Mixed nodes

Scheduling both Kubernetes and Swarm workloads on a node is not recommended for production deployments, because of the likelihood of resource contention.

To change the orchestrator type for a node from Swarm to Kubernetes:

docker node update --label-add com.docker.ucp.orchestrator.kubernetes=true <node-id>
docker node update --label-rm com.docker.ucp.orchestrator.swarm <node-id>

MKE detects the node label change and updates the Kubernetes node accordingly.

Check the value of the orchestrator label by inspecting the node:

docker node inspect <node-id> | grep -i orchestrator

The docker node inspect command returns the node’s configuration, including the orchestrator:

"com.docker.ucp.orchestrator.kubernetes": "true"

Important

Orchestrator label

The com.docker.ucp.orchestrator label isn’t displayed in the Labels list for a node in the MKE web UI.

Set the default orchestrator type for new nodes

The default orchestrator for new nodes is a setting in the MKE configuration file:

default_node_orchestrator = "swarm"

The value can be swarm or kubernetes.
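
You can view or change this setting through the /api/ucp/config-toml endpoint, following the same pattern used elsewhere in this guide. This assumes an admin session token in $AUTHTOKEN and the MKE host in $MKE_HOST:

# Download the current configuration
curl --silent --insecure -X GET "https://$MKE_HOST/api/ucp/config-toml" \
  -H "Authorization: Bearer $AUTHTOKEN" > ucp-config.toml

# Switch the default orchestrator for new nodes to Kubernetes
sed -i 's/default_node_orchestrator = "swarm"/default_node_orchestrator = "kubernetes"/' ucp-config.toml

# Upload the modified configuration
curl --silent --insecure -X PUT -H "Authorization: Bearer $AUTHTOKEN" \
  --upload-file './ucp-config.toml' "https://$MKE_HOST/api/ucp/config-toml"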

See also

Kubernetes

View Kubernetes objects in a namespace

With MKE, administrators can filter the view of Kubernetes objects by the namespace the objects are assigned to. You can specify a single namespace, or you can specify all available namespaces.

Create two namespaces

In this example, you create two Kubernetes namespaces and deploy a service to both of them.

  1. Log in to the MKE web UI with an administrator account.

  2. In the left pane, click Kubernetes.

  3. Click Create to open the Create Kubernetes Object page.

  4. In the Object YAML editor, paste the following YAML.

    apiVersion: v1
    kind: Namespace
    metadata:
      name: blue
    ---
    apiVersion: v1
    kind: Namespace
    metadata:
      name: green
    
  5. Click Create to create the blue and green namespaces.

Deploy services

Create a NodePort service in the blue namespace.

  1. Navigate to the Create Kubernetes Object page.

  2. In the Namespace dropdown, select blue.

  3. In the Object YAML editor, paste the following YAML.

    apiVersion: v1
    kind: Service
    metadata:
      name: app-service-blue
      labels:
        app: app-blue
    spec:
      type: NodePort
      ports:
        - port: 80
          nodePort: 32768
      selector:
        app: app-blue
    
  4. Click Create to deploy the service in the blue namespace.

  5. Repeat the previous steps with the following YAML, but this time, select green from the Namespace dropdown.

    apiVersion: v1
    kind: Service
    metadata:
      name: app-service-green
      labels:
        app: app-green
    spec:
      type: NodePort
      ports:
        - port: 80
          nodePort: 32769
      selector:
        app: app-green
    
View services

Currently, the Namespaces view is set to the default namespace, so the Load Balancers page doesn’t show your services.

  1. In the left pane, click Namespaces to open the list of namespaces.

  2. In the upper-right corner, click the Set context for all namespaces toggle and click Confirm. The indicator in the left pane changes to All Namespaces.

  3. Click Load Balancers to view your services.

Filter the view by namespace

With the Set context for all namespaces toggle set, you see all of the Kubernetes objects in every namespace. Now filter the view to show only objects in one namespace.

  1. In the left pane, click Namespaces to open the list of namespaces.

  2. In the green namespace, click the More options icon and in the context menu, select Set Context.

  3. Click Confirm to set the context to the green namespace. The indicator in the left pane changes to green.

  4. Click Load Balancers to view your app-service-green service. The app-service-blue service doesn’t appear.

To view the app-service-blue service, repeat the previous steps, but this time, select Set Context on the blue namespace.
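
The same filtering is available from the CLI with your MKE client bundle, where the namespace is passed as a flag:

# List services in a single namespace
kubectl get services --namespace green

# List services across all namespaces
kubectl get services --all-namespaces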

See also

Kubernetes

Join Nodes

Set up high availability

MKE is designed for high availability (HA). You can join multiple manager nodes to the cluster, so that if one manager node fails, another can automatically take its place without impact to the cluster.

Having multiple manager nodes in your cluster allows you to:

  • Handle manager node failures,

  • Load-balance user requests across all manager nodes.

Size your deployment

To make the cluster tolerant to more failures, add additional manager nodes to your cluster.

Manager nodes    Failures tolerated
1                0
3                1
5                2

For production-grade deployments, follow these best practices:

  • For HA with minimal network overhead, the recommended number of manager nodes is 3. The recommended maximum number of manager nodes is 5. Adding too many manager nodes to the cluster can lead to performance degradation, because changes to configurations must be replicated across all manager nodes.

  • When a manager node fails, the number of failures tolerated by your cluster decreases. Don’t leave that node offline for too long.

  • You should distribute your manager nodes across different availability zones. This way your cluster can continue working even if an entire availability zone goes down.

Join Linux nodes to your cluster

MKE is designed for scaling horizontally as your applications grow in size and usage. You can add or remove nodes from the cluster to scale it to your needs. You can join Windows Server and Linux nodes to the cluster.

Because MKE leverages the clustering functionality provided by Mirantis Container Runtime, you use the docker swarm join command to add more nodes to your cluster. When you join a new node, MKE services start running on the node automatically.

Node roles

When you join a node to a cluster, you specify its role: manager or worker.

  • Manager: Manager nodes are responsible for cluster management functionality and dispatching tasks to worker nodes. Having multiple manager nodes allows your swarm to be highly available and tolerant of node failures.

    Manager nodes also run MKE in a replicated way, so by adding additional manager nodes, you’re also making the cluster highly available.

  • Worker: Worker nodes receive and execute your services and applications. Having multiple worker nodes allows you to scale the computing capacity of your cluster.

    When deploying Mirantis Secure Registry in your cluster, you deploy it to a worker node.

Join a node to the cluster

You can join Windows Server and Linux nodes to the cluster, but only Linux nodes can be managers.

To join nodes to the cluster, go to the MKE web interface and navigate to the Nodes page.

  1. Click Add Node to add a new node.

  2. Select the type of node to add, Windows or Linux.

  3. Click Manager if you want to add the node as a manager.

  4. Check the Use a custom listen address option to specify the address and port where the new node listens for inbound cluster management traffic.

  5. Check the Use a custom advertise address option to specify the IP address that is advertised to all members of the cluster for API access.

Copy the displayed command, use SSH to log in to the host that you want to join to the cluster, and run the docker swarm join command on the host.

To add a Windows node, click Windows and follow the instructions in Join Windows worker nodes to a cluster.

After you run the join command in the node, the node is displayed on the Nodes page in the MKE web interface. From there, you can change the node’s cluster configuration, including its assigned orchestrator type.

Pause or drain a node

Once a node is part of the cluster, you can configure the node’s availability so that it is:

  • Active: the node can receive and execute tasks.

  • Paused: the node continues running existing tasks, but doesn’t receive new tasks.

  • Drained: the node won’t receive new tasks. Existing tasks are stopped and replica tasks are launched in active nodes.

Pause or drain a node from the Edit Node page:

  1. In the MKE web interface, browse to the Nodes page and select the node.

  2. In the details pane, click Configure and select Details to open the Edit Node page.

  3. In the Availability section, click Active, Pause, or Drain.

  4. Click Save to change the availability of the node.

Promote or demote a node

You can promote worker nodes to managers to make MKE fault tolerant. You can also demote a manager node into a worker.

To promote or demote a manager node:

  1. Navigate to the Nodes page, and click the node that you want to demote.

  2. In the details pane, click Configure and select Details to open the Edit Node page.

  3. In the Role section, click Manager or Worker.

  4. Click Save and wait until the operation completes.

  5. Navigate to the Nodes page, and confirm that the node role has changed.

If you are load balancing user requests to MKE across multiple manager nodes, remember to remove these nodes from the load-balancing pool when demoting them to workers.

Remove a node from the cluster
Removing worker nodes

Worker nodes can be removed from a cluster at any time.

  1. Shut down the worker node or have it leave the swarm.

  2. Navigate to the Nodes page, and select the node.

  3. In the details pane, click Actions and select Remove.

  4. Click Confirm when prompted.

Removing manager nodes

Manager nodes are integral to the cluster’s overall health, and thus you must be careful when removing one from the cluster.

  1. Confirm that all nodes in the cluster are healthy (otherwise, do not remove manager nodes).

  2. Demote the manager nodes into workers.

  3. Remove the newly-demoted workers from the cluster.

Use the CLI to manage your nodes

You can use the Docker CLI client to manage your nodes from the CLI. To do this, configure your Docker CLI client with a MKE client bundle.

Once you do that, you can start managing your MKE nodes:

docker node ls
Use the API to manage your nodes

You can use the API to manage your nodes in the following ways:

  • Use the node update API to add the orchestrator label (that is, com.docker.ucp.orchestrator.kubernetes), as sketched after this list:

    /nodes/{id}/update
    
  • Use the /api/ucp/config-toml API to change the default orchestrator setting.
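
As a rough sketch of the first approach, the node update API follows the Docker Engine API convention of submitting the full node Spec together with the node's current version index. This example authenticates with the TLS files from an MKE client bundle and uses jq; the host, node ID, and file paths are placeholders:

MKE_HOST=<mke-host>
NODE_ID=<node-id>

# Fetch the node object and extract its current version index
curl --silent --cacert ca.pem --cert cert.pem --key key.pem \
  "https://$MKE_HOST/nodes/$NODE_ID" > node.json
VERSION=$(jq -r .Version.Index node.json)

# Add the orchestrator label to the node Spec and submit the update
jq '.Spec | .Labels["com.docker.ucp.orchestrator.kubernetes"] = "true"' node.json > spec.json
curl --silent --cacert ca.pem --cert cert.pem --key key.pem \
  -X POST -H "Content-Type: application/json" \
  --data @spec.json "https://$MKE_HOST/nodes/$NODE_ID/update?version=$VERSION"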

Join Windows worker nodes to your cluster

MKE 3.3 supports worker nodes that run on Windows Server 2019. Only worker nodes are supported on Windows, and all manager nodes in the cluster must run on Linux.

Configure the daemon for Windows nodes

To configure the docker daemon and the Windows environment:

  1. Pull the Windows-specific image of ucp-agent, which is named ucp-agent-win.

  2. Run the Windows worker setup script provided with ucp-agent-win.

  3. Join the cluster with the token provided by the MKE web interface or CLI.

Pull the Windows-specific images

On a manager node, run the following command to list the images that are required on Windows nodes.

docker container run --rm \
  -v /var/run/docker.sock:/var/run/docker.sock \
  mirantis/ucp:3.5.0 images --list --enable-windows
mirantis/ucp-agent-win:3.5.0
mirantis/ucp-dsinfo-win:3.5.0

On a Windows Server node, in a PowerShell terminal running as Administrator, log in to Docker Hub with the docker login command and pull the listed images.

docker image pull mirantis/ucp-agent-win:3.5.0
docker image pull mirantis/ucp-dsinfo-win:3.5.0

If the cluster is deployed in a site that is offline, sideload MKE images onto the Windows Server nodes. For more information, refer to MKE Deployment Guide: Install MKE offline.

Join the Windows node to the cluster

To join the cluster using the docker swarm join command provided by the MKE web interface and CLI:

  1. Log in to the MKE web interface with an administrator account.

  2. Navigate to the Nodes page.

  3. Click Add Node to add a new node.

  4. In the Node Type section, click Windows.

  5. In the Step 2 section, select the check box for “I have followed the instructions and I’m ready to join my Windows node.”

  6. Select the Use a custom listen address option to specify the address and port where the new node listens for inbound cluster management traffic.

  7. Select the Use a custom advertise address option to specify the IP address that is advertised to all members of the cluster for API access.

Copy the displayed command. It looks similar to the following:

docker swarm join --token <token> <mke-manager-ip>

You can also use the command line to get the join token. Using your MKE client bundle, run:

docker swarm join-token worker

Run the docker swarm join command on each instance of Windows Server that will be a worker node.

Windows nodes limitations

The following features are not yet supported on Windows Server 2019:

  • Networking

    • Encrypted networks are not supported. If you’ve upgraded from a previous version, you’ll also need to recreate the ucp-hrm network to make it unencrypted.

  • Secrets

    • When using secrets with Windows services, Windows stores temporary secret files on disk. You can use BitLocker on the volume containing the Docker root directory to encrypt the secret data at rest.

    • When creating a service which uses Windows containers, the options to specify UID, GID, and mode are not supported for secrets. Secrets are currently only accessible by administrators and users with system access within the container.

  • Mounts

    • On Windows, Docker can’t listen on a Unix socket. Use TCP or a named pipe instead.

Use a load balancer

Once you’ve joined multiple manager nodes for high availability (HA), you can configure your own load balancer to balance user requests across all manager nodes.

This allows users to access MKE using a centralized domain name. If a manager node goes down, the load balancer can detect that and stop forwarding requests to that node, so that the failure goes unnoticed by users.

Load-balancing on MKE

Since MKE uses mutual TLS, make sure you configure your load balancer to:

  • Load-balance TCP traffic on ports 443 and 6443.

  • Not terminate HTTPS connections.

  • Use the /_ping endpoint on each manager node to check whether the node is healthy and whether it should remain in the load-balancing pool.

Load balancing MKE and MSR

By default, both MKE and MSR use port 443. If you plan on deploying MKE and MSR, your load balancer needs to distinguish traffic between the two by IP address or port number.

  • If you want to configure your load balancer to listen on port 443:

    • Use one load balancer for MKE and another for MSR, or

    • Use the same load balancer with multiple virtual IPs.

  • Configure your load balancer to expose MKE or MSR on a port other than 443.

Important

Additional requirements

In addition to configuring your load balancer to distinguish between MKE and MSR, configuring a load balancer for MSR has further requirements (refer to the MSR documentation).

Configuration examples

Use the following examples to configure your load balancer for MKE. The first is an NGINX configuration (nginx.conf), the second is an HAProxy configuration (haproxy.cfg), and the third is an AWS Elastic Load Balancer description in JSON.

user  nginx;
worker_processes  1;

error_log  /var/log/nginx/error.log warn;
pid        /var/run/nginx.pid;

events {
   worker_connections  1024;
}

stream {
   upstream ucp_443 {
      server <UCP_MANAGER_1_IP>:443 max_fails=2 fail_timeout=30s;
      server <UCP_MANAGER_2_IP>:443 max_fails=2 fail_timeout=30s;
      server <UCP_MANAGER_N_IP>:443 max_fails=2 fail_timeout=30s;
   }
   server {
      listen 443;
      proxy_pass ucp_443;
   }
}

global
   log /dev/log    local0
   log /dev/log    local1 notice

defaults
   mode    tcp
   option  dontlognull
   timeout connect     5s
   timeout client      50s
   timeout server      50s
   timeout tunnel      1h
   timeout client-fin  50s

### frontends
# Optional HAProxy Stats Page accessible at http://<host-ip>:8181/haproxy?stats
frontend ucp_stats
   mode http
   bind 0.0.0.0:8181
   default_backend ucp_stats
frontend ucp_443
   mode tcp
   bind 0.0.0.0:443
   default_backend ucp_upstream_servers_443

### backends
backend ucp_stats
   mode http
   option httplog
   stats enable
   stats admin if TRUE
   stats refresh 5m
backend ucp_upstream_servers_443
   mode tcp
   option httpchk GET /_ping HTTP/1.1\r\nHost:\ <UCP_FQDN>
   server node01 <UCP_MANAGER_1_IP>:443 weight 100 check check-ssl verify none
   server node02 <UCP_MANAGER_2_IP>:443 weight 100 check check-ssl verify none
   server node03 <UCP_MANAGER_N_IP>:443 weight 100 check check-ssl verify none

{
      "Subnets": [
         "subnet-XXXXXXXX",
         "subnet-YYYYYYYY",
         "subnet-ZZZZZZZZ"
      ],
      "CanonicalHostedZoneNameID": "XXXXXXXXXXX",
      "CanonicalHostedZoneName": "XXXXXXXXX.us-west-XXX.elb.amazonaws.com",
      "ListenerDescriptions": [
         {
               "Listener": {
                  "InstancePort": 443,
                  "LoadBalancerPort": 443,
                  "Protocol": "TCP",
                  "InstanceProtocol": "TCP"
               },
               "PolicyNames": []
         }
      ],
      "HealthCheck": {
         "HealthyThreshold": 2,
         "Interval": 10,
         "Target": "HTTPS:443/_ping",
         "Timeout": 2,
         "UnhealthyThreshold": 4
      },
      "VPCId": "vpc-XXXXXX",
      "BackendServerDescriptions": [],
      "Instances": [
         {
               "InstanceId": "i-XXXXXXXXX"
         },
         {
               "InstanceId": "i-XXXXXXXXX"
         },
         {
               "InstanceId": "i-XXXXXXXXX"
         }
      ],
      "DNSName": "XXXXXXXXXXXX.us-west-2.elb.amazonaws.com",
      "SecurityGroups": [
         "sg-XXXXXXXXX"
      ],
      "Policies": {
         "LBCookieStickinessPolicies": [],
         "AppCookieStickinessPolicies": [],
         "OtherPolicies": []
      },
      "LoadBalancerName": "ELB-UCP",
      "CreatedTime": "2017-02-13T21:40:15.400Z",
      "AvailabilityZones": [
         "us-west-2c",
         "us-west-2a",
         "us-west-2b"
      ],
      "Scheme": "internet-facing",
      "SourceSecurityGroup": {
         "OwnerAlias": "XXXXXXXXXXXX",
         "GroupName":  "XXXXXXXXXXXX"
      }
   }

You can deploy your load balancer using:

# Create the nginx.conf file, then
# deploy the load balancer

docker run --detach \
--name ucp-lb \
--restart=unless-stopped \
--publish 443:443 \
--volume ${PWD}/nginx.conf:/etc/nginx/nginx.conf:ro \
nginx:stable-alpine
# Create the haproxy.cfg file, then
# deploy the load balancer

docker run --detach \
--name ucp-lb \
--publish 443:443 \
--publish 8181:8181 \
--restart=unless-stopped \
--volume ${PWD}/haproxy.cfg:/usr/local/etc/haproxy/haproxy.cfg:ro \
haproxy:1.7-alpine haproxy -d -f /usr/local/etc/haproxy/haproxy.cfg

Deploy route reflectors for improved network performance

MKE uses Calico as the default Kubernetes networking solution. Calico is configured to create a BGP mesh between all nodes in the cluster.

As you add more nodes to the cluster, networking performance starts decreasing. If your cluster has more than 100 nodes, you should reconfigure Calico to use Route Reflectors instead of a node-to-node mesh.

This article guides you in deploying Calico Route Reflectors in an MKE cluster. MKE running on Microsoft Azure uses Azure SDN instead of Calico for multi-host networking, so if your MKE deployment runs on Azure, you do not need this configuration.

Before you begin

For production-grade systems, you should deploy at least two Route Reflectors, each running on a dedicated node. These nodes should not be running any other workloads.

If Route Reflectors are running on the same node as other workloads, swarm ingress and NodePorts might not work in these workloads.

Choose dedicated nodes
  1. Taint the nodes to ensure that they are unable to run other workloads.

  2. For each dedicated node, run:

    kubectl taint node <node-name> \
    com.docker.ucp.kubernetes.calico/route-reflector=true:NoSchedule
    
  3. Add labels to those nodes:

    kubectl label nodes <node-name> \
    com.docker.ucp.kubernetes.calico/route-reflector=true
    
Deploy the Route Reflectors
  1. Create a calico-rr.yaml file with the following content:

    kind: DaemonSet
    apiVersion: extensions/v1beta1
    metadata:
      name: calico-rr
      namespace: kube-system
      labels:
        app: calico-rr
    spec:
      updateStrategy:
        type: RollingUpdate
      selector:
        matchLabels:
          k8s-app: calico-rr
      template:
        metadata:
          labels:
            k8s-app: calico-rr
          annotations:
            scheduler.alpha.kubernetes.io/critical-pod: ''
        spec:
          tolerations:
            - key: com.docker.ucp.kubernetes.calico/route-reflector
              value: "true"
              effect: NoSchedule
          hostNetwork: true
          containers:
            - name: calico-rr
              image: calico/routereflector:v0.6.1
              env:
                - name: ETCD_ENDPOINTS
                  valueFrom:
                    configMapKeyRef:
                      name: calico-config
                      key: etcd_endpoints
                - name: ETCD_CA_CERT_FILE
                  valueFrom:
                    configMapKeyRef:
                      name: calico-config
                      key: etcd_ca
                # Location of the client key for etcd.
                - name: ETCD_KEY_FILE
                  valueFrom:
                    configMapKeyRef:
                      name: calico-config
                      key: etcd_key # Location of the client certificate for etcd.
                - name: ETCD_CERT_FILE
                  valueFrom:
                    configMapKeyRef:
                      name: calico-config
                      key: etcd_cert
                - name: IP
                  valueFrom:
                    fieldRef:
                      fieldPath: status.podIP
              volumeMounts:
                - mountPath: /calico-secrets
                  name: etcd-certs
              securityContext:
                privileged: true
          nodeSelector:
            com.docker.ucp.kubernetes.calico/route-reflector: "true"
          volumes:
          # Mount in the etcd TLS secrets.
            - name: etcd-certs
              secret:
                secretName: calico-etcd-secrets
    
  2. Deploy the DaemonSet using:

    kubectl create -f calico-rr.yaml
    
Configure calicoctl

To reconfigure Calico to use Route Reflectors instead of a node-to-node mesh, you’ll need to tell calicoctl where to find the etcd key-value store managed by MKE. From a CLI with a MKE client bundle, create a shell alias to start calicoctl using the mirantis/ucp-dsinfo image:

UCP_VERSION=$(docker version --format '{{index (split .Server.Version "/") 1}}')
alias calicoctl="\
docker run -i --rm \
  --pid host \
  --net host \
  -e constraint:ostype==linux \
  -e ETCD_ENDPOINTS=127.0.0.1:12378 \
  -e ETCD_KEY_FILE=/ucp-node-certs/key.pem \
  -e ETCD_CA_CERT_FILE=/ucp-node-certs/ca.pem \
  -e ETCD_CERT_FILE=/ucp-node-certs/cert.pem \
  -v /var/run/calico:/var/run/calico \
  -v ucp-node-certs:/ucp-node-certs:ro \
  mirantis/ucp-dsinfo:${UCP_VERSION} \
  calicoctl \
"
Disable node-to-node BGP mesh

After configuring calicoctl, check the current Calico BGP configuration:

calicoctl get bgpconfig

If you don’t see any configuration listed, create one:

calicoctl create -f - <<EOF
apiVersion: projectcalico.org/v3
kind: BGPConfiguration
metadata:
  name: default
spec:
  logSeverityScreen: Info
  nodeToNodeMeshEnabled: false
  asNumber: 63400
EOF

This action creates a new configuration with node-to-node mesh BGP disabled.

If you have an existing configuration and nodeToNodeMeshEnabled is set to true:

  1. Update your configuration:

    calicoctl get bgpconfig --output yaml > bgp.yaml
    
  2. Edit the bgp.yaml file, updating nodeToNodeMeshEnabled to false.

  3. Update the Calico configuration:

    calicoctl replace -f - < bgp.yaml
    
Configure Calico to use Route Reflectors

To configure Calico to use the Route Reflectors, you first need to know the AS number for your network. To obtain it, run:

calicoctl get nodes --output=wide

Using the AS number, create the Calico configuration by customizing and running the following snippet for each route reflector:

calicoctl create -f - << EOF
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: bgppeer-global
spec:
  peerIP: <IP_RR>
  asNumber: <AS_NUMBER>
EOF

Where:

  • IP_RR is the IP of the node where the Route Reflector pod is deployed.

  • AS_NUMBER is the same AS number for your nodes.

Stop calico-node pods
  1. Manually delete any calico-node pods that are running on the nodes dedicated to Route Reflectors. This ensures that calico-node pods and Route Reflectors never run on the same node.

  2. Using your MKE client bundle:

    # Find the Pod name
    kubectl -n kube-system \
      get pods --selector k8s-app=calico-node -o wide | \
      grep <node-name>
    
    # Delete the Pod
    kubectl -n kube-system delete pod <pod-name>
    
Validate peers
  1. Verify that calico-node pods running on other nodes are peering with the Route Reflector.

  2. From a CLI with a MKE client bundle, use a Swarm affinity filter to run calicoctl node status on any node running calico-node:

    UCP_VERSION=$(docker version --format '{{index (split .Server.Version "/") 1}}')
    docker run -i --rm \
      --pid host \
      --net host \
      -e affinity:container=='k8s_calico-node.*' \
      -e ETCD_ENDPOINTS=127.0.0.1:12378 \
      -e ETCD_KEY_FILE=/ucp-node-certs/key.pem \
      -e ETCD_CA_CERT_FILE=/ucp-node-certs/ca.pem \
      -e ETCD_CERT_FILE=/ucp-node-certs/cert.pem \
      -v /var/run/calico:/var/run/calico \
      -v ucp-node-certs:/ucp-node-certs:ro \
      mirantis/ucp-dsinfo:${UCP_VERSION} \
      calicoctl node status
    

The delivered results should resemble the following sample output:

IPv4 BGP status
+--------------+-----------+-------+----------+-------------+
| PEER ADDRESS | PEER TYPE | STATE |  SINCE   |    INFO     |
+--------------+-----------+-------+----------+-------------+
| 172.31.24.86 | global    | up    | 23:10:04 | Established |
+--------------+-----------+-------+----------+-------------+

IPv6 BGP status
No IPv6 peers found.

Use two-factor authentication

Two-factor authentication (2FA) adds an extra layer of security when logging in to the MKE web UI. Once enabled, 2FA requires the user to submit an additional authentication code generated on a separate mobile device along with their user name and password at login.

Configure 2FA

MKE 2FA requires the use of a time-based one-time password (TOTP) application installed on a mobile device to generate a time-based authentication code for each login to the MKE web UI. Examples of such applications include 1Password, Authy, and LastPass Authenticator.

To configure 2FA:

  1. Install a TOTP application to your mobile device.

  2. In the MKE web UI, navigate to My Profile > Security.

  3. Toggle the Two-factor authentication control to enabled.

  4. Open the TOTP application and scan the offered QR code. The device will display a six-digit code.

  5. Enter the six-digit code in the offered field and click Register. The TOTP application will save your MKE account.

    Important

    A set of recovery codes displays in the MKE web UI when two-factor authentication is enabled. Save these codes in a safe location, as they can be used to access the MKE web UI if for any reason the configured mobile device becomes unavailable. Refer to Recover 2FA for details.

Access MKE using 2FA

Once 2FA is enabled, you will need to provide an authentication code each time you log in to the MKE web UI. Typically, the TOTP application installed on your mobile device generates the code and refreshes it every 30 seconds.

Access the MKE web UI with 2FA enabled:

  1. In the MKE web UI, click Sign in. The Sign in page will display.

  2. Enter a valid user name and password.

  3. Access the MKE code in the TOTP application on your mobile device.

  4. Enter the current code in the 2FA Code field in the MKE web UI.

Note

Multiple authentication failures may indicate a lack of synchronization between the mobile device clock and the mobile provider.

Disable 2FA

Mirantis strongly recommends using 2FA to secure MKE accounts. If you need to temporarily disable 2FA, re-enable it as soon as possible.

To disable 2FA:

  1. In the MKE web UI, navigate to My Profile > Security.

  2. Toggle the Two-factor authentication control to disabled.

Recover 2FA

If the mobile device with authentication codes is unavailable, you can re-access MKE using any of the recovery codes that display in the MKE web UI when 2FA is first enabled.

To recover 2FA:

  1. Enter one of the recovery codes when prompted for the two-factor authentication code upon login to the MKE web UI.

  2. Navigate to My Profile > Security.

  3. Disable 2FA and then re-enable it.

  4. Open the TOTP application and scan the offered QR code. The device will display a six-digit code.

  5. Enter the six-digit code in the offered field and click Register. The TOTP application will save your MKE account.

If there are no recovery codes to draw from, ask your system administrator to disable 2FA in order to regain access to the MKE web UI. Once done, repeat the Configure 2FA procedure to reinstate 2FA protection.

MKE administrators are not able to re-enable 2FA for users.

Configure and use OpsCare

Available since MKE 3.5.0

Whenever there is an issue with your cluster, OpsCare routes notifications from your MKE deployment to Mirantis support engineers, who then either resolve the problem directly or arrange to troubleshoot it with you.

For more information, refer to OpsCare for Mirantis Cloud Platform.

Configure OpsCare

To configure OpsCare you must first obtain a Salesforce username, password, and environment ID from your Mirantis Customer Success Manager. You then store these credentials as Swarm secrets using the following naming convention:

  • User name: sfdc_username

  • Password: sfdc_password

  • Environment ID: sfdc_environment_id

Note

Every cluster that uses OpsCare must have its own unique sfdc_environment_id.


To configure OpsCare using the CLI:

  1. Create secrets for your Salesforce login credentials:

    printf "<username-obtained-from-csm>" | docker secret create sfdc_username -
    printf "<password-obtained-from-csm>" | docker secret create sfdc_password -
    printf "<environment-id-obtained-from-csm>" | docker secret create sfdc_environment_id -
    
  2. Enable OpsCare:

    MKE_USERNAME=<mke-username>
    MKE_PASSWORD=<mke-password>
    MKE_HOST=<mke-host>
    
    AUTHTOKEN=$(curl --silent --insecure --data "{\"username\":\"$MKE_USERNAME\",\"password\":\"$MKE_PASSWORD\"}" https://$MKE_HOST/auth/login | jq --raw-output .auth_token)
    curl --silent --insecure -X GET "https://$MKE_HOST/api/ucp/config-toml" -H "accept: application/toml" -H "Authorization: Bearer $AUTHTOKEN" > ucp-config.toml
    sed -i 's/ops_care = false/ops_care = true/' ucp-config.toml
    curl --silent --insecure -X PUT -H "accept: application/toml" -H "Authorization: Bearer $AUTHTOKEN" --upload-file './ucp-config.toml' https://$MKE_HOST/api/ucp/config-toml
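
    To confirm that OpsCare is now enabled, you can re-download the configuration and verify the ops_care value, for example:

    curl --silent --insecure -X GET "https://$MKE_HOST/api/ucp/config-toml" -H "accept: application/toml" -H "Authorization: Bearer $AUTHTOKEN" | grep 'ops_care'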
    

To configure OpsCare using the MKE web UI:

  1. Log in to the MKE web UI.

  2. Using the left-side navigation panel, navigate to <username> > Admin Settings > Usage.

  3. Under OpsCare Settings, toggle the Enable OpsCare slider to the right.

  4. In the Salesforce Username field, enter your Salesforce user name.

  5. Next, enter your Salesforce password and Salesforce environment ID.

  6. Click Create Secrets.

  7. Click Save.

Manage Salesforce alerts

OpsCare uses a predefined group of MKE alerts to notify your Customer Success Manager of problems with your deployment. This alert group is identical to the one used in any MKE cluster provisioned by Mirantis Container Cloud. A single watchdog alert serves to verify the proper function of the OpsCare alert pipeline as a whole.

To verify that the OpsCare alerts are functioning properly:

  1. Log in to Salesforce.

  2. Navigate to Cases and verify that the watchdog alert is present. It displays as a Watchdog alert that is always firing.

Disable OpsCare

You must disable OpsCare before you can delete the three secrets it uses.

To disable OpsCare:

  1. Log in to the MKE web UI.

  2. Using the left-side navigation panel, navigate to <username> > Admin Settings > Usage.

  3. Toggle the Enable OpsCare slider to the left.

Alternatively, you can disable OpsCare by changing the ops_care entry in the MKE configuration file to false.
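
From the CLI, the same change can be made with the API calls used when enabling OpsCare, this time setting ops_care back to false. The following sketch assumes that the MKE_HOST and AUTHTOKEN variables from the enable procedure are still set:

curl --silent --insecure -X GET "https://$MKE_HOST/api/ucp/config-toml" -H "accept: application/toml" -H "Authorization: Bearer $AUTHTOKEN" > ucp-config.toml
sed -i 's/ops_care = true/ops_care = false/' ucp-config.toml
curl --silent --insecure -X PUT -H "accept: application/toml" -H "Authorization: Bearer $AUTHTOKEN" --upload-file './ucp-config.toml' https://$MKE_HOST/api/ucp/config-toml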

Configure cluster and service networking in an existing cluster

Available since MKE 3.5.0

On systems that use the managed CNI, you can switch existing clusters to either kube-proxy with ipvs proxier or eBPF mode.

MKE does not support switching kube-proxy in an existing cluster from ipvs proxier to iptables proxier, nor does it support disabling eBPF mode after it has been enabled. Using a CNI that supports both cluster and service networking requires that you disable kube-proxy.

Refer to Cluster and service networking options in the MKE Installation Guide for information on how to configure cluster and service networking at install time.

Caution

The configuration changes described here cannot be reversed. As such, Mirantis recommends that you make a cluster backup, drain your workloads, and take your cluster offline prior to performing any of these changes.

Caution

Swarm workloads that require the use of encrypted overlay networks must use iptables proxier. Be aware that the other networking options detailed here automatically disable Docker Swarm encrypted overlay networks.


To switch an existing cluster to kube-proxy with ipvs proxier while using the managed CNI:

  1. Obtain the current MKE configuration file for your cluster.

  2. Set kube_proxy_mode to "ipvs".

  3. Upload the new MKE configuration file. Be aware that this will require a wait time of approximately five minutes.

  4. Verify that the following values are set in your MKE configuration file:

    unmanaged_cni = false
    calico_ebpf_enabled = false
    kube_default_drop_masq_bits = false
    kube_proxy_mode = "ipvs"
    kube_proxy_no_cleanup_on_start = false
    
  5. Verify that the ucp-kube-proxy container logs on all nodes contain the following:

    KUBE_PROXY_MODE (ipvs) CLEANUP_ON_START_DISABLED false
    Performing cleanup
    kube-proxy cleanup succeeded
    Actually starting kube-proxy....
    
  6. Obtain the current MKE configuration file for your cluster.

  7. Set kube_proxy_no_cleanup_on_start to true.

  8. Upload the new MKE configuration file. Be aware that this will require a wait time of approximately five minutes.

  9. Reboot all nodes.

  10. Verify that the following values are set in your MKE configuration file and that your cluster is in a healthy state with all nodes ready:

    unmanaged_cni = false
    calico_ebpf_enabled = false
    kube_default_drop_masq_bits = false
    kube_proxy_mode = "ipvs"
    kube_proxy_no_cleanup_on_start = true
    
  11. Verify that the ucp-kube-proxy container logs on all nodes contain the following (see the log-check sketch after this procedure):

    KUBE_PROXY_MODE (ipvs) CLEANUP_ON_START_DISABLED true
    Actually starting kube-proxy....
    .....
    I1111 02:41:05.559641     1 server_others.go:274] Using ipvs Proxier.
    W1111 02:41:05.559951     1 proxier.go:445] IPVS scheduler not specified, use rr by default
    
  12. Optional. Configure the following ipvs-related parameters in the MKE configuration file (otherwise, MKE will use the Kubernetes default parameter settings):

    • ipvs_exclude_cidrs = ""

    • ipvs_min_sync_period = ""

    • ipvs_scheduler = ""

    • ipvs_strict_arp = false

    • ipvs_sync_period = ""

    • ipvs_tcp_timeout = ""

    • ipvs_tcpfin_timeout = ""

    • ipvs_udp_timeout = ""

    For more information on using these parameters, refer to kube-proxy in the Kubernetes documentation.
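
To check the ucp-kube-proxy logs on every node from a single CLI session, you can adapt the log-check loop that appears later in the eBPF procedure. The following sketch, run from a CLI with an MKE client bundle, prints the nodes whose ucp-kube-proxy logs do not yet contain Using ipvs Proxier, so an empty result means that every node reports the ipvs proxier. For step 5, change the grep pattern to kube-proxy cleanup succeeded.

# Print the nodes whose ucp-kube-proxy container does not yet log the ipvs proxier
for cont in $(docker ps -a | rev | cut -d' ' -f 1 | rev | grep ucp-kube-proxy); \
do nodeName=$(echo $cont | cut -d '/' -f1); \
docker logs $cont 2>/dev/null | grep -q 'Using ipvs Proxier'; \
if [ $? -ne 0 ]; \
then echo $nodeName; \
fi; \
done | sort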


To switch an existing cluster to eBPF mode while using the managed CNI:

  1. Verify that the prerequisites for eBPF use have been met, including kernel compatibility, for all Linux manager and worker nodes. Refer to the Calico documentation Enable the eBPF dataplane for more information.

  2. Obtain the current MKE configuration file for your cluster.

  3. Set kube_default_drop_masq_bits to true.

  4. Upload the new MKE configuration file. Be aware that this will require a wait time of approximately five minutes.

  5. Verify that the ucp-kube-proxy container started on all nodes, that the kube-proxy cleanup took place, and that ucp-kube-proxy launched kube-proxy.

    for cont in $(docker ps -a|rev | cut -d' ' -f 1 | rev|grep ucp-kube-proxy); \
    do nodeName=$(echo $cont|cut -d '/' -f1); \
    docker logs $cont 2>/dev/null|grep -q 'kube-proxy cleanup succeeded'; \
    if [ $? -ne 0 ]; \
    then echo $nodeName; \
    fi; \
    done|sort
    

    Expected output in the ucp-kube-proxy logs:

    KUBE_PROXY_MODE (iptables) CLEANUP_ON_START_DISABLED false
    Performing cleanup
    kube-proxy cleanup succeeded
    Actually starting kube-proxy....
    

    Note

    If the list of nodes returned by the command does not quickly become empty, check the ucp-kube-proxy logs on the nodes where either of the following took place:

    • The ucp-kube-proxy container did not launch.

    • The kube-proxy cleanup did not happen.

  6. Reboot all nodes.

  7. Obtain the current MKE configuration file for your cluster.

  8. Verify that the following values are set in your MKE configuration file:

    unmanaged_cni = false
    calico_ebpf_enabled = false
    kube_default_drop_masq_bits = true
    kube_proxy_mode = "iptables"
    kube_proxy_no_cleanup_on_start = false
    
  9. Verify that the ucp-kube-proxy container logs on all nodes contain the following:

    KUBE_PROXY_MODE (iptables) CLEANUP_ON_START_DISABLED false
    Performing cleanup
    ....
    kube-proxy cleanup succeeded
    Actually starting kube-proxy....
    ....
    I1111 03:29:25.048458     1 server_others.go:212] Using iptables Proxier.
    
  10. Set kube_proxy_mode to "disabled".

  11. Set calico_ebpf_enabled to true.

  12. Upload the new MKE configuration file. Be aware that this will require a wait time of approximately five minutes.

  13. Verify that the ucp-kube-proxy container started on all nodes, that the kube-proxy cleanup took place, and that ucp-kube-proxy did not launch kube-proxy.

    for cont in $(docker ps -a|rev | cut -d' ' -f 1 | rev|grep ucp-kube-proxy); \
    do nodeName=$(echo $cont|cut -d '/' -f1); \
    docker logs $cont 2>/dev/null|grep -q 'Sleeping forever'; \
    if [ $? -ne 0 ]; \
    then echo $nodeName; \
    fi; \
    done|sort
    

    Expected output in the ucp-kube-proxy logs:

    KUBE_PROXY_MODE (disabled) CLEANUP_ON_START_DISABLED false
    Performing cleanup
    kube-proxy cleanup succeeded
    Sleeping forever....
    

    Note

    If the list of nodes returned by the command does not quickly become empty, check the ucp-kube-proxy logs on the nodes where either of the following took place:

    • The ucp-kube-proxy container did not launch.

    • The ucp-kube-proxy container launched kube-proxy.

  14. Obtain the current MKE configuration file for your cluster.

  15. Verify that the following values are set in your MKE configuration file:

    unmanaged_cni = false
    calico_ebpf_enabled = true
    kube_default_drop_masq_bits = true
    kube_proxy_mode = "disabled"
    kube_proxy_no_cleanup_on_start = false
    
  16. Set kube_proxy_no_cleanup_on_start to true.

  17. Upload the new MKE configuration file. Be aware that this will require a wait time of approximately five minutes.

  18. Verify that the following values are set in your MKE configuration file and that your cluster is in a healthy state with all nodes ready:

    unmanaged_cni = false
    calico_ebpf_enabled = true
    kube_default_drop_masq_bits = true
    kube_proxy_mode = "disabled"
    kube_proxy_no_cleanup_on_start = true
    
  19. Verify that eBPF mode is operational by confirming the presence of the following lines in the ucp-kube-proxy container logs:

    KUBE_PROXY_MODE (disabled) CLEANUP_ON_START_DISABLED true
    "Sleeping forever...."
    
  20. Verify that you can SSH into all nodes.

Authorize role-based access

MKE allows administrators to authorize users to view, edit, and use cluster resources by granting role-based permissions for specific resource sets. This section describes how to configure all the relevant components of role-based access control (RBAC).

Access control model

Mirantis Kubernetes Engine (MKE) lets you authorize users to view, edit, and use cluster resources by granting role-based permissions against resource sets.

To authorize access to cluster resources across your organization, MKE administrators might take the following high-level steps:

  • Add and configure subjects (users, teams, and service accounts).

  • Define custom roles (or use defaults) by adding permitted operations per type of resource.

  • Group cluster resources into resource sets of Swarm collections or Kubernetes namespaces.

  • Create grants by combining subject + role + resource set.

Subjects

A subject represents a user, team, organization, or a service account. A subject can be granted a role that defines permitted operations against one or more resource sets.

  • User: A person authenticated by the authentication backend. Users can belong to one or more teams and one or more organizations.

  • Team: A group of users that share permissions defined at the team level. A team can be in one organization only.

  • Organization: A group of teams that share a specific set of permissions, defined by the roles of the organization.

  • Service account: A Kubernetes object that enables a workload to access cluster resources which are assigned to a namespace.

Roles

Roles define what operations can be done by whom. A role is a set of permitted operations against a type of resource, like a container or volume, which is assigned to a user or a team with a grant.

For example, the built-in role, Restricted Control, includes permissions to view and schedule nodes but not to update nodes. A custom DBA role might include permissions to r-w-x (read, write, and execute) volumes and secrets.

Most organizations use multiple roles to fine-tune the appropriate access. A given team or user may have different roles provided to them depending on what resource they are accessing.

Resource sets

To control user access, cluster resources are grouped into Docker Swarm collections or Kubernetes namespaces.

  • Swarm collections: A collection has a directory-like structure that holds Swarm resources. You can create collections in MKE by defining a directory path and moving resources into it. Also, you can create the path in MKE and use labels in your YAML file to assign application resources to the path. Resource types that users can access in a Swarm collection include containers, networks, nodes, services, secrets, and volumes.

  • Kubernetes namespaces: A namespace is a logical area for a Kubernetes cluster. Kubernetes comes with a default namespace for your cluster objects, plus two more namespaces for system and public resources. You can create custom namespaces, but unlike Swarm collections, namespaces cannot be nested. Resource types that users can access in a Kubernetes namespace include pods, deployments, network policies, nodes, services, secrets, and many more.

Together, collections and namespaces are named resource sets.

Grants

A grant is made up of a subject, a role, and a resource set.

Grants define which users can access what resources in what way. Grants are effectively Access Control Lists (ACLs) which provide comprehensive access policies for an entire organization when grouped together.

Only an administrator can manage grants, subjects, roles, and access to resources.

Note

An administrator is a user who creates subjects, groups resources by moving them into collections or namespaces, defines roles by selecting allowable operations, and applies grants to users and teams.

Secure Kubernetes defaults

For cluster security, only MKE admin users and service accounts that are granted the cluster-admin ClusterRole for all Kubernetes namespaces via a ClusterRoleBinding can deploy pods with privileged options. This prevents a platform user from being able to bypass the Universal Control Plane Security Model.

These privileged options include:

Pods with any of the following defined in the Pod Specification:

  • PodSpec.hostIPC - Prevents a user from deploying a pod in the host’s IPC Namespace.

  • PodSpec.hostNetwork - Prevents a user from deploying a pod in the host’s Network Namespace.

  • PodSpec.hostPID - Prevents a user from deploying a pod in the host’s PID Namespace.

  • SecurityContext.allowPrivilegeEscalation - Prevents a child process of a container from gaining more privileges than its parent.

  • SecurityContext.capabilities - Prevents additional Linux Capabilities from being added to a pod.

  • SecurityContext.privileged - Prevents a user from deploying a Privileged Container.

  • Volume.hostPath - Prevents a user from mounting a path from the host into the container. This could be a file, a directory, or even the Docker Socket.

Persistent Volumes using the following storage classes:

  • Local - Prevents a user from creating a persistent volume with the Local storage class. The Local storage class allows a user to mount directories from the host into a pod. This could be a file, a directory, or even the Docker Socket.

Note

If an admin has created a persistent volume with the local storage class, a non-admin could consume this via a persistent volume claim.

If a user without a cluster admin role tries to deploy a pod with any of these privileged options, an error similar to the following example is displayed:

Error from server (Forbidden): error when creating "pod.yaml": pods "mypod"
is forbidden: user "<user-id>" is not an admin and does not have permissions
to use privileged mode for resource
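
For illustration, the following minimal pod manifest (all names are placeholders) sets SecurityContext.privileged and is therefore expected to be rejected with an error like the one above when applied by a user who does not hold the cluster-admin role:

# Attempt to create a privileged pod; expected to fail for non-admin users
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: privileged-test
spec:
  containers:
  - name: test
    image: nginx:latest
    securityContext:
      privileged: true
EOF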

See also

Kubernetes

Create organizations, teams, and users

This topic describes how to create organizations, teams, and users.

Note

  • Individual users can belong to multiple teams but a team can belong to only one organization.

  • New users have a default permission level that you can extend by adding the user to a team and creating grants. Alternatively, you can make the user an administrator to extend their permission level.

  • All users are authenticated on the back end. MKE provides built-in authentication and also integrates with LDAP directory services. To use MKE built-in authentication, you must create users manually.

Log in to the MKE web UI and perform the following steps:

To create an organization:

  1. Navigate to Access Control > Orgs & Teams > Create.

  2. Enter an organization name and click Create.


To create a team in the organization:

  1. Navigate to the required organization and click the plus sign in the top right corner.

  2. Enter a team name and description and click Create.


To create a user:

  1. Navigate to Access Control > Users > Create.

  2. Enter a user name, password, and the user’s full name.

  3. Optional. Select IS A MIRANTIS KUBERNETES ENGINE ADMIN to give the user administrator privileges.

  4. Click Create.


To add an existing user to a team:

  1. Navigate to the required team and click the plus sign in the top right corner.

  2. Select the users you want to include and click Add Users.

Enable LDAP and sync teams and users

This topic describes how to enable LDAP and to sync your LDAP directory to the teams and users that you have created in MKE.

To enable LDAP and sync to your LDAP directory:

  1. Log in to the MKE web UI as an MKE administrator.

  2. Under your user name drop-down, click Admin Settings > Authentication & Authorization.

  3. Scroll down and click Enabled next to LDAP. A list of LDAP settings displays.

  4. Enter the values that correspond with your LDAP server installation.

  5. Test your configuration in MKE.

  6. Create a team in MKE to mirror your LDAP directory.

  7. Select ENABLE SYNC TEAM MEMBERS.

  8. Choose between the following two methods for matching group members from an LDAP directory. Refer to the table below for more information.

    • Select LDAP MATCH METHOD to change the method for matching group members in the LDAP directory from Match Search Results (default) to Match Group Members. Fill out Group DN and Group Member Attribute as required.

    • Keep the default Match Search Results method and fill out Search Base DN, Search filter, and Search subtree instead of just one level as required.

  9. Optional. Select Immediately Sync Team Members to run an LDAP sync operation immediately after saving the configuration for the team.

  10. Click Create.


There are two methods for matching group members from an LDAP directory:

Bind method

Description

Match Group Members (direct bind)

Specifies that team members are synced directly with members of a group in the LDAP directory of your organization. The team membership is synced to match the membership of the group.

Group DN

The distinguished name of the group from which you select users.

Group Member Attribute

The value of this group attribute corresponds to the distinguished names of the members of the group.

Match Search Results (search bind)

Specifies that team members are synced using a search query against the LDAP directory of your organization. The team membership is synced to match the users in the search results.

Search Base DN

The distinguished name of the node in the directory tree where the search starts looking for users.

Search filter

Filter to find users. If empty, existing users in the search scope are added as members of the team.

Search subtree instead of just one level

Defines search through the full LDAP tree, not just one level, starting at the base DN.

Define roles with authorized API operations

A role defines a set of API operations permitted against a resource set. You apply roles to users and teams by creating grants.

Some important rules regarding roles:

  • Roles are always enabled.

  • Roles can’t be edited. To edit a role, you must delete and recreate it.

  • Roles used within a grant can be deleted only after first deleting the grant.

  • Only administrators can create and delete roles.

Default roles

You can define custom roles or use the following built-in roles:

Role

Description

None

Users have no access to Swarm or Kubernetes resources. Maps to No Access role in UCP 2.1.x.

View Only

Users can view resources but can’t create them.

Restricted Control

Users can view and edit resources but can’t run a service or container in a way that affects the node where it’s running. Users cannot mount a node directory, exec into containers, or run containers in privileged mode or with additional kernel capabilities.

Scheduler

Users can view nodes (worker and manager) and schedule (not view) workloads on these nodes. By default, all users are granted the Scheduler role against the /Shared collection. (To view workloads, users need permissions such as Container View).

Full Control

Users can view and edit all granted resources. They can create containers without any restriction, but can’t see the containers of other users.

Create a custom role for Swarm

When creating custom roles to use with Swarm, the Roles page lists all default and custom roles applicable in the organization.

You can give a role a global name, such as “Remove Images”, which might enable the Remove and Force Remove operations for images. You can apply a role with the same name to different resource sets.

  1. Click Roles under Access Control.

  2. Click Create Role.

  3. Enter the role name on the Details page.

  4. Click Operations. All available API operations are displayed.

  5. Select the permitted operations per resource type.

  6. Click Create.

Swarm operations roles

This section describes the set of operations (calls) that can be executed against Swarm resources. Be aware that each permission corresponds to a CLI command and enables the user to execute that command.

Operation

Command

Description

Config

docker config

Manage Docker configurations. See child commands for specific examples.

Container

docker container

Manage Docker containers. See child commands for specific examples.

Container

docker container create

Create a new container. See extended description and examples for more information.

Container

docker create [OPTIONS] IMAGE [COMMAND] [ARG...]

Create new containers. See extended description and examples for more information.

Container

docker update [OPTIONS] CONTAINER [CONTAINER...]

Update configuration of one or more containers. Using this command can also prevent containers from consuming too many resources from their Docker host. See extended description and examples for more information.

Container

docker rm [OPTIONS] CONTAINER [CONTAINER...]

Remove one or more containers. See options and examples for more information.

Image

docker image COMMAND

Manage Docker images. See child commands for specific examples.

Image

docker image remove

Remove one or more images. See child commands for examples.

Network

docker network

Manage networks. You can use child commands to create, inspect, list, remove, prune, connect, and disconnect networks.

Node

docker node COMMAND

Manage Swarm nodes. See child commands for examples.

Secret

docker secret COMMAND

Manage Docker secrets. See child commands for sample usage and options.

Service

docker service COMMAND

Manage services. See child commands for sample usage and options.

Volume

docker volume create [OPTIONS] [VOLUME]

Create a new volume that containers can consume and store data in. See examples for more information.

Volume

docker volume rm [OPTIONS] VOLUME [VOLUME...]

Remove one or more volumes. Users cannot remove a volume that is in use by a container. See related commands for more information.

See also

Kubernetes

Use collections and namespaces

MKE enables access control to cluster resources by grouping them into two types of resource sets: Swarm collections (for Swarm workloads) and Kubernetes namespaces (for Kubernetes workloads). Refer to rbac for a description of the difference between Swarm collections and Kubernetes namespaces. Administrators use grants to combine resources sets, giving users permission to access specific cluster resources.

Swarm collection labels

Users assign resources to collections with labels. The following resource types have editable labels and thus you can assign them to collections: services, nodes, secrets, and configs. For these resource types, change com.docker.ucp.access.label to move a resource to a different collection. Collections have generic names by default, but you can assign them meaningful names as required (such as dev, test, and prod).

Note

The following resource types do not have editable labels and thus you cannot assign them to collections: containers, networks, and volumes.

Groups of resources identified by a shared label are called stacks. You can place one stack of resources in multiple collections. MKE automatically places resources in the default collection. Users can change this using a specific com.docker.ucp.access.label in the stack/compose file.
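
For example, assuming an admin client bundle and a hypothetical /Shared/dev collection, a service can be placed in a collection at creation time, or moved later, by setting its access label:

# Create a service directly in the hypothetical /Shared/dev collection
docker service create --name web \
  --label com.docker.ucp.access.label="/Shared/dev" \
  nginx:latest

# Move the service to another collection by updating the label
docker service update \
  --label-add com.docker.ucp.access.label="/Shared/prod" \
  web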

The system uses com.docker.ucp.collection.* to enable efficient resource lookup. You do not need to manage these labels, as MKE controls them automatically. Nodes have the following labels set to true by default:

  • com.docker.ucp.collection.root

  • com.docker.ucp.collection.shared

  • com.docker.ucp.collection.swarm

Default and built-in Swarm collections

This topic describes both MKE default and built-in Swarm collections.


Default Swarm collections

Each user has a default collection, which can be changed in the MKE preferences.

Every resource that you deploy must belong to a collection. When a user deploys a resource without using an access label to specify its collection, MKE automatically places the resource in the default collection.

Default collections are useful for the following types of users:

  • Users who work only on a well-defined portion of the system

  • Users who deploy stacks but do not want to edit the contents of their compose files

Custom collections are appropriate for users with more complex roles in the system, such as administrators.

Note

For those using Docker Compose, the system applies default collection labels across all resources in the stack unless you explicitly set com.docker.ucp.access.label.

Built-in Swarm collections

MKE includes the following built-in Swarm collections:

Built-in Swarm collection

Description

/

Path to all resources in the Swarm cluster. Resources not in a collection are put here.

/System

Path to MKE managers, MSR nodes, and MKE/MSR system services. By default, only administrators have access to this collection.

/Shared

Default path for worker nodes that are available for workload scheduling. User private collections are nested under this path.

/Shared/Private

Path to a user’s private collection. Private collections are not created until the user logs in for the first time.

/Shared/Legacy

Path to the access control labels of legacy versions (UCP 2.1 and earlier).

Group and isolate cluster resources

This topic describes how to group and isolate cluster resources into swarm collections and Kubernetes namespaces.

Log in to the MKE web UI as an administrator and complete the following steps:

To create a Swarm collection:

  1. Navigate to Shared Resources > Collections.

  2. Click View Children next to Swarm.

  3. Click Create Collection.

  4. Enter a collection name and click Create.


To add a resource to the collection:

  1. Navigate to the resource you want to add to the collection. For example, click Shared Resources > Nodes and then click the node you want to add.

  2. Click the gear icon in the top right to edit the resource.

  3. Scroll down to Labels and enter the name of the collection you want to add the resource to, for example, Prod.


To create a Kubernetes namespace:

  1. Navigate to Kubernetes > Namespaces and click Create.

  2. Leave the Namespace drop-down blank.

  3. Paste the following in the Object YAML editor:

    apiVersion: v1
    kind: Namespace
    metadata:
      name: namespace-name
    
  4. Click Create.

Note

For more information on assigning resources to a particular namespace, refer to Kubernetes Documentation: Namespaces Walkthrough.

See also

Kubernetes

See also

Kubernetes

Create grants

MKE administrators create grants to control how users and organizations access resource sets. A grant defines user permissions to access resources. Each grant associates one subject with one role and one resource set. For example, you can grant the Prod Team Restricted Control over services in the /Production collection.

The following is a common workflow for creating grants:

  1. Create organizations, teams, and users.

  2. Define custom roles (or use defaults) by adding permitted API operations per type of resource.

  3. Group cluster resources into Swarm collections or Kubernetes namespaces.

  4. Create grants by combining subject, role, and resource set.

Note

This section assumes that you have created the relevant objects for the grant, including the subject, role, and resource set (Kubernetes namespace or Swarm collection).

To create a Kubernetes grant:

  1. Log in to the MKE web UI.

  2. Navigate to Access Control > Grants.

  3. Select the Kubernetes tab and click Create Role Binding.

  4. Under Subject, select Users, Organizations, or Service Account.

    • For Users, select the user from the pull-down menu.

    • For Organizations, select the organization and, optionally, the team from the pull-down menu.

    • For Service Account, select the namespace and service account from the pull-down menu.

  5. Click Next to save your selections.

  6. Under Resource Set, toggle the switch labeled Apply Role Binding to all namespaces (Cluster Role Binding).

  7. Click Next.

  8. Under Role, select a cluster role.

  9. Click Create.


To create a Swarm grant:

  1. Log in to the MKE web UI.

  2. Navigate to Access Control > Grants.

  3. Select the Swarm tab and click Create Grant.

  4. Under Subject, select Users or Organizations.

    • For Users, select a user from the pull-down menu.

    • For Organizations, select the organization and, optionally, the team from the pull-down menu.

  5. Click Next to save your selections.

  6. Under Resource Set, click View Children until the required collection displays.

  7. Click Select Collection next to the required collection.

  8. Click Next.

  9. Under Role, select a role type from the drop-down menu.

  10. Click Create.

Note

MKE places new users in the docker-datacenter organization by default. To apply permissions to all MKE users, create a grant with the docker-datacenter organization as a subject.

Grant users permission to pull images

By default, only administrators can pull images into a cluster managed by MKE. This topic describes how to give non-administrator users permission to pull images.

Images are always in the swarm collection, as they are a shared resource. Grant users the Image Create permission for the Swarm collection to allow them to pull images.

To grant a user permission to pull images:

  1. Log in to the MKE web UI as an administrator.

  2. Navigate to Access Control > Roles.

  3. Select the Swarm tab and click Create.

  4. On the Details tab, enter Pull images for the role name.

  5. On the Operations tab, select Image Create from the IMAGE OPERATIONS drop-down.

  6. Click Create.

  7. Navigate to Access Control > Grants.

  8. Select the Swarm tab and click Create Grant.

  9. Under Subject, click Users and select the required user from the drop-down.

  10. Click Next.

  11. Under Resource Set, select the Swarm collection and click Next.

  12. Under Role, select Pull images from the drop-down.

  13. Click Create.

Reset passwords

This topic describes how to reset passwords for users and administrators.

To change a user password in MKE:

  1. Log in to the MKE web UI with administrator credentials.

  2. Click Access Control > Users.

  3. Select the user whose password you want to change.

  4. Click the gear icon in the top right corner.

  5. Select Security from the left navigation.

  6. Enter the new password, confirm that it is correct, and click Update Password.

Note

For users managed with an LDAP service, you must change user passwords on the LDAP server.

To change an administrator password in MKE:

  1. SSH to an MKE manager node and run:

    docker run --net=host -v ucp-auth-api-certs:/tls -it \
    "$(docker inspect --format \
    '{{ .Spec.TaskTemplate.ContainerSpec.Image }}' \
    ucp-auth-api)" \
    "$(docker inspect --format \
    '{{ index .Spec.TaskTemplate.ContainerSpec.Args 0 }}' \
    ucp-auth-api)" \
    passwd -i
    
  2. Optional. If you have DEBUG set as your global log level within MKE, running docker inspect --format '{{ index .Spec.TaskTemplate.ContainerSpec.Args 0 }}' ucp-auth-api returns --debug instead of --db-addr.

    Pass Args 1 to docker inspect instead to reset your administrator password:

    docker run --net=host -v ucp-auth-api-certs:/tls -it \
    "$(docker inspect --format \
    '{{ .Spec.TaskTemplate.ContainerSpec.Image }}' \
    ucp-auth-api)" \
    "$(docker inspect --format \
    '{{ index .Spec.TaskTemplate.ContainerSpec.Args 1 }}' \
    ucp-auth-api)" \
    passwd -i
    

Note

Alternatively, ask another administrator to change your password.

RBAC tutorials

This section contains a collection of tutorials that explain how to use RBAC in a variety of scenarios.

Deploy a simple stateless app with RBAC

This tutorial explains how to deploy a NGINX web server and limit access to one team with role-based access control (RBAC).

Scenario

You are the MKE system administrator at Acme Company and need to configure permissions to company resources. The best way to do this is to:

  • Build the organization with teams and users.

  • Define roles with allowable operations per resource types, like permission to run containers.

  • Create collections or namespaces for accessing actual resources.

  • Create grants that join team + role + resource set.

Build the organization

Add the organization, acme-datacenter, and create three teams according to the following structure:

acme-datacenter
├── dba
│ └── Alex*
├── dev
│ └── Bett
└── ops
├── Alex*
└── Chad
Kubernetes deployment

In this section, we deploy NGINX with Kubernetes.

Create namespace

Create a namespace to logically store the NGINX application:

  1. Click Kubernetes > Namespaces.

  2. Paste the following manifest in the terminal window and click Create.

apiVersion: v1
kind: Namespace
metadata:
  name: nginx-namespace
Define roles

For this exercise, create a simple role for the ops team.

Grant access

Grant the ops team (and only the ops team) access to nginx-namespace with the custom role, Kube Deploy.

acme-datacenter/ops + Kube Deploy + nginx-namespace
Deploy NGINX

You’ve configured MKE. The ops team can now deploy NGINX.

  1. Log on to MKE as “chad” (on the ops team).

  2. Click Kubernetes > Namespaces.

  3. Paste the following manifest in the terminal window and click Create.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx-deployment
    spec:
      selector:
        matchLabels:
          app: nginx
      replicas: 2
      template:
        metadata:
          labels:
            app: nginx
        spec:
          containers:
          - name: nginx
            image: nginx:latest
            ports:
            - containerPort: 80
    
  4. Log on to MKE as each user and ensure that:

    • dba (alex) can’t see nginx-namespace.

    • dev (bett) can’t see nginx-namespace.
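
Optionally, if the ops user also has an MKE client bundle, a quick CLI check such as the following sketch confirms that the deployment is running in the namespace:

kubectl -n nginx-namespace get deployments,pods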

Swarm stack

In this section, we deploy nginx as a Swarm service. See Kubernetes Deployment for the same exercise with Kubernetes.

Create collection paths

Create a collection for NGINX resources, nested under the /Shared collection:

/
├── System
└── Shared
    └── nginx-collection

Tip

To drill into a collection, click View Children.

Define roles

You can use the built-in roles or define your own. For this exercise, create a simple role for the ops team:

  1. Click Roles under User Management.

  2. Click Create Role.

  3. On the Details tab, name the role Swarm Deploy.

  4. On the Operations tab, check all Service Operations.

  5. Click Create.

Grant access

Grant the ops team (and only the ops team) access to nginx-collection with the custom role, Swarm Deploy.

acme-datacenter/ops + Swarm Deploy + /Shared/nginx-collection
Deploy NGINX

You’ve configured MKE. The ops team can now deploy an nginx Swarm service.

  1. Log on to MKE as chad (on the ops team).

  2. Click Swarm > Services.

  3. Click Create Stack.

  4. On the Details tab, enter:

    • Name: nginx-service

    • Image: nginx:latest

  5. On the Collections tab:

    • Click /Shared in the breadcrumbs.

    • Select nginx-collection.

  6. Click Create.

  7. Log on to MKE as each user and ensure that:

    • dba (alex) cannot see nginx-collection.

    • dev (bett) cannot see nginx-collection.

See also

Isolate volumes to specific teams

This topic describes how to grant two teams access to separate volumes in two different resource collections such that neither team can see the volumes of the other team. MKE allows you to do this even if the volumes are on the same nodes.

To create two teams:

  1. Log in to the MKE web UI.

  2. Navigate to Orgs & Teams.

  3. Create two teams in the engineering organization named Dev and Prod.

  4. Add a non-admin MKE user to the Dev team.

  5. Add a non-admin MKE user to the Prod team.

To create two resource collections:

  1. Create a Swarm collection called dev-volumes nested under the Shared collection.

  2. Create a Swarm collection called prod-volumes nested under the Shared collection.

To create grants for controlling access to the new volumes:

  1. Create a grant for the Dev team to access the dev-volumes collection with the Restricted Control built-in role.

  2. Create a grant for the Prod team to access the prod-volumes collection with the Restricted Control built-in role.

To create a volume as a team member:

  1. Log in as one of the users on the Dev team.

  2. Navigate to Swarm > Volumes and click Create.

  3. On the Details tab, name the new volume dev-data.

  4. On the Collection tab, navigate to the dev-volumes collection and click Create.

  5. Log in as one of the users on the Prod team.

  6. Navigate to Swarm > Volumes and click Create.

  7. On the Details tab, name the new volume prod-data.

  8. On the Collection tab, navigate to the prod-volumes collection and click Create.

As a result, the user on the Prod team cannot see the Dev team volumes, and the user on the Dev team cannot see the Prod team volumes. MKE administrators can see all of the volumes created by either team.

Isolate cluster nodes

With MKE, you can enable physical isolation of resources by organizing nodes into collections and granting Scheduler access for different users. To control access to nodes, move them to dedicated collections where you can grant access to specific users, teams, and organizations.

In this example, a team gets access to a node collection and a resource collection, and MKE access control ensures that the team members cannot view or use Swarm resources that aren’t in their collection.

Note

You need an MKE license and at least two worker nodes to complete this example.

To isolate cluster nodes:

  1. Create an Ops team and assign a user to it.

  2. Create a /Prod collection for the team’s node.

  3. Assign a worker node to the /Prod collection.

  4. Grant the Ops teams access to its collection.

Create a team

In the web UI, navigate to the Organizations & Teams page to create a team named “Ops” in your organization. Add a user who is not an MKE administrator to the team.

Create a node collection and a resource collection

In this example, the Ops team uses an assigned group of nodes, which it accesses through a collection. Also, the team has a separate collection for its resources.

Create two collections: one for the team’s worker nodes and another for the team’s resources.

  1. Navigate to the Collections page to view all of the resource collections in the swarm.

  2. Click Create collection and name the new collection “Prod”.

  3. Click Create to create the collection.

  4. Find Prod in the list, and click View children.

  5. Click Create collection, and name the child collection “Webserver”. This creates a sub-collection for access control.

You’ve created two new collections. The /Prod collection is for the worker nodes, and the /Prod/Webserver sub-collection is for access control to an application that you’ll deploy on the corresponding worker nodes.

Move a worker node to a collection

By default, worker nodes are located in the /Shared collection. Worker nodes that are running MSR are assigned to the /System collection. To control access to the team’s nodes, move them to a dedicated collection.

Move a worker node by changing the value of its access label key, com.docker.ucp.access.label, to a different collection.

  1. Navigate to the Nodes page to view all of the nodes in the swarm.

  2. Click a worker node, and in the details pane, find its Collection. If it’s in the /System collection, click another worker node, because you can’t move nodes that are in the /System collection. By default, worker nodes are assigned to the /Shared collection.

  3. When you’ve found an available node, in the details pane, click Configure.

  4. In the Labels section, find com.docker.ucp.access.label and change its value from /Shared to /Prod.

  5. Click Save to move the node to the /Prod collection.

Note

If you don’t have an MKE license, you will get the following error message when you try to change the access label: Nodes must be in either the shared or system collection without a license.
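
Because the collection assignment is stored in the node label com.docker.ucp.access.label, an administrator with a client bundle can also sketch the same move from the CLI; the node name below is a placeholder:

# Move the worker node to the /Prod collection by updating its access label
docker node update --label-add com.docker.ucp.access.label="/Prod" <node-name>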

Grant access for a team

You need two grants to control access to nodes and container resources:

  • Grant the Ops team the Restricted Control role for the /Prod/Webserver resources.

  • Grant the Ops team the Scheduler role against the nodes in the /Prod collection.

Create two grants for team access to the two collections:

  1. Navigate to the Grants page and click Create Grant.

  2. In the left pane, click Resource Sets, and in the Swarm collection, click View Children.

  3. In the Prod collection, click View Children.

  4. In the Webserver collection, click Select Collection.

  5. In the left pane, click Roles, and select Restricted Control in the dropdown.

  6. Click Subjects, and under Select subject type, click Organizations.

  7. Select your organization, and in the Team dropdown, select Ops.

  8. Click Create to grant the Ops team access to the /Prod/Webserver collection.

The same steps apply for the nodes in the /Prod collection.

  1. Navigate to the Grants page and click Create Grant.

  2. In the left pane, click Collections, and in the Swarm collection, click View Children.

  3. In the Prod collection, click Select Collection.

  4. In the left pane, click Roles, and in the dropdown, select Scheduler.

  5. In the left pane, click Subjects, and under Select subject type, click Organizations.

  6. Select your organization, and in the Team dropdown, select Ops .

  7. Click Create to grant the Ops team Scheduler access to the nodes in the /Prod collection.

The cluster is set up for node isolation. Users with access to nodes in the /Prod collection can deploy Swarm services and Kubernetes apps, and their workloads won’t be scheduled on nodes that aren’t in the collection.

Deploy a Swarm service as a team member

When a user deploys a Swarm service, MKE assigns its resources to the user’s default collection.

From the target collection of a resource, MKE walks up the ancestor collections until it finds the highest ancestor that the user has Scheduler access to. Tasks are scheduled on any nodes in the tree below this ancestor. In this example, MKE assigns the user’s service to the /Prod/Webserver collection and schedules tasks on nodes in the /Prod collection.

As a user on the Ops team, set your default collection to /Prod/Webserver.

  1. Log in as a user on the Ops team.

  2. Navigate to the Collections page, and in the Prod collection, click View Children.

  3. In the Webserver collection, click the More Options icon and select Set to default.

Next, deploy a service. All of its resources are deployed under the user’s default collection, /Prod/Webserver, and its containers are scheduled only on the worker nodes under /Prod.

  1. Navigate to the Services page, and click Create Service.

  2. Name the service “NGINX”, use the “nginx:latest” image, and click Create.

  3. When the nginx service status is green, click the service. In the details view, click Inspect Resource, and in the dropdown, select Containers.

  4. Click the NGINX container, and in the details pane, confirm that its Collection is /Prod/Webserver.

  5. Click Inspect Resource, and in the dropdown, select Nodes.

  6. Click the node, and in the details pane, confirm that its Collection is /Prod.

Alternative: Use a grant instead of the default collection

Another approach is to use a grant instead of changing the user’s default collection. An administrator can create a grant for a role that has the Service Create permission against the /Prod/Webserver collection or a child collection. In this case, the user sets the value of the service’s access label, com.docker.ucp.access.label, to the new collection or one of its children that has a Service Create grant for the user.

Isolate nodes to Kubernetes namespaces

Use a Kubernetes namespace to deploy a Kubernetes workload to worker nodes:

  1. Create a Kubernetes namespace.

  2. Create a grant for the namespace.

  3. Associate nodes with the namespace.

  4. Deploy a Kubernetes workload.

Create a Kubernetes namespace

An administrator must create a Kubernetes namespace to enable node isolation for Kubernetes workloads.

  1. In the left pane, click Kubernetes.

  2. Click Create to open the Create Kubernetes Object page.

  3. In the Object YAML editor, paste the following YAML.

    apiVersion: v1
    kind: Namespace
    metadata:
      name: namespace-name
    
  4. Click Create to create the namespace-name namespace.

Grant access to the Kubernetes namespace

Create a grant to the namespace-name namespace:

  1. On the Create Grant page, select Namespaces.

  2. Select the namespace-name namespace, and create a Full Control grant.

Associate nodes with the namespace

Namespaces can be associated with a node collection in either of the following ways:

  • Define an annotation key during namespace creation. This is described in the following paragraphs.

  • Provide the namespace definition information in a configuration file.

Annotation file

The scheduler.alpha.kubernetes.io/node-selector annotation key assigns node selectors to namespaces. If you define a scheduler.alpha.kubernetes.io/node-selector: name-of-node-selector annotation key when creating a namespace, all applications deployed in that namespace are pinned to the nodes with the node selector specified.

The following example pins all applications deployed in the ops-nodes namespace to nodes labeled with zone=example-zone:

  1. Label the nodes with zone=example-zone (see the example after this procedure).

  2. Add a scheduler node selector annotation as part of the namespace definition:

    apiVersion: v1
    kind: Namespace
    metadata:
      annotations:
        scheduler.alpha.kubernetes.io/node-selector: zone=example-zone
      name: ops-nodes
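
For reference, the node labeling in step 1 and the namespace creation can be sketched from the CLI with kubectl; the node name and the manifest file name are placeholders:

kubectl label nodes <node-name> zone=example-zone
kubectl apply -f ops-nodes-namespace.yaml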
    

See also

Kubernetes

Access control design

Collections and grants are strong tools that can be used to control access and visibility to resources in MKE.

This tutorial describes a fictitious company named OrcaBank that needs to configure an architecture in MKE with role-based access control (RBAC) for their application engineering group.

Team access requirements

OrcaBank reorganized their application teams by product with each team providing shared services as necessary. Developers at OrcaBank do their own DevOps and deploy and manage the lifecycle of their applications.

OrcaBank has four teams with the following resource needs:

  • security should have view-only access to all applications in the cluster.

  • db should have full access to all database applications and resources.

  • mobile should have full access to their mobile applications and limited access to shared db services.

  • payments should have full access to their payments applications and limited access to shared db services.

Role composition

To assign the proper access, OrcaBank is employing a combination of default and custom roles:

  • View Only (default role) allows users to see all resources (but not edit or use).

  • Ops (custom role) allows users to perform all operations against configs, containers, images, networks, nodes, secrets, services, and volumes.

  • View & Use Networks + Secrets (custom role) enables users to view/connect to networks and view/use secrets used by db containers, but prevents them from seeing or impacting the db applications themselves.

Collection architecture

OrcaBank is also creating collections of resources to mirror their team structure.

Currently, all OrcaBank applications share the same physical resources, so all nodes and applications are being configured in collections that nest under the built-in collection, /Shared.

Other collections are also being created to enable shared db applications.

  • /Shared/mobile hosts all Mobile applications and resources.

  • /Shared/payments hosts all Payments applications and resources.

  • /Shared/db is a top-level collection for all db resources.

  • /Shared/db/payments is a collection of db resources for Payments applications.

  • /Shared/db/mobile is a collection of db resources for Mobile applications.

The collection architecture has the following tree representation:

/
├── System
└── Shared
    ├── mobile
    ├── payments
    └── db
        ├── mobile
        └── payments

OrcaBank’s Grant composition ensures that their collection architecture gives the db team access to all db resources and restricts app teams to shared db resources.

LDAP/AD integration

OrcaBank has standardized on LDAP for centralized authentication to help their identity team scale across all the platforms they manage.

To implement LDAP authentication in MKE, OrcaBank is using MKE’s native LDAP/AD integration to map LDAP groups directly to MKE teams. Users can be added to or removed from MKE teams via LDAP which can be managed centrally by OrcaBank’s identity team.

The following grant composition shows how LDAP groups are mapped to MKE teams.

Grant composition

OrcaBank is taking advantage of the flexibility in MKE’s grant model by applying two grants to each application team. One grant allows each team to fully manage the apps in their own collection, and the second grant gives them the (limited) access they need to networks and secrets within the db collection.

OrcaBank access architecture

OrcaBank’s resulting access architecture shows applications connecting across collection boundaries. By assigning multiple grants per team, the Mobile and Payments applications teams can connect to dedicated Database resources through a secure and controlled interface, leveraging Database networks and secrets.

Note

In MKE, all resources are deployed across the same group of MKE worker nodes. Node segmentation is provided in Docker Enterprise.

DB team

The db team is responsible for deploying and managing the full lifecycle of the databases used by the application teams. They can execute the full set of operations against all database resources.

Mobile team

The mobile team is responsible for deploying their own application stack, minus the database tier that is managed by the db team.

Access control design using additional security requirements

Caution

Complete the Access control design prior to undertaking this advanced tutorial.

In the first tutorial, the fictional company, OrcaBank, designed an architecture with role-based access control (RBAC) to meet their organization’s security needs. They assigned multiple grants to fine-tune access to resources across collection boundaries on a single platform.

In this tutorial, OrcaBank implements new and more stringent security requirements for production applications:

  1. First, OrcaBank adds a staging zone to their deployment model. They will no longer move developed applications directly into production. Instead, they will deploy apps from their dev cluster to staging for testing, and then to production.

  2. Second, production applications are no longer permitted to share any physical infrastructure with non-production infrastructure. OrcaBank segments the scheduling and access of applications with Node Access Control.

Note

Node Access Control is a feature of MKE and provides secure multi-tenancy with node-based isolation. Nodes can be placed in different collections so that resources can be scheduled and isolated on disparate physical or virtual hardware resources.

Team access requirements

OrcaBank still has three application teams, payments, mobile, and db with varying levels of segmentation between them.

Their RBAC redesign is going to organize their MKE cluster into two top-level collections, staging and production, which are completely separate security zones on separate physical infrastructure.

OrcaBank’s four teams now have different needs in production and staging:

  • security should have view-only access to all applications in production (but not staging).

  • db should have full access to all database applications and resources in production (but not staging).

  • mobile should have full access to their Mobile applications in both production and staging and limited access to shared db services.

  • payments should have full access to their Payments applications in both production and staging and limited access to shared db services.

Role composition

OrcaBank has decided to replace their custom Ops role with the built-in Full Control role.

  • View Only (default role) allows users to see but not edit all cluster resources.

  • Full Control (default role) allows users complete control of all collections granted to them. They can also create containers without restriction but cannot see the containers of other users.

  • View & Use Networks + Secrets (custom role) enables users to view/connect to networks and view/use secrets used by db containers, but prevents them from seeing or impacting the db applications themselves.

Collection architecture

In the previous tutorial, OrcaBank created separate collections for each application team and nested them all under /Shared.

To meet their new security requirements for production, OrcaBank is redesigning collections in two ways:

  • Adding collections for both the production and staging zones, and nesting a set of application collections under each.

  • Segmenting nodes. Both the production and staging zones will have dedicated nodes, and in production, each application will run on a dedicated node.

The collection architecture now has the following tree representation:

/
├── System
├── Shared
├── prod
│   ├── mobile
│   ├── payments
│   └── db
│       ├── mobile
│       └── payments
│
└── staging
    ├── mobile
    └── payments
Grant composition

OrcaBank must now diversify their grants further to ensure the proper division of access.

The payments and mobile application teams will each have three grants: one for deploying to production, one for deploying to staging, and the same grant as before for access to the shared db networks and secrets.

OrcaBank access architecture

The resulting access architecture, designed with MKE, provides physical segmentation between production and staging using node access control.

Applications are scheduled only on MKE worker nodes in the dedicated application collection. And applications use shared resources across collection boundaries to access the databases in the /prod/db collection.

DB team

The OrcaBank db team is responsible for deploying and managing the full lifecycle of the databases that are in production. They can execute the full set of operations against all database resources.

Mobile team

The mobile team is responsible for deploying their full application stack in staging. In production they deploy their own applications but use the databases that are provided by the db team.

Upgrade an MKE installation

This topic describes how to upgrade to a later version of MKE.

Caution

You cannot deploy Kubernetes Ingress on a cluster after upgrading from MKE 3.2.6 to 3.3.0. Refer to the release notes for more information and a workaround. This issue does not affect fresh (non-upgraded) MKE 3.3.0 installations.

Review the mke-3-4-relnotes before upgrading for information that may be relevant to the upgrade process.

Plan to upgrade MCR to version 19.03.08 or later on every cluster node. Mirantis recommends that you schedule the upgrade for non-business hours to ensure minimal user impact.

Do not make changes to your MKE configuration while upgrading, as doing so can cause misconfigurations that are difficult to troubleshoot.

Verify your environment

Before you perform the environment verifications necessary to ensure a smooth upgrade, Mirantis recommends that you run upgrade checks:

docker container run --rm -it \
--name ucp \
-v /var/run/docker.sock:/var/run/docker.sock \
mirantis/ucp \
upgrade checks [command options]

This process confirms:

  • Port availability

  • Sufficient memory and disk space

  • Supported OS version is in use

  • Existing backup availability


To perform system verifications:

  1. Verify time synchronization across all nodes and assess time daemon logs for any large time drift (see the example following this list).

  2. Verify that the production system requirements of 4 vCPU and 16 GB RAM are met for MKE managers and MSR replicas.

  3. Verify that your port configurations meet all MKE, MSR, and MCR port requirements.

  4. Verify that your cluster nodes meet the minimum requirements.

  5. Verify that you meet all minimum hardware and software requirements.

Note

Azure installations have additional prerequisites. Refer to Install MKE on Azure for more information.
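
For the time synchronization check in the first item above, a minimal spot check, assuming SSH access to each node and systemd-based hosts (the node names are placeholders):

# Compare clocks and NTP synchronization status across nodes
for node in manager-0 worker-0 worker-1; do
  echo "== $node =="
  ssh "$node" 'date -u +%s; timedatectl | grep -i synchronized'
done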


To perform storage verifications:

  1. Verify that no more than 70% of /var/ storage is used (see the example following this list). If usage exceeds 70%, allocate additional storage to meet this requirement.

  2. Verify whether any node local file systems have disk storage issues, including MSR back-end storage, for example, NFS.

  3. Verify that you are using the Overlay2 storage driver, as it is more stable. If you are not, transition to Overlay2 at this time. Note that transitioning from device mapper to Overlay2 is a destructive rebuild.
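
For the first item above (the 70% usage check), you can verify how full /var is on each node with a standard df check, for example:

# Show usage of the filesystem that backs /var; keep it below 70%
df -h /var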


To perform operating system verifications:

  1. Patch all relevant packages to the most recent cluster node operating system version, including the kernel.

  2. Perform a rolling restart of each node to confirm that in-memory settings match the startup scripts.

  3. After performing the rolling restarts, run check-config.sh on each cluster node to check for kernel compatibility issues.
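
For example, assuming the node has Internet access, you can fetch check-config.sh from the Moby project and run it against the running kernel (the URL and usage shown here are an assumption; adjust for your environment):

# Download the kernel configuration check script (assumed location in the Moby repository)
curl -fsSL https://raw.githubusercontent.com/moby/moby/master/contrib/check-config.sh -o check-config.sh
chmod +x check-config.sh
# Report container-related kernel compatibility issues for the running kernel
./check-config.sh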


To perform procedural verifications:

  1. Perform Swarm, MKE, and MSR backups.

  2. Gather Compose, service, and stack files.

  3. Generate an MKE support dump for this specific point in time.

  4. Preinstall MKE, MSR, and MCR images. If your cluster does not have an Internet connection, Mirantis provides tarballs containing all the required container images. If your cluster does have an Internet connection, pull the required container images onto your nodes:

    $ docker run --rm -v /var/run/docker.sock:/var/run/docker.sock \
    mirantis/ucp:3.5.0 images \
    --list | xargs -L 1 docker pull
    
  5. Load troubleshooting packages, for example, netshoot.


To upgrade MCR:

The MKE upgrade requires MCR 19.03.08 or later to be running on every cluster node. If it is not, perform the following steps, first on manager nodes and then on worker nodes:

  1. Log in to the node using SSH.

  2. Upgrade MCR to version 19.03.08 or later.

  3. Using the MKE web UI, verify that the node is in a healthy state:

    1. Log in to the MKE web UI.

    2. Navigate to Shared Resources > Nodes.

    3. Verify that the node is healthy and a part of the cluster.
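
You can also confirm node health from the CLI, either on a manager node or through a client bundle, for example:

# Each node should report STATUS Ready and AVAILABILITY Active
docker node ls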

Caution

Mirantis recommends upgrading in the following order: MCR, MKE, MSR. This topic is limited to the upgrade instructions for MKE.


To perform cluster verifications:

  1. Verify that your cluster is in a healthy state, as it will be easier to troubleshoot should a problem occur.

  2. Create a backup of your cluster, thus allowing you to recover should something go wrong during the upgrade process.
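
A minimal backup sketch, assuming MKE 3.5.0 and that streaming the encrypted archive to a local file is acceptable; the passphrase is a placeholder, and flags can differ between versions, so refer to the MKE backup documentation for the full set of options:

# Create an encrypted MKE backup and write it to a local tar file
docker container run --rm -i --name ucp \
  -v /var/run/docker.sock:/var/run/docker.sock \
  mirantis/ucp:3.5.0 backup \
  --passphrase "secret-passphrase" > /tmp/mke-backup.tar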

Note

You cannot use the backup archive during the upgrade process, as it is version specific. For example, if you create a backup archive for an MKE 3.4.2 cluster, you cannot use the archive file after you upgrade to MKE 3.4.4.

Perform the upgrade

This topic describes three different methods of upgrading MKE.

Note

To upgrade MKE on machines that are not connected to the Internet, refer to Install MKE offline to learn how to download the MKE package for offline installation.

In all three methods, manager nodes are automatically upgraded in place. You cannot control the order of manager node upgrades. For each worker node that requires an upgrade, you can upgrade that node in place or you can replace the node with a new worker node. The type of upgrade you perform depends on what is needed for each node.

Consult the following table to determine which method is right for you:

Upgrade method

Description

Automated in-place cluster upgrade

Performed on any manager node. This method automatically upgrades the entire cluster.

Phased in-place cluster upgrade

Automatically upgrades manager nodes and allows you to control the upgrade order of worker nodes. This type of upgrade is more advanced than the automated in-place cluster upgrade.

Replace existing worker nodes using blue-green deployment

This type of upgrade allows you to stand up a new cluster in parallel to the current one and switch over when the upgrade is complete. It requires that you join new worker nodes, schedule workloads to run on them, pause, drain, and remove old worker nodes in batches (rather than one at a time), and shut down servers to remove worker nodes. This is the most advanced upgrade method.

Automated in-place cluster upgrade method:

This is the standard method of upgrading MKE. It updates all MKE components on all nodes within the MKE cluster one-by-one until the upgrade is complete, and is thus not ideal for those needing to upgrade their worker nodes in a particular order.

  1. Verify that all MCR instances have been upgraded to the corresponding new version.

  2. SSH into one MKE manager node and run the following command (do not run this command on a workstation with a client bundle):

    docker container run --rm -it \
    --name ucp \
    --volume /var/run/docker.sock:/var/run/docker.sock \
    mirantis/ucp:3.5.0 \
    upgrade \
    --interactive
    

    The upgrade command will print messages as it automatically upgrades MKE on all nodes in the cluster.

Phased in-place cluster upgrade

This method allows granular control of the MKE upgrade process by first upgrading a manager node and then allowing you to upgrade worker nodes manually in the order that you select. This allows you to migrate workloads and control traffic while upgrading. You can temporarily run MKE worker nodes with different versions of MKE and MCR.

This method allows you to handle failover by adding additional worker node capacity during an upgrade. You can add worker nodes to a partially-upgraded cluster, migrate workloads, and finish upgrading the remaining worker nodes.

  1. Verify that all MCR instances have been upgraded to the corresponding new version.

  2. SSH into one MKE manager node and run the following command (do not run this command on a workstation with a client bundle):

    docker container run --rm -it \
    --name ucp \
    --volume /var/run/docker.sock:/var/run/docker.sock \
    mirantis/ucp:3.5.0 \
    upgrade \
    --manual-worker-upgrade \
    --interactive
    

    The --manual-worker-upgrade flag allows MKE to upgrade only the manager nodes. It adds an upgrade-hold label to all worker nodes, which prevents MKE from upgrading each worker node until you remove the label.

  3. Optional. Join additional worker nodes to your cluster:

    docker swarm join --token SWMTKN-<swarm-token> <manager-ip>:2377
    

    For more information, refer to Join Linux nodes to your cluster.

    Note

    New worker nodes will already have the newer version of MCR and MKE installed when they join the cluster.

  4. Remove the upgrade-hold label from each worker node to upgrade:

    docker node update --label-rm com.docker.ucp.upgrade-hold \
    <node-name-or-id>
    
Replace existing worker nodes using blue-green deployment

This method creates a parallel environment for a new deployment, which reduces downtime, upgrades worker nodes without disrupting workloads, and allows you to migrate traffic to the new environment with worker node rollback capability.

Note

You do not have to replace all worker nodes in the cluster at one time, but can instead replace them in groups.

  1. Verify that all MCR instances have been upgraded to the corresponding new version.

  2. SSH into one MKE manager node and run the following command (do not run this command on a workstation with a client bundle):

    docker container run --rm -it \
    --name ucp \
    --volume /var/run/docker.sock:/var/run/docker.sock \
    mirantis/ucp:3.5.0 \
    upgrade \
    --manual-worker-upgrade \
    --interactive
    

    The --manual-worker-upgrade flag allows MKE to upgrade only the manager nodes. It adds an upgrade-hold label to all worker nodes, which prevents MKE from upgrading each worker node until the label is removed.

  3. Join additional worker nodes to your cluster:

    docker swarm join --token SWMTKN-<swarm-token> <manager-ip>:2377
    

    For more information, refer to Join Linux nodes to your cluster.

    Note

    New worker nodes will already have the newer version of MCR and MKE installed when they join the cluster.

  4. Join MCR to the cluster:

    docker swarm join --token SWMTKN-<your-token> <manager-ip>:2377
    
  5. Pause all existing worker nodes to ensure that MKE does not deploy new workloads on existing nodes:

    docker node update --availability pause <node-name>
    
  6. Drain the paused nodes in preparation for migrating your workloads:

    docker node update --availability drain <node-name>
    

    Note

    MKE automatically reschedules workloads onto new nodes while existing nodes are paused.

  7. On each fully-drained worker node, run the following command to remove the node from the swarm:

    docker swarm leave
    
  8. From a manager node, remove each old worker node after it becomes unresponsive:

    docker node rm <node-name>
    
  9. From any manager node, remove old MKE agents after the upgrade is complete, including s390x and Windows agents carried over from the previous install:

    docker service rm ucp-agent
    docker service rm ucp-agent-win
    docker service rm ucp-agent-s390x
    

Troubleshoot the upgrade process

This topic describes common problems and errors that occur during the upgrade process and how to identify and resolve them.


To check for multiple conflicting upgrades:

The upgrade command automatically checks for multiple ucp-worker-agents, the existence of which can indicate that the cluster is still undergoing a prior manual upgrade. You must resolve the conflicting node labels before proceeding with the upgrade.


To resolve upgrade failures:

You can resolve upgrade failures on worker nodes by changing the node labels back to the previous version, but this is not supported on manager nodes.


To check Kubernetes errors:

After the upgrade is complete, check the node state messages for Kubernetes errors to identify anything that may have gone wrong during the upgrade process.


To check for additional errors:

  1. Check for the following error in your MKE dashboard:

    Awaiting healthy status in Kubernetes node inventory
    Kubelet is unhealthy: Kubelet stopped posting node status
    
  2. Check for the following error in the ucp-controller container log:

    http: proxy error: dial tcp 10.14.101.141:12388: connect: no route to host
    

See also

Kubernetes

Deploy applications with Swarm

Deploy a single-service application

This topic describes how to use both the MKE web UI and the CLI to deploy an NGINX web server and make it accessible on port 8000.


To deploy a single-service application using the MKE web UI:

  1. Log in to the MKE web UI.

  2. Navigate to Swarm > Services and click Create a service.

  3. In the Service Name field, enter nginx.

  4. In the Image Name field, enter nginx:latest.

  5. Navigate to Network > Ports and click Publish Port.

  6. In the Target port field, enter 80.

  7. In the Protocol field, enter tcp.

  8. In the Publish mode field, enter Ingress.

  9. In the Published port field, enter 8000.

  10. Click Confirm to map the ports for the NGINX service.

  11. Specify the service image and ports.

  12. Click Create to deploy the service into the MKE cluster.


To view the default NGINX page through the MKE web UI:

  1. Navigate to Swarm > Services.

  2. Click nginx.

  3. Click Published Endpoints.

  4. Click the link to open a new tab with the default NGINX home page.


To deploy a single-service application using the CLI:

  1. Verify that you have downloaded and configured the client bundle.

  2. Deploy the single-service application:

    docker service create --name nginx \
    --publish mode=ingress,target=80,published=8000 \
    --label com.docker.ucp.access.owner=<your-username> \
    nginx
    
  3. View the default NGINX page by visiting http://<node-ip>:8000.

See also

NGINX

Deploy a multi-service app

Mirantis Kubernetes Engine (MKE) allows you to use the tools you already know, like docker stack deploy to deploy multi-service applications. You can also deploy your applications from the MKE web UI.

In this example we’ll deploy a multi-service application that allows users to vote on whether they prefer cats or dogs.

version: "3"
services:

  # A Redis key-value store to serve as message queue
  redis:
    image: redis:alpine
    ports:
      - "6379"
    networks:
      - frontend

  # A PostgreSQL database for persistent storage
  db:
    image: postgres:9.4
    volumes:
      - db-data:/var/lib/postgresql/data
    networks:
      - backend

  # Web UI for voting
  vote:
    image: dockersamples/examplevotingapp_vote:before
    ports:
      - 5000:80
    networks:
      - frontend
    depends_on:
      - redis

  # Web UI to count voting results
  result:
    image: dockersamples/examplevotingapp_result:before
    ports:
      - 5001:80
    networks:
      - backend
    depends_on:
      - db

  # Worker service to read from message queue
  worker:
    image: dockersamples/examplevotingapp_worker
    networks:
      - frontend
      - backend

networks:
  frontend:
  backend:

volumes:
  db-data:
From the web UI

To deploy your applications from the MKE web UI, on the left navigation bar expand Shared resources, choose Stacks, and click Create stack.

Choose the name you want for your stack, and choose Swarm services as the deployment mode.

When you choose this option, MKE deploys your app using the Docker swarm built-in orchestrator. If you choose ‘Basic containers’ as the deployment mode, MKE deploys your app using the classic Swarm orchestrator.

Then copy-paste the application definition in docker-compose.yml format.

Once you’re done, click Create to deploy the stack.

From the CLI

To deploy the application from the CLI, start by configuring your Docker CLI using an MKE client bundle.

Then, create a file named docker-stack.yml with the content of the yaml above, and run:

docker stack deploy --compose-file docker-stack.yml voting_app
Check your app

Once the multi-service application is deployed, it shows up in the MKE web UI. The ‘Stacks’ page shows that you’ve deployed the voting app.

You can also inspect the individual services of the app you deployed. For that, click the voting_app to open the details pane, open Inspect resources and choose Services, since this app was deployed with the built-in Docker Swarm orchestrator.

You can also use the Docker CLI to check the status of your app:

docker stack ps voting_app

Great! The app is deployed, so we can cast votes by accessing the service that’s listening on port 5000. You don’t need to remember which port a service publishes; you can click the voting_app_vote service and then click the Published endpoints link.

Limitations

When deploying applications from the web UI, you can’t reference any external files, regardless of whether you’re using the built-in Swarm orchestrator or classic Swarm. For that reason, the following keywords are not supported:

  • build

  • dockerfile

  • env_file

Also, MKE doesn’t store the stack definition you used to deploy the stack; use a version control system to retain it.

Deploy services to a Swarm collection

This topic describes how to use both the CLI and a Compose file to deploy application resources to a particular Swarm collection. Attach the Swarm collection path to the service access label to assign the service to the required collection. MKE automatically assigns new services to the default collection unless you use either of the methods presented here to assign a different Swarm collection.

Caution

To assign services to Swarm collections, an administrator must first create the Swarm collection and grant the user access to the required collection. Otherwise the deployment will fail.

Note

If required, you can place application resources into multiple collections.


To deploy a service to a Swarm collection using the CLI:

Use docker service create to deploy your service to a collection:

docker service create \
--name <service-name> \
--label com.docker.ucp.access.label="</collection/path>" \
<app-name>:<version>
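
For example, a hypothetical NGINX service assigned to the /Shared/wordpress collection (the collection must already exist and you must have a grant for it):

docker service create \
  --name nginx \
  --label com.docker.ucp.access.label="/Shared/wordpress" \
  nginx:latest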

To deploy a service to a Swarm collection using a Compose file:

  1. Use a labels: dictionary in a Compose file and add the Swarm collection path to the com.docker.ucp.access.label key.

    The following example specifies two services, WordPress and MySQL, and assigns /Shared/wordpress to their access labels:

    version: '3.1'
    
    services:
    
      wordpress:
        image: wordpress
        networks:
          - wp
        ports:
          - 8080:80
        environment:
          WORDPRESS_DB_PASSWORD: example
        deploy:
          labels:
            com.docker.ucp.access.label: /Shared/wordpress
      mysql:
        image: mysql:5.7
        networks:
          - wp
        environment:
          MYSQL_ROOT_PASSWORD: example
        deploy:
          labels:
            com.docker.ucp.access.label: /Shared/wordpress
    
    networks:
      wp:
        driver: overlay
        labels:
          com.docker.ucp.access.label: /Shared/wordpress
    
  2. Log in to the MKE web UI.

  3. Navigate to the Shared Resources > Stacks and click Create Stack.

  4. Name the application wordpress.

  5. Under ORCHESTRATOR MODE, select Swarm Services and click Next.

  6. In the Add Application File editor, paste the Compose file.

  7. Click Create to deploy the application.

  8. Click Done when the deployment completes.

Note

MKE reports an error if the /Shared/wordpress collection does not exist or if you do not have a grant for accessing it.


To confirm that the service deployed to the correct Swarm collection:

  1. Navigate to Shared Resources > Stacks and select your application.

  2. Navigate to the Services tab and select the required service.

  3. On the details pages, verify that the service is assigned to the correct Swarm collection.

Note

MKE creates a default overlay network for your stack that attaches to each container you deploy. This works well for administrators and those assigned full control roles. If you have lesser permissions, define a custom network with the same com.docker.ucp.access.label label as your services and attach this network to each service. This correctly groups your network with the other resources in your stack.

Use secrets in Swarm deployments

This topic describes how to create and use secrets with MKE by showing you how to deploy a WordPress application that uses a secret for storing a plaintext password. Other sensitive information you might use a secret to store includes TLS certificates and private keys. MKE allows you to securely store secrets and configure who can access and manage them using role-based access control (RBAC).

The application you will create in this topic includes the following two services:

  • wordpress

    Apache, PHP, and WordPress

  • wordpress-db

    MySQL database

The following example stores a password in a secret, and the secret is stored in a file inside the container that runs the services you will deploy. The services have access to the file, but no one else can see the plaintext password. To make things simple, you will not configure the database to persist data, and thus when the service stops, the data is lost.


To create a secret:

  1. Log in to the MKE web UI.

  2. Navigate to Swarm > Secrets and click Create.

    Note

    After you create the secret, you will not be able to edit or see the secret again.

  3. Name the secret wordpress-password-v1.

  4. In the Content field, assign a value to the secret.

  5. Optional. Define a permission label so that other users can be given permission to use this secret.

    Note

    To use services and secrets together, they must either have the same permission label or no label at all.


To create a network for your services:

  1. Navigate to Swarm > Networks and click Create.

  2. Create a network called wordpress-network with the default settings.


To create the MySQL service:

  1. Navigate to Swarm > Services and click Create.

  2. Under Service Details, name the service wordpress-db.

  3. Under Task Template, enter mysql:5.7.

  4. In the left-side menu, navigate to Network, click Attach Network +, and select wordpress-network from the drop-down.

  5. In the left-side menu, navigate to Environment, click Use Secret +, and select wordpress-password-v1 from the drop-down.

  6. Click Confirm to associate the secret with the service.

  7. Scroll down to Environment variables and click Add Environment Variable +.

  8. Enter the following string to create an environment variable that contains the path to the password file in the container:

    MYSQL_ROOT_PASSWORD_FILE=/run/secrets/wordpress-password-v1
    
  9. If you specified a permission label on the secret, you must set the same permission label on this service.

  10. Click Create to deploy the MySQL service.

This creates a MySQL service that is attached to the wordpress-network network and that uses the wordpress-password-v1 secret. By default, this creates a file with the same name in /run/secrets/<secret-name> inside the container running the service.

We also set the MYSQL_ROOT_PASSWORD_FILE environment variable to configure MySQL to use the content of the /run/secrets/wordpress-password-v1 file as the root password.
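
If you prefer the CLI, the following sketch creates an equivalent secret, network, and MySQL service through a client bundle; the password value is a placeholder:

# Create the secret from stdin (placeholder value)
printf 'my-db-password' | docker secret create wordpress-password-v1 -

# Create the overlay network for the services
docker network create --driver overlay wordpress-network

# Create the MySQL service with the secret mounted at /run/secrets/wordpress-password-v1
docker service create \
  --name wordpress-db \
  --network wordpress-network \
  --secret wordpress-password-v1 \
  --env MYSQL_ROOT_PASSWORD_FILE=/run/secrets/wordpress-password-v1 \
  mysql:5.7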


To create the WordPress service:

  1. Navigate to Swarm > Services and click Create.

  2. Under Service Details, name the service wordpress.

  3. Under Task Template, enter wordpress:latest.

  4. In the left-side menu, navigate to Network, click Attach Network +, and select wordpress-network from the drop-down.

  5. In the left-side menu, navigate to Environment, click Use Secret +, and select wordpress-password-v1 from the drop-down.

  6. Click Confirm to associate the secret with the service.

  7. Scroll down to Environment variables and click Add Environment Variable +.

  8. Enter the following string to create an environment variable that contains the path to the password file in the container:

    WORDPRESS_DB_PASSWORD_FILE=/run/secrets/wordpress-password-v1
    
  9. Add another environment variable and enter the following string:

    WORDPRESS_DB_HOST=wordpress-db:3306
    
  10. If you specified a permission label on the secret, you must set the same permission label on this service.

  11. Click Create to deploy the WordPress service.

This creates a WordPress service that is attached to the same network as the MySQL service so that they can communicate, and maps port 80 of the service to port 8000 on the cluster routing mesh.

Once you deploy this service, you will be able to access it on port 8000 using the IP address of any node in your MKE cluster.


To update a secret:

If the secret is compromised, you need to change it, update the services that use it, and delete the old secret.

  1. Create a new secret named wordpress-password-v2.

  2. From Swarm > Secrets, select the wordpress-password-v1 secret to view all the services that you need to update. In this example, it is straightforward, but that will not always be the case.

  3. Update wordpress-db to use the new secret.

  4. Update the MYSQL_ROOT_PASSWORD_FILE environment variable with either of the following methods:

    • Update the environment variable directly with the following:

      MYSQL_ROOT_PASSWORD_FILE=/run/secrets/wordpress-password-v2
      
    • Mount the secret file in /run/secrets/wordpress-password-v1 by setting the Target Name field with wordpress-password-v1. This mounts the file with the wordpress-password-v2 content in /run/secrets/wordpress-password-v1.

  5. Delete the wordpress-password-v1 secret and click Update.

  6. Repeat the foregoing steps for the WordPress service.
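
The equivalent rotation from the CLI is sketched below; it mounts the new secret under the old target file name so that the environment variable does not need to change:

# Swap the old secret for the new one, keeping the original target file name
docker service update \
  --secret-rm wordpress-password-v1 \
  --secret-add source=wordpress-password-v2,target=wordpress-password-v1 \
  wordpress-db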

Interlock

Layer 7 routing

MKE includes a system for application-layer (Layer 7) routing that offers both application routing and load balancing (ingress routing) for Swarm orchestration. The Interlock architecture leverages Swarm components to provide scalable Layer 7 routing and Layer 4 VIP mode functionality.

Swarm mode provides MCR with a routing mesh, which enables users to access services using the IP address of any node in the swarm. Layer 7 routing enables you to access services through any node in the swarm by using a domain name, with Interlock routing the traffic to the node with the relevant container.

Interlock uses the Docker remote API to automatically configure extensions such as NGINX and HAProxy for application traffic. Interlock is designed for:

  • Full integration with MCR, including Swarm services, secrets, and configs

  • Enhanced configuration, including context roots, TLS, zero downtime deployment, and rollback

  • Support through extensions for external load balancers, such as NGINX, HAProxy, and F5

  • Least privilege for extensions, such that they have no Docker API access

Note

Interlock and Layer 7 routing are used for Swarm deployments. Refer to NGINX Ingress Controller for information on routing traffic to your Kubernetes applications.

Terminology
Cluster

A group of compute resources running MKE

Swarm

An MKE cluster running in Swarm mode

Upstream

An upstream container that serves an application

Proxy service

A service, such as NGINX, that provides load balancing and proxying

Extension service

A secondary service that configures the proxy service

Service cluster

A combined Interlock extension and proxy service

gRPC

A high-performance RPC framework

Interlock services
Interlock

The central piece of the Layer 7 routing solution. The core service is responsible for interacting with the Docker remote API and building an upstream configuration for the extensions. Interlock uses the Docker API to monitor events and to manage the extension and proxy services, and it exposes a gRPC API that the extensions are configured to access.

Interlock manages extension and proxy service updates for both configuration changes and application service deployments. There is no operator intervention required.

The Interlock service starts a single replica on a manager node. The Interlock extension service runs a single replica on any available node, and the Interlock proxy service starts two replicas on any available node. Interlock prioritizes replica placement in the following order:

  • Replicas on the same worker node

  • Replicas on different worker nodes

  • Replicas on any available nodes, including managers

Interlock extension

A secondary service that queries the Interlock gRPC API for the upstream configuration. The extension service configures the proxy service according to the upstream configuration. For proxy services that use files such as NGINX or HAProxy, the extension service generates the file and sends it to Interlock using the gRPC API. Interlock then updates the corresponding Docker configuration object for the proxy service.

Interlock proxy

A proxy and load-balancing service that handles requests for the upstream application services. Interlock configures these using the data created by the corresponding extension service. By default, this service is a containerized NGINX deployment.

Features and benefits
High availability

All Layer 7 routing components are failure-tolerant and leverage Docker Swarm for high availability.

Automatic configuration

Interlock uses the Docker API for automatic configuration, without needing you to manually update or restart anything to make services available. MKE monitors your services and automatically reconfigures proxy services.

Scalability

Interlock uses a modular design with a separate proxy service, allowing an operator to individually customize and scale the proxy layer to handle user requests and meet service demands, with transparency and no downtime for users.

TLS

You can leverage Docker secrets to securely manage TLS certificates and keys for your services. Interlock supports both TLS termination and TCP passthrough.

Context-based routing

Interlock supports advanced application request routing by context or path.

Host mode networking

Layer 7 routing leverages the Docker Swarm routing mesh by default, but Interlock also supports running proxy and application services in host mode networking, allowing you to bypass the routing mesh completely, thus promoting maximum application performance.

Security

The Layer 7 routing components that are exposed to the outside world run on worker nodes, thus your cluster will not be affected if they are compromised.

SSL

Interlock leverages Docker secrets to securely store and use SSL certificates for services, supporting both SSL termination and TCP passthrough.

Blue-green and canary service deployment

Interlock supports blue-green service deployment, allowing an operator to deploy a new version of an application while the current version continues to serve traffic. Once traffic to the new version is verified, the operator can scale the older version to zero. If there is a problem, the operation is easy to reverse.

Service cluster support

Interlock supports multiple extension and proxy service combinations, allowing operators to partition load balancing resources, for example, for region- or organization-based load balancing.

Least privilege

Interlock supports being deployed where the load balancing proxies do not need to be colocated with a Swarm manager. This is a more secure approach to deployment as it ensures that the extension and proxy services do not have access to the Docker API.

Single Interlock deployment

When an application image is updated, the following actions occur:

  1. The service is updated with a new version of the application.

  2. The default “stop-first” policy stops the first replica before scheduling its replacement. The Interlock proxies remove the task IP from the back-end pool as the app.1 task is removed.

  3. The first application task is rescheduled with the new image after the first task stops.

  4. The interlock proxy.1 is then rescheduled with the new NGINX configuration that contains the update for the new app.1 task.

  5. After proxy.1 is complete, proxy.2 redeploys with the updated NGINX configuration for the app.1 task.

  6. In this scenario, the amount of time that the service is unavailable is less than 30 seconds.

Optimizing Interlock for applications
Application update order

Swarm provides control over the order in which old tasks are removed while new ones are created. This is controlled on the service-level with --update-order.

  • stop-first (default) - Stops the current task before the new task is scheduled.

  • start-first - Stops the current task only after the new task has been scheduled. This guarantees that the new task is running before the old task shuts down.

Use start-first if …

  • You have a single application replica and you cannot tolerate service interruption. Both the old and new tasks run simultaneously during the update, but this ensures that there is no gap in service during the update.

Use stop-first if …

  • Old and new tasks of your service cannot serve clients simultaneously.

  • You do not have enough cluster resources to run old and new replicas simultaneously.

In most cases, start-first is the best choice because it optimizes for high availability during updates.
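
For example, to switch an existing service to start-first updates (the service name is a placeholder):

docker service update --update-order start-first my-service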

Application update delay

Swarm services use update-delay to control the speed at which a service is updated. This adds a timed delay between application tasks as they are updated. The delay controls the time between when the first task of a service transitions to a healthy state and when the second task begins its update. The default is 0 seconds, which means that a replica task begins updating as soon as the previously updated task transitions into a healthy state.

Use update-delay if …

  • You are optimizing for the least number of dropped connections and a longer update cycle as an acceptable tradeoff.

  • Interlock update convergence takes a long time in your environment (this can occur when there is a large number of overlay networks).

Do not use update-delay if …

  • Service updates must occur rapidly.

  • Old and new tasks of your service cannot serve clients simultaneously.

Use application health checks

Swarm uses application health checks extensively to ensure that its updates do not cause service interruption. health-cmd can be configured in a Dockerfile or compose file to define a method for health checking an application. Without health checks, Swarm cannot determine when an application is truly ready to service traffic and will mark it as healthy as soon as the container process is running. This can potentially send traffic to an application before it is capable of serving clients, leading to dropped connections.
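
A minimal sketch of a service-level health check, assuming a hypothetical image that exposes an HTTP /health endpoint on port 8080:

docker service create \
  --name app \
  --health-cmd "curl -f http://localhost:8080/health || exit 1" \
  --health-interval 10s \
  --health-timeout 5s \
  --health-retries 3 \
  myorg/myapp:latest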

Application stop grace period

Use stop-grace-period to configure the maximum wait time before a task is force-killed (default: 10 seconds). In short, under the default setting a task can continue to run for no more than 10 seconds once its shutdown cycle has been initiated. Increasing this value benefits applications that require long periods to process requests, allowing connections to terminate normally.
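
For example, to give an existing service up to 30 seconds to finish in-flight requests while also spacing out task updates (the service name is a placeholder):

docker service update \
  --stop-grace-period 30s \
  --update-delay 10s \
  my-service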

Interlock optimizations
Use service clusters for Interlock segmentation

Interlock service clusters allow Interlock to be segmented into multiple logical instances called “service clusters”, which have independently managed proxies. Application traffic only uses the proxies for a specific service cluster, allowing the full segmentation of traffic. Each service cluster only connects to the networks using that specific service cluster, which reduces the number of overlay networks to which proxies connect. Because service clusters also deploy separate proxies, this also reduces the amount of churn in LB configs when there are service updates.

Minimizing number of overlay networks

Interlock proxy containers connect to the overlay network of every Swarm service. Having many networks connected to Interlock adds incremental delay when Interlock updates its load balancer configuration. Each network connected to Interlock generally adds 1-2 seconds of update delay. With many networks, the Interlock update delay causes the LB config to be out of date for too long, which can cause traffic to be dropped.

Minimizing the number of overlay networks that Interlock connects to can be accomplished in two ways:

  • Reduce the number of networks. If the architecture permits it, applications can be grouped together to use the same networks.

  • Use Interlock service clusters. By segmenting Interlock, service clusters also segment which networks are connected to Interlock, reducing the number of networks to which each proxy is connected.

  • Use admin-defined networks and limit the number of networks per service cluster.

Use Interlock VIP Mode

VIP mode can be used to reduce the impact of application updates on the Interlock proxies. It utilizes the Swarm L4 load balancing VIPs instead of individual task IPs to load balance traffic to a more stable internal endpoint. This prevents the proxy LB configs from changing for most kinds of app service updates, reducing churn for Interlock (see the example following the lists below). The following features are not supported in VIP mode:

  • Sticky sessions

  • Websockets

  • Canary deployments

The following features are supported in VIP mode:

  • Host & context routing

  • Context root rewrites

  • Interlock TLS termination

  • TLS passthrough

  • Service clusters
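
For example, the demo service used later in this guide could be deployed in VIP mode by adding the com.docker.lb.backend_mode label (the host name and port are examples):

docker service create \
  --name demo \
  --label com.docker.lb.hosts=demo.example.org \
  --label com.docker.lb.port=8080 \
  --label com.docker.lb.backend_mode=vip \
  ehazlett/docker-demo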

See also

NGINX

Configure
Configure layer 7 routing service

To further customize the layer 7 routing solution, you must update the ucp-interlock service with a new Docker configuration.

  1. Find out what configuration is currently being used for the ucp-interlock service and save it to a file:

    CURRENT_CONFIG_NAME=$(docker service inspect --format '{{ (index .Spec.TaskTemplate.ContainerSpec.Configs 0).ConfigName }}' ucp-interlock)
    docker config inspect --format '{{ printf "%s" .Spec.Data }}' $CURRENT_CONFIG_NAME > config.toml
    
  2. Make the necessary changes to the config.toml file.

  3. Create a new Docker configuration object from the config.toml file:

    NEW_CONFIG_NAME="com.docker.ucp.interlock.conf-$(( $(cut -d '-' -f 2 <<< "$CURRENT_CONFIG_NAME") + 1 ))"
    docker config create $NEW_CONFIG_NAME config.toml
    
  4. Update the ucp-interlock service to start using the new configuration:

    docker service update \
      --config-rm $CURRENT_CONFIG_NAME \
      --config-add source=$NEW_CONFIG_NAME,target=/config.toml \
      ucp-interlock
    

By default, the ucp-interlock service is configured to roll back to a previous stable configuration if you provide an invalid configuration.

If you want the service to pause instead of rolling back, you can update it with the following command:

docker service update \
  --update-failure-action pause \
  ucp-interlock

Note

When you enable the layer 7 routing solution from the MKE UI, the ucp-interlock service is started using the default configuration.

If you’ve customized the configuration used by the ucp-interlock service, you must update it again to use the Docker configuration object you’ve created.

TOML file configuration options

The following sections describe how to configure the primary Interlock services:

  • Core

  • Extension

  • Proxy

Core configuration

The core configuration handles the Interlock service itself. The following configuration options are available for the ucp-interlock service.

Option

Type

Description

ListenAddr

string

Address to serve the Interlock GRPC API. Defaults to 8080.

DockerURL

string

Path to the socket or TCP address to the Docker API. Defaults to unix:///var/run/docker.sock

TLSCACert

string

Path to the CA certificate for connecting securely to the Docker API.

TLSCert

string

Path to the certificate for connecting securely to the Docker API.

TLSKey

string

Path to the key for connecting securely to the Docker API.

AllowInsecure

bool

Skip TLS verification when connecting to the Docker API via TLS.

PollInterval

string

Interval to poll the Docker API for changes. Defaults to 3s.

EndpointOverride

string

Override the default GRPC API endpoint for extensions. The default is detected via Swarm.

Extensions

[]Extension

Array of extensions as listed below

Extension configuration

Interlock must contain at least one extension to service traffic. The following options are available to configure the extensions.

Option

Type

Description

Image

string

Name of the Docker image to use for the extension.

Args

[]string

Arguments to be passed to the extension service.

Labels

map[string]string

Labels to add to the extension service.

Networks

[]string

Allows the administrator to cherry-pick a list of networks that Interlock can connect to. If this option is not specified, the proxy-service can connect to all networks.

ContainerLabels

map[string]string

Labels to add to the extension service tasks.

Constraints

[]string

One or more constraints to use when scheduling the extension service.

PlacementPreferences

[]string

One or more placement preferences to use when scheduling the extension service.

ServiceName

string

Name of the extension service.

ProxyImage

string

Name of the Docker image to use for the proxy service.

ProxyArgs

[]string

Arguments to pass to the proxy service.

ProxyLabels

map[string]string

Labels to add to the proxy service.

ProxyContainerLabels

map[string]string

Labels to be added to the proxy service tasks.

ProxyServiceName

string

Name of the proxy service.

ProxyConfigPath

string

Path in the service for the generated proxy config.

ProxyReplicas

uint

Number of proxy service replicas.

ProxyStopSignal

string

Stop signal for the proxy service, for example SIGQUIT.

ProxyStopGracePeriod

string

Stop grace period for the proxy service in seconds, for example 5s.

ProxyConstraints

[]string

One or more constraints to use when scheduling the proxy service. Set the variable to false, as it is currently set to true by default.

ProxyPlacementPreferences

[]string

One or more placement preferences to use when scheduling the proxy service.

ProxyUpdateDelay

string

Delay between rolling proxy container updates.

ServiceCluster

string

Name of the cluster this extension services.

PublishMode

string (ingress or host)

Publish mode that the proxy service uses.

PublishedPort

int

Port on which the proxy service serves non-SSL traffic.

PublishedSSLPort

int

Port on which the proxy service serves SSL traffic.

Template

int

Docker configuration object that is used as the extension template.

Config

Config

Proxy configuration used by the extensions as described in this section.

HitlessServiceUpdate

bool

When set to true, services can be updated without restarting the proxy container.

ConfigImage

Config

Name for the config service (used by hitless service updates). For example, mirantis/ucp-interlock-config:3.2.1.

ConfigServiceName

Config

Name of the config service. This name is equivalent to ProxyServiceName. For example, ucp-interlock-config.

Proxy

Options are made available to the extensions, and the extensions utilize the options needed for proxy service configuration. This provides overrides to the extension configuration.

Because Interlock passes the extension configuration directly to the extension, each extension has different configuration options available. Refer to the documentation for each extension for supported options:

  • NGINX

Customize the default proxy service

The default proxy service used by MKE to provide layer 7 routing is NGINX. If users try to access a route that hasn’t been configured, they will see the default NGINX 404 page.

You can customize this by labeling a service with com.docker.lb.default_backend=true. In this case, if users try to access a route that’s not configured, they are redirected to this service.

As an example, create a docker-compose.yml file with:

version: "3.2"

services:
  demo:
    image: ehazlett/interlock-default-app
    deploy:
      replicas: 1
      labels:
        com.docker.lb.default_backend: "true"
        com.docker.lb.port: 80
    networks:
      - demo-network

networks:
  demo-network:
    driver: overlay

Set up your CLI client with a MKE client bundle, and deploy the service:

docker stack deploy --compose-file docker-compose.yml demo

If users try to access a route that’s not configured, they are directed to this demo service.

To minimize forwarding interruption to the updating service while updating a single replicated service, use com.docker.lb.backend_mode=vip.

Example Configuration

The following is an example configuration to use with the NGINX extension.

ListenAddr = ":8080"
DockerURL = "unix:///var/run/docker.sock"
PollInterval = "3s"

[Extensions.default]
  Image = "mirantis/interlock-extension-nginx:3.5.0"
  Args = ["-D"]
  ServiceName = "interlock-ext"
  ProxyImage = "mirantis/ucp-interlock-proxy:3.5.0"
  ProxyArgs = []
  ProxyServiceName = "interlock-proxy"
  ProxyConfigPath = "/etc/nginx/nginx.conf"
  ProxyStopGracePeriod = "3s"
  PublishMode = "ingress"
  PublishedPort = 80
  ProxyReplicas = 1
  TargetPort = 80
  PublishedSSLPort = 443
  TargetSSLPort = 443
  [Extensions.default.Config]
    User = "nginx"
    PidPath = "/var/run/proxy.pid"
    WorkerProcesses = 1
    RlimitNoFile = 65535
    MaxConnections = 2048

See also

NGINX

Configure host mode networking

By default, layer 7 routing components communicate with one another using overlay networks, but Interlock supports host mode networking in a variety of ways, including proxy only, Interlock only, application only, and hybrid.

When using host mode networking, you cannot use DNS service discovery, since that functionality requires overlay networking. For services to communicate, each service needs to know the IP address of the node where the other service is running.

To use host mode networking instead of overlay networking:

  1. Perform the configuration needed for a production-grade deployment.

  2. Update the ucp-interlock configuration.

  3. Deploy your Swarm services.

Configuration for a production-grade deployment

If you have not done so, configure the layer 7 routing solution for production. The ucp-interlock-proxy service replicas should then be running on their own dedicated nodes.

Update the ucp-interlock config

Update the ucp-interlock service configuration so that it uses host mode networking.

Update the PublishMode key to:

PublishMode = "host"

When updating the ucp-interlock service to use the new Docker configuration, make sure to update it so that it starts publishing its port on the host:

docker service update \
  --config-rm $CURRENT_CONFIG_NAME \
  --config-add source=$NEW_CONFIG_NAME,target=/config.toml \
  --publish-add mode=host,target=8080 \
  ucp-interlock

The ucp-interlock and ucp-interlock-extension services are now communicating using host mode networking.

Deploy your Swarm services

Now you can deploy your Swarm services. Set up your CLI client with an MKE client bundle, and deploy the service. The following example deploys a demo service that also uses host mode networking:

docker service create \
  --name demo \
  --detach=false \
  --label com.docker.lb.hosts=app.example.org \
  --label com.docker.lb.port=8080 \
  --publish mode=host,target=8080 \
  --env METADATA="demo" \
  ehazlett/docker-demo

In this example, Docker allocates a high random port on the host where the service can be reached.

To test that everything is working, run the following command:

curl --header "Host: app.example.org" \
  http://<proxy-address>:<routing-http-port>/ping

Where:

  • <proxy-address> is the domain name or IP address of a node where the proxy service is running.

  • <routing-http-port> is the port you’re using to route HTTP traffic.

If everything is working correctly, you should get a JSON result like:

{"instance":"63b855978452", "version":"0.1", "request_id":"d641430be9496937f2669ce6963b67d6"}

The following example describes how to configure an eight (8) node Swarm cluster that uses host mode networking to route traffic without using overlay networks. There are three (3) managers and five (5) workers. Two of the workers are configured with node labels to be dedicated ingress cluster load balancer nodes. These will receive all application traffic.

This example does not cover the actual deployment of infrastructure. It assumes you have a vanilla Swarm cluster (docker swarm init and docker swarm join from the nodes).

Note

When using host mode networking, you cannot use the DNS service discovery because that requires overlay networking. You can use other tooling, such as Registrator, to get that functionality if needed.

Configure the load balancer worker nodes (lb-00 and lb-01) with node labels in order to pin the Interlock proxy service. Once you are logged in to one of the Swarm managers, run the following to add node labels to the dedicated load balancer worker nodes:

$> docker node update --label-add nodetype=loadbalancer lb-00
lb-00
$> docker node update --label-add nodetype=loadbalancer lb-01
lb-01

Inspect each node to ensure the labels were successfully added:

$> docker node inspect -f '{{ .Spec.Labels  }}' lb-00
map[nodetype:loadbalancer]
$> docker node inspect -f '{{ .Spec.Labels  }}' lb-01
map[nodetype:loadbalancer]

Next, create a configuration object for Interlock that specifies host mode networking:

$> cat << EOF | docker config create service.interlock.conf -
ListenAddr = ":8080"
DockerURL = "unix:///var/run/docker.sock"
PollInterval = "3s"

[Extensions]
  [Extensions.default]
    Image = "mirantis/ucp-interlock-extension:3.5.0"
    Args = []
    ServiceName = "interlock-ext"
    ProxyImage = "mirantis/ucp-interlock-proxy:3.5.0"
    ProxyArgs = []
    ProxyServiceName = "interlock-proxy"
    ProxyConfigPath = "/etc/nginx/nginx.conf"
    ProxyReplicas = 1
    PublishMode = "host"
    PublishedPort = 80
    TargetPort = 80
    PublishedSSLPort = 443
    TargetSSLPort = 443
    [Extensions.default.Config]
      User = "nginx"
      PidPath = "/var/run/proxy.pid"
      WorkerProcesses = 1
      RlimitNoFile = 65535
      MaxConnections = 2048
EOF
oqkvv1asncf6p2axhx41vylgt

Note

Note the PublishMode = "host" setting, which instructs Interlock to configure the proxy service for host mode networking.

Now create the Interlock service also using host mode networking:

$> docker service create \
    --name interlock \
    --mount src=/var/run/docker.sock,dst=/var/run/docker.sock,type=bind \
    --constraint node.role==manager \
    --publish mode=host,target=8080 \
    --config src=service.interlock.conf,target=/config.toml \
    mirantis/ucp-interlock:3.5.0 -D run -c /config.toml
sjpgq7h621exno6svdnsvpv9z
Configure proxy services

With the node labels in place, you can reconfigure the Interlock proxy service to be constrained to the dedicated workers. From a manager, run the following to pin the proxy services to the load balancer worker nodes:

$> docker service update \
    --constraint-add node.labels.nodetype==loadbalancer \
    interlock-proxy

Now you can deploy the application:

$> docker service create \
    --name demo \
    --detach=false \
    --label com.docker.lb.hosts=demo.local \
    --label com.docker.lb.port=8080 \
    --publish mode=host,target=8080 \
    --env METADATA="demo" \
    ehazlett/docker-demo

This runs the service using host mode networking. Each task for the service has a high port (for example, 32768) and uses the node IP address to connect. You can see this when inspecting the headers from the request:

$> curl -vs -H "Host: demo.local" http://127.0.0.1/ping
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 80 (#0)
> GET /ping HTTP/1.1
> Host: demo.local
> User-Agent: curl/7.54.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Server: nginx/1.13.6
< Date: Fri, 10 Nov 2017 15:38:40 GMT
< Content-Type: text/plain; charset=utf-8
< Content-Length: 110
< Connection: keep-alive
< Set-Cookie: session=1510328320174129112; Path=/; Expires=Sat, 11 Nov 2017 15:38:40 GMT; Max-Age=86400
< x-request-id: e4180a8fc6ee15f8d46f11df67c24a7d
< x-proxy-id: d07b29c99f18
< x-server-info: interlock/2.0.0-preview (17476782) linux/amd64
< x-upstream-addr: 172.20.0.4:32768
< x-upstream-response-time: 1510328320.172
<
{"instance":"897d3c7b9e9c","version":"0.1","metadata":"demo","request_id":"e4180a8fc6ee15f8d46f11df67c24a7d"}
Configure NGINX

By default, NGINX is used as a proxy. The following configuration options are available for the NGINX extension.

Note

The ServerNamesHashBucketSize option, which allowed the user to manually set the bucket size for the server names hash table, was removed in MKE 3.4.2 because MKE now adaptively calculates the setting and overrides any manual input.

Option

Type

Description

Defaults

User

string

User name for the proxy

nginx

PidPath

string

Path to the PID file for the proxy service

/var/run/proxy.pid

MaxConnections

int

Maximum number of connections for the proxy service

1024

ConnectTimeout

int

Timeout in seconds for clients to connect

600

SendTimeout

int

Timeout in seconds for the service to read a response from the proxied upstream

600

ReadTimeout

int

Timeout in seconds for the service to read a response from the proxied upstream

600

SSLOpts

int

Options to be passed when configuring SSL

N/A

SSLDefaultDHParam

int

Size of DH parameters

1024

SSLDefaultDHParamPath

string

Path to DH parameters file

N/A

SSLVerify

string

SSL client verification

required

WorkerProcesses

string

Number of worker processes for the proxy service

1

RLimitNoFile

int

Maximum number of open files for the proxy service

65535

SSLCiphers

string

SSL ciphers to use for the proxy service

HIGH:!aNULL:!MD5

SSLProtocols

string

Enable the specified TLS protocols

TLSv1.2

HideInfoHeaders

bool

Hide proxy-related response headers

N/A

KeepaliveTimeout

string

Connection keep-alive timeout

75s

ClientMaxBodySize

string

Maximum allowed client request body size

1m

ClientBodyBufferSize

string

Buffer size for reading client request body

8k

ClientHeaderBufferSize

string

Buffer size for reading the client request header

1k

LargeClientHeaderBuffers

string

Maximum number and size of buffers used for reading large client request header

4 8k

ClientBodyTimeout

string

Timeout for reading client request body

60s

UnderscoresInHeaders

bool

Enables or disables the use of underscores in client request header fields

false

UpstreamZoneSize

int

Size of the shared memory zone (in KB)

64

GlobalOptions

[]string

List of options that are included in the global configuration

N/A

HTTPOptions

[]string

List of options that are included in the HTTP configuration

N/A

TCPOptions

[]string

List of options that are included in the stream (TCP) configuration

N/A

AccessLogPath

string

Path to use for access logs

/dev/stdout

ErrorLogPath

string

Path to use for error logs

/dev/stdout

MainLogFormat

string

Format to use for main logger

N/A

TraceLogFormat

string

Format to use for trace logger

N/A
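
These options are set per extension, under the Config section of the Interlock TOML configuration. The following is a minimal sketch that overrides a few of them for the default extension; the values shown are illustrative rather than recommendations:

[Extensions]
  [Extensions.default]
    [Extensions.default.Config]
      User = "nginx"
      MaxConnections = 2048
      ConnectTimeout = 600
      KeepaliveTimeout = "75s"
      ClientMaxBodySize = "16m"
      HideInfoHeaders = true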

See also

NGINX

Tune the proxy service

This topic describes how to tune various components of the proxy service.

  • Constrain the proxy service to one or more dedicated worker nodes by adding node labels and a placement constraint, as shown in the sketch after this list. For the full procedure, refer to Configure Layer 7 routing for production.
    
  • Adjust the stop signal and grace period, for example, to SIGTERM for the stop signal and ten seconds for the grace period:

    docker service update --stop-signal=SIGTERM \
    --stop-grace-period=10s interlock-proxy
    
  • Change the action that Swarm takes when an update fails using update-failure-action (the default is pause), for example, to rollback to the previous configuration:

    docker service update --update-failure-action=rollback \
    interlock-proxy
    
  • Change the amount of time between proxy updates using update-delay (the default is to use rolling updates), for example, setting the delay to thirty seconds:

    docker service update --update-delay=30s interlock-proxy
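
The following is a minimal sketch of the node labeling and constraint commands referenced in the first item of the preceding list; the node names lb-00 and lb-01 are illustrative, and the full procedure appears in Configure Layer 7 routing for production:

docker node update --label-add nodetype=loadbalancer lb-00
docker node update --label-add nodetype=loadbalancer lb-01
docker service update --replicas=2 \
--constraint-add node.labels.nodetype==loadbalancer \
interlock-proxy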
    
Update Interlock services

This topic describes how to update Interlock services by first updating the Interlock configuration to specify the new extension or proxy image versions and then updating the Interlock services to use the new configuration and image.

To update Interlock services:

  1. Create the new Interlock configuration:

    docker config create service.interlock.conf.v2 <path-to-new-config>
    
  2. Remove the old configuration and specify the new configuration:

    docker service update --config-rm \
    service.interlock.conf ucp-interlock
    docker service update --config-add \
    source=service.interlock.conf.v2,target=/config.toml \
    ucp-interlock
    
  3. Pull the new image, for example, the latest version of MKE:

    docker pull mirantis/ucp:latest
    

    Example output:

    latest: Pulling from mirantis/ucp
    cd784148e348: Already exists
    3871e7d70c20: Already exists
    cad04e4a4815: Pull complete
    Digest: sha256:63ca6d3a6c7e94aca60e604b98fccd1295bffd1f69f3d6210031b72fc2467444
    Status: Downloaded newer image for mirantis/ucp:latest
    docker.io/mirantis/ucp:latest
    
  4. List all of the latest MKE images:

    docker run --rm -v /var/run/docker.sock:/var/run/docker.sock \
    mirantis/ucp images --list
    

    Example output

    mirantis/ucp-agent:3.5.0
    mirantis/ucp-auth-store:3.5.0
    mirantis/ucp-auth:3.5.0
    mirantis/ucp-azure-ip-allocator:3.5.0
    mirantis/ucp-calico-cni:3.5.0
    mirantis/ucp-calico-kube-controllers:3.5.0
    mirantis/ucp-calico-node:3.5.0
    mirantis/ucp-cfssl:3.5.0
    mirantis/ucp-compose:3.5.0
    mirantis/ucp-controller:3.5.0
    mirantis/ucp-dsinfo:3.5.0
    mirantis/ucp-etcd:3.5.0
    mirantis/ucp-hyperkube:3.5.0
    mirantis/ucp-interlock-extension:3.5.0
    mirantis/ucp-interlock-proxy:3.5.0
    mirantis/ucp-interlock:3.5.0
    mirantis/ucp-kube-compose-api:3.5.0
    mirantis/ucp-kube-compose:3.5.0
    mirantis/ucp-kube-dns-dnsmasq-nanny:3.5.0
    mirantis/ucp-kube-dns-sidecar:3.5.0
    mirantis/ucp-kube-dns:3.5.0
    mirantis/ucp-metrics:3.5.0
    mirantis/ucp-pause:3.5.0
    mirantis/ucp-swarm:3.5.0
    mirantis/ucp:3.5.0
    
  5. Update the Interlock service to use the new image. Interlock starts, verifies the configuration object, which has the new extension version, and deploys a rolling update on all extensions:

    docker service update \
    --image mirantis/ucp-interlock:3.5.0 \
    ucp-interlock
    
Deploy
Deploy a Layer 7 routing solution

This topic describes how to route traffic to Swarm services by deploying a Layer 7 routing solution into a Swarm-orchestrated cluster.


Enabling Layer 7 routing causes the following to occur:

  1. MKE creates the ucp-interlock overlay network.

  2. MKE deploys the ucp-interlock service and attaches it both to the Docker socket and to the overlay network that was created. This allows the Interlock service to use the Docker API, which is why this service needs to run on a manager node.

  3. The ucp-interlock service starts the ucp-interlock-extension service and attaches it to the ucp-interlock network, allowing both services to communicate.

  4. The ucp-interlock-extension generates a configuration for the proxy service to use. By default the proxy service is NGINX, so this service generates a standard NGINX configuration. MKE creates the com.docker.ucp.interlock.conf-1 configuration file and uses it to configure all the internal components of this service.

  5. The ucp-interlock service takes the proxy configuration and uses it to start the ucp-interlock-proxy service.

Note

Layer 7 routing is disabled by default.


To enable Layer 7 routing using the MKE web UI:

  1. Log in to the MKE web UI as an administrator.

  2. Navigate to <user-name> > Admin Settings.

  3. Click Ingress.

  4. Toggle the Swarm HTTP ingress slider to the right.

  5. Optional. By default, the routing mesh service listens on port 8080 for HTTP and 8443 for HTTPS. Change these ports if you already have services using them.

The three primary Interlock services include the core service, the extensions, and the proxy. The following is the default MKE configuration, which is created automatically when you enable Interlock as described in this topic.

ListenAddr = ":8080"
DockerURL = "unix:///var/run/docker.sock"
AllowInsecure = false
PollInterval = "3s"

[Extensions]
  [Extensions.default]
    Image = "mirantis/ucp-interlock-extension:3.5.0"
    ServiceName = "ucp-interlock-extension"
    Args = []
    Constraints = ["node.labels.com.docker.ucp.orchestrator.swarm==true", "node.platform.os==linux"]
    ProxyImage = "mirantis/ucp-interlock-proxy:3.5.0"
    ProxyServiceName = "ucp-interlock-proxy"
    ProxyConfigPath = "/etc/nginx/nginx.conf"
    ProxyReplicas = 2
    ProxyStopSignal = "SIGQUIT"
    ProxyStopGracePeriod = "5s"
    ProxyConstraints = ["node.labels.com.docker.ucp.orchestrator.swarm==true", "node.platform.os==linux"]
    PublishMode = "ingress"
    PublishedPort = 8080
    TargetPort = 80
    PublishedSSLPort = 8443
    TargetSSLPort = 443
    [Extensions.default.Labels]
      "com.docker.ucp.InstanceID" = "fewho8k85kyc6iqypvvdh3ntm"
    [Extensions.default.ContainerLabels]
      "com.docker.ucp.InstanceID" = "fewho8k85kyc6iqypvvdh3ntm"
    [Extensions.default.ProxyLabels]
      "com.docker.ucp.InstanceID" = "fewho8k85kyc6iqypvvdh3ntm"
    [Extensions.default.ProxyContainerLabels]
      "com.docker.ucp.InstanceID" = "fewho8k85kyc6iqypvvdh3ntm"
    [Extensions.default.Config]
      Version = ""
      User = "nginx"
      PidPath = "/var/run/proxy.pid"
      MaxConnections = 1024
      ConnectTimeout = 600
      SendTimeout = 600
      ReadTimeout = 600
      IPHash = false
      AdminUser = ""
      AdminPass = ""
      SSLOpts = ""
      SSLDefaultDHParam = 1024
      SSLDefaultDHParamPath = ""
      SSLVerify = "required"
      WorkerProcesses = 1
      RLimitNoFile = 65535
      SSLCiphers = "HIGH:!aNULL:!MD5"
      SSLProtocols = "TLSv1.2"
      AccessLogPath = "/dev/stdout"
      ErrorLogPath = "/dev/stdout"
      MainLogFormat = "'$remote_addr - $remote_user [$time_local] \"$request\" '\n\t\t    '$status $body_bytes_sent \"$http_referer\" '\n\t\t    '\"$http_user_agent\" \"$http_x_forwarded_for\"';"
      TraceLogFormat = "'$remote_addr - $remote_user [$time_local] \"$request\" $status '\n\t\t    '$body_bytes_sent \"$http_referer\" \"$http_user_agent\" '\n\t\t    '\"$http_x_forwarded_for\" $request_id $msec $request_time '\n\t\t    '$upstream_connect_time $upstream_header_time $upstream_response_time';"
      KeepaliveTimeout = "75s"
      ClientMaxBodySize = "32m"
      ClientBodyBufferSize = "8k"
      ClientHeaderBufferSize = "1k"
      LargeClientHeaderBuffers = "4 8k"
      ClientBodyTimeout = "60s"
      UnderscoresInHeaders = false
      HideInfoHeaders = false

To enable Layer 7 routing from the command line:

Interlock uses a TOML file for the core service configuration. The following example uses Swarm deployment and recovery features by creating a Docker config object.

  1. Create a Docker config object:

    cat << EOF | docker config create service.interlock.conf -
    ListenAddr = ":8080"
    DockerURL = "unix:///var/run/docker.sock"
    PollInterval = "3s"
    
    [Extensions]
      [Extensions.default]
        Image = "mirantis/ucp-interlock-extension:3.5.0"
        Args = ["-D"]
        ProxyImage = "mirantis/ucp-interlock-proxy:3.5.0"
        ProxyArgs = []
        ProxyConfigPath = "/etc/nginx/nginx.conf"
        ProxyReplicas = 1
        ProxyStopGracePeriod = "3s"
        ServiceCluster = ""
        PublishMode = "ingress"
        PublishedPort = 8080
        TargetPort = 80
        PublishedSSLPort = 8443
        TargetSSLPort = 443
        [Extensions.default.Config]
          User = "nginx"
          PidPath = "/var/run/proxy.pid"
          WorkerProcesses = 1
          RlimitNoFile = 65535
          MaxConnections = 2048
    EOF
    oqkvv1asncf6p2axhx41vylgt
    
  2. Create a dedicated network for Interlock and the extensions:

    docker network create --driver overlay ucp-interlock
    
  3. Create the Interlock service:

    docker service create \
    --name ucp-interlock \
    --mount src=/var/run/docker.sock,dst=/var/run/docker.sock,type=bind \
    --network ucp-interlock \
    --constraint node.role==manager \
    --config src=service.interlock.conf,target=/config.toml \
    mirantis/ucp-interlock:3.5.0 -D run -c /config.toml
    

    Note

    The Interlock core service must have access to a Swarm manager (--constraint node.role==manager); however, it is recommended that the extension and proxy services run on worker nodes.

  4. Verify that the three services are created, one for the Interlock service, one for the extension service, and one for the proxy service:

    docker service ls
    ID                  NAME                     MODE                REPLICAS            IMAGE                                                                PORTS
    sjpgq7h621ex        ucp-interlock            replicated          1/1                 mirantis/ucp-interlock:3.5.0
    oxjvqc6gxf91        ucp-interlock-extension  replicated          1/1                 mirantis/ucp-interlock-extension:3.5.0
    lheajcskcbby        ucp-interlock-proxy      replicated          1/1                 mirantis/ucp-interlock-proxy:3.5.0        *:80->80/tcp *:443->443/tcp
    
Configure Layer 7 routing for production

This topic describes how to configure Interlock for a production environment and builds upon the instructions in the previous topic, Deploy a Layer 7 routing solution. It does not describe infrastructure deployment, and it assumes you are using a typical Swarm cluster, created using docker swarm init and docker swarm join from the nodes.

The Layer 7 solution that ships with MKE is highly available, fault tolerant, and designed to work independently of how many nodes you manage with MKE.

The following procedures require that you dedicate two worker nodes for running the ucp-interlock-proxy service. This tuning ensures the following:

  • The proxy services have dedicated resources to handle user requests. You can configure these nodes with higher performance network interfaces.

  • No application traffic can be routed to a manager node, thus making your deployment more secure.

  • If one of the two dedicated nodes fails, Layer 7 routing continues working.


To dedicate two nodes to running the proxy service:

  1. Select two nodes that you will dedicate to running the proxy service.

  2. Log in to one of the Swarm manager nodes.

  3. Add labels to the two dedicated proxy service nodes, configuring them as load balancer worker nodes, for example, lb-00 and lb-01:

    docker node update --label-add nodetype=loadbalancer lb-00
    lb-00
    docker node update --label-add nodetype=loadbalancer lb-01
    lb-01
    
  4. Verify that the labels were added successfully:

    docker node inspect -f '{{ .Spec.Labels  }}' lb-00
    map[nodetype:loadbalancer]
    docker node inspect -f '{{ .Spec.Labels  }}' lb-01
    map[nodetype:loadbalancer]
    

To update the proxy service:

You must update the ucp-interlock-proxy service configuration to deploy the proxy service properly constrained to the dedicated worker nodes.

  1. From a manager node, add a constraint to the ucp-interlock-proxy service to update the running service:

    docker service update --replicas=2 \
    --constraint-add node.labels.nodetype==loadbalancer \
    --stop-signal SIGQUIT \
    --stop-grace-period=5s \
    $(docker service ls -f 'label=type=com.docker.interlock.core.proxy' -q)
    

    This updates the proxy service to have two replicas, ensures that they are constrained to the workers with the label nodetype==loadbalancer, and configures the stop signal for the tasks to be a SIGQUIT with a grace period of five seconds. This ensures that NGINX does not exit before the client request is finished.

  2. Inspect the service to verify that the replicas have started on the selected nodes:

    docker service ps $(docker service ls -f \
    'label=type=com.docker.interlock.core.proxy' -q)
    

    Example of system response:

    ID            NAME                    IMAGE          NODE     DESIRED STATE   CURRENT STATE                   ERROR   PORTS
    o21esdruwu30  interlock-proxy.1       nginx:alpine   lb-01    Running         Preparing 3 seconds ago
    n8yed2gp36o6   \_ interlock-proxy.1   nginx:alpine   mgr-01   Shutdown        Shutdown less than a second ago
    aubpjc4cnw79  interlock-proxy.2       nginx:alpine   lb-00    Running         Preparing 3 seconds ago
    
  3. Add the constraint to the ProxyConstraints array in the interlock-proxy service configuration in case Interlock is restored from backup:

    [Extensions]
      [Extensions.default]
        ProxyConstraints = ["node.labels.com.docker.ucp.orchestrator.swarm==true", "node.platform.os==linux", "node.labels.nodetype==loadbalancer"]
    
  4. Optional. By default, the config service is global, scheduling one task on every node in the cluster. To modify constraint scheduling, update the ProxyConstraints variable in the Interlock configuration file. Refer to Configure layer 7 routing service for more information.

  5. Verify that the proxy service is running on the dedicated nodes:

    docker service ps ucp-interlock-proxy
    
  6. Update the settings in the upstream load balancer, such as ELB or F5, with the addresses of the dedicated ingress workers, thus directing all traffic to these two worker nodes.

See also

NGINX

Offline installation considerations

To install Interlock on your cluster without an Internet connection, you must have the required Docker images loaded on your computer. This topic describes how to export the required images from a local instance of MCR and then load them to your Swarm-orchestrated cluster.

To export Docker images from a local instance:

  1. Using a local instance of MCR, save the required images:

    docker save mirantis/ucp-interlock:3.5.0 > interlock.tar
    docker save mirantis/ucp-interlock-extension:3.5.0 > interlock-extension-nginx.tar
    docker save mirantis/ucp-interlock-proxy:3.5.0 > interlock-proxy-nginx.tar
    

    This saves the following three files:

    • interlock.tar - the core Interlock application.

    • interlock-extension-nginx.tar - the Interlock extension for NGINX.

    • interlock-proxy-nginx.tar - the NGINX-based Interlock proxy image.

    Note

    Replace mirantis/ucp-interlock-extension:3.5.0 and mirantis/ucp-interlock-proxy:3.5.0 with the corresponding extension and proxy image if you are not using NGINX.

  2. Copy the three files you just saved to each node in the cluster and load each image:

    docker load < interlock.tar
    docker load < interlock-extension-nginx.tar
    docker load < interlock-proxy-nginx.tar
    

Refer to Deploy a Layer 7 routing solution to continue the installation.

See also

NGINX

Routing traffic to services
Route traffic to a Swarm service

After Interlock is deployed, you can launch and publish services and applications. Use Service Labels to configure services to publish themselves to the load balancer.

The following examples assume a DNS entry (or local hosts entry if you are testing locally) exists for each of the applications.

Publish a service with four replicas

Create a Docker Service using two labels:

  • com.docker.lb.hosts

  • com.docker.lb.port

The com.docker.lb.hosts label tells Interlock the hosts at which the service should be available. The com.docker.lb.port label specifies the port that the proxy service should use to access the upstreams.

Publish a demo service to the host demo.local:

First, create an overlay network so that service traffic is isolated and secure:

$> docker network create -d overlay demo
1se1glh749q1i4pw0kf26mfx5

Next, deploy the application:

$> docker service create \
    --name demo \
    --network demo \
    --label com.docker.lb.hosts=demo.local \
    --label com.docker.lb.port=8080 \
    ehazlett/docker-demo
6r0wiglf5f3bdpcy6zesh1pzx

Interlock detects when the service is available and publishes it. After tasks are running and the proxy service is updated, the application is available via http://demo.local.

$> curl -s -H "Host: demo.local" http://127.0.0.1/ping
{"instance":"c2f1afe673d4","version":"0.1",request_id":"7bcec438af14f8875ffc3deab9215bc5"}

To increase service capacity, use the Docker Service Scale command:

$> docker service scale demo=4
demo scaled to 4

In this example, four service replicas are configured as upstreams. The load balancer balances traffic across all service replicas.
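
To observe the load balancing, you can repeat the request and compare the instance field in each response. This is a quick check that assumes the proxy is reachable on 127.0.0.1, as in the preceding example:

$> for i in 1 2 3 4; do curl -s -H "Host: demo.local" http://127.0.0.1/ping; echo; done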

Publish a service with a web interface

This example deploys a simple service that:

  • Has a JSON endpoint that returns the ID of the task serving the request.

  • Has a web interface that shows how many tasks the service is running.

  • Can be reached at http://app.example.org.

Create a docker-compose.yml file with:

version: "3.2"

services:
  demo:
    image: ehazlett/docker-demo
    deploy:
      replicas: 1
      labels:
        com.docker.lb.hosts: app.example.org
        com.docker.lb.network: demo_demo-network
        com.docker.lb.port: 8080
    networks:
      - demo-network

networks:
  demo-network:
    driver: overlay

Note that:

  • If you reference a pre-existing network in the Compose file, you must declare it as external by including external: true in the docker-compose.yml file.

  • The com.docker.lb.hosts label defines the hostname for the service. When the layer 7 routing solution gets a request containing app.example.org in the host header, that request is forwarded to the demo service.

  • The com.docker.lb.network label defines which network the ucp-interlock-proxy service should attach to in order to communicate with the demo service. To use Layer 7 routing, your services must be attached to at least one network. If your service is attached to only a single network, you do not need to add this label to specify which network to use for routing. When you use a common stack file for multiple deployments that leverage MKE Interlock Layer 7 routing, prefix com.docker.lb.network with the stack name to ensure that traffic is directed to the correct overlay network. When used in combination with com.docker.lb.ssl_passthrough, the label is mandatory, even if your service is attached to only a single network.

  • The com.docker.lb.port label specifies which port the ucp-interlock-proxy service should use to communicate with this demo service.

  • Your service doesn’t need to expose a port in the Swarm routing mesh. All communications are done using the network you’ve specified.

Set up your CLI client with an MKE client bundle and deploy the service:

docker stack deploy --compose-file docker-compose.yml demo

The ucp-interlock service detects that your service is using these labels and automatically reconfigures the ucp-interlock-proxy service.

Test using the CLI

To test that requests are routed to the demo service, run:

curl --header "Host: app.example.org" \
  http://<mke-address>:<routing-http-port>/ping

Where:

  • <mke-address> is the domain name or IP address of an MKE node.

  • <routing-http-port> is the port you are using to route HTTP traffic.

If everything is working correctly, you should get a JSON result like:

{"instance":"63b855978452", "version":"0.1", "request_id":"d641430be9496937f2669ce6963b67d6"}
Test using a browser

Since the demo service exposes an HTTP endpoint, you can also use your browser to validate that everything is working.

Make sure the /etc/hosts file in your system has an entry mapping app.example.org to the IP address of an MKE node. Once you do that, you'll be able to start using the service from your browser.
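
For example, on a Linux or macOS client you can append the mapping with a single command; <mke-node-ip> is a placeholder that you must replace, and the entry is intended for local testing only:

echo "<mke-node-ip> app.example.org" | sudo tee -a /etc/hosts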

Publish a service as a canary instance

The following example publishes a service as a canary instance.

First, create an overlay network to isolate and secure service traffic:

$> docker network create -d overlay demo
1se1glh749q1i4pw0kf26mfx5

Next, create the initial service:

$> docker service create \
    --name demo-v1 \
    --network demo \
    --detach=false \
    --replicas=4 \
    --label com.docker.lb.hosts=demo.local \
    --label com.docker.lb.port=8080 \
    --env METADATA="demo-version-1" \
    ehazlett/docker-demo

Interlock detects when the service is available and publishes it. After tasks are running and the proxy service is updated, the application is available via http://demo.local:

$> curl -vs -H "Host: demo.local" http://127.0.0.1/ping
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to demo.local (127.0.0.1) port 80 (#0)
> GET /ping HTTP/1.1
> Host: demo.local
> User-Agent: curl/7.54.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Server: nginx/1.13.6
< Date: Wed, 08 Nov 2017 20:28:26 GMT
< Content-Type: text/plain; charset=utf-8
< Content-Length: 120
< Connection: keep-alive
< Set-Cookie: session=1510172906715624280; Path=/; Expires=Thu, 09 Nov 2017 20:28:26 GMT; Max-Age=86400
< x-request-id: f884cf37e8331612b8e7630ad0ee4e0d
< x-proxy-id: 5ad7c31f9f00
< x-server-info: interlock/2.0.0-development (147ff2b1) linux/amd64
< x-upstream-addr: 10.0.2.4:8080
< x-upstream-response-time: 1510172906.714
<
{"instance":"df20f55fc943","version":"0.1","metadata":"demo-version-1","request_id":"f884cf37e8331612b8e7630ad0ee4e0d"}

Notice metadata is specified with demo-version-1.

Deploy an updated service as a canary instance

The following example deploys an updated service as a canary instance:

$> docker service create \
    --name demo-v2 \
    --network demo \
    --detach=false \
    --label com.docker.lb.hosts=demo.local \
    --label com.docker.lb.port=8080 \
    --env METADATA="demo-version-2" \
    --env VERSION="0.2" \
    ehazlett/docker-demo

Since this has a replica of one (1), and the initial version has four (4) replicas, 20% of application traffic is sent to demo-version-2:

$> curl -vs -H "Host: demo.local" http://127.0.0.1/ping
{"instance":"23d9a5ec47ef","version":"0.1","metadata":"demo-version-1","request_id":"060c609a3ab4b7d9462233488826791c"}
$> curl -vs -H "Host: demo.local" http://127.0.0.1/ping
{"instance":"f42f7f0a30f9","version":"0.1","metadata":"demo-version-1","request_id":"c848e978e10d4785ac8584347952b963"}
$> curl -vs -H "Host: demo.local" http://127.0.0.1/ping
{"instance":"c2a686ae5694","version":"0.1","metadata":"demo-version-1","request_id":"724c21d0fb9d7e265821b3c95ed08b61"}
$> curl -vs -H "Host: demo.local" http://127.0.0.1/ping
{"instance":"1b0d55ed3d2f","version":"0.2","metadata":"demo-version-2","request_id":"b86ff1476842e801bf20a1b5f96cf94e"}
$> curl -vs -H "Host: demo.local" http://127.0.0.1/ping
{"instance":"c2a686ae5694","version":"0.1","metadata":"demo-version-1","request_id":"724c21d0fb9d7e265821b3c95ed08b61"}

To increase traffic to the new version, add more replicas with docker service scale:

$> docker service scale demo-v2=4
demo-v2

To complete the upgrade, scale the demo-v1 service to zero (0):

$> docker service scale demo-v1=0
demo-v1

This routes all application traffic to the new version. If you need to roll back, simply scale the v1 service back up and v2 down.
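
For example, the following restores the original version and removes the canary from rotation, assuming demo-v1 previously ran four replicas:

$> docker service scale demo-v1=4 demo-v2=0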

Use context or path-based routing

The following example publishes a service using context or path based routing.

First, create an overlay network so that service traffic is isolated and secure:

$> docker network create -d overlay demo
1se1glh749q1i4pw0kf26mfx5

Next, create the initial service:

$> docker service create \
    --name demo \
    --network demo \
    --detach=false \
    --label com.docker.lb.hosts=demo.local \
    --label com.docker.lb.port=8080 \
    --label com.docker.lb.context_root=/app \
    --label com.docker.lb.context_root_rewrite=true \
    --env METADATA="demo-context-root" \
    ehazlett/docker-demo

Only one path per host

Interlock only supports one path per host per service cluster. When a specific com.docker.lb.hosts label is applied, it cannot be applied again in the same service cluster.

Interlock detects when the service is available and publishes it. After the tasks are running and the proxy service is updated, the application is available via http://demo.local:

$> curl -vs -H "Host: demo.local" http://127.0.0.1/app/
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 80 (#0)
> GET /app/ HTTP/1.1
> Host: demo.local
> User-Agent: curl/7.54.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Server: nginx/1.13.6
< Date: Fri, 17 Nov 2017 14:25:17 GMT
< Content-Type: text/html; charset=utf-8
< Transfer-Encoding: chunked
< Connection: keep-alive
< x-request-id: 077d18b67831519defca158e6f009f82
< x-proxy-id: 77c0c37d2c46
< x-server-info: interlock/2.0.0-dev (732c77e7) linux/amd64
< x-upstream-addr: 10.0.1.3:8080
< x-upstream-response-time: 1510928717.306
Specify a routing mode

You can publish services using “vip” and “task” backend routing modes.

Task routing mode

Task routing is the default Interlock behavior and the default backend mode if one is not specified. In task routing mode, Interlock uses backend task IPs to route traffic from the proxy to each container. Traffic to the frontend route is L7 load balanced directly to service tasks. This allows for per-container routing functionality such as sticky sessions. Task routing mode applies L7 routing and then sends packets directly to a container.

VIP routing mode

VIP mode is an alternative mode of routing in which Interlock uses the Swarm service VIP as the backend IP instead of container IPs. Traffic to the frontend route is L7 load balanced to the Swarm service VIP, which L4 load balances to backend tasks. VIP mode can be useful to reduce the amount of churn in Interlock proxy service configuration, which can be an advantage in highly dynamic environments.

VIP mode optimizes for fewer proxy updates in a tradeoff for a reduced feature set. Most application updates do not require configuring backends in VIP mode.

In VIP routing mode, Interlock uses the service VIP (a persistent endpoint that exists from service creation to service deletion) as the proxy backend. VIP routing mode was introduced in UCP 3.0.3 and UCP 3.1.2. VIP routing mode applies L7 routing and then sends packets to the Swarm L4 load balancer, which routes traffic to service containers.

While VIP mode provides endpoint stability in the face of application churn, it cannot support sticky sessions because sticky sessions depend on routing directly to container IPs. Sticky sessions are therefore not supported in VIP mode.

Because VIP mode routes by service IP rather than by task IP it also affects the behavior of canary deployments. In task mode a canary service with one task next to an existing service with four tasks represents one out of five total tasks, so the canary will receive 20% of incoming requests. By contrast the same canary service in VIP mode will receive 50% of incoming requests, because it represents one out of two total services.

Usage

You can set the backend mode on a per-service basis, which means that some applications can be deployed in task mode, while others are deployed in VIP mode.

The default backend mode is task. If a label is set to task or a label does not exist, then Interlock uses the task routing mode.

To use Interlock VIP mode, the following label must be applied:

com.docker.lb.backend_mode=vip
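
For example, a minimal way to switch an already-running service to VIP mode is to add the label with docker service update; the service name demo is an assumption carried over from the earlier examples:

docker service update \
    --label-add com.docker.lb.backend_mode=vip \
    demo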

In VIP mode, the following non-exhaustive list of application events does not require proxy reconfiguration:

  • Service replica increase/decrease

  • New image deployment

  • Config or secret updates

  • Add/Remove labels

  • Add/Remove environment variables

  • Rescheduling a failed application task

The following two updates still require a proxy reconfiguration (because these actions create or destroy a service VIP):

  • Add/Remove a network on a service

  • Deployment/Deletion of a service

Publish a default host service

The following example publishes a service to be a default host. The service responds whenever there is a request to a host that is not configured.

First, create an overlay network so that service traffic is isolated and secure:

$> docker network create -d overlay demo
1se1glh749q1i4pw0kf26mfx5

Next, create the initial service:

$> docker service create \
    --name demo-default \
    --network demo \
    --detach=false \
    --replicas=1 \
    --label com.docker.lb.default_backend=true \
    --label com.docker.lb.port=8080 \
    ehazlett/interlock-default-app

Interlock detects when the service is available and publishes it. After tasks are running and the proxy service is updated, the application responds to requests for any hostname that is not otherwise configured.
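
A quick way to check the default backend is to request a hostname that no service is configured to serve; the hostname below, and the assumption that the proxy is reachable locally on port 80, are illustrative only:

$> curl -vs -H "Host: unconfigured.example.com" http://127.0.0.1/

The demo-default service answers the request because no other service claims that hostname.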

Publish a service using “vip” backend mode
  1. Create an overlay network so that service traffic is isolated and secure:

    $> docker network create -d overlay demo
    1se1glh749q1i4pw0kf26mfx5
    
  2. Create the initial service:

    $> docker service create \
       --name demo \
       --network demo \
       --detach=false \
       --replicas=4 \
       --label com.docker.lb.hosts=demo.local \
       --label com.docker.lb.port=8080 \
       --label com.docker.lb.backend_mode=vip \
       --env METADATA="demo-vip-1" \
       ehazlett/docker-demo
    

Interlock detects when the service is available and publishes it. After tasks are running and the proxy service is updated, the application should be available via http://demo.local:

$> curl -vs -H "Host: demo.local" http://127.0.0.1/ping
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to demo.local (127.0.0.1) port 80 (#0)
> GET /ping HTTP/1.1
> Host: demo.local
> User-Agent: curl/7.54.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Server: nginx/1.13.6
< Date: Wed, 08 Nov 2017 20:28:26 GMT
< Content-Type: text/plain; charset=utf-8
< Content-Length: 120
< Connection: keep-alive
< Set-Cookie: session=1510172906715624280; Path=/; Expires=Thu, 09 Nov 2017 20:28:26 GMT; Max-Age=86400
< x-request-id: f884cf37e8331612b8e7630ad0ee4e0d
< x-proxy-id: 5ad7c31f9f00
< x-server-info: interlock/2.0.0-development (147ff2b1) linux/amd64
< x-upstream-addr: 10.0.2.9:8080
< x-upstream-response-time: 1510172906.714
<
{"instance":"df20f55fc943","version":"0.1","metadata":"demo","request_id":"f884cf37e8331612b8e7630ad0ee4e0d"}

With VIP mode configured, Interlock uses the virtual IP of the service for load balancing rather than each task IP. Inspecting the service, for example with docker service inspect demo, shows the VIPs:

"Endpoint": {
    "Spec": {
                "Mode": "vip"

    },
    "VirtualIPs": [
        {
                "NetworkID": "jed11c1x685a1r8acirk2ylol",
                "Addr": "10.0.2.9/24"
        }
    ]
}

In this case, Interlock configures a single upstream for the host using the IP “10.0.2.9”. Interlock skips further proxy updates as long as there is at least 1 replica for the service because the only upstream is the VIP.

Swarm routes requests for the VIP in a round robin fashion at L4. This means that the following Interlock features are incompatible with VIP mode:

  • Sticky sessions

Use routing labels

After you enable the layer 7 routing solution, you can start using it in your swarm services.

Service labels define hostnames that are routed to the service, the applicable ports, and other routing configurations. Applications that publish using Interlock use service labels to configure how they are published.

When you deploy or update a swarm service with service labels, the following actions occur:

  1. The ucp-interlock service monitors the Docker API for events and publishes the events to the ucp-interlock-extension service.

  2. That service then generates a new configuration for the proxy service, based on the labels you added to your services.

  3. The ucp-interlock service takes the new configuration and reconfigures the ucp-interlock-proxy to start using the new configuration.

The previous steps occur in milliseconds and with rolling updates. Even though services are being reconfigured, users won’t notice it.

Service label options

  • com.docker.lb.hosts
    Comma-separated list of the hosts that the service should serve.
    Example: example.com,test.com

  • com.docker.lb.port
    Port to use for internal upstream communication.
    Example: 8080

  • com.docker.lb.network
    Name of the network the proxy service should attach to for upstream connectivity.
    Example: app-network-a

  • com.docker.lb.context_root
    Context or path to use for the application.
    Example: /app

  • com.docker.lb.context_root_rewrite
    When set to true, this option changes the path from the value of the com.docker.lb.context_root label to /.
    Example: true

  • com.docker.lb.ssl_cert
    Docker secret to use for the SSL certificate.
    Example: example.com.cert

  • com.docker.lb.ssl_key
    Docker secret to use for the SSL key.
    Example: example.com.key

  • com.docker.lb.websocket_endpoints
    Comma-separated list of endpoints to configure to be upgraded for websockets.
    Example: /ws,/foo

  • com.docker.lb.service_cluster
    Name of the service cluster to use for the application.
    Example: us-east

  • com.docker.lb.sticky_session_cookie
    Cookie to use for sticky sessions.
    Example: app_session

  • com.docker.lb.redirects
    Semicolon-separated list of redirects to add, in the format <source>,<target>.
    Example: http://old.example.com,http://new.example.com

  • com.docker.lb.ssl_passthrough
    Enable SSL passthrough.
    Example: false

  • com.docker.lb.backend_mode
    Select the backend mode that the proxy should use to access the upstreams. Defaults to task.
    Example: vip

Configure redirects

The following example publishes a service and configures a redirect from old.local to new.local.

Note

There is currently a limitation where redirects do not work if a service is configured for TLS passthrough in Interlock proxy.

First, create an overlay network so that service traffic is isolated and secure:

$> docker network create -d overlay demo
1se1glh749q1i4pw0kf26mfx5

Next, create the service with the redirect:

$> docker service create \
    --name demo \
    --network demo \
    --detach=false \
    --label com.docker.lb.hosts=old.local,new.local \
    --label com.docker.lb.port=8080 \
    --label com.docker.lb.redirects=http://old.local,http://new.local \
    --env METADATA="demo-new" \
    ehazlett/docker-demo

Interlock detects when the service is available and publishes it. After tasks are running and the proxy service is updated, the application is available via http://new.local with a redirect configured that sends http://old.local to http://new.local:

$> curl -vs -H "Host: old.local" http://127.0.0.1
* Rebuilt URL to: http://127.0.0.1/
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 80 (#0)
> GET / HTTP/1.1
> Host: old.local
> User-Agent: curl/7.54.0
> Accept: */*
>
< HTTP/1.1 302 Moved Temporarily
< Server: nginx/1.13.6
< Date: Wed, 08 Nov 2017 19:06:27 GMT
< Content-Type: text/html
< Content-Length: 161
< Connection: keep-alive
< Location: http://new.local/
< x-request-id: c4128318413b589cafb6d9ff8b2aef17
< x-proxy-id: 48854cd435a4
< x-server-info: interlock/2.0.0-development (147ff2b1) linux/amd64
<
<html>
<head><title>302 Found</title></head>
<body bgcolor="white">
<center><h1>302 Found</h1></center>
<hr><center>nginx/1.13.6</center>
</body>
</html>
Create service clusters

Reconfiguring the Interlock proxy can take one to two seconds per overlay network managed by that proxy. To scale up to a larger number of Docker networks and services routed by Interlock, consider implementing service clusters. Service clusters are multiple proxy services managed by Interlock (rather than the default single proxy service), each responsible for routing to a separate set of Docker services and their corresponding networks, thereby minimizing proxy reconfiguration time.

Prerequisites

In this example, we'll assume you have an MKE cluster set up with at least two worker nodes, mke-node-0 and mke-node-1; we'll use these as dedicated proxy servers for two independent Interlock service clusters. We'll also assume you've already enabled Interlock with an HTTP port of 80 and an HTTPS port of 8443.

Setting up Interlock service clusters

First, apply some node labels to the MKE workers you've chosen to use as your proxy servers. From an MKE manager:

docker node update --label-add nodetype=loadbalancer --label-add region=east mke-node-0
docker node update --label-add nodetype=loadbalancer --label-add region=west mke-node-1

We’ve labeled mke-node-0 to be the proxy for our east region, and mke-node-1 to be the proxy for our west region.

Let's also create a dedicated overlay network for each region's proxy to manage traffic on. We could create many for each, but bear in mind the cumulative performance hit that this incurs:

docker network create --driver overlay eastnet
docker network create --driver overlay westnet

Next, modify Interlock’s configuration to create two service clusters. Start by writing its current configuration out to a file which you can modify:

CURRENT_CONFIG_NAME=$(docker service inspect --format '{{ (index .Spec.TaskTemplate.ContainerSpec.Configs 0).ConfigName }}' ucp-interlock)
docker config inspect --format '{{ printf "%s" .Spec.Data }}' $CURRENT_CONFIG_NAME > old_config.toml

Make a new config file called config.toml with the following content, which declares two service clusters, east and west.

Note

You will have to change the MKE version (3.2.3 in the example below) to match yours, as well as all instances of *.ucp.InstanceID (vl5umu06ryluu66uzjcv5h1bo below):

ListenAddr = ":8080"
DockerURL = "unix:///var/run/docker.sock"
AllowInsecure = false
PollInterval = "3s"

[Extensions]
  [Extensions.east]
    Image = "mirantis/ucp-interlock-extension:3.2.3"
    ServiceName = "ucp-interlock-extension-east"
    Args = []
    Constraints = ["node.labels.com.docker.ucp.orchestrator.swarm==true", "node.platform.os==linux"]
    ConfigImage = "mirantis/ucp-interlock-config:3.2.3"
    ConfigServiceName = "ucp-interlock-config-east"
    ProxyImage = "mirantis/ucp-interlock-proxy:3.2.3"
    ProxyServiceName = "ucp-interlock-proxy-east"
    ServiceCluster="east"
    Networks=["eastnet"]
    ProxyConfigPath = "/etc/nginx/nginx.conf"
    ProxyReplicas = 1
    ProxyStopSignal = "SIGQUIT"
    ProxyStopGracePeriod = "5s"
    ProxyConstraints = ["node.labels.com.docker.ucp.orchestrator.swarm==true", "node.platform.os==linux", "node.labels.region==east"]
    PublishMode = "host"
    PublishedPort = 80
    TargetPort = 80
    PublishedSSLPort = 8443
    TargetSSLPort = 443
    [Extensions.east.Labels]
      "ext_region" = "east"
      "com.docker.ucp.InstanceID" = "vl5umu06ryluu66uzjcv5h1bo"
    [Extensions.east.ContainerLabels]
      "com.docker.ucp.InstanceID" = "vl5umu06ryluu66uzjcv5h1bo"
    [Extensions.east.ProxyLabels]
      "proxy_region" = "east"
      "com.docker.ucp.InstanceID" = "vl5umu06ryluu66uzjcv5h1bo"
    [Extensions.east.ProxyContainerLabels]
      "com.docker.ucp.InstanceID" = "vl5umu06ryluu66uzjcv5h1bo"
    [Extensions.east.Config]
      Version = ""
      HTTPVersion = "1.1"
      User = "nginx"
      PidPath = "/var/run/proxy.pid"
      MaxConnections = 1024
      ConnectTimeout = 5
      SendTimeout = 600
      ReadTimeout = 600
      IPHash = false
      AdminUser = ""
      AdminPass = ""
      SSLOpts = ""
      SSLDefaultDHParam = 1024
      SSLDefaultDHParamPath = ""
      SSLVerify = "required"
      WorkerProcesses = 1
      RLimitNoFile = 65535
      SSLCiphers = "HIGH:!aNULL:!MD5"
      SSLProtocols = "TLSv1.2"
      AccessLogPath = "/dev/stdout"
      ErrorLogPath = "/dev/stdout"
      MainLogFormat = "'$remote_addr - $remote_user [$time_local] \"$request\" '\n\t\t    '$status $body_bytes_sent \"$http_referer\" '\n\t\t    '\"$http_user_agent\" \"$http_x_forwarded_for\"';"
      TraceLogFormat = "'$remote_addr - $remote_user [$time_local] \"$request\" $status '\n\t\t    '$body_bytes_sent \"$http_referer\" \"$http_user_agent\" '\n\t\t    '\"$http_x_forwarded_for\" $reqid $msec $request_time '\n\t\t    '$upstream_connect_time $upstream_header_time $upstream_response_time';"
      KeepaliveTimeout = "75s"
      ClientMaxBodySize = "32m"
      ClientBodyBufferSize = "8k"
      ClientHeaderBufferSize = "1k"
      LargeClientHeaderBuffers = "4 8k"
      ClientBodyTimeout = "60s"
      UnderscoresInHeaders = false
      UpstreamZoneSize = 64
      ServerNamesHashBucketSize = 128
      GlobalOptions = []
      HTTPOptions = []
      TCPOptions = []
      HideInfoHeaders = false

  [Extensions.west]
    Image = "mirantis/ucp-interlock-extension:3.2.3"
    ServiceName = "ucp-interlock-extension-west"
    Args = []
    Constraints = ["node.labels.com.docker.ucp.orchestrator.swarm==true", "node.platform.os==linux"]
    ConfigImage = "mirantis/ucp-interlock-config:3.2.3"
    ConfigServiceName = "ucp-interlock-config-west"
    ProxyImage = "mirantis/ucp-interlock-proxy:3.2.3"
    ProxyServiceName = "ucp-interlock-proxy-west"
    ServiceCluster="west"
    Networks=["westnet"]
    ProxyConfigPath = "/etc/nginx/nginx.conf"
    ProxyReplicas = 1
    ProxyStopSignal = "SIGQUIT"
    ProxyStopGracePeriod = "5s"
    ProxyConstraints = ["node.labels.com.docker.ucp.orchestrator.swarm==true", "node.platform.os==linux", "node.labels.region==west"]
    PublishMode = "host"
    PublishedPort = 80
    TargetPort = 80
    PublishedSSLPort = 8443
    TargetSSLPort = 443
    [Extensions.west.Labels]
      "ext_region" = "west"
      "com.docker.ucp.InstanceID" = "vl5umu06ryluu66uzjcv5h1bo"
    [Extensions.west.ContainerLabels]
      "com.docker.ucp.InstanceID" = "vl5umu06ryluu66uzjcv5h1bo"
    [Extensions.west.ProxyLabels]
      "proxy_region" = "west"
      "com.docker.ucp.InstanceID" = "vl5umu06ryluu66uzjcv5h1bo"
    [Extensions.west.ProxyContainerLabels]
      "com.docker.ucp.InstanceID" = "vl5umu06ryluu66uzjcv5h1bo"
    [Extensions.west.Config]
      Version = ""
      HTTPVersion = "1.1"
      User = "nginx"
      PidPath = "/var/run/proxy.pid"
      MaxConnections = 1024
      ConnectTimeout = 5
      SendTimeout = 600
      ReadTimeout = 600
      IPHash = false
      AdminUser = ""
      AdminPass = ""
      SSLOpts = ""
      SSLDefaultDHParam = 1024
      SSLDefaultDHParamPath = ""
      SSLVerify = "required"
      WorkerProcesses = 1
      RLimitNoFile = 65535
      SSLCiphers = "HIGH:!aNULL:!MD5"
      SSLProtocols = "TLSv1.2"
      AccessLogPath = "/dev/stdout"
      ErrorLogPath = "/dev/stdout"
      MainLogFormat = "'$remote_addr - $remote_user [$time_local] \"$request\" '\n\t\t    '$status $body_bytes_sent \"$http_referer\" '\n\t\t    '\"$http_user_agent\" \"$http_x_forwarded_for\"';"
      TraceLogFormat = "'$remote_addr - $remote_user [$time_local] \"$request\" $status '\n\t\t    '$body_bytes_sent \"$http_referer\" \"$http_user_agent\" '\n\t\t    '\"$http_x_forwarded_for\" $reqid $msec $request_time '\n\t\t    '$upstream_connect_time $upstream_header_time $upstream_response_time';"
      KeepaliveTimeout = "75s"
      ClientMaxBodySize = "32m"
      ClientBodyBufferSize = "8k"
      ClientHeaderBufferSize = "1k"
      LargeClientHeaderBuffers = "4 8k"
      ClientBodyTimeout = "60s"
      UnderscoresInHeaders = false
      UpstreamZoneSize = 64
      ServerNamesHashBucketSize = 128
      GlobalOptions = []
      HTTPOptions = []
      TCPOptions = []
      HideInfoHeaders = false

If instead you prefer to modify the config file that Interlock creates by default, make the following crucial service adjustments:

  • Replace [Extensions.default] with [Extensions.east]

  • Change ServiceName to "ucp-interlock-extension-east"

  • Change ConfigServiceName to "ucp-interlock-config-east"

  • Change ProxyServiceName to "ucp-interlock-proxy-east"

  • Add the constraint "node.labels.region==east" to the list ProxyConstraints

  • Add the key ServiceCluster="east" immediately below and inline with ProxyServiceName

  • Add the key Networks=["eastnet"] immediately below and inline with ServiceCluster (Note this list can contain as many overlay networks as you like; Interlock will only connect to the specified networks, and will connect to them all at startup.)

  • Change PublishMode="ingress" to PublishMode="host"

  • Change the section title [Extensions.default.Labels] to [Extensions.east.Labels]

  • Add the key "ext_region" = "east" under the [Extensions.east.Labels] section

  • Change the section title [Extensions.default.ContainerLabels] to [Extensions.east.ContainerLabels]

  • Change the section title [Extensions.default.ProxyLabels] to [Extensions.east.ProxyLabels]

  • Add the key "proxy_region" = "east" under the [Extensions.east.ProxyLabels] section

  • Change the section title [Extensions.default.ProxyContainerLabels] to [Extensions.east.ProxyContainerLabels]

  • Change the section title [Extensions.default.Config] to [Extensions.east.Config]

  • [Optional] change ProxyReplicas=2 to ProxyReplicas=1, necessary only if there is a single node labeled to be a proxy for each service cluster.

  • Copy the entire [Extensions.east] block a second time, changing east to west for your west service cluster.

Create a new docker config object from this configuration file:

NEW_CONFIG_NAME="com.docker.ucp.interlock.conf-$(( $(cut -d '-' -f 2 <<< "$CURRENT_CONFIG_NAME") + 1 ))"
docker config create $NEW_CONFIG_NAME config.toml

Update the ucp-interlock service to start using this new configuration:

docker service update \
  --config-rm $CURRENT_CONFIG_NAME \
  --config-add source=$NEW_CONFIG_NAME,target=/config.toml \
  ucp-interlock

Finally, do a docker service ls. You should see two services providing Interlock proxies, ucp-interlock-proxy-east and -west. If you only see one Interlock proxy service, delete it with docker service rm. After a moment, the two new proxy services should be created, and Interlock will be successfully configured with two service clusters.
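
The following is a minimal sketch of that cleanup and check, assuming the leftover proxy service still uses the default name ucp-interlock-proxy:

docker service rm ucp-interlock-proxy
docker service ls --filter name=ucp-interlock-proxy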

Deploying services in separate service clusters

Now that you’ve set up your service clusters, you can deploy services to be routed to by each proxy by using the service_cluster label. Create two example services:

docker service create --name demoeast \
        --network eastnet \
        --label com.docker.lb.hosts=demo.A \
        --label com.docker.lb.port=8000 \
        --label com.docker.lb.service_cluster=east \
        training/whoami:latest

docker service create --name demowest \
        --network westnet \
        --label com.docker.lb.hosts=demo.B \
        --label com.docker.lb.port=8000 \
        --label com.docker.lb.service_cluster=west \
        training/whoami:latest

Recall that mke-node-0 was your proxy for the east service cluster. Attempt to reach your whoami service there:

curl -H "Host: demo.A" http://<mke-node-0 public IP>

You should receive a response indicating the container ID of the whoami container declared by the demoeast service. Attempt the same curl at mke-node-1’s IP, and it will fail: the Interlock proxy running there only routes traffic to services with the service_cluster=west label, connected to the westnet Docker network you listed in that service cluster’s configuration.

Finally, make sure your second service cluster is working analogously to the first:

curl -H "Host: demo.B" http://<mke-node-1 public IP>

The service routed by Host: demo.B is reachable via (and only via) the Interlock proxy mapped to port 80 on mke-node-1. At this point, you have successfully set up and demonstrated that Interlock can manage multiple proxies routing only to services attached to a select subset of Docker networks.

Persistent sessions

You can publish a service and configure the proxy for persistent (sticky) sessions using:

  • Cookies

  • IP hashing

Cookies

To configure sticky sessions using cookies:

  1. Create an overlay network so that service traffic is isolated and secure, as shown in the following example:

    docker network create -d overlay demo
    1se1glh749q1i4pw0kf26mfx5
    
  2. Create a service with the cookie to use for sticky sessions:

    $> docker service create \
       --name demo \
       --network demo \
       --detach=false \
       --replicas=5 \
       --label com.docker.lb.hosts=demo.local \
       --label com.docker.lb.sticky_session_cookie=session \
       --label com.docker.lb.port=8080 \
       --env METADATA="demo-sticky" \
       ehazlett/docker-demo
    

Interlock detects when the service is available and publishes it. When tasks are running and the proxy service is updated, the application is available via http://demo.local and is configured to use sticky sessions:

$> curl -vs -c cookie.txt -b cookie.txt -H "Host: demo.local" http://127.0.0.1/ping
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 80 (#0)
> GET /ping HTTP/1.1
> Host: demo.local
> User-Agent: curl/7.54.0
> Accept: */*
> Cookie: session=1510171444496686286
>
< HTTP/1.1 200 OK
< Server: nginx/1.13.6
< Date: Wed, 08 Nov 2017 20:04:36 GMT
< Content-Type: text/plain; charset=utf-8
< Content-Length: 117
< Connection: keep-alive
* Replaced cookie session="1510171444496686286" for domain demo.local, path /, expire 0
< Set-Cookie: session=1510171444496686286
< x-request-id: 3014728b429320f786728401a83246b8
< x-proxy-id: eae36bf0a3dc
< x-server-info: interlock/2.0.0-development (147ff2b1) linux/amd64
< x-upstream-addr: 10.0.2.5:8080
< x-upstream-response-time: 1510171476.948
<
{"instance":"9c67a943ffce","version":"0.1","metadata":"demo-sticky","request_id":"3014728b429320f786728401a83246b8"}

Notice the Set-Cookie from the application. This is stored by the curl command and is sent with subsequent requests, which are pinned to the same instance. If you make a few requests, you will notice the same x-upstream-addr.
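
As a quick check, you can repeat the request a few times with the same cookie jar and confirm that the x-upstream-addr header stays the same; this assumes the same local setup as the preceding example:

$> for i in 1 2 3; do curl -vs -c cookie.txt -b cookie.txt \
   -H "Host: demo.local" http://127.0.0.1/ping 2>&1 | grep x-upstream-addr; done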

IP Hashing

The following example shows how to configure sticky sessions using client IP hashing. This is not as flexible or consistent as cookies but enables workarounds for some applications that cannot use the other method. When using IP hashing, reconfigure Interlock proxy to use host mode networking, because the default ingress networking mode uses SNAT, which obscures client IP addresses.

  1. Create an overlay network so that service traffic is isolated and secure:

    $> docker network create -d overlay demo
    1se1glh749q1i4pw0kf26mfx5
    
  2. Create a service with the cookie to use for sticky sessions using IP hashing:

    $> docker service create \
       --name demo \
       --network demo \
       --detach=false \
       --replicas=5 \
       --label com.docker.lb.hosts=demo.local \
       --label com.docker.lb.port=8080 \
       --label com.docker.lb.ip_hash=true \
       --env METADATA="demo-sticky" \
       ehazlett/docker-demo
    

Interlock detects when the service is available and publishes it. When tasks are running and the proxy service is updated, the application is available via http://demo.local and is configured to use sticky sessions:

$> curl -vs -H "Host: demo.local" http://127.0.0.1/ping
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 80 (#0)
> GET /ping HTTP/1.1
> Host: demo.local
> User-Agent: curl/7.54.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Server: nginx/1.13.6
< Date: Wed, 08 Nov 2017 20:04:36 GMT
< Content-Type: text/plain; charset=utf-8
< Content-Length: 117
< Connection: keep-alive
< x-request-id: 3014728b429320f786728401a83246b8
< x-proxy-id: eae36bf0a3dc
< x-server-info: interlock/2.0.0-development (147ff2b1) linux/amd64
< x-upstream-addr: 10.0.2.5:8080
< x-upstream-response-time: 1510171476.948
<
{"instance":"9c67a943ffce","version":"0.1","metadata":"demo-sticky","request_id":"3014728b429320f786728401a83246b8"}

You can use docker service scale demo=10 to add more replicas. When scaled, requests are pinned to a specific backend.

Note

Due to the way the IP hashing works for extensions, you will notice a new upstream address when scaling replicas. This is expected, because internally the proxy uses the new set of replicas to determine a backend on which to pin. When the upstreams are determined, a new “sticky” backend is chosen as the dedicated upstream.

Secure services with TLS

MKE offers you two different methods for securing your services with Transport Layer Security (TLS): proxy-managed TLS and service-managed TLS.

  • Proxy-managed TLS
    All traffic between users and the proxy is encrypted, but the traffic between the proxy and your Swarm service is not secure.

  • Service-managed TLS
    The end-to-end traffic is encrypted and the proxy service allows TLS traffic to pass through unchanged.

Proxy-managed TLS

This topic describes how to deploy a Swarm service wherein the proxy manages the TLS connection. Using proxy-managed TLS entails that the traffic between the proxy and the Swarm service is not secure, so you should only use this option if you trust that no one can monitor traffic inside the services that run in your datacenter.

To deploy a Swarm service with proxy-managed TLS:

  1. Obtain a private key and certificate for the TLS connection. The Common Name (CN) in the certificate must match the name where your service will be available. Generate a self-signed certificate for app.example.org:

    openssl req \
    -new \
    -newkey rsa:4096 \
    -days 3650 \
    -nodes \
    -x509 \
    -subj "/C=US/ST=CA/L=SF/O=Docker-demo/CN=app.example.org" \
    -keyout app.example.org.key \
    -out app.example.org.cert
    
  2. Create the following docker-compose.yml file:

    version: "3.2"
    
    services:
      demo:
        image: ehazlett/docker-demo
        deploy:
          replicas: 1
          labels:
            com.docker.lb.hosts: app.example.org
            com.docker.lb.network: demo-network
            com.docker.lb.port: 8080
            com.docker.lb.ssl_cert: demo_app.example.org.cert
            com.docker.lb.ssl_key: demo_app.example.org.key
        environment:
          METADATA: proxy-handles-tls
        networks:
          - demo-network
    
    networks:
      demo-network:
        driver: overlay
    secrets:
      app.example.org.cert:
        file: ./app.example.org.cert
      app.example.org.key:
        file: ./app.example.org.key
    

    The demo service has labels specifying that the proxy service routes app.example.org traffic to this service. All traffic between the service and proxy occurs using the demo-network network. The service has labels that specify the Docker secrets used on the proxy service for terminating the TLS connection.

    The private key and certificate are stored as Docker secrets, and thus you can readily scale the number of replicas used for running the proxy service, with MKE distributing the secrets to the replicas.

  3. Download and configure the client bundle and deploy the service:

    docker stack deploy --compose-file docker-compose.yml demo
    
  4. Test that everything works correctly by updating your /etc/hosts file to map app.example.org to the IP address of an MKE node.

  5. Optional. In a production deployment, create a DNS entry so that users can access the service using the domain name of your choice. After creating the DNS entry, access your service at https://<hostname>:<https-port>.

    • hostname is the name you specified with the com.docker.lb.hosts label.

    • https-port is the port you configured in the MKE settings.

    Because this example uses self-signed certificates, client tools such as browsers display a warning that the connection is insecure.

  6. Optional. Test that everything works using the CLI:

    curl --insecure \
    --resolve <hostname>:<https-port>:<mke-ip-address> \
    https://<hostname>:<https-port>/ping
    

    Example output:

    {"instance":"f537436efb04","version":"0.1","request_id":"5a6a0488b20a73801aa89940b6f8c5d2"}
    

    The proxy uses SNI to determine where to route traffic, and thus you must verify that you are using a version of curl that includes the SNI header with insecure requests. Otherwise, curl displays the following error:

    Server aborted the SSL handshake
    

Note

There is no way to update expired certificates using the proxy-managed TLS method. You must create a new secret and then update the corresponding service.
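
The following commands sketch one way to rotate the certificate. The versioned secret names are illustrative, and the service name demo_demo assumes the stack deployed in the example above; adapt them to your deployment:

# Create new secrets that contain the renewed certificate and key
docker secret create app.example.org.cert.v2 app.example.org.cert
docker secret create app.example.org.key.v2 app.example.org.key

# Point the service labels at the new secrets so that the proxy reconfigures itself
docker service update \
--label-add com.docker.lb.ssl_cert=app.example.org.cert.v2 \
--label-add com.docker.lb.ssl_key=app.example.org.key.v2 \
demo_demo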

Service-managed TLS

This topic describes how to deploy a Swarm service wherein the service manages the TLS connection by encrypting traffic from users to your Swarm service.

Deploy your Swarm service using the following example docker-compose.yml file:

version: "3.2"

services:
  demo:
    image: ehazlett/docker-demo
    command: --tls-cert=/run/secrets/cert.pem --tls-key=/run/secrets/key.pem
    deploy:
      replicas: 1
      labels:
        com.docker.lb.hosts: app.example.org
        com.docker.lb.network: demo-network
        com.docker.lb.port: 8080
        com.docker.lb.ssl_passthrough: "true"
    environment:
      METADATA: end-to-end-TLS
    networks:
      - demo-network
    secrets:
      - source: app.example.org.cert
        target: /run/secrets/cert.pem
      - source: app.example.org.key
        target: /run/secrets/key.pem

networks:
  demo-network:
    driver: overlay
secrets:
  app.example.org.cert:
    file: ./app.example.org.cert
  app.example.org.key:
    file: ./app.example.org.key

This updates the service to start using the secrets with the private key and certificate and it labels the service with com.docker.lb.ssl_passthrough: true, thus configuring the proxy service such that TLS traffic for app.example.org is passed to the service.

Since the connection is fully encrypted from end-to-end, the proxy service cannot add metadata such as version information or the request ID to the response headers.
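
You can verify the service in the same way as with proxy-managed TLS, for example with curl; the hostname, port, and address below are placeholders:

curl --insecure \
--resolve app.example.org:<https-port>:<mke-ip-address> \
https://app.example.org:<https-port>/ping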

Deploy services with mTLS enabled

Available since MKE 3.5.0

Mutual Transport Layer Security (mTLS) is a process of mutual authentication in which both parties verify the identity of the other party, using a signed certificate.

You must have the following items to deploy services with mTLS:

  • One or more CA certificates for signing the server and client certificates and keys.

  • A signed certificate and key for the server

  • A signed certificate and key for the client
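
If you do not already have these items, the following openssl commands sketch one way to create a client CA and a signed client certificate and key. All file names are illustrative and chosen to match the examples in this topic:

# Create a client CA key and self-signed CA certificate
openssl req -new -newkey rsa:4096 -nodes -x509 -days 3650 \
-subj "/CN=demo-client-ca" \
-keyout app.example.org.client-ca.key \
-out app.example.org.client-ca.cert

# Create a client key and a certificate signing request
openssl req -new -newkey rsa:4096 -nodes \
-subj "/CN=demo-client" \
-keyout client_key.pem \
-out client_csr.pem

# Sign the client certificate with the client CA
openssl x509 -req -days 365 \
-in client_csr.pem \
-CA app.example.org.client-ca.cert \
-CAkey app.example.org.client-ca.key \
-CAcreateserial \
-out client_cert.pem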


To deploy a back-end service with proxy-managed mTLS enabled:

  1. Create a secret for the CA certificate that the client uses to authenticate the server.

  2. Modify the docker-compose.yml file produced in Proxy-managed TLS:

    1. Add the following label to the docker-compose.yml file:

      com.docker.lb.client_ca_cert: demo_app.example.org.client-ca.cert
      
    2. Add the CA certificate to the secrets section of the docker-compose.yml file:

      app.example.org.client-ca.cert:
        file: ./app.example.org.client-ca.cert
      

    The resulting docker-compose.yml file looks as follows:

    version: "3.2"
    
    services:
      demo:
        image: ehazlett/docker-demo
        deploy:
          replicas: 1
          labels:
            com.docker.lb.hosts: app.example.org
            com.docker.lb.network: demo-network
            com.docker.lb.port: 8080
            com.docker.lb.ssl_cert: demo_app.example.org.cert
            com.docker.lb.ssl_key: demo_app.example.org.key
            com.docker.lb.client_ca_cert: demo_app.example.org.client-ca.cert
        environment:
          METADATA: proxy-handles-tls
        networks:
          - demo-network
    
    networks:
      demo-network:
        driver: overlay
    secrets:
      app.example.org.cert:
        file: ./app.example.org.cert
      app.example.org.key:
        file: ./app.example.org.key
      app.example.org.client-ca.cert:
        file: ./app.example.org.client-ca.cert
    
  3. Deploy the service:

    docker stack deploy --compose-file docker-compose.yml demo
    
  4. Test the mTLS-enabled service:

    curl --insecure \
    --resolve app.example.org:<mke-https-port>:<mke-ip-address> \
    --cacert client_ca_cert.pem \
    --cert client_cert.pem \
    --key client_key.pem \
    https://app.example.org:<mke-https-port>/ping
    

    A successful deployment returns a JSON payload in plain text.

    Note

    Omitting --cacert, --cert, or --key from the cURL command returns an error message, as all three parameters are required.

Use websockets

First, create an overlay network to isolate and secure service traffic:

$> docker network create -d overlay demo
1se1glh749q1i4pw0kf26mfx5

Next, create the service with websocket endpoints:

$> docker service create \
    --name demo \
    --network demo \
    --detach=false \
    --label com.docker.lb.hosts=demo.local \
    --label com.docker.lb.port=8080 \
    --label com.docker.lb.websocket_endpoints=/ws \
    ehazlett/websocket-chat

Note

For websockets to work, you must have an entry for demo.local in your local hosts file (for example, /etc/hosts). The example application uses the browser for websocket communication, so you must either add such an entry or use a routable domain.
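
For example, assuming one of the MKE nodes has the IP address 203.0.113.10 (a placeholder value), the hosts entry might look as follows:

203.0.113.10   demo.local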

Interlock detects when the service is available and publishes it. Once the tasks are running and the proxy service has been updated, the application is available at http://demo.local. Open the application in two browser windows; text that you type in one window is displayed in both.

Deploy applications with Kubernetes

Use Kubernetes on Windows Server nodes

Starting with version 3.3.0, MKE supports the Kubernetes orchestrator on Windows Server nodes.

Prerequisites
  1. Install MKE

  2. Create a Linux-only cluster

Note

The Kubernetes orchestrator on Windows Server nodes is not supported on clusters upgraded from previous versions of MKE. To deploy the Kubernetes orchestrator on Windows Server nodes, you must deploy a new cluster running MKE 3.3.0 or later.

Adding Windows Server nodes

After you install the one-node Linux-only MKE cluster, add Windows Server worker nodes to it using the following steps:

  1. Use your browser on your local system to log into the MKE installation above.

  2. Navigate to the nodes list and click on Add Node at the top right of the page.

  3. On the Add Node page, select “Windows” as the node type. Note that Windows Server nodes can have only the worker role.

  4. Optionally, you may also select and set custom listen and advertise addresses.

  5. A command line will be generated that includes a join-token. It should look something like:

    docker swarm join ... --token <join-token> ...
    

    Copy this command.

  6. Add your Windows Server node to the MKE cluster by running the swarm join command generated in the previous step.

Validating the cluster setup
  1. You can use either the MKE web interface or the command line to view your clusters.

    • To use the web interface, log into MKE, and navigate to the node list view. All nodes should be green.

    • Check the node status using this command on your local system:

      kubectl get nodes
      

    Your nodes should all have a status value of “Ready”, as in the following example.

    NAME                   STATUS   ROLES    AGE     VERSION
    user-135716-win-0      Ready    <none>   2m16s   v1.17.2
    user-7d985f-ubuntu-0   Ready    master   4m55s   v1.17.2-docker-d-2
    user-135716-win-1      Ready    <none>   1m12s   v1.17.2
    
  2. Now that you have confirmed that the nodes are ready, the next step is to set the orchestrator to Kubernetes for the Windows nodes. You can change the orchestrator using the MKE web interface, either by changing the default orchestrator in the Administrator settings or by toggling the orchestrator on a node after it joins. The equivalent CLI command is:

    docker node update <node name> --label-add com.docker.ucp.orchestrator.kubernetes=true
    
  3. Optionally, you can deploy a workload on the cluster to make sure everything is working as expected.

Troubleshooting

If you can’t join your Windows Server node to the cluster, confirm that the correct processes are running on the node.

PS C:\> Get-Process tigera-calico

You should see something like this.

Handles  NPM(K)    PM(K)      WS(K)     CPU(s)     Id  SI ProcessName
-------  ------    -----      -----     ------     --  -- -----------
   276      17    33284      40948      39.89   8132   0 tigera-calico

The next troubleshooting step is to check the kubelet process.

PS C:\> Get-Process kubelet

You should see something like this.

Handles  NPM(K)    PM(K)      WS(K)     CPU(s)     Id  SI ProcessName
-------  ------    -----      -----     ------     --  -- -----------
   524      23    47332      73380     828.50   6520   0 kubelet

After that check the kube-proxy process.

PS C:\> Get-Process kube-proxy

You should see something like this.

Handles  NPM(K)    PM(K)      WS(K)     CPU(s)     Id  SI ProcessName
-------  ------    -----      -----     ------     --  -- -----------
   322      19    25464      33488      21.00   7852   0 kube-proxy

If any of the process checks show that something is not working, the next step is to check the logs. The kubelet and kube-proxy logs are located under C:\k\logs, and the Tigera Calico logs are located under C:\TigeraCalico\logs.
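
To list the available log files, you can run PowerShell commands such as the following (a sketch; the exact file names vary by version):

PS C:\> Get-ChildItem C:\k\logs
PS C:\> Get-ChildItem C:\TigeraCalico\logs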

An example workload on Windows Server

This section presents an example workload that illustrates the Kubernetes on Windows Server capabilities. The following procedure deploys a complete web application on IIS servers as Kubernetes Services. The example workload includes an MSSQL database and a load balancer.

Specifically, the procedure covers:

  • Namespace creation

  • Pod and Deployment scheduling

  • Kubernetes service provisioning

  • Addition of application workloads

  • Connectivity of Pods, Nodes and Services

  1. Create a Namespace.

    $ kubectl create -f demo-namespace.yaml
    
    # demo-namespace.yaml
    apiVersion: v1
    kind: Namespace
    metadata:
      name: demo
    
  2. Create a web service as a Kubernetes service.

    $ kubectl create -f win-webserver.yaml
    service/win-webserver created
    deployment.apps/win-webserver created
    
    $ kubectl get service --namespace demo
    NAME            TYPE       CLUSTER-IP    EXTERNAL-IP   PORT(S)        AGE
    win-webserver   NodePort   10.96.29.12   <none>        80:35048/TCP   12m
    
    # win-webserver.yaml
    apiVersion: v1
    kind: Service
    metadata:
      name: win-webserver
      namespace: demo
      labels:
        app: win-webserver
    spec:
      ports:
        # the port that this service should serve on
        - port: 80
          targetPort: 80
      selector:
        app: win-webserver
      type: NodePort
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      labels:
        app: win-webserver
      name: win-webserver
      namespace: demo
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: win-webserver
      template:
        metadata:
          labels:
            app: win-webserver
          name: win-webserver
        spec:
          affinity:
            podAntiAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                - labelSelector:
                    matchExpressions:
                      - key: app
                        operator: In
                        values:
                          - win-webserver
                  topologyKey: "kubernetes.io/hostname"
          containers:
            - name: windowswebserver
              image: mcr.microsoft.com/windows/servercore:ltsc2019
              command:
                - powershell.exe
                - -command
                - "<#code used from https://gist.github.com/wagnerandrade/5424431#> ; $$listener = New-Object System.Net.HttpListener ; $$listener.Prefixes.Add('http://*:80/') ; $$listener.Start() ; $$callerCounts = @{} ; Write-Host('Listening at http://*:80/') ; while ($$listener.IsListening) { ;$$context = $$listener.GetContext() ;$$requestUrl = $$context.Request.Url ;$$clientIP = $$context.Request.RemoteEndPoint.Address ;$$response = $$context.Response ;Write-Host '' ;Write-Host('> {0}' -f $$requestUrl) ;  ;$$count = 1 ;$$k=$$callerCounts.Get_Item($$clientIP) ;if ($$k -ne $$null) { $$count += $$k } ;$$callerCounts.Set_Item($$clientIP, $$count) ;$$ip=(Get-NetAdapter | Get-NetIpAddress); $$header='<html><body><H1>Windows Container Web Server</H1>' ;$$callerCountsString='' ;$$callerCounts.Keys | % { $$callerCountsString+='<p>IP {0} callerCount {1} ' -f $$ip[1].IPAddress,$$callerCounts.Item($$_) } ;$$footer='</body></html>' ;$$content='{0}{1}{2}' -f $$header,$$callerCountsString,$$footer ;Write-Output $$content ;$$buffer = [System.Text.Encoding]::UTF8.GetBytes($$content) ;$$response.ContentLength64 = $$buffer.Length ;$$response.OutputStream.Write($$buffer, 0, $$buffer.Length) ;$$response.Close() ;$$responseStatus = $$response.StatusCode ;Write-Host('< {0}' -f $$responseStatus)  } ; "
          nodeSelector:
            beta.kubernetes.io/os: windows
    
  3. Check that the pods are deployed on different Windows Server worker nodes, in accordance with the inter-pod affinity and anti-affinity settings.

    $ kubectl get pod --namespace demo
    
    NAME                            READY   STATUS    RESTARTS   AGE
    win-webserver-8c5678c68-qggzh   1/1     Running   0          6m21s
    win-webserver-8c5678c68-v8p84   1/1     Running   0          6m21s
    
    # Check the detailed status of pods deployed
    $ kubectl describe pod win-webserver-8c5678c68-qggzh --namespace demo
    
  4. Access the web service by node-to-pod communication across the network.

    From a kubectl client:

    $ kubectl get pods --namespace demo -o wide
    
    NAME                            READY   STATUS    RESTARTS   AGE   IP              NODE              NOMINATED NODE   READINESS GATES
    win-webserver-8c5678c68-qggzh   1/1     Running   0          16m   192.168.77.68   user-135716-win-1 <none>           <none>
    win-webserver-8c5678c68-v8p84   1/1     Running   0          16m   192.168.4.206   user-135716-win-0 <none>           <none>
    
    $ ssh -o ServerAliveInterval=15 root@<master-node>
    
    $ curl 10.96.29.12
    <html><body><H1>Windows Container Web Server</H1><p>IP 192.168.77.68 callerCount 1 </body></html>
    $
    

    Run the curl command a second time. You can see that the second request is load-balanced to a different pod.

    $ curl 10.96.29.12
    <html><body><H1>Windows Container Web Server</H1><p>IP 192.168.4.206 callerCount 1 </body></html>
    
  5. Access the web service by pod-to-pod communication across the network.

    $ kubectl get service --namespace demo
    
    NAME            TYPE       CLUSTER-IP    EXTERNAL-IP   PORT(S)        AGE
    win-webserver   NodePort   10.96.29.12   <none>        80:35048/TCP   12m
    
    $ kubectl get pods --namespace demo -o wide
    
    NAME                            READY   STATUS    RESTARTS   AGE   IP              NODE              NOMINATED NODE   READINESS GATES
    win-webserver-8c5678c68-qggzh   1/1     Running   0          16m   192.168.77.68   user-135716-win-1 <none>           <none>
    win-webserver-8c5678c68-v8p84   1/1     Running   0          16m   192.168.4.206   user-135716-win-0 <none>           <none>
    
    $ kubectl exec -it win-webserver-8c5678c68-qggzh --namespace demo cmd
    
    Microsoft Windows [Version 10.0.17763.1098]
    (c) 2018 Microsoft Corporation. All rights reserved.
    
    C:\>curl 10.96.29.12
    <html><body><H1>Windows Container Web Server</H1><p>IP 192.168.77.68
    callerCount 1 <p>IP 192.168.77.68 callerCount 1 </body></html>
    

See also

Kubernetes

Accessing Kubernetes resources

The following diagram shows which Kubernetes resources are visible from the MKE web interface.

[Diagram: kubemap.png, showing the Kubernetes resources visible from the MKE web interface]

See also

Kubernetes

Deploying a workload to a Kubernetes cluster

You can use the MKE web UI to deploy your Kubernetes YAML files. In most cases, modifications are not necessary to deploy on a cluster managed by MKE.

Deploy an NGINX server

In this example, a simple Kubernetes Deployment object for an NGINX server is defined in a YAML file.

apiVersion: apps/v1beta2
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9
        ports:
        - containerPort: 80

This YAML file specifies an earlier version of NGINX, which will be updated in a later section.

To deploy the NGINX server:

  1. Navigate to the MKE web UI, and from the left pane, click Kubernetes.

  2. Click Create to open the Create Kubernetes Object page.

  3. From the Namespace drop-down list, select default.

  4. Paste the previous YAML file in the Object YAML editor.

  5. Click Create.

Inspect the deployment

The MKE web UI shows the status of your deployment when you click the links in the Kubernetes section of the left pane.

  1. From the left pane, click Controllers to see the resource controllers that MKE created for the NGINX server.

  2. Click the nginx-deployment controller, and in the details pane, scroll to the Template section. This shows the values that MKE used to create the deployment.

  3. From the left pane, click Pods to see the pods that are provisioned for the NGINX server. Click one of the pods, and in the details pane, scroll to the Status section to see that pod’s phase, IP address, and other properties.

Expose the server

The NGINX server is up and running, but it’s not accessible from outside of the cluster. Create a YAML file to add a NodePort service to expose the server on a specified port.

apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  type: NodePort
  ports:
    - port: 80
      nodePort: 32768
  selector:
    app: nginx

The service connects the cluster’s internal port 80 to the external port 32768.

To expose the server:

  1. Repeat the previous steps and copy-paste the YAML file that defines the nginx service into the Object YAML editor on the Create Kubernetes Object page. When you click Create, the Load Balancers page opens.

  2. Click the nginx service, and in the details pane, find the Ports section.

  3. Click the link that’s labeled URL to view the default NGINX page.

The YAML definition connects the service to the NGINX server using the app label nginx and a corresponding label selector.
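
To confirm from the command line that the service is reachable, you can also curl the NodePort on any MKE node; the node address below is a placeholder:

curl http://<mke-node-ip>:32768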

Update the deployment

Update an existing deployment by applying an updated YAML file. In this example, the server is scaled up to four replicas and updated to a later version of NGINX.

...
spec:
  progressDeadlineSeconds: 600
  replicas: 4
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: nginx
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: nginx
    spec:
      containers:
      - image: nginx:1.8
...
  1. From the left pane, click Controllers and select nginx-deployment.

  2. In the details pane, click Configure, and in the Edit Deployment page, find the replicas: 2 entry.

  3. Change the number of replicas to 4, so the line reads replicas: 4.

  4. Find the image: nginx:1.7.9 entry and change it to image: nginx:1.8.

  5. Click Save to update the deployment with the new YAML file.

  6. From the left pane, click Pods to view the newly created replicas.

Use the CLI to deploy Kubernetes objects

With MKE, you deploy your Kubernetes objects on the command line using kubectl.

Use a client bundle to configure your client tools, such as the Docker CLI and kubectl, to communicate with MKE instead of any local deployments you might have running.
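
For example, on Linux or macOS you can typically load the environment script included in the client bundle; the following is a sketch that assumes the bundle has been downloaded and extracted to the current directory:

# Load the client bundle environment into the current shell
eval "$(<env.sh)"

# docker and kubectl now target the MKE cluster
docker version
kubectl config current-context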

When you have the client bundle set up, you can deploy a Kubernetes object from the YAML file.

apiVersion: apps/v1beta2
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  type: NodePort
  ports:
    - port: 80
      nodePort: 32768
  selector:
    app: nginx

Save the previous YAML file to a file named “deployment.yaml”, and use the following command to deploy the NGINX server:

kubectl apply -f deployment.yaml

Inspect the deployment

Use the describe deployment option to inspect the deployment:

kubectl describe deployment nginx-deployment

Also, you can use the MKE web UI to see the deployment’s pods and controllers.

Update the deployment

Update an existing deployment by applying an updated YAML file.

Edit deployment.yaml and change the following lines:

  • Increase the number of replicas to 4, so the line reads replicas: 4.

  • Update the NGINX version by specifying image: nginx:1.8.

Save the edited YAML file to a file named “update.yaml”, and use the following command to deploy the NGINX server:

kubectl apply -f update.yaml

Check that the deployment was scaled out by listing the deployments in the cluster:

kubectl get deployments

You should see that the deployment has four replicas:

NAME                   DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
nginx-deployment       4         4         4            4           2d

Check that the pods are running the updated image:

kubectl describe deployment nginx-deployment | grep -i image

You should see the currently running image:

Image:        nginx:1.8

Deploying a Compose-based app to a Kubernetes cluster

MKE enables deploying Docker Compose files to Kubernetes clusters. Starting in Compose file version 3.3, you use the same docker-compose.yml file that you use for Swarm deployments, but you specify Kubernetes workloads when you deploy the stack. The result is a true Kubernetes app.

Get access to a Kubernetes namespace

To deploy a stack to Kubernetes, you need a namespace for the app’s resources. Contact your MKE administrator to get access to a namespace. In this example, the namespace is called labs.
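
For reference, an MKE administrator could create such a namespace with kubectl; the labs name matches this example:

kubectl create namespace labs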

Create a Kubernetes app from a Compose file

In this example, you create a simple app, named “lab-words”, by using a Compose file. This assumes you are deploying onto a cloud infrastructure. The following YAML defines the stack:

version: '3.3'

services:
  web:
    image: dockersamples/k8s-wordsmith-web
    ports:
     - "8080:80"

  words:
    image: dockersamples/k8s-wordsmith-api
    deploy:
      replicas: 5

  db:
    image: dockersamples/k8s-wordsmith-db
  1. In your browser, log in to https://<mke-url>. Navigate to Shared Resources > Stacks.

  2. Click Create Stack to open up the “Create Application” page.

  3. Under “Configure Application”, type “lab-words” for the application name.

  4. Select Kubernetes Workloads for Orchestrator Mode.

  5. In the Namespace dropdown, select “labs”.

  6. Under “Application File Mode”, leave Compose File selected and click Next.

  7. Paste the previous YAML, then click Create to deploy the stack.

Inspect the deployment

After a few minutes have passed, all of the pods in the lab-words deployment are running.

To inspect the deployment:

  1. Navigate to Kubernetes > Pods. Confirm that there are seven pods and that their status is Running. If any pod has a status of Pending, wait until every pod is running.

  2. Next, select Kubernetes > Load balancers and find the web-published service.

  3. Click the web-published service, and scroll down to the Ports section.

  4. Under Ports, grab the Node Port information.

  5. In a new tab or window, enter your cloud instance public IP Address and append :<NodePort> from the previous step. For example, to find the public IP address of an EC2 instance, refer to Amazon EC2 Instance IP Addressing. The app is displayed.

See also

Kubernetes

Using Pod Security Policies

Pod Security Policies (PSPs) are cluster-level resources that are enabled by default in MKE 3.2.

There are two default PSPs in MKE: a privileged policy and an unprivileged policy. Administrators of the cluster can enforce additional policies and apply them to users and teams for further control of what runs in the Kubernetes cluster. This topic describes the two default policies, and provides two example use cases for custom policies.

Kubernetes Role Based Access Control (RBAC)

To interact with PSPs, a user must be granted access to the PodSecurityPolicy object in Kubernetes RBAC. MKE admins can already manipulate PSPs. A non-admin user can interact with policies if an MKE admin creates the following ClusterRole and ClusterRoleBinding:

$ cat <<EOF | kubectl create -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: psp-admin
rules:
- apiGroups:
  - extensions
  resources:
  - podsecuritypolicies
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
EOF

$ USER=jeff

$ cat <<EOF | kubectl create -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: psp-admin:$USER
roleRef:
  kind: ClusterRole
  name: psp-admin
  apiGroup: rbac.authorization.k8s.io
subjects:
- kind: User
  name: $USER
EOF

Default pod security policies in MKE

By default, there are two policies defined within MKE, privileged and unprivileged. Additionally, there is a ClusterRoleBinding that gives every single user access to the privileged policy. This is for backward compatibility after an upgrade. By default, any user can create any pod.

Note

PSPs do not override security defaults built into the MKE RBAC engine for Kubernetes pods. These security defaults prevent non-admin users from mounting host paths into pods or starting privileged pods.

$ kubectl get podsecuritypolicies
NAME           PRIV    CAPS   SELINUX    RUNASUSER   FSGROUP    SUPGROUP   READONLYROOTFS   VOLUMES
privileged     true    *      RunAsAny   RunAsAny    RunAsAny   RunAsAny   false            *
unprivileged   false          RunAsAny   RunAsAny    RunAsAny   RunAsAny   false            *
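
You can also inspect the ClusterRoleBinding that grants every user access to the privileged policy; the binding name below is the one removed later in this topic:

$ kubectl get clusterrolebinding ucp:all:privileged-psp-role -o yaml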

Specification for the privileged policy:

allowPrivilegeEscalation: true
allowedCapabilities:
- '*'
fsGroup:
  rule: RunAsAny
hostIPC: true
hostNetwork: true
hostPID: true
hostPorts:
- max: 65535
  min: 0
privileged: true
runAsUser:
  rule: RunAsAny
seLinux:
  rule: RunAsAny
supplementalGroups:
  rule: RunAsAny
volumes:
- '*'

Specification for the unprivileged policy:

allowPrivilegeEscalation: false
allowedHostPaths:
- pathPrefix: /dev/null
  readOnly: true
fsGroup:
  rule: RunAsAny
hostPorts:
- max: 65535
  min: 0
runAsUser:
  rule: RunAsAny
seLinux:
  rule: RunAsAny
supplementalGroups:
  rule: RunAsAny
volumes:
- '*'

Use the unprivileged policy

Note

When following this guide, if the prompt $ follows admin, the action needs to be performed by a user with access to create pod security policies. If the prompt $ follows user, the MKE account does not need access to the PSP object in Kubernetes. The user only needs the ability to create Kubernetes pods.

To switch users from the privileged policy to the unprivileged policy (or any custom policy), an admin must first remove the ClusterRoleBinding that links all users and service accounts to the privileged policy.

admin $ kubectl delete clusterrolebindings ucp:all:privileged-psp-role

When the ClusterRoleBinding is removed, cluster admins can still deploy pods, and these pods are deployed with the privileged policy. However, users and service accounts are unable to deploy pods, because Kubernetes does not know which pod security policy to apply. Note that cluster admins are also unable to create Deployments in this state, because Deployments schedule their pods through a service account rather than through the admin user.

user $ kubectl apply -f pod.yaml
Error from server (Forbidden): error when creating "pod.yaml": pods "demopod" is forbidden: unable to validate against any pod security policy: []

Therefore, to allow a user or a service account to use the unprivileged policy (or any custom policy), you must create a RoleBinding to link that user or team with the alternative policy. For the unprivileged policy, a ClusterRole has already been defined, but has not been attached to a user.

# List Existing Cluster Roles
admin $ kubectl get clusterrole | grep psp
privileged-psp-role                                                    3h47m
unprivileged-psp-role                                                  3h47m

# Define which user to apply the ClusterRole to
admin $ USER=jeff

# Create a RoleBinding linking the ClusterRole to the User
admin $ cat <<EOF | kubectl create -f -
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: unprivileged-psp-role:$USER
  namespace: default
roleRef:
  kind: ClusterRole
  name: unprivileged-psp-role
  apiGroup: rbac.authorization.k8s.io
subjects:
- kind: User
  name: $USER
  namespace: default
EOF

In the following example, when user “jeff” deploys a basic nginx pod, the unprivileged policy is applied.

user $ cat <<EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: demopod
spec:
  containers:
    - name:  demopod
      image: nginx
EOF

user $ kubectl get pods
NAME      READY   STATUS    RESTARTS   AGE
demopod   1/1     Running   0          10m

To check which PSP is applied to a pod, you can get a detailed view of the pod spec using the -o yaml or -o json syntax with kubectl. You can parse JSON output with jq.

user $ kubectl get pods demopod -o json | jq -r '.metadata.annotations."kubernetes.io/psp"'
unprivileged

Using the unprivileged policy in a deployment

Note

In most use cases, a Pod is not actually scheduled by a user. When creating Kubernetes objects such as Deployments or Daemonsets, the pods are scheduled by a service account or a controller.

If you have disabled the privileged PSP policy, and created a RoleBinding to map a user to a new PSP policy, Kubernetes objects like Deployments and Daemonsets will not be able to deploy pods. This is because Kubernetes objects, like Deployments, use a Service Account to schedule pods, instead of the user that created the Deployment.

user $ kubectl get deployments
NAME    READY   UP-TO-DATE   AVAILABLE   AGE
nginx   0/1     0            0           88s

user $ kubectl get replicasets
NAME              DESIRED   CURRENT   READY   AGE
nginx-cdcdd9f5c   1         0         0       92s

user $ kubectl describe replicasets nginx-cdcdd9f5c
...
  Warning  FailedCreate  48s (x15 over 2m10s)  replicaset-controller  Error creating: pods "nginx-cdcdd9f5c-" is forbidden: unable to validate against any pod security policy: []

For this deployment to be able to schedule pods, the service account defined within the deployment specification needs to be associated with a PSP policy. If a service account is not defined within a deployment spec, the default service account in the namespace is used.

This is the case in the deployment output above. As no service account is defined, a RoleBinding is needed to allow the default service account in the default namespace to use the PSP policy.

Example RoleBinding to associate the unprivileged PSP policy in MKE with the default service account in the default namespace:

admin $ cat <<EOF | kubectl create -f -
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: unprivileged-psp-role:defaultsa
  namespace: default
roleRef:
  kind: ClusterRole
  name: unprivileged-psp-role
  apiGroup: rbac.authorization.k8s.io
subjects:
- kind: ServiceAccount
  name: default
  namespace: default
EOF

With the RoleBinding in place, the replica set is able to schedule pods within the cluster:

user $ kubectl get deployments
NAME    READY   UP-TO-DATE   AVAILABLE   AGE
nginx   1/1     1            1           6m11s

user $ kubectl get replicasets
NAME              DESIRED   CURRENT   READY   AGE
nginx-cdcdd9f5c   1         1         1       6m16s

user $ kubectl get pods
NAME                    READY   STATUS    RESTARTS   AGE
nginx-cdcdd9