To upgrade Docker Enterprise, you must individually upgrade each of the following components:
Because some components become temporarily unavailable during an upgrade, schedule upgrades to occur outside of peak business hours to minimize impact to your business.
Mirantis Container Runtime upgrades in Swarm clusters should follow these guidelines to avoid IP address space exhaustion and the associated application downtime.
Before upgrading Mirantis Container Runtime, create a backup. This makes it possible to recover if anything goes wrong during the upgrade.
You should also check the compatibility matrix to make sure all Mirantis Container Runtime components are certified to work with one another, and review the Mirantis Container Runtime maintenance lifecycle to understand how long your version will be supported.
Before you upgrade, make sure:
Your firewall rules are configured to allow traffic on the ports MKE uses for communication. Learn about MKE port requirements.
No containers or services are listening on ports used by MKE.
Your load balancer is configured to forward TCP traffic to the Kubernetes API server port (6443/TCP by default) running on manager nodes.
The Kubernetes API server and the MKE controller are using externally signed certificates.
In Swarm overlay networks, each task connected to a network consumes an IP address on that network. Swarm networks have a finite number of IP addresses based on the --subnet configured when the network is created. If no subnet is specified, Swarm defaults to a /24 network with 254 available IP addresses. When the IP space of a network is fully consumed, Swarm tasks can no longer be scheduled on that network.
Starting with Mirantis Container Runtime 18.09, each Swarm node consumes an additional IP address from every Swarm network it participates in. This address is used by the Swarm internal load balancer on that network. Swarm networks running on MCR 18.09 or later must therefore be sized to account for this increase in IP usage. Networks at or near full utilization before the upgrade risk exhausting their IP space afterward, which prevents tasks from being scheduled on the network.
The maximum IP consumption per network at any given moment is given by the following formula:
Max IP Consumed per Network = Number of Tasks on a Swarm Network + 1 IP for each node where these tasks are scheduled
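As a sketch of the arithmetic, the snippet below applies the formula to a hypothetical network. The task count, node count, and prefix length are assumptions for illustration, not values from a real cluster:

```shell
# Illustrative capacity check for a Swarm overlay network after an
# upgrade to MCR 18.09. All counts below are assumed example values.

PREFIX=24    # subnet prefix length (the default when no --subnet is given)
TASKS=24     # tasks scheduled on the network
NODES=5      # nodes where those tasks are scheduled

# Usable addresses: 2^(32 - prefix) minus the network and broadcast addresses.
CAPACITY=$(( (1 << (32 - PREFIX)) - 2 ))

# Max IP Consumed per Network = tasks + 1 load-balancer IP per node with tasks.
MAX_IPS=$(( TASKS + NODES ))

echo "capacity=$CAPACITY consumed=$MAX_IPS"
if [ "$MAX_IPS" -gt "$CAPACITY" ]; then
  echo "over capacity"
else
  echo "within capacity"
fi
```

For a default /24 this prints `capacity=254 consumed=29`, comfortably within capacity; a network created with a small `--subnet` such as a /27 leaves far less headroom for the extra per-node addresses.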
To prevent this, make sure overlay networks have enough spare capacity before upgrading to 18.09 to absorb the additional per-node addresses after the upgrade. The instructions below provide tooling and steps to measure capacity before performing an upgrade.
This only applies to containers running on Swarm overlay networks. It does not impact bridge, macvlan, host, or third-party Docker networks.
To avoid application downtime, you should be running Mirantis Container Runtime in Swarm mode and deploying your workloads as Docker services. That way you can drain the nodes of any workloads before starting the upgrade.
If you have workloads running as plain containers rather than Swarm services, make sure they are configured with a restart policy. This ensures that your containers start automatically after the upgrade.
To ensure that workloads running as Swarm services have no downtime, you need to:
If you do this sequentially for every node, you can upgrade with no application downtime. When upgrading manager nodes, make sure the upgrade of a node finishes before you start upgrading the next node. Upgrading multiple manager nodes at the same time can lead to a loss of quorum, and possible data loss.
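The sequential drain-upgrade-activate cycle can be sketched as follows. The node names and the upgrade step are assumptions; the `run` helper only prints each command so the plan can be reviewed before executing it against a real cluster:

```shell
# Rolling-upgrade sketch for Swarm nodes, one node at a time.
# Node names and the package-upgrade step are hypothetical; 'run'
# echoes the commands instead of executing them (a dry run).
run() { echo "+ $*"; }

for NODE in mgr-1 mgr-2 mgr-3; do
  # Drain the node so its workloads reschedule elsewhere first.
  run docker node update --availability drain "$NODE"
  # Hypothetical upgrade step; substitute your OS's actual procedure.
  run ssh "$NODE" "sudo apt-get install -y docker-ee"
  # Return the node to service before touching the next one.
  run docker node update --availability active "$NODE"
done
```

Replacing the `run` helper's `echo` with real execution turns the dry run into the actual upgrade; the important property is that each iteration finishes before the next begins, which is what preserves manager quorum.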
Starting with a cluster that has one or more services configured, determine whether any networks may require a larger IP address space in order to function correctly after a Mirantis Container Runtime 18.09 upgrade.
$ docker run -it --rm -v /var/run/docker.sock:/var/run/docker.sock docker/ip-util-check
If a network is in danger of exhaustion, the output will show warnings or errors similar to the following:
Overlay IP Utilization Report
----
Network ex_net1/XXXXXXXXXXXX has an IP address capacity of 29 and uses 28 addresses
ERROR: network will be over capacity if upgrading Docker engine version 18.09
or later.
----
Network ex_net2/YYYYYYYYYYYY has an IP address capacity of 29 and uses 24 addresses
WARNING: network could exhaust IP addresses if the cluster scales to 5 or more nodes
----
Network ex_net3/ZZZZZZZZZZZZ has an IP address capacity of 61 and uses 52 addresses
WARNING: network could exhaust IP addresses if the cluster scales to 9 or more nodes
With an exhausted network, you can triage it using the following steps.
1. Check the docker service ls output. It displays any service that is unable to completely fill all of its replicas, such as:
ID NAME MODE REPLICAS IMAGE PORTS
wn3x4lu9cnln ex_service replicated 19/24 nginx:latest
2. Run docker service ps ex_service to find a failed replica, such as:
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
...
i64lee19ia6s \_ ex_service.11 nginx:latest tk1706-ubuntu-1 Shutdown Rejected 7 minutes ago "node is missing network attac…"
...
3. Inspect the failed task with docker inspect. In this example, the docker inspect i64lee19ia6s output shows the error in the Status.Err field:
...
"Status": {
    "Timestamp": "2018-08-24T21:03:37.885405884Z",
    "State": "rejected",
    "Message": "preparing",
    "Err": "node is missing network attachments, ip addresses may be exhausted",
    "ContainerStatus": {
        "ContainerID": "",
        "PID": 0,
        "ExitCode": 0
    },
    "PortStatus": {}
},
...
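Remediation generally means recreating the affected network with a larger subnet before upgrading. The snippet below sketches how to size it: the task and node counts are assumed example values, and it picks the smallest prefix whose usable address count covers the projected post-upgrade consumption:

```shell
# Find the smallest subnet prefix that covers the projected consumption
# (tasks + 1 load-balancer IP per node). Counts are illustrative assumptions.
TASKS=32
NODES=8
NEEDED=$(( TASKS + NODES ))

# Walk from the smallest subnet toward larger ones until it fits.
PREFIX=30
while [ $(( (1 << (32 - PREFIX)) - 2 )) -lt "$NEEDED" ]; do
  PREFIX=$(( PREFIX - 1 ))
done
echo "use a /$PREFIX subnet for $NEEDED addresses"

# The network would then be recreated with something like (hypothetical
# network name and address range):
#   docker network rm ex_net1
#   docker network create -d overlay --subnet 10.0.9.0/$PREFIX ex_net1
```

For 40 projected addresses this selects a /26 (62 usable addresses). In practice you may want an extra margin beyond the exact fit so the network survives future scaling of the cluster.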
The following is a constraint introduced by architectural changes to the Swarm overlay networking when upgrading to Mirantis Container Runtime 18.09 or later. It only applies to this one-time upgrade and to workloads that are using the Swarm overlay driver. Once upgraded to Mirantis Container Runtime 18.09, this constraint does not impact future upgrades.
When upgrading to Mirantis Container Runtime 18.09, new workloads cannot be scheduled on manager nodes until all managers have been upgraded to 18.09 or later. During the manager upgrade, any new workloads scheduled on the managers may fail to schedule until every manager has been upgraded.
To avoid application downtime, reschedule any critical workloads onto Swarm worker nodes before upgrading the managers. Worker nodes and their network functionality continue to operate independently during any upgrades or outages on the managers. Note that this restriction applies only to managers, not to worker nodes.
When running live applications on the cluster during an upgrade, remove them from the nodes being upgraded so as not to create unplanned outages.
Start by draining the node so that services are scheduled on another node and continue running without downtime.
For that, run this command on a manager node:
$ docker node update --availability drain <node>
To upgrade each node individually, follow the instructions for its operating system listed below:
After all manager and worker nodes have been upgraded, the Swarm cluster can be used again to schedule new workloads. If workloads were previously rescheduled off of the managers, they can be rescheduled onto them. If any worker nodes were drained, they can be returned to service by setting --availability active.