MKE Disaster Recovery¶
Disaster recovery procedures should be performed in the following order:

1. MKE disaster recovery (this topic)
2. MSR disaster recovery
MKE disaster recovery¶
In the event half or more manager nodes are lost and cannot be recovered to a healthy state, the system is considered to have lost quorum and can only be restored through the following disaster recovery procedure.
Recover an MKE cluster from an existing backup¶
If MKE is still installed on the swarm, uninstall MKE:
Note
Skip this step when restoring MKE on new machines.
docker container run -it --rm -v /var/run/docker.sock:/var/run/docker.sock \
  mirantis/ucp:<mkeVersion> uninstall-ucp -i
In the above command, replace <mkeVersion> with the MKE version of your backup. You will be prompted to confirm the uninstallation of the MKE instance with the corresponding instance ID.
Example of system response:
INFO[0000] Detected UCP instance tgokpm55qcx4s2dsu1ssdga92
INFO[0000] We're about to uninstall UCP from this Swarm cluster
Do you want to proceed with the uninstall? (y/n):
Restore MKE from the existing backup as described in Backup MKE.
If the swarm exists, restore MKE on a manager node. Otherwise, restore MKE on any node, and the swarm will be created automatically during the restore procedure.
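The restore step can be sketched as a small shell wrapper. This is an illustrative assumption, not an exact transcript of the documented procedure: the `restore` subcommand and stdin-fed backup tarball follow the same invocation pattern as the uninstall command above, but verify the exact flags against the documentation for your MKE version.

```shell
# Sketch: restore MKE from an existing backup tarball (assumed usage).
restore_mke() {
    version=$1   # must match the MKE version the backup was taken with
    backup=$2    # path to the backup tarball

    docker container run --rm -i \
        -v /var/run/docker.sock:/var/run/docker.sock \
        "mirantis/ucp:${version}" restore < "${backup}"
}

# Example (run on a manager node if the swarm still exists,
# otherwise on any node):
# restore_mke 3.7.5 /tmp/mke-backup.tar
```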
Recreate objects within Orchestrators that MKE supports¶
Kubernetes backs up the declarative state of its objects in etcd. Swarm, by contrast, provides no way to export its state to a declarative format: the objects embedded in the Swarm Raft logs are not easily transferable to other nodes or clusters.

To recreate Swarm-related workloads during disaster recovery, you therefore need the original scripts used for deployment. Alternatively, you can recreate workloads manually, using the output of docker inspect commands as a reference.
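The docker inspect approach works best if the output is captured ahead of time, while the cluster is still healthy. The sketch below (the function name and output directory are my own, not part of MKE) dumps every Swarm service definition to a JSON file so it can serve as a reference when recreating workloads after a restore:

```shell
# Sketch: save the JSON definition of every Swarm service, to use as a
# reference when manually recreating workloads after disaster recovery.
# Run periodically on a healthy manager node, before disaster strikes.
capture_swarm_services() {
    outdir=${1:-swarm-definitions}
    mkdir -p "${outdir}"

    for svc in $(docker service ls --quiet); do
        # One JSON file per service ID; docker service inspect emits the
        # full spec, including image, networks, mounts, and constraints.
        docker service inspect "${svc}" > "${outdir}/${svc}.json"
    done
}
```

The saved specs cannot be replayed directly, but they preserve the image tags, environment, and placement constraints needed to rebuild each service by hand or by script.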