Disaster recovery#
MKE 4k supports disaster recovery, whereby a cluster is bootstrapped either on a different infrastructure or with a different node configuration than the original backed-up cluster.
Typically, disaster recovery scenarios include:
- Cluster recovery wherein the original infrastructure is no longer available and the original cluster is irretrievable.
- Cluster restoration from a backup, wherein some of the nodes may be different or may have different metadata; examples include name, IP address, labels, and annotations. For information, refer to the Create a backup documentation.
Warning
- You must flatten the nodes that you use for disaster recovery, as they may previously have been part of an MKE 4k or k0s cluster.
- You must configure the bootstrap node appropriately before you attempt disaster recovery. For example, you may need to set max_user_instances and max_user_watches to higher, non-default values:

  sudo sysctl fs.inotify.max_user_instances=1280
  sudo sysctl fs.inotify.max_user_watches=655360
Important
- Do not change the DNS name for the load balancer while performing disaster recovery. If you opt to deploy a new load balancer, ensure that it uses the same DNS name as the one in use by the original cluster.
Info
- During disaster recovery, the original nodes are not accessible in the new environment. As such, because the entire cluster will initially be brought up on a single node, some of the containers and processes may encounter resource issues. You can resolve these issues either by using a highly resourced node or through operating system configuration.
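Before bootstrapping, you can verify the inotify limits mentioned in the warning above by reading them from procfs. The following is a sketch for Linux hosts; the thresholds are the example values given earlier, not MKE-mandated requirements:

```shell
# Read the current inotify limits from procfs (equivalent to `sysctl -n`).
instances=$(cat /proc/sys/fs/inotify/max_user_instances)
watches=$(cat /proc/sys/fs/inotify/max_user_watches)
echo "max_user_instances=${instances}"
echo "max_user_watches=${watches}"

# Flag limits that fall below the example values from the warning above.
[ "${instances}" -ge 1280 ]   || echo "max_user_instances below 1280; raise it with sysctl"
[ "${watches}" -ge 655360 ]   || echo "max_user_watches below 655360; raise it with sysctl"
```

To persist raised limits across reboots, place the settings in a file under /etc/sysctl.d/ rather than setting them only at runtime.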
Disaster recovery is a two-step process:
- Bootstrap a new cluster on a single node and create a single node cluster from the backup tar file.
- Restore user workloads by joining the remaining manager and worker nodes to the cluster and ensuring that the workloads achieve ready state.
Bootstrap a new cluster#
- Prepare a hosts.yaml file that contains the information needed to SSH into the bootstrap node:

  hosts:
  - ssh:
      address: <bootstrap-node-ip-address>
      user: <SSH-user-name>
      port: <SSH-port>
      keyPath: <full-path-to-SSH-private-key>
    role: controller+worker

- Restore the cluster:

  mkectl restore -l debug --hosts-path <full-path-to-hosts.yaml-file-on-node-from-which-you-run-mkectl> --name <backup-tar-file>
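As an illustration, the hosts.yaml file described above can be generated from shell variables. This is a sketch only; the address, user, port, and key path are placeholder values that you must replace with your own:

```shell
# Placeholder connection details for the bootstrap node -- substitute your own.
BOOTSTRAP_IP=192.0.2.10
SSH_USER=admin
SSH_PORT=22
SSH_KEY=$HOME/.ssh/id_rsa

# Write hosts.yaml matching the structure shown above.
cat > hosts.yaml <<EOF
hosts:
- ssh:
    address: ${BOOTSTRAP_IP}
    user: ${SSH_USER}
    port: ${SSH_PORT}
    keyPath: ${SSH_KEY}
  role: controller+worker
EOF
```

Pass the absolute path of the resulting file to the restore command through the --hosts-path flag.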
Restore user workloads#
- Add the required additional nodes to the mke4.yaml configuration file that was generated as a result of bootstrapping a new cluster.

  Tip

  You can use the newly generated mke4.yaml configuration file to add the remaining nodes to the cluster, as well as to construct a one-to-one correspondence between nodes in the original cluster and nodes in the recreated cluster through the use of node labels.
- Run the mkectl apply command.