Disaster recovery#

MKE 4k supports disaster recovery, whereby a cluster is bootstrapped either on different infrastructure or with a different node configuration than that of the original backed-up cluster.

Typically, disaster recovery scenarios include:

  • Cluster recovery wherein the original infrastructure is no longer available and the original cluster is irretrievable.
  • Cluster restoration from a backup, wherein some of the nodes may be different or may have different metadata; examples include name, IP address, labels, and annotations. For information, refer to the Create a backup documentation.

Warning

  • You must flatten the nodes that you use for disaster recovery, as they may previously have been part of an MKE 4k or k0s cluster.

  • You must configure the bootstrap node appropriately before you attempt disaster recovery. For example, you may need to set max_user_instances and max_user_watches to higher, non-default values:

    sudo sysctl fs.inotify.max_user_instances=1280
    sudo sysctl fs.inotify.max_user_watches=655360
    
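The sysctl values above apply immediately but are lost on reboot. As a minimal sketch, they can be persisted through the standard sysctl.d drop-in mechanism (the file name below is an arbitrary choice; installing it requires root privileges):

```shell
# Sketch: generate a sysctl drop-in that persists the inotify limits.
# Install it with:
#   sudo cp 99-mke-inotify.conf /etc/sysctl.d/
#   sudo sysctl --system
cat > 99-mke-inotify.conf <<'EOF'
fs.inotify.max_user_instances = 1280
fs.inotify.max_user_watches = 655360
EOF
```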

Important

  • Do not change the DNS name for the load balancer while performing disaster recovery. If you opt to deploy a new load balancer, ensure that it uses the same DNS name as the one in use by the original cluster.

Info

  • During disaster recovery, the original nodes are not accessible in the new environment. Because the entire cluster is initially brought up on a single node, some containers and processes may encounter resource issues. You can resolve these issues either by using a node with ample resources or through operating system configuration.

Disaster recovery is a two-step process:

  1. Bootstrap a new single-node cluster from the backup tar file.
  2. Restore user workloads by joining the remaining manager and worker nodes to the cluster and ensuring that the workloads reach the ready state.

Bootstrap a new cluster#

  1. Prepare a hosts.yaml file that contains the information needed to SSH into the bootstrap node:

      hosts:
      - ssh:
          address: <bootstrap-node-ip-address>
          user: <SSH-user-name>
          port: <SSH-port>
          keyPath: <full-path-to-SSH-private-key>
        role: controller+worker
    
  2. Restore the cluster:

    mkectl restore -l debug --hosts-path <full-path-to-hosts.yaml-file-on-node-from-which-you-run-mkectl> --name <backup-tar-file>
    
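The hosts.yaml file from step 1 can also be generated from shell variables; a minimal sketch, in which all connection details are placeholders you must replace with your own values:

```shell
# Placeholder connection details for the bootstrap node (assumptions).
NODE_ADDR=10.0.0.10
NODE_USER=ubuntu
NODE_PORT=22
NODE_KEY=/home/ubuntu/.ssh/id_rsa

# Render hosts.yaml; indentation is significant in YAML, so keep it as-is.
cat > hosts.yaml <<EOF
hosts:
- ssh:
    address: $NODE_ADDR
    user: $NODE_USER
    port: $NODE_PORT
    keyPath: $NODE_KEY
  role: controller+worker
EOF

# Optionally confirm the node is reachable before running mkectl restore:
# ssh -p "$NODE_PORT" -i "$NODE_KEY" "$NODE_USER@$NODE_ADDR" true
```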

Restore user workloads#

  1. Add the required additional nodes to the mke4.yaml configuration file that was generated as a result of bootstrapping a new cluster.

    Tip

    You can use the newly generated mke4.yaml configuration file to add the remaining nodes to the cluster. You can also use node labels to establish a one-to-one correspondence between nodes in the original cluster and nodes in the recreated cluster.

  2. Run the mkectl apply command.
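As a sketch of step 1, an additional node entry can be appended to the hosts list in mke4.yaml. All values below are placeholders, and the exact schema should be verified against the mke4.yaml file generated during the bootstrap step:

```shell
# Sketch: append one additional worker to the hosts list in mke4.yaml.
# The entry mirrors the hosts.yaml format; address, user, port, and key
# path are placeholder assumptions.
cat >> mke4.yaml <<'EOF'
- ssh:
    address: 10.0.0.11
    user: ubuntu
    port: 22
    keyPath: /home/ubuntu/.ssh/id_rsa
  role: worker
EOF
```

After the node joins the cluster, a label can record its original counterpart, for example `kubectl label node <node-name> recovery/original-node=worker-1` (the label key is illustrative).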