Disaster recovery overview

Mirantis Secure Registry is a clustered application. You can join multiple replicas for high availability.

For an MSR cluster to be healthy, a majority of its replicas (n/2 + 1, with n/2 rounded down) must be healthy and able to communicate with one another. This is known as maintaining quorum.
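The majority rule above can be illustrated with a minimal sketch. The function names `quorum_size` and `has_quorum` are hypothetical helpers written for this illustration and are not part of MSR or its tooling. The sketch assumes that n/2 is rounded down, as in integer division.

```python
def quorum_size(total_replicas: int) -> int:
    """Return the smallest number of healthy replicas that forms a majority.

    Hypothetical helper for illustration only; not an MSR API.
    """
    if total_replicas < 1:
        raise ValueError("a cluster needs at least one replica")
    # Integer division rounds n/2 down, so 5 replicas need 3 and 4 replicas need 3.
    return total_replicas // 2 + 1


def has_quorum(healthy_replicas: int, total_replicas: int) -> bool:
    """Return True when the healthy replicas still form a majority."""
    return healthy_replicas >= quorum_size(total_replicas)


if __name__ == "__main__":
    # Show how many replica failures each cluster size can tolerate.
    for n in (1, 3, 4, 5, 7):
        needed = quorum_size(n)
        print(f"replicas={n} quorum={needed} tolerated_failures={n - needed}")
```

By this arithmetic, a four-replica cluster tolerates one failure, the same as a three-replica cluster, which is why odd replica counts are the usual choice.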

The three possible failure scenarios are detailed below.

Replica is unhealthy but cluster maintains quorum

One or more replicas are unhealthy, but the overall majority (n/2 + 1) is still healthy and able to communicate with one another.

In this example, the MSR cluster has five replicas, but one node has stopped working and another has problems with the MSR overlay network.

Even though these two replicas are unhealthy, a majority of the replicas (three of five) are still working, so the cluster remains healthy.

You should still repair the unhealthy replicas, or remove them from the cluster and join new ones.

The majority of replicas are unhealthy

A majority of replicas are unhealthy, so the cluster has lost quorum, but at least one replica is still healthy, or at least the MSR data volumes on one replica are still accessible.

In this scenario, the MSR cluster is unhealthy, but because one replica is still running, you can repair the cluster without restoring from a backup, which minimizes data loss.

All replicas are unhealthy

This is the total disaster scenario: all MSR replicas are lost, and the data volumes for all MSR replicas are corrupted or lost.

In this scenario, you must restore MSR from an existing backup. Restoring from a backup should be a last resort, because an emergency repair, when it is still possible, may prevent some data loss.
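As a rough summary, the three scenarios can be expressed as a decision function. This is only a sketch under stated assumptions: `classify_failure` and its parameters are hypothetical names invented for this illustration, the returned strings paraphrase the guidance on this page, and the replica counts are assumed to come from your own monitoring.

```python
def classify_failure(total_replicas: int,
                     healthy_replicas: int,
                     volumes_accessible: bool = False) -> str:
    """Map replica health to the recovery action described on this page.

    Hypothetical helper for illustration only; not an MSR API.
    volumes_accessible: True if the MSR data volumes of at least one
    replica can still be read, even when no replica is healthy.
    """
    if not 0 <= healthy_replicas <= total_replicas or total_replicas < 1:
        raise ValueError("invalid replica counts")

    quorum = total_replicas // 2 + 1  # majority, with n/2 rounded down

    if healthy_replicas == total_replicas:
        return "healthy: no action needed"
    if healthy_replicas >= quorum:
        # Scenario 1: quorum is maintained.
        return "quorum maintained: repair or replace unhealthy replicas"
    if healthy_replicas >= 1 or volumes_accessible:
        # Scenario 2: quorum is lost, but data is still reachable.
        return "quorum lost: repair the cluster from the surviving replica"
    # Scenario 3: nothing survives.
    return "all replicas lost: restore from backup"


if __name__ == "__main__":
    print(classify_failure(5, 3))  # the five-replica example with two failures
    print(classify_failure(5, 1))
    print(classify_failure(5, 0))
```

The checks are ordered from least to most destructive, so the function reaches the restore-from-backup branch only when no healthy replica and no accessible data volume remain.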