Disaster recovery overview

Mirantis Secure Registry (MSR) uses RethinkDB to store metadata. RethinkDB is a clustered application, so a highly available configuration requires a cluster of three or more servers, with every table configured to have three or more replicas.

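For a RethinkDB table to be healthy, a majority of its replicas, that is, floor(n/2) + 1 of its n replicas, must be available. For example, a table with three replicas needs two available replicas, and a table with five replicas needs three. As such, there are three possible failure scenarios: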

Failure scenarios

Minority of replicas are unhealthy

One or more table replicas are unhealthy, but the overall majority (floor(n/2) + 1) remains healthy and its members can still communicate with one another.

If the primary replica fails, then as long as more than half of the table's voting replicas and more than half of the voting replicas for each shard remain available, RethinkDB arbitrarily selects one of those available voting replicas as the new primary.
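The failover condition above can be expressed as a small check. The following Python sketch is illustrative only: `VotingReplicas` and `can_elect_new_primary` are hypothetical names, not part of MSR or the RethinkDB driver, and the sketch assumes you already know the total and available voting replica counts for the table and for each of its shards.

```python
from dataclasses import dataclass


@dataclass
class VotingReplicas:
    """Hypothetical model: voting replica counts for a table or one shard."""
    total: int
    available: int

    def has_majority(self) -> bool:
        # "More than half" means strictly greater than total / 2,
        # e.g. 2 of 3 or 3 of 5. Integer arithmetic avoids float rounding.
        return 2 * self.available > self.total


def can_elect_new_primary(table: VotingReplicas, shards: list[VotingReplicas]) -> bool:
    """Return True if a new primary can be selected automatically.

    Both the table as a whole and every one of its shards must keep
    a strict majority of their voting replicas.
    """
    return table.has_majority() and all(s.has_majority() for s in shards)


# Three-replica table with a single shard, after losing one replica.
print(can_elect_new_primary(VotingReplicas(3, 2), [VotingReplicas(3, 2)]))  # True
# The same table after losing a second replica.
print(can_elect_new_primary(VotingReplicas(3, 1), [VotingReplicas(3, 1)]))  # False
```

Note that the table-level majority alone is not enough: if any single shard loses half or more of its voting replicas, the check fails.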

Majority of replicas are unhealthy

Half or more voting replicas of a shard are lost and cannot be reconnected.

You can still perform an emergency repair of the cluster without restoring from a backup, which minimizes the amount of data lost. Refer to mirantis/msr db emergency-repair for more detail.

All replicas are unhealthy

This is a complete disaster scenario: all replicas are lost, and all associated data volumes are lost or corrupted. In this scenario, you must restore MSR from a backup. Because restoring from a backup should be a last resort, first attempt an emergency repair, which can mitigate data loss. Refer to Restore from an MSR backup for more information.
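The three scenarios can be summarized as a decision based on how many of a table's n replicas remain healthy, using the majority threshold floor(n/2) + 1. The following Python sketch is a simplified model, not an MSR tool: `majority` and `classify` are hypothetical helpers, and the sketch treats a table as a single set of replicas rather than evaluating each shard separately.

```python
def majority(n: int) -> int:
    """Smallest number of replicas that forms a majority of n: floor(n/2) + 1."""
    return n // 2 + 1


def classify(total: int, healthy: int) -> str:
    """Hypothetical helper: map replica counts to a failure scenario."""
    if not 0 <= healthy <= total:
        raise ValueError("healthy must be between 0 and total")
    if healthy == total:
        return "healthy"
    if healthy >= majority(total):
        # A majority survives, so a new primary can be selected automatically.
        return "minority unhealthy: automatic failover"
    if healthy > 0:
        # Majority lost, but some replicas remain to repair from.
        return "majority unhealthy: emergency repair"
    return "all unhealthy: restore from backup"


# Walk a three-replica table from fully healthy down to total loss.
for healthy in range(3, -1, -1):
    print(healthy, classify(3, healthy))
```

For a three-replica table, losing one replica still leaves a majority, losing two requires an emergency repair, and losing all three requires a restore from backup.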