Repair a cluster
For an MSR cluster to be healthy, a majority of its replicas (n/2 + 1) must be healthy and able to communicate with the other replicas. This is known as maintaining quorum.
In a scenario where quorum is lost, but at least one replica is still accessible, you can use that replica to repair the cluster. That replica doesn’t need to be completely healthy. The cluster can still be repaired as the MSR data volumes are persisted and accessible.
Repairing the cluster from an existing replica minimizes the amount of data lost. If this procedure doesn’t work, you’ll have to restore from an existing backup.
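The quorum rule and the recovery order described above can be sketched as a small shell helper. This is only an illustration: the function names quorum_size and recovery_path and the strings they print are hypothetical and are not part of the msr CLI. The sketch also assumes that n/2 is rounded down (integer division).

```shell
#!/bin/sh
# Illustrative sketch only: these helpers are hypothetical, not MSR tooling.

# Print the number of healthy replicas required for quorum.
# Assumes n/2 + 1 uses integer division (n/2 rounded down).
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# Print the recovery path for a cluster.
#   $1 = total replicas, $2 = healthy replicas, $3 = accessible replicas
recovery_path() {
  total=$1
  healthy=$2
  accessible=$3
  if [ "$healthy" -ge "$(quorum_size "$total")" ]; then
    echo "quorum-intact"          # majority is healthy, no repair needed
  elif [ "$accessible" -ge 1 ]; then
    echo "emergency-repair"       # quorum lost, but one replica can be used
  else
    echo "restore-from-backup"    # nothing left to repair from
  fi
}

quorum_size 3          # quorum for a 3-replica cluster
quorum_size 5          # quorum for a 5-replica cluster
recovery_path 5 1 1    # quorum lost, one replica still accessible
```

For example, a five-replica cluster needs three healthy replicas. With only one replica left, the helper points to the emergency repair rather than a restore from backup.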
Diagnose an unhealthy cluster
When a majority of replicas are unhealthy, the overall MSR cluster becomes unhealthy, and operations such as docker login and docker pull return an internal server error. The /_ping endpoint of any replica also returns the same error. It is also possible that the MSR web UI is partially or fully unresponsive.
Using the msr db scale command returns an error such as:
Perform an emergency repair
Use the msr db emergency-repair command to repair an
unhealthy MSR cluster from the
This command overrides the standard safety checks that occur when scaling a RethinkDB cluster. This allows RethinkDB to modify the replication factor to the setting most appropriate for the number of rethinkdb-cluster Pods that are connected to the database.
The msr db emergency-repair command is commonly used when the msr db scale command can no longer reliably scale the database. This typically occurs after a prior loss of quorum, which often happens when you scale rethinkdb.cluster.replicaCount without first decommissioning and scaling down the RethinkDB servers. For more information on scaling down RethinkDB servers, refer to Remove replicas from RethinkDB.
Run the following command to perform an emergency repair:
kubectl exec deploy/msr-api -- msr db emergency-repair
docker run -v $(pwd)/values.yml:/config/values.yml -v /var/run/docker.sock:/var/run/docker.sock -it msr-installer scale --replicas 1