Repair a single replica¶
When one or more MSR replicas are unhealthy but the overall majority (n/2 + 1) is healthy and able to communicate with one another, your MSR cluster is still functional and healthy.
Given that the MSR cluster is healthy, there is no need to execute a disaster recovery procedure, such as restoring from a backup. Instead, you should:
Remove the unhealthy replicas from the MSR cluster.
Join new replicas to make MSR highly available.
The order in which you perform these operations is important, as an MSR cluster requires a majority of replicas to be healthy at all times. If you join more replicas before removing the ones that are unhealthy, your MSR cluster might become unhealthy.
Split-brain scenario¶
To understand why you should remove unhealthy replicas before joining new ones, imagine you have a five-replica MSR deployment, and something goes wrong with the overlay network connecting the replicas, causing them to be separated into two groups.
Because the cluster originally had five replicas, it can work as long as three replicas are still healthy and able to communicate (5 / 2 + 1 = 3, using integer division). Even though the network separated the replicas into two groups, MSR is still healthy.
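The majority arithmetic above relies on integer division, which is easy to misread. The following is a minimal shell sketch of that calculation; the `quorum` function name is illustrative only and is not part of MSR or kubectl.

```shell
# quorum N: print the smallest majority of an N-replica cluster,
# that is, floor(N / 2) + 1. Shell arithmetic already truncates the division.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Show the majority and the number of replicas that may fail
# for a few common cluster sizes.
for n in 3 5 7; do
  q=$(quorum "$n")
  echo "$n replicas: majority is $q, tolerates $(( n - q )) unhealthy"
done
```

For a five-replica cluster this prints a majority of 3, which matches the example above.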
If at this point you join a new replica instead of fixing the network problem or removing the two replicas that got isolated from the rest, it is possible that the new replica ends up on the side of the network partition that has fewer replicas.
When this happens, both groups have the minimum number of replicas needed to establish a cluster. This is known as a split-brain scenario, because both groups can now accept writes and their histories start diverging, making the two groups effectively two different clusters.
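The scenario can be traced with a few lines of shell. This is a sketch of the simplified model described above, in which each group compares its size against the majority of the original five-replica cluster; it does not reproduce how MSR actually tracks membership, and the function names are hypothetical.

```shell
# Majority of the original five-replica cluster, as in the scenario above.
original_size=5
majority=$(( original_size / 2 + 1 ))

# has_majority SIZE: succeed if a group of SIZE replicas reaches the majority.
has_majority() {
  [ "$1" -ge "$majority" ]
}

# count_viable SIZE...: print how many of the given groups reach the majority.
count_viable() {
  count=0
  for size in "$@"; do
    if has_majority "$size"; then
      count=$(( count + 1 ))
    fi
  done
  echo "$count"
}

echo "after the partition (3 | 2): $(count_viable 3 2) group(s) can act as a cluster"
echo "after joining on the smaller side (3 | 3): $(count_viable 3 3) group(s) can act as a cluster"
```

With more than one viable group, each side can accept writes independently, which is the split-brain condition.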
Scale Helm deployment¶
Important
With MSR 3.0 you can configure the number of replicas; however, you cannot add or remove individual replicas.
To scale your Helm deployment, first list the deployments to find the name of your MSR deployment:
kubectl get deployment
Next, run the following command to set the number of replicas, which adds or removes replicas from your MSR deployment as needed:
kubectl scale deployment --replicas=3 <deployment-name>
Example:
kubectl scale deployment --replicas=3 msr-api
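After scaling, it can be useful to wait until the new replica count is actually available. The sketch below wraps the command above together with `kubectl rollout status`; the `scale_and_wait` helper, the `KUBECTL` override variable, and the 120-second timeout are assumptions made for illustration, not part of MSR.

```shell
# Allow the kubectl binary to be overridden; defaults to kubectl on the PATH.
KUBECTL="${KUBECTL:-kubectl}"

# scale_and_wait DEPLOYMENT REPLICAS
# Hypothetical helper: set the replica count, then wait for the rollout.
scale_and_wait() {
  deployment="$1"
  replicas="$2"
  # Set the desired number of replicas on the deployment.
  $KUBECTL scale deployment --replicas="$replicas" "$deployment" || return 1
  # Block until the deployment reports the rollout as complete,
  # or fail after the timeout.
  $KUBECTL rollout status "deployment/$deployment" --timeout=120s
}

# Usage (deployment name taken from the example above):
#   scale_and_wait msr-api 3
```

If the rollout does not finish within the timeout, the helper returns a non-zero status, so you can inspect the pods before making further changes.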
For comprehensive information on how to scale MSR on Helm up and down as a Kubernetes application, refer to the Kubernetes documentation, Running Multiple Instances of Your App.
See also