mirantis/dtr emergency-repair

Recover MSR from loss of quorum

Usage

docker run -it --rm mirantis/dtr \
    emergency-repair [command options]

Description

This command repairs an MSR cluster that has lost quorum by reverting the cluster to a single MSR replica.

There are three ways to recover an unhealthy MSR cluster, depending on how many replicas are still healthy:

  1. If the majority of replicas are healthy, remove the unhealthy nodes from the cluster, and join new ones for high availability.

  2. If the majority of replicas are unhealthy, use this command to revert your cluster to a single MSR replica.

  3. If you can’t repair your cluster to a single replica, you’ll have to restore from an existing backup, using the restore command.
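The choice between these three options can be sketched as a small shell helper. This is an illustrative sketch only, not part of the MSR tooling: the function name `recovery_step` is hypothetical, and it assumes that quorum requires a strict majority of healthy replicas and that a cluster with no usable replica falls into the restore case.

```shell
#!/bin/sh
# Pick a recovery path from the replica counts.
# Assumption: quorum needs a strict majority (healthy * 2 > total).
# Assumption: with no usable replica left, only a restore is possible.
recovery_step() {
    total=$1
    healthy=$2
    if [ "$healthy" -le 0 ]; then
        echo "restore"            # option 3: restore from a backup
    elif [ $((healthy * 2)) -gt "$total" ]; then
        echo "remove-and-join"    # option 1: quorum is still intact
    else
        echo "emergency-repair"   # option 2: quorum is lost
    fi
}

recovery_step 3 2   # prints: remove-and-join
recovery_step 3 1   # prints: emergency-repair
recovery_step 3 0   # prints: restore
```

Note that an even split, such as two healthy replicas out of four, is treated here as lost quorum.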

When you run this command, an MSR replica of your choice is repaired and turned into the only replica in the whole MSR cluster. The containers for all the other MSR replicas are stopped and removed. When you use the --prune option, the volumes for these replicas are also deleted.

After repairing the cluster, you should use the join command to add more MSR replicas for high availability.
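The repair-then-join sequence might look like the following dry-run sketch, which prints the two commands instead of running them. Every value (URL, username, replica ID, node name) is a hypothetical placeholder, and `ca.pem` is assumed to already be in the current directory. The `--ucp-node` flag belongs to the join command and is not documented on this page, so check it against the join reference before relying on it.

```shell
#!/bin/sh
# Dry-run sketch: the functions print the commands rather than run them.
# Every value below is a hypothetical placeholder.
MKE_URL="https://mke.example.com"
MKE_USER="admin"
REPLICA_ID="5eb9459a7832"   # the replica to keep
NEW_NODE="worker-2"         # node that will host the new replica

# Step 1: revert the cluster to the single chosen replica.
repair_cmd() {
    printf '%s ' docker run -it --rm mirantis/dtr emergency-repair \
        --ucp-url "$MKE_URL" \
        --ucp-username "$MKE_USER" \
        --ucp-ca '"$(cat ca.pem)"' \
        --existing-replica-id "$REPLICA_ID"
    printf '\n'
}

# Step 2: add a replica back for high availability.
# --ucp-node is assumed from the join command, not from this page.
join_cmd() {
    printf '%s ' docker run -it --rm mirantis/dtr join \
        --ucp-url "$MKE_URL" \
        --ucp-username "$MKE_USER" \
        --ucp-ca '"$(cat ca.pem)"' \
        --ucp-node "$NEW_NODE" \
        --existing-replica-id "$REPLICA_ID"
    printf '\n'
}

repair_cmd
join_cmd
```

Printing the commands first makes it easy to review the replica ID before anything is removed, since the repair stops and removes the containers of every other replica.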

Options

| Option | Environment variable | Description |
|---|---|---|
| --debug | $DEBUG | Enable debug mode for additional logs. |
| --existing-replica-id | $MSR_REPLICA_ID | The ID of an existing MSR replica. To add, remove, or modify MSR, you must connect to the database of an existing healthy replica. |
| --help-extended | $MSR_EXTENDED_HELP | Display extended help text for a given command. |
| --overlay-subnet | $MSR_OVERLAY_SUBNET | The subnet used by the dtr-ol overlay network. Example: 10.0.0.0/24. For high availability, MSR creates an overlay network between MKE nodes. This flag allows you to choose the subnet for that network. Make sure the subnet you choose is not used on any machine where MSR replicas are deployed. |
| --prune | $PRUNE | Delete the data volumes of all unhealthy replicas. With this option, the volume of the MSR replica you’re restoring is preserved but the volumes for all other replicas are deleted. This has the same result as completely uninstalling MSR from those replicas. |
| --ucp-ca | $UCP_CA | Use a PEM-encoded TLS CA certificate for MKE. Download the MKE TLS CA certificate from https:// /ca, and use --ucp-ca "$(cat ca.pem)". |
| --ucp-insecure-tls | $UCP_INSECURE_TLS | Disable TLS verification for MKE. The connection still uses TLS but always trusts the certificate presented by MKE, which can lead to MITM (man-in-the-middle) attacks. For production deployments, use --ucp-ca "$(cat ca.pem)" instead. |
| --ucp-password | $UCP_PASSWORD | The MKE administrator password. |
| --ucp-url | $UCP_URL | The MKE URL including domain and port. |
| --ucp-username | $UCP_USERNAME | The MKE administrator username. |
| --yes, -y | $YES | Answer yes to any prompts. |
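The guidance on --ucp-ca versus --ucp-insecure-tls can be enforced in a wrapper script. The following is a minimal sketch under stated assumptions: `pick_tls_flag` is a hypothetical helper name, and the explicit `yes` argument is an illustrative opt-in, not an MSR feature. It selects --ucp-ca when a readable CA file exists and refuses to fall back to --ucp-insecure-tls unless the caller opts in.

```shell
#!/bin/sh
# Choose the TLS option for the MKE connection.
# Prints the flag name to use. Fails when no CA file is readable and
# insecure mode was not explicitly allowed by the caller.
# pick_tls_flag and its "yes" opt-in are hypothetical, for illustration.
pick_tls_flag() {
    ca_file=$1
    allow_insecure=${2:-no}
    if [ -r "$ca_file" ]; then
        echo "--ucp-ca"
    elif [ "$allow_insecure" = "yes" ]; then
        echo "--ucp-insecure-tls"
    else
        echo "no readable CA file at $ca_file; refusing to skip TLS verification" >&2
        return 1
    fi
}

# Usage: report which option a wrapper script would pass on.
if flag=$(pick_tls_flag ca.pem yes); then
    echo "TLS option: $flag"
fi
```

Making the insecure path an explicit opt-in keeps a missing `ca.pem` from silently downgrading a production repair to unverified TLS.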