Restore a Galera cluster and database automatically

The Galera cluster ensures that the OpenStack services are operable. In case of a cluster outage, the number of manual steps required to start the cluster and to ensure the necessary access can significantly delay the restoration of services, and the manual procedure is prone to operator errors. Therefore, to reduce the complexity of the procedure and support greater scalability, MCP provides an automatic way to verify and restore the Galera cluster in your deployment.

This section describes how to verify the status of a Galera cluster and restore it using the Verify and Restore Galera cluster Jenkins pipeline. Use the automatic restoration procedure only if one Galera node is down or the data is corrupted. Otherwise, apply the manual procedure adjusted to the needs of your deployment as described in Restore a Galera cluster manually.

Note

This feature is available starting from the MCP 2019.2.5 maintenance update. Before enabling the feature, follow the steps described in Apply maintenance updates.

Note

The Verify and Restore Galera cluster Jenkins pipeline restores the Galera cluster with the provided configuration and does not fix the issues caused by cluster misconfiguration.

To restore the Galera cluster and database automatically:

Log in to the Jenkins web UI.
Open the Verify and Restore Galera cluster pipeline.

Specify the required parameters:

Parameter	Description and values
SALT_MASTER_URL	Add the IP address of your Salt Master node host and the `salt-api` port. For example, `http://172.18.170.27:6969`.
CREDENTIALS_ID	Add `credentials_id` as credentials for the connection.
RESTORE_TYPE	Select `ONLY_RESTORE` if a manual backup has already been performed. The existing backup is used during the restoration. Select `BACKUP_AND_RESTORE` if no backup has been performed and one must be created during the pipeline run.
ASK_CONFIRMATION	Set to `False` if you do not want the pipeline to wait for a manual confirmation before running the restoration. Defaults to `True`.
CHECK_TIME_SYNC	Set to `False` if you do not want the pipeline to verify the time synchronization across the nodes. Defaults to `True`.
VERIFICATION_RETRIES	Specify the number of times to retry the verification process after the restoration is performed. Increase the value for larger clusters, which may take more time to come up and synchronize. Defaults to `5`.
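
As an illustration of the parameters in the table above, the following minimal Python sketch collects them into a dictionary and rejects obviously invalid values. The helper name `build_pipeline_parameters` and the credentials ID `salt` are hypothetical and not part of MCP. The form-encoded output could serve as the body of a parameterized build request, for example through the Jenkins `buildWithParameters` remote endpoint, but the job path in your Jenkins instance is deployment-specific and is not assumed here.

```python
from urllib.parse import urlencode

ALLOWED_RESTORE_TYPES = {"ONLY_RESTORE", "BACKUP_AND_RESTORE"}


def build_pipeline_parameters(
    salt_master_url,
    credentials_id,
    restore_type,
    ask_confirmation=True,    # default taken from the parameter table
    check_time_sync=True,     # default taken from the parameter table
    verification_retries=5,   # default taken from the parameter table
):
    """Return the pipeline parameters as strings (hypothetical helper)."""
    if not salt_master_url.startswith(("http://", "https://")):
        raise ValueError("SALT_MASTER_URL must include the scheme, host, and salt-api port")
    if restore_type not in ALLOWED_RESTORE_TYPES:
        raise ValueError(f"RESTORE_TYPE must be one of {sorted(ALLOWED_RESTORE_TYPES)}")
    if verification_retries < 1:
        raise ValueError("VERIFICATION_RETRIES must be a positive integer")
    return {
        "SALT_MASTER_URL": salt_master_url,
        "CREDENTIALS_ID": credentials_id,
        "RESTORE_TYPE": restore_type,
        "ASK_CONFIRMATION": str(ask_confirmation),
        "CHECK_TIME_SYNC": str(check_time_sync),
        "VERIFICATION_RETRIES": str(verification_retries),
    }


params = build_pipeline_parameters(
    salt_master_url="http://172.18.170.27:6969",
    credentials_id="salt",  # hypothetical credentials ID
    restore_type="BACKUP_AND_RESTORE",
    verification_retries=8,  # raised for a larger cluster
)
# Form-encoded parameters, suitable as the body of a parameterized build request.
print(urlencode(params))
```

Validating the values before submitting them catches a missing URL scheme or a mistyped restore type before the pipeline starts.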

Click Deploy.

The pipeline workflow:

The verification stage:

Obtaining and parsing the result of the mysql.status call.
Formatting and printing a result report to the user.

Example of a verification report:

 CLUSTER STATUS REPORT: 6 expected values, 0 warnings and 1 error found:

 [OK     ] Cluster status: Primary (Expected: Primary)
 [OK     ] Master node status: true (Expected: ON or true)
 [OK     ] Master node status comment: Synced (Expected: Joining or Waiting on SST
           or Joined or Synced or Donor)
 [OK     ] Master node connectivity: true (Expected: ON or true)
 [OK     ] Average size of local reveived queue: 0.166667 (Expected: below 0.5)
           (Value above 0 means that the node cannot apply write-sets as fast
           as it receives them, which can lead to replication throttling)
 [OK     ] Average size of local send queue: 0.010204 (Expected: below 0.5)
           (Value above 0 indicate replication throttling or network throughput
           issues, such as a bottleneck on the network link.)

 [  ERROR] Current cluster size: 2 (Expected: 3)

 Errors found.

There's something wrong with the cluster, do you want to run a restore?

 Are you sure you want to run a restore? Click to confirm
 Proceed or Abort
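
The checks in the report above can be sketched in Python as follows. This is not the pipeline's implementation. It is a minimal illustration that assumes the report lines correspond to the standard Galera `wsrep_*` status variables and that the status values arrive as a dictionary of strings. The function names, the "Node" labels, and the `No errors found.` summary line are hypothetical.

```python
def evaluate_status(status, expected_size=3, queue_limit=0.5):
    """Return a list of (ok, message) tuples, one per check."""
    allowed_states = ("Joining", "Waiting on SST", "Joined", "Synced", "Donor")
    recv_avg = float(status["wsrep_local_recv_queue_avg"])
    send_avg = float(status["wsrep_local_send_queue_avg"])
    size = int(status["wsrep_cluster_size"])
    return [
        (status["wsrep_cluster_status"] == "Primary",
         f"Cluster status: {status['wsrep_cluster_status']} (Expected: Primary)"),
        (str(status["wsrep_ready"]).lower() in ("on", "true"),
         f"Node status: {status['wsrep_ready']} (Expected: ON or true)"),
        (status["wsrep_local_state_comment"] in allowed_states,
         f"Node status comment: {status['wsrep_local_state_comment']}"),
        (str(status["wsrep_connected"]).lower() in ("on", "true"),
         f"Node connectivity: {status['wsrep_connected']} (Expected: ON or true)"),
        (recv_avg < queue_limit,
         f"Average size of local received queue: {recv_avg} (Expected: below {queue_limit})"),
        (send_avg < queue_limit,
         f"Average size of local send queue: {send_avg} (Expected: below {queue_limit})"),
        (size == expected_size,
         f"Current cluster size: {size} (Expected: {expected_size})"),
    ]


def format_report(results):
    """Render the check results in a layout similar to the report above."""
    lines = []
    for ok, message in results:
        label = "[OK     ]" if ok else "[  ERROR]"
        lines.append(f"{label} {message}")
    has_errors = any(not ok for ok, _ in results)
    lines.append("Errors found." if has_errors else "No errors found.")
    return "\n".join(lines)


# Sample values taken from the example report; the dictionary shape is an assumption.
sample = {
    "wsrep_cluster_status": "Primary",
    "wsrep_ready": "true",
    "wsrep_local_state_comment": "Synced",
    "wsrep_connected": "true",
    "wsrep_local_recv_queue_avg": "0.166667",
    "wsrep_local_send_queue_avg": "0.010204",
    "wsrep_cluster_size": "2",
}
results = evaluate_status(sample)
print(format_report(results))
```

With the sample values, only the cluster size check fails, which matches the single error in the example report.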

Optional. The backup stage:

Running the Galera database backup pipeline. For the pipeline workflow, see Create an instant backup of a MySQL database automatically.

The restoration stage:

If Proceed is selected, the restoration stage continues. Otherwise, the pipeline aborts.
The last node that was shut down is used as the source of truth.

The verification stage:

Verifying the status of the cluster.
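
The effect of the `VERIFICATION_RETRIES` parameter on this stage can be pictured with a short Python sketch. It assumes that the verification is repeated until it succeeds or the retry budget is exhausted. The function name, the delay between attempts, and the simulated check are hypothetical, since the interval the pipeline actually uses is not documented here.

```python
import time


def verify_with_retries(check, retries=5, delay_seconds=10.0, sleep=time.sleep):
    """Call check() up to `retries` times and return True on the first success."""
    for attempt in range(1, retries + 1):
        if check():
            return True
        if attempt < retries:
            # Give the nodes time to rejoin and synchronize before the next attempt.
            sleep(delay_seconds)
    return False


# Simulated cluster that passes verification on the third attempt.
outcomes = iter([False, False, True])
result = verify_with_retries(lambda: next(outcomes), retries=5, delay_seconds=0)
print(result)
```

A larger cluster may need more attempts before every node reports a synchronized state, which is why the retry count is configurable.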

After the restoration is finalized, verify that all nodes are back and the cluster is working.
Revert the changes made to the cluster/openstack/database/init.yml file in step 2 of Prepare for a Galera cluster restoration.

updated: 2025-01-10 08:56
