The OpenStack services depend on the Galera cluster to remain operable. In case of a cluster outage, the manual steps required to start the cluster and to ensure the necessary access can significantly delay the restoration of services and are prone to operator error. Therefore, to reduce the complexity of the procedure and support greater scalability, MCP provides an automatic way to verify and restore the Galera cluster in your deployment.
This section describes how to verify the status of a Galera cluster and restore it using the Verify and Restore Galera cluster Jenkins pipeline. Use the automatic restoration procedure only if one Galera node is down or the data is corrupted. Otherwise, apply the manual procedure, adjusted to the needs of your deployment, as described in Restore a Galera cluster manually.
Note
This feature is available starting from the MCP 2019.2.5 maintenance update. Before enabling the feature, follow the steps described in Apply maintenance updates.
Note
The Verify and Restore Galera cluster Jenkins pipeline restores the Galera cluster with the provided configuration and does not fix the issues caused by cluster misconfiguration.
To restore the Galera cluster and database automatically:
Log in to the Jenkins web UI.
Open the Verify and Restore Galera cluster pipeline.
Specify the required parameters:
Parameter | Description and values |
---|---|
SALT_MASTER_URL | Add the IP address of your Salt Master node host and the salt-api port. For example, http://172.18.170.27:6969. |
CREDENTIALS_ID | Add credentials_id as credentials for the connection. |
RESTORE_TYPE | Select ONLY_RESTORE if a manual backup has already been performed. The created backup will be used during the restoration. Select BACKUP_AND_RESTORE if a backup has not been performed and must be created during the pipeline run. |
ASK_CONFIRMATION | Set to False if you do not want the pipeline to wait for a manual confirmation before running the restoration. Defaults to True. |
CHECK_TIME_SYNC | Set to False if you do not want the pipeline to verify the time synchronization across the nodes. Defaults to True. |
VERIFICATION_RETRIES | Specify the number of retries of the verification process after the restoration is performed. Increase the value for larger clusters, as such clusters may take more time to come up and synchronize. Defaults to 5. |
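If you want to record or script the parameter set rather than type it into the web UI each time, the following minimal sketch may help. It only assembles a URL for the standard Jenkins `buildWithParameters` remote endpoint and does not send any request. The Jenkins address, the job ID `verify-and-restore-galera`, and the `CREDENTIALS_ID` value are assumptions made for illustration; check the actual values in your deployment.

```python
from urllib.parse import quote, urlencode

# Hypothetical values: replace them with the ones used in your deployment.
JENKINS_URL = "https://jenkins.example.local:8081"
JOB_NAME = "verify-and-restore-galera"  # assumed job ID, not taken from the docs


def build_trigger_url(jenkins_url, job_name, params):
    """Return the buildWithParameters URL for a parameterized Jenkins job."""
    base = jenkins_url.rstrip("/")
    return f"{base}/job/{quote(job_name)}/buildWithParameters?{urlencode(params)}"


# Parameter names come from the table above; defaults are the documented ones.
params = {
    "SALT_MASTER_URL": "http://172.18.170.27:6969",
    "CREDENTIALS_ID": "salt",  # hypothetical credentials ID
    "RESTORE_TYPE": "ONLY_RESTORE",
    "ASK_CONFIRMATION": "True",
    "CHECK_TIME_SYNC": "True",
    "VERIFICATION_RETRIES": "5",
}

url = build_trigger_url(JENKINS_URL, JOB_NAME, params)
print(url)
```

Jenkins expects an authenticated POST request to such a URL, so the sketch is only a starting point. Keeping `ASK_CONFIRMATION` set to `True` preserves the manual confirmation before the restoration starts.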
Click Deploy.
The pipeline workflow:
The verification stage:
Example of a verification report:
CLUSTER STATUS REPORT: 6 expected values, 0 warnings and 1 error found:
[OK ] Cluster status: Primary (Expected: Primary)
[OK ] Master node status: true (Expected: ON or true)
[OK ] Master node status comment: Synced (Expected: Joining or Waiting on SST
or Joined or Synced or Donor)
[OK ] Master node connectivity: true (Expected: ON or true)
[OK ] Average size of local reveived queue: 0.166667 (Expected: below 0.5)
(Value above 0 means that the node cannot apply write-sets as fast
as it receives them, which can lead to replication throttling)
[OK ] Average size of local send queue: 0.010204 (Expected: below 0.5)
(Value above 0 indicate replication throttling or network throughput
issues, such as a bottleneck on the network link.)
[ ERROR] Current cluster size: 2 (Expected: 3)
Errors found.
There's something wrong with the cluster, do you want to run a restore?
Are you sure you want to run a restore? Click to confirm
Proceed or Abort
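When the report is captured from the Jenkins console log, it can be convenient to summarize it programmatically. The sketch below is one possible way to count the `[OK ]` and `[ ERROR]` lines and collect the failed checks. It assumes the line format shown in the example report; the `WARNING` label is a guess, because the example does not contain a warning line. The sample text is a shortened excerpt of the report above.

```python
import re

# Matches status lines such as "[OK ] ..." or "[ ERROR] ...".
# The WARNING label is an assumption; the example report shows no warning line.
LINE_RE = re.compile(r"^\[\s*(OK|WARNING|ERROR)\s*\]\s*(.+)$")


def summarize_report(text):
    """Count status lines per level and collect the messages of failed checks."""
    counts = {"OK": 0, "WARNING": 0, "ERROR": 0}
    failed = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # header, continuation, or explanatory line
        level, message = match.groups()
        counts[level] += 1
        if level == "ERROR":
            failed.append(message)
    return counts, failed


sample = """\
[OK ] Cluster status: Primary (Expected: Primary)
[OK ] Master node status comment: Synced (Expected: Joining or Waiting on SST
or Joined or Synced or Donor)
[OK ] Average size of local send queue: 0.010204 (Expected: below 0.5)
[ ERROR] Current cluster size: 2 (Expected: 3)
"""

counts, failed = summarize_report(sample)
print(counts)
print(failed)
```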
Optional. The backup stage:
Running the Galera database backup pipeline. For the pipeline workflow, see Create an instant backup of a MySQL database automatically.
The restoration stage:
The verification stage:
Verifying the status of the cluster.
After the restoration is finalized, verify that all nodes are back and the cluster is working.
Revert the changes made in the cluster/openstack/database/init.yml file in step 2 of Prepare for a Galera cluster restoration.
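For the final check that all nodes are back, the expected values from the verification report can be applied to the Galera status variables of a node, for example the output of `SHOW STATUS LIKE 'wsrep_%'` loaded into a dictionary. The following is a minimal sketch under that assumption. The mapping between the report lines and the `wsrep_*` variables is an interpretation, not something the pipeline documentation states, and the set of accepted state comments is copied from the example report.

```python
# State comments accepted by the example verification report.
EXPECTED_STATES = {"Joining", "Waiting on SST", "Joined", "Synced", "Donor"}


def check_wsrep_status(status, expected_size=3, queue_limit=0.5):
    """Return a list of problems found in a dict of wsrep_* status values."""
    problems = []
    if status.get("wsrep_cluster_status") != "Primary":
        problems.append("cluster status is not Primary")
    for key in ("wsrep_ready", "wsrep_connected"):
        if str(status.get(key, "")).lower() not in ("on", "true"):
            problems.append(f"{key} is not ON")
    if status.get("wsrep_local_state_comment") not in EXPECTED_STATES:
        problems.append("unexpected node state comment")
    if int(status.get("wsrep_cluster_size", 0)) != expected_size:
        problems.append("cluster size differs from the expected size")
    for key in ("wsrep_local_recv_queue_avg", "wsrep_local_send_queue_avg"):
        if float(status.get(key, 0)) >= queue_limit:
            problems.append(f"{key} is not below {queue_limit}")
    return problems


# Illustrative values only, modeled on the example report.
healthy = {
    "wsrep_cluster_status": "Primary",
    "wsrep_ready": "ON",
    "wsrep_connected": "ON",
    "wsrep_local_state_comment": "Synced",
    "wsrep_cluster_size": "3",
    "wsrep_local_recv_queue_avg": "0.166667",
    "wsrep_local_send_queue_avg": "0.010204",
}
degraded = dict(healthy, wsrep_cluster_size="2")

print(check_wsrep_status(healthy))
print(check_wsrep_status(degraded))
```

An empty list means that the node matches every expectation used in the sketch. Run the check against each node, because a single node reporting the full cluster size does not prove that the other nodes are synced.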