Workflows of the OpenStack database backup and restoration
This section provides technical details about the internal implementation of the automated backup and restoration routines built into MOSK. The information below is helpful for troubleshooting issues related to these routines and for understanding the impact they have on a running cloud.
Backup workflow
The OpenStack database backup workflow consists of the following phases.
Backup phase 1
The mariadb-phy-backup job is responsible for:
Performing basic sanity checks and choosing the right node for the backup
Verifying the wsrep status and changing the wsrep_desync parameter settings
Checking backup integrity (ensuring correct hash sums)
Managing the mariadb-phy-backup-runner pod
If enabled, synchronizing the local backup storage with the remote S3 storage
During the first backup phase, the following actions take place:
The mariadb-phy-backup pod starts on the node where the mariadb-server replica with the highest number in its name runs. For example, if the MariaDB server pods are named mariadb-server-0, mariadb-server-1, and mariadb-server-2, the mariadb-phy-backup pod starts on the same node as mariadb-server-2.
The backup process verifies the hash sums of existing backup files based on ConfigMap information:
If the verification fails and synchronization with the remote S3 storage is enabled, the process checks the hash sums of remote backups as well. If the remote backups are valid, they are downloaded.
If the hash sums are incorrect for both local and remote backups, the backup job fails.
If no ConfigMap exists, these hash sum checks are skipped.
Sanity check: verification of the Kubernetes status and wsrep status of each MariaDB pod. If some pods have wrong statuses, the backup job fails unless the --allow-unsafe-backup parameter is passed to the main script in the Kubernetes backup job.
Note
Since MOSK 22.4, the --allow-unsafe-backup functionality is removed from the product for security and backup procedure simplification purposes.
Mirantis does not recommend setting the --allow-unsafe-backup parameter unless it is absolutely required. To ensure the consistency of a backup, verify that the MariaDB Galera cluster is in a working state before you proceed with the backup.
Desynchronize the replica from the Galera cluster. The script connects to the target replica and sets the wsrep_desync variable to ON. Then, the replica stops receiving write-sets and its wsrep status changes to Donor/Desynced. The Kubernetes health check of that mariadb-server pod fails and the Kubernetes status of that pod becomes Not ready. If the pod has the primary label, the MariaDB Controller sets the backup label on it and the pod is removed from the endpoints list of the MariaDB service.
Verify that there is enough space in the /var/backup folder to perform the backup. The amount of available space in the folder must exceed <DB-SIZE> * <MARIADB-BACKUP-REQUIRED-SPACE-RATIO> in KB.
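The free-space condition from the last step can be illustrated with a minimal Python sketch. It is not the MOSK script itself: the function names are hypothetical, and the ratio value in the usage note is an arbitrary example rather than a product default.

```python
import shutil


def free_space_kb(path: str) -> float:
    """Return the free space of the filesystem holding `path`, in KB."""
    return shutil.disk_usage(path).free / 1024


def has_enough_backup_space(db_size_kb: float,
                            required_space_ratio: float,
                            free_kb: float) -> bool:
    """Apply the documented condition: free space must exceed
    <DB-SIZE> * <MARIADB-BACKUP-REQUIRED-SPACE-RATIO>, all values in KB."""
    return free_kb > db_size_kb * required_space_ratio
```

For example, with a database of 1,000,000 KB and an assumed ratio of 1.2, has_enough_backup_space(1_000_000, 1.2, free_space_kb("/var/backup")) returns True only if more than 1,200,000 KB are free in the backup folder.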
Backup phase 2
The mariadb-phy-backup pod performs the backup using the mariabackup tool.
The script puts the backed-up replica back in sync with the Galera cluster by setting wsrep_desync to OFF and waits for the replica to become Ready in Kubernetes.
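The desynchronize, back up, resynchronize sequence can be sketched in Python as a context manager. This is an illustration under assumptions, not the MOSK implementation: the execute callable is a hypothetical stand-in for whatever runs SQL on the target replica, and restoring wsrep_desync in a finally block is a design choice of this sketch rather than documented behavior. The SET GLOBAL wsrep_desync statements themselves are standard MariaDB Galera syntax.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def desynced(execute: Callable[[str], None]) -> Iterator[None]:
    """Keep the replica desynced from the Galera cluster while the
    body of the `with` block runs.

    `execute` is a hypothetical callable that runs one SQL statement
    on the target replica.
    """
    execute("SET GLOBAL wsrep_desync = ON")
    try:
        yield
    finally:
        # Put the replica back in sync even if the backup step raised.
        execute("SET GLOBAL wsrep_desync = OFF")
```

The backup step would run inside the with desynced(...) block, so the replica is returned to the cluster regardless of the backup outcome.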
Backup phase 3
The script calculates hash sums for backup files and stores them in a special ConfigMap.
If the number of existing backups exceeds the value of the MARIADB_BACKUPS_TO_KEEP job parameter, the script removes the oldest backups to maintain the allowed limit.
If enabled, the script synchronizes the local backup storage with the remote S3 storage.
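The retention rule can be illustrated with a short Python sketch. It assumes that backup names sort chronologically (for example, because they start with a timestamp), which the source does not state, and the function name is hypothetical.

```python
from typing import List


def backups_to_remove(backup_names: List[str], backups_to_keep: int) -> List[str]:
    """Return the oldest backups that exceed the retention limit.

    Assumption: sorting the names lexicographically orders the
    backups from oldest to newest.
    """
    if backups_to_keep < 0:
        raise ValueError("backups_to_keep must not be negative")
    ordered = sorted(backup_names)
    excess = len(ordered) - backups_to_keep
    return ordered[:excess] if excess > 0 else []
```

With three backups and a limit of two, only the oldest backup is selected for removal. When the number of backups does not exceed the limit, nothing is removed.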
Restoration workflow
The OpenStack database restoration workflow consists of the following phases.
Restoration phase 1
The mariadb-phy-restore job launches the mariadb-phy-restore pod. This pod starts with the mariadb-server PVC with the highest number in its name. This PVC is mounted to the /var/lib/mysql folder, and the backup PVC (or the local filesystem if the hostpath backend is configured) is mounted to /var/backup.
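Selecting the object with the highest number in its name, as is done for both the backup node and the restore PVC, can be sketched in Python as follows. The function name is hypothetical, and the names in the usage note follow the mariadb-server-N pattern from this section purely for illustration. Actual PVC names may differ.

```python
import re
from typing import Iterable


def highest_ordinal(names: Iterable[str]) -> str:
    """Return the name whose trailing "-<number>" suffix is the largest.

    The suffix is compared numerically, so "-10" ranks above "-9".
    """
    def ordinal(name: str) -> int:
        match = re.search(r"-(\d+)$", name)
        if match is None:
            raise ValueError(f"no numeric suffix in {name!r}")
        return int(match.group(1))

    return max(names, key=ordinal)
```

For the names mariadb-server-0, mariadb-server-1, and mariadb-server-2, the function returns mariadb-server-2.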
The mariadb-phy-restore pod contains the main restore script, which is responsible for:
Scaling the mariadb-server StatefulSet
Verifying the statuses of mariadb-server pods
Managing the openstack-mariadb-phy-restore-runner pods
Checking backup integrity (ensuring correct hash sums)
Caution
During the restoration, the database is not available for OpenStack services, which means a complete outage of all OpenStack services.
During the first phase, the following actions take place:
The restoration process verifies the hash sums of existing backup files based on ConfigMap information:
If the verification fails and synchronization with the remote S3 storage is enabled, the process checks the hash sums of remote backups as well. If the remote backups are valid, they are downloaded.
If the hash sums are incorrect for both local and remote backups, the restoration job fails.
Save the list of mariadb-server persistent volume claims (PVC).
Scale the mariadb server StatefulSet to 0 replicas. At this point, the database becomes unavailable for OpenStack services.
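The hash sum verification at the start of this phase, including the fallback to remote backups, can be sketched in Python as follows. The sketch makes several assumptions that the source does not confirm: SHA-256 as the hash algorithm, a plain mapping of file names to expected hashes standing in for the ConfigMap data, and a failure when the local check fails while remote synchronization is disabled. All function names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict


def file_sha256(path: Path) -> str:
    """Hash a file in chunks so that large backups do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hashes_match(directory: Path, expected: Dict[str, str]) -> bool:
    """Check every recorded file against its expected hash sum."""
    return all(
        (directory / name).is_file()
        and file_sha256(directory / name) == value
        for name, value in expected.items()
    )


def choose_backup_source(local_ok: bool,
                         remote_sync_enabled: bool,
                         remote_ok: bool) -> str:
    """Decide which copy to use, or fail if no valid copy exists."""
    if local_ok:
        return "local"
    if remote_sync_enabled and remote_ok:
        return "remote"  # the valid remote backups are downloaded
    raise RuntimeError("hash sums are incorrect, no valid backup found")
```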
Restoration phase 2
The mariadb-phy-restore pod performs the following actions:
Launches the openstack-mariadb-phy-restore-runner pod for each mariadb-server PVC. This pod cleans all MySQL data on each PVC.
Collects logs from the openstack-mariadb-phy-restore-runner pod and then removes it.
Unarchives the database backup files to a temporary directory within /var/backup.
Executes mariabackup --prepare on the unarchived data.
Restores the backup to /var/lib/mysql.
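The last two actions above can be illustrated with a Python sketch that only builds the mariabackup command lines without running them. The --prepare step is named in this section. Using mariabackup --copy-back for the final restore step is an assumption of this sketch, because the source does not state how the data is moved into /var/lib/mysql. The function names and the temporary directory path are hypothetical.

```python
from typing import List


def prepare_command(target_dir: str) -> List[str]:
    """Build the command that makes the unarchived backup consistent."""
    return ["mariabackup", "--prepare", f"--target-dir={target_dir}"]


def copy_back_command(target_dir: str,
                      datadir: str = "/var/lib/mysql") -> List[str]:
    """Build the command that copies a prepared backup into the data
    directory (assumed restore method, not confirmed by the source)."""
    return [
        "mariabackup",
        "--copy-back",
        f"--target-dir={target_dir}",
        f"--datadir={datadir}",
    ]


# Hypothetical temporary directory within /var/backup.
commands = [
    prepare_command("/var/backup/restore-tmp"),
    copy_back_command("/var/backup/restore-tmp"),
]
```

Note that mariabackup --copy-back expects an empty data directory, which is consistent with the runner pods cleaning the MySQL data on each PVC beforehand.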
Restoration phase 3
The mariadb-phy-restore pod scales the mariadb-server StatefulSet back to the configured number of replicas.
The mariadb-phy-restore pod waits until all mariadb-server replicas are ready.
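Waiting for all replicas to become ready can be sketched as a polling loop with a timeout. This is a Python illustration under assumptions: count_ready is a hypothetical callable that returns the current number of ready mariadb-server replicas, and the timeout and interval defaults are arbitrary values, not product settings.

```python
import time
from typing import Callable


def wait_until_ready(count_ready: Callable[[], int],
                     expected: int,
                     timeout_s: float = 600.0,
                     interval_s: float = 5.0,
                     sleep: Callable[[float], None] = time.sleep,
                     clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll until `expected` replicas are ready or the timeout expires.

    Returns True when all replicas are ready, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if count_ready() >= expected:
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

The sleep and clock parameters exist only to make the loop testable without real delays.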