Workflows of the OpenStack database backup and restoration

This section describes the internal implementation of the automated backup and restoration routines built into MOSK. The information below is useful for troubleshooting issues with these procedures and for understanding the impact they have on a running cloud.

Backup workflow

The OpenStack database backup workflow consists of the following phases.

Backup phase 1

The mariadb-phy-backup job is responsible for:

  • Performing basic sanity checks and choosing the right node for the backup

  • Verifying the wsrep status and changing the wsrep_desync parameter settings

  • Checking backup integrity (ensuring correct hash sums)

  • Managing the mariadb-phy-backup-runner pod

  • If enabled, synchronizing the local backup storage with the remote S3 storage

During the first backup phase, the following actions take place:

  1. The mariadb-phy-backup pod starts on the node where the mariadb-server replica with the highest number in its name runs. For example, if the MariaDB server pods are named mariadb-server-0, mariadb-server-1, and mariadb-server-2, the mariadb-phy-backup pod starts on the same node as mariadb-server-2.

  2. The backup process verifies the hash sums of existing backup files based on ConfigMap information:

    • If the verification fails and synchronization with the remote S3 storage is enabled, the process checks the hash sums of remote backups as well. If the remote backups are valid, they are downloaded.

    • If the hash sums are incorrect for both local and remote backups, the backup job fails.

    • If no ConfigMap exists, these hash sum checks are skipped.

  3. Sanity check: the script verifies the Kubernetes status and the wsrep status of each MariaDB pod. If any pod has an incorrect status, the backup job fails unless the --allow-unsafe-backup parameter is passed to the main script in the Kubernetes backup job.

    Note

    Since MOSK 22.4, the --allow-unsafe-backup functionality is removed from the product for security and backup procedure simplification purposes.

    Mirantis does not recommend setting the --allow-unsafe-backup parameter unless it is absolutely required. To ensure the consistency of a backup, verify that the MariaDB Galera cluster is in a working state before you proceed with the backup.

  4. Desynchronize the replica from the Galera cluster. The script connects to the target replica and sets the wsrep_desync variable to ON. The replica then stops receiving write-sets and its wsrep status changes to Donor/Desynced. The Kubernetes health check of that mariadb-server pod fails, and the Kubernetes status of the pod becomes Not ready. If the pod has the primary label, the MariaDB Controller sets the backup label on it and the pod is removed from the endpoints list of the MariaDB service.

  5. Verify that there is enough space in the /var/backup folder to perform the backup. The amount of available space in the folder should exceed <DB-SIZE> * <MARIADB-BACKUP-REQUIRED-SPACE-RATIO> in KB.
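The space requirement in step 5 reduces to a single comparison. The following is a minimal sketch of that check, not the script that ships with MOSK. The function name and the sample numbers are hypothetical, and all sizes are assumed to be in KB.

```python
def has_enough_backup_space(available_kb: int, db_size_kb: int,
                            required_space_ratio: float) -> bool:
    """Return True if free space in the backup folder exceeds
    <DB-SIZE> * <MARIADB-BACKUP-REQUIRED-SPACE-RATIO>."""
    required_kb = db_size_kb * required_space_ratio
    return available_kb > required_kb


# Hypothetical numbers: a 4 GiB database and a ratio of 1.2
db_size_kb = 4 * 1024 * 1024
print(has_enough_backup_space(6 * 1024 * 1024, db_size_kb, 1.2))
print(has_enough_backup_space(4 * 1024 * 1024, db_size_kb, 1.2))
```

With these sample values, the first call passes the check and the second does not, because 4 GiB of free space is less than the 4.8 GiB required.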

Figure: OpenStack database backup workflow, phase 1
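The hash sum verification rules from step 2 of the first backup phase can be summarized as a small decision function. This is an illustrative sketch only: the Decision enum and the decide function are hypothetical names, and the outcome for an invalid local backup with remote synchronization disabled is an assumption, since the description above does not cover that case explicitly.

```python
from enum import Enum


class Decision(Enum):
    SKIP_CHECKS = "no ConfigMap, hash sum checks are skipped"
    USE_LOCAL = "local backups are valid"
    DOWNLOAD_REMOTE = "local backups are invalid, valid remote backups are downloaded"
    FAIL = "backup job fails"


def decide(configmap_exists: bool, local_valid: bool,
           remote_sync_enabled: bool, remote_valid: bool) -> Decision:
    """Map the verification results to the outcome described in step 2."""
    if not configmap_exists:
        return Decision.SKIP_CHECKS
    if local_valid:
        return Decision.USE_LOCAL
    if remote_sync_enabled and remote_valid:
        return Decision.DOWNLOAD_REMOTE
    # Assumption: with remote synchronization disabled, an invalid local
    # backup is treated the same as "both local and remote are invalid".
    return Decision.FAIL


print(decide(True, False, True, True).value)
```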

Backup phase 2

  1. The mariadb-phy-backup pod performs the backup using the mariabackup tool.

  2. The script returns the backed-up replica to sync with the Galera cluster by setting wsrep_desync to OFF and waits for the replica to become Ready in Kubernetes.
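The pairing of the desynchronization at the end of phase 1 with the resynchronization in step 2 above can be sketched as a context manager. This is a hedged illustration, not the product code: the desynced helper and the execute callable are hypothetical, and resynchronizing even when the backup step fails is a design choice of this sketch rather than documented behavior. Only the SET GLOBAL wsrep_desync statements correspond to the variable named in the text.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def desynced(execute: Callable[[str], None]) -> Iterator[None]:
    """Desynchronize the replica for the duration of the block, then resync it."""
    execute("SET GLOBAL wsrep_desync = ON")
    try:
        yield
    finally:
        # Sketch-level choice: always resync, even if the backup step raised.
        execute("SET GLOBAL wsrep_desync = OFF")


# Usage with a stand-in that records statements instead of talking to MariaDB
statements: list = []
with desynced(statements.append):
    statements.append("-- mariabackup runs here")
print(statements)
```

In a real setup, the execute callable would send each statement to the target replica through a database client of your choice.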

Figure: OpenStack database backup workflow, phase 2

Backup phase 3

  1. The script calculates hash sums for backup files and stores them in a special ConfigMap.

  2. If the number of existing backups exceeds the value of the MARIADB_BACKUPS_TO_KEEP job parameter, the script removes the oldest backups to maintain the allowed limit.

  3. If enabled, the script synchronizes the local backup storage with the remote S3 storage.
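The retention rule in step 2 of this phase can be illustrated with a short sketch. It assumes that backup names sort chronologically, for example because they start with a timestamp. The function name and the sample names are hypothetical and do not reflect the actual naming used by the backup job.

```python
def backups_to_remove(backup_names: list[str], backups_to_keep: int) -> list[str]:
    """Return the oldest backups that exceed the retention limit."""
    ordered = sorted(backup_names)  # assumption: names sort oldest first
    excess = len(ordered) - backups_to_keep
    return ordered[:excess] if excess > 0 else []


# Hypothetical backup names, with a limit of two backups to keep
existing = ["2024-01-03", "2024-01-01", "2024-01-02"]
print(backups_to_remove(existing, backups_to_keep=2))  # ['2024-01-01']
```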

Figure: OpenStack database backup workflow, phase 3

Restoration workflow

The OpenStack database restoration workflow consists of the following phases.

Restoration phase 1

The mariadb-phy-restore job launches the mariadb-phy-restore pod. This pod mounts the mariadb-server PVC with the highest number in its name to the /var/lib/mysql folder and the backup PVC (or the local filesystem if the hostpath backend is configured) to /var/backup.
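Selecting the object with the highest number in its name, as done here for the PVC and in the backup workflow for the mariadb-server replica, requires comparing the trailing ordinals numerically rather than as strings. The following sketch shows one way to do that. The helper name and the sample PVC names are hypothetical.

```python
import re


def highest_ordinal(names: list[str]) -> str:
    """Return the name whose trailing number is the largest."""
    def ordinal(name: str) -> int:
        match = re.search(r"-(\d+)$", name)
        if match is None:
            raise ValueError(f"no trailing ordinal in {name!r}")
        return int(match.group(1))

    return max(names, key=ordinal)


# Hypothetical PVC names; a plain string comparison would rank "-2" above "-10"
pvcs = ["data-mariadb-server-0", "data-mariadb-server-10", "data-mariadb-server-2"]
print(highest_ordinal(pvcs))  # data-mariadb-server-10
```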

The mariadb-phy-restore pod contains the main restore script, which is responsible for:

  • Scaling the mariadb-server StatefulSet

  • Verifying the statuses of mariadb-server pods

  • Managing the openstack-mariadb-phy-restore-runner pods

  • Checking backup integrity (ensuring correct hash sums)
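The integrity check in the last item amounts to recomputing the hash sum of each backup file and comparing it with a stored value. The sketch below illustrates the idea under stated assumptions: SHA-256 is used only as an example because the hash algorithm is not specified here, and the expected mapping stands in for the data that the real script reads from the ConfigMap. All names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_corrupted(backup_dir: Path, expected: dict[str, str]) -> list[str]:
    """Return files that are missing or whose hash differs from the expected one."""
    bad = []
    for name, want in sorted(expected.items()):
        path = backup_dir / name
        if not path.is_file() or sha256_of(path) != want:
            bad.append(name)
    return bad
```

An empty result means that every listed file is present and matches its recorded hash sum.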

Caution

During the restoration, the database is not available for OpenStack services, which means a complete outage of all OpenStack services.

During the first phase, the following actions take place:

  1. The restoration process verifies the hash sums of existing backup files based on ConfigMap information:

    • If the verification fails and synchronization with the remote S3 storage is enabled, the process checks the hash sums of remote backups as well. If the remote backups are valid, they are downloaded.

    • If the hash sums are incorrect for both local and remote backups, the restoration job fails.

  2. Save the list of mariadb-server persistent volume claims (PVC).

  3. Scale the mariadb-server StatefulSet to 0 replicas. At this point, the database becomes unavailable for OpenStack services.

Figure: OpenStack database restoration workflow, phase 1

Restoration phase 2

  1. The mariadb-phy-restore pod performs the following actions:

    1. Launches an openstack-mariadb-phy-restore-runner pod for each mariadb-server PVC. Each runner pod removes all MySQL data from its PVC.

    2. Collects logs from the openstack-mariadb-phy-restore-runner pod and then removes it.

    3. Unarchives the database backup files to a temporary directory within /var/backup.

    4. Executes mariabackup --prepare on the unarchived data.

    5. Restores the backup to /var/lib/mysql.
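The last two steps above map onto two mariabackup invocations. The sketch below only builds the command lines and does not run them. The temporary directory name is hypothetical, and the use of --copy-back for the final step is an assumption, since the text states only that the backup is restored to /var/lib/mysql.

```python
import shlex


def restore_commands(prepared_dir: str, datadir: str = "/var/lib/mysql") -> list[list[str]]:
    """Build the prepare and restore command lines for an unarchived backup."""
    return [
        # Make the unarchived backup consistent before it is copied
        ["mariabackup", "--prepare", f"--target-dir={prepared_dir}"],
        # Assumption: the restore step copies the prepared files into the data directory
        ["mariabackup", "--copy-back", f"--target-dir={prepared_dir}", f"--datadir={datadir}"],
    ]


# Hypothetical temporary directory within /var/backup
for command in restore_commands("/var/backup/restore-tmp"):
    print(shlex.join(command))
```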

Figure: OpenStack database restoration workflow, phase 2

Restoration phase 3

  1. The mariadb-phy-restore pod scales the mariadb-server StatefulSet back to the configured number of replicas.

  2. The mariadb-phy-restore pod waits until all mariadb-server replicas are ready.
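Waiting for all replicas to become ready is typically implemented as a polling loop with a timeout. The following is a generic sketch of such a loop, not the actual restore script. The ready_replicas callable is a hypothetical stand-in for a query to the Kubernetes API, and the timeout and interval values are arbitrary examples.

```python
import time
from typing import Callable


def wait_until_ready(ready_replicas: Callable[[], int], expected: int,
                     timeout_s: float = 600.0, interval_s: float = 5.0,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until the number of ready replicas reaches the expected count.

    Returns True on success and False if the timeout expires first.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if ready_replicas() >= expected:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)


# Usage with simulated readiness counts instead of a live cluster
counts = iter([0, 1, 3])
print(wait_until_ready(lambda: next(counts), expected=3, sleep=lambda s: None))
```

Passing the sleep function as a parameter keeps the loop testable without real delays.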

Figure: OpenStack database restoration workflow, phase 3