Workflows of the OpenStack database backup and restoration

This section provides technical details about the internal implementation of the automated backup and restoration routines built into MOSK. This information is useful for troubleshooting issues related to these procedures and for understanding the impact they have on a running cloud.

Backup workflow

The OpenStack database backup workflow consists of the following phases.

Backup phase 1

The mariadb-phy-backup job launches the mariadb-phy-backup-<TIMESTAMP> pod. This pod contains the main backup script, which is responsible for:

  • Performing basic sanity checks and choosing the right node for the backup

  • Verifying the wsrep status and changing the wsrep_desync parameter settings

  • Managing the mariadb-phy-backup-runner pod

During the first backup phase, the following actions take place:

  1. Sanity check: verification of the Kubernetes status and wsrep status of each MariaDB pod. If some pods have wrong statuses, the backup job fails unless the --allow-unsafe-backup parameter is passed to the main script in the Kubernetes backup job.


    • Since MOSK 22.4, the --allow-unsafe-backup functionality is removed from the product for security and backup procedure simplification purposes.

    • Mirantis does not recommend setting the --allow-unsafe-backup parameter unless it is absolutely required. To ensure the consistency of a backup, verify that the MariaDB Galera cluster is in a working state before you proceed with the backup.

  2. Select the replica to back up. The system selects the replica with the highest number in its name as a target replica. For example, if the MariaDB server pods have the mariadb-server-0, mariadb-server-1, and mariadb-server-2 names, the mariadb-server-2 replica will be backed up.

  3. Desynchronize the replica from the Galera cluster. The script connects to the target replica and sets the wsrep_desync variable to ON. The replica then stops receiving write-sets and obtains the Donor/Desynced wsrep status. The Kubernetes health check of that mariadb-server pod fails, and the Kubernetes status of the pod becomes Not ready. If the pod has the primary label, the MariaDB Controller sets the backup label on it, and the pod is removed from the endpoints list of the MariaDB service.
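
The replica selection rule from step 2 can be illustrated with a minimal Python sketch. The select_target_replica function is a hypothetical helper written for this illustration and is not taken from the actual backup script. It assumes that every pod name ends with a numeric ordinal, as StatefulSet pod names do, and compares the ordinals as numbers so that, for example, mariadb-server-10 ranks above mariadb-server-2.

```python
import re


def select_target_replica(pod_names):
    """Return the pod name that carries the highest ordinal suffix.

    Hypothetical helper: illustrates the documented selection rule only.
    """
    def ordinal(name):
        # StatefulSet pods are named <statefulset-name>-<ordinal>.
        match = re.search(r"-(\d+)$", name)
        if match is None:
            raise ValueError(f"pod name has no ordinal suffix: {name}")
        return int(match.group(1))

    return max(pod_names, key=ordinal)


pods = ["mariadb-server-0", "mariadb-server-1", "mariadb-server-2"]
print(select_target_replica(pods))  # prints mariadb-server-2
```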


Backup phase 2

  1. The main script in the mariadb-phy-backup pod launches the Kubernetes pod mariadb-phy-backup-runner-<TIMESTAMP> on the same node where the target mariadb-server replica is running, which is node X in the example.

  2. The mariadb-phy-backup-runner pod has both the MySQL data directory and the backup directory mounted. The pod performs the following actions:

    1. Verifies that there is enough space in the /var/backup folder to perform the backup. The amount of available space in the folder must be greater than <DB-SIZE> * <MARIADB-BACKUP-REQUIRED-SPACE-RATIO>, in KB.

    2. Performs the actual backup using the mariabackup tool.

    3. If the number of current backups is greater than the value of the MARIADB_BACKUPS_TO_KEEP job parameter, the script removes all old backups exceeding the allowed number of backups.

    4. Exits with code 0.

  3. The script waits until the mariadb-phy-backup-runner pod is completed and collects its logs.

  4. The script brings the backed-up replica back in sync with the Galera cluster by setting wsrep_desync to OFF and waits for the replica to become Ready in Kubernetes.
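
The free-space check and the cleanup of old backups performed by the runner pod can be sketched as follows. This is an illustration under stated assumptions rather than the actual runner code: the has_enough_space and backups_to_remove functions are hypothetical, all sizes are expressed in KB, and backup names are assumed to sort chronologically (for example, because they start with a timestamp).

```python
def has_enough_space(available_kb, db_size_kb, required_space_ratio):
    """Check that the free space strictly exceeds DB size times the ratio."""
    return available_kb > db_size_kb * required_space_ratio


def backups_to_remove(backup_names, backups_to_keep):
    """Return the oldest backups that exceed the allowed number.

    Assumption: sorting the names orders the backups from oldest to newest.
    """
    ordered = sorted(backup_names)
    excess = len(ordered) - backups_to_keep
    return ordered[:excess] if excess > 0 else []


# Hypothetical values: 3000 KB free, 1000 KB database, ratio of 2.
print(has_enough_space(3000, 1000, 2))

# Hypothetical backup names; only the two newest are kept.
print(backups_to_remove(["2024-01-03", "2024-01-01", "2024-01-02"], 2))
```

The actual script may apply additional rules when it removes backups, so treat the retention logic above only as a simplified model of the MARIADB_BACKUPS_TO_KEEP behavior.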


Restoration workflow

The OpenStack database restoration workflow consists of the following phases.

Restoration phase 1

The mariadb-phy-restore job launches the mariadb-phy-restore pod. This pod contains the main restore script, which is responsible for:

  • Scaling the mariadb-server StatefulSet

  • Verifying the statuses of the mariadb-server pods

  • Managing the openstack-mariadb-phy-restore-runner pods


During the restoration, the database is not available for OpenStack services, which means a complete outage of all OpenStack services.

During the first phase, the following actions are performed:

  1. Save the list of mariadb-server persistent volume claims (PVC).

  2. Scale the mariadb-server StatefulSet to 0 replicas. At this point, the database becomes unavailable for OpenStack services.


Restoration phase 2

  1. The mariadb-phy-restore pod launches openstack-mariadb-phy-restore-runner with the first mariadb-server replica PVC mounted to the /var/lib/mysql folder and the backup PVC mounted to /var/backup. The openstack-mariadb-phy-restore-runner pod performs the following actions:

    1. Unarchives the database backup files to a temporary directory within /var/backup.

    2. Executes mariabackup --prepare on the unarchived data.

    3. Creates the .prepared file in the temporary directory in /var/backup.

    4. Restores the backup to /var/lib/mysql.

    5. Exits with 0.

  2. The script in the mariadb-phy-restore pod collects the logs from the openstack-mariadb-phy-restore-runner pod and removes the pod. Then, the script launches the next openstack-mariadb-phy-restore-runner pod for the next mariadb-server replica PVC. The openstack-mariadb-phy-restore-runner pod restores the backup to /var/lib/mysql and exits with 0.

    Step 2 is repeated for every mariadb-server replica PVC sequentially.

  3. When the data of the last replica is restored, the last openstack-mariadb-phy-restore-runner pod removes the .prepared file and the temporary folder with the unarchived data from /var/backup.
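
The sequence of runner pods in this phase can be modeled with a short Python sketch. The plan_restore function and the PVC names are hypothetical and serve only to show which actions each runner performs: the first runner unarchives and prepares the backup, every runner restores it to its own replica volume, and the last runner cleans up.

```python
def plan_restore(replica_pvcs):
    """Return an ordered list of (pvc, actions) pairs, one per runner pod."""
    plan = []
    for index, pvc in enumerate(replica_pvcs):
        actions = []
        if index == 0:
            # Only the first runner unarchives and prepares the backup.
            actions += [
                "unarchive backup to a temporary directory in /var/backup",
                "run mariabackup --prepare",
                "create the .prepared file",
            ]
        # Every runner restores the prepared data to its own replica volume.
        actions.append("restore backup to /var/lib/mysql")
        if index == len(replica_pvcs) - 1:
            # The last runner removes the marker file and the temporary data.
            actions += [
                "remove the .prepared file",
                "remove the temporary directory",
            ]
        plan.append((pvc, actions))
    return plan


# Hypothetical PVC names, used for illustration only.
for pvc, actions in plan_restore(["pvc-replica-0", "pvc-replica-1", "pvc-replica-2"]):
    print(pvc, actions)
```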


Restoration phase 3

  1. The mariadb-phy-restore pod scales the mariadb-server StatefulSet back to the configured number of replicas.

  2. The mariadb-phy-restore pod waits until all mariadb-server replicas are ready.
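
Waiting for the replicas to become ready is a polling pattern, which the following sketch shows. It is not the actual restore script: the wait_until_ready function, the timeout, and the polling interval are assumptions made for this illustration, and the get_ready_count callable stands in for whatever query returns the number of ready mariadb-server replicas.

```python
import time


def wait_until_ready(get_ready_count, expected, timeout_s=600.0,
                     interval_s=5.0, sleep=time.sleep, clock=time.monotonic):
    """Poll until the expected number of replicas is ready or time runs out.

    Returns True when enough replicas are ready and False on timeout.
    The timeout and interval defaults are illustrative assumptions.
    """
    deadline = clock() + timeout_s
    while True:
        if get_ready_count() >= expected:
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Simulated readiness counts instead of a real Kubernetes query.
counts = iter([0, 1, 3])
print(wait_until_ready(lambda: next(counts), expected=3, sleep=lambda s: None))
```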