
Restoring From Backup

Note

Please refer to the official migration documentation to familiarize yourself with potential limitations of the Velero backup system.

In the event of disaster, you can restore from a backup by doing the following:

  1. Create a clean Mirantis k0rdent Enterprise installation, including Velero and its plugins. Specifically, you want to avoid creating a Management object and similar objects, because they will be part of your restored cluster. You can remove these objects after installation, but you can also install Mirantis k0rdent Enterprise without them in the first place:

    helm install kcm oci://registry.mirantis.com/k0rdent-enterprise/charts/k0rdent-enterprise \
     --version <version> \
     --create-namespace \
     --namespace kcm-system \
     --set controller.createManagement=false \
     --set controller.createAccessManagement=false \
     --set controller.createRelease=false \
     --set controller.createTemplates=false \
     --set regional.velero.initContainers[0].name=velero-plugin-for-<provider-name> \
     --set regional.velero.initContainers[0].image=velero/velero-plugin-for-<provider-name>:<provider-plugin-tag> \
     --set regional.velero.initContainers[0].volumeMounts[0].mountPath=/target \
     --set regional.velero.initContainers[0].volumeMounts[0].name=plugins
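
    To confirm the installation before continuing, you can check that the Velero pod is running; this assumes the chart applies the standard app.kubernetes.io/name=velero label:

    kubectl -n kcm-system get pods -l app.kubernetes.io/name=velero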
    
  2. Create the BackupStorageLocation and Secret objects that were created during the backup preparation stage (preferably the same objects, though the details depend on the plugin in use).
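
    For example, assuming the AWS plugin, the pair of objects might look like the following sketch; the bucket, region, and credential values are placeholders that must match those used when the backup was created:

    ---
    apiVersion: v1
    kind: Secret
    metadata:
      name: cloud-credentials
      namespace: kcm-system
    type: Opaque
    stringData:
      # The Velero AWS plugin expects INI-style credentials under a "cloud" key
      cloud: |
        [default]
        aws_access_key_id = <access-key-id>
        aws_secret_access_key = <secret-access-key>
    ---
    apiVersion: velero.io/v1
    kind: BackupStorageLocation
    metadata:
      name: <bsl-name>
      namespace: kcm-system
    spec:
      provider: aws
      objectStorage:
        bucket: <bucket-name>
      config:
        region: <region>
      credential:
        name: cloud-credentials
        key: cloud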

  3. Restore the kcm system by creating a Restore object. Follow whichever of the following cases applies to the cluster configuration in use:

    1. If there are no regional clusters or all regional clusters' infrastructure is healthy.

      Note that it is important to set the .spec.existingResourcePolicy field value to update.

      apiVersion: velero.io/v1
      kind: Restore
      metadata:
        name: <restore-name>
        namespace: kcm-system
      spec:
        backupName: <backup-name>
        existingResourcePolicy: update
        includedNamespaces:
        - '*'
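
      You can create the object by saving the manifest to a file and applying it, for example:

      kubectl apply -f restore.yaml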
      
    2. If one or more regional clusters require reprovisioning.

      The following listing creates a ConfigMap object along with the Restore object, allowing Velero to set the pause annotation on all regions objects.

      Substitute <cluster-deployment-name> with the names of the ClusterDeployment objects used to provision the corresponding regional clusters.
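
      If you need to look up those names, the ClusterDeployment objects can typically be listed across all namespaces, for example:

      kubectl get clusterdeployments -A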

      Note that it is important to set the .spec.existingResourcePolicy field value to update.

      ---
      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: add-region-pause-anno
        namespace: kcm-system
      data:
        add-region-pause-anno: |
          version: v1
          resourceModifierRules:
          - conditions:
              groupResource: regions.k0rdent.mirantis.com
            mergePatches:
            - patchData: |
                {
                  "metadata": {
                    "annotations": {
                      "k0rdent.mirantis.com/region-pause": "true"
                    }
                  }
                }
      ---
      apiVersion: velero.io/v1
      kind: Restore
      metadata:
        name: <restore-name>
        namespace: kcm-system
      spec:
        backupName: <backup-name>
        existingResourcePolicy: update
        includedNamespaces:
        - '*'
        labelSelector:
          matchExpressions:
            - key: cluster.x-k8s.io/cluster-name
              operator: NotIn
              values: ["<cluster-deployment-name>"]
            # Add new entries accordingly if more regional clusters require reprovisioning
            # - key: cluster.x-k8s.io/cluster-name
            #   operator: NotIn
            #   values: ["<cluster-deployment-name>"]
        resourceModifier:
          kind: ConfigMap
          name: add-region-pause-anno
      
  4. Wait until the Restore status is Completed and all kcm components are up and running.
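
    To follow progress, you can inspect the phase of the Restore object, for example:

    kubectl -n kcm-system get restore <restore-name> -o jsonpath='{.status.phase}'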

  5. If one or more regional clusters required reprovisioning, then:

    1. On the management cluster, wait for the regions object to become ready:

      kubectl wait regions kcm --for=condition=Ready=True --timeout 30m
      
    2. Manually ensure that the freshly reprovisioned regional cluster is running and accessible.
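
      For example, assuming you have a kubeconfig for the regional cluster, a quick smoke test might be:

      kubectl --kubeconfig <regional-kubeconfig> get nodes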

    3. On the regional cluster, repeat the second step by creating the BackupStorageLocation and Secret objects that were created during the preparation stage.

    4. On the regional cluster, restore the cluster by creating a new Restore object:

      Note that in this case the .spec.existingResourcePolicy field is not set.

      apiVersion: velero.io/v1
      kind: Restore
      metadata:
        name: <restore-name>
        namespace: kcm-system
      spec:
        backupName: <region-name>-<backup-name>
        excludedResources:
        - mutatingwebhookconfiguration.admissionregistration.k8s.io
        - validatingwebhookconfiguration.admissionregistration.k8s.io
        includedNamespaces:
        - '*'
      
    5. On the regional cluster, wait until the Restore status is Completed and all ClusterDeployment objects are ready.
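
      For example, you might check the Restore phase and then wait on the ClusterDeployment objects (assuming they expose a Ready condition):

      kubectl -n kcm-system get restore <restore-name> -o jsonpath='{.status.phase}'
      kubectl wait clusterdeployments --all -A --for=condition=Ready=True --timeout 30m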

    6. On the management cluster, unpause provisioning of the regional ClusterDeployment objects by removing the annotation from the regions object:

      kubectl annotate regions <region-name> 'k0rdent.mirantis.com/region-pause-'
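
      To verify, you can print the remaining annotations on the object, for example:

      kubectl get regions <region-name> -o jsonpath='{.metadata.annotations}'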
      

If the Restore fails

At the time of this writing, there is a mismatch between what Mirantis k0rdent Enterprise expects and the objects Velero provides, which may leave the restore in a PartiallyFailed state. A fix for this problem is coming soon, but in the meantime you will need to rename the kcm-velero Deployment to velero.

Follow these steps:

  1. Export the YAML for the object, then delete it:

    kubectl -n kcm-system get deployment kcm-velero -o yaml > velero-deployment.yaml
    kubectl delete -n kcm-system deployment kcm-velero
    
    deployment.apps "kcm-velero" deleted
    

  2. Edit the velero-deployment.yaml file to change metadata.name from kcm-velero to velero:

    ...
        component: velero
        helm.sh/chart: velero-9.1.2
      name: velero
      namespace: kcm-system
      resourceVersion: "1653"
    ...
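
    Alternatively, if you have yq available (v4, the mikefarah build — an assumption about your tooling, not a requirement), the rename can be scripted:

    yq -i '.metadata.name = "velero"' velero-deployment.yaml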
    
  3. Recreate the Deployment with the new name:

    kubectl apply -n kcm-system -f velero-deployment.yaml
    
    deployment.apps/velero created
    

  4. Delete the failed Restore:

    kubectl delete restore -n <your-namespace> <restore-name>
    
  5. Recreate the Restore object as before. It should now complete successfully.

Caveats

For some CAPI providers, it is necessary to make changes to the Restore object because each provider brings a large number of different resources and its own logic. The resources described below are not excluded from a ManagementBackup by default, to avoid logical dependencies on any one provider and to keep the system provider-agnostic.

Note

The exclusions (excludedResources) mentioned below are applicable to any of the Restore examples on this page, including those tailored for regional clusters.

Azure (CAPZ)

The following resources should be excluded from the Restore object:

  • natgateways.network.azure.com
  • resourcegroups.resources.azure.com
  • virtualnetworks.network.azure.com
  • virtualnetworkssubnets.network.azure.com

Due to webhook conversion, objects of these resources cannot be restored; the CAPZ provider will recreate them automatically in the management cluster with the same spec as in the backup.

The resulting Restore object:

apiVersion: velero.io/v1
kind: Restore
metadata:
  name: <restore-name>
  namespace: kcm-system
spec:
  backupName: <backup-name>
  existingResourcePolicy: update
  excludedResources:
  - natgateways.network.azure.com
  - resourcegroups.resources.azure.com
  - virtualnetworks.network.azure.com
  - virtualnetworkssubnets.network.azure.com
  includedNamespaces:
  - '*'

vSphere (CAPV)

The following resources should be excluded from the Restore object:

  • mutatingwebhookconfiguration.admissionregistration.k8s.io
  • validatingwebhookconfiguration.admissionregistration.k8s.io

Due to the Velero Restoration Order, some of the CAPV core objects cannot be restored, and they will not be recreated automatically. Because all of these objects have already passed both mutation and validation, there is little value in validating them again. The webhook configurations will be restored during installation of the CAPV provider.

The resulting Restore object:

apiVersion: velero.io/v1
kind: Restore
metadata:
  name: <restore-name>
  namespace: kcm-system
spec:
  backupName: <backup-name>
  existingResourcePolicy: update
  excludedResources:
  - mutatingwebhookconfiguration.admissionregistration.k8s.io
  - validatingwebhookconfiguration.admissionregistration.k8s.io
  includedNamespaces:
  - '*'