Restore Tungsten Fabric data
This section describes how to restore the Cassandra and ZooKeeper databases from the backups created either automatically or manually as described in Back up TF data.
Caution
The data backup must be consistent across all systems because the state of the Tungsten Fabric databases is associated with other system databases, such as OpenStack databases.
Automatically restore the data
Verify that the cluster does not have the tfdbrestore object. If one remains from a previous restoration, delete it:

kubectl -n tf delete tfdbrestores.tf.mirantis.com tf-dbrestore
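A quick way to check is to list any existing objects of this type; an empty result means nothing needs to be deleted:

kubectl -n tf get tfdbrestores.tf.mirantis.com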
Edit the TFOperator custom resource to perform the data restoration:

spec:
  features:
    dbRestoreMode:
      enabled: true
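For example, assuming the TFOperator custom resource is served as tfoperators.tf.mirantis.com and substituting the object name used in your cluster, the snippet above can be added through an interactive edit:

kubectl -n tf edit tfoperators.tf.mirantis.com <TFOPERATOR-CR-NAME>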
Warning
When restoring the data, MOSK stops the Tungsten Fabric services and recreates the database backends that include Cassandra, Kafka, and ZooKeeper.
Note
The automated restoration process relies on the automated database backups configured by the Tungsten Fabric Operator. The Tungsten Fabric data is restored from the backup type specified in the tf-dbBackup section of the Tungsten Fabric Operator custom resource, or from the default pvc type if not specified. For the configuration details, refer to Periodic Tungsten Fabric database backups.
Optional. Specify the name of the backup to be used in the dbDumpName parameter. By default, the latest db-dump is used.

spec:
  features:
    dbRestoreMode:
      enabled: true
      dbDumpName: db-dump-20220111-110138.json
To monitor the restoration status and stage, check the events recorded for the tfdbrestore object:

kubectl -n tf describe tfdbrestores.tf.mirantis.com tf-dbrestore
Example of a system response:
...
Status:
  Health:  Ready
Events:
  Type    Reason                       Age                From          Message
  ----    ------                       ---                ----          -------
  Normal  TfDaemonSetsDeleted          18m (x4 over 18m)  tf-dbrestore  TF DaemonSets were deleted
  Normal  zookeeperOperatorScaledDown  18m                tf-dbrestore  zookeeper operator scaled to 0
  Normal  zookeeperStsScaledDown       18m                tf-dbrestore  tf-zookeeper statefulset scaled to 0
  Normal  cassandraOperatorScaledDown  17m                tf-dbrestore  cassandra operator scaled to 0
  Normal  cassandraStsScaledDown       17m                tf-dbrestore  tf-cassandra-config-dc1-rack1 statefulset scaled to 0
  Normal  cassandraStsPodsDeleted      16m                tf-dbrestore  tf-cassandra-config-dc1-rack1 statefulset pods deleted
  Normal  cassandraPVCDeleted          16m                tf-dbrestore  tf-cassandra-config-dc1-rack1 PVC deleted
  Normal  zookeeperStsPodsDeleted      16m                tf-dbrestore  tf-zookeeper statefulset pods deleted
  Normal  zookeeperPVCDeleted          16m                tf-dbrestore  tf-zookeeper PVC deleted
  Normal  kafkaOperatorScaledDown      16m                tf-dbrestore  kafka operator scaled to 0
  Normal  kafkaStsScaledDown           16m                tf-dbrestore  tf-kafka statefulset scaled to 0
  Normal  kafkaStsPodsDeleted          16m                tf-dbrestore  tf-kafka statefulset pods deleted
  Normal  AllOperatorsStopped          16m                tf-dbrestore  All 3rd party operator's stopped
  Normal  CassandraOperatorScaledUP    16m                tf-dbrestore  CassandraOperator scaled to 1
  Normal  CassandraStsScaledUP         16m                tf-dbrestore  Cassandra statefulset scaled to 3
  Normal  CassandraPodsActive          12m                tf-dbrestore  Cassandra pods active
  Normal  ZookeeperOperatorScaledUP    12m                tf-dbrestore  Zookeeper Operator scaled to 1
  Normal  ZookeeperStsScaledUP         12m                tf-dbrestore  Zookeeper Operator scaled to 3
  Normal  ZookeeperPodsActive          12m                tf-dbrestore  Zookeeper pods active
  Normal  DBRestoreFinished            12m                tf-dbrestore  TF db restore finished
  Normal  TFRestoreDisabled            12m                tf-dbrestore  TF Restore disabled
Note
If the restoration was completed several hours ago, events may not be shown with kubectl describe. If so, verify the Status field and get the events using the following command:

kubectl -n tf get events --field-selector involvedObject.name=tf-dbrestore
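If you only need the overall health value, a jsonpath query similar to the following sketch can be used; the .status.health field path is an assumption based on the Status output shown above:

kubectl -n tf get tfdbrestores.tf.mirantis.com tf-dbrestore -o jsonpath='{.status.health}'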
After the job completes, it can take around 15 minutes for the tf-control services to stabilize. If some pods remain in the CrashLoopBackOff status, restart these pods manually one by one:
List the tf-control pods:

kubectl -n tf get pods -l app=tf-control
Verify that the new pods are successfully spawned.
Verify that no vRouters are connected to only the tf-control pod that will be restarted.
Restart the tf-control pods sequentially:

kubectl -n tf delete pod tf-control-<hash>
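If several pods are affected, the sequential restart can be scripted. The following is a minimal sketch that assumes a standard shell with kubectl access; it deletes only the pods reported in the CrashLoopBackOff status and waits for the tf-control pods to become Ready again before continuing:

# Restart tf-control pods stuck in CrashLoopBackOff, one at a time
for pod in $(kubectl -n tf get pods -l app=tf-control --no-headers | awk '/CrashLoopBackOff/ {print $1}'); do
  kubectl -n tf delete pod "${pod}"
  # Give the controller time to re-create the pod, then wait for readiness
  sleep 10
  kubectl -n tf wait --for=condition=Ready pod -l app=tf-control --timeout=600s
done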
When the restoration completes, MOSK automatically sets dbRestoreMode to false in the Tungsten Fabric Operator custom resource.
Delete the tfdbrestore object from the cluster to be able to perform the next restoration:

kubectl -n tf delete tfdbrestores.tf.mirantis.com tf-dbrestore
Manually restore the data
Obtain the config API image repository and tag:
kubectl -n tf get tfconfig tf-config -o jsonpath='{.spec.images.config.configAPI}'
From the output, copy the entire image link.
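Optionally, capture the image link in a shell variable (CONFIG_API_IMAGE is an arbitrary name used only in this example) so that it can be reused when creating the restore pod later in this procedure:

CONFIG_API_IMAGE=$(kubectl -n tf get tfconfig tf-config -o jsonpath='{.spec.images.config.configAPI}')
echo "${CONFIG_API_IMAGE}"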
Terminate the configuration and analytics services, if the latter are present in your deployment, and stop the database changes associated with northbound APIs on all systems.
Note
The Tungsten Fabric Operator watches related resources and keeps them updated and healthy. If any resource is deleted or changed, the Tungsten Fabric Operator automatically reconciles it to re-create the resource or revert the configuration to the required state. Therefore, the Tungsten Fabric Operator must not be running during the data restoration.
Scale the tungstenfabric-operator deployment to 0 replicas:

kubectl -n tf scale deploy tungstenfabric-operator --replicas 0
Verify the number of replicas:
kubectl -n tf get deploy tungstenfabric-operator
Example of a positive system response:
NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
tungstenfabric-operator   0/0     0            0           10h
Delete the Tungsten Fabric configuration and analytics DaemonSets, if the latter are present in your deployment:
kubectl -n tf delete daemonset tf-config
kubectl -n tf delete daemonset tf-config-db
kubectl -n tf delete daemonset tf-analytics
kubectl -n tf delete daemonset tf-analytics-snmp
The Tungsten Fabric configuration pods should be automatically terminated.
Verify that the Tungsten Fabric configuration and analytics pods, if the latter are present in your deployment, are terminated:
kubectl -n tf get pod -l app=tf-config
kubectl -n tf get pod -l tungstenfabric=analytics
kubectl -n tf get pod -l tungstenfabric=analytics-snmp
Example of a positive system response:
No resources found.
Stop Kafka:
Scale the kafka-operator deployment to 0 replicas:

kubectl -n tf scale deploy kafka-operator --replicas 0
Scale the tf-kafka StatefulSet to 0 replicas:

kubectl -n tf scale sts tf-kafka --replicas 0
Verify the number of replicas:
kubectl -n tf get sts tf-kafka
Example of a positive system response:
NAME       READY   AGE
tf-kafka   0/0     10h
Stop and wipe the Cassandra database:
Scale the cassandra-operator deployment to 0 replicas:

kubectl -n tf scale deploy cassandra-operator --replicas 0
Scale the tf-cassandra-config-dc1-rack1 StatefulSet to 0 replicas:

kubectl -n tf scale sts tf-cassandra-config-dc1-rack1 --replicas 0
Verify the number of replicas:
kubectl -n tf get sts tf-cassandra-config-dc1-rack1
Example of a positive system response:
NAME                            READY   AGE
tf-cassandra-config-dc1-rack1   0/0     10h
Delete Persistent Volume Claims (PVCs) for the Cassandra configuration pods:
kubectl -n tf delete pvc -l app=cassandracluster,cassandracluster=tf-cassandra-config
Once PVCs are deleted, the related Persistent Volumes are automatically released. The release process takes approximately one minute.
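To confirm that the PVCs are gone, re-run the get command with the same label selector and expect an empty result:

kubectl -n tf get pvc -l app=cassandracluster,cassandracluster=tf-cassandra-config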
Stop and wipe the ZooKeeper database:
Scale the zookeeper-operator deployment to 0 replicas:

kubectl -n tf scale deploy zookeeper-operator --replicas 0
Scale the tf-zookeeper StatefulSet to 0 replicas:

kubectl -n tf scale sts tf-zookeeper --replicas 0
Verify the number of replicas:
kubectl -n tf get sts tf-zookeeper
Example of a positive system response:
NAME           READY   AGE
tf-zookeeper   0/0     10h
Delete PVCs for the ZooKeeper configuration pods:
kubectl -n tf delete pvc -l app=tf-zookeeper
Once PVCs are deleted, the related Persistent Volumes are automatically released. The release process takes approximately one minute.
Restore the number of replicas to run Cassandra and ZooKeeper and restore the deleted PVCs:
Restore the cassandra-operator deployment replicas:

kubectl -n tf scale deploy cassandra-operator --replicas 1
Restore the tf-cassandra-config-dc1-rack1 StatefulSet replicas:

kubectl -n tf scale sts tf-cassandra-config-dc1-rack1 --replicas 3
Verify that Cassandra pods have been created and are running:
kubectl -n tf get pod -l app=cassandracluster,cassandracluster=tf-cassandra-config
Example of a positive system response:
NAME                              READY   STATUS    RESTARTS   AGE
tf-cassandra-config-dc1-rack1-0   1/1     Running   0          4m43s
tf-cassandra-config-dc1-rack1-1   1/1     Running   0          3m30s
tf-cassandra-config-dc1-rack1-2   1/1     Running   0          2m6s
Restore the zookeeper-operator deployment replicas:

kubectl -n tf scale deploy zookeeper-operator --replicas 1
Restore the tf-zookeeper StatefulSet replicas:

kubectl -n tf scale sts tf-zookeeper --replicas 3
Verify that ZooKeeper pods have been created and are running:
kubectl -n tf get pod -l app=tf-zookeeper
Example of a positive system response:
NAME             READY   STATUS    RESTARTS   AGE
tf-zookeeper-0   1/1     Running   0          3m23s
tf-zookeeper-1   1/1     Running   0          2m56s
tf-zookeeper-2   1/1     Running   0          2m20s
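Instead of polling the pod lists manually, you can block until both database back ends report Ready, for example (the timeout value is arbitrary):

kubectl -n tf wait --for=condition=Ready pod -l app=cassandracluster,cassandracluster=tf-cassandra-config --timeout=600s
kubectl -n tf wait --for=condition=Ready pod -l app=tf-zookeeper --timeout=600s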
Restore the data from the backup:
Note
Do not use the Tungsten Fabric API container used for the backup file creation. In this case, a session with the Cassandra and ZooKeeper databases is created once the Tungsten Fabric API service starts, but the Tungsten Fabric configuration services are stopped. The tools for the data backup and restore are available only in the Tungsten Fabric configuration API container. Using the steps below, start a blind container based on the config-api image.
Deploy a pod using the configuration API image obtained in the first step:
Note
If your deployment uses the cql Cassandra driver, update the value of the CONFIGDB_CASSANDRA_DRIVER environment variable to cql.

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/psp: privileged
  labels:
    app: tf-restore-db
  name: tf-restore-db
  namespace: tf
spec:
  containers:
  - name: api
    image: <PUT_LINK_TO_CONFIG_API_IMAGE_FROM_STEP_ABOVE>
    command:
    - sleep
    - infinity
    envFrom:
    - configMapRef:
        name: tf-rabbitmq-cfgmap
    - configMapRef:
        name: tf-zookeeper-cfgmap
    - configMapRef:
        name: tf-cassandra-cfgmap
    - configMapRef:
        name: tf-services-cfgmap
    - secretRef:
        name: tf-os-secret
    env:
    - name: CONFIGDB_CASSANDRA_DRIVER
      value: thrift
    imagePullPolicy: Always
  nodeSelector:
    tfcontrol: enabled
  dnsPolicy: ClusterFirstWithHostNet
  enableServiceLinks: true
  hostNetwork: true
  priority: 0
  restartPolicy: Always
  serviceAccount: default
  serviceAccountName: default
EOF
If you use the backup that was created automatically, extend the YAML file content above with the following configuration:
...
spec:
  containers:
  - name: api
    volumeMounts:
    - mountPath: </PATH/TO/MOUNT>
      name: <TF-DBBACKUP-VOL-NAME>
  volumes:
  - name: <TF-DBBACKUP-VOL-NAME>
    persistentVolumeClaim:
      claimName: <TF-DBBACKUP-PVC-NAME>
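The placeholders above must be replaced with values from your cluster; for instance, the claim name of the PVC that stores the automated backups can be looked up in the tf namespace:

kubectl -n tf get pvc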
Copy the database dump to the container:
Warning
Skip this step if you use the auto-backup and have provided the volume definition as described above.
kubectl cp <PATH_TO_DB_DUMP> tf/tf-restore-db:/tmp/db-dump.json
Copy the contrail-api.conf file to the container:

kubectl cp <PATH-TO-CONFIG> tf/tf-restore-db:/tmp/contrail-api.conf
Open a shell in the tf-restore-db container:
kubectl -n tf exec -it tf-restore-db -- bash
Restore the Cassandra database from the backup:
(config-api) $ cd /usr/lib/python3.6/site-packages/cfgm_common
(config-api) $ python db_json_exim.py --import-from /tmp/db-dump.json --api-conf /tmp/contrail-api.conf
Delete the restore container:
kubectl -n tf delete pod tf-restore-db
Restore the number of replicas to run Kafka:
Restore the kafka-operator deployment replicas:

kubectl -n tf scale deploy kafka-operator --replicas 1
The Kafka operator should automatically restore the number of replicas of the appropriate StatefulSet.
Verify the number of replicas:
kubectl -n tf get sts tf-kafka
Example of a positive system response:
NAME       READY   AGE
tf-kafka   3/3     10h
Run the Tungsten Fabric Operator to restore the Tungsten Fabric configuration and analytics services, if the latter were present in your deployment:
Restore the replica for the Tungsten Fabric Operator Deployment:
kubectl -n tf scale deploy tungstenfabric-operator --replicas 1
Verify that the Tungsten Fabric Operator is running properly without any restarts:
kubectl -n tf get pod -l name=tungstenfabric-operator
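One way to confirm that the operator runs without restarts is to print the container restart counters, for example:

kubectl -n tf get pod -l name=tungstenfabric-operator -o jsonpath='{range .items[*]}{.metadata.name}{" restarts: "}{.status.containerStatuses[0].restartCount}{"\n"}{end}'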
Verify that the configuration and analytics pods, if the latter were present in your deployment, have been automatically started:
kubectl -n tf get pod -l app=tf-config
kubectl -n tf get pod -l tungstenfabric=analytics
kubectl -n tf get pod -l tungstenfabric=analytics-snmp
Restart the tf-control services:
Caution
To avoid network downtime, do not restart all pods simultaneously.
List the tf-control pods:

kubectl -n tf get pods -l app=tf-control
Restart the tf-control pods one by one.
Caution
Before restarting the tf-control pods:
Verify that the new pods are successfully spawned.
Verify that no vRouters are connected to only one tf-control pod that will be restarted.
kubectl -n tf delete pod tf-control-<hash>