Restore TF data
This section describes how to restore the Cassandra and ZooKeeper TF databases from the backups created either automatically or manually as described in Back up TF databases.
Caution
The data backup must be consistent across all systems because the state of the Tungsten Fabric databases is associated with other system databases, such as OpenStack databases.
Automatically restore TF data
Verify that there is no existing tfdbrestore object in the cluster. If one remains from a previous restoration, delete it:

kubectl -n tf delete tfdbrestores.tf-dbrestore.tf.mirantis.com tf-dbrestore
Edit the TF operator CR to perform the TF data restoration:
spec:
  settings:
    dbRestoreMode:
      enabled: true
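For example, you can apply this change with a merge patch; the resource kind and name below are placeholders for the TF operator CR used in your cluster:

# <TFOPERATOR_RESOURCE> and <TFOPERATOR_NAME> are placeholders; substitute the
# TF operator CR kind and name from your cluster.
kubectl -n tf patch <TFOPERATOR_RESOURCE> <TFOPERATOR_NAME> --type merge \
  -p '{"spec":{"settings":{"dbRestoreMode":{"enabled":true}}}}'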
Warning
When restoring the data, MOSK stops the TF services and recreates the database back ends that include Cassandra, Kafka, and ZooKeeper.
Optional. Specify the name of the backup to use in the dbDumpName parameter. By default, the latest db-dump is used:

spec:
  settings:
    dbRestoreMode:
      enabled: true
      dbDumpName: db-dump-20220111-110138.json
To verify the restoration status and stage, check the events recorded for the tfdbrestore object:

kubectl -n tf describe tfdbrestores.tf-dbrestore.tf.mirantis.com
Example of a system response:
...
Status:
  Health:  Ready
Events:
  Type    Reason                       Age                From          Message
  ----    ------                       ----               ----          -------
  Normal  TfDaemonSetsDeleted          18m (x4 over 18m)  tf-dbrestore  TF DaemonSets were deleted
  Normal  zookeeperOperatorScaledDown  18m                tf-dbrestore  zookeeper operator scaled to 0
  Normal  zookeeperStsScaledDown       18m                tf-dbrestore  tf-zookeeper statefulset scaled to 0
  Normal  cassandraOperatorScaledDown  17m                tf-dbrestore  cassandra operator scaled to 0
  Normal  cassandraStsScaledDown       17m                tf-dbrestore  tf-cassandra-config-dc1-rack1 statefulset scaled to 0
  Normal  cassandraStsPodsDeleted      16m                tf-dbrestore  tf-cassandra-config-dc1-rack1 statefulset pods deleted
  Normal  cassandraPVCDeleted          16m                tf-dbrestore  tf-cassandra-config-dc1-rack1 PVC deleted
  Normal  zookeeperStsPodsDeleted      16m                tf-dbrestore  tf-zookeeper statefulset pods deleted
  Normal  zookeeperPVCDeleted          16m                tf-dbrestore  tf-zookeeper PVC deleted
  Normal  kafkaOperatorScaledDown      16m                tf-dbrestore  kafka operator scaled to 0
  Normal  kafkaStsScaledDown           16m                tf-dbrestore  tf-kafka statefulset scaled to 0
  Normal  kafkaStsPodsDeleted          16m                tf-dbrestore  tf-kafka statefulset pods deleted
  Normal  AllOperatorsStopped          16m                tf-dbrestore  All 3rd party operator's stopped
  Normal  CassandraOperatorScaledUP    16m                tf-dbrestore  CassandraOperator scaled to 1
  Normal  CassandraStsScaledUP         16m                tf-dbrestore  Cassandra statefulset scaled to 3
  Normal  CassandraPodsActive          12m                tf-dbrestore  Cassandra pods active
  Normal  ZookeeperOperatorScaledUP    12m                tf-dbrestore  Zookeeper Operator scaled to 1
  Normal  ZookeeperStsScaledUP         12m                tf-dbrestore  Zookeeper Operator scaled to 3
  Normal  ZookeeperPodsActive          12m                tf-dbrestore  Zookeeper pods active
  Normal  DBRestoreFinished            12m                tf-dbrestore  TF db restore finished
  Normal  TFRestoreDisabled            12m                tf-dbrestore  TF Restore disabled
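To query only the overall health without the full event list, you can use a JSONPath expression; the .status.health path is an assumption derived from the Health field shown above:

# The .status.health path is an assumption based on the Health field in the describe output.
kubectl -n tf get tfdbrestores.tf-dbrestore.tf.mirantis.com tf-dbrestore \
  -o jsonpath='{.status.health}'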
Note
If the restoration was completed several hours ago, events may not be shown with kubectl describe. If so, verify the Status field and get events using the following command:

kubectl -n tf get events --field-selector involvedObject.name=tf-dbrestore
After the job completes, it can take around 15 minutes for the tf-control services to stabilize. If some pods are still in the CrashLoopBackOff status, restart these pods manually one by one:

List the tf-control pods:

kubectl -n tf get pods -l app=tf-control
Verify that the new pods are successfully spawned.
Verify that no vRouters are connected to only the tf-control pod that will be restarted.

Restart the tf-control pods sequentially:

kubectl -n tf delete pod tf-control-<hash>
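For example, to confirm that a deleted pod is re-created and returns to the Running state before you proceed to the next one, you can watch the pods:

kubectl -n tf get pods -l app=tf-control -w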
When the restoration completes, MOSK automatically sets dbRestoreMode to false in the TF operator CR.

Delete the tfdbrestore object from the cluster to be able to perform the next restoration:

kubectl -n tf delete tfdbrestores.tf-dbrestore.tf.mirantis.com tf-dbrestore
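To confirm the removal, you can list the objects again; an empty result (No resources found) indicates that the next restoration can be started:

kubectl -n tf get tfdbrestores.tf-dbrestore.tf.mirantis.com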
Manually restore TF data
Obtain the config API image repository and tag.
kubectl -n tf get tfconfig tf-config -o=jsonpath='{.spec.api.containers[?(@.name=="api")].image}'
From the output, copy the entire image link.
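For convenience, you can store the image reference in a shell variable and reuse it when creating the tf-restore-db pod later in this procedure; the variable name below is an arbitrary example:

# CONFIG_API_IMAGE is an arbitrary variable name used only in this example
CONFIG_API_IMAGE=$(kubectl -n tf get tfconfig tf-config \
  -o=jsonpath='{.spec.api.containers[?(@.name=="api")].image}')
echo "${CONFIG_API_IMAGE}"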
Terminate the configuration and analytics services and stop the database changes associated with northbound APIs on all systems.
Note
The TF Operator watches related resources and keeps them updated and healthy. If any resource is deleted or changed, the TF Operator automatically reconciles it to re-create the resource or revert the configuration to the desired state. Therefore, the TF Operator must not be running during the data restoration.
Scale the tungstenfabric-operator deployment to 0 replicas:

kubectl -n tf scale deploy tungstenfabric-operator --replicas 0
Verify the number of replicas:
kubectl -n tf get deploy tungstenfabric-operator
Example of a positive system response:
NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
tungstenfabric-operator   0/0     0            0           10h
Delete the TF configuration and analytics daemonsets:
kubectl -n tf delete daemonset tf-config
kubectl -n tf delete daemonset tf-config-db
kubectl -n tf delete daemonset tf-analytics
kubectl -n tf delete daemonset tf-analytics-snmp
The TF configuration pods should be automatically terminated.
Verify that the TF configuration pods are terminated:
kubectl -n tf get pod -l app=tf-config
kubectl -n tf get pod -l tungstenfabric=analytics
kubectl -n tf get pod -l tungstenfabric=analytics-snmp
Example of a positive system response:
No resources found.
Stop Kafka:
Scale the kafka-operator deployment to 0 replicas:

kubectl -n tf scale deploy kafka-operator --replicas 0

Scale the tf-kafka StatefulSet to 0 replicas:

kubectl -n tf scale sts tf-kafka --replicas 0
Verify the number of replicas:
kubectl -n tf get sts tf-kafka
Example of a positive system response:
NAME       READY   AGE
tf-kafka   0/0     10h
Stop and wipe the Cassandra database:
Scale the cassandra-operator deployment to 0 replicas:

kubectl -n tf scale deploy cassandra-operator --replicas 0

Scale the tf-cassandra-config-dc1-rack1 StatefulSet to 0 replicas:

kubectl -n tf scale sts tf-cassandra-config-dc1-rack1 --replicas 0
Verify the number of replicas:
kubectl -n tf get sts tf-cassandra-config-dc1-rack1
Example of a positive system response:
NAME                            READY   AGE
tf-cassandra-config-dc1-rack1   0/0     10h
Delete Persistent Volume Claims (PVCs) for the Cassandra configuration pods:
kubectl -n tf delete pvc -l app=cassandracluster,cassandracluster=tf-cassandra-config
Once PVCs are deleted, the related Persistent Volumes are automatically released. The release process takes approximately one minute.
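To confirm the deletion, you can list the PVCs with the same selector; an empty result indicates that they are gone:

kubectl -n tf get pvc -l app=cassandracluster,cassandracluster=tf-cassandra-config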
Stop and wipe the ZooKeeper database:
Scale the zookeeper-operator deployment to 0 replicas:

kubectl -n tf scale deploy zookeeper-operator --replicas 0

Scale the tf-zookeeper StatefulSet to 0 replicas:

kubectl -n tf scale sts tf-zookeeper --replicas 0
Verify the number of replicas:
kubectl -n tf get sts tf-zookeeper
Example of a positive system response:
NAME           READY   AGE
tf-zookeeper   0/0     10h
Delete PVCs for the ZooKeeper configuration pods:
kubectl -n tf delete pvc -l app=tf-zookeeper
Once PVCs are deleted, the related Persistent Volumes are automatically released. The release process takes approximately one minute.
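Similarly, you can confirm that the ZooKeeper PVCs are deleted by listing them with the same selector; an empty result indicates success:

kubectl -n tf get pvc -l app=tf-zookeeper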
Restore the number of replicas to run Cassandra and ZooKeeper and restore the deleted PVCs:
Restore the cassandra-operator deployment replicas:

kubectl -n tf scale deploy cassandra-operator --replicas 1

Restore the tf-cassandra-config-dc1-rack1 StatefulSet replicas:

kubectl -n tf scale sts tf-cassandra-config-dc1-rack1 --replicas 3
Verify that Cassandra pods have been created and are running:
kubectl -n tf get pod -l app=cassandracluster,cassandracluster=tf-cassandra-config
Example of a positive system response:
NAME                              READY   STATUS    RESTARTS   AGE
tf-cassandra-config-dc1-rack1-0   1/1     Running   0          4m43s
tf-cassandra-config-dc1-rack1-1   1/1     Running   0          3m30s
tf-cassandra-config-dc1-rack1-2   1/1     Running   0          2m6s
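Optionally, instead of polling the pod list, you can block until all Cassandra pods report Ready; the 600s timeout below is an arbitrary example:

kubectl -n tf wait --for=condition=Ready pod \
  -l app=cassandracluster,cassandracluster=tf-cassandra-config --timeout=600s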
Restore the zookeeper-operator deployment replicas:

kubectl -n tf scale deploy zookeeper-operator --replicas 1

Restore the tf-zookeeper StatefulSet replicas:

kubectl -n tf scale sts tf-zookeeper --replicas 3
Verify that ZooKeeper pods have been created and are running:
kubectl -n tf get pod -l app=tf-zookeeper
Example of a positive system response:
NAME             READY   STATUS    RESTARTS   AGE
tf-zookeeper-0   1/1     Running   0          3m23s
tf-zookeeper-1   1/1     Running   0          2m56s
tf-zookeeper-2   1/1     Running   0          2m20s
Restore the data from the backup:
Note
Do not use the TF API container used for the backup file creation. In this case, a session with the Cassandra and ZooKeeper databases is created once the TF API service starts, but the TF configuration services are stopped. The tools for the data backup and restore are available only in the TF configuration API container. Using the steps below, start a blind container based on the config-api image.

Deploy a pod using the configuration API image obtained in the first step:
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/psp: privileged
  labels:
    app: tf-restore-db
  name: tf-restore-db
  namespace: tf
spec:
  containers:
  - name: api
    image: <PUT_LINK_TO_CONFIG_API_IMAGE_FROM_STEP_ABOVE>
    command:
    - sleep
    - infinity
    envFrom:
    - configMapRef:
        name: tf-rabbitmq-cfgmap
    - configMapRef:
        name: tf-zookeeper-cfgmap
    - configMapRef:
        name: tf-cassandra-cfgmap
    - configMapRef:
        name: tf-services-cfgmap
    - secretRef:
        name: tf-os-secret
    imagePullPolicy: Always
  nodeSelector:
    tfcontrol: enabled
  dnsPolicy: ClusterFirstWithHostNet
  enableServiceLinks: true
  hostNetwork: true
  priority: 0
  restartPolicy: Always
  serviceAccount: default
  serviceAccountName: default
EOF
If you use the backup that was created automatically, extend the YAML file content above with the following configuration:
...
spec:
  containers:
  - name: api
    volumeMounts:
    - mountPath: </PATH/TO/MOUNT>
      name: <TF-DBBACKUP-VOL-NAME>
  volumes:
  - name: <TF-DBBACKUP-VOL-NAME>
    persistentVolumeClaim:
      claimName: <TF-DBBACKUP-PVC-NAME>
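Before copying data into the container, you can verify that the tf-restore-db pod has reached the Running state, for example:

kubectl -n tf get pod tf-restore-db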
Copy the database dump to the container:
Warning
Skip this step if you use the auto-backup and have provided the volume definition as described above.
kubectl cp <PATH_TO_DB_DUMP> tf/tf-restore-db:/tmp/db-dump.json
Copy the contrail-api.conf file to the container:

kubectl cp <PATH-TO-CONFIG> tf/tf-restore-db:/tmp/contrail-api.conf
Join the restarted container:
kubectl -n tf exec -it tf-restore-db -- bash
Restore the Cassandra database from the backup:
(config-api) $ cd /usr/lib/python2.7/site-packages/cfgm_common
(config-api) $ python db_json_exim.py --import-from /tmp/db-dump.json --api-conf /tmp/contrail-api.conf
Delete the restore container:
kubectl -n tf delete pod tf-restore-db
Restore the replica number to run Kafka:
Restore the kafka-operator deployment replicas:

kubectl -n tf scale deploy kafka-operator --replicas 1
The Kafka operator should automatically restore the number of replicas of the appropriate StatefulSet.
Verify the number of replicas:
kubectl -n tf get sts tf-kafka
Example of a positive system response:
NAME       READY   AGE
tf-kafka   3/3     10h
Run TF Operator to restore the TF configuration and analytics services:
Restore the TF Operator deployment replica:
kubectl -n tf scale deploy tungstenfabric-operator --replicas 1
Verify that the TF Operator is running properly without any restarts:
kubectl -n tf get pod -l name=tungstenfabric-operator
Verify that the configuration pods have been automatically started:
kubectl -n tf get pod -l app=tf-config
kubectl -n tf get pod -l tungstenfabric=analytics
kubectl -n tf get pod -l tungstenfabric=analytics-snmp
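Optionally, you can wait until these pods report Ready; the selectors match the commands above and the timeout is an arbitrary example:

kubectl -n tf wait --for=condition=Ready pod -l app=tf-config --timeout=600s
kubectl -n tf wait --for=condition=Ready pod -l tungstenfabric=analytics --timeout=600s
kubectl -n tf wait --for=condition=Ready pod -l tungstenfabric=analytics-snmp --timeout=600s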
Restart the tf-control services:

Caution
To avoid network downtime, do not restart all pods simultaneously.
List the tf-control pods:

kubectl -n tf get pods -l app=tf-control
Restart the tf-control pods one by one.

Caution

Before restarting the tf-control pods:

Verify that the new pods are successfully spawned.

Verify that no vRouters are connected to only one tf-control pod that will be restarted.

kubectl -n tf delete pod tf-control-<hash>