This section describes how to restore the Cassandra and ZooKeeper TF databases from a db-dump file created as described in Back up TF databases.
Caution
The database backup must be consistent across all systems because the state of the Tungsten Fabric databases is associated with other system databases, such as OpenStack databases.
To restore TF databases:
Obtain the config API image repository and tag.
kubectl -n tf get tfconfig tf-config -o=jsonpath='{.spec.api.containers[?(@.name=="api")].image}'
From the output, copy the entire image link.
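Example of a system response, where the registry, repository, and tag are illustrative:
registry.example.com/tungsten/tf-config-api:1.2.3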
Terminate the configuration and analytics services and stop the database changes associated with northbound APIs on all systems.
Note
The TF Operator watches related resources and keeps them updated and healthy. If any resource is deleted or changed, the TF Operator automatically runs reconciliation to re-create the resource or to revert the configuration to the desired state. Therefore, the TF Operator must not be running during the database restoration.
Scale the tungstenfabric-operator deployment to 0 replicas:
kubectl -n tf scale deploy tungstenfabric-operator --replicas 0
Verify the number of replicas:
kubectl -n tf get deploy tungstenfabric-operator
Example of a positive system response:
NAME READY UP-TO-DATE AVAILABLE AGE
tungstenfabric-operator 0/0 0 0 10h
Delete the TF configuration and analytics daemonsets:
kubectl -n tf delete daemonset tf-config
kubectl -n tf delete daemonset tf-config-db
kubectl -n tf delete daemonset tf-analytics
kubectl -n tf delete daemonset tf-analytics-snmp
The TF configuration pods should be automatically terminated.
Verify that the TF configuration pods are terminated:
kubectl -n tf get pod -l app=tf-config
kubectl -n tf get pod -l tungstenfabric=analytics
kubectl -n tf get pod -l tungstenfabric=analytics-snmp
Example of a positive system response:
No resources found.
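Optionally, instead of re-running the get commands until the pods disappear, you can block until they are deleted with kubectl wait; the timeout value is illustrative:
kubectl -n tf wait --for=delete pod -l app=tf-config --timeout=300s
kubectl -n tf wait --for=delete pod -l tungstenfabric=analytics --timeout=300s
kubectl -n tf wait --for=delete pod -l tungstenfabric=analytics-snmp --timeout=300s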
Stop Kafka:
Scale the kafka-operator deployment to 0 replicas:
kubectl -n tf scale deploy kafka-operator --replicas 0
Scale the tf-kafka StatefulSet to 0 replicas:
kubectl -n tf scale sts tf-kafka --replicas 0
Verify the number of replicas:
kubectl -n tf get sts tf-kafka
Example of a positive system response:
NAME READY AGE
tf-kafka 0/0 10h
Stop and wipe the Cassandra database:
Scale the cassandra-operator deployment to 0 replicas:
kubectl -n tf scale deploy cassandra-operator --replicas 0
Scale the tf-cassandra-config-dc1-rack1 StatefulSet to 0 replicas:
kubectl -n tf scale sts tf-cassandra-config-dc1-rack1 --replicas 0
Verify the number of replicas:
kubectl -n tf get sts tf-cassandra-config-dc1-rack1
Example of a positive system response:
NAME READY AGE
tf-cassandra-config-dc1-rack1 0/0 10h
Delete Persistent Volume Claims (PVCs) for the Cassandra configuration pods:
kubectl -n tf delete pvc -l app=cassandracluster,cassandracluster=tf-cassandra-config
Once PVCs are deleted, the related Persistent Volumes are automatically released. The release process takes approximately one minute.
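To confirm the cleanup, you can verify that no Cassandra configuration PVCs remain and that the related Persistent Volumes have been released; the grep pattern is illustrative:
kubectl -n tf get pvc -l app=cassandracluster,cassandracluster=tf-cassandra-config
kubectl get pv | grep tf-cassandra-config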
Stop and wipe the ZooKeeper database:
Scale the zookeeper-operator deployment to 0 replicas:
kubectl -n tf scale deploy zookeeper-operator --replicas 0
Scale the tf-zookeeper StatefulSet to 0 replicas:
kubectl -n tf scale sts tf-zookeeper --replicas 0
Verify the number of replicas:
kubectl -n tf get sts tf-zookeeper
Example of a positive system response:
NAME READY AGE
tf-zookeeper 0/0 10h
Delete PVCs for the ZooKeeper configuration pods:
kubectl -n tf delete pvc -l app=tf-zookeeper
Once PVCs are deleted, the related Persistent Volumes are automatically released. The release process takes approximately one minute.
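Similarly, you can verify that no ZooKeeper PVCs remain:
kubectl -n tf get pvc -l app=tf-zookeeper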
Restore the number of replicas to run Cassandra and ZooKeeper; the deleted PVCs are recreated automatically when the StatefulSets scale back up.
Restore the cassandra-operator deployment replicas:
kubectl -n tf scale deploy cassandra-operator --replicas 1
Restore the tf-cassandra-config-dc1-rack1 StatefulSet replicas:
kubectl -n tf scale sts tf-cassandra-config-dc1-rack1 --replicas 3
Verify that Cassandra pods have been created and are running:
kubectl -n tf get pod -l app=cassandracluster,cassandracluster=tf-cassandra-config
Example of a positive system response:
NAME READY STATUS RESTARTS AGE
tf-cassandra-config-dc1-rack1-0 1/1 Running 0 4m43s
tf-cassandra-config-dc1-rack1-1 1/1 Running 0 3m30s
tf-cassandra-config-dc1-rack1-2 1/1 Running 0 2m6s
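Optionally, verify that the Cassandra cluster has formed correctly; every node should be reported as UN (Up/Normal). The cassandra container name is an assumption and may differ in your deployment:
kubectl -n tf exec tf-cassandra-config-dc1-rack1-0 -c cassandra -- nodetool status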
Restore the zookeeper-operator deployment replicas:
kubectl -n tf scale deploy zookeeper-operator --replicas 1
Restore the tf-zookeeper StatefulSet replicas:
kubectl -n tf scale sts tf-zookeeper --replicas 3
Verify that ZooKeeper pods have been created and are running:
kubectl -n tf get pod -l app=tf-zookeeper
Example of a positive system response:
NAME READY STATUS RESTARTS AGE
tf-zookeeper-0 1/1 Running 0 3m23s
tf-zookeeper-1 1/1 Running 0 2m56s
tf-zookeeper-2 1/1 Running 0 2m20s
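Optionally, verify that each ZooKeeper node is healthy; a healthy node replies imok. This sketch assumes that the ruok four-letter-word command is enabled and that nc is available in the image:
kubectl -n tf exec tf-zookeeper-0 -- bash -c 'echo ruok | nc localhost 2181'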
Restore the databases from the backup:
Note
Do not use the TF API container that was used to create the backup file. Such a container opens a session with the Cassandra and ZooKeeper databases as soon as the TF API service starts, while the TF configuration services must remain stopped during the restoration. Because the tools for the database backup and restore are available only in the TF configuration API container, use the steps below to start an idle container based on the config-api image.
Deploy a pod using the configuration API image obtained in the first step:
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/psp: privileged
  labels:
    app: tf-restore-db
  name: tf-restore-db
  namespace: tf
spec:
  containers:
  - name: api
    image: <PUT_LINK_TO_CONFIG_API_IMAGE_FROM_STEP_ABOVE>
    command:
    - sleep
    - infinity
    envFrom:
    - configMapRef:
        name: tf-rabbitmq-cfgmap
    - configMapRef:
        name: tf-zookeeper-cfgmap
    - configMapRef:
        name: tf-cassandra-cfgmap
    - configMapRef:
        name: tf-services-cfgmap
    - secretRef:
        name: tf-os-secret
    imagePullPolicy: Always
  nodeSelector:
    tfcontrol: enabled
  dnsPolicy: ClusterFirstWithHostNet
  enableServiceLinks: true
  hostNetwork: true
  priority: 0
  restartPolicy: Always
  serviceAccount: default
  serviceAccountName: default
EOF
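Before proceeding, verify that the tf-restore-db pod has started successfully; the timeout value is illustrative:
kubectl -n tf wait --for=condition=Ready pod tf-restore-db --timeout=120s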
Copy the database dump to the container:
kubectl cp <PATH_TO_DB_DUMP> tf/tf-restore-db:/tmp/db-dump.json
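Optionally, confirm that the dump has been copied into the container and is not empty:
kubectl -n tf exec tf-restore-db -- ls -lh /tmp/db-dump.json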
Connect to the tf-restore-db container and build the configuration files:
kubectl -n tf exec -it tf-restore-db -- bash
(config-api) $ ./entrypoint.sh
Restore the Cassandra and ZooKeeper data from the database dump:
(config-api) $ cd /usr/lib/python2.7/site-packages/cfgm_common
(config-api) $ python db_json_exim.py --import-from /tmp/db-dump.json
Delete the restore container:
kubectl -n tf delete pod tf-restore-db
Restore the number of replicas to run Kafka:
Restore the kafka-operator deployment replicas:
kubectl -n tf scale deploy kafka-operator --replicas 1
The Kafka operator should automatically restore the number of replicas of the appropriate StatefulSet.
Verify the number of replicas:
kubectl -n tf get sts tf-kafka
Example of a positive system response:
NAME READY AGE
tf-kafka 3/3 10h
Run the TF Operator to restore the TF configuration and analytics services:
Restore the TF Operator deployment replicas:
kubectl -n tf scale deploy tungstenfabric-operator --replicas 1
Verify that the TF Operator is running properly without any restarts:
kubectl -n tf get pod -l name=tungstenfabric-operator
Verify that the configuration pods have been automatically started:
kubectl -n tf get pod -l app=tf-config
kubectl -n tf get pod -l tungstenfabric=analytics
kubectl -n tf get pod -l tungstenfabric=analytics-snmp
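Instead of polling, you can wait until the pods become ready; the timeout values are illustrative:
kubectl -n tf wait --for=condition=Ready pod -l app=tf-config --timeout=600s
kubectl -n tf wait --for=condition=Ready pod -l tungstenfabric=analytics --timeout=600s
kubectl -n tf wait --for=condition=Ready pod -l tungstenfabric=analytics-snmp --timeout=600s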
Restart the tf-control services:
Caution
To avoid network downtime, do not restart all pods simultaneously.
List the tf-control pods:
kubectl -n tf get pods -l app=tf-control
Restart the tf-control pods one by one.
Caution
Before restarting the tf-control pods:
Verify that the new pods are successfully spawned.
Verify that no vRouters are connected to only one tf-control pod that will be restarted.
kubectl -n tf delete pod tf-control-<hash>
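A minimal sketch of such a rolling restart is below. It waits for all tf-control pods to become ready again before deleting the next one, but it does not verify vRouter XMPP connections, so the checks from the caution above must still be performed manually for every pod:
for pod in $(kubectl -n tf get pods -l app=tf-control -o jsonpath='{.items[*].metadata.name}'); do
  kubectl -n tf delete pod "$pod"
  sleep 10
  kubectl -n tf wait --for=condition=Ready pod -l app=tf-control --timeout=300s
done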