Restore TF databases

This section describes how to restore the Cassandra and ZooKeeper TF databases from a db-dump file created as described in Back up TF databases.

Caution

The database backup must be consistent across all systems because the state of the Tungsten Fabric databases is associated with other system databases, such as OpenStack databases.

To restore TF databases:

  1. Obtain the config API image repository and tag.

    kubectl -n tf get tfconfig tf-config -o=jsonpath='{.spec.api.containers[?(@.name=="api")].image}'
    

    From the output, copy the entire image link.
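
    If you want to reuse the link later, for example when filling in the restore pod definition in step 7, you can optionally store it in a shell variable. The variable name below is only an example:

    CONFIG_API_IMAGE=$(kubectl -n tf get tfconfig tf-config -o=jsonpath='{.spec.api.containers[?(@.name=="api")].image}')
    echo "${CONFIG_API_IMAGE}"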

  2. Terminate the configuration and analytics services to stop the database changes associated with the northbound APIs on all systems.

    Note

    The TF Operator watches related resources and keeps them updated and healthy. If any resource is deleted or changed, the TF Operator automatically runs reconciliation to re-create the resource or revert the configuration to the desired state. Therefore, the TF Operator must not be running during the database restoration.

    1. Scale the tungstenfabric-operator deployment to 0 replicas:

      kubectl -n tf scale deploy tungstenfabric-operator --replicas 0
      
    2. Verify the number of replicas:

      kubectl -n tf get deploy tungstenfabric-operator
      

      Example of a positive system response:

      NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
      tungstenfabric-operator   0/0     0            0           10h
      
    3. Delete the TF configuration and analytics daemonsets:

      kubectl -n tf delete daemonset tf-config
      kubectl -n tf delete daemonset tf-config-db
      kubectl -n tf delete daemonset tf-analytics
      kubectl -n tf delete daemonset tf-analytics-snmp
      

      The TF configuration pods should be automatically terminated.

    4. Verify that the TF configuration pods are terminated:

      kubectl -n tf get pod -l app=tf-config
      kubectl -n tf get pod -l tungstenfabric=analytics
      kubectl -n tf get pod -l tungstenfabric=analytics-snmp
      

      Example of a positive system response:

      No resources found.
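
      If the pods are still terminating, you can optionally wait until they are removed before proceeding. The timeout value below is arbitrary; if the pods are already gone, the command may report that no matching resources were found:

      kubectl -n tf wait --for=delete pod -l app=tf-config --timeout=300s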
      
  3. Stop Kafka:

    1. Scale the kafka-operator deployment to 0 replicas:

      kubectl -n tf scale deploy kafka-operator --replicas 0
      
    2. Scale the tf-kafka statefulSet to 0 replicas:

      kubectl -n tf scale sts tf-kafka --replicas 0
      
    3. Verify the number of replicas:

      kubectl -n tf get sts tf-kafka
      

      Example of a positive system response:

      NAME       READY   AGE
      tf-kafka   0/0     10h
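
      You can additionally verify that the Kafka pods have been terminated. The command below simply filters the pod list by name, which may differ in your deployment; an empty output is expected:

      kubectl -n tf get pod | grep tf-kafka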
      
  4. Stop and wipe the Cassandra database:

    1. Scale the cassandra-operator deployment to 0 replicas:

      kubectl -n tf scale deploy cassandra-operator --replicas 0
      
    2. Scale the tf-cassandra-config-dc1-rack1 statefulSet to 0 replicas:

      kubectl -n tf scale sts tf-cassandra-config-dc1-rack1 --replicas 0
      
    3. Verify the number of replicas:

      kubectl -n tf get sts tf-cassandra-config-dc1-rack1
      

      Example of a positive system response:

      NAME                            READY   AGE
      tf-cassandra-config-dc1-rack1   0/0     10h
      
    4. Delete Persistent Volume Claims (PVCs) for the Cassandra configuration pods:

      kubectl -n tf delete pvc -l app=cassandracluster,cassandracluster=tf-cassandra-config
      

      Once PVCs are deleted, the related Persistent Volumes are automatically released. The release process takes approximately one minute.
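
      Optionally, verify that the PVCs have been removed before proceeding. An empty output (No resources found) is expected:

      kubectl -n tf get pvc -l app=cassandracluster,cassandracluster=tf-cassandra-config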

  5. Stop and wipe the ZooKeeper database:

    1. Scale the zookeeper-operator deployment to 0 replicas:

      kubectl -n tf scale deploy zookeeper-operator --replicas 0
      
    2. Scale the tf-zookeeper statefulSet to 0 replicas:

      kubectl -n tf scale sts tf-zookeeper --replicas 0
      
    3. Verify the number of replicas:

      kubectl -n tf get sts tf-zookeeper
      

      Example of a positive system response:

      NAME           READY   AGE
      tf-zookeeper   0/0     10h
      
    4. Delete PVCs for the ZooKeeper configuration pods:

      kubectl -n tf delete pvc -l app=tf-zookeeper
      

      Once PVCs are deleted, the related Persistent Volumes are automatically released. The release process takes approximately one minute.
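
      To confirm that the related Persistent Volumes have been released, you can list the PVs that were bound to the ZooKeeper claims. The grep pattern below assumes that the claim names contain tf-zookeeper and may need to be adjusted; depending on the reclaim policy, the PVs either disappear or switch to the Released status:

      kubectl get pv | grep tf-zookeeper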

  6. Restore the number of replicas to run Cassandra and ZooKeeper and re-create the deleted PVCs:

    1. Restore the cassandra-operator deployment replicas:

      kubectl -n tf scale deploy cassandra-operator --replicas 1
      
    2. Restore the tf-cassandra-config-dc1-rack1 statefulSet replicas:

      kubectl -n tf scale sts tf-cassandra-config-dc1-rack1 --replicas 3
      
    3. Verify that Cassandra pods have been created and are running:

      kubectl -n tf get pod -l app=cassandracluster,cassandracluster=tf-cassandra-config
      

      Example of a positive system response:

      NAME                              READY   STATUS    RESTARTS   AGE
      tf-cassandra-config-dc1-rack1-0   1/1     Running   0          4m43s
      tf-cassandra-config-dc1-rack1-1   1/1     Running   0          3m30s
      tf-cassandra-config-dc1-rack1-2   1/1     Running   0          2m6s
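
      Optionally, verify that the Cassandra cluster has formed correctly before restoring the data. The container name cassandra below is an assumption and may differ in your deployment; all nodes are expected to report the UN (Up/Normal) state:

      kubectl -n tf exec tf-cassandra-config-dc1-rack1-0 -c cassandra -- nodetool status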
      
    4. Restore the zookeeper-operator deployment replicas:

      kubectl -n tf scale deploy zookeeper-operator --replicas 1
      
    5. Restore the tf-zookeeper statefulSet replicas:

      kubectl -n tf scale sts tf-zookeeper --replicas 3
      
    6. Verify that ZooKeeper pods have been created and are running:

      kubectl -n tf get pod -l app=tf-zookeeper
      

      Example of a positive system response:

      NAME             READY   STATUS    RESTARTS   AGE
      tf-zookeeper-0   1/1     Running   0          3m23s
      tf-zookeeper-1   1/1     Running   0          2m56s
      tf-zookeeper-2   1/1     Running   0          2m20s
      
  7. Restore the databases from the backup:

    Note

    Do not use the TF API container that was used to create the backup file: once the TF API service starts, it opens a session with the Cassandra and ZooKeeper databases, while the TF configuration services are stopped at this point. Because the tools for the database backup and restore are available only in the TF configuration API container, use the steps below to start a separate container based on the config-api image.

    1. Deploy a pod using the configuration API image obtained in the first step:

      cat <<EOF | kubectl apply -f -
      apiVersion: v1
      kind: Pod
      metadata:
        annotations:
          kubernetes.io/psp: privileged
        labels:
          app: tf-restore-db
        name: tf-restore-db
        namespace: tf
      spec:
        containers:
          - name: api
            image: <PUT_LINK_TO_CONFIG_API_IMAGE_FROM_STEP_ABOVE>
            command:
              - sleep
              - infinity
            envFrom:
              - configMapRef:
                  name: tf-rabbitmq-cfgmap
              - configMapRef:
                  name: tf-zookeeper-cfgmap
              - configMapRef:
                  name: tf-cassandra-cfgmap
              - configMapRef:
                  name: tf-services-cfgmap
              - secretRef:
                  name: tf-os-secret
            imagePullPolicy: Always
        nodeSelector:
          tfcontrol: enabled
        dnsPolicy: ClusterFirstWithHostNet
        enableServiceLinks: true
        hostNetwork: true
        priority: 0
        restartPolicy: Always
        serviceAccount: default
        serviceAccountName: default
      EOF
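
      Before copying files into the pod, you can optionally wait until it reports the Ready condition. The timeout value is arbitrary:

      kubectl -n tf wait --for=condition=Ready pod/tf-restore-db --timeout=300s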
      
    2. Copy the database dump to the container:

      kubectl cp <PATH_TO_DB_DUMP> tf/tf-restore-db:/tmp/db-dump.json
      
    3. Copy the contrail-api.conf file to the container:

      kubectl cp <PATH-TO-CONFIG> tf/tf-restore-db:/tmp/contrail-api.conf
      
    4. Join the tf-restore-db container:

      kubectl -n tf exec -it tf-restore-db -- bash
      
    5. Restore the Cassandra database from the backup:

      (config-api) $ cd /usr/lib/python2.7/site-packages/cfgm_common
      (config-api) $ python db_json_exim.py --import-from /tmp/db-dump.json --api-conf /tmp/contrail-api.conf
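
      The import can take several minutes depending on the dump size. Once it completes, you can optionally check the exit code and then leave the container shell:

      (config-api) $ echo $?
      (config-api) $ exit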
      
    6. Delete the restore container:

      kubectl -n tf delete pod tf-restore-db
      
  8. Restore the number of replicas to run Kafka:

    1. Restore the kafka-operator deployment replicas:

      kubectl -n tf scale deploy kafka-operator --replicas 1
      

      The Kafka operator should automatically restore the number of replicas of the appropriate StatefulSet.

    2. Verify the number of replicas:

      kubectl -n tf get sts tf-kafka
      

      Example of a positive system response:

      NAME       READY   AGE
      tf-kafka   3/3     10h
      
  9. Run the TF Operator to restore the TF configuration and analytics services:

    1. Restore the tungstenfabric-operator deployment replicas:

      kubectl -n tf scale deploy tungstenfabric-operator --replicas 1
      
    2. Verify that the TF Operator is running properly without any restarts:

      kubectl -n tf get pod -l name=tungstenfabric-operator
      
    3. Verify that the configuration pods have been automatically started:

      kubectl -n tf get pod -l app=tf-config
      kubectl -n tf get pod -l tungstenfabric=analytics
      kubectl -n tf get pod -l tungstenfabric=analytics-snmp
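
      Optionally, wait until the configuration pods report the Ready condition. The timeout value is arbitrary; if the command reports that no matching resources were found, the pods have not been created yet, so re-run it after a short delay:

      kubectl -n tf wait --for=condition=Ready pod -l app=tf-config --timeout=600s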
      
  10. Restart the tf-control services:

    Caution

    To avoid network downtime, do not restart all pods simultaneously.

    1. List the tf-control pods:

      kubectl -n tf get pods -l app=tf-control
      
    2. Restart the tf-control pods one by one.

      Caution

      Before restarting the tf-control pods:

      • Verify that the new pods are successfully spawned.

      • Verify that no vRouters are connected to only one tf-control pod that will be restarted.

      kubectl -n tf delete pod tf-control-<hash>
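
      A minimal sketch of such a one-by-one restart is shown below. It does not automate the vRouter connectivity check from the caution above, which you should still perform between iterations; the sleep and timeout values are arbitrary:

      for pod in $(kubectl -n tf get pods -l app=tf-control -o jsonpath='{.items[*].metadata.name}'); do
        kubectl -n tf delete pod "${pod}"
        # Crude delay to let the replacement pod be scheduled before waiting on it
        sleep 30
        # Wait until all tf-control pods, including the replacement, are Ready
        kubectl -n tf wait --for=condition=Ready pod -l app=tf-control --timeout=300s
      done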