Restore Tungsten Fabric data
This section describes how to restore the Cassandra and ZooKeeper databases from the backups created either automatically or manually as described in Back up TF data.
Caution
The data backup must be consistent across all systems because the state of the Tungsten Fabric databases is associated with other system databases, such as OpenStack databases.
Automatically restore the data
Verify that the cluster does not have the tfdbrestore object. If one remains from a previous restoration, delete it:

kubectl -n tf delete tfdbrestores.tf.mirantis.com tf-dbrestore
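A quick way to check is to list any existing objects of this type; an empty result means nothing needs to be deleted:

kubectl -n tf get tfdbrestores.tf.mirantis.com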
Edit the TFOperator custom resource to perform the data restoration:

spec:
  features:
    dbRestoreMode:
      enabled: true
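For example, assuming the TFOperator custom resource is served as tfoperators.tf.mirantis.com and substituting the object name used in your cluster, the snippet above can be added through an interactive edit:

kubectl -n tf edit tfoperators.tf.mirantis.com <TFOPERATOR-CR-NAME>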
Warning
When restoring the data, MOSK stops the Tungsten Fabric services and recreates the database backends that include Cassandra, Kafka, and ZooKeeper.
Note
The automated restoration process relies on the automated database backups configured by the Tungsten Fabric Operator. The Tungsten Fabric data is restored from the backup type specified in the tf-dbBackup section of the Tungsten Fabric Operator custom resource, or from the default pvc type if not specified. For the configuration details, refer to Periodic Tungsten Fabric database backups.
Optional. Specify the name of the backup to be used in the dbDumpName parameter. By default, the latest db-dump is used.

spec:
  features:
    dbRestoreMode:
      enabled: true
      dbDumpName: db-dump-20220111-110138.json
To monitor the restoration status and stage, check the events recorded for the tfdbrestore object:

kubectl -n tf describe tfdbrestores.tf.mirantis.com tf-dbrestore
Example of a system response:
...
Status:
  Health:  Ready
Events:
  Type    Reason                       Age                From          Message
  ----    ------                       ---                ----          -------
  Normal  TfDaemonSetsDeleted          18m (x4 over 18m)  tf-dbrestore  TF DaemonSets were deleted
  Normal  zookeeperOperatorScaledDown  18m                tf-dbrestore  zookeeper operator scaled to 0
  Normal  zookeeperStsScaledDown       18m                tf-dbrestore  tf-zookeeper statefulset scaled to 0
  Normal  cassandraOperatorScaledDown  17m                tf-dbrestore  cassandra operator scaled to 0
  Normal  cassandraStsScaledDown       17m                tf-dbrestore  tf-cassandra-config-dc1-rack1 statefulset scaled to 0
  Normal  cassandraStsPodsDeleted      16m                tf-dbrestore  tf-cassandra-config-dc1-rack1 statefulset pods deleted
  Normal  cassandraPVCDeleted          16m                tf-dbrestore  tf-cassandra-config-dc1-rack1 PVC deleted
  Normal  zookeeperStsPodsDeleted      16m                tf-dbrestore  tf-zookeeper statefulset pods deleted
  Normal  zookeeperPVCDeleted          16m                tf-dbrestore  tf-zookeeper PVC deleted
  Normal  kafkaOperatorScaledDown      16m                tf-dbrestore  kafka operator scaled to 0
  Normal  kafkaStsScaledDown           16m                tf-dbrestore  tf-kafka statefulset scaled to 0
  Normal  kafkaStsPodsDeleted          16m                tf-dbrestore  tf-kafka statefulset pods deleted
  Normal  AllOperatorsStopped          16m                tf-dbrestore  All 3rd party operator's stopped
  Normal  CassandraOperatorScaledUP    16m                tf-dbrestore  CassandraOperator scaled to 1
  Normal  CassandraStsScaledUP         16m                tf-dbrestore  Cassandra statefulset scaled to 3
  Normal  CassandraPodsActive          12m                tf-dbrestore  Cassandra pods active
  Normal  ZookeeperOperatorScaledUP    12m                tf-dbrestore  Zookeeper Operator scaled to 1
  Normal  ZookeeperStsScaledUP         12m                tf-dbrestore  Zookeeper Operator scaled to 3
  Normal  ZookeeperPodsActive          12m                tf-dbrestore  Zookeeper pods active
  Normal  DBRestoreFinished            12m                tf-dbrestore  TF db restore finished
  Normal  TFRestoreDisabled            12m                tf-dbrestore  TF Restore disabled
Note
If the restoration was completed several hours ago, events may not be shown with kubectl describe. If so, verify the Status field and get the events using the following command:

kubectl -n tf get events --field-selector involvedObject.name=tf-dbrestore
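If you only need the overall health value, a jsonpath query similar to the following sketch can be used; the .status.health field path is an assumption based on the Status output shown above:

kubectl -n tf get tfdbrestores.tf.mirantis.com tf-dbrestore -o jsonpath='{.status.health}'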
After the job completes, it can take around 15 minutes for the tf-control services to stabilize. If some pods remain in the CrashLoopBackOff status, restart these pods manually one by one:
List the tf-control pods:

kubectl -n tf get pods -l app=tf-control
Verify that the new pods are successfully spawned.
Verify that no vRouters are connected to only the tf-control pod that will be restarted.
Restart the tf-control pods sequentially:

kubectl -n tf delete pod tf-control-<hash>
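If several pods are affected, the sequential restart can be scripted. The following is a minimal sketch that assumes a standard shell with kubectl access; it deletes only the pods reported in the CrashLoopBackOff status and waits for the tf-control pods to become Ready again before continuing:

# Restart tf-control pods stuck in CrashLoopBackOff, one at a time
for pod in $(kubectl -n tf get pods -l app=tf-control --no-headers | awk '/CrashLoopBackOff/ {print $1}'); do
  kubectl -n tf delete pod "${pod}"
  # Give the controller time to re-create the pod, then wait for readiness
  sleep 10
  kubectl -n tf wait --for=condition=Ready pod -l app=tf-control --timeout=600s
done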
When the restoration completes, MOSK automatically sets dbRestoreMode to false in the Tungsten Fabric Operator custom resource.
Delete the tfdbrestore object from the cluster to be able to perform the next restoration:

kubectl -n tf delete tfdbrestores.tf.mirantis.com tf-dbrestore
Manually restore the data
Obtain the config API image repository and tag:
kubectl -n tf get tfconfig tf-config -o jsonpath='{.spec.images.config.configAPI}'
From the output, copy the entire image link.
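Optionally, capture the image link in a shell variable (CONFIG_API_IMAGE is an arbitrary name used only in this example) so that it can be reused when creating the restore pod later in this procedure:

CONFIG_API_IMAGE=$(kubectl -n tf get tfconfig tf-config -o jsonpath='{.spec.images.config.configAPI}')
echo "${CONFIG_API_IMAGE}"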
Terminate the configuration and analytics services, if the latter are present in your deployment, and stop the database changes associated with northbound APIs on all systems.
Note
The Tungsten Fabric Operator watches related resources and keeps them updated and healthy. If any resource is deleted or changed, the Tungsten Fabric Operator automatically reconciles it to re-create the resource or revert the configuration to the required state. Therefore, the Tungsten Fabric Operator must not be running during the data restoration.
Scale the tungstenfabric-operator deployment to 0 replicas:

kubectl -n tf scale deploy tungstenfabric-operator --replicas 0
Verify the number of replicas:
kubectl -n tf get deploy tungstenfabric-operator
Example of a positive system response:
NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
tungstenfabric-operator   0/0     0            0           10h
Delete the Tungsten Fabric configuration and analytics DaemonSets, if the latter are present in your deployment:
kubectl -n tf delete daemonset tf-config
kubectl -n tf delete daemonset tf-config-db
kubectl -n tf delete daemonset tf-analytics
kubectl -n tf delete daemonset tf-analytics-snmp
The Tungsten Fabric configuration pods should be automatically terminated.
Verify that the Tungsten Fabric configuration and analytics pods, if the latter are present in your deployment, are terminated:
kubectl -n tf get pod -l app=tf-config
kubectl -n tf get pod -l tungstenfabric=analytics
kubectl -n tf get pod -l tungstenfabric=analytics-snmp
Example of a positive system response:
No resources found.
Stop Kafka:
Scale the kafka-operator deployment to 0 replicas:

kubectl -n tf scale deploy kafka-operator --replicas 0
Scale the tf-kafka StatefulSet to 0 replicas:

kubectl -n tf scale sts tf-kafka --replicas 0
Verify the number of replicas:
kubectl -n tf get sts tf-kafka
Example of a positive system response:
NAME       READY   AGE
tf-kafka   0/0     10h
Stop and wipe the Cassandra database:
Scale the cassandra-operator deployment to 0 replicas:

kubectl -n tf scale deploy cassandra-operator --replicas 0
Scale the tf-cassandra-config-dc1-rack1 StatefulSet to 0 replicas:

kubectl -n tf scale sts tf-cassandra-config-dc1-rack1 --replicas 0
Verify the number of replicas:
kubectl -n tf get sts tf-cassandra-config-dc1-rack1
Example of a positive system response:
NAME                            READY   AGE
tf-cassandra-config-dc1-rack1   0/0     10h
Delete Persistent Volume Claims (PVCs) for the Cassandra configuration pods:
kubectl -n tf delete pvc -l app=cassandracluster,cassandracluster=tf-cassandra-config
Once PVCs are deleted, the related Persistent Volumes are automatically released. The release process takes approximately one minute.
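To confirm that the PVCs are gone, re-run the get command with the same label selector and expect an empty result:

kubectl -n tf get pvc -l app=cassandracluster,cassandracluster=tf-cassandra-config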
Stop and wipe the ZooKeeper database:
Scale the zookeeper-operator deployment to 0 replicas:

kubectl -n tf scale deploy zookeeper-operator --replicas 0
Scale the tf-zookeeper StatefulSet to 0 replicas:

kubectl -n tf scale sts tf-zookeeper --replicas 0
Verify the number of replicas:
kubectl -n tf get sts tf-zookeeper
Example of a positive system response:
NAME           READY   AGE
tf-zookeeper   0/0     10h
Delete PVCs for the ZooKeeper configuration pods:
kubectl -n tf delete pvc -l app=tf-zookeeper
Once PVCs are deleted, the related Persistent Volumes are automatically released. The release process takes approximately one minute.
Restore the number of replicas to run Cassandra and ZooKeeper and restore the deleted PVCs:
Restore the cassandra-operator deployment replicas:

kubectl -n tf scale deploy cassandra-operator --replicas 1
Restore the tf-cassandra-config-dc1-rack1 StatefulSet replicas:

kubectl -n tf scale sts tf-cassandra-config-dc1-rack1 --replicas 3
Verify that Cassandra pods have been created and are running:
kubectl -n tf get pod -l app=cassandracluster,cassandracluster=tf-cassandra-config
Example of a positive system response:
NAME                              READY   STATUS    RESTARTS   AGE
tf-cassandra-config-dc1-rack1-0   1/1     Running   0          4m43s
tf-cassandra-config-dc1-rack1-1   1/1     Running   0          3m30s
tf-cassandra-config-dc1-rack1-2   1/1     Running   0          2m6s
Restore the zookeeper-operator deployment replicas:

kubectl -n tf scale deploy zookeeper-operator --replicas 1
Restore the tf-zookeeper StatefulSet replicas:

kubectl -n tf scale sts tf-zookeeper --replicas 3
Verify that ZooKeeper pods have been created and are running:
kubectl -n tf get pod -l app=tf-zookeeper
Example of a positive system response:
NAME             READY   STATUS    RESTARTS   AGE
tf-zookeeper-0   1/1     Running   0          3m23s
tf-zookeeper-1   1/1     Running   0          2m56s
tf-zookeeper-2   1/1     Running   0          2m20s
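Instead of polling the pod lists manually, you can block until both database back ends report Ready, for example (the timeout value is arbitrary):

kubectl -n tf wait --for=condition=Ready pod -l app=cassandracluster,cassandracluster=tf-cassandra-config --timeout=600s
kubectl -n tf wait --for=condition=Ready pod -l app=tf-zookeeper --timeout=600s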
Restore the data from the backup:
Note
Do not use the Tungsten Fabric API container used for the backup file creation. In this case, a session with the Cassandra and ZooKeeper databases is created once the Tungsten Fabric API service starts, but the Tungsten Fabric configuration services are stopped. The tools for the data backup and restore are available only in the Tungsten Fabric configuration API container. Using the steps below, start a blind container based on the config-api image.
Deploy a pod using the configuration API image obtained in the first step:
Note
If your deployment uses the cql Cassandra driver, update the value of the CONFIGDB_CASSANDRA_DRIVER environment variable to cql.

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/psp: privileged
  labels:
    app: tf-restore-db
  name: tf-restore-db
  namespace: tf
spec:
  containers:
  - name: api
    image: <PUT_LINK_TO_CONFIG_API_IMAGE_FROM_STEP_ABOVE>
    command:
    - sleep
    - infinity
    envFrom:
    - configMapRef:
        name: tf-rabbitmq-cfgmap
    - configMapRef:
        name: tf-zookeeper-cfgmap
    - configMapRef:
        name: tf-cassandra-cfgmap
    - configMapRef:
        name: tf-services-cfgmap
    - secretRef:
        name: tf-os-secret
    env:
    - name: CONFIGDB_CASSANDRA_DRIVER
      value: thrift
    imagePullPolicy: Always
  nodeSelector:
    tfcontrol: enabled
  dnsPolicy: ClusterFirstWithHostNet
  enableServiceLinks: true
  hostNetwork: true
  priority: 0
  restartPolicy: Always
  serviceAccount: default
  serviceAccountName: default
EOF
If you use the backup that was created automatically, extend the YAML file content above with the following configuration:
...
spec:
  containers:
  - name: api
    volumeMounts:
    - mountPath: </PATH/TO/MOUNT>
      name: <TF-DBBACKUP-VOL-NAME>
  volumes:
  - name: <TF-DBBACKUP-VOL-NAME>
    persistentVolumeClaim:
      claimName: <TF-DBBACKUP-PVC-NAME>
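The placeholders above must be replaced with values from your cluster; for instance, the claim name of the PVC that stores the automated backups can be looked up in the tf namespace:

kubectl -n tf get pvc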
Copy the database dump to the container:
Warning
Skip this step if you use the auto-backup and have provided the volume definition as described above.
kubectl cp <PATH_TO_DB_DUMP> tf/tf-restore-db:/tmp/db-dump.json
Copy the contrail-api.conf file to the container:

kubectl cp <PATH-TO-CONFIG> tf/tf-restore-db:/tmp/contrail-api.conf
Open a shell in the tf-restore-db container:
kubectl -n tf exec -it tf-restore-db -- bash
Restore the Cassandra database from the backup:
(config-api) $ cd /usr/lib/python3.6/site-packages/cfgm_common
(config-api) $ python db_json_exim.py --import-from /tmp/db-dump.json --api-conf /tmp/contrail-api.conf
Delete the restore container:
kubectl -n tf delete pod tf-restore-db
Restore the number of replicas to run Kafka:
Restore the kafka-operator deployment replicas:

kubectl -n tf scale deploy kafka-operator --replicas 1
The Kafka operator should automatically restore the number of replicas of the appropriate StatefulSet.
Verify the number of replicas:
kubectl -n tf get sts tf-kafka
Example of a positive system response:
NAME       READY   AGE
tf-kafka   3/3     10h
Run the Tungsten Fabric Operator to restore the Tungsten Fabric configuration and analytics services, if the latter were present in your deployment:
Restore the replica for the Tungsten Fabric Operator Deployment:
kubectl -n tf scale deploy tungstenfabric-operator --replicas 1
Verify that the Tungsten Fabric Operator is running properly without any restarts:
kubectl -n tf get pod -l name=tungstenfabric-operator
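One way to confirm that the operator runs without restarts is to print the container restart counters, for example:

kubectl -n tf get pod -l name=tungstenfabric-operator -o jsonpath='{range .items[*]}{.metadata.name}{" restarts: "}{.status.containerStatuses[0].restartCount}{"\n"}{end}'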
Verify that the configuration and analytics pods, if the latter were present in your deployment, have been automatically started:
kubectl -n tf get pod -l app=tf-config
kubectl -n tf get pod -l tungstenfabric=analytics
kubectl -n tf get pod -l tungstenfabric=analytics-snmp
Restart the tf-control services:
Caution
To avoid network downtime, do not restart all pods simultaneously.
List the tf-control pods:

kubectl -n tf get pods -l app=tf-control
Restart the tf-control pods one by one.
Caution
Before restarting the tf-control pods:
Verify that the new pods are successfully spawned.
Verify that no vRouters are connected to only one tf-control pod that will be restarted.
kubectl -n tf delete pod tf-control-<hash>