Mirantis Container Cloud (MCC) becomes part of Mirantis OpenStack for Kubernetes (MOSK)!

Starting with MOSK 25.2, the MOSK documentation set covers all product layers, including MOSK management (formerly Container Cloud). This means everything you need is in one place. Some legacy names may remain in the code and documentation and will be updated in future releases. The separate Container Cloud documentation site will be retired, so please update your bookmarks for continued easy access to the latest content.

Restore Tungsten Fabric data¶

Important

Before MOSK 25.2, due to the known issue 53831, do not use the below procedure in clusters that have undergone at least one major or patch cluster update.

Restoration of Tungsten Fabric database from a backup may lead to missing default configuration objects such as default IPAM, global configuration settings, and so on. This may result in critical components such as tf-config and tf-config-db to repeatedly crash, which renders the Dashboard service (OpenStack Horizon) and other services non-functional and impacts workloads.

This section describes how to restore the Cassandra and ZooKeeper databases from the backups created either automatically or manually as described in Back up TF databases.

Caution

The data backup must be consistent across all systems because the state of the Tungsten Fabric databases is associated with other system databases, such as OpenStack databases.

Manually restore the data¶

Obtain the config API image repository and tag.

API v2 Available since MOSK 24.1

kubectl -n tf get tfconfig tf-config -o jsonpath='{.spec.images.config.configAPI}'

API v1alpha1 Removed in MOSK 25.1

kubectl -n tf get tfconfig tf-config -o=jsonpath='{.spec.api.containers[?(@.name=="api")].image}'

From the output, copy the entire image link.

Terminate the configuration and analytics services, if the latter are present in your deployment, and stop the database changes associated with northbound APIs on all systems.

Note

The Tungsten Fabric Operator watches related resources and keeps them updated and healthy. If any resource is deleted or changed, the Tungsten Fabric Operator automatically runs reconciling to create a resource or change the configuration back to the required state. Therefore, the Tungsten Fabric Operator must not be running during the data restoration.
1. Scale the tungstenfabric-operator deployment to 0 replicas:
```
kubectl -n tf scale deploy tungstenfabric-operator --replicas 0
```
2. Verify the number of replicas:
```
kubectl -n tf get deploy tungstenfabric-operator
```
  Example of a positive system response:
```
NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
tungstenfabric-operator   0/0     0            0           10h
```
3. Delete the Tungsten Fabric configuration and analytics DaemonSets, if the latter are present in your deployment:
```
kubectl -n tf delete daemonset tf-config
kubectl -n tf delete daemonset tf-config-db
kubectl -n tf delete daemonset tf-analytics
kubectl -n tf delete daemonset tf-analytics-snmp
```
  The Tungsten Fabric configuration pods should be automatically terminated.
4. Verify that the Tungsten Fabric configuration and analytics pods, if the latter are present in your deployment, are terminated:
```
kubectl -n tf get pod -l app=tf-config
kubectl -n tf get pod -l tungstenfabric=analytics
kubectl -n tf get pod -l tungstenfabric=analytics-snmp
```
  Example of a positive system response:
```
No resources found.
```

Stop Kafka:

Scale the kafka-operator deployment to 0 replicas:

kubectl -n tf scale deploy kafka-operator --replicas 0

Scale the tf-kafka statefulSet to 0 replicas:

kubectl -n tf scale sts tf-kafka --replicas 0

Verify the number of replicas:

kubectl -n tf get sts tf-kafka

Example of a positive system response:

NAME       READY   AGE
tf-kafka   0/0     10h

Stop and wipe the Cassandra database:
1. Scale the cassandra-operator deployment to 0 replicas:
```
kubectl -n tf scale deploy cassandra-operator --replicas 0
```
2. Scale the tf-cassandra-config-dc1-rack1 statefulSet to 0 replicas:
```
kubectl -n tf scale sts tf-cassandra-config-dc1-rack1 --replicas 0
```
3. Verify the number of replicas:
```
kubectl -n tf get sts tf-cassandra-config-dc1-rack1
```
  Example of a positive system response:
```
NAME                            READY   AGE
tf-cassandra-config-dc1-rack1   0/0     10h
```
4. Delete Persistent Volume Claims (PVCs) for the Cassandra configuration pods:
```
kubectl -n tf delete pvc -l app=cassandracluster,cassandracluster=tf-cassandra-config
```
  Once PVCs are deleted, the related Persistent Volumes are automatically released. The release process takes approximately one minute.
Stop and wipe the ZooKeeper database:
1. Scale the zookeeper-operator deployment to 0 replicas:
```
kubectl -n tf scale deploy zookeeper-operator --replicas 0
```
2. Scale the tf-zookeeper statefulSet to 0 replicas:
```
kubectl -n tf scale sts tf-zookeeper --replicas 0
```
3. Verify the number of replicas:
```
kubectl -n tf get sts tf-zookeeper
```
  Example of a positive system response:
```
NAME           READY   AGE
tf-zookeeper   0/0     10h
```
4. Delete PVCs for the ZooKeeper configuration pods:
```
kubectl -n tf delete pvc -l app=tf-zookeeper
```
  Once PVCs are deleted, the related Persistent Volumes are automatically released. The release process takes approximately one minute.

Restore the number of replicas to run Cassandra and ZooKeeper and restore the deleted PVCs.

Restore the cassandra-operator deployment replicas:

kubectl -n tf scale deploy cassandra-operator --replicas 1

Restore the tf-cassandra-config-dc1-rack1 statefulSet replicas:

kubectl -n tf scale sts tf-cassandra-config-dc1-rack1 --replicas 3

Verify that Cassandra pods have been created and are running:

kubectl -n tf get pod -l app=cassandracluster,cassandracluster=tf-cassandra-config

Example of a positive system response:

NAME                              READY   STATUS    RESTARTS   AGE
tf-cassandra-config-dc1-rack1-0   1/1     Running   0          4m43s
tf-cassandra-config-dc1-rack1-1   1/1     Running   0          3m30s
tf-cassandra-config-dc1-rack1-2   1/1     Running   0          2m6s

Restore the zookeeper-operator deployment replicas:

kubectl -n tf scale deploy zookeeper-operator --replicas 1

Restore the tf-zookeeper statefulSet replicas:

kubectl -n tf scale sts tf-zookeeper --replicas 3

Verify that ZooKeeper pods have been created and are running:

kubectl -n tf get pod -l app=tf-zookeeper

Example of a positive system response:

NAME             READY   STATUS    RESTARTS   AGE
tf-zookeeper-0   1/1     Running   0          3m23s
tf-zookeeper-1   1/1     Running   0          2m56s
tf-zookeeper-2   1/1     Running   0          2m20s

Restore the data from the backup:

Note

Do not use the Tungsten Fabric API container used for the backup file creation. In this case, a session with the Cassandra and ZooKeeper databases is created once the Tungsten Fabric API service starts but the Tungsten Fabric configuration services are stopped. The tools for the data backup and restore are available only in the Tungsten Fabric configuration API container. Using the steps below, start a blind container based on the config-api image.

Deploy a pod using the configuration API image obtained in the first step:

Note

Since MOSK 24.1, if your deployment uses the cql Cassandra driver, update the value of the CONFIGDB_CASSANDRA_DRIVER environment variable to cql.

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/psp: privileged
  labels:
    app: tf-restore-db
  name: tf-restore-db
  namespace: tf
spec:
  containers:
    - name: api
      image: <PUT_LINK_TO_CONFIG_API_IMAGE_FROM_STEP_ABOVE>
      command:
        - sleep
        - infinity
      envFrom:
        - configMapRef:
            name: tf-rabbitmq-cfgmap
        - configMapRef:
            name: tf-zookeeper-cfgmap
        - configMapRef:
            name: tf-cassandra-cfgmap
        - configMapRef:
            name: tf-services-cfgmap
        - secretRef:
            name: tf-os-secret
      env:
      - name: CONFIGDB_CASSANDRA_DRIVER
        value: thrift
      imagePullPolicy: Always
  nodeSelector:
    tfcontrol: enabled
  dnsPolicy: ClusterFirstWithHostNet
  enableServiceLinks: true
  hostNetwork: true
  priority: 0
  restartPolicy: Always
  serviceAccount: default
  serviceAccountName: default
EOF

If you use the backup that was created automatically, extend the YAML file content above with the following configuration:

...
spec:
  containers:
  - name: api
    volumeMounts:
      - mountPath: </PATH/TO/MOUNT>
        name: <TF-DBBACKUP-VOL-NAME>
  volumes:
    - name: <TF-DBBACKUP-VOL-NAME>
      persistentVolumeClaim:
        claimName: <TF-DBBACKUP-PVC-NAME>

Copy the database dump to the container:

Warning

Skip this step if you use the auto-backup and have provided the volume definition as described above.
```
kubectl cp <PATH_TO_DB_DUMP> tf/tf-restore-db:/tmp/db-dump.json
```

Copy the contrail-api.conf file to the container:

kubectl cp <PATH-TO-CONFIG> tf/tf-restore-db:/tmp/contrail-api.conf

Join the restarted container:

kubectl -n tf exec -it tf-restore-db -- bash

Restore the Cassandra database from the backup:

Since MOSK 25.1

(config-api) $ cd /usr/lib/python3.6/site-packages/cfgm_common
(config-api) $ python db_json_exim.py --import-from /tmp/db-dump.json --api-conf /tmp/contrail-api.conf

Before MOSK 25.1

(config-api) $ cd /usr/lib/python2.7/site-packages/cfgm_common
(config-api) $ python db_json_exim.py --import-from /tmp/db-dump.json --api-conf /tmp/contrail-api.conf

Delete the restore container:
```
kubectl -n tf delete pod tf-restore-db
```

Restore the replica number to run Kafka:
1. Restore the kafka-operator deployment replicas:
```
kubectl -n tf scale deploy kafka-operator --replicas 1
```
  Kafka operator should automatically restore the number of replicas of the appropriate StatefulSet.
2. Verify the number of replicas:
```
kubectl -n tf get sts tf-kafka
```
  Example of a positive system response:
```
NAME       READY   AGE
tf-kafka   3/3     10h
```
Run the Tungsten Fabric Operator to restore the Tungsten Fabric configuration and analytics services, if the latter were present in your deployment:
1. Restore the replica for the Tungsten Fabric Operator Deployment:
```
kubectl -n tf scale deploy tungstenfabric-operator --replicas 1
```
2. Verify that the Tungsten Fabric Operator is running properly without any restarts:
```
kubectl -n tf get pod -l name=tungstenfabric-operator
```
3. Verify that the configuration and analytics pods, if the latter were present in your deployment, have been automatically started:
```
kubectl -n tf get pod -l app=tf-config
kubectl -n tf get pod -l tungstenfabric=analytics
kubectl -n tf get pod -l tungstenfabric=analytics-snmp
```
Restart the tf-control services:

Caution

To avoid network downtime, do not restart all pods simultaneously.
1. List the tf-control pods
```
kubectl -n tf get pods -l app=tf-control
```
2. Restart the tf-control pods one by one.
  Caution
  
  Before restarting the tf-control pods:
  - Verify that the new pods are successfully spawned.
  - Verify that no vRouters are connected to only one tf-control pod that will be restarted.
```
kubectl -n tf delete pod tf-control-<hash>
```

Restore Tungsten Fabric data¶

Automatically restore the data¶

Manually restore the data¶