Tungsten Fabric known issues
This section lists the Tungsten Fabric (TF) known issues with workarounds for the Mirantis OpenStack for Kubernetes release 23.3. For TF limitations, see Tungsten Fabric known limitations.
[37684] Cassandra containers are experiencing high resource utilization
[13755] TF pods switch to CrashLoopBackOff after a simultaneous reboot
[37684] Cassandra containers are experiencing high resource utilization
The Cassandra containers of the tf-cassandra-analytics service experience high CPU and memory utilization. Cassandra Analytics runs out of memory, which causes restarts of both Cassandra and the Tungsten Fabric control plane services.
To work around the issue, use the custom images from the Mirantis public repository:
Specify the image for config-api in the TFOperator custom resource:
controllers:
  tf-config:
    api:
      containers:
        - image: mirantis.azurecr.io/tungsten/contrail-controller-config-api:23.2-r21.4.20231208123354
          name: api
Wait for the tf-config pods to restart.
Monitor the Cassandra Analytics resources continuously. If the Out Of Memory (OOM) error is not present, the applied workaround is sufficient.
Otherwise, modify the TF vRouters configuration as well:
controllers:
  tf-vrouter:
    agent:
      containers:
        - env:
            - name: VROUTER_GATEWAY
              value: 10.32.6.1
            - name: DISABLE_TX_OFFLOAD
              value: "YES"
          name: agent
          image: mirantis.azurecr.io/tungsten/contrail-vrouter-agent:23.2-r21.4.20231208123354
To apply the changes, restart the vRouters manually.
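The monitoring step above can be partly scripted. The following is a minimal sketch, not part of the documented workaround. It assumes that an out-of-memory kill appears as the OOMKilled termination reason in the pod status, which is how Kubernetes reports it. The helper name has_oom_kill is hypothetical, and the pod name in the usage comment is a placeholder.

```shell
# Hypothetical helper: succeed (exit 0) if any whitespace-separated word
# on stdin equals "OOMKilled", fail (exit 1) otherwise.
has_oom_kill() {
  awk '{ for (i = 1; i <= NF; i++) if ($i == "OOMKilled") found = 1 }
       END { exit !found }'
}

# Usage (requires cluster access; <pod name> is a placeholder):
#   kubectl -n tf get pod <pod name> \
#     -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}' \
#     | has_oom_kill && echo "OOM kill detected"
```

The query reads the last termination reason of each container, so it only reflects the most recent restart. Continuous monitoring would need to repeat the check or rely on the cluster monitoring stack.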
[13755] TF pods switch to CrashLoopBackOff after a simultaneous reboot
Rebooting all nodes of the Cassandra TFConfig or TFAnalytics cluster, performing maintenance, or any other circumstance that causes the Cassandra pods to start simultaneously may break the Cassandra TFConfig and/or TFAnalytics cluster. In this case, the Cassandra nodes do not join the ring and do not update the IP addresses of their neighbor nodes. As a result, the TF services cannot operate the Cassandra cluster(s).
To verify that a Cassandra cluster is affected:
Run the nodetool status command specifying the config or analytics cluster and the replica number:
kubectl -n tf exec -it tf-cassandra-<config/analytics>-dc1-rack1-<replica number> -c cassandra -- nodetool status
Example of system response with outdated IP addresses:
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load      Tokens  Owns (effective)  Host ID                               Rack
DN  <outdated ip>  ?         256     64.9%             a58343d0-1e3f-4d54-bcdf-9b9b949ca873  r1
DN  <outdated ip>  ?         256     69.8%             67f1d07c-8b13-4482-a2f1-77fa34e90d48  r1
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load      Tokens  Owns (effective)  Host ID                               Rack
UN  <actual ip>    3.84 GiB  256     65.2%             7324ebc4-577a-425f-b3de-96faac95a331  rack1
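In the response above, the affected entries are the rows whose status column reads DN (Down/Normal). As a rough illustration, the check can be reduced to counting those rows. The helper name count_down_nodes is hypothetical, and the sketch assumes the status code is the first whitespace-separated field of each row, as in the example output.

```shell
# Hypothetical helper: read `nodetool status` output on stdin and print
# the number of rows whose first column is "DN" (Down/Normal).
count_down_nodes() {
  awk '$1 == "DN" { n++ } END { print n + 0 }'
}

# Usage (requires cluster access; placeholders as in the command above):
#   kubectl -n tf exec tf-cassandra-<config/analytics>-dc1-rack1-<replica number> \
#     -c cassandra -- nodetool status | count_down_nodes
```

A non-zero count alone does not prove that the listed IP addresses are outdated. Compare the addresses with the current pod IPs before applying the workaround.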
Workaround:
Manually delete the Cassandra pod from the failed config or analytics cluster to re-initiate the bootstrap process for one of the Cassandra nodes:
kubectl -n tf delete pod tf-cassandra-<config/analytics>-dc1-rack1-<replica_num>
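When scripting this workaround, building the pod name from the cluster type and replica number helps avoid typos. The sketch below follows the naming pattern used in the commands above. The helper name cassandra_pod_name is hypothetical, and the sketch assumes the dc1-rack1 part of the name is the same in your deployment.

```shell
# Hypothetical helper: print the Cassandra pod name for a cluster type
# ("config" or "analytics") and a replica number.
cassandra_pod_name() {
  case "$1" in
    config|analytics) ;;
    *) echo "cluster must be 'config' or 'analytics'" >&2; return 1 ;;
  esac
  printf 'tf-cassandra-%s-dc1-rack1-%s\n' "$1" "$2"
}

# Usage (requires cluster access):
#   kubectl -n tf delete pod "$(cassandra_pod_name analytics 0)"
```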