Tungsten Fabric known issues and limitations

This section lists the Tungsten Fabric known issues with workarounds for the Mirantis OpenStack for Kubernetes release 21.6.


Limitations

Tungsten Fabric does not provide the following functionality:

  • Automatic generation of network port records in DNSaaS (Designate), because Neutron with Tungsten Fabric as a backend is not integrated with DNSaaS. As a workaround, use the Tungsten Fabric built-in DNS service, which enables virtual machines to resolve each other's names.

  • Secret management (Barbican). You cannot use the certificates stored in Barbican to terminate HTTPS on a load balancer.

  • Role-based access control (RBAC) for Neutron objects.

  • Modification of custom vRouter DaemonSets based on the SR-IOV definition in the OsDpl CR.


[10096] tf-control does not refresh IP addresses of Cassandra pods

The tf-control service resolves the DNS names of Cassandra pods at startup and does not update them if the Cassandra pods obtain new IP addresses, for example, after a restart. As a workaround, to refresh the IP addresses of Cassandra pods, restart the tf-control pods one by one:

Caution

Before restarting each tf-control pod:

  • Verify that the pods spawned after the previous restart are up and running.

  • Verify that no vRouter is connected only to the tf-control pod that you are about to restart.

kubectl -n tf delete pod tf-control-<hash>
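The one-by-one restart above can be sketched as a small shell helper. This is a minimal sketch, not a supported procedure: the function names, the 5-second polling interval, and the 600-second timeout are assumptions, and the script does not check vRouter connections, so the verification described in the caution still has to be done manually.

```shell
#!/bin/sh
# Sketch: restart tf-control pods one at a time in the tf namespace.
# Function names, polling interval, and timeout are illustrative assumptions.
NS="tf"

# Print the names (pod/<name>) of all tf-control pods.
list_control_pods() {
  kubectl -n "$NS" get pods -o name | grep '^pod/tf-control-'
}

# Delete each tf-control pod in turn and wait for its replacement
# before moving on to the next one.
restart_control_pods() {
  expected=$(list_control_pods | grep -c .)
  for pod in $(list_control_pods); do
    echo "Restarting ${pod}"
    kubectl -n "$NS" delete "$pod" || return 1
    # Wait until the replacement pod has been created ...
    until [ "$(list_control_pods | grep -c .)" -ge "$expected" ]; do
      sleep 5
    done
    # ... and until every tf-control pod reports the Ready condition.
    # The unquoted substitution is intentional: one argument per pod.
    kubectl -n "$NS" wait --for=condition=Ready $(list_control_pods) --timeout=600s || return 1
  done
}
```

After sourcing the file, run restart_control_pods and stop at the first failure to inspect the pod that did not become Ready.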

[13755] TF pods switch to CrashLoopBackOff after a simultaneous reboot

A simultaneous start of the Cassandra pods, for example, after a reboot of all Cassandra TFConfig or TFAnalytics cluster nodes or during maintenance, may break the Cassandra TFConfig and/or TFAnalytics cluster. In this case, the Cassandra nodes do not join the ring and do not update the IP addresses of the neighbor nodes. As a result, the TF services cannot use the affected Cassandra cluster(s).

To verify that a Cassandra cluster is affected:

Run the nodetool status command specifying the config or analytics cluster and the replica number:

kubectl -n tf exec -it tf-cassandra-<config/analytics>-dc1-rack1-<replica number> -c cassandra -- nodetool status

Example of system response with outdated IP addresses:

Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens       Owns (effective)  Host ID                               Rack
DN  <outdated ip>   ?          256          64.9%             a58343d0-1e3f-4d54-bcdf-9b9b949ca873  r1
DN  <outdated ip>   ?          256          69.8%             67f1d07c-8b13-4482-a2f1-77fa34e90d48  r1
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load       Tokens       Owns (effective)  Host ID                               Rack
UN  <actual ip>      3.84 GiB   256          65.2%             7324ebc4-577a-425f-b3de-96faac95a331  rack1
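To spot affected nodes quickly, the output above can be filtered for rows in the DN (Down/Normal) state. The following is a minimal sketch that assumes the column layout shown in the example; list_down_nodes is a hypothetical helper name, not part of nodetool.

```shell
#!/bin/sh
# Sketch: read `nodetool status` output on stdin and print the address and
# host ID of every node reported as DN (Down/Normal).
# Assumes the column layout from the example: the host ID is the
# second-to-last column and the rack is the last one.
list_down_nodes() {
  awk '$1 == "DN" { print $2, $(NF-1) }'
}

# Possible usage (replace the placeholders in the pod name first):
#   kubectl -n tf exec tf-cassandra-config-dc1-rack1-0 -c cassandra -- \
#     nodetool status | list_down_nodes
```

An empty result means that no node is reported as down. Any printed address that no longer belongs to a running Cassandra pod indicates the stale state described above.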

Workaround:

Manually delete a Cassandra pod from the failed config or analytics cluster to re-initiate the bootstrap process for one of the Cassandra nodes:

kubectl -n tf delete pod tf-cassandra-<config/analytics>-dc1-rack1-<replica number>

[15684] Pods fail when rolling Tungsten Fabric 2011 back to 5.1

Some tf-control and tf-analytics pods may fail during the Tungsten Fabric rollback from version 2011 to 5.1. In this case, the logs of the control container in the tf-control pod and/or the collector container in the tf-analytics pod contain SYS_WARN messages such as … AMQP_QUEUE_DELETE_METHOD caused: PRECONDITION_FAILED - queue '<contrail-control/contrail-collector>.<nodename>' in vhost '/' not empty ….

The workaround is to manually delete the queue that AMQP_QUEUE_DELETE_METHOD fails to delete:

kubectl -n tf exec -it tf-rabbitmq-<replica number> -- rabbitmqctl delete_queue <queue name>
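If the queue name is not at hand from the logs, the candidate queues can be listed before deleting anything. The following is a minimal sketch that assumes the queue names follow the contrail-control.<nodename> and contrail-collector.<nodename> pattern from the warning message and that a non-empty queue is the one blocking deletion; stuck_queues is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: read `rabbitmqctl -q list_queues name messages` output on stdin
# (two columns: queue name, message count) and print the contrail-control
# and contrail-collector queues that still hold messages.
stuck_queues() {
  awk '$1 ~ /^contrail-(control|collector)\./ && $2 > 0 { print $1 }'
}

# Possible usage (replace the replica placeholder in the pod name first;
# no -t flag, so the piped output has no terminal control characters):
#   kubectl -n tf exec tf-rabbitmq-0 -- \
#     rabbitmqctl -q list_queues name messages | stuck_queues
```

Compare the printed names with the queue named in the SYS_WARN message before passing any of them to rabbitmqctl delete_queue.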

[18148] TF resets BGP_ASN, ENCAP_PRIORITY, and VXLAN_VN_ID_MODE to defaults

Invalid

During LCM operations such as a Tungsten Fabric update or upgrade, the following parameters defined by the cluster administrator are reset to their defaults when the tf-config pod restarts:

  • BGP_ASN to 64512

  • ENCAP_PRIORITY to MPLSoUDP,MPLSoGRE,VXLAN

  • VXLAN_VN_ID_MODE to automatic

As a workaround, manually set the values of the required parameters if they differ from the defaults:

controllers:
  tf-config:
    provisioner:
      containers:
      - env:
        - name: BGP_ASN
          value: <USER_BGP_ASN_VALUE>
        - name: ENCAP_PRIORITY
          value: <USER_ENCAP_PRIORITY_VALUE>
        name: provisioner

[19195] Managed cluster status is flapping between the Ready/Not Ready states

Fixed in MOS 22.1

The status of a managed cluster may flap between the Ready and Not Ready states in the Container Cloud web UI. In this case, if the cluster Status field includes a message about the tf/tf-tool-status-aggregator and/or tf-tool-status-party deployments being not ready with 1/1 replicas, the status flapping may be caused by frequent updates of these deployments by the Tungsten Fabric Operator.

Workaround:

  1. Verify whether the tf/tf-tool-status-aggregator and tf-tool-status-party deployments are up and running:

    kubectl -n tf get deployments
    
  2. Safely disable the tf/tf-tool-status-aggregator and tf-tool-status-party deployments through the TFOperator CR:

    spec:
      controllers:
        tf-tool:
          status:
            enabled: false
          statusAggregator:
            enabled: false
          statusThirdParty:
            enabled: false
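The same change can be applied without opening an editor by sending the fragment above as a merge patch. This is a minimal sketch under stated assumptions: the tfoperator resource type and the CR name passed as an argument are not given in this section and must be confirmed on the cluster, for example, with kubectl -n tf get tfoperator; disable_status_tools is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: disable the tf-tool status deployments through the TFOperator CR.
# $1 is the name of the TFOperator resource (assumption: look it up first).
# The JSON body mirrors the YAML fragment from step 2.
disable_status_tools() {
  kubectl -n tf patch tfoperator "$1" --type merge -p \
    '{"spec":{"controllers":{"tf-tool":{"status":{"enabled":false},"statusAggregator":{"enabled":false},"statusThirdParty":{"enabled":false}}}}}'
}
```

A merge patch changes only the listed keys and leaves the rest of the CR untouched, which keeps the change easy to revert by patching the same keys back to true.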