Known issues

This section lists known issues with workarounds for the Mirantis Container Cloud release 2.19.0, including the Cluster releases 11.3.0 and 7.9.0.

For other issues that can occur while deploying and operating a Container Cloud cluster, see Deployment Guide: Troubleshooting and Operations Guide: Troubleshooting.

Note

This section also outlines known issues from previous Container Cloud releases that are still valid.


MKE

[20651] A cluster deployment or update fails with not ready compose deployments

A managed cluster deployment, attachment, or update to a Cluster release with MKE versions 3.3.13, 3.4.6, 3.5.1, or earlier may fail with the compose pods flapping (ready > terminating > pending) and with the following error message appearing in logs:

'not ready: deployments: kube-system/compose got 0/0 replicas, kube-system/compose-api
 got 0/0 replicas'
 ready: false
 type: Kubernetes

Workaround:

  1. Disable Docker Content Trust (DCT):

    1. Access the MKE web UI as admin.

    2. Navigate to Admin > Admin Settings.

    3. In the left navigation pane, click Docker Content Trust and disable it.

  2. Restart the affected deployments such as calico-kube-controllers, compose, compose-api, coredns, and so on:

    kubectl -n kube-system delete deployment <deploymentName>
    

    Once done, the cluster deployment or update resumes.

  3. Re-enable DCT.
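Step 2 of the workaround above can be scripted when several deployments are affected. The following is a minimal sketch, assuming that kubectl already points at the affected cluster. The function name restart_kube_system_deployments is hypothetical and not part of any Container Cloud tooling; it only repeats the documented delete command for each name passed to it.

```shell
# Sketch only: run the documented delete command for every deployment name
# given as an argument. The function name is hypothetical.
restart_kube_system_deployments() {
  for deployment_name in "$@"; do
    kubectl -n kube-system delete deployment "$deployment_name"
  done
}
```

For example, `restart_kube_system_deployments compose compose-api coredns` runs the delete command three times. Run it only while DCT is disabled, as described in step 1.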



Bare metal

[24005] Deletion of a node with ironic Pod is stuck in the Terminating state

During deletion of a manager machine running the ironic Pod from a bare metal management cluster, the following problems occur:

  • All Pods are stuck in the Terminating state

  • A new ironic Pod fails to start

  • The related bare metal host is stuck in the deprovisioning state

As a workaround, before deletion of the node running the ironic Pod, cordon and drain the node using the kubectl cordon <nodeName> and kubectl drain <nodeName> commands.
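The two commands can be combined so that the drain starts only if the cordon succeeds. This is a minimal sketch; the function name cordon_and_drain is hypothetical, and it uses only the two commands named above without additional flags.

```shell
# Sketch only: cordon the node, then drain it if the cordon succeeded.
# The function name is hypothetical.
cordon_and_drain() {
  node_name="$1"
  kubectl cordon "$node_name" && kubectl drain "$node_name"
}
```

Depending on the workloads running on the node, kubectl drain may refuse to proceed without extra options such as --ignore-daemonsets. Whether such options are appropriate for a management cluster node is an assumption to verify before use.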

[20736] Region deletion failure after regional deployment failure

If a baremetal-based regional cluster deployment fails before pivoting is done, the corresponding region deletion fails.

Workaround:

Using the command below, manually delete all possible traces of the failed regional cluster deployment, including but not limited to the following objects that contain the kaas.mirantis.com/region label of the affected region:

  • cluster

  • machine

  • baremetalhost

  • baremetalhostprofile

  • l2template

  • subnet

  • ipamhost

  • ipaddr

kubectl delete <objectName> -l kaas.mirantis.com/region=<regionName>
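To avoid repeating the command for each object type, it can be wrapped in a loop over the types listed above. This is a minimal sketch; the function name delete_region_objects is hypothetical, and the deletion order simply follows the list and may need adjusting. The command is used exactly as documented, so namespace selection (for example, with -n) is left to the operator.

```shell
# Sketch only: run the documented delete command for each listed object type.
# The function name is hypothetical; the list of types comes from this section.
delete_region_objects() {
  region_name="$1"
  for object_type in cluster machine baremetalhost baremetalhostprofile \
                     l2template subnet ipamhost ipaddr; do
    kubectl delete "$object_type" -l "kaas.mirantis.com/region=${region_name}"
  done
}
```

The list is not exhaustive: other objects that carry the region label may remain and have to be deleted separately.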

Warning

Do not use the same region name again after the regional cluster deployment failure since some objects that reference the region name may still exist.



StackLight

[27732-1] OpenSearch PVC size custom settings are dismissed during deployment

The OpenSearch elasticsearch.persistentVolumeClaimSize custom setting is overwritten by logging.persistentVolumeClaimSize during deployment of a Container Cloud cluster of any type and is set to the default 30Gi.

Note

This issue does not block the OpenSearch cluster operations if the default retention time is set. The default setting is usually enough for the capacity size of this cluster.

The issue may affect the following Cluster releases:

  • 11.2.0 - 11.5.0

  • 7.8.0 - 7.11.0

  • 8.8.0 - 8.10.0 (MOSK clusters)

  • 10.2.4 - 10.8.1 (attached MKE 3.4.x clusters)

  • 13.0.2 - 13.5.1 (attached MKE 3.5.x clusters)

To verify that the cluster is affected:

Note

In the commands below, substitute parameters enclosed in angle brackets to match the affected cluster values.

kubectl --kubeconfig=<managementClusterKubeconfigPath> \
-n <affectedClusterProjectName> \
get cluster <affectedClusterName> \
-o=jsonpath='{.spec.providerSpec.value.helmReleases[*].values.elasticsearch.persistentVolumeClaimSize}' | xargs echo config size:


kubectl --kubeconfig=<affectedClusterKubeconfigPath> \
-n stacklight get pvc -l 'app=opensearch-master' \
-o=jsonpath="{.items[*].status.capacity.storage}" | xargs echo capacity sizes:

  • The cluster is not affected if the configuration size value matches or is less than any capacity size. For example:

    config size: 30Gi
    capacity sizes: 30Gi 30Gi 30Gi
    
    config size: 50Gi
    capacity sizes: 100Gi 100Gi 100Gi
    
  • The cluster is affected if the configuration size is larger than any capacity size. For example:

    config size: 200Gi
    capacity sizes: 100Gi 100Gi 100Gi
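The comparison above can also be done in a script. This is a minimal sketch, assuming that all values use the Gi suffix as in the examples, and reading the rule as "affected if the configuration size is larger than at least one capacity size". The function names to_gi and is_affected are hypothetical.

```shell
# Sketch only: strip the Gi suffix so that sizes can be compared as integers.
# Other suffixes (Mi, Ti, and so on) are not handled.
to_gi() {
  echo "${1%Gi}"
}

# Usage: is_affected <configSize> <capacitySize>...
# Returns 0 (affected) if the configuration size is larger than at least one
# capacity size, and 1 (not affected) otherwise.
is_affected() {
  config_gi="$(to_gi "$1")"
  shift
  for capacity in "$@"; do
    if [ "$config_gi" -gt "$(to_gi "$capacity")" ]; then
      return 0
    fi
  done
  return 1
}
```

For example, `is_affected 200Gi 100Gi 100Gi 100Gi` returns 0, while `is_affected 50Gi 100Gi 100Gi 100Gi` returns 1, matching the examples above.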
    

Workaround for a new cluster creation:

  1. Select from the following options:

    • For a management or regional cluster, during the bootstrap procedure, open cluster.yaml.template for editing.

    • For a managed cluster, open the Cluster object for editing.

      Caution

      For a managed cluster, use the Container Cloud API instead of the web UI for cluster creation.

  2. In the opened .yaml file, add logging.persistentVolumeClaimSize along with elasticsearch.persistentVolumeClaimSize. For example:

    apiVersion: cluster.k8s.io/v1alpha1
    spec:
    ...
      providerSpec:
        value:
        ...
          helmReleases:
          - name: stacklight
            values:
              elasticsearch:
                persistentVolumeClaimSize: 100Gi
              logging:
                enabled: true
                persistentVolumeClaimSize: 100Gi
    
  3. Continue the cluster deployment. The system will use the custom value set in logging.persistentVolumeClaimSize.

    Caution

    If elasticsearch.persistentVolumeClaimSize is absent in the .yaml file, the Admission Controller blocks the configuration update.

Workaround for an existing cluster:

[27732-2] Custom settings for ‘elasticsearch.logstashRetentionTime’ are dismissed

Custom settings for the deprecated elasticsearch.logstashRetentionTime parameter are overwritten by the default setting of 1 day.

The issue may affect the following Cluster releases with enabled elasticsearch.logstashRetentionTime:

  • 11.2.0 - 11.5.0

  • 7.8.0 - 7.11.0

  • 8.8.0 - 8.10.0 (MOSK clusters)

  • 10.2.4 - 10.8.1 (attached MKE 3.4.x clusters)

  • 13.0.2 - 13.5.1 (attached MKE 3.5.x clusters)

As a workaround, in the Cluster object, replace elasticsearch.logstashRetentionTime with elasticsearch.retentionTime, which supersedes the deprecated parameter. For example:

apiVersion: cluster.k8s.io/v1alpha1
kind: Cluster
spec:
  ...
  providerSpec:
    value:
    ...
      helmReleases:
      - name: stacklight
        values:
          elasticsearch:
            retentionTime:
              logstash: 10
              events: 10
              notifications: 10
          logging:
            enabled: true

For the StackLight configuration procedure and parameters description, refer to Configure StackLight.

[20876] StackLight pods get stuck with the ‘NodeAffinity failed’ error

On a managed cluster, the StackLight pods may get stuck with the Pod predicate NodeAffinity failed error in the pod status. The issue may occur if the StackLight node label was added to one machine and then removed from another one.

The issue does not affect the StackLight services. All required StackLight pods migrate successfully; only the extra pods created during pod migration get stuck.

As a workaround, remove the stuck pods:

kubectl --kubeconfig <managedClusterKubeconfig> -n stacklight delete pod <stuckPodName>

Container Cloud web UI

[26416] Failure to upload an MKE client bundle during cluster attachment

Fixed in 2.21.0

During attachment of an existing MKE cluster using the Container Cloud web UI, uploading an MKE client bundle fails, although the web UI displays a false-positive message about a successful upload.

Workaround:

Select from the following options:

  • Fill in the required fields for the MKE client bundle manually.

  • In the Attach Existing MKE Cluster window, use upload MKE client bundle twice to upload ucp.bundle-admin.zip and ucp-docker-bundle.zip located in the first archive.

[23002] Inability to set a custom value for a predefined node label

Fixed in 2.21.0

During machine creation using the Container Cloud web UI, a custom value for a node label cannot be set.

As a workaround, manually add the value to spec.providerSpec.value.nodeLabels in machine.yaml.