Known issues

This section lists known issues with workarounds for the Mirantis Container Cloud release 2.0.0.


AWS

[8013] Managed cluster deployment requiring PVs may fail

Fixed in the Cluster release 7.0.0

Note

The issue below affects only Kubernetes 1.18 deployments. Moving forward, the workaround for this issue will be moved from Release Notes to Operations Guide: Troubleshooting.

On a management cluster with multiple AWS-based managed clusters, some clusters fail to complete the deployments that require persistent volumes (PVs), for example, Elasticsearch. Some of the affected pods get stuck in the Pending state with the pod has unbound immediate PersistentVolumeClaims and node(s) had volume node affinity conflict errors.

Warning

The workaround below applies to HA deployments where data can be rebuilt from replicas. If you have a non-HA deployment, back up any existing data before proceeding, since all data will be lost while applying the workaround.

Workaround:

  1. Obtain the persistent volume claims related to the storage mounts of the affected pods:

    kubectl get pod/<pod_name1> pod/<pod_name2> \
    -o jsonpath='{.spec.volumes[?(@.persistentVolumeClaim)].persistentVolumeClaim.claimName}'
    

    Note

    In the command above and in the subsequent steps, substitute the parameters enclosed in angle brackets with the corresponding values.

  2. Delete the affected Pods and PersistentVolumeClaims to reschedule them. For example, for StackLight:

    kubectl -n stacklight delete \
      pod/<pod_name1> pod/<pod_name2> ... \
      pvc/<pvc_name1> pvc/<pvc_name2> ...
    


Bare metal

[6988] LVM fails to deploy if the volume group name already exists

Fixed in Container Cloud 2.5.0

During a management or managed cluster deployment, LVM cannot be deployed on a new disk if an old volume group with the same name already exists on the target hardware node but on a different disk.

Workaround:

In the bare metal host profile specific to your hardware configuration, add the wipe: true parameter to the device that fails to be deployed. For the procedure details, see Operations Guide: Create a custom host profile.
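As an illustration only, the parameter belongs on the device entry in the profile. The surrounding field names below are placeholders, not the exact schema, so follow the linked procedure for the structure that matches your Container Cloud version:

```yaml
spec:
  devices:
    - device:
        # ...device-matching fields from your hardware profile...
        wipe: true  # wipe existing metadata, including the stale volume group, before deployment
```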


IAM

[2757] IAM fails to start during management cluster deployment

Fixed in Container Cloud 2.4.0

During a management cluster deployment, IAM fails to start with the IAM pods being in the CrashLoopBackOff status.

Workaround:

  1. Log in to the bootstrap node.

  2. Remove the iam-mariadb-state configmap:

    kubectl delete cm -n kaas iam-mariadb-state
    
  3. Manually delete the mariadb pods:

    kubectl delete po -n kaas mariadb-server-{0,1,2}
    

    Wait for the pods to start. If the mariadb pod does not start with the connection to peer timed out exception, repeat step 2.

  4. Obtain the MariaDB database admin password:

    kubectl get secrets -n kaas mariadb-dbadmin-password \
    -o jsonpath='{.data.MYSQL_DBADMIN_PASSWORD}' | base64 -d ; echo
    
  5. Log in to MariaDB:

    kubectl exec -it -n kaas mariadb-server-0 -- bash -c 'mysql -uroot -p<mysqlDbadminPassword>'
    

    Substitute <mysqlDbadminPassword> with the corresponding value obtained in the previous step.

  6. Run the following command:

    DROP DATABASE IF EXISTS keycloak;
    
  7. Manually delete the Keycloak pods:

    kubectl delete po -n kaas iam-keycloak-{0,1,2}
    

StackLight

[7101] Monitoring of disabled components

Fixed in 2.1.0

On the baremetal-based clusters, the monitoring of Ceph and Ironic is enabled even when Ceph and Ironic are disabled. The Ceph issue applies to both management and managed clusters; the Ironic issue applies to managed clusters only.

Workaround:

  1. Open the StackLight configuration manifest as described in Operations Guide: Configure StackLight.

  2. Add the following parameter to the StackLight helmReleases values of the Cluster object to explicitly disable the required component monitoring:

    • For Ceph:

      helmReleases:
        - name: stacklight
          values:
            ...
            ceph:
              disabledOnBareMetal: true
            ...
      
    • For Ironic:

      helmReleases:
        - name: stacklight
          values:
            ...
            ironic:
              disabledOnBareMetal: true
            ...
      

[7324] Ceph monitoring disabled

Fixed in 2.1.0

Ceph monitoring may be disabled on the baremetal-based managed clusters due to a missing provider: BareMetal parameter.

Workaround:

  1. Open the StackLight configuration manifest as described in Operations Guide: Configure StackLight.

  2. Add the provider: BareMetal parameter to the StackLight helmReleases values of the Cluster object:

    spec:
      providerSpec:
        value:
          helmReleases:
          - name: stacklight
            values:
              ...
              provider: BareMetal
              ...
    

Storage

[6164] Small number of PGs per Ceph OSD

Fixed in 2.2.0

After deploying a managed cluster with Ceph, the number of placement groups (PGs) per Ceph OSD may be too small and the Ceph cluster may have the HEALTH_WARN status:

health: HEALTH_WARN
        too few PGs per OSD (3 < min 30)
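
The figures in this warning come from simple division: each OSD carries roughly the total PG count multiplied by the pool replication factor, divided by the number of OSDs. A sketch with hypothetical values that reproduce the 3 < min 30 figure above:

```shell
# PGs per OSD ~= (pg_num * replication factor) / number of OSDs
# The values below are hypothetical, chosen to reproduce the warning
pg_num=32; replicas=3; osds=30
echo $(( pg_num * replicas / osds ))   # well below the minimum of 30
```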

The workaround is to enable the PG balancer to properly manage the number of PGs:

kubectl exec -it $(kubectl get pod -l "app=rook-ceph-tools" --all-namespaces -o jsonpath='{.items[0].metadata.name}') -n rook-ceph -- bash
ceph mgr module enable pg_autoscaler

[7131] rook-ceph-mgr fails during managed cluster deployment

Fixed in 2.2.0

Occasionally, the deployment of a managed cluster may fail during the Ceph Monitor or Manager deployment. In this case, the Ceph cluster may be down and a stack trace similar to the following one may be present in the Ceph Manager logs:

kubectl -n rook-ceph logs rook-ceph-mgr-a-c5dc846f8-k68rs

/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.9/rpm/el7/BUILD/ceph-14.2.9/src/mon/MonMap.h: In function 'void MonMap::add(const mon_info_t&)' thread 7fd3d3744b80 time 2020-09-03 10:16:46.586388
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.9/rpm/el7/BUILD/ceph-14.2.9/src/mon/MonMap.h: 195: FAILED ceph_assert(addr_mons.count(a) == 0)
ceph version 14.2.9 (581f22da52345dba46ee232b73b990f06029a2a0) nautilus (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14a) [0x7fd3ca9b2875]
2: (()+0x253a3d) [0x7fd3ca9b2a3d]
3: (MonMap::add(mon_info_t const&)+0x80) [0x7fd3cad49190]
4: (MonMap::add(std::string const&, entity_addrvec_t const&, int)+0x110) [0x7fd3cad493a0]
5: (MonMap::init_with_ips(std::string const&, bool, std::string const&)+0xc9) [0x7fd3cad43849]
6: (MonMap::build_initial(CephContext*, bool, std::ostream&)+0x314) [0x7fd3cad45af4]
7: (MonClient::build_initial_monmap()+0x130) [0x7fd3cad2e140]
8: (MonClient::get_monmap_and_config()+0x5f) [0x7fd3cad365af]
9: (global_pre_init(std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > > const*, std::vector<char const*, std::allocator<char const*> >&, unsigned int, code_environment_t, int)+0x524) [0x55ce86711444]
10: (global_init(std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > > const*, std::vector<char const*, std::allocator<char const*> >&, unsigned int, code_environment_t, int, char const*, bool)+0x76) [0x55ce86711b56]
11: (main()+0x136) [0x55ce864ff9a6]
12: (__libc_start_main()+0xf5) [0x7fd3c6e73555]
13: (()+0xfc010) [0x55ce86505010]

The workaround is to start the managed cluster deployment from scratch.

[7073] Cannot automatically remove a Ceph node

When you remove a worker node, the corresponding Ceph node is not removed automatically. As a workaround, manually remove the Ceph node from the Ceph cluster as described in Operations Guide: Add, remove, or reconfigure Ceph nodes before removing the worker node from your deployment.

Note

Moving forward, the workaround for this issue will be moved from Release Notes to Operations Guide: Troubleshoot Ceph.


Bootstrap

[7281] Space in PATH causes failure of bootstrap process

Fixed in 2.1.0

A management cluster bootstrap script fails if there is a space in the PATH environment variable. As a workaround, before running the bootstrap.sh script, verify that there are no spaces in the PATH environment variable.
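
This check can be scripted before running bootstrap.sh. The helper below is a sketch; the function name is illustrative and not part of the bootstrap tooling:

```shell
# Fail early if a PATH-like value contains a space, which breaks the bootstrap script
check_path() {
  case "$1" in
    *' '*) echo 'PATH contains a space'; return 1 ;;
    *)     echo 'PATH is clean' ;;
  esac
}

check_path "$PATH"
```

For example, check_path '/opt/my tools/bin:/usr/bin' prints the warning and returns a non-zero status.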


Container Cloud web UI

[249] A newly created project does not display in the Container Cloud web UI

A project that is newly created in the Container Cloud web UI does not display in the Projects list even after refreshing the page. The issue occurs because the token lacks the necessary role for the new project. As a workaround, log out of the Container Cloud web UI and log in again.