Known issues¶
This section lists known issues with workarounds for the Mirantis Container Cloud release 2.29.3 including the Cluster releases 17.3.8, 16.3.8, and 16.4.3. For the known issues in the related MOSK release, see MOSK release notes 24.3.5: Known issues.
For other issues that can occur while deploying and operating a Container Cloud cluster, see Deployment Guide: Troubleshooting and Operations Guide: Troubleshooting.
Note
This section also outlines known issues from previous Container Cloud releases that are still valid.
Bare metal¶
[42386] A load balancer service does not obtain the external IP address¶
Due to the MetalLB upstream issue, a load balancer service may not obtain the external IP address.
The issue occurs when two services share the same external IP address and have the same externalTrafficPolicy value. Initially, the services have the external IP address assigned and are accessible. After modifying the externalTrafficPolicy value for both services from Cluster to Local, the first service that has been changed remains with no external IP address assigned, while the second service, which was changed later, has the external IP assigned as expected.
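To check whether your cluster hits this condition, you can list the external IP address and traffic policy of all services and look for pairs that share the same address. The following is a minimal inspection sketch using standard kubectl output formatting, not part of the official workaround:

kubectl get svc -A -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,EXTERNAL-IP:.status.loadBalancer.ingress[0].ip,POLICY:.spec.externalTrafficPolicy'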
To work around the issue, make a dummy change to the service object where the external IP is <pending>:

1. Identify the service that is stuck:

   kubectl get svc -A | grep pending

   Example of system response:

   stacklight   iam-proxy-prometheus   LoadBalancer   10.233.28.196   <pending>   443:30430/TCP

2. Add an arbitrary label to the service that is stuck. For example:

   kubectl label svc -n stacklight iam-proxy-prometheus reconcile=1

   Example of system response:

   service/iam-proxy-prometheus labeled

3. Verify that the external IP was allocated to the service:

   kubectl get svc -n stacklight iam-proxy-prometheus

   Example of system response:

   NAME                   TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)         AGE
   iam-proxy-prometheus   LoadBalancer   10.233.28.196   10.0.34.108   443:30430/TCP   12d
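Optionally, once the external IP address is assigned, remove the dummy label. The reconcile key below is the arbitrary example used above; adjust it if you added a different label:

kubectl label svc -n stacklight iam-proxy-prometheus reconcile-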
[24005] Deletion of a node with ironic Pod is stuck in the Terminating state¶
During deletion of a manager machine running the ironic Pod from a bare metal management cluster, the following problems occur:

- All Pods are stuck in the Terminating state
- A new ironic Pod fails to start
- The related bare metal host is stuck in the deprovisioning state
As a workaround, before deletion of the node running the ironic Pod, cordon and drain the node using the kubectl cordon <nodeName> and kubectl drain <nodeName> commands.
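The following is a minimal sketch of the cordon and drain sequence. The node name is a placeholder, and the --ignore-daemonsets and --delete-emptydir-data flags are typical when draining nodes that run DaemonSet-managed Pods; verify them against your environment before use:

kubectl cordon <nodeName>
kubectl drain <nodeName> --ignore-daemonsets --delete-emptydir-data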
Ceph¶
[50637] Ceph creates second miracephnodedisable object during node disabling¶
During managed cluster update, if some node is being disabled and at the same time ceph-maintenance-controller is restarted, a second miracephnodedisable object is erroneously created for the node. As a result, the second object fails in the Cleaning state, which blocks the managed cluster update.
Workaround
1. On the affected managed cluster, obtain the list of miracephnodedisable objects:

   kubectl get miracephnodedisable -n ceph-lcm-mirantis

   The system response must contain one completed and one failed miracephnodedisable object for the node being disabled. For example:

   NAME                                               AGE   NODE NAME                                        STATE      LAST CHECK             ISSUE
   nodedisable-353ccad2-8f19-4c11-95c9-a783abb531ba   58m   kaas-node-91207a35-3200-41d1-9ba9-388500970981   Ready      2025-03-06T22:04:48Z
   nodedisable-58bbf563-1c76-4319-8c28-363d73a5efef   57m   kaas-node-91207a35-3200-41d1-9ba9-388500970981   Cleaning   2025-03-07T11:59:27Z   host clean up Job 'ceph-lcm-mirantis/host-cleanup-nodedisable-58bbf563-1c76-4319-8c28-363d73a5efef' is failed, check logs
2. Remove the failed miracephnodedisable object. For example:

   kubectl delete miracephnodedisable -n ceph-lcm-mirantis nodedisable-58bbf563-1c76-4319-8c28-363d73a5efef
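As an optional verification step, which is a suggestion rather than a documented requirement, list the objects again and confirm that only the completed miracephnodedisable object remains for the node being disabled:

kubectl get miracephnodedisable -n ceph-lcm-mirantis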
[26441] Cluster update fails with the MountDevice failed for volume warning¶
Update of a managed cluster based on bare metal with Ceph enabled fails with the PersistentVolumeClaim getting stuck in the Pending state for the prometheus-server StatefulSet and the MountVolume.MountDevice failed for volume warning in the StackLight event logs.
Workaround:
1. Verify that the description of the Pods that failed to run contains the FailedMount events:

   kubectl -n <affectedProjectName> describe pod <affectedPodName>

   In the command above, replace the following values:

   - <affectedProjectName> is the Container Cloud project name where the Pods failed to run
   - <affectedPodName> is a Pod name that failed to run in the specified project

   In the Pod description, identify the node name where the Pod failed to run.

2. Verify that the csi-rbdplugin logs of the affected node contain the rbd volume mount failed: <csi-vol-uuid> is being used error. The <csi-vol-uuid> is a unique RBD volume name.

   1. Identify csiPodName of the corresponding csi-rbdplugin:

      kubectl -n rook-ceph get pod -l app=csi-rbdplugin \
        -o jsonpath='{.items[?(@.spec.nodeName == "<nodeName>")].metadata.name}'

   2. Output the affected csiPodName logs:

      kubectl -n rook-ceph logs <csiPodName> -c csi-rbdplugin

3. Scale down the affected StatefulSet or Deployment of the Pod that fails to 0 replicas (see the scaling sketch after this procedure).

4. On every csi-rbdplugin Pod, search for the stuck csi-vol:

   for pod in `kubectl -n rook-ceph get pods|grep rbdplugin|grep -v provisioner|awk '{print $1}'`; do
     echo $pod
     kubectl exec -it -n rook-ceph $pod -c csi-rbdplugin -- rbd device list | grep <csi-vol-uuid>
   done

5. Unmap the affected csi-vol:

   rbd unmap -o force /dev/rbd<i>

   The /dev/rbd<i> value is a mapped RBD volume that uses csi-vol.

6. Delete the volumeattachment of the affected Pod:

   kubectl get volumeattachments | grep <csi-vol-uuid>
   kubectl delete volumeattachment <id>

7. Scale up the affected StatefulSet or Deployment back to the original number of replicas and wait until its state becomes Running.
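For reference, the following is a minimal sketch of steps 3 and 7 for the prometheus-server StatefulSet mentioned in this issue, assuming that it runs in the stacklight namespace with a single replica; adjust the namespace, workload name, and replica count to your environment:

kubectl -n stacklight scale statefulset prometheus-server --replicas=0
# Perform the cleanup steps above, then restore the original number of replicas:
kubectl -n stacklight scale statefulset prometheus-server --replicas=1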
LCM¶
[50561] The local-volume-provisioner pod switches to CrashLoopBackOff¶
After machine disablement and subsequent re-enablement, persistent volumes (PVs) provisioned by local-volume-provisioner may cause the local-volume-provisioner pod on that machine to switch to the CrashLoopBackOff state.

This occurs because re-enabling the machine results in a new node UID being assigned in Kubernetes. As a result, the owner ID of the volumes provisioned by local-volume-provisioner no longer matches the new node UID. Although the volumes remain in the correct system paths, local-volume-provisioner detects a mismatch in ownership, leading to an unhealthy service state.
Workaround:
1. Identify the name of the affected local-volume-provisioner pod:

   kubectl -n kube-system get pods

   Example of system response extract:

   local-volume-provisioner-h5lrc   0/1   CrashLoopBackOff   33 (2m3s ago)   90m

2. In the local-volume-provisioner logs, identify the affected PVs. For example:

   kubectl logs -n kube-system local-volume-provisioner-h5lrc | less

   Example of system response extract:

   E0304 23:21:31.455148  1 discovery.go:221] Failed to discover local volumes: 5 error(s) while discovering volumes:
   [error creating PV "local-pv-1d04ed53" for volume at "/mnt/local-volumes/openstack-operator/bind-mounts/vol04": persistentvolumes "local-pv-1d04ed53" already exists
   error creating PV "local-pv-ce2dfc24" for volume at "/mnt/local-volumes/openstack-operator/bind-mounts/vol01": persistentvolumes "local-pv-ce2dfc24" already exists
   error creating PV "local-pv-bcb9e4bd" for volume at "/mnt/local-volumes/openstack-operator/bind-mounts/vol02": persistentvolumes "local-pv-bcb9e4bd" already exists
   error creating PV "local-pv-c5924ada" for volume at "/mnt/local-volumes/openstack-operator/bind-mounts/vol03": persistentvolumes "local-pv-c5924ada" already exists
   error creating PV "local-pv-7c7150cf" for volume at "/mnt/local-volumes/openstack-operator/bind-mounts/vol00": persistentvolumes "local-pv-7c7150cf" already exists]

3. Update the pv.kubernetes.io/provisioned-by annotation for all PVs that are mentioned in the already exists errors on the enabled node. The annotation must have the local-volume-provisioner-<K8S-NODE-NAME>-<K8S-NODE-UID> format (see the patch sketch after this procedure).

   To obtain the node UID:

   kubectl get node <K8S-NODE-NAME> -o jsonpath='{.metadata.uid}'

   To edit annotations on the volumes:

   kubectl edit pv <PV-NAME>

   For example:

   kubectl edit pv local-pv-ce2dfc24
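As an alternative to editing each PV interactively, the following is a minimal sketch of a non-interactive patch, assuming the PV name from the example above; <K8S-NODE-NAME> is a placeholder, and you should verify the resulting annotation value before applying the same change to other PVs:

NODE_NAME=<K8S-NODE-NAME>
NODE_UID=$(kubectl get node ${NODE_NAME} -o jsonpath='{.metadata.uid}')
kubectl patch pv local-pv-ce2dfc24 --type merge \
  -p "{\"metadata\":{\"annotations\":{\"pv.kubernetes.io/provisioned-by\":\"local-volume-provisioner-${NODE_NAME}-${NODE_UID}\"}}}"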
[31186,34132] Pods get stuck during MariaDB operations¶
During MariaDB operations on a management cluster, Pods may get stuck in continuous restarts with the following example error:
[ERROR] WSREP: Corrupt buffer header: \
addr: 0x7faec6f8e518, \
seqno: 3185219421952815104, \
size: 909455917, \
ctx: 0x557094f65038, \
flags: 11577. store: 49, \
type: 49
Workaround:
1. Create a backup of the /var/lib/mysql directory on the mariadb-server Pod.

2. Verify that other replicas are up and ready.

3. Remove the galera.cache file for the affected mariadb-server Pod (see the command sketch after this procedure).

4. Remove the affected mariadb-server Pod or wait until it is automatically restarted.
After Kubernetes restarts the Pod, the Pod clones the database in 1-2 minutes and restores the quorum.
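The following is a minimal command sketch of the workaround steps; the Pod name, its namespace, and the backup location are placeholders to adapt to your deployment, and it assumes that the MariaDB data directory is /var/lib/mysql inside the Pod:

# Step 1: back up the data directory of the affected Pod.
kubectl -n <namespace> exec <mariadb-server-pod> -- tar czf /tmp/mysql-backup.tar.gz /var/lib/mysql
kubectl -n <namespace> cp <mariadb-server-pod>:/tmp/mysql-backup.tar.gz ./mysql-backup.tar.gz

# Step 3: remove the galera.cache file for the affected Pod.
kubectl -n <namespace> exec <mariadb-server-pod> -- rm /var/lib/mysql/galera.cache

# Step 4: remove the affected Pod so that Kubernetes recreates it.
kubectl -n <namespace> delete pod <mariadb-server-pod>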
StackLight¶
[43474] Custom Grafana dashboards are corrupted¶
Custom Grafana panels and dashboards may be corrupted after automatic migration of deprecated Angular-based plugins to the React-based ones. For details, see MOSK Deprecation Notes: Angular plugins in Grafana dashboards and the post-update step Back up custom Grafana dashboards in Container Cloud 2.28.4 update notes.
To work around the issue, manually adjust the affected dashboards to restore their custom appearance.
Container Cloud web UI¶
[50181] Failure to deploy a compact cluster¶
A compact MOSK cluster fails to be deployed through the Container Cloud web UI because the web UI does not allow adding labels to the control plane machines or changing the dedicatedControlPlane: false value.
To work around the issue, manually add the required labels using the CLI. Once done, the cluster deployment resumes.
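The following is a minimal sketch of adding a label to a Machine object with kubectl, assuming access to the management cluster kubeconfig; the project namespace, machine name, and label key and value are placeholders, and the exact labels required for a compact cluster depend on your deployment:

kubectl -n <project-namespace> label machine <control-plane-machine-name> <label-key>=<label-value>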
[50168] Inability to use a new project right after creation¶
A newly created project does not display all available tabs in the Container Cloud web UI and contains different access denied errors during the first five minutes after creation.
To work around the issue, refresh the browser five minutes after the project creation.