Known issues
This section lists known issues with workarounds for the Mirantis Container Cloud release 2.29.6, including the Cluster releases 17.4.6 and 16.4.6. For the known issues in the related MOSK release, see MOSK 25.1.1 release notes: Known issues.
For other issues that can occur while deploying and operating a Container Cloud cluster, see Troubleshooting Guide.
Note
This section also outlines known issues from previous releases that are still valid.
Bare metal
[42386] A load balancer service does not obtain the external IP address
Due to an upstream MetalLB issue, a load balancer service may not obtain the external IP address.
The issue occurs when two services share the same external IP address and have the same externalTrafficPolicy value. Initially, both services have the external IP address assigned and are accessible. After modifying the externalTrafficPolicy value for both services from Cluster to Local, the service that was changed first is left without an external IP address, while the second service, which was changed later, has the external IP assigned as expected.
To work around the issue, make a dummy change to the service object where
external IP is <pending>:
Identify the service that is stuck:
kubectl get svc -A | grep pending
Example of system response:
stacklight iam-proxy-prometheus LoadBalancer 10.233.28.196 <pending> 443:30430/TCP
Add an arbitrary label to the service that is stuck. For example:
kubectl label svc -n stacklight iam-proxy-prometheus reconcile=1
Example of system response:
service/iam-proxy-prometheus labeled
Verify that the external IP was allocated to the service:
kubectl get svc -n stacklight iam-proxy-prometheus
Example of system response:
NAME                   TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)         AGE
iam-proxy-prometheus   LoadBalancer   10.233.28.196   10.0.34.108   443:30430/TCP   12d
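Optionally, once the external IP is assigned, you can remove the dummy label again. This cleanup step is not part of the documented workaround; the reconcile label key below simply matches the example above and carries no special meaning:
# Remove the dummy label by appending a dash to the label key
kubectl label svc -n stacklight iam-proxy-prometheus reconcile-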
[24005] Deletion of a node with ironic Pod is stuck in the Terminating state
During deletion of a manager machine running the ironic Pod from a bare
metal management cluster, the following problems occur:
All Pods are stuck in the Terminating state
A new ironic Pod fails to start
The related bare metal host is stuck in the deprovisioning state
As a workaround, before deletion of the node running the ironic Pod,
cordon and drain the node using the kubectl cordon <nodeName> and
kubectl drain <nodeName> commands.
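A minimal sketch of these commands is shown below. The --ignore-daemonsets and --delete-emptydir-data flags are a common assumption for nodes that run DaemonSet-managed Pods and Pods with emptyDir volumes, not a documented requirement:
# Prevent new Pods from being scheduled on the node
kubectl cordon <nodeName>
# Evict the running Pods from the node before deleting it
kubectl drain <nodeName> --ignore-daemonsets --delete-emptydir-data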
Ceph
[50637] Ceph creates second miracephnodedisable object during node disabling
Fixed in MOSK management 2.30.0 and MOSK 25.2
During a managed cluster update, if a node is being disabled and ceph-maintenance-controller is restarted at the same time, a second miracephnodedisable object is erroneously created for the node. As a result, the second object fails in the Cleaning state, which blocks the managed cluster update.
Workaround
On the affected managed cluster, obtain the list of miracephnodedisable objects:
kubectl get miracephnodedisable -n ceph-lcm-mirantis
The system response must contain one completed and one failed miracephnodedisable object for the node being disabled. For example:
NAME                                               AGE   NODE NAME                                        STATE      LAST CHECK             ISSUE
nodedisable-353ccad2-8f19-4c11-95c9-a783abb531ba   58m   kaas-node-91207a35-3200-41d1-9ba9-388500970981   Ready      2025-03-06T22:04:48Z
nodedisable-58bbf563-1c76-4319-8c28-363d73a5efef   57m   kaas-node-91207a35-3200-41d1-9ba9-388500970981   Cleaning   2025-03-07T11:59:27Z   host clean up Job 'ceph-lcm-mirantis/host-cleanup-nodedisable-58bbf563-1c76-4319-8c28-363d73a5efef' is failed, check logs
Remove the failed miracephnodedisable object. For example:
kubectl delete miracephnodedisable -n ceph-lcm-mirantis nodedisable-58bbf563-1c76-4319-8c28-363d73a5efef
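As an optional check after the deletion, you can list the objects again and verify that only the completed entry remains for the node:
kubectl get miracephnodedisable -n ceph-lcm-mirantis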
[26441] Cluster update fails with the MountDevice failed for volume warning
Note
The issue does not reproduce since MOSK 25.2.
Update of a managed cluster based on bare metal with Ceph enabled fails with the PersistentVolumeClaim getting stuck in the Pending state for the prometheus-server StatefulSet and the MountVolume.MountDevice failed for volume warning in the StackLight event logs.
Workaround:
Verify that the description of the Pods that failed to run contains the FailedMount events:
kubectl -n <affectedProjectName> describe pod <affectedPodName>
In the command above, replace the following values:
<affectedProjectName> is the Container Cloud project name where the Pods failed to run
<affectedPodName> is the name of a Pod that failed to run in the specified project
In the Pod description, identify the node name where the Pod failed to run.
Verify that the csi-rbdplugin logs of the affected node contain the rbd volume mount failed: <csi-vol-uuid> is being used error. The <csi-vol-uuid> is a unique RBD volume name.
Identify csiPodName of the corresponding csi-rbdplugin:
kubectl -n rook-ceph get pod -l app=csi-rbdplugin \
  -o jsonpath='{.items[?(@.spec.nodeName == "<nodeName>")].metadata.name}'
Output the affected csiPodName logs:
kubectl -n rook-ceph logs <csiPodName> -c csi-rbdplugin
Scale down the affected StatefulSet or Deployment of the Pod that fails to 0 replicas (see the example after this procedure).
On every csi-rbdplugin Pod, search for the stuck csi-vol:
for pod in `kubectl -n rook-ceph get pods | grep rbdplugin | grep -v provisioner | awk '{print $1}'`; do
  echo $pod
  kubectl exec -it -n rook-ceph $pod -c csi-rbdplugin -- rbd device list | grep <csi-vol-uuid>
done
Unmap the affected csi-vol:
rbd unmap -o force /dev/rbd<i>
The /dev/rbd<i> value is a mapped RBD volume that uses csi-vol.
Delete the volumeattachment of the affected Pod:
kubectl get volumeattachments | grep <csi-vol-uuid>
kubectl delete volumeattachment <id>
Scale up the affected StatefulSet or Deployment back to the original number of replicas and wait until its state becomes Running.
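The scale-down and scale-up steps above can be performed with kubectl scale, as in the minimal sketch below. The prometheus-server StatefulSet in the stacklight namespace follows the example in this issue description, and the original replica count of 1 is an assumption; adjust both to your environment:
# Scale the affected workload down to 0 replicas before the volume cleanup
kubectl -n stacklight scale statefulset prometheus-server --replicas=0
# After the unmap and volumeattachment cleanup, restore the original replica count (1 is assumed here)
kubectl -n stacklight scale statefulset prometheus-server --replicas=1
# Wait until the rollout completes and the Pod is Running
kubectl -n stacklight rollout status statefulset prometheus-server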
LCM
[50561] The local-volume-provisioner pod switches to CrashLoopBackOff
Fixed in MOSK management 2.30.0 and MOSK 25.2
After machine disablement and subsequent re-enablement, persistent volumes
(PVs) provisioned by local-volume-provisioner may cause the
local-volume-provisioner pod on such a machine to switch to the
CrashLoopBackOff state.
This occurs because re-enabling the machine results in a new node UID being
assigned in Kubernetes. As a result, the owner ID of the volumes provisioned by
local-volume-provisioner no longer matches the new node UID. Although the
volumes remain in the correct system paths, local-volume-provisioner
detects a mismatch in ownership, leading to an unhealthy service state.
Workaround:
Identify the ID of the affected local-volume-provisioner pod:
kubectl -n kube-system get pods -o wide
Example of system response extract:
local-volume-provisioner-h5lrc 0/1 CrashLoopBackOff 7 (5m12s ago) 14m 10.233.197.90 <K8S-NODE-NAME> <none> <none>
<K8S-NODE-NAME> is the name of the node where the exemplary pod local-volume-provisioner-h5lrc is running.
In the local-volume-provisioner logs, identify the affected PVs. For example:
kubectl logs -n kube-system local-volume-provisioner-h5lrc | less
Example of system response extract:
E0304 23:21:31.455148 1 discovery.go:221] Failed to discover local volumes: 5 error(s) while discovering volumes: [error creating PV "local-pv-1d04ed53" for volume at "/mnt/local-volumes/openstack-operator/bind-mounts/vol04": persistentvolumes "local-pv-1d04ed53" already exists error creating PV "local-pv-ce2dfc24" for volume at "/mnt/local-volumes/openstack-operator/bind-mounts/vol01": persistentvolumes "local-pv-ce2dfc24" already exists error creating PV "local-pv-bcb9e4bd" for volume at "/mnt/local-volumes/openstack-operator/bind-mounts/vol02": persistentvolumes "local-pv-bcb9e4bd" already exists error creating PV "local-pv-c5924ada" for volume at "/mnt/local-volumes/openstack-operator/bind-mounts/vol03": persistentvolumes "local-pv-c5924ada" already exists error creating PV "local-pv-7c7150cf" for volume at "/mnt/local-volumes/openstack-operator/bind-mounts/vol00": persistentvolumes "local-pv-7c7150cf" already exists]
Update the pv.kubernetes.io/provisioned-by annotation for all PVs that are mentioned in the already exists errors on the enabled node. The annotation must have the local-volume-provisioner-<K8S-NODE-NAME>-<K8S-NODE-UID> format, where <K8S-NODE-NAME> is the name of the node where the local-volume-provisioner pod is running.
To obtain the node UID:
kubectl get node <K8S-NODE-NAME> -o jsonpath='{.metadata.uid}'
To edit annotations on the volumes:
kubectl edit pv <PV-NAME>
For example:
kubectl edit pv local-pv-ce2dfc24
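As an alternative to interactive editing, the annotation can be set with kubectl annotate. This is a sketch rather than part of the documented procedure; the PV name follows the example above, and the placeholder must be replaced with the actual node name:
# Obtain the new node UID and set the owner annotation on the affected PV
NODE_UID=$(kubectl get node <K8S-NODE-NAME> -o jsonpath='{.metadata.uid}')
kubectl annotate pv local-pv-ce2dfc24 --overwrite \
  pv.kubernetes.io/provisioned-by=local-volume-provisioner-<K8S-NODE-NAME>-${NODE_UID}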
[31186,34132] Pods get stuck during MariaDB operations
During MariaDB operations on a management cluster, Pods may get stuck in continuous restarts with the following example error:
[ERROR] WSREP: Corrupt buffer header: \
addr: 0x7faec6f8e518, \
seqno: 3185219421952815104, \
size: 909455917, \
ctx: 0x557094f65038, \
flags: 11577. store: 49, \
type: 49
Workaround:
Create a backup of the /var/lib/mysql directory on the mariadb-server Pod.
Verify that other replicas are up and ready.
Remove the galera.cache file for the affected mariadb-server Pod.
Remove the affected mariadb-server Pod or wait until it is automatically restarted.
After Kubernetes restarts the Pod, the Pod clones the database in 1-2 minutes and restores the quorum.
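A minimal command sketch of these steps is shown below, assuming the affected replica is mariadb-server-2 and that the Pods run in a namespace referred to here as <namespace>; both names are placeholders, not values from the documented procedure:
# Back up /var/lib/mysql from the affected replica
kubectl -n <namespace> exec mariadb-server-2 -- tar czf - /var/lib/mysql > mariadb-server-2-mysql-backup.tgz
# Verify that the other replicas are up and ready
kubectl -n <namespace> get pods | grep mariadb-server
# Remove the galera.cache file of the affected replica
kubectl -n <namespace> exec mariadb-server-2 -- rm -f /var/lib/mysql/galera.cache
# Delete the affected Pod or wait until it is restarted automatically
kubectl -n <namespace> delete pod mariadb-server-2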
StackLight
[43474] Custom Grafana dashboards are corrupted
Custom Grafana panels and dashboards may be corrupted after automatic migration of deprecated Angular-based plugins to the React-based ones. For details, see MOSK Deprecation Notes: Angular plugins in Grafana dashboards and the post-update step Back up custom Grafana dashboards in Container Cloud 2.28.4 update notes.
To work around the issue, manually adjust the affected dashboards to restore their custom appearance.
Container Cloud web UI
[50181] Failure to deploy a compact cluster
Fixed in MOSK management 2.30.0 and MOSK 25.2
A compact MOSK cluster fails to be deployed through the Container Cloud web UI because the web UI does not allow adding any label to the control plane machines or changing dedicatedControlPlane: false.
To work around the issue, manually add the required labels using the CLI. Once done, the cluster deployment resumes.
[50168] Inability to use a new project right after creation
A newly created project does not display all available tabs in the Container Cloud web UI and shows various access denied errors during the first five minutes after creation.
To work around the issue, refresh the browser five minutes after the project creation.