Mirantis Container Cloud (MCC) becomes part of Mirantis OpenStack for Kubernetes (MOSK)!

Starting with MOSK 25.2, the MOSK documentation set covers all product layers, including MOSK management (formerly Container Cloud). This means everything you need is in one place. Some legacy names may remain in the code and documentation and will be updated in future releases. The separate Container Cloud documentation site will be retired, so please update your bookmarks for continued easy access to the latest content.

Troubleshoot an operating system upgrade with host restart

The mandatory host restart during the operating system (OS) upgrade is designed to be safe and includes precautions that protect user data and cluster integrity. However, in some cases the restart may cause a host-level failure that blocks the cluster update. Use this section to troubleshoot such issues.

Warning

The OS upgrade cannot be rolled back at the host or cluster level. If the OS upgrade fails, you must recover or remove the faulty host before you can complete the cluster upgrade.

Caution

Depending on the cluster configuration, applying security updates and restarting the host can increase the update time of each node to up to 1 hour.
Cluster nodes are updated one by one. Therefore, the update of a large cluster may take several days to complete.
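Because nodes are updated sequentially, the worst-case duration grows linearly with the node count: N nodes at up to 1 hour each may take up to N hours. The following shell function is a minimal sketch of that arithmetic. The function name and the optional per-node parameter are hypothetical, and the estimate ignores any time spent outside the per-node update.

```shell
# Hypothetical helper: worst-case sequential update time.
# Assumes every node takes the maximum per-node time and that
# nodes are updated strictly one by one.
# Usage: estimate_update_time <node_count> [hours_per_node]
estimate_update_time() {
  local nodes="$1"
  local per_node="${2:-1}"              # hours per node; 1 hour is the upper bound above
  local hours=$(( nodes * per_node ))   # total worst-case hours
  local days=$(( (hours + 23) / 24 ))   # round up to whole days
  echo "${hours}h (~${days} day(s))"
}
```

For example, `estimate_update_time 72` prints `72h (~3 day(s))`.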

Pre-upgrade workload lock issues

If the cluster upgrade does not start, verify whether the ceph-clusterworkloadlock object is present in the MOSK management API:

kubectl get clusterworkloadlocks

Example of system response:

NAME                       AGE
ceph-clusterworkloadlock   7h37m

This object indicates that LCM operations that require a host restart cannot start on the cluster. The Ceph Controller verifies that Ceph services are ready for the restart. Once the verification completes, the Ceph Controller removes the ceph-clusterworkloadlock object, and the cluster upgrade starts.

If this object is still present after the upgrade is initiated, inspect the logs of the ceph-controller pod to identify and fix errors:

kubectl -n ceph-lcm-mirantis logs deployments/ceph-controller
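Instead of rechecking by hand, you can poll for the lock object and collect the ceph-controller logs only if the lock does not disappear in time. The following is a minimal sketch, not a Mirantis-provided tool: the function names and the default timeout and interval are hypothetical, and the sketch relies only on the kubectl commands shown above plus the standard `--ignore-not-found`, `-o name`, and `--tail` flags.

```shell
# Hypothetical helper. Return codes: 0 if the lock exists,
# 1 if it does not exist, 2 if the kubectl query itself failed.
ceph_lock_present() {
  local out
  out="$(kubectl get clusterworkloadlocks ceph-clusterworkloadlock --ignore-not-found -o name)" || return 2
  [ -n "$out" ]
}

# Hypothetical helper: poll until the lock is removed. On timeout, print
# the recent ceph-controller logs and return 1.
# Usage: wait_ceph_cluster_lock [timeout_seconds] [interval_seconds]
wait_ceph_cluster_lock() {
  local timeout="${1:-1800}"   # hypothetical default: 30 minutes
  local interval="${2:-30}"    # hypothetical default: check every 30 seconds
  local waited=0 rc
  while :; do
    rc=0
    ceph_lock_present || rc=$?
    if [ "$rc" -eq 1 ]; then
      echo "ceph-clusterworkloadlock removed"
      return 0
    elif [ "$rc" -ne 0 ]; then
      # Do not treat an API error as "lock removed".
      echo "failed to query clusterworkloadlocks" >&2
      return 2
    fi
    if [ "$waited" -ge "$timeout" ]; then
      echo "ceph-clusterworkloadlock still present after ${waited}s" >&2
      kubectl -n ceph-lcm-mirantis logs deployments/ceph-controller --tail=100
      return 1
    fi
    sleep "$interval"
    waited=$(( waited + interval ))
  done
}
```

For example, `wait_ceph_cluster_lock 3600 60` checks once a minute for up to an hour. A failed kubectl query is reported separately so that an unreachable API is not mistaken for a removed lock.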

If a node upgrade does not start, verify whether the NodeWorkloadLock object is present in the MOSK management API:

kubectl get nodeworkloadlocks

If the object is present, inspect the logs of the affected node to identify and fix errors.
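To review the content of any NodeWorkloadLock objects that are present, you can dump each of them as YAML. This is a minimal sketch: the function name is hypothetical, it assumes (as the command above suggests) that the objects are listed without a namespace, and it does not assume specific field names. Read the YAML output to determine which node each lock relates to.

```shell
# Hypothetical helper: print every NodeWorkloadLock object as YAML.
show_node_workload_locks() {
  local locks lock
  # "-o name" prints one "<type>/<name>" entry per line; empty if none exist.
  locks="$(kubectl get nodeworkloadlocks -o name)" || return 1
  if [ -z "$locks" ]; then
    echo "No NodeWorkloadLock objects found"
    return 0
  fi
  for lock in $locks; do
    echo "== $lock"
    kubectl get "$lock" -o yaml
  done
}
```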

Host restart issues

If the host cannot boot after the upgrade, check for the following possible issues:

Invalid boot order configuration in the host BIOS settings
Inspect the host settings using the IPMI console. If you see a message about an invalid boot device, verify and correct the boot order in the host BIOS settings. Set the first boot device to a network card and the second device to a local disk (legacy or UEFI).
The host is stuck in the GRUB rescue mode
If you see the following message, you are likely affected by a known issue in the Ubuntu grub-installer:
Entering rescue mode...
grub rescue>
In this case, redeploy the host with a correctly defined BareMetalHostProfile. To do so, delete the corresponding Machine resource and create a new Machine that uses the correct BareMetalHostProfile. For details, see Create MOSK host profiles.
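Parts of the checks above can be scripted. The following helpers are a hypothetical sketch. They assume that `ipmitool` is installed, that the BMC is reachable over the `lanplus` interface, and that the BMC password is supplied in the `IPMI_PASSWORD` environment variable (`-E`). The IPMI helpers only open the console and read the boot flags; correct the boot order itself in the host BIOS settings as described above. The `redeploy_machine` helper assumes that you have already prepared a manifest for the new Machine that references the correctly defined BareMetalHostProfile. The BMC address, user, namespace, Machine name, and manifest path are placeholders.

```shell
# Hypothetical helper: open the Serial-over-LAN console (interactive;
# type "~." to exit).
ipmi_console() {
  local bmc="$1" user="$2"
  ipmitool -I lanplus -H "$bmc" -U "$user" -E sol activate
}

# Hypothetical helper: read boot parameter 5 (boot flags) as reported by the BMC.
ipmi_boot_flags() {
  local bmc="$1" user="$2"
  ipmitool -I lanplus -H "$bmc" -U "$user" -E chassis bootparam get 5
}

# Hypothetical helper: replace a Machine with a new one defined in a manifest.
# Usage: redeploy_machine <namespace> <machine_name> <new_machine_manifest>
redeploy_machine() {
  local ns="$1" machine="$2" manifest="$3"
  # Refuse to delete anything if the replacement manifest is missing.
  if [ ! -f "$manifest" ]; then
    echo "manifest not found: $manifest" >&2
    return 1
  fi
  # Delete the Machine and wait until the object is gone.
  kubectl -n "$ns" delete machine "$machine" --wait=true || return 1
  # Create the new Machine that references the correct BareMetalHostProfile.
  kubectl -n "$ns" apply -f "$manifest"
}
```

Because deleting a Machine is destructive, `redeploy_machine` checks for the replacement manifest first and does not call kubectl if the file is missing.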