Addressed issues

The following issues have been addressed in the MOSK 26.1 release:

Bare metal

  • [54431] Resolved the issue that caused InfraConnectivityMonitor to report a false-positive ok status for machine readiness while some machines had not yet been processed by the controller. As a result, the number of machines in the InfraConnectivityMonitor status did not match the actual number of cluster machines under infrastructure connectivity monitoring.

  • [55281] Resolved the issue that caused Calico pods to get stuck in the Pending state during MOSK cluster update when pods with new Calico images were scheduled on nodes that had not yet been updated.

  • [56624] Resolved the issue that caused Ansible to get stuck in the Reconfigure state when trying to pull the latest mirantis.azurecr.io/lcm/external/pause image after disabling auditd on MOSK clusters configured with a proxy.

  • [57425] Resolved the issue that caused Ironic to fail provisioning a server with a Linux raid10 device defined in BareMetalHostProfile, which resulted in an Ansible error while reassembling the MD device. For an illustration of such a profile, see the sketch below.
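
For reference, a software raid10 device of this kind is declared through the softRaidDevices section of BareMetalHostProfile. The fragment below is a minimal sketch rather than a complete profile, assuming the typical BareMetalHostProfile layout: all names and sizes are illustrative, and only one of the four member disks is shown in full.

    apiVersion: metal3.io/v1alpha1
    kind: BareMetalHostProfile
    metadata:
      name: raid10-example              # illustrative name
      namespace: default
    spec:
      devices:
      - device:
          minSize: 60Gi
        partitions:
        - name: md_root_part1           # RAID member partition; three more disks
          size: 40Gi                    # with md_root_part2..4 are defined alike
      softRaidDevices:
      - name: /dev/md0
        level: raid10                   # raid10 needs at least four member partitions
        devices:
        - partition: md_root_part1
        - partition: md_root_part2
        - partition: md_root_part3
        - partition: md_root_part4
      fileSystems:
      - fileSystem: ext4
        softRaidDevice: /dev/md0
        mountPoint: /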

Cluster update

  • [55211] Adjusted the validation of update groups to prohibit changing the update group order once the update of any node in that group has started. This prevents situations where several nodes from different update groups start updating at the same time. For the resource involved, see the sketch after this list.

  • [57014] Resolved the issue that caused the cluster update to hang during the worker node update step when updating to a patch release. The issue occurred in seamless-upgrade mode when machines from an update group with a higher index than the one displayed in the stuck update plan step were evacuated prematurely.
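
For context, the order in which update groups are processed is driven by the index field of the UpdateGroup object mentioned in [55211]. Below is a minimal sketch, assuming the usual UpdateGroup layout; the name and namespace are illustrative.

    apiVersion: kaas.mirantis.com/v1alpha1
    kind: UpdateGroup
    metadata:
      name: update-group-workers    # illustrative name
      namespace: managed-ns         # project of the managed cluster
    spec:
      index: 10                     # groups update in ascending index order; the
                                    # order can no longer be changed once any node
                                    # in the group has started updating
      concurrentUpdates: 1          # machines updated in parallel within the group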

OpenStack

  • [42386] Resolved the issue that caused a load balancer service not to obtain the external IP address when two services shared the same external IP and had the same externalTrafficPolicy value. The issue was caused by an upstream MetalLB issue. For an illustration of the IP-sharing setup, see the sketch after this list.

  • [50258] Resolved the issue that prevented openvswitch-ovn-db pods from starting during node maintenance, causing the MOSK cluster to become inoperable.

  • [51127] Resolved the issue that caused a replaced node to create a new OVN database cluster instead of rejoining the existing cluster.

  • [54416] Resolved the issue that caused the OpenStackDeployment object to get stuck in the APPLYING state after cluster update.

  • [54430] Resolved the issue that caused AMQP message delivery to fail when the message size exceeded the configured RabbitMQ maximum message size limit, resulting in MessageDeliveryFailure and potential disruption of services using AMQP.

  • [Ironic] [55512] Resolved the issue with missing image tags that caused the ironic-update-nodes-metadata job to fail, raising the KubeJobFailed alert.

  • [OVN] [55191] Resolved the issue that caused communication failure between VMs using FIPs in the same external network when VMs were connected to the same private network.

  • [OVN] [55262] Resolved the issue that caused sporadic failures of the OVN database bootstrap.

  • [OVN] [55352] Resolved the issue that prevented the OVN database replica from joining the existing cluster after replacement of a failed control plane node due to misrouted raft heartbeats.

  • [OVN] [55768] Resolved the issue that caused an OVN load balancer for monitoring services to become unreachable.
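
To illustrate the IP-sharing scenario from [42386]: MetalLB lets two Services share one external address when both carry the same metallb.universe.tf/allow-shared-ip annotation value, request the same IP, expose non-overlapping ports, and use a matching externalTrafficPolicy. The sketch below uses illustrative names and addresses; on newer MetalLB versions, the IP is requested through the metallb.universe.tf/loadBalancerIPs annotation instead of spec.loadBalancerIP.

    apiVersion: v1
    kind: Service
    metadata:
      name: svc-a                                    # illustrative name
      annotations:
        metallb.universe.tf/allow-shared-ip: "key-1" # same key on both Services
    spec:
      type: LoadBalancer
      loadBalancerIP: 203.0.113.10                   # same external IP for both
      externalTrafficPolicy: Cluster                 # must match between the Services
      selector:
        app: app-a
      ports:
      - port: 80                                     # ports must not overlap
        targetPort: 8080
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: svc-b
      annotations:
        metallb.universe.tf/allow-shared-ip: "key-1"
    spec:
      type: LoadBalancer
      loadBalancerIP: 203.0.113.10
      externalTrafficPolicy: Cluster
      selector:
        app: app-b
      ports:
      - port: 443
        targetPort: 8443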

Others

  • [Ceph] [56345] Resolved the unmarshalling issue in Ceph that caused Ceph LCM to fail or the Ceph update to get stuck with the failed MiraCephMaintenance status.

  • [Core] [54393] Resolved the issue that caused the MOSK cluster status to display outdated information after the health of a MiraCeph-based Ceph cluster was restored.

  • [LCM] [7947] Resolved the issue that caused Docker to panic and its service to restart every 24 hours due to errors related to sending telemetry statistics. The issue was resolved by updating MCR to 25.0.13.1, which does not contain the compatibility issues with sending telemetry statistics. Additionally, sending of MKE telemetry statistics is now disabled in the default configuration.