Troubleshooting

Steps to Debug KubeVirt Cluster Deployments

  1. Check the ClusterDeployment status condition on the management or regional cluster:

    kubectl -n $CLUSTER_NAMESPACE get clusterdeployment.k0rdent.mirantis.com $CLUSTER_NAME -o=jsonpath='{.status.conditions[?(@.type=="Ready")]}' | jq
    
  2. Check the KubevirtCluster status condition on the management or regional cluster:

    kubectl -n $CLUSTER_NAMESPACE get kubevirtcluster $CLUSTER_NAME -o=jsonpath='{.status.conditions[?(@.type=="Ready")]}' | jq
    
  3. Check the VirtualMachine (VM) and VirtualMachineInstance (VMI) status on the KubeVirt Infrastructure Cluster:

    kubectl --kubeconfig $KUBEVIRT_INFRA_KUBECONFIG_PATH -n $CLUSTER_NAMESPACE get vm,vmi -l cluster.x-k8s.io/cluster-name=$CLUSTER_NAME -o=yaml
    
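  4. On the KubeVirt Infrastructure Cluster, find the virt-handler pod running on the node that hosts the VMI, then check its logs:

    kubectl --kubeconfig $KUBEVIRT_INFRA_KUBECONFIG_PATH -n kubevirt get pods -l kubevirt.io=virt-handler -o wide
    kubectl --kubeconfig $KUBEVIRT_INFRA_KUBECONFIG_PATH -n kubevirt logs $VIRT_HANDLER_POD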
    
  5. To check system or k0s logs, you may need to access the VM created on the KubeVirt Infrastructure Cluster. Connect to the VM serial console:

    virtctl console -n $CLUSTER_NAMESPACE $VM_NAME --kubeconfig $KUBEVIRT_INFRA_KUBECONFIG_PATH
    

    Alternatively, forward the SSH port of the VM to a local port and connect to it directly over SSH:

    virtctl port-forward vmi/$VM_NAME -n $CLUSTER_NAMESPACE --kubeconfig $KUBEVIRT_INFRA_KUBECONFIG_PATH $LOCAL_PORT:22
    

    Then you can SSH into the VM:

    ssh -p $LOCAL_PORT capk@127.0.0.1 -i $SSH_PRIVATE_KEY_PATH
    

    Warning

    The SSH key pair is generated by Cluster API Provider KubeVirt during provisioning. Retrieve the private key from the secret created in the management or regional cluster, save it to $SSH_PRIVATE_KEY_PATH, and restrict its permissions so that ssh accepts it:

    kubectl get secret -n $CLUSTER_NAMESPACE $CLUSTER_NAME-ssh-keys -o=jsonpath='{.data.key}' | base64 -d > $SSH_PRIVATE_KEY_PATH
    chmod 600 $SSH_PRIVATE_KEY_PATH
    

  6. Check logs on the VM:

    For k0s logs on a worker node:

    sudo journalctl -u k0sworker

    For k0s logs on a control plane node:

    sudo journalctl -u k0scontroller
    

    For container logs, see the /var/log/containers directory. For more information, see k0s troubleshooting.
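The status checks in steps 1 to 4 can also be run in one pass, which is handy when collecting information for a support request. The following is a minimal sketch, assuming the CLUSTER_NAMESPACE, CLUSTER_NAME, and KUBEVIRT_INFRA_KUBECONFIG_PATH variables used above are set and that the current kubeconfig context points at the management or regional cluster. The function name collect_kubevirt_debug is hypothetical; it prints the raw Ready conditions without jq and lists VMs and VMIs as a table instead of YAML to keep the output compact.

```shell
#!/usr/bin/env bash
# Hypothetical helper: runs the status checks from steps 1-4 in one pass.
# Assumes CLUSTER_NAMESPACE, CLUSTER_NAME and KUBEVIRT_INFRA_KUBECONFIG_PATH
# are set and that the current kubeconfig context targets the management
# or regional cluster.
collect_kubevirt_debug() {
  : "${CLUSTER_NAMESPACE:?CLUSTER_NAMESPACE must be set}"
  : "${CLUSTER_NAME:?CLUSTER_NAME must be set}"
  : "${KUBEVIRT_INFRA_KUBECONFIG_PATH:?KUBEVIRT_INFRA_KUBECONFIG_PATH must be set}"
  local ready='{.status.conditions[?(@.type=="Ready")]}'

  echo "== ClusterDeployment Ready condition =="
  kubectl -n "$CLUSTER_NAMESPACE" get clusterdeployment.k0rdent.mirantis.com \
    "$CLUSTER_NAME" -o=jsonpath="$ready"
  echo

  echo "== KubevirtCluster Ready condition =="
  kubectl -n "$CLUSTER_NAMESPACE" get kubevirtcluster "$CLUSTER_NAME" \
    -o=jsonpath="$ready"
  echo

  echo "== VMs and VMIs on the KubeVirt Infrastructure Cluster =="
  kubectl --kubeconfig "$KUBEVIRT_INFRA_KUBECONFIG_PATH" -n "$CLUSTER_NAMESPACE" \
    get vm,vmi -l "cluster.x-k8s.io/cluster-name=$CLUSTER_NAME"

  echo "== virt-handler pods (note the node of each pod) =="
  kubectl --kubeconfig "$KUBEVIRT_INFRA_KUBECONFIG_PATH" -n kubevirt \
    get pods -l kubevirt.io=virt-handler -o wide
}
```

For example, `collect_kubevirt_debug > kubevirt-debug.log 2>&1` captures all four checks in a single file.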

Known Issues

The KubevirtCluster deployment fails with a proto: integer overflow error

Related issue #369

When deploying the KubeVirt cluster, if the namespace where the ClusterDeployment has been created does not exist on the KubeVirt Infrastructure Cluster, the following misleading error may appear in the cluster-api-provider-kubevirt logs:

E0126 13:41:19.622971       1 controller.go:474] "Reconciler error" err="failed to create load balancer: proto: integer overflow"
controller="kubevirtcluster" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="KubevirtCluster"
KubevirtCluster="kcm-system/my-kubevirt-clusterdeployment1" namespace="kcm-system" name="my-kubevirt-clusterdeployment1"
reconcileID="2074dadd-23e8-4d64-bf87-b7da2905f347"

The ClusterDeployment Ready condition will be:

kubectl -n kcm-system get clusterdeployment my-kubevirt-clusterdeployment1 -o jsonpath='{.status.conditions[?(@.type=="Ready")]}'
{
  "lastTransitionTime": "2026-01-26T13:37:06Z",
  "message": "* InfrastructureReady:
    * LoadBalancerAvailable: proto: integer overflow (0/1 conditions met)
    * ControlPlaneInitialized: Control plane not yet initialized
    * ControlPlaneAvailable: K0sControlPlane status.initialization.controlPlaneInitialized is false
    * WorkersAvailable:
      * MachineDeployment my-kubevirt-clusterdeployment1-md: 0 available replicas, at least 1 required (spec.strategy.rollout.maxUnavailable is 0, spec.replicas is 1)
      * RemoteConnectionProbe: Remote connection not established yet",
  "reason": "Failed",
  "status": "False",
  "type": "Ready"
}

Workaround

This error is most likely caused by a missing namespace: the namespace that contains the ClusterDeployment does not exist on the KubeVirt Infrastructure Cluster. Create that namespace on the KubeVirt Infrastructure Cluster before creating the ClusterDeployment object:

kubectl --kubeconfig <kubevirt-infra-cluster-kubeconfig> create namespace <cld-namespace>
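To make this workaround safe to rerun, for example from a provisioning script, the namespace can be created only when it is missing. This is a sketch; the function name ensure_infra_namespace and its argument order are hypothetical. Note that if the namespace lookup fails for another reason (for example, the cluster is unreachable), the create command runs and reports that error.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create the ClusterDeployment namespace on the KubeVirt
# Infrastructure Cluster only if it does not exist yet, so reruns do not fail.
# Usage: ensure_infra_namespace <kubevirt-infra-cluster-kubeconfig> <cld-namespace>
ensure_infra_namespace() {
  local kubeconfig="$1" namespace="$2"
  if kubectl --kubeconfig "$kubeconfig" get namespace "$namespace" >/dev/null 2>&1; then
    echo "namespace $namespace already exists"
  else
    kubectl --kubeconfig "$kubeconfig" create namespace "$namespace"
  fi
}
```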

bridge-marker CrashLoopBackOff when Ceph RADOS Gateway uses port 8081

On bare-metal worker nodes where Mirantis k0rdent Virtualization and Ceph are deployed together, the bridge-marker DaemonSet managed by the Cluster Network Addons Operator (CNAO) may fail to start. The bridge-marker pod runs with hostNetwork: true and binds its health HTTP server to port 8081. If Ceph RADOS Gateway (RGW) also listens on port 8081 on the same node, bridge-marker cannot bind to the port and exits with an "address already in use" error. The container terminates with exit code 2, its probes never succeed, and the pods remain in CrashLoopBackOff:

kubectl -n kubevirt get pods -l app=bridge-marker
NAME                  READY   STATUS             RESTARTS   AGE
bridge-marker-6pcbm   0/1     CrashLoopBackOff   262        21h
bridge-marker-ck89b   0/1     CrashLoopBackOff   261        21h
bridge-marker-fmpgf   0/1     CrashLoopBackOff   261        21h

The bridge-marker container exposes port 8081 on the host:

kubectl -n kubevirt describe pod -l app=bridge-marker
Containers:
  bridge-marker:
    Port:          8081/TCP
    Host Port:     8081/TCP
    State:         Waiting
      Reason:      CrashLoopBackOff
    Last State:    Terminated
      Reason:      Error
      Exit Code:   2
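To confirm the port conflict, check which process listens on port 8081 on an affected node. On the node, `sudo ss -ltnp 'sport = :8081'` lists the listener together with its owning process. The sketch below wraps the same check in a hypothetical port_has_listener function that reads `ss -ltn` output on standard input, so it works however you reach the node. It assumes the iproute2 `ss` column layout, where the local address and port are the fourth field.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the `ss -ltn` output on stdin contains a
# TCP listener whose local address ends in :<port> (IPv4, IPv6 or wildcard).
# Assumes the iproute2 layout: State Recv-Q Send-Q Local Peer [Process].
# Usage on a node: ss -ltn | port_has_listener 8081 && echo "8081 is taken"
port_has_listener() {
  awk -v port="$1" '
    NR > 1 && $4 ~ (":" port "$") { found = 1 }   # skip header, match Local Address:Port
    END { exit !found }
  '
}
```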

Workaround

Configure the Ceph RADOS Gateway to use port 8082 instead of 8081 in the CephDeployment object. Edit the CephDeployment in the ceph-lcm-mirantis namespace and set spec.objectStorage.rgw.gateway.port to 8082:

kubectl edit cephdeployment -n ceph-lcm-mirantis
apiVersion: lcm.mirantis.com/v1alpha1
kind: CephDeployment
metadata:
  name: rook-ceph
  namespace: ceph-lcm-mirantis
spec:
  objectStorage:
    rgw:
      gateway:
        port: 8082
        securePort: 8443
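If you prefer a non-interactive change, for example in automation, a JSON merge patch sets the same field and leaves sibling fields such as securePort unchanged. This sketch assumes the CephDeployment is named rook-ceph, as in the example above; confirm the actual name with `kubectl get cephdeployment -n ceph-lcm-mirantis` first. The function name set_rgw_port is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set spec.objectStorage.rgw.gateway.port on a
# CephDeployment with a JSON merge patch (other fields are left unchanged).
# Usage: set_rgw_port rook-ceph 8082
set_rgw_port() {
  local name="$1" port="$2"
  case "$port" in
    ''|*[!0-9]*) echo "port must be numeric: $port" >&2; return 1 ;;
  esac
  kubectl -n ceph-lcm-mirantis patch cephdeployment "$name" --type merge \
    -p "{\"spec\":{\"objectStorage\":{\"rgw\":{\"gateway\":{\"port\":$port}}}}}"
}
```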

After the Ceph controller applies the updated configuration, verify that RGW is listening on port 8082 and that bridge-marker pods become Ready:

kubectl -n kubevirt get pods -l app=bridge-marker
kubectl get cephdeployment -n ceph-lcm-mirantis
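Instead of polling manually, `kubectl wait` can block until the bridge-marker pods report Ready. Because the kubelet delays CrashLoopBackOff restarts by up to five minutes, the sketch below uses a generous default timeout; deleting the pods with `kubectl -n kubevirt delete pod -l app=bridge-marker` makes the DaemonSet recreate them immediately instead. The function name wait_bridge_marker_ready and the 600s default are assumptions; adjust them for your environment.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait until every bridge-marker pod reports Ready.
# The kubelet backs off CrashLoopBackOff restarts for up to 5 minutes, so the
# default timeout is 10 minutes. Optional argument: timeout (for example 900s).
wait_bridge_marker_ready() {
  local timeout="${1:-600s}"
  kubectl -n kubevirt wait pod -l app=bridge-marker \
    --for=condition=Ready --timeout="$timeout"
}
```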

Sidecar and Root feature gates are not converted from v1beta1 to v1 in HCO 1.18.2-mira

Starting with HCO 1.18.2-mira, the HyperConverged API moves from hco.kubevirt.io/v1beta1 to hco.kubevirt.io/v1. The conversion webhook migrates most fields automatically, but custom Sidecar and Root feature gate settings from v1beta1 are not preserved.

In v1beta1, these settings were configured as boolean fields under spec.featureGates (for example, sidecar: true or root: true). In v1, feature gates use a list of objects with name and optional state fields. After the upgrade, clusters that previously enabled or disabled sidecar or root behavior may revert to defaults unless the settings are reapplied manually.

Before upgrading, record the existing v1beta1 feature gate values and keep a copy in a file for comparison after the upgrade:

kubectl -n kubevirt get hyperconverged kubevirt-hyperconverged -o jsonpath='{.spec.featureGates}' | jq | tee hco-featuregates-v1beta1.json

After upgrading to 1.18.2-mira, verify the converted HyperConverged CR:

kubectl -n kubevirt get hyperconverged kubevirt-hyperconverged -o jsonpath='{.spec.featureGates}' | jq

Workaround

If sidecar or root settings were lost during conversion, reapply them in the v1 HyperConverged CR:

apiVersion: hco.kubevirt.io/v1
kind: HyperConverged
metadata:
  name: kubevirt-hyperconverged
  namespace: kubevirt
spec:
  featureGates:
    - name: downwardMetrics
    - name: sidecar
    - name: root
  virtualization:
    platform: k0s
  deployment:
    nodePlacements:
      infra:
        nodeSelector:
          kubernetes.io/os: linux

Only include sidecar and root entries if they were previously enabled in your environment. For more information on the v1 HyperConverged API format, see Configuration.
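To reapply a single gate without editing the whole CR, a JSON patch can append it to the v1 featureGates list. This sketch assumes spec.featureGates already exists as a list in the v1 object (a JSON patch add to /spec/featureGates/- fails otherwise) and addresses the resource as hyperconverged.v1.hco.kubevirt.io so that kubectl reads and patches the v1 representation. The function name add_hco_feature_gate is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append {name: <gate>} to spec.featureGates of the v1
# HyperConverged CR, unless an entry with that name is already present.
# Usage: add_hco_feature_gate sidecar
add_hco_feature_gate() {
  local gate="$1"
  local hco=hyperconverged.v1.hco.kubevirt.io
  local current
  current="$(kubectl -n kubevirt get "$hco" kubevirt-hyperconverged \
    -o jsonpath='{.spec.featureGates[*].name}')" || return 1
  # jsonpath prints the names separated by spaces; match whole names only.
  case " $current " in
    *" $gate "*) echo "feature gate $gate already present"; return 0 ;;
  esac
  kubectl -n kubevirt patch "$hco" kubevirt-hyperconverged --type json \
    -p "[{\"op\":\"add\",\"path\":\"/spec/featureGates/-\",\"value\":{\"name\":\"$gate\"}}]"
}
```

As with the full CR, run it only for gates that were enabled before the upgrade, for example `add_hco_feature_gate sidecar`.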