Known issues

The following MKE 3.5.2 known issues have workarounds available:


[MKE-8538] Limited Windows support dump availability

CLI-based support dumps are unavailable on Windows worker nodes.

Workaround:

For Swarm-orchestrated Windows nodes, use the MKE web UI to obtain a support dump. For Kubernetes-orchestrated Windows nodes, you must manually collect the logs.


[FIELD-4200] Reloading firewalld can disable the Docker ingress routing mesh

When the firewalld-policy init container in calico-node reloads firewalld, the reload can remove the iptables chains that the Docker ingress routing mesh depends on, which disables the routing mesh.

Workaround:

  1. Prevent the issue from recurring by disabling firewalld:

    sudo systemctl disable --now firewalld
    
  2. Restore missing iptables chains by restarting dockerd:

    sudo systemctl restart docker
    

    Note

    Restarting dockerd stops all containers on the node, and the node's capacity is unavailable to the cluster until MKE reports the node as healthy again. On manager nodes, restart dockerd one node at a time, and confirm in MKE that each manager is healthy before you move on to the next.

  3. Confirm issue resolution by checking for the presence of the DOCKER-INGRESS iptables chain:

    sudo iptables --list DOCKER-INGRESS
    

    Expected output:

    Chain DOCKER-INGRESS (2 references)
    target     prot opt source               destination
    [...]
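
The three steps, together with the one-manager-at-a-time rule from the note, can be sketched as a small bash script. This is a minimal sketch under stated assumptions, not an MKE-provided tool: the script name fix-ingress.sh, its function names, the --apply and --wait-for-manager flags, and the retry counts and intervals are all hypothetical. It also assumes that running docker node inspect against another manager (for example, through an MKE client bundle) is an acceptable early signal of manager health; it is a Swarm-level check only, so still confirm the node's health in MKE as the note requires.

```shell
#!/usr/bin/env bash
# Hypothetical helper (fix-ingress.sh) for the FIELD-4200 workaround.
# Function names, flags, and retry timings are illustrative, not part of MKE.
set -euo pipefail

# Succeeds if stdin (output of `iptables --list DOCKER-INGRESS`) contains the
# DOCKER-INGRESS chain header. grep reads all input (no -q), so the upstream
# command never receives SIGPIPE under pipefail.
chain_present() {
  grep '^Chain DOCKER-INGRESS' >/dev/null
}

# Succeeds if stdin holds "<state> <reachability>" as printed by
#   docker node inspect --format '{{.Status.State}} {{.ManagerStatus.Reachability}}' NODE
# and the manager is "ready" and "reachable". Swarm-level proxy only.
manager_healthy() {
  local state reachability
  read -r state reachability
  [ "$state" = "ready" ] && [ "$reachability" = "reachable" ]
}

# Steps 1-3 on the local node. Restarting dockerd stops every container here.
apply_workaround() {
  sudo systemctl disable --now firewalld   # step 1: keep firewalld from reloading again
  sudo systemctl restart docker            # step 2: restore the missing iptables chains
  local out
  for _ in 1 2 3 4 5 6; do                 # step 3: check for the chain, up to ~30 s (arbitrary)
    out=$(sudo iptables --list DOCKER-INGRESS 2>&1) || true
    if printf '%s\n' "$out" | chain_present; then
      echo "DOCKER-INGRESS chain is present"
      return 0
    fi
    sleep 5
  done
  echo "DOCKER-INGRESS chain is still missing" >&2
  return 1
}

# Run from a different manager after restarting dockerd on NODE. Polls for up
# to ~5 minutes (arbitrary) until NODE reports ready and reachable.
wait_for_manager() {
  local node=$1
  for _ in $(seq 1 60); do
    if docker node inspect \
        --format '{{.Status.State}} {{.ManagerStatus.Reachability}}' "$node" \
        | manager_healthy; then
      echo "$node is ready and reachable"
      return 0
    fi
    sleep 5
  done
  echo "$node did not become ready and reachable in time" >&2
  return 1
}

# With no arguments the script only defines the functions above.
case "${1:-}" in
  --apply) apply_workaround ;;
  --wait-for-manager) wait_for_manager "${2:?usage: $0 --wait-for-manager NODE}" ;;
esac
```

For example, run ./fix-ingress.sh --apply on the manager you are fixing, then run ./fix-ingress.sh --wait-for-manager <node> from another manager, and check that node in the MKE web UI before you continue with the next manager.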
    

[MKE-8738] Windows Kubernetes worker nodes can fail on long-haul runs

Windows Kubernetes worker nodes can fail during long-haul runs with a DiskPressure error similar to the following:

    time="2022-02-08T17:20:30Z" level=warning msg="error while removing container: failed to unprepare layer C:\\ProgramData\\containerd\\root\\io.containerd.snapshotter.v1.windows\\snapshots\\3707: hcsshim::UnprepareLayer - failed failed in Win32: The system could not find the instance specified. (0x801f0015): unknown"
    time="2022-02-08T17:20:30Z" level=fatal msg="failed to cleanup old containers: failed to unprepare layer C:\\ProgramData\\containerd\\root\\io.containerd.snapshotter.v1.windows\\snapshots\\3707: hcsshim::UnprepareLayer - failed failed in Win32: The system could not find the instance specified. (0x801f0015): unknown"

Workaround:

  1. Identify the stopped task:

    ctr -n com.docker.ucp task ls
    

    Example output:

    TASK                  PID      STATUS
    ucp-tigera-felix      12012    RUNNING
    ucp-kube-proxy        7912     RUNNING
    ucp-kubelet-health    26616    STOPPED
    ucp-tigera-node       3236     RUNNING
    
  2. Identify the containerd-shim process that is associated with the stopped task:

    Get-CimInstance -ClassName Win32_Process `
      -Filter "Name like 'containerd-shim%'" |
      Select-Object ProcessId, CommandLine | Format-List
    
  3. Stop the containerd-shim process that is associated with the stopped task:

    Stop-Process -Id <containerd-shim-pid> -Confirm -PassThru -Force
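
The matching in steps 1 and 2 can be sketched with two bash helpers. This is a minimal sketch that assumes a POSIX shell with awk is available on the Windows node (for example, Git Bash), or that you copy the command output to another machine; the function names stopped_tasks and shim_pid_for_task are hypothetical. It further assumes that the shim's CommandLine contains the task name as a separate argument (for example, after -id) and that the Format-List output does not wrap in the middle of that argument.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the MKE-8738 workaround; not part of MKE.
# Both functions strip Windows CR line endings before matching.

# Reads `ctr -n com.docker.ucp task ls` output on stdin and prints the name of
# every task whose last column (STATUS) is STOPPED, skipping the header row.
stopped_tasks() {
  awk '{ sub(/\r$/, "") }
       NR > 1 && $NF == "STOPPED" { print $1 }'
}

# Reads the `Get-CimInstance ... | Format-List` output on stdin and prints the
# ProcessId of each shim whose CommandLine contains TASK as a
# whitespace-separated argument. Each PID is printed at most once.
shim_pid_for_task() {
  local task=$1
  awk -v task="$task" '
    { sub(/\r$/, "") }
    $1 == "ProcessId" { pid = $3 }
    {
      for (i = 1; i <= NF; i++) {
        if ($i == task && pid != "" && !(pid in seen)) {
          seen[pid] = 1
          print pid
        }
      }
    }'
}
```

For example, ctr -n com.docker.ucp task ls | stopped_tasks prints ucp-kubelet-health for the example output in step 1. If the step 2 output is saved to a file (shims.txt is a hypothetical name), shim_pid_for_task ucp-kubelet-health < shims.txt prints the PID to pass to Stop-Process in step 3.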