Known issues

The following MKE 3.5.2 known issues have workarounds available:

[MKE-8538] Limited Windows support dump availability
[FIELD-4200] Reloading firewalld can disable docker ingress routing mesh
[MKE-8738] Windows Kubernetes worker nodes can fail on long-haul run

[MKE-8538] Limited Windows support dump availability

CLI-based support dumps are unavailable on Windows worker nodes.

Workaround:

For Swarm-orchestrated Windows nodes, use the MKE web UI to obtain a support dump. For Kubernetes-orchestrated Windows nodes, you must manually collect the logs.

[FIELD-4200] Reloading firewalld can disable docker ingress routing mesh

The calico-node firewalld-policy init container can disable the docker ingress routing mesh when reloading firewalld.

Workaround:

Prevent the issue from recurring by disabling firewalld:
```
sudo systemctl disable --now firewalld
```
Restore missing iptables chains by restarting dockerd:
```
sudo systemctl restart docker
```
Note

Restarting dockerd stops all containers on the corresponding node. The node capacity will not be available to the cluster until the node returns to a healthy state in MKE. You must restart dockerd on manager nodes one node at a time, confirming the health of each one in MKE before moving on to the next.
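The one-node-at-a-time requirement in the note can be sketched as a small shell loop. This is only an illustration under stated assumptions: `restart_node`, `node_state`, and `rolling_restart` are hypothetical helper names, the sketch assumes non-interactive SSH and sudo access to each manager, and it treats the Swarm node state reported by `docker node inspect` as a stand-in for node health. You should still confirm the health of each node in MKE itself before moving on.

```shell
# Hypothetical helpers: override these to match your environment.
# restart_node: restart dockerd on the named node (assumes SSH + sudo access).
restart_node() { ssh "$1" sudo systemctl restart docker; }

# node_state: print the Swarm state of the named node, for example "ready".
node_state() { docker node inspect --format '{{.Status.State}}' "$1"; }

# rolling_restart: restart dockerd on each node in turn, waiting for the
# node to report "ready" before touching the next one.
rolling_restart() {
    for node in "$@"; do
        restart_node "$node" || return 1
        tries=0
        until [ "$(node_state "$node" 2>/dev/null)" = "ready" ]; do
            tries=$((tries + 1))
            if [ "$tries" -ge "${MAX_TRIES:-60}" ]; then
                echo "timeout waiting for $node" >&2
                return 1
            fi
            sleep "${POLL_SECONDS:-10}"
        done
        echo "$node ready"
    done
}
```

For example, `rolling_restart manager-1 manager-2 manager-3`, where the node names are placeholders, run from a session in which `docker node inspect` can reach a manager. The loop stops at the first node that fails to restart or does not return to the ready state within the polling limit, so the remaining managers are left untouched.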

Confirm issue resolution by checking for the presence of the DOCKER-INGRESS iptables chain:

sudo iptables --list DOCKER-INGRESS

Expected output:

Chain DOCKER-INGRESS (2 references)
target     prot opt source               destination
[...]
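The check above can also be scripted, for instance when verifying several nodes. The following is a minimal sketch: `ingress_status` is a hypothetical helper name, and it assumes that the chain header line begins with `Chain DOCKER-INGRESS`, as in the expected output.

```shell
# ingress_status: read the output of `iptables --list DOCKER-INGRESS` on
# stdin, print "present" or "missing", and return non-zero when the chain
# header is not found (hypothetical helper).
ingress_status() {
    if grep -q '^Chain DOCKER-INGRESS '; then
        echo "present"
    else
        echo "missing"
        return 1
    fi
}
```

A possible invocation is `sudo iptables --list DOCKER-INGRESS 2>&1 | ingress_status`. Matching on the header line rather than on the rule rows keeps the check independent of how many rules the chain contains.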

[MKE-8738] Windows Kubernetes worker nodes can fail on long-haul run

During long-haul runs, Windows Kubernetes worker nodes can fail with a DiskPressure error similar to the following:

time="2022-02-08T17:20:30Z" level=warning msg="error while removing container: failed to unprepare layer C:\\ProgramData\\containerd\\root\\io.containerd.snapshotter.v1.windows\\snapshots\\3707: hcsshim::UnprepareLayer - failed failed in Win32: The system could not find the instance specified. (0x801f0015): unknown"
time="2022-02-08T17:20:30Z" level=fatal msg="failed to cleanup old containers: failed to unprepare layer C:\\ProgramData\\containerd\\root\\io.containerd.snapshotter.v1.windows\\snapshots\\3707: hcsshim::UnprepareLayer - failed failed in Win32: The system could not find the instance specified. (0x801f0015): unknown"

Workaround:

Identify the stopped task:

C:\Users\Docker>ctr -n com.docker.ucp task ls

Example output:

TASK                  PID      STATUS
ucp-tigera-felix      12012    RUNNING
ucp-kube-proxy        7912     RUNNING
ucp-kubelet-health    26616    STOPPED
ucp-tigera-node       3236     RUNNING

Identify the containerd-shim process that is associated with the stopped task:

Get-CimInstance -ClassName Win32_Process `
  -Filter "Name like 'containerd-shim%'" |
  Select-Object ProcessId,CommandLine | Format-List

Stop the containerd-shim process that is associated with the stopped task:
```
Stop-Process -Id <containerd-shim-pid> -Confirm -PassThru -Force
```