Known issues
MKE 3.5.2 issues for which there are available workaround solutions include:
[FIELD-4200] Reloading firewalld can disable docker ingress routing mesh
[MKE-8738] Windows Kubernetes worker nodes can fail on long-haul run
[MKE-8538] Limited Windows support dump availability
CLI-based support dumps are unavailable on Windows worker nodes.
Workaround:
For Swarm-orchestrated Windows nodes, use the MKE web UI to obtain a support dump. For Kubernetes-orchestrated Windows nodes, you must manually collect the logs.
[FIELD-4200] Reloading firewalld can disable docker ingress routing mesh
The calico-node firewalld-policy init container can disable the docker ingress routing mesh when reloading firewalld.
Workaround:
Prevent the issue from recurring by disabling firewalld:
sudo systemctl disable --now firewalld
Restore missing iptables chains by restarting dockerd:
sudo systemctl restart docker
Note
Restarting dockerd stops all containers on the corresponding node. The node capacity will not be available to the cluster until the node returns to a healthy state in MKE. You must restart dockerd on manager nodes one node at a time, confirming the health of each one in MKE before moving on to the next.
Confirm issue resolution by checking for the presence of the DOCKER-INGRESS iptables chain:
sudo iptables --list DOCKER-INGRESS
Expected output:
Chain DOCKER-INGRESS (2 references)
target     prot opt source               destination
[...]
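The final check can also be scripted. The following is a minimal Python sketch, not part of MKE. It assumes Python 3 is installed on the node and that the account can run sudo. The function names chain_present and check_ingress_chain are hypothetical. chain_present only inspects the text that iptables prints, so it can be tried against saved output without touching the firewall.

```python
import subprocess


def chain_present(iptables_output: str, chain: str = "DOCKER-INGRESS") -> bool:
    """Return True if the listing contains the header line for `chain`."""
    for line in iptables_output.splitlines():
        parts = line.split()
        # A chain header looks like: "Chain DOCKER-INGRESS (2 references)"
        if len(parts) >= 2 and parts[0] == "Chain" and parts[1] == chain:
            return True
    return False


def check_ingress_chain() -> bool:
    """Run the listing command from the workaround and report the result.

    Assumption: sudo is usable by the current account. iptables exits
    non-zero when the requested chain does not exist.
    """
    result = subprocess.run(
        ["sudo", "iptables", "--list", "DOCKER-INGRESS"],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.returncode == 0 and chain_present(result.stdout)
```

Calling check_ingress_chain() on the affected node should return True once the chain has been restored.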
[MKE-8738] Windows Kubernetes worker nodes can fail on long-haul run
Windows Kubernetes worker nodes can fail on long-haul runs, with a DiskPressure error that is similar to the following:
time="2022-02-08T17:20:30Z" level=warning msg="error while removing container: failed to unprepare layer C:\\ProgramData\\containerd\\root\\io.containerd.snapshotter.v1.windows\\snapshots\\3707: hcsshim::UnprepareLayer - failed failed in Win32: The system could not find the instance specified. (0x801f0015): unknown"
time="2022-02-08T17:20:30Z" level=fatal msg="failed to cleanup old containers: failed to unprepare layer C:\\ProgramData\\containerd\\root\\io.containerd.snapshotter.v1.windows\\snapshots\\3707: hcsshim::UnprepareLayer - failed failed in Win32: The system could not find the instance specified. (0x801f0015): unknown"
Workaround:
Identify the stopped task:
C:\Users\Docker>ctr -n com.docker.ucp task ls
Example output:
TASK                  PID      STATUS
ucp-tigera-felix      12012    RUNNING
ucp-kube-proxy        7912     RUNNING
ucp-kubelet-health    26616    STOPPED
ucp-tigera-node       3236     RUNNING
Identify the containerd-shim process that is associated with the stopped task:
Get-CimInstance -ClassName Win32_Process -Filter "Name like 'containerd-shim%'" | select ProcessId,CommandLine | fl
Stop the containerd-shim process that is associated with the stopped task:
Stop-Process -Id <containerd-shim-pid> -Confirm -PassThru -Force
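On nodes with many tasks, the first step of this workaround can be scripted. The following is a minimal Python sketch, not an MKE tool. It assumes Python 3 is available on the Windows node and that the output of ctr -n com.docker.ucp task ls has the three whitespace-separated columns shown in the example output. The function name stopped_tasks is hypothetical.

```python
from typing import List

# Sample text copied from the example output of `ctr -n com.docker.ucp task ls`.
SAMPLE = """\
TASK                  PID      STATUS
ucp-tigera-felix      12012    RUNNING
ucp-kube-proxy        7912     RUNNING
ucp-kubelet-health    26616    STOPPED
ucp-tigera-node       3236     RUNNING
"""


def stopped_tasks(task_ls_output: str) -> List[str]:
    """Return the names of tasks whose STATUS column is STOPPED."""
    stopped = []
    for line in task_ls_output.splitlines():
        fields = line.split()
        # Skip blank or malformed lines and the header row (TASK PID STATUS).
        if len(fields) != 3 or fields[0] == "TASK":
            continue
        name, _pid, status = fields
        if status == "STOPPED":
            stopped.append(name)
    return stopped


print(stopped_tasks(SAMPLE))  # prints ['ucp-kubelet-health']
```

The sketch only reports task names. Matching a name to its containerd-shim process and stopping that process remain the manual steps described above.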