Known issues¶
MKE 3.7.16 known issues with available workaround solutions include:
[FIELD-7023] Air-gapped swarm-only upgrades fail if images are inaccessible
[MKE-11535] ucp-nvidia-gpu-feature-discovery pods may enter CrashLoopBackOff state
[MKE-11531] NodeLocal DNS Pods attempt to deploy to Windows nodes
[MKE-11525] Kubelet node profiles fail to supersede global setting
[MKE-10152] Upgrading large Windows clusters can initiate a rollback
[MKE-9699] Ingress Controller with external load balancer can enter crashloop
[MKE-8662] Swarm only manager nodes are labeled as mixed mode
[MKE-8914] Windows Server Core with Containers images incompatible with GCP
[MKE-8814] Mismatched MTU values cause Swarm overlay network issues on GCP
[FIELD-7023] Air-gapped swarm-only upgrades fail if images are inaccessible¶
In air-gapped swarm-only environments, upgrades fail to start if all of the MKE images are not preloaded on the selected manager node or if the node cannot automatically pull the required MKE images.
Workaround:
Ensure either that the manager nodes have the complete set of MKE images preloaded before performing an upgrade or that they can pull the images from a remote repository.
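One way to check for missing images before an upgrade is to compare the images MKE requires with those already on the manager node. The following is a minimal sketch. The `missing_images` helper and the file names are hypothetical, and it assumes the required list can be produced with the `images --list` command of the `mirantis/ucp` image.

```shell
# Print each required image that is absent from the local image list.
# $1: file with required image references, one per line
# $2: file with locally present image references, one per line
missing_images() {
    # -v: invert match, -x: whole line, -F: fixed strings, -f: patterns from file
    grep -vxFf "$2" "$1" || true
}

# Possible use on a manager node (assumed invocation, not run here):
#   docker container run --rm -v /var/run/docker.sock:/var/run/docker.sock \
#       mirantis/ucp:3.7.16 images --list > required.txt
#   docker image ls --format '{{.Repository}}:{{.Tag}}' > present.txt
#   missing_images required.txt present.txt
```

An empty result suggests that every listed image is already present on the node.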
[MKE-11535] ucp-nvidia-gpu-feature-discovery pods may enter CrashLoopBackOff state¶
Due to an upstream dependency issue in the gpu-feature-discovery software, customers may encounter nvidia-gpu-feature-discovery pods in CrashLoopBackOff state with the following errors:
I0726 08:42:14.338857 1 main.go:144] Start running
SIGSEGV: segmentation violation
PC=0x0 m=4 sigcode=1
signal arrived during cgo execution
goroutine 1 [syscall]:
runtime.cgocall(0x12f4f20, 0xc00025b720)
/usr/local/go/src/runtime/cgocall.go:157 +0x4b fp=0xc00025b6f8 sp=0xc00025b6c0 pc=0x409b2b
Workaround:
Disable nvidia_device_plugin. Refer to Use an MKE configuration file.
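As an illustration, the setting can be flipped in a local copy of the MKE configuration file before that file is uploaded again. This sketch assumes the file contains a line of the form `nvidia_device_plugin = true`. The `disable_nvidia_plugin` helper and the file name are hypothetical.

```shell
# Rewrite "nvidia_device_plugin = true" to "nvidia_device_plugin = false"
# in the given TOML file, leaving every other line untouched.
# $1: path to a local copy of the MKE configuration file
disable_nvidia_plugin() {
    tmp=$(mktemp) || return 1
    sed 's/^\([[:space:]]*nvidia_device_plugin[[:space:]]*=[[:space:]]*\)true/\1false/' "$1" > "$tmp" &&
        cat "$tmp" > "$1"
    rm -f "$tmp"
}

# Possible use (hypothetical file name):
#   disable_nvidia_plugin mke-config.toml
```

The change takes effect only after the edited file is applied to the cluster as described in Use an MKE configuration file.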
[MKE-11531] NodeLocal DNS Pods attempt to deploy to Windows nodes¶
The DNS caching service that NodeLocalDNS deploys to nodes as Pods is a Linux-only solution; however, it also attempts, without success, to deploy to Windows nodes.
Workaround:
Edit the node-local-dns daemonset:
kubectl edit daemonset node-local-dns -n kube-system
Add the following under spec.template.spec:
nodeSelector:
  kubernetes.io/os: linux
Save the daemonset.
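Where an interactive editor is inconvenient, the same node selector can likely be applied non-interactively with `kubectl patch`. This is a sketch rather than the documented procedure, and the two helper function names are hypothetical.

```shell
# JSON merge patch that adds the Linux node selector under spec.template.spec.
node_selector_patch() {
    printf '%s\n' '{"spec":{"template":{"spec":{"nodeSelector":{"kubernetes.io/os":"linux"}}}}}'
}

# Apply the patch to the node-local-dns daemonset in kube-system.
patch_node_local_dns() {
    kubectl patch daemonset node-local-dns -n kube-system \
        --type merge -p "$(node_selector_patch)"
}

# Possible use:
#   patch_node_local_dns
```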
[MKE-11525] Kubelet node profiles fail to supersede global setting¶
Flags specified in the global custom_kubelet_flags setting and then applied through kubelet node profiles end up being applied twice.
Workaround:
Do not define in the global custom_kubelet_flags setting any flags that will be used in kubelet node profiles.
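To spot overlaps before applying a configuration, the flag names in the two places can be compared. The sketch below assumes both sets of flags are available as space-separated strings. The `duplicate_flags` helper and the example flag values are hypothetical.

```shell
# Print the name of every flag that appears in both lists.
# $1: global flags, space separated (for example "--max-pods=110 --v=2")
# $2: node profile flags, space separated
duplicate_flags() {
    for flag in $1; do
        name=${flag%%=*}            # strip "=value" to compare names only
        for other in $2; do
            if [ "$name" = "${other%%=*}" ]; then
                printf '%s\n' "$name"
                break
            fi
        done
    done
}

# Possible use:
#   duplicate_flags "--max-pods=110 --v=2" "--v=4 --event-qps=0"
```

Any flag name that the function prints would need to be removed from the global setting.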
[MKE-10152] Upgrading large Windows clusters can initiate a rollback¶
Upgrades can roll back on clusters with a large number of Windows worker nodes.
Workaround:
Invoke the --manual-worker-upgrade option and then manually upgrade the workers.
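The manual step might be scripted as sketched below. This assumes that the --manual-worker-upgrade option places a `com.docker.ucp.upgrade-hold` label on each worker node and that removing the label releases the node for upgrade. Verify both assumptions against the MKE upgrade documentation before relying on them. The helper names are hypothetical.

```shell
# Release one worker node for upgrade by removing the hold label.
# $1: node name or ID as shown by "docker node ls"
release_worker() {
    docker node update --label-rm com.docker.ucp.upgrade-hold "$1"
}

# Release several workers one at a time, stopping at the first failure.
release_workers() {
    for node in "$@"; do
        release_worker "$node" || return 1
    done
}

# Possible use (hypothetical node names):
#   release_workers win-worker-01 win-worker-02
```

Releasing workers in small batches keeps the number of nodes upgrading at once low, which is the point of the manual approach on large clusters.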
[MKE-9699] Ingress Controller with external load balancer can enter crashloop¶
Due to upstream Kubernetes issue 73140, rapidly toggling the Ingress Controller while an external load balancer is in use can cause the resource to become stuck in a crashloop.
Workaround:
Log in to the MKE web UI as an administrator.
In the left-side navigation panel, navigate to <user name> > Admin Settings > Ingress.
Click the Kubernetes tab to display the HTTP Ingress Controller for Kubernetes pane.
Toggle the HTTP Ingress Controller for Kubernetes enabled control to the left to disable the Ingress Controller.
Use the CLI to delete the Ingress Controller resources:
kubectl delete service ingress-nginx-controller-admission --namespace ingress-nginx
kubectl delete deployment ingress-nginx-controller --namespace ingress-nginx
Verify the successful deletion of the resources:
kubectl get all --namespace ingress-nginx
Example output:
No resources found in ingress-nginx namespace.
Return to the HTTP Ingress Controller for Kubernetes pane in the MKE web UI and change the nodeport numbers for HTTP Port, HTTPS Port and TCP Port.
Toggle the HTTP Ingress Controller for Kubernetes enabled control to the right to re-enable the Ingress Controller.
[MKE-8662] Swarm only manager nodes are labeled as mixed mode¶
When MKE is installed in swarm-only mode, manager nodes start off labeled as mixed mode. Because the Kubernetes installation is skipped altogether, however, they should be labeled as swarm mode.
Workaround:
Change the labels following installation.
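A relabeling command might look like the sketch below. It assumes that the orchestrator labels are `com.docker.ucp.orchestrator.swarm` and `com.docker.ucp.orchestrator.kubernetes` and that a mixed-mode node carries both. The `label_swarm_only` helper is hypothetical.

```shell
# Label a manager node as swarm-only.
# $1: node name or ID as shown by "docker node ls"
label_swarm_only() {
    docker node update \
        --label-add com.docker.ucp.orchestrator.swarm=true \
        --label-rm com.docker.ucp.orchestrator.kubernetes \
        "$1"
}

# Possible use (hypothetical node name):
#   label_swarm_only manager-01
```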
[MKE-8914] Windows Server Core with Containers images incompatible with GCP¶
The use of Windows Server Core with Containers images prevents kubelet from starting up, as these images are not compatible with GCP.
As a workaround, use Windows Server or Windows Server Core images.
[MKE-8814] Mismatched MTU values cause Swarm overlay network issues on GCP¶
Communication between GCP VPCs and Docker networks that use Swarm overlay networks will fail if their MTU values are not manually aligned. By default, the MTU value for GCP VPCs is 1460, while the default MTU value for Docker networks is 1500.
Workaround:
Select from the following options:
Create a new VPC and set the MTU value to 1500.
Set the MTU value of the existing VPC to 1500.
For more information, refer to the Google Cloud Platform documentation, Change the MTU setting of a VPC network.
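Both options can be carried out with the gcloud CLI, as in the following sketch. The helper names and the network name are hypothetical, and the custom subnet mode used for the new network is an assumption that may not suit every deployment.

```shell
# Create a new custom-mode VPC network with an MTU of 1500.
# $1: VPC network name
create_vpc_mtu_1500() {
    gcloud compute networks create "$1" --subnet-mode=custom --mtu=1500
}

# Set the MTU of an existing VPC network to 1500.
# $1: VPC network name
set_vpc_mtu_1500() {
    gcloud compute networks update "$1" --mtu=1500
}

# Possible use (hypothetical network name):
#   set_vpc_mtu_1500 mke-network
```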