Nodes in Not Ready state on long-running clusters


This issue affects MKE 3.3.x and is addressed in MKE 3.4.x.

On long-running Container Cloud clusters, one or more nodes may occasionally enter the Not Ready state, with various errors reported by the ucp-kubelet container on each affected node.
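To find the affected nodes, you can filter the output of `kubectl get nodes --no-headers`, where the second column is the node status. The helper below is a minimal sketch that assumes kubectl is configured for the affected cluster. The function name `not_ready_nodes` is hypothetical, not part of MKE or Container Cloud. It reads the listing on stdin and prints the name of every node whose status starts with `NotReady`, including cordoned nodes shown as `NotReady,SchedulingDisabled`.

```shell
#!/bin/sh
# Hypothetical helper: reads `kubectl get nodes --no-headers` output on stdin
# and prints the names of nodes whose STATUS column begins with "NotReady".
not_ready_nodes() {
  # Column 1 is NAME, column 2 is STATUS (for example "Ready" or "NotReady").
  awk '$2 ~ /^NotReady/ { print $1 }'
}
```

Example usage: `kubectl get nodes --no-headers | not_ready_nodes`. On each node that this lists, `docker logs ucp-kubelet` shows the errors reported by the container.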

To resolve the issue, restart ucp-kubelet on the failed node by running the following commands on that node:

ctr -n com.docker.ucp snapshot rm ucp-kubelet
docker rm -f ucp-kubelet