Troubleshoot MKE node states

Nodes pass through a variety of states over the course of their lifecycle, including transitional states that occur when a node joins a cluster or is promoted or demoted. MKE reports each step of the transition process as it occurs, both in the ucp-controller logs and in the MKE web UI.

To view transitional node states in the MKE web UI:

  1. Log in to the MKE web UI.

  2. In the left-side navigation panel, navigate to Shared Resources > Nodes. The transitional node state displays in the DETAILS column for each node.

  3. Optional. Click the required node. The transitional node state displays in the Overview tab under Cluster Message.
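
You can also list node states from the command line. The following is a minimal sketch that assumes you have downloaded an MKE client bundle for an account with the appropriate permissions and extracted it to the current directory:

    # Point the Docker CLI at the MKE cluster by sourcing the client bundle.
    eval "$(<env.sh)"

    # List all nodes along with their status, availability, and manager role.
    docker node ls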

The following list presents each node state as MKE reports it, along with a description and its expected duration:

Completing node registration
  The node is undergoing the registration process and does not yet appear in the KV node inventory. This is expected to occur when a node first joins the MKE swarm.
  Expected duration: 5 - 30 seconds

heartbeat failure
  The node has not contacted any swarm managers in the last 10 seconds. Verify the swarm state by running docker info on the node (a command sketch follows this entry):
    • inactive indicates that the node has been removed from the swarm with docker swarm leave.
    • pending indicates that dockerd has been attempting to contact a manager since dockerd started on the node. Confirm that the network security policy allows TCP port 2377 from the node to the managers.
    • error indicates that an error prevented Swarm from starting on the node. Check the Docker daemon logs on the node.
  Expected duration: Until resolved
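
The following sketch illustrates the checks described above, run on the affected node; <manager-ip> is a placeholder for the address of one of your manager nodes:

    # Report the local swarm state: active, inactive, pending, or error.
    docker info --format '{{.Swarm.LocalNodeState}}'

    # If the state is pending, confirm that TCP port 2377 is reachable
    # from this node to a manager.
    nc -zv <manager-ip> 2377

    # If the state is error, check the daemon logs (systemd-based hosts).
    journalctl -u docker --no-pager | tail -n 50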

Node is being reconfigured
  The ucp-reconcile container is converging the current state of the node to the desired state. Depending on which state the node is currently in, this process can involve issuing certificates, pulling missing images, or starting containers.
  Expected duration: 1 - 60 seconds
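
To watch the convergence progress, one option is to read the logs of the ucp-reconcile container on the node. A sketch, noting that the container is transient and may already have exited:

    # Locate the reconcile container, including exited instances.
    docker ps -a --filter name=ucp-reconcile

    # Review its logs for convergence steps such as certificate issuance,
    # image pulls, and container starts.
    docker logs ucp-reconcile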

Reconfiguration pending
  The node is expected to be a manager, but the ucp-reconcile container has not yet started.
  Expected duration: 1 - 10 seconds

The ucp-agent task is <state>
  The ucp-agent task on the node is not yet in a running state. This message is expected when the configuration has been updated or when a node first joins the MKE cluster. This step may take longer than expected if the MKE images need to be pulled from Docker Hub on the affected node.
  Expected duration: 1 - 10 seconds
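
Because ucp-agent runs as a swarm service, you can check the task state directly from a manager node. A sketch, assuming the default service name:

    # Show the ucp-agent task on each node and its current state.
    docker service ps ucp-agent --no-trunc

    # If a task is stuck, recent service logs often show image pull
    # progress or errors.
    docker service logs --tail 50 ucp-agent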

Unable to determine node state
  The ucp-reconcile container on the target node has just begun running, and its state is not yet evident.
  Expected duration: 1 - 10 seconds

Unhealthy MKE Controller: node is unreachable
  Other manager nodes in the cluster have not received a heartbeat message from the affected node within a predetermined timeout period. This usually indicates either a temporary or a permanent interruption in the network link to that manager node. Verify that the underlying networking infrastructure is operational, and contact support if the symptom persists.
  Expected duration: Until resolved

Unhealthy MKE Controller: unable to reach controller
  The controller that the node is currently communicating with is not reachable within a predetermined timeout. Refresh the node listing to determine whether the symptom persists. If the symptom appears intermittently, it can indicate latency spikes between manager nodes, which can lead to a temporary loss of MKE availability. Verify that the underlying networking infrastructure is operational, and contact support if the symptom persists.
  Expected duration: Until resolved
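
For either of the two preceding messages, a basic connectivity sketch between manager nodes can help isolate the problem; <peer-manager-ip> is a placeholder:

    # Check reachability and latency to a peer manager.
    ping -c 5 <peer-manager-ip>

    # Confirm that the swarm management port is open between managers.
    nc -zv <peer-manager-ip> 2377

    # Confirm that the MKE controller responds on its HTTPS port
    # (443 by default).
    curl -k https://<peer-manager-ip>:443/_ping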

Unhealthy MKE Controller: Docker Swarm Cluster: Local node <ip> has status Pending
  The MCR Engine ID is not unique in the swarm. When a node first joins the cluster, it is added to the node inventory and discovered as Pending by Swarm. MCR is considered validated if a ucp-swarm-manager container can connect to it through TLS and its Engine ID is unique in the swarm. If you see this issue repeatedly, verify that MCR does not have duplicate IDs: use docker info to view the Engine ID, and to refresh the ID, remove the /etc/docker/key.json file and restart the daemon, as shown in the sketch after this entry.
  Expected duration: Until resolved
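
A sketch of the ID check and reset described above, run on each node you suspect of sharing an Engine ID:

    # Print this node's Engine ID; it must be unique across the swarm.
    docker info --format '{{.ID}}'

    # If two nodes report the same ID, remove the key file on one of them
    # and restart the daemon to generate a new ID.
    sudo rm /etc/docker/key.json
    sudo systemctl restart docker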