Troubleshoot MKE node states

Nodes pass through a variety of states during their lifecycle, including transitional states that occur when a node joins a cluster or is promoted or demoted. MKE reports each step of a transition as it occurs, both in the `ucp-controller` logs and in the MKE web UI.
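
If you prefer to follow transitions from the command line, the following Python is a minimal sketch that reads recent `ucp-controller` output with `docker logs` on a manager node and keeps only the lines that mention a transitional state. It assumes that the state messages from the node-state table appear verbatim in the log lines, which this document does not guarantee. The message list and the function names are hypothetical.

```python
import subprocess
from typing import Iterable, List

# Hypothetical subset of the state messages from the node-state table.
# Assumption: these strings appear verbatim in ucp-controller log lines.
TRANSITION_MESSAGES = (
    "Completing node registration",
    "Node is being reconfigured",
    "Reconfiguration pending",
    "Unable to determine node state",
)


def filter_transition_lines(lines: Iterable[str]) -> List[str]:
    """Return only the log lines that mention a known transition message."""
    wanted = [msg.lower() for msg in TRANSITION_MESSAGES]
    # Case-insensitive substring match against each known message.
    return [line for line in lines if any(msg in line.lower() for msg in wanted)]


def read_controller_logs(tail: int = 500) -> List[str]:
    """Read the last `tail` lines of ucp-controller output on this manager node.

    Requires the docker CLI and access to the local Docker daemon.
    """
    result = subprocess.run(
        ["docker", "logs", "--tail", str(tail), "ucp-controller"],
        capture_output=True,
        text=True,
        check=True,
    )
    # docker logs replays the container's stdout and stderr separately.
    return (result.stdout + result.stderr).splitlines()
```

On a manager node, `filter_transition_lines(read_controller_logs())` returns the matching lines in the order `docker logs` printed them. Keeping the filter separate from the `docker` call makes it easy to reuse on logs collected some other way.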

To view transitional node states in the MKE web UI:

1. Log in to the MKE web UI.
2. In the left-side navigation panel, click Shared Resources > Nodes. The transitional node state appears in the DETAILS column for each node.
3. Optional. Click a node to open its details. The transitional node state appears on the Overview tab under Cluster Message.

The following table lists all node states reported by MKE, with a description and the expected duration of each:

Message	Description	Expected duration
Completing node registration	The node is undergoing the registration process and does not yet appear in the KV node inventory. This is expected to occur when a node first joins the MKE swarm.	5 - 30 seconds
heartbeat failure	The node has not contacted any swarm managers in the last 10 seconds. Check the swarm state by running `docker info` on the node. A state of `inactive` means that the node was removed from the swarm with `docker swarm leave`. A state of `pending` means that dockerd has been trying to contact a manager since it started on the node; confirm that the network security policy allows TCP port 2377 from the node to the managers. A state of `error` means that an error prevented Swarm from starting on the node; check the Docker daemon logs on the node.	Until resolved
Node is being reconfigured	The `ucp-reconcile` container is converging the current state of the node to the desired state. Depending on which state the node is currently in, this process can involve issuing certificates, pulling missing images, or starting containers.	1 - 60 seconds
Reconfiguration pending	The node is expected to be a manager, but the `ucp-reconcile` container has not yet been started.	1 - 10 seconds
The `ucp-agent` task is `state`	The `ucp-agent` task on the node is not yet in a running state. This message is expected when the configuration has been updated or when a node first joins the MKE cluster. This step may take longer than expected if the MKE images need to be pulled from Docker Hub on the affected node.	1 - 10 seconds
Unable to determine node state	The `ucp-reconcile` container on the target node has just begun running and its state is not yet evident.	1 - 10 seconds
Unhealthy MKE Controller: node is unreachable	Other manager nodes in the cluster have not received a heartbeat message from the affected node within a predetermined timeout period. This usually indicates a temporary or permanent interruption in the network link to that manager node. Ensure that the underlying networking infrastructure is operational, and contact support if the symptom persists.	Until resolved
Unhealthy MKE Controller: unable to reach controller	The controller that the node is currently communicating with is not reachable within a predetermined timeout. Refresh the node listing to determine whether the symptom persists. An intermittent symptom can indicate latency spikes between manager nodes, which can make MKE temporarily unavailable. Ensure that the underlying networking infrastructure is operational, and contact support if the symptom persists.	Until resolved
Unhealthy MKE Controller: Docker Swarm Cluster: Local node <ip> has status `Pending`	The MCR Engine ID is not unique in the swarm. When a node first joins the cluster, it is added to the node inventory and discovered as `Pending` by Swarm. MCR is considered validated if a `ucp-swarm-manager` container can connect to MCR through TLS and its Engine ID is unique in the swarm. If you see this issue repeatedly, make sure that no two MCR instances in the swarm share an Engine ID. Run `docker info` on each node to view its Engine ID. To regenerate the ID, remove the `/etc/docker/key.json` file and restart the daemon.	Until resolved
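
When you poll node states, for example from a monitoring script, the Expected duration column in the table above suggests when to stop waiting and start investigating. The following Python is a minimal sketch that encodes the upper bound of each range from the table. The prefix matching, the backtick stripping, and the name `needs_attention` are assumptions made for illustration; they are not part of MKE.

```python
from typing import Dict, Optional

# Upper bound of each "Expected duration" range from the table, in seconds.
# None marks states listed as "Until resolved".
# Assumption: keys are matched as case-insensitive prefixes so that messages
# with variable parts (the ucp-agent task state, a node IP) still match.
EXPECTED_MAX_SECONDS: Dict[str, Optional[int]] = {
    "Completing node registration": 30,
    "heartbeat failure": None,
    "Node is being reconfigured": 60,
    "Reconfiguration pending": 10,
    "The ucp-agent task is": 10,
    "Unable to determine node state": 10,
    "Unhealthy MKE Controller": None,
}


def needs_attention(message: str, seconds_in_state: float) -> bool:
    """Return True if a node state should be investigated instead of waited out."""
    # Drop any backticks the message is rendered with, then compare lowercase.
    normalized = message.replace("`", "").strip().lower()
    for prefix, max_seconds in EXPECTED_MAX_SECONDS.items():
        if normalized.startswith(prefix.lower()):
            # "Until resolved" states do not clear on their own.
            return max_seconds is None or seconds_in_state > max_seconds
    # Unknown messages are flagged so that someone looks at them.
    return True
```

Flagging unknown messages by default errs on the side of investigating a state that the table does not describe.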
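
For the `heartbeat failure` state, the swarm state that the table tells you to check is exposed by `docker info` as the `Swarm.LocalNodeState` field, which you can print with `docker info --format '{{.Swarm.LocalNodeState}}'`. The following Python sketch wraps that command and maps the states named in the table to the corresponding guidance. The helper names and the hint wording are hypothetical, and the sketch assumes that the Docker CLI is available on the affected node.

```python
import subprocess

# Hypothetical hint wording for the swarm states named in the
# "heartbeat failure" row; "active" is the normal state of a swarm member.
STATE_GUIDANCE = {
    "active": "Swarm reports this node as an active member of the swarm.",
    "inactive": "The node was removed from the swarm, for example with `docker swarm leave`.",
    "pending": "dockerd is still trying to contact a manager; check that TCP port 2377 "
    "is allowed from this node to the managers.",
    "error": "An error prevented Swarm from starting; check the Docker daemon logs on this node.",
}


def read_local_node_state() -> str:
    """Return Swarm.LocalNodeState as reported by `docker info` on the local node."""
    result = subprocess.run(
        ["docker", "info", "--format", "{{.Swarm.LocalNodeState}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def explain_local_node_state(state: str) -> str:
    """Map a LocalNodeState value to the matching troubleshooting hint."""
    key = state.strip().lower()
    return STATE_GUIDANCE.get(
        key, f"State {state.strip()!r} is not covered by the table; inspect `docker info` manually."
    )
```

On the affected node, `explain_local_node_state(read_local_node_state())` returns the hint that matches the node's current swarm state.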
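
To confirm that TCP port 2377 is reachable from a node whose swarm state is `pending`, you can attempt a plain TCP connection from that node to each manager. This sketch uses only the Python standard library. The function name and the placeholder address in the usage note are hypothetical, and a successful connection shows only that the port accepts connections, not that Swarm traffic is otherwise healthy.

```python
import socket


def can_reach_manager(host: str, port: int = 2377, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # The connection is closed again as soon as it is established.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, `can_reach_manager("10.0.0.10")`, run on the affected node with a placeholder manager address, returns `False` if a firewall drops or rejects the connection within the timeout.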
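
To find duplicate Engine IDs across the swarm, collect the output of `docker info --format '{{.ID}}'` from every node and compare the values. The comparison step is sketched below in Python. The function name is hypothetical, and how you gather the IDs (SSH, configuration management, and so on) is left open.

```python
from collections import defaultdict
from typing import Dict, List


def find_duplicate_engine_ids(ids_by_node: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each Engine ID that appears on more than one node to those nodes."""
    nodes_by_id: Dict[str, List[str]] = defaultdict(list)
    for node, engine_id in ids_by_node.items():
        # Strip trailing newlines left over from captured command output.
        nodes_by_id[engine_id.strip()].append(node)
    # Keep only IDs shared by two or more nodes; sort node names for stable output.
    return {eid: sorted(nodes) for eid, nodes in nodes_by_id.items() if len(nodes) > 1}
```

For example, with made-up node names and IDs, `find_duplicate_engine_ids({"node-a": "ABCD", "node-b": "EFGH", "node-c": "ABCD"})` returns `{"ABCD": ["node-a", "node-c"]}`, which points to the nodes whose ID needs to be regenerated.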