Troubleshoot cluster configurations¶
MKE regularly monitors its internal components, attempting to resolve issues as it discovers them.
In most cases where a single MKE component remains in a persistently failed state, removing and rejoining the unhealthy node restores the cluster to a healthy state.
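As a rough sketch, assuming the unhealthy node is managed with the standard Docker Swarm CLI and is still reachable over SSH (the node IDs, token, and address below are placeholders; your MKE-specific join procedure may differ), the remove-and-rejoin cycle looks like this:

# On a healthy manager: identify the unhealthy node and demote it if it is a manager
docker node ls
docker node demote <unhealthy-node-id>

# On the unhealthy node: leave the swarm
docker swarm leave

# On a healthy manager: remove the stale node entry, then print a join token
docker node rm <unhealthy-node-id>
docker swarm join-token worker

# On the node being rejoined: join the cluster with the token printed above
docker swarm join --token <join-token> <manager-address>:2377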
MKE persists configuration data on an etcd key-value store and RethinkDB
database that are replicated on all MKE manager nodes. These data stores are
for internal use only and should not be used by other applications.
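As a quick sanity check (a sketch; the container name prefixes ucp-kv and ucp-auth-store are assumptions based on the commands later in this section), you can confirm that both stores are running on a manager node:

# Run on a manager node; lists the etcd (ucp-kv) and RethinkDB (ucp-auth-store) containers
docker ps --filter name=ucp-kv --filter name=ucp-auth-store --format 'table {{.Names}}\t{{.Status}}'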
Troubleshoot the etcd key-value store with the HTTP API¶
This example uses curl to make requests to the key-value store REST API
and jq to process the responses.
Install curl and jq on an Ubuntu distribution:

sudo apt-get update && sudo apt-get install curl jq
Use a client bundle to authenticate your requests. Download and configure the client bundle if you have not done so already.
Use the REST API to access the cluster configurations. The
$DOCKER_HOST and $DOCKER_CERT_PATH environment variables are set when using the client bundle.

export KV_URL="https://$(echo $DOCKER_HOST | cut -f3 -d/ | cut -f1 -d:):12379"

curl -s \
  --cert ${DOCKER_CERT_PATH}/cert.pem \
  --key ${DOCKER_CERT_PATH}/key.pem \
  --cacert ${DOCKER_CERT_PATH}/ca.pem \
  ${KV_URL}/v2/keys | jq "."
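The same certificates work against other endpoints served on the etcd client port. For example (a sketch using the standard etcd /health endpoint, which is part of etcd itself rather than specific to MKE), you can request a quick liveness report:

# Query the etcd health endpoint using the client-bundle certificates
curl -s \
  --cert ${DOCKER_CERT_PATH}/cert.pem \
  --key ${DOCKER_CERT_PATH}/key.pem \
  --cacert ${DOCKER_CERT_PATH}/ca.pem \
  ${KV_URL}/health | jq "."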
Troubleshoot the etcd key-value store with the CLI¶
The MKE etcd key-value store runs in containers named ucp-kv. To check the
health of the etcd cluster, run commands inside these containers using
docker exec with etcdctl.
Log in to a manager node using SSH.
Troubleshoot an etcd key-value store:
docker exec -it ucp-kv sh -c \
  'etcdctl --cluster=true endpoint health -w table 2>/dev/null'
If the command fails, the only output displayed is an error code.
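For more detail than the health table provides, you can run other etcdctl subcommands in the same container. This is a sketch that assumes the container supplies the client endpoint and certificates through its environment, as the health check above suggests:

# List the etcd cluster members known to this node
docker exec -it ucp-kv sh -c \
  'etcdctl member list -w table 2>/dev/null'

# Show per-endpoint status (leader, DB size, raft term) across the cluster
docker exec -it ucp-kv sh -c \
  'etcdctl --cluster=true endpoint status -w table 2>/dev/null'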
Troubleshoot your cluster configuration using the RethinkDB database¶
User and organization data for MKE is stored in a RethinkDB database, which is replicated across all manager nodes in the MKE cluster.
The database replication and failover is typically handled automatically by the MKE configuration management processes. However, you can use the CLI to review the status of the database and manually reconfigure database replication.
Log in to a manager node using SSH.
Produce a detailed status of all servers and database tables in the RethinkDB cluster:
NODE_ADDRESS=$(docker info --format '{{.Swarm.NodeAddr}}')
VERSION=$(docker image ls --format '{{.Tag}}' mirantis/ucp-auth | head -n 1)

docker container run --rm -v ucp-auth-store-certs:/tls \
  mirantis/ucp-auth:${VERSION} \
  --db-addr=${NODE_ADDRESS}:12383 \
  db-status
NODE_ADDRESS is the IP address of this Docker Swarm manager node.
VERSION is the most recent version of the mirantis/ucp-auth image.
Expected output:
Server Status: [
  {
    "ID": "ffa9cd5a-3370-4ccd-a21f-d7437c90e900",
    "Name": "ucp_auth_store_192_168_1_25",
    "Network": {
      "CanonicalAddresses": [
        {
          "Host": "192.168.1.25",
          "Port": 12384
        }
      ],
      "TimeConnected": "2017-07-14T17:21:44.198Z"
    }
  }
]
...
Repair the RethinkDB cluster so that its number of replicas equals the number of manager nodes in the cluster:
NODE_ADDRESS=$(docker info --format '{{.Swarm.NodeAddr}}')
NUM_MANAGERS=$(docker node ls --filter role=manager -q | wc -l)
VERSION=$(docker image ls --format '{{.Tag}}' mirantis/ucp-auth | head -n 1)

docker container run --rm -v ucp-auth-store-certs:/tls \
  mirantis/ucp-auth:${VERSION} \
  --db-addr=${NODE_ADDRESS}:12383 \
  --debug reconfigure-db --num-replicas ${NUM_MANAGERS}
NODE_ADDRESS is the IP address of this Docker Swarm manager node.
NUM_MANAGERS is the current number of manager nodes in the cluster.
VERSION is the most recent version of the mirantis/ucp-auth image.
Example output:
time="2017-07-14T20:46:09Z" level=debug msg="Connecting to db ..."
time="2017-07-14T20:46:09Z" level=debug msg="connecting to DB Addrs: [192.168.1.25:12383]"
time="2017-07-14T20:46:09Z" level=debug msg="Reconfiguring number of replicas to 1"
time="2017-07-14T20:46:09Z" level=debug msg="(00/16) Reconfiguring Table Replication..."
time="2017-07-14T20:46:09Z" level=debug msg="(01/16) Reconfigured Replication of Table \"grant_objects\""
...
Note
If the quorum in any of the RethinkDB tables is lost, run the
reconfigure-db command with the --emergency-repair flag.
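For example, reusing the variables from the reconfiguration step above (a sketch that assumes --emergency-repair is a boolean flag appended to the same reconfigure-db invocation):

# Same setup as the reconfiguration step above
NODE_ADDRESS=$(docker info --format '{{.Swarm.NodeAddr}}')
NUM_MANAGERS=$(docker node ls --filter role=manager -q | wc -l)
VERSION=$(docker image ls --format '{{.Tag}}' mirantis/ucp-auth | head -n 1)

# Reconfigure replication in emergency-repair mode (assumed flag placement)
docker container run --rm -v ucp-auth-store-certs:/tls \
  mirantis/ucp-auth:${VERSION} \
  --db-addr=${NODE_ADDRESS}:12383 \
  --debug reconfigure-db --num-replicas ${NUM_MANAGERS} --emergency-repair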