Troubleshoot cluster configurations
MKE regularly monitors its internal components, attempting to resolve issues as it discovers them.
In most cases where a single MKE component remains in a persistently failed state, removing and rejoining the unhealthy node restores the cluster to a healthy state.
MKE persists configuration data on an etcd key-value store and a RethinkDB database that are replicated on all MKE manager nodes. These data stores are for internal use only and should not be used by other applications.
Troubleshoot the etcd key-value store with the HTTP API
This example uses curl to make requests to the key-value store REST API and jq to process the responses.
Install curl and jq on an Ubuntu distribution:
    sudo apt-get update && sudo apt-get install curl jq
Use a client bundle to authenticate your requests. Download and configure the client bundle if you have not done so already.
Use the REST API to access the cluster configurations. The $DOCKER_HOST and $DOCKER_CERT_PATH environment variables are set when using the client bundle.
    export KV_URL="https://$(echo $DOCKER_HOST | cut -f3 -d/ | cut -f1 -d:):12379"
    curl -s \
      --cert ${DOCKER_CERT_PATH}/cert.pem \
      --key ${DOCKER_CERT_PATH}/key.pem \
      --cacert ${DOCKER_CERT_PATH}/ca.pem \
      ${KV_URL}/v2/keys | jq "."
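The request above can be wrapped in two small shell functions so that individual paths of the key-value store are easier to query. This is a minimal sketch, not part of MKE: the function names kv_url_from_docker_host and kv_get are hypothetical, and the sketch assumes that DOCKER_HOST has the form tcp://HOST:PORT, as set by the client bundle.

```shell
# Derive the etcd endpoint URL from a Docker host URL such as tcp://192.168.1.25:443.
# kv_url_from_docker_host is a hypothetical helper name, not an MKE command.
kv_url_from_docker_host() {
  # Field 3 of "tcp://host:port" split on "/" is "host:port"; keep only the host.
  host=$(echo "$1" | cut -f3 -d/ | cut -f1 -d:)
  echo "https://${host}:12379"
}

# Query one path under /v2/keys with the client bundle certificates.
# Assumes DOCKER_HOST and DOCKER_CERT_PATH are set by the client bundle.
kv_get() {
  curl -s \
    --cert "${DOCKER_CERT_PATH}/cert.pem" \
    --key "${DOCKER_CERT_PATH}/key.pem" \
    --cacert "${DOCKER_CERT_PATH}/ca.pem" \
    "$(kv_url_from_docker_host "${DOCKER_HOST}")/v2/keys${1:-}"
}
```

With these defined, kv_get | jq "." issues the same request as the command above, and kv_get followed by a key path requests that path instead.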
Troubleshoot the etcd key-value store with the CLI
The MKE etcd key-value store runs in containers named ucp-kv. To check the health of the etcd cluster, run etcdctl inside these containers using docker exec.
Log in to a manager node using SSH.
Troubleshoot an etcd key-value store:
    docker exec -it ucp-kv sh -c \
      'etcdctl --cluster=true endpoint health -w table 2>/dev/null'
If the command fails, an error code is the only output that displays.
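Because a failure produces only an exit code, a script that runs this check may want to report that code explicitly. The following is a minimal sketch under that assumption. The function name check_etcd_health is hypothetical, and the -it options are dropped because a script does not normally have an interactive terminal.

```shell
# Run the etcd health check inside the ucp-kv container and report the result.
# check_etcd_health is a hypothetical helper name, not an MKE command.
check_etcd_health() {
  if docker exec ucp-kv sh -c \
      'etcdctl --cluster=true endpoint health -w table 2>/dev/null'; then
    echo "etcd: health check succeeded"
  else
    # In the else branch, $? still holds the exit status of docker exec.
    status=$?
    echo "etcd: health check failed with exit code ${status}" >&2
    return "${status}"
  fi
}
```

Run check_etcd_health on a manager node. A nonzero return value indicates that the health check did not succeed.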
Troubleshoot your cluster configuration using the RethinkDB database
User and organization data for MKE is stored in a RethinkDB database, which is replicated across all manager nodes in the MKE cluster.
Database replication and failover are typically handled automatically by the MKE configuration management processes. However, you can use the CLI to review the status of the database and manually reconfigure database replication.
Log in to a manager node using SSH.
Produce a detailed status of all servers and database tables in the RethinkDB cluster:
    NODE_ADDRESS=$(docker info --format '{{.Swarm.NodeAddr}}')
    VERSION=$(docker image ls --format '{{.Tag}}' mirantis/ucp-auth | head -n 1)
    docker container run --rm -v ucp-auth-store-certs:/tls \
      mirantis/ucp-auth:${VERSION} \
      --db-addr=${NODE_ADDRESS}:12383 db-status
NODE_ADDRESS is the IP address of this Docker Swarm manager node.
VERSION is the most recent version of the mirantis/ucp-auth image.
Expected output:
    Server Status: [
      {
        "ID": "ffa9cd5a-3370-4ccd-a21f-d7437c90e900",
        "Name": "ucp_auth_store_192_168_1_25",
        "Network": {
          "CanonicalAddresses": [
            {
              "Host": "192.168.1.25",
              "Port": 12384
            }
          ],
          "TimeConnected": "2017-07-14T17:21:44.198Z"
        }
      }
    ]
    ...
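In the sample output, the server name appears to be the prefix ucp_auth_store_ followed by the node IP address with dots replaced by underscores. Assuming that pattern holds (it is inferred from this single example and is not documented here), a small helper can build the expected name for a node so that you can search the status output for it. The function name auth_store_name_for is hypothetical.

```shell
# Build the RethinkDB server name expected for a given manager node IP address.
# Assumption: names follow the pattern seen in the sample output,
# "ucp_auth_store_" followed by the IP address with dots replaced by underscores.
auth_store_name_for() {
  echo "ucp_auth_store_$(echo "$1" | tr '.' '_')"
}
```

For example, piping the db-status output through grep "$(auth_store_name_for "${NODE_ADDRESS}")" shows whether this node appears in the server list.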
Repair the RethinkDB cluster so that its number of replicas equals the number of manager nodes in the cluster:
    NODE_ADDRESS=$(docker info --format '{{.Swarm.NodeAddr}}')
    NUM_MANAGERS=$(docker node ls --filter role=manager -q | wc -l)
    VERSION=$(docker image ls --format '{{.Tag}}' mirantis/ucp-auth | head -n 1)
    docker container run --rm -v ucp-auth-store-certs:/tls \
      mirantis/ucp-auth:${VERSION} \
      --db-addr=${NODE_ADDRESS}:12383 \
      --debug reconfigure-db --num-replicas ${NUM_MANAGERS}
NODE_ADDRESS is the IP address of this Docker Swarm manager node.
NUM_MANAGERS is the current number of manager nodes in the cluster.
VERSION is the most recent version of the mirantis/ucp-auth image.
Example output:
    time="2017-07-14T20:46:09Z" level=debug msg="Connecting to db ..."
    time="2017-07-14T20:46:09Z" level=debug msg="connecting to DB Addrs: [192.168.1.25:12383]"
    time="2017-07-14T20:46:09Z" level=debug msg="Reconfiguring number of replicas to 1"
    time="2017-07-14T20:46:09Z" level=debug msg="(00/16) Reconfiguring Table Replication..."
    time="2017-07-14T20:46:09Z" level=debug msg="(01/16) Reconfigured Replication of Table \"grant_objects\""
    ...
Note
If the quorum in any of the RethinkDB tables is lost, run the reconfigure-db command with the --emergency-repair flag.
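As a sketch of how the normal and emergency variants of the repair command might be assembled, the helper below validates the replica count and prints the reconfigure-db arguments. The function name reconfigure_db_args is hypothetical. Placing --emergency-repair after --num-replicas is an assumption, since the note above states only that the flag is passed to reconfigure-db.

```shell
# Print the arguments for the reconfigure-db command.
# $1: number of replicas (positive integer), normally the number of manager nodes.
# $2: optional mode; "emergency" appends --emergency-repair.
# reconfigure_db_args is a hypothetical helper name, and the position of
# --emergency-repair after --num-replicas is an assumption.
reconfigure_db_args() {
  num_replicas=$1
  mode=${2:-normal}
  case "${num_replicas}" in
    ''|*[!0-9]*|0)
      echo "num_replicas must be a positive integer" >&2
      return 1
      ;;
  esac
  args="--debug reconfigure-db --num-replicas ${num_replicas}"
  if [ "${mode}" = "emergency" ]; then
    args="${args} --emergency-repair"
  fi
  echo "${args}"
}
```

The printed arguments are meant to follow --db-addr=${NODE_ADDRESS}:12383 in the docker container run command shown earlier, for example by appending an unquoted $(reconfigure_db_args "${NUM_MANAGERS}" emergency).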