Troubleshoot MSR¶
This guide contains tips and tricks for troubleshooting MSR problems.
Troubleshoot overlay networks¶
High availability in MSR depends on swarm overlay networking. One way to test if overlay networks are working correctly is to deploy containers to the same overlay network on different nodes and see if they can ping one another.
Use SSH to log into a node and run:
docker run -it --rm \
--net dtr-ol --name overlay-test1 \
--entrypoint sh mirantis/dtr
Then use SSH to log into another node and run:
docker run -it --rm \
--net dtr-ol --name overlay-test2 \
--entrypoint ping mirantis/dtr -c 3 overlay-test1
If the second command succeeds, it indicates overlay networking is working correctly between those nodes.
You can run this test with any attachable overlay network and any Docker
image that has sh
and ping
.
Access RethinkDB directly¶
MSR uses RethinkDB for persisting data and replicating it across replicas. It might be helpful to connect directly to the RethinkDB instance running on an MSR replica to check the MSR internal state.
Warning
Modifying RethinkDB directly is not supported and may cause problems.
via RethinkCLI¶
The RethinkCLI can be run from a separate
image in the mirantis
organization. Note that the
commands below are using separate tags for non-interactive and
interactive modes.
Non-interactive¶
Use SSH to log into a node that is running an MSR replica, and run the following:
# List problems in the cluster detected by the current node.
REPLICA_ID=$(docker container ls --filter=name=dtr-rethink --format '{{.Names}}' | cut -d'/' -f2 | cut -d'-' -f3 | head -n 1) && echo 'r.db("rethinkdb").table("current_issues")' | docker run --rm -i --net dtr-ol -v "dtr-ca-${REPLICA_ID}:/ca" -e DTR_REPLICA_ID=$REPLICA_ID mirantis/rethinkcli:v2.2.0-ni non-interactive
On a healthy cluster the output will be []
.
Interactive¶
Starting in DTR 2.5.5, you can run RethinkCLI from a separate image. First, set an environment variable for your MSR replica ID:
REPLICA_ID=$(docker inspect -f '{{.Name}}' $(docker ps -q -f name=dtr-rethink) | cut -f 3 -d '-')
RethinkDB stores data in different databases that contain multiple tables. Run the following command to get into interactive mode and query the contents of the DB:
docker run -it --rm --net dtr-ol -v dtr-ca-$REPLICA_ID:/ca mirantis/rethinkcli:v2.3.0 $REPLICA_ID
# List problems in the cluster detected by the current node.
> r.db("rethinkdb").table("current_issues")
[]
# List all the DBs in RethinkDB
> r.dbList()
[ 'dtr2',
'jobrunner',
'notaryserver',
'notarysigner',
'rethinkdb' ]
# List the tables in the dtr2 db
> r.db('dtr2').tableList()
[ 'blob_links',
'blobs',
'client_tokens',
'content_caches',
'events',
'layer_vuln_overrides',
'manifests',
'metrics',
'namespace_team_access',
'poll_mirroring_policies',
'promotion_policies',
'properties',
'pruning_policies',
'push_mirroring_policies',
'repositories',
'repository_team_access',
'scanned_images',
'scanned_layers',
'tags',
'user_settings',
'webhooks' ]
# List the entries in the repositories table
> r.db('dtr2').table('repositories')
[ { enableManifestLists: false,
id: 'ac9614a8-36f4-4933-91fa-3ffed2bd259b',
immutableTags: false,
name: 'test-repo-1',
namespaceAccountID: 'fc3b4aec-74a3-4ba2-8e62-daed0d1f7481',
namespaceName: 'admin',
pk: '3a4a79476d76698255ab505fb77c043655c599d1f5b985f859958ab72a4099d6',
pulls: 0,
pushes: 0,
scanOnPush: false,
tagLimit: 0,
visibility: 'public' },
{ enableManifestLists: false,
id: '9f43f029-9683-459f-97d9-665ab3ac1fda',
immutableTags: false,
longDescription: '',
name: 'testing',
namespaceAccountID: 'fc3b4aec-74a3-4ba2-8e62-daed0d1f7481',
namespaceName: 'admin',
pk: '6dd09ac485749619becaff1c17702ada23568ebe0a40bb74a330d058a757e0be',
pulls: 0,
pushes: 0,
scanOnPush: false,
shortDescription: '',
tagLimit: 1,
visibility: 'public' } ]
Individual DBs and tables are a private implementation detail and may
change in MSR from version to version, but you can always use
dbList()
and tableList()
to explore the contents and data
structure.
via API¶
To check on the overall status of your MSR cluster without interacting with RethinkCLI, run the following API request:
curl -u admin:$TOKEN -X GET "https://<msr-url>/api/v0/meta/cluster_status" -H "accept: application/json"
Example API Response¶
{
"rethink_system_tables": {
"cluster_config": [
{
"heartbeat_timeout_secs": 10,
"id": "heartbeat"
}
],
"current_issues": [],
"db_config": [
{
"id": "339de11f-b0c2-4112-83ac-520cab68d89c",
"name": "notaryserver"
},
{
"id": "aa2e893f-a69a-463d-88c1-8102aafebebc",
"name": "dtr2"
},
{
"id": "bdf14a41-9c31-4526-8436-ab0fed00c2fd",
"name": "jobrunner"
},
{
"id": "f94f0e35-b7b1-4a2f-82be-1bdacca75039",
"name": "notarysigner"
}
],
"server_status": [
{
"id": "9c41fbc6-bcf2-4fad-8960-d117f2fdb06a",
"name": "dtr_rethinkdb_5eb9459a7832",
"network": {
"canonical_addresses": [
{
"host": "dtr-rethinkdb-5eb9459a7832.dtr-ol",
"port": 29015
}
],
"cluster_port": 29015,
"connected_to": {
"dtr_rethinkdb_56b65e8c1404": true
},
"hostname": "9e83e4fee173",
"http_admin_port": "<no http admin>",
"reql_port": 28015,
"time_connected": "2019-02-15T00:19:22.035Z"
},
}
...
]
}
}
Recover from an unhealthy replica¶
When an MSR replica is unhealthy or down, the MSR web UI displays a warning:
Warning: The following replicas are unhealthy: 59e4e9b0a254; Reasons: Replica reported health too long ago: 2017-02-18T01:11:20Z; Replicas 000000000000, 563f02aba617 are still healthy.
To fix this, you should remove the unhealthy replica from the MSR cluster, and join a new one. Start by running:
docker run -it --rm \
mirantis/dtr:2.9.16 remove \
--ucp-insecure-tls
And then:
docker run -it --rm \
mirantis/dtr:2.9.16 join \
--ucp-node <mke-node-name> \
--ucp-insecure-tls
Vulnerability scan warnings¶
Warnings display in a red banner at the top of the MSR web UI to indicate potential vulnerability scanning issues.
Warning |
Cause |
---|---|
Warning: Cannot perform security scans because no vulnerability database was found. |
Displays when vulnerability scanning is enabled but there is no vulnerability database available to MSR. Typically, the warning displays when a vulnerability database update is run for the first time and the operation fails, as no usable vulnerability database exists at this point. |
Warning: Last vulnerability database sync failed. |
Displays when a vulnerability database update fails, even though there is a previous usable vulnerability database available for vulnerability scans. The warning typically displays when a vulnerability database update fails, despite successful completion of a prior vulnerability database update. |
Note
The terms vulnerability database sync and vulnerability database update are interchangeable, in the context of MSR web UI warnings.
Note
The issuing of warnings is the same regardless of whether vulnerability database updating is done manually or is performed automatically through a job.
MSR undergoes a number of steps in performing a vulnerability database update, including TAR file download and extraction, file validation, and the update operation itself. Errors that can trigger warnings can occur at any point in the update process. These errors can include such system-related matters as low disk space, issues with the transient network, or configuration complications. As such, the best strategy for troubleshooting MSR vulnerability scanning issues is to review the logs.
To view the logs for an online vulnerability database update:
Online vulnerability database updates are performed by a jobrunner container, the logs for which you can view through a docker CLI command or by using the MSR web UI:
CLI command:
docker logs <jobrunner-container-name>
MSR web UI:
Navigate to System > Job Logs in the left-side navigation panel.
To view the logs for an offline vulnerability database update:
The MSR vulnerability database update occurs through the dtr-api
container.
As such, access the logs for that container to ascertain the reason for update
failure.
To obtain more log information:
If the logs do not initially offer enough detail on the cause of vulnerability database update failure, set MSR to enable debug logging, which will display additional debug logs.
Refer to the reconfigure CLI command documentation for information on how to enable debug logging. For example:
docker run -it --rm mirantis/dtr:<version-number> reconfigure --ucp-url
$MKE_URL --ucp-username $USER --ucp-password $PASSWORD --ucp-insecure-tls
--dtr-external-url $MSR_URL --log-level debug
Certificate issues when pushing and pulling images¶
If TLS is not properly configured, you are likely to encounter an
x509: certificate signed by unknown authority
error when attempting to run
the following commands:
docker login
docker push
docker pull
To resolve the issue:
Verify that your MSR instance has been configured with your TLS certificate Fully Qualified Domain Name (FQDN). For more information, refer to Add a custom TLS certificate.
Alternatively, but only in testing scenarios, you can skip using a certificate
by adding your registry host name as an insecure registry in the Docker
daemon.json
file:
{
"insecure-registries" : [ "registry-host-name" ]
}