Troubleshoot MSR

This guide contains tips and tricks for troubleshooting MSR problems.

Troubleshoot overlay networks

High availability in MSR depends on swarm overlay networking. One way to test if overlay networks are working correctly is to deploy containers to the same overlay network on different nodes and see if they can ping one another.

Use SSH to log into a node and run:

docker run -it --rm \
  --net dtr-ol --name overlay-test1 \
  --entrypoint sh mirantis/dtr

Then use SSH to log into another node and run:

docker run -it --rm \
  --net dtr-ol --name overlay-test2 \
  --entrypoint ping mirantis/dtr -c 3 overlay-test1

If the second command succeeds, it indicates overlay networking is working correctly between those nodes.

You can run this test with any attachable overlay network and any Docker image that has sh and ping.

Access RethinkDB directly

MSR uses RethinkDB for persisting data and replicating it across replicas. It might be helpful to connect directly to the RethinkDB instance running on a MSR replica to check the MSR internal state.

Warning

Modifying RethinkDB directly is not supported and may cause problems.

via RethinkCLI

The RethinkCLI can be run from a separate image in the mirantis organization. Note that the commands below are using separate tags for non-interactive and interactive modes.

Non-interactive

Use SSH to log into a node that is running a MSR replica, and run the following:

# List problems in the cluster detected by the current node.
REPLICA_ID=$(docker container ls --filter=name=dtr-rethink --format '{{.Names}}' | cut -d'/' -f2 | cut -d'-' -f3 | head -n 1) && echo 'r.db("rethinkdb").table("current_issues")' | docker run --rm -i --net dtr-ol -v "dtr-ca-${REPLICA_ID}:/ca" -e DTR_REPLICA_ID=$REPLICA_ID mirantis/rethinkcli:v2.2.0-ni non-interactive

On a healthy cluster the output will be [].

Interactive

Starting in DTR 2.5.5, you can run RethinkCLI from a separate image. First, set an environment variable for your MSR replica ID:

REPLICA_ID=$(docker inspect -f '{{.Name}}' $(docker ps -q -f name=dtr-rethink) | cut -f 3 -d '-')

RethinkDB stores data in different databases that contain multiple tables. Run the following command to get into interactive mode and query the contents of the DB:

docker run -it --rm --net dtr-ol -v dtr-ca-$REPLICA_ID:/ca mirantis/rethinkcli:v2.3.0 $REPLICA_ID
# List problems in the cluster detected by the current node.
> r.db("rethinkdb").table("current_issues")
[]

# List all the DBs in RethinkDB
> r.dbList()
[ 'dtr2',
  'jobrunner',
  'notaryserver',
  'notarysigner',
  'rethinkdb' ]

# List the tables in the dtr2 db
> r.db('dtr2').tableList()
[ 'blob_links',
  'blobs',
  'client_tokens',
  'content_caches',
  'events',
  'layer_vuln_overrides',
  'manifests',
  'metrics',
  'namespace_team_access',
  'poll_mirroring_policies',
  'promotion_policies',
  'properties',
  'pruning_policies',
  'push_mirroring_policies',
  'repositories',
  'repository_team_access',
  'scanned_images',
  'scanned_layers',
  'tags',
  'user_settings',
  'webhooks' ]

# List the entries in the repositories table
> r.db('dtr2').table('repositories')
[ { enableManifestLists: false,
    id: 'ac9614a8-36f4-4933-91fa-3ffed2bd259b',
    immutableTags: false,
    name: 'test-repo-1',
    namespaceAccountID: 'fc3b4aec-74a3-4ba2-8e62-daed0d1f7481',
    namespaceName: 'admin',
    pk: '3a4a79476d76698255ab505fb77c043655c599d1f5b985f859958ab72a4099d6',
    pulls: 0,
    pushes: 0,
    scanOnPush: false,
    tagLimit: 0,
    visibility: 'public' },
  { enableManifestLists: false,
    id: '9f43f029-9683-459f-97d9-665ab3ac1fda',
    immutableTags: false,
    longDescription: '',
    name: 'testing',
    namespaceAccountID: 'fc3b4aec-74a3-4ba2-8e62-daed0d1f7481',
    namespaceName: 'admin',
    pk: '6dd09ac485749619becaff1c17702ada23568ebe0a40bb74a330d058a757e0be',
    pulls: 0,
    pushes: 0,
    scanOnPush: false,
    shortDescription: '',
    tagLimit: 1,
    visibility: 'public' } ]

Individual DBs and tables are a private implementation detail and may change in MSR from version to version, but you can always use dbList() and tableList() to explore the contents and data structure.

Learn more about RethinkDB queries.

via API

To check on the overall status of your MSR cluster without interacting with RethinkCLI, run the following API request:

curl -u admin:$TOKEN -X GET "https://<msr-url>/api/v0/meta/cluster_status" -H "accept: application/json"

Example API Response

{
  "rethink_system_tables": {
    "cluster_config": [
      {
        "heartbeat_timeout_secs": 10,
        "id": "heartbeat"
      }
    ],
    "current_issues": [],
    "db_config": [
      {
        "id": "339de11f-b0c2-4112-83ac-520cab68d89c",
        "name": "notaryserver"
      },
      {
        "id": "aa2e893f-a69a-463d-88c1-8102aafebebc",
        "name": "dtr2"
      },
      {
        "id": "bdf14a41-9c31-4526-8436-ab0fed00c2fd",
        "name": "jobrunner"
      },
      {
        "id": "f94f0e35-b7b1-4a2f-82be-1bdacca75039",
        "name": "notarysigner"
      }
    ],
    "server_status": [
      {
        "id": "9c41fbc6-bcf2-4fad-8960-d117f2fdb06a",
        "name": "dtr_rethinkdb_5eb9459a7832",
        "network": {
          "canonical_addresses": [
            {
              "host": "dtr-rethinkdb-5eb9459a7832.dtr-ol",
              "port": 29015
            }
          ],
          "cluster_port": 29015,
          "connected_to": {
            "dtr_rethinkdb_56b65e8c1404": true
          },
          "hostname": "9e83e4fee173",
          "http_admin_port": "<no http admin>",
          "reql_port": 28015,
          "time_connected": "2019-02-15T00:19:22.035Z"
        },
       }
     ...
    ]
  }
}

Recover from an unhealthy replica

When a MSR replica is unhealthy or down, the MSR web UI displays a warning:

Warning: The following replicas are unhealthy: 59e4e9b0a254; Reasons: Replica reported health too long ago: 2017-02-18T01:11:20Z; Replicas 000000000000, 563f02aba617 are still healthy.

To fix this, you should remove the unhealthy replica from the MSR cluster, and join a new one. Start by running:

docker run -it --rm \
  mirantis/dtr:2.9.16 remove \
  --ucp-insecure-tls

And then:

docker run -it --rm \
  mirantis/dtr:2.9.16 join \
  --ucp-node <mke-node-name> \
  --ucp-insecure-tls

Vulnerability scan warnings

Warnings display in a red banner at the top of the MSR web UI to indicate potential vulnerability scanning issues.

Warning

Cause

Warning: Cannot perform security scans because no vulnerability database was found.

Displays when vulnerabilty scanning is enabled but there is no vulnerability database available to MSR. Typically, the warning displays when a vulnerability database update is run for the first time and the operation fails, as no usable vulnerability database exists at this point.

Warning: Last vulnerability database sync failed.

Displays when a vulnerability database update fails, even though there is a previous usable vulnerabilty database available for vulnerability scans. The warning typically displays when a vulnerability database update fails, despite successful completion of a prior vulnerability database update.

Note

The terms vulnerability database sync and vulnerability database update are interchangeable, in the context of MSR web UI warnings.

Note

The issuing of warnings is the same regardless of whether vulnerability database updating is done manually or is performed automatically through a job.

MSR undergoes a number of steps in performing a vulnerability database update, including TAR file download and extraction, file validation, and the update operation itself. Errors that can trigger warnings can occur at any point in the update process. These errors can include such system-related matters as low disk space, issues with the transient network, or configuration complications. As such, the best strategy for troubleshooting MSR vulnerability scanning issues is to review the logs.


To view the logs for an online vulnerability database update:

Online vulnerability database updates are performed by a jobrunner container, the logs for which you can view through a docker CLI command or by using the MSR web UI:

  • CLI command:

    docker logs <jobrunner-container-name>
    
  • MSR web UI:

    Navigate to System > Job Logs in the left-side navigation panel.


To view the logs for an offline vulnerability database update:

The MSR vulnerability database update occurs through the dtr-api container. As such, access the logs for that container to ascertain the reason for update failure.


To obtain more log information:

If the logs do not initially offer enough detail on the cause of vulnerability database update failure, set MSR to enable debug logging, which will display additional debug logs.

Refer to the reconfigure CLI command documentation for information on how to enable debug logging. For example:

docker run -it --rm mirantis/dtr:<version-number> reconfigure --ucp-url
$MKE_URL --ucp-username $USER --ucp-password $PASSWORD --ucp-insecure-tls
--dtr-external-url $MSR_URL --log-level debug

Certificate issues when pushing and pulling images

If TLS is not properly configured, you are likely to encounter an x509: certificate signed by unknown authority error when attempting to run the following commands:

  • docker login

  • docker push

  • docker pull

To resolve the issue:

Verify that your MSR instance has been configured with your TLS certificate Fully Qualified Domain Name (FQDN). For more information, refer to Add a custom TLS certificate.

Alternatively, but only in testing scenarios, you can skip using a certificate by adding your registry host name as an insecure registry in the Docker daemon.json file:

{
    "insecure-registries" : [ "registry-host-name" ]
}