Mirantis Secure Registry is a Dockerized application. To monitor it, you can use the same tools and techniques you’re already using to monitor other containerized applications running on your cluster. One way to monitor MSR is to use the monitoring capabilities of Mirantis Kubernetes Engine (MKE), formerly Docker Universal Control Plane.
In your browser, log in to Mirantis Kubernetes Engine (MKE) and navigate to the Stacks page. If you have MSR set up for high availability, all the MSR replicas are displayed.
To check the containers for an MSR replica, click the replica you want to inspect, click Inspect Resource, and choose Containers.
You can then drill into each MSR container to see its logs and find the root cause of a problem.
MSR also exposes several endpoints you can use to assess whether an MSR replica is healthy:
/_ping: Checks whether the MSR replica is healthy and returns a simple JSON response. This is useful for load balancing or other automated health check tasks.
/nginx_status: Returns the number of connections being handled by the NGINX front end used by MSR.
/api/v0/meta/cluster_status: Returns extensive information about all MSR replicas.
The /api/v0/meta/cluster_status endpoint requires administrator credentials and returns a JSON object for the entire cluster as observed by the replica being queried. You can authenticate your requests using HTTP basic auth.
curl -ksL -u <user>:<pass> https://<msr-domain>/api/v0/meta/cluster_status
{
  "current_issues": [
    {
      "critical": false,
      "description": "... some replicas are not ready. The following servers are not reachable: dtr_rethinkdb_f2277ad178f7"
    }
  ],
  "replica_health": {
    "f2277ad178f7": "OK",
    "f3712d9c419a": "OK",
    "f58cf364e3df": "OK"
  }
}
You can find health status in the current_issues array and the replica_health object. If this endpoint doesn't provide meaningful information when you are troubleshooting, try troubleshooting using logs.
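When the cluster_status output is long, it can help to print only the parts that indicate a problem. The following minimal sketch assumes jq is installed and that the response has a top-level current_issues array and replica_health object, as in the example above; summarize_cluster_status is a hypothetical helper name.

```shell
# Hypothetical helper: summarize a cluster_status response read on stdin.
# Exit status: 0 = nothing reported, 1 = problems printed, 2 = jq failed.
summarize_cluster_status() {
  local report
  # One line per reported issue, then one line per replica whose
  # health value is anything other than "OK".
  report=$(jq -r '
    ((.current_issues // [])[]
      | "issue (critical=\(.critical)): \(.description)"),
    ((.replica_health // {}) | to_entries[]
      | select(.value != "OK")
      | "replica \(.key): \(.value)")
  ') || return 2
  if [ -n "$report" ]; then
    printf '%s\n' "$report"
    return 1
  fi
  echo "no issues reported"
}
```

For example, pipe the curl command above into summarize_cluster_status; a non-zero exit status means there is something to investigate.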
Docker Content Trust (DCT) keeps audit logs of changes made to trusted repositories. Every time you push a signed image to a repository, or delete trust data for a repository, DCT logs that information.
These logs are only available from the MSR API.
To access the audit logs, you need to authenticate your requests using an authentication token. You can get an authentication token for all repositories, or one that is specific to a single repository.
MSR returns a JSON file with a token even when the user doesn’t have access to the repository for which they requested the authentication token. This token doesn’t grant access to MSR repositories.
The JSON file returned has the following structure:
{
"token": "<token>",
"access_token": "<token>",
"expires_in": "<expiration in seconds>",
"issued_at": "<time>"
}
Once you have an authentication token, you can use the following endpoints to get audit logs:
URL | Description | Authorization
---|---|---
 | Get audit logs for all repositories. | Global scope token
 | Get audit logs for a specific repository. | Repository-specific token
Both endpoints accept the following query string parameters:
Field name | Required | Type | Description
---|---|---|---
change_id | Yes | String | A non-inclusive starting change ID from which to start returning results. This is typically the first or last change ID from the previous page of records requested, depending on which direction you are paging in. The value 0 indicates that records should be returned starting from the beginning of time. The value -1 indicates that records should be returned starting from the most recent record. If -1 is provided, the implementation also assumes the records value is meant to be negative, regardless of the given sign.
records | Yes | String integer | The number of records to return. A negative value indicates that the records preceding the change_id should be returned. Records are always returned sorted from oldest to newest.
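The two parameters combine into an ordinary query string. The sketch below only builds that string; it does not reproduce the endpoint paths, so append its output to whichever changefeed URL you call. changefeed_query is a hypothetical helper name, and the sketch assumes change_id values (0, or a record ID such as the UUIDs in the response example) contain no characters that need URL encoding.

```shell
# Hypothetical helper: build "change_id=...&records=..." for a changefeed
# request. records may be negative to page backward from change_id.
changefeed_query() {
  local change_id="$1" records="$2"
  # Reject anything that is not an optionally negative integer:
  # empty, a lone "-", non-digit characters, or a "-" after the first char.
  case "$records" in
    ''|-|*[!0-9-]*|?*-*)
      echo "records must be an integer" >&2
      return 1
      ;;
  esac
  printf 'change_id=%s&records=%s\n' "$change_id" "$records"
}
```

For example, changefeed_query 0 100 prints change_id=0&records=100, which requests the 100 oldest records; to continue forward, pass the ID of the last record you received as the next change_id.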
The response is a JSON object like the following:
{
  "count": 1,
  "records": [
    {
      "ID": "0a60ec31-d2aa-4565-9b74-4171a5083bef",
      "CreatedAt": "2017-11-06T18:45:58.428Z",
      "GUN": "msr.example.org/library/wordpress",
      "Version": 1,
      "SHA256": "a4ffcae03710ae61f6d15d20ed5e3f3a6a91ebfd2a4ba7f31fc6308ec6cc3e3d",
      "Category": "update"
    }
  ]
}
The fields in the response are described below:
Field name | Description
---|---
count | The number of records returned.
ID | The ID of the change record. Use it in the change_id field of requests to provide a non-inclusive starting index. Treat it as an opaque value that is guaranteed to be unique within a Notary instance.
CreatedAt | The time the change happened.
GUN | The MSR repository that was changed.
Version | The version that the repository was updated to. This increments every time there’s a change to the trust repository. This is always 0 for events representing trusted data being removed from the repository.
SHA256 | The checksum of the timestamp being updated to. This can be used with the existing Notary APIs to request that timestamp. This is always an empty string for events representing trusted data being removed from the repository.
Category | The kind of change that was made to the trusted repository: either update or deletion.
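Because records are returned oldest to newest, the change_id for the next page is the ID of the last record when paging forward and the ID of the first record when paging backward with a negative records value. A minimal sketch of that step, assuming jq is installed; next_change_id is a hypothetical helper name.

```shell
# Hypothetical helper: read a changefeed response on stdin and print the
# change_id to use for the next page in the given direction.
# jq -e exits non-zero when no ID is produced (empty page).
next_change_id() {
  local direction="${1:-forward}"
  case "$direction" in
    forward)  jq -er '.records[-1].ID // empty' ;;  # newest record on this page
    backward) jq -er '.records[0].ID // empty' ;;   # oldest record on this page
    *)
      echo "direction must be forward or backward" >&2
      return 2
      ;;
  esac
}
```

When the page is empty, the function prints nothing and exits non-zero, which a paging loop can treat as the end of the feed.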
The results only include audit logs for events that happened more than 60 seconds ago, and are sorted from oldest to newest.
Even though the authentication API always returns a token, the changefeed API checks whether the user has access to the audit logs:
If the user is an admin, they can see the audit logs for any repository.
All other users can see audit logs only for repositories they have read access to.
This guide contains tips and tricks for troubleshooting MSR problems.
High availability in MSR depends on swarm overlay networking. One way to test if overlay networks are working correctly is to deploy containers to the same overlay network on different nodes and see if they can ping one another.
Use SSH to log into a node and run:
docker run -it --rm \
--net dtr-ol --name overlay-test1 \
--entrypoint sh mirantis/dtr
Then use SSH to log into another node and run:
docker run -it --rm \
--net dtr-ol --name overlay-test2 \
--entrypoint ping mirantis/dtr -c 3 overlay-test1
If the second command succeeds, it indicates overlay networking is working correctly between those nodes.
You can run this test with any attachable overlay network and any Docker image that has sh and ping.
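If you run this check often, the two steps can be driven from one machine over SSH. This is a minimal sketch under several assumptions: passwordless SSH to both nodes, permission to run docker on them, and an image that provides sleep in addition to sh and ping (the interactive shell on the first node is replaced with sleep so the script does not block). overlay_test and the RUN variable are hypothetical names; set RUN=echo to print the commands instead of running them.

```shell
# Hypothetical helper: overlay_test NODE_A NODE_B [NETWORK] [IMAGE]
# Returns the exit status of the ping run from NODE_B.
overlay_test() {
  local node_a="$1" node_b="$2" net="${3:-dtr-ol}" image="${4:-mirantis/dtr}"
  local rc=0
  # Keep a container named overlay-test1 alive on node A in the background
  # (assumes the image has a sleep binary).
  ${RUN:-} ssh "$node_a" docker run -d --rm --net "$net" --name overlay-test1 \
    --entrypoint sleep "$image" 120 || return 1
  # From node B, ping that container by name across the overlay network.
  ${RUN:-} ssh "$node_b" docker run --rm --net "$net" --name overlay-test2 \
    --entrypoint ping "$image" -c 3 overlay-test1 || rc=$?
  # Remove the helper container on node A whatever the result was.
  ${RUN:-} ssh "$node_a" docker rm -f overlay-test1 >/dev/null
  return "$rc"
}
```

For example, overlay_test node-a node-b (hypothetical node names) returns 0 only if the ping from the second node succeeds.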
MSR uses RethinkDB for persisting data and replicating it across replicas. It might be helpful to connect directly to the RethinkDB instance running on an MSR replica to check the MSR internal state.
Warning
Modifying RethinkDB directly is not supported and may cause problems.
The RethinkCLI can be run from a separate image in the mirantis organization. Note that the commands below use separate tags for non-interactive and interactive modes.
Use SSH to log into a node that is running an MSR replica, and run the following:
# List problems in the cluster detected by the current node.
REPLICA_ID=$(docker container ls --filter=name=dtr-rethink --format '{{.Names}}' | cut -d'/' -f2 | cut -d'-' -f3 | head -n 1) && \
  echo 'r.db("rethinkdb").table("current_issues")' | \
  docker run --rm -i --net dtr-ol \
    -v "dtr-ca-${REPLICA_ID}:/ca" \
    -e MSR_REPLICA_ID=$REPLICA_ID \
    mirantis/rethinkcli:v2.2.0-ni non-interactive
On a healthy cluster, the output is [].
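To run other queries the same way, the non-interactive command can be wrapped in a function that takes the ReQL query as an argument. This sketch reuses the network, volume, environment variable, and image tag from the command above; replica_id_from_name and rethink_query are hypothetical helper names, and rethink_query must run on a node that hosts an MSR replica.

```shell
# Hypothetical helper: extract the replica ID from a container name such as
# dtr-rethinkdb-5eb9459a7832, optionally prefixed with "<node>/".
replica_id_from_name() {
  printf '%s\n' "$1" | cut -d'/' -f2 | cut -d'-' -f3
}

# Hypothetical helper: run one ReQL query through the non-interactive
# RethinkCLI image, using the first dtr-rethink container on this node.
rethink_query() {
  local query="$1" name replica_id
  name=$(docker container ls --filter=name=dtr-rethink --format '{{.Names}}' | head -n 1)
  if [ -z "$name" ]; then
    echo "no dtr-rethink container found on this node" >&2
    return 1
  fi
  replica_id=$(replica_id_from_name "$name")
  printf '%s\n' "$query" | docker run --rm -i --net dtr-ol \
    -v "dtr-ca-${replica_id}:/ca" -e "MSR_REPLICA_ID=${replica_id}" \
    mirantis/rethinkcli:v2.2.0-ni non-interactive
}
```

For example, rethink_query 'r.dbList()' should list the databases without opening an interactive session.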
Starting in DTR 2.5.5, you can run RethinkCLI from a separate image. First, set an environment variable for your MSR replica ID:
REPLICA_ID=$(docker inspect -f '{{.Name}}' $(docker ps -q -f name=dtr-rethink) | cut -f 3 -d '-')
RethinkDB stores data in different databases that contain multiple tables. Run the following command to start an interactive session and query the contents of the databases:
docker run -it --rm --net dtr-ol -v dtr-ca-$REPLICA_ID:/ca mirantis/rethinkcli:v2.3.0 $REPLICA_ID
# List problems in the cluster detected by the current node.
> r.db("rethinkdb").table("current_issues")
[]
# List all the DBs in RethinkDB
> r.dbList()
[ 'dtr2',
'jobrunner',
'notaryserver',
'notarysigner',
'rethinkdb' ]
# List the tables in the dtr2 db
> r.db('dtr2').tableList()
[ 'blob_links',
'blobs',
'client_tokens',
'content_caches',
'events',
'layer_vuln_overrides',
'manifests',
'metrics',
'namespace_team_access',
'poll_mirroring_policies',
'promotion_policies',
'properties',
'pruning_policies',
'push_mirroring_policies',
'repositories',
'repository_team_access',
'scanned_images',
'scanned_layers',
'tags',
'user_settings',
'webhooks' ]
# List the entries in the repositories table
> r.db('dtr2').table('repositories')
[ { enableManifestLists: false,
id: 'ac9614a8-36f4-4933-91fa-3ffed2bd259b',
immutableTags: false,
name: 'test-repo-1',
namespaceAccountID: 'fc3b4aec-74a3-4ba2-8e62-daed0d1f7481',
namespaceName: 'admin',
pk: '3a4a79476d76698255ab505fb77c043655c599d1f5b985f859958ab72a4099d6',
pulls: 0,
pushes: 0,
scanOnPush: false,
tagLimit: 0,
visibility: 'public' },
{ enableManifestLists: false,
id: '9f43f029-9683-459f-97d9-665ab3ac1fda',
immutableTags: false,
longDescription: '',
name: 'testing',
namespaceAccountID: 'fc3b4aec-74a3-4ba2-8e62-daed0d1f7481',
namespaceName: 'admin',
pk: '6dd09ac485749619becaff1c17702ada23568ebe0a40bb74a330d058a757e0be',
pulls: 0,
pushes: 0,
scanOnPush: false,
shortDescription: '',
tagLimit: 1,
visibility: 'public' } ]
Individual DBs and tables are a private implementation detail and may change in MSR from version to version, but you can always use dbList() and tableList() to explore the contents and data structure.
To check on the overall status of your MSR cluster without interacting with RethinkCLI, run the following API request:
curl -u admin:$TOKEN -X GET "https://<msr-url>/api/v0/meta/cluster_status" -H "accept: application/json"
{
  "rethink_system_tables": {
    "cluster_config": [
      {
        "heartbeat_timeout_secs": 10,
        "id": "heartbeat"
      }
    ],
    "current_issues": [],
    "db_config": [
      {
        "id": "339de11f-b0c2-4112-83ac-520cab68d89c",
        "name": "notaryserver"
      },
      {
        "id": "aa2e893f-a69a-463d-88c1-8102aafebebc",
        "name": "dtr2"
      },
      {
        "id": "bdf14a41-9c31-4526-8436-ab0fed00c2fd",
        "name": "jobrunner"
      },
      {
        "id": "f94f0e35-b7b1-4a2f-82be-1bdacca75039",
        "name": "notarysigner"
      }
    ],
    "server_status": [
      {
        "id": "9c41fbc6-bcf2-4fad-8960-d117f2fdb06a",
        "name": "dtr_rethinkdb_5eb9459a7832",
        "network": {
          "canonical_addresses": [
            {
              "host": "dtr-rethinkdb-5eb9459a7832.dtr-ol",
              "port": 29015
            }
          ],
          "cluster_port": 29015,
          "connected_to": {
            "dtr_rethinkdb_56b65e8c1404": true
          },
          "hostname": "9e83e4fee173",
          "http_admin_port": "<no http admin>",
          "reql_port": 28015,
          "time_connected": "2019-02-15T00:19:22.035Z"
        }
      },
      ...
    ]
  }
}
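The connected_to map in each server_status entry shows which peers a RethinkDB server knows about and whether it is currently connected to them. The following minimal sketch flags peers marked as not connected, together with any entries in rethink_system_tables.current_issues. It assumes jq is installed and the field layout shown above; rethink_problems is a hypothetical helper name.

```shell
# Hypothetical helper: read a cluster_status response on stdin and list
# RethinkDB problems. Exit status: 0 = none, 1 = printed, 2 = jq failed.
rethink_problems() {
  local report
  report=$(jq -r '
    .rethink_system_tables
    | ((.current_issues // [])[]
        | "issue: \(.description? // tostring)"),
      ((.server_status // [])[]
        | .name as $name
        | (.network.connected_to // {}) | to_entries[]
        | select(.value != true)
        | "\($name) is not connected to \(.key)")
  ') || return 2
  if [ -n "$report" ]; then
    printf '%s\n' "$report"
    return 1
  fi
  echo "no RethinkDB problems reported"
}
```

For example, pipe the curl request above into rethink_problems to see only the servers that have lost contact with a peer.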
When an MSR replica is unhealthy or down, the MSR web UI displays a warning:
Warning: The following replicas are unhealthy: 59e4e9b0a254; Reasons: Replica reported health too long ago: 2017-02-18T01:11:20Z; Replicas 000000000000, 563f02aba617 are still healthy.
To fix this, remove the unhealthy replica from the MSR cluster and join a new one. Start by running:
docker run -it --rm \
mirantis/dtr:2.8.2 remove \
--ucp-insecure-tls
Then run:
docker run -it --rm \
mirantis/dtr:2.8.2 join \
--ucp-node <mke-node-name> \
--ucp-insecure-tls