Trigger self-diagnostics for a management or managed cluster

Available since MCC 2.28.0 (17.3.0 and 16.3.0)

To run self-diagnostics for a cluster, the operator must create a Diagnostic object. Creating this object triggers diagnostic-controller to run all available checks for the target cluster defined in the spec.cluster field of the object.

After the required set of diagnostic checks completes successfully, diagnostics is not retriggered. To rerun diagnostics for the same cluster, the operator must create a new Diagnostic object.

Objects of the Diagnostic kind are not removed automatically, so you can review the results of each diagnostics run later.
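
Because Diagnostic objects persist, earlier runs can be listed before a new one is created. The following is a minimal sketch, not an official procedure: the helper name list_diagnostics is hypothetical, and it assumes that kubectl is configured with the kubeconfig of the management cluster and that the objects are addressable through the diagnostic resource name.

```shell
# Hypothetical helper: list Diagnostic objects in a namespace, oldest first.
# Assumes kubectl already points at the management cluster.
# Usage: list_diagnostics <namespace>
list_diagnostics() {
  kubectl -n "$1" get diagnostic --sort-by=.metadata.creationTimestamp
}
```

For example, list_diagnostics test-namespace would show every Diagnostic object created in test-namespace, ordered by creation time.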

To trigger self-diagnostics for a cluster:

  1. Log in to a host where kubectl is installed and the kubeconfig of your management cluster is available.

  2. Create the Diagnostic object in the namespace where the target cluster is located. For example:

    apiVersion: diagnostic.mirantis.com/v1alpha1
    kind: Diagnostic
    metadata:
      name: test-diagnostic
      namespace: test-namespace
    spec:
      cluster: test-cluster
    
  3. Wait until diagnostics finishes. The following loop blocks until the status.finishedAt field of the object is set:

    while [ -z "$(kubectl -n <diagnosticObjectNamespace> get diagnostic <diagnosticObjectName> -o jsonpath='{.status.finishedAt}')" ]; do sleep 1; done
    
  4. Verify the status section of the Diagnostic object:

    • If diagnostics finishes successfully, the result map contains key-value pairs that describe the results of the corresponding diagnostic checks.

    • If diagnostics fails or the Diagnostic Controller version is outdated, diagnostic-controller saves the issue description to the status.error field.

      If the Diagnostic Controller version is outdated, ensure that release-controller is running and that a new DiagnosticRelease object has been created. Also, check the logs of the bare metal provider and release-controller for issues.

    • If the status section is empty, diagnostic-controller has not run any diagnostics yet.

    For details about the object status, see Container Cloud API Reference: Diagnostic resource.
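
The wait and verification steps above can also be scripted. The following is a minimal sketch under stated assumptions, not an official tool: the helper names wait_for_diagnostic and diagnostic_summary and the default timeout of 600 seconds are hypothetical. It reads fields with jsonpath output, which kubectl is assumed to render as an empty string for an unset field under its default --allow-missing-template-keys=true setting.

```shell
# Hypothetical helper: wait until status.finishedAt is set, or give up.
# Usage: wait_for_diagnostic <namespace> <name> [timeout_seconds]
wait_for_diagnostic() {
  ns="$1"; name="$2"; timeout="${3:-600}"  # 600 s default is an assumption
  elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    # Empty output is taken to mean that diagnostics has not finished yet.
    finished="$(kubectl -n "$ns" get diagnostic "$name" -o jsonpath='{.status.finishedAt}')"
    if [ -n "$finished" ]; then
      echo "finished at $finished"
      return 0
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  echo "timed out after ${timeout}s" >&2
  return 1
}

# Hypothetical helper: print status.error if it is set, else the whole status.
# Usage: diagnostic_summary <namespace> <name>
diagnostic_summary() {
  err="$(kubectl -n "$1" get diagnostic "$2" -o jsonpath='{.status.error}')"
  if [ -n "$err" ]; then
    echo "diagnostics error: $err"
    return 1
  fi
  kubectl -n "$1" get diagnostic "$2" -o jsonpath='{.status}'
}
```

For the example object above, wait_for_diagnostic test-namespace test-diagnostic followed by diagnostic_summary test-namespace test-diagnostic would block until the run finishes and then print either the recorded error or the raw status. Unlike an unbounded loop, the timeout lets the script exit if diagnostic-controller never picks up the object.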