Collect the bootstrap logs

If the bootstrap process is stuck or fails, collect and inspect the bootstrap and management cluster logs.

To collect the bootstrap logs:

If the Cluster object is not created yet
  1. List all available deployments:

    kubectl --kubeconfig <pathToKindKubeconfig> \
    -n kaas get deploy
    
  2. Collect the logs of the required deployment:

    kubectl --kubeconfig <pathToKindKubeconfig> \
    -n kaas logs -lapp.kubernetes.io/name=<deploymentName>
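
To gather the logs of all deployments at once, the two steps above can be combined in a small shell helper. The following is a minimal sketch, not part of the product tooling: the collect_kaas_deploy_logs function name and the default output directory are hypothetical, and it assumes that the pods of each deployment carry the app.kubernetes.io/name=<deploymentName> label used in step 2. Because kubectl returns only the last 10 lines per pod when a label selector is used, the sketch passes --tail=-1 to retrieve complete logs.

```shell
# Hypothetical helper, not part of the Container Cloud tooling.
# Usage: collect_kaas_deploy_logs <pathToKindKubeconfig> [outputDir]
collect_kaas_deploy_logs() {
  local kubeconfig="$1"
  local out_dir="${2:-./kaas-deploy-logs}"
  mkdir -p "$out_dir"

  # "-o name" prints one "deployment.apps/<name>" entry per line.
  kubectl --kubeconfig "$kubeconfig" -n kaas get deploy -o name |
  while IFS= read -r resource; do
    name="${resource#*/}"   # strip the "deployment.apps/" prefix
    [ -n "$name" ] || continue
    # Assumption: pods are labeled app.kubernetes.io/name=<deploymentName>.
    # --tail=-1 overrides the 10-line default that applies with selectors.
    kubectl --kubeconfig "$kubeconfig" -n kaas logs \
      -l "app.kubernetes.io/name=${name}" --all-containers --tail=-1 \
      > "${out_dir}/${name}.log" 2>&1 ||
      echo "warning: could not collect logs for ${name}" >&2
  done
}
```

Each deployment ends up in its own <deploymentName>.log file, which makes it easier to compare the output of several controllers side by side.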
    
If the Cluster object is created

Select from the following options:

  • If a management cluster is not deployed yet:

    CLUSTER_NAME=<clusterName> ./bootstrap.sh collect_logs
    
  • If a management cluster is deployed or pivoting is done:

    1. Obtain the cluster kubeconfig:

      ./container-cloud get cluster-kubeconfig \
      --kubeconfig <pathToKindKubeconfig> \
      --cluster-name <clusterName> \
      --kubeconfig-output <pathToMgmtClusterKubeconfig>
      
    2. Collect the logs:

      CLUSTER_NAME=<clusterName> \
      KUBECONFIG=<pathToMgmtClusterKubeconfig> \
      ./bootstrap.sh collect_logs
      
    3. Technology Preview. For bare metal clusters, assess the Ironic pod logs:

      • Extract the content of the 'message' fields from every log message:

        kubectl -n kaas logs <ironicPodName> -c syslog | jq -rRM 'fromjson? | .message'
        
      • Extract the content of the 'message' fields from the ironic_conductor source log messages:

        kubectl -n kaas logs <ironicPodName> -c syslog | jq -rRM 'fromjson? | select(.source == "ironic_conductor") | .message'
        

      The syslog container collects logs generated by Ansible during the node deployment and cleanup and outputs them in the JSON format.
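
Before filtering on a particular source, it can help to see which sources are present in the syslog output at all. The sketch below reuses the jq invocation from the commands above and counts the entries per 'source' value. The ironic_log_sources function name is hypothetical, and entries without a 'source' field are reported as "unknown"; only the 'message' and 'source' fields shown above are assumed to exist.

```shell
# Hypothetical helper: count Ironic syslog entries per .source value,
# most frequent first. Requires kubectl and jq.
# Usage: ironic_log_sources <ironicPodName>
ironic_log_sources() {
  kubectl -n kaas logs "$1" -c syslog |
    # fromjson? silently skips lines that are not valid JSON;
    # objects drops JSON values that are not objects.
    jq -rRM 'fromjson? | objects | .source // "unknown"' |
    sort | uniq -c | sort -rn
}
```

The output is a two-column list of counts and source names, which can then be used in the select(.source == "...") filter shown above.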

Note

Add COLLECT_EXTENDED_LOGS=true before the collect_logs command to output the extended version of logs that contains system and MKE logs, logs from LCM Ansible and LCM Agent along with cluster events and Kubernetes resources description and logs.

Without COLLECT_EXTENDED_LOGS=true, the basic version of logs is collected, which is sufficient for most use cases. The basic version of logs contains all events, Kubernetes custom resources, and logs from all Container Cloud components. This version does not require passing --key-file.

The logs are collected in the directory where the bootstrap script is located.
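
As an illustration of the note above, the following sketch wraps the extended log collection in a shell function. The collect_extended_logs name is hypothetical, and passing COLLECT_EXTENDED_LOGS=true as an environment variable on the same command line as CLUSTER_NAME is an assumption based on the commands shown earlier. Run it from the directory that contains bootstrap.sh.

```shell
# Hypothetical wrapper around "./bootstrap.sh collect_logs".
# Usage: collect_extended_logs <clusterName> [pathToMgmtClusterKubeconfig]
collect_extended_logs() {
  local cluster_name="$1"
  local kubeconfig="${2:-}"
  if [ -n "$kubeconfig" ]; then
    # Management cluster is deployed or pivoting is done.
    COLLECT_EXTENDED_LOGS=true CLUSTER_NAME="$cluster_name" \
      KUBECONFIG="$kubeconfig" ./bootstrap.sh collect_logs
  else
    # Management cluster is not deployed yet.
    COLLECT_EXTENDED_LOGS=true CLUSTER_NAME="$cluster_name" \
      ./bootstrap.sh collect_logs
  fi
}
```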

Logs structure

The Container Cloud logs structure in <output_dir>/<cluster_name>/ is as follows:

  • /events.log

    Human-readable table that contains information about the cluster events.

  • /system

    System logs.

  • /system/mke (or /system/MachineName/mke)

    Mirantis Kubernetes Engine (MKE) logs.

  • /objects/cluster

    Logs of the non-namespaced Kubernetes objects.

  • /objects/namespaced

    Logs of the namespaced Kubernetes objects.

  • /objects/namespaced/<namespaceName>/core/pods

    Logs of the pods from a specific Kubernetes namespace. For example, logs of the pods from the kaas namespace contain logs of Container Cloud controllers, including bootstrap-cluster-controller since Container Cloud 2.25.0.

  • /objects/namespaced/<namespaceName>/core/pods/<containerName>.prev.log

    Logs of the pods from a specific Kubernetes namespace that were previously removed or failed.

  • /objects/namespaced/<namespaceName>/core/pods/<ironicPodName>/syslog.log

    Technology Preview. Ironic pod logs of the bare metal clusters.

    Note

    Logs collected by the syslog container during the bootstrap phase are not transferred to the management cluster during pivoting. These logs are located in /volume/log/ironic/ansible_conductor.log inside the Ironic pod.
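
Given the directory layout described above, the collected pod logs can be located with standard tools. The following is a minimal sketch: the function names are hypothetical, and the first argument is assumed to be the <output_dir>/<cluster_name> directory.

```shell
# Hypothetical helpers for browsing a collected logs directory.
# <logsRoot> is assumed to be <output_dir>/<cluster_name>.

# List all pod logs of one namespace.
# Usage: list_pod_logs <logsRoot> <namespaceName>
list_pod_logs() {
  find "$1/objects/namespaced/$2/core/pods" -type f -name '*.log' | sort
}

# List logs of previously removed or failed pods across all namespaces.
# Usage: list_previous_logs <logsRoot>
list_previous_logs() {
  find "$1/objects/namespaced" -type f -name '*.prev.log' | sort
}
```

For example, list_pod_logs <logsRoot> kaas lists the logs of the Container Cloud controllers, and list_previous_logs <logsRoot> is a quick way to check whether any pod was restarted or failed during bootstrap.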

Each log entry of the management cluster logs contains a request ID that identifies the chronology of actions performed on a cluster or machine. The format of the log entry is as follows:

<process ID>.[<subprocess ID>...<subprocess ID N>].req:<requestID>: <logMessage>

For example, os.machine.req:28 contains information about the task 28 applied to an OpenStack machine.
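
To follow a single request through a log, the entries can be split into their parts and filtered by request ID. The sketch below is an illustration only: the function names are hypothetical, and it assumes that each line starts with the <process ID>.<subprocess ID>.req:<requestID>: prefix exactly as in the format above. Lines that do not match are skipped.

```shell
# Hypothetical helpers for the plain-text log entry format.

# Read log lines on stdin and print "<processPath>|<requestID>|<logMessage>".
parse_log_entries() {
  sed -nE 's/^([A-Za-z0-9_.-]+)\.req:([0-9]+): (.*)$/\1|\2|\3/p'
}

# Keep only the entries that belong to one request ID.
# Usage: entries_for_request_id <requestID> < logfile
entries_for_request_id() {
  parse_log_entries | awk -F'|' -v id="$1" '$2 == id'
}
```

Comparing the request ID as a separate field avoids false matches, such as request 280 being returned when searching for request 28.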

Since Container Cloud 2.22.0, the logging format has the following extended structure for the admission-controller, storage-discovery, and all supported <providerName>-provider services of a management cluster:

level:<debug,info,warn,error,panic>,
ts:<YYYY-MM-DDTHH:mm:ssZ>,
logger:<processID>.<subProcessID(s)>.req:<requestID>,
caller:<lineOfCode>,
msg:<message>,
error:<errorMessage>,
stacktrace:<codeInfo>

Since Container Cloud 2.23.0, this structure also applies to the <name>-controller services of a management cluster.

Example of a log extract for openstack-provider since Container Cloud 2.22.0:
{"level":"error","ts":"2022-11-14T21:37:18Z","logger":"os.cluster.req:318","caller":"lcm/machine.go:808","msg":"","error":"could not determine machine demo-46880-bastion host name","stacktrace":"sigs.k8s.io/cluster-api-provider-openstack/pkg/lcm.GetMachineConditions\n\t/go/src/sigs.k8s.io/cluster-api-provider-openstack/pkg/lcm/machine.go:808\nsigs.k8s.io/cluster-api-provider-openstack/pkg...."}
{"level":"info","ts":"2022-11-14T21:37:23Z","logger":"os.machine.req:476","caller":"service/reconcile.go:128","msg":"request: default/demo-46880-2"}
{"level":"info","ts":"2022-11-14T21:37:23Z","logger":"os.machine.req:476","caller":"machine/machine_controller.go:201","msg":"Reconciling Machine \"default/demo-46880-2\""}
{"level":"info","ts":"2022-11-14T21:37:23Z","logger":"os.machine.req:476","caller":"machine/actuator.go:454","msg":"Checking if machine exists: \"default/demo-46880-2\" (cluster: \"default/demo-46880\")"}
{"level":"info","ts":"2022-11-14T21:37:23Z","logger":"os.machine.req:476","caller":"machine/machine_controller.go:327","msg":"Reconciling machine \"default/demo-46880-2\" triggers idempotent update"}
{"level":"info","ts":"2022-11-14T21:37:23Z","logger":"os.machine.req:476","caller":"machine/actuator.go:290","msg":"Updating machine: \"default/demo-46880-2\" (cluster: \"default/demo-46880\")"}
{"level":"info","ts":"2022-11-14T21:37:24Z","logger":"os.machine.req:476","caller":"lcm/machine.go:73","msg":"Machine in LCM cluster, reconciling LCM objects"}
{"level":"info","ts":"2022-11-14T21:37:26Z","logger":"os.machine.req:476","caller":"lcm/machine.go:902","msg":"Updating Machine default/demo-46880-2 conditions"}

The fields of the extended log structure are as follows:

  • level

    Severity level of the log entry. Possible values: debug, info, warn, error, panic.

  • ts

    Time stamp in the <YYYY-MM-DDTHH:mm:ssZ> format. For example: 2022-11-14T21:37:23Z.

  • logger

    Details on the process ID being logged:

    • <processID>

      Primary process identifier. The list of possible values includes bm, os, iam, license, and bootstrap.

      Note

      The iam and license values are available since Container Cloud 2.23.0. The bootstrap value is available since Container Cloud 2.25.0.

    • <subProcessID(s)>

      One or more secondary process identifiers. The list of possible values includes cluster, machine, controller, and cluster-ctrl.

      Note

      The controller value is available since Container Cloud 2.23.0. The cluster-ctrl value is available since Container Cloud 2.25.0 for the bootstrap process identifier.

    • req

      Request ID number that increases when a service performs the following actions:

      • Receives a request from Kubernetes about creating, updating, or deleting an object

      • Receives an HTTP request

      • Runs a background process

      The request ID allows combining all operations performed with an object within one request. For example, the creation of a Machine object, the updates of its statuses, and so on share the same request ID.

  • caller

    Code line used to apply the corresponding action to an object.

  • msg

    Description of a deployment or update phase. If the value is empty, the log entry contains the "error" key with a message, followed by the "stacktrace" key with stack trace details. For example:

    "msg"="" "error"="Cluster nodes are not yet ready" "stacktrace": "<stack-trace-info>"
    

    The log format of the following Container Cloud components does not contain the "stacktrace" key for easier log handling: baremetal-provider, bootstrap-provider, and host-os-modules-controller.
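
The fields described above lend themselves to filtering with jq. The following sketch shows two filters that read JSON log lines on standard input: one keeps only the error and panic entries, the other keeps the entries of a single request. The function names are hypothetical, and the sketch assumes one JSON object per line, as in the log extract above.

```shell
# Hypothetical jq filters for the extended (JSON) log format. Requires jq.

# Print "<ts> <logger> <error>" (tab-separated) for error and panic entries.
# Usage: <command printing JSON log lines> | error_entries
error_entries() {
  jq -rRM 'fromjson? | objects
    | select(.level == "error" or .level == "panic")
    | [.ts, .logger, .error] | @tsv'
}

# Print "<ts> <caller> <msg>" (tab-separated) for one logger value.
# Usage: <command printing JSON log lines> | entries_for_logger os.machine.req:476
entries_for_logger() {
  jq -rRM --arg logger "$1" 'fromjson? | objects
    | select(.logger == $logger)
    | [.ts, .caller, .msg] | @tsv'
}
```

Because the logger value includes the request ID, entries_for_logger returns all operations performed within one request in chronological order, while error_entries gives a quick overview of what failed and where.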

Note

Logs may also include a number of informational key-value pairs containing additional cluster details. For example, "name": "object-name", "foobar": "baz".

Depending on the type of issue found in the logs, apply the corresponding fixes. For example, if you detect LoadBalancer ERROR state errors during the bootstrap of an OpenStack-based management cluster, contact your system administrator to fix the issue.