Inspection error on bare metal hosts after dnsmasq restart
If the dnsmasq pod is restarted during the bootstrap of newly added nodes, those nodes may fail to undergo inspection. This can result in an inspection error in the corresponding BareMetalHost objects.
The issue can occur when:

- The dnsmasq pod was moved to another node.
- DHCP subnets were changed, including addition or removal. In this case, the dhcpd container of the dnsmasq pod is restarted.

Caution

If changing or adding DHCP subnets is required to bootstrap new nodes, wait until the dnsmasq pod becomes ready after the change, then create BareMetalHost objects.
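For example, you can block until the pod reports Ready again using kubectl wait; the 300-second timeout below is only an illustration:

# Wait until the dnsmasq pod reports the Ready condition (timeout is illustrative)
kubectl -n kaas wait --for=condition=Ready pod/<dnsmasq-pod-name> --timeout=300s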
To verify whether the nodes are affected:
1. Verify whether the BareMetalHost objects contain the inspection error:

   kubectl get bmh -n <mosk-cluster-namespace-name>
Example of system response:
NAME            STATE         CONSUMER        ONLINE   ERROR              AGE
test-master-1   provisioned   test-master-1   true                        9d
test-master-2   provisioned   test-master-2   true                        9d
test-master-3   provisioned   test-master-3   true                        9d
test-worker-1   provisioned   test-worker-1   true                        9d
test-worker-2   provisioned   test-worker-2   true                        9d
test-worker-3   inspecting                    true     inspection error   19h
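On large clusters, a jsonpath filter can narrow the output to the affected hosts only. This is a convenience sketch that relies on the errorType field of the BareMetalHost status, shown later in this procedure:

# List only the hosts whose status reports an inspection error
kubectl get bmh -n <mosk-cluster-namespace-name> \
  -o jsonpath='{range .items[?(@.status.errorType=="inspection error")]}{.metadata.name}{"\n"}{end}'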
2. Verify whether the dnsmasq pod was in the Ready state when the inspection of the affected bare metal hosts (test-worker-3 in the example above) started:

   kubectl -n kaas get pod <dnsmasq-pod-name> -o yaml
Example of system response:
...
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2024-10-10T15:37:34Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2024-10-11T07:38:54Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2024-10-11T07:38:54Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2024-10-10T15:37:34Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: containerd://6dbcf2fc4b36ce4c549c9191ab01f72d0236c51d42947675302675e4bfaf4cdf
    image: docker-dev-kaas-virtual.artifactory-eu.mcp.mirantis.net/bm/baremetal-dnsmasq:base-2-28-alpine-20240812132650
    imageID: docker-dev-kaas-virtual.artifactory-eu.mcp.mirantis.net/bm/baremetal-dnsmasq@sha256:3dad3e278add18e69b2608e462691c4823942641a0f0e25e6811e703e3c23b3b
    lastState:
      terminated:
        containerID: containerd://816fcf079cd544acd74e312065de5b5ed4dbf1dc6159fefffff4f644b5e45987
        exitCode: 0
        finishedAt: "2024-10-11T07:38:35Z"
        reason: Completed
        startedAt: "2024-10-10T15:37:45Z"
    name: dhcpd
    ready: true
    restartCount: 2
    started: true
    state:
      running:
        startedAt: "2024-10-11T07:38:37Z"
...
In the system response above, the dhcpd container was not ready between "2024-10-11T07:38:35Z" and "2024-10-11T07:38:54Z".
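To extract these timestamps without scanning the full YAML, a jsonpath query is a possible shortcut; both fields are part of the standard pod status shown above:

# Last transition of the pod Ready condition
kubectl -n kaas get pod <dnsmasq-pod-name> \
  -o jsonpath='{.status.conditions[?(@.type=="Ready")].lastTransitionTime}'

# Time the previous dhcpd container instance terminated
kubectl -n kaas get pod <dnsmasq-pod-name> \
  -o jsonpath='{.status.containerStatuses[?(@.name=="dhcpd")].lastState.terminated.finishedAt}'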
3. Verify the affected bare metal host. For example:

   kubectl get bmh -n mosk-ns test-worker-3 -o yaml
Example of system response:
...
status:
  errorCount: 15
  errorMessage: Introspection timeout
  errorType: inspection error
  ...
  operationHistory:
    deprovision:
      end: null
      start: null
    inspect:
      end: null
      start: "2024-10-11T07:38:19Z"
    provision:
      end: null
      start: null
    register:
      end: "2024-10-11T07:38:19Z"
      start: "2024-10-11T07:37:25Z"
In the system response above, inspection started at "2024-10-11T07:38:19Z", immediately before the period of the dhcpd container downtime. Therefore, this node is most likely affected by the issue.
To apply the issue resolution:
1. Reboot the node using the IPMI reset or cycle command.
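   For example, a typical ipmitool invocation, assuming the BMC is reachable over IPMI-over-LAN; all values are placeholders for your environment:

   # Power-cycle the host through its BMC (placeholder address and credentials)
   ipmitool -I lanplus -H <bmc-address> -U <bmc-user> -P <bmc-password> chassis power cycle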
2. If the node fails to boot, remove the failed BareMetalHost object and create it again:

   1. Remove the BareMetalHost object. For example:

      kubectl delete bmh -n mosk-ns test-worker-3
   2. Verify that the BareMetalHost object is removed:

      kubectl get bmh -n mosk-ns test-worker-3
   3. Create a BareMetalHost object from the template. For example:

      kubectl create -f bmhc-test-worker-3.yaml
      kubectl create -f bmh-test-worker-3.yaml