Troubleshooting
This section describes issues that you may encounter during day-2 operations and how to address them.
Troubleshoot the HostOSConfigurationModules object
In .status.modules, verify whether all modules have been loaded and verified successfully. Each module must have the available value in the state field. If not, the error field contains the reason for the issue.
Example of different erroneous states in a hocm object:
status:
  modules:
  # error state: hashes mismatched
  - error: 'hashes are not the same: got ''d78352e51792bbe64e573b841d12f54af089923c73bc185bac2dc5d0e6be84cd''
      want ''c726ab9dfbfae1d1ed651bdedd0f8b99af589e35cb6c07167ce0ac6c970129ac'''
    name: sysctl
    sha256sum: d78352e51792bbe64e573b841d12f54af089923c73bc185bac2dc5d0e6be84cd
    state: error
    url: <url-to-package>
    version: 1.0.0
  # error state: an archive is not available because of misconfigured proxy
  - error: 'failed to perform request to fetch the module archive: Get "<url-to-package>": Forbidden'
    name: custom-module
    state: error
    url: <url-to-package>
    version: 0.0.1
  # successfully loaded and verified module
  - description: Module for package installation
    docURL: https://docs.mirantis.com
    name: package
    playbookName: main.yaml
    sha256sum: 2c7c91206ce7a81a90e0068cd4ce7ca05eab36c4da1893555824b5ab82c7cc0e
    state: available
    url: <url-to-package>
    valuesValidationSchema: <gzip+base64 encoded data>
    version: 1.0.0
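The state check described above can be scripted. The following is a minimal Python sketch, assuming the .status section of the object has already been loaded into a dictionary (for example, from its JSON representation). The function name failed_modules and the shortened sample data are illustrative assumptions, not part of any Container Cloud tooling.

```python
from typing import Any


def failed_modules(status: dict[str, Any]) -> list[tuple[str, str]]:
    """Return (name, error) pairs for modules whose state is not 'available'."""
    problems = []
    for module in status.get("modules", []):
        if module.get("state") != "available":
            # The error field holds the reason reported for the module.
            problems.append((module.get("name", "<unnamed>"), module.get("error", "")))
    return problems


# Sample data that mirrors the shape of the example status (values shortened).
sample_status = {
    "modules": [
        {"name": "sysctl", "state": "error", "error": "hashes are not the same"},
        {"name": "package", "state": "available"},
    ]
}

for name, error in failed_modules(sample_status):
    print(f"{name}: {error}")
```

An empty result means that every listed module reports the available state.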
If a module is in the error state, it might affect the corresponding hoc object that contains the module configuration.
Example of erroneous status in a hoc object:
status:
  configs:
  - moduleName: sysctl
    moduleVersion: 1.0.0
    modulesReference: mcc-modules
    error: module is not found or not verified in any HostOSConfigurationModules object
To resolve an issue described in the error field:
1. Address the root cause. For example, ensure that the package has the correct hash sum, or adjust the proxy configuration so that the package can be fetched.
2. Recreate the hocm object with correct settings.
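For the hash mismatch case, it can help to compute the SHA-256 sum of the package archive locally before recreating the object. The following is a minimal sketch that uses the Python standard library; the function names and the local file path are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so that large archives do not load into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: Path, expected_sha256: str) -> bool:
    """Compare the computed digest with the hash sum declared for the module."""
    return sha256_of_file(path) == expected_sha256.strip().lower()
```

For example, matches_expected(Path("<path-to-downloaded-package>"), "<sha256sum-from-the-object>") returns True only if the downloaded archive matches the declared sum.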
Setting syncPeriod for debug sessions
During test or debug sessions where errors are inevitable, you can set a reasonable sync period for host-os-modules-controller to avoid manual recreation of hocm objects.
To enable the option, set the syncPeriod parameter in the spec:providerSpec:value:kaas:regional:helmReleases: section of the management Cluster object:
spec:
  providerSpec:
    value:
      kaas:
        regional:
        - provider: baremetal
          helmReleases:
          - name: host-os-modules-controller
            values:
              syncPeriod: 2m
Normally, syncPeriod is not required in the cluster settings. Therefore, you can remove this option after completing a debug session.
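If the Cluster object is edited programmatically, the nesting shown above can be built without duplicating existing entries. The following is a minimal sketch that operates on a plain dictionary representing the object; the helper name set_helm_release_value is hypothetical, and reading or writing the object in the cluster is intentionally left out.

```python
from typing import Any


def set_helm_release_value(cluster: dict[str, Any], release: str, key: str,
                           value: Any, provider: str = "baremetal") -> dict[str, Any]:
    """Set values[key] for a named Helm release in the regional section, in place."""
    kaas = (cluster.setdefault("spec", {})
                   .setdefault("providerSpec", {})
                   .setdefault("value", {})
                   .setdefault("kaas", {}))
    regional = kaas.setdefault("regional", [])
    # Reuse the entry for the provider, or add one if it does not exist yet.
    entry = next((r for r in regional if r.get("provider") == provider), None)
    if entry is None:
        entry = {"provider": provider}
        regional.append(entry)
    releases = entry.setdefault("helmReleases", [])
    # Reuse the release entry by name, or add one.
    target = next((h for h in releases if h.get("name") == release), None)
    if target is None:
        target = {"name": release}
        releases.append(target)
    target.setdefault("values", {})[key] = value
    return cluster


cluster = set_helm_release_value({}, "host-os-modules-controller", "syncPeriod", "2m")
```

Calling the helper again with the same release name updates the existing entry instead of appending a second one.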
Troubleshoot the HostOSConfiguration object
After creating a hoc object with various configurations, perform the following steps with reference to the HostOSConfiguration status:
1. Verify that the .status.isValid field has the true value.
2. Verify that the .status.configs[*].error fields are absent.
3. Verify that all .status.machinesStates.<machineName>.configStateItemsStatuses have no Failed status.
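The three status checks above can be combined into one pass. The following is a minimal sketch, assuming the .status section is available as a dictionary and that machinesStates is a mapping keyed by machine name. The exact layout of configStateItemsStatuses is not described here, so the sketch searches it recursively for the Failed value instead of relying on a specific shape. All function names are hypothetical.

```python
from typing import Any


def contains_value(node: Any, wanted: str) -> bool:
    """Recursively search nested dicts and lists for a value equal to `wanted`."""
    if isinstance(node, dict):
        return any(contains_value(v, wanted) for v in node.values())
    if isinstance(node, list):
        return any(contains_value(v, wanted) for v in node)
    return node == wanted


def hoc_status_problems(status: dict[str, Any]) -> list[str]:
    """Collect human-readable findings for the three checks listed above."""
    problems = []
    if status.get("isValid") is not True:
        problems.append("isValid is not true")
    for config in status.get("configs", []):
        if "error" in config:
            problems.append(f"config {config.get('moduleName')}: {config['error']}")
    # Assumption: machinesStates maps a machine name to its state dictionary.
    for machine, state in status.get("machinesStates", {}).items():
        if contains_value(state.get("configStateItemsStatuses"), "Failed"):
            problems.append(f"machine {machine}: Failed state item")
    return problems
```

An empty list means that none of the three checks found an issue. Any finding still needs to be confirmed against the object itself.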
For details on the HostOSConfiguration status, refer to Container Cloud API Reference: HostOSConfiguration custom resource.
Also, verify the LCM-related objects:
1. Verify that the corresponding LCMCluster object has all related StateItems.
2. Verify that all selected LCMMachines have the .spec.stateItemsOverwrites field, in which all StateItems from the previous step are present.
3. Verify that all StateItems from the previous step have been successfully processed by lcm-agent. Otherwise, manual intervention is required.
To address an issue with a specific StateItem for which lcm-agent is reporting an error, log in to the corresponding node and inspect Ansible execution logs:
ssh -i <path-to-ssh-key> mcc-user@<ip-addr-of-the-node>
sudo -i
cd /var/log/lcm/runners/
# from 2 directories, select the one
# with subdirectories having 'host-os-' prefix
cd <selected-dir>/<name-of-the-erroneous-state-item>
less <logs-file>
After the inspection, either resolve the issue manually or escalate the issue to Mirantis support.
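Selecting the runner directory with host-os- prefixed subdirectories, as described in the command comments above, can also be done with a short script run on the node. The following is a minimal sketch that uses only the Python standard library; the function name is hypothetical, and reading /var/log/lcm/runners/ is assumed to require root privileges.

```python
from pathlib import Path


def find_host_os_runner_dirs(base: Path) -> list[Path]:
    """Return directories under `base` that contain a 'host-os-' prefixed subdirectory."""
    matches = []
    for candidate in sorted(p for p in base.iterdir() if p.is_dir()):
        # A runner directory qualifies if at least one child directory has the prefix.
        if any(c.is_dir() and c.name.startswith("host-os-") for c in candidate.iterdir()):
            matches.append(candidate)
    return matches
```

For example, find_host_os_runner_dirs(Path("/var/log/lcm/runners")) lists the candidate directories to inspect.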
Enable log debugging
The day-2 operations API supports debug-level logging in both the baremetal-provider controller and host-os-modules-controller. Logs from both may be helpful during debug sessions.
To enable log debugging in host-os-modules-controller, add the following snippet to the Cluster object:
providerSpec:
  # ...
  value:
    # ...
    kaas:
      regional:
      - helmReleases:
        - name: host-os-modules-controller
          values:
            logLevel: 2
To enable log debugging in baremetal-provider, add the following snippet to the Cluster object:
providerSpec:
  # ...
  value:
    # ...
    kaas:
      regional:
      - helmReleases:
        - name: baremetal-provider
          values:
            cluster_api_provider_baremetal:
              log:
                verbosity: 3
To obtain the logs related to day-2 operations in baremetal-provider, filter them by the .host-os. key:
kubectl logs -n kaas <baremetal-provider-pod> | grep ".host-os."
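Note that grep treats each dot in the pattern as a wildcard that matches any character; grep -F matches the key literally. When logs are saved to a file and post-processed, the same literal filter can be sketched in Python as follows. The function name and the sample strings are synthetic placeholders, not real log output.

```python
from typing import Iterable, Iterator


def filter_host_os(lines: Iterable[str], key: str = ".host-os.") -> Iterator[str]:
    """Yield only the lines that contain the literal key."""
    for line in lines:
        if key in line:
            yield line


# Synthetic strings: only the first one contains the literal key.
sample = ["x.host-os.y", "x-host-os-y"]
print(list(filter_host_os(sample)))
```

Because the comparison is a plain substring test, the second sample string is skipped, whereas an unescaped regular expression would match it.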