Troubleshoot Helm Controller alerts

This section describes the investigation and troubleshooting steps for the Helm Controller service and the HelmBundle custom resources.


HelmBundleReleaseNotDeployed

Root cause

The status of a Helm Controller release differs from deployed. A broken HelmBundle configuration or a missing Helm chart artifact can cause this issue when the HelmBundle update is applied.
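
For an overview of the affected releases, you can list all HelmBundle objects and inspect the status section of a specific one. A minimal sketch; the exact status layout depends on the product version, so review the full output:

    kubectl get helmbundle --all-namespaces
    kubectl get helmbundle -n <helmbundle_namespace> <helmbundle_name> -o jsonpath='{.status}'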

Investigation

  1. Inspect the logs of every Helm Controller Pod for error or warning messages:

    kubectl logs -n <controller_namespace> <controller_name>
    
  2. If the logs contain an error about fetching the chart, review the chartURL fields of the HelmBundle object and verify that the chart URL has no typographical errors:

    kubectl get helmbundle -n <helmbundle_namespace> <helmbundle_name> -o yaml
    
  3. Verify that the chart artifact is accessible from your cluster, for example, using the probe below.
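
    A minimal probe that fetches the chart URL headers from a temporary Pod inside the cluster. The curlimages/curl image and the <chart_url> placeholder are assumptions; take the URL from the chartURL field of the HelmBundle object:

    kubectl run chart-probe --rm -it --restart=Never \
      --image=curlimages/curl -- curl -sSfI <chart_url>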

Mitigation

If the chart artifact is not accessible from your cluster, investigate the network-related alerts, if any, and verify that the file is available in the repository.
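
One way to rule out in-cluster DNS problems is to resolve the repository host from a temporary Pod. The busybox image and the <repository_host> placeholder are assumptions; use the host part of the chart URL:

    kubectl run dns-probe --rm -it --restart=Never \
      --image=busybox -- nslookup <repository_host>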

HelmControllerReconcileDown

Root cause

Helm Controller failed to reconcile the HelmBundle spec.

Investigation and mitigation

Refer to HelmBundleReleaseNotDeployed.
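
Additionally, you can filter the controller logs for reconciliation errors. The reconcile keyword is an assumption about the log format; adjust it as needed:

    kubectl logs -n <controller_namespace> <controller_name> | grep -iE 'reconcile|error'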

HelmControllerTargetDown

Root cause

Prometheus fails at least 10% of the Helm Controller metrics scrapes. Either of the following two components can cause the alert to fire:

  • Helm Controller Pod(s):

    • If the Pod is down.

    • If the Pod target endpoint is at least partially unresponsive. For example, due to CPU throttling, an application error that prevents a restart, or container flapping.

  • The Prometheus server, if it cannot reach the helm-controller endpoint(s).
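
To determine which component is at fault, you can query the Prometheus targets API. The Service name, namespace, and port below are assumptions; adjust them to your Prometheus deployment. Run the port-forward in a separate terminal, then filter for targets that are not up:

    kubectl port-forward -n <prometheus_namespace> svc/<prometheus_service> 9090:9090
    curl -s 'http://localhost:9090/api/v1/targets' | \
      jq '.data.activeTargets[] | select(.health != "up") | {job: .labels.job, lastError}'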

Investigation and mitigation

  1. Refer to KubePodsCrashLooping.

  2. Inspect and resolve the network-related alerts. You can also probe the Helm Controller metrics endpoint directly, as shown below.
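
A minimal sketch for verifying that the Pods run and that the metrics endpoint responds. The 8080 metrics port is an assumption; verify it against the Pod specification and run the port-forward in a separate terminal:

    kubectl get pods -n <controller_namespace> -o wide
    kubectl port-forward -n <controller_namespace> <controller_pod> 8080:8080
    curl -sf http://localhost:8080/metrics | head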