Increase CPU and memory limits for cluster components

When any Container Cloud component reaches its CPU or memory limit, StackLight raises the CPUThrottlingHigh alert, and the OOM killer may terminate the affected pod to prevent memory leaks and further destabilization of resource distribution.

An occasional recreation of a pod killed by the OOM killer, about once a day or once a week, is normal. However, if the alerts become more frequent, or if pods cannot start and move to the CrashLoopBackOff state, adjust the default CPU and memory limits to fit your cluster needs and prevent interruption of critical workloads.

To increase limits on a Container Cloud cluster:

In the spec:providerSpec:value: section of cluster.yaml, add the resources:limits parameters with the required values for the necessary Container Cloud components:

kubectl --kubeconfig <pathToManagementClusterKubeconfig> -n <projectName> edit cluster <clusterName>

The location of the limits key in the Cluster object can differ depending on the component. Different cluster types have a different set of components for which you can adjust limits.

The following sections describe the components that relate to a specific cluster type, with the corresponding location of the limits key shown in the configuration examples.
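As an orientation aid, the key locations used in the following sections can be sketched in a single skeleton. The names in angle brackets are placeholders, the cpu and memory values are copied from the examples on this page for illustration only and are not recommendations, and some components nest the limits under an additional component-specific key, as the per-component examples show.

```yaml
spec:
  providerSpec:
    value:
      # Common components of any cluster type
      helmReleases:
      - name: <common-component-name>
        values:
          resources:
            limits:
              cpu: 100m      # illustrative value
              memory: 200Mi  # illustrative value
      kaas:
        # Management cluster components
        management:
          helmReleases:
          - name: <mgmt-cluster-component-name>
            values:
              resources:
                limits:
                  cpu: 200m
                  memory: 200Mi
        # Management and regional cluster components set per provider
        regional:
        - provider: <provider-name>
          helmReleases:
          - name: <provider-credentials-controller-name>
            values:
              resources:
                limits:
                  cpu: 100m
                  memory: 1Gi
        # Management and regional cluster components set without a provider entry
        regionalHelmReleases:
        - name: lcm-controller
          values:
            resources:
              limits:
                cpu: 50m
                memory: 150Mi
```

Only the sections that apply to your cluster type and the components you actually need to adjust have to be present in the Cluster object.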

Note

For StackLight resource limits, refer to Resource limits.

Limits for common components of any cluster type

The CPU and memory limits for the following components can be increased on the management, regional, and managed clusters:

  • client-certificate-controller

  • metrics-server

  • storage-discovery

  • metallb

Note

  • For helm-controller, limits configuration is not supported.

  • For metallb, which applies to the bare metal and vSphere providers, the limits key in cluster.yaml differs from that of other common components.

Common components for any cluster type

Component name

Configuration example

<common-component-name>

spec:
  providerSpec:
    value:
      helmReleases:
      - name: client-certificate-controller
        values:
          resources:
            limits:
              cpu: 100m
              memory: 200Mi

metallb

spec:
  providerSpec:
    value:
      helmReleases:
      - name: metallb
        values:
          controller:
            resources:
              limits:
                memory: 200Mi
                # no CPU limit and 200Mi of memory limit since Container Cloud 2.24.0
                # 200m CPU and 200Mi of memory limit since Container Cloud 2.23.0
                # 50m CPU and 100Mi of memory limit since Container Cloud 2.21.0
                # 30m CPU and 50Mi of memory limit before Container Cloud 2.21.0
          speaker:
            resources:
              limits:
                memory: 500Mi
                # no CPU limit and 500Mi of memory limit since Container Cloud 2.24.0
                # 500m CPU and 500Mi of memory limit since Container Cloud 2.23.0
                # 50m CPU and 100Mi of memory limit since Container Cloud 2.21.0
                # 30m CPU and 70Mi of memory limit before Container Cloud 2.21.0
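Because helmReleases is a list, limits for several common components can presumably be set in one edit by adding one entry per component. The following is a minimal sketch, assuming that metrics-server and storage-discovery follow the generic <common-component-name> layout shown above. The values are placeholders, not recommended settings.

```yaml
spec:
  providerSpec:
    value:
      helmReleases:
      - name: metrics-server
        values:
          resources:
            limits:
              cpu: 100m      # placeholder, size to your cluster
              memory: 200Mi  # placeholder, size to your cluster
      - name: storage-discovery
        values:
          resources:
            limits:
              cpu: 100m      # placeholder
              memory: 200Mi  # placeholder
```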

Limits for management cluster components

The CPU and memory limits for the following components can be increased on the management cluster in the spec:providerSpec:value:kaas:management:helmReleases: section:

  • admission-controller

  • cert-manager

  • event-controller

  • iam

  • iam-controller

  • kaas-exporter

  • kaas-ui

  • license-controller

  • proxy-controller

  • release-controller

  • rhellicense-controller

  • scope-controller

  • user-controller

Limits for management cluster components

Component name

Configuration example

<mgmt-cluster-component-name>

spec:
  providerSpec:
    value:
      kaas:
        management:
          helmReleases:
          - name: release-controller
            values:
              resources:
                limits:
                  cpu: 200m
                  memory: 200Mi

Limits for management and regional cluster components

The CPU and memory limits for the following components can be increased on the management and regional clusters in the spec:providerSpec:value:kaas:regional:[(provider: <provider-name>):helmReleases]: or spec:providerSpec:value:kaas:regionalHelmReleases: sections:

  • agent-controller

  • baremetal-provider

  • lcm-controller

  • mcc-cache

  • openstack-provider

  • os-credentials-controller

  • rbac-controller

  • vsphere-credentials-controller

  • vsphere-provider

  • squid-proxy

Limits for management and regional cluster components

Component name

Configuration example

openstack-provider

spec:
  providerSpec:
    value:
      kaas:
        regional:
        - provider: openstack
          helmReleases:
          - name: openstack-provider
            values:
              resources:
                openstackMachineController:
                  limits:
                    cpu: 2000m
                    memory: 500Mi

os-credentials-controller

spec:
  providerSpec:
    value:
      kaas:
        regional:
        - provider: openstack
          helmReleases:
          - name: os-credentials-controller
            values:
              resources:
                limits:
                  cpu: 100m
                  memory: 1Gi

baremetal-provider

spec:
  providerSpec:
    value:
      kaas:
        regional:
        - provider: baremetal
          helmReleases:
          - name: baremetal-provider
            values:
              cluster_api_provider_baremetal:
                resources:
                  limits:
                    cpu: 2000m
                    memory: 500Mi

vsphere-provider

spec:
  providerSpec:
    value:
      kaas:
        regional:
        - provider: vsphere # <provider-name>
          helmReleases:
          - name: vsphere-provider # <provider-name>
            values:
              vsphereController: # <provider-name>Controller:
                resources:
                  limits:
                    cpu: 2000m
                    memory: 500Mi

vsphere-credentials-controller

spec:
  providerSpec:
    value:
      kaas:
        regional:
        - provider: vsphere # <provider-name>
          helmReleases:
          - name: vsphere-credentials-controller # <provider-credentials-controller-name>
            values:
              resources:
                limits:
                  cpu: 100m
                  memory: 1Gi

  • lcm-controller

  • agent-controller

  • rbac-controller

spec:
  providerSpec:
    value:
      kaas:
        regionalHelmReleases:
        - name: lcm-controller
          values:
            resources:
              limits:
                cpu: 50m
                memory: 150Mi

squid-proxy

spec:
  providerSpec:
    value:
      kaas:
        regional:
        - provider: vsphere
          helmReleases:
          - name: squid-proxy
            values:
              resources:
                limits:
                  cpu: 100m
                  memory: 1Gi

mcc-cache

spec:
  providerSpec:
    value:
      kaas:
        regionalHelmReleases:
        - name: mcc-cache
          values:
            nginx:
              resources:
                limits:
                  cpu: 200m
                  memory: 300Mi
            registry:
              resources:
                limits:
                  cpu: 200m
                  memory: 300Mi
            kproxy:
              resources:
                limits:
                  cpu: 200m
                  memory: 300M
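The table above groups lcm-controller, agent-controller, and rbac-controller under a single example that names only lcm-controller. A minimal sketch for the other two components, assuming that they use the same key layout under regionalHelmReleases, could look as follows. The values are copied from the lcm-controller example for illustration and are not recommendations.

```yaml
spec:
  providerSpec:
    value:
      kaas:
        regionalHelmReleases:
        - name: agent-controller
          values:
            resources:
              limits:
                cpu: 50m       # illustrative value
                memory: 150Mi  # illustrative value
        - name: rbac-controller
          values:
            resources:
              limits:
                cpu: 50m       # illustrative value
                memory: 150Mi  # illustrative value
```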