Increase CPU and memory limits for cluster components

When a Container Cloud component reaches its CPU or memory limit, StackLight raises CPUThrottlingHigh alerts, and the OOM killer may terminate the affected pod to prevent memory leaks and further destabilization of resource distribution.

It is normal for a pod to be killed by the OOM killer and recreated about once a day or once a week. However, if the number of alerts increases, or if pods cannot start and move to the CrashLoopBackOff state, adjust the default CPU and memory limits to fit your cluster needs and prevent interruption of critical workloads.
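
To confirm that a pod is restarting because of its memory limit rather than for another reason, you can inspect its container status. The following fragment is an illustrative sketch of the relevant part of the kubectl get pod <podName> -o yaml output for an OOM-killed container. The container name and restart count are hypothetical, and the real output contains many more fields:

```yaml
status:
  containerStatuses:
  - name: example-controller        # hypothetical container name
    restartCount: 12                # hypothetical value
    lastState:
      terminated:
        reason: OOMKilled           # the previous run exceeded its memory limit
        exitCode: 137               # 128 + 9 (SIGKILL)
    state:
      waiting:
        reason: CrashLoopBackOff    # kubelet is backing off before the next restart
```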

To increase limits on a Container Cloud cluster:

In the spec:providerSpec:value: section of cluster.yaml, add the resources:limits parameters with the required values for the necessary Container Cloud components:

kubectl --kubeconfig <pathToManagementClusterKubeconfig> -n <projectName> edit cluster <clusterName>

The location of the limits key in the Cluster object can differ depending on the component. Different cluster types have different sets of components that you can adjust limits for.

The following sections describe the components that relate to each cluster type and show the corresponding location of the limits key in configuration examples.

Note

For StackLight resource limits, refer to Resource limits.

Limits for common components of any cluster type

The CPU and memory limits for the following components can be increased on the management, regional, and managed clusters:

  • client-certificate-controller

  • metrics-server

  • storage-discovery

  • metallb

Note

  • For helm-controller, limits configuration is not supported.

  • For metallb, which is applicable to the bare metal, vSphere, and Equinix Metal providers, the limits key in cluster.yaml differs from that of other common components.

Common components for any cluster type

Component name

Configuration example

<common-component-name>

spec:
  providerSpec:
    value:
      helmReleases:
      - name: client-certificate-controller
        values:
          resources:
            limits:
              cpu: 100m
              memory: 200Mi

metallb

spec:
  providerSpec:
    value:
      helmReleases:
      - name: metallb
        values:
          controller:
            resources:
              limits:
                cpu: 50m
                memory: 50Mi
          speaker:
            resources:
              limits:
                cpu: 30m
                memory: 70Mi
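
Because helmReleases is a list, limits for several common components can be set in the same Cluster object by adding one entry per component. The following is a minimal sketch that assumes metrics-server and storage-discovery follow the <common-component-name> template above. The CPU and memory values are illustrative only and are not recommendations:

```yaml
spec:
  providerSpec:
    value:
      helmReleases:
      # Common components use values:resources:limits
      - name: metrics-server
        values:
          resources:
            limits:
              cpu: 100m      # illustrative value
              memory: 200Mi  # illustrative value
      - name: storage-discovery
        values:
          resources:
            limits:
              cpu: 100m      # illustrative value
              memory: 200Mi  # illustrative value
      # metallb nests limits under controller and speaker instead
      - name: metallb
        values:
          controller:
            resources:
              limits:
                cpu: 50m
                memory: 50Mi
          speaker:
            resources:
              limits:
                cpu: 30m
                memory: 70Mi
```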

Limits for management cluster components

The CPU and memory limits for the following components can be increased on the management cluster in the spec:providerSpec:value:kaas:management:helmReleases: section:

  • admission-controller

  • cert-manager

  • event-controller

  • iam

  • iam-controller

  • kaas-exporter

  • kaas-ui

  • license-controller

  • proxy-controller

  • release-controller

  • rhellicense-controller

  • scope-controller

  • user-controller

Limits for management cluster components

Component name

Configuration example

<mgmt-cluster-component-name>

spec:
  providerSpec:
    value:
      kaas:
        management:
          helmReleases:
          - name: release-controller
            values:
              resources:
                limits:
                  cpu: 200m
                  memory: 200Mi

Limits for management and regional cluster components

The CPU and memory limits for the following components can be increased on the management and regional clusters in the spec:providerSpec:value:kaas:regional:[(provider: <provider-name>):helmReleases]: or spec:providerSpec:value:kaas:regionalHelmReleases: sections:

  • agent-controller

  • aws-credentials-controller

  • aws-provider

  • azure-credentials-controller

  • azure-provider

  • baremetal-provider

  • byo-credentials-controller

  • byo-provider

  • equinix-credentials-controller

  • equinix-provider

  • lcm-controller

  • mcc-cache

  • openstack-provider

  • os-credentials-controller

  • rbac-controller

  • vsphere-credentials-controller

  • vsphere-provider

  • squid-proxy

Limits for management and regional cluster components

Component name

Configuration example

openstack-provider

spec:
  providerSpec:
    value:
      kaas:
        regional:
        - provider: openstack
          helmReleases:
          - name: openstack-provider
            values:
              resources:
                openstackMachineController:
                  limits:
                    cpu: 2000m
                    memory: 500Mi

os-credentials-controller

spec:
  providerSpec:
    value:
      kaas:
        regional:
        - provider: openstack
          helmReleases:
          - name: os-credentials-controller
            values:
              resources:
                limits:
                  cpu: 100m
                  memory: 1Gi

baremetal-provider

spec:
  providerSpec:
    value:
      kaas:
        regional:
        - provider: baremetal
          helmReleases:
          - name: baremetal-provider
            values:
              cluster_api_provider_baremetal:
                resources:
                  limits:
                    cpu: 2000m
                    memory: 500Mi

  • aws-provider

  • azure-provider

  • vsphere-provider

  • byo-provider

  • equinix-provider

spec:
  providerSpec:
    value:
      kaas:
        regional:
        - provider: equinixmetal # <provider-name>
          # equinixmetalv2 for clusters with private networking
          helmReleases:
          - name: equinix-provider # <provider-name>
            values:
              equinixController: # <provider-name>Controller:
                resources:
                  limits:
                    cpu: 2000m
                    memory: 500Mi

  • aws-credentials-controller

  • azure-credentials-controller

  • vsphere-credentials-controller

  • byo-credentials-controller

  • equinix-credentials-controller

spec:
  providerSpec:
    value:
      kaas:
        regional:
        - provider: equinixmetal # <provider-name>
          # equinixmetalv2 for clusters with private networking
          helmReleases:
          - name: equinix-credentials-controller # <provider-credentials-controller-name>
            values:
              resources:
                limits:
                  cpu: 100m
                  memory: 1Gi

  • lcm-controller

  • agent-controller

  • rbac-controller

spec:
  providerSpec:
    value:
      kaas:
        regionalHelmReleases:
        - name: lcm-controller
          values:
            resources:
              limits:
                cpu: 50m
                memory: 150Mi

squid-proxy

spec:
  providerSpec:
    value:
      kaas:
        regional:
        - provider: vsphere
          helmReleases:
          - name: squid-proxy
            values:
              resources:
                limits:
                  cpu: 100m
                  memory: 1Gi

mcc-cache

spec:
  providerSpec:
    value:
      kaas:
        regionalHelmReleases:
        - name: mcc-cache
          values:
            nginx:
              resources:
                limits:
                  cpu: 200m
                  memory: 300Mi
            registry:
              resources:
                limits:
                  cpu: 200m
                  memory: 300Mi
            kproxy:
              resources:
                limits:
                  cpu: 200m
                  memory: 300Mi
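
To show how the key locations described above relate to each other, the following sketch combines one entry from each location into a single spec section. It reuses component names and values from the examples above and assumes a management cluster with the OpenStack provider. Whether all four locations apply depends on the cluster type, so treat this as an orientation aid rather than a complete configuration:

```yaml
spec:
  providerSpec:
    value:
      # Common components of any cluster type
      helmReleases:
      - name: client-certificate-controller
        values:
          resources:
            limits:
              cpu: 100m
              memory: 200Mi
      kaas:
        # Management cluster components
        management:
          helmReleases:
          - name: release-controller
            values:
              resources:
                limits:
                  cpu: 200m
                  memory: 200Mi
        # Management and regional components configured per provider
        regional:
        - provider: openstack
          helmReleases:
          - name: os-credentials-controller
            values:
              resources:
                limits:
                  cpu: 100m
                  memory: 1Gi
        # Management and regional components configured in regionalHelmReleases
        regionalHelmReleases:
        - name: lcm-controller
          values:
            resources:
              limits:
                cpu: 50m
                memory: 150Mi
```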