Resource oversubscription

The Compute service (OpenStack Nova) enables you to spawn instances that collectively consume more resources than are physically available on a compute node. This capability is known as resource oversubscription, overcommit, or allocation ratio.

Resources available for oversubscription on a compute node include the number of CPUs, the amount of RAM, and the amount of available disk space. When making a scheduling decision, the scheduler of the Compute service takes into account the actual amount of resources multiplied by the allocation ratio. In effect, the service allocates resources on the assumption that not all instances use their full resource allocation at the same time.
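For illustration, the capacity the scheduler works with can be sketched as the physical capacity multiplied by the allocation ratio. The node size and helper function below are hypothetical; the ratios are the MOSK defaults described later in this section:

```python
# Effective capacity seen by the scheduler = physical amount * allocation ratio.
def schedulable_capacity(physical: float, allocation_ratio: float) -> float:
    return physical * allocation_ratio

# Hypothetical compute node: 16 physical cores, 256 GiB RAM, 2000 GiB disk,
# with the default MOSK ratios (cpu 8.0, ram 1.0, disk 1.6).
vcpus = schedulable_capacity(16, 8.0)    # 128.0 vCPUs can be allocated
ram = schedulable_capacity(256, 1.0)     # 256.0 GiB, RAM is not oversubscribed
disk = schedulable_capacity(2000, 1.6)   # 3200.0 GiB of virtual disk
```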

Oversubscription enables you to increase workload density and compute resource utilization and, thus, achieve a better Return on Investment (ROI) on compute hardware. In addition, oversubscription helps avoid creating an excessive number of fine-grained flavors, a problem commonly known as flavor explosion.

Configuring initial resource oversubscription

Available since MOSK 23.1

There are two ways to control the oversubscription values for compute nodes:

  • The legacy approach relies on the {cpu,disk,ram}_allocation_ratio configuration options offered by the Compute service. The drawback of this method is that the Compute service must be restarted to apply a new configuration, which risks interrupting cloud user operations, for example, through instance build failures.

  • The modern and recommended approach, adopted in MOSK 23.1, uses the initial_{cpu,disk,ram}_allocation_ratio configuration options, which are applied exclusively during the initial provisioning of a compute node, that is, during the initial deployment of the cluster or when new compute nodes are added later. Any further changes can be performed dynamically through the OpenStack Placement service API without restarting the service.
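For reference, in upstream Nova these settings correspond to the initial allocation ratio options in nova.conf. The fragment below is a sketch using the default MOSK values; in MOSK itself, you set these through the OpenStackDeployment custom resource rather than editing nova.conf directly:

```ini
[DEFAULT]
initial_cpu_allocation_ratio = 8.0
initial_ram_allocation_ratio = 1.0
initial_disk_allocation_ratio = 1.6
```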

There is no definitive method for selecting optimal oversubscription values. As a cloud operator, you should continuously monitor your workloads, ideally have a comprehensive understanding of their nature, and experimentally determine the maximum values that do not impact performance. This approach ensures maximum workload density and cloud resource utilization.

To configure the initial compute resource oversubscription in MOSK, specify the spec:features:nova:allocation_ratios parameter in the OpenStackDeployment custom resource as explained in the table below.

Resource oversubscription configuration

Parameter

spec:features:nova:allocation_ratios

Configuration

Configure initial oversubscription of CPU, disk space, and RAM resources on compute nodes. By default, the following values are applied:

  • cpu: 8.0

  • disk: 1.6

  • ram: 1.0

Note

In MOSK 22.5 and earlier, the effective default value of RAM allocation ratio is 1.1.

Warning

Mirantis strongly advises against oversubscribing RAM by any amount. See Preventing resource overconsumption for details.

Changing the resource oversubscription configuration through the OpenStackDeployment custom resource after cloud deployment only affects newly added compute nodes and does not change oversubscription for already existing compute nodes. To change oversubscription for existing compute nodes, use the Placement service API as described in Change oversubscription settings for existing compute nodes.
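For example, assuming the osc-placement plugin for the OpenStack client is installed, the CPU allocation ratio of an existing compute node can be inspected and updated roughly as follows (the host name and provider UUID are placeholders):

```console
# Find the resource provider that corresponds to the compute node
openstack resource provider list --name <compute-node-hostname>

# Amend only the VCPU allocation ratio, leaving other inventory fields intact
openstack resource provider inventory set <provider-uuid> \
    --resource VCPU:allocation_ratio=2.0 --amend
```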

Usage

Configuration example:

kind: OpenStackDeployment
spec:
  features:
    nova:
      allocation_ratios:
        cpu: 8.0
        disk: 1.6
        ram: 1.0

Configuration example of setting different oversubscription values for specific nodes:

spec:
  nodes:
    compute-type::hi-perf:
      features:
        nova:
          allocation_ratios:
            cpu: 2.0
            disk: 1.0

In the example configuration above, the compute nodes labeled with compute-type=hi-perf use less aggressive oversubscription of CPU and no oversubscription of disk.

Preventing resource overconsumption

When using oversubscription, it is important to manage and monitor the cloud thoroughly to avoid system overload and performance degradation. If many or all instances on a compute node start using all of their allocated resources at once and, thereby, overconsume physical resources, the failure scenario depends on the resource being exhausted.

Symptoms of resource exhaustion

Affected resource

Symptoms

CPU

Workloads become slower as they actively compete for physical CPU time. A useful indicator is the steal time reported inside the workload, that is, the percentage of time the operating system in the workload spends waiting for a physical CPU core to become available to run its instructions.

To verify the steal time in the Linux-based workload, use the top command:

top -bn1 | head | grep st$ | awk -F ',' '{print $NF}'

Generally, a steal time above 10% sustained for 20-30 minutes is considered alarming.

RAM

The operating system on the compute node starts aggressively using physical swap space, which significantly slows the workloads down. When the swap space is also exhausted, the operating system of the compute node may OOM-kill the most offending processes, which can cause major disruptions to workloads or to the compute node itself.
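To check whether a compute node is actively swapping, you can, for example, watch the si and so columns of vmstat; sustained nonzero values indicate ongoing swap-in and swap-out activity:

```console
# Sample memory and swap activity once per second, five times
vmstat 1 5

# Inspect current swap usage directly
free -h
```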

Warning

While it may seem like a good idea to make the most of available resources, oversubscribing RAM can lead to various issues and is generally not recommended due to potential performance degradation, reduced stability, and security risks for the workloads.

Mirantis strongly advises against oversubscribing RAM by any amount.

Disk space

The symptoms depend on the physical layout of the storage. Virtual root and ephemeral storage devices hosted on the compute node itself are switched to read-only mode, negatively affecting workloads. Additionally, the file system used by the operating system of the compute node may become read-only as well, blocking the operability of the compute node.

Some workload types are not suitable for running in an oversubscribed environment, especially those with high-performance, latency-sensitive, or real-time requirements. Such workloads are better placed on compute nodes with dedicated CPUs, which ensures that only the processes of a single instance run on each CPU core.
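For example, dedicated CPUs can be requested through the hw:cpu_policy flavor extra spec. The sketch below assumes a flavor named hi-perf already exists and that the cloud has compute nodes configured for CPU pinning:

```console
# Request dedicated (pinned) physical CPUs for instances of this flavor
openstack flavor set hi-perf --property hw:cpu_policy=dedicated
```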