Dynamic Resource Balancer service

Available since MOSK 24.2 TechPreview

In a cloud environment where resources are shared across all workloads, those resources often become a point of contention.

For example, it is not uncommon for an oversubscribed compute node to experience the noisy neighbor problem, where one of the instances starts consuming far more resources than usual, negatively affecting the performance of other instances running on the same node.

In such cases, cloud operators must intervene and manually redistribute workloads in the cluster to achieve more even resource utilization.

The Dynamic Resource Balancer (DRB) service continuously measures resource usage on hypervisors and redistributes workloads to achieve an optimization target, thereby eliminating the need for manual intervention by cloud operators.

Architecture overview

The DRB service is implemented as a Kubernetes operator controlled by custom resources of kind: DRBConfig. Unless at least one resource of this kind is present, the service does not perform any operations. Cloud operators who want to enable the DRB service for their MOSK clouds need to create such a resource with the proper configuration.

The DRB controller consists of the following components interacting with each other:

  • collector

    Collects the statistics of resource consumption in the cluster

  • scheduler

    Based on the data from the collector, makes decisions whether cloud resources need to be relocated to achieve the optimum

  • actuator

Executes the resource relocation decisions made by the scheduler

Out of the box, these service components implement simple logic that can be individually enhanced to suit the needs of a specific cloud environment through their pluggable architecture. Plugins need to be written in the Python programming language and injected as modules into the DRB service by building a custom drb-controller container image. Both default and custom plugins are configured through the corresponding sections of DRBConfig custom resources.
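Because plugins are selected by name, a custom plugin is wired in through the same configuration sections as the default ones. The following sketch references a hypothetical custom scheduler plugin, my-scheduler, assumed to be built into a custom drb-controller image; the option shown is equally hypothetical:

spec:
  scheduler:
    # Hypothetical custom plugin shipped in a custom drb-controller image
    name: my-scheduler
    # Plugin-specific options are provided in the same section
    balance_aggressiveness: 50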

Also, it is possible to limit the scope of DRB decisions and actions to only a subset of hosts. This way, you can model the node grouping schema configured in OpenStack, for example, compute node aggregates and availability zones, and avoid the DRB service attempting resource placement changes that cannot be fulfilled by the MOSK Compute service (OpenStack Nova).
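For example, a cloud with two availability zones could be covered by two DRBConfig resources, each scoped to the hosts of one zone, so that instances are never rebalanced across zone boundaries. The resource and host names below are hypothetical, and the remaining spec sections are omitted for brevity:

apiVersion: lcm.mirantis.com/v1alpha1
kind: DRBConfig
metadata:
  name: drb-az1
  namespace: openstack
spec:
  # Hypothetical compute hosts of availability zone az1
  hosts:
    - compute-az1-001
    - compute-az1-002
---
apiVersion: lcm.mirantis.com/v1alpha1
kind: DRBConfig
metadata:
  name: drb-az2
  namespace: openstack
spec:
  # Hypothetical compute hosts of availability zone az2
  hosts:
    - compute-az2-001
    - compute-az2-002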

Example configuration

apiVersion: lcm.mirantis.com/v1alpha1
kind: DRBConfig
metadata:
  name: drb-test
  namespace: openstack
spec:
  actuator:
    max_parallel_migrations: 10
    migration_polling_interval: 5
    migration_timeout: 180
    name: os-live-migration
  collector:
    name: stacklight
  hosts: []
  migrateAny: false
  reconcileInterval: 300
  scheduler:
    load_threshold: 80
    min_improvement: 0
    name: vm-optimize

The spec section of the configuration consists of the following main parts:

  • collector

    Specifies and configures the collector plugin to collect the metrics on which decisions are based. At a minimum, the name of the plugin must be provided.

  • scheduler

    Specifies and configures the scheduler plugin that will make decisions based on the collected metrics. At a minimum, the name of the plugin must be provided.

  • actuator

    Specifies and configures the actuator plugin that will move resources around. At a minimum, the name of the plugin must be provided.

  • reconcileInterval

Defines the time in seconds between reconciliation cycles. It should be large enough for the metrics to settle after resources are moved around.

    For the default stacklight collector plugin, this value must equal at least 300.

  • hosts

Specifies the list of cluster hosts to which a given DRBConfig instance applies. This means that only metrics from these hosts are used for making decisions, only resources belonging to these hosts are considered for redistribution, and only these hosts are considered as possible redistribution targets.

    You can create multiple DRBConfig resources that watch over non-overlapping sets of hosts.

Defaults to an empty list, which implies all hosts. See also the example after this list.

  • migrateAny

    A boolean flag that the scheduler plugin can consider when making decisions, allowing cloud operators and users to opt certain workloads in or out of redistribution.

    For the default vm-optimize scheduler plugin:

    • migrateAny: true (default) - any instance can be migrated, except for instances tagged with lcm.mirantis.com:no-drb, explicitly opting out of the DRB functionality

    • migrateAny: false - only instances tagged with lcm.mirantis.com:drb are migrated by the DRB service, explicitly opting in to the DRB functionality
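For illustration, the following spec fragment restricts the DRB service to three hypothetical compute hosts and, with migrateAny set to false, migrates only instances that have been explicitly tagged with lcm.mirantis.com:drb, for example, by using the openstack server set --tag command:

spec:
  # Hypothetical host names; only these hosts are observed and rebalanced
  hosts:
    - compute-001
    - compute-002
    - compute-003
  # Migrate only instances tagged with lcm.mirantis.com:drb
  migrateAny: false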

Included default plugins

Collector plugins

stacklight

Collects node_load5, machine_cpu_cores, and libvirt_domain_info_cpu_time_seconds:rate5m metrics from the StackLight service running in the MOSK cluster.

Does not have any configurable options.

Requires reconcileInterval to be set to at least 300 (5 minutes), as both the collected node and instance CPU usage metrics are effectively averaged over a 5-minute sliding window.

Scheduler plugins

vm-optimize

Attempts to minimize the standard deviation of node load. The node load is normalized per CPU core, so heterogeneous compute hosts can be compared.

Available options:

  • load_threshold

The compute host load, in percent, above which the host is considered overloaded and the scheduler attempts to migrate instances away from it. Defaults to 80.

  • min_improvement

The minimum improvement of the optimization metric, in percent. When making decisions, the scheduler predicts the resulting load distribution to determine whether moving resources is beneficial. If the total improvement after all proposed decisions is calculated to be less than min_improvement, no decisions are executed.

Defaults to 0, meaning that any potential improvement is acted upon. Setting this option to a higher value helps avoid instance migrations that provide negligible improvements, as in the sketch after this list.
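For illustration, the following scheduler section makes the vm-optimize plugin more conservative than the defaults; the values are examples only:

scheduler:
  name: vm-optimize
  # Consider hosts overloaded only above 90% normalized load
  load_threshold: 90
  # Ignore migration plans that improve the distribution by less than 5%
  min_improvement: 5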

Warning

The current version of this plugin takes into account only basic resource classes when making scheduling decisions, namely the RAM, disk, and vCPU count from the instance flavor. It does not consider any other information, including image or aggregate metadata, custom resource classes, PCI devices, NUMA topology, huge pages, and so on. Migrating instances that consume such resources is likely to fail because the current implementation of the scheduler plugin cannot reliably predict whether such instances fit onto the selected target host.

Actuator plugins

os-live-migration

Live migrates instances to specific hosts. Assumes any migration is possible. Refer to the hosts and migrateAny options above to learn how to control which instances are migrated to which locations.

Available options:

  • max_parallel_migrations

    Defines the number of instances to migrate in parallel.

    Defaults to 10.

    This value applies to all decisions being processed, so it may involve instances from different hosts. Meanwhile, the nova-compute service may have its own limits on how many live migrations a given host can handle in parallel.

  • migration_polling_interval

Defines the interval in seconds for checking the instance status while it is being migrated.

    Defaults to 5.

  • migration_timeout

    Defines the interval in seconds after which an unfinished migration is considered failed.

    Defaults to 180.
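As a sketch, an environment with large, memory-heavy instances whose live migrations take long to converge might reduce parallelism and extend the timeout; the values below are illustrative only:

actuator:
  name: os-live-migration
  # Migrate fewer instances at a time to reduce the load on the network
  max_parallel_migrations: 2
  # Poll the status of each ongoing migration every 10 seconds
  migration_polling_interval: 10
  # Consider a migration failed only after 10 minutes
  migration_timeout: 600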

noop

Only logs the decisions that were scheduled for execution. Useful for debugging and dry-runs.
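To preview the decisions the DRB service would make without actually moving any workloads, you can point an otherwise complete configuration at the noop actuator, for example:

spec:
  actuator:
    # Log scheduled decisions only; no instances are migrated
    name: noop
  collector:
    name: stacklight
  scheduler:
    load_threshold: 80
    min_improvement: 0
    name: vm-optimize
  hosts: []
  migrateAny: false
  reconcileInterval: 300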