Dynamic Resource Balancer service¶
Available since MOSK 24.2 TechPreview
In a cloud environment where resources are shared across all workloads, those resources often become a point of contention.
For example, it is not uncommon for an oversubscribed compute node to experience the noisy neighbor problem, where one of the instances starts consuming far more resources than usual, negatively affecting the performance of other instances running on the same node.
In such cases, cloud operators have to intervene and manually redistribute workloads across the cluster to achieve more equal utilization of resources.
The Dynamic Resource Balancer (DRB) service continuously measures resource usage on hypervisors and redistributes workloads to achieve an optimal target state, eliminating the need for manual intervention by cloud operators.
Architecture overview¶
The DRB service is implemented as a Kubernetes operator controlled by custom resources of kind DRBConfig. Unless at least one resource of this kind is present, the service does not perform any operations. Cloud operators who want to enable the DRB service for their MOSK clouds need to create such a resource with the proper configuration.
The DRB controller consists of the following components interacting with each other:
collector
Collects the statistics of resource consumption in the cluster
scheduler
Makes decisions, based on the data from the collector, on whether cloud resources need to be relocated to achieve the optimum
actuator
Executes the resource relocation decisions made by the scheduler
Out of the box, these service components implement simple logic, which, however, can be individually enhanced according to the needs of a specific cloud environment through their pluggable architecture. Plugins must be written in the Python programming language and injected as modules into the DRB service by building a custom drb-controller container image.
Default plugins as well as custom plugins are configured through the corresponding sections of DRBConfig custom resources.
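For example, if you build a custom drb-controller image that ships a scheduler plugin of your own, you select and configure it by name in the scheduler section of the DRBConfig spec. The following sketch is illustrative only: the plugin name custom-scheduler and its option are hypothetical assumptions, not part of the default image:
spec:
  scheduler:
    name: custom-scheduler   # hypothetical plugin module shipped in the custom image
    balance_factor: 50       # hypothetical plugin-specific option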
Also, it is possible to limit the scope of DRB decisions and actions to only a subset of hosts. This way, you can model the node grouping schema that is configured in OpenStack, for example, compute node aggregates and availability zones, to avoid the DRB service attempting resource placement changes that cannot be fulfilled by the MOSK Compute service (OpenStack Nova).
Example configuration¶
apiVersion: lcm.mirantis.com/v1alpha1
kind: DRBConfig
metadata:
  name: drb-test
  namespace: openstack
spec:
  actuator:
    max_parallel_migrations: 10
    migration_polling_interval: 5
    migration_timeout: 180
    name: os-live-migration
  collector:
    name: stacklight
  hosts: []
  migrateAny: false
  reconcileInterval: 300
  scheduler:
    load_threshold: 80
    min_improvement: 0
    name: vm-optimize
The spec section of the configuration consists of the following main parts:
collector
Specifies and configures the collector plugin that gathers the metrics on which decisions are based. At a minimum, the name of the plugin must be provided.
scheduler
Specifies and configures the scheduler plugin that makes decisions based on the collected metrics. At a minimum, the name of the plugin must be provided.
actuator
Specifies and configures the actuator plugin that moves resources around. At a minimum, the name of the plugin must be provided.
reconcileInterval
Defines the time in seconds between reconciliation cycles. Should be large enough for the metrics to settle after resources are moved around. For the default stacklight collector plugin, this value must be at least 300.
hosts
Specifies the list of cluster hosts to which the given DRBConfig resource applies. Only metrics from these hosts are used for making decisions, only resources belonging to these hosts are considered for redistribution, and only these hosts are considered as possible migration targets. You can create multiple DRBConfig resources that watch over non-overlapping sets of hosts, as shown in the sketch after this list. Defaults to an empty list, which implies all hosts.
migrateAny
A boolean flag that the scheduler plugin can consider when making decisions, allowing cloud operators and users to opt certain workloads in or out of redistribution. For the default vm-optimize scheduler plugin:
migrateAny: true (default) - any instance can be migrated, except for instances tagged with lcm.mirantis.com:no-drb, which explicitly opts them out of the DRB functionality
migrateAny: false - only instances tagged with lcm.mirantis.com:drb are migrated by the DRB service, which explicitly opts them in to the DRB functionality
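For illustration, the following sketch defines two DRBConfig resources that watch over non-overlapping sets of hosts, modeling two availability zones. The host names are hypothetical, and the plugin sections are reduced to the mandatory name fields for brevity:
apiVersion: lcm.mirantis.com/v1alpha1
kind: DRBConfig
metadata:
  name: drb-az1
  namespace: openstack
spec:
  collector:
    name: stacklight
  scheduler:
    name: vm-optimize
  actuator:
    name: os-live-migration
  reconcileInterval: 300
  migrateAny: false            # only instances tagged with lcm.mirantis.com:drb are migrated
  hosts:                       # hypothetical compute hosts of the first availability zone
  - cmp-az1-001
  - cmp-az1-002
---
apiVersion: lcm.mirantis.com/v1alpha1
kind: DRBConfig
metadata:
  name: drb-az2
  namespace: openstack
spec:
  collector:
    name: stacklight
  scheduler:
    name: vm-optimize
  actuator:
    name: os-live-migration
  reconcileInterval: 300
  migrateAny: true             # any instance except those tagged with lcm.mirantis.com:no-drb
  hosts:                       # hypothetical compute hosts of the second availability zone
  - cmp-az2-001
  - cmp-az2-002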
Included default plugins¶
Collector plugins¶
stacklight¶
Collects the node_load5, machine_cpu_cores, and libvirt_domain_info_cpu_time_seconds:rate5m metrics from the StackLight service running in the MOSK cluster.
Does not have options available.
Requires reconcileInterval to be set to at least 300 (5 minutes), as both the collected node and instance CPU usage metrics are effectively averaged over a 5-minute sliding window.
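As a minimal illustration, the relevant part of a DRBConfig spec that uses this collector could look as follows, with the reconcileInterval set to the documented minimum:
spec:
  collector:
    name: stacklight         # no plugin-specific options
  reconcileInterval: 300     # at least 300 seconds, matching the 5-minute metric window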
Scheduler plugins¶
vm-optimize¶
Attempts to minimize the standard deviation of node load. The node load is normalized per CPU core, so heterogeneous compute hosts can be compared.
Available options:
load_threshold
The compute host load, in percent, above which the host is considered overloaded and attempts are made to migrate instances away from it. Defaults to 80.
min_improvement
The minimal improvement of the optimization metric, in percent. While making decisions, the scheduler attempts to predict the resulting load distribution to determine whether moving resources is beneficial. If the total improvement after all necessary decisions is calculated to be less than min_improvement, no decisions are executed. Defaults to 0, meaning that any potential improvement is acted upon. Setting this option to a higher value helps avoid instance migrations that provide only negligible improvements, as in the sketch below.
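For example, to have the plugin act only on hosts loaded above 70 percent and skip rebalancing rounds that are predicted to improve the load distribution by less than 5 percent, you could configure the scheduler section as follows. The values are illustrative only:
spec:
  scheduler:
    name: vm-optimize
    load_threshold: 70       # hosts above 70% normalized load are considered overloaded
    min_improvement: 5       # skip rounds predicted to improve the distribution by less than 5%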
Warning
The current version of this plugin takes into account only basic resource classes when making scheduling decisions, namely the RAM, disk, and vCPU count from the instance flavor. It does not take into account any other information, including specific image or aggregate metadata, custom resource classes, PCI devices, NUMA, hugepages, and so on. Migrating instances that consume such resources is likely to fail because the current implementation of the scheduler plugin cannot reliably predict whether such instances fit onto the selected target host.
Actuator plugins¶
os-live-migration¶
Live migrates instances to specific hosts. Assumes any migration is possible.
Refer to the hosts and migrateAny options above to learn how to control which instances are migrated to which locations.
Available options:
max_parallel_migrations
Defines the number of instances to migrate in parallel. Defaults to 10. This value applies to all decisions being processed, so it may involve instances from different hosts. Meanwhile, the nova-compute service may have its own limits on how many live migrations a given host can handle in parallel; see the sketch after this list.
migration_polling_interval
Defines the interval in seconds for checking the instance status while the instance is being migrated. Defaults to 5.
migration_timeout
Defines the interval in seconds after which an unfinished migration is considered failed. Defaults to 180.
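For example, to stay well below per-host limits enforced by the nova-compute service and to tolerate slower migrations of large instances, you could tune the actuator section as follows. The values are illustrative only:
spec:
  actuator:
    name: os-live-migration
    max_parallel_migrations: 4    # illustrative: fewer concurrent live migrations cluster-wide
    migration_polling_interval: 5
    migration_timeout: 600        # illustrative: allow more time before a migration is considered failed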
noop¶
Only logs the decisions that were scheduled for execution. Useful for debugging and dry-runs.
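For example, the following sketch combines the default collector and scheduler plugins with the noop actuator so that planned migrations are only logged and never executed. The resource name is arbitrary:
apiVersion: lcm.mirantis.com/v1alpha1
kind: DRBConfig
metadata:
  name: drb-dry-run          # arbitrary name for a dry-run configuration
  namespace: openstack
spec:
  collector:
    name: stacklight
  scheduler:
    name: vm-optimize
  actuator:
    name: noop               # decisions are only logged, no instances are migrated
  reconcileInterval: 300
  migrateAny: true
  hosts: []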