Parallelizing node update operations

Available since MOSK 23.2 TechPreview

MOSK enables you to parallelize node update operations, significantly improving the efficiency of your deployment. This capability applies to any operation that utilizes the Node Maintenance API, such as cluster updates or graceful node reboots.

The core implementation of parallel updates is handled by the LCM Controller, which ensures consistent execution of parallel operations. The LCM Controller starts performing an operation on a node only when all NodeWorkloadLock objects for that node are marked as inactive. By default, the LCM Controller creates one NodeMaintenanceRequest at a time.

Each application controller, including Ceph, OpenStack, and Tungsten Fabric Controllers, manages parallel NodeMaintenanceRequest objects independently. The controllers determine how to handle and execute parallel node maintenance requests based on specific requirements of their respective applications. To understand the workflow of the Node Maintenance API, refer to WorkloadLock objects.

Enhancing parallelism during node updates

  1. Set the node update order.

    You can optimize parallel updates by setting the order in which nodes are updated. To accomplish this, configure the upgradeIndex field of the Machine object. For the procedure, refer to Mirantis Container Cloud: Change upgrade order for machines.
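    For illustration, the upgrade order might be set as follows. The API version, machine name, and exact field path below are assumptions to verify against your Container Cloud release:

    ```yaml
    # Hypothetical Machine object fragment: machines with a lower
    # upgradeIndex value are updated earlier.
    apiVersion: cluster.k8s.io/v1alpha1   # assumed API version
    kind: Machine
    metadata:
      name: worker-1                      # hypothetical machine name
    spec:
      providerSpec:
        value:
          upgradeIndex: 1                 # assumed field location
    ```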

  2. Increase parallelism.

    Boost parallelism by adjusting the maximum number of worker node updates allowed during LCM operations through the spec.providerSpec.value.maxWorkerUpgradeCount configuration parameter, which is set to 1 by default.

    For configuration details, refer to Mirantis Container Cloud: Configure the parallel update of worker nodes.
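    As a sketch, the parameter might be placed in the Cluster object as follows; treat the exact location as an assumption to verify in the linked procedure:

    ```yaml
    # Cluster object fragment: allow up to 3 worker nodes
    # to be updated in parallel (the default is 1).
    spec:
      providerSpec:
        value:
          maxWorkerUpgradeCount: 3
    ```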

  3. Execute LCM operations.

    Run LCM operations, such as cluster updates, taking advantage of the increased parallelism.

OpenStack nodes update

By default, the OpenStack Controller handles the NodeMaintenanceRequest objects as follows:

  • Updates the OpenStack controller nodes sequentially (one by one).

  • Updates the gateway nodes sequentially. Technically, you can increase the number of gateway node updates allowed in parallel using the nwl_parallel_max_gateway parameter, but Mirantis does not recommend doing so.

  • Updates the compute nodes in parallel. The default number of allowed parallel updates is 30. You can adjust this value through the nwl_parallel_max_compute parameter.

    Parallelism considerations for compute nodes

    When considering parallelism for compute nodes, take into account that restarts of certain pods, for example, the openvswitch-vswitchd pods, may cause brief instance downtime. Select a suitable level of parallelism to minimize the impact on workloads and prevent excessive load on the control plane nodes.

    If your cloud environment is distributed across failure domains, which are represented by Nova availability zones, you can limit the parallel updates of nodes to only those within the same availability zone. This behavior is controlled by the respect_nova_az option in the OpenStack Controller.

The OpenStack Controller configuration is stored in the openstack-controller-config configMap in the osh-system namespace. Changed options are picked up automatically after the configMap is updated. To learn more about the OpenStack Controller configuration parameters, refer to OpenStack Controller configuration.
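A minimal sketch of the configMap with the maintenance-related options mentioned above; the data key and section name are assumptions and may differ between releases:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: openstack-controller-config
  namespace: osh-system
data:
  # The data key and the [maintenance] section name are assumptions;
  # check the layout shipped with your release.
  extra_conf.ini: |
    [maintenance]
    nwl_parallel_max_compute = 10
    nwl_parallel_max_gateway = 1
    respect_nova_az = true
```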

Ceph nodes update

By default, the Ceph Controller handles the NodeMaintenanceRequest objects as follows:

  • Updates the non-storage nodes sequentially. Non-storage nodes include all nodes that have mon, mgr, rgw, or mds roles.

  • Updates storage nodes in parallel. The default number of allowed parallel updates is calculated automatically based on the minimal failure domain in a Ceph cluster.

    Parallelism calculations for storage nodes

    The Ceph Controller automatically calculates the parallelism number in the following way:

    • Finds the minimal failure domain for a Ceph cluster. For example, the minimal failure domain is rack.

    • Filters all currently requested nodes by the minimal failure domain. For example, parallelism equals 5, and LCM requests 3 nodes from the rack1 rack and 2 nodes from the rack2 rack.

    • Handles each filtered node group one by one. For example, the controller handles all nodes from rack1 in parallel before processing the nodes from rack2.
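The filtering and grouping steps above can be sketched as follows; group_by_failure_domain is a hypothetical helper for illustration, not the actual Ceph Controller code:

```python
from collections import defaultdict

def group_by_failure_domain(requested_nodes):
    """Group requested storage nodes by their minimal-failure-domain unit.

    requested_nodes maps node name -> failure domain unit (for example,
    a rack name). Nodes within one unit are updated in parallel; the
    resulting groups are handled one by one. Illustrative sketch only.
    """
    groups = defaultdict(list)
    for node, domain_unit in requested_nodes.items():
        groups[domain_unit].append(node)
    # Each group is processed fully (in parallel) before the next starts.
    return [sorted(groups[unit]) for unit in sorted(groups)]

# Example from the text: 5 requested nodes across the rack1 and rack2 racks.
requested = {
    "node-a": "rack1", "node-b": "rack1", "node-c": "rack1",
    "node-d": "rack2", "node-e": "rack2",
}
batches = group_by_failure_domain(requested)
# batches -> [["node-a", "node-b", "node-c"], ["node-d", "node-e"]]
```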

The Ceph Controller handles non-storage nodes before storage ones. If there are node requests for both node types, the Ceph Controller first handles the non-storage nodes sequentially. Therefore, Mirantis recommends assigning a higher-priority upgrade index to the non-storage nodes to decrease the total upgrade time.

If the minimal failure domain is host, the Ceph Controller updates only one storage node per failure domain unit. This results in updating all Ceph nodes sequentially, despite the potential for increased parallelism.

Tungsten Fabric nodes update

By default, the Tungsten Fabric Controller handles the NodeMaintenanceRequest objects as follows:

  • Updates the Tungsten Fabric Controller and gateway nodes sequentially.

  • Updates the vRouter nodes in parallel. The Tungsten Fabric Controller allows updating up to 30 vRouter nodes in parallel.

    Maximum amount of vRouter nodes in maintenance

    While the Tungsten Fabric Controller can process up to 30 NodeMaintenanceRequest objects targeting vRouter nodes, the actual number may be lower. This is due to a check that ensures OpenStack readiness to unlock the relevant nodes for maintenance. If OpenStack allows maintenance, the Tungsten Fabric Controller verifies the vRouter pods. Upon successful verification, the NodeWorkloadLock object is switched to the maintenance mode.
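The gating described above can be sketched as follows; the function and its parameters are hypothetical and only illustrate the order of the checks, not the actual Tungsten Fabric Controller code:

```python
def maybe_move_vrouter_to_maintenance(node, openstack_ready, vrouter_pods_ok,
                                      in_maintenance, max_parallel=30):
    """Decide whether a vRouter node can enter maintenance.

    Illustrative sketch only: checks the parallelism cap, then OpenStack
    readiness, then the vRouter pods, before switching the node's
    NodeWorkloadLock to the maintenance mode (modeled here as adding the
    node to the in_maintenance set).
    """
    if len(in_maintenance) >= max_parallel:
        return False          # cap of parallel vRouter updates reached
    if not openstack_ready(node):
        return False          # OpenStack has not unlocked the node yet
    if not vrouter_pods_ok(node):
        return False          # vRouter pods failed verification
    in_maintenance.add(node)  # NodeWorkloadLock -> maintenance mode
    return True

# Usage with stubbed checks:
locked = set()
maybe_move_vrouter_to_maintenance("vrouter-node-1",
                                  openstack_ready=lambda n: True,
                                  vrouter_pods_ok=lambda n: True,
                                  in_maintenance=locked)
# locked now contains "vrouter-node-1"
```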