Randomize RabbitMQ reconnection intervals

Note

This feature is available starting from the MCP 2019.2.15 maintenance update. Before using the feature, follow the steps described in Apply maintenance updates.

You can randomize RabbitMQ reconnection intervals (or timeouts) for the required OpenStack services. This is helpful in large OpenStack environments, where a simultaneous reconnection of all OpenStack services after a RabbitMQ cluster partitioning can significantly prolong the RabbitMQ cluster recovery or cause the cluster to enter the split-brain mode.

When this feature is enabled, the following OpenStack configuration options are randomized:

  • kombu_reconnect_delay - from 30 to 60 seconds
  • rabbit_retry_interval - from 10 to 60 seconds
  • rabbit_retry_backoff - from 30 to 60 seconds
  • rabbit_interval_max - from 60 to 180 seconds
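
To illustrate the effect of these ranges, the following minimal sketch draws one value per option. It is not the Salt formula's actual implementation; the function name `randomize_timeouts`, the uniform distribution, and the inclusive bounds are assumptions made for illustration only.

```python
import random

# Ranges (in seconds) taken from the list above.
# Treating both bounds as inclusive is an assumption.
TIMEOUT_RANGES = {
    "kombu_reconnect_delay": (30, 60),
    "rabbit_retry_interval": (10, 60),
    "rabbit_retry_backoff": (30, 60),
    "rabbit_interval_max": (60, 180),
}


def randomize_timeouts(rng=None):
    """Return one randomly drawn integer per option, within its range.

    Hypothetical helper: it only mimics the idea of per-service,
    per-node randomization described in this section.
    """
    rng = rng or random.Random()
    return {
        name: rng.randint(low, high)
        for name, (low, high) in TIMEOUT_RANGES.items()
    }


if __name__ == "__main__":
    # Each call yields a fresh set of values, similar to how each
    # run of the Salt state assigns new timeouts.
    for name, value in randomize_timeouts().items():
        print(f"{name} = {value}")
```

Because each service on each node gets its own values, the services are unlikely to retry at the same moment after a partition, which spreads the reconnection load on the RabbitMQ cluster over time.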

To randomize RabbitMQ reconnection intervals:

  1. Open your project Git repository with the Reclass model on the cluster level.

  2. Open the configuration file of the required OpenStack service. For example, for the OpenStack Compute service (Nova), open <cluster_name>/openstack/compute/init.yml.

  3. Under message_queue, specify rabbit_timeouts_random: True:

    parameters:
      nova:
        compute:
          message_queue:
            rabbit_timeouts_random: True
    
  4. Log in to the Salt Master node.

  5. Apply the corresponding OpenStack service state(s). For example, for the OpenStack Compute service (Nova), apply the following state:

    salt -C 'I@nova:compute' state.sls nova.compute
    

    Note

    Every run of the corresponding OpenStack service Salt state assigns new, unique timeouts to each service configured with this feature on every node.

  6. Repeat steps 2-5 for other OpenStack services as required.
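
After applying the state, you may want to confirm that the rendered service configuration contains values within the documented ranges. The following sketch parses the text of a configuration file and reports each option. It assumes that the options are rendered in the `[oslo_messaging_rabbit]` section, where oslo.messaging defines them, and that the file is readable on the node, for example `/etc/nova/nova.conf` for Nova. The helper name `check_timeouts` is hypothetical.

```python
import configparser

# Documented ranges in seconds; inclusive bounds are an assumption.
EXPECTED_RANGES = {
    "kombu_reconnect_delay": (30, 60),
    "rabbit_retry_interval": (10, 60),
    "rabbit_retry_backoff": (30, 60),
    "rabbit_interval_max": (60, 180),
}


def check_timeouts(conf_text, section="oslo_messaging_rabbit"):
    """Map each option to (value, in_range) for the given config text.

    A missing option or section is reported as (None, False).
    """
    # interpolation=None keeps '%' characters in values literal;
    # strict=False tolerates duplicate sections or options.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)

    report = {}
    for name, (low, high) in EXPECTED_RANGES.items():
        raw = parser.get(section, name, fallback=None)
        value = float(raw) if raw is not None else None
        report[name] = (value, value is not None and low <= value <= high)
    return report


if __name__ == "__main__":
    # Assumed path for the Compute service; adjust for other services.
    with open("/etc/nova/nova.conf") as conf_file:
        for name, (value, ok) in check_timeouts(conf_file.read()).items():
            print(f"{name}: {value} ({'ok' if ok else 'out of range or missing'})")
```

An option reported as missing or out of range may indicate that the state was not applied on that node or that `rabbit_timeouts_random` was not set for that service.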