Note
This feature is available starting from the MCP 2019.2.15 maintenance update. Before using the feature, follow the steps described in Apply maintenance updates.
You can randomize RabbitMQ reconnection intervals (or timeouts) for the required OpenStack services. This is helpful in large OpenStack environments, where a simultaneous reconnection of all OpenStack services after a RabbitMQ cluster partitioning can significantly prolong the RabbitMQ cluster recovery or cause the cluster to enter split-brain mode.
With this feature enabled, the following OpenStack configuration options are randomized:
kombu_reconnect_delay - from 30 to 60 seconds
rabbit_retry_interval - from 10 to 60 seconds
rabbit_retry_backoff - from 30 to 60 seconds
rabbit_interval_max - from 60 to 180 seconds
To randomize RabbitMQ reconnection intervals:
1. Open your project Git repository with the Reclass model on the cluster level.
2. Open the configuration file of the required OpenStack service. For example, for the OpenStack Compute service (Nova), open <cluster_name>/openstack/compute/init.yml.
3. Under message_queue, specify rabbit_timeouts_random: True:
   parameters:
     nova:
       compute:
         message_queue:
           rabbit_timeouts_random: True
4. Log in to the Salt Master node.
5. Apply the corresponding OpenStack service state(s). For example, for the OpenStack Compute service (Nova), apply the following state:
   salt -C 'I@nova:compute' state.sls nova.compute
   Note
   Each service configured with this feature receives new, unique timeouts on every node each time the corresponding OpenStack service Salt state is applied.
6. Repeat steps 2-5 for other OpenStack services as required.
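After the state is applied, the rendered values can be checked against the ranges listed above. The following is a minimal Python sketch and not part of MCP. It assumes that the options are written to the [oslo_messaging_rabbit] section of the service configuration file (for example, /etc/nova/nova.conf for Nova). The helper name check_timeouts is hypothetical.

```python
import configparser

# Ranges in seconds, as documented above for the randomized options.
DOCUMENTED_RANGES = {
    "kombu_reconnect_delay": (30, 60),
    "rabbit_retry_interval": (10, 60),
    "rabbit_retry_backoff": (30, 60),
    "rabbit_interval_max": (60, 180),
}

# Assumption: the options are rendered into this section of the service config.
SECTION = "oslo_messaging_rabbit"


def check_timeouts(config_text):
    """Return {option: (value, in_range)} for each documented option found."""
    # strict=False tolerates repeated options; interpolation=None keeps '%' literal.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(config_text)
    results = {}
    for option, (low, high) in DOCUMENTED_RANGES.items():
        if parser.has_option(SECTION, option):
            value = parser.getfloat(SECTION, option)
            results[option] = (value, low <= value <= high)
    return results


if __name__ == "__main__":
    # Hypothetical sample; on a real node, pass the contents of the service
    # configuration file instead.
    sample = (
        "[oslo_messaging_rabbit]\n"
        "kombu_reconnect_delay = 42\n"
        "rabbit_interval_max = 120\n"
    )
    for name, (value, ok) in check_timeouts(sample).items():
        print(name, value, "in range" if ok else "OUT OF RANGE")
```

Because every run of the Salt state produces new values, running such a check on several nodes should show different numbers on each node, all within the documented ranges.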