This section describes how to switch clustered RabbitMQ to a nonclustered configuration.
Note
This feature is available starting from the MCP 2019.2.13 maintenance update. Before using the feature, follow the steps described in Apply maintenance updates.
To switch RabbitMQ to a nonclustered configuration:
Perform the following prerequisite steps:
Log in to the Salt Master node.
Verify that the salt-formula-nova version is
2016.12.1+202101271624.d392d41~xenial1 or newer:
dpkg -l |grep salt-formula-nova
Verify that the salt-formula-oslo-templates version is
2018.1+202101191343.e24fd64~xenial1 or newer:
dpkg -l |grep salt-formula-oslo-templates
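Optionally, script both version checks instead of inspecting the dpkg -l output manually. The following sketch is not part of the official procedure and relies only on the standard dpkg-query and dpkg --compare-versions utilities:
#!/bin/bash
# Hypothetical helper: verify that a formula package meets the minimum required version
check_formula_version () {
  local pkg="$1" required="$2"
  local installed
  installed=$(dpkg-query -W -f='${Version}' "$pkg")
  if dpkg --compare-versions "$installed" ge "$required"; then
    echo "$pkg $installed satisfies the minimum $required"
  else
    echo "$pkg $installed is older than $required, apply maintenance updates first" >&2
    return 1
  fi
}
check_formula_version salt-formula-nova "2016.12.1+202101271624.d392d41~xenial1"
check_formula_version salt-formula-oslo-templates "2018.1+202101191343.e24fd64~xenial1"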
Create /root/non-clustered-rabbit-helpers.sh with the following
content:
#!/bin/bash
# Apply all known openstack states on given target
# example: run_openstack_states ctl*
function run_openstack_states {
  local target="$1"
  all_formulas=$(salt-call config.get orchestration:upgrade:applications --out=json | jq '.[] | . as $in | keys_unsorted | map ({"key": ., "priority": $in[.].priority}) | sort_by(.priority) | map(.key | [(.)]) | add' | sed -e 's/"//g' -e 's/,//g' -e 's/\[//g' -e 's/\]//g')
  #List of nodes in cloud
  list_nodes=`salt -C "$target" test.ping --out=text | cut -d: -f1 | tr '\n' ' '`
  for node in $list_nodes; do
    #List of applications on the given node
    node_applications=$(salt $node pillar.items __reclass__:applications --out=json | jq 'values |.[] | values |.[] | .[]' | tr -d '"' | tr '\n' ' ')
    for component in $all_formulas ; do
      if [[ " ${node_applications[*]} " == *"$component"* ]]; then
        echo "Applying state: $component on the $node"
        salt $node state.apply $component
      fi
    done
  done
}
# Apply specified update state for all OpenStack applications on given target
# example: run_openstack_update_states ctl0* upgrade.verify
# will run {nova|glance|cinder|keystone}.upgrade.verify on ctl01
function run_openstack_update_states {
  local target="$1"
  local state="$2"
  all_formulas=$(salt-call config.get orchestration:upgrade:applications --out=json | jq '.[] | . as $in | keys_unsorted | map ({"key": ., "priority": $in[.].priority}) | sort_by(.priority) | map(.key | [(.)]) | add' | sed -e 's/"//g' -e 's/,//g' -e 's/\[//g' -e 's/\]//g')
  #List of nodes in cloud
  list_nodes=`salt -C "$target" test.ping --out=text | cut -d: -f1 | tr '\n' ' '`
  for node in $list_nodes; do
    #List of applications on the given node
    node_applications=$(salt $node pillar.items __reclass__:applications --out=json | jq 'values |.[] | values |.[] | .[]' | tr -d '"' | tr '\n' ' ')
    for component in $all_formulas ; do
      if [[ " ${node_applications[*]} " == *"$component"* ]]; then
        echo "Applying state: $component.${state} on the $node"
        salt $node state.apply $component.${state}
      fi
    done
  done
}
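Optionally, verify that the helper functions load without errors before using them. This quick check relies only on the Bash type builtin:
. /root/non-clustered-rabbit-helpers.sh
type run_openstack_states run_openstack_update_states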
Run simple API checks for ctl01*. The output should not include
errors.
. /root/non-clustered-rabbit-helpers.sh
run_openstack_update_states ctl01* upgrade.verify
Open your project Git repository with the Reclass model on the cluster level.
Prepare the Neutron server for the RabbitMQ reconfiguration:
In openstack/control.yml, specify the
allow_automatic_dhcp_failover parameter as required.
Caution
If set to true, the Neutron server reschedules the networks from
failed DHCP agents so that the remaining alive agents take over those
networks and continue serving DHCP. Once an agent reconnects to
RabbitMQ, it detects that its networks have been rescheduled and
removes the corresponding DHCP ports, namespaces, and flows. This
behavior is useful if an entire gateway node goes down. When only
RabbitMQ is unstable, the agents themselves do not go down and the
data plane is not affected. Therefore, we recommend that you set the
allow_automatic_dhcp_failover parameter to false. However, consider
the risk of a gateway node going down before setting the
allow_automatic_dhcp_failover parameter.
neutron:
  server:
    allow_automatic_dhcp_failover: false
Apply the changes:
salt -C 'I@neutron:server' state.apply neutron.server
Verify the changes:
salt -C 'I@neutron:server' cmd.run "grep allow_automatic_dhcp_failover /etc/neutron/neutron.conf"
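Each node should report the value that you set in /etc/neutron/neutron.conf. Example of system response (node names are deployment-specific and the rendered Boolean case may differ):
ctl01.example.local:
    allow_automatic_dhcp_failover = False
ctl02.example.local:
    allow_automatic_dhcp_failover = False
ctl03.example.local:
    allow_automatic_dhcp_failover = False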
Perform the following changes in the Reclass model on the cluster level:
In infra/init.yml, add the following variable:
parameters:
  _param:
    openstack_rabbitmq_standalone_mode: true
In openstack/message_queue.yml, comment out the following class:
classes:
#- system.rabbitmq.server.cluster
In openstack/message_queue.yml, add the following classes:
classes:
- system.keepalived.cluster.instance.rabbitmq_vip
- system.rabbitmq.server.single
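After both edits, the classes section of openstack/message_queue.yml should contain the following lines (any other classes in the file remain unchanged):
classes:
#- system.rabbitmq.server.cluster
- system.keepalived.cluster.instance.rabbitmq_vip
- system.rabbitmq.server.single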
If your deployment has OpenContrail, add the following variables:
In opencontrail/analytics.yml, add:
parameters:
  opencontrail:
    collector:
      message_queue:
        ~members:
          - host: ${_param:openstack_message_queue_address}
In opencontrail/control.yml, add:
parameters:
  opencontrail:
    config:
      message_queue:
        ~members:
          - host: ${_param:openstack_message_queue_address}
    control:
      message_queue:
        ~members:
          - host: ${_param:openstack_message_queue_address}
To update the cells database when running Nova states, add the following
variable to openstack/control.yml:
parameters:
  nova:
    controller:
      update_cells: true
Refresh pillars on all nodes:
salt '*' saltutil.sync_all; salt '*' saltutil.refresh_pillar
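Optionally, verify that the new standalone parameter is available in the pillar data. This check assumes that the _param values are exposed in pillar, as in standard MCP Reclass models:
salt -C 'I@rabbitmq:server' pillar.item _param:openstack_rabbitmq_standalone_mode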
Verify that the messaging variables are set correctly:
Note
The following validation highlights the output for core OpenStack services only. Validate any additional deployed services appropriately.
For Keystone:
salt -C 'I@keystone:server' pillar.items keystone:server:message_queue:use_vip_address keystone:server:message_queue:host
For Heat:
salt -C 'I@heat:server' pillar.items heat:server:message_queue:use_vip_address heat:server:message_queue:host
For Cinder:
salt -C 'I@cinder:controller' pillar.items cinder:controller:message_queue:use_vip_address cinder:controller:message_queue:host
For Glance:
salt -C 'I@glance:server' pillar.items glance:server:message_queue:use_vip_address glance:server:message_queue:host
For Nova:
salt -C 'I@nova:controller' pillar.items nova:controller:message_queue:use_vip_address nova:controller:message_queue:host
For the OpenStack compute nodes:
salt -C 'I@nova:compute' pillar.items nova:compute:message_queue:use_vip_address nova:compute:message_queue:host
For Neutron:
salt -C 'I@neutron:server' pillar.items neutron:server:message_queue:use_vip_address neutron:server:message_queue:host
salt -C 'I@neutron:gateway' pillar.items neutron:gateway:message_queue:use_vip_address neutron:gateway:message_queue:host
If your deployment has OpenContrail:
salt 'ntw01*' pillar.items opencontrail:config:message_queue:members opencontrail:control:message_queue:members
salt 'nal01*' pillar.items opencontrail:collector:message_queue:members
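For the core OpenStack services, use_vip_address should be True and host should be set to the ${_param:openstack_message_queue_address} VIP. Example of system response for Keystone (node names and addresses are deployment-specific):
ctl01.example.local:
    ----------
    keystone:server:message_queue:host:
        <VIP address>
    keystone:server:message_queue:use_vip_address:
        True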
Apply the changes:
Stop the OpenStack control plane services on the ctl nodes:
. /root/non-clustered-rabbit-helpers.sh
run_openstack_update_states ctl* upgrade.service_stopped
Stop the OpenStack services on the gtw nodes. Skip this step if your
deployment has OpenContrail or does not have gtw nodes.
. /root/non-clustered-rabbit-helpers.sh
run_openstack_update_states gtw* upgrade.service_stopped
Reconfigure the Keepalived and RabbitMQ clusters on the msg nodes:
Verify that the rabbitmq:cluster pillars are not present:
salt -C 'I@rabbitmq:server' pillar.items rabbitmq:cluster
Verify that the haproxy pillars are not present:
salt -C 'I@rabbitmq:server' pillar.item haproxy
Remove HAProxy, HAProxy monitoring, and reconfigure Keepalived:
salt -C 'I@rabbitmq:server' cmd.run "export DEBIAN_FRONTEND=noninteractive; apt purge haproxy -y"
salt -C 'I@rabbitmq:server' state.apply telegraf
salt -C 'I@rabbitmq:server' state.apply keepalived
Verify that a VIP address is present on one of the msg nodes:
OPENSTACK_MSG_Q_ADDRESS=$(salt msg01* pillar.items _param:openstack_message_queue_address --out json | jq -r '.[][]')
salt -C 'I@rabbitmq:server' cmd.run "ip addr | grep $OPENSTACK_MSG_Q_ADDRESS"
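Exactly one of the msg nodes should report the VIP address. Example of system response (node, interface, and address names are deployment-specific):
msg01.example.local:
    inet <VIP address>/32 scope global ens3
msg02.example.local:
msg03.example.local: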
Stop the RabbitMQ server, clear mnesia, and reconfigure
rabbitmq-server:
salt -C 'I@rabbitmq:server' cmd.run 'systemctl stop rabbitmq-server'
salt -C 'I@rabbitmq:server' cmd.run 'rm -rf /var/lib/rabbitmq/mnesia/'
salt -C 'I@rabbitmq:server' state.apply rabbitmq
Verify that the RabbitMQ server is running in a nonclustered configuration:
salt -C 'I@rabbitmq:server' cmd.run "rabbitmqctl --formatter=erlang cluster_status |grep running_nodes"
Example of system response:
msg01.heat-cicd-queens-dvr-sl.local:
     {running_nodes,[rabbit@msg01]},
msg03.heat-cicd-queens-dvr-sl.local:
     {running_nodes,[rabbit@msg03]},
msg02.heat-cicd-queens-dvr-sl.local:
     {running_nodes,[rabbit@msg02]},
Reconfigure OpenStack services on the ctl nodes:
Apply all OpenStack states on ctl nodes:
. /root/non-clustered-rabbit-helpers.sh
run_openstack_states ctl*
Verify transport_url for the OpenStack services on the ctl
nodes:
salt 'ctl*' cmd.run "for s in nova glance cinder keystone heat neutron; do if [[ -d "/etc/\$s" ]]; then grep ^transport_url /etc/\$s/*.conf; fi; done" shell=/bin/bash
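The transport_url option in each service configuration file should now point to the single RabbitMQ VIP instead of listing all msg nodes. Example of a system response fragment (credentials, virtual host, and addresses are deployment-specific):
ctl01.example.local:
    /etc/cinder/cinder.conf:transport_url = rabbit://openstack:<password>@<VIP address>:5672/
    /etc/glance/glance-api.conf:transport_url = rabbit://openstack:<password>@<VIP address>:5672/
    /etc/nova/nova.conf:transport_url = rabbit://openstack:<password>@<VIP address>:5672/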
Verify that the cells database is updated and transport_url has a
VIP address:
salt -C 'I@nova:controller and *01*' cmd.run ". /root/keystonercv3; nova-manage cell_v2 list_cells"
Reconfigure the RabbitMQ connection settings on the gtw nodes. Skip this step if your
deployment has OpenContrail or does not have gtw nodes.
Apply all OpenStack states on the gtw nodes:
. /root/non-clustered-rabbit-helpers.sh
run_openstack_states gtw*
Verify transport_url for the OpenStack services on the gtw
nodes:
salt 'gtw*' cmd.run "for s in nova glance cinder keystone heat neutron; do if [[ -d "/etc/\$s" ]]; then grep ^transport_url /etc/\$s/*.conf; fi; done" shell=/bin/bash
Verify that the Neutron agents are up:
salt -C 'I@nova:controller and *01*' cmd.run ". /root/keystonercv3; openstack network agent list"
If your deployment has OpenContrail, reconfigure the RabbitMQ connection settings on the ntw
and nal nodes:
Apply the following state on the ntw and nal nodes:
salt -C 'ntw* or nal*' state.apply opencontrail
Verify the RabbitMQ connection settings for the OpenContrail services on the ntw
and nal nodes:
salt -C 'ntw* or nal*' cmd.run "for s in contrail; do if [[ -d "/etc/\$s" ]]; then grep ^rabbitmq_server_list /etc/\$s/*.conf; fi; done" shell=/bin/bash
salt 'ntw*' cmd.run "for s in contrail; do if [[ -d "/etc/\$s" ]]; then grep ^rabbit_server /etc/\$s/*.conf; fi; done" shell=/bin/bash
Verify the OpenContrail status:
salt -C 'ntw* or nal*' cmd.run 'doctrail all contrail-status'
Reconfigure OpenStack services on the cmp nodes:
Apply all OpenStack states on the cmp nodes:
. /root/non-clustered-rabbit-helpers.sh
run_openstack_states cmp*
Verify transport_url for the OpenStack services on the cmp
nodes:
salt 'cmp*' cmd.run "for s in nova glance cinder keystone heat neutron; do if [[ -d "/etc/\$s" ]]; then grep ^transport_url /etc/\$s/*.conf; fi; done" shell=/bin/bash
Caution
If your deployment has other nodes with OpenStack services, apply the changes on such nodes as well using the required states.
Verify the services:
Verify that the Neutron services are up. Skip this step if your deployment has OpenContrail.
salt -C 'I@nova:controller and *01*' cmd.run ". /root/keystonercv3; openstack network agent list"
Verify that the Nova services are up:
salt -C 'I@nova:controller and *01*' cmd.run ". /root/keystonercv3; openstack compute service list"
Verify that Heat services are up:
salt -C 'I@nova:controller and *01*' cmd.run ". /root/keystonercv3; openstack orchestration service list"
Verify that the Cinder services are up:
salt -C 'I@nova:controller and *01*' cmd.run ". /root/keystonercv3; openstack volume service list"
For the ctl01* node, apply the <app>.upgrade.verify
state. The output should not include errors.
. /root/non-clustered-rabbit-helpers.sh
run_openstack_update_states ctl01* upgrade.verify
Perform post-configuration steps:
Disable the RabbitMQUnequalQueueCritical Prometheus alert:
In stacklight/server.yml, add the following variable:
parameters:
  prometheus:
    server:
      alert:
        RabbitMQUnequalQueueCritical:
          enabled: false
Apply the Prometheus state to the mon nodes:
salt -C 'I@docker:swarm and I@prometheus:server' state.sls prometheus.server -b1
Revert the changes in the Reclass model on the cluster level:
In openstack/control.yml, set allow_automatic_dhcp_failover back to true, or leave it as is if you did not change the value.
In openstack/control.yml, remove nova:controller:update_cells: true.
Apply the Neutron state:
salt -C 'I@neutron:server' state.apply neutron.server
Verify the changes:
salt -C 'I@neutron:server' cmd.run "grep allow_automatic_dhcp_failover /etc/neutron/neutron.conf"
Remove the script:
rm -f /root/non-clustered-rabbit-helpers.sh