This section describes how to switch clustered RabbitMQ to a nonclustered configuration.
Note
This feature is available starting from the MCP 2019.2.13 maintenance update. Before using the feature, follow the steps described in Apply maintenance updates.
To switch RabbitMQ to a nonclustered configuration:
Perform the following prerequisite steps:
Log in to the Salt Master node.
Verify that the salt-formula-nova version is 2016.12.1+202101271624.d392d41~xenial1 or newer:
dpkg -l |grep salt-formula-nova
Verify that the salt-formula-oslo-templates version is 2018.1+202101191343.e24fd64~xenial1 or newer:
dpkg -l |grep salt-formula-oslo-templates
Create /root/non-clustered-rabbit-helpers.sh with the following content:
#!/bin/bash
# Apply all known OpenStack states on a given target
# example: run_openstack_states ctl*
function run_openstack_states {
  local target="$1"
  all_formulas=$(salt-call config.get orchestration:upgrade:applications --out=json | jq '.[] | . as $in | keys_unsorted | map ({"key": ., "priority": $in[.].priority}) | sort_by(.priority) | map(.key | [(.)]) | add' | sed -e 's/"//g' -e 's/,//g' -e 's/\[//g' -e 's/\]//g')
  # List of nodes in the cloud
  list_nodes=$(salt -C "$target" test.ping --out=text | cut -d: -f1 | tr '\n' ' ')
  for node in $list_nodes; do
    # List of applications on the given node
    node_applications=$(salt $node pillar.items __reclass__:applications --out=json | jq 'values |.[] | values |.[] | .[]' | tr -d '"' | tr '\n' ' ')
    for component in $all_formulas; do
      if [[ " ${node_applications[*]} " == *"$component"* ]]; then
        echo "Applying state: $component on the $node"
        salt $node state.apply $component
      fi
    done
  done
}

# Apply the specified update state for all OpenStack applications on a given target
# example: run_openstack_update_states ctl0* upgrade.verify
# will run {nova|glance|cinder|keystone}.upgrade.verify on ctl01
function run_openstack_update_states {
  local target="$1"
  local state="$2"
  all_formulas=$(salt-call config.get orchestration:upgrade:applications --out=json | jq '.[] | . as $in | keys_unsorted | map ({"key": ., "priority": $in[.].priority}) | sort_by(.priority) | map(.key | [(.)]) | add' | sed -e 's/"//g' -e 's/,//g' -e 's/\[//g' -e 's/\]//g')
  # List of nodes in the cloud
  list_nodes=$(salt -C "$target" test.ping --out=text | cut -d: -f1 | tr '\n' ' ')
  for node in $list_nodes; do
    # List of applications on the given node
    node_applications=$(salt $node pillar.items __reclass__:applications --out=json | jq 'values |.[] | values |.[] | .[]' | tr -d '"' | tr '\n' ' ')
    for component in $all_formulas; do
      if [[ " ${node_applications[*]} " == *"$component"* ]]; then
        echo "Applying state: $component.${state} on the $node"
        salt $node state.apply $component.${state}
      fi
    done
  done
}
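The jq pipeline in both helpers flattens the orchestration:upgrade:applications pillar into a list of formula names sorted by priority, ascending. A minimal, self-contained sketch of the same ordering logic, using hypothetical sample data in place of live salt-call output:

```shell
# Hypothetical "application priority" pairs, standing in for the
# orchestration:upgrade:applications pillar data.
apps='nova 1100
glance 1000
keystone 900'

# Sort numerically by the priority column, keep only the names;
# this mirrors the sort_by(.priority) | map(.key) steps in the jq filter.
ordered=$(printf '%s\n' "$apps" | sort -k2 -n | awk '{print $1}' | tr '\n' ' ')
echo "$ordered"
```

With the sample values above, the states would be applied in the order keystone, glance, nova, that is, lowest priority first.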
Run simple API checks for ctl01*. The output should not include errors:
. /root/non-clustered-rabbit-helpers.sh
run_openstack_update_states ctl01* upgrade.verify
Open your project Git repository with the Reclass model on the cluster level.
Prepare the Neutron server for the RabbitMQ reconfiguration:
In openstack/control.yml, specify the allow_automatic_dhcp_failover parameter as required.
Caution
If set to true, the Neutron server reschedules networks from failed DHCP agents so that alive agents pick them up and serve DHCP. Once an agent reconnects to RabbitMQ, it detects that its network has been rescheduled and removes the DHCP port, namespace, and flows. This behavior is useful if an entire gateway node goes down. With an unstable RabbitMQ, the agents do not actually go down and the data plane is not affected. Therefore, we recommend that you set allow_automatic_dhcp_failover to false. However, consider the risk of a gateway node going down before setting this parameter.
neutron:
  server:
    allow_automatic_dhcp_failover: false
Apply the changes:
salt -C 'I@neutron:server' state.apply neutron.server
Verify the changes:
salt -C 'I@neutron:server' cmd.run "grep allow_automatic_dhcp_failover /etc/neutron/neutron.conf"
Perform the following changes in the Reclass model on the cluster level:
In infra/init.yml, add the following variable:
parameters:
  _param:
    openstack_rabbitmq_standalone_mode: true
In openstack/message_queue.yml, comment out the following class:
classes:
#- system.rabbitmq.server.cluster
In openstack/message_queue.yml, add the following classes:
classes:
- system.keepalived.cluster.instance.rabbitmq_vip
- system.rabbitmq.server.single
If your deployment has OpenContrail, add the following variables:
In opencontrail/analytics.yml, add:
parameters:
  opencontrail:
    collector:
      message_queue:
        ~members:
          - host: ${_param:openstack_message_queue_address}
In opencontrail/control.yml, add:
parameters:
  opencontrail:
    config:
      message_queue:
        ~members:
          - host: ${_param:openstack_message_queue_address}
    control:
      message_queue:
        ~members:
          - host: ${_param:openstack_message_queue_address}
To update the cells database when running the Nova states, add the following variable to openstack/control.yml:
parameters:
  nova:
    controller:
      update_cells: true
Refresh pillars on all nodes:
salt '*' saltutil.sync_all; salt '*' saltutil.refresh_pillar
Verify that the messaging variables are set correctly:
Note
The following validation highlights the output for core OpenStack services only. Validate any additional deployed services appropriately.
For Keystone:
salt -C 'I@keystone:server' pillar.items keystone:server:message_queue:use_vip_address keystone:server:message_queue:host
For Heat:
salt -C 'I@heat:server' pillar.items heat:server:message_queue:use_vip_address heat:server:message_queue:host
For Cinder:
salt -C 'I@cinder:controller' pillar.items cinder:controller:message_queue:use_vip_address cinder:controller:message_queue:host
For Glance:
salt -C 'I@glance:server' pillar.items glance:server:message_queue:use_vip_address glance:server:message_queue:host
For Nova:
salt -C 'I@nova:controller' pillar.items nova:controller:message_queue:use_vip_address nova:controller:message_queue:host
For the OpenStack compute nodes:
salt -C 'I@nova:compute' pillar.items nova:compute:message_queue:use_vip_address nova:compute:message_queue:host
For Neutron:
salt -C 'I@neutron:server' pillar.items neutron:server:message_queue:use_vip_address neutron:server:message_queue:host
salt -C 'I@neutron:gateway' pillar.items neutron:gateway:message_queue:use_vip_address neutron:gateway:message_queue:host
If your deployment has OpenContrail:
salt 'ntw01*' pillar.items opencontrail:config:message_queue:members opencontrail:control:message_queue:members
salt 'nal01*' pillar.items opencontrail:collector:message_queue:members
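The per-service checks above all follow one pattern: query use_vip_address and host under each service's message_queue pillar. They can be generated in a single loop; a minimal sketch that only prints the salt commands (drop the echo to run them), with the service/pillar pairs taken from the steps above:

```shell
# Core service pillar prefixes verified in this step.
for svc in keystone:server heat:server cinder:controller glance:server \
           nova:controller nova:compute neutron:server neutron:gateway; do
  # Print the salt call for each service; remove "echo" to actually run it.
  echo salt -C "I@${svc}" pillar.items \
    "${svc}:message_queue:use_vip_address" "${svc}:message_queue:host"
done
```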
Apply the changes:
Stop the OpenStack control plane services on the ctl nodes:
. /root/non-clustered-rabbit-helpers.sh
run_openstack_update_states ctl* upgrade.service_stopped
Stop the OpenStack services on the gtw nodes. Skip this step if your deployment has OpenContrail or does not have gtw nodes.
. /root/non-clustered-rabbit-helpers.sh
run_openstack_update_states gtw* upgrade.service_stopped
Reconfigure the Keepalived and RabbitMQ clusters on the msg nodes:
Verify that the rabbitmq:cluster pillars are not present:
salt -C 'I@rabbitmq:server' pillar.items rabbitmq:cluster
Verify that the haproxy pillars are not present:
salt -C 'I@rabbitmq:server' pillar.item haproxy
Remove HAProxy, HAProxy monitoring, and reconfigure Keepalived:
salt -C 'I@rabbitmq:server' cmd.run "export DEBIAN_FRONTEND=noninteractive; apt purge haproxy -y"
salt -C 'I@rabbitmq:server' state.apply telegraf
salt -C 'I@rabbitmq:server' state.apply keepalived
Verify that a VIP address is present on one of the msg nodes:
OPENSTACK_MSG_Q_ADDRESS=$(salt msg01* pillar.items _param:openstack_message_queue_address --out json|jq '.[][]')
salt -C 'I@rabbitmq:server' cmd.run "ip addr |grep $OPENSTACK_MSG_Q_ADDRESS"
Stop the RabbitMQ server, clear mnesia, and reconfigure rabbitmq-server:
salt -C 'I@rabbitmq:server' cmd.run 'systemctl stop rabbitmq-server'
salt -C 'I@rabbitmq:server' cmd.run 'rm -rf /var/lib/rabbitmq/mnesia/'
salt -C 'I@rabbitmq:server' state.apply rabbitmq
Verify that the RabbitMQ server is running in a nonclustered configuration:
salt -C 'I@rabbitmq:server' cmd.run "rabbitmqctl --formatter=erlang cluster_status |grep running_nodes"
Example of system response:
msg01.heat-cicd-queens-dvr-sl.local:
    {running_nodes,[rabbit@msg01]},
msg03.heat-cicd-queens-dvr-sl.local:
    {running_nodes,[rabbit@msg03]},
msg02.heat-cicd-queens-dvr-sl.local:
    {running_nodes,[rabbit@msg02]},
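In a nonclustered configuration, each node must list only itself under running_nodes. A small, self-contained sketch of that check, parsing a sample output line rather than live rabbitmqctl output (the sample value is hypothetical):

```shell
# One running_nodes line as produced by rabbitmqctl cluster_status.
line='{running_nodes,[rabbit@msg01]},'

# Count the rabbit@<node> members; a nonclustered node lists exactly one.
count=$(echo "$line" | grep -o 'rabbit@[a-zA-Z0-9._-]*' | wc -l)
if [ "$count" -eq 1 ]; then
  echo "nonclustered OK"
else
  echo "still clustered: $count members"
fi
```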
Reconfigure the OpenStack services on the ctl nodes:
Apply all OpenStack states on the ctl nodes:
. /root/non-clustered-rabbit-helpers.sh
run_openstack_states ctl*
Verify transport_url for the OpenStack services on the ctl nodes:
salt 'ctl*' cmd.run "for s in nova glance cinder keystone heat neutron; do if [[ -d "/etc/\$s" ]]; then grep ^transport_url /etc/\$s/*.conf; fi; done" shell=/bin/bash
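A clustered transport_url lists every msg node as a comma-separated member, whereas after the switch each URL should reference a single host, the VIP (${_param:openstack_message_queue_address}). A self-contained sketch that counts the hosts in a URL; both URL values below are hypothetical examples, not output from this deployment:

```shell
# Hypothetical transport_url values before and after the switch.
clustered='rabbit://openstack:pwd@10.0.0.1:5672,openstack:pwd@10.0.0.2:5672,openstack:pwd@10.0.0.3:5672//openstack'
single='rabbit://openstack:pwd@10.0.0.254:5672//openstack'

# Split on commas and count the members that contain a host part.
hosts() { printf '%s' "$1" | tr ',' '\n' | grep -c '@'; }

echo "clustered: $(hosts "$clustered") hosts"
echo "nonclustered: $(hosts "$single") host"
```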
Verify that the cells database is updated and transport_url has a VIP address:
salt -C 'I@nova:controller and *01*' cmd.run ". /root/keystonercv3; nova-manage cell_v2 list_cells"
Reconfigure RabbitMQ on the gtw nodes. Skip this step if your deployment has OpenContrail or does not have gtw nodes.
Apply all OpenStack states on the gtw nodes:
. /root/non-clustered-rabbit-helpers.sh
run_openstack_states gtw*
Verify transport_url for the OpenStack services on the gtw nodes:
salt 'gtw*' cmd.run "for s in nova glance cinder keystone heat neutron; do if [[ -d "/etc/\$s" ]]; then grep ^transport_url /etc/\$s/*.conf; fi; done" shell=/bin/bash
Verify that the agents are up:
salt -C 'I@nova:controller and *01*' cmd.run ". /root/keystonercv3; openstack network agent list"
If your deployment has OpenContrail, reconfigure RabbitMQ on the ntw and nal nodes:
Apply the following state on the ntw and nal nodes:
salt -C 'ntw* or nal*' state.apply opencontrail
Verify the RabbitMQ server settings for the OpenContrail services on the ntw and nal nodes:
salt -C 'ntw* or nal*' cmd.run "for s in contrail; do if [[ -d "/etc/\$s" ]]; then grep ^rabbitmq_server_list /etc/\$s/*.conf; fi; done" shell=/bin/bash
salt 'ntw*' cmd.run "for s in contrail; do if [[ -d "/etc/\$s" ]]; then grep ^rabbit_server /etc/\$s/*.conf; fi; done" shell=/bin/bash
Verify the OpenContrail status:
salt -C 'ntw* or nal*' cmd.run 'doctrail all contrail-status'
Reconfigure the OpenStack services on the cmp nodes:
Apply all OpenStack states on the cmp nodes:
. /root/non-clustered-rabbit-helpers.sh
run_openstack_states cmp*
Verify transport_url for the OpenStack services on the cmp nodes:
salt 'cmp*' cmd.run "for s in nova glance cinder keystone heat neutron; do if [[ -d "/etc/\$s" ]]; then grep ^transport_url /etc/\$s/*.conf; fi; done" shell=/bin/bash
Caution
If your deployment has other nodes with OpenStack services, apply the changes on such nodes as well using the required states.
Verify the services:
Verify that the Neutron services are up. Skip this step if your deployment has OpenContrail.
salt -C 'I@nova:controller and *01*' cmd.run ". /root/keystonercv3; openstack network agent list"
Verify that the Nova services are up:
salt -C 'I@nova:controller and *01*' cmd.run ". /root/keystonercv3; openstack compute service list"
Verify that Heat services are up:
salt -C 'I@nova:controller and *01*' cmd.run ". /root/keystonercv3; openstack orchestration service list"
Verify that the Cinder services are up:
salt -C 'I@nova:controller and *01*' cmd.run ". /root/keystonercv3; openstack volume service list"
From the ctl01* node, apply the <app>.upgrade.verify state. The output should not include errors:
. /root/non-clustered-rabbit-helpers.sh
run_openstack_update_states ctl01* upgrade.verify
Perform post-configuration steps:
Disable the RabbitMQUnequalQueueCritical Prometheus alert:
In stacklight/server.yml, add the following variable:
parameters:
  prometheus:
    server:
      alert:
        RabbitMQUnequalQueueCritical:
          enabled: false
Apply the Prometheus state to the mon nodes:
salt -C 'I@docker:swarm and I@prometheus:server' state.sls prometheus.server -b1
Revert the changes in the Reclass model on the cluster level:
In openstack/control.yml, set allow_automatic_dhcp_failover back to true, or leave it as is if you did not change the value.
In openstack/control.yml, remove nova:controller:update_cells: true.
Apply the Neutron state:
salt -C 'I@neutron:server' state.apply neutron.server
Verify the changes:
salt -C 'I@neutron:server' cmd.run "grep allow_automatic_dhcp_failover /etc/neutron/neutron.conf"
Remove the script:
rm -f /root/non-clustered-rabbit-helpers.sh
See also