This section describes how to properly shut down an entire Ceph cluster for maintenance and bring it up afterward.
To shut down a Ceph cluster for maintenance:
Log in to the Salt Master node.
Stop the OpenStack workloads.
Stop the services that are using the Ceph cluster (see the sketch after this list). For example:
heat-engine (if it has the autoscaling option enabled)
glance-api (if it uses Ceph to store images)
cinder-scheduler (if it uses Ceph to store volumes)
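How these services are stopped depends on the deployment. A minimal sketch, assuming the services run on nodes matched by the hypothetical pillar targets I@heat:server, I@glance:server, and I@cinder:controller:
# Pillar targets below are examples; adjust them to your cluster model.
salt -C 'I@heat:server' service.stop heat-engine
salt -C 'I@glance:server' service.stop glance-api
salt -C 'I@cinder:controller' service.stop cinder-scheduler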
Identify the first Ceph Monitor for operations:
CEPH_MON=$(salt -C 'I@ceph:mon' --out=txt test.ping | sort | head -1 | \
cut -d: -f1)
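The variable now holds the minion ID of the first Ceph Monitor node. To check which node was selected:
echo "${CEPH_MON}"
Example of system response:
cmn01.domain.com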
Verify that the Ceph cluster is in a healthy state:
salt "${CEPH_MON}" cmd.run 'ceph -s'
Example of system response:
cmn01.domain.com:
cluster e0b75d1b-544c-4e5d-98ac-cfbaf29387ca
health HEALTH_OK
monmap e3: 3 mons at {cmn01=192.168.16.14:6789/0,cmn02=192.168.16.15:6789/0,cmn03=192.168.16.16:6789/0}
election epoch 42, quorum 0,1,2 cmn01,cmn02,cmn03
osdmap e102: 6 osds: 6 up, 6 in
flags sortbitwise,require_jewel_osds
pgmap v41138: 384 pgs, 6 pools, 45056 kB data, 19 objects
798 MB used, 60575 MB / 61373 MB avail
384 active+clean
Set the following flags to prevent OSDs from being marked out or down, to disable backfill, recovery, and rebalancing, and to pause client I/O while the cluster is shut down:
salt "${CEPH_MON}" cmd.run 'ceph osd set noout'
salt "${CEPH_MON}" cmd.run 'ceph osd set nobackfill'
salt "${CEPH_MON}" cmd.run 'ceph osd set norecover'
salt "${CEPH_MON}" cmd.run 'ceph osd set norebalance'
salt "${CEPH_MON}" cmd.run 'ceph osd set nodown'
salt "${CEPH_MON}" cmd.run 'ceph osd set pause'
Verify that the flags are set:
salt "${CEPH_MON}" cmd.run 'ceph -s'
Example of system response:
cmn01.domain.com:
cluster e0b75d1b-544c-4e5d-98ac-cfbaf29387ca
health HEALTH_WARN
pauserd,pausewr,nodown,noout,nobackfill,norebalance,norecover flag(s) set
monmap e3: 3 mons at {cmn01=192.168.16.14:6789/0,cmn02=192.168.16.15:6789/0,cmn03=192.168.16.16:6789/0}
election epoch 42, quorum 0,1,2 cmn01,cmn02,cmn03
osdmap e108: 6 osds: 6 up, 6 in
flags pauserd,pausewr,nodown,noout,nobackfill,norebalance,norecover,sortbitwise,require_jewel_osds
pgmap v41152: 384 pgs, 6 pools, 45056 kB data, 19 objects
799 MB used, 60574 MB / 61373 MB avail
384 active+clean
Shut down the Ceph cluster.
Warning
Shut down the nodes one by one in the following order:
Service nodes (for example, RADOS Gateway nodes)
OSD nodes
Monitor nodes
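The nodes can be powered off from the Salt Master with the system.poweroff execution function. A minimal sketch with hypothetical minion IDs; power off one node at a time and wait for it to go down before continuing:
# Node names are examples; repeat for every node of each type, in order.
salt 'rgw01*' system.poweroff
salt 'osd001*' system.poweroff
salt 'cmn01*' system.poweroff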
Once done, perform the maintenance as required.
To start a Ceph cluster after maintenance:
Log in to the Salt Master node.
Start the Ceph cluster nodes.
Warning
Start the Ceph nodes one by one in the following order:
Monitor nodes
OSD nodes
Service nodes (for example, RADOS Gateway nodes)
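After each group of nodes boots, you can verify that the corresponding Ceph daemons started before moving on. A sketch, assuming systemd-managed Ceph services:
# ceph-mon.target and ceph-osd.target group the per-daemon systemd units.
salt -C 'I@ceph:mon' cmd.run 'systemctl status ceph-mon.target'
salt -C 'I@ceph:osd' cmd.run 'systemctl status ceph-osd.target'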
Verify that the Salt minions are up:
salt -C "I@ceph:common" test.ping
Verify that the date and time are the same on all Ceph nodes and clients:
salt -C "I@ceph:common" cmd.run date
Identify the first Ceph Monitor for operations:
CEPH_MON=$(salt -C 'I@ceph:mon' --out=txt test.ping | sort | head -1 | \
cut -d: -f1)
Unset the following flags to resume the Ceph cluster:
salt "${CEPH_MON}" cmd.run 'ceph osd unset pause'
salt "${CEPH_MON}" cmd.run 'ceph osd unset nodown'
salt "${CEPH_MON}" cmd.run 'ceph osd unset norebalance'
salt "${CEPH_MON}" cmd.run 'ceph osd unset norecover'
salt "${CEPH_MON}" cmd.run 'ceph osd unset nobackfill'
salt "${CEPH_MON}" cmd.run 'ceph osd unset noout'
Verify that the Ceph cluster is in a healthy state:
salt "${CEPH_MON}" cmd.run 'ceph -s'
Example of system response:
cmn01.domain.com:
cluster e0b75d1b-544c-4e5d-98ac-cfbaf29387ca
health HEALTH_OK
monmap e3: 3 mons at {cmn01=192.168.16.14:6789/0,cmn02=192.168.16.15:6789/0,cmn03=192.168.16.16:6789/0}
election epoch 42, quorum 0,1,2 cmn01,cmn02,cmn03
osdmap e102: 6 osds: 6 up, 6 in
flags sortbitwise,require_jewel_osds
pgmap v41138: 384 pgs, 6 pools, 45056 kB data, 19 objects
798 MB used, 60575 MB / 61373 MB avail
384 active+clean