Shut down a Ceph cluster for maintenance

This section describes how to properly shut down an entire Ceph cluster for maintenance and bring it up afterward.

To shut down a Ceph cluster for maintenance:

  1. Log in to the Salt Master node.

  2. Stop the OpenStack workloads.

  3. Stop the services that are using the Ceph cluster. For example:

    • Manila workloads (if you have shares on top of Ceph mount points)

    • heat-engine (if it has the autoscaling option enabled)

    • glance-api (if it uses Ceph to store images)

    • cinder-scheduler (if it uses Ceph to store volumes)

  4. Identify the first Ceph Monitor for operations:

    CEPH_MON=$(salt -C 'I@ceph:mon' --out=txt test.ping | sort | head -1 | \
        cut -d: -f1)
    
  5. Verify that the Ceph cluster is in a healthy state:

    salt "${CEPH_MON}" cmd.run 'ceph -s'
    

    Example of system response:

    cmn01.domain.com:
            cluster e0b75d1b-544c-4e5d-98ac-cfbaf29387ca
             health HEALTH_OK
             monmap e3: 3 mons at {cmn01=192.168.16.14:6789/0,cmn02=192.168.16.15:6789/0,cmn03=192.168.16.16:6789/0}
                    election epoch 42, quorum 0,1,2 cmn01,cmn02,cmn03
             osdmap e102: 6 osds: 6 up, 6 in
                    flags sortbitwise,require_jewel_osds
              pgmap v41138: 384 pgs, 6 pools, 45056 kB data, 19 objects
                    798 MB used, 60575 MB / 61373 MB avail
                         384 active+clean
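
    When scripting this procedure, the health check can gate the rest of it. A minimal sketch that parses the status; `REPORT` is stubbed here with the healthy value from the sample response, and in practice you would capture it via Salt as shown in the comment:

    ```shell
    # REPORT is stubbed with the healthy status from the sample response;
    # in practice capture it from the first monitor, for example:
    #   REPORT=$(salt "${CEPH_MON}" cmd.run 'ceph health')
    REPORT="HEALTH_OK"

    case "${REPORT}" in
        *HEALTH_OK*) echo "cluster healthy, safe to proceed" ;;
        *)           echo "cluster not healthy, aborting" >&2; exit 1 ;;
    esac
    ```

    If the cluster reports HEALTH_WARN or HEALTH_ERR at this point, resolve the underlying issue before continuing with the shutdown.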
    
  6. Set the following flags to disable rebalancing and recovery and to pause client I/O on the Ceph cluster:

    salt "${CEPH_MON}" cmd.run 'ceph osd set noout'
    salt "${CEPH_MON}" cmd.run 'ceph osd set nobackfill'
    salt "${CEPH_MON}" cmd.run 'ceph osd set norecover'
    salt "${CEPH_MON}" cmd.run 'ceph osd set norebalance'
    salt "${CEPH_MON}" cmd.run 'ceph osd set nodown'
    salt "${CEPH_MON}" cmd.run 'ceph osd set pause'
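
    The six commands above can also be issued in a loop. A minimal sketch, printed as a dry run; remove the `echo` to execute:

    ```shell
    # Flags in the order used above; pause comes last so client I/O stops
    # only after data movement (backfill, recovery, rebalance) is disabled.
    SET_FLAGS="noout nobackfill norecover norebalance nodown pause"

    for flag in ${SET_FLAGS}; do
        # Dry run: prints each command; drop the echo to execute.
        echo "salt \"\${CEPH_MON}\" cmd.run 'ceph osd set ${flag}'"
    done
    ```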
    
  7. Verify that the flags are set:

    salt "${CEPH_MON}" cmd.run 'ceph -s'
    

    Example of system response:

    cmn01.domain.com:
            cluster e0b75d1b-544c-4e5d-98ac-cfbaf29387ca
             health HEALTH_WARN
                    pauserd,pausewr,nodown,noout,nobackfill,norebalance,norecover flag(s) set
             monmap e3: 3 mons at {cmn01=192.168.16.14:6789/0,cmn02=192.168.16.15:6789/0,cmn03=192.168.16.16:6789/0}
                    election epoch 42, quorum 0,1,2 cmn01,cmn02,cmn03
             osdmap e108: 6 osds: 6 up, 6 in
                    flags pauserd,pausewr,nodown,noout,nobackfill,norebalance,norecover,sortbitwise,require_jewel_osds
              pgmap v41152: 384 pgs, 6 pools, 45056 kB data, 19 objects
                    799 MB used, 60574 MB / 61373 MB avail
                         384 active+clean
    
  8. Shut down the Ceph cluster.

    Warning

    Shut down the nodes one by one in the following order:

    1. Service nodes (for example, RADOS Gateway nodes)

    2. Ceph OSD nodes

    3. Ceph Monitor nodes
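
    The order above can be sketched with Salt's `system.shutdown` execution module. The compound targets are assumptions based on a typical ceph pillar layout; verify each with `test.ping` first. The commands are printed as a dry run; remove the `echo` to execute:

    ```shell
    # Shutdown order from the warning above: service nodes first, then
    # OSD nodes, then monitors. The compound targets are assumptions --
    # adjust them to your pillar layout.
    TARGETS="I@ceph:radosgw I@ceph:osd I@ceph:mon"

    for target in ${TARGETS}; do
        # Dry run: prints each command; drop the echo to execute.
        echo "salt -C '${target}' system.shutdown"
    done
    ```

    Nodes that hold more than one role should be shut down at the point of their last remaining role, so the loop above only applies cleanly when the roles are on separate nodes.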

Once done, perform the maintenance as required.


To start a Ceph cluster after maintenance:

  1. Log in to the Salt Master node.

  2. Start the Ceph cluster nodes.

    Warning

    Start the Ceph nodes one by one in the following order:

    1. Ceph Monitor nodes

    2. Ceph OSD nodes

    3. Service nodes (for example, RADOS Gateway nodes)

  3. Verify that the Salt minions are up:

    salt -C "I@ceph:common" test.ping
    
  4. Verify that the date and time are the same on all Ceph clients:

    salt -C "I@ceph:common" cmd.run date
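
    Beyond comparing the output by eye, the skew can be checked numerically. A minimal sketch using placeholder epoch values; in practice, collect them with `salt -C "I@ceph:common" cmd.run 'date +%s'`:

    ```shell
    # Placeholder epoch values standing in for per-minion `date +%s`
    # output; Ceph monitors warn once skew exceeds mon_clock_drift_allowed
    # (0.05 s by default), so even a one-second spread should be fixed.
    TIMES="1700000002 1700000000 1700000001"

    MIN=$(printf '%s\n' ${TIMES} | sort -n | head -1)
    MAX=$(printf '%s\n' ${TIMES} | sort -n | tail -1)
    SKEW=$((MAX - MIN))
    echo "max clock skew: ${SKEW}s"
    ```

    If the spread is nonzero, fix time synchronization (for example, NTP) on the affected nodes before unsetting the cluster flags.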
    
  5. Identify the first Ceph Monitor for operations:

    CEPH_MON=$(salt -C 'I@ceph:mon' --out=txt test.ping | sort | head -1 | \
        cut -d: -f1)
    
  6. Unset the following flags to resume the Ceph cluster:

    salt "${CEPH_MON}" cmd.run 'ceph osd unset pause'
    salt "${CEPH_MON}" cmd.run 'ceph osd unset nodown'
    salt "${CEPH_MON}" cmd.run 'ceph osd unset norebalance'
    salt "${CEPH_MON}" cmd.run 'ceph osd unset norecover'
    salt "${CEPH_MON}" cmd.run 'ceph osd unset nobackfill'
    salt "${CEPH_MON}" cmd.run 'ceph osd unset noout'
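
    As in step 6, the flags can be cleared in a loop; note that the order is the reverse of the one used to set them. A dry-run sketch; remove the `echo` to execute:

    ```shell
    # Cleared in the reverse of the order they were set: pause first so
    # the cluster accepts I/O again, noout last. Matches the commands
    # above.
    UNSET_FLAGS="pause nodown norebalance norecover nobackfill noout"

    for flag in ${UNSET_FLAGS}; do
        # Dry run: prints each command; drop the echo to execute.
        echo "salt \"\${CEPH_MON}\" cmd.run 'ceph osd unset ${flag}'"
    done
    ```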
    
  7. Verify that the Ceph cluster is in a healthy state:

    salt "${CEPH_MON}" cmd.run 'ceph -s'
    

    Example of system response:

    cmn01.domain.com:
            cluster e0b75d1b-544c-4e5d-98ac-cfbaf29387ca
             health HEALTH_OK
             monmap e3: 3 mons at {cmn01=192.168.16.14:6789/0,cmn02=192.168.16.15:6789/0,cmn03=192.168.16.16:6789/0}
                    election epoch 42, quorum 0,1,2 cmn01,cmn02,cmn03
             osdmap e102: 6 osds: 6 up, 6 in
                    flags sortbitwise,require_jewel_osds
              pgmap v41138: 384 pgs, 6 pools, 45056 kB data, 19 objects
                    798 MB used, 60575 MB / 61373 MB avail
                         384 active+clean