Shut down a Ceph cluster for maintenance

This section describes how to properly shut down an entire Ceph cluster for maintenance and bring it up afterward.

To shut down a Ceph cluster for maintenance:

  1. Log in to the Salt Master node.

  2. Stop the OpenStack workloads.

  3. Stop the services that are using the Ceph cluster. For example:

    • Manila workloads (if you have shares on top of Ceph mount points)

    • heat-engine (if it has the autoscaling option enabled)

    • glance-api (if it uses Ceph to store images)

    • cinder-scheduler (if it uses Ceph to store volumes)

  4. Identify the first Ceph Monitor for operations:

    CEPH_MON=$(salt -C 'I@ceph:mon' --out=txt test.ping | sort | head -1 | \
        cut -d: -f1)
    
  5. Verify that the Ceph cluster is in a healthy state:

    salt "${CEPH_MON}" cmd.run 'ceph -s'
    

    Example of system response:

    cmn01.domain.com:
            cluster e0b75d1b-544c-4e5d-98ac-cfbaf29387ca
             health HEALTH_OK
             monmap e3: 3 mons at {cmn01=192.168.16.14:6789/0,cmn02=192.168.16.15:6789/0,cmn03=192.168.16.16:6789/0}
                    election epoch 42, quorum 0,1,2 cmn01,cmn02,cmn03
             osdmap e102: 6 osds: 6 up, 6 in
                    flags sortbitwise,require_jewel_osds
              pgmap v41138: 384 pgs, 6 pools, 45056 kB data, 19 objects
                    798 MB used, 60575 MB / 61373 MB avail
                         384 active+clean
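
    When scripting this procedure, the health check can gate the rest of it. A minimal sketch that parses the status; `REPORT` is stubbed here with the healthy value from the sample response, and in practice you would capture it via Salt as shown in the comment:

    ```shell
    # REPORT is stubbed with the healthy status from the sample response;
    # in practice capture it from the first monitor, for example:
    #   REPORT=$(salt "${CEPH_MON}" cmd.run 'ceph health')
    REPORT="HEALTH_OK"

    case "${REPORT}" in
        *HEALTH_OK*) echo "cluster healthy, safe to proceed" ;;
        *)           echo "cluster not healthy, aborting" >&2; exit 1 ;;
    esac
    ```

    If the cluster reports HEALTH_WARN or HEALTH_ERR at this point, resolve the underlying issue before continuing with the shutdown.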
    
  6. Set the following flags to disable rebalancing and recovery and to pause client I/O on the Ceph cluster:

    salt "${CEPH_MON}" cmd.run 'ceph osd set noout'
    salt "${CEPH_MON}" cmd.run 'ceph osd set nobackfill'
    salt "${CEPH_MON}" cmd.run 'ceph osd set norecover'
    salt "${CEPH_MON}" cmd.run 'ceph osd set norebalance'
    salt "${CEPH_MON}" cmd.run 'ceph osd set nodown'
    salt "${CEPH_MON}" cmd.run 'ceph osd set pause'
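
    The six commands above can also be issued in a loop. A minimal sketch, printed as a dry run; remove the `echo` to execute:

    ```shell
    # Flags in the order used above; pause comes last so client I/O stops
    # only after data movement (backfill, recovery, rebalance) is disabled.
    SET_FLAGS="noout nobackfill norecover norebalance nodown pause"

    for flag in ${SET_FLAGS}; do
        # Dry run: prints each command; drop the echo to execute.
        echo "salt \"\${CEPH_MON}\" cmd.run 'ceph osd set ${flag}'"
    done
    ```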
    
  7. Verify that the flags are set:

    salt "${CEPH_MON}" cmd.run 'ceph -s'
    

    Example of system response:

    cmn01.domain.com:
            cluster e0b75d1b-544c-4e5d-98ac-cfbaf29387ca
             health HEALTH_WARN
                    pauserd,pausewr,nodown,noout,nobackfill,norebalance,norecover flag(s) set
             monmap e3: 3 mons at {cmn01=192.168.16.14:6789/0,cmn02=192.168.16.15:6789/0,cmn03=192.168.16.16:6789/0}
                    election epoch 42, quorum 0,1,2 cmn01,cmn02,cmn03
             osdmap e108: 6 osds: 6 up, 6 in
                    flags pauserd,pausewr,nodown,noout,nobackfill,norebalance,norecover,sortbitwise,require_jewel_osds
              pgmap v41152: 384 pgs, 6 pools, 45056 kB data, 19 objects
                    799 MB used, 60574 MB / 61373 MB avail
                         384 active+clean
    
  8. Shut down the Ceph cluster.

    Warning

    Shut down the nodes one by one in the following order:

    1. Service nodes (for example, RADOS Gateway nodes)

    2. Ceph OSD nodes

    3. Ceph Monitor nodes
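
    The order above can be sketched with Salt's `system.shutdown` execution module. The compound targets are assumptions based on a typical ceph pillar layout; verify each with `test.ping` first. The commands are printed as a dry run; remove the `echo` to execute:

    ```shell
    # Shutdown order from the warning above: service nodes first, then
    # OSD nodes, then monitors. The compound targets are assumptions --
    # adjust them to your pillar layout.
    TARGETS="I@ceph:radosgw I@ceph:osd I@ceph:mon"

    for target in ${TARGETS}; do
        # Dry run: prints each command; drop the echo to execute.
        echo "salt -C '${target}' system.shutdown"
    done
    ```

    Nodes that hold more than one role should be shut down at the point of their last remaining role, so the loop above only applies cleanly when the roles are on separate nodes.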

Once done, perform the maintenance as required.


To start a Ceph cluster after maintenance:

  1. Log in to the Salt Master node.

  2. Start the Ceph cluster nodes.

    Warning

    Start the Ceph nodes one by one in the following order:

    1. Ceph Monitor nodes

    2. Ceph OSD nodes

    3. Service nodes (for example, RADOS Gateway nodes)

  3. Verify that the Salt minions are up:

    salt -C "I@ceph:common" test.ping
    
  4. Verify that the date and time are the same on all Ceph clients:

    salt -C "I@ceph:common" cmd.run date
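
    Beyond comparing the output by eye, the skew can be checked numerically. A minimal sketch using placeholder epoch values; in practice, collect them with `salt -C "I@ceph:common" cmd.run 'date +%s'`:

    ```shell
    # Placeholder epoch values standing in for per-minion `date +%s`
    # output; Ceph monitors warn once skew exceeds mon_clock_drift_allowed
    # (0.05 s by default), so even a one-second spread should be fixed.
    TIMES="1700000002 1700000000 1700000001"

    MIN=$(printf '%s\n' ${TIMES} | sort -n | head -1)
    MAX=$(printf '%s\n' ${TIMES} | sort -n | tail -1)
    SKEW=$((MAX - MIN))
    echo "max clock skew: ${SKEW}s"
    ```

    If the spread is nonzero, fix time synchronization (for example, NTP) on the affected nodes before unsetting the cluster flags.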
    
  5. Identify the first Ceph Monitor for operations:

    CEPH_MON=$(salt -C 'I@ceph:mon' --out=txt test.ping | sort | head -1 | \
        cut -d: -f1)
    
  6. Unset the following flags to resume the Ceph cluster:

    salt "${CEPH_MON}" cmd.run 'ceph osd unset pause'
    salt "${CEPH_MON}" cmd.run 'ceph osd unset nodown'
    salt "${CEPH_MON}" cmd.run 'ceph osd unset norebalance'
    salt "${CEPH_MON}" cmd.run 'ceph osd unset norecover'
    salt "${CEPH_MON}" cmd.run 'ceph osd unset nobackfill'
    salt "${CEPH_MON}" cmd.run 'ceph osd unset noout'
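
    As in step 6, the flags can be cleared in a loop; note that the order is the reverse of the one used to set them. A dry-run sketch; remove the `echo` to execute:

    ```shell
    # Cleared in the reverse of the order they were set: pause first so
    # the cluster accepts I/O again, noout last. Matches the commands
    # above.
    UNSET_FLAGS="pause nodown norebalance norecover nobackfill noout"

    for flag in ${UNSET_FLAGS}; do
        # Dry run: prints each command; drop the echo to execute.
        echo "salt \"\${CEPH_MON}\" cmd.run 'ceph osd unset ${flag}'"
    done
    ```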
    
  7. Verify that the Ceph cluster is in a healthy state:

    salt "${CEPH_MON}" cmd.run 'ceph -s'
    

    Example of system response:

    cmn01.domain.com:
            cluster e0b75d1b-544c-4e5d-98ac-cfbaf29387ca
             health HEALTH_OK
             monmap e3: 3 mons at {cmn01=192.168.16.14:6789/0,cmn02=192.168.16.15:6789/0,cmn03=192.168.16.16:6789/0}
                    election epoch 42, quorum 0,1,2 cmn01,cmn02,cmn03
             osdmap e102: 6 osds: 6 up, 6 in
                    flags sortbitwise,require_jewel_osds
              pgmap v41138: 384 pgs, 6 pools, 45056 kB data, 19 objects
                    798 MB used, 60575 MB / 61373 MB avail
                         384 active+clean