Note
Before proceeding with the manual steps below, verify that you have performed the steps described in Apply maintenance updates.
Fixed the issue with scheduled backups of the Ceph Monitor nodes, which could cause a cluster race condition or outage. Now, the backups for different Ceph Monitor nodes run at different times. Additionally, a health check has been added to verify the Ceph Monitor nodes during backup.
To apply the issue resolution:
Open your Git project repository with the Reclass model on the cluster level.
In classes/cluster/cluster_name/ceph/mon.yml, add the following parameters:
parameters:
  ceph:
    backup:
      client:
        backup_times:
          hour: ${_param:ceph_backup_time}
In classes/cluster/cluster_name/ceph/init.yml, add the following pillar in the parameters section:
ceph_mon_node01_ceph_backup_hour: 2
ceph_mon_node02_ceph_backup_hour: 3
ceph_mon_node03_ceph_backup_hour: 4
In classes/cluster/cluster_name/infra/config/nodes.yml, for each Ceph Monitor node, specify the ceph_backup_time parameter. For example:
ceph_mon_node01:
  params:
    {%- if cookiecutter.get('static_ips_on_deploy_network_enabled', 'False') == 'True' %}
    deploy_address: ${_param:ceph_mon_node01_deploy_address}
    {%- endif %}
    ceph_public_address: ${_param:ceph_mon_node01_ceph_public_address}
    ceph_backup_time: ${_param:ceph_mon_node01_ceph_backup_hour}
Log in to the Salt Master node.
Apply the following states:
salt -C "I@ceph:mon" state.sls ceph.backup
salt "cfg01*" state.sls reclass.storage
In the crontab on each Ceph Monitor node, verify that the start time of the backup scripts has changed.
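For example, you can inspect the cron entries on all Ceph Monitor nodes at once from the Salt Master node. This is only a verification sketch; the exact name of the backup job in crontab depends on your configuration:
salt -C "I@ceph:mon" cmd.run 'crontab -l'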
Fixed the issue with the inability to change the maximum number of PGs per OSD using the mon_max_pg_per_osd parameter.
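For reference, the following is a minimal sketch of how such a configuration option is typically defined in the cluster model, assuming that your model sets Ceph options through the ceph:common:config pillar, for example, in classes/cluster/cluster_name/ceph/common.yml. The file location, the pillar path, and the value 300 are illustrative only; verify them against your deployment:
parameters:
  ceph:
    common:
      config:
        global:
          mon_max_pg_per_osd: 300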
To apply the issue resolution:
Log in to the Salt Master node.
Apply the ceph.common state on all Ceph nodes:
salt -C "I@ceph:common" state.sls ceph.common
Restart the Ceph Monitor, Manager, OSD, and RADOS Gateway services on the Ceph nodes in the following strict order:
Warning
After the restart of every service, wait for the system to become healthy. Use the ceph health command to verify the Ceph cluster status.
Restart the Ceph Monitor and Manager services on all cmn nodes one by one:
salt -C NODE_NAME cmd.run 'systemctl restart ceph-mon.target'
salt -C NODE_NAME cmd.run 'systemctl restart ceph-mgr.target'
salt -C NODE_NAME cmd.run 'ceph -s'
Restart the Ceph OSD services on all osd nodes one by one (for a way to list the OSD instance numbers on a node, see the example after this procedure):
salt -C NODE_NAME cmd.run 'systemctl restart ceph-osd@<osd_num>'
Restart the RADOS Gateway service on all rgw nodes one by one:
salt -C NODE_NAME cmd.run 'systemctl restart ceph-radosgw.target'
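To determine the <osd_num> values for a particular osd node, you can, for example, list the Ceph OSD service instances running on that node. This is a sketch that uses standard systemd tooling; adjust it to your environment as needed:
salt -C NODE_NAME cmd.run 'systemctl list-units --type=service "ceph-osd@*"'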