Issues resolutions requiring manual application

Note

Before proceeding with the manual steps below, verify that you have performed the steps described in Apply maintenance updates.


[30161] Ceph Monitor nodes backups can cause cluster outage

Fixed the issue with scheduled backups of the Ceph Monitor nodes, which could cause a cluster race condition or outage. Now, the backups for different Ceph Monitor nodes run at different times. Additionally, a health check has been added to verify the Ceph Monitor nodes during backup.

To apply the issue resolution:

  1. Open your Git project repository with the Reclass model on the cluster level.

  2. In classes/cluster/cluster_name/ceph/mon.yml, add the following parameters:

    parameters:
      ceph:
        backup:
          client:
            backup_times:
              hour: ${_param:ceph_backup_time}
    
  3. In classes/cluster/cluster_name/ceph/init.yml, add the following pillar in the parameters section:

    ceph_mon_node01_ceph_backup_hour: 2
    ceph_mon_node02_ceph_backup_hour: 3
    ceph_mon_node03_ceph_backup_hour: 4
    
  4. In classes/cluster/cluster_name/infra/config/nodes.yml, for each Ceph Monitor node specify the ceph_backup_time parameter. For example:

    ceph_mon_node01:
      params:
        {%- if cookiecutter.get('static_ips_on_deploy_network_enabled', 'False') == 'True' %}
        deploy_address: ${_param:ceph_mon_node01_deploy_address}
        {%- endif %}
        ceph_public_address: ${_param:ceph_mon_node01_ceph_public_address}
        ceph_backup_time: ${_param:ceph_mon_node01_ceph_backup_hour}
    
  5. Log in to the Salt Master node.

  6. Apply the following states:

    salt -C "I@ceph:mon" state.sls ceph.backup
    salt "cfg01*" state.sls reclass.storage
    
  7. In the crontab on each Ceph Monitor node, verify that the backup script start times have changed, as shown below.
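
    You can check the resolved backup hours and the crontab entries for all Ceph Monitor nodes at once from the Salt Master node. The following is a minimal sketch: the pillar path matches the parameters added in step 2, and it assumes that the backup cron job is installed in the root crontab.

      salt -C "I@ceph:mon" pillar.get ceph:backup:client:backup_times:hour
      salt -C "I@ceph:mon" cmd.run 'crontab -l'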


[29811] Inability to change the maximum number of PGs per OSD

Fixed the issue with the inability to change the maximum number of PGs per OSD using the mon_max_pg_per_osd parameter.
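
For reference, the mon_max_pg_per_osd value itself is defined in the cluster model, typically in classes/cluster/cluster_name/ceph/common.yml. The following is a minimal sketch only: the ceph:common:config:global pillar path is assumed from the standard Ceph formula layout, and the value 300 is an example, not a recommendation.

    parameters:
      ceph:
        common:
          config:
            global:
              mon_max_pg_per_osd: 300

The procedure below then applies the updated configuration and restarts the Ceph services so that the new value takes effect.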

To apply the issue resolution:

  1. Log in to the Salt Master node.

  2. Apply the ceph.common state on all Ceph nodes:

    salt -C "I@ceph:common" state.sls ceph.common
    
  3. Restart the Ceph Monitor, Manager, OSD, and RADOS Gateway services on the Ceph nodes in the following strict order:

    Warning

    After restarting each service, wait for the cluster to become healthy. Use the ceph health command to verify the Ceph cluster status.

    1. Restart the Ceph Monitor and Manager services on all cmn nodes one by one:

      salt -C NODE_NAME cmd.run 'systemctl restart ceph-mon.target'
      salt -C NODE_NAME cmd.run 'systemctl restart ceph-mgr.target'
      salt -C NODE_NAME cmd.run 'ceph -s'
      
    2. Restart the Ceph OSD services on all osd nodes one by one, one OSD instance at a time (see the sketch after this procedure):

      salt -C NODE_NAME cmd.run 'systemctl restart ceph-osd@<osd_num>'
      
    3. Restart the RADOS Gateway service on all rgw nodes one by one:

      salt -C NODE_NAME cmd.run 'systemctl restart ceph-radosgw.target'
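
For substep 2 of step 3 above, the <osd_num> placeholder refers to an individual OSD instance running on the node. The following is a minimal sketch for discovering the instances before restarting them; the CMN_NODE_NAME target for the health check is an assumed placeholder for one of your cmn nodes:

    # List the ceph-osd@<osd_num> instances running on the target osd node
    salt -C NODE_NAME cmd.run 'systemctl list-units --type=service "ceph-osd@*"'

    # After restarting each instance, verify the cluster health from a cmn node
    salt -C CMN_NODE_NAME cmd.run 'ceph health'

Proceed to the next OSD instance or node only after the cluster reports HEALTH_OK.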