Note
Before proceeding with the manual steps below, verify that you have performed the steps described in Apply maintenance updates.
Fixed the issue with scheduled backups of the Ceph Monitor nodes, which could cause a cluster race condition or outage. Now, the backups for different Ceph Monitor nodes run at different times. Additionally, a health check has been added to verify the Ceph Monitor nodes during backup.
To apply the issue resolution:
Open your Git project repository with the Reclass model on the cluster level.
In classes/cluster/cluster_name/ceph/mon.yml, add the following parameters:
parameters:
  ceph:
    backup:
      client:
        backup_times:
          hour: ${_param:ceph_backup_time}
In classes/cluster/cluster_name/ceph/init.yml, add the following pillar in the parameters section:
ceph_mon_node01_ceph_backup_hour: 2
ceph_mon_node02_ceph_backup_hour: 3
ceph_mon_node03_ceph_backup_hour: 4
In classes/cluster/cluster_name/infra/config/nodes.yml, for each Ceph Monitor node, specify the ceph_backup_time parameter. For example:
ceph_mon_node01:
  params:
    {%- if cookiecutter.get('static_ips_on_deploy_network_enabled', 'False') == 'True' %}
    deploy_address: ${_param:ceph_mon_node01_deploy_address}
    {%- endif %}
    ceph_public_address: ${_param:ceph_mon_node01_ceph_public_address}
    ceph_backup_time: ${_param:ceph_mon_node01_ceph_backup_hour}
Log in to the Salt Master node.
Apply the following states:
salt -C "I@ceph:mon" state.sls ceph.backup
salt "cfg01*" state.sls reclass.storage
In the crontab on each Ceph Monitor node, verify that the start time of the backup scripts has changed.
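For example, you can inspect the cron entries on all Ceph Monitor nodes at once from the Salt Master node. This is only a verification sketch; the exact name of the backup job in crontab depends on your configuration:
salt -C "I@ceph:mon" cmd.run 'crontab -l'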
Fixed the issue with the inability to change the maximum number of PGs per OSD using the mon_max_pg_per_osd parameter.
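For reference, the following is a minimal sketch of how such a configuration option is typically defined in the cluster model, assuming that your model sets Ceph options through the ceph:common:config pillar, for example, in classes/cluster/cluster_name/ceph/common.yml. The file location, the pillar path, and the value 300 are illustrative only; verify them against your deployment:
parameters:
  ceph:
    common:
      config:
        global:
          mon_max_pg_per_osd: 300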
To apply the issue resolution:
Log in to the Salt Master node.
Apply the ceph.common state on all Ceph nodes:
salt -C "I@ceph:common" state.sls ceph.common
Restart the Ceph Monitor, Manager, OSD, and RADOS Gateway services on the Ceph nodes in the following strict order:
Warning
After the restart of every service, wait for the system to become healthy. Use the ceph health command to verify the Ceph cluster status.
Restart the Ceph Monitor and Manager services on all cmn nodes one by one:
salt -C NODE_NAME cmd.run 'systemctl restart ceph-mon.target'
salt -C NODE_NAME cmd.run 'systemctl restart ceph-mgr.target'
salt -C NODE_NAME cmd.run 'ceph -s'
Restart the Ceph OSD services on all osd nodes one by one (for a way to list the OSD instance numbers on a node, see the example after this procedure):
salt -C NODE_NAME cmd.run 'systemctl restart ceph-osd@<osd_num>'
Restart the RADOS Gateway service on all rgw nodes one by one:
salt -C NODE_NAME cmd.run 'systemctl restart ceph-radosgw.target'
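To determine the <osd_num> values for a particular osd node, you can, for example, list the Ceph OSD service instances running on that node. This is a sketch that uses standard systemd tooling; adjust it to your environment as needed:
salt -C NODE_NAME cmd.run 'systemctl list-units --type=service "ceph-osd@*"'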