Restarting the RADOS Gateway service using systemctl may fail. The workaround is to restart the service manually.
Workaround:
Log in to an rgw node.
Obtain the process ID of the RADOS Gateway service:
ps uax | grep radosgw
Example of system response:
root 17526 0.0 0.0 13232 976 pts/0 S+ 10:30 \
0:00 grep --color=auto radosgw
ceph 20728 0.1 1.4 1306844 58204 ? Ssl Jan28 \
2:51 /usr/bin/radosgw -f --cluster ceph --name client.rgw.rgw01 --setuser ceph --setgroup ceph
In this example, the process ID is 20728.
Stop the process using the obtained process ID. For example:
kill -9 20728
Start the RADOS Gateway service specifying the node name, for example, client.rgw.rgw01:
/usr/bin/radosgw --cluster ceph --name client.rgw.rgw01 --setuser ceph --setgroup ceph
Repeat steps 1 - 4 on the remaining rgw nodes one by one.
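To verify that the gateway is back after the restart, check the process again and, optionally, probe the HTTP endpoint. This is a minimal sketch; port 8080 is an assumption and must match the frontend port configured for RADOS Gateway in your deployment:
ps uax | grep radosgw
curl -sI http://localhost:8080/ | head -n 1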
Fixed in 2019.2.3
The upgrade of a Ceph cluster from Jewel to Luminous using the Ceph - upgrade Jenkins pipeline job does not automatically verify that the other components were upgraded before the rgw nodes. As a result, uploading a file to object storage may fail. The workaround is to upgrade the rgw nodes only after you have successfully upgraded the mon, mgr, and osd nodes.
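Before upgrading the rgw nodes, you can check manually which versions the running daemons report. This is a sketch only; the ceph versions command becomes available once the Ceph Monitors run Luminous, and any daemon still reporting version 10.2.x has not been upgraded from Jewel yet:
ceph versions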
The tempest.api.object_storage.test_account_quotas.AccountQuotasTest.test_admin_modify_quota Tempest test fails because modifying the account quota is not possible even if the OpenStack user has the ResellerAdmin role. Setting a quota using the Swift CLI and API served by RADOS Gateway is also not possible. As a workaround, set the quotas using the radosgw-admin utility (requires SSH access to the OpenStack environment) as described in Quota management or using the RADOS Gateway Admin Operations API as described in Quotas.
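For example, a user quota can be set and enabled with radosgw-admin as follows. This is a sketch only; <UID> is a placeholder for the RADOS Gateway user and the limits are arbitrary example values, with the maximum size specified in bytes:
radosgw-admin quota set --quota-scope=user --uid=<UID> --max-objects=1024 --max-size=10737418240
radosgw-admin quota enable --quota-scope=user --uid=<UID>
radosgw-admin user info --uid=<UID>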
Creating Swift containers with custom headers using a Heat stack or the tempest.api.orchestration.stacks.test_swift_resources.SwiftResourcesTestJSON.test_acl Tempest test fails. As a workaround, first create a container without additional parameters and then set the metadata variables as required.
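For example, with the Swift CLI the container can be created first and the headers applied in a separate request. This is a sketch only; mycontainer, the metadata header, and the ACL value are placeholders:
swift post mycontainer
swift post mycontainer --header "X-Container-Meta-Color: blue"
swift post mycontainer --read-acl ".r:*,.rlistings"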
Fixed in 2019.2.4
The mon_max_pg_per_osd variable is set in the wrong section and does not apply to the Ceph OSDs. The workaround is to manually apply the necessary changes to the cluster model.
Workaround:
In classes/cluster/<cluster_name>/ceph/common.yml, define the additional parameters in the ceph:common pillar as follows:
parameters:
ceph:
common:
config:
global:
mon_max_pg_per_osd: 600
In /classes/service/ceph/mon/cluster.yml and /classes/service/ceph/mon/single.yml, remove the configuration for mon_max_pg_per_osd:
common:
# config:
# mon:
# mon_max_pg_per_osd: 600
Apply the ceph.common state on the Ceph nodes:
salt -C "I@ceph:common" state.sls ceph.common
Set the noout and norebalance flags:
ceph osd set noout
ceph osd set norebalance
Restart the Ceph Monitor services on the cmn nodes one by one. Verify that the cluster is in the HEALTH_OK status after each Ceph Monitor restart.
salt -C <HOST_NAME> cmd.run 'systemctl restart ceph-mon.target'
salt -C <HOST_NAME> cmd.run 'systemctl restart ceph-mgr.target'
salt -C <HOST_NAME> cmd.run 'ceph -s'
Restart the Ceph OSD services on the osd nodes one by one:
On each Ceph OSD node, list the OSDs running on that node:
ceph001# ceph osd status 2>&1 | grep $(hostname)
For each Ceph OSD number:
ceph001# service ceph-osd@OSD_NR_FROM_LIST status
ceph001# service ceph-osd@OSD_NR_FROM_LIST restart
ceph001# service ceph-osd@OSD_NR_FROM_LIST status
Verify that the cluster is in the HEALTH_OK status before restarting the next Ceph OSD.
After the last Ceph OSD has been restarted, unset the noout and norebalance flags:
ceph osd unset noout
ceph osd unset norebalance
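Once the flags are unset, you can optionally confirm that the new value is in effect through the daemon admin socket on a cmn node. This is a sketch only; mon.<ID> must match the name of the local Ceph Monitor:
ceph daemon mon.<ID> config get mon_max_pg_per_osd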