Ceph health reports PG_DAMAGED after a failed disk or node replacement
After adding a new OSD node on a compact cluster, Ceph health may report HEALTH_ERR, and the ceph health detail command output may contain the PG_DAMAGED and OSD_SCRUB_ERRORS messages. For example:
$ ceph -s
  cluster:
    id:     8bca9dfb-df99-4920-bba0-e5bca59876b4
    health: HEALTH_ERR
            1 scrub errors
            Possible data damage: 1 pg inconsistent

  services:
    mon: 3 daemons, quorum a,b,c (age 3h)
    mgr: a(active, since 3h), standbys: b
    osd: 4 osds: 4 up (since 109m), 4 in (since 110m)
    rgw: 2 daemons active (2 hosts, 1 zones)

$ ceph health detail
HEALTH_ERR 1 scrub errors; Possible data damage: 1 pg inconsistent
[ERR] OSD_SCRUB_ERRORS: 1 scrub errors
[ERR] PG_DAMAGED: Possible data damage: 1 pg inconsistent
    pg 11.2a is active+clean+inconsistent, acting [3,1]
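Optionally, before starting the repair, you can inspect which placement groups and objects the scrub flagged as inconsistent. The following is a minimal sketch, assuming the rados CLI is available on a node with admin credentials; 11.2a is the PG ID from the example output above, and <pool_name> is a placeholder for the affected pool:

# List inconsistent PGs in a pool
rados list-inconsistent-pg <pool_name>

# Show the objects and shards that the scrub flagged in PG 11.2a
rados list-inconsistent-obj 11.2a --format=json-pretty

If the second command reports that no scrub information is available, trigger a new deep scrub with ceph pg deep-scrub 11.2a and retry after it completes.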
To fix the PG_DAMAGED health error:
1. Obtain the damaged placement group (PG) ID:

   ceph health detail

   Example of system response:

   HEALTH_ERR 1 scrub errors; Possible data damage: 1 pg inconsistent
   [ERR] OSD_SCRUB_ERRORS: 1 scrub errors
   [ERR] PG_DAMAGED: Possible data damage: 1 pg inconsistent
       pg 11.2a is active+clean+inconsistent, acting [3,1]
   In the example above, 11.2a is the required PG ID.

2. Repair the damaged PG:

   ceph pg repair <pgid>

   Substitute <pgid> with the damaged PG ID. For example:

   ceph pg repair 11.2a
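The repair may take several minutes because the PG is deep scrubbed as part of the process. As a sketch, the following commands help verify the result:

# Watch cluster events while the repair and deep scrub run
ceph -w

# After the repair completes, the cluster should return to HEALTH_OK
ceph health detail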