Ceph Monitors store.db size rapidly growing
The MON_DISK_LOW Ceph cluster health message indicates that the store.db size of a Ceph Monitor is rapidly growing and the compaction procedure is not working. In most cases, store.db starts accumulating logm keys that are buffered due to Ceph OSD shadow errors.
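If you need to confirm that logm keys are what fills the store, ceph-monstore-tool can list the store keys. The following is only a sketch: ceph-monstore-tool ships in the Ceph images but cannot open a store that a running monitor holds locked, so run it against a stopped monitor or a copy of the store; /tmp/mon-store is a hypothetical copy path, not part of this procedure:

# Count logm keys in a copied monitor store (path is an example)
ceph-monstore-tool /tmp/mon-store dump-keys | grep -c '^logm'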
To verify whether the store.db size is rapidly growing:
1. Identify the store.db size of each Ceph Monitor:

   for pod in $(kubectl get pods -n rook-ceph | grep mon | awk '{print $1}'); \
   do printf "$pod:\n"; kubectl exec -n rook-ceph "$pod" -it -c mon -- \
   du -cms /var/lib/ceph/mon/ ; done
2. Repeat the previous step two or three times at intervals of 5-15 seconds.
3. If the total size increases by more than 10 MB between the command runs, perform the steps described below to resolve the issue.
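The two measurements and the delta check can also be scripted. A minimal sketch, assuming the first pod matched by grep mon is a Ceph Monitor and that du -cms prints a summary line ending in "total"; adjust the parsing if your output differs:

POD=$(kubectl get pods -n rook-ceph | grep mon | awk 'NR==1{print $1}')
# Sample the total store.db size twice, 15 seconds apart
SIZE1=$(kubectl exec -n rook-ceph "$POD" -c mon -- du -cms /var/lib/ceph/mon/ | awk '/total/{print $1}')
sleep 15
SIZE2=$(kubectl exec -n rook-ceph "$POD" -c mon -- du -cms /var/lib/ceph/mon/ | awk '/total/{print $1}')
echo "Growth: $((SIZE2 - SIZE1)) MB in 15 seconds"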
To resolve the issue:
1. Verify the original state of placement groups (PGs):

   kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph -s
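   Optionally, save the current PG summary to a file so that you can compare it after each Ceph OSD restart. A small sketch; pg-before.txt is an arbitrary file name, not part of the documented procedure:

   kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph pg stat > pg-before.txt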
2. Set the clog_to_monitors option to false for all Ceph OSDs at runtime:

   kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash
   ceph tell osd.* config set clog_to_monitors false
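   To confirm that the setting took effect, you can read the runtime value back. A minimal sketch, run from the same toolbox shell:

   # Query the runtime value from every Ceph OSD
   ceph tell osd.* config get clog_to_monitors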
3. Restart the Ceph OSDs one by one:
   1. Restart one of the Ceph OSDs:

      for pod in $(kubectl get pods -n rook-ceph -l app=rook-ceph-osd | \
      awk 'FNR>1{print $1}'); do printf "$pod:\n"; kubectl -n rook-ceph \
      delete pod "$pod"; echo "Continue?"; read; done
   2. Once prompted with Continue?, first verify that rebalancing has finished for the Ceph cluster, the restarted Ceph OSD is up and in, and all PGs have returned to their original state (a wait-loop sketch follows this procedure):

      kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph -s
      kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph osd tree
   3. Once you are confident that the Ceph OSD restart and recovery are complete, press ENTER.

   4. Repeat the previous two steps for the remaining Ceph OSDs.
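As mentioned in the verification step above, waiting for recovery can be partly automated. A rough sketch, assuming HEALTH_OK is a sufficient signal in your cluster; if the cluster legitimately reports HEALTH_WARN for unrelated reasons, verify the ceph -s output manually instead:

# Poll cluster health every 10 seconds until HEALTH_OK (assumes no unrelated warnings)
until kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph health \
| grep -q HEALTH_OK; do sleep 10; done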
Note

Periodically verify the Ceph Monitors store.db size:

for pod in $(kubectl get pods -n rook-ceph | grep mon | awk \
'{print $1}'); do printf "$pod:\n"; kubectl exec -n rook-ceph \
"$pod" -it -c mon -- du -cms /var/lib/ceph/mon/ ; done
After some of the affected Ceph OSDs restart, the Ceph Monitors will start shrinking store.db back to its original size of 100-300 MB. Nonetheless, complete the restart of all Ceph OSDs.
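If store.db does not shrink even after all Ceph OSDs have been restarted, a manual compaction of the monitor stores may help. This is a suggestion beyond the documented steps; mon.a is an example monitor ID:

# Trigger an on-demand compaction of one monitor's store (mon ID is an example)
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph tell mon.a compact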