Ceph Monitors store.db size rapidly growing
The MON_DISK_LOW Ceph cluster health message indicates that the store.db size of a Ceph Monitor is rapidly growing and the compaction procedure is not working. In most cases, store.db accumulates a large number of logm keys that are buffered due to Ceph OSD shadow errors.
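To confirm the warning and see which Ceph Monitor reports it, you can query the cluster health detail from the rook-ceph-tools pod. A minimal check, assuming the same toolbox deployment used throughout this procedure:

    kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph health detail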
To verify whether the store.db size is rapidly growing:

1. Identify the Ceph Monitors store.db size:

       for pod in $(kubectl get pods -n rook-ceph | grep mon | awk '{print $1}'); do
           printf "$pod:\n"
           kubectl exec -n rook-ceph "$pod" -it -c mon -- du -cms /var/lib/ceph/mon/
       done
2. Repeat the previous step two or three times at an interval of 5-15 seconds.

3. If the total size increases by more than 10 MB between the command runs, perform the steps described below to resolve the issue.
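The growth check can also be scripted. The following is a minimal sketch, not part of the official procedure, that samples the total store.db size twice and applies the 10 MB threshold from the step above; the 10-second interval is an assumption you can adjust:

    # Sum the store.db size (in MB) across all Ceph Monitor pods
    size() {
        local total=0 mb
        for pod in $(kubectl get pods -n rook-ceph | grep mon | awk '{print $1}'); do
            # du -cms prints a grand-total line last; take its first column (MB)
            mb=$(kubectl exec -n rook-ceph "$pod" -c mon -- \
                du -cms /var/lib/ceph/mon/ | tail -1 | awk '{print $1}')
            total=$((total + mb))
        done
        echo "$total"
    }

    first=$(size); sleep 10; second=$(size)
    echo "store.db total: ${first} MB -> ${second} MB"
    if [ $((second - first)) -gt 10 ]; then
        echo "store.db is growing rapidly; apply the issue resolution below"
    fi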
To apply the issue resolution:
1. Verify the original state of the placement groups (PGs):

       kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph -s

2. Apply clog_to_monitors with the false value for all Ceph OSDs at runtime:

       kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash
       ceph tell osd.* config set clog_to_monitors false

   To confirm that the new value took effect, see the verification sketch after this procedure.
3. Restart the Ceph OSDs one by one:

   1. Restart one of the Ceph OSDs:

          for pod in $(kubectl get pods -n rook-ceph -l app=rook-ceph-osd | awk 'FNR>1{print $1}'); do
              printf "$pod:\n"
              kubectl -n rook-ceph delete pod "$pod"
              echo "Continue?"
              read
          done

   2. Once prompted with Continue?, first verify that rebalancing has finished for the Ceph cluster, the restarted Ceph OSD is up and in, and all PGs have returned to their original state:

          kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph -s
          kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph osd tree

   3. Once you are confident that the Ceph OSD restart and recovery are complete, press ENTER.

   4. Restart the remaining Ceph OSDs in the same way. For an optional way to wait for each recreated Ceph OSD pod, see the sketch after this procedure.
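To confirm that the clog_to_monitors change was applied, you can read the value back from the Ceph OSDs. A minimal sketch, assuming you are still in the toolbox shell opened in step 2; fan-out behavior of ceph tell with the osd.* wildcard can vary between Ceph releases:

    # Read the current value back from every Ceph OSD
    ceph tell osd.* config get clog_to_monitors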
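When restarting the Ceph OSDs, you can also wait for each recreated pod to become ready before checking the PG state. A minimal sketch, assuming the standard Rook convention of one Deployment per Ceph OSD named rook-ceph-osd-<ID> (replace 0 with the actual OSD ID):

    # Wait until the Deployment has recreated its pod and the pod reports ready
    kubectl -n rook-ceph rollout status deployment/rook-ceph-osd-0 --timeout=10m
    # Pod readiness does not imply PG recovery; still confirm with ceph -s
    kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph -s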
Note
Periodically verify the Ceph Monitors store.db size:

    for pod in $(kubectl get pods -n rook-ceph | grep mon | awk '{print $1}'); do
        printf "$pod:\n"
        kubectl exec -n rook-ceph "$pod" -it -c mon -- du -cms /var/lib/ceph/mon/
    done
After some of the affected Ceph OSDs restart, the Ceph Monitors will start shrinking store.db back to its original size of 100-300 MB. Nevertheless, complete the restart of all Ceph OSDs.
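If store.db does not shrink on its own after all Ceph OSDs have been restarted, you can trigger compaction on a Ceph Monitor manually. A minimal sketch, not part of the original procedure, using the standard Ceph compact command; mon.a is a placeholder, so replace it with each of your Monitor IDs:

    # Trigger on-demand compaction of the Monitor store
    kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph tell mon.a compact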