Ceph Monitors store.db size rapidly growing
The MON_DISK_LOW Ceph cluster health message indicates that the store.db size of a Ceph Monitor is rapidly growing and the compaction procedure is not working. In most cases, store.db accumulates a large number of logm keys that are buffered due to Ceph OSD shadow errors.
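To confirm the warning and see which Ceph Monitor reports it, you can query the cluster health detail from the rook-ceph-tools pod. A minimal check, assuming the same toolbox deployment used throughout this procedure:

    kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph health detail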
To verify whether the store.db size is rapidly growing:

1. Identify the Ceph Monitors store.db size:

       for pod in $(kubectl get pods -n rook-ceph | grep mon | awk '{print $1}'); do
           printf "$pod:\n"
           kubectl exec -n rook-ceph "$pod" -it -c mon -- du -cms /var/lib/ceph/mon/
       done
2. Repeat the previous step two or three times at an interval of 5-15 seconds.

3. If the total size increases by more than 10 MB between the command runs, perform the steps described below to resolve the issue.
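The growth check can also be scripted. The following is a minimal sketch, not part of the official procedure, that samples the total store.db size twice and applies the 10 MB threshold from the step above; the 10-second interval is an assumption you can adjust:

    # Sum the store.db size (in MB) across all Ceph Monitor pods
    size() {
        local total=0 mb
        for pod in $(kubectl get pods -n rook-ceph | grep mon | awk '{print $1}'); do
            # du -cms prints a grand-total line last; take its first column (MB)
            mb=$(kubectl exec -n rook-ceph "$pod" -c mon -- \
                du -cms /var/lib/ceph/mon/ | tail -1 | awk '{print $1}')
            total=$((total + mb))
        done
        echo "$total"
    }

    first=$(size); sleep 10; second=$(size)
    echo "store.db total: ${first} MB -> ${second} MB"
    if [ $((second - first)) -gt 10 ]; then
        echo "store.db is growing rapidly; apply the issue resolution below"
    fi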
To apply the issue resolution:
1. Verify the original state of the placement groups (PGs):

       kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph -s

2. Apply clog_to_monitors with the false value for all Ceph OSDs at runtime:

       kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash
       ceph tell osd.* config set clog_to_monitors false

   To confirm that the new value took effect, see the verification sketch after this procedure.
3. Restart the Ceph OSDs one by one:

   1. Restart one of the Ceph OSDs:

          for pod in $(kubectl get pods -n rook-ceph -l app=rook-ceph-osd | awk 'FNR>1{print $1}'); do
              printf "$pod:\n"
              kubectl -n rook-ceph delete pod "$pod"
              echo "Continue?"
              read
          done

   2. Once prompted with Continue?, first verify that rebalancing has finished for the Ceph cluster, the restarted Ceph OSD is up and in, and all PGs have returned to their original state:

          kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph -s
          kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph osd tree

   3. Once you are confident that the Ceph OSD restart and recovery are complete, press ENTER.

   4. Restart the remaining Ceph OSDs in the same way. For an optional way to wait for each recreated Ceph OSD pod, see the sketch after this procedure.
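To confirm that the clog_to_monitors change was applied, you can read the value back from the Ceph OSDs. A minimal sketch, assuming you are still in the toolbox shell opened in step 2; fan-out behavior of ceph tell with the osd.* wildcard can vary between Ceph releases:

    # Read the current value back from every Ceph OSD
    ceph tell osd.* config get clog_to_monitors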
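When restarting the Ceph OSDs, you can also wait for each recreated pod to become ready before checking the PG state. A minimal sketch, assuming the standard Rook convention of one Deployment per Ceph OSD named rook-ceph-osd-<ID> (replace 0 with the actual OSD ID):

    # Wait until the Deployment has recreated its pod and the pod reports ready
    kubectl -n rook-ceph rollout status deployment/rook-ceph-osd-0 --timeout=10m
    # Pod readiness does not imply PG recovery; still confirm with ceph -s
    kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph -s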
Note
Periodically verify the Ceph Monitors store.db size:

    for pod in $(kubectl get pods -n rook-ceph | grep mon | awk '{print $1}'); do
        printf "$pod:\n"
        kubectl exec -n rook-ceph "$pod" -it -c mon -- du -cms /var/lib/ceph/mon/
    done
After some of the affected Ceph OSDs restart, the Ceph Monitors will start shrinking store.db back to its original size of 100-300 MB. Nevertheless, complete the restart of all Ceph OSDs.
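If store.db does not shrink on its own after all Ceph OSDs have been restarted, you can trigger compaction on a Ceph Monitor manually. A minimal sketch, not part of the original procedure, using the standard Ceph compact command; mon.a is a placeholder, so replace it with each of your Monitor IDs:

    # Trigger on-demand compaction of the Monitor store
    kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph tell mon.a compact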