Move a Ceph Monitor daemon to another node during machine disablement
Note
Consider this section as part of the Disable a machine procedure.
This section describes how to migrate a Ceph Monitor daemon from one node to another without changing the total number of Ceph Monitors in the cluster. The procedure applies when you disable a machine that has the Ceph Monitor role.
In the Pelagia Controllers concept, migration of a Ceph Monitor means manually removing it from one node and adding it to another.
Consider the following example placement of Ceph Monitors in the
nodes spec of the CephDeployment custom resource (CR):
spec:
  nodes:
    node-1:
      roles:
      - mon
      - mgr
    node-2:
      roles:
      - mgr
Using the example above, to move the Ceph Monitor from node-1
to node-2 without changing the number of Ceph Monitors, the roles
lists in the nodes spec must look as follows:
spec:
  nodes:
    node-1:
      roles:
      - mgr
    node-2:
      roles:
      - mgr
      - mon
However, because of a Rook limitation related to the Kubernetes architecture,
a Ceph Monitor placement change made through the CephDeployment CR does not
apply automatically. This is caused by the following Rook behavior:
Rook creates Ceph Monitor resources as deployments with nodeSelector, which binds Ceph Monitor pods to a requested node.
Rook does not recreate Ceph Monitors with the new node placement if the current mon quorum works.
Therefore, to move a Ceph Monitor to another node, you must also manually apply the new Ceph Monitors placement to the Ceph cluster as described below.
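To see the nodeSelector binding described above on a live cluster, the following minimal sketch lists each Ceph Monitor deployment together with the node it is pinned to. It assumes the rook-ceph namespace and the app=rook-ceph-mon label used in the procedure below. The function names list_mon_placement and mon_deploy_on_node are hypothetical helpers, not part of Pelagia or Rook.

```shell
# Hypothetical helpers for inspecting Ceph Monitor placement.
# Assumption: monitors run in the rook-ceph namespace with the
# app=rook-ceph-mon label, as in the commands of this section.

# Print "<deployment name><TAB><node name>" for every mon deployment.
list_mon_placement() {
  kubectl -n rook-ceph get deploy -l app=rook-ceph-mon \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.template.spec.nodeSelector.kubernetes\.io/hostname}{"\n"}{end}'
}

# Print the mon deployment names pinned to the node given as $1.
mon_deploy_on_node() {
  list_mon_placement | awk -F '\t' -v node="$1" '$2 == node { print $1 }'
}
```

For example, mon_deploy_on_node node-1 prints the names of the monitor deployments still bound to node-1, if any.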
To move a Ceph Monitor to another node:
Open the CephDeployment CR for editing:
kubectl -n pelagia edit cephdpl
In the nodes spec of the CephDeployment CR, change the mon roles placement without changing the total number of mon roles. For details, see the example above. Capture the name values of the nodes on which the mon roles have been removed.
Verify that the following conditions are met before proceeding to the next step:
There are at least 2 running and available Ceph Monitors so that the Ceph cluster is accessible during the Ceph Monitor migration:
kubectl -n rook-ceph get pod -l app=rook-ceph-mon
kubectl -n rook-ceph exec -it deploy/pelagia-ceph-toolbox -- ceph -s
The CephDeployment object on the MOSK cluster has the required node with the mon role added in the nodes section of spec:
kubectl -n ceph-lcm-mirantis get cephdpl -o yaml
The Ceph NodeWorkloadLock for the required node is created:
kubectl --kubeconfig <managed-cluster-kubeconfig> get nodeworkloadlock -o jsonpath='{range .items[?(@.spec.nodeName == "<requiredNodeName>")]}{@.metadata.name}{"\n"}{end}' | grep ceph
Scale the ceph-maintenance-controller and pelagia-lcm-controller deployments to 0 replicas:
kubectl -n ceph-lcm-mirantis scale deploy ceph-maintenance-controller --replicas 0
kubectl -n ceph-lcm-mirantis scale deploy pelagia-lcm-controller --replicas 0
Make sure that the rook-ceph-operator deployment is scaled to 0 replicas:
kubectl -n rook-ceph scale deploy rook-ceph-operator --replicas 0
Obtain the rook-ceph-mon deployment name placed on the obsolete node using the previously obtained node name:
kubectl -n rook-ceph get deploy -l app=rook-ceph-mon -o jsonpath="{.items[?(@.spec.template.spec.nodeSelector['kubernetes\.io/hostname'] == '<nodeName>')].metadata.name}"
Substitute <nodeName> with the name of the node where you removed the mon role.
Back up the rook-ceph-mon deployment placed on the obsolete node:
kubectl -n rook-ceph get deploy <rook-ceph-mon-name> -o yaml > <rook-ceph-mon-name>-backup.yaml
Remove the rook-ceph-mon deployment placed on the obsolete node:
kubectl -n rook-ceph delete deploy <rook-ceph-mon-name>
Enter the pelagia-ceph-toolbox pod:
kubectl -n rook-ceph exec -it deploy/pelagia-ceph-toolbox -- bash
Remove the Ceph Monitor from the Ceph monmap by letter:
ceph mon rm <monLetter>
Substitute <monLetter> with the old Ceph Monitor letter. For example, mon-b has the letter b.
Verify that the Ceph cluster does not have any information about the removed Ceph Monitor:
ceph mon dump
ceph -s
Exit the pelagia-ceph-toolbox pod.
Scale up the rook-ceph-operator deployment to 1 replica:
kubectl -n rook-ceph scale deploy rook-ceph-operator --replicas 1
Wait for the missing Ceph Monitor failover process to start:
kubectl -n rook-ceph logs -l app=rook-ceph-operator -f
Example of log extract:
2024-03-01 12:33:08.741215 W | op-mon: mon b NOT found in ceph mon map, failover
2024-03-01 12:33:08.741244 I | op-mon: marking mon "b" out of quorum
...
2024-03-01 12:33:08.766822 I | op-mon: Failing over monitor "b"
2024-03-01 12:33:08.766881 I | op-mon: starting new mon...
Once done, Rook removes the obsolete Ceph Monitor from the node and creates a new one on the specified node with a new letter. For example, if the a, b, and c Ceph Monitors were in quorum and mon-c was obsolete, Rook removes mon-c and creates mon-d. In this case, the new quorum includes the a, b, and d Ceph Monitors.
Scale the rook-ceph-operator deployment to 0 replicas:
kubectl -n rook-ceph scale deploy rook-ceph-operator --replicas 0
Scale the ceph-maintenance-controller and pelagia-lcm-controller deployments to 3 replicas:
kubectl -n ceph-lcm-mirantis scale deploy ceph-maintenance-controller --replicas 3
kubectl -n ceph-lcm-mirantis scale deploy pelagia-lcm-controller --replicas 3
Once done, ceph-maintenance-controller continues with the node
disablement procedure.
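Two checks in the procedure above lend themselves to small helpers: confirming that at least 2 Ceph Monitors are running before you start, and deriving the monitor letter from a deployment name. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the pod count relies on the STATUS column of the default kubectl get pod output, and the letter is taken as the part of the deployment name after the last hyphen (for example, rook-ceph-mon-b).

```shell
# Hypothetical helpers; the names are illustrative and not part of
# Pelagia, Rook, or Ceph.

# Count rook-ceph-mon pods whose STATUS column (third column of the
# default "kubectl get pod" output) is "Running".
running_mon_count() {
  kubectl -n rook-ceph get pod -l app=rook-ceph-mon --no-headers \
    | awk '$3 == "Running" { n++ } END { print n + 0 }'
}

# Succeed only if at least two monitors are running, which is the
# precondition stated in the procedure.
enough_mons_running() {
  [ "$(running_mon_count)" -ge 2 ]
}

# Derive the monitor letter from a deployment name by keeping the part
# after the last hyphen, for example rook-ceph-mon-b -> b.
mon_letter() {
  printf '%s\n' "${1##*-}"
}
```

A possible usage is to run enough_mons_running before scaling the controllers down, and to pass the output of mon_letter <rook-ceph-mon-name> to ceph mon rm. The pod status check does not replace the ceph -s verification of the quorum itself.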
Warning
Because the Ceph Monitor moves to a different node, the Ceph mon IP address
changes. Therefore, you need to update the Ceph mon IP address
in the related OpenStack services.
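To find the new address, one option is to read the monitor endpoints that Rook records for the cluster. The sketch below assumes that Rook keeps them in the rook-ceph-mon-endpoints ConfigMap in the rook-ceph namespace, under the data key, as a comma-separated list of <letter>=<ip>:<port> entries. Verify this on your cluster before relying on it. The function name list_mon_endpoints is hypothetical.

```shell
# Hypothetical helper. Assumption: Rook stores the monitor endpoints in
# the rook-ceph-mon-endpoints ConfigMap under the "data" key as
# "<letter>=<ip>:<port>,<letter>=<ip>:<port>,...".

# Print one "<letter>=<ip>:<port>" entry per line.
list_mon_endpoints() {
  kubectl -n rook-ceph get configmap rook-ceph-mon-endpoints \
    -o jsonpath='{.data.data}' | tr ',' '\n'
}
```

Running ceph mon dump inside the pelagia-ceph-toolbox pod, as in the procedure above, also shows the current monitor addresses.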