# Add, remove, or reconfigure Ceph nodes
Pelagia Lifecycle Management (LCM) Controller simplifies Ceph cluster management by automating LCM operations. This section describes how to add, remove, or reconfigure Ceph nodes.
Note
When adding a Ceph node with the Ceph Monitor role, if any issues occur with the Ceph Monitor, `rook-ceph` removes it and adds a new Ceph Monitor instead, named using the next alphabetic character in order. Therefore, the Ceph Monitor names may not follow the alphabetical order. For example, `a`, `b`, `d`, instead of `a`, `b`, `c`.
## Add a Ceph node
Prepare a new node for the cluster.
Open the `CephDeployment` custom resource (CR) for editing:

```bash
kubectl -n pelagia edit cephdpl
```
In the `nodes` section, specify the parameters for a Ceph node as required. For the parameter description, see Nodes parameters.

Example configuration of the `nodes` section with the new node:

```yaml
nodes:
- name: storage-worker-414
  roles:
  - mon
  - mgr
  devices:
  - config:
      deviceClass: hdd
    fullPath: /dev/disk/by-id/scsi-SATA_HGST_HUS724040AL_PN1334PEHN18ZS
```
You can also add a new node with device filters. For example:

```yaml
nodes:
- name: storage-worker-414
  roles:
  - mon
  - mgr
  config:
    deviceClass: hdd
  devicePathFilter: "^/dev/disk/by-id/scsi-SATA_HGST+*"
```
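A node can also carry devices of different classes by giving each entry in the `devices` list its own `config`. The following is a sketch based on the device-list format above; the second device path and its `ssd` class are hypothetical values for illustration:

```yaml
nodes:
- name: storage-worker-414
  roles:
  - mon
  - mgr
  devices:
  - config:
      deviceClass: hdd
    fullPath: /dev/disk/by-id/scsi-SATA_HGST_HUS724040AL_PN1334PEHN18ZS
  - config:
      deviceClass: ssd   # hypothetical second device for illustration
    fullPath: /dev/disk/by-id/nvme-EXAMPLE_SERIAL
```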
Note
To use a new Ceph node for a Ceph Monitor or Ceph Manager deployment, also specify the `roles` parameter.

Reducing the number of Ceph Monitors is not supported and causes the removal of Ceph Monitor daemons from random nodes.

Removal of the `mgr` role in the `nodes` section of the `CephDeployment` CR does not remove Ceph Managers. To remove a Ceph Manager from a node, remove it from the `nodes` spec and manually delete the `mgr` pod in the Rook namespace.
Verify that all new Ceph daemons for the specified node have been successfully deployed in the Ceph cluster. The `CephDeploymentHealth` CR `status.healthReport.cephDaemons.cephDaemons` section should not contain any issues:

```bash
kubectl -n pelagia get cephdeploymenthealth -o yaml
```
Example of system response:

```yaml
status:
  healthReport:
    cephDaemons:
      cephDaemons:
        mgr:
          info:
          - 'a is active mgr, standbys: [b]'
          status: ok
        mon:
          info:
          - 3 mons, quorum [a b c]
          status: ok
        osd:
          info:
          - 3 osds, 3 up, 3 in
          status: ok
```
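As a quick sanity check, you can scan the report for daemon sections that do not report `status: ok`. The sketch below runs against a saved copy of the example response above; on a live cluster you would save the `kubectl -n pelagia get cephdeploymenthealth -o yaml` output to the same file instead:

```bash
# Save the example health report shown above (illustration only; on a live
# cluster, redirect the kubectl output to this file instead).
cat > /tmp/health.yaml <<'EOF'
status:
  healthReport:
    cephDaemons:
      cephDaemons:
        mgr:
          info:
          - 'a is active mgr, standbys: [b]'
          status: ok
        mon:
          info:
          - 3 mons, quorum [a b c]
          status: ok
        osd:
          info:
          - 3 osds, 3 up, 3 in
          status: ok
EOF

# Count daemon status lines vs. those that read "status: ok".
# The pattern "status: " (with a trailing space) skips bare "status:" keys.
total=$(grep -c 'status: ' /tmp/health.yaml)
ok=$(grep -c 'status: ok' /tmp/health.yaml)
echo "daemon checks ok: $ok/$total"
```

If the two counts differ, inspect the `info` lines of the daemons that are not `ok`.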
## Remove a Ceph node
Ceph OSD removal requires a `CephOsdRemoveTask` CR. For a workflow overview, see Creating a Ceph OSD remove task.
Note
To remove a Ceph node with the `mon` role, first move the Ceph Monitor to another node and remove the `mon` role from the Ceph node as described in Move a Ceph Monitor daemon to another node.
Open the `CephDeployment` CR for editing:

```bash
kubectl -n pelagia edit cephdpl
```
In the `nodes` section, remove the required Ceph node specification. For example:

```yaml
spec:
  nodes:
  - name: storage-worker-5 # remove the entire entry for the required node
    devices: {...}
    roles: [...]
```
Create a YAML template for the `CephOsdRemoveTask` CR. For example:

```yaml
apiVersion: lcm.mirantis.com/v1alpha1
kind: CephOsdRemoveTask
metadata:
  name: remove-osd-worker-5
  namespace: pelagia
spec:
  nodes:
    storage-worker-5:
      completeCleanUp: true
```
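If you remove nodes regularly, the manifest can be templated from a here-document so only the node name changes between runs. A minimal sketch, using the same names as the example above:

```bash
# Sketch: generate the CephOsdRemoveTask manifest for a given node.
# NODE and TASK match the example values used throughout this section.
NODE=storage-worker-5
TASK=remove-osd-worker-5

cat > "${TASK}.yaml" <<EOF
apiVersion: lcm.mirantis.com/v1alpha1
kind: CephOsdRemoveTask
metadata:
  name: ${TASK}
  namespace: pelagia
spec:
  nodes:
    ${NODE}:
      completeCleanUp: true
EOF

echo "wrote ${TASK}.yaml"
```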
Apply the template on the Rockoon cluster:

```bash
kubectl apply -f remove-osd-worker-5.yaml
```
Verify that the corresponding task has been created:

```bash
kubectl -n pelagia get cephosdremovetask remove-osd-worker-5
```
Verify that the `removeInfo` section appeared in the `CephOsdRemoveTask` CR `status`:

```bash
kubectl -n pelagia get cephosdremovetask remove-osd-worker-5 -o yaml
```
Example of system response:

```yaml
status:
  removeInfo:
    cleanupMap:
      storage-worker-5:
        osdMapping:
          "10":
            deviceMapping:
              sdb:
                path: "/dev/disk/by-path/pci-0000:00:1t.9"
                partition: "/dev/ceph-b-vg_sdb/osd-block-b-lv_sdb"
                type: "block"
                class: "hdd"
                zapDisk: true
          "16":
            deviceMapping:
              sdc:
                path: "/dev/disk/by-path/pci-0000:00:1t.10"
                partition: "/dev/ceph-b-vg_sdb/osd-block-b-lv_sdc"
                type: "block"
                class: "hdd"
                zapDisk: true
```
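Before approving, it helps to list exactly which OSD IDs the task will remove. The sketch below pulls the quoted numeric keys under `osdMapping` out of a saved copy of the example response above; on a live cluster you would save the `kubectl ... -o yaml` output to the same file:

```bash
# Save a trimmed copy of the example removeInfo shown above (illustration
# only; on a live cluster, redirect the kubectl output to this file).
cat > /tmp/removeinfo.yaml <<'EOF'
status:
  removeInfo:
    cleanupMap:
      storage-worker-5:
        osdMapping:
          "10":
            deviceMapping:
              sdb:
                zapDisk: true
          "16":
            deviceMapping:
              sdc:
                zapDisk: true
EOF

# OSD IDs appear as quoted numeric keys, for example "10":
osd_ids=$(grep -o '"[0-9]*":' /tmp/removeinfo.yaml | tr -d '":' | xargs)
echo "OSDs to remove: $osd_ids"
```

Cross-check the printed IDs against `ceph osd tree` for the node before setting the approve flag.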
Verify that the `cleanupMap` section matches the required removal and wait for the `ApproveWaiting` phase to appear in `status`:

```bash
kubectl -n pelagia get cephosdremovetask remove-osd-worker-5 -o yaml
```
Example of system response:

```yaml
status:
  phase: ApproveWaiting
```
Edit the `CephOsdRemoveTask` CR and set the `approve` flag to `true`:

```bash
kubectl -n pelagia edit cephosdremovetask remove-osd-worker-5
```

For example:

```yaml
spec:
  approve: true
```
Review the status of the `CephOsdRemoveTask` resource processing. The valuable parameters are as follows:

- `status.phase` - the current state of task processing
- `status.messages` - the description of the current phase
- `status.conditions` - the full history of task processing before the current phase
- `status.removeInfo.issues` and `status.removeInfo.warnings` - error and warning messages that occurred during task processing
Verify that the `CephOsdRemoveTask` has been completed. For example:

```yaml
status:
  phase: Completed # or CompletedWithWarnings if there are non-critical issues
```
Remove the device cleanup jobs:

```bash
kubectl delete jobs -n pelagia -l app=pelagia-lcm-cleanup-disks
```
## Reconfigure a Ceph node
There is no hot reconfiguration procedure for existing Ceph OSDs and Ceph Monitors. To reconfigure an existing Ceph node:

1. Remove the Ceph node from the Ceph cluster.
2. Add the same Ceph node with a modified configuration.
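For example, when re-adding the node from the removal steps above, the `nodes` entry could carry the changed settings. This is an illustrative sketch: the `ssd` device class and the device path are hypothetical values for the re-added node:

```yaml
nodes:
- name: storage-worker-5
  devices:
  - config:
      deviceClass: ssd   # changed value on re-add (illustrative)
    fullPath: /dev/disk/by-id/scsi-EXAMPLE_SERIAL   # hypothetical device path
```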