Add, remove, or reconfigure Ceph nodes

Mirantis Ceph Controller simplifies Ceph cluster management by automating lifecycle management (LCM) operations. This section describes how to add, remove, or reconfigure Ceph nodes.

Note

If a Ceph Monitor runs into issues while you add a Ceph node with the Ceph Monitor role, rook-ceph removes that Ceph Monitor and adds a new one, named with the next letter of the alphabet. Therefore, Ceph Monitor names may not be consecutive. For example, a, b, d instead of a, b, c.
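
The naming effect can be illustrated with a short Python sketch. It is only a model of the behavior described in this note, not rook-ceph code: the helper `next_monitor_name` is hypothetical, and it assumes single-letter names that are never reused once allocated.

```python
import string


def next_monitor_name(names_ever_used):
    """Return the letter after the highest name allocated so far.

    Assumption: names are single lowercase letters and are never reused.
    """
    highest = max(names_ever_used)
    index = string.ascii_lowercase.index(highest)
    return string.ascii_lowercase[index + 1]


# Monitors a, b, and c were allocated; c ran into issues and was removed.
allocated = ["a", "b", "c"]
running = ["a", "b"]

# The replacement takes the next unused letter, which leaves a gap at c.
running.append(next_monitor_name(allocated))
print(running)
```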

Add Ceph nodes on a managed cluster

  1. Prepare a new machine for the required managed cluster as described in Add a machine. During machine preparation, update the related bare metal host profile with the machine devices required for the new Ceph node as described in Create a custom bare metal host profile.

  2. Open the KaaSCephCluster CR of a managed cluster for editing:

    kubectl edit kaascephcluster -n <managedClusterProjectName>
    

    Substitute <managedClusterProjectName> with the corresponding value.

  3. In the nodes section, specify the parameters for a Ceph node as required. For the parameters description, see Node parameters.

    The example configuration of the nodes section with the new node, with the storage device addressed by its by-id symlink (fullPath):

    nodes:
      kaas-node-5bgk6:
        roles:
        - mon
        - mgr
        storageDevices:
        - config:
            deviceClass: hdd
          fullPath: /dev/disk/by-id/scsi-SATA_HGST_HUS724040AL_PN1334PEHN18ZS

    Alternatively, the same node with the storage device addressed by its device name (name):

    nodes:
      kaas-node-5bgk6:
        roles:
        - mon
        - mgr
        storageDevices:
        - config:
            deviceClass: hdd
          name: sdb
    

    Warning

    Since Container Cloud 2.25.0, Mirantis highly recommends using the non-wwn by-id symlinks to specify storage devices in the storageDevices list.

    For details, see Addressing storage devices.

    Note

    • To use a new Ceph node for a Ceph Monitor or Ceph Manager deployment, also specify the roles parameter.

    • Reducing the number of Ceph Monitors is not supported and causes removal of Ceph Monitor daemons from random nodes.

    • Removing the mgr role in the nodes section of the KaaSCephCluster CR does not remove Ceph Managers. To remove a Ceph Manager from a node, remove the mgr role from the nodes spec and manually delete the mgr pod in the Rook namespace.

  4. Verify that all new Ceph daemons for the specified node have been successfully deployed in the Ceph cluster. The fullClusterInfo section should not contain any issues.

    kubectl -n <managedClusterProjectName> get kaascephcluster -o yaml
    
    Example of system response:

    status:
      fullClusterInfo:
        daemonsStatus:
          mgr:
            running: a is active mgr
            status: Ok
          mon:
            running: '3/3 mons running: [a b c] in quorum'
            status: Ok
          osd:
            running: '3/3 running: 3 up, 3 in'
            status: Ok
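
The verification above can also be scripted. The following Python sketch assumes that a single KaaSCephCluster object has already been parsed into a dictionary, for example, from the YAML output of the command above. The function name `find_unhealthy_daemons` is hypothetical, and the check relies only on the status: Ok values visible in the example response.

```python
def find_unhealthy_daemons(cluster):
    """Return {daemon: details} for every daemon whose status is not "Ok"."""
    daemons = (
        cluster.get("status", {})
        .get("fullClusterInfo", {})
        .get("daemonsStatus", {})
    )
    return {
        name: info
        for name, info in daemons.items()
        if info.get("status") != "Ok"
    }


# Parsed form of the example system response shown above.
example = {
    "status": {
        "fullClusterInfo": {
            "daemonsStatus": {
                "mgr": {"running": "a is active mgr", "status": "Ok"},
                "mon": {
                    "running": "3/3 mons running: [a b c] in quorum",
                    "status": "Ok",
                },
                "osd": {"running": "3/3 running: 3 up, 3 in", "status": "Ok"},
            }
        }
    }
}

problems = find_unhealthy_daemons(example)
print("all daemons Ok" if not problems else problems)
```

An object without a daemonsStatus section also produces an empty result, so a caller may want to handle a missing status separately.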
    

Remove a Ceph node from a managed cluster

Note

Ceph node removal requires a KaaSCephOperationRequest CR. For the workflow overview and the description of the spec and phases, see High-level workflow of Ceph OSD or node removal.

Note

To remove a Ceph node with a mon role, first move the Ceph Monitor to another node and remove the mon role from the Ceph node as described in Move a Ceph Monitor daemon to another node.

  1. Open the KaaSCephCluster CR of a managed cluster for editing:

    kubectl edit kaascephcluster -n <managedClusterProjectName>
    

    Substitute <managedClusterProjectName> with the corresponding value.

  2. In the spec.cephClusterSpec.nodes section, remove the required Ceph node specification.

    For example:

    spec:
      cephClusterSpec:
        nodes:
          worker-5: # remove the entire entry for the required node
            storageDevices: {...}
            roles: [...]
    
  3. Create a YAML template for the KaaSCephOperationRequest CR, for example, remove-osd-worker-5.yaml:

    apiVersion: kaas.mirantis.com/v1alpha1
    kind: KaaSCephOperationRequest
    metadata:
      name: remove-osd-worker-5
      namespace: <managedClusterProjectName>
    spec:
      osdRemove:
        nodes:
          worker-5:
            completeCleanUp: true
      kaasCephCluster:
        name: <kaasCephClusterName>
        namespace: <managedClusterProjectName>
    

    Substitute <managedClusterProjectName> with the corresponding cluster namespace and <kaasCephClusterName> with the corresponding KaaSCephCluster name.

  4. Apply the template on the management cluster in the corresponding namespace:

    kubectl apply -f remove-osd-worker-5.yaml
    
  5. Verify that the corresponding request has been created:

    kubectl get kaascephoperationrequest remove-osd-worker-5 -n <managedClusterProjectName>
    
  6. Verify that the removeInfo section has appeared in the status of the KaaSCephOperationRequest CR:

    kubectl -n <managedClusterProjectName> get kaascephoperationrequest remove-osd-worker-5 -o yaml
    
    Example of system response:

    status:
      childNodesMapping:
        kaas-node-d4aac64d-1721-446c-b7df-e351c3025591: worker-5
      osdRemoveStatus:
        removeInfo:
          cleanUpMap:
            kaas-node-d4aac64d-1721-446c-b7df-e351c3025591:
              osdMapping:
                "10":
                  deviceMapping:
                    sdb:
                      path: "/dev/disk/by-path/pci-0000:00:1t.9"
                      partition: "/dev/ceph-b-vg_sdb/osd-block-b-lv_sdb"
                      type: "block"
                      class: "hdd"
                      zapDisk: true
                "16":
                  deviceMapping:
                    sdc:
                      path: "/dev/disk/by-path/pci-0000:00:1t.10"
                      partition: "/dev/ceph-b-vg_sdb/osd-block-b-lv_sdc"
                      type: "block"
                      class: "hdd"
                      zapDisk: true
    
  7. Verify that the cleanUpMap section matches the required removal and wait for the ApproveWaiting phase to appear in status:

    kubectl -n <managedClusterProjectName> get kaascephoperationrequest remove-osd-worker-5 -o yaml
    

    Example of system response:

    status:
      phase: ApproveWaiting
    
  8. Edit the KaaSCephOperationRequest CR and set the approve flag to true:

    kubectl -n <managedClusterProjectName> edit kaascephoperationrequest remove-osd-worker-5
    

    For example:

    spec:
      osdRemove:
        approve: true
    
  9. Review the processing status of the KaaSCephOperationRequest resource. The key parameters are as follows:

    • status.phase - the current state of request processing

    • status.messages - the description of the current phase

    • status.conditions - full history of request processing before the current phase

    • status.removeInfo.issues and status.removeInfo.warnings - error and warning messages that occurred during request processing

  10. Verify that the KaaSCephOperationRequest has been completed. For example:

    status:
      phase: Completed # or CompletedWithWarnings if there are non-critical issues
    
  11. Remove the device cleanup jobs:

    kubectl delete jobs -n ceph-lcm-mirantis -l app=miraceph-cleanup-disks
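
The data handled in the removal steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not part of the product tooling: the helper names are hypothetical, the manifest fields mirror the example template, the status layout mirrors the example responses, and the values ceph-cluster and managed-ns are placeholders.

```python
def build_removal_request(node, cluster_name, namespace):
    """Build the node removal request from the example template as a dict."""
    return {
        "apiVersion": "kaas.mirantis.com/v1alpha1",
        "kind": "KaaSCephOperationRequest",
        "metadata": {"name": f"remove-osd-{node}", "namespace": namespace},
        "spec": {
            "osdRemove": {"nodes": {node: {"completeCleanUp": True}}},
            "kaasCephCluster": {"name": cluster_name, "namespace": namespace},
        },
    }


def devices_to_clean(status):
    """Flatten cleanUpMap into (host, osd_id, device, zap_disk) tuples."""
    clean_up_map = (
        status.get("osdRemoveStatus", {})
        .get("removeInfo", {})
        .get("cleanUpMap", {})
    )
    rows = []
    for host, host_info in clean_up_map.items():
        for osd_id, osd_info in host_info.get("osdMapping", {}).items():
            for device, details in osd_info.get("deviceMapping", {}).items():
                rows.append((host, osd_id, device, details.get("zapDisk", False)))
    return rows


def next_action(phase):
    """Map a request phase to the operator action named in the steps above."""
    if phase == "ApproveWaiting":
        return "review cleanUpMap, then set spec.osdRemove.approve to true"
    if phase in ("Completed", "CompletedWithWarnings"):
        return "remove the device cleanup jobs"
    # The steps above name no action for other phases, so keep watching.
    return "wait and re-check status.phase and status.messages"


# Trimmed, combined copy of the example status responses shown above.
example_status = {
    "phase": "ApproveWaiting",
    "osdRemoveStatus": {"removeInfo": {"cleanUpMap": {
        "kaas-node-d4aac64d-1721-446c-b7df-e351c3025591": {"osdMapping": {
            "10": {"deviceMapping": {"sdb": {"zapDisk": True}}},
            "16": {"deviceMapping": {"sdc": {"zapDisk": True}}},
        }},
    }}},
}

# "ceph-cluster" and "managed-ns" are placeholder values.
request = build_removal_request("worker-5", "ceph-cluster", "managed-ns")
for row in devices_to_clean(example_status):
    print(row)
print(next_action(example_status["phase"]))
```

Listing the devices before approval makes it easier to confirm that cleanUpMap matches the intended removal, which is the check that the ApproveWaiting phase exists for.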
    

Reconfigure a Ceph node on a managed cluster

There is no hot reconfiguration procedure for existing Ceph OSDs and Ceph Monitors. To reconfigure an existing Ceph node, follow the steps below:

  1. Remove the Ceph node from the Ceph cluster as described in Remove a Ceph node from a managed cluster.

  2. Add the same Ceph node with the modified configuration as described in Add Ceph nodes on a managed cluster.
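
Before re-adding the node, the modified node entry can be checked against the storage device recommendation from this section. The Python sketch below assumes the node entry has been parsed into a dictionary. The helper `storage_device_warnings` is hypothetical and only encodes the recommendation to use non-wwn by-id symlinks; it does not validate any other parameter.

```python
def storage_device_warnings(node_spec):
    """Return warnings for devices not addressed by a non-wwn by-id symlink."""
    warnings = []
    for index, device in enumerate(node_spec.get("storageDevices", [])):
        path = device.get("fullPath")
        if path is None:
            warnings.append(
                f"storageDevices[{index}]: no fullPath set "
                f"(name={device.get('name')!r})"
            )
        elif not path.startswith("/dev/disk/by-id/"):
            warnings.append(f"storageDevices[{index}]: {path} is not a by-id symlink")
        elif path.rsplit("/", 1)[-1].startswith("wwn-"):
            warnings.append(f"storageDevices[{index}]: {path} is a wwn symlink")
    return warnings


# The two example node entries from the procedure for adding Ceph nodes.
by_id_node = {
    "roles": ["mon", "mgr"],
    "storageDevices": [{
        "config": {"deviceClass": "hdd"},
        "fullPath": "/dev/disk/by-id/scsi-SATA_HGST_HUS724040AL_PN1334PEHN18ZS",
    }],
}
by_name_node = {
    "roles": ["mon", "mgr"],
    "storageDevices": [{"config": {"deviceClass": "hdd"}, "name": "sdb"}],
}

print(storage_device_warnings(by_id_node))
print(storage_device_warnings(by_name_node))
```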