
Move a Ceph Monitor daemon to another node

Warning

This procedure is valid for MOSK clusters that use the MiraCeph custom resource (CR), which is available since MOSK 25.2 to replace the deprecated KaaSCephCluster. For the equivalent procedure with the KaaSCephCluster CR, refer to the following section:

Move a Ceph Monitor daemon to another node

This document describes how to migrate a Ceph Monitor daemon from one node to another without changing the total number of Ceph Monitors in the cluster. In the Ceph Controller concept, migrating a Ceph Monitor means manually removing it from one node and adding it to another.

Consider the following example placement scheme of Ceph Monitors in the nodes spec of the MiraCeph CR:

nodes:
  node-1:
    roles:
    - mon
    - mgr
  node-2:
    roles:
    - mgr

Using the example above, if you want to move the Ceph Monitor from node-1 to node-2 without changing the number of Ceph Monitors, the resulting roles configuration in the nodes spec must be as follows:

nodes:
  node-1:
    roles:
    - mgr
  node-2:
    roles:
    - mgr
    - mon

However, due to a Rook limitation related to the Kubernetes architecture, moving the Ceph Monitor through the MiraCeph CR does not apply the changes automatically. This is caused by the following Rook behavior:

  • Rook creates Ceph Monitor resources as deployments with a nodeSelector that binds each Ceph Monitor pod to the requested node.

  • Rook does not recreate Ceph Monitors with the new node placement while the current mon quorum is healthy.

Therefore, to move a Ceph Monitor to another node, you must also manually apply the new Ceph Monitor placement to the Ceph cluster as described below.
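
For example, you can inspect how each Ceph Monitor deployment is pinned to its node. The following read-only command is a quick check; it relies only on the standard app=rook-ceph-mon label that Rook sets:

  kubectl -n rook-ceph get deploy -l app=rook-ceph-mon -o jsonpath='{range .items[*]}{.metadata.name}{" -> "}{.spec.template.spec.nodeSelector}{"\n"}{end}'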

To move a Ceph Monitor to another node:

  1. Open the MiraCeph CR on the MOSK cluster for editing:

    kubectl -n ceph-lcm-mirantis edit miraceph
    
  2. In the nodes spec of the MiraCeph CR, change the mon roles placement without changing the total number of mon roles. For details, see the example above. Note the nodes from which you removed the mon role and save their names.

  3. If you perform a MOSK cluster update, complete the following additional steps:

    1. Verify that the following conditions are met before proceeding to the next step:

      • There are at least 2 running and available Ceph Monitors so that the Ceph cluster is accessible during the Ceph Monitor migration:

        kubectl -n rook-ceph get pod -l app=rook-ceph-mon
        kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph -s
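
        Example of system response fragment with three Ceph Monitors in quorum (the letters and age vary per cluster):

          mon: 3 daemons, quorum a,b,c (age 2h)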
        
      • The MiraCeph object on the MOSK cluster has the required node with the mon role added in the nodes section of its spec:

        kubectl -n ceph-lcm-mirantis get miraceph -o yaml
        
      • The Ceph NodeWorkloadLock for the required node is created:

        kubectl get nodeworkloadlock -o jsonpath='{range .items[?(@.spec.nodeName == "<desiredNodeName>")]}{@.metadata.name}{"\n"}{end}' | grep ceph
        
    2. Scale the ceph-maintenance-controller deployment to 0 replicas:

      kubectl -n ceph-lcm-mirantis scale deploy ceph-maintenance-controller --replicas 0
      
    3. Manually edit the MOSK cluster node labels: remove the ceph_role_mon label from the obsolete node and add this label to the new node:

      kubectl label node <obsoleteNodeName> ceph_role_mon-
      kubectl label node <newNodeName> ceph_role_mon=true
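
      To confirm that the label moved, you can list the nodes that currently carry it (an optional check):

        kubectl get node -l ceph_role_mon=true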
      
    4. Scale the rook-ceph-operator deployment to 0 replicas:

      kubectl -n rook-ceph scale deploy rook-ceph-operator --replicas 0
      
  4. Obtain the name of the rook-ceph-mon deployment placed on the obsolete node, using the previously saved node name:

    kubectl -n rook-ceph get deploy -l app=rook-ceph-mon -o jsonpath="{.items[?(@.spec.template.spec.nodeSelector['kubernetes\.io/hostname'] == '<nodeName>')].metadata.name}"
    

    Substitute <nodeName> with the name of the node where you removed the mon role.
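
    Example of system response, assuming that the obsolete node hosted mon-b (the actual Ceph Monitor letter varies):

      rook-ceph-mon-b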

  5. Back up the rook-ceph-mon deployment placed on the obsolete node:

    kubectl -n rook-ceph get deploy <rook-ceph-mon-name> -o yaml > <rook-ceph-mon-name>-backup.yaml
    
  6. Remove the rook-ceph-mon deployment placed on the obsolete node:

    kubectl -n rook-ceph delete deploy <rook-ceph-mon-name>
    
  7. If you perform a MOSK cluster update, complete the following additional steps:

    1. Enter the ceph-tools pod:

      kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash
      
    2. Remove the Ceph Monitor from the Ceph monmap by letter:

      ceph mon rm <monLetter>
      

      Substitute <monLetter> with the old Ceph Monitor letter. For example, mon-b has the letter b.
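
      For example, if the obsolete Ceph Monitor was mon-b:

        ceph mon rm b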

    3. Verify that the Ceph cluster does not have any information about the removed Ceph Monitor:

      ceph mon dump
      ceph -s
      
    4. Exit the ceph-tools pod.

    5. Scale up the rook-ceph-operator deployment to 1 replica:

      kubectl -n rook-ceph scale deploy rook-ceph-operator --replicas 1
      
    6. Wait for the missing Ceph Monitor failover process to start:

      kubectl -n rook-ceph logs -l app=rook-ceph-operator -f
      

      Example of log extract:

      2024-03-01 12:33:08.741215 W | op-mon: mon b NOT found in ceph mon map, failover
      2024-03-01 12:33:08.741244 I | op-mon: marking mon "b" out of quorum
      ...
      2024-03-01 12:33:08.766822 I | op-mon: Failing over monitor "b"
      2024-03-01 12:33:08.766881 I | op-mon: starting new mon...
      
  8. Select one of the following options:

    • If you do not perform a MOSK cluster update, wait approximately 10 minutes until rook-ceph-operator performs a failover of the Pending mon pod. Inspect the logs during the failover process:

    kubectl -n rook-ceph logs -l app=rook-ceph-operator -f
    

    Example of log extract:

    2021-03-15 17:48:23.471978 W | op-mon: mon "a" not found in quorum, waiting for timeout (554 seconds left) before failover
    

    Note

    If the failover process fails:

    1. Scale down the rook-ceph-operator deployment to 0 replicas.

    2. Apply the backed-up rook-ceph-mon deployment.

    3. Scale back the rook-ceph-operator deployment to 1 replica.
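
    To apply the backed-up rook-ceph-mon deployment from step 2 of this note, you can, for example, use the backup file created in step 5 of this procedure:

      kubectl -n rook-ceph apply -f <rook-ceph-mon-name>-backup.yaml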

    • If you perform a MOSK cluster update:

      1. Scale the rook-ceph-operator deployment to 0 replicas:

        kubectl -n rook-ceph scale deploy rook-ceph-operator --replicas 0

      2. Scale the ceph-maintenance-controller deployment to 3 replicas:

        kubectl -n ceph-lcm-mirantis scale deploy ceph-maintenance-controller --replicas 3
      

Once done, Rook removes the obsolete Ceph Monitor from the node and creates a new one on the specified node with a new letter. For example, if the a, b, and c Ceph Monitors were in quorum and mon-c was obsolete, Rook removes mon-c and creates mon-d. In this case, the new quorum includes the a, b, and d Ceph Monitors.
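
To verify the new quorum after the failover completes, you can reuse the commands from this procedure (the Ceph Monitor letters in the output depend on your cluster):

  kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph mon dump
  kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph -s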