Prerequisites for a Ceph cluster distributed over L3 domains

Note

This feature is available starting from the MCP 2019.2.5 maintenance update. Before enabling the feature, follow the steps described in Apply maintenance updates.

Before deploying a Ceph cluster with nodes in different L3 compartments, consider the following prerequisite steps. Otherwise, proceed to Deploy a Ceph cluster right away.

This document uses the terms failure domain and L3 compartment. A failure domain is a logical representation of the physical cluster structure and does not have to match the L3 segmentation. For example, if one L3 segment spans two racks and another spans a single rack, the failure domains lie along the rack boundaries rather than along the L3 segments.

  1. Verify your networking configuration:

    Note

    Networking verification may vary depending on the hardware used for the deployment. Use the following steps as a reference only.

    1. To ensure the highest level of availability, verify that the Ceph Monitor and RADOS Gateway nodes are distributed as evenly as possible across the failure domains.

    2. For the best data distribution, verify that each L3 compartment contains the same number of OSD nodes and OSDs with the same weights:

      1. In classes/cluster/cluster_name/ceph/osd.yml, verify the weights of the Ceph OSDs. For example:

        backend:
          bluestore:
            disks:
              - dev: /dev/vdc
                block_db: /dev/vdd
                class: hdd
                weight: 1.5
        
      2. In classes/cluster/cluster_name/infra/config/nodes.yml, verify the number of OSDs.

    3. Verify the connection between the nodes from different compartments through public or cluster VLANs. To use different subnets for the Ceph nodes in different compartments, specify all subnets in classes/cluster/cluster_name/ceph/common.yml. For example:

      parameters:
        ceph:
          common:
            public_network: 10.10.0.0/24, 10.10.1.0/24
            cluster_network: 10.11.0.0/24, 10.11.1.0/24
      
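As a sketch of what the multi-subnet configuration above implies, the following snippet uses the standard Python `ipaddress` module to check whether a node address belongs to one of the comma-separated subnets; the node addresses are illustrative:

```python
import ipaddress

# Subnets in the comma-separated format used in ceph/common.yml above;
# the node addresses tested below are illustrative.
public_network = "10.10.0.0/24, 10.10.1.0/24"

def in_any_subnet(address, networks):
    """Check whether `address` falls inside any of the comma-separated CIDRs."""
    nets = [ipaddress.ip_network(net.strip()) for net in networks.split(",")]
    return any(ipaddress.ip_address(address) in net for net in nets)

print(in_any_subnet("10.10.1.17", public_network))  # True
print(in_any_subnet("10.12.0.5", public_network))   # False
```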
  2. Prepare the CRUSHMAP:

    1. To ensure at least one data replica in every failure domain, group the Ceph OSD nodes from each compartment by defining the ceph_crush_parent parameter in classes/cluster/cluster_name/infra/config/nodes.yml for each Ceph OSD node. For example, for three Ceph OSDs in rack01:

      ceph_osd_rack01:
        name: ${_param:ceph_osd_rack01_hostname}<<count>>
        domain: ${_param:cluster_domain}
        classes:
          - cluster.${_param:cluster_name}.ceph.osd
        repeat:
          count: 3
          ip_ranges:
            single_address: 10.11.11.1-10.11.20.255
            backend_address: 10.12.11.1-10.12.20.255
            ceph_public_address: 10.13.11.1-10.13.20.255
          start: 1
          digits: 0
          params:
            single_address:
              value: <<single_address>>
            backend_address:
              value: <<backend_address>>
            ceph_public_address:
              value: <<ceph_public_address>>
        params:
          salt_master_host: ${_param:reclass_config_master}
          ceph_crush_parent: rack01
          linux_system_codename: xenial
      
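To make the effect of the repeat block above concrete, the following Python sketch mimics its expansion. This is not the real reclass implementation; it only assumes that <<count>> is substituted with the node index (start: 1, digits: 0) and that addresses are assigned sequentially from the start of each range. The "osd" prefix stands in for ${_param:ceph_osd_rack01_hostname}:

```python
# Illustrative expansion of the reclass `repeat` block; assumptions as above.
hostname_prefix = "osd"  # stand-in for ${_param:ceph_osd_rack01_hostname}
count, start = 3, 1

nodes = []
for i in range(start, start + count):
    nodes.append({
        "name": f"{hostname_prefix}{i}",
        "single_address": f"10.11.11.{i}",  # from the single_address range
        "ceph_crush_parent": "rack01",      # shared CRUSH parent for the rack
    })

for node in nodes:
    print(node["name"], node["single_address"], node["ceph_crush_parent"])
```

The key point is the last parameter: every node generated by this block carries the same ceph_crush_parent, so all three OSD nodes end up under the rack01 bucket in the CRUSH hierarchy.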
    2. In /classes/cluster/cluster_name/ceph/setup.yml, create a new CRUSHMAP and define the failure domains. For example, to have three copies of each object distributed over rack01, rack02, rack03:

      parameters:
        ceph:
          setup:
            crush:
              enforce: false # set this to true only if you want to enforce
                             # the CRUSHMAP with the ceph.setup state!
              type: # define any non-standard bucket type here
                - root
                - room
                - rack
                - host
                - osd
              root:
                - name: default
              room:
                - name: room1
                  parent: default
                - name: room2
                  parent: default
                - name: room3
                  parent: default
              rack:
                - name: rack01 # OSD nodes defined in the previous step
                               # will be added to this rack
                  parent: room1
                - name: rack02
                  parent: room2
                - name: rack03
                  parent: room3
              rule:
                default:
                  ruleset: 0
                  type: replicated
                  min_size: 2
                  max_size: 10
                  steps:
                    - take default
                    - chooseleaf firstn 0 type room
                    - emit
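The placement invariant that the chooseleaf rule above is meant to guarantee, one replica per room, can be illustrated with a toy Python sketch. This is not the real CRUSH algorithm, only the invariant it enforces; the host names are illustrative:

```python
import random

# Simplified hierarchy mirroring the CRUSHMAP above: one rack per room,
# three OSD hosts per rack (host names are illustrative).
hierarchy = {
    "room1": {"rack01": ["osd1", "osd2", "osd3"]},
    "room2": {"rack02": ["osd4", "osd5", "osd6"]},
    "room3": {"rack03": ["osd7", "osd8", "osd9"]},
}

def chooseleaf_per_room(tree, replicas=3, seed=0):
    """Toy stand-in for `chooseleaf firstn 0 type room`: pick one leaf host
    under each room so every replica lands in a distinct failure domain."""
    rng = random.Random(seed)
    placement = []
    for room, racks in list(tree.items())[:replicas]:
        rack = rng.choice(sorted(racks))
        placement.append((room, rack, rng.choice(racks[rack])))
    return placement

for room, rack, host in chooseleaf_per_room(hierarchy):
    print(room, rack, host)
```

Because each replica is chosen under a different room bucket, losing an entire room (and the rack inside it) still leaves two copies of every object available.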

Once done, proceed to Deploy a Ceph cluster.