Prerequisites for a Ceph cluster distributed over L3 domains

Note

This feature is available starting from the MCP 2019.2.5 maintenance update. Before enabling the feature, follow the steps described in Apply maintenance updates.

Before deploying a Ceph cluster with nodes in different L3 compartments, consider the following prerequisite steps. Otherwise, proceed to Deploy a Ceph cluster right away.

This document uses the terms failure domain and L3 compartment. A failure domain is a logical representation of the physical cluster structure and does not have to match the L3 segmentation. For example, if one L3 segment spans two racks and another spans a single rack, the failure domains lie along the rack boundaries rather than along the L3 segments.

  1. Verify your networking configuration:

    Note

    Networking verification may vary depending on the hardware used for the deployment. Use the following steps as a reference only.

    1. To ensure the highest level of availability, verify that the Ceph Monitor and RADOS Gateway nodes are distributed as evenly as possible across the failure domains.

    2. For the best data distribution, verify that each L3 compartment contains the same number of OSD nodes and OSDs with the same weights:

      1. In classes/cluster/cluster_name/ceph/osd.yml, verify the weights of the Ceph OSDs. For example:

        backend:
          bluestore:
            disks:
              - dev: /dev/vdc
                block_db: /dev/vdd
                class: hdd
                weight: 1.5
        
      2. In classes/cluster/cluster_name/infra/config/nodes.yml, verify the number of OSDs.

    3. Verify the connection between the nodes from different compartments through public or cluster VLANs. To use different subnets for the Ceph nodes in different compartments, specify all subnets in classes/cluster/cluster_name/ceph/common.yml. For example:

      parameters:
        ceph:
          common:
            public_network: 10.10.0.0/24, 10.10.1.0/24
            cluster_network: 10.11.0.0/24, 10.11.1.0/24
      
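As a sketch of what the multi-subnet configuration above implies, the following snippet uses the standard Python `ipaddress` module to check whether a node address belongs to one of the comma-separated subnets; the node addresses are illustrative:

```python
import ipaddress

# Subnets in the comma-separated format used in ceph/common.yml above;
# the node addresses tested below are illustrative.
public_network = "10.10.0.0/24, 10.10.1.0/24"

def in_any_subnet(address, networks):
    """Check whether `address` falls inside any of the comma-separated CIDRs."""
    nets = [ipaddress.ip_network(net.strip()) for net in networks.split(",")]
    return any(ipaddress.ip_address(address) in net for net in nets)

print(in_any_subnet("10.10.1.17", public_network))  # True
print(in_any_subnet("10.12.0.5", public_network))   # False
```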
  2. Prepare the CRUSHMAP:

    1. To ensure at least one data replica in every failure domain, group the Ceph OSD nodes from each compartment by defining the ceph_crush_parent parameter in classes/cluster/cluster_name/infra/config/nodes.yml for each Ceph OSD node. For example, for three Ceph OSDs in rack01:

      ceph_osd_rack01:
        name: ${_param:ceph_osd_rack01_hostname}<<count>>
        domain: ${_param:cluster_domain}
        classes:
          - cluster.${_param:cluster_name}.ceph.osd
        repeat:
          count: 3
          ip_ranges:
            single_address: 10.11.11.1-10.11.20.255
            backend_address: 10.12.11.1-10.12.20.255
            ceph_public_address: 10.13.11.1-10.13.20.255
          start: 1
          digits: 0
          params:
            single_address:
              value: <<single_address>>
            backend_address:
              value: <<backend_address>>
            ceph_public_address:
              value: <<ceph_public_address>>
        params:
          salt_master_host: ${_param:reclass_config_master}
          ceph_crush_parent: rack01
          linux_system_codename: xenial
      
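To make the effect of the repeat block above concrete, the following Python sketch mimics its expansion. This is not the real reclass implementation; it only assumes that <<count>> is substituted with the node index (start: 1, digits: 0) and that addresses are assigned sequentially from the start of each range. The "osd" prefix stands in for ${_param:ceph_osd_rack01_hostname}:

```python
# Illustrative expansion of the reclass `repeat` block; assumptions as above.
hostname_prefix = "osd"  # stand-in for ${_param:ceph_osd_rack01_hostname}
count, start = 3, 1

nodes = []
for i in range(start, start + count):
    nodes.append({
        "name": f"{hostname_prefix}{i}",
        "single_address": f"10.11.11.{i}",  # from the single_address range
        "ceph_crush_parent": "rack01",      # shared CRUSH parent for the rack
    })

for node in nodes:
    print(node["name"], node["single_address"], node["ceph_crush_parent"])
```

The key point is the last parameter: every node generated by this block carries the same ceph_crush_parent, so all three OSD nodes end up under the rack01 bucket in the CRUSH hierarchy.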
    2. In /classes/cluster/cluster_name/ceph/setup.yml, create a new CRUSHMAP and define the failure domains. For example, to have three copies of each object distributed over rack01, rack02, rack03:

      parameters:
        ceph:
          setup:
            crush:
              enforce: false # set this to true only if you want to enforce
                             # the CRUSHMAP with the ceph.setup state!
              type: # define any non-standard bucket type here
                - root
                - room
                - rack
                - host
                - osd
              root:
                - name: default
              room:
                - name: room1
                  parent: default
                - name: room2
                  parent: default
                - name: room3
                  parent: default
              rack:
                - name: rack01 # OSD nodes defined in the previous step
                               # will be added to this rack
                  parent: room1
                - name: rack02
                  parent: room2
                - name: rack03
                  parent: room3
              rule:
                default:
                  ruleset: 0
                  type: replicated
                  min_size: 2
                  max_size: 10
                  steps:
                    - take default
                    - chooseleaf firstn 0 type room
                    - emit
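The placement invariant that the chooseleaf rule above is meant to guarantee, one replica per room, can be illustrated with a toy Python sketch. This is not the real CRUSH algorithm, only the invariant it enforces; the host names are illustrative:

```python
import random

# Simplified hierarchy mirroring the CRUSHMAP above: one rack per room,
# three OSD hosts per rack (host names are illustrative).
hierarchy = {
    "room1": {"rack01": ["osd1", "osd2", "osd3"]},
    "room2": {"rack02": ["osd4", "osd5", "osd6"]},
    "room3": {"rack03": ["osd7", "osd8", "osd9"]},
}

def chooseleaf_per_room(tree, replicas=3, seed=0):
    """Toy stand-in for `chooseleaf firstn 0 type room`: pick one leaf host
    under each room so every replica lands in a distinct failure domain."""
    rng = random.Random(seed)
    placement = []
    for room, racks in list(tree.items())[:replicas]:
        rack = rng.choice(sorted(racks))
        placement.append((room, rack, rng.choice(racks[rack])))
    return placement

for room, rack, host in chooseleaf_per_room(hierarchy):
    print(room, rack, host)
```

Because each replica is chosen under a different room bucket, losing an entire room (and the rack inside it) still leaves two copies of every object available.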

Once done, proceed to Deploy a Ceph cluster.