Additional Ceph considerations

When planning storage for your cloud, you must consider performance, capacity, and operational requirements that affect the efficiency of your MCP environment.

Based on these considerations and operational experience, Mirantis recommends a Ceph cluster of at least nine nodes for OpenStack production environments and at least five nodes for test, development, or PoC environments.

Note

This section provides simplified calculations for your reference. Each Ceph cluster must be evaluated by a Mirantis Solution Architect.

Capacity

When planning capacity for your Ceph cluster, consider the following:

  • Total usable capacity

    The existing amount of data plus the expected increase of data volume over the projected life of the cluster.

  • Data protection (replication)

    Typically, a replication factor of 3 is recommended for persistent storage, while a factor of 2 is sufficient for ephemeral storage. However, with a replication factor of 2, a damaged object cannot be reliably recovered, because there is no third copy to determine which of the two replicas is correct.

  • Cluster overhead

    To ensure cluster integrity, Ceph stops accepting writes when the cluster is 90% full. Reserve this headroom when you plan the raw capacity.

  • Administrative overhead

    To catch spikes in cluster usage or unexpected increases in data volume, an additional 10-15% of the raw capacity should be set aside.

  • BlueStore back end

    The official Ceph documentation recommends that the BlueStore block.db device be no smaller than 4% of the block device size. For example, for a 1 TB block device, the block.db device must be at least 40 GB. Salt formulas do not perform complex calculations on these parameters, so plan the cloud storage accordingly.
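
    The following Python sketch only illustrates the 4% rule above; the function name and the decimal TB-to-GB conversion are illustrative assumptions, not part of Ceph or MCP tooling:

    # Minimal sketch of the BlueStore sizing rule: block.db >= 4% of the block device.
    # Assumes decimal units (1 TB = 1000 GB); adjust if you size in binary units.
    BLOCK_DB_RATIO = 0.04

    def min_block_db_gb(block_device_tb: float) -> float:
        """Return the minimum block.db size in GB for a block device given in TB."""
        return block_device_tb * 1000 * BLOCK_DB_RATIO

    print(min_block_db_gb(1.0))   # 1 TB block device -> 40.0 GB block.db
    print(min_block_db_gb(4.0))   # 4 TB block device -> 160.0 GB block.db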

The following table describes an example of capacity calculation:

Example calculation

Parameter                            Value
------------------------------------------
Current capacity (persistent)        500 TB
Expected growth over 3 years         300 TB
Required usable capacity             800 TB
Replication factor for all pools     3
Raw capacity                         2.4 PB
With 10% cluster internal reserve    2.64 PB
With operational reserve of 15%      3.03 PB
Total cluster capacity               3 PB
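
As a cross-check, the arithmetic behind this table can be reproduced with a few lines of Python; the figures are only those from the example above and the variable names are illustrative:

    usable_tb = 500 + 300                            # current capacity + expected growth (TB)
    replication_factor = 3
    raw_tb = usable_tb * replication_factor          # 2400 TB, that is, 2.4 PB raw capacity
    with_cluster_reserve = raw_tb * 1.10             # +10% cluster internal reserve = 2640 TB (2.64 PB)
    with_operational_reserve = with_cluster_reserve * 1.15   # +15% operational reserve
    print(round(with_operational_reserve))           # 3036 TB, that is, about 3 PB total cluster capacity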

Overall sizing

When you have both performance and capacity requirements, scale the cluster size to the higher requirement. For example, if a Ceph cluster requires 10 nodes for capacity and 20 nodes for performance to meet requirements, size the cluster to 20 nodes.

Operational recommendations

  • A minimum of 9 Ceph OSD nodes is recommended to ensure that a node failure does not impact cluster performance.
  • Mirantis does not recommend using servers with an excessive number of disks, such as more than 24 disks per server.
  • All Ceph OSD nodes must have identical CPU, memory, disk, and network hardware configurations.
  • If you use multiple availability zones (AZs), the number of nodes must be evenly divisible by the number of AZs.

Performance considerations

When planning performance for your Ceph cluster, consider the following:

  • Raw performance capability of the storage devices. For example, a SATA hard drive provides 150 IOPS for 4k blocks.

  • Ceph read IOPS performance. Calculate it using the following formula:

    number of raw read IOPS per device X number of storage devices X 80%
    
  • Ceph write IOPS performance. Calculate it using the following formula:

    number of raw write IOPS per device X number of storage devices / replication factor X 65%
    
  • Ratio between reads and writes. Perform a rough calculation using the following formula:

    read IOPS X % reads + write IOPS X % writes
    

Note

Do not use these formulas for a Ceph cluster that is based on SSDs only. Technical specifications of SSDs vary significantly; therefore, the performance of SSD-only Ceph clusters must be evaluated individually for each model.
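
For illustration, the following Python sketch applies these formulas to a hypothetical cluster of spinning disks. The device count, the per-device IOPS rating, and the 70/30 read/write ratio are assumptions made for the example, not recommendations:

    # Hypothetical inputs: 9 nodes with 20 SATA drives each, 150 raw IOPS per drive
    # (see the 4k-block example above), replication factor 3, 70/30 read/write ratio.
    devices = 9 * 20
    raw_iops_per_device = 150
    replication_factor = 3

    read_iops = raw_iops_per_device * devices * 0.80
    write_iops = raw_iops_per_device * devices / replication_factor * 0.65
    blended_iops = read_iops * 0.70 + write_iops * 0.30

    print(read_iops, write_iops, round(blended_iops))   # 21600.0 5850.0 16875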

Storage device considerations

  • The expected number of IOPS that a storage device can carry out, as well as its throughput, depends on the type of device. For example, a hard disk may be rated for 150 IOPS and 75 MB/s. These numbers are complementary because IOPS are measured with very small blocks, while throughput is typically measured with large blocks.

    Read IOPS and write IOPS differ depending on the device. Considering typical usage patterns helps determine how many read and write IOPS the cluster must provide. A 70/30 read/write ratio is fairly common for many types of clusters. You must also consider the replication factor (pool size), because the maximum number of write IOPS that a cluster can sustain is divided by it. Furthermore, Ceph cannot deliver the full IOPS numbers that a device could theoretically provide, both because vendor figures are typically measured in ideal testing environments that a production Ceph cluster cannot reproduce, and because of the OSD and network overhead.

    You can calculate estimated read IOPS by multiplying the read IOPS number for the device type by the number of devices, and then multiplying by ~0.8. Write IOPS are calculated as follows:

    (device write IOPS * number of devices * 0.65) / replication factor
    

    If the replication factor (pool size) differs between pools, you can use an average. If you need to determine the number of devices, solve the respective formulas for the number of devices instead.
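
    If you instead size for a target IOPS figure, the write formula can be solved for the device count as in the following Python sketch; the target and per-device figures are assumptions made for the example only:

    import math

    # Solve  write IOPS = device write IOPS * devices * 0.65 / replication factor  for devices.
    target_write_iops = 10000      # assumed target for the example
    device_write_iops = 150        # assumed per-device rating
    replication_factor = 3

    devices_needed = math.ceil(target_write_iops * replication_factor / (device_write_iops * 0.65))
    print(devices_needed)          # 308 devices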

  • Consider disk weights. For Ceph Nautilus, the balancer module actively balances the usage of each disk by default. For Ceph Luminous, the weight is set based on the disk size by default. To set a disk-specific weight, specify it as an integer in the weight field of the Ceph OSD definition, as in the following example. However, Mirantis does not recommend setting Ceph OSD weights manually.

    disks:
      - dev: /dev/vdc
        ...
        weight: 5  # optional integer weight for this OSD; omit to keep the weight that Ceph assigns by default