Ceph OSD hardware considerations

When sizing a Ceph cluster, you must consider both the number of drives required for capacity and the number of drives required to meet performance requirements, and then size the cluster for the larger of the two numbers so that all requirements are met.

The following list describes generic hardware considerations for a Ceph cluster (a per-node sizing sketch follows the list):

  • Use HDD storage devices for Ceph Object Storage Devices (OSDs). Ceph is designed to work on commercial off-the-shelf (COTS) hardware. Most disk devices from major vendors are supported.
  • Create one OSD per HDD in Ceph OSD nodes.
  • Allocate 1 CPU thread per OSD.
  • Allocate 1 GB of RAM per 1 TB of disk storage on the OSD node.
  • Disks for OSDs must be presented to the system as individual devices.
    • This can be achieved by using Host Bus Adapter (HBA) mode for the disk controller.
    • RAID controllers are only acceptable if disks can be presented to the operating system as individual devices (JBOD or HBA mode).
  • Place Ceph write journals on write-optimized SSDs instead of on the OSD HDDs. Use one SSD journal device for every four to five OSD HDDs.
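
As a rough illustration of the rules above, the following Python sketch derives the per-node resource allocation for a given chassis. The five-HDDs-per-journal-SSD ratio and the example 20-disk, 2 TB configuration are illustrative assumptions, not fixed requirements.

    import math

    def osd_node_resources(hdd_count, hdd_size_tb, hdds_per_journal_ssd=5):
        """Per-node resources implied by the guidelines above (illustrative only)."""
        return {
            "osds": hdd_count,                         # one OSD per HDD
            "cpu_threads": hdd_count,                  # 1 CPU thread per OSD
            "ram_gb": hdd_count * hdd_size_tb,         # 1 GB of RAM per 1 TB of OSD storage
            "journal_ssds": math.ceil(hdd_count / hdds_per_journal_ssd),
        }

    # Example: a 20-disk chassis with 2 TB HDDs (assumed values)
    print(osd_node_resources(hdd_count=20, hdd_size_tb=2))
    # {'osds': 20, 'cpu_threads': 20, 'ram_gb': 40, 'journal_ssds': 4}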

The following table provides an example of input parameters for a Ceph cluster calculation; a calculation sketch follows the table:

Example of input parameters

    Parameter                      Value
    ----------------------------   ------
    Virtual instance size          40 GB
    Read IOPS                      14
    Read to write IOPS ratio       70/30
    Number of availability zones   3
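
The sketch below turns these inputs into cluster-wide requirements. It is a minimal sketch that assumes the table values are per-instance figures, a replication factor of 3 (matching the example clusters below), and that the 70/30 ratio can be used to derive the write IOPS requirement; per-drive capacity and IOPS figures are left to your own hardware assumptions.

    instances = 1000                  # for example, 50 compute nodes with 20 instances each
    instance_size_gb = 40
    read_iops_per_instance = 14
    read_share, write_share = 0.7, 0.3
    replication = 3

    # Capacity requirement: instance data plus replication overhead.
    data_tb = instances * instance_size_gb / 1000      # 40 TB of instance data
    min_raw_tb = data_tb * replication                 # at least 120 TB of raw disk

    # Performance requirement: derive the write IOPS from the 70/30 ratio.
    read_iops = instances * read_iops_per_instance     # 14,000 read IOPS
    write_iops = read_iops * write_share / read_share  # roughly 6,000 write IOPS

    # Size the cluster for whichever of the capacity-driven or IOPS-driven
    # drive counts is larger, given the per-drive capacity and IOPS you assume.
    print(data_tb, min_raw_tb, read_iops, write_iops)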

For 50 compute nodes, 1,000 instances

Number of OSD nodes: 9, 20-disk 2U chassis

This configuration provides 360 TB of raw storage. With a replication factor of 3 and 60% initial utilization, the initial amount of data should not exceed 72 TB (out of 120 TB of replicated storage). The expected read IOPS for this cluster is approximately 20,000 and the expected write IOPS approximately 5,000, or about 15,000 IOPS in a 70/30 read/write pattern.

Note

In this case, performance is the driving factor, so the capacity provided is greater than required.
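
A quick back-of-the-envelope check of the figures above; the 2 TB drive size is implied by 360 TB of raw storage across 9 × 20 disks rather than stated explicitly.

    nodes, disks_per_node, disk_tb = 9, 20, 2    # 2 TB per HDD is implied, not stated
    replication, initial_fill = 3, 0.6

    raw_tb = nodes * disks_per_node * disk_tb    # 360 TB of raw storage
    usable_tb = raw_tb / replication             # 120 TB of replicated storage
    initial_data_tb = usable_tb * initial_fill   # 72 TB initial data budget
    print(raw_tb, usable_tb, initial_data_tb)    # 360 120.0 72.0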

For 300 compute nodes, 6,000 instances

Number of OSD nodes: 54, 36-disk chassis

The cost per node is low compared to the cost of the storage devices, and with a larger number of nodes, the failure of a single node is proportionally less critical. A separate replication network is recommended.

For 500 compute nodes, 10,000 instances

Number of OSD nodes: 60, 36-disk chassis

You may consider using a larger chassis. A separate replication network is required.