Ceph advanced configuration

This section describes how to configure a Ceph cluster through the KaaSCephCluster (kaascephclusters.kaas.mirantis.com) CR during or after the deployment of a managed cluster.

The KaaSCephCluster CR spec has two sections, cephClusterSpec and k8sCluster, and specifies the nodes to deploy as Ceph components. Based on the role definitions in the KaaSCephCluster CR, Ceph Controller automatically labels nodes for Ceph Monitors and Managers. Ceph OSDs are deployed based on the storageDevices parameter defined for each Ceph node.

For a default KaaSCephCluster CR, see step 16 in Example of a complete L2 templates configuration for cluster creation.

To configure a Ceph cluster:

  1. Select from the following options:

    • If you do not have a Container Cloud cluster yet, open kaascephcluster.yaml.template for editing.

    • If the Container Cloud cluster is already deployed, open the KaaSCephCluster CR of a managed cluster for editing:

      kubectl edit kaascephcluster -n <managedClusterProjectName>
      

      Substitute <managedClusterProjectName> with a corresponding value.

  2. Using the tables below, configure the Ceph cluster as required.

    High-level parameters

    Parameter

    Description

    cephClusterSpec

    Describes a Ceph cluster in the Container Cloud cluster. For details on cephClusterSpec parameters, see the tables below.

    k8sCluster

    Defines the cluster on which the KaaSCephCluster depends. Use the k8sCluster parameter if the name or namespace of the corresponding Container Cloud cluster differs from the default one:

    spec:
      k8sCluster:
        name: kaas-mgmt
        namespace: default
    
    General parameters

    Parameter

    Description

    network

    Specifies networks for the Ceph cluster:

    • clusterNet - specifies a Classless Inter-Domain Routing (CIDR) for the Ceph OSD replication network.

      Warning

      To avoid ambiguous behavior of Ceph daemons, do not specify 0.0.0.0/0 in clusterNet. Otherwise, Ceph daemons can select an incorrect public interface that can cause the Ceph cluster to become unavailable. The bare metal provider automatically translates the 0.0.0.0/0 network range to the default LCM IPAM subnet if it exists.

    • publicNet - specifies a CIDR for communication between the service and operator.

      Warning

      To avoid ambiguous behavior of Ceph daemons, do not specify 0.0.0.0/0 in publicNet. Otherwise, Ceph daemons can select an incorrect public interface that can cause the Ceph cluster to become unavailable. The bare metal provider automatically translates the 0.0.0.0/0 network range to the default LCM IPAM subnet if it exists.

      Note

      The clusterNet and publicNet parameters support multiple IP networks. For details, see Enable Ceph multinetwork.
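
    As an illustration, a minimal network section might look as follows. This is only a sketch: the CIDR values are placeholders chosen for the example, not defaults, and must be replaced with the subnets of your environment.

    ```yaml
    spec:
      cephClusterSpec:
        network:
          # Hypothetical subnets; replace with the networks of your cluster.
          clusterNet: 10.10.10.0/24   # Ceph OSD replication network
          publicNet: 10.10.11.0/24    # communication between the service and operator
    ```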

    nodes

    Specifies the list of Ceph nodes. For details, see Node parameters. The nodes parameter is a map with machine names as keys and Ceph node specifications as values, for example:

    nodes:
      master-0:
        <node spec>
      master-1:
        <node spec>
      ...
      worker-0:
        <node spec>
    

    nodeGroups

    Specifies the list of Ceph nodes grouped by node lists or node labels. For details, see NodeGroups parameters. The nodeGroups parameter is a map with group names as keys and Ceph node specifications for defined nodes or node labels as values. For example:

    nodeGroups:
      group-1:
        spec: <node spec>
        nodes: ["master-0", "master-1"]
      group-2:
        spec: <node spec>
        label: <nodeLabelExpression>
      ...
      group-3:
        spec: <node spec>
        nodes: ["worker-2", "worker-3"]
    

    The <nodeLabelExpression> must be a valid Kubernetes label selector expression.

    pools

    Specifies the list of Ceph pools. For details, see Pool parameters.

    objectStorage

    Specifies the parameters for Object Storage, such as RADOS Gateway, the Ceph Object Storage. Also specifies the RADOS Gateway Multisite configuration. For details, see RADOS Gateway parameters and Multisite parameters.

    rookConfig

    Optional. String key-value parameter that allows overriding Ceph configuration options.
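
    A minimal sketch of a rookConfig override, assuming that keys are Ceph configuration option names and that values are passed as strings. The osd_max_backfills option and its value are used only for illustration and are not a recommended setting.

    ```yaml
    cephClusterSpec:
      rookConfig:
        # Illustrative Ceph option; the value is quoted because the parameter is a string map.
        osd_max_backfills: "2"
    ```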

    extraLogging

    Enables or disables verbose logging in the ceph-controller service, which will expose resource changes in logs. Set to false by default.

    ingress

    Enables a custom ingress rule for public access on Ceph services, for example, Ceph RADOS Gateway. For details, see Enable TLS for Ceph public endpoints.

    rbdMirror

    Enables pool mirroring between two interconnected clusters. For details, see Enable Ceph RBD mirroring.

    clients

    List of Ceph clients. For details, see Clients parameters.

    disableOsSharedKeys

    Disables autogeneration of shared Ceph values for OpenStack deployments. Set to false by default.

    mgr

    • Prior to Container Cloud 2.20.0 and MOSK 22.4:

      Contains the modules parameter that should include a list of Ceph Manager modules to enable on the Ceph cluster. For example:

      mgr:
        modules:
        - balancer
        - pg_autoscaler
      
    • Since Container Cloud 2.20.0 and MOSK 22.4:

      Contains the mgrModules parameter that should list the following keys:

      • name - Ceph Manager module name

      • enabled - flag that defines whether the Ceph Manager module is enabled

      For example:

      mgr:
        mgrModules:
        - name: balancer
          enabled: true
        - name: pg_autoscaler
          enabled: true
      

      Note

      Since Container Cloud 2.20.0, the modules parameter is deprecated and, if specified, will be automatically transformed to mgrModules.

    The balancer and pg_autoscaler Ceph Manager modules are enabled by default and cannot be disabled.

    Note

    Most Ceph Manager modules require additional configuration that you can perform through the ceph-tools pod on a managed cluster.

    healthCheck

    Available since Container Cloud 2.20.0 and MOSK 22.4. Configures health checks and liveness probe settings for Ceph daemons. For details, see HealthCheck parameters.

    Node parameters

    Parameter

    Description

    roles

    Specifies the mon, mgr, or rgw daemon to be installed on a Ceph node. You can place the daemons on any nodes. Consider the following recommendations:

    • The recommended number of Ceph Monitors in a Ceph cluster is 3. Therefore, at least 3 Ceph nodes must contain the mon item in the roles parameter.

    • The number of Ceph Monitors must be odd. For example, if the KaaSCephCluster spec contains 3 Ceph Monitors and you need to add more, the number of Ceph Monitors must equal 5, 7, 9, and so on.

    • Do not add more than 2 Ceph Monitors at a time. After each addition, wait until the Ceph cluster is Ready before adding more.

    • For better HA, the number of mgr roles must equal the number of mon roles.

    • If rgw roles are not specified, all rgw daemons will spawn on the same nodes as the mon daemons.

      If a Ceph node contains a mon role, the Ceph Monitor Pod will be deployed on it. If the Ceph node contains a mgr role, it informs the Ceph Controller that a Ceph Manager can be deployed on that node. However, only one Ceph Manager must be deployed on a node.
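
    The recommendations above can be sketched as follows. The machine names are hypothetical. The example places 3 Ceph Monitors and 3 Ceph Managers on the same nodes and assigns the rgw role to a separate node.

    ```yaml
    nodes:
      # Hypothetical machine names; use the names of your Machine objects.
      master-0:
        roles:
        - mon
        - mgr
      master-1:
        roles:
        - mon
        - mgr
      master-2:
        roles:
        - mon
        - mgr
      worker-0:
        roles:
        - rgw
    ```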

    storageDevices

    Specifies the list of devices to use for Ceph OSD deployment. Includes the following parameters:

    • name - the storage device name. Accepts either the device name or the device by-id alias. The device name is the name of the device in the /dev folder, for example, vda. The by-id alias must refer to the exact device by-id and be equal to the status.hardware.storage.byID value of the Machine object. This parameter is mutually exclusive with fullPath.

    • fullPath - full path to the device on the node. Specify only a valid by-path device path, for example, /dev/disk/by-path/pci-0000:00:11.4-ata-3. This parameter is mutually exclusive with name.

    • config - a map of device configurations that must contain a mandatory deviceClass parameter set to hdd, ssd, or nvme. The device class must be defined in a pool and can optionally contain a metadata device:

      storageDevices:
      - name: sdc
        config:
          deviceClass: hdd
          metadataDevice: nvme01
      

      The underlying storage format to use for Ceph OSDs is BlueStore.

      The metadataDevice parameter accepts a device name or logical volume path for the BlueStore device. Mirantis recommends using logical volume paths created on nvme devices. For partitioning devices into logical volumes, see Create a custom bare metal host profile.
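
    Because name and fullPath are mutually exclusive, a device referenced by path omits the name key. A minimal sketch that reuses the by-path value from the description above; the ssd device class is an assumption made for the example:

    ```yaml
    storageDevices:
    # fullPath is used instead of name; do not specify both for one device.
    - fullPath: /dev/disk/by-path/pci-0000:00:11.4-ata-3
      config:
        deviceClass: ssd
    ```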

    crush

    Specifies the explicit key-value CRUSH topology for a node. For details, see Ceph official documentation: CRUSH maps. Includes the following parameters:

    • datacenter - a physical data center that consists of rooms and handles data.

    • room - a room that accommodates one or more racks with hosts.

    • pdu - a power distribution unit (PDU) device that has multiple outputs and distributes electric power to racks located within a data center.

    • row - a row of computing racks inside a room.

    • rack - a computing rack that accommodates one or more hosts.

    • chassis - a bare metal structure that houses or physically assembles hosts.

    • region - the geographic location of one or more Ceph Object instances within one or more zones.

    • zone - a logical group that consists of one or more Ceph Object instances.

    Example configuration:

    crush:
      datacenter: dc1
      room: room1
      pdu: pdu1
      row: row1
      rack: rack1
      chassis: ch1
      region: region1
      zone: zone1
    
    NodeGroups parameters

    Parameter

    Description

    spec

    Specifies a Ceph node specification. For the entire spec, see Node parameters.

    nodes

    Specifies a list of names of machines to which the Ceph node spec must be applied. Mutually exclusive with the label parameter. For example:

    nodeGroups:
      group-1:
        spec: <node spec>
        nodes:
        - master-0
        - master-1
        - worker-0
    

    label

    Specifies a string with a valid label selector expression to select machines to which the node spec must be applied. Mutually exclusive with the nodes parameter. For example:

    nodeGroups:
      group-2:
        spec: <node spec>
        label: "ceph-storage-node=true,!ceph-control-node"
    
    Pool parameters

    Parameter

    Description

    name

    Specifies the pool name as a prefix for each Ceph block pool. The resulting Ceph block pool name will be <name>-<deviceClass>.

    useAsFullName

    Enables Ceph block pool to use only the name value as a name. The resulting Ceph block pool name will be <name> without the deviceClass suffix.

    role

    Specifies the pool role and is used mostly for Mirantis OpenStack for Kubernetes (MOSK) pools.

    default

    Defines if the pool and dependent StorageClass should be set as default. Must be enabled only for one pool.

    deviceClass

    Specifies the device class for the defined pool. Possible values are hdd, ssd, and nvme.

    replicated

    The replicated parameter is mutually exclusive with erasureCoded and includes the following parameters:

    • size - the number of pool replicas.

    • targetSizeRatio - Optional. A float percentage from 0.0 to 1.0, which specifies the expected consumption of the total Ceph cluster capacity. The default values are as follows:

    erasureCoded

    Enables the erasure-coded pool. For details, see Rook documentation: Erasure coded and Ceph documentation: Erasure coded pool. The erasureCoded parameter is mutually exclusive with replicated.

    failureDomain

    The failure domain across which the replicas or chunks of data will be spread. Set to host by default. The possible values are osd or host.

    mirroring

    Optional. Enables the mirroring feature for the defined pool. Includes the mode parameter that can be set to pool or image. For details, see Enable Ceph RBD mirroring.

    allowVolumeExpansion

    Optional. Not updatable as it applies only once. Enables expansion of persistent volumes based on StorageClass of a corresponding pool. For details, see Kubernetes documentation: Resizing persistent volumes using Kubernetes.

    Note

    A Kubernetes cluster supports only increasing the storage size.

    rbdDeviceMapOptions

    Optional. Available since Container Cloud 2.20.0 and MOSK 22.4. Not updatable as it applies only once. Specifies custom rbd device map options to use with StorageClass of a corresponding pool. Allows customizing the Kubernetes CSI driver interaction with Ceph RBD for the defined StorageClass. For the available options, see Ceph documentation: Kernel RBD (KRBD) options.

    Example configuration:

    pools:
    - name: kubernetes
      role: kubernetes
      deviceClass: hdd
      replicated:
        size: 3
        targetSizeRatio: 10.0
      default: true
    

    To configure additional required pools for MOSK, see MOSK Deployment Guide: Deploy a Ceph cluster.
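
    The optional pool parameters described in the table can be combined as in the following sketch. The pool name and the selected values are hypothetical, and only a subset of parameters is shown. The mirroring setting assumes that RBD mirroring is configured as described in Enable Ceph RBD mirroring.

    ```yaml
    pools:
    - name: volumes               # hypothetical pool name
      useAsFullName: true         # resulting pool name is "volumes", without the deviceClass suffix
      deviceClass: ssd
      replicated:
        size: 3
      failureDomain: host         # the default value; the alternative is osd
      mirroring:
        mode: pool                # or image
      allowVolumeExpansion: true  # applies only once, not updatable
    ```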

    Clients parameters

    Parameter

    Description

    name

    Ceph client name.

    caps

    Key-value parameter with Ceph client capabilities. For details about caps, refer to Ceph documentation: Authorization (capabilities).

    Example configuration:

    clients:
    - name: glance
      caps:
        mon: allow r, allow command "osd blacklist"
        osd: profile rbd pool=images
    
    RADOS Gateway parameters

    Parameter

    Description

    name

    Ceph Object Storage instance name.

    dataPool

    Mutually exclusive with the zone parameter. Object storage data pool spec that should only contain replicated or erasureCoded and failureDomain parameters. The failureDomain parameter may be set to osd or host, defining the failure domain across which the data will be spread. For dataPool, Mirantis recommends using an erasureCoded pool. For details, see Rook documentation: Erasure coding. For example:

    cephClusterSpec:
      objectStorage:
        rgw:
          dataPool:
            erasureCoded:
              codingChunks: 1
              dataChunks: 2
    

    metadataPool

    Mutually exclusive with the zone parameter. Object storage metadata pool spec that should only contain replicated and failureDomain parameters. The metadata pool can use only replicated settings. The failureDomain parameter may be set to osd or host, defining the failure domain across which the data will be spread. For example:

    cephClusterSpec:
      objectStorage:
        rgw:
          metadataPool:
            replicated:
              size: 3
            failureDomain: host
    

    where replicated.size is the number of full copies of data on multiple nodes.

    Warning

    When using the non-recommended Ceph pools replicated.size of less than 3, Ceph OSD removal cannot be performed. The minimal replica size equals a rounded up half of the specified replicated.size. For example, if replicated.size is 2, the minimal replica size is 1, and if replicated.size is 3, then the minimal replica size is 2. The replica size of 1 allows Ceph to have PGs with only one Ceph OSD in the acting state, which may cause a PG_TOO_DEGRADED health warning that blocks Ceph OSD removal. Mirantis recommends setting replicated.size to 3 for each Ceph pool.

    gateway

    The gateway settings corresponding to the rgw daemon settings. Includes the following parameters:

    • port - the port on which the Ceph RGW service will be listening on HTTP.

    • securePort - the port on which the Ceph RGW service will be listening on HTTPS.

    • instances - the number of pods in the Ceph RGW ReplicaSet. If allNodes is set to true, a DaemonSet is created instead.

      Note

      Mirantis recommends using 2 instances for Ceph Object Storage.

    • allNodes - defines whether to start the Ceph RGW pods as a DaemonSet on all nodes. The instances parameter is ignored if allNodes is set to true.

    For example:

    cephClusterSpec:
      objectStorage:
        rgw:
          gateway:
            allNodes: false
            instances: 1
            port: 80
            securePort: 8443
    

    preservePoolsOnDelete

    Defines whether to delete the data and metadata pools in the rgw section if the object storage is deleted. Set this parameter to true if you need to store data even if the object storage is deleted. However, Mirantis recommends setting this parameter to false.

    objectUsers and buckets

    Optional. To create new Ceph RGW resources, such as buckets or users, specify the following keys. Ceph Controller will automatically create the specified object storage users and buckets in the Ceph cluster.

    • objectUsers - available since Container Cloud 2.20.0 and MOSK 22.4. A list of user specifications to create for object storage. Contains the following fields:

      • name - a user name to create.

      • displayName - the Ceph user name to display.

      • capabilities - user capabilities:

        • user - admin capabilities to read/write Ceph Object Store users.

        • bucket - admin capabilities to read/write Ceph Object Store buckets.

        • metadata - admin capabilities to read/write Ceph Object Store metadata.

        • usage - admin capabilities to read/write Ceph Object Store usage.

        • zone - admin capabilities to read/write Ceph Object Store zones.

        The available options are *, read, write, and the combined value read, write. For details, see Ceph documentation: Add/remove admin capabilities.

      • quotas - user quotas:

        • maxBuckets - the maximum bucket limit for the Ceph user. Integer, for example, 10.

        • maxSize - the maximum size limit of all objects across all the buckets of a user. String size, for example, 10G.

        • maxObjects - the maximum number of objects across all buckets of a user. Integer, for example, 10.

        For example:

        objectUsers:
        - capabilities:
            bucket: '*'
            metadata: read
            user: read
          displayName: test-user
          name: test-user
          quotas:
            maxBuckets: 10
            maxSize: 10G
        
    • users - a list of strings that contain user names to create for object storage.

      Note

      Deprecated since Container Cloud 2.20.0. Use objectUsers instead. If users is specified, it will be automatically transformed to the objectUsers section.

    • buckets - a list of strings that contain bucket names to create for object storage.
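
    A minimal sketch of the buckets key, assuming it is placed in the rgw section next to the other RADOS Gateway parameters. The bucket names are hypothetical.

    ```yaml
    cephClusterSpec:
      objectStorage:
        rgw:
          # Hypothetical bucket names; Ceph Controller creates each listed bucket.
          buckets:
          - test-bucket-1
          - test-bucket-2
    ```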

    zone

    Optional. Mutually exclusive with metadataPool and dataPool. Defines the Ceph Multisite zone where the object storage must be placed. Includes the name parameter that must be set to one of the zones items. For details, see Enable Multisite for Ceph RGW Object Storage.

    For example:

    cephClusterSpec:
      objectStorage:
        multisite:
          zones:
          - name: master-zone
          ...
        rgw:
          zone:
            name: master-zone
    

    SSLCert

    Optional. Custom TLS certificate parameters used to access the Ceph RGW endpoint. If not specified, a self-signed certificate will be generated.

    For example:

    cephClusterSpec:
      objectStorage:
        rgw:
          SSLCert:
            cacert: |
              -----BEGIN CERTIFICATE-----
              ca-certificate here
              -----END CERTIFICATE-----
            tlsCert: |
              -----BEGIN CERTIFICATE-----
              private TLS certificate here
              -----END CERTIFICATE-----
            tlsKey: |
              -----BEGIN RSA PRIVATE KEY-----
              private TLS key here
              -----END RSA PRIVATE KEY-----
    

    For a configuration example, see Enable Ceph RGW Object Storage.

    Multisite parameters

    Parameter

    Description

    realms (Technical Preview)

    List of realms to use, represents the realm namespaces. Includes the following parameters:

    • name - the realm name.

    • pullEndpoint - optional, required only when the master zone is in a different storage cluster. The endpoint, access key, and system key of the system user from the realm to pull from. Includes the following parameters:

      • endpoint - the endpoint of the master zone in the master zone group.

      • accessKey - the access key of the system user from the realm to pull from.

      • secretKey - the system key of the system user from the realm to pull from.

    zoneGroups (Technical Preview)

    The list of zone groups for realms. Includes the following parameters:

    • name - the zone group name.

    • realmName - the realm namespace name to which the zone group belongs.

    zones (Technical Preview)

    The list of zones used within one zone group. Includes the following parameters:

    • name - the zone name.

    • metadataPool - the settings used to create the Object Storage metadata pools. Must use replication. For details, see Pool parameters.

    • dataPool - the settings to create the Object Storage data pool. Can use replication or erasure coding. For details, see Pool parameters.

    • zoneGroupName - the zone group name.

    For a configuration example, see Enable Multisite for Ceph RGW Object Storage.
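
    The relationship between realms, zone groups, and zones can be sketched as follows. All names are hypothetical, and the example assumes that realms and zoneGroups are placed in the multisite section next to zones. The pullEndpoint parameter is omitted because the master zone is assumed to be in the same storage cluster.

    ```yaml
    cephClusterSpec:
      objectStorage:
        multisite:
          realms:
          - name: realm-a               # hypothetical realm name
          zoneGroups:
          - name: zonegroup-a           # hypothetical zone group name
            realmName: realm-a          # must match a realm defined above
          zones:
          - name: master-zone
            zoneGroupName: zonegroup-a  # must match a zone group defined above
            metadataPool:               # the metadata pool must use replication
              replicated:
                size: 3
              failureDomain: host
            dataPool:                   # the data pool can use replication or erasure coding
              erasureCoded:
                codingChunks: 1
                dataChunks: 2
              failureDomain: host
    ```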

    HealthCheck parameters (available since Container Cloud 2.20.0 and MOSK 22.4)

    Parameter

    Description

    daemonHealth

    Specifies health check settings for Ceph daemons. Contains the following parameters:

    • status - configures health check settings for Ceph health

    • mon - configures health check settings for Ceph Monitors

    • osd - configures health check settings for Ceph OSDs

    Each parameter allows defining the following settings:

    • disabled - a flag that disables the health check.

    • interval - an interval in seconds or minutes for the health check to run. For example, 60s for 60 seconds.

    • timeout - a timeout for the health check in seconds or minutes. For example, 60s for 60 seconds.
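
    A minimal sketch of a daemonHealth configuration built from the settings above. The interval and timeout values are illustrative only and are not documented defaults.

    ```yaml
    cephClusterSpec:
      healthCheck:
        daemonHealth:
          status:
            disabled: false
            interval: 60s   # illustrative value
            timeout: 60s    # illustrative value
          mon:
            disabled: false
            interval: 60s
            timeout: 60s
          osd:
            disabled: false
            interval: 60s
            timeout: 60s
    ```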

    livenessProbe

    Key-value parameter with liveness probe settings for the defined daemon types. Can be one of the following: mgr, mon, osd, or mds. Includes the disabled flag and the probe parameter. The probe parameter accepts the following options:

    • initialDelaySeconds - the number of seconds after the container has started before the liveness probes are initiated. Integer.

    • timeoutSeconds - the number of seconds after which the probe times out. Integer.

    • periodSeconds - the frequency (in seconds) to perform the probe. Integer.

    • successThreshold - the minimum consecutive successful probes for the probe to be considered successful after a failure. Integer.

    • failureThreshold - the minimum consecutive failures for the probe to be considered failed after having succeeded. Integer.

    Note

    Ceph Controller specifies the following livenessProbe defaults for mon, mgr, osd, and mds (if CephFS is enabled):

    • 5 for timeoutSeconds

    • 5 for failureThreshold

    startupProbe

    Key-value parameter with startup probe settings for the defined daemon types. Can be one of the following: mgr, mon, osd, or mds. Includes the disabled flag and the probe parameter. The probe parameter accepts the following options:

    • timeoutSeconds - the number of seconds after which the probe times out. Integer.

    • periodSeconds - the frequency (in seconds) to perform the probe. Integer.

    • successThreshold - the minimum consecutive successful probes for the probe to be considered successful after a failure. Integer.

    • failureThreshold - the minimum consecutive failures for the probe to be considered failed after having succeeded. Integer.
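
    The livenessProbe and startupProbe parameters can be sketched as follows, assuming that each daemon type is a key that contains the disabled flag and the probe settings. The timeoutSeconds and failureThreshold values repeat the Ceph Controller defaults mentioned in the note above. The remaining values are illustrative.

    ```yaml
    cephClusterSpec:
      healthCheck:
        livenessProbe:
          mon:
            disabled: false
            probe:
              initialDelaySeconds: 10   # illustrative value
              timeoutSeconds: 5
              periodSeconds: 10         # illustrative value
              successThreshold: 1
              failureThreshold: 5
        startupProbe:
          osd:
            disabled: false
            probe:
              timeoutSeconds: 5         # illustrative value
              periodSeconds: 10         # illustrative value
              successThreshold: 1
              failureThreshold: 5       # illustrative value
    ```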

  3. Select from the following options:

    • If you are creating a managed cluster, save the updated KaaSCephCluster template to the corresponding file and proceed with the managed cluster creation.

    • If you are configuring KaaSCephCluster of an existing managed cluster, save the changes and exit the text editor to apply them.