Enable Ceph tolerations and resources management

Warning

This document does not provide any specific recommendations on requests and limits for Ceph resources. It describes the native Ceph resources configuration for any cluster with Mirantis Container Cloud or Mirantis OpenStack for Kubernetes (MOSK).

You can configure Ceph Controller to manage Ceph resources by specifying their requirements and constraints. To configure resource consumption for the Ceph nodes, consider the following options, which are based on different Helm release configuration values:

- Configuring tolerations for tainted nodes for the Ceph Monitor, Ceph Manager, and Ceph OSD daemons. For details, see Taints and Tolerations.
- Configuring node resource requests or limits for the Ceph daemons and for each Ceph OSD device class, such as HDD, SSD, or NVMe. For details, see Managing Resources for Containers.

To enable Ceph tolerations and resources management:

To avoid Ceph cluster health issues while the daemon configuration is being changed, set the Ceph noout, nobackfill, norebalance, and norecover flags through the ceph-tools pod before editing Ceph tolerations and resources:
```
# Open a shell in the ceph-tools pod
kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l \
"app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') -- bash
# Inside the pod, set the flags
ceph osd set noout
ceph osd set nobackfill
ceph osd set norebalance
ceph osd set norecover
exit
```
Note

Skip this step if you are only configuring the PG rebalance timeout and replicas count parameters.
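The interactive session above can also be run non-interactively, which is convenient in scripts. The following is a minimal sketch, assuming bash, a kubectl context that points at the managed cluster, and the rook-ceph namespace and app=rook-ceph-tools label used above. The ceph_tools and ceph_osd_flags functions and the CEPH_NAMESPACE and MAINTENANCE_FLAGS variables are hypothetical helpers, not part of Container Cloud, MOSK, or Rook.

```shell
# Hypothetical helpers, not part of Container Cloud, MOSK, or Rook.
# Assumes kubectl is configured for the managed cluster.
CEPH_NAMESPACE="${CEPH_NAMESPACE:-rook-ceph}"
MAINTENANCE_FLAGS=(noout nobackfill norebalance norecover)

# Runs a command in the ceph-tools pod without an interactive shell.
ceph_tools() {
  local pod
  pod=$(kubectl -n "$CEPH_NAMESPACE" get pod -l "app=rook-ceph-tools" \
    -o jsonpath='{.items[0].metadata.name}') || return 1
  kubectl -n "$CEPH_NAMESPACE" exec "$pod" -- "$@"
}

# Usage: ceph_osd_flags set|unset
# Sets or unsets each maintenance flag and stops at the first failure.
ceph_osd_flags() {
  local action=$1 flag
  case "$action" in
    set|unset) ;;
    *) echo "usage: ceph_osd_flags set|unset" >&2; return 2 ;;
  esac
  for flag in "${MAINTENANCE_FLAGS[@]}"; do
    ceph_tools ceph osd "$action" "$flag" || return 1
  done
}
```

For example, run `ceph_osd_flags set` before editing the resources, check the result with `ceph_tools ceph osd dump | grep '^flags'`, and run `ceph_osd_flags unset` once the reconfiguration is complete. Passing the command after `--` to `kubectl exec` avoids opening a shell in the pod.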
Edit the KaaSCephCluster resource of a managed cluster:
```
kubectl -n <managedClusterProjectName> edit kaascephcluster
```
Substitute <managedClusterProjectName> with the project name of the required managed cluster.

Specify the parameters in the hyperconverge section as required. The hyperconverge section includes the following parameters:

Ceph tolerations and resources parameters

tolerations

Specifies tolerations for tainted nodes for the defined daemon type. Each daemon type key contains the following parameters:

```
cephClusterSpec:
  hyperconverge:
    tolerations:
      <daemonType>:
        rules:
        - key: ""
          operator: ""
          value: ""
          effect: ""
          tolerationSeconds: 0
```

Possible values for <daemonType> are osd, mon, mgr, and rgw. The following values are also supported:

- all - specifies general toleration rules for all daemons if no separate daemon rule is specified.
- mds - specifies the CephFS Metadata Server daemons.

Example values:

```
hyperconverge:
  tolerations:
    mon:
      rules:
      - effect: NoSchedule
        key: node-role.kubernetes.io/controlplane
        operator: Exists
    mgr:
      rules:
      - effect: NoSchedule
        key: node-role.kubernetes.io/controlplane
        operator: Exists
    osd:
      rules:
      - effect: NoSchedule
        key: node-role.kubernetes.io/controlplane
        operator: Exists
    rgw:
      rules:
      - effect: NoSchedule
        key: node-role.kubernetes.io/controlplane
        operator: Exists
```

resources

Specifies resource requests or limits. The parameter is a map with the daemon type as a key and the following structure as a value:

```
hyperconverge:
  resources:
    <daemonType>:
      requests: <kubernetes valid spec of daemon resource requests>
      limits: <kubernetes valid spec of daemon resource limits>
```

Possible values for <daemonType> are mon, mgr, osd, osd-hdd, osd-ssd, osd-nvme, prepareosd, rgw, and mds. The osd-hdd, osd-ssd, and osd-nvme resource requirements apply only to the Ceph OSDs with the corresponding device class.

Example values:

```
hyperconverge:
  resources:
    mon:
      requests:
        memory: 1Gi
        cpu: 2
      limits:
        memory: 2Gi
        cpu: 3
    mgr:
      requests:
        memory: 1Gi
        cpu: 2
      limits:
        memory: 2Gi
        cpu: 3
    osd:
      requests:
        memory: 1Gi
        cpu: 2
      limits:
        memory: 2Gi
        cpu: 3
    osd-hdd:
      requests:
        memory: 1Gi
        cpu: 2
      limits:
        memory: 2Gi
        cpu: 3
    osd-ssd:
      requests:
        memory: 1Gi
        cpu: 2
      limits:
        memory: 2Gi
        cpu: 3
    osd-nvme:
      requests:
        memory: 1Gi
        cpu: 2
      limits:
        memory: 2Gi
        cpu: 3
```
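As a non-interactive alternative to kubectl edit, the tolerations can be applied with a JSON merge patch. The sketch below assumes bash and kubectl; build_tolerations_patch is a hypothetical helper that prints a patch giving each listed daemon type a single NoSchedule rule with the Exists operator for one taint key, mirroring the tolerations example values in the table.

```shell
# Hypothetical helper, not part of Container Cloud, MOSK, or Rook.
# Usage: build_tolerations_patch <taintKey> <daemonType>...
# Prints a JSON merge patch that sets one NoSchedule/Exists toleration rule
# under spec.cephClusterSpec.hyperconverge.tolerations for each daemon type.
build_tolerations_patch() {
  local key=$1 rule entries="" daemon
  shift
  rule="{\"rules\":[{\"effect\":\"NoSchedule\",\"key\":\"${key}\",\"operator\":\"Exists\"}]}"
  for daemon in "$@"; do
    # Prefix every entry except the first one with a comma.
    entries+="${entries:+,}\"${daemon}\":${rule}"
  done
  printf '{"spec":{"cephClusterSpec":{"hyperconverge":{"tolerations":{%s}}}}}\n' "$entries"
}

# Example usage; the values in angle brackets are placeholders:
#   kubectl -n <managedClusterProjectName> patch kaascephcluster <kaasCephClusterName> \
#     --type merge \
#     -p "$(build_tolerations_patch node-role.kubernetes.io/controlplane mon mgr osd rgw)"
```

The KaaSCephCluster name can be listed with `kubectl -n <managedClusterProjectName> get kaascephcluster`. With `--type merge`, kubectl sends a JSON merge patch (RFC 7386): nested objects are merged key by key, so daemon types that you do not list keep their current tolerations, while the rules list of each listed daemon type is replaced as a whole.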

For node-specific Ceph resource settings, specify the resources section in the corresponding nodes spec of KaaSCephCluster:

```
spec:
  cephClusterSpec:
    nodes:
      <nodeName>:
        resources:
          requests: <kubernetes valid spec of daemon resource requests>
          limits: <kubernetes valid spec of daemon resource limits>
```

Substitute <nodeName> with the name of the node that requires specific resources. For example:

```
spec:
  cephClusterSpec:
    nodes:
      <nodeName>:
        resources:
          requests:
            memory: 1Gi
            cpu: 2
          limits:
            memory: 2Gi
            cpu: 3
```
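Node-specific resources can also be set non-interactively with a JSON merge patch. The sketch below assumes bash and kubectl, and that the <nodeName> entry already exists under spec.cephClusterSpec.nodes; build_node_resources_patch is a hypothetical helper whose five arguments correspond to the values in the example above.

```shell
# Hypothetical helper, not part of Container Cloud, MOSK, or Rook.
# Usage: build_node_resources_patch <nodeName> <requestsCPU> <requestsMemory> \
#          <limitsCPU> <limitsMemory>
# Prints a JSON merge patch for spec.cephClusterSpec.nodes.<nodeName>.resources.
build_node_resources_patch() {
  if [ "$#" -ne 5 ]; then
    echo "usage: build_node_resources_patch <nodeName> <requestsCPU> <requestsMemory> <limitsCPU> <limitsMemory>" >&2
    return 2
  fi
  local fmt='{"spec":{"cephClusterSpec":{"nodes":{"%s":{"resources":'
  fmt+='{"requests":{"cpu":"%s","memory":"%s"},'
  fmt+='"limits":{"cpu":"%s","memory":"%s"}}}}}}}\n'
  printf "$fmt" "$@"
}

# Example usage that matches the values above; placeholders in angle brackets:
#   kubectl -n <managedClusterProjectName> patch kaascephcluster <kaasCephClusterName> \
#     --type merge -p "$(build_node_resources_patch <nodeName> 2 1Gi 3 2Gi)"
```

Because a merge patch merges objects key by key, the other fields of the existing node entry are left unchanged. If the node entry did not exist, the patch would create an entry that contains only resources, which is why the sketch assumes that the entry is already present.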

For resource settings specific to the RADOS Gateway instances, specify the resources section in the rgw spec of KaaSCephCluster:

```
spec:
  cephClusterSpec:
    objectStorage:
      rgw:
        gateway:
          resources:
            requests: <kubernetes valid spec of daemon resource requests>
            limits: <kubernetes valid spec of daemon resource limits>
```

For example:

```
spec:
  cephClusterSpec:
    objectStorage:
      rgw:
        gateway:
          resources:
            requests:
              memory: 1Gi
              cpu: 2
            limits:
              memory: 2Gi
              cpu: 3
```

Save the reconfigured KaaSCephCluster resource and wait for ceph-controller to apply the updated Ceph configuration. The Ceph Monitors, Ceph Managers, or Ceph OSDs are then recreated according to the specified hyperconverge configuration.
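While ceph-controller recreates the daemons, a rough check of the result is to wait for the pods of a daemon type and print their tolerations and resources. This is a sketch, assuming bash, kubectl access to the managed cluster, and Rook's default pod labels such as app=rook-ceph-mon, app=rook-ceph-mgr, app=rook-ceph-osd, and app=rook-ceph-rgw; show_daemon_pods and CEPH_NAMESPACE are hypothetical names.

```shell
# Hypothetical helper, not part of Container Cloud, MOSK, or Rook.
# Assumes Rook's default "app=<daemon>" pod labels in the rook-ceph namespace.
CEPH_NAMESPACE="${CEPH_NAMESPACE:-rook-ceph}"

# Usage: show_daemon_pods <appLabel> [timeout], for example:
#   show_daemon_pods rook-ceph-osd 600s
# Waits until the matching pods are Ready, then prints the tolerations of each
# pod and the resources of its first container.
show_daemon_pods() {
  local app=$1 timeout=${2:-600s} template
  kubectl -n "$CEPH_NAMESPACE" wait pod -l "app=${app}" \
    --for=condition=Ready --timeout="$timeout" || return 1
  template='{range .items[*]}{.metadata.name}{"\n"}'
  template+='  tolerations: {.spec.tolerations}{"\n"}'
  template+='  resources: {.spec.containers[0].resources}{"\n"}{end}'
  kubectl -n "$CEPH_NAMESPACE" get pod -l "app=${app}" -o "jsonpath=${template}"
}
```

kubectl wait only checks the pods that exist when it runs, so if the daemons are recreated one at a time, rerun the check until all pods show the new settings.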
If you have specified any osd tolerations, additionally specify tolerations for the Rook instances:
1. Open the Cluster resource of the required Ceph cluster on a management cluster:
```
kubectl -n <ClusterProjectName> edit cluster
```
  Substitute <ClusterProjectName> with the project name of the required cluster.
2. Specify the parameters in the ceph-controller section of spec.providerSpec.value.helmReleases:
  1. Specify the hyperconverge.tolerations.rook parameter as required:
    ```
    hyperconverge:
      tolerations:
        rook: |
          <yamlFormattedKubernetesTolerations>
    ```
    In <yamlFormattedKubernetesTolerations>, specify YAML-formatted tolerations from cephClusterSpec.hyperconverge.tolerations.osd.rules of the KaaSCephCluster spec. For example:
    ```
    hyperconverge:
      tolerations:
        rook: |
          - effect: NoSchedule
            key: node-role.kubernetes.io/controlplane
            operator: Exists
    ```
  2. In controllers.cephRequest.parameters.pgRebalanceTimeoutMin, specify the PG rebalance timeout for requests. The default is 30 minutes. For example:
    ```
    controllers:
      cephRequest:
        parameters:
          pgRebalanceTimeoutMin: 35
    ```
  3. In controllers.cephController.replicas, controllers.cephRequest.replicas, and controllers.cephStatus.replicas, specify the replicas count. The default is 3 replicas. For example:
    ```
    controllers:
      cephController:
        replicas: 1
      cephRequest:
        replicas: 1
      cephStatus:
        replicas: 1
    ```
3. Save the reconfigured Cluster resource and wait for the ceph-controller Helm release update. The update recreates the Ceph CSI and discover pods according to the specified hyperconverge.tolerations.rook configuration.
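Before and after editing the Cluster resource, it can help to print only the ceph-controller values. The sketch below assumes bash, kubectl access to the management cluster, and that each entry of spec.providerSpec.value.helmReleases has name and values fields; show_ceph_controller_values is a hypothetical helper that uses a kubectl JSONPath filter expression.

```shell
# Hypothetical helper, not part of Container Cloud, MOSK, or Rook.
# Assumes helmReleases entries have "name" and "values" fields.
# Usage: show_ceph_controller_values <ClusterProjectName> <clusterName>
# Prints the values of the ceph-controller Helm release, for example to check
# the hyperconverge.tolerations.rook and controllers settings.
show_ceph_controller_values() {
  if [ "$#" -ne 2 ]; then
    echo "usage: show_ceph_controller_values <ClusterProjectName> <clusterName>" >&2
    return 2
  fi
  kubectl -n "$1" get cluster "$2" \
    -o jsonpath='{.spec.providerSpec.value.helmReleases[?(@.name=="ceph-controller")].values}'
}
```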
Specify tolerations for different Rook resources using the following chart-based options:
- hyperconverge.tolerations.rook - general toleration rules for each Rook service if no exact rules are specified
- hyperconverge.tolerations.csiplugin - tolerations for the ceph-csi plugin DaemonSets
- hyperconverge.tolerations.csiprovisioner - tolerations for the ceph-csi provisioner deployment
- hyperconverge.nodeAffinity.csiprovisioner - specifies the ceph-csi provisioner node affinity with a value section

After a successful Ceph reconfiguration, use the ceph-tools pod to unset the flags that you set before editing Ceph tolerations and resources:

```
# Open a shell in the ceph-tools pod
kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l \
"app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') -- bash
# Inside the pod, unset the flags
ceph osd unset noout
ceph osd unset nobackfill
ceph osd unset norebalance
ceph osd unset norecover
exit
```

Note

Skip this step if you have only configured the PG rebalance timeout and replicas count parameters.
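To confirm that no maintenance flags are left set, the flags line of ceph osd dump can be checked. This is a minimal sketch, assuming bash and that ceph osd dump prints a single line of the form flags <flag>,<flag>,...; remaining_maintenance_flags and MAINTENANCE_FLAGS are hypothetical names.

```shell
# Hypothetical helper, not part of Container Cloud, MOSK, or Rook.
MAINTENANCE_FLAGS=(noout nobackfill norebalance norecover)

# Reads `ceph osd dump` output on stdin, prints every maintenance flag that is
# still set, and returns 1 if any remain (0 otherwise).
remaining_maintenance_flags() {
  local line flag remaining=0
  line=$(grep -m1 '^flags ' || true)
  line=${line#flags }
  for flag in "${MAINTENANCE_FLAGS[@]}"; do
    # Compare whole comma-separated entries so that a flag name never
    # matches as a substring of a longer flag name.
    case ",${line}," in
      *",${flag},"*) echo "$flag"; remaining=$((remaining + 1)) ;;
    esac
  done
  return $(( remaining > 0 ))
}

# Example usage, where <cephToolsPodName> is a placeholder:
#   kubectl -n rook-ceph exec <cephToolsPodName> -- ceph osd dump \
#     | remaining_maintenance_flags && echo "All maintenance flags are unset"
```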

Once done, proceed to Verify Ceph tolerations and resources management.