Enable Ceph Shared File System (CephFS)

Available since 2.22.0 as GA

Caution

For MKE clusters that are part of MOSK infrastructure, the feature is not supported yet.

Caution

Before Container Cloud 2.22.0, this feature is available as Technology Preview. Therefore, with earlier Container Cloud versions, use CephFS at your own risk.

Caution

Since Ceph Pacific, the Ceph CSI driver does not propagate the 777 permission on the mount point of persistent volumes based on any StorageClass of the CephFS data pool.

The Ceph Shared File System, or CephFS, enables creating read/write shared file system Persistent Volumes (PVs). These PVs support the ReadWriteMany access mode for the FileSystem volume mode. CephFS deploys its own daemons, the Ceph Metadata Servers (Ceph MDS). For details, see Ceph Documentation: Ceph File System.
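
For illustration, below is a minimal sketch of a PersistentVolumeClaim that requests such a shared volume. Substitute <cephfs-storageclass-name> with the name of the StorageClass created for your CephFS data pool; the PVC name cephfs-shared-pvc is an example only:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cephfs-shared-pvc
spec:
  accessModes:
  - ReadWriteMany        # shared read/write access from multiple pods
  volumeMode: Filesystem # CephFS supports the FileSystem volume mode only
  resources:
    requests:
      storage: 10Gi
  storageClassName: <cephfs-storageclass-name>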

Note

By design, CephFS data pool and metadata pool must be replicated only.

Limitations

  • CephFS is supported as a Kubernetes CSI plugin that only supports creating Kubernetes Persistent Volumes based on the FileSystem volume mode. For a complete modes support matrix, see Ceph CSI: Support Matrix.

  • Ceph Controller supports only one CephFS installation per Ceph cluster.

  • Re-creating a CephFS instance in a cluster requires a different value for the name parameter, as shown in the sketch below.
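
For example, a minimal sketch of re-creating the cephfs-store instance under a new name. The value cephfs-store-new is hypothetical; any name that differs from the previous one works. The remaining CephFS parameters are omitted here and described in the specification below:

cephClusterSpec:
  sharedFilesystem:
    cephFS:
    - name: cephfs-store-new # must differ from the name of the deleted CephFS instance
      ...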

CephFS specification

The KaaSCephCluster CR includes the spec.cephClusterSpec.sharedFilesystem.cephFS section with the following CephFS parameters:

CephFS specification

Parameter

Description

name

CephFS instance name.

dataPools

A list of CephFS data pool specifications. Each spec contains the name, replicated or erasureCoded, deviceClass, and failureDomain parameters. The first pool in the list is treated as the default data pool for CephFS and must always be replicated. The failureDomain parameter may be set to osd or host, defining the failure domain across which the data will be spread. The number of data pools is unlimited, but the default pool must always be present. For example:

cephClusterSpec:
  sharedFilesystem:
    cephFS:
    - name: cephfs-store
      dataPools:
      - name: default-pool
        deviceClass: ssd
        replicated:
          size: 3
        failureDomain: host
      - name: second-pool
        deviceClass: hdd
        erasureCoded:
          dataChunks: 2
          codingChunks: 1

Where replicated.size is the number of full copies of data on multiple nodes.

Warning

When using a replicated.size of less than 3 for Ceph pools, which is not recommended, Ceph OSD removal cannot be performed. The minimal replica size equals the rounded-up half of the specified replicated.size.

For example, if replicated.size is 2, the minimal replica size is 1, and if replicated.size is 3, the minimal replica size is 2. A replica size of 1 allows Ceph to have placement groups (PGs) with only one Ceph OSD in the acting state, which may cause a PG_TOO_DEGRADED health warning that blocks Ceph OSD removal. Mirantis recommends setting replicated.size to 3 for each Ceph pool.

Warning

Modifying dataPools on a deployed CephFS has no effect. You can manually adjust pool settings through the Ceph CLI. However, for any changes in dataPools, Mirantis recommends re-creating CephFS.

metadataPool

The CephFS metadata pool specification, which must contain only the replicated, deviceClass, and failureDomain parameters. The failureDomain parameter may be set to osd or host, defining the failure domain across which the data will be spread. The metadata pool can use only replicated settings. For example:

cephClusterSpec:
  sharedFilesystem:
    cephFS:
    - name: cephfs-store
      metadataPool:
        deviceClass: nvme
        replicated:
          size: 3
        failureDomain: host

Where replicated.size is the number of full copies of data on multiple nodes.

Warning

Modifying metadataPool on a deployed CephFS has no effect. You can manually adjust pool settings through the Ceph CLI. However, for any changes in metadataPool, Mirantis recommends re-creating CephFS.

preserveFilesystemOnDelete

Defines whether to preserve the data and metadata pools when CephFS is deleted. Set to true to avoid accidental data loss in case of human error. However, for security reasons, Mirantis recommends setting preserveFilesystemOnDelete to false.
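
For example, a minimal sketch that preserves the pools of the cephfs-store instance from the previous examples when it is deleted:

cephClusterSpec:
  sharedFilesystem:
    cephFS:
    - name: cephfs-store
      preserveFilesystemOnDelete: true # keep the data and metadata pools if this CephFS is deleted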

metadataServer

Metadata Server settings correspond to the Ceph MDS daemon settings. Contains the following fields:

  • activeCount - the number of active Ceph MDS instances. As load increases, CephFS automatically partitions the file system across the Ceph MDS instances. Rook creates twice the number of Ceph MDS instances requested by activeCount; the extra instances run in standby mode for failover. Mirantis recommends setting this parameter to 1 and increasing the number of MDS daemons only in case of high load.

  • activeStandby - defines whether the extra Ceph MDS instances will be in active standby mode and will keep a warm cache of the file system metadata for faster failover. The instances will be assigned by CephFS in failover pairs. If false, the extra Ceph MDS instances will all be in passive standby mode and will not maintain a warm cache of the metadata. The default value is false.

  • resources - represents Kubernetes resource requirements for Ceph MDS pods.

For example:

cephClusterSpec:
  sharedFilesystem:
    cephFS:
    - name: cephfs-store
      metadataServer:
        activeCount: 1
        activeStandby: false
        resources: # example, non-prod values
          requests:
            memory: 1Gi
            cpu: 1
          limits:
            memory: 2Gi
            cpu: 2

Enable and configure CephFS

Note

Since Container Cloud 2.22.0, CephFS is enabled by default. Therefore, skip steps 1-2.

  1. Open the corresponding Cluster resource for editing:

    kubectl -n <managedClusterProjectName> edit cluster
    

    Substitute <managedClusterProjectName> with the corresponding value.

  2. In the spec.providerSpec.helmReleases section, enable the CephFS CSI plugin installation:

    spec:
      providerSpec:
        helmReleases:
        ...
        - name: ceph-controller
          ...
          values:
            ...
            rookExtraConfig:
              csiCephFsEnabled: true
    
  3. Optional. Override the CSI CephFS gRPC and liveness metrics ports, for example, if an application already uses the default CephFS ports 9092 and 9082, which may cause conflicts on the node:

    spec:
      providerSpec:
        helmReleases:
        ...
        - name: ceph-controller
          ...
          values:
            ...
            rookExtraConfig:
              csiCephFsEnabled: true
              csiCephFsGPCMetricsPort: "9092" # should be a string
              csiCephFsLivenessMetricsPort: "9082" # should be a string
    

    Rook will enable the CephFS CSI plugin and provisioner.
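
    To verify that the CephFS CSI plugin and provisioner pods have started, you can list them in the Rook namespace. The rook-ceph namespace and the app labels below are the Rook defaults and may differ in your deployment:

    kubectl -n rook-ceph get pods -l app=csi-cephfsplugin
    kubectl -n rook-ceph get pods -l app=csi-cephfsplugin-provisioner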

  4. Open the KaaSCephCluster CR of a managed cluster for editing:

    kubectl edit kaascephcluster -n <managedClusterProjectName>
    

    Substitute <managedClusterProjectName> with the corresponding value.

  5. In the sharedFilesystem section, specify parameters according to the CephFS specification above. For example:

    spec:
      cephClusterSpec:
        sharedFilesystem:
          cephFS:
          - name: cephfs-store
            dataPools:
            - name: cephfs-pool-1
              deviceClass: hdd
              replicated:
                size: 3
              failureDomain: host
            metadataPool:
              deviceClass: nvme
              replicated:
                size: 3
              failureDomain: host
            metadataServer:
              activeCount: 1
              activeStandby: false
    
  6. Define the mds role for the corresponding nodes where Ceph MDS daemons should be deployed. Mirantis recommends labeling only one node with the mds role. For example:

    spec:
      cephClusterSpec:
        nodes:
          ...
          worker-1:
            roles:
            ...
            - mds
    

Once CephFS is specified in the KaaSCephCluster CR, Ceph Controller validates it and requests Rook to create CephFS. Then Ceph Controller creates a Kubernetes StorageClass, required to start provisioning the storage, which uses the CephFS CSI driver to create Kubernetes PVs.
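
To verify the result, you can list the created resources as follows. The rook-ceph namespace is the Rook default and may differ in your deployment; the StorageClass name depends on the CephFS instance and data pool names:

kubectl get storageclass                # the CephFS-backed StorageClass appears in the list
kubectl -n rook-ceph get cephfilesystem # the CephFilesystem resource created by Rook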