Enable Ceph Shared File System (CephFS)

TechPreview

The Ceph Shared File System, or CephFS, enables creating read/write shared file system Persistent Volumes (PVs). These PVs support the ReadWriteMany access mode for the Filesystem volume mode. CephFS deploys its own daemons, called Metadata Servers (Ceph MDS). For details, see Ceph Documentation: Ceph File System.

Note

By design, the CephFS data pool and metadata pool must be replicated only.

Note

Due to the Technology Preview status of the feature, the following restrictions apply:

  • CephFS is supported through a Kubernetes CSI plugin that supports creating Kubernetes Persistent Volumes based only on the Filesystem volume mode. For a complete support matrix of modes, see Ceph CSI: Support Matrix.

  • Ceph Controller supports only one CephFS installation per Ceph cluster.

  • Prior to Container Cloud 2.19.0, Ceph Controller supports only one data pool per CephFS installation.

  • Since Container Cloud 2.19.0 for non-MOSK based clusters, Ceph Controller supports multiple data pools per CephFS installation.

CephFS specification

The KaaSCephCluster CR includes the spec.cephClusterSpec.sharedFilesystem.cephFS section with the following CephFS parameters:

name

CephFS instance name.

dataPool Deprecated since 2.19.0

CephFS data pool specification that must contain only the replicated and failureDomain parameters. The failureDomain parameter may be set to osd or host, defining the failure domain across which the data will be spread. Only replicated settings are supported for this pool. For example:

cephClusterSpec:
  sharedFilesystem:
    cephFS:
    - name: cephfs-store
      dataPool:
        replicated:
          size: 3
        failureDomain: host

where replicated.size is the number of full copies of data on multiple nodes.

Warning

  • The dataPool field is deprecated in favor of dataPools. All data from this field will be translated into dataPools with data0 as the default data pool name.

  • Modifying dataPool on a deployed CephFS has no effect. You can manually adjust pool settings through the Ceph CLI. However, for any changes in dataPool, Mirantis recommends re-creating CephFS.
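Following the translation rule above, a legacy dataPool definition such as the one in the previous example is equivalent to the following dataPools entry, with data0 assigned as the default pool name:

```yaml
cephClusterSpec:
  sharedFilesystem:
    cephFS:
    - name: cephfs-store
      dataPools:
      - name: data0   # default name assigned during translation from dataPool
        replicated:
          size: 3
        failureDomain: host
```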

dataPools Available since 2.19.0 for non-MOSK based clusters

A list of CephFS data pool specifications. Each spec contains the name, replicated or erasureCoded, and failureDomain parameters. The first pool in the list is treated as the default data pool for CephFS and must always be replicated. The failureDomain parameter may be set to osd or host, defining the failure domain across which the data will be spread. The number of data pools is unlimited, but the default pool must always be present. For example:

cephClusterSpec:
  sharedFilesystem:
    cephFS:
    - name: cephfs-store
      dataPools:
      - name: default-pool
        replicated:
          size: 3
        failureDomain: host
      - name: second-pool
        erasureCoded:
          dataChunks: 2
          codingChunks: 1

where replicated.size is the number of full copies of data on multiple nodes.

Warning

When using a non-recommended replicated.size of less than 3 for Ceph pools, Ceph OSD removal cannot be performed. The minimal replica size equals a rounded-up half of the specified replicated.size. For example, if replicated.size is 2, the minimal replica size is 1; if replicated.size is 3, the minimal replica size is 2. A replica size of 1 allows Ceph to have PGs with only one Ceph OSD in the acting state, which may cause a PG_TOO_DEGRADED health warning that blocks Ceph OSD removal. Mirantis recommends setting replicated.size to 3 for each Ceph pool.

Warning

Modifying dataPools on a deployed CephFS has no effect. You can manually adjust pool settings through the Ceph CLI. However, for any changes in dataPools, Mirantis recommends re-creating CephFS.
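As a sketch of the manual adjustment mentioned above, pool parameters of an existing CephFS data pool can be inspected and changed through the Ceph CLI, for example, from the ceph-tools pod. The pool name cephfs-store-pool-1 below is a hypothetical example:

```shell
# Inspect the current replica count of a CephFS data pool
ceph osd pool get cephfs-store-pool-1 size

# Change the replica count; use with care on production clusters
ceph osd pool set cephfs-store-pool-1 size 3
```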

metadataPool

CephFS metadata pool specification that must contain only the replicated and failureDomain parameters. The failureDomain parameter may be set to osd or host, defining the failure domain across which the data will be spread. Only replicated settings are supported for this pool. For example:

cephClusterSpec:
  sharedFilesystem:
    cephFS:
    - name: cephfs-store
      metadataPool:
        replicated:
          size: 3
        failureDomain: host

where replicated.size is the number of full copies of data on multiple nodes.

Warning

Modifying metadataPool on a deployed CephFS has no effect. You can manually adjust pool settings through the Ceph CLI. However, for any changes in metadataPool, Mirantis recommends re-creating CephFS.

preserveFilesystemOnDelete

Defines whether the data and metadata pools are preserved when CephFS is deleted. Set to true to avoid accidental data loss in case of human error. However, for security reasons, Mirantis recommends setting preserveFilesystemOnDelete to false.
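For example, to keep the pools and their data if the CephFS definition is removed:

```yaml
cephClusterSpec:
  sharedFilesystem:
    cephFS:
    - name: cephfs-store
      # Keep the data and metadata pools if this CephFS is deleted
      preserveFilesystemOnDelete: true
```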

metadataServer

Metadata Server settings correspond to the Ceph MDS daemon settings. Contains the following fields:

  • activeCount - the number of active Ceph MDS instances. As load increases, CephFS automatically partitions the file system across the Ceph MDS instances. Rook creates double the number of Ceph MDS instances requested by activeCount; the extra instances remain in standby mode for failover. Mirantis recommends setting this parameter to 1 and increasing the MDS daemons count only in case of high load.

  • activeStandby - defines whether the extra Ceph MDS instances will be in active standby mode and will keep a warm cache of the file system metadata for faster failover. The instances will be assigned by CephFS in failover pairs. If false, the extra Ceph MDS instances will all be in passive standby mode and will not maintain a warm cache of the metadata. The default value is false.

  • resources - represents Kubernetes resource requirements for Ceph MDS pods.

For example:

cephClusterSpec:
  sharedFilesystem:
    cephFS:
    - name: cephfs-store
      metadataServer:
        activeCount: 1
        activeStandby: false
        resources: # example, non-prod values
          requests:
            memory: 1Gi
            cpu: 1
          limits:
            memory: 2Gi
            cpu: 2
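Once CephFS is deployed, the resulting Ceph MDS pods can be checked on the managed cluster. With activeCount: 1, one active and one standby instance are expected. The rook-ceph namespace and the app=rook-ceph-mds label follow Rook defaults and may differ in your deployment:

```shell
# List Ceph MDS pods created by Rook for the CephFS instance
kubectl -n rook-ceph get pods -l app=rook-ceph-mds
```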

Enable CephFS

  1. Open the corresponding Cluster resource for editing:

    kubectl -n <managedClusterProjectName> edit cluster
    

    Substitute <managedClusterProjectName> with the corresponding value.

  2. In the spec.providerSpec.helmReleases section, enable the CephFS CSI plugin installation:

    spec:
      providerSpec:
        helmReleases:
        ...
        - name: ceph-controller
          ...
          values:
            ...
            rookExtraConfig:
              csiCephFsEnabled: true
    

    You can also override the CSI CephFS gRPC and liveness metrics ports, for example, if an application already uses the default CephFS ports 9092 and 9082, which may cause conflicts on the node:

    spec:
      providerSpec:
        helmReleases:
        ...
        - name: ceph-controller
          ...
          values:
            ...
            rookExtraConfig:
              csiCephFsEnabled: true
              csiCephFsGRPCMetricsPort: "9092" # should be a string
              csiCephFsLivenessMetricsPort: "9082" # should be a string
    

    Rook will enable the CephFS CSI plugin and provisioner.
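    To verify that the CephFS CSI plugin and provisioner pods started, you can list them on the managed cluster. The csi-cephfs pod name prefix and the rook-ceph namespace follow Rook defaults and may differ in your deployment:

```shell
# List the CephFS CSI plugin and provisioner pods
kubectl -n rook-ceph get pods | grep csi-cephfs
```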

  3. Save Cluster and close the editor.

  4. Open the KaaSCephCluster CR of a managed cluster for editing:

    kubectl edit kaascephcluster -n <managedClusterProjectName>
    

    Substitute <managedClusterProjectName> with the corresponding value.

  5. In the sharedFilesystem section, specify parameters according to CephFS specification. For example:

    • Prior to Container Cloud 2.19.0:

      spec:
        cephClusterSpec:
          sharedFilesystem:
            cephFS:
            - name: cephfs-store
              dataPool:
                replicated:
                  size: 3
                failureDomain: host
              metadataPool:
                replicated:
                  size: 3
                failureDomain: host
              metadataServer:
                activeCount: 1
                activeStandby: false
      
    • Since Container Cloud 2.19.0 for non-MOSK based clusters:

      spec:
        cephClusterSpec:
          sharedFilesystem:
            cephFS:
            - name: cephfs-store
              dataPools:
              - name: cephfs-pool-1
                replicated:
                  size: 3
                failureDomain: host
              metadataPool:
                replicated:
                  size: 3
                failureDomain: host
              metadataServer:
                activeCount: 1
                activeStandby: false
      
  6. Define the mds role for the corresponding nodes where Ceph MDS daemons should be deployed. Mirantis recommends labeling only one node with the mds role. For example:

    spec:
      cephClusterSpec:
        nodes:
          ...
          worker-1:
            roles:
            ...
            - mds
    
  7. Save KaaSCephCluster and close the editor.

Once CephFS is specified in the KaaSCephCluster CR, Ceph Controller will validate it and request Rook to create CephFS. Then Ceph Controller will create a Kubernetes StorageClass, required to start provisioning storage, which uses the CephFS CSI driver to create Kubernetes PVs.
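Once provisioning resources are created, the new StorageClass can be verified on the managed cluster. The cephfs-store name follows the examples above:

```shell
# Verify the StorageClass created for the CephFS instance
kubectl get storageclass cephfs-store-cephfs
```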

Note

The StorageClass will be named <cephfs-name>-cephfs, and its provisioner will be set to rook-ceph.cephfs.csi.ceph.com. To use CephFS for provisioning volumes, the storageClassName in the PersistentVolumeClaim (PVC) must be set to <cephfs-name>-cephfs. For example:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cephfs-pvc-example
  namespace: some-namespace
spec:
  storageClassName: <cephfs-name>-cephfs
  ...
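A minimal sketch of a workload consuming such a PVC follows. The pod name and image are hypothetical examples, and the PVC above must exist in the same namespace:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cephfs-pod-example   # hypothetical name
  namespace: some-namespace
spec:
  containers:
  - name: app
    image: busybox           # example image
    command: ["sleep", "infinity"]
    volumeMounts:
    - name: shared-data
      mountPath: /mnt/cephfs # CephFS-backed shared volume
  volumes:
  - name: shared-data
    persistentVolumeClaim:
      claimName: cephfs-pvc-example
```

Because the StorageClass supports ReadWriteMany, multiple pods can mount the same PVC simultaneously.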