Upgrade the Ceph cluster

This section describes how to upgrade an existing Ceph cluster from Jewel to Luminous and from Luminous to Nautilus. If your Ceph version is Jewel, you must first upgrade to Luminous before upgrading to Nautilus. The Ceph - upgrade pipeline contains several stages. Each node is upgraded separately, and after each node the pipeline waits for user input to confirm that the Ceph cluster status is correct and that the upgrade of the node succeeded. Because the upgrade is performed on a node-by-node basis, you can immediately roll back any failed node.

Note

The following setup provides for the Ceph upgrade to a major version. To update the Ceph packages to the latest minor versions, follow Update Ceph.

Warning

Before you upgrade Ceph:

  1. If Ceph is being upgraded as part of the MCP upgrade, verify that you have upgraded your MCP cluster as described in Upgrade DriveTrain to a newer release version.

  2. Verify that you have configured the server and client roles for a Ceph backup as described in Create a backup schedule for Ceph nodes.

  3. The upgrade of Ceph Luminous to Nautilus is supported starting from the 2019.2.10 maintenance update. Verify that you have performed the following steps:

    1. Apply maintenance updates.

    2. Enable the ceph-volume tool.

To upgrade the Ceph cluster:

  1. Open your Git project repository with the Reclass model on the cluster level.
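
    For example, on the Salt Master node, the cluster-level Reclass model typically resides in /srv/salt/reclass/classes/cluster/<cluster_name>/; the exact path and cluster name depend on your deployment:

    cd /srv/salt/reclass/classes/cluster/<cluster_name>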

  2. In cluster/ceph/init.yml, specify the ceph_version parameter as required:

    • To upgrade from Jewel to Luminous:

      _param:
        ceph_version: luminous
      
    • To upgrade from Luminous to Nautilus:

      _param:
        ceph_version: nautilus
      
  3. In infra/init.yml, specify the linux_system_repo_update_mcp_ceph_url parameter as required:

    • To upgrade from Jewel to Luminous:

      _param:
        linux_system_repo_update_mcp_ceph_url: ${_param:linux_system_repo_update_url}/ceph-luminous/
      
    • To upgrade from Luminous to Nautilus:

      _param:
        linux_system_repo_update_mcp_ceph_url: ${_param:linux_system_repo_update_url}/ceph-nautilus/
      
  4. In cluster/ceph/mon.yml, verify that the following line is present:

    classes:
    - system.ceph.mgr.cluster
    
  5. Commit the changes to your local repository:

    git add infra/init.yml
    git add ceph/init.yml
    git commit -m "updated repositories for Ceph upgrade"
    
  6. Refresh Salt pillars:

    salt '*' saltutil.refresh_pillar
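
    Optionally, verify that the new value is now visible in the pillar data, for example on the Ceph Monitor nodes (assuming the cmn01* node naming used elsewhere in this procedure):

    salt 'cmn01*' pillar.get _param:ceph_version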
    
  7. Enable scrubbing. While scrubbing itself is not mandatory for the upgrade, the noscrub and nodeep-scrub flags cause a HEALTH_WARN status, and the pipeline job requires a healthy cluster to proceed.

    Note

    If you plan to run the Ceph - upgrade pipeline with the WAIT_FOR_HEALTHY parameter selected, skip this step and proceed to step 8.

    1. From any Ceph node, run ceph -s as a root user and inspect the output.

      • If the third line of the output is health: HEALTH_WARN, inspect the lines that follow. If they include nodeep-scrub flag(s) set or noscrub flag(s) set, unset the flags as described below.

      • If deep scrubbing and scrubbing are already enabled, proceed to step 8.

    2. Restrict deep scrubbing during the upgrade:

      1. Set the week day to begin scrubbing:

        ceph tell 'osd.*' injectargs '--osd_scrub_begin_week_day DAY_NUM'
        

        DAY_NUM is the number of the week day, where Monday is 1. Specify a day that leaves enough time to finish the upgrade. For example, if today is Monday and you schedule two days for the upgrade, set DAY_NUM to 3, which is Wednesday.

      2. Set the week day to end scrubbing:

        ceph tell 'osd.*' injectargs '--osd_scrub_end_week_day DAY_NUM_END'
        

        DAY_NUM_END must equal (DAY_NUM + 1) % 7. For the example above, with DAY_NUM set to 3, DAY_NUM_END is (3 + 1) % 7 = 4, which is Thursday.
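
        For instance, for the Monday scenario above with DAY_NUM set to 3 and DAY_NUM_END set to 4, the two commands would be as follows. The values are illustrative and must match your own schedule:

        ceph tell 'osd.*' injectargs '--osd_scrub_begin_week_day 3'
        ceph tell 'osd.*' injectargs '--osd_scrub_end_week_day 4'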

    3. Enable scrubbing and deep scrubbing:

      ceph osd unset noscrub
      ceph osd unset nodeep-scrub
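
      Optionally, confirm that the flags are no longer set by checking the flags line of the OSD map:

      ceph osd dump | grep flags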
      
  8. Log in to the Jenkins web UI.

  9. Open the Ceph - upgrade pipeline.

  10. Specify the following parameters:

    • SALT_MASTER_CREDENTIALS
      The Salt Master credentials to use for the connection. Defaults to salt.

    • SALT_MASTER_URL
      The Salt Master node host URL with the salt-api port. Defaults to the jenkins_salt_api_url parameter. For example, http://172.18.170.27:6969.

    • ADMIN_HOST
      Add cmn01* as the Ceph cluster node with the admin keyring.

    • CLUSTER_FLAGS
      Add a comma-separated list of flags to apply before and after the pipeline:

      • The sortbitwise,noout flags are mandatory for the upgrade of Ceph Jewel to Luminous.

      • The noout flag is mandatory for the upgrade of Ceph Luminous to Nautilus.

    • WAIT_FOR_HEALTHY
      Verify that this parameter is selected, as it enables the Ceph health check within the pipeline.

    • ORIGIN_RELEASE
      Add the current Ceph release version.

    • TARGET_RELEASE
      Add the required Ceph release version.

    • STAGE_UPGRADE_MON
      Select to upgrade Ceph mon nodes.

    • STAGE_UPGRADE_MGR
      Select to deploy new mgr services or upgrade the existing ones.

    • STAGE_UPGRADE_OSD
      Select to upgrade Ceph osd nodes.

    • STAGE_UPGRADE_RGW
      Select to upgrade Ceph rgw nodes.

    • STAGE_UPGRADE_CLIENT
      Select to upgrade Ceph client nodes, such as ctl, cmp, and others.

    • STAGE_FINALIZE
      Select to set the configurations recommended for TARGET_RELEASE as a final step of the upgrade.

    • BACKUP_ENABLED
      Select to copy the disks of Ceph VMs before the upgrade and to back up Ceph directories on OSD nodes.

      Note

      During the backup, the virtual machines are backed up sequentially, one after another: each VM is destroyed, its disk is copied, and then the VM is started again. After a VM starts, the backup procedure pauses until the VM rejoins the Ceph cluster and only then continues with the next node. On OSD nodes, only the /etc/ceph and /var/lib/ceph/ directories are backed up. Mirantis recommends verifying that each OSD has been successfully upgraded before proceeding to the next one.

    • BACKUP_DIR (added since the 2019.2.4 maintenance update)
      Optional. If BACKUP_ENABLED is selected, specify the target directory for the backup.

  11. Click Deploy.

    Warning

    If the upgrade on the first node fails, stop the upgrade procedure and roll back the failed node as described in Roll back Ceph services.

The Ceph - upgrade pipeline workflow:

  1. Perform the backup.

  2. Set upgrade flags.

  3. Perform the following steps for each selected stage for each node separately:

    1. Update Ceph repository.

    2. Upgrade Ceph packages.

    3. Restart Ceph services.

    4. Execute the verification command.

    5. Wait for user input to proceed.
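
    For an Ubuntu-based node, these per-node actions correspond roughly to the commands below. This is an illustrative sketch only; the exact package list, service targets, and verification command are defined by the pipeline and depend on the node role:

    apt-get update
    apt-get install --only-upgrade ceph-common ceph-osd    # package set depends on the node role
    systemctl restart ceph-osd.target                      # or ceph-mon.target, ceph-mgr.target, ceph-radosgw.target
    ceph -s                                                # verify the cluster status before proceeding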

  4. Unset the upgrade flags.

  5. Set ceph osd require-osd-release to TARGET_RELEASE (see the example commands after this list).

  6. Set ceph osd set-require-min-compat-client to ORIGIN_RELEASE.

  7. Set CRUSH tunables to optimal.

  8. If you enabled scrubbing and deep scrubbing before starting the upgrade, disable them by setting the noscrub and nodeep-scrub flags with ceph osd set noscrub and ceph osd set nodeep-scrub. Also, remove the scrubbing schedule settings, if any.
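
For reference, for an upgrade from Luminous to Nautilus, the finalization steps 5 through 7 above correspond approximately to the following commands. This is an illustrative sketch of what the pipeline runs, not commands that you need to enter manually:

  ceph osd require-osd-release nautilus
  ceph osd set-require-min-compat-client luminous
  ceph osd crush tunables optimal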