Upgrade the Ceph cluster

This section describes how to upgrade an existing Ceph cluster from Jewel to Luminous and from Luminous to Nautilus. If your Ceph version is Jewel, you must first upgrade to Luminous before upgrading to Nautilus. The Ceph - upgrade pipeline contains several stages. Each node is upgraded separately, and the pipeline waits for user input to confirm that the Ceph cluster status is correct and that the upgrade of the node succeeded. Because the upgrade is performed on a node-by-node basis, you can immediately roll back any failed node.

Note

The following procedure covers the upgrade of Ceph to a new major version. To update the Ceph packages to the latest minor versions, follow Update Ceph.

Warning

Before you upgrade Ceph:

  1. If Ceph is being upgraded as part of the MCP upgrade, verify that you have upgraded your MCP cluster as described in Upgrade DriveTrain to a newer release version.
  2. Verify that you have configured the server and client roles for a Ceph backup as described in Create a backup schedule for Ceph nodes.
  3. The upgrade of Ceph Luminous to Nautilus is supported starting from the 2019.2.10 maintenance update. Verify that you have performed the following steps:
    1. Apply maintenance updates.
    2. Enable the ceph-volume tool.
  4. If you are upgrading Ceph from a version prior to 14.2.20, verify that the ceph:common:config:mon:auth_allow_insecure_global_id_reclaim pillar is unset or set to true. See the verification example below.
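
For example, to check this pillar from the Salt Master node (a minimal illustration, assuming cmn01* matches your Ceph Monitor nodes; an empty result means the pillar is unset):

  salt 'cmn01*' pillar.get ceph:common:config:mon:auth_allow_insecure_global_id_reclaim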

To upgrade the Ceph cluster:

  1. Open your Git project repository with the Reclass model on the cluster level.

  2. In ceph/init.yml, specify the ceph_version parameter as required:

    • To upgrade from Jewel to Luminous:

      _param:
        ceph_version: luminous
      
    • To upgrade from Luminous to Nautilus:

      _param:
        ceph_version: nautilus
        linux_system_repo_mcp_ceph_codename: ${_param:ceph_version}
      
  3. Reconfigure package repositories:

    • To upgrade from Jewel to Luminous, in infra/init.yml, specify the linux_system_repo_update_mcp_ceph_url parameter:

      _param:
        linux_system_repo_update_mcp_ceph_url: ${_param:linux_system_repo_update_url}/ceph-luminous/
      
    • To upgrade from Luminous to Nautilus, remove the obsolete package repository by deleting all includes of system.linux.system.repo.mcp.apt_mirantis.ceph from the cluster model.

      The default files that include this class are as follows:

      • ceph/common.yml
      • openstack/init.yml
      • infra/kvm.yml

      Also, remove this class from non-default files of your cluster model, if any.
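
      To locate any remaining includes of this class, you can search the model, for example from the Salt Master node (an illustrative command; the model path may differ on your deployment):

        grep -r "system.linux.system.repo.mcp.apt_mirantis.ceph" /srv/salt/reclass/classes/cluster/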

  4. In ceph/mon.yml, verify that the following class is included:

    classes:
    - system.ceph.mgr.cluster
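
    After you refresh the Salt pillars in step 6, you can confirm that the Ceph Manager role resolves to the intended nodes (an illustrative check using a compound pillar match):

      salt -C 'I@ceph:mgr' test.ping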
    
  5. Commit the changes to your local repository:

    git add infra/init.yml
    git add ceph/init.yml
    git add ceph/mon.yml
    git commit -m "updated repositories for Ceph upgrade"
    
  6. Refresh Salt pillars:

    salt '*' saltutil.refresh_pillar
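
    To confirm that the new value has propagated, you can query it on a node, for example (an illustrative check, assuming cmn01* matches your Ceph Monitor nodes):

      salt 'cmn01*' pillar.get _param:ceph_version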
    
  7. Unset all flags and verify that the cluster is healthy. See the example after the note below.

    Note

    Proceeding with some flags set on the cluster may cause unexpected errors.
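
    For example, to unset a flag and check the cluster health from the cmn01 node (a minimal illustration, assuming the noout flag was previously set):

      ceph osd unset noout
      ceph -s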

  8. Log in to the Jenkins web UI.

  9. Open the Ceph - upgrade pipeline.

  10. Specify the following parameters:

    • SALT_MASTER_CREDENTIALS - The Salt Master credentials to use for the connection. Defaults to salt.
    • SALT_MASTER_URL - The Salt Master node host URL with the salt-api port. Defaults to the jenkins_salt_api_url parameter. For example, http://172.18.170.27:6969.
    • ADMIN_HOST - Add cmn01* as the Ceph cluster node with the admin keyring.
    • CLUSTER_FLAGS - Add a comma-separated list of flags to apply before and after the pipeline:

      • The sortbitwise,noout flags are mandatory for the upgrade of Ceph Jewel to Luminous.
      • The noout flag is mandatory for the upgrade of Ceph Luminous to Nautilus.
      • Specify the flags that you unset in step 7. The flags will be automatically unset at the end of the pipeline execution.

    • WAIT_FOR_HEALTHY - Verify that this parameter is selected as it enables the Ceph health check within the pipeline.
    • ORIGIN_RELEASE - Add the current Ceph release version.
    • TARGET_RELEASE - Add the required Ceph release version.
    • STAGE_UPGRADE_MON - Select to upgrade the Ceph mon nodes.
    • STAGE_UPGRADE_MGR - Select to deploy new mgr services or upgrade the existing ones.
    • STAGE_UPGRADE_OSD - Select to upgrade the Ceph osd nodes.
    • STAGE_UPGRADE_RGW - Select to upgrade the Ceph rgw nodes.
    • STAGE_UPGRADE_CLIENT - Select to upgrade the Ceph client nodes, such as ctl, cmp, and others.
    • STAGE_FINALIZE - Select to apply the configurations recommended for TARGET_RELEASE as the final step of the upgrade.
    • BACKUP_ENABLED - Select to copy the disks of Ceph VMs before the upgrade and to back up Ceph directories on OSD nodes.

      Note

      During the backup, the virtual machines are backed up sequentially: each VM is destroyed, its disk is copied, and then the VM is started again. After a VM launches, the backup procedure pauses until the VM rejoins the Ceph cluster, and only then does it continue with the next node. On OSD nodes, only the /etc/ceph and /var/lib/ceph/ directories are backed up. Mirantis recommends verifying that each OSD has been successfully upgraded before proceeding to the next one.

    • BACKUP_DIR - Added since the 2019.2.4 maintenance update. Optional. If BACKUP_ENABLED is selected, specify the target directory for the backup.
  11. Click Deploy.

    Warning

    If the upgrade on the first node fails, stop the upgrade procedure and roll back the failed node as described in Roll back Ceph services.

  12. In case of the Ceph Luminous to Nautilus upgrade, run the Deploy - Upgrade Stacklight Jenkins pipeline job to reconfigure StackLight for the new Ceph metrics, as described in Upgrade StackLight LMA using the Jenkins job.

Caution

The Jenkins pipeline job changes the repositories, upgrades the packages, and restarts the services on each node of the selected groups. Because the pipeline installs packages from the configured repositories and does not verify the version to install, the versions of the Ceph packages can differ between nodes after the upgrade in some environments. Such a mismatch can affect a single node, for example, due to a manual configuration made without the model, or an entire group, such as the Ceph OSD nodes, due to a misconfiguration in Reclass. Running a cluster with mismatching versions is possible. However, such a configuration is not supported and, with certain version combinations, may cause a cluster outage.

The Jenkins pipeline job reports the package version changes in the console output for each node. Consider reviewing them before proceeding to the next node, especially on the first node of each component.
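
For example, to compare the installed Ceph package versions across all Ceph OSD nodes from the Salt Master node (an illustrative check, assuming the ceph:osd pillar identifies the OSD nodes):

  salt -C 'I@ceph:osd' pkg.version ceph-osd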

The Ceph - upgrade pipeline workflow:

  1. Perform the backup.
  2. Set upgrade flags.
  3. Perform the following steps for each selected stage for each node separately:
    1. Update Ceph repository.
    2. Upgrade Ceph packages.
    3. Restart Ceph services.
    4. Execute the verification command.
    5. Wait for user input to proceed.
  4. Unset the upgrade flags.
  5. Set ceph osd require-osd-release to TARGET_RELEASE.
  6. Set ceph osd set-require-min-compat-client to ORIGIN_RELEASE.
  7. Set CRUSH tunables to optimal.
  8. If scrubbing and deep scrubbing were enabled before starting the upgrade, disable them by running the ceph osd set noscrub and ceph osd set nodeep-scrub commands. Also, remove the scrubbing settings, if any.
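
After the pipeline completes, you can verify the result from the cmn01 node, for example (an illustrative check; the ceph versions command is available since Luminous):

  ceph versions
  ceph osd dump | grep require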