Upgrade and update an MCP cluster

A typical MCP cluster includes multiple components, such as DriveTrain, StackLight, OpenStack, OpenContrail, and Ceph. Most MCP components have their own versioning schema. For the majority of the components, MCP supports multiple versions at once.

The upgrade of an MCP deployment to a new version is a multi-step process that must take into account the cross-dependencies between the components of the platform and the compatibility matrix of the supported component versions.

The MCP components that do not have their own versioning schema within MCP and are versioned by the MCP release include:

  • The DriveTrain components: Aptly, Gerrit, Jenkins, Reclass, Salt formulas and their subcomponents
  • StackLight LMA

Caution

Before proceeding with the upgrade procedure, verify that you have updated DriveTrain including Aptly, Gerrit, Jenkins, Reclass, Salt formulas, and their subcomponents to the current MCP release version. Otherwise, the current MCP product documentation is not applicable to your MCP deployment.

Note

Starting from the MCP 2019.2.16 maintenance update, before proceeding with any update or upgrade procedure, first verify that Nova cell mapping is disabled. For details, see Disable Nova cell mapping.

Note

Starting from the MCP 2019.2.17 maintenance update, before proceeding with the next update procedure, you can verify that the model contains information about the necessary fixes and workarounds. For details, see Verify DriveTrain.

For the MCP components that support multiple versions, such as OpenStack or OpenContrail, you can usually select between two operations:

  • Minor version update (maintenance update)

    New minor versions of the component artifacts are installed. Services are restarted as necessary. This kind of update allows you to obtain the latest bug and security fixes for the components, but it typically does not change the components' capabilities.

  • Major version update (upgrade)

    New major versions of the component artifacts are installed. Additional orchestration tasks are executed to change the component configuration, if necessary. This kind of update typically changes and improves the components' capabilities.

The following table outlines a general upgrade and update procedure workflow of an MCP cluster. For the detailed upgrade and update workflow of MCP components, refer to the corresponding sections below.

General upgrade and update procedure workflow
# Stage Description
1 Upgrade or update DriveTrain

Perform the basic LCM update or upgrade:

  1. Update the Reclass system.
  2. Fetch the corresponding Git repositories.
  3. Update all binary repository definitions on the Salt Master node.
  4. Update and sync all Salt formulas.
  5. Apply the linux.repo, linux.user, and openssh states on all nodes.
  6. Upgrade or update the DriveTrain services.
  7. Optional. Upgrade system packages on the Salt Master node.
  8. Upgrade or update GlusterFS:
    1. Upgrade or update packages for the GlusterFS server on each target host one by one.
    2. Upgrade or update packages for the GlusterFS clients and re-mount volumes on each target GlusterFS client host one by one.
    3. Obtain the cluster.max-op-version option value from GlusterFS and compare it with cluster.op-version to identify whether a version upgrade is required.
    4. Update cluster.op-version.
  9. Optional. Configure allowed and rejected IP addresses for the GlusterFS volumes.
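
The op-version comparison in steps 8.3 and 8.4 above can be sketched as follows. This is a minimal illustration, not part of the MCP tooling: the function names are hypothetical, the sample values are made up, and the parser assumes the two-column Option/Value layout that `gluster volume get all <option>` typically prints.

```python
def parse_option_value(output: str, option: str) -> int:
    """Return the integer value of `option` from two-column Option/Value text."""
    for line in output.splitlines():
        fields = line.split()
        # Skip the header, the separator, and any unrelated rows.
        if len(fields) == 2 and fields[0] == option:
            return int(fields[1])
    raise ValueError(f"option {option!r} not found in output")


def op_version_update_needed(current_output: str, max_output: str) -> bool:
    """True when cluster.op-version is lower than cluster.max-op-version."""
    current = parse_option_value(current_output, "cluster.op-version")
    maximum = parse_option_value(max_output, "cluster.max-op-version")
    return current < maximum


# Sample text only: assumed layout, made-up values.
SAMPLE_CURRENT = """Option                  Value
------                  -----
cluster.op-version      31200
"""
SAMPLE_MAX = """Option                  Value
------                  -----
cluster.max-op-version  50400
"""

if __name__ == "__main__":
    print(op_version_update_needed(SAMPLE_CURRENT, SAMPLE_MAX))
```

If the function returns False, the two values already match and step 8.4 can presumably be skipped.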
2 Upgrade or update OpenContrail (if applicable)
  1. Verify the OpenContrail service statuses.
  2. Back up the Cassandra and ZooKeeper data.
  3. Stop the Neutron server services.
  4. Upgrade or update the OpenContrail analytics nodes simultaneously. During upgrade, new Docker containers for the OpenContrail analytics nodes are spawned. During update, the corresponding Docker images are updated.
  5. Upgrade or update the OpenContrail controller nodes. During upgrade, new Docker containers for the OpenContrail controller nodes are spawned. During update, the corresponding Docker images are updated. All nodes are upgraded or updated simultaneously except the one that currently runs the contrail-control service, which is upgraded or updated after the other nodes.
  6. Upgrade or update the OpenContrail packages on the OpenStack controller nodes simultaneously.
  7. Start the Neutron server services.
  8. Upgrade or update the OpenContrail data plane nodes one by one, migrating the workloads if needed, because this step implies downtime of the Networking service.
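
The ordering rule for the controller nodes in step 5 above (all nodes in parallel, the node running contrail-control last) can be sketched as follows. The function name and the node names are hypothetical, and how the node that runs contrail-control is detected is outside the scope of this illustration.

```python
from typing import List, Tuple


def plan_controller_batches(
    nodes: List[str], active_control_node: str
) -> Tuple[List[str], str]:
    """Split controller nodes into a parallel batch and the node handled last.

    `active_control_node` is the node assumed to be running the
    contrail-control service at the time of the upgrade or update.
    """
    if active_control_node not in nodes:
        raise ValueError(f"{active_control_node!r} is not a controller node")
    parallel = [node for node in nodes if node != active_control_node]
    return parallel, active_control_node


if __name__ == "__main__":
    # Hypothetical node names, for illustration only.
    first_batch, last_node = plan_controller_batches(
        ["ntw01", "ntw02", "ntw03"], "ntw02"
    )
    print("simultaneously:", first_batch)
    print("afterwards:", last_node)
```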
3 Upgrade or update OpenStack or Kubernetes

For OpenStack:

  1. On every OpenStack controller node one by one:
    1. Stop the OpenStack API services.
    2. Upgrade or update the OpenStack packages.
    3. Start the OpenStack services.
    4. Apply the OpenStack states.
    5. Verify that the OpenStack services are up and healthy.
  2. Upgrade the OpenStack data plane.

Caution

We recommend that you do not upgrade or update OpenStack and RabbitMQ simultaneously. Upgrade or update the RabbitMQ component only once OpenStack is running on the new version.

4 Upgrade or update Galera
  1. Prepare the Galera cluster for the upgrade.
  2. Upgrade or update the MySQL and Galera packages on the Galera nodes one by one.
  3. Verify the cluster status after upgrade.
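
The cluster status verification in step 3 above could look like the following sketch. The source does not state which checks the MCP procedure runs, so this is an assumption: it inspects the standard Galera `wsrep_*` status variables (as returned by `SHOW GLOBAL STATUS LIKE 'wsrep_%'`) passed in as a dictionary. The function name and the expected cluster size are hypothetical.

```python
from typing import Dict, List


def galera_node_problems(status: Dict[str, str], expected_size: int) -> List[str]:
    """Return a list of human-readable problems; an empty list means healthy."""
    problems = []
    if status.get("wsrep_cluster_status") != "Primary":
        problems.append("node is not in the primary component")
    if status.get("wsrep_local_state_comment") != "Synced":
        problems.append("node is not synced")
    if status.get("wsrep_ready") != "ON":
        problems.append("node is not ready to accept queries")
    if int(status.get("wsrep_cluster_size", "0")) != expected_size:
        problems.append("unexpected cluster size")
    return problems


if __name__ == "__main__":
    # Made-up status values for a three-node cluster.
    sample = {
        "wsrep_cluster_status": "Primary",
        "wsrep_local_state_comment": "Synced",
        "wsrep_ready": "ON",
        "wsrep_cluster_size": "3",
    }
    print(galera_node_problems(sample, expected_size=3))
```

Running the check on every Galera node after each single-node package upgrade, rather than only at the end, would catch a node that failed to rejoin before the next one is touched.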
5 Upgrade or update RabbitMQ
  1. Prepare the Neutron service for the RabbitMQ upgrade or update.
  2. Verify that the RabbitMQ upgrade pipeline job is present in Jenkins.
  3. Upgrade or update the RabbitMQ component.

Caution

We recommend that you do not upgrade or update OpenStack and RabbitMQ simultaneously. Upgrade or update the RabbitMQ component only once OpenStack is running on the new version.

For Kubernetes:

  1. Upgrade or update essential Kubernetes binaries, for example, hyperkube, etcd, cni.
  2. Restart essential Kubernetes services.
  3. Upgrade or update the addons definitions with the latest images.
  4. Perform the Kubernetes control plane changes, if any, on every Kubernetes Master node one by one.
  5. Upgrade or update the Kubernetes Nodes one by one.
6 Upgrade or update StackLight
  1. During upgrade, enable the Ceph Prometheus plugin (if applicable).
  2. Upgrade or update system components including Telegraf, Fluentd, Prometheus Relay, libvirt-exporter, and jmx-exporter.
  3. Upgrade or update Elasticsearch and Kibana one by one:
    1. Stop the corresponding service on all log nodes.
    2. Upgrade or update the packages to the newest version.
    3. For Elasticsearch, reload the systemd configuration.
    4. Start the corresponding service on all log nodes.
    5. Verify that the Elasticsearch cluster status is green.
    6. In case of a major version upgrade, transform the indices for the new version of Elasticsearch and migrate Kibana to the new index.
  4. Upgrade or update components running in Docker Swarm:
    1. Disable and remove the previous versions of monitoring services.
    2. Rebuild the Prometheus configuration by applying the prometheus state on the mon nodes.
    3. Disable and remove the previous version of Grafana.
    4. Start the monitoring services by applying the docker state on the mon nodes.
    5. Apply the saltutil.sync_all state and the grafana.client state to refresh the Grafana dashboards.
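
The Elasticsearch status check in step 3.5 above can be sketched as follows. The sketch assumes that the JSON body comes from the Elasticsearch `_cluster/health` endpoint, whose response carries a `status` field with the value green, yellow, or red. The function name and the sample response are illustrative only. Fetching the response is left to the caller.

```python
import json


def cluster_is_green(health_json: str) -> bool:
    """True when a _cluster/health response body reports status 'green'."""
    return json.loads(health_json).get("status") == "green"


if __name__ == "__main__":
    # Made-up, abbreviated response body.
    sample = '{"cluster_name": "example", "status": "yellow", "number_of_nodes": 3}'
    if not cluster_is_green(sample):
        print("Elasticsearch is not green yet; do not proceed to the next step")
```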
7 Upgrade or update Ceph

For upgrade:

  1. Prepare the Ceph cluster for upgrade.
  2. Perform the backup.
  3. Upgrade the Ceph repository on each node one by one.
  4. Upgrade the Ceph packages on each node one by one.
  5. Restart the Ceph services on each node one by one.
  6. Verify the upgrade on each node one by one and wait for user input to proceed.
  7. Perform the post-upgrade procedures.

For update:

  1. Update and install new Ceph packages on the cmn nodes.
  2. Restart Ceph Monitor services on all cmn nodes one by one.
  3. Starting from the 2019.2.8 update, restart Ceph Manager on all mgr nodes one by one.
  4. Update and install new Ceph packages on the osd nodes.
  5. Restart Ceph OSD services on all osd nodes one by one.
  6. Update and install new Ceph packages on the rgw nodes.
  7. Restart Ceph RADOS Gateway services on all rgw nodes one by one.

After the restart of every service, wait for the system to become healthy.
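
Waiting for the cluster to become healthy after each restart can be sketched as a polling loop. This is an illustration under stated assumptions: "healthy" is taken to mean that `ceph health` reports HEALTH_OK, the caller supplies `get_health` (for example, a wrapper around that command), and the function name, timeout, and interval are hypothetical.

```python
import time
from typing import Callable


def wait_until_healthy(
    get_health: Callable[[], str],
    timeout_s: float = 600.0,
    interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll get_health() until it reports HEALTH_OK or the timeout expires."""
    deadline = clock() + timeout_s
    while True:
        if get_health().startswith("HEALTH_OK"):
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


if __name__ == "__main__":
    # Simulated sequence of health reports instead of a real cluster.
    reports = iter(["HEALTH_WARN 1 osds down", "HEALTH_OK"])
    print(wait_until_healthy(lambda: next(reports), interval_s=0.0))
```

Returning False on timeout, instead of raising, lets the caller decide whether to stop the rollout or to ask the operator for confirmation before restarting the next service.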

8 Update the base operating system

Install security updates on all nodes.

This is the final step of the procedure because performing it last reduces the size of the new packages to be installed on the cluster during update or upgrade. However, you can perform it at any stage to fetch only security patches.

9 Apply issue resolutions requiring manual application described in the Addressed issues sections of all maintenance updates

Apply the fixes that require manual application for all maintenance updates one by one.