Rollback MCP to a previous release version

You can roll back your MCP deployment to a previous MCP release version through DriveTrain using the Deploy - update cloud pipeline.

To roll back to the previous stable MCP release version:

  1. Verify that the release repositories for the correct previous Build ID are available. These include the local repositories present in the mirror image backup as well as the local aptly repositories.
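
    For example, on the node that hosts the local aptly repositories, you can list the repositories and snapshots to confirm that the previous Build ID is still available (a minimal check; it assumes the aptly CLI is accessible on that node):

      aptly repo list
      aptly snapshot list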

  2. In the infra/init.yml file of the Reclass model, specify the previous Build ID in the mcp_version parameter.

  3. In the infra/init.yml file of the Reclass model, verify that the following pillar is present:

    parameters:
      linux:
        system:
          purge_repos: true
    
  4. Update the classes/system Git submodule of the Reclass model to the commit of the required Build ID by running the following command from the classes/system directory:

    git pull origin release/BUILD_ID
    
  5. Commit and push the changes to the Git repository where the cluster Reclass model is located.
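
    For example, from the root of the cluster Reclass model repository (the path to infra/init.yml and the branch name below are placeholders that depend on your model layout):

      git add classes/system <PATH_TO_CLUSTER_MODEL>/infra/init.yml
      git commit -m "Roll back to MCP Build ID <BUILD_ID>"
      git push origin <BRANCH_NAME>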

  6. Select from the following options:

    • The ROLLBACK_BY_REDEPLOY parameter was not selected for the update pipeline:

      1. Roll back the Salt Master node:

        1. On the KVM node hosting the Salt Master node, run the following commands:

          virsh destroy cfg01.domain
          virsh define /var/lib/libvirt/images/cfg01.domain.xml
          virsh start cfg01.domain
          virsh snapshot-delete cfg01.domain --metadata ${SNAPSHOT_NAME}
          rm /var/lib/libvirt/images/cfg01.domain.${SNAPSHOT_NAME}.qcow2
          rm /var/lib/libvirt/images/cfg01.domain.xml
          
        2. On the Salt Master node, apply the linux.system.repo Salt state.
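
          For example (a local state run on the Salt Master node):

            salt-call state.apply linux.system.repo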

      2. Roll back the CI/CD nodes:

        1. On the KVM nodes hosting the CI/CD nodes, run the following commands:

          virsh destroy cid0X.domain
          virsh define /var/lib/libvirt/images/cid0X.domain.xml
          virsh start cid0X.domain
          virsh snapshot-delete cid0X.domain --metadata ${SNAPSHOT_NAME}
          rm /var/lib/libvirt/images/cid0X.domain.${SNAPSHOT_NAME}.qcow2
          rm /var/lib/libvirt/images/cid0X.domain.xml
          
        2. On all CI/CD nodes, restart the docker service and apply the linux.system.repo Salt state.
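
          For example, from the Salt Master node, assuming the CI/CD nodes match the cid* target:

            salt 'cid*' service.restart docker
            salt 'cid*' state.apply linux.system.repo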

    • The ROLLBACK_BY_REDEPLOY parameter was selected for the update pipeline:

      1. Roll back the Salt Master node as described in Back up and restore the Salt Master node.

      2. Redeploy the CI/CD nodes:

        1. Destroy and undefine the cid nodes using virsh:

          virsh destroy cid<NUM>.<DOMAIN_NAME>
          virsh undefine cid<NUM>.<DOMAIN_NAME>
          
        2. Remove the cid salt-keys from the Salt Master node.
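
          For example, assuming the CI/CD node keys match the cid* pattern:

            salt-key -d 'cid*' -y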

        3. Redeploy the CI/CD nodes using the Deploy CI/CD procedure.

  7. Run the Deploy - update cloud pipeline in the Jenkins web UI specifying the following parameters as required:

    Deploy - update cloud pipeline parameters
    Parameter Description
    CEPH_OSD_TARGET The Salt-targeted physical Ceph OSD (osd) nodes.
    CID_TARGET The Salt-targeted CI/CD (cid) nodes.
    CMN_TARGET The Salt-targeted Ceph monitor (cmn) nodes.
    CMP_TARGET The Salt-targeted physical compute (cmp) nodes.
    CTL_TARGET The Salt-targeted controller (ctl) nodes.
    DBS_TARGET The Salt-targeted database (dbs) nodes.
    GTW_TARGET The Salt-targeted physical or virtual gateway (gtw) nodes.
    INTERACTIVE Ask interactive questions during the pipeline run. If not selected, the pipeline either succeeds or fails without user interaction.
    KVM_TARGET The Salt-targeted physical KVM (kvm) nodes.
    LOG_TARGET The Salt-targeted log storage and visualization (log) nodes.
    MON_TARGET The Salt-targeted StackLight LMA monitoring (mon) nodes.
    MSG_TARGET The Salt-targeted RabbitMQ server (msg) nodes.
    MTR_TARGET The Salt-targeted StackLight LMA metering (mtr) nodes.
    NAL_TARGET The Salt-targeted OpenContrail 3.2 analytics (nal) nodes.
    NTW_TARGET The Salt-targeted OpenContrail 3.2 controller (ntw) nodes.
    PER_NODE Target nodes will be managed one by one. Recommended.
    PRX_TARGET The Salt-targeted proxy (prx) nodes.
    RGW_TARGET The Salt-targeted RADOS gateway (rgw) nodes.
    ROLLBACK_BY_REDEPLOY Select if live snapshots were not taken during the update.
    SALT_MASTER_URL URL of Salt Master node API.
    SALT_MASTER_CREDENTIALS ID of the Salt Master node API credentials stored in Jenkins.
    SNAPSHOT_NAME Live snapshot name.
    STOP_SERVICES Stop API services before the rollback.
    PURGE_PKGS The space-separated list of pkgs=versions to be purged on the physical targeted machines. For example, pkg_name1=pkg_version1 pkg_name2=pkg_version2.
    REMOVE_PKGS The space-separated list of pkgs=versions to be removed on the physical targeted machines. For example, pkg_name1=pkg_version1 pkg_name2=pkg_version2.
    RESTORE_CONTRAIL_DB

    Restore the Cassandra and ZooKeeper databases for OpenContrail 3.2. OpenContrail 4.x is not supported. Select only if the rollback of the OpenContrail 3.2 controller nodes failed or if a specific backup defined in the cluster model must be restored.

    If RESTORE_CONTRAIL_DB is selected, add the following configuration to the cluster/opencontrail/control.yml file of your Reclass model:

    • For ZooKeeper:

      parameters:
        zookeeper:
          backup:
            client:
              enabled: true
              restore_latest: 1
              restore_from: remote
      
    • For Cassandra:

      parameters:
        cassandra:
          backup:
            client:
              enabled: true
              restore_latest: 1
              restore_from: remote
      
    RESTORE_GALERA

    Restore the Galera database. Select only if the rollback of the database nodes failed or a specific backup defined in the cluster model is required.

    If RESTORE_GALERA is selected, add the xtrabackup restore lines to the cluster/openstack/database/init.yml of your Reclass model:

    parameters:
      xtrabackup:
        client:
          enabled: true
          restore_full_latest: 1
          restore_from: remote
    
    ROLLBACK_PKG_VERSIONS

    The space-separated list of pkgs=versions to roll back to on the targeted physical machines. Copy back the list of package versions that the pipeline recorded before the packages were upgraded. For example, pkg_name1=pkg_version1 pkg_name2=pkg_version2.

    If ROLLBACK_PKG_VERSIONS is empty, apt --allow-downgrades dist-upgrade will be run on the targeted physical machines.

    If ROLLBACK_PKG_VERSIONS contains the salt-minion package, you will have to rerun the pipeline for every targeted physical machine because the salt-minion gets disconnected during the downgrade.

    TARGET_ROLLBACKS The comma-separated list of nodes to roll back. The valid values include ctl, prx, msg, dbs, log, mon, mtr, ntw, nal, gtw-virtual, cmn, rgw, cmp, kvm, osd, gtw-physical.
    TARGET_REBOOT

    The comma-separated list of physical nodes to reboot after a rollback. The valid values include cmp, kvm, osd, gtw-physical.

    Caution

    When a kvm node is defined, the pipeline can be interrupted by the Jenkins slave reboot. If so, remove the nodes that have already been rolled back from TARGET_ROLLBACKS and rerun the pipeline.

    TARGET_HIGHSTATE The comma-separated list of physical nodes to run Salt highstate on after a rollback. The valid values include cmp, kvm, osd, gtw-physical.

    Common rollback workflow for different node types:

    1. Downtime for the VCP occurs if a rollback of the VCP VMs is required.
    2. If the ROLLBACK_BY_REDEPLOY parameter is selected, the VCP VMs are destroyed and undefined, and their salt-keys are deleted from the Salt Master node.
    3. The service or API status is verified.

    The pipeline workflow if ROLLBACK_BY_REDEPLOY is not selected:

    1. The VCP VMs are destroyed by their target type.
    2. The live snapshot, if any, is deleted, and its original base file is used to boot the VM.
    3. Repositories are updated.
    4. Physical machines are rolled back:
      1. Repositories are updated.
      2. Packages defined in PURGE_PKGS are purged.
      3. Packages defined in REMOVE_PKGS are removed.
      4. Package versions defined in ROLLBACK_PKG_VERSIONS are installed.
      5. The Salt highstate is applied.
  8. Verify that the following lines are not present in cluster/infra/backup/client_mysql.yml:

    parameters:
      xtrabackup:
        client:
          cron: false
    
  9. Verify that the following lines are not present in cluster/infra/backup/server.yml:

    parameters:
      xtrabackup:
        server:
          cron: false
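
    To quickly check both cluster/infra/backup/client_mysql.yml and cluster/infra/backup/server.yml for a leftover override, you can grep them from the cluster model directory; if the output shows cron: false, remove those lines:

      grep -n 'cron' cluster/infra/backup/client_mysql.yml cluster/infra/backup/server.yml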
    
  10. If OpenContrail 3.2 is used:

    1. Verify that the following lines are not present in cluster/infra/backup/client_zookeeper.yml and cluster/infra/backup/server.yml:

      parameters:
        zookeeper:
          backup:
            cron: false
      
    2. Verify that the following lines are not present in cluster/infra/backup/client_cassandra.yml and cluster/infra/backup/server.yml:

      parameters:
        cassandra:
          backup:
            cron: false
      
  11. If the ROLLBACK_BY_REDEPLOY parameter was selected and the Deploy - update cloud pipeline succeeds, continue the rollback:

    1. If necessary, roll back the Salt Master node as described in Back up and restore the Salt Master node.

    2. If Ceph is enabled and you want to roll back the Ceph Monitor nodes, proceed with Restore a Ceph Monitor node.

    3. Redeploy the nodes based on your model by running the Deploy - OpenStack pipeline. See the pipeline configuration details in Deploy an OpenStack environment, step 10.

      Note

      Specify k8s in the Install parameter field in case of an MCP Kubernetes deployment.

    4. Rerun the Deploy - update cloud pipeline with RESTORE_GALERA and RESTORE_CONTRAIL_DB selected.