Back up Swarm

MKE manager nodes store the swarm state and manager logs in the /var/lib/docker/swarm/ directory. The Swarm raft logs contain crucial information for recreating Swarm-specific resources, including services, secrets, configurations, and the cryptographic identity of each node. This data includes the keys used to encrypt the raft logs, and you must have these keys to restore the swarm.

Because the raft logs contain node IP address information and are not transferable to other nodes, you must perform a manual backup on each manager node. If you do not back up the raft logs, you cannot verify workloads or Swarm resource provisioning after restoring the cluster.
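
The backup procedure below archives this entire directory. Its exact layout varies by MCR version, but on a typical manager node it resembles the following (the output shown is illustrative):

    ls /var/lib/docker/swarm/
    certificates  docker-state.json  raft  state.json  worker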

Note

You can avoid performing a Swarm backup by storing stack, service, secret, and network definitions in a source code management or config management tool.
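
For example, a stack definition kept under version control captures services and networks declaratively, and can be redeployed with docker stack deploy -c stack.yml mystack. The file below is a minimal, hypothetical illustration, not part of the backup procedure:

    # stack.yml -- a minimal, illustrative stack definition kept in Git
    version: "3.8"
    services:
      web:
        image: nginx:1.25
        deploy:
          replicas: 2
        networks:
          - frontend
    networks:
      frontend:
        driver: overlay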

Swarm backup contents

Data               Backed up   Description
-----------------  ----------  ------------------------------------------------------------
Raft keys          Yes         Keys used to encrypt communication between Swarm nodes and
                               to encrypt and decrypt raft logs
Membership         Yes         List of the nodes in the cluster
Services           Yes         Stacks and services stored in Swarm mode
Overlay networks   Yes         Overlay networks created on the cluster
Configs            Yes         Configs created in the cluster
Secrets            Yes         Secrets saved in the cluster
Swarm unlock key   No          Secret key needed to unlock a manager after its Docker
                               daemon restarts

To back up Swarm:

Note

All commands that follow must be prefixed with sudo, or executed from a superuser shell (for example, by first running sudo sh).

  1. If auto-lock is enabled, retrieve your Swarm unlock key. Refer to Rotate the unlock key in the Docker documentation for more information.

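    For example, the following command prints only the current key, which you should then store somewhere safe outside the cluster:

    docker swarm unlock-key -q
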
  2. Optional. Mirantis recommends that you run at least three manager nodes to achieve high availability, as you must stop MCR on the manager node before performing the backup. A majority of managers must be online for a cluster to be operational, so if you have fewer than three managers, the cluster will be unavailable during the backup.

    Note

    While a manager is shut down, your swarm is more likely to lose quorum if additional nodes are lost. A loss of quorum renders the swarm unavailable until quorum is recovered, which occurs only when more than 50% of the managers become available again. If you regularly take down managers to perform backups, consider running a five-manager swarm, as this enables you to lose one additional manager while the backup is running without disrupting services.

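    Before you begin, you can check how many managers are currently reachable with a filtered node listing, for example:

    docker node ls --filter role=manager --format '{{.Hostname}} {{.ManagerStatus}}'
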
  3. Select a manager node other than the leader, so that the backup does not trigger a new leader election inside the cluster. The following command lists all manager nodes except the leader:

    docker node ls -f "role=manager" | tail -n+2 | grep -vi leader
    
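    The output lists the remaining manager nodes, one per line; the IDs, hostnames, and versions shown below are purely illustrative:

    1bw2scqop7jvh3dqzvyyrkrig   manager-2   Ready   Active   Reachable   23.0.9
    x9u4jolyxvbm02mdqtq4o5yve   manager-3   Ready   Active   Reachable   23.0.9
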
  4. Optional. Store the Mirantis Container Runtime (MCR) version in a variable to easily add it to your backup name.

    ENGINE=$(docker version -f '{{.Server.Version}}')
    
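    You can confirm the captured value with echo; the version shown below is illustrative and depends on your MCR release:

    echo "${ENGINE}"
    23.0.9
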
  5. Stop MCR on the manager node before backing up the data, so that no data is changed during the backup:

    systemctl stop docker
    
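    On systemd-based hosts, you can verify that the service is stopped before proceeding:

    systemctl is-active docker
    inactive
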
  6. Back up the /var/lib/docker/swarm directory:

    tar cvzf "/tmp/swarm-${ENGINE}-$(hostname -s)-$(date +%s%z).tgz" /var/lib/docker/swarm/
    

    You can decode the Unix epoch in the file name by typing date -d @timestamp:

    date -d @1531166143
    Mon Jul  9 19:55:43 UTC 2018
    
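    Before restarting MCR, you may also want to verify the archive and copy it off the node. In the following sketch, backup-host and /backups/ are placeholders for your own backup destination:

    tar tzf /tmp/swarm-*.tgz | head
    scp /tmp/swarm-*.tgz user@backup-host:/backups/
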
  7. Restart MCR on the manager node:

    systemctl start docker
    
  8. If auto-lock is enabled, unlock the swarm with the key that you retrieved in step 1:

    docker swarm unlock
    
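    You can then confirm that the manager has rejoined the cluster by checking that its manager status reads Reachable (or Leader):

    docker node ls -f "role=manager"
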
  9. Repeat the above steps for each manager node.