Hosted Control Plane Backup Procedure#

This document describes a reproducible, step-by-step showcase of how to back up and restore Hosted Control Planes (HCP) via the k0rdent DataSource API.

Note

This document describes the procedure using the Zalando postgres-operator as an example.

Description of the setup#

For the purpose of this guide, we will use the following clusters:

  1. A management Mirantis k0rdent Enterprise cluster.
  2. A regional cluster where HCP and Postgres will be deployed.

Backup scope#

In a cluster where Kine uses PostgreSQL as the backing datastore (replacing etcd), database backups taken via the postgres-operator capture only the cluster's control plane state, not the application data.

What is backed up: All Kubernetes API objects and metadata. This includes workload definitions (Deployments, Pods), cluster configuration (ConfigMaps, Secrets, RBAC), and storage definitions (PV/PVC manifests).

What is NOT backed up: The actual physical data residing on your storage backend. Any files, databases, or state stored inside your PersistentVolumes (PVs) are entirely excluded.

Note

Recommended solution: To fully protect your workloads, implement a dedicated volume backup solution (such as Velero or CSI VolumeSnapshots) alongside these control plane backups.

High-level steps#

  1. Install the postgres-operator and create a postgresql object on the regional cluster.
  2. On the management cluster, create a DataSource object pointing to the host that routes to the Postgres instance.
  3. On the management cluster, create a ClusterDeployment referencing the DataSource.
  4. Make a backup of the database.
  5. Create a postgresql object that restores the database on the regional cluster.
  6. On the management cluster, create a new DataSource object pointing to the host that routes to the restored Postgres instance.
  7. Change the ClusterDeployment object to reference the new DataSource.

Note

For the purposes of this example, a dedicated LoadBalancer Service, with an IP address reachable from the management cluster, is created for each postgresql object.

Here are the steps in detail.

Backing up#

Follow these steps to back up your HCP child clusters.

1. Create a LoadBalancer Service#

Create a Service through which the management cluster will connect to Postgres:

cat <<EOF | kubectl --kubeconfig <regional-kubeconfig> create -f -
apiVersion: v1
kind: Service
metadata:
  name: pg-example-lb
  namespace: default
spec:
  type: LoadBalancer
  ports:
    - port: 5432
      targetPort: 5432
      protocol: TCP
      name: postgres
  selector:
    application: spilo
    spilo-role: master
    cluster-name: pg-example
EOF

Note

If the available infrastructure does not provide a LoadBalancer IP, use a NodePort Service or DNS with an external load balancer instead.
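
For reference, a NodePort variant of the Service might look like the following sketch (the nodePort value is an arbitrary example from the default 30000-32767 range):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: pg-example-lb
  namespace: default
spec:
  type: NodePort
  ports:
    - port: 5432
      targetPort: 5432
      nodePort: 30432   # example value; any free port in the NodePort range
      protocol: TCP
      name: postgres
  selector:
    application: spilo
    spilo-role: master
    cluster-name: pg-example
```

With NodePort, the management cluster connects to `<node-address>:30432` instead of the LoadBalancer IP, and the node address must be used in the certificate's subjectAltName below.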

2. Prepare certificates#

Retrieve the external IP of the Service created earlier:

kubectl --kubeconfig <regional-kubeconfig> -n default get svc pg-example-lb -o jsonpath='{.status.loadBalancer.ingress[0].ip}'

Generate a root CA, then a server (tls) certificate signed by that CA. Use the Service address in the subjectAltName (see the req_ext section in server.cfg):

# server.cfg
[ req ]
default_bits       = 2048
prompt             = no
default_md         = sha256
distinguished_name = dn
req_extensions     = req_ext

[ dn ]
CN = postgres

[ req_ext ]
subjectAltName = @alt_names

[ alt_names ]
IP.1 = <pg-example-lb-service-address>
DNS.1 = postgres

# create CA
openssl genrsa -out ca.key 2048
openssl req -x509 -new -nodes -key ca.key -days 3650 -out ca.crt -subj "/CN=postgres"

# create server key & CSR (server.cfg must contain proper SANs / req_ext)
openssl genrsa -out tls.key 2048
openssl req -new -key tls.key -out tls.csr -config server.cfg

# sign CSR with CA
openssl x509 -req -in tls.csr -CA ca.crt -CAkey ca.key -CAcreateserial -out tls.crt -days 365 -extensions req_ext -extfile server.cfg

3. Create Postgres TLS secrets on regional#

Create a TLS Secret so Postgres pods can use it:

kubectl --kubeconfig <regional-kubeconfig> -n default create secret generic pg-tls \
  --from-file=tls.crt=tls.crt \
  --from-file=tls.key=tls.key \
  --from-file=ca.crt=ca.crt

4. Install postgres-operator and the Postgres instance#

Add the chart repo and install the operator:

helm repo add postgres-operator-charts https://opensource.zalando.com/postgres-operator/charts/postgres-operator
helm repo update

helm --kubeconfig <regional-kubeconfig> install postgres-operator postgres-operator-charts/postgres-operator \
  --namespace default

Note

For simplicity, we will use AWS S3 to store backups. Follow the postgres-operator documentation for other options.

Create a Secret with AWS S3 credentials for storing backups:

kubectl --kubeconfig <regional-kubeconfig> -n default create secret generic aws-creds \
  --from-literal=AWS_ACCESS_KEY_ID=<aws_access_key_id> \
  --from-literal=AWS_SECRET_ACCESS_KEY=<aws_secret_access_key>

Create a YAML file with the postgresql object for the Postgres cluster:

apiVersion: acid.zalan.do/v1
kind: postgresql
metadata:
  name: pg-example
  namespace: default
spec:
  allowedSourceRanges:
  - 0.0.0.0/0
  databases:
    kine: k0smotron # dbname:owner
  enableMasterLoadBalancer: false
  env:
  - name: AWS_ACCESS_KEY_ID
    valueFrom:
      secretKeyRef:
        name: aws-creds
        key: AWS_ACCESS_KEY_ID
  - name: AWS_SECRET_ACCESS_KEY
    valueFrom:
      secretKeyRef:
        name: aws-creds
        key: AWS_SECRET_ACCESS_KEY
  - name: AWS_REGION
    value: <aws-region>
  - name: WALE_S3_PREFIX
    value: s3://<s3-bucket-name>
  - name: WALG_S3_PREFIX
    value: s3://<s3-bucket-name>
  numberOfInstances: 2
  postgresql:
    version: "17"
  spiloFSGroup: 103
  teamId: acid
  tls:
    caFile: ca.crt
    secretName: pg-tls
  users:
    k0smotron: # database owner
    - superuser
    - createdb
  volume:
    size: 10Gi

Create the object:

kubectl --kubeconfig <regional-kubeconfig> -n default create -f <postgres-cr>.yaml

Wait for the pods to be ready and get the master pod name:

kubectl --kubeconfig <regional-kubeconfig> -n default get pods -l application=spilo,cluster-name=pg-example,spilo-role=master -o jsonpath='{.items[0].metadata.name}'
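
As an alternative to polling, readiness can be awaited directly; a sketch, assuming the operator labels shown above:

```shell
# Wait (up to 5 minutes) for the Postgres master pod to become Ready
kubectl --kubeconfig <regional-kubeconfig> -n default wait pod \
  -l application=spilo,cluster-name=pg-example,spilo-role=master \
  --for=condition=Ready --timeout=300s
```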

5. Prepare and create a DataSource object#

Create a Secret on the management cluster containing the CA (generated in the previous steps) and another Secret with the DB credentials (username and password):

kubectl --kubeconfig <mgmt-kubeconfig> -n default create secret generic postgres-ca --from-file=ca.crt=ca.crt
REG_SECRET_NAME="k0smotron.pg-example.credentials.postgresql.acid.zalan.do"

kubectl --kubeconfig <mgmt-kubeconfig> -n default create secret generic auth-secret \
  --from-literal=password=$(kubectl --kubeconfig <regional-kubeconfig> -n default get secret ${REG_SECRET_NAME} -o go-template='{{.data.password|base64decode}}') \
  --from-literal=username=$(kubectl --kubeconfig <regional-kubeconfig> -n default get secret ${REG_SECRET_NAME} -o go-template='{{.data.username|base64decode}}')

Note

Ensure secret names and namespaces are correct. The example secret name pattern comes from the postgres-operator.

Retrieve the external IP of the Service created earlier:

kubectl --kubeconfig <regional-kubeconfig> -n default get svc pg-example-lb -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
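
With that address, you can optionally verify end-to-end connectivity before wiring things into a DataSource. A sketch, assuming a local psql client and the `kine` database owned by `k0smotron` created above:

```shell
# Resolve the LoadBalancer address and fetch the copied password
PG_HOST=$(kubectl --kubeconfig <regional-kubeconfig> -n default get svc pg-example-lb \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
PGPASSWORD=$(kubectl --kubeconfig <mgmt-kubeconfig> -n default get secret auth-secret \
  -o go-template='{{.data.password|base64decode}}')
export PGPASSWORD

# Connect with full TLS verification against the generated CA
psql "host=${PG_HOST} port=5432 user=k0smotron dbname=kine sslmode=verify-full sslrootcert=ca.crt" -c 'SELECT 1;'
```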

Create a YAML file with the DataSource object that contains host, port and CA references.

apiVersion: k0rdent.mirantis.com/v1beta1
kind: DataSource
metadata:
  name: example-ds
  namespace: <clusterdeployment-namespace>
spec:
  type: postgresql
  endpoints:
  - <pg-example-lb-service-address>:5432
  auth:
    username:
      namespace: default
      name: auth-secret
      key: username
    password:
      namespace: default
      name: auth-secret
      key: password
  certificateAuthority:
    namespace: default
    name: postgres-ca
    key: ca.crt

Create the object on the management cluster:

kubectl --kubeconfig <mgmt-kubeconfig> -n <clusterdeployment-namespace> create -f <datasource-cr>.yaml

More DataSource examples and related information are available in the k0rdent documentation.

6. Create a ClusterDeployment with HCP#

On the management cluster, create a ClusterDeployment representing the child cluster and reference the DataSource object. The DataSource must be in the same namespace as the ClusterDeployment.

Example with the referenced DataSource:

apiVersion: k0rdent.mirantis.com/v1beta1
kind: ClusterDeployment
metadata:
  name: <hcp-deployment>
  namespace: <clusterdeployment-namespace>
spec:
  template: <hosted-template-name>
  credential: <credentials-name>
  dataSource: example-ds
  config: {} # to be filled

Apply and then monitor the ClusterDeployment until it is in the Ready state.

7. Create the Postgres backup#

For demonstration purposes, create a dummy Pod on the HCP cluster:

cat <<EOF | kubectl --kubeconfig <hcp-deployment-kubeconfig> create -f -
apiVersion: v1
kind: Pod
metadata:
  name: test-pod-foo
spec:
  containers:
    - name: demo
      image: ghcr.io/containerd/busybox:1.36
      command: ["sh", "-c", "sleep 3600"]
EOF

Now create a new backup manually (we will use it for the restore later). Enter the master pod:

PGMASTER=$(kubectl --kubeconfig <regional-kubeconfig> -n default get pods -l application=spilo,cluster-name=pg-example,spilo-role=master -o jsonpath='{.items[0].metadata.name}')

kubectl --kubeconfig <regional-kubeconfig> exec -it ${PGMASTER} -n default -- sh

In the pod's shell, make a new backup:

su - postgres
envdir "/run/etc/wal-e.d/env" /scripts/postgres_backup.sh "/home/postgres/pgdata/pgroot/data"

Wait until the process finishes and verify the backup is in place:

envdir "/run/etc/wal-e.d/env" wal-e backup-list

Delete the dummy pod from the HCP:

kubectl --kubeconfig <hcp-deployment-kubeconfig> delete po test-pod-foo

Restoring from backup#

To restore your HCP external database (and thus your child clusters), follow these steps.

This partially repeats the previous steps, as restoration essentially involves creating new objects and then populating them from the backup. We state each step explicitly to simplify the process, but leave out the details already covered.

8. Restore Postgres from the backup#

Start by restoring Postgres.

  1. On the regional cluster, create a new LoadBalancer Service that will route to the restored Postgres:

    cat <<EOF | kubectl --kubeconfig <regional-kubeconfig> create -f -
    apiVersion: v1
    kind: Service
    metadata:
      name: pg-restore-lb
      namespace: default
    spec:
      type: LoadBalancer
      ports:
        - port: 5432
          targetPort: 5432
          protocol: TCP
          name: postgres
      selector:
        application: spilo
        spilo-role: master
        cluster-name: pg-restore
    EOF
    

    Retrieve the external IP of the Service created:

    kubectl --kubeconfig <regional-kubeconfig> -n default get svc pg-restore-lb -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
    
  2. Repeat the certificate steps, creating a new server certificate for the new address and reusing the already created CA and server key:

    # server_restore.cfg
    [ req ]
    default_bits       = 2048
    prompt             = no
    default_md         = sha256
    distinguished_name = dn
    req_extensions     = req_ext
    
    [ dn ]
    CN = postgres
    
    [ req_ext ]
    subjectAltName = @alt_names
    
    [ alt_names ]
    IP.1 = <pg-restore-lb-service-address>
    DNS.1 = postgres
    
    # create CSR (server_restore.cfg must contain proper SANs / req_ext)
    openssl req -new -key tls.key -out tls_restore.csr -config server_restore.cfg
    
    # sign CSR with CA
    openssl x509 -req -in tls_restore.csr -CA ca.crt -CAkey ca.key -CAcreateserial -out tls_restore.crt -days 365 -extensions req_ext -extfile server_restore.cfg
    
  3. Create a TLS Secret with the new certificate so new Postgres pods can use it:

    kubectl --kubeconfig <regional-kubeconfig> -n default create secret generic pg-tls-restore \
      --from-file=tls.crt=tls_restore.crt \
      --from-file=tls.key=tls.key \
      --from-file=ca.crt=ca.crt
    
  4. Create the postgresql object, cloning the original object.

    For more details, please follow the postgres-operator documentation instructions.

    Retrieve the original object's UID:

    kubectl --kubeconfig <regional-kubeconfig> -n default get postgresql pg-example -o jsonpath='{.metadata.uid}'
    

    Create a YAML file with the postgresql object that clones the original Postgres cluster. Reference the Secret with the AWS credentials:

    apiVersion: acid.zalan.do/v1
    kind: postgresql
    metadata:
      name: pg-restore
      namespace: default
    spec:
      allowedSourceRanges:
      - 0.0.0.0/0
      clone:
        cluster: pg-example # cluster to clone
        s3_wal_path: s3://<s3-bucket-name>
        timestamp: <timestamp-later-than-the-latest-manual-backup> # format: 2006-01-02T15:04:05+07:00
        uid: <uid-retrieved> # uid of the original postgresql object
      enableMasterLoadBalancer: false
      env:
      - name: AWS_ACCESS_KEY_ID
        valueFrom:
          secretKeyRef:
            name: aws-creds
            key: AWS_ACCESS_KEY_ID
      - name: AWS_SECRET_ACCESS_KEY
        valueFrom:
          secretKeyRef:
            name: aws-creds
            key: AWS_SECRET_ACCESS_KEY
      - name: AWS_REGION
        value: <aws-region>
      - name: WALE_S3_PREFIX
        value: s3://<s3-bucket-name>
      - name: WALG_S3_PREFIX
        value: s3://<s3-bucket-name>
      - name: CLONE_WALE_S3_PREFIX
        value: s3://<s3-bucket-name>
      - name: CLONE_WALG_S3_PREFIX
        value: s3://<s3-bucket-name>
      - name: CLONE_AWS_ACCESS_KEY_ID
        valueFrom:
          secretKeyRef:
            name: aws-creds
            key: AWS_ACCESS_KEY_ID
      - name: CLONE_AWS_SECRET_ACCESS_KEY
        valueFrom:
          secretKeyRef:
            name: aws-creds
            key: AWS_SECRET_ACCESS_KEY
      - name: CLONE_AWS_REGION
        value: <aws-region>
      numberOfInstances: 2
      postgresql:
        version: "17"
      spiloFSGroup: 103
      teamId: acid
      tls:
        caFile: ca.crt
        secretName: pg-tls-restore
      volume:
        size: 10Gi
    

    Create the object:

    kubectl --kubeconfig <regional-kubeconfig> -n default create -f <postgres-restore-cr>.yaml
    
  5. Wait for the DB to be restored and available.

    Check the logs of the master pod:

    PGMASTER=$(kubectl --kubeconfig <regional-kubeconfig> -n default get pods -l application=spilo,cluster-name=pg-restore,spilo-role=master -o jsonpath='{.items[0].metadata.name}')
    
    kubectl --kubeconfig <regional-kubeconfig> logs -n default ${PGMASTER}
    

    Check that the backup is in place:

    kubectl --kubeconfig <regional-kubeconfig> exec -it ${PGMASTER} -- envdir "/run/etc/wal-e.d/env" wal-e backup-list
    

    Fetch the default user's (postgres) password:

    REG_SECRET_NAME="postgres.pg-restore.credentials.postgresql.acid.zalan.do"
    kubectl --kubeconfig <regional-kubeconfig> -n default get secret ${REG_SECRET_NAME} -o go-template='{{.data.password|base64decode}}'
    

    Connect to the DB from the master pod using the acquired password and inspect the DBs:

    kubectl --kubeconfig <regional-kubeconfig> exec -it ${PGMASTER} -n default -- psql -U postgres -h <pg-restore-lb-service-address> -c '\l+'
    

    You should see a database whose name contains the ClusterDeployment object's name.
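
For reference, a timestamp in the format expected by the clone stanza above (e.g. 2006-01-02T15:04:05+07:00) can be generated with:

```shell
# Produce the current time in the expected RFC 3339 format; pick a moment
# later than the latest manual backup when filling in the clone timestamp.
date +%Y-%m-%dT%H:%M:%S%z | sed -E 's/([0-9]{2})$/:\1/'
```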

9. Prepare and create a new DataSource object#

The credentials have not changed, so there is no need to create any new Secret objects on the management cluster.

Retrieve the external IP of the Service created during the restoration:

kubectl --kubeconfig <regional-kubeconfig> -n default get svc pg-restore-lb -o jsonpath='{.status.loadBalancer.ingress[0].ip}'

Create a YAML file with the DataSource object that contains host, port and CA references.

apiVersion: k0rdent.mirantis.com/v1beta1
kind: DataSource
metadata:
  name: restore-ds
  namespace: <clusterdeployment-namespace>
spec:
  type: postgresql
  endpoints:
  - <pg-restore-lb-service-address>:5432
  auth:
    username:
      namespace: default
      name: auth-secret
      key: username
    password:
      namespace: default
      name: auth-secret
      key: password
  certificateAuthority:
    namespace: default
    name: postgres-ca
    key: ca.crt

Create the object on the management cluster:

kubectl --kubeconfig <mgmt-kubeconfig> -n <clusterdeployment-namespace> create -f <datasource-restore-cr>.yaml

10. Modify the ClusterDeployment object#

Modify the reference to the DataSource object in the ClusterDeployment object created earlier:

kubectl --kubeconfig <mgmt-kubeconfig> -n <clusterdeployment-namespace> patch cld <hcp-deployment> --type=merge -p '{"spec": {"dataSource": "restore-ds"}}'

Wait for the HCP pods to be ready.

The dummy Pod included in the backup should now be present on the restored HCP cluster:

kubectl --kubeconfig <hcp-deployment-kubeconfig> get po test-pod-foo

Known Issues#

v1.2.x#

On Mirantis k0rdent Enterprise version 1.2.x there is a known limitation: the procedure does not work as expected on its own, because it does not update the required Secrets containing the DB DSN.

To solve this problem, follow these steps after completing the procedure above:

  1. Find the kine Secret with the name <clusterdeployment-name>-kine (see Integration with the Data Source) on the management cluster.
  2. Extract the DSN:

    kubectl --kubeconfig <mgmt-kubeconfig> get secret -n <clusterdeployment-namespace> <clusterdeployment-name>-kine -o go-template='{{.data.K0SMOTRON_KINE_DATASOURCE_URL|base64decode}}'
    

    Example output:

    kubectl get secret -n kcm-system openstack-hosted-kine -o go-template='{{.data.K0SMOTRON_KINE_DATASOURCE_URL|base64decode}}'
    postgres://kine_kcm_system_openstack_hosted_9gpg6:qsc99cn6cfznqxbcckd8sck8d5544sdc@172.19.115.108:5432/kcm_system_openstack_hosted_9gpg6?sslmode=verify-full&sslrootcert=%2Fvar%2Flib%2Fk0s%2Fkine-ca%2Fca.crt
    
  3. Copy the DSN, modify only the host, and encode it back to base64:

    For example:

    echo -n 'postgres://kine_kcm_system_openstack_hosted_9gpg6:qsc99cn6cfznqxbcckd8sck8d5544sdc@<new host>:5432/kcm_system_openstack_hosted_9gpg6?sslmode=verify-full&sslrootcert=%2Fvar%2Flib%2Fk0s%2Fkine-ca%2Fca.crt' | base64
    
  4. Modify the kine Secret's key K0SMOTRON_KINE_DATASOURCE_URL value with the updated DSN.

  5. On the regional cluster reconcile the k0smotroncontrolplane object by restarting the Deployment:

    kubectl --kubeconfig <regional-kubeconfig> -n <system-namespace> rollout restart deploy k0smotron-controller-manager-control-plane
    
  6. On the regional cluster trigger Mirantis k0rdent Enterprise to recreate the HCP cluster's StatefulSet by deleting it (the restored DB instance must be running at this point):

    kubectl --kubeconfig <regional-kubeconfig> -n <clusterdeployment-namespace> delete sts kcm-<clusterdeployment-name>
    
  7. On the regional cluster restart the CAPI Deployment:

    kubectl --kubeconfig <regional-kubeconfig> -n <system-namespace> rollout restart deploy capi-controller-manager
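
The host rewrite in step 3 can also be scripted. A minimal sketch; the DSN and the new host below are illustrative values, substitute the DSN extracted from the kine Secret and the restored instance's address:

```shell
# Rewrite only the host portion of the DSN, then re-encode it to base64.
DSN='postgres://kine_user:secret@172.19.115.108:5432/kine_db?sslmode=verify-full'
NEW_HOST='172.19.115.200'
NEW_DSN=$(printf '%s' "$DSN" | sed -E "s|@[^:/@]+:|@${NEW_HOST}:|")
printf '%s' "$NEW_DSN" | base64
```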