# Hosted Control Plane Backup Procedure
This document provides a reproducible, step-by-step walkthrough for
backing up and restoring Hosted Control Planes (HCP) via the k0rdent DataSource API.
> **Note:** This document describes the procedure using the
> Zalando postgres-operator as an example.
## Description of the setup
For the purpose of this guide, we will use the following clusters:
- A management Mirantis k0rdent Enterprise cluster.
- A regional cluster where HCP and Postgres will be deployed.
## Backup scope
In a cluster where Kine uses PostgreSQL as the backing datastore (replacing
etcd), database backups taken via the postgres-operator exclusively capture
the cluster control plane state, not the application data.
**What is backed up:** All Kubernetes API objects and metadata. This includes workload definitions (Deployments, Pods), cluster configuration (ConfigMaps, Secrets, RBAC), and storage definitions (PV/PVC manifests).

**What is NOT backed up:** The actual physical data residing on your storage backend. Any files, databases, or state stored inside your Persistent Volumes are entirely excluded.
> **Note:** To fully protect your workloads, a dedicated volume backup solution (such as Velero or CSI VolumeSnapshots) must be implemented alongside these control plane backups.
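To illustrate the note above, a companion volume backup could be declared as a Velero `Schedule`. This is only a sketch; the resource name, namespace, cron expression, and TTL are assumptions, and the real setup depends on your Velero and CSI installation:

```yaml
# Hypothetical Velero Schedule: snapshots PV data daily, which this
# control plane backup procedure does NOT cover on its own.
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: hcp-volumes-daily   # assumed name
  namespace: velero         # assumed Velero install namespace
spec:
  schedule: "0 2 * * *"     # daily at 02:00 (example)
  template:
    includedNamespaces:
      - default
    snapshotVolumes: true   # back up PV data via CSI/provider snapshots
    ttl: 720h               # keep backups for 30 days (example)
```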
## High-level steps
- Install Postgres and create a `postgresql` object on the regional cluster.
- On the management cluster, create a `DataSource` object pointing to the host routing to the Postgres instance.
- On the management cluster, create a `ClusterDeployment` referring to the `DataSource`.
- Make a backup of the database.
- Create a `postgresql` object restoring the database on the regional cluster.
- On the management cluster, create a new `DataSource` object pointing to the host routing to the restored Postgres instance.
- Change the `ClusterDeployment` object to refer to the new `DataSource`.
> **Note:** For example purposes, a dedicated LoadBalancer `Service` with an
> IP address accessible from the management cluster is created for
> each `postgresql` object.
Here are the steps in detail.
## Backing up
Follow these steps to back up your HCP child clusters.
### 1. Create a LoadBalancer Service
Create a Service through which the management cluster will connect to Postgres:
```shell
cat <<EOF | kubectl --kubeconfig <regional-kubeconfig> create -f -
apiVersion: v1
kind: Service
metadata:
  name: pg-example-lb
  namespace: default
spec:
  type: LoadBalancer
  ports:
    - port: 5432
      targetPort: 5432
      protocol: TCP
      name: postgres
  selector:
    application: spilo
    spilo-role: master
    cluster-name: pg-example
EOF
```
> **Note:** If the available infrastructure does not provide a LoadBalancer IP, use a NodePort Service or DNS with an external load balancer instead.
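As a sketch of the NodePort fallback, the same selector can be exposed on a fixed node port (the `30432` value is an arbitrary example from the default 30000-32767 NodePort range); the management cluster would then connect to `<node-ip>:30432` instead of a LoadBalancer address:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: pg-example-np    # assumed name for the fallback Service
  namespace: default
spec:
  type: NodePort
  ports:
    - port: 5432
      targetPort: 5432
      nodePort: 30432    # example port from the default NodePort range
      protocol: TCP
      name: postgres
  selector:
    application: spilo
    spilo-role: master
    cluster-name: pg-example
```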
### 2. Prepare certificates
Retrieve the external IP of the Service created earlier:
```shell
kubectl --kubeconfig <regional-kubeconfig> -n default get svc pg-example-lb -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
```
Generate a root CA, then a server (TLS) certificate signed by that CA. Use the Service
host in the `subjectAltName`; see the `req_ext` section of `server.cfg`.
```
# server.cfg
[ req ]
default_bits = 2048
prompt = no
default_md = sha256
distinguished_name = dn
req_extensions = req_ext

[ dn ]
CN = postgres

[ req_ext ]
subjectAltName = @alt_names

[ alt_names ]
IP.1 = <pg-example-lb-service-address>
DNS.1 = postgres
```
```shell
# create CA
openssl genrsa -out ca.key 2048
openssl req -x509 -new -nodes -key ca.key -days 3650 -out ca.crt -subj "/CN=postgres"

# create server key & CSR (server.cfg must contain proper SANs / req_ext)
openssl genrsa -out tls.key 2048
openssl req -new -key tls.key -out tls.csr -config server.cfg

# sign CSR with CA
openssl x509 -req -in tls.csr -CA ca.crt -CAkey ca.key -CAcreateserial -out tls.crt -days 365 -extensions req_ext -extfile server.cfg
```
### 3. Create Postgres TLS secrets on the regional cluster
Create a TLS Secret so Postgres pods can use it:
```shell
kubectl --kubeconfig <regional-kubeconfig> -n default create secret generic pg-tls \
  --from-file=tls.crt=tls.crt \
  --from-file=tls.key=tls.key \
  --from-file=ca.crt=ca.crt
```
### 4. Install postgres-operator and the Postgres instance
Add the chart repo and install the operator:
```shell
helm repo add postgres-operator-charts https://opensource.zalando.com/postgres-operator/charts/postgres-operator
helm repo update
helm --kubeconfig <regional-kubeconfig> install postgres-operator postgres-operator-charts/postgres-operator \
  --namespace default
```
> **Note:** For simplicity, we will use AWS S3 to store backups.
> Follow the postgres-operator documentation
> for other options.
Create a Secret with AWS S3 credentials for storing backups:
```shell
kubectl --kubeconfig <regional-kubeconfig> -n default create secret generic aws-creds \
  --from-file=AWS_ACCESS_KEY_ID=<aws_access_key_id> \
  --from-file=AWS_SECRET_ACCESS_KEY=<aws_secret_access_key>
```
Create a YAML file with the postgresql object for the Postgres cluster:
```yaml
apiVersion: acid.zalan.do/v1
kind: postgresql
metadata:
  name: pg-example
  namespace: default
spec:
  allowedSourceRanges:
    - 0.0.0.0/0
  databases:
    kine: k0smotron # dbname:owner
  enableMasterLoadBalancer: false
  env:
    - name: AWS_ACCESS_KEY_ID
      valueFrom:
        secretKeyRef:
          name: aws-creds
          key: AWS_ACCESS_KEY_ID
    - name: AWS_SECRET_ACCESS_KEY
      valueFrom:
        secretKeyRef:
          name: aws-creds
          key: AWS_SECRET_ACCESS_KEY
    - name: AWS_REGION
      value: <aws-region>
    - name: WALE_S3_PREFIX
      value: s3://<s3-bucket-name>
    - name: WALG_S3_PREFIX
      value: s3://<s3-bucket-name>
  numberOfInstances: 2
  postgresql:
    version: "17"
  spiloFSGroup: 103
  teamId: acid
  tls:
    caFile: ca.crt
    secretName: pg-tls
  users:
    k0smotron: # database owner
      - superuser
      - createdb
  volume:
    size: 10Gi
```
Create the object:
```shell
kubectl --kubeconfig <regional-kubeconfig> -n default create -f <postgres-cr>.yaml
```
Wait for the pods to be ready and get the master pod name:
```shell
kubectl --kubeconfig <regional-kubeconfig> -n default get pods -l application=spilo,cluster-name=pg-example,spilo-role=master -o jsonpath='{.items[0].metadata.name}'
```
### 5. Prepare and create a DataSource object
Create a Secret on the management cluster containing the CA
(generated in the previous steps)
and another Secret with the DB credentials (username/password).
```shell
kubectl --kubeconfig <mgmt-kubeconfig> -n default create secret generic postgres-ca --from-file=ca.crt=ca.crt

REG_SECRET_NAME="k0smotron.pg-example.credentials.postgresql.acid.zalan.do"
kubectl --kubeconfig <mgmt-kubeconfig> -n default create secret generic auth-secret \
  --from-literal=password=$(kubectl --kubeconfig <regional-kubeconfig> -n default get secret ${REG_SECRET_NAME} -o go-template='{{.data.password|base64decode}}') \
  --from-literal=username=$(kubectl --kubeconfig <regional-kubeconfig> -n default get secret ${REG_SECRET_NAME} -o go-template='{{.data.username|base64decode}}')
```
> **Note:** Ensure Secret names and namespaces are correct.
> The example Secret name pattern comes from the postgres-operator.
Retrieve the external IP of the Service created earlier:
```shell
kubectl --kubeconfig <regional-kubeconfig> -n default get svc pg-example-lb -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
```
Create a YAML file with the DataSource object that contains host, port and CA references.
```yaml
apiVersion: k0rdent.mirantis.com/v1beta1
kind: DataSource
metadata:
  name: example-ds
  namespace: <clusterdeployment-namespace>
spec:
  type: postgresql
  endpoints:
    - <pg-example-lb-service-address>:5432
  auth:
    username:
      namespace: default
      name: auth-secret
      key: username
    password:
      namespace: default
      name: auth-secret
      key: password
  certificateAuthority:
    namespace: default
    name: postgres-ca
    key: ca.crt
```
Create the object on the management cluster:
```shell
kubectl --kubeconfig <mgmt-kubeconfig> -n <clusterdeployment-namespace> create -f <datasource-cr>.yaml
```
### 6. Create a ClusterDeployment with HCP
On the management cluster, create a ClusterDeployment representing the child cluster
and reference the DataSource object.
The latter must be in the same namespace as the former.

Example with the referenced DataSource:
```yaml
apiVersion: k0rdent.mirantis.com/v1beta1
kind: ClusterDeployment
metadata:
  name: <hcp-deployment>
  namespace: <clusterdeployment-namespace>
spec:
  template: <hosted-template-name>
  credential: <credentials-name>
  dataSource: example-ds
  config: {} # to be filled
```
Apply the manifest, then monitor the ClusterDeployment until it reaches the Ready state.
### 7. Create the Postgres backup
For demonstration purposes, create a dummy Pod on the HCP cluster:
```shell
cat <<EOF | kubectl --kubeconfig <hcp-deployment-kubeconfig> create -f -
apiVersion: v1
kind: Pod
metadata:
  name: test-pod-foo
spec:
  containers:
    - name: demo
      image: ghcr.io/containerd/busybox:1.36
      command: ["sh", "-c", "sleep 3600"]
EOF
```
Now create a new backup manually; we will use it to restore later. Enter the master pod:
```shell
PGMASTER=$(kubectl --kubeconfig <regional-kubeconfig> -n default get pods -l application=spilo,cluster-name=pg-example,spilo-role=master -o jsonpath='{.items[0].metadata.name}')
kubectl --kubeconfig <regional-kubeconfig> exec -it ${PGMASTER} -n default -- sh
```
In the pod's shell, make a new backup:
```shell
su - postgres
envdir "/run/etc/wal-e.d/env" /scripts/postgres_backup.sh "/home/postgres/pgdata/pgroot/data"
```
Wait until the process finishes and verify the backup is in place:
```shell
envdir "/run/etc/wal-e.d/env" wal-e backup-list
```
Delete the dummy pod from the HCP:
```shell
kubectl --kubeconfig <hcp-deployment-kubeconfig> delete po test-pod-foo
```
## Restoring from backup
To restore your HCP external database (and thus your child clusters), follow these steps.

This partially repeats the previous steps: restoration essentially involves creating empty objects and then populating them from the backup. Each step is stated explicitly to simplify the process, but the details are left out.
### 8. Restore Postgres from the backup
Start by restoring Postgres.
1. On the regional cluster, create a new LoadBalancer `Service` that will route to the restored Postgres:

   ```shell
   cat <<EOF | kubectl --kubeconfig <regional-kubeconfig> create -f -
   apiVersion: v1
   kind: Service
   metadata:
     name: pg-restore-lb
     namespace: default
   spec:
     type: LoadBalancer
     ports:
       - port: 5432
         targetPort: 5432
         protocol: TCP
         name: postgres
     selector:
       application: spilo
       spilo-role: master
       cluster-name: pg-restore
   EOF
   ```

   Retrieve the external IP of the `Service` created:

   ```shell
   kubectl --kubeconfig <regional-kubeconfig> -n default get svc pg-restore-lb -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
   ```
2. Repeat the steps creating new server certificates for the new address, reusing the already created CA and server key:

   ```
   # server_restore.cfg
   [ req ]
   default_bits = 2048
   prompt = no
   default_md = sha256
   distinguished_name = dn
   req_extensions = req_ext

   [ dn ]
   CN = postgres

   [ req_ext ]
   subjectAltName = @alt_names

   [ alt_names ]
   IP.1 = <pg-restore-lb-service-address>
   DNS.1 = postgres
   ```

   ```shell
   # create CSR (server_restore.cfg must contain proper SANs / req_ext)
   openssl req -new -key tls.key -out tls_restore.csr -config server_restore.cfg

   # sign CSR with CA
   openssl x509 -req -in tls_restore.csr -CA ca.crt -CAkey ca.key -CAcreateserial -out tls_restore.crt -days 365 -extensions req_ext -extfile server_restore.cfg
   ```
3. Create a TLS `Secret` with the new certificate so the new Postgres pods can use it:

   ```shell
   kubectl --kubeconfig <regional-kubeconfig> -n default create secret generic pg-tls-restore \
     --from-file=tls.crt=tls_restore.crt \
     --from-file=tls.key=tls.key \
     --from-file=ca.crt=ca.crt
   ```
4. Create the `postgresql` object, cloning the original object.

   For more details, please follow the postgres-operator documentation instructions.

   Retrieve the original object's UID:

   ```shell
   kubectl --kubeconfig <regional-kubeconfig> -n default get postgresql pg-example -o jsonpath='{.metadata.uid}'
   ```

   Create a YAML file with the `postgresql` object for the Postgres cluster to be cloned. Reference the `Secret` with the AWS credentials:

   ```yaml
   apiVersion: acid.zalan.do/v1
   kind: postgresql
   metadata:
     name: pg-restore
     namespace: default
   spec:
     allowedSourceRanges:
       - 0.0.0.0/0
     clone:
       cluster: pg-example # cluster to clone
       s3_wal_path: s3://<s3-bucket-name>
       timestamp: <timestamp-later-than-the-latest-manual-backup> # format: 2006-01-02T15:04:05+07:00
       uid: <uid-retrieved> # uid of the original postgresql object
     enableMasterLoadBalancer: false
     env:
       - name: AWS_ACCESS_KEY_ID
         valueFrom:
           secretKeyRef:
             name: aws-creds
             key: AWS_ACCESS_KEY_ID
       - name: AWS_SECRET_ACCESS_KEY
         valueFrom:
           secretKeyRef:
             name: aws-creds
             key: AWS_SECRET_ACCESS_KEY
       - name: AWS_REGION
         value: <aws-region>
       - name: WALE_S3_PREFIX
         value: s3://<s3-bucket-name>
       - name: WALG_S3_PREFIX
         value: s3://<s3-bucket-name>
       - name: CLONE_WALE_S3_PREFIX
         value: s3://<s3-bucket-name>
       - name: CLONE_WALG_S3_PREFIX
         value: s3://<s3-bucket-name>
       - name: CLONE_AWS_ACCESS_KEY_ID
         valueFrom:
           secretKeyRef:
             name: aws-creds
             key: AWS_ACCESS_KEY_ID
       - name: CLONE_AWS_SECRET_ACCESS_KEY
         valueFrom:
           secretKeyRef:
             name: aws-creds
             key: AWS_SECRET_ACCESS_KEY
       - name: CLONE_AWS_REGION
         value: <aws-region>
     numberOfInstances: 2
     postgresql:
       version: "17"
     spiloFSGroup: 103
     teamId: acid
     tls:
       caFile: ca.crt
       secretName: pg-tls-restore
     volume:
       size: 10Gi
   ```

   Create the object:

   ```shell
   kubectl --kubeconfig <regional-kubeconfig> -n default create -f <postgres-restore-cr>.yaml
   ```
5. Wait for the DB to be restored and available.

   Check the logs of the master pod:

   ```shell
   PGMASTER=$(kubectl --kubeconfig <regional-kubeconfig> -n default get pods -l application=spilo,cluster-name=pg-restore,spilo-role=master -o jsonpath='{.items[0].metadata.name}')
   kubectl --kubeconfig <regional-kubeconfig> logs -n default ${PGMASTER}
   ```

   Check that the backup is in place:

   ```shell
   kubectl --kubeconfig <regional-kubeconfig> exec -it ${PGMASTER} -- envdir "/run/etc/wal-e.d/env" wal-e backup-list
   ```

   Fetch the default user's (`postgres`) password:

   ```shell
   REG_SECRET_NAME="postgres.pg-restore.credentials.postgresql.acid.zalan.do"
   kubectl --kubeconfig <regional-kubeconfig> -n default get secret ${REG_SECRET_NAME} -o go-template='{{.data.password|base64decode}}'
   ```

   Connect to the DB from the master pod using the acquired password and inspect the databases:

   ```shell
   kubectl --kubeconfig <regional-kubeconfig> exec -it ${PGMASTER} -n default -- psql -U postgres -h <pg-restore-lb-service-address> -c '\l+'
   ```

   You should see a database whose name contains the `ClusterDeployment` object's name.
### 9. Prepare and create a new DataSource object
The credentials have not changed, so there is no need to create any
new Secret objects on the management cluster.
Retrieve the external IP of the Service created during the restoration:
```shell
kubectl --kubeconfig <regional-kubeconfig> -n default get svc pg-restore-lb -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
```
Create a YAML file with the DataSource object that contains host, port and CA references.
```yaml
apiVersion: k0rdent.mirantis.com/v1beta1
kind: DataSource
metadata:
  name: restore-ds
  namespace: <clusterdeployment-namespace>
spec:
  type: postgresql
  endpoints:
    - <pg-restore-lb-service-address>:5432
  auth:
    username:
      namespace: default
      name: auth-secret
      key: username
    password:
      namespace: default
      name: auth-secret
      key: password
  certificateAuthority:
    namespace: default
    name: postgres-ca
    key: ca.crt
```
Create the object on the management cluster:
```shell
kubectl --kubeconfig <mgmt-kubeconfig> -n <clusterdeployment-namespace> create -f <datasource-restore-cr>.yaml
```
### 10. Modify the ClusterDeployment object
Modify the reference to the DataSource object in the
ClusterDeployment object created earlier:
```shell
kubectl --kubeconfig <mgmt-kubeconfig> -n <clusterdeployment-namespace> patch cld <hcp-deployment> --type=merge -p '{"spec": {"dataSource": "restore-ds"}}'
```
Wait for the HCP pods to be ready.
The dummy pod included in the backup should now be present on the restored HCP cluster:
```shell
kubectl --kubeconfig <hcp-deployment-kubeconfig> get po test-pod-foo
```
## Known Issues
### v1.2.x
Mirantis k0rdent Enterprise version 1.2.x has a known limitation: the procedure above does not work as expected because it does not update the required Secrets containing the DB DSN.
To solve this problem, follow these steps after completing the procedure above:
1. Find the kine `Secret` with the name `<clusterdeployment-name>-kine` (see Integration with the Data Source) on the management cluster.

2. Extract the DSN:

   ```shell
   kubectl --kubeconfig <mgmt-kubeconfig> get secret -n <clusterdeployment-namespace> <clusterdeployment-name>-kine -o go-template='{{.data.K0SMOTRON_KINE_DATASOURCE_URL|base64decode}}'
   ```

   Example output:

   ```shell
   kubectl get secret -n kcm-system openstack-hosted-kine -o go-template='{{.data.K0SMOTRON_KINE_DATASOURCE_URL|base64decode}}'
   postgres://kine_kcm_system_openstack_hosted_9gpg6:qsc99cn6cfznqxbcckd8sck8d5544sdc@172.19.115.108:5432/kcm_system_openstack_hosted_9gpg6?sslmode=verify-full&sslrootcert=%2Fvar%2Flib%2Fk0s%2Fkine-ca%2Fca.crt
   ```

3. Copy the DSN, modify only the host, and encode it back to base64. For example:

   ```shell
   echo -n 'postgres://kine_kcm_system_openstack_hosted_9gpg6:qsc99cn6cfznqxbcckd8sck8d5544sdc@<new host>:5432/kcm_system_openstack_hosted_9gpg6?sslmode=verify-full&sslrootcert=%2Fvar%2Flib%2Fk0s%2Fkine-ca%2Fca.crt' | base64
   ```

4. Replace the value of the kine `Secret`'s `K0SMOTRON_KINE_DATASOURCE_URL` key with the updated DSN.

5. On the regional cluster, reconcile the `k0smotroncontrolplane` object by restarting the `Deployment`:

   ```shell
   kubectl --kubeconfig <regional-kubeconfig> -n <system-namespace> rollout restart deploy k0smotron-controller-manager-control-plane
   ```

6. On the regional cluster, trigger Mirantis k0rdent Enterprise to recreate the HCP cluster's `StatefulSet` by deleting it (the restored DB instance must be running at this point):

   ```shell
   kubectl --kubeconfig <regional-kubeconfig> -n <clusterdeployment-namespace> delete sts kcm-<clusterdeployment-name>
   ```

7. On the regional cluster, restart the CAPI `Deployment`:

   ```shell
   kubectl --kubeconfig <regional-kubeconfig> -n <system-namespace> rollout restart deploy capi-controller-manager
   ```
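The host substitution in step 3 can be scripted rather than edited by hand. A minimal sketch, assuming the DSN follows the usual `user:password@host:port/db` shape; the DSN and new host below are placeholders, not values from a real cluster:

```shell
# Placeholder DSN; in practice use the value extracted from the kine Secret.
OLD_DSN='postgres://kine_user:secret@172.19.115.108:5432/kine_db?sslmode=verify-full'
NEW_HOST='192.0.2.10'

# Swap only the host part (the text between '@' and the ':<port>').
NEW_DSN=$(printf '%s' "$OLD_DSN" | sed -E "s|@[^:/]+:|@${NEW_HOST}:|")

# Base64-encode the result for patching back into the Secret's data field.
printf '%s' "$NEW_DSN" | base64
```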