adfinis/kubernetes-etcd-backup

Kubernetes etcd backup CronJob

This CronJob creates a Pod which runs /backup.sh on a Kubernetes cluster to create an etcd backup. After finishing, it copies the files to a configured PV and expires old backups according to its configuration.

The backup script generates a snapshot.db file labeled with the date on which the backup was performed.

Installation

First, create a namespace:

kubectl create namespace etcd-backup

Get the necessary configuration

If you run etcd in your cluster, you can read the etcd configuration and the location of the required certificates from your cluster's etcd pod. The following command gives you the necessary information:

kubectl describe pod -n kube-system etcd-<name of your etcd pod> | less

Get the IP address of the etcd endpoint and put it in the config map. Then get the location of the following certificates:

  • peer-cert-file etcd-peer.crt
  • peer-key-file etcd-peer.key
  • trusted-ca-file etcd-ca.crt

If you run etcd outside of your cluster, you can get the information from the etcd configuration file. The default location is /etc/etcd.env. The certificate information is in the TLS section. You need the ETCD_ADVERTISE_CLIENT_URLS, ETCD_PEER_TRUSTED_CA_FILE, ETCD_PEER_CERT_FILE and ETCD_PEER_KEY_FILE variables. The following example shows the default values:

  • ETCD_ADVERTISE_CLIENT_URLS=https://192.168.122.151:2379
  • ...
  • ETCD_PEER_TRUSTED_CA_FILE=/etc/ssl/etcd/ssl/ca.pem
  • ETCD_PEER_CERT_FILE=/etc/ssl/etcd/ssl/member-node1.pem
  • ETCD_PEER_KEY_FILE=/etc/ssl/etcd/ssl/member-node1-key.pem
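As a rough sketch of how these values could be read from such a file: the helper name `etcd_env_value` is hypothetical, and the file is assumed to contain plain, unquoted `KEY=value` lines as in the example above.

```shell
# Hypothetical helper: print the value of one KEY=value entry from an etcd env file.
# Assumes plain, unquoted KEY=value lines; only the first match is printed.
etcd_env_value() {
    # $1 = path to the env file, $2 = variable name
    sed -n "s/^$2=//p" "$1" | head -n 1
}

# Usage (path is the default location mentioned above):
#   etcd_env_value /etc/etcd.env ETCD_PEER_CERT_FILE
```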

Get the certificates from the Kubernetes host and put them into a secret:

kubectl create secret generic etcd-peer-tls --from-file=tls.crt --from-file=tls.key -n etcd-backup
kubectl create secret generic etcd-server-ca --from-file=ca.crt -n etcd-backup
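The commands above expect files named tls.crt, tls.key and ca.crt in the current directory. A minimal sketch of staging them under those names, assuming you have already copied the peer certificate, key and CA from the host; the helper name `stage_etcd_certs` is hypothetical.

```shell
# Hypothetical helper: copy the etcd peer certificate, key and CA into a
# staging directory under the file names used by the kubectl commands above.
stage_etcd_certs() {
    # $1 = peer cert, $2 = peer key, $3 = CA cert, $4 = staging directory
    mkdir -p "$4" &&
    cp "$1" "$4/tls.crt" &&
    cp "$2" "$4/tls.key" &&
    cp "$3" "$4/ca.crt" &&
    chmod 600 "$4/tls.key"   # keep the private key readable by the owner only
}

# Usage: stage the files, change into the directory, then create the secrets.
#   stage_etcd_certs member-node1.pem member-node1-key.pem ca.pem ./certs && cd ./certs
```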

Add the endpoint IP address to the ConfigMap, without scheme or port:

  ENDPOINT: "192.168.122.151"
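If you start from a client URL such as the ETCD_ADVERTISE_CLIENT_URLS value, the scheme and port have to be removed first. A small sketch of that step, assuming a single URL with an IPv4 address or hostname (not a comma-separated list, not IPv6); the helper name `endpoint_from_url` is hypothetical.

```shell
# Hypothetical helper: strip scheme, path and port from a client URL.
# Assumes a single URL with an IPv4 address or hostname.
endpoint_from_url() {
    url=$1
    host=${url#*://}   # drop the scheme, if present
    host=${host%%/*}   # drop any path
    host=${host%%:*}   # drop the port
    printf '%s\n' "$host"
}

# Usage:
#   endpoint_from_url https://192.168.122.151:2379
```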

Create the backup configuration

Then adjust the storage configuration to your needs in backup-storage.yaml and deploy it. The example uses NFS but you can use any storage class you want:

kubectl create -f backup-storage.yaml

Configure the backup script:

kubectl create -f backup-config.yaml

Then deploy the CronJob:

kubectl create -f backup-cronjob.yaml

Creating a manual backup for testing purposes

To test the backup or create a manual backup, you can run a job:

backupName=$(date "+etcd-backup-manual-%F-%H-%M-%S")
kubectl create job -n etcd-backup --from=cronjob/etcd-backup "${backupName}"

To see if everything works as it should, you can check the logs:

kubectl logs -n etcd-backup -l job-name="${backupName}"

Then check on your storage that the files are there as expected.

Configuration

Configuration can be changed in the ConfigMap backup-config:

kubectl edit -n etcd-backup cm/backup-config

The following options are used:

  • ETCD_BACKUP_S3: Use S3 to store etcd-backup snapshots
  • ETCD_BACKUP_S3_NAME: MinIO client host alias name
  • ETCD_BACKUP_S3_HOST: S3 host endpoint (with scheme)
  • ETCD_BACKUP_S3_BUCKET: S3 bucket name
  • ETCD_BACKUP_S3_ACCESS_KEY: access key to access S3 bucket
  • ETCD_BACKUP_S3_SECRET_KEY: secret key to access S3 bucket
  • ETCD_BACKUP_SUBDIR: Subdirectory on the PVC in which backups are stored. It is created if it does not exist.
  • ETCD_BACKUP_DIRNAME: Directory name for a single backup. This is a format string used by date.
  • ETCD_BACKUP_EXPIRE_TYPE:
    • days: Keep backups newer than ETCD_BACKUP_KEEP_DAYS.
    • count: Keep a fixed number of backups. ETCD_BACKUP_KEEP_COUNT determines how many.
    • never: Do not expire backups, keep all of them.
  • ETCD_BACKUP_KEEP_DAYS: Days to keep the backup. Only used if ETCD_BACKUP_EXPIRE_TYPE is set to days.
  • ETCD_BACKUP_KEEP_COUNT: Number of backups to keep. Only used if ETCD_BACKUP_EXPIRE_TYPE is set to count.
  • ETCD_BACKUP_UMASK: Umask used inside the script to set restrictive permission on written files, as they contain sensitive information.
  • ENDPOINT: The IP address of the etcd endpoint, without scheme or port, e.g. "192.168.39.86".
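To illustrate how the three expire types differ, here is a sketch of one possible retention routine. It is not the repository's backup.sh: the helper names are hypothetical, and backup directory names are assumed to sort chronologically and to contain no whitespace.

```shell
# Illustrative sketch only, not the repository's backup.sh.

# Hypothetical helper: delete all but the newest N backup directories.
# Assumes names sort chronologically and contain no whitespace.
expire_by_count() {
    # $1 = directory holding one sub directory per backup, $2 = number to keep
    ls -1 "$1" | sort -r | tail -n "+$(( $2 + 1 ))" | while read -r name; do
        rm -rf -- "${1:?}/$name"
    done
}

# Hypothetical helper: delete backup directories older than N days.
# find's -mtime +N counts age in whole days, rounded down.
expire_by_days() {
    # $1 = backup directory, $2 = days to keep
    find "$1" -mindepth 1 -maxdepth 1 -type d -mtime "+$2" -exec rm -rf -- {} +
}

# Dispatch on the option values described above.
expire_backups() {
    # $1 = backup directory
    case "$ETCD_BACKUP_EXPIRE_TYPE" in
        days)  expire_by_days  "$1" "$ETCD_BACKUP_KEEP_DAYS" ;;
        count) expire_by_count "$1" "$ETCD_BACKUP_KEEP_COUNT" ;;
        never) : ;;  # keep everything
        *)     echo "unknown expire type: $ETCD_BACKUP_EXPIRE_TYPE" >&2; return 1 ;;
    esac
}
```

With count-based expiry the number of backups stays bounded regardless of the schedule, while day-based expiry keeps however many backups the schedule produces within the window.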

Note that the storage type is exclusive: it is either S3 or PVC. When using S3, the backup script does not manage retention. We suggest using a retention policy on the S3 bucket itself, for example an object expiration configuration as described in the object lifecycle management documentation.

The schedule can be changed directly in the CronJob, via spec.schedule:

kubectl edit -n etcd-backup cronjob/etcd-backup

The default is 0 0 * * *, which means the CronJob runs once a day at midnight.
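For a non-interactive change, the schedule can also be set with a merge patch instead of an editor. A small sketch, in which the helper name `schedule_patch` is hypothetical and the schedule is assumed to contain no double quotes or backslashes:

```shell
# Hypothetical helper: build a JSON merge patch that sets spec.schedule.
# Assumes the schedule contains no double quotes or backslashes.
schedule_patch() {
    printf '{"spec":{"schedule":"%s"}}\n' "$1"
}

# Usage against a cluster (resource names taken from the commands above),
# here switching to a run every six hours:
#   kubectl patch -n etcd-backup cronjob etcd-backup --type merge -p "$(schedule_patch '0 */6 * * *')"
```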

Monitoring

To get alerts when backups fail or are not scheduled, you can deploy this PrometheusRule:

kubectl create -n etcd-backup -f etcd-backup-cronjob-monitor.PrometheusRule.yaml

Helm chart

To easily deploy the solution, a Helm chart is available in the upstream Adfinis charts repository.

Installation

First, create the namespace:

kubectl create namespace etcd-backup

Then create the secrets as described above. Finally, update the values.yaml file according to your needs and install the chart:

helm repo add adfinis https://charts.adfinis.com
helm install etcd-backup adfinis/kubernetes-etcd-backup

Development

Release Management

The CI/CD setup uses semantic commit messages following the conventional commits standard. There is a GitHub Action in .github/workflows/semantic-release.yaml that uses go-semantic-release to create new releases.

The commit message should be structured as follows:

<type>[optional scope]: <description>

[optional body]

[optional footer(s)]

The commit contains the following structural elements, to communicate intent to the consumers of your library:

  1. fix: a commit of the type fix gets released as a PATCH version bump
  2. feat: a commit of the type feat gets released as a MINOR version bump
  3. BREAKING CHANGE: a commit that has a footer BREAKING CHANGE: gets released as a MAJOR version bump
  4. types other than fix: and feat: are allowed and don't trigger a release
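The rules above can be sketched as a small classifier. This is only an illustration of the mapping, not how the release tooling is implemented; the helper name `release_bump` is hypothetical and only the four rules listed above are covered.

```shell
# Hypothetical helper: map a commit message to the version bump described above.
# Prints major, minor, patch or none.
release_bump() {
    msg=$1
    subject=$(printf '%s\n' "$msg" | head -n 1)
    # A BREAKING CHANGE: footer takes precedence over the commit type.
    if printf '%s\n' "$msg" | grep -q '^BREAKING CHANGE: '; then
        echo major
    else
        case "$subject" in
            feat:*|"feat("*"):"*) echo minor ;;
            fix:*|"fix("*"):"*)   echo patch ;;
            *)                    echo none ;;
        esac
    fi
}

# Usage:
#   release_bump "$(git log -1 --format=%B)"
```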

If a commit does not contain a conventional commit style message you can fix it during the squash and merge operation on the PR.

References