In this section we’ll look at some traditional tools to perform database backups and restores in our Kubernetes cluster. We’ll also look at some wrappers around those tools, allowing for scheduled backups, as well as automatically shipping backups to an S3-compatible storage, such as our MinIO service.
Note: all code samples from this section are available on GitHub.
mysqldump
We can’t talk about MySQL or MariaDB backups without the industry-standard mysqldump utility and its MariaDB-flavored counterpart mariadb-dump (they’re the exact same executable on MariaDB containers).
In previous sections we’ve already used the mysql binary to run queries in our MariaDB containers. Using mysqldump is not that different. However, large database dumps can be somewhat resource intensive, so it’s always recommended to perform them on a replica server if possible.
```
$ kubectl get mariadbs
NAME                READY   STATUS    PRIMARY POD           AGE
wordpress-mariadb   True    Running   wordpress-mariadb-0   78m
```
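If you’re scripting around this, you can derive the pod names instead of hardcoding them. A small sketch, assuming the operator exposes the primary pod name in the resource status (the exact field name may vary by operator version, so verify with `kubectl get mariadb wordpress-mariadb -o yaml` first):

```shell
# Read the primary pod name from the MariaDB resource status.
# NOTE: .status.currentPrimary is an assumption -- check your operator version.
PRIMARY=$(kubectl get mariadb wordpress-mariadb \
  -o jsonpath='{.status.currentPrimary}')

# Derive a replica to dump from: the next pod index in a 3-pod cluster.
REPLICA="${PRIMARY%-*}-$(( (${PRIMARY##*-} + 1) % 3 ))"
echo "Dumping from replica: ${REPLICA}"
```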
In our configuration, wordpress-mariadb-0 is the primary pod, so our -1 and -2 servers are replicas. Here’s a single-transaction dump from our first replica:
```
$ kubectl exec wordpress-mariadb-1 -- \
    mysqldump -uroot -pverysecret wordpress \
    --single-transaction
```
We can pipe this into a compressed file on our local system:
```
$ kubectl exec wordpress-mariadb-1 -- \
    mysqldump -uroot -pverysecret wordpress \
    --single-transaction | gzip -c9 \
    > $(date +"%Y-%m%d-%H%M%S").sql.gz
```
This is useful if you need to quickly create a database dump for your local development environment or some local analysis. It’s also quite useful if you’re migrating an existing database from elsewhere into Kubernetes and have existing backup scripts that can be adapted to this format.
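Such an adapted script might look like the following sketch. The pod name, paths, and retention count are our own choices here, not anything the cluster mandates:

```shell
#!/bin/sh
# Sketch of a local backup wrapper around kubectl + mysqldump.
set -eu

POD="wordpress-mariadb-1"          # dump from a replica, not the primary
DB="wordpress"
STAMP=$(date +"%Y-%m%d-%H%M%S")    # same filename format as above
OUT="backups/${DB}-${STAMP}.sql.gz"

mkdir -p backups
kubectl exec "$POD" -- \
  mysqldump -uroot -pverysecret "$DB" --single-transaction \
  | gzip -c9 > "$OUT"

# Keep only the 7 most recent dumps (GNU xargs; -r skips empty input).
ls -1t backups/"${DB}"-*.sql.gz | tail -n +8 | xargs -r rm --
```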
Restore a database
Similar to mysqldump, we can use mysql to restore the data into our production cluster. Note that we do need to do this on the primary server, as it’s the only one accepting writes:
```
$ gzcat 2024-0724-103130.sql.gz | kubectl exec -i \
    wordpress-mariadb-0 -- mysql -uroot -pverysecret \
    wordpress
```
After a database import it’s usually advised to flush the WordPress object cache (if it’s persistent). We’ll cover object caching and running CLI commands in a Kubernetes-based WordPress installation in a future section.
In addition to being able to run these traditional tools, the MariaDB operator provides several abstractions for convenience.
Operator Backups
The MariaDB operator provides a couple of custom resources to help with backups and restores. These abstractions are called Backup and Restore, and they use Kubernetes Job and CronJob resources under the hood.
Unlike the mysqldump examples above, which produce SQL dump files on the computer running kubectl, the operator’s Backup and Restore jobs run within the Kubernetes cluster. This requires them to explicitly define a destination for their backups, which can be a persistent volume within the Kubernetes cluster or an S3-compatible storage service (external or internal).
Let’s create an example Backup resource:
```yaml
apiVersion: k8s.mariadb.com/v1alpha1
kind: Backup
metadata:
  name: wordpress-backup
spec:
  mariaDbRef:
    name: wordpress-mariadb
  storage:
    persistentVolumeClaim:
      storageClassName: openebs-hostpath
      resources:
        requests:
          storage: 100Mi
      accessModes:
        - ReadWriteOnce
```
This creates a one-time backup Job that claims a 100Mi persistent volume and performs the backup. As with other MariaDB-related resources, we tell it which cluster to back up using the mariaDbRef attribute.
Let’s call this manifest mariadb.backup.pvc.yml and add it to our Kubernetes cluster:
```
$ kubectl apply -f mariadb.backup.pvc.yml
backup.k8s.mariadb.com/wordpress-backup created

$ kubectl get jobs
NAME               STATUS     COMPLETIONS   DURATION   AGE
wordpress-backup   Complete   1/1           8s         72s

$ kubectl get backups
NAME               COMPLETE   STATUS    MARIADB             AGE
wordpress-backup   True       Success   wordpress-mariadb   14s

$ kubectl get pods
NAME                     READY   STATUS      RESTARTS   AGE
wordpress-backup-mb96d   0/1     Completed   0          20s
```
We’ve omitted some output from the pods list, but as you can see, the Backup resource creates a Kubernetes Job, which launches a Pod that claims a persistent volume and ultimately runs mysqldump.
CronJobs
Jobs in Kubernetes are single-use. We can’t re-run them on demand, so for every such manual backup we’d need to create a new Job. We can, however, use Kubernetes CronJobs, which are a great fit for database backups and other maintenance tasks.
The Backup resource of the MariaDB operator supports a cron schedule, as well as a maxRetention property. Let’s delete our existing backup:
```
$ kubectl delete -f mariadb.backup.pvc.yml
backup.k8s.mariadb.com "wordpress-backup" deleted
```
Now let’s update our manifest to include a schedule and our max retention. For testing purposes, we’ll create a backup every minute with a 10-minute retention:
```yaml
apiVersion: k8s.mariadb.com/v1alpha1
kind: Backup
metadata:
  name: wordpress-backup
spec:
  mariaDbRef:
    name: wordpress-mariadb
  schedule:
    cron: "*/1 * * * *"
  maxRetention: 10m
  storage:
    persistentVolumeClaim:
      storageClassName: openebs-hostpath
      resources:
        requests:
          storage: 100Mi
      accessModes:
        - ReadWriteOnce
```
Let’s re-create this resource in the Kubernetes cluster and look around:
```
$ kubectl apply -f mariadb.backup.pvc.yml
backup.k8s.mariadb.com/wordpress-backup created

$ kubectl get cronjobs
NAME               SCHEDULE      TIMEZONE   SUSPEND   ACTIVE   LAST SCHEDULE   AGE
wordpress-backup   */1 * * * *   <none>     False     0        34s             3m26s

$ kubectl get jobs
NAME                        STATUS     COMPLETIONS   DURATION   AGE
wordpress-backup-28697081   Complete   1/1           5s         3m2s
wordpress-backup-28697082   Complete   1/1           5s         2m2s
wordpress-backup-28697083   Complete   1/1           4s         62s
wordpress-backup-28697084   Running    0/1           2s         2s
```
As you can see, this CronJob now creates a new Job in our Kubernetes cluster every minute; each Job creates a Pod that runs mysqldump with our persistent volume attached.
Browsing PVCs
Oftentimes when working with persistent volumes in Kubernetes, you might want to browse a volume, perhaps to download a specific file, or even to update a configuration file in place while debugging an application.
We can do this in Kubernetes by creating a new Pod that mounts the persistent volume and runs a busybox container, allowing us to shell in and explore. Let’s create our Pod manifest:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pvc-browser
spec:
  containers:
    - image: busybox
      name: pvc-browser
      command: ['sleep', 'infinity']
      volumeMounts:
        - mountPath: /pvc
          name: pvc-mount
  volumes:
    - name: pvc-mount
      persistentVolumeClaim:
        claimName: wordpress-backup
```
This manifest runs a pod in our cluster that claims the same wordpress-backup PVC and mounts it into the /pvc directory of a busybox container that sleeps indefinitely. Let’s call this manifest pvc-browser.yml and add it to our Kubernetes cluster:
```
$ kubectl apply -f pvc-browser.yml
pod/pvc-browser created

$ kubectl exec -it pvc-browser -- sh
# ls -lh /pvc
total 27M
-rwxrwxrwx 1 999 999   31 Jul 24 12:58 0-backup-target.txt
-rw-rw-r-- 1 999 999 2.7M Jul 24 12:49 backup.2024-07-24T12:49:00Z.sql
-rw-rw-r-- 1 999 999 2.7M Jul 24 12:50 backup.2024-07-24T12:50:00Z.sql
-rw-rw-r-- 1 999 999 2.7M Jul 24 12:51 backup.2024-07-24T12:51:00Z.sql
```
As mentioned, this is incredibly useful not just for backups, but for looking around volumes in general when working with Kubernetes. You can also use kubectl cp to copy these files to your local computer if needed.
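For example, copying one of the dumps listed above to the current directory (the filename is from our listing and will differ on your system):

```shell
# kubectl cp uses <pod>:<path> syntax and needs tar in the
# container image (busybox ships with it).
SRC="pvc-browser:/pvc/backup.2024-07-24T12:49:00Z.sql"
DST="./$(basename "${SRC#*:}")"
kubectl cp "$SRC" "$DST"
```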
Do note, however, that some storage classes will not allow multiple pods to claim the same volume, even if they’re on the same node. This means your backups may fail while the browser pod is running.
After the CronJob has been running for a while, you’ll notice that the 10-minute retention policy works as expected, deleting backups older than 10 minutes.
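You can verify the cleanup from the browser pod itself, for instance by searching for dumps older than the retention window; this should print nothing once cleanup has run (busybox find supports -mmin in most builds):

```shell
# List any dumps on the backup volume older than 10 minutes.
kubectl exec pvc-browser -- find /pvc -name 'backup.*.sql' -mmin +10
```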
Let’s delete the browser pod, backup configuration and backup volume before looking at shipping backups to S3.
```
$ kubectl delete -f pvc-browser.yml -f mariadb.backup.pvc.yml
pod "pvc-browser" deleted
backup.k8s.mariadb.com "wordpress-backup" deleted

$ kubectl delete pvc wordpress-backup
persistentvolumeclaim "wordpress-backup" deleted
```
Backups to S3
We’ve configured a MinIO instance in a previous section for saving WordPress media uploads to an S3-compatible storage. Let’s use the same MinIO instance to hold our MariaDB database backups.
First, let’s use the MinIO console to create a new bucket called mariadb, along with an access key and secret, and add those to our existing mariadb.secrets.yml:
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: mariadb-secrets
  labels:
    k8s.mariadb.com/watch: ""
stringData:
  MARIADB_PASSWORD: secret
  MARIADB_ROOT_PASSWORD: verysecret
  MINIO_ACCESS_KEY: O112zy6WaWTjZ4BQpqgd
  MINIO_SECRET_KEY: Bpqe8KqWl6bLsLNmizj1ebwClmNCkOsKDioKHTMs
```
Next, we’ll create a new manifest called mariadb.backup.s3.yml with our S3 storage configuration:
```yaml
apiVersion: k8s.mariadb.com/v1alpha1
kind: Backup
metadata:
  name: wordpress-backup-s3
spec:
  mariaDbRef:
    name: wordpress-mariadb
  schedule:
    cron: "*/1 * * * *"
  maxRetention: 10m
  storage:
    s3:
      bucket: mariadb
      endpoint: minio:9000
      accessKeyIdSecretKeyRef:
        name: mariadb-secrets
        key: MINIO_ACCESS_KEY
      secretAccessKeySecretKeyRef:
        name: mariadb-secrets
        key: MINIO_SECRET_KEY
```
Note that the storage.s3.endpoint contains our minio service name and port; this is the Kubernetes ClusterIP service we defined in a previous section when configuring MinIO.
We set the storage.s3.bucket name to the mariadb bucket we created moments ago, and link the access and secret keys from the mariadb-secrets resource.
Let’s apply both manifests to our Kubernetes cluster:
```
$ kubectl apply \
    -f mariadb.secrets.yml \
    -f mariadb.backup.s3.yml
secret/mariadb-secrets configured
backup.k8s.mariadb.com/wordpress-backup-s3 created
```
Wait a few minutes and see our backup jobs (hopefully) succeeding:
```
$ kubectl get jobs
NAME                           STATUS     COMPLETIONS   DURATION   AGE
wordpress-backup-s3-28697152   Complete   1/1           4s         2m13s
wordpress-backup-s3-28697153   Complete   1/1           4s         73s
wordpress-backup-s3-28697154   Complete   1/1           5s         13s
```
This can also be observed via the MinIO web console:

The backup jobs will also take care of deleting older backups that no longer fit the retention policy, just like the PVC-based backups.
Restoring Backups
The final piece of the puzzle is the ability to restore the backups created with the MariaDB operator. This can be achieved using the Restore custom resource provided by the operator:
```yaml
apiVersion: k8s.mariadb.com/v1alpha1
kind: Restore
metadata:
  name: restore
spec:
  mariaDbRef:
    name: wordpress-mariadb
  backupRef:
    name: wordpress-backup
```
The restore object is linked to a MariaDB cluster via the mariaDbRef property, and to a backup object via the backupRef reference. Let’s call this manifest mariadb.restore.yml. Creating this Restore object in our Kubernetes cluster will immediately launch a restoration job, which imports the latest MariaDB backup available in the linked backup resource.
```
$ kubectl apply -f mariadb.restore.yml
restore.k8s.mariadb.com/restore created

$ kubectl get restores
NAME      COMPLETE   STATUS    MARIADB             AGE
restore   True       Success   wordpress-mariadb   8s
```
Of course, the latest backup is not always the one we need. That’s where the targetRecoveryTime attribute comes in: it allows us to specify a timestamp, and the MariaDB operator will look for an available backup closest to that recovery time.
Let’s create a restore resource with a target recovery time that’s the beginning of the Unix epoch:
```yaml
apiVersion: k8s.mariadb.com/v1alpha1
kind: Restore
metadata:
  name: restore
spec:
  mariaDbRef:
    name: wordpress-mariadb
  backupRef:
    name: wordpress-backup
  targetRecoveryTime: 1970-01-01T00:00:00Z
```
Deleting and re-creating this resource should return our MariaDB cluster to the state of the earliest available backup:
```
$ kubectl delete -f mariadb.restore.yml
restore.k8s.mariadb.com "restore" deleted

$ kubectl apply -f mariadb.restore.yml
restore.k8s.mariadb.com/restore created
```
Of course, as demonstrated earlier, you could always download a specific backup file, modify it as needed, and import it explicitly into the primary database pod using the mysql command line utility.
What’s next?
There are a few other options for backups and restores via the MariaDB operator which may be useful, especially if you’re running a database cluster shared across multiple applications. We encourage you to explore these, as well as the various options you can pass to mysql and mysqldump to balance speed, compatibility, and more.
In this section we looked at using the traditional mysqldump and mysql utilities to create and restore MySQL and MariaDB backups. We also looked at the custom resources provided by the MariaDB operator to create, restore, and schedule backups stored in a persistent volume or an S3-compatible storage service.
In the next section we’ll look at some disaster recovery options for when things go really bad.