MariaDB Disaster Recovery

In this section we’ll briefly look at some disaster recovery options for MariaDB databases. We’ll recap the backup and restore options demonstrated in the previous sections, and look at what other tools and techniques we can use to safeguard our data.

Note: all code samples from this section are available on GitHub.

Backup and Restore

The mysqldump and mysql utilities are the universally recognized standards for MySQL and MariaDB backups and restores. Although you can achieve many things with GUI tools nowadays (such as phpMyAdmin), we highly recommend getting comfortable with these command-line tools.

As demonstrated in the previous section, we can use mysqldump to quickly create a database backup from any of the MariaDB nodes in our Kubernetes cluster, and compress it into a local file:

$ kubectl exec wordpress-mariadb-1 -- \
  mysqldump -uroot -pverysecret wordpress \
  --single-transaction | gzip -c9 \
  > $(date +"%Y-%m%d-%H%M%S").sql.gz

We can then use gzcat (or zcat on some systems) to restore this dump into our production database. We do have to use the primary node for write operations:

$ gzcat 2024-0724-103130.sql.gz | kubectl exec -i \
  wordpress-mariadb-0 -- mysql -uroot -pverysecret \
  wordpress

We’ve also explored the Backup and Restore custom resources in the MariaDB operator, which are a great way to configure scheduled backups in a Kubernetes cluster. Furthermore, we’ve used a special pod to access and inspect our scheduled backups volume.
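As a quick refresher, a minimal scheduled Backup resource looks roughly like this (the values here are illustrative; see the previous section for the full configuration):

apiVersion: k8s.mariadb.com/v1alpha1
kind: Backup
metadata:
  name: wordpress-backup
spec:
  mariaDbRef:
    name: wordpress-mariadb
  schedule:
    cron: "0 0 * * *"
  storage:
    persistentVolumeClaim:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi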

Backing up Binary Logs

Backups are usually the first line of defense when it comes to disaster recovery. However, in databases with frequent writes, backups can quickly become outdated, and increasing the backup frequency is not always a great option, especially for very large databases.

For these situations you might consider backing up the MySQL binary logs (from /var/lib/mysql) together with the logical dump. This ensures that every write statement since the last logical dump is recorded as well. Restoring these (and doing incremental backups) will be covered in another section, but backing up the binary logs is a good first step.

We briefly touched on CronJobs in Kubernetes earlier, or rather an abstraction that creates a CronJob. Since the MariaDB operator does not have an option to back up binary logs, we’ll create our own CronJob in Kubernetes that does just that.

The CronJob will spawn a MariaDB container (we need the MariaDB tooling) and run the mysqlbinlog utility to read the binary logs from a remote server (our primary) and store them on the mounted Kubernetes volume.

Let’s create a new mariadb.backup.binlog.yml manifest for this new CronJob:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: wordpress-backup-binlog
spec:
  schedule: "0 0 * * 0"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure

Each CronJob has a schedule in Unix Cron format. We’re using a weekly schedule in this example, but the interval can be shortened or lengthened, depending on how write-heavy the cluster is.

          containers:
          - name: backup
            image: docker-registry1.mariadb.com/library/mariadb:10.11.8
            command:
            - bash
            - -c
            - >
              mkdir -p /pvc/binlogs; cd /pvc/binlogs;
              BINLOG_START=$(mysql --host=wordpress-mariadb-primary --user=root
              --password=${MARIADB_ROOT_PASSWORD} --silent --raw
              -e 'show binary logs;' | head -1 | awk '{print $1}');
              mysqlbinlog --read-from-remote-server
              --host=wordpress-mariadb-primary --user=root
              --password=${MARIADB_ROOT_PASSWORD} --raw
              --to-last-log $BINLOG_START;

Next in the CronJob template spec are the container definitions, similar to those in Deployments and StatefulSets in Kubernetes. Here we run a MariaDB container, but instead of starting a MariaDB server, we run bash with a few commands:

  • Create a /pvc/binlogs directory if it doesn’t exist and change to it.
  • Determine the name of the first binary log file from the primary MariaDB server.
  • Use the mysqlbinlog utility to fetch binary logs from the remote server, in raw format, starting at the first file, ending at the last.

This will result in the raw binary logs stored in the /pvc/binlogs directory.

            env:
            - name: MARIADB_ROOT_PASSWORD
              valueFrom:
                secretKeyRef:
                  key: MARIADB_ROOT_PASSWORD
                  name: mariadb-secrets
            volumeMounts:
            - mountPath: /pvc
              name: pvc-mount

Next, we tell the container where to find our MariaDB root password through an environment variable, and where to mount the backups PVC. Finally, we define the PVC itself:

          volumes:
          - name: pvc-mount
            persistentVolumeClaim:
              claimName: wordpress-backup

We use the same wordpress-backup persistent volume here which we used earlier for our database logical backups. Let’s apply this manifest to our Kubernetes cluster and make sure it’s running:

$ kubectl apply -f mariadb.backup.binlog.yml 
cronjob.batch/wordpress-backup-binlog created

$ kubectl get cronjobs
NAME                      SCHEDULE      TIMEZONE   SUSPEND   ACTIVE   LAST SCHEDULE   AGE
wordpress-backup          */1 * * * *   <none>     False     0        25s             5d1h
wordpress-backup-binlog   0 0 * * 0     <none>     False     0        <none>          16s

You might want to use something more frequent than a weekly schedule for testing purposes. You’ll also likely want to add some retention policy/cleanup to the script, to make sure very old binary logs are deleted.
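For example, appending a find command to the container’s script would delete binary log files older than 30 days (a sketch; pick a retention window that matches your logical backup schedule):

              find /pvc/binlogs -type f -mtime +30 -delete;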

Off-site Replication

Depending on your disaster recovery requirements, a complete site/facility failure is something you might want to protect against.

Off-site backups are a good first step, for example shipping logical backups to a different region or availability zone. However, this approach still leaves some potential for data loss, equal to the interval between logical backups.

To protect against this, we can back up our entire database to an off-site server in real time using replication, lowering the potential data loss to the replication lag and latency between the two facilities, usually well within one second.

Furthermore, depending on your application configuration, the remote database service can be set up as an emergency read replica, or even promoted to a primary in extreme cases.

Preparing the source MariaDB

Our use of MariaDB has been entirely within the cluster so far, but to access the database from a different network, we’ll need to make it publicly available.

There are a few different ways to achieve this, and the best one will vary depending on your configuration and/or provider. For example, in a public cloud you might prefer a LoadBalancer mapped to a static IP address.

We’ll use a NodePort service in this example for simplicity, which will make our MariaDB primary service publicly available on a high port, routable from any node in our cluster. Note that this would require some additional security (allowing access from certain IPs, etc.) which we’ll cover in a separate section.

Let’s update our original mariadb.yml configuration to reflect this:

  primaryService:
    type: NodePort

  secondaryService:
    type: ClusterIP

We changed our primaryService to a NodePort service type. We don’t really need to expose our secondary service at this point. Even in a failover event, a newly promoted primary will be accessed through the primary service, and thus available as a NodePort service.

Next, we’ll also need a predictable replication user password, which we can use in our off-site server. By default the MariaDB operator will randomly generate one for us, but we can force it to use a specific secret instead with the replPasswordSecretKeyRef attribute:

  replicas: 3
  replication:
    enabled: true
    replica:
      replPasswordSecretKeyRef:
        name: mariadb-secrets
        key: MARIADB_REPL_PASSWORD

Our mariadb.secrets.yml will need that entry of course:

apiVersion: v1
kind: Secret
metadata:
  name: mariadb-secrets
  labels:
    k8s.mariadb.com/watch: ""
stringData:
  MARIADB_PASSWORD: secret
  MARIADB_ROOT_PASSWORD: verysecret
  MARIADB_REPL_PASSWORD: stillsecret

Let’s apply these updated configurations and make sure the database is accessible using the replication user:

$ kubectl apply \
  -f mariadb.secrets.yml \
  -f mariadb.yml

$ kubectl get svc
# (some output omitted)
wordpress-mariadb-primary            NodePort    10.100.160.6     <none>        3306:30606/TCP   18h

In our case the port chosen for the NodePort service is 30606. We can now use this with any node in our cluster:

$ mysql -uroot -pverysecret -hk0 -P30606 -e "select 1;"
+---+
| 1 |
+---+
| 1 |
+---+

$ mysql -urepl -pstillsecret -hk0 -P30606 -e "select 1;"
+---+
| 1 |
+---+
| 1 |
+---+

Make sure the repl user is able to connect to the service, as this is the user we’ll configure on our off-site MariaDB server. Next, we’ll need a full database dump to initialize our off-site replica, one that contains our binary log position.

$ kubectl exec wordpress-mariadb-0 -- mysqldump \
  -uroot -pverysecret --all-databases \
  --master-data=2 --flush-logs \
  --single-transaction > dump.sql

The --master-data flag will add some comments to the output dump with the GTID position, which is going to be important when we initialize our new replica. These comments are usually within the first 30 lines of the dump file:

$ head -n30 dump.sql
-- Preferably use GTID to start replication from GTID position:
-- SET GLOBAL gtid_slave_pos='0-10-7881';

The MariaDB operator is configured to use GTID (global transaction IDs) by default, which is a good option for replication, so make note of that gtid_slave_pos. If you’re not using global transactions, you’ll find a comment with the MASTER_LOG_FILE and MASTER_LOG_POS further down the dump.
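In that case, something like this will locate the comment (the file name and position shown here are illustrative):

$ grep -m1 'MASTER_LOG_FILE' dump.sql
-- CHANGE MASTER TO MASTER_LOG_FILE='mariadb-bin.000002', MASTER_LOG_POS=329;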

Creating the Replica

Now that our source database is publicly accessible and we’ve obtained a full database dump with the binary log positions, let’s create our off-site replica server.

Hopefully we’re running a Kubernetes cluster in the remote location too, so we can use the MariaDB operator with all its bells and whistles. For demonstration purposes, however, we’ll set this up in the same cluster, but under a different namespace.

Let’s create a new offsite namespace and use kubens (invoked as kubectl ns below) to switch to it, which is common practice when working with larger clusters or multiple clusters.

$ kubectl create namespace offsite
namespace/offsite created

$ kubectl ns offsite
Context "kubernetes-admin@kubernetes" modified.
Active namespace is "offsite".

$ kubectl get pods
No resources found in offsite namespace.

Let’s create a new offsite/mariadb.secrets.yml manifest to make sure we have a working root password (we’re pretending it’s a new cluster and we can’t reuse our existing secrets):

apiVersion: v1
kind: Secret
metadata:
  name: mariadb-secrets
stringData:
  MARIADB_ROOT_PASSWORD: verysecret

Our offsite/mariadb.yml will look like this:

apiVersion: k8s.mariadb.com/v1alpha1
kind: MariaDB
metadata:
  name: wordpress-mariadb-offsite
spec:
  rootPasswordSecretKeyRef:
    name: mariadb-secrets
    key: MARIADB_ROOT_PASSWORD

  storage:
    size: 1Gi
    storageClassName: openebs-hostpath

  myCnf: |
    [mariadb]
    server_id=900
    log_bin=wordpress-mariadb-offsite
    read_only=1

  resources:
    requests:
      cpu: 100m
      memory: 128Mi
    limits:
      memory: 256Mi

Unlike our existing MariaDB cluster, we will not enable the operator’s replication feature in this configuration. Instead, we’ll use the myCnf attribute to pass some initialization options to the new server:

  • The server_id has to be unique in a replicated cluster, and given that our cluster server IDs increment with the number of active replicas in the StatefulSet, we’ll need a number we’re unlikely to reach with our original cluster.
  • The log_bin variable enables binary logging on the new replica. It’s not strictly a requirement, but it can be useful for recovering from a human error that has propagated to all replicas in a cluster.
  • The read_only flag makes sure no changes are possible on our new replica, except by privileged users (such as root) and the replication threads.
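Once the server is up (we’ll create it in a moment), these settings can be quickly verified:

$ kubectl exec -it wordpress-mariadb-offsite-0 -- \
  mysql -uroot -pverysecret \
  -e 'select @@server_id, @@log_bin, @@read_only'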

Let’s create the new database and import our dump.sql file:

$ kubectl apply \
  -f offsite/mariadb.secrets.yml \
  -f offsite/mariadb.yml 
secret/mariadb-secrets created
mariadb.k8s.mariadb.com/wordpress-mariadb-offsite created

$ cat dump.sql | kubectl exec -i wordpress-mariadb-offsite-0 -- mysql -uroot -pverysecret

Now let’s get a MySQL shell on the new server, make sure our import succeeded, then configure replication:

$ kubectl exec -it wordpress-mariadb-offsite-0 -- mysql -uroot -pverysecret
MariaDB [(none)]> use wordpress;
MariaDB [wordpress]> show tables;
+-----------------------+
| Tables_in_wordpress   |
+-----------------------+
| wp_commentmeta        |
| wp_comments           |
| wp_links              |
# ...

We’ll use the CHANGE MASTER TO command to let MariaDB know where the primary server is, which credentials to use, and that replication should be GTID-based:

MariaDB [(none)]> CHANGE MASTER TO
    MASTER_HOST='10.0.10.100',
    MASTER_PORT=30606,
    MASTER_USER='repl',
    MASTER_PASSWORD='stillsecret',
    MASTER_USE_GTID=slave_pos;
Query OK, 0 rows affected (0.007 sec)

Finally, let’s set the replication GTID (first few lines within the dump.sql file) and start replication:

MariaDB [(none)]> SET GLOBAL gtid_slave_pos='0-10-7881';
Query OK, 0 rows affected (0.007 sec)

MariaDB [(none)]> START REPLICA;
Query OK, 0 rows affected (0.008 sec)

MariaDB [(none)]> SHOW REPLICA STATUS\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
    Master_Host: 10.0.10.100
    Master_User: repl
    Master_Port: 30606

Let’s switch back to our default namespace, and verify that our primary server knows about the new replica:

$ kubectl ns default
Context "kubernetes-admin@kubernetes" modified.
Active namespace is "default".

$ kubectl exec -it wordpress-mariadb-0 -- \
  mysql -uroot -pverysecret \
  -e 'show replica hosts;'
+-----------+-------------+------+-----------+
| Server_id | Host        | Port | Master_id |
+-----------+-------------+------+-----------+
|       900 | 10.10.0.66  | 3306 |        10 |
|        12 | 10.10.2.70  | 3306 |        10 |
|        11 | 10.10.3.103 | 3306 |        10 |
+-----------+-------------+------+-----------+

Creating a scheduled backup of the replica may also be a good idea, in addition to backing up its binary logs of course.

Recovery Time & Point

We won’t go into too much depth on RTO (recovery time objective) and RPO (recovery point objective) in this guide, but having an off-site backup via replication, together with binary logs, will cover the most common scenarios and allow for point-in-time recovery, up to the exact SQL statement where things went wrong. The mysqlbinlog utility is a great place to start exploring that.
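For example, replaying a backed-up binary log up to just before a destructive statement might look like this (a sketch; the file name and timestamp are hypothetical):

$ mysqlbinlog --stop-datetime="2024-07-24 10:30:00" \
  mariadb-bin.000042 | \
  mysql -uroot -pverysecret wordpress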

Recovery time will depend largely on the level of automation you’re looking for. The MariaDB operator already provides an impressive recovery time of just a few seconds when switching the primary. Switching to an off-site replica would require some additional automation work.
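For reference, the manual steps behind such automation would look roughly like this on the off-site replica (a simplified sketch; a real failover also needs to repoint the application and fence the old primary):

MariaDB [(none)]> STOP REPLICA;
MariaDB [(none)]> RESET REPLICA ALL;
MariaDB [(none)]> SET GLOBAL read_only=0;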

Recap

In this section we’ve touched on disaster recovery for MariaDB and MySQL services. The MariaDB operator for Kubernetes covers most of the usual requirements with scheduled backups. We’ve also covered backing up binary logs using a custom CronJob in Kubernetes.

Finally, we created a new off-site MariaDB replica, which connects to our primary server and backs up all of our data in real time using MySQL replication.