In this section we’ll briefly look at some disaster recovery options for MariaDB databases. We’ll recap the backup and restore options demonstrated in the previous sections, and look at what other tools and techniques we can use to safeguard our data.
Note: all code samples from this section are available on GitHub.
Backup and Restore
The mysqldump and mysql utilities are the universally recognized standards for MySQL and MariaDB backups and restores. Although you can achieve many things with GUI tools nowadays (such as phpMyAdmin), we highly recommend getting comfortable with these command-line tools.
As demonstrated in the previous section, we can use mysqldump to quickly create a database backup from any of the MariaDB nodes in our Kubernetes cluster, and compress it into a local file:
$ kubectl exec wordpress-mariadb-1 -- \
mysqldump -uroot -pverysecret wordpress \
--single-transaction | gzip -c9 \
> $(date +"%Y-%m%d-%H%M%S").sql.gz
We can then use gzcat (or zcat on some systems) to restore this dump into our production database. We do have to use the primary node for write operations:
$ gzcat 2024-0724-103130.sql.gz | kubectl exec -i \
wordpress-mariadb-0 -- mysql -uroot -pverysecret \
wordpress
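Before piping a multi-gigabyte dump into a production database, it's worth confirming the archive isn't corrupt. A quick sketch using gzip's built-in integrity test (the file name here is illustrative):

```shell
# Create a tiny gzipped "dump" stand-in, then verify it with gzip -t,
# which checks archive integrity without writing the decompressed data.
printf 'CREATE TABLE t (id INT);\n' | gzip -c9 > 2024-0724-103130.sql.gz
if gzip -t 2024-0724-103130.sql.gz; then
  echo "archive OK"
fi
```

gzip -t exits non-zero on a truncated or corrupted archive, so it also works well as a guard in restore scripts.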
We’ve also explored the use of the Backup and Restore custom resources in the MariaDB operator, which are a great way to configure scheduled backups in a Kubernetes cluster. Furthermore, we’ve looked at using a special pod to access and inspect our scheduled backups volume.
Backing up Binary Logs
Backups are usually the first line of defense when it comes to disaster recovery. However, in databases under frequent writes, backups can quickly become outdated. Increasing the backup frequency is not always a great option, especially for very large databases.
For these situations you might consider backing up the MySQL binary logs (from /var/lib/mysql) together with the logical dump. This ensures that every write statement leading up to the logical dump is recorded as well. Restoring these (and doing incremental backups) will be covered in another section, but backing the binary logs up is a good first step.
We did briefly touch on CronJobs in Kubernetes, or rather an abstraction that creates a CronJob. Since the MariaDB operator does not have an option to back up binary logs, we’ll create our own Kubernetes CronJob that does that.
The CronJob will spawn a MariaDB container (we need the MariaDB tooling) and run the mysqlbinlog utility to read the binary logs from a remote server (our primary) and store them on the mounted Kubernetes volume.
Let’s create a new mariadb.backup.binlog.yml manifest for this new CronJob:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: wordpress-backup-binlog
spec:
  schedule: "0 0 * * 0"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
Each CronJob has a schedule in Unix cron format. We’re using a weekly schedule in this example, but it can be made more or less frequent, depending on how write-heavy the cluster is.
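If weekly turns out to be too coarse, the CronJob spec has a few more knobs worth knowing about. A sketch of a tighter schedule (these are standard batch/v1 CronJob fields; the six-hour interval is just an example):

```yaml
spec:
  schedule: "0 */6 * * *"          # every six hours
  concurrencyPolicy: Forbid        # skip a run if the previous one is still going
  successfulJobsHistoryLimit: 3    # keep only the last few finished Jobs
  failedJobsHistoryLimit: 3
```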
          containers:
            - name: backup
              image: docker-registry1.mariadb.com/library/mariadb:10.11.8
              command:
                - bash
                - -c
                - >
                  mkdir -p /pvc/binlogs; cd /pvc/binlogs;
                  BINLOG_START=$(mysql --host=wordpress-mariadb-primary --user=root
                  --password=${MARIADB_ROOT_PASSWORD} --silent --raw
                  -e 'show binary logs;' | head -1 | awk '{print $1}');
                  mysqlbinlog --read-from-remote-server
                  --host=wordpress-mariadb-primary --user=root
                  --password=${MARIADB_ROOT_PASSWORD} --raw
                  --to-last-log $BINLOG_START;
Next in the CronJob template spec come the container definitions, similar to those in Deployments and StatefulSets in Kubernetes. Here we run a MariaDB container, but instead of starting a MariaDB server, we run bash with a few commands:
- Create a /pvc/binlogs directory if it doesn’t exist and change to it.
- Determine the name of the first binary log file on the primary MariaDB server.
- Use the mysqlbinlog utility to fetch binary logs from the remote server, in raw format, starting at the first file and ending at the last.
This will result in the raw binary logs being stored in the /pvc/binlogs directory.
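The BINLOG_START pipeline in that command just grabs the first file name from the show binary logs output. In isolation, the parsing step looks like this (the sample output is hardcoded for illustration):

```shell
# Simulated `show binary logs` output in --silent --raw mode:
# one "<log name><TAB><size>" pair per line.
binlogs=$(printf 'wordpress-bin.000001\t1834\nwordpress-bin.000002\t98421\n')

# Same head/awk pipeline as in the CronJob command:
BINLOG_START=$(printf '%s\n' "$binlogs" | head -1 | awk '{print $1}')
echo "$BINLOG_START"   # wordpress-bin.000001
```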
              env:
                - name: MARIADB_ROOT_PASSWORD
                  valueFrom:
                    secretKeyRef:
                      key: MARIADB_ROOT_PASSWORD
                      name: mariadb-secrets
              volumeMounts:
                - mountPath: /pvc
                  name: pvc-mount
Next, we tell the container where to find our MariaDB root password through an environment variable, and where to mount the backups PVC. Finally, we define the volume itself:
          volumes:
            - name: pvc-mount
              persistentVolumeClaim:
                claimName: wordpress-backup
We use the same wordpress-backup persistent volume claim here that we used earlier for our logical database backups. Let’s apply this manifest to our Kubernetes cluster and make sure it’s running:
$ kubectl apply -f mariadb.backup.binlog.yml
cronjob.batch/wordpress-backup-binlog created
$ kubectl get cronjobs
NAME                      SCHEDULE      TIMEZONE   SUSPEND   ACTIVE   LAST SCHEDULE   AGE
wordpress-backup          */1 * * * *   <none>     False     0        25s             5d1h
wordpress-backup-binlog   0 0 * * 0     <none>     False     0        <none>          16s
You might want to use a more frequent schedule than weekly for testing purposes. You’ll also likely need to incorporate some retention policy or cleanup in the script, to make sure very old binary logs are deleted.
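As a starting point for that cleanup, something like the following could run at the top of the CronJob script. This is a sketch under assumptions: the path matches the /pvc/binlogs mount above, the file pattern matches MariaDB's default binlog naming, and the 35-day retention is arbitrary:

```shell
# Hypothetical retention cleanup: delete raw binlog backups older than
# RETENTION_DAYS from the backup volume. Both defaults are assumptions.
BACKUP_DIR=${BACKUP_DIR:-/pvc/binlogs}
RETENTION_DAYS=${RETENTION_DAYS:-35}
if [ -d "$BACKUP_DIR" ]; then
  find "$BACKUP_DIR" -maxdepth 1 -type f -name '*-bin.*' \
    -mtime +"$RETENTION_DAYS" -delete
fi
```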
Off-site Replication
Depending on your disaster recovery requirements, a complete site/facility failure is something you might want to protect against.
Off-site backups are a good first step: for example, shipping logical backups to a different region or availability zone. However, this approach still leaves some potential for data loss, equal to the interval between logical backups.
To protect against this, we can copy our entire database to an off-site server in real time using replication, lowering the potential data loss to the replication lag between the two facilities, which is usually well under one second.
Furthermore, depending on your application configuration, the remote database service can be set up as an emergency read replica, or even promoted to a primary in extreme cases.
Preparing the source MariaDB
Our use of MariaDB has been entirely within the cluster so far, but to access the database from a different network, we’ll need to make it publicly available.
There are a few different ways to achieve this, and the best one will vary depending on your configuration and provider. For example, in a public cloud you might prefer a LoadBalancer mapped to a static IP address.
We’ll use a NodePort service in this example for simplicity, which will make our MariaDB primary service publicly available on a high port, routable from any node in our cluster. Note that this would require some additional security (allowing access from certain IPs, etc.) which we’ll cover in a separate section.
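NodePort services are normally assigned a random high port. If you need a predictable one, a plain Kubernetes Service can pin nodePort explicitly. A sketch; note that the pod-name selector targets one specific StatefulSet pod (a label Kubernetes adds automatically) rather than tracking the current primary the way the operator's own primary service does, so this is illustrative only:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: wordpress-mariadb-primary-ext   # hypothetical name
spec:
  type: NodePort
  selector:
    # StatefulSet pods get this label automatically; pinning pod 0 here
    # does NOT follow primary failover, unlike the operator's service.
    statefulset.kubernetes.io/pod-name: wordpress-mariadb-0
  ports:
    - port: 3306
      targetPort: 3306
      nodePort: 30606   # must fall in the cluster's NodePort range (default 30000-32767)
```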
Let’s update our original mariadb.yml configuration to reflect this:
primaryService:
  type: NodePort
secondaryService:
  type: ClusterIP
We changed our primaryService to a NodePort service type. We don’t really need to expose our secondary service at this point: even in a failover event, a newly promoted primary will be accessed through the primary service, and thus available as a NodePort service.
Next, we’ll also need a predictable replication user password, which we can use in our off-site server. By default the MariaDB operator will randomly generate one for us, but we can force it to use a specific secret instead with the replPasswordSecretKeyRef attribute:
replicas: 3
replication:
  enabled: true
  replica:
    replPasswordSecretKeyRef:
      name: mariadb-secrets
      key: MARIADB_REPL_PASSWORD
Our mariadb.secrets.yml will need that entry, of course:
apiVersion: v1
kind: Secret
metadata:
  name: mariadb-secrets
  labels:
    k8s.mariadb.com/watch: ""
stringData:
  MARIADB_PASSWORD: secret
  MARIADB_ROOT_PASSWORD: verysecret
  MARIADB_REPL_PASSWORD: stillsecret
Let’s apply these updated configurations and make sure the database is accessible using the replication user:
$ kubectl apply \
  -f mariadb.secrets.yml \
  -f mariadb.yml
$ kubectl get svc
# (some output omitted)
wordpress-mariadb-primary NodePort 10.100.160.6 <none> 3306:30606/TCP 18h
In our case the port chosen for the NodePort service is 30606. We can now use this with any node in our cluster:
$ mysql -uroot -pverysecret -hk0 -P30606 -e "select 1;"
+---+
| 1 |
+---+
| 1 |
+---+
$ mysql -urepl -pstillsecret -hk0 -P30606 -e "select 1;"
+---+
| 1 |
+---+
| 1 |
+---+
Make sure the repl user is able to connect to the service, as this is the user we’ll configure in our off-site MariaDB server. Next, we’ll need a full database dump to initialize our off-site replica, one that will contain our binary log position:
$ kubectl exec wordpress-mariadb-0 -- mysqldump \
  -uroot -pverysecret --all-databases \
  --master-data=2 --flush-logs \
  --single-transaction > dump.sql
The --master-data flag will add some comments to the output dump with the GTID position, which is going to be important when we initialize our new replica. These comments are usually within the first 30 lines of the dump file:
$ head -n30 dump.sql
-- Preferably use GTID to start replication from GTID position:
-- SET GLOBAL gtid_slave_pos='0-10-7881';
The MariaDB operator is configured to use GTIDs (global transaction IDs) by default, which is a good option for replication, so make a note of that gtid_slave_pos value. If you’re not using global transaction IDs, you’ll find a comment with MASTER_LOG_FILE and MASTER_LOG_POS further down the dump.
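Rather than eyeballing the dump header, the GTID can be pulled out with a small shell pipeline. The header lines are hardcoded here for illustration; against a real dump you'd feed in head -n30 dump.sql instead:

```shell
# Dump header produced by --master-data=2 (values from the example above):
dump_header="-- Preferably use GTID to start replication from GTID position:
-- SET GLOBAL gtid_slave_pos='0-10-7881';"

# Extract just the GTID value from the commented-out SET statement:
gtid=$(printf '%s\n' "$dump_header" | grep -o "gtid_slave_pos='[^']*'" | cut -d"'" -f2)
echo "$gtid"   # 0-10-7881
```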
Creating the Replica
Now that our source database is publicly accessible and we’ve obtained a full database dump with the binary log positions, let’s create our off-site replica server.
Ideally we’d be running a Kubernetes cluster at the remote location too, so we could use the MariaDB operator with all its bells and whistles. For demonstration purposes, however, we’ll set this up in the same cluster, but under a different namespace.
Let’s create a new offsite namespace and use kubens (invoked here as kubectl ns) to switch to it, which is a common practice when working with larger clusters or multiple clusters.
$ kubectl create namespace offsite
namespace/offsite created
$ kubectl ns offsite
Context "kubernetes-admin@kubernetes" modified.
Active namespace is "offsite".
$ kubectl get pods
No resources found in offsite namespace.
Let’s create a new offsite/mariadb.secrets.yml manifest to make sure we have a working root password (we’re pretending it’s a new cluster and we can’t reuse our existing secrets):
apiVersion: v1
kind: Secret
metadata:
  name: mariadb-secrets
stringData:
  MARIADB_ROOT_PASSWORD: verysecret
Our offsite/mariadb.yml will look like this:
apiVersion: k8s.mariadb.com/v1alpha1
kind: MariaDB
metadata:
  name: wordpress-mariadb-offsite
spec:
  rootPasswordSecretKeyRef:
    name: mariadb-secrets
    key: MARIADB_ROOT_PASSWORD
  storage:
    size: 1Gi
    storageClassName: openebs-hostpath
  myCnf: |
    [mariadb]
    server_id=900
    log_bin=wordpress-mariadb-offsite
    read_only=1
  resources:
    requests:
      cpu: 100m
      memory: 128Mi
    limits:
      memory: 256Mi
Unlike our existing MariaDB cluster, we will not enable replication in this configuration. Instead, we’ll use the myCnf attribute to pass some initialization options to the new server:
- The server_id has to be unique in a replicated cluster. Given that our cluster’s server IDs increment with the number of active replicas in the StatefulSet, we need a number we’re unlikely to reach with our original cluster.
- The log_bin variable enables binary logging on the new replica. It’s not strictly a requirement, but it can be useful for recovering from a human error that has propagated to all replicas in a cluster.
- The read_only flag makes sure changes to our new replica are not possible, except by the root and replication users.
Let’s create the new database and import our dump.sql file:
$ kubectl apply \
-f offsite/mariadb.secrets.yml \
-f offsite/mariadb.yml
secret/mariadb-secrets created
mariadb.k8s.mariadb.com/wordpress-mariadb-offsite created
$ cat dump.sql | kubectl exec -i wordpress-mariadb-offsite-0 -- mysql -uroot -pverysecret
Now let’s get a MySQL shell on the new server, make sure our import succeeded, then configure replication:
$ kubectl exec -it wordpress-mariadb-offsite-0 -- mysql -uroot -pverysecret
MariaDB [(none)]> use wordpress;
MariaDB [wordpress]> show tables;
+-----------------------+
| Tables_in_wordpress |
+-----------------------+
| wp_commentmeta |
| wp_comments |
| wp_links |
# ...
We’ll use the CHANGE MASTER TO command to let MariaDB know where the primary server is and which credentials to use:
MariaDB [(none)]> CHANGE MASTER TO
MASTER_HOST='10.0.10.100',
MASTER_PORT=30606,
MASTER_USER='repl',
MASTER_PASSWORD='stillsecret';
Query OK, 0 rows affected (0.007 sec)
Finally, let’s set the replication GTID (from the comments in the first few lines of the dump.sql file) and start replication:
MariaDB [(none)]> SET GLOBAL gtid_slave_pos='0-10-8441';
Query OK, 0 rows affected (0.007 sec)
MariaDB [(none)]> START REPLICA;
Query OK, 0 rows affected (0.008 sec)
MariaDB [(none)]> SHOW REPLICA STATUS\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 10.0.10.100
Master_User: repl
Master_Port: 30606
Let’s switch back to our default namespace and verify that our primary server knows about the new replica:
$ kubectl ns default
Context "kubernetes-admin@kubernetes" modified.
Active namespace is "default".
$ kubectl exec -it wordpress-mariadb-0 -- \
mysql -uroot -pverysecret \
-e 'show replica hosts;'
+-----------+-------------+------+-----------+
| Server_id | Host | Port | Master_id |
+-----------+-------------+------+-----------+
| 900 | 10.10.0.66 | 3306 | 10 |
| 12 | 10.10.2.70 | 3306 | 10 |
| 11 | 10.10.3.103 | 3306 | 10 |
+-----------+-------------+------+-----------+
Creating a scheduled backup of the replica may also be a good idea, in addition to backing up its binary logs, of course.
Recovery Time & Point
We won’t go into too much depth around RTO and RPO in this guide, but having an off-site backup via replication, together with binary logs, will cover the most common scenarios and allow for point-in-time recovery, up to the exact SQL statement where things went wrong. The mysqlbinlog utility is a great place to start exploring that.
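As a sketch of what that exploration looks like: mysqlbinlog can replay the raw binlog backups from our CronJob up to a chosen moment, using its standard --stop-datetime option. The timestamp, file pattern, and credentials below are illustrative assumptions:

```shell
# Point-in-time recovery sketch: replay binlog events up to (but not
# including) the first event at or after STOP_AT. Guarded so it only runs
# where the tools and backup files actually exist.
STOP_AT="2024-07-24 10:31:00"
if command -v mysqlbinlog >/dev/null 2>&1 \
    && ls /pvc/binlogs/wordpress-bin.* >/dev/null 2>&1; then
  mysqlbinlog --stop-datetime="$STOP_AT" /pvc/binlogs/wordpress-bin.* \
    | mysql -uroot -p"$MARIADB_ROOT_PASSWORD" wordpress
fi
```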
Recovery time will highly depend on what level of automation you’re looking for. The MariaDB operator already provides quite an impressive recovery time of just a few seconds for switching the primary. Switching to an offsite replica would require some additional automation work.
Recap
In this section we’ve touched on disaster recovery for MariaDB and MySQL services. The MariaDB operator for Kubernetes covers most of the usual requirements with scheduled backups. We’ve also covered backing up binary logs using a custom CronJob in Kubernetes.
Finally, we created a new off-site MariaDB replica, which connects to our primary server and backs up all of our data in real time using MySQL replication.