WordPress on Kubernetes

The Definitive Guide to WordPress on k8s

A MySQL Replication Primer

Now that we can easily scale our WordPress pods in and out, let’s move on to the database. In this section we’re going to introduce some concepts around scaling MySQL and MariaDB databases: binary logging, replication basics, primaries and replicas, and more.

Why Scale the Database?

Running a single-pod database, as we did up until now, is a perfectly valid way to run WordPress in a Kubernetes cluster. Not every WordPress application needs a scalable database, and there are a few aspects to consider to find out if yours does.

There are two main reasons to scale a database.

High availability allows our application to continue functioning if part of our system fails. For example, if a Kubernetes node that happens to be running our MariaDB or MySQL database crashes, a highly available setup can continue serving our application from a different node, with little or no downtime.

Whether or not this is a requirement for your application depends on your business needs. Can you afford the risk of your application being offline during a disaster recovery process? In most cases the answer is yes: the actual impact (financial, reputational, etc.) of a few hours of downtime is quite minimal.

Performance is the second reason to scale a database. We covered vertical and horizontal scaling in an earlier section, and the same concepts apply here. Scaling vertically is much easier, cheaper and quicker to implement. Vertical scaling is also what actually makes database queries faster.

Scaling horizontally doesn’t make database queries faster, but it does allow your system to serve more queries overall. Query Monitor will tell you what database queries are run on every WordPress request as well as how long they took.

If you find queries are slow even with minimal load on the website, then you should prioritize vertical scaling (after query and/or schema optimization of course). If your queries are fairly quick, but do tend to get slower under load, then scaling out horizontally may be the best option.

Other reasons for scaling include: data sharding (though quite uncommon for WordPress applications), resiliency, backups via replication and disaster recovery.

So how exactly does a MySQL or MariaDB database scale horizontally?

Replication

Replication allows one database server to stay in sync with another database server, allowing the application to use both servers. The source database is often called the primary or master. The destination database is usually referred to as a replica or slave.

Scaling out involves adding new replicas to the cluster, thus increasing capacity. Scaling in removes one or more replicas from the cluster, decreasing the overall capacity.

This is the classic primary/replica replication model, although primary/primary replication has also skyrocketed in popularity in recent years. In this guide we’ll focus mostly on the primary/replica model, but feel free to check out writeset replication in Galera Cluster.

In MariaDB and MySQL, replication works through a feature called binary logging. This is essentially a journal of every write that happens in the database, used primarily for replication and data recovery. Each replica or slave reads the source’s binary log and replays every write operation on its own dataset, keeping it in sync.
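
To make this more concrete, here’s a minimal, illustrative my.cnf sketch for a primary and a replica. The option names are standard MariaDB/MySQL settings, but the server IDs and log basename are placeholder values, not a production-ready configuration:

```ini
# Primary (my.cnf) — every server in the cluster needs a unique server-id
[mysqld]
server-id     = 1
log_bin       = mysql-bin   ; enable binary logging
binlog_format = ROW         ; replicate row changes rather than statements

# Replica (my.cnf)
[mysqld]
server-id = 2
read_only = ON              ; reject writes from non-privileged users
```

The `read_only` flag on the replica is optional but a useful safety net, since a stray write to a replica would make its dataset diverge from the primary.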

For performance reasons this usually happens asynchronously, which means there may be a window during which data between two or more servers is inconsistent. This is called replication lag, and it’s something to be aware of, especially when building write-heavy applications (WooCommerce or any LMS, for example).
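
You can inspect replication lag directly on a replica. This is an illustrative query (the exact statement and field names vary by server version; older MySQL and MariaDB use SHOW SLAVE STATUS and Seconds_Behind_Master):

```sql
-- On a replica: check how far behind the source we are
SHOW REPLICA STATUS\G
-- Look at the Seconds_Behind_Source field:
-- 0 means the replica is caught up, NULL means replication is not running.
```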

Reads vs. Writes

In the traditional primary/replica configuration, only the primary database server can accept write queries. All other servers in the same cluster can respond to read queries only. This means that our queries must be split into read and write groups, and land on the appropriate servers.
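
The splitting logic itself is conceptually simple: look at the query verb and pick a server. Here’s a toy Python sketch of the idea (not a real database driver, and far simpler than what HyperDB or a proxy actually does): writes go to the primary, reads rotate round-robin across a replica pool.

```python
# Toy illustration of read/write query splitting — not a real driver.
import itertools

WRITE_VERBS = ("insert", "update", "delete", "replace", "create", "alter", "drop")

class QueryRouter:
    def __init__(self, primary, replicas):
        self.primary = primary
        # Cycle through replicas round-robin for read queries.
        self._replicas = itertools.cycle(replicas)

    def route(self, query: str) -> str:
        """Return the name of the server this query should be sent to."""
        verb = query.lstrip().split(None, 1)[0].lower()
        if verb in WRITE_VERBS:
            return self.primary
        return next(self._replicas)

router = QueryRouter("db-primary", ["db-replica-1", "db-replica-2"])
print(router.route("UPDATE wp_options SET option_value = 'x'"))  # db-primary
print(router.route("SELECT * FROM wp_posts"))                    # db-replica-1
```

Real implementations also have to handle transactions, session consistency (reading your own writes) and failed servers, which is exactly where the tools below come in.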

There are a few different ways to achieve this with MySQL and MariaDB. Some of them run as separate services, such as MaxScale and ProxySQL. These proxies allow your application to operate as if you had a single database server: the splitting, routing and failover are performed in the proxy itself.

For WordPress applications, however, it is more common to use HyperDB or LudicrousDB. These implement the splitting and routing logic in the WordPress database layer itself, including fail/retry logic, replication lag detection and more, generally giving you more control.
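
Both are configured through a db-config.php file in the WordPress root, registering each server with its role. Here’s an illustrative sketch based on HyperDB’s sample configuration; the hostnames are placeholders for your own services:

```php
<?php
// db-config.php (illustrative sketch — hostnames are placeholders)

// The primary accepts both reads and writes:
$wpdb->add_database( array(
	'host'     => 'db-primary',
	'user'     => DB_USER,
	'password' => DB_PASSWORD,
	'name'     => DB_NAME,
	'write'    => 1,
	'read'     => 1,
) );

// A replica is read-only; 'write' => 0 keeps all writes on the primary:
$wpdb->add_database( array(
	'host'     => 'db-replica',
	'user'     => DB_USER,
	'password' => DB_PASSWORD,
	'name'     => DB_NAME,
	'write'    => 0,
	'read'     => 1,
) );
```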

We’ll be configuring HyperDB in a later section, after getting a MariaDB replicated cluster up and running.

Primary/primary (master/master or dual-master) replication configurations are slightly less complex for the application layer, since the application can write to and read from any database server in the cluster, generally load-balanced through DNS. This, however, comes at the cost of additional complexity at the database level.

As mentioned earlier, this guide will mostly be focused on the traditional primary/replica scaling model. This is also the model used with most managed MySQL service providers in public cloud platforms.

Load Balancing

Unfortunately for the primary server, there’s not much load to balance in the traditional primary/replica model, since we’ll only have one primary node. Luckily, most WordPress applications are read-heavy, so load balancing is mostly a concern for the replica nodes.

On most cloud providers this is usually done at the DNS level, where a specific reader hostname is provided to the application, which resolves to the IP addresses of all replica servers in a round-robin fashion. This will distribute the load pretty evenly across all replica servers.

When running a proxy in front of the cluster, such as haproxy, ProxySQL, MaxScale, RDS Proxy, etc., the load balancing usually happens in the proxy service itself, and load balancing can happen based on the number of open connections to each replica, which is slightly more efficient.

HyperDB and LudicrousDB allow administrators to specify a full list of database servers and their roles. This lets the WordPress application itself balance the load between servers, including weighted balancing. However, whenever servers are added, removed, or change IPs or roles, the WordPress configuration has to be updated, which is not ideal, especially if we’re hoping to auto-scale the database service.

Failover

In the event of a replica failure, reads from the application are sent to a different replica, or to the primary server if no other replicas are available. Just like load balancing, this is usually done via DNS, but it can also be handled in the application layer itself via fail/retry logic and even replication lag detection.

In the event of a primary server failure, things might get quite tricky. One of the replica nodes should be promoted to be the new primary, and all other replicas need to update their configuration to read the binary log from the new server. Plus, a new replica server should be provisioned to restore the cluster capacity. Application, proxy and/or DNS configurations will also require updating.
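
At the database level, a manual promotion boils down to a handful of statements. This is a simplified sketch using MySQL 8 syntax (older MySQL and MariaDB use STOP SLAVE and CHANGE MASTER TO instead), with a placeholder hostname, and it glosses over making sure the chosen replica has fully caught up first:

```sql
-- On the replica being promoted: stop replicating and allow writes
STOP REPLICA;
RESET REPLICA ALL;            -- discard the old replication configuration
SET GLOBAL read_only = OFF;

-- On every remaining replica: point at the new primary
STOP REPLICA;
CHANGE REPLICATION SOURCE TO SOURCE_HOST = 'new-primary';  -- placeholder host
START REPLICA;
```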

Oftentimes this is easier said than done. However, managed database services on public cloud platforms can automate failover for you, and as you will see in the next section, the MariaDB operator for Kubernetes makes this surprisingly straightforward as well!

Recap

In this section we covered some theory behind scaling a MySQL or MariaDB database service. We mentioned the binary log and its role in replication, the traditional primary/replica model as well as the more complex primary/primary replication.

We chatted about splitting reads and writes, and how that could be handled in WordPress, or as a separate proxy service. We covered some load balancing and failover concepts.

In the next section we’ll look at a Kubernetes concept we haven’t really used so far in this guide – operators. We’ll then use the MariaDB operator to provision a database and look at ways to handle load balancing, failover and scaling.