WordPress on Kubernetes

The Definitive Guide to WordPress on k8s

HyperDB for WordPress

In the previous section we’ve configured a MariaDB cluster which has two endpoints – one for writing and one for reading. Out of the box, WordPress does not support working with multiple databases or splitting read/write queries; however, this is possible with a special plugin for WordPress called HyperDB.

Note: all code samples from this section are available on GitHub.

In this section we’ll introduce HyperDB into our WordPress application and configure it to speak to our primary database as well as all replicas. We’ll also briefly look at some other features included with HyperDB and how they can be used effectively in large-scale WordPress deployments.

HyperDB

HyperDB is a plugin (or drop-in rather) for WordPress, which implements a custom database class. This class is then used by all WordPress core components and abstractions (Queries, Users, Terms, Comments, etc.) as well as third-party themes and plugins that use the $wpdb object for custom database operations.

One of HyperDB’s main features is support for database replication. That means splitting writes and reads between primary and replica servers. In addition to that, HyperDB supports sharding/partitioning, failover and load balancing. We’ll touch on a couple of these, though most of them are useful only for very large WordPress deployments.

Installing HyperDB

HyperDB is not a traditional WordPress plugin. It can’t be downloaded, installed and updated through the WordPress dashboard or through the official plugin directory.

It’s a db.php drop-in: a PHP file that should be copied to (or linked from) the wp-content directory in WordPress, and which is loaded automatically very early in the application bootstrap process.

When loaded, this file looks for a db-config.php file right next to the wp-config.php file. That’s where your new, slightly more complicated database configuration is going to live. A sample db-config.php file is provided with some well-commented examples, showcasing some of the HyperDB features. We highly recommend reading this sample configuration to understand what HyperDB is all about.

In the previous sections we’ve been using a private GitHub repository with our WordPress plugin and theme files (S3 uploads, etc.). This repository is then deployed through an Init Container into our Kubernetes cluster.

Let’s create a new branch in that repository and add our db.php drop-in file to the wp-content directory:

$ cd /path/to/repository
$ git checkout -b hyperdb
$ wget https://raw.githubusercontent.com/Automattic/HyperDB/trunk/db.php -P wp-content

Next, let’s create a db-config.php file with our MariaDB service addresses:

<?php
$wpdb->add_database( [
	'host'      => 'wordpress-mariadb-primary',
	'user'      => DB_USER,
	'password'  => DB_PASSWORD,
	'name'      => DB_NAME,
	'write'     => 1,
	'read'      => 1000,
] );

$wpdb->add_database( [
	'host'     => 'wordpress-mariadb-secondary',
	'user'     => DB_USER,
	'password' => DB_PASSWORD,
	'name'     => DB_NAME,
	'write'    => 0,
	'read'     => 1,
] );

We’ll keep it simple for now: the first entry lets HyperDB know about our primary database endpoint, which accepts writes as well as reads with a priority of 1000 (more on that in a bit), and the second entry holds our secondary endpoint, which accepts only reads.

This db-config.php file should be placed in the root of our repository, right next to the wp-content directory, while the db.php drop-in file lives inside the wp-content directory. Commit and push everything to the Git repository before redeploying the application:

$ git add db-config.php wp-content/db.php
$ git commit -m "Add HyperDB and its configuration"
$ git push origin hyperdb

We’ll also need to adjust our wordpress.deployment.yml to reference the new branch in the initContainers configuration:

      initContainers:
      - name: init
        image: alpine/git
        command:
        - /bin/sh
        - -c
        args:
        - cd /var/www/html;
          git clone git@github.com:koddrio/k8s-sample-application.git .;
          git reset --hard $HASH;
          rm -rf .git
        env:
        - name: HASH
          value: origin/hyperdb

Finally, let’s ship the updated deployment manifest to the Kubernetes cluster, which will trigger an update on the running WordPress pods:

$ kubectl apply -f wordpress.deployment.yml
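
To watch the rollout complete before browsing around, kubectl can do that for us (assuming the Deployment is named wordpress, as in the previous sections):

$ kubectl rollout status deploy/wordpress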

After the new pods have been initialized, you should be able to browse to your WordPress application, which should hopefully be running off of HyperDB this time.

Is it working?

It might be tricky to tell whether HyperDB is actually working, or whether WordPress is simply reading off of the primary database. We came up with a little test mu-plugin to figure that out.

<?php
add_action( 'muplugins_loaded', function() {
	global $wpdb;

	// Ask HyperDB to keep a log of every query it runs.
	$wpdb->queries = [];
	$wpdb->save_queries = true;

	// A read query, which should only need a read connection.
	$wpdb->query( "SELECT * FROM {$wpdb->users} LIMIT 1;" );
	print_r( $wpdb->queries );
	print_r( $wpdb->dbhs );

	// A write query, which should open an additional connection to the primary.
	$wpdb->query( "UPDATE {$wpdb->users} SET user_login = 'foo' WHERE user_login = 'bar' LIMIT 1;" );
	print_r( $wpdb->queries );
	print_r( $wpdb->dbhs );

	die();
} );

Add this as hyperdb-test.php to a mu-plugins directory inside wp-content and re-deploy the WordPress application:

$ git add wp-content/mu-plugins/hyperdb-test.php
$ git commit -m "Is it working?"
$ git push origin hyperdb
$ kubectl rollout restart deploy/wordpress

This plugin hooks into the WordPress load order very early (right after mu-plugins finish loading), sets up HyperDB to save any queries it receives, then fires off a read query followed by a write query, printing some debug output after each.

The queries property will contain all the queries that HyperDB has run, so the first printout should show just the read query, while the second should reveal both the read and the write query.

The dbhs property holds the database handles, or connections. After the read query this should show a single connection to global__r, the read connection for the global dataset, while after the write query you should see two connections: one to global__r and an additional one to global__w for that write query.

Each connection will also tell you the host name of the database server (primary/secondary in our case), which may be useful if you’re trying to figure out whether a specific query is routed to the correct database server or Service. Of course, you should never print any of this on an actual production website; write it to a private log file instead if needed.
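
If you do need to keep this kind of debug output around, something along these lines (in place of the print_r() calls in the test plugin) will append it to a file instead; the path here is just an example and should point somewhere writable that isn’t publicly served:

// Append the collected queries and connection handles to a private log file.
error_log( print_r( $wpdb->queries, true ), 3, '/var/log/hyperdb-debug.log' );
error_log( print_r( $wpdb->dbhs, true ), 3, '/var/log/hyperdb-debug.log' );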

Failover with HyperDB

HyperDB has a built-in mechanism to retry queries on a different database server if one has failed.

Each operation type (read, write) is assigned a priority when adding servers to HyperDB, and servers with lower priority values are tried first. This is why we defined our primary server as capable of read operations, but with a priority of 1000, so that any available replicas go first.

This technique works really well when we have a defined pool of replica servers. In our case, however, we only have one endpoint for all replicas, which is load balanced within the Kubernetes Service. Unfortunately this means that even if we’re running a dozen replica servers, HyperDB will only attempt to connect to a replica once, and fail over to the primary if that connection fails.

To work around this we can define multiple copies of the same replica service, forcing HyperDB to retry it before giving up on replicas. Of course we could be unlucky and land on the same faulty pod, but hopefully our MariaDB operator will have removed that faulty pod from the Service by then.
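
Here’s a rough sketch of what that could look like in db-config.php, appended below the existing add_database() calls; the number of copies (three here) is arbitrary, and whether a retry actually lands on a different pod still depends on the Service’s load balancing:

// Register the secondary Service a few extra times so HyperDB has
// additional read entries to retry before falling back to the primary.
for ( $i = 0; $i < 3; $i++ ) {
	$wpdb->add_database( [
		'host'     => 'wordpress-mariadb-secondary',
		'user'     => DB_USER,
		'password' => DB_PASSWORD,
		'name'     => DB_NAME,
		'write'    => 0,
		'read'     => 1,
	] );
}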

Another option would be to list out individual pods, with a database entry in our HyperDB configuration for each one. The exception is the primary, where we’d still want to use the Service, since it may change automatically in a failover event. This approach, however, needs manual adjustments whenever the cluster administrator scales the MariaDB cluster in or out, so it would naturally be a poor fit for an auto-scaling configuration.
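
Purely as an illustration, that configuration could look something like the following; the pod host names are hypothetical placeholders and depend entirely on how the MariaDB operator names its pods and internal Services:

// Hypothetical per-pod entries for two replicas; adjust host names to match
// your cluster, and keep the primary pointing at its Service.
$replica_pods = [
	'wordpress-mariadb-1.wordpress-mariadb-internal',
	'wordpress-mariadb-2.wordpress-mariadb-internal',
];

foreach ( $replica_pods as $replica_pod ) {
	$wpdb->add_database( [
		'host'     => $replica_pod,
		'user'     => DB_USER,
		'password' => DB_PASSWORD,
		'name'     => DB_NAME,
		'write'    => 0,
		'read'     => 1,
	] );
}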

The third option is service discovery: performing DNS lookups just in time to figure out which database servers are running, along with their IP addresses and roles. We’ll test this type of service discovery configuration in a future section.
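
Just to sketch the idea ahead of that section: a just-in-time lookup in db-config.php could look roughly like this, assuming a headless Service (hypothetically named wordpress-mariadb-internal here) that resolves to the individual replica pod IPs. Note that A records alone won’t tell you the role of each server, which is part of what we’ll solve later:

// Resolve the (hypothetical) headless Service to pod IPs and register each
// one as a read-only entry. Roles still need to be determined separately.
$records = dns_get_record( 'wordpress-mariadb-internal', DNS_A );

foreach ( (array) $records as $record ) {
	if ( empty( $record['ip'] ) ) {
		continue;
	}

	$wpdb->add_database( [
		'host'     => $record['ip'],
		'user'     => DB_USER,
		'password' => DB_PASSWORD,
		'name'     => DB_NAME,
		'write'    => 0,
		'read'     => 1,
	] );
}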

Other HyperDB Features

The sample db-config.php contains some example implementations of replication lag detection with shared memory. This may be incredibly useful, especially in some public cloud environments, where networking is not exactly a priority and CPU shares are not guaranteed.

Replication lag detection allows us to mark specific replica servers as “lagged” and avoid querying them for some time, allowing them to catch up to their primary. This feature only works when the full list of replica servers is configured (or service discovery is in place), rather than just the secondary endpoint.

Another great feature in HyperDB is data sharding, or partitioning. This allows us to place different tables on different server clusters, while continuing to query these tables through WordPress’ $wpdb as if they lived in the same database (no cross-shard JOINs, of course).

The dataset property is responsible for this when configuring servers in HyperDB, together with a call to add_callback to register a function that returns the desired dataset based on the query. There’s an example of this configuration in the sample db-config.php file as well.
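
As a rough sketch, modeled on that sample: the entries and callback below route queries against a hypothetical wp_analytics table to a separate 'analytics' dataset, while everything else stays on the default 'global' dataset. The host name is a placeholder:

// Servers for the hypothetical 'analytics' dataset.
$wpdb->add_database( [
	'host'     => 'wordpress-analytics-mariadb',
	'user'     => DB_USER,
	'password' => DB_PASSWORD,
	'name'     => DB_NAME,
	'write'    => 1,
	'read'     => 1,
	'dataset'  => 'analytics',
] );

$wpdb->add_callback( 'my_dataset_callback' );

// Send queries that touch the wp_analytics table to the 'analytics' dataset.
function my_dataset_callback( $query, $wpdb ) {
	if ( 'wp_analytics' === $wpdb->table ) {
		return 'analytics';
	}
}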

HyperDB also supports persistent connections, which can be great for performance, though it’s worth noting that they may cause the WordPress application to hit various max-connection limits, since connections are no longer closed when a request finishes. This is especially true in some public cloud environments, where connection limits for managed database services (such as Amazon RDS/Aurora) tend to be very low by default.

Before turning this feature on, make sure it’s tested thoroughly with any HPA configurations at max capacity. Double that capacity if you’d like to account for blue/green deployments as well.
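
If you do decide to experiment with it, the sample db-config.php exposes persistent connections as a simple property on the HyperDB object; a minimal sketch, assuming the defaults from that sample file:

// Reuse database connections across requests (off by default). Make sure
// your database connection limits can absorb this before enabling it.
$wpdb->persistent = true;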

What’s next?

In this section we introduced the HyperDB drop-in for WordPress, which allows us to speak to multiple databases, split our read and write queries, and take advantage of some other great features.

In the next section we’ll look at some backup and restore techniques for our WordPress database, using some industry standard tools, as well as some built-in utilities from the MariaDB operator.