Working with Persistent Volumes - WordPress on Kubernetes

You may have noticed that our current application lacks persistent storage. This means that any container crash or restart, pod restart, etc. will completely wipe our WordPress and MariaDB data, and have us start from scratch every time. Obviously this is undesirable.

Note: all code samples from this section are available on GitHub.

To solve this problem we’ll need persistent storage for both our WordPress application (themes, plugins, media uploads, etc.) and our database (post and page contents, configuration options and more).

Storage in Kubernetes is a large and complex topic, and we’ll touch on various interesting options along the way, but to keep things very simple for now, we’ll just use local volumes. A local persistent volume is essentially a directory (sometimes a disk or partition) on a Kubernetes node, that can be mounted into our containers. If you’ve worked with bind mounts in Docker before, this concept will be very familiar.

Node Affinity and Local Volumes

Local volumes exist on the physical disk attached to a specific node in our Kubernetes cluster. Only pods and containers running on that specific node will be able to use the volume. While this is not perfect (we’ll ultimately want our pods to be able to run on any node) it’s a good start for simple use cases, and we’ll explore more options in later sections.

In order to use local storage, we’ll first have to define a StorageClass in Kubernetes. This is a component that defines the type of storage, reclaim rules, provisioning and some other options and behaviours. We’ll manually provision our local storage on one of the nodes, so our StorageClass will be fairly simple. Let’s create a storage-class.yml file:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner

Nothing fancy here, but do note down the name we’ve assigned to our storage class, as we’ll have to use it later in our volumes and claims. Let’s now pick one of our nodes that will contain our local volumes for WordPress and MariaDB, and create the two directories there using SSH. We’re going to use the second node in our cluster with the hostname k1:

$ ssh k1
$ sudo mkdir -p /data/volumes/wordpress/www-data
$ sudo mkdir -p /data/volumes/wordpress/mariadb
$ exit

Next, let’s define these two volumes in a new `volumes.yml` file:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: www-data
spec:
  capacity:
    storage: 2Gi
  accessModes:
  - ReadWriteOnce
  storageClassName: local-storage
  local:
    path: /data/volumes/wordpress/www-data/
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - k1

We define a persistent volume called www-data here, with 2Gi (2 GiB) of storage, a ReadWriteOnce access mode (you’ll learn about access modes in later sections), using our defined local-storage class. The local.path attribute is set to the location of our www-data directory on our node. We also have a nodeAffinity section, which restricts this volume to the Kubernetes node whose kubernetes.io/hostname label matches k1, the node where we created our two directories.
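
Two fields we did not set fall back to their defaults. For a manually created PersistentVolume the reclaim policy defaults to Retain and the volume mode to Filesystem, so the sketch below should behave the same as the definition above, just with those defaults written out. It is for illustration only; the rest of this section keeps the shorter version.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: www-data
spec:
  capacity:
    storage: 2Gi
  accessModes:
  - ReadWriteOnce
  # Default for manually created volumes: keep the volume object and its
  # data after the claim that used it is deleted.
  persistentVolumeReclaimPolicy: Retain
  # Default: expose the volume as a mounted directory, not a raw block device.
  volumeMode: Filesystem
  storageClassName: local-storage
  local:
    path: /data/volumes/wordpress/www-data/
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - k1
```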

Let’s create our second volume for mariadb in the same YAML file, separating the two Kubernetes objects with --- on its own line:

# first volume definition here ...
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: mariadb
spec:
  capacity:
    storage: 2Gi
  accessModes:
  - ReadWriteOnce
  storageClassName: local-storage
  local:
    path: /data/volumes/wordpress/mariadb
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - k1

Quite similar to our first volume. Let’s create our StorageClass and PersistentVolumes in Kubernetes with kubectl:

$ kubectl apply -f storage-class.yml -f volumes.yml
storageclass.storage.k8s.io/local-storage created
persistentvolume/www-data created
persistentvolume/mariadb created
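
A side note on the StorageClass we just created: it uses the default binding mode. The example StorageClass for local volumes in the Kubernetes documentation also sets volumeBindingMode to WaitForFirstConsumer, which delays binding a claim until a pod that uses it is scheduled, so the scheduler can take the volume’s node constraints into account. A sketch of that variant, which we do not apply in this section:

```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
# Variant only: wait for a consuming pod before binding claims of this class.
volumeBindingMode: WaitForFirstConsumer
```

We keep the simpler definition from storage-class.yml for the rest of this section.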

Let’s look at claims next.

Persistent Volume Claims

Volumes alone are not enough to get things into our containers. We’ll need to define a couple of PersistentVolumeClaim objects, which are requests for storage in Kubernetes. These can be satisfied with available volumes, as in our case, or, in more common scenarios, a provisioner will dynamically create a new volume for such a request. We’ll look at some of these options in later sections.
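
To make the dynamic case a little more concrete, here is a rough sketch of what such a claim could look like. The claim name and the example-dynamic storage class are hypothetical; they stand in for whatever class a real provisioner in your cluster would expose.

```yaml
# Hypothetical claim, not used anywhere in this section.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-claim                  # hypothetical name
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: example-dynamic    # hypothetical class backed by a provisioner
  resources:
    requests:
      storage: 2Gi
  # No volumeName here: the provisioner is expected to create a matching
  # volume for this claim instead of us preparing one by hand.
```

Our local-storage class has no provisioner, so our own claims will target the two volumes we created by hand.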

Let’s create a volume-claims.yml file and define claims for both volumes:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: www-data
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: local-storage
  resources:
    requests:
      storage: 2Gi
  volumeName: www-data
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mariadb
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: local-storage
  resources:
    requests:
      storage: 2Gi
  volumeName: mariadb

We’re pretty much repeating what we already defined in our volumes. We’re also using the volumeName attribute here to make sure our claims are bound to the volumes with the correct names. Otherwise we might end up in a situation where our www-data claim is bound to the mariadb volume and vice versa.
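
Note that volumeName pins a claim to a volume, but it does not stop a different claim from binding to that volume first. Kubernetes also lets us reserve a volume from the other side, using the claimRef field on the PersistentVolume. A minimal sketch, assuming our claims live in the default namespace (an assumption, since we have not set a namespace anywhere in this section):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: www-data
spec:
  # ... same capacity, accessModes, storageClassName, local and
  # nodeAffinity settings as in volumes.yml ...
  claimRef:
    namespace: default   # assumption: claims are created in the default namespace
    name: www-data       # only the claim with this name may bind to this volume
```

With only two volumes and two claims under our control, volumeName alone is enough here, so we’ll leave volumes.yml as it is.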

Add the claims to your Kubernetes cluster with kubectl:

$ kubectl apply -f volume-claims.yml
persistentvolumeclaim/www-data created
persistentvolumeclaim/mariadb created

You can then check the status of the claims, to make sure they’re bound to the correct volumes:

$ kubectl get pvc
NAME       STATUS   VOLUME     CAPACITY   ACCESS MODES   STORAGECLASS    VOLUMEATTRIBUTESCLASS   AGE
mariadb    Bound    mariadb    2Gi        RWO            local-storage   <unset>                 30s
www-data   Bound    www-data   2Gi        RWO            local-storage   <unset>                 30s

Mounting Volumes into Containers

Now that our volumes and volume claims are ready, let’s update our pod.yml file to mount these volumes into the corresponding containers. First, let’s add a new volumes section on the same level as our containers section:

spec:
  containers:
  # ...

  volumes:
  - name: www-data
    persistentVolumeClaim:
      claimName: www-data
  - name: mariadb
    persistentVolumeClaim:
      claimName: mariadb

Here we tell Kubernetes which claims we would like to use in our Pod, and the names under which the containers can refer to the resulting volumes. Next, we’ll update the specifications of our `wordpress` and `mariadb` containers to mount these volumes. The `wordpress` specification will now look like this:

spec:
  containers:
  - name: wordpress
    image: wordpress:6.4-apache
    ports:
    - containerPort: 80
    volumeMounts:
    - name: www-data
      mountPath: /var/www/html
    env:
    # environment vars ...

The new volumeMounts section tells Kubernetes to mount the www-data volume (which is linked to the www-data volume claim) into the /var/www/html directory inside our container. This is where Apache will be looking for our PHP files. Similarly, we’ll add the volume mount for MariaDB:

  - name: mariadb
    image: mariadb:10.11
    ports:
    - containerPort: 3306
    volumeMounts:
    - name: mariadb
      mountPath: /var/lib/mysql
    env:
    # environment vars ...

Here we mount the mariadb volume (linked to the mariadb PVC) to /var/lib/mysql, which is where MariaDB stores all its data.

The full pod.yml file at this stage can be found [here for reference]. The service.yml can be used as is from our previous section. Let’s create our pod and service in Kubernetes:

$ kubectl apply -f pod.yml -f service.yml
pod/wordpress created
service/wordpress created

Now let’s check the status of our new pod to make sure that both containers are running:

$ kubectl get pods
NAME        READY   STATUS    RESTARTS   AGE
wordpress   2/2     Running   0          8s

Finally, point your web browser to the Kubernetes service and port by IP or hostname, run through the WordPress installer once more and feel free to add some plugins, themes and content to your WordPress website. All the data should now be persisted in our attached volumes. Let’s verify this by deleting our pod and creating it again:

$ kubectl delete pod wordpress
pod "wordpress" deleted

$ kubectl apply -f pod.yml
pod/wordpress created

Browse to the website once again, and verify that all the content is still there. We can also look at the contents of our two volumes via SSH directly on the Kubernetes node:

$ ssh k1
$ ls -l /data/volumes/wordpress/www-data/
total 248
-rw-r--r--  1 www-data www-data   405 Feb  6  2020 index.php
-rw-r--r--  1 www-data www-data 19915 Jun  9 23:12 license.txt
-rw-r--r--  1 www-data www-data  7401 Jun  9 23:12 readme.html
# ...

$ ls -l /data/volumes/wordpress/mariadb/
total 139460
-rw-rw---- 1 999 systemd-journal  16932864 Jun 10 10:06 aria_log.00000001
-rw-rw---- 1 999 systemd-journal        52 Jun 10 10:06 aria_log_control
-rw-rw---- 1 999 systemd-journal         9 Jun 10 10:07 ddl_recovery.log
-rw-rw---- 1 999 systemd-journal      1547 Jun 10 10:06 ib_buffer_pool
# ...

Recap & Cleanup

In our previous configuration all storage was ephemeral and lived only as long as the corresponding container. For data to persist and remain available across restarts and recreates, we need to write it to persistent volumes in Kubernetes. In this section we’ve created two local persistent volumes and mounted them into our WordPress and MariaDB containers.

This approach does have quite a few drawbacks. Since our volumes live on our k1 node in Kubernetes, we don’t have any options on where to schedule our Pods and containers: they have to run on that same node to make use of the storage. If the node happens to crash, we’re in trouble, losing our running services and potentially our WordPress and MariaDB data that’s on the disk attached to that node. We’ll address these and other problems in future sections where we expand on storage options in Kubernetes.

To delete your pod, service, volumes, claims and storage class, run:

$ kubectl delete -f pod.yml -f service.yml \
  -f volume-claims.yml -f volumes.yml \
  -f storage-class.yml

pod "wordpress" deleted
service "wordpress" deleted
persistentvolumeclaim "www-data" deleted
persistentvolumeclaim "mariadb" deleted
persistentvolume "www-data" deleted
persistentvolume "mariadb" deleted
storageclass.storage.k8s.io "local-storage" deleted

While the volumes are deleted from the Kubernetes cluster, the volume data is still persisted on our disk on the k1 node. We’ll have to delete that manually via SSH, similar to how we provisioned it:

$ ssh k1
$ cd /data/volumes/wordpress
$ sudo rm -rf www-data mariadb

All done here, see you in the next section!