Storage in Kubernetes is a large and complex topic. We’ve chatted about the concept of persistent volumes in an earlier section, giving our WordPress and MariaDB database containers some persistent space for their data. However, in that section we had to go through the trouble of manually creating volumes on specific Kubernetes nodes, and were then limited to where we could run our pods.
Note: all code samples from this section are available on GitHub.
In this section we’ll address this problem with Dynamic Volume Provisioning. We’ll take a quick look at the OpenEBS project and how it can help us automatically provision local volumes. We’ll also cover replicated storage, allowing us to run our applications on any worker node in our Kubernetes cluster.
Dynamic Volume Provisioning
As a Kubernetes cluster user, you’ll almost never work with volumes directly, opting to use volume claims instead. Dynamic volume provisioning allows cluster users to automatically create volumes from a persistent volume claim, rather than having a cluster administrator manually create those volumes beforehand.
This is quite convenient, and the flexibility allows for different volume types, which are called Storage Classes. When operating on just local volumes this doesn’t matter much, as each volume is just some physical disk space on a particular node. However, in real-world clusters that type of storage is quite limited in use, and we’ll cover some more complex and interesting options later.
Think about cloud environments though, such as AWS or Google Cloud, where you have expensive fast storage, and cheap slow storage, read-optimized storage, multi-write storage, replicated storage and plenty of other options. The abstractions in Kubernetes are designed to work with all these options, and the main concept behind it is a Storage Class.
Provisioners
By itself, a storage class is just a declaration of some arbitrary type of storage. We’ve seen in a previous section a declaration of an arbitrary “local-storage” class which we then manually assigned to our volumes and volume claims.
A storage class is more useful when it comes along with a provisioner, which determines what happens when somebody claims a persistent volume of that class. The provisioner is defined in the StorageClass YAML manifest and is usually picked up by a Kubernetes CSI (container storage interface) plugin/driver.
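As a quick reminder, a storage class without a real provisioner looks roughly like the manual “local-storage” class from the earlier section (a minimal sketch; your exact manifest may have differed slightly):
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
The special kubernetes.io/no-provisioner value tells Kubernetes that volumes of this class are created manually, rather than dynamically.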
The unopinionated vanilla Kubernetes install doesn’t ship with any storage plugins, so there’s no provisioner you could use out of the box. However, you’ll find that cloud platforms like AWS, Azure, and others will often ship with existing provisioners or relatively straightforward ways to enable them, allowing users to claim cloud-specific storage, like Amazon’s EBS or EFS volumes.
For on-premise Kubernetes deployments you’ll find CSI drivers for things like NFS, GlusterFS, Ceph and other storage options. We’re going to explore Ceph in more detail in later sections of this guide, but to introduce you to provisioners, we’ll start with OpenEBS.
Installing OpenEBS
OpenEBS is a CNCF storage project that allows you to use existing storage devices or disk space in your Kubernetes cluster to dynamically provision local or replicated volumes.
Local volumes in OpenEBS are regular filesystem mounts, while replicated volumes are based on block storage attached over the network (NVMe over TCP in the case of the Mayastor engine we’ll use later), meaning it’s usually very fast and efficient, but does come with some limitations (we’ll look at shared storage in later sections).
Let’s first install a very minimal version of OpenEBS, one that only supports local volumes backed by a filesystem path on the host node. We’ll use Helm, a package manager for Kubernetes, which we typically run from the management node:
$ helm repo add openebs https://openebs.github.io/openebs
$ helm repo update
$ helm install openebs openebs/openebs \
--set engines.replicated.mayastor.enabled=false \
--set engines.local.lvm.enabled=false \
--set engines.local.zfs.enabled=false \
--namespace openebs \
--create-namespace
This tells Helm where to find the openebs repository, and then installs the openebs/openebs chart into the openebs namespace, which it creates.
By default, the current OpenEBS Helm chart will install support for multiple engines, but we’ll keep things simple for now and skip installing Mayastor, LVM and ZFS support. This will leave us with just the local HostPath provisioner:
$ kubectl -n openebs get pods
NAME READY STATUS RESTARTS AGE
openebs-localpv-provisioner-7cd9f85f8f-c479d 1/1 Running 0 11s
You might also notice that a new StorageClass is now available in our Kubernetes cluster:
$ kubectl get storageclass
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
openebs-hostpath openebs.io/local Delete WaitForFirstConsumer false 94s
Let’s rework our WordPress application manifests to work with this new storage class.
Provisioning in Action
We’ll continue building on our configuration from the previous section, where we had MariaDB and WordPress StatefulSets and an Nginx Deployment with three replicas. Instead of provisioning new volumes on a specific host, we’ll let our new provisioner handle that.
This means we’ll no longer need storage-class.yml and volumes.yml, but we do need to update the storage class in our volume-claims.yml file (feel free to remove the www-data claim and focus on just mariadb for now):
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mariadb
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: openebs-hostpath
  resources:
    requests:
      storage: 2Gi
Let’s now provision the volume claim and see what happens:
$ kubectl apply -f volume-claims.yml
persistentvolumeclaim/mariadb created
$ kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS VOLUMEATTRIBUTESCLASS AGE
mariadb Pending openebs-hostpath <unset> 12s
We now have a pending volume claim. Let’s launch our MariaDB StatefulSet, not forgetting to apply the ConfigMap and Service:
$ kubectl apply \
-f mariadb.configmap.yml \
-f mariadb.statefulset.yml \
-f mariadb.service.yml
configmap/mariadb created
statefulset.apps/mariadb created
service/mariadb created
If you look at your PVCs list again, you’ll see that the MariaDB claim is now bound to a volume:
$ kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS VOLUMEATTRIBUTESCLASS AGE
mariadb Bound pvc-3f85e3cb-5cae-4163-aaf8-1a3003c6ca58 2Gi RWO openebs-hostpath <unset> 2m58s
Furthermore, you will see the new volume appear in the volumes list:
$ kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS VOLUMEATTRIBUTESCLASS REASON AGE
pvc-3f85e3cb-5cae-4163-aaf8-1a3003c6ca58 2Gi RWO Delete Bound default/mariadb openebs-hostpath <unset> 6m26s
And our MariaDB pod is up and running:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
mariadb-0 1/1 Running 0 4m42s
As you would expect, deleting the MariaDB pod will simply cause the StatefulSet to create a new pod with the same configuration, with the same volume attached.
$ kubectl delete pod mariadb-0
pod "mariadb-0" deleted
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
mariadb-0 1/1 Running 0 6s
$ kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS VOLUMEATTRIBUTESCLASS REASON AGE
pvc-3f85e3cb-5cae-4163-aaf8-1a3003c6ca58 2Gi RWO Delete Bound default/mariadb openebs-hostpath <unset> 9m26s
Interestingly, deleting the StatefulSet will also retain the persistent volume, as long as the claim continues to exist. Recreating the StatefulSet will re-attach that volume to any new pod that requires it.
$ kubectl delete -f mariadb.statefulset.yml
statefulset.apps "mariadb" deleted
$ kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS VOLUMEATTRIBUTESCLASS AGE
mariadb Bound pvc-3f85e3cb-5cae-4163-aaf8-1a3003c6ca58 2Gi RWO openebs-hostpath <unset> 12m
However, removing the volume claim (along with the StatefulSet) will cause the OpenEBS provisioner to delete the volume per the reclaim policy:
$ kubectl delete -f volume-claims.yml
persistentvolumeclaim "mariadb" deleted
$ kubectl get pv
No resources found
The provisioner takes care of the lifecycle of our volumes, creating new ones when a new claim is made, and deleting (or retaining, depending on policy) when a claim is removed.
The provisioner also takes care of placement for us, so we don’t really have to think about which nodes to create our volumes on. However, once a volume is created, it cannot (easily) be moved.
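If you’re curious where the data actually lives, the HostPath provisioner keeps it under /var/openebs/local/ on the node by default (we’ll see this path again when describing a volume below), so on the hosting node you can peek at it directly:
$ ls /var/openebs/local/
# one directory per provisioned volume, named after the PV (pvc-...)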
Node Affinity
We touched on this topic in an earlier section. Any local persistent volume exists on one specific node, and can only be mounted to containers running on that node.
This behavior will cause Kubernetes to reschedule a crashed (or deleted) pod on the same node when it requires a volume from that node. If that’s not possible, the pod will remain in a Pending state indefinitely. We can test this by tainting a node, which allows us to prevent scheduling new pods there.
Let’s create our volume claims and MariaDB StatefulSet, and observe where exactly the pod has landed:
$ kubectl apply -f volume-claims.yml -f mariadb.statefulset.yml
persistentvolumeclaim/mariadb created
statefulset.apps/mariadb created
$ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
mariadb-0 1/1 Running 0 15s 10.10.2.136 k2 <none> <none>
In the above case, the pod has landed on the k2 node, which means the local storage is also provisioned on that node. We can verify this by describing our volume:
$ kubectl describe pv pvc-3c90db59-2879-41f2-8b9a-f5ee80e44a64
Name: pvc-3c90db59-2879-41f2-8b9a-f5ee80e44a64
Labels: openebs.io/cas-type=local-hostpath
Annotations: pv.kubernetes.io/provisioned-by: openebs.io/local
Finalizers: [kubernetes.io/pv-protection]
StorageClass: openebs-hostpath
Status: Bound
Claim: default/mariadb
Reclaim Policy: Delete
Access Modes: RWO
VolumeMode: Filesystem
Capacity: 2Gi
Node Affinity:
Required Terms:
Term 0: kubernetes.io/hostname in [k2]
Message:
Source:
Type: LocalVolume (a persistent volume backed by local storage on a node)
Path: /var/openebs/local/pvc-3c90db59-2879-41f2-8b9a-f5ee80e44a64
Events: <none>
See the Node Affinity section in the output above. No matter how many times we delete the pod, it will be recreated on the k2 node because of this affinity. Let’s taint the k2 node, preventing the scheduling of new pods:
$ kubectl taint nodes k2 please:NoSchedule
node/k2 tainted
$ kubectl delete pod mariadb-0
pod "mariadb-0" deleted
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
mariadb-0 0/1 Pending 0 12s
The only other available node in our cluster for this workload is k1, but the volume doesn’t exist there, so Kubernetes may keep this pod in a Pending state indefinitely. Describing the pod will yield more details:
$ kubectl describe pod mariadb-0
# (output omitted ...)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 62s default-scheduler 0/3 nodes are available: 1 node(s) had untolerated taint {please: }. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling.
Describing the Persistent Volume Claim will also let you know which node the volume is provisioned on, if any, via annotations:
$ kubectl describe pvc mariadb
# (output omitted...)
Annotations: pv.kubernetes.io/bind-completed: yes
pv.kubernetes.io/bound-by-controller: yes
volume.beta.kubernetes.io/storage-provisioner: openebs.io/local
volume.kubernetes.io/selected-node: k2
volume.kubernetes.io/storage-provisioner: openebs.io/local
Note that more often than not, creating a volume claim does not cause a volume to be created right away. Most provisioners will only create the volume (and thus decide which node to place it on) when a pod first tries to mount it, so affinity/node matching is usually done at that time for new volumes.
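This is the WaitForFirstConsumer binding mode we saw in the storage class listing earlier; you can confirm it with a quick JSONPath query:
$ kubectl get storageclass openebs-hostpath -o jsonpath='{.volumeBindingMode}'
WaitForFirstConsumer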
Before proceeding, don’t forget to untaint the previously tainted node so that we can schedule workloads there:
$ kubectl taint nodes k2 please:NoSchedule-
node/k2 untainted
The Usefulness of Local Volumes
While having a provisioner is certainly helpful, the fact that it’s still creating local volumes bound to specific nodes isn’t all that great. If we run the rest of our YAML manifests you’ll see that the three Nginx replica pods and the WordPress pod still all end up on the same node:
$ kubectl apply -f .
$ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
mariadb-0 1/1 Running 0 7s 10.10.1.23 k1 <none> <none>
nginx-76876c5747-2bllb 1/1 Running 0 7s 10.10.2.135 k2 <none> <none>
nginx-76876c5747-cgdft 1/1 Running 0 7s 10.10.2.12 k2 <none> <none>
nginx-76876c5747-hkkl6 1/1 Running 0 7s 10.10.2.225 k2 <none> <none>
wordpress-0 1/1 Running 0 6s 10.10.2.157 k2 <none> <none>
This is because they’re bound to the same volume, which has been provisioned on, and is only available on, the k2 node. If this node suffers a hardware failure, Kubernetes will not be able to reschedule the pods on a different node, as we’ve seen with the taint example earlier.
The concept of a local volume is mostly useful for stateful applications that already have some kind of resiliency built in. Furthermore, locally attached storage tends to be significantly faster than network-attached block storage, and especially network filesystems.
Many databases are good candidates for this, and MySQL or MariaDB are no exception. When scaling out a MySQL service, your cluster will typically consist of one primary (or master) server and one or more replica (or slave) servers. These servers never share filesystems or other storage with each other, and only use MySQL’s replication mechanisms to keep the data in sync across the cluster.
If one replica server dies along with its storage, spinning up a new replica with a brand new volume elsewhere will not be a problem. If the primary server dies, along with its storage, promoting one of the existing replicas to primary, then adding another replica with a new volume is also not that hard.
In most circumstances when running a MySQL or MariaDB cluster, we don’t really need redundant storage, so a locally attached volume is perfectly fine and, as a bonus, will probably outperform the majority of network-attached solutions too. We’ll explore MySQL replication and getting WordPress to work with multiple databases in a later section.
However, if running a single MySQL or MariaDB server, then relying on a local volume is not a great idea. This is where replicated volumes may come in handy, and OpenEBS has a great engine just for that.
Before diving into replicated storage, let’s remove all our resources in the default namespace, as we’ll need to recreate them later with new volume claims:
$ kubectl delete -f .
OpenEBS Replicated Storage
On the surface, replicated storage with OpenEBS is not that different from local storage: you create a claim, you get some disk space mounted into your pod, and your application is happy.
Underneath, however, it’s a whole different story based on a storage engine called Mayastor, which brings synchronous replication, redundancy and other great features.
Mayastor creates persistent volumes which can be attached from any node in a Kubernetes cluster, solving our single MySQL/MariaDB server problem. If a node dies, you’ll be able to spin up a new MySQL/MariaDB pod on a different node and use the existing OpenEBS/Mayastor volume, where all your data is (hopefully) intact.
Before we install OpenEBS with all the bells and whistles, there are a couple of requirements we need to take care of on each node that will run Mayastor. First, let’s enable hugepages support in Linux via the /etc/sysctl.conf file:
vm.nr_hugepages = 1024
Run sysctl -p after modifying the file to apply the changes, then restart the kubelet service:
$ sysctl -p
$ systemctl restart kubelet.service
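To confirm the hugepages were actually reserved, check /proc/meminfo; with the setting above it should look something like this:
$ grep HugePages_ /proc/meminfo
HugePages_Total:    1024
HugePages_Free:     1024
# (other lines omitted ...)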
Then we’ll need to enable the NVMe over TCP kernel module (nvme_tcp), which is a requirement for Mayastor. Most Linux distributions will have this available.
You can activate the module using modprobe:
$ modprobe nvme_tcp
$ lsmod | grep nvme_tcp
nvme_tcp 53248 0
nvme_keyring 20480 1 nvme_tcp
nvme_fabrics 36864 1 nvme_tcp
nvme_core 212992 2 nvme_tcp,nvme_fabrics
To persist across reboots you’ll need to add a modules-load.d entry (non-Debian distributions should have something similar):
$ echo nvme_tcp > /etc/modules-load.d/nvme_tcp.conf
Installing OpenEBS with Mayastor
We’ve already used Helm to install the minimal OpenEBS version earlier. Let’s uninstall that, then run the install again, this time with all the default options:
$ helm uninstall openebs -n openebs
$ helm install openebs openebs/openebs -n openebs
$ kubectl -n openebs get pods
Don’t be alarmed by the number of pods you see! It’ll be around 30 for a three-node cluster with one node tainted for the control plane. It may take a few minutes for all the pods to reach a Running state.
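If you’d rather not refresh the pod list by hand, kubectl can wait for everything to become ready (adjust the timeout to taste):
$ kubectl -n openebs wait pods --all --for=condition=Ready --timeout=300s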
Next, we’ll need to set up some storage space for Mayastor. In a production environment you’d usually use separate disk drives for this, but for testing purposes we can create a file (that looks like a disk) on our existing nodes’ filesystems. Let’s do that on all three nodes:
$ mkdir -p /var/local/openebs/pools/
$ truncate -s 10G /var/local/openebs/pools/disk.img
As you might have guessed, this will create a 10 GB disk image on each node, giving us a total of 30 GB in our pool of three nodes. Next, we’ll need to make sure this new directory we created is mounted into the OpenEBS IO engine pods. To do this, we’ll have to modify the openebs-io-engine DaemonSet.
A DaemonSet is just another abstraction over Pods in Kubernetes. We already covered Deployments and StatefulSets earlier. DaemonSets are very similar, and ensure certain Pods are running across all (necessary) nodes in a cluster.
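You can list the DaemonSets the OpenEBS chart installed to see this for yourself (note that openebs-io-engine will report 0 desired pods until we label nodes for Mayastor a bit later):
$ kubectl -n openebs get daemonsets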
We can use the kubectl edit command to edit this component in our cluster:
$ kubectl -n openebs edit DaemonSet/openebs-io-engine
This will open up your text editor with the YAML of the DaemonSet. Note that saving the file and exiting the editor at this point will send the updated definition to the Kubernetes API server, replacing the previous definition. It’s quite a dangerous way to change things in your cluster, but it’s great for learning purposes.
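If you’d prefer a safer, reviewable workflow over live-editing, you can dump the definition to a file, make your changes there, and apply it back:
$ kubectl -n openebs get daemonset openebs-io-engine -o yaml > io-engine.yml
# edit io-engine.yml in your editor, then:
$ kubectl apply -f io-engine.yml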
In this openebs-io-engine DaemonSet definition you’ll find some existing volume definitions and mounts in the volumeMounts and volumes sections respectively.
Let’s add our new directory to the volumeMounts section:
volumeMounts:
  - mountPath: /dev
    name: device
  - mountPath: /run/udev
    name: udev
  - mountPath: /dev/shm
    name: dshm
  - mountPath: /var/local/openebs/io-engine/
    name: configlocation
  - mountPath: /dev/hugepages
    name: hugepage
  - mountPath: /var/local/openebs/pools/
    name: pools
Then define the pools volume in the volumes section:
volumes:
  - hostPath:
      path: /dev
      type: Directory
    name: device
  - hostPath:
      path: /run/udev
      type: Directory
    name: udev
  - emptyDir:
      medium: Memory
      sizeLimit: 1Gi
    name: dshm
  - emptyDir:
      medium: HugePages
    name: hugepage
  - hostPath:
      path: /var/local/openebs/io-engine/
      type: DirectoryOrCreate
    name: configlocation
  - hostPath:
      path: /var/local/openebs/pools/
      type: DirectoryOrCreate
    name: pools
You might notice that the /dev directory is already mounted in this DaemonSet, which means that if you’re planning to use an actual disk (and not an image like in our example) then this step will not be necessary.
Next, we’ll need to tell OpenEBS which nodes are okay to run Mayastor on. This is done with the openebs.io/engine: mayastor label:
$ kubectl label nodes {k0,k1,k2} openebs.io/engine=mayastor
node/k0 labeled
node/k1 labeled
node/k2 labeled
You’ll notice a new openebs-io-engine-* pod start in your cluster for every labelled node. These are the pods managed by the DaemonSet we edited earlier.
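You can watch the rollout and confirm there’s one io-engine pod per labelled node:
$ kubectl -n openebs rollout status daemonset/openebs-io-engine
$ kubectl -n openebs get pods -o wide | grep io-engine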
Finally, we’ll need to define some DiskPool resources to reference our fake (or maybe real, in your case) disks. This is a custom resource defined and used by OpenEBS. Let’s create a new diskpool.yml file and define our three pools. Since this pool may be shared between different applications and namespaces, it’s best to create this in a different directory (we chose cluster) so that it’s not deleted accidentally:
apiVersion: "openebs.io/v1beta2"
kind: DiskPool
metadata:
name: pool-k0
namespace: openebs
spec:
node: k0
disks: ["aio:///var/local/openebs/pools/disk.img"]
---
apiVersion: "openebs.io/v1beta2"
kind: DiskPool
metadata:
name: pool-k1
namespace: openebs
spec:
node: k1
disks: ["aio:///var/local/openebs/pools/disk.img"]
---
apiVersion: "openebs.io/v1beta2"
kind: DiskPool
metadata:
name: pool-k2
namespace: openebs
spec:
node: k2
disks: ["aio:///var/local/openebs/pools/disk.img"]
We’re defining three DiskPool objects in the openebs namespace, named pool-k0, pool-k1 and pool-k2. Each has a spec referencing a specific node and a disks array. You’d use the path to a physical disk device if you’re using one; in our example we’re using disk emulation (among other options) via the aio:// URI.
Let’s apply our diskpool.yml file and look at the results:
$ kubectl apply -f cluster/diskpool.yml
$ kubectl -n openebs get diskpool
NAME NODE STATE POOL_STATUS CAPACITY USED AVAILABLE
pool-k0 k0 Created Online 10724835328 0 10724835328
pool-k1 k1 Created Online 10724835328 0 10724835328
pool-k2 k2 Created Online 10724835328 0 10724835328
If everything worked flawlessly, you should see the pools with a Created state on each node. We can also see the capacity and usage of each pool. If your pools are stuck in a Pending or Error state, you can use kubectl describe to obtain more information and start debugging.
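For example, to dig into the pool on the first node:
$ kubectl -n openebs describe diskpool pool-k0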
Mounting Replicated Volumes
The default installation of OpenEBS ships with a StorageClass called openebs-single-replica. While this does use the Mayastor backend, the resulting volume will still only exist on a single node. To have a truly replicated volume, we’ll need to define a new StorageClass with our desired replica count.
Let’s create a storage-class.yml file for our replicated volumes (also kept separately from our application manifests):
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: openebs-replicated
parameters:
  protocol: nvmf
  repl: "3"
provisioner: io.openebs.csi-mayastor
reclaimPolicy: Delete
volumeBindingMode: Immediate
allowVolumeExpansion: true
The new StorageClass is named openebs-replicated. The nvmf protocol tells Mayastor to use NVMe over TCP, and the repl parameter tells Mayastor how many replicas we want. The provisioner attribute makes sure Mayastor handles all claims using this storage class. The rest of the attributes define some provisioning and management behavior; you can find more details about these in the StorageClass documentation.
Apply the storage-class.yml manifest:
$ kubectl apply -f cluster/storage-class.yml
storageclass.storage.k8s.io/openebs-replicated created
Next, let’s update our application’s volume-claims.yml file so the MariaDB claim uses this new storage class:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mariadb
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: openebs-replicated
  resources:
    requests:
      storage: 2Gi
Apply the manifest and list the claims:
$ kubectl apply -f volume-claims.yml
persistentvolumeclaim/mariadb created
$ kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS VOLUMEATTRIBUTESCLASS AGE
mariadb Bound pvc-343545ca-fcce-442c-bce4-184271f14d0b 2Gi RWO openebs-replicated <unset> 2s
This time around the volume is created immediately and bound to the claim, without having to wait for a pod to try and use it. This behavior is defined by the volumeBindingMode in the StorageClass.
Note that we’re requesting 2Gi of storage with this claim. Let’s look at how that affected our DiskPools:
$ kubectl -n openebs get diskpool
NAME NODE STATE POOL_STATUS CAPACITY USED AVAILABLE
pool-k0 k0 Created Online 10724835328 2147483648 8577351680
pool-k1 k1 Created Online 10724835328 2147483648 8577351680
pool-k2 k2 Created Online 10724835328 2147483648 8577351680
As you can see, each pool’s available space has shrunk by exactly 2GiB (2147483648 bytes), confirming that the space is allocated on all three pools, as expected with our replication count set to 3.
Let’s run some pods with these volumes, shall we?
MySQL/MariaDB
As mentioned earlier, a single-server MariaDB or MySQL database would greatly benefit from a replicated block store: if the underlying Kubernetes node crashes, we can (oftentimes) resume our pods on a different node. Let’s test some of that.
Unaltered from our previous examples we’ll need a MariaDB ConfigMap, Service and StatefulSet:
$ kubectl apply \
-f mariadb.configmap.yml \
-f mariadb.statefulset.yml \
-f mariadb.service.yml
configmap/mariadb created
statefulset.apps/mariadb created
service/mariadb created
Let’s take a look at which node the MariaDB pod was assigned to:
$ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
mariadb-0 1/1 Running 0 10s 10.10.1.217 k1 <none> <none>
In our example it’s node k1. Let’s add some data to the database on this node (you’ll find the credentials in the ConfigMap):
$ kubectl exec -it mariadb-0 -- mysql -uwordpress -psecret wordpress
> create table foo (bar text);
> insert into foo values ('foo'), ('bar'), ('baz');
Query OK, 3 rows affected (0.002 sec)
Records: 3 Duplicates: 0 Warnings: 0
Let’s now exit the MariaDB shell and drain the node:
$ kubectl drain k1 --ignore-daemonsets --delete-emptydir-data
# (omitted output ...)
node/k1 cordoned
node/k1 drained
$ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
mariadb-0 1/1 Running 0 93s 10.10.2.49 k2 <none> <none>
Looks like MariaDB is happily chilling on the k2 node now. Let’s make sure the data is still there:
$ kubectl exec -it mariadb-0 -- mysql -uwordpress -psecret wordpress
> select * from foo;
+------+
| bar |
+------+
| foo |
| bar |
| baz |
+------+
3 rows in set (0.007 sec)
If you want to be able to schedule things on the drained node again, you’ll need to uncordon it with kubectl:
$ kubectl uncordon k1
node/k1 uncordoned
In the example above we drained the node using kubectl
. This tells the node to evict all the pods from it, and does so in a graceful manner. This means that the MariaDB service will have some time to properly shutdown and write its data to disk before terminating. This is what usually happens with scheduled maintenance, when you need to replace nodes or disks, or shuffle things around.
However, it is not uncommon for a node to crash, lose connectivity, or suffer a hardware failure beyond recovery. These cases need a lot more involvement: replacing the faulty nodes, and possibly recovering (or permanently losing) unwritten data.
MySQL and MariaDB have some built-in crash recovery mechanisms that usually run on startup, but there’s never a guarantee, so even with this storage redundancy in place, you should continue doing proper database backups. We’ll cover more backup and disaster recovery topics in later sections in this guide.
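A minimal sketch of such a backup, piping mysqldump through kubectl exec (credentials from our ConfigMap; the wordpress-backup.sql destination is just an example name):
$ kubectl exec mariadb-0 -- mysqldump -uwordpress -psecret wordpress > wordpress-backup.sql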
Recap & Cleanup
In this section we covered storage provisioners in Kubernetes. We installed OpenEBS and looked at automatically provisioning local volumes using the HostPath provisioner.
We then looked at the OpenEBS Mayastor storage backend and configured our Kubernetes cluster with some DiskPools and a 3-replica StorageClass. We provisioned some replicated volumes using that new class and ran a MariaDB pod with the provisioned replicated volume attached. Finally we looked at how to move the MariaDB pod to a different node, while still having access to the replicated volume.
Feel free to nuke the MariaDB StatefulSet, Service and ConfigMap, along with any volume claims, before proceeding:
$ kubectl delete \
-f mariadb.configmap.yml \
-f mariadb.statefulset.yml \
-f mariadb.service.yml \
-f volume-claims.yml
Head over to the next section where we’ll look at different storage access modes, and why such block storage is still a bit problematic for a scalable WordPress application in a Kubernetes cluster.