Externalize SQL Server Container State using Persistent Volume Claim

Background

This is the seventh part of the series on building highly scalable multi-container apps using AKS. So far in the series, we have covered the following topics:

This post is about externalizing the state associated with the SQL Server 2017 Linux container to the external volume which we provisioned in the previous post. We will be focusing on the following topics during this post:

  • Create a StatefulSet Kubernetes object which will be responsible for creating the SQL Server 2017 Linux container
  • Externalize the state by means of volume mapping to utilize the storage which was provisioned in the previous post
  • Query and update data from SQL Server 2017 Linux container
  • Verify data is persisted if the container restarts
  • Verify the volume mount using the Kubernetes CLI

Let's get the ball rolling.

Create a StatefulSet Kubernetes object

Kubernetes provides an object of type StatefulSet, which is often used when deploying database workloads that store their state on external storage providers. StatefulSets are used when data loss is unacceptable. They provide stable volume mounts that survive pod restarts, and even a restart of the cluster.

Let's see how we can create a StatefulSet. We will use the manifest file shown below to declaratively set the different properties.
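The complete manifest is available in the GitHub repo; the trimmed sketch below shows its overall shape. Treat the object name sql2017, the image tag and the environment variables as my assumptions based on what this series uses, with a placeholder SA password:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: sql2017
  namespace: abc2018sg
spec:
  serviceName: sql2017          # links the StatefulSet to its service
  replicas: 1                   # desired state: a single SQL Server pod
  selector:
    matchLabels:
      app: sql2017
  template:
    metadata:
      labels:
        app: sql2017
    spec:
      containers:
        - name: sql2017
          image: microsoft/mssql-server-linux:2017-latest  # assumed image tag
          ports:
            - containerPort: 1433
          env:
            - name: ACCEPT_EULA
              value: "Y"
            - name: SA_PASSWORD            # in practice, source this from a Secret
              value: "<placeholder-password>"
          # the volumeMounts and volumes sections follow further down the file
```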

We specify the kind as StatefulSet. Compared to the Deployment manifest, there is one more difference here: for a StatefulSet, we need to specify the serviceName attribute. The rest of the attributes, like metadata and spec, are similar to the other objects that we have created in the series.

Externalize the state by means of volume mapping

Before we can deploy the StatefulSet to the Kubernetes cluster using kubectl, we need to make a couple of other changes to ensure that the volume is mapped correctly. In the same YAML file, scroll towards the bottom and pay attention to the volumeMounts and volumes sections.

Let's first look at the volumeMounts section. Here we specify the name of the volume which is mounted, mssqldb. The mountPath specifies which path inside the container is mapped to the volume named mssqldb. The SQL Server 2017 Linux container stores its data files inside the /var/opt/mssql directory, so we have mapped this path inside the container to an external volume.

Next is the link between the volume and the physical storage which is persisted to the Azure Disk. This association is made in the volumes section. We specify exactly the same name which was assigned to the PVC in the previous post, techtalks-data.
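Pieced together from the description above, those two sections of the manifest look roughly like this (the indentation lines up with the container spec in the earlier sketch):

```yaml
          volumeMounts:
            - name: mssqldb
              mountPath: /var/opt/mssql     # SQL Server data files live here
      volumes:
        - name: mssqldb
          persistentVolumeClaim:
            claimName: techtalks-data       # exact name of the PVC from the previous post
```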

The beauty of this is that the container does not know what the underlying storage provider is. In the future we can swap the Azure Disk for some other provider, and the container does not have to worry about it. As long as the PVC name matches the claimName, the container will continue to run.

We also expose the service via a load balancer. The other manifest files used during this series can be found in the GitHub repo. Navigate to the TechTalksDB folder and run the kubectl apply command to create the objects in the Kubernetes cluster. Below is the list of commands I executed to deploy the TechTalksDB, TechTalksAPI and TechTalksWeb containers.
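The exact filenames live in the repo, so treat the names below as illustrative; the pattern is simply one kubectl apply per manifest:

```bash
# Apply each manifest in the folder (filenames are illustrative)
kubectl apply -f techtalksdb-statefulset.yaml --namespace abc2018sg
kubectl apply -f techtalksapi-deployment.yaml --namespace abc2018sg
kubectl apply -f techtalksweb-deployment.yaml --namespace abc2018sg

# Verify that the pods come up
kubectl get pods --namespace abc2018sg
```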

[Screenshot: kubectl part 1]

[Screenshot: kubectl commands]

Query and update data from SQL Server container

Let's query the data using SQL Operations Studio. The initialization script contains 3 records. I added one more before taking the screenshot, so at this point in time we have 4 records in the TechTalk table.

[Screenshot: database initial state]

Let's add a new record and query it back.
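For reference, the round trip amounts to something like the following T-SQL; the database name and column names are my assumptions, so adjust them to the actual schema:

```sql
-- Add a record and read the table back (database and column names are assumptions)
USE TechTalksDB;

INSERT INTO TechTalk (Title, Category)
VALUES ('Scaling multi-container apps with AKS', 'Containers');

SELECT * FROM TechTalk;
```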

[Screenshot: database after adding new record]

Verify data is persisted if the container restarts

This is where things start to get interesting. Let's delete the pod which is running the SQL Server 2017 container. We can delete the pod in multiple ways. Earlier in the series I demonstrated how to interact with Kubernetes objects using the kubectl CLI. To keep things simple, let's use the Kubernetes dashboard to delete the pod using the GUI option.
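If you prefer the kubectl route instead, something along these lines does the same job; StatefulSet pods get ordinal suffixes, so the pod name would be sql2017-0 assuming the StatefulSet is named sql2017:

```bash
# List the pods to confirm the exact name, then delete the SQL Server pod
kubectl get pods --namespace abc2018sg
kubectl delete pod sql2017-0 --namespace abc2018sg
```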

[Screenshot: delete stateful pod]

Click on the Delete option as shown above. We will be asked for confirmation; go ahead and confirm the deletion.

[Screenshot: delete pod]

As soon as the pod is deleted, the StatefulSet controller starts another instance of the pod. Any guess why? Go back to the manifest file and check the number of replicas. We have specified the number of replicas as 1. This is the desired state that we asked Kubernetes to maintain. As soon as the replica count dropped to zero, the controller initiated the creation of a new pod to bring the actual state back to the desired state.

For a few seconds, while the old pod is terminating and the new one is initializing, we will see some errors.

[Screenshot: StatefulSet error]

Give it a few seconds to stabilize, and within a minute we can see that things are back to normal with the new pod running successfully.

[Screenshot: StatefulSet recreated]

The pod has been restarted, but does that mean we did not lose the data which was persisted? When I queried the table again using SQL Operations Studio, I got the results with all 5 rows intact. To save some space, I will not put the same screenshot here again. You can give it a try yourself as homework.

We can inspect the logs of the newly created pod to see what happened in the background.
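Assuming the same sql2017-0 pod name as above, fetching the logs is a one-liner:

```bash
# Show the startup logs of the recreated SQL Server pod
kubectl logs sql2017-0 --namespace abc2018sg
```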

[Screenshot: SQL Server container logs]

Verify the volume mount using the Kubernetes CLI

We can also look at the different events that took place at the cluster level. Run the command kubectl get events --namespace abc2018sg and we can see the list of events. Check the sequence of events in the screenshot, which shows what happened after the pod was killed and the new one was created.

[Screenshot: volume mount events]

I have highlighted 3 main areas above. The first is the deletion of the sql2017 pod. The second is the recreation of the stateful container. And the third is the volume mount, which is the real game changer: if the volume were not there, we would not be able to recover the data.
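Beyond the events list, a couple of other commands help confirm the mount; once again, the pod name assumes a StatefulSet named sql2017:

```bash
# Check that the claim is still bound to the same underlying disk
kubectl get pvc techtalks-data --namespace abc2018sg

# Describe the pod and look for /var/opt/mssql under the Mounts section,
# plus the volume attach events towards the end of the output
kubectl describe pod sql2017-0 --namespace abc2018sg
```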

Conclusion

During the course of this post we saw how StatefulSets can mount external volumes using persistent volume claims. Data is persisted across container restarts. StatefulSets provide a stable identity for containers which need persistent storage, as in the case of a database container. By externalizing the data from the container into persistent volumes, we ensure that even if the cluster is restarted, we can still mount the volumes back to the container and carry on from where we left off.

All that I demonstrated here is also covered as part of the AKS learning series. The video recording of the series is available on YouTube. Thanks to Engineers.SG, who are helping to record this series of talks.

[Video: AKS Part 4 on YouTube]

As always, the code used during this post is available on GitHub.

Until next time, Code with Passion and Strive for Excellence.
