Deployment considerations for a stateful application in a Kubernetes cluster

Introduction

Since its first release in June 2014, Kubernetes, a CNCF project, has become the de facto standard for container orchestration in roughly six years. Almost all major technology vendors, including AWS, Azure, GCP, IBM, Red Hat, and VMware (Project Pacific), now support it. It would not be wrong to say that Kubernetes is one of the fastest-growing projects in the history of open source software.

Stateless vs Stateful

Initially, Kubernetes was considered primarily a platform for running stateless applications, where the application is not required to hold any data. A stateless server processes each request based only on the information relayed with that request and doesn’t rely on information from earlier requests. Stateful services such as databases and analytics engines, on the other hand, process each request using both the information relayed with it and information stored from earlier requests. Such services were typically run either in virtual machines or as managed services from a cloud provider.
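
The difference can be illustrated with a toy sketch. The function and class names below are hypothetical and stand in for a real request handler; the point is only that the stateful variant keeps data between calls, and that data is what needs a durable home once the process runs in a container.

```python
# Stateless: the response depends only on the request itself.
def stateless_total(request):
    return sum(request["items"])


# Stateful: the response also depends on data kept from earlier requests.
class StatefulCart:
    def __init__(self):
        # This list is the "state". If the process restarts, it is lost
        # unless it is written to persistent storage.
        self.items = []

    def add(self, request):
        self.items.extend(request["items"])
        return sum(self.items)


cart = StatefulCart()
first = cart.add({"items": [5]})   # total after the first request
second = cart.add({"items": [7]})  # total now includes the earlier request
print(stateless_total({"items": [5, 7]}), first, second)
```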

In this article, I will focus on the key points you need to consider before deploying a stateful application. As we have seen, stateful applications require information to be stored. In a Kubernetes cluster, there are multiple approaches to storing that data. This article covers two:

Using Shared storage for the Kubernetes cluster
Using Kubernetes StatefulSets

Let’s discuss the two approaches.

Stateful applications using a shared file system

By design, Docker containers are ephemeral and need persistent disk storage, i.e. persistent volumes, to keep data beyond the life of a container. A persistent volume can be created either manually or dynamically. With manual (static) provisioning, a cluster administrator creates the persistent volume before the application is provisioned. With dynamic provisioning, the cluster automatically provisions storage in response to the persistent volume claims it receives and binds each resulting persistent volume to the claim that requested it; the pod then uses the volume through that claim. In Kubernetes, dynamic provisioning is configured through a StorageClass.
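
As a minimal sketch of dynamic provisioning, the Python snippet below builds a StorageClass and a persistent volume claim that refers to it, and prints them as JSON, which kubectl accepts alongside YAML. The names fast-storage and app-data, the 5Gi size, and especially the provisioner string are placeholders chosen for illustration; a real cluster would use the provisioner name of its installed storage driver.

```python
import json

# StorageClass: tells the cluster which provisioner creates volumes on demand.
storage_class = {
    "apiVersion": "storage.k8s.io/v1",
    "kind": "StorageClass",
    "metadata": {"name": "fast-storage"},  # hypothetical name
    # Placeholder: replace with the provisioner of your storage driver.
    "provisioner": "example.com/hypothetical-provisioner",
    "reclaimPolicy": "Delete",
    "volumeBindingMode": "WaitForFirstConsumer",
}

# PersistentVolumeClaim: requests storage from that class. The cluster
# provisions a volume and binds it to this claim.
claim = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "app-data"},  # hypothetical name
    "spec": {
        "storageClassName": storage_class["metadata"]["name"],
        "accessModes": ["ReadWriteOnce"],
        "resources": {"requests": {"storage": "5Gi"}},
    },
}

# Wrap both objects in a List so they can be applied as one document.
manifest = {"apiVersion": "v1", "kind": "List", "items": [storage_class, claim]}
print(json.dumps(manifest, indent=2))
```

A pod would then reference the claim by name in its volumes section; no persistent volume has to be created by hand.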

You can create a persistent volume in either of two ways:

Directly creating the persistent volume on shared storage. Most shared storage options these days (for example Samba, NFS, iSCSI, Amazon EFS, Azure Files, and Google Cloud Filestore) have volume plugins or CSI (Container Storage Interface) drivers that let cluster admins provision persistent volumes directly on the shared storage.
Mounting the shared storage on the Kubernetes nodes and creating the persistent volume on the mounted path. Once the storage is mounted on the nodes, the persistent volume can point to the host directory through hostPath or a local persistent volume.
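
The two options above can be sketched as statically provisioned persistent volumes. The helper function, the volume names, the NFS server address, and the paths are all hypothetical; only the manifest field names come from the Kubernetes PersistentVolume API.

```python
import json


def persistent_volume(name, size, source, access_modes=("ReadWriteOnce",)):
    """Build a PersistentVolume manifest; `source` holds the volume backend."""
    spec = {
        "capacity": {"storage": size},
        "accessModes": list(access_modes),
        "persistentVolumeReclaimPolicy": "Retain",
    }
    spec.update(source)
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": name},
        "spec": spec,
    }


# Option 1: point the volume directly at the shared storage (NFS here).
# The server and export path are placeholders.
nfs_pv = persistent_volume(
    "shared-nfs",
    "20Gi",
    {"nfs": {"server": "nfs.example.internal", "path": "/exports/app"}},
    access_modes=("ReadWriteMany",),
)

# Option 2: the shared storage is already mounted on the nodes, and the
# volume points at the mounted host directory (placeholder path).
host_pv = persistent_volume(
    "shared-hostpath",
    "20Gi",
    {"hostPath": {"path": "/mnt/shared/app"}},
)

for pv in (nfs_pv, host_pv):
    print(json.dumps(pv, indent=2))
```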

Stateful applications using the Kubernetes StatefulSet controller

With a shared file system, the durability and persistence of data are provided by the underlying storage, as the workload is completely decoupled from it. This gives the scheduler the flexibility to place pods on any node of the Kubernetes cluster. However, because the workload is completely decoupled from the underlying storage, this approach is not the right fit for applications such as NoSQL or relational databases, which require high I/O throughput.

For stateful applications that require high I/O throughput, Kubernetes StatefulSets are the recommended method. By combining StatefulSets with persistent volume claims, you can run applications that scale out with a unique persistent volume claim created automatically for each replica pod. StatefulSets are suitable for deploying Kafka, MySQL, Redis, ZooKeeper, and other applications that need unique, persistent identities and stable hostnames.
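
To make the idea of stable identities concrete, here is a small sketch of the naming scheme Kubernetes applies to StatefulSet pods, their claims, and their DNS names. The helper functions and the example names (mysql, data) are hypothetical; the patterns themselves follow the documented StatefulSet conventions, assuming the default cluster domain cluster.local.

```python
def pod_names(statefulset, replicas):
    # Pods are named <statefulset>-<ordinal>, with ordinals starting at 0.
    return [f"{statefulset}-{ordinal}" for ordinal in range(replicas)]


def pvc_name(claim_template, pod):
    # Each claim is named <volumeClaimTemplate name>-<pod name>.
    return f"{claim_template}-{pod}"


def pod_dns(pod, service, namespace="default", cluster_domain="cluster.local"):
    # Each pod gets a stable DNS name under the governing headless Service.
    return f"{pod}.{service}.{namespace}.svc.{cluster_domain}"


for pod in pod_names("mysql", 3):
    print(pod, pvc_name("data", pod), pod_dns(pod, "mysql"))
```

Because these names are derived from the ordinal rather than generated randomly, a replacement pod comes back with the same name and therefore the same claim.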

There are three major components underlying a StatefulSet:

A headless Service: a Service with no cluster IP. Instead of load-balancing, a DNS lookup for the Service returns the IPs of the associated pods, which allows clients to talk to the pods directly rather than through a proxy.
The StatefulSet: pods belonging to a StatefulSet are guaranteed stable, unique identifiers that follow a naming convention, and they support ordered, graceful deployment and scaling.
Persistent volume claims: each pod in a StatefulSet that needs storage has its own persistent volume claim, which follows a similar naming convention. If a pod is terminated and restarted on another node, the StatefulSet controller ensures that the new pod is associated with its existing persistent volume claim.
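
The three components can be sketched together as one manifest. This is an illustrative outline rather than a production configuration: the application name, image tag, port, mount path, and volume size are assumptions, and a real MySQL deployment would also need credentials and configuration that are omitted here.

```python
import json

APP = "mysql"  # hypothetical application name, reused for labels and names

# 1. Headless Service: clusterIP "None" means no virtual IP and no
#    load-balancing; DNS returns the pod IPs directly.
headless_service = {
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {"name": APP},
    "spec": {
        "clusterIP": "None",
        "selector": {"app": APP},
        "ports": [{"name": "db", "port": 3306}],
    },
}

# 2. StatefulSet: serviceName links the pods to the headless Service.
stateful_set = {
    "apiVersion": "apps/v1",
    "kind": "StatefulSet",
    "metadata": {"name": APP},
    "spec": {
        "serviceName": headless_service["metadata"]["name"],
        "replicas": 3,
        "selector": {"matchLabels": {"app": APP}},
        "template": {
            "metadata": {"labels": {"app": APP}},
            "spec": {
                "containers": [
                    {
                        "name": APP,
                        "image": "mysql:8.0",  # assumed image tag
                        "ports": [{"name": "db", "containerPort": 3306}],
                        "volumeMounts": [
                            {"name": "data", "mountPath": "/var/lib/mysql"}
                        ],
                    }
                ]
            },
        },
        # 3. One persistent volume claim is created per replica from this
        #    template and reattached when the pod is rescheduled.
        "volumeClaimTemplates": [
            {
                "metadata": {"name": "data"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": "10Gi"}},
                },
            }
        ],
    },
}

manifest = {
    "apiVersion": "v1",
    "kind": "List",
    "items": [headless_service, stateful_set],
}
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and passed to kubectl apply -f. With three replicas, this layout would yield pods mysql-0 to mysql-2 and claims data-mysql-0 to data-mysql-2.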

Conclusion

In this article, we discussed two approaches to deploying a stateful application in a Kubernetes cluster. Deploying a stateful application on a shared file system is the best fit for applications that don’t require high I/O throughput. Deploying with Kubernetes StatefulSets, on the other hand, is the right fit for applications that do. You can choose from a wide range of storage options, such as GlusterFS, Samba, NFS, Amazon EFS, Azure Files, and Google Cloud Filestore.

I hope you found this informative. Please share it if you think it is worth sharing.