How Kubernetes conquers stateful cloud-native applications

Kubernetes has added many layers of support for building stateful applications and managing them at scale. It’s only a start.

The widespread misconception that Kubernetes was not ready for stateful applications such as MySQL and MongoDB has had a surprisingly long half-life. This misconception has been driven by a combination of the initial focus on stateless applications within the community and the relatively late addition of support for persistent storage to the platform.

Further, even after initial support for persistent storage, the kinds of higher-level platform primitives that brought ease of use and flexibility to stateless applications were missing for stateful workloads. However, not only has this shortcoming been addressed, but Kubernetes is fast becoming the preferred platform for stateful cloud-native applications.

Today, one can find first-class Kubernetes storage support for all of the major public cloud providers and for the leading storage products for on-premises or hybrid environments. While the availability of Kubernetes-compatible storage has been a great enabler, Kubernetes support for the Container Storage Interface (CSI) specification is even more important.

The CSI initiative not only introduces a uniform interface for storage vendors across container orchestrators, but it also makes it much easier to provide support for new storage systems, to encourage innovation, and, most importantly, to provide more options for developers and operators.

While increasing storage support for Kubernetes is a welcome trend, it is neither a sufficient reason nor the primary reason why stateful cloud-native applications will be successful. To step back for a second, the driving force behind the success of a platform like Kubernetes is that it is focused on developers and applications, and not on vendors or infrastructure. In keeping with that focus, the Kubernetes development community has stepped in with significant contributions to create appropriate abstractions that bridge the gap between raw infrastructure such as disks and volumes and the applications that use that infrastructure.

Kubernetes StatefulSets, Operators, and Helm charts

First, to make it much simpler to build stateful applications, support for orchestration was added in the form of building blocks such as StatefulSets. StatefulSets automatically handle the hard problems of gracefully scaling and upgrading stateful applications, and of preserving network identity across container restarts. StatefulSets provide a great foundation to build, automate, and operate highly available applications such as databases.
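To make "preserving network identity" concrete, here is a minimal Python sketch of the naming scheme a StatefulSet gives its pods and of the order in which the default policy scales them. The StatefulSet name "mysql" and the headless Service name "mysql-headless" are made up for illustration, and the helper functions are not part of any Kubernetes client library.

```python
from __future__ import annotations


def pod_names(statefulset: str, replicas: int) -> list[str]:
    # Pods are named <statefulset>-<ordinal>, with ordinals starting at 0.
    # A restarted pod comes back under the same name.
    return [f"{statefulset}-{i}" for i in range(replicas)]


def pod_dns(pod: str, service: str, namespace: str = "default",
            cluster_domain: str = "cluster.local") -> str:
    # Each pod gets a stable DNS entry under the governing headless Service.
    return f"{pod}.{service}.{namespace}.svc.{cluster_domain}"


def scale_plan(statefulset: str, current: int, desired: int) -> list[tuple[str, str]]:
    # With the default ordered policy, pods are created in ascending
    # ordinal order and removed in descending ordinal order.
    if desired >= current:
        return [("create", f"{statefulset}-{i}") for i in range(current, desired)]
    return [("delete", f"{statefulset}-{i}")
            for i in range(current - 1, desired - 1, -1)]


# Hypothetical three-replica database.
for name in pod_names("mysql", 3):
    print(pod_dns(name, "mysql-headless"))
print(scale_plan("mysql", 3, 1))
```

Because the names and DNS entries are derived from the ordinal rather than from a random suffix, database replicas can keep addressing one another by name across restarts.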

Second, to make it easier to manage stateful applications at scale and without human intervention, the “Operator” concept was introduced. A Kubernetes Operator encodes, in software, the manual playbooks that go into operating complex applications. The benefits of these operators can be clearly seen in the operators published for MySQL, Couchbase, and multi-database environments.
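The core of the Operator idea is a reconcile loop: compare the state a user declared with the state actually observed, and take the steps a human operator would otherwise take by hand. The Python sketch below illustrates that comparison only. The ClusterSpec and ClusterStatus types and the action names are hypothetical, and a real operator would watch the Kubernetes API rather than receive plain objects.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterSpec:
    """Desired state, as a user would declare it (hypothetical fields)."""
    replicas: int
    version: str


@dataclass(frozen=True)
class ClusterStatus:
    """Observed state of the running database (hypothetical fields)."""
    replicas: int
    version: str
    healthy: bool


def reconcile(spec: ClusterSpec, status: ClusterStatus) -> list[str]:
    """Return the actions that move the observed state toward the desired one."""
    # Playbook rule: never upgrade or resize a cluster that is unhealthy.
    if not status.healthy:
        return ["repair-unhealthy-members"]

    actions: list[str] = []
    if status.version != spec.version:
        actions.append(f"rolling-upgrade-to-{spec.version}")
    if status.replicas < spec.replicas:
        actions.append(f"add-{spec.replicas - status.replicas}-replicas")
    elif status.replicas > spec.replicas:
        actions.append(f"remove-{status.replicas - spec.replicas}-replicas")
    return actions  # An empty list means the cluster already matches the spec.
```

Running reconcile repeatedly, each time against fresh observations, is what lets an operator converge on the declared state without human intervention.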

In conjunction with these orchestration advances, the flourishing of Helm, the equivalent of a package manager for Kubernetes, has made it simple to deploy not only different databases but also higher-level applications such as GitLab that draw on multiple data stores. Helm uses a packaging format called “charts” to describe applications and their Kubernetes resources. A single-line command gets you started, and Helm charts can be easily embedded in larger applications to provide the persistence for any stack. In addition, multiple reference examples are available in the form of open source charts that can be easily customized for the needs of custom applications.
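Customizing a chart mostly means overriding some of its default values while leaving the rest alone. The Python sketch below is a simplified model of that layering, in which nested maps are merged key by key and any other value is replaced outright. It is not Helm's implementation, and the keys shown are invented for the example.

```python
from __future__ import annotations


def merge_values(defaults: dict, overrides: dict) -> dict:
    """Layer user overrides on top of chart defaults (simplified model)."""
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested maps are merged recursively so untouched keys survive.
            merged[key] = merge_values(merged[key], value)
        else:
            # Scalars and lists are replaced by the override.
            merged[key] = value
    return merged


# Hypothetical chart defaults and a user's overrides.
chart_defaults = {
    "replicas": 1,
    "persistence": {"enabled": True, "size": "8Gi"},
}
user_overrides = {"persistence": {"size": "50Gi"}}

print(merge_values(chart_defaults, user_overrides))
```

In this model the user changes only the volume size, and the default replica count and the persistence flag carry through unchanged.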

Kanister and the K10 Platform

At Kasten, we have been working on two projects, Kanister and K10, that make it dramatically easier for both developers and operators to consume all of the above advancements. Driven by extensive customer input, these projects don’t just abstract away some of the technical complexity inherent in Kubernetes but also present a homogeneous operational experience across applications and clouds at scale.

Kanister, an open-source project, has been driven by the increasing need for a universal and application-aware data management plane—one that supports multiple data services and performs data management tasks at the application level. Developers today frequently draw on multiple data sources for a single app (polyglot persistence), consume data services that are eventually consistent (e.g., Cassandra), and have complex requirements including consistent data capture, custom data masking, and application-centric backup and recovery.

Kanister addresses these challenges by providing a uniform control plane API for data-related actions such as backup, restore, and masking. At the same time, Kanister allows domain experts to capture application-specific data management actions in blueprints or recipes that can be easily shared and extended. While Kanister is based on the Kubernetes Operator pattern and Kubernetes CustomResourceDefinitions, those details are hidden from developers, allowing them to focus on their application’s requirements for these data APIs. Instead of learning how to write a Kubernetes Controller, they simply author actions for their data service in whatever language they prefer, ranging from Bash scripts to Go. Today, public examples cover everything from MongoDB backups to deep integration with PostgreSQL’s Point-in-Time-Recovery functionality.
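The blueprint idea can be illustrated with a small Python sketch: application-specific steps are registered under a uniform action name such as "backup", and a caller invokes the action without knowing what the steps do. Everything here is hypothetical, including the Blueprint class, the phase functions, and the parameter names. It does not reproduce Kanister's actual API or blueprint format.

```python
from __future__ import annotations

from typing import Callable

# A phase takes the accumulated context and returns the values it adds to it.
Phase = Callable[[dict], dict]


class Blueprint:
    """Maps uniform action names to application-specific phases (illustrative)."""

    def __init__(self, app: str) -> None:
        self.app = app
        self.actions: dict[str, list[Phase]] = {}

    def action(self, name: str, *phases: Phase) -> None:
        self.actions[name] = list(phases)

    def run(self, name: str, params: dict) -> dict:
        if name not in self.actions:
            raise KeyError(f"{self.app} blueprint has no action {name!r}")
        context = dict(params)
        for phase in self.actions[name]:
            # Each phase sees the outputs of the phases before it.
            context.update(phase(context))
        return context


def dump_database(ctx: dict) -> dict:
    # Stand-in for running the data service's own dump tool.
    return {"artifact": f"{ctx['db']}.dump"}


def upload_artifact(ctx: dict) -> dict:
    # Stand-in for copying the dump to object storage.
    return {"location": f"{ctx['bucket']}/{ctx['artifact']}"}


mongo = Blueprint("mongodb")
mongo.action("backup", dump_database, upload_artifact)

print(mongo.run("backup", {"db": "orders", "bucket": "backups"}))
```

The caller only names the action and supplies parameters. The domain expert who wrote the phases decides what a consistent backup means for that particular data service.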

Whereas Kanister handles data at an application level, significant operator challenges also exist for managing data within multiple applications and microservices spread across clusters, clouds, and development environments. We at Kasten introduced the K10 Platform to make it easy for enterprises to build, deploy, and manage stateful containerized applications at scale. With a unique application-centric view, K10 uses policy-driven automation to deliver capabilities such as compliance, data mobility, data manipulation, auditing, and global visibility for your cloud-native applications. For stateful applications, K10 takes the complexity out of a number of use cases including backup and recovery, cross-cluster and multi-cloud application migration, and disaster recovery.

The state of stateful Kubernetes

The need for products such as Kanister and the K10 Platform is being driven by the accelerating growth in the use of stateful container-based applications. A recent survey run by the Kubernetes Special Interest Group on Applications showed that more than 50 percent of users were running some kind of relational database or NoSQL system in their Kubernetes clusters. This number will only go up.

Further, we see not only the use of traditional database systems in cloud-native environments but also the growth of database systems that are built specifically for resiliency, manageability, and observability in a true cloud-native manner. As next-generation systems like Vitess, YugaByte, and CockroachDB mature, expect to see even more innovation in this space.

As we turn the page on this first chapter of the evolution of stateful cloud-native applications, the future holds both opportunities and challenges. Given the true cloud portability being offered by cloud-native platforms such as Kubernetes, moving application data around multi-cluster, multi-cloud, and even planet-scale environments will require a new category of distributed systems to be developed.

Data gravity is a major challenge that will need to be overcome. New efficient distribution and transfer algorithms will be needed to work around the speed of light. Allowing enterprise platform operators to work at the unprecedented scale that these new cloud-native platforms enable will require a fundamental, application-centric rethinking of how the data in these environments is managed. What we are doing at Kasten with our K10 enterprise platform and Kanister not only tackles these issues but also sets the stage for true cloud-native data management.

Niraj Tolia is co-founder and CEO of Kasten, an early-stage startup working on cloud-native storage infrastructure. Previously, he was the senior director of software engineering at EMC/Maginatics where he was responsible for the CloudBoost family of products that focused on in-cloud data protection. Prior to EMC’s acquisition of Maginatics, he was a founding member of the Maginatics team and played multiple roles within the company including vice president of engineering, chief architect, and staff engineer. Niraj received his PhD, MS, and BS degrees in computer engineering from Carnegie Mellon University.

—

New Tech Forum provides a venue to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to newtechforum@infoworld.com.
