
How to run data on Kubernetes: 6 starting principles

by Sonal Shukla

This post is designed to help developers understand what is involved in running data on Kubernetes. It is a technical piece that assumes familiarity with the fundamentals of containers and container orchestration.

There are many benefits to running data on Kubernetes, but before it can be a realistic option for your project you may need to spend some time understanding how it works and what's involved. Once you understand the concepts and requirements of running your data on Kubernetes, you'll have a better sense of both the possibilities and the limitations.

The post is based on interviews with developers who have worked with containers for some time and have moved their data into them. You'll see how they did it and what they experienced along the way, which gives you a clearer picture of what running data on Kubernetes involves and why some projects benefit from that approach more than others.

This article is about real, persistent data, as opposed to projects where developers simply create a VM on every server. Since there are many different kinds of containers and container orchestration systems, it's hard to recommend a single approach, so I'll highlight the ones I think are most effective.

2. The first step is to pick a way to run your data

Kubernetes can be used with many kinds of applications and platforms. It's important that you understand both the requirements of your project and the restrictions of Kubernetes before you start moving your data into containers.
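For most databases, picking "a way to run your data" comes down to using a StatefulSet rather than a Deployment, so each instance gets a stable network identity and its own persistent volume. A minimal sketch, where the name, image, and storage size are all hypothetical placeholders:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mydb                  # hypothetical name
spec:
  serviceName: mydb           # headless Service that gives each pod a stable DNS name
  replicas: 1
  selector:
    matchLabels:
      app: mydb
  template:
    metadata:
      labels:
        app: mydb
    spec:
      containers:
        - name: mydb
          image: postgres:16  # example image; substitute your own database
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:       # one PersistentVolumeClaim per pod, kept across restarts
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
```

The volumeClaimTemplates section is what makes the data survive container restarts: each pod gets its own claim, and the claim outlives the pod.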

3. Master-slave is a good approach

It's likely that your application will run in master-slave mode, with one database server and other services that talk to it. You'll need some kind of database to store all the data that must persist across container restarts. One option is to have a single database server serve every container, but this concentrates load on the server where your data lives. That can slow read performance, which is definitely not something you want.
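In Kubernetes terms, the application containers usually reach the database through a Service rather than a pod IP, so the connection survives pod restarts. A minimal sketch, with hypothetical names and PostgreSQL's port used as an example:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: mydb           # hypothetical; apps would connect to "mydb:5432"
spec:
  selector:
    app: mydb          # routes traffic to pods carrying this label
  ports:
    - port: 5432       # example port (PostgreSQL's default)
      targetPort: 5432
```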

4. Replicate the master for availability and scale

This is about making sure that the database server is replicated, so it remains available even if one master fails, and so you can scale your database out as you grow (up to whatever limits you're facing).
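On the Kubernetes side, scaling out is just a matter of raising the replica count on the StatefulSet; wiring the new pods up as replicas of the master is still the database's job. As a hypothetical fragment of the spec from earlier:

```yaml
# Fragment of a database StatefulSet spec (name hypothetical).
# Raising replicas from 1 to 3 creates mydb-1 and mydb-2, each with
# its own volume; configuring them to replicate from mydb-0 is up to
# the database itself, not Kubernetes.
spec:
  replicas: 3
```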

5. Kubernetes helps with replication, but isn’t a full solution

Database replication itself is not something Kubernetes handles, and it doesn't look like it will become a built-in feature any time soon. If the master database server is the only thing holding your data, you'll want to use your database's own replication technology. Kubernetes orchestration can restart your containers, and the databases they run, if the master fails, but you'll still need to make sure that your data is replicated somewhere.
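Restarting on failure is where Kubernetes does help directly: a liveness probe tells the kubelet when to restart a database container. A minimal sketch, using PostgreSQL's `pg_isready` as an example health check (names and timings are hypothetical):

```yaml
# Container section of a database pod spec.
containers:
  - name: mydb
    image: postgres:16
    livenessProbe:
      exec:
        command: ["pg_isready", "-U", "postgres"]  # example health check
      initialDelaySeconds: 30  # give the database time to start before probing
      periodSeconds: 10        # probe every 10s; repeated failures trigger a restart
```

Note that a restart only recovers the process; it does nothing for data that was never replicated.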

6. You may want to run multiple databases, but there’s a gotcha

If you do need to run multiple databases, you should probably set up some kind of load balancing across them (for example, through a Kubernetes Service), because Kubernetes won't coordinate between them on its own. There are ways to simulate this coordination by storing shared state in a Kubernetes Secret instead of data on disk (we'll discuss this in Part 2), but it's not as robust as using something like Consul or another dedicated service-discovery system.
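A Service is the usual way to spread reads across several database pods: it balances connections across every pod matching its selector. A hypothetical sketch for a pool of read replicas (the names and labels are placeholders):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: mydb-read      # hypothetical; apps would send read queries here
spec:
  selector:
    app: mydb
    role: replica      # only pods labeled as replicas receive this traffic
  ports:
    - port: 5432
      targetPort: 5432
```

This spreads connections, not queries, so it suits workloads where any replica can answer any read.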
