Diego: Today we will talk about the costs of Cloud Native applications, a topic that only looks simple. Costs should be considered from the very first steps of a project, especially during application design.
Giulio, before starting the interview, can you briefly tell us who you are and what you do?
Giulio: I am the CTO and co-founder of Mia-Platform, a solution for developing cloud-native applications or platforms (we like to call it the meta-platform). Every day I deal with applications going into production, problems in production, and rising costs that must be kept under control.
We always say: “every euro saved is a euro earned”.
D: Right! Let’s go into today’s topic: the costs of applications that are Cloud Native and can run on various infrastructures: on premise, on a single cloud, across multiple clouds, etc.
Let’s start from the beginning: what is a cloud native application? What is the simplest explanation you could give?
G: If we opened Google now and searched for “Cloud Native”, the term “The Twelve-Factor App” would probably come up. I would leave this concept aside for a moment because over time it has been revisited and adapted.
In fact, any software application should be implemented according to certain paradigms: clean code, TDD, clear responsibilities both in naming and at runtime, easy maintenance, etc.
These are the characteristics of software applications in general.
Sometimes these characteristics are attributed to Cloud Native applications, although I think that is a bit reductive, as Cloud Native applications have several more features. We have identified 7, but the ones I care about the most are:
- The application must produce logs that are understandable, traceable and retained over time so that you can troubleshoot. Because these applications scale dynamically, the logs should all go to one place (you can’t leave them scattered across replicas) and should be sorted by time.
- From a resource and management point of view, the application must be small so that it can scale up or down quickly. When you increase the number of replicas, for example, each one should take a few seconds to start, not 8 minutes.
- You must be able to measure its health and readiness status and the metrics it exposes; otherwise you might lose track of what it is doing. When you have hundreds of replicas of the same application and you start having problems, you can no longer read the logs and aggregation becomes difficult. Measurement lets you observe what the application is doing right now: how many messages are being queued per second, how many are already queued (for example in a Kafka queue), etc. A Cloud Native application must be observable so that you can repair it, automatically or semi-automatically, and restore the continuity of the system.
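The first point, logs from many replicas collected in one place and sorted by time, can be sketched roughly as follows. This is only an illustration: the function names `make_log_line` and `merge_logs` and the JSON field names are assumptions made for the example, not something described in the interview.

```python
import json
from datetime import datetime, timedelta, timezone


def make_log_line(replica, message, when):
    """Serialize one log event as a JSON line with a UTC ISO-8601 timestamp."""
    return json.dumps({
        "ts": when.astimezone(timezone.utc).isoformat(),
        "replica": replica,
        "msg": message,
    })


def merge_logs(*streams):
    """Merge JSON log lines from several replicas into one time-sorted list."""
    events = [json.loads(line) for stream in streams for line in stream]
    # Parse the timestamp instead of comparing raw strings.
    return sorted(events, key=lambda e: datetime.fromisoformat(e["ts"]))


# Example: two replicas writing interleaved events.
start = datetime(2021, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
pod_a = [make_log_line("pod-a", "started", start),
         make_log_line("pod-a", "done", start + timedelta(seconds=2))]
pod_b = [make_log_line("pod-b", "started", start + timedelta(seconds=1))]
merged = merge_logs(pod_a, pod_b)
```

Writing each event as one structured line with an explicit timestamp and replica name is what makes the later aggregation step simple.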
D: I was hoping you wouldn’t explain the 12 factors!
So there is no need for a microservice for the application to be Cloud Native?
G: I can’t tell you if there is a need or not. A Cloud Native application has the characteristics we have seen, and others as well. These characteristics are necessary conditions for a microservice, but a Cloud Native application is not necessarily a microservice.
On the other hand, the two concepts now overlap so much that it is easier to use one name for both, the same way you call kitchen paper “Scottex” regardless of the brand you are using.
D: This metaphor is beautiful!
I agree. It is often assumed that a Cloud Native application must run on containers, but what you have just explained also applies to a very small application running on a minimal server. It is clear, however, that in the world of containers and Kubernetes there are also constructs that let us define in code the upper and lower bounds of the resources an application may consume. So reservations can be made and limits can be set; everything is more complicated, but also more explicit.
My next questions are: how do I understand from the beginning how many resources my application will need? What resources should I foresee? And, based on this information, should I design my application in one way rather than another?
G: These are very difficult questions. I don’t have an answer yet. At Mia-Platform we are studying the topic, and I will try to share some parameters on which we can build a metric and get a good measure.
It is true that Cloud Native applications do not necessarily run in containers and can run directly on infrastructure. It is also true that Kubernetes and containers are establishing APIs that will probably become the standard for managing and orchestrating distributed operating systems (as I like to call Kubernetes).
From this point of view, it also becomes important to understand how you would mark the limits of your resources to optimize the infrastructure load.
How do you understand if that software, application, service or container is sized correctly? The only way we have found is to have a well-structured table based on the programming language used.
Let’s take a subset: greenfield development of application logic inside a container, where you could develop with standard languages and application servers (classic Java with Spring Boot, Python, Node.js, up to Elixir, Rust, Go).
For example, we took many ‘hello worlds’ and started them to measure the memory footprint (which is different from the CPU footprint). In our tests Java used 150 to 300 MB of RAM, Node.js about 50 MB, Go about 5 MB, and with Rust and C++ we got below 5 MB. If, however, the application receives a million requests per minute, that 5 MB will obviously grow.
The programming language, and the underlying framework, matter for reducing the footprint, but they are not enough: you also have to understand what kind of load is placed on the application. Here, stress tests are very important!
This also applies to non-container applications, because they too need sizing and a capacity plan; we tend to skip both because we take it for granted that the infrastructure is so large that there will be no need for them.
In a cloud solution this is not possible: you usually move to the cloud to save money.
In the cloud, you pay only for what you need, so you can test the system and make a correct capacity plan for CPU and RAM by running stress tests under various conditions.
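To make the link between stress tests and a capacity plan concrete, here is a minimal sketch. It assumes a stress test has already told you how many requests per second one replica sustains and how much RAM it uses under that load; the function name `plan_capacity`, its parameters and the headroom figure are all hypothetical.

```python
import math


def plan_capacity(peak_rps, rps_per_replica, ram_mb_per_replica, headroom=0.5):
    """Return (replicas, total_ram_mb) needed to serve peak_rps.

    headroom is the fraction of each replica's measured throughput that is
    deliberately left unused as a safety margin (an assumption, not a rule).
    """
    usable_rps = rps_per_replica * (1 - headroom)
    replicas = math.ceil(peak_rps / usable_rps)
    return replicas, replicas * ram_mb_per_replica


# Example: 1000 req/s at peak, one replica sustains 100 req/s using 50 MB.
replicas, total_ram_mb = plan_capacity(1000, 100, 50)
```

Repeating the calculation with the figures measured under different test conditions gives a range rather than a single number, which is usually more honest for planning.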
D: The questions that follow are: let’s say I’m on the cloud, how can I see what the costs of an application are? And how do I monitor them? Do I need to have a product from the cloud provider? What if I’m in IaaS mode?
What is your experience?
G: For each pod you can assign requests and limits for CPU and RAM. The pod will then consume within those limits, and you can measure how much RAM and CPU it is consuming at any moment.
The pod is your basic unit of measurement, the starting point. If you start measuring periodically how much RAM and CPU each pod consumes, you can multiply the RAM consumed by its unit cost and the CPU consumed by its unit cost, and come up with a cost per single pod. This can be a starting point.
To assign the CPU and RAM limits, stress tests are useful, as we have seen. The Kubernetes Vertical Pod Autoscaler can also help by recommending these values, so you can size with more confidence.
It may happen that, despite the sizing, a service is used far less than you expected. You would then have a very large margin between what you provisioned and what is actually used, and a potential cost saving on each pod.
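The per-pod calculation described here can be sketched as below. The unit prices, the sampling interval and the function name `pod_cost` are assumptions for illustration; real prices depend on your provider or on your own on-premise cost model.

```python
def pod_cost(samples, interval_hours, cpu_price_per_core_hour, ram_price_per_gib_hour):
    """Estimate the cost of one pod from periodic usage samples.

    samples is a list of (cpu_cores, ram_gib) measurements taken every
    interval_hours; each sample is assumed to hold for the whole interval.
    """
    return sum(
        (cpu * cpu_price_per_core_hour + ram * ram_price_per_gib_hour) * interval_hours
        for cpu, ram in samples
    )


# Example with made-up prices: two hourly samples of one pod.
cost = pod_cost([(0.5, 2.0), (1.0, 2.0)], 1.0,
                cpu_price_per_core_hour=0.04, ram_price_per_gib_hour=0.005)
```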
Take for example this context:
- A single application made up of a few microservices within one namespace,
- A single team,
- And a single cluster.
In this case it becomes quite easy to measure, because you have metrics collected, for example, by Prometheus and a consumption history on which you can do calculations. This is a manual but doable solution.
It starts to be a problem when the Kubernetes cluster is used by multiple applications, by multiple teams, or even by multiple teams from different areas of the same company with different cost units.
At that point you have to start tagging all your pods with labels so that you can aggregate the consumption costs of the individual pods.
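A rough sketch of that aggregation, assuming each pod already has a computed cost and a dictionary of labels. The label key `cost-center` and the data layout are hypothetical; this is not how any specific tool does it.

```python
from collections import defaultdict


def cost_by_label(pods, label_key):
    """Sum pod costs per value of one label; pods without it go to 'unlabeled'."""
    totals = defaultdict(float)
    for pod in pods:
        owner = pod["labels"].get(label_key, "unlabeled")
        totals[owner] += pod["cost"]
    return dict(totals)


# Example: three pods, two cost centers, one pod missing the label.
pods = [
    {"name": "api-1", "labels": {"cost-center": "sales"}, "cost": 1.5},
    {"name": "api-2", "labels": {"cost-center": "sales"}, "cost": 2.5},
    {"name": "batch-1", "labels": {}, "cost": 4.0},
]
totals = cost_by_label(pods, "cost-center")
```

Keeping an explicit bucket for unlabeled pods makes it visible how much spend still cannot be attributed to a team.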
On this aspect, there are a few solutions that help: kubemodel, for example, is an open-source project that helps you do this job.
Conceptually it is quite simple: you have the cost of the CPU consumed and the cost of the RAM consumed, and you can do this both in the cloud and on premise.
D: That’s true, and you can do it on Google, Amazon, and any other cloud. Building your own cost model as you suggest could be a win.
Now that we have seen how to monitor CPU and RAM and the related costs, and therefore you have observability, how can you keep them under control? Is there a way to detect whether you are leaving the trend line, or consuming too much or too little compared to what was expected?
G: To give an example, we have several clusters with a few thousand pods running. It is not easy to manage all the costs because the teams work on their own.
In my experience, the first thing to do is put alarms on macro costs against the global budget, with thresholds both below and above 100% (especially above 100%!). When the 50% alarm goes off, I look at what day of the month it is and see whether we are on the trend line or not.
Also, each project has its own billing and cost aggregations, and as long as the cloud provider supports this you can set alarms in the same way. Where it doesn’t, you can use Prometheus alerts on resource consumption aggregated by namespace.
I haven’t tried it yet, but Google’s GKE Autopilot, which bills you per running pod, seems very interesting to me. It has only just come out, though, and still needs to be understood and tested.
Certainly, setting alarms against a fixed budget is the most important thing; they must be checked at regular intervals and re-evaluated as new projects or new variables come in.
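The "what day of the month is it" check can be sketched as a comparison against a linear trend line. The assumption that spend should grow evenly through the month is a simplification made for this example, and `budget_status` is a hypothetical name.

```python
import calendar
from datetime import date


def budget_status(spent, monthly_budget, today):
    """Compare spend so far with a linear trend line for the current month.

    Returns (fraction_spent, fraction_expected, over_trend).
    """
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    expected = monthly_budget * today.day / days_in_month
    return spent / monthly_budget, expected / monthly_budget, spent > expected


# Example: the 50% alarm fires; whether that is fine depends on the date.
on_track = budget_status(1500, 3000, date(2021, 6, 15))
too_fast = budget_status(1500, 3000, date(2021, 6, 10))
```

Reaching 50% of the budget on the 15th of a 30-day month is on the line; reaching it on the 10th is not.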
D: So you have to constantly check the measured consumption vs the expected one and go and see the reason: it can be due to growing business, or implementation problems.
Now I ask you: who are the professionals that do this cost analysis? Are they inside the project and the feature team, or are they dedicated people?
G: Budget and cost control tends to be done outside the team: it is a separate area. But the feature team is the one that has to provide the information and that is alerted by the alarms. The feature team is responsible for spending the budget and for saving, while the control is performed by an external area, so that what was paid on the invoice is reconciled with what was planned and monitored in the management systems. Otherwise you risk thinking you are spending x while actually paying y without realizing it.
You have to involve the company’s financial unit because the risk is doing a beautiful technical job that might not bring value to the company, and also because the teams do not always have access to the invoices.
D: Sure, so the cost controller calls the feature team and together they see where the detected budget deviations come from.
Can we say that the only way to compare costs between the various clouds is to build your own cost control model?
G: At Mia-Platform, we are trying to build our cost model. Comparing costs between cloud vendors is not trivial, because it is one thing to look at the price list and another to start making some reservations, paying attention to how you allocate your node pools under Kubernetes depending on the machine types, etc. So you end up with a mix of costs that is not easily comparable.
Right now you can build your own model, or alternatively there are market solutions that help you do it. We are currently building ours, based on how much we are going to reserve and how many CPUs we are going to buy for the future. This becomes a bit of “a resource bag”, because it allows you to monitor resources over the years and make a real capacity plan.
The capacity plan is really important in the cloud, coupled with some good insights into the future.
It is still difficult to understand whether to move the application from one cloud to another, but it is worth having a model to understand how to navigate and at the moment there seems to be nothing on the market.
I’ll give you an example: I’m talking about Google but it also applies to others, AWS, etc.
You can reserve CPUs for 3 years with a significant cost saving. The problem is: if you reserve the CPU, what is downscaling for? You can always downscale your clusters and reduce or shut down pods, but that consumption has already been paid for.
Davide Bianchi, Mia-Platform Senior Technical Leader, has just developed kube-green, an open source project that turns off unused pods and starts them up again when needed. But if you buy CPUs for 3 years then you might as well leave it all on.
Here it becomes a matter not only of costs but also of ethics and environment, so it is worth putting together a good mix of 3-year commitments, 1-year commitments and variable consumption. This mix is the true art of “FinOps”: if you get it right, you can be really effective at scaling costs.
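The trade-off in this mix can be sketched with a simple blended-cost formula. Every number below (on-demand price, discounts, hours) is made up for illustration and does not reflect any provider's real pricing; `blended_monthly_cost` is a hypothetical helper.

```python
def blended_monthly_cost(cores_3y, cores_1y, on_demand_core_hours,
                         on_demand_price, discount_3y, discount_1y,
                         hours_in_month=730):
    """Monthly cost of a mix of 3-year, 1-year and on-demand CPU.

    Committed cores are paid for every hour of the month, whether the pods
    are running or not; only the on-demand part shrinks when you downscale.
    """
    committed = hours_in_month * on_demand_price * (
        cores_3y * (1 - discount_3y) + cores_1y * (1 - discount_1y)
    )
    variable = on_demand_core_hours * on_demand_price
    return committed + variable


# Example with invented figures and a short 100-hour "month" for readability.
total = blended_monthly_cost(cores_3y=10, cores_1y=4, on_demand_core_hours=1000,
                             on_demand_price=0.04, discount_3y=0.5,
                             discount_1y=0.25, hours_in_month=100)
```

The formula makes the earlier point visible: turning pods off only reduces the `variable` term, so the more you commit, the less downscaling saves in money terms.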
D: In fact, downscaling is something that is too often underestimated, even for development environments.
We are talking about multi-clouds, but I claim that even my datacenter at home is a cloud if I work in the cloud; in this way it is even more difficult to compare the costs between private and public clouds.
G: If you think about it, the advantage of Kubernetes, containers and Docker is that they have removed the idea that a machine can host only one application. Instead, several applications can share the same machine by splitting it into millicpu.
The first time I read about it I was very impressed, because I realized that you can actually divide a core into millicpu: 1000 millicpu = one core, which makes you understand how much you can share on the same infrastructure. At that point, even if your infrastructure has a defined cost, you can make it more and more effective and efficient by sharing more and more things without necessarily scaling.
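As a small illustration of how millicpu lets many pods share a node: Kubernetes writes CPU quantities either as whole or fractional cores ("2", "0.5") or with an "m" suffix for millicpu ("250m"), where 1000m equals one core. The helper names below are hypothetical, and the sketch ignores CPU reserved for the system.

```python
def parse_cpu(quantity):
    """Convert a Kubernetes CPU quantity such as '250m', '0.5' or '2' to millicpu."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(round(float(quantity) * 1000))


def pods_per_node(node_cores, pod_cpu_request):
    """How many pods with the given CPU request fit on a node, by CPU alone."""
    return (node_cores * 1000) // parse_cpu(pod_cpu_request)


# Example: a 4-core node shared by pods that each request a quarter of a core.
fits = pods_per_node(4, "250m")
```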
D: We must then consider costs vs performance. Performance is also a metric that needs to be considered.
“Cloud Native is very nice but I’m spending a lot more than I expected”: has anyone ever made this observation to you?
G: It depends on what you put on it: if you put big things, you don’t save much, but if you put small things, which scale dynamically, then you can actually have considerable savings. The classic monolithic elephant on the cloud does not make you save much; instead it makes you lose money.
If you take the elephant in the cloud, reduce networking, start slicing and modulating scaling, then you’ll start saving.
Until now, nobody has ever told me “it costs me much more to go to the cloud”, but the benefits certainly come only some time after you have a real Cloud Native architecture.
D: The buzzword of 2021 is FinOps. The thing is that if you want to do multi-cloud, zero lock-in, etc., then cost control on the cloud must be your highest priority. Do you agree?
G: Yes, from day 0 you have to put alarms on how much you spend; otherwise you are effectively blind. Open the account and set the alarms, including at 120% of the budget, and at 200% too.
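A minimal sketch of such threshold alarms follows. The 50%, 100%, 120% and 200% levels come from the conversation; the function name and the idea of returning the list of crossed thresholds are assumptions for the example. In practice you would normally configure these in your cloud provider's budgeting tool rather than in code.

```python
def triggered_alarms(spent, budget, thresholds=(0.5, 1.0, 1.2, 2.0)):
    """Return the budget thresholds (as fractions) that current spend has crossed."""
    ratio = spent / budget
    return [t for t in thresholds if ratio >= t]


# Example: spending 1250 against a budget of 1000 crosses three thresholds.
alarms = triggered_alarms(1250, 1000)
```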
Reducing technical debt is a professionally ethical thing to do and we take it for granted. However, we are moving to a new level of ethics to be integrated, which is the environmental one, which considers what we consume and what is our impact.
Bitcoin and blockchain consume a lot of energy but I don’t think it was initially foreseen.
Will the software that we write today still exist in 10 years and, if so, how will it scale? With millions of users it could consume a lot more energy than what we see today.
That will be the cloud native technical debt for the next few years. We will need gigantic clusters if we are not careful to minimize the footprint of our pods.
These are the problems we are creating today and that we will have to solve in the future.