Part 2: Taking control of the cost drivers
• Jul 1, 2021
Now that you have examined your compute-related costs (if you haven’t, check out the previous post in this series), let’s dig into cost optimization of traffic and storage resources.
On the boundary of compute and traffic is cluster composition, in other words, the debate between having one multi-tenant cluster and having many single-tenant clusters. There are many considerations to account for when making this decision. For our purposes here we will look at things through the cost-saving prism. But there is much more to this, and we encourage you to get a full picture of the trade-offs to make an informed decision.
These days Kubernetes makes it quick and easy to spin up a cluster that suits your needs. Clusters can be created by environment (e.g. development, testing, staging or production), by team, or by area. A single cluster can also serve multiple purposes (tenants), thanks to namespaces.
Now that we understand that we can do almost anything we want, let’s talk about cost considerations. The cost of each cluster is driven mainly by the number of nodes and how large that cluster will get, so a large cluster costs a lot. When you have many small clusters, you will incur different costs. These include the control planes you need to run or the managed solutions that you use, since much of the cloud pricing is cluster-based.
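To make the comparison concrete, here is a rough sketch of how the two layouts could be compared. It assumes a flat monthly price per node and a flat fee per control plane; both prices, the cluster names, and the `ClusterPlan` type are invented for illustration and are not real provider rates.

```python
from dataclasses import dataclass


@dataclass
class ClusterPlan:
    """Hypothetical description of one cluster, used only for estimation."""
    name: str
    nodes: int


def monthly_cost(plans, node_price, control_plane_fee):
    """Sum node costs plus one control-plane fee per cluster.

    node_price and control_plane_fee are placeholder monthly prices.
    """
    node_cost = sum(plan.nodes for plan in plans) * node_price
    return node_cost + len(plans) * control_plane_fee


# Same total node count, split differently (all figures are made up).
one_big = [ClusterPlan("shared", nodes=12)]
many_small = [ClusterPlan(f"team-{i}", nodes=3) for i in range(4)]

print(monthly_cost(one_big, node_price=100.0, control_plane_fee=70.0))
print(monthly_cost(many_small, node_price=100.0, control_plane_fee=70.0))
```

The sketch leaves out per-cluster overhead such as system workloads and tooling, which in practice tends to widen the gap for many small clusters.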
You also have to consider traffic. Traffic that stays within a single availability zone of a VPC is typically free, as is ingress traffic. Egress traffic, however, is charged by the cloud providers, and traffic that crosses availability zones is usually billed as well. You may need to connect services between clusters. They may or may not be within the same availability zone. You may need load balancers to route traffic between services that are running on different clusters. Interconnectivity ends up as a cost and should be considered and planned for.
Our recommendation is to put services or applications that need to communicate extensively with each other on the same cluster.
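A back-of-the-envelope estimate can show what co-locating chatty services is worth. The sketch below assumes that same-zone traffic is free and that cross-zone traffic has a flat per-GB price; the zone names, volumes, and price are hypothetical, and real provider pricing differs.

```python
def billable_gb(flows):
    """Sum the monthly GB of flows whose endpoints sit in different zones.

    Each flow is a (source_zone, destination_zone, gb_per_month) tuple.
    Same-zone traffic is treated as free, which is an assumption of this sketch.
    """
    return sum(gb for src, dst, gb in flows if src != dst)


def monthly_traffic_cost(flows, price_per_gb):
    """Estimate the monthly charge for cross-zone traffic."""
    return billable_gb(flows) * price_per_gb


# Hypothetical flows between two services, before and after co-locating them.
spread_out = [("zone-a", "zone-b", 800), ("zone-b", "zone-a", 200)]
co_located = [("zone-a", "zone-a", 800), ("zone-a", "zone-a", 200)]

PRICE_PER_GB = 0.01  # placeholder price, not a real provider rate

print(monthly_traffic_cost(spread_out, PRICE_PER_GB))
print(monthly_traffic_cost(co_located, PRICE_PER_GB))
```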
You may have heard that reliability improves when high availability applications run in different availability zones. The trade-off is the traffic between those zones, which is charged by the GB.
Our recommendation is to step back from this reliability paradigm. Think about the needs and characteristics of your application. In some cases, when an availability zone goes down, you can quickly spin up a container and restart your application in a different availability zone. This scenario could be sufficient for the application in question.
There is an option to create node pools across availability zones, which is appealing in terms of reliability. In this case, the trade-off is in autoscaling capabilities, which do not work as well when a node pool is spread across more than one availability zone.
Our recommendation is to stick with a single availability zone and a single instance type per node pool. This will allow you to best utilize the autoscaling capabilities discussed earlier.
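A check like the following could flag node pools that drift from this recommendation. It is only a sketch: `NodePool` is a hypothetical, provider-neutral structure, and the zone and instance type names are placeholders that you would fill from your own provisioning data.

```python
from dataclasses import dataclass, field


@dataclass
class NodePool:
    """Hypothetical, provider-neutral description of a node pool."""
    name: str
    zones: list = field(default_factory=list)
    instance_types: list = field(default_factory=list)


def pool_warnings(pool):
    """Return warnings for pools that mix zones or instance types."""
    warnings = []
    zone_count = len(set(pool.zones))
    type_count = len(set(pool.instance_types))
    if zone_count > 1:
        warnings.append(f"{pool.name}: spans {zone_count} availability zones")
    if type_count > 1:
        warnings.append(f"{pool.name}: mixes {type_count} instance types")
    return warnings


pools = [
    NodePool("workers-a", zones=["zone-a"], instance_types=["type-large"]),
    NodePool(
        "workers-mixed",
        zones=["zone-a", "zone-b"],
        instance_types=["type-large", "type-xlarge"],
    ),
]

for pool in pools:
    for warning in pool_warnings(pool):
        print(warning)
```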
Network policies do not directly incur cost. They do affect it in an indirect way, as they impact traffic. Network policies manage traffic based on allow-and-deny policies. While developing these policies, you have the opportunity to reexamine and optimize all the connections coming in and out of applications.
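As an illustration of what such a policy looks like, the sketch below builds a `networking.k8s.io/v1` NetworkPolicy manifest that only admits traffic from one set of client pods. The helper name, labels, namespace, and port are assumptions made for the example; the printed JSON could be saved to a file and applied with `kubectl apply -f`.

```python
import json


def allow_from_policy(name, namespace, target_labels, client_labels, port):
    """Build a NetworkPolicy that lets only matching client pods reach the target pods."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Pods the policy applies to.
            "podSelector": {"matchLabels": target_labels},
            "policyTypes": ["Ingress"],
            "ingress": [
                {
                    # Only pods with these labels may connect, and only on this port.
                    "from": [{"podSelector": {"matchLabels": client_labels}}],
                    "ports": [{"protocol": "TCP", "port": port}],
                }
            ],
        },
    }


# Hypothetical names: an "api" workload that should only be called by "frontend".
policy = allow_from_policy(
    name="api-allow-frontend",
    namespace="shop",
    target_labels={"app": "api"},
    client_labels={"app": "frontend"},
    port=8080,
)
print(json.dumps(policy, indent=2))
```

Writing the allowed callers down this way makes every connection explicit, which is a good moment to question whether each one is needed.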
Ingress / Services
There are two ways to expose traffic to the outside world:
1. A LoadBalancer service is the standard way to expose a service to the internet.
It provides you with a single IP address that will forward all traffic to your service. If you want to directly expose a service, this is the default method.
The big downside is that each service you expose with a LoadBalancer will get its own IP address. You pay for a LoadBalancer per exposed service, which can get expensive!
2. Ingress is actually NOT a type of service.
Instead, it sits in front of several services and acts as a “smart router” or entry point into your cluster. Ingress is probably the most powerful way to expose your services, and it can also be the most complicated. You can do a lot of different things with an ingress, and there are many Ingress controllers to choose from, each with different capabilities.
From the cost perspective, when you use ingress, you only pay for one load balancer. And because ingress is “smart”, you get a lot of additional features at the same cost (like SSL, Auth, Routing, and more).
Our recommendation is to go with ingress since it provides a lot of flexibility. Load balancers can be very reliable, but in order to communicate outside a cluster, you will require ingress anyway. And if you are running several applications with different SLAs on a single cluster, you may still need more than one ingress.
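To illustrate the single entry point, the sketch below builds one `networking.k8s.io/v1` Ingress manifest that routes several paths to different services. The host, service names, ports, and the `nginx` ingress class are assumptions for the example, and an Ingress controller matching that class must already be installed in the cluster.

```python
import json


def ingress_for(name, host, routes, ingress_class="nginx"):
    """Build an Ingress that fans one host out to several backend services.

    routes maps a URL path prefix to a (service_name, service_port) tuple.
    """
    paths = [
        {
            "path": path,
            "pathType": "Prefix",
            "backend": {"service": {"name": service, "port": {"number": port}}},
        }
        for path, (service, port) in routes.items()
    ]
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {"name": name},
        "spec": {
            "ingressClassName": ingress_class,
            "rules": [{"host": host, "http": {"paths": paths}}],
        },
    }


# Three hypothetical services behind a single entry point.
ingress = ingress_for(
    name="shop",
    host="shop.example.com",
    routes={"/": ("frontend", 80), "/api": ("api", 8080), "/docs": ("docs", 8000)},
)
print(json.dumps(ingress, indent=2))
```

Exposing the same three services with LoadBalancer services would mean three load balancers instead of the one that sits in front of the Ingress controller.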
We will discuss three types of storage in this section:
- Block storage
- File storage
- Object storage
We'll also provide recommendations on how to use each type of storage in the most cost-effective way.
The initial consideration with regard to block storage is the IOPS or throughput you need. Cloud providers offer different storage classes, each with its own performance characteristics and price.
The interesting thing here is that with Kubernetes, you can define how big your volumes are. You can decide on the size on the fly, and even resize it. Most cloud providers let you increase the size of a volume after it is created, so you do not have to commit to a size up front.
Our recommendation for cost-saving and effective resource utilization is to start with small volumes and grow them as needed.
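One way to put "start small and grow" into practice is a simple growth rule, sketched below. The 80% threshold and 1.5x growth factor are arbitrary assumptions, and the helper names are hypothetical. The resulting patch only works when the volume's StorageClass has `allowVolumeExpansion: true`, and keep in mind that volumes can be grown but not shrunk.

```python
import math


def next_size_gib(capacity_gib, used_gib, threshold=0.8, growth=1.5):
    """Return the size to request next, or the current size if usage is below the threshold."""
    if used_gib / capacity_gib < threshold:
        return capacity_gib
    return math.ceil(capacity_gib * growth)


def resize_patch(size_gib):
    """Build the PersistentVolumeClaim patch body that requests the new size."""
    return {"spec": {"resources": {"requests": {"storage": f"{size_gib}Gi"}}}}


# A 10 GiB volume that is half full stays as it is; at 9 GiB used it grows.
print(next_size_gib(10, 5))
print(resize_patch(next_size_gib(10, 9)))
```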
You can also automate volume snapshots. This is especially useful for applications that are not running all the time. In this case, you restore the snapshot the next time the application runs and continue from there.
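The snapshot-and-restore flow can be sketched with two manifests: a `VolumeSnapshot` taken from an existing claim, and a new claim that uses the snapshot as its data source. This assumes a CSI driver with snapshot support and the snapshot CRDs installed in the cluster; the object names, snapshot class, storage class, and size are placeholders.

```python
import json

SNAPSHOT_API_GROUP = "snapshot.storage.k8s.io"


def snapshot_manifest(name, pvc_name, snapshot_class):
    """Build a VolumeSnapshot that captures the given PersistentVolumeClaim."""
    return {
        "apiVersion": f"{SNAPSHOT_API_GROUP}/v1",
        "kind": "VolumeSnapshot",
        "metadata": {"name": name},
        "spec": {
            "volumeSnapshotClassName": snapshot_class,
            "source": {"persistentVolumeClaimName": pvc_name},
        },
    }


def restore_claim_manifest(name, snapshot_name, storage_class, size):
    """Build a PersistentVolumeClaim pre-populated from a snapshot."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "storageClassName": storage_class,
            "dataSource": {
                "name": snapshot_name,
                "kind": "VolumeSnapshot",
                "apiGroup": SNAPSHOT_API_GROUP,
            },
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": size}},
        },
    }


# Hypothetical names for a batch job that runs once a night.
snapshot = snapshot_manifest("batch-data-snap", "batch-data", "csi-snapclass")
restored = restore_claim_manifest("batch-data-next", "batch-data-snap", "standard", "10Gi")
print(json.dumps(snapshot, indent=2))
print(json.dumps(restored, indent=2))
```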
Another option is to use ephemeral storage. Ephemeral volume storage comes along with pods. It is the volume space within the pod and it stays as long as the pod lives. The moment the pod is deleted or replaced, all the data disappears with it.
Kubernetes lets us request ephemeral storage that is backed by the disk attached to the instance, taking advantage of space that would otherwise go unused.
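As a sketch of how such a request looks, the pod manifest below asks for node-local scratch space through the `ephemeral-storage` resource and an `emptyDir` volume with a size limit. The pod name, image, mount path, and sizes are placeholders chosen for the example.

```python
import json


def scratch_pod(name, image, request, limit):
    """Build a Pod that uses node-local ephemeral storage as scratch space."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": "app",
                    "image": image,
                    "resources": {
                        # The scheduler uses the request to pick a node with enough local disk.
                        "requests": {"ephemeral-storage": request},
                        "limits": {"ephemeral-storage": limit},
                    },
                    "volumeMounts": [{"name": "scratch", "mountPath": "/scratch"}],
                }
            ],
            # emptyDir lives on the node and is removed together with the pod.
            "volumes": [{"name": "scratch", "emptyDir": {"sizeLimit": limit}}],
        },
    }


pod = scratch_pod("report-builder", "example.com/report-builder:1.0", "1Gi", "2Gi")
print(json.dumps(pod, indent=2))
```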
File storage aligns with the traditional concept of storage. It keeps the data in a neat, hierarchical structure. It can be a local filesystem, like hostPath, or it can be a network share like NFS. Kubernetes can directly mount it inside the pod without any additional preparation. File systems on one node can also be attached to many pods on that particular node. Usually, for this type of storage, you’ll pay for exactly what you use, as there’s no advance provisioning, up-front fees, or commitments. In some scenarios it can be cheaper than block storage, provided the latency it offers meets your requirements.
Object storage focuses on storing large amounts of unstructured data at scale. It can be used by enterprise workloads as well as for backup solutions, and it is offered by all the cloud providers. As of the writing of this post, there is no officially supported Kubernetes way to mount object storage inside pods, but there are workarounds. In most cases, you will work with object storage directly from your app using provider-specific libraries, which is how it's meant to work.
Cloud providers offer their own CSI solutions, but there are cool alternatives such as MinIO and Rook, which add some benefits over managed solutions. Running the storage directly in your clusters makes the solution portable and reduces provider costs, but keep in mind that it brings additional management overhead.
In terms of the data lifecycle, we sometimes see waste when it comes to using cloud provider solutions. There are buckets filled with objects that are no longer needed for compliance or any other purpose. People just forget to remove things.
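A periodic review can catch forgotten objects before they pile up. The sketch below works on a plain list of object records rather than a real provider API; the record shape, the 90-day cutoff, and the protected prefix are all assumptions, and in practice the listing would come from your provider's SDK or be replaced by the provider's own lifecycle rules.

```python
from datetime import datetime, timedelta, timezone


def stale_objects(objects, max_age_days, now=None, keep_prefixes=()):
    """Return objects older than max_age_days that are not under a protected prefix.

    Each object is a dict with a "key" string and a timezone-aware
    "last_modified" datetime.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    protected = tuple(keep_prefixes)
    return [
        obj
        for obj in objects
        if obj["last_modified"] < cutoff and not obj["key"].startswith(protected)
    ]


# Hypothetical bucket listing.
listing = [
    {"key": "logs/old.gz", "last_modified": datetime(2020, 1, 1, tzinfo=timezone.utc)},
    {"key": "logs/new.gz", "last_modified": datetime(2021, 6, 25, tzinfo=timezone.utc)},
    {"key": "compliance/audit.csv", "last_modified": datetime(2019, 5, 1, tzinfo=timezone.utc)},
]

review_date = datetime(2021, 7, 1, tzinfo=timezone.utc)
candidates = stale_objects(listing, 90, now=review_date, keep_prefixes=("compliance/",))
for obj in candidates:
    print(obj["key"])
```

The function only reports candidates; deleting them is left as a deliberate, separate step.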
In this post, we looked at how traffic and storage affect your costs. We also offered some strategies to take control of these costs and optimize your setup. In our next and final post, we will be looking at the tradeoffs you may sometimes need to make in order to optimize your setup. In addition, we will be giving recommendations for some open-source tools that can help with cost control and we'll close off with some considerations around cost allocation.
We at Giant Swarm are committed to creating the best stack for our customers. If you would like to talk to us about optimizing your cloud-native stack, contact us — we're here to help!
About the author
Fernando Ripoll is a Solution Architect at Giant Swarm. He has hosted webinars on Evolving from Infrastructure as Code to Platform as Code with Kubernetes and has presented at DevOps Barcelona — and that's just scratching the surface. When he's not sharing his expertise with customers like Vodafone, he can be found in nature with his family. Find him on Twitter and say hola! 👋