May 9, 2018
We started building our azure-operator in the fall of 2017. One of the challenges we faced was the networking architecture. We evaluated several possible architectures and chose the one that came out ahead on most of our criteria. We hope this post will help people who are setting up their own Azure clusters with decent networking. First, let’s look at the available options for Kubernetes networking in Azure.
The first option was to use default Calico with BGP. We use this option in our on-prem and AWS clusters. On Azure we ran into some known limitations: IPIP tunnel traffic and traffic from unknown IP addresses are not allowed.
The second option was to use the Azure Container Networking (ACN) CNI plugin. ACN uses Azure network resources (network interfaces, routing tables, etc.) to provide network connectivity for containers. Native Azure networking support was really promising, but after spending some time on it we were not able to run it successfully (issue). We did our evaluation in November 2017. At the time of writing (May 2018) many things have changed, and ACN is now used by the official Azure installer.
The third and most trivial option was to use overlay networking. Canal (Calico in policy-only mode plus Flannel) gives us the benefits of network policies, and overlay networking is universal, meaning it can run anywhere. This is a very good option. The only drawback is the VXLAN overlay itself, which, like any other overlay, carries a performance penalty.
Finally, the best option was to use a mix of Calico in policy-only mode and the Kubernetes native (Azure cloud provider) support for Azure user-defined routes.
This approach has multiple benefits:
The only difference from default Calico is that BGP networking is disabled. Instead, we enable node CIDR allocation in Kubernetes. This is briefly described in the official Calico guide for Azure. Responsibility for routing pod traffic lies with the Azure route table instead of BGP (which Calico uses by default). This is shown in the diagram below.
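To make the routing model concrete, here is a minimal sketch of the idea: every node owns one pod CIDR, and the route table holds one route per node that sends that CIDR to the node's VM address. The node names, VM addresses and per-node CIDRs below are hypothetical, and the lookup is a plain longest-prefix match written for illustration, not a call into any Azure or Calico API.

```python
import ipaddress

# Hypothetical nodes: name -> (VM IP, pod CIDR allocated to that node).
NODES = {
    "node-0": ("10.0.0.4", "10.0.128.0/24"),
    "node-1": ("10.0.0.5", "10.0.129.0/24"),
    "node-2": ("10.0.0.6", "10.0.130.0/24"),
}


def build_route_table(nodes):
    """Return one (destination prefix, next hop) route per node."""
    return [
        (ipaddress.ip_network(pod_cidr), ipaddress.ip_address(vm_ip))
        for vm_ip, pod_cidr in nodes.values()
    ]


def next_hop(routes, pod_ip):
    """Return the next hop of the most specific matching route, or None."""
    addr = ipaddress.ip_address(pod_ip)
    matches = [(prefix, hop) for prefix, hop in routes if addr in prefix]
    if not matches:
        return None
    _, hop = max(matches, key=lambda route: route[0].prefixlen)
    return hop


routes = build_route_table(NODES)
print(next_hop(routes, "10.0.129.17"))  # 10.0.0.5
```

A packet for pod 10.0.129.17 is handed to the VM at 10.0.0.5 without any encapsulation, which is the property that sets this setup apart from the overlay option.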
To better understand the services configuration, let’s look at the next diagram.
The following checklist can help with configuring your cluster to use Azure route tables with Calico network policies.
--allocate-node-cidrs set to true and a proper subnet in the --cluster-cidr parameter. The subnet should be part of the Azure virtual network and will be used by pods. For example, for virtual network 10.0.0.0/16 and VM subnet 10.0.0.0/24 we can pick the second half of the virtual network, which is 10.0.128.0/17. By default Kubernetes allocates a /24 subnet per node, which gives each node 256 pod addresses. This is controlled by
routeTableName parameter set to the name of the routing table. In general, having a properly configured cloud provider config in Azure is very important, but this is out of the scope of this post. All available options for the Azure cloud provider can be found here.
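The address arithmetic in the checklist can be checked with a short sketch. It uses the example values from the text (virtual network 10.0.0.0/16, VM subnet 10.0.0.0/24, pod subnet 10.0.128.0/17, /24 per node). The function names, the `node_mask_bits` argument and the dictionary form of the cloud provider config are assumptions made for illustration; only `routeTableName` is taken from the post, and the route table name used in the usage line is made up.

```python
import ipaddress


def check_pod_network(vnet, vm_subnet, cluster_cidr, node_mask_bits=24):
    """Validate the IPv4 address plan; return (max_nodes, addresses_per_node)."""
    vnet = ipaddress.ip_network(vnet)
    vm_subnet = ipaddress.ip_network(vm_subnet)
    cluster = ipaddress.ip_network(cluster_cidr)

    # The pod subnet has to live inside the virtual network...
    if not cluster.subnet_of(vnet):
        raise ValueError(f"{cluster} is not inside virtual network {vnet}")
    # ...and must not collide with the addresses used by the VMs.
    if cluster.overlaps(vm_subnet):
        raise ValueError(f"{cluster} overlaps the VM subnet {vm_subnet}")
    if node_mask_bits < cluster.prefixlen:
        raise ValueError("per-node mask is shorter than the cluster CIDR prefix")

    max_nodes = 2 ** (node_mask_bits - cluster.prefixlen)
    addresses_per_node = 2 ** (32 - node_mask_bits)
    return max_nodes, addresses_per_node


def check_cloud_config(config):
    """Return True if the (hypothetical dict form of the) config names a route table."""
    return bool(config.get("routeTableName"))


print(check_pod_network("10.0.0.0/16", "10.0.0.0/24", "10.0.128.0/17"))
# (128, 256)
print(check_cloud_config({"routeTableName": "example-route-table"}))
# True
```

With these example values the pod subnet has room for 128 per-node blocks of 256 addresses each, so the choice of the pod subnet size also caps the number of nodes.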
This post has shown some of the Kubernetes networking options in Azure and why we chose the solution we’re using. If you’re running Kubernetes on Azure or thinking about it, you can read more about our production-ready Kubernetes on Azure. If you’re interested in learning more about how Giant Swarm can run your Kubernetes on Azure, get in touch.