In this post, I want to share how I would set up a new Kubernetes cluster on Azure. The tips come from several years of operating Kubernetes clusters, the last year mainly on Azure.
Disclaimer: Everything is opinionated and adapted to my needs.
As the official documentation recommends, I also suggest setting up two separate node pools (system and worker). You should also enable strict scheduling, which means your workloads are scheduled only on the worker node pool. It is also best to spread the nodes of each node pool across availability zones: this makes your cluster more fault-tolerant and raises the guaranteed uptime when using an SLA (see next section).
For the worker node pool, you can also enable the cluster autoscaler. This is useful when you deploy autoscaling workloads (for example, using Horizontal Pod Autoscaling) or many CronJobs. But always set an upper limit on the node count, so that if something does not work as expected, you will not be shocked when you receive the invoice at the end of the month.
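To get a feeling for what an upper limit protects you from, here is a small sketch that estimates the worst-case monthly bill of an autoscaled node pool. The hourly price and node counts are made-up placeholder values, not real Azure prices, and the function names are purely illustrative.

```python
# Rough worst-case cost estimate for an autoscaled node pool.
# All prices and node counts below are hypothetical placeholders.

HOURS_PER_MONTH = 730  # average number of hours in a month (8760 / 12)


def monthly_cost(node_count: int, hourly_price: float) -> float:
    """Cost of running `node_count` nodes for a whole month."""
    return node_count * hourly_price * HOURS_PER_MONTH


def worst_case_extra_cost(min_nodes: int, max_nodes: int, hourly_price: float) -> float:
    """Extra cost if the autoscaler stays at max_nodes for the whole month."""
    return monthly_cost(max_nodes, hourly_price) - monthly_cost(min_nodes, hourly_price)


if __name__ == "__main__":
    price = 0.20  # hypothetical price per node and hour
    print("baseline (3 nodes):", monthly_cost(3, price))
    print("worst case (10 nodes):", monthly_cost(10, price))
    print("extra if stuck at max:", worst_case_extra_cost(3, 10, price))
```

If the worst-case number is more than you are willing to pay, lower the maximum node count before enabling the autoscaler.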
This article helped me choose the correct VM instance for my nodes. Also, remember: the bigger the node, the higher the percentage of resources you can use for your workloads (see resource reservations).
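The effect of node size on usable resources can be illustrated with a small calculation. The sketch below assumes a regressive memory reservation scheme (25% of the first 4 GiB, 20% of the next 4 GiB, 10% of the next 8 GiB, 6% of the next 112 GiB, 2% of anything above 128 GiB), modelled on what AKS has documented for older Kubernetes versions. Treat the tiers as an assumption and check the current resource reservation documentation for your version; the eviction threshold and CPU reservations are ignored here.

```python
# Illustrative only: the tiers are an assumption modelled on a regressive
# reservation scheme; real values depend on the AKS / Kubernetes version.

# (tier size in GiB, reserved fraction); None means "everything above"
TIERS = [(4, 0.25), (4, 0.20), (8, 0.10), (112, 0.06), (None, 0.02)]


def reserved_memory_gib(total_gib: float) -> float:
    """Memory reserved for system components under the assumed tiers."""
    remaining = total_gib
    reserved = 0.0
    for size, fraction in TIERS:
        if remaining <= 0:
            break
        chunk = remaining if size is None else min(remaining, size)
        reserved += chunk * fraction
        remaining -= chunk
    return reserved


def usable_fraction(total_gib: float) -> float:
    """Share of node memory left for workloads."""
    return (total_gib - reserved_memory_gib(total_gib)) / total_gib


if __name__ == "__main__":
    for size in (8, 16, 32, 64, 128):
        print(f"{size:>4} GiB node: {usable_fraction(size):.1%} usable")
```

Under these assumptions the usable share grows with every step up in node size, which is the reason fewer, larger nodes often waste less memory than many small ones.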
Tip: Use https://azureprice.net to compare prices for VMs available in your region.
Tip: Once you have found your perfect VM size, create a reservation to save money. Depending on your cluster size, the savings can be huge! Azure is also quite flexible if you later want to change your reservation.
Tip: Set a monthly budget and corresponding alerts so you can react quickly if something gets out of control.
The Azure Kubernetes Service is fully managed, which means you have no control over the control plane nodes, but you still have plenty of options for configuration.
At least for your production cluster, you should enable the SLA. It guarantees an uptime of 99.95% if you use availability zones and 99.9% otherwise, and it also gives the control plane components more resources.
Depending on your workloads, you might also want to consider enabling Azure Monitor. It gives you insights that are important for debugging and help you right-size your workloads. But be aware that this can be pricey! I’ve seen clusters where the monitoring costs were higher than the costs for the VMs.
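To make the difference between the two SLA levels more tangible, the following sketch converts an uptime percentage into the downtime it still allows. The 30-day month is a simplifying assumption, and the function name is only illustrative.

```python
# Convert an uptime guarantee into the downtime it still permits.

def allowed_downtime_minutes(sla_percent: float, days: int = 30) -> float:
    """Minutes of downtime permitted by `sla_percent` over `days` days."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)


if __name__ == "__main__":
    for sla in (99.9, 99.95):
        minutes = allowed_downtime_minutes(sla)
        print(f"{sla}% uptime allows about {minutes:.1f} minutes of downtime per 30 days")
```

With these numbers, using availability zones halves the downtime the SLA tolerates, from roughly 43 minutes to roughly 22 minutes per 30 days.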
Configure a static IP address for inbound and outbound traffic. This makes DNS configuration for your workloads easier and allows you to use allowlists for external resources you need to access from inside the cluster (e.g., databases).
As the Kubernetes Hardening Guide recommends, you should restrict access to the API server to a limited set of IP addresses. This helps to reduce the attack surface. For node security, you should install kured on your cluster. Nodes are patched automatically, but some updates require a reboot, which does not happen automatically. Kured reboots your nodes when needed.
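Before you lock down the API server, it is worth checking that your own addresses (office, VPN, CI runners) are actually covered by the ranges you plan to allow. Here is a minimal sketch using Python's standard ipaddress module; the CIDR ranges are documentation example addresses, not real ones, and the helper name is hypothetical.

```python
import ipaddress

# Hypothetical allowlist; replace with the ranges you plan to authorize.
ALLOWED_RANGES = ["203.0.113.0/24", "198.51.100.7/32"]


def is_allowed(client_ip: str, allowed_ranges: list[str]) -> bool:
    """Return True if client_ip falls inside any of the allowed CIDR ranges."""
    address = ipaddress.ip_address(client_ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in allowed_ranges)


if __name__ == "__main__":
    for ip in ("203.0.113.42", "198.51.100.7", "192.0.2.1"):
        print(ip, "allowed" if is_allowed(ip, ALLOWED_RANGES) else "blocked")
```

Running such a check first helps you avoid locking yourself out of your own cluster.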
Tip: Configure Pod Disruption Budgets and make your workloads redundant, so you get no service interruptions when a node is rebooted. If you are using the cluster autoscaler, depending on the settings, these interruptions might also occur when the cluster is scaled.
Tip: If you are using Terraform or something similar, don’t declare your public IP addresses within the same state as your cluster because you never want to lose them.
If you are deploying multiple applications to your cluster that need to be reachable from the internet, you will probably need the following tools:
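As an illustration of the Pod Disruption Budget tip, the sketch below builds a PodDisruptionBudget manifest as a plain Python dictionary and prints it as JSON. The name, namespace, and labels are hypothetical; the helper function is only for illustration and is not part of any library.

```python
import json


def pod_disruption_budget(name: str, namespace: str,
                          match_labels: dict, min_available: int) -> dict:
    """Build a policy/v1 PodDisruptionBudget manifest as a dictionary."""
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # At least this many matching pods must stay up during
            # voluntary disruptions such as node drains.
            "minAvailable": min_available,
            "selector": {"matchLabels": match_labels},
        },
    }


if __name__ == "__main__":
    # Hypothetical workload: a "web" deployment running several replicas.
    manifest = pod_disruption_budget("web-pdb", "default", {"app": "web"}, 1)
    print(json.dumps(manifest, indent=2))
```

Because kubectl accepts JSON as well as YAML, the printed output can be saved to a file and applied as-is. Keep in mind that a budget only helps if the workload runs more replicas than minAvailable; otherwise it blocks node drains instead of protecting you.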
I’ve decided to deploy these tools to the system node pool using tolerations because they are crucial for my workloads.
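A sketch of what this could look like for a pod spec follows. It assumes that the system node pool carries the taint CriticalAddonsOnly=true:NoSchedule and that its nodes have the label kubernetes.azure.com/mode=system; verify both on your own cluster (for example with kubectl describe node) before relying on them. The helper function is hypothetical.

```python
import copy

# Assumed taint and label of the system node pool; verify on your cluster.
SYSTEM_POOL_TOLERATION = {
    "key": "CriticalAddonsOnly",
    "operator": "Exists",
    "effect": "NoSchedule",
}
SYSTEM_POOL_SELECTOR = {"kubernetes.azure.com/mode": "system"}


def pin_to_system_pool(pod_spec: dict) -> dict:
    """Return a copy of pod_spec that tolerates and selects the system pool."""
    spec = copy.deepcopy(pod_spec)  # leave the caller's spec untouched
    tolerations = spec.setdefault("tolerations", [])
    if SYSTEM_POOL_TOLERATION not in tolerations:
        tolerations.append(dict(SYSTEM_POOL_TOLERATION))
    spec.setdefault("nodeSelector", {}).update(SYSTEM_POOL_SELECTOR)
    return spec


if __name__ == "__main__":
    # Hypothetical pod spec of a cluster-wide tool.
    original = {"containers": [{"name": "tool", "image": "example/tool:1.0"}]}
    print(pin_to_system_pool(original))
```

The toleration alone only permits scheduling on the system pool; the node selector is what actually pins the pods there.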
Tip: Keep your cluster stateless. This means that you don’t store any primary data inside the cluster. For example, you can use an external database or a Storage Account. This helps you migrate workloads between clusters and limits the pain if you need to recreate your entire cluster.
One add-on I like to recommend is kubecost. Kubecost gives insights into which workloads or namespaces are consuming the most resources and makes it possible to bill your departments or customers accordingly.