Auto shutdown dev/test AKS clusters to save money

You want to experiment with Azure Kubernetes Service (AKS) but don’t want the expense of a cluster running 24/7, and you don’t have time to wait the 10–15 minutes it usually takes to create a new cluster from scratch. Instead of creating a cluster every time you need one and deleting it again afterwards, you can provision a cluster once and use the built-in auto-shutdown feature of Azure VMs to shut down the cluster nodes on a schedule, reducing the cost of running the cluster to close to zero.^[The cost isn’t exactly zero because you’ll likely be paying a nominal monthly fee for some other services such as managed disks, public IPs etc.] When you need the cluster you can spin the nodes back up in a few minutes. Scripts to start and stop the nodes can be found below.

How does this work?

Unlike some other managed Kubernetes providers, Azure doesn’t charge a fee for the AKS service itself; you only pay the standard IaaS costs of running a Kubernetes cluster. The majority of the cost of an AKS cluster is the VM node pool (the VMs where your containers run). Because you don’t pay for the master nodes on AKS, if you can reduce the cost of the worker nodes you can save a substantial amount of money on clusters that don’t need to be up and running 24/7.

Enabling the auto-shutdown

At the time of writing I don’t believe it’s possible to enable the auto-shutdown feature of a VM from the CLI, so you’ll need to enable it for each node of your cluster from the portal. First, find the resource group that was created for you when your cluster was provisioned; it should be named using the pattern MC_*resourcegroupname_clustername_location*. Within that resource group you should find the VMs, and if you click on each of those VMs you should see the auto-shutdown option in the left-hand pane.
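If you’d rather script the lookup of that resource group, its name can be derived from the naming pattern above. A minimal sketch (the `mc_group` helper and the example names are mine, not part of the AKS tooling):

```shell
#!/usr/bin/env sh
# Derive the infrastructure resource group that AKS creates, following
# the MC_<resourcegroupname>_<clustername>_<location> naming pattern.
mc_group() {
  echo "MC_${1}_${2}_${3}"
}

# Example: a cluster named "devcluster" in resource group "devrg" in westeurope.
group=$(mc_group devrg devcluster westeurope)
echo "$group"   # MC_devrg_devcluster_westeurope

# List the node VMs in that group so you know which ones need
# auto-shutdown enabling in the portal:
# az vm list -g "$group" -o table
```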

Starting the nodes

You will need to ensure you have the Azure CLI installed. To start the nodes you just need to know the name of the resource group where your infrastructure resides, which is named using the pattern MC_*resourcegroupname_clustername_location*:

az vm start --ids $(az vm list -g <resourcegroupname> --query "[].id" -o tsv)
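If you’d rather not work the group name out by hand, the CLI can resolve it from the cluster itself. A sketch, assuming a cluster called `mycluster` in resource group `myrg` (both placeholders):

```shell
# Resolve the node resource group from the cluster
# (placeholder names: myrg / mycluster).
NODE_RG=$(az aks show -g myrg -n mycluster --query nodeResourceGroup -o tsv)

# Start every VM in that group.
az vm start --ids $(az vm list -g "$NODE_RG" --query "[].id" -o tsv)
```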

Stopping the nodes

Your nodes will auto-shutdown based on the schedule you’ve set up, but you can also script the nodes to shut down on demand. In Azure you want to deallocate VMs rather than just stop them: a stopped VM still incurs compute charges, while a deallocated one does not.

az vm deallocate --ids $(az vm list -g <resourcegroupname> --query "[].id" -o tsv)
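To confirm the nodes really are deallocated and not merely stopped, you can check their power state; `-d` asks `az vm list` to include instance details:

```shell
# Show each node's power state; fully deallocated VMs report "VM deallocated".
az vm list -g <resourcegroupname> -d \
  --query "[].{name:name, state:powerState}" -o table
```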

Final notes

This isn’t an officially sanctioned approach and I wouldn’t use it on any clusters that serve production workloads (even those that only have traffic during certain periods). I can’t guarantee it won’t cause any weird issues, although I’ve been using this trick for several months, including during the transition from preview to GA. If you scale the cluster up, your new nodes will not have the auto-shutdown feature enabled automatically, and if you upgrade your cluster you may lose the auto-shutdown settings. You can penny-pinch further by using the B2 series VMs (thanks to Gordon Byers for highlighting that) and by scaling your node count down to 1; you can set these parameters when you initially create the cluster using the CLI.
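For reference, a sketch of creating such a penny-pinching cluster with the CLI, assuming the resource group already exists (the names are placeholders, and you should check `az aks create --help` for the flags available in your CLI version):

```shell
# One B-series node keeps the running cost of the node pool minimal.
az aks create \
  --resource-group myrg \
  --name devcluster \
  --node-count 1 \
  --node-vm-size Standard_B2s \
  --generate-ssh-keys
```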

If you do run into issues, it’s sometimes possible to recover from them by performing an in-place cluster upgrade.

It should be possible to create an Azure CLI extension which wraps this behaviour into a better experience; I’ll follow this post up with how to do that.