"High-ish" availability cross-region architectures

TL;DR

The architectures of a cross-region highly available (HA) application and a single-region application are often seen as quite distinct. This post will show that the choice doesn’t have to be that clear cut. With a few small changes we can recover from a region outage more quickly, without the expense of running redundant copies of our solution.

Introduction

High availability concepts and architecture are a big topic, but I wanted to cover one specific aspect of HA: configuring architectures across datacentre regions. This post doesn’t cover other important aspects of HA and resilience, nor recovery point objectives.

When you’re creating Azure architectures you should always consider your high availability options, especially in the context of your agreed Recovery Time Objective (RTO). Many services in Azure already have a degree of resilience within a region. However, to protect against a whole region outage you may want to consider a cross-region HA deployment.

Thankfully, whole region outages are becoming rare, and with the rollout of Availability Zones we should have even more resilience within a single region. To set up cross-region HA you typically provision multiple instances of your architecture across regions and route the traffic as required, e.g. Active-Passive or Active-Active.

However, if your RTO allows it, it may be more cost effective to set up in a single region but be prepared to quickly provision infrastructure into another region in the event of an outage. Choosing a single region doesn’t mean you can’t take steps to make failing over to another region easier, without incurring the same level of cost as a fully redundant solution.

Cross-region HA setup

Single region setup

Without the cross-region requirement your architecture can be simplified.

Imagine the steps required to get up and running in a new region are:

  1. Provision new infrastructure in the secondary region.
  2. Restore SQL database(s).
  3. Update DNS records to point to the secondary region (assuming you’re using a custom domain in an external registry).

Step 1 should be relatively straightforward, especially if you have the ARM templates ready to go and good support for application deployment.

Step 2 should be manageable (assuming you have the relevant SQL backups at the correct frequency), but restoring large databases can be time-consuming.

Step 3, however, could be problematic, especially if you don’t have direct control of the DNS records. You will likely need to raise a change with whoever controls the DNS records and then wait for the changes to propagate, all of which eats into your RTO.
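One way to reason about whether the three steps above fit within an agreed RTO is to add up a rough estimate for each step. The Python sketch below does exactly that. The step durations and the 4-hour RTO are made-up placeholder figures rather than measurements, and `FailoverStep`, `total_minutes` and `fits_rto` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class FailoverStep:
    name: str
    estimated_minutes: float  # rough estimate; replace with figures from your own failover drills


def total_minutes(steps):
    """Sum the estimated duration of steps that run one after another."""
    return sum(step.estimated_minutes for step in steps)


def fits_rto(steps, rto_minutes):
    """Return True if the sequential steps complete within the RTO."""
    return total_minutes(steps) <= rto_minutes


# Placeholder estimates only (assumptions, not measured values).
steps = [
    FailoverStep("Provision infrastructure in secondary region", 45),
    FailoverStep("Restore SQL database(s)", 90),
    FailoverStep("Raise DNS change and wait for propagation", 180),
]

rto_minutes = 240  # assumed 4-hour RTO
print(total_minutes(steps), fits_rto(steps, rto_minutes))
```

Plugging in your own estimates makes it easier to see which step dominates the recovery time and is therefore worth preparing in advance.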

“HA-Ready” options

Mitigate DNS issues using Traffic Manager

If you’re curious about how much this approach would cost, it depends on the number of DNS queries resolved by Traffic Manager. To give you an indication, 5 million DNS queries/month plus 1 Azure health check costs approximately £2/month for a Traffic Manager in West Europe.
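The estimate above has a simple shape: a charge per million DNS queries plus a charge per health check. A minimal sketch of that arithmetic follows. It assumes flat rates with no volume tiers, and the rates passed in are illustrative placeholders rather than Azure’s published prices, so check the current pricing page for your currency before relying on any figure. The function name `traffic_manager_monthly_cost` is hypothetical.

```python
def traffic_manager_monthly_cost(dns_queries, health_checks,
                                 rate_per_million_queries, rate_per_health_check):
    """Estimate a monthly bill as query charges plus health check charges.

    Both rates are inputs because pricing varies by currency and over time.
    Simplification: flat rates, no volume tiers.
    """
    query_cost = (dns_queries / 1_000_000) * rate_per_million_queries
    health_check_cost = health_checks * rate_per_health_check
    return round(query_cost + health_check_cost, 2)


# Hypothetical rates for illustration only (not real Azure prices).
estimate = traffic_manager_monthly_cost(
    dns_queries=5_000_000,
    health_checks=1,
    rate_per_million_queries=0.40,
    rate_per_health_check=0.25,
)
print(estimate)
```

Keeping the rates as parameters means the same function can be rerun with real prices and your own expected query volume.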

You may also be wondering whether Traffic Manager is a single point of failure in the architecture above. It should come as no surprise that it is highly resilient and can tolerate a region outage.

Retaining Traffic Manager and SQL geo-replication

Summary

Deciding between single region and cross-region architecture doesn’t have to be black or white. With a bit of creativity you can find a middle ground which is cost effective and allows you to recover from disaster with less effort and more reliably.

If you’d like to discuss your current Azure architecture, get in touch with us using the contact form. We can perform an Azure Health check, reviewing both your Azure architecture and your application architecture.