Quick Summary
High availability is one of any cloud application’s most important non-functional requirements. It means that the application stays accessible and functional despite failures or disruptions in the underlying infrastructure or services. It is usually measured as the percentage of time the application is guaranteed to be up over a given period. For example, an application with 99.99% availability is expected to be down for no more than about 4.38 minutes per month (0.01% of an average 730-hour month).
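The downtime arithmetic can be sketched in a few lines of Python. This is a minimal illustration, not an official calculator: the function name `downtime_minutes` and the constants are assumptions made for this example, and the month is taken as an average of 730 hours.

```python
# Assumed period lengths: an average month (8,760 hours / 12) and a non-leap year.
MINUTES_PER_MONTH = 730 * 60
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_minutes(availability_percent: float, period_minutes: float) -> float:
    """Return the downtime allowed in a period for a given availability percentage."""
    return (1 - availability_percent / 100) * period_minutes


monthly = downtime_minutes(99.99, MINUTES_PER_MONTH)
yearly = downtime_minutes(99.99, MINUTES_PER_YEAR)
print(f"{monthly:.2f} minutes per month")  # 4.38 minutes per month
print(f"{yearly:.2f} minutes per year")    # 52.56 minutes per year
```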
Yet ensuring high availability isn’t simple, particularly when an application relies on several cloud services with differing availability levels and service level agreements (SLAs). An SLA is an agreement between a service provider and a customer that outlines the expected service performance and quality, along with the consequences of any violations. For example, Azure offers different SLAs for its services, ranging from 99.9% to 99.999% availability [1].
How can we communicate such SLAs to the business users and stakeholders of the application? One way is to use a table or a chart showing the availability of each service and of the application as a whole, together with the corresponding downtime per month and per year. For example, for the three Azure services used in the worked example later in this article, we can create a table like this:
| Service | Availability | Downtime per month | Downtime per year |
|---|---|---|---|
| Azure VMs | 99.99% | 4.38 minutes | 52.56 minutes |
| Azure SQL Database | 99.995% | 2.19 minutes | 26.28 minutes |
| Azure Storage | 99.9% | 43.8 minutes | 8.76 hours |
| Application | 99.885% | ≈ 50.4 minutes | ≈ 10.07 hours |
This table can help the business users and stakeholders understand the impact of the SLAs on the availability and reliability of the application, as well as the trade-offs between cost and performance. For example, a service with a higher SLA may cost more, but it reduces downtime and improves the user experience.
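A table like this can also be generated rather than typed by hand. The following is a minimal sketch, assuming a 730-hour average month and a 365-day year; the helper names `format_duration` and `sla_row` are made up for this example.

```python
# Assumed period lengths: an average month (730 hours) and a non-leap year.
MINUTES_PER_MONTH = 730 * 60
MINUTES_PER_YEAR = 365 * 24 * 60


def format_duration(minutes: float) -> str:
    """Show a duration in minutes, switching to hours from one hour upward."""
    if minutes >= 60:
        return f"{minutes / 60:.2f} hours"
    return f"{minutes:.2f} minutes"


def sla_row(name: str, availability_percent: float) -> str:
    """Build one pipe-delimited table row for a service and its SLA."""
    unavailable = 1 - availability_percent / 100
    monthly = format_duration(unavailable * MINUTES_PER_MONTH)
    yearly = format_duration(unavailable * MINUTES_PER_YEAR)
    return f"| {name} | {availability_percent:.3f}% | {monthly} | {yearly} |"


# SLA percentages taken from the table above.
services = {"Azure VMs": 99.99, "Azure SQL Database": 99.995, "Azure Storage": 99.9}

print("| Service | Availability | Downtime per month | Downtime per year |")
print("|---|---|---|---|")
for name, sla in services.items():
    print(sla_row(name, sla))
```

Generating the rows from the SLA percentages keeps the downtime columns consistent whenever a service or its SLA changes.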
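How can we calculate the overall availability of an application that uses multiple cloud services with different SLAs? When the application needs every one of those services to be up, and the services fail independently of each other, the individual availabilities multiply: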
Availability = Availability of Service 1 * Availability of Service 2 * … * Availability of Service N
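As a rough sketch, the formula translates into a small Python function. The name `composite_availability` is an assumption for this example, availabilities are passed as fractions between 0 and 1, and the two 99.9% values in the usage line are purely illustrative.

```python
import math
from typing import Iterable


def composite_availability(availabilities: Iterable[float]) -> float:
    """Multiply the availabilities of services that must all be up at once."""
    values = list(availabilities)
    if not values:
        raise ValueError("at least one service availability is required")
    for value in values:
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"availability must be a fraction between 0 and 1, got {value}")
    return math.prod(values)


# Illustrative values only: two dependencies at 99.9% each.
combined = composite_availability([0.999, 0.999])
print(f"{combined:.4%}")  # 99.8001%
```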
For example, suppose we have an application that uses the following Azure services:
- Azure Virtual Machines (VMs) deployed across Availability Zones with premium disks, which have a 99.99% SLA [2].
- Azure SQL Database with Zone-Redundant Storage, which has a 99.995% SLA [3].
- Azure Storage with Geo-Redundant Storage, which has a 99.9% SLA [2].
Using the formula, we can calculate the availability of the application as:
Availability = 0.9999 * 0.99995 * 0.999 ≈ 0.99885
This means the application is available about 99.885% of the time, which is lower than any of the individual services. Every service in the chain adds its own chance of failure, so the combined figure can never exceed that of the weakest link, which here is Azure Storage at 99.9%. Therefore, to improve the availability of the application, we need to either increase the availability of the lowest-SLA service or reduce the dependency on that service.
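To see which service limits the result most, one can recompute the product with each service left out in turn. This is only a what-if sketch: dropping a service stands in for removing the dependency entirely, which is an upper bound on what any improvement to that service could achieve.

```python
import math

# SLAs from the example above, expressed as fractions.
slas = {"Azure VMs": 0.9999, "Azure SQL Database": 0.99995, "Azure Storage": 0.999}

baseline = math.prod(slas.values())
print(f"baseline: {baseline:.3%}")  # baseline: 99.885%

# Availability of the application if one dependency were removed.
without = {
    name: math.prod(value for other, value in slas.items() if other != name)
    for name in slas
}
for name, availability in without.items():
    print(f"without {name}: {availability:.3%}")
# without Azure VMs: 99.895%
# without Azure SQL Database: 99.890%
# without Azure Storage: 99.985%
```

Removing the dependency on Azure Storage lifts the result far more than removing either of the other two, which is the weakest-link effect in numbers.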
Finally, what are the most cost-effective and business-critical availability SLAs to target? The answer depends on the nature and requirements of the application, as well as the budget and expectations of the customers. For example, an e-commerce application may need a higher availability SLA than a personal blog because downtime may result in lost revenue and customer dissatisfaction. However, a higher availability SLA may also require more resources and complexity, which may increase the cost and maintenance of the application. Therefore, one should balance the benefits and costs of the SLAs and choose the ones that best suit the needs and goals of the application.
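One hedged way to frame this balance is to compare the yearly price of an availability tier with the expected cost of the downtime it still allows. Every figure in the sketch below (tier prices and cost per hour of downtime) is a made-up placeholder rather than a real price, and the function name is hypothetical.

```python
HOURS_PER_YEAR = 365 * 24


def expected_downtime_cost(availability: float, cost_per_downtime_hour: float) -> float:
    """Expected yearly cost of the downtime allowed at a given availability."""
    return (1 - availability) * HOURS_PER_YEAR * cost_per_downtime_hour


# Placeholder tiers: availability as a fraction, price per year in any currency.
tiers = {
    "99.9% tier": {"availability": 0.999, "annual_price": 5_000},
    "99.99% tier": {"availability": 0.9999, "annual_price": 12_000},
}

# Placeholder business impact: a busy shop versus a personal blog.
for label, cost_per_hour in {"e-commerce": 1_000, "personal blog": 10}.items():
    for tier_name, tier in tiers.items():
        downtime_cost = expected_downtime_cost(tier["availability"], cost_per_hour)
        total = tier["annual_price"] + downtime_cost
        print(f"{label}, {tier_name}: total yearly cost {total:,.0f}")
```

With these placeholder numbers the higher tier pays for itself only when an hour of downtime is expensive, which mirrors the e-commerce versus personal blog contrast above.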
In conclusion, high availability is a key non-functional requirement for any cloud application, but it takes work to achieve and measure. It is crucial to understand the service level agreements (SLAs) of the cloud services in use, because they allow the application’s overall availability to be calculated with a simple formula. Communicating these SLAs and the application’s availability to business users and stakeholders via a table or chart is just as important. One should also consider the trade-offs between cost and performance and select the most appropriate SLAs for the application. By following these steps, one can design and operate a highly available cloud application that meets the expectations and requirements of the customers.
At Upsquare Consultancy Services, our cloud experts can help you calculate the right SLAs for your application by understanding your business needs and designing the right cloud architecture.
References:
https://azure.microsoft.com/en-us/blog/understanding-and-leveraging-azure-sql-database-sla/