By Erin Thomson | January 16, 2023

In December 2022, Alibaba Cloud, the third-largest infrastructure as a service (IaaS) provider, experienced a major outage that shut down many regional services, including a major cryptocurrency exchange and the Monetary Authority of Macau’s website. This post examines the event as an example of why cloud users must include redundancy in their cloud Disaster Recovery plans.

The largest cloud outage in recent Chinese cloud provider history

According to Alibaba Cloud’s status page, "The refrigeration equipment failure in the PCCW IDC site caused the Anomaly in Alibaba Cloud Hong Kong Region". The following morning, cryptocurrency exchange OKX tweeted, "There was an intermittent connection error with our cloud provider, which is affecting the user experience," in response to user complaints about difficulties withdrawing funds. According to TechCrunch, the outage lasted up to a day for some customers and was one of the longest and most serious in recent Chinese cloud provider history. The event will have long-term consequences not only for Alibaba Cloud but also for its customers. Let's look at why.

Alibaba Cloud's compensation won't even start to cover the cost to its customers

According to the Uptime Institute's 2022 Outage Analysis Report, more than 60% of outages cost businesses more than $100,000, and 15% cost more than $1 million. Given the severity of this outage, we can assume that the cost of downtime for Alibaba Cloud’s customers was significant. Alibaba stated that customers affected by the service outage would be compensated in accordance with their product and service agreements. While some Cloud Service Providers (CSPs) advertise guaranteed uptime, this typically means they offer small rebates if the SLA is not met, usually in the form of fee discounts, as discussed in our white paper. Such rebates in no way cover the loss of revenue, brand damage, and lost productivity caused by an outage.
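To make that gap concrete, here is a rough back-of-the-envelope sketch in Python. The monthly fee and the credit rate are hypothetical figures chosen purely for illustration; they are not Alibaba Cloud's actual SLA terms. The $100,000 outage cost is the threshold from the Uptime Institute figure quoted above.

```python
def sla_credit(monthly_fee: float, credit_rate: float) -> float:
    """Return the fee discount owed when the SLA is missed."""
    return monthly_fee * credit_rate


def uncovered_loss(outage_cost: float, credit: float) -> float:
    """Return the part of the outage cost the credit does not cover."""
    return max(outage_cost - credit, 0.0)


# Hypothetical inputs, for illustration only.
monthly_fee = 20_000.0   # assumed monthly cloud bill in USD
credit_rate = 0.25       # assumed 25% fee discount for a missed SLA
outage_cost = 100_000.0  # threshold cited from the Uptime Institute report

credit = sla_credit(monthly_fee, credit_rate)
shortfall = uncovered_loss(outage_cost, credit)

print(f"SLA credit:     ${credit:,.0f}")
print(f"Uncovered loss: ${shortfall:,.0f}")
```

Under these assumed numbers the rebate covers only a small fraction of the loss. Plugging in your own bill and the credit schedule from your own service agreement gives a more realistic picture.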

Clouds do not offer 100% uptime, which is why cloud users need to implement additional redundancy

Although the cloud provides greater flexibility and application resilience, it does not guarantee that your data will be completely safe in the event of an outage. This latest example demonstrates how external factors can cause outages beyond the service provider's control. No matter how large a cloud computing company is, it is still susceptible to outages: no cloud is too big to fail. In an interview with The Register, Uptime Institute analyst Owen Rogers stated that half of the enterprises surveyed believed it was the cloud provider's responsibility to ensure application resiliency. According to Rogers, “it is the user's responsibility to implement additional redundancy through resilient applications such as Disaster Recovery (DR) to ensure resiliency during outages.” Users need complete confidence that their critical databases in the cloud are protected.
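As a simple illustration of what "less than 100% uptime" means in practice, the sketch below converts an availability target into the downtime it permits per year. The targets shown are common illustrative values, not figures taken from any specific provider's SLA.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def downtime_budget_hours(availability_pct: float,
                          period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the hours of downtime permitted by an availability target."""
    return period_hours * (1 - availability_pct / 100)


# Illustrative targets only; check your own agreement for the real figure.
for target in (99.0, 99.9, 99.99):
    budget = downtime_budget_hours(target)
    print(f"{target}% availability allows about {budget:.2f} hours of downtime per year")
```

By this arithmetic, a 99.9% target allows under nine hours of downtime a year, so a single outage lasting up to a day exceeds that yearly allowance several times over.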

Protecting your critical databases in the cloud does not have to be costly or complicated

As mentioned by Alibaba Cloud, “To bring higher availability to applications running on Alibaba Cloud, businesses may seek a high availability solution that greatly increases the resilience of their applications without causing major cost increase”. This can be done through Alibaba Cloud’s Zonal Redundancy, which lets users run their applications in a cross-zone deployment so they can fail over within a region, or through a more resilient out-of-region standby.

Implementing out-of-region DR for business-critical databases mitigates risk and helps you meet your data loss and recovery speed requirements across all disaster scenarios, including regional outages. For SQL Server and Oracle SE customers, our Standby MultiPlatform (MP) solution enables you to implement this out-of-region disaster recovery quickly and easily, whether your databases are hosted on-premises or in the cloud. StandbyMP uses physical replication technology to create a standby database in a different geographical location that is constantly updated, verified, and ready for failover at any time. StandbyMP implementation takes hours rather than days or weeks and ensures rapid recovery without needing backups or scripts.

If you are on Oracle Standard Edition or SQL Server Standard Edition and are interested in learning more about our solution, you can contact one of our team members today! Alternatively, we also have a Test Drive environment that is pre-installed and ready to go.

Tags: Opinion pieces

Alibaba Cloud outage: why you need redundancy