Disaster Recovery in the cloud: not all clouds have a silver lining

Written by Tim Marshall | Aug 17, 2022 12:51:59 AM

Cloud adoption can provide fast, efficient, and often cost-effective ways to achieve a wide range of IT objectives. However, while many organizations have migrated noncritical workloads to the cloud, most remain reluctant to migrate their critical databases.

Concerns about data integrity, availability, disaster recovery, sovereignty, and performance have been the main roadblocks to database cloud migration. As cloud services and tools have matured, however, cloud adoption is accelerating, even for critical database applications.

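Adoption has accelerated to the point where the question is no longer "if" organizations should move to the cloud, but rather "what is the best strategy and timeframe for doing so?" The first step in answering this question is, as always, to evaluate your RTO/RPO (Recovery Time Objective and Recovery Point Objective) requirements: how quickly you must be back online after an outage, and how much recent data you can afford to lose.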

Moving to the cloud, however, does not magically solve all problems, and it will not by itself remove your concerns about data integrity and availability. You must still pay close attention to how you architect and deploy your cloud solution. For example, for organizations with short RTO/RPO requirements, both AWS and Oracle (and we) recommend deploying a warm standby environment in a separate geographic region to guard against regional outages. Learn more about how to assess your RTO/RPO requirements here.
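To make the RTO/RPO assessment concrete, here is a minimal Python sketch that compares a few disaster recovery options against a set of objectives. Everything in it is an assumption made for illustration: the class and function names are hypothetical, and the recovery times and data-loss windows are made-up figures, not measurements or vendor numbers. Substitute estimates from your own testing.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Objectives:
    rto: timedelta  # Recovery Time Objective: longest tolerable outage
    rpo: timedelta  # Recovery Point Objective: largest tolerable window of lost data


@dataclass(frozen=True)
class DrOption:
    name: str
    recovery_time: timedelta     # estimated time to bring the database back online
    data_loss_window: timedelta  # estimated age of the newest recoverable data
    survives_region_outage: bool


def meets_objectives(option: DrOption, objectives: Objectives,
                     regional_disaster: bool) -> bool:
    """Return True if the option satisfies both RTO and RPO for this disaster type."""
    # An option that lives entirely inside the failed region cannot recover at all.
    if regional_disaster and not option.survives_region_outage:
        return False
    return (option.recovery_time <= objectives.rto
            and option.data_loss_window <= objectives.rpo)


# Illustrative figures only (assumptions, not benchmarks).
objectives = Objectives(rto=timedelta(minutes=15), rpo=timedelta(minutes=5))
options = [
    DrOption("in-region high availability",
             timedelta(minutes=2), timedelta(0), survives_region_outage=False),
    DrOption("nightly backup restored in another region",
             timedelta(hours=8), timedelta(hours=24), survives_region_outage=True),
    DrOption("warm standby in another region",
             timedelta(minutes=10), timedelta(minutes=2), survives_region_outage=True),
]

for option in options:
    verdict = meets_objectives(option, objectives, regional_disaster=True)
    print(f"{option.name}: {'meets' if verdict else 'misses'} RTO/RPO in a regional outage")
```

With these assumed figures, only the warm standby in a separate region meets a 15-minute RTO and 5-minute RPO when a whole region is lost. The in-region option passes the same check for a purely local failure, which is why it is worth evaluating each disaster type separately.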

What is the significance of this? A common misconception is that "I'm safe because I have in-region high availability through my Cloud Service Provider." Cloud Service Providers (CSPs) tout in-region high availability, such as Multi-AZ (multiple Availability Zone) deployments, as a panacea for most availability requirements. While in-region high availability can provide some resilience, it is not uncommon for entire regions to go down, and you must be prepared for this.
 

The most recent example (July 2022) occurred in London, when Google and Oracle cloud servers wilted during a record-breaking heatwave (40.3 °C), dropping networking, storage, and computing resources offline. This was a regional event: it took down the whole Google Cloud europe-west2 region for over 6 hours, and complete service was restored only after 35 hours. While this example is recent and very high profile, it is just one of many demonstrating that regional outages are not uncommon.

But even if you've already moved your databases to the cloud, it's important to realize that you may be far less protected than you think.

 
Most people believe that their CSPs guarantee uptime, and while this is technically true, it's important not to be taken in by the marketing speak. While some CSPs advertise guaranteed uptime, the devil is in the detail. Most of these ‘guarantees’ merely offer small rebates if SLAs are not met, usually in the form of subscription discounts. The uptime figure also doesn't include ‘scheduled downtime’ or periods of reduced performance.

It's important to understand the distinction here: CSPs are NOT guaranteeing that a service will be available, or that you will have access to your data. They are simply offering modest refunds if it is not.
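A quick calculation shows how little downtime an uptime percentage actually allows, and how far a long regional outage falls short of it. The Python sketch below is illustrative only: the SLA percentages are assumed example values, not the terms of any particular provider, and the function names are hypothetical. The 35-hour figure is the London outage described above.

```python
def allowed_downtime_minutes(sla_percent: float, days_in_month: int = 30) -> float:
    """Minutes of downtime per month that still satisfy the given uptime percentage."""
    total_minutes = days_in_month * 24 * 60
    return total_minutes * (1 - sla_percent / 100)


def achieved_availability_percent(outage_hours: float, days_in_month: int = 30) -> float:
    """Availability for a month that contained a single outage of the given length."""
    total_hours = days_in_month * 24
    return 100 * (total_hours - outage_hours) / total_hours


# Assumed example SLA levels, not any provider's actual terms.
for sla in (99.9, 99.95, 99.99):
    minutes = allowed_downtime_minutes(sla)
    print(f"{sla}% uptime allows about {minutes:.1f} minutes of downtime per 30-day month")

print(f"A 35-hour outage leaves {achieved_availability_percent(35):.2f}% "
      "availability for that month")
```

A 99.95% uptime figure permits roughly 21.6 minutes of downtime in a 30-day month, whereas a single 35-hour outage brings that month's availability down to about 95.14%. A service credit compensates part of the subscription fee for that month; it does not return the lost hours or the lost access to data.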

When it comes to safeguarding your critical databases in the cloud, you must be completely confident in your protection. You must also understand that it is your responsibility as a user (not the CSP's) to deploy a solution that meets your RTO/RPO requirements across all disaster types. CSPs typically don't include resilient Disaster Recovery as a bundled part of their offerings. Instead, they recommend that you implement out-of-region DR for business-critical databases. To better understand the specific reference architectures that major cloud providers recommend, read our cloud white paper here.

With so much business value tied up in an organization's data, protecting that data is essential. Understanding the limitations and potential points of failure in your deployment is the first step. This means accepting that no cloud is immune to downtime, and developing a robust solution that mitigates this risk while meeting your RTO/RPO requirements.

Interested in learning more or want a quick demo? Contact Us.
Alternatively, you can take our software for a Test Drive.