Disaster Recovery for Oracle SE on Oracle Cloud Infrastructure (OCI)

Written by Tim Marshall | Sep 23, 2022 4:27:22 AM

Why the Oracle Cloud Infrastructure?

Besides being a potentially easier migration path, OCI can provide pricing benefits compared to BYOL on many other cloud platforms. Companies moving from on-premises will also appreciate Oracle’s DB Systems (sometimes referred to as the Standard Database Service), which enable you to create new customized environments in minutes with little training.

Screenshot: Entering details of DB system

Oracle Cloud Infrastructure Disaster Recovery

Why you still need DR in the Oracle Cloud

A business-critical database on Oracle SE is typically required to have high uptime, recover quickly from any outage (low RTO) with minimal data loss (low RPO), and be resilient to all disaster types. These include natural disasters, internet disruptions, user errors, power cuts, data corruption, and hardware failures.

While Oracle aims for its infrastructure to achieve 99.95% uptime, this does not mean your database will be available 99.95% of the time. Why?

  • User error or internal actors: A user may misconfigure a service or delete a table rendering your database unavailable while the infrastructure is up.

  • Internet disruptions: Internet backbone issues, such as damaged cables, have put entire regions out of action or significantly degraded performance for days even though the infrastructure is up. 

  • Natural disasters: Although less common, cloud hardware is still affected by natural disasters. The July 2022 heatwave in Europe took Oracle and Google data centers offline, leading to a loss of services including access to Virtual Machines (VMs) and Compute. Read more about this here.

So the truth is, without Disaster Recovery (DR), your database is not resilient and you won’t meet your RPO and RTO requirements, even in the cloud. 
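To put the 99.95% figure in perspective, the downtime that an availability target permits can be worked out directly. The sketch below is illustrative arithmetic only: the function name is hypothetical, and it says nothing about how Oracle actually measures or credits its service levels.

```python
from datetime import timedelta


def allowed_downtime(availability_pct: float, period: timedelta) -> timedelta:
    """Return the downtime an availability target permits over a period."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    # The unavailable fraction of the period is (100% - target).
    return period * (1 - availability_pct / 100)


if __name__ == "__main__":
    # Period lengths are simplifying assumptions (365-day year, 30-day month).
    print("per year :", allowed_downtime(99.95, timedelta(days=365)))
    print("per month:", allowed_downtime(99.95, timedelta(days=30)))
```

At 99.95%, the infrastructure alone can be down for roughly 4.4 hours a year and still meet its target, before counting any of the database-level outages listed above.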

The importance of understanding Fault Domains, Availability Domains and Regions

Each cloud provider uses different terminology for its regions and zones, which are connected by a high-speed backbone. In OCI, these are the building blocks that you can use to plan for DR.

  • Regions: Each OCI region is in a geographical area that's independent and separated by vast distances, across countries or even continents, from all the other regions. You should deploy databases in different regions to mitigate the risk of region-wide events, such as large weather systems and earthquakes.

  • Availability Domains (AD): This is one or more data centers located within a region.  ADs are isolated from each other (they don’t share physical infrastructure) to reduce the possibility of them failing simultaneously. Many regions have multiple ADs (usually three) connected to each other by a low latency, high bandwidth backbone to enable near-synchronous replication between ADs. 

  • Fault Domains (FD): This is a grouping of hardware and infrastructure within an Availability Domain. Each AD consists of three FDs.  Fault domains let you distribute your database instances so that they are not on the same physical hardware within a single AD (or datacenter). You can utilize Oracle RAC and SE2HA across FDs to minimize the impact of FD failures and maintenance. 
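One way to reason about these three levels is to ask which failure scope two database instances still share. The following is a minimal sketch under stated assumptions: the `Placement` class, the function name, and the region, AD and FD labels are all hypothetical placeholders, not OCI identifiers or API calls.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Placement:
    """Where an instance runs (placeholder labels, not real OCI names)."""
    region: str
    availability_domain: str
    fault_domain: str


def smallest_shared_scope(a: Placement, b: Placement) -> Optional[str]:
    """Return the narrowest failure scope both instances sit in, or None."""
    if a.region != b.region:
        return None  # nothing shared: a region-wide event hits only one
    if a.availability_domain != b.availability_domain:
        return "region"
    if a.fault_domain != b.fault_domain:
        return "availability domain"
    return "fault domain"


if __name__ == "__main__":
    active = Placement("region-1", "AD-1", "FD-1")
    passive = Placement("region-1", "AD-1", "FD-2")
    standby = Placement("region-2", "AD-1", "FD-1")
    print(smallest_shared_scope(active, passive))  # availability domain
    print(smallest_shared_scope(active, standby))  # None
```

Two nodes in different Fault Domains still share an Availability Domain, so an AD-level outage affects both. Only an instance in another region shares no failure scope with the primary.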

When configuring your cloud services in OCI, it is important to note that each region offers different infrastructure and services. Many regions continue to have only one Availability Domain, which emphasises the importance of region-to-region replication. Data Guard on Oracle EE and Dbvisit Standby MultiPlatform (StandbyMP) on Oracle SE are the most widely accepted ways of achieving this. By creating redundant Compute Instances in a separate Region, synchronized to the primary instance, you can avoid impact to your applications caused by a failure in your primary Availability Domain or Region.

For more information, Atul Kumar has written a great blog explaining OCI Regions, Availability Domains and Fault Domains on K21Academy.

What's available to create a resilient environment in OCI

For Oracle SE, Oracle’s native tools for creating a more resilient environment include Backups (RMAN) and Standard Edition High Availability (SE2HA). But just like on-premises, you can also create and manage a high-performance standby database in the Oracle Cloud using a third-party tool such as Dbvisit StandbyMP.

Backups - Backups are an important part of a DR Plan (DRP). However, because they are harder to test, lose more data (higher RPO) and take longer to recover from (higher RTO), they should be only one part of your DR strategy, not the total solution. An overview of ways to back up your Oracle Cloud Databases to Object Storage can be found here.

Standby Databases - At its core, a standby database is a warm secondary database that is an exact copy of the production (primary) database which is being continuously kept up to date.  If disaster strikes, the standby database is activated to become the new “primary” database with minimal RPO/RTO.  Standby databases are independent and ideally located in a remote region ensuring resilience to most disaster scenarios.

Standard Edition High Availability (SE2HA) - OCI offers 2-node (Active/Passive) SE2HA DB Systems on Virtual Machine Compute instances, using Oracle Clusterware to coordinate the two nodes’ access to shared storage. SE2HA is the replacement for RAC on Standard Edition from Oracle version 19c onwards. It can be set up without any additional licensing costs and provides fast, automated failover to the secondary node for minor issues. Importantly, SE2HA is not DR: its shared storage is a single point of failure, and it remains vulnerable to data corruption, user error, and data center outages.

SE2HA delivers most of the benefits of RAC, but there are some differences.  We’ve written a deep overview of SE2HA and its differences from RAC which you can read here.

Replicating Oracle’s Maximum Availability Architecture on Oracle SE 

Oracle have provided Maximum Availability Architectures (MAA) to guide users in their use of Oracle Technologies. Let's look at how we can closely replicate a silver/gold MAA level using Oracle SE, Standard Edition High Availability, and Dbvisit StandbyMP.

Source: Oracle Cloud Maximum Availability Architecture - April 27th, 2022 

Explaining this architecture in the context of Oracle SE:

  • The Primary DB is replicated from Region 1 to Region 2 using Oracle Data Guard. This technology is not available in Standard Edition but can be easily achieved using Dbvisit StandbyMP, which functions in a similar way to Oracle Data Guard. 

  • The Primary and Standby databases utilize Oracle RAC for High Availability and Performance improvement. With Standard Edition we can replicate this functionality using Oracle SE2HA and Active/Passive nodes, but the architecture cannot achieve the performance benefits of RAC.  

  • Backups are performed by RMAN. This feature is common to both Oracle EE and SE and can be set up on either the primary or standby.

A Resilient and Highly Available Architecture using StandbyMP and SE2HA

Notes on this architecture:

  • Prioritizing the resilience and integrity of the standby database, which is located in a remote region, is the cornerstone of the DR plan. On Oracle SE this is enabled by StandbyMP, which, like Data Guard, provides minimal data loss and a recovery time of only a few minutes.

  • Within Region 1, across two Fault Domains, SE2HA provides High Availability for minor hardware failures on the active node. If a failure occurs, the passive node starts up and takes over. There is no data loss, but there is downtime of a few minutes.

  • As an option, the standby environment in Region 2 can also be configured with SE2HA, which can be helpful if you are likely to run switchovers more frequently. 
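The division of labour in this architecture can be summarized as a simple lookup from failure type to the layer expected to handle it. This is only a sketch of the reasoning in this article: the failure labels and function name are hypothetical, and the grouping reflects the descriptions above rather than any Oracle or Dbvisit API.

```python
# Failures the article describes SE2HA as absorbing (hypothetical labels).
SE2HA_COVERS = frozenset({
    "node hardware failure",
    "fault domain maintenance",
})

# Failures the article says SE2HA cannot protect against.
NEEDS_DR = frozenset({
    "shared storage failure",
    "data corruption",
    "user error",
    "data center outage",
    "region-wide event",
})


def recovery_layer(failure: str) -> str:
    """Return which layer of the architecture is expected to respond."""
    if failure in SE2HA_COVERS:
        return "SE2HA failover within Region 1"
    if failure in NEEDS_DR:
        return "DR plan: remote standby and backups"
    raise ValueError(f"unclassified failure type: {failure!r}")


if __name__ == "__main__":
    for failure in sorted(SE2HA_COVERS | NEEDS_DR):
        print(f"{failure:26} -> {recovery_layer(failure)}")
```

Writing the list out this way makes the gap visible: everything in the second group is outside what SE2HA alone can handle, which is why the remote standby carries the DR plan.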

What's next?

  • Learn about Standby’s advanced features, such as Snapshots and Auto Failover, in our datasheet and feature sheet.

  • Learn more about why out-of-region DR is imperative for cloud environments, just as it is for on-premises.

  • Download and trial the product on-premises or in the cloud.

  • Receive an in-depth demonstration or arrange a POC with our technical team here - we’d love to hear from you!