
Microsoft Azure typo outage: how common are human errors?

The Microsoft Azure DevOps outage caused by a typo is a reminder of why it's important to implement Disaster Recovery to mitigate human error.
SQL Server
DR Planning
Data Center Resiliency
By Tim Marshall | June 23, 2023

The recent Microsoft Azure DevOps outage in the South Brazil region is a stark reminder of how a simple typo can wreak havoc. It also offers valuable lessons about preventing errors and establishing robust recovery processes.

Human Error: A Major Source of IT Outages

The recent 10-hour Azure DevOps outage in the South Brazil region underscores the significant role human error plays in causing system failures. Mistakes can occur at any stage of the software development lifecycle, from writing code to deploying it. In this case, a typo hidden in a code upgrade led to the accidental deletion of 17 production databases. The incident shows why organizations need both stringent processes that guard against such mishaps and plans and systems that let them recover from any disaster.

According to Uptime Institute, human error plays a role in about two-thirds of all outages. Organizations therefore need to focus on preventing and mitigating human error to keep systems reliable and stable.

Diagram: Most common causes of major human error-related outages

(Image obtained from journal.uptimeinstitute.com)


The need for human-error prevention processes

Preventing human error requires a proactive approach that encompasses various measures. It starts with building a culture of attention to detail and continuous learning within IT teams. Developers must adhere to established coding standards and undergo rigorous code reviews to catch potential errors before they manifest in production environments. Implementing automated testing and quality assurance processes can further reduce the likelihood of human-induced outages. By prioritizing prevention, organizations can significantly minimize the impact of human error on system availability.

The importance of a robust recovery process

Despite our best efforts, human error can still occasionally slip through the cracks. Organizations must have a well-defined recovery plan to handle such incidents effectively. As specialists in Database Disaster Continuity for over 12 years, we recommend all organizations with business-critical databases implement a disaster recovery solution, such as a warm standby database. 

A warm standby database ensures minimal data loss and rapid recovery. By maintaining a replica of the primary database, organizations can quickly switch to the standby database in the event of a disaster, reducing downtime and preserving data integrity. Additionally, restoring to specific points in time can be crucial when dealing with human error disasters, enabling precise recovery and minimizing the impact on operations.
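To make the point-in-time idea concrete, here is a minimal sketch in Python. It is a toy model only, not how Dbvisit StandbyMP, Oracle, or SQL Server implement recovery: the `Change` record, the `restore_to_point_in_time` function, and the sample data are all hypothetical names invented for illustration. The sketch assumes a base backup plus a timestamped change log, and replays the log only up to a moment just before the mistaken operation.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Change:
    """One logged change (hypothetical): at `when`, `key` was set to `value`.

    A `value` of None represents a deletion.
    """
    when: datetime
    key: str
    value: Optional[str]


def restore_to_point_in_time(
    base: Dict[str, str], log: List[Change], target: datetime
) -> Dict[str, str]:
    """Replay logged changes onto a copy of the base backup, stopping at `target`."""
    state = dict(base)  # never mutate the backup itself
    for change in sorted(log, key=lambda c: c.when):
        if change.when > target:
            break  # everything after the target time is ignored
        if change.value is None:
            state.pop(change.key, None)
        else:
            state[change.key] = change.value
    return state


# Illustrative data: a base backup and three later changes.
base_backup = {"orders": "v1", "customers": "v1"}
change_log = [
    Change(datetime(2023, 1, 1, 9, 0), "orders", "v2"),
    Change(datetime(2023, 1, 1, 10, 0), "customers", None),  # the mistaken delete
    Change(datetime(2023, 1, 1, 11, 0), "orders", "v3"),
]

# Recover to one minute before the mistaken delete.
recovered = restore_to_point_in_time(
    base_backup, change_log, datetime(2023, 1, 1, 9, 59)
)
print(recovered)  # {'orders': 'v2', 'customers': 'v1'}
```

The trade-off is visible in the result: the deleted data comes back, but the legitimate change made after the target time (`orders` at `v3`) is not replayed, which is why choosing the recovery point as close as possible to the error matters.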


Learn more about recovering from human error on Oracle or SQL Server. 


If you have any questions or would like to discuss how Dbvisit StandbyMP could fit within your organizational needs, contact us, and one of our technical specialists will reach out to you. 
