The Art of Failure Planning

He that will not apply new remedies must expect new evils; for time is the greatest innovator.
-Francis Bacon

A Tale of Two Plans
There is a lot of talk these days about disaster recovery planning. Organizations of all sizes are investing heavily in developing "The Plan" that will help them survive should something really bad happen.

Yet disaster recovery planning is problematic. At what point does the plan kick in? What if something sort of bad but not really bad happens? And what happens if something within the plan itself fails? Does this mean the plan was faulty or incomplete? And is a failure of the plan itself even worth planning for?

When the best-laid plans fail, we have to resort to Plan B. So why not just go with Plan B in the first place? Let’s compare Plan A and Plan B:

Plan A vs. Plan B

  • Automated vs. Manual
  • Planning vs. Troubleshooting
  • Inline redundant vs. Offline recovery
  • Big companies vs. Small teams
  • Predictive vs. Reactive
  • Prevents small problems from growing vs. Fixes big problems
  • All or nothing vs. Something
  • Good at the best-case scenario vs. Good at the worst-case scenario
  • Expensive vs. Practical

Why Plan A Works Better for Small Problems
Plan A shines at keeping minor problems from growing into disasters. Take RAID, for example. By using a set of redundant hard drives, you prevent the predictable failure of a single drive from bringing down the entire system.
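
RAID comes in several levels, and the article does not say which one it has in mind. As a minimal sketch, assuming a RAID 5-style layout with XOR parity, the Python below shows why losing one drive is survivable: the missing block can be recomputed from the surviving blocks plus the parity block. The drive contents and the `xor_blocks` helper are hypothetical, for illustration only.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical stripe spread across three data drives.
data_drives = [b"ABCD", b"EFGH", b"IJKL"]

# RAID 5-style parity (an assumption here): the XOR of every data block in the stripe.
parity = xor_blocks(data_drives)

# Simulate losing drive 1: XOR the surviving data blocks with the parity block.
survivors = [data_drives[0], data_drives[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt)  # b'EFGH'
```

Because XOR is its own inverse, any single missing block falls out of the same calculation; losing two drives in the same stripe, however, leaves too little information to rebuild, which is where a single-parity Plan A stops working.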

Plan A relies upon predictable outcomes. If you have a backup system in place that automatically takes over when the primary system fails, that is a predictable outcome. Calling Dell support when the primary system fails is not.
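
The automatic takeover described above can be pictured as a failover rule decided in advance. This is a minimal Python sketch, assuming a hypothetical `Server` type with a boolean health check; real failover systems add heartbeats, timeouts, and protection against both nodes claiming to be active, none of which is shown here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Server:
    name: str
    healthy: Callable[[], bool]  # hypothetical health check, e.g. a ping or HTTP probe


def pick_active(primary: Server, standby: Server) -> Server:
    """Choose which server takes traffic, using a rule fixed in advance."""
    if primary.healthy():
        return primary  # normal case: the primary is up
    if standby.healthy():
        return standby  # automatic takeover: the predictable outcome
    # Both are down: the automated plan has nothing left to offer.
    raise RuntimeError("no healthy server available")


primary = Server("primary", healthy=lambda: False)  # simulate a failed primary
standby = Server("standby", healthy=lambda: True)
print(pick_active(primary, standby).name)  # standby
```

The final `raise` is the "all or nothing" edge of Plan A: once every branch of the rule has been exhausted, the code has no answer and a person has to step in.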

Most big organizations invest solely in Plan A solutions. They have inline redundant systems and huge knowledge bases, and they pay millions of dollars for support contracts. When it works, Plan A is invisible: it quietly and efficiently keeps things running.

Ironically, most disaster recovery plans are entirely Plan A, probably because they are written by the same big organizations that love Plan A solutions so much. They literally attempt to plan for every eventuality. This is a mistake, because Plan A has one major drawback: it either works or it doesn’t. And when it doesn’t, you’ve got a real disaster on your hands that somebody has to fix.

Why Plan B Works Better for Big Problems
Where Plan A fails, Plan B excels. Plan B starts by analyzing the problem and then develops a simple, flexible plan to fix it. Plan B requires common sense and action. The most important choice you will make for your Plan B is personnel.

The best Plan B starts with a team of talented, experienced people you can count on: people who work well under pressure and are great at troubleshooting. To be effective, your Plan B team needs room to work. Give them the authority to make decisions and the resources to support their efforts. Accept that the solution will be imperfect. Your team must be allowed to make mistakes.

This isn’t to say that Plan B should be completely ad hoc. It is still a plan, after all. Involve technology where appropriate. For instance, we use backup software from Ultrabac that makes a daily image of our servers. If a server fails, we can restore the image to a different server, even if its hardware differs from the original. Restoring a server is a manual process, and any data written since the last backup is lost. Still, it’s a pretty good solution given the alternative.
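
The trade-off in a daily-image approach is the data-loss window: whatever changed since the last image is gone after a restore. A minimal Python sketch of that arithmetic follows, assuming a single nightly image at a fixed time. The schedule, the dates, and the `data_at_risk` helper are hypothetical and are not part of the Ultrabac product.

```python
from datetime import datetime, timedelta

BACKUP_INTERVAL = timedelta(days=1)  # assumption: one image per day


def data_at_risk(last_backup: datetime, failure: datetime) -> timedelta:
    """Span of changes lost when the server is restored from the last image.

    Anything written after last_backup and before the failure is not in
    the image, so a restore rolls the server back to last_backup.
    """
    if failure < last_backup:
        raise ValueError("failure precedes the last backup")
    return failure - last_backup


last_backup = datetime(2006, 9, 1, 2, 0)   # hypothetical 2:00 a.m. nightly image
failure = datetime(2006, 9, 1, 17, 30)     # server dies at 5:30 p.m.
print(data_at_risk(last_backup, failure))  # 15:30:00

# Worst case: the server fails just before the next nightly image runs.
just_before_next = last_backup + BACKUP_INTERVAL - timedelta(minutes=1)
print(data_at_risk(last_backup, just_before_next))  # 23:59:00
```

Shortening `BACKUP_INTERVAL` shrinks the window at the cost of more storage and more backup runs, which is the kind of resource question a Plan B team is well placed to answer.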

Most small organizations rely solely on Plan B. They have few or no backup systems, or even backups, for that matter. When failures occur, they endure downtime, hope for miracles, and sometimes get them.

The Art of Failure Planning
If you are a big organization, you have to face the reality that a true disaster recovery plan needs to look more like Plan B than Plan A. Instead of spending a lot of money trying to plan for every possible bad thing that could happen, put together a qualified Plan B team. Then have that team tell you what resources they would need to respond to an emergency.

The lessons of Hurricane Katrina demonstrate clearly that trying to force Plan A to work will only create a bigger problem. Had the government gone with Plan B to start with, much of the suffering would have been avoided.

If you’re a small organization, you could do with more Plan A. Invest in practical backup solutions and redundant hardware for critical systems. There are proven solutions for many common failures. Take advantage of them and stop being a victim.

Every organization should have failure plans that include both a Plan A and a Plan B. The art of failure planning is understanding the limitations of your plans so that you can make better decisions about how to respond to failures.


About the Author: Glen Kendell is a network architect and owner of Release to Production. He publishes a monthly newsletter called In-Production: Achieving True High Availability.


Sep 02 2006