On Diminishing Returns

January 10, 2007

One more note before I get into the DR framework itself:

I am a fan of comprehensive planning. I am also an advocate, although decidedly not a fan, of process for the control of plans, as long as that process allows for sufficient flexibility. But in any planning, and in any process, you must be aware of the law of diminishing returns.

Sure, it would be comprehensive to account for every type of file that every user might require. Sure, a process could require every user, or even every organization, to file a disaster recovery “flight plan” for every type of file that you have added to your catalog. Sure, it would be comprehensive to consider every eventuality, up to and including a nuclear strike on your data center.

And sometimes, I’m sure, while I’m writing about this disaster recovery framework, that may seem like what I’m advocating. Let me assure you, though, that I am not. Planners, you see, have a tendency to forget to consider the cost of the planning itself, at least until the deadlines loom. Only in an idealized world, where planning and cataloguing don’t cost anything, does it make sense to be as comprehensive and complete as possible.

What you should do, then, in applying this framework, is to weigh the planning costs against the benefits. Do as much planning as you can reasonably afford. Get the users and organizations involved as much as they can afford. But don’t overburden yourself or others by planning at too fine a grain, or for eventualities that are too far beyond the pale.

As a closing example: I had a client for whom I built a large extended cluster with remote storage mirroring. We spent a great deal of time and money designing a solution that could survive a site failure and automatically fail over after thirty seconds. The solution was finally deployed in early 2001, at the World Trade Center in New York.

On September 11, the system worked flawlessly, failing over and restoring secondary site access after about thirty-five seconds. Or so we were told, five days later, when the system administrators finally got around to attempting to use the remote site.

Keep that in mind.

Update: actually, another important lesson to be learned from that example is that we were planning for the last disaster (the ’93 WTC bombing). Our technical solution worked for both, but the appropriateness of a near real-time failover was always pretty questionable.
