As human beings, we are eternally optimistic by nature. We are happy in our belief that bad things happen to other people, not us.

The same is true in IT: 50% of businesses do not have an IT Business Continuity Plan and, of those that do, only 28% have tested it.

Let’s cut to the chase: disasters can and do happen.

We know this because, as a support provider, we are the people who get the phone calls when disaster strikes, and we see it all too often. It would be fantastic if customers rang us up once in a while and just said, “You know what, everything is working great today” … but that never happens.

Disasters come in many forms. There are the ones we all think of, such as fires, storms and floods, but they also come from other directions, such as hardware failure, malware, data corruption and even the actions of a rogue employee. We have seen all of these. In the last few years there has been a noticeable increase in the latter kinds of events. Much of this is due to ransomware and other types of malware, but in some cases it is simply down to the general increase in system complexity and good old human error.

When major incidents happen, it’s important that you have a plan. Nothing makes a bad situation worse than making decisions on the fly. You need to be absolutely sure that you have a plan and that it is up to date, taking into account any changes made since it was last tested. This brings me to my main point: testing.

Unfortunately, in my experience, Disaster Recovery (DR) plans are put together by IT departments in isolation, usually as part of some tick-box exercise and/or in response to a compliance audit. DR strategies, and invocation planning in particular, should be a team effort that includes developers and application experts within the business. This allows plans to capture dependencies between systems, and therefore the boot order and priority for critical systems.
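To illustrate how dependencies translate into a boot order, here is a minimal Python sketch. The system names and their dependencies are entirely made up for the example; in a real plan they would come from the developers and application experts mentioned above. It uses the standard library's `graphlib.TopologicalSorter` (Python 3.9 or later), which is just one reasonable way to do this.

```python
from graphlib import TopologicalSorter

# Hypothetical systems: each key lists the systems it depends on.
DEPENDENCIES = {
    "dns": [],
    "active_directory": ["dns"],
    "database": ["active_directory"],
    "app_server": ["database", "active_directory"],
    "web_frontend": ["app_server"],
}


def boot_order(dependencies):
    """Return the systems in an order where every dependency starts first."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for position, system in enumerate(boot_order(DEPENDENCIES), start=1):
        print(f"{position}. {system}")
```

If two systems depend on each other, `static_order()` raises `graphlib.CycleError`, which is a useful prompt to go back and ask the application owners how that pair really starts up.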

For me, the following are the challenges that need to be addressed:

1. Execution 

As a business, you need to understand which disaster scenarios you are likely to face and exactly how you will invoke DR in each case. This should include a decision-making process for deciding whether DR should be invoked at all. To paraphrase Dr Ian Malcolm, “Just because you CAN doesn’t always mean you SHOULD”. There may be easier options, or the problem may be a minor transient glitch. Which of these applies usually becomes apparent very early on.

2. Testing

Perhaps the most important parts of DR planning are the 3 T’s: testing, testing and testing. With all the will and technical know-how in the world, things will still crop up that you just didn’t know about, like a hard-coded IP address or a missing licensing dongle (both real examples). It’s absolutely imperative that any plan is thoroughly tested.

I remember when DR testing was hard and expensive because of the sheer effort required. I have been involved in DR tests that lasted a full week in a specialist DR facility with teams of 10 or more people, and even in scenarios where a huge articulated truck turned up outside, full of generators and racks upon racks of servers.

3. Documentation

Environments change, so your disaster plan must be kept relevant and up to date, and must include any iterative changes made or lessons learned during your testing.

4. How technology can help

Technology can assist in addressing all of the challenges mentioned above, and the key to this is portability and automation. Virtualisation has made workloads portable between dissimilar hardware, which has been hugely advantageous when it comes to IT recovery. DR invocation can and should be orchestrated, so that a plan is executed step by step, with checks at each stage to prove that application prerequisites and dependencies are available and functioning before moving on to the next stage.
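As a rough illustration of what step-by-step orchestration with a check at each stage looks like, here is a small Python sketch. It is not how any particular DR product works: the `Step` class, the `run_plan` function and the system names are all hypothetical, and the "environment" is just an in-memory set standing in for real servers.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Step:
    name: str
    action: Callable[[], None]  # brings the system up (hypothetical)
    check: Callable[[], bool]   # returns True once the system is proven healthy


def run_plan(steps: List[Step]) -> Tuple[List[str], Optional[str]]:
    """Run steps in order and stop at the first failed check.

    Returns the names of the completed steps and the name of the
    step that failed (or None if every check passed).
    """
    completed: List[str] = []
    for step in steps:
        step.action()
        if not step.check():
            return completed, step.name
        completed.append(step.name)
    return completed, None


# Stand-in for a real environment: the set of systems currently running.
running = set()


def start(name):
    return lambda: running.add(name)


def is_up(name):
    return lambda: name in running


def do_nothing():
    return None


plan = [
    Step("database", start("database"), is_up("database")),
    Step("app_server", do_nothing, is_up("app_server")),  # simulated failure
    Step("web_frontend", start("web_frontend"), is_up("web_frontend")),
]

if __name__ == "__main__":
    completed, failed = run_plan(plan)
    print("Completed:", completed)
    print("Stopped at:", failed)
```

The point of stopping at the first failed check is that the web front end is never started on top of an application server that has not come up, which is exactly the kind of dependency problem that otherwise only surfaces during a real invocation.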

Technology also allows plans to be tested frequently and without disruption, so that you have the comfort and reassurance of knowing that you can recover, and how long it will take.

Technology can be used not only to create test plans and DR runbooks but, more importantly, to keep them updated as new systems are brought online. It can also create compliance reports which prove that testing is being undertaken, backed by real-world statistics.
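To give a flavour of the kind of statistics such a report might contain, here is a short Python sketch that summarises a history of DR tests against a target recovery time. The `DrTestRun` record, the `summarise` function, the field names and the idea of a single target in minutes are assumptions made for the example, not the format of any real reporting tool.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class DrTestRun:
    date: str                # when the test took place
    recovery_minutes: float  # how long recovery took
    succeeded: bool          # whether the systems actually came back


def summarise(runs: List[DrTestRun], target_minutes: float) -> dict:
    """Summarise test history against a target recovery time (hypothetical)."""
    times = [run.recovery_minutes for run in runs if run.succeeded]
    return {
        "tests_run": len(runs),
        "tests_passed": len(times),
        "average_recovery_minutes": mean(times) if times else None,
        "worst_recovery_minutes": max(times) if times else None,
        # Only true if at least one test passed and all passes met the target.
        "within_target": bool(times) and all(t <= target_minutes for t in times),
    }


if __name__ == "__main__":
    # Made-up sample history, purely for illustration.
    history = [
        DrTestRun("2024-01-15", 90, True),
        DrTestRun("2024-04-15", 150, True),
        DrTestRun("2024-07-15", 60, False),
    ]
    for key, value in summarise(history, target_minutes=120).items():
        print(f"{key}: {value}")
```

Reporting the worst successful recovery time alongside the average is deliberate: an auditor, or your own board, will care more about the slowest recovery than about the typical one.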

If you want to know more about how we can assist in designing, implementing or managing data protection and business continuity plans, give us a call on 0113 387 1070.