Disaster recovery services can help companies recover from virtually
any type of disaster and ensure ongoing availability of mission-critical
resources. Ajay Gidh explains the various stages of an effective
disaster recovery plan.
Disaster
recovery (DR) planning is the process of developing advance
arrangements and procedures that enable an organisation to
respond to a disaster by resuming critical business functions
within a defined time frame, minimising loss, and restoring
affected areas.
It is not a two-month project, nor is it a project that
you can forget about once it is completed. An effective recovery
plan is a live recovery plan. The plan must be maintained
and tested/exercised regularly.
An effective DR plan consists of the following stages:
-
Programme description
-
Pre-planning activities (project initiation)
-
Vulnerability assessment and general definition of requirements
-
Business impact analysis
-
Detailed definition of requirements
-
Plan development
-
Testing programme
-
Maintenance programme
-
Initial plan testing and plan implementation.
The primary objective of a business resumption plan is to
enable an organisation to survive a disaster and to re-establish
normal business operations. In order to survive, an organisation
must ensure that critical operations can resume within a reasonable
time frame. Therefore, the goals of a business resumption
plan should be to identify weaknesses and implement a disaster
prevention programme, minimise the duration of a serious disruption
to business operations, facilitate effective co-ordination
of recovery tasks and, most importantly, reduce the complexity
of the recovery effort.
Historically, the data processing function alone has been
assigned the responsibility for providing contingency planning.
Frequently, this has led to the development of recovery plans
to restore computer resources in a manner that is not fully
responsive to the needs of the business. Contingency planning
is a business issue rather than a data processing issue. In
today's environment, a long-term operations
outage may have a catastrophic impact. The development of
a viable recovery strategy must, therefore, be the product not
just of the providers of the organisation's data processing,
communications and operations centre services, but also of the
users of those services and of the management personnel who have
responsibility for protecting the organisation's
assets.
Programme description
Since recovery planning is a very complex and labour-intensive
process, it requires redirection of valuable technical staff
and information processing resources as well as appropriate
funding. In order to minimise the impact such an undertaking
would have on scarce resources, the project for the development
and implementation of disaster recovery and business resumption
plans should be a part of the organisation's normal planning
activities. The proposed project methodology consists of eight
separate phases, as described below.
Pre-planning activities (project initiation)
The goal in phase one is to obtain an understanding of the
existing and projected computing environment of the organisation.
This enables the project team to: refine the scope of the
project and the associated work programme; develop project
schedules; and identify and address any issues that could
have an impact on the delivery and the success of the project.
During this phase a steering committee should be established.
The committee should have the overall responsibility for providing
direction and guidance to the project team. The committee
should also make all decisions related to the recovery planning
effort. The project manager should work with the steering
committee in finalising the detailed work plan and developing
interview schedules for conducting the security assessment and
the business impact analysis.
Two other key deliverables of this phase are: the development
of a policy to support the recovery programmes, and an awareness
programme to educate management and senior individuals who
will be required to participate in the project.
Vulnerability assessment and general definition of requirements
Security and control within an organisation are a continuing
concern. From an economic and business strategy perspective,
it is preferable to concentrate on activities that reduce
the possibility of a disaster occurring, rather
than concentrating primarily on minimising the impact of an actual
disaster. This phase addresses measures to reduce the probability
of occurrence and includes the following key tasks:
-
A thorough security assessment of the computing and communications
environment, including personnel practices; physical security;
operating procedures; backup and contingency planning; systems
development and maintenance; database security; data and
voice communications security; systems and access control
software security; insurance; security planning and administration;
application controls; and personal computers.
-
The security assessment will enable the project team to
improve any existing emergency plans and disaster prevention
measures and to implement required emergency plans and disaster
prevention measures where none exist.
-
Present findings and recommendations resulting from the
activities of the security assessment to the steering committee
so that corrective actions can be initiated in a timely
manner.
-
Define the scope of the planning effort.
-
Analyse, recommend and purchase recovery planning and maintenance
software required to support the development of the plans
and to maintain the plans following implementation.
-
Develop a plan framework.
-
Assemble the project team and conduct awareness sessions.
Business impact analysis (BIA)
A BIA of all business units that are part of the business environment
enables the project team to: identify critical systems, processes
and functions; assess the economic impact of incidents and
disasters that result in a denial of access to system services
and other services and facilities; and assess the "pain
threshold", that is, the length of time business units
can survive without access to systems, services and facilities.
The BIA report should be presented to the steering committee.
This report identifies critical service functions and the
time frames in which they must be recovered after interruption.
The BIA report should then be used as a basis for identifying
systems and resources required to support the critical services
provided by information processing and other services and
facilities.
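As a rough illustration of how BIA findings might be tabulated, the sketch below ranks business functions by their pain threshold and estimates the economic impact of an outage. The function names, the figures and the linear loss-per-hour model are hypothetical assumptions made for the example; a real assessment would draw them from interviews with each business unit.

```python
from dataclasses import dataclass


@dataclass
class BusinessFunction:
    name: str
    pain_threshold_hours: float  # longest outage the unit can survive
    loss_per_hour: float         # assumed linear economic impact


def recovery_priority(functions):
    """Order functions so the shortest pain threshold comes first.

    Ties are broken by the larger hourly loss.
    """
    return sorted(
        functions,
        key=lambda f: (f.pain_threshold_hours, -f.loss_per_hour),
    )


def estimated_loss(function, outage_hours):
    """Economic impact of an outage under the assumed linear model."""
    return function.loss_per_hour * outage_hours


def exceeds_threshold(function, outage_hours):
    """True when the outage lasts longer than the unit can tolerate."""
    return outage_hours > function.pain_threshold_hours


# Hypothetical figures, for illustration only.
FUNCTIONS = [
    BusinessFunction("payroll", 72, 500.0),
    BusinessFunction("order processing", 4, 12000.0),
    BusinessFunction("customer help desk", 24, 2000.0),
]

if __name__ == "__main__":
    for f in recovery_priority(FUNCTIONS):
        print(f.name, f.pain_threshold_hours, estimated_loss(f, 8))
```

Sorting by pain threshold gives the steering committee a first-cut recovery order; the estimated loss helps justify the cost of the recovery strategy chosen for each function.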
Detailed definition of requirements
During this phase, a profile of recovery requirements is developed.
This profile is to be used as a basis for analysing alternative
recovery strategies. The profile is developed by identifying
resources required to support critical functions identified
in phase three. This profile should include hardware (mainframe,
data and voice communications and personal computers), software
(vendor supplied, in-house developed, etc), documentation
(user, procedures), outside support (public networks), facilities
(office space, office equipment, etc) and personnel for each
business unit. Recovery strategies will be based on short,
intermediate and long-term outages. Another key deliverable
of this phase is the definition of the plan scope, objectives
and assumptions.
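One possible way to record such a profile is as a simple structure per business unit, with a check for resource categories that have not yet been filled in. The sketch below assumes the six categories named above; the field names and the sample entries are hypothetical.

```python
# Resource categories taken from the profile description; the key
# names themselves are an assumption made for this sketch.
REQUIRED_CATEGORIES = (
    "hardware",
    "software",
    "documentation",
    "outside_support",
    "facilities",
    "personnel",
)


def missing_categories(profile):
    """Return the categories a business unit has not yet documented."""
    return [c for c in REQUIRED_CATEGORIES if not profile.get(c)]


# Hypothetical profile for one business unit.
PROFILE = {
    "business_unit": "retail branch operations",
    "hardware": ["personal computers", "data and voice communications"],
    "software": ["vendor supplied point-of-sale package"],
    "documentation": ["user procedures"],
    "outside_support": ["public networks"],
    "facilities": [],
    "personnel": ["branch manager", "two cashiers"],
}

if __name__ == "__main__":
    print(missing_categories(PROFILE))
```

An empty or absent category is treated as a gap, so an incomplete profile is flagged before alternative recovery strategies are analysed against it.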
Plan development
During this phase, recovery plan components are defined and
plans are documented. This phase also includes the implementation
of changes to user procedures, upgrading of existing data
processing operating procedures required to support selected
recovery strategies and alternatives, vendor contract negotiations
(with suppliers of recovery services) and the definition of
recovery teams, their roles and responsibilities. Recovery
standards are also to be developed during this phase.
Testing/exercising programme
The plan testing/exercising programme is developed during
this phase. Testing/exercising goals are established and alternative
testing strategies are evaluated. Testing strategies tailored
to the environment should be selected and an ongoing testing
programme should be established.
Maintenance programme
Maintenance of the plans is critical to the success of an
actual recovery. The plans must reflect changes to the environments
that are supported by the plans. It is critical that existing
change management processes are revised to take recovery plan
maintenance into account. In areas where change management
does not exist, change management procedures should be recommended
and implemented. Many recovery software products take this
requirement into account.
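A minimal sketch of how a change management process might flag a plan for maintenance is shown below. It assumes that each plan records its last review date and that changes to the supported environment are logged against it; the function name and the 180-day review interval are hypothetical choices, not a recommendation from any particular recovery software product.

```python
from datetime import date, timedelta


def plan_needs_review(last_reviewed, changes_since_review, today,
                      max_age_days=180):
    """Flag a plan when its environment has changed or its review is overdue.

    max_age_days is an assumed review interval for illustration.
    """
    overdue = (today - last_reviewed) > timedelta(days=max_age_days)
    return bool(changes_since_review) or overdue


if __name__ == "__main__":
    # A logged change triggers a review even if the plan is recent.
    print(plan_needs_review(date(2024, 1, 1), ["new branch server"],
                            date(2024, 3, 1)))
```

Tying the check to logged changes as well as elapsed time reflects the point that the plans must track changes to the environments they support, not just be revisited on a calendar.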
Initial plan testing and implementation
Once plans are developed, initial tests of the plans are conducted
and any necessary modifications to the plans are made based
on an analysis of the test results. Specific activities of
this phase include the following:
-
Defining the test purpose/approach;
-
Identifying test teams;
-
Structuring the test;
-
Conducting the test;
-
Analysing test results; and
-
Modifying the plans as appropriate.
The approach taken to test the plans depends, in large part,
on the recovery strategies selected to meet the recovery requirements
of the organisation. As the recovery strategies are defined,
specific testing procedures should be developed to ensure
that the written plans are comprehensive and accurate.
The author is managing partner, retail solutions
division, professional services for South East Asia at
NCR. He can be contacted at ajay.gidh@ncr.com