Peer-to-Peer
DR at a mutual fund
SBI Mutual Fund's DR site recently went live in Chennai,
reports Vinita Gupta.
The one important lesson that many organisations learned after last year's
deluge in Mumbai is the significance of maintaining a disaster recovery (DR)
site. The companies that coped did so because they had a DR site to resume
operations from. The other reasons for the growing acceptance of DR sites are
the increasing number of business applications and the increasing dependence
of organisations on IT.
Though SBI Mutual Fund (SBIMF) was not affected by the flooding last year (or
this year, for that matter), it has nevertheless invested in a DR site and
is quite confident that if disaster hits it is prepared for the worst.
The need
The reasons that compelled SBIMF to go in for a DR site were
internal risk management, regulatory guidelines from the Securities and Exchange
Board of India (SEBI), and the desire to gain investors' trust.
"Institutional investors
like to know that risk
management practices are in place before they invest in a mutual fund"
- Subhojit Roy
Head, IT
SBI Mutual Fund
Says Subhojit Roy, SBIMF's Head of IT, "Business
Continuity (BC) is very important, especially for the financial services sector.
DR is a sub-set of BC. To keep the business running in case of a disaster such
as a natural calamity, or the break-down of the primary data centre, DR is a
must. SBIMF has many applications, such as publication of the net asset
value (NAV) for investors, which need to run every day irrespective of
disasters."
A mutual fund's (MF) workings are regulated
by SEBI guidelines, which stipulate that the MF, its registrar and transfer
(R&T) agents and its custodians should have an offsite back-up facility and a
business contingency plan that is tested and evaluated on a regular basis. The
business contingency plan should be comprehensive and should cover IT, infrastructure
and personnel requirements.
Since the MF works with intermediaries such as banks, custodians,
R&T agents and brokers, how fully and how quickly it can restore normal
operations will also depend on the individual DR implementations of its
partners.
The need for DR among MF providers also arises because investors want to know the
level of preparedness of the provider before investing. Notes Roy, "BC
has become quite critical in the financial services sector. Apart from regulatory
requirements for DR and BC, institutional investors also like to know before
investing whether risk management practices are in place or not."
Process and implementation
The process of DR planning began in May 2005, and the final implementation started
in February this year; the DR site went live in June.
SBIMF's DR site is in Chennai. The Tamil Nadu capital was chosen because
it does not fall in a high seismic activity zone. Apart from deciding
the location of its DR site, SBIMF had to decide on the cost and modalities,
i.e. whether the site would be deployed and managed by an in-house IT team or
outsourced. Says Roy, "SBI had already set up a complete DR
site in Chennai for its core banking and ATM network, and they provided space
and infrastructure in their DR data centre to us."
Though the site has well-equipped infrastructure, skilled personnel and BS7799
certification, SBIMF had to establish its own systems at the site.
The stages of DR
- Level 1. The first level of DR implementation consisted
of planning and implementing policy-based strategic back-up management, back-up
strategies, data consolidation and tape vaulting at the offsite facility.
Informs Roy, "We are presently taking a daily back-up of the data on all critical
servers. The back-up tapes are stored in a fire-proof cabinet in our office
as well as in the bank's locker for offsite storage."
- Level 2. The second step included charting out
the critical components and designing a redundancy plan. Most of the servers
and active network components are critical to the operations. A single point
of failure in such components can raise the risk of disasters and bring the
entire business to a halt. At this level, redundant paths are designed to
avoid total disruption. "As a result of risk mitigation, you get different
redundancy designs for critical network components. All single points of failure
are treated for redundancy planning," adds Roy.
- Level 3. Finally, in the third stage of DR, the
primary site is given an alternative site of operation from which
business-critical processes can be carried out within the stipulated recovery
time objective (RTO) and recovery point objective (RPO). While setting up a DR
site, an appropriate data recovery solution is defined to satisfy the RTO and RPO.
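
To make the two objectives concrete, the following is a minimal Python sketch of how a DR drill could be scored against them. The article does not state SBIMF's actual RTO or RPO, so the targets, timestamps and names (RecoveryObjectives, evaluate_drill) below are hypothetical and only illustrate the definitions: RPO bounds how much recent data may be lost, RTO bounds how long service may stay down.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RecoveryObjectives:
    rto: timedelta  # maximum tolerable time to restore service
    rpo: timedelta  # maximum tolerable window of lost data


def evaluate_drill(last_replica, disaster, restored, objectives):
    """Return (data_loss, downtime, rpo_met, rto_met) for one drill."""
    data_loss = disaster - last_replica   # data written after the last copy is lost
    downtime = restored - disaster        # time until service runs from the DR site
    return (data_loss, downtime,
            data_loss <= objectives.rpo,
            downtime <= objectives.rto)


# Hypothetical targets and timestamps, for illustration only.
targets = RecoveryObjectives(rto=timedelta(hours=6), rpo=timedelta(hours=4))
result = evaluate_drill(
    last_replica=datetime(2006, 7, 1, 12, 0),
    disaster=datetime(2006, 7, 1, 14, 30),
    restored=datetime(2006, 7, 1, 19, 0),
    objectives=targets,
)
print(result)
```

In this made-up drill, two and a half hours of data would be lost and service would be back after four and a half hours, so both hypothetical targets are met.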
Applications on the DR site
SBIMF is running business applications such as Mfund, and front office and cash
management systems at the DR site. All business-critical systems, such as the Oracle
database and the mail, file and print servers, are being replicated.
Roy says, "Based on business impact analysis and the objectives of BC such
as RTO and RPO, we have selected these applications and data replication technology.
The applications are front office and back office systems (running on Oracle
9i), the cash management system (also running on Oracle 9i), portfolio management
system (running on MS-SQL), centralised mailing system (Lotus Domino 6.5.3)
and files of mapped drives of all the users in the network of the primary site.
Non-critical applications such as workflow applications are not part of DR."
Technology used
SBIMF has about 50 branch offices which handle sales and investor servicing.
All these branches are connected to the corporate office (at Cuffe Parade in
Mumbai) through the WAN. Data from the branches is collated at the centralised
server located at the corporate office. The servers are Intel-Windows-based.
Data is replicated in two ways: host-based replication and consolidated replication.
Host-based replication means replicating data from one system at the primary
site to a similar system at the DR site. It can be done at the application
level (as with Oracle Data Guard) or through third-party software. The other
way is to consolidate the data from the various servers into a single storage
box (like a SAN or NAS box), and then replicate the data of the different
applications from that storage box to another similar box at the DR site.
SBIMF has chosen the consolidated replication method. It first brings all
critical server data together through storage consolidation.
Using Network Appliance's fabric-attached storage (FAS), it then
replicates all the data to a similar FAS device at the DR site. At present, the
servers are also accessing the FAS box.
Informs Roy, "We have done data consolidation at the primary site, that
is, the corporate office. All the critical data of the Oracle, mail and file
servers have been migrated into a unified storage box." Critical data of
SBIMF gets replicated every four hours; this means that whatever data there
is in the FAS box at the primary site gets replicated to the FAS box at the
DR site. The less critical data is replicated at the end of the day to reduce
bandwidth utilisation during working hours.
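
The two-tier schedule described above can be sketched in a few lines of Python. This is only an illustration, not SBIMF's configuration: the article gives the four-hour interval but not the clock times, so the assumption that cycles start at midnight, the 20:00 end-of-day slot and the function name next_replication are all hypothetical.

```python
from datetime import datetime, time, timedelta

CRITICAL_INTERVAL_HOURS = 4   # stated in the article
END_OF_DAY = time(20, 0)      # assumption: the article gives no clock time


def next_replication(now, critical):
    """Return the next scheduled replication time strictly after `now`."""
    if critical:
        # Assumption: cycles are aligned to midnight (00:00, 04:00, 08:00, ...).
        midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
        step = timedelta(hours=CRITICAL_INTERVAL_HOURS)
        completed_cycles = (now - midnight) // step
        return midnight + step * (completed_cycles + 1)
    # Less critical data goes once a day, after working hours.
    run = datetime.combine(now.date(), END_OF_DAY)
    return run if now < run else run + timedelta(days=1)


now = datetime(2006, 7, 3, 10, 15)
print(next_replication(now, critical=True))    # next four-hour cycle
print(next_replication(now, critical=False))   # end-of-day run
```

With a fixed four-hour cycle, the copy at the DR site can lag the primary by up to four hours plus the time the transfer itself takes, which is the figure a recovery point objective would have to accommodate.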
At the primary site, SBIMF is using a Tandberg autoloader and Veritas back-up
software for archival. Earlier, back-ups were taken onto SDLT tapes without an autoloader.
For connectivity, the primary and DR sites are linked by leased lines
of 2 Mbps. Since it shares a DR site with SBI, SBIMF could make use of the
bank's existing link. Reveals Roy, "SBI has set up a leased line between its Chennai DR site
and the central hub in Mumbai; we too are connected to the central hub of the
SBI through a leased line of 2 Mbps. Because of this we saved on the cost of
setting up our own leased line connecting the DR site to Mumbai."
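
A quick back-of-the-envelope calculation shows why scheduling matters on a link of this size. The Python sketch below estimates the most data a 2 Mbps line could carry in one four-hour replication window. It assumes "2 Mbps" means 2,000,000 bits per second and ignores protocol overhead and any other traffic on the line, so it is an upper bound rather than a measured figure; the function name and the utilisation parameter are hypothetical.

```python
LINK_BITS_PER_SECOND = 2_000_000  # "2 Mbps", read here as decimal megabits


def max_transfer_bytes(window_hours, utilisation=1.0):
    """Upper bound on bytes moved over the link in the given window."""
    seconds = window_hours * 3600
    return LINK_BITS_PER_SECOND * utilisation * seconds / 8


four_hour_cap = max_transfer_bytes(4)
print(f"{four_hour_cap / 1e9:.1f} GB")  # 3.6 GB at full utilisation
```

Under these assumptions, the changed data sent in each four-hour cycle has to stay below roughly 3.6 GB, which is consistent with deferring the less critical data to the end of the day.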
Role of BCP committee
The company's Business Continuity Planning (BCP) committee is the highest-level
committee for DR. This committee takes the final decisions in actual disaster
situations, and the BCP team acts on those decisions. Typically, a BCP
committee comprises the top management team, members of different functional
areas, and the IT team. "The BCP team is responsible for reviewing the
DR / BC plan, testing the DR site periodically through live DR drills with the
help of users, and has a specific role to play in case of disaster," says
Roy.
The challenges faced by the SBIMF team in setting up the DR site were selection
of the site, making a complete DR / BC manual, involving all departments / functions
of the company, planning appropriate technology for the DR requirements of the
company, continuous review and updating of DR / BC processes, and regular testing
of DR. The testing to ensure accuracy of the DR site is conducted every quarter.
First step to BC
Roy believes that having a DR site is the first step towards BC. If any of the
server components at the primary site is down, SBIMF can work from the DR site
till the primary site's equipment is revived. "Business impact analysis
helps as it gives us a complete picture for setting up an alternate operational
site for BC, and also the manpower requirements for BC. It is useful in building
adequate redundancy into the present infrastructure, and in preparing a complete
DR / BC manual that gives everybody clear guidelines for disaster situations."