|
Corralling data in the warehouse
Although you can get much of the same results with BI/Analytics,
data warehousing makes it that much easier for an enterprise get to grips with
its data in order to spot opportunities and devise competitive strategies. By
Prashant L. Rao

Atul Jalan
CEO & Mng. Director,
Manthan Systems |
In the case of data warehousing, it all starts with Extract,
Transform and Load (ETL), which is the core and soul of a data warehouse. ETL
processes are responsible for the completion of data conversion from the defined
data source to the target data warehouse in accordance with unified rules,
said Atul Jalan, Managing Director & CEO, Manthan Systems.
When
you think about it, the idea behind having a data warehouse is to ensure that
an organization gets access to clean, meaningful data. Data quality is, therefore,
the biggest issue here. Getting the right data is vital for any BI implementation.
Say, I have a customer who has a lot of data in SAP while the budgets and variances
are in Excel, which have not been captured in SAP. I would have to marry the
data from both Excel and SAP and ensure that data integrity is maintained. This
part is extraction, where you pull out the data. There are cases where data
is not as it should be and you have to transform it. Say, conveyance expenses
are in Excel and traveling expenses in SAP. I would have to create a mapper
to transform this data to create a master set of accounts. That's the transformation
part. Finally, you have to load the data into a database or data warehouse in
order to reuse it, said Sanjay Mehta, CEO, MAIA Intelligence. In most
cases, when ERP or core banking was implemented, they would have defined the
structure of a hierarchycountry, zone, regions etc. In reality, over a
few years, the business would have changedthe zones would have changed,
departments would have increased or M&A activity would have occurred. The
ERP defined structures may no longer be relevant. In such cases, most of the
decision making that is based on prevailing structures would move to a spreadsheet
scenario where data has to be massaged.

Sudesh Prabhu
Director Presales & Services,
Sybase |
Sudesh Prabhu, Director Presales and Services, Sybase, said,
Production data may not always be ready for data warehousing. There has
always been a gap between how production systems represent themselves and how
business users would like their data to be. There's some transformation required
and ETL does that part of the job. There's a good amount of computation that's
also done. Sometimes, for convenience, you may want to keep computed data and
ETL does this very well.
Vinod Shankar - Head, Business Intelligence, activecubes, commented, An
ETL process is the backbone of any data warehousing application. Data warehouses
were primarily conceived in order to bring data together from disparate sources,
integrate and cleanse them and put them in a repository that gave a 360 degree
view of an organization. ETL is the foundation for this. For any physical data
warehouse, ETL is going to be very important and it accounts for 50-60% of the
effort in an end-to-end data warehousing project.
Using an ETL tool is core to the success of an agile
data warehouse. Loading data in an automated way, leveraging the reuse of loading
scripts, implementing best practices and monitoring to ensure data qualitythese
elements are so key to a data warehouse that a solid ETL process and tool is
a marker of success. Agility is key herebeing able to create, update,
and delete data from heterogeneous sources in a rapid and reliable manner is
important, said Zubin Dowlaty, Head of Innovation & Development, Mu
Sigma.

Ajoy Kumar
Director and Head of Data Warehousing & Business Intelligence Practice,
Virtusa Corp. |
The popular ETL tools include IBM-DataStage, Informatica,
Oracle-ODI, Abintio and Microsoft-SSIS, commented Ajoy Kumar, Director
and Head of Data Warehousing & Business Intelligence Practice, Virtusa Corporation.

Rajesh Ramaswamy
Global Head Business Intelligence Practice, Marlabs Inc. |
ETL not only consumes a significant portion of the development
lifecycle but it is also the piece that evolves the most over the lifetime of
the warehouse due to changes in the source system, the business rules or the
end-user requirements. Therefore, an ETL design should not just be effective,
but also efficient, modular and scalable. Nowadays, ETL for a data warehouse
is getting to be more and more 'real-time' and reflecting operational business
changes with very low latency, said Rajesh Ramaswamy, Global Head Business
Intelligence Practice, Marlabs Inc.
The data repositories in any enterprise are created
organically. This creates a difference in their structure, technology etc. a
heterogeneity, even though it may come from a single vendor. Since these disparate
sources hold data about various subjects or functions of the organization and
a data warehouse needs to have centralized data, it becomes necessary to integrate
the data from different sources, said Dr. Jay B. Simha, CTO, ABIBA Systems.
Typical data sources
ERP, CRM, sometimes billing systems all contribute to
the data warehouse. There may also be home-grown applications that could be
categorized as home-grown ERP, commented Prabhu.
Legacy systems built 15, 20 or 25 years ago that are responsible for core
billing/warranty; newer applications implemented over time such as CRM, ERP
modules; social media, external data from research agencies etc. are all typical
data sources for a data warehouse, said Shankar.
Most data warehouses always include transactional data from internal OLTP
systems, by channel, as the core of the data warehouse. But they are all client
dependent, commented Dowlaty.
An enterprise may maintain legacy systems, unstructured
and semi structured data stores and industry standard transactional applications.
These data stores may span various business groups including CRM, sales, support,
operations, and finance, said Amit Gupta, Associate Architect, BI &
Analytics Competency, Persistent Systems.
Kumar said, The data source for a data warehouse depends
on the subject area. For example, a marketing data warehouse's data source could
include CRM and social networks. In addition to the data sources mentioned above,
other data sources for data warehouse implementations include from custom applications,
syndicated data and spreadsheets.
Traditionally, sources of data into the warehouse have
included ERP, CRM, accounting and other core legacy systems. They also contain
offline information from Excel spreadsheets and the like. Increasingly, however,
ETL processes cater to a wider variety of data sources including XML files,
clickstream data, log files, unstructured data such as Word documents, e-mail
(even voice and video files), social data and competitive information,
said Ramaswamy.
The data will come from transactional systems including ERP, SCM and CRM.
In the case of telecom or banking the billing/core banking system will provide
additional sources. Though, social network analysis is gaining popularity, currently
it is not a source for data warehouses, said Simha.
Creation and challenges

Sanjay Mehta
CEO,
MAIA Intelligence |
We have a standard DW model for multiple verticals.
For e.g. we have a standard data model for SAP with pre-packaged queries for
SQL Server as a data warehouse by which I guarantee customers 70 reports in
7 weeks on SAP. We have similar offerings for Microsoft Dynamics and Finacle.
We have partners who help us build specialized data warehouses, said Mehta.
Projects range from three to nine months. This is assuming that there's
a good amount of clarity in terms of what the customer's looking for. A large
part of this time is spent identifying data sources, mapping data, reports,
etc. Some verticals such as telecom accept a blueprint approach; most telcos
will have similar requirements. Some verticals lack set requirements and these
may evolve over time. Some organizations take it as a multi-stage process. You
put in a framework and the customer can build on that. Other than BFSI and telecom,
this is true for almost every other industry, said Prabhu.
Shankar said, Oracle has BI apps. If I have Oracle as my ERP system, I
can buy OBI Apps on top of it. It comes with pre-built ETL mapping, routines
etc. You just need to install and configure it. It reduces the time taken to
implement a data warehouse with pre-built data models, KPI engine etc. The vanilla
implementation of a packaged BI solution drastically reduces the time required
to deploy. You can have a fairly large data warehouse up and running in three-four
months. If you look at building data marts and integrating it, it takes about
six to eight months to start using the data warehouse. Then you keep building
on top of it. Getting to the enterprise data warehouse where you can look at
your first set of reports coming out of it takes nine to twelve months.
Dowlaty said, Usually it takes three to six months
to get the first version of a data warehouse out assuming that a solid ETL tool
is utilized and then you have to set about iterating improvements.
A typical department level data mart can be developed in three months,
an enterprise data warehouse can take anywhere between 12 to 36 months to be
developed and stabilized, said Simha who went on to list the major challenges
encountered in this process.
- End user acceptance: Resistance to change from current
practices to a fact/evidence-based approach will reduce the effectiveness
of a data warehouse in terms of RoI
- Delays in development and deployment will result
in technological obsolescence
- A technology-driven rather than a business-driven
approach will result in poor adoption
- Top management support is the most important challenge
in bringing BI/DW to an enterprise
Front-ends being used today
According to Jalan, todays data warehouse must be:
- Fully Web-deployed, with every end-user and administrative
capability usable through a standard Web browser from any location
- Historically accurate up-to-the-moment so that the
longer view of history extends seamlessly to include transactions happening
five minutes ago
- Profoundly distributed to ensure that internal business
processes, external data sources, and the far-flung databases are all part
of the enterprise data warehouse
- Dynamically changing in a way that, as we venture into
new lines of business and initiate new business contracts, the warehouse smoothly
accommodates new data types and interfaces
Setting the hype aside, it really is becoming a Web
world. Not only must we browser-enable our query tools and user interfaces,
but we must also change the way that we do business. End users can access information
using various capabilities including dashboards, score cards, canned reports,
ad-hoc analytical tools, alerts etc. Users can use these capabilities on thick
clients, browser-based applications and mobile phones, said Jalan.
Prabhu said, BI tools remain popular. They provide far
more abilities than was the case traditionally. In certain scenarios, depending
on the interaction needs of the customer, people are adopting Web 2.0 and small
tools are being used that aren't typical BI. It depends on how much is the intended
use and how much variation is required.
When we implement any presentation layer, it will sit on top of the data
warehouse in SQL Server 2008. We would harness SQL Server for the throughout
and represent it as a chart, cube, view, widget, KPI definition, dashboard etc.
If I define a sales invoice within 1Key, the same query/connection can be reused
for the dashboard or the cube etc. It's all a single layer, said Mehta.
Shankar said, There's innovation taking place in the conventional BI market
in terms of newer and better ways of delivering information. Lightweight apps
are emerging. Today the demand is for depicting data in the way that the business
wants it and it should look good. Interactive visualization (hover over a map
to drill down for e.g.) and contextual BI (depicting the image of an object
and information pertaining to various aspects of that object) are two of the
methods being used. Lots of things are happening in the Web 2.0 space. BI is
also being rendered in mashups. Products provide the flexibility for users to
customize portals and use widgets. BI has been about static information. Now
many customers are asking, how can I enter my own data against a dashboard so
that it goes back to the operational levels of the organization. Mobile devices
are coming up fastthe iPad, iPhone and BlackBerryespecially with
senior management wanting to be plugged into what's happening on a daily basis.
Even if you look at desktop applications, a manager doesn't want to run a specific
application. A widget on the desktop shows how a department is doing. These
are better ways of integrating with enterprise portals.
Talking about the heterogeneity in the user interfaces that are being deployed
today, Dowlaty commented, Most successful front ends have been using Rich
Internet Application (RIA) technology such as Adobe Flex and Microsoft Silverlight.
Gupta said, Most commercial business intelligence products support charts,
dashboards, reports, drill down/roll up analysis, ad-hoc queries, OLAP cubes
and alerts. The graphical representation of the results of analytical methods
such as time series analysis, customer segmentation, what-if analysis, association
rule mining and various domain-centric analysis provides deep insight into business
models. Dashboards and scorecards are used by management to track a company's
health, whereas OLAP cubes and analytics are used to take strategic business
decisions.
Kumar's take was that a variety of front ends were used depending upon the business
user's needs. Customers who need query reporting and dashboards typically selected
industry leading BI tools such as IBM Cognos, SAP Business Objects or Microsoft
Excel/SharePoint.
Ramaswamy felt that traditional BI tools still formed the bulk of the presentation
layer. However, fat clients were no longer in vogue. All results are presented
via browsers typically in the form of interactive visualizations either as dashboards
or scorecards that then drill down into reports. Increasingly, BI systems are
embedded into existing operations systems so as to present an integrated offering.
These could be in the form of mashups or other RIA.
Simha said, Most of the Gen X BI tools are RIAs, providing a browser-based
thin client with desktop-like features. The trends are increasingly inclined
towards visualization, user driven (rather than being IT dependent) analysis
and reporting and embedded advanced analytics such as forecasting and scoring.
|
Raymond has implemented MAIA Intelligence's data warehousing solution
to consolidate balance sheets of its group companies using 1Key financial
consolidation.
Vivek Kale
|
Vivek Kale, CIO, Raymond Limited, said, We
needed to do group level consolidation for 17 subsidiaries and deployed
1Key for the same. We have to report results every quarter and consolidation
was undertaken for this. Since late 2007, SAP has been the base operational
system at Raymond. There are areas and small companies, however, that
are not on the SAP platform. We needed a place where we could consolidate
the data of SAP and non-SAP companies' performance.
A few of the group's subsidiaries were using
legacy systems. Although Raymond needed to consolidate the information,
because of historical reasons, the accounts were different. At every stage
it had to get the current consistent data and it was important to do mapping
to ensure that this happened. Once you do that, it is a matter of
posting, said Kale. The statutory reporting requirements including
MIS had to be handled. Collecting, integrating and cleansing data was
the first step followed by mapping it consistently. The requirement was
to take care of reporting for both statutory reasons and for performance.
We needed to do parallel runs for the data that was already done
last year. Then we needed to replicate that data and get results that
were the same as last year's. Now, we are doing it in real-time. As we
do a close, that will be preceded by doing this consolidation and giving
out the results, said Kale.
Once the mapping etc. is done, you can produce whatever reports you
want with the required flexibility. Tomorrow, if there are any structural
changes in the group, we have the flexibility on what we want to report
to what level and in what format. We have had BI/analytics/reporting for
decades. However, it is much easier to think about this with a data warehouse.
It is easier for accounts people to understand. For people who are still
comfortable with spreadsheets, the data can be taken out in a format that's
Excel compatible, concluded Kale.
|
Size and growth rate
In MAIA Intelligence's experience, the average age of data that is included
in a warehouse is between three to five years. The average size of a data warehouse
is about 500 GB going up to 1.5 TB for large customers running SAP. If it is
in the BFSI sector, there are multiple backoffices and an average around seven
to nine apps on the back-end. In a standard manufacturing environment you tend
to find one monolithic ERP system along with subsidiary systems. In Healthcare
there would be five-six systems. Data in BFSI grows at a much faster rate. For
tracking sales of services, there would be a different system for each one.
They would have multiple relationships with the customer. Their data growth
is twice that of manufacturing.
Prabhu said, The source data size could be 15-30 TB. The technology offers
a fair level of compression and the implementation may turn out from 5-12 TB.
The warehouse is constant but the data churning in and out could be much higher.
We have seen 40% growth. People don't keep all the data in the warehouse. In
telecom, when it comes to billing data, they keep about three months' worth
in the warehouse. Unless the subscriber base grows, this data would be constant.
Shankar pointed out that, for large customers, a warehouse size of over a terabyte
was common. This was the case with companies in telecom or retail where a lot
of consumer data was involved. For SMBs he felt that it could be anything from
25 to 500 GB. Typically, it would be in the range of 100-200 GB. Growth depends
on the type of business etc. Telecom and retail faced 1.5 to 2x growth per year.
Sandeep Mohile, Information Management Head, Torry Harris Business Solutions
(P) Ltd. argued that companies were building data marts rather than full-fledged
data warehouses. We haven't seen large data warehouses being built. More
often, you find a data mart being built for a functional area such as Sales
& Marketing and analytics being done on top of that, he said. Telecom/retail
data mart size would be from 100 GB to 2 TB; this would include a couple years'
worth of cumulative data.
Mu Sigma's experience was that warehouse sizes varied from a few hundred gigabytes
to several Terabytes. In some cases, we deal with online database sizes
that run into petabytes as well, said Dowlaty.
In Manthan's experience, the average size of its customers' data warehouses
ranged from 200 to 500 GB. The growth rate was around 10-15%.
Data warehouse size may extend up to a few petabytes for companies that
provide online and telecommunication services and it may grow non-linearly in
tandem with the growth of the customer base, said Gupta.
Kumar said that a typical Indian customer's data warehouse would range in size
from 100 GB plus growing at over 25% in size year-on-year. Financial services,
telecom and retail customers saw the largest growth.
|
A data warehouse consolidates operational data from
a variety of heterogeneous sources including any database, application,
file or storage facility that can hold data in a format that is understood
and used by businesses.
- Core Production systems
- Customer Service systems
- Excel/Access files
- XML / text files
- Financial Systems (Cash Management, GL Systems, etc.)
- ERP systems (SAP, PeopleSoft, Siebel, MQ Series, etc.)
- Source systems (Oracle, SQL Server, Teradata, etc.)
- Third party data sources: (Nielsen, IRI, etc.)
Source: Manthan Systems
|
Ramaswamy said that gigabytes were passe and that terabytes,
petabytes and exabytes were the quantities talked about today. In India,
a lot of retail and telecom organizations have data warehouses that are several
terabytes in size. Web-based organizations that build clickstream warehouses
have petabyte sized implementations, he said. Growth rates varied on account
of three factors namely business growth, increase in new data points added to
the warehouse and the rise in the granularity of data in the warehouse.
Simha felt that it would be in the range of 1 TB for a medium-sized organization
to around 5 TB for a large organization. However, it could reach 10 plus TB
in the case of major enterprises in verticals such as telecom or BFSI.
Earning an RoI
Calculating the return on investment (RoI) on any deployment is difficult. Data
warehousing is no exception. Mehta felt that it was all about the value of data
that was unlocked thanks to the existence of the data warehouse. I've
got a FSI company where we had worked across 1,600 employees and they had entry-exit
show cards to measure attendance. We picked up the data and analyzed it and
found 53 days of productivity loss on account of late coming. We put it across
to the HR department in terms of male/female, seven grades and six departments.
This data was published on the intranet and suddenly people started coming in
on time. Even if you are able to save 20 days from those 53, the RoI is huge.
Mehta compared data warehousing's value to that of ERP. A data warehouse lets
you maintain nimble hierarchies/structures which is difficult in an ERP or core
banking system. An ERP system would give a sales figure for the quarter. In
the data warehouse, it would give a comparison with the last three quarters
or against budgeted targetsyou can do comparisons against multiple dimensions.
If I've got 100 users, 30 users would typically only do reporting. When
reports are fired, the performance of the system deteriorates. When we move
reporting from the transactional system into a data warehouse, the users will
now fire reports from the warehouse taking the load off and thereby improving
the performance of the transaction system. The average cost of license for SAP
is, say, Rs. 1 lakh per user. That transaction which is happening across the
SAP front with 30 users works out to Rs. 30 lakhs for just reporting users.
If I move to a data warehouse for reporting, I save on that straight away,
he added.
Prabhu pointed out, RoI in data warehousing is quick because that kind
of decision support system has an immediate impact on your business and if you
have a business which is large enough, the cost of a data warehouse is negligible.
RoI is limited by time of investment/implementation. It takes 1.5 to 2 years
for an organization to mature its data warehouse. Once you do that, the RoI
is assured.
Sybase's customers in India include telcos such as IDEA Cellular and Spice Telecom.
ICICI Bank is also a customer. It also has some government customers including
the Indian Railways.
Earlier when MPP was the only option, it was an enterprise play. Today
even smaller organizations are going in for a data warehouse. For a data warehouse,
the entry point would be at least $100,000. In a blueprint scenario, it may
not be an enterprise data warehouse in the true sense but you can put down a
good analytical platform. Large organizations would end up spending at least
$500,000 (BI tool, hardware, storage, services etc.), concluded Prabhu.
Shankar said, There are two types of benefits. Software benefits include
improved information access, better insights and improved productivity. Earlier,
if a person wanted a report, they would request the IT department to do the
needful and IT would take four days to generate the report. Now you can do it
yourself in 15 minutes. If you want to do a campaign in a short period of time,
thanks to direct business analytics, the data is already 75% there. You can
earn a realistic RoI in four to five months. The software benefits start accruing
in six to eight months. Generally, RoI can be earned in an year in data warehousing
implementations.
Kumar said, Typically Indian customers see RoI in two years.
Jalan stated that data warehouse RoI depended on the maturity of an organization,
adoption of analytics across the board, the ability to drive decisions from
insights etc. If these attributes of the customer are satisfied, then
he can get a quick RoI of one or two years, he said.
Ramaswamy commented, A lot of the RoI in a data warehouse is intangible;
hence, it is vital that the tangible benefits be identified, benchmarked and
tracked. RoI should start accruing from the minute that business users start
using the warehouse.
Simha said, It is anywhere from three months (for data mart apps) to around
four years for a data warehouse.
prashant.rao@expressindia.com
|