Untitled Document
www.expresscomputeronline.com WEEKLY INSIGHT FOR TECHNOLOGY PROFESSIONALS
24 January 2011  
Untitled Document
Sections

Cover Story
Case Study
Tech Views
News
Products

Express Intelligent Enterprise

Events

Technology Senate
Technology Sabha

Services
Subscribe/Renew
Archives
Search
Contact Us
Network Sites
Exp.Channel Business
Express Hospitality
Express TravelWorld
Express Pharma
Express Healthcare
Group Sites
ExpressIndia
Indian Express
Financial Express

Untitled Document
 
Home - Cover Story - Article

Corralling data in the warehouse

Although you can get much of the same results with BI/Analytics, data warehousing makes it that much easier for an enterprise get to grips with its data in order to spot opportunities and devise competitive strategies. By Prashant L. Rao


Atul Jalan
CEO & Mng. Director,
Manthan Systems

In the case of data warehousing, it all starts with Extract, Transform and Load (ETL), which is the core and soul of a data warehouse. “ETL processes are responsible for the completion of data conversion from the defined data source to the target data warehouse in accordance with unified rules,” said Atul Jalan, Managing Director & CEO, Manthan Systems.

When you think about it, the idea behind having a data warehouse is to ensure that an organization gets access to clean, meaningful data. Data quality is, therefore, the biggest issue here. “Getting the right data is vital for any BI implementation. Say, I have a customer who has a lot of data in SAP while the budgets and variances are in Excel, which have not been captured in SAP. I would have to marry the data from both Excel and SAP and ensure that data integrity is maintained. This part is extraction, where you pull out the data. There are cases where data is not as it should be and you have to transform it. Say, conveyance expenses are in Excel and traveling expenses in SAP. I would have to create a mapper to transform this data to create a master set of accounts. That's the transformation part. Finally, you have to load the data into a database or data warehouse in order to reuse it,” said Sanjay Mehta, CEO, MAIA Intelligence. In most cases, when ERP or core banking was implemented, they would have defined the structure of a hierarchy—country, zone, regions etc. In reality, over a few years, the business would have changed—the zones would have changed, departments would have increased or M&A activity would have occurred. The ERP defined structures may no longer be relevant. In such cases, most of the decision making that is based on prevailing structures would move to a spreadsheet scenario where data has to be massaged.


Sudesh Prabhu
Director Presales & Services,
Sybase

Sudesh Prabhu, Director Presales and Services, Sybase, said, “Production data may not always be ready for data warehousing. There has always been a gap between how production systems represent themselves and how business users would like their data to be. There's some transformation required and ETL does that part of the job. There's a good amount of computation that's also done. Sometimes, for convenience, you may want to keep computed data and ETL does this very well.”

Vinod Shankar - Head, Business Intelligence, activecubes, commented, “An ETL process is the backbone of any data warehousing application. Data warehouses were primarily conceived in order to bring data together from disparate sources, integrate and cleanse them and put them in a repository that gave a 360 degree view of an organization. ETL is the foundation for this. For any physical data warehouse, ETL is going to be very important and it accounts for 50-60% of the effort in an end-to-end data warehousing project.”

“Using an ETL tool is core to the success of an agile data warehouse. Loading data in an automated way, leveraging the reuse of loading scripts, implementing best practices and monitoring to ensure data quality—these elements are so key to a data warehouse that a solid ETL process and tool is a marker of success. Agility is key here—being able to create, update, and delete data from heterogeneous sources in a rapid and reliable manner is important,” said Zubin Dowlaty, Head of Innovation & Development, Mu Sigma.


Ajoy Kumar
Director and Head of Data Warehousing & Business Intelligence Practice,
Virtusa Corp.

“The popular ETL tools include IBM-DataStage, Informatica, Oracle-ODI, Abintio and Microsoft-SSIS,” commented Ajoy Kumar, Director and Head of Data Warehousing & Business Intelligence Practice, Virtusa Corporation.


Rajesh Ramaswamy
Global Head Business Intelligence Practice, Marlabs Inc.

“ETL not only consumes a significant portion of the development lifecycle but it is also the piece that evolves the most over the lifetime of the warehouse due to changes in the source system, the business rules or the end-user requirements. Therefore, an ETL design should not just be effective, but also efficient, modular and scalable. Nowadays, ETL for a data warehouse is getting to be more and more 'real-time' and reflecting operational business changes with very low latency,” said Rajesh Ramaswamy, Global Head Business Intelligence Practice, Marlabs Inc.

“The data repositories in any enterprise are created organically. This creates a difference in their structure, technology etc. a heterogeneity, even though it may come from a single vendor. Since these disparate sources hold data about various subjects or functions of the organization and a data warehouse needs to have centralized data, it becomes necessary to integrate the data from different sources,” said Dr. Jay B. Simha, CTO, ABIBA Systems.

Typical data sources

“ERP, CRM, sometimes billing systems all contribute to the data warehouse. There may also be home-grown applications that could be categorized as home-grown ERP,” commented Prabhu.

“Legacy systems built 15, 20 or 25 years ago that are responsible for core billing/warranty; newer applications implemented over time such as CRM, ERP modules; social media, external data from research agencies etc. are all typical data sources for a data warehouse,” said Shankar.

“Most data warehouses always include transactional data from internal OLTP systems, by channel, as the core of the data warehouse. But they are all client dependent,” commented Dowlaty.

“An enterprise may maintain legacy systems, unstructured and semi structured data stores and industry standard transactional applications. These data stores may span various business groups including CRM, sales, support, operations, and finance,” said Amit Gupta, Associate Architect, BI & Analytics Competency, Persistent Systems.

Kumar said, “The data source for a data warehouse depends on the subject area. For example, a marketing data warehouse's data source could include CRM and social networks. In addition to the data sources mentioned above, other data sources for data warehouse implementations include from custom applications, syndicated data and spreadsheets.”

“Traditionally, sources of data into the warehouse have included ERP, CRM, accounting and other core legacy systems. They also contain offline information from Excel spreadsheets and the like. Increasingly, however, ETL processes cater to a wider variety of data sources including XML files, clickstream data, log files, unstructured data such as Word documents, e-mail (even voice and video files), social data and competitive information,” said Ramaswamy.

“The data will come from transactional systems including ERP, SCM and CRM. In the case of telecom or banking the billing/core banking system will provide additional sources. Though, social network analysis is gaining popularity, currently it is not a source for data warehouses,” said Simha.

Creation and challenges


Sanjay Mehta
CEO,
MAIA Intelligence

“We have a standard DW model for multiple verticals. For e.g. we have a standard data model for SAP with pre-packaged queries for SQL Server as a data warehouse by which I guarantee customers 70 reports in 7 weeks on SAP. We have similar offerings for Microsoft Dynamics and Finacle. We have partners who help us build specialized data warehouses,” said Mehta.

“Projects range from three to nine months. This is assuming that there's a good amount of clarity in terms of what the customer's looking for. A large part of this time is spent identifying data sources, mapping data, reports, etc. Some verticals such as telecom accept a blueprint approach; most telcos will have similar requirements. Some verticals lack set requirements and these may evolve over time. Some organizations take it as a multi-stage process. You put in a framework and the customer can build on that. Other than BFSI and telecom, this is true for almost every other industry,” said Prabhu.

Shankar said, “Oracle has BI apps. If I have Oracle as my ERP system, I can buy OBI Apps on top of it. It comes with pre-built ETL mapping, routines etc. You just need to install and configure it. It reduces the time taken to implement a data warehouse with pre-built data models, KPI engine etc. The vanilla implementation of a packaged BI solution drastically reduces the time required to deploy. You can have a fairly large data warehouse up and running in three-four months. If you look at building data marts and integrating it, it takes about six to eight months to start using the data warehouse. Then you keep building on top of it. Getting to the enterprise data warehouse where you can look at your first set of reports coming out of it takes nine to twelve months.”

Dowlaty said, “Usually it takes three to six months to get the first version of a data warehouse out assuming that a solid ETL tool is utilized and then you have to set about iterating improvements.”

“A typical department level data mart can be developed in three months, an enterprise data warehouse can take anywhere between 12 to 36 months to be developed and stabilized,” said Simha who went on to list the major challenges encountered in this process.

  • End user acceptance: Resistance to change from current practices to a fact/evidence-based approach will reduce the effectiveness of a data warehouse in terms of RoI
  • Delays in development and deployment will result in technological obsolescence
  • A technology-driven rather than a business-driven approach will result in poor adoption
  • Top management support is the most important challenge in bringing BI/DW to an enterprise

Front-ends being used today

According to Jalan, today’s data warehouse must be:

  • Fully Web-deployed, with every end-user and administrative capability usable through a standard Web browser from any location
  • Historically accurate up-to-the-moment so that the longer view of history extends seamlessly to include transactions happening five minutes ago
  • Profoundly distributed to ensure that internal business processes, external data sources, and the far-flung databases are all part of the enterprise data warehouse
  • Dynamically changing in a way that, as we venture into new lines of business and initiate new business contracts, the warehouse smoothly accommodates new data types and interfaces

“Setting the hype aside, it really is becoming a Web world. Not only must we browser-enable our query tools and user interfaces, but we must also change the way that we do business. End users can access information using various capabilities including dashboards, score cards, canned reports, ad-hoc analytical tools, alerts etc. Users can use these capabilities on thick clients, browser-based applications and mobile phones,” said Jalan.

Prabhu said, “BI tools remain popular. They provide far more abilities than was the case traditionally. In certain scenarios, depending on the interaction needs of the customer, people are adopting Web 2.0 and small tools are being used that aren't typical BI. It depends on how much is the intended use and how much variation is required.”

“When we implement any presentation layer, it will sit on top of the data warehouse in SQL Server 2008. We would harness SQL Server for the throughout and represent it as a chart, cube, view, widget, KPI definition, dashboard etc. If I define a sales invoice within 1Key, the same query/connection can be reused for the dashboard or the cube etc. It's all a single layer,” said Mehta.

Shankar said, “There's innovation taking place in the conventional BI market in terms of newer and better ways of delivering information. Lightweight apps are emerging. Today the demand is for depicting data in the way that the business wants it and it should look good. Interactive visualization (hover over a map to drill down for e.g.) and contextual BI (depicting the image of an object and information pertaining to various aspects of that object) are two of the methods being used. Lots of things are happening in the Web 2.0 space. BI is also being rendered in mashups. Products provide the flexibility for users to customize portals and use widgets. BI has been about static information. Now many customers are asking, how can I enter my own data against a dashboard so that it goes back to the operational levels of the organization. Mobile devices are coming up fast—the iPad, iPhone and BlackBerry—especially with senior management wanting to be plugged into what's happening on a daily basis. Even if you look at desktop applications, a manager doesn't want to run a specific application. A widget on the desktop shows how a department is doing. These are better ways of integrating with enterprise portals.”

Talking about the heterogeneity in the user interfaces that are being deployed today, Dowlaty commented, “Most successful front ends have been using Rich Internet Application (RIA) technology such as Adobe Flex and Microsoft Silverlight.”

Gupta said, “Most commercial business intelligence products support charts, dashboards, reports, drill down/roll up analysis, ad-hoc queries, OLAP cubes and alerts. The graphical representation of the results of analytical methods such as time series analysis, customer segmentation, what-if analysis, association rule mining and various domain-centric analysis provides deep insight into business models. Dashboards and scorecards are used by management to track a company's health, whereas OLAP cubes and analytics are used to take strategic business decisions.”

Kumar's take was that a variety of front ends were used depending upon the business user's needs. Customers who need query reporting and dashboards typically selected industry leading BI tools such as IBM Cognos, SAP Business Objects or Microsoft Excel/SharePoint.

Ramaswamy felt that traditional BI tools still formed the bulk of the presentation layer. However, fat clients were no longer in vogue. All results are presented via browsers typically in the form of interactive visualizations either as dashboards or scorecards that then drill down into reports. Increasingly, BI systems are embedded into existing operations systems so as to present an integrated offering. These could be in the form of mashups or other RIA.

Simha said, “Most of the Gen X BI tools are RIAs, providing a browser-based thin client with desktop-like features. The trends are increasingly inclined towards visualization, user driven (rather than being IT dependent) analysis and reporting and embedded advanced analytics such as forecasting and scoring.”

CIO’s take

Raymond has implemented MAIA Intelligence's data warehousing solution to consolidate balance sheets of its group companies using 1Key financial consolidation.


Vivek Kale

Vivek Kale, CIO, Raymond Limited, said, “We needed to do group level consolidation for 17 subsidiaries and deployed 1Key for the same. We have to report results every quarter and consolidation was undertaken for this. Since late 2007, SAP has been the base operational system at Raymond. There are areas and small companies, however, that are not on the SAP platform. We needed a place where we could consolidate the data of SAP and non-SAP companies' performance.”

A few of the group's subsidiaries were using legacy systems. Although Raymond needed to consolidate the information, because of historical reasons, the accounts were different. At every stage it had to get the current consistent data and it was important to do mapping to ensure that this happened. “Once you do that, it is a matter of posting,” said Kale. The statutory reporting requirements including MIS had to be handled. Collecting, integrating and cleansing data was the first step followed by mapping it consistently. The requirement was to take care of reporting for both statutory reasons and for performance.

“We needed to do parallel runs for the data that was already done last year. Then we needed to replicate that data and get results that were the same as last year's. Now, we are doing it in real-time. As we do a close, that will be preceded by doing this consolidation and giving out the results,” said Kale.

Once the mapping etc. is done, you can produce whatever reports you want with the required flexibility. “Tomorrow, if there are any structural changes in the group, we have the flexibility on what we want to report to what level and in what format. We have had BI/analytics/reporting for decades. However, it is much easier to think about this with a data warehouse. It is easier for accounts people to understand. For people who are still comfortable with spreadsheets, the data can be taken out in a format that's Excel compatible,” concluded Kale.

Size and growth rate

In MAIA Intelligence's experience, the average age of data that is included in a warehouse is between three to five years. The average size of a data warehouse is about 500 GB going up to 1.5 TB for large customers running SAP. If it is in the BFSI sector, there are multiple backoffices and an average around seven to nine apps on the back-end. In a standard manufacturing environment you tend to find one monolithic ERP system along with subsidiary systems. In Healthcare there would be five-six systems. Data in BFSI grows at a much faster rate. For tracking sales of services, there would be a different system for each one. They would have multiple relationships with the customer. Their data growth is twice that of manufacturing.

Prabhu said, “The source data size could be 15-30 TB. The technology offers a fair level of compression and the implementation may turn out from 5-12 TB. The warehouse is constant but the data churning in and out could be much higher. We have seen 40% growth. People don't keep all the data in the warehouse. In telecom, when it comes to billing data, they keep about three months' worth in the warehouse. Unless the subscriber base grows, this data would be constant.”

Shankar pointed out that, for large customers, a warehouse size of over a terabyte was common. This was the case with companies in telecom or retail where a lot of consumer data was involved. For SMBs he felt that it could be anything from 25 to 500 GB. Typically, it would be in the range of 100-200 GB. Growth depends on the type of business etc. Telecom and retail faced 1.5 to 2x growth per year.

Sandeep Mohile, Information Management Head, Torry Harris Business Solutions (P) Ltd. argued that companies were building data marts rather than full-fledged data warehouses. “We haven't seen large data warehouses being built. More often, you find a data mart being built for a functional area such as Sales & Marketing and analytics being done on top of that,” he said. Telecom/retail data mart size would be from 100 GB to 2 TB; this would include a couple years' worth of cumulative data.

Mu Sigma's experience was that warehouse sizes varied from a few hundred gigabytes to several Terabytes. “In some cases, we deal with online database sizes that run into petabytes as well,” said Dowlaty.

In Manthan's experience, the average size of its customers' data warehouses ranged from 200 to 500 GB. The growth rate was around 10-15%.

“Data warehouse size may extend up to a few petabytes for companies that provide online and telecommunication services and it may grow non-linearly in tandem with the growth of the customer base,” said Gupta.

Kumar said that a typical Indian customer's data warehouse would range in size from 100 GB plus growing at over 25% in size year-on-year. Financial services, telecom and retail customers saw the largest growth.

Typical sources for a data warehouse

A data warehouse consolidates operational data from a variety of heterogeneous sources including any database, application, file or storage facility that can hold data in a format that is understood and used by businesses.

  • Core Production systems
  • Customer Service systems
  • Excel/Access files
  • XML / text files
  • Financial Systems (Cash Management, GL Systems, etc.)
  • ERP systems (SAP, PeopleSoft, Siebel, MQ Series, etc.)
  • Source systems (Oracle, SQL Server, Teradata, etc.)
  • Third party data sources: (Nielsen, IRI, etc.)

    Source: Manthan Systems

 

Ramaswamy said that gigabytes were passe and that terabytes, petabytes and exabytes were the quantities talked about today. “In India, a lot of retail and telecom organizations have data warehouses that are several terabytes in size. Web-based organizations that build clickstream warehouses have petabyte sized implementations,” he said. Growth rates varied on account of three factors namely business growth, increase in new data points added to the warehouse and the rise in the granularity of data in the warehouse.

Simha felt that it would be in the range of 1 TB for a medium-sized organization to around 5 TB for a large organization. However, it could reach 10 plus TB in the case of major enterprises in verticals such as telecom or BFSI.

Earning an RoI

Calculating the return on investment (RoI) on any deployment is difficult. Data warehousing is no exception. Mehta felt that it was all about the value of data that was unlocked thanks to the existence of the data warehouse. “I've got a FSI company where we had worked across 1,600 employees and they had entry-exit show cards to measure attendance. We picked up the data and analyzed it and found 53 days of productivity loss on account of late coming. We put it across to the HR department in terms of male/female, seven grades and six departments. This data was published on the intranet and suddenly people started coming in on time. Even if you are able to save 20 days from those 53, the RoI is huge.”

Mehta compared data warehousing's value to that of ERP. A data warehouse lets you maintain nimble hierarchies/structures which is difficult in an ERP or core banking system. An ERP system would give a sales figure for the quarter. In the data warehouse, it would give a comparison with the last three quarters or against budgeted targets—you can do comparisons against multiple dimensions. “If I've got 100 users, 30 users would typically only do reporting. When reports are fired, the performance of the system deteriorates. When we move reporting from the transactional system into a data warehouse, the users will now fire reports from the warehouse taking the load off and thereby improving the performance of the transaction system. The average cost of license for SAP is, say, Rs. 1 lakh per user. That transaction which is happening across the SAP front with 30 users works out to Rs. 30 lakhs for just reporting users. If I move to a data warehouse for reporting, I save on that straight away,” he added.

Prabhu pointed out, “RoI in data warehousing is quick because that kind of decision support system has an immediate impact on your business and if you have a business which is large enough, the cost of a data warehouse is negligible. RoI is limited by time of investment/implementation. It takes 1.5 to 2 years for an organization to mature its data warehouse. Once you do that, the RoI is assured.”

Sybase's customers in India include telcos such as IDEA Cellular and Spice Telecom. ICICI Bank is also a customer. It also has some government customers including the Indian Railways.

“Earlier when MPP was the only option, it was an enterprise play. Today even smaller organizations are going in for a data warehouse. For a data warehouse, the entry point would be at least $100,000. In a blueprint scenario, it may not be an enterprise data warehouse in the true sense but you can put down a good analytical platform. Large organizations would end up spending at least $500,000 (BI tool, hardware, storage, services etc.),” concluded Prabhu.

Shankar said, “There are two types of benefits. Software benefits include improved information access, better insights and improved productivity. Earlier, if a person wanted a report, they would request the IT department to do the needful and IT would take four days to generate the report. Now you can do it yourself in 15 minutes. If you want to do a campaign in a short period of time, thanks to direct business analytics, the data is already 75% there. You can earn a realistic RoI in four to five months. The software benefits start accruing in six to eight months. Generally, RoI can be earned in an year in data warehousing implementations.”

Kumar said, “Typically Indian customers see RoI in two years.”

Jalan stated that data warehouse RoI depended on the maturity of an organization, adoption of analytics across the board, the ability to drive decisions from insights etc. “If these attributes of the customer are satisfied, then he can get a quick RoI of one or two years,” he said.

Ramaswamy commented, “A lot of the RoI in a data warehouse is intangible; hence, it is vital that the tangible benefits be identified, benchmarked and tracked. RoI should start accruing from the minute that business users start using the warehouse.”

Simha said, “It is anywhere from three months (for data mart apps) to around four years for a data warehouse.”

prashant.rao@expressindia.com

 


Untitled Document
Untitled Document

FEEDBACK: We would love to hear from you -- what you like about our content, what you dont, and even how you think we can improve. Please send your feedback to: prashant.rao@expressindia.com


© Copyright 2001: The Indian Express Limited. All rights reserved throughout the world. This entire site is compiled in Mumbai by the Business Publications Division (BPD) of The Indian Express Limited. Site managed by BPD.