
www.expresscomputeronline.com WEEKLY INSIGHT FOR TECHNOLOGY PROFESSIONALS
21 January 2008  

Lead

Managing unstructured data

The exponential growth of unstructured data and the need for regulatory compliance have made the management of unstructured information a critical imperative in the enterprise.
By Vinita Gupta

Managing unstructured data is a problem that has been around for as long as people have been using computers. For most of that time, organizations were not even aware that they had a problem on hand. Today, however, it has become a pressing concern because of the rapid growth of unstructured and semi-structured information in the enterprise: unstructured data is now four times larger in volume than structured data. Yet while the ability to exploit unstructured data has become a competitive differentiator, little effort is being spent on managing and analyzing it.

US federal regulations such as Sarbanes-Oxley and HIPAA have made organizations responsible for their mammoth, scattered piles of information, a gigantic task that cannot be accomplished easily. Before looking at the challenges companies face in managing unstructured data, it is important to understand what exactly unstructured data is, where it comes from and where it is stored.

Defining unstructured data

Wikipedia defines unstructured data (or unstructured information) as masses of (usually) computerized information that either have no data structure or have one that is not easily readable by a machine. Examples of unstructured data include audio, video and unstructured text such as the body of an e-mail message or a word processor document. Data with some form of structure may also be referred to as unstructured if the structure does not help in processing it. For example, an HTML web page is highly structured, but this structure is often oriented towards formatting rather than towards performing more complex tasks with the contents of the page.

A great deal of confusion surrounds the subject of data structures. There is general agreement that database information is structured, because the data and its associated metadata are tightly coupled. A simple way to determine whether data is structured is to ask whether it can be sorted. If the answer is “yes”, the data is structured.

Substantial portions of data can be classified as semi-structured. How would you differentiate semi-structured data from unstructured data? It is generally believed that e-mail is semi-structured while videos, pictures, audio files and medical images are unstructured. Word processing documents and presentations are usually considered unstructured, but they are actually semi-structured. A second test therefore applies: if the data can be searched using standard tools like Google, it is either structured or semi-structured.
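The sort/search rule of thumb described above can be expressed as a small decision function. This is only a sketch of the article's heuristic, not a standard classification scheme; the `DataSource` type, its fields and the True/False flags on the example sources are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class DataSource:
    name: str
    sortable: bool    # can the records be sorted on well-defined fields?
    searchable: bool  # can standard search tools index and query it?


def classify(source: DataSource) -> str:
    """Apply the rule of thumb: sortable data is structured,
    searchable-but-not-sortable data is semi-structured, the rest is unstructured."""
    if source.sortable:
        return "structured"
    if source.searchable:
        return "semi-structured"
    return "unstructured"


# Illustrative (hypothetical) sources; the flags are assumptions, not measurements.
examples = [
    DataSource("customer table in a database", sortable=True, searchable=True),
    DataSource("e-mail archive", sortable=False, searchable=True),
    DataSource("scanned medical images", sortable=False, searchable=False),
]

for src in examples:
    print(f"{src.name}: {classify(src)}")
```

Sortability is checked first because, under this heuristic, anything sortable is structured regardless of whether it is also searchable.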

A clear understanding of the different types of data helps in managing each type appropriately. As a substantial portion of data in the public domain is either unstructured or semi-structured, special analytical tools are needed to analyze it, understand it and add intelligence to it. Nevertheless, in the IT world today, semi-structured and unstructured data are generally grouped together as unstructured.

A few challenges in managing unstructured data
  • How and where to store large volumes of unstructured data, and the associated hardware, management overhead and other storage costs.
  • How to manage the retention of unstructured data and the archival policies for it.
  • How to secure unstructured data and ensure consistent security.
  • Finally, how to ensure the availability and recoverability of unstructured data.
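The retention and archival challenge in the list above usually comes down to a policy that maps each category of content to how long it stays online and how long it stays in an archive. The sketch below shows one age-based way to apply such a policy; the categories, the retention periods and the action labels are hypothetical, and a real policy would be set by the business and its compliance requirements.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical retention rules per content category:
# (how long to keep an item online, how long to keep it in the archive afterwards).
RETENTION_RULES: dict[str, tuple[timedelta, timedelta]] = {
    "email": (timedelta(days=365), timedelta(days=7 * 365)),
    "presentation": (timedelta(days=180), timedelta(days=3 * 365)),
}


@dataclass
class Item:
    path: str
    category: str
    last_modified: date


def retention_action(item: Item, today: date) -> str:
    """Decide what to do with an item based on its category and age."""
    rule = RETENTION_RULES.get(item.category)
    if rule is None:
        # No policy for this category yet: flag it rather than guess.
        return "unclassified: review manually"
    online, archived = rule
    age = today - item.last_modified
    if age <= online:
        return "keep online"
    if age <= online + archived:
        return "move to archive"
    # Past both periods; a person should confirm before anything is disposed of.
    return "review for disposal"


# Hypothetical items checked against the policy.
today = date(2008, 1, 21)
items = [
    Item("mail/2007/june.pst", "email", date(2007, 6, 1)),
    Item("mail/2005/jan.pst", "email", date(2005, 1, 1)),
    Item("scans/old-ledger.tif", "image", date(2003, 3, 15)),
]
for it in items:
    print(it.path, "->", retention_action(it, today))
```

Unknown categories are flagged instead of deleted, since with unstructured data the first problem is often that nobody has classified it yet.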

A difficult objective

Every organization processes a large number of electronic transactions, and as technology makes inroads into every field, the resulting data explosion is plain to see. Simply moving to a centralized architecture will not yield results unless it is backed by proper data management through the establishment of a data warehouse.

T G Dhandapani, Corporate CIO, SCL-TVS Group said, “Nowadays, more importance is being attached to understanding, analyzing and adding intelligence to unstructured data. This is because valuable information is available in unstructured form on Web sites, blogs, mails, minutes, notes, etc., for an organization about its products and services. Merrill Lynch estimates that more than 85% of business information exists as unstructured data. The challenge is to understand where such data exists and the availability of an appropriate tool to convert unstructured data into meaningful information.”

“Unstructured data mostly comes from sources like e-mail messages. Sometimes customers complain that online transactions are not reflected in their accounts in case of transactions routed through different delivery channels,” stated Babu V, Nodal officer-CashTree/BANCS group of ATMs with eFunds International. He pointed out that companies typically run a number of information systems, which adds to the challenge, particularly in organizations with legacy systems. Converting these systems to suit current needs through data enrichment is an even bigger challenge that some banks face today.

Sanjay Sharma, CTO, IDBI Bank, said, “Apart from core applications, there is a large volume of unstructured data which is critical for businesses as a great deal of processing is done in the form of e-mail messages.” Extracting the required data from this unstructured mass at the right time is difficult, so it is crucial for organizations to have clear policies and discipline for storing data.

Data management

The approach to managing unstructured data may differ from one institution to another, depending on the nature of the business and the access controls put in place at the time of customization. The three R’s of data management are making the Right information available to the Right people at the Right time.

Acknowledging that most organizations struggle with data and knowledge stored in unstructured form, Vijay Sethi, VP-Information Systems, Hero Honda, stated, “Unstructured data management in organizations is mostly done by individuals themselves. They put data in folders and files as per their convenience and then spend a fair amount of time searching for the right data. Common techniques for structuring text usually involve manual tagging with metadata for further text mining-based structuring.”

Hero Honda’s basic solution at present is to store data in directory formats and structures that give it some meaning. On the systems front, the company microfilms and archives data and files and indexes them for easier search and retrieval. It has also considered some knowledge management solutions.
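Indexing files for search, combined with the manual metadata tagging Sethi mentions, is commonly done with an inverted index that maps each word to the documents containing it. The sketch below is a generic, minimal version of that idea and is not a description of Hero Honda’s actual system; the `DocumentIndex` class, the document paths and the tags are hypothetical.

```python
import re
from collections import defaultdict
from typing import Optional


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class DocumentIndex:
    """Minimal inverted index: term -> set of document IDs, plus metadata tags per document."""

    def __init__(self) -> None:
        self._postings: defaultdict[str, set[str]] = defaultdict(set)
        self._tags: dict[str, set[str]] = {}

    def add(self, doc_id: str, text: str, tags: set[str]) -> None:
        """Record the document's (manually assigned) tags and index every term in its text."""
        self._tags[doc_id] = {t.lower() for t in tags}
        for term in tokenize(text):
            self._postings[term].add(doc_id)

    def search(self, query: str, tag: Optional[str] = None) -> set[str]:
        """Return IDs of documents containing every query term, optionally filtered by a tag."""
        terms = tokenize(query)
        if not terms:
            return set()
        # .get() avoids inserting empty entries into the defaultdict for unknown terms.
        results = set.intersection(*(self._postings.get(t, set()) for t in terms))
        if tag is not None:
            results = {d for d in results if tag.lower() in self._tags[d]}
        return results


# Hypothetical documents with manually assigned tags.
index = DocumentIndex()
index.add("minutes/2007-12-board.doc", "Board meeting minutes: dealer network review", {"minutes"})
index.add("mail/complaint-0412.txt", "Customer complaint about delayed service at a dealer", {"email", "service"})

print(index.search("dealer"))               # both documents match
print(index.search("dealer", tag="email"))  # only the e-mail matches
```

Query terms are combined with AND, and tags act as a filter on top of the text match, which mirrors how metadata tagging narrows a full-text search.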

Babu said, “In our organization the data which has originated from transactions at the data center is mostly in a structured format and the same is kept in a centralized place for access by the employees. Much importance is not given at present to e-mail activity by employees, even though as per the Sarbanes-Oxley Act the top management should be aware and is responsible for all the activities of the employees.” He asserted that better data management will help financial institutions improve their efficiency. The availability of correct data has assumed great importance now that Basel II implementation is knocking at the doors of financial institutions. This is a major challenge for some public sector banks holding legacy data that is of little relevance in the current scenario. Some banks have started data warehousing initiatives to keep their data in one place and in a structured format.

Data categories

Attribute                       | Structured | Semi-structured | Unstructured
Sorting feasibility             | Possible   | Not possible    | Not possible
Searching with a standard tool  | Possible   | Possible        | Not possible
Sensing                         | Possible   | Possible        | Possible

“Data management is not an IT job. It is the top management who should take the initiative to formulate a suitable policy taking into account the nature of the business and use technology to make it available as per the policy,” added Babu.

IDBI Bank has a document management system in place that handles data in Word documents and images, and the bank is now looking at solutions that make unstructured data easier to retrieve. Sharma said, “If cost prevails, instead of upgrading the existing solutions, we will migrate to a new solution.”

Dhandapani said that the major problem in managing unstructured data has been the limited availability, or outright absence, of tools to extract data from the public domain systematically and continuously. Tools to analyze data available in various languages are also lacking. Consequently, what is known is known, and what is not picked up remains under the carpet.

In the near future, awareness of the need for correct data in a structured format, mostly online, will grow as organizations address compliance issues.

vinita.gupta@expressindia.com

 


© Copyright 2001: Indian Express Newspapers (Mumbai) Limited (Mumbai, India). All rights reserved throughout the world. This entire site is compiled in Mumbai by the Business Publications Division (BPD) of the Indian Express Newspapers (Mumbai) Limited. Site managed by BPD.