Issue dated - 03rd May 2004

Managing archival data

With the glut of data in businesses, made worse by new laws that require e-mail communication to be stored for years, managing data isn't easy. Sudhakar Rao provides some solutions.

In today's business environment, information and data have become the most important corporate assets. This has given a huge impetus to the storage solutions market. However, with the exponential growth of data, it is becoming increasingly difficult to store and manage archival data. The problem is compounded because both structured and unstructured data have become integral to today's business.

Structured data is traditionally defined as data organised according to the well-defined structure that a database provides. Databases are growing so fast that their size is impeding application performance, stretching back-up windows and artificially inflating the total cost of operations.

The growth of unstructured data, however, has far surpassed that of structured data, largely because of the inherent nature of unstructured data. Unstructured data typically comprises documents, spreadsheets, graphics, images and various other formats, while messages and e-mail can be classified as semi-structured data. According to industry estimates, over 50 percent of the data residing in data centres falls into these categories.

One simple example that troubles almost everyone, including CEOs and CFOs, is the phenomenal growth of e-mail. Adding to the problem are new regulations that force corporations to retain their e-mail for a specified period and to produce it on demand. This is a difficult task, given that e-mail has routinely been spread throughout the IT infrastructure and regularly purged to limit the size of e-mail stores.

In the face of such a scenario, organisations have to turn to techniques such as Data Life Cycle Management. This involves managing all data that is considered a corporate asset by matching its availability and retrieval time to its value, which varies throughout the data lifecycle. By adopting Data Life Cycle Management techniques, organisations can improve the efficiency and responsiveness of the total storage environment and make optimal use of available capacity.
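The idea of matching retrieval time to data value can be illustrated with a minimal Python sketch. It assumes three hypothetical tiers, each described by a typical retrieval time and a relative cost; the tier names and numbers are placeholders, not figures from any vendor. Given the retrieval time the business will tolerate for a piece of data, the function picks the cheapest tier that still meets it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    retrieval_seconds: float  # typical time to get data back (assumed)
    relative_cost: float      # cost per unit of capacity, relative to other tiers (assumed)


# Illustrative tiers only; the numbers are placeholders, not vendor figures.
TIERS = [
    Tier("primary disk", 0.01, 10.0),
    Tier("secondary disk", 1.0, 3.0),
    Tier("tape library", 120.0, 1.0),
]


def cheapest_tier(max_retrieval_seconds: float) -> Tier:
    """Return the least expensive tier that still meets the required retrieval time."""
    eligible = [t for t in TIERS if t.retrieval_seconds <= max_retrieval_seconds]
    if not eligible:
        raise ValueError("no tier is fast enough for this requirement")
    return min(eligible, key=lambda t: t.relative_cost)


# As data ages and loses value, the tolerated retrieval time grows, so the same
# asset is matched to progressively cheaper tiers.
print(cheapest_tier(0.05).name)  # primary disk
print(cheapest_tier(5).name)     # secondary disk
print(cheapest_tier(3600).name)  # tape library
```

In practice the tolerated retrieval time would come from the business's archival policies rather than being passed in by hand.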

While it is fundamental that IT departments continue to ensure capacity requirements are met for critical applications, there is a further demand to manage digital assets more effectively by moving them to a different class of media based on their current value. The idea is to take advantage of waning requirements for retrieval time and availability by moving less valuable data, which is less likely to be accessed, to less expensive storage. Doing so requires more intelligent management of storage devices, so that data can be moved automatically within the overall storage environment from the time it is created until it expires.
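Automatic movement from creation to expiry can be sketched as a periodic policy sweep. The Python below is a minimal illustration under stated assumptions: a hypothetical policy that moves data down a tier at 90 and 365 days after creation and treats it as expired after seven years. The thresholds, tier names and the `sweep` function are invented for illustration; a real system would also copy the data between devices, whereas this sketch only updates an in-memory tier label.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical lifecycle policy: (minimum age since creation, tier).
# All thresholds are illustrative assumptions.
POLICY = [
    (timedelta(days=0), "primary"),
    (timedelta(days=90), "secondary"),
    (timedelta(days=365), "tertiary"),
]
EXPIRY = timedelta(days=7 * 365)  # assumed retention period


@dataclass
class Asset:
    name: str
    created: date
    tier: str = "primary"


def target_tier(asset: Asset, today: date) -> str:
    """Return the tier the policy prescribes for the asset's current age."""
    age = today - asset.created
    tier = POLICY[0][1]
    for min_age, name in POLICY:  # POLICY is ordered by increasing age
        if age >= min_age:
            tier = name
    return tier


def sweep(assets, today):
    """Plan migrations and expiries for one pass over the archive.

    Returns (moves, expired): moves is a list of (name, from_tier, to_tier),
    expired lists assets past the retention period. Only the in-memory tier
    label is updated here; no real data is copied or deleted.
    """
    moves, expired = [], []
    for asset in assets:
        if today - asset.created >= EXPIRY:
            expired.append(asset.name)
            continue
        new_tier = target_tier(asset, today)
        if new_tier != asset.tier:
            moves.append((asset.name, asset.tier, new_tier))
            asset.tier = new_tier
    return moves, expired


assets = [
    Asset("mail-2004-04", date(2004, 4, 20)),
    Asset("mail-2003-12", date(2003, 12, 1)),
    Asset("mail-2002-01", date(2002, 1, 10), "secondary"),
    Asset("mail-1996-01", date(1996, 1, 5), "tertiary"),
]
print(sweep(assets, date(2004, 5, 3)))
```

Running the sweep on a schedule, rather than moving data by hand, is what lets the storage environment follow each asset through its whole lifecycle.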

Further, because more and more of the information generated by business activities falls outside structured databases and their retrieval mechanisms, there is a growing need to catalogue, search and retrieve this unstructured information quickly within the storage environment itself. At the same time, solutions must encompass varying classes of storage devices and media arranged in tiers, in order to balance the cost of storing any particular data asset against its current value from the time of creation to end-of-life.
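Cataloguing unstructured information so that it can be searched without scanning every stored file is commonly done with an inverted index. The Python below is a toy sketch of that technique, not a description of any particular archive product: the `Catalogue` class and its methods are hypothetical, and real systems add stemming, ranking and support for many file formats.

```python
import re
from collections import defaultdict


class Catalogue:
    """A toy keyword catalogue (inverted index) for unstructured documents.

    Maps each word to the set of object IDs whose text contains it, so a
    search looks up a few index entries instead of reading every document.
    """

    def __init__(self):
        self._index = defaultdict(set)

    @staticmethod
    def _words(text):
        # Lower-case alphanumeric tokens; a deliberately simple tokenizer.
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def add(self, object_id, text):
        for word in self._words(text):
            self._index[word].add(object_id)

    def search(self, query):
        """Return the IDs of objects containing every word in the query."""
        words = self._words(query)
        if not words:
            return set()
        results = None
        for word in words:
            ids = self._index.get(word, set())  # .get avoids creating empty entries
            results = set(ids) if results is None else results & ids
        return results


cat = Catalogue()
cat.add("doc-1", "Q1 sales forecast spreadsheet")
cat.add("doc-2", "Sales presentation for Q2")
print(sorted(cat.search("sales")))     # ['doc-1', 'doc-2']
print(sorted(cat.search("Q1 sales")))  # ['doc-1']
```

The index returns object IDs only; locating the bytes on whichever tier currently holds them is a separate step.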

An archival platform should therefore combine intelligent storage with an open and collaborative approach to storage software. This combination can be applied most effectively using ISO's Reference Model for an Open Archival Information System (OAIS). OAIS is a proven foundation for archive systems, having served as the underpinning of some of the largest data archives in existence.

Following the OAIS guidelines, the storage environment should be able to deliver the functions of the OAIS model, which are:

  • Preservation planning: This involves understanding the business-specific issues related to data and how the value of that data varies over its useful lifetime. An appropriate mix of consulting and technology is required to draw up archival policies that form the basic framework for managing and retrieving archived data based on its value. This is the first step in implementing the OAIS model.
  • Produce: This function covers all data assets produced by any kind of industry or activity.
  • Ingest: Once data is produced, the ingest function prepares it for storage and management within the archive store. The actions in an ingest function include creating a digital signature that uniquely identifies the object, indexing it, and moving the metadata describing it into the metadata store. Metadata is information about the data that is used in populating, maintaining and accessing both the descriptive information that identifies the archive's holdings and the administrative data used to manage the archive.
  • Data Management: Once the metadata is developed, data management involves indexing the metadata so that it is searchable and can be retrieved whenever required. A link from the metadata store is used to determine where the data asset is maintained in the storage archive infrastructure.
  • Archival storage: This function stores, maintains and retrieves data, manages the storage hierarchy, including movement based on changes in data value, and provides disaster recovery capabilities. This function is further enhanced if the archival storage solution allows seamless data movement in a heterogeneous storage environment. Interoperability is a key aspect in the archival storage function.
  • Administration: Administration functions include configuration management of system hardware and software, system engineering functions to monitor and improve archive operations, updating of archival and HSM policies, and customer support. While routine administration functions are handled using various management tools, supplementing them with support services optimises the overall operation of the archive system.
  • Access control: This function helps consumers find information, limits access as required (for example, enforcing read-only access for mandated retention periods), and delivers query responses to consumers. It should also be combined with tamper-proofing, which can be achieved by locking disk volumes as 'read only', and it should keep data retention settings intact across the tiers of a storage archive.
  • Consume: Just as in the 'produce' function earlier in the model, consumption has to be tailored to the intended use of the data assets. Often this involves integrating the archival system with the application ordinarily used to access the data. While it is important to provide a general interface for archived data retrieval for auditors and administrators, real value is added by enabling the application and application user to continue working as always. Their standard application interface and access approach should not change whether data is in primary, secondary or tertiary storage.
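The ingest and data management functions described above can be sketched in a few lines of Python. This is an illustration under assumptions, not an OAIS implementation: the "digital signature" is modelled as a SHA-256 content hash used as a unique fingerprint (a production archive might use a true cryptographic signature), the metadata store and storage tiers are in-memory dictionaries, and the `Archive` class, its methods and the `tier/object_id` location format are all hypothetical.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class MetadataRecord:
    object_id: str     # SHA-256 of the content, used as a unique fingerprint
    descriptive: dict  # what the object is (title, author, ...)
    location: str      # link to where the bytes live in the archive
    administrative: dict = field(default_factory=dict)  # how the archive manages it


class Archive:
    """Hypothetical in-memory archive illustrating ingest and data management."""

    def __init__(self):
        self.metadata_store = {}  # object_id -> MetadataRecord
        self.field_index = {}     # (field, value) -> set of object_ids
        self.blobs = {}           # stand-in for the storage tiers: location -> bytes

    def ingest(self, content: bytes, descriptive: dict, tier: str = "primary") -> str:
        # 1. Fingerprint the object so it can be uniquely identified.
        object_id = hashlib.sha256(content).hexdigest()
        # 2. Store the bytes and remember where they went.
        location = f"{tier}/{object_id}"
        self.blobs[location] = content
        # 3. Move the metadata describing the object into the metadata store.
        record = MetadataRecord(object_id, descriptive, location, {"tier": tier})
        self.metadata_store[object_id] = record
        # 4. Index the descriptive metadata so it is searchable.
        for key, value in descriptive.items():
            self.field_index.setdefault((key, value), set()).add(object_id)
        return object_id

    def find(self, key, value):
        """Data management: look objects up by metadata and return their locations."""
        ids = self.field_index.get((key, value), set())
        return sorted(self.metadata_store[i].location for i in ids)

    def verify(self, object_id):
        """Recompute the fingerprint to check the stored bytes are unchanged."""
        record = self.metadata_store[object_id]
        return hashlib.sha256(self.blobs[record.location]).hexdigest() == object_id


archive = Archive()
oid = archive.ingest(b"quarterly report", {"author": "rao", "type": "report"})
print(archive.find("author", "rao"))  # ['primary/<sha-256 hex digest>']
print(archive.verify(oid))            # True
```

Because the identifier is derived from the content, re-computing it later also gives a simple check that archived data has not been altered.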
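The access-control requirement of enforcing read-only access for a mandated retention period can be illustrated with a write-once, retention-locked volume. The Python below is a hypothetical sketch: `RetentionLockedVolume` and `RetentionError` are invented names, and real products enforce this in the storage hardware or firmware rather than in application code.

```python
from datetime import date


class RetentionError(Exception):
    """Raised when an operation would violate a retention lock."""


class RetentionLockedVolume:
    """Hypothetical write-once volume with per-object retention dates.

    Objects can be written once and read any number of times; overwriting is
    always refused, and deletion is refused until the retention date passes.
    """

    def __init__(self):
        self._objects = {}  # name -> (content, retain_until)

    def write(self, name: str, content: bytes, retain_until: date) -> None:
        if name in self._objects:
            raise RetentionError(f"{name} already exists and is read-only")
        self._objects[name] = (content, retain_until)

    def read(self, name: str) -> bytes:
        return self._objects[name][0]

    def delete(self, name: str, today: date) -> None:
        _, retain_until = self._objects[name]
        if today < retain_until:
            raise RetentionError(f"{name} is retained until {retain_until}")
        del self._objects[name]

    def __contains__(self, name: str) -> bool:
        return name in self._objects


volume = RetentionLockedVolume()
volume.write("mail-0001", b"contract terms", retain_until=date(2011, 5, 3))
try:
    volume.delete("mail-0001", today=date(2004, 5, 3))
except RetentionError as err:
    print(err)  # mail-0001 is retained until 2011-05-03
```

Keeping the retention date with each object, rather than with the device, is one way to keep retention settings intact when data later moves to another tier.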
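The 'consume' principle, that the application's access approach should not change whether data sits in primary, secondary or tertiary storage, amounts to hiding the tier behind a single read interface. The Python below is a minimal sketch under assumptions: the `TieredStore` class, its in-memory tiers and its method names are hypothetical, and a real system would recall data from tape or optical media rather than from a dictionary.

```python
class TieredStore:
    """Hypothetical facade: applications call read(); the tier is hidden."""

    TIERS = ("primary", "secondary", "tertiary")

    def __init__(self):
        self._tiers = {tier: {} for tier in self.TIERS}  # tier -> {key: data}
        self._location = {}                              # key -> current tier

    def put(self, key: str, data: bytes, tier: str = "primary") -> None:
        self._tiers[tier][key] = data
        self._location[key] = tier

    def migrate(self, key: str, new_tier: str) -> None:
        """Move data between tiers without changing how it is read."""
        old_tier = self._location[key]
        self._tiers[new_tier][key] = self._tiers[old_tier].pop(key)
        self._location[key] = new_tier

    def read(self, key: str) -> bytes:
        """The only call the application needs, wherever the data lives."""
        return self._tiers[self._location[key]][key]

    def where(self, key: str) -> str:
        """General interface for auditors and administrators."""
        return self._location[key]


store = TieredStore()
store.put("mail-0001", b"hello")
store.migrate("mail-0001", "tertiary")
print(store.read("mail-0001"))   # b'hello'
print(store.where("mail-0001"))  # tertiary
```

The application code calling `read` is identical before and after the migration; only the administrative `where` call reveals that the data has moved.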

The archival storage environment should be built on an open, ISO-compliant architecture that implements Data Lifecycle Management as a complement to mainstream storage and business continuity practices. This openness allows enterprises to participate in an interoperable environment where the right data is always available at the right time, without the special-purpose storage management software and devices that other solutions require.

The author is director-technical consultant at Hitachi Data Systems and can be contacted at Sudhakar.Rao@hds.com.

© Copyright 2003: Indian Express Group (Mumbai, India). All rights reserved throughout the world. This entire site is compiled in Mumbai by The Business Publications Division of the Indian Express Group of Newspapers.