Managing archival data
With the glut of data in businesses, which has only worsened because of new laws
that require e-mail communication to be stored for years, managing data isn't
easy. Sudhakar Rao provides some solutions
In today's business environment, information and data have become the most
important corporate assets. This has given a huge impetus to the storage solutions
market. However, with the exponential growth of data, it is becoming increasingly
difficult to store and manage archival data. The problem is compounded
because both structured and unstructured data have become an integral part of
today's business.
Structured data is traditionally defined as data organised within the
well-defined structure that databases provide. Databases
are growing so fast that their size is impeding application performance, stretching
back-up windows and artificially inflating the total cost of operations.
Unstructured data, however, has grown far faster than structured data,
largely because of its inherent nature. Unstructured data
typically comprises documents,
spreadsheets, graphics, images and various other formats. Going further, messages
and e-mail can be classified as semi-structured data. According to industry
estimates, over 50 percent of the data residing in data centres falls into these
categories.
One simple example that is troubling almost everyone, including
CEOs and CFOs, is the phenomenal growth of e-mail. Adding to the problem are
new regulations that force corporations to retain their e-mails for a
specified period of time, to be produced on demand. This is a difficult task,
given that e-mails have routinely been spread throughout the IT infrastructure and
subjected to regular purging to limit the size of e-mail stores.
Faced with such a scenario, organisations have to resort to techniques such
as Data Life Cycle Management. This means effectively managing all the
data that is considered a corporate asset by matching availability and
retrieval time with the data's value, which varies throughout the data
lifecycle. By adopting Data Life Cycle Management techniques, organisations can
improve the efficiency and responsiveness of the total storage environment and
utilise available capacity optimally.
While it is fundamental that IT departments continue to ensure capacity requirements
are met for critical applications, there is a further demand to manage digital
assets more effectively by moving them to a different class of media based on
their current value. The idea is to take advantage of waning requirements for
retrieval time and availability by moving less valuable data that is less likely
to be accessed to less expensive storage. Doing so necessitates greater intelligence for
managing storage devices and automatically moving data within the overall storage
environment from the time it is created until its expiry.
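The tiering idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tier names, the idle-time thresholds and the use of days since last access as a stand-in for a data asset's current value are all hypothetical, and a real policy would come out of a business-specific assessment of data value.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical tiers, ordered from most to least expensive storage.
# Each entry is (tier name, maximum idle days allowed in that tier).
TIERS = [
    ("primary", 30),     # accessed within the last 30 days
    ("secondary", 365),  # accessed within the last year
]
FALLBACK_TIER = "tertiary"  # everything older, until expiry


@dataclass
class DataAsset:
    name: str
    last_accessed: date
    expires: date


def choose_tier(asset: DataAsset, today: date) -> str:
    """Pick a storage tier, using idle time as an assumed proxy for value."""
    if today >= asset.expires:
        return "expired"  # end of life: eligible for deletion
    idle_days = (today - asset.last_accessed).days
    for tier, max_idle_days in TIERS:
        if idle_days <= max_idle_days:
            return tier
    return FALLBACK_TIER
```

Run periodically over the catalogue of assets, a function like this gives the storage environment the information it needs to move data automatically from creation to expiry.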
Further, more and more of the information generated by business activities
lies outside the bounds of structured data and its retrieval mechanisms. This
makes it all the more necessary to build the ability to quickly catalogue, search
and retrieve this unstructured information into the storage environment itself. At the same time,
solutions must encompass varying classes of storage devices and media arranged
in tiers in order to balance the cost of storing any particular data asset with
its current value from the time of creation to end-of-life.
Therefore, the archival platform should ideally combine intelligent
storage with an open and collaborative approach to storage software. This combination
can be most effectively used with ISO's Reference Model for Open Archival
Information Systems (OAIS). OAIS is a proven foundation for archive systems,
having served as the underpinnings of some of the largest data archives in existence.
Following the OAIS foundation guidelines, the storage environment should be
able to deliver the various functions in the OAIS model, which are:
- Preservation planning: This involves understanding
the business-specific issues related to data and how the value of that data varies
over its useful lifetime. An appropriate mix of consulting and technology
is required to draw up archival policies that form the basic framework for
managing and retrieving archived data based on its value. This is the first
step in implementing the OAIS model.
- Produce: This function involves handling
all data assets produced by any manner of industry or activity.
- Ingest: Once the data has been produced, the ingest
function prepares it for storage and management
within the archive store. The actions in an ingest function include the creation
of a digital signature for uniquely identifying the object, indexing it, and
moving the metadata describing it into the metadata store. Metadata is information
about the data that is used in populating, maintaining and accessing both
the descriptive information that identifies the archive's holdings and
the administrative data used to manage the archive.
- Data Management: Once the metadata is developed,
data management involves indexing the metadata so that it is searchable
and can be retrieved whenever required. A link from the metadata store is
used to determine where the data asset is maintained in the storage archive
infrastructure.
- Archival storage: This function stores, maintains
and retrieves data, manages the storage hierarchy, including movement based
on changes in data value, and provides disaster recovery capabilities. This
function is further enhanced if the archival storage solution allows seamless
data movement in a heterogeneous storage environment. Interoperability is
a key aspect in the archival storage function.
- Administration: Administration functions include
configuration management of system hardware, software and system engineering
functions to monitor and improve archive operations, updating of archival
and hierarchical storage management (HSM) policies, and customer support. While
routine administration functions
are handled using various management tools, using services support optimises
the overall operations of the archive system.
- Access control: This function helps consumers find
information, limits access as required (for example, enforcing read-only access
for mandated retention periods), and delivers query responses to consumers.
It should also be combined with tamper-proof functionality, which can be
achieved by locking disc volumes as read-only. Further, it should
help keep the data retention setting intact within the tiers of a storage
archive.
- Consume: Just as in the produce function
earlier in the model, consumption has to be tailored to the intended use of
the data assets. Often this involves integrating the archival system with the
application ordinarily used to access the data. While it is important to provide
a general interface for archived data retrieval for auditors and administrators,
real value is added by enabling the application and application user to continue
working as always. Their standard application interface and access approach
should not change whether data is in primary, secondary or tertiary storage.
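To make the ingest, data management and access control functions listed above more concrete, here is a minimal in-memory sketch in Python. It is not an OAIS implementation. The `Archive` class, its field names and the keyword index are hypothetical, and a SHA-256 content digest stands in for the "digital signature" that uniquely identifies each object.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Archive:
    objects: dict = field(default_factory=dict)   # object id -> content (archival storage)
    metadata: dict = field(default_factory=dict)  # object id -> metadata record
    index: dict = field(default_factory=dict)     # keyword -> set of object ids

    def ingest(self, content: bytes, description: str, retain_until: date) -> str:
        """Identify the object, store it, and record and index its metadata."""
        # Assumption: a content digest serves as the unique identifier.
        object_id = hashlib.sha256(content).hexdigest()
        self.objects[object_id] = content
        self.metadata[object_id] = {
            "description": description,
            "retain_until": retain_until,
            "location": "primary",  # link to where the asset currently lives
        }
        for word in description.lower().split():
            self.index.setdefault(word, set()).add(object_id)
        return object_id

    def search(self, keyword: str) -> list:
        """Data management: find object ids through the metadata index."""
        return sorted(self.index.get(keyword.lower(), set()))

    def delete(self, object_id: str, today: date) -> None:
        """Access control: refuse deletion during the retention period."""
        if today < self.metadata[object_id]["retain_until"]:
            raise PermissionError("retention period has not ended")
        del self.objects[object_id]
        del self.metadata[object_id]
        for ids in self.index.values():
            ids.discard(object_id)
```

In this sketch, an e-mail ingested with a `retain_until` date several years ahead can be found through `search` at any time, but `delete` raises `PermissionError` until that date has passed, which mirrors the read-only enforcement for mandated retention periods.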
The archival storage architecture should be based on an open, ISO-compliant
architecture that implements Data Lifecycle Management as a complement to mainstream
storage and business continuity practices. This openness allows enterprises
to participate in an interoperable environment where the right data is always
available at the right time, and there is no need for the special-purpose storage
management software and devices used by other solutions.
The author is director-technical consultant at Hitachi
Data Systems and can be contacted at Sudhakar.Rao@hds.com