www.expresscomputeronline.com WEEKLY INSIGHT FOR TECHNOLOGY PROFESSIONALS
18 August 2008  

Trend

Storage at Web 2.0 firms

Web 2.0 sites are extremely popular. Consequently, the data storage requirements of these companies are rising rapidly, writes Nivedan Prakash

Fueled by the explosive growth in digital media and user-generated content, the demand for storage has increased exponentially, placing significant stress on current ‘in-house’ storage architectures and forcing costly overcapacity build-outs. Factoring in time-to-market pressures as well as power, space, large capital expenditures, global performance, load balancing and availability issues, companies face mounting challenges and costs as the demand for storage surges.

Web 2.0 companies offer services and products that are data intensive. A scalable, robust data management architecture forms the foundation of the enhanced capabilities of Web 2.0 sites. There is a direct relationship between the burgeoning popularity of Web 2.0 companies and their storage requirements.

If a thousand users consume 1 GB of data, a hundred thousand users would end up consuming around 1 TB, roughly ten times the 100 GB that simple linear scaling would suggest. This is because social interaction rises significantly on a Web 2.0 site as it grows: users generate more content (UGC), including photos, videos and messages, and share it with their friends in a big way.
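The two figures quoted here imply faster-than-linear growth: scaling 1 GB per thousand users linearly would give 100 GB at a hundred thousand users, not 1 TB. As a rough sketch only, the gap can be made concrete by fitting a power law through the two data points. The power-law form, the function names and the choice of 1 TB = 1,000 GB are assumptions made for illustration; the article gives only the two figures.

```python
import math

# Two data points quoted in the article (taking 1 TB as 1,000 GB, an assumption).
USERS_SMALL, STORAGE_SMALL_GB = 1_000, 1.0
USERS_LARGE, STORAGE_LARGE_GB = 100_000, 1_000.0


def fit_exponent(u1, s1, u2, s2):
    """Exponent k of an assumed power law: storage = s1 * (users / u1) ** k."""
    return math.log(s2 / s1) / math.log(u2 / u1)


def estimate_storage_gb(users, exponent):
    """Hypothetical estimate of storage (in GB) for a given user count."""
    return STORAGE_SMALL_GB * (users / USERS_SMALL) ** exponent


k = fit_exponent(USERS_SMALL, STORAGE_SMALL_GB, USERS_LARGE, STORAGE_LARGE_GB)
linear_gb = estimate_storage_gb(USERS_LARGE, 1.0)     # what linear scaling predicts
superlinear_gb = estimate_storage_gb(USERS_LARGE, k)  # what the quoted figures imply

print(f"fitted exponent: {k:.2f}")
print(f"linear estimate: {linear_gb:.0f} GB, quoted figure: {superlinear_gb:.0f} GB")
```

An exponent above 1 means that each additional user adds more storage than the one before, which is the effect the article attributes to sharing.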

Another factor is that the data stored by these sites is dynamic: users constantly update it and add new material almost every day. This rules out archiving the data to tape or other long-term storage; it must be kept on nearline or even primary disk storage. The storage journey of a Web 2.0 company is a long one, characterized by dramatic growth, spikes in development and skyrocketing expectations. Often all of this takes place in an initial climate of fiscal restriction, until the company monetizes the service effectively. For this reason, such organizations are open to storage solutions that can grow, or be virtualized later, so that old assets are reused. Organizations seeking to serve this market are using technologies such as thin provisioning and storage virtualization to ensure that they can manage their growth effectively, whilst moving at “the speed of Web 2.0”.
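Thin provisioning, mentioned above, can be illustrated with a toy model: a volume advertises a large logical size but draws physical blocks from a shared pool only when data is actually written. The class names, block counts and pool capacity below are hypothetical, and the sketch ignores everything a real array does (snapshots, space reclamation, failure handling).

```python
class StoragePool:
    """A shared pool of physical blocks (hypothetical model)."""

    def __init__(self, physical_blocks):
        self.physical_blocks = physical_blocks
        self.used_blocks = 0

    def allocate(self):
        if self.used_blocks >= self.physical_blocks:
            raise RuntimeError("pool exhausted: add physical capacity")
        self.used_blocks += 1


class ThinVolume:
    """A volume whose logical size may exceed what is physically allocated."""

    def __init__(self, pool, logical_blocks):
        self.pool = pool
        self.logical_blocks = logical_blocks
        self.blocks = {}  # block index -> data, allocated lazily on first write

    def write(self, index, data):
        if not 0 <= index < self.logical_blocks:
            raise IndexError("write beyond logical volume size")
        if index not in self.blocks:
            self.pool.allocate()  # physical space is consumed only now
        self.blocks[index] = data

    def read(self, index):
        if not 0 <= index < self.logical_blocks:
            raise IndexError("read beyond logical volume size")
        return self.blocks.get(index, b"")  # unwritten blocks read as empty


pool = StoragePool(physical_blocks=100)
vol_a = ThinVolume(pool, logical_blocks=1_000)  # over-subscribed on purpose
vol_b = ThinVolume(pool, logical_blocks=1_000)
vol_a.write(0, b"photo")
vol_a.write(0, b"photo-v2")  # rewriting a block needs no new allocation
vol_b.write(7, b"video")
print(pool.used_blocks, "of", pool.physical_blocks, "physical blocks in use")
```

The two volumes together promise 2,000 blocks against 100 physical ones; capacity only has to be bought as the pool actually fills, which suits a company whose growth is hard to predict.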

Talking about the data storage requirements, Umesh Sharma, Product Head, authorSTREAM, said, “When a new user joins any Web 2.0 platform such as YouTube, Facebook or authorSTREAM, he does a lot of activities, both explicitly and implicitly. Like in YouTube, he may explicitly upload video, write a comment, share it with his friends, etc. and similarly an authorSTREAM user uploads and shares PowerPoint presentations. However, when he clicks on the related presentation link, he implicitly generates valuable information for authorSTREAM. Data generated by its users lets authorSTREAM make its related video algorithm stronger over time. So to provide relevant results, Web 2.0 products need to store each user’s activity details which is why the data storage requirements are growing at a rapid pace.”

The importance of storage

"Web 2.0 companies depend primarily on user-generated content; storage therefore is crucial in today’s world of mainly free Web services, and users aim to use additional functions such as uploading images, groups, videos or comments"

- Michael Brecht
CEO, ZaaBiz

"Quantum’s StorNext data management software enables systems to share a high-speed pool of images, media content, analytical data, and other key digital assets so that files can be processed and distributed quicker"

- Jim Simon
Director of Marketing – APAC, Quantum

UGC is vital for a Web 2.0 company. This includes explicitly created content uploaded by users themselves, such as text articles, videos, photos and PowerPoint presentations. It can also include implicitly generated content, that is, information gathered from users’ interaction with the site, such as ratings, reviews and comments.

Planning an efficient storage architecture becomes important in this context, as conventional file handling and database systems face scalability issues when handling large volumes of frequently changing data.

Web 2.0 applications like Google AdSense, Flickr, Napster, Wikipedia and blogging have dramatically expanded the sheer number of content creators and contributors to the Web. There is a much higher degree of openness and collaboration in Web 2.0, which has naturally created a huge demand for storage.

“Companies based on a Web 2.0 model depend mainly on UGC. Storage therefore is crucial in today’s world of mainly free Web services. Users aim to use additional functions such as uploading images, groups, videos or comments. Therefore, any service provider of Web 2.0 services needs to prepare for a constant increase of disk space. A lack of storage space would be interpreted as user-unfriendly,” added Michael Brecht, CEO, ZaaBiz.

To take an example, Amazon has developed massive databases of anonymous user data to understand how users interact with its site. It uses your purchase history and compares it to purchases made by other users with similar interests to make personalized recommendations like “customers who bought this item also bought...”
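The “customers who bought this item also bought” idea can be sketched with simple co-occurrence counts over past orders. This is an illustrative toy, not Amazon's actual method, which the article does not describe; the order data and function names are made up.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_co_purchase_counts(orders):
    """Count, for each item, how often every other item appears in the same order."""
    counts = defaultdict(Counter)
    for order in orders:
        for item, other in permutations(set(order), 2):
            counts[item][other] += 1
    return counts


def also_bought(counts, item, top_n=3):
    """Items most often bought together with `item`, most frequent first."""
    return [other for other, _ in counts[item].most_common(top_n)]


# Hypothetical, anonymised purchase histories (one list per order).
orders = [
    ["camera", "memory card", "tripod"],
    ["camera", "memory card"],
    ["camera", "camera bag"],
    ["memory card", "card reader"],
]
counts = build_co_purchase_counts(orders)
print(also_bought(counts, "camera", top_n=1))
```

Even this toy shows why such features drive storage: the counts can only be built if every order is retained, and the table of pairs grows with both the catalogue and the user base.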

Web 2.0 sites and platforms reuse their data to provide relevant information according to users’ interests. So it is not only data storage that matters but also data retrieval: the most appropriate data must be easily accessible as and when required.

Commenting on the importance of storage, Nikhil Soman, Chief Technology Officer, BigAdda, said, “Web 2.0 is all about UGC. It is about sharing one’s personal experience through photos, videos, audio amongst others with friends and family.”

Vivekanand Venugopal, Director, Products and Solutions, APAC, Hitachi Data Systems, explained, “Computer power is cheaper and more plentiful. Anyone can purchase a 4-way quad-core server with hundreds of gigabytes of internal storage and start providing Web 2.0 services. However, as Web 2.0 companies grow in popularity, they quickly become victims of their own success and run out of storage space. As their site or service becomes successful, it will attract more users, which in turn demands even greater storage space. Before they know it, Web 2.0 companies quickly outgrow their entire storage infrastructure. For that reason, before selling storage to a Web 2.0 company, you must take several factors into consideration, including scalability, data mobility, security and disaster recovery or business continuity.”

Virtualization of storage solutions

Virtualization helps in improving asset utilization and driving costs down. It is important for Web 2.0 companies to keep storage costs down since the nature of their business is data intensive. A virtualized environment helps in achieving this objective.

A virtualized environment insulates applications from the physical environment, which lets the operations team manage resources effectively. For example, if an application needs more processing power than storage, the environment can be configured with high-end servers and low disk capacity, or vice versa.

Storage infrastructure

"As Web 2.0 companies grow in popularity, they quickly become victims of their own success and run out of storage space. As their Web site or service becomes more successful, it will attract more users, which in turn necessitates greater storage space"

- Vivekanand Venugopal
Director, Products and Solutions, APAC Hitachi Data Systems

"We currently use Zabbix to monitor the operating system’s health and other aspects of the file system. Our administrators have also created a few scripts to show the load and throughput of all our servers"

- Nikhil Soman
Chief Technology Officer,
BigAdda

Web 2.0 companies have robust storage infrastructure in place. BigAdda uses a distributed file system called MogileFS that works with low-cost hardware and provides administrators with a command line tool to manage and monitor various aspects of the file system. This includes rebalancing, file system checks, creating new classes, etc.

Soman said, “We currently use Zabbix to monitor the operating system’s health and other aspects of the file system. Our administrators have also created a few scripts to show the load and throughput of all our servers. We have not invested a substantial amount in storage since we are using off-the-shelf servers. Our distributed file system provides us with the required performance and redundancy.”

Sharma pointed out, “We keep user information and user-enhanced content (generated by side effects) in our database servers. Moreover, we store all flat file data (video files, presentations and other rich media formats) on Amazon’s S3 servers. At the same time I think, for Web 2.0 startups, outsourcing is an affordable solution to handle such large amounts of data.”

On the other hand, Ibibo has designed and architected its storage infrastructure based on various factors, including the type of data (text or media file), serving needs (how frequently will the file be accessed and how will it be accessed), and rate of growth of data. The company has created different storage units based on these requirements. In most cases, a storage unit consists of a set of normal Intel boxes with high capacity disks. The architecture is such that it gives the impression of a single storage space to the applications using it and scales horizontally. Ibibo is using a distributed architecture to manage its storage systems.
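A “single storage space” that scales horizontally needs some rule for deciding which box holds which file. The article does not say how Ibibo does this; one common technique, shown here purely as an assumption, is rendezvous (highest-random-weight) hashing, in which adding a node relocates only a fraction of the files. Node names and file keys are hypothetical.

```python
import hashlib


def _score(node, key):
    """Deterministic pseudo-random weight for a (node, key) pair."""
    digest = hashlib.sha256(f"{node}:{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def pick_node(nodes, key):
    """Rendezvous hashing: the node with the highest score owns the key."""
    return max(nodes, key=lambda node: _score(node, key))


nodes = ["box-1", "box-2", "box-3"]
keys = [f"photo-{i}.jpg" for i in range(1000)]

before = {k: pick_node(nodes, k) for k in keys}
after = {k: pick_node(nodes + ["box-4"], k) for k in keys}  # scale out by one box

moved = [k for k in keys if before[k] != after[k]]
# Files that move all go to the new box; nothing is shuffled between old boxes.
print(len(moved), "files moved;",
      "all to box-4:", all(after[k] == "box-4" for k in moved))
```

Applications only ever call `pick_node`, so they see one storage space regardless of how many boxes sit behind it.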

Highlighting ZaaBiz’s storage infrastructure, Brecht explained, “Currently the service works on the basis of several high-end servers. Our future investments are conditioned by the requirements that we will face in terms of increasing number of ZaaBiz members, new functionalities offered or possible diversifications implemented.”

Meanwhile, storage solution providers are offering products that simplify the management of multiple petabytes of data at an affordable cost. For example, Hitachi Data Systems offers Service Oriented Storage Solutions (SOSS), a business-centric approach to aligning IT storage requirements with the constantly changing business needs of Web 2.0 companies. SOSS applies the concept of service-oriented architecture (SOA) to storage to deliver organizational flexibility and the ability to leverage existing storage infrastructure. SOSS enables Web 2.0 companies to reduce IT costs, storage complexity, data risk, and over-subscription of storage resources, while increasing operational efficiency and complying with corporate governance requirements.

Jim Simon, Director of Marketing-APAC, Quantum, commented, “Quantum’s StorNext data management software enables customers to generate revenue faster and store more data at a lower cost. Combining high-speed data sharing with cost-effective content retention, StorNext helps customers build an infrastructure for consolidating resources so that workflow operations run faster and maintaining business assets costs less. StorNext enables systems to share a high-speed pool of images, media content, analytical data, and other key digital assets so files can be processed and distributed quicker. Even in heterogeneous environments, all files are easily accessible to all hosts—SAN or LAN.”

Tiered storage

Many Web 2.0 companies are now open to the concept of tiered storage in order to ensure data mobility. In tiered storage, data is stored on different types of storage media in order to reduce total storage cost. The levels of protection needed, performance requirements, frequency of use, and other considerations determine the categories. In most cases, the tiers are delineated by price and performance.

Tiered storage is common in many Web 2.0 companies. The need for it, however, depends on the type of application, and a single application can have several kinds of storage needs. Some of them are:

  • Passive storage: This serves functions such as backup, storing data crawled from the Web, and keeping historical data for statistical analysis. In such cases, file serving speed is not of much significance, so these functions can use storage mechanisms like tape drives, disk array clusters or normal disk drives instead of expensive ones like a SAN.
  • Active storage: This serves applications such as photo and video storage or a central user database. In such cases, serving speed, along with storage capacity, becomes the prime requirement.

"We keep user information and user-enhanced content in our database servers. In addition, we store all flat file data on Amazon’s S3 servers. For Web 2.0 startups, outsourcing is the affordable solution to handle such large amounts of data"

- Umesh Sharma
Product Head, authorSTREAM

"We have planned our storage architecture in such a way that it can scale horizontally. We add new clusters as soon as there is a requirement. A storage cluster could consist of three or four high disk capacity boxes or a simple SAN"

- Rajesh Warrier
Vice-president, Search and Platforms, Ibibo

Speaking about tiered storage, Surajit Sen, National Sales Manager, NetApp India, explained, “The tiered storage architecture is relevant to Web 2.0 companies, as this offers a scalable and cost-effective architecture, without compromising on the performance of an application. The ideology behind this architecture is that the most performance intensive data resides on the high performance higher cost storage and the less performance intensive data on low cost storage with built-in intelligence which allows policy creation to migrate data between the different tiers.”
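A tier-migration policy of the kind Sen describes can be sketched as a rule that maps how recently and how often a file is accessed to a tier. The tier names, thresholds and record fields here are hypothetical and are not taken from NetApp's products.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    name: str
    days_since_last_access: int
    reads_per_day: float
    tier: str = "fast"  # assume new uploads land on the fastest tier


def target_tier(stats, hot_reads_per_day=10.0, cold_after_days=90):
    """Pick a tier from access patterns (illustrative thresholds)."""
    if stats.reads_per_day >= hot_reads_per_day:
        return "fast"      # high-performance, higher-cost storage
    if stats.days_since_last_access >= cold_after_days:
        return "archive"   # cheapest tier for rarely touched data
    return "capacity"      # low-cost disk for everything in between


def apply_policy(files):
    """Return (name, old_tier, new_tier) for each file that should migrate."""
    migrations = []
    for f in files:
        new_tier = target_tier(f)
        if new_tier != f.tier:
            migrations.append((f.name, f.tier, new_tier))
            f.tier = new_tier
    return migrations


files = [
    FileStats("profile-photo.jpg", days_since_last_access=0, reads_per_day=250.0),
    FileStats("old-video.flv", days_since_last_access=200, reads_per_day=0.01),
    FileStats("last-month.ppt", days_since_last_access=20, reads_per_day=0.5),
]
for move in apply_policy(files):
    print(move)
```

Running the policy periodically keeps only the actively served content on expensive storage, which matches the price-and-performance delineation described earlier in this section.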

Coping with storage requirements

In order to manage their spiraling and erratic storage requirements, Web 2.0 companies are using many advanced technologies that help keep costs down and simplify management, such as thin provisioning, de-duplication, RAID 6 and shared storage architectures.
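De-duplication, one of the techniques listed above, can be illustrated with content hashing: identical uploads are stored once and referenced many times. This is a minimal in-memory sketch with hypothetical names; production systems typically de-duplicate at block or chunk level and persist their indexes.

```python
import hashlib


class DedupStore:
    """Stores each distinct blob once, keyed by its SHA-256 digest."""

    def __init__(self):
        self.blobs = {}  # digest -> bytes
        self.files = {}  # file name -> digest

    def put(self, name, data):
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)  # keep only the first copy
        self.files[name] = digest
        return digest

    def get(self, name):
        return self.blobs[self.files[name]]

    def logical_bytes(self):
        """Bytes the users believe they have stored."""
        return sum(len(self.blobs[d]) for d in self.files.values())

    def physical_bytes(self):
        """Bytes actually held after de-duplication."""
        return sum(len(b) for b in self.blobs.values())


store = DedupStore()
clip = b"x" * 1_000_000               # stand-in for a popular video
store.put("alice/funny.flv", clip)
store.put("bob/reshared.flv", clip)   # same bytes uploaded again
store.put("carol/slides.ppt", b"y" * 200_000)
print(store.logical_bytes(), "logical bytes,", store.physical_bytes(), "physical bytes")
```

On a site where users re-share the same photos and videos, the gap between the two numbers is the saving.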

For example, authorSTREAM handles this problem by outsourcing its storage infrastructure to Amazon’s S3 service. This helps the company in two ways: first, it can scale up rapidly on S3’s large infrastructure; second, S3’s pay-per-use model keeps the cost variable for the company.

Rajesh Warrier, Vice-president, Search and Platforms, Ibibo, said, “The architecture of our storage requirements has been planned such that it can scale horizontally. We add new clusters as soon as there is a requirement. A storage cluster could be three or four high disk capacity boxes or a simple SAN.”

Another company in this space, BigAdda, sees it as a simple matter of plug and play: the company plugs new servers in, runs a few migration scripts and the storage cloud assimilates the servers. The entire process takes the company just about 15 minutes.

Brecht commented, “From the beginning we have been prepared for growing storage requirements in terms of technical equipment, therefore we do not have any problems with storage at this moment and we do not foresee any coming up in the future.”

The bottom line is that Web 2.0 companies must take a new approach to storage. Companies need to move from the outdated storage 1.0 model of ‘do everything yourself’ to a new storage 2.0 model. The storage 2.0 model delivers persistent storage on demand to applications regardless of location or pre-defined boundaries, and meets the performance and scalability requirements of these applications.

nivedan.prakash@expressindia.com

 


© Copyright 2001: Indian Express Newspapers (Mumbai) Limited (Mumbai, India). All rights reserved throughout the world. This entire site is compiled in Mumbai by the Business Publications Division (BPD) of the Indian Express Newspapers (Mumbai) Limited. Site managed by BPD.