Trend
Storage at Web 2.0 firms
Web 2.0 sites are extremely popular. Consequently, the data storage requirements of these companies are rising rapidly, writes Nivedan Prakash
Fueled by the explosive growth in digital media and user-generated content, the demand for storage has increased exponentially, placing significant stress on existing in-house storage architectures and forcing costly overcapacity build-outs.
Once time-to-market pressures, power and space constraints, large capital expenditures, and global performance, load-balancing and availability issues are factored in, companies face mounting challenges and costs to go with the surging demand for storage.
Web 2.0 companies offer services and products that are data intensive. A scalable, robust data management architecture forms the foundation of the enhanced capabilities of Web 2.0 sites. There is a direct relationship between the burgeoning popularity of Web 2.0 companies and their storage requirements.
If a thousand users consume 1 GB of data, a hundred thousand users would end up consuming around 1 TB. This is because, on a Web 2.0 site, the social context grows significantly along with user-generated content (UGC), such as photos, videos and messages, which users share extensively with their friends.
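The figures above imply that data grows faster than the user base: linear growth would give only 100 GB for 100,000 users, not 1 TB. One way to quantify this, under the purely illustrative assumption that data volume follows a power law in the number of users (a model the article does not state), is to solve for the exponent that fits both data points:

```python
import math

def implied_growth_exponent(users_a, data_a_gb, users_b, data_b_gb):
    """Exponent k such that data ~ users**k passes through both points.

    Assumes a simple power-law model purely for illustration.
    """
    return math.log(data_b_gb / data_a_gb) / math.log(users_b / users_a)

# Figures from the text, taking 1 TB as 1,000 GB.
k = implied_growth_exponent(1_000, 1.0, 100_000, 1_000.0)
linear_estimate_gb = 1.0 * (100_000 / 1_000)  # what linear growth would predict

print(f"linear estimate: {linear_estimate_gb:.0f} GB")  # linear estimate: 100 GB
print(f"implied exponent: {k:.2f}")                     # implied exponent: 1.50
```

Under this model, an exponent of 1.5 means each tenfold increase in users multiplies storage by roughly 32 (10^1.5 is about 31.6), so planning capacity from a fixed per-user average would underestimate demand.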
Another factor is that the data stored by these sites is dynamic and constantly updated by users, who also add new material almost every day. This means that archiving the data onto tape or other long-term storage is ruled out; it must be kept on near-line or even primary disk storage. The storage journey of a Web 2.0 company is a long one, characterized by dramatic growth, spikes in development and skyrocketing expectations. Often all of this takes place in an initial climate of fiscal restraint, until the company monetizes the service effectively. For this reason, such organizations are open to storage solutions that can grow, or be virtualized later so that old assets are reused. Organizations seeking to serve this market are using technologies such as thin provisioning and storage virtualization to ensure that they can manage their growth effectively, whilst moving at the speed of Web 2.0.
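Thin provisioning lets a young company promise large volumes to its applications while buying physical disks only as data actually arrives. The toy model below illustrates the idea; the class and method names are hypothetical and do not correspond to any vendor's API, and "blocks" stand in for whatever allocation unit a real array uses.

```python
class ThinPool:
    """Toy thin-provisioned pool: volumes advertise a logical size, but
    physical blocks are taken from the shared pool only when first written."""

    def __init__(self, physical_blocks):
        self.physical_blocks = physical_blocks
        self.used_blocks = 0
        self.volumes = {}  # name -> (logical size in blocks, set of written blocks)

    def create_volume(self, name, logical_blocks):
        # No physical space is reserved at creation time.
        self.volumes[name] = (logical_blocks, set())

    def write(self, name, block):
        logical_blocks, written = self.volumes[name]
        if not 0 <= block < logical_blocks:
            raise IndexError("block outside the volume's logical size")
        if block in written:
            return  # overwriting an already allocated block needs no new space
        if self.used_blocks >= self.physical_blocks:
            raise RuntimeError("pool exhausted: add physical capacity")
        written.add(block)
        self.used_blocks += 1

    def overcommit_ratio(self):
        """Logical capacity promised to volumes divided by physical capacity."""
        logical = sum(size for size, _ in self.volumes.values())
        return logical / self.physical_blocks

pool = ThinPool(physical_blocks=100)
pool.create_volume("photos", logical_blocks=500)
pool.create_volume("videos", logical_blocks=500)
for b in range(30):
    pool.write("photos", b)
print(pool.used_blocks, pool.overcommit_ratio())  # 30 10.0
```

The pool has promised ten times more capacity than it owns; the operator's job is to add disks before `used_blocks` reaches `physical_blocks`.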
Talking about data storage requirements, Umesh Sharma, Product Head, authorSTREAM, said, "When a new user joins any Web 2.0 platform such as YouTube, Facebook or authorSTREAM, he performs a lot of activities, both explicitly and implicitly. On YouTube, for example, he may explicitly upload a video, write a comment or share it with his friends; similarly, an authorSTREAM user uploads and shares PowerPoint presentations. However, when he clicks on a related presentation link, he implicitly generates valuable information for authorSTREAM. Data generated by its users lets authorSTREAM make its related video algorithm stronger over time. So, to provide relevant results, Web 2.0 products need to store each user's activity details, which is why data storage requirements are growing at a rapid pace."
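authorSTREAM's actual schema is not described in the article. The sketch below, with hypothetical field and action names, shows the general pattern: every explicit action (an upload) and every implicit one (a click on a related item) becomes a small record appended to a log, so storage grows with activity and not just with uploads.

```python
import io
import json
import time

def log_event(sink, user_id, action, item_id, explicit):
    """Append one activity record as a JSON line and return its size in bytes."""
    record = {
        "ts": int(time.time()),
        "user": user_id,
        "action": action,      # hypothetical names: "upload", "click_related", ...
        "item": item_id,
        "explicit": explicit,  # False for implicit signals such as clicks
    }
    line = json.dumps(record, separators=(",", ":")) + "\n"
    sink.write(line)
    return len(line.encode("utf-8"))

log = io.StringIO()  # stands in for an append-only log file or service
total_bytes = log_event(log, "u42", "upload", "deck-1", explicit=True)
for related in ("deck-7", "deck-9", "deck-3"):
    total_bytes += log_event(log, "u42", "click_related", related, explicit=False)

records = [json.loads(line) for line in log.getvalue().splitlines()]
implicit = sum(not r["explicit"] for r in records)
print(len(records), implicit, total_bytes)
```

Even in this tiny session, implicit records outnumber explicit ones three to one, which is the effect Sharma describes.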
The importance of storage
UGC is vital for a Web 2.0 company. It includes explicitly created content uploaded by users themselves, such as text articles, videos, photos and PowerPoint presentations. It can also include implicitly generated content: information gathered from users' interaction with the site, such as ratings, reviews and comments.
Planning an efficient storage architecture becomes important in this context, as conventional file systems and databases face scalability issues when handling large volumes of frequently changing data.
Web 2.0 applications such as Google AdSense, Flickr, Napster, Wikipedia and blogs have dramatically expanded the sheer number of content creators and contributors on the Web. There is a much higher degree of openness and collaboration in Web 2.0, and this has naturally created a huge demand for storage.
"Companies based on a Web 2.0 model depend mainly on UGC. Storage therefore is crucial in today's world of mainly free Web services. Users aim to use additional functions such as uploading images, groups, videos or comments. Therefore, any provider of Web 2.0 services needs to prepare for a constant increase in disk space. A lack of storage space would be interpreted as user-unfriendly," said Michael Brecht, CEO, ZaaBiz.
To take an example, Amazon has developed massive databases of anonymous user data to understand how users interact with its site. It compares your purchase history with purchases made by other users with similar interests to make personalized recommendations such as "Customers who bought this item also bought..."
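The article does not describe how Amazon computes these recommendations. As a generic illustration only, the sketch below counts how often items appear together in the same customer's history (simple item-to-item co-occurrence) using made-up data; it is not Amazon's algorithm.

```python
from collections import Counter, defaultdict
from itertools import permutations

def build_co_purchases(histories):
    """For each item, count the other items found in the same customers' histories."""
    co = defaultdict(Counter)
    for items in histories.values():
        for a, b in permutations(set(items), 2):  # every ordered pair of distinct items
            co[a][b] += 1
    return co

def also_bought(co, item, k=3):
    """Up to k items most often bought by customers who bought `item`."""
    return [other for other, _ in co.get(item, Counter()).most_common(k)]

histories = {  # made-up purchase histories
    "alice": ["book", "lamp", "pen"],
    "bob": ["book", "pen"],
    "carol": ["book", "mug"],
    "dave": ["lamp", "mug"],
}
co = build_co_purchases(histories)
print(also_bought(co, "book", k=1))  # ['pen']
```

The counts table itself is data that must be stored and kept up to date as histories grow, which is why recommendations add to storage demand rather than just consuming existing data.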
Web 2.0 sites and platforms reuse their data to provide information relevant to each user's interests, so data retrieval matters as much as data storage: the most appropriate data must be easily accessible as and when required.
Commenting on the importance of storage, Nikhil Soman, Chief Technology Officer, BigAdda, said, "Web 2.0 is all about UGC. It is about sharing one's personal experiences through photos, videos and audio, amongst other things, with friends and family."
Vivekanand Venugopal, Director, Products and Solutions, APAC, Hitachi Data Systems, explained, "Computing power is cheaper and more plentiful. Anyone can purchase a 4-way quad-core server with hundreds of gigabytes of internal storage and start providing Web 2.0 services. However, as Web 2.0 companies grow in popularity, they quickly become victims of their own success and run out of storage space. As their site or service becomes successful, it will attract more users, which in turn demands even greater storage space. Before they know it, Web 2.0 companies outgrow their entire storage infrastructure. For that reason, before selling storage to a Web 2.0 company, you must take several factors into consideration, including scalability, data mobility, security, and disaster recovery or business continuity."
Virtualization helps improve asset utilization and drive costs down. It is important for Web 2.0 companies to keep storage costs down, since the nature of their business is data intensive, and a virtualized environment helps achieve this objective.
A virtualized environment isolates applications from the physical environment, which gives the operations team the ability to manage resources effectively. For example, if an application needs more processing power than storage, the environment can be configured with high-end servers and low disk capacity, or vice versa.
Storage infrastructure
Web 2.0 companies have put robust storage infrastructure in place. BigAdda, for example, uses a distributed file system called MogileFS that works with low-cost hardware and provides administrators with a command-line tool to manage and monitor various aspects of the file system, including rebalancing, file system checks and creating new classes.
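In MogileFS, a class determines how many copies of a file are kept on distinct devices. The sketch below mimics only that placement idea in a few lines: it is not MogileFS code, and the class names, device map and `place_replicas` function are hypothetical.

```python
import random

# Hypothetical classes mapping a class name to the number of copies to keep on
# separate hosts (loosely modeled on a MogileFS class's minimum device count).
CLASSES = {"thumbnails": 2, "originals": 3}

def place_replicas(devices, file_class, rng=random):
    """Pick one device on each of N distinct hosts for a new file.

    `devices` maps device id -> host name; N comes from CLASSES.
    """
    copies = CLASSES[file_class]
    by_host = {}
    for dev, host in devices.items():
        by_host.setdefault(host, []).append(dev)
    if len(by_host) < copies:
        raise ValueError("not enough hosts to keep every copy on a separate machine")
    hosts = rng.sample(sorted(by_host), copies)  # distinct hosts
    return [rng.choice(by_host[h]) for h in hosts]

devices = {"dev1": "hostA", "dev2": "hostA", "dev3": "hostB", "dev4": "hostC"}
chosen = place_replicas(devices, "originals", random.Random(7))
print(chosen)
```

Keeping copies on different hosts is what lets cheap, off-the-shelf servers provide redundancy: losing one machine never removes every copy of a file.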
Soman said, "We currently use Zabbix to monitor the operating system's health and other aspects of the file system. Our administrators have also created a few scripts to show the load and throughput of all our servers. We have not invested a substantial amount in storage, since we are using off-the-shelf servers. Our distributed file system provides us with the required performance and redundancy."
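BigAdda's own scripts are not shown in the article. As an illustration of the kind of throughput script described, the hypothetical sketch below derives per-device read and write throughput on Linux from two snapshots of `/proc/diskstats`, using its sectors-read and sectors-written columns (the 6th and 10th whitespace-separated columns), which the kernel counts in 512-byte units. Zabbix is not involved.

```python
SECTOR_BYTES = 512  # /proc/diskstats counts sectors in 512-byte units

def parse_diskstats(text):
    """Return {device: (sectors_read, sectors_written)} from /proc/diskstats text."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 10:
            continue  # skip blank or truncated lines
        stats[fields[2]] = (int(fields[5]), int(fields[9]))
    return stats

def throughput(before, after, interval_s):
    """Read/write throughput in MB/s per device between two snapshots."""
    result = {}
    for dev, (r1, w1) in after.items():
        if dev not in before:
            continue  # device appeared between snapshots
        r0, w0 = before[dev]
        read_mb_s = (r1 - r0) * SECTOR_BYTES / 1e6 / interval_s
        write_mb_s = (w1 - w0) * SECTOR_BYTES / 1e6 / interval_s
        result[dev] = (read_mb_s, write_mb_s)
    return result

# Two made-up snapshots of one disk, taken 10 seconds apart.
t0 = "   8       0 sda 1000 0 200000 500 800 0 400000 900 0 1000 1400"
t1 = "   8       0 sda 1100 0 300000 550 900 0 600000 990 0 1100 1540"
print(throughput(parse_diskstats(t0), parse_diskstats(t1), 10.0))
```

On a live server the two strings would come from reading `/proc/diskstats` twice with a sleep in between.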
Sharma pointed out, "We keep user information and user-enhanced content (generated by side effects) in our database servers. Moreover, we store all flat-file data (video files, presentations and other rich media formats) on Amazon's S3 servers. At the same time, I think that for Web 2.0 startups, outsourcing is an affordable solution for handling such large amounts of data."
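The split Sharma describes, with small structured records in a database and bulky files in an object store, can be sketched as follows. The table layout and key scheme are assumptions, and an in-memory dictionary stands in for S3 so the example runs without credentials; with S3, `put` and `get` would map to the SDK's object upload and download calls.

```python
import hashlib
import sqlite3

class InMemoryObjectStore:
    """Stand-in for an object store such as S3: opaque blobs addressed by key."""
    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE presentations (
    id INTEGER PRIMARY KEY, owner TEXT, title TEXT, blob_key TEXT, size INTEGER)""")
store = InMemoryObjectStore()

def upload(owner, title, data):
    """Store the file in the object store and only a pointer to it in the database."""
    blob_key = "presentations/" + hashlib.sha256(data).hexdigest()  # hypothetical key scheme
    store.put(blob_key, data)
    cur = db.execute(
        "INSERT INTO presentations (owner, title, blob_key, size) VALUES (?, ?, ?, ?)",
        (owner, title, blob_key, len(data)),
    )
    db.commit()
    return cur.lastrowid

def download(pres_id):
    row = db.execute("SELECT blob_key FROM presentations WHERE id = ?", (pres_id,)).fetchone()
    return store.get(row[0])

pid = upload("u42", "Quarterly review", b"%PPT-bytes...")
print(download(pid) == b"%PPT-bytes...")  # True
```

Keeping the database small makes it cheap to query and back up, while the object store absorbs the growth in rich media.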
Ibibo, on the other hand, has designed and architected its storage infrastructure based on various factors, including the type of data (text or media file), serving needs (how frequently the file will be accessed, and how) and the rate of growth of data. The company has created different storage units based on these requirements. In most cases, a storage unit consists of a set of ordinary Intel boxes with high-capacity disks. The architecture presents a single storage space to the applications using it and scales horizontally. Ibibo is using a distributed architecture to manage its storage systems.
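The article does not say how Ibibo maps data onto its storage units. Consistent hashing is one technique many distributed stores use to present a single namespace across units that scale horizontally; the sketch below is a swapped-in illustration of that technique, with hypothetical unit and key names, not Ibibo's design.

```python
import bisect
import hashlib

class HashRing:
    """Consistent-hash ring that maps object keys to storage units."""

    def __init__(self, units=(), vnodes=100):
        self.vnodes = vnodes   # points per unit; more points give a more even spread
        self._ring = []        # sorted list of (position, unit)
        for unit in units:
            self.add(unit)

    @staticmethod
    def _position(value):
        # First 8 bytes of an MD5 digest as a ring position (not used for security).
        return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")

    def add(self, unit):
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._position(f"{unit}#{i}"), unit))

    def lookup(self, key):
        if not self._ring:
            raise LookupError("no storage units on the ring")
        # A key belongs to the first unit point at or after its position (wrapping).
        idx = bisect.bisect_left(self._ring, (self._position(key),))
        return self._ring[idx % len(self._ring)][1]

ring = HashRing(["unit-1", "unit-2", "unit-3"])
keys = [f"photo-{n}.jpg" for n in range(10_000)]
before = {k: ring.lookup(k) for k in keys}
ring.add("unit-4")  # scale out by one unit
moved = [k for k in keys if ring.lookup(k) != before[k]]
print(f"{len(moved) / len(keys):.0%} of keys moved")
```

Applications only call `lookup`, so they see one storage space. When a unit is added, only the keys that now fall just before one of its points move, all of them to the new unit; with four units that is typically around a quarter of the keys rather than a full reshuffle.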
Describing ZaaBiz's storage infrastructure, Brecht explained, "Currently the service works on the basis of several high-end servers. Our future investments are conditioned by the requirements that we will face in terms of an increasing number of ZaaBiz members, new functionalities offered or possible diversifications implemented."
Meanwhile, storage solution providers are offering solutions that simplify the management of multiple petabytes of data at an affordable cost. For example, Hitachi Data Systems offers Service Oriented Storage Solutions (SOSS), a business-centric approach to aligning IT storage requirements with the constantly changing business needs of Web 2.0 companies. SOSS applies the concept of service-oriented architecture (SOA) to storage to deliver organizational flexibility and the ability to stimulate existing storage infrastructure. SOSS enables Web 2.0 companies to reduce IT costs, storage complexity, data risk and over-subscription of storage resources, while increasing operational efficiency and complying with corporate governance requirements.
Jim Simon, Director of Marketing-APAC, Quantum, commented, "Quantum's StorNext data management software enables customers to generate revenue faster and store more data at a lower cost. Combining high-speed data sharing with cost-effective content retention, StorNext helps customers build an infrastructure for consolidating resources so that workflow operations run faster and maintaining business assets costs less. StorNext enables systems to share a high-speed pool of images, media content, analytical data and other key digital assets so that files can be processed and distributed more quickly. Even in heterogeneous environments, all files are easily accessible to all hosts, whether on a SAN or a LAN."
Tiered storage
Many Web 2.0 companies are now open to the concept of tiered storage as a way to ensure data mobility. In tiered storage, data is stored on different types of storage media in order to reduce total storage cost. The level of protection needed, performance requirements, frequency of use and other considerations determine which category, or tier, each type of data belongs to. In most cases, tiers are delineated by price and performance.
Tiered storage is common among Web 2.0 companies, but how it is used depends on the applications involved, since an application can have different types of storage needs. Some of them are:
- Passive storage: Functions such as backup, storing data crawled from the Web and keeping historical data for statistical analysis fall into this category. In such cases, file-serving speed is not of much significance, so these functions can use storage mechanisms such as tape drives, disk array clusters or normal disk drives instead of more expensive options such as a SAN.
- Active storage: This covers applications such as photo and video storage or a central user database. In such cases, serving speed along with storage capacity becomes the prime requirement.
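The passive/active split above can be expressed as a simple placement rule. The sketch below is illustrative only: the `Dataset` fields and the read-rate threshold are made-up values, and a real policy would also weigh protection level, growth rate and cost per GB.

```python
from dataclasses import dataclass

@dataclass
class Dataset:
    name: str
    reads_per_day: float      # how often the data is served
    needs_fast_serving: bool  # e.g. user-facing photos or a central user database

def choose_tier(ds, hot_threshold=1000.0):
    """Map a dataset to 'active' or 'passive' storage (threshold is hypothetical)."""
    if ds.needs_fast_serving or ds.reads_per_day >= hot_threshold:
        return "active"   # fast disks or SAN
    return "passive"      # tape, disk-array clusters or plain disks

workloads = [
    Dataset("user-photos", 50_000, True),
    Dataset("central-user-db", 200_000, True),
    Dataset("web-crawl-archive", 3, False),
    Dataset("nightly-backups", 0.1, False),
]
for ds in workloads:
    print(ds.name, "->", choose_tier(ds))
```

Here the photos and user database land on active storage, while the crawl archive and backups go to cheaper passive media.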
Speaking about tiered storage, Surajit Sen, National Sales Manager, NetApp India, explained, "The tiered storage architecture is relevant to Web 2.0 companies, as it offers a scalable and cost-effective architecture without compromising on the performance of an application. The idea behind this architecture is that the most performance-intensive data resides on high-performance, higher-cost storage and the less performance-intensive data on low-cost storage, with built-in intelligence that allows policies to be created to migrate data between the different tiers."
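A migration policy of the kind Sen describes can be as simple as "demote data that has gone cold, promote data that has become hot". The sketch below shows such a policy under stated assumptions: the tier names, metadata fields and thresholds are hypothetical, not NetApp defaults, and the function only plans moves rather than performing them.

```python
DAY = 86400  # seconds

def plan_migrations(files, now, demote_after_days=30, promote_after_hits=100):
    """Return (file, from_tier, to_tier) moves under a simple age/hit policy.

    `files` maps name -> {"tier": ..., "last_access": epoch seconds, "recent_hits": int}.
    """
    moves = []
    for name, meta in files.items():
        idle_days = (now - meta["last_access"]) / DAY
        if meta["tier"] == "fast" and idle_days > demote_after_days:
            moves.append((name, "fast", "capacity"))   # cold data to cheap storage
        elif meta["tier"] == "capacity" and meta["recent_hits"] >= promote_after_hits:
            moves.append((name, "capacity", "fast"))   # hot data to fast storage
    return moves

now = 1_700_000_000  # fixed timestamp so the example is repeatable
files = {
    "old-video.mp4": {"tier": "fast", "last_access": now - 90 * DAY, "recent_hits": 0},
    "viral-clip.mp4": {"tier": "capacity", "last_access": now - 1 * DAY, "recent_hits": 5000},
    "profile.jpg": {"tier": "fast", "last_access": now - 2 * DAY, "recent_hits": 40},
}
for move in plan_migrations(files, now):
    print(move)
```

The old video is demoted, the suddenly popular clip is promoted, and the recently used profile photo stays where it is.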
Coping with storage requirements
In order to manage their spiraling and erratic storage requirements, Web 2.0 companies are using advanced technologies such as thin provisioning, de-duplication, RAID 6 and shared storage architectures, which help keep costs down and simplify management.
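De-duplication saves space by storing identical pieces of data only once. The sketch below uses fixed-size chunks identified by their SHA-256 hash; this is a simplification chosen for brevity, as many products use variable, content-defined chunking instead, and the `DedupStore` class is hypothetical.

```python
import hashlib

class DedupStore:
    """Fixed-size-chunk de-duplication: identical chunks are stored only once."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.chunks = {}  # SHA-256 hex digest -> chunk bytes
        self.files = {}   # file name -> list of chunk digests (its "recipe")

    def put(self, name, data):
        recipe = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # store the chunk only if unseen
            recipe.append(digest)
        self.files[name] = recipe

    def get(self, name):
        return b"".join(self.chunks[d] for d in self.files[name])

    def stored_bytes(self):
        return sum(len(c) for c in self.chunks.values())

store = DedupStore(chunk_size=4)
store.put("a.txt", b"AAAABBBBCCCC")
store.put("b.txt", b"AAAABBBBDDDD")  # shares its first two chunks with a.txt
print(store.stored_bytes())  # 16 (versus 24 logical bytes)
```

For UGC sites, where the same photo or video is often uploaded many times, the saving can be substantial, at the cost of hashing every chunk on write.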
For example, authorSTREAM handles this problem by outsourcing its storage infrastructure to Amazon's S3 service. This helps the company in two ways: first, it can scale up rapidly on S3's large infrastructure; second, S3's pay-per-use model keeps the company's costs variable.
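The effect of a pay-per-use model can be shown with simple arithmetic. The usage figures and the price per GB-month below are hypothetical, not real S3 rates, and the same price is assumed for both options so that the only difference is paying for idle capacity.

```python
def pay_per_use_cost(monthly_gb, price_per_gb_month):
    """Bill only for the gigabytes actually stored in each month."""
    return sum(gb * price_per_gb_month for gb in monthly_gb)

def upfront_capacity_cost(monthly_gb, price_per_gb_month, headroom=1.5):
    """Pay every month for capacity sized to the peak plus a safety margin."""
    capacity_gb = max(monthly_gb) * headroom
    return capacity_gb * price_per_gb_month * len(monthly_gb)

usage = [100, 150, 225, 340]  # hypothetical GB stored in months 1-4
price = 0.10                  # hypothetical price per GB-month
print(f"pay-per-use: {pay_per_use_cost(usage, price):.2f}")       # pay-per-use: 81.50
print(f"up-front:    {upfront_capacity_cost(usage, price):.2f}")  # up-front:    204.00
```

With pay-per-use, spending tracks actual usage month by month, which suits a startup that has not yet monetized its service.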
Rajesh Warrier, Vice-president, Search and Platforms, Ibibo, said, "The architecture of our storage requirements has been planned such that it can scale horizontally. We add new clusters as soon as there is a requirement. A storage cluster could be three or four high-disk-capacity boxes or a simple SAN."
BigAdda, meanwhile, sees it as a simple matter of plug and play: the company plugs in new servers, runs a few migration scripts and the storage cloud assimilates the servers. The entire process takes about 15 minutes.
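BigAdda's migration scripts are not published. As a hypothetical illustration of what such a script might compute, the sketch below plans a greedy rebalance: it repeatedly takes the largest file that still fits from the fullest existing server until the new, empty server reaches the average load. It only plans moves; copying and deleting files is left out.

```python
def plan_rebalance(servers, new_server):
    """Plan moves that fill a newly added, empty server up to the average load.

    `servers` maps server name -> {file name: size in bytes}.
    Returns (file, from_server, to_server) tuples; nothing is actually moved.
    """
    layout = {s: dict(files) for s, files in servers.items()}
    layout[new_server] = {}

    def load(server):
        return sum(layout[server].values())

    target = sum(load(s) for s in layout) / len(layout)
    moves = []
    while True:
        donor = max((s for s in layout if s != new_server), key=load)
        room = target - load(new_server)
        fits = [f for f, size in layout[donor].items() if 0 < size <= room]
        if not fits:
            break  # the fullest server has nothing that fits the remaining room
        f = max(fits, key=layout[donor].get)  # largest file that fits
        layout[new_server][f] = layout[donor].pop(f)
        moves.append((f, donor, new_server))
    return moves

servers = {"s1": {"a": 40, "b": 30, "c": 10}, "s2": {"d": 50, "e": 10, "f": 10}}
print(plan_rebalance(servers, "s3"))  # [('a', 's1', 's3'), ('e', 's2', 's3')]
```

After the two moves, the servers hold 40, 60 and 50 units respectively, instead of 80, 70 and 0.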
Brecht commented, "From the beginning we have been prepared for growing storage requirements in terms of technical equipment; therefore, we do not have any problems with storage at this moment and we do not foresee any coming up in the future."
The bottom line is that Web 2.0 companies must take a new approach to storage. Companies need to move from the old, outdated "storage 1.0" model of doing everything yourself to a new "storage 2.0" model. A storage 2.0 model delivers persistent storage on demand to applications, regardless of location or predefined boundaries, and meets the performance and scalability requirements of those applications.
nivedan.prakash@expressindia.com