30 Minute Interview
SSDs, Unified Storage, ILM...looking beyond the buzzwords
Shailesh Agarwal, Country Manager Storage,
IBM Systems & Technology Group, IBM Global Services India Pvt Ltd talks
to Prashant L Rao about trends in enterprise storage
There's plenty of buzz around solid-state disks (SSDs).
Is this technology going to be used in enterprise storage arrays?
SSD is being used for caching. Large Internet companies use
it to cache frequently accessed information. The technology remains prohibitively
expensive in terabyte (TB) capacities. As far as reliability goes, it hasn't
been tested in a commercial environment. That said, the technology is moving
very fast. The biggest benefits of SSD are lower power consumption and lower space utilisation.
Who's the winner in the battle between iSCSI, fibre
channel (FC) and NAS?
Unification is the trend here, with a choice of protocols on
the same box. After all, why should the attach protocol be hard-linked to the
storage media? All disks are capable of being reallocated. This is easily done using a
router. We offer the N series (under the IBM NetApp alliance) unified storage
boxes that support native SAN, iSCSI and NAS. Every company has workloads that
are tuned to each of these protocols. For file sharing, NAS is best. Databases
work best with an FC SAN. For e-mail it's iSCSI. Unified storage supports all
three simultaneously. You can dynamically allocate [disks to each of these protocols].
SMBs can't afford to do only NAS or SAN. Even a low-end box lets you change
or reallocate. The way this works is that there's a multi-protocol router
between the fibre channel storage box and the user. It handles Ethernet, iSCSI
or FC requests.
Networking companies are talking about wide-area file systems.
What are your thoughts on that?
Unified storage has caching. That's one aspect of a wide-area file system.
In the US, LAN and WAN bandwidth is ubiquitous. In this scenario, where files
are commonly shared, you cache them. If you have two teams working on projects
in Chennai and Pune, you can put them on a shared box with a cache. We've
talked about WAFS to our customers but there's not a lot of excitement.
There seems to be a trend of storage boxes running on server
processors. Is this the case with IBM as well?
Our enterprise boxes run on the Power5+. The mid-range boxes use ASICs. The
appliances are based on Xeon processors and are not counted as part of our storage
line-up. These run Windows Storage Server.
We leveraged the Power5 and built storage around the processor. In 2001 we had
the ESS F20 built around the Power3 and 3+. In 2002 the ESS 800 was launched,
built around the Power4. In 2005 the DS8000 debuted, built around the
Power5, followed by the DS8000 Turbo, which uses the Power5+.
We keep thinking about how we can harness the processing power of the Power5/5+.
As of now the processors aren't fully utilised. Over time we want to put
some applications that are linked to storage in the storage box. Right now there's
replication and FlashCopy. Can you put backup and ILM in the box? After all, these
are data-based applications. To do so would make them server-independent. The
results of this strategy have been phenomenal in terms of SPC-1, Cache I/O and
SAP benchmarks.
What's your SMB focus?
70 percent of our sales are to SMBs. We have the N Series
of products for this segment. While every company has block, file and iSCSI
requirements, in a large enterprise a separate storage infrastructure exists
for each of these protocols. SMBs, on the other hand, may not be in a position
to pick and choose between SAN and NAS. All N series boxes come with fibre,
iSCSI and NAS (CIFS, NFS). It's a single box that satisfies the needs of
an SMB with multiple workloads. It's a software-based platform and can
therefore be upgraded without changing the hardware (for SAN you may need to
change the box). Compliance, DR and ILM are functions of the software. This
is application-aware storage. Most storage boxes only differentiate between
block and file. Here you have agents to recognise the application to which the
data belongs, starting with Microsoft Exchange and SQL Server. The agent should
be able to recognise an e-mail from Exchange. Backup and restore are faster
in this case as the process is granular. Snapshotting is undertaken at a suitable
frequency. If you realise that your Exchange box has been the victim of a virus
attack, you can roll back in a second. You will still lose some data but you're
up and running quickly with most of it.
x86 storage is an emerging category. Will this result in
storage becoming a commodity, as is the case with entry-level and to some extent
mid-range servers?
In the SMB segment or at the low end you won't get dedicated x86 storage.
Servers are an example of replacement infrastructure; storage happens to be
cumulative. Affordability is important here. We have a whole set of products
for the x86 environment, the DS 300/400 etc. Pluggability is more important
here than scalability. Start with two or four TB and plug in more boxes as you
grow and keep things simple. If there's a problem just ship the appliance
back. The cost goes down as cost is a function of configurability.
When you move up a level there's the DS 4000 range that starts with SATA.
Today a SATA box has 7200 RPM drives and a 4 Gbps fabric. Earlier SATA boxes
came with a one-year warranty. Now even the entry-level SATA box comes with
a three-year warranty. Moreover, these are all plug-and-play.
Are companies adopting a new direction when it comes to
DR?
Predictability, orchestration and monitoring are the watchwords: can I know
what my RTO/RPO is in real time? We are working with Sanovi on this front. Unified
DR (one DR across protocols) is a concept that is gaining ground.
Business continuity has entered the mainstream of enterprise computing. I don't
see any deal without BC being part of the sale.
We've heard a lot about ILM. What's the latest
on that front?
ILM is where BC was a year back. Everybody's talking about it, and there are
some pilots. It hasn't reached the inflection point yet. One approach to
ILM is to let a consultant categorise all your data application-wise. In the
short term this will be the way.
E-mail is one of the first data stores that are compliance-related. In the case
of ERP or core banking systems the data generation isn't quite as rapid,
so the pain hasn't been felt yet. Most ERP or database applications
come with support for selective archiving. Storage vendors provide cheap storage
that works well with the application. We have the DR 550 (large cheap storage,
software, tape library). E-mail is almost like a corporate virus. A 1.5 MB message
going from person to person soon becomes 5 GB. One way out is to establish quotas,
forcing users to either delete large messages or store them locally. Unfortunately,
under SOX or any similar local legislation, the responsibility for storing e-mail
data lies with the company and not the individual. This is a requirement for
e-mail that goes outside the company to customers, press, analysts etc. Companies
have to archive all external communications that have material importance, which
leads to the need for e-mail archival.
20 percent of e-mail contains attachments and uses 80 percent of available space.
There's a huge amount of duplication. De-duplication lets you store 1 GB
of messages in 10 percent of that space. The value proposition is from the compliance
and cost perspective. The technology exists in the N Series, with software called
SnapLock. You have an expiry mechanism that
lets you lock data for seven years and then delete it. You can't delete
or modify the data during that period. As of now this stuff is not there in
SAN arrays. WORM tape can be used to archive SAN data. Hospitals use DVD media
for archival.
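To make the de-duplication figure above concrete, here is a minimal sketch of single-instance storage using content hashing. It is an illustration only and not a description of how the N Series or SnapLock works; the function name `dedup_store` and the sample data (one 1.5 MB attachment forwarded to ten mailboxes) are assumptions chosen to reproduce the "10 percent" figure.

```python
import hashlib


def dedup_store(messages):
    """Store each distinct payload once, keyed by its SHA-256 digest.

    Hypothetical helper for illustration: returns the store and, for each
    message, the digest that references its single stored copy.
    """
    store = {}
    refs = []
    for payload in messages:
        digest = hashlib.sha256(payload).hexdigest()
        store.setdefault(digest, payload)  # keep only the first copy
        refs.append(digest)
    return store, refs


# Assumed sample: the same 1.5 MB attachment sitting in ten mailboxes.
attachment = b"x" * 1_500_000
mailboxes = [attachment] * 10

store, refs = dedup_store(mailboxes)
logical_bytes = sum(len(m) for m in mailboxes)        # what users see
physical_bytes = sum(len(p) for p in store.values())  # what is stored

print(f"logical: {logical_bytes} bytes, physical: {physical_bytes} bytes")
print(f"stored fraction: {physical_bytes / logical_bytes:.0%}")
```

Real mail stores de-duplicate at the attachment or block level and see far more varied data, so the achievable saving depends on how much duplication actually exists.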
There's a big consulting service around ILM. Right now it's application-specific:
e-mail to start with and then flat files and finally databases.
Is multimedia storage a big area for you?
There are many segments here, each with its own characteristics. In animation
you have DQ Entertainment and Crest. Most file systems are not designed to handle
large files. We have the General Parallel File System (GPFS) that introduces
a concept called byte-range locking. Three people can simultaneously edit different
portions of the same file.
TV channels are going in for Digital Asset Management (DAM). They need to repurpose
content. We work with Ardendo and DataForge. Here the final archival is onto
tape. All our tape products are fully certified for DAM.
Studios make FX-laden movies. We do not play a big role here right now. They
use special equipment with its own storage. As they hand over the IPR to their
clients, they tend to use cheap storage. A typical FX-heavy movie needs about
20 TB of storage. In this case the movie makers would use twenty 1 TB USB boxes.
The companies that sell the editing equipment also sell the storage.
Web portals, telcos and the Net in general are interactive
media. Video on demand (VOD) from VSNL, IPTV, DTH and delivery through IP
are the trends.
Digital Surveillance (DS) is an emerging area. With analogue tape it takes ages
to find what you want. Airports are large consumers of DS. Refineries, banks,
ATMs, large manufacturing plants... DS will become the norm in all these segments.
The first two, animation and TV, are active. In the case of studios,
we don't play much, and VOD and DS are upcoming segments.
With the cost per megabyte plummeting, would it be correct
to say that storage is effectively free?
Not all TB are equal. In a PC you have a 5400 rpm disk that may not even be
SATA. It may cost a cent per MB. For a 15000 rpm FC 150 GB disk the cost per
MB is double that. A storage customer isn't buying raw TB. In an enterprise
storage system, the HDD is but one component. For a 50 percent populated system,
the HDD cost isn't more than 25 percent. The interconnect, cache and IPR
account for the rest. In an enterprise system that can take 1,000 disks of 300
GB each you can have a maximum of 300 TB. If instead of a thousand 300 GB disks
you have a thousand 72 GB disks, the difference in cost is negligible. The
difference in cost of the system between these two configurations is about 10
to 12 percent but the cost per TB has gone up 4x. Fully populated with the maximum
disk capacity, enterprise systems cost about $20,000 per TB.
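The arithmetic behind the "4x" claim can be checked with a short sketch. The system price used here is an assumption derived from the figures quoted above ($20,000 per TB times 300 TB), and the 11 percent discount for the smaller disks is simply the midpoint of the quoted 10 to 12 percent range. Capacities use decimal units (1 TB = 1,000 GB).

```python
def cost_per_tb(system_cost, disks, disk_gb):
    """Cost per usable TB for a system with `disks` drives of `disk_gb` GB each."""
    capacity_tb = disks * disk_gb / 1000  # decimal TB, an assumption
    return system_cost / capacity_tb


# Assumed price of a fully populated system: $20,000/TB x 300 TB.
full_system_cost = 20_000 * 300

# Same chassis with 1,000 x 300 GB disks versus 1,000 x 72 GB disks.
large = cost_per_tb(full_system_cost, disks=1000, disk_gb=300)
# Assumed: the 72 GB configuration costs 11% less (midpoint of 10-12%).
small = cost_per_tb(full_system_cost * 0.89, disks=1000, disk_gb=72)

print(f"300 GB disks: ${large:,.0f} per TB")
print(f" 72 GB disks: ${small:,.0f} per TB")
print(f"ratio: {small / large:.2f}x")
```

Under these assumptions the ratio comes out a little under four, which is consistent with the rough "4x" quoted in the interview: capacity falls by a factor of about 4.2 while the system price falls by only about a tenth.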