A supercomputer for every business
Dr. Reza Rooholamini, Director of Enterprise Solutions, Dell
Product Group, believes that supercomputing is no longer the preserve of academics
and large businesses. In conversation with Prashant L. Rao.
Dell isn't the first name that comes to mind when you think
of supercomputing. What's your role in this area?
According to IDC, the global market for HPC in 2009 was about
$2.5 billion. By 2014 it is expected to be about $3.4 billion. These figures are
for clustered HPC. We have about a 30% share of this market.
We have been a player in HPC all along. We established the very first relatively
large supercomputer using cluster technology. It was Dell that pushed this clustered
version of the supercomputer. In the 1990s, we felt that we could build these
machines by stringing together x86 hardware. We built 64 node, 4 processor machines
that could stand in for the Cray SV2. Today we have a 4,000 node supercomputer
at one of the national labs which is used to develop open source software.
Our customers include research labs such as Lawrence Livermore that use our
equipment for some of their proprietary applications. Commercial workloads include
Oil & Gas companies that use these machines to analyze seismic data. Computational
Fluid Dynamics (CFD) apps are also popular. Our customers include ExxonMobil
and Compagnie Generale de Geophysique (CGG) in Houston. Boeing uses our supercomputers
for CFD. Biomedical companies use these machines. Academia also employs them.
At the top, at the San Diego Supercomputer Center, we are talking about 4,000
nodes. Even some oil companies have very large set-ups: CGG has in excess of
5,000 nodes but it built up this infrastructure over time in increments of 512
nodes.
When it comes to software we use the industry standard, Message Passing Interface
(MPI). As far as managing the cluster goes we have a partnership with Platform
Computing. We have interfaced our hardware management to these stacks for power
cycling the servers and cluster management tasks such as job scheduling.
When it comes to HPC clusters, what are the reliability
levels?
We achieved four nines almost ten years ago. Redundancy in a Dell HPC cluster
depends upon the application. If one node goes down, maybe another takes over.
We don't have fault tolerance. It is about how you distribute your application
and how much loss your application can tolerate. Some customers run what
we call single instruction, multiple data: you have one application and
you want to run a thousand data sets through it, each on one node. Then
we have customers who use this as a truly parallel computer so that each machine
does a piece of the job. In this instance, there's greater susceptibility to
failure.
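To make the contrast between the two usage modes concrete, here is a minimal Python sketch. It assumes, purely for illustration, that "four nines" applies to each node individually, that nodes fail independently, and that the cluster has 1,000 nodes; none of these figures come from the interview.

```python
# Illustrative only: the per-node availability and the node count below are
# assumptions, not figures quoted in the interview.

def downtime_minutes_per_year(availability: float) -> float:
    """Expected downtime per year, in minutes, for a given availability."""
    return (1.0 - availability) * 365 * 24 * 60


def all_nodes_up(availability: float, nodes: int) -> float:
    """Probability that every node is up at once, assuming independent failures."""
    return availability ** nodes


four_nines = 0.9999   # assumed per-node availability
nodes = 1000          # assumed cluster size

print(f"Downtime per node: {downtime_minutes_per_year(four_nines):.1f} minutes/year")

# Truly parallel job: every node holds a piece, so all of them must be up.
print(f"P(all {nodes} nodes up): {all_nodes_up(four_nines, nodes):.3f}")

# One data set per node: a node failure only affects the data set on that node.
print(f"Share of independent data sets affected: {1.0 - four_nines:.2%}")
```

Under these assumptions, the tightly coupled job is exposed to the failure of any single node, while the one-data-set-per-node workload loses only the work on the node that failed.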
How are HPC clusters being used in the Indian market?
IDC defines departmental computers as 32 node machines and the workgroup segment
as consisting of 16 node machines. We see a lot of opportunity and penetration
at the workgroup and departmental level (16 and 32 nodes). We have installations
at GE for example that are larger but the bulk of the opportunity is in the
lower node counts, from 4 to 32 nodes. This is a segment of the market that
exemplifies the diversity of applications. The bigger installations use these
machines for a specific application like seismic processing, CFD and so forth.
Here, every professor gets it for his own department and is using it to run
his own application. In the corporate world, customers like Infotech, Hyderabad
are using it for CFD. Some are using it for animation.
In terms of the R&D work, we have 120 engineers who work specifically on
HPCC. We have 30 in Bangalore, some in Austin, Texas and a few in other geographies.
This group develops the technology that we use in the rest of the world. The
servers for this market come from Sriperumbudur. Storage devices come from Taiwan.
The networking equipment is from China. We have partners that we get pieces
from, for instance, Mellanox, Broadcom etc.
In India, our HPCC business depends on direct relationships with our customers.
We have a set of configurations that are prequalified and tested. They often
act as a conversation starter. Some customization is required.
Among your Intel- and AMD-based hardware, in what scenario
is either likely to be chosen?
We use a combination of AMD and Intel servers. For some applications, such as
CFD, where investment is required beyond the hardware, Intel is the choice. Where
there's a high throughput requirement and open source is being used in an
academic environment, AMD may offer the better proposition.
What about storage?
For storage, we are seeing iSCSI as well as regular NFS in use along with Fibre
Channel and SAS. At the workgroup and departmental level, we have our own storage.
At the high-end, we have partnerships with Panasas and others. When we get past
350 TB or so, we go to these partners. We have a series of projects for which
we have pre-qualified different storage. With Terascala, we have a solution
where the hardware is ours but the software is Terascala's. We have also qualified
an NFS server in 20/40/80 TB increments that we can take and attach to this
solution. In terms of the file system, Lustre has a good footprint and promise.
There are still a few areas that need to be improved upon. It operates at petascale
today and is trying to get to exascale by 2020. We also keep a close eye on parallel
NFS.
Tell us about the evolution of cluster supercomputing.
It used to be that cluster supercomputing was all about servers and networking.
Now storage, management and databases have come into the picture. The boundary
between technical computing and IT computing is diminishing. Parallel file systems
started in technical computing but now they are becoming popular in IT computing
as well. We see HPCC as the test ground for new technologies and the data center
of the future will benefit from this.
We have taken an active role in pushing HPC to the lower end
of the market. If you are in the business of designing a process, mechanism
or physical system, you can use a supercomputer. In Houston, the land is
such that the city can get flooded quickly when a thunderstorm comes in from
the Gulf. The University of Texas has put a large system in place where scientists
along with the city officials have created a what-if model that lets them visualize,
on a Dell cluster, the effect of thunderstorms upon their city. We have
consumer product companies such as P&G that use these machines to design
diapers and containers for detergents etc.
Storage is important and we put in a lot of effort on it.
The size of the storage is increasing, the number of cores is increasing and
the storage subsystem needs to keep up with this. The characteristics of desired
storage are that it should be highly parallel, scalable and heterogeneous: it
should work with Windows or Linux compute nodes in a hierarchical manner (moving
data across tiers).
We have large universities where the scientists have pooled their funding and
had the university put a large supercomputer cluster in place. Now the Physics
professor and the Bioinformatics professor can both submit their jobs to the
cluster. We make the large installation Cloud operable.
Going forward, what direction do you expect cluster supercomputing
to take?
Management will continue to be a focus area. Power and cooling, especially
in large installations, will be an important aspect. When it comes to Intel/AMD
servers, there are three layers of power management in these systems. At
the processor level you can turn cores on/off or give cores more power so that
they can operate at a higher speed; then at the BIOS level there's the ability
to turn things like the fan on/off; lastly the OS has capabilities to manage
power. The question is what the setting of each parameter in each layer should
be, as a function of the application, so that the system performs optimally for
that application.
Our team in Bangalore spends a lot of time running these applications and trying
to set these parameters to see what's the best configuration.
We are trying to provide a more dynamic environment. If you set these parameters
right, you will have optimized power and cooling.
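The tuning exercise described above, trying combinations of settings across the three layers and keeping the best one for a given application, can be sketched as an exhaustive search. The setting names and the scoring function below are hypothetical placeholders, not Dell's actual parameters or measurements; in practice `measure` would run the real application and record power or runtime.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Tuple

Config = Dict[str, str]

# Hypothetical knobs, one group per layer named in the interview.
SETTINGS: Dict[str, List[str]] = {
    "processor": ["all_cores", "fewer_cores_boosted"],
    "bios": ["fans_auto", "fans_max"],
    "os": ["performance", "powersave"],
}


def best_config(
    settings: Dict[str, List[str]],
    measure: Callable[[Config], float],
) -> Optional[Tuple[Config, float]]:
    """Try every combination of settings and return the lowest-cost one."""
    layers = list(settings)
    best: Optional[Tuple[Config, float]] = None
    for values in product(*(settings[layer] for layer in layers)):
        config = dict(zip(layers, values))
        cost = measure(config)
        if best is None or cost < best[1]:
            best = (config, cost)
    return best


def toy_measure(config: Config) -> float:
    """Stand-in for a real benchmark run; the penalty values are made up."""
    penalty = {
        "all_cores": 3.0, "fewer_cores_boosted": 2.0,
        "fans_auto": 1.0, "fans_max": 1.5,
        "performance": 2.5, "powersave": 2.0,
    }
    return sum(penalty[value] for value in config.values())


if __name__ == "__main__":
    print(best_config(SETTINGS, toy_measure))
```

An exhaustive search is only practical while the number of combinations stays small; with many parameters per layer, a sampled or guided search would be the more likely choice.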
What's the entry point for cluster supercomputing from
Dell?
4-8 nodes is the entry point today; about $50,000 to $250,000. You have to keep
in mind the fact that what you could do with an older 64 CPU system can be done
with a 4 node system today as there are more cores inside. Today you can reach
10 Teraflops with 32 nodes.
The sizing, acquisition, deployment and operations of these machines have to
become easy. It should be accessible to customers who lack technical ability
or deep pockets. To do that, we give them as many tools as we can. If you go
to dell.com/hpcc you get a lot of tools like an adviser that asks you some questions
about your application and recommends a configuration.
How long does it take to deploy a Dell supercomputing cluster?
Assuming that you have architected the system and that the environment is ready
to receive it, it will not take more than two to three weeks to deploy an HPC
cluster. Once the equipment arrives at your door, it takes two to three
days.
Do you sell only the back-end or do you also sell the machines
to access the results (the workstations etc.)?
We have a three tier architecture. The lower tier has the machines that do the
computation while the middle tier has the machines that do the storage access
etc. The upper tier is the visualization cluster. We can offer equipment for
all three tiers. Our team is working on doing more bundling of all three tiers.
Today we have the largest video wall of 56x56 panels at the University of Texas
Advanced Computing Center in Austin.
What are the applications to which your machines are being
put in India?
We are looking at CFD for automotive and consumer products and at
the animation industry; bioinformatics is also important in India. We are focusing
on genomics. The Indian Institute of Chemical Technology (IICT) is a customer. We
have about a dozen customers in India. There's even some work on weather
modeling that's happening.
How serious is Dell about HPCC?
Today we have 80 people involved in HPCC at Dell India for
assessment and order taking, a dozen HPCC field experts and 30 in engineering.
At the last supercomputing conference in New Orleans in November 2009, we had about
150 people from Dell that were strictly involved with HPCC.
prashant.rao@expressindia.com