Recently,
I've been reading about how computationally intense problems in
science, such as protein folding, have been driving the adoption of
cutting-edge computer technologies, such as Beowulf clusters. But it
works the other way as well. The adoption of Beowulf clusters has, in
turn, chipped away at the foundations of protein folding, so much so
that theory in protein science has had to evolve in ways that
accommodate the use of Beowulf clusters.
I work in the area of
protein folding, one of the hairy workhorses of scientific computing.
One would, of course, like to simulate a protein molecule in as much
detail as possible. In most approaches, the lowest level of detail is
the covalent bond: every covalent bond is modeled but not allowed to
break. This approach is known as molecular dynamics. Modeling the
covalent bonds imposes a limit on the simulation: the biggest step you
can take must be smaller than a typical vibration of a covalent bond -
you must use baby steps no bigger than a femtosecond, or
0.000000000000001 s.
A typical protein has tens of thousands of atoms and one
normally bathes a protein in another few thousand water molecules. A
realistic system might contain hundreds of thousands of atoms. Given
the need to model the covalent bonds, the number of time-steps needed
to simulate large motions in a protein, which typically take a
microsecond (0.000001 s, or a billion femtosecond steps), pushes the
envelope of even the largest computing systems.
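To put rough numbers on this, here is a back-of-the-envelope sketch in Python. It takes a femtosecond as 1e-15 s and a microsecond as 1e-6 s. The function names and the cost of 0.1 s of computer time per step are my own assumptions for illustration, not measurements from any real machine.

```python
FEMTOSECOND = 1e-15  # seconds
MICROSECOND = 1e-6   # seconds
SECONDS_PER_DAY = 86400.0


def steps_needed(simulated_time, time_step=FEMTOSECOND):
    """Number of integration steps needed to cover simulated_time."""
    return round(simulated_time / time_step)


def wall_clock_days(n_steps, seconds_per_step):
    """Days of computer time, given an assumed cost per step."""
    return n_steps * seconds_per_step / SECONDS_PER_DAY


n_steps = steps_needed(MICROSECOND)
# 0.1 s per step is a made-up figure, chosen only to show the scale.
days = wall_clock_days(n_steps, seconds_per_step=0.1)
print(f"{n_steps:,} steps, roughly {days:,.0f} days at 0.1 s per step")
```

Whatever the true cost per step is on a given machine, the billion-step multiplier is what makes a microsecond feel like an eternity.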
In
the past, only well-endowed labs, with access to restricted Pentagon
number-crunchers, could dream of simulating a protein using molecular
dynamics. Even ten years ago, Peter Kollman got a paper in the
prestigious journal Science simply for simulating a small protein (the
villin headpiece) for what seemed like an eternity, a single
microsecond (0.000001 s). He didn't find out anything new but the study
was hailed as a tour de force computation (read: my computer is bigger than yours).
But
over the last 5 years, a sea change has swept the field of protein
modeling. What has happened is that Linux has commoditized
hardware, processors have reached a critical performance hump, and
communication protocols between individual computers have matured (MPI,
the Message Passing Interface). For a rather modest investment, one can buy 10 or
20 cheap PC clones (Xeon motherboards), and hook them together into a
tightly coupled network, known as a Beowulf cluster. In theory, a
Beowulf cluster provides computation that rivals even outrageously
expensive supercomputers.
As forward-thinking (and
penny-pinching) scientists have jumped onto the Beowulf-cluster
bandwagon, they've collectively run into a methodological brick wall.
How do you slice up a molecular dynamics computer program that was
written to run on one computer? The problem is that running a computer
simulation is a little like following a recipe for baking a cake. You
have to mix the eggs and flour into a batter before you put the batter in the oven. Putting the flour and eggs in the oven whilst mixing the batter will not result in a cake. However, various strategies have been devised to bake a protein in parallel.
The
first is the shotgun approach. This approach is used mainly by Vijay
Pande at Stanford in the feel-good Folding@home program. Folding@home
is unique in that it farms out the computation to tens of thousands of
desktop computers scattered around the world. The essence of the
shotgun approach is that running a very long molecular-dynamics
simulation of a single protein should be equivalent to running thousands of short
simulations of the same protein. In order to make sure that each
simulation is different, the protein is scrambled at the beginning of
each simulation. This makes it easy to farm out a short simulation to
one of the many desktop computers in the Folding@home project. At the
end of all the simulations, the results are averaged out to find the consensus solution. The prevailing philosophy is that with enough shot in the cartridge, you will hit the bird, no matter what.
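The shape of the shotgun approach can be sketched with a toy model. This is emphatically not how Folding@home works internally: the "protein" here is a single particle in a one-dimensional double-well potential, the dynamics are simple Brownian dynamics, and every name and parameter is an assumption made up for the illustration. What it does show is the structure: many short, independent runs, each from a scrambled start, pooled at the end.

```python
import random


def short_run(seed, n_steps=200, dt=0.01, kT=0.3):
    """One short Brownian-dynamics run on the toy potential U(x) = (x^2 - 1)^2."""
    rng = random.Random(seed)
    x = rng.uniform(-1.5, 1.5)  # the "scrambled" starting point
    noise = (2.0 * kT * dt) ** 0.5
    for _ in range(n_steps):
        force = -4.0 * x * (x * x - 1.0)  # -dU/dx
        x += force * dt + rng.gauss(0.0, noise)
    return x


def shotgun(n_runs=1000):
    """Pool many independent short runs; report the fraction ending in the right-hand well."""
    finals = [short_run(seed) for seed in range(n_runs)]
    return sum(1 for x in finals if x > 0.0) / n_runs


print(f"fraction ending in the right-hand well: {shotgun():.2f}")
```

Because no run depends on any other, the list comprehension in `shotgun` could be handed out to as many machines as are available, which is exactly what makes the approach so easy to farm out.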
Another
approach is the divide-and-conquer approach. Machiavelli famously
advocated that, when facing a coalition of enemy forces, instead of
hurtling into a full-frontal assault, one should learn to play each
enemy against another. By using subterfuge and diplomacy to foster
internal strife, victory is brought closer to fruition. Similarly, in a
simulation of a protein and its vassal water molecules, the simulation
is broken up into little spatial grids where all the atoms in each
grid-cell are simulated by a single computer in a Beowulf cluster. Just
as a successful divide-and-conquer approach depends on subterfuge and
diplomacy, the success of the simulation will depend on the quality of
the cross-talk between the grid-cells. This is a rather new technique
and time will tell if it can be successfully applied to protein folding.
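As a minimal sketch of the bookkeeping involved (not taken from any particular molecular dynamics package), the following Python assigns atoms to cubic grid-cells and picks out the atoms that sit close enough to a cell face that a neighbouring cell would need to hear about them. The function names, the cell size and the cutoff are all assumptions for illustration.

```python
from collections import defaultdict


def cell_of(position, cell_size):
    """Integer grid-cell index for an (x, y, z) position."""
    return tuple(int(c // cell_size) for c in position)


def decompose(positions, cell_size):
    """Map each grid-cell to the indices of the atoms inside it."""
    cells = defaultdict(list)
    for i, position in enumerate(positions):
        cells[cell_of(position, cell_size)].append(i)
    return dict(cells)


def boundary_atoms(positions, cell_size, cutoff):
    """Atoms within `cutoff` of a cell face: the source of the cross-talk."""
    near_face = []
    for i, position in enumerate(positions):
        for c in position:
            offset = c % cell_size
            if offset < cutoff or cell_size - offset < cutoff:
                near_face.append(i)
                break
    return near_face


atoms = [(0.5, 0.5, 0.5), (1.5, 0.5, 0.5), (0.95, 0.5, 0.5)]
print(decompose(atoms, cell_size=1.0))
print(boundary_atoms(atoms, cell_size=1.0, cutoff=0.1))
```

Each cell's atom list would go to one computer in the cluster, and the boundary list is what has to travel over the network at every step, which is why the quality of the cross-talk matters so much.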
But
by far the most popular technique is replica-exchange, or the
musical-chairs approach to molecular-dynamics simulation. In
replica-exchange, a modest number of copies of the protein are run on
the processors of a Beowulf cluster. But unlike the shotgun approach,
each copy of the protein, called a replica, is run for a very long
time. The replicas running on the Beowulf cluster play a game of
musical chairs. The replicas are mixed around so that each processor
gets a chance to simulate each of the replicas. Of course, there's no
point in mixing the replicas around unless each chair is different.
When the music stops, each replica sits down on a chair with a
different temperature - some are tepidly warmed at room temperature,
whilst others are hot enough to deep-fry a turkey. The idea is that
mixing and matching replicas at different temperatures supposedly
speeds up the simulation, since only proteins that are in a good shape
can sit down on a cooler chair. This method of speeding up the
simulation is wildly speculative but seems to be tailor-made for
Beowulf clusters - only a small number of processors are used, with swaps
between processors happening at regular intervals. It calls to mind
afternoon cable television commercials - you really do get more for
less.
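The rule for who gets the cooler chair is usually a Metropolis test on the energies of two replicas at neighbouring temperatures. The sketch below shows that standard criterion in Python. The function names and the example energies are my own, energies are assumed to be in kcal/mol, and nothing here is meant to represent the code of any particular simulation package.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def swap_probability(e_i, t_i, e_j, t_j):
    """Metropolis probability of swapping replicas at temperatures t_i and t_j."""
    beta_i = 1.0 / (K_B * t_i)
    beta_j = 1.0 / (K_B * t_j)
    delta = (beta_i - beta_j) * (e_i - e_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)


def attempt_swaps(order, energies, temperatures, rng, start=0):
    """Try swapping neighbouring chairs.

    order[k] is the replica currently sitting at temperatures[k];
    energies is indexed by replica. Returns the new seating order.
    """
    order = list(order)
    for k in range(start, len(order) - 1, 2):
        a, b = order[k], order[k + 1]
        p = swap_probability(energies[a], temperatures[k],
                             energies[b], temperatures[k + 1])
        if rng.random() < p:
            order[k], order[k + 1] = b, a
    return order


rng = random.Random(0)
# Replica 1 has the lower energy, so it always wins the cooler chair.
print(attempt_swaps([0, 1], [-90.0, -100.0], [300.0, 400.0], rng))
```

Alternating `start` between 0 and 1 on successive rounds lets every neighbouring pair of chairs get a turn. Note that the swap itself is cheap: only temperatures (or coordinates) are exchanged between processors, which is why the method sits so comfortably on a loosely connected cluster.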
But during a recent conference, my boss heard a speaker
quip that replica-exchange was a hack technique that doesn't improve
molecular-dynamics simulations. The room erupted. It turned out that
pretty much everyone in the room was using replica-exchange. The
speaker then pointed out that, anecdotal evidence notwithstanding, nobody
had yet shown quantitatively that replica-exchange was more efficient.
But what was left unspoken was that the guys in the room had to believe in replica-exchange, if only for the luxury of using their affordable Beowulf clusters.
Chinese Blade on 08/29 said:
Scientists, in their unique bigotry, tend to think that science drives technology. But really technology drives science. And drive is a weak word. Technology DEFINES science. It gives tools to science and scientists
can only use these tools, plus a bit of their imaginations, which are
invariably overstated in importance. Newton may have discovered the laws of gravity, but it is Galileo's telescope that ushered in modern
astronomy. And astronomers had to believe in what they saw through the
lens. Fortunately, they had a good reason to.
Take a more recent example. Since Max Perutz first determined the structure of haemoglobin by X-ray crystallography, the technique has been the de facto method of structural determination of proteins and nucleic acids. Almost all proposed mechanisms of protein function are based on
crystallographic structures. Sure, these structures are only static
images of dynamic proteins, and we all agree that protein dynamics are
often essential to their function. But no technology has managed to
determine dynamic structures of proteins (aside from limited successes of NMR spectroscopy), so crystallography is still the best we have got.
After all, scientific truth is not what can be proven true, but what cannot be proven false, yet.
So run replica exchange on the amazingly affordable Beowulf clusters, and be content doing so.