It seems that we have reached some kind of tipping point where microsecond molecular dynamics simulations are feasible. I’m starting to see many papers published where the only genuine point of interest seems to be that a protein was simulated for MORE THAN A MICROSECOND.
Still a microsecond is not quite long enough to watch biological stuff shuffle through interesting conformations without specifically tuning the simulation. What remains hard is to find something interesting for which a microsecond of simulation can simulate adequately.
And so, one of the more important papers from the measurement side of things came out in 2008 from Bert de Groot’s group, Recognition Dynamics Up to Microseconds Revealed from an RDC-Derived Ubiquitin Ensemble in Solution, which claims to be the first NMR experiment to measure the complete microsecond dynamics of every residue in a protein. In this case, ubiquitin. The measurements are in the form of S order parameter, a measure of the spin relaxation of N atoms along the backbone derived from the Nuclear Magnetic Response. These are measured from Residual Dipolar Couplings.
The S order parameter is somewhat ill-defined with respect to the protein conformation, so there is some lattitude in how to interpret it. de Groot and coworkers showed that if the S order is interpreted as simulation constraints then an MD simulation with these constraints produce the same variabiality as an ensemble of 46 different crystal structures of ubiquitin, which translates the S order parameter into a quasi-experimental RMSD measure, which they call the EROS RMSDf.

With complete microsecond dynamics measurements, we finally have a litmus test for those spanking mew microsecond MD simulations. Step up to the plate, D.E. Shaw and coworkers, using their state-of-the-art Molecular Dynamics package DESMOND: Microsecond Molecular Dynamics Simulation Shows Effect of Slow Loop Dynamics on Backbone Amide Order Parameters of Proteins. So how did they do? If we plot the calculated S order parameters with the ones derived from the Residual Dipolar Couplings (actually I plot 1-S as I find it easier to see), we get an okay match of r = 0.63:

But here’s the rub, in the course of my research into long time dynamics, I came across a whole bunch of great approximations that simulated protein flexibility (actually a reviewer suggested these to me). In particular, I was pretty impressed with CONCOORD, written by the very guy who made the NMR measurements.
The idea behind CONCOORD is simple and elegant: replace every bond, and inter-atomic interaction in a crystal structure with distance constraints that represent the strength of the interaction, and run a monte-carlo simulations. Using CONCOORD, in several minutes, I got a decent ensemble, from which I get the CONCOORD RMSDf. Comparing the experimentally derived EROS RMSDf to the CONCOORD RMSDf, I got a match r=0.54:

Comparing DESMOND to CONCOORD, the improvement of correlation to experiment was about 0.1 of correlation. The lesson to learn is that if you’re running MD, you have to run it smart. The reason that CONCOORD performs well for ubiquitin, is that the motions of ubiquitin are actually not that large. Compared to the 6 or 7 Å motions of the ligand-binding loops of TIM or DHFR, ubiquitin really doesn’t move all that much. Hence even a stripped down model such as CONCOORD can reproduce the experiment almost as well as DESMOND. To simulate these motions with MD, you need tens of microseconds of simulations.
Minutes of calculation with CONCOORD produces pretty much the same result as hundred of thousands of hours of CPU time with DESMOND. There’s a lesson in there somewhere.
While I appreciate your warning of being smart before embarking on running months of MD simulations, I think it is somewhat dubious to use RMSF as a tool to compare methods. Most of the time you don’t run simulation to only reproduce experimental results; you are trying to see something that you can’t get at using traditional experimental methods. As a simple example, I can use an elastic network model that takes 5 seconds on my laptop to generate the RMSF that I can generate using 6 months of MD experiments and they have a correlation coefficient of ~0.85, so they are essentially giving the same results. But the ENM is incapable of predicting the multiple metastable states and general anharmonicities in the real system. The ENM is informative, but is a simplification that carries those simplifications all the way through.
Brute force is not always the way to go, but as with any simulation methodology, the choice of which one to use comes down to the question you are trying to answer.
I sort of agree with Josh. Reproducing experiment is cool, but studying behavior not easily studied by experiment is where MD really gets to be fun. Just analyzing long trajectories, or multiple trajectories at various parts of an energy surface and getting ideas on molecular motion gives you all kinds of ideas to investigate further
@Josh, thanks for the comment. I’ve been thinking long and hard recently about the relationship between the different methods of simulations. In particular, about how each level of approximation relates to MD.
1. With the example you gave, I actually take another conclusion (and one that other MD theorists I’ve talked to as well) – that the essential conformational space captured by MD (when only run in a short time-scale) is adequately captured by ENM.
2. True, with MD you can always obtain metastable states and anharmonicities, but when you haven’t guaranteed equilibrium, you can’t guarantee the significance of these states – in particular what is the true population?
3. So how do you know you’ve got equilibrium? Internal measures are only a short time-scale equilibrium measure. I’ve read enough simulations of large systems to see that they reach supposedly equilibrium using a whole bunch of measures. However from the simple knowledge that a lot of these systems can undergo large conformational changes (estimated to be in the microsecond-millisecond region) means that we’ve only captured a limited knowledge of the phase space, but talk as if you have the whole thing.
4. Without knowing you’ve got equilibrium, a lot of the arguments about the believability of MD simulations over ENM and other simpler models is a leap of faith. This is precisely why the NMR microsecond simulations is importantly theoretically. We’re assuming that the force-fields are good enough, so that the match with the data is a direct test to how close we are to finding the microsecond equilibration conditions to a system as small as ubiquitin. The point that it doesn’t match too well is for me, an indication that even a microsecond hasn’t sampled enough of the conformational space. In that case, for a system of the size of ubiquitin, I wouldn’t know how to evaluate the significance of metastable states and anharmonicities observed in a simulation of smaller time-scales.
5. On the use of RMSDf. Actually I find RMSDf a very good measure. If you read the NMR microsecond paper, they obtain a very good correlation (0.96) between the MD simulations constrained by the measured residual dipolar couplings and the RMSDf from the crystal structures. The match is qualitatively better than that between any MD and X-ray data. The fact that ENM can produce r=0.85 for some of your RMSDf is for me, an indication that your system doesn’t really undergo any large-scale motions. I am not saying that you can’t do anything interesting with MD on your system, it’s just that a straight MD (no SMD, no umbrella sampling, REMD, LES etc…) is probably not going to tell you much over ENM. BTW what is your system?
@Deepak, in my experience, making MD simulations is easy, remarkably easy. But trying to show that it means something is really really hard; maybe I feel this way because I do MD simulations in a wet-lab. When you say that MD is good for getting ideas, do you mean ideas for experiments, or ideas for more MD?
Bosco, I have a task to test whether homodimeric proteins can form stable dimers. I can do it by measurement free energy by MM-PBSA calculations or try to run some long term molecular dynamics. Maybe I should make it by use of CONCORD? I have now ten models of dimers, which I’ve got by homology based modelling.
PS By the way my scientific superviser says that there is no soft which can perform Monte Carlo global minimum search for explicit proteins with explicit waters for protein folding simulation. Maybe you know something about software based on Monte Carlo which can be used for folding simulations
Bosco, so did I (I was the only MD person), but my simulations guided a lot of the mutagenesis work.
Also, as others write, a lot of what you do with simulation is try and understand the energetics and behavior for systems and complexes that you are unlikely to get with other methods. MC work is great for structural analysis and predicting conformation, but it’s not a simulation of macromolecular behavior.
