How to Compute Chemistry, Conference Report
I kind of stumbled into my current field of research – molecular dynamics simulations of proteins – and as a consequence, I never actively joined a particular community of researchers. I sometimes call myself a biophysicist, but I am not really a physicist. Nor do I know a drop of chemistry. Whilst computational biology sounds like a great buzzword, it doesn't really, well, mean anything. And nobody in my wet lab understands what I do, oh boo hoo.
This all changed a couple of months ago when I went to my first Gordon Research Conference. As soon as I got onto the bus, I immediately got into a slanging match about atomic force fields. During the next two-and-a-half-hour bus ride, I argued with a bunch of strangers about thermodynamic ensembles, and integrators, and dynamics protocols, and we hadn't even started the meeting proper yet. And that's when, suddenly, I realized: these are my folks. These strangers understood my problems, my bêtes noires, my pain. And we were all trying to get to the same place. These were computational chemists, ergo I must be a computational chemist. I had come home.
Based on that realization, during the meeting I started to take notice of the problems that concern computational chemists. What do chemists really want to simulate? Well, chemical reactions. For small molecules, you have to go full quantum: all that heavy-duty computation required to solve the Schrödinger equation, or at least the equations of density functional theory. But for reactions involving large macromolecules like proteins and DNA, we can afford to approximate a little and use molecular dynamics to study interesting thermodynamic processes such as protein folding, the protein dynamics behind allostery, large motor processes, and the effects of post-translational covalent modifications.
Whilst many of us use molecular dynamics packages, we sometimes forget that they are fraught with foundational assumptions. Molecular dynamics (MD) simulates molecules as if chemical bonds can never be broken or formed. Simple spring potentials model bond vibrations, and a totally artificial r^(-12) term models short-range repulsion between atoms, to stop them collapsing into each other and forming new chemical bonds. Thus you can model a molecule like a chain of balls on springs using good old Newtonian equations of motion. Furthermore, electrostatic interactions, which lead to complex electron-exchange effects in quantum mechanics, are substituted with simple point charges at the center of every atom. There are then many ways to calculate how those point charges give rise to electrostatic effects, including periodic Fourier transforms and other fancy-pants algorithms.
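To make that concrete, here is a minimal sketch of the three classical terms just described, with entirely made-up parameters; any real package layers angle, dihedral, and long-range electrostatic machinery on top of these.

```python
import numpy as np

def bond_energy(r, r0=1.5, k=300.0):
    """Harmonic spring: bonds vibrate but can never break or form."""
    return 0.5 * k * (r - r0) ** 2

def lennard_jones(r, epsilon=0.2, sigma=3.4):
    """The artificial r^-12 wall keeps atoms from collapsing into each
    other; the r^-6 term is the physically motivated dispersion attraction."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 ** 2 - sr6)

def coulomb(r, q1, q2, k_e=332.06):
    """Electron clouds replaced by point charges at atom centers
    (k_e in kcal*Angstrom/(mol*e^2))."""
    return k_e * q1 * q2 / r

# Total non-bonded energy for one atom pair 4 Angstroms apart:
r = 4.0
print(lennard_jones(r) + coulomb(r, q1=-0.5, q2=0.25))
```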
I know that many of us computational chemists fancy ourselves as programmers, and we all want to write our own MD package because, well, we'd do so much better. But even a fraction of a moment's sober thinking brings one to realize the impossibility of the task: if you are not actively developing a force field, or developing new numerical integration algorithms, then it's just so much pissing in the wind.
The hard part of writing an MD package is generating a good atomic force field. Although each of the major MD packages implements its own force field, bits and pieces of each have infected the others. What we have is an ecosystem of atomic force fields that are vaguely related to each other, and no one really knows how they differ.
Still, there are some brave souls trying to get past the standard MD atomic force fields. I am a personal fan of implicit solvent models, where we pretend that water molecules don't really exist and can be replaced with a surface-tension term. Nathan Baker showed that there is life in the old implicit solvent models yet, with some nice little twists on the solvent terms – volume terms instead of surface tension, for instance. For those who come from physics, it's common to see complex equations solved by successively higher-order approximations, term by term. Given that static partial charges are a zeroth-order approximation to the electron density, a higher-order approximation might be rotating dipoles. Except that polarizable force fields – where rotating dipoles replace static charges – turned out to be a running joke at the conference.
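As a rough illustration of the surface-tension idea (a generic toy, not Baker's actual model), here is a nonpolar solvation term written out: a single gamma coefficient times the solvent-accessible surface area, with illustrative numbers throughout.

```python
import numpy as np

def sasa_energy(sasa_per_atom, gamma=0.005):
    """Nonpolar solvation: gamma (kcal/mol/A^2) times exposed surface area.
    A volume-based variant would swap the area sum for an enclosed-volume
    term. Both gamma and the areas below are illustrative, not from any
    published force field."""
    return gamma * np.sum(sasa_per_atom)

# Pretend these are per-atom exposed areas (A^2) from a rolling-probe
# calculation on a small peptide:
sasa = np.array([12.0, 0.0, 35.5, 8.2, 40.1])
print(sasa_energy(sasa))  # ~0.48 kcal/mol nonpolar solvation penalty
```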
Another exciting approach to expanding molecular dynamics was Jana Khandogin's constant-pH simulations. Why is this exciting? Well, MD simulations typically forbid chemical bonds forming or breaking by fiat. But that's essentially what happens to titratable amino acids at a given pH: hydrogens come on and off histidines, for instance. This can't happen in ordinary molecular dynamics, and thus you never get to see dynamic changes driven by pH. Constant-pH simulations use a rather elaborate chemical-alchemy approach to generate ghost hydrogen atoms, which actually do a pretty good job of approximating proton exchange. As you ramp up the pH – changing the population of free-floating hydrogen ions in the solvent – you get to see rather large changes in protein conformation, which is what experiments lead you to expect.
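For flavor, here is a toy illustration – emphatically not Khandogin's lambda-dynamics scheme, which couples ghost-hydrogen occupancy continuously to the dynamics – of the titration statistics those ghost atoms are built to reproduce, using the Henderson-Hasselbalch relation for a histidine-like site with an assumed pKa of 6.5.

```python
def protonated_fraction(pH, pKa=6.5):
    """Equilibrium probability that a titratable site holds its proton
    (pKa ~6.5 is a typical model value for histidine)."""
    return 1.0 / (1.0 + 10.0 ** (pH - pKa))

# Ramp the pH and watch the protonation population shift, which is what
# drives the conformational changes seen in the simulations:
for pH in (4.0, 5.0, 6.0, 6.5, 7.0, 8.0):
    print(f"pH {pH}: {protonated_fraction(pH):.2f} protonated")
```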
But of course, the other big problem with MD simulations is that no one is ever satisfied with how long the simulations run. Computational chemists are rather greedy in that regard. I came across various schemes to speed simulations up, including Andrew McCammon's accelerated dynamics, where you flatten potential wells to drive the simulation over the humps between them. Then there are all sorts of ways to do replica exchange and simulated annealing and every bastard variation thereof. Of course, these speed-up schemes would be utterly useless if there weren't a way to automagically reweight the results so that the simulation can "pretend" to have been run at a 300 K equilibrium like a normal MD simulation. Thank god for Jarzynski's equality and other such reweighting theorems, which guarantee that you can transform a highly non-equilibrium ensemble into a canonical one. There were many highly theoretical (read: not very entertaining) talks attempting to show that no matter how poorly sampled your simulation, you can still turn water into wine if you throw your results into the right wine-skin.
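For the curious, here is a minimal sketch of the Jarzynski estimator itself, fed with fabricated work values standing in for repeated fast-pulling simulations.

```python
import numpy as np

kT = 0.593  # kcal/mol at ~300 K

def jarzynski_free_energy(work_values, kT=kT):
    """Delta F = -kT * ln < exp(-W / kT) >, averaged over trajectories.
    The average is dominated by rare low-work trajectories, which is why
    poor sampling makes this estimator notoriously noisy."""
    w = np.asarray(work_values)
    return -kT * np.log(np.mean(np.exp(-w / kT)))

# Fake non-equilibrium pulls: mean work 3.0 kcal/mol, 1.0 of spread.
work = np.random.normal(loc=3.0, scale=1.0, size=10_000)
print(jarzynski_free_energy(work))  # < 3.0: dissipated work drops out
```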
There were quantum chemistry talks. I didn't understand any of them, except maybe for one neat talk that used QM/MD to reproduce some electron spin resonance experiments.
I was a bit chastened to see that there were few talks on coarse-grain models. My personal aesthetic tends towards the simple, so I feel somewhat like a traitor now that I do mostly MD simulations. The stand-out talk of the conference was by Juan de Pablo, a materials statistical-mechanics dude, who presented superb coarse-grain simulations of DNA. Demonstrating the dictum that actual applications force you to clarify your model, Juan showed some absolutely awesome coarse-grain models of DNA spontaneously forming simple geometric patterns, which helped him build nanoscale chips using simple thermodynamics. His simulations of the process of inserting DNA into a virus capsid were spell-binding.
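To give a feel for what coarse-graining means in practice – this is the generic first step, not de Pablo's actual DNA model – here is a sketch that collapses groups of atoms into single beads at their centers of mass. The interactions between beads then need their own, much cheaper, effective potentials.

```python
import numpy as np

def coarse_grain(positions, masses, groups):
    """Map atomistic coordinates to one bead per group.

    positions: (n_atoms, 3) array; masses: (n_atoms,) array;
    groups: list of atom-index lists, one per bead."""
    beads = []
    for idx in groups:
        beads.append(np.average(positions[idx], axis=0, weights=masses[idx]))
    return np.array(beads)

# Six atoms collapsed into two beads of three atoms each:
pos = np.random.rand(6, 3)
mass = np.array([12.0, 1.0, 16.0, 12.0, 1.0, 16.0])
print(coarse_grain(pos, mass, groups=[[0, 1, 2], [3, 4, 5]]))
```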
In contrast, we were forced to watch a whole series of multi-scale modeling talks, to which my response was: what's the point? Supposedly, multi-scale models allow you to have a coarse model on the outside and successively more detail until you reach full atomic detail in the region of interest. It's ugly as sin, but if you can get the speed-up and consistent results then maybe it's worth it. Unfortunately, you end up spending most of your time dealing with the boundaries, and besides, there isn't actually a serviceable problem to apply the method to. Everyone seems to do multi-scale models of water, but these models are stuck in no-man's land: not detailed enough to capture the complex phase properties of water, nor simple enough to allow a complete ensemble to be generated. So where's the beef?
Unfortunately, computational chemistry is not really at a stage where we can tackle the really exciting protein reactions that involve conformational changes in large proteins. There were some isolated examples. Matt Jacobsen reported some preliminary long-time straight-MD simulations of the phosphorylation of myosin leading to large conformational changes. Large protein reactions still require some hand-tuned calculations: Alexei Stuchebrukhov used a custom electrostatic fitting protocol in a membrane potential to identify a plausible pathway for proton pumping in cytochrome c. Terry Lybrand showed some very subtle dynamic effects that affect the selectivity of drugs between COX1 and COX2, two proteins with virtually identical backbone structures.
But really, the real action at this computational chemistry conference is ligand binding, by a country mile. This is where the money is. This is where the industry computational chemists live, eat and breathe. However, despite the Sturm und Drang of talk after talk by industry chemists detailing their new™ amazing™ much-improved™ ligand-binding software, the holy grail of a reliable calculation of ligand-binding affinities remains tantalizingly out of reach. If there is a more pressing theoretical problem for the pharmaceutical industry, I don't know what it is.
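To see why the bar is so high, consider a back-of-envelope calculation: binding free energy is logarithmic in the dissociation constant, so an error that sounds small in kcal/mol is an order of magnitude in potency.

```python
import math

RT = 0.593  # kcal/mol at ~300 K

def delta_g_from_kd(kd_molar):
    """Delta G_bind = RT * ln(Kd); more negative = tighter binder."""
    return RT * math.log(kd_molar)

print(delta_g_from_kd(1e-6))  # ~ -8.2 kcal/mol for a micromolar ligand
print(delta_g_from_kd(1e-7))  # ~ -9.6: one log unit of Kd is only
                              # ~1.4 kcal/mol, well inside typical
                              # scoring-function error bars
```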
I will grant you that for certain classes of ligands, poses can be predicted somewhat reliably. But if you can't get a reliable ligand-binding affinity, then you really don't know how well the ligand will actually work, like, as a drug. So the talks I saw felt like part of a merry-go-round of incremental improvements in predicting ligand binding. Some have flexible backbones, some do not. Some have explicit waters, some do not. Some fit charges, some do not. Some are fast, some are slow. Perhaps this is why the most exciting ligand-binding talk essentially told everyone to STFU. OpenEye Scientific Software has done the field a great service by organizing the CASP of ligand binding. Called SAMPL, it was a proper blind test of the ligand-binding community. Even more difficult to organize than CASP, SAMPL required complex negotiations with various pharmaceutical companies to get a test set for the competition. The results were at best disappointing, but disappointing in an entirely predictable way. This is actually not so bad: if we take CASP as a guide, the first time CASP was run it was a real wet blanket, as the whole community did really poorly. It took two more rounds before real progress was made.
By then, I should be back at the Computational Chemistry Gordon Research Conference, hopefully whooping it up in the highlands of Switzerland.