Magic Numbers and unit conversions in Structural Biology

13 Dec 2011 // protein

If you end up doing any kind of energy calculation in proteins or organic chemistry – and that includes messing around with Molecular Dynamic trajectories – you may end up dealing with actual numbers.

And that means you'll have to get your head around physical units and their conversions.

I've spent days trying to figure out magic numbers in equations and source-code. Diving into the guts of someone else's source-code is not the nicest place to figure such things out. Do it enough, and you'll start seeing the same numbers pop up everywhere. As I've never seen anyone bother to describe some very common magic numbers in biochemistry, I'll list a bunch of them right here.

First let's get acquainted with the standard units:

positions of atoms and molecules are always expressed in Ångstroms which is 10^-10 m. Ånstroms are preferred because the radius of a hydrogen atom is ~1.0 Å, a useful magnitude to describe molecules. That's why PDB protein structure files use Ångstroms.
masses are expressed in Da, which is g·mol^-1 or 10^-3 kg·mol^-1 (where mol^-1 is 1.0 / 6.02×10^-23). This unit is the work-horse of chemistry and the masses of atoms (or at least their protons) can almost be pulled directly out of the periodic table. Once again, the mass of a hydrogen nucleus is ~1.0 Da.
charges are expressed in multiples of e (the charge of an electron). A hydrogen nucleus has charge +1 e.
energy is usually expressed kcal·mol^-1, which biochemists and nutritions use but not engineers of physicists. At room temperature, 300 K, the average kinetic energy available to most atoms is about 1 kcal·mol^-1. The hydrogen bond binding energy is about 2 kcal·mol^-1.

These units are pretty universal across a whole bunch of structural biology programs, and are chosen because they have a reasonable order of magnitudes for biological molecules at room temperatures. This is important to avoid overflow and underflow problems in floating-point operations.

Where magic numbers come in is making all the other physical properties match this set of units. One of the most common calculations is the electrostatic interaction. Namely the electrostatic potential energy k·q1·q2/r.

Now, we ideally would like the result in kcal·mol^-1. Given that q1 and q2 are in e, and r is in Å, this imposes a set of unit conversions on k (=1/4πε) which gives the magic number k = 332.24.

You will see this everywhere, and the first place I saw it was in the ubiquitous secondary structure evaluation program DSSP that defines hydrogen bonds in terms of the electrostatic energy calculated by this equation for bacbkone O and H atoms.

Since the energy and distance units are fixed, this means that the calculated force units are, not in terms of Newtons, but in terms kcal·mol^-1Å^-1. This is because forces in MD are mostly central forces taken from the energy potentials where F = dE/dr.

Notice I've said nothing about time. There is no particular consensus on what time units are used in Molecular Dynamics simulations.Some MD simulations choose fs (10^-15 s) as this is the smallest time-step, which allows ~10 steps for the fastest vibrations (bond vibrations involving H atoms). Others choose ps (10^-12 s) as this is closer to what can be observed with experiment.

Even with ps, this causes problems with velocities, due in part to the floating point representation. In NAMD, velocity values are stored in the PDB which has only 3 decimal figures in the coordinate fields. Given that velocities are outputed as Å/ps, the 3 decimal places lose a lot of precision. As a result NAMD multiplies the velocities by 10 before storing. So you have to divide by 10 when you read NAMD velocities.

But the programs mess around with time and velocity units internally, ostensibly to avoid a single floating point multiplication in the main loop. In both NAMD and AMBER, the kinetic energy E = 0.5·m·v² is stored as kcal·mol^-1. If time was stored as ps or fs, then the kinetic energy calculation would require a unit conversion to get kcal·mol^-1.

In NAMD, the base time unit is fs (10^-15 s). As mass is in Da and position is in Ånstroms, directly plugging these values in the kinetic energy calculation results in an energy unit that is 2388.45897 kcal·mol^-1. To abstract away that 2388.45897, they find the square root of 2388.46, so that can you can slide one square root each into one of the velocities in the v² term in the kinetic energy. The velocity is of course, distance/time. Since they don't want to mess around with the distances in Ångstroms, they divide their time (which is in fs) by the NAMD magic time-factor of √ 2388.45897 = 48.89. Now you can plug it all in 0.5·m·(d/t)² and get the energy directly in kcal·mol^-1.

In AMBER on the other hand, the base time unit is ps (10^-12 s). AMBER too, uses Da and Å. Plugging these time, mass and distance units in gives the kinetic energy as 0.00238846 kcal·mol^-1. In this case, AMBER takes the square root of the inverse of this number which is 418.68. They multiply time (in ps) by the AMBER magic time-factor of √418.6 = 20.46.

Two different MD packages. In one, you multiply time by 20.46. In the other, you divide time by 48.89. Same problem, two magic numbers.