Calculating the Solvent Accessible Surface-Area

Originally written on 19/10/2007.
PDBASA is my tool to calculate ASA. It is part of PDBREMIX.

The Accessible Surface Area (ASA) is a wonderfully simple property of an atom that can reveal a lot about the protein molecule. The Accessible Surface (AS) of a molecule is the part of the molecular surface that is exposed to the solvent, like the white sheet that wraps around a Christos installation.

Making analogy to the surface tension of a liquid, the free-energy of a protein due to the solvent is, to a crude approximation, proportional to the area of the AS of the molecule, this is the solvent Accessible Surface Area (ASA).

I can use ASA of an atom to figure out which atoms actually lie on the surface of the protein. But how do I calculate the ASA of an atom? Originally defined in 1971 by Lee and Richards, many different methods have been proposed over the years to calculate the ASA – some analytical, some numerical. The analytical calculation is not easy, it's a complicated geometric problem, where you have to consider the many possible intersections of spheres. Perhaps the most elegant method uses a graph representation, like the alpha shapes technique of Liang and co-workers (1998), to describe the spheres, and the ASA is calculated as a property of the resultant graph. These methods require quite sophisticated representations of the graphs, leading to some seriously difficult coding.

In contrast, the heaviest use of ASA that I know of is in AMBER to approximate the implicit solvent effects in force-field calculation. The ASA is calculated numerically, using the LCPO approach of Weiser, Shenkin & Still (1999), which uses a linear-approximation of two-body and three-body terms. However, the derivation of these formulas is somewhat opaque.

If speed is not important, then the Shrake and Rupley (1973) dot method is the one to use. It's easy to understand and easy to implement. The basic idea behind the dot method is to use a spherical mesh of points to approximate the surface of an atom.

To generate the mesh for an atom in a molecule, you typically start with a set of equidistant radial points on a unit sphere, centered at the origin. There is no solution for a perfectly even distribution of points on a sphere. There are, however, reasonable approximations, which include the Golden Section Spiral distribution (includes python code). Once you have the distribution, then for each dot, you multiply the radius of each dot by the atom radius, and shift to the center of the atom.

test_point.x = point[0]*radius + atom.pos.x
test_point.y = point[1]*radius + atom.pos.y
test_point.z = point[2]*radius + atom.pos.z

For each dot, you test if they are buried in one of the neighboring atoms.

is_buried = False
for neighbor_atom in neighboring_atoms:
    r = neighbor_atom.radius + probe
    diff_sq = pos_distance_sq(neighbor_atom.pos, test_point)
    if diff_sq < r*r:
        is_buried = True
        break

If it is not buried, you count it:

if not is_buried:
    n_accessible_point += 1

Each point carries with it a tiny slice of the surface. The number of remaining dots, multiplied by the surface area per dot, gives you the ASA for the atom.

const = 4.0 * math.pi / len(triangles)
area = const * n_accessible_point * radius*radius 

Then, you simply loop over every other atom in your molecule.

A complete implementation in Python can be found in asa.py.

How accurate is the method? It depends on how many dots you use in the original mesh. My default value is 960 points. Of course, the more points you have, the more accurate the answer, but the slower the calculation. It's not the fastest algorithm – there is a deeply nested loop.

But there are places for optimization. You can pre-partition the atoms into disjoint cubes to speed up the search for neighboring atoms. This will constrain the search for steric clashes to atoms in the same cube, or the cube's immediate neighbors. In some of the loops, you cache the atom index, and restart the the next iteration at the atom index of the last iteration.

Once you have the ASA for the atoms in a molecule, you can identify all the surface in terms of atoms with high ASA (I use ASA > 9 Å).

References

B. Lee & F. M. Richards, "The interpretation of protein structures: estimation of static accessibility." J Mol Biol. (1971) 55:379-400.

A. Shrake & J. A. Rupley. "Environment and Exposure to Solvent of Protein Atoms. Lysozyme and Insulin." J Mol Biol. 79 (1973) 351- 371.

J. Weiser, P. S. Shenkin & W. C. Still. "Approximate Atomic Surfaces from Linear Combinations of Pairwise Overlaps (LCPO)" J. Comp. Chem. 20:217-230.

J. Liang, H. Edelsbrunner, P. Fu, P.V. Sudhakar, S. Subramaniam. "Analytical shape computation of macromolecules: I. molecular area and volume through alpha shape" Proteins. 33:1-17.