How to avoid futility: a guide to parameterizing non-standard amino acids in AMBER

As I've come to the biology part of computational biology rather late, it came as a surprise to find that, for a side project, I had to phosphorylize a protein. I wanted to use AMBER to simulate a protein with a phosphorylated aspartic acid.

You might think that phosphorylation is so common, there ought be standard techniques. There is. I tracked down a helpful page of phosphorylatio techniques. There were parameters for all types of phosphorylated amino acids, all that is, except for aspartic acid.

I would have to generate the parameters myself.

I didn't need perfect parameters, I just needed some rough numbers that AMBER would accept. But AMBER is finicky. To make AMBER accept a non-standard amino acid is like making a child swallow cough medicine. Doable but unpleasant.

Here's what you need to do. First you need to construct a .prepin file for your non-standard amino acid. The PHosphorylated aspartic aciD is PHD. Constructing the initial .prepin was easy. A .prepin file is essentially a Z-matrix form of the residue, with a simple tree structure. I took the aspartic acid .prepin file from the standard residue topology files, and then grafted the Z-MATRIX of the phosphate group from a .prepin file of a phosphorylated serine (taken from the phosphorylation page). This requires some renumbering of atoms and making sure atom numbering match, and changing the atom type of the OD1 in aspartic acid to an ester O.

Then you need to get the charges right. How do you calculate the charges? The best method to calculate the charges is download OPENBABEL, and use OPENBABEL to convert the .prepin file to a .mol2 file. This is because AMBER's own quick-and-dirty charge calculator ANTECHAMBER can't deal with it's own .prepin files too well. Instead I use the .mol2 file generated from openbable, and get ANTECHAMBER to read the .mol2 file to generate a new .mol2 file with a net charge of -2 (other phosphate groups use -2 too and AMBER needs integers). ANTECHAMBER generates the charges from the GAFF database of generic atom types.

Then I take these new charges, and manually put them into the original .prepin file. In the modified .prepin file that I had, here's the part that killed me:

11  CG  C  B    8   6   4   1.527 109.470  180.000  0.72190
12  OD1 OS S   11   8   6   1.260 117.200   90.000 -0.47240
13  OD2 O2 E   11   8   6   1.260 117.200  270.000 -0.76120
14  P   P  3   12  11   8   1.578 120.644 -177.106  1.51540
15  O2P O2 E   14  12  11   1.460 106.022 -179.837 -0.92080
16  O3P O2 E   14  12  11   1.461 106.418  -59.885 -0.91350
17  O1P O2 E   14  12  11   1.461 106.348   60.270 -0.94310

Just to help you out, the phosphate group (atoms 14-17) is grafted from another .prepin file. The 4th column gives the sidechain branching: mainchain (M); have one other bond as a sidechain (S); nexus of a double-branch (B), has three other branches (3); and dead-end (E). The 5th, 6th and 7th columns refer to the three other atoms that form the dihedral angle.

This is where AMBER chocked. Now OD1 is connected to the phosphate group. OD2 is not. The problem was that AMBER couldn't handle the fact that OD1 is interrupted by OD2 (a dead-end in the sidechain) before seeing the phosphate group.

Fuckers.

It took me 2 freaking days to work this out – even though the sidechain-branching and the dihedral angle connectivity gives you all the information you need to reconstruct the sidechain, you need to flip the order of OD1 and OD2:

11  CG   C   B   8   6   4  1.527  109.470  180.000  0.72190
12  OD2  O2  E  11   8   6  1.260  117.200  270.000 -0.76120
13  OD1  OS  S  11   8   6  1.260  117.200   90.000 -0.47240
14  P    P   3  13  11   8  1.578  120.644 -177.106  1.51540
15  O2P  O2  E  14  13  11  1.460  106.022 -179.837 -0.92080
16  O3P  O2  E  14  13  11  1.461  106.418  -59.885 -0.91350
17  O1P  O2  E  14  13  11  1.461  106.348   60.270 -0.94310

And now, I have a functional PHD .prepin file that can be read by AMBER. Since this is a non-amino acid, you also need PARMCHK to generate the .frcmod files that carries the bond and angle parameters from the GAFF database. We can now run TLEAP and create a fully functional topology file for a protein with the non-standard PHD phosphyrlated aspartic acid.