protein // 19 November 2007 // comments

match.py - calculating the RMSD of PDB structures

Have you ever wanted a simple program to calculate the RMSD between two protein structures? Well, if you’re clever, you might figure out how to do it in PyMOL or VMD, but sometimes you just want to do it from the command line.

Well wait no more, here’s my RMSD program, written in Python 2.4, and since it’s interpreted, you can look at the source-code to see how it’s done. The heart of the RMSD calculation is just 9 lines of Python code. The code is short because it uses the wonderful linear algebra from the widely used numpy module (dmg for macosx10.4 & python2.4). The program is written with Python 2.4 and NumPy 1.0.

  download: match.zip [16 kb]


The command inputs includes a string that defines the segments with which you want to align your pdb files. The format is essentially a python list of 2-tuples.

You can use match.py to calculate the RMSD, generate a structure with the optimal rotation, or calculate just the raw RMSD without rotation. The program match.py can also be imported into Python as a module if you need to chain your RMSD calculations.

Interpreting the segment string parameters

For starters, you can consult the rcsb.org website to get a complete description of a PDB file. But if you open a PDB file with a text editor, you will see lines like this:

ATOM     37  O   TYR A   3      -5.392  -1.551   4.271  1.00  0.20           O 

TYR’ refers to the residue type in a polymer (protein, DNA) or the name of a small molecule.

The ‘A’ after TYR refers to the chain identifier if it is a polymer.

In a small molecule, usually, no chain is given, and a small molecule is refered by the residue number ‘37’

indeed, ‘37’ is the residue number.

So, let’s say you have two structures of the same protein with one chain, called chain ‘A’. They are in test1.pdb and test2.pdb

The chain has residues 1-10

Then the command line would be

python match.py test1.pdb test2.pdb "[('A:1', 'A:10')]"

If on the other hand your pdb files has two chains, and chain “B” has 20 residues numbered from 30 to 50 then you

python match.py test1.pdb test2.pdb "[('A:1', 'A:10'), ('B:30', 'B:50')]"

If for some reason, the test2.pdb uses chain “E” instead of “A”, and “F” instead of “B”, then you can do:

python match.py test1.pdb test2.pdb "[('A:1', 'A:10'), ('B:30', 'B:50')]" "[('E:1', 'E:10'), ('F:30', 'F:50')]"
 
 
 Michal · 11/20

Hi,

You should get “bioshell” package (bioshell.chem.uw.edu.pl). This is java API to make a lot of bioinformatics tasks.

for example, if you want to calc rms, just type java RmsCalc with *pdb files and you get rms.

Bioshell classes could be used as jython modules (it is very simple – not as C headers in python. just import java class and it work!).

 Deepak · 11/27

I should dig into my vault. I am pretty sure there are programs written in FORTRAN and perl to do exactly that, although by habit I always used CHARMM to do it.

For some reason, I feel like saying “back in the day we always did RMSD calculations via the command line”. Man I am getting old and cranky :)

 Bhagyashree · 01/16

Respected sir,
the website is good.
But i am unable to find pseudoocode for the RMSD calculation.
I am doing project on the topic :
Protein structure prediction using Genetic Algorithm(In FORTRAN). I want to incorporate the RMSD calculation code in the package.

Please help me in this issue.

Thanking you,
Bhagyashree Chaudhari.

 Zorayda Ortiz · 04/30

Dude,

Highly respected sir,

You are awesome. Keep up the good work.
I like the pic too.

You clarified RMSD for me on a google search. I don’t have the patience at the moment to put in my PDB file into your program, but I’ll check it out soon. Can you tell me how to do it in Pymol? Do you have any fun suggestions on how to use pymol, or specific proteins that may be interesting to look at with pymol?

Thank a million

-Zorayda Ortiz