What could be more elementary for a structural biologist than getting an arbitrary PDB file from the interwebs?

In the old days, this was a rather difficult task that involved ordering a CD from far, far away. Even recently, I had to connect to the FTP site, download a compressed .z file, and decompress it. I did this with Python. Unfortunately, I had to drop into the os to do the decompression.

But now I’ve found a better way, listed here as a two-line Python function that uses only the standard library.

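import urllib.request  # Python 3; in Python 2 urlopen lived in plain urllib
def fetch_pdb(pdb_id):
  url = 'http://www.rcsb.org/pdb/files/%s.pdb' % pdb_id
  return urllib.request.urlopen(url).read()  # raw contents of the PDB file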

Thanks to the new rcsb.org site, one can look at a PDB file directly in the browser. We can thus pull it straight off the website, avoiding the FTP server and skipping the decompression step.
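
For anything beyond a quick one-off, a slightly more defensive version may be worth having. The following is only a minimal sketch under stated assumptions: the name fetch_pdb_checked, the opener and timeout parameters, and the choice to return None on a 404 are all made up for illustration. It uses Python 3's urllib.request and the same URL pattern as the function above.

```python
import re
import urllib.error
import urllib.request

PDB_URL = 'http://www.rcsb.org/pdb/files/%s.pdb'  # URL pattern from the post

def fetch_pdb_checked(pdb_id, opener=urllib.request.urlopen, timeout=30):
    """Return the PDB entry as text, or None if the server reports 404."""
    # Classic PDB IDs are four characters: a digit, then three alphanumerics.
    if not re.fullmatch(r'[0-9][A-Za-z0-9]{3}', pdb_id):
        raise ValueError('not a PDB ID: %r' % pdb_id)
    try:
        # `opener` is a hypothetical hook so the function can be tried offline.
        with opener(PDB_URL % pdb_id, timeout=timeout) as response:
            return response.read().decode('ascii', errors='replace')
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None  # assumption: treat a missing entry as "no result"
        raise
```

Checking the ID before building the URL keeps stray input out of the request, and the timeout stops a script from hanging if the server is slow to answer.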

Pedro Beltrao on 08/10 said:

Do they have any policies on robots? Sounds like a useful way to get them, but this might not be very nice to the server.

Bosco on 08/10 said:

Hmmm. Don’t know.

Pedro, you know there is only one way to find out (in the spirit of experimentation of course).
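
A gentler way to find out is to read the site's robots.txt before scripting against it. The sketch below uses the standard library's urllib.robotparser. The helper name allowed_by_robots and the example rules are invented for illustration and say nothing about what rcsb.org actually publishes.

```python
import urllib.robotparser

def allowed_by_robots(robots_txt, url, agent='*'):
    """Check a URL against the text of a robots.txt file."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

# Made-up rules; the real ones would be read from the site's /robots.txt.
EXAMPLE_RULES = """\
User-agent: *
Disallow: /private/
"""

print(allowed_by_robots(EXAMPLE_RULES, 'http://example.org/pdb/files/1abc.pdb'))
print(allowed_by_robots(EXAMPLE_RULES, 'http://example.org/private/notes.txt'))
```

Even when the rules allow a download, pausing between requests (for instance with time.sleep) is a reasonable courtesy when fetching many entries in a loop.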

baoilleach on 08/11 said:

“Unfortunately, I had to drop into the os to do the decompression”.

What about gzip.open("1abc.gz", "r").read()?

If the text is in a string, you can use zlib.decompress(mystring) (although I’ve never done this myself).
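
Both suggestions can be tried in memory. This is a small sketch for Python 3, where the data is bytes rather than a string. The sample text is a stand-in, not a real PDB entry. One detail to watch: zlib.decompress with its default settings expects a zlib header, so gzip data needs the wbits argument shown here.

```python
import gzip
import zlib

text = b'HEADER    EXAMPLE ENTRY\n'   # stand-in for real PDB content
packed = gzip.compress(text)          # what the bytes of a .gz file look like

# Option 1: the gzip module decompresses in-memory data directly.
via_gzip = gzip.decompress(packed)

# Option 2: zlib reads the gzip header only when wbits asks for it.
via_zlib = zlib.decompress(packed, 16 + zlib.MAX_WBITS)

print(via_gzip == text, via_zlib == text)
```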

Bosco on 08/11 said:

There is a difference between .z and .gz files. The difference is legal, I think, and thus .gz decompression is in the Python core library, whilst .z decompression is not. Anyway, I’m not sure about this, but I remember getting mildly frustrated about this issue and ending up just using uncompress on the Linux command line.
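
The two formats can at least be told apart from their first two bytes: gzip data starts with 1f 8b, while the output of the Unix compress tool starts with 1f 9d. The sketch below dispatches on that. The function names are hypothetical, and the compress branch assumes an external uncompress program is on the PATH, which is essentially the command-line workaround described above wrapped in subprocess.

```python
import gzip
import subprocess

def sniff_compression(data):
    """Guess the compression format from the first two bytes."""
    if data[:2] == b'\x1f\x8b':
        return 'gzip'        # .gz
    if data[:2] == b'\x1f\x9d':
        return 'compress'    # Unix compress (LZW); the gzip module cannot read it
    return None

def decompress_any(data):
    """Return decompressed bytes, or the input unchanged if it is not compressed."""
    kind = sniff_compression(data)
    if kind == 'gzip':
        return gzip.decompress(data)
    if kind == 'compress':
        # Assumption: `uncompress` is installed; it reads stdin and writes stdout.
        done = subprocess.run(['uncompress', '-c'], input=data,
                              stdout=subprocess.PIPE, check=True)
        return done.stdout
    return data
```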