Vector Representation in Structural Biology

One of the questions that vexes computational structural biologists when writing analysis software is whether to represent a spatial vector as an object with x, y, z components or as a numeric array of three floats.

After turning over this problem in my head for the last decade, and writing vector classes in several languages, I can tell you now: go with the array of floats.

It might seem that you can write cleaner code with named components: x, y, z. After all, you're probably going to be translating algorithms from math or physics books, where x, y, z are used.

However, the x, y, z designations are arbitrary. All of the laws of physics work just as well when you permute the x, y, z indices. Some algorithms do construct special matrices and vectors from specific formulae of the x, y, z components, such as populating rotation matrices from axis vectors. But other algorithms can be written much more cleanly by looping a spatial dimension index i over [0, 1, 2].
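To make the point concrete, here is a minimal Python sketch of two such index-looping algorithms on plain lists of floats. The function names dot and cross are my own choices for illustration, not from any particular library.

```python
def dot(a, b):
    """Dot product: sum over the spatial index i."""
    return sum(a[i] * b[i] for i in range(3))


def cross(a, b):
    """Cross product written as a cyclic permutation of the indices.

    For each i, (i, j, k) runs through (0, 1, 2), (1, 2, 0), (2, 0, 1),
    so one line covers all three components.
    """
    c = [0.0, 0.0, 0.0]
    for i in range(3):
        j, k = (i + 1) % 3, (i + 2) % 3
        c[i] = a[j] * b[k] - a[k] * b[j]
    return c


x_axis = [1.0, 0.0, 0.0]
y_axis = [0.0, 1.0, 0.0]
z_axis = cross(x_axis, y_axis)  # [0.0, 0.0, 1.0]
```

With named x, y, z components, the cross product has to be spelled out as three separate formulae, one per component.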

Unfortunately, once you've gone for x, y, z components, you'll have to tediously expand out such looping algorithms, or write an adaptor function that fetches a component from an index parameter. It's less painful going the other way: with an array, x, y, z are just the indices 0, 1, 2.
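A rough sketch of both directions, in Python. The class XYZVector, the adaptor component, and the constants X, Y, Z are hypothetical names made up for this example.

```python
class XYZVector:
    """A vector with named components (the design I'm arguing against)."""

    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z


def component(v, i):
    """Adaptor: fetch the i-th component of an XYZVector by index."""
    return (v.x, v.y, v.z)[i]


# Going the other way: named access on a plain array costs three constants.
X, Y, Z = 0, 1, 2

v = [1.0, 2.0, 3.0]
y_of_v = v[Y]  # reads almost as well as v.y
```

The adaptor has to be called everywhere an index loop touches the vector, whereas the constants are defined once and then stay out of the way.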

More importantly, if you ever extend your vector class to handle more complicated operations (such as principal component analysis), you will probably want to leverage one of the many excellent linear algebra libraries out there. This is where the choice of numeric arrays really pays off, because arrays of floats can often be passed straight into such libraries.
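As an illustration, here is a small sketch that assumes NumPy as the linear algebra library (any similar library would do). The coordinates are made-up numbers, not real structural data; the point is only that a list of 3-float vectors goes in without any conversion layer.

```python
import numpy as np

# Hypothetical coordinates: four points lying roughly along the x axis.
coords = [
    [0.0, 0.0, 0.0],
    [1.0, 0.1, 0.0],
    [2.0, -0.1, 0.0],
    [3.0, 0.0, 0.1],
]

points = np.asarray(coords)              # shape (4, 3), no adaptor needed
centered = points - points.mean(axis=0)  # move the centroid to the origin

# 3x3 covariance matrix of the coordinates (columns are the variables).
cov = np.cov(centered, rowvar=False)

# eigh handles symmetric matrices and returns eigenvalues in ascending
# order, so the last column is the axis of greatest variance.
eigenvalues, eigenvectors = np.linalg.eigh(cov)
principal_axis = eigenvectors[:, -1]
```

For these points the principal axis comes out close to the x axis, up to an overall sign, which eigenvector routines do not fix.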

I look at my code where I made the unfortunate decision to use x, y, z components, and I see many places where I've had to convert to arrays of floats to interface with different algorithms.

So don't make the same mistake I did. Future-proof your code: go with a numeric array of floats for vectors.