Transcription as Compilation

It occurred to me the other day that a useful metaphor for making proteins is not a factory, as current thinking goes, but rather the compiling of a computer program from its source code. Treating transcription, by which I loosely mean the whole process of making a protein from its DNA sequence, as a kind of parsing seems to be a more natural end-point to the idea that DNA is computer code.

Like any science, molecular biology is driven by metaphors. The most popular has been the factory metaphor. DNA serves as the blueprint of the cell. A rough-n-ready copy of the blueprint is made in the form of RNA, which is read by the machinery of the ribosome to make a protein. This forms the infamous central dogma of molecular biology: DNA -> RNA -> Protein.

But recently, cracks have started to appear in this edifice. Now that researchers are beginning to uncover the intricate machinery that regulates the DNA, the language of networks has started to filter down into the literature. There are filaments of interaction, where proteins made from the DNA blueprint themselves bind back onto the DNA, producing convoluted linkages of cause and effect that are impossible to separate back into the clean, stark lines of the central dogma.

What is a compiler? A compiler is a computer program that turns text written in one language (say, C++) into machine code that can be run by a computer. But reading a Joel Spolsky article, I came across the idea that translation from one language into another is also compilation. The example given was the first C++ compiler. Bjarne Stroustrup originally wrote the object-oriented extension to C as a pre-processor. The idea was that you would write the C++ program, as designed by Bjarne, and the pre-compiler would turn that into C; then a C compiler would turn that into machine code, ready to run on your machine's microchip. Joel pointed out that this pre-compilation step is actually compilation. There was just as much work in translating C++ to C as there was in translating C to machine code.

And then I saw the analogy. What if DNA were C++, RNA were C, and proteins were machine code? Yes, the hard and dirty work inside the cell is done by the protein code, representing the deep dark actions of the biochemistry, and of course, DNA is the C++. The analogy stretches even further when you think about the compiler macros often embedded in C++ source files, which depend on important information about the environment that the program is compiled in. This would make the ribosome the compiler.
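To make the two-pass picture concrete, here is a minimal Python sketch of DNA -> RNA -> protein written as two small "compiler passes". The function names (transcribe, translate, compile_gene) are my own inventions for illustration, the codon table is only a small subset of the standard genetic code, and the input is assumed to be the coding strand of a gene. Real cells do a great deal more than this (splicing, regulation, folding), so treat it as a toy, not a model.

```python
# A small subset of the standard genetic code, enough for the example below.
# None marks a stop codon.
CODON_TABLE = {
    "AUG": "M",  # methionine, also the usual start codon
    "UUU": "F",  # phenylalanine
    "GGC": "G",  # glycine
    "GCU": "A",  # alanine
    "AAA": "K",  # lysine
    "UGG": "W",  # tryptophan
    "UAA": None,
    "UAG": None,
    "UGA": None,
}


def transcribe(dna_coding_strand: str) -> str:
    """First pass: DNA 'source' to RNA 'intermediate code' (T becomes U)."""
    return dna_coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Second pass: RNA to protein, reading three letters at a time."""
    start = mrna.find("AUG")  # begin at the first start codon
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon not in CODON_TABLE:
            raise KeyError(f"codon {codon} is not in this toy table")
        residue = CODON_TABLE[codon]
        if residue is None:  # stop codon: end of the 'program'
            break
        protein.append(residue)
    return "".join(protein)


def compile_gene(dna_coding_strand: str) -> str:
    """Chain the two passes, like C++ -> C -> machine code."""
    return translate(transcribe(dna_coding_strand))


print(compile_gene("ATGTTTGGCGCTAAATGGTAA"))  # prints MFGAKW
```

Neither pass knows anything about the other; each just maps one alphabet onto the next, which is the sense in which the C++ to C step counts as compilation too.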

So what does this analogy gain us? Well, for starters, we can steal some ideas from compilers and apply them to transcription. For instance, one common goal among language designers is bootstrapping. A computer language is essentially embodied in its compiler, a binary program that translates a computer program, as expressed in a C++ text file, into machine language. Bootstrapping means that a particular computer language is sophisticated enough to express everything needed to write its compiler in that language. In C, we write the C compiler in C itself. With an existing C compiler, we should be able to compile the source code of the compiler and output a functional binary of the compiler (this need not be identical to the existing compiler). In designing a new version of a C compiler, one must first use an older version of the compiler to generate the binary of the new version. But the moment of truth comes when we use the new version of the compiler to compile itself from its own source code.
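The bootstrap stages can be sketched in a few lines of Python. This is a deliberately tiny toy under loud assumptions: the "language" is just Python, the "compiler" is a thin wrapper around the built-in exec, and every name here (COMPILER_SOURCE, stage0_compile, compile_source, PROGRAM) is hypothetical. It shows only the shape of the process, not how a real C compiler is bootstrapped.

```python
# Source code of the "new compiler", written in the language it compiles.
COMPILER_SOURCE = '''
def compile_source(source):
    """Turn source text into a namespace of runnable definitions."""
    namespace = {}
    exec(source, namespace)
    return namespace
'''


def stage0_compile(source):
    """The pre-existing 'older compiler' that we start from."""
    namespace = {}
    exec(source, namespace)
    return namespace


# Stage 1: the old compiler builds the new compiler from its source.
stage1 = stage0_compile(COMPILER_SOURCE)["compile_source"]

# Stage 2: the moment of truth, the new compiler builds itself.
stage2 = stage1(COMPILER_SOURCE)["compile_source"]

# Both generations should behave the same on an ordinary program.
PROGRAM = "def answer():\n    return 6 * 7\n"
result1 = stage1(PROGRAM)["answer"]()
result2 = stage2(PROGRAM)["answer"]()
print(result1, result2)  # prints 42 42
```

Here stage1 and stage2 are separate objects built from the same source by different generations of compiler, and checking that they agree on a test program is the toy version of the moment of truth.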

So a test of the DNA, RNA, ribosome, and protein system is the ability to bootstrap. We use the ribosome to read the RNA that encodes the ribosome's component proteins, to make the new proteins, which are assembled into a new ribosome. There are even examples of the old versions running around. For instance, mutant experiments often introduce RNA from a later eukaryotic organism into the cell of a prokaryote. And the prokaryote will happily translate (or compile) the visitor RNA, if the RNA makes it all the way through the cellular protective layer.