Have a care for the compiler writer

26 Oct 2007 // programming

[edited a few big mistakes pointed out by comments ]

My last post, a rant about the virtues of D over C++, seems to have got read by a lot of programming folks (thx for reading). From the comments there, and the discussion on reddit.com, most of the objections to D were adroitly answered by other programmers, including comments from the author of D.

There is however one objection that came up a few times I would like to elaborate on. Namely, some people didn't see why we should care that the language should make it easier for the writer of the compiler. After all, the compiler is a black box, and as long as it just works (!) then we don't need to worry about how difficult it was for the guys behind the scenes to make the magic happen.

I disagree. We should care greatly about the back-end logic of a language. And we should care for very selfish reasons. Namely, the quality of the compilers will depend on the internal logic of the language.

Compiler writers are people too. They like us, would like to use funky new algorithms to optimize their compilers, test-driving new techniques to make the compilers go faster and faster. The cleaner the design of the language, the easier it would be to implement new techniques. The internal logic of a language will ultimately constrain the ability to improve the compilers.

The benefit for us, the end users, is that we will get better and better compilers, resulting in faster compilation times, smaller executables, and better reporting of compilation errors.

There are computer languages, and then there are computer languages that can bootstrap. One of the holy grails of language design is to design a language rich enough so that one can [practically] write the compiler in the language in which it compiles for. A bootstrapping language can compile a compiler for itself, in itself. As pointed out by the comments, all languages can theoretically bootstrap themselves. But whether one would want to, is another thing (has anyone written a FORTRAN compiler in FORTRAN?). Preferably, it requires a language that has access to a lot of low-level systems programming. That is the strength of C and why a lot of interpreters and compilers for other languages, are written in C.

With bootstrapping, you get into some strangely recursive territory, leading to one of the most bizarre (and brilliant) essays in computing – Ken Thompson's Trusting Trust. In that essay, Thompson shows how you can hide a security back-door by doing strange things to the bootstrapping of C compiler, and not leave a a trace, not even in the source code. This is the type of essay that induces crises of faith in people who worry about computer security for a living.

One of the most interesting experiments in compiler design is the PyPy project. Python, unlike C, is interpreted. A Python interpreter, written in C, interprets a Python program and executes the code. However, the Python community wanted to experiment with different interpreter designs. Nevertheless, rewriting the Python interpreter in C just for the sake of experimentation would be too much work.

They came up with another approach. The goal of PyPy is to write a Python interpreter [and compiler] in Python (I'll call this the Python-Python interpreter). The idea is that you'd run a Python interpreter that would run the Python-Python interpreter, which in turn, would be able run Python programs. But very very slowly. However, once you got the Python-Python interpreter to work you could modify the source code to the Python-Python interpreter (which is written in Python not C), and try out different ways of interpreting Python programs. The basic premise is that the source code to the Python-Python interpreter would be orders of magnitude easier to change than the source code of the Python interpreter written in C. (Guido van Rossum, for instance, does not include any features in Python that can't be parsed with LL parsers).

If it's hard to change things in C, its even harder in C++.
Bjarne Strousoup originally wrote the C++ compiler as a pre-processor. Originally you would compile your C++ program (with the preprocessor CFront) into C, and then compile C into machine code. A two step process. This was necessary in order to piggy-back the immense infrastructure that had already been carved out by the C ecosystem. C++ inherited a grammar that had to work with CFront and C. Combine this with the development of templates, which also exploits another level of preprocessing, and you get an idiosyncratic grammar that makes C++ one of the hardest languages to write compilers for.

One of Walter Bright's goals in designing D is to strip away the convoluted layers that had accreted in the grammar of C++, whilst keeping the features that make C++ powerful (pointers, templates etc). Hopefully this would open the way for better experimentation with compiling D programs. And now, even though D is much younger than C++, with only a handful of D compilers, D compilers already appear to be approaching something reasonably robust. The signs for an optimized future of D looks bright indeed.