|
[edited a few big mistakes pointed out by comments ] My last post, a rant about the virtues of D over C++, seems to have got read by a lot of programming folks (thx for reading). From the comments there, and the discussion on reddit.com, most of the objections to D were adroitly answered by other programmers, including comments from the author of D. There is however one objection that came up a few times I would like to elaborate on. Namely, some people didn’t see why we should care that the language should make it easier for the writer of the compiler. After all, the compiler is a black box, and as long as it just works (!) then we don’t need to worry about how difficult it was for the guys behind the scenes to make the magic happen. I disagree. We should care greatly about the back-end logic of a language. And we should care for very selfish reasons. Namely, the quality of the compilers will depend on the internal logic of the language. Compiler writers are people too. They like us, would like to use funky new algorithms to optimize their compilers, test-driving new techniques to make the compilers go faster and faster. The cleaner the design of the language, the easier it would be to implement new techniques. The internal logic of a language will ultimately constrain the ability to improve the compilers. The benefit for us, the end users, is that we will get better and better compilers, resulting in faster compilation times, smaller executables, and better reporting of compilation errors. There are computer languages, and then there are computer languages that can bootstrap. One of the holy grails of language design is to design a language rich enough so that one can [practically] write the compiler in the language in which it compiles for. A bootstrapping language can compile a compiler for itself, in itself. As pointed out by the comments, all languages can theoretically bootstrap themselves. But whether one would want to, is another thing (has anyone written a FORTRAN compiler in FORTRAN?). Preferably, it requires a language that has access to a lot of low-level systems programming. That is the strength of C and why a lot of interpreters and compilers for other languages, are written in C. With bootstrapping, you get into some strangely recursive territory, leading to one of the most bizarre (and brilliant) essays in computing – Ken Thompson’s Trusting Trust. In that essay, Thompson shows how you can hide a security back-door by doing strange things to the bootstrapping of C compiler, and not leave a a trace, not even in the source code. This is the type of essay that induces crises of faith in people who worry about computer security for a living. One of the most interesting experiments in compiler design is the PyPy project. Python, unlike C, is interpreted. A Python interpreter, written in C, interprets a Python program and executes the code. However, the Python community wanted to experiment with different interpreter designs. Nevertheless, rewriting the Python interpreter in C just for the sake of experimentation would be too much work. They came up with another approach. The goal of PyPy is to write a Python interpreter [and compiler] in Python (I’ll call this the Python-Python interpreter). The idea is that you’d run a Python interpreter that would run the Python-Python interpreter, which in turn, would be able run Python programs. But very very slowly. However, once you got the Python-Python interpreter to work you could modify the source code to the Python-Python interpreter (which is written in Python not C), and try out different ways of interpreting Python programs. The basic premise is that the source code to the Python-Python interpreter would be orders of magnitude easier to change than the source code of the Python interpreter written in C. (Guido van Rossum, for instance, does not include any features in Python that can’t be parsed with LL parsers). If it’s hard to change things in C, its even harder in C++. One of Walter Bright’s goals in designing D is to strip away the convoluted layers that had accreted in the grammar of C++, whilst keeping the features that make C++ powerful (pointers, templates etc). Hopefully this would open the way for better experimentation with compiling D programs. And now, even though D is much younger than C++, with only a handful of D compilers, D compilers already appear to be approaching something reasonably robust. The signs for an optimized future of D looks bright indeed.
Something interesting about D. Having just installed Leopard I googled for the D Programming Language with Leopard and came up w/ an apple developer article [1] that lead me to dig further into the new DTrace utility [2] from Sun. Apparently D is either written in, or extensible by the D Programming Language. So there’s a relatively big win for a new(-ish) language, if anyone was wondering about its utility/maturity. [1] http://developer.apple.com/leopard/overview/ [2] http://www.sun.com/bigadmin/content/dtrace/
If you haven’t looked at it already, Paul Graham’s article The Roots of Lisp is worth a read in relation to bootstrapping. In a nutshell, the Lisp evaluation operation – the one that reduces any Lisp expression – can be implemented in less than a page of Lisp using only seven primitive symbols. A lot of what would be needed in a practical Lisp is elided (such as a library for arithmetic operations, IO, etc) but the core power of Lisp as a model of computation is all there.
You missed one bit in the pypy design – interpreter itself is written in restricted subset of python, which allows it to be translated (not interpreted) to a lower-level language. Hence there is no induced speed decrease. So purpose of the pypy project is not only to experiment, but also to have working interpreter.
The D language used in DTrace has nothing to do with the general purpose D programming language the previous post was about.
Another good thing about a language being easy to parse is that more third party stuff will be developed for it. Things like static analyzers, style checkers, editor support (indentation, code completion, etc.). Those are not part of the compiler and are often developed by people who have never seen the compiler’s source code.
“The internal logic of a language will ultimately constrain the ability to improve the compilers.” I disagree, the LLVM project looks like it frees compiler writers from the constraints of a language.
“LLVM project looks like it frees compiler writers from the constraints of a language.”
Nicely written.
Hi Bosco, I don’t want to diminish your enthusiasm for programming languages, but I’m afraid there are several glaring factual inaccuracies in your post. For example, every language can potentially bootstrap (not just C), Cfront wasn’t written in the C preprocessor (it was, in fact, a separate compiler), C++ templates do not interact with the preprocessor, etc…
Good job on fixing up your article so quickly. I still disagree with what you wrote about bootstrapping languages. GHC is written in Haskell, most Lisp compilers are written in Lisp, MLton is written in SML, and I’m sure there’s even a self-hosting Fortran compiler somewhere out there. C is actually a cumbersome language to use for most phases of compiler translation. I think OCaml, SML, and Haskell are all are much better suited for writing static analysis algorithms and optimizations. The only thing I can imagine you would want C’s “low level” features for is writing a garbage collector.
It’s “Stroustrup”, not “Strousoup”. Funny typo, though! ;-)
You should really take a look at Scala sometime. Now that is one easy and powerful language. http://www.scala-lang.org/ D and Scala are currently my favorites. But when will these languages get some tools that one can work with?! |
||