Sun, 15 Nov 2009
Hacking DDC.
Over the last couple of months I've been doing a bit of hacking on an experimental compiler called DDC. This has been some of the most interesting, gratifying and challenging hacking I have done in years. Having this much fun should probably be illegal!!
I was introduced to DDC at the April 2008 meeting of FP-Syd when Ben Lippmeier, its author, gave a presentation titled "The Disciplined Disciple Compiler". The two main reasons this compiler is interesting are:
- It's written in Haskell, an advanced purely functional programming language.
- The language it compiles (Disciple) has some interesting solutions to the problems of side effects and mutability.
The Disciple language is very Haskell-like but has some extra features in its type system which allow the compiler to track mutability and side effects. The important differences between Disciple and Haskell are listed on the DDC web page as:
- Strict Evaluation Order is the default, laziness is introduced explicitly.
- Type directed Field Projections complement type classing.
- All data objects support Destructive Update.
- The Effect System tracks what computational effects are being used in a program, without the need for state monads.
- The Class System ensures that effects and destructive update play nicely with laziness.
- Closure Typing is used to track data sharing, and to preserve soundness in the presence of Polymorphic Update.
Obviously, a compiler doing all this really clever stuff has to be pretty complicated, but it still weighs in at only about 50k lines of code.
The main challenge in working on this is that I am not a very experienced Haskell programmer. There are also large chunks of the compiler doing some very complicated stuff that I don't even have a hope of understanding without first reading and understanding Ben's PhD thesis.
Despite that, Ben was willing to give me commit access to the Darcs repo, and I have been able to significantly reduce the number of bugs in the DDC bug tracker. Since I was already pretty familiar with lexing and parsing, as well as with Parsec (probably the most widely used parsing library in the Haskell community), I started off by fixing some simple lexer and parser bugs like:
- #91 : Require module imports (and exports) to be at the start of the module.
- #95 : Parse error with lists.
- #96 : ellipsis in list generator expressions not comprehensive enough.
- #97 : Error in parsing end of {- -} comments.
- #103 : Not able to parse 'a, b, c :: Type' style type signatures.
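Bug #103 gives a feel for this kind of parser fix. The following is a minimal sketch using Parsec 3, with a hypothetical `TypeSig` node and hypothetical `identifier`/`typeName` token parsers (this is not DDC's actual AST or parser), showing how `sepBy1` lets several comma-separated names share a single `::` signature:

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical AST node: several names sharing one type.
-- Illustration only; not DDC's real representation.
data TypeSig = TypeSig [String] String
  deriving (Show, Eq)

-- Run a token parser, then skip any trailing whitespace.
lexeme :: Parser a -> Parser a
lexeme p = p <* spaces

symbol :: String -> Parser String
symbol s = lexeme (string s)

-- Variable names start with a lower-case letter.
identifier :: Parser String
identifier = lexeme ((:) <$> lower <*> many (alphaNum <|> char '_'))

-- Type names start with an upper-case letter (e.g. Int, Float64).
typeName :: Parser String
typeName = lexeme ((:) <$> upper <*> many alphaNum)

-- 'a, b, c :: Type': one or more comma-separated names, then '::', then a type.
typeSig :: Parser TypeSig
typeSig = TypeSig <$> (identifier `sepBy1` symbol ",") <* symbol "::" <*> typeName

parseSig :: String -> Either ParseError TypeSig
parseSig = parse (spaces *> typeSig <* eof) "<input>"

main :: IO ()
main = do
  print (parseSig "a, b, c :: Int")
  print (parseSig "x :: Float64")
```

Because `sepBy1` stops as soon as it fails to see a comma without consuming input, no backtracking is needed to get from the name list to the `::`.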
I then managed to hack in support for Int64 and Float64 (#106), followed by some significant refactoring of the Parsec parser. The refactoring reduced the use of the Parsec.try combinator, allowing Parsec to produce much better error messages.
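Why does fewer `try`s mean better errors? In Parsec, `<|>` only tries its second alternative if the first failed without consuming input, and `try` achieves that by discarding the first alternative's failure. A common way to remove a `try` is to left-factor the shared prefix. The sketch below assumes a hypothetical toy `Decl` type (a signature `x :: T` or a binding `x = 42`) and is not taken from DDC:

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical declaration type, for illustration only.
data Decl = SigDecl String String   -- name :: Type
          | BindDecl String Integer -- name = literal
  deriving (Show, Eq)

lexeme :: Parser a -> Parser a
lexeme p = p <* spaces

symbol :: String -> Parser String
symbol = lexeme . string

identifier :: Parser String
identifier = lexeme ((:) <$> lower <*> many alphaNum)

typeName :: Parser String
typeName = lexeme ((:) <$> upper <*> many alphaNum)

literal :: Parser Integer
literal = lexeme (read <$> many1 digit)

-- Backtracking version: both alternatives begin with an identifier, so the
-- first must be wrapped in 'try'. If it fails deep inside, that error is
-- thrown away and the second alternative's error is reported instead.
declTry :: Parser Decl
declTry = try (SigDecl <$> identifier <* symbol "::" <*> typeName)
      <|> (BindDecl <$> identifier <* symbol "=" <*> literal)

-- Left-factored version: parse the shared identifier once, then commit to
-- whichever suffix follows. No 'try' is needed.
declFactored :: Parser Decl
declFactored = do
  name <- identifier
  (SigDecl name <$> (symbol "::" *> typeName))
    <|> (BindDecl name <$> (symbol "=" *> literal))

main :: IO ()
main = do
  -- Both versions accept well-formed input.
  print (parse (declFactored <* eof) "" "x = 42")
  -- Compare where each version reports the error in a bad type name.
  print (parse (declTry <* eof) "" "x :: int")
  print (parse (declFactored <* eof) "" "x :: int")
```

On `x :: int`, the `try` version reports column 3 and complains about a missing `=`, which is misleading; the factored version reports the real problem (a lower-case type name) at column 6.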
Once I'd done all that, I ran into a very busy time at work and didn't mess with DDC for a couple of months. When I finally got back to looking at DDC, I realised that nearly all of the remaining bugs were much deeper than the bugs I had tackled so far. Tackling these deeper bugs required a new strategy as follows:
- Scan the bug list for reports that either already had test cases or gave enough information for me to proceed.
- Create a new Darcs branch for each bug. This allowed me to work on several bugs at once, so that if I got stuck on one of them, I could just leave it and move on to another.
- Create a reproducible test case if one didn't exist already.
- Create a shell script in the root directory of each branch that ran make and then the test case for that specific bug.
- Use Haskell's Debug.Trace module in conjunction with the very wonderful Show type class to add debug statements to the code.
- Use the Wolf Fencing debugging technique to narrow the problem down to a specific area of the code.
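The last two steps work well together. Below is a minimal sketch on a hypothetical toy pipeline (`lexer`, `simplify` and `evalToks` are invented for illustration and are not DDC passes): a small `fence` helper uses `trace` plus a `Show` instance to dump the value crossing each stage boundary. Wolf fencing then means checking which fence first shows bad data and moving the fences inside that stage until the bug is cornered.

```haskell
import Debug.Trace (trace)

-- Hypothetical token type for a toy "compiler".
data Token = TInt Int | TPlus
  deriving (Show, Eq)

lexer :: String -> [Token]
lexer = map toTok . words
  where
    toTok "+" = TPlus
    toTok w   = TInt (read w)

-- Deliberately buggy pass: it silently drops the last token.
simplify :: [Token] -> [Token]
simplify ts = take (length ts - 1) ts

evalToks :: [Token] -> Int
evalToks ts = sum [n | TInt n <- ts]

-- A "fence": print the value (via its Show instance) to stderr as it passes
-- a stage boundary, then return it unchanged.
fence :: Show a => String -> a -> a
fence label x = trace (label ++ ": " ++ show x) x

-- If the data after "lexer" looks right but the data after "simplify" does
-- not, the wolf is inside 'simplify'; move the fences in there and repeat.
compile :: String -> Int
compile = evalToks . fence "after simplify" . simplify . fence "after lexer" . lexer

main :: IO ()
main = print (compile "1 + 2 + 3")  -- prints 3 rather than 6
```

Because `trace` is lazy, fences only fire when the value is actually demanded, so the order of trace output can differ from the order of the stages.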
Once the problem had been narrowed down to a piece of code, all that remained was to develop a fix. In many cases this resulted in me asking Ben how he'd like it fixed, either by email or on IRC. I also often came up with an ugly fix at first, which was refined and cleaned up before being applied and pushed upstream.
With the above methodology I was able to fix a number of deeper and more complex bugs like the following:
- #33 : Check for conflicting projection functions.
- #39 : Emit an error if modules are recursive.
- #42 : Support unboxed CAFs.
- #45 : Better error message for runtime pattern match failure.
- #53 : Check for name shadowing in forall quantifiers.
- #58 : Panic in type inferencer.
- #71 : Better error message for unimplemented class functions.
- #77 : crushProjClassT panics when there are type errors.
- #78 : Renamer problems in data type defs.
- #144 : Need better error message when source file does not exist.
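A check like #39 (rejecting recursive modules) boils down to cycle detection on the module import graph. Here is a minimal sketch, assuming the imports have already been collected into a hypothetical list of (module, imports) pairs; it uses strongly connected components from `Data.Graph` in the containers package and is not necessarily how DDC implements the check:

```haskell
import Data.Graph (SCC (..), stronglyConnComp)

type ModName = String

-- Return every group of modules that import each other, directly or
-- indirectly. A module that imports itself also forms a (one-element) group.
recursiveGroups :: [(ModName, [ModName])] -> [[ModName]]
recursiveGroups deps =
  [ ms | CyclicSCC ms <- stronglyConnComp [ (m, m, is) | (m, is) <- deps ] ]

main :: IO ()
main = do
  -- Hypothetical dependency table with a Parser <-> Lexer cycle.
  let deps = [ ("Main",   ["Parser", "Util"])
             , ("Parser", ["Lexer"])
             , ("Lexer",  ["Parser"])
             , ("Util",   []) ]
  case recursiveGroups deps of
    []     -> putStrLn "no recursive modules"
    cycles -> mapM_ (\ms -> putStrLn ("error: recursive modules: " ++ unwords ms)) cycles
```

Reporting whole strongly connected components, rather than just the first back edge found, lets the error message name every module involved in the cycle.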
I'm now getting a pretty good idea of how the compiler is put together and I'm stretching my hacking into feature enhancements.
My enthusiasm for DDC was recently validated by functional programming guru Oleg Kiselyov's comment on the haskell-cafe mailing list:
"One may view ML and Haskell as occupying two ends of the extreme. ML assumes any computation to be effectful and every function to have side effects. Therefore, an ML compiler cannot do optimizations like reordering (even apply commutative laws where exists), unless it can examine the source code and prove that computations to reorder are effect-free. .....
Haskell, on the other hand, assumes every expression pure. Lots algebraic properties become available and can be exploited, by compilers and people. ....
Hopefully a system like DDC will find the middle ground."
Anyway, back to hacking ....
Posted at: 21:28 | Category: CodeHacking/DDC