CSE 231: Advanced Compilers

Implementation Projects

Take heed:   All these project are awesome, but they're also challenging. Start early, ask lots of questions, and keep us posted on your progress.

Finding Bugs in Language Implementations

Our friends at Utah created CSmith, a tool that uses "differential random testing" to find both crash bugs (annoying) silent optimization errors (terrifying!) in C compilers. Using their technique, they've found hundreds of bugs in mature, widely deployed C compilers. For example, they found multiple bugs in LLVM, GCC, and ICC, and these are the compilers you use every day, the compilers that most *nix software is built with! Your project will be to adapt the ideas from CSmith to another language implementation, e.g. Python or JavaScript, hopefully finding major bugs along the way.

Code Perforation

Want your code to run 2x or 3x faster? Just skip a few loop iterations here and there! Perhaps this sounds crazy (that's why we call it research), but it turns out out that for many programs you can just jump over parts of the code without totally destroying the computation. This technique, called Code Perforation, can improve performance, save power, and make programs more robust in the face of errors. Your project will be to explore existing code perforation techniqes, measure them on various benchmarks, and then cook up your own code perferation analysis to determine where and when computations can be dropped.

From C to Cyclone

The Cyclone programming language provides the same sort of low-level control found in C, but excludes a huge class of the common errors that plague C programs: buffer overflows, format string attacks, double free bugs, dangling pointer accesses, etc. Sounds great, but what's the catch? How hard is it to port existing software to Cyclone? What's the best way to go about this porting process? What performance overhead do we pay after porting? Can we find bugs in our C programs just by porting them to Cyclone? Your project will be to select a few application written in C, port them to Cyclone, and report on your experience.


Ironically, there's rarely anything optimal about the code produced by a compiler's optimizer. However, when it comes to superoptimizer, it's a different story all together. Unlike traditional optimizers, superoptimizers produce code that is guaranteed optimal for certain parts of programs. The Stanford Superoptimizer in particular has been shown to provide up to 10x speedups for compute intensive kernels on x86! However, there are still plenty of challenges left before superoptimization becomes mainstream. One issue is that determining when one program fragment is better than another, i.e. the cost of a program, can be extremely difficult. Another, related issue, is that optimizations that help tremendously on one machine make almost no difference on the next machine. Your project will be to replicate the results from the Stanford SO project and then attempt to improve on them by developing a refined cost model for programs, preferably informed by some form of profiling.

Finding Bugs in Applications

Over the past decade there's been steady progress in automatic test generation. Current techniques can often far outperform human-written test suites, often achieving extremely high coverage in terms of lines of code exercised. In fact, one tool even found 3 serious bugs in GNU Coreutils, the main suite of programs powering many, many of today's *nix machines. This is surprising, as GNU Coreutils is one of the most widely used pieces of open source software, and yet these bugs remained undetected for nearly 15 years. Some of these bug finders and test generators take advantage of domain specific information while others are general purpose. Your project will be to implement a bug finding tool for a class of applications of your choice and attempt to uncover subtle programs errors that would elude testing or inspection by mere humans.