Funded by NSF CAREER Award CCF-0644306


Developing efficient, scalable, correct, and precise program analyzers and optimizers is difficult. It takes a long time, often up to a decade, before a new optimizing compiler is mature enough to be widely used. These difficulties hinder the development of new languages and new architectures, and they also discourage end-user programmers from extending compilers with domain-specific checkers or optimizers.

The Collider project investigates techniques for automatically generating efficient, scalable, correct, and precise dataflow analyzers and optimizers from very high-level specifications. Imagine a world where you give examples of programs and optimized versions of these programs, and a set of tools works together to generate from these examples an efficient, scalable, and precise optimizer. Or a world where you give the semantics of the intermediate language, along with a cost model, and a collection of tools generates the entire optimizer for you. Or a world where you declaratively specify the property that you want to check about your programs, and a collection of tools generates an efficient, correct, and precise static checker.

Even though the above goals may not be entirely achievable, the broad idea of automating as much as possible of the analyzer- and optimizer-writing process serves as a guiding direction. The long-term goal of the project is to develop Collider, a set of tools that process specifications of program analyses and optimizations to generate efficient, scalable, correct, and precise analyzers and optimizers. By automating the process of writing analyzers and optimizers, Collider will enable new usage models for compilers, including: allowing end-user programmers to easily extend the compiler with domain-specific checkers or optimizers; allowing end-user programmers to continue training the compiler after it is deployed, based on additional input-output examples; and automatically generating additional analyses when the optimizer discovers that it needs new dataflow information, then linking these new analyses into the optimizer while it is running.