To reduce these costs, this thesis proposes efficient, demand-driven flow analysis techniques as an alternate to traditional, compiler-based methods. These techniques operate directly on the abstract syntax tree (AST), the data structure most appropriate for use in a source-to-source tool architecture. By eliminating the need for other program representations such as the standard control flow graph (CFG) or program dependence graph (PDG), this approach greatly simplifies program modification.
A key contribution of this work is the idea of virtual control flow, a method for computing the control successors or predecessors of individual AST expressions on demand. This method handles all types of structured and unstructured jumps found in an imperative programming language such as C. Virtual control flow couples well with demand-driven data flow analysis to minimize the cost of determining semantic information. To conservatively estimate data flow relationships, the effects of aliasing between memory locations can be inexpensively approximated using flow-insensitive points-to analysis based on type inference.
These techniques were implemented in a prototype tool called Cstructure to support a simple restructuring transformation for reordering program statements. To check that this transformation does not change the program's behavior requires syntax, control flow and data dependence analysis. Experimental results on three programs ranging in size from 72,000 to 213,000 lines of code demonstrate the performance advantages of such aggressive demand-driven approaches. For the largest program, gcc, check times for the average statement were 50 milliseconds on a desktop workstation.