Compiler and Hardware Predicated Dependency Analysis and Scheduling

Lorinda Carter

UC San Diego Technical Report CS2002-700, February 2002


The Explicitly Parallel Instruction Computing (EPIC) architecture has been put forth as a viable approach for achieving the instruction-level parallelism (ILP) needed to continue increasing processor performance. Intel's Itanium processor is an example of an EPIC architecture.

One of the new features of the EPIC architecture is its support for predicated execution. Predicated execution is a technique that replaces a branch with a compare that defines two predicate registers (one true and one false), based on the condition of the replaced branch. Subsequent statements are then guarded by one of the predicates, depending upon whether they would have been on the taken or fall-through path of the branch. All statements begin execution, but an operation is committed only if the value of its guarding predicate is true.
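As a concrete illustration (a hypothetical example, not taken from the dissertation), the sketch below models how an if-conversion of `if (a < b) x = a; else x = b;` would behave: one compare sets two complementary predicates, and each formerly-branched assignment commits only under its guard.

```python
# Hypothetical sketch of if-conversion.  The branch
#
#     if (a < b) x = a; else x = b;
#
# becomes straight-line code in which a single compare defines two
# complementary predicate registers, and each assignment is guarded
# by one of them.

def predicated_min(a, b):
    # cmp a < b sets p_true and its complement p_false
    p_true = a < b
    p_false = not p_true

    x = None
    # Both guarded operations are issued; each commits its result
    # only if its guarding predicate is true.
    if p_true:    # (p_true)  x = a
        x = a
    if p_false:   # (p_false) x = b
        x = b
    return x
```

Note that both assignments appear on the single combined path; the predicates, not control flow, decide which one takes effect.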

An advantage of predicated execution is that it can eliminate hard-to-predict branches by combining both paths of a branch into a single path. However, data dependence analysis (for the purpose of maintaining definition-use information) is significantly more complex for the resulting code. When the two paths of a branch are combined, definitions of the same logical registers (originally from different paths) are intermingled. This makes it difficult to determine which definition a use actually depends on. This dissertation presents both hardware (Disjoint Path Analysis) and compiler (Predicated Static Single Assignment) solutions for improving the data dependence analysis of predicated regions of code by collecting information on predicate relationships.
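The key observation behind both approaches can be illustrated with a small sketch (the function and encoding below are assumptions for illustration, not the dissertation's actual algorithms): if two predicates are known to be disjoint, i.e. they can never both be true, then a definition guarded by one cannot reach a use guarded by the other, and no dependence edge is needed between them.

```python
# Hypothetical sketch: exploiting predicate relationships in
# dependence analysis.  Predicates known to be disjoint (e.g. the
# true/false pair produced by one compare) guard mutually exclusive
# paths, so definitions and uses under disjoint predicates cannot
# form a real definition-use dependence.

def may_depend(use_pred, def_pred, disjoint_pairs):
    """Conservatively report whether a use guarded by use_pred may
    read a value defined under def_pred."""
    if (use_pred, def_pred) in disjoint_pairs or \
       (def_pred, use_pred) in disjoint_pairs:
        return False   # provably on mutually exclusive paths
    return True        # otherwise must assume a dependence

# p_true and p_false came from the same compare, so they are disjoint.
disjoint = {("p_true", "p_false")}
```

Without this predicate information, the analysis must conservatively report a dependence between every intermingled definition and use of the same logical register.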

Another feature of the EPIC architecture is its reduced hardware complexity. The EPIC philosophy is that the compiler, which has a broader view of the code, should handle most of the dependence analysis and scheduling in order to simplify the processor. However, the compiler cannot fully anticipate run-time events such as cache misses. Consequently, it cannot always create a static schedule that mitigates the effects of the increased latencies that result. In this dissertation, we introduce Pending Functional Units (PFUs), which allow a limited amount of dynamic scheduling with minimal additional hardware overhead.
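A highly simplified model of the idea may help: an instruction whose operand is delayed (for example, by a cache miss) waits at a pending slot instead of stalling the whole in-order pipeline, while independent later instructions keep issuing. All names and the policy below are assumptions for illustration only, not details of the dissertation's PFU design.

```python
# Highly simplified, assumed model of dynamic slack around a stalled
# instruction.  Instructions are (dest, srcs) pairs issued in order;
# one whose sources are unavailable parks in a pending list rather
# than blocking issue, and retries once its operands arrive.

def issue(instrs, ready):
    """instrs: list of (dest, srcs); ready: set of available values.
    Returns (completed destinations in order, still-pending instrs)."""
    pending = []   # instructions parked at a pending functional unit
    done = []
    for dest, srcs in instrs:
        if all(s in ready for s in srcs):
            done.append(dest)
            ready.add(dest)
        else:
            pending.append((dest, srcs))   # park; do not stall issue
    # Retry parked instructions as their operands become available.
    progress = True
    while progress:
        progress = False
        for inst in list(pending):
            dest, srcs = inst
            if all(s in ready for s in srcs):
                pending.remove(inst)
                done.append(dest)
                ready.add(dest)
                progress = True
    return done, pending
```

In this toy model, an instruction waiting on a cache miss (an unavailable source) does not prevent later independent instructions from completing, which is the scheduling flexibility a purely static schedule cannot provide.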