In the proceedings of the 16th Annual ACM International Conference on Supercomputing, June 2002.
Lori Carter and Brad Calder
Predicated Execution has been put forth as a method for improving processor performance by removing hard-to-predict branches. As part of the process of turning a set of basic blocks into a predicated region, both paths of a branch are combined into a single path. There can be multiple definitions from disjoint paths that reach a use. Waiting to find out the correct definition that actually reaches the use can cause pipeline stalls.
In this paper we examine a hardware optimization that dynamically collects and analyzes path information to determine valid dependences for predicated regions of code. We then use this information for an in-order VLIW predicated processor, so that instructions can continue towards execution without having to wait on operands from false dependences. Our results show that using our Disjoint Path Analysis System provides speedups over 6% and elimination of false RAW dependences of up to 14% due to the detection of erroneous dependences in if-converted regions of code.