28th International Symposium on Microarchitecture, pages 199-206, November 1995
Accurate instruction fetch and branch prediction is increasingly important on today's wide-issue architectures. Fetch prediction is the process of determining the next instruction to request from the memory subsystem. Branch prediction is the process of predicting the likely out-come of branch instructions. Many branch and fetch prediction architectures have been proposed, from simple static techniques to more sophisticated hardware designs. All these previous studies compare differing branch prediction architectures in terms of misprediction rates, branch penalties, or an idealized cycles per instruction.
This paper provides a system-level performance comparison of several branch architectures using a full pipeline-level architectural simulator. The performance of various branch architectures is reported using execution time and cycles-per-instruction. For the programs we measured, our simulations show that having no branch prediction increases the execution time by 27%. By comparison, a highly accurate 512 entry branch target buffer architecture reduces this performance degradation down to 1.5% of the performance of a perfect branch predictor, implying that new branch architecture research must address aggressive system architectures to provide meaningful contributions. We also show that the most commonly used branch performance metrics, branch misprediction rates and the branch execution penalty, are highly correlated with program performance and are suitable metrics for architectural studies.