Project 1: Branch Predictors

Due: Tuesday, October 26 at Noon (via e-mail to Sat)


Project Overview:

One of the most popular techniques used to increase performance of a microprocessor is to increase the level of pipelining that is done. The problem is that deeper pipelines generally mean a bigger performance hit when misprediction of a branch occurs. For example, in the Pentium 4, the penalty for a branch misprediction is 20 cycles. A lot of industrial effort has been devoted to creating better branch predictors that will mispredict less yet not create a huge hardware overhead. Better branch prediction has allowed for development of deeper pipelines while deeper pipelines have spurred the development of better predictors. In many ways, their successes feed off of each other.

For this project, you will be implementing several different branch predictors and evaluating their performance on several different program traces. Given a strict memory limit, you will have to implement two standard branch predictors (a control group of sorts) and one additional branch predictor of your choice. The first two predictors will be graded based on their correctness while the third will be graded according to the performance versus another predictor. The performance metric used will be the misprediction rate. You will be given a standard interface and a set of program traces and will be expected to implement several functions to get the predictor to operate correctly.


Project Details:

To get you started, a standard C/C++ interface is provided here. Your job will be to implement several functions in this interface to simulate different branch predictors. For all of the predictors that you implement, there will be a memory limit of (32K + 256) bits (NOT bytes). This doesn't necessarily mean that your program will only use this much memory. For example, you could have a table with 16K entries and 2 bits per entry. You could use a 32-bit integer in C to represent a 2-bit entry and it would only count as 2 bits towards your limit, not 32. All tables should be declared statically, not dynamically and you should avoid the use of malloc() or new for these tables. Also, in case you are wondering, the additional 256 bits are there to aid in implementing a global history register table (GHR).

Upon implementing the required functions, you will compile the branch prediction simulator (via a makefile) and test your predictor on several different traces. These traces were generated from running real programs and tracking if branches were taken or not taken. The trace files themselves are huge text files but are in a very simple format. Each line consists of the address (PC) of the branch instruction and whether the branch was taken (1) or not (0). The trace files have been compressed with bzip.

There are two specific branch predictors that you must implement: gshare and local. A description of both of these predictors can be found in the Hennessy & Patterson textbook. These two predictors will act as a "control group" for the project. Sat will be implementing both of these predictors and measuring their performance. Your grade on these two predictors will be based on how closely your performance measurements match those of Sat's.

Finally, you will be required to implement one other branch predictor of your choosing. There are only a few restrictions on what predictor you may use. First, it may not be a copy or simple modification of either gshare or local. An example of a simple modification would be to use gshare but modify it to use xnor instead of xor. If you have any concerns about whether your idea is a "simple modification" feel free to contact Sat with your idea. Also, you may not use a "dumb" predictor such as one that always returns true, one that always returns false, or one that randomly returns true and false. Again, if you have any questions about this, please e-mail Sat. Feel free to search through any literature you like to find a predictor to use. In other words, feel free to go wild with Google Scholar/CiteSeer/whatever. Just make sure you give credit to any sources you use in creating your predictor (i.e. give specific reference in your report). Below is a set of references to get your started.

Your grade on your chosen predictor will be based on both correctness and relative performance. Correctness means that the predictor compiles and runs correctly and gives sane results. For relative performance, you will be competing against a branch predictor based on the one used in the Alpha 21264. If your branch predictor performs better, on average, than the Alpha 21264 predictor, you will be awarded points.


Project Deliverables:

Your deliverables must be submitted (through e-mail) to Sat and consist of two parts:


Project Grading:

The project grade will be out of 100 broken down as follows: 30 points for gshare implementation, 30 points for local implementation, 20 points for design and implementation of your own predictor, and 20 points for beating the Alpha 21264 predictor. The 5 students with the best performance on their own predictors will also be recognized in class.

Please note that this is an individual assignment and you should not collaborate with anyone else. You may share references but you may not talk about any of the implementation details involved in coding up the predictor.


References: (Courtesy of Dean Tullsen)

McFarling, Combining Branch Predictors, WRL TN-36.  Good description of local, correlating, gshare, and combining predictors.

Kessler, The Alpha 21264 Microprocessor, IEEE Micro, 1999.  Uses variant of McFarling's combining (tournament) predictor.

A. Eden, and T. Mudge. The YAGS branch predictor. 31th Ann. IEEE/ACM Symp. Microarchitecture (MICRO-31), Dec. 1998.  Good description of a variety of anti-aliasing predictors (predictors that still work with large working sets of branches).  Don't take their word that theirs is best without testing it out...

C.-C. Lee, I.-C. Chen, and T. Mudge. The bi-mode branch predictor. 30th Ann. IEEE/ACM Symp. Microarchitecture (MICRO-30), Dec. 1997.  Although the previous had a good description of bi-mode, this is the original.

A. Seznec, S. Felix, V. Krishnan, Y. Sazeides. "Design trade-offs for the Alpha EV8 conditional branch predictor", in : Proceedings of the 29th International Symposium on Computer Architecture, May, 2002.  Take a deep breath before you enter...