Catching Accurate Profiles in Hardware

Satish Narayanasamy, Timothy Sherwood, Suleyman Sair, Brad Calder, and George Varghese

9th International Symposium on High Performance Computer Architecture, February 2003.


Run-time optimization is one of the most important ways of getting performance out of modern processors. Techniques such as prefetching, trace caching, and memory disambiguation are all based on the principle of observation followed by adaptation, and all make use of some form of profile information gathered at run-time. Programs are very complex, and the real trick in generating useful run-time profiles is sifting through the many unimportant and infrequently occurring events to find those that are important enough to warrant optimization.

In this paper, we present the Multi-Hash architecture, which catches important events even in the presence of extensive noise. Multi-Hash uses a small amount of area, between 7 and 16 kilobytes, to accurately capture these important events in hardware without requiring any software support. This is achieved by using multiple hash tables for filtering, together with interval-based profiling to help identify how important an event is relative to all the other events. We evaluate our design for value and edge profiling, and show that over a set of benchmarks, we get an average error of less than 1%.
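
To make the two ideas named above concrete (filtering through multiple hash tables, and judging importance per interval), here is a minimal software sketch in Python. It is an illustration under stated assumptions, not the hardware design evaluated in the paper: the class name MultiHashProfiler, the table count and size, the interval length, the 1% threshold, the CRC32-based hash, and the accumulator dictionary are all hypothetical choices made for this example.

```python
import zlib


class MultiHashProfiler:
    """Software sketch of multi-table filtering with interval-based reset.

    All parameters and names are illustrative assumptions, not values
    taken from the paper.
    """

    def __init__(self, num_tables=4, table_size=512,
                 interval_len=10_000, threshold_frac=0.01):
        self.num_tables = num_tables
        self.table_size = table_size
        self.interval_len = interval_len
        # An event counts as "important" once it appears to account for
        # at least this many occurrences within the current interval.
        self.threshold = int(interval_len * threshold_frac)
        self.reports = []  # one {event: count} dict per finished interval
        self._reset()

    def _reset(self):
        self.tables = [[0] * self.table_size for _ in range(self.num_tables)]
        self.accumulator = {}  # events that passed the filter this interval
        self.seen = 0

    def _index(self, table_id, event):
        # Assumed hash: CRC32 salted with the table number, so each table
        # maps the same event to a (likely) different slot.
        data = repr((table_id, event)).encode()
        return zlib.crc32(data) % self.table_size

    def observe(self, event):
        if event in self.accumulator:
            # Already identified as a candidate: count it exactly.
            self.accumulator[event] += 1
        else:
            counts = []
            for t in range(self.num_tables):
                i = self._index(t, event)
                self.tables[t][i] += 1
                counts.append(self.tables[t][i])
            # Promote only if every table agrees the slot is hot. A noise
            # event must collide with hot slots in all tables to pass.
            if min(counts) >= self.threshold:
                self.accumulator[event] = min(counts)
        self.seen += 1
        if self.seen == self.interval_len:
            # End of interval: publish the candidates and start afresh.
            self.reports.append(dict(self.accumulator))
            self._reset()
```

A possible use is to feed the profiler a stream in which one event recurs among many one-off events, for example `"A"` at every fifth position of a 10,000-event interval and a distinct noise event everywhere else, and then read `reports[0]` once the interval closes. In this sketch the reported count for a promoted event can be slightly higher than its true frequency, because noise that shares a slot with it before promotion is included in the initial count.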