In Proceedings of the International Symposium on Performance Analysis of Systems and Software (ISPASS), March 2006
Commercial processors have support for Simultaneous Multithreading (SMT), yet little work has been done to provide representative simulation results for SMT. Given a workload, current simulation techniques typically run one combination of those programs from a specific starting offset, or just run one combination of samples across the benchmarks.
We have found that the architecture behavior and overall throughput seen can vary drastically based upon the starting points of the different benchmarks. Therefore, to completely evaluate the effect of an SMT architecture optimization on a workload, one would need to simulate many or all of the program combinations from different starting offsets. But exhaustively running all program combinations from many starting offsets is infeasible --- even running single programs to completion is often infeasible with modern benchmarks.
In this paper we propose an SMT simulation methodology that estimates the average performance over all possible starting points when running multiple programs concurrently on an SMT processor. This is based on our prior co-phase matrix phase analysis and simulation infrastructure. This approach samples all of the unique phase combinations for a set of benchmarks to be run together. Once these phase combinations are sampled, our approach uses these samples, along with a trace of the phase behavior for each program, to provide a CPI estimate of all starting points. This all starting point CPI estimate is precisely calculated in just minutes.