Third Workshop on Interaction Between Compilers and Computer Architectures
Instruction cache performance is critical to instruction fetch efficiency and overall processor performance. The layout of an executable has a substantial effect on its instruction cache miss rate during execution, so the performance of an executable can be improved significantly by applying a code-placement algorithm that minimizes instruction cache conflicts. Alternatively, the hardware configuration of the instruction cache itself can greatly influence cache performance: for instance, increasing associativity or adding structures that selectively retain cache lines can also significantly reduce conflict misses.
In this paper we compare the performance benefits of compiler-based code-placement algorithms to hardware-based schemes using a full simulation model of an out-of-order, 4-way issue processor. We evaluate the effectiveness of code-placement algorithms when driven by self-trained and by cross-referenced profiling data, and we measure the benefits of adding small fully associative buffers, including variants of victim buffers, used alongside the first-level instruction cache. Our results show that code placement achieved a 10% speedup on average, victim buffers a 15% speedup on average, and the combination of both a net 20% speedup on average.
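To illustrate the hardware side of the comparison, the sketch below simulates a tiny direct-mapped instruction cache with and without a small fully associative victim buffer. All parameters (line size, number of sets, buffer depth) and the FIFO replacement choice are illustrative assumptions, not the configuration evaluated in the paper; the point is only to show how a victim buffer absorbs conflict misses between code blocks that map to the same set.

```python
from collections import deque

LINE = 32   # bytes per cache line (assumed for the demo)
SETS = 8    # direct-mapped cache with 8 lines (tiny, for illustration)

def simulate(addresses, victim_entries=0):
    """Count misses for a direct-mapped cache, optionally backed by a
    small fully associative FIFO victim buffer holding evicted lines."""
    cache = [None] * SETS                  # line number stored per set
    victim = deque(maxlen=victim_entries)  # recently evicted lines
    misses = 0
    for addr in addresses:
        line = addr // LINE
        idx = line % SETS
        if cache[idx] == line:             # hit in the L1 cache
            continue
        if line in victim:                 # hit in the victim buffer: swap
            victim.remove(line)
            if cache[idx] is not None:
                victim.append(cache[idx])
            cache[idx] = line
            continue
        misses += 1                        # miss in both structures
        if cache[idx] is not None and victim.maxlen:
            victim.append(cache[idx])      # evicted line enters the buffer
        cache[idx] = line
    return misses

# Two code blocks whose lines map to the same set ping-pong forever in
# the plain direct-mapped cache, but a 4-entry victim buffer catches
# every conflict after the two cold misses.
trace = [0x000, 0x100, 0x000, 0x100] * 8   # lines 0 and 8 -> same set
print(simulate(trace))                     # -> 32 (every access misses)
print(simulate(trace, victim_entries=4))   # -> 2  (only the cold misses)
```

This conflict pattern is exactly what code placement attacks from the software side: relocating one of the two blocks so their lines map to different sets removes the misses without any extra hardware.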