In Proceedings of the International Symposium on Code Generation and Optimization (CGO 2006).
Software prefetching has been demonstrated to be a powerful technique for tolerating long load latencies. However, to be effective, prefetching must target the most critical (frequently missing) loads, and the prefetches must be issued sufficiently far in advance. This is difficult to do correctly with a static optimizer, because locality characteristics and cache latencies vary across data inputs and across different machines.
This paper presents a mechanism that dynamically inserts prefetch instructions into frequently executed hot traces. Hot traces are dynamically analyzed to identify delinquent loads and the appropriate prefetch distance for those loads. Those prefetches are then inserted into the hot trace. The low overhead of the event-driven dynamic optimization system allows the optimizer to continuously monitor the performance of the software prefetches, both to find an accurate and stable prefetch distance and to adapt to changes in program behavior. We call this Self-Repairing prefetching. Relative to the baseline hardware stride prefetching, we find a total improvement of 23% when we use the self-repairing mechanism to adaptively discover the best prefetch distance for each load, which is 12% better than dynamic prefetching techniques without adaptive repairing.
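
To make the notion of an adaptively repaired prefetch distance concrete, the following is a minimal sketch in C, not the mechanism evaluated in the paper. It assumes a simple hill-climbing policy: after each monitoring interval, the distance keeps moving in the same direction if the observed average load latency did not get worse, and reverses direction otherwise. The type prefetch_state, the functions prefetch_state_init, repair_distance, and sum_with_prefetch, and the latency values fed to them are all hypothetical names chosen for illustration. The prefetch itself uses the GCC/Clang builtin __builtin_prefetch, whereas the paper inserts prefetch instructions into hot traces at run time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-load bookkeeping; all field names are illustrative. */
typedef struct {
    int distance;        /* current prefetch distance, in loop iterations */
    int min_distance;    /* smallest distance the controller may choose */
    int max_distance;    /* largest distance the controller may choose */
    int step;            /* +1 or -1: direction of the next adjustment */
    double last_latency; /* latency seen in the previous interval, <0 if none */
} prefetch_state;

void prefetch_state_init(prefetch_state *s, int min_d, int max_d) {
    assert(s != NULL && min_d <= max_d);
    s->distance = min_d;
    s->min_distance = min_d;
    s->max_distance = max_d;
    s->step = 1;
    s->last_latency = -1.0; /* no sample observed yet */
}

/* One repair step (assumed hill-climbing policy): reverse direction if
 * the latest interval was slower than the previous one, then move the
 * distance by one and clamp it to the allowed range. */
int repair_distance(prefetch_state *s, double avg_latency) {
    assert(s != NULL && s->min_distance <= s->max_distance);
    if (s->last_latency >= 0.0 && avg_latency > s->last_latency)
        s->step = -s->step;
    s->last_latency = avg_latency;
    s->distance += s->step;
    if (s->distance < s->min_distance) s->distance = s->min_distance;
    if (s->distance > s->max_distance) s->distance = s->max_distance;
    return s->distance;
}

/* A loop whose load is prefetched `distance` iterations ahead. */
long sum_with_prefetch(const long *a, size_t n, int distance) {
    assert(distance >= 0);
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        /* Stay inside the array when forming the prefetch address. */
        if (i + (size_t)distance < n)
            __builtin_prefetch(&a[i + (size_t)distance], 0, 1);
        sum += a[i];
    }
    return sum;
}
```

In this sketch a monitoring component would call repair_distance once per interval with the measured latency of the delinquent load and then regenerate the loop with the returned distance. A real system would also need a stability criterion so that the distance stops oscillating once a good value is found.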