Event-Driven Multithreaded Dynamic Optimization

Weifeng Zhang

UC San Diego Technical Report CS2006-0872, June 2006


Dynamic optimization has been proposed to overcome many limitations of static optimization, such as inaccurate assumptions about the underlying processor architecture and lack of adaption to the program's runtime behavior. However, existing dynamic optimization systems often impose high runtime overhead (software systems) or great hardware complexity (hardware systems), and only have limited runtime adaptability.

This thesis proposes a new model of optimization, where optimization is triggered by hardware optimization events and is performed concurrently on an application while it is running. We introduce an event-driven multithreaded dynamic optimization architecture, called Trident. Trident is a software/hardware solution which strives to reduce software runtime overhead as well as reduce hardware complexity and inflexibility. Trident takes advantage of two key features in modern processors, abundant chip-level parallelism (through Simultaneous Multithreading, Chip Multithreading, or a combination) and increasing hardware support for runtime performance monitoring. Trident proposes generic, lightweight extensions of the hardware monitoring mechanisms to profile the program's execution behavior. Hardware triggers an event for optimization upon detection of any interesting behavior. Lightweight helper threads are spawned to process these events, in parallel with the main thread. The combination of event-driven and concurrent optimization makes Trident extremely low overhead in both profiling and optimization. This enables Trident to perform more expensive optimizations than existing dynamic systems, and perform continuous recurrent optimizations without fear of performance loss.

The power and flexibility of Trident enable many types of optimizations. In this thesis, we demonstrate it with an aggressive optimization, called speculative dynamic value specialization. We also demonstrate Trident's power of continuous, gradual optimization by improving traditional software prefetching to better attack the classical memory wall problem. These approaches include adaptive dynamic software prefetching via self-repairing and accelerating precomputation based prefetching.