

The way a program’s execution changes over time is not totally random; in fact, it often falls into repeating behaviors, called phases. Automatically identifying this phase behavior is the goal of our research and key to unlocking many new optimizations. We define a phase as a set of intervals (or slices in time) within a program’s execution that have similar behavior, regardless of temporal adjacency. Recent research has shown that it is indeed possible to accurately identify and predict these phases in program behavior.

The key observation for phase recognition is that any program metric is a direct function of the way a program traverses the code during execution. We can find and classify this phase behavior by examining only the ratios in which different regions of code are executed over time. This information can be collected simply and quickly using basic block vector profiles for offline classification, or through dynamic branch profiling for online classification. In addition, because phase behavior is accurately captured through the computation of a single metric, independent of the underlying architectural details, phase information can be used to guide many optimizations and policy decisions without duplicating phase detection mechanisms for each optimization.

A good high-level overview of our approach to phase classification can be found in: Timothy Sherwood, Erez Perelman, Greg Hamerly, Suleyman Sair, and Brad Calder, Discovering and Exploiting Program Phases, IEEE Micro: Micro's Top Picks from Computer Architecture Conferences, December 2003.

The following is a set of definitions we will use in describing phase classification and using SimPoint to find simulation points.
Phase Observations: The above Figure shows the time-varying behavior of the full execution of gzip with the graphic ref input for several architectural metrics. Within our context, a phase is defined to be a set of segments of execution that have similar behavior to each other. In this plot the phases have been color coded, and there are 5 major observations to be made:
SimPoint's Off-Line Phase Classification: At a high level, our approach automatically finds simulation points of a program by clustering an interval-partitioned code profile of the full execution, then picking a sample from each cluster and weighting it by the size of the cluster. The key insight behind this approach is that the entire analysis uses only the code that executes and is independent of architectural parameters. The method can be broken down into 4 steps: 1) Basic Block Vector Analysis - architecture-independent code profiling; 2) Random Projection - reduce the dimensionality of the data; 3) Phase Classification using K-Means Clustering - classify all intervals into a set of phases, where similar intervals are placed in the same phase; 4) Picking Simulation Points - the last step, which finds a good phase representation of the execution and then chooses a sample from each phase to form the simulation points.
Step 1: Basic Block Vector Analysis To concisely capture information about how a program changes its behavior over time, we developed the Basic Block Vector (BBV). A basic block is a section of code executed from start to finish, with one entry and one exit. We use the frequencies with which basic blocks execute as the metric for comparing sections of the application’s execution. The intuition behind this is that program behavior at a given time directly relates to the code executing during that interval, and basic block distributions provide us with this information. A program, when run for any interval of time, will execute each basic block a certain number of times. Knowing this information provides a fingerprint for that interval of execution and shows where the application is spending its time in the code. The basic idea is that the basic block distributions for two intervals are fingerprints that indicate the similarity between the intervals. If the fingerprints are similar, then the two intervals spend about the same amount of time in the same code, and the performance of those two intervals should be similar. More formally, a BBV is a one-dimensional array with one element for each static basic block in the program. During each interval, the number of times program execution enters each basic block is counted and recorded in the BBV, weighted by the number of instructions in the basic block. Therefore, each element in the array is the count of entries into a basic block multiplied by the number of instructions in that basic block. The BBV is then normalized by dividing each element by the sum of all the elements in the vector for that interval.
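The weighting and normalization described above can be sketched in a few lines of Python. This is an illustrative sketch only, not SimPoint's actual implementation; the block trace and block sizes below are hypothetical:

```python
def build_bbv(block_trace, block_sizes, num_blocks):
    """Build a normalized Basic Block Vector for one interval.

    block_trace: sequence of basic-block ids entered during the interval.
    block_sizes: block_sizes[b] = number of instructions in block b.
    num_blocks:  number of static basic blocks in the program.
    """
    bbv = [0.0] * num_blocks
    for b in block_trace:
        # Each entry into a block is weighted by its instruction count.
        bbv[b] += block_sizes[b]
    total = sum(bbv)
    # Normalize so the elements sum to 1 (the interval's fingerprint).
    return [x / total for x in bbv]

# Hypothetical interval: blocks 0, 1, 0, 2 entered;
# the blocks contain 3, 2, and 5 instructions respectively.
vec = build_bbv([0, 1, 0, 2], [3, 2, 5], 3)
```

Because the vector is normalized, intervals of different lengths can be compared directly.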
The above Figure shows a hypothetical program segment during three different execution instances. There are 5 basic blocks (A, B, C, D, E), each represented by a box, and the black edges are control flow edges between the blocks. The colored arrow is the execution path, with the colored number next to each block quantifying how many times that block was executed. For simplicity, each basic block is assumed to have the same number of instructions.
Shown above are the Basic Block Vectors for each of the three different intervals of execution. Also shown is the distance between Intervals 1 and 2, and the distance between Intervals 2 and 3. The distance between basic block vectors correlates with how similar the intervals of execution are. We measure this similarity between basic block vectors using the Manhattan distance, which is computed by summing the absolute values of the element-wise subtraction of two vectors. A small distance means the vectors are similar and more likely to be part of the same phase. A large distance means the vectors are dissimilar and represent different phases of execution. As you can see in the above example, Intervals 2 and 3 have a large distance, whereas Intervals 1 and 2 have a small distance and are therefore classified into the same phase, since they are similar. Note that before taking the difference between two vectors, the vectors are normalized to one (this is not shown in the above example). We use a basic block similarity matrix to visually inspect the effectiveness of using BBVs in determining the similarities among intervals. The similarity matrix is the upper triangular of an N × N matrix, where N is the number of intervals in the program’s execution. An entry at (x, y) in the matrix represents the Manhattan distance (similarity) between the BBVs at intervals x and y. The Figure below shows the similarity matrices for the two example programs, gzip and gcc. The matrix’s diagonal represents the program’s execution over time from start to completion.
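The Manhattan distance computation is simple enough to sketch directly; the three vectors below are hypothetical normalized BBVs, not the ones from the Figure:

```python
def manhattan(u, v):
    """Manhattan distance between two normalized BBVs.

    For vectors that each sum to 1, the result lies in [0, 2]:
    0 means identical code usage, 2 means completely disjoint code.
    """
    return sum(abs(a - b) for a, b in zip(u, v))

# Two similar intervals versus a dissimilar one (hypothetical BBVs).
v1 = [0.5, 0.4, 0.1]
v2 = [0.4, 0.5, 0.1]
v3 = [0.0, 0.1, 0.9]

small = manhattan(v1, v2)   # similar intervals -> small distance
large = manhattan(v2, v3)   # dissimilar intervals -> large distance
```

Here `small` comes out to 0.2 and `large` to 1.6, matching the intuition that the first two intervals would land in the same phase.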
The above shows the basic block similarity matrix for gzip-graphic. The matrix diagonal represents a program’s execution to completion, with units in 100s of millions of instructions. The darker the points, the more similar the intervals (the Manhattan distance is closer to 0); the lighter the points, the more different the intervals (the Manhattan distance is closer to 2). To interpret the graph, consider points along the diagonal axis. Each point is perfectly similar to itself, so all the points on the diagonal are dark. Starting from a given point on the diagonal, you can compare how that point relates to its neighbors forward and backward in execution by tracing horizontally or vertically. To compare a given interval x with interval x + n, simply start at point (x, x) on the graph and trace horizontally to the right to (x, x + n). In the similarity matrices for gcc and gzip, you can see large blocks of dark, which indicate that there are repeating behaviors in the program. Large triangular blocks that run along the diagonal axis indicate stable regions where the program behavior is not changing over time. Rectangular blocks of dark that occur off the diagonal axis indicate recurring behaviors, where a behavior that occurs later in execution has been seen sometime in the past. When compared with the metrics shown in the first Figure above for gzip, it can be seen that the repeating nature of the program is being captured (this is most clear for gzip) by only examining the code that is executed. This motivates a technique to capture these patterns automatically.
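The similarity matrix itself is just the pairwise Manhattan distances laid out in the upper triangle. A minimal sketch, assuming a list of already-normalized BBVs (the real tool works over thousands of intervals; the vectors below are hypothetical):

```python
def similarity_matrix(bbvs):
    """Upper-triangular N x N matrix of Manhattan distances.

    Entry (x, y) for x <= y is the distance between the BBVs of
    intervals x and y; small values mean similar intervals (plotted
    dark in the Figure), large values mean dissimilar ones (light).
    """
    n = len(bbvs)
    m = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(x, n):
            m[x][y] = sum(abs(a - b) for a, b in zip(bbvs[x], bbvs[y]))
    return m

# Three hypothetical intervals: 0 and 2 run identical code, 1 differs.
m = similarity_matrix([[1.0, 0.0], [0.5, 0.5], [1.0, 0.0]])
```

The diagonal is all zeros (every interval is perfectly similar to itself), and the off-diagonal entry for intervals 0 and 2 is zero as well, marking a recurring behavior.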
Step 2: Random Projection Our basic block vectors have dimensionality equal to the number of basic blocks in a program, which ranges from the 1,000s to over 100,000 for the SPEC benchmarks. Since clustering algorithms generally suffer on high-dimensional data, it is essential to reduce the dimensionality before we cluster the data. Random projection is an effective technique for reducing the dimensionality of the vectors from hundreds of thousands to only tens. The operation involves a matrix multiplication and preserves the structure within the vectors needed for clustering. Figure 6 shows the effect of randomly projecting gcc-166 from its original dimensionality of more than 80,000 down to only 15. Some of the contrast between execution segments is lost, but the overall structure is still preserved. We find that 15 dimensions are sufficient for accurately finding the phases in a program.
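The projection is a single matrix multiply against a randomly generated matrix. A minimal sketch, assuming projection-matrix entries drawn uniformly from [-1, 1] (one common choice; the exact distribution SimPoint uses is not specified here, and the vectors below are hypothetical):

```python
import random

def random_projection(bbvs, target_dim, seed=0):
    """Project each BBV down to target_dim dimensions.

    A d x target_dim random matrix is generated once, and every BBV
    is multiplied by it. Distances between vectors are approximately
    preserved, which is all the clustering step needs.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    d = len(bbvs[0])
    proj = [[rng.uniform(-1, 1) for _ in range(target_dim)]
            for _ in range(d)]
    return [[sum(v[i] * proj[i][j] for i in range(d))
             for j in range(target_dim)]
            for v in bbvs]

# Two hypothetical 4-dimensional BBVs projected down to 2 dimensions.
low = random_projection([[0.5, 0.5, 0.0, 0.0],
                         [0.0, 0.0, 0.5, 0.5]], 2)
```

In practice the reduction is far more dramatic (e.g., 80,000-plus dimensions down to 15), but the mechanics are the same.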
Step 3: Off-Line Phase Classification Using K-Means Clustering BBVs provide a compact and representative summary of the program’s behavior for each interval of execution. By examining the similarity between them, it is clear that there exists a high-level pattern within each program’s execution. To use this behavior, it is necessary to have an automated way of extracting the phase information from programs. Clustering algorithms have proven useful in breaking the complete program execution into smaller groups (phases) that have similar BBVs. Because BBVs relate to the program’s overall performance, BBV-based grouping results in phases that are similar not only in their basic block distributions but also in every other metric measured, including overall performance. In addition, you can gather BBVs quickly because they require only the counting of basic block execution frequencies. The goal of clustering is to divide a set of points into groups such that points within each group are similar (by some metric, often distance) and points in different groups are dissimilar. A well-known clustering algorithm, k-means, can accurately break program behavior into phases. Random linear projection reduces the dimensionality of the input data without disturbing the underlying similarity information; it is a useful technique for speeding up the execution of k-means. One serious drawback of the k-means algorithm is that it requires the value k, the number of clusters, as input. To address this problem, we run the algorithm for several values of k and use a score to guide our final choice of k. The following steps summarize our algorithm at a high level:
These steps provide a grouping of intervals into phases. The k-means algorithm groups similar intervals together based on the BBV similarity metric, using the Euclidean distance. We then choose a final grouping of phases from the different options based on how well formed the phases are, as measured by the Bayesian Information Criterion (BIC).
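The clustering step can be sketched with a plain implementation of Lloyd's k-means algorithm. This is an illustrative sketch, not the tuned k-means (with multiple values of k and BIC scoring) used by the SimPoint tool, and the sample points stand in for hypothetical projected BBVs:

```python
import math
import random

def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's k-means with Euclidean distance.

    Returns (labels, centroids): labels[i] is the cluster (phase)
    assigned to interval i, and centroids are the cluster centers.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each interval to its nearest centroid.
        for i, p in enumerate(points):
            labels[i] = min(range(k),
                            key=lambda c: math.dist(p, centroids[c]))
        # Move each centroid to the mean of its members.
        new = []
        for c in range(k):
            members = [points[i] for i in range(len(points))
                       if labels[i] == c]
            new.append([sum(xs) / len(members) for xs in zip(*members)]
                       if members else centroids[c])
        if new == centroids:  # converged
            break
        centroids = new
    return labels, centroids

# Four hypothetical projected BBVs forming two obvious phases.
pts = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
labels, centers = kmeans(pts, 2)
```

In the full algorithm, this clustering is repeated for several values of k and the BIC score of each result guides the final choice.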
The Figure above shows the phases discovered in gcc-166 after we clustered the data into 7 clusters. Each color depicts a cluster/phase. Note that the architectural metrics are similar to each other within the same phase, even though the phases were formed by examining only the frequency with which the code paths were executed.
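The final step of choosing simulation points can be sketched as follows: for each cluster, pick the interval whose projected BBV is closest to the cluster centroid, and weight it by that cluster's share of all intervals. The function and data below are a hypothetical illustration, not SimPoint's actual code:

```python
import math

def pick_simulation_points(points, labels, centroids):
    """Pick one representative interval per phase.

    Returns {interval index: weight}, where the weight is the fraction
    of the full execution that the phase covers. Weighted results from
    simulating just these intervals estimate whole-program behavior.
    """
    sim_points = {}
    n = len(points)
    for c, center in enumerate(centroids):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if not members:
            continue
        # The member closest to the centroid best represents the phase.
        rep = min(members, key=lambda i: math.dist(points[i], center))
        sim_points[rep] = len(members) / n
    return sim_points

# Hypothetical projected BBVs, cluster labels, and centroids.
pts = [[0.0, 0.0], [0.3, 0.0], [5.0, 5.0], [5.2, 5.0]]
sim = pick_simulation_points(pts, [0, 0, 1, 1],
                             [[0.2, 0.0], [5.0, 5.0]])
```

Here interval 1 represents the first phase and interval 2 the second, each with weight 0.5 since each phase covers half the intervals.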

