Performance Programming

Performance programming seeks to improve performance beyond what is achieved by programming an algorithm in the most expedient manner. The goal is that each processing element be kept as busy as possible doing useful work. This entails satisfying four requirements: breaking problems into independent subproblems that can be executed concurrently, distributing these subproblems appropriately among the processing elements, making sure that the necessary data is close to its processing element, and overlapping communication with computation where possible. To attain high performance, these requirements must be satisfied whether one views "processing elements" as stages of an arithmetic or vector pipeline, functional units of a CPU, processors of a tightly coupled shared-memory multiprocessor, nodes of a distributed-memory supercomputer, or heterogeneous computers on a network.
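Of the four requirements, data locality is the easiest to illustrate even on a uniprocessor, where the "processing element" is the CPU and "close" means resident in cache. A classic performance-programming transformation is loop tiling (blocking), which restructures a computation so that each cache-sized block of data is fully reused before it is evicted. The sketch below is illustrative only (the matrix size N and tile size B are assumptions, not values from the text); it contrasts a straightforward matrix multiply with a tiled one that computes the same result:

```c
#include <string.h>

#define N 64   /* matrix dimension (illustrative) */
#define B 16   /* tile size, chosen so a few B x B tiles fit in cache */

/* Expedient triple loop: for large N, each pass over b streams
 * through memory with little temporal reuse. */
static void matmul_naive(const double a[N][N], const double b[N][N],
                         double c[N][N]) {
    memset(c, 0, sizeof(double) * N * N);
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++)
                c[i][j] += a[i][k] * b[k][j];
}

/* Tiled version: the three outer loops walk over B x B tiles, and the
 * three inner loops reuse each tile of a, b, and c while it is still
 * in cache, reducing memory traffic without changing the result. */
static void matmul_tiled(const double a[N][N], const double b[N][N],
                         double c[N][N]) {
    memset(c, 0, sizeof(double) * N * N);
    for (int ii = 0; ii < N; ii += B)
        for (int kk = 0; kk < N; kk += B)
            for (int jj = 0; jj < N; jj += B)
                for (int i = ii; i < ii + B; i++)
                    for (int k = kk; k < kk + B; k++)
                        for (int j = jj; j < jj + B; j++)
                            c[i][j] += a[i][k] * b[k][j];
}
```

The same blocking idea recurs at every level of the hierarchy mentioned above: tiles can be sized for registers, for cache, or for the local memory of a distributed-memory node.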

Performance programming can require weeks of work even for a small program. This effort can be justified for some frequently-executed kernels and computations for which there is a large premium for fast answers. Performance programming techniques are applicable, but rarely appropriate, for sequential programs. They are often of paramount importance on parallel supercomputers.

Some useful URLs

Performance Programming Papers

A bibliography of my papers on topics related to performance programming. Preprints of most of these papers are provided.

Performance Programming Courses

An introduction to uniprocessor optimization, in PostScript or PDF format, prepared for the 1998 NPACI Parallel Computing Institute.

Here are some materials for a one-day course in performance programming developed by Bowen Alpern and Larry Carter.

An outline of a graduate course for Columbia University's Computer Science Department taught by Bowen Alpern in the Fall of 1994 is available here.

I taught a course on performance programming at UCSD in the Fall of 1995, and again in the Fall of 1998.