# Performance Programming

Columbia, Fall 1994

Professor: Bowen Alpern

** SORRY! These are no longer online.**
## Course Outline

What is * performance programming*? When is it justified? Review
of RAM and PRAM analysis. Overview of examples to be used in the
course. Poisson distributions. Extended example: seismic migration.
Extended example: unblocked matrix multiplication. The Two Level
model of computation. The memory hierarchy of an RS/6000. Locality:
local, semilocal, and nonlocal data passes. Why Quicksort is better
than Heapsort. Communication cost on SP-1 and SP-2.
The PMH model of computation and its parameters. Visualizing computers.
Programming a PMH. Performance programming techniques. Hierarchial tiling.
Extended example: a PDE-like code.
LAPACK and the BLAS. ` daxpy`

vs. ` ddot`

.
Eliminating sequential dependencies in ` ddot`

.
Computation ``sticks.'' Prefetching. LU decomposition. The importants
of proper parenthesization.
Cache. Translation Lookaside Buffer (TLB). Disk. Blocking.
Associativity: what it is and how to fight it. Blocking multiple
access patterns. Examples: ranking integers, LU decomposition, and
FFT. Poisson distribution (again).
Distributed and shared address space parallelism. Latency and
bandwidth. Collective Communication. Choreography. 1D and 2D LU
decomposition.
Example: NAS/CG
Scalability analysis. Performance landscapes. Examples: LU
decomposition and sparse matrix-vector product.
What would a portable performance program look like? Program
variants. Generic Models. Deriving a PMH model for a machine.
Modeling the CM-5. Distortions of PMH models.
Message Compression: fewer messages, fewer values, or fewer bits.
Hardware support for message compression. Example: NAS/IS. Natural
sequence compression. Standard techniques: run-length coding, Huffman
codes, arithmetic coding.
IEEE floating-point format. Rounding modes. Integer arithmetic in
the floating-point unit. Examples: Pseudo-random number generation
and data compression. Microparallelism. Inner-loop tricks. Table
lookup and the NAS/EP benchmark. Approximate (inverse) squareroots.
Fast Fourier Transforms.

Uniformity and the UMH model. Some theorems and their
proofs. Threshold functions and candidate threshold functions.
Examples: transpose, matrix multiplication, FFT and parallel matrix
multiplication.
Rules of thumb. Dynamic programming. Extended example: optimal
Steiner trees.
In addition, Rick Lawrence, taught one class on his experience tuning
the NAS/MG benchmark for the IBM SP2.