CSE 268C - Performance Programming

A systematic approach to the design, writing and tuning of programs to sustain near-peak performance, with particular emphasis on RISC processors and massively-parallel computers.

Instructor: Larry Carter.

Class times: Tuesdays and Thursdays, 2:20-3:40, AP&M 4218.

Office hours: Tuesday & Thursdays, 3:45-5:00, or by appointment. AP&M 4101.


Assignments

We'll begin with short, easy assignments to get things started, but later in the course there will be fewer, longer projects.

Related material

Slides used in NPACI Parallel Computing Institute in PostScript or PDF Format. (I've used some of these in class.)

Information about the CPU's of various computers.

My papers about performance programming.


Homework #1

Due Sept 29

Read the handout, be prepared to discuss it in class Tuesday.


Homework #2

Profiling exercise (due October 1)
  1. Choose your favorite computer (be sure it has a Fortran compiler).
  2. Compile the NAS 1 ("pencil and paper") ft.f using a reasonable set of compiler options. You may have to change the function "timer" to something that works on your machine.
  3. Time the benchmark, getting wallclock time, CPU time, and the time reported by the benchmark (which is for the "timed section" of the code).
  4. Profile it using prof, gprof, or your favorite profiler.
  5. Compare the times of the unprofiled versus the profiled code.
  6. Find the procedure that takes the largest percentage of runtime.
  7. Make a reasonable guess about what the most time-consuming inner loop in that procedure is. Generate the assembly code for the program and find the inner loop in the assembly listing.
  8. Calculate how much faster the program would run if that most time-consuming procedure were made to run ten times faster. Important: if one run takes time t1 and a second run takes time t2 < t1, then we say "the second run was t1/t2 times faster". We can also say, "the second run was 100((t1/t2) - 1) percent faster".
  9. Summarize your results (just the facts - no "writeup" needed). E.g. "Computer: Sparc 20, 50 MHz clock. Compiler: g77. Compiler options: -fast. Timing: (figures for CPU, wallclock, and timed section, with and without profiling) ... Etc ..."

Homework #3

Compute "cycles per iteration" for the inner loop of transx in the ft benchmark. Spend 1 hour improving its performance.

Homework #4

Read paper, "A comparison of the floating-point performance of current computers" by Steven H. Langer from July/Aug 1998 issue of Computers in Physics.

Project #1

Choose a program that spends most of its execution time in a few relatively simple loops. Improve the loops. In most cases, this will involve Your goal is to be able to explain how long the improved code is taking and why it would be hard to get it to go much faster. Depending on the program you choose, I may want you to be more or less ambitious. I strongly encourage you to talk with me at least twice about your program, so I can give you some idea of the type of improvements and explanations I am or am not expecting.

Some possible programs: