Lecture 1, 9/24/09 (Thu): Introduction
|
|
Lecture 2, 9/29/09 (Tue): Motivating applications; multiprocessors
Lecture 3, 10/6/09 (Tue): Address space organization, performance,
cache coherence and consistency
- Lecture slides posted
- Today's reading:
Text.
Chapter 2: pp. 24-30, 45-53, 61-3.
Chapter 5: pp. 197-203, 205-212.
-
For additional background material on shared memory architecture, consult
Hennessy and Patterson, Computer Architecture A Quantitative Approach,
Morgan Kaufmann.
3rd Ed.: Chapter 6, §6.1, §6.3 or the 4th Ed., Chapter 4, §4.1-2, 4.6
|
Lecture 4, 10/7/09 (Weds at 11AM, EBU3B 1202): Threads Programming
- Lecture slides re-posted with
revisions to code examples
(Code is also available on Triton in $PUB/Examples/Lec04)
- Today's reading:
Text.
Chapter 5: pp. 279-294, 307-308.
|
Lecture 5, 10/8/09 (Thu): A first application; OpenMP
Lecture 6, 10/13/09 (Tue): Message passing
- Lecture slides posted
- Today's reading:
- Text. Chapter 2, §2.5 (pp 53-60);
Chapter 6: §6.1-6.3 (pp 233-250)
-
A User's Guide to MPI, by Peter Pacheco,
pp. 1-10.
pdf
(Or Chapters 2 and 3 from Peter Pacheco's
Parallel Programming with MPI.)
- For reference:
Chapter
8 from Ian Foster's Designing and Building Parallel Programs
(Addison Wesley Pub.)
-
The Ring code is available on Triton in $PUB/Examples/Ring
- Code examples from the Pacheco textbook are available in
$PUB/Examples/Pacheco
|
Lecture 7, 10/15/09 (Thu): A first application with message passing
- Lecture slides posted
- Today's reading:
-
Chapter 2, §2.4.2 through 2.4.4: pp. 32-44;
Chapter 6: §6.6.1-6.6.3: pp. 260-263.
- A User's Guide to MPI, by Peter Pacheco, pp. 11-17 pdf
Or: Parallel Programming with MPI, by Peter Pacheco: Chapter 4: ALL, Chapter 5: §5.1-5: pp. 65-78
- Modeling the performance of an iterative method
pdf
- For reference: Chapter 8, §8.1 to 8.3, from Ian Foster's Designing and
Building Parallel Programs (Addison Wesley Pub., on-line). A nice
introduction to MPI.
- Code
- Quadrature code (trapezoidal rule): available in $PUB/Pacheco/ppmpi_c/hap04
|
Lecture 8, 10/20/09 (Tue): Vectorization, SIMD, GPUs
- Lecture slides posted
- Today's reading:
-
Programming Guidelines for Vectorizing C/C++ Compilers,
A. Bik et al. Dr. Dobb’s Journal, 2/1/03. (6 pp.)
(Alternatively: Wikipedia article on Vectorization)
- Intel SSE (2 pp.)
- NVIDIA Tesla: A Unified Graphics and Computing Architecture, by
E. Lindholm et al., IEEE Micro, March-April 2008, Vol. 28, Issue 2, pp. 39-55.
(IEEE Digital Library, accessible from any campus machine.
If you are off-campus, you may use the UCSD Web proxy, which enables
access to restricted content from non-UCSD Internet service providers.
UCSD Active Directory login required.)
- Vectorization code examples available in $PUB/Examples/Vectorization
- Supplemental reading - to probe further
- Streaming SIMD Extensions, SSE (Wikipedia)
-
"The CRAY-1 Computer System," R. M. Russell,
Comm. ACM 21(1): 63-72 (Jan. 1978.)
DOI
-
R. L. Sites, "An analysis of the Cray-1 computer." Proc.
5th Annual Symp. Computer Architecture (ISCA ’78),
ACM, pp. 101-106. (Esp. sections 4 and 5)
DOI
-
Vectorizing Loops, §6.1 in
Optimizing Applications on Cray X1 (TM) Series Systems, Cray.
- "Vector Processors," Computer Architecture: A Quantitative Approach, 4th Ed., Appendix F,
by J. L. Hennessy and D. A. Patterson, Morgan Kaufmann, 2007. (45 pages)
Also available as Appendix G in the 3rd Edition, if you have the CD.
|
Lecture 9, 10/22/09 (Thu): Performance Programming with CUDA
- Homework #3 (due Friday 10/30) posted
- Lecture slides posted
- Today's reading:
- Scalable Parallel Programming with CUDA, J. Nickolls et al.,
ACM Queue, 6(2):40-53, March/April 2008. (Both PDF and HTML)
- CUDA, Supercomputing for the Masses, by Rob Farber, parts I and II.
(14-part article, especially helpful if you are on the GPU Technology track.)
- To enquire further
- http://www.nvidia.com/object/cuda_home.html, CUDA Zone.
More articles, CUDA download and documentation. If you
are on the GPU Track, you’ll want to print out the
Documentation, including the CUDA Programming Guide and the
Best Practices Guide
- GPU Resources
|
Lecture 10, 10/26/09 (Monday at 1pm in EBU3B (CSE) Room 1202):
Accelerator Programming
- Guest speaker: Michael Wolfe
- Today's reading:
- The PGI Accelerator Programming Model on NVIDIA GPUs Part I and
Part II
|
Lecture 11, 10/27/09: Memory System Performance, Collective Communication
- Lecture slides posted
- Today's reading:
-
Optimization of Collective Communication Operations in MPICH,
by R. Thakur, R. Rabenseifner, and W. Gropp.
Int’l. J.
of High Performance Computing Applications, 19(1):49-66, Spring 2005
- Supplemental reading: Text. Introduction to Parallel
Computing, 2nd
Ed. Grama et al., Benjamin-Cummings, 2003.
Chapter 4: pp. 149-160, 166-179 (except §4.5.2);
pp. 184-189. Chapter 6: pp. 260-72.
- For reference: MPI: The Complete Reference, by Marc Snir et al.
Detailed discussions of MPI collective communication.
|
Lecture 12, 10/29/09: Performance Measurement, Advanced Communication
- Lecture slides posted
- Today's reading:
R. van de Geijn and J. Watts, SUMMA: Scalable universal matrix multiplication algorithm,
Concurrency: Practice and Experience, 9:255-74 (1997)
(Also on Netlib)
- For Reference:
-
MPI: The Complete Reference, by Marc Snir et al.
Detailed reference material on collective communications.
- Parallel print function.
PPF is the Parallel Tools consortium's parallel print facility.
For more information, consult the
PPF web page.
The software is installed on Triton in $(PUB)/lib/PPF,
examples in $(PUB)/examples/PPF
(See the
README file for important information about using the software).
|
Lecture 13, 11/3/09: Numerical Linear Algebra on the GPU
- Lecture slides posted
- As explained in class, there will be no formal lecture
(other than some background material on Gaussian Elimination),
and you are expected to contribute to the class discussions.
- Today's reading:
-
Benchmarking GPUs to tune dense linear algebra,
by V. Volkov and J. Demmel.
Proc. 2008 ACM/IEEE Conf. on Supercomputing,
Austin, TX, Nov. 15 - 21, 2008.
- For background on Gaussian Elimination:
- Notes on Gaussian Elimination,
"Quick review of Gaussian Elimination," Jim Demmel, UC Berkeley
-
A detailed explanation with worked examples
http://mathworld.wolfram.com/GaussianElimination.html
|
Lecture 14, 11/5/09: Load Balancing
Lecture 15, 11/10/09: More Irregular Problems; GPU applications
- Lecture slides posted
- Jim Demmel's notes on Simulating Particle Systems.
- S. Sengupta et al.,
Scan Primitives for GPU Computing,
Graphics Hardware, pp. 97-106 (2007).
- For further reading
- Nathan Bell and Michael Garland,
Efficient Sparse Matrix-Vector Multiplication on CUDA, NVIDIA Technical Report NVR-2008-004, Dec. 2008.
-
M. Harris, S. Sengupta and J. D. Owens,
Parallel Prefix Sum (Scan) with CUDA, GPU Gems 3, Chapter 39 (2008).
-
S. Chatterjee, G. E. Blelloch, and M. Zagha.
Scan primitives for vector
computers, Proc 1990 ACM/IEEE Conf. on Supercomputing, New York, pp. 666-675.
-
L. Nyland, M. Harris, and J. Prins,
Fast N-Body Simulation with CUDA, GPU Gems 3, Chapter 31 (2008).
- For further reading on spacefilling curves
- Using Space-filling Curves for Multi-dimensional Indexing by J. K. Lawder and P. J. H. King, 4th July 2000.
-
Space-filling Curves. Applet by V. B. Balayoghan, U. Texas at
Austin.
-
Plane filling curves
-
Space-filling Curves by Zbigniew Fiedorowicz, Ohio State
University.
-
Hilbert Curve (Wolfram Research).
-
The Peano Curve and Fractal Curves.
Excerpt from the 30th Edition of the CRC Standard
Mathematical Tables and Formulas, 1995, CRC (transcribed by
Silvio Levy).
|
Lecture 16, 11/12/09: Multi-tier computing
- Project Progress report due in class
- Lecture slides posted
- There will be no formal lecture
and you are expected to contribute to the class discussions.
- Today's reading:
-
S. B. Baden and S. J. Fink,
Communication overlap in multi-tier parallel algorithms,
Proc. 1998 ACM/IEEE Conference on Supercomputing,
San Jose, CA, Nov 1998, pp. 1-20.
-
M. Kistler, J. Gunnels, D. Brokenshire, and B. Benton,
Petascale
computing with accelerators. SIGPLAN Not. 44(4):241-250 (Feb. 2009).
- To read further about Cell
-
J. Kahle et al.,
Introduction to the cell multiprocessor,
IBM J. Res. Dev.
49(4/5): 589-604, July 2005.
- Kevin Krewell, Cell Moves Into the Limelight, Microprocessor Report
|
Lecture 17, 11/20/09 (Fri 3:00 to 4:20, EBU3B 1202): Programming Language Support
- Lecture slides posted
- Today's reading:
-
M. Frigo, C. E. Leiserson, and K. H. Randall. The implementation of
the Cilk-5 multithreaded language. SIGPLAN Not. 33(5):212-223 (May 1998). DOI
(Also see HPCC06 Challenge Class 2 in Cilk)
- K. Fatahalian et al., "Sequoia: Programming the Memory Hierarchy,"
Proc. ACM/IEEE SC 2006 Conf.
PDF
- For reference: Cilk web page
|
Lecture 18, 12/1: CSE 260 Symposium
Lecture 19, 12/2 (Wed 5:00 to 7:00p, Rm. 4140 EBU3B): CSE 260 Symposium
Lecture 20, 12/3: CSE 260 Symposium
| Maintained by baden@ucsd.edu
[Wed Nov 18 23:52:15 PST 2009]
|