cse.240b Parallel Computer Architecture - Winter 2010
Course Goals
This class is designed to enable students to follow the latest developments in computer architecture, especially those related to parallel computer architecture. Although this is clearly useful for those who wish to do research in computer architecture, it is also useful for those who work in related areas or who have general interests. The class strives for these goals through four aspects:
|
     |
|
Directory Coherence | Memory Consistency |
Interconnection Networks | Synchronization |
Transactions | Streams |
Vector Architectures | Heterogeneous Multi-core |
Simultaneous Multi-threading | Cell Architecture |
GPUs | Tiled Architectures |
CUDA | Cilk |
Arsenal Style Processors |
January 5, 2010 | The course forum is up! Make sure to sign up in order to receive important course details. Click here to join. You must give your name as your nickname and enter your UCSD email address in the information box. It may take a day or two for you to be confirmed. |
January 5, 2010 | 240B is being held in room 4140 in the Computer Science and Engineering Building! |
January 10, 2010 | Yes, one analysis per paper! No analysis for textbook items UNLESS specified below. |
Final Grade | = | Paper Analysis | * |
|
Day | Due |
---|---|
Tuesday Classes | Previous Wednesday, 11 p.m. |
Thursday Classes | Previous Thursday, 11 p.m. |
H&P | Computer Architecture: A Quantitative Approach, 4th Ed., Hennessy and Patterson. |
Due | Item |
---|---|
Tu | Jan | 5 | First Class. |
Th | 7 | Multiprocessors / Coherence: H & P 4th ed., 4.1-4.4 |
Tu | 12 |
Advanced Coherence and DSM
D. Lenoski, J. Laudon, K. Gharachorloo, A. Gupta, and J. Hennessy, "The directory-based cache coherence protocol for the DASH multiprocessor," in ISCA '90: Proceedings of the 17th annual international symposium on Computer Architecture, pp. 148-159, 1990.
D. Lenoski, J. Laudon, T. Joe, D. Nakahira, L. Stevens, A. Gupta, and J. Hennessy, "The DASH prototype: implementation and performance," in ISCA '98: 25 years of the international symposia on Computer architecture (selected papers), pp. 418-429, 1998.
H & P 4.5 - 4.8 (please submit an "analysis" for the description of the Sun T1, for a total of 3 analyses today)
Th | 14 |
Tu | 19 |
Tiled Microprocessors
Tiled microprocessors are interesting because they blur the boundaries between multiprocessors and microprocessors.
The Raw Microprocessor: A Computational Fabric for Software Circuits and General Purpose Programs, by Michael Bedford Taylor, Jason Kim, Jason Miller, David Wentzlaff, Fae Ghodrat, Ben Greenwald, Henry Hoffman, Jae-Wook Lee, Paul Johnson, Walter Lee, Albert Ma, Arvind Saraf, Mark Seneski, Nathan Shnidman, Volker Strumpen, Matt Frank, Saman Amarasinghe, and Anant Agarwal. IEEE Micro, March/April 2002.
Evaluation of the Raw Microprocessor: An Exposed-Wire-Delay Architecture for ILP and Streams, by Michael Bedford Taylor, Walter Lee, Jason Miller, David Wentzlaff, Ian Bratt, Ben Greenwald, Henry Hoffmann, Paul Johnson, Jason Kim, James Psota, Arvind Saraf, Nathan Shnidman, Volker Strumpen, Matt Frank, Saman Amarasinghe, and Anant Agarwal. Proceedings of the International Symposium on Computer Architecture, June 2004.
Space-Time Scheduling of Instruction-Level Parallelism on a Raw Machine, by Walter Lee, Rajeev Barua, Matthew Frank, Devabhaktuni Srikrishna, Jonathan Babb, Vivek Sarkar, and Saman Amarasinghe. Proceedings of the Eighth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VIII), San Jose, CA, October 4-7, 1998.
Tu | 26 |
Tilera: Tiled Processors Commercialized
Presenter: Gopi
Bell, S. et al. "TILE64 - Processor: A 64-core SoC with Mesh Interconnect", ISSCC, February 2008.
Wentzlaff et al. "On-Chip Interconnection Architecture of the Tile Processor", IEEE Micro, Sept-Oct 2007.
TILE-Gx Processor Family
Tilera: 5 Innovations
Tu | Feb | 2 | Assignment 1 Out. |
Th | Feb | 4 | Tiled Discussion. |
Tu | Feb | 9 | Assignment 2. |
Tu | 16 |
IBM / Sony Cell
Presenter: Samson
M. Baron, "The Cell, At One," Microprocessor Report, March 2006.
Overview of the Architecture, Circuit Design, and Physical Implementation of a First-Generation Cell Processor, by Pham et al. IEEE Journal of Solid-State Circuits, January 2006.
The Microarchitecture of the Synergistic Processor for a Cell Processor, by Flachs et al. IEEE Journal of Solid-State Circuits, January 2006.
Th | 18 |
Vectors
Appendix F of H & P.
Su | 21 | Assignment 2 due (11:59p); Assignment 3 out |
Tu | 23 |
Tiled Dataflow
Presenter: Sravanthi Kota Venkata
K. Sankaralingam, R. Nagarajan, H. Liu, C. Kim, J. Huh, D. Burger, S. W. Keckler, and C. R. Moore, "Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture," ISCA 2003.
S. Swanson, A. Schwerin, M. Mercaldi, A. Petersen, A. Putnam, K. Michelson, M. Oskin, and S. J. Eggers, "The WaveScalar Architecture," to appear in ACM Transactions on Computer Systems.
Th | 25 |
Cilk
The Implementation of the Cilk-5 Multithreaded Language. Frigo et al. PLDI 1998.
The Cilk++ Concurrency Platform. Leiserson, C.E. 46th ACM/IEEE Design Automation Conference (DAC '09), 2009, pp. 522-527.
SMT
Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor. Dean Tullsen, Susan Eggers, Joel Emer, Henry Levy, Jack Lo, and Rebecca Stamm. Proceedings of the 23rd Annual International Symposium on Computer Architecture, May 1996.
E. İpek, M. Kırman, N. Kırman, and J.F. Martinez. Core Fusion: Accommodating software diversity in chip multiprocessors. In Intl. Symp. on Computer Architecture, San Diego, CA, June 2007.
Mar | 1 | Programming Assignment Three Due (11:59p) |
Tu | Mar | 2 |
Streams
J.D. Owens, P.R. Mattson, S. Rixner, W.J. Dally, U.J. Kapasi, B. Khailany, A. Lopez-Lagunas, "A Bandwidth-Efficient Architecture for Media Processing," 31st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'98), 1998.
Rixner et al. Memory access scheduling. International Symposium on Computer Architecture, Vancouver, British Columbia, Canada, 2000.
Pizza Party; brief analysis of experiment outcome
Th | 4 |
GPU
NVIDIA Tesla: A Unified Graphics and Computing Architecture. Lindholm, E.; Nickolls, J.; Oberman, S.; Montrym, J. IEEE Micro, Mar 2008.
Scalable Parallel Programming with CUDA. Nickolls et al. ACM Queue, 2008.
Tu | 9 |
Arsenal Style Processors
Conservation Cores: Reducing the Energy of Mature Computations. Ganesh Venkatesh, John Sampson, Nathan Goulding, Saturnino Garcia, Slavik Bryskin, Jose Lugo-Martinez, Steven Swanson, and Michael Bedford Taylor. Architectural Support for Programming Languages and Operating Systems, March 2010.
Asanovic et al., "A View of the Parallel Computing Landscape", Communications of the ACM, October 2009.
Th | 11 |
No class.
Th | 18 | Final Exam |