cse.240b Parallel Computer Architecture - Winter 2011
Course Goals
This class is designed to enable students to follow the latest developments in computer architecture, especially those related to parallel computer architecture. Although this is clearly useful for those who wish to do research in computer architecture, it is also useful for those who work in related areas or who have a general interest in the field. The class strives for these goals through four aspects:
|Directory Coherence||Memory Consistency|
|Vector Architectures||Heterogeneous Multi-core|
|Simultaneous Multi-threading||Cell Architecture|
|Arsenal Style Processors|
|Jan 4||The course forum is up! Make sure to sign up in order to receive important course details. You must give your name as your nickname and enter your UCSD email address in the information box. It may take a day or two for you to be confirmed.|
|Jan 4||240B is being held in Peterson 102.|
|Jan 4||Yes, one analysis per paper! No analysis for textbook items UNLESS specified below.|
|Jan 6||For this lecture only, you may turn in your analysis as late as 12:30 p.m.|
|Final Grade||=||Paper Analysis||*||
|Tuesday Classes||Previous Wednesday, 11 p.m.|
|Thursday Classes||Previous Thursday, 11 p.m.|
|Tue, January 04||Overview, Administrivia, Tech Trends|
|Thu, January 06||The State of Parallel Computing; Tech Scaling|| Asanovic et al., "The Landscape of Parallel Computing Research: A View from Berkeley", Tech Report UCB/EECS-2006-183.|
Conservation Cores: Reducing the Energy of Mature Computations (focus on tech scaling portion of the paper)
|Tue, January 11||(continued)|
|Thu, January 13||Case Study: Raw, a Simple Parallel Machine|| The Raw Microprocessor: A Computational Fabric for Software Circuits and General Purpose Programs, Taylor et al, IEEE Micro March/April 2002. |
The Raw Specification, v 5.0, up to, but not including, Section 8.
|Tue, January 18||Case Study: Raw, a Simple Parallel Machine||The Raw Specification, v 5.0, Section 8 to the end.|
|Thu, January 20||Case Study: Raw, a Simple Parallel Machine||Tiled Microprocessors, Taylor, MIT PhD Thesis, 2007, pp. 13-124.|
|Tue, January 25||Case Study: Raw, a Simple Parallel Machine||Tiled Microprocessors, Taylor, MIT PhD Thesis, 2007, pp. 125-169.|
|Thu, January 27||Cache Coherence||H & P 4.1-4.4 (no summary); Do Problems 4.1, 4.3, 4.5, 4.16, 4.17|
|Tue, February 01||Cache Coherence, Part II|| H & P 4.5-4.8 (no summary); |
Lenoski et al., "The directory-based cache coherence protocol for the DASH multiprocessor," in ISCA '90: Proceedings of the 17th annual international symposium on Computer Architecture, pp. 148-159, 1990.
"The DASH prototype: implementation and performance," in ISCA '98: 25 years of the international symposia on Computer architecture (selected papers), pp. 418-429, 1998.
|Thu, February 03||Tilera (David, Sanath)|| Bell, S. et al. "TILE64 - Processor: A 64-core SoC with Mesh Interconnect", ISSCC, February 2008. |
Wentzlaff et al. "On-Chip Interconnection Architecture of the Tile Processor", IEEE Micro, Sept-Oct 2007.
TILE-Gx Processor Family (the link may have changed; find a web resource that describes it).
|Tue, February 08||GPUs/CUDA (Rahimi, Chris)|| NVIDIA Tesla: A Unified Graphics and Computing Architecture. Lindholm et al. IEEE Micro, Mar 2008.|
Scalable Parallel Programming with CUDA. Nickolls et al. ACM Queue. 2008.
Inside Fermi: Nvidia's HPC Push
Additional materials for the day's expert:
The GPU Computing Era, IEEE Micro, Nickolls 2010.
Undergrad text (P&H), 4th edition, has some material on GPUs that looks interesting too.
|Thu, February 10||Tera (Bui presenting)|| Watch Bill Dally of Nvidia lecture about his worldview.|
The Tera computer system, Alverson et al. ICS '90.
|Tue, February 15||No class|
|Thu, February 17||Cilk (Gonzalez, Eisner)|| The Implementation of the Cilk-5 Multithreaded Language. Frigo et al. PLDI 1998.|
The Cilk++ concurrency platform. Leiserson, C. E. 46th ACM/IEEE Design Automation Conference (DAC '09), 2009, pp. 522-527.
|Tue, February 22||Cell (Michael and Sidd)|| Overview of the Architecture, Circuit Design, and Physical Implementation of a First-Generation Cell Processor, by Pham et al. IEEE Journal of Solid-State Circuits, January 2006. |
The Microarchitecture of the Synergistic Processor for a Cell Processor, by Flachs et al. IEEE Journal of Solid-State Circuits, January 2006.
|Thu, February 24||(catch up)||Read Appendix F of H & P.|
|Tue, March 01||(catch up)|| Cache Hierarchy and Memory Subsystem of the AMD Opteron Processor, IEEE Micro 2010. Conway et al.|
IBM Power6 microarchitecture, Le et al., IBM Journal of Research and Development, November 2007. (on IEEE Xplore)
The SGI Origin: A ccNUMA Highly Scalable Server, ISCA 1997. Laudon et al.
|Thu, March 03||Vector Machines / SIMD (Vikram, Meenakshi)||Read H & P Appendix H.7; Do H & P problems 4.18, 4.20, 4.21|
|Tue, March 08|| Modern Shared Memory OOO Multiprocessors (Futrell, Olofson) |
FPGAs (Venkatesh, Sakdhnagool)
|Thu, March 10||Last Test|