CSE 240B Parallel Computer Architecture - Winter 2012
Course Goals
This class is designed to enable students to follow the latest developments in computer architecture, especially those related to parallel computer architecture. Although this is clearly useful for those who wish to do research in computer architecture, it is also useful for those who work in related areas or who have general interests. The class strives for these goals through four aspects:
|Directory Coherence||Memory Consistency|
|Vector Architectures||Heterogeneous Multi-core|
|Simultaneous Multi-threading||Cell Architecture|
|Arsenal Style Processors|
|Jan 12||The course forum is up! Make sure to sign up in order to receive important course details. Click here to join. You must give your name as your nickname and enter your UCSD email address in the information box. It may take a day or two for you to be confirmed.|
|Jan 12||240B is being held in HSS 2321|
|Jan 12||Yes, one analysis per paper! No analysis for textbook items UNLESS specified below.|
|Final Grade||=||Paper Analysis||*||
|Tuesday Classes||Previous Wednesday, 11 p.m.|
|Thursday Classes||Previous Thursday, 11 p.m.|
|Tue, January 10||Peru Day|
|Thu, January 12||Overview, Administrivia, Tech Trends|| Asanovic et al., "The Landscape of Parallel Computing Research: A View from Berkeley", Tech Report UCB/EECS-2006-183.|
|Tue, January 17||The State of Parallel Computing; Tech Scaling||Conservation Cores: Reducing the Energy of Mature Computations (focus on tech scaling portion of the paper). Q1: If we do a straightforward scaling according to the paper, what will desktop processors look like in 20 years? How about mobile phone processors? How dark will they be? If dark silicon is replaced with cache, how much cache will the chips have?|
|Thu, January 19||Case Study: Raw, a Simple Parallel Machine|| The Raw Microprocessor: A Computational Fabric for Software Circuits and General Purpose Programs, Taylor et al, IEEE Micro March/April 2002. |
The Raw Specification, v 5.0, up to, but not including, Section 8 (no summaries necessary for Raw Spec).
|Tue, January 24||Case Study: Raw, a Simple Parallel Machine||The Raw Specification, v 5.0, Section 8 to end.|
|Thu, January 26||Case Study: Raw, a Simple Parallel Machine||Tiled Microprocessors, Taylor, MIT PhD Thesis, 2007. p 13 - p 124.|
|Tue, January 31||Case Study: Raw, a Simple Parallel Machine|| Tiled Microprocessors, Taylor, MIT PhD Thesis, 2007. p 125 - p 169. (yes, do a second summary; you should be thinking analytically about everything you read.) |
No class; watch Bill Dally of Nvidia lecture about his worldview.
|Thu, February 02||Cache Coherence|| H & P 4.1-4.4 (no summary); Do Problems 4.1, 4.3, 4.5, 4.16, 4.17 |
|Tue, February 07||Cilk|| The Implementation of the Cilk-5 Multithreaded Language. Frigo et al. PLDI 1998.|
The Cilk++ Concurrency Platform. Leiserson, C.E. 46th ACM/IEEE Design Automation Conference (DAC '09), 2009, pp. 522-527.
Read this blog snapshot of Leiserson arguing Cilk's superiority versus OpenMP. Also included is a discussion of locking in Cilk.
|Thu, February 09||Cache Coherence, Part II|| H & P 4.5-4.8 (no summary); |
Lenoski et al., "The directory-based cache coherence protocol for the DASH multiprocessor," in ISCA '90: Proceedings of the 17th annual international symposium on Computer Architecture, pp. 148-159, 1990.
"The DASH prototype: implementation and performance," in ISCA '98: 25 years of the international symposia on Computer architecture (selected papers), pp. 418-429, 1998.
|Tue, February 14||Tilera|| Bell, S. et al. "TILE64 Processor: A 64-Core SoC with Mesh Interconnect", ISSCC, February 2008. |
Wentzlaff et al. "On-Chip Interconnection Architecture of the Tile Processor", IEEE Micro, Sept-Oct 2007.
TILE-Gx Processor Family (link may have changed; find some web resource that describes the TILE-Gx100).
Cuckoo Directory: A Scalable Directory for Many-Core Systems, Falsafi, HPCA 2011.
|Thu, February 16||GPUs/CUDA & Tera|| NVIDIA Tesla: A Unified Graphics and Computing Architecture, Lindholm et al. IEEE Micro, March 2008.|
Scalable Parallel Programming with CUDA. Nickolls et al. ACM Queue. 2008.
The Tera computer system, Alverson et al. ICS '90. link
Inside Fermi: Nvidia's HPC Push
Additional materials for day's expert:
The GPU Computing Era, IEEE Micro, Nickolls 2010.
Undergrad text (P&H), 4th edition, has some materials on GPUs that look interesting too.
|Tue, February 21||Cell|| Overview of the Architecture, Circuit Design, and Physical Implementation of a First-Generation Cell Processor, by Pham et al. IEEE Journal of Solid-State Circuits, January 2006. |
The Microarchitecture of the Synergistic Processor for a Cell Processor, by Flachs et al. IEEE Journal of Solid-State Circuits, January 2006.
|Thu, February 23||Dark Silicon|| The GreenDroid Mobile Application Processor: An Architecture for Silicon's Dark Future, IEEE Micro 2011, Goulding-Hotta |
Toward Dark Silicon in Servers, IEEE Micro 2011, Falsafi.
Skadron, IEEE Micro 2011, Dark Silicon
|Tue, February 28||(catch up)||Read Appendix F of H & P.|
|Thu, March 01||(catch up)|| Cache Hierarchy and Memory Subsystem of the AMD Opteron Processor, IEEE Micro 2010. Conway et al.|
IBM Power6 microarchitecture, Le et al., IBM Journal of Research and Development, November 2007. (on IEEE Xplore)
The SGI Origin: A ccNUMA Highly Scalable Server, ISCA 1997. Laudon et al.
|Tue, March 06||Vector Machines / SIMD|| Read H & P Appendix H.7; Do H & P problems 4.18, 4.20, 4.21|
Exploring the Tradeoffs between Programmability and Efficiency in Data-Parallel Accelerators ISCA 2011, Yunsup Lee, Asanovic
|Thu, March 08||Scalable Out-of-order ILP Machines||WaveScalar MICRO 2003 paper; TRIPS ISCA 2003 paper|
|Tue, March 13||Single-ISA Heterogeneous Multicore|| Single-ISA Heterogeneous Multi-Core Architectures: The Potential for Processor Power Reduction, MICRO 2003, Tullsen|
ISCA 2010; Horowitz; Understanding Sources of Inefficiency in General-Purpose Chips
|Thu, March 15||Last Test|