cse.240b Parallel Computer Architecture - Winter 2011


Course Goals

This class is designed to enable students to follow the latest developments in computer architecture, especially those related to parallel computer architecture. Although this is clearly useful for those who wish to do research in computer architecture, it is also useful for those who work in related areas or who have general interests. The class strives for these goals through four aspects:
  1. Covering advanced material which is commonly understood by practicing architects but is not covered in core-level grad classes.

  2. Presenting programming assignments that facilitate advanced understanding.

  3. Providing students with the opportunity to
    { find, analyze, communicate, discuss } advanced material.

  4. Examining fundamental ideas that are the "frontier" of the field and have not yet made it into industry. This will be done through reading papers and through course assignments.

Prof. Michael Taylor


Directory Coherence
Memory Consistency
Interconnection Networks
Synchronization
Transactions
Streams
Vector Architectures
Heterogeneous Multi-core
Simultaneous Multi-threading
Cell Architecture
GPUs
Tiled Architectures
Arsenal Style Processors


Jan 4: The course forum is up! Make sure to sign up in order to receive important course details. Click here to join. You must give your name as your nickname, and enter your UCSD email address in the information box. It may take a day or two for you to be confirmed.
Jan 4: 240B is being held in Peterson 102.
Jan 4: Yes, one analysis per paper! No analysis for textbook items UNLESS specified below.
Jan 6: For this lecture only, you may turn in your analysis as late as 12:30 p.m.

Course Materials

The class will consist of readings generally found in the following locations:
  1. IEEE Xplore (free access from UCSD network)
    For IEEE publications.

  2. ACM Portal (free access from UCSD network)
    For ACM publications.

  3. Computer Architecture: A Quantitative Approach, Hennessy & Patterson.
    Hopefully, you already have this.

    I will not post links to the articles, because I want locating these papers to become second nature to you.


I expect a high level of work quality and independence in this class, since it is an advanced graduate class. Participation in in-class discussions is an integral part of the class, and comprises a significant component of the class participation grade.

In true computer architecture form, your Spec240B number (also known as your grade!) includes the multiplication function:
Final Grade = Paper Analysis * (weighted sum of the components below)

Class Participation                               25%
Mini Research Exam(s)                             30%
Final Paper                                       25%
Quizzes + Last Test (held on last day of class)   20%
(subject to change as class unfolds)
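
As a rough illustration of the formula above, the sketch below computes a grade under the assumption that Paper Analysis and every component score are expressed as fractions between 0 and 1. The function name, argument names, and example scores are hypothetical and not part of the course materials; the weights are the ones listed above and may change.

```python
# Weights from the grading table above (subject to change as the class unfolds).
WEIGHTS = {
    "class_participation": 0.25,
    "mini_research_exams": 0.30,
    "final_paper": 0.25,
    "quizzes_and_last_test": 0.20,
}


def final_grade(paper_analysis, scores):
    """Return paper_analysis multiplied by the weighted sum of component scores.

    Assumption: paper_analysis and every value in scores are fractions in [0, 1].
    """
    weighted_sum = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    return paper_analysis * weighted_sum


# Hypothetical scores, for illustration only.
example_scores = {
    "class_participation": 1.0,
    "mini_research_exams": 0.9,
    "final_paper": 0.8,
    "quizzes_and_last_test": 1.0,
}
print(round(final_grade(1.0, example_scores), 3))
```

Because Paper Analysis multiplies the whole sum, a shortfall there scales down every other component rather than just one line of the table.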

Paper Analysis

Since much of the class will consist of discussions, it is absolutely essential that you do the reading BEFORE class. There is no greater waste of everybody's time than a discussion class where nobody has read the paper. To help keep the quality of class high, we will collect responses to a set of questions for each paper via a Google form. A script will be used to harvest results; it will take the last entry that you submitted before 11:30 am on the day of the class. I will provide a 5-minute grace period, no exceptions.

If you have concerns over this policy, I recommend you submit early to allow yourself margin for unexpected issues; maybe even the day before! To ensure that you receive credit, you should record your notes in a Google Doc and then copy them into the form after you are done. This way, if there are any issues with the form, you will be able to show both your content and the times it was written. I will provide people with a list of their answers at the end of class to confirm grades.
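
The harvesting rule described above (last entry before 11:30 am on the day of class, plus a 5-minute grace period) can be sketched roughly as follows. This is not the actual course script: the function name, the `(timestamp, answers)` tuple layout, and the choice to treat a submission at exactly the cutoff as on time are all assumptions made for illustration.

```python
from datetime import date, datetime, timedelta

GRACE = timedelta(minutes=5)  # grace period stated in the policy


def pick_entry(entries, class_day):
    """Return the last (timestamp, answers) entry submitted by the cutoff, or None.

    entries: list of (datetime, answers) tuples for one student and one paper
             (hypothetical layout).
    class_day: datetime.date of the class meeting.
    """
    deadline = datetime(class_day.year, class_day.month, class_day.day, 11, 30)
    cutoff = deadline + GRACE
    # Assumption: an entry stamped exactly at the cutoff still counts.
    on_time = [entry for entry in entries if entry[0] <= cutoff]
    if not on_time:
        return None
    return max(on_time, key=lambda entry: entry[0])


# Hypothetical submissions for a Thursday class.
entries = [
    (datetime(2011, 1, 13, 9, 0), "first draft"),
    (datetime(2011, 1, 13, 11, 33), "revised"),
    (datetime(2011, 1, 13, 11, 40), "too late"),
]
print(pick_entry(entries, date(2011, 1, 13)))
```

Under these assumptions, resubmitting is harmless: only the most recent on-time entry is kept, and anything after the grace period is ignored.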

Your Paper Analysis Grade will be based on the percentage of paper summaries that you have filled out with thoughtful (but not perfect) answers. If 90% appear thoughtful, then you will receive full credit. Thus, if an act of god or force majeure causes two classes' worth of entries to be unentered, it will not affect your grade. However, the third such event will affect your grade as much as getting 25 % wrong on your final!
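
The 90% threshold can be checked with a few lines of arithmetic. The sketch below only tests whether full credit is reached; it does not model how the grade falls off below 90%, since that is not spelled out here. The function name and the example count of 20 analyses are assumptions, as is reading the threshold as "at least 90%".

```python
def paper_analysis_full_credit(thoughtful, total):
    """Return True if at least 90% of the analyses were judged thoughtful."""
    if total <= 0:
        raise ValueError("total must be positive")
    # Integer comparison avoids floating-point rounding at the 90% boundary.
    return 10 * thoughtful >= 9 * total


# Hypothetical quarter with 20 analyses.
print(paper_analysis_full_credit(18, 20))  # two missed entries
print(paper_analysis_full_credit(17, 20))  # three missed entries
```

With 20 analyses, two missed entries still leave exactly 90%, while a third drops below the threshold.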

Final Paper

Due 3/17 (or earlier, no extensions.)

Write a 4-page proposal for a 5-year research horizon, like a professor would write for a grant.
What are the emerging trends that are happening today and will continue into the future?
How will architectures evolve going into the future?
Keep in mind that if you write about ideas that have already been published,
then you are targeting a zero (or less) horizon, not a 5-year horizon!

Your paper should have ~1 page of introduction motivating the trends that lead to your vision,
~2 pages describing your vision, and ~1 page of related work.
You should cite at least 15 relevant papers.

Grammar and spelling are important. I recommend LaTeX, but it is not mandatory.

Do not parrot your advisor!

Plagiarizing is the only way you can fail this class. Don't do it! I have read many, many papers.


The mini-research exams and the last test will test different things. The mini-research exams will test your ability to really understand a subject in depth. The last test will cover the breadth of your knowledge of the material in the class. This material will include both the reading and topics that come up in class discussion. Some of these topics will almost certainly not be in the reading. Generally speaking, the last test will be fairly easy if you have done the reading carefully, participated in the in-class discussion, and jotted down a few keywords to remind yourself what to study (e.g., via an internet source) later.

Mini Research Exams

Each student's "mini research exam" consists of presenting the papers that were assigned for a particular day (20 minutes per student when presenting with another student, otherwise 30-45 minutes). This will mirror, to some degree, the research exam that PhD students have to do after their third year in the PhD program. Each day, one or two students take responsibility for being the "class experts" for a particular set of papers that we read. They will give a 40-minute presentation on the papers (roughly 20 minutes each). This presentation will overview background material and the context of the research compared to the related work and the time it was written, motivate the problem the paper is trying to solve, present the key ideas in the paper using the IMD (ideas, mechanisms, dinosaurs) framework, and propose future directions or questions. Be sure to go through the key architecture mechanisms proposed in the paper in detail. You must create your own slides; you may not simply download a talk from the internet. However, you may reuse diagrams.

At the end of the presentation, we will have the class fill out evaluation forms giving each student feedback on their presentation.

Each student will have to do this a few times, dependent on enrollment.

In order to do this, student experts must read other materials outside the paper (tracing back the references) in order to give the class additional context. It should be clear from the presentation that the students have done this.

I expect that student experts will practice the presentation in order to make sure that it fits in the allotted time and has appropriate transitions. This is a good preparation for giving talks, whether for communicating one's own research, or for a research exam.

Logistically, this works in the following way - students will sign up for a given day ahead of time. They will prepare the slides, and will send them to me on the following schedule:

Day                 Due
Tuesday Classes     Previous Wednesday, 11 p.m.
Thursday Classes    Previous Thursday, 11 p.m.

If I do not receive it by those times, a late penalty may apply. I will follow up with feedback by Sunday at noon, quite possibly even earlier. Remember, this is worth 20% of your Spec240B. After getting feedback, you will revise the presentation, practice it and then give it in class. During the presentation, students are free to ask questions of the experts, and if a discussion topic arises, we will pause to discuss it.
email: mbtaylor at you see ess dee dot ee dee you
web:   Michael Taylor's Website.

NOTE: Schedule is highly subject to change.
Tue, January 04 Overview, Administrivia, Tech Trends
Thu, January 06 The State of Parallel Computing; Tech Scaling Asanovic et al, "The Landscape of Parallel Computing Research: A View from Berkeley", Tech Report UCB/EECS-2006-183.

Conservation Cores: Reducing the Energy of Mature Computations (focus on tech scaling portion of the paper)
Tue, January 11 (continued)
Thu, January 13 Case Study: Raw, a Simple Parallel Machine The Raw Microprocessor: A Computational Fabric for Software Circuits and General Purpose Programs, Taylor et al, IEEE Micro March/April 2002.

The Raw Specification, v 5.0, up to, but not including, Section 8.
Tue, January 18 Case Study: Raw, a Simple Parallel Machine The Raw Specification, v 5.0, Section 8 to end.
Thu, January 20 Case Study: Raw, a Simple Parallel Machine Tiled Microprocessors, Taylor, MIT PhD Thesis, 2007. p 13 - p 124.
Tue, January 25 Case Study: Raw, a Simple Parallel Machine Tiled Microprocessors, Taylor, MIT PhD Thesis, 2007. p 125 - p 169.
Thu, January 27 Cache Coherence H & P 4.1-4.4 (no summary); Do Problems 4.1, 4.3, 4.5, 4.16, 4.17
Tue, February 01 Cache Coherence, Part II H & P 4.5-4.8 (no summary);

Lenoski et al., "The directory-based cache coherence protocol for the DASH multiprocessor," in ISCA '90: Proceedings of the 17th annual international symposium on Computer Architecture, pp. 148-159, 1990.

"The DASH prototype: implementation and performance," in ISCA '98: 25 years of the international symposia on Computer architecture (selected papers), pp. 418-429, 1998.
Thu, February 03 Tilera (David, Sanath) Bell, S et al. "TILE64 - Processor: A 64-core SoC with Mesh Interconnect", ISSCC, February 2008.

Wentzlaff et al. "On-Chip Interconnection Architecture of the Tile Processor", IEEE Micro, Sept-Oct 2007.

TILE-Gx Processor Family (link may have changed, find some web resource that describes it).
Tue, February 08 GPUs/CUDA (Rahimi, Chris) NVIDIA Tesla: A Unified Graphics and Computing Architecture Lindholm, et al. Micro, IEEE Mar 2008.

Scalable Parallel Programming with CUDA. Nickolls et al. ACM Queue. 2008.

Inside Fermi: Nvidia's HPC Push

Additional materials for day's expert:

The GPU Computing Era, IEEE Micro, Nickolls 2010.

Undergrad text (P&H), 4th edition, has some materials on GPUs that look interesting too.
Thu, February 10 Tera (Bui Presenting) Watch Bill Dally of Nvidia lecture about his worldview

The Tera computer system, Alverson et al. ICS '90.
Tue, February 15 No class
Thu, February 17 Cilk (Gonzalez, Eisner) The Implementation of the Cilk-5 Multithreaded Language. Frigo et al. PLDI 1998.

The Cilk++ concurrency platform. Leiserson, C.E. 46th ACM/IEEE Design Automation Conference (DAC '09), 2009, pp. 522-527.
Tue, February 22 Cell (Michael and Sidd) Overview of the Architecture, Circuit Design, and Physical Implementation of a First-Generation Cell Processor, by Pham et al. IEEE Journal of Solid-State Circuits, January 2006.

The Microarchitecture of the Synergistic Processor for a Cell Processor, by Flachs et al. IEEE Journal of Solid-State Circuits, January 2006.
Thu, February 24 (catch up) Read Appendix F of H & P.
Tue, March 01 (catch up) Cache Hierarchy and Memory Subsystem of the AMD Opteron Processor, IEEE Micro 2010. Conway et al.

IBM Power6 microarchitecture, Le et al, IBM Journal of Research and Development November 2007. (on IEEE Xplore)

The SGI Origin: A ccNUMA Highly Scalable Server, ISCA 1997. Laudon et al.
Thu, March 03 Vector Machines / SIMD (Vikram, Meenakshi) Read H & P Appendix H.7; Do H & P problems 4.18, 4.20, 4.21
Tue, March 08 Modern Shared Memory OOO Multiprocessors (Futrell, Olofson)

FPGAs (Venkatesh, Sakdhnagool)
Thu, March 10 Last Test