CSE.240b Advanced Graduate Computer Architecture - Spring 2006


Course Goals

This class is designed to enable students to follow the latest developments in computer architecture. Although this is clearly useful for those who wish to do research in computer architecture, it is also useful for those who work in related areas or who have general interests. The class pursues these goals in four ways:
  1. Covering advanced material which is commonly understood by practicing architects but is not covered in core-level grad classes.

  2. Presenting programming assignments that facilitate advanced understanding.

  3. Providing students with the opportunity to
    { find, analyze, communicate, discuss } advanced material.

  4. Examining fundamental ideas that are at the "frontier" of the field and have not yet made it into industry. This will be done through reading papers and through a course project.

Prof. Michael Taylor


April 11, 2006: The course forum (located here) is up!
April 11, 2006: Details on the midterm are available below.
April 23, 2006: Section on journal entries added to website.
May 1, 2006: Assignment 1 posted.
May 11, 2006: Final Project Details posted.
May 25, 2006: Final Project: Project 1, Part II updated.

Course Materials

The class will consist of readings generally found in the following locations:
  1. Readings in Computer Architecture, edited by Hill, Jouppi, and Sohi.
    Tome of classic papers frequently unavailable online.

  2. IEEE Xplore (free access from UCSD network)
    For IEEE publications.

  3. ACM Portal (free access from UCSD network)
    For ACM publications.

  4. Computer Architecture: A Quantitative Approach, Hennessy & Patterson.
    Hopefully, you already have this.


Approximate grading percentages (subject to change with advance notice):

Class (and forum) Participation: 25 %
Assignments: 25 %
Project: 30 %
Final Exam: 10 %
Midterm Exam: 10 %
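
As a rough illustration of how these weights combine, here is a minimal sketch. It assumes each component is scored on a 0-100 scale and that the overall score is a plain weighted average; neither assumption is stated on this page, and the names `WEIGHTS` and `course_score` are hypothetical. The weights themselves are the approximate ones listed above and are subject to change.

```python
# Weights copied from the approximate grading breakdown above, in percent.
# The 0-100 scale and the plain weighted average are assumptions for illustration.
WEIGHTS = {
    "participation": 25,
    "assignments": 25,
    "project": 30,
    "final": 10,
    "midterm": 10,
}


def course_score(scores):
    """Return the weighted average of component scores (each assumed 0-100)."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must cover exactly the graded components")
    total_weight = sum(WEIGHTS.values())  # 100 with the weights above
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / total_weight


example = {
    "participation": 80,
    "assignments": 90,
    "project": 85,
    "final": 70,
    "midterm": 100,
}
print(course_score(example))
```

Because participation and assignments together carry half the weight, the example shows that a weak final exam moves the overall score much less than weak forum and in-class participation would.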

I expect a generally high level of work quality and independence in this class, since it is an advanced graduate class. Participation in both in-class discussions and in the online forum is an integral part of the class, and comprises a significant component of the class participation grade.

Use of the Forum

The forum (located here) is here to enrich the class. Since the class has no TA, I had to choose between easy programming assignments that pose few challenges for the students, and challenging programming assignments that the class works together on the forum to solve. I chose the latter. Here are some guidelines:
  1. The forum is a great place to vet ideas and ask questions about programming assignments.

  2. The forum is a great place to clarify problems or preliminary roadblocks with the readings. Deeper questions may be more appropriate for class discussion.

  3. Students should feel free to answer other students' questions. Generally, if you can answer a question, you should, because it will count for class participation. I generally will wait 24 hours before responding to a question to allow students to respond.

  4. Unless the question has already been clearly answered elsewhere, I will assume the responsibility of answering course policy questions.

  5. At least one of the assignments will require class-wide collaboration via the forum in order to complete!

  6. Each assignment will specify a few guidelines (regarding, for instance, whether it is o.k. to post code) for the use of the forum as it applies to the assignment.

Upcoming Programming Assignments

1. Understanding Current Architectures through Multiprocessor Programming

The class will use the DataStar Supercomputing cluster at the San Diego Supercomputer Center.

DataStar has 2464 processors and is the 35th most powerful supercomputer in the world, and you get to use it. Frankly, that's awesome. Since this is a computer architecture class (rather than a "high performance computing" class), our focus will be on a single DataStar node: the IBM p655+.

The p655+ employs an 8-processor Power4+-based multichip module (see figure), which consists of four integrated dual-core chips.

There will be a series of related programming assignments using these machines.

San Diego Supercomputer Center's DataStar
     The p655 Multichip Module (MCM).

Course Project

See Final Project Description.

Midterm and Final Exams

The midterm and final exams will test different things. The midterm will test your ability to really understand a subject in depth. The final will cover the breadth of your knowledge of the material in the class. Generally speaking, the final will be fairly easy if you have read the papers and participated in the in-class discussion.

The midterm

Each student's "midterm exam" consists of presenting (with another student, for 20 minutes per student) the papers that were assigned for a particular day. Each day, two students take responsibility for being the "class experts" for a particular set of papers that we read. They will give a 40-minute presentation on the papers (roughly 20 minutes each). This presentation will give an overview of the background material, motivate the problem the paper is trying to solve, and present the key ideas in the paper. Be sure to go through the key architecture mechanisms proposed in the paper in detail.

Each student will have to do this only once.

In order to do this, I expect that the student experts will read other materials outside the paper (perhaps by tracing back the references) in order to understand the context of the paper.

I expect that student experts will practice the presentation in order to make sure that it fits in the allotted time and has appropriate transitions. This is a good preparation for giving talks, whether for communicating one's own research, or for a research exam.

Logistically, this works in the following way: students will sign up for a given day ahead of time. They will prepare the slides and arrange a meeting with me to review them, ideally 3-4 days before the class. Then, they will revise the presentation, practice it, and give it in class. During the presentation, students are free to ask questions of the experts, and if a discussion topic arises, we will pause to discuss it.

Reading Assignments

A note on journal entries (or "writeups")

The writeups should focus on *your* thoughts rather than summarizing the paper. This is much like in literature class where you were expected not to summarize the book you read, but discuss, analyze and critique it. I want to see your processed thoughts on the page. In general, feel free to pick some interesting aspect of the paper and discuss your ideas or thoughts. Or, you could pick some part of the paper that was challenging for you, figure it out, and then explain it. Or you could just have a list of interesting questions you thought of.

If you have a single page that spends most of its space on your thoughts (it might be a discussion of a single issue you thought about in depth, or it could be a list of a bunch of ideas or thoughts you had), that's sufficient - you don't need to write a novel.

As long as it's clear that you are thinking about the material, don't worry about whether you're getting things absolutely correct, etc. And don't worry about making every writeup stellar. Pretty much, at the end of the class, I'm going to flip through your journal entries and see if it looks like you are generally *thinking* about the papers.

Please refer to the course forum ("Course Administration") for an extended list of writeup ideas.

The readings

Abbreviations (if none given, available on IEEE Xplore or ACM Portal):
H&P Computer Architecture: A Quantitative Approach, 3rd Ed., Hennessy and Patterson.
RiCA Readings in Computer Architecture, eds. Hill, Jouppi, and Sohi.

Due         Item
Tu 4-4 First Class.
Th 4-6 Multiprocessors I
H&P 6.1 - 6.5
Monday 4-10 Tuesday's class is rescheduled to 10:50 AM - 12:10 PM, Monday, April 10, in EBU-3B 1202.
Steve Swanson, a faculty candidate, will be talking on Wavescalar.
Attendance is required unless you have a significant conflict.
Please get an extra seat from the side, so that the class does not take up all of the seating.
Please read the following papers in preparation for the talk:
A Preliminary Architecture for a Basic Data-Flow Processor, Jack Dennis et al., Proceedings of the International Symposium on Computer Architecture (ISCA) 1975. (RiCA)
Wavescalar, Steven Swanson et al., Proceedings of the International Symposium on Microarchitecture (MICRO) 2003.
Please write a short (1-2 page) analysis of Swanson's talk and research. Please describe its relationship to Jack Dennis's paper.
Tu 4-11 Class moved to Monday, April 10 at 10:50 am, EBU-3B 1202.
Th 4-13 Multiprocessors II
H&P 6.6-6.8, 6.11, 6.13
Tu 4-18 Tiled Microprocessors I: Raw (Sashi, Donghwan)
Tiled Microprocessors are interesting because they blur the boundaries between multiprocessors and microprocessors.
The Raw Microprocessor: A Computational Fabric for Software Circuits and General Purpose Programs, by Michael Bedford Taylor, Jason Kim, Jason Miller, David Wentzlaff, Fae Ghodrat, Ben Greenwald, Henry Hoffmann, Jae-Wook Lee, Paul Johnson, Walter Lee, Albert Ma, Arvind Saraf, Mark Seneski, Nathan Shnidman, Volker Strumpen, Matt Frank, Saman Amarasinghe and Anant Agarwal. IEEE Micro, March/April 2002.
Evaluation of the Raw Microprocessor: An Exposed-Wire-Delay Architecture for ILP and Streams, by Michael Bedford Taylor, Walter Lee, Jason Miller, David Wentzlaff, Ian Bratt, Ben Greenwald, Henry Hoffmann, Paul Johnson, Jason Kim, James Psota, Arvind Saraf, Nathan Shnidman, Volker Strumpen, Matt Frank, Saman Amarasinghe, and Anant Agarwal. Proceedings of the International Symposium on Computer Architecture, June 2004.
Space-Time Scheduling of Instruction-Level Parallelism on a Raw Machine, by Walter Lee, Rajeev Barua, Matthew Frank, Devabhaktuni Srikrishna, Jonathan Babb, Vivek Sarkar, and Saman Amarasinghe. Proceedings of the Eighth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VIII), San Jose, CA, October 4-7, 1998.
Write a 1-3 page analysis of the key ideas of the three papers. (Class Experts' Presentation)
Th 4-20 Tiled Microprocessors II: GRID/TRIPS and Wavescalar (Willis, Adam)
We continue with Grid/TRIPS and Wavescalar, which extend Raw's distributed execution model with features found in out-of-order superscalar and dataflow processors. K. Sankaralingam will be speaking at UCSD in early May.
A Design Space Evaluation of Grid Processor Architectures, R. Nagarajan, K. Sankaralingam, D. Burger, and S.W. Keckler. 34th Annual International Symposium on Microarchitecture (MICRO), pp. 40-51, December, 2001.
Scalar Operand Networks, by Michael Bedford Taylor, Walter Lee, Saman Amarasinghe, and Anant Agarwal. IEEE Transactions on Parallel and Distributed Systems (Special Issue on On-chip Networks), February 2005.
Wavescalar, Steven Swanson et al., Proceedings of the International Symposium on Microarchitecture (MICRO) 2003. (We've already read this, but we will discuss this in class this day.)
Write a 1-3 page analysis of the key ideas of the three papers. (Class Experts' Presentation)
Fri 4-21 11 am Please attend Onur Mutlu's job talk (location: EBU-3b 1202). Attendance is required unless you have a significant conflict.
Tu 4-25 Tiled Discussion
Read ISCA 06 Wavescalar Paper. This describes what they actually did.
The ISCA 06 Wavescalar Implementation TR may also help. (Optional)
From the above, see if you can figure out Wavescalar's 5-tuple.
Think of and post a unique discussion question on Raw/Grid/Wavescalar/SONs, on the forum, in the conference entitled Lecture 5.
(Also think of how you would answer your own question and others.)
Th 4-27 Power4 (Richa, Todd)
We examine Power4, which is state-of-the-art in many ways: wide-issue out-of-order superscalar, super-pipelined, dual-core, multi-chip module, etc. This is also the architecture that we will be programming, so knowledge of this paper is essential in the programming assignments. Although it is called "Power", Power4 is essentially a member of the PowerPC family (see manuals directory). In the readings, the goal is to get a high-level idea of the architecture and microarchitecture, but fairly in-depth understanding of shared-memory, coherence and consistency support in the architecture.
POWER4 system microarchitecture by J. M. Tendler, J. S. Dodson, J. S. Fields, Jr., H. Le, and B. Sinharoy. IBM Journal of Research and Development, January 2002.
As always, do a 1-3 page journal entry. (Class Experts' Presentation)
Tu 5-2 Power4 Manual (Vinoth, Arvindh)
Read Book 2: 1.4, 1.7, skim 3.2.2, 3.3, skim 4, and Appendix B ("Programming Examples for Sharing Storage"). Read Book 3: 4.2.4. (You may have to read other sections to understand these sections.) If you are not familiar with PowerPC, you can refer to Book 1: User Instruction Set. Please read these sections carefully and make sure you understand the examples in Appendix B well.
As always, do a 1-3 page journal entry. (Class Experts' Presentation) (Synchronization and Consistency Presentation)
Th 5-4 Power4 continued (Todd, Arvindh)
Tu 5-9 Shared Memory / Distributed Shared Memory (Anthony, Kwangyoon)
How to Make a Multiprocessor Computer that Correctly Executes Multiprocessor Programs, by L. Lamport. (RiCA)
A New Solution to Coherence Problems in Multicache Systems, by Censier and Feautrier. (RiCA)
The Stanford Dash Multiprocessor, by Lenoski et al. (RiCA)
As always, do a 1-3 page journal entry. (Class Experts' Presentation)
Th 5-11 ILP (Garo / Saturnino)
The MIPS R10000 Superscalar Microprocessor, by Yeager. IEEE Micro 1996. (also in RiCA)
The Alpha 21264 Microprocessor, by Kessler. IEEE Micro 1999.
The Microarchitecture of the Pentium 4, by Hinton et al. Intel Technology Journal, Q1 2001.
Please focus your 1-3 page journal entry on analyzing the differences and similarities in the microarchitecture of the three systems. (Class Experts' Presentation)
Tu 5-16 Technology Trends (Mohammad Al-Fares / Jason Thurkettle)
Impact of Technology on Architecture, John H. Edmondson. From Design of High Performance Microprocessor Circuits, eds. Anantha Chandrakasan et al.
Clock Rate versus IPC: The End of the Road for Conventional Microarchitectures. ISCA 2000. Agarwal, Hrishikesh, Keckler and Burger.
(As always, do a 1-3 page journal entry.) (Class Experts' Presentation)
Th 5-18 Errors (Amelang / John Fish)
IBM experiments in soft fails in computer electronics (1978-1994) by Ziegler et al. IBM Journal of Research and Development, January 1996.
DIVA: A Reliable Substrate for Submicron Microarchitecture Design by Austin et al. Micro 1999.
Razor: A Low-Power Pipeline Based on Circuit-Level Timing Speculation by Ernst et al. Micro 2003.
(As always, do a 1-3 page journal entry.) (Presentation)
Tu 5-23 IBM/SONY Cell (Also: experience with JSSC paper)
Overview of the Architecture, Circuit Design, and Physical Implementation of a First-Generation Cell Processor, by Pham et al. IEEE Journal of Solid-State Circuits, January 2006.
The Microarchitecture of the Synergistic Processor for a Cell Processor, by Flachs et al. IEEE Journal of Solid-State Circuits, January 2006.
(As always, do a 1-3 page journal entry.) (Presentation)
Th 5-25 Vectors (Jennifer / Jeffrey)
Krste Asanović, John Hennessy, David A. Patterson, "Vector Processors", Appendix G in Computer Architecture: A Quantitative Approach, Third Edition, Morgan Kaufmann, ISBN 1-55860-596-7, May 2002.
Ronny Krashinsky, Christopher Batten, Mark Hampton, Steven Gerding, Brian Pharris, Jared Casper, and Krste Asanović, "The Vector-Thread Architecture", 31st International Symposium on Computer Architecture (ISCA-31), Munich, Germany, June 2004.
(As always, do a 1-3 page journal entry.) (Class Presentation 1 and 2)
Tu 5-30 Interconnection Networks (Jin Seok Lee / Cezario Tebcherani)
A Survey of Wormhole Routing Techniques in Direct Networks, by Ni and McKinley. In IEEE Computer February 1993. (Ignore Figure 1, which is misleading.)
A Necessary and Sufficient Condition for Deadlock-Free Adaptive Routing in Wormhole Networks by Jose Duato. IEEE Transactions on Parallel and Distributed Systems, October 1995.
(As always, do a 1-3 page journal entry.)
Th 6-1 Transactional Memory (Barath Raghavan)
Transactional Memory: architectural support for lock-free data structures, by M.P. Herlihy and J.E.B. Moss. International Symposium on Computer Architecture, May 1993.
Virtualizing Transactional Memory by R. Rajwar, M.P. Herlihy, and K. Lai. International Symposium on Computer Architecture, June 2005.
(As always, do a 1-3 page journal entry.)
Tu 6-6 Student Project Presentations (5 minutes per student)
(A few brief words on the final, Professor ..)
Th 6-8 Student Project Presentations (5 minutes per student)
Tu 6-13 Final 11:30-2:30 See forum for more information on the final, including a list of some of the questions that will appear.

email: mbtaylor at you see ess dee dot ee dee you
web:   Michael Taylor's Website.