CSE.240b Advanced/Parallel Computer Architecture - Winter 2008

    

Course Goals

This class is designed to enable students to follow the latest developments in computer architecture, especially those related to parallel computer architecture. Although this is clearly useful for those who wish to do research in computer architecture, it is also useful for those who work in related areas or who have general interests. The class pursues these goals in four ways:
  1. Covering advanced material which is commonly understood by practicing architects but is not covered in core-level grad classes.

  2. Presenting programming assignments that facilitate advanced understanding.

  3. Providing students with the opportunity to
    { find, analyze, communicate, discuss } advanced material.

  4. Examining fundamental ideas that are the "frontier" of the field and have not yet made it into industry. This will be done through reading papers and through a course project.

      
[Michael Taylor]
Prof. Michael Taylor

Announcements

January 6, 2008: The course forum (located here) will be up soon!

Course Materials

The class will consist of readings generally found in the following locations:
  1. IEEE Xplore (free access from the UCSD network)
    For IEEE publications.

  2. ACM Portal (free access from UCSD network)
    For ACM publications.

  3. Computer Architecture: A Quantitative Approach, Hennessy & Patterson.
    Hopefully, you already have this.

  4. Readings in Computer Architecture, edited by Hill, Jouppi, and Sohi.
    A collection of classic papers, many of which are unavailable online. Optional.


Grading

Approximate grading percentages (subject to change with advance notice):

Class (and forum) Participation: 25 %
Assignments: 25 %
Project: 30 %
Final Exam: 10 %
Midterm Exam: 10 %


I expect a generally high level of work quality and independence in this class, since it is an advanced graduate class. Participation in both in-class discussions and in the online forum is an integral part of the class, and comprises a significant component of the class participation grade.

Use of the Forum

The forum (located here) exists to enrich the class. Since the class has no TA, I had to choose between easy programming assignments that would pose little challenge to students and challenging assignments that the class works together, via the forum, to solve. Here are some guidelines:
  1. The forum is a great place to vet ideas and ask questions about programming assignments.

  2. The forum is a great place to clarify problems or preliminary roadblocks with the readings. Deeper questions may be more appropriate for class discussion.

  3. Students should feel free to answer other students' questions. If you can answer a question, you should, because it counts toward class participation. I will generally wait 24 hours before responding to a question, to give students a chance to answer first.

  4. Unless the question has already been clearly answered elsewhere, I will assume the responsibility of answering course policy questions.

  5. At least one of the assignments will require class-wide collaboration via the forum in order to complete!

  6. Each assignment will specify a few guidelines (regarding, for instance, whether it is o.k. to post code) for the use of the forum as it applies to the assignment.

Upcoming Programming Assignments

1. Understanding Current Architectures through Multiprocessor Programming

The class will use the DataStar Supercomputing cluster at the San Diego Supercomputer Center.

DataStar has 2464 processors and is the 35th most powerful supercomputer in the world, and you get to use it. Frankly, that's awesome. Since this is a computer architecture class (rather than a "high performance computing" class), our focus will be on a single DataStar node: the IBM p655+.

The p655+ employs an 8-processor Power4+-based multichip module (see figure), which consists of four dual-core chips.

There will be a series of related programming assignments using these machines.
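
To give a flavor of the style of programming involved, below is a minimal shared-memory sketch in C with OpenMP. It is purely illustrative (the actual assignments will specify their own languages, APIs, and measurement methodology for DataStar); it simply times a parallel array sum across however many threads the runtime provides, e.g. the eight processors of a p655+ node.

    /* Illustrative sketch only: a shared-memory microbenchmark in C + OpenMP.
       The real assignments will define their own APIs and measurement rules. */
    #include <omp.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define N (1 << 24)   /* number of array elements */

    int main(void)
    {
        double *a = malloc(N * sizeof(double));
        for (long i = 0; i < N; i++)
            a[i] = 1.0;

        double t0 = omp_get_wtime();

        double sum = 0.0;
        /* Split the reduction across the node's processors (e.g., 8 on a p655+). */
        #pragma omp parallel for reduction(+:sum)
        for (long i = 0; i < N; i++)
            sum += a[i];

        double t1 = omp_get_wtime();

        printf("threads=%d  sum=%.1f  time=%.4f s\n",
               omp_get_max_threads(), sum, t1 - t0);
        free(a);
        return 0;
    }

Varying the thread count (e.g., via the OMP_NUM_THREADS environment variable) and the data size is one simple way to start exposing the memory-system behavior of a node like this.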


[DataStar]
San Diego Supercomputer Center's DataStar
[IBM MCM]
     The p655 Multichip Module (MCM).


Course Project

See Final Project Description.

Midterm and Final Exams

The midterm and final exams will test different things. The midterm will test your ability to really understand a subject in depth. The final will cover the breadth of your knowledge of the material in the class. Generally speaking, the final will be fairly easy if you have read the papers and participated in the in-class discussion.

The midterm

Each student's "midterm exam" consists of presenting, with another student, the papers assigned for a particular day. Each day, two students take responsibility for being the "class experts" on that day's papers and give a 40-minute presentation (roughly 20 minutes each). The presentation should overview background material, motivate the problem each paper is trying to solve, and present the key ideas. Be sure to go through the key architectural mechanisms proposed in each paper in detail.

Each student will have to do this once or twice, depending on enrollment.

To do this well, I expect the student experts to read material beyond the paper itself (perhaps by tracing back the references) in order to understand the paper's context.

I expect that student experts will practice the presentation to make sure it fits in the allotted time and has appropriate transitions. This is good preparation for giving talks, whether communicating one's own research or taking a research exam.

Logistically, this works as follows: students sign up for a given day ahead of time. They prepare the slides and arrange a meeting with me to review them, ideally 3-4 days before the class. They then revise the presentation, practice it, and give it in class. During the presentation, students are free to ask questions of the experts, and if a discussion topic arises, we will pause to discuss it.

Reading Assignments

A note on journal entries (or "writeups")

The writeups should focus on *your* thoughts rather than summarizing the paper. This is much like a literature class, where you were expected not to summarize the book you read but to discuss, analyze, and critique it. I want to see your processed thoughts on the page. In general, feel free to pick some interesting aspect of the paper and discuss your ideas or thoughts. Or, you could pick some part of the paper that was challenging for you, figure it out, and then explain it. Or you could simply list interesting questions that occurred to you.

If you have a single page that spends most of its space on your thoughts (it might be a discussion of a single issue you thought about in depth, or it could be a list of a bunch of ideas or thoughts you had), that's sufficient - you don't need to write a novel.

As long as it's clear that you are thinking about the material, don't worry about whether you're getting things absolutely correct, etc. And don't worry about making every writeup stellar. Pretty much, at the end of the class, I'm going to flip through your journal entries and see if it looks like you are generally *thinking* about the papers.

Please refer to the course forum ("Course Administration") for an extended list of writeup ideas.

The readings

Abbreviations (if no link is given, the paper is available on IEEE Xplore or the ACM Portal):
H&P Computer Architecture: A Quantitative Approach, 3rd Ed., Hennessy and Patterson.
RiCA Readings in Computer Architecture, eds. Hill, Jouppi, and Sohi.


Due         Item
Tu 1-8 First Class.
Th 1-10 Multiprocessors / Coherence
H&P 3rd ed. 6.1 - 6.6 (esp 6.3 and 6.5); H&P 4th ed.: 4.1-4.5
Tu 1-15 Tiled Microprocessors
Tiled Microprocessors are interesting because they blur the boundaries between multiprocessors and microprocessors.
The Raw Microprocessor: A Computational Fabric for Software Circuits and General Purpose Programs, by Michael Bedford Taylor, Jason Kim, Jason Miller, David Wentzlaff, Fae Ghodrat, Ben Greenwald, Henry Hoffman, Jae-Wook Lee, Paul Johnson, Walter Lee, Albert Ma, Arvind Saraf, Mark Seneski, Nathan Shnidman, Volker Strumpen, Matt Frank, Saman Amarasinghe and Anant Agarwal. IEEE Micro, March/April 2002.
Evaluation of the Raw Microprocessor: An Exposed-Wire-Delay Architecture for ILP and Streams, by Michael Bedford Taylor, Walter Lee, Jason Miller, David Wentzlaff, Ian Bratt, Ben Greenwald, Henry Hoffmann, Paul Johnson, Jason Kim, James Psota, Arvind Saraf, Nathan Shnidman, Volker Strumpen, Matt Frank, Saman Amarasinghe, and Anant Agarwal. Proceedings of the International Symposium on Computer Architecture, June 2004.
Space-Time Scheduling of Instruction-Level Parallelism on a Raw Machine, by Walter Lee, Rajeev Barua, Matthew Frank, Devabhaktuni Srikrishna, Jonathan Babb, Vivek Sarkar, and Saman Amarasinghe. Proceedings of the Eighth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VIII), San Jose, CA, October 4-7, 1998.
Th 1-17 Coherence
M. M. K. Martin, M. D. Hill, and D. A. Wood, ``Token coherence: decoupling performance and correctness,'' in ISCA '03: Proceedings of the 30th annual international symposium on Computer architecture, pp. 182-193, 2003 link.
D. Lenoski, J. Laudon, K. Gharachorloo, A. Gupta, and J. Hennessy, ``The directory-based cache coherence protocol for the DASH multiprocessor,'' in ISCA '90: Proceedings of the 17th annual international symposium on Computer Architecture, pp. 148-159, 1990 link.
OPTIONAL (DASH performance) D. Lenoski, J. Laudon, T. Joe, D. Nakahira, L. Stevens, A. Gupta, and J. Hennessy, ``The DASH prototype: implementation and performance,'' in ISCA '98: 25 years of the international symposia on Computer architecture (selected papers), pp. 418-429, 1998 link.
Tu 1-23 Tiled 2
K. Sankaralingam, R. Nagarajan, H. Liu, C. Kim, J. Huh, D. Burger, S. W. Keckler, and C. R. Moore, ``Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture,'' SIGARCH Comput. Archit. News, vol. 31, no. 2, pp. 422-433, 2003 link.
S. Swanson, A. Schwerin, M. Mercaldi, A. Petersen, A. Putnam, K. Michelson, M. Oskin, and S. J. Eggers, ``The WaveScalar Architecture.''
To Appear in ACM Transactions On Computer Systems. link
Th 1-25 Consistency
S. V. Adve and K. Gharachorloo, ``Shared Memory Consistency Models: A Tutorial,'' tech. rep., DEC WRL, 1995 link.
Tu 1-30 Consistency 2
V. S. Pai, P. Ranganathan, S. V. Adve, and T. Harton, ``An evaluation of memory consistency models for shared-memory systems with ILP processors,'' in ASPLOS-VII: Proceedings of the seventh international conference on Architectural support for programming languages and operating systems, pp. 12-23, 1996 link.
C. Gniady, B. Falsafi, and T. N. Vijaykumar, ``Is SC + ILP = RC?,'' in ISCA '99: Proceedings of the 26th annual international symposium on Computer architecture, pp. 162-171, 1999 link.
OPTIONAL (but mind-bending) J. Manson, W. Pugh, and S. V. Adve, ``The Java memory model,'' in POPL '05: Proceedings of the 32nd ACM SIGPLAN-SIGACT Symposium on Principles of programming languages, pp. 378-391, 2005 link.
Th 2-1 Synchronization
H&P 4th ed.: 4.5
M. Herlihy, ``A methodology for implementing highly concurrent data structures,'' in PPOPP '90: Proceedings of the second ACM SIGPLAN symposium on Principles & practice of parallel programming, pp. 197-206, 1990 link.
Tu 2-6 Transactions I
L. Hammond, V. Wong, M. Chen, B. D. Carlstrom, J. D. Davis, B. Hertzberg, M. K. Prabhu, H. Wijaya, C. Kozyrakis, and K. Olukotun, ``Transactional Memory Coherence and Consistency,'' in ISCA '04: Proceedings of the 31st annual international symposium on Computer architecture, p. 102, 2004 link.
M. Herlihy and J. E. B. Moss, ``Transactional memory: architectural support for lock-free data structures,'' in ISCA '93: Proceedings of the 20th annual international symposium on Computer architecture, pp. 289-300, 1993 link.
Th 2-14 Transactions II
R. Rajwar, M. Herlihy, and K. Lai, ``Virtualizing Transactional Memory,'' in ISCA '05: Proceedings of the 32nd Annual International Symposium on Computer Architecture, pp. 494-505, 2005 link.
B. Saha, A.-R. Adl-Tabatabai, and Q. Jacobson, ``Architectural Support for Software Transactional Memory,'' in MICRO '06: Proceedings of the 39th international symposium on Microarchitecture, 2006.
Tu 2-19 Streaming
J. H. Ahn, W. J. Dally, B. Khailany, U. J. Kapasi, and A. Das, ``Evaluating the Imagine Stream Architecture,'' in ISCA '04: Proceedings of the 31st annual international symposium on Computer architecture, p. 14, 2004 link.
Michael Gordon, William Thies, Michal Karczmarek, Jasper Lin, Ali S. Meli, Christopher Leger, Andrew A. Lamb, Jeremy Wong, Henry Hoffman, David Z. Maze, and Saman Amarasinghe. A Stream Compiler for Communication-Exposed Architectures. In ASPLOS 2002, San Jose, CA USA, October, 2002. (Paper: PDF)
Tu 2-26 Cell
M. Baron, ``The Cell, At One,'' Microprocessor Report, March 2006 link.
Overview of the Architecture, Circuit Design, and Physical Implementation of a First-Generation Cell Processor, by Pham et al. IEEE Journal of Solid-State Circuits, January 2006.
The Microarchitecture of the Synergistic Processor for a Cell Processor, by Flachs et al. IEEE Journal of Solid-State Circuits, January 2006.
Th 2-28 CMP
R. Kumar, D. M. Tullsen, P. Ranganathan, N. P. Jouppi, and K. I. Farkas, ``Single-ISA Heterogeneous Multi-Core Architectures for Multithreaded Workload Performance,'' in ISCA '04: Proceedings of the 31st annual international symposium on Computer architecture, p. 64, 2004 link.
J. Huh, D. Burger, and S. W. Keckler, ``Exploring the Design Space of Future CMPs,'' in PACT '01: Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques, pp. 199-210, 2001 link.
Tu 3-4 Interconnects
H&P3 8.1-8.5, 8.9 or H&P4 E.1-E.6, E.10


Th 3-6 Interconnects II
R. Kumar, V. Zyuban, and D. M. Tullsen, ``Interconnections in Multi-Core Architectures: Understanding Mechanisms, Overheads and Scaling,'' in ISCA '05: Proceedings of the 32nd Annual International Symposium on Computer Architecture, pp. 408-419, 2005 link.


Tu 3-11 No class
Th 3-13 Project Presentations


email: mbtaylor at you see ess dee dot ee dee you
web:   Michael Taylor's Website.