cse240a: Graduate Computer Architecture

Warren Lecture Hall 2205
Lectures: Tue. & Thu., 2:00p-3:20p (Warren Lecture Hall 2205)
Winter, 2010

Instructor

Steven Swanson
Email: swanson @ cs.ucsd.edu
IM (not email): professorswanson@{AIM, Yahoo!, google talk, MS Messenger}
Office: EBU3B 3212
Office Hours: TBA

Teaching Assistant

Hung-Wei Tseng
Email: h1tseng @ cs.ucsd.edu
IM (not email): bunnyhwtseng@AIM
Office: EBU3B B260A
Office Hours: Tuesday 11:00a-12:00p, Wednesday 4:00p-5:00p, or by appointment

Course discussion board: WebCT. Reading the board is required, so get signed up.

Course Description

This course will describe the basics of modern processor operation. Topics include computer system performance, instruction set architectures, pipelining, branch prediction, memory-hierarchy design, and a brief introduction to multiprocessor architecture issues.


Text books

Required: John L. Hennessy and David A. Patterson, Computer Architecture: A Quantitative Approach, 4th Edition, Morgan Kaufmann.
Required: Other assigned readings throughout the quarter.
Optional: The History of Computing. This is a great set of lectures from a course taught at UCSD/UW/Berkeley three years ago. Most of them are by the people who actually made the history (Steve Wozniak, Ray Ozzie, Gordon Bell, etc.).

Grading

3-4 homeworks and paper summaries: 20%
Prefetching contest/in-class presentation: 15% (more details later)
Two midterms: 35% (Jan. 28th and Feb. 18th)
Final: 30% (cumulative)

Additional notes about grades in this course:


Schedule

I will post the slides for most lectures. Since the slides contain material I am not allowed to distribute publicly, they are password protected. I have posted the username and password to the web board.

Readings should be done before class on the day they are listed. It is essential that you do the readings: I will not cover in class everything you are responsible for.

Date Topic Readings Slides Due Notes
Tuesday, January 5 Introduction and Administrivia 00_Intro.pdf
Thursday, January 7 A Brief History of Architecture; CMOS/Technology scaling. Appendix A (if your architecture is rusty), 1.1-1.12
Optional (the original 1965 paper about Moore's Law, reprinted in 1998): Cramming More Components Onto Integrated Circuits, G.E. Moore, Proceedings of the IEEE 86(1):82-85, Jan 1998.
01_technology.pdf
Tuesday, January 12 Performance measurement; Introduction to Caching Twelve Ways to Fool the Masses
5.1-5.3
02_Technology.pdf,
03_performance.pdf,
04_Cache_intro.pdf
Thursday, January 14 Student presentation: Advanced Caching 04_Cache_intro.pdf from Tuesday
Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers, Norman P. Jouppi, SIGARCH Comput. Archit. News 18(3a):364-373, 1990.
Retrospective: improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers, Norman P. Jouppi, ISCA '98: 25 years of the international symposia on Computer architecture (selected papers), New York, NY, USA, 1998, pages 71-73.
Trace cache: a low latency approach to high bandwidth instruction fetching, Eric Rotenberg, Steve Bennett, and James E. Smith, MICRO 29: Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture, Washington, DC, USA, 1996, pages 24-35.
05_Jouppi.pdf,
07_TraceCache.pdf
Tuesday, January 19 Virtual memory; Memory hierarchies C.4 and C.5 if you need a review; 5.3-5.9 06_VirtualMemory.pdf Assignment 3-1;
Thursday, January 21 Student presentation (by Paul Loriaux)-- Variations and VM. Mondrian memory protection, Emmett Witchel, Josh Cates, and Krste Asanovic, ASPLOS-X: Proceedings of the 10th international conference on Architectural support for programming languages and operating systems, New York, NY, USA, 2002, pages 304-316.
Architecture support for single address space operating systems, Eric J. Koldinger, Jeffrey S. Chase, and Susan J. Eggers, SIGPLAN Not. 27(9):175-186, 1992.
Optional: Sharing and protection in a single-address-space operating system, Jeffrey S. Chase, Henry M. Levy, Michael J. Feeley, and Edward D. Lazowska, ACM Trans. Comput. Syst. 12(4):271-307, 1994.
loriaux_VM.pdf,
Z1_Midterm1Preview.pdf
Tuesday, January 26 ISA Design Skim appendix B.
The case for the reduced instruction set computer, David A. Patterson and David R. Ditzel, SIGARCH Comput. Archit. News 8(6):25-33, 1980.
Optional: A VLSI RISC, D.A. Patterson and C.H. Sequin, Computer 15(9): 8-21, Sep 1982.
Optional: Very Long Instruction Word architectures and the ELI-512, Joseph A. Fisher, ISCA '83: Proceedings of the 10th annual international symposium on Computer architecture, New York, NY, USA, 1983, pages 140-150.
08_ISA.pdf,
09_CaseForRISC.pdf,
AA_PrefetcherContest.pdf
Assignment 3-2;
Thursday, January 28 Midterm
Tuesday, February 2 Pipelining and Branch Prediction An analysis of correlation and predictability: what makes two-level branch predictors work, Marius Evers, Sanjay J. Patel, Robert S. Chappell, and Yale N. Patt, ISCA '98: Proceedings of the 25th annual international symposium on Computer architecture, Washington, DC, USA, 1998, pages 52-61.
A study of branch prediction strategies, James E. Smith, ISCA '81: Proceedings of the 8th annual symposium on Computer Architecture, Los Alamitos, CA, USA, 1981, pages 135-148.
Retrospective: a study of branch prediction strategies, James E. Smith, ISCA '98: 25 years of the international symposia on Computer architecture (selected papers), New York, NY, USA, 1998, pages 22-23.
10_BranchPrediction.pdf
Thursday, February 4 Student presentation (by Ilya Kolykhmatov and Ryan Gabrys) -- Advanced branch prediction algorithms Dynamic Branch Prediction with Perceptrons, Daniel A. Jimenez and Calvin Lin, HPCA '01: Proceedings of the 7th International Symposium on High-Performance Computer Architecture, Washington, DC, USA, 2001, page 197.
Assigning confidence to conditional branch predictions, Erik Jacobsen, Eric Rotenberg, and J. E. Smith, MICRO 29: Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture, Washington, DC, USA, 1996, pages 142-152.
Optional: Combining Branch Predictors, Scott McFarling, technical report WRL-TN-36, 1993.
Optional: Low-power, high-performance analog neural branch prediction, Renee St. Amant, Daniel A. Jimenez, and Doug Burger, MICRO '08: Proceedings of the 2008 41st IEEE/ACM International Symposium on Microarchitecture, Washington, DC, USA, 2008, pages 447-458.
11_BranchPapers.pdf
Tuesday, February 9 Introduction to OOO execution 2.1-2.11
The Alpha 21264 microprocessor, R.E. Kessler, IEEE Micro 19(2):24-36, Mar/Apr 1999.
15_OOO.pdf,
14_21264.pdf
Thursday, February 11 Student presentation (by Bryan Kim and Neha Chachra) -- Implementing OOO Tomasulo Handout
Tomasulo Diagram
An efficient algorithm for exploiting multiple arithmetic units, R. M. Tomasulo, IBM J. Res. Dev. 11(1):25-33, 1967.
HPSm, a high performance restricted data flow architecture having minimal functionality, W. Hwu and Y. N. Patt, ISCA '86: Proceedings of the 13th annual international symposium on Computer architecture, Los Alamitos, CA, USA, 1986, pages 297-306.
Retrospective: HPSm, a high performance restricted data flow architecture having minimal functionality, Wen-mei W. Hwu and Yale N. Patt, ISCA '98: 25 years of the international symposia on Computer architecture (selected papers), New York, NY, USA, 1998, pages 43-44.
Optional: HPS, a new microarchitecture: rationale and introduction, Y. N. Patt, W. M. Hwu, and M. Shebanow, SIGMICRO Newsl. 16(4):103-108, 1985.
Optional: A design space evaluation of grid processor architectures, Ramadass Nagarajan, Karthikeyan Sankaralingam, Doug Burger, and Stephen W. Keckler, MICRO 34: Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture, Washington, DC, USA, 2001, pages 40-51.
Optional: Excerpts from Design of a Computer: The Control Data 6600
Optional: Parallel Operation in the Control Data 6600
13_OOO.pdf
Tuesday, February 16 Other execution strategies WaveScalar, Steven Swanson, Ken Michelson, Andrew Schwerin, and Mark Oskin, MICRO 36: Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture, Washington, DC, USA, 2003, page 291 16_WaveScalar.ppt.pdf,
Z2_Midterm2Preview.pdf
Assignment 4;
Thursday, February 18 Midterm
Tuesday, February 23 No class
Thursday, February 25 Student presentation (by S.N. Hemanth Meenakshisundaram and Bharathan Balaji) -- Simultaneous (and other) multithreading Multiscalar processors, Gurindar S. Sohi, Scott E. Breach, and T. N. Vijaykumar, ISCA '95: Proceedings of the 22nd annual international symposium on Computer architecture, New York, NY, USA, 1995, pages 414-425.
Exploiting choice: instruction fetch and issue on an implementable simultaneous multithreading processor, Dean M. Tullsen, Susan J. Eggers, Joel S. Emer, Henry M. Levy, Jack L. Lo, and Rebecca L. Stamm, ISCA '96: Proceedings of the 23rd annual international symposium on Computer architecture, New York, NY, USA, 1996, pages 191-202.
Optional: Simultaneous multithreading: maximizing on-chip parallelism, Dean M. Tullsen, Susan J. Eggers, and Henry M. Levy, ISCA '95: Proceedings of the 22nd annual international symposium on Computer architecture, New York, NY, USA, 1995, pages 392-403.
Speculative Versioning Cache, T.N. Vijaykumar, S. Gopal, J.E. Smith, and G. Sohi, Parallel and Distributed Systems, IEEE Transactions on 12(12):1305-1317, Dec 2001
17_MultiscalarAndSMT.pdf
Tuesday, March 2 Multithreading Speculative Data-Driven Multithreading, Amir Roth and Gurindar S. Sohi, HPCA '01: Proceedings of the 7th International Symposium on High-Performance Computer Architecture, Washington, DC, USA, 2001, page 37.
18_Precompute.pdf,
19_DDT.pdf
Thursday, March 4 Student presentation (by Kaisen Lin)-- Chip multiprocessors The case for a single-chip multiprocessor, Kunle Olukotun, Basem A. Nayfeh, Lance Hammond, Ken Wilson, and Kunyung Chang, SIGPLAN Not. 31(9):2-11, 1996.
Niagara: A 32-Way Multithreaded Sparc Processor, Poonacha Kongetira, Kathirgamar Aingaran, and Kunle Olukotun, IEEE Micro 25(2):21-29, 2005.
Optional: Sun's slides about the UltraSPARC T2 (aka Niagara 2, aka Victoria Falls)
Optional: Piranha: a scalable architecture based on single-chip multiprocessing, Luiz Andre Barroso, Kourosh Gharachorloo, Robert McNamara, Andreas Nowatzyk, Shaz Qadeer, Barton Sano, Scott Smith, Robert Stets, and Ben Verghese, ISCA '00: Proceedings of the 27th annual international symposium on Computer architecture, New York, NY, USA, 2000, pages 282-293.
20_CMPs.pdf Project 1;
Tuesday, March 9 Multiprocessors ProjectResults.pdf,
21_CMPs.pdf
Thursday, March 11 Student presentation -- Support for determinism/TBA Learning from mistakes: a comprehensive study on real world concurrency bug characteristics, Shan Lu, Soyeon Park, Eunsoo Seo, and Yuanyuan Zhou, SIGARCH Comput. Archit. News 36(1):329-339, 2008
DMP: deterministic shared memory multiprocessing, Joseph Devietti, Brandon Lucia, Luis Ceze, and Mark Oskin, ASPLOS '09: Proceeding of the 14th international conference on Architectural support for programming languages and operating systems, New York, NY, USA, 2009, pages 85-96
22_determinism.pdf Assignment 5;
Thursday, March 18 Final Exam FinalPreview.pdf 3:00-5:59

Integrity Policy


Homework

Assignment 1: Paper summaries
Assignment 2: Class presentations
Assignment 3: Memory Hierarchy
Assignment 4: Branch prediction and Advanced Pipelining
Assignment 5: Multithreading/Multiprocessor

Projects

Project 1: Prefetching competition