cse240a: Graduate Computer Architecture

Location: Peterson Hall 102
Time: Monday Wednesday 5:00pm-6:20pm
Term: Fall 2011

Instructor

Steven Swanson
Email: swanson @ cs.ucsd.edu
IM (not email): professorswanson@{AIM, Yahoo!, google talk, MS Messenger}
Office: EBU3B 3212
Office Hours: Monday 2-3; Wed 10:30-11:30; by appointment
UCSD homepage

Teaching Assistant

Bryan S. Kim
Email: brk006 @ cs.ucsd.edu
IM (not email): bryansjkim@google talk
Office: EBU3B B240A
Office Hours: Tuesday 1pm-2pm; Thursday 3pm-4pm
UCSD homepage

Course discussion board: Google Groups. Required reading. Get signed up.

Course Description

This course will describe the basics of modern processor operation. Topics include computer system performance, instruction set architectures, pipelining, branch prediction, memory-hierarchy design, and a brief introduction to multiprocessor architecture issues.


Text books

Required: Hennessy & Patterson, Computer Architecture: A Quantitative Approach, 4th Edition, Morgan Kaufmann
Required: Other assigned readings throughout the quarter.
Optional: The History of Computing. This is a great set of lectures from a course taught at UCSD/UW/Berkeley three years ago. Most of them are by the folks who actually made the history (Steve Wozniak, Ray Ozzie, Gordon Bell, etc.).

Grading

Reading assignments/paper summaries 40%
Prefetching contest/in class presentation 10%
Homework 10%
Midterm 20%
Final 20%

Additional notes about grades in this course:


Schedule

I will post the slides for most lectures. Since the slides contain material I am not allowed to distribute publicly, they are password protected. I have posted the username and password to the web board.

Readings should be done before class on the day they are listed. It is essential that you do the readings. I will not cover everything you are responsible for in class.

Date Topic Readings Slides Due Notes
Monday, September 26 Introduction and Administrivia 00_Intro.pdf
Wednesday, September 28 A Brief History of Architecture; CMOS/Technology scaling. Appendix A (if your architecture is rusty), 1.1-1.12
Optional (this is the original paper about Moore's Law): Cramming More Components Onto Integrated Circuits , G.E. Moore, Proceedings of the IEEE 86(1):82-85, Jan 1998
01_Technology-1.pdf
Monday, October 3 Faculty Research Seminar (11am) Speaker: Prof. Rajesh Gupta
Topic: The Variability Expeditions: Exploring the Software Stack for Underdesigned Computing Machines
(For Google form: Paper 1)
Monday, October 3 Performance measurement; Introduction to Caching C.4 and C.5 if you need a review; 5.3-5.9; 03_performance.pdf,
02_Technology-2.pdf
Wednesday, October 5 Student presentation: Advanced Caching (presented by Paul Wicks) 04_Cache_intro.pdf (at right)
Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers , Norman P. Jouppi, SIGARCH Comput. Archit. News 18(3a):364-373, 1990. (For Google form: Paper 2)
Retrospective: improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers , Norman P. Jouppi, ISCA '98: 25 years of the international symposia on Computer architecture (selected papers) , New York, NY, USA, 1998, pages 71-73.
Trace cache: a low latency approach to high bandwidth instruction fetching , Eric Rotenberg, Steve Bennett, and James E. Smith, MICRO 29: Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture , Washington, DC, USA, 1996, pages 24-35. (For Google form: Paper 3)
04_Cache_intro.pdf,
04_Jouppi.pdf
Monday, October 10 Student presentation: Virtual memory and protection (presented by Utpal Kumar and Mohammad Moghimi) For background: C.4 and C.5; 5.3-5.9
Mondrian memory protection , Emmett Witchel, Josh Cates, and Krste Asanovic, ASPLOS-X: Proceedings of the 10th international conference on Architectural support for programming languages and operating systems , New York, NY, USA, 2002, pages 304-316. (For Google form: Paper 4)
Architecture support for single address space operating systems , Eric J. Koldinger, Jeffrey S. Chase, and Susan J. Eggers, SIGPLAN Not. 27(9):175-186, 1992. (For Google form: Paper 5)
Optional: Sharing and protection in a single-address-space operating system , Jeffrey S. Chase, Henry M. Levy, Michael J. Feeley, and Edward D. Lazowska , ACM Trans. Comput. Syst. 12(4):271-307, 1994.
05_VirtualMemory.pdf,
06_SAOS.pdf,
06_SAOS_Questions.pdf
Wednesday, October 12 Student presentation: ISA design (presented by Tarun Arora and Phi Hung Nguyen) Skim appendix B.
The case for the reduced instruction set computer , David A. Patterson and David R. Ditzel, SIGARCH Comput. Archit. News 8(6):25-33, 1980. (For Google form: Paper 6)
CryptoManiac: a fast flexible architecture for secure communication , L. Wu, C. Weaver, and T. Austin, ISCA '01: Proceedings of the 28th annual international symposium on computer architecture, Göteborg, Sweden, 2001, pages 110-119. (For Google form: Paper 7)
Optional: Architectural support for fast symmetric-key cryptography, Jerome Burke, John McDonald, and Todd Austin, Proceedings of the ninth international conference on Architectural support for programming languages and operating systems, New York, NY, USA, 2000, pages 178-189
Optional: A VLSI RISC , D.A. Patterson and C.H. Sequin, Computer 15(9): 8-21, Sep 1982.
Optional: Very Long Instruction Word architectures and the ELI-512 , Joseph A. Fisher, ISCA '83: Proceedings of the 10th annual international symposium on Computer architecture , New York, NY, USA, 1983, pages 140-150.
Monday, October 17 Slack TBA 07_MMP-Nooks.pdf,
07_ISAs.pdf,
07_MIPSExtensions.pdf
Wednesday, October 19 No class TBA
Monday, October 24 Faculty Research Seminar (11am) Speaker: Prof. Steven Swanson
Topic: Engineering Storage for the Data Age
(For Google form: Paper 8)
Monday, October 24 Pipelining; Student presentation: Branch Prediction (presented by Eugene Kolinko and Joon Lee) A.2 and 2.3.
A study of branch prediction strategies , James E. Smith, ISCA '81: Proceedings of the 8th annual symposium on Computer Architecture , Los Alamitos, CA, USA, 1981, pages 135-148. (For Google form: Paper 10)
Retrospective: a study of branch prediction strategies , James E. Smith, ISCA '98: 25 years of the international symposia on Computer architecture (selected papers) , New York, NY, USA, 1998, pages 22-23.
An analysis of correlation and predictability: what makes two-level branch predictors work , Marius Evers, Sanjay J. Patel, Robert S. Chappell, and Yale N. Patt , ISCA '98: Proceedings of the 25th annual international symposium on Computer architecture , Washington, DC, USA, 1998, pages 52-61. (For Google form: Paper 9)
08_Pipelining.pdf,
08_Hazards.pdf,
08_branchprediction.pdf
Wednesday, October 26 Student presentation: OOO execution (presented by Manoj Mardithaya and Pooja Saraff) 2.1-2.11
An efficient algorithm for exploiting multiple arithmetic units , R. M. Tomasulo, IBM J. Res. Dev. 11(1):25-33, 1967. (For Google form: Paper 11)
HPSm, a high performance restricted data flow architecture having minimal functionality , W. Hwu and Y. N. Patt, ISCA '86: Proceedings of the 13th annual international symposium on Computer architecture , Los Alamitos, CA, USA, 1986, pages 297-306. (For Google form: Paper 12)
Retrospective: HPSm, a high performance restricted data flow architecture having minimal functionality , Wen-mei W. Hwu and Yale N. Patt, ISCA '98: 25 years of the international symposia on Computer architecture (selected papers) , New York, NY, USA, 1998, pages 43-44.
Optional: HPS, a new microarchitecture: rationale and introduction , Y. N. Patt, W. M. Hwu, and M. Shebanow, SIGMICRO Newsl. 16(4):103-108, 1985.
Optional: A design space evaluation of grid processor architectures , Ramadass Nagarajan, Karthikeyan Sankaralingam, Doug Burger, and Stephen W. Keckler , MICRO 34: Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture , Washington, DC, USA, 2001, pages 40-51.
Optional: Excerpts from Design of a Computer: the Control Data 6600
Optional: Parallel Operation in the Control Data 6600
09_PrefetcherContest.pdf,
09_SuperScalarSMT.pdf,
09_OutOfOrderExecution.pdf
Assignment 4;
Monday, October 31 Student presentation: Dataflow (presented by Sreeparna Mukherjee and Margaret S. Urfer) WaveScalar , S. Swanson , K. Michelson, A. Schwerin, M. Oskin , MICRO '03: Proceedings of the 36th annual IEEE/ACM international symposium on microarchitecture , Washington, DC, USA, 2003, pages 291-. (For Google form: Paper 13)
Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture, Karthikeyan Sankaralingam, Ramadass Nagarajan, Haiming Liu, Changkyu Kim, Jaehyuk Huh, Doug Burger, Stephen W. Keckler, and Charles R. Moore, Proceedings of the 30th annual international symposium on Computer architecture, New York, NY, USA, 2003, pages 422-433 (For Google form: Paper 14)

Optional: Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture , K. Sankaralingam, R. Nagarajan, Haiming Liu, Changkyu Kim, Jaehyuk Huh, D. Burger, S.W. Keckler, and C. Moore, Micro, IEEE 23(6): 46-51, Nov.-Dec. 2003; The wavescalar architecture , S. Swanson , A. Schwerin, M. Mercaldi, A. Petersen, A. Putnam, K. Michelson, M. Oskin, S. Eggers, ACM Trans. Comput. Syst. 25(2):1-54, 2007.
10_WaveScalar.pdf,
10_TRIPS.pdf,
10_MidtermReview.pdf
Wednesday, November 2 Midterm review session (Optional) TBA
Monday, November 7 Midterm TBA
Wednesday, November 9 Student presentation: Multithreading (presented by James Lue) Multiscalar processors , Gurindar S. Sohi, Scott E. Breach, and T. N. Vijaykumar, ISCA '95: Proceedings of the 22nd annual international symposium on Computer architecture, New York, NY, USA, 1995, pages 414-425. (For Google form: Paper 15)
Exploiting choice: instruction fetch and issue on an implementable simultaneous multithreading processor , Dean M. Tullsen , Susan J. Eggers, Joel S. Emer, Henry M. Levy, Jack L. Lo, and Rebecca L. Stamm , ISCA '96: Proceedings of the 23rd annual international symposium on Computer architecture , New York, NY, USA, 1996, pages 191-202. (For Google form: Paper 16)
Optional: Simultaneous multithreading: maximizing on-chip parallelism , Dean M. Tullsen , Susan J. Eggers, and Henry M. Levy , ISCA '95: Proceedings of the 22nd annual international symposium on Computer architecture , New York, NY, USA, 1995, pages 392-403
Optional: Speculative Versioning Cache , T.N. Vijaykumar, S. Gopal, J.E. Smith, and G. Sohi, Parallel and Distributed Systems, IEEE Transactions on, 12(12):1305-1317, Dec 2001
11_MultiScalar.pdf,
11_SMT.pdf
Monday, November 14 CMPs, coherence, and consistency 4.1-4.10 15_CoherenceAndConsistency.pdf,
2011-Fall-CSE-240A-Midterm-v2-key.pdf
Wednesday, November 16 Student presentation: CMPs (presented by German Alfaro and Linda Pescatore) The case for a single-chip multiprocessor , Kunle Olukotun, Basem A. Nayfeh, Lance Hammond, Ken Wilson, and Kunyung Chang , SIGPLAN Not. 31(9):2-11, 1996. (For Google form: Paper 17)
Niagara: A 32-Way Multithreaded Sparc Processor , Poonacha Kongetira, Kathirgamar Aingaran, and Kunle Olukotun, IEEE Micro 25(2):21-29, 2005. (For Google form: Paper 18)
Optional: Sun's slides about the UltraSPARC T2 (aka Niagara 2, aka Victoria Falls)
Optional: Piranha: a scalable architecture based on single-chip multiprocessing , Luiz Andre Barroso, Kourosh Gharachorloo, Robert McNamara, Andreas Nowatzyk, Shaz Qadeer, Barton Sano, Scott Smith, Robert Stets, and Ben Verghese , ISCA '00: Proceedings of the 27th annual international symposium on Computer architecture , New York, NY, USA, 2000, pages 282-293
16_CaseForCMP.pdf,
16_NIAGARA.pdf
Monday, November 21 Faculty Research Seminar (11am) Speaker: Prof. Michael Taylor
Topic: GreenDroid: An Architecture for the Dark Silicon Era
(For Google form: Paper 19)
Monday, November 21 Student presentation: Heterogeneity (presented by Erh-Li Shen and Ching-Yao Liu) Single-ISA Heterogeneous Multi-Core Architectures: The Potential for Processor Power Reduction, Rakesh Kumar, Keith I. Farkas, Norman P. Jouppi, Parthasarathy Ranganathan, and Dean M. Tullsen, Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture, Washington, DC, USA, 2003, pages 81--(For Google form: Paper 20)
Conservation cores: reducing the energy of mature computations, Ganesh Venkatesh, Jack Sampson, Nathan Goulding, Saturnino Garcia, Vladyslav Bryksin, Jose Lugo-Martinez, Steven Swanson, and Michael Bedford Taylor, Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems, New York, NY, USA, 2010, pages 205-218 (For Google form: Paper 21)
17_CCores.pdf,
17_Heterogeneous.pdf
Wednesday, November 23 18-FlashOverview.pdf
Monday, November 28 Student presentation: Storage (presented by Jyoti Wadhwani and Sudharsan Seshadri) 6.1-6.0
Transactional flash, Vijayan Prabhakaran, Thomas L. Rodeheffer, and Lidong Zhou, Proceedings of the 8th USENIX conference on Operating systems design and implementation, Berkeley, CA, USA, 2008, pages 147-160 (For Google form: Paper 22)
FlashStore: high throughput persistent key-value store, Biplob Debnath, Sudipta Sengupta, and Jin Li, Proc. VLDB Endow. 3:1414-1425, September 2010 (For Google form: Paper 23)
19_prefetcher_results.pdf,
19_FlashStore.pdf,
19_TXFlash.pdf
Wednesday, November 30 Final Review TBA cse240a_sample.pdf
Thursday, December 8 Final Exam TBA fa11_cse240a_final.doc,
fa11_cse240a_final.pdf
due @ 21:59

Integrity Policy


Homework

Assignment 1: Secret username
Assignment 2: Paper summaries
Assignment 3: Class presentations
Assignment 4: Performance, Memory, ISA, VM, Pipelining, and Branch Prediction
Assignment 5: Multi-threading, CMPs, Heterogeneity, and Storage (AKA: compulsory studying for the final)

Projects

Project 1: Prefetching competition