cse240a: Graduate Computer Architecture

Location: Peterson Hall 102
Time: Monday Wednesday 5:00pm-6:20pm
Term: Fall 2011

Instructor

Steven Swanson

Email: swanson (at) cs.ucsd.edu

IM (not email): professorswanson@{AIM, Yahoo!, google talk, MS Messenger}
Office: EBU3B 3212
Office Hours: Monday 2-3; Wed 10:30-11:30; by appointment
UCSD homepage

Teaching Assistant

Bryan S. Kim

Email: brk006 (at) cs.ucsd.edu

IM (not email): bryansjkim@google talk
Office: EBU3B B240A
Office Hours: Tuesday 1pm-2pm; Thursday 3pm-4pm
UCSD homepage

Course discussion board: Google Groups. Reading the board is required, so please sign up.

Course Description

This course will describe the basics of modern processor operation. Topics include computer system performance, instruction set architectures, pipelining, branch prediction, memory-hierarchy design, and a brief introduction to multiprocessor architecture issues.

Text books

Required: John L. Hennessy and David A. Patterson, Computer Architecture: A Quantitative Approach, 4th Edition, Morgan Kaufmann.
Required: Other assigned readings throughout the quarter.
Optional: The History of Computing. This is a great set of lectures from a course taught at UCSD/UW/Berkeley three years ago. Most of them are by the people who actually made the history (Steve Wozniak, Ray Ozzie, Gordon Bell, etc.).

Grading

Reading assignments/paper summaries	40%
Prefetching contest/in class presentation	10%
Homework	10%
Midterm	20%
Final	20%

Additional notes about grades in this course:

Calculating grades: I compute grades using an Excel spreadsheet. In the interest of transparency, the current grade sheet (with identifying information removed) is available in XLS format. The grade sheet contains all the information about curves and how the grades are computed. It is somewhat sophisticated; if you find bugs, please bring them to my attention. Please note that some versions of OpenOffice do not perform the calculations properly and will give incorrect results.

The grading system is based on a 13-point (F through A+) scale. For each assignment, test, etc., the sheet computes the letter grade (rounding up when needed) according to a curve for that assignment (specified at the bottom of the assignment's column). Your final grade is the weighted average of these grades.
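As an illustration only, the weighted-average step can be sketched as below. The weights come from the grading table above; the encoding of the 13 letters as the integers 0 (F) through 12 (A+) and the category key names are assumptions made for this sketch, not taken from the actual grade sheet, and how the sheet maps the average back to a final letter is not shown here.

```python
# Sketch of the weighted average over per-assignment letter grades.
# ASSUMPTION: the 13-point scale is encoded as 0 (F) .. 12 (A+); the real
# grade sheet's numeric encoding may differ.
SCALE = ["F", "D-", "D", "D+", "C-", "C", "C+",
         "B-", "B", "B+", "A-", "A", "A+"]
POINTS = {letter: i for i, letter in enumerate(SCALE)}

# Weights (percent) from the grading table; the keys are hypothetical names.
WEIGHTS = {
    "summaries": 40,    # reading assignments/paper summaries
    "prefetching": 10,  # prefetching contest/in-class presentation
    "homework": 10,
    "midterm": 20,
    "final": 20,
}


def weighted_average(letters: dict) -> float:
    """Return the weighted average of per-category letter grades on the 0-12 scale."""
    total = sum(WEIGHTS.values())
    return sum(w * POINTS[letters[cat]] for cat, w in WEIGHTS.items()) / total


if __name__ == "__main__":
    grades = {"summaries": "A", "prefetching": "A+", "homework": "B+",
              "midterm": "B", "final": "A-"}
    # (40*11 + 10*12 + 10*9 + 20*8 + 20*10) / 100, between A- (10) and A (11)
    print(weighted_average(grades))
```

Because the summaries carry 40% of the weight, a one-letter change there moves the average by 0.4 of a letter step, four times as much as the same change in homework.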

We do our best to record grades accurately, but you should double-check.
Errors in grading: If you feel there has been an error in how an assignment or test was graded, you have one week from when the assignment is returned to bring it to our attention. You must submit (via email to the instructor and the appropriate TAs) a written description of the problem.

For arithmetic errors (adding up points etc.) you do not need to submit anything in writing, but the one week limit still applies.
Final grades: If you have a problem with your final grade in the course, send me email and we can set up an appointment to discuss it. Like you, I generally go on vacation after the quarter ends, so it may take a little while for me to get back to you, and the meeting will likely take place during the next quarter. If the problem needs to be resolved sooner, make that clear in the first line of your email.

Schedule

I will post the slides for most lectures. Since the slides contain material I am not allowed to distribute publicly, they are password protected. I have posted the username and password to the web board.

Readings should be completed before class on the day they are listed. It is essential that you do the readings: I will not cover in class everything you are responsible for.

Date	Topic	Readings	Slides	Due	Notes
Monday, September 26	Introduction and Administrivia		00_Intro.pdf
Wednesday, September 28	A Brief History of Architecture; CMOS/Technology scaling.	Appendix A (if your architecture is rusty), 1.1-1.12 Optional (this is the original paper about Moore's Law): Cramming More Components Onto Integrated Circuits , G.E. Moore, Proceedings of the IEEE 86(1):82-85, Jan 1998	01_Technology-1.pdf
Monday, October 3	Faculty Research Seminar (11am)	Speaker: Prof. Rajesh Gupta Topic: The Variability Expeditions: Exploring the Software Stack for Underdesigned Computing Machines (For Google form: Paper 1)
Monday, October 3	Performance measurement; Introduction to Caching	C.4 and C.5 if you need a review; 5.3-5.9;	03_performance.pdf, 02_Technology-2.pdf
Wednesday, October 5	Student presentation: Advanced Caching (presented by Paul Wicks)	04_Cache_intro.pdf (at right) Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers , Norman P. Jouppi, SIGARCH Comput. Archit. News 18(3a):364-373, 1990. (For Google form: Paper 2) Retrospective: improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers , Norman P. Jouppi, ISCA '98: 25 years of the international symposia on Computer architecture (selected papers) , New York, NY, USA, 1998, pages 71-73. Trace cache: a low latency approach to high bandwidth instruction fetching , Eric Rotenberg, Steve Bennett, and James E. Smith, MICRO 29: Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture , Washington, DC, USA, 1996, pages 24-35. (For Google form: Paper 3)	04_Cache_intro.pdf, 04_Jouppi.pdf
Monday, October 10	Student presentation: Virtual memory and protection (presented by Utpal Kumar and Mohammad Moghimi)	For background: C.4 and C.5; 5.3-5.9 Mondrian memory protection , Emmett Witchel, Josh Cates, and Krste Asanovic, ASPLOS-X: Proceedings of the 10th international conference on Architectural support for programming languages and operating systems , New York, NY, USA, 2002, pages 304-316. (For Google form: Paper 4) Architecture support for single address space operating systems , Eric J. Koldinger, Jeffrey S. Chase, and Susan J. Eggers, SIGPLAN Not. 27(9):175-186, 1992. (For Google form: Paper 5) Optional: Sharing and protection in a single-address-space operating system , Jeffrey S. Chase, Henry M. Levy, Michael J. Feeley, and Edward D. Lazowska , ACM Trans. Comput. Syst. 12(4):271-307, 1994.	05_VirtualMemory.pdf, 06_SAOS.pdf, 06_SAOS_Questions.pdf
Wednesday, October 12	Student presentation: ISA design (presented by Tarun Arora and Phi Hung Nguyen)	Skim Appendix B. The case for the reduced instruction set computer , David A. Patterson and David R. Ditzel, SIGARCH Comput. Archit. News 8(6):25-33, 1980. (For Google form: Paper 6) CryptoManiac: a fast flexible architecture for secure communication , L. Wu, C. Weaver, and T. Austin, ISCA '01: Proceedings of the 28th annual international symposium on computer architecture Göteborg, Sweden, 2001, pages 110-119. (For Google form: Paper 7) Optional: Architectural support for fast symmetric-key cryptography, Jerome Burke, John McDonald, and Todd Austin, Proceedings of the ninth international conference on Architectural support for programming languages and operating systems, New York, NY, USA, 2000, pages 178-189 Optional: A VLSI RISC , D.A. Patterson and C.H. Sequin, Computer 15(9): 8-21, Sep 1982. Optional: Very Long Instruction Word architectures and the ELI-512 , Joseph A. Fisher, ISCA '83: Proceedings of the 10th annual international symposium on Computer architecture , New York, NY, USA, 1983, pages 140-150.
Monday, October 17	Slack	TBA	07_MMP-Nooks.pdf, 07_ISAs.pdf, 07_MIPSExtensions.pdf
Wednesday, October 19	No class	TBA
Monday, October 24	Faculty Research Seminar (11am)	Speaker: Prof. Steven Swanson Topic: Engineering Storage for the Data Age (For Google form: Paper 8)
Monday, October 24	Pipelining; Student presentation: Branch Prediction (presented by Eugene Kolinko and Joon Lee)	A.2 and 2.3. A study of branch prediction strategies , James E. Smith, ISCA '81: Proceedings of the 8th annual symposium on Computer Architecture , Los Alamitos, CA, USA, 1981, pages 135-148. (For Google form: Paper 10) Retrospective: a study of branch prediction strategies , James E. Smith, ISCA '98: 25 years of the international symposia on Computer architecture (selected papers) , New York, NY, USA, 1998, pages 22-23. An analysis of correlation and predictability: what makes two-level branch predictors work , Marius Evers, Sanjay J. Patel, Robert S. Chappell, and Yale N. Patt , ISCA '98: Proceedings of the 25th annual international symposium on Computer architecture , Washington, DC, USA, 1998, pages 52-61. (For Google form: Paper 9)	08_Pipelining.pdf, 08_Hazards.pdf, 08_branchprediction.pdf
Wednesday, October 26	Student presentation: OOO execution (presented by Manoj Mardithaya and Pooja Saraff)	2.1-2.11 An efficient algorithm for exploiting multiple arithmetic units , R. M. Tomasulo, IBM J. Res. Dev. 11(1):25-33, 1967. (For Google form: Paper 11) HPSm, a high performance restricted data flow architecture having minimal functionality , W. Hwu and Y. N. Patt, ISCA '86: Proceedings of the 13th annual international symposium on Computer architecture , Los Alamitos, CA, USA, 1986, pages 297-306. (For Google form: Paper 12) Retrospective: HPSm, a high performance restricted data flow architecture having minimal functionality , Wen-mei W. Hwu and Yale N. Patt, ISCA '98: 25 years of the international symposia on Computer architecture (selected papers) , New York, NY, USA, 1998, pages 43-44. Optional: HPS, a new microarchitecture: rationale and introduction , Y. N. Patt, W. M. Hwu, and M. Shebanow, SIGMICRO Newsl. 16(4):103-108, 1985. Optional: A design space evaluation of grid processor architectures , Ramadass Nagarajan, Karthikeyan Sankaralingam, Doug Burger, and Stephen W. Keckler , MICRO 34: Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture , Washington, DC, USA, 2001, pages 40-51. Optional: Excerpts from Design of a Computer: the Control Data 6600 Optional: Parallel Operation in the Control Data 6600	09_PrefetcherContest.pdf, 09_SuperScalarSMT.pdf, 09_OutOfOrderExecution.pdf	Assignment 4;
Monday, October 31	Student presentation: Dataflow (presented by Sreeparna Mukherjee and Margaret S. Urfer)	WaveScalar , S. Swanson , K. Michelson, A. Schwerin, M. Oskin , MICRO '03: Proceedings of the 36th annual IEEE/ACM international symposium on microarchitecture , Washington, DC, USA, 2003, pages 291-. (For Google form: Paper 13) Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture, Karthikeyan Sankaralingam, Ramadass Nagarajan, Haiming Liu, Changkyu Kim, Jaehyuk Huh, Doug Burger, Stephen W. Keckler, and Charles R. Moore, Proceedings of the 30th annual international symposium on Computer architecture, New York, NY, USA, 2003, pages 422-433 (For Google form: Paper 14) Optional: Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture , K. Sankaralingam, R. Nagarajan, Haiming Liu, Changkyu Kim, Jaehyuk Huh, D. Burger, S.W. Keckler, and C. Moore, Micro, IEEE 23(6): 46 - 51, nov.-dec. 2003 The wavescalar architecture , S. Swanson , A. Schwerin, M. Mercaldi, A. Petersen, A. Putnam, K. Michelson, M. Oskin, S. Eggers, ACM Trans. Comput. Syst. 25(2):1-54, 2007.	10_WaveScalar.pdf, 10_TRIPS.pdf, 10_MidtermReview.pdf
Wednesday, November 2	Midterm review session (Optional)	TBA
Monday, November 7	Midterm	TBA
Wednesday, November 9	Student presentation: Multithreading (presented by James Lue)	Multiscalar processors , Gurindar S. Sohi, Scott E. Breach, and T. N. Vijaykumar, ISCA '95: Proceedings of the 22nd annual international symposium on Computer architecture, New York, NY, USA, 1995, pages 414-425. (For Google form: Paper 15) Exploiting choice: instruction fetch and issue on an implementable simultaneous multithreading processor , Dean M. Tullsen , Susan J. Eggers, Joel S. Emer, Henry M. Levy, Jack L. Lo, and Rebecca L. Stamm , ISCA '96: Proceedings of the 23rd annual international symposium on Computer architecture , New York, NY, USA, 1996, pages 191-202. (For Google form: Paper 16) Optional: Simultaneous multithreading: maximizing on-chip parallelism , Dean M. Tullsen , Susan J. Eggers, and Henry M. Levy , ISCA '95: Proceedings of the 22nd annual international symposium on Computer architecture , New York, NY, USA, 1995, pages 392-403 Optional: Speculative Versioning Cache , T.N. Vijaykumar, S. Gopal, J.E. Smith, and G. Sohi, Parallel and Distributed Systems, IEEE Transactions on, 12(12):1305-1317, Dec 2001	11_MultiScalar.pdf, 11_SMT.pdf
Monday, November 14	CMPs, coherence, and consistency	4.1-4.10	15_CoherenceAndConsistency.pdf, 2011-Fall-CSE-240A-Midterm-v2-key.pdf
Wednesday, November 16	Student presentation: CMPs (presented by German Alfaro and Linda Pescatore)	The case for a single-chip multiprocessor , Kunle Olukotun, Basem A. Nayfeh, Lance Hammond, Ken Wilson, and Kunyung Chang , SIGPLAN Not. 31(9):2-11, 1996. (For Google form: Paper 17) Niagara: A 32-Way Multithreaded Sparc Processor , Poonacha Kongetira, Kathirgamar Aingaran, and Kunle Olukotun, IEEE Micro 25(2):21-29, 2005. (For Google form: Paper 18) Optional: Sun's slides about the UltraSPARC T2 (aka Niagara 2, aka Victoria Falls) Optional: Piranha: a scalable architecture based on single-chip multiprocessing , Luiz Andre Barroso, Kourosh Gharachorloo, Robert McNamara, Andreas Nowatzyk, Shaz Qadeer, Barton Sano, Scott Smith, Robert Stets, and Ben Verghese , ISCA '00: Proceedings of the 27th annual international symposium on Computer architecture , New York, NY, USA, 2000, pages 282-293	16_CaseForCMP.pdf, 16_NIAGARA.pdf
Monday, November 21	Faculty Research Seminar (11am)	Speaker: Prof. Michael Taylor Topic: GreenDroid: An Architecture for the Dark Silicon Era (For Google form: Paper 19)
Monday, November 21	Student presentation: Heterogeneity (presented by Erh-Li Shen and Ching-Yao Liu)	Single-ISA Heterogeneous Multi-Core Architectures: The Potential for Processor Power Reduction, Rakesh Kumar, Keith I. Farkas, Norman P. Jouppi, Parthasarathy Ranganathan, and Dean M. Tullsen, Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture, Washington, DC, USA, 2003, pages 81-. (For Google form: Paper 20) Conservation cores: reducing the energy of mature computations, Ganesh Venkatesh, Jack Sampson, Nathan Goulding, Saturnino Garcia, Vladyslav Bryksin, Jose Lugo-Martinez, Steven Swanson, and Michael Bedford Taylor, Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems, New York, NY, USA, 2010, pages 205-218 (For Google form: Paper 21)	17_CCores.pdf, 17_Heterogeneous.pdf
Wednesday, November 23			18-FlashOverview.pdf
Monday, November 28	Student presentation: Storage (presented by Jyoti Wadhwani and Sudharsan Seshadri)	6.1-6.0 Transactional flash, Vijayan Prabhakaran, Thomas L. Rodeheffer, and Lidong Zhou, Proceedings of the 8th USENIX conference on Operating systems design and implementation, Berkeley, CA, USA, 2008, pages 147-160 (For Google form: Paper 22) FlashStore: high throughput persistent key-value store, Biplob Debnath, Sudipta Sengupta, and Jin Li, Proc. VLDB Endow. 3:1414-1425, September 2010 (For Google form: Paper 23)	19_prefetcher_results.pdf, 19_FlashStore.pdf, 19_TXFlash.pdf
Wednesday, November 30	Final Review	TBA	cse240a_sample.pdf
Thursday, December 8	Final Exam	TBA	fa11_cse240a_final.doc, fa11_cse240a_final.pdf		due @ 21:59

Integrity Policy

Cheating WILL be taken seriously. Doing otherwise is not fair to honest students. It is also not fair to allow the cheater to think that it is a reasonable alternative in life.
Please review the UCSD student handbook for more details on Academic Integrity.
Anyone copying information or having information copied during a test will receive an F for the class and will not be allowed to drop. They will be reported to their college dean. If you can prove non-cooperative copying took place, your grade may be restored, but you must prove it to the dean - I don't want to be involved. Anyone caught cheating or falsely representing the work of others on the homework will not be allowed to turn in further homework. Your grade will be based exclusively on the tests with a penalty of 25% OR GREATER applied.
We photocopy a random sampling of the exams in order to ensure that students do not modify their tests after they have been returned.
Online solutions, etc.: A solutions manual exists for this text. Using it, or any solutions you may find elsewhere on the internet, IS CHEATING and will be dealt with accordingly. We know what the solution manual solutions look like. Homework is a small fraction of your grade.

Homework

Homeworks are due by 4:00pm on the due date unless otherwise noted.
Turn in your printed solutions to Bryan's mailbox in the CSE grad student mailroom, unless otherwise noted.
Late assignments will not be accepted.
There is no regrading of written homeworks, except for addition errors. No single problem will have a significant impact on your grade.
Studying in groups is definitely encouraged.
Homework assignments will typically be graded on a statistical sample of the problems in each assignment.
Homework must be typed or clearly handwritten. Illegible/unreadable answers will receive no credit.

Assignment 1: Secret username
Assignment 2: Paper summaries
Assignment 3: Class presentations
Assignment 4: Performance, Memory, ISA, VM, Pipelining, and Branch Prediction
Assignment 5: Multi-threading, CMPs, Heterogeneity, and Storage (AKA: compulsory studying for the final)

Projects

Project 1: Prefetching competition