cse240c: Advanced Microarchitecture

Winter, 2009
Lectures Tue. & Thu., 5:00-6:20
Steven Swanson
Email: swanson @ cs.ucsd.edu
IM (not email): professorswanson@{AIM, Yahoo!, google talk, MS Messenger}
Office: EBU3B 3212
Office Hours: TBA


Course Description

This course will cover advanced topics in processor microarchitecture. We will cover both the "latest and greatest" as well as the "oldies but goodies" in both commercial processors and architecture research. We will learn answers to questions like:

The basic format for the class will be: Read papers and discuss. There will also be a mid-sized project.

Textbooks

Required: Assigned readings throughout the quarter. See the schedule below.


Note that 40% of your grade is determined by preparing for and participating in class.

Paper summaries 20% You will summarize each paper we read in class. Summaries are due 20 minutes before class begins. No exceptions. This means there is no reason to be late for class to complete your summary.
Class participation 20% This class is discussion driven, so you must come prepared to discuss the material.
Project 30% There will be a mid-sized project.
In-class presentations 30% In lieu of exams, each of you will prepare and present two presentations on topics we will cover.


We will read roughly two papers per class. Some days listed below have more than that; we'll thin them out depending on class interest.


Items in the schedule more than one week in the future are subject to change. Check back for updates to the assigned readings, etc. Deadlines for homeworks/projects that have been assigned will not be moved earlier.

I will post the slides for most lectures. Since the slides contain material I am not allowed to distribute publicly, they are password protected. I have posted the username and password to the web board.

Date Topic Readings Slides Due Notes
Tuesday, January 6 Administrivia and overview slides , slides
Thursday, January 8 Historical perspectives Cramming More Components Onto Integrated Circuits, G.E. Moore, Proceedings of the IEEE 86(1):82-85, Jan 1998 link.

The history of the microcomputer-invention and evolution, S. Mazor, Proceedings of the IEEE 83(12):1601-1608, Dec 1995 link.

Additional readings if you are interested:
A 4096-bit dynamic MOS RAM, J. Karp, W. Regitz, and S. Chou, Solid-State Circuits Conference. Digest of Technical Papers. 1972 IEEE International XV: 10-11, Feb 1972 link.

A three transistor-cell, 1024-bit, 500 NS MOS RAM, W. Regitz and J. Karp, Solid-State Circuits Conference. Digest of Technical Papers. 1970 IEEE International XIII: 42-43, Feb 1970 link.

Design of ion-implanted MOSFET's with very small physical dimensions, R.H. Dennard, F.H. Gaensslen, V.L. Rideout, E. Bassous, and A.R. LeBlanc, Solid-State Circuits, IEEE Journal of 9(5): 256-268, Oct 1974 link.

The future of wires, R. Ho, K.W. Mai, and M.A. Horowitz, Proceedings of the IEEE 89(4):490-504, Apr 2001 link.
slides , slides , slides , slides
Tuesday, January 13 Historical perspectives Architecture of the IBM System/360, G. M. Amdahl, G. A. Blaauw, and F. P. Brooks, Jr., :17-31, 2000 link.

Parallel operation in the control data 6600, James E. Thornton, :5-12, 1995 link.

Additional readings if you are interested:
Design of a Computer -- The Control Data 6600, James E. Thornton, link.

Considerations in Computer Design - Leading up to the Control Data 6600, James E. Thornton, 1963 link.

IBM's 360 and early 370 systems, Emerson Pugh, Lyle R. Johnson, and John H. Palmer, MIT Press, 1991.
slides , slides , slides
Thursday, January 15 Historical perspectives CRAY-1 Computer Technology, J. Kolodzey, Components, Hybrids, and Manufacturing Technology, IEEE Transactions on 4(2): 181-186, Jun 1981 link.

The CRAY-1 computer system, Richard M. Russell, Commun. ACM 21(1):63-72, 1978 link.

Additional readings if you are interested:
An analysis of the Cray-1 computer, Richard L. Sites, ISCA '78: Proceedings of the 5th annual symposium on Computer architecture, New York, NY, USA, 1978, pages 101-106 link.

Tarantula: a vector extension to the alpha architecture, R. Espasa, F. Ardanaz, J. Emer, S. Felix, J. Gago, R. Gramunt, I. Hernandez, T. Juan, G. Lowney, M. Mattina, and A. Seznec, Computer Architecture, 2002. Proceedings. 29th Annual International Symposium on:281-292, 2002 link.
Tuesday, January 20 Unconventional OOO execution Executing a program on the MIT tagged-token dataflow architecture, Arvind and R.S. Nikhil, Computers, IEEE Transactions on 39(3):300-318, Mar 1990 link.

HPS, a new microarchitecture: rationale and introduction, Y. N. Patt, W. M. Hwu, and M. Shebanow, MICRO 18: Proceedings of the 18th annual workshop on Microprogramming, New York, NY, USA, 1985, pages 103-108 link.

Additional readings if you are interested:
Critical issues regarding HPS, a high performance microarchitecture, Y. N. Patt, S. W. Melvin, W. M. Hwu, and M. C. Shebanow, SIGMICRO Newsl. 16(4):109-116, 1985 link.

First version of a data flow procedure language, J. B. Dennis, Programming Symposium, Proceedings Colloque sur la Programmation, London, UK, 1974, pages 362-376.

Monsoon: an explicit token-store architecture, G.M. Papadopoulos and D.E. Culler, Computer Architecture, 1990. Proceedings., 17th Annual International Symposium on:82-91, May 1990 link.
slides , slides Project 1-1;
Thursday, January 22 Unconventional OOO execution The WaveScalar architecture, Steven Swanson, Andrew Schwerin, Martha Mercaldi, Andrew Petersen, Andrew Putnam, Ken Michelson, Mark Oskin, and Susan J. Eggers, ACM Trans. Comput. Syst. 25(2):4, 2007 link. Focus on Sections 1-4, and skim the rest.

Tartan: evaluating spatial computation for whole program execution, Mahim Mishra, Timothy J. Callahan, Tiberiu Chelcea, Girish Venkataramani, Seth C. Goldstein, and Mihai Budiu, ASPLOS-XII: Proceedings of the 12th international conference on Architectural support for programming languages and operating systems, New York, NY, USA, 2006, pages 163-174 link.

Additional readings if you are interested:
Spatial computation, Mihai Budiu, Girish Venkataramani, Tiberiu Chelcea, and Seth Copen Goldstein, ASPLOS-XI: Proceedings of the 11th international conference on Architectural support for programming languages and operating systems, New York, NY, USA, 2004, pages 14-26 link.

NanoFabrics: spatial computing using molecular electronics, Seth Copen Goldstein and Mihai Budiu, ISCA '01: Proceedings of the 28th annual international symposium on Computer architecture, New York, NY, USA, 2001, pages 178-191 link.

slides , slides
Tuesday, January 27 Unconventional OOO execution TRIPS: A polymorphous architecture for exploiting ILP, TLP, and DLP, Karthikeyan Sankaralingam, Ramadass Nagarajan, Haiming Liu, Changkyu Kim, Jaehyuk Huh, Nitya Ranganathan, Doug Burger, Stephen W. Keckler, Robert G. McDonald, and Charles R. Moore, ACM Trans. Archit. Code Optim. 1(1):62-93, 2004 link.

Composable Lightweight Processors, Changkyu Kim, Simha Sethumadhavan, M. S. Govindan, Nitya Ranganathan, Divya Gulati, Doug Burger, and Stephen W. Keckler, MICRO '07: Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture, Washington, DC, USA, 2007, pages 381-394 link.

Additional readings if you are interested:
Universal Mechanisms for Data-Parallel Architectures, Karthikeyan Sankaralingam, Stephen W. Keckler, William R. Mark, and Doug Burger, MICRO 36: Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture, Washington, DC, USA, 2003, page 303 link.

A design space evaluation of grid processor architectures, R. Nagarajan, K. Sankaralingam, D. Burger, and S.W. Keckler, Microarchitecture, 2001. MICRO-34. Proceedings. 34th ACM/IEEE International Symposium on: 40-51, Dec. 2001 link.
slides Zack Presents.
Thursday, January 29 Reliability Transient fault detection via simultaneous multithreading, Steven K. Reinhardt and Shubhendu S. Mukherjee, SIGARCH Comput. Archit. News 28(2):25-36, 2000 link.

Techniques to Reduce the Soft Error Rate of a High-Performance Microprocessor, Christopher Weaver, Joel Emer, Shubhendu S. Mukherjee, and Steven K. Reinhardt, ISCA '04: Proceedings of the 31st annual international symposium on Computer architecture, Washington, DC, USA, 2004, page 264 link.

Additional readings if you are interested:
Concurrent error detection using watchdog processors-a survey, A. Mahmood and E.J. McCluskey, Computers, IEEE Transactions on 37(2):160-174, Feb. 1988 link.

The risk of data corruption in microprocessor-based systems, R. Horst, D. Jewett, and D. Lenoski, Fault-Tolerant Computing, 1993. FTCS-23. Digest of Papers., The Twenty-Third International Symposium on:576-585, Jun 1993 link.

AR-SMT: a microarchitectural approach to fault tolerance in microprocessors, E. Rotenberg, Fault-Tolerant Computing, 1999. Digest of Papers. Twenty-Ninth Annual International Symposium on:84-91, 1999 link.

IBM's S/390 G5 microprocessor design, T.J. Slegel, R.M. Averill III, M.A. Check, B.C. Giamei, B.W. Krumm, C.A. Krygowski, W.H. Li, J.S. Liptay, J.D. MacDougall, T.J. McPherson, J.A. Navarro, E.M. Schwarz, K. Shum, and C.F. Webb, Micro, IEEE 19(2):12-23, Mar/Apr 1999 link.
slides Jose Presents
Tuesday, February 3 Reliability A Systematic Methodology to Compute the Architectural Vulnerability Factors for a High-Performance Microprocessor, Shubhendu S. Mukherjee, Christopher Weaver, Joel Emer, Steven K. Reinhardt, and Todd Austin, MICRO 36: Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture, Washington, DC, USA, 2003, page 29 link.

DIVA: a reliable substrate for deep submicron microarchitecture design, Todd M. Austin, MICRO 32: Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture, Washington, DC, USA, 1999, pages 196-207 link.
slides , slides Zack Presents.
Thursday, February 5 Circuit-level microarchitectural issues ReCycle: pipeline adaptation to tolerate process variation, Abhishek Tiwari, Smruti R. Sarangi, and Josep Torrellas, ISCA '07: Proceedings of the 34th annual international symposium on Computer architecture, New York, NY, USA, 2007, pages 323-334 link.

Razor: a low-power pipeline based on circuit-level timing speculation, D. Ernst, Nam Sung Kim, S. Das, S. Pant, R. Rao, Toan Pham, C. Ziesler, D. Blaauw, T. Austin, K. Flautner, and T. Mudge, Microarchitecture, 2003. MICRO-36. Proceedings. 36th Annual IEEE/ACM International Symposium on: 7-18, Dec. 2003 link.

slides Ameen presents
Tuesday, February 10 Circuit-level microarchitectural issues The optimal logic depth per pipeline stage is 6 to 8 FO4 inverter delays, M. S. Hrishikesh, Doug Burger, Norman P. Jouppi, Stephen W. Keckler, Keith I. Farkas, and Premkishore Shivakumar, SIGARCH Comput. Archit. News 30(2):14-24, 2002 link.

Optimum Power/Performance Pipeline Depth, A. Hartstein and Thomas R. Puzak, MICRO 36: Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture, Washington, DC, USA, 2003, page 117 link.
slides Anshuman presents.
Thursday, February 12 Multi-threading Multiscalar processors, Gurindar S. Sohi, Scott E. Breach, and T. N. Vijaykumar, ISCA '95: Proceedings of the 22nd annual international symposium on Computer architecture, New York, NY, USA, 1995, pages 414-425 link.

A scalable approach to thread-level speculation, J. Greggory Steffan, Christopher B. Colohan, Antonia Zhai, and Todd C. Mowry, ISCA '00: Proceedings of the 27th annual international symposium on Computer architecture, New York, NY, USA, 2000, pages 1-12 link.

Additional readings if you are interested:
Speculative Versioning Cache, T. N. Vijaykumar, Sridhar Gopal, James E. Smith, and Gurindar Sohi, IEEE Trans. Parallel Distrib. Syst. 12(12):1305-1317, 2001 link.

slides , slides
Tuesday, February 17 Specialized architectures Imagine: media processing with streams, B. Khailany, W.J. Dally, U.J. Kapasi, P. Mattson, J. Namkoong, J.D. Owens, B. Towles, A. Chang, and S. Rixner, Micro, IEEE 21(2):35-46, Mar/Apr 2001 link.

CryptoManiac: a fast flexible architecture for secure communication, Lisa Wu, Chris Weaver, and Todd Austin, ISCA '01: Proceedings of the 28th annual international symposium on Computer architecture, New York, NY, USA, 2001, pages 110-119 link.

Additional readings if you are interested:
Evaluating the Imagine Stream Architecture, Jung Ho Ahn, William J. Dally, Brucek Khailany, Ujval J. Kapasi, and Abhishek Das, ISCA '04: Proceedings of the 31st annual international symposium on Computer architecture, Washington, DC, USA, 2004, page 14.

slides , slides Dan Amelang presents.
Thursday, February 19 Program analysis Automatically characterizing large scale program behavior, Timothy Sherwood, Erez Perelman, Greg Hamerly, and Brad Calder, ASPLOS-X: Proceedings of the 10th international conference on Architectural support for programming languages and operating systems, New York, NY, USA, 2002, pages 45-57 link.

Limits of control flow on parallelism, Monica S. Lam and Robert P. Wilson, SIGARCH Comput. Archit. News 20(2):46-57, 1992 link.

Additional readings if you are interested:
Phase tracking and prediction, Timothy Sherwood, Suleyman Sair, and Brad Calder, SIGARCH Comput. Archit. News 31(2):336-349, 2003 link.

The intrinsic bandwidth requirements of ordinary programs, Andrew S. Huang and John Paul Shen, ASPLOS-VII: Proceedings of the seventh international conference on Architectural support for programming languages and operating systems, New York, NY, USA, 1996, pages 105-114 link.

Limits on multiple instruction issue, M. D. Smith, M. Johnson, and M. A. Horowitz, SIGARCH Comput. Archit. News 17(2):290-302, 1989 link.

Limits of instruction-level parallelism, David W. Wall, ASPLOS-IV: Proceedings of the fourth international conference on Architectural support for programming languages and operating systems, New York, NY, USA, 1991, pages 176-188 link.
slides Jose presents.
Tuesday, February 24 Highly-dynamic execution Putting the fill unit to work: dynamic optimizations for trace cache microprocessors, Daniel Holmes Friendly, Sanjay Jeram Patel, and Yale N. Patt, MICRO 31: Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture, Los Alamitos, CA, USA, 1998, pages 173-181 link.

PipeRench implementation of the instruction path coprocessor, Yuan Chou, Pazhani Pillai, Herman Schmit, and John Paul Shen, MICRO 33: Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture, New York, NY, USA, 2000, pages 147-158 link.

slides Rushi presents
Thursday, February 26 Power Energy Optimization of Subthreshold-Voltage Sensor Network Processors, Leyla Nazhandali, Bo Zhai, Javin Olson, Anna Reeves, Michael Minuth, Ryan Helfand, Sanjay Pant, Todd Austin, and David Blaauw, ISCA '05: Proceedings of the 32nd annual international symposium on Computer Architecture, Washington, DC, USA, 2005, pages 197-207 link.

Temperature-aware microarchitecture: Modeling and implementation, Kevin Skadron, Mircea R. Stan, Karthik Sankaranarayanan, Wei Huang, Sivakumar Velusamy, and David Tarjan, ACM Trans. Archit. Code Optim. 1(1):94-125, 2004 link.

slides Ameen presents
Tuesday, March 3 Case Studies Core 2 article 1
Core 2 article 2
Core 2 article 3
Itanium Processor Microarchitecture, Harsh Sharangpani and Ken Arora, IEEE Micro 20(5):24-43, 2000 link.

EPIC: Explicitly Parallel Instruction Computing, M.S. Schlansker and B.R. Rau, Computer 33(2):37-45, Feb 2000 link.

slides Anshuman Presents
Thursday, March 5 Case studies The microarchitecture of the Pentium 4 processor, Dave Sager, Desktop Platforms Group, and Intel Corp, Intel Technology Journal 1:2001, 2001 link.

The Alpha 21264 microprocessor architecture, R.E. Kessler, E.J. McLellan, and D.A. Webb, Computer Design: VLSI in Computers and Processors, 1998. ICCD '98. Proceedings. International Conference on:90-95, Oct 1998 link.

slides Rushi Presents
Tuesday, March 10 No class
Thursday, March 12 Potpourri Chapters 1 and 9 of Capability-Based Computer Systems, Henry M. Levy, 1984.

Decoupled access/execute computer architectures, James E. Smith, ACM Trans. Comput. Syst. 2(4):289-308, 1984 link.
slides , slides Dan presents
6:00pm, Thursday, March 19 Final Exam/Project presentations/Pizza TBA Project 1-2; In cse4217


