CSE 240B:  Parallel Computer Architecture



Tuesday/Thursday
12:30-1:50
WLH 2209


Instructor: Dean Tullsen

Email: tullsen at cs dot ucsd dot edu
Office hours:
Class mailing list: cse240b@cs.ucsd.edu archive

Syllabus

Text:
Related reading:
Assignments:
Projects:
Grading:
Other notes:

Schedule (very tentative at the moment)

Date
Topic
Readings
January 6
Administrivia; Overview of parallel architecture; Introduction to Coherence (slides)
n/a
January 8
Coherence (slides)
H&P 3rd ed.: 6.1-6.6 (esp. 6.3 and 6.5)
H&P 4th ed.: 4.1-4.5
January 13
Coherence (slides)
M. M. K. Martin, M. D. Hill, and D. A. Wood, ``Token coherence: decoupling performance and correctness,'' in ISCA '03: Proceedings of the 30th annual international symposium on Computer architecture, pp. 182-193, 2003 link.

D. Lenoski, J. Laudon, K. Gharachorloo, A. Gupta, and J. Hennessy, ``The directory-based cache coherence protocol for the DASH multiprocessor,'' in ISCA '90: Proceedings of the 17th annual international symposium on Computer Architecture, pp. 148-159, 1990 link.
OPTIONAL (DASH performance) D. Lenoski, J. Laudon, T. Joe, D. Nakahira, L. Stevens, A. Gupta, and J. Hennessy, ``The DASH prototype: implementation and performance,'' in ISCA '98: 25 years of the international symposia on Computer architecture (selected papers), pp. 418-429, 1998 link.
January 15
Consistency (slides)
S. V. Adve and K. Gharachorloo, ``Shared Memory Consistency Models: A Tutorial,'' tech. rep., DEC WRL, 1995 link.
January 20
Consistency (slides)
V. S. Pai, P. Ranganathan, S. V. Adve, and T. Harton, ``An evaluation of memory consistency models for shared-memory systems with ILP processors,'' in ASPLOS-VII: Proceedings of the seventh international conference on Architectural support for programming languages and operating systems, pp. 12-23, 1996 link.

C. Gniady, B. Falsafi, and T. N. Vijaykumar, ``Is SC + ILP = RC?,'' in ISCA '99: Proceedings of the 26th annual international symposium on Computer architecture, pp. 162-171, 1999 link.
OPTIONAL (but mind-bending) J. Manson, W. Pugh, and S. V. Adve, ``The Java memory model,'' in POPL '05: Proceedings of the 32nd ACM SIGPLAN-SIGACT Symposium on Principles of programming languages, pp. 378-391, 2005 link.
January 22
Synchronization (slides)
H&P 3rd ed.: 6.7
H&P 4th ed.: 4.5


M. Herlihy, ``A methodology for implementing highly concurrent data structures,'' in PPOPP '90: Proceedings of the second ACM SIGPLAN symposium on Principles & practice of parallel programming, pp. 197-206, 1990 link.
January 27
Transactions (slides)
L. Hammond, V. Wong, M. Chen, B. D. Carlstrom, J. D. Davis, B. Hertzberg, M. K. Prabhu, H. Wijaya, C. Kozyrakis, and K. Olukotun, ``Transactional Memory Coherence and Consistency,'' in ISCA '04: Proceedings of the 31st annual international symposium on Computer architecture, p. 102, 2004 link.

M. Herlihy and J. E. B. Moss, ``Transactional memory: architectural support for lock-free data structures,'' in ISCA '93: Proceedings of the 20th annual international symposium on Computer architecture, pp. 289-300, 1993 link.
January 29
Transactions (slides)
R. Rajwar, M. Herlihy, and K. Lai, ``Virtualizing Transactional Memory,'' in ISCA '05: Proceedings of the 32nd Annual International Symposium on Computer Architecture, pp. 494-505, 2005 link.

B. Saha, A.-R. Adl-Tabatabai, and Q. Jacobson, ``Architectural Support for Software Transactional Memory,'' in MICRO '06: Proceedings of the 39th international symposium on Microarchitecture, 2006.
February 3
SMT/Threading (slides)
D. M. Tullsen, S. J. Eggers, J. M. Emer, H. M. Levy, J. L. Lo, and R. L. Stamm, ``Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor,'' in ISCA '96: Proceedings of the 23rd annual international symposium on Computer architecture, pp. 191-202, 1996 link.

A. Agarwal, R. Bianchini, D. Chaiken, K. L. Johnson, D. Kranz, J. Kubiatowicz, B.-H. Lim, K. Mackenzie, and D. Yeung, ``The MIT Alewife machine: architecture and performance,'' in ISCA '95: Proceedings of the 22nd annual international symposium on Computer architecture, pp. 2-13, 1995 link.
February 5
SMT/Threading (slides)
J. D. Collins, H. Wang, D. M. Tullsen, C. Hughes, Y.-F. Lee, D. Lavery, and J. P. Shen, ``Speculative precomputation: long-range prefetching of delinquent loads,'' in ISCA '01: Proceedings of the 28th annual international symposium on Computer architecture, pp. 14-25, 2001 link.

E. Tune, R. Kumar, D. M. Tullsen, and B. Calder, ``Balanced Multithreading: Increasing Throughput via a Low Cost Multithreading Hierarchy,'' in MICRO 37: Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture, pp. 183-194, 2004 link.
February ??
Mini-project due

February 10
Mid-term

February 12
CMPs
K. Olukotun, B. A. Nayfeh, L. Hammond, K. Wilson, and K. Chang, ``The case for a single-chip multiprocessor,'' in ASPLOS-VII: Proceedings of the seventh international conference on Architectural support for programming languages and operating systems, pp. 2-11, 1996 link.

J. Huh, D. Burger, and S. W. Keckler, ``Exploring the Design Space of Future CMPs,'' in PACT '01: Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques, pp. 199-210, 2001 link.
February 17
CMPs
R. Kumar, D. M. Tullsen, P. Ranganathan, N. P. Jouppi, and K. I. Farkas, ``Single-ISA Heterogeneous Multi-Core Architectures for Multithreaded Workload Performance,'' in ISCA '04: Proceedings of the 31st annual international symposium on Computer architecture, p. 64, 2004 link.

???? H. McGhan, ``Niagara 2 Opens the Floodgates,'' Microprocessor Report, November 2006 link.
February 19
Interconnects

H&P3 8.1-8.5, 8.9 or H&P4 E.1-E.6, E.10


February 24
Interconnects
R. Kumar, V. Zyuban, and D. M. Tullsen, ``Interconnections in Multi-Core Architectures: Understanding Mechanisms, Overheads and Scaling,'' in ISCA '05: Proceedings of the 32nd Annual International Symposium on Computer Architecture, pp. 408-419, 2005 link.


February 26
Tiled Architectures
K. Sankaralingam, R. Nagarajan, H. Liu, C. Kim, J. Huh, D. Burger, S. W. Keckler, and C. R. Moore, ``Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture,'' SIGARCH Comput. Archit. News, vol. 31, no. 2, pp. 422-433, 2003 link.

S. Swanson, A. Schwerin, M. Mercaldi, A. Petersen, A. Putnam, K. Michelson, M. Oskin, and S. J. Eggers, ``The WaveScalar Architecture,'' to appear in ACM Transactions on Computer Systems link.
March 3
CMP Caches
J. Chang and G. Sohi, ``Cooperative Caching for Chip Multiprocessors,'' ISCA 2006 link.

Iyer, et al., ``QoS Policies and Architecture for Cache/Memory in CMP Platforms,'' Sigmetrics 2007 link.
March 5
Non General-Purpose/streaming machines
Gschwind, et al., ``Synergistic processing in Cell's multicore architecture,'' IEEE Micro, March 2006 link.

J. H. Ahn, W. J. Dally, B. Khailany, U. J. Kapasi, and A. Das, ``Evaluating the Imagine Stream Architecture,'' in ISCA '04: Proceedings of the 31st annual international symposium on Computer architecture, p. 14, 2004 link.
March 10
ISCA 2008
Martha Mercaldi Kim, John D. Davis, Mark Oskin, and Todd Austin, ``Polymorphic On-Chip Networks'' link.

Brandon Lucia, Joseph Devietti, Karin Strauss, and Luis Ceze, ``Atom-Aid: Detecting and Surviving Atomicity Violations'' link.
March 12
Project Presentations (Students)

March 16
Project Report due (midnight)

March 20
Final