Jonathan Weinberg, Ph.D.

Computer Science and Engineering
University of California, San Diego
Performance Modeling and Characterization Lab
San Diego Supercomputer Center
jonweinberg1 at gmail

welcome home

I received my Ph.D. in computer science from UC San Diego where I worked in high performance computing (HPC) under Dr. Allan Snavely.

My area of specialization was performance modeling of large-scale systems and, in particular, the memory subsystem. Have a look through the projects and publications on this site to learn more.

Chameleon is an automated framework that addresses three of the classic problems in memory behavior analysis:

(1) Characterizing memory reference locality in applications
(2) Generating accurate synthetic address traces
(3) Creating benchmark proxies for applications

Publications
The Chameleon Framework: Practical Solutions for Memory Behavior Analysis
J. Weinberg. Ph.D. Dissertation, University of California, San Diego, 2008.
Accurate Memory Signatures and Synthetic Address Traces for HPC Applications
J. Weinberg, A. Snavely. In The 22nd ACM International Conference on Supercomputing (ICS08), Island of Kos, Greece, June 7-12, 2008.
Chameleon: A Framework for Observing, Understanding, and Imitating Memory Behavior
J. Weinberg, A. Snavely. In PARA'08: Workshop on State-of-the-Art in Scientific and Parallel Computing, Trondheim, Norway, May 13-16, 2008.
The Chameleon Framework: Practical Solutions for Memory Behavior Analysis

PDF

This dissertation presents the Chameleon framework, an integrated solution to three classic problems in the field of memory performance analysis: reference locality modeling, accurate synthetic address trace generation, and the creation of synthetic benchmark proxies for applications. The framework includes software tools to capture a concise, machine-independent memory signature from any application and produce synthetic memory address traces that mimic that signature. It also includes the Chameleon benchmark, a fully tunable synthetic executable whose memory behavior can be dictated by these signatures. By simultaneously modeling both spatial and temporal locality, Chameleon produces uniquely accurate, general-purpose synthetic traces. Results demonstrate that the cache hit rates generated by each synthetic trace are nearly identical to those of the application it targets on dozens of memory hierarchies representing many of today's commercial offerings.
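The Chameleon tools themselves are not reproduced here, but the core idea of replaying a reuse-distance (stack-distance) histogram through an LRU stack to emit a synthetic address trace can be sketched in a few lines of Python. The function name, histogram format, and line size below are illustrative assumptions, not the framework's actual interface:

```python
import random

def synthetic_trace(reuse_hist, n_refs, line_size=64, seed=0):
    """Generate a synthetic address trace whose LRU stack-distance
    distribution follows reuse_hist: {stack_distance: probability}.
    A distance of d means "re-reference the d-th most recently used
    cache line"; the special key 'new' introduces a never-seen line."""
    rng = random.Random(seed)
    dists, weights = zip(*reuse_hist.items())
    stack = []            # LRU stack of cache-line addresses, MRU first
    next_line = 0
    trace = []
    for _ in range(n_refs):
        d = rng.choices(dists, weights)[0]
        if d == 'new' or d >= len(stack):
            addr = next_line * line_size   # cold reference: a fresh line
            next_line += 1
        else:
            addr = stack.pop(d)            # re-reference the d-th MRU line
        stack.insert(0, addr)
        trace.append(addr)
    return trace
```

A trace built this way reproduces the target temporal-locality profile by construction; matching spatial locality as well, as the dissertation does, requires additionally constraining which addresses the fresh lines receive.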

Accurate Memory Signatures and Synthetic Address Traces for HPC Applications

PDF

The Chameleon framework is a software suite that includes tools to capture a concise, machine-independent memory signature from any application and produce synthetic memory address traces that mimic that signature. In this work, we apply the framework to high-performance computing (HPC) by leveraging sampling techniques to capture the memory signatures of full-scale, parallel applications with only a 5x slowdown. The overall result is therefore a concise, observable, and machine-independent representation of the memory requirements of full-scale applications that can be tractably captured and accurately mimicked.

@INPROCEEDINGS{weinberg08accurate,
  author = {J. Weinberg and A. Snavely},
  title = {Accurate Memory Signatures and Synthetic Address Traces for HPC Applications},
  booktitle = {The 22nd ACM International Conference on Supercomputing (ICS08)},
  year = {2008},
  address = {Kos, Greece},
  month = {June}
}
Chameleon: A Framework for Observing, Understanding, and Imitating Memory Behavior

PDF

In this work, we present an integrated solution to three classic problems in the field of performance analysis: memory modeling, synthetic address trace generation, and the creation of synthetic benchmark proxies for applications. First, we describe an intuitive characterization of memory access locality that can accurately predict an application's hit rates on arbitrary cache configurations, even when block sizes and cache depths change. We then describe the implementation of a memory tracer that can extract this characterization from applications and a software tool that can generate synthetic address traces to match. Lastly, we describe Chameleon, a fully tunable synthetic benchmark whose memory behavior can be dictated by the traces described above. We show that applications and their Chameleon counterparts display highly similar memory behavior as measured by simulated and observed cache hit rates. Errors are normally within 2%.
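The paper's own characterization is more elaborate, but the classic building block such predictions rest on, LRU stack distance, is easy to sketch. The Python below is a hedged illustration (quadratic, and not the paper's actual model) of how a single pass over a cache-line trace predicts hit rates for every fully associative LRU capacity at once:

```python
def stack_distances(trace):
    """LRU stack distance of each reference in a cache-line trace;
    -1 marks a cold (first-touch) reference."""
    stack, dists = [], []
    for line in trace:
        if line in stack:
            d = stack.index(line)   # depth of the line in the LRU stack
            stack.pop(d)
        else:
            d = -1                  # never seen before
        stack.insert(0, line)       # move (or place) line at MRU position
        dists.append(d)
    return dists

def lru_hit_rate(trace, num_lines):
    """Predicted hit rate of a fully associative LRU cache holding
    num_lines lines: a reference hits iff its stack distance < num_lines."""
    dists = stack_distances(trace)
    hits = sum(1 for d in dists if 0 <= d < num_lines)
    return hits / len(dists)
```

Because LRU has the stack (inclusion) property, the same distance histogram answers the hit-rate question for any capacity without re-simulating the trace.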
@INPROCEEDINGS{weinberg08chameleon,
  author = {J. Weinberg and A. Snavely},
  title = {Chameleon: A framework for observing, understanding, and imitating memory behavior},
  booktitle = {PARA'08: Workshop on State-of-the-Art in Scientific and Parallel Computing},
  year = {2008},
  address = {Trondheim, Norway},
  month = {May}
}

Symbiotic space-sharing is a scheduling technique that improves throughput on SMP systems by executing parallel applications in combinations and configurations that alleviate pressure on shared resources.

Publications
User-Guided Symbiotic Space-Sharing of Real Workloads
J. Weinberg, A. Snavely. In The 20th ACM International Conference on Supercomputing (ICS06), Cairns, Australia, June 28-July 1, 2006.
Symbiotic Space-Sharing on SDSC's DataStar System
J. Weinberg, A. Snavely. In The 12th Workshop on Job Scheduling Strategies for Parallel Processing, Saint-Malo, France, June 27, 2006 (LNCS 4376, pp.192-209, 2007).
When Jobs Play Nice: The Case For Symbiotic Space-Sharing
J. Weinberg, A. Snavely. In Proceedings of the 15th IEEE International Symposium on High Performance Distributed Computing (HPDC 15), Paris, France, June 19-23, 2006.
User-Guided Symbiotic Space-Sharing of Real Workloads

[PDF]

Symbiotic space-sharing is a technique that can improve system throughput by executing parallel applications in combinations and configurations that alleviate pressure on shared resources. We have shown prototype schedulers that leverage such techniques to improve throughput by 20% over conventional space-sharing schedulers when resource bottlenecks are known. Such evaluations have utilized benchmark workloads and proposed that schedulers be informed of resource bottlenecks by users at job submission time; in this work, we investigate the accuracy with which users can actually identify resource bottlenecks in real applications and the implications of these predictions for symbiotic space-sharing of production workloads. Using a large HPC platform, a representative application workload, and a sampling of expert users, we show that user inputs are of value and that for our chosen workload, user-guided symbiotic scheduling can improve throughput over conventional space-sharing by 15-22%.
@INPROCEEDINGS{weinberg06user-guided,
  author = {J. Weinberg and A. Snavely},
  title = {User-Guided Symbiotic Space-Sharing of Real Workloads},
  booktitle = {The 20th {ACM} International Conference on Supercomputing (ICS'06)},
  year = {2006},
  month = {June}
}
Symbiotic Space-Sharing on SDSC's DataStar System

[PDF]

Using a large HPC platform, we investigate the effectiveness of "symbiotic space-sharing", a technique that improves system throughput by executing parallel applications in combinations and configurations that alleviate pressure on shared resources. We demonstrate that relevant benchmarks commonly suffer a 10-60% penalty in runtime efficiency due to memory resource bottlenecks and up to several orders of magnitude for I/O. We show that this penalty can often be mitigated, and sometimes virtually eliminated, by symbiotic space-sharing techniques and deploy a prototype scheduler that leverages these findings to improve system throughput by 20%.
@INPROCEEDINGS{weinberg06symbiotic,
  author = {J. Weinberg and A. Snavely},
  title = {Symbiotic Space-Sharing on SDSC's DataStar System},
  booktitle = {The 12th Workshop on Job Scheduling Strategies for Parallel Processing (JSSPP '06)},
  year = {2006},
  address = {Saint-Malo, France},
  month = {June}
}
When Jobs Play Nice: The Case For Symbiotic Space-Sharing

[PDF]

Using a large HPC platform, we investigate the effectiveness of "symbiotic space-sharing", a technique that improves system throughput by executing parallel applications in combinations and configurations that alleviate pressure on shared resources. We demonstrate that relevant benchmarks commonly suffer a 10-60% penalty in runtime efficiency due to memory resource bottlenecks and up to several orders of magnitude for I/O. We show that this penalty can often be mitigated, and sometimes virtually eliminated, by symbiotic space-sharing techniques and deploy a prototype scheduler that leverages these findings to improve system throughput by 20%.
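The prototype scheduler itself is not reproduced on this page, but the heart of symbiotic space-sharing, co-locating jobs whose bottlenecks do not collide, can be sketched as a greedy pairing. The job format, bottleneck tags, and pairing policy below are illustrative assumptions, not the prototype's actual algorithm:

```python
from collections import deque

def symbiotic_pairs(jobs):
    """Greedily co-locate jobs with complementary bottlenecks so that no
    node runs two jobs stressing the same shared resource. Each job is a
    (name, bottleneck) tuple, where bottleneck is a user-supplied tag
    such as 'memory', 'io', or 'cpu' (hypothetical labels). Returns a
    list of (job, partner-or-None) node assignments."""
    queues = {}
    for job in jobs:
        queues.setdefault(job[1], deque()).append(job)
    assignments = []
    while any(queues.values()):
        # Take a job from the most backlogged bottleneck class...
        tag = max(queues, key=lambda t: len(queues[t]))
        job = queues[tag].popleft()
        # ...and pair it with a job from any other (complementary) class.
        partner = None
        for other, q in queues.items():
            if other != tag and q:
                partner = q.popleft()
                break
        assignments.append((job, partner))
    return assignments
```

A job is left unpaired (partner None) when only same-bottleneck work remains, which is exactly the case where space-sharing offers no symbiotic benefit.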
@INPROCEEDINGS{weinberg06symbiosisHPDC,
  author = {J. Weinberg and A. Snavely},
  title = {When Jobs Play Nice: The Case For Symbiotic Space-Sharing},
  booktitle = {Proceedings of the 15th {IEEE} {I}nternational {S}ymposium on {H}igh
	{P}erformance {D}istributed {C}omputing ({HPDC}-15 '06)},
  year = {2006},
  address = {Paris, France},
  month = {June}
}
2006
Job Scheduling on Parallel Systems
J. Weinberg. Ph.D. Research Examination, University of California, San Diego, June 2006.
2005
Quantifying Locality In The Memory Access Patterns of HPC Applications
J. Weinberg, M. O. McCracken, A. Snavely, E. Strohmaier. In Supercomputing 2005, Seattle, WA, November 12-16, 2005.
Datagridflows: Managing Long-Run Processes on Datagrids
A. Jagatheesan, J. Weinberg, et al. In Lecture Notes in Computer Science 3836, Springer, 2005, ISBN 3-540-31212-9; and the VLDB Workshop on Data Management in Grids, Trondheim, Norway, September 2-3, 2005.
Quantifying Locality In The Memory Access Patterns of HPC Applications
J. Weinberg. Master's Thesis, University of California, San Diego, August 26, 2005.
2004
Gridflow Description, Query, and Execution at SCEC using the SDSC Matrix
J. Weinberg, A. Jagatheesan, A. Ding, M. Faerman, Y. Hu. Proceedings of the 13th IEEE International Symposium on High-Performance Distributed Computing (HPDC 13), Honolulu, Hawaii, June 4-6, 2004.
Job Scheduling on Parallel Systems

[PDF]

Parallel systems such as supercomputers are valuable resources commonly shared among a community of users. The problem of job scheduling is to determine how that sharing should be done in order to maximize the system's utility. This problem has been extensively studied for well over a decade, yielding a great breadth of knowledge and techniques. In this work, we survey the ideas and approaches that have proven most influential to how jobs are scheduled on today's large-scale parallel systems. With this background in mind, we discuss how deployed scheduling policies can be improved to meet existing requirements and how trends in parallel processing are currently altering those requirements.
Quantifying Locality In The Memory Access Patterns of HPC Applications

[PDF]

Several benchmarks for measuring the memory performance of HPC systems along dimensions of spatial and temporal memory locality have recently been proposed. However, little is understood about the relationships of these benchmarks to real applications and to each other. We propose a methodology for producing architecture-neutral characterizations of the spatial and temporal locality exhibited by the memory access patterns of applications. We demonstrate that the results track intuitive notions of locality on several synthetic and application benchmarks. We employ the methodology to analyze the memory performance components of the HPC Challenge Benchmarks, the Apex-MAP benchmark, and their relationships to each other and other benchmarks and applications.
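The paper develops its own architecture-neutral scores; the Python sketch below conveys only the flavor of such a characterization, and its window, stride, and line-size parameters, like its exact definitions, are illustrative assumptions rather than the paper's:

```python
def locality_scores(trace, window=32, stride_limit=2, line_size=64):
    """Crude spatial/temporal locality scores for an address trace.
    temporal: fraction of references whose cache line was also touched
              within the last `window` references.
    spatial:  fraction of references landing within `stride_limit`
              lines of the immediately preceding reference."""
    lines = [a // line_size for a in trace]
    temporal = sum(1 for i, l in enumerate(lines)
                   if l in lines[max(0, i - window):i]) / len(lines)
    spatial = sum(1 for prev, cur in zip(lines, lines[1:])
                  if 0 < abs(cur - prev) <= stride_limit) / (len(lines) - 1)
    return spatial, temporal
```

A streaming trace scores high on the spatial axis and low on the temporal one, while a trace that hammers a tiny working set does the reverse; plotting applications on these two axes is the kind of 2-D mapping the paper's methodology enables, with far more care taken over the definitions.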
@inproceedings{weinberg05quantifying,
 author = {Jonathan Weinberg and Michael O. McCracken and Erich Strohmaier and Allan Snavely},
 title = {Quantifying Locality In The Memory Access Patterns of HPC Applications},
 booktitle = {SC '05: Proceedings of the 2005 ACM/IEEE conference on Supercomputing},
 year = {2005},
 isbn = {1-59593-061-2},
 doi = {http://dx.doi.org/10.1109/SC.2005.59},
 publisher = {IEEE Computer Society},
 address = {Washington, DC, USA}}
 
Datagridflows: Managing Long-Run Processes on Datagrids

[PDF]

Data grids have become important for managing large, unstructured data and storage resources distributed over autonomous administrative domains. The datagrids now operating in production give us an idea of the new requirements and challenges that will be faced in future datagrid environments. One such requirement is the coordinated execution of long-run data management processes in datagrids. This paper is intended to introduce the challenges of datagrid environments to other researchers, including those new to grid computing. We provide motivation through discussion of datagridflow requirements and real production scenarios. We introduce current work on datagridflow technologies, including the Datagrid Language (DGL) for describing datagridflows in datagrids.
@inproceedings{Jagatheesan05Datagridflows,
  author = {Arun Jagatheesan and Jonathan Weinberg and Reena Mathew
	and Allen Ding and Erik Vandekieft and Daniel Moore and Reagan W. Moore
	and Lucas Gilbert and Mark Tran and Jeffrey Kuramoto},
  booktitle = {DMG},
  url = {http://dx.doi.org/10.1007/11611950_10},
  key = {conf/dmg/JagatheesanWMDVMMGTK05},
  pages = {113--128},
  title = {Datagridflows: Managing Long-Run Processes on Datagrids},
  year = {2005}}
Quantifying Locality In The Memory Access Patterns of HPC Applications

[PDF]

Several benchmarks for measuring the memory performance of HPC systems along dimensions of spatial and temporal memory locality have recently been proposed. However, little is understood about the relationships of these benchmarks to real applications and to each other. We propose a methodology for producing architecture-neutral characterizations of the spatial and temporal locality exhibited by the memory access patterns of applications. We demonstrate that our results track intuitive notions of locality on several synthetic and application benchmarks. We employ the methodology to analyze the memory performance components of the HPC Challenge Benchmarks, the Apex-MAP benchmark, and their relationships to each other and other benchmarks and applications. We show that our methodology can be applied to scoring real large-scale parallel applications and that this analysis can be used to both increase understanding of the benchmarks and enhance their usefulness by mapping them, along with applications, to a 2-D space along axes of spatial and temporal locality.

Gridflow Description, Query, and Execution at SCEC using the SDSC Matrix

[PDF]

While conventional workflow systems have been around for many years, the deployment of analogous systems onto a grid infrastructure introduces a number of unique questions and challenges. Innovative approaches to grid workflow (gridflow) are needed to leverage the heterogeneity, autonomy, dynamic behavior, and wide-area distribution that characterize grid resources. The Matrix Project carries out research and development to deliver the language descriptions and protocols necessary to build collaborative gridflow management systems for the emerging grid infrastructures. We describe here our activities to date including development of the Data Grid Language (DGL) and the usage of the Matrix gridflow management system by the Southern California Earthquake Center (SCEC) to manage its gridflows.