About Me

I am a sixth-year Ph.D. candidate in Computer Science working with the Systems and Networking Group at UC San Diego, advised by George Porter and Amin Vahdat. I received my Master of Science degree in Computer Science from UC San Diego in 2012 and my Bachelor of Science degree in Computer Science from Caltech in 2009. My research interests include large-scale distributed systems, I/O-intensive "big data" applications, next-generation cluster hardware, the cloud, and efficient, balanced computing.

My Ph.D. thesis centers on efficient and balanced data-intensive computation on next-generation cluster technology. I'm interested in measuring performance on technologies that will be commonplace in data centers five years from now, such as fast nonvolatile memories and high-speed networking. I've also recently become interested in efficient big data computation in the public cloud.

Research

Current Work

While Themis and TritonSort are excellent examples of efficient and balanced data-intensive computing systems, both were designed for and evaluated on older cluster hardware such as spinning magnetic hard disks and 10 GbE. I'm currently bringing Themis up to speed with next-generation cluster technologies, including Flash on PCI Express and 40 GbE. These technologies, while incredibly fast, reveal interesting bottlenecks both within the application and within the operating system. These bottlenecks will have to be eliminated to achieve good performance.

In addition to investigating next-generation cluster technology, I'm also evaluating Themis on public cloud infrastructure, specifically Amazon EC2. The cloud provides the flexibility necessary to evaluate research software such as Themis on a variety of hardware configurations. It also enables us to push the boundaries of scalability, which is crucial to the real-world applicability of Themis.

Themis

I developed Themis, the successor to TritonSort. Themis is an I/O-efficient MapReduce implementation that achieves the minimum number of I/Os possible (two) when the amount of data greatly exceeds the amount of physical memory. Themis has been evaluated on a variety of common MapReduce jobs and performs at roughly the same record-breaking speed as its predecessor, TritonSort. Themis was published at SOCC 2012.

TritonSort

I developed the world's fastest sorting system, TritonSort. TritonSort achieves record speeds by focusing on per-disk and per-node efficiency. It aims to sort data at the speed of the disks by keeping all disks constantly reading or writing data in large contiguous chunks. TritonSort set a total of seven world records in the 2010 and 2011 Sortbenchmark.org competitions: 2010 Indy GraySort, 2010 Indy MinuteSort, 2011 Daytona GraySort, 2011 Indy GraySort, 2011 Indy MinuteSort, 2011 Daytona 100TB JouleSort, and 2011 Indy 100TB JouleSort. Two of these are current world records.

Other Work

During the summer of 2009, I investigated balanced systems within the MapReduce framework of Hadoop. The goals were to analyze cluster resources during a MapReduce job, identify bottlenecks, and classify various types of jobs according to these bottlenecks, with the hope of utilizing cluster resources more efficiently.

Industry Experience

I held a Software Engineering Intern position at Google during the summer of 2011, working in the MapReduce group with my mentor Marian Dvorsky. As part of this internship, I sorted 10PB with MapReduce.

I held a Software Engineering Intern position at Google during the summer of 2010, working in the search infrastructure group with my mentor Alexander Yip.

Publications

TritonSort: A Balanced and Energy-Efficient Large-Scale Sorting System, Alexander Rasmussen, George Porter, Michael Conley, Harsha V. Madhyastha, Radhika Niranjan Mysore, Alexander Pucher, and Amin Vahdat, ACM Transactions on Computer Systems (TOCS), Volume 31 Issue 1, February 2013. Link to paper.

Themis: An I/O-Efficient MapReduce, Alexander Rasmussen, Michael Conley, Rishi Kapoor, Vinh The Lam, George Porter, and Amin Vahdat, Proceedings of the 3rd ACM Symposium on Cloud Computing (SOCC), San Jose, CA, October 2012. Link to paper.

TritonSort: A Balanced Large-Scale Sorting System, Alexander Rasmussen, George Porter, Michael Conley, Harsha V. Madhyastha, Radhika Niranjan Mysore, Alexander Pucher, and Amin Vahdat, Proceedings of the 8th ACM/USENIX Symposium on Networked Systems Design and Implementation (NSDI), Boston, MA, March 2011. Link to paper.

Awards

SortBenchmark 2014: After a two-year hiatus, we once again delivered record-breaking sort performance with TritonSort. We sorted 100TB of data (with replication) in 1378 seconds, which earned a tie for the Daytona GraySort record. We also had two outright wins in the Indy and Daytona CloudSort categories. CloudSort is a new benchmark designed to measure the efficacy of using the public cloud for large-scale data processing. We also entered the Indy GraySort and Indy MinuteSort categories, but did not have the best entry in either. Surprisingly, the winner of these categories ran an implementation of TritonSort inspired by our NSDI 2011 paper, further validating our design choices. More details can be found here and in our submission document.

SortBenchmark 2011: TritonSort competed again in the Sort Benchmark competition in April 2011. This year we set 5 records (all wins, no ties): Indy 100TB GraySort, Daytona 100TB GraySort, Indy 60-second MinuteSort, Indy 100TB JouleSort, and Daytona 100TB JouleSort. We built a general-purpose MapReduce implementation with an initial sampling phase that enabled us to take the Daytona (general-purpose) records. The SortBenchmark rules were updated this year to include a 100TB JouleSort category, which uses records/joule as the performance metric. As it turns out, focusing on cluster balance and resource utilization appears to be a viable means of power-efficient computing. TritonSort's JouleSort performance is within an order of magnitude of the most efficient system in any of the other JouleSort categories, even though energy efficiency was not a primary goal of the TritonSort project. You can read our submission document here.

SortBenchmark 2010: We submitted TritonSort, the world's fastest cluster-based disk-to-disk sorting system, to the Sort Benchmark competition in May 2010. TritonSort placed first in the Indy MinuteSort 60-second category and tied for first in the Indy GraySort 100TB category. TritonSort focuses on per-node efficiency and aims to achieve sorting throughput as close as possible to the sequential speed of the disks. See the SortBenchmark submission document here.

Teaching

I was the teaching assistant for CSE 120: Principles of Operating Systems during Summer Session 1 2012.

Non-Technical

In my spare time, I enjoy strength training, tennis, running, and ballroom dance. I competed in my first push/pull powerlifting contest at UCSD in April 2014, and I'm planning to compete in an official USAPL meet in April 2015.

Michael Conley
Last modified: Nov 8 2014 12:32:34 -0800 (PST)