Current Work
I am currently investigating how to drive big-data applications at high
throughput on the cluster hardware of the future. Data centers five years from
now will likely consist of resource-dense servers containing very high-speed
components, such as non-volatile memory attached to a fast bus like PCIe. These
servers will also be connected by high-speed networks such as 40 Gb/s Ethernet.
A key challenge is driving applications at the speed the hardware is capable of.
Ideally, an execution framework such as Themis should be able to provide
40 Gb/s of application throughput on these servers, but reaching these speeds
in practice remains an open research problem.
Themis
I developed Themis, the successor to TritonSort.
Themis is an I/O-efficient MapReduce implementation that achieves
the minimum number of I/Os possible (2) when the amount of data greatly
exceeds the amount of physical memory. Themis has been evaluated on a variety of
common MapReduce jobs and performs at roughly the same record-breaking speed as
its predecessor, TritonSort. Themis was published at SoCC 2012.
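To illustrate the two-I/O idea, here is a minimal single-machine sketch, not the Themis implementation. It reads "2 I/Os" as two passes over the data, with each record read once and written once per pass, which is an assumption about the intended meaning. The function name `two_pass_sort`, the partition-by-first-byte rule, and the newline-delimited record format are all hypothetical choices made for the example. It also assumes that each partition fits in memory.

```python
import os


def two_pass_sort(input_path, output_path, num_partitions, work_dir):
    """Sort newline-delimited records using two passes over the data."""
    part_paths = [
        os.path.join(work_dir, f"part-{i:04d}") for i in range(num_partitions)
    ]

    # Pass 1: read the input once and append each record to the partition
    # that owns its key range. The range is chosen from the first byte, so
    # partition i only holds records that sort before those in partition i+1.
    parts = [open(p, "wb") for p in part_paths]
    try:
        with open(input_path, "rb") as src:
            for record in src:
                if not record.endswith(b"\n"):
                    record += b"\n"  # normalize a final unterminated record
                parts[record[0] * num_partitions // 256].write(record)
    finally:
        for f in parts:
            f.close()

    # Pass 2: read each partition once, sort it in memory, and append it to
    # the output. Concatenating the partitions in order yields a global sort.
    with open(output_path, "wb") as dst:
        for p in part_paths:
            with open(p, "rb") as f:
                records = f.readlines()
            records.sort()
            dst.writelines(records)
```

Because the partitions are range-ordered, no merge step is needed after the second pass, so no record is touched a third time. A real system would also have to balance partition sizes and spread the work across disks and nodes, which this sketch ignores.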
TritonSort
I developed the world's fastest sorting system, TritonSort. TritonSort
achieves record speeds by focusing on per-disk and per-node efficiency.
TritonSort aims to sort data at the speed of the disks by keeping all disks
constantly reading or writing data in large contiguous chunks. TritonSort set
world records in the 2010 and 2011 Sortbenchmark.org competitions. We hold
a total of seven world records: 2010 Indy GraySort, 2010 Indy MinuteSort, 2011
Daytona GraySort, 2011 Indy GraySort, 2011 Indy MinuteSort, 2011 Daytona 100TB
JouleSort, and 2011 Indy 100TB JouleSort. Four of these are current world records.
Other Work
During the summer of 2009, I investigated balanced systems
within Hadoop's MapReduce framework.
The goals were to analyze cluster resource usage during a MapReduce job,
identify bottlenecks, and classify jobs according to these bottlenecks,
with the hope of utilizing cluster resources more efficiently.