Model Selection Management Systems:
Advanced analytics is a booming area in both industry and academia.
Several projects aim to implement ML algorithms efficiently.
But three key practical tasks in using ML -- feature engineering (FE),
algorithm selection (AS), and parameter tuning (PT), collectively called
model selection -- are challenging and iterative, yet they have largely been
overlooked by the data management community, even though they are often
the most time-consuming tasks for analysts.
There is surprisingly little end-to-end systems support for the iterative
process of model selection, which causes pain to analysts and wastes resources.
To make this process easier and faster, we envision a unifying abstract
framework based on the idea of a Model Selection Triple (MST).
Our framework acts as a basis for a new class of analytics systems we call
Model Selection Management Systems (MSMS).
We discuss how time-tested ideas from database research offer new avenues to
improve model selection, and explain how MSMS are a new frontier for interesting
and impactful data management research.
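As a rough illustration, the sketch below reads an MST as one combination of a feature set (FE), an algorithm (AS), and parameter values (PT), and enumerates the candidate triples an analyst might explore. The class name, function name, feature names, and parameter grid are all hypothetical and chosen only for this example; the text above does not prescribe a concrete data structure. For simplicity the sketch also assumes every algorithm shares the same parameter grid, which real algorithms generally do not.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, FrozenSet, Iterable, List, Tuple


@dataclass(frozen=True)
class ModelSelectionTriple:
    """One candidate configuration (hypothetical representation)."""
    features: FrozenSet[str]                # FE choice: which features to use
    algorithm: str                          # AS choice: which learning algorithm
    params: Tuple[Tuple[str, float], ...]   # PT choice: (name, value) pairs


def enumerate_triples(
    feature_sets: Iterable[Iterable[str]],
    algorithms: Iterable[str],
    param_grid: Dict[str, List[float]],
) -> List[ModelSelectionTriple]:
    """Build every combination of feature set, algorithm, and parameter setting."""
    names = sorted(param_grid)
    settings = list(product(*(param_grid[n] for n in names)))
    return [
        ModelSelectionTriple(frozenset(fs), algo, tuple(zip(names, values)))
        for fs, algo, values in product(feature_sets, algorithms, settings)
    ]


# Illustrative inputs only; none of these names come from the source.
triples = enumerate_triples(
    feature_sets=[{"age", "income"}, {"age", "income", "zipcode"}],
    algorithms=["logistic_regression", "decision_tree"],
    param_grid={"regularization": [0.01, 0.1, 1.0]},
)
print(len(triples))  # 2 feature sets x 2 algorithms x 3 settings = 12
```

Even this toy grid yields 12 configurations, which hints at why iterating over such triples by hand is time-consuming for analysts.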
While a large body of work in ML focuses on various theoretical aspects of model
selection, in practice, analysts typically use an iterative exploratory
process that combines ML techniques with their domain-specific expertise.
Nevertheless, this iterative process has structure. We divide it into three phases --
Steering, Execution, and Consumption. We explain how we can repurpose three key ideas
from database research -- declarativity, optimization, and provenance -- to improve
the respective phases of an iteration. Our framework can help reduce both the number
of iterations and the time per iteration of the model selection process.
However, we identify several research challenges in realizing our vision
and provide examples of research problems that arise. Solving these problems
requires new research at the intersection of data management, ML, and HCI.
Blog post on MSMS
Last Updated: Dec 2015