Introduction

Advanced analytics is a booming area in both industry and academia. Several projects aim to implement ML algorithms efficiently. But three challenging, iterative practical tasks in using ML -- feature engineering (FE), algorithm selection (AS), and parameter tuning (PT), collectively called model selection -- have largely been overlooked by the data management community, even though they are often the most time-consuming tasks for analysts. There is surprisingly little end-to-end systems support for the iterative process of model selection, which causes pain to analysts and wastes resources. To make this process easier and faster, we envision a unifying abstract framework based on the idea of a Model Selection Triple (MST). Our framework acts as a basis for a new class of analytics systems that we call Model Selection Management Systems (MSMS). We discuss how time-tested ideas from database research offer new avenues to improve model selection, and explain how MSMS are a new frontier for interesting and impactful data management research.

While a large body of work in ML focuses on various theoretical aspects of model selection, in practice, analysts typically use an iterative exploratory process that combines ML techniques with their domain-specific expertise. Nevertheless, this iterative process has structure. We divide it into three phases -- Steering, Execution, and Consumption. We explain how we can repurpose three key ideas from database research -- declarativity, optimization, and provenance -- to improve the corresponding phase of an iteration. Our framework can help reduce both the number of iterations and the time per iteration of the model selection process. However, we identify several research challenges in realizing our vision and provide examples of research problems that arise. Solving these problems requires new research at the intersection of data management, ML, and HCI.
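
To make the ideas above concrete, here is a minimal Python sketch of one iteration over Model Selection Triples. It is only an illustration under stated assumptions, not the framework's actual design: the `MST` class, the functions `steer`, `execute`, and `consume`, the feature and algorithm names, and the scoring function `toy_evaluate` are all hypothetical. The sketch assumes that declarativity maps to Steering, optimization to Execution, and provenance to Consumption, as the phase ordering suggests.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple


# Hypothetical representation of a Model Selection Triple (MST): one choice
# each for feature engineering (FE), algorithm selection (AS), and
# parameter tuning (PT).
@dataclass(frozen=True)
class MST:
    features: Tuple[str, ...]               # FE: feature subset to use
    algorithm: str                          # AS: learning algorithm name
    params: Tuple[Tuple[str, float], ...]   # PT: hyperparameter settings


def steer(feature_sets: Sequence[Sequence[str]],
          algorithms: Sequence[str],
          param_grid: Sequence[Dict[str, float]]) -> List[MST]:
    """Steering: declare a whole set of triples at once (declarativity)."""
    return [MST(tuple(f), a, tuple(sorted(p.items())))
            for f, a, p in product(feature_sets, algorithms, param_grid)]


def execute(triples: Sequence[MST],
            evaluate: Callable[[MST], float]) -> List[Tuple[MST, float]]:
    """Execution: score every triple. A real system could optimize across
    triples, e.g. by sharing computation; this sketch runs them one by one."""
    return [(t, evaluate(t)) for t in triples]


def consume(results: List[Tuple[MST, float]],
            log: List[Tuple[MST, float]]) -> Tuple[MST, float]:
    """Consumption: record all results (provenance) and return the best."""
    log.extend(results)
    return max(results, key=lambda r: r[1])


def toy_evaluate(t: MST) -> float:
    """Stand-in for training and validating a model; NOT a real metric."""
    reg = dict(t.params)["reg"]
    bonus = 0.05 if t.algorithm == "logistic_regression" else 0.0
    return 0.1 * len(t.features) - abs(reg - 0.1) + bonus


# One iteration: 2 feature sets x 2 algorithms x 2 parameter settings.
log: List[Tuple[MST, float]] = []
triples = steer(
    feature_sets=[["age"], ["age", "income"]],
    algorithms=["logistic_regression", "naive_bayes"],
    param_grid=[{"reg": 0.1}, {"reg": 1.0}],
)
best = consume(execute(triples, toy_evaluate), log)
```

Declaring the triples as a set, rather than writing one training call per combination, is what would give a system room to reduce the time per iteration, while the log kept across iterations is what would help an analyst decide what to try next.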

Blog post on MSMS