Schedule

  • This course schedule and reading list are tentative and might change as the quarter progresses.

  • For each paper that requires a review, the URL of the online review form and the review deadline are listed alongside it; reviews are due by 11:59 PM PST on the deadline date.

  • If no URL is provided, you do not need to submit reviews for that particular paper, but it is still required reading and will be discussed during the lectures.

  • Papers without URLs that are also marked “(Optional)” are not required reading, but they might come up in the lectures, so at least skimming them would be helpful.

  • Please try to attend all the lectures and participate in the discussions.

Lecture Date | Topic / Paper | Review Form URL | Review Deadline
01/12 | Introduction
" | The MADlib Analytics Library or MAD Skills, the SQL
Scalable ML and Analytics Frameworks
01/17 | Scaling Distributed Machine Learning with the Parameter Server | T1P1 | 01/16
" | Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud
" | Scalability! But at what COST?
01/19 | Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing | T1P2 | 01/18
" | MLI: An API for Distributed Machine Learning
" | MLlib: Scalable Machine Learning on Spark (Optional)
Scalable Linear Algebra-based Analytics Systems
01/24 | RIOT: I/O-Efficient Numerical Computing without SQL | T2P1 | 01/23
" | Storing Matrices on Disk: Theory and Practice Revisited
01/26 | SystemML: Declarative Machine Learning on Spark | T2P2 | 01/25
" | Hybrid Parallelization Strategies for Large-Scale Machine Learning in SystemML (Optional)
01/31 | LINVIEW: Incremental View Maintenance for Complex Analytical Queries | T2P3 | 01/30
" | Cumulon: Optimizing Statistical Data Analysis in the Cloud
Systems for Feature Engineering
02/02 | Learning Generalized Linear Models Over Normalized Data | T3P1 | 02/01
" | Towards Linear Algebra over Normalized Data
" | Bridging the Gap: Towards Optimization Across Linear and Relational Algebra (Optional)
02/07 | KeystoneML: Optimizing Pipelines for Large-Scale Advanced Analytics | T3P2 | 02/06
" | To Join or Not to Join? Thinking Twice about Joins before Feature Selection
" | Materialization Optimizations for Feature Selection Workloads (Optional)
Systems for Model Selection
02/09 | Model Selection Management Systems: The Next Frontier of Advanced Analytics | T4P1 | 02/08
" | MLbase: A Distributed Machine-learning System
Statistical Relational and Bayesian Learning Systems
02/14 | Extracting Databases from Dark Data with DeepDive
" | Incremental Knowledge Base Construction Using DeepDive | T5P1 | 02/13
" | Tuffy: Scaling up Statistical Inference in Markov Logic Networks using an RDBMS (Optional)
02/16 | Towards High-Throughput Gibbs Sampling at Scale: A Study across Storage Managers | T5P2 | 02/15
" | Simulation of Database-Valued Markov Chains Using SimSQL
Deep Learning Systems
02/21 | Deep Learning
" | ImageNet Classification with Deep Convolutional Neural Networks | T6P1 | 02/20
" | Distributed Representations of Words and Phrases and their Compositionality
02/23 | TensorFlow: A System for Large-Scale Machine Learning (Talk Slides) | T6P2 | 02/22
" | Deep Learning At Scale and At Ease
02/28 | Towards Unified Data and Lifecycle Management for Deep Learning | T6P3 | 02/27
" | Understanding Neural Networks Through Deep Visualization (Video)
Hardware-Software Co-Design for ML Systems
03/02 | From High-Level Deep Neural Models to FPGAs | T7P1 | 03/01
" | DimmWitted: A Study of Main-Memory Statistical Analytics
Miscellaneous (New Techniques, Trends, etc.)
03/07 | “Why Should I Trust You?” Explaining the Predictions of Any Classifier | T8P1 | 03/06
" | ActiveClean: Interactive Data Cleaning While Learning Convex Loss Models
03/09 | Machine Learning: The High-Interest Credit Card of Technical Debt
" | MacroBase: Prioritizing Attention in Fast Data | T8P2 | 03/08
03/14 | Project Presentations