CSE 291A: Advanced Data Analytics and ML Systems

Course Overview and Goals

This is a research-oriented course on the emerging area of advanced data analytics and ML systems, at the intersection of data management, ML/AI, and systems. This area is a driving force behind several modern data-driven applications that use large-scale machine learning to analyze large and complex datasets, including enterprise business intelligence, healthcare, recommendation systems, social media analytics, Web search, Web security, and Internet of Things. Students will learn about the latest research in this area and get hands-on experience doing either a research project or an in-depth survey of one of the course topics.

Administrivia

Lectures: Tue/Thu 12:30-1:50pm; CSE 2154

Instructor: Arun Kumar; Office: CSE 3218; Office Hours: Thu 2:00-3:00pm

TA: Digvijay Karamchandani (dkaramch [at] eng.ucsd.edu)

Piazza: CSE 291A

Announcements

  • The deadline for submitting the enrollment questionnaire has been extended. Fill out the enrollment questionnaire individually before 5:59 PM Wednesday 01/10/18!

  • The first class is on Tuesday 01/09/18.

Pre-requisites

Courses on machine learning, database systems, and operating systems (at UCSD or elsewhere), with good grades in these courses, or prior research experience in a relevant topic, subject to the consent of the instructor. Here are some recommended textbooks on the foundational background needed for this course:

  • ML: "Machine Learning" by Tom Mitchell and "Deep Learning" by Ian Goodfellow, Yoshua Bengio, and Aaron Courville

  • DB: "Database Management Systems" by Raghu Ramakrishnan and Johannes Gehrke

  • OS: "Operating Systems: Three Easy Pieces" by Remzi and Andrea Arpaci-Dusseau

Enrollment

  • Enrollment in this course is capped at 20 students (PhD and MS), with the enrollment decisions made by the instructor in the first week of classes.

  • Fill out the enrollment questionnaire before 11:59 PM Sunday 01/07/18 (now extended to 5:59 PM Wednesday 01/10/18), if you want to enroll in this course. Each student has to fill it out individually. The enrollment decisions will depend upon the answers to these questions. The questionnaire asks about your academic background and research experience, asks you to review a research paper, and poses some open-ended research questions.

  • This 4-credit course will count towards the credit requirements for any of these three area concentrations: database systems, AI, and computer systems.

Course Project

  • Each student has to do either a research project or a survey project. The choice has to be indicated in the questionnaire and will have to be finalized before 11:59 PM Thursday 01/18 with an email sent to the instructor.

  • Research Project: Students choosing to do a research project are encouraged to propose a relevant problem (subject to the consent of the instructor) or choose a problem suggested by the instructor and email the final choice to the instructor before 11:59 PM Thursday 01/18. The research projects will ideally lay the groundwork for a publication at a top research conference or workshop. Students are encouraged to do individual research projects but groups of two students each are permitted for projects with a larger scope. Each student (or group) has to meet one-on-one with the instructor at a mutually scheduled half-hour slot every week to discuss the progress on the project.

  • Survey Project: Students choosing to do a survey project have to pick one of the course topics and survey the major relevant research papers on that topic. The final choice must be emailed to the instructor before 11:59 PM Thursday 01/18. Survey projects must be done individually and are expected to provide a comprehensive analysis of a topic, beyond just summarizing the papers as a laundry list. Each student has to meet with the instructor at a mutually scheduled slot once within the first month of the class to finalize the list of papers that will be surveyed.

  • Project Report: Each student (or group) has to submit a paper-style report of length 6-12 pages on their research project or survey project by the end of the course. The ACM SIG proceedings LaTeX template should be used for the report. The deadline for emailing the report is 11:59 PM Tuesday 03/20.

Course Content and Format

  • The course will be based primarily on a reading list of about 30 recent papers from top conferences such as SIGMOD, VLDB, NIPS, ICLR, NSDI, and OSDI, organized into topics.

  • Each student has to read and submit their individual reviews of about 18 specified papers in the reading list by the deadline corresponding to each paper. The reviews will have a prescribed format and will have to be submitted via a given Google Form (not via email). See the Schedule for more details. For some advice on how to read a research paper with an evaluative but also appreciative mindset, read this excellent article.

  • There will be two 80-minute lectures per week (Tue and Thu) by the instructor on the topics, techniques, and papers (mostly from the reading list). Each topic will span about 2 lectures. The lectures will also involve discussions about the reading list papers. All students are expected to read all the assigned papers and participate in the discussions.

  • Each student is expected to present a 15-min talk about their project to the class. The presentation dates are Tuesday 03/13 and Thursday 03/15.

Grading

  • 50%: Performance on the research or survey project, including the project report

  • 25%: Quality and thoroughness of paper reviews

  • 15%: Project presentation

  • 10%: Participation in the lectures/discussions

Project Performance Metrics

The key metrics for the survey projects are diligence, precision, and technical depth, while creativity and independence are additional metrics for the research projects. Students who perform outstandingly in the research projects will be encouraged to continue working on the project under the instructor's guidance to publish at a top research venue.

Classroom Rules

  • No late days for submitting the paper reviews and the project report.

  • If plagiarism is detected in the paper reviews, the University authorities will be notified immediately for appropriate disciplinary action to be taken.