CSE 234: Data Systems for Machine Learning

Lectures: TuTh 5-6:20pm PT @ WLH 2005

Instructor: Arun Kumar

  • Email: akk018 [at] ucsd.edu

  • Office Hours: Fri 1-2pm PT @ 3218 CSE

Teaching Assistants:

  • Kabir Nagrecha

    • Email: knagrech [at] ucsd.edu

    • Office Hours: Thu 2-3pm PT @ Zoom

  • Aditya Gulati

    • Email: adgulati [at] ucsd.edu

    • Office Hours: Tue 3-4pm PT @ Zoom

Piazza: CSE 234

Announcements

  • The introductory lecture is on Thu, Jan 11.

Course Goals and Content

This is a research-based course on data systems for machine learning (ML), at the intersection of the fields of ML/AI, data management, and systems. Such systems power modern data science applications on large and complex datasets, including enterprise analytics, recommendation systems, social media analytics, and generative AI. Students will learn about the landscape and evolution of such systems and the latest research. This is a lecture-driven course with quizzes, exams, and paper reviewing components for evaluation. It is primarily tailored for MS students, PhD students, and advanced undergraduates interested in the state of the art of systems for scalable data science and ML engineering.

This course will cover key systems topics spanning the whole lifecycle of ML-based data analytics, including programming models and systems for scalable ML model building, data sourcing and preparation for ML, ML platforms and governance issues, and issues in ML deployment and MLOps. A major component of this course is reviewing cutting-edge research papers from recent top conferences on these topics. See the course schedule page for the entire list of topics, as well as the paper reading list.

Course Format and Instructions

  • Lectures and Discussions:

    • The class meets 2 times a week for 80-minute lectures.

    • All lectures will be held in person only. The lectures will be automatically podcast and available online for asynchronous viewing.

    • The discussion slot will be used only twice, once before each exam for a review discussion.

    • Attending the lectures and discussions is not mandatory but highly encouraged.

    • Familiarize yourself with this course website and Piazza. All class announcements and asynchronous discussions will be on Piazza.

  • 1 Quiz and 2 Exams:

    • This course has one progress quiz, a midterm exam, and a cumulative final exam. All of them will be held in person only on pre-announced dates.

    • The exams will have primarily multiple choice questions (MCQ). Quantitative or essay questions might exist too. Some questions may offer partial credit. The quiz will have only MCQ.

    • The guideline for time per question is a maximum of 1 minute per point. The points of each question will be calibrated accordingly.

    • If you miss a quiz/exam, you will get no credit for it unless you notify the instructor in advance with a university-approved reason and receive a makeup slot.

    • The quiz/exams are all closed notes/books/Web. For all of them, you should neither give nor receive help from anyone by any means.

  • 9 Paper Reviews:

    • Each week will have a paper assigned for review via Google Forms along with a deadline.

    • At the end of the class, only your 8 best scores will be used for grading.

    • Discussion with your peers over the papers assigned for review is acceptable. But the final submitted reviews must be entirely your own.

    • If you submit multiple entries per review, only the latest review will be evaluated.

    • I will discuss the papers’ content in class, including the extra readings listed.

    • Resources for how to read and evaluate research papers: Keshav's Writeup and Mitzenmacher's Writeup.

    • The TAs will evaluate your reviews with the following 2-point criteria:

      • Thoroughness: Have you covered both the major strong points and the major limitations correctly?

      • Exposition: Is your review constructive, well written, and easy to read?

  • 6 Peer Instruction Activities:

    • They will be held live in class using Google Forms, spread randomly across the quarter.

    • Each activity will have 2 multiple choice questions (MCQ). Quantitative problems may exist but only the final answer will need to be selected.

    • For each question, you must first answer individually. Then you can discuss the question with your neighbor(s). After that, you can answer the question again.

    • These activities are open books/notes/Web.

    • Grading is based on earnest participation in the whole activity.

    • If you miss an activity, you will get no credit for it, unless you notify the instructor in advance with a university-approved reason.

    • You can miss up to 1 activity out of the 6 without losing credit. If you happen to forget your phone or laptop one day, submit your written answers on a sheet and hand it to me in class right after that lecture. Out-of-band submissions later will not be accepted.

  • I will release ungraded exercises on the exercises page throughout the quarter. These questions will act as practice for the quiz and exams.

Prerequisites

  • A course on ML algorithms (e.g., CSE 151) is absolutely necessary.

  • A course on either database systems internals (e.g., CSE 132C) or operating systems (e.g., CSE 120) is also necessary.

  • The above courses could have been taken at UCSD or elsewhere.

  • DSC 102 suffices as a prerequisite for both of the above aspects.

  • Substantial project or industrial experience on relevant topics can be substituted for prior coursework, subject to the instructor's consent. Email the instructor if you would like to enroll but are unsure if you satisfy the prerequisites.

Suggested Textbooks

  • Recommended: Data Management in Machine Learning Systems, by Matthias Boehm, Arun Kumar, and Jun Yang (Free ebook via UCSD VPN).

  • Additional (optional) for background/foundations on the respective component areas:

    • Machine Learning, by Tom Mitchell (McGraw Hill).

    • Deep Learning, by Ian Goodfellow, Yoshua Bengio, and Aaron Courville (MIT Press).

    • Database Management Systems, by Raghu Ramakrishnan and Johannes Gehrke (McGraw Hill).

    • Operating Systems: Three Easy Pieces, by Remzi and Andrea Arpaci-Dusseau (Free ebook).

Exam Dates

  • Quiz: Tue, Feb 6, 5:55-6:20pm PT in class.

  • Midterm Exam: Tue, Feb 20, 5-6:20pm PT in class.

  • Cumulative Final Exam: Thu, Mar 21, 7-10pm PT in class (WLH 2005).

Grading

  • Paper Reviews: 16% (8 x 2%)

  • Quiz: 9%

  • Midterm Exam: 20%

  • Cumulative Final Exam: 50%

  • Peer Instruction Activities: 5% (5 x 1%)
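
As a rough illustration of how the weights above combine into a total score out of 100, here is a minimal Python sketch. The function name `total_score`, the convention that each component score is already normalized to a fraction in [0, 1], and the idea that an activity counts in full once you participate in it are all assumptions made for this example; this is not official course tooling.

```python
def total_score(review_fracs, quiz_frac, midterm_frac, final_frac, activities_done):
    """Return a weighted course total on a 0-100 scale.

    Assumptions of this sketch (not stated course policy):
    - every *_frac value is a fraction in [0, 1];
    - activities_done is the number of activities with earnest participation.
    """
    # Only the 8 best paper-review scores count, at 2% each (16% total).
    best_reviews = sorted(review_fracs, reverse=True)[:8]
    reviews = 2.0 * sum(best_reviews)

    # At most 5 activities count, at 1% each, so 1 of the 6 can be missed.
    activities = 1.0 * min(activities_done, 5)

    # Quiz 9%, midterm 20%, cumulative final 50%.
    exams = 9.0 * quiz_frac + 20.0 * midterm_frac + 50.0 * final_frac

    return reviews + exams + activities


if __name__ == "__main__":
    # Hypothetical student: 8 perfect reviews and 1 skipped, half marks on the
    # quiz and both exams, and 4 of the 6 activities completed.
    print(total_score([1.0] * 8 + [0.0], 0.5, 0.5, 0.5, 4))
```

Under these assumptions, dropping the lowest review and capping the activities at 5 means a single missed review or activity does not lower the total.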

Cutoffs

The grading scheme is a hybrid of absolute and relative grading. The absolute cutoffs are based on your absolute total score. The relative bins are based on your position in the total score distribution of the class. The better grade among the two (absolute-based and relative-based) will be your final grade.

Grade   Absolute Cutoff (>=)   Relative Bin (Use strictest)
A+      95                     Highest 5%
A       90                     Next 15% (5-20)
A-      85                     Next 15% (20-35)
B+      80                     Next 15% (35-50)
B       75                     Next 15% (50-65)
B-      70                     Next 10% (65-75)
C+      65                     Next 5% (75-80)
C       60                     Next 5% (80-85)
C-      55                     Next 5% (85-90)
D       50                     Next 5% (90-95)
F       < 50                   Lowest 5%


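Example: Suppose the total score is 86 and the percentile is 60, i.e., the score is higher than 60% of the class and so falls 40% from the top. The relative grade is B+ (the 35-50 bin), while the absolute grade is A- (86 >= 85). The final grade then is A-, the better of the two.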
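
The hybrid scheme can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `GRADE_TABLE`, `absolute_grade`, `relative_grade`, and `final_grade` are made up for this example; a percentile of p is read as "100 - p percent from the top of the class"; and a score sitting exactly on a relative bin boundary is placed in the lower bin, which is one reading of "Use strictest".

```python
# (grade, absolute cutoff, upper edge of the relative bin in % from the top)
GRADE_TABLE = [
    ("A+", 95, 5), ("A", 90, 20), ("A-", 85, 35), ("B+", 80, 50),
    ("B", 75, 65), ("B-", 70, 75), ("C+", 65, 80), ("C", 60, 85),
    ("C-", 55, 90), ("D", 50, 95), ("F", None, 100),
]
ORDER = [grade for grade, _, _ in GRADE_TABLE]  # best grade first


def absolute_grade(score):
    """Grade from the absolute cutoffs (score on a 0-100 scale)."""
    for grade, cutoff, _ in GRADE_TABLE:
        if cutoff is not None and score >= cutoff:
            return grade
    return "F"


def relative_grade(percentile):
    """Grade from the relative bins.

    Assumption: percentile p means the score is 100 - p percent from the top.
    A position exactly on a bin edge falls into the lower (stricter) bin.
    """
    from_top = 100 - percentile
    for grade, _, upper_edge in GRADE_TABLE:
        if from_top < upper_edge:
            return grade
    return "F"


def final_grade(score, percentile):
    """Return the better of the absolute-based and relative-based grades."""
    candidates = [absolute_grade(score), relative_grade(percentile)]
    return min(candidates, key=ORDER.index)


if __name__ == "__main__":
    # The example from the text: total score 86, percentile 60.
    print(absolute_grade(86), relative_grade(60), final_grade(86, 60))
```

With the example's inputs, the sketch yields A- from the absolute cutoffs and B+ from the relative bins, so the final grade is A-.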

Non-Letter Grade Options: You have the option of taking this course for a non-letter grade. As per the CSE department's guidelines, the policy for P in a P/F option is a pass-equivalent letter grade, i.e., D or better; the policy for S in an S/U option is a letter grade of B- or better.

Classroom Rules

  • No late days for submitting the paper reviews. Plan your work well up front accordingly.

  • Students are encouraged to ask questions and participate in the discussions in class and also on Piazza. Please raise your hand before speaking and the instructor will call on you to speak.

  • Please review UCSD's honor code and policies and procedures on academic integrity on this website. If plagiarism is detected in your paper reviews and/or exams, or if any other form of academic integrity violation is identified, the University authorities will be notified for appropriate disciplinary action to be taken. You will also get 0 for that component of your score and get downgraded substantially.

  • Please review UCSD's principles of community and our commitment to creating an inclusive learning environment on this website.

  • Harassment or intimidation of any form against any student will not be tolerated in class or on Piazza. Please review UCSD's policies on dealing with harassment and discrimination on this website.