CSE 234: Data Systems for Machine Learning
Lectures: TuTh 5-6:20pm PT @ WLH 2005
Instructor: Arun Kumar
Teaching Assistants:
Kabir Nagrecha
Aditya Gulati
Piazza: CSE 234
Announcements
Course Goals and Content
This is a research-based course on data systems for machine learning (ML),
at the intersection of the fields of ML/AI, data management, and systems.
Such systems power modern data science applications on large and complex
datasets, including enterprise analytics, recommendation systems, social
media analytics, and generative AI. Students will learn about the landscape
and evolution of such systems and the latest research.
This is a lecture-driven course with quizzes, exams, and paper reviewing
components for evaluation. It is primarily tailored for MS students, PhD
students, and advanced undergraduates interested in the state of the art
of systems for scalable data science and ML engineering.
This course will cover key systems topics spanning the whole lifecycle of
ML-based data analytics, including programming models and systems for
scalable ML model building, data sourcing and preparation for ML,
ML platforms and governance issues, and issues in ML deployment and MLOps.
A major component of this course is reviewing cutting edge research papers
from recent top conferences on these topics.
See the course schedule page for the entire list of topics,
as well as the paper reading list.
Course Format and Instructions
1 Quiz and 2 Exams:
This course has one progress quiz, a midterm exam, and a cumulative final exam.
All of them will be held in person only on pre-announced dates.
The exams will have primarily multiple choice questions (MCQ).
Some questions may be quantitative or essay-style, and some may award partial credit.
The quiz will have only MCQ.
The guideline for time per question is a maximum of 1 minute per point. The points for each
question will be calibrated accordingly.
If you miss a quiz/exam, you will get no credit for it unless you notify the instructor
in advance with a university-approved reason and receive a makeup slot.
The quiz/exams are all closed notes/books/Web.
For all of them, you should neither give nor receive help from anyone by any means.
9 Paper Reviews:
Each week will have a paper assigned for review via Google Forms along with a deadline.
At the end of the course, only your 8 best scores will be used for grading.
Discussion with your peers over the papers assigned for review is acceptable.
But the final submitted reviews must be entirely your own.
If you submit multiple entries per review, only the latest review will be evaluated.
I will discuss the papers’ content in class, including the extra readings listed.
Resources for how to read and evaluate research papers:
Keshav's Writeup
and Mitzenmacher's Writeup.
The TAs will evaluate your reviews with the following 2-point criteria:
Thoroughness: Have you covered both the major strong points and the major limitations correctly?
Exposition: Is your review constructive, well written, and easy to read?
Prerequisites
A course on ML algorithms (e.g., CSE 151) is absolutely necessary.
A course on either database systems internals (e.g., CSE 132C) or operating systems (e.g., CSE 120)
is also necessary.
The above courses could have been taken at UCSD or elsewhere.
DSC 102 suffices as a prerequisite for both of the above aspects.
Substantial project or industrial experience on relevant topics can be substituted for prior
coursework, subject to the instructor's consent.
Email the instructor if you would like to enroll but are unsure if you satisfy the prerequisites.
Suggested Textbooks
Recommended: Data Management in Machine Learning Systems, by Matthias Boehm,
Arun Kumar, and Jun Yang (Free ebook via UCSD VPN).
Exam Dates
Quiz: Tue, Feb 6, 5:55-6:20pm PT in class.
Midterm Exam: Tue, Feb 20, 5-6:20pm PT in class.
Cumulative Final Exam: Thu, Mar 21, 7-10pm PT in class (WLH 2005).
Grading
Paper Reviews: 16% (8 x 2%)
Quiz: 9%
Midterm Exam: 20%
Cumulative Final Exam: 50%
Peer Instruction Activities: 5% (5 x 1%)
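As a rough illustration of how the weights above combine, here is a minimal Python sketch. The function name `total_score` and its parameters are hypothetical. It assumes each component is given as a fraction in [0, 1] of its available points and that exactly five peer instruction activities count; the course page does not specify these details.

```python
def total_score(review_fracs, quiz_frac, midterm_frac, final_frac, peer_fracs):
    """Combine component scores into a total out of 100.

    Every input is a fraction in [0, 1] of the points available for that
    component (an assumption made for this sketch).
    """
    # Only the 8 best paper reviews count, each worth 2%.
    best_reviews = sorted(review_fracs, reverse=True)[:8]
    reviews = 2.0 * sum(best_reviews)

    # Peer instruction: 5 activities at 1% each (assumed: the first five listed).
    peer = 1.0 * sum(peer_fracs[:5])

    # Quiz 9%, midterm 20%, cumulative final 50%.
    return reviews + 9.0 * quiz_frac + 20.0 * midterm_frac + 50.0 * final_frac + peer


# A student with full marks on everything reaches the maximum total.
print(total_score([1.0] * 9, 1.0, 1.0, 1.0, [1.0] * 5))  # 100.0
```

Because only the 8 best reviews are kept, one missed or weak review out of the 9 does not lower the total in this sketch.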
Cutoffs
The grading scheme is a hybrid of absolute and relative grading.
The absolute cutoffs are based on your absolute total score.
The relative bins are based on your position in the total score distribution of the class.
The better grade among the two (absolute-based and relative-based) will be your final grade.
Grade | Absolute Cutoff (>=) | Relative Bin (Use strictest)
A+ | 95 | Highest 5%
A | 90 | Next 15% (5-20)
A- | 85 | Next 15% (20-35)
B+ | 80 | Next 15% (35-50)
B | 75 | Next 15% (50-65)
B- | 70 | Next 10% (65-75)
C+ | 65 | Next 5% (75-80)
C | 60 | Next 5% (80-85)
C- | 55 | Next 5% (85-90)
D | 50 | Next 5% (90-95)
F | < 50 | Lowest 5%
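Example: Suppose the total score is 86 and the percentile is 60. A percentile of 60 places the score 40% from the top of the class, which falls in the 35-50 bin, so the relative grade is B+. The absolute grade is A-, since 86 meets the cutoff of 85. The final grade is then A-.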
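The hybrid rule and the example above can be sketched in Python as follows. This is only an illustration under stated assumptions: all names are hypothetical, the percentile is taken to be measured from the bottom of the class (so the position from the top is 100 minus the percentile), and a score that sits exactly on a relative bin boundary is placed in the lower bin, which is one possible reading of "Use strictest".

```python
# Letter grades ordered from worst to best, used to pick the better of two grades.
ORDER = ["F", "D", "C-", "C", "C+", "B-", "B", "B+", "A-", "A", "A+"]

# (minimum total score, grade), from the Absolute Cutoff column.
ABSOLUTE = [(95, "A+"), (90, "A"), (85, "A-"), (80, "B+"), (75, "B"),
            (70, "B-"), (65, "C+"), (60, "C"), (55, "C-"), (50, "D")]

# (upper edge of the bin as percent of the class counted from the top, grade).
RELATIVE = [(5, "A+"), (20, "A"), (35, "A-"), (50, "B+"), (65, "B"),
            (75, "B-"), (80, "C+"), (85, "C"), (90, "C-"), (95, "D")]


def absolute_grade(score):
    """Return the grade whose absolute cutoff the score meets, else F."""
    for cutoff, grade in ABSOLUTE:
        if score >= cutoff:
            return grade
    return "F"


def relative_grade(percentile):
    """Return the grade for the relative bin containing the given percentile."""
    from_top = 100 - percentile  # assumed: percentile is measured from the bottom
    for upper, grade in RELATIVE:
        if from_top < upper:     # assumed: an exact boundary falls to the lower bin
            return grade
    return "F"


def final_grade(score, percentile):
    """The better of the absolute-based and relative-based grades."""
    return max(absolute_grade(score), relative_grade(percentile), key=ORDER.index)


print(final_grade(86, 60))  # A-
```

With a total score of 86 and a percentile of 60, `absolute_grade` gives A- and `relative_grade` gives B+, so the better grade, A-, is returned, matching the example.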
Non-Letter Grade Options: You have the option of taking this course for a non-letter grade.
As per the CSE department's guidelines, the policy for P in a P/F option is a C- or better;
the policy for S in an S/U option is a letter grade of B- or better.
Classroom Rules
Please review UCSD's honor code and policies and procedures on academic integrity
on this website.
If plagiarism is detected in your paper reviews and/or exams, or if any other form of
academic integrity violation is identified, the University authorities will be notified
for appropriate disciplinary action to be taken.
You will also receive a 0 for that component of your score, and your final grade will be lowered substantially.