CSE 234: Data Systems for Machine Learning
Lectures: MWF 3-3:50pm PT @ York 2622
Instructor: Arun Kumar
Teaching Assistants:
Pratik Ratadiya
Soham Pachpande
Yuhao Zhang
Piazza: CSE 234
Announcements
Course Goals and Content
This is a research-based course on data systems for machine learning (ML),
at the intersection of the fields of ML/AI, data management, and systems.
Such systems power modern data science applications on large and complex
datasets, including enterprise analytics, recommendation systems, and
social media analytics. Students will learn about the landscape and
evolution of such systems and the latest research.
This is a lecture-driven course with quizzes, exams, and paper reviewing
components for evaluation. It is primarily tailored for MS students, PhD
students, and advanced undergraduates interested in the state of the art
of systems for scalable data science and ML engineering.
This course will cover key systems topics spanning the whole lifecycle of
ML-based data analytics, including programming models and systems for
scalable ML model building, data sourcing and preparation for ML,
ML platforms and governance issues, and issues in ML deployment and MLOps.
A major component of this course is reviewing cutting-edge research papers
from recent top conferences on these topics.
See the course schedule page for the entire list of topics,
as well as the paper reading list.
Course Format and Instructions
2 Quizzes and 2 Exams:
This course has two progress quizzes, a midterm exam, and a cumulative final exam.
All of them will be held in person only on pre-announced dates.
The exams will have primarily multiple choice questions (MCQ).
Quantitative or essay-style questions may appear, but you may only need to select the final answer.
Some questions may have partial credit.
The quizzes will have only MCQ.
The guideline for time per question is at most 1 minute per point. The points of each
question will be calibrated accordingly.
If you miss a quiz or an exam, you will get no credit for it unless you notify the instructor
in advance with a certifiable medical or emergency reason and receive a makeup exam slot.
Both the quizzes and exams are closed notes/books/Web.
For all of them, you should neither give nor receive help from anyone by any means.
9 Paper Reviews:
Each week will have a paper assigned for review via Google Forms along with a deadline.
At the end of the course, only your 8 best scores will be used for grading.
Discussion with your peers over the papers assigned for review is acceptable.
But the final submitted reviews must be entirely your own.
If you submit multiple entries per review, only the latest review will be evaluated.
I will discuss the papers’ content in class, including the extra readings listed.
Resources for how to read and evaluate research papers:
Keshav's Writeup
and Mitzenmacher's Writeup.
The TAs will evaluate your reviews with the following 3-point criteria:
Pertinence: Does your review demonstrate that you actually read the whole paper and know what it is about?
Thoroughness: Have you covered both the major strong points and the major limitations correctly?
Exposition: Is your review constructive, well written, and easy to read?
Prerequisites
A course on ML algorithms (e.g., CSE 151) is absolutely necessary.
A course on either database systems (e.g., CSE 132C) or operating systems (e.g., CSE 120)
is also necessary.
The above courses could have been taken at UCSD or elsewhere.
DSC 102 suffices as a prerequisite for both of the above aspects.
Substantial project or industrial experience on relevant topics can be substituted for prior
coursework, subject to the instructor's consent.
Email the instructor if you would like to enroll but are unsure if you satisfy the prerequisites.
Suggested Textbooks
Recommended: Data Management in Machine Learning Systems, by Matthias Boehm,
Arun Kumar, and Jun Yang (Free ebook via UCSD VPN).
Exam Dates
Quiz 1: Wed, Feb 8, in class.
Midterm Exam: Fri, Feb 17, 3-3:50pm PT in class.
Quiz 2: Fri, Mar 10, in class.
Cumulative Final Exam: Wed, Mar 22, 3-6pm PT in class (York 2622).
Grading
Paper Reviews: 24% (8 x 3%)
Quizzes: 10% (2 x 5%)
Midterm Exam: 15%
Cumulative Final Exam: 40%
Peer Instruction Activities: 8% (8 x 1%)
Peer Evaluation Activities: 3% (2 x 1.5%)
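The weights above can be sketched as a small calculation. This is only an illustration, not the official grading script: the function name total_score and the convention that every component is passed as a fraction of full credit between 0 and 1 are assumptions made for the example.

```python
def total_score(reviews, quizzes, midterm, final,
                peer_instruction, peer_evaluation):
    """Return a total out of 100 from the component weights listed above.

    Assumption: every input is a fraction of full credit in [0, 1].
    reviews          -- up to 9 paper review scores; only the best 8 count
    quizzes          -- 2 quiz scores
    peer_instruction -- 8 activity scores
    peer_evaluation  -- 2 activity scores
    """
    best_reviews = sorted(reviews, reverse=True)[:8]  # drop the lowest review
    return (
        3.0 * sum(best_reviews)        # Paper Reviews: 8 x 3%
        + 5.0 * sum(quizzes)           # Quizzes: 2 x 5%
        + 15.0 * midterm               # Midterm Exam: 15%
        + 40.0 * final                 # Cumulative Final Exam: 40%
        + 1.0 * sum(peer_instruction)  # Peer Instruction: 8 x 1%
        + 1.5 * sum(peer_evaluation)   # Peer Evaluation: 2 x 1.5%
    )
```

For instance, a student with full credit everywhere except one missed paper review still reaches 100, because only the 8 best reviews count.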
Cutoffs
The grading scheme is a hybrid of absolute and relative grading.
The absolute cutoffs are based on your absolute total score.
The relative bins are based on your position in the total score distribution of the class.
The better grade among the two (absolute-based and relative-based) will be your final grade.
Grade | Absolute Cutoff (>=) | Relative Bin (Use strictest)
A+ | 92 | Highest 10%
A | 85 | Next 15% (10-25)
A- | 80 | Next 15% (25-40)
B+ | 75 | Next 15% (40-55)
B | 70 | Next 15% (55-70)
B- | 65 | Next 5% (70-75)
C+ | 60 | Next 5% (75-80)
C | 55 | Next 5% (80-85)
C- | 50 | Next 5% (85-90)
D | 45 | Next 5% (90-95)
F | < 45 | Lowest 5%
Example: Suppose the total score is 82 and the percentile is 43. The relative grade is B+ (the 40-55 bin), while the absolute grade is A- (82 >= 80). The final grade is then A-, the better of the two.
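The hybrid scheme can also be written out as a short lookup. This is a sketch under stated assumptions rather than the official procedure: the function names are hypothetical, the percentile is taken to be measured from the top of the class (0 is the highest score), and "Use strictest" is read as placing a student who sits exactly on a bin boundary into the lower bin.

```python
# Letter grades from best to worst, with the cutoffs from the table above.
GRADES = ["A+", "A", "A-", "B+", "B", "B-", "C+", "C", "C-", "D", "F"]
ABSOLUTE_CUTOFFS = [92, 85, 80, 75, 70, 65, 60, 55, 50, 45]  # below 45 is F
RELATIVE_UPPER = [10, 25, 40, 55, 70, 75, 80, 85, 90, 95]    # the rest is F


def absolute_grade(score):
    """Grade from the absolute total score (out of 100)."""
    for grade, cutoff in zip(GRADES, ABSOLUTE_CUTOFFS):
        if score >= cutoff:
            return grade
    return "F"


def relative_grade(pct_from_top):
    """Grade from the class position, as a percentile measured from the top.

    Assumption: a boundary value (e.g. exactly 25) falls in the lower bin.
    """
    for grade, upper in zip(GRADES, RELATIVE_UPPER):
        if pct_from_top < upper:
            return grade
    return "F"


def final_grade(score, pct_from_top):
    """The better of the absolute-based and relative-based grades."""
    candidates = [absolute_grade(score), relative_grade(pct_from_top)]
    return min(candidates, key=GRADES.index)  # lower index is the better grade
```

Under these assumptions, final_grade(72, 30) gives A-: the absolute grade is B, but a position in the 25-40 bin lifts it. final_grade(82, 43) also gives A-, the final grade in the example above.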
Non-Letter Grade Options: You have the option of taking this course for a non-letter grade.
As per the CSE department's guidelines, the policy for P in a P/F option is a pass-equivalent letter grade, i.e., D or better;
the policy for S in an S/U option is a letter grade of B- or better.
Classroom Rules
Please review UCSD's honor code and policies and procedures on academic integrity
on this website.
If plagiarism is detected in your paper reviews and/or exams, or if any other form of
academic integrity violation is identified, the University authorities will be notified
for appropriate disciplinary action to be taken.
You will also get 0 for that component of your score, and your grade will be downgraded substantially.