CSE 234: Data Systems for Machine Learning (In-Person Edition)
!!! This website is archived. Please see the website of the latest edition of this course among the links listed here. !!!
Lectures: Tue/Thu 12:30-1:50pm PT @ CENTR 105
Discussions: Mon 7:00-8:00pm PT @ CENTR 212 (this slot will be used only twice)
Instructor: Arun Kumar
Teaching Assistants:
Yuhao Zhang
Email: yuz870 [at] eng.ucsd.edu
Office Hours: Thu 10:30-11:30am PT @ 3230 CSE
Tasks: Project logistics, quizzes, exams
Tara Mirmira
Piazza: CSE 234 (Requires access code posted on Canvas)
Course Goals and Content
This is a research-based course on data systems for machine learning (ML),
at the intersection of the fields of ML/AI, data management, and systems.
Such systems power modern data science applications on large and complex
datasets, including enterprise analytics, recommendation systems, and
social media analytics. Students will learn about the landscape and
evolution of such systems and the latest research.
This is a lecture-driven course with quizzes, exams, and paper reviewing
components for evaluation. It is primarily tailored for MS students, PhD
students, and advanced undergraduates interested in the state of the art
of systems for scalable data science and ML engineering.
This course will cover key systems topics spanning the whole lifecycle of
ML-based data analytics, including data sourcing and preparation for ML,
programming models and systems for scalable ML model building, and systems
for faster ML deployment. Emerging topics such as governance, explanation,
and ethics of ML systems will likely be covered too.
A major component of this course is reviewing cutting-edge research papers
from recent top conferences on these topics.
See the course schedule page for the entire list of topics,
as well as the paper reading list.
Course Format and In-Person Modality Instructions
Projects:
The instructor will suggest several suitable project topics. You are also welcome to
propose your own topic, as long as it is relevant for the course.
Research projects will ideally lay the groundwork for a publication at a top research conference or workshop.
Survey projects must provide a comprehensive analysis of a topic beyond just summarizing the papers as a laundry list.
All projects must be done in teams of 2. You can find your own partner or ask
the TA to assign you a random partner.
All teams will have short weekly meetings with the instructor at a mutually
feasible meeting slot (in-person or via Zoom) to discuss progress and questions.
Project performance will be assessed solely by the instructor. The main criteria
for evaluation are diligence, technical depth, and independence; for the research
projects, technical creativity is a bonus criterion.
All projects conclude with a final written report and a short live presentation to the class.
Project reports can be 6-12 pages long and must use the ACM SIG proceedings LaTeX template.
The deadline for emailing the report is EOD Thursday, Dec 9.
The talks will be held in the last week of classes.
More tips and evaluation criteria for the reports and talks will be released in due course.
Midterm and final exams:
The exams will consist primarily of multiple choice questions (MCQ). Quantitative or longer problems
may appear, but you may only need to select the final answer. Some questions will offer partial credit.
The guideline for time per question is a maximum of 1 minute per point. The points for each
question will be calibrated accordingly.
If you miss an exam, you will get no credit for it, unless you notify the instructor
in advance with a certifiable medical or emergency reason and receive a makeup exam slot.
The exams are closed notes, books, and Web, and you should neither give nor receive
help from anyone by any means.
Prerequisites
A course on ML algorithms (e.g., CSE 151) is absolutely necessary.
A course on either database systems (e.g., CSE 132C) or operating systems (e.g., CSE 120)
is also necessary.
The above courses could have been taken at UCSD or elsewhere.
Substantial project or industrial experience can be substituted for prior coursework,
subject to the instructor's consent.
Email the instructor if you would like to enroll but are unsure if you satisfy the prerequisites.
Suggested Textbooks
Recommended: Data Management in Machine Learning Systems, by Matthias Boehm,
Arun Kumar, and Jun Yang (Free ebook via UCSD VPN).
Exam Dates
Grading
Exams-based pathway:
Paper Reviews: 28% (7 x 4%)
Surprise Quizzes: 12% (4 x 3%); no-fault component
Midterm Exam: 20%
Cumulative Final Exam: 40%
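As a rough illustration of how the exams-based weights above combine into a total score, here is a minimal Python sketch. The function name `weighted_total`, the component keys, and the assumption that every component is scored on a 0-100 scale are hypothetical and not part of the course policy; the no-fault treatment of the quizzes is not modeled.

```python
# Weights (in percent) for the exams-based pathway, as listed above.
EXAM_PATHWAY_WEIGHTS = {
    "paper_reviews": 28,     # 7 reviews x 4% each
    "surprise_quizzes": 12,  # 4 quizzes x 3% each
    "midterm": 20,
    "final": 40,
}


def weighted_total(component_scores):
    """Combine per-component scores (assumed 0-100 each) into a total out of 100."""
    missing = set(EXAM_PATHWAY_WEIGHTS) - set(component_scores)
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    weighted_sum = sum(
        weight * component_scores[name]
        for name, weight in EXAM_PATHWAY_WEIGHTS.items()
    )
    return weighted_sum / 100


# Hypothetical student: strong on reviews and quizzes, weaker on the exams.
example_scores = {
    "paper_reviews": 90,
    "surprise_quizzes": 100,
    "midterm": 75,
    "final": 80,
}
example_total = weighted_total(example_scores)
```

Because the final exam carries 40% of the weight, a 10-point change on the final moves the total by 4 points, as much as a 20-point change on the midterm.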
Project-based pathway:
Cutoffs
The grading scheme is a hybrid of absolute and relative grading.
The absolute cutoffs are based on your absolute total score.
The relative bins are based on your position in the total score distribution of the class.
The better of the two grades (absolute-based and relative-based) will be your final grade.
Grade | Absolute Cutoff (>=) | Relative Bin (Use strictest) |
A+ | 92 | Highest 10% |
A | 85 | Next 15% (10-25) |
A- | 80 | Next 15% (25-40) |
B+ | 75 | Next 15% (40-55) |
B | 70 | Next 15% (55-70) |
B- | 65 | Next 5% (70-75) |
C+ | 60 | Next 5% (75-80) |
C | 55 | Next 5% (80-85) |
C- | 50 | Next 5% (85-90) |
D | 45 | Next 5% (90-95) |
F | < 45 | Lowest 5% |
Example: Suppose the total score is 82 and the percentile is 43, i.e., a position about 57% from the top of the class, which falls in the 55-70 bin. The relative grade is B, while the absolute grade is A-. The final grade then is A-.
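The hybrid scheme can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the instructor's actual grading code: the function names are hypothetical, the percentile is assumed to mean the percentage of the class scoring below you (so the position from the top is 100 minus the percentile), and "Use strictest" is read as placing a position that lies exactly on a bin edge into the lower bin.

```python
# Absolute cutoffs from the table: (minimum total score, grade), best first.
ABSOLUTE_CUTOFFS = [
    (92, "A+"), (85, "A"), (80, "A-"), (75, "B+"), (70, "B"), (65, "B-"),
    (60, "C+"), (55, "C"), (50, "C-"), (45, "D"),
]

# Relative bins from the table: (upper edge as % from the top, grade), best first.
RELATIVE_BINS = [
    (10, "A+"), (25, "A"), (40, "A-"), (55, "B+"), (70, "B"), (75, "B-"),
    (80, "C+"), (85, "C"), (90, "C-"), (95, "D"), (100, "F"),
]

GRADE_ORDER = [grade for _, grade in RELATIVE_BINS]  # best grade first


def absolute_grade(total_score):
    """Return the first grade whose cutoff the total score meets."""
    for cutoff, grade in ABSOLUTE_CUTOFFS:
        if total_score >= cutoff:
            return grade
    return "F"


def relative_grade(percentile):
    """Map a percentile to a relative bin.

    Assumption: percentile = % of the class scoring below you.
    """
    position_from_top = 100 - percentile
    for upper_edge, grade in RELATIVE_BINS:
        # Assumption: strict "<" sends an exact edge to the lower (stricter) bin.
        if position_from_top < upper_edge:
            return grade
    return "F"


def final_grade(total_score, percentile):
    """Return the better of the absolute-based and relative-based grades."""
    candidates = [absolute_grade(total_score), relative_grade(percentile)]
    return min(candidates, key=GRADE_ORDER.index)


# The worked example from the text: total score 82, percentile 43.
example = final_grade(82, 43)
```

Under these assumptions the sketch reproduces the worked example: the absolute grade is A-, the relative grade is B, and the final grade is A-.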
Non-Letter Grade Options: You have the option of taking this course for a non-letter grade.
As per the CSE department's guidelines, the policy for P in a P/F option is a pass-equivalent letter grade, i.e., D or better;
the policy for S in an S/U option is a letter grade of B- or better.
Classroom Rules
Please review UCSD's honor code and policies and procedures on academic integrity
on this website.
If plagiarism is detected in your paper reviews and/or exams, or if any other form of
academic integrity violation is identified, the University authorities will be notified
for appropriate disciplinary action to be taken.
You will also receive a 0 for that component of your score, and your grade will be lowered substantially.