CSE/DSC 234: Data-Centric AI and AI Engineering

(Previously "Data Systems for ML")

Lectures: TuTh 5-6:20pm PT @ WLH 2001

Instructor: Arun Kumar

  • Email: akk018 [at] ucsd.edu

  • Office Hours: Thu 3:30-4:30pm PT @ 3218 CSE

Piazza: CSE/DSC 234 (Access code posted on Canvas and emailed to enrolled students)

Teaching Assistants:

Name              Email
Ruobing Han       r8han [at] ucsd.edu
Manas Jain        maj039 [at] ucsd.edu
Raghav Jain       r6jain [at] ucsd.edu
Har Simrat Singh  h6singh [at] ucsd.edu

Announcements

  • The introductory lecture is on Tue, Mar 31.

Course Goals and Content

This is a research-based course on data-centric aspects of the AI lifecycle, spanning development, deployment, and maintenance of AI applications. It is at the intersection of the areas of ML/AI, data management, and software systems. AI has long been ubiquitous in domains such as enterprise analytics, recommendation systems, social media analytics, and domain sciences. The rise of LLMs has made AI chatbots, RAG, and agentic applications pervasive, including in consumer-facing applications.

Students will learn about the landscape and evolution of data-centric AI systems, the latest research, and some major open questions. This course is aimed primarily at MS students interested in building real-world AI applications, as well as PhD students interested in research in this space.

Course Format and Instructions

  • Lectures and Discussions:

    • The class meets twice a week for 80-minute lectures.

    • All lectures will be held in person only. The lectures will be automatically podcast and available online for asynchronous viewing.

    • The discussion slot will be used only for review discussions and presenting the AI engineering project statements.

    • Attending the lectures and discussions is NOT mandatory but highly encouraged.

    • Familiarize yourself with this course website and Piazza. All class announcements and asynchronous discussions will be on Piazza.

  • 1 Midterm Exam and 1 Final Exam:

    • This course will have a midterm exam and a cumulative final exam. Both will be held in person only on the dates listed below.

    • The exams will consist primarily of multiple choice questions (MCQ), plus some quantitative or essay questions; the latter may earn partial credit.

    • The guideline for time per question is a maximum of 1 min per point. The points of each question will be calibrated accordingly.

    • If you miss an exam, you will get no credit for it unless you notify the instructor in advance with a university-approved reason and receive a makeup slot.

    • The exams are closed books/electronics/Web/LLMs but a sheet of notes will be allowed. You should neither give nor receive help from anyone by any means.

  • Midterm Second Chance:

    • The Final Exam will have a subset designated as "Midterm Second Chance" that gives you a second chance at raising your Midterm Exam score.

    • If you score higher on that subset (say, y%) vs. your original Midterm Exam score (say, x% with x < y), then your midterm score will be automatically upgraded to (x + 2/3 * (y - x))%. That is, you will automatically earn two-thirds of the positive delta.

    • But if x >= y, then your original midterm score (x%) will remain unchanged.

    • This policy is applied by default to everyone.

  • 9 Peer Instruction Activities:

    • These will be held live in class using Google Forms, spread randomly across the quarter.

    • Each activity will have 2 multiple choice questions (MCQ). Quantitative problems may exist but only the final answer will need to be selected.

    • For each question, you must first answer individually. Then you can discuss the question with your neighbor(s). After that, you can answer the question again.

    • These activities are open books/electronics/Web/LLMs.

    • Grade is based on earnest participation in the whole activity.

    • If you miss an activity, you will get no credit for it, unless you notify the instructor in advance with a university-approved reason.

    • You can miss up to 1 activity out of the 9 without losing credit.

    • If you happen to forget your phone/laptop one day, submit your written answers on a sheet and hand it to me in class right after that lecture. Out-of-band submissions later will NOT be accepted.

  • I will release ungraded exercises on Canvas throughout the quarter. These questions will act as practice for the exams.
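
The Midterm Second Chance rule described above can be illustrated with a minimal Python sketch. The function name `upgraded_midterm` is hypothetical and used only for illustration; `x` and `y` are percentages as defined in that rule.

```python
def upgraded_midterm(x: float, y: float) -> float:
    """Return the midterm percentage after the Midterm Second Chance rule.

    x: original Midterm Exam score, in percent.
    y: score on the "Midterm Second Chance" subset of the Final Exam, in percent.
    (Hypothetical helper for illustration; not an official grading tool.)
    """
    if y > x:
        # Earn two-thirds of the positive delta.
        return x + (2 / 3) * (y - x)
    # Otherwise the original midterm score stands.
    return x


# A midterm of 60% and a second-chance subset of 90% gives roughly 80%.
print(round(upgraded_midterm(60.0, 90.0), 2))
# A lower second-chance score leaves the midterm at 80%.
print(round(upgraded_midterm(80.0, 70.0), 2))
```

For instance, a student with x = 60 and y = 90 recovers two-thirds of the 30-point gap, while a student with x = 80 and y = 70 keeps the original 80.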

Prerequisites

  • A full course on ML algorithms (e.g., CSE 151 or 258) is absolutely necessary. It could have been taken at UCSD or elsewhere.

  • Python programming know-how is also necessary.

  • Introductory courses on NLP/LLMs and on databases/data management are also highly recommended but not strictly required.

  • Substantial project or industrial experience on relevant topics can be substituted for prior coursework and Python experience, subject to the instructor's consent. Email the instructor if you would like to enroll but are unsure if you satisfy the prerequisites.

Suggested Textbooks

  • Any reputable textbook on ML algorithms, deep learning, or LLMs/generative AI.

  • More optional textbooks:

    • AI Engineering: Building Applications with Foundation Models, by Chip Huyen (O'Reilly)

    • Generative AI with LangChain, by Ben Auffarth and Leonid Kuligin (Packt)

    • Principles of Building AI Agents, by Sam Bhagwat (Mastra; free e-book)

Exam Dates

  • Midterm Exam: Thu, May 7, 5-6:20pm PT in class.

  • Cumulative Final Exam: Thu, Jun 11, 7-10pm PT.

Grading Components

  • Midterm Exam: 15%

  • Cumulative Final Exam: 35%

  • AI Engineering Project 1: 20%

  • AI Engineering Project 2: 20%

  • Peer Instruction Activities: 10% (8 x 1.25%)
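
As a rough illustration of how the components above combine, here is a minimal Python sketch of the total score. The names `WEIGHTS` and `total_score` are hypothetical. The sketch assumes each exam and project is scored as a percentage, that the midterm value is taken after any Midterm Second Chance upgrade, and that peer instruction credit is capped at 8 activities (as "8 x 1.25%" suggests). Extra credit is not modeled.

```python
# Weights in percentage points of the total score (from the list above).
WEIGHTS = {
    "midterm": 15.0,
    "final": 35.0,
    "project1": 20.0,
    "project2": 20.0,
}
PEER_POINTS_EACH = 1.25  # points per credited peer instruction activity
PEER_MAX_COUNTED = 8     # assumption: at most 8 of the 9 activities count


def total_score(percent_scores: dict[str, float], peer_credited: int) -> float:
    """Combine component percentages (0-100) into a total out of 100."""
    weighted = sum(
        WEIGHTS[name] * percent_scores[name] / 100.0 for name in WEIGHTS
    )
    peer = PEER_POINTS_EACH * min(peer_credited, PEER_MAX_COUNTED)
    return weighted + peer


scores = {"midterm": 80.0, "final": 90.0, "project1": 100.0, "project2": 100.0}
print(total_score(scores, peer_credited=9))
```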

Grading Cutoffs

The grading scheme is a hybrid of absolute and relative grading. The absolute cutoffs are based on your absolute total score (including any extra credit). The relative bins are based on your position in the total score distribution of the class. The better grade among the two (absolute-based and relative-based) will be your final grade. The absolute cutoffs are provisional and may be adjusted at the end of the quarter at the instructor's discretion but only in a direction that benefits students.

Grade   Absolute Cutoff (>=)   Relative Bin (Use strictest)
A+      95                     Highest 5%
A       90                     Next 15% (5-20)
A-      85                     Next 15% (20-35)
B+      80                     Next 15% (35-50)
B       75                     Next 15% (50-65)
B-      70                     Next 10% (65-75)
C+      65                     Next 5% (75-80)
C       60                     Next 5% (80-85)
C-      55                     Next 5% (85-90)
D       50                     Next 5% (90-95)
F       < 50                   Lowest 5%

Example: Suppose the total score is 89 and the percentile is 60, i.e., the score is in the top 40% of the class. The relative grade is B+ (the 35-50 bin), while the absolute grade is A- (89 >= 85). The final grade then is A-.
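
The hybrid scheme can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, "percentile" is read as the percentage of the class at or below your score (so position from the top is 100 minus the percentile), and a position exactly on a bin edge is assigned to the better bin. The actual boundary handling is at the instructor's discretion.

```python
GRADES = ["A+", "A", "A-", "B+", "B", "B-", "C+", "C", "C-", "D", "F"]
# Minimum total score for A+ through D; anything below 50 is an F.
ABSOLUTE_CUTOFFS = [95, 90, 85, 80, 75, 70, 65, 60, 55, 50]
# Upper edge of each relative bin, as percent from the top, for A+ through F.
RELATIVE_BOUNDS = [5, 20, 35, 50, 65, 75, 80, 85, 90, 95, 100]


def absolute_grade(total: float) -> str:
    """Letter grade from the absolute cutoffs."""
    for grade, cutoff in zip(GRADES, ABSOLUTE_CUTOFFS):
        if total >= cutoff:
            return grade
    return "F"


def relative_grade(percentile: float) -> str:
    """Letter grade from the relative bins (assumed boundary handling)."""
    top_percent = 100 - percentile
    for grade, bound in zip(GRADES, RELATIVE_BOUNDS):
        if top_percent <= bound:
            return grade
    return "F"


def final_grade(total: float, percentile: float) -> str:
    """The better of the absolute-based and relative-based grades."""
    candidates = [absolute_grade(total), relative_grade(percentile)]
    return min(candidates, key=GRADES.index)  # lower index = better grade


# The example above: total 89, percentile 60.
print(absolute_grade(89), relative_grade(60), final_grade(89, 60))
```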

Non-Letter Grade Options: You have the option of taking this course for a non-letter grade. As per the CSE department's guidelines, a P in the P/F option requires a letter grade of C- or better; an S in the S/U option requires a letter grade of B- or better.

CSE Comprehensive Exam: For this, your total score across all in-person proctored components (both exams), when rescaled to percentage, must yield a pass-equivalent letter grade, i.e., D or better, based on the grading scheme above.

Classroom Rules

  • No late days for submitting the AI engineering projects. Plan your work well up front accordingly.

  • Students are encouraged to ask questions and participate in the discussions in class and also on Piazza. Please raise your hand before speaking and the instructor will call on you to speak.

  • Please review UCSD's honor code and policies and procedures on academic integrity on this website. If plagiarism is detected in your exams or project submissions, or if any other form of academic integrity violation is identified, the University authorities will be notified for appropriate disciplinary action to be taken. You will also receive a 0 for that component of your score, and your grade will be downgraded substantially.

  • Please review UCSD's principles of community and our commitment to creating an inclusive learning environment on this website.

  • Harassment or intimidation of any form against any student will not be tolerated in class or on Piazza. Please review UCSD's policies on dealing with harassment and discrimination on this website.