CSE 232A: Graduate Database Systems

Administrivia

Lectures: Mon/Wed/Fri 3:00-3:50pm; Ledden Auditorium

Instructor: Arun Kumar; Office: CSE 3218; Office Hours: Wed 4:00-5:00pm

TAs:

  • Nikos Koulouris (nkoulour [at] eng.ucsd.edu); Office Hours: Wed 5:00-6:00pm; Office: CSE B240A

  • Haotian Qiu (h1qiu [at] eng.ucsd.edu); Office Hours: Wed 8:30-9:30am; Office: CSE B240A

  • Kaiqi Yao (kyao [at] eng.ucsd.edu); Office Hours: Thu 9:30-10:30am; Office: CSE B275

  • Aman Achpal (aachpal [at] eng.ucsd.edu); Office Hours: Mon 8:30-9:30am; Office: CSE B260A

  • Allen Ordookhanians (aordookh [at] ucsd.edu); Office Hours: Tue 10:00-11:00am; Office: CSE B275

Piazza: CSE 232A

Announcements

  • New! The answers for the final exam are posted here.

  • The topic of ML for RDBMSs is not included for the final exam. You are welcome to review those slides if you are interested in that topic.

  • A sample final exam can be found here: Questions and Answers.

  • Please fill out your course evals and TA evals by Sunday 12/08/19. If the percentage of the class that submits course evals exceeds 80%, the whole class will get 1% extra credit.

  • Arun will hold extra office hours from 2:30-3:30pm on Thursday 12/05/19 at CSE 3218 and from 3:30-5:00pm on Thursday 12/12/19 at CSE 4217.

  • The final exam is from 3:00-5:59pm on Friday 12/13/19 in Ledden Auditorium (in class). You will be emailed your randomly assigned seat number. Please review the LEDDN seat map here: PDF.

Course Overview and Content

This is a graduate course on the systems principles of database management systems (DBMSs), especially relational DBMSs (RDBMSs). RDBMSs are the cornerstone of large-scale data management in numerous application domains that define our modern world, including finance, insurance, retail, logistics, telecommunications, healthcare, governance, and education. Furthermore, concepts developed in the context of RDBMSs are indispensable underpinnings of the so-called Big Data and NoSQL systems that were built for newer applications such as Web search, e-commerce, social media, and ML analytics. This course will cover key principles and systems design issues in RDBMSs, including storage management, query processing and optimization, parallel DBMSs, and dataflow systems. More recent topics such as column stores, data integration, data cleaning, and data systems for machine learning workloads will also be covered. This course overlaps with CSE 190A from Spring 2019, but it skips some implementation details of RDBMSs in favor of newer topics, and it has no programming projects.

Course Format

  • The class meets 3 times a week for 50-minute lectures. All lectures are mandatory. While lecture slides will be made available on this webpage, additional content might be discussed in class.

  • This course will have two in-class midterm exams and one cumulative final exam. If you miss an exam, you will get no credit for it unless you duly notify the instructor with a certifiable medical or emergency reason; in such cases, your grade will be based on a proportional reweighting of the other exams.

  • There will be a few short in-class surprise quizzes to help you review the material. The quizzes will neither be posted on the webpage nor graded.

  • To encourage you to learn how to read and evaluate research papers, and to give you a flavor of state-of-the-art database systems research, there is an optional paper reading list aligned with the lecture schedule. It has 6 papers drawn from recent proceedings of SIGMOD and VLDB, the top conferences where database research is published. If you take this option, you must read each paper and submit your individual review on the corresponding Google Form before the specified deadline. The reviews will be evaluated for pertinence, thoroughness, and quality. Extra credit will be given in proportion to the number of reviews submitted on time. There are no late days. Since the course is not graded on a curve, skipping this option will not hurt you, but completing it could improve your grade.
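The policy for a missed exam with a certified medical or emergency reason says the grade is based on a "proportional reweighting of the other exams." Below is a minimal sketch of one reading of that rule. It assumes that the excused exam's weight is spread over the remaining exams in proportion to their existing weights (20/20/60 from the Grading section), so the weights still sum to 100. The function name `reweight` and the exam keys are hypothetical and serve only as illustration. The instructor's actual computation may differ.

```python
# Exam weights (percent of the total score), as listed under Grading.
WEIGHTS = {"midterm1": 20.0, "midterm2": 20.0, "final": 60.0}


def reweight(weights: dict[str, float], excused: set[str]) -> dict[str, float]:
    """Redistribute the weight of excused exams over the remaining exams.

    Assumption (hypothetical reading of the policy): each remaining exam
    is scaled by the same factor, so the relative proportions among the
    remaining exams are preserved and the weights still sum to the
    original total.
    """
    remaining = {name: w for name, w in weights.items() if name not in excused}
    if not remaining:
        raise ValueError("at least one exam must remain")
    total = sum(weights.values())
    kept = sum(remaining.values())
    return {name: w * total / kept for name, w in remaining.items()}


if __name__ == "__main__":
    # Hypothetical case: Midterm Exam 1 is excused for a documented emergency.
    print(reweight(WEIGHTS, {"midterm1"}))
```

In this hypothetical case, Midterm Exam 2 would count for 25% and the final for 75%.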

Pre-requisites

CSE 132A (DB Systems Principles); or an equivalent undergrad DB systems course; or substantial practical experience with RDBMSs, subject to the consent of the instructor. It will also be helpful if you have taken a course on Operating Systems, e.g., CSE 120 or its equivalent.

Textbook(s)

  • Recommended: Database Management Systems (3rd edition), by Raghu Ramakrishnan and Johannes Gehrke (aka the "cow book").

  • Additional (optional): Database Systems: The Complete Book (2nd edition), by Hector Garcia-Molina, Jennifer Widom, and Jeffrey Ullman.

  • Additional (optional): Big Data Integration, by Xin Luna Dong and Divesh Srivastava.

Grading

  • Midterm Exam 1: 20%

  • Midterm Exam 2: 20%

  • Cumulative Final Exam: 60%

  • (Extra Credit) Paper Reviews: 3%

Cutoffs

These cutoffs on the total score are a minimum guarantee on the grade: the instructor may later lower the thresholds but will not raise them.

Total score    Grade
>= 95          A+
>= 90          A
>= 85          A-
>= 80          B+
>= 75          B
>= 70          B-
>= 65          C+
>= 60          C
>= 55          C-
>= 50          D
< 50           F
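The total score combines the exam weights, the optional extra credit, and the cutoffs above. Below is a minimal sketch of that arithmetic under these assumptions:

  • Each exam score is on a 0-100 scale.
  • The 3% paper-review extra credit is linear in the number of on-time reviews (3% / 6 papers = 0.5% each). This ignores the quality evaluation mentioned under Course Format.
  • The published cutoffs apply unchanged. They are only a minimum guarantee, since the instructor may lower them.

The names `total_score` and `letter_grade` are hypothetical.

```python
# Weights (percent) from the Grading section; exam scores are on a 0-100 scale.
WEIGHTS = {"midterm1": 20.0, "midterm2": 20.0, "final": 60.0}
EXTRA_CREDIT_MAX = 3.0  # paper reviews (extra credit), in percent
NUM_PAPERS = 6

# Minimum-guarantee cutoffs, highest first: grade applies if total >= cutoff.
CUTOFFS = [(95, "A+"), (90, "A"), (85, "A-"), (80, "B+"), (75, "B"),
           (70, "B-"), (65, "C+"), (60, "C"), (55, "C-"), (50, "D")]


def total_score(scores: dict[str, float], reviews_on_time: int) -> float:
    """Weighted exam total plus paper-review extra credit.

    Assumption: extra credit is linear in the number of on-time reviews
    and does not account for the quality evaluation of each review.
    """
    exams = sum(WEIGHTS[name] * scores[name] / 100.0 for name in WEIGHTS)
    extra = EXTRA_CREDIT_MAX * min(reviews_on_time, NUM_PAPERS) / NUM_PAPERS
    return exams + extra


def letter_grade(total: float) -> str:
    """Return the first grade whose cutoff the total meets, else F."""
    for cutoff, grade in CUTOFFS:
        if total >= cutoff:
            return grade
    return "F"


if __name__ == "__main__":
    # Hypothetical student: 80 / 90 / 85 on the exams, 4 reviews on time.
    t = total_score({"midterm1": 80, "midterm2": 90, "final": 85}, 4)
    print(t, letter_grade(t))  # 16 + 18 + 51 + 2 = 87.0 -> A-
```

Scanning the cutoffs from highest to lowest means the first match is the best grade the total guarantees.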

Exam Dates

  • Midterm Exam 1: Friday, 10/25, in class

  • Midterm Exam 2: Monday, 11/25, in class

  • Cumulative Final Exam: Friday, 12/13, 3:00pm to 5:59pm, in class

Classroom Rules

  • You are encouraged to ask questions and participate in in-class discussions. Please raise your hand before asking questions or speaking during the lectures.

  • Harassment or intimidation of any form against any student will not be tolerated in class.

  • If cheating is detected during an exam, the University authorities will be notified immediately for appropriate disciplinary action to be taken.

  • If plagiarism is detected in your paper reviews, you will receive zero credit for the entire extra credit option.