CSE 132C – Database System Implementation (Online-Only Edition)

Lectures: Tue/Thu 2:00-3:20pm PT on Zoom (link posted on Canvas/Piazza)

Instructor: Arun Kumar

  • Email: arunkk [at] eng.ucsd.edu

  • Office Hours: Thu 3:30-4:30pm PT on Zoom (link posted on Canvas/Piazza)

Discussions: Wed 2:00-2:50pm PT on Zoom (used only occasionally; link posted on Canvas/Piazza)

Teaching Assistants:

  • Rajeshwari Sah

    • Email: rasah [at] ucsd.edu

    • Office Hours: Wed 4:00-5:00pm PT on Zoom (link posted on Canvas/Piazza)

    • Handles questions/doubts regarding the programming projects.

  • Tara Mirmira

    • Email: tmirmira [at] eng.ucsd.edu

    • Office Hours: Tue 4:00-5:00pm PT on Zoom (link posted on Canvas/Piazza)

    • Handles questions/doubts regarding the lecture materials/quizzes/exams.

Piazza: CSE 132C (Requires access code posted on Canvas)

Announcements

  • New! I will hold a review discussion during my OH slot on Thursday, June 3. I will also hold extra OHs from 4:30 to 5:40pm PT on Monday, June 7.

  • New! Quiz 4 is on Friday, June 4. Final Exam is on Tuesday, June 8. See my Piazza/Canvas announcements for details.

  • New! The last exercises and sample final exams have been released on the course exercises page.

Course Goals and Content

This is a hands-on, systems-focused course on the implementation of a database management system (DBMS), especially a relational DBMS (RDBMS). RDBMSs are the cornerstone of large-scale data management in numerous application domains that define our modern world, including finance, insurance, retail, logistics, telecommunications, healthcare, governance, and education. Furthermore, concepts developed in the context of RDBMSs underpin the so-called Big Data and NoSQL systems that were developed for new applications such as Web search, e-commerce, social media analytics, and large-scale machine learning systems.

This course will cover key systems topics in implementing an RDBMS: data storage, buffer management, indexing, sorting, relational operator implementations, a bit of query optimization, and the implementation of so-called "Big Data" systems such as MapReduce/Hadoop and Spark. A new topic, ML for RDBMSs, will also be covered. Time permitting, key-value stores, graph DBMSs, and ML systems will be discussed briefly.

A major component of this course is hands-on C++ programming to implement two key components of an RDBMS, a buffer manager and a B+ Tree index, on top of a basic RDBMS skeleton that will be provided.

Course Format and Online-only Modality Instructions

  • The class meets 2 times a week for 80-minute lectures.

    • All lectures will be held via a Zoom video conference call. I will lecture live and record a video that will later be posted to the course Canvas page. You can interrupt to ask questions live during this call.

    • Attendance at live lectures is not mandatory. All lecture videos will be available on Canvas for asynchronous viewing. However, you are highly encouraged to join the live lectures to participate in the in-class discussions and other interactive activities.

    • All asynchronous discussions and questions will be handled via Canvas Discussions. Announcements will also be replicated on Piazza, which is optional to join.

    • Students are NOT required to have webcams, but microphones are highly encouraged. All Zoom meetings can be joined via phone as well.

  • 2 C++ programming projects.

    • Students can work on projects either individually or in teams of 2.

    • Students should email their team decisions to the TA before 11:59pm PT Tuesday 4/6. All remaining students will be assigned to teams randomly by the TA.

    • See the projects page for more details, including all dates/deadlines.

    • There are no late days for the programming assignments; plan your work accordingly.

    • Your (team's) code submission must be entirely your (team's) own. The projects page offers more guidance on what level of discussion outside your team is allowed.

  • 4 short online quizzes on Canvas.

    • Each quiz will typically be up to 25min equivalent. It will consist primarily of multiple choice questions (MCQ). Quantitative or longer problems may appear, but you may only need to select the final answer. Partial credit may be possible for some questions.

    • The quizzes will be available on Canvas for a fixed time window announced later. You must take the quiz within this time window; note that the quiz's time limit still applies.

    • If you fail to take a quiz, you will get no credit by default. If you miss a quiz due to a pre-notified and certifiable medical or emergency reason, that quiz will be waived for you and your score will be reweighted accordingly.

    • The quizzes are open book/notes/Web. The only requirement is that you neither give nor receive help from anyone by any means.

  • A midterm exam and a cumulative final exam.

    • These will also be delivered as Canvas Quizzes, exactly like the quizzes above, except the midterm exam will be 80min equivalent and the final exam, 180min equivalent. The dates and time windows of these exams are listed below.

    • If you miss an exam, you will get no credit for it unless you pre-notify the instructor with a certifiable medical or emergency reason; in such cases, your grade will be based on a proportional reweighting of the other components.

    • Both exams are open book/notes/Web. The only requirement is that you neither give nor receive help from anyone by any means.

  • I will also release some ungraded exercises on the docs page throughout the quarter. These questions will act as practice for the graded quizzes and exams.

Prerequisites

  • CSE 132A (DB Systems Principles) or DSC 102 (Systems for Scalable Analytics) is required. It will also be helpful if you have taken CSE 120 (Operating Systems) or CSE 132B (DB Systems Applications), but these are not required.

  • You should know, or be willing to learn quickly by yourself, the programming language C++ for the projects. Here is a good C++ tutorial.

Textbooks

  • Recommended: Database Management Systems (3rd edition), by Raghu Ramakrishnan and Johannes Gehrke (aka the "cow book").

  • Additional (optional): Database Systems: The Complete Book (2nd edition), by Hector Garcia-Molina, Jennifer Widom, and Jeffrey Ullman.

Exam Dates

  • Midterm Exam: Tuesday, May 4; time window: 12:01am PT to 11:59pm PT on 5/4

  • Final Exam: Tuesday, Jun 8; time window: 12:01am PT to 11:59pm PT on 6/8

Grading

  • Project 1: 10%

  • Project 2: 25%

  • Quizzes: 20% (4 x 5%)

  • Midterm Exam: 15%

  • Final Exam: 25%

  • Peer Evaluation Activities: 5%
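
As a rough illustration of how the weights above combine into a total score, here is a minimal Python sketch. The function name `total_score`, the component keys, and the way a waived component's weight is spread over the rest are assumptions made for this example; the syllabus only says that scores are reweighted proportionally. For simplicity, the four quizzes are treated as one 20% component.

```python
# Component weights from the Grading list (percent of the total score).
# The key names are hypothetical labels chosen for this sketch.
WEIGHTS = {
    "project1": 10,
    "project2": 25,
    "quizzes": 20,   # 4 quizzes x 5% each, aggregated here
    "midterm": 15,
    "final": 25,
    "peer_eval": 5,
}


def total_score(scores, waived=()):
    """Return the weighted total on a 0-100 scale.

    scores: component name -> percentage earned on that component (0-100).
    waived: components excused for a certified reason. Their weight is
            dropped and the rest are renormalized, which is one assumed
            reading of "proportional reweighting".
    """
    active = {name: w for name, w in WEIGHTS.items() if name not in waived}
    weight_sum = sum(active.values())
    return sum(scores[name] * w for name, w in active.items()) / weight_sum


example = {
    "project1": 100, "project2": 80, "quizzes": 90,
    "midterm": 70, "final": 60, "peer_eval": 100,
}
print(total_score(example))                      # all components counted
print(total_score(example, waived=["midterm"]))  # midterm excused
```

With these example scores, counting every component gives 78.5, while waiving the midterm spreads its 15% over the other components and gives 80.0.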

Cutoffs

The grading scheme is a hybrid of absolute and relative grading. The absolute cutoffs are based on your absolute total score. The relative bins are based on your position in the total score distribution of the class. The better of the two grades (absolute and relative) will be your final grade.

Grade   Absolute Cutoff (>=)   Relative Bin (Use strictest)
A+      95                     Highest 5%
A       90                     Next 10% (5-15)
A-      85                     Next 15% (15-30)
B+      80                     Next 15% (30-45)
B       75                     Next 15% (45-60)
B-      70                     Next 15% (60-75)
C+      65                     Next 5% (75-80)
C       60                     Next 5% (80-85)
C-      55                     Next 5% (85-90)
D       50                     Next 5% (90-95)
F       < 50                   Lowest 5%


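Example: Suppose the total score is 82 and the percentile is 33, i.e., the score falls in the 60-75 bin counted from the top of the class. So, the relative grade is B-, while the absolute grade is B+ (82 >= 80). The final grade is then the better of the two, B+.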
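
The hybrid scheme can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the percentile is read as the share of the class scoring below you (so position from the top is 100 minus the percentile), and scores that land exactly on a bin edge are assigned to the better bin. Actual grades are determined by the instructor.

```python
# (grade, absolute cutoff, upper edge of the relative bin as "top X% of class")
TABLE = [
    ("A+", 95, 5), ("A", 90, 15), ("A-", 85, 30), ("B+", 80, 45),
    ("B", 75, 60), ("B-", 70, 75), ("C+", 65, 80), ("C", 60, 85),
    ("C-", 55, 90), ("D", 50, 95), ("F", 0, 100),
]
ORDER = [row[0] for row in TABLE]  # grades listed from best to worst


def absolute_grade(score):
    """Best grade whose absolute cutoff the total score meets."""
    for grade, cutoff, _ in TABLE:
        if score >= cutoff:
            return grade
    return "F"


def relative_grade(percentile):
    """Grade of the relative bin containing the student's position.

    Assumption: percentile = share of the class scoring below the
    student, so position from the top is 100 - percentile.
    """
    top_percent = 100 - percentile
    for grade, _, upper in TABLE:
        if top_percent <= upper:
            return grade
    return "F"


def final_grade(score, percentile):
    """The better of the absolute and relative grades."""
    candidates = (absolute_grade(score), relative_grade(percentile))
    return min(candidates, key=ORDER.index)


print(final_grade(82, 33))  # the worked example from the text
```

For the worked example (score 82, percentile 33), `absolute_grade` gives B+, `relative_grade` gives B-, and `final_grade` returns B+.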

Non-Letter Grade Options: You have the option of taking this course for a non-letter grade. A P in the P/F option requires a letter grade of C- or better; an S in the S/U option requires a letter grade of B- or better.

Classroom Rules

  • There are no late days for submitting the programming projects or assigned peer activities. Partial credit is possible for the projects, per the TA's assessment. Put all deadlines on your calendar and plan your work well in advance.

  • You are encouraged to ask questions and participate in the discussion during the live lecture slot and on Canvas. Enter your name in the Zoom chat or click "raise your hand" on Zoom; the instructor will pause and ask you to speak or type your question.

  • Please review UCSD's honor code and policies and procedures on academic integrity here. If plagiarism is detected in your code, if we detect collusion on the graded quizzes or exams, or if any other form of academic integrity violation is identified, you will receive a zero for that component of your score and your grade will be lowered substantially. I will also notify the University authorities for appropriate disciplinary action to be taken, up to and including expulsion from the University.

  • Harassment or intimidation of any form against any student will not be tolerated during the calls or on the discussion forum.

  • In the rare event of Zoombombing during a live lecture, I will end that session and immediately announce a new link on Canvas to resume that lecture.